Prashant's Weblog
Random Brain "Droppings" of yet another software engineer

Wednesday, January 17, 2007

Data, Data, everywhere...


"Water, water, everywhere,
Nor any drop to drink."

Most of us are familiar with those popular lines from the 1797 poem The Rime of the Ancient Mariner. Fast forward to the present, and if a business manager were to rewrite them today, they would read something like...

"Data, Data, everywhere,
Not a byte that's worth."

Accenture's recent survey of large US- and UK-based companies seems to confirm this. The results are there for all of us to see, and they point unequivocally at one thing: we are not in control of the data we generate. You might also hear business managers saying IT is not doing enough to "add value" to the business.

If you look at it, the problem is this: we have a huge amount of data and we need to make sense of it. What makes it so interesting is the magnitude of "huge". Consider this:

* Comscore captures 50 attributes for every click a user makes anywhere on its 1.5 million member network; that translates to 8 TB a year.

* Walmart adds almost one billion rows daily to its already bulging 500-odd TB of sales and inventory data.

* Europeans have linked 16 telescopes, and it is said that each generates one gigabit of data per second.

* Nielsen Media Research found that 80-100 TB of data is added annually, and guess by whom: a community of 12,000 households!
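To get a feel for these magnitudes, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above; the assumptions are mine: the telescopes stream continuously around the clock, and units are decimal (1 TB = 1000 GB, 8 bits per byte). The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def telescope_tb_per_day(n_telescopes=16, gbit_per_sec_each=1.0):
    """Combined telescope output in TB per day (assumes non-stop streaming)."""
    gbytes_per_sec = n_telescopes * gbit_per_sec_each / 8  # gigabits -> gigabytes
    return gbytes_per_sec * SECONDS_PER_DAY / 1000  # GB -> TB


def gb_per_household_per_year(total_tb, households=12_000):
    """Average yearly data per household in GB, given the panel's total in TB."""
    return total_tb * 1000 / households


if __name__ == "__main__":
    print(f"Telescopes: {telescope_tb_per_day():.1f} TB/day")
    for total in (80, 100):
        per_home = gb_per_household_per_year(total)
        print(f"Nielsen at {total} TB/year: {per_home:.1f} GB per household")
```

Under those assumptions the 16 telescopes together produce about 172.8 TB a day, and each Nielsen household accounts for roughly 6.7 to 8.3 GB a year.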

Now, we don't have much say in the amount of data that is generated; the only way we can do better in that survey next time around is to reduce redundancy. Quite a few methods have been proposed:

* SOA - One of the top 5 bets for 2007 according to Accenture's CTO. The concept is simple: if I have a Sales application and a Marketing application, why don't they talk to each other and share the customer data instead of each maintaining their own? That is what SOA enables: access to resources without having to know the platform or implementation details.

* Better protocols - When data is sent over TCP, the protocol performs a lot of checks on each packet (acknowledgements, ordering, retransmission of lost segments), and that overhead slows down the transfer of huge amounts of data. A protocol that delivers packets with less overhead could probably help.

* Duplicating data is no longer a viable solution. Copying transaction tables to the ODS, the ODS to staging, and staging to the DW may not be a good idea going forward. A concept like EII (Enterprise Information Integration) could be an alternative.

* SAN and NAS could act as central repositories, thus reducing duplication.

* 40% in the survey said other parts of the company are not "willing" to share data. If I read this correctly, it means that the data generated in other departments is stored on local media and hence is not accessible to others. It has more to do with policies. We need IT policies and a setup that encourage sharing. They should make it as easy to save the report I generate to a centralized repository as it is to save it to my local hard disk.

* We need good search algorithms. I understand the pain of looking for a document on our intranet; it simply sucks. If I need to create a document, I prefer to create a new one instead of searching for a template on the intranet and reusing it. Thank the Lord we are now allowing Google to crawl our intranet.
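The SOA idea in the list above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: every class and method name here is hypothetical, and a real SOA deployment would expose the customer service over the network rather than as an in-process object. The point is just that Sales and Marketing depend on one shared contract instead of each keeping its own copy of the customer data.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    email: str


class CustomerService(Protocol):
    """The contract callers depend on; storage details stay hidden."""

    def get_customer(self, customer_id: int) -> Customer: ...


class InMemoryCustomerService:
    """Hypothetical single source of truth for customer records."""

    def __init__(self):
        self._customers = {}  # customer_id -> Customer

    def add_customer(self, customer):
        self._customers[customer.customer_id] = customer

    def get_customer(self, customer_id):
        return self._customers[customer_id]


class SalesApp:
    """Holds no customer table of its own; asks the shared service."""

    def __init__(self, customers: CustomerService):
        self._customers = customers

    def invoice_header(self, customer_id: int) -> str:
        return f"Invoice for {self._customers.get_customer(customer_id).name}"


class MarketingApp:
    """Reads the same records through the same contract."""

    def __init__(self, customers: CustomerService):
        self._customers = customers

    def mailing_address(self, customer_id: int) -> str:
        return self._customers.get_customer(customer_id).email


if __name__ == "__main__":
    service = InMemoryCustomerService()
    service.add_customer(Customer(1, "Asha", "asha@example.com"))
    print(SalesApp(service).invoice_header(1))
    print(MarketingApp(service).mailing_address(1))
```

Because both applications only know the `get_customer` contract, the backing store could later be swapped for a database or a remote call without touching either of them.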

Now say the IT team gets to implement all of the above or more, and we score better in the survey next time around. The business managers will then probably say:
"Well, we are getting good data now, but you see, the IT department at our competitor has also implemented all of what we have done or more, and their managers are able to generate the same kind of "intelligence" out of data as we do. So the IT investments we made are not contributing towards that differentiation. IT is not "adding value"."

And thus IT starts fresh again in this perpetual cycle, looking for that new technology which will help the business make that elusive "



  • Interesting observations. I wonder if anyone is calculating the data expansion that includes all of the ETL-based replication that's going on. Which is why data virtualization techniques like EII make a lot of sense.


    By Anonymous, at 4:50 AM

  • >>techniques like EII make a lot of sense.<<

    Tim, these techniques make sense, true, but I am not sure how effective they are; one has to understand that these techniques do not apply in many cases. EII is preferable only if:
    - I know beforehand what the user requirements are
    - The business needs the data on demand
    - The business is working with very sensitive data and does not want to replicate it
    - I can be sure that the data requested by the business does not choke my networks or stall my OLTP systems, etc.

    I am not sure we get that many real-life situations where the above conditions hold true.

    By Prashant, at 2:46 PM
