Prashant's Weblog
Random Brain "Droppings" of yet another software engineer







Wednesday, January 17, 2007

Data, Data, everywhere,.....


"Water, water, everywhere,
Nor any drop to drink."

Most of us are familiar with those popular lines from Coleridge's The Rime of the Ancient Mariner, written in 1797-98. Fast forward to the present, and if a business manager were to rewrite them today they would read something like...

"Data, Data, everywhere,
Not a byte that's worth."


Accenture's recent survey of large US- and UK-based companies seems to confirm this. The results are there for all of us to see, and they point unequivocally at one thing: we are not in control of the data we generate. You might also hear business managers say that IT is not doing enough to "add value" to the business.

If you look at it, the problem is this: we have a huge amount of data and we need to make sense of it. What makes it so interesting is just how huge "huge" is. Consider this:

* Comscore captures 50 attributes for every click a user makes anywhere on its 1.5 million member network, which translates to 8 TB a year.

* Walmart adds almost one billion rows daily to its already bulging 500-odd TB of sales and inventory data.

* Europeans have linked 16 telescopes, and it's said that each generates one gigabit of data per second.

* Nielsen Media Research found that 80-100 TB of data is added annually, and guess by whom: a community of 12,000 households!
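
To put a couple of those numbers in perspective, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above (16 telescopes at one gigabit per second each, 80-100 TB a year across 12,000 households). Everything else is my assumption: decimal units (1 TB = 10^12 bytes), telescopes streaming non-stop around the clock, and the function names, which are made up for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the list above.
# Assumptions (not from the sources): decimal units, continuous streaming.

SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_BYTE = 8
GB = 10**9   # bytes in a (decimal) gigabyte
TB = 10**12  # bytes in a (decimal) terabyte


def telescope_tb_per_day(telescopes=16, gbit_per_second=1):
    """Total data from all telescopes in one day, in TB."""
    bits_per_second = telescopes * gbit_per_second * 10**9
    bytes_per_day = bits_per_second // BITS_PER_BYTE * SECONDS_PER_DAY
    return bytes_per_day / TB


def nielsen_gb_per_household(tb_per_year, households=12_000):
    """Average data added per household per year, in GB."""
    return tb_per_year * TB / households / GB


if __name__ == "__main__":
    print(f"Telescopes: {telescope_tb_per_day():.1f} TB per day")
    low = nielsen_gb_per_household(80)
    high = nielsen_gb_per_household(100)
    print(f"Nielsen: {low:.1f} to {high:.1f} GB per household per year")
```

Under those assumptions the telescopes alone would produce on the order of 170 TB every single day, and each Nielsen household accounts for several gigabytes a year.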

Now, we don't have much say in the amount of data that gets generated, so the only way we can do better in that survey next time around is to reduce the redundancy. Quite a few methods have been proposed:

* SOA - One of the top 5 bets for 2007 according to Accenture's CTO. The concept is simple: if I have Sales and Marketing applications, why don't they talk to each other and share the customer data instead of each maintaining their own copy? That is what SOA enables: access to resources without having to know the platform or implementation details.

* Better protocols - When a packet is sent over TCP, the protocol performs a lot of checks on it, and that slows down the transfer of huge amounts of data. A protocol that can deliver packets with less overhead could probably help.

* Duplicating data is no longer a viable solution. Copying from transaction tables to ODS, ODS to stage, and stage to DW may not be a good idea going forward. Concepts like EII (enterprise information integration) could be an alternative.

* SAN and NAS could act as central repositories, thus reducing duplication.

* 40% in the survey said other parts of the company are not "willing" to share data. If I read this correctly, it means that the data generated in other departments is stored on local media and hence is not accessible to others. It has more to do with policies. We need IT policies and a setup that encourage sharing. They should make it as easy to save the report I generate to a centralized repository as it is to save it to my local hard disk.

* We need good search algorithms. I understand the pain of looking for a document on our intranet; it simply sucks. If I need to create a document, I prefer to start a new one instead of searching for a template on the intranet and reusing it. Thank the Lord we are now allowing Google to crawl our intranet.
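
The SOA point from the list above is probably the easiest one to show in code. What follows is only a minimal sketch of the idea, not how any real SOA stack is built: both applications depend on one shared customer service contract instead of each keeping its own copy of the customer data. All the names (`Customer`, `CustomerService`, `SalesApp`, `MarketingApp`) are hypothetical, and a real service would sit behind a network interface rather than an in-process object.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    email: str


class CustomerService(Protocol):
    """The contract both applications depend on (hypothetical)."""

    def get_customer(self, customer_id: int) -> Customer: ...


class InMemoryCustomerService:
    """Stand-in for the single shared customer store."""

    def __init__(self) -> None:
        self._customers = {}  # customer_id -> Customer

    def add_customer(self, customer: Customer) -> None:
        self._customers[customer.customer_id] = customer

    def get_customer(self, customer_id: int) -> Customer:
        return self._customers[customer_id]


class SalesApp:
    """Knows only the contract, not how customers are stored."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers

    def invoice_header(self, customer_id: int) -> str:
        customer = self._customers.get_customer(customer_id)
        return f"Invoice for {customer.name}"


class MarketingApp:
    """Reads the same customer record instead of keeping a copy."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers

    def campaign_address(self, customer_id: int) -> str:
        return self._customers.get_customer(customer_id).email


if __name__ == "__main__":
    service = InMemoryCustomerService()
    service.add_customer(Customer(1, "Asha", "asha@example.com"))
    print(SalesApp(service).invoice_header(1))
    print(MarketingApp(service).campaign_address(1))
```

Because there is exactly one customer record, a change made through the service is what both Sales and Marketing see, and neither application needs to know where or how the data is stored.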

Now say the IT team gets to implement all of the above, or more, and we score better in the survey next time around. The business managers will then probably say:
Well, we are getting good data now, but you see, the IT department at our competitor has also implemented all of what we have done, or more, and their managers are able to generate the same kind of "intelligence" out of their data as we do. So the IT investments we made are not contributing towards making that differentiation. IT is not "adding value".

And thus IT starts fresh again in this perpetual cycle, looking for that new technology which will help the business make that elusive "differentiation".

