Andy on Enterprise Software

CDI compared to other master data

May 2, 2006

There is a good article on CDI by Jill Dyche, a co-founder of Baseline Consulting and someone who has clearly seen a lot of real-world CDI projects. She does a good job of explaining how CDI projects have traditionally been quite transaction-oriented, with hubs serving up customer data via middleware to other applications. CDI hubs are at one end of the MDM spectrum, firmly at the “operational” level. At the other end are “analytic” MDM applications, which enable companies to take a cross-enterprise view of key information like assets, people, products and channels. Understanding the differences between the multiple, conflicting definitions embedded in the source systems is a major job in itself, and will usually result in a master data repository. This in turn can be a feed into a corporate warehouse. A few pioneering companies have taken the final logical step and hooked up their master data repositories, via middleware like Tibco or IBM WebSphere, to their operational systems, so that the master data repository becomes the true master source, driving changes as required back down into operational systems like ERP and CRM.

CDI hubs have started at the other end, linking up to systems providing customer data, often in real time. Customer data represents a high-value area of MDM: in the case of consumers the data is often quite simple but high in volume, and it requires fairly simple processing to match a customer record in one system to one in another (e.g. matching “A. Hayler” v “Andy Hayler”). However, this is only part of the answer, as even in the case of “customer” things can get more complex. Suppose you are a company like Shell and you want to treat Unilever as a key global account. Finding out all the information about Unilever is not just a simple keyword matching exercise, since Unilever trades under many different subsidiary names and brands around the world, e.g. its main Indian subsidiary is not called Unilever but Hindustan Lever. It also owns a company called Algida, and I defy even the cleverest fuzzy logic algorithm to associate “Algida” with “Unilever” (such examples are why you should always be sceptical about vendors selling matching algorithms). For more complex situations like this, human intervention is required in order to correctly add up all the elements of Unilever’s business.
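To make the contrast concrete, here is a minimal Python sketch of the two situations. It is not how any particular CDI product works. The function names, the 0.8 threshold and the alias table are all hypothetical, invented for illustration. The consumer rule (same surname, same first initial) is deliberately crude, and the string similarity comes from the standard library's difflib rather than from any vendor's matching engine.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case a name and drop full stops so 'A.' and 'a' compare equal."""
    return name.lower().replace(".", "").strip()


def same_consumer(a, b):
    """Crude consumer rule (an assumption): same surname and same first initial."""
    a_parts = normalize(a).split()
    b_parts = normalize(b).split()
    if not a_parts or not b_parts:
        return False
    return a_parts[-1] == b_parts[-1] and a_parts[0][0] == b_parts[0][0]


def similarity(a, b):
    """Character-level similarity between 0 and 1, via difflib."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical alias table. In practice a data steward maintains this by hand,
# because no string comparison can know who owns whom.
CORPORATE_ALIASES = {
    "unilever": "Unilever",
    "hindustan lever": "Unilever",
    "algida": "Unilever",
}


def resolve_parent(name, threshold=0.8):
    """Return the parent account, or None to flag the record for human review."""
    key = normalize(name)
    if key in CORPORATE_ALIASES:
        return CORPORATE_ALIASES[key]
    # Fall back to fuzzy matching against known names (threshold is arbitrary).
    for alias, parent in CORPORATE_ALIASES.items():
        if similarity(key, alias) >= threshold:
            return parent
    return None


if __name__ == "__main__":
    print(same_consumer("A. Hayler", "Andy Hayler"))
    print(similarity("Algida", "Unilever"))
    print(resolve_parent("Algida"))
    print(resolve_parent("Some Unknown Ltd"))
```

The simple rule copes with the consumer record, while the similarity score for “Algida” against “Unilever” falls far short of any sensible threshold. The subsidiary only resolves because a person put it in the alias table, and anything not in the table comes back as None so that someone can look at it.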

This issue can become considerably more complex with things like “asset” or “product”, which can have a whole hierarchy of sub-types. This is why CDI hub technology tends to be used specifically for consumer information. Other types of MDM technology are required to manage more complex data and the workflows that surround updating it. For example, no automated system is going to just create a new brand; this requires numerous approvals and has various knock-on effects on other master data.

I would argue that, at least at present, you are likely to require one kind of technology to handle general-purpose master data, whether customer or asset or whatever, from an analytical viewpoint, and potentially a separate technology to handle operational updates, perhaps in real time. Of course it would be nice if a single product did everything, but at present nobody can truly claim this. What does seem a missed opportunity is the way that vendors have made their technology so very specific to particular types of master data, e.g. PIM and CDI. While operational and analytic needs are inherently different, there is no reason at all not to take a generic approach to all types of master data. Customers can hardly be expected to buy a separate hub for every type of master data.


One more TLA to remember

EIM (enterprise information management) is a recent piece of Gartner market positioning: an umbrella term for business intelligence, master data management and content management. While there is a certain inevitable “not another acronym” reaction, this particular one makes quite a lot of sense to me. Gartner have sensibly made the term explicitly cover business processes rather than just technology, so that data governance and stewardship would be part of this broad area. As the Gartner notes say, data integration is at the heart of this.

I think this is positive because the industry has taken an overly technology-centric perspective on this so far. Technologies such as ETL are necessary but not sufficient to deliver a broad-based understanding of corporate information. I have observed some forward-looking companies setting up new organizations to manage information, staffed mainly with business rather than IT people. These groups have the remit to cover the provision of data as a service to the rest of the enterprise, and so they have to worry about data quality, data warehouses, master data, integration middleware and all the processes that go along with them: indeed, this is pretty much a definition of what EIM is all about. Taking a holistic, business-led approach is the right thing to do, since providing high-quality, timely data requires a level of business ownership that cannot just be delegated to the internal IT department, or outsourced to India. The various supporting technologies need to do just that: support the business rather than being ends in themselves.

It will be interesting to see how this new terminology catches on, but I think it has legs since it seems to me to incorporate a lot of common sense.
