There is a good article on CDI by Jill Dyche, a co-founder of Baseline Consulting and someone who has clearly seen a lot of real-world CDI projects. She does a good job of explaining how CDI projects have traditionally been quite transaction-oriented, with hubs serving up customer data via middleware to other applications. CDI hubs are at one end of the MDM spectrum, firmly at the “operational” level. At the other end are “analytic” MDM applications, which enable companies to take a cross-enterprise view of key information like assets, people, products, channels etc. Understanding the differences between the multiple, conflicting definitions embedded in the source systems is a major job in itself, and will usually result in a master data repository. This in turn can be a feed into a corporate warehouse. A few pioneering companies have taken the final logical step and hooked up their master data repositories, via middleware like Tibco or IBM WebSphere, to their operational systems, so that the master data repository becomes the true master source, driving changes as required back down into operational systems like ERP and CRM.
CDI hubs have started at the other end, linking up to systems providing customer data, often in real-time. Customer data represents a high-value area of MDM: in the case of consumers, the data is often quite simple but high in volume, and requires only fairly simple processing to match a customer record in one system to one in another (e.g. matching “A. Hayler” v “Andy Hayler”). However, this is only part of the answer, as even in the case of “customer” things can get more complex. Suppose you are a company like Shell and you want to treat Unilever as a key global account. Finding out all the information about Unilever is not just a simple keyword matching exercise, since Unilever trades under many different subsidiary names and brands around the world. For example, its main Indian subsidiary is not called Unilever but Hindustan Lever; it also owns a company called Algida, and I defy even the cleverest fuzzy logic algorithm to associate “Algida” with “Unilever” (such examples are why you should always be sceptical about vendors selling matching algorithms). For more complex situations like this, human intervention is required in order to correctly add up all the elements of Unilever’s business.
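To make the contrast concrete, here is a minimal Python sketch, not any vendor's actual algorithm. It assumes a deliberately naive initial-aware rule for consumer names, uses the standard library's difflib as a stand-in for a "fuzzy" matcher, and models the human intervention as a hand-maintained lookup table. The function names and the table are hypothetical; the only entries are the two subsidiaries mentioned above.

```python
from difflib import SequenceMatcher
from typing import Optional


def names_match(a: str, b: str) -> bool:
    """Naive consumer match: same surname, and first names equal
    or one of them is just the initial of the other."""
    def parts(name: str):
        tokens = name.replace(".", " ").lower().split()
        return tokens[0], tokens[-1]

    first_a, last_a = parts(a)
    first_b, last_b = parts(b)
    if last_a != last_b:
        return False
    one_is_initial = 1 in (len(first_a), len(first_b))
    return first_a == first_b or (first_a[0] == first_b[0] and one_is_initial)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for fuzzy matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical, hand-curated mapping from trading name to parent account.
# This is the part that string similarity cannot produce on its own.
PARENT_ACCOUNT = {
    "hindustan lever": "Unilever",
    "algida": "Unilever",
}


def resolve_account(trading_name: str) -> Optional[str]:
    """Return the curated parent account, or None if a person must decide."""
    return PARENT_ACCOUNT.get(trading_name.strip().lower())


if __name__ == "__main__":
    print(names_match("A. Hayler", "Andy Hayler"))
    print(round(similarity("Algida", "Unilever"), 2))
    print(resolve_account("Algida"))
```

The simple rule copes with the consumer case, while the similarity score for “Algida” against “Unilever” is low enough that no sensible threshold would link them; only the curated table does. A name missing from the table returns None, which in practice would mean routing the record to a person for review.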
This issue can become considerably more complex with things like “asset” or “product”, which can have a whole hierarchy of sub-types. This is why CDI hub technology tends to be used specifically for consumer information. Other types of MDM technology are required to manage more complex data and the workflows that surround updating it. For example, no automated system is going to just create a new brand; this requires numerous approvals and has various knock-on effects on other master data.
I would argue that, at least at present, you are likely to require one kind of technology to handle general purpose MDM data, whether customer or asset or whatever, from an analytical viewpoint, and potentially a separate technology to handle operational updates, perhaps in real time. Of course it would be nice if a single product did everything, but at present nobody can truly claim this. What does seem a missed opportunity is the way that vendors have made their technology so very specific to particular types of master data, e.g. PIM and CDI. While operational and analytic needs are inherently different, there is no reason at all not to take a generic approach to all types of master data. Customers can hardly be expected to buy a separate hub for every type of master data.
