A generally good article by Colin Beasty about CDI shows a common misconception about data warehousing. The article rightly points out that CRM (via Siebel, etc.) essentially failed to resolve the “single version of the truth” about customers, with apparently 20 to 40 systems in a large company holding customer data (this sounds plausible, though he doesn’t cite a source for it). However, he says that data warehouses can’t address this since “data integrity and validity are optional”. Here he seems to be mixing up an operational data store with a data warehouse, or at least with a good data warehouse. An operational data store might well be a dump of data straight from a transaction system, with no work done on the data (purely for performance reasons), but a data warehouse should definitely not be. A data warehouse is supposed to pull together data from multiple systems and provide a single, consistent view across the enterprise. It cannot do that without a validation stage that rejects data inconsistent with the company’s business rules; otherwise it is a case of “garbage in, garbage out”.
Now certainly, if your source of customer data is a well-implemented CDI hub rather than several sources (an ERP system, a CRM system, etc.), then the CDI hub has essentially carried out the validation and resolution stage already, i.e. it is acting as a single system of record for customer data. However, the warehouse cannot relax, since it also has to deal with all the other kinds of transaction and master data.
Indeed, I would argue that a hub-based approach carries some dangers. If you implement a CDI hub, then do the same for product using a PIM solution, you will soon realise that you need further hubs for employee, asset and so on. CDI hub technology typically does not handle other types of master data, as it is hard-coded around one (important) class of master data: customer.
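To make the warehouse validation stage described above a little more concrete, here is a minimal Python sketch. It assumes a simplified customer record and three illustrative business rules (the record fields, rule names and country list are all hypothetical, not taken from any particular warehouse product). Each incoming record is checked against every rule; records that break any rule are rejected along with the reasons, rather than being loaded.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class CustomerRecord:
    source_system: str  # e.g. "ERP" or "CRM" (illustrative)
    customer_id: str
    country_code: str   # ISO 3166-1 alpha-2 code
    credit_limit: float


# A rule returns an error message if the record breaks it, or None if it passes.
Rule = Callable[[CustomerRecord], Optional[str]]

KNOWN_COUNTRIES = {"GB", "US", "DE", "FR"}  # hypothetical reference data


def has_id(r: CustomerRecord) -> Optional[str]:
    return None if r.customer_id.strip() else "missing customer_id"


def known_country(r: CustomerRecord) -> Optional[str]:
    return None if r.country_code in KNOWN_COUNTRIES else f"unknown country {r.country_code!r}"


def non_negative_credit(r: CustomerRecord) -> Optional[str]:
    return None if r.credit_limit >= 0 else "negative credit_limit"


# Hypothetical business rules; a real warehouse would hold many more.
BUSINESS_RULES: List[Rule] = [has_id, known_country, non_negative_credit]


def validate(
    records: List[CustomerRecord], rules: List[Rule]
) -> Tuple[List[CustomerRecord], List[Tuple[CustomerRecord, List[str]]]]:
    """Split records into those passing every rule and those rejected, with reasons."""
    accepted: List[CustomerRecord] = []
    rejected: List[Tuple[CustomerRecord, List[str]]] = []
    for record in records:
        # Run every rule so a rejected record reports all of its problems at once.
        errors = [e for e in (rule(record) for rule in rules) if e is not None]
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected


# Usage: one clean record and two that break the rules.
batch = [
    CustomerRecord("ERP", "C001", "GB", 5000.0),
    CustomerRecord("CRM", "", "GB", 1000.0),
    CustomerRecord("CRM", "C003", "XX", -10.0),
]
accepted, rejected = validate(batch, BUSINESS_RULES)
for record, reasons in rejected:
    print(record.source_system, record.customer_id or "<blank>", reasons)
```

Collecting every failure per record, rather than stopping at the first, is a design choice that makes the rejected records easier to send back to the owners of the source systems for correction.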
The article acknowledges that CDI is a subset of MDM, but does not draw attention to the danger of implementing hubs piecemeal, one data type at a time. What is needed is a master data repository that can act as a system of record for all types of master data, itself feeding both data warehouses and other systems (possibly via SOA, as the article mentions, though that is essentially optional). Without this realisation we risk creating yet another set of master data sources without ever getting to the heart of the issue. You can have multiple hubs, but somewhere you need a single repository that at least knows where every version of master data lives in the enterprise, whether in hubs, ERP or elsewhere. Better still if that MDM repository can act as an active provider of master data to other systems, since it will hold the enterprise-wide business rules needed to ensure data quality, which systems closer to operational processes may lack. Without a fully integrated approach to master data we are just adding unnecessary duplicate sources of master data (since these data are, after all, not going away in the ERP systems). Somewhere a true “master of master data” needs to exist, owned by business people with the authority to resolve inter-departmental disputes over master data (and not just customer data). Otherwise we are just adding another layer to the spaghetti.
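As a rough illustration of the minimum that a repository which “at least knows where every version of master data is” has to do, here is a Python sketch of a registry recording, for any entity type, which systems hold a copy of each master record. The class, method and identifier names are hypothetical and not taken from any MDM product; a real repository would also hold the enterprise-wide business rules and data-quality logic, which are omitted here. Entity types are plain strings, so customer, product, employee and asset are all handled the same way rather than being hard-coded.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class SourceRef:
    system: str    # e.g. "ERP", "CRM", "CDI hub", "PIM" (illustrative)
    local_id: str  # the key that system uses for its copy of the record


class MasterDataRegistry:
    """Hypothetical registry of where every copy of each master record lives."""

    def __init__(self) -> None:
        # (entity_type, master_id) -> every known copy in operational systems
        self._copies: Dict[Tuple[str, str], Set[SourceRef]] = defaultdict(set)
        # (entity_type, system, local_id) -> master_id, for reverse lookups
        self._master_of: Dict[Tuple[str, str, str], str] = {}

    def register(self, entity_type: str, master_id: str, system: str, local_id: str) -> None:
        key = (entity_type, system, local_id)
        existing = self._master_of.get(key)
        if existing is not None and existing != master_id:
            # Two master records claim the same source copy: a dispute for the data owners.
            raise ValueError(f"{entity_type} {system}:{local_id} already mapped to {existing}")
        self._copies[(entity_type, master_id)].add(SourceRef(system, local_id))
        self._master_of[key] = master_id

    def locations(self, entity_type: str, master_id: str) -> Set[SourceRef]:
        # .get() on a defaultdict does not create an empty entry for unknown keys.
        return set(self._copies.get((entity_type, master_id), set()))

    def resolve(self, entity_type: str, system: str, local_id: str) -> Optional[str]:
        return self._master_of.get((entity_type, system, local_id))


# Usage: one customer held in ERP and CRM, one product held in PIM and ERP.
registry = MasterDataRegistry()
registry.register("customer", "CUST-1", "ERP", "100234")
registry.register("customer", "CUST-1", "CRM", "A-77")
registry.register("product", "PROD-9", "PIM", "SKU-5")
registry.register("product", "PROD-9", "ERP", "100234")  # same ERP key, different entity type
```

Conflicting mappings raise an error rather than being silently overwritten, because deciding which master record a disputed copy belongs to is exactly the kind of call that the business owners of master data have to make.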
