Andy on Enterprise Software

Master data: from jungle to garden in several not so easy steps

June 20, 2007

I very much liked a succinct article by the ever-reliable Colin White on MDM approaches. Companies still struggle to get to grips with what a roadmap for MDM is all about, faced with apparently competing (and incomplete and immature) MDM technologies and with management consultants who are only a few pages ahead of the customers in the manual. This piece neatly sets out the end goal of MDM and the various approaches to getting there (via analytic MDM or operational MDM as a start). It would have been even better had it explained in more detail how the alternatives can be run in parallel, and gone into more depth on the issues of each sequence of steps. However, by clearly separating out operational and analytic MDM and showing how these are complementary, he is already doing a significant service.

The issue he mentions with “approach 1”, i.e. the “complexity of maintaining a complete historical record of master data”, can be dealt with if you choose an analytic MDM technology which has built-in support for analysis over time. Colin points out that a key step is to end up with a low-latency master data store as the system of record for the enterprise, acting as a provider of golden copy master data to other systems, both transaction systems and analytical ones such as an enterprise data warehouse. If properly implemented, this will shift the centre of gravity of master data, from the current situation where the system of record is ERP to one where the enterprise master data repository is actually the system of record, providing data through a published interface (and an enterprise service bus) to all other systems, including ERP. This is a desirable end state, and a key step towards breaking up today’s monolithic ERP systems into more manageable components.
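To make “built-in support for analysis over time” a little more concrete, here is a minimal Python sketch, assuming a simple effective-dated design in which every change to a master data record is kept as a new version with a start date, so you can ask what a record looked like on any given day. The class and method names (MasterDataStore, record, as_of) are hypothetical and not taken from any MDM product.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MasterDataStore:
    """Hypothetical effective-dated store: each change is kept as a new version."""

    # (entity_type, key) -> list of (valid_from, attributes), sorted by valid_from
    versions: dict = field(default_factory=dict)

    def record(self, entity_type: str, key: str, valid_from: date, attributes: dict) -> None:
        """Add a new version rather than overwriting the old one."""
        history = self.versions.setdefault((entity_type, key), [])
        history.append((valid_from, dict(attributes)))
        history.sort(key=lambda version: version[0])

    def as_of(self, entity_type: str, key: str, when: date) -> Optional[dict]:
        """Return the attributes in force on `when`, or None if no version existed yet."""
        history = self.versions.get((entity_type, key), [])
        starts = [valid_from for valid_from, _ in history]
        idx = bisect_right(starts, when)
        return history[idx - 1][1] if idx else None


store = MasterDataStore()
store.record("product", "P100", date(2006, 1, 1), {"category": "Lubricants"})
store.record("product", "P100", date(2007, 3, 1), {"category": "Fuels"})
print(store.as_of("product", "P100", date(2006, 6, 30)))  # {'category': 'Lubricants'}
print(store.as_of("product", "P100", date(2007, 6, 30)))  # {'category': 'Fuels'}
```

Because old versions are never overwritten, reports can be run against the hierarchy as it stood at any point in time, which is exactly what a purely “current state” hub cannot do.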

I really hope that this paper gets the attention that it deserves. Getting most of the key messages into a two-page article is quite an achievement. I would like to see this developed further, and hopefully it will be.

The mythical software productivity miracle

January 11, 2007

We have got used to Moore’s Law, whereby hardware gets faster at a dizzying rate, though there ought to be a caveat to this pointing out that software gets less and less efficient in tandem. A neat summary of this situation is “Intel giveth, and Microsoft taketh away”. However, when it comes to software development, the situation is very different. Sure, developers have become more productive over the years. My first job as a systems programmer involved coding IBM job control language (JCL) decks, which entertainingly behaved pretty much as though they were still on punch cards, with all kinds of quirks (like cunningly ignoring you if you continued a line too far to the right, beyond a certain column). I just missed Assembler and started with PL/1, but anyone who coded IBM’s ADF will be familiar enough with Assembler. However, it is not clear how much things have really moved on since then. In the 1980s “4GLs” were all the rage, but apart from not compiling and hence being slower to run, they were scarcely much of an advance on Cobol or PL/1. Then there were “CASE” tools like James Martin’s IEF, which promised to do away with coding altogether. Well, we all know what happened to those. Experienced programmers always knew that the key to productivity was to reuse bits of code that actually worked, long before object orientation came along and made this a little easier. Good programmers always had carefully structured code libraries to call on rather than repeating similar code by editing a copy and making minor changes, so I’m not convinced that productivity raced along that much due to OO either.

This is all anecdotal though; what about hard numbers? Software productivity can be measured in lines of code produced in a given time, e.g. a day, though this measure has limitations: is more code really better (it may imply less reuse), and how do we compare different programming languages? A more objective approach is to measure the number of function points delivered per day or month. This has the big advantage of being independent of programming language, and also works for packages: you can count the number of function points in SAP (if you have the patience). Unfortunately it requires some manual counting, and so has never really caught on widely beyond some diehards who worked in project office roles (like me). We always used to reckon that 15-30 function points per man month was a pretty good average for commercial programming, and when Shell actually measured such things back in the 1990s this turned out to be pretty true, almost irrespective of whether you were using a 3GL, a 4GL, or even a package. Shell Australia measured their SAP implementations carefully and found that the function points delivered per man month were no better (indeed a little worse) than for custom code, which was an unpopular political message at the time but was inconveniently true. Hence, while 3GLs were definitely an advance on Assembler, just about every advance since then has had a fairly marginal effect, i.e. programming teams today are only slightly more productive than those of 1986. By far the most important factor for productivity was the size of the project: big projects went slowly and small projects went quickly, and that was that.

A new book “Dreaming in Code” by Scott Rosenberg is a timely reminder of why this is. Many of the issues of writing a moderately complex application are not to do with individual programmer productivity and everything to do with human issues like a clear vision, good team communication, teamwork etc. All the faster processors and latest programmer tools in the world can only optimise one part of the software development process. Sadly, the human issues are still there to haunt us, having moved on not one jot. Scott discusses the Chandler open source project and its woes, reminding us that software productivity is only a little bit about technology, and a great deal about human nature.

http://www.amazon.com/Dreaming-Code-Programmers-Transcendent-Software/dp/1400082463/sr=8-1/qid=1168361266/ref=pd_bbs_sr_1/102-6088231-4592927?ie=UTF8&s=books

When I was doing technology planning at Shell I always had a rule: if a problem was constrained by hardware then it would be fixed quicker than you expect, but if the problem was a software issue it would always take longer than you would think. This book tells you why that is not a bad rule.

In the project jungle, your MDM initiative needs claws

October 2, 2006

Matthew Beyer makes a start on trying to come up with an approach to tackling master data initiatives. Some of what he says makes good sense, as in “think strategically but act tactically”. However, I’d like to suggest a different approach to prioritisation. The biggest problem with master data is one of scale. Enterprises have a lot of systems and many types of master data, often far more than the “10 systems” used as an illustration in the article. Just one Shell subsidiary had 175 interfaces left AFTER they had implemented every module of SAP, which gives a sense of the magnitude of the problem in a large company. Hence an approach that says “just map all the master data in the enterprise and catalog which systems use each type of data” is going to be a very lengthy process, which will probably get cancelled after a few months when there is little to show beyond the pretty diagrams.

I believe that a master data initiative needs to justify itself, just like any other project that is fighting for the enterprise’s scarce resources and capital. Hence I believe that a good approach is to start by identifying problems that may be associated with master data, and putting a price tag on these problems. For example, poor customer data could result in duplicate marketing costs, lower customer satisfaction, or misplaced deliveries. An inability to get a view of supplier spend across the enterprise (as 68% of customers in one survey stated at a 2006 UK procurement conference) will have a cost in terms of not being able to get an optimal deal with suppliers, and will result in duplicate suppliers. These things have real costs associated with them, and so, if fixed, have real hard dollar benefits. Interviewing executives in marketing, procurement, finance, operations etc. will soon start to tease out which operational issues are actually causing the business pain, and which have the greatest potential value if they could be fixed. Business people may not be able to put a precise price tag on each problem, but they must be able to estimate at least a range. If they cannot, then it is probably not that pressing a problem and you can move on to the next one.

At the end of such an interview process you will have a series of business problems with estimates of potential savings, and can map this against the master data associated with these business processes.  Now you have a basis for priority.  If it turns out that there are tens of millions of dollars of savings to be gained from fixing problems with (say) supplier data, then that is a very good place to start your MDM pilot.
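As a rough sketch of that prioritisation step, the Python below assumes each interviewed problem is captured with the master data type it depends on and a low-to-high savings range. Problems nobody can put a range on are dropped, in line with the advice above, and the remaining ranges are totalled per master data type. All names and dollar figures are hypothetical, purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusinessProblem:
    description: str
    master_data_type: str          # e.g. "supplier", "customer", "product"
    low: Optional[float] = None    # lower bound of estimated annual saving, in dollars
    high: Optional[float] = None   # upper bound of estimated annual saving, in dollars


def prioritise(problems):
    """Total the savings ranges per master data type, largest midpoint first.

    Problems with no estimated range are skipped: if nobody can put even a
    rough price on it, it is probably not that pressing.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for p in problems:
        if p.low is None or p.high is None:
            continue
        totals[p.master_data_type][0] += p.low
        totals[p.master_data_type][1] += p.high
    ranked = [(dtype, low, high) for dtype, (low, high) in totals.items()]
    return sorted(ranked, key=lambda row: (row[1] + row[2]) / 2, reverse=True)


# Hypothetical interview results
problems = [
    BusinessProblem("No single view of supplier spend", "supplier", 5e6, 20e6),
    BusinessProblem("Duplicate supplier records", "supplier", 1e6, 3e6),
    BusinessProblem("Duplicate marketing mailings", "customer", 0.5e6, 2e6),
    BusinessProblem("Misplaced deliveries", "customer", 1e6, 4e6),
    BusinessProblem("Inconsistent product codes", "product"),  # no estimate offered
]
for dtype, low, high in prioritise(problems):
    print(f"{dtype}: ${low:,.0f} - ${high:,.0f}")
```

Ranking on the midpoint of each range is just one simple choice; a more cautious sponsor might rank on the low estimate instead.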

Such an approach assures you that you will be able to put a business case together for an MDM initiative, even if it has limited scope at first. Such an initiative has a lot more chance of approval and ongoing survival than something that is perceived to be a purist or IT-led data modelling initiative.

Provided that you adopt an architecture that can cope with master data in general and not just this one type specifically (i.e. try to avoid “hubs” that only address one type of master data), you can build on the early success of a pilot project, confident that the approach you have taken will be useful across the enterprise. By getting an early quick win in this way you build credibility for follow-on projects and can start to justify ongoing investment in protecting the integrity of master data in the future, e.g. by setting up a business-led information asset competence centre where ownership of data is clearly defined.

IT projects of any kind that fail to go through a rigorous cost-benefit case risk not being signed off, and then being cancelled part way through. The race for funds and resources in a large company is a Darwinian one, so equip your MDM project with the ROI teeth and claws it needs to survive and justify itself. When times turn sour and the CFO draws up a list of projects to “postpone”, a strong business-driven ROI case will go a long way to ensuring your MDM project claws its way to the top of the heap.

Some rare common sense

July 18, 2006

Ad Stam and Mark Allin have written an excellent piece in DM Review this month covering data stewardship and master data management. They correctly point out that, with regard to business intelligence systems, “change will occur, and time to react will decrease”, and lay out a sensible architecture for dealing with this issue. I very much like the way they put emphasis on the need for a business unit to deal with data governance as a key building block. In the article they explain the key requirements of such a group and draw an interesting analogy with logistics, which these days is usually outsourced to a separate unit or even a separate company. Similarly, they believe that master data should be managed by a non-IT business unit.

The article also correctly distinguishes between “golden copy” data held in the data warehouse and a master data management repository, which will additionally hold master data in all its stages. The master data repository should be linked to the data warehouse, but is not the same physical entity, since the master data repository has to handle “unclean” data whereas the data warehouse should have only fully validated data stored in it.
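A minimal sketch of that separation, assuming a hypothetical two-rule validation step: the repository keeps every record whatever its state, flagging failures for a data steward, while only records that pass validation are loaded into the warehouse. The rule set, status names and functions here are illustrative assumptions, not taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    key: str
    attributes: dict
    status: str = "new"        # hypothetical lifecycle: new -> validated / needs_review
    issues: list = field(default_factory=list)


# Hypothetical validation rules: each returns an error message, or None if the check passes.
RULES = [
    lambda attrs: None if attrs.get("name") else "missing name",
    lambda attrs: None if attrs.get("country") else "missing country code",
]


def validate(record: MasterRecord) -> MasterRecord:
    """Run every rule; failures stay in the repository, flagged for a data steward."""
    results = (rule(record.attributes) for rule in RULES)
    record.issues = [message for message in results if message]
    record.status = "needs_review" if record.issues else "validated"
    return record


def load_warehouse(repository: list) -> dict:
    """The repository holds everything; only fully validated records reach the warehouse."""
    return {r.key: r.attributes for r in repository if r.status == "validated"}


repository = [
    validate(MasterRecord("C1", {"name": "Acme Ltd", "country": "GB"})),
    validate(MasterRecord("C2", {"name": "", "country": "NL"})),
]
warehouse = load_warehouse(repository)
```

Here the record with a blank name stays in the repository with its issue listed, but never reaches the warehouse until a steward fixes it and it is re-validated.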

It is a pleasant change to read such a sensible article on best practice in data management, but this is because Ad and Mark are real practitioners in serious enterprise-wide projects through their work at Atos Origin, e.g. at customers like Philips. They are not people who spend their lives giving slick PowerPoint presentations at conferences but are close to the action in real-world implementations. I worry that there are too many people on the conference circuit who are eloquent speakers but haven’t actually seen a real live project for a long time. I have known Ad Stam for many years and can testify that his team at Atos are an extremely competent and experienced set of practitioners who focus on customer delivery rather than self-publicity. If you have a data warehouse or MDM project then you could do a lot worse than use Ad’s team.

Don’t prototype: iterate

July 6, 2006

Stephen Swoyer makes a case for using EII as a prototype for data warehouses in his recent Enterprise Systems article. As the article reflects, there are dangers as well as benefits here, e.g. the prototype may just be extended and a proper data warehouse never built. This is a problem because, as I have argued elsewhere, EII is suitable only for a small subset of business intelligence needs. However, the valid point is that business users do want to see prototypes, especially in an area like business intelligence where requirements tend to be fluid and ill-defined. But there is an alternative to buying an EII tool, knocking up a prototype and then building the proper warehouse.

These days you do not have to build a data warehouse, since you can buy one. Packaged solutions can be deployed much more rapidly than data warehouses that have to be designed and built by hand, and if they are flexible enough then an iterative approach to the warehouse can be taken. A great example of this was at Owens Corning, who deployed a data warehouse package over a 90 day period, using weekly builds of the business model. Each week a new piece of the business problem was tackled, the business model was updated in the package and the results presented back to the users. Issues would arise, feedback taken, and the next week the model would be refined, and a new business area started. This highly iterative approach ensured that the business users were fully engaged with the project, and could see a visible improvement in what they were going to get week by week.

After a few weeks the problems became less technical and functional, and more business related e.g. issues of data quality and how certain business model issues were to be resolved. After 90 days the application was delivered, and this was no prototype: it was a fully working, deployed production warehouse. The insights this application gave saved Owens Corning a great deal of money, so much so that the project had a three week payback period. Indeed the project became an Infoworld 100 winner.

Data warehouse project leaders need to rid themselves of the notion that data warehouses have to be long, custom-build projects. At present TDWI reckons they take 16 months to deliver on average. That is far too long to wait under a traditional waterfall methodology, and calls for a more iterative approach. But why build a throwaway prototype when you can implement the real thing via a data warehouse package?

Mergers and Measurement

June 26, 2006

Margaret Harvey points out in a recent article that the effort of integrating the IT systems of two merged companies can be a major constraint and affect the success of the merger. Certainly this is an area that is often neglected in the heat of the deal. But once the investment bankers have collected their fees and an acquisition or merger is done, what is the best approach to integrating IT systems? What is often missed is that, in addition to different systems e.g. one company might use SAP for ERP and the other Oracle, the immediate problem is that the two companies will have completely different coding systems and terminology for everything, from the chart of accounts, through to product and asset hierarchies, customer segmentation, procurement supplier structures and even HR classifications. Even if you have many systems from the same vendor, this will not help you much given that all the business rules and definitions will be different in the two systems.

To begin with, the priority should be to understand business performance across the combined new entity, and this does not necessarily involve ripping out half the operational systems. When HBOS did their merger, both Halifax and Bank of Scotland had the same procurement system, but it was soon discovered that this helped little in taking a single view of suppliers across the new group, given the different classification of suppliers in each system. Converting all the data from one system into the other was estimated to take well over a year, so instead they put in a data warehouse system which mapped the two supplier hierarchies together, enabling a single view to be taken even though the two underlying systems were still in place. This system was deployed in just three months, giving an immediate view of combined procurement and enabling large savings to be made rapidly. A similar approach was taken when Shell bought Pennzoil, and when Intelsat bought Loral.
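The core of this approach is a mapping layer rather than a data conversion. The sketch below assumes a hypothetical cross-reference table from each source system’s supplier codes to a group-wide supplier; spend from both systems is then summed against the common supplier, while unmapped codes are listed for someone to resolve rather than being guessed at. The system names, codes and amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cross-reference table: (source system, local supplier code) -> group supplier.
SUPPLIER_XREF = {
    ("system_a", "A-0042"): "Acme Stationery",
    ("system_a", "A-0107"): "Global Couriers",
    ("system_b", "ACME/01"): "Acme Stationery",
    ("system_b", "GLC"): "Global Couriers",
}


def combined_spend(transactions):
    """Sum spend per group supplier across both systems, leaving the sources untouched.

    Codes missing from the cross-reference are returned for a person to map,
    rather than being guessed at.
    """
    totals = defaultdict(float)
    unmapped = []
    for system, supplier_code, amount in transactions:
        group_supplier = SUPPLIER_XREF.get((system, supplier_code))
        if group_supplier is None:
            unmapped.append((system, supplier_code))
        else:
            totals[group_supplier] += amount
    return dict(totals), unmapped


spend, unmapped = combined_spend([
    ("system_a", "A-0042", 1200.0),
    ("system_b", "ACME/01", 800.0),
    ("system_b", "GLC", 300.0),
    ("system_a", "A-9999", 50.0),
])
```

Keying the table on (system, code) means the two source systems never have to agree on their codes; only the mapping has to be maintained.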

It makes sense initially to follow this approach so that a picture of operating performance can quickly be built, but at some point you will want to rationalize the operational systems of the two companies, in order to reduce support costs and eliminate duplicated skill sets. It would be helpful to draw up an asset register of the IT systems of the two companies, but just listing the names and broad functional areas of the systems is only of limited use. You also need to know the depth of coverage of the systems, and the likely cost of replacement. Clearly, each company may have some systems in much better shape than others, so unless it is a case of a whale swallowing a minnow, it is likely that some selection of systems from both sides will be in order. To have a stab at estimating replacement costs, you could use a fairly old but useful technique to estimate application size: function points.

Function points are a measure of system “size” that does not depend on knowing about the underlying technology used to build the system, so they apply equally to packages and custom-built systems. Once you know that a system is, say, 2000 function points in size, there are well-established metrics on how much effort it takes to replace such a system, e.g. for transaction systems a ballpark figure of 25-30 function points per man month can be delivered, and this does not really seem to change much whether it is a package or in-house. Hence a 2000 function point transaction system will take about 80 man-months to build or implement, as a first-pass estimate. MIS systems are less demanding technically than transaction systems (as they are generally read-only) and better productivity figures can be achieved there. These industry averages turned out to be about right when I was involved in a metrics program at Shell in the mid 1990s. At that time a number of Shell companies counted function points and discovered productivity of around 15-30 function points per man month delivered for medium-sized transaction systems, irrespective of whether these were in-house systems or packages. Larger projects had lower productivity and smaller projects had higher productivity, so delivering a 20,000 function point system will be a lot worse than a 2,000 function point system in terms of productivity, i.e. fewer function points per man month will be delivered on the larger system. Counting function points in full is tedious, and this is the single factor that has relegated the technique to something of a geek niche, yet there are short-cut estimating techniques that are fairly accurate and vastly quicker than counting in full. By using these short-cut techniques a broadly accurate picture of an application inventory can be pulled together quite quickly, and this should be good enough for a first-pass estimate.
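The arithmetic above is simple enough to sketch. The Python below assumes the 25-30 function points per man month range quoted for transaction systems; the MIS range is a placeholder assumption, since the text only says MIS productivity is better, and the lower rate used for the 20,000 function point example is likewise illustrative rather than measured.

```python
from dataclasses import dataclass

# Function points per man month as (low, high). The transaction range is the 25-30
# ballpark quoted above; the MIS range is an assumed placeholder.
PRODUCTIVITY = {"transaction": (25, 30), "mis": (35, 45)}


@dataclass
class Application:
    name: str
    function_points: int
    kind: str  # "transaction" or "mis"


def replacement_effort(app, productivity=None):
    """Return (best case, worst case) man-months to rebuild or re-implement `app`.

    Pass a lower `productivity` range for very large systems, since big
    projects deliver fewer function points per man month.
    """
    low_rate, high_rate = productivity or PRODUCTIVITY[app.kind]
    return app.function_points / high_rate, app.function_points / low_rate


def inventory_effort(apps, overrides=None):
    """Total best/worst case man-months across an application inventory."""
    overrides = overrides or {}
    best = worst = 0.0
    for app in apps:
        b, w = replacement_effort(app, overrides.get(app.name))
        best += b
        worst += w
    return best, worst


billing = Application("Billing", 2000, "transaction")
print(replacement_effort(billing))  # about 66.7 to 80 man-months

# Hypothetical lower rate for a much larger system
erp = Application("ERP", 20000, "transaction")
print(replacement_effort(erp, productivity=(15, 20)))
```

Returning a range rather than a single number is deliberate: at inventory stage the honest answer is a band, and it stops anyone on either side of the merger treating a first-pass estimate as a quote.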

There are a host of good books that discuss project metrics and productivity factors which you can read for more detailed guidance. The point here is that by constructing an inventory of the IT applications of both companies involved in a merger you can get a better feel for the likely cost of replacing those systems, and hence make a business case for doing this. In this way you can take a structured approach to deciding which systems to retire, and avoid the two parties on either side of the merger just defending their own systems without regard to functionality or cost of replacement. Knowing the true costs of systems integration should be part of merger due diligence.

Further reading:

Software Engineering Economics
Controlling Software Projects
Function Points

Size still isn’t everything

June 7, 2006

Madan Sheina, who is one of the smarter analysts out there, has written an excellent piece in Computer Business Review on an old hobby horse of mine: data warehouses that are unnecessarily large. I won’t rehash the arguments made in the article (in which Madan is kind enough to quote me) as you can read it for yourself, but you can be sure that bigger is not necessarily better when it comes to making sense of your business performance: indeed the opposite is usually true.

Giant data warehouses certainly benefit storage vendors, hardware vendors, consultants who build and tune them, and DBAs, who love to discuss their largest database as if it were a proxy for their, er, masculinity (apologies to those female DBAs out there, but you know what I mean; it is good for your resume to have worked on very large databases). The trouble is that high volumes of data make it harder to analyse data quickly in a meaningful way, and in most cases this sort of data warehouse elephantiasis can be avoided by careful consideration of the use cases, probably saving a lot of money to boot. Of course that would involve IT people actually talking to the business users, so I won’t be holding my breath for this more thoughtful approach to take off as a trend. Well done Madan for another thoughtful article.

The patter of tiny pitfalls

June 5, 2006

There are some sensible tips from Jane Griffin on MDM pitfalls in a recent article. As she points out, improving your master data is a journey, not a destination, so it makes sense to avoid trying to boil the ocean and instead concentrate on a few high-priority areas, perhaps in one or two business units. It would make sense to me to start by identifying areas where MDM problems were causing the most operational difficulties, e.g. misplaced orders. By starting where there is a real problem you will have less difficulty in getting business buy-in to the initiative. Be clear that there are lots of different types of master data, e.g. we are involved with a project at BP which manages 350 different master data types, and clearly some of these will be more pressing than others.

I have seen some articles where people are struggling to justify an MDM initiative, yet such initiatives should really be much easier to justify than many IT projects. For a start, IT people can put the issues in business terms. Master data problems cause very real, practical issues that cost money. For example, poor or duplicated customer data can increase failed deliveries and cause problems with invoicing. Poor product data can result in duplicated marketing costs, and in some cases even cause health and safety issues. Problems with chart of accounts data can delay closing the books. These all have a cost, and so the benefit of fixing them can be given a dollar value.

Successful MDM projects will be heavily business-led, driven by the need to improve operational performance. IT staff need to educate business people that there is now an emerging set of solutions that can help, and get those business people involved in owning the data. It is the lack of data governance in many companies that contributed to the poor state of master data in the first place.

The weakest data link

May 19, 2006

There is a thoughtful article in McKinsey Quarterly on managing supply chains. It highlights the problem that even if you have perfectly consistent and accessible information in your own company, in many industries (e.g. mobile phones) there is a web of separate companies between the designer and the customer, e.g.:

components supplier -> distributor -> ODM -> OEM -> distributor -> customer

Each of these is dependent to some extent on the others, and so if you want to know how your sales are going or how product quality is holding up, you will need to work with information from other companies further back in the chain. This presents the problem that the systems in other companies will not use the same terminology and coding structures as yours, meaning that you will need to resolve these differences in some way, e.g. through a data warehouse project. The article points out that in many cases companies have not built these links and so have no visibility up and down the supply chain. This information is not just nice to have:

“Bridging these gaps pays off. In one case, a leading enterprise-computing company started gathering better data from field services, which gave it information on the incidence of failures and their costs. By feeding that data to design teams, the company developed products that could be serviced and repaired more easily. The result: total costs over the product life cycle fell by 10 to 20 percent.”

Clearly such savings are worth having. The article is an excellent illustration that the issues of dealing with multiple semantics are not confined to internal systems, and indeed in such cases standardization is literally unattainable. Instead software solutions are required that can map multiple business structures together and make sense of them. Companies that invest in such data warehouse solutions are, as this article shows, getting very tangible results.

CDI compared to other master data

May 2, 2006

There is a good article on CDI by Jill Dyche, a co-founder of Baseline Consulting and someone who has clearly seen a lot of real-world CDI projects. She does a good job of explaining how CDI projects have traditionally been quite transaction-oriented, with hubs serving up customer data via middleware to other applications. CDI hubs are at one end of the MDM spectrum, firmly at the “operational” level. At the other end are “analytic” MDM applications, which enable companies to take a cross-enterprise view of key information like assets, people, products, channels etc. Getting to understand the differences between the multiple, conflicting definitions embedded in the source systems is a major job in itself, and will usually result in a master data repository. This in turn can be a feed into a corporate warehouse. A few pioneering companies have taken the final logical step and hooked up their master data repositories, via middleware like Tibco or IBM Websphere, to their operational systems, so that the master data repository becomes the true master source, driving changes as required back down into the operational systems like ERP and CRM.

CDI hubs have started at the other end, linking up to systems providing customer data, often in real time. Customer data represents a high-value area of MDM: in the case of consumers the customer data is often quite simple but high in volume, and requires fairly simple processing to match a customer record in one system to one in another (e.g. matching “A. Hayler” v “Andy Hayler”). However, this is only part of the answer, as even in the case of “customer” things can get more complex. Suppose you are a company like Shell and you want to treat Unilever as a key global account. Finding out all the information about Unilever is not just a simple keyword matching exercise, since Unilever trades under many different subsidiary names and brands around the world, e.g. its main Indian subsidiary is not called Unilever but Hindustan Lever; it also owns a company called Algida, and I defy even the cleverest fuzzy logic algorithm to associate “Algida” with “Unilever” (such examples are why you should always be sceptical about vendors selling matching algorithms). For more complex situations like this, human intervention is required in order to correctly add up all the elements of Unilever’s business.
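To illustrate why the easy cases are easy and the hard ones need people, here is a deliberately crude Python sketch: a heuristic that treats “A. Hayler” and “Andy Hayler” as a probable match (same surname, compatible first name or initial), alongside a hand-maintained table of corporate parents, which is the only way something like “Algida” gets rolled up to Unilever. The function names and the matching rule are hypothetical, not any vendor’s algorithm.

```python
import re

# Hand-maintained by data stewards: no string-similarity rule could link these names.
CORPORATE_PARENT = {
    "hindustan lever": "Unilever",
    "algida": "Unilever",
}


def _tokens(name: str) -> list:
    """Lower-case a name and split it on spaces and full stops."""
    return [t for t in re.split(r"[\s.]+", name.lower()) if t]


def names_may_match(a: str, b: str) -> bool:
    """Crude rule: same surname, and first names equal or one is the other's initial."""
    ta, tb = _tokens(a), _tokens(b)
    if len(ta) < 2 or len(tb) < 2 or ta[-1] != tb[-1]:
        return False
    first_a, first_b = ta[0], tb[0]
    if first_a == first_b:
        return True
    if len(first_a) == 1:
        return first_b.startswith(first_a)
    if len(first_b) == 1:
        return first_a.startswith(first_b)
    return False


def global_account(trading_name: str) -> str:
    """Roll a trading name up to its corporate parent, if a steward has recorded one."""
    return CORPORATE_PARENT.get(trading_name.strip().lower(), trading_name)
```

Note that the heuristic would happily match two different people called A. Hayler, which is one more reason why matching results still need human review.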

This issue can become considerably more complex with things like “asset” or “product”, which can have a whole hierarchy of sub-types. This is why CDI hub technology tends to be used specifically for consumer information. Other types of MDM technology are required to manage more complex data and the workflows that surround updating it, e.g. no automated system is going to just create a new brand; this requires numerous approvals and has various knock-on effects on other master data.

I would argue that, at least at present, you are likely to require one kind of technology to handle general-purpose MDM data, whether customer or asset or whatever, from an analytical viewpoint, and potentially a separate technology to handle operational updates, often in real time. Of course it would be nice if a single product did everything, but at present nobody can truly claim this. What does seem a missed opportunity is the way that vendors have made their technology so very specific to particular types of master data, e.g. PIM and CDI. While operational and analytic needs are inherently different, there is no reason at all not to take a generic approach to all types of master data. Customers can hardly be expected to buy a separate hub for every type of master data.
