Andy on Enterprise Software

Never mind the quality, feel the width

December 7, 2007

Frank Buytendijk (ex Gartner analyst, now with Oracle) makes an important point about data quality on his blog: it is inherently dull. This in itself causes problems both for people within organisations who care about data quality (there must be a few of you out there) and for data quality vendors, who struggle to sell their products at a decent price point in sufficient numbers. I have written about this before, pointing out just a couple of real life cases of poor data quality that I have personally encountered, each of which cost many millions of dollars.

The reason that data quality is generally excellent in the area of salary and expense processing is that people care deeply about what they get paid, and you can be pretty sure that any clerical errors get spotted and complained about very quickly. However in most cases poor data quality occurs due to people being asked to enter or maintain data for which they see no personal or even obvious company benefit. Data that is useful for “some other department” is never going to receive the same care and attention that your own personal expense claims get.

As Frank says, in order to move data quality higher up the enterprise priority list, it needs to widen its perspective: move beyond talking about customer names and addresses. Yes, this is important if you are doing mailshots, and certainly poor customer name and address management can have more serious consequences, but most executives have got better things to do than worry about whether their mailshots are being duplicated.

Despite numerous acquisitions over the years (First Logic, Similarity, Vality, …) there are still plenty of small data quality vendors out there, some with very interesting technology. Yet aside from Trillium, few have managed to get even into double figures of millions of revenue. This is not due to an absence of a real problem to address.

Some data quality vendors rightly see master data management as a way of repositioning their offerings in a more fashionable area, but they need to realise that data quality is just a feature of a complete MDM solution. Hence they need to partner with broader-based MDM repository vendors, who themselves often lack proper data quality technology, rather than pretending they themselves are a complete solution. They should also do a better job of highlighting quantified customer dollar benefits achieved from the use of data quality technology. This should not be hard to do since data quality projects usually have excellent payback. Yet time after time the examples used in data quality collateral are the tired name and address cleanup, followed by an esoteric discussion about whether probabilistic or deterministic matching is better (paying customers don’t care - they are interested in what benefits they see). Far too few data quality case studies mention hard-dollar benefits to the customer.

Data quality should have much going for it: it is a very real problem, the condition of data quality in most large organisations is horrible (and far worse than generally realised), and the costs of this are significant and cause genuine and in some cases very serious operational problems. Yet the industry as a whole has done a poor job of explaining itself to the people with the cheque books in enterprises.

The dust settles

November 29, 2007

I had a chance recently to dig a little deeper into the recent acquisition of Purisma by Dun & Bradstreet. The way that this news leaked out was a case study in how not to do software PR. The news came out in an investor briefing by D&B, and there were no clues as to whether D&B was even going to continue selling the Purisma technology, or just use it for internal purposes. After all, D&B has daunting master data issues. 150 million companies are tracked, one and a half million updates a day made to the company information it sells: plenty of data management implications there. So what did this mean to Purisma customers and prospects? No clues were offered.

Having now spoken to Bob Hagenau, who was VP of products and co-founder of Purisma, the smoke has cleared a little. Purisma will be retained as a stand-alone business unit, with its own enterprise sales force. The Purisma technology will continue to be sold in its present form, though it is too early to say what the technology roadmap will look like; I am going to take a wild stab in the dark and bet that further integration with D&B data will feature. Clearly the D&B name brings many benefits: a parent with deep pockets, and a customer base that is essentially all large corporations and so a potentially wonderful leads channel. However the botched news release shows the dark side of a large parent in a different core industry: falling foul of the corporate bureaucracy, in this case the corporate press office.

Hopefully, as the acquisition beds down, Purisma will learn how to work the D&B corporate systems and avoid future press gaffes, while taking advantage of the undoubted resources that D&B can bring to bear.

Blowing Bubbles

November 15, 2007

Back in the late 1990s companies filed for IPOs even though they had modest revenues and were losing money. Due to the tulip mentality of the time investors suspended disbelief and bought in anyway, giving way to the crash of 2001. A couple of years after that bankers were telling me that in order to have an IPO you would need “at least a couple of years of solid trading profits”, quarterly revenues of at least $25 million and preferably more, as well as strong growth. Those heady days of the late 1990s were a freak occurrence, like the South Sea Bubble. Certainly technology IPOs dried up almost entirely.

With the recent gloom on Wall Street I was therefore surprised to see Initiate Systems filing for an IPO. They are growing quite rapidly, but not only have they never made a cent of profit, their losses appear to be, if anything, widening slightly at about a third of their revenues. Throw in an admitted financial misstatement and does this start to feel to you like the late 1990s again? No doubt Initiate is expertly and expensively advised, but this will certainly be one to watch: if the IPO goes ahead and goes well, it will change perceptions of exit strategies for high tech companies.

Pure and chased

November 7, 2007

Purisma has been acquired by Dun & Bradstreet, the business information company that provides, amongst other things, assessment of credit risk of companies and company statistics. On the face of it this is a somewhat peculiar acquisition, since D&B is not a pure provider of enterprise software solutions in the way that, say, Oracle is. However D&B did have its own data quality offering (clearly data quality is a big issue for an information supplier) and Purisma’s customer hub technology is certainly complementary to this data quality offering. It seems possible that D&B has bought Purisma primarily for its own internal purposes, and at this point it is unclear whether Purisma will even continue to be sold as a product in its current form. Rather ironically, Purisma had a product offering allowing integration of D&B into its CDI application. I guess that will come in handy now.

Purisma does not publish public financial data, so it is tricky to tell how good or bad the price paid of USD 48 million for the company was. I believe that Purisma had less than 50 employees and I would speculate that its revenues were in the USD 15-20M range. In general it is known that stand-alone CDI and PIM players have been struggling somewhat in the market. This is in part due to a gradual dawning on customers that master data management is a broader topic than just “customer” or “product”, a long term theme of this blog. When customers ask “ah, but what about other kinds of master data” (asset, location, employee etc) then specialist CDI and PIM vendors do not have good answers, however good their offerings in their particular domains are. Even IBM has done an about turn on this topic recently, laying out a roadmap for a single MDM Server that will eventually bring together its menagerie of acquired technologies into a platform that will handle multiple master data domains consistently. For this reason I suspect that D&B did not pay over the odds for Purisma.

D&B has had phases in the past of buying software companies, and then moving away from this business e.g. those with long memories will recall the 4GL Nomad, which it sold off after some years. The press release that is tucked away on the Purisma web site today is not giving anything away. If press releases played poker, this one would be a tough player. Purisma customers need to seek guidance from D&B about its future intentions, and consider their alternatives.

Common sense starts to prevail

October 30, 2007

Regular readers of this blog are probably tired of hearing me advocate that MDM vendors need to move beyond single domain solutions (CDI, PIM) into solutions that can cater for a wide range of master data types. I have spoken at a number of the very useful CDI/MDM Institute (previously CDI Institute) conferences organised by Aaron Zornes, which are pretty much the only MDM conferences out there, and initially (as indicated by its earlier name) Aaron seemed fairly sceptical about this message. It is therefore encouraging to see him starting to lean this way in an article in DM Review. In the article he bases this view on multiple conversations with people responsible for MDM at large enterprises.

This is quite right; perhaps I had this view initially because I used to be a technology strategist at Shell and so was trained to think this way, but it has always seemed blindingly obvious to me that single domain solutions are at best a sticking plaster when it comes to MDM. There are simply too many classes of master data to contemplate fragmenting MDM solutions by domain, each to a potentially different vendor. Large companies don’t like dealing with more vendors than they have to, and common sense tells you that it is easier to get economies of scale in terms of skill sets, never mind software licenses, by using technology that is capable of dealing with all kinds of master data in the same way. Personally I would be cautious about vendors who bolt on wider domain capability to existing technologies that were initially hard coded around a specific domain such as customer or product. It is never easy to re-architect software to do something that its original designers never intended. It will be easier for the pure play generic MDM vendors to add better performance etc than it will be for a CDI vendor to be genuinely able to deal with multiple domains consistently.

Having already changed the name from “CDI Institute” to “CDI/MDM Institute” it’s only three letters away from ending up with the “MDM Institute”.

Data mis-governance

October 22, 2007

I spent this morning at a data governance seminar sponsored by Dataflux, at which Jill Dyche of Baseline Consulting spoke about her experiences of data governance best practice in client organisations, and Philip Howard of Bloor gave his perspective. Data governance seems to be something very much in its infancy despite the long-established issues it addresses, with only a tiny proportion of organisations having made a lot of progress (according to an IBM Global Services 5 point data governance maturity scale, no company is further along than stage three, and only a handful of companies even manage that). There seems little in the way of a silver bullet here, just missionary work to convince the business that data ownership needs to be taken seriously. Sometimes a “burning platform” can stimulate interest. Recently Nationwide Building Society was fined GBP 1 million due to the theft of a laptop on which customer data was stored (albeit in encrypted form). Interestingly, the fine was not directly due to the loss of the data but the fact that they had no processes in place to determine that there was actually customer data on the laptop. Such cases illustrate the risks, at least in regulated industries, of having poor data governance policies.

Another aspect of data governance often overlooked is the proliferation of data in corporate spreadsheets. Apparently Allied Irish Bank have a stunning 185 TB of storage devoted to spreadsheets alone, and who knows how much of this is duplicated. With studies showing that in a spreadsheet with over 200 rows there is a 90% chance of an error, the potential for problems is self-evident. When I was at Shell there was a whole group on the corridor opposite me who built spreadsheet models and audited existing ones, some of which were highly important (e.g. financial models used for capital intensive projects). This group paid its way many times over by uncovering flaws in existing operational models. Yet I suspect they only scratched the surface, and how common are such initiatives? This should be a promising area for companies such as Compassoft, which do spreadsheet “discovery and control”. Indeed there is no shortage of scandals related to manipulation of spreadsheets, including the USD 700M one at Allied Irish. And you thought you had enough data quality problems in your corporate systems….

The murky world of market sizing

August 9, 2007

Defining a software segment’s market size is a tricky thing, partly because it is all about what you include and what you exclude. Take MDM as an example. A much quoted IDC figure reckoned the MDM market would be USD 10 billion in 2009, implying a USD 5 billion market size in 2005 given compound growth of 14%. Such figures are regularly bandied about by the computer press, but mean little unless you qualify such statements by explaining what is included or excluded. For example this figure includes an estimate for services business associated with MDM. This is itself hard to pin down, but in my experience an MDM project where the software costs X will spend about 3X on services to implement it. Hence that USD 5 billion market size actually only has about USD 1.6 billion of software sales. Then MDM itself is a broad church, including CDI and PIM as well as generalist MDM solutions such as those from Orchestra Networks and Kalido. I was still puzzled as to why even this USD 1.6 billion figure was so large, but by deduction I think that the IDC figure was including data quality within the picture also. Fair enough, but it needs to be explicitly stated to make sense of the market, and as we will see still does not explain the gap.
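
For anyone who wants to check the top-down arithmetic, here is a minimal sketch in Python. The function names are illustrative, the inputs are the round numbers quoted above (in USD billions), and the 2:1 services ratio is an extra assumption added purely for comparison, so treat the outputs as rough.

```python
# Rough check of the top-down figures quoted above.
# Inputs are the post's round numbers in USD billions; names are illustrative.

def implied_base(future_value: float, annual_growth: float, years: int) -> float:
    """Discount a future market size back by compound annual growth."""
    return future_value / (1 + annual_growth) ** years


def software_portion(total: float, services_per_software: float) -> float:
    """Software part of a total, given services spend per unit of software spend."""
    return total / (1 + services_per_software)


base_2005 = implied_base(10.0, 0.14, 4)        # 2009 discounted back to 2005
software_strict = software_portion(5.0, 3.0)   # services = 3x software
software_loose = software_portion(5.0, 2.0)    # assumed looser ratio: services = 2x software

print(f"Implied 2005 market: USD {base_2005:.2f} billion")
print(f"Software at a 3:1 services ratio: USD {software_strict:.2f} billion")
print(f"Software at a 2:1 services ratio: USD {software_loose:.2f} billion")
```

Depending on how literally the 3X rule of thumb is taken, the software portion lands somewhere between a quarter and a third of the total, and the USD 1.6 billion used here sits at the upper end of that range. The exact split matters less than the point that most of the headline number is services.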

Let’s come at this another way. A Gartner figure just released reckoned the CDI market was worth USD 310 million in 2006. This appears to be an estimate for software rather than services. Getting a figure for the product information management market is murkier, but I believe it will be broadly at a similar level. The generalist MDM vendors are these days mostly smaller companies (products like Razza having been swallowed and digested by Oracle, and Stratature by Microsoft for example) and I doubt they would add USD 100 million in software sales to this picture. Hence, adding PIM + CDI + generalist MDM (but excluding data quality) you get a software market of maybe USD 700M (probably a bit less), which is a far cry from the apparent IDC figure of USD 5 billion, or even the likely USD 1.6 billion of software revenues only. I still struggle to bridge the gap here, as the data quality market is not that large. Again you have to be careful about what is in and what is out, but other than leader Trillium data quality vendors are mostly very small (e.g. Exeros, Datanomic, etc) or are now buried within larger companies through acquisition (e.g. Informatica, Business Objects). However, though I have seen estimates like USD 500M for the data quality market, again I wonder how much of this is services; personally I am unconvinced that the software sales of the data quality market would be much beyond USD 100M or so (companies like FirstLogic were not that large prior to their acquisition). So if we take the USD 700M figure and throw in USD 150M for data quality software sales (let’s be generous) this is still a far cry from the USD 1.6 billion estimate we arrived at earlier. Of all the analyst firms I respect the market size figures from IDC best, as they do actually check with the vendors what their revenues really are (they used to do this every year when I was running Kalido) but as you can see their MDM market size figure is still a mystery to me.
If someone from IDC is reading this and can shed some light on it I would be interested to hear from them.
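
The bottom-up sum can be laid out the same way. This is only a sketch: the variable names are mine, the PIM figure is the "broadly similar to CDI" guess from above, and the generalist MDM figure is the ceiling I doubted would be reached, so the raw total is an upper bound on the rounded USD 700M.

```python
# Bottom-up software-only estimate, in USD millions, using the figures above.
# PIM is assumed equal to CDI; generalist MDM is a ceiling, not a measurement.

segments = {
    "CDI": 310,             # Gartner estimate for 2006
    "PIM": 310,             # assumed broadly similar to CDI
    "generalist MDM": 100,  # upper bound
}

upper_bound = sum(segments.values())   # before rounding down
rounded_core = 700                     # the "maybe USD 700M" used in the text
data_quality_generous = 150            # deliberately generous allowance

bottom_up = rounded_core + data_quality_generous
top_down_software = 1600               # software-only figure derived earlier
gap = top_down_software - bottom_up

print(f"Upper bound for PIM + CDI + generalist MDM: USD {upper_bound}M")
print(f"Bottom-up total including data quality: USD {bottom_up}M")
print(f"Unexplained gap to the top-down figure: USD {gap}M")
```

Even with generous inputs the bottom-up total explains only a little over half of the top-down software number, which is the gap I cannot bridge.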

MDM is certainly growing quickly: each analyst firm agrees on this, and it is clear enough from the number of companies entering the market or (more commonly) re-labelling existing products as MDM. However it can be seen that you can take a figure like the IDC USD 5 billion number, and also produce a valid market estimate of under USD 850 million, just based on what you include or exclude, for seemingly the same market. Quite a range. I guess it is hoping too much to expect the IT press to actually mention pesky caveats like what a number includes, since it is more headline inducing to say “MDM market worth $5 billion”, but if you are to actually use these figures to help with a decision then you would be well advised to dig deeper, below the headline numbers.

If all you have is a hammer…

June 29, 2007

Claudia Imhoff raises an important issue in her blog regarding the cleansing of data. When dealing with a data warehouse it is important for data to be validated before being loaded into the warehouse in order to remove any data quality problems (of course, ideally you would have a process to go back and fix the problems at source also). However, as she points out, in some cases e.g. for audit purposes, it is important to know what the original data actually was, not just a cleansed version. This gets at the heart of a vital issue surrounding master data, and neatly illustrates the difference between a master data repository and a data warehouse.

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through different versions before producing a “golden copy”, which would be suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of things like drafts of budget plans, which go through various iterations before a final version is agreed. This is quite apart from actual errors in data, which are all too common in operational systems. An MDM application should be able to manage the workflow of such processes, and have a repository that is capable of going back in time and tracking the various versions, not just the finished golden copy. A good MDM repository should allow you to track back through master data as it is “improved” over time, not just look at the golden copy. Only the golden copy should be exported to the data warehouse, where data integrity is vital.
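
To make the idea of version tracking concrete, here is a minimal sketch in Python of an append-only history for a single master data item. It is not how any particular MDM product works: the class, the method names and the sample hierarchy are all hypothetical, and it assumes versions are recorded in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    number: int
    payload: Dict[str, str]
    status: str            # "draft" or "published"
    recorded_at: datetime


class MasterRecord:
    """Append-only version history for one master data item (hypothetical design)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self._versions: List[Version] = []

    def _append(self, payload: Dict[str, str], status: str, when: datetime) -> Version:
        version = Version(len(self._versions) + 1, dict(payload), status, when)
        self._versions.append(version)
        return version

    def save_draft(self, payload: Dict[str, str], when: datetime) -> Version:
        return self._append(payload, "draft", when)

    def publish(self, when: datetime) -> Version:
        """Sign off the latest version by appending a published copy of it."""
        if not self._versions:
            raise ValueError("nothing to publish")
        return self._append(self._versions[-1].payload, "published", when)

    def golden_copy(self) -> Optional[Version]:
        """Latest published version: the only one a warehouse export should see."""
        for version in reversed(self._versions):
            if version.status == "published":
                return version
        return None

    def as_of(self, when: datetime) -> Optional[Version]:
        """Latest version of any status recorded at or before the given time."""
        candidates = [v for v in self._versions if v.recorded_at <= when]
        return candidates[-1] if candidates else None

    def history(self) -> List[Version]:
        return list(self._versions)


# Two drafts of a marketing product hierarchy, then sign-off.
hierarchy = MasterRecord("marketing-product-hierarchy")
hierarchy.save_draft({"Widgets": "Hardware"}, datetime(2007, 6, 1))
hierarchy.save_draft({"Widgets": "Consumer Hardware"}, datetime(2007, 6, 10))
hierarchy.publish(datetime(2007, 6, 20))

golden = hierarchy.golden_copy()
earlier = hierarchy.as_of(datetime(2007, 6, 5))
```

Nothing is ever overwritten, so an auditor can ask what the hierarchy looked like on any date, while the warehouse load reads only the golden copy.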

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state warehouse data. MDM projects should always be considering this issue, and your technology selection should reflect the need for your MDM technology to track versions of master data over time.

Master data: from jungle to garden in several not so easy steps

June 20, 2007

I very much liked a succinct article by the ever-reliable Colin White on MDM approaches. Companies still struggle to get to grips with what a roadmap for MDM is all about, with apparently competing (and incomplete and immature) MDM technologies and management consultants who are only a few pages ahead of the customers in the manual. This piece neatly sets out the end goal of MDM and the various approaches to getting there (via analytic MDM or operational MDM as a start). It would have been even better had it explained in more detail how the alternatives can be run in parallel, and gone into more depth on the issues with each sequence of steps. However by clearly separating out operational and analytic MDM and showing how these are complementary he is already doing a significant service.

The issue he mentions with “approach 1” i.e. the “complexity of maintaining a complete historical record of master data” can be dealt with if you choose an analytic MDM technology which has built-in support for analysis over time. Colin points out that a key step is to end up with a low-latency master data store as the system of record for the enterprise, acting as a provider of golden copy master data to other sources, both transaction systems and analytical ones such as an enterprise data warehouse. If properly implemented, this will result in a change of the centre of gravity of master data, from the current situation where the system of record is ERP to a situation where the enterprise master data repository is actually the system of record, providing data through a published interface (and an enterprise service bus) through to all other systems, including ERP. This is a desirable end state, and is a key step to starting to unlock the monolithic ERP systems that companies use today into more manageable components.

I really hope that this paper gets the attention that it deserves. Getting most of the key messages into a two-page article is quite an achievement. I would like to see this developed further, and hopefully it will be.

The other shoe drops

June 8, 2007

For some time I had been wondering which company Microsoft would buy to enter the MDM market. This is a key area in the broader business intelligence arena that they aspire to progress in, and was a major gap in their offering. Stratature was their choice, and it was a smart choice. Stratature plays in the analytical MDM area rather than being an operational transaction hub (like Siperian, say). It had built up a good reputation for flexible hierarchy management, an important feature of most MDM applications. They competed directly with Razza (an excellent tool which Hyperion purchased but Oracle seems to have now buried) and Kalido.

Stratature is the kind of bite-sized (16 employees) acquisition that Microsoft likes. It prefers to catch a company when it is small so that it can easily absorb the technical staff and mould them into the Microsoft way of doing things. When it has deviated from this rule (Great Plains, Navision) it has discovered why this was a good rule in the first place.

Congratulations to Ian Ahern, who impressed me on the several occasions I met with him. He also supports my (possibly biased) thesis that all the best MDM people are Brits. The terms of the deal are not public, and it would have been interesting to see what valuation a good MDM vendor achieved; I am sure it worked out well for Stratature’s shareholders. This now leaves Kalido as the main remaining independent analytic MDM vendor. This is not necessarily a bad thing for Kalido. Informatica has shown how you can thrive once your competitors get swallowed by the behemoths. Being stack-neutral in data management carries advantages.
