Andy on Enterprise Software

Buying an Identity

April 18, 2008

Informatica did indeed make a purchase this week, but not the one some people were expecting. Instead of buying an MDM platform vendor they purchased Identity Systems for USD 85 million. Identity Systems has a strong reputation in the data quality world, and counts plenty of US federal agencies among its 500 customers. It has 55 employees. This will shake things up a bit in the data quality market, where Similarity Systems, which Informatica already owns, was already one of the stronger players. The two together, sold through Informatica’s large sales channels, will be a potent combination.

It still leaves open the question of whether, and in what way, Informatica will choose to enter the fast growing MDM market. I doubt this will be the last acquisition that they make.

Squaring the MDM circle

April 7, 2008

Jill Dyche raises an important point about how companies are tackling MDM. She mentions the “random acts of MDM” that are done in isolation in a particular business area, or involving a particular data domain, which are unlikely to evolve into an enterprise-wide MDM solution.

The tricky issue that companies face is that MDM is a genuinely large-scale endeavour, and because we all know how well giant enterprise projects usually go, they are understandably reluctant to take on an enterprise-wide project. Instead they pick off an easier piece, such as one particular data type, or perhaps a broader set of master data types but only in a subset of the enterprise, say across one division. As Jill says, such isolated initiatives won’t in themselves magically grow into enterprise MDM. There is a further danger in disconnected initiatives. At this point the vendor technology out there is at very different stages of maturity depending on what kind of data you want to tackle, and on what scale. Some vendors have a well proven customer hub technology, but with limited experience in tackling product data (and may lack key functionality to do this e.g. attribute inheritance) and usually have very limited ideas about business process workflow and data governance support. Other vendors from the PIM or the analytic MDM world usually have much better business workflow support, yet may have limited scalability e.g. it would be a brave person who tried doing a 100 million record customer hub using a PIM product. The vendors with a CDI heritage are adding more workflow capability, and the PIM and analytic MDM vendors are working on scalability, but these are works in progress rather than completed and tested features and functions. Hence separate initiatives may end up using different technologies due to the demands of a particular area, and it would be easy to end up with one technology to handle product data, another for customer data, and maybe another where analytics were the driver.

In my view you need to combine an enterprise-wide vision with a practical, bite-sized approach i.e. think big but start small. You can build a broad enterprise strategy that encompasses data governance processes for example, even if you decide to build out actual master data hubs in a stepwise fashion, beginning with certain high value data domains or company divisions that can best benefit from improved master data. However you need to keep the big picture in mind in order to avoid (or minimise) duplicate technology investments that may prove hard to fit together. There are no magic bullets here, but enterprise architects need to put in place the processes and broad strategy that will lead to better master data in the long term, even if the technology to deliver across the enterprise is only partly here today. Setting up proper data governance, and getting business people committed to it, should have real benefits and will be valid efforts whatever technologies are deployed.

XML is not enough

March 28, 2008

I just read a particularly clear explanation of how XML contributes to helping with, but does not really solve, the problem of data integration. This is a major issue as companies begin to deploy applications in the form of services, since as you bring elements of an application together via web services you usually also have to worry about how the data used by one application is going to be passed to another. There are just too many versions of XML, and insufficient semantic integration support, to just say “ah, we don’t need to worry about that - we are XML compliant”, yet this is exactly the marketing position of some vendors. As the article points out, a higher degree of semantic integration is needed. Master data management applications seek to provide this by establishing a repository of trusted information which has the necessary level of understanding to map the various definitions of “customer”, “product”, “fixed asset”, “location” etc together.

Whether you deploy such an application in a “co-existence” mode or “operational” mode is less important than going through the process of mapping together the competing definitions of master data strewn throughout any large company. Having a dial tone on my telephone enables me to phone someone in Argentina, but does not mean that we can communicate unless we also speak the same language. In the same way XML is a useful, but insufficient, building block in the path to data reconciliation in the enterprise. Only higher level semantic-based models are going to do that, and they will be hard work to implement given the amount of human interaction between different departments and company subsidiaries needed to resolve the differences that have built up over time.
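
To make the dial tone point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two source systems, their tag names, the sample records and the mapping table. Both documents are perfectly valid XML, yet it takes an explicit semantic mapping to see that they describe the same kind of thing.

```python
import xml.etree.ElementTree as ET

# Two hypothetical source systems describing the same customer.
# Both are well-formed XML, but they share no tag names.
CRM_XML = "<customer><cust_name>Acme Ltd</cust_name><ctry>GB</ctry></customer>"
BILLING_XML = (
    "<account><holder>ACME LIMITED</holder>"
    "<country_code>GB</country_code></account>"
)

# Hypothetical semantic mapping: source tag -> canonical attribute name.
# Agreeing this table is the human, cross-department part of the work.
FIELD_MAP = {
    "crm": {"cust_name": "name", "ctry": "country"},
    "billing": {"holder": "name", "country_code": "country"},
}


def to_canonical(source: str, xml_text: str) -> dict:
    """Translate one source record into the canonical attribute names."""
    mapping = FIELD_MAP[source]
    root = ET.fromstring(xml_text)
    return {
        mapping[child.tag]: (child.text or "").strip()
        for child in root
        if child.tag in mapping
    }


crm = to_canonical("crm", CRM_XML)
billing = to_canonical("billing", BILLING_XML)
print(crm)
print(billing)
```

Even after the tags are mapped, the two name values still differ, so deciding whether they refer to the same customer is a further matching step that XML itself does nothing to help with.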

MDM In Savannah - Day 2

March 5, 2008

The conference continued today with a string of customer case studies, plus some panel discussions and a couple of vendor presentations that just about managed to avoid being too blatant in their product plugs. I enjoyed a case study from a transport company called Pitt Ohio Express, who had implemented a customer-oriented MDM hub for the practical reason that they need to know where their trucks have to turn up to deliver things. This seems a more pressing reason to sort out customer name and address than a bit of duplicated direct mail. Also, they had actually measured things properly before and after their project, and had achieved a 2% overall company improvement in operating margin due to the initiative. A proper view of customer spend has enabled targeted customer pricing rather than blanket price lists, as an example of a real benefit seen.

I also enjoyed a lively presentation by Brian Rensing, a data architect at Procter and Gamble. There must be marketing in the blood there, as he was an entertaining speaker, and how many data architects can you say that of? He explained how they had managed to get buy-in to their MDM initiative, working one business unit at a time and relying heavily on iterative prototyping to ensure that business people could see short-term benefits, rather than laying out a grandiose multi-year initiative. Their project covers both customer and product initially, both at the corporate level and (gradually) country level, using KALIDO MDM. They see this MDM initiative as a lead-in to better data warehousing and analytics in the future, since there will be a sounder data foundation on which to work.

In general I am surprised at the number of companies contemplating (and actually doing) MDM projects using entirely in-house technology. One company even devised its own matching algorithms. Surely this is the kind of thing that off-the-shelf data quality products can do much better? I suppose MDM is still in relative infancy in terms of market size (Rob Karel of Forrester reckoned USD 1 billion in 2006, of which only a third was software, a very different number from IDC estimates, but expecting over 50% compound growth over the coming years). The big systems integrators seem yet to have really caught on to this fast growth, with Baseline Consulting at this conference almost the only SI represented (and they are a specialist boutique). It will be interesting to see at what point PWC, Accenture and Bearing Point start turning up to such conferences.

I should relate a conversation with one vendor at the exhibit last night. “So, what kind of revenues do you guys do?”. “We don’t disclose that”. Fair enough, some companies are shy. “How many customers do you have?”. “We don’t disclose the number of customers we have”. “Er, OK, do you have any customers?”. “Oh yes”. Uh huh. “Who are your investors?”. “That is private”. “How many employees do you have?”. “We can’t share that information”. So we have here a vendor, at a trade show, unwilling to talk about how big it is, who has invested in it, how many customers or even employees it has. Short of putting a puzzle on its web site in order to find the contact address, it is hard to imagine how they could induce more nervousness in a prospect. Surreal. I guess they are going for the “dark and mysterious” marketing approach pioneered by Ab Initio.

Although many case studies were about customer, over half the respondents in a recent TDWI survey said that their MDM initiative had enterprise-wide scope, and there were certainly examples here of case studies around product information, as well as financial information. I still had the sense that a lot of companies were treading gingerly into the MDM world, but there were enough case studies of completed projects to suggest that the growth in the market which Forrester (and others) predict is plausible based on the level of interest shown here.

Perhaps the most entertaining moment of the conference was watching Todd Goldman, VP of marketing for Exeros, doing a (quite impressive) conjuring trick at the beginning of his presentation. It turns out that he is an amateur magician, a skill that must come in very useful in his career in software marketing. This was not the first time I have seen clever illusions in software marketing, nor will it be the last.

MDM In Savannah - Day 1

March 3, 2008

For the first time TDWI has arranged an MDM conference, running this week in Savannah, and they kindly invited me to speak at the event. It is quite well attended, and is unusual in that customer attendees had to apply to the event in order to minimise “tyre kickers” (but qualified attendees had some travel expenses reimbursed). There are around 100 project managers and the like involved with MDM projects, plus the usual vendors and assorted hangers-on (like me).

The highlight of day 1 was a presentation by Barry Briggs, CTO of Microsoft, about Microsoft’s internal MDM project. Since they did not use their own MDM technology for the project, it came across quite credibly. Microsoft have customer records on 80 million enterprises, and a billion consumer records, but had considerable difficulty in getting a consolidated view of a given enterprise due to the multiple systems used to input customer records. In 2005 they found they had 37 systems that claimed to be the system of record for customer (this is pretty average for a large company, by the way). Starting with Dun & Bradstreet data they mapped the various competing customer records and consolidated these into a repository called MIO (which uses Initiate’s CDI hub technology). Apart from its scale (a project team of 40) there were some interesting aspects to the project.

First was that they did actually measure ROI, which was over 500% of the project cost. The savings were mainly due to reduced time spent by sales staff in managing customer information and related information such as arguing over sales commissions; consolidated views of customers also saved time, and in some cases gave new sales opportunities. New sales reps cannot enter new customer account information without the data being checked in the repository first e.g. a “new” customer might turn out to be actually existing based on a match of its address.

One key point discussed was the level at which matching should be imposed. The technology used assigns a probability of a match between two customer records. The project found that records with a probability of over 85% were almost certainly matches, and let the system assign these automatically. Below 65% they are rarely matches and are assumed to be genuinely new, but those records in between still require manual intervention, since the consequences of a “false positive” i.e. matching up two records incorrectly are worse than those of missing a match. This seemed to me an important consideration for all such projects using matching algorithms. The project encountered a lot of issues not initially expected e.g. even a list of country codes became controversial since, for example, Taiwan is either a province of China or an independent state depending on your viewpoint, and the wrong “view” could have considerable political and commercial consequences if displayed to customers.
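
The three-way split described above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the talk, not the Initiate product’s API: the names `Decision` and `triage` are hypothetical, and treating a score of exactly 65% or 85% as needing review is my own assumption, since the talk did not say how the boundaries were handled.

```python
from enum import Enum


class Decision(Enum):
    AUTO_MATCH = "auto_match"        # link to the existing record automatically
    MANUAL_REVIEW = "manual_review"  # a data steward has to decide
    NEW_RECORD = "new_record"        # assume a genuinely new customer


# Thresholds as quoted in the presentation.
AUTO_MATCH_ABOVE = 0.85
NEW_RECORD_BELOW = 0.65


def triage(match_probability: float) -> Decision:
    """Route a candidate pair according to its match probability."""
    if not 0.0 <= match_probability <= 1.0:
        raise ValueError("match probability must lie between 0 and 1")
    if match_probability > AUTO_MATCH_ABOVE:
        return Decision.AUTO_MATCH
    if match_probability < NEW_RECORD_BELOW:
        return Decision.NEW_RECORD
    # Boundary values land here (assumption): a false positive costs more
    # than a missed match, so anything doubtful goes to a person.
    return Decision.MANUAL_REVIEW


for p in (0.95, 0.75, 0.40):
    print(p, triage(p).value)
```

Narrowing the gap between the two thresholds reduces the manual workload but raises the risk of false positives, so where to set them is a business decision rather than a technical one.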

Another point often missed was how the system itself is very much an operational system. Since this feeds, for example, the CRM system, the MDM application needs the same level of robustness. Indeed as more and more systems are hooked up to it then it could become a single point of failure. This is a point rarely mentioned by vendors, and indeed seems to me to be an important architectural consideration. The more all-encompassing the MDM repository, the more scary its operational requirements if it is providing real-time links to OLTP systems.

Another case study was from Royal Bank of Canada, which has 10 million customer records. In their case it was important to have a single view of customer to allow cross-selling e.g. someone with a bank account may also want a credit card or insurance policy. Moreover Canada is about to institute a “do not call” system for consumers to avoid pestering marketing calls, and a failure to correctly implement this across the enterprise could result in fines. In this case the MDM system was an in-house built repository (on DB2) but using the QualityStage data quality technology to help with matching up and sorting out duplicate customer records. A later audit found just 135,000 possible duplicate records in a database of 10 million (about 1.35%), which is in fact excellent. The speaker pointed out that at a certain point it becomes uneconomic to chase down the last few dodgy records. There is a team of 60 people, half business, half IT, dedicated to data quality, which interestingly reports into marketing, not into IT.

Other than that there were a number of panels and presentations, and a tradeshow with the usual suspects which is about to start as I write this. SAP and Exeros are the biggest spenders at this particular show, but Siperian, Kalido and IBM are amongst the sponsors also. So far logistics have been very good, with admirable time-keeping from the organisers. The 70F, sunny weather in February has helped the spirits of the attendees, as has the free ice cream.

Never mind the quality, feel the width

December 7, 2007

Frank Buytendijk (ex Gartner analyst, now with Oracle) makes an important point about data quality on his blog: it is inherently dull. This in itself causes problems both to people within organisations who care about data quality (there must be a few of you out there) and for data quality vendors, who struggle to sell their products at a decent price point in sufficient numbers. I have written about this before, pointing out just a couple of real life cases of poor data quality that I have personally encountered, each of which cost many millions of dollars.

The reason that data quality is generally excellent in the area of salary and expense processing is that people care deeply about what they get paid, and you can be pretty sure that any clerical errors get spotted and complained about very quickly. However in most cases poor data quality occurs due to people being asked to enter or maintain data for which they see no personal or even obvious company benefit. Data that is useful for “some other department” is never going to receive the same care and attention that your own personal expense claims get.

As Frank says, in order to move data quality higher up the enterprise priority list, it needs to widen its perspective: move beyond talking about customer names and addresses. Yes, this is important if you are doing mailshots, and certainly poor customer name and address management can have more serious consequences, but most executives have got better things to do than worry about whether their mailshots are being duplicated.

Despite numerous acquisitions over the years (First Logic, Similarity, Vality, …) there are still plenty of small data quality vendors out there, some with very interesting technology. Yet aside from Trillium, few have managed to get even into double figures of millions of revenue. This is not due to an absence of a real problem to address.

Some data quality vendors rightly see master data management as a way of repositioning their offerings in a more fashionable area, but they need to realise that data quality is just a feature of a complete MDM solution. Hence they need to partner with broader-based MDM repository vendors who themselves often lack proper data quality technology, rather than pretending they themselves are a complete solution. They should also do a better job of highlighting quantified customer dollar benefits achieved from the use of data quality technology. This should not be hard to do since data quality projects usually have excellent payback. Yet time after time the example used in data quality collateral is the tired name and address cleanup, followed by an esoteric discussion about whether probabilistic or deterministic matching is better (paying customers don’t care - they are interested in what benefits they see). Far too few data quality case studies mention hard-dollar benefits to the customer.

Data quality should have much going for it: it is a very real problem, the condition of data quality in most large organisations is horrible (and far worse than generally realised), and the costs of this are significant and cause genuine and in some cases very serious operational problems. Yet the industry as a whole has done a poor job of explaining itself to the people with the cheque books in enterprises.

Informatica marches on

October 20, 2007

Informatica had a very solid quarter indeed, with revenue up 22% at USD 96 million, of which USD 41 million was licence revenue (also up 22%). Maintenance is now a handy USD 38.3 million and consulting/services USD 16.7 million. These are very healthy ratios for a software company, as recurring maintenance is the best revenue of all; consulting at less than 20% of revenues means that the company is still a proper software company and not a consulting firm in disguise. Interestingly, growth in the Americas was 15%, but growth outside was 36%.
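
For anyone who wants to check the ratios, the arithmetic is easy to reproduce. The sketch below uses only the figures quoted above; the variable names are purely illustrative.

```python
# Informatica quarterly revenue by line, in USD millions (figures quoted above).
revenue_musd = {
    "licence": 41.0,
    "maintenance": 38.3,
    "services": 16.7,
}

# Round to one decimal place to sidestep floating-point noise in the sum.
total = round(sum(revenue_musd.values()), 1)

# Share of total revenue contributed by each line.
shares = {line: amount / total for line, amount in revenue_musd.items()}

for line, share in shares.items():
    print(f"{line:<12}{share:6.1%}")
print(f"{'total':<12}{total:6.1f}")
```

The three lines add up to the USD 96 million reported, and services come out at a little over 17% of the total, comfortably under the 20% mark.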

The company revealed that 43% of its use cases were in the context of data warehousing, and its largest verticals were financial services, hi-tech and public sector.

Informatica is an interesting example of how a company can prosper when its main competitor (Ascential) is taken out of the market by a behemoth (IBM). Commentators often assume that being taken over by a behemoth means greater muscle, yet often the behemoth is distracted, bureaucratic and annoys the key staff of the company it has taken over. This can leave an independent competitor in an almost unchallenged position. This effect is amplified when the vendor in question is competing in a market where platform neutrality is important, as data integration and business intelligence are.

One small storm cloud in all the blue sky that Informatica is seeing is that SAP’s purchase of Business Objects will presumably have some effect on the OEM deal that SAP has with Informatica (since Business Objects owns rival ETL technology from Acta). However based on the history of Ascential’s own OEM deal with SAP which preceded this one, I doubt that, even in the worst case, this would have much financial significance (my sources told me that Ascential never made much money off that deal) even if SAP dropped the deal entirely, which is by no means clear.

Babies and bathwater

October 15, 2007

I read a rather odd article in Enterprise Systems proclaiming that “the data warehouse might be dead”. The thrust of the article was the old idea of piecing together reports for a particular user or department by accessing source systems directly rather than bothering with a pesky data warehouse, in this case advocated by a senior business user from Bank of America. I understand the frustration that business people have with most corporate data warehouses today. They are typically inflexible, and so unable to keep up with the pace of business change. Indeed this thought was echoed recently by Gartner analyst Don Feinberg, who said that data warehouses more than five years old should be re-written. To a person who has an urgent information need, going to an IT department that tells him that he cannot have the information he needs for weeks or months is understandably irritating.

Yet the apparent solution of accessing source systems directly is flawed, and the problems it causes are, after all, why people invented data warehouses in the first place. Yes, you can patch together data from source systems, and yes, that one report you want may appear less complicated (and quicker) to get than going through a change request in corporate IT. Overall, though, the economics of point-to-point spaghetti versus a central warehouse are easy to see. The organisation as a whole will spend much more money in this manner than by having a central warehouse where the data can be relied upon; the more complex the organisation, the larger this gap will be.

Probably worse, the business user gets some numbers out, but are they the right numbers? Anyone who has worked on data warehouse projects will be familiar with the frequently dismal quality of data even in supposedly trusted corporate source systems such as ERP. It is often only by looking at sources together that problems with the data show up. Often errors in, say, regional systems can cancel out or be obscured, and the true picture only emerges when data is added up at a collective level.

In these days of increasing anxiety about banking scandals and greater regulation, companies can ill afford to subject themselves to unnecessary risk by making decisions based on data of questionable quality. The issue is that most data warehouses today are based on inflexible approaches and designs, causing lengthy and costly delays in updating the warehouse when the business, inevitably, changes. It does not have to be this way. You can construct data warehouses in a more flexible manner, and in a way in which business users are engaged with the process. By running parallel master data management initiatives and setting up an MDM repository, the data warehouse can be relieved of at least some of the burden of sorting out corporate reference data, giving it more of a chance.

It is incumbent on IT departments to embrace modern data warehouse technologies and complementary MDM best practice in order to avoid driving their customers to “skunk works” desperation in order to answer their needs. IT organisations that fail to do this not only risk being marginalised, but also indirectly drive up costs and risks for their companies.

Last man standing

October 11, 2007

So, what does the acquisition of Business Objects by SAP mean for the BI industry? In some ways this is a curious move by SAP. Business Objects is a very successful company, yet the overlap between SAP customers and Business Objects customers is much less than generally thought. Of course really large companies tend to have lots of software, so many companies will have SAP and also Business Objects, but this does not mean that they are being used in concert. I spoke to an executive of Business Objects this week who told me that the overlap was “almost zero” because for Business Objects to access the SAP environment was awkward (“nearly impossible” were his exact words). Typically Business Objects reports will be running against a separate data warehouse being fed by SAP and other systems rather than Business Objects directly reading, say, a BW warehouse or (far less likely) directly accessing SAP systems. Business Objects is much more common in Oracle shops, which makes me wonder whether the purchase is more a defensive one i.e. to take Business Objects out of the market before Oracle gets its hands on it. Oracle + BOBJ would be a much more natural fit, and so I am sure that this move will displease Oracle, who had bought Hyperion, yet Hyperion’s Brio reporting tool was only a distant third in the reporting space behind Business Objects and Cognos.

For Business Objects customers the news is not necessarily positive. SAP tends to be fiercely proprietary about its environment, preferring to fight it out for footprint with Oracle in customers rather than opening everything up. Hence SAP BW, for example, while it can certainly load data from non-SAP systems, is suited best to an SAP environment, and it would be bizarre indeed to imagine a customer without SAP buying SAP BW. Business intelligence is not SAP’s core competence, and there is some considerable danger of dilution of attention, as well as future product directions potentially being pulled in awkward ways e.g. just how open will SAP want Business Objects to continue to be to other sources such as Oracle? Of course for now it is all public harmony, but think down the road a year or two and consider whether someone in SAP might at least consider the option of making it harder for Business Objects to work with Oracle’s applications in order to encourage customers to switch from Oracle to SAP applications. Do you think this would never cross their minds; not even a little bit? Moreover SAP has a distinct culture, and has no track record of an acquisition of this size (indeed, until this week its executives had scorned the very idea). I suspect that just about every person in Business Objects is at least updating their resume right now, and even if things work out fine it is going to be at best unsettling for staff. As Woody Allen said, the lamb may lie down with the lion, but the lamb will not get much sleep.

I think Cognos is the winner here, as it can now stand as the clear leader in business intelligence as a truly independent vendor. Companies with genuinely mixed environments will surely edge towards Cognos now in preference to Business Objects. Even if it turns out that Business Objects truly will be run as an independent business, there will always be that nagging doubt (reinforced by Cognos sales people). Informatica has demonstrated that you can prosper perfectly well when your main competitor is bought by a behemoth (in their case Ascential by IBM) and Cognos is now positioned to follow suit.

Creating a burning data quality platform

October 1, 2007

I read a Forrester blog post today that rang true. The point being made is that data quality is a hard sell unless some crisis happens. This is evidently true, since the data quality market is small, and yet the problems are large. I have encountered several shocking pieces of data quality in my time that were costing millions of dollars. In one case an undetected pricing error in a regional SAP system meant that a well known brand was being sold at zero profit margin. In another case, a data quality error in a co-ordinate system caused an exploration bore to be dug into an existing oil well, which luckily was not in production at the time so “only” cost a few million dollars. In more general terms every dollar spent on data quality should save you four. Yet these examples I mentioned (and there are plenty more) actually showed up not in data quality projects but in data warehouse or master data projects, which in principle were supposed to be taking “pure” data from the master transaction systems. This does not inspire confidence in the state of data in corporate systems which are not “clean”.

I am not sure why this sorry state of affairs exists other than to note that in most companies data quality is regarded as an IT problem, when in actual fact the IT folk are the last people to be in a position to judge data quality. Responsibility lies firmly in the business camp. Moreover, as I have mentioned, justifying a data quality project should not be hard: it has real dollar savings, quite apart from other benefits e.g. reduction in reputational risk. I suspect that some of the problem is that it is embarrassing (“no problems with our company data, no sirree”) and, let’s face it, pretty dull. Would you rather work on some new product launch or be buried away reviewing endless reports checking whether data is what it should be?

For people toiling away in corporate IT the right way to get attention might be to use a modern data quality tool, find a sympathetic business analyst and poke around some corporate systems. These days the tools do a lot of self-discovery, so finding anomalies is not as manually intensive as it used to be. If you turn over a few stones in corporate systems you will be surprised at what will turn up. Chances are that at least one of the issues you encounter will turn out to be expensive, and this may raise the profile of the work, allowing sponsorship to dig around in other areas.