Andy on Enterprise Software

Ploughing a new data furrow

September 19, 2007

As there was some interest in the last blog I thought it might be useful for people to know a little more about Exeros. The company was set up by a founder of ACTA, and did a series A funding round in 2004. The company has some innovative technology which essentially reverse-engineers the structure of data by looking at the data values of database tables and files. This is different from some other profiling approaches, which often examine metadata (e.g. column headings) rather than data values. In this way it “discovers” business rules inherent in the data, and as a by-product also discovers how well the data adheres to those rules. In one example, a customer gave Exeros a sample dataset and the product whirred away and discovered its structure. All well and good, but it also pointed out that in one case there was only a 98% match of the data to the structure, which caused the customer to say: “that’s impossible, that is a mandatory field”. Well, perhaps it was, but the data was still in error! In my own experience of MDM projects there are plenty of such moments; customers have an amusingly naive view of how good their data quality really is.
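
To make the idea of profiling by data values concrete, here is a minimal sketch in Python. It is purely illustrative and is not Exeros's actual algorithm; the function names, the `tax_id` column and the sample rows are all invented for the example. It simply measures what fraction of rows really satisfy a rule (here, "this field is mandatory") that the metadata claims always holds.

```python
def is_populated(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def adherence(rows, column):
    """Return the fraction of rows in which `column` actually holds a value."""
    if not rows:
        return 1.0  # an empty sample cannot violate the rule
    filled = sum(1 for row in rows if is_populated(row.get(column)))
    return filled / len(rows)


# Hypothetical sample: the schema declares 'tax_id' as mandatory,
# but two of the hundred rows do not honour that.
sample = [{"customer": f"C{i}", "tax_id": f"T{i}"} for i in range(98)]
sample += [
    {"customer": "C98", "tax_id": ""},
    {"customer": "C99", "tax_id": None},
]

rate = adherence(sample, "tax_id")
print(f"tax_id populated in {rate:.0%} of rows")  # prints 98%
```

The point of the sketch is that the column definition says one thing and the values say another; only looking at the values reveals the gap.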

Other companies that purport to do data discovery are Sypherlink and Zoomix (http://www.zoomix.com/zoomix_wall.asp), but Sypherlink in particular seems to use more conventional metadata-based profiling. The functionality that Exeros provides is useful for situations like master data integration projects, or data consolidation projects. It could also be used to help in building staging areas for ETL builds, where multiple sources of data often throw up all sorts of issues that have to be resolved manually. Exeros does not have a repository as such, and generates its analysis as output either in XML form or as feeds into tools like Business Objects or ETL tools such as Informatica and IBM/Ascential.

The company started selling commercially in 2006, and already has a dozen or so customers in production. So far this has been mainly in the financial services area, which has plenty of data issues and stiff compliance reporting needs, but there is no reason why the technology should not be applied to any industry as far as I can see. There seems to have been some pretty serious R&D here, with a product team of 40 people, and the company seems to me to have kept to an admirably tight focus so far rather than trying to claim it solves the world’s problems on its own. Over time I would expect to see it having opportunities to partner with MDM vendors, especially those who take a generic MDM approach rather than, say, CDI-only vendors. The broader the range of data, the more complex the data issues that emerge.

Marketing the company as “data discovery” rather than “data quality” is a good idea, as the approach is genuinely different, and avoids the company being pigeon-holed alongside more established companies. The drawback is that they are essentially trying to carve out a new market, never an easy thing, and will encounter the usual emerging company issues with conservative buyers and analysts who prefer to neatly drop them into an existing slot. However in my view the problem they are tackling is very real, and the approach seems innovative, so they should continue on this path. If they make enough customers happy then the analysts will soon come around to their view.


Discovering Data Quality

September 12, 2007

For those following the mixed fortunes of the data quality vendors, one of the more interesting recent developments has been a company called Exeros. After getting a hefty series B round a year ago, Exeros has just landed a partnership with BI behemoth Business Objects. This is potentially very good news for Exeros. Any BI project involves a significant element of data quality, and so the fit is logical, and Exeros’ cunning “discovery” slant to its marketing will give a fresh-sounding label to the otherwise rather dowdy data quality market. What is curious is that Business Objects already owns not one but two data quality vendors, First Logic and the entertainingly Germanic Fuzzy Informatik (which sounds to me like a Kraftwerk single). The press release was the usual partnership waffle, so it was unclear exactly how the joint proposition would be brought to market, but it does make you wonder why Business Objects needs a new tool, unless the existing acquired technology is not doing quite what it was supposed to.

Exeros has been very good in its marketing execution so far, and this partnership is another example of it. As far as I can see there is little reason why other data quality vendors (e.g. Datanomic) could not have latched onto this “discovery” label, which makes an old subject sound new and interesting, as their technology seems to do pretty much the same thing, but they have chosen not to. I am not sure what Exeros’ sales have been like so far, but this partnership is certainly a useful step for them.


Cognos splashes out

September 7, 2007

Cognos has just bought Applix, whose TM1 product was a pioneer in the in-memory database market. Applix has been highly profitable, aiming at volume sales to the mid-market financial analysis market, with a typical sale price well under USD 100k. Even so, it had revenues of USD 61M in 2006, and is likely to show modest growth on that in 2007 (maybe USD 70M revenues), with a yummy operating profit margin of 24%. Cognos now has an amusing range of OLAP engines: the one within Powerplay, the one that came with Adaytum, and now this one. At this stage it is unclear whether these will all continue or whether some sort of consolidation will occur in the long term.

The interesting thing about this purchase was the price: a hefty USD 339M (USD 306M if you strip out the cash in Applix’s bank account that Cognos will acquire). At five times trailing revenues and four and a half times forward revenues this is a very healthy premium indeed. The backers of Qliktech in particular, which is based on a similar engine but has far stronger growth than Applix, must be rubbing their hands in glee.
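
For anyone who wants to check the arithmetic, the multiples can be worked out from the figures quoted above. This is just a sketch of the sums: the variable names are mine, and the USD 70M figure for 2007 is the rough guess from the previous paragraph, not a reported number.

```python
# All amounts in USD millions, taken from the figures quoted in this post.
headline_price = 339.0   # price paid including Applix's cash
net_price = 306.0        # price net of the cash Cognos acquires
revenue_2006 = 61.0      # trailing revenue
revenue_2007_est = 70.0  # rough forward estimate, an assumption


def multiple(price, revenue):
    """Price expressed as a multiple of annual revenue."""
    return price / revenue


multiples = {
    "headline / trailing": multiple(headline_price, revenue_2006),
    "headline / forward": multiple(headline_price, revenue_2007_est),
    "net / trailing": multiple(net_price, revenue_2006),
    "net / forward": multiple(net_price, revenue_2007_est),
}

for label, value in multiples.items():
    print(f"{label}: {value:.1f}x")
```

On these numbers the net-of-cash price works out at about 5.0 times trailing and 4.4 times forward revenues, while the headline price gives about 5.6 and 4.8 times, so the round figures quoted above sit closest to the net-of-cash calculation.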


Searching for meaning

September 6, 2007

I have written before about how many industry surveys can be almost meaningless due to the way that they are phrased or the way that the audience is selected or encouraged to participate. Sometimes the survey itself can miss the point, as in a recent one about the percentage of data that is structured or unstructured. An article about this agonises about whether unstructured data is 31% of all enterprise data, or 50-odd percent, rather than the “80% claimed by other research organisations”. It seems to me that this misses the point. What proportion of data is unstructured (and by the way, does that mean the storage volume, or the number of sources, or something else, since the article blithely skips over this) matters less than the value and usage of this data.

The context here is the use of search technology in BI, with people who sell this technology presumably wanting to make the point that most data is there in emails and spreadsheets, so therefore search technology can mostly replace that pesky BI business. This seems to me a flawed argument. In the context of business information, we typically know what we want, e.g. the monthly sales figures, and, unlike when we search the web using Google, we also have a fair idea where it is, e.g. the company financial systems. The difficulty is not in finding the information but in making it meaningful, which is what the vast majority of effort in data warehousing and BI is all about. Unlike a search for a video clip of an episode of “Heroes”, or finding a particular book on Amazon, the difficulty is that many ambiguous answers exist. Books are a nice analogy, as the world discovered long ago the sense of putting a unique (it turns out not quite unique, but for most purposes it is true) ISBN number on books to avoid ambiguity.

This is not the case in large enterprises with a “sales figure”, which in fact will exist not only in the official corporate finance system, but in several other “proper” systems, and in endless spreadsheets to which the information has been downloaded and possibly manipulated for various purposes. Indeed trying to make a meaningful and useful classification scheme around data is what master data management, and much of data modelling, is all about.

Imagine the fun Amazon would have in finding a book in a world with no ISBN number, and where rival publishers regularly published identical titles, some even from the same author. This is more like the world that BI deals with. Indeed, if there was one single place where “sales data” lived, and if everyone agreed on exactly which sales data that was (the whole company’s, just Europe’s, with or without indirect sales?) then the world of BI would be a simple place and data warehouse developers could pack up and learn a new trade. This reality seems to have eluded some vendors plugging BI search, and indeed some of the industry writers. It is almost irrelevant what “percentage” of data is unstructured, semi-structured, or structured. In the imperfect world of enterprise data a high proportion of the important data suffers from the persistent problem of ambiguous classification and multiple copies, with processes that do not perfectly control replication of that data. It is a world Google Search can shake an uncomprehending stick at all it likes, but to me it is likely to have only a limited impact. Until enterprises get a real grip on the life cycle of information management and put processes in place to properly classify and allow for update and distribution of master data (don’t hold your breath), the world of BI won’t be replaced by a search icon.


Tradeshows and lemmings

August 30, 2007

August is a tricky time for bloggers since, at least in Europe, everything closes down for most of August: no marketing manager in their right mind would do a product release; nor are there any trade shows, hence very little news to write about. There is at least another enjoyable blog from the Cranky PM about trade shows which is good summer reading though. It raises a valid point about whether exhibits at trade shows are really worth it, at least from the point of view of lead generation. There is no doubt that trade show exhibits are expensive, and take days out of the time of pre-sales and sales staff who might be better employed elsewhere. The quandary vendors find themselves in is that if they do not exhibit then they risk appearing side-lined. Certainly shows vary, but the quality of leads from trade shows is usually pretty poor. As the Cranky PM points out, there are a depressing number of people who seem there only to collect free T-shirts and pens and enter prize draws, yet clearly have no budget whatever and often scarcely even a polite interest in the product (“do you have a few more of those pens?”). I assume that such people are only there because (a) the trade show is in Florida/San Diego/Cannes or (b) their company has to spend its training budget somehow.

My observation has been that the broader the scope of the show, the less useful it usually turns out to be. Vertical industry conferences may not be large in terms of numbers, but often have real customers with a project and a problem to solve. Generic IT trade shows have lots of people but finding a real qualified prospect at one of these is like finding a beer in Salt Lake City. We can fall back on the excuse that it is all really just advertising and “mindshare” and never mind the leads, but then is that the very best way of spending your marketing dollars? The other argument is that your competitors are there, and while this has a point it is the “10,000 lemmings can’t be wrong” argument (as a side note, it seems lemmings don’t really leap off cliffs, at least not without a well-timed shove). I suspect the Cranky PM is right and that a lot of it is tradition and inertia.

I had positive experiences of webcasts at my previous company, since unlike trade shows people who attend these are doing so willingly rather than because they are wandering past a booth and you catch their eye, and you get to spend 30-60 minutes with people rather than the cursory conversation that is usually possible at a trade show. Of course webcasts are not free; you have to promote them, which involves buying rights to distribution lists, but even so the costs are certainly lower than renting an exhibit booth and tramping several staff around the country for a few days. A famous quote from John Wanamaker is: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” In the case of most trade shows I suspect half is optimistic.


Demanding BI

August 16, 2007

Ken Rudin makes a compelling case for on-demand BI in DM Review. If anything, he over-eggs things by saying that users will get more engaged with on-demand versions of software, and that you don’t really need IT deployment skills, which seems less than obvious to me. However the thrust of the argument is still valid. I know, having been on both the end-user side of the fence (at Shell) and the vendor side (Kalido), that a scary proportion of problems that users encounter are “environmental” i.e. some combination of software installed on the PC causes an application to break, and the software support desk has great difficulty in replicating the problem because it is next to impossible to ensure that your PC has precisely the same patch release level of operating system/database/middleware as the customer who has the problem. Indeed this is why large companies go to great lengths to try to standardise the desktop configuration across the enterprise, doing upgrades as rarely as possible. This is no mean project if you have tens of thousands of desktops in lots of countries, and tends to result in customer frustration as they find that the latest feature their children are playing with at home is not available on the locked-down, fairly stable but quite out-of-date software they have at work.

The lack of environmental intrusion seems to me the key advantage of the on-demand model to the customer. As a side benefit, but not really a function of the model, most vendors have changed their pricing models for on-demand so that customers pay on a leasing basis rather than a big up-front license charge. This has potential benefits to both customers and vendors. Customers get to pay only on usage, which seems to them a fairer way of paying than having to pay up-front for something they are not sure about. Vendors get a steadier revenue stream, as well as a possibility to reduce the sales cycle: USD 100 per month per user sounds a lot less than USD 100k plus 20% maintenance, though of course it may not actually be if the usage rates become high enough. For example Actuate cite this as a reason why they are bullish on their on-demand offering, as they can employ relatively cheap “inside reps” (essentially up-market telesales people) rather than costly enterprise software salesmen, who may be able to land that big deal but so often do not in practice, yet still expect a hefty salary whether they sell anything or not.
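
As a rough illustration of where “high enough” kicks in, here is a small break-even sketch using the two price points mentioned above. The assumptions are mine and deliberately crude: the USD 100k license is a flat fee regardless of user numbers, maintenance is 20% of the license price charged every year from year one, and there is no discounting. Real price lists will differ.

```python
def licence_cost(years, upfront=100_000, maintenance_rate=0.20):
    """Up-front license plus annual maintenance over the period (assumed terms)."""
    return upfront + upfront * maintenance_rate * years


def subscription_cost(users, years, per_user_month=100):
    """On-demand cost at a flat per-user monthly fee."""
    return per_user_month * 12 * users * years


def break_even_users(years):
    """Number of users at which the two models cost the same over `years`."""
    return licence_cost(years) / subscription_cost(1, years)


for years in (1, 3, 5):
    users = break_even_users(years)
    print(f"{years} year(s): on-demand is cheaper below about {users:.0f} users")
```

Under these assumptions the on-demand price wins below about 100 users over one year, about 44 users over three years and about 33 users over five, which is why the monthly figure only sounds cheaper for smaller deployments.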

It is rather curious, then, that on-demand has not really taken off more in the BI sector. There are some toe-in-the-water efforts from some vendors, but I suspect that most are nervous that such sales cannibalise their conventional channels. To me it seems there are opportunities here for start-up companies who don’t have entrenched ways of doing business to take advantage of this situation, though since the VC community is out frantically hunting for the next Myspace and Facebook it is scarcely funding anything so untrendy as enterprise software, so there is actually little activity here either. At some point the pendulum will surely swing back, and at that point companies who have embraced on-demand delivery seem to me to be well-placed to take advantage of a rare but genuine shift in delivery model.

I’d be interested in any BI vendors or end-users with experience of on-demand BI who’d like to share their experiences, good or bad.


The surprisingly fertile world of database innovation

July 24, 2007

I came across a thought-provoking article, an interview with Michael Stonebraker. As the inventor of Ingres this is someone who knows a thing or two about databases, and I thought that some interesting points were raised. He essentially argues that advances in hardware have meant that specialist databases can out-perform the traditional ones in a series of particular situations, and that these situations are in themselves substantial markets that start-up database companies could attack. He singles out text, where relational databases have never prospered, fast streaming data feeds of the type seen on Wall Street, data warehouses and specialist OLTP. With Streambase he clearly has some first-hand experience of streaming data, and OLTP is what he is working on right now.

I must admit that with my background in enterprise architecture at Shell I underestimated how much of a market there has been for specialist databases, assuming that the innate conservatism of corporate buyers would make it very hard for specialist database vendors. Initially I was proved right, with attempts like Red Brick flickering but quickly becoming subsumed, while object databases were clearly not going to take off. With such false starts it was easy to extrapolate and assume that the relational vendors would simply win out and leave no room for innovation. However to take the area of data warehousing, this has clearly not been the case. Teradata blazed the trail of a proprietary database superior in data warehouse performance to Oracle etc, and now Netezza and a host of smaller start-ups are themselves snapping at Teradata’s heels. The in-memory crowd are also doing well, with for example Qliktech now being the fastest growing BI vendor by a long way, thanks to its in-memory database approach. Certainly Stonebraker is right about text - companies like Fast and their competitors would not dream of using relational databases to build their text search applications, an area where Oracle et al never really got it right at all.

Overall there seems to be a surprising amount of innovation in what at first glance looks like an area which is essentially mature, dominated by three big vendors: Oracle, IBM, Microsoft. Teradata has shown that you can build a billion dollar revenue company in the teeth of such entrenched competition, and the recent developments mentioned above suggest that this area is far from being done and dusted from an innovation viewpoint.


If all you have is a hammer…

June 29, 2007

Claudia Imhoff raises an important issue in her blog regarding the cleansing of data. When dealing with a data warehouse it is important for data to be validated before being loaded into the warehouse in order to remove any data quality problems (of course, ideally you would have a process to go back and fix the problems at source also). However, as she points out, in some cases e.g. for audit purposes, it is important to know what the original data actually was, not just a cleansed version. This gets at the heart of a vital issue surrounding master data, and neatly illustrates the difference between a master data repository and a data warehouse.

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through different versions before producing a “golden copy”, which would be suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of things like drafts of budget plans, which go through various iterations before a final version is agreed. This is quite apart from actual errors in data, which are all too common in operational systems. An MDM application should be able to manage the workflow of such processes, and have a repository that is capable of going back in time and tracking the various versions, not just the finished golden copy. A good MDM repository should allow you to track back through master data as it is “improved” over time, not just look at the golden copy. Only the golden copy should be exported to the data warehouse, where data integrity is vital.
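
A minimal sketch of what “tracking versions, not just the golden copy” might look like is below. It is a toy in Python, not any vendor’s data model; the class names, the two-state draft/published workflow and the sample product hierarchy are all invented for illustration, and versions are assumed to be added in date order.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Version:
    number: int
    payload: dict
    status: str      # "draft" or "published" (a deliberately simple workflow)
    effective: date


@dataclass
class MasterRecord:
    key: str
    versions: list = field(default_factory=list)

    def add_draft(self, payload, effective):
        """Append a new draft; earlier versions are never overwritten."""
        version = Version(len(self.versions) + 1, dict(payload), "draft", effective)
        self.versions.append(version)
        return version

    def publish(self, number):
        """Sign off a draft so that it becomes a candidate golden copy."""
        self.versions[number - 1].status = "published"

    def golden_copy(self):
        """Latest published version: the only one exported to the warehouse."""
        published = [v for v in self.versions if v.status == "published"]
        return published[-1] if published else None

    def as_of(self, when):
        """Latest version, draft or not, that existed on a given date."""
        candidates = [v for v in self.versions if v.effective <= when]
        return candidates[-1] if candidates else None


hierarchy = MasterRecord("product-hierarchy")
hierarchy.add_draft({"Snacks": ["Crisps"]}, date(2007, 1, 10))
hierarchy.add_draft({"Snacks": ["Crisps", "Nuts"]}, date(2007, 2, 1))
hierarchy.publish(2)

print(hierarchy.golden_copy().number)             # 2
print(hierarchy.as_of(date(2007, 1, 15)).number)  # 1
```

The warehouse would only ever see version 2, but an auditor asking what the hierarchy looked like in mid-January can still be given version 1.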

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state warehouse data. MDM projects should always be considering this issue, and your technology selection should reflect the need for your MDM technology to track versions of master data over time.


The good old days

June 6, 2007

I attended an interesting talk today by Greg Hackett, who founded financial benchmarking company Hackett Group before selling this to Answerthink and “going fishing for a few years”. He is now a business school professor, and has been researching into company performance and, in particular, company failure. Studying the 1,000 largest US public companies from 1960 to 2004 his research shows:

- company profitability is 40% lower in 2004 than in 1960, with a fairly steady decline starting in the mid 1960s
- the average net income after tax of a company in 2004 was just 4.3%
- half of companies were unprofitable for at least two out of five years
- 65% of those top 1,000 companies in 1965 have disappeared since, with roughly half of the 1,000 being acquired and 15% actually going bankrupt.

He gave four reasons for company failure: missing external changes in the market, inflexibility, short term management and failing to use systems that would show warning signs of trouble. What I found most surprising was that the correlation between profitability and stock market performance was zero.

The research suggests that the world is becoming a more competitive place, with pricing pressure in particular reducing profitability despite greater efficiency (cost of goods sold is 67% of turnover, down from 75% in 1960, though SG&A is up from around 13% of turnover to around 18%). All those investments in technology have made companies slightly more efficient, but this has been more than offset by pricing pressure.

I guess this also tells you that holding a single blue chip stock and hanging onto it is a risky business over a very long time; with 15% of companies folding over that 45-year period, it pays to keep an eye on your portfolio.

A key implication is that companies need to get better at implementing management information systems that can react quickly to change and help give them insight into competitive risks, rather than just monitoring current performance. Personally I am unsure that computer systems are ever likely to provide sufficiently smart insight for companies to take consistently better strategic decisions e.g. divesting from businesses that are at risk; even if they did, would management be smart enough to listen and act on this information? It does imply that systems which are good at handling mergers and acquisitions should have a prosperous future. This is one thing, at least, that seems to have a growing future.


BI for everyone?

May 29, 2007

As usual, Philip Howard has some thoughtful comments on the subject of enterprise data warehousing. The recent plethora of data warehouse appliances, pioneered by Netezza but now popping up from companies ranging from start-ups to HP, certainly has the potential to change the data warehouse landscape. However as Philip points out, it is less clear that data warehouse appliances need be connected with “ubiquitous BI”. I have written previously at some length on my view that there is really no obvious reason for the “democratisation of data” i.e. with anyone in the company having unfettered access to corporate data using whizzy reporting tools. Quite apart from whether the tools are really cuddly enough (doubtful) the question rarely asked is why would this vision be necessary or even appropriate? There are certainly people in a company whose job it is to analyse data: they would be, er, analysts. Everyone else pretty much needs a limited set of data to get on with their jobs, and certainly I would be nervous if every factory worker and truck driver in a company decided to spend an hour or two a day investigating corporate data warehouses. A salesman needs a limited set of numbers in a year (“here is your quota”), while I struggle to see why people outside finance or marketing (and only a subset of those) really need to be spending their time wrestling with data at all.

To be sure one class of people benefits from a “BI tool on every desktop”: vendors, both BI vendors and those selling associated databases and hardware. I have yet to read any articles in Harvard Business Review from CEOs complaining that their profits would be higher if only every employee in the company had a BI tool. BI ubiquity seems to me a solution in search of a problem.
