Andy on Enterprise Software

Netezza heads to market

July 20, 2007

The forthcoming Netezza IPO will be closely watched by those interested in the health of the technology space, and the business intelligence market in particular. Netezza has been a great success story in the data warehouse market. From being founded in 2000 its revenues have risen dramatically. Its fiscal year ends in January. Revenues have climbed from $13M in 2004 to around $30M in 2005 to £54M in 2006, to $79.6M in fiscal year ending January 2007. Its revenues in the quarter ending April 2007 were $25M. Hardly any BI vendors can claim this kind of growth rate (other than Qliktech), especially at this scale. Its customer base is nicely spread amongst industries and is not restricted to the obvious retail, telco and retail banking. So, is this the next great software (actually partly hardware in this case) success story?

Before you get too excited, there are some things to ponder. Note that in 2006 Netezza lost $8M despite that steepling revenue rise. In the latest quarter it still lost $1.6M. This is interesting, since conventional wisdom has it that you can only IPO these days with a few quarters of solid profits, yet Netezza has yet to make a dime. Certainly, it would be fair to assume that if it can keep growing at this rate, profit will surely come (at least its losses are shrinking) but the past has showed that profits can be elusive in fast growing software companies. Also, the data warehouse market is certainly healthy, advancing at 9% or so according to IDC projections, but this is well below Netezza’s growth rate. More particularly, Netezza only attacks one slice of the data warehouse market, the high data volume one. If you have a small data warehouse then you don’t need Netezza, so only certain industries will really be happy hunting grounds for appliances like Netezza. This can be seen in the Teradata story, which is Netezza’s true competitor. Teradata has stalled at around $1 billion or so of revenue, growing just 6% last year (of course most of us wish we had this kind of problem). Certainly Netezza can attack Teradata’s installed base, but enterprise buyers are notoriously conservative, and will have to be dragged kicking and screaming to shift platforms once operational. So this to me suggests that there is a ceiling to the appliance market. If true, this means that you cannot just draw an extrapolation of Netezza’s current superb revenue growth. I have not seen this written about elsewhere, so perhaps it is just a figment of my imagination, and Netezza will prove me wrong. However you can look to Teradata to see that even it has entirely failed to enter certain industires, typically business to business industries where data is complex rather than high in volume. Fo example there is scarely a Teradata installation in the oil industry, which fits this category of complex but mostly low volume data (except for certain upstream data).

So, bearing this in mind, what would be a valuation? Well, solid companies like Datamirror are changing hands for 3x revenue or so, though these are companies with merely steady growth rather than the turbo-charged growth demonstrated by Netezza. So suppose we skip the pesky profitability question, accept this is a premium company and went for five times revenues? This would lead to a valuation of $400M on trailing revenues, maybe $500M on this year’s likely revenues. Yet the offer price of the shares implies a market cap of $621M, virtually eight time trailing revenues, and six times likely forward revenues.

This is scarcely a bargain then, though it is a multiple that will bring joy to the faces of other BI vendors, assuming that the IPO goes well. Of course such things are generally carefully judged, and no doubt the silver tongued investment bankers have gauged that they can sell shares at this price. However for me there seems a nagging doubt, based mainly on what I perceive to be this (in my view) effective cap on the market size that appliances can tackle, and to a lesser extent that lack of proven ability to generate profits. The markets will decide.

The performance of Netezza shares will be a very interesting indicator of the capital market’s view on BI vendors, and will show whether enterprise technology is coming in from the cold winter that started in 2001. Anyway, many congratulations to Netezza, who have succeeded in carving out a real success story in the furrow that for so long was owned by Teradata.

Postscript. On the first day of trading, no one seems troubled about any long term concerns.

del.icio.us:Netezza heads to market  digg:Netezza heads to market  reddit:Netezza heads to market  Y!:Netezza heads to market

Gazing Behind the Data Mirror

July 18, 2007

I have been digging a little deeper into the Data Mirror purchase by IBM that I wrote about yesterday.

It’s a good deal for IBM, and not only because the price was quite fair. With its Ascential acquisition IBM positioned itself directly against Informatica, yet Ascential’s technology did not have the serious real-time replication that is important for the future of ETL, and this is what Data Mirror does have. DataMirror gives IBM a working product with heterogeneous data source support in real time, giving IBM an important piece in the puzzle to achieve their vision for real-time operational BI and event-awareness.

A bigger question is whether IBM fully understands what it has bought and whether it will properly exploit it. Data Mirror’s strengths were modest pricing, low-impact installation, neutrality of sources it supports and performance doing this (via its log-scraping abilities and speed of applying changes). IBM must keep their eye on the development ball to ensure these aspects of the DataMirror technology are continued if it is to really exploit its purchase. For example, on the last point, the partnerships DataMirror has with Teradata and Netezza and Oracle should be continued, despite the obviously temptation to snub rivals Oracle and Teradata.

Any acquisition creates uncertainty amongst staff, and IBM needs to move beyond motherhood reassurance to show staff that it understands the DataMirror technology and business and wants to see it thrive and grow. It needs to explain how the DataMirror technology fits within a broader vision for real-time integration in combination with traditional batch oriented ETL, business intelligence and enterprise service bus (not just MQSeries) integration or else the critical technical and visionary people will dust off their resumes and start looking elsewhere.

I gather that IBM has already announced an internal town hall meeting next week, at which it needs to convince key technical staff that they have a bright future within the IBM family. I also hear that no hiring freeze has been imposed, which implies they are making the decision of growing the business, which should reassure people. IBM is an experienced company which will recognise that the true IP of a company is not in libraries of code but in the heads of a limited number of individuals, and no doubt will recognise the need to retain and motivate critical staff. It used to be poor at this (think about the brilliant technology it acquired when it bought Metaphor many years ago, but bungled the follow-up) but has got smarter in recent years e.g. I hear from DWL people that they have been treated well.

Hopefully IBM’s more recent and happier acquisition experiences will be the case here.

del.icio.us:Gazing Behind the Data Mirror  digg:Gazing Behind the Data Mirror  reddit:Gazing Behind the Data Mirror  Y!:Gazing Behind the Data Mirror

Mirror, mirror on the wall, who is most blue of them all?

July 17, 2007

on Monday IBM announced it would buy DataMirror, a Canadian software company. Data Mirror made its living by selling software that detects change in data sources and then managing replication. It differed from other ETL technology in being designed from the ground up to work in real-time rather than batch, which made it well suited to some customer situations, and the software was modestly priced. The technology was also used by some customers for backup and business continuity reasons. It had a large customer base (well over 2,000).

For IBM the acquisition adds some solid technology to its data warehouse offering and its “on demand” strategy, in this case replacing Powerpoint promises with something that actually works. Datamirror was publicly traded on the Toronto stock exchange. It did $46.5 million in revenue last year and was hoping for $55 million in fiscal year 2008, so this was a company that was delivering solid though unspectacular growth, though its share price had doubled in the last twelve months. IBM’s price of $162 million is over three times trailing revenues and so is a healthy valuation for the company, and a small premium to its stock market valuation of last week.

del.icio.us:Mirror, mirror on the wall, who is most blue of them all?  digg:Mirror, mirror on the wall, who is most blue of them all?  reddit:Mirror, mirror on the wall, who is most blue of them all?  Y!:Mirror, mirror on the wall, who is most blue of them all?

If all you have is a hammer…

June 29, 2007

Claudia Imhoff raises an important issue in her blog regarding the cleansing of data. When dealing with a data warehouse it is important for data to be validated before being loaded into the warehouse in order to remove any data quality problems (of course, ideally you would have a process to go back and fix the problems at source also). However, as she points out, in some cases e.g. for audit purposes, it is actually important to know what the original data actually was, not just a cleansed version. This issue gets at the heart of a vital issue surrounding master data, and neatly illustrates the difference between a master data repository and a data warehouse.

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through different versions before producing a “golden copy”, which would be suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of things like drafts of budget plans, which go through various iterations before a final version is agreed. This is quite apart from actual errors in data, which are all too common in operational systems. An MDM application should be able to mange the workflow of such processes, and have a repository that is capable of going back in time and tracking the various versions, not just the finished golden copy. A good MDM repository should allow you to track back through master data as it is “improved” over time, not just look at the golden copy. The golden copy only should be exported to the data warehouse, where data integrity is vital.

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state warehouse data. MDM projects should always be considering this issue, and your technology selection should reflect the need for your MDM technology to track versions of master data over time.

del.icio.us:If all you have is a hammer...  digg:If all you have is a hammer...  reddit:If all you have is a hammer...  Y!:If all you have is a hammer...

I see a tall dark stranger in your future….

June 17, 2007

There is an interesting article in CIO Insight by Peter Fade, a professor of marketing at the top-rated Wharton Business School. in this he discusses the limitations of data mining, and it is an article that anyone contemplating investing in this technology should read carefully. I set up a small data mining practice when I was running a consulting division at Shell, and found it a thankless job. Although I had an articulate and smart data mining expert and we invested in what at the time was a high quality data mining tool, we found time and again that it was very hard to find real-world problems where the benefits of data mining could be shown. Either the data was such a mess that little sense could be made of it, or the insights shown by the data mining technology were, as Homer Simpson might say, of the “well, duh” variety.

Professor Faber argues that in most cases the best you can hope for is to develop simple probabilistic models of aggregate behaviour, and you simply cannot get down to the level of predicting individual behaviour using the level of data that we typically have, however alluring the sales demonstrations may be. Moreover, such models can mostly be built in Excel and don’t need large investments in sophisticated data mining tools.

While I am sure there are some very real examples where data mining can work well e.g. why some groups of people are better credit risks than others, the main point he makes is that the vision of 1-1 marketing via a data mining tool is a fantasy, and that the tools have been seriously oversold. Well, that is something that we in the software industry really do understand. We all want technology to provide magical insights into a messy and complex world that is hard to predict. Unfortunately the technology at present is generally as useful as a crystal ball when it comes to predicting individual behaviour. Yet there is still that urge to go into the tent and peer into the mists of the crystal ball in search of patterns.

del.icio.us:I see a tall dark stranger in your future....  digg:I see a tall dark stranger in your future....  reddit:I see a tall dark stranger in your future....  Y!:I see a tall dark stranger in your future....

A new twist to appliances

May 18, 2007

I wondered what Foster Hinshaw would get up to after he left Netezza, and now we know. He has set up the rather awkwardly named Dataupia, a data warehouse appliance with a difference. It is an important difference, as his appliance runs on Oracle rather than on a proprietary database like Netezza. It will also run on DB2 or SQL Server, for that matter. You just plug in MPP capable hardware to take advantage of the appliance. This is important, since having a proprietary database brings with it not only a certain amount of cost and new skills required, but also makes conservative corporate buyers nervous. If you are a Telco with really vast amounts of transaction data then this trade off may be worthwhile, as indeed can be seen in Netezza’s considerable success, but if you could get much of the benefit (and this is unclear since at this stage there are no comparative performance figures) while still running on your existing mainstream database, this would sooth the nerves of corporate CIO types who might otherwise try and block the introduction of a new database. Just as importantly, it allows existing data warehouse applications to be able to claim appliance like performance boosts. While the vast bulk of data warehouses today are custom built, this ought to be of interest to true data warehouse applications such as Kalido, which could presumably easily run on top of Dataupia’s appliance.

I think this is a very interesting development, assuming that the new product delivers on its promise. The market for an appliance capable of running on a mainstream database platform ought to be much broader than the set of applications that currently addressed by hardware appliances (or even software-based ones with their own database like Kognitio).

del.icio.us:A new twist to appliances  digg:A new twist to appliances  reddit:A new twist to appliances  Y!:A new twist to appliances

EII - dead and now buried

April 27, 2007

The most widely publicised piece that I wrote was “EII Dead on Arrival” back in July 2004. Metamatrix was the company that launched the term on the back of heavy funding from top end VCs, and I wrote previously about what seemed to me to be its almost inevitable struggles. There was some controversy over my article, which differed from the usual breathless press coverage which was associated with EII at the time (our industry does love a new trend and acronym, whatever the reality may be). I could never see how it could work outside a very limited set of reporting needs. Well, as they say on Red Dwarf: “smug mode”.

Gravity finally caught up with marketing hype this week, and Metamatrix will be bought by Red Hat and made into open source. It would have been interesting to know what the purchase price was, but Red Hat were keeping quiet about that. It is a fair bet that it was not a large sum of money. Kleiner Perkins won’t be chalking this up as one of their smarter bets.

del.icio.us:EII - dead and now buried  digg:EII - dead and now buried  reddit:EII - dead and now buried  Y!:EII - dead and now buried

Philosophy and data warehouses

March 23, 2007

Database expert Colin White wrote a prvocative article the other day:

http://www.b-eye-network.com/blogs/business_integration/archives/2007/03/what_has_busine.php

in which he ponders whether a data warehouse is really needed for business intelligence. This is an interesting question; after all, why did we end up with data warehouses in the first place rather than just querying the data at source? (which is surely a simpler idea). There seem to me to be a few reasons:

(a) Technical performance. Early relational databases did not like dealing with mixed workloads of transaction update and queries, as locking strategies caused performance headaches.

(b) Storage issues. Data warehouses typically need several years of data, whereas transaction systems do not, so archiving transaction after a few months has a performance benefit and may allow use of cheaper storage media.

(c) Inconsistent master data between transaction systems (who owns “product” and “customer” and “asset” and the like) means that it is semantically difficult to query systems across departments or subsidiaries. Pulling the data together into a data warehouse and somehow mashing it into a consistent structure fixes this.

(d) You may want to store certain BI related data e.g. aggregates or “what-if” information that is useful purely for analysis and is not relevant to transaction systems. A data warehouse may be a good place to do this.

(e) People have trouble finding existing pre-built reports, so having a single place where these live makes re-use easier.

(f) Data quality problems in operational systems mean that a data warehouse is needed to allow data cleansing before analysis.

I think that you can make a case that technology is making some strides to address certain of these areas. In the case of (e), the application of Google and similar search mechanisms (e.g. FAST) to the world of BI may reduce or eliminate problem (e) altogether. Databases have become a lot more tolerant of mixed workloads, addressing problem (a), and storage gets cheaper, attacking problem (b). It doesn’t seem to me that you necessarily have to store what-if type data in a data warehouse, so maybe (d) can be tackled in other ways. Even problem (f), while a long way from being fixed, at least has some potential now that some data quality tools are allowing SOA-style embedding within operational systems, thus holding out the possibility of fixing many data quality issues at source.

If we then take all the master data out of the data warehouse and put in into a master data repository would this not also fix (c)? Well, it might, but regrettably this discipline is still in its infancy, and it seems to me that plucking data out of transaction systems into specific hubs like a “customer hub” or a “product hub” may not be improvoing the situation at all, as indeed Coln acknowledges.

Where I differ from Colin is on his view that a series of data marts combined with a master data store may be the answer. Since data marts are subject specific by definition, they may address a certain subset of needs very well, but cannot address enterprise-wide analysis. This type of analysis can only be done by something with access to potentially all the data in an enterprise, and be capable of resolving master data issues across the source systems. Here a data warehouse in conjunction iwth a master data store makes more sense to me than a series of marts plus a master data store - why perpetuate the issues? I have no problem if the data marts are dependent i.e. generated from the warehouse e.g. for convenience/performance. But as soon as they are maintained outside a controlled environment you come back to problem (c) again.

Sadly, though some of the recent technical improvements point the way to the solution of problems (a) through (f), the reality on the ground is a long way off allowing this. For example the data quality tools could be embedded via SOA into operatonal systems and linked up to a master data store to fix master data qualit issues, but how many companies have done this at all, let alone across more than a pilot system? Master data repositories are typically still stuck in a “hub mentality” that means they are at best, as Colin puts it, “papering over the cracks of master data problems”. Moreover most data warehouses are still poorly designed to deal with historical data and cope with business change.

Hence I can’t see data warehouses going away any time soon. Still, it is a useful exercise to remind ourselves why we built them in the first place. Questioning the meaning of existence is called ontology, which ironically has now been adopted as a term by computer science to mean a data model that represents concepts within a domain and the relationship between those concepts. We seem to have come full circle, a suitable state for the end of the week.
Have a good weekend.

del.icio.us:Philosophy and data warehouses  digg:Philosophy and data warehouses  reddit:Philosophy and data warehouses  Y!:Philosophy and data warehouses

Just Singing The Blues

March 21, 2007

There was a curious piece of “analysis” that appeared a few days ago in response to IBM’s latest data warehouse announcments:

http://www.intelligententerprise.com/showArticle.jhtml;jsessionid=ZOETRZHF0SWBMQSNDLOSKH0CJUNN2JVN?articleID=198000675

In this gushing piece, Current Analysis analyst James Kobelius says: “IBM with these announcements becomes the premiere data warehousing appliance vendor, in terms of the range of targeted solutions they provide”. So, what were all these new innovative products that appeared?

Well, IBM renamed the hideous “Balanced Configuration Units” (you what now?) to “Balanced Warehouse”, a better name for sure. Also in the renaming frame was “Enterprise Class” being renamed to “E Class” (hope they didn’t spend too many dollars on that one). In fact the only supposedly “new” software at all that is apparent is the OmniFind Analytics Edition. The analysis credits this as a new piece of software, which will come as a surprise to many of us with memories longer than a mayfly e.g. the following announcement of Ominifind 8.3 is on the IBM website dated December 2005:

http://www-306.ibm.com/common/ssi/fcgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS205-342

In fact the whole release seems to be around repackaging and repricing, which is all well and good but hardly transports IBM to some new level it wasn’t at, say, a week ago.

Let’s not forget about “new services” such as “implementation of an IBM data warehouse” - well, that certainly was something that never crossed IBM’s Global Services mind before last week. Now, I’m not a betting man, but I would be prepared to wager a dollar that IBM have a contract with Current Analysis - any takers against?

The excellent blog “The Cranky PM” does a fine job of poking fun at the supposedly objective views of analyst firms that are actually taking thick wads of dollars from the vendors hat they are analysing e.g.

http://www.crankypm.com/analysts_whores/index.html

I wonder what she would make of this particular piece of insight?

del.icio.us:Just Singing The Blues  digg:Just Singing The Blues  reddit:Just Singing The Blues  Y!:Just Singing The Blues

Deja vu all over again

March 16, 2007

There is some good old fashioned common sense in an article by John Ladley in DM Review:

http://www.dmreview.com/article_sub.cfm?articleId=1078962

where he rightly points out that although most companies are now on their second or third attempt at data warehouses, they seem not to have learnt from the past and hence seem doomed to repeat prior mistakes. Certainly a common thread in my experience is the desire of IT departments to second guess what their customers need and end up making life unnecessarily hard for themselves. If you ask a customer how long he needs access to his detailed data he will say “forever” and if you ask how real time it needs to be of then of course he would love it to be instantaneous on a global basis. What is often not presented is the trade off: “well, you can have all the data kept forever, but the project costs will go up 30% and your reporting performance will be significantly worse than if you can make do with one year of detailed data and summaries prior to this”. In such a case the user might well change his view on how critical the “forever” requirement was.

This disconnect between corporate IT departments and the business continues to be a wide one. I recently did some work for a global company where a strategy session was to decide the IT architecture to support a major MDM initiative. None of the business people had even bothered to invite the internal IT department, such was the low regard in which it was held. Without good engagement between IT and business data warehouse projects will struggle, however up to date the technology used is.

Mr Ladley is also spot on regarding data quality - it is always much worse than people imagine. “Ah, but the source is our new SAP system so the data is clean” is the kind of naive comment that many of us will recognise. At one project at Shell a few years ago it was discovered that 80% of the pack/product combinations being sold in Lubricants were duplicated somewhere else in the Group. At least that could be partially excused by a decentralised organisation. Yet it also turned out that of a commercial customer database of 20,000 records, only 5,000 were truly unique, and this was in one operating company. Imagine the fun with deliveries and invoice payment that could ensue.

Certainly data warehouse projects these days have the advantage of more reliable technology and fuller function products than ten years ago, meaning less custom coding is required than used to be the case. However the basic project management lessons never change.

del.icio.us:Deja vu all over again  digg:Deja vu all over again  reddit:Deja vu all over again  Y!:Deja vu all over again