Andy on Enterprise Software

BI on demand

October 18, 2006

I wrote recently about the emergence of software as a service as one of the few bright spots in enterprise software at present.  With perfect timing, today a vendor came along and announced a software as a service offering in the business intelligence field.  Celequest is a start-up and it is early days to see how well this idea takes off, but this is certainly an interesting development.  Celequest has the credibility of being run by Diaz Nesamoney, who was a founder of Informatica, and is backed by VCs Bay Partners and Lightspeed Ventures, who both have long track records.  The company was set up in 2002, and has some good customers like Citigroup, Cendant and Brocade, though it is not clear from the company’s website what scale these deployments are.  The application covers dashboards, analytics and data integration technology.  As far as I am aware the company uses an in-memory database “appliance”, though from what I can gather the volume of data dealt with by this application so far is modest.  However this is not the point, and volumes will no doubt increase over time as the concept gains acceptance.  Celequest has made an astute partnership with salesforce.com, with a bridge to AppExchange.  There is also a connector to SAP.

Certainly, there are barriers to the widespread acceptance of this approach.  Large enterprises will be naturally conservative about the idea of letting their data out of the corporate firewall, particularly when it is key performance data of the type that BI applications use.  It is also unclear what sort of scale issues come into play when data is being accessed from beyond the corporate network.  However for many companies, and especially SMEs, such issues will seem less important than the convenience of being able to deploy a business intelligence solution without the usual hassle of complex software installation and an army of systems integrators.  No doubt where Celequest has begun to tread, others will follow, and it will be a healthy new area of competition in the business intelligence industry.

 

 

 

 


Oracle buys Sunopsis

October 11, 2006

It has just been announced that Oracle has bought Sunopsis, one of the few remaining independent ETL vendors.  Since Oracle’s existing ETL tool (the rather inaccurately named “Data Warehouse Builder”) is pretty weak, this makes a lot of sense for Oracle.  I suspect that their statement about “integrating” the two tools will involve much use of the delete key for the Warehouse Builder code.  Sunopsis has a good product; it is a French company that had been around for some time but had recently made more visible market progress in the US.  No numbers are public, but my information is that Sunopsis revenues were about USD 10M and the purchase price was just over USD 50M, which at a price/sales ratio of over five makes a quite healthy price for the company.  Sunopsis was 80% owned by the founder, who had spurned venture capital, so this is very good personal news for him also.

Sunopsis made a virtue of using the DBMS functions where possible rather than re-inventing transformation code, so is particularly compatible with Oracle (or other relational databases). This deal should also put paid to the loose marketing relationship Oracle had with Informatica. 

In my view this is a rare case where the deal is good for both companies.  Oracle finally gets a decent ETL capability and Sunopsis gets Oracle’s massive sales channel. 


Marketing blues

September 28, 2006

My prize for the most creative marketing jargon of the week goes to IBM, who announced that they now consider their offerings to be a “third generation” of business intelligence.  Come again?  In this view of the world, first generation BI was mainframe batch reporting, while the second generation was data warehousing and associated BI tools like Cognos, Business Objects etc.  So, as you wait with bated breath for the other shoe to drop, what is the “new generation”?  Well, it would seem that this should include three things:

(a) pre-packaged applications

(b) focus on the access and delivery of business information to end users, and support both information providers and information consumers

(c) support access to all sorts of information, not just that in a data warehouse.

Well (a) this is certainly a handy definition, since IBM just happens to provide a series of pre-built data models (e.g. their banking data model) and so (surprise) would satisfy the first of these criteria.  It is in fact by no means clear how useful such packages are outside of a few specific sectors that lend themselves to standardisation.  Once you take a pre-existing data model and modify it even a little (as you will need to) then you immediately create a major issue for how you support the next vendor upgrade.  This indeed is a major challenge that customers of the IBM banking model face.  Nothing in this paper talks about any new way of delivering these models e.g. any new semantic integration and versioning capability.

Criterion (b) is essentially meaningless since any self-respecting BI tool could reasonably claim to focus on information consumers.  After all, the “universe” of Business Objects was a great example of putting user-defined terminology in front of the customer rather than just presenting tables and columns.  Almost any existing data warehouse with a decent reporting tool could claim to satisfy this criterion.

On (c) there is perhaps a kernel of relevance here, since there is no denying that some information needs are not always kept in a typical data warehouse e.g. unstructured data.  Yet IBM itself does not appear to have any new technology here, but merely is claiming that DB2 Data Joiner allows links to non-DB2 sources. All well and good, but this is not new. They haven’t even done something like OEM an unstructured query product like Autonomy, which would make sense.

Indeed all that this “3rd generation” appears to be is a flashy marketing label for IBM’s catalog of existing BI-related products.  They have Visual Warehouse, which is a glorified data dictionary (now rather oddly split into two separate physical stores) and scheduling tool, just as they always have.  They talk about ETI Extract as an ETL tool partner, which is rather odd given their acquisition of Ascential, which was after all one of the two pre-eminent ETL tools, and given ETI’s near-disappearance in the market over recent years.  They have DB2, which is a good database with support for datatypes other than numbers (just like other databases).  They also have some other assorted tools like Vality for data quality.

All well and good, but this is no more and no less than they had before.  Moreover it could well be argued that this list of tools actually misses several points that could be regarded as important for a “next generation” data warehouse architecture.  The paper is oddly silent on the connection between this and master data management, which is peculiar given IBM’s buying spree in this area and its direct relevance to data warehousing and data quality.  There is nothing about time-variance capabilities and versioning, which are increasingly important.  What about the ability to handle a federation of data warehouses and synchronise these?  What about truly business model-based data warehouse generation and maintenance?  How about the ability to be embedded into transactional systems via SOA?  What about “self discovery” data quality capabilities, which are starting to appear in some start-ups?

Indeed IBM’s marketing group would do well to examine Bill Inmon’s DW 2.0 material, which while not perfect at least has a decent go at setting out some of the capabilities which one might expect from a next generation business intelligence system.

There is no denying that IBM has a lot of technology related to business intelligence and data warehousing (indeed, its buying spree has meant that it has a very broad range indeed).  Yet there is not a single thing in this whitepaper that constitutes a true step forward in technology or design.  It is simply a self-serving definition of a “3rd generation” that has nothing to do with limitations in current technology or new features that might actually be useful.  Instead it just sets out a definition which conveniently fits the menagerie of tools that IBM has developed and acquired in this area. To put together a whitepaper that articulates how a series of acquired technologies fits together is valid, and in truth this is what this paper is.  To claim that it represents some sort of generational breakthrough in an industry is just hubris, and destroys credibility in the eyes of any objective observer.  This is by no means unique in the software industry, but is precisely why software marketing has a bad name amongst customers, who are constantly promised the moon but delivered something a lot more down to earth.

I suppose when presented with the choice of developing new capabilities and product features that people might find useful, or just relabelling what you have lying around already as “next generation”, the latter is a great deal easier.  It is not, however, of any use to anyone outside a software sales and marketing team.

 

 


Back from the dead

September 11, 2006

Those of you with long memories will recall that the first three real ETL vendors were Prism, Carleton and ETI.  The others were acquired but ETI survived as an independent company, though with an ever-diminishing profile.  Early this year they were apparently down to just one salesman, but having the US Department of Defense as a customer does wonders for maintenance revenue.  In recent years the company had been pared down to a minimum, and I had assumed that, like an old soldier, they might just fade away.  However in the summer the original investors were bought out and a new capital injection happened in a USD 6.5 million round from investors Appian Ventures of Denver, Access Venture Partners and Osprey Ventures, and a $5M line of credit was negotiated with Comerica bank.  Consequently the company is effectively born again, with new money, owners and management, but with established technology.

ETI’s software used to be strong at dealing with extracting data from esoteric sources, generating code against things like COBOL copybooks and assorted mainframe file systems, as well as having the usual transformation capabilities.  It suffered from being rather complex to use and from some weak marketing.

So, the interesting question is whether this old warhorse can be dusted off, repainted and revitalised.  Judging by the seven vice presidents that have appeared in the management ranks, the new board is not afraid to spend some of that new money.  They have also licensed in some data quality offerings from a couple of small British companies, which is a logical step to broaden the product range from just ETL.  This is important because ETL on its own is a tough market, as ETI has discovered.  More and more ETL functionality is being thrown into the database (MSFT with SSIS - previously DTS - and Oracle with the poorly named Warehouse Builder tool, which is really an ETL tool) which makes it hard work to persuade a customer to buy your technology.  Only Sunopsis has really made much headway here in recent times, with a clever pitch built around using rather than competing with the database capabilities.  Other pure plays like Sagent have withered and died.  Informatica is really the only ETL player of size left standing, and they have broadened their appeal by going for a wider integration message.  So what has changed that will allow ETI to flourish now when it clearly has not done for some time?  Perhaps the new investors have noticed a flurry of companies being bought out in the data quality space recently and so can see a fairly quick exit, perhaps there is just too much venture capital around, or maybe they have more ambitious plans for the company.

ETI certainly has some well proven technology, and its foray into data quality looks logical.  Good luck to them.  Yet relaunching a company is hard work, and it will take some impressive sales and marketing execution to breathe new life into this particular body.


Darwin and data warehouse projects

September 4, 2006

Sreedhar Srikant writes about the importance of the logical data model in a data warehouse project in DM Review. This well-written article describes the process of building a model, highlights five pitfalls and suggests some ways of avoiding them.  It is in this last area that I feel the article could be enhanced.  In my experience there are two serious dangers that a data warehouse project faces that go beyond issues of project problems like trouble agreeing on a model.  These are:

(a)   The project gets insufficient business buy-in through lack of a well-articulated and robust business case

(b)   The project takes too long to deliver, making it vulnerable to budget cutbacks since it has not shown tangible benefit early enough.

I am constantly surprised how often IT projects in major corporations seem to get off the ground without a strong business case.  IT projects compete for capital in a company with many other project proposals, and so it can be a Darwinian process when times get tough: projects with the strongest business case and sponsorship will survive.  As a minimum, the project needs to set out the expected returns that it will make, set against the project costs, over a three year (sometimes five year) period.  A simple example is shown below:

Costs:  $3M one-off, $2.16M annual

Benefits: $5M from year 2 onwards, with $2M only in year 1.

In this instance the project costs $3M to deliver and just over $2M to support each year (a Data Warehouse Institute survey showed that the average data warehouse costs 72% of its build costs to support every year).  Against this are some benefits as shown.  In this instance the project is tolerably attractive, since it has a positive net present value ($536k using a typical 18% discount rate) and a decent 27% IRR, though its payback period is a little slow.  However, while not stellar, it is respectable as cases go, and it is at least written in the language of business.
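For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation.  It rests on assumptions that are mine rather than stated above: the $3M build cost falls in its own period before the first year of operation, the case is evaluated over three operating years, and every cash flow is discounted from the end of its period (the convention used by spreadsheet NPV functions).  Under those assumptions the figures quoted above are reproduced; the function names are purely illustrative.

```python
def npv_end_of_period(rate, flows):
    """Net present value with each flow discounted from the end of its period."""
    return sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(flows))


def irr(flows, low=0.0, high=1.0, tol=1e-7):
    """Internal rate of return by bisection.

    Assumes the NPV is positive at `low` and negative at `high`.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if npv_end_of_period(mid, flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# All figures in $M.
BUILD_COST = 3.0
SUPPORT_COST = 0.72 * BUILD_COST  # 72% of build cost per year = 2.16

# Assumed timing: build period, then three years of operation.
flows = [
    -BUILD_COST,          # build
    2.0 - SUPPORT_COST,   # year 1: $2M benefit only
    5.0 - SUPPORT_COST,   # year 2
    5.0 - SUPPORT_COST,   # year 3
]

npv_musd = npv_end_of_period(0.18, flows)
irr_rate = irr(flows)
print(f"NPV at 18%: ${npv_musd * 1000:.0f}k, IRR: {irr_rate:.0%}")
```

Changing the discount rate or stretching the list of flows to five years shows quickly how sensitive such a case is to its assumptions, which is exactly the conversation you want to have with a business sponsor.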

What might the project benefits be?   These will vary from project to project and from industry to industry, but examples might include either profit-enhancing benefits, such as reduced customer churn or improved pricing ability, or cost reductions such as fewer misplaced deliveries due to improved data quality, or better procurement margins due to improved understanding of supplier spend.  In order to articulate these you need to find a business sponsor, preferably one who has a problem related to poor information.  Trust me; you should not have to look too far in a big company for one of these.

Having a business case that is properly set out will act as a safety net when project reviews happen, and reduce the chances of a project being cancelled when the knives come out. 

The second thing that can help your project is to deliver something tangible early.  Traditional waterfall methodologies, often used by large systems integrators, are not always well suited to data warehouse projects, where requirements are often rather loose.  The average data warehouse project takes 16 months to deliver according to TDWI, and that is a long time in this turbulent world when management has its budgets adjusted and people are looking for projects to cut.  If your project can deliver something meaningful quickly i.e. a piece of the overall problem, then your project sponsor has a lot better chance of defending the project.  If all the review committee can see is costs then things will be harder.  Many real projects I have been involved with have been killed in this way.

One way to improve your odds of delivering something quickly is to use a data warehouse package, where at least some of the functionality is already pre-built for you.  Packages may or may not be cheaper than custom-build, but they should be quicker.  If you can pick off a chunk of the project and deliver reports back to the sponsor that add value early on then the project is much more likely to survive than one that is still delivering a grand enterprise logical data model. These days there are several packaged or semi-packaged alternatives to custom build.  A good overview of the packaged data warehouse market was done this year by Bloor and can be downloaded for free here.  

By developing a robust business case and by delivering benefits iteratively your project greatly improves its chances of survival.  When the budget sharks come circling, it is nice to have a life raft.

 

 

 


Diverse data warehouse approaches

August 18, 2006

There seem to be a few debates going on about data warehouse architectures at present e.g. one on William McKnight’s blog.  I think that the increasing number of alternative approaches available is actually a sign of two things.  One is that the problem that data warehousing seeks to address has by no means been solved: people do not have access to the high quality information they need in a consistent or timely fashion.  The second is that there is increasing innovation in the area: witness the rise of packaged data warehouses, EII tools and data warehouse appliances in recent years.  It is all a lot more complicated than Inmon v Kimball.

So what have we learnt? Firstly, there are some approaches that just don’t work well.  For a while in the 1990s there was a school of thought that data marts were sufficient without a central warehouse, and this seems to be pretty well debunked.  Just joining up point to point transaction system data via specific data marts results in a potentially vast set of unmanaged data marts, which do nothing to resolve inconsistency between systems at the enterprise level.  Related to this, selling “analytic apps”, which are basically data marts with a specific data model hard wired and some reports on top, does not work either.  The data model always needs modification to the specifics of the customer, and as soon as you do that you no longer have a package but a series of custom-built (or at least custom modified) marts again.  Informatica found this out the hard way until sensibly withdrawing from this flawed approach.

I think it is also clear that EII only has a limited place in a BI architecture.  The pioneer here, Metamatrix, had flat (and modest) sales last year and, moreover, half its customers use it only against one data source: hardly a wild success.  EII does not address issues of data quality or storing data historically, so at best can be only a partial solution.

Within the data warehouse approach I feel that it is important to understand the different types of usage patterns.  In particular, some types of reporting are very operational in nature, and are best served either by reports directly against an individual transaction system (here EII may have a role) or via an operational data store, essentially a straight copy of data into a separate database.  An ODS avoids queries interfering with operational processing, one of the issues with EII.  ERP vendors have started to provide ODS solutions e.g. SAP BW, but don’t confuse these with a full-function enterprise warehouse. These do well in ODS roles, less well when dealing with a wide set of data sources.  The narrower the scope of the report you need, the better suited it is to an ODS (or EII).  The broader the scope (or if it needs historical data), the better suited it is to a data warehouse.  Having a series of ODSs feeding into an enterprise warehouse is a sensible approach. 

However to satisfy reporting needs that span multiple transaction systems, or which deal with historical data, you really need a data warehouse of some sort.  The choice here is widening.  You can now buy packaged, or at least semi-packaged data warehouses from a number of sources.  See the report from Bloor on this market, which you can download in full here.  It has to make sense to buy functionality rather than building it, since it will be quicker and ultimately cheaper.  Data marts can still be part of the picture, but should be dependent i.e. generated from the warehouse.  In this way they stay in line with changes in the source systems.  If you have a very high volume of data, as happens in some industries like retail, telco and retail banking, then you can now choose from a range of data warehouse appliances, even an open source one, if you don’t fancy Teradata, which was the pioneer and is still the leader in this area.  An alternative to a single giant warehouse is to have federated data warehouses, each feeding up one or more layers to regional or global warehouses.  This approach is offered by Kalido and deployed at companies like Unilever and BP.

It is also becoming clearer that, to make the most of a data warehouse, you will want your master data to be as high quality as possible.  A master data repository can act as the hub for improving data quality across the enterprise, and is complementary to the warehouse (indeed, it can be a source for the warehouse, and also to an enterprise bus in more ambitious deployments). The rise of interest in master data management presents a lifeline to data quality vendors, who have been steadily disappearing. Even here there are new approaches in the form of start-ups like Exeros and Zoomix.

Finally, data warehouses can become as real-time as necessary, given sufficient work.  Few BI requirements are truly real-time, but for those that are you can satisfy them either by embedding reporting directly in the transaction system, via EII, an ODS or even by drip-feeding data into an enterprise warehouse.  For example Kalido has an interesting one of these in a financial services setting, where the data appears just ten seconds after changes to the core transaction systems. 

The continuing thirst for better information, and a realisation that few companies have got it right yet, is causing increasing innovation in all these areas: packaged warehouses, appliances, EII, MDM, data quality.  This is a long way from a mature market.  


Me 2 for Teradata MDM

August 7, 2006

Teradata has taken on I2’s MDM solution as its own, moving forward from a previous partner relationship to one where they license the code.  Indeed Teradata has hired I2 staff responsible for the I2 MDM product, so this is more a technology acquisition than a regular partnership. For I2 this will effectively remove it from the MDM space, but in actual fact Teradata is a more logical home for this technology.  I2 has struggled recently, with its share price at $14.30 today barely more than half that of last September.  Since I2 is very much a supply chain vendor, the general purpose MDM technology it has was a somewhat ungainly fit. For them this move is about concentrating on their core supply chain competence.

Teradata’s strong data warehouse offering has a much more natural fit to MDM, as indeed does any data warehouse vendor.  It makes sense for a data warehouse to sit alongside a separate master data repository, as I have written about previously.  Since I2’s MDM technology had a decent reputation (though rather limited customer deployments) Teradata’s strong channel and more natural technology fit should gain better traction for it, as well as plugging a gap in Teradata’s offerings.

 

 

 

 


Beware Googles bearing gifts

August 3, 2006

If you go into Google you can find most things remarkably quickly amongst the vastness of the internet, so why can’t you find your sales data inside a large company?  This perfectly reasonable question has prompted some BI vendors to team up with Google in order to put a Google search front end onto enterprise data.  Sound too good to be true?  Sadly I fear that it is.  The search capabilities of Google are superb at searching for keywords on websites, and enable you to quickly zero in on what you are looking for provided you can make your search keywords precise enough.  Unfortunately the same technique does not translate well to the semantic nuances of enterprise data, where finding a database with “price” data in it does not give you sufficient context (which price: for which product, under which commission scheme, on what date, within which sales area, etc?).  Moreover a search engine does not yet generate the SQL to get the wretched data out of the corporate databases where the answers lie.  Hence to put a Google front end on to a BI tool you are probably going to have to run a bunch of reports, give them some tags and publish them as web pages - Google will certainly be able to deal with that, but then is this so much better than just picking the report you want out from a list anyway?

Andrew Binstock writes a useful article about this in Infoworld but perhaps glosses over the magnitude of the problem in terms of finding answers to data on an ad hoc basis.  Indeed early implementations essentially throw the problem back to a BI tool, which generates results in a form that a search front end like Google can use.  Usually this is not the biggest problem anyway, as it is easy enough to put menus together with the top 20 or whatever regular reports for users to choose from.  I can see a real use for this when the sheer number of canned reports gets out of hand though.  If you have thousands of reports to trawl through then having a search front end could be genuinely useful.  But the lack of semantic understanding of enterprise definitions will make it just as hard for a search tool to make any sense of a mass of numbers as a BI tool, which relies on either some form of front end semantic layer (as Business Objects uses) or assumes the existence of a data warehouse where the semantic complexity has been pre-resolved into a single consistent form.  As the article correctly points out, the only way to fix this is through better metadata.  Indeed, greatly improved master data definitions could find a further use as tags to help search engine front-ends make more sense of large numbers of pre-built corporate reports. Unfortunately the nirvana of ad hoc access to corporate data via an intuitive search front-end seems to me no closer than before.

What is certainly true is that the BI vendors can use these Google front-ends to make pretty demos to try and sell more software.  However they do run a hidden danger in doing so.  Given that at present the BI vendors compete partly through the ease of use of their graphical interfaces, by handing over the user interface to Google they may be in danger of commoditising part of their competitive advantage.  If you have a simple search front-end, who knows whether the report originally came from Business Objects, Cognos, or Information Builders?  I wonder whether the BI vendors have really thought through the danger to their own businesses that this seemingly innocent search front end could become.  By jumping on the Google bandwagon they could be unleashing something that removes their direct contact with the end user, a key element in differentiation.

 


Retail Therapy

August 2, 2006

William McKnight writes an interesting article in DM Review this month looking at business intelligence trends in retail.  As he says, retailers ideally need to track promotions and customer buying trends, yet are often poorly served by management information systems.  One reason for this is the sheer volume of data: if you consider every item on a till receipt as a transaction, then each store will be putting through many thousands of transactions per day. A convenience store might have 20,000 stock keeping units (SKUs) but a department store might have over 200,000.  In addition a retailer will want to keep track of customers who have loyalty cards, will be concerned about space optimisation and stock control.  The more astute retailers vary the stock on their shelves in accordance with buying patterns they have observed: clearly the customer profile in mid morning is different to just after schools finish, for example.  One Japanese chain reckons to change the stock profile on its shelves seven times per day.

To get this kind of insight you need a high quality, robust data warehouse that is able to handle large data volumes and keep up with rapid change. However one thing that I learnt when working on a Shell retail project a few years ago was that you can make life easier for yourself by quickly archiving the transaction detail.  A category manager may want to do basket analysis for a few days, but beyond that is interested in trends, which can be satisfied by aggregate information.  This allows you to rapidly archive the high volume transaction data and so keep the data in the warehouse to manageable levels. 
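As a rough illustration of that archiving idea, the Python sketch below keeps till-receipt lines for a short window and rolls anything older up into daily totals per store and SKU.  The function name, the tuple layout and the seven-day window are all hypothetical choices made for the example; they are not taken from the Shell project or from any particular product.

```python
from collections import defaultdict
from datetime import date, timedelta


def roll_up_old_detail(lines, today, keep_days=7):
    """Split receipt lines into recent detail and daily aggregates of older lines.

    Each line is a tuple: (sale_date, store, sku, quantity, value).
    Lines older than `keep_days` are summed per (sale_date, store, sku).
    """
    cutoff = today - timedelta(days=keep_days)
    recent = []
    aggregates = defaultdict(lambda: [0, 0.0])  # key -> [quantity, value]
    for sale_date, store, sku, quantity, value in lines:
        if sale_date >= cutoff:
            recent.append((sale_date, store, sku, quantity, value))
        else:
            totals = aggregates[(sale_date, store, sku)]
            totals[0] += quantity
            totals[1] += value
    return recent, dict(aggregates)


# Illustrative data only.
today = date(2006, 8, 2)
lines = [
    (date(2006, 8, 1), "S1", "milk", 2, 1.5),
    (date(2006, 7, 1), "S1", "milk", 1, 0.75),
    (date(2006, 7, 1), "S1", "milk", 3, 2.25),
    (date(2006, 7, 1), "S1", "bread", 1, 1.0),
]
recent, aggregates = roll_up_old_detail(lines, today)
print(len(recent), "detail lines kept;", len(aggregates), "aggregate rows")
```

The detail rows support basket analysis while they are fresh; once rolled up, only the aggregates need to stay in the warehouse for trend reporting, and the detail can go to cheaper archive storage.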

Good BI in retail can have a major effect on profits, as I have written about before.  The dawn of RFID, though still in its infancy, will further extend the possibilities for more elaborate analysis, though my suspicion is that most retailers are barely scratching the surface of what can be done today using the current technology.


Open source appliances

August 1, 2006

The early success of Netezza has not only prompted other “me too” database appliance players like DataAllegro but also now an open source variant.  Greenplum has combined its open source database with Sun hardware to come up with an appliance of its own.  This is an interesting move that makes sense for Greenplum, since Sun obviously has a far larger sales channel than itself and so is a potentially powerful partner.  The article in Information Week is rather too breathless in its description of a DBMS + some hardware as an “instant data warehouse” however.  This is nonsense: a DBMS plus some hardware is no more “an instant data warehouse” than a copy of SQL Server and a PC is (and at least SQL Server has some functionality directly useful to data warehousing in it). If all you had to do was supply a DBMS to have a data warehouse then there would be a lot of unemployed systems integrators and consultants.

However, leaving aside the inability of the journalist to see beyond the words of the company press release, this is an astute move by Greenplum, which can use Sun’s credibility to reassure nervous enterprise customers who may otherwise be twitchy about entrusting their data warehouse data to an open source platform.  However any prospect should be careful to check the performance characteristics of their application to see how well the Bizgres DBMS stacks up, since just throwing hardware at a problem can be an expensive thing to do.  But from a customer perspective it is good to have another choice to compare with Teradata and Netezza if you have an ultra-high transaction volume problem.

 
