Andy on Enterprise Software

Psst, want a free business modelling tool?

February 20, 2008

Regular readers of this blog may recall that I mentioned the Kalido business modelling tool that was out with Kalido’s new software release. At TDWI Las Vegas yesterday Kalido launched this formally, and made it available for free download. There is also an on-line community set up to support this, in which as well as tool discussion, participants can share and collaborate on business models.

This seems a smart move to me, as by making the tool available for free Kalido will get some publicity for the tool that it would otherwise not get, and of course if people get hooked on the tool then they might wonder: “hey, maybe I could try connecting it up and building a warehouse” at which point, as the saying goes, a sales person will call. This follows the well-proven drug-dealer technique of giving away a free hit of something in order to lure you on to something more powerful and even more addictive in due course.

Business modelling does not get the attention it deserves, so the on-line forum could prove very interesting. The ability to share and improve models with others could turn out to be very appealing to those involved with projects of this nature; after all, essentially it is a source of free consultancy if the forum develops.

Visit http://www.kalido.com/bmcf to download a copy of the tool.
To join the community visit http://groups.google.com/group/bmcf

A Sideways Glance

February 19, 2008

Vertica is one of the plethora of vendors which have emerged in the analytics “fast database” space pioneered by Teradata and more recently opened up by Netezza. The various vendors take different approaches. Some (e.g. Netezza) have proprietary hardware, some (e.g. Kognitio, Dataupia) are software only, some (e.g. ParAccel) rely mainly on in-memory techniques, others simply use different designs from the traditional designs of the mainstream DBMS vendors (Oracle, DB2).

Vertica (whose CTO is Mike Stonebraker of Ingres and Postgres fame) is in the last camp. Like Sybase IQ (and Sand) it uses a column-oriented design (i.e., it groups data together by column on disk) rather than the usual row-oriented storage used by Oracle and the like. This approach has a number of advantages for query performance. It reduces disk I/O by only having to read the columns referenced by the query and also by aggressively compressing data within columns. Through use of parallelism across clusters of shared-nothing computers, Vertica databases can scale easily and affordably by adding additional servers to the cluster. Normally the drawback to column-oriented approaches is their relatively slow data load times, but Vertica has some tricks up its sleeve (a mix of in-memory processing which trickle feeds disk updating) which it claims allow load times comparable to, and sometimes better than, row-oriented databases. Vertica comes with an automated design feature that allows DBAs to provide it with the logical schema, plus training data and queries, which it then uses to come up with a physical structure that organizes, compresses and partitions data across the cluster to best match the workload (though ever-wary DBAs can always override this if they think they are smarter). With a standard SQL interface Vertica can work with existing ETL and business intelligence tools such as Business Objects, and has significantly expanded the list of supported vendors in its upcoming 2.0 release.
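To make the column-oriented idea concrete, here is a minimal Python sketch. It is a toy illustration only, not Vertica's actual storage engine: the table, the column names and the choice of run-length encoding as the compression scheme are all assumptions made for the example.

```python
from itertools import groupby

# Hypothetical toy table: (customer_id, region, call_minutes)
rows = [
    (1, "East", 12),
    (2, "East", 7),
    (3, "East", 30),
    (4, "West", 5),
    (5, "West", 18),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["customer_id", "region", "call_minutes"])

# A query such as SUM(call_minutes) only touches one column here,
# whereas a row layout would have to read every field of every row.
total_minutes = sum(columns["call_minutes"])

# A sorted, low-cardinality column collapses into a handful of pairs.
encoded_region = run_length_encode(columns["region"])
```

The same two effects (reading fewer columns, and compressing each column on its own) are what the paragraph above credits for the reduced disk I/O.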

With so many competing vendors all claiming tens of times better performance than others, the measure that perhaps matters most is not a lab benchmark but customer take-up. Vertica now has 30 customers such as Comcast, BlueCrest Capital Management, NetworkIP, Sonian Networks and LogiXML, and with its upcoming 2.0 release out on 19/2/2008 is doing joint roadshows with some of these. It has done well in Telcos, who have huge data volumes in their call detail records databases. Two deployed Vertica customers have databases approaching 40 TB in size. Another area is financial services, where hedge funds want to backtest their trading algorithms against historical market data. With one year’s worth of US financial markets data taking up over 2 TB, this can quickly add up, and so Vertica has proved popular amongst this community, as well as with marketing companies with large volumes of consumer data to trawl through. Vertica runs on standard Linux servers, and it has a partnership with HP and Red Hat to provide a pre-bundled appliance, which is available from select HP resellers.

With solid VC backing, a glittering advisory board (Jerry Held, Ray Lane, Don Haderle, …) and genuine customer traction in an industry long on technology but short on deployed customers, Vertica should be on every vendor short-list for companies with heavy duty analytical requirements which currently stretch performance limits and budgets.

A Lively Data Warehouse Appliance

February 15, 2008

DATAllegro was one of the earlier companies to market (2003) in the recent stampede of what I call “fast databases”, which covers appliances and other approaches to speedy analytics (such as in-memory databases or column-oriented databases). Initially DATAllegro had its own hardware stack (like Netezza) but now uses a more open combination of storage from EMC and Dell Servers (with Cisco InfiniBand Interconnect). It runs on the well proven Ingres database, which has the advantage of being more “tuneable” than some other open databases like MySQL.

The database technology used means that plugging in business intelligence tools is easy, and the product is certified for the major BI tools such as Cognos and Business Objects, and recently Microstrategy. It can also work with Informatica and Ascential Datastage (now IBM) for ETL. Each fast database vendor has its own angle on why its technology is the best, but there are a couple of differentiators that DATAllegro has. One is that it does well in situations of mixed workloads, where as well as queries there are concurrent loads and even updates happening to the database. Another is its new “grid” technology, which allows customers to deal with the age-old compromise of centralised warehouse v decentralised data marts. Centralised is simplest to maintain but creates a bottleneck and creates scale challenges. However de-centralised marts quickly become un-co-ordinated and can lead to lack of business confidence in the data. The DATAllegro grid utilises node-to-node hardware transfer to allow dependent copies of data marts to be maintained from a central data warehouse. With transfer speeds of up to 1 TB a minute (!) claimed, such a deployment allows companies to have their cake and eat it. This technology is in use at one early customer site, and is just being released.

DATAllegro has set its sights firmly at the very high end of data volumes, those encountered by retailers and telcos. One large customer apparently has a live 470 TB database implementation, though since the company is very coy about naming its customers I cannot validate this. Still, this is enough data to give most DBAs sleepless nights, so it is fair to say that this is at the rarefied end of the data volume spectrum. This is territory firmly occupied by Teradata and Netezza (and to a lesser extent Greenplum). The company is tight-lipped about numbers of customers (and I can find only one named customer on its website), revenues and profitability, making it hard to know what market momentum is being achieved. However its technology seems to me to be based on solid foundations, and there is a large installed base of Teradata customers for it to attack. Interestingly, Oracle customers can be a harder sell, not because of the technology but because of the weight of stored procedures and triggers that customers have in Oracle’s proprietary extension to the SQL standard, making porting a major issue.

If only DATAllegro can encourage more customers to become public then it will be able to raise its profile further and avoid being painted as a niche vendor. Being secretive over customer and revenue numbers seems to me self-defeating, as it allows competitors to spread fear, uncertainty and doubt: sunlight is the best disinfectant, as Louis Brandeis so wisely said.

Shameless Self Promotion

February 11, 2008

There is an MDM whitepaper which you can download (free with registration) from the Bloor website:

https://www.bloor-research.com/research/white_paper/908/master_data_management.html

It is a high level overview of the MDM market, and discusses general trends and issues rather than getting into vendor specifics; it does include a new high level functionality model for MDM products. Unsolicited feedback thus far has included:

“Very comprehensive and detailed”
“Great Job”
“Very well written”
“Right on the money”
“One of the best papers I have read on MDM”.

Of course, I may be a little biased, but it may be worth a look…

Thanks to some readers of this blog who provided feedback on my early drafts; much appreciated.

Peeking at Models

February 7, 2008

With its latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps Powerpoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating whether relationships defined in the drawing make sense as rules are added to the model.

Once the model is developed and validated, it can be directly applied to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means that there is no necessity to keep the design diagrams in sync with the model; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from that. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases these involve hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here: http://www.kalido.com/resources-multimedia-center.htm

In addition to the business modeller Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has for a long time had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards also into the ETL and data modelling world.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology for years. There is now automatic staging area management. This simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than an ETL philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. As a warehouse is deployed then new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie using your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom built warehouse or SAP BW.
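The “as is” versus “as was” distinction, and the idea of a rewind, can be sketched in a few lines of Python. This is purely illustrative and says nothing about how Kalido implements it: the data, the function names and the approach of keeping a dated list of hierarchy changes are all assumptions of mine.

```python
from datetime import date

# Hypothetical history: from a given date, a product belongs to a region.
assignments = [
    (date(2007, 1, 1), "Widget", "Europe"),
    (date(2007, 1, 1), "Gadget", "Europe"),
    (date(2007, 7, 1), "Gadget", "Asia"),  # Gadget is reorganised mid-year
]
sales = [("Widget", 100), ("Gadget", 50)]


def hierarchy_as_of(assignments, when):
    """Return {product: region} using the latest assignment on or before `when`."""
    current = {}
    for effective, product, region in sorted(assignments):
        if effective <= when:
            current[product] = region
    return current


def sales_by_region(sales, hierarchy):
    """Total the sales figures under whichever hierarchy is supplied."""
    totals = {}
    for product, amount in sales:
        region = hierarchy[product]
        totals[region] = totals.get(region, 0) + amount
    return totals


def rewind(assignments, when):
    """Discard every structural change made after `when`."""
    return [entry for entry in assignments if entry[0] <= when]


as_was = sales_by_region(sales, hierarchy_as_of(assignments, date(2007, 3, 31)))
as_is = sales_by_region(sales, hierarchy_as_of(assignments, date(2008, 2, 7)))
rewound = rewind(assignments, date(2007, 3, 31))
```

Here the “as was” view reports all sales under Europe, the “as is” view splits them between Europe and Asia, and the rewound history no longer contains the mid-year change at all.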

This is a key technology release for Kalido, a company that has a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Informatica prospers

February 6, 2008

Informatica shrugged off the US financial services troubles with a strong quarter. Revenue was USD 114M, up 24% year on year, operating profit of USD 25M up 47%. Licence revenue was up 28%, with ten deals over USD 1 million in size.

Informatica continues to prosper as the leading independent ETL tool. IBM’s acquisition of Ascential is rumoured not to have been one of its happier ones, and certainly it seems to have done Informatica no harm at all.

The MDM Blues

January 31, 2008

After living in denial for some time, IBM have got the “multi domain” message about MDM which I have been bleating on about at length for years. They have just announced a repackaging of their MDM offerings under the banner “IBM Infosphere MDM Server”. This puts IBM firmly on the path of a server architecture that can deal with multiple types of MDM data in a consistent manner, not just customer and product but all the many other kinds of master data e.g. location, asset, contract, brand, financial profile and so on. IBM has been sensibly enabling their MDM offerings in an SOA context, and MDM Server comes with 800 pre-packaged SOA services that can be invoked. IBM has bought high quality MDM technology and now at last has a strong vision of how to bring it all together.

However it is worth emphasising that this is a roadmap. For now there will remain the separate CDI hub technology (bought from DWL) and the PIM Hub technology (bought from Trigo). Over time these technologies will be integrated with common services, but this is a multi-release strategy. It is great news that IBM has finally realised that multi-domain is the right way to go, but prospects and customers need to reassure themselves about whether the roadmap meets their time horizons.

Nowhere to hide

January 30, 2008

A Computerworld article highlights the risks that enterprise buyers run in an age of vendor consolidation. In this case the article talks about Peoplesoft and Oracle, but the point is a general one. Just how anxious should software buyers be about their vendor being acquired?

I would argue that the vendor risk issue is frequently overplayed. You may “never get fired by buying IBM” but I recall when IBM dropped its “strategic” 4GL ADF for CSP in the late 1980s, leaving plenty of seriously large customers in the lurch (I worked for Exxon at the time, which had standardised on ADF). There is a risk in any software purchase, not only about whether the vendor will go bust at some point, but as to whether the vendor will continue to maintain and enhance the particular product you are buying. People often agonise about buying software from small vendors, but in the case of a company with one product in their portfolio, you can at least be sure that they will care a lot about that product. An industry giant may have ultra-solid finances, but can decide to drop a product line if it does not do well commercially, or for other internal reasons, as in the IBM example I mentioned. There are numerous other cases e.g. SAP MDM was dumped in favour of a new product based on acquired technology from A2i just a couple of years ago, while Oracle has plenty of “prior” in abandoning acquired product lines that did not meet its view of the world.

I believe that buyers should look at a few things in terms of risk. Look beyond the finances of the vendor to the installed base of the particular product they are buying. A product with hundreds or thousands of enterprise customers is likely to live a lot longer than one with a few. Moreover what is the growth trajectory of the customer base? A fast growing customer base will very likely receive continued investment, either internally in the case of an industry behemoth, or externally from venture capital firms in the case of smaller companies. The situation to be wary of, whatever the vendor size, is where there is a small customer base that is not growing. This should set alarm bells ringing. Of course vendors may be very coy about revealing figures, but you can for example try and talk to the chairman of a product user group to get a sense of how well the customer base is growing; a user group with shrinking numbers of attendees would be a worrying sign.

Above all, customers need to ensure that their investment has a clear and rapid payback. If you spend a million dollars in licences, with 20% annual support and 4 million in services putting it in, you should be able to stack up on the other side of the balance sheet the benefits that you are expecting to see. If the benefit case has a payback period of (say) a year, then it is less of an issue to worry about whether the vendor will be around in ten years’ time. If you have a choice between a mediocre product from a “safe” vendor and a much more productive product from a smaller, riskier vendor, then you should be able to quantify what the difference in productivity is worth to you. If the better, riskier technology saves you millions of dollars a year and pays back in eight months v the alternative, then what sense does it make to accept an inferior technology that will actually cost you many millions in poor productivity, however “safe” it may be?
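The payback arithmetic is simple enough to put in a few lines of Python. This is a rough sketch under stated assumptions: the licence, support and services figures are the ones from the example above, while the annual benefit figures are invented for illustration, and treating support as a running cost netted off the benefit is my own simplification.

```python
def payback_months(licence, support_rate, services, annual_benefit):
    """Months until cumulative net benefit covers the up-front cost.

    Annual support (licence * support_rate) is treated as a running cost
    netted off the benefit. Returns None if the benefit never covers it.
    """
    upfront = licence + services
    net_monthly = (annual_benefit - licence * support_rate) / 12
    if net_monthly <= 0:
        return None
    return upfront / net_monthly


# Figures from the text: 1M licences, 20% support, 4M services.
# The annual benefit values below are hypothetical.
productive_product = payback_months(1_000_000, 0.20, 4_000_000, 7_700_000)
mediocre_product = payback_months(1_000_000, 0.20, 4_000_000, 2_700_000)
no_payback = payback_months(1_000_000, 0.20, 4_000_000, 150_000)
```

With these made-up benefit figures the productive product pays back in about eight months and the mediocre one in about two years, while a benefit smaller than the support bill never pays back at all. The point is not the particular numbers but that the comparison can be quantified.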

As discussed earlier, very few product lines are completely safe anyway, given the tendency of vendors to cull non-performing product lines and encourage “migration” to newer (read “profitable”) products. If you have a fast enough payback then you can be philosophical about a migration a few years down the road. It all comes down to rigorous cost benefit analysis of the software life-cycle, sadly something all too few customers pay proper attention to.

Broadening Information Access

January 15, 2008

I saw an interesting demo today from Endeca, which bills itself as an “information access” company. Of course every self-respecting BI company would describe itself in a similar way, but Endeca’s technology is quite different in approach from BI vendors. If you build a data warehouse and then add BI reporting to it, you quickly realise that “ad hoc” reporting by end-users is fine on the prototype with a few hundred records, but less amusing if there are a few hundred million records involved. Hence in real life aggregates are pre-calculated, predefined reports are carefully tuned and cubes (e.g. with Cognos Powerplay or similar) are built on common subsets of data that the users are likely to want. There is always a careful trade-off between flexibility and performance. Moreover the unstructured world of documents and emails is pretty much a separate dimension, however much in reality the context of a business transaction may be described by those emails and documents rather than what is stored in the sales order system.

Endeca has a proprietary database engine which is designed to combine both structured and unstructured data in a flexible way. The MDEX engine does not just store metadata such as hierarchies and structures, but also master data such as lists of product codes. It also indexes documents and emails from corporate systems (there are a series of adaptors with the technology). The technology makes much use of in-memory searches and caching to optimise performance. Some of the implementations can be large and complex: one deployed pensions system has 800 million records, while an electronic parts application deployed has 20,000 distinct attributes.

An example of such a system that resonated with me was a “human capital” demo which was based on the idea of a consultancy practice manager. A screen was shown allowing filtering on a range of areas e.g. consultants’ billing rates, availability, location etc. So far this looked just like the kind of thing you could prepare with a BI tool e.g. you could select consultants available in the next two weeks, with a billing rate of such and such, etc, and the list of consultants would dynamically refresh. No big deal. However the next filter was “all consultants based within x miles of Detroit”; the consultant records had been tagged with geocodes and the engine calculated distances from this information. Next a query was made to find all those who also spoke French, this information not being a database index but something buried away in the consultants’ resumes i.e. in unstructured document form. Good luck writing SQL to handle these kinds of filters!
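To give a feel for what such a combined filter involves, here is a small Python sketch. It is my own illustration rather than anything Endeca showed: the consultant records are invented, the distance uses the standard haversine great-circle formula, and the “speaks French” test is a naive substring search where a real engine would use a proper text index.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8
DETROIT = (42.3314, -83.0458)  # latitude, longitude


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Hypothetical records: structured geocodes plus a free-text resume.
consultants = [
    {"name": "Ann", "lat": 42.2808, "lon": -83.7430,   # Ann Arbor
     "resume": "Fluent in French and German."},
    {"name": "Bob", "lat": 41.8781, "lon": -87.6298,   # Chicago
     "resume": "Speaks French."},
    {"name": "Cat", "lat": 42.3223, "lon": -83.1763,   # Dearborn
     "resume": "SQL and ETL specialist."},
]


def nearby_with_skill(consultants, centre, max_miles, keyword):
    """Names of consultants within max_miles of centre whose resume mentions keyword."""
    return [
        c["name"]
        for c in consultants
        if miles_between(c["lat"], c["lon"], *centre) <= max_miles
        and keyword.lower() in c["resume"].lower()
    ]


matches = nearby_with_skill(consultants, DETROIT, 50, "French")
```

Only Ann passes both tests: Bob speaks French but is in Chicago, and Cat is close by but has no French in her resume. Doing this at scale, across hundreds of millions of records and arbitrary documents, is of course the hard part.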

There are plenty of situations where this mix of structured and unstructured information is important, and Endeca has prospered as a company from this dawning realisation. The company has doubled its revenue for five years in a row, and in Q4 2007 did USD 30 million in revenue, two-thirds of this in software licences. With a strong base of retail customers such as Tesco and Walmart, other verticals strongly represented include government, with customers such as the FBI, CIA and NASA, financial services e.g. ABN Amro, and manufacturing e.g. Boeing, Schlumberger. There are now 500 enterprise customers in all.

The recent acquisition of arch-competitor FAST by Microsoft demonstrates how this market is increasingly recognised as key by the industry giants. While there are plenty of competitors out there the only others in the current Gartner Leaders quadrant for this market are FAST, IBM (with Omnifind) and Autonomy, which is much more established in unstructured enterprise search. Endeca has set an impressive pace of growth, and it seems to me that there are plenty of situations in other verticals e.g. healthcare, that could suit its technology.

Data quality whining

January 14, 2008

The data quality market is a paradoxical one, as I have discussed before. There is a plethora of vendors, yet few have revenues over USD 10 million. Despite this track record of marginalisation, more are popping up all the time. I am aware of 26 separate data quality vendors today, and this excludes the data quality offerings that have been absorbed into larger vendors such as SAS (DataFlux), Informatica (Similarity Systems), IBM (Ascential Quality Stage) and Business Objects (First Logic). Assuming that you care about data quality at all (and too few do) then how do you go about selecting one?

Well, one thing the industry has done itself no favours over is its confusing and technical terminology (if you don’t think terminology that the buyer understands matters, ask French and German wine producers about why Australian and other wine producers are drinking their lunch). A data quality tool may cover several stages:

discovery
profiling
matching
enrichment
consolidation
monitoring

and let’s just take one stage: matching. Vendors with data matching technology use a variety of techniques to match up candidate data records. These include:

heuristic matching (based on experience)
probabilistic (rules based)
deterministic (based on templates)
empirical (using dictionaries)

and this is not a comprehensive set. I saw an interesting technology today from Netrics which uses a different (patented) matching technology based on “bipartite graphs” (which in fact looked very impressive). How is an end-user buyer to make any sense of this maze? Certainly different data classes may demand different approaches, e.g. customer name and address data is highly structured and may suggest a different approach from much less structured or more complex data (such as product data, or asset data).

I am not sure of the merits of introducing something like a TPC/A benchmark for data quality (such benchmark exercises are tricky to pin down and vendors make great efforts to “game” them). However it would seem that it would not be that hard to take some common data quality issues, set up a set of common errors (transposed letters, missing letters or numbers, spurious additional letters or common misspellings) and try to match some of these up to a sample dataset in a way that compared the various algorithmic approaches, or indeed directly comparing the effectiveness of vendor products. By ensuring that different data types (not just customer name and address) are covered, such an approach may not result in a single “best” approach or product but show where certain approaches shine and others are less well suited. This in itself would be useful information for potential buyers, who at present must try to set up such bake-off comparisons themselves.
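A bake-off of this kind is not hard to mock up. The Python sketch below is an assumption-laden toy, not a proposed industry benchmark: the sample company names are invented, the three error types are the ones listed above, and the two “products” being compared are simply an exact lookup and the standard library’s `difflib.get_close_matches`.

```python
import difflib
import random

# Hypothetical clean reference data.
CLEAN = ["Acme Industrial Supplies", "Northwind Traders",
         "Globex Corporation", "Initech Systems"]


def transpose(word, rng):
    """Swap two adjacent characters."""
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def drop(word, rng):
    """Remove one character."""
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def insert(word, rng):
    """Add one spurious lowercase letter."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i:]


ERRORS = [transpose, drop, insert]


def exact_match(dirty, clean_values):
    """Baseline matcher: succeeds only on an identical string."""
    return dirty if dirty in clean_values else None


def fuzzy_match(dirty, clean_values):
    """Similarity-based matcher using difflib's ratio score."""
    hits = difflib.get_close_matches(dirty, clean_values, n=1, cutoff=0.6)
    return hits[0] if hits else None


def score(matcher, clean_values, trials_per_value=20, seed=0):
    """Fraction of corrupted values that the matcher maps back correctly."""
    rng = random.Random(seed)
    correct = total = 0
    for value in clean_values:
        for _ in range(trials_per_value):
            dirty = rng.choice(ERRORS)(value, rng)
            correct += matcher(dirty, clean_values) == value
            total += 1
    return correct / total


exact_score = score(exact_match, CLEAN)
fuzzy_score = score(fuzzy_match, CLEAN)
```

Because both matchers are scored against the same seeded set of corrupted values, the two scores are directly comparable. Swapping in product data or asset data for `CLEAN`, and other matchers for `fuzzy_match`, would show where each approach shines.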

In the absence of any industry-wide benchmarks, each potential customer must set up their own benchmarks and attempt to navigate through the maze of arcane terminology, approaches and large number of vendors themselves each time. Such complexity of terminology must increase sales cycles and cause the data quality industry to be less appealing to buyers, who may just give up and wait for a larger vendor to add data quality as a feature (possibly in a manner that is sub-optimal for their particular customer needs).

Consider the wine analogy. If you buy a French wine you must navigate the subtleties of region, village, grower and vintage. For example I am looking right now at a bottle with the label “Grand Vin de Leoville Marquis de Las Cases St Julien Medoc Appellation St Julien Controlee 1975” (it is from Bordeaux, but actually omits this from the label). Alternatively I can glance over to a (lovely) Italian wine from Jermann with the label “Where Dreams have No End”. Both are fine wines, but which is more likely to appeal to the consumer? Which is more inviting? The data quality industry has something to learn about marketing, in my view, just as the French wine industry has.
