Andy on Enterprise Software

Searching for meaning

September 6, 2007

I have written before about how many industry surveys can be almost meaningless due to the way that they are phrased, or the way that the audience is selected or encouraged to participate. Sometimes the survey itself can miss the point, as in a recent one about the percentage of data that is structured or unstructured. An article about this agonises over whether unstructured data is 31% of all enterprise data, or 50-odd percent, rather than the “80% claimed by other research organisations”. It seems to me that this misses the point. What proportion of data is unstructured matters less than the value and usage of that data (and, by the way, does “proportion” mean the storage volume, or the number of sources, or something else? The article blithely skips over this).

The context here is the use of search technology for BI, with people who sell this technology presumably wanting to make the point that most data sits in emails and spreadsheets, so search technology can mostly replace that pesky BI business. This seems to me a flawed argument. In the context of business information, we typically know what we want, e.g. the monthly sales figures, and, unlike when we search the web using Google, we also have a fair idea where it is, e.g. the company financial systems. The difficulty is not in finding the information but in making it meaningful, which is what the vast majority of effort in data warehousing and BI is all about. Unlike a search for a video clip of an episode of “Heroes”, or for a particular book on Amazon, the difficulty is that many ambiguous answers exist.

Books are a nice analogy, as the world discovered long ago the sense of putting a unique ISBN on books to avoid ambiguity (it turns out not to be quite unique, but for most purposes it is). This is not the case in large enterprises with a “sales figure”, which in fact will exist not only in the official corporate finance system, but in several other “proper” systems, and in endless spreadsheets to which the information has been downloaded and possibly manipulated for various purposes. Indeed, trying to build a meaningful and useful classification scheme around data is what master data management, and much of data modelling, is all about.

Imagine the fun Amazon would have in finding a book in a world with no ISBNs, and where rival publishers regularly published identical titles, some even from the same author. This is more like the world that BI deals with. Indeed, if there were one single place where “sales data” lived, and if everyone agreed on exactly which sales data that was (the whole company’s, just Europe’s, with or without indirect sales?) then the world of BI would be a simple place and data warehouse developers could pack up and learn a new trade. This reality seems to have eluded some vendors plugging BI search, and indeed some of the industry writers. It is almost irrelevant what “percentage” of data is unstructured, semi-structured, or structured. In the imperfect world of enterprise data, a high proportion of the important data suffers from the persistent problem of ambiguous classification and multiple copies, with processes that do not perfectly control replication of that data. It is a world that Google Search can shake an uncomprehending stick at all it likes, but it seems to me likely to have only a limited impact. Until enterprises get a real grip on the life cycle of information management and put processes in place to properly classify master data and allow for its update and distribution (don’t hold your breath), the world of BI won’t be replaced by a search icon.
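
To make the ambiguity concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the record layout, the source names, the scope labels and the numbers are invented for illustration and do not come from any real system. It only shows that a search on the label “monthly sales” finds every copy, while a single answer needs an agreed classification, which is the master data part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    name: str     # the label a keyword search would match
    source: str   # hypothetical system holding this copy
    scope: str    # hypothetical classification: which sales are included
    value: float  # made-up number, for illustration only


# Three copies of "the" sales figure, as the post describes: the official
# finance system, another "proper" system, and a downloaded spreadsheet.
FIGURES = [
    Figure("monthly sales", "corporate finance system", "whole company, incl. indirect", 120.0),
    Figure("monthly sales", "regional order system", "Europe only", 45.0),
    Figure("monthly sales", "analyst spreadsheet", "whole company, excl. indirect", 97.5),
]


def search(name):
    """Search-style lookup: return every record whose label matches."""
    return [f for f in FIGURES if f.name == name]


def lookup(name, scope):
    """Lookup that also requires an agreed scope, so at most one record fits."""
    matches = [f for f in FIGURES if f.name == name and f.scope == scope]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} candidates for {name!r} with scope {scope!r}")
    return matches[0]


if __name__ == "__main__":
    # Finding the data is easy; the search returns all three candidates.
    for hit in search("monthly sales"):
        print(hit.source, "|", hit.scope, "|", hit.value)
    # Only the agreed classification narrows it to one meaningful answer.
    print(lookup("monthly sales", "Europe only").value)
```

The search succeeds every time; it simply cannot say which of the three numbers is the one you meant.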
