Andy on Enterprise Software

Lies, damned lies, and Excel formulae

November 14, 2005

I made a discovery the other day. Not one of those “eureka” moments beloved of bathing Greeks, but something that prompted me to wonder about the accuracy of some of the figures we take for granted. We are so used to Excel on every desktop that we trust it implicitly, and so when I needed to work out the standard deviation of some figures, I naturally turned to Excel. For those of you whose maths is rusty, standard deviation measures how spread out a set of numbers is. For example: the average of 1,3,5,7,9 is 5, and so is that of 3,4,5,6,7, but it can be seen that the latter sequence has its numbers more closely bunched. Standard deviation is just a mathematical measure of how close or otherwise that bunching is (in the examples above the standard deviation of the first set is 2.83 and that of the second set is 1.41, i.e. the second set is more closely bunched than the first).

In Excel to use a function you just type into a cell something like “=average(1,3,5,7,9)” and magically you get the answer (5 in this case). So, what could be easier to do than type in:
“=stdev(1,3,5,7,9)” and see the answer appear? The trouble is that it doesn’t. The answer pops up as 3.16, not 2.83 as I was expecting. Just in case you doubt my ability to calculate a standard deviation, feel free to do it the old-fashioned way by hand, and you will see that you get 2.83, not 3.16, so what is going on? After digging around the Excel help and chatting to a mathematician friend to check I was not going completely mad, I discovered that there are actually two different standard deviation functions in Excel: one for when the figures you have are the whole population you want to measure, and one for when the figures are only a sample from which you want to estimate the spread of a larger population. The latter has a slightly different formula (it divides by n - 1 rather than n). Now I may be getting a bit slow these days, but I did do a maths degree and yet this distinction had eluded me all these years, so I doubt I’m the only person out there unaware of this difference. If you were the person at Microsoft naming Excel functions, which do you think people would take to be the “normal” version: “STDEV” or “STDEVP”? The latter is what they actually named the function that calculates the standard deviation of a whole population. I am guessing that not too many of us go “aha, I’ll try =STDEVP, I expect that will be it”.
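The two results are easy to reproduce outside Excel. What follows is a minimal sketch in Python, not Excel's own implementation; the function names `population_sd` and `sample_sd` are made up for illustration. The only difference between the two is whether the sum of squared differences from the mean is divided by n or by n - 1.

```python
import math


def population_sd(values):
    """Spread of a whole population: divide by n (what Excel calls STDEVP)."""
    mean = sum(values) / len(values)
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diffs / len(values))


def sample_sd(values):
    """Estimate from a sample: divide by n - 1 (what Excel calls STDEV)."""
    mean = sum(values) / len(values)
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diffs / (len(values) - 1))


wide = [1, 3, 5, 7, 9]
narrow = [3, 4, 5, 6, 7]

print(round(population_sd(wide), 2))    # 2.83, the figure worked out by hand
print(round(sample_sd(wide), 2))        # 3.16, the figure =stdev(...) returns
print(round(population_sd(narrow), 2))  # 1.41
```

For the first set the squared differences add up to 40, so the population version is the square root of 40/5 = 8, and the sample version is the square root of 40/4 = 10.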

Now this may seem like a lot of fuss about an esoteric mathematical function, but be aware that standard deviation is one of the most commonly used statistical functions, used to look at samples of population, mechanical failure rates, delivery errors, temperatures, patient response rates, you name it. People take serious decisions based on statistics: which drug to put forward for clinical trials, traffic planning, machine maintenance and endless others; standard deviation is the most commonly used tool in “statistical process control”, widespread in the manufacturing industry. Given that most of the modern world uses Excel, I find it pretty surprising that a sizeable proportion of the world has been using the wrong standard deviation function for the last twenty years, all because some idiot in Seattle chose a “precise” name rather than the obvious name that most of us would have chosen.

I suppose this is only what I should have expected from a product that thinks that 1900 is a leap year. Try the formula “=DATE(1900,2,29)” and watch it happily display the 29th February 1900. As we should all be aware after the fuss over Y2K, 1900 is NOT a leap year (leap years are every four years, except centuries, which are not, except every fourth century, which is; so 1600 and 2000 are leap years, but not 1700, 1800 or 1900). The moral of this little story: don’t take everything on trust!
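The leap year rule in the brackets above fits in one line of code. This is a small sketch in Python; `is_leap_year` is a name chosen here for illustration, and the final check uses the standard library's `datetime.date`, which rejects the date that Excel accepts.

```python
import datetime


def is_leap_year(year):
    """Every fourth year, except centuries, except every fourth century."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1600, 1700, 1800, 1900, 2000):
    print(year, is_leap_year(year))

# Python's own date type refuses to build 29 February 1900.
try:
    datetime.date(1900, 2, 29)
    accepted = True
except ValueError:
    accepted = False
print("29 Feb 1900 accepted:", accepted)
```

Only 1600 and 2000 come back as leap years.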

A Halloween Tale

October 31, 2005

It’s a busy week in the master data management world, with big scary monsters out in the night and eating up smaller prey. We have seen Tibco acquire Velosel, and just today SAP acquire moribund EII vendor Callixa, apparently for its “customer data integration efforts”. I’m not quite sure what potion SAP have been imbibing recently, but I could have sworn that they recently abandoned their own MDM offering, which after two years of selling into their massive user base had managed just 20 sites, and bought vendor A2i in order to replace this gooey mess with a new master data management offering based on A2i’s technology. Perhaps those with crystal balls available as part of their costume for their Halloween party this evening could inquire through the mists as to how buying a second vendor in the space matches up with the coherent vision of master data management that SAP is presumably trying to portray? At the moment this seems as clear to me as pumpkin soup.

Every vendor worth its salt now seems to be under the MDM spell, with hardly a week going by without a niche player getting gobbled up by one of the industry giants. Yet I continue to be surprised by the disjointed approach that many have taken, tackling the two most common areas, customer and product, with separate technologies. Sure, CDI and PIM grew up independently, but there are many, many other kinds of master data to be dealt with in a corporation, e.g. general ledgers, people data, pricing information, brands and packaging data, and manufacturing data, to name just a few. One of our customers, BP, uses KALIDO MDM to manage 350 different types of master data. Surely vendors can’t really expect customers to buy one product for CDI, another for PIM, another for financial data, another for HR etc? This would result in a witches’ brew of technology, and most likely a mess of new master data technologies which will themselves need some kind of magic wand waving over them in order to integrate them with one another. Just such a nightmare is unfolding, with the major vendors each trying to stake out their offering as being the one and true source of all master data, managing all the other vendors’ offerings. I certainly understand that if any one vendor could truly own all this territory then it would be very profitable for them, but surely history has taught us that this simply cannot be done. What customers want is technology that allows master data to be shared and managed between multiple technology stacks, whether IBM, SAP, Oracle, Microsoft or whatever, rather than being forced into choosing one (which, given their installed base, is just a mirage anyway). Instead the major vendors seem to be lining up to offer tricks rather than treats.


Hiring Top Programmers

August 12, 2005

At Kalido we want to hire the best 1% of programmers. This is for a very good reason: the top 1% of programmers write 10 times as much code as the average ones, and yet their defect rates are half the average. This is a pretty amazing productivity difference, yet it has been found consistently over the years, e.g. by IBM. In order to try and search out these elusive people, we use a couple of different tests in addition to interviews. Firstly, we use ability tests from a commercial company called SHL. In particular their “DIT5” test, aimed at programming ability, proves to be very useful. We found a very high correlation between the test results and our existing programming team when we tried this on ourselves, and we now use it for all new recruits. The other is a software design test that we developed ourselves. We find that very few people do a decent version of this, which allows us to screen out a lot of people prior to interview, saving time for all involved.

I actually find it encouraging that some people don’t like to have to do such tests, thinking themselves above such things or (more likely) fearing that they won’t do well. This is an excellent screening mechanism in itself - as a company we want the very best, and in my experience talented people enjoy being challenged at interview, rather than being asked bland HR questions like “what are your strengths and weaknesses” (yeah, yeah we know, you are too much of a perfectionist and work too hard, yawn). Partly as a result of these tests, as well as detailed technical interviews, we have assembled a top class programming team.

I am encouraged that a similar view is shared by Joel Spolsky, who writes a fine series of insights into software, “Joel on Software”:

http://www.joelonsoftware.com/articles/HighNotes.html

which I highly recommend.
