If you work for a software vendor, those who don’t often presume that you know how each and every feature of your software package works across a wide range of application scenarios – including features that are brand new. The truth is that no one figures these things out except by trial and error. New features are documented in release notes, usually in a very cursory way, but few users actually read release notes. More to the point, this kind of itemisation tells you only that a new feature exists, not how it will actually work. Practical applications are what really matter, and in the end they can only be determined through experimentation.
When you work for a vendor and a new version of your software is made available internally, everyone downloads and tries to install it – often unsuccessfully at first – then someone gets it working and eventually everyone starts playing with it to see what’s new. Only trial and error can tell you whether new features work as anticipated, if existing bugs have been fixed properly, and what new applications have become possible with the increase in functionality. Everyone experiments. When you work for a vendor you have the benefit of belonging to a community of experimenters. Consequently the experiments happen locally, in parallel, and the resulting knowledge is quickly shared with others.
In other words, regardless of whether you’re using commercial, commodity or open source software, community is central. As a user, the reach and activity level of your community and your access to it matter far more than the business model supporting your software’s production.
In solution selling it is common to encounter prospective clients making the following claim:
- “Our business is different.”
It’s a claim that is frequently paired with the following question, addressed to prospective suppliers:
- “Where have you done this before?”
This appears to be a paradox. If your business is different, then at some level you don’t believe that what you’re asking for has been done before. So why ask for prior examples?
When this dissonance arises it usually signals an uneducated buyer. If that buyer is you, you’re probably having your expression of needs shaped by a vendor (a software vendor, an implementation partner, or a consultant). This isn’t inherently problematic, but if your needs are complex – and in Business Analytics they usually are – you run a high risk of prematurely outsourcing their definition – to your eventual detriment.
As it happens, vendors are generally very good at abstracting needs. They do it all the time, but they do so according to a vendor worldview. Each vendor’s worldview is different, and each is invariably shored up by confirmation bias. Assessing which worldview most closely results in a good match with your problem requires you to be highly educated about both your problem and the worldviews self-interestedly competing to frame solutions to it. This can’t be outsourced to providers.
“Our business is different” is usually a proxy for something else. What it really means is:
- “This is important to us.”
- “We want your attention.”
“Where have you done this before?” communicates a set of understandable concerns that prospective buyers have regarding suppliers of complex solutions:
- “We’re uncertain about what we’re doing.”
- “How do we know we can trust you?”
Seen in this way, the two aren’t a paradox. They are, however, a warning signal to both buyers and sellers of Business Analytics capability.
Many business initiatives which involve the use of software, including those in Business Analytics, get bounced around between two opposing conceptions of how things should be done:
- “We will change our business processes to conform to the best practices encoded in the software.”
- “We will customise the software to conform to our business processes.”
In fact these are the end points of a continuum along which all software-enabled initiatives can be placed. Understanding where a particular initiative ideally belongs is critical to its success. In the fashion world there are the ready-to-wear items stocked by department stores; less everyday made-to-measure items, such as tailored suits, customised from template components; and one-off bespoke creations rarely seen beyond the catwalk. Each has its place. We know we can buy socks without trying them on. We’re prepared to look around more when choosing a suit or a wedding dress, and we know to factor in alterations.
In business, ‘best practices’ make sense only in well-defined, routine, compliance-driven, non-competitive, non-core areas. Their invocation in areas of competitive advantage or strategy should ring alarm bells. A hedge fund knows it can happily buy its payroll system off the shelf, just as it knows there is no best practice when it comes to the core business of making trades (buying low and selling high is of course the goal, but it doesn’t constitute a practice).
One of the contentions of the Analyst First approach to Business Analytics is that some — arguably most — of what organisations should be aiming to do should be bespoke.
The Canberra Chapter of Analyst First is off and running as of three hours ago. Thanks are due to EMC Greenplum for its sponsorship; to Graham Williams – the most famous data miner in Canberra, and now the head of the Canberra chapter of A1 – for his valuable time and energy; and to the 61 people attending, who together made it one of the largest Analytics events in Canberra to date.
Slides here, and a video of the talks will follow soon.
We look forward to active participation in A1 from Canberra’s Analytics community.
The McKinsey Global Institute has recently (May 2011) released a comprehensive report (156 pages) entitled “Big Data: The next frontier for innovation, competition, and productivity”. It contains good news for Business Analytics practitioners, analytically literate managers, and proponents of the Analyst First approach:
A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data… Furthermore, this type of talent is difficult to produce, taking years of training in the case of someone with intrinsic mathematical abilities. (p.10)
That said, the report is best summarised as a restatement of the standard business case for Business Analytics, but using the phenomenon of big data as an organising principle. So much so that if you replaced “big data” with “Business Analytics” throughout you would end up with something very similar to Tom Davenport’s ‘Competing on Analytics’ thesis from 2006. Take, for example, the penultimate paragraph from the Executive Summary:
The effective use of big data has the potential to transform economies, delivering a new wave of productivity growth and consumer surplus. Using big data will become a key basis of competition for existing companies, and will create new competitors who are able to attract employees that have the critical skills for a big data world. Leaders of organizations need to recognize the potential opportunity as well as the strategic threats that big data represent and should assess and then close any gap between their current IT capabilities and their data strategy and what is necessary to capture big data opportunities relevant to their enterprise. They will need to be creative and proactive in determining which pools of data they can combine to create value and how to gain access to those pools, as well as addressing security and privacy issues. On the topic of privacy and security, part of the task could include helping consumers to understand what benefits the use of big data offers, along with the risks. In parallel, companies need to recruit and retain deep analytical talent and retrain their analyst and management ranks to become more data savvy, establishing a culture that values and rewards the use of big data in decision making. (p.13)
A number of challenges to realising value through big data are identified in the report. It does not, however, note the critical fact that most organisations continue to struggle with small data.
This omission has implications. For example, the first chapter, ‘Mapping global data: Growth and value creation’, estimates the data generated by various business sectors by way of storage aggregates. It goes on to estimate different sectors’ ‘intensity’ as a factor of the concentration of this data in order to argue that the greater the number of firms in a sector, the more dispersed the big data, and therefore the fewer the competitive spoils on offer. It is here that the focus on big data gets in the way of a deeper point: organisations have been struggling with Business Analytics for many years and for many reasons, none of which have historically included big data. Moving into a big data world is only going to exacerbate already existing challenges, and if they are perceived to be ‘big data challenges’ they will not be addressed effectively.
This isn’t to say that big data doesn’t present any new challenges – it certainly does – rather, that their novelty may unhelpfully mask longer-standing and more fundamental ones. There is no reason to think that sectors in which there are many players are poorly positioned to take advantage of Business Analytics (discrete and process manufacturing are offered as ‘low intensity’ examples in the report). To the contrary, basic economics would predict that they would be more competitive and therefore have greater incentive.
The opening chapter also duly notes the trends driving big data: the Internet, multimedia, sensors, RFIDs, mobile phones, social media, and so on.
Chapter 2, ‘Big data techniques and technologies’, provides a useful non-exhaustive glossary. It includes definitions of some technologies specific to big data (Cassandra, Hadoop, MapReduce). Most of the technologies, however, are neither specific to nor dependent on big data, and none of the long list of techniques are. One or two are merely contemporaneous (crowdsourcing). The report does note that “not all” techniques require big data, and that bigger data sets are, for analytical purposes, generally better than smaller ones. But again, the importance of ‘big’ relative to ‘data’ is overstated.
Chapter 3, ‘The transformative potential of big data in five domains’ makes the case for Business Analytics (under the guise of big data) as it is being – and could further be – applied to US health care, EU public sector administration, US retail, global manufacturing, and global personal location data. Each section looks at available data, industry composition, economic and competitive factors, and then presents a range of viable analytical applications ranging from nascent to common practice in terms of maturity. Notably absent here – unsurprisingly – are arms race sectors: those for whom Business Analytics is central to competitive advantage (financial trading, e-commerce, the Internet more generally, and Intelligence, for example). Algorithmic trading is mentioned in the context of stream processing in Chapter 2 (p.33), but Business Analytics is not much explored as a core business function.
The sector-specific applications presented in the report are, furthermore, generally operational and IT-intensive in nature. The potential for Business Analytics to be a primary lever of strategic and tactical decision support, and its key function as an exploratory, sense-making activity, are not given the attention they deserve. These possibilities are implicit in the report’s comprehensive analysis; however, they are systematically obscured by a pervasive bias: existing business models are pictured being made more efficient at the margins by grafting operational analytics on to existing processes (cross-selling, various kinds of optimisation, supply chain management, leaner manufacturing, sales support, and so on), and startups based on new business models made possible by big data are envisaged. What is not envisaged is the opportunity – and the potential – for existing businesses to strategically adapt using Business Analytics.
Chapter 4, ‘Key findings that apply across sectors’, summarises both the sources of value on offer from big data (read: Business Analytics) and various impediments to its realisation (skills shortage, data and technology access, data policy inadequacies). Towards the end it makes a critical point regarding industry structure which begins to get at some of the core challenges to Business Analytics. Unfortunately the insight limits itself to sector level generalisations:
Sectors with a relative lack of competitive intensity and performance transparency and industries with highly concentrated profit pools are likely to be slow to fully leverage the benefits of big data. The public sector, for example, tends to have limited competitive pressure, which limits efficiency and productivity and puts a higher barrier up against the capture of value from using big data. US health care not only has a lack of transparency in terms of the cost and quality of treatment but also an industry structure in which payors gain from the use of clinical data… but at the expense of the providers… from whom they would have to obtain those clinical data. (p.108)
Principal-agent problems and various sources of inertia (commercial, bureaucratic, cognitive, regulatory) are in reality common features of any sizable organisation, public or private, which is why the unforgiving measurement and transparency that Business Analytics can’t help but bring are so often resisted. The difficulty of building the necessary ‘human infrastructure’ within this context – combining roles, skills, relationships, trust and culture with supporting electronic infrastructure – should not be understated.
Rounding the report off, the final chapters, ’5. Implications for organization leaders’ and ’6. Implications for policy makers’, are a series of action pitches to prospective decision makers reflecting the SWOT analyses detailed in preceding sections.
I am indebted to an earlier and excellent summary of the McKinsey report by Steve Miller at Information Management – which again, if you substitute “Business Analytics” for “big data”, reads like Davenport.
This post examines data as an economic resource and a source of value, and provides a new functional definition of Analytics on that basis.
Data is a fascinating resource, with a number of characteristics that distinguish it from other things that we think of as “resources”.
The first point is that data is not a commodity. The inherent value of data is that every piece tells you something new, and the pieces vary dramatically in their meaning, importance, value and reliability. While it is intuitively appealing to think of data in aggregate, and to consider its volume in tera-, peta- or zettabytes, it pays to note that the value in such data is not a reliable function of its volume. This is at odds with commodities such as iron ore, gold or crude oil, which can be priced directly by the tonne, the ounce or the barrel.
The value in data, while very real, is not easy to quantify. In this way, data is more like human resources and less like gold or oil. It is inherently heterogeneous in nature, and its value is not proportional to its mass.
Nor is the value extraction process homogeneous. While the way you extract value from one lump of iron ore is no different from the way you extract it from another, the same does not hold true for data. Further, the method of value extraction may well be unknown, and require further exploration. Even if a value extraction method has been identified, and indeed proven to work, there may well be additional value in the data. Finally, the value of data may only prove itself in concert with other data. This is a synergistic, almost alchemical effect for which there is no good metaphor in the world of commodities.
Where the commodity resource analogy does hold is that data holds value locked within it, and effort must be applied to locate and extract it.
Analytics, particularly in its “data mining” incarnation, is best described as investment in the extraction of value from this curious, heterogeneous, synergistic resource. The tools, skills, techniques and processes of Analytics are all in the service of this investment enterprise.
The act of investing in data is best considered by way of analogy. Investing can be a simple, hands-off activity based on a straightforward transaction – money is put in, and value is generated by an external process. This is how stock investment works: the agent investing in the stock merely puts up the money.
Now consider investing in one’s own education, or in a health and fitness program. In these cases, considerable effort by the individual is required: they cannot outsource their intimate involvement in the process being invested in.
Analytics is like the latter kind of process for any business leader investing in data. If they are not intimately involved in the process, then it is most likely a failure, and almost certainly a waste.
The “mining” analogy is thus used at one’s peril, as is an over-reliance on specific tools, processes or preconceptions about what is in the data. Ordinary mining is not a highly specialised form of labour, and much mining activity is rapidly automated.
Data mining is very different, and the mining analogy fails spectacularly if it envisages a repetitive, well-defined activity, where large machines extract value in a predictable, reliable way, supported by interchangeable people performing repetitive tasks. Indeed, following the mining analogy, real data miners are less like miners and more like prospectors and geologists, performing difficult, ever-changing and highly skilled work to detect value when and how it may arise.
Where they do identify repeatably extractable value (e.g. credit scoring models, churn models, fraud detection algorithms), these nuggets of (temporary) value can indeed be automated and a production process applied. But this is no longer Analytics; rather, it is the IT instantiation of Analytics results. The data miner has moved on to discover value somewhere else.
Thus data is a very unusual value store, with equally unusual, heterogeneous value extraction methods, and a reliance on adaptive, exploratory, highly skilled professionals to make them happen.
Data has another unusual economic property: it remains. As an economic asset it can be preserved at negligible cost, while other assets – premises, machines, staff, funds and lines of credit – may diminish in tough economic times. The potential value of data is hard to measure, as discussed above, but whatever it may be, it grows relative to other shrinking assets. Thus, in hard times, the relative value of data grows, and with it the business case for Analytics – for investing in data.
Data itself may grow in volume, scope and complexity with relative ease and low cost, thus the business case for Analytics grows with it.
Data may take many forms, with structured, electronic enterprise data – the form most familiar to analysts – being only one of many. Unstructured data is anywhere from 4 to 20 times more voluminous than the structured kind. Survey data may be collected to answer those questions that enterprise data fails to answer.
A Postscript – Tacit Data Mining
The definition of Analytics above construes ‘data’ very broadly.
The most interesting, readily available, strategically relevant and poorly understood form of data is tacit data: the information contained in the brains of staff, board, shareholders and anyone else who would see the organisation do well. Tacit data is seen as the most difficult to identify, extract, value and leverage.
The definition of Analytics presented above includes investment in the extraction of value from all data, including tacit. How is tacit data mined? The most effective and powerful way is by use of collective intelligence and forecasting techniques, such as prediction markets. This is a topic for another day.
The modern knowledge worker has indeed progressed far past the illiterate, innumerate businessmen of ancient Sumer. They can do their own reading and counting, and many other things besides. They know the rudiments of double entry bookkeeping, though they may not be accountants. They are familiar with laws pertaining to their business, industry and work area, though they are probably not lawyers. They probably know the basics of project management, marketing, human resources or event management, without being experts in any of those fields.
Thus, most modern knowledge workers can perform basic functions in any of these areas. Where their expertise is stretched, they would usually know how to recruit, retain or collaborate with an expert in any of these areas, brief them on requirements, and understand any advice or directions given.
This laundry list of capabilities forms the core of the skills checklist one would expect from a business course. While one need not be an expert or accredited in any of these areas, we can say that a modern knowledge worker is literate in all of them.
Closer to the topic at hand, the modern knowledge worker is expected to be computer literate, which is to say able to use a computer productively, often in the service of the professional literacies outlined above. Again, they would hopefully know their limitations, and know when to call an expert to repair faults, enable new capabilities or create new tools.
One interesting thing about literacies is that they are often unspoken: few job interviews ask explicitly if one can actually read. Not many more executive interviews ask if one can surf the Web, read a balance sheet, instruct a lawyer or define what a “marketing campaign” is. These things are tacit, assumed knowledge.
While there are indeed islands of expertise in law, IT, accounting, HR, marketing and many other areas in the modern business, these would be crippled if the rest of the business, particularly senior management, lacked the minimal literacy required to engage these expert functions – to cooperate with them, instruct them and act on their advice.
Most crucially, a minimal degree of literacy is required to determine if the expert has done a good job, added value or created risk. Again, these processes are largely tacit.
It would be untrue to suggest that these literacies exist perfectly in all businesses. Indeed, one way to assess the effectiveness of knowledge workers, particularly middle and senior managers, is the degree to which they have a truly literate grasp of the business functions they interact with on a regular basis.
The Dilbertesque world all too familiar to so many of us exists due to an epidemic of false literacy in some organisations. What is false literacy? It is the ability to impersonate a literacy to another illiterate person. In business it manifests as a minimal, inadequate level of literacy, usually consisting of nothing more than buzzwords. To thrive it requires a critical mass of absent or false literacy, a lack of influential, literate people, and poor performance measurement; arbitrary politicisation, poor accountability and poor literacy usually work together. This situation is rarer in smaller, privately owned organisations with majority shareholders, and more common at the other extreme of the ownership spectrum.
False literacy is often sufficient, or deemed to be so, in some business areas such as recruiting or sales. Here an explicit “laundry list” of features, skills or other factors can be exchanged between buyer and seller without either party actually knowing what any of the terms mean. Perhaps this is enough when recruiting an IT developer with “C++, Java and backend systems”, but it is more of an issue when it is the way a CEO runs an insurance company.
Another way of looking at false literacy can be found here. The idea of a “cargo cult” helps to define the culture of an organisation where false literacy is the norm.
And now we are finally ready to talk about modern Business Analytics.
Today’s discussion begins with a key information technology underpinning all business. This toolset provided novel forms of data storage, access, retrieval and analysis, many of which are used to this day.
The new technology in turn led to significant financial innovation, enabled new forms of exchange, and in particular facilitated the creation of new, tradable derivative products. It also led to improved military, industrial and agricultural production, as well as enhancing emerging communications networks.
The technological breakthrough required developments on a number of fronts: hardware, especially storage media; innovations in data encoding techniques; and an extensive training program for the rigorously skilled technicians required to operate the new technology, which was not initially as user friendly as it could have been.
Happily, the technology was adopted by business executives who were only too glad to hire technological experts to drive these new systems, resulting in growing wealth and influence for those regions where the technology took root.
The technology in question is, of course, writing in its crudest, cuneiform pictographic form. It was closely coupled with its cousin accounting, which is the basic arithmetic of business in recorded form. The substrates were initially clay tablets, which required baking to give them any degree of permanence.
Much has changed since then. Clay tablets have been replaced by LCD screens, and pictographic cuneiform with a Roman phonetic alphabet and zero-bearing, decimal Indo-Arabic numerals. Nevertheless, it can be argued that the Sumerians covered more conceptual ground than what lies between them and many of today’s users of Analytics.
How can this be? One side of the argument is the sheer conceptual distance between Sumer and the pre-literate societies that predated it. This is a historical digression, where it may be instructive to compare Sumer with pre-literate civilizations such as those of the pre-Columbian Americas.
The other side of the argument is more relevant to the issue of Analytics today. Namely, for many users of Analytics there has been one, and only one, significant innovation since the Sumerians: modern, post-Sumerian business users of Analytics no longer require a technical specialist scribe to read, write or count for them. They can do it themselves.
And there lies the second great conceptual revolution: the form of literacy invented by the Sumerians became universal. What is more, it became apparent somewhere between then and now (perhaps around New Testament times) that business people would require a basic level of literacy and numeracy to conduct their own affairs – that understanding of such elementary concepts was not something to outsource.
Two thousand years later and it is inconceivable for a modern business executive, manager or clerical worker to lack literacy (in the sense of being able to read and write) and numeracy (in the sense of being able to count, add, subtract, multiply, divide).
Thus, in four thousand years, a technology that was the province of technical specialists has become an essential and fundamental part of the business toolkit. Lacking it is not a disadvantage: it is inconceivable.
The reader may have objected much earlier in this piece, noting a cornucopia of conceptual and technological innovations: syllogisms, geometry, algebra, calculus, statistics, Cartesian coordinates, symbolic logic, 3D animation, relational databases, OLAP, machine learning… and so many others.
The problem is, basic literacy and arithmetic numeracy is pretty much where it stopped for all but a new technological elite of scribes. The group left behind includes far too many people whose job it is to develop strategy, see “the big picture”, produce “evidence-based policy”, weigh the arguments of quantitatively skilled advisors, or in many other ways interact with and manage a data-rich world of changing, poorly understood circumstances and vast uncertainty, with powerful analysis tools just a click away.
This is basically the condition of most people interacting with data in the modern world. These are the people who think that BI=Analytics=Reporting. These are the people who cannot read an XY graph, or trust any data summary more complex than an average. These are the people who when shown any kind of report, dashboard or graph ask to see the raw numbers because they are on firmer ground there, even if the numbers are millions of transactions and no useful inference can be drawn from eyeballing them.
There are many other literacies in the modern world, and most of these remain “unknown unknowns” for most of the people interacting with Analytics. What are these literacies, and how do the deficiencies affect business?
An earlier post clarified the distinctions between forecasting, goal setting and planning. Just as they are different activities, they imply different error measures.
The most common error measure used to assess both goal setting and planning is variance as a percentage of base. This makes sense. If my annual target is 100 and I put together a plan to reach 100 but in the end achieve only 97, it is accurate to say that I have achieved 97% of my target, missing it by 3%. The 3% error is a measure of my shortfall in performance (execution error). The 97% achievement of target or plan is a measure of the value of my performance. What’s the benchmark here for assessing error and value? Well, at the start of the year I had achieved nothing, so the benchmark is zero.
This same error calculation is commonly applied to forecasts, but to do so is misleading. The aim of a forecast is to provide an objective estimate of the most likely outcome. The most accurate forecast will be the one which gets closest to the actual, but how should error and value be measured? As with targets and plans, the key is to understand the implicit benchmark. When I – within this month – forecast next month’s sales to be 100, it is true that, as of now, next month’s sales are zero: next month hasn’t happened yet. However my information about the likely level of next month’s sales is almost never zero. In most cases I will know what previous months’ sales have been, as well as having a pretty good idea of how this month’s sales are shaping up, and some knowledge of the seasonality of my business. In other words, my forecasting benchmark is what I already know. If I am to add value I need to do better than a naive extrapolation – for example, the most recent actual or (in a seasonal context) the previous year’s actual for the same month. If last month was 100 then my naive expectation of this month is 100, not zero. This is my benchmark. I need to produce an estimate closer to the actual than 100 in order to have added value – otherwise my forecast was less accurate than a simple assumption of no change. The correct measure of error for forecasting, then, is not error as a percentage of base (execution error); it is error as a percentage of change (forecast error). For a forecast to be 97% accurate it needs to capture 97% of the change between actual observations.
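The distinction can be made concrete with a minimal sketch in Python. The figures and function names below are illustrative assumptions, not from the original discussion; the naive benchmark is taken to be the most recent actual, as suggested above.

```python
def execution_error(actual, target):
    """Shortfall against a target or plan, as a percentage of base.
    The implicit benchmark is zero: at the start of the period,
    nothing has yet been achieved."""
    return abs(actual - target) / target

def forecast_error(actual, forecast, naive):
    """Forecast miss as a percentage of change. The implicit benchmark
    is a naive extrapolation (e.g. last month's actual), so only the
    portion of the change actually captured counts as added value."""
    return abs(actual - forecast) / abs(actual - naive)

# Last month's actual was 100 (the naive benchmark); we forecast 104
# and the month comes in at 105.
print(round(execution_error(105, 104), 4))   # 0.0096 -> looks ~99% "accurate"
print(forecast_error(105, 104, 100))         # 0.2 -> only 80% of the change captured
```

The same forecast looks almost perfect measured against base, yet captures only 80% of the change that actually occurred – which is the measure that matters.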
Another way to think about this distinction is as follows: over-achieving a target may be desirable, but overshooting a forecast is not. I may, in the forecasting context, customise my error function so that it underweights overshoot compared to undershoot – say if my inventory holding costs are low compared to the cost of missing sales due to being out of stock – but note that I’m trading off costs where I’d prefer 100% accuracy.
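One way to encode such a preference is an asymmetric error function that penalises overshoot less heavily than undershoot. A minimal sketch, where the 0.5 weighting and the function name are illustrative assumptions rather than a recommendation:

```python
def asymmetric_error(actual, forecast, overshoot_weight=0.5):
    """Penalise overshoot (forecast above actual) at a fraction of the
    penalty for undershoot -- e.g. when holding surplus stock is cheap
    relative to missing sales while out of stock."""
    diff = forecast - actual
    if diff > 0:                       # overshoot: cheaper inventory cost
        return overshoot_weight * diff
    return -diff                       # undershoot: full penalty

# Overshooting by 10 costs half as much as undershooting by 10.
print(asymmetric_error(100, 110))  # 5.0
print(asymmetric_error(100, 90))   # 10
```

Note the trade-off the post describes: this weighting reflects relative costs, not a preference for inaccuracy – 100% accuracy would still be ideal.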
Broadly speaking all Business Analytics serves one of two goals: decision support or decision automation. One way to idealise these is as either reports (decision support) or algorithms (decision automation).
Algorithms reduce the need for humans to think. Picture the in-database credit scoring function embedded deep in your bank’s systems and firing thousands of times an hour. This kind of decision automation (or decision replacement) is a common operational analytics endpoint.
Reports, on the other hand, make decisions more difficult. The simplest decision support system is a coin toss, but a business relying only on heads and tails will not survive for long. Real decision support adds ambiguity, complexity, uncertainty, and necessitates human judgement. This makes decisions harder, not easier.
About us

Analyst First is a new approach to analytics, in which tools take a far less important place than the people who perform, manage, request and envision analytics, and in which analytics is seen as a non-repetitive, exploratory and creative process whose outcome is not known at the start, and where only a fraction of efforts are expected to result in success. This is in contrast with a common perception of analytics as IT and process.