Technology Spectator recently published an article highlighting the need for big data speed …
This article highlights the communication challenge for accredited A1 professionals.
We all recognise Analytics is about using information better than competitors, so we are: 1. doing things better, and 2. doing better things than competitors/relevant comparators. But like so much of the coverage of our sector, the article focusses solely on Operational Analytics, not the latter area of Strategic Analytics.
Secondly, the article fails to recognise speed is only one part of the equation.
Taking the author’s example of the retail sector: sure, real-time analytics can detect an early decline in sales for a particular product, controlling for some extraneous factors. But a retailer’s promotional response (who they target and how) doesn’t necessarily require real-time analytics (they can apply, in real time, outputs from models created last week, with little risk of degradation).
The most important questions for shareholders of the retailer require Strategic Analytic capability. How should pricing across the entire product portfolio be optimised? What products should we be ordering now for next season (or the season after)? How should the physical network and supply chain be optimised? These strategic questions demand the right answer, not necessarily the fastest answer.
Any experienced industry professional gets that making sense of data is our primary role. But interpreting data to the best of our ability is clearly incompatible with throwing away information (e.g. because inconsistencies in the available data make the task more cognitively complex). No one would advocate storing and processing data which possesses no incremental information value, but information value can be measured, so that shouldn’t be an issue.
Critically, this article fails to recognise that many of the barriers Australian companies face in effectively using their data relate to data quality, not to their data storage and processing capacity.
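One way to make "information value can be measured" concrete is information gain: how much knowing a field reduces uncertainty about the outcome you care about. The sketch below is illustrative only. The field names and toy records are hypothetical, and information gain is just one of several possible measures, not one the original article or this post prescribes.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, target):
    """Reduction in uncertainty about `target` (in bits) from knowing `feature`."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    # Entropy of the target that remains once the feature value is known.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Hypothetical toy records: did the customer churn?
churned = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
# A field that separates churners from non-churners perfectly.
plan = ["basic", "basic", "premium", "premium",
        "basic", "premium", "premium", "basic"]
# A field that tells us nothing about churn.
region = ["north", "south", "north", "south",
          "north", "south", "north", "south"]

gain_plan = information_gain(plan, churned)
gain_region = information_gain(region, churned)
print(f"plan:   {gain_plan:.3f} bits")
print(f"region: {gain_region:.3f} bits")
```

On this toy data, a field with zero gain is a candidate for dropping, while a messy field with high gain is worth the effort of cleaning.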
Finally, there is no explicit recognition of the talent required to use data more effectively than your competitors.
From an A1 perspective we should welcome the growing focus on our sector, but we need to better articulate the more nuanced (and interesting) story of Analytics in an A1 Practice. It would be easy to criticise the journalist for being naive in swallowing the line of vendors and other vested interests, but the responsibility is ours to better explain the reality.
Eugene is totally right that we need to stand with a united voice. From today, NTF will publicly back A1 in all our proposals and marketing collateral. I regret not taking this action sooner.
This is a short blog to extend our thanks to Richard Volpato for a terrific presentation given to A1 Sydney recently. For those who haven’t been exposed to Richard’s work and approaches, I believe it is a fantastic example of what can be achieved by adhering to A1 principles.
Some key take-out messages for me are:
Learn what matters before, not after, an infrastructure build. Even if Richard’s current employer were convinced against all reason and empirical evidence to implement an alternative system today, the costs of doing so would be a fraction of the ‘blind build’ costs (we’re again reminded of black swan risks in a recent HBR article – http://hbr.org/2011/09/why-your-it-project-may-be-riskier-than-you-think/ar/2), because they now have a set of very rich insights concerning their revenue profile, and importantly a ‘blueprint’ for how to transform and process their data in order to operationalise these insights.
Manage Value exchange. Don’t assume users will give up their highly prized spreadsheets just because you covet the rich semantic content therein. Richard showed us how offering to assist users to repair, document and update their prized digital assets works well for both parties.
Visualisation. Richard and his team used visualisation to help his management (most of whom, I understand, have a legal background) to understand the impact on individual cases of imposing multiple, interacting weighting factors (which, while leading to entirely logical outcomes, may nevertheless appear counter-intuitive).
Most importantly, this is a success story about what can be achieved through assiduous and open minded interrogation of data, using a ‘fit for purpose’ mix of open source solutions.
We’ve spoken much within our Analyst First sessions about the transformational power of open source tools like R and RapidMiner, which effectively harness the collective intelligence of the statistical development community, within a system that is largely self-regulating. I can only imagine open source tools must be creating a vexing challenge for commercial analytics software vendors and their R&D departments, who are having to compete with the rich functionality and enthusiastic support of these tools, extended through an intelligent and energised network.
Far from being fringe, we hear the power of tools like R is being effectively exploited by large organisations with highly sophisticated Analytics capabilities, ranging from Google to the ATO.
Logically, this must take the focus of Analytics away from data ‘crunching’ and reporting to one which enables Strategic Agility. In short, and consistent with A1 principles, Analytics is there to inform how to ‘do better things’, rather than just ‘doing things better’. Unencumbered by the weight of expensive Enterprise application licensing fees and high data processing costs, companies can now single-mindedly focus their budgets and resources on sourcing smart analysts to answer key strategic questions – ‘why are we….?’, ‘should we be …?’, ‘what if we were to…?’.
So much for the theory, what’s the practical reality of this new era of Analytics? Let me share with you why NTF is so committed to A1 principles, and why I believe Eugene and Stephen are truly two of the most astute thinkers on Analytics I’ve encountered.
Take the last couple of weeks at NTF. One of our clients recently engaged us to undertake a project requiring the analysis and modelling of over 10 million customer records. We read the data into Microsoft Powerpivot (which is free – www.powerpivot.com) using a garden-variety desktop (8GB RAM; i7 processor; retail cost about $700). We undertook the usual normalisation and transformation steps. Fast forward through our learning curve and we now load Powerpivot data via MySQL and use the 64-bit version of Excel (yes, sorry you’ve got to jettison the trusty old VB scripts, Outlook compatibility, etc. on one box – but it is worth it!). We can now interrogate 10 million records, in a fraction of a second, on a standard desktop. We are yet to fully exploit Microsoft’s DAX language capabilities, but we were able to learn much about this (vast and rich) dataset by conducting basic exploratory data analyses in Powerpivot. Currently, because of our ‘history’ with it, we clean and transform data in Python (which again is free), but I can imagine a time when we’re cleaning and transforming data mostly using DAX. To be clear, we’re not confident at this point the above process can scale beyond 25 million records, but recognise we’re currently processing 10 million customer records in a fraction of a second, on a $700 desktop box. I’m not for one minute suggesting this is a ‘big data’ solution, but there is huge scope for companies to transform their businesses through analyses conducted on datasets up to 20 million records, without any material CAPEX or OPEX outlay.
Enter R and the modelling component, and you’ve all seen this film before. Most of you from A1 Sydney have seen the truly amazing AFL modelling system our Head of Analytics, Tony Corke, has created on his own (http://maflonline.squarespace.com/; the A1 presentation will be replayed at SURF next Wed – http://www.meetup.com/R-Users-Sydney/). I really encourage you to read Tony’s post (http://maflonline.squarespace.com/mafl-stats-journal/2011/8/3/predicting-the-home-teams-final-margin-a-competition-amongst.html) where he uses R’s caret package to test the out-of-sample fitting performance of 54 different algorithms. Ponder for a couple of seconds what we might be able to learn from a table like this …
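For readers who haven’t seen an out-of-sample comparison before, the idea can be sketched in a few lines. This is not Tony’s code and it is not caret: it is a minimal Python illustration using only the standard library, with synthetic data and two deliberately simple candidate models (both hypothetical stand-ins). Each candidate is fitted on part of the data and scored only on records it has not seen.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: ordinary least squares with one predictor."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, xs, ys):
    return statistics.fmean([abs(model(x) - y) for x, y in zip(xs, ys)])


def cross_validate(fitters, xs, ys, folds=5, seed=0):
    """Average held-out error per candidate across `folds` splits."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = {name: [] for name in fitters}
    for k in range(folds):
        test = idx[k::folds]          # every folds-th record is held out
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        for name, fit in fitters.items():
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            scores[name].append(
                mean_abs_error(model, [xs[i] for i in test], [ys[i] for i in test])
            )
    return {name: statistics.fmean(s) for name, s in scores.items()}


# Synthetic data (an assumption for illustration): a noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(-30, 30) for _ in range(200)]
ys = [5 + 0.8 * x + rng.gauss(0, 10) for x in xs]

results = cross_validate({"mean": fit_mean, "line": fit_line}, xs, ys)
for name, err in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>5}: held-out mean absolute error {err:.2f}")
```

Scale the dictionary of candidates up to 54 entries and you have the shape of the table in Tony’s post: one held-out score per algorithm, directly comparable.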
The end result is that our client enjoys the ability to predict weekly volumes for their most profitable product to within +/-3% (in a marketplace where weekly volume fluctuated +/- 11% over the past 3 years). Expressed another way, our client has a highly commercially exploitable model which explains 85% of the weekly variability in their most important product (which makes a profit contribution in the hundreds of millions pa), without over-fitting. All models were estimated in R; all exploratory and explanatory data analyses were undertaken using a plurality of open source software and Powerpivot.
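The two headline numbers here, "within +/-3%" and "explains 85% of the weekly variability", correspond to standard measures: the percentage error of each weekly forecast and the R-squared of the model. The sketch below shows how each is calculated. The weekly figures are made up for illustration and are not the client’s data; to support a "without over-fitting" claim, both measures would need to be computed on weeks the model was not fitted to.

```python
def r_squared(actual, predicted):
    """Share of the variability in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def max_abs_pct_error(actual, predicted):
    """Largest weekly forecast error as a fraction of actual volume."""
    return max(abs(p - a) / a for a, p in zip(actual, predicted))


# Hypothetical weekly volumes and forecasts (illustrative only).
actual_volume = [100, 110, 90, 105, 95]
forecast_volume = [102, 108, 92, 104, 96]

fit = r_squared(actual_volume, forecast_volume)
worst_week = max_abs_pct_error(actual_volume, forecast_volume)
print(f"R-squared: {fit:.3f}")
print(f"Worst weekly error: {worst_week:.1%}")
```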
What level of CAPEX and risk exposure was incurred? Our total hardware costs were $700; our software costs were $0; but consistent with all the A1 presentations I’ve been privileged to see, we invested all of our client’s budget in tirelessly trying to understand the data – its semantic structure and patterns. Like the vast majority of what we do at NTF, and what I see in A1 practitioner presentations: this problem yielded to perspiration, not inspiration.
So to the central questions posed at the outset: if Microsoft are serious about Powerpivot, yes it is absolutely a game changer (please excuse my timidity, it comes from the remembered pain of having the Google Wave hook and sinker deeply embedded in my oesophageal tract). Powerpivot allows corporate decision makers immediate data access, without waiting in queues for SQL programming resources. Smart, numerate people without programming skills (e.g. Finance grads) now don’t need to know about indexing or how to code ‘joins’; they just link fields from different datasets via a mouse click. Can Powerpivot change the dynamics of information accessibility? Absolutely – I have no doubt based upon our experience to date. Powerpivot is not there yet – it is still prone to crashing (particularly before you’ve learned some basic tricks – two of which are pointing to a MySQL database and using the 64-bit version of Excel [a free but painful download process]). However, for a first release, it is an impressive execution of an even more impressive vision.
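For anyone who has never had to code a ‘join’, here is roughly what linking two datasets on a shared field amounts to. This is a minimal Python sketch with hypothetical table and field names. It says nothing about how Powerpivot implements relationships internally; it only illustrates the work a mouse click spares the analyst.

```python
def link_tables(fact_rows, lookup_rows, key):
    """Attach matching lookup fields to each fact row (a many-to-one inner join)."""
    # Index the lookup table once so each fact row is matched in constant time.
    index = {row[key]: row for row in lookup_rows}
    return [
        {**row, **index[row[key]]}
        for row in fact_rows
        if row[key] in index  # rows with no match are dropped
    ]


# Hypothetical datasets sharing a `customer_id` field.
sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 9, "amount": 10.0},  # no matching customer record
]
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]

linked = link_tables(sales, customers, "customer_id")
for row in linked:
    print(row)
```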
I just hope Microsoft, with all its competing priorities, appreciate what they have.
Reminder: All welcome for Tony Corke’s presentation at The NTF Group, Suite 318, 5 Lime Street, Sydney. We will need to start at 5:30 pm.
For interstate Members, please use the Webex link below:
Topic: Profitably modelling AFL football
Date: Tuesday, July 12, 2011
Time: 5:30 pm, Australia Eastern Standard Time (Sydney, GMT+10:00)
Meeting Number: 863 164 522
Meeting Password: Analyst1st
——————————————————-
To start or join the online meeting
——————————————————-
Go to Webex
——————————————————-
Audio conference information
——————————————————-
Please call: 03 8779 7440 (from Melbourne) or 02 9696 0774 (from Sydney)
Your guest access code: 12589350#
About us
Analyst First is a new approach to analytics, where tools take a far less important place than the people who perform, manage, request and envision analytics, while analytics is seen as a non-repetitive, exploratory and creative process where the outcome is not known at the start, and only a fraction of efforts are expected to result in success. This is in contrast with a common perception of analytics as IT and process.

