Continuing with the big data meets big hype theme:
So you want to get into Business Analytics/Big Data/Predictive Analytics.
Which areas, skills, tools and data should you focus on first?
There are three rather big questions that you need to ask yourself:
1. How well do I really understand the problem(s) that I want Analytics to solve, and the role(s) that Analytics would play?
2. How well do I understand my data?
3. What data do I actually have, or can get?
Each question explores a continuum. Together they represent a three-dimensional space of possibilities. There is no “magic quadrant” here: each part of the space is a legitimate place to be, with its own solutions, risks and benefits.
Let’s go through them.
1. The range of possibilities looks something like this:
A: Having built preliminary offline random forest models and created some prototypes, I want to extend our existing customer acquisition and retention models to our international markets, and operationalise them for real-time, event-based activity, provided this is seen to yield further significant gains. We will need an industrial-strength, scalable and reliable tool, probably a commercial vendor tool, and possibly a Hadoop-based MapReduce solution.
B: My CEO just attended a lavish conference where he saw a slide presentation mentioning the Davenport HBR article from 2006, and now he wants us to “get into analytics”.
Most people are somewhere in between. But you get the idea. There are far too many initiatives that are precisely at B, while the ideal vendor customer is precisely at A. Unfortunately, there are not enough As around (we call them “Educated Buyers”), so some vendors must sell to people who look more like Bs.
Naturally, Analyst First does not advise Bs to get into Big Data, to buy expensive vendor tools, or ever to believe anyone who says there is such a thing as “a solution for getting started in Analytics”, especially when said solution is no more than a bunch of software and maybe a few relatively junior technical consultants for a few months.
Indeed, we advise the Bs of this world to invest in learning, exploring and gaining experience, while managing their sponsors’ expectations and growing their personal investment and participation in the new Analytics enterprise (yep, it’s an Enterprise, with all the Lean Startup that entails), and eliciting from said sponsors their real, and realistically achievable needs.
This is a crucial time to invest in smarts, experience, talent, learning and plenty of Lean Startup.
If this approach is not feasible, I do not have high hopes for the future of the function, which will, at best, become a showpiece trophy of high tech adding no value, and will more likely be shut down, “restructured” and restarted again, hopefully with a more sensible approach.
And what of the As?
Speaking to an A recently, indeed one of the best As I know, he noted that his team had kicked some great business goals recently, having implemented a very necessary, expensive vendor tool after trying R and seeing that it was not up to the big data / big crunch job they had to do. He noted that this was necessary, even though he agreed with A1, and that this was not in line with A1’s preference for open source tools.
“Not at all,” I replied. “This is exactly A1, you were the quintessential Educated Buyer! A1 is not against vendor tools. We are against people spending money on what they do not understand in the hope of a magic solution. You don’t fall into that category.”
Hopefully, the anonymous A in question will write a post on this blog outlining his success story in more detail.
So, our advice to As is… You don’t really need our advice, until you want to do something new again. In which case, chances are you are following A1 principles already, explicitly or not. Otherwise, how did you get to A in the first place, anyway?
Most people are somewhere in between, and usually closer to B than to A.
Answering the “what the heck are we going to do?” question involves exploration on a number of axes, including stakeholders’ needs, own capability, available resources (human and electronic), any impediments or constraints (Hello IT!) and data, the subject of questions 2 and 3. The actual hidden contents of the data, the “gold” of the data “mining” metaphor, is a huge exploratory subject in its own right, and must be considered in the context of the others.
This is not an easy target to hit, and it needs defining before that can happen!
So, to all the Bs and almost-Bs out there: invest in learning, both your own and your sponsors’. Invest in getting your sponsor invested, supporting and covering you, letting you explore and grow. Invest, above all, in exploration, and invest in managing expectations and delivering intermediate results to allow all this to happen. Buy your analytics function a chance to grow, learn, explore and breathe free of unreasonable pressures and constraints.
The other two questions will be covered in upcoming posts.
A few thoughts on Big Data, and how it matters, as well as where the hype does more harm than good. Also some thoughts on what I call “Big Crunch”, Big Data’s Siamese twin, and one I am well acquainted with, more so than Big Data proper.
First of all, the hype. A colleague in the industry spoke to me recently regarding an all too common scenario.
The colleague is effectively the analytics manager of the business. The trouble is, the business does not really have much of an idea about what data analytics is, apart from the fact that it has something to do with big brand software, which the company has in abundance. There is far less idea that it requires direct engagement and understanding from management, and most importantly that it requires clean, accessible and appropriate data before anything can happen, and that this requires a clear, concerted effort, and that this usually consumes the bulk of investment in time, money and political capital. Certainly, there is no realization that what the business needs done is easily served by open source and commodity software, but requires investment in good staff, and appropriate, coordinated effort from IT and management.
So far, a typical A1 case study: it could really be anyone, and it is probably an adequate description of at least 100 cases in the city of Sydney alone. The Big Data twist is that a senior manager has been sending the colleague material on Big Data, and how important it is. Again, this is an increasingly common occurrence.
Naturally, the colleague is exasperated. First of all, the business is not really one that collects or benefits from terabytes per day of data. Secondly, they have not even got their small data right. It is neither clean, nor reliable, nor ready for analysis. Finally, the manager in question has done little to get appropriately educated or involved in analytics, but has bought deeply into the Big Data hype, seemingly without understanding it too much either.
And here endeth yet another, all too typical cautionary tale.
But Analyst First is more about “Thou Might Want To” than about “Thou Shalt Not”.
So here are the “Thou Shalt” Analyst First commandments on Big Data:
Hear ye o senior manager/sponsor of Analytics!
Before you worry about Big Data, or dare speak its name:
1. Get your Small Data right first. Organised, cleaned, and analysed.
2. Be engaged in analytics. Investing in analytics is like investing in a gym membership, or an education: you don’t just spend the money and make it someone else’s problem. It only works if you are directly engaged.
3. Invest in the best Human Infrastructure you can.
4. Squeeze all the value you can out of your small data first.
5. Use Big Crunch on small data before moving to Big Data.
So point 5 brings me to Big Crunch.
OK, admittedly, Big Crunch is actually a feature of an upcoming post, but the intro and teaser is here (just as A1 itself started life as a teaser at an R presentation to MelbURN almost 2 years ago).
So, what is Big Crunch?
Big Crunch is my name for the very cool stuff you can do with small data given heaps of computation. Much of my recent work has made extensive use of tools such as resampling, and its specific manifestation in Random Forests, k-fold cross validation, and other advanced predictive modelling techniques.
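To make "resampling" concrete, here is a minimal sketch of one of its simplest forms, a percentile bootstrap, using only the Python standard library. The data values, the function name bootstrap_interval and its parameters are all made up for illustration; this is not one of the techniques used in the work described in this post, just the basic idea of trading computation for information on a small sample.

```python
import random
import statistics


def bootstrap_interval(sample, stat=statistics.median,
                       n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` computed on a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Recompute the statistic on many resamples drawn WITH replacement.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical small data set: 15 invented order values with two outliers.
orders = [12.0, 15.5, 9.0, 22.0, 14.0, 13.5, 40.0, 11.0,
          16.0, 18.5, 10.0, 14.5, 13.0, 17.0, 95.0]

low, high = bootstrap_interval(orders)
print(f"median = {statistics.median(orders)}, "
      f"95% bootstrap interval = ({low}, {high})")
```

Fifteen observations and two thousand resamples: the data is small, the crunch is (relatively) big, and each resample is independent of the others, which is why this kind of work parallelises so easily.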
I have over the last couple of years developed a number of new, tortuously computationally complex (but deliciously parallelisable) techniques for measurement of such beasts as robust predictive variable importance, generalised nonlinear correlation and forecast performance. I use these to squeeze information out of data sets big, and, more importantly, small.
Incidentally, in practice “small” includes huge data sets with tiny classes, where the task is classification, and the business problem may be customer retention, acquisition or fraud detection.
Big Crunch applies extreme statistical sophistication to small data sets, just the very thing you need to get the very limited amount of information out.
These methods really do squeeze the informational juice out of data the way simple, mean estimating, linear models and Pearson correlation measures cannot ever hope to do.
Indeed, much of my recent work in areas as diverse as geophysical analysis, retail price elasticity modelling and customer acquisition in entertainment can be described as Big Crunch. In all cases, we see dramatic improvements in robustness, performance and reliability. The biggest challenge is to explain the difference between less reliable, classical measures such as R-squared on training data, and the far more reliable, seemingly magical OOB error on random forests, and how these serve as building blocks for far more sophisticated solutions.
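The gap between a training-data measure and a held-out one can be shown with a toy sketch. It assumes a deliberately overfitting model (a one-nearest-neighbour predictor) and uses k-fold cross validation as a stand-in for OOB error, since both score each observation with a model that never saw it. The data is synthetic and every function name is hypothetical; it is not a random forest and not the method used in the projects above.

```python
import random


def r_squared(actual, predicted):
    """Classical R-squared: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def predict_1nn(train, x):
    """Predict y for x by copying the y of the nearest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def kfold_r_squared(data, k=5, seed=0):
    """R-squared where every point is predicted by a model that never saw it."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    actual, predicted = [], []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        for x, y in held_out:
            actual.append(y)
            predicted.append(predict_1nn(train, x))
    return r_squared(actual, predicted)


# Synthetic small data set: a weak linear signal buried in noise.
rng = random.Random(42)
data = []
for _ in range(40):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + rng.gauss(0, 5)))

# On its own training data the model simply memorises every point.
train_r2 = r_squared([y for _, y in data],
                     [predict_1nn(data, x) for x, _ in data])
cv_r2 = kfold_r_squared(data)
print(f"R-squared on training data: {train_r2:.2f}")
print(f"R-squared under 5-fold CV:  {cv_r2:.2f}")
```

The training figure looks perfect because the model has memorised the noise; the held-out figure is the one that says something about how it would do on new data.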
Finally, the Big Crunch mini-FAQ.
Is “Big Crunch” actually just “Big Data”? Yes, if you like, in a way. Similar principles usually apply.
Can/should you Hadoop/MapReduce it? Sure. Highly parallelisable as a rule.
Can you deploy Big Crunch on commercial systems? Sure, why not. But do you really need to? (In fact, we just deployed a Big Crunch solution in SAS, after prototyping in R.)
Is Big Crunch difficult? It shouldn’t be. Not if you have the right people. They are around. And your people can become the right people with a bit of mentoring. A stats degree helps.
Is Big Crunch expensive? In people, moderately. In systems and software: hey, Hadoop is open source. And so is R. And the same applies to Big Data too. So no.