At a recent AI event I was asked how academia can help Analytics. At another recent event, a discussion with a recruiting professional focused on the criteria HR departments use to filter and select Analytics candidates.
Analytics education is a hot topic. A growing, multidisciplinary and highly complex field is experiencing a shortage of suitably qualified people, and this is becoming a matter of concern for business.
There is a growing number of data mining/Analytics subjects, majors and even courses. I have been asked by a number of people in recent weeks what I think of particular courses, or what individuals should do to prepare themselves for a career in Business Analytics.
My opinion of the value of existing courses is this: the number of suitable people will increase as a result of these courses, but not by much.
As it stands, there are serious problems with what passes for “data mining” training, particularly at undergraduate level, and especially in computer science.
I single out computer science because it produces seemingly suitable candidates who are hired by HR and then fail in fundamental, but not immediately obvious, ways. I also speak as an “insider”, with my entire academic training based in computer science. As with all useful generalisations, there are quite a few exceptions, and the problem outlined lives on a continuum of pathology, with a minority of extreme cases and many more less severe ones. Nevertheless, there is a real and consistent problem with computer science undergraduates (and many postgraduates!) moving into a career in Analytics. Given that these represent most of the new talent in the field, this is an issue to address ASAP.
The problem is not easy to detect by HR at interview time. A typical computer science graduate may well have one or more AI, machine learning, data mining and even statistics courses under their belt. Indeed, they are capable of writing from scratch some of the more sophisticated algorithms of machine learning.
And therein lies the problem. They may insist on writing algorithms when they should be extracting value from data. They will not appreciate the key differences and similarities between algorithms. They will insist on trying them all.
Worse, they may not even have the right basic categories in place. While they will eagerly deploy Boosting, Bagging, Support Vector Machines and Generalised Linear Models, they might not do so with the appropriate pre-processing, error function selection and, worst of all, suitability to the business problem. Worse still, some will not even appreciate that all four are fundamentally distinct from k-means clustering. To some, they are all just cool algorithms, fun to play with.
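The categorical distinction being missed here can be made concrete. A minimal sketch, assuming scikit-learn: the four methods named above are supervised learners, fitted against a known target, while k-means is unsupervised and never sees a target at all. (The dataset and parameter choices below are illustrative, not from the original post.)

```python
# Supervised vs unsupervised: the basic category distinction.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, BaggingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression  # a GLM for binary targets
from sklearn.cluster import KMeans

# Illustrative synthetic data: 200 rows, 5 features, binary target y.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Supervised: every fit() requires the labels y, and predict() estimates y.
supervised = [
    GradientBoostingClassifier(random_state=0),  # Boosting
    BaggingClassifier(random_state=0),           # Bagging
    SVC(),                                       # Support Vector Machine
    LogisticRegression(max_iter=1000),           # a Generalised Linear Model
]
for model in supervised:
    model.fit(X, y)            # X *and* y
    preds = model.predict(X)   # predictions of the target

# Unsupervised: k-means never sees y; it only partitions X into groups.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # X only
clusters = km.labels_          # cluster ids, not predictions of y
```

Treating all five as interchangeable “cool algorithms” ignores that the first four answer “what is y, given X?” while k-means answers “how does X group itself?”, which are different business questions entirely.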
Worst of all, this is a cohort that sees Business Analytics as an IT job. Their main activity is tinkering with, writing, re-writing and deploying algorithms. Their computer science backgrounds provide them with an IT model of Analytics, where deploying an algorithm from scratch is possible without understanding the statistical subtlety that gave rise to the algorithm in the first place, and that distinguishes it from other methods. Not really understanding the theoretical basis, these candidates are inclined to try them all. Parameters are tinkered with based on “best practice” or voodoo rather than sound, statistically trained intuition.
In the end, the final product is the code itself, or “specified” outputs, rather than a considered analysis.
There is a somewhat superior cohort which is more inclined to explore the data. This group suffers from a lack of training in this approach, and must rely on their natural curiosity alone, without the benefit of understanding the multidimensional, correlated, uncertain and information-rich nature of data.
The key problem here is that Business Analytics needs educated, curious “finders”, hunters of truth in data, who know their tools, and also their prey, and enjoy the uncertain, manual, iterative nature of the hunt. They also understand their clients, and their multiple, sometimes uncertain, under-defined or conflicting objectives. They flourish in uncertainty and the thrill of the hunt. Tools are important, and sometimes they can build their own, but the tools are far from the main thing, and the process is far less important than the finding.
The IT model of training instead creates “builders”, who see their role as creating, testing and comparing algorithms, which are implemented as black boxes, part of clearly specified processes. Either the process itself, or the data produced by it is the end result. This is what we refer to as the IT model of Analytics.
The most naturally curious and intelligent of these still find a way to become “finders”, though without the benefit of rigorous statistical training, a shortfall they usually address in time on their way to becoming competent Business Analytics professionals.
The rest tend to be an ongoing problem, particularly in the larger companies and government departments where they tend to accumulate. At best, they are naturally well suited to data-acquisition, warehousing and pre-processing tasks – supporting Business Analytics at a low level, but not taking part in the real thing. At worst, they suck in valuable time, money and the attention of management, while nothing substantial is produced in terms of insights, or even statistically rigorous, meaningful data processing. They can be particularly problematic in government departments such as those in Australia where it is virtually impossible to fire someone once hired, and difficult enough to direct, performance manage or criticise staff.
This is a fundamental problem. Happily, it can be addressed quite easily on the recruitment side. It should be simple enough to determine whether the candidate is at heart a builder or a finder, and what level of statistical and analytic, as opposed to computational, training they have. All it really takes is to ask some key questions in the interview, and take a critical eye to the CV. Of course, it requires an appropriately educated interviewer.
On the educational side, the issue is a little more complex. Again, the solution begins with recognizing the key distinction between builders and finders. There is a professional track for both, and current computer science courses are better at preparing people for jobs in data warehousing, BI implementation, ETL and other tasks supporting Business Analytics. Perhaps this track should have its own name, to distinguish it from Business Analytics proper.
Having recognised the key distinction, what can computer science undergraduate courses do to produce more actual Analytics professionals? Is there a computer science graduate “finder”? Of course there is. But these tend to be the most gifted, curious and unusually quantitative in their training.
The obvious solution is to create a serious, multidisciplinary degree, one with the right amount of computer science, mathematics, statistics, psychology and business studies, ideally a four or five year course. Most importantly, there must be specialized subjects in Business Analytics, taught by competent practitioners. There would also be specialized subjects on data preparation, business tools, communication skills and other things that current undergrads lack.
Whether this course sits in computer science or elsewhere is less important than producing well educated, multidisciplinary finders if we are to meet the current and growing training shortfall.
About us
Analyst First is a new approach to analytics, where tools take a far less important place than the people who perform, manage, request and envision analytics, while analytics is seen as a non-repetitive, exploratory and creative process where the outcome is not known at the start, and only a fraction of efforts are expected to result in success. This is in contrast with a common perception of analytics as IT and process.