From the monthly archives: July 2012

Electronic data, that is. The most important decisions aren’t data-friendly. But they are the ones worth the most dollars, nerves, careers and lives.

“Do we want to mail an offer to this particular person” is a far less important question than “Do we want to acquire this company”. The former is a decision supporting a precise, very low-level action, for which data exists because essentially the same action has been carried out many times, and will be again. But how do we apply analytics directly to the second question?

This is where collective forecasting can help, by applying analytics rigour to get the benefit of the most important data in an organisation, the tacit data in the heads of its people.

Collective forecasting is a truly “Analyst First” technique: the analyst comes before software, and even before (electronic) data. Indeed, software is helpful, but not essential, and data may be scattered, in short supply or absent entirely.

Here is a presentation given last week at the Australian Institute of Professional Intelligence Officers (AIPIO) annual conference, explaining the benefits of the collective forecasting approach to organisational strategic decision making. These include a powerful KPI for strategic forecasting and decision making, and flow-on effects of a truly meritocratic, depoliticised decision making culture, where Highly Paid People’s Opinions (HiPPOs) do not carry the same weight as a good predictive track record.

Improvement is gained through the use of the group, or collective forecast, which fuses the tacit knowledge of relevant knowledge holders to create a more reliable decision making mechanism.
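
To make the idea of fusing individual judgements concrete, here is a minimal Python sketch. It is an illustration only: the presentation does not say how forecasts were combined or scored, so the unweighted mean and the Brier score used here are assumptions, and the forecaster names, probabilities and outcomes are invented.

```python
from statistics import mean

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2

# Hypothetical probabilities from four forecasters for three yes/no questions.
forecasts = {
    "alice": [0.9, 0.2, 0.6],
    "bob": [0.6, 0.4, 0.9],
    "carol": [0.8, 0.1, 0.3],
    "dave": [0.3, 0.5, 0.8],
}
outcomes = [1, 0, 1]  # hypothetical results: 1 = it happened, 0 = it did not

# Group forecast (assumed aggregation rule): unweighted mean per question.
group = [mean(f[q] for f in forecasts.values()) for q in range(len(outcomes))]

def mean_brier(probabilities):
    """Average Brier score of one set of forecasts across all questions."""
    return mean(brier_score(p, o) for p, o in zip(probabilities, outcomes))

individual_scores = {name: mean_brier(p) for name, p in forecasts.items()}
average_individual_score = mean(individual_scores.values())
group_score = mean_brier(group)

print("individual:", individual_scores)
print("average individual:", average_individual_score)
print("group:", group_score)
```

Because squared error is convex, the averaged forecast can never score worse than the average individual, although a single well-calibrated individual may still beat it on a given set of questions. A running score of this kind is one way to build the predictive track record mentioned above.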

The presentation also reports the results of the first round of AIPIO’s collective forecasting competition, where the group forecast performed very well, as expected.

Readers are invited to take part in the second round of the competition, which is currently running.

Related post: a great Dilbert cartoon, saying essentially the same thing.

Are you ready to pay?

Let’s see a brand name magic black box fix these.

Meanwhile, also from Rexer: 

R continued its rise this year and is now being used by close to half of all data miners (47%). R users report preferring it for being free, open source, and having a wide variety of algorithms. Many people also cited R’s flexibility and the strength of the user community.

Following last week’s presentation on Analyst First (“A1”) to NZIIP in New Zealand, I congratulate John Holt, Compliance Modelling Manager at Inland Revenue New Zealand, on becoming head of the newly established Auckland and Wellington chapters of A1.

A number of people from both cities also expressed an interest in participating, promising a lively and active A1 presence in NZ.

This brings the current number of chapters to nine, spanning six countries.

Those in NZ interested in participating in A1 can contact a1@analystfirst.com
Direct contact details for chapter heads will be posted shortly.

Related posts:

http://analystfirst.com/2012/07/21/1444/nziip-presentation-putting-the-analyst-first-the-human-infrastructure-of-data-analytics/

I have just returned from a terrific and all too brief visit to Wellington, New Zealand, where I presented the Analyst First (“A1”) vision to the professional body of NZ’s intelligence community, the New Zealand Institute of Intelligence Professionals (NZIIP), at their annual conference. A big thanks once again to the organisers for inviting me, and for giving me the opportunity to meet such a dynamic, interesting and intelligent group of people.

The presentation was too long for the time allowed, as it tried to capture the main aspects of the set of ideas comprising A1. The response was positive, with further NZ-related A1 developments to be announced shortly.

It also included a picture that captures the whole idea of A1 in the ironic “Motivational Posters” style. You can find the picture on the second page.

Here is a copy of the presentation.

This was only the first of two presentations that I gave at the conference, the second being delivered to an audience that included the New Zealand Prime Minister, John Key. Not the kind of thing that I am used to by any means. This second presentation had no slides; it was an extemporaneous opening of the NZIIP Forecasting Competition. Unfortunately this competition is open to NZIIP members only. All are, however, welcome to participate in the Australian Institute of Professional Intelligence Officers (AIPIO) Collective Forecasting Competition, which is currently running.

I look forward to seeing some of the NZIIP people again at the AIPIO Annual Conference this week in Sydney.

Related articles:

What is Analyst First ?
Analytics is Intelligence
Analytics is Intelligence – The Podcast
AIPIO Collective Forecasting Competition

Today’s post is from Klaus Felsche, subject of a recent Analyst First Interview on the creation, development and live deployment of an Analytics function at the Australian Department of Immigration and Citizenship (DIAC).

At a recent Analyst 1st meeting in Canberra, it was suggested that I offer a few thoughts about ‘data quality’ as a thought-piece. Here are some of the thoughts:

There is considerable discussion around ‘data quality’; and so it should be. Unfortunately, ‘quality’ is a qualitative and, at times, emotional assessment of whether the data we have supports analytics. Some, after a quick look at available data, throw in the towel and abandon any attempts to make the data work.

I am inclined to abandon the word ‘quality’ as it tends to under-value the capabilities of skilled analysts and sophisticated software and leads managers to rash judgments about what is and what is not possible. It is far more constructive to consider whether the given data is able to support our processes with sufficient accuracy to be meaningful. In other words, is there enough value in the data to be fit for service?

The Situation: Data is rarely collected to support analytics
We tend to have data that is collected to support a business process. Even if analytics is considered at the initial design stages, by the time we need to get the data to provide answers, we would generally find that these new questions were probably not anticipated years ago when the project defined the current data structures and processes.

The challenge for analysts is, therefore, twofold: firstly, the analyst tries to build models that address the issues with the highest degree of accuracy from the available data; secondly, the analyst compiles a ‘shopping list’ of data that would enhance the process if it were available. The ‘shopping list’ can be provided to management to ensure it is considered in future redesigns.

There will be times when even the smartest analyst cannot squeeze the answer out of existing data sets or sets outside the organisation that may be available.

While we wait to get more useful data, analysts may be able to help the organisation build interim solutions based on what is available (e.g. intelligence reporting, business intuition and knowledge, etc).

Lessons Learnt
• Avoid the term ‘quality’. It assumes that there is a low or high quality data set. This is not helpful. We should probably focus on how well the data supports the analytics processes. As far as business operations are concerned, if the data processes support the core business functions then there is little wrong with its ‘quality’.
• Start analysis from the ‘data end’ rather than with a business model in mind (one derived from intuition, anecdote or experience). While such models can be useful, measuring the data against them and then making a call that the data is ‘not good enough’ in some way is not helpful. Analysts should be given the opportunity to test a range of methods to see what meaning can be drawn from the existing data first – in my experience, analysts have been known to produce pleasant surprises when given the chance and the right tools.
• Ensure that there is a business process in place (or build one) that can feed analysts’ suggestions back into future systems enhancements.
• Many vendors offer an end-to-end solution (everything from data capture to storage to analysis and reporting). Such systems would need to be flexible enough to accommodate changes over time in data structures, collection processes and the tools available to support analytics processes.
• Educate the business areas to create an awareness of the value of data:
o enhanced data sets; and
o business processes that better support analytics (i.e. convince operational staff or clients entering data that there is value in timeliness, accuracy, completeness, etc).

Technology Spectator recently published an article highlighting the need for big data speed …

This article highlights the communication challenge for accredited A1 professionals.

We all recognise that Analytics is about using information better than competitors, so that we are (1) doing things better and (2) doing better things than competitors or relevant comparators. But like so much of the coverage of our sector, the article focusses solely on the former, Operational Analytics, and not on the latter, Strategic Analytics.

Secondly, the article fails to recognise that speed is only one part of the equation.

Taking the author’s example of the retail sector: sure, real-time analytics can detect an early decline in sales for a particular product, controlling for some extraneous factors. But a retailer’s promotional response (who they target and how) doesn’t necessarily require real-time analytics (they can apply, in real time, the outputs of models created last week, with little risk of degradation).

The most important questions for shareholders of the retailer require Strategic Analytic capability: How should pricing across the entire product portfolio be optimised? What products should we be ordering now for next season (or the season after)? How should the physical network and supply chain be optimised? These strategic questions demand the right answer, not necessarily the fastest answer.

Any experienced industry professional gets that making sense of data is our primary role. But clearly, interpreting data to the best of our ability flies in the face of throwing away information (e.g. because inconsistencies in the available data make the task more cognitively complex). No one would advocate storing and processing data which possesses no incremental information value, but information value can be measured, so that shouldn’t be an issue.
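
As a rough illustration of the claim that information value can be measured, the Python sketch below scores two candidate fields against an outcome using mutual information. The choice of mutual information is an assumption made here (the article names no particular measure), and the field names and values are invented for the example.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length sequences of discrete values."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )

# Hypothetical customer records: did the customer churn (1) or stay (0)?
churned = [1, 0, 1, 0, 1, 0, 1, 0]
# Hypothetical candidate fields we might store and process.
paid_late = ["late", "ok", "late", "ok", "late", "ok", "late", "ok"]
region = ["north", "north", "south", "south", "north", "north", "south", "south"]

mi_paid_late = mutual_information(paid_late, churned)
mi_region = mutual_information(region, churned)

print("paid_late:", mi_paid_late, "bits")
print("region:", mi_region, "bits")
```

In this toy data the payment field fully determines the outcome while the region field is independent of it, so the first scores one bit and the second scores zero. Real fields fall somewhere in between, and an estimate from a small sample would need the kind of resampling check that any other statistic does.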

Critically, this article fails to recognise that many of the barriers Australian companies face in using their data effectively relate to data quality, not to their data storage and processing capacity.

Finally, there is no explicit recognition of the talent required to use data more effectively than your competitors.

From an A1 perspective we should welcome the growing focus on our sector, but we need to better articulate the more nuanced (and interesting) story of Analytics in an A1 Practice. It would be easy to criticise the journalist for being naive in swallowing the line of vendors and other vested interests, but the responsibility is ours to better explain the reality.

Eugene is totally right that we need to stand with a united voice. From today, NTF will publicly back A1 in all our proposals and marketing collateral. I regret not taking this action sooner.

Continuing with the big data meets big hype theme:
So you want to get into Business Analytics/Big Data/Predictive Analytics.

What areas, skills, tools and data should you focus on first?

There are three rather big questions that you need to ask yourself:

1. How well do I really understand the problem(s) that I want Analytics to solve, and the role(s) that Analytics would play?

2. How well do I understand my data?

3. What data do I actually have, or can get?

Each question explores a continuum. Together they represent a three-dimensional space of possibilities. There is no “magic quadrant” here; each part of the space is a legitimate place to be, with its own solutions, risks and benefits.

Let’s go through them.

1. The range of possibilities looks something like this:
A. Having built preliminary offline random forest models and created some prototypes, I want to extend our existing customer acquisition and retention models to our international markets, and operationalise them for real-time, event-based activity, provided this is seen to yield further significant returns. We will need an industrial-strength, scalable and reliable tool, probably a commercial vendor tool, and possibly a Hadoop-based MapReduce solution.

B. My CEO just attended a lavish conference where he saw a slide presentation mentioning the Davenport HBR article from 2006, and now he wants us to “get into analytics”.

Most people are somewhere in between. But you get the idea. And there are far too many initiatives that are precisely at B. The ideal vendor customer is precisely at A. Unfortunately, there are not enough As around (we call them “Educated Buyers”), so some vendors must sell to people who look more like Bs.

Naturally, Analyst First does not advise Bs to get into Big Data, buy expensive vendor tools, or ever believe anyone who says that there is such a thing as “a solution for getting started in Analytics”, especially when said solution is no more than a bunch of software and maybe a few relatively junior technical consultants for a few months.

Indeed, we advise the Bs of this world to invest in learning, exploring and gaining experience, while managing their sponsors’ expectations and growing their personal investment and participation in the new Analytics enterprise (yep, it’s an Enterprise, with all the Lean Startup that entails), and eliciting from said sponsors their real, and realistically achievable needs.
This is a crucial time to invest in smarts, experience, talent, learning and plenty of Lean Startup.
If this approach is not feasible, I do not have high hopes for the future of the function, which will, at best, become a showpiece trophy of high tech adding no value, and will more likely be shut down, “restructured” and restarted again, hopefully with a more sensible approach.

And what of the As ?
I spoke recently to an A, indeed one of the best As I know. He noted that his team had kicked some great business goals, having implemented a very necessary, expensive vendor tool after trying R and seeing that it was not up to the big data / big crunch job they had to do. He noted that this was necessary, even though he agreed with A1, and that it was not in line with A1’s preference for open source tools.

“Not at all,” I replied. “This is exactly A1: you were the quintessential Educated Buyer! A1 is not against vendor tools. We are against people spending money on what they do not understand in the hope of a magic solution. You don’t fall into that category.”

Hopefully, the anonymous A in question will write a post on this blog, outlining his success story in more detail.

So, our advice to As is… You don’t really need our advice, until you want to do something new again. In which case, chances are you are following A1 principles already, explicitly or not – otherwise how did you get to A in the first place, anyway?

Most people are somewhere in between, and usually closer to B than to A.

Answering the “what the heck are we going to do?” question involves exploration on a number of axes, including stakeholders’ needs, own capability, available resources (human and electronic), any impediments or constraints (Hello IT!) and data, the subject of questions 2 and 3. The actual hidden contents of the data, the “gold” of the data “mining” metaphor, is a huge exploratory subject in its own right, and must be considered in the context of the others.
This is not a very easy target to hit, and it needs defining before that can happen!

So, to all the Bs and almost-Bs out there: invest in learning, both your own and your sponsors’. Invest in getting your sponsor invested, supporting and covering you, letting you explore and grow. Invest, above all, in exploration, and invest in managing expectations and delivering intermediate results to allow all this to happen. Buy your analytics function a chance to grow, learn, explore and breathe free of unreasonable pressures and constraints.

The other two questions will be covered in upcoming posts.

A few thoughts on Big Data, and how it matters, as well as where the hype does more harm than good. Also some thoughts on what I call “Big Crunch”, Big Data’s Siamese twin, and one I am well acquainted with, more so than Big Data proper.

First of all, the hype. A colleague in the industry spoke to me recently regarding an all too common scenario.

The colleague is effectively the analytics manager of the business. The trouble is, the business does not really have much of an idea about what data analytics is, apart from the fact that it has something to do with big brand software, which the company has in abundance. There is far less idea that it requires direct engagement and understanding from management, and most importantly that it requires clean, accessible and appropriate data before anything can happen, and that this requires a clear, concerted effort, and that this usually consumes the bulk of investment in time, money and political capital. Certainly, there is no realization that what the business needs done is easily served by open source and commodity software, but requires investment in good staff, and appropriate, coordinated effort from IT and management.

So far, so typical A1 case study, could really be anyone, and probably an adequate description of at least 100 cases in the city of Sydney alone. The Big Data twist is that a senior manager has been sending the colleague material on Big Data, and how important it is. Again, this is an increasingly common occurrence.

Naturally, the colleague is exasperated. First of all, the business is not really one that collects or benefits from terabytes per day of data. Secondly, they have not even got their small data right. It is neither clean, nor reliable, nor ready for analysis. Finally, the manager in question has done little to get appropriately educated or involved in analytics, but has bought deeply into the Big Data hype, seemingly without understanding it too much either.

And here endeth yet another, all too typical cautionary tale.
But Analyst First is more about “Thou Might Want To” than about “Thou Shalt Not”.
So here are the “Thou Shalt” Analyst First commandments on Big Data:

Hear ye o senior manager/sponsor of Analytics!
Before you worry about Big Data, or dare speak its name:

1. Get your Small Data right first. Organised, cleaned, and analysed.
2. Be engaged in analytics. Investing in analytics is like investing in a Gym membership, or an education – you don’t just spend the money and make it someone else’s problem. It only works if you are directly engaged.
3. Invest in the best Human Infrastructure you can.
4. Squeeze all the value you can out of your small data first.
5. Use Big Crunch on small data before moving to Big Data.

So point 5 brings me to Big Crunch.
OK, admittedly, Big Crunch is actually a feature of an upcoming post, but the intro and teaser is here (just as A1 itself started life as a teaser at an R presentation to MelbURN almost 2 years ago).

So, what is Big Crunch ?
Big Crunch is my name for the very cool stuff you can do with small data given heaps of computation. Much of my recent work has made extensive use of tools such as resampling, and its specific manifestation in Random Forests, k-fold cross-validation, and other advanced predictive modelling techniques.

I have over the last couple of years developed a number of new, tortuously computationally complex (but deliciously parallelisable) techniques for measurement of such beasts as robust predictive variable importance, generalised nonlinear correlation and forecast performance. I use these to squeeze information out of data sets big, and, more importantly, small.

Incidentally, in practice “small” includes huge data sets with tiny classes, where the task is classification, and the business problem may be customer retention, acquisition or fraud detection.

Big Crunch applies extreme statistical sophistication to small data sets, just the very thing you need to get the very limited amount of information out.
These methods really do squeeze the informational juice out of data the way simple, mean estimating, linear models and Pearson correlation measures cannot ever hope to do.

Indeed, much of my recent work in areas as diverse as geophysical analysis, retail price elasticity modelling and customer acquisition in entertainment can be described as Big Crunch. In all cases, we see dramatic improvements in robustness, performance and reliability. The biggest challenge is to explain the difference between less reliable, classical measures such as R-squared on training data, and the far more reliable, seemingly magical OOB error on random forests, and how these serve as building blocks for far more sophisticated solutions.
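
The gap between a training-data score and an out-of-bag (OOB) score can be shown with a small Python sketch. It is a toy under stated assumptions, not the method used in the work described above: a 1-nearest-neighbour rule stands in for a random forest, accuracy stands in for R-squared, and the data is pure noise generated for the example, so there is nothing real to learn.

```python
import random

def nearest_label(x, train):
    """Predict the label of the training point closest to x (1-nearest-neighbour)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(rows, train):
    """Share of rows whose label is predicted correctly from the training rows."""
    return sum(nearest_label(x, train) == y for x, y in rows) / len(rows)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Pure noise: the feature carries no information about the 0/1 label.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(200)]

# Score on the training data: every point is its own nearest neighbour.
train_acc = accuracy(data, data)

# OOB score: for each bootstrap resample, test only on rows that were not drawn.
hits = total = 0
for _ in range(50):
    idx = [rng.randrange(len(data)) for _ in data]
    in_bag = [data[i] for i in idx]
    drawn = set(idx)
    oob = [row for i, row in enumerate(data) if i not in drawn]
    hits += sum(nearest_label(x, in_bag) == y for x, y in oob)
    total += len(oob)
oob_acc = hits / total

print("training accuracy:", train_acc)
print("out-of-bag accuracy:", oob_acc)
```

The training score looks perfect because the model is graded on rows it has memorised, while the OOB score sits near coin-tossing, which is the honest answer for noise. The 50 resamples are independent of each other, which is why this style of computation parallelises so readily.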

Finally, the Big Crunch mini-FAQ.

Is “Big Crunch” actually just “Big Data” ? Yes, if you like, in a way. Similar principles usually apply.
Can/should you Hadoop/MapReduce it ? Sure. Highly parallelisable as a rule.
Can you deploy Big Crunch on commercial systems ? Sure, why not. But do you really need to ? (In fact, we just deployed a Big Crunch solution in SAS, after prototyping in R.)
Is Big Crunch difficult ? It shouldn’t be. Not if you have the right people. They are around. And your people can become the right people with a bit of mentoring. A stats degree helps.
Is Big Crunch expensive ? In people, moderately. In systems and software: hey, Hadoop is open source. And so is R. And the same applies to Big Data too. So no.
