Klaus Felsche is the Director of Intent Management & Analytics for the Australian Department of Immigration and Citizenship (DIAC). We sat down recently to talk about the evolution of Analytics at DIAC, how DIAC’s Analytics initiatives are managed, and the composition of its Analytics team.

ANALYTICS AT DIAC

SS: How did you first become aware of Analytics?

KF: In 2009 I was asked to reshape my job in the Border Security Division at the behest of the Chief Information Officer, who wanted to have an Analytics capability in DIAC. I had to Google “Analytics” to find out what that was supposed to mean, which was less satisfactory than perhaps it should have been. There was a move across government and amongst various CIOs and other interested players, along with a big push from industry, to go and get a hold of Analytics—whatever it was at the time—in order to try to inform decision making in a more constructive way than had been possible before. So we didn’t at that point have a detailed understanding of what it was and what it involved, but we had a reasonable sense that it might be a good thing.

SS: Where did you go—past those initial Google searches?

KF: I started doing the exploration and following the trails and found a lot of dead ends. I also found that there were some large multinational suppliers offering to do what was called “Analytics” work, and I guess at that stage we were limited in what we could expect from them in a number of different ways. For example, I don’t think we provided much good business input into what these organisations were seeking to do because we didn’t understand our problems in analytic terms. They didn’t seem to understand our business as well as we thought they should, or perhaps as well as they should have. They offered some fairly simplistic solutions, which were much more traditional business intelligence focused than Analytics focused. From our point of view, the whole thing was about identifying potential risk to our various business lines; that’s what Analytics was really aimed at. Following those lines through, I was fortunate that I then had access to people like Warwick Graco from Tax [the Australian Taxation Office (ATO)]. Warwick was one of the first people to help us distill our own way of thinking about what Analytics meant in a large government organisation and how it would be operationalised. As an aside, I had worked with Warwick in the 1980s when we were both serving at the Australian Army’s Officer Cadet School. And it was through that, and listening to Eugene [Dubossarsky] at one of the ATO Community of Practice sessions in late 2009, early 2010, which Tax kindly invited us to, that we started to form our own definition and identify what we could do in terms of immigration business.

We also spoke to some of the banks, and the odd insurance company, and a number of organisations along those lines to get some baseline comparisons. At the same time, of course, once we started asking questions, word got out to the IT industry, which started to push quite hard to capture a portion of our business. I was quite fortunate in that we had no budget to speak of, so I was relatively immune from making foolish business decisions.

So we had started to form our ideas, and then it was a case of fitting into some of the organisational transformation processes. In 2010, the Risk, Fraud and Integrity Division was formed to provide risk-based services to the rest of the organisation. When I was in the Border Security Division, the focus was usually on border security and border transactions, and much of the visa caseload was secondary. Risk, Fraud and Integrity Division allowed us to look across all of the organisation. We then ran a large workshop at the Deputy Secretary level between Australian agencies, and New Zealand participants also joined us. Graham Williams from Tax was asked to present because we identified him as someone who clearly knew what he was talking about. All of that laid the foundation for what we’re doing now.

SS: What did an Analytics capability mean then? How did you and others picture it?

KF: The traditional view was it was something to do with IT but providing a business service. Some had the sense that an Analytics capability was an additional service provided by our Business Intelligence platforms. The Chief Information Officer at the time viewed it as an IT-enabled service but he didn’t give it to the IT people. He gave it to the business side to progress, which to me demonstrated that he understood it. I think there was a deliberate attempt to try to get a more open approach to looking at data and what it could do to inform the organisation. The CIO was very clear that we had to be focused on risk identification in the visa caseload.

SS: How was the upside pictured? Not letting higher risk cases through? Being much more efficient in processing lower risks? Gaining new intelligence? Having an evidence base for making decisions?

KF: At some level, all of those. But I think the main business driver was efficiency: identifying risks and then aligning differentiated treatments appropriate to treat them. Some aspects of this are counter-intuitive if you start looking at the detail. A high risk case may actually require minimal treatment because if you can identify it the decision is relatively simple. You don’t give the person a visa if they represent high risk. The decision can be made in a few seconds. The same is true of low risk. If you can’t see much risk then the processing is also relatively straightforward. The hard part of the caseload is usually the large component in the middle. That is actually the intensive part of visa processing. So very quickly in our area we started focusing on trying to identify as clearly as we could the likelihood of particular risks occurring in the caseload. That is really what has been our driver.
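[Editor’s illustration] The differentiated-treatment idea described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the risk score, the two thresholds and the treatment names are hypothetical and chosen purely for illustration; none of them come from the interview or from any DIAC system.

```python
from collections import Counter

# Hypothetical cut-offs, chosen only for illustration.
LOW_RISK_MAX = 0.05
HIGH_RISK_MIN = 0.90


def triage(risk_score: float) -> str:
    """Map a risk score in [0, 1] to a hypothetical treatment stream."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= HIGH_RISK_MIN:
        return "quick_refusal"    # clearly high risk: the decision is simple
    if risk_score <= LOW_RISK_MAX:
        return "streamline"       # little visible risk: light-touch processing
    return "full_assessment"      # the uncertain middle: intensive processing


def workload(scores):
    """Count how many cases fall into each treatment stream."""
    return Counter(triage(s) for s in scores)


if __name__ == "__main__":
    # Made-up scores, not real caseload data.
    sample = [0.01, 0.03, 0.20, 0.50, 0.70, 0.95]
    print(workload(sample))
```

The point of the sketch is that the two ends of the score range are cheap to decide, while the effort concentrates in the middle band; where the thresholds sit would be a business and policy decision, not something the code can settle.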

A big issue for us was cultural change. If I go home to watch a football game on television, and I turn the television on and get the right channel, and the sound is OK, and I can see the picture, then I really don’t question the wiring behind the screen. I am looking at performance, and I can see an outcome which convinces me either that I bought a good television set or a rotten one. In this department, there is a legal construct called the delegate who makes decisions on behalf of the minister. An officer will make a visa decision or a border decision effectively as the Minister’s delegate, and it is that person who needs to be satisfied that a decision is valid and complies with the law and policy settings. Many officers are not prepared to make decisions unless they have personally checked all aspects even if an analytics process informs them that risks are negligible, nor are they prepared to believe a system that says “Trust me, you don’t have to do the full gamut of processing”. So we naturally get pushback from the individual, who says, “Well I am the one who is the delegate, not you. I will decide when I’m satisfied that an application meets requirements, not a machine.” When you present a complex means of identifying what could be a streamlineable case because it has almost no visible risk you may find that the pushback from the processing officer is: “How do I know? What happens when it goes wrong? Who’s going to get kicked at the end of the day for making a wrong decision?” If officers assume it is going to be them, then they will insist on doing full processing anyway, regardless of whether or not a case has been identified as low risk.

So there is a lot of cultural change that has to occur before you can trust the process. Going back to the plasma screen example, what we actually have to do is establish that the system is producing results that are reliable. That’s evidence. But people have to actually see that the results are reliable to develop trust in the system. Only then will they start believing it. Until we can demonstrate that, people won’t trust the ‘black box’. They will simply try to second-guess it. With Analytics, particularly using things like predictive modelling, nothing frightens people more than saying, “Trust me. The algorithm running in this little machine will decide whether or not you’re going to get sacked for making a poor decision”.

There’s a real tension here. If the predictive rules you discover through Analytics conform to existing practices then you can embed them in an IT system and they will be accepted. The problem with that is, you’re not making any advancement towards more efficient processing because you are basically embedding commonly understood business rules that were being applied anyway. You may have standardised, but you’re not looking at anything new.

SS: Was the cultural challenge something you anticipated? Did it come as a surprise?

KF: No, it didn’t. The volume was a bit of a surprise. The difficulty of ‘selling Analytics’ was the gist of Eugene’s presentation two-and-a-half years ago now. In that presentation he covered the issues of how you get corporate buy-in—not just the top level, but other levels. So we knew that it was coming. What we had underestimated was the severity of the pushback. A single anecdotal negative outcome can put a lot of pressure on an Analytics area. You can have 99 out of 100 things working really well and that will be just expected. One bad case will negatively colour the whole enterprise if you let it.

At the same time, the challenge we had was that this was a brand new field. We had to demonstrate potential value very quickly. We used examples that made sense to everybody up and down the line to prove our capabilities. A very early piece of work that we did on some travel data identified that, at a particular point in time in 2006, anybody with a Belgian passport trying to get on a Cathay Pacific flight in Hong Kong and wanting to fly to Brisbane or Perth was 80% likely to be an imposter. This had never been highlighted in that way before, and we demonstrated it just through a simple decision tree, but it was enough to highlight to people just what the power of this particular approach was. It was a data-derived pattern. I remember going to the Identity Branch and saying, “Guess what we found?” And they said, “Yes, we knew that. They were all imposters.” Only a few people at the time would have been aware of this. There were other similar patterns, still invisible to the organisation. It was an interesting exercise in selling the power of the concept. It was a really important part of the process because that decision tree went up all the way to the senior executive as an example of what could be done. This then gave us the permission to go and explore further.

Some of these problems still persist and will persist for years to come. There will be natural organisational reluctance to adopt some new approaches for not necessarily logical reasons that are nonetheless understandable within cultural and organisational constructs that exist. For example, if I use the word “efficiency” some will start counting the bodies they’re going to lose as an ‘efficiency dividend’. We have to get on the front foot, pointing out that in four years’ time there will be a million more visas to process given current growth rates. How are we going to do that with current methods and our current processes? Analytics offers a solution that can help you manage the growing caseload without a loss of integrity.

APPROACH TO ANALYTICS

We have developed a project methodology to develop and deploy innovative approaches, but it’s a non-standard methodology. Our process starts off in the lab. We first have to convince ourselves that something is worth selling. So we want evidence. We want the confidence that whatever we are going to be building is actually going to yield a positive outcome. Once we’ve done that for our own purposes and signed ourselves up to the concept we can then start exposing it through a test environment to the business side of the house that we’re trying to service, but at very limited scale. The main purpose of that is to build more evidence, but to build it closer to the business, and to expose the business people involved in the testing environment to the potential that is offered. If it passes that particular stage then we can move into a proof of concept or prototype that is going to bring it very close to the front line but not in a full blown production environment, so that we can convince more business delivery people, get more information back about how it is going to be consumed, and still have the flexibility to tweak and change our view of how certain things should happen. Once we’ve got all of that boxed up we’ve effectively done the design and development stages of any project and we’re ready to put a really strong business case to the organisation which says: this thing works; it has the following limitations; the following people like it; the following people hate it; and it’s going to cost this much money to turn into an enterprise solution.

That’s quite different to the usual process that relies on coming up with all the answers up front. Doing design and development, in theory, up front, for something that nobody really understands—even the people who are pushing the project—then doing all of that engineering conceptually, again up front, and then in the last three months of a project trying to do the coding and the delivery and the socialisation on the ground. Then we may find out that it doesn’t work properly, and it doesn’t do what it is supposed to do, and people hate it because it is not performing, and you’ve switched the other things off that they were using before because you need the money to pay for all of the new stuff.

We’ve been avoiding that by using a gradual incremental process. The business, even in the lab stage, is aware of what we are doing. We involve the business directly as soon as something is let loose out of the lab so we can pick up ‘cultural bits’. Introducing it early means gathering a lot more information and feedback about what sells and doesn’t sell in our business environment. There’s another layer to this too. If you can get analysts and other team members who understand the business part of the world and the capabilities that your Analytics processes can enable—if you can get that crossover happening—then you start to think of brand new ways of doing the business, brand new things that you can do that were never envisaged at the start of the project. That’s what we have tried to do and it’s been reasonably successful in some areas. It may not fit every type of project, but it certainly does fit the Analytics environment in this Department.

SS: So in that context the Analytics function is at least in part an ongoing R&D shop.

KF: There are two components of our Analytics process. There is what I call the bread and butter Analytics work. Once we have established a dependency on Analytics for a business process we have a responsibility for maintaining that function. So when we produce predictive models for visa systems or border systems, effectively we go into our production environment. These models need monitoring, they need refreshing, they need supplementation, refining. New technologies become available and new systems approaches get brought in, but it’s really a production house. Effectively, that should be around about 80 percent in my view. Within that a lot of innovation is still possible, but if I don’t produce a model that supports a new visa class that’s coming online, then we’re not doing our job. If we get that bit right, then the other 20 percent of the time is the stuff of innovation where we can actually say, “Well, we’ve never looked at this part of the business before. Is there a new way of actually addressing this question or of addressing this risk?” So I think Analytics is both: R&D and Production.

SS: You’ve contrasted the incremental with the big bang approach to systems development, and couched that in terms of uncertainty. How have you thought about software?

KF: If you imagine a simple curve that traces at what stage you are in a systems development cycle—lab, test, prototype, production—making mistakes in the lab is low cost. You can afford to back a few wrong horses. You effectively have a bit of room to try out some thinking, and you can afford to fail, because in your lab environment the cost structures are, generally speaking, quite low. The main cost there is people unless you make the wrong decisions about platforms and processes and potential software solutions. At the other end of the scale, if you make a mistake in a production environment that has to process 40,000 people a day, one mistake can quickly become very expensive—you know the old line about a computer enabling you to make as many mistakes in a couple of seconds as you used to take a whole year to make manually. And the costs are commensurate with that. So if we get a production system wrong, and it doesn’t deliver, the costs are extremely high to the organisation, both in terms of what people think about the organisation’s competency to manage a process, and also in terms of infrastructure and other investments: people, systems, IT solutions, and so on.

One of the strategies that we have tried to use is to minimise the risk to the budget by presenting to the department a low risk model that allows us to get things wrong. If I had, for example, signed up with a software vendor that was going to cost me millions of dollars per solution—CPU dependent, volume dependent, data volume dependent, whatever—it would have severely limited our ability to get things wrong. Everything that we got wrong would have had a high price tag on it. Luckily, in Analytics, there is a fair amount of competition.

In a more traditional IT environment, utilising something like the waterfall project method, even starting a project requires pre-allocation of sizable resources. This effectively stops staff from trying out new processes or ideas, as startup costs are high and the chance of failure (to deliver) is significant.

One of the key factors in the Analytics space is the range of open source work being done that is actually, in some respects, more powerful than what some commercial vendors could ever provide. The concept of having a Research and Development department of 50,000 is beyond even the largest companies of the world but effectively that is what open source is. It is not a well-controlled R&D department, but maybe you don’t want that sort of control in R&D anyway. What we’ve found in the Analytics space is that the open source solutions have allowed us to do things that otherwise we could never have even looked at seriously if we’d had to pay the price of a high price commercial solution. Just the concept of having to go out to tender to test something to do a particular thing is a total deterrent to trying it. We might want to do, for example, some entity extraction work. There are a number of commercial and open source software solutions around. If I go for anything I have to spend money on, it’s a formal procurement process. I have to go out to tender or similar processes. The costs are not trivial—both for us and potential providers—and there is usually a significant time lag. Open source allows us to go and try things straight away. It may not be the best solution around ultimately, but it enables us to build some confidence around the process, and then—if we have enough confidence around the process, and the open source software turns out not to be sufficiently robust for one of the next stages—at least we have the evidence base that the concept and the process work to a particular level of certainty. So we are not actually deterred from trying. That’s what open source has managed to do for us with Analytics.

SS: You’re seeking to delay any large expenditures until the point that your uncertainty has been commensurately reduced, which being able to make mistakes in a lab environment, without going out to market, enables you to do.

KF: It’s a case for value for money: what value can you demonstrate empirically that you get for your money. So, if you get to a particular point at which the open source or the free or the home-built solution can no longer cope, you’ve got lots of evidence that the process has value. Then you can do a decent cost/benefit analysis of potential tenderers with commercial software—plus you’ve got a really good benchmark expressed in terms of capabilities and functionality that you can use to filter other solutions against.

Some government organisations end up going through those processes and some build their own because there is nothing in the commercial sector that will do the job. That’s fine. But don’t forget the $200 software package that might just do the job. Perhaps not to 100 percent of the capability, but when you want to get to 100 percent performance and you’ve got to pay two million versus two hundred dollars, you can do the cost/benefit analysis. Using this approach shifts a great deal of risk to us. It is much more comfortable having a large vendor accepting all responsibility for maintenance, problem fixing, etc, rather than having to do it yourself.

In a lab environment you are insulated. You are not impacting the general IT environment. Even in the test environment you can separate most of your functionality out from the normal IT environment. There are few risks to our standard data platforms and systems. Our IT people are responsible for the safekeeping of data and the reliable provision of very time-critical and important client services. If you introduce something into that environment that creates hazards then they are quite justified in protecting the systems. This is doubly true in complex data environments where multiple systems could be affected by a rogue process. The tests we run in the lab environments allow us to demonstrate what the potential risks to the rest of the environments are through our work. And again, that builds confidence—should build the confidence—with the IT organisation that by introducing this piece of software you are not going to damage a whole enterprise solution. There are lots of advantages to using the low cost, low impact, high gain, lab/test environment to build an evidence base across all the dimensions Analytics has to address.

ANALYSTS

SS: What do you look for in an analyst to fit into that environment?

KF: The first thing is, obviously, that the analyst must have the technical skills that are required to work in that environment. At the start, if you are not a technical expert in the field, as I am not….

SS: The bamboozlement risk is high…

KF: In the public service environment, once you have made the commitment to hire, it takes you three to six months to bring somebody on board. If you make a mistake, you haven’t got the capability you thought you hired and you now have a staff member who is probably as unhappy as you are about the job they find themselves in. My preferred method is to go out and contract in some skills to establish a baseline. The risk is relatively low in making a decision about a contractor because if they don’t perform, the contract can be terminated.

SS: It’s the open source model, but with people.

KF: When you’re at the start of the process and you don’t really have an ability to make a judgement about what is good and what isn’t good, I think the net risk with contractors is a lot lower. The other things I expect from contractors now, knowing what I now know, are that they have the ability to communicate and that they are prepared to go and identify business problems, take ownership of business problems and look for business solutions, rather than being a technician who sits there waiting to be told to do something by somebody. So they have to be able to communicate both ways. They’ve got to be able to understand what the business problem is. They have got to be able to communicate things that are important to the business, which includes bad news and good news. That’s not going to happen overnight, and you can’t hire it at the start of a process, so you have to be prepared to invest in the individual in order to allow them to learn the context.

There are some that will probably direct their skills at a variety of problems which are very important, but which can be solved with some fairly basic skills. ‘Basic’ is not derogatory. It’s actually a lot more advanced than I would have, but they’re not super-skilled, although perhaps we can provide a path for them to become that. In our context I like a national approach. We are doing a lot of work developing our analyst network at the moment. The main reason I want it is that there are some smart people sitting in Perth and Brisbane who we might want to tap into, and in this particular age there is no reason why in a globally networked environment they can’t be working on one problem all across Australia.

Then there are starter analysts—people that provide pretty much a refined business intelligence capability and have the fundamental skills of reading and manipulating data. They’re doing things with data that are actually quite sophisticated, but they don’t require a Ph.D. in statistics. They need to be able to drive the toolset, and they’ve got to produce answers that are reliable and valid.

Then of course there are more sophisticated analysts who can effectively take an intellectual lead.

In any organisation like DIAC, you will always have a bleed out of some of these skills either to other parts of the economy, or even to other parts of this organisation. Traditionally, we cry a lot and complain that there’s a loss of skills, and the reason for that is that we don’t yet have an active process in place that grows those skills in house, so that when you lose your ‘left winger’, there is a young guy in the junior team that is ready to move in. That’s the sort of thing we need to do more of. That said, bleeding out is not necessarily bad, because they go to other business areas that are then better informed about what they can expect from an Analytics process.

SS: Cycling back to where we started, 3 years ago, with a CIO deciding that he would like an Analytics function, how has the level of understanding of Analytics and what it can deliver for DIAC changed over that period at the executive level?

KF: The ability to show relevance and quick wins to the business has, at the very senior executive level, certainly given us the opportunity to continue for the time being. We are being relied upon increasingly to produce business outcomes that are going to help in a more constrained resource environment. Our executive is also aware that we can see things that we’ve never been able to see before. I think all of those things, if they are communicated appropriately, are good things. If the Secretary (and he has done this) can pop down and speak to an operator at Sydney Airport who says, “Boss, this is really good. I don’t know what the math is, but it is pointing to the right people,” then the message has obviously gotten through at all levels.

SS: Do you have a strong view about where an Analytics function belongs with respect to the IT function, or with respect to the organisational divisions that it’s supporting, or on centralised versus embedded arrangements?

KF: Analytics sits in the business because it asks and answers business questions. It finds solutions to business problems. It is not an IT part of the world. It’s not a housing or a building service. It is a business process as far as I can see.

Ultimately the sort of people we want to attract to this area are business people who are outwards looking and constantly searching for what we could do for DIAC, even though the business itself may not have identified a problem or an issue or an opportunity. The people in the Analytics function need to actually be in contact with business areas and be prepared to say, “Have we got a deal for you,” and the deal should not be presented as “You can reduce staff numbers if you let me put a little grey box in the corner”. That’s not a good deal. You’ve really got to understand the business quite well to be able to say, “We may be able to do something for you. We need your support to try it out. Here’s the idea. Into the lab it goes, and we’ll let you know how the experiment turns out, and whether it works or not we will let you know.”


Today’s post comes from Warwick Graco, a founder of the IAPA and one of Australia’s leaders in Government Analytics:
 
Analytics, like all disciplines, is not immune from the effects of the Digital Revolution. Indeed its very existence is due to the rise of Big Data, where many organizations are now drowning in data as a result of the rise of social media and the use of digital mobile devices. Furthermore, very little of the data collected by organizations has been converted to knowledge and even less to intelligence in terms of identifying the important insights that should underpin the decisions taken by managers.
 
One of the other noticeable effects of the Digital Revolution has been the compression in the timeframes between the collection and processing of data and the dissemination of results to decision makers. One has only to go back a few years to when those doing analytics had the luxury of months to develop and deploy models. Now that time frame is rapidly decreasing to a few days and in some cases to one day or less. In addition to very short time-to-market requirements, those doing analytics still have to produce quality solutions that provide value for money. Return on Investment or ROI and the like are still critical requirements for all those who analyse data and report results.
 
The above all adds up to a demand that has become a cliché, i.e. the need for agility. All who do analysis and reporting these days have to be highly skilled as well as flexible, adaptable and available. We live in a world that is shrinking in space and time as a result of the Digital Revolution. It no longer matters where people live, work and play as long as they are connected and can communicate in cyberspace.
 
This raises the question of how one becomes agile in a digital world. There are many spokes to the wheel with this issue. It is suggested that one important spoke is the need for multidisciplinary, integrated teams that can take data and convert it into products that meet the needs and expectations of users.
 
The type of specialists required in these teams will vary from issue to issue but typical skills include:
 
• Intelligence Specialists who collect and analyse information on issues to determine opportunities and threats. The intelligence generated informs risk analysis.
• Risk Analysts who identify the risks with particular opportunities and threats and the mitigation measures required to reduce their negative consequences. For example, a cyclone is a threat that has many risks.
• Profilers who identify the modus operandi of those who are either threats or opportunities and their defining attributes. An example is profiling people who steal the identities of other citizens to identify their signatures.
• Miners and Modellers who explore data for new insights and who develop classification and prediction models such as those that identify customers who will churn in specified timeframes.
• Data Analysts who extract, clean and manipulate data required by those who do analysis and reporting.
• Business Intelligence staff who take the results of analysis and present them in suitable forms for the consumption of managers. This function is critical because managers judge the value and relevance of analysis by what they see and hear. Therefore, it is very important to get this requirement right.
 
Traditionally these specialists have tended to work in silos and therefore in isolation from each other. This has led to ignorance and misunderstandings of what each class of specialist does and to less than optimal results being achieved because the full capabilities of these highly skilled people are not brought to bear on the problem at hand.  
 
One solution to this challenge is to form integrated teams made up of these specialists so that there is both unity and economy of effort, that all the important skills are focused on the problems requiring resolution, that there is teamwork and team learning so that all profit from the work done, and that learning can be applied to future problems and issues.
 
High performing teams such as these are agile and proficient. They can turn around tasks in very short timeframes. This is an imperative with issues such as customer churn, credit-card fraud, detection of improvised explosive devices in a theatre of war and the effects that a flood can have on a rural community.
 
There is much to commend in having integrated, multidisciplinary analytical teams. They are often worth many multiples of what they cost.

Warwick Graco has worked in analytics for nearly 20 years, starting in this occupation when the term ‘analytics’ did not exist. He has seen the profession grow from its very small beginnings to what it is today. He has worked in defence, health and lately revenue collection, where he heads a small team responsible for operational analytics. His academic interests include analytics, organizational change and organizational decision making.

 

The buzzword of the year seems to be “Big Data”. There is a massive wave of promoters of the term, and there are inevitable detractors. There is also the issue of exactly how to define it. What follows is the A1 view on Big Data.

It is real, it is a game changer, and it is here to stay. It is no one thing, and its definition, both quantitative and qualitative, is rather fluid. Nevertheless, some basic truths apply: Big Data is not a brand name. Neither is Big Data a tool, a business process or a solution. It isn’t even an idea as such. In fact, Big Data is best understood as a problem. Not a problem as in “trouble”, but a problem in the sense of a challenge or puzzle, or more precisely a growing family of problems that we are increasingly forced to grapple with. It’s a problem that does not come with an automatic solution, although there are a growing number of tools to help roll it around.

The A1 angle on this is: you cannot outsource your investment in Big Data any more than you can outsource your own education, or exercise, or being a patient in a surgery theatre. In this sense, what is true of Big Data is also true of Analytics.

Getting Big Data right means getting Small Data even righter. The sort of business that can get value out of Big Data will be one already getting value out of Small Data. Without the business fundamentals in place, Big Data will produce only Big Nonsense. Alternatively, if the logic is there, then Big Data will enhance an existing value-adding framework.

So: small data first, then big data. And before small data, tacit data, which you can always get your hands on, even if you have trouble wrangling the electronic stuff. And before all of those: logic, and human infrastructure. A well understood, well defined business model with well defined intelligence objectives. And incentives, with staff capable of navigating such an environment, managed by a sponsor possessed of the A1 “holy trinity” of adequate influence, appropriate motivation and sufficient understanding of the value, role, and needs of Analytics under their command. Is this too much to ask for?

I should also probably mention tools. Maybe. Last. Do they matter? Of course. So does oxygen. But it is ubiquitous, effectively free, and we take it for granted…

 

A1 is a proud supporter of the AIPIO Collective Forecasting Competition, hosted on Presciient’s new collective forecasting platform System II.

A beginner’s guide may be found at the top of the page.

Collective forecasting and related methods such as prediction markets represent the area of analytics that we call Tacit Data Mining, and allow the extraction, deployment and analysis of the most vital data in the organisation, which lives in people’s heads. It also provides the ultimate data fusion platform, fusing all available data through human filters to provide powerful strategic decision support.

Collective forecasting allows accurate forecasting of future events, and also can condition those events on possible actions, thus providing a powerful decision support. It identifies the consistently most effective forecasters, acting as a filter for the most insightful and prescient members of staff or the public.

It has application in any strategic decision support domain.
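
As a rough illustration of how a platform might identify its most effective forecasters, the sketch below ranks forecasters by their mean Brier score (the squared error between a stated probability and the 0/1 outcome, lower being better). The Brier score is a standard scoring rule chosen here as an assumption; nothing above says which method System II uses, and the function names, forecaster names and data are hypothetical.

```python
from collections import defaultdict


def brier_score(probability, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (probability - outcome) ** 2


def rank_forecasters(forecasts, outcomes):
    """Return (forecaster, mean Brier score) pairs, best (lowest) first.

    forecasts: list of (forecaster, event_id, probability) tuples
    outcomes:  dict mapping event_id to 1 (happened) or 0 (did not)
    """
    scores = defaultdict(list)
    for forecaster, event_id, probability in forecasts:
        if event_id in outcomes:  # skip events that have not resolved yet
            scores[forecaster].append(
                brier_score(probability, outcomes[event_id])
            )
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    return sorted(means.items(), key=lambda item: item[1])


# Hypothetical data: three forecasters, two resolved events.
forecasts = [
    ("alice", "e1", 0.9), ("alice", "e2", 0.2),
    ("bob", "e1", 0.5), ("bob", "e2", 0.5),
    ("carol", "e1", 0.1), ("carol", "e2", 0.8),
]
outcomes = {"e1": 1, "e2": 0}
ranking = rank_forecasters(forecasts, outcomes)
```

With this made-up data the confident and correct forecaster ranks first, the forecaster who always says 50% sits in the middle, and the confident but wrong forecaster ranks last. A real platform would need many more resolved events before a ranking like this could be called consistent.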

The competition at hand has 3 expiry dates for predicted events, in April, July and October. Each has prizes for 1 month ahead, 1 week ahead and 1 day ahead. The July and October expiries also have 3 months ahead prizes, and there is a six month ahead prize for the October expiry.

The one-month ahead April expiry deadline is tomorrow, so don’t delay, register and put in your predictions.

EMC Greenplum’s Mark Burnard and I sat down late last year to talk about the future of data warehousing, big data, and analytics…

SS: How did you fall into data warehousing?

MB: I built spreadsheet systems during the early to mid-1990s while working for Rio Tinto and then Telstra. Then I was working for Corporate Express around 1995 when someone dropped a folder on my desk that said “Cognos”, so I picked it up and kind of ran with an embryonic data warehouse project at Corporate Express, built in SQL Server. So DTS, transformations, dumps out of a mainframe into text files, suck them up every night, build a cube in Cognos. And we took that live, and I thought okay this is really cool, I’d like to do more of this, so I jumped into consulting. That would have been 1999. Since then I’ve moved from BI report development to requirement definitions through to solution architecture and then more strategic consulting, and now to EMC Greenplum. That summarises about 20 years in IT.

SS: In data warehousing there’s a “warehouse first, marts second” approach associated with Inmon and a “marts first, radiate out” approach associated with Kimball. Do you have a view on those design philosophies?

MB: I think Inmon, as the father of data warehousing, came up with the concept of the corporate information factory. The philosophy as I understand it is that you want to build this machine that can answer any question that anyone could possibly ask about the business. So to do that you start off with the data architect and he maps out the entities that are of interest to the organisation—what matters and then what the attributes of those entities are at a high level, then comes up with a conceptual data model. The goal is that you encapsulate in this model everything that matters to the organisation. You turn it into a logical data model, then into a physical data model, then you start building the data warehouse, and eventually you can answer any question out of it. Data marts come along as a secondary spin-off. The underlying warehouse is big and complex, and it’s assumed that the analysts who want to work with its information are 1) only interested in a specific subset: sales, marketing, finance, HR…whatever it happens to be; and 2) that there is complexity in the model that you don’t want them to see as they’ll get really confused, or they’ll drag and drop the wrong columns into a report and start reporting numbers that are faulty. I think that approach works fantastically if the world stays still long enough for you to complete your data model and get the data warehouse built, but the reality is that that never happens. You could probably safely say that the world was a bit slower 30 years ago when organisations were doing this model. The early sites tended to be companies that were so big that a project like that was able to return a business benefit simply because of the size of the organisation. The Bank of America, Citibank, AIG—some of the big guys who built the early data warehouses—benefited a lot from that approach. But I think now the game’s changed, and a 6 month turnaround time for a BI project is too slow.
Marketing teams want to be able to spin off new products, sometimes one or two a month. Telcos, for example, are spinning off new plans, new product bundles, new marketing messages and solutions that require a lot of agility in the billing system and the provisioning system. Their core systems have to be a lot more agile and the data warehouse has to keep up with that. So if we’re structuring a new product bundle every month and taking it to market, the billing system may be able to keep up with that, the provisioning system may be able to keep up with that, even multiple provisioning systems, but when that all hits the data warehouse—in some of the situations I’ve seen—it becomes a screaming mess. Nobody can keep up, the warehouse can’t keep up, and then the folk who’ve rolled out the new bundle want new reports within the first week of having it in the market. They want to know “How many are we selling?”, “What’s the profitability?”, “What is the profile of the market segment that is purchasing this new product?”, and the warehouse says “We can give you that information in 6 weeks, but right now you’ve just broken all my reports with your new constructs and we’ve got to do some extra work on this data mart over here, and then we’ve got to tweak the ETL, and then we’ve got to do regression testing, impact analysis, etc.” So these days, there needs to be a way of meeting the business need for both agility and reporting that keeps up with the pace of change. From what I’ve seen, the classic traditional data warehouse model is not able to do that.

SS: So business is becoming faster and organisations are becoming more complex, and people with an appetite for analysis are demanding more in terms of information, feedback, and measurement?

MB: I think it’s fair to say that the pace of business change has accelerated.  There was a time when the core business of a bank might have been savings accounts and mortgages, but within the last 20 or 30 years that has diversified into online trading, reselling insurance products, having superannuation funds to manage. So the diversity of offerings that a bank is now bringing to the market—they might be offering 50 to 100 products where in the past they offered 10 or 12. Same thing with a telco. Back in Telstra’s early days it was fixed lines, and the only thing to worry about was local calls, STD calls and then international calls, there was nothing else. Now you’ve got mobile, data over mobile, over 3G, all the different product bundles that people have, caps—you’ve got telcos churning out new caps every 6 months—new phones, new bundles, bundling fixed with mobile with home internet and even cable TV—and needing to track when someone cancels one service but is still getting the bundle discount, for example. All that kind of complexity never existed even 10 years ago. So I think the pace of change and the complexity of doing business has changed, and the classic model that we’ve had for data warehousing has not significantly changed. In fact it used to be a line we would drop to reassure the client when positioning a data warehousing project to board level executives, to say “Data warehousing has not changed in 30 years”.  In other words, you can be confident that this is a well-worn space, an area of technology that was not invented yesterday, and therefore that all the lessons have been learned, all the pain has been experienced, all of the lessons have been written up, and so building a data warehouse is a pretty safe endeavour.

SS: So presumably you don’t present that message today?

MB: No. Because I think the message isn’t accurate anymore. I see it diverging in two directions. I see your classic data warehouse which has the stringency around it. It has the discipline around things like being able to identify your data lineage, having your metadata available, being able to explain that this number on this report came from this table in the data mart, which came from this table in the core data model, which came through these transformations, from this source system, and that says 100, and this says 100, and we know that the numbers are correct. That’s been the standard approach to data warehousing, and everybody spends huge amounts of energy, time and money trying to keep the warehouse up to date so that you can confidently say to the board, or the ASX, or ASIC, or APRA, or whoever it happens to be when you put a report down in front of them, that you’ve got your traceability, your auditability, your metadata, your data lineage, and so on. The CFO, for example, has to be able to sign off on those numbers. The problem now is that we’ve taken that disciplined model—which is fantastic for reporting on financial numbers to regulators—and we’ve extended it into the domains of HR, and marketing, and across the entire enterprise data model. We wanted to fit the entire enterprise into the data warehouse. What I think we’re seeing already happen is that the locus of subjects that fit into the traditional data warehouse is shrinking to include only those where that level of rigour is required for reporting and analysis—areas such as finance, risk, and other things that go to regulators. All the other stuff, which is the other 95% of the business, will probably end up in a much more flexible, dynamic platform which we’re starting to call an analytical warehouse or analytic warehouse. It’s not your traditional data warehouse. Usually in terms of data volume it’s a lot larger, a lot more atomic, a lot more nimble, agile, and almost ad-hoc in a way. 
To be honest, people have always done this—they’ve just done it under the table: take a data set, throw it into an Access database, take a bigger data set, throw it into SQL Server, do something on a laptop, do something on a server that they bought on the corporate credit card down the road because the data warehouse wasn’t able to give them the data they required, or because running the query would slow down all the standard reporting. People have always done analytic warehouses; they were just under the radar.

SS: So they’re coming out of the closet in a sense?

MB: That’s right. Because the pain of the traditional approach is now such that it’s unsustainable, and so what everybody’s been doing all along covertly, organisations are now beginning to look at and say, “We have to cater for that model.” They have to have a way of managing information that caters for ad-hoc analysis, for marketing guys running quick, nimble little models to come up with some market segment so they can generate a campaign to a particular demographic. Traditionally their only official option was the data warehouse, but they couldn’t throw enough information in, or the queries ran too long, or the joins were too complex given some third normal form industry data model. They’ve been doing it under the table, so let’s elevate that and bring it back into the fold of IT but on a platform that can handle that kind of ad-hoc dynamic: throw some web clicks in here, throw some Twitter feeds in there, mix in some data from the Australian Bureau of Statistics, get some smart people to have a look for correlations, integrate the results from the latest campaign for marketing a platinum credit card to a certain demographic, and let’s do some follow up on that. All of this dynamic information needs a home—well, it’s always had an unofficial home, on someone’s hard drive—but what I see happening is that information moving back into a more recognised, structured place, without the imposed controls that had it running away from home in the first place.

SS: So it sounds like we’ve got the emergence of two domains. There’s a domain of stability—the traditional data warehouse which fundamentally hasn’t changed much in 20 to 30 years. Its set of disciplines maps well to data and business processes that also fit that description. Something like financial reporting, where the accounting standards are pretty stable and where there are widely accepted processes which are professionally constrained for getting data into that shape. Then there’s a more uncertain domain—the data’s uncertain, the analysis is uncertain, the queries are uncertain, the perspective is uncertain, and it sounds like that’s broken in a way too. You’re saying that people have been doing it under the table for a long time, but it sounds like the scale is now such that it doesn’t fit under the table anymore. The tools are breaking.

MB: That’s exactly right. The data volumes that we are beginning to deal with won’t fit on a little server under the desk anymore. You’re talking about data volumes that will explode out past many terabytes eventually and you’re not going to fit it on your laptop, and if you did your single query would take the whole weekend. Maybe a real world example would be useful. One of EMC Greenplum’s reference sites is T-Mobile in the US. They have their classic traditional data warehouse. It’s 100 terabytes of data and it has the discipline and the control and the industry standard data model. Billing information comes in, provisioning information comes in, and they can report to the market out of the data warehouse their precise number of subscribers and their exact financials every month. All of that controlled information that you’ve just described. Alongside that they have an analytical warehouse which happens to be a petabyte. It’s about 10 times the size of the traditional data warehouse. Into that flows a lot—if not all—of the same data. But it also contains 900 terabytes of other data, which is market sentiment analysis, unstructured data, and all kinds of data that is structured but which is too hard to fit into the industry standard data model. The other thing that’s probably worth mentioning is that within the 100 terabyte traditional data warehouse you have a lot of summarisation of call data records—for example, after 2 or 3 months you’ve finished with your standard reports and you summarise them to one row per customer per month instead of one row per phone call or one row per text message. This means you can still report accurately on all of your regulatory requirements, but you can’t do analysis anymore at the atomic level of detail. In the analytic warehouse, on the other hand, you might have that history for 3 or 4 years. Why would T-Mobile need that? Well, they were looking for drivers of churn, which is one of the biggest expenses for a telco.
It takes 10 or 20 times as much money to win back a customer as it does to keep them and not have them leave. So they kicked off a project to try and identify the most significant indicators of a customer who was about to churn. To start with, they did some modelling around complaints to the call centre. Was there a pattern whereby someone who calls the call centre 3 or 4 times within 3 months was more likely to churn? That couldn’t be done on the existing data warehouse platform. Some of the data was there, but not all of it, and there wasn’t a business case to put it in there on the basis of speculation. And even if there was, had they started to run such an analysis it would have slowed down all of the business-as-usual reporting that the warehouse needed to support. Hence the analytic warehouse. First they did analysis on call centre logs looking to relate complaints to the churn indications that came from the warehouse. No correlation. Then they did a similar exercise looking at call dropout rates and call termination codes. Were dropped calls triggering churn? Again, no clear correlation. Note that they needed the unsummarised call records for this, but the data warehouse had already summarised them. So then they let a data scientist loose in the analytical workspace, unconstrained by the data model, unconstrained by performance concerns related to running obtuse queries. It turned out that the most significant precursor to a customer churning was whether or not any of the people a given customer called the most had churned. It makes sense because now I’m calling them telling them I’m with a new telco. The correlation was so strong that if I churn today, the people I call the most are seven times more likely than the average customer to churn in the next 3 months. So they took that insight and operationalised it.
The action they developed was that as soon as an individual churned, anybody they frequently called was targeted with an attractive offer to re-contract. They estimated the monetary benefit to the organisation from this was $70 million. Rob Strickland, the CTO of T-Mobile at the time, still tells this story. He says, “Here is my data warehouse. I’m spending millions and millions of dollars on maintaining this thing, and what it gives me is a bunch of standard reports. And yet over here, from my sandpit that cost me one million, comes the insight that was of $70 million benefit to the organisation.” They’re spending tens of millions of dollars on the data warehouse, but it’s the one million dollars on the analytic warehouse that’s differentiating them in the market.
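
To make the operational step concrete, here is a minimal sketch of how frequently called contacts of a churned customer might be picked out of atomic call records. It is an illustration under stated assumptions, not T-Mobile's actual method: the record layout (one caller and callee pair per call), the function names, the cut-off of the top `n` contacts and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def top_contacts(call_records, n=3):
    """Map each caller to the n numbers they call most often.

    call_records: iterable of (caller, callee) pairs, one per call.
    """
    counts = defaultdict(Counter)
    for caller, callee in call_records:
        counts[caller][callee] += 1
    return {
        caller: [callee for callee, _ in counter.most_common(n)]
        for caller, counter in counts.items()
    }


def retention_targets(call_records, churned, n=3):
    """Return customers who are top contacts of someone who has churned."""
    contacts = top_contacts(call_records, n)
    targets = set()
    for customer in churned:
        for contact in contacts.get(customer, []):
            if contact not in churned:  # no point targeting the departed
                targets.add(contact)
    return targets


# Hypothetical call records: ann calls bob 5 times, cat 3 times, dan once.
calls = (
    [("ann", "bob")] * 5
    + [("ann", "cat")] * 3
    + [("ann", "dan")]
    + [("bob", "ann")] * 2
    + [("eve", "cat")] * 4
)
targets = retention_targets(calls, churned={"ann"}, n=2)
```

Here `targets` contains bob and cat, the two people ann called most, while dan falls outside the cut-off. Note that the function needs one row per call; it could not be written against data already rolled up to one row per customer per month.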

SS: There’s a distinction worth recognising there between the process that generated the insight and then the subsequent operationalisation of that insight. It may have cost more than a million dollars to operationalise the insight—to get it out to call centres, to marketing and product teams, to come up with and execute the right campaigns and performance manage the whole thing—all of those things cost money too. But the discovery process to get to the insight was itself relatively cheap. Interesting too that many of the instinctively explanatory metrics didn’t turn out to matter in this case: complaints to the call centre, dropped calls, network, phone, plan, demographics. It turned out to be the social network.

MB: As measured by the call data records. One of the outcomes of that learning was the benefit of keeping atomic level call data for months or years rather than summarising it—which you have to do in a traditional warehouse because of the cost of storage.

SS: Because you don’t know from what perspective you’re going to analyse it?

MB: Correct. You may come up with a question in 3 months’ time that requires atomic level data over 2, 3, 5 years. If that’s been summarised you can’t get it back. This is part of the big data story. Organisations are starting to realise that there’s real gold in this atomic level data if they can find a way to keep it long enough, cheaply enough, in order to enable analytics to be run.
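
The point that summarised data cannot be recovered can be shown with a small sketch. The record layout, field names and figures below are hypothetical; the example only assumes the kind of roll-up described earlier, one row per customer per month.

```python
from collections import defaultdict

# Hypothetical atomic records: (customer, month, callee, minutes)
call_records = [
    ("ann", "2012-01", "bob", 5),
    ("ann", "2012-01", "bob", 7),
    ("ann", "2012-01", "cat", 2),
    ("ann", "2012-02", "bob", 4),
]


def summarise_monthly(records):
    """Collapse records to one (call count, total minutes) row per customer per month."""
    summary = defaultdict(lambda: [0, 0])
    for customer, month, _callee, minutes in records:
        summary[(customer, month)][0] += 1
        summary[(customer, month)][1] += minutes
    return {key: tuple(value) for key, value in summary.items()}


monthly = summarise_monthly(call_records)
# monthly[("ann", "2012-01")] is (3, 14): three calls, fourteen minutes
```

The monthly rows still support billing-style reporting, but the callee column is gone. Any later question about who a customer called, or how often, can only be answered if the atomic records were kept.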

SS: That in some sense addresses the ‘volume’ component of big data. What would you say about the ‘velocity’ and ‘variety’ components?

MB: Storage and computational power are both becoming cheaper, and technologies like Hadoop are making those two factors a lot easier to wrestle with. I won’t say that Hadoop is easy, because it’s not, it’s a very embryonic technology. But the vision is to be able to take large data sets that previously were ignored—that come under the label of ‘exhaust data’—application logs, machine logs, detailed stuff that you would never bother collecting, because why would you? Organisations are now beginning to see that it’s not actually that expensive to keep this data. And not only that, really valuable information can be harvested from it. One organisation we work with, for example, captures all of the application logs from a certain application it uses and mines them for indicators of internal fraud. Are any employees or other operators going into areas of the system they shouldn’t be? There was a time when that log file information wouldn’t be collected because it was just too expensive to store, let alone analyse. The internal audit approach in such a regime relies on occasional random sampling. But now you can actually have definitive indications from the application logs. So you’re saving time, money, and the imprecision of having someone look randomly, because as soon as an alert comes up it can be managed by exception. It frees up the team who used to spend all day looking at random samples to be more productive.
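
A minimal sketch of what managing by exception could look like: every log entry is checked against a table of permitted areas, and only the violations are surfaced. The log format ("timestamp user area"), the permissions table, the function name and the sample entries are all assumptions made for illustration; a real application log would need its own parser.

```python
def find_access_violations(log_lines, permitted):
    """Yield (user, area) for each log entry outside the user's permitted areas.

    log_lines: iterable of strings in an assumed "timestamp user area" format
    permitted: dict mapping user to the set of areas they may access
    """
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore lines that do not match the assumed format
        _timestamp, user, area = parts
        if area not in permitted.get(user, set()):
            yield user, area


# Hypothetical log entries and permissions.
log_lines = [
    "2012-03-01T09:00 jsmith payroll",
    "2012-03-01T09:05 jsmith claims",
    "2012-03-01T09:07 akhan claims",
    "malformed entry",
]
permitted = {"jsmith": {"claims"}, "akhan": {"claims"}}
alerts = list(find_access_violations(log_lines, permitted))
```

With this data `alerts` holds the single entry where jsmith opened the payroll area. Unlike random sampling, every line is examined, and the audit team only sees the exceptions.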

SS: What are the barriers to an uptake of exploratory analytics as an accepted activity? Cost is one, and you’ve addressed that by pointing out that cheaper solutions are now available.

MB: I think that the IT architect in particular is going to have to embrace the new paradigm. A traditional IT architect is all about structure, reusability, and building things in a certain way so that they benefit the entire enterprise. This drives things like single enterprise views of architecture and standardisation on as few technologies as possible. It tends to weigh against speculative, exploratory and unconventional ways of doing things. So if the business approaches IT and says, “Look, I don’t know exactly what we’re going to do but we think there are insights in our data and we know we need a few terabytes to play with. Can you give me space?”, they’d be very lucky to get funding. That’s not considered a business case. At the same time, analysts probably need to be a bit more aware of how to put their requirements into the right language for architects: What level of security does it need? What level of availability does it need? What level of backup and recovery does it need if any? How much data volume? What are the impacts on the network going to be? What will be the flow on effects to production applications? Standard non-functional requirements—security, availability, backup, recovery. There are 6 or 7 of those that are your standard application/solution non-functional requirements and they’re the kind of things that an architect needs to be able to tick the box on.

SS: Finally, what attracted you to your current role?

MB: I see Greenplum as probably the most innovative and ambitious of the new breed of analytical database technologies. At EMC, Pat Gelsinger in particular looked at the other technologies out there and chose Greenplum because it was able to run on commodity hardware, it had flexibility within the data model to handle unstructured, semi-structured, and structured data, row based and column based storage in the same table—a whole load of stuff that everybody else is trying to catch up with at the moment.  So I see it as early days in the market for an innovative, disruptive technology.


Analytics is inherently exploratory, speculative, and adaptive. It’s fundamentally a creative and human activity, and it involves making mistakes. It’s enabled by software, but many of its processes are not well defined and are barely repeatable. These characteristics are most pronounced for organizations that value analytics highly and use it strategically.

Strategic analytics elevates the value and impact of executive decisions, creating entirely new business processes and structures. At the other end of the scale, operational analytics adds marginal value to existing processes and automates low-level decisions. The sustained extraction of strategic value from data is not amenable to commercial off-the-shelf software “solutions,” turnkey approaches, and “best-practices.” Systems built on these belong in well defined, routine, compliance-driven, noncompetitive, noncore areas of business. The closer the relationship between an organization’s sources of unique competitive advantage and its leverage of analytics, the more customized and original its analytics will be.

Budget spent on commercial analytics software is money not invested in other capabilities, such as analyst education and talent recruitment. But the up-front capital expenditure requirement of commercial software applications also reduces flexibility on a range of key fronts. Initiating business cases — typically high-budget, high-profile, and technology-centric — invariably attract the conflicting agendas and priorities of a multitude of internal stakeholders at the point at which they are least informed. Analytics sponsors and teams are forced into premature commitment to certain business applications, outcomes, and benefits. These in turn drive specific tool and technology choices that proscribe alternatives. In a well defined process environment, these forms of lock-in are not problematic. However, analytics is different. It’s a data-driven, results-contingent, uncertainty-bounded activity.

These dependencies prevent many businesses from getting analytics initiatives off the ground, and they make it harder for those that do to learn and adapt. Specific pre-commitments, especially those that are operational (such as “more accurate forecasts that will reduce inventory costs” or “better customer targeting that will increase retention”) frequently become shackles for analytics teams. Analytics uncovers insights, but it can’t know what these are going to mean ahead of time or guarantee that they will be actionable to positive effect. Up-front capital expenditure on commercial software has the unfortunate effect of tethering the political capital of analytics teams and their executive sponsors to prematurely determined outcomes. The more complex, committed, and compromised these outcomes, the higher the risk of perceived or actual failure for analytics and those affiliated with it.

Organizations can either avoid these risks altogether or greatly reduce them and make them more manageable through employing commodity and open-source software for analytics. By using the commodity software already on their desktops (e.g., Microsoft Excel and Access) and software freely available via open-source licensing (e.g., R, RapidMiner), analytics teams avoid technical lock-in, maintain the ability to adapt to changing business priorities, and can devote more resources toward education and talent (via training, experimentation, learning from mistakes, and recruitment).

Importantly, they can do all of this without risking their political capital. In this way, commodity and open-source tools matter by not mattering. Analysts retain the freedom to pursue insights of unique and strategic value to their businesses, and the flexibility to instantiate these in the form of novel and customized business processes. Some of these may turn out to be well suited to the analytics software applications on offer from commercial vendors, in which case the organization is now an educated buyer.

This piece was originally published by All Analytics in September 2011 as part of their Point/Counterpoint series. Its counterpoint post, ‘Downsides Dampen Open-Source Analytics’, by Ajay Ohri, is here. Beth Schultz’s introduction is here.


 

I was sorry to read, via Gary Cokins, of the recent passing of Jeremy Hope, who along with Robin Fraser pioneered the Beyond Budgeting movement and co-founded the Beyond Budgeting Round Table (BBRT). As Cokins summarises:

Their basic message was that the annual budgeting process is so broken and dysfunctional that the best solution is not to reform it but rather to abandon the process altogether. Their solution was to understand the underlying purposes of a budget and apply methods, like driver-based rolling financial forecasts, that fulfill the purposes of a budget.

Having spent a good part of the last twelve years as an enterprise budgeting and planning specialist I have a great deal of sympathy for this view. The underlying purposes of budgets are rarely clarified and distinguished from each other. As I’ve written about before, this leads to much wasteful confusion, both practical and linguistic:

In reality the budget is a hybrid because it serves two main purposes. It sets performance targets (goal setting) and limits the resources available to those pursuing them (planning). Both goal setting and planning are necessarily reliant on forecasts, although these underlying objective estimates are not always made explicit. Updated plans and targets – they are commonly revised within a financial year – are often referred to as “forecasts”.

The enterprise budget is an odd and hybrid beast. Many of its perversities and pathologies are familiar to everyone who’s worked in an organisation: arbitrariness, inflexibility, unresponsiveness to change, incentives to game the system (underplaying revenue potential while overstating costs), encouragement of ‘use it or lose it’ spending, disconnection from strategy. Then there is its being expressed in the language of accounting, which is not the natural language of most businesspeople. Finally, there is the sheer complexity of its enterprise coordination—the annual ‘march of a thousand spreadsheets’. Most of this coordination effort is in fact completely unnecessary. The bulk of any organisation’s expenditures are preordained. They’re either fixed, or circumscribed by its balance sheet. The planning (resource allocation) aspect of budgeting is thus fundamentally a top-down exercise. However, its goal setting aspirations lead to an insistence that budgets be built bottom up, painstakingly, by individual managers. The idea is that this generates ‘buy in’. Typically, however, the bottom-up aggregations never conform to the top-down constraints, so they get overridden during the budget finalisation process.

Despite all of this, the annual budget remains stubbornly embedded in the workings of most organisations—more understandably in government, where it fulfills a legislated purpose, than in the private sector. I attended a seminar with Jeremy Hope in Sydney, from memory in 2004, facilitated by the Institute of Chartered Accountants in Australia (ICAA). I remember asking Hope why it was that adoption of Beyond Budgeting’s principles was relatively rare. It was notable that the practitioners featured in Beyond Budgeting’s case studies (companies such as Toyota and Svenska Handelsbanken) had been using it successfully for decades. If the good news wasn’t new, why such resistance? His answer, in essence, was that the status quo, although widely acknowledged as inefficient, was so familiar that dismantling it was literally unimaginable for most budgeteers. Disrupting it was a long and uphill battle.

Beyond Budgeting is to budgeting as Lean Startup is to entrepreneurship and Analyst First is to Business Analytics. Each movement takes a first principles approach to diagnosing, in order to do away with, a set of wasteful habits of thought and practice which result from convention and are sustained by poor incentives.

Eric Ries, Harvard Business School’s ‘Entrepreneur-in-Residence’ and the founder of Lean Startup, has been interviewed a number of times recently following the launch of his Lean Startup book. As we have argued before at Analyst First, the ‘unknown problem / unknown solution’ domain—in which and for which the Lean Startup approach was developed—reflects the world of Business Analytics. In the 12 to 18 months since giving the last interview we linked to, Ries has significantly enriched the ideas of Lean Startup. Two new interviews are highly recommended. The first is from the Commonwealth Club of California’s Inforum program. Details here and audio file here, or here. The second is from the ITConversations network’s Tech Nation program. Details and download here.

Ries takes it as axiomatic that entrepreneurs seek to create “institutions of lasting value” in an unstable environment. He defines a startup as any “human institution designed to create something new under conditions of uncertainty”. The role of the Lean Startup toolkit is to help entrepreneurs navigate the best path in this context. Most contemporary management tools have their heritage in twentieth century manufacturing and are based on forecasting and planning. They assume that the world is stable enough to be predicted such that plans can be reliably devised and executed. As Ries points out, and should be obvious, this assumption simply doesn’t hold in the environments in which many of us now work.

In the twenty-first century we can build almost anything that can be imagined. The challenge is not to build more stuff. It’s to build the right stuff. Most startups fail, says Ries, because they make the wrong things. The key activity of a startup should therefore be learning, not building. What creates value for a startup is determining whether or not it’s on the path to a sustainable business.

Lean Startup is a scientific approach to new product development which treats everything a startup does as an experiment. The goal is to collect data (feedback about what customers want) with minimal cost, not to build to a pre-determined product specification (which assumes what customers want) with minimal cost. In service of this, the Lean Startup movement is developing ‘innovation accounting’, an attempt to revolutionise the existing accounting paradigm so that it can operate under conditions of uncertainty and instability. The current planning-based paradigm (are we on time, on budget?) is unable to distinguish between the threshold of success and the brink of failure. The core question being asked by innovation accounting is, instead: are the experiments the team’s doing affecting customer behaviour?

Clearly there are implications here for Business Analytics. We’ve often written of the uncertainty inherent in data-driven analysis, and of the unsuitability of the default IT project plan-based, build-centric, waterfall approach to what is inescapably an exploratory and learning-oriented set of activities. Much of the Lean Startup approach translates directly into the Analytics Lab. However, the constraints faced by those attempting to innovate from within already established organisations (whom Ries terms ‘intrapreneurs’) are not the same as those which frame the entrepreneurial enterprise. Entrepreneurs operate until the financial capital provided to them by venture capitalists runs out. The ‘lean’ in Lean Startup seeks to maximise the number of experiments they can run before this happens. The entrepreneurs described by Ries all enjoy a large and fundamentally interchangeable prospective customer base. Many unsuccessful experiments can be run on different user populations in search of a loyal base, essentially without consequences. Unhappy users don’t stick around. This is not the case for intrapreneurs. Within an organisation there are only a limited number of prospective analytics customers, and disappointing any one group leaves a legacy. The scarce resource for intrapreneurs is political capital.

Is Big Data a Bubble?

In case you’re in a hurry: Of course it is. And that is good.

Quentin Hardy in The New York Times Bits blog summarises the state of play regarding the business world’s interest in and utilisation of big data. As he recognises, big data is really about “the benefits we will gain by cleverly sifting through it to find and exploit new patterns and relationships.” Big data makes the need for analytics more obvious. Its often-invoked qualities of volume, velocity, and variety mean that we can’t pretend to analyse it without statistical and machine learning methods.

Big data analytics at the present moment, however, is characterised by uncertainty. What is big data, exactly? What’s its value? How is it analysed? Which technologies should be used? Which standards will prevail? Where to invest? There is, Hardy finds:

a common problem in the Big Data proposition: Often people won’t know exactly what hidden pattern they are looking for, or what the value they extract may be, and therefore it will be impossible to know how much to invest in the technology. Odds are that the initial benefits, as it was with Google’s Adwords algorithm, will lead to a frenzy of investments and marketing pitches, until we find the logical limits of the technology. It will be the place just before everybody lost their shirts.

This is a common characteristic of technology that its champions do not like to talk about, but it is why we have so many bubbles in this industry. Technologists build or discover something great, like railroads or radio or the Internet. The change is so important, often world-changing, that it is hard to value, so people overshoot toward the infinite. When it turns out to be merely huge, there is a crash, in railroad bonds, or RCA stock, or Pets.com. Perhaps Big Data is next, on its way to changing the world.

Such is technology. Hence the level of uncertainty, and also reasons for optimism:

There are an uncountable number of data-mining start-ups in the field: MapReduce and NoSQL for managing the stuff; and the open-source R statistical programming language, for making predictions about what is likely to happen next, based on what has happened before. Established companies in the business, like SAS Institute or SAP, will probably purchase or make alliances with a lot of these smaller companies.

Expect to see a lot more before it all gets sorted out.

TDWI’s recent Big Data Analytics report by Philip Russom, based on 325 sets of responses to a May 2011 survey, echoes this and provides a far more comprehensive overview of how businesses are coping with the realities of big data. One measure of how unsettled the field is right now is that, in response to the question ‘Which of the following best characterizes your familiarity with big data analytics and how you name it?’, 65% of the survey’s respondents replied, ‘I know what you mean, but I don’t have a formal name for it.’

The report is recommended in full, but in case you’re still in a hurry, its margin summaries can be skimmed through in a matter of minutes.

The recent IAPA discussion panel on ‘Aligning IT and Analytics to deliver sustainable innovation’, plus a later conversation with fellow panellist, EMC-Greenplum’s James Horton, prompted me to sketch some thoughts on what an Analytics Lab ought to do. The lab is the natural home for Analysts engaged in the narrower definition of Analytics.

Purpose

The Analytics Lab is an innovation factory which constantly evaluates data, quantitative methods and tools looking for sources of competitive advantage.

It evaluates:

  • Data: structured and unstructured, sourced from both inside and outside the organisation, established and new.
  • Methods: data transformation, and then data mining, machine learning, statistical, mathematical, and other analytical methods.
  • Tools: as appropriate to method, from programming languages through to GUI applications, from commodity and open source through to commercial tools.
  • Analysts: the lab enables the organisation to evaluate the technical abilities and innovative propensities of its analysts, as well as those on offer from external service providers, without many of the interfering factors present in operationally hardened IT environments.

Its outputs are:

  • Insights
  • BI prototypes
  • Instantiation candidates

It also:

  • Identifies data and knowledge gaps: Analysing data and generating insights brings to light new data needs and exposes gaps in knowledge which may impact the business. Additional data may need to be sourced, gathered through survey, collected by tweaking an existing business process, or purchased from a third party. Additional analyses and subject matter expertise may be required to close knowledge gaps.
  • Resolves disharmonies: All businesses struggle with ‘different views of the truth’, and it’s often the crunching of data which brings these to light. Disharmonies might be within or between data sets, or between conventional wisdom and the drivers of a model. They could relate to anything from actual observations to tacit assumptions. Resolving such disharmonies—harmonisation—involves identifying, scoping, validating, and correcting them.

These last two are not the core business of Analytics, but they’re important activities, and doing Analytics naturally leads to them. Most organisations don’t explicitly provision for them, but arguably they should. The lab is as good a home for them as any other.

Beneficiaries

The Analytics Lab services all levels of business, but in different ways:

  • Senior Management: through the provision of strategic insights.
  • Middle Management and Knowledge Workers: through one-off and/or prototyped BI analyses.
  • Frontline Workers: through the identification of instantiation candidates, i.e. deployable operational analytics.

Context

Many analyses typically need to be tried before those which merit instantiation are discovered. Furthermore, “instantiation” doesn’t necessarily mean a repeatable process. It could simply mean the communication of a one-off insight, e.g. “revenue growth is unmistakeably slowing in all but one customer segment” or “the most reliable predictor of a customer’s propensity to churn is their social network membership.” Such insights are typically complex, valuable, but not “actionable” in any deterministic, automatable way.

Other findings are suited to more regularised delivery, for example as managerial decision support through business intelligence.

Some analytical results, in order to be fully leveraged, need to be integrated into frontline business processes. Models which predict customer acquisition or churn, for example, might require integration into sales, marketing, call centre, channel management and customer support processes.

Approach

Conduct disciplined, exploratory analyses which repeatedly cycle through the following sorts of questions:

Data questions:

  • Is there structure in the data (patterns, trends, relationships, networks, segments, clusters, indicators, drivers, outliers, anomalies)?
  • Are there new insights in the data?
  • Which models are viable?
  • Which variables are important?
  • Which variables do we control?
  • What are the implications for revenue, cost, risk?
  • What data do we want that we don’t have? How could we get it?

Deployment questions:

  • What are the implications of this insight?
  • Who is our internal customer for this insight?
  • Would this analysis be valuable if provided on an ongoing basis? To whom?
  • Into which existing or envisioned business processes should this insight be instantiated?

Harmonisation questions:

  • Where are there disharmonies in tacit or explicit data and assumptions?
  • Which projects, processes and decisions are affected by these disharmonies?
  • How do we validate and resolve these disharmonies?

Key infrastructure

Infrastructure can usefully be separated into the ‘electronic infrastructure’ of hardware and software and the ‘human infrastructure’ of people, relationships, management and incentives.

Electronic infrastructure

  • Secure, off-network ‘sandpit area’
  • Big storage, big memory, scalable to big data
  • Eclectic analytical toolset: commodity, open source, commercial, experimental, in-house
  • Snapshots, copies, feeds of all manner of available data sources: pre-ETL, pre-warehouse, post-warehouse, external, web, social media, unstructured. In the context of the lab, the data warehouse is just another source system.
  • De-emphasis on repeatable technical processes and compliance with production IT architecture
  • Insulated from IT Service Level Agreements and other production / core system / business-as-usual constraints

Human infrastructure

  • Human Resources:
    • Analysts: Data scientists
    • Management: Validate analysis objectives, ensure that analysts remain focused, performance manage the innovation process.
  • Relationships:
    • Sponsorship from Executive
    • Cross-functional relationships with business units: both ‘push’ (business unit as customer) and ‘pull’ (business unit as subject matter expert)
    • Close relationship with Strategy function
    • ‘Caveat utilitor’ relationship with IT for data provision and tool support
    • Various relationships with service providers: vendors, consultants, training and mentoring providers, industry expertise, academia if appropriate
  • Performance Management:
    • Innovation / Research metrics
    • Risk metrics
    • Sentiment metrics
    • Dimensions of opportunity: Internal, Competitor, Market, Customer, Product, Channel
