Thomas C. Redman at HBR recommends Four Steps to Fixing Your Bad Data:

  1. Admit you have a data quality problem. Just like any twelve-step program, admitting the problem is key. From there, follow the next three steps for data improvement.
  2. Focus on the data you expose to customers, regulators, and others outside your organization. Take a careful look at your system of controls. Is it up-to-snuff? Make sure — not only that the right controls are in place, but that you’re actually using them. Every time.
  3. Define and implement an advanced data quality program. Making sure data leaves the door correctly may be a viable short-term alternative, but there is already too much data, and the quantities are growing. Just as manufacturers found that they had to “prevent errors at their sources,” so too with data. You need a quality program that does so.
  4. Take a hard look at the way you treat data more generally. Almost everyone readily acknowledges that “data are among our most important assets.” But they don’t manage them that way. Indeed, data are almost invisible. And the top person responsible for data may be an architect buried deep in the bowels of IT. If this description rings true, you need to get an aggressive data program, with real talent, budget, and teeth in place.

Steps 1 and 4 get to the heart of how data is understood, and misunderstood, within organisations. They’re also about the status of data.

Data has a number of curious properties that limit the explanatory reach of commonly used metaphors:

  • Data is a resource in that it’s quantifiable and useful and can be transformed into something of higher value through extraction and processing. But it’s not a commodity. At the granular level, unlike physical resources, every piece is different; and in the aggregate its value is not a function of its volume.
  • Data mining makes sense as a value extraction analogy, but it also fails in the sense that physical mining processes are well defined, highly automated, repetitive, and relatively predictable. Data mining, not so. Its exploratory, bespoke and synergistic nature makes it more analogous to prospecting than mining.
  • Data as an asset has its limits too. As Redman notes, this is a deceptively simple idea. Everyone touches and interacts with data differently. But unlike physical assets, data persists after reuse and can be endlessly recycled and repurposed, in parallel, and not always in foreseeable ways.

Step 1 – admitting to data quality problems – reflects my contention that the default assumption should always be of problematic data.

Step 4 addresses the status of data. I’ve argued before that data is a spellword. It’s easily and often dismissed as technical and geeky, and therefore not worthy of executive concern. As Redman argues, however, its importance shouldn’t be underestimated, nor should the responsibility for understanding and managing it be deprioritised, outsourced, or consigned to the “bowels of IT.”

Steps 2 and 3—the action steps—are of particular interest to Analytics practitioners (as distinct from Business Intelligence practitioners) given the different data needs of analysts compared with business consumers. An advanced data quality program designed to prepare data for exposure to customers, regulators and other outsiders—if it takes a ‘one size fits all’ approach—will most likely interfere with analytics. As Redman contends, “the tendrils of a data program will affect everyone,” and they will do so differently. Such a program needs to recognise that the exploratory work of analysts is a separate stream of activity from the ETL-DQ-MDM-EDW-BI stream, with its own distinct set of needs.
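
To make the two-stream point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration rather than anything Redman prescribes: the `Record` fields, the `is_publishable` control and the `split_streams` helper are all hypothetical. It shows one way a quality control could gate what goes to outside consumers while leaving the raw records available to analysts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape, used only for this illustration."""
    customer_id: str
    revenue: Optional[float]  # None models a missing value


def is_publishable(record: Record) -> bool:
    """Hypothetical control for outward-facing data: complete and non-negative."""
    return (
        bool(record.customer_id)
        and record.revenue is not None
        and record.revenue >= 0
    )


def split_streams(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Return (reporting_stream, analytics_stream).

    The reporting stream keeps only records that pass the control.
    The analytics stream keeps every record untouched, so analysts can
    still study the missing and anomalous values the control filters out.
    """
    reporting = [r for r in records if is_publishable(r)]
    analytics = list(records)  # a copy of the unfiltered input
    return reporting, analytics


raw = [
    Record("C001", 1200.0),
    Record("C002", None),   # missing revenue
    Record("C003", -50.0),  # suspicious negative value
]
reporting, analytics = split_streams(raw)
print(len(reporting), len(analytics))  # prints "1 3"
```

The design choice worth noticing is that the control filters a copy rather than cleaning the data in place. The records it rejects are often exactly the ones an analyst wants to examine.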

One Response to Data is different

  1. A good introduction to the subject and thanks for posting this.

    I am currently running a DQ program (along with a dozen other initiatives) and for step 2 in the above framework we are not focusing on “the data we expose to customers, regulators, and others outside our organisation”. I have taken a slightly different path and we are instead focusing on data that:

    1. is determined as ‘business critical’ by the stakeholders of that data (throughout the information life-cycle), and

    2. is financial data, i.e. represents revenue.

    This has been done because of a combination of factors: the relative DQ ‘immaturity’ of the organisation; the focus on the consolidated operation statement as the primary tool to manage a large and diverse conglomerate; and because this represented an achievable target where success would lead to subsequent DQ activities that would expand our ambitions – i.e. it will give us a track record of success upon which we can build.

    The framework you outline from Redman is useful – but implementing it in my experience is only successful when you focus your efforts on smaller (and hopefully faster) deliverables.
