- Admit you have a data quality problem. Just like any twelve-step program, admitting the problem is key. From there, follow the next three steps for data improvement.
- Focus on the data you expose to customers, regulators, and others outside your organization. Take a careful look at your system of controls. Is it up to snuff? Make sure not only that the right controls are in place, but that you’re actually using them. Every time.
- Define and implement an advanced data quality program. Making sure data goes out the door correctly may be a viable short-term measure, but there is already too much data, and the quantities are growing. Just as manufacturers found that they had to “prevent errors at their sources,” so too with data. You need a quality program that does so.
- Take a hard look at the way you treat data more generally. Almost everyone readily acknowledges that “data are among our most important assets.” But they don’t manage them that way. Indeed, data are almost invisible. And the top person responsible for data may be an architect buried deep in the bowels of IT. If this description rings true, you need to put an aggressive data program in place, with real talent, budget, and teeth.
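Step 3 borrows the manufacturing idea of preventing errors at their sources. As a minimal sketch of what that means in practice, the snippet below validates records at the point of entry and routes bad ones back for correction, rather than cleaning them downstream. The schema and rules here are illustrative assumptions, not anything prescribed by Redman:

```python
# Sketch of "preventing errors at their sources": validate records at
# the point of entry instead of cleaning them downstream.
# The field names and rules below are hypothetical illustrations.

def validate_record(record):
    """Return a list of rule violations; an empty list means the record may enter."""
    errors = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    if record.get("amount") is None or record["amount"] < 0:
        errors.append("amount must be a non-negative number")
    if record.get("currency") not in {"USD", "EUR", "GBP"}:
        errors.append("unrecognised currency")
    return errors

def ingest(records):
    """Admit only clean records; reject the rest, with reasons, for source-side repair."""
    accepted, rejected = [], []
    for r in records:
        errors = validate_record(r)
        if errors:
            rejected.append((r, errors))
        else:
            accepted.append(r)
    return accepted, rejected

clean, dirty = ingest([
    {"customer_id": "C1", "amount": 10.0, "currency": "USD"},
    {"customer_id": "", "amount": -5, "currency": "AUD"},
])
# clean holds the first record; dirty holds the second with three violations
```

The point of the analogy is the feedback loop: rejected records go back to whoever created them, so the source process improves, instead of accumulating downstream cleanup work.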
Steps 1 and 4 get to the heart of how data is understood, and misunderstood, within organisations. They’re also about the status of data.
Data has a number of curious properties that limit the explanatory reach of commonly used metaphors:
- Data is a resource in that it’s quantifiable and useful and can be transformed into something of higher value through extraction and processing. But it’s not a commodity. At the granular level, unlike physical resources, every piece is different; and in the aggregate its value is not a function of its volume.
- Data mining makes sense as a value extraction analogy, but it also fails in the sense that physical mining processes are well defined, highly automated, repetitive, and relatively predictable. Data mining, not so. Its exploratory, bespoke and synergistic nature makes it more analogous to prospecting than mining.
- Data as an asset has its limits too. As Redman notes, this is a deceptively simple idea. Everyone touches and interacts with data differently. But unlike physical assets, data persists after reuse and can be endlessly recycled and repurposed, in parallel, and not always in foreseeable ways.
Step 1 – admitting to data quality problems – reflects my contention that the default assumption should always be of problematic data.
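If the default assumption is that data is problematic, a concrete first habit is to profile it before using it. The following is a minimal sketch of that habit (the column values and the checks are my own illustrative assumptions):

```python
# Sketch of "assume the data is problematic": profile a column so that
# nulls, mixed types, and duplicates surface before analysis, not during it.
from collections import Counter

def profile_column(values):
    """Summarise a column's basic quality signals."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "types": dict(Counter(type(v).__name__ for v in non_null)),
        "duplicates": len(non_null) - len(set(non_null)),
    }

# A column that "looks numeric" but hides a null, a string, and a duplicate.
report = profile_column([1, 2, 2, None, "3"])
# {'count': 5, 'nulls': 1, 'types': {'int': 3, 'str': 1}, 'duplicates': 1}
```

Nothing here is sophisticated; the point is that the check runs by default, before any analysis, rather than after a result looks wrong.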
Step 4 addresses the status of data. I’ve argued before that data is a spellword. It’s easily and often dismissed as technical and geeky, and therefore not worthy of executive concern. As Redman argues, however, its importance shouldn’t be underestimated, nor should the responsibility for understanding and managing it be deprioritised, outsourced, or consigned to the “bowels of IT.”
Steps 2 and 3—the action steps—are of particular interest to Analytics practitioners (as distinct from Business Intelligence practitioners) given the different data needs of analysts compared with business consumers. An advanced data quality program designed to prepare data for exposure to customers, regulators and other outsiders—if it takes a ‘one size fits all’ approach—will most likely interfere with analytics. As Redman contends, “the tendrils of a data program will affect everyone,” and they will do so differently. Such a program needs to recognise that the exploratory work of analysts is a separate stream of activity from the ETL-DQ-MDM-EDW-BI stream, with its own distinct set of needs.