Today’s post is from Klaus Felsche, subject of a recent Analyst First interview on the creation, development and live deployment of an analytics function at the Australian Department of Immigration and Citizenship (DIAC).
At a recent Analyst First meeting in Canberra, it was suggested that I offer a few thoughts about ‘data quality’ as a thought-piece. Here are some of those thoughts:
There is considerable discussion around ‘data quality’, and so there should be. Unfortunately, ‘quality’ is a qualitative and, at times, emotional assessment of whether the data we have supports analytics. Some, after a quick look at the available data, throw in the towel and abandon any attempt to make the data work.
I am inclined to abandon the word ‘quality’, as it tends to under-value the capabilities of skilled analysts and sophisticated software, and leads managers to rash judgments about what is and is not possible. It is far more constructive to consider whether the given data can support our processes with sufficient accuracy to be meaningful. In other words, is there enough value in the data for it to be fit for service?
The Situation: Data is rarely collected to support analytics
We tend to have data that was collected to support a business process. Even if analytics was considered at the initial design stages, by the time we need the data to provide answers, we generally find that today’s questions were not anticipated years ago when the project defined the current data structures and processes.
The challenge for analysts is, therefore, twofold: first, the analyst tries to build models that address the issues with the highest degree of accuracy achievable from the available data; second, the analyst compiles a ‘shopping list’ of data that would enhance the process if it were available. The ‘shopping list’ can be provided to management to ensure it is considered in future redesigns.
There will be times when even the smartest analyst cannot squeeze the answer out of existing data sets, or out of data sets that may be available outside the organisation.
While we wait for more useful data, analysts may be able to help the organisation build interim solutions based on what is available (e.g. intelligence reporting, business intuition and knowledge, etc.).
Some suggestions:
• Avoid the term ‘quality’. It assumes that a data set is simply of low or high quality, which is not helpful. We should instead focus on how well the data supports the analytics processes. As far as business operations are concerned, if the data supports the core business functions, then there is little wrong with its ‘quality’.
• Start analysis from the ‘data end’ rather than from a preconceived business model derived from intuition, anecdote or experience. While such models can be useful, measuring the data against them and then declaring the data ‘not good enough’ is not helpful. Analysts should first be given the opportunity to test a range of methods to see what meaning can be drawn from the existing data. In my experience, analysts have been known to produce pleasant surprises when given the chance and the right tools.
• Ensure that there is a business process in place (or build one) that can feed analysts’ suggestions back into future systems enhancements.
• Many vendors offer an end-to-end solution (everything from data capture to storage to analysis and reporting). Such systems need to be sufficiently flexible to accommodate changes over time in data structures, collection processes and the tools available to support analytics.
• Educate the business areas to create an awareness of the value of data:
o enhanced data sets; and
o business processes that better support analytics (i.e. convince operational staff, or clients entering data, that there is value in timeliness, accuracy, completeness, etc.).
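To make the ‘fit for service’ idea concrete, the timeliness and completeness of specific fields can be measured against a specific analytic need, rather than judging the data set’s ‘quality’ in the abstract. The sketch below is a minimal, hypothetical illustration; the field names, records and thresholds are invented for this example and are not drawn from any DIAC system.

```python
from datetime import datetime

# Hypothetical records: the field names and values are illustrative only.
records = [
    {"applicant_id": "A1", "lodged": datetime(2012, 3, 1), "country": "NZ"},
    {"applicant_id": "A2", "lodged": datetime(2012, 3, 5), "country": None},
    {"applicant_id": "A3", "lodged": None, "country": "UK"},
]

def completeness(records, field):
    """Share of records with a non-missing value for `field`."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def timeliness(records, field, as_of, max_age_days):
    """Share of records whose `field` date falls within a freshness window."""
    fresh = sum(
        1 for r in records
        if r.get(field) is not None and (as_of - r[field]).days <= max_age_days
    )
    return fresh / len(records)

# Measure fitness against a concrete question ("how complete and fresh is the
# data we would actually use?") instead of an abstract 'quality' label.
print(completeness(records, "country"))                          # 2 of 3 filled
print(timeliness(records, "lodged", datetime(2012, 3, 10), 30))  # 2 of 3 fresh
```

Per-field measures like these give management something actionable (which fields to improve, for which analytic purpose) instead of a blanket verdict on the data set.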
About us
Analyst First is a new approach to analytics, where tools take a far less important place than the people who perform, manage, request and envision analytics, and where analytics is seen as a non-repetitive, exploratory and creative process whose outcome is not known at the start, with only a fraction of efforts expected to result in success. This is in contrast with a common perception of analytics as IT and process.