Currently viewing the tag: "spreadmarts"

Week 3, Day 2 of the CORTEX MBAnalytics program includes TDWI’s best practices report, ‘Strategies for Managing Spreadmarts – Migrating to a Managed BI Environment’, by Wayne W. Eckerson and Richard P. Sherman, from 2008 and based in part on a survey conducted in 2007. It’s an excellent document—one of the best articulations of a problem with which Business Analytics practitioners and interested parties ought to be familiar: the nature and causes of ‘spreadmarts’, their strengths, weaknesses and limitations, what to do about them, and the risks involved. Given how well the report covers these topics, I commend it in full. But I’m also going to address where it falls short. It’s right in its reasoning and its conclusions, but potentially misleading in its emphasis and what it omits.

By way of definition:

A spreadmart is a reporting or analysis system running on a desktop database (e.g., spreadsheet, Access database, or dashboard) that is created and maintained by an individual or group that performs all the tasks normally done by a data mart or data warehouse, such as extracting, transforming, and formatting data as well as defining metrics, submitting queries, and formatting and publishing reports to others. Also known as data shadow systems, human data warehouses, or IT shadow systems.

Unsurprisingly:

Finance generates the most spreadmarts by a wide margin, followed by marketing, operations, and sales… Finance departments are particularly vulnerable to spreadmarts because they must create complex financial reports for internal and external reporting as well as develop detailed financial plans, budgets, and forecasts on an ad hoc basis. As a result, they are savvy users of spreadsheets, which excel at this kind of analysis.

Spreadmarts are categorised into three types:

  • One-off reports. Business people use spreadsheets to filter and transform data, create graphs, and format them into reports that they present to their management, customers, suppliers, or partners. With this type of spreadmart, people are using data they already have and the power of Excel to present it. There’s no business justification—or even time—for IT to get involved.
  • Ad hoc analysis. Business analysts create spreadmarts to perform exploratory, ad hoc analysis for which they don’t have a standard report. For instance, they may want to explore how new business conditions might affect product sales or perform what-if scenarios for potential business changes. They use the spreadmart to probe around, not even sure what they’re looking for, and they often bring in supplemental data that may not be available in the data warehouse. This exploration can also be time-sensitive and urgent.
  • Business systems. Most spreadmarts start out as one-off reports or ad hoc analysis, then morph into full-fledged business systems to support ongoing processes like budgeting, planning, and forecasting. It’s usually not the goal to create such a system, but after a power user creates the first one, she’s asked by the business to keep producing the report until, eventually, it becomes an application itself. This type of spreadmart is called a “data shadow system.”

Of these, it’s ‘business systems’ which are the report’s focus. The ‘one-off report’ and ‘ad hoc analysis’ categories of spreadmart, the report fittingly concludes, are inappropriate for systemisation.

But defining spreadmarts in terms of them being ‘shadow systems performing all the tasks normally done by a data mart of data warehouse’ gets things somewhat backwards. Presupposing that all spreadmarts are appropriate for systemsation—in effect viewing them as ‘data marts in waiting’—is misleading when their one-off and ad hoc uses are recognised and their wide proliferation and coverage taken into consideration. One might more accurately define data marts and data warehouses as ‘scaled-up systems which perform some of the tasks normally done by a spreadmart’.

The report’s framing of spreadmarts is a product of its BI/DW view of the world. In that worldview, data lives in source systems and needs to be ETL-ed into a relational repository in order to be published en masse to business users whose analysis requirements consist of read-only slice and dice. These needs do of course exist, but analysis entails much more. This narrower BI/DW view is reflected in the TDWI survey’s design. For example, respondents are asked to rank the top five reasons why spreadmarts exist in their group. “Quick fix to integrating data” ranks second overall, “Inability of the BI team to move quickly” third, ”This is the way it’s always been done” fifth, and “Desire to protect one’s turf” seventh. These options are hardly worded neutrally. They’re phrased so as to norm systemisation and to cast ad hoc and one-off uses of spreadmarts as the products of sloppiness, frustration, ignorance, and narrow self-interest. The section on the benefits of spreadsheets is similarly biased in its framing. “Ideal for one-time analysis” and “Good for prototyping a permanent system” are listed, but not ‘ideal for exploring data, creating scenarios, capturing assumptions, and enriching existing data’. Interpretation also suffers. For example:

Ironically, organizations with a “low” degree of [BI] standardization have the lowest median number of spreadmarts (17.5), and only 31% haven’t counted them. The proper conclusion here is that standardization increases awareness of spreadmarts.

It may be a convenient conclusion, but it certainly isn’t the only one. Perhaps the 70 to 80 percent failure rate of BI standardisation projects is driving business users back to spreadsheets. Excel integration with BI platforms is also filtered through a read-only lens:

BI vendors are starting to offer more robust integration between their platforms and Microsoft Office tools. Today, the best integration occurs between Excel and OLAP databases, where users get all the benefits of Excel without compromising data integrity or consistency, since data and logic are stored centrally.

This tends to be how BI vendors understand Excel integration. They recognise that users enjoy Excel as a query and reporting interface, but they understate its importance as a data and logic creation tool. The BI/DW worldview understands all data as the product of business processes which write it to ‘source systems’. (The one exception to this is the sub-discipline of enterprise budgeting and planning, which is in effect BI with people as the source systems.) What is underappreciated is that analysis itself is a business process—one which can’t help but create data.

These biases are understandable. The TDWI report is sponsored by a collection of BI software vendors, and 60% of those surveyed are IT professionals. What’s missing, then, is a more nuanced understanding of what analysis entails. Such an understanding needs to recognise:

  1. The centrality of data transformation and enrichment by individual analysts
  2. The value of tacit data
  3. The importance of presentation

Simply put, analysis is a read-write activity. Routine analytical tasks I find myself doing, for example, include:

  • Entering hitherto tacit data
  • Codifying business knowledge
  • Finding and synthesising data from outside sources
  • Creating dummy and randomised data
  • Capturing novel assumptions
  • Imposing new categories on existing categorical data
  • Enriching existing data by deriving or devising on-the-fly metadata
  • Building scenarios and constructing counterfactuals
  • Drafting and adding commentary, interpretation, and notes
  • Formalising and detailing new questions and follow-on analyses

All of these activities involve me creating new data, and I would submit that neither I nor any BI/DW requirement gathering cycle would be able to anticipate that data ahead of time. These are creative, reflective, results-contingent activities. As the report puts it:

[M]ost BI vendors have recognized that a large portion of customers are using their BI tools simply to extract data from corporate servers and dump them into Excel, Word, or PowerPoint, where they perform their “real work.”

If we additionally overlay the question of data value, one of Analyst First’s key contentions is that the most valuable information in organisations lives in people’s heads. It’s tacit, and spreadsheets are one of the best tools for eliciting it and making it explicit:

Spreadsheets are the most pervasive and effective decision support tools. No organisation doesn’t use them, and it’s a safe bet that this will always be the case. No amount of data warehousing will ever be able to provide decision makers with all the information they need. To the extent that it can, those decisions can be automated. Decisions invariably require new data. That new data will be either unanticipatable, or tacit, or both. Spreadsheets are unbeatable for ad hoc data analysis and turning tacit data into explicit data.

But spreadsheets aren’t the only tools available for tacit data mining. Nor, for some types of data, are they the best tools. As the ‘The Economics of Data – Analytics is… Investing in Data‘ post argued:

The most interesting, readily available, strategically relevant and poorly understood form of data is tacit data: the information contained in the brains of staff, board, shareholders and anyone else who would see the organisation do well… How is tacit data mined? The most effective and powerful way is by use of collective intelligence and forecasting techniques, such as prediction markets.

Finally—analysis process, ad hoc, and one-off needs aside—decision support systems need to be more than portals for publishing structured data as tables, charts, and indicators. As the TDWI survey picks up:

While Excel is the most popular tool for building spreadmarts, business analysts also use Microsoft Access, PowerPoint, and SAS statistical tools.

SAS and PowerPoint are telling inclusions: SAS contains a great deal of statistical and modelling functionality that BI stacks don’t (or certainly didn’t in 2007); PowerPoint is able to flexibly integrate unstructured commentary with the more structured outputs of BI platforms—as is Word. The TDWI report itself is an example of this: most of it is unstructured text, then there are graphics and other design elements, and finally the charts and tables. Very few high value analyses don’t contain narrative, diagrams, and other unstructured presentation elements.

The TDWI report does in fact acknowledge all of the above:

[T]here is often no acceptable alternative to spreadmarts. For example, the data that people need to do their jobs might not exist in a data warehouse or data mart, so individuals need to source, enter, and combine the data themselves to get the information. The organization’s BI tools may not support the types of complex analysis, forecasting, or modeling that business analysts need to perform, or they may not display data in the format that executives desire.

The report’s biases are in its view of analysis as fundamentally amendable to structure, and spreadmarts (as one of its enablers) as precursors of systems. Scalability, repeatability and automation certainly have their place, but a more realistic view would recognise that the analytical activities which a data warehouse can viably support are a subset. This has important implications for BI/DW practices now that the Business Analytics domain incorporates them alongside advanced analytics and the emerging field of big data.

Related Analyst First posts:

Set your Twitter account name in your settings to use the TwitterBar Section.