Today’s post is by Eric Strovink of BIQ.
Ask a statistician or an applied mathematician, and she’ll probably tell you that analysis is either (1) building predictive models based on historical data, or (2) deciding whether past events are statistically significant (i.e., ascertaining whether what actually happened is sufficiently different from what might have happened by random chance).
But most of us aren’t applied mathematicians or statisticians, so we can get into trouble very easily. For example, we typically don’t start with a specific hypothesis to test (which is critical), and that means any patterns we might “find” are immediately suspect. That’s because, if one looks hard enough, one can always come up with a hypothesis that produces significant results in any dataset. With regard to predictions, we generally aren’t confident about the predictive power of our models, because we aren’t facile with advanced predictive modeling techniques, nor do we (in general) have access to a sufficiently large sample of “known outcomes” against which to compare our predictions. Without a massive dataset like the one provided by the Netflix Prize competition, there is no hope of refining a solution.
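The danger of hunting for patterns without a hypothesis is the familiar multiple-comparisons problem. The sketch below is a minimal illustration, not part of the original post: it assumes k independent tests, each a two-sided z-test on pure noise with known unit variance, and the helper names (`familywise_error_rate`, `z_test_p_value`, `simulate_fishing`) are hypothetical. When every null hypothesis is true, the chance of at least one “significant” result at level alpha is 1 - (1 - alpha)^k, which is already about 64% for 20 tests at alpha = 0.05.

```python
import random
from statistics import NormalDist
from typing import Sequence


def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests
    when every null hypothesis is true: 1 - (1 - alpha) ** k."""
    return 1 - (1 - alpha) ** k


def z_test_p_value(sample: Sequence[float]) -> float:
    """Two-sided p-value for H0: mean == 0, assuming known unit variance."""
    n = len(sample)
    # Sample mean divided by its standard error, 1/sqrt(n).
    z = (sum(sample) / n) * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_fishing(k: int = 20, n: int = 30, alpha: float = 0.05,
                     trials: int = 2000, seed: int = 1) -> float:
    """Fraction of trials in which at least one of k hypotheses, each tested
    on pure noise, looks 'significant' at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for _ in range(k):
            # Fresh noise for every hypothesis: there is no real effect anywhere.
            sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
            if z_test_p_value(sample) < alpha:
                hits += 1
                break  # one spurious "finding" is enough to count this trial
    return hits / trials


if __name__ == "__main__":
    print(round(familywise_error_rate(0.05, 20), 3))  # → 0.642
    print(simulate_fishing())  # close to 0.64; exact value depends on the seed
```

The simulated fraction should land near the analytic figure; the point is that a “finding” on noise is the expected outcome, not a rare accident, once enough hypotheses are tried.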
Of course, practical analysis work can be done without any advanced statistical or modeling techniques. Practical analysis boils down to “finding stuff in your data” that you either didn’t know about or weren’t sufficiently aware of. That’s the basis of what business analysts do every day. Which salespeople are selling, and which aren’t? Which products are selling where, and which aren’t? What were the profit margins, and why? What are the costs associated with running the business, and are they reasonable or unreasonable? And so on.
What’s required to come up with these answers is well understood:
1. Acquire data from one or more sources.
2. Transform like data sources into a common format, and link unlike-but-related data sources together with common keys (or computed expressions that result in common keys).
3. Create a schema for the data sources that follows the conventions of the selected database system.
4. Load the data sources into the database system.
5. Issue queries against the database and, when useful, format the results into reports.
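To make the five steps concrete, here is a small, self-contained sketch. Everything in it is hypothetical and not from the original post: the two inline CSV “sources”, the `normalize_name` key expression, and the table layout. It uses SQLite from Python’s standard library purely as a stand-in for “the selected database system”, and each step is marked in a comment.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# Step 1 (acquire): two hypothetical sources whose rep names don't match exactly.
SALES_CSV = """rep,product,amount
 Alice Smith ,Widget,120.00
BOB JONES,Gadget,75.50
alice smith,Gadget,30.00
"""
REPS_CSV = """rep_id,full_name,region
1,Alice Smith,East
2,Bob Jones,West
"""


def normalize_name(name: str) -> str:
    """Step 2 (transform): a computed expression that yields a common key."""
    return " ".join(name.split()).lower()


def build_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Step 3 (schema): declared in SQLite's dialect.
    conn.executescript("""
        CREATE TABLE reps  (rep_key TEXT PRIMARY KEY, region TEXT NOT NULL);
        CREATE TABLE sales (rep_key TEXT NOT NULL, product TEXT NOT NULL,
                            amount REAL NOT NULL);
    """)
    # Step 4 (load): apply the step-2 key expression while inserting rows.
    reps = csv.DictReader(io.StringIO(REPS_CSV))
    conn.executemany("INSERT INTO reps VALUES (?, ?)",
                     [(normalize_name(r["full_name"]), r["region"]) for r in reps])
    sales = csv.DictReader(io.StringIO(SALES_CSV))
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(normalize_name(s["rep"]), s["product"], float(s["amount"]))
                      for s in sales])
    conn.commit()
    return conn


def sales_by_region(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Step 5 (query): join the linked sources and aggregate."""
    return conn.execute("""
        SELECT r.region, SUM(s.amount)
        FROM sales AS s JOIN reps AS r ON r.rep_key = s.rep_key
        GROUP BY r.region ORDER BY r.region
    """).fetchall()


if __name__ == "__main__":
    for region, total in sales_by_region(build_database()):
        print(f"{region}: {total:.2f}")  # prints "East: 150.00", then "West: 75.50"
```

Even in this toy version, steps 2 through 4 make up most of the code before a single question is answered in step 5.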
Steps 1 through 4 are handled out of the box by every ERP or accounting system, although only for a small subset of the useful data in an organization. Step 5 is also handled by ERP and accounting systems, on that same subset of data, but (historically) rather poorly. That’s why there has been such a large market for “Business Intelligence” (“BI”) tools that put some necessary functionality back into Step 5.
However, when the data aren’t generated by the system that’s reporting on them, or aren’t resident in one of a handful of ERP systems to which a BI system can attach automatically, then we hit the essential problem with business data analysis. This problem is either ignored or deliberately misunderstood by most IT organizations, and it’s simply this: business analysts, in general, are either unwilling or unable to accomplish the following:
- Transform data;
- Create database schemata;
- Load database tables;
- Issue SQL queries.
And even if they can accomplish those steps, management usually can’t justify exploratory analysis, because the process above takes too long (and therefore costs too much, making the expected value of the analysis negative). This means, IT departments, that you can buy the business people all the data warehouse tools you want, and it won’t make a bit of difference to their ability to analyze data. Sure, you could hire a data expert to help them, but that won’t work either (I’ll save that explanation for Part III).
Previous: Analytics I: Optimization Comes of Age