“Integrated Analytics”, “Analytics as a Service”, “Analytical Centers of Excellence”: these are all buzzwords that, while they have been around a while, seem to be gaining traction. I downloaded the eBook Integrated Analytics: Platforms and Principles for Centralizing your Data from O’Reilly. Generally I do not pay much attention to these books, as they tend to result in calls from the sales departments of the product companies mentioned in them. However, I wanted to see what they had to say. First: pretty good for a 17-page book, and probably worth a read over the morning coffee; but a few points.
Point #1: This is a paper about analytics, and yet very quickly it becomes a paper about data. Data is not sexy, so the paper does not position itself as a data paper. However, the implementation reality is that analytics, especially in the context of the buzzword phrases we are talking about, is a data problem – more specifically, a data management problem.
Point #2: The paper has a number of good, referenceable ROI numbers, specifically from Nucleus Research. These are always handy for use in presentations and in supporting the business case process, especially if you are setting up an enterprise-level analytics or data management function.
Point #3: I was struck by the position in the paper that all data should be centralized. On the one hand, there is a recurring theme that the data warehouse is rigid, time consuming, costly, slow and requires specialized skills. The shortcomings of the data warehouse approach are attributed to the fact that it is a centralized source of data. On the other hand, the paper recommends that all data be centralized. It never really addresses the underlying contradiction: the recommended solution to the problem of centralized data is to centralize the data – presumably in a new way?
When it comes to #3, the paper is not wrong; it is just that the reality is that data in large organizations will always be siloed. The underlying operational / transaction-oriented systems that drive the business will always be optimized for the operational environment, and rarely for analytics. Indeed, structuring the data for analytics in the traditional sense (i.e. dimensional cubes) implies that the analytics has already been done. The true analyst has moved on to the next problem and is looking for meaning and insights in the data that are not yet known. This article by Tip Clifton speaks to this challenge nicely.
What the paper glosses over is that regardless of where the data resides, one needs to know of its existence and nature. What is key is the centralization of the knowledge regarding the data you have – not necessarily the data itself. Indeed, fundamental to many of the “big data” discussions is the notion that the data does not move; it is used “in situ”, as they say. The knowledge that needs to be centralized centers on the following:
- The context it was collected in. Was this data collected through social media, through a news agency, or from an established research organization?
- How it can be used. Population estimates created through sampling performed by the US Census will be used in a different context than population estimates created through location tags on Twitter feeds, for example. One is not better than the other, but you would not use them in the same analytical context.
- The level of curation. Did anyone validate this data asset? Does it have a shelf life? Who maintains it?
- The lineage and provenance. Is this a referenceable data asset? Will my analysis be transparent, testable, repeatable and defensible? Will the analytical conclusion hold up under scrutiny: in front of a judge, your budget committee, your investors, your customers?
- How to access it. Where is this data asset located? How can it be accessed? By whom? Under what conditions?
This centralized repository of the above knowledge can come in many forms, but generally speaking we are talking about metadata. The premise here is that if we have fully described and well-cataloged data, it does not matter where it resides. Indeed, the work involved in centralizing and structuring the data itself is liable to devolve into the much-maligned data warehouse ecosystem that the paper is suggesting we avoid.
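To make the idea concrete, here is a minimal sketch in Python of what one entry in such a metadata catalog might capture. All class, field and function names are illustrative assumptions, not a reference to any particular metadata tool; the point is that only the knowledge about the asset is centralized, while the data itself stays in situ.

```python
# Hypothetical sketch of a centralized metadata catalog: each entry
# records the five kinds of knowledge discussed above, while the
# underlying data asset stays wherever it lives.
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    context: str        # how and where the data was collected
    permitted_use: str  # analytical contexts the data is fit for
    curation: str       # who validated it, shelf life, maintainer
    lineage: str        # provenance: is the asset referenceable?
    access: str         # location, access method, by whom, conditions


# The catalog centralizes knowledge, not data.
catalog: dict[str, CatalogEntry] = {}


def register(entry: CatalogEntry) -> None:
    """Add an asset's description to the central catalog."""
    catalog[entry.name] = entry


# Two population estimates with very different contexts can sit side
# by side in the catalog without either data set moving.
register(CatalogEntry(
    name="census_population",
    context="sampling performed by the US Census",
    permitted_use="official statistics; defensible analysis",
    curation="validated; maintained by the Census Bureau",
    lineage="full provenance; repeatable",
    access="public files; no restrictions",
))
register(CatalogEntry(
    name="twitter_population",
    context="location tags on Twitter feeds",
    permitted_use="trend and sentiment exploration only",
    curation="unvalidated; short shelf life",
    lineage="partial provenance; not repeatable",
    access="via social media API; rate limited",
))
```

An analyst querying this catalog can decide which asset fits a given analytical context before any data is fetched, which is exactly the "know what you have" step the bullet list describes.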
I discussed how one gets started with a managed metadata environment of this nature in this post. Inherent in this approach, and well addressed in the paper, are a number of critical data management aspects: the data plan; governance; stewardship; and tools. What is missing is the notion of an overarching architecture. This needs to be considered, as the business architecture will drive the system architecture and the set of tools selected. Of course, the legacy architecture will often play a huge role here.
When I am talking to clients on the topic of building out analytical capabilities, I tend to focus on three things in order of importance: The business case and impact associated with the analytical capabilities being developed; the analytical approach that makes sense for your organization; and the underlying data foundation on which the analytics must reside.
For integrated analytics, and the buzzwords we are discussing, the data foundations are built on the best practices embedded in the CMMI Data Management Framework aligned to the traditional analytical cycle. This alignment of the CMMI framework specifies the key management practices required to identify, build, scale and manage the integrated analytical environment; whether it is implemented as a service, in a Center of Excellence, or some combination of the two.
Download the paper and have a read. It provides a good starting point.