Archive | Data Management

Magic Quadrant for Data Integration Tools

23 Feb

Gartner Data Integration Survey

October 2012 – All the usual suspects. However, I was surprised (and pleased) to see Talend in the mix. Interesting to note that SAS is in the lead with the number of installs (13k) – up there with Microsoft (12k).

Gartner 2012 MDM Survey

22 Feb

Given what I just put out about TIBCO, I thought I should put this out too, as they show up here under MDM as well.

Magic Quadrant for Master Data Management of Customer Data Solutions

18 October 2012 ID:G00233198
Analyst(s): John Radcliffe, Bill O’Kane

VIEW SUMMARY

Organizations selecting a solution for master data management of customer data still face challenges. Overall, functionality continues to mature, but some of the well-established vendors’ messages are getting more complex. Meanwhile, less-established vendors continue to build up their credentials.

Link

Another reason why Data Management and Analysts cannot lead separate lives

14 Feb

I found this article interesting in that it points out why the bridge between the data side of the house and the analytical side must be well established – if the data team implements a design that does not support analytics, it has material impacts. I know this is blindingly obvious, but ….

I have recently been in a number of discussions where the attitude was: we are going to build the data warehouse using best practices and years of experience, and it really does not matter what you are going to do with the data. I know it is crazy, but… you know what I am talking about – we see it all the time.

The article itself tests performance on a columnar versus relational approach to persisting data, and has some surprising results – 4,100% improvement! I would be interested in other studies that have looked at the difference between different data architectures when performing analytical tasks.

Link
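To make the row-versus-column distinction concrete, here is a minimal sketch in Python. It is not the benchmark from the article; the sample records and the function names are invented for illustration. It only shows why an aggregate over one field touches less data in a columnar layout: the row layout walks every record, while the columnar layout reads a single contiguous list.

```python
# Hypothetical sample data: three fields per record (not from the article).
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 12.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(records, field):
    """Row layout: every whole record is visited to reach one field."""
    return sum(record[field] for record in records)


def total_column_store(columns, field):
    """Columnar layout: only the values of the requested field are read."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = total_row_store(rows, "amount")
col_total = total_column_store(columns, "amount")
```

Both functions return the same total; the difference is how much unrelated data has to be scanned to get there, which is where a real columnar engine picks up its analytical speed.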

It is about what you do with the data!!

7 Feb

Hadoop and Big Data – it is about what you do with the data!!

Some good videos from TechTarget and Wayne Eckerson – for a data guy he talks a lot about analytics.

We need to think about ETL differently!

26 Jan

This blog was started to write about analytics – so here I go again on ETL! It seems that if you are working on Big Data, it always starts with the data, and in many respects that is the most difficult part – or perhaps the part that requires the most wrenching changes. See Creating an Enterprise Data Strategy for some interesting facts on data strategies.

ETL is a chore at the best of times. Analysts are generally in a rush to get data into a format that supports the analytical task of the day. Often this means taking the data straight from the data source and performing the data integration required to make the data analytically ready. This is often done at the expense of any effort by the data management folks to apply controls aimed at data quality issues.

This has created a tension between the data management side of the house and the analytical group. The data management folks are focused on getting data into an enterprise Warehouse or DataMart in a consistent format, with data defined and structured in accordance with the definitions and linkages set through the data governance process. Analysts, on the other hand – especially those engaged in adaptive types of analytical challenges – always seem to be looking at data through a different lens. Analysts often want to apply different entity resolution rules, want to understand new linkages (which implies a new schema), and generally seek to apply a much looser structure to the data in order to expose insights that are often hidden by the enterprise ETL process.

This mismatch in requirements can be addressed in many ways. However, a key starting step is to redefine the meaning of ETL within an organization. I like the definition attributed to Michael Porter where he defines a “Lifecycle of Transformation” that shows how data is managed from the raw or source state through to application in a business context.

Value Chain of Transformation

I am pretty sure that Michael Porter does not think of himself as an ETL person, and the article (page 14) I obtained this from indicates that this perspective is not ETL. However, I submit that the perspective that ETL stops once you have data in the Warehouse or the DataMart is just too limiting, and creates a false divide. Data must be both useable and actionable – not just useable. By looking at the ETL challenge across the entire transformation (does that make ETL TL TL TL …?), practitioners are more likely to meet the needs of business users.

Related discussions for future entries:

  • Wayne Eckerson has a number of articles on this topic. My favorite: Exploiting Big Data Strategies for Integrating with Hadoop by Wayne Eckerson; Published: June 1, 2012.
  • The limitations placed on analytics by applying a schema independent of the analytical context are one of the drawbacks of “old school” RDBMS. The ability of a file-based Hadoop/MapReduce-oriented analytical environment to apply the schema later in the process is a key benefit of Hadoop/MapReduce.
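
The "apply the schema later" idea in the last bullet can be sketched in a few lines of plain Python. This is only an illustration of schema-on-read, not Hadoop or MapReduce code; the raw records, field names, and the `read_with_schema` helper are all made up for the example. The raw lines are stored untouched, and each analysis projects them onto whatever schema it needs at read time.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no schema on write).
RAW_LINES = [
    '{"customer": "Acme Corp", "amount": "120.50", "channel": "web"}',
    '{"customer": "ACME Corporation", "amount": "80", "note": "phone order"}',
]


def read_with_schema(lines, schema):
    """Parse raw lines and project each record onto `schema`.

    `schema` maps a field name to a converter function. Fields missing
    from a raw record come back as None instead of failing the load.
    """
    for line in lines:
        raw = json.loads(line)
        yield {
            name: (convert(raw[name]) if name in raw else None)
            for name, convert in schema.items()
        }


# Two analyses, two schemas, one unchanged copy of the raw data.
billing_view = list(read_with_schema(RAW_LINES, {"customer": str, "amount": float}))
channel_view = list(read_with_schema(RAW_LINES, {"customer": str.lower, "channel": str}))
```

The second view shows the analyst's side of the argument: a different lens (lower-cased names, a field the first view ignored) needs no reload and no change to a warehouse schema.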