Archive | BI RSS feed for this section

Business Framework for Analytics Implementation

3 Aug

In my previous post I discussed some analytical phrases that are gaining traction. Related to that I have had a number of requests for the deck that I presented at the Enterprise Dataversity  – Data Strategy & Analytics Forum.  I have attached the presentation here.

Also, while I am posting useful things that people keep asking for, here are a set of links that Jeff Gentry did on management frameworks for a Dataversity Webinar. Of particular interest to me was the mapping of the Hoshin Strategic Planning Framework to the CMMI Data Management Maturity Framework. The last link is the actual excel spreadsheet template.

Link to Webinar: Slides: http://www.dataversity.net/cdo-slides-cdo-interview-with-jeff-gentry-favorite-frameworks/

  1. Webinar Recording: http://www.dataversity.net/cdo-webinar-cdo-interview-with-jeff-gentry-favorite-frameworks/
  2. Link to Using Hoshin Frameworks: http://www.slideshare.net/Lightconsulting/hoshin-planning-presentation-7336617
  3. Hoshin Framework linked to DMM: http://content.dataversity.net/rs/656-WMW-918/images/Data Analytics Strategy and Roadmap Template 20160204D.xlsx
Advertisements

Forensic Analytics and the search for “robust” solutions

12 Jan

Happy New Year!

This entry has been sitting in my “to publish” file for some time. There is much more to be said on the topic. however, in the interest of getting it out … enjoy!

=======================================================

This entry was prompted by the article in the INFORMS ANALYTICS Magazine article titled Forensic Analytics: Adapting to a Growing Pandemic by Priti Ravi who is a senior manager with Mu Sigma and specializes “in providing analytics-driven advisory services to some of the largest retail, pharmaceutical and technology clients spread across the United States.”

Ms. Ravi writes a good article that left me hanging. Her conclusion was that the industry lacks access to sophisticated and intelligent monitoring equipment, and there exists a need for a “robust fraud management systems” that “offer a collective set of techniques” to implement a “complex adaptive approach.” I could not agree more. However, where are these systems? Perhaps even what are these systems?

Adaptive Approaches

To the last question first. What is a Complex Adaptive Approach? If you Google the phrase, the initial entries involve biology and ecosystems. However, wikipedia’s definition encompasses medicine, business and economics (amongst others) as areas of applicability. From an analytics perspective, I define complex adaptive challenges as those that  are impacted by the execution of the analytics – by doing the analysis, the observed behaviors change. This is inherently true of fraud as the moment perpetrators  understand (or believe) they can be detected, behavior will change. However, it also applies to a host of other type of challenges: criminal activity, regulatory compliance enforcement, national security; as well as things like consumer marketing and financial investment.

In an article titled Images & Video: Really Big Data the authors (Fritz Venter the director of technology at AYATA; and Andrew Stein the chief adviser at the Pervasive Strategy Group. define an approach they call “prescriptive analytics” that is ideally suited to adaptive challenges. They define prescriptive analytics as follows:

“Prescriptive analytics leverages the emergence of big data and computational and scientific advances in the fields of statistics, mathematics, operations research, business rules and machine learning. Prescriptive analytics is essentially this chain of transformations whereby structured and unstructured big data is processed through intermediate representations to create a set of prescriptions (suggested future actions). These actions are essentially changes (over a future time frame) to variables that influence metrics of interest to an enterprise, government or another institution.”

My less wordy definition:  adaptive approaches deliver a broad set of analytical capabilities that enables a diverse set of integrated techniques to be applied recursively.

What Does the Robust Solution Look Like?

Defining adaptive analytics this way, one can identify characteristics of the ideal “robust” solution as follows:

  • A solution that builds out a framework that supports the broad array of techniques required.
  • A solution that is able to deal with the the challenges of recursive processing. This is very data and systems intensive. Essentially for every observation evaluated, the system must determine whether or not the observation changes any PRIOR observation or assertion.
  • A solution that engages users and subject matter experts to effectively integrate business rules. In an environment where traditional predictive analytic models have a short shelf life (See Note 1), engaging with the user community is often the mechanism to quickly capture environmental changes. For example, in the banking world, tracking call center activity will often identify changes in fraud behavior faster than a neural network set of models. Engaging the User in the analytical process will require user interfaces, and data visualization approaches that are targeted at the user population, and integrate with the organization’s work processes. Visualization will engage non technical users to help them apply their experience and intuition to the data to expose insights. The census bureau has an interesting page, and if you look at Google Images, you can get an idea of visualization approaches.
  • A solution that provides native support for statistical and mathematical functions supporting activities associated with data mining : clustering, correlation, pattern discovery, outlier detection, etc.
  • A solution that structures unstructured data: categorize, cluster, summarize, tag/extract. Of particular importance here is the ability to structure text or other unstructured data into taxonomies or ontologies related to the domain in question.
  • A solution that persists data with the rich set of metadata required to support complex analytics. While it is clearer why unstructured data must be organized into a taxonomy / ontology, this also applies to structured data. Organizing data consistently across the variety of sources allows non obvious relationships to be exposed, and application of more complex analytical approaches.
  • A solution that is relatively data agnostic  – data will come from many places and exist in many forms. The solution must manage the diversity and provide a flexible way to integrate new data into the analytical framework.

What are Candidate Tools ?

And now to the second question: where are these tools? It is hard to find tools that claim to be “adaptive analytic” tools; or “prescriptive analytics” tools or systems in the sense that I have described them above. I find it interesting that over the last five years, major vendors have subsumed complex analytical capabilities into a more easily understandable components. Specifically, you used to be able to find Microsoft  Analytical Services easily on their site. Now it is part of MS SQL Server as SSAS; much the same way that the reporting service is now part of the database offer as SSRS (reporting services). There was a time a few years ago when you had to look really hard on the MS site to find Analytical Services. Of course since then Microsoft has integrated various BI acquisitions into the offer and squared away their marketing communication. Now their positioning is squarely around  BI and the database. Both of these concepts are easier to sell at the executive level, than the notion of prescriptive or adaptive analytics.

The emergence of databases and appliances optimized around analytics has simplified the message on the data side. everyone knows they need a database, and now they have one for analytics. At the decision maker level, that is a much easier decision than trying to figure out what kind of analytical approach the organization is going to adopt. People like Teradata have always supported analytics through the integration of SAS and now R as in-database functionality. However, Greenplum, Neteeza and others have incorporated SAS and the open source analytical “R” . In addition, we have seen the emergence (not new but much more talked about it seems) of the columnar database. The one I hear about most is the Sybase IQ product; although there have been a number of posts on the topic on here, here, and here.

My point here is that vendors have too hard a time selling complex analytical solutions, and have subsumed the complex capabilities into the concepts that are easier to package, position and communicate around; namely; database products and Business Intelligence products. The following are product sets that are candidates for the integrated approach. We start with the big players first and work towards that are less obviously candidates.

SAS

The SAS Fraud Framework provides an integration of all the SAS components that required to implement a comprehensive analytics solution around adaptive challenges (all kinds of fraud, compliance, money laundering, etc. as examples). This is a comprehensive suite of capabilities that spans all activities: data capture, ingest, and quality; analytics tools (including algorithm libraries), data visualization and reporting / BI capabilities. Keep in mind that SAS is a company that sells the building blocks, and the Fraud Framework is just that, a framework within which customers can build out capabilities. This is not a simple plug and play implementation process. It takes time and investment and the right team within the organization. The training has improved, and it is now possible to get comprehensive training.

As with any implementation of SAS, this one comes with all the caveats associated with comprehensive enterprise systems that integrate  analytics into the fabric of an organization. The Gartner 2013 BI report indicates that SAS “very difficult to implement”. This theme echoes across the product set.  Having said that   when it comes to integrated analytic of the kind we have been discussing all, of the major vendors suffer from the same implementation challenges – although perhaps for different reasons.

Bottom line however, is that SAS is a company grounded in analytics – the Fraud Framework has everything needed to build out a first class system. However, the corporate culture builds products for hard core quants, and this is reflected in the Gartner comments.

IBM

IBM is another company that has the complete offer. They have invested heavily in the analytics space, and between their ETL tools; the database/ appliance and Big Data capabilities; the statistical product set that builds off SPSS; and, the Cognos BI suite users can build out the capabilities required. Although these products are being integrated into a seamless set of capabilities, they remain somewhat separate and this probably explains some of the implementation challenges reports. Also, the product side of the IBM operation does not necessarily speak with the Global Services side of the house.

I had thought when IBM purchased Systems Research & Development (SRD) in 2005 that they were going to build out capabilities that SRD and Jeff Jonas had developed. Jeff heads up the Entity Analytics group within IBM Research, and his blog is well worth the read. However, the above product set appears to have remained separated from the approaches and intellectual knowledge that came with SRD. This may be on purpose – from a marketing perspective, buy the product set, and then buy IBM services to operationalize the system is not a bad approach.

Regardless, as the saying goes, no one ever got fired for buying IBM” probably still holds true. However, like SAS beware of the implementation! Any one of the above products (SPSS, Cognos, and Infosphere) require attention when implementing. However, when integrating as an operational whole, project leadership needs to ensure that expectations as to the complexity and time frame are communicated.

Other Products

There are many other product sets and I look forward to learning more about them. Once I post this, someone is going to come back and mention “R” and other open source products. There are plenty out there. However, be aware that while the products may be robust, many are not delivered as an integrated package.

With respect to open source tools, it is worth noting that the capabilities inherent in Hadoop – and the related products, lend themselves to adaptive analytics in the sense that operators can consistently re-link and re-index on the fly without having to deal with where and how the data is persisted. This is key in areas like signals intelligence, unstructured data analysis, and even structured data analysis where the notion of semantic equivalence is shifting. This is a juicy topic all by itself and worthy of a whole blog entry.

Notes:

  1. Predictive analytics relies on past observations to predict future observations. In an adaptive environment, the inputs to those predictive models continually change as a result of the outputs using the past observations.

Self Service BI

29 May

Good article on Self Serve BI. The term has been around a while, but never seems to get old.

The addition of analytical functions to databases

14 Nov

The trend has been for database vendors to integrate analytical functions into their products; thereby moving the analytics closer to the data (versus moving the data to the analytics). Interesting comments in the article below on Curt Monash’s excellent blog.

What was interesting to me, was not the central premise of the story that Curt does not  “think [Teradata’s] library of pre-built analytic packages has been a big success”, but rather the BI vendors that are reportedly planning to integrate to those libraries: Tableau, TIBCO Spotfire, and Alteryx. This is interesting as these are the rapid risers in the space, who have risen to prominence on the basis of data visualization and ease of use – not on the basis of their statistical analytics or big data prowess.

Tableau and Spotfire specifically focused on ease of use and visualization as an extension of Excel spreadsheets. They have more recently started to market themselves as being able to deal with “big data” (i.e. being Hadoop buzzword compliant). With the integration to a Teradata stack and presumably integrating front end functionality into some of these back end capabilities, one might expect to see some interesting features. TIBCO actually acquired an analytics company. Are they finally going to integrate the lot on top of a database? I have said it before, and I will say it again, TIBCO has the ESB (Enterprise Service Bus), the visualization tool in Spotfire and the analytical product (Insightful); hooking these all together on a Teradata stack would make a lot of sense – especially since Teradata and TIBCO are both well established in the financial sector. To be fair to TIBCO, they seem to be moving in this direction, but it has been some time since I used the product).

Alteryx is interesting to me in that they have gone after SAS in a big way. I read their white paper and downloaded the free product. They keep harping on the fact that they are simpler to use than SAS, and the white paper is fierce in its criticism of SAS. I gave their tool a quick run through, and came away with two thoughts: 1) the interface while it does not require coding/script as SAS does, cannot really be called simple; and 2) they are not trying to do the same things as SAS. SAS occupies a different space in the BI world than these tools have traditionally occupied. However,…

Do these tools begin to move into the SAS space by integrating onto foundational data capabilities? The reason SAS is less easy to use than the products of these rapidly growing players is that the rapidly growing players have not tackled the really tough analytics problems in the big data space. The moment they start to tackle big data mining problems requiring complex and recursive analytics, will they start to look more like SAS? If you think I am picking on SAS, swap out SAS for the IBM Cognos, SPSS, Netezza, Streams, Big Insights stack, and see how easy that is! Not to mention the price tag that comes with it.

What is certain is that these “new” players in the Statistical and BI spaces will do whatever they can to make advanced capabilities available to a broader audience than traditionally has been the case with SAS or SPSS (IBM). This will have the effect of making analytically enhanced insights more broadly available within organizations – that has to be a good thing.

Article Link and copy below

October 10, 2013

Libraries in Teradata Aster

I recently wrote (emphasis added):

My clients at Teradata Aster probably see things differently, but I don’t think their library of pre-built analytic packages has been a big success. The same goes for other analytic platform vendors who have done similar (generally lesser) things. I believe that this is because such limited libraries don’t do enough of what users want.

The bolded part has been, shall we say, confirmed. As Randy Lea tells it, Teradata Aster sales qualification includes the determination that at least one SQL-MR operator — be relevant to the use case. (“Operator” seems to be the word now, rather than “function”.) Randy agreed that some users prefer hand-coding, but believes a large majority would like to push work to data analysts/business analysts who might have strong SQL skills, but be less adept at general mathematical programming.

This phrasing will all be less accurate after the release of Aster 6, which extends Aster’s capabilities beyond the trinity of SQL, the SQL-MR library, and Aster-supported hand-coding.

Randy also said:

  • A typical Teradata Aster production customer uses 8-12 of the prebuilt functions (but now they seem to be called operators).
  • nPath is used in almost every Aster account. (And by now nPath has morphed into a family of about 5 different things.)
  • The Aster collaborative filtering operator is used in almost every account.
  • Ditto a/the text operator.
  • Several business intelligence vendors are partnering for direct access to selected Teradata Aster operators — mentioned were Tableau, TIBCO Spotfire, and Alteryx.
  • I don’t know whether this is on the strength of a specific operator or not, but Aster is used to help with predictive parts failure applications in multiple industries.

And Randy seemed to agree when I put words in his mouth to the effect that the prebuilt operators save users months of development time.

Meanwhile, Teradata Aster has started a whole new library for relationship analytics.

Information Architecture – A Moving Target?

6 Jul

I am increasingly seeing articles that talk about the confusion in identifying and building out the right information architecture for the organization. The article here, and with a clip below talk to that point. This is a good thing. People seek simplicity, and are looking for the prescriptive approach: 1) build a data warehouse; 2) build some datamarts for the business folks; 3) get a BI tool and build reports. But this does not cut it as it is too rigid a structure for analysts, or other stakeholders that have to do more than pull reports. The industry has responded by – I am speaking in buzzwords here – by adding “sandboxes”; by adding ODS (Operational Data Stores); and by adding a whole new way of landing, staging, persisting data and using it in analytical tasks (Hadoop). Sitting on top of this data level of the information architecture has been an explosion of tools that cater to (more buzzwords) data visualization, self serve BI, and data mashups to name a few.

Bottom line – how does this all get put together without creating an even bigger data mess than when you started? It is hard. What one sees so often is organizations putting off addressing the issue until they have a real problem. At this point, one sees a lot of sub-optimal management behavior. A consistent theme in the press is agility – organizations and their leaders need to embrace the agile manifesto. I am whole heartedly behind this. HOWEVER, agility needs to be framed within a plan, a vision, or at least some articulated statement of  an end point.

The article below is interesting as it presents agility as a key “must have”  management approach, and yet it also discusses the fact that in order for an agile approach to be successful, it needs to adopt disciplines that are decidedly un-agile! This creates a dual personality for leaders within the data management related functions of an organization (BI, analytics, ERP, …). On the one hand one wants to unleash the power of the tools and the creative intellect that is resident within the organization; on the other, there exists a desire to control, to reduce the noise around data, to simplify ones life. The answer is to embrace both – build a framework that provides long term guidance, and iteratively delivers capabilities within that framework towards a goal that is defined in terms of business capabilities – NOT technology or tightly defined tactical goals.

The framework – whichever approach one chooses will articulate the information architecture of the organization – how data flows around the organization to feed core business activities, and advance management’s goals! It is important – if it cannot be explained on a one page graphic, it is probably too complicated!

Martin’s approach to tying things together is below…

“”So given that there is not a one size fits all approach anymore – how does a company ensure its Information Architecture is developed and deployed correctly? Well, you have to build it from the ground up, and you have to keep updating it as the business requirements and implemented systems change. However, to do this effectively, the organisation must be cognisant of separating related workloads and host data on relevant and appropriate platforms, which are then tied together by certain elements, including:

See also:

  1. Polyglot persistence
  2. Data Management Maturity Model as an example of a way to start thinking about governance
  3. Agile development – a good idea so often badly implemented!

The Making of an Intelligence-Driven Organization

6 Jun

Interesting presentation – but really liked the Prezi – if you have not seen one of these have a look

The discussions/handout covered many points including:

  • As a discipline, intelligence seeks to remain an independent, objective advisor to the decision maker.
  • The realm of intelligence is that judgment and probability, but not prescription.
  • The Intelligence product does NOT tell the decision maker what to do, but rather, identifies the factors at play, and how various actions may affect outcomes.
  • Intelligence analysts must actively review the accuracy of their mind-sets by applying structured analytic techniques coupled with divergent thinking
  • Critical thinking clarifies goals, examines assumptions, discerns hidden values, evaluates evidence, accomplishes actions, and assesses inferences/conclusions
  • Networking, coordinating, cooperating, collaboration, and multi-sector collaboration accomplish different goals and require different levels of human resources, trust, skills, time, and financial resources – but worth it to ensure coverage of issues.
  • Counterintelligence and Security to protect your own position
  • and more….

I liked the stages of Intelligence Driven Organizations in the Prezi.

Data Visualisation » Martin’s Insights

23 Apr

This is a good article on data visualization. The author indicates in his considerations section that “real data can be very difficult to work with at times and so it must never be mistaken that data visualisation is easy to do purely because it is more graphical.” This is a good point. In fact in some respects determining what the right visualization is can be harder than simply working with the data directly – however, much harder to communicate key insights to a diverse audience.

What rarely gets enough attention is that in order to create interesting visualizations, the underlying data needs to be structured and enhanced to feed the visualizations appropriately. The recent Boston bombing where one of the bombers slipped through the system due to a name misspelling recalled a project years ago where we enhanced the underlying data to identify “similarities” between entities (People, cars, addresses, etc.) For each of the entities, the notion of similarity was defined differently; for addresses it was geographic distance; for names it was semantic distance; for cars, it was matching on a number of different variables; and for text narratives in documents we used the same approach that the plagiarism tools use. In this particular project a name misspelling, and the ability to tune the software to resolve names based on our willingness to accept false positives, allowed us to identify linkages that identified  networks. Once the link was established we went back and validated the link. In the above example, the amount of metadata generated to create a relatively simple link chart was significant – the bulk of the work. In terms of data generated, it is not unusual for data created to dwarf the original data set – this is especially true if there are text exploitation and other unstructured data mining approaches used.

So … Next time the sales guy shows you the nifty data visualization tool, ask about the data set used, and how massaged it needed to be.

http://www.martinsights.com/?p=492&goback=%2Egde_4298680_member_232053156

This should come as no suprise… Using Excel for complex analysis on an ongoing basis is asking for trouble!

22 Apr

This report on how using Excel has caused some major miscalculations should come as no surprise… Excel exists because it is pervasive, easy to use and can be applied to a range of decision making activities. However, have you ever had the experience of trying to create a repeatable, defensible and transparent report using excel WITHOUT having to make sure you had done it correctly? The attached article talks about a number of mistakes. I have had a number of discussions over the years with companies that are struggling with whether or not to implement a BI system, and if so to what extent should it provide structure and guidance to the process of using Excel?

The easy implementation of BI is to implement a tool such as Tableau that in essence takes spreadsheets and allows you to pivot the data and visualize more easily that one could in excel. I realize that Tableau does more than that now, but that is how it started and most people appear to use it that way still. This gives you great looking dashboards, and allows you to roll around in the data to bubble up insights. However, it does nothing to address the quality of the report and the issues raised by the article.

At the other end of the spectrum are enterprise level tools that do a great job of locking down the source data, and tracking everything that happens at the desktop to make the final report.These tools are focused on creating the SAME report with exactly the same inputs and calculations as all previous times. To the extent changes are made, they are tracked, and capabilities exist to flag and route changes for review and approval. The downside of course is that they often limit what the user can do with the data.

Somewhere in the middle is the happy spot. To the extent tools are not able to support the requirements for transparency, traceability, and defensibility, these requirements must be addressed through policy, process and standards.  Most of the enterprise tools are configurable to create a well defined set of gates between which analysts and report creators can have complete flexibility.

In the cases mentioned in this article, the technology exists to create the safeguards required. However, the user communities were able to resist change, and management – for whatever reason – did not make the decision to invest in underlying data management, BI and analytical capabilities. In a data driven world, it is only a matter of time before that comes back to bite you.

Gartner BI & Analytics Magic Quadrant is out…

10 Feb

The report can be obtained here. Along with some industry analysis hear

Well the top right quadrant is becoming a crowded place.

Gartner BI Quadrant 2013

I have not had time to really go over this and compare it to last year’s but the trends and challenges that we have been seeing are reflected in this report; some interesting points:

  1. All of the Enterprise level systems are reported to be hard to implement. This is not surprise – what always surprises me is that companies blame this on one company or another – they are all like that! It has to be one of the decision criterion when selecting one of the comprehensive tool sets.
  2. My sense is that IBM is coming along – and is in the running for the uber BI / Analytics company. However, the write up indicates that growth through acquisition is still happening. This has traditionally led to confusion in the product line and difficulty in implementation. This is especially the case when you implement in a big data or streaming environment.
  3. Tibco and Tableau continue to go head to head. I see Spotfire on top from a product perspective with its use of “R”, the purchase of Insightful and building on its traditional enterprise service bus business. HOWEVER, Gartner calls out the cost model as something that holds Spotfire back. This is especially true when compared to Tableau. My sense is that if TIBCO is selling an integrated solution, then they can embed the cost of the BI capabilities in the total purchase and this is how they are gaining traction. Regardless – Spotfire is a great product and TIBCO is set to do great things, but their price point sets them up against SAS and IBM, while their flagship component sets them up against Tableau at a lower price point. My own experience is that this knocks them out of the early stage activity, and hence they are often not “built in” to the later stage activity.
  4. SAS Continues to dominate where analytics and  Big Data are involved. However, interesting to note that Gartner calls out that they are having a hard time communicating business benefit. This is critical when you are selling a enterprise product at a premium price. Unlike IBM who can draw on components that span the enterprise, SAS has to build the enterprise value proposition on the analytics stack only – this is not a problem unique to SAS – building the value proposition for enterprise level analytics is tough.
  5. Tableau is the darling of the crowd and moves into the Gartner Leader’s Quadrant for the first time. The company has come out with a number of “Big Data” type features. They have connectors to Hadoop, and the article refers to in-memory and columnar databases. While these things are important, and the lack of them was holding the company back from entering certain markets, it is a bit at odds with their largest customer segment, and their traditional positioning approach. Addressing larger and a more integrated approach takes them more directly into the competitive sphere of the big guys (SAP, IBM and SAS), and also into the sphere of TIBCO Spotfire.
  6. It would be interesting to run the Gartner analysis along different use cases (Fraud, Risk Management, Consumer Market Analysis, etc.) In certain circles one hears much of companies like Palantir that has a sexy interface and might do well against Spotfire and Tableau, but is not included here.  Detica is another company that may do well. SAS would probably come out on top in certain areas especially with the new Visual Analytics component. There are probably other companies that have comprehensive BI solutions for particular markets – If anyone has information on these types of solutions, I would be interested in a comment.

More to follow – and there is probably much more to say as things continue to evolve at a pace!

%d bloggers like this: