analyticaltern

DGIQ 2018

The DGIQ conference this year went well. I gave two presentations and caught up with industry colleagues and customers. It helped that it was in San Diego; the weather, relative to the hot mugginess of the Mid-Atlantic, was excellent.

My presentation on GDPR was surprisingly well attended. I say surprising because the deadline has passed, yet I find that there are still companies formulating their plans. However, I am beginning to feel a bit like Samuel L. Jackson.

In the GDPR presentation, the goal was to focus attention not only on doing the right thing to be compliant, but also on doing it right. How do we reduce the stress and overhead of dealing with regulators? We call this “Audit Resilience.” I spoke to a number of people who are taking a wait-and-see approach to GDPR compliance. Interestingly, even though they are taking this approach, they are still getting requests to remove personal information. It seems to me that if you are taking a wait-and-see approach, you still need to be able to remove personal information from at least the website. Otherwise, you risk triggering a complaint, and then … you have no defense. The goal has to be to do everything possible not to trigger a complaint. The presentation took about 15 minutes, and the rest of the time was spent demonstrating the data control model in the DATUM governance platform, Information Value Management.

Building a Strategy Customers and Auditors Love from jadams6

I also had the pleasure of presenting with Lynn Scott, who co-chairs the Healthcare Technology & Innovation practice at Polsinelli with Bill Tanenbaum. What we wanted to do was push home the point that collaboration is key when dealing with thorny risk and compliance issues. We tried to have some fun with this one.

A Lawyer, a Salesperson and the Operations Guy Walk into a Bar . . . from jadams6

I will be at the Data Architecture Summit in Chicago in October. The session will cover:

What are the requirements to ensure management is “audit resilient”?
What is a Control System and how is it related to a Data Control Model?
What is “regulatory alignment” from a data perspective?
How do I build a Data Control Model?
What role do advanced techniques (AI, Machine Learning) play in audit resilience?

Hope to see you all there.

Tags: audit resilience, Conferences, DGIQ, GDPR

Categories Compliance, Personal Data Protection, Privacy
Author analyticaltern

What is Machine Learning?

11 Jul

I enjoyed this post in TechEmergence. I was originally looking for a definition of machine learning that helped reconcile all of the different definitions out there, and I like the approach they took. I slightly modified a table they had (presented below) to capture some of the thoughts on machine learning methods.

This is too complicated for many, but it captures the idea that ML is layered and will involve many techniques. I have a simplified list in Automating Data Management and Governance through Machine Learning.

See also No, Machine Learning is not just glorified Statistics for more plain-English discussion of machine learning.

Classification

· K-Nearest Neighbor
· Support Vector Machines
· Naïve Bayes
· Logistic Regression
· Decision Trees
· Sets of Rules
· Propositional Rules
· Logic Rules
· Neural Networks
· Bayesian Networks
· Conditional Random Fields

Scoring

· Accuracy / Error Rate
· Precision & Recall
· Squared Error
· Likelihood
· Posterior Probability
· Information Gain
· K-L Divergence
· Cost / Utility
· Margin

Recommendation / Prediction

Combinatorial Optimization
· Greedy Search
· Beam Search
· Branch & Bound

Continuous Optimization
· Gradient Descent
· Conjugate Gradient
· Quasi-Newton Methods
· Linear Programming
· Non-Linear (Quadratic) Programming

Credit: Dr. Pedro Domingos, University of Washington (slightly simplified)
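
To make the layering concrete, here is a minimal sketch in plain Python (no ML library assumed) that picks one entry from each column of the table: logistic regression as the model, log-likelihood and accuracy as the scores, and gradient descent as the optimizer. The toy dataset is invented for illustration; the sketch shows how the layers fit together, not how to build a production model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    # Model (first column): logistic regression
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_likelihood(w, b, X, y, eps=1e-12):
    # Scoring: log-likelihood of the observed labels (higher is better)
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, b, xi), eps), 1 - eps)  # avoid log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def accuracy(w, b, X, y):
    # Scoring: accuracy (1 - error rate) with a 0.5 threshold
    correct = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return correct / len(y)


def gradient_descent(X, y, lr=0.5, steps=2000):
    # Optimization (continuous): batch gradient descent on the negative
    # log-likelihood, written here as ascent on the log-likelihood
    w, b, n = [0.0] * len(X[0]), 0.0, len(y)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, b, xi)
            grad_w = [g + err * xij for g, xij in zip(grad_w, xi)]
            grad_b += err
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Invented, linearly separable toy data
X = [[0.5, 1.0], [1.0, 1.5], [1.5, 0.5], [3.0, 3.5], [3.5, 2.5], [4.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = gradient_descent(X, y)
print("accuracy:", accuracy(w, b, X, y))
print("log-likelihood before:", round(log_likelihood([0.0, 0.0], 0.0, X, y), 3))
print("log-likelihood after:", round(log_likelihood(w, b, X, y), 3))
```

Swapping in a different optimizer (conjugate gradient, say) or a different score (precision and recall) changes one layer without touching the other two, which is the point the table makes.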

Tags: Machine Learning, ML

Categories analytics, Metadata
Author analyticaltern

Classification – the key to releasing data’s value!

6 Jun

Someone asked me the other day what the business case was for classifying data. For anyone who has engaged with data to perform analytics or produce business intelligence reports, this may seem like a silly question. However, in many minds, data does not need to be labelled or classified in any way: it is used by an application, and if that application is performing correctly, the data must be good. And, at some level, they are right. As long as the data involved never has to be used outside its application, it may never need to be classified or labelled. The data receives all of its semantic context from the application where it is used.

So when does classification become important? It becomes important when data leaves the application that gave it context. For many of our customers, this occurs when data leaves the transactional, ERP-type system and is moved into a data warehouse or a data lake whose purpose is to provide access to data from multiple sources. Traditionally, this movement from a transactional system to a more generally accessible repository came with a level of curation. Prior to the concept of the “Data Lake,” data was moved into the data warehouse with the goal of making it the “single source of truth.” This often involved significant levels of data stewardship and curation to reconcile conflicting versions of “truth.” With the growing awareness and adoption of analytics, a stable concept of “truth” has become elusive. The right data for an analyst is context-driven and at times highly variable. The Data Lake construct addresses this issue by allowing all data to be loaded so that users can determine what data to use based on the decision context at the time. This is what data classification enables. Well-classified data can be discovered, analyzed, accessed and integrated into a user’s context based on the classification labels that have been exposed to the user in the Data Asset catalog. From this perspective, classification is foundational for driving value out of data in the areas of analytics, business intelligence, operational efficiency, and compliance.
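
A minimal sketch of label-driven discovery, assuming a hypothetical in-memory catalog rather than any particular catalog product: fields from several systems carry classification labels, and a user finds data by label instead of by knowing each application's column names. The system names, column names and labels (such as “Price:Invoiced”) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    system: str                               # hypothetical source system
    column: str                               # physical column name in that system
    labels: set = field(default_factory=set)  # classification labels


class DataAssetCatalog:
    """Hypothetical catalog that exposes classification labels for discovery."""

    def __init__(self):
        self.entries = []

    def register(self, system, column, *labels):
        self.entries.append(CatalogEntry(system, column, set(labels)))

    def find(self, *labels):
        """Return entries that carry every requested label."""
        wanted = set(labels)
        return [e for e in self.entries if wanted <= e.labels]


catalog = DataAssetCatalog()
catalog.register("erp_sales", "QT_PRC", "Price", "Price:Quoted")
catalog.register("erp_billing", "INV_AMT_UNIT", "Price", "Price:Invoiced")
catalog.register("ar_ledger", "NET_PAID", "Price", "Price:NetOfDiscount")
catalog.register("crm", "CUST_NM", "Customer", "PersonalData")

for entry in catalog.find("Price"):
    print(entry.system, entry.column, sorted(entry.labels))
```

The user never needs to know that the invoiced price lives in a column called `INV_AMT_UNIT`; the label carries the context that the source application used to provide.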

Indeed, in the big data space, classification is foundational for analytics, machine learning, the application of higher-level logic, and (way up the maturity curve) building artificial intelligence capabilities. As a foundational building block for AI, classification is an interesting topic, although for many it is too far removed from today’s problems. However, as the foundation for making data discoverable, understandable, accessible and able to be integrated into downstream applications, it is highly relevant to today’s challenges, almost regardless of where your current capabilities stand. For this reason, any data management shop should include in its planning a workstream that seeks to evolve classification capabilities.

Consider the following use cases:

Business Intelligence: Marketers seeking to report on price sensitivity compare prices quoted, prices invoiced, and prices paid net of discount. Data across all of the ERP or transactional systems in use must be classified such that the BI team is assured that all fields marked as “Price” are the correct type of price.

Marketing Analytics: Your customer 360° program seeks to understand external factors that may have influenced the pricing and discounts provided. Which customers are related to the prices referenced above? What kind of customers are they (industry, buying frequency, average purchase, …)? How can I correlate those with external events (elections, new regulation, natural disasters, …)? All of this analysis is supported by data that is classified to reflect the types of queries that may occur and the analytical operations to be performed.

Operational Efficiency: Your COO wants to ensure that the acquisition process is fully optimized, and seeks to benchmark operations using the SCOR (Supply Chain Operations Reference) Model. The Operations Team downloads the 250 SCOR performance metrics and seeks to map those to the relevant data. Classification supports the ability to find the right data and map it to the data specified in the SCOR Model.

Compliance & Risk Management: Risk teams will rely on well-classified data to enable risk models that are robust and flexible in their ability to address evolving risk. This is especially the case for risk associated with adaptive threats, for example, fraud and cybercrime.

Bottom line, if classification is not something that you have thought about, consider putting a plan together. It is the key to releasing the value of your data, and fully leveraging data as an asset.

Tags: Classification

Categories Best Practices, BI, Big Data, Classification, Data Management, Metadata
Author analyticaltern

Enterprise Data World

22 May

I attended the Enterprise Data World conference last month in San Diego. I was speaking on GDPR and what you need to do if you are just starting to think about it, as the deadline is now so close. The session was well attended, which was a surprise given how close we are to the deadline. The Facebook / Cambridge Analytica fiasco has drawn attention to the protection of personal information, and to GDPR in particular. What I see is smaller companies getting drawn into the discussion and realizing how big this might be for them. The deck is below.

In general, the show continues to improve. The keynote presentation by Mike Ferguson of Intelligent Business Strategies Ltd. was interesting in that I am not sure the same presentation would have been as well received a couple of years ago. It would have been considered a fantasy by many in the audience. Some of his key points:

Very comprehensive at the enterprise level. Remember when “enterprise data management”, or enterprise anything, was a bad word?!
Tagging and classification are all going to be algorithm-driven and in the pipeline. In his presentation, IoT was driving the volume, and he had some good volume numbers.
Pushing the virtual enterprise data lake – everything tied together in a metadata hub

Product and vendor knowledge was the biggest surprise of the show, probably because expectations were low. In general, the tools discussions were more applied. Key observations:

Much more evolved presentations – hooked to business drivers.
Integrated products on the rise, especially around the source-to-target discussion:
- ETL, DQ, Profiling and Remediation are integrated into a single pipeline discussion
- Sales people were more knowledgeable about how this works.
- API injection of new capabilities into this pipeline: this was something that all professed to do. However, when pushed, it was clear that there were varying stages of capability. All seemed to have APIs; the question was how robust the API is.
- Linked data / semantics was a bigger topic than normal. It is beginning to be discussed in an applied sense.
- FIBO (Financial Industry Business Ontology) is a driver in this. More importantly, it is being integrated into tools so people can visualize how it is applied. This is pulling in the business side of the house.
- This is all metadata, especially business metadata, and it is shifting the discussion towards the business.
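
The “single pipeline” idea, with profiling, data quality checks and remediation as stages and new capabilities plugged in through an API, can be sketched in plain Python. This is a hypothetical illustration: the `Pipeline` class, the `register_stage` hook and the rules inside each stage are assumptions, not any vendor's API.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
# A stage takes the records plus a shared report and returns (possibly changed) records
Stage = Callable[[List[Record], dict], List[Record]]


class Pipeline:
    """Hypothetical pipeline in which profiling, DQ and remediation are stages."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def register_stage(self, name: str, stage: Stage) -> None:
        # Illustrative "API injection" point: new capabilities are appended as stages
        self.stages.append((name, stage))

    def run(self, records: List[Record]) -> Tuple[List[Record], dict]:
        report: dict = {}
        for name, stage in self.stages:
            records = stage(records, report)
            report.setdefault("stages_run", []).append(name)
        return records, report


def profile(records, report):
    # Profiling: count missing values per column
    nulls: Dict[str, int] = {}
    for r in records:
        for key, value in r.items():
            if value in (None, ""):
                nulls[key] = nulls.get(key, 0) + 1
    report["null_counts"] = nulls
    return records


def quality_check(records, report):
    # DQ rule (illustrative): country must be a two-letter code
    report["dq_failures"] = sum(1 for r in records if len(str(r.get("country") or "")) != 2)
    return records


def remediate(records, report):
    # Remediation (illustrative): mark a missing country as "ZZ" (unknown)
    fixed, count = [], 0
    for r in records:
        if not r.get("country"):
            r = {**r, "country": "ZZ"}
            count += 1
        fixed.append(r)
    report["remediated"] = count
    return fixed


pipeline = Pipeline()
for stage_name, stage_fn in [("profile", profile), ("dq", quality_check), ("remediate", remediate)]:
    pipeline.register_stage(stage_name, stage_fn)

records, report = pipeline.run([
    {"id": 1, "country": "US"},
    {"id": 2, "country": ""},
    {"id": 3, "country": "USA"},
])
print(report)
```

Note that the remediation stage only fixes missing values; the “USA” record still fails the DQ rule, which is the kind of gap a robust API lets you close by registering another stage.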

Tags: dataversity, EDW, Enterprise Data Worlds, GDPR

Categories Compliance, Data Management, Industry, Personal Data Protection
Author analyticaltern

Audit Resilience and the GDPR

15 May

Compliance activities for organizations are often driven from the legal or risk groups. The initial focus is on management’s position and actions required to be compliant; generally this starts with the creation of policies. This makes sense as policies are a reflection of management’s intent and provide guidance on how to put strategic thinking into action. The legal teams provide legal interpretation and direction with respect to risk. This is also incorporated into the policies. So, what happens next as your organization addresses challenges around ensuring effective implementation and subsequent operational oversight of policies required for General Data Protection Regulation (GDPR) compliance?

THE CHALLENGES

The challenges associated with GDPR as well as other compliance activities are centered on achieving “Audit Resilience.” We define this as the ability to address the needs of the Auditor – internal or external – in such a way that compliance is operationally enabled and can be validated easily and with minimal disruptions and cost. The goal is to reduce the stress, the chaos and the costs that often accompany these events to a manageable level.

WHAT DOES AUDIT RESILIENCE MEAN?

Audit Resilience means that the auditor can:

Easily discern the clear line of sight between Policies => Standards => Controls => Actors => Data.
Review and explicitly align governance artifacts (policies, standards and processes) to compliance requirements.
Access and validate the “controls” that ensure standards are applied effectively.
Find evidence of execution of the governance practices within the data.
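
One way to picture the auditor's line of sight is as a chain of links that can be walked from any data element back to a policy. The sketch below assumes a hypothetical in-memory model with invented artifact identifiers; a real implementation would sit on a governance platform's metadata repository. It flags data elements with no unbroken path to a policy, which is exactly the gap an auditor would find.

```python
# Line of sight, top to bottom, as listed above
LEVELS = ["Policy", "Standard", "Control", "Actor", "Data"]

# Hypothetical artifacts: id -> (level, id of the artifact it is governed by)
ARTIFACTS = {
    "POL-01": ("Policy", None),
    "STD-07": ("Standard", "POL-01"),
    "CTL-12": ("Control", "STD-07"),
    "ACT-DPO": ("Actor", "CTL-12"),
    "customer.email": ("Data", "ACT-DPO"),
    "customer.phone": ("Data", "ACT-XYZ"),  # points at an actor nobody registered
}


def trace(artifact_id, artifacts):
    """Walk upward from an artifact and return (path, complete).

    complete is True only if the walk ends at a Policy after passing
    through the levels in the expected order.
    """
    path = []
    current = artifact_id
    while current is not None and len(path) <= len(LEVELS):  # guard against cycles
        if current not in artifacts:
            return path, False  # dangling reference: the line of sight is broken
        path.append(current)
        current = artifacts[current][1]
    levels = [artifacts[a][0] for a in path]
    complete = current is None and levels == LEVELS[: len(levels)][::-1]
    return path, complete


def audit_gaps(artifacts):
    """Data elements an auditor could not trace back to a policy."""
    return [a for a, (level, _) in artifacts.items()
            if level == "Data" and not trace(a, artifacts)[1]]


print(trace("customer.email", ARTIFACTS))
print(audit_gaps(ARTIFACTS))
```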

CRITICAL SUCCESS FACTORS

GDPR compliance is a function of creating logical linkage and consistency across multiple functions and actors – down to the data level. Details will vary based on the organization and the assessment of risk.

Overall, the following are critical to successfully demonstrating compliance:

Produce a catalog of all impacted data
Know where data is being used, and by whom
Show governance lineage from Policy => Process => Standard => Control => Data
Report on effectiveness of “Controls”
Produce specific data related to particular requirements such as: Security Events, Notification, Privacy Impact Assessments, and so forth.
Show the relationship of governance tasks to both data and the business processes that use Personal Information.
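
Two of the items above, knowing where data is used and by whom, and reporting on the effectiveness of controls, reduce to simple queries once control executions are logged against data. A minimal sketch, assuming a hypothetical log of (control, data element, actor, passed) records; real evidence would come from your governance tooling, and the identifiers are invented.

```python
from collections import defaultdict

# Hypothetical control execution log: (control_id, data_element, actor, passed)
LOG = [
    ("CTL-12", "customer.email", "crm_sync", True),
    ("CTL-12", "customer.email", "marketing_export", False),
    ("CTL-12", "customer.email", "crm_sync", True),
    ("CTL-15", "customer.dob", "billing", True),
]


def usage_by_data(log):
    """Where is each data element used, and by whom?"""
    usage = defaultdict(set)
    for _, data, actor, _ in log:
        usage[data].add(actor)
    return {data: sorted(actors) for data, actors in usage.items()}


def control_effectiveness(log):
    """Share of executions in which each control passed."""
    runs = defaultdict(lambda: [0, 0])  # control -> [passed, total]
    for control, _, _, passed in log:
        runs[control][0] += int(passed)
        runs[control][1] += 1
    return {c: passed / total for c, (passed, total) in runs.items()}


print(usage_by_data(LOG))
print(control_effectiveness(LOG))
```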

Tags: Audit, audit resilience, GDPR

Categories Best Practices, Compliance, Privacy
Author analyticaltern

Building Solid Foundations in Big Data & Analytics

23 Aug

Originally published on the DATUM, LLC site: Building Solid Foundations in a Data Swamp

Much has been written about Big Data, Data Science and Artificial Intelligence and how these will change the world through the insights derived from data. This especially applies to unstructured data. A recent article in the Harvard Business Review indicated that “cross-industry studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all.”[1]

There are a few challenges, however:

How do users create understanding and ensure they have the correct data for their needs if it has no structure?
How do you create a single logical view of data in a big data world, where things are not only highly variable, but also often widely dispersed?
How do you address analytical requirements, where the notion of data quality and how it is managed, varies significantly?
How do you expose the data lake(s) to users in a form that is discoverable, understandable and useable?

This blog is the first in a series to explore the data management and governance perspectives related to these four challenges.

Challenge #1: Unstructured Data

The question of how to deal with unstructured data consistently raises its head as a challenge for organizations. First, let’s get a few things out there:

There is no such thing as truly unstructured data. There is always a structure of some sort.
Knowing what you have and having the right tools are foundational capabilities.
The degree of structure required for data to be useful is variable and context driven.

Let’s take these in order:

Creating Structure

Structure is created in one of two ways:

Through reorganizing data so that it has structure
Through labeling data

The former is what happens to data in a traditional data environment as it is moved through the ecosystem, from source to Enterprise Data Warehouse, for example. The latter is what happens in a big data environment: the data is never moved; rather, labels are added to it to provide the ability to analyze that data.

Note: Data can be labeled incrementally. Newly acquired data can initially be labelled only with the acquisition date, the source, and the file type. As data moves through the data lifecycle, it will be “curated” to add additional context.
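
The incremental labelling described in the note can be sketched as a record whose labels accumulate as it moves through the lifecycle. This is a minimal illustration assuming a hypothetical `DataAsset` class, path and label names; the point is that the raw data stays where it is and only its labels grow.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataAsset:
    path: str                                    # where the raw file lives; never moved
    labels: dict = field(default_factory=dict)   # descriptive metadata, added over time
    history: list = field(default_factory=list)  # which lifecycle stage added which label

    def label(self, stage: str, **new_labels):
        for key, value in new_labels.items():
            self.labels[key] = value
            self.history.append((stage, key))


# At acquisition, only basic facts are known (hypothetical values)
asset = DataAsset("lake/raw/inbox_export.mbox")
asset.label("acquisition", acquired=date(2017, 8, 1), source="mail_server", file_type="mbox")

# Later curation adds business context
asset.label("curation", content_type="email", glossary_term="Customer Correspondence")

print(asset.labels)
print(asset.history)
```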

A little labelling goes a long way!

How much the data needs to be labelled to be useful can be viewed on a continuum. At one end, simply knowing that you are looking at emails provides enough information to know how to organize them; at the other end, social media sentiment analysis will require extensive labelling. Regardless, the right tools are required to provide logical structure to unstructured data.

When it comes to tools that cater to unstructured data one key capability is entity tagging or entity extraction tools that can recognize an entity and tag it with a label that makes sense to the organization – essentially tag it with the approved glossary term. Entities can be:

Anything from a simple named list such as a “product”; or
Extremely complex, mapping entities into semantic ontologies; for example, a “JV” is a “Joint Venture”, which is a type of “Company”, which is an “organization” that has “owners”.
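
A dictionary-based tagger is about the simplest form of entity tagging. The sketch below assumes a hypothetical glossary in which each surface form maps to an approved term and each term may point to a broader term, echoing the “JV” example. Production tools typically use statistical or ML-based entity recognition rather than exact string matching.

```python
import re

# Hypothetical glossary: surface form found in text -> approved glossary term
SYNONYMS = {
    "jv": "Joint Venture",
    "joint venture": "Joint Venture",
    "llc": "Limited Liability Company",
    "widget pro": "Product",
}

# Hypothetical hierarchy: term -> broader term
BROADER = {
    "Joint Venture": "Company",
    "Limited Liability Company": "Company",
    "Company": "Organization",
}


def lineage(term):
    """Return the term followed by all of its broader terms."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def tag(text):
    """Return (surface form, glossary term, broader terms) for each match."""
    lowered = text.lower()
    tags = []
    for surface, term in SYNONYMS.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            tags.append((surface, term, lineage(term)[1:]))
    return tags


print(tag("The JV agreement covers Widget Pro distribution."))
```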

Complementing the tagging capability is a flexible indexing capability. Tools like Elasticsearch allow users to search based on the structures discovered in the data. For example, a “Joint Venture” is a type of company. Additionally, these tools can create an index to allow discovery of similarities in text.
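
Elasticsearch has its own mappings, analyzers and query DSL, which are out of scope here. The underlying idea of searching “based on the structures discovered in the data” can be illustrated with a toy inverted index in plain Python: documents are indexed under their tagged terms, and a query for a broad term such as “Company” also returns documents tagged with narrower terms such as “Joint Venture”. Everything below, including the document ids, is a hypothetical illustration and not Elasticsearch's API.

```python
from collections import defaultdict

# Hypothetical hierarchy: narrower term -> broader term
BROADER = {"Joint Venture": "Company", "Partnership": "Company", "Company": "Organization"}


def narrower_closure(term):
    """The term plus every term beneath it in the hierarchy."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


class TagIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # glossary term -> document ids

    def add(self, doc_id, tags):
        for t in tags:
            self.postings[t].add(doc_id)

    def search(self, term):
        # Expand the query to narrower terms before looking up postings
        hits = set()
        for t in narrower_closure(term):
            hits |= self.postings.get(t, set())
        return sorted(hits)


index = TagIndex()
index.add("email-001", ["Joint Venture", "Product"])
index.add("email-002", ["Partnership"])
index.add("email-003", ["Product"])

print(index.search("Company"))
```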

The key point is that once data is organized, users and applications can begin to apply big data techniques to expose insights:

How do emails cluster on a timeline?
Are organizations mentioned in the text? (Could be Joint Ventures, Partnerships, LLCs, PLCs, and so on.)
Is there a change in frequency over time? Related to what entity types / categories?
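
Once emails carry a date and entity tags, questions like “is there a change in frequency over time?” become simple aggregations. A minimal sketch, assuming hypothetical email records that have already been tagged; a real analysis would run against the index rather than a Python list.

```python
from collections import Counter
from datetime import date

# Hypothetical tagged emails: (sent date, entity types tagged in the text)
EMAILS = [
    (date(2017, 1, 5), ["Joint Venture"]),
    (date(2017, 1, 20), ["Product"]),
    (date(2017, 2, 3), ["Joint Venture", "Product"]),
    (date(2017, 2, 14), ["Joint Venture"]),
    (date(2017, 2, 28), ["Joint Venture"]),
]


def monthly_mentions(emails):
    """Count mentions of each entity type per calendar month."""
    counts = Counter()
    for sent, types in emails:
        month = sent.strftime("%Y-%m")
        for t in types:
            counts[(month, t)] += 1
    return counts


def change(counts, entity_type, from_month, to_month):
    """Difference in mentions of one entity type between two months."""
    return counts[(to_month, entity_type)] - counts[(from_month, entity_type)]


counts = monthly_mentions(EMAILS)
print(change(counts, "Joint Venture", "2017-01", "2017-02"))
print(change(counts, "Product", "2017-01", "2017-02"))
```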

What does this mean from a data management perspective?

From a data management perspective, unstructured data will require some new capabilities. However, in some respects, it really is more of the same: What data do I have and where is it? Is my data labelled to communicate understanding? Is my data easy to acquire and apply in my context?

If you think of tags or labels as descriptive metadata, and the list of tags and labels as reference metadata, then you can place this activity into the traditional data management context. In order for data to be discovered, understood and integrated across systems and use cases, organizations need to:

Have a disciplined approach to how data is described and labelled. This starts with creating a set of glossary terms that can be linked to define meaning. [2]
Implement the governance framework that ensures the data is aligned to – and remains aligned to – the business understanding of what the data is, and how it is used.
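
Treating tags as descriptive metadata and the approved tag list as reference metadata means a basic governance check can be as simple as comparing the two. A minimal sketch with a hypothetical glossary and hypothetical tagged assets: any tag that is not in the approved glossary, and any asset left untagged, is reported for stewardship review.

```python
# Reference metadata: the approved glossary (hypothetical terms)
GLOSSARY = {"Customer", "Product", "Joint Venture", "Price"}

# Descriptive metadata: tags currently applied to assets (hypothetical)
ASSET_TAGS = {
    "crm.accounts": {"Customer"},
    "erp.items": {"Product", "Prodcut"},  # a typo slipped in at tagging time
    "lake/raw/feed.json": set(),          # not yet described
}


def governance_report(asset_tags, glossary):
    """Flag tags outside the glossary and assets with no tags at all."""
    unapproved = {a: sorted(tags - glossary) for a, tags in asset_tags.items() if tags - glossary}
    untagged = sorted(a for a, tags in asset_tags.items() if not tags)
    return {"unapproved_tags": unapproved, "untagged_assets": untagged}


print(governance_report(ASSET_TAGS, GLOSSARY))
```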

Organizations often do not face this challenge until they need to manage data across various operational silos, geographic regions or functional domains. Combining product lifecycle data with regional focus group data is an example of a cross-functional, cross-geography, cross-silo data mash-up that delivers high-impact insights.

Be sure to check back in as we address the next three challenges!

References

[1] Leandro DalleMule and Thomas H. Davenport, “What’s Your Data Strategy?”, Harvard Business Review, May–June 2017 issue. https://hbr.org/2017/05/whats-your-data-strategy

[2] With reference to linking of data, the simple link types are “subset of”, “superset of” and “same as” (see SKOS for a deeper discussion of knowledge organization). For example, using this approach one can tag pharmaceutical products to identify both the synonyms recognized by the ISO standards and the commercial names of the same product. This is the challenge faced by organizations implementing the IDMP standards.

[3] For a good case study of data integration across disparate data sets using SKOS metadata, see Healthcare Research Information.

Tags: Ai, Artificial Intelligence, Big Data, healthcare, SKOS, Unstructured data

Categories analytics, Best Practices, Data Management, Metadata
Author analyticaltern

Another Data Mart?

12 Jul

Martin’s Insights published the article below. It raises the question: what to do? Clearly a CDP is created to solve an unmet need. Whatever the answer is for any given organization, data must be known “in context” and must be traceable back to its original form to survive scrutiny. Here is the article.

======================================

Recently you may have heard – from your business network or circle of marketing friends – that Customer Data Platforms (CDPs) is the new ‘black’. Can a CDP really be an all-rounded solution to marketing’s most pressing problem, when it comes to enhancing customer experience? Certainly, if you are in the BI field, the concept…

via Trend Alert – Customer Data Platforms — Martin’s Insights

Tags: CDP, Customer Data Platform, Datamart

Categories Best Practices, Data Management
Author analyticaltern

Health Data Analytics 2016 — Martin’s Insights

29 Nov

I captured this write-up by Martin Fowler as it is organized around six areas that I see as foundational: collection/persistence; privacy/security; interoperability/sharing; BI/reporting; analytics; and information strategy. In one form or another, it always seems to come back to these topics.

From http://www.martinsights.com

Health Data Analytics 2016

I had the privilege and pleasure to attend HISA’s Health Data Analytics conference in Brisbane on 11 and 12 October 2016. What follows is this particular BI and Analytics consultant’s impressions and insights from the conference in terms of the main themes covered and the messages and impressions I take away, again from my particular…

via Health Data Analytics 2016 — Martin’s Insights

Tags: Martin_Fowler

Categories analytics, Healthcare
Author analyticaltern

Business Framework for Analytics Implementation

3 Aug

Updated 9/14/20 with new links. It is a bit ironic that I linked to the Dataversity site, and they do not use persistent identifiers to label their data assets, so all my links went dead. Note to practitioners: if you are not using persistent identifiers, the institutional knowledge captured in your data assets lasts only as long as the identifier!
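
The point about persistent identifiers can be illustrated with a minimal resolver: links cite a stable identifier, and only the registry entry changes when content moves. The identifier scheme and URLs below are hypothetical; established schemes such as DOIs and Handles rely on the same kind of indirection, with their own infrastructure.

```python
class PidRegistry:
    """Maps stable identifiers to current locations (hypothetical scheme)."""

    def __init__(self):
        self._locations = {}
        self._history = {}

    def mint(self, pid, url):
        if pid in self._locations:
            raise ValueError(f"{pid} already exists; identifiers are never reused")
        self._locations[pid] = url
        self._history[pid] = [url]

    def move(self, pid, new_url):
        # Content moved: update the registry, not every page that cites the pid
        self._locations[pid] = new_url
        self._history[pid].append(new_url)

    def resolve(self, pid):
        return self._locations[pid]


registry = PidRegistry()
registry.mint("pid:governance/hoshin-deck", "https://example.org/2016/webinar/deck")
registry.move("pid:governance/hoshin-deck", "https://example.org/archive/hoshin-deck")
print(registry.resolve("pid:governance/hoshin-deck"))
```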

I went looking for this deck because I was having a discussion on governance that is as old as the hills; essentially, how do you link data governance activities to business activity to answer the question: why does data governance exist?

The other discussion that got me looking at this article again was how we go about building an operating model for organizations where the governance team is doing more than responding to quality requests: how does the team proactively address data issues?

Both of these are tied to the article below. The Hoshin Framework (at least as it is presented below) ties strategic initiatives all the way down to identified data capabilities that can be addressed proactively to support the business strategy.

A note on the spreadsheet: it is not for the faint of heart. It supports the thought exercise used to shape discussions and your communication with stakeholders. The key takeaway is that the spreadsheet lets you relate governance budget to strategic goals, funded programs, current projects, and metrics. Think of it as the audit worksheets: no one ever sees those, and the auditor reports out only the results.
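
Underneath the spreadsheet is a chain of relationships: strategic goal, funded program, current project, governance activity and metric, with budget attached at the bottom. A minimal sketch of the rollup, assuming hypothetical goals, programs and figures (the actual template's columns may well differ), shows how governance spend can be reported against each strategic goal.

```python
from collections import defaultdict

# Hypothetical rows: (strategic goal, funded program, project, governance activity, metric, budget)
ROWS = [
    ("Grow customer retention", "Customer 360", "CRM consolidation",
     "Customer data quality rules", "Duplicate rate", 40000),
    ("Grow customer retention", "Customer 360", "Churn model",
     "Glossary for churn attributes", "Terms approved", 15000),
    ("Reduce regulatory risk", "GDPR program", "Consent tracking",
     "Personal data catalog", "% PII fields classified", 60000),
]


def budget_by_goal(rows):
    """Roll governance budget up to the strategic goal it supports."""
    totals = defaultdict(int)
    for goal, *_, budget in rows:
        totals[goal] += budget
    return dict(totals)


def metrics_for_goal(rows, goal):
    """The metrics that evidence governance progress for one goal."""
    return [metric for g, _, _, _, metric, _ in rows if g == goal]


print(budget_by_goal(ROWS))
print(metrics_for_goal(ROWS, "Grow customer retention"))
```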

Original Post.

In my previous post I discussed some analytical phrases that are gaining traction. Related to that, I have had a number of requests for the deck that I presented at the Enterprise Dataversity – Data Strategy & Analytics Forum. I have attached the presentation here. NOTE: This presentation was given a few years ago while I was with the CMMI Institute (now part of ISACA); as a result, it is tied to the Data Management Maturity Model. I talked about analytics, and my colleague on the talk addressed data maturity.

Also, while I am posting useful things that people keep asking for, here is a set of links to material Jeff Gentry presented on management frameworks for a Dataversity webinar. Of particular interest to me was the mapping of the Hoshin Strategic Planning Framework to the CMMI Data Management Maturity Framework. The last link is the actual Excel spreadsheet template.

Links:

Webinar Recording: CDO Webinar: CDO Interview with Jeff Gentry – Favorite Frameworks. The link to the deck is here.
Link to Using Hoshin Frameworks. Hoshin is bigger than just this matrix and is a heavy process for most people. However, the following gives you some background: http://www.slideshare.net/Lightconsulting/hoshin-planning-presentation-7336617
Hoshin Framework linked to DMM: Data Analytics Strategy and Roadmap Template 20160204D.xlsx

Tags: analytics, dataversity, Hoshin, Jeff Gentry, Strategy
