Archive | Data Management

DGIQ 2018

12 Jul

The DGIQ conference this year went well. I gave two presentations and caught up with industry colleagues and customers. It helped that it was in San Diego – the weather, relative to the hot mugginess of the Mid-Atlantic, was excellent.

My presentation on GDPR was surprisingly well attended. I say surprisingly because the deadline has passed, yet I find there are still companies formulating their plans. However, I am beginning to feel a bit like Samuel L. Jackson.


In the GDPR presentation, the goal was to focus attention not only on doing the right thing to be compliant, but also on doing it right. How do we reduce the stress and overhead of dealing with regulators? We call this “Audit Resilience.” I spoke to a number of people who are taking a wait-and-see approach to GDPR compliance. Interestingly, even though they are taking this approach, they are still getting requests to remove personal information. It seems to me that if you are taking a wait-and-see approach, you still need to be able to remove personal information from at least the web site; otherwise, you risk triggering a complaint, and then … you have no defense. The goal has to be to do everything possible not to trigger a complaint. The presentation took about 15 minutes, and the rest of the time was spent demonstrating the data control model in the DATUM governance platform – Information Value Management.

I also had the pleasure of presenting with Lynn Scott, who co-chairs the Healthcare Technology & Innovation practice at Polsinelli with Bill Tanenbaum. We wanted to drive home the point that collaboration is key when dealing with thorny risk and compliance issues. We tried to have some fun with this one.

I will be at the Data Architecture Summit in Chicago in October. The session will cover:

  • What are the requirements to ensure management is “audit resilient”?
  • What is a Control System and how is it related to a Data Control Model?
  • What is “regulatory alignment” from a data perspective?
  • How do I build a Data Control Model?
  • What role do advanced techniques (AI, Machine Learning) play in audit resilience?

Hope to see you all there.



Will the US evolve towards a GDPR-like approach to personal information?

3 Jul


In a conversation with a lawyer a few months ago, the comment was made that the US has already implemented GDPR – it has just done small bits of it in each state; collectively the patchwork is similar to GDPR, but no one jurisdiction is anything like GDPR. Except now we have California enacting the California Consumer Privacy Act, which goes into effect in January 2020. This regulation is similar in spirit, and in many details, to GDPR. What is fascinating is how the bill was enacted. This article explains how California politics works, and points out that the rapid adoption of the legislation is actually an attempt to create a more flexible environment for companies to negotiate the various compromises that I am sure will come. It is also worth noting that companies well on the way towards GDPR compliance will essentially already be compliant with the California law. I do not see California being the last state to create or update its privacy laws – that trend was already underway. However, California is a big state and the home of many tech companies, and its new law will surely influence how other states address the privacy issue.

Update 1: Comments on non-EU countries updating laws – Canada

https://www.jdsupra.com/legalnews/canada-to-update-data-law-to-gdpr-16052/

Update 2: IAPP Comment on Californian law: 

Enterprise Data World

22 May

I attended the Enterprise Data World conference last month in San Diego, speaking on GDPR and what you need to do if you are just starting to think about it now that the deadline is so close. The session was well attended, which was a surprise given how close we are to the deadline. The Facebook / Cambridge Analytica fiasco has drawn attention to the protection of personal information, and to GDPR in particular. What I see is smaller companies getting drawn into the discussion and realizing how big this might be for them. The deck is below.

In general, the show continues to improve. The keynote presentation by Mike Ferguson of Intelligent Business Strategies Ltd was interesting in that, had the same presentation been given a couple of years ago, I am not sure it would have been as well received – it would have been considered a fantasy by many in the audience. Some of his key points:

  • Very comprehensive at the enterprise level – remember when enterprise data management, or enterprise anything, was a bad word?
  • Tagging and classification will all be algorithm-driven and happen in the pipeline – in his presentation, IoT was driving the volume, and he had some good volume numbers.
  • Pushing the virtual enterprise data lake – everything tied together in a metadata hub

The products and vendor knowledge were the biggest surprise of the show – probably because expectations were low. In general, the tools discussions were more applied. Key observations:

  • Much more evolved presentations – hooked to business drivers.
  • Integrated products are on the rise, especially around the source-to-target discussion:
    • ETL, data quality (DQ), profiling, and remediation are being integrated into a single pipeline discussion.
    • Salespeople were more knowledgeable about how this works.
    • API injection of new capabilities into this pipeline was something all vendors professed to do. However, when pushed, it was clear there were varying stages of capability – all seemed to have APIs; the question was how robust the API is.
    • Linked data / semantics was a bigger topic than usual, and it is beginning to be discussed in an applied sense.
    • FIBO (the Financial Industry Business Ontology) is a driver here – more importantly, it is being integrated into tools so people can visualize how it is applied. This is pulling in the business side of the house.
    • This is all metadata, especially business metadata – and it is shifting the discussion towards the business.

Audit Resilience and the GDPR

15 May

Compliance activities for organizations are often driven by the legal or risk groups. The initial focus is on management’s position and the actions required to be compliant; generally this starts with the creation of policies. This makes sense, as policies are a reflection of management’s intent and provide guidance on how to put strategic thinking into action. The legal teams provide legal interpretation and direction with respect to risk, which is also incorporated into the policies. So, what happens next as your organization addresses the challenges of ensuring effective implementation and subsequent operational oversight of the policies required for General Data Protection Regulation (GDPR) compliance?

THE CHALLENGES

The challenges associated with GDPR, as well as other compliance activities, are centered on achieving “Audit Resilience.” We define this as the ability to address the needs of the auditor – internal or external – in such a way that compliance is operationally enabled and can be validated easily, with minimal disruption and cost. The goal is to reduce the stress, the chaos and the costs that often accompany these events to a manageable level.

WHAT DOES AUDIT RESILIENCE MEAN?

Audit Resilience means that the auditor can:

  • Easily discern the clear line of sight between Policies => Standards => Controls => Actors => Data (a minimal sketch of this lineage follows the list).
  • Review and explicitly align governance artifacts (policies, standards and processes) to compliance requirements.
  • Access and validate the “controls” that ensure standards are applied effectively.
  • Find evidence of execution of the governance practices within the data.
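
To make that lineage concrete, here is a minimal, hypothetical sketch of a data control model expressed as plain Python classes. The class and field names are illustrative assumptions, not the actual model of any particular governance platform; the point is simply that every policy can be walked down to the data it governs, and back.

```python
# Hypothetical data control model: every name here is illustrative, not a real
# platform schema. The structure exists so the lineage can be walked end to end.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataAsset:
    name: str                        # e.g. "crm.contacts.email"
    contains_personal_info: bool = False


@dataclass
class Actor:
    name: str                        # person or system accountable for a control
    role: str


@dataclass
class Control:
    name: str
    actors: List[Actor]              # who executes / owns the control
    data_assets: List[DataAsset]     # the data the control is applied to
    evidence: List[str] = field(default_factory=list)  # links to proof of execution


@dataclass
class Standard:
    name: str
    controls: List[Control]


@dataclass
class Policy:
    name: str
    standards: List[Standard]


def trace(policy: Policy) -> None:
    """Print the Policy => Standard => Control => Actor => Data line of sight."""
    for standard in policy.standards:
        for control in standard.controls:
            actors = ", ".join(a.name for a in control.actors)
            for asset in control.data_assets:
                print(f"{policy.name} -> {standard.name} -> {control.name} "
                      f"-> [{actors}] -> {asset.name}")
```

When an auditor asks “show me which data your retention policy actually touches, and who is accountable,” the answer comes from walking a structure like this rather than assembling documents after the fact.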


CRITICAL SUCCESS FACTORS

GDPR compliance is a function of creating logical linkage and consistency across multiple functions and actors – down to the data level.  Details will vary based on the organization and the assessment of risk.

Overall, the following are critical to successfully demonstrating compliance (a brief sketch of items 1 and 4 follows the list):

  1. Produce a catalog of all impacted data
  2. Know where data is being used, and by whom
  3. Show governance lineage from Policy => Process => Standard => Control => Data
  4. Report on effectiveness of “Controls”
  5. Produce specific data related to particular requirements such as: Security Events, Notification, Privacy Impact Assessments, and so forth.
  6. Show the relationship of governance tasks to both data and the business processes that use Personal Information.
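
Building on the hypothetical model sketched earlier in this post, items 1 and 4 can be read straight off that structure. The functions below are an illustrative sketch under those same assumptions, not a product feature.

```python
# Continues the illustrative Policy/Standard/Control/DataAsset sketch above.
from typing import Dict, List


def impacted_data_catalog(policies: List["Policy"]) -> List[str]:
    """Success factor 1: every governed data asset that holds personal information."""
    impacted = set()
    for policy in policies:
        for standard in policy.standards:
            for control in standard.controls:
                for asset in control.data_assets:
                    if asset.contains_personal_info:
                        impacted.add(asset.name)
    return sorted(impacted)


def control_effectiveness(policies: List["Policy"]) -> Dict[str, bool]:
    """Success factor 4: treat a control as evidenced only if execution evidence is attached."""
    report: Dict[str, bool] = {}
    for policy in policies:
        for standard in policy.standards:
            for control in standard.controls:
                report[control.name] = bool(control.evidence)
    return report
```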

Another Data Mart?

12 Jul

Martin’s Insights published the article below. It begs the question – what to do? Clearly a CDP is created to solve an unmet need. Whatever the answer is for any given organization, data must be known “in context” and must be traceable back to its original form to survive scrutiny. Here is the article.

======================================

Recently you may have heard – from your business network or circle of marketing friends – that Customer Data Platforms (CDPs) is the new ‘black’. Can a CDP really be an all-rounded solution to marketing’s most pressing problem, when it comes to enhancing customer experience? Certainly, if you are in the BI field, the concept…

via Trend Alert – Customer Data Platforms — Martin’s Insights

The merging of analytics and transactional data platforms requires more than just an upgrade in technology!

15 Sep

This IDC white paper puts the evolution of data platforms into layman’s terms. My takeaway is that the unshackling of information architects and applications from the constraints of the traditional RDBMS will continue. Many of the design choices that the article details are grounded in the historic limitations of the data platform. The comments made under the Future Outlook segment are key:

“Trying to make definitive statements about the state of analytic-transaction data platforms going forward is challenging, because both the database kernel technology and the hardware on which it runs are evolving at a rapid pace. In addition to this, new workloads and mounting performance requirements add even more to the pace of development. It is safe to say that all the technology described in this study, admittedly in a very abstract manner, may be described as transitional technology that is evolving quickly. New approaches to data structures, new optimizations for transactional data once it is fully freed from the constraints of disk optimization, new ways of organizing processors and memory, and the introduction of non-volatile dual in-line memory modules (NVDIMMs) all will no doubt result in technologies within 10 years that are very different from what is described here.”

While platforms and technologies are evolving (this discussion has additional detail here), I find the juxtaposition of the “ideal” view presented here and the reality of most data operations interesting. The article also provides “Essential Guidance” aimed at IT buyers on choosing the right technology platform.

The focus on hardware and technology tends to obscure an equally important part of the buying equation – namely, can managers manage these new technologies to achieve the desired business impacts and resulting business benefits? For the most part, the answer is a resounding NO. For these “next gen” implementations to work, organizations need to upgrade not only their platforms but also their management practices. The balance of this blog entry examines some of the areas the IDC article focuses on from the management perspective of the Chief Data Officer or Enterprise Information Architect.

The Enterprise Data Warehouse. Traditionally the Enterprise Data Warehouse (EDW) has been considered the repository of the “single version of the truth”. However, when it comes to analytics – and melding the transactional data store with analytics – this is a hard concept. There is no one version of the truth; everything is context driven. The design alternatives presented in the article (see the figure below) enable this in that they generally store both the transactional (source) version and the fully resolved EDW version. This allows users to hit both the transactional store and the EDW depending on the context they seek and how they want to interact with the data. Implicit in this view is that the context is captured in a machine-exploitable form that enables users to derive their own “single version of the truth”. This is a function of metadata, discussed below. Additionally, the article recognizes that the “one large database” solution is not generally a viable alternative, the issue being one of “manageability and agility.” This is somewhat contradicted in the opening “opinion” section, where they talk about a canonical data model; however, I am going to assume that the canonical recommendation relates to the metadata and not the content.

In all of the platform options discussed in the paper (see below), data managers need to keep track of transactional data and data within a fully resolved EDW. The context and the semantic meaning of the content of both of those data sources need to be managed, crosswalked, and communicated to the user community. This will involve an evolution in both management practices and tools.
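
As a purely illustrative sketch of that crosswalk idea – with made-up field names, not any vendor’s catalog format – a single entry might tie the transactional field and the resolved EDW field to one shared business meaning and record how the value was shaped along the way:

```python
# Purely illustrative crosswalk entry; field names are invented for the example.
from dataclasses import dataclass
from typing import List


@dataclass
class CrosswalkEntry:
    business_term: str        # shared semantic meaning
    transactional_field: str  # physical location in the operational store
    edw_field: str            # physical location in the fully resolved EDW
    derivation: str           # how the EDW value is shaped from the source


crosswalk: List[CrosswalkEntry] = [
    CrosswalkEntry(
        business_term="Customer email address",
        transactional_field="crm.contacts.email_addr",
        edw_field="edw.dim_customer.email",
        derivation="lower-cased, deduplicated on customer key",
    ),
]

# A user querying either store can look up the same entry to see where the
# value lives on each platform and how it was transformed along the way.
for entry in crosswalk:
    print(f"{entry.business_term}: {entry.transactional_field} -> "
          f"{entry.edw_field} ({entry.derivation})")
```

However the mapping is physically stored, the point is that a user querying either store reaches the same description of context.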

IDC Graphic on Data Platforms

Metadata. I like the way this paper addresses metadata:

“Metadata, including all data models and schemas in the relevant databases or data collections, must be harmonized, kept current with those databases, and mapped to higher order constructs, including a business glossary and, for data managed in common, a canonical data model, in order to facilitate the access and management of the data.”

The notion of mapping to “higher order constructs” is key. While it is not always possible or feasible to create a canonical data model, it is very feasible to create a canonical metadata model (metamodel). This gives you a consistent way to fully describe your data regardless of the physical form it takes, and to link it to the higher order constructs referred to above. My article here talks to the role the enterprise plays in managing metadata at the enterprise level.
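
A minimal sketch of what such a canonical metamodel could look like is below. The class names and glossary terms are assumptions for illustration; the point is that a warehouse table and a data lake file can be described with the same structure and linked to the same business glossary.

```python
# Illustrative canonical metamodel: one consistent shape for describing any data
# asset, whatever its physical form, linked to higher order constructs
# (business glossary terms). All names here are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class GlossaryTerm:
    name: str             # higher order construct, e.g. "Personal Data"
    definition: str


@dataclass
class AttributeDescriptor:
    physical_name: str    # column, JSON key, message field, ...
    data_type: str
    terms: List[GlossaryTerm] = field(default_factory=list)


@dataclass
class AssetDescriptor:
    name: str             # table, file, topic, API payload, ...
    platform: str         # "RDBMS", "data lake", "stream", ...
    attributes: List[AttributeDescriptor] = field(default_factory=list)


personal_data = GlossaryTerm(
    "Personal Data",
    "Information relating to an identified or identifiable person",
)

# The same metamodel describes a warehouse table and a lake file.
customer_table = AssetDescriptor(
    name="edw.dim_customer",
    platform="RDBMS",
    attributes=[AttributeDescriptor("email", "varchar(255)", [personal_data])],
)
clickstream_file = AssetDescriptor(
    name="s3://lake/raw/clickstream/2018/07/",
    platform="data lake",
    attributes=[AttributeDescriptor("user_email", "string", [personal_data])],
)
```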

Managing the Evolution. The architectures discussed in the paper all require an evolution from the transactional data stores that exist today towards platforms that can respond to business needs rapidly, and with little or no latency. The “Type 5” platform in Figure 1 is the “Data Lake” that has become such a buzzword. In this configuration, there is a single data structure for both transactions and analytics. The ETL functions, number of indexes, and flexibility that can be applied to render the data all place a larger burden on the governance disciplines. Additionally, the process by which the organization integrates the business and IT activities requires formalizing in a way that breaks down the traditional silos.

Hampering the evolution at some level is the fact that the tool suites are not entirely intuitive. Tools to handle the mapping of the higher order constructs (concept systems, ontologies, taxonomies, reference data…) and the management of multiple dictionaries cannot easily be implemented without complex configuration and often coding. The tool vendors seem to be coming along, but many are still working to apply governance and curation within the context of table-based systems. The reality is that creating fully described data that is linked to higher order constructs, and managing those relationships, requires a collection of tools that must be configured for your environment. It is not yet easy.

The Way Forward. Previously I have made the comment that the Information Architect, Enterprise Data Management Office, or CDO must initially focus on creating a tangible value proposition for the business side of the house. As long as data management is perceived as a function of standards, governance and “protocol,” it will be seen as slowing down the business and getting in the way of achieving business goals. This article details a scoped-down set of goals that lay the foundation for that initial value proposition. Once the enterprise data management function is able to make the case that it actually improves business operations and impacts key success metrics (e.g., revenue), what next?

This is where all the articles regarding CDOs seem to agree. The next step is all about outreach and engagement with the broader business community – potentially both internal and external to the organization. My recommendation is to perform this activity using a framework that keeps the discussions focused on goals and practices, and that results in actionable, measurable and prioritized recommendations. The CMMI Data Management Maturity Model (DMM) is one such framework. I am admittedly biased, as I helped create it, but for an independent opinion Bob Lambert at CapTech wrote a review that speaks volumes. The framework is used to engage in a series of workshops. These workshops serve to identify a maturity level but, more importantly, identify the business priorities and concerns as detailed by the workshop participants. This is critical, as the resulting recommendations inherently have buy-in from across the organization.

Because the DMM evaluates capabilities at the “practice” level (i.e., what people actually do), it inherently details the next steps in terms of recommendations; in other words – do not try to create a semantically equivalent data model across the whole organization if you cannot even do it for a business unit or a project! Additionally, the model recognizes the relationships between functions. The end result is a holistic and integrated set of guidance for the overall data management strategy and implementation roadmap.

Organizations seeking to upgrade their data platforms to more closely resemble the “Analytic Transactional data platform” that enables the real-time enterprise as discussed in the IDC white paper will have greater success more quickly if they evolve their data management practices at the same time.
