Archive | Data Management

I Never Metadata I Did Not Like

This presentation is from December 2019, before the pandemic, and it is the last live show I presented at; I look forward to getting back into that groove. The following write-up is by Amber Dennis at DATAVERSITY. It was also posted on LLRX.com.

Managing Metadata: An Examination of Successful Approaches – DATAVERSITY


By Amber Lee Dennis, 30 Nov 2020

“If Google can deliver results across the entire internet in seconds, why do I have so much trouble finding things in my organization?” asked Jonathan Adams, Research Director at Infogix, at the DATAVERSITY® DGVision Conference in December 2019. In a presentation titled “I Never Metadata I Did Not Like,” Adams outlined successful approaches to understanding and managing metadata.

What is Metadata?

According to the DAMA International Data Management Body of Knowledge (DAMA-DMBoK2), the common definition for metadata, ‘data about data,’ is too simple. Similar to the concept of the card catalog in a library, metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself, the concepts the data represents, and the relationships between the data and concepts. To understand metadata’s purpose, imagine a large library, with hundreds of thousands of books and magazines, but no card catalog. Without the card catalog, finding a specific book in the library would be difficult, if not impossible. An organization without metadata is like a library without a card catalog.

“Obviously it’s data about data, in that sense. We all know that, but also, one person’s data is another person’s metadata. So it gets kind of confusing,” Adams said. Metadata management has traditionally focused on technical metadata, which details the structure of data and where it resides, supports IT in managing data, and assists user communities in accessing and integrating data. Reference data, which provides a known vocabulary and creates business and operational context along with semantic meaning, is also metadata. Adams said:

“Metadata is kind of everything. It’s how you visualize it, and it’s how you find it. It totally enables data, and in many respects, it’s going to be the bulk of the data you have.”

Types of Metadata

Descriptive metadata is metadata about the asset, including its title, creator, subject, source, keywords, etc.
Content classification metadata details the content and meaning of the data asset. This includes relationships, data models, entities, the business glossary, controlled vocabularies, taxonomies and ontologies.
Administrative metadata details how to access and use data assets and includes lineage, structure, audit and control, and preservation information.
Usage metadata indicates how data may be used and how it must be controlled, which includes users, rights, confidentiality and sensitivity.
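One way to make the four types concrete is to model a catalog entry that carries each type as its own section. The sketch below is illustrative only: the class and field names (`CatalogEntry`, `allowed_roles`, `taxonomy_path`, and so on) are hypothetical and are not taken from DMBoK2 or from any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveMetadata:
    # Metadata about the asset itself
    title: str
    creator: str
    subject: str
    source: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class ContentClassificationMetadata:
    # Content and meaning: glossary terms, taxonomy placement, related entities
    glossary_terms: list[str] = field(default_factory=list)
    taxonomy_path: list[str] = field(default_factory=list)
    related_entities: list[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    # How to access and use the asset: lineage, structure, preservation
    lineage: list[str] = field(default_factory=list)  # upstream systems, oldest first
    structure: str = ""
    retention: str = ""


@dataclass
class UsageMetadata:
    # How the data may be used and how it must be controlled
    allowed_roles: set[str] = field(default_factory=set)
    confidentiality: str = "internal"
    contains_personal_data: bool = False


@dataclass
class CatalogEntry:
    descriptive: DescriptiveMetadata
    classification: ContentClassificationMetadata
    administrative: AdministrativeMetadata
    usage: UsageMetadata

    def can_access(self, role: str) -> bool:
        """Usage metadata answers the question 'who may use this?'"""
        return role in self.usage.allowed_roles


# Hypothetical example entry
entry = CatalogEntry(
    DescriptiveMetadata("Vendor Master", "Procurement", "Vendors", "ERP", ["vendor", "supplier"]),
    ContentClassificationMetadata(["Vendor"], ["Supply Chain", "Source"]),
    AdministrativeMetadata(["ERP", "Data Warehouse"], "table", "7 years"),
    UsageMetadata({"procurement_analyst"}, "confidential"),
)
print(entry.can_access("procurement_analyst"))  # True
print(entry.can_access("marketing_analyst"))    # False
```

Keeping the four types as separate sections mirrors the point that each type is applied differently depending on where you are: a search portal mostly reads the descriptive section, while access control reads the usage section.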

“And if that isn’t complicated enough,” he said, “those four types of metadata get applied slightly differently depending on where you are.”

Metadata for Operational Systems

Adams provided an illustration of an operational system using a pyramid, with reports on the top level, transactional data on the second level, functional data on level three, master data on level four, and structural and reference data as the base of the pyramid. Types of data not included in this structure might be a data lake used by marketing, external data, financial information, or CRM data. Adams explained:

“This gets complicated, so we’re going to talk about simplifying it. My point here is that you should drive it from the user perspective, with that use case, view it within this context, and scope it appropriately.”

Why Is Metadata Important?

Metadata answers critical questions about data:

Is the data discoverable?
Is it understandable?
Can it be accessed?
Is it usable?
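The four questions can double as a simple completeness check on a single catalog entry. This is a sketch under assumed rules: the field names and pass criteria (for example, that an entry counts as "accessible" only when it names an owner and gives access instructions) are hypothetical, and a real program would set these criteria through governance.

```python
def assess_entry(entry: dict) -> dict[str, bool]:
    """Answer the four metadata questions for one catalog entry (hypothetical criteria)."""
    return {
        # Can someone find it? Needs a title and search keywords.
        "discoverable": bool(entry.get("title")) and bool(entry.get("keywords")),
        # Can someone understand it? Needs a description or a glossary link.
        "understandable": bool(entry.get("description")) or bool(entry.get("glossary_terms")),
        # Can it be accessed? Needs an owner and instructions for requesting access.
        "accessible": bool(entry.get("owner")) and bool(entry.get("access_instructions")),
        # Is it usable? Needs a known format and stated permitted uses.
        "usable": bool(entry.get("format")) and bool(entry.get("permitted_uses")),
    }


entry = {
    "title": "Customer Orders",
    "keywords": ["orders", "sales"],
    "description": "One row per order line from the ERP system",
    "owner": "Sales Operations",
    "format": "parquet",
}
result = assess_entry(entry)
gaps = [question for question, ok in result.items() if not ok]
print(gaps)  # ['accessible', 'usable']
```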

Success in Metadata Management is reflected in how well a team engages with and aligns information to the business and operational context of the organization, Adams said. The DMBoK2 says that, like other data, metadata requires management. As organizations’ capacity to collect and store data increases, metadata management grows in importance. To be data-driven, an organization must be metadata-driven.

Success with Metadata Management

To manage metadata, start with a framework that aligns data to business and operational contexts so that metadata can support Data Governance in the following areas:

Organizational Impact
Capabilities and Interfaces
Programs and Platforms
Repositories

Adams then further broke down how to address the governance of each of these four areas.

Organizational Impact

Metadata turns critical ‘data’ into critical ‘information.’ Critical information is data plus metadata that feeds Key Performance Indicators (KPIs). Adams recommends asking: “What will change with a better understanding of your data?” Getting people on board involves understanding how metadata can solve problems for end users while meeting company objectives. “We want to be in a position to say, ‘I do this and your life gets better.’” To have a greater impact, he said, avoid ‘data speak’ and engage with language that the business understands. For example, the business won’t ask for a ‘glossary.’ Instead, they will ask for ‘a single view of the customer, integrated and aligned across business units.’ An added benefit of using accessible language is being perceived as helpful, rather than as adding to the workload.

Capabilities and Interfaces

All users must be given the capability to discover information and apply it to challenges, to share critical information, and to access automated processes when available.

Discover and Understand: A catalog search portal allows users to discover what data is available, place that data in context, and understand who can access it, and how to do so.
Communicate and Share: Users need the ability to communicate what they’ve produced and make it available for broader consumption. Complete descriptions of data are necessary for compliance and consistency, but must be available in language geared toward the user. The term ‘ETL processing’ may be adequate for an IT user, but terminology such as ‘GDPR compliance’ should also be available so business users have access to the same information.
Acquire and Integrate: Acquisition and integration vary depending on the perspective of the user and the use case. Administrative metadata enables data consumers to access and integrate data into their environment by clarifying data type, format and access rights. Configuration metadata is important for IT to perform data prep or ETL. Application Programming Interface (API) metadata shows a programmer how to integrate data into a website.
Integrate and Automate: Interactive metadata supports automated processes for communication and coordination among systems.
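To illustrate describing the same asset in both IT and business language, here is a minimal keyword search over a catalog in which each asset carries technical terms and business terms, so a search for ‘GDPR compliance’ finds the same asset an IT user would describe as ‘ETL processing.’ The `Asset` class, the sample assets, and the whole-word matching rule are assumptions for illustration; production catalogs typically add ranking, synonyms, and access checks.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    technical_terms: set[str] = field(default_factory=set)  # how IT describes the asset
    business_terms: set[str] = field(default_factory=set)   # how the business describes it
    access_contact: str = ""                                 # who to ask for access


def search(catalog: list[Asset], query: str) -> list[str]:
    """Return names of assets whose technical OR business vocabulary contains every query word."""
    words = query.lower().split()
    hits = []
    for asset in catalog:
        # Break every term into lowercase words so matching is whole-word, not substring
        vocabulary = {
            token
            for term in asset.technical_terms | asset.business_terms
            for token in term.lower().split()
        }
        if all(word in vocabulary for word in words):
            hits.append(asset.name)
    return hits


catalog = [
    Asset("customer_consent_log", {"ETL processing"}, {"GDPR compliance", "customer consent"}, "privacy-office"),
    Asset("invoice_lines", {"ETL processing"}, {"revenue", "billing"}, "finance-data"),
]
print(search(catalog, "GDPR compliance"))  # ['customer_consent_log']
print(search(catalog, "ETL processing"))   # ['customer_consent_log', 'invoice_lines']
```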

Programs and Platforms

Metadata supports reporting and visualization, allowing C-suite members to make better decisions. Metadata also enables the transformation of operations, allowing the business to grow. Labeling is critical so that data can move around the organization and be used in innovative ways. Once data is understandable, he said, “You’re going to have people using that data to derive insights that they didn’t even know they didn’t know.”

Data Repositories

Existing information architecture enables – or disables – the depth, scope and quality of available metadata. Adams said that the discussion about repositories is driven more by enterprise architecture than by user needs and business priorities. “It defines what you can do going forward, and it also defines what you cannot do today.”

Reference Architecture

When documenting the Information Architecture, Adams suggests focusing on how information flows around the architecture of the organization, rather than on specific systems. Start with the type of information and where it resides, and denote broad applications and system boundaries. Include data shared with people outside the organization. Although it’s critical to understand what’s happening inside the organization, from a risk perspective it’s more important to understand what’s happening outside it. “The interesting thing about this is that you want to use it as a communication tool,” he said. If initially it’s too complex for business users to understand, simplify it a bit. The important thing is to bring people on board.

Data Governance

Often overlooked, governance metadata is Business Intelligence (BI) for your data: metadata about metadata. Metadata ties together Business Strategy, Data Strategy, Data Management and operations with Data Governance. “‘What is the state of my metadata across my ecosystem?’ That’s a bit of a wacky concept for people to grasp.” Enterprise architectures and data reference models are an attempt to align and understand governance policies down to the lower level, Adams said.

Metadata can provide answers to governance questions, such as:

How do I know I’m doing this correctly?
What constitutes ‘good’?
Are we deploying best practices? Are they defined?
Is this data sufficiently labeled to be considered ‘governed data’?
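Governance metadata, the “BI for your data” described above, can start as simply as measuring how much of the ecosystem is labeled well enough to count as governed. The sketch below assumes a hypothetical policy in which an asset is governed only if it carries an owner, a domain, a sensitivity label, and a definition, and it reports the governed share per domain.

```python
# Hypothetical policy: an asset counts as "governed" only if all of these labels are filled in
REQUIRED_LABELS = {"owner", "domain", "sensitivity", "definition"}


def label_coverage(assets: list[dict]) -> dict[str, float]:
    """Return, per domain, the share of assets that carry every required label."""
    totals: dict[str, int] = {}
    governed: dict[str, int] = {}
    for asset in assets:
        domain = asset.get("domain") or "unassigned"
        totals[domain] = totals.get(domain, 0) + 1
        filled = {key for key, value in asset.items() if value}  # ignore empty labels
        if REQUIRED_LABELS <= filled:
            governed[domain] = governed.get(domain, 0) + 1
    return {domain: governed.get(domain, 0) / count for domain, count in totals.items()}


assets = [
    {"name": "vendor_master", "owner": "Procurement", "domain": "Supply Chain",
     "sensitivity": "internal", "definition": "Approved vendors"},
    {"name": "po_lines", "owner": "Procurement", "domain": "Supply Chain",
     "sensitivity": "", "definition": "Purchase order lines"},
    {"name": "gl_balances", "owner": "Finance", "domain": "Finance",
     "sensitivity": "confidential", "definition": "General ledger balances"},
]
print(label_coverage(assets))  # {'Supply Chain': 0.5, 'Finance': 1.0}
```

Trending this number over time is one way to answer “What is the state of my metadata across my ecosystem?” with something more concrete than a gut feel.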

Building Capability

As competitive factors in the marketplace continue to evolve, the ability to quickly rise to meet those challenges can mean the difference between success and failure. Developing new capabilities, scaling to meet demand, and controlling risk require the ability to pull reports using data in ways that are impossible to anticipate, Adams said. “If that’s the environment you want, then you want well-labeled data that allows you to pivot, schema-on-demand kind of activity, and a very flexible perspective.”


Tags: dataversity, DataVision 2019.

Categories Classification, Data Management, Metadata
Author analyticaltern

Data Prep – More than a Buzzword?

25 Feb

“Data Prep” has become a popular phrase over the last year or so. Why? At a practical level, data preparation tools provide the same functionality that traditional ETL (extract, transform, load) tools provide. Are data prep tools just a marketing gimmick to get organizations to buy more ETL software? This post seeks to address why data prep capabilities have become a topic of conversation within the data and analytics communities.

Traditionally, data prep has been viewed as slow and laborious, often associated with linear, rigid methodologies. Recently, however, data prep has become synonymous with data agility. It is a set of capabilities that pushes the boundaries of who has access to data, and how they can apply it to business challenges. Looked at this way, data prep is a foundational capability for digital transformation, which I define as the ability of companies to evolve in an agile fashion in some key dimension of their business model. The business driver of most transformation programs is to fundamentally change key business performance metrics, such as revenue, margins, or market share. Viewed in this way, data prep tools are a critical addition to the toolbox when it comes to driving key business metrics.

Consider the way that data usage has evolved, and the role that data prep capabilities are playing.

Analytics is maturing. Analytics is not a new idea. However, for years it was a function relegated to Operations Research (OR) folks and statisticians. This is no longer the case. As BI and reporting tools grew more powerful and increasingly enabled self-service for end users, users began asking questions that were more analytical in nature.

Data-driven decisions require data “in context.” Decision-making and the process that supports it require data to be evaluated in the context of the business or operational challenge at hand. How management perceives an issue will drive what data is collected and how it is analyzed. In the 1950s and 1960s, operations research drove analytics, and the key performance indicators were well established. These included time in process, mean time to failure, yield and throughput. All of these were well understood and largely prescriptive. Fast forward to now. Analytics is broadly applied and used well beyond the scope of operations research. New types of analysis, driven in large part by social media trends, are much less prescriptive, and their value is driven by context. Examples include key opinion leader analysis, fraud networks, perceptual mapping, and sentiment analysis.

Big data is driving the adoption of machine learning. Machine learning requires the integration of domain expertise with the data in order to expose “features” within the data that enhance the effectiveness of machine learning algorithms. The activity that identifies and organizes these features is called “feature engineering.” Many data scientists would not equate “data preparation” with feature engineering, yet feature engineering closely resembles what an analyst does. A business analyst invariably creates features as they prepare their data for analysis: 1) observations are placed on a timeline; 2) revenue is totaled by quarter and year; 3) customers are organized by location, by cumulative spend, and so on. Data prep in this context is the organization of data around domain expertise, and is a critical input to harnessing big data through automation.
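The three analyst steps listed above can be sketched in a few lines of standard-library Python, which shows how everyday data prep already produces features. The transactions are invented sample data, and the calendar-quarter convention is an assumption (fiscal quarters would need a different mapping).

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions: (customer, location, order date, revenue)
transactions = [
    ("Birch", "Denver", date(2020, 1, 9), 300.0),
    ("Acme", "Chicago", date(2019, 2, 14), 1200.0),
    ("Birch", "Denver", date(2019, 5, 20), 450.0),
    ("Acme", "Chicago", date(2019, 5, 2), 800.0),
]

# 1) Place observations on a timeline
timeline = sorted(transactions, key=lambda t: t[2])

# 2) Total revenue by (year, calendar quarter)
revenue_by_quarter = defaultdict(float)
for _, _, day, amount in timeline:
    quarter = (day.month - 1) // 3 + 1
    revenue_by_quarter[(day.year, quarter)] += amount

# 3) Organize customers by location and cumulative spend
spend = defaultdict(float)
location = {}
for customer, city, _, amount in timeline:
    spend[customer] += amount
    location[customer] = city  # latest known location

customer_features = {c: {"location": location[c], "cumulative_spend": spend[c]} for c in spend}

print(dict(revenue_by_quarter))
# {(2019, 1): 1200.0, (2019, 2): 1250.0, (2020, 1): 300.0}
print(customer_features)
# {'Acme': {'location': 'Chicago', 'cumulative_spend': 2000.0}, 'Birch': {'location': 'Denver', 'cumulative_spend': 750.0}}
```

Each output column (quarterly revenue, location, cumulative spend) is a feature that a machine learning algorithm could consume directly.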

Data science is evolving and data engineering is now a thing. Data engineering focuses on how to apply and scale the insights from data science into an operational context. It’s one thing for a data scientist to spend time organizing data for modest initiatives or limited analysis, but for scaled up operational activities involving business analysts, marketers and operational staff, data prep must be a capability that is available to staff with a more generalized skill set. Data engineering supports building capabilities that enable users to access, prepare and apply data in their day-to-day lives.

“Data Prep” in the context of the above is enabling a broader community of data citizens to discover, access, organize and integrate data into these diverse scenarios. This broad access to data using tools that organize and visualize is a critical success factor for organizations seeking the business benefits of digitally enabling their organization. Future blogs will drill down on each of the above to explore how practitioners can evolve their data prep capabilities and apply them to business challenges.

Categories Data Prep, ETL, methodologies
Author analyticaltern

The topic of protecting personal information will grow in importance in 2019

19 Nov

For those interested in the protection of personal information, the IAPP has an interesting, albeit rather hefty, IAPP-EY Annual Privacy Governance Report 2018, and the NTIA has released the comments it received from industry on pending privacy regulation. I noted that the IAPP report indicates most solutions are still mostly or entirely manual. I am not sure how this does not become a management nightmare as organizations evolve their data maturity to align operations and marketing more closely. Data management as a process discipline and some degree of automation are going to be critical capabilities to ensure personal information is protected. There are simply too many opportunities for error when this is done manually.

I recently published an article in TDAN on automating data management and governance through machine learning. It is not just about ML; other capabilities will be required. However, as long as organizations rely only on manual processes, they open themselves up to risk and place the burden on management to enforce policies that are often resisted because they are perceived as an obstacle to actually doing business. Data management as a process discipline, in conjunction with automated processes, will reduce operational overhead and risk.

Tags: Data Protection, GDPR, Machine Learning, ML, Personal Information, TDAN

Categories Compliance, Personal Data Protection, Privacy
Author analyticaltern

Automating Data Management and Governance through Machine Learning

21 Aug

I fairly consistently get questions on the value of Machine Learning in data management and governance. Sometimes the question is framed at a high level in a very “buzz wordy” way. The person asking the question may not know what machine learning (ML) is. They have just heard the words so many times that they know it is good and should be part of the discussion. At other times, the person asking the question knows about ML and various other analytical techniques, but has never really thought of ML in the context of a data management tool. The challenge is that the rise of IoT data, Customer 360 programs, and emerging best practices that focus on sharing semantically tagged data all contribute to a fundamental need to do things differently. Machine learning is one of the tools in the toolbox to address the challenges related to scale, change velocity, and the continual evolution of users and their use cases.

This post focuses on how we can automate the process of identifying data, classifying it, and linking it to internal and external references to provide semantic meaning. The goal of this post is simply to describe what machine learning is for the data manager, and what tasks it performs in the context of the standards-based operational perspective.

From an operational perspective, the figure below presents the evolution of data from the “raw” transactional state to a highly labelled or curated state that can be shared between purchaser and vendor; or indeed any producer or consumer of data. Machine Learning plays a role in automating how data is curated and enriched across this lifecycle.

Figure 1: The Curation from raw data to sharable Information

If we drill down on the curation lifecycle, we can identify the various repositories that would be required, and a few of the key supporting standards. These standards and their roles are discussed more completely in a follow-on post.

Figure 2: The Curation flow across the various repositories required. NOTE that this functional perspective is discussed in the context of a metadata hub in Enterprise Data Management – Where to Start?

The database symbols outlined in blue (solid lines) represent data at rest. The rectangular items outlined in green (dashed lines) represent tasks that automate how data is augmented as it moves along this path. The focus of this discussion is on these green boxes.

Activities within the Data Quality Rules and MDM Rules tasks can be broken down into a number of functional capabilities as detailed below. Some of these capabilities are traditional data operations tasks; namely, persisting metadata in a database, and exposing the data through some sort of cataloging and publishing capability. The other items (outlined in blue) are those where machine learning approaches can be applied.

Figure 3: Functional capabilities supported by Machine Learning

First, let’s start with a definition, since Machine Learning has multiple definitions within the popular literature. The website Techemergence provides a comprehensive definition:

“Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.”

Machine learning techniques play a major role in automating the process detailed above, especially over unknown or new data sets.

For data management practitioners, it is important to understand that no single machine learning technique is going to apply. In all likelihood, multiple approaches will be chained together and invariably executed recursively to ensure that the data can be identified, classified and then linked to the appropriate unique identifier. In an ideal world, the algorithms will change or learn to accommodate changes in the data being classified. The figure below lists some of the machine learning techniques that may be applied.

Machine Learning Techniques

Unstructured Data
· Entity Tagging / Extraction
· Categorize
· Cluster
· Summarize
· Tag
· Linking

Structured Data
· Associate
· Characterize
· Classify
· Predict
· Cluster
· Pattern Discovery
· Exception Analysis

Note that these invariably interact with one another. If I tag people entities within unstructured text, I may wish to characterize them using structured techniques: count of male names, frequency per document, frequency across documents, etc. This speaks to the layered and recursive nature of machine learning, and to the richness of the metadata that the data team will need to manage. For a more technical view of ML techniques, see this summary.
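A minimal sketch of that two-stage pattern: an unstructured step that tags person mentions, followed by a structured step that characterizes them by frequency per document and across documents. To keep it self-contained, a hypothetical lookup list (`KNOWN_PEOPLE`) stands in for a trained entity tagger; it is a plain dictionary match, not machine learning, and only the chaining of the two stages is meant to carry over.

```python
import re
from collections import Counter

# Hypothetical gazetteer standing in for a trained entity tagger
KNOWN_PEOPLE = {"Alice Chen", "Raj Patel", "Maria Lopez"}


def tag_people(text: str) -> list[str]:
    """Unstructured step: return one entry per person mention found in the text."""
    mentions = []
    for name in KNOWN_PEOPLE:
        mentions += [name] * len(re.findall(re.escape(name), text))
    return mentions


documents = {
    "memo_1": "Alice Chen approved the order. Raj Patel disagreed with Alice Chen.",
    "memo_2": "Maria Lopez met Alice Chen in Denver.",
}

# Structured step: frequency per document ...
per_document = {doc: Counter(tag_people(text)) for doc, text in documents.items()}
# ... and number of documents that mention each person
documents_mentioning = Counter(name for counts in per_document.values() for name in counts)

print(per_document["memo_1"]["Alice Chen"])  # 2
print(documents_mentioning["Alice Chen"])    # 2
print(documents_mentioning["Raj Patel"])     # 1
```

The counts produced by the second stage are themselves metadata about the tagged entities, which is why the metadata the data team must manage grows with each layer.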

These are detailed below with considerations for program managers.

Capability	Considerations
Identify	Machine Learning approaches support the identification of instance data in order to classify the data. Is this personal information? Does it look like a financial #? Does it reside in a financial statement? For organizations with a significant installed legacy challenge, it will be important to have algorithms that identify data of interest. The identification of personal information is a current area of interest driven by the GDPR.
Classify	Once data is identified, ML approaches support classifying the data within the data dictionary: data is in the finance domain; it is in the “Deliver” phase of the Supply Chain Operations Reference (SCOR) lifecycle; etc. Classification algorithms must exist that tag the data with the appropriate classifier. Capabilities must quantify and resolve those instances where there is uncertainty as to the accuracy of the classification algorithm. For example, are we 100% certain that this is a vendor and not a customer?
Resolve	The completed data dictionary will support entity resolution by providing a richer feature set against which MDM machine learning algorithms can be run. Resolving the identity of the master data element may require a multi-tiered approach run iteratively: apply Algorithm #1; for those that do not resolve with Algorithm #1, apply Algorithm #2; etc. For example, now that I have classified the data item as vendor master data (previous step), can I resolve its identity with certainty to determine which vendor it is?
Link	The resolved entity must be linked to internal and external reference sources. Machine Learning techniques may be used to identify and resolve link candidates and specify link type / strength. The analytical details of this may be addressed in the above “Resolve” capability. However, the focus here should be on identifying the correct link (or links) where there are multiple candidate reference sets where links could be established. This is a critical step, as the linkage to the internal reference “Concept System” is what describes the data element from a semantic perspective. It is also what links the data being described to a publicly available set of definitions that external parties can reference (see “Sharable Information” in the figure above). These linkages crosswalk an industry-accepted definition between supply chain partners. Example: suppose a supply chain manager needs to communicate a product requirement, such as a machine screw, to a vendor. The ability to specify the length of the screw versus the length of the “shoulder” on the screw, the thread size (metric, standard, imperial?), and the type of head (hex, square, pan head, etc.) is critical. The internal labels for these are linked to the industry-agreed labels available to the vendor community. As long as the vendor is using the same reference concept system, both buyer and vendor can be assured that they are talking about the same machine screw.
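The multi-tiered resolution in the Resolve row, combined with the Classify row’s need to quantify uncertainty, can be sketched as a chain of matchers where each tier runs only on the records the previous tier left unresolved. Everything here is hypothetical: the vendor master, the two tiers (an exact tax-ID match, then a fuzzy name match using Python’s `difflib.SequenceMatcher` as a simple stand-in for an ML matcher), and the 0.85 confidence threshold. A real MDM program would use richer features and tuned models.

```python
from difflib import SequenceMatcher

# Hypothetical vendor master: vendor_id -> reference attributes
VENDOR_MASTER = {
    "V001": {"name": "Acme Fasteners Inc", "tax_id": "12-3456789"},
    "V002": {"name": "Birch Industrial Supply", "tax_id": "98-7654321"},
}


def by_tax_id(record):
    """Tier 1: exact match on an identifier; fully confident when it hits."""
    for vendor_id, ref in VENDOR_MASTER.items():
        if record.get("tax_id") and record["tax_id"] == ref["tax_id"]:
            return vendor_id, 1.0
    return None, 0.0


def by_name_similarity(record):
    """Tier 2: fuzzy name match; the similarity ratio doubles as a confidence score."""
    best_id, best_score = None, 0.0
    for vendor_id, ref in VENDOR_MASTER.items():
        score = SequenceMatcher(None, record["name"].lower(), ref["name"].lower()).ratio()
        if score > best_score:
            best_id, best_score = vendor_id, score
    return best_id, best_score


def resolve(records, tiers=(by_tax_id, by_name_similarity), threshold=0.85):
    """Apply each tier only to records that earlier tiers could not resolve."""
    resolved, unresolved = {}, list(records)
    for tier in tiers:
        still_open = []
        for record in unresolved:
            vendor_id, confidence = tier(record)
            if vendor_id is not None and confidence >= threshold:
                resolved[record["name"]] = (vendor_id, round(confidence, 2), tier.__name__)
            else:
                still_open.append(record)
        unresolved = still_open
    return resolved, unresolved  # unresolved records go to a data steward


incoming = [
    {"name": "ACME Fasteners", "tax_id": "12-3456789"},
    {"name": "Birch Industrial Supply Co", "tax_id": ""},
    {"name": "Cedar Tools", "tax_id": ""},
]
resolved, review_queue = resolve(incoming)
print(resolved["ACME Fasteners"])              # ('V001', 1.0, 'by_tax_id')
print(resolved["Birch Industrial Supply Co"])  # ('V002', 0.94, 'by_name_similarity')
print([r["name"] for r in review_queue])       # ['Cedar Tools']
```

Records that fall below the threshold at every tier are returned rather than guessed, so a data steward can review them; that queue is also a natural source of examples for improving the tiers.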

Once these activities have been completed, the results need to be persisted in a metadata repository and published in a Data Catalog that will allow users to understand what data is available and how it can be accessed.

Some Closing Thoughts :: It’s all about the Ecosystem Maturity!

The above discussion, together with the two posts in the works on MDM standards and data quality, identifies a set of standards and techniques that seek to streamline and automate the process of Master Data Management. However, these exist within the context of the organization’s data ecosystem. Data practitioners seeking to evolve master data management must ask some core questions regarding information architecture and data management maturity within their ecosystem:

How do these standards support my data strategy?
- Do I have a business case?
- Executive sponsorship?
- Funding?
Does my information architecture support the capabilities that I need to manage Master Data as envisioned by the standards?
- Will legacy systems impact how this gets executed?
- Does the architecture support a “Service Oriented” metadata registry or catalog concept?
- Do I have a metadata catalog?
- What are the architectural boundaries and how do I share data across those boundaries?
Do I have the data management maturity to execute?
- Identified and scalable processes?
- Processes applied consistently across business units?
- A governance operating model that can accommodate new functions and the change management overhead?
- What controls and metrics exist? Need to be created?

Understanding how standards and machine learning fit within the information architecture and the organization’s capability maturity will enable the data team to define the right strategy and build out a realistic roadmap. For organizations with an established and mature governance function, many of the above questions will be resolved, or the mechanism to resolve them exists. However, for organizations with less capability maturity, the strategy and roadmap will need to be explicit in identifying the business units where foundational capabilities can be created that can later be adopted across the organization as the need and maturity evolve.

For an additional perspective on the business cases that might drive ML, see Classification – the key to releasing data’s value !

Tags: architecture, best practices, Machine Learning, MDM, ML

Categories analytics, Big Data, Classification, Data Management, Master Data, Metadata, Standards
Author analyticaltern

DGIQ 2018

12 Jul

The DGIQ conference this year went well. I gave two presentations and caught up with industry colleagues and customers. It helped that it was in San Diego, and the weather, relative to the hot mugginess of the Mid-Atlantic, was excellent.

My presentation on GDPR was surprisingly well attended. I say surprising in that the deadline has passed, and I find that there are still companies that are formulating their plans. However, I am beginning to feel a bit like Samuel Jackson.

In the GDPR presentation, the goal was to focus attention not only on doing the right thing to be compliant, but also on doing it right. How do we reduce the stress and overhead of dealing with regulators? We call this “Audit Resilience.” I spoke to a number of people who are taking a wait-and-see approach to GDPR compliance. Interestingly, even though they are taking this approach, they are still getting requests to remove personal information. It seems to me that even with a wait-and-see approach, you still need to be able to remove personal information from at least the website; otherwise, you risk triggering a complaint, and then … you have no defense. The goal has to be to do everything possible to avoid triggering a complaint. The presentation took about 15 minutes, and the rest of the time was spent demonstrating the data control model in the DATUM governance platform, Information Value Management.

Building a Strategy customers and Auditors Love from jadams6

I also had the pleasure of presenting with Lynn Scott, who co-chairs the Healthcare Technology & Innovation practice at Polsinelli with Bill Tanenbaum. What we wanted to do was drive home the point that collaboration is key when dealing with thorny risk and compliance issues. We tried to have some fun with this one.

A Lawyer, a Salesperson and the Operations Guy Walk into a Bar . . . from jadams6

I will be at the Data Architecture Summit in Chicago in October. The session will cover:

What are the requirements to ensure management is “audit resilient”?
What is a Control System and how is it related to a Data Control Model?
What is “regulatory alignment” from a data perspective?
How do I build a Data Control Model?
What role do advanced techniques (AI, Machine Learning) play in audit resilience?

Hope to see you all there.

Tags: audit resilience, Conferences, DGIQ, GDPR

Categories Compliance, Personal Data Protection, Privacy
Author analyticaltern

What is Machine Learning?

11 Jul

I enjoyed this post in Techemergence. I was originally looking for a definition of machine learning that helped reconcile all of the different definitions that are out there. I like the approach that they took. I modified a table they had (presented below) slightly to capture some of the thoughts on machine learning methods.

This is too complicated for many, but captures the idea that ML is layered, and will involve many techniques. I have a simplified list in Automating Data Management and Governance through Machine Learning.

See also No, Machine Learning is not just glorified Statistics for some more discussion in plain English on Machine learning.

Classification
· K-Nearest Neighbor
· Support Vector Machines
· Naïve Bayes
· Logistic Regression
· Decision Trees
· Sets of Rules
· Propositional Rules
· Logic Rules
· Neural Networks
· Bayesian Networks
· Conditional Random Fields

Scoring
· Accuracy / Error Rate
· Precision & Recall
· Squared Error
· Likelihood
· Posterior Probability
· Information Gain
· K-L Divergence
· Cost / Utility
· Margin

Recommendation / Prediction
Combinatorial Optimization
· Greedy Search
· Beam Search
· Branch & Bound
Continuous Optimization
· Gradient Descent
· Conjugate Gradient
· Quasi-Newton Method
· Linear Programming
· Non-Linear (Quadratic) Programming

Credit: Dr. Pedro Domingos, University of Washington (slightly simplified)

Tags: Machine Learning, ML

Categories analytics, Metadata
Author analyticaltern

Classification – the key to releasing data’s value !

6 Jun

Someone asked me the other day what the business case was for classifying data. For anyone who has engaged with data to perform analytics or produce business intelligence reports, this may seem like a silly question. However, in many minds, the data does not need to be labelled or classified in any way. The data is used by an application, and if that application is performing correctly, the data must be good. And, at some level, they are right: as long as the data involved never has to be used outside its application, it may never need to be classified or labelled in any way. The data receives all of its semantic context from the application where it is used.

So when does classification become important? It becomes important when data leaves the application that gave it context. For many of our customers this occurs when data leaves the transactional ERP type system, and is moved into a data warehouse or a data lake whose purpose is to provide access to data from multiple sources. Traditionally, this movement from transactional to a more generally accessible repository came with a level of curation. Prior to the concept of the “Data Lake,” data was moved into the data warehouse with the goal of making it the “single source” of truth. This often involved significant levels of data stewardship and curation to reconcile conflicting versions of “truth.” With the growing awareness and adoption of analytics, the idea of a stable concept of “truth” is elusive. The right data for an analyst is context driven and at times highly variable. The Data Lake construct addresses this issue by allowing all data to be loaded so that the user can determine what data to use based on the decision context at the time. This is what data classification enables. Well classified data can be discovered, analyzed, accessed and integrated into a user’s context based on the classification labels that have been exposed to the user in the Data Asset catalog. Based on this perspective, classification is foundational for driving value out of data in the areas of analytics, business intelligence, operational efficiencies, and compliance.

Indeed, in the big data space, classification is foundational for analytics, machine learning, the application of higher-level logic, and (way up the maturity curve) for building artificial intelligence capabilities. As a foundational building block for AI, classification is an interesting topic, although for many it is too far removed from today’s problems. However, as the foundation for making data discoverable, understandable, accessible and able to be integrated into downstream applications, it is highly relevant to today’s challenges, almost regardless of where your current capabilities stand. For this reason, any data management shop should include in its planning a workstream that seeks to evolve classification capabilities.

Consider the following use cases:

Business Intelligence: Marketers seeking to report on price sensitivity compare the differences between prices quoted, prices invoiced, and prices paid net of discount. Data across all of the ERP or transactional systems in use must be classified such that the BI Team is assured that every field marked as “Price” holds the correct type of price.
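A minimal sketch of what “the correct type of price” can mean in practice: each price field across the transactional systems is classified against a small controlled vocabulary, and any field known only as “a price” is flagged before the comparison is run. The `PriceType` vocabulary, the system and field names, and both helper functions are hypothetical examples.

```python
from enum import Enum
from typing import Optional


class PriceType(Enum):
    """Hypothetical controlled vocabulary for the kinds of price in this use case."""
    QUOTED = "quoted"
    INVOICED = "invoiced"
    NET_PAID = "net_paid"  # price paid net of discount


# Hypothetical classification of price fields across transactional systems.
# None means the field is known to be "a price" but its type was never classified.
price_fields: dict[tuple[str, str], Optional[PriceType]] = {
    ("erp_emea", "quote_price"): PriceType.QUOTED,
    ("erp_emea", "inv_price"): PriceType.INVOICED,
    ("erp_us", "price"): None,
    ("billing", "paid_net"): PriceType.NET_PAID,
}


def unclassified(fields):
    """List the (system, field) pairs the BI team cannot yet trust for the comparison."""
    return [key for key, kind in fields.items() if kind is None]


def fields_of_type(fields, kind: PriceType):
    """All fields, across every system, that hold one specific kind of price."""
    return [key for key, k in fields.items() if k is kind]


print(unclassified(price_fields))                         # [('erp_us', 'price')]
print(fields_of_type(price_fields, PriceType.INVOICED))   # [('erp_emea', 'inv_price')]
```

The design choice here is to treat an unclassified price as a visible gap rather than silently grouping it with the others, which is exactly the ambiguity the BI Team needs removed.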

Marketing Analytics: Your customer 360ᵒ program seeks to understand external factors that may have influenced pricing and discounts provided. Which customers are related to the prices referenced above? What kind of customers are they (industry, buying frequency, average purchase, …)? How can I correlate those with external events (elections, new regulation, natural disasters, …)? All of this analysis is supported by data that is classified to reflect the types of queries that may occur and the analytical operations to be performed.

Operational Efficiency: Your COO wants to ensure that the acquisition process is fully optimized, and seeks to benchmark operations using the SCOR (Supply Chain Operations Reference) Model. The Operations Team downloads the 250 SCOR performance metrics and seeks to map those to the relevant data. Classification supports the ability to find the right data and map it to the data specified in the SCOR Model.

Compliance & Risk Management: Risk teams rely on well-classified data to enable risk models that are robust and flexible in their ability to address evolving risk. This is especially the case for risk associated with adaptive threats, for example fraud and cyber-crime.

Bottom line, if classification is not something that you have thought about, consider putting a plan together. It is the key to releasing the value of your data, and fully leveraging data as an asset.

Tags: Classification

Categories Best Practices, BI, Big Data, Classification, Data Management, Metadata
Author analyticaltern

Enterprise Data Worlds

22 May

I attended the Enterprise Data World conference last month in San Diego. I was speaking on GDPR, and on what you need to do if you are just starting to think about GDPR now that the deadline is so close. The meeting was well attended, which was a surprise given how close we are to the deadline. The Facebook / Cambridge Analytica fiasco has drawn attention to the protection of personal information, and to GDPR in particular. What I see is smaller companies getting drawn into the discussion and realizing how big this might be for them. The deck is below.

In general, the show continues to improve. The keynote presentation by Mike Ferguson of Intelligent Business Strategies Ltd was interesting: I am not sure that the same presentation, given a couple of years ago, would have been as well received. It would have been considered a fantasy by much of the audience. Some of his key points:

Very comprehensive at the enterprise level – remember when enterprise data management, or enterprise anything, was a bad word?!
Tagging and classification are all going to be algorithm-driven and happen in the pipeline. In his presentation, IoT was driving the volume, and he had some good volume numbers.
Pushing the virtual enterprise data lake – everything tied together in a metadata hub

The product and vendor knowledge was the biggest surprise of the show, probably because expectations were low. In general, the tools discussions were more applied. Key observations:

Much more evolved presentations – hooked to business drivers.
Integrated products are on the rise, especially around the source-to-target discussion:
- ETL, DQ, Profiling and Remediation are integrated into a single pipeline discussion
- Salespeople were more knowledgeable about how this works.
- API injection of new capabilities into this pipeline – this was something that all vendors professed to do. However, when pushed, it was clear that capabilities were at varying stages: all seemed to have APIs, and the real question was how robust those APIs are.
- Linked data / semantics was a bigger topic than normal. It is beginning to be discussed in an applied sense.
- FIBO (the Financial Industry Business Ontology) is a driver in this. More importantly, it is being integrated into tools so people can visualize how it is applied, which is pulling in the business side of the house.
- This is all metadata, especially business metadata, and it is shifting the discussion towards the business.
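The integrated pipeline described above can be sketched as a chain of pluggable stages. This is a minimal illustration under stated assumptions: the `Pipeline` class, its `register` method (standing in for the “API injection” point), and the sample profiling, remediation and data quality rules are hypothetical and do not represent any vendor’s product.

```python
from typing import Callable

Record = dict[str, object]
# A stage takes a batch of records and returns a (possibly modified) batch.
Stage = Callable[[list[Record]], list[Record]]


class Pipeline:
    """Hypothetical source-to-target pipeline whose stages can be injected at runtime."""

    def __init__(self) -> None:
        self.stages: list[tuple[str, Stage]] = []

    def register(self, name: str, stage: Stage) -> "Pipeline":
        # The "API injection" point: a new capability is appended as another stage.
        self.stages.append((name, stage))
        return self

    def run(self, records: list[Record]) -> list[Record]:
        for _name, stage in self.stages:
            records = stage(records)
        return records


def profile(records):
    # Profiling: count missing values per field (printed here; a real tool would store it).
    missing: dict[str, int] = {}
    for r in records:
        for k, v in r.items():
            if v is None:
                missing[k] = missing.get(k, 0) + 1
    print("missing values:", missing)
    return records


def remediate(records):
    # Remediation: replace a missing country with an explicit placeholder.
    return [{**r, "country": r["country"] or "UNKNOWN"} for r in records]


def quality_gate(records):
    # Data quality: reject records that have no customer id.
    return [r for r in records if r["customer_id"] is not None]


pipeline = (
    Pipeline()
    .register("profile", profile)
    .register("remediate", remediate)
    .register("dq", quality_gate)
)
out = pipeline.run([
    {"customer_id": 1, "country": None},
    {"customer_id": None, "country": "US"},
])
print(out)
```

Because each stage shares one signature, a new capability can be added without touching the others; how robust that injection point is in a real product is exactly the question worth pushing vendors on.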

Tags: dataversity, EDW, Enterprise Data Worlds, GDPR

Categories Compliance, Data Management, Industry, Personal Data Protection
Author analyticaltern

Audit Resilience and the GDPR

15 May

Compliance activities for organizations are often driven from the legal or risk groups. The initial focus is on management’s position and actions required to be compliant; generally this starts with the creation of policies. This makes sense as policies are a reflection of management’s intent and provide guidance on how to put strategic thinking into action. The legal teams provide legal interpretation and direction with respect to risk. This is also incorporated into the policies. So, what happens next as your organization addresses challenges around ensuring effective implementation and subsequent operational oversight of policies required for General Data Protection Regulation (GDPR) compliance?

THE CHALLENGES

The challenges associated with GDPR as well as other compliance activities are centered on achieving “Audit Resilience.” We define this as the ability to address the needs of the Auditor – internal or external – in such a way that compliance is operationally enabled and can be validated easily and with minimal disruptions and cost. The goal is to reduce the stress, the chaos and the costs that often accompany these events to a manageable level.

WHAT DOES AUDIT RESILIENCE MEAN?

Audit Resilience means that the auditor can:

Easily discern the clear line of sight between Policies => Standards => Controls => Actors => Data.
Review and explicitly align governance artifacts (policies, standards and processes) to compliance requirements.
Access and validate the “controls” that ensure standards are applied effectively.
Find evidence of execution of the governance practices within the data.
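The “line of sight” an auditor looks for can be modelled as links between governance artifacts, with each data element traced back up to the policy that governs it. The following is a minimal sketch assuming a simple adjacency map; the artifact identifiers and the `trace_to_policy` helper are hypothetical.

```python
# Hypothetical governance links, each pointing "down" the chain
# Policy => Standard => Control => Actor => Data.
LINKS: dict[str, list[str]] = {
    "policy:privacy": ["standard:encrypt-pii"],
    "standard:encrypt-pii": ["control:db-encryption-check"],
    "control:db-encryption-check": ["actor:dba-team"],
    "actor:dba-team": ["data:crm.customer.email", "data:crm.customer.phone"],
}


def trace_to_policy(target: str, links: dict[str, list[str]]) -> list[list[str]]:
    """Return every path from a policy down to the target data element."""
    paths: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for child in links.get(node, []):
            if child not in path:  # guard against accidental cycles
                walk(child, path + [child])

    for root in links:
        if root.startswith("policy:"):
            walk(root, [root])
    return paths


for path in trace_to_policy("data:crm.customer.email", LINKS):
    print(" => ".join(path))
```

A data element for which `trace_to_policy` returns an empty list is a gap in the line of sight, which is what an auditor would want surfaced before the audit rather than during it.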

CRITICAL SUCCESS FACTORS

GDPR compliance is a function of creating logical linkage and consistency across multiple functions and actors – down to the data level. Details will vary based on the organization and the assessment of risk.

Overall, the following are critical to successfully demonstrating compliance:

Produce a catalog of all impacted data
Know where data is being used, and by whom
Show governance lineage from Policy => Process => Standard => Control => Data
Report on effectiveness of “Controls”
Produce specific data related to particular requirements such as: Security Events, Notification, Privacy Impact Assessments, and so forth.
Show the relationship of governance tasks to both data and the business processes that use Personal Information.
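Reporting on the effectiveness of “Controls” can start as simply as a pass rate per control over its recorded executions. The sketch below assumes that control test results are captured as (control, passed) pairs; the control names, the sample results and the 95% tolerance are hypothetical and would in practice come from the organization’s own risk assessment.

```python
from collections import defaultdict

# Hypothetical control test results: (control id, passed?)
results = [
    ("ctl-consent-recorded", True),
    ("ctl-consent-recorded", True),
    ("ctl-consent-recorded", False),
    ("ctl-breach-notification", True),
    ("ctl-breach-notification", True),
]


def effectiveness(results):
    """Pass rate per control: passed executions divided by total executions."""
    totals = defaultdict(lambda: [0, 0])  # control -> [passed, total]
    for control, passed in results:
        totals[control][0] += int(passed)
        totals[control][1] += 1
    return {c: p / t for c, (p, t) in totals.items()}


THRESHOLD = 0.95  # hypothetical tolerance set by the risk team
for control, rate in effectiveness(results).items():
    status = "OK" if rate >= THRESHOLD else "REVIEW"
    print(f"{control}: {rate:.0%} {status}")
```

Here the consent control passes two of three checks (67%) and is flagged for review, while the notification control passes every check.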

Tags: Audit, audit reslience, GDPR

Categories Best Practices, Compliance, Privacy
Author analyticaltern
