Originally Published on the DATUM, LLC Site: Building Solid Foundations in a Data Swamp
Much has been written about Big Data, Data Science, and Artificial Intelligence, and how they will change the world through the insights derived from data. This applies especially to unstructured data. A recent article in the Harvard Business Review indicated that “cross-industry studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all.”[1]
There are a few challenges however:
- How do users create understanding and ensure they have the correct data for their needs if it has no structure?
- How do you create a single logical view of data in a big data world, where things are not only highly variable but also often widely dispersed?
- How do you address analytical requirements where the notion of data quality, and how it is managed, varies significantly?
- How do you expose the data lake(s) to users in a form that is discoverable, understandable and usable?
This blog is the first in a series to explore the data management and governance perspectives related to these four challenges.
Challenge #1: Unstructured Data
The question of how to deal with unstructured data consistently comes up as a challenge for organizations. First, let’s get a few things out there:
- There is no such thing as truly unstructured data. There is always a structure of some sort.
- Knowing what you have and having the right tools are foundational capabilities.
- The degree of structure required for data to be useful is variable and context driven.
Let’s take these in order:
Creating Structure
Structure is created in one of two ways:
- Through reorganizing data so that it has structure
- Through labelling data
The former is what happens to data in a traditional data environment as it is moved through the ecosystem, from source system to Enterprise Data Warehouse, for example. The latter is what happens in a big data environment: the data is never moved; rather, labels are added to it to provide the ability to analyze that data.
Note: Data can be labelled incrementally. Newly acquired data may be labelled with only the acquisition date, the source, and the file type. As data moves through the data lifecycle, it is “curated” to add additional context.
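To make the note concrete, here is a minimal sketch of incremental labelling in Python. The field names and the `curate` helper are illustrative assumptions, not any particular product’s API:

```python
from datetime import date

# Minimal label record attached to newly acquired data (illustrative field names).
# The raw file itself is never moved; only this metadata grows over time.
label = {
    "acquired_on": date.today().isoformat(),
    "source": "crm-export",
    "file_type": "mbox",
}

def curate(label, **new_tags):
    """Add context incrementally as the data moves through its lifecycle."""
    return {**label, **new_tags}

# A later lifecycle stage enriches the same record rather than restructuring the data.
label = curate(label, business_domain="sales", contains_pii=True)
print(label)
```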
A little labelling goes a long way!
How much the data needs to be labelled to be useful can be viewed on a continuum. At one end, simply knowing that you are looking at emails provides enough information to organize them; at the other end, social media sentiment analysis requires extensive labelling. Regardless, the right tools are required to provide logical structure to the unstructured data.
When it comes to tools that cater to unstructured data, one key capability is entity tagging or entity extraction: tools that can recognize an entity and tag it with a label that makes sense to the organization, essentially the approved glossary term (a minimal sketch follows the list below). Entities can be:
- Simple, such as a named list of “products”; or
- Complex, mapping entities into semantic ontologies: a “JV” is a “Joint Venture”, which is a type of “Company”, which is an “Organization” that has “owners”.
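As a rough illustration of what entity tagging can look like, the sketch below matches surface forms against an approved glossary and walks a small ontology. The glossary, the ontology, and the exact-match approach are all illustrative assumptions; real extraction tools use trained models rather than string matching:

```python
# Illustrative glossary: surface forms mapped to approved glossary terms.
GLOSSARY = {
    "JV": "Joint Venture",
    "joint venture": "Joint Venture",
    "LLC": "Limited Liability Company",
}

# Illustrative ontology: each term's broader type, so "JV" resolves up to "Organization".
IS_A = {
    "Joint Venture": "Company",
    "Limited Liability Company": "Company",
    "Company": "Organization",
}

def tag_entities(text):
    """Return (surface form, glossary term, type chain) for each match in the text."""
    tags = []
    for surface, term in GLOSSARY.items():
        if surface.lower() in text.lower():
            chain, node = [term], term
            while node in IS_A:  # walk up the ontology
                node = IS_A[node]
                chain.append(node)
            tags.append((surface, term, chain))
    return tags

print(tag_entities("The JV was registered as an LLC in 2016."))
```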
Complementing the tagging capability is a flexible indexing capability. Tools like Elasticsearch allow users to search based on the structures discovered in the data: a search for “Company”, for example, can also return documents that mention only a “Joint Venture”, because a “Joint Venture” is a type of company. Additionally, these tools can create an index that allows discovery of similarities in text.
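A hedged sketch of how tagged entities might be indexed and searched with the official Python client follows; the index name, field names, and local cluster URL are assumptions for illustration:

```python
from elasticsearch import Elasticsearch  # pip install elasticsearch

# Illustrative connection; point this at your own cluster.
es = Elasticsearch("http://localhost:9200")

# Index a document together with the entity tags discovered in it.
es.index(index="documents", document={
    "text": "The JV was registered as an LLC in 2016.",
    "entities": ["Joint Venture", "Limited Liability Company"],
    "entity_types": ["Company", "Organization"],
})

# Search on the tagged structure: find every document mentioning any Company,
# regardless of whether the raw text said "JV", "LLC", or something else.
hits = es.search(index="documents", query={"match": {"entity_types": "Company"}})
print(hits["hits"]["total"])
```

The keyword-argument style here follows the 8.x Python client; older clients take a single `body` parameter instead.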
The key point is that once data is organized, users and applications can begin to apply big data techniques to expose insights (one such roll-up is sketched after this list):
- How do emails cluster on a timeline?
- Are organizations mentioned in the text? (Could be Joint Ventures, Partnerships, LLCs, PLCs, and so on.)
- Is there a change in frequency over time? Related to what entity types / categories?
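The last question needs very little machinery once the labels exist. Below is a minimal sketch that counts entity-type mentions per month; the input records are invented for illustration:

```python
from collections import Counter

# Illustrative tagged emails: (month sent, entity types found by the tagger).
tagged_emails = [
    ("2017-01", ["Company"]),
    ("2017-01", ["Company", "Organization"]),
    ("2017-02", ["Organization"]),
    ("2017-02", ["Company"]),
]

# Mentions of each entity type per month: a shift between months is the
# "change in frequency over time" signal the question asks about.
per_month = Counter(
    (month, etype) for month, etypes in tagged_emails for etype in etypes
)
for (month, etype), count in sorted(per_month.items()):
    print(month, etype, count)
```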
What does this mean from a data management perspective?
From a data management perspective, unstructured data will require some new capabilities. However, in some respects it really is more of the same: What data do I have and where is it? Is my data labelled to communicate understanding? Is my data easy to acquire and apply in my context?
If you think of tags or labels as descriptive metadata, and the list of tags and labels as reference metadata, then you can place this activity into the traditional data management context. In order for data to be discovered, understood and integrated across systems and use cases, organizations need to:
- Have a disciplined approach to how data is described and labelled. This starts with creating a set of glossary terms that can be linked to define meaning (see the sketch after this list). [2]
- Implement a governance framework that ensures the data is aligned, and remains aligned, to the business understanding of what the data is and how it is used.
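Here is a minimal sketch of what linked glossary terms can look like as reference metadata, using the simple link types from note [2]. The term names and link keys are illustrative assumptions, loosely modelled on SKOS:

```python
# Illustrative reference metadata: glossary terms linked with simple link types.
GLOSSARY_LINKS = {
    "JV": {"same_as": "Joint Venture"},
    "Joint Venture": {"subset_of": "Company"},
    "Company": {"subset_of": "Organization"},
}

def resolve(term):
    """Follow "same as" links to the preferred term, then collect broader terms."""
    while "same_as" in GLOSSARY_LINKS.get(term, {}):
        term = GLOSSARY_LINKS[term]["same_as"]
    broader, node = [], term
    while "subset_of" in GLOSSARY_LINKS.get(node, {}):
        node = GLOSSARY_LINKS[node]["subset_of"]
        broader.append(node)
    return term, broader

print(resolve("JV"))  # -> ('Joint Venture', ['Company', 'Organization'])
```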
Organizations often do not face this challenge until they need to manage data across operational silos, geographic regions or functional domains. The need to combine product lifecycle data with regional focus group data is an example of a cross-functional/geography/silo data mash-up that delivers high-impact insights.
Be sure to check back in as we address the next three challenges!
References
[1] Leandro DalleMule and Thomas H. Davenport, “What’s Your Data Strategy?”, Harvard Business Review, May–June 2017. https://hbr.org/2017/05/whats-your-data-strategy
[2] With reference to the linking of data, the simple link types are “subset of”, “superset of”, and “same as” (see SKOS for a deeper discussion of knowledge organization). For example, using this approach one can tag pharmaceutical products to identify synonyms as recognized by the ISO standards, and synonyms of the same product that are commercial names. This is the challenge faced by organizations implementing the IDMP standards.
[3] For a good case study of data integration across disparate data sets using SKOS metadata, see Healthcare Research Information.