best practices | analyticaltern

Tag Archives: best practices

Lots of discussion on AI Governance & Privacy this week

The Subcommittee hearing on oversight of AI is here.

I like this quote: “I recommend that since so many risks of AI systems come from within relationships where people are on the bad end of an information asymmetry, lawmakers should implement broad, non-negotiable duties of loyalty, care, and confidentiality as part of any broad attempt to hold those who build and deploy AI systems accountable.”

It seems to me that if one follows this logic, we end up with principle-based legislation that will present challenges when building control models. It will take time for best practices to emerge. Do we end up with something that looks like GDPR, but for AI?

Blumenthal & Hawley Announce Bipartisan Framework on Artificial Intelligence Legislation

Comprehensive framework would establish an independent oversight body, allow enforcers & victims to seek legal accountability for harms, promote transparency, & protect personal data

The good thing about the way this is written up is that many of the data and PII best practices already on the books are captured; transparency and the management of children’s data are the two that caught my eye.

SB-362 Data broker registration: accessible deletion mechanism. (2023-2024)

Much wailing and gnashing of teeth here. This is one of those things that sounds great in principle but will be complex in practice; maybe in this day and age that applies to all privacy data management. My biggest issue is what organizations should do until this all gets sorted out: what does “good” look like from the regulator’s perspective?

This is summed up in the following from Alex LaCasse at the IAPP: “From a purely practical perspective, in a relatively short time period, there are now many varying privacy laws that require companies to quickly and wholly change their operations and technical infrastructure, let alone their business practices that are reliant on data,” Kelley Drye & Warren Partner Alysa Hutnik, CIPP/US, said. “In the meantime, companies are devoting millions to revamp their operations to comply with these laws in good faith, knowing that realistically their interpretation of these laws may be off, and many more millions of dollars will need to be spent to course-correct based on future regulations and regulatory guidance.”

I am reminded of a comment a lawyer friend made back in 2017 when GDPR was all the rage: “If you wait until the details are sorted out in court, then you will not have wasted millions; far cheaper to pay me $60k to defend this position than to do a system upgrade and have to redo it every time legal opinions are released.” (And yes, he said $60k, which sounds too low to me!)

From a practical perspective, I keep coming back to the core privacy principles, which broadly align with GDPR and CCPA rights and obligations. We need to be able to execute on those rights at some level, get those foundations in place, and be in a position to fine-tune when the details emerge.

Core Principles:

Lawfulness, fairness and transparency
Purpose limitation
Data minimization
Accuracy
Storage limitation
Integrity and confidentiality (security)
Accountability
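To make “execute on those rights at some level” a little more concrete, here is a minimal sketch of how three of the principles above (purpose limitation, data minimization, and storage limitation) might be expressed as automated checks over stored personal records. The record layout, the purposes, the retention periods, and the allowed field sets are all hypothetical placeholders, not values taken from GDPR, CCPA, or any regulator.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule: purpose -> maximum retention period.
RETENTION = {
    "billing": timedelta(days=365 * 7),
    "marketing": timedelta(days=365 * 2),
}

# Hypothetical minimal field sets per purpose (data minimization).
ALLOWED_FIELDS = {
    "billing": {"name", "address", "account_number"},
    "marketing": {"name", "email"},
}


@dataclass
class PersonalRecord:
    record_id: str
    purpose: str          # purpose the data was collected for
    collected_on: date
    fields: set[str]      # attribute names actually stored


def check_record(rec: PersonalRecord, today: date) -> list[str]:
    """Return the principle violations found for one record (empty if none)."""
    findings = []
    if rec.purpose not in RETENTION:
        # Without a documented purpose, the other checks have no baseline.
        findings.append("purpose limitation: no documented purpose")
        return findings
    if today - rec.collected_on > RETENTION[rec.purpose]:
        findings.append("storage limitation: retention period exceeded")
    extra = rec.fields - ALLOWED_FIELDS[rec.purpose]
    if extra:
        findings.append(f"data minimization: unexpected fields {sorted(extra)}")
    return findings
```

For example, a marketing record collected in 2020 that also stores a birth date would be flagged for both storage limitation and data minimization. The point of the sketch is that the foundations can be put in place now, while the specific thresholds are fine-tuned as the details emerge.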

Tags: Ai, Artificial Intelligence, best practices, Legislation, Privacy

Architecting the Framework for Compliance & Risk Management

24 Oct

Really quick visit to the Data Architecture Summit this year. I wish I could have stayed longer, but I had to get back to a project.

My presentation was on building audit defensibility: ensuring that practices are compliant and performed in a way that is scalable, transparent, and defensible, thus creating “Audit Resilience.” Data practitioners often struggle to view the world from the auditor’s perspective. This presentation focused on how to create the foundational governance framework that supports the data control model required to produce clean audit findings. These capabilities matter in a world where due diligence and compliance with best practices are critical to addressing the impacts of security and privacy breaches.

Here is the deck. This was billed as an intermediate presentation and we had a mixed group of business folks and IT people with good questions and dialogue. I am looking forward to the next event.

Architecting the Framework for Compliance & Risk Management from jadams6

Tags: best practices, dataversity, Risk Management

Categories Best Practices, Industry, methodologies, Privacy
Author analyticaltern

Automating Data Management and Governance through Machine Learning

21 Aug

I fairly consistently get questions on the value of machine learning in data management and governance. Sometimes the question is framed at a high level in a very “buzzwordy” way: the person asking may not know what machine learning (ML) is, but has heard the words so many times that they assume it is good and should be part of the discussion. At other times, the person asking knows about ML and various other analytical techniques, but has never really thought of ML as a data management tool. The challenge is that the emergence of IoT data, Customer 360 programs, and emerging best practices focused on sharing semantically tagged data all contribute to a fundamental need to do things differently. Machine learning is one of the tools in the toolbox for addressing the challenges of scale, change velocity, and the constant evolution of users and their use cases.

This post focuses on how we can automate the process of identifying data, classifying it, and linking it to internal and external references to provide semantic meaning. The goal is simply to describe what machine learning is for the data manager, and what tasks it performs from a standards-based operational perspective.

From an operational perspective, the figure below presents the evolution of data from the “raw” transactional state to a highly labelled or curated state that can be shared between purchaser and vendor; or indeed any producer or consumer of data. Machine Learning plays a role in automating how data is curated and enriched across this lifecycle.

Figure 1: Curation from raw data to sharable information

If we drill down on the curation lifecycle, we can identify the various repositories that would be required, and a few of the key supporting standards. These standards and their roles are discussed more completely in a follow-on post.

Figure 2: The Curation flow across the various repositories required. NOTE that this functional perspective is discussed in the context of a metadata hub in Enterprise Data Management – Where to Start?

The database symbols outlined in blue (solid lines) represent data at rest. The rectangular items outlined in green (dashed lines) represent tasks that automate how data is augmented as it moves along this path. The focus of this discussion is on these green boxes.

Activities within the Data Quality Rules and MDM Rules tasks can be broken down into a number of functional capabilities as detailed below. Some of these capabilities are traditional data operations tasks; namely, persisting metadata in a database, and exposing the data through some sort of cataloging and publishing capability. The other items (outlined in blue) are those where machine learning approaches can be applied.

Figure 3: Functional capabilities supported by Machine Learning

First, let’s start with a definition, since machine learning has multiple definitions in the popular literature. The website Techemergence provides a comprehensive one:

“Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.”

Machine learning techniques play a major role in automating the process detailed above, especially over unknown or new data sets.
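As a minimal sketch of what automating identification and classification over an unfamiliar data set can look like, the code below scores a column of sample values against a set of detectors and picks the best label, flagging low-confidence results for a data steward. The detectors are simple regular expressions standing in for trained models, and the labels (`email`, `us_phone`, `iban_like`) and the 0.9 threshold are hypothetical choices for illustration only.

```python
import re

# Hypothetical detectors. In practice each would be a trained model;
# simple regular expressions stand in for them here.
DETECTORS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
    "iban_like": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}


def identify(values):
    """Step 1: score each detector by the share of non-blank values it matches."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return {}
    return {
        label: sum(bool(rx.match(v)) for v in values) / len(values)
        for label, rx in DETECTORS.items()
    }


def classify(values, threshold=0.9):
    """Step 2: pick the best-scoring label; flag low-confidence results for review."""
    scores = identify(values)
    if not scores:
        return ("unknown", 0.0, True)
    label, score = max(scores.items(), key=lambda kv: kv[1])
    needs_review = score < threshold
    return (label if score > 0 else "unknown", score, needs_review)
```

The `needs_review` flag is the sketch’s answer to “are we 100% certain?”: rather than forcing a label, uncertain columns are routed to a person, and their decisions can feed the next round of training.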

For data management practitioners, it is important to understand that no single machine learning technique will be sufficient. In all likelihood, multiple approaches will be chained together and, invariably, executed recursively to ensure that the data can be identified, classified, and then linked to the appropriate unique identifier. Ideally, the algorithms will change, or learn, to accommodate changes in the data being classified. The figure below lists some of the machine learning techniques that may be applied.

Machine Learning Techniques

Unstructured Data	Structured Data
Entity Tagging / Extraction	Associate
Categorize	Characterize
Cluster	Classify
Summarize	Predict
Tag	Cluster
Linking	Pattern Discovery
	Exception Analysis

Note that these invariably interact with one another. If I tag people entities within unstructured text, I may wish to characterize them using structured techniques: count of male names, frequency per document, frequency across documents, etc. This speaks to the layered and recursive nature of machine learning, and to the richness of the metadata that the data team will need to manage. For a more technical view of ML techniques, see this summary.
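The tag-then-characterize chain described above can be sketched in a few lines. This is a hedged illustration, not a production tagger: a tiny hypothetical name gazetteer (`PERSON_NAMES`) stands in for a trained entity-extraction model, and the structured step computes mentions per document and the number of documents mentioning each person.

```python
import re
from collections import Counter

# Hypothetical gazetteer standing in for a trained entity tagger.
PERSON_NAMES = {"alice", "bob", "carol"}


def tag_people(text):
    """Unstructured step: return the PERSON mentions found in the text."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t.capitalize() for t in tokens if t.lower() in PERSON_NAMES]


def characterize(docs):
    """Structured step: summarize the tags produced by the unstructured step."""
    per_doc = {doc_id: Counter(tag_people(text)) for doc_id, text in docs.items()}
    total_mentions = Counter()   # frequency across all documents
    doc_frequency = Counter()    # number of documents mentioning each person
    for counts in per_doc.values():
        total_mentions.update(counts)
        doc_frequency.update(counts.keys())
    return per_doc, total_mentions, doc_frequency
```

Each output here is itself metadata (counts, frequencies, the tagger that produced them) that the data team would need to persist and manage alongside the data.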

These capabilities are detailed below, with considerations for program managers.

Capability	Considerations
Identify	Machine learning approaches support identifying instance data so that it can be classified. Is this personal information? Does it look like a financial number? Does it reside in a financial statement? For organizations with a significant installed legacy challenge, it will be important to have algorithms that identify data of interest. The identification of personal information is a current area of interest, driven by the GDPR.
Classify	Once data is identified, ML approaches support classifying it within the data dictionary: the data is in the finance domain; it is in the “Deliver” phase of the Supply Chain Operations Reference (SCOR) lifecycle; etc. Classification algorithms must exist that tag the data with the appropriate classifier. Capabilities must also quantify and resolve those instances where the accuracy of the classification algorithm is uncertain. For example, are we 100% certain that this is a vendor and not a customer?
Resolve	The completed data dictionary supports entity resolution by providing a richer feature set against which MDM machine learning algorithms can be run. Resolving the identity of a master data element may require a multi-tiered approach run iteratively: apply Algorithm #1; for those records that do not resolve with Algorithm #1, apply Algorithm #2; etc. For example, now that I have classified the data item as vendor master data (previous step), can I resolve its identity with certainty and determine which vendor it is?
Link	The resolved entity must be linked to internal and external reference sources. Machine learning techniques may be used to identify and resolve link candidates and to specify link type and strength. The analytical details may be addressed in the “Resolve” capability above; here, however, the focus should be on identifying the correct link (or links) where there are multiple candidate reference sets. This is a critical step, because the linkage to the internal reference “Concept System” is what describes the data element from a semantic perspective. It is also what links the data being described to a publicly available set of definitions that external parties can reference (see “Sharable Information” in the figure above). These linkages crosswalk an industry-accepted definition between supply chain partners. For example, suppose a supply chain manager needs to communicate the nature of a product requirement, say a machine screw, to a vendor. The ability to specify the length of the screw versus the length of its “shoulder”, the thread size (metric, standard, imperial?), and the type of head (hex, square, pan head, etc.) is critical. The internal labels for these attributes are linked to the industry-agreed labels available to the vendor community. As long as the vendor uses the same reference concept system, both buyer and vendor can be assured that they are talking about the same machine screw.
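The multi-tiered approach in the “Resolve” row (apply Algorithm #1, then apply Algorithm #2 only to what did not resolve) can be sketched as a simple cascade. The vendor master records, the two tiers (an exact tax-ID match, then a fuzzy name match using the standard library’s `difflib.SequenceMatcher`), the `ref` key, and the 0.85 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical vendor master records.
VENDOR_MASTER = [
    {"vendor_id": "V001", "tax_id": "12-3456789", "name": "Acme Fasteners Inc"},
    {"vendor_id": "V002", "tax_id": "98-7654321", "name": "Globex Supply Co"},
]


def by_tax_id(record):
    """Algorithm #1: exact match on a normalized tax identifier."""
    key = record.get("tax_id", "").replace("-", "")
    for v in VENDOR_MASTER:
        if key and v["tax_id"].replace("-", "") == key:
            return v["vendor_id"]
    return None


def by_fuzzy_name(record, min_ratio=0.85):
    """Algorithm #2: fuzzy name match, accepted only above a similarity threshold."""
    name = record.get("name", "").lower()
    best_id, best_ratio = None, 0.0
    for v in VENDOR_MASTER:
        ratio = SequenceMatcher(None, name, v["name"].lower()).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = v["vendor_id"], ratio
    return best_id if best_ratio >= min_ratio else None


def resolve(records, tiers=(by_tax_id, by_fuzzy_name)):
    """Apply each tier only to the records that earlier tiers left unresolved."""
    resolved, pending = {}, list(records)
    for tier in tiers:
        still_pending = []
        for rec in pending:
            match = tier(rec)
            if match:
                resolved[rec["ref"]] = (match, tier.__name__)
            else:
                still_pending.append(rec)
        pending = still_pending
    return resolved, pending  # whatever is left goes to data steward review
```

Recording which tier produced each match gives the control model an audit trail, and the cheap, high-certainty tier runs first so the fuzzier tiers only see the harder residue.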

Once these activities have been completed, the results need to be persisted in a metadata repository and published in a Data Catalog that will allow users to understand what data is available and how it can be accessed.

Some Closing Thoughts :: It’s all about the Ecosystem Maturity!

The above discussion, together with the two posts in the works on MDM standards and data quality, identifies a set of standards and techniques that seek to streamline and automate the process of Master Data Management. However, these exist within the context of the organization’s data ecosystem. Data practitioners seeking to evolve master data management must ask some core questions about the information architecture and data management maturity of their ecosystem:

How do these standards support my data strategy?
- Do I have a business case?
- Executive sponsorship?
- Funding?
Does my information architecture support the capabilities that I need to manage Master Data as envisioned by the standards?
- Will legacy systems impact how this gets executed?
- Does the architecture support a “Service Oriented” metadata registry or catalog concept?
- Do I have a metadata catalog?
- What are the architectural boundaries and how do I share data across those boundaries?
Do I have the data management maturity to execute?
- Identified and scalable processes?
- Processes applied consistently across business units?
- A governance operating model that can accommodate new functions and the change management overhead?
- What controls and metrics exist? Need to be created?

Understanding how standards and machine learning fit within the information architecture and the organization’s capability maturity will enable the data team to define the right strategy and build out a realistic roadmap. For organizations with an established and mature governance function, many of the above questions will already be resolved, or the mechanism to resolve them will exist. However, for organizations with less capability maturity, the strategy and roadmap will need to explicitly identify the business units where foundational capabilities can be created and later adopted across the organization as the need and maturity evolve.

For an additional perspective on the business cases that might drive ML, see Classification – the key to releasing data’s value !

Tags: architecture, best practices, Machine Learning, MDM, ML

Categories analytics, Big Data, Classification, Data Management, Master Data, Metadata, Standards
Author analyticaltern

Forensic Analytics and the search for “robust” solutions

12 Jan

Happy New Year!

This entry has been sitting in my “to publish” file for some time. There is much more to be said on the topic; however, in the interest of getting it out … enjoy!

=======================================================

This entry was prompted by an INFORMS Analytics Magazine article titled Forensic Analytics: Adapting to a Growing Pandemic by Priti Ravi, a senior manager with Mu Sigma who specializes “in providing analytics-driven advisory services to some of the largest retail, pharmaceutical and technology clients spread across the United States.”

Ms. Ravi wrote a good article that left me hanging. Her conclusion was that the industry lacks access to sophisticated and intelligent monitoring equipment, and that there is a need for “robust fraud management systems” that “offer a collective set of techniques” to implement a “complex adaptive approach.” I could not agree more. But where are these systems? Perhaps even: what are these systems?

Adaptive Approaches

To the last question first: what is a complex adaptive approach? If you Google the phrase, the initial entries involve biology and ecosystems. However, Wikipedia’s definition lists medicine, business, and economics (amongst others) as areas of applicability. From an analytics perspective, I define complex adaptive challenges as those that are affected by the execution of the analytics itself: by doing the analysis, the observed behaviors change. This is inherently true of fraud, since the moment perpetrators understand (or believe) they can be detected, their behavior will change. However, it also applies to a host of other types of challenges: criminal activity, regulatory compliance enforcement, and national security, as well as consumer marketing and financial investment.

In an article titled Images & Video: Really Big Data, the authors (Fritz Venter, director of technology at AYATA, and Andrew Stein, chief adviser at the Pervasive Strategy Group) define an approach they call “prescriptive analytics” that is ideally suited to adaptive challenges. They define prescriptive analytics as follows:

“Prescriptive analytics leverages the emergence of big data and computational and scientific advances in the fields of statistics, mathematics, operations research, business rules and machine learning. Prescriptive analytics is essentially this chain of transformations whereby structured and unstructured big data is processed through intermediate representations to create a set of prescriptions (suggested future actions). These actions are essentially changes (over a future time frame) to variables that influence metrics of interest to an enterprise, government or another institution.”

My less wordy definition: adaptive approaches deliver a broad set of analytical capabilities that enable a diverse set of integrated techniques to be applied recursively.

What Does the Robust Solution Look Like?

Defining adaptive analytics this way, one can identify characteristics of the ideal “robust” solution as follows:

A solution that builds out a framework that supports the broad array of techniques required.
A solution that is able to deal with the challenges of recursive processing. This is very data and systems intensive: for every observation evaluated, the system must determine whether or not that observation changes any PRIOR observation or assertion.
A solution that engages users and subject matter experts to effectively integrate business rules. In an environment where traditional predictive analytic models have a short shelf life (see Note 1), engaging with the user community is often the mechanism for quickly capturing environmental changes. For example, in the banking world, tracking call center activity will often identify changes in fraud behavior faster than a set of neural network models. Engaging the user in the analytical process will require user interfaces and data visualization approaches that are targeted at the user population and integrate with the organization’s work processes. Visualization engages non-technical users, helping them apply their experience and intuition to the data to expose insights. The Census Bureau has an interesting page, and a look at Google Images will give you an idea of the range of visualization approaches.
A solution that provides native support for statistical and mathematical functions supporting activities associated with data mining: clustering, correlation, pattern discovery, outlier detection, etc.
A solution that structures unstructured data: categorize, cluster, summarize, tag/extract. Of particular importance here is the ability to structure text or other unstructured data into taxonomies or ontologies related to the domain in question.
A solution that persists data with the rich set of metadata required to support complex analytics. While it is more obvious why unstructured data must be organized into a taxonomy or ontology, this also applies to structured data. Organizing data consistently across the variety of sources allows non-obvious relationships to be exposed and more complex analytical approaches to be applied.
A solution that is relatively data agnostic – data will come from many places and exist in many forms. The solution must manage the diversity and provide a flexible way to integrate new data into the analytical framework.
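The recursive-processing requirement (for every new observation, check whether it changes any prior assertion) can be illustrated with incremental entity resolution. The sketch below swaps in a standard union-find structure as one concrete technique; it is an assumption for illustration, not a description of any vendor’s product. Records that share an attribute value (a phone number, an email address) are asserted to be the same entity, and `observe` reports which previously separate entities a new record caused to merge.

```python
class EntityResolver:
    """Incremental resolver: each new observation may revise earlier assertions."""

    def __init__(self):
        self.parent = {}  # record id -> parent record id (union-find forest)
        self.owner = {}   # attribute value -> first record seen with that value

    def _find(self, x):
        """Return the root (entity id) for record x, using path halving."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def observe(self, record_id, attributes):
        """Add one observation; return the prior entity ids it was merged with."""
        self.parent[record_id] = record_id
        merged = set()
        for value in attributes:
            if value in self.owner:
                root_new = self._find(record_id)
                root_old = self._find(self.owner[value])
                if root_new != root_old:
                    merged.add(root_old)
                    self.parent[root_old] = root_new
            else:
                self.owner[value] = record_id
        return merged

    def entity_of(self, record_id):
        return self._find(record_id)
```

For example, after observing `r1` with a phone number and `r2` with an email address, the system asserts two distinct entities; a later `r3` carrying both values returns `{"r1", "r2"}`, signalling that the earlier “these are different people” assertion has just been overturned and any downstream alerts built on it need re-evaluation.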

What are Candidate Tools?

And now to the other question: where are these tools? It is hard to find tools that claim to be “adaptive analytics” or “prescriptive analytics” tools or systems in the sense that I have described above. I find it interesting that over the last five years, major vendors have subsumed complex analytical capabilities into more easily understood components. Specifically, you used to be able to find Microsoft Analysis Services easily on their site. Now it is part of MS SQL Server as SSAS, much the same way that the reporting service is now part of the database offer as SSRS (SQL Server Reporting Services). There was a time a few years ago when you had to look really hard on the MS site to find Analysis Services. Of course, since then Microsoft has integrated various BI acquisitions into the offer and squared away its marketing communication. Now its positioning is squarely around BI and the database. Both of these concepts are easier to sell at the executive level than the notion of prescriptive or adaptive analytics.

The emergence of databases and appliances optimized for analytics has simplified the message on the data side. Everyone knows they need a database, and now they can have one for analytics. At the decision-maker level, that is a much easier decision than trying to figure out what kind of analytical approach the organization is going to adopt. Vendors like Teradata have always supported analytics through the integration of SAS, and now R, as in-database functionality. Greenplum, Netezza, and others have likewise incorporated SAS and the open source analytics language R. In addition, we have seen the emergence (not new, but much more talked about, it seems) of the columnar database. The one I hear about most is Sybase IQ, although there have been a number of posts on the topic here, here, and here.

My point here is that vendors have a hard time selling complex analytical solutions, and have subsumed the complex capabilities into concepts that are easier to package, position, and communicate: namely, database products and Business Intelligence products. The following product sets are candidates for the integrated approach. We start with the big players and work towards those that are less obvious candidates.

SAS

The SAS Fraud Framework integrates all the SAS components required to implement a comprehensive analytics solution for adaptive challenges (all kinds of fraud, compliance, and money laundering, for example). This is a comprehensive suite of capabilities that spans all activities: data capture, ingest, and quality; analytics tools (including algorithm libraries); and data visualization and reporting/BI capabilities. Keep in mind that SAS is a company that sells building blocks, and the Fraud Framework is just that: a framework within which customers can build out capabilities. This is not a simple plug-and-play implementation. It takes time, investment, and the right team within the organization. The training has improved, and it is now possible to get comprehensive training.

As with any implementation of SAS, this one comes with all the caveats associated with comprehensive enterprise systems that integrate analytics into the fabric of an organization. The Gartner 2013 BI report indicates that SAS is “very difficult to implement”. This theme echoes across the product set. Having said that, when it comes to integrated analytics of the kind we have been discussing, all of the major vendors suffer from the same implementation challenges, although perhaps for different reasons.

The bottom line is that SAS is a company grounded in analytics: the Fraud Framework has everything needed to build out a first-class system. However, the corporate culture builds products for hard-core quants, and this is reflected in the Gartner comments.

IBM

IBM is another company with the complete offer. It has invested heavily in the analytics space, and between its ETL tools, the database/appliance and Big Data capabilities, the statistical product set built on SPSS, and the Cognos BI suite, users can build out the capabilities required. Although these products are being integrated into a seamless set of capabilities, they remain somewhat separate, and this probably explains some of the reported implementation challenges. Also, the product side of the IBM operation does not necessarily speak with the Global Services side of the house.

When IBM purchased Systems Research & Development (SRD) in 2005, I had thought they were going to build out the capabilities that SRD and Jeff Jonas had developed. Jeff heads up the Entity Analytics group within IBM Research, and his blog is well worth reading. However, the above product set appears to have remained separate from the approaches and intellectual knowledge that came with SRD. This may be on purpose: from a marketing perspective, selling the product set and then selling IBM services to operationalize the system is not a bad approach.

Regardless, the old saying “no one ever got fired for buying IBM” probably still holds true. However, as with SAS, beware of the implementation! Any one of the above products (SPSS, Cognos, and InfoSphere) requires attention when implementing. When integrating them as an operational whole, project leadership needs to ensure that expectations about complexity and time frame are communicated.

Other Products

There are many other product sets and I look forward to learning more about them. Once I post this, someone is going to come back and mention “R” and other open source products. There are plenty out there. However, be aware that while the products may be robust, many are not delivered as an integrated package.

With respect to open source tools, it is worth noting that the capabilities inherent in Hadoop and related products lend themselves to adaptive analytics, in the sense that operators can continually re-link and re-index on the fly without having to deal with where and how the data is persisted. This is key in areas like signals intelligence, unstructured data analysis, and even structured data analysis where the notion of semantic equivalence is shifting. This is a juicy topic all by itself and worthy of a whole blog entry.

Notes:

1. Predictive analytics relies on past observations to predict future observations. In an adaptive environment, the inputs to those predictive models continually change as a result of the outputs of models built on those past observations.
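One simple way to see the shelf-life problem in Note 1 is a detector whose baseline is continually re-estimated from recent data rather than fixed at training time. The sketch below uses a rolling-window z-score for exception analysis; the window size, the z-score limit, and the minimum history are hypothetical parameters, and a z-score is just one illustrative choice of outlier test.

```python
from collections import deque
from statistics import mean, stdev


class RollingOutlierDetector:
    """Flags values far from a baseline that is re-estimated as data arrives."""

    def __init__(self, window=50, z_limit=3.0, min_history=10):
        self.history = deque(maxlen=window)  # only recent behaviour is kept
        self.z_limit = z_limit
        self.min_history = min_history

    def score(self, value):
        """Return True if value is an outlier relative to the recent window."""
        flagged = False
        if len(self.history) >= self.min_history:
            mu, sigma = mean(self.history), stdev(self.history)
            flagged = sigma > 0 and abs(value - mu) / sigma > self.z_limit
        self.history.append(value)  # the baseline adapts, flagged or not
        return flagged
```

The design choice cuts both ways: adapting the baseline keeps the model current as legitimate behavior shifts, but a sustained change in behavior eventually becomes “normal” and stops being flagged. That is one reason engaging users and subject matter experts, as described in the list above, remains part of a robust solution.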

Tags: Adaptive, analytics, best practices, Big Data, BRE BRMS Adaptive, Prescriptive, products

Categories Best Practices, BI, Big Data, methodologies
Author analyticaltern

The merging of analytics and transactional data platforms requires more than just an upgrade in technology!

15 Sep

This IDC white paper puts the evolution of data platforms into layman’s terms. My takeaway is that the unshackling of information architects and applications from the constraints of the traditional RDBMS will continue. Many of the design choices the article details are grounded in the historic limitations of the data platform. The comments made under the Future Outlook segment are key:

“Trying to make definitive statements about the state of analytic-transaction data platforms going forward is challenging, because both the database kernel technology and the hardware on which it runs are evolving at a rapid pace. In addition to this, new workloads and mounting performance requirements add even more to the pace of development. It is safe to say that all the technology described in this study, admittedly in a very abstract manner, may be described as transitional technology that is evolving quickly. New approaches to data structures, new optimizations for transactional data once it is fully freed from the constraints of disk optimization, new ways of organizing processors and memory, and the introduction of non-volatile dual in-line memory modules (NVDIMMs) all will no doubt result in technologies within 10 years that are very different from what is described here.”

While platforms and technologies are evolving (this discussion has additional detail here), I find the juxtaposition of the “ideal” view presented here and the reality of most data operations interesting. The article provides “Essential Guidance” aimed at IT buyers choosing the right technology platform.

The focus on hardware and technology tends to obscure an equally important part of the buying equation: namely, can managers manage these new technologies to achieve the desired business impacts and resulting business benefits? For the most part, the answer is a resounding NO. For these “next gen” implementations to work, organizations need to upgrade not only their platforms but also their management practices. The balance of this blog entry examines some of the areas the IDC article focuses on from the management perspective of the Chief Data Officer or Enterprise Information Architect.

The Enterprise Data Warehouse. Traditionally, the Enterprise Data Warehouse (EDW) has been considered the repository of the “single version of the truth”. However, when it comes to analytics, and to melding the transactional data store with analytics, this concept is hard to sustain: there is no one version of the truth, since everything is context driven. The design alternatives presented in the article (see Figure below) enable this in that they generally store both the transactional (source) version and the fully resolved EDW version. This allows users to hit both the transactional store AND the EDW, depending on the context they seek and how they want to interact with the data. Implicit in this view is that the context is captured in a machine-exploitable form that enables users to derive their own “single version of the truth”. This is a function of metadata, discussed below. Additionally, the article recognizes that the “one large database” solution is not generally a viable alternative, the issue being one of “manageability and agility.” This is somewhat contradicted in the opening “opinion” section, where they talk about a canonical data model. However, I am going to assume that the canonical recommendation relates to the metadata and not the content.

In all of the platform options discussed in the paper (see below), data managers need to keep track of both transactional data and data within a fully resolved EDW. The context and semantic meaning of the content of both data sources need to be managed, crosswalked, and communicated to the user community. This will involve an evolution in both management practices and tools.

Metadata. I like the way this paper addresses metadata:

“Metadata, including all data models and schemas in the relevant databases or data collections, must be harmonized, kept current with those databases, and mapped to higher order constructs, including a business glossary and, for data managed in common, a canonical data model, in order to facilitate the access and management of the data.”

The notion of mapping to “higher order constructs” is key. While it is not always possible or feasible to create a canonical data model, it is very feasible to create a canonical metadata model (metamodel). This gives you a consistent way to fully describe your data regardless of the physical form it takes, and to link it to the higher order constructs referred to above. My article here talks about the role the enterprise plays in managing metadata at the enterprise level.
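To illustrate the difference between a canonical data model and a canonical metamodel, here is a minimal sketch in which physical elements from different stores (a transactional database and the EDW) are described by one common structure and linked to a business glossary term. The class names, store names, and glossary entries are hypothetical; a real implementation would sit in a metadata repository or catalog rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BusinessTerm:
    """Higher order construct: an entry in the business glossary."""
    name: str
    definition: str


@dataclass(frozen=True)
class DataElement:
    """Metamodel entry describing one physical element in any store."""
    store: str       # e.g. a transactional database or the EDW
    container: str   # table, collection, file...
    element: str     # column, field, key...
    data_type: str


@dataclass
class Metamodel:
    links: dict = field(default_factory=dict)  # DataElement -> BusinessTerm

    def link(self, element, term):
        self.links[element] = term

    def elements_for(self, term_name):
        """All physical realizations of a glossary term, whatever the store."""
        return sorted(
            (e for e, t in self.links.items() if t.name == term_name),
            key=lambda e: (e.store, e.container, e.element),
        )
```

The physical models stay different (`cust_no` in an orders table, `customer_key` in a customer dimension), but because both are described by the same metamodel and linked to the same glossary term, a user can find every realization of “Customer Identifier” and choose the context they need.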

Managing the Evolution. The architectures discussed in the paper all require an evolution from the transactional data stores that exist today towards platforms that can respond to business needs rapidly, and with little or no latency. The “Type 5” platform in Figure 1 is the “Data Lake” that has become such a buzzword. In this configuration, there is a single data structure for both transactions and analytics. The ETL functions, number of indexes, and flexibility that can be applied to render the data all place a larger burden on the governance disciplines. Additionally, the process by which the organization integrates the business and IT activities requires formalizing in a way that breaks down the traditional silos.

Hampering the evolution at some level is the fact that the tool suites are not entirely intuitive. Tools to handle the mapping of the higher order constructs (concept systems, ontologies, taxonomies, reference data…) and the management of multiple dictionaries cannot easily be implemented without complex configuration and often coding. The tool vendors seem to be coming along, but many are still working to apply governance and curation within the context of table-based systems. The reality is that creating fully described data that is linked to higher order constructs, and managing these relationships, requires a collection of tools that must be configured to address your environment. It is not yet easy.

The Way Forward. Previously I have made the comment that the Information Architect, Enterprise Data Management Office, or CDO must initially focus on creating a tangible value proposition for the business side of the house. As long as data management is perceived as a function related to standards, governance and “protocol”, it will be perceived as slowing down the business and getting in the way of achieving business goals. This article details a scoped-down set of goals that lay the foundation for that initial value proposition. Once the enterprise data management function is able to make the case that it actually improves business operations and impacts key success metrics (e.g. revenue), what next?

This is where all the articles regarding CDOs seem to agree. The next step is all about outreach and engagement with the broader business community, potentially both internal and external to the organization. My recommendation here is to perform this activity using a framework that keeps the discussions focused on goals and practices, and that results in actionable, measurable and prioritized recommendations. The CMMI Data Management Maturity Model (DMM) is one such framework. I am admittedly biased, as I helped create it, but for an independent opinion, Bob Lambert at CapTech wrote a review that speaks volumes. The framework is used to engage in a series of workshops. These workshops serve to identify a maturity level but, more importantly, identify the business priorities and concerns as detailed by the workshop participants. This is critical, as the resulting recommendations inherently have buy-in from across the organization.

Because the DMM evaluates capabilities at the “practice” level (i.e. what people actually do), it inherently details the next steps in the form of recommendations. In other words, do not try to create a semantically equivalent data model across the whole organization if you cannot even do it for a business unit or a project! Additionally, the model recognizes the relationships between functions. The end result is a holistic and integrated set of guidance for the overall data management strategy and implementation roadmap.
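The step from workshop findings to prioritized recommendations can be sketched as a simple ranking. This is a hypothetical illustration, not the DMM’s own scoring or appraisal method. It assumes each workshop records three things per practice area:

- a current level;
- a target level;
- a participant-voted business priority.

It then ranks areas by the gap weighted by that priority. All area names and numbers are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AreaScore:
    area: str       # practice area discussed in a workshop (illustrative name)
    current: int    # maturity level observed, 1-5 (assumed scale)
    target: int     # level participants say the business needs, 1-5
    priority: int   # participant-voted business priority, 1 (low) to 3 (high)

    @property
    def gap(self) -> int:
        """How far the practice is from where the business needs it."""
        return max(self.target - self.current, 0)

def prioritize(scores: list[AreaScore]) -> list[tuple[str, int]]:
    """Rank areas by gap weighted by business priority; drop areas with no gap."""
    ranked = sorted(scores, key=lambda s: (s.gap * s.priority, s.priority),
                    reverse=True)
    return [(s.area, s.gap * s.priority) for s in ranked if s.gap > 0]

# Hypothetical workshop results.
workshop = [
    AreaScore("Business Glossary", current=1, target=3, priority=3),
    AreaScore("Metadata Management", current=2, target=3, priority=2),
    AreaScore("Data Quality Assessment", current=2, target=4, priority=2),
    AreaScore("Data Governance", current=3, target=3, priority=3),
]
for area, score in prioritize(workshop):
    print(area, score)
```

An area already at its target drops out, whatever its priority. This keeps the recommendations scoped to where the participants said the business actually needs to move.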

Organizations seeking to upgrade their data platforms to more closely resemble the “Analytic Transactional data platform” that enables the real-time enterprise as discussed in the IDC white paper will have greater success more quickly if they evolve their data management practices at the same time.

Tags: best practices, CMMI, Data Architecture, Data Management Maturity, Data Persistence, DMM

Categories Best Practices, Data Management, Industry, methodologies
Author analyticaltern

Enterprise Data Management – Where to Start?

3 Jul

What is Enterprise Data Management? In some organizations, this is an easy question to answer. However, in others – especially those with an analytical mission – it is much harder. Often the function is put under the Enterprise Architecture team. One often hears that “the Enterprise data folks just do not get it”. As one executive in a large financial organization put it: “EA is where rubber hits the air”. So how do we define a role for the data function within an Enterprise Architecture team?

This post is not about how to organize effectively in order to align with the business units to show business impact – although a worthy topic. This post is about suggesting a role for the enterprise data management team when that team is organized under the CIO within the Enterprise Architecture function of the organization. In order for the data function to be deemed valuable to the business stakeholders it must be understandable, actionable and tied to the business objectives.

Data is everywhere. When we talk about “Enterprise Data Management”, the temptation is for managers to say: well, that means we manage all data in all locations. As enterprise data managers, we must know all about everything! Really? Have you ever seen this work? This leads to the top-down mandate of the “canonical” approach, where the objective is a single standard, a single canonical model – a single ring to rule them all! This rarely works well (if at all). Business requirements, analytical activity, market trends, and evolutions in technology all lead to a core business requirement for flexibility. Additionally, there is a fundamental need to recognize that the “ground truth” almost always lies with the business side of the house, and “truth” is often a shifting concept in the real world. This is part of the reason why the Enterprise Data Warehouse (EDW) “single version of the truth” is problematic for analytical and BI staff, and part of the reason for the rise of Hadoop as a more flexible environment. As an analytical or BI person, my version of the truth (or the right data) depends on the context of a particular decision.

So how do we focus the EDM team on what makes sense? The graphic in this excellent article on risk architecture caught my eye. I have modified it a bit to identify some core activities that I see as foundational for organizations seeking to mature their data management in general, and specifically, the integration of the enterprise architecture team in the data management process. The original graphic is attributed to Naomi Clarke currently at Credit Suisse.

Based on the above, the role of EDM is simple: manage data assets to expose the attributes needed to answer key business questions about those assets:

What are they?
Where are they?
What has happened to them?
What are they related to?

To do this, one needs a data management “hub”. I call it a Hub as this provides flexibility for discussion purposes. Some would call it a Managed Metadata Environment (MME), others perhaps a Metadata Registry. Regardless, the goal is a metadata ecosystem that can support key functions related to governance, curation, quality, usability and discoverability. This view suggests the following regarding the roles of the EDM team, especially when it is organized within the EA team:

The team needs to manage only three inputs: definitions, physical location, and lineage metadata (what, where, and change). The way the organization creates those three inputs is part of an overall data strategy, but not something the EDM team drives – these are driven by the business. By focusing in this way, the enterprise team leaves it up to the business or operational components to determine the optimal approaches.
Definitions are aligned to business terms and to “Concept Systems” (as defined in the ISO/IEC 11179 specifications). This enables discoverability and complex search approaches based on an understanding of semantic equivalence.
Data assets can be classified within the context of an enterprise Data Reference Model (DRM). In most organizations, this supports the governance process. However, in government organizations, the DRM is also used to align policy and strategy objectives to IT activities. See the Federal Enterprise Architecture (FEA) framework for how this works in the US.
Capabilities to support governance functions must be provided: vocabulary management tools with the capability to curate and link ontologies, taxonomies, controlled vocabularies, etc.; data quality tools; and governance tools.
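Here is a minimal sketch of the hub idea. It uses an in-memory Python structure in place of a real MME or metadata registry. Each asset carries only a definition, a physical location and lineage, plus links to concepts, and that is enough to answer the four questions listed earlier. Asset names, locations and concepts are hypothetical. The concept lookup stands in for the semantic-equivalence search described above: a search for one concept finds assets that carry different local names.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    definition: str                                     # what is it?
    location: str                                       # where is it?
    lineage: list[str] = field(default_factory=list)    # what has happened to it?
    concepts: set[str] = field(default_factory=set)     # what is it related to?

class MetadataHub:
    """Hypothetical in-memory stand-in for a managed metadata environment."""

    def __init__(self) -> None:
        self.assets: dict[str, Asset] = {}

    def register(self, asset: Asset) -> None:
        self.assets[asset.name] = asset

    def what(self, name: str) -> str:
        return self.assets[name].definition

    def where(self, name: str) -> str:
        return self.assets[name].location

    def history(self, name: str) -> list[str]:
        return self.assets[name].lineage

    def related(self, name: str) -> list[str]:
        """Other assets sharing at least one concept with this one."""
        mine = self.assets[name].concepts
        return sorted(a.name for a in self.assets.values()
                      if a.name != name and a.concepts & mine)

    def find_by_concept(self, concept: str) -> list[str]:
        """Every asset mapped to the concept, whatever it is called locally."""
        return sorted(a.name for a in self.assets.values() if concept in a.concepts)

hub = MetadataHub()
hub.register(Asset("cust_master", "Golden record for each customer",
                   "edw.dim_customer", ["crm.contacts -> dedupe -> merge"], {"Customer"}))
hub.register(Asset("client_list", "Clients with active contracts",
                   "sales_share/clients.xlsx", [], {"Customer", "Contract"}))
hub.register(Asset("contract_terms", "Signed contract terms",
                   "legal_db.contracts", [], {"Contract"}))

print(hub.find_by_concept("Customer"))  # ['client_list', 'cust_master']
print(hub.related("client_list"))       # ['contract_terms', 'cust_master']
```

The hub does not decide how lineage or definitions get produced. It only holds them, which matches the limited initial scope suggested for the EDM team.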

If one can limit the INITIAL scope of the EDM team to these items, it is much easier to tie enterprise activities to the business needs, and provide a set of capabilities that address challenges of high value to the organization: search, discoverability and integration. Evolving the role once these benefits are established is a much easier task.

Tags: architecture, best practices, Data Architecture, EA, EDM, FEA, Information Architecture, ISO 11179, Org. Design

Categories Best Practices, Data Management, Metadata
Author analyticaltern

Interesting thought process to identify analytical approaches

29 Jan

Courtesy of a colleague in the medical data management world – check out this graphic. It is missing a few approaches, but lays out the thought process well.

The Booz Allen Field Guide to Data Science has a similar linkage that is useful. That book can be downloaded here.

While I am at it, I found a good book, Managing Research Data by Graham Pryor. I continue to be surprised at the approaches taken by “traditional” data management folks to feed the analytical processes. The old-school way of dealing with analytics data did not work well, which has created some of the organizational workarounds that exist in companies. This only gets worse when dealing with large amounts of data, and with data that must work across systems and sources.

Tags: best practices, Booz Allen, Field Guide

Categories Best Practices, methodologies, Project Management
Author analyticaltern

Interesting observations on the healthcare system implementation

9 Dec

Many thanks to http://www.bespacific.com/ for forwarding this post. I often get funny looks when I tell people that the expected outcome may not be the desired outcome. With one’s analyst hat on, it is easy to say this: one has a hypothesis, and one tests it. If the hypothesis proves false, then we have identified a place not to go, or a refinement in thinking. For an analyst, “failure” (as defined below) is an option. For program managers, it must also be an option, but one that is very hard to manage. Generally, the inability to address this issue starts at the top and is framed within the culture of the organization.

As a project manager, one would think it is an option – identified as a “risk” in PMP speak, and addressed and managed as such. We will not know the details of what happened for a while, but the article below sheds some light.

HealthCare.gov and the Gulf Between Planning and Reality
By Irving Wladawsky-Berger
Guest Contributor, WSJ

It’s way too early to know what really happened with the botched launch of HealthCare.gov. We don’t know how it will all play out in years to come and what its impact will be on the evolution of the Affordable Care Act, on election results over the next few years, or on President Obama’s legacy. Depending on how it all turns out over time, this will be just a chapter in future books on the history of the ACA and the Obama administration, or the subject of major books and investigative reports.

Most everyone who’s been involved with the development of complex IT systems knows how wrong things can sometimes go. So, when serious problems do happen, we are eager to learn the lessons that might help us avoid similar problems in the future. It’s quite possible that HealthCare.gov and the ACA’s overall IT system are such complex outliers–technically, organizationally and politically–that any lessons learned might apply to few other projects. But, given the increasing complexity of private and public sector IT systems, the lessons are worth thinking about.

I like the way Clay Shirky, NYU faculty member as well as author and consultant, framed the problem in a very interesting blog post, Healthcare.gov and the Gulf Between Planning and Reality. He writes about the gulf between those charged with planning the overall rollout of the ACA and health care exchanges and the realities of trying to get such a complex system designed, built and launched in a short amount of time. It’s essentially a tale of “failure is not an option” versus the messy world of highly complex IT systems. While the blog post is focused on the launch of HealthCare.gov, it can also be read as a more general discussion of the kinds of problems often encountered with highly complex IT-based projects when a management decision to win a deal at all costs comes back to haunt the implementation of the project.

“For the first couple of weeks after the launch, I assumed any difficulties in the Federal insurance market were caused by unexpected early interest, and that once the initial crush ebbed, all would be well,” he writes. “The sinking feeling that all would not be well started with this disillusioning paragraph about what had happened when a staff member at the Centers for Medicare & Medicaid Services [(CMS)], the department responsible for Healthcare.gov warned about difficulties with the site back in March.”

The paragraph responsible for Mr. Shirky’s sinking feeling was part of an October 12 NY Times article, From the Start, Signs of Trouble at Health Portal. According to the article, the warnings came from CMS deputy CIO Henry Chao, the chief digital architect for the new online insurance marketplace. In response, his superior told him:

“. . . in effect, that failure was not an option, according to people who have spoken with him. Nor was rolling out the system in stages or on a smaller scale, as companies like Google typically do so that problems can more easily and quietly be fixed. Former government officials say the White House, which was calling the shots, feared that any backtracking would further embolden Republican critics who were trying to repeal the health care law.”

“The idea that failure is not an option is a fantasy version of how non-engineers should motivate engineers,” adds Mr. Shirky. “Failure is always an option. Engineers work as hard as they do because they understand the risk of failure.” In his opinion, neither technology, talent, budgets nor the government’s bureaucratic processes are the main culprits here. Rather, this is a management and a cultural problem. As a result of the huge political pressures they were under, top administration officials did not feel that they could seriously address the possibility that things might go wrong.

Other articles paint a similar picture, such as this recent one in the WSJ’s CIO Journal:

“It was on a cold, sunny day in Baltimore last January that Curt Kwak, chief information officer of the Washington Health Benefit Exchange, first realized that the signature feature of President Obama’s Affordable Care Act could be in trouble. That day, at a status review meeting of CIOs of state health exchanges, he learned that many of his peers were far behind where they should have been. According to Mr. Kwak, several of his peers hadn’t yet selected a systems integrator – tech vendors who play crucial roles in fitting together the multiple components of health insurance exchanges that allow consumers to select and enroll in health plans.”

Why did the administration, as well as several states, wait so long to start the planning of the ACA system, including the health care exchanges? Ezekiel Emanuel–oncologist, vice provost and professor at the University of Pennsylvania and former White House advisor on health policy–said in a good article on the subject that the administration did not want to release detailed regulations and specifications on the exchange while in the middle of the 2012 election campaign in order to avoid political controversies. “This may have been a smart political move in the short term, but it left the administration scrambling to get the IT infrastructure together in time, robbing it of an opportunity to adequately consult with independent experts, test the site and fix any problems before it opened to the public.”

But then came the reality, which Mr. Shirky describes as the painful tradeoff between features, quality and time.

“When a project cannot meet all three goals–a situation Healthcare.gov was clearly in by March–something will give. If you want certain features at a certain level of quality, you’d better be able to move the deadline. If you want overall quality by a certain deadline, you’d better be able to simplify, delay, or drop features. And if you have a fixed feature list and deadline, quality will suffer. . . You can slip deadlines, reduce features, or, as a last resort, just launch and see what breaks. . . That just happened to this administration’s signature policy goal.”

The inability of a troubled project to meet all three goals simultaneously almost feels like the complex-systems equivalent of the Heisenberg uncertainty principle; that is, it’s impossible to simultaneously determine the exact position and velocity of an atomic particle with any great degree of accuracy, no matter how good your measurement tools are. While this is clearly not a scientific principle but a set of guidelines based on decades of experience, there seem to be intrinsic limits to our ability to fix troubled IT projects no matter how hard we try.

In The Mythical Man-Month, noted computer scientist and software engineer Fred Brooks introduced one of the most important concepts in complex IT systems: adding manpower to a late software project makes it later. Brooks’ Law, as his concept became known, remains as true today as when it was first formulated almost 40 years ago.

Over the years, we have learned that there are limits to our ability to pre-plan complex IT projects in advance. You need a good design, architecture and overall project plan, but you also need the flexibility to learn as you go and make trade-offs as appropriate. Most such projects are therefore released in stages, with alpha and beta phases that start testing the system with a select and relatively small number of users. Such early testing uncovers not only software bugs, but also design flaws that users have trouble with.

Another important lesson is that all parties involved in a complex, high-risk project must have a good working relationship. All available information on the status of the project should be shared, so there are few last-minute surprises. Tradeoff decisions and project adjustments should involve all key members of the team. Behind most seriously troubled projects lies not only a gulf between planning and reality, but a lack of the close collaboration and overall good will necessary to make the project succeed.

It’s hard to imagine a more politically contentious project than the ACA. The administration was worried that any glitches uncovered while testing the system as part of the usual staged release cycle would give further ammunition to those trying to kill the ACA altogether. They may have felt that slipping deadlines and reducing features prior to the October 1 launch was not politically feasible, and that they therefore had no choice but to launch anyway and hope for the best. Did they make the right decisions? We’ll find out in the fullness of time.

Irving Wladawsky-Berger is a former vice-president of technical strategy and innovation at IBM. He is a strategic advisor to Citigroup and is a regular contributor to CIO Journal.

http://blogs.wsj.com/cio/2013/12/06/healthcare-gov-and-the-gulf-between-planning-and-reality/?mod=wsj_nview_latest

Tags: best practices, healthcare, obamacare

Categories Best Practices, Project Management
Author analyticaltern

The different aspects of BI

5 Dec

http://www.martinsights.com/?p=774

I like the recognition that approaches need to be integrated in order to create useful insights. Valuable insights come from balancing the needs and capabilities of business strategy, business analysis, business intelligence and advanced analytics.

Tags: analytics, best practices, BI, Big Data, Martin_Fowler, risk analysis

Categories Best Practices
Author analyticaltern

Information Architecture – A Moving Target?

6 Jul

I am increasingly seeing articles that talk about the confusion in identifying and building out the right information architecture for the organization. The article here, with a clip below, speaks to that point. This is a good thing. People seek simplicity and are looking for the prescriptive approach: 1) build a data warehouse; 2) build some datamarts for the business folks; 3) get a BI tool and build reports. But this does not cut it, as it is too rigid a structure for analysts or other stakeholders who have to do more than pull reports. The industry has responded (I am speaking in buzzwords here) by adding “sandboxes”, by adding ODS (Operational Data Stores), and by adding a whole new way of landing, staging, persisting and using data in analytical tasks (Hadoop). Sitting on top of this data level of the information architecture has been an explosion of tools that cater to (more buzzwords) data visualization, self-serve BI, and data mashups, to name a few.

Bottom line – how does this all get put together without creating an even bigger data mess than when you started? It is hard. What one sees so often is organizations putting off addressing the issue until they have a real problem. At this point, one sees a lot of sub-optimal management behavior. A consistent theme in the press is agility: organizations and their leaders need to embrace the agile manifesto. I am wholeheartedly behind this. HOWEVER, agility needs to be framed within a plan, a vision, or at least some articulated statement of an end point.

The article below is interesting as it presents agility as a key “must have” management approach, and yet it also discusses the fact that in order for an agile approach to be successful, it needs to adopt disciplines that are decidedly un-agile! This creates a dual personality for leaders within the data management related functions of an organization (BI, analytics, ERP, …). On the one hand, one wants to unleash the power of the tools and the creative intellect that is resident within the organization; on the other, there exists a desire to control, to reduce the noise around data, to simplify one’s life. The answer is to embrace both: build a framework that provides long-term guidance, and iteratively deliver capabilities within that framework towards a goal that is defined in terms of business capabilities, NOT technology or tightly defined tactical goals.

The framework, whichever approach one chooses, will articulate the information architecture of the organization: how data flows around the organization to feed core business activities and advance management’s goals! This is important: if it cannot be explained on a one-page graphic, it is probably too complicated!

Martin’s approach to tying things together is below…

“So given that there is not a one size fits all approach anymore, how does a company ensure its Information Architecture is developed and deployed correctly? Well, you have to build it from the ground up, and you have to keep updating it as the business requirements and implemented systems change. However, to do this effectively, the organisation must be cognisant of separating related workloads and host data on relevant and appropriate platforms, which are then tied together by certain elements, including:
