A good baseline presentation on creating semantic data repositories

8 Jul

I have been meaning to give a shout out to Dan McCreary (http://www.danmccreary.com/) for some time now, and am only just getting around to it. What caught my attention was a presentation Dan gave at the 2006 Semantic Technology Conference. The presentation walked the audience through the process of creating a well-structured data repository that enables players within the Minnesota education system to ask complex questions across all of the data available. While the presentation is rather dated, it is one of the better ones I have seen at taking things back to a level most people can understand. Additionally, I don’t see that the state of the industry has changed that much in terms of tools to support the creation of well-structured data repositories that support broad “knowledge management” goals.

Dan focuses on metadata as the means to achieve the goals of the participants: the Minnesota Department of Education, the Wisconsin Department of Public Instruction, and the Michigan Department of Education. The goal is to create a semantic understanding of the data available. “Semantic” is one of those words that means different things to different people. Once you combine “semantic” with “knowledge management” or “knowledge discovery”, everyone seems to have a different opinion as to what you are talking about – IT and business folks alike. It becomes very important to level-set with your audience. For the purposes of this discussion, creating a semantic repository involves:

Enhancing data with metadata that describes the data asset as a whole, as well as its component content parts;
Creating descriptive and content metadata that describes the data in specific terms, and in more general “conceptual” terms;
Creating a description of the data that reveals connections to other concepts, related terms, and multiple standards;
Creating highly linked data that is organized through taxonomies and ontologies to reveal links that are perhaps not obvious to users.

The desired end state is the ability for a broad range of end users to query disparate data sources and understand how available data can be used in their particular context.
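To make the four points above a little more concrete, here is a minimal Python sketch. It is not taken from Dan's deck: the `DataAsset` class, the `BROADER` taxonomy, and the sample asset names are all hypothetical, invented purely for illustration. It shows an asset carrying both specific and conceptual metadata, and a lookup that follows taxonomy links so a user asking about a broad concept still finds narrowly tagged data.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    description: str                  # metadata about the asset as a whole
    columns: dict[str, str]           # content metadata: column -> specific term
    concepts: set[str] = field(default_factory=set)  # "conceptual" tags


# Hypothetical taxonomy: narrower concept -> broader concept.
BROADER = {
    "grade 3 reading score": "assessment result",
    "assessment result": "student performance",
    "attendance rate": "student engagement",
}


def expand(concept):
    """Return the concept followed by every broader concept above it."""
    chain = []
    while concept is not None and concept not in chain:
        chain.append(concept)
        concept = BROADER.get(concept)
    return chain


def find_assets(assets, concept):
    """Names of assets tagged with the concept or anything narrower."""
    return [a.name for a in assets
            if any(concept in expand(c) for c in a.concepts)]


assets = [
    DataAsset("reading_2005", "Grade 3 reading results",
              {"scr": "scaled score"}, {"grade 3 reading score"}),
    DataAsset("attendance_2005", "Daily attendance",
              {"att": "attendance rate"}, {"attendance rate"}),
]

# A broad question still reaches the narrowly tagged asset.
print(find_assets(assets, "student performance"))
```

The link between "grade 3 reading score" and "student performance" is never stored on the asset itself; it comes from the taxonomy, which is the kind of non-obvious connection the last point describes.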

Within the referenced deck (http://www.danmccreary.com/presentations/semweb2006/), there are a number of sections worth drawing your attention to:

ISO 11179 (Starting on page 15). Most people I speak to who know about ISO 11179 (http://www.iso.org/iso/home/search.htm?qt=11179&published=on&active_tab=standards&sort_by=rel) are somewhat skeptical, as the spec is a heavy one as written. Once you get your head around all the good ideas that have gone into the specification, the question becomes: how much of it do you adopt? The idea behind the spec is that each piece of metadata has its own metadata that allows a user to know exactly how that metadata is to be used. Dan presents the 11179 concept within his discussion of the National Information Exchange Model (NIEM, https://www.niem.gov/Pages/default.aspx), which brings us to another interesting point…
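As a rough illustration of "metadata about metadata", here is a heavily simplified Python sketch of a registry entry. It borrows only the naming idea commonly associated with ISO 11179 (object class, then property, then representation term) and ignores the rest of the spec. The `DataElement` class, the `steward` attribute, and the sample values are my own assumptions, not something shown in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    object_class: str     # the thing being described, e.g. "Student"
    property_term: str    # the characteristic of that thing, e.g. "Birth"
    representation: str   # the form the value takes, e.g. "Date"
    definition: str       # plain-language meaning of the element
    steward: str          # hypothetical administrative attribute

    def name(self):
        """Build the element name from its three naming parts."""
        return f"{self.object_class}{self.property_term}{self.representation}"


# A tiny in-memory registry keyed by element name.
registry = {}


def register(element):
    registry[element.name()] = element


register(DataElement(
    object_class="Student",
    property_term="Birth",
    representation="Date",
    definition="The calendar date on which the student was born.",
    steward="Department of Education data office (illustrative)",
))

entry = registry["StudentBirthDate"]
print(entry.name(), "-", entry.definition)
```

Even this cut-down version answers the adoption question in a small way: you can start with names and definitions, and add further attributes from the spec only when someone actually needs them.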

Mapping to Standards (Page 16). I like the way that the process of mapping to standards is presented. In this instance, the educational system is only trying to map to one standard. However, you can imagine an environment where there are multiple standards of interest. Here, the mapping approach is organized around the OWL “sameAs” terminology (Web Ontology Language, http://www.w3.org/TR/owl-features/). It does not say so in the presentation, but I am assuming that this is taken from the SKOS (Simple Knowledge Organization System) standards (http://www.w3.org/2004/02/skos/). As you classify data assets, there will be multiple mapping exercises where the mapping will not be exact. The linking concepts defined within SKOS will allow you to relate items whose relationships are not precise or consistent.
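Here is a small Python sketch of what recording exact and inexact mappings might look like. The relation names `owl:sameAs`, `skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch`, `skos:narrowMatch`, and `skos:relatedMatch` are real OWL and SKOS properties; the `mn:`, `wi:`, `mi:`, and `std:` terms are made up for illustration. Lumping `owl:sameAs` and `skos:exactMatch` into one "exact" bucket is a simplification on my part, since the two carry different formal semantics.

```python
# Each mapping is (local term, relation, standard term). All terms are hypothetical.
MAPPINGS = [
    ("mn:StudentID", "owl:sameAs", "std:StudentIdentifier"),
    ("wi:PupilNumber", "skos:exactMatch", "std:StudentIdentifier"),
    ("mi:EnrollStatus", "skos:closeMatch", "std:EnrollmentStatus"),
    ("mn:ReadingScore", "skos:broadMatch", "std:AssessmentResult"),
]

# Simplification: treat both relations as "safe to substitute".
EXACT = {"owl:sameAs", "skos:exactMatch"}
# Looser links for mappings that are useful but not precise.
APPROXIMATE = {"skos:closeMatch", "skos:broadMatch",
               "skos:narrowMatch", "skos:relatedMatch"}


def to_standard(term, allow_approximate=False):
    """Return (relation, standard term) pairs for a local term."""
    accepted = (EXACT | APPROXIMATE) if allow_approximate else EXACT
    return [(relation, target)
            for source, relation, target in MAPPINGS
            if source == term and relation in accepted]


print(to_standard("mn:StudentID"))
print(to_standard("mi:EnrollStatus"))
print(to_standard("mi:EnrollStatus", allow_approximate=True))
```

Keeping the relation alongside the target lets a consumer decide how much imprecision it will tolerate, rather than forcing every mapping to pretend it is exact.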

Demonstrating that the data problem grows faster than the number of data sources (Page 17). This sounds so simple. However, I am still surprised when I hear people say, “we have the data mapped between the sources I want, so no need to spend time mapping to standards”. Well, OK, that works if you have a small number of static data sets, but that rarely happens. In the presentation Dan refers to this as the O(N²) problem versus the O(N) problem. I have included his graphics as these tell the story better than words.
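The arithmetic behind the O(N²) versus O(N) point can be sketched in a few lines of Python. This is my own back-of-the-envelope version, not Dan's: it assumes one mapping per pair of sources (double it if each direction needs its own mapping) and one mapping per source to a single shared standard. The function names are hypothetical.

```python
def point_to_point_mappings(n_sources):
    """Mappings needed when every source is mapped directly to every other."""
    return n_sources * (n_sources - 1) // 2


def standard_mappings(n_sources):
    """Mappings needed when each source is mapped once to a shared standard."""
    return n_sources


for n in (3, 10, 50):
    print(f"{n:>3} sources: "
          f"{point_to_point_mappings(n):>5} point-to-point vs "
          f"{standard_mappings(n):>3} via a standard")
```

With three sources the two approaches cost the same, which is exactly why the point-to-point shortcut looks harmless at first. At 10 sources it is 45 mappings against 10, and at 50 it is 1,225 against 50.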

Tags: information Sharing, ISO 11179, metadata, metamodel, products, semantic

Categories Data Management, Data Stores, Master Data, Metadata, methodologies
Author analyticaltern
