Why is a Data Framework Needed?

“Metadata” Everywhere

  • Web Pages
  • Books
  • Genetics Data
  • Tree Database
Almost all data describes some other data.

Web Pages

<!DOCTYPE html>
<html lang="en-US">
<head>
  <title>The Tree Page</title>
  <meta name="author" content="Jane Doe" />
  <meta name="description" content="All about trees." />
</head>
<body>
  …
</body>
</html>

EPUB Package

<package … unique-identifier="pub-id"> …
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">
      urn:uuid:a2f9981a-fc28-4f49-895f-3a1a8b12fc2c
    </dc:identifier>
    <dc:title>A Tale of Trees</dc:title>
    <dc:publisher>Arbor Publishing</dc:publisher>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">
      2017-09-17T03:29:46Z
    </meta>
  </metadata>
  …
</package>

GENCODE Data

gene_id            gene_name    level  chr   start  end
ENSG00000223972.5  DDX11L1      2      chr1  11869  14409
ENSG00000227232.5  WASH7P       2      chr1  14404  29570
ENSG00000278267.1  MIR6859-1    3      chr1  17369  17436
ENSG00000243485.5  MIR1302-2HG  2      chr1  29554  31109
ENSG00000284332.1  MIR1302-2    3      chr1  30366  30503

Tree Database

id   species             sproutYear
123  Cercis canadensis   2012
456  Quercus macrocarpa  1950
789  Cornus florida      1985

Adding Properties

Suppose each tree needs a new “transplanted” flag.

  • Migrate the database schema?
  • Change the hard-coded property list?

A data framework allows new properties to be added as needed, even properties that were never anticipated.
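One common way to get this flexibility, used by RDF for example, is to store every fact as a (subject, property, value) triple rather than as a row with fixed columns. The Python below is a minimal sketch under that assumption; the `tree/123`-style identifiers and the `values` helper are hypothetical names chosen for illustration.

```python
from typing import Any

# A fact is a (subject, property, value) triple; no fixed column list.
Triple = tuple[str, str, Any]

# The tree table above, rewritten as triples.
triples: set[Triple] = {
    ("tree/123", "species", "Cercis canadensis"),
    ("tree/123", "sproutYear", 2012),
    ("tree/456", "species", "Quercus macrocarpa"),
    ("tree/456", "sproutYear", 1950),
    ("tree/789", "species", "Cornus florida"),
    ("tree/789", "sproutYear", 1985),
}

# Adding the new "transplanted" flag needs no schema migration:
# it is just one more triple, recorded only where a value is known.
triples.add(("tree/456", "transplanted", True))

def values(subject: str, prop: str) -> list[Any]:
    """Return every value recorded for the subject's property."""
    return [v for s, p, v in triples if s == subject and p == prop]

print(values("tree/456", "transplanted"))  # [True]
print(values("tree/123", "transplanted"))  # []
```

A tree with no transplant record simply has no such triple, so there is no column to fill with a placeholder.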

Property Names

Your program adds a “% mature when transplanted” data point to a tree.

  • Do downstream processing applications support “%” in property names?
  • Do services choke when property names contain spaces?
  • Does the main program even support this name format throughout its modules?
  • How would you know?

A data framework provides rules for consistent, valid property names.
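A shared rule only helps if every module checks names the same way. The sketch below assumes a hypothetical rule (a letter or underscore followed by letters, digits, underscores, or hyphens), loosely in the spirit of XML names; it is not the rule of any particular framework.

```python
import re

# Hypothetical naming rule for this sketch only: a letter or underscore,
# then letters, digits, underscores, or hyphens. No spaces, no "%".
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

def check_property_name(name: str) -> bool:
    """Return True if the whole name satisfies the naming rule."""
    return VALID_NAME.fullmatch(name) is not None

for name in ["transplanted", "percentMatureWhenTransplanted",
             "% mature when transplanted"]:
    print(f"{name!r}: {'ok' if check_property_name(name) else 'rejected'}")
```

Every module that calls the same check at its boundary gets the same answer, which answers the "How would you know?" question mechanically.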

Property Identity

Your program encounters two different data points, both named “mold”.

  • The tree manager indicates whether a scaffold device is used to mold the shape of the tree.
  • The tree doctor indicates whether sooty mold was found on the leaves.

A data framework provides a mechanism for unambiguously identifying properties.
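RDF, for example, identifies every property by a full IRI, so two properties can share a short local name without colliding. In this sketch the two namespace IRIs are hypothetical placeholders.

```python
# Each property is identified by namespace IRI + local name.
# Both namespaces are hypothetical placeholders.
MANAGER = "http://example.com/ns/tree-manager#"
DOCTOR = "http://example.com/ns/tree-doctor#"

SHAPING_MOLD = MANAGER + "mold"  # scaffold device used to shape the tree
SOOTY_MOLD = DOCTOR + "mold"     # sooty mold found on the leaves

record = {
    SHAPING_MOLD: True,
    SOOTY_MOLD: False,
}

# Same local name, different identities: neither value overwrites the other.
assert SHAPING_MOLD != SOOTY_MOLD
print(len(record))  # 2
```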

Property Vocabularies

A third party already provides a property for indicating species.

  • The livestock department is already using the species property from the biology vocabulary.
  • Both departments want a common service for requesting additional per-species data.

A data framework provides a mechanism for integrating third-party vocabularies.
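Reusing a vocabulary amounts to both departments writing the same property IRI, so a shared service only has to recognize that one identifier. The biology namespace, the subject identifiers, and the `species_of_everything` function below are hypothetical.

```python
# Both departments reuse one third-party property IRI for species.
# The "biology vocabulary" namespace is a hypothetical placeholder.
BIO = "http://example.org/biology#"
SPECIES = BIO + "species"

tree_data = [
    ("http://example.com/trees/456", SPECIES, "Quercus macrocarpa"),
    ("http://example.com/trees/789", SPECIES, "Cornus florida"),
]
livestock_data = [
    ("http://example.com/livestock/42", SPECIES, "Bos taurus"),
]

def species_of_everything(*datasets):
    """A shared service needs to know only the one SPECIES IRI."""
    return sorted({v for data in datasets for s, p, v in data if p == SPECIES})

print(species_of_everything(tree_data, livestock_data))
# ['Bos taurus', 'Cornus florida', 'Quercus macrocarpa']
```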

Property Subjects

The tree database needs to track whether a tree is deciduous, shedding its leaves every year.

  • Tree with ID 456 is deciduous.
  • But every bur oak (Quercus macrocarpa) is deciduous, not just Tree #456.
  • “deciduous” should be a property of the species Quercus macrocarpa, not of “Tree #456”.

A data framework rigorously identifies the subject: the thing being described.
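Treating the species as a subject in its own right is what makes this possible: the tree's `species` value becomes a reference to another subject instead of a plain text name, and `deciduous` is stated once about that subject. The identifiers and the `value` helper here are hypothetical.

```python
# The species is a subject too, so "deciduous" is stated once about
# the species rather than repeated on every tree. Identifiers are hypothetical.
triples = {
    ("tree/456", "species", "species/Quercus_macrocarpa"),
    ("tree/789", "species", "species/Cornus_florida"),
    ("species/Quercus_macrocarpa", "deciduous", True),
    ("species/Cornus_florida", "deciduous", True),
}

def value(subject, prop):
    """Return the first value found for the subject's property, or None."""
    return next((v for s, p, v in triples if s == subject and p == prop), None)

# The fact is about the species; the tree only points at its species.
species = value("tree/456", "species")
print(species, value(species, "deciduous"))  # species/Quercus_macrocarpa True
print(value("tree/456", "deciduous"))        # None: not stated about the tree
```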

Ontologies

A new property, “salary”, is introduced to describe employees.

  • How does the program user interface know that a tree should not have a salary?
  • How does the program user interface know that a salary value should be a decimal number and not a species name?

A data framework provides a means for defining the types and properties themselves:

  • Which types of things a property can be assigned to.
  • What types of values a property may be assigned.
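RDF Schema calls these two declarations a property's domain and range. The sketch below treats them as constraints a user interface could check; note that RDFS itself uses domain and range to infer types rather than to reject data, and constraint checking is closer to what a validation language such as SHACL does. All type and property names, and the `problems` function, are hypothetical.

```python
from decimal import Decimal

# Each property declares the type of subject it applies to ("domain")
# and the type of value it accepts ("range"). Names are hypothetical.
PROPERTIES = {
    "salary": {"domain": "Employee", "range": Decimal},
    "species": {"domain": "Tree", "range": str},
}

subject_types = {"employee/1": "Employee", "tree/456": "Tree"}

def problems(subject, prop, value):
    """Return the reasons a statement violates its property definition."""
    spec = PROPERTIES[prop]
    found = []
    if subject_types.get(subject) != spec["domain"]:
        found.append(f"{prop} does not apply to a {subject_types.get(subject)}")
    if not isinstance(value, spec["range"]):
        found.append(f"{prop} expects {spec['range'].__name__}")
    return found

print(problems("employee/1", "salary", Decimal("52000.00")))  # []
print(problems("tree/456", "salary", Decimal("1.00")))
# ['salary does not apply to a Tree']
print(problems("employee/1", "salary", "Quercus macrocarpa"))
# ['salary expects Decimal']
```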

Reasoning

A user needs to determine whether Tree #456 is deciduous.

  • The data indicates that Tree #456 is a bur oak (Quercus macrocarpa).
  • The data indicates that all bur oaks are deciduous.
  • Therefore Tree #456 must be deciduous.
  • This is logically implied by the data.

A data framework provides a means for defining semantic relationships among the types and properties, so that a reasoning system can make logical inferences.
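A minimal sketch of such inference is forward chaining: apply rules to the known facts until nothing new can be derived. The single rule below (if X has species S and S is deciduous, then X is deciduous) is hard-coded for illustration; a real reasoner would work from rules or axioms supplied by the ontology. Identifiers and the `infer` function are hypothetical.

```python
# Stored facts: the tree's species, and the species' deciduousness.
facts = {
    ("tree/456", "species", "species/Quercus_macrocarpa"),
    ("species/Quercus_macrocarpa", "deciduous", True),
}

def infer(facts):
    """Apply the hard-coded rule repeatedly until no new facts appear."""
    facts = set(facts)  # work on a copy; the stored data is left unchanged
    while True:
        # Rule: (X species S) and (S deciduous True) => (X deciduous True)
        new = {
            (x, "deciduous", True)
            for x, p, s in facts
            if p == "species" and (s, "deciduous", True) in facts
        } - facts
        if not new:
            return facts
        facts |= new

closure = infer(facts)
print(("tree/456", "deciduous", True) in closure)  # True
print(("tree/456", "deciduous", True) in facts)    # False: only implied
```

The inferred fact appears only in the returned closure, which matches the idea that the conclusion is implied by the data rather than recorded in it.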