Why is a Data Framework Needed?
“Metadata” Everywhere
- Web Pages
- Books
- Genetics Data
- Tree Database
Web Pages
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>The Tree Page</title>
<meta name="author" content="Jane Doe" />
<meta name="description" content="All about trees." />
</head>
<body>
…
</body>
</html>
EPUB Package
<package … unique-identifier="pub-id"> …
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="pub-id">
urn:uuid:a2f9981a-fc28-4f49-895f-3a1a8b12fc2c
</dc:identifier>
<dc:title>A Tale of Trees</dc:title>
<dc:publisher>Arbor Publishing</dc:publisher>
<dc:language>en</dc:language>
<meta property="dcterms:modified">
2017-09-17T03:29:46Z
</meta> …
GENCODE Data
gene_id | gene_name | level | chr | start | end |
---|---|---|---|---|---|
ENSG00000223972.5 | DDX11L1 | 2 | chr1 | 11869 | 14409 |
ENSG00000227232.5 | WASH7P | 2 | chr1 | 14404 | 29570 |
ENSG00000278267.1 | MIR6859-1 | 3 | chr1 | 17369 | 17436 |
ENSG00000243485.5 | MIR1302-2HG | 2 | chr1 | 29554 | 31109 |
ENSG00000284332.1 | MIR1302-2 | 3 | chr1 | 30366 | 30503 |
Tree Database
id | species | sproutYear |
---|---|---|
123 | Cercis canadensis | 2012 |
456 | Quercus macrocarpa | 1950 |
789 | Cornus florida | 1985 |
Adding Properties
Need to add a tree `transplanted` flag.
- Migrate the database schema?
- Change the hard-coded property list?
A data framework allows new properties to be added as needed, even properties not anticipated.
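As a minimal sketch of this idea, assume a hypothetical in-memory store of (subject, property, value) statements rather than any particular framework. The identifiers such as `tree/456` and the `describe` helper are illustrative only. A new property such as `transplanted` can then be recorded without migrating a schema or editing a hard-coded property list:

```python
# Hypothetical in-memory store: each fact is a (subject, property, value) statement.
# No fixed column list exists, so a new property needs no schema migration.
statements = [
    ("tree/123", "species", "Cercis canadensis"),
    ("tree/123", "sproutYear", 2012),
    ("tree/456", "species", "Quercus macrocarpa"),
    ("tree/456", "sproutYear", 1950),
]


def describe(subject):
    """Collect every property recorded for a subject into a dict."""
    return {prop: value for subj, prop, value in statements if subj == subject}


# Adding a property nobody anticipated is just one more statement.
statements.append(("tree/456", "transplanted", True))

print(describe("tree/456"))
```

Trees without the new flag are unaffected: `describe("tree/123")` simply has no `transplanted` entry.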
Property Names
Your program adds a tree `% mature when transplanted` data point.
- Do downstream processing applications support `%` in property names?
- Do services choke when properties contain spaces?
- Does the main program even support this name format throughout its modules?
- How would you know?
A data framework provides rules for consistent, valid property names.
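To illustrate what such a rule might look like, the sketch below checks names against an assumed grammar (a letter followed by letters or digits). The pattern and the function name `is_valid_property_name` are hypothetical. A real framework would define its own naming rules.

```python
import re

# Hypothetical naming rule: a letter, then letters or digits (camelCase style).
# A real framework would define its own grammar; this pattern is an assumption.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_property_name(name: str) -> bool:
    """Return True if the whole name matches the assumed naming rule."""
    return VALID_NAME.fullmatch(name) is not None


for candidate in ["sproutYear", "% mature when transplanted", "percentMatureWhenTransplanted"]:
    print(candidate, is_valid_property_name(candidate))
```

With one shared rule, every module and downstream service can reject `% mature when transplanted` at the same point instead of failing in different ways.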
Property Identity
Your program encounters two different data points named `mold`.
- The tree manager indicates whether a scaffold device is used to mold the shape of the tree.
- The tree doctor indicates whether sooty mold was found on the leaves.
A data framework provides a mechanism for unambiguously identifying properties.
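One common mechanism is to qualify each property name with a namespace. The sketch below assumes two placeholder namespaces under `example.com`; these are not real vocabularies, and the variable names are illustrative.

```python
# Hypothetical namespaces; the example.com URLs are placeholders, not real vocabularies.
MANAGER_NS = "https://example.com/tree-manager/"
DOCTOR_NS = "https://example.com/tree-doctor/"

# Both properties have the local name "mold", but their full identifiers differ.
uses_shaping_mold = MANAGER_NS + "mold"
has_sooty_mold = DOCTOR_NS + "mold"

tree_456 = {
    uses_shaping_mold: False,  # no scaffold device shapes this tree
    has_sooty_mold: True,      # sooty mold was found on the leaves
}

print(tree_456[uses_shaping_mold], tree_456[has_sooty_mold])
```

Because the full identifiers differ, both values can be stored on the same tree without one overwriting the other.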
Property Vocabularies
A third party already provides a property for indicating species.
- The livestock department is already using the `species` property from the biology vocabulary.
- Both departments want a common service for requesting additional per-species data.
A data framework provides a mechanism for integrating third-party vocabularies.
Property Subjects
The tree database needs to track whether a tree is deciduous, shedding its leaves every year.
- Tree with ID `456` is deciduous.
- But all oak trees are deciduous.
`deciduous` should be a property of the species Quercus macrocarpa, not of “Tree #456”.
A data framework rigorously indicates the thing being described.
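A minimal sketch of the distinction, assuming hypothetical identifiers for the tree and the species and a small illustrative `value_of` helper: the `deciduous` fact is stated once, on the species, and the individual tree only points to its species.

```python
# Hypothetical identifiers for the things being described.
TREE_456 = "tree/456"
BUR_OAK = "species/Quercus-macrocarpa"

statements = {
    (TREE_456, "species", BUR_OAK),   # describes the individual tree
    (TREE_456, "sproutYear", 1950),   # describes the individual tree
    (BUR_OAK, "deciduous", True),     # describes the species, stated once
}


def value_of(subject, prop):
    """Return the value stated for (subject, prop), or None if nothing is stated."""
    for subj, p, value in statements:
        if subj == subject and p == prop:
            return value
    return None


species = value_of(TREE_456, "species")
print(value_of(species, "deciduous"))   # looked up on the species
print(value_of(TREE_456, "deciduous"))  # nothing is stated on the tree itself
```

Any other bur oak added later needs only a `species` statement; the `deciduous` fact does not have to be repeated per tree.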
Ontologies
A new property `salary` is introduced to describe employees.
- How does the program user interface know that a tree should not have a `salary`?
- How does the program user interface know that a `salary` value should be a decimal value, and not a species name?
A data framework provides a means for defining the types and properties themselves:
- Which types of things a property can be assigned to.
- What types of values a property may be assigned.
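The two kinds of definition above can be sketched as a small lookup table. The table layout, the type names `Employee` and `Tree`, and the `check` function are assumptions made for illustration, not the definitions of any particular framework.

```python
from decimal import Decimal

# Hypothetical property definitions: which type a property applies to (domain)
# and which Python type its value must have (range).
PROPERTY_DEFINITIONS = {
    "salary": {"domain": "Employee", "range": Decimal},
    "species": {"domain": "Tree", "range": str},
}


def check(subject_type, prop, value):
    """Return a list of problems with assigning prop=value to a subject of subject_type."""
    definition = PROPERTY_DEFINITIONS.get(prop)
    if definition is None:
        return [f"unknown property {prop!r}"]
    problems = []
    if subject_type != definition["domain"]:
        problems.append(f"{prop!r} does not apply to {subject_type}")
    if not isinstance(value, definition["range"]):
        problems.append(f"{prop!r} expects a {definition['range'].__name__} value")
    return problems


print(check("Employee", "salary", Decimal("52000.00")))
print(check("Tree", "salary", "Quercus macrocarpa"))
```

A user interface could consult the same definitions to hide the `salary` field when editing a tree and to offer a numeric input when editing an employee.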
Reasoning
A user needs to determine whether Tree #456 is deciduous.
- The data indicates that Tree #456 is an oak tree.
- The data indicates that all oak trees are deciduous.
- Therefore Tree #456 must be deciduous.
- This is logically implied by the data.
A data framework provides a means for defining semantic relationships among the types and properties, so that a reasoning system can make logical inferences.
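The inference above can be sketched as following "is a kind of" relationships between types. The type names (`OakTree`, `DeciduousTree`, `Tree`) and the `inferred_types` function are hypothetical; a real reasoning system would work from the framework's own ontology definitions.

```python
# Hypothetical facts: tree/456 is an OakTree, and every OakTree is a DeciduousTree.
types = {"tree/456": {"OakTree"}}
subclass_of = {"OakTree": {"DeciduousTree"}, "DeciduousTree": {"Tree"}}


def inferred_types(thing):
    """Return the stated types plus every type implied by the subclass relationships."""
    result = set()
    pending = list(types.get(thing, ()))
    while pending:
        current = pending.pop()
        if current not in result:
            result.add(current)
            pending.extend(subclass_of.get(current, ()))
    return result


print("DeciduousTree" in inferred_types("tree/456"))
```

Nothing in the data states directly that Tree #456 is deciduous; the answer follows from the two stated facts.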