Uniform Resource Framework (URF) Specification

Author
Garret Wilson (GlobalMentor, Inc.)
Version
Draft 2020-02-01

Introduction

The Uniform Resource Framework (URF) provides a general, simple, and consistent way of representing data and their relationships. URF is useful for data storage, data interchange, data querying, and logical inference. As a data framework, URF allows a similar yet simpler representation than the Resource Description Framework (RDF). Together with its interchange formats, URF provides a powerful and concise replacement for data-oriented XML and JSON.

Serialization

This section is non-normative.

This document defines URF as an abstract data model. URF provides several official formats, specified in separate documents, including:

TURF
The canonical text serialization.
SURF
A simplified text serialization.
COLURF
A space-efficient, comma-separated, tabular text format.
NURF
A propositional statement-oriented representation.
CSV URF
A rigorous format for representing URF in a standard CSV format; intended primarily for importing legacy data.

Examples in this document may present URF instances in TURF and/or NURF for illustration purposes.

Conventions Used in this Document

The key words “must”, “must not”, “required”, “shall”, “shall not”, “should”, “should not”, “recommended”, “may”, and “optional” in this document are to be interpreted as described in RFC 2119. Parts of this specification marked as notes and annotations are non-normative.

Model

Everything that is described in URF is referred to as a resource. Each resource is the instance of some type. A resource may also be uniquely identified globally by a tag.

Figure: URF resource.

Statements

Statements assert that resources have property values.

Figure: URF resource, property, and value.

Resources are described by a set of statements, each claiming that some resource subject has a property with some property value.

A particular group of statements is called an URF instance. Each statement exists in some URF knowledge community. The root statements of an URF instance are part of the instance community and may be considered to be asserted in the same knowledge community as the instance community of all other URF instances.

Tags

URF identifies a resource using a tag, which takes the form of an IRI as defined in RFC 3987. Although a tag may be a URL, this specification does not define whether the location indicated by the tag indicates any actual, retrievable content; an URF tag functions solely as an identifier. A resource with no identifying tag is referred to as an anonymous resource.

Names

If a resource has a hierarchical tag using a path, the last path component, along with the decoded IRI fragment if present, is considered the name of the resource if it meets all of the following criteria:

The sequence of Unicode code points in a name must follow Normalization Form C (NFC) as per UAX #15.

For example, the resource with the tag https://urf.name/Tree has the name Tree, and the resource with the tag https://urf.name/Tree#456 has the name Tree#456. However, the resource with the tag https://urf.name/urf/String#foo%20bar has no name, because the decoded fragment foo bar does not meet the production of name_id_token above.

IDs

A tag that contains a fragment identifier is referred to as an ID tag, and the decoded fragment part of the tag is considered the ID of the resource, whether or not the resource has a name. Thus the resource with the tag https://urf.name/Tree#456 has the ID 456, and the resource with the tag https://urf.name/urf/String#foo:bar has the ID foo:bar, even though the latter resource has no name.
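
Note (non-normative): extracting an ID from a tag could be sketched as follows. This is an illustration only; it assumes the tag is already a syntactically valid IRI, and the function name resource_id is hypothetical rather than part of URF.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def resource_id(tag: str) -> Optional[str]:
    """Return the decoded fragment of an ID tag, or None if the tag has no fragment.

    Hypothetical helper for illustration; not defined by URF.
    """
    # A tag is an ID tag only if it contains a fragment identifier.
    if "#" not in tag:
        return None
    # The ID is the fragment with percent-encoding decoded.
    return unquote(urlsplit(tag).fragment)
```

Under these assumptions, resource_id("https://urf.name/urf/String#foo:bar") would yield foo:bar, whether or not the resource has a name.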

Namespaces

For those resources with names, the remaining part of the tag path (the part before the last path component) is considered the namespace of the resource. Namespaces should be used to group resources with related semantics.

Each additional path component of a tag path indicates a subnamespace inside the previous namespace. Thus the namespace identified by the IRI https://urf.name/bio/ is a subnamespace of that identified by https://urf.name/. Each subnamespace path component should follow the syntax rules of a base name.
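
Note (non-normative): the relationship between a tag, its namespace, and subnamespaces might be sketched as below. The sketch assumes the tag belongs to a resource that has a name (the name criteria are not checked here) and that namespace IRIs end in a SOLIDUS; the function names namespace_of and is_subnamespace are hypothetical.

```python
def namespace_of(tag: str) -> str:
    """Return the part of the tag before the last path component.

    Assumes the tag identifies a resource that has a name; this sketch does
    not verify the name criteria.
    """
    # Drop any fragment so that only the path remains.
    without_fragment = tag.partition("#")[0]
    # Keep everything up to and including the last path separator.
    return without_fragment[: without_fragment.rfind("/") + 1]


def is_subnamespace(candidate: str, parent: str) -> bool:
    """Return True if candidate lies strictly inside parent.

    Assumes both namespace IRIs end with "/".
    """
    return candidate != parent and candidate.startswith(parent)
```

With these assumptions, the tag https://urf.name/Tree#456 maps to the namespace https://urf.name/, and https://urf.name/bio/ is reported as a subnamespace of https://urf.name/.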

Ad Hoc Namespace

The namespace https://urf.name/ is the ad hoc namespace. Applications that need a low-overhead naming scheme may use resource names placed directly within this namespace. Names placed in the ad hoc namespace have the potential to clash if used with other vocabularies that also use the ad hoc namespace. The ad hoc namespace should only be used for applications for which the identity of the resource is unambiguous in the context.

The resource with tag https://urf.name/Tree#456 is an example of a resource in the ad hoc namespace.

Informal Namespaces

Subnamespaces of the ad hoc namespace are considered informal namespaces. Applications that need to mix vocabularies but still do not require or desire the overhead of formal namespace definition may use resource names placed within informal namespaces. Informal namespaces should attempt to avoid name clashes by using subnamespace components that reflect well-known vocabulary identifiers (such as https://urf.name/refseq/ to annotate information in the NCBI Reference Sequence Database); or by using subnamespace components corresponding to domain components which the vocabulary author controls (such as https://urf.name/globalmentor for GlobalMentor, Inc., which controls the globalmentor.com domain).

The following namespaces which would otherwise be informal namespaces are reserved:

https://urf.name/example/
Reserved for use as examples in documentation or for private testing.
https://urf.name/space/
Reserved for declaring namespaces using directives in several URF formats.
https://urf.name/urf/
Reserved for use by the URF specifications.

Formal Namespaces

All other namespaces constitute formal namespaces defined by third parties. For example, the Dublin Core Metadata Initiative defines the formal namespace http://purl.org/dc/elements/1.1/ for information in the Dublin Core Metadata Element Set, Version 1.1.

Handles

A handle represents a shorter yet unambiguous string representation of a resource tag. For tags in the https://urf.name/urf/ namespace or one of its subnamespaces, the resource handle is the same as its name, with zero or more segments prepended, each separated by the HYPHEN-MINUS character - (U+002D). Each segment represents the decoded path segment of the https://urf.name/urf/ (sub)namespace, provided each segment meets the name_token production.

A resource whose tag is not in the https://urf.name/urf/ namespace or one of its subnamespaces may have a handle consisting of its name and some namespace alias prefix, separated by the SOLIDUS character / (U+002F), provided every character in the namespace alias meets the name_token production. A handle with a namespace alias must only be used in a context in which the namespace alias has been associated with a valid namespace, and must not contain any segments.

Blank Tags

If a resource has no tag, but a system has a need for some stronger identification in some context, a system may generate a blank tag to represent the resource. A blank tag is formed by resolving some IRI fragment identifier to the IRI https://urf.name/urf/. The fragment identifier must be unique for all blank tags within the context with which the blank tag will be shared. For example, a system might choose to generate a UUID according to RFC 4122 as the fragment identifier, forming the blank tag https://urf.name/urf/#330dca65-c7aa-49ab-b691-70af2b60ce03. Although blank tags have IDs, they are not in the https://urf.name/urf/ namespace—and indeed have no namespace—as blank tags have no names.
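
Note (non-normative): one way a system might generate blank tags, using a random UUID as the fragment identifier as in the example above. The constant and function names are hypothetical; any other fragment that is unique within the sharing context would serve equally well.

```python
import uuid

# The IRI against which blank tag fragments are resolved.
BLANK_TAG_BASE = "https://urf.name/urf/"


def new_blank_tag() -> str:
    """Return a blank tag whose fragment is a freshly generated random UUID.

    Hypothetical helper; URF only requires that the fragment be unique within
    the context in which the blank tag is shared.
    """
    return f"{BLANK_TAG_BASE}#{uuid.uuid4()}"
```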

Types

Each URF resource is the instance of some type. If a resource is not an instance of any more specific type, it is of type urf-Resource.

ID Tag Types

An ID tag indicates both the unique tag and the type of the resource, with the type being the part of the tag with the fragment removed. For example, the resource identified by the tag https://urf.name/Tree#456 (Tree#456) is of type Tree (itself a resource, identified by the tag https://urf.name/Tree). The resource ID must be globally unique across all resources of the same type. A resource ID will likely not be globally unique for resources of other types.
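
Note (non-normative): deriving the type tag from an ID tag amounts to removing the fragment. A minimal sketch, in which the function name type_tag is hypothetical:

```python
def type_tag(id_tag: str) -> str:
    """Return the tag of the type indicated by an ID tag.

    Hypothetical helper: the type is the tag with its fragment removed.
    """
    base, separator, _resource_id = id_tag.partition("#")
    if not separator:
        # Without a fragment the tag is not an ID tag and indicates no type.
        raise ValueError("not an ID tag: no fragment identifier")
    return base
```

For the tag https://urf.name/Tree#456 this yields https://urf.name/Tree, the tag of the type Tree.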

Lexical ID Types

The resources of some types are identified by ID tags that use the canonical lexical representation of the resource as the resource ID. For example, the urf-Boolean value true is identified by the tag https://urf.name/urf/Boolean#true (urf-Boolean#true). Similarly, the integer 5, which in URF is an instance of the urf-Integer type, is identified by the tag https://urf.name/urf/Integer#5 (urf-Integer#5).

Classes

Resources that define types are called classes and are instances of the resource urf-Class, which is itself an instance of urf-Class. A class can be a specialization or subclass of another class. The class urf-Integer, for example, is a subclass of urf-Number. The class urf-Class is a subclass of urf-Resource.

Properties

Each resource has zero or more properties. Each property is a resource and must have a tag. Each property forms a statement by associating a subject resource with a single property value, a resource which may be anonymous.

If the base name of a property ends in the PLUS SIGN + (U+002B), it is considered an n-ary property, representing a one-to-many relationship between a subject and a property value. Otherwise the property is a binary property, representing a one-to-one relationship between a subject and a property value. An n-ary property may associate the same subject with more than one value in an URF instance, while a binary property must associate a subject with at most one value.

Merging

To merge two URF instances is to combine the contents of the source URF instance with the contents of the destination URF instance to produce a result URF instance. A value of a binary property for a subject in the source URF instance should replace the value of that property for the same subject in the destination URF instance. An additional value of an n-ary property for a subject in the source URF instance should be added to the values of that property for the same subject in the destination URF instance.
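
Note (non-normative): the merge rules could be sketched as follows. The data layout is an assumption made purely for illustration: an instance is modeled as a dictionary from (subject, property) pairs to lists of values, all given as strings, and a property is treated as n-ary when its identifier ends in +, which ignores properties carrying an ID fragment. The names Instance, is_nary, and merge are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: (subject, property) -> property values.
Instance = Dict[Tuple[str, str], List[str]]


def is_nary(prop: str) -> bool:
    # Simplifying assumption: the property identifier has no fragment, so its
    # base name is the end of the string.
    return prop.endswith("+")


def merge(source: Instance, destination: Instance) -> Instance:
    """Return a new instance combining source into destination."""
    # Copy the destination so that neither input is modified.
    result: Instance = {key: list(values) for key, values in destination.items()}
    for (subject, prop), values in source.items():
        if is_nary(prop):
            # n-ary property: add only values that are not already present.
            existing = result.setdefault((subject, prop), [])
            for value in values:
                if value not in existing:
                    existing.append(value)
        else:
            # Binary property: the source value replaces the destination value.
            result[(subject, prop)] = list(values)
    return result
```

Returning a new result rather than updating the destination in place mirrors the wording above, which distinguishes the result URF instance from both inputs.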

Ontologies

URF Ontology

Namespace
https://urf.name/urf/

Classes

urf-Binary

Represents an arbitrary sequence of octets. This is a lexical ID type. The canonical lexical form is the Base 64 encoding of the bytes as defined in RFC 4648, using the “base64url” alphabet with no padding.
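
Note (non-normative): producing and reading the canonical lexical form might look like the following sketch, which uses the Python standard library base64 module. The function names are hypothetical.

```python
import base64


def binary_lexical_form(data: bytes) -> str:
    """Encode octets with the base64url alphabet and strip the padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def parse_binary(lexical: str) -> bytes:
    """Decode an unpadded base64url string back into octets."""
    # The standard library decoder expects padding, so restore it first.
    padding = "=" * (-len(lexical) % 4)
    return base64.urlsafe_b64decode(lexical + padding)
```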

urf-Boolean

Represents a Boolean value of true or false. This is a lexical ID type. The canonical lexical forms are true and false, respectively.

urf-Character

Represents a Unicode code point. This is a lexical ID type. The canonical lexical form is the character represented by the code point.

urf-Class

A type for generalizing instances of resources. Not every resource used as a class is an instance of the class urf-Class, but a resource may be declared to be of the class type to further specify its semantics. The class urf-Class is a subclass of the class urf-Resource and an instance of urf-Class itself.

urf-Collection

The parent class for all collection resources: urf-List, urf-Map, and urf-Set. Collection resources represent abstract data types that can hold other resources.

urf-Decimal

Decimal numbers consist of all rational numbers that have a finite decimal representation. The semantics of this type prohibit rounding the fractional part within the supported range.

This is a lexical ID type. The canonical lexical form is identical to that of urf-Number.

urf-Duration

TODO Duration resources represent lengths of time. They use an inline namespace with a lexical form consistent with [RFC 2445] and [ISO 8601] of P[nYnMnD][TnHnMn[.n]S], where n is some positive number of roman digits and at least the date or time section is present.

urf-EmailAddress

Represents an email address. This is a lexical ID type. The canonical lexical form is the “addr-spec” format specified in RFC 5322. The email must not include any obsolete elements (those starting with the prefix “obs-”) in RFC 5322. The lexical form must not include any “comments” or “folding white space” as defined by RFC 5322.

urf-Instant

An instant represents a normalized time of day on a particular calendar date. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-Integer

Integer resources are the positive whole numbers, the negative whole numbers, and zero. This is a lexical ID type. The canonical lexical form is defined by the production rule:

This class is a subclass of the class urf-Number.

urf-Iri

Represents an Internationalized Resource Identifier (IRI) as defined in RFC 3987. This is a lexical ID type, and the canonical lexical form is the string representation of the IRI itself.

urf-Language

TODO Language resources represent human languages and use inline namespace URIs. The lexical form of each is the corresponding language tag described in [RFC 4646].

urf-List

List resources are resources that contain other element resources at certain indexes of the list. A list, like normal resources, may have any property, but the properties representing the elements of the list are of type urf-Ordinal, each representing the ordinal index of the element. That is, if a list contains an element at index 5, the element resource will appear as a value of the property urf-Ordinal#5. Although many use cases will prefer a continuous, unduplicated sequence of index properties beginning with urf-Ordinal#0, this is not an URF requirement.
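
Note (non-normative): a list's elements can be pictured as a mapping from ordinal property handles to element values. The sketch below models properties as a plain dictionary of strings, which is an assumption for illustration only, and uses a zero-based unbroken sequence when building a list, which is one common choice rather than a requirement. The function names are hypothetical.

```python
from typing import Dict, List

ORDINAL_PREFIX = "urf-Ordinal#"


def list_to_properties(elements: List[str]) -> Dict[str, str]:
    """Represent each element as the value of the ordinal property for its index."""
    return {f"{ORDINAL_PREFIX}{index}": element for index, element in enumerate(elements)}


def elements_in_order(properties: Dict[str, str]) -> List[str]:
    """Return the element values sorted by ordinal index.

    Properties that are not ordinals are ignored, and gaps between indexes
    are tolerated.
    """
    indexed = [
        (int(prop[len(ORDINAL_PREFIX):]), value)
        for prop, value in properties.items()
        if prop.startswith(ORDINAL_PREFIX)
    ]
    return [value for _index, value in sorted(indexed, key=lambda pair: pair[0])]
```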

urf-LocalDate

Represents a date with floating offset as per RFC 2445. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-LocalDateTime

Represents a date and time with floating offset as per RFC 2445. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-LocalTime

Represents a time with floating offset as per RFC 2445. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-Map

A resource containing associations between keys and values. Each association is represented by an instance of urf-MapEntry, which is a value of the urf-member+ property of the map resource.

urf-MapEntry

A resource representing a key-value mapping between the value of the urf-key property and the value of the urf-value property.

urf-MediaType

A media type resource is an Internet media type described by RFC 6838. Internet media types are also known as MIME types and content types. Media types are lexical ID types with a canonical form that follows RFC 6838 with the following normalizations:

urf-MonthDay

Represents a calendar month and day without respect to any time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-Number

The base class for all numerical values. Numbers are lexical ID types and include urf-Decimal and urf-Integer.

The canonical lexical form for urf-Number is the base 10 representation of the number value that may be negative and may be fractional.

The canonical lexical form has:

urf-OffsetDate

Represents a date with a fixed offset from UTC. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-OffsetDateTime

Represents a date and time with a fixed offset from UTC. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-OffsetTime

Represents a time with a fixed offset from UTC. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-Ordinal

Ordinal resources are numbers that represent the position of an element in a sequence. URF currently only supports finite ordinals, which means that there will be a corresponding ordinal for every positive whole number and for zero. This is a lexical ID type. The canonical lexical form is defined by the production rule:

This class is a subclass of the class urf-Number.

urf-Property

Not every resource used as a property is an instance of the class urf-Property, but a resource may be declared to be of the urf-Property type to further specify its semantics and expected domain and range. The class urf-Property is a subclass of the class urf-Class.

urf-RegularExpression

Represents a regular expression, a text-based pattern that defines rules for the content of strings. This is a lexical ID type, and the canonical lexical form is the string representation of the regular expression itself. TODO add support for flags TODO reference regex specification, perhaps ISO/IEC 9945-2 or http://pubs.opengroup.org/onlinepubs/007908799/xbd/re.html

urf-Resource

Every resource is implicitly an instance of the class urf-Resource or one of its descendant classes.

urf-Set

TODO Set resources are resources that contain at most one instance of other element resources. A set, like normal resources, may have any property, but the properties representing the elements of the set appear as values of the urf-member+ property.

urf-String

Represents a sequence of Unicode code points. This is a lexical ID type, and the canonical lexical form is the string itself, normalized to Normalization Form C (NFC) as per UAX #15.
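
Note (non-normative): normalization to NFC is available in many standard libraries. A minimal sketch using the Python unicodedata module, with a hypothetical function name:

```python
import unicodedata


def string_lexical_form(text: str) -> str:
    """Return the text normalized to Unicode Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", text)
```

For instance, the two-code-point sequence U+0065 U+0301 (e followed by a combining acute accent) normalizes to the single code point U+00E9.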

urf-TelephoneNumber

Represents a telephone number. This is a lexical ID type. The canonical lexical form is the “global number” format prescribed by RFC 3966. The canonical representation must not include any “visual separators” as defined by RFC 3966.

urf-Temporal

The parent class of all the temporal types, representing date and/or time information based on ISO 8601. These are all lexical ID types and follow the productions below combined with their own productions. Time zone names tz are from the IANA TZ database and are case-sensitive.

TODO add support for durations

urf-Uuid

Represents a Universally Unique IDentifier (UUID) adhering to the requirements of RFC 4122. This is a lexical ID type. The canonical lexical form is the “UUID” production given in RFC 4122.

urf-Year

Represents a calendar year without respect to any time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-YearMonth

Represents a calendar year and a month without respect to any time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

urf-ZonedDateTime

Represents a date and time relative to a specific IANA TZ time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:

Properties

urf-key

Represents the key in a key-value association, usually of an instance of urf-MapEntry. This is a subclass of urf-Property.

urf-member+

Represents an aggregation association between two resources, in which the subject resource “contains” one or more resources. This is a subclass of urf-Property.

urf-type

Represents the type of a resource. This is a subclass of urf-Property. An URF resource is only allowed one urf-type, analogous to the type used by programming languages to create an “instance” of a class.

urf-value

Represents the value in a key-value association, usually of an instance of urf-MapEntry. This is a subclass of urf-Property.

Content Ontology

Namespace
https://urf.name/content/

Classes

content-Charset

TODO The name of the mapping of integer values to a set of characters. This is equivalent to the charset Internet media type parameter described by [RFC 2046] and further elaborated in [RFC 2278] Section 2.3. A charset encapsulates both the concept of a coded character set and a character encoding scheme, as specified in [RFC 2130] Section 3.2. The lexical form is the canonical charset name specified by [IANA Charset Registry], such as UTF-16BE.

References

DCMI Namespace
Andy Powell and Harry Wagner. Namespace Policy for the Dublin Core Metadata Initiative (DCMI). Dublin Core Metadata Initiative, 2007.
FOAF
FOAF Vocabulary Specification, Dan Brickley, Libby Miller.
IANA Charset Registry
IANA Charset Registry. Internet Assigned Numbers Authority.
ISO 8601:2004
Data elements and interchange formats — Information interchange — Representation of dates and times, third edition, 2004-12-01. ISO.
RDF/XML
Dave Beckett. RDF/XML Syntax Specification (Revised). World Wide Web Consortium, 2006-02-10.
RFC 2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner (Harvard University). IETF.
RFC 2130
C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R. Atkinson, M. Crispin, and P. Svanberg. RFC 2130: The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996. Internet Engineering Task Force, 1997.
RFC 2278
N. Freed and J. Postel. RFC 2278: IANA Charset Registration Procedures. Internet Engineering Task Force, 1998.
RFC 2445
RFC 2445: Internet Calendaring and Scheduling Core Object Specification (iCalendar), F. Dawson, D. Stenerson. IETF.
RFC 3339
G. Klyne and C. Newman. RFC 3339: Date and Time on the Internet: Timestamps. Internet Engineering Task Force, 2002.
RFC 3966
The tel URI for Telephone Numbers, H. Schulzrinne (Columbia University). IETF.
RFC 3986
T. Berners-Lee, R. Fielding, and L. Masinter. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005.
RFC 3987
Internationalized Resource Identifiers (IRIs), M. Duerst (W3C), M. Suignard (Microsoft Corporation). IETF.
RFC 4122
A Universally Unique IDentifier (UUID) URN Namespace, P. Leach (Microsoft Corporation), M. Mealling (Refactored Networks, LLC), R. Salz (DataPower Technology, Inc.). IETF.
RFC 4627
D. Crockford. RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON). Internet Engineering Task Force, 2006.
RFC 4646
A. Phillips and M. Davis. RFC 4646: Tags for Identifying Languages. Internet Engineering Task Force, 2006.
RFC 4648
The Base16, Base32, and Base64 Data Encodings, S. Josefsson (SJD). IETF.
RFC 5322
Internet Message Format, P. Resnick, Ed. (Qualcomm Incorporated). IETF.
RFC 6838
Media Type Specifications and Registration Procedures, N. Freed (Oracle), J. Klensin, T. Hansen (AT&T Laboratories). IETF.
TZ
Time Zone Database. IANA.
UAX #15
Unicode® Standard Annex #15: Unicode Normalization Forms, Mark Davis, Ken Whistler. The Unicode Consortium.
Unicode BOM FAQ
Asmus Freytag and Mark Davis. Unicode Byte Order Mark (BOM) FAQ. Unicode, Inc., Retrieved 2006-06-07.
UTR #13
Mark Davis. Unicode Technical Report #13: Unicode Newline Guidelines. Unicode, Inc., 1999.
XML Schema 2
Paul V. Biron and Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. World Wide Web Consortium, 2004-10-28.

Acknowledgements