Uniform Resource Framework (URF) Specification
- Author
- Garret Wilson (GlobalMentor, Inc.)
- Version
- Draft 2020-02-01
Introduction
The Uniform Resource Framework (URF) provides a general, simple, and consistent way of representing data and their relationships. URF is useful for data storage, data interchange, data querying, and logical inference. As a data framework, URF allows a similar yet simpler representation than the Resource Description Framework (RDF). Together with its interchange formats, URF provides a powerful and concise replacement for data-oriented XML and JSON.
Serialization
This section is non-normative.
This document defines URF as an abstract data model. URF provides several official formats, specified in separate documents, including:
- TURF
- The canonical text serialization.
- SURF
- A simplified text serialization.
- COLURF
- A space-efficient, comma-separated, tabular text format.
- NURF
- A propositional statement-oriented representation.
- CSV URF
- A rigorous format for representing URF in a standard CSV format; intended primarily for importing legacy data.
Examples in this document may present URF instances in TURF and/or NURF for illustration purposes.
Conventions Used in this Document
The key words “must”, “must not”, “required”, “shall”, “shall not”, “should”, “should not”, “recommended”, “may”, and “optional” in this document are to be interpreted as described in RFC 2119. Parts of this specification marked as notes and annotations are non-normative.
Model
Everything that is described in URF is referred to as a resource. Each resource is the instance of some type. A resource may also be uniquely identified globally by a tag.
Statements
Resources are described by a set of statements, each claiming that some resource subject has a property with some property value.
A particular group of statements is called an URF instance. Each statement exists in some URF knowledge community. The root statements of an URF instance are part of the instance community and may be considered to be asserted in the same knowledge community as the instance community of all other URF instances.
Tags
URF identifies a resource using a tag, which takes the form of an IRI as defined in RFC 3987. Although a tag may be a URL, this specification does not define whether the location indicated by the tag indicates any actual, retrievable content; an URF tag functions solely as an identifier. A resource with no identifying tag is referred to as an anonymous resource.
Names
If a resource has a hierarchical tag using a path, the last path component, along with the decoded IRI fragment if present, is considered the name of the resource if it meets all of the following criteria:
- The non-fragment base name portion begins with a character from the Unicode Letter (L) category; followed by zero or more characters each from the Letter (L) category, from the Mark (M) category, from the Decimal_Number (Nd) category, and/or from the Connector_Punctuation (Pc) category; and optionally ending with the PLUS SIGN + (U+002B).
- The decoded fragment portion, if any, contains only characters each from the Letter (L) category, from the Mark (M) category, from the Decimal_Number (Nd) category, and/or from the Connector_Punctuation (Pc) category.
The sequence of Unicode code points in a name must follow Normalization Form C (NFC) as per UAX #15.
name_id_token ⇒ (Letter | Mark | Decimal_Number | Connector_Punctuation)+
name_token ⇒ Letter name_id_token?
nary ⇒ '+'
base_name ⇒ name_token nary?
name ⇒ base_name ['#' name_id_token]
For example, the resource with the tag https://urf.name/Tree has the name Tree, and the resource with the tag https://urf.name/Tree#456 has the name Tree#456. However, the resource with the tag https://urf.name/urf/String#foo%20bar has no name, because the decoded fragment foo bar does not match the name_id_token production above.
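The naming rules above can be sketched in Python. This is a minimal illustration, not a normative algorithm: the function names are hypothetical, the last path component is assumed to be used as-is (only the fragment is percent-decoded, as the text states), and a tag ending in an empty fragment is assumed to have no name.

```python
import unicodedata
from urllib.parse import unquote, urlsplit


def _is_id_char(ch):
    """True for Letter (L*), Mark (M*), Decimal_Number (Nd), Connector_Punctuation (Pc)."""
    category = unicodedata.category(ch)  # two-letter code such as "Lu" or "Nd"
    return category[0] in ("L", "M") or category in ("Nd", "Pc")


def is_name_id_token(text):
    """name_id_token: one or more identifier characters."""
    return len(text) > 0 and all(_is_id_char(ch) for ch in text)


def is_base_name(text):
    """base_name: a name_token optionally followed by '+'."""
    if text.endswith("+"):
        text = text[:-1]
    return (
        len(text) > 0
        and unicodedata.category(text[0]).startswith("L")
        and all(_is_id_char(ch) for ch in text)
    )


def resource_name(tag):
    """Return the name of the resource identified by tag, or None if it has none."""
    parts = urlsplit(tag)
    base = parts.path.rsplit("/", 1)[-1]  # last path component, not decoded (assumption)
    has_fragment = "#" in tag
    fragment = unquote(parts.fragment)
    if not is_base_name(base):
        return None
    if has_fragment and not is_name_id_token(fragment):
        return None
    name = base + ("#" + fragment if has_fragment else "")
    # A name must already be in Normalization Form C.
    if unicodedata.normalize("NFC", name) != name:
        return None
    return name
```

With these assumptions, resource_name("https://urf.name/Tree#456") yields Tree#456, while the tag https://urf.name/urf/String#foo%20bar yields None.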
IDs
A tag that contains a fragment identifier is referred to as an ID tag, and the decoded fragment part of the tag is considered the ID of the resource, whether or not the resource has a name. Thus the resource with the tag https://urf.name/Tree#456 has the ID 456, and the resource https://urf.name/urf/String#foo:bar has the ID foo:bar, even though the latter resource has no name.
Namespaces
For those resources with names, the remaining part of the tag path (the part before the last path component) is considered the namespace of the resource. Namespaces should be used to group resources with related semantics.
Each additional path component of a tag path indicates a subnamespace inside the previous namespace. Thus the namespace identified by the IRI https://urf.name/bio/ is a subnamespace of that identified by https://urf.name/. Each subnamespace path component should follow the syntax rules of a base name.
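As an illustration of how a namespace relates to a tag, the following Python sketch strips the last path component and compares namespace IRIs by prefix. The function names are hypothetical, and the sketch assumes the caller has already established that the resource has a name, since only named resources have a namespace.

```python
from urllib.parse import urlsplit, urlunsplit


def namespace_of(tag):
    """Return the namespace IRI of a named resource: the tag up to and
    including the '/' before the last path component, without the fragment."""
    parts = urlsplit(tag)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def is_subnamespace(candidate, parent):
    """True if candidate lies strictly inside parent (both must end in '/')."""
    return (
        candidate != parent
        and candidate.endswith("/")
        and parent.endswith("/")
        and candidate.startswith(parent)
    )
```

The Cell tag used in the checks below is a made-up example; only https://urf.name/bio/ itself comes from the text above.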
Ad Hoc Namespace
The namespace https://urf.name/ is the ad hoc namespace. Applications that need a low-overhead naming scheme may use resource names placed directly within this namespace. Names placed in the ad hoc namespace have the potential to clash if used with other vocabularies that also use the ad hoc namespace. The ad hoc namespace should only be used for applications for which the identity of the resource is unambiguous in the context.
The resource with tag https://urf.name/Tree#456 is an example of a resource in the ad hoc namespace.
Informal Namespaces
Subnamespaces of the ad hoc namespace are considered informal namespaces. Applications that need to mix vocabularies but still do not require or desire the overhead of formal namespace definition may use resource names placed within informal namespaces. Informal namespaces should attempt to avoid name clashes by using subnamespace components that reflect well-known vocabulary identifiers (such as https://urf.name/refseq/ to annotate information in the NCBI Reference Sequence Database); or by using subnamespace components corresponding to domain components which the vocabulary author controls (such as https://urf.name/globalmentor for GlobalMentor, Inc., which controls the globalmentor.com domain).
The following namespaces which would otherwise be informal namespaces are reserved:
https://urf.name/example/
- Reserved for use as examples in documentation or for private testing.
https://urf.name/space/
- Reserved for declaring namespaces using directives in several URF formats.
https://urf.name/urf/
- Reserved for use by the URF specifications.
Formal Namespaces
All other namespaces constitute formal namespaces defined by third parties. For example, the Dublin Core Metadata Initiative defines the formal namespace http://purl.org/dc/elements/1.1/ for information in the Dublin Core Metadata Element Set, Version 1.1.
Handles
A handle represents a shorter yet unambiguous string representation of a resource tag. For tags in the https://urf.name/urf/ namespace or one of its subnamespaces, the resource handle is the same as its name, with zero or more segments prepended, each separated by the HYPHEN-MINUS character - (U+002D). Each segment represents a decoded path component of the https://urf.name/urf/ (sub)namespace, provided each segment meets the name_token production.
For a tag not in the https://urf.name/urf/ namespace or one of its subnamespaces, the resource may have a handle consisting of its name and some namespace alias prefix, separated by the SOLIDUS character / (U+002F), provided the namespace alias meets the name_token production. A handle with a namespace alias must only be used in a context in which the namespace alias has been associated with a valid namespace, and must not contain any segments.
handle ⇒ [namespace_alias '/'] (segment '-')* name
namespace_alias ⇒ name_token
segment ⇒ name_token
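The handle grammar can be read as a small parser. The Python sketch below is illustrative only: the function names are hypothetical, it checks syntax without resolving a handle to a tag, and it relies on the observation that neither HYPHEN-MINUS nor SOLIDUS can occur inside a name, so splitting on those characters is unambiguous.

```python
import unicodedata


def _is_id_char(ch):
    """Letter, Mark, Decimal_Number, or Connector_Punctuation."""
    category = unicodedata.category(ch)
    return category[0] in ("L", "M") or category in ("Nd", "Pc")


def _is_name_id_token(text):
    return len(text) > 0 and all(_is_id_char(ch) for ch in text)


def _is_name_token(text):
    return _is_name_id_token(text) and unicodedata.category(text[0]).startswith("L")


def parse_handle(handle):
    """Split a handle into (namespace_alias, segments, name).

    Raises ValueError if the handle does not match the grammar."""
    alias, rest = None, handle
    if "/" in handle:
        alias, rest = handle.split("/", 1)
        if not _is_name_token(alias):
            raise ValueError("invalid namespace alias: " + alias)
    *segments, name = rest.split("-")
    if alias is not None and segments:
        raise ValueError("a handle with a namespace alias must not contain segments")
    if not all(_is_name_token(segment) for segment in segments):
        raise ValueError("invalid segment in handle: " + handle)
    base, separator, fragment = name.partition("#")
    if base.endswith("+"):
        base = base[:-1]  # optional n-ary marker
    if not _is_name_token(base) or (separator and not _is_name_id_token(fragment)):
        raise ValueError("invalid name in handle: " + handle)
    return alias, segments, name
```

For instance, parse_handle("urf-Integer#5") separates the segment urf from the name Integer#5. The alias dc used in the checks is an assumed alias, not one defined by this specification.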
Blank Tags
If a resource has no tag, but a system has a need for some stronger identification in some context, a system may generate a blank tag to represent the resource. A blank tag is formed by resolving some IRI fragment identifier to the IRI https://urf.name/urf/. The fragment identifier must be unique for all blank tags within the context with which the blank tag will be shared. For example, a system might choose to generate a UUID according to RFC 4122 as the fragment identifier, forming the blank tag https://urf.name/urf/#330dca65-c7aa-49ab-b691-70af2b60ce03. Although blank tags have IDs, they are not in the https://urf.name/urf/ namespace (and indeed have no namespace), as blank tags have no names.
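A minimal Python sketch of the UUID-based approach mentioned above follows. The function names are hypothetical, and a random (version 4) UUID is only one possible choice of unique fragment identifier.

```python
import uuid

BLANK_TAG_BASE = "https://urf.name/urf/"


def new_blank_tag():
    """Generate a blank tag using a random UUID as the fragment identifier."""
    return BLANK_TAG_BASE + "#" + str(uuid.uuid4())


def is_blank_tag(tag):
    """True if tag is the blank tag base IRI followed by a fragment."""
    return tag.startswith(BLANK_TAG_BASE + "#")
```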
Types
Each URF resource is the instance of some type. If a resource is not an instance of any more specific type, it is of type urf-Resource.
ID Tag Types
An ID tag indicates both the unique tag and the type of the resource, with the type being the part of the tag with the fragment removed. For example, the resource identified by the tag https://urf.name/Tree#456 (Tree#456) is of type Tree (itself a resource, identified by the tag https://urf.name/Tree). The resource ID must be globally unique across all resources of the same type. A resource ID will likely not be globally unique for resources of other types.
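Splitting an ID tag into its type tag and resource ID might look like the following Python sketch. The function name is hypothetical, and only the fragment is percent-decoded.

```python
from urllib.parse import unquote, urldefrag


def split_id_tag(tag):
    """Return (type_tag, resource_id) for an ID tag, or None if the tag has no fragment."""
    if "#" not in tag:
        return None
    type_tag, fragment = urldefrag(tag)
    return type_tag, unquote(fragment)
```

Applied to https://urf.name/Tree#456, this gives the type tag https://urf.name/Tree and the ID 456.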
Lexical ID Types
The resources of some types are identified by ID tags that use the canonical lexical representation of the resource as the resource ID. For example, the urf-Boolean value true is identified by the tag https://urf.name/urf/Boolean#true (urf-Boolean#true). Similarly the integer 5, which in URF is an instance of the urf-Integer type, is identified by the tag https://urf.name/urf/Integer#5 (urf-Integer#5).
Classes
Resources that define types are called classes and are instances of the resource urf-Class, which is itself an instance of urf-Class. A class can be a specialization or subclass of another class. The class urf-Integer, for example, is a subclass of urf-Number. The class urf-Class is a subclass of urf-Resource.
Properties
Each resource has zero or more properties. Each property is a resource and must have a tag. Each property forms a statement by associating a subject resource with a single property value, a resource which may be anonymous.
If the base name of a property ends in the PLUS SIGN + (U+002B), it is considered an n-ary property, representing a one-to-many relationship between a subject and a property value. Otherwise the property is a binary property, representing a one-to-one relationship between a subject and a property value. An n-ary property may associate the same subject with more than one value in an URF instance, while a binary property must associate a subject with at most one value.
Merging
To merge two URF instances is to combine the contents of the source URF instance with the contents of the destination URF instance to produce a result URF instance. A binary property present for a subject in the source URF instance should replace the value of that property for the same subject in the destination URF instance. Each additional value of an n-ary property for a subject in the source URF instance should be added to the values of that property for the same subject in the destination URF instance.
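The recommended merge behavior can be sketched in Python under a deliberately simple model. This is an assumption-laden illustration: an URF instance is represented as a dictionary mapping (subject tag, property tag) pairs to a list of values, plain strings stand in for value resources, and the function names and example tags are hypothetical.

```python
def is_nary(property_tag):
    """A property is n-ary if its base name (the tag without fragment) ends in '+'."""
    return property_tag.split("#", 1)[0].endswith("+")


def merge(destination, source):
    """Return a new instance combining source into destination.

    Binary properties from source replace the destination value;
    n-ary property values from source are added to the destination values."""
    result = {key: list(values) for key, values in destination.items()}
    for (subject, prop), values in source.items():
        if is_nary(prop):
            existing = result.setdefault((subject, prop), [])
            for value in values:
                if value not in existing:  # avoid duplicating an identical statement
                    existing.append(value)
        else:
            result[(subject, prop)] = list(values)
    return result


# Hypothetical example data in the reserved example namespace.
JANE = "https://urf.name/example/jane"
NAME = "https://urf.name/example/name"
NICKNAME = "https://urf.name/example/nickname+"

destination = {(JANE, NAME): ["Jane"], (JANE, NICKNAME): ["JD"]}
source = {(JANE, NAME): ["Jane Doe"], (JANE, NICKNAME): ["Janie"]}
merged = merge(destination, source)
```

Returning a new dictionary rather than mutating the destination mirrors the wording above, in which merging produces a separate result instance.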
Ontologies
URF Ontology
- Namespace
https://urf.name/urf/
Classes
urf-Binary
Represents an arbitrary sequence of octets. This is a lexical ID type. The canonical lexical form is the Base 64 encoding of the bytes as defined in RFC 4648, using the “base64url” alphabet with no padding.
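For illustration, the canonical lexical form of urf-Binary can be produced with the Python standard library as sketched below. The function names are hypothetical; the padding is restored before decoding because the standard decoder expects it.

```python
import base64


def binary_lexical_form(data):
    """Encode bytes using the base64url alphabet with padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def parse_binary(lexical):
    """Decode an unpadded base64url string back to bytes."""
    padding = "=" * (-len(lexical) % 4)
    return base64.urlsafe_b64decode(lexical + padding)
```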
urf-Boolean
Represents a Boolean value of true or false. This is a lexical ID type. The canonical lexical forms are true and false, respectively.
urf-Character
Represents a Unicode code point. This is a lexical ID type. The canonical lexical form is the character represented by the code point.
urf-Class
A type for generalizing instances of resources. Not every resource used as a class is an instance of the class urf-Class, but a resource may be declared to be of the class type to further specify its semantics. The class urf-Class is a subclass of the class urf-Resource and an instance of urf-Class itself.
urf-Collection
The parent class for all collection resources urf-List, urf-Map, and urf-Set. Collection resources represent abstract data types that can hold other resources.
urf-Decimal
Decimal numbers consist of all rational numbers that have a finite decimal representation. The semantics of this type prohibit rounding the fractional part within the supported range.
This is a lexical ID type. The canonical lexical form is identical to that of urf-Number.
urf-Duration
TODO Duration resources represent lengths of time. They use an inline namespace with a lexical form consistent with [RFC 2445] and [ISO 8601] of P[nYnMnD][TnHnMn[.n]S], where n is some positive number of roman digits and at least the date or time section is present.
urf-EmailAddress
Represents an email address. This is a lexical ID type. The canonical lexical form is the “addr-spec” format specified in RFC 5322. The email must not include any obsolete elements (those starting with the prefix “obs-”) in RFC 5322. The lexical form must not include any “comments” or “folding white space” as defined by RFC 5322.
urf-Instant
An instant represents a normalized time of day on a particular calendar date. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
instant ⇒ date 'T' time 'Z'
urf-Integer
Integer resources are the positive whole numbers, the negative whole numbers, and zero. This is a lexical ID type. The canonical lexical form is defined by the production rule:
- ['-'] digit+
This class is a subclass of the class urf-Number.
urf-Iri
Represents an Internationalized Resource Identifier (IRI) as defined in RFC 3987. This is a lexical ID type, and the canonical lexical form is the string representation of the IRI itself.
urf-Language
TODO Language resources represent human languages and use inline namespace URIs. The lexical form of each is the corresponding language tag described in [RFC 4646].
urf-List
List resources are resources that contain other element resources at certain indexes of the list. A list, like normal resources, may have any property, but the properties representing the elements of the list are of type urf-Ordinal, each representing the ordinal index of the element. That is, if a list contains an element at index 5, the element resource will appear as a value of the property urf-Ordinal#5. Although many use cases will prefer a continuous, unduplicated sequence of index properties beginning with urf-Ordinal#0, this is not an URF requirement.
urf-LocalDate
Represents a date with floating offset as per RFC 2445. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
local_date ⇒ date
urf-LocalDateTime
Represents a date and time with floating offset as per RFC 2445. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
local_date_time ⇒ date 'T' time
urf-LocalTime
Represents a time with floating offset as per RFC 2445. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
local_time ⇒ time
urf-Map
A resource containing associations between keys and values. Each association is represented by an instance of urf-MapEntry, which is a value of the urf-member+ property of the map resource.
urf-MapEntry
A resource representing a key-value mapping between the value of the urf-key property and the value of the urf-value property.
urf-MediaType
A media type resource is an Internet media type described by RFC 6838. Internet media types are also known as MIME types and content types. Media types are lexical ID types with a canonical form that follows RFC 6838 with the following normalizations:
- The type, the subtype, and each parameter name must be in lowercase.
- If any value for any parameter is case-insensitive, it must be in lowercase. This means that values of the charset parameter such as UTF-8, which is case-insensitive and used in HTTP in both uppercase and lowercase, must be in lowercase in the canonical lexical form.
- Parameters must be presented sorted primarily in lexicographic order of names and secondarily in lexicographic order of values, with no ABNF WSP around the parameters and their delimiters.
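The normalizations above can be sketched in Python as follows. This is a simplified illustration rather than a full RFC 6838 parser: the function name is hypothetical, parameter values are assumed not to be quoted strings containing semicolons or equals signs, and charset is assumed to be the only parameter with case-insensitive values unless the caller says otherwise.

```python
def canonical_media_type(text, case_insensitive_params=("charset",)):
    """Normalize a media type string to the canonical lexical form described above."""
    head, *raw_params = text.split(";")
    type_, _, subtype = head.strip().partition("/")
    params = []
    for raw in raw_params:
        if not raw.strip():
            continue  # tolerate a stray trailing ';'
        name, _, value = raw.partition("=")
        name = name.strip().lower()
        value = value.strip()
        if name in case_insensitive_params:
            value = value.lower()
        params.append((name, value))
    params.sort()  # by name first, then by value
    canonical = type_.lower() + "/" + subtype.lower()
    # No whitespace around parameters or their delimiters.
    return canonical + "".join(";" + name + "=" + value for name, value in params)
```

For example, the input Text/HTML; Charset=UTF-8 becomes text/html;charset=utf-8.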
urf-MonthDay
Represents a calendar month and day without respect to any time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
month_day ⇒ '-' '-' MM '-' DD
urf-Number
The base class for all numerical values. Numbers are lexical ID types and include urf-Decimal and urf-Integer.
The canonical lexical form for urf-Number is the base 10 representation of the number value that may be negative and may be fractional.
number ⇒ ['-'] whole [fraction] [exponent]
whole ⇒ digit+
fraction ⇒ '.' digit+
exponent ⇒ 'e' ['-' | '+'] digit+
digit ⇒ '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
The canonical lexical form has the following restrictions:
- The canonical lexical form must not contain leading zeros in the whole component except to meet the requirement of at least one digit.
- The canonical lexical form must not contain trailing zeros in the digit(s) in the fraction component except to meet the requirement of at least one digit.
- The canonical lexical form must not contain leading zeros in the digit(s) in the exponent component.
- The canonical lexical form must use a lowercase 'e' in the exponent component.
- The canonical lexical form must not contain a '+' if the exponent is non-negative.
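One way to check these restrictions is a regular expression, sketched here in Python. The names are hypothetical, and two points the text leaves open are treated as assumptions: a lone 0 is accepted as an exponent, and a negative zero such as -0 is not rejected.

```python
import re

# ASCII digits are spelled out because \d would also match other Unicode digits.
CANONICAL_NUMBER = re.compile(
    r"-?"                         # optional sign
    r"(?:0|[1-9][0-9]*)"          # whole: no leading zeros except a single 0
    r"(?:\.(?:0|[0-9]*[1-9]))?"   # fraction: no trailing zeros except a single 0
    r"(?:e-?(?:0|[1-9][0-9]*))?"  # exponent: lowercase e, no '+', no leading zeros
)


def is_canonical_number(text):
    """True if text is in the canonical lexical form sketched above."""
    return CANONICAL_NUMBER.fullmatch(text) is not None
```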
urf-OffsetDate
Represents a date with a fixed offset from UTC. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
offset_date ⇒ date offset
urf-OffsetDateTime
Represents a date and time with a fixed offset from UTC. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
offset_date_time ⇒ date 'T' time offset
urf-OffsetTime
Represents a time with a fixed offset from UTC. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
offset_time ⇒ time offset
urf-Ordinal
Ordinal resources are numbers that represent the position of an element in a sequence. URF currently only supports finite ordinals, which means that there will be a corresponding ordinal for every positive whole number and zero. This is a lexical ID type. The canonical lexical form is defined by the production rule:
digit+
This class is a subclass of the class urf-Number.
urf-Property
Not every resource used as a property is an instance of the class urf-Property, but a resource may be declared to be of the urf-Property type to further specify its semantics and expected domain and range. The class urf-Property is a subclass of the class urf-Class.
urf-RegularExpression
Represents a regular expression, a text-based pattern that defines rules for the content of strings. This is a lexical ID type, and the canonical lexical form is the string representation of the regular expression itself. TODO add support for flags TODO reference regex specification, perhaps ISO/IEC 9945-2 or http://pubs.opengroup.org/onlinepubs/007908799/xbd/re.html
urf-Resource
Every resource is implicitly an instance of the class urf-Resource or one of its descendant classes.
urf-Set
TODO Set resources are resources that contain at most one instance of other element resources. A set, like normal resources, may have any property, but the elements of the set appear as values of the urf-member+ property.
urf-String
Represents a sequence of Unicode code points. This is a lexical ID type, and the canonical lexical form is the string itself, normalized to Normalization Form C (NFC) as per UAX #15.
urf-TelephoneNumber
Represents a telephone number. This is a lexical ID type. The canonical lexical form is the “global number” format prescribed by RFC 3966. The canonical representation must not include any “visual separators” as defined by RFC 3966.
urf-Temporal
The parent class of all the temporal types, representing date and/or time information based on ISO 8601. These are all lexical ID types and follow the productions below combined with their own productions. Time zone names tz are from the IANA TZ database and are case-sensitive.
date ⇒ YYYY '-' MM '-' DD
time ⇒ hh ':' mm ':' ss ['.' s]
offset ⇒ ('+' | '-') hh ':' mm
YYYY ⇒ digit digit digit digit
MM ⇒ digit digit
DD ⇒ digit digit
hh ⇒ digit digit
mm ⇒ digit digit
ss ⇒ digit digit
s ⇒ digit digit digit [digit digit digit [digit digit digit] ]
TODO add support for durations
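The shared temporal productions translate directly into regular expressions. The Python sketch below is purely syntactic and uses hypothetical names: it checks the shape of a lexical form against the productions for several temporal types, but does not validate ranges such as month or hour values, and it omits zoned_date_time because checking tz would require the IANA TZ database.

```python
import re

DIGIT = "[0-9]"
DATE = DIGIT + "{4}-" + DIGIT + "{2}-" + DIGIT + "{2}"
# Fractional seconds come in groups of three digits: 3, 6, or 9 digits.
FRACTION = r"(?:\.[0-9]{3}(?:[0-9]{3}(?:[0-9]{3})?)?)?"
TIME = DIGIT + "{2}:" + DIGIT + "{2}:" + DIGIT + "{2}" + FRACTION
OFFSET = "[+-]" + DIGIT + "{2}:" + DIGIT + "{2}"

PATTERNS = {
    "instant": DATE + "T" + TIME + "Z",
    "local_date": DATE,
    "local_date_time": DATE + "T" + TIME,
    "local_time": TIME,
    "offset_date": DATE + OFFSET,
    "offset_date_time": DATE + "T" + TIME + OFFSET,
    "offset_time": TIME + OFFSET,
}


def matches(production, text):
    """True if text has the syntactic shape of the named production."""
    return re.fullmatch(PATTERNS[production], text) is not None
```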
urf-Uuid
Represents a Universally Unique IDentifier (UUID) adhering to the requirements of RFC 4122. This is a lexical ID type. The canonical lexical form is the “UUID” production given in RFC 4122.
urf-Year
Represents a calendar year without respect to any time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
year ⇒ YYYY
urf-YearMonth
Represents a calendar year and a month without respect to any time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
year_month ⇒ year '-' MM
urf-ZonedDateTime
Represents a date and time relative to a specific IANA TZ time zone. This is a subclass of urf-Temporal and is a lexical ID type with the following production:
zoned_date_time ⇒ offset_date_time '[' tz ']'
Properties
urf-key
Represents the key in a key-value association, usually of an instance of urf-MapEntry. This is a subclass of urf-Property.
urf-member+
Represents an aggregation association between two resources, in which the subject resource “contains” one or more resources. This is a subclass of urf-Property.
urf-type
Represents the type of a resource. This is a subclass of urf-Property. An URF resource is only allowed one urf-type, analogous to the type used by programming languages to create an “instance” of a class.
urf-value
Represents the value in a key-value association, usually of an instance of urf-MapEntry. This is a subclass of urf-Property.
Content Ontology
- Namespace
https://urf.name/content/
Classes
content-Charset
TODO The name of the mapping of integer values to a set of characters. This is equivalent to the charset Internet media type parameter described by [RFC 2046] and further elaborated in [RFC 2278] Section 2.3. A charset encapsulates both the concept of a coded character set and a character encoding scheme, as specified in [RFC 2130] Section 3.2. The lexical form is the canonical charset name specified by [IANA Charset Registry], such as UTF-16BE.
References
- DCMI Namespace
- Andy Powell and Harry Wagner. Namespace Policy for the Dublin Core Metadata Initiative (DCMI). Dublin Core Namespace Initiative, 2007.
- FOAF
- FOAF Vocabulary Specification, Dan Brickley, Libby Miller.
- IANA Charset Registry
- IANA Charset Registry. Internet Assigned Numbers Authority.
- ISO 8601:2004
- Data elements and interchange formats — Information interchange — Representation of dates and times, third edition, 2014-12-01. ISO.
- RDF/XML
- Dave Beckett. RDF/XML Syntax Specification (Revised). World Wide Web Consortium, 2006-02-10.
- RFC 2119
- Key words for use in RFCs to Indicate Requirement Levels, S. Bradner (Harvard University). IETF.
- RFC 2130
- C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R. Atkinson, M. Crispin, and P. Svanberg. RFC 2130: The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996. Internet Engineering Task Force, 1997.
- RFC 2278
- N. Freed and J. Postel. RFC 2278: IANA Charset Registration Procedures. Internet Engineering Task Force, 1998.
- RFC 2445
- RFC 2445: Internet Calendaring and Scheduling Core Object Specification (iCalendar), F. Dawson, D. Stenerson. IETF.
- RFC 3339
- G. Klyne and C. Newman. RFC 3339: Date and Time on the Internet: Timestamps. Internet Engineering Task Force, 2002.
- RFC 3966
- The tel URI for Telephone Numbers, H. Schulzrinne (Columbia University). IETF.
- RFC 3986
- T. Berners-Lee, R. Fielding, and L. Masinter. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005.
- RFC 3987
- Internationalized Resource Identifiers (IRIs), M. Duerst (W3C), M. Suignard (Microsoft Corporation). IETF.
- RFC 4122
- A Universally Unique IDentifier (UUID) URN Namespace, P. Leach (Microsoft Corporation), M. Mealling (Refactored Networks, LLC), R. Salz (DataPower Technology, Inc.). IETF.
- RFC 4627
- D. Crockford. RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON). Internet Engineering Task Force, 2006.
- RFC 4646
- A. Phillips and M. Davis. RFC 4646: Tags for Identifying Languages. Internet Engineering Task Force, 2006.
- RFC 4648
- The Base16, Base32, and Base64 Data Encodings, S. Josefsson (SJD). IETF.
- RFC 5322
- Internet Message Format, P. Resnick, Ed. (Qualcomm Incorporated). IETF.
- RFC 6838
- Media Type Specifications and Registration Procedures, N. Freed (Oracle), J. Klensin, T. Hansen (AT&T Laboratories). IETF.
- TZ
- Time Zone Database. IANA.
- UAX #15
- Unicode® Standard Annex #15: Unicode Normalization Forms, Mark Davis, Ken Whistler. The Unicode Consortium.
- Unicode BOM FAQ
- Asmus Freytag and Mark Davis. Unicode Byte Order Mark (BOM) FAQ. Unicode, Inc., Retrieved 2006-06-07.
- UTR #13
- Mark Davis. Unicode Technical Report #13: Unicode Newline Guidelines. Unicode, Inc., 1999.
- XML Schema 2
- Paul V. Biron and Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. World Wide Web Consortium, 2004-10-28.
Acknowledgements
- Brad Neuberg encouraged the creation of an alternate RDF serialization.
- Frank Manola made convincing arguments for using ordinals instead of integers as list element properties.