Text URF (TURF) Specification
- Author
- Garret Wilson (GlobalMentor, Inc.)
- Version
- Draft 2020-06-10
Introduction
Text URF (TURF) is the text interchange format for URF. TURF emphasizes terseness and consistency while maintaining human readability, with a preference for using symbols from existing interchange formats such as JSON and programming languages such as Java and C#. TURF has several useful properties, including:
- TURF has an optional signature character sequence, ===>urf<, for easy recognition and/or embedding in other formats.
- TURF has an optional unambiguous end delimiter sequence ===, allowing TURF to be embedded in existing text-based content.
- Whitespace separators and comments may be removed following certain rules with no loss of semantics, resulting in an extremely compact representation.
- Standard delimiters are used, such as {…} for maps, […] for lists, "…" for strings, and <…> for IRIs.
- TURF has a properties-only variant, TURF Properties, useful for embedded metadata in a document or storing metadata as a file sidecar.
Conventions Used in this Document
The key words “must”, “must not”, “required”, “shall”, “shall not”, “should”, “should not”, “recommended”, “may”, and “optional” in this document are to be interpreted as described in RFC 2119. Parts of this specification marked as notes and annotations are non-normative.
Internet Media Type
The Internet media type (RFC 6838) of a TURF document shall be text/urf. A TURF document must be encoded in UTF-8, UTF-16, or UTF-32. A TURF document must not begin with a byte order mark (BOM) or UTF-8 signature. If encoded in any encoding other than UTF-8, a TURF document must begin with a TURF signature.
TURF differs from general text media types in the following:
- The default charset of TURF is UTF-8 rather than ASCII.
- TURF allows any Unicode newline character (UTR #13) to represent newlines rather than only CRLF.
Any application-specific URF data encoded as TURF should be represented as application/applicationName+turf, with +turf as a structured syntax name suffix as described in RFC 6838 to allow the data to be recognized as such, where applicationName is the application-specific identifier for the URF information.
TURF Properties Format Variant
This specification provides for a format variant with less boilerplate than a general TURF document, suitable for serving as a description sidecar file or providing metadata embedded in some other format. The Internet media type of this TURF Properties format shall be text/urf-properties, and a TURF Properties document must adhere to all requirements for a TURF document, except for its distinct document type and the special body syntax described below.
Document
A TURF document encodes an URF instance (TODO reference URF) as one or more graphs of resources representing URF statements. A TURF document may contain no resource representations, in which case it represents no URF statements.
Syntax
The production rules in this specification reference and build upon those already defined in URF.
Whitespace
TURF considers the following characters as whitespace, including characters in the Unicode Space_Separator (Zs) category.
whitespace ⇒ tab | vtab | ff | sp | nbsp | zwnbspr | Space_Separator
tab ⇒ CHARACTER TABULATION (U+0009)
vtab ⇒ LINE TABULATION (U+000B)
ff ⇒ FORM FEED (FF) (U+000C)
sp ⇒ SPACE (U+0020)
nbsp ⇒ NO-BREAK SPACE (U+00A0)
zwnbspr ⇒ ZERO WIDTH NO-BREAK SPACE (U+FEFF)
This specification uses the MIDDLE DOT character · to represent zero or more whitespace characters.
· ⇒ whitespace*
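As an illustration of the whitespace production, the following minimal Python sketch tests a single character against it. The function name is_turf_whitespace is hypothetical and not part of this specification; the sketch relies on the Unicode general category reported by the standard unicodedata module.

```python
import unicodedata

# Characters named explicitly by the whitespace production.
_EXPLICIT_WHITESPACE = {"\u0009", "\u000B", "\u000C", "\u0020", "\u00A0", "\uFEFF"}


def is_turf_whitespace(ch: str) -> bool:
    """Return True if the single character ch matches the whitespace production.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Space_Separator corresponds to the Unicode general category Zs.
    return ch in _EXPLICIT_WHITESPACE or unicodedata.category(ch) == "Zs"
```

Note that line ending characters such as LINE FEED (LF) are not whitespace under this production; they are handled separately as line endings.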
Line Endings
TURF recognizes the CARRIAGE RETURN (CR) character (U+000D), the LINE FEED (LF) character (U+000A), and any Unicode Line_Separator (Zl) or Paragraph_Separator (Zp) character as marking the end of a line. A TURF parser must behave as if every CRLF sequence as well as every CR not followed by a LF were normalized to a single LF. A TURF serializer should use the conventional line ending sequence supported by the platform on which it is running if that sequence is allowed by this specification.
eol ⇒ cr | lf | Line_Separator | Paragraph_Separator
cr ⇒ CARRIAGE RETURN (CR) (U+000D)
lf ⇒ LINE FEED (LF) (U+000A)
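The normalization rule above can be sketched in a few lines of Python. The function name normalize_line_endings is hypothetical; the sketch only rewrites CRLF and lone CR, leaving Line_Separator and Paragraph_Separator characters untouched because the rule above does not mention them.

```python
import re

# Matches a CRLF sequence, or a CR that is not followed by an LF.
_CR_OR_CRLF = re.compile(r"\r\n?")


def normalize_line_endings(text: str) -> str:
    """Replace every CRLF and every lone CR with a single LF.

    Hypothetical helper name; a parser is free to achieve the same
    observable behavior without materializing a normalized copy.
    """
    return _CR_OR_CRLF.sub("\n", text)
```

A real parser would more likely normalize lazily while reading, but the observable result should be the same.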
Comments
Line Comments
A line comment may appear before the end of any line. A line comment begins with the EXCLAMATION MARK character ! (U+0021) and proceeds to the next line ending character.
line_comment ⇒ '!' [^eol]*
Filler
Some structures allow the addition of whitespace, line comments, and/or line endings; these are collectively referred to as filler.
filler ⇒ (whitespace | line_comment | eol)*
Sequences
Several TURF types allow components to be presented in a sequence. A sequence is a syntactical construct indicated by the form item-sequence, where item is the construct that may appear zero or more times in the sequence.
Any two items in a sequence are separated by a sequence separator, which is either a COMMA character , (U+002C) optionally surrounded by filler; or filler with at least one line break but without a COMMA character. If a COMMA character is present, an item must follow. If no COMMA character or filler is present, an item must not follow. This means that filler may end a sequence or appear in an empty sequence.
item-sequence ⇒ filler [item (sequence_next_comma_separated | sequence_next_break_separated)* filler]
sequence_next_comma_separated ⇒ filler ',' filler item
sequence_next_break_separated ⇒ · line_comment? eol filler item
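To make the separator rules concrete, here is a minimal Python sketch that transcribes the three sequence productions into a regular expression. It rests on simplifying assumptions: the item is a run of ASCII digits, chosen only for illustration, and the whitespace class lists the explicitly named characters without the full Space_Separator category. The name is_valid_digit_sequence is hypothetical.

```python
import re

# Simplified building blocks (assumption: Space_Separator characters are omitted).
WS = r"[\t\x0b\x0c \u00a0\ufeff]"
EOL = r"[\r\n\u2028\u2029]"
COMMENT = r"![^\r\n\u2028\u2029]*"
FILLER = rf"(?:{WS}|{COMMENT}|{EOL})*"
ITEM = r"[0-9]+"  # stand-in for any real item production

# sequence_next_comma_separated => filler ',' filler item
COMMA_NEXT = rf"{FILLER},{FILLER}{ITEM}"
# sequence_next_break_separated => whitespace* line_comment? eol filler item
BREAK_NEXT = rf"{WS}*(?:{COMMENT})?{EOL}{FILLER}{ITEM}"
# item-sequence => filler [item (comma_next | break_next)* filler]
SEQUENCE = re.compile(rf"{FILLER}(?:{ITEM}(?:{COMMA_NEXT}|{BREAK_NEXT})*{FILLER})?")


def is_valid_digit_sequence(text: str) -> bool:
    """Return True if text is a well-formed sequence of digit-run items."""
    return SEQUENCE.fullmatch(text) is not None
```

Under these assumptions the sketch accepts "1, 2, 3" and items separated only by line breaks, and rejects both "1 2" (no comma and no line break) and "1, 2," (a comma with no following item).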
Structure
A TURF document is divided into four sections: a header, a document description, a body, and a footer, each of which is optional. If a footer is included, a header must be included as well.
document ⇒ header? filler document_description? filler body? filler footer? filler
Header
The optional document header begins with three EQUALS SIGN = characters (U+003D) and is followed immediately by a document type, a modified media type representation between the GREATER-THAN SIGN character > (U+003E) and the LESS-THAN SIGN character < (U+003C), in that order. If a header is included it must appear at the first character of the document.
header ⇒ "===" document_type
document_type ⇒ '>' rfc_6838_media_type directives? '<'
The document type uses the same production as a media-type literal, with the following modifications:
- The type, subtype, and parameter name(s) should be in lowercase.
- If the top-level type is text, the top-level type and its following SOLIDUS (U+002F) delimiter should be left out. Specifically the media type text/urf should be indicated as simply >urf<.
- The charset parameter must not be present.
- Immediately following the RFC 6838 representation of the media type but before the ending media type delimiter, the document type header may contain directives, a sequence of name-value pairs that provide serialization details essential for parsing.
A TURF parser must not return directives as part of the URF instance, but may provide a way to access them separately.
directives ⇒ ':' directive-sequence ';'
directive ⇒ handle filler '=' filler literal
A directive handle must not include a namespace alias. The production rules for literals appear later in this specification.
If no header is present, the document type shall default to the media type of the document. Currently only document types for the text/urf and the text/urf-properties media types are supported. A TURF parser must not allow a document type that differs from any known media type for the document itself.
Signature
This section is experimental. If a header is present it functions as a signature to identify the document type. A TURF parser must use the bytes of the signature, or the absence of a signature, to determine the charset and byte order of the document, but only if no other indication of charset and byte order is present. A TURF parser may use the signature as a heuristic to reasonably determine that some file in fact contains a TURF document.
signature
⇒ "===>urf" (';' | ':' | '<')
Note that detection of the signature currently requires that the document type follow all the recommendations in this specification. The signature presented here will not detect documents that do not follow all the recommendations here even though they are otherwise conforming TURF documents.
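One way a parser might use the signature bytes to determine charset and byte order is sketched below in Python. This is an illustration under stated assumptions, not a prescribed algorithm: it assumes no other indication of charset is available, inspects only how the first EQUALS SIGN (U+003D) of the signature is laid out in the leading bytes, and falls back to UTF-8 when no such pattern is found. The function name detect_turf_charset and the returned codec names are hypothetical choices.

```python
def detect_turf_charset(data: bytes) -> str:
    """Guess the charset of a TURF document from its leading bytes.

    Hypothetical helper: looks at how the first '=' (0x3D) of the
    "===" signature would be encoded, checking the 4-byte UTF-32
    layouts before the 2-byte UTF-16 layouts.
    """
    prefix = data[:4]
    if prefix == b"=\x00\x00\x00":
        return "utf-32-le"
    if prefix == b"\x00\x00\x00=":
        return "utf-32-be"
    if prefix[:2] == b"=\x00":
        return "utf-16-le"
    if prefix[:2] == b"\x00=":
        return "utf-16-be"
    # No multi-byte signature pattern: assume the default charset.
    return "utf-8"
```

The UTF-32 checks come first because the little-endian UTF-32 encoding of the first EQUALS SIGN also begins with the two bytes that a UTF-16 check would match.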
Directives
Namespace Declaration
For every directive corresponding to a tag in the https://urf.name/space/ namespace, the tag name declares a namespace alias prefix to be used with handles within the document body, and the value, which must be an urf-Iri literal, designates the namespace with which the alias is associated. A TURF serializer must generate a header containing appropriate namespace declarations for all custom namespaces used in the document.
For example a directive handle space-dc indicates that dc is to be used as a namespace alias prefix. The example TURF document header in the figure above associates that prefix with the <http://purl.org/dc/elements/1.1/> namespace. Thus if dc/creator were to be used within the body of the example document above, it would indicate the tag <http://purl.org/dc/elements/1.1/creator>.
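The alias resolution in the example above might be sketched as follows in Python. The sketch covers only the form shown in the example, a handle made of an alias, a SOLIDUS, and a name, and a directive handle with the space- prefix; the complete handle syntax is defined by URF and is not reproduced here. The function names and the dictionary representation of directives are assumptions made for illustration.

```python
SPACE_PREFIX = "space-"  # prefix of namespace declaration directive handles


def namespace_aliases(directives: dict) -> dict:
    """Map each declared alias to its namespace IRI.

    directives is assumed to map directive handles to IRI strings,
    for example {"space-dc": "http://purl.org/dc/elements/1.1/"}.
    """
    return {
        handle[len(SPACE_PREFIX):]: iri
        for handle, iri in directives.items()
        if handle.startswith(SPACE_PREFIX)
    }


def resolve_handle(handle: str, aliases: dict) -> str:
    """Resolve an alias/name handle such as dc/creator to a tag IRI."""
    alias, separator, name = handle.partition("/")
    if not separator:
        raise ValueError("handle has no namespace alias: " + handle)
    if alias not in aliases:
        raise ValueError("undeclared namespace alias: " + alias)
    return aliases[alias] + name
```

With the declaration from the example, resolving dc/creator yields http://purl.org/dc/elements/1.1/creator.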
Document Description
This section is experimental. The optional document description provides a description of the document contents, separate from its serialization and independent of the represented URF graph. The document description begins and ends with the NUMBER SIGN # (U+0023), and contains a sequence of name-value pairs. A TURF parser must not return the document description as part of the URF instance graph, but should provide some means for it to be retrieved after parsing is finished.
document_description ⇒ '#' document_property-sequence '#'
document_property ⇒ handle filler '=' filler literal
Body
The content of the document body depends on the document type.
TURF Body
If the document type is text/urf, the document body contains zero or more resources, which may recursively contain other resources.
body ⇒ resource*
TURF Properties Body
If the document type is text/urf-properties, the document body contains zero or more properties, which are interpreted as describing an implicit, anonymous, untyped root object.
body ⇒ property*
The body thus omits the beginning *: and ending ; delimiters defining an anonymous object with no type.
Footer
This section is experimental. The document footer, if present, explicitly indicates the end of the TURF document.
footer ⇒ "==="
Resources
A resource consists of an optional label followed by a resource representation.
resource ⇒ label? · resource_representation | label
described_resource ⇒ label? · resource_representation · description? | label
resource_representation ⇒ handle · object? | object | literal | collection
literal ⇒ binary | boolean | character | email | iri | media_type | number | regex | string | telephone | temporal | uuid
collection ⇒ list | map | set
A label consists of an identifier, which is either an URF name, a string, or an IRI, surrounded by matching VERTICAL LINE characters | (U+007C). The first occurrence of a label with a particular identifier may include a resource representation; if no resource representation is present at the first appearance of a label with some identifier, an object with no type and no description is implied. Subsequent appearances of a label with the same identifier must not include a resource representation. A nested resource representation may refer to the label of an outer resource in the graph.
The tokens false and true must not appear as handles in a TURF document.
label ⇒ alias_label | id_label | tag_label
alias_label ⇒ '|' alias '|'
id_label ⇒ '|' id '|'
tag_label ⇒ '|' tag '|'
alias ⇒ name_token
id ⇒ string
tag ⇒ iri
If a label uses an URF name as its identifier, it indicates an alias for referencing resources only within the confines of the TURF document. If the identifier is an IRI, it represents the URF tag of the resource. A tag label must not introduce any resource representation that itself represents a tag, such as a literal or an object represented by a handle.
A string as the identifier indicates the URF ID for a resource. The ID shall be combined with the tag of the resource type to form the resource tag, as prescribed by the URF specification. This implies that an ID must not appear in front of any resource representation other than an object; and if an ID is present an object must indicate a type and must not be represented by a handle. Furthermore because of the URF formula for creating resource tags given an ID and a type tag, if a label indicates an ID the object type must not itself also indicate an ID.
Objects
Objects are general resources with an optional type that may be described by a description.
object ⇒ '*' · type?
type ⇒ tag_reference
tag_reference ⇒ tag_label | handle
If a type is indicated, it represents an URF statement with the object as the subject and the identified type as the value of the urf-type property. If preceded by a tag reference (that is, a tag label or equivalently a handle introduces the object), the indicated tag serves as the tag of the object.
Descriptions
A description must not follow any resource representation other than an object. This restriction may be lifted in a future version of TURF. A description must not contain more than one property with the same handle, and a TURF parser must consider such a condition as a non-recoverable error. (TODO lift restriction here and in SURF for JSON compatibility) A TURF parser must preserve all distinct values of each n-ary property.
description ⇒ ':' property-sequence ';'
property ⇒ tag_reference filler ('=' filler resource) | description
A TURF parser must interpret a description appearing immediately after the property tag reference as if an object with no type appeared as the resource in the property production. For example the property foo:…; is the short-hand equivalent of foo=*:…;.
Literals
TURF literals are lexical representations of tagged resources with lexical ID types. Any TURF literal can also be represented by indicating the resource's lexical representation in an ID label as explained above. The literal 1.23 can also be represented as |"1.23"|*urf-Number, for example. If the resource canonical lexical representation is also an URF name, it may even be represented as a handle. For example the literal "foo" can be represented as urf-String#foo.
The following definitions in many cases delegate to the respective URF definitions of the canonical lexical representations for the respective types. The summaries of the URF canonical lexical representations when given are for informational purposes only. In addition a production rule may refer to the canonical lexical representation of an URF type. For example urf-EmailAddress_lex refers to the canonical lexical representation of the urf-EmailAddress type.
Binary
The TURF binary literal representation for resources with the urf-Binary type begins with the PERCENT SIGN character % (U+0025) and is followed by the canonical lexical representation. As defined by URF, the canonical lexical representation for urf-Binary is zero or more bytes encoded using the “Base 64 Encoding” defined in RFC 4648, beginning with the PERCENT SIGN character % (U+0025), using the “base64url” alphabet with no Base 64 padding.
binary ⇒ '%' urf-Binary_lex
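As a rough illustration, the following Python sketch produces and reads a binary literal consisting of a single PERCENT SIGN followed by unpadded base64url data, using the standard base64 module. The function names are hypothetical, and the sketch assumes the PERCENT SIGN appears exactly once at the start of the literal.

```python
import base64


def encode_turf_binary(data: bytes) -> str:
    """Encode bytes as '%' followed by unpadded base64url text."""
    encoded = base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    return "%" + encoded


def decode_turf_binary(literal: str) -> bytes:
    """Decode a literal produced by encode_turf_binary."""
    if not literal.startswith("%"):
        raise ValueError("binary literal must begin with '%'")
    body = literal[1:]
    # Restore the padding that the literal form leaves out.
    padding = "=" * (-len(body) % 4)
    return base64.urlsafe_b64decode(body + padding)
```

The two bytes 0xFB 0xFF, for instance, encode to %-_8, which shows both characters that distinguish the “base64url” alphabet from the standard one.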
Boolean
The TURF literal representation of urf-Boolean resources is simply the canonical lexical representation of the resource: either of the tokens true or false.
boolean ⇒ urf-Boolean_lex
Character
The TURF literal representation for an urf-Character resource is the Unicode character being represented, delimited on both sides by the APOSTROPHE character ' (U+0027). The backslash or REVERSE SOLIDUS \ (U+005C) is used as an escape character. The APOSTROPHE, REVERSE SOLIDUS, and control characters must not appear in a character unless they are escaped. The following escape sequences are allowed:
- \\ REVERSE SOLIDUS (U+005C)
- \/ SOLIDUS (U+002F)
- \' APOSTROPHE (U+0027)
- \b BACKSPACE (U+0008)
- \f FORM FEED (FF) (U+000C)
- \n LINE FEED (LF) (U+000A)
- \r CARRIAGE RETURN (CR) (U+000D)
- \t CHARACTER TABULATION (U+0009)
- \v LINE TABULATION (U+000B)
- \uXXXX Any 16-bit Unicode code point encoding, where XXXX is four hexadecimal digits in any case. Escaped Unicode code points outside the Basic Multilingual Plane must be represented as two UTF-16 surrogate characters.
A TURF parser must correctly interpret characters outside the Basic Multilingual Plane, whether represented as a literal character or as an escaped Unicode code point.
TODO production
Email Address
An email address literal for the urf-EmailAddress type begins with the CIRCUMFLEX ACCENT character ^ (U+005E), commonly known as a “caret”, followed by the URF canonical lexical representation: the “addr-spec” format specified in RFC 5322, without any obsolete elements, comments or folding white space.
email ⇒ '^' urf-EmailAddress_lex
IRI
The literal representation of urf-Iri is the canonical lexical representation for that type (from RFC 3987) placed between a LESS-THAN SIGN character < (U+003C) and a GREATER-THAN SIGN character > (U+003E).
iri ⇒ '<' (urf-Iri_lex | email | telephone | uuid) '>'
If an email address, telephone number, or UUID appears between the delimiters, it represents an “IRI short form” that is equivalent to a literal IRI according to the following rules:
email
- The email address is converted into an IRI with a scheme of mailto according to RFC 6068.
telephone
- The telephone number is converted into an IRI with a scheme of tel according to RFC 3966.
uuid
- The UUID is converted into an IRI with a scheme of urn and a URN namespace of uuid according to RFC 4122.
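The short-form rules above could be sketched in Python as follows. The function name expand_iri_short_form is hypothetical. The sketch handles only the simple case: it does not perform any percent-encoding that RFC 6068 may require for unusual email addresses, and it does not validate the content between the delimiters.

```python
def expand_iri_short_form(content: str) -> str:
    """Expand the content found between '<' and '>' into a full IRI.

    Hypothetical helper: dispatches on the leading delimiter of the
    email ('^'), telephone ('+'), and UUID ('&') literal forms.
    """
    if content.startswith("^"):
        # Email literal: drop the caret and use the mailto scheme.
        return "mailto:" + content[1:]
    if content.startswith("+"):
        # Telephone literal: the plus sign is part of the number.
        return "tel:" + content
    if content.startswith("&"):
        # UUID literal: drop the ampersand and use the urn:uuid namespace.
        return "urn:uuid:" + content[1:]
    # Anything else is taken to be a literal IRI already.
    return content
```

For example, the short form <^jane@example.com> would be treated as equivalent to <mailto:jane@example.com>.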
Media Type
A media type, sometimes referred to as a “content type”, indicates the type of content contained in a resource and is essential for navigating the World Wide Web. It consists of a type and subtype, optionally followed by one or more parameters.
TURF places the media type between the GREATER-THAN SIGN character > (U+003E) and the LESS-THAN SIGN character < (U+003C), in that order. This representation is not to be confused with that of an IRI, which uses the same delimiters but in a different order.
media_type ⇒ '>' rfc_6838_media_type '<'
The syntax of the media type is that prescribed by RFC 6838 with the following additional restrictions and recommendations:
- The type, subtype, and parameter name(s) should be in lowercase.
- If the top-level type is text, the top-level type and its following SOLIDUS (U+002F) delimiter may be left out. For example, the media type text/plain may be indicated as simply >plain< in TURF.
- The value of the charset parameter should be in lowercase.
- If any value for any parameter other than charset is case-insensitive, it must be in lowercase.
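The omission of the text top-level type can be illustrated with a small Python sketch that expands a media type literal back to its full RFC 6838 form. The function name expand_media_type_literal is hypothetical, and the sketch assumes a media type literal without the directives that a document type header may carry.

```python
def expand_media_type_literal(literal: str) -> str:
    """Return the full media type for a TURF literal such as '>plain<'."""
    if len(literal) < 3 or not literal.startswith(">") or not literal.endswith("<"):
        raise ValueError("not a media type literal: " + literal)
    body = literal[1:-1]
    # Separate the type/subtype portion from any parameters.
    type_part, separator, parameters = body.partition(";")
    if "/" not in type_part:
        # The top-level type and its SOLIDUS were left out, so it is text.
        type_part = "text/" + type_part
    return type_part + separator + parameters
```

Under these assumptions >plain< expands to text/plain, while >application/json< is returned unchanged apart from its delimiters.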
Number
There are several related literal representations for the urf-Number type and its subclasses. The TURF literal production rules are more lenient than the corresponding URF canonical representations, and must be normalized. The general production rule for the urf-Number types is a base 10 representation that may be negative and may be fractional.
If the literal begins with the DOLLAR SIGN character $ (U+0024), it represents an urf-Decimal instance. A TURF parser must represent a decimal instance exactly without rounding within the supported range. Specifically a TURF parser must not represent decimal numbers using IEEE 754.
If the literal does not begin with the DOLLAR SIGN character $ (U+0024) it represents an instance of urf-Number unless it contains neither a fraction nor an exponent component, in which case it represents an instance of urf-Integer.
number ⇒ ['$'] ['-'] whole [fraction] [exponent]
whole ⇒ digit+
fraction ⇒ '.' digit+
exponent ⇒ ('e' | 'E') ['-' | '+'] digit+
digit ⇒ '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
A number literal should be in its canonical form:
- No leading zeros in the whole component except to meet the requirement of at least one digit.
- No trailing zeros in the digit(s) in the fraction component except to meet the requirement of at least one digit.
- No leading zeros in the digit(s) in the exponent component.
- A lowercase 'e' in the exponent component.
Nevertheless the presence of any leading zero(s) in the whole component shall not be interpreted as indicating any number base other than base 10.
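The number rules might be sketched in Python as shown below: a regular expression transcribing the number production, followed by the type decision described above. The function name classify_number is hypothetical, as is the choice of Python value types. In particular, using float for urf-Number is an assumption of this sketch, while decimal.Decimal is used for urf-Decimal so that the value is held exactly rather than as an IEEE 754 binary floating-point number.

```python
import re
from decimal import Decimal

# number => ['$'] ['-'] whole [fraction] [exponent]
_NUMBER = re.compile(
    r"(?P<decimal>\$)?-?[0-9]+(?P<fraction>\.[0-9]+)?(?P<exponent>[eE][-+]?[0-9]+)?"
)


def classify_number(literal: str):
    """Return a (type name, value) pair for a TURF number literal."""
    match = _NUMBER.fullmatch(literal)
    if match is None:
        raise ValueError("not a number literal: " + literal)
    if match.group("decimal"):
        # Decimal keeps the digits exactly, with no binary rounding.
        return "urf-Decimal", Decimal(literal[1:])
    if match.group("fraction") or match.group("exponent"):
        return "urf-Number", float(literal)
    # Leading zeros never switch the base: int() here is always base 10.
    return "urf-Integer", int(literal)
```

Note that a literal such as 007 is still read as the base 10 integer 7, in line with the final rule above.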
Regular Expression
The TURF literal representation for an urf-RegularExpression is its URF canonical lexical representation surrounded by slash or SOLIDUS characters / (U+002F). The backslash or REVERSE SOLIDUS \ (U+005C) is interpreted as an escape character only if followed by a slash character /.
TODO decide on whether and how to allow flags
regex ⇒ '/' urf-RegularExpression_lex '/'
String
An urf-String literal is its URF canonical lexical representation, escaped as necessary, and delimited on both sides by the QUOTATION MARK character " (U+0022). The sequence of Unicode code points in a string should follow Normalization Form C (NFC) as per UAX #15. The backslash or REVERSE SOLIDUS \ (U+005C) is used as an escape character. The QUOTATION MARK, REVERSE SOLIDUS, and control characters must not appear in a string unless they are escaped. The following escape sequences are allowed:
- \\ REVERSE SOLIDUS (U+005C)
- \/ SOLIDUS (U+002F)
- \" QUOTATION MARK (U+0022)
- \b BACKSPACE (U+0008)
- \f FORM FEED (FF) (U+000C)
- \n LINE FEED (LF) (U+000A)
- \r CARRIAGE RETURN (CR) (U+000D)
- \t CHARACTER TABULATION (U+0009)
- \v LINE TABULATION (U+000B)
- \uXXXX Any 16-bit Unicode code point encoding, where XXXX is four hexadecimal digits in any case. Escaped Unicode code points outside the Basic Multilingual Plane must be represented as two UTF-16 surrogate characters.
TODO production
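Until a production is defined, the escape rules for strings can be illustrated with the following Python sketch, which decodes the content between the QUOTATION MARK delimiters. The function name unescape_turf_string is hypothetical. The sketch rejects unescaped quotation marks and control characters, and it joins escaped UTF-16 surrogate pairs into a single code point by a round trip through UTF-16.

```python
import string
import unicodedata

_SIMPLE_ESCAPES = {
    "\\": "\\", "/": "/", '"': '"', "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}


def unescape_turf_string(body: str) -> str:
    """Decode the text found between the quotation marks of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            if ch == '"' or unicodedata.category(ch) == "Cc":
                raise ValueError("character must be escaped: %r" % ch)
            out.append(ch)
            i += 1
            continue
        escape = body[i + 1 : i + 2]
        if escape == "u":
            digits = body[i + 2 : i + 6]
            if len(digits) != 4 or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed \\u escape")
            out.append(chr(int(digits, 16)))
            i += 6
        elif escape in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[escape])
            i += 2
        else:
            raise ValueError("unknown escape sequence")
    # Combine any escaped surrogate pairs; a lone surrogate raises an error.
    return "".join(out).encode("utf-16", "surrogatepass").decode("utf-16")
```

The same approach would apply to character literals, with the APOSTROPHE taking the place of the QUOTATION MARK.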
Telephone
The literal representation of urf-TelephoneNumber is exactly its URF canonical lexical form: the format prescribed by RFC 3966, which is a PLUS SIGN + (U+002B), followed by at least one digit, with no visual separators. The presence of the beginning PLUS SIGN provides a built-in TURF delimiter.
telephone ⇒ urf-TelephoneNumber_lex
Temporal
The TURF literal representations for the urf-Temporal types are each identical to the URF canonical lexical representation for the respective type, with a prefix of the COMMERCIAL AT character @ (U+0040). The URF canonical lexical representations for the most part comply with ISO 8601, with time zones from the IANA TZ database represented following java.time.format.DateTimeFormatter.ISO_ZONED_DATE_TIME.
UUID
A TURF literal for the urf-Uuid type is the AMPERSAND character & (U+0026) followed by its URF canonical lexical representation: the “UUID” production of RFC 4122.
uuid ⇒ '&' urf-Uuid_lex
Collections
Collections represent the urf-Collection abstract data types that can hold other resources.
List
A TURF list represents an instance of urf-List using an order-significant sequence of zero or more element resources with optional descriptions, beginning with a LEFT SQUARE BRACKET character [ (U+005B) and ending with a RIGHT SQUARE BRACKET character ] (U+005D).
list ⇒ '[' element-sequence ']'
element ⇒ described_resource
Each element represents an URF statement in which the urf-List instance is the subject, the identified resource is the property value, and the statement property is an instance of urf-Ordinal reflecting the zero-based position of the element within the list. TODO define urf-Ordinal in URF and in TURF
Map
A TURF map represents an instance of urf-Map using a sequence of associations between a key and a value. A TURF map begins with a LEFT CURLY BRACKET character { (U+007B) and ends with a RIGHT CURLY BRACKET character } (U+007D). Keys and values can be any resources. If a key is an object with a description, the key must be surrounded by the REVERSE SOLIDUS character \ (U+005C). The key and value in each association represent an urf-MapEntry and are separated by a COLON character : (U+003A).
A TURF map should not have entries with duplicate keys, and a TURF serializer must not produce a map with duplicate-key entries. A TURF parser must ignore all but one of each entry with the same key. TODO revisit; this brings JSON compatibility, but could cause problems with tags if a duplicate entry is ignored; also address key equality
map ⇒ '{' entry-sequence '}'
entry ⇒ key filler ':' filler value
key ⇒ '\' described_resource '\' | resource
value ⇒ described_resource
Each entry represents four URF statements:
- A statement that a new anonymous resource has an urf-Type of urf-MapEntry.
- A statement that the urf-key of the urf-MapEntry instance is the key identified in the TURF document.
- A statement that the urf-value of the urf-MapEntry instance is the value identified in the TURF document.
- A statement that the urf-MapEntry instance is an urf-member+ of the urf-Map instance.
Set
A TURF set represents an instance of urf-Set using an unordered sequence of zero or more member resources with optional descriptions, beginning with a LEFT PARENTHESIS character ( (U+0028) and ending with a RIGHT PARENTHESIS character ) (U+0029). The same resource must not appear more than once in a set.
set ⇒ '(' member-sequence ')'
member ⇒ described_resource
Each member represents an URF statement that the identified resource is an urf-member+ of the urf-Set instance.
Examples
This section is non-normative.
JSON
In the JavaScript Object Notation (JSON) (RFC 7159), every object besides strings, numbers, and booleans is an associative array using string keys. The following is a complex JSON object, using every JSON data type available:
As all JSON documents are also valid TURF documents, the above document could be parsed as TURF. It would however have some undesirable traits, because of slight semantic differences and because of limitations in JSON. In TURF the JSON “object” would be considered a map, for example. Note also that URF uses a true ordinal property for each list element, while JSON adds each array element as the value associated with a string containing the lexical form of an integer. Unlike JSON, in TURF the commas separating the values are optional if the values appear on separate lines.
Although the above TURF representation replicates the simple semantics of the given JSON example, a better formulation that takes advantage of URF semantics would most importantly use actual properties rather than associative array string key pseudo-properties. An improved formulation would also indicate the type of the resource (the FooBar class in this example). Such improvements are illustrated in the following reformulation (assuming the "code" string in the list was intended to actually be a key in a map):
In a TURF document, moreover, the data can be further improved by using true dates, decimal numbers, identifiers such as tags, and vocabularies from formal namespaces.
RDF/XML
URF is a semantic superset of RDF, and can represent any construct available in RDF. URF presents equivalents of many RDF classes as well. In terms of representation formats, TURF can represent any semantic information that is available using RDF/XML, yet is more flexible and has fewer restrictions. The following is a sample RDF/XML representation of an RDF data instance using many of the capabilities available in RDF/XML:
The following URF information represented in TURF is semantically equivalent to the RDF information in the previous example represented in RDF/XML:
Note that, rather than use special literal types or general strings, URF promotes the representation of resources by URI tags. Thus the language tag "pt-BR" for Brazilian Portuguese is represented as a resource with a lexical ID tag with the type urf-Language, that is |<https://urf.name/urf/Language#pt-BR>|, or in its TURF literal form TODO, and the integer value 23 is represented as a resource with a lexical ID tag with the type urf-Integer, that is |<https://urf.name/urf/Integer#23>|, shown here in its TURF literal form, #23.
TODO clarify and fix RDF language tags and properties in the content vocabulary
VCard
The URF VCard Ontology provides a representation of VCard [RFC 2426] within a semantic framework. The following is VCard information in traditional vCard MIME Directory Profile syntax as specified by [RFC 2426].
The same vCard information in its URF VCard formulation is shown below for the resource «http://example.com/example/janedoe». Note that the URF VCard version provides more semantics by using true classes and properties to describe what [RFC 2426] calls structured values.
References
- DCMI Namespace
- Andy Powell and Harry Wagner. Namespace Policy for the Dublin Core Metadata Initiative (DCMI). Dublin Core Namespace Initiative, 2007.
- Guise
- Guise™ Internet Application Framework. GlobalMentor, Inc.
- IANA Charset Registry
- IANA Charset Registry. Internet Assigned Numbers Authority.
- IEEE 754-2008
- IEEE Standard for Floating-Point Arithmetic. IEEE.
- ISO 8601
- ISO 8601:2004(E): Data elements and interchange formats — Information interchange — Representation of dates and times. International Organization for Standardization, 2004-12-01.
- RDF/XML
- RDF 1.1 XML Syntax, Fabien Gandon (INRIA), Guus Schreiber (VU University Amsterdam), W3C, 2014.
- RFC 2119
- Key words for use in RFCs to Indicate Requirement Levels, S. Bradner (Harvard University). IETF.
- RFC 2130
- C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R. Atkinson, M. Crispin, and P. Svanberg. RFC 2130: The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996. Internet Engineering Task Force, 1997.
- RFC 2278
- N. Freed and J. Postel. RFC 2278: IANA Charset Registration Procedures. Internet Engineering Task Force, 1998.
- RFC 2426
- F. Dawson and T. Howes. RFC 2426: vCard MIME Directory Profile. Internet Engineering Task Force, 1998.
- RFC 2445
- F. Dawson and D. Stenerson. RFC 2445: Internet Calendaring and Scheduling Core Object Specification (iCalendar). Internet Engineering Task Force, 1998.
- RFC 3339
- G. Klyne and C. Newman. RFC 3339: Date and Time on the Internet: Timestamps. Internet Engineering Task Force, 2002.
- RFC 3966
- The tel URI for Telephone Numbers, H. Schulzrinne (Columbia University). IETF.
- RFC 3986
- Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee (W3C/MIT), R. Fielding (Day Software), L. Masinter (Adobe Systems). IETF.
- RFC 3987
- Internationalized Resource Identifiers (IRIs), M. Duerst (W3C), M. Suignard (Microsoft Corporation). IETF.
- RFC 4122
- A Universally Unique IDentifier (UUID) URN Namespace, P. Leach (Microsoft Corporation), M. Mealling (Refactored Networks, LLC), R. Salz (DataPower Technology, Inc.). IETF.
- RFC 4627
- D. Crockford. RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON). Internet Engineering Task Force, 2006.
- RFC 4646
- A. Phillips and M. Davis. RFC 4646: Tags for Identifying Languages. Internet Engineering Task Force, 2006.
- RFC 4648
- The Base16, Base32, and Base64 Data Encodings, S. Josefsson (SJD). IETF.
- RFC 5322
- Internet Message Format, P. Resnick, Ed. (Qualcomm Incorporated). IETF.
- RFC 6068
- The 'mailto' URI Scheme, M. Duerst (Aoyama Gakuin University), L. Masinter (Adobe Systems Incorporated), J. Zawinski (DNA Lounge). IETF.
- RFC 6838
- Media Type Specifications and Registration Procedures, N. Freed (Oracle), J. Klensin, T. Hansen (AT&T Laboratories). IETF.
- RFC 7159
- The JavaScript Object Notation (JSON) Data Interchange Format, T. Bray (Google, Inc.). IETF.
- Unicode BOM FAQ
- Asmus Freytag and Mark Davis. Unicode Byte Order Mark (BOM) FAQ. Unicode, Inc., Retrieved 2006-06-07.
- URF
- Uniform Resource Framework (URF) Specification, Garret Wilson (GlobalMentor, Inc.).
- UTR #13
- Unicode Technical Report #13: Unicode Newline Guidelines, Mark Davis. Unicode, Inc.
- XML Schema 2
- Paul V. Biron and Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. World Wide Web Consortium, 2004-10-28.
Acknowledgements
- Brad Neuberg encouraged the creation of an alternate RDF serialization.
- Frank Manola made convincing arguments for using ordinals instead of integers as list element properties.