Text URF (TURF) Specification

Garret Wilson (GlobalMentor, Inc.)
Draft 2019-03-07


TURF is the text interchange format for URF. TURF emphasizes terseness and consistency while maintaining human readability, with a preference for using symbols from existing interchange formats such as JSON and programming languages such as Java and C#. TURF has several useful properties, including:

Conventions Used in this Document

The key words “must”, “must not”, “required”, “shall”, “shall not”, “should”, “should not”, “recommended”, “may”, and “optional” in this document are to be interpreted as described in RFC 2119. Parts of this specification marked as notes and annotations are non-normative.

Internet Media Type

The Internet media type (RFC 2046, RFC 6657) of a TURF document shall be text/urf. A TURF document must be encoded in UTF-8, UTF-16, or UTF-32, and must not begin with a byte order mark (BOM) or UTF-8 signature. If encoded in any encoding other than UTF-8, a TURF document must begin with a TURF signature.
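This example is non-normative. As a minimal sketch of the BOM prohibition, a parser might reject input whose first bytes match a known byte order mark. The function name starts_with_bom is hypothetical and not part of TURF; detection of the TURF signature itself is not shown.

```python
import codecs

# Byte order marks for the encodings TURF allows (plus the UTF-8 signature).
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
)


def starts_with_bom(data: bytes) -> bool:
    """Return True if the raw document bytes begin with any known BOM."""
    return data.startswith(_BOMS)
```

A parser following this sketch would treat a True result as an error before attempting to decode the document.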

TURF differs from general RFC 2046 text media types in the following:

Any application-specific URF data encoded as TURF should be represented as application/applicationName+turf to allow the data to be recognized as such, where applicationName is the application-specific identifier for the URF information.


A TURF document encodes an URF instance (TODO reference URF) as one or more graphs of resources representing URF statements. A TURF document may contain no resource representations, in which case it represents no URF statements.


The production rules in this specification reference and build upon those already defined in URF.


TURF considers the following characters as whitespace, including characters in the Unicode Space_Separator (Zs) category.

Line Endings

TURF recognizes the CARRIAGE RETURN (CR) character (U+000D), the LINE FEED (LF) character (U+000A), and any Unicode Line_Separator (Zl) or Paragraph_Separator (Zp) character as marking the end of a line. A TURF parser must behave as if every CRLF sequence as well as every CR not followed by a LF were normalized to a single LF. A TURF serializer should use the conventional line ending sequence supported by the platform on which it is running if that sequence is allowed by this specification.
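This example is non-normative. The normalization a parser must behave as if it had performed can be sketched as follows. The function name normalize_line_endings is hypothetical, and the sketch covers only the CRLF and CR cases named in the normalization rule, not the Zl and Zp characters.

```python
def normalize_line_endings(text: str) -> str:
    """Replace every CRLF pair and every lone CR with a single LF."""
    # CRLF is replaced first so that the CR of a pair is not converted twice.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```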


Line Comments

A line comment may appear before the end of any line. A line comment begins with the EXCLAMATION MARK character ! (U+0021) and proceeds to the next line ending character.


The comments and whitespace that may appear between some structures are referred to as filler, indicated in this specification using the MIDDLE DOT character ·.

Line Breaks

A line break is any end of line surrounded on either side by filler.


Several TURF types allow components to be presented in a sequence. A sequence is a syntactical construct indicated by the form item-sequence, where item is the construct that may appear zero or more times in the sequence.

Any two items in a sequence are separated by a sequence separator, which is either a COMMA character , (U+002C) optionally surrounded by line breaks; or one or more line breaks without a COMMA character. If a COMMA character is present, an item must follow. This means that one or more line breaks may end a sequence or appear in an empty sequence.


A TURF document is divided into four sections: a header with optional signature, a document description, a body, and a footer, each of which is optional. If a footer is included, a header must be included as well.


Example TURF document with optional header, signature, and footer.
  space-dc = <http://purl.org/dc/elements/1.1/>
  space-foaf = <http://xmlns.com/foaf/0.1/>

! body of document …


The optional document header begins and ends with a backslash or REVERSE SOLIDUS \ (U+005C). If a header is included it must appear at the first character of the document. The header may contain directives, a sequence of name-value pairs that provide serialization details essential for parsing. A TURF parser must not return directives as part of the URF instance, but may provide a way to access them separately. The production rules for literals appear later in this specification.


This section is experimental. If a signature is present, it must appear starting at the first character of the header. A TURF parser must use the bytes of the signature, or the absence of a signature, to determine the charset and byte order of the document. A TURF parser may use the signature as a heuristic to reasonably determine that some file in fact contains a TURF document.


Namespace Declaration

For every directive corresponding to a tag in the https://urf.name/space/ namespace, the tag name declares a namespace alias prefix to be used with handles within the document body, and the value, which must be an urf-Iri literal, designates the namespace with which the alias is associated. A TURF serializer must generate a header containing appropriate namespace declarations for all custom namespaces used in the document.

For example a directive handle space-dc indicates that dc is to be used as a namespace alias prefix. The example TURF document header in the figure above associates that prefix with the <http://purl.org/dc/elements/1.1/> namespace. Thus if dc/creator were to be used within the body of the example document above, it would indicate the tag <http://purl.org/dc/elements/1.1/creator>.
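This example is non-normative. The alias resolution described above can be sketched as a lookup followed by concatenation. The function name resolve_handle and the dictionary of directives are hypothetical, and plain concatenation of namespace and name is an assumption that matches the dc/creator example; the URF specification governs the actual rule.

```python
def resolve_handle(handle: str, directives: dict) -> str:
    """Resolve an aliased handle such as 'dc/creator' to a tag IRI.

    `directives` maps directive handles (for example 'space-dc') to the
    namespace IRI declared for them in the document header.
    """
    alias, separator, name = handle.partition("/")
    if not separator:
        raise ValueError(f"handle has no namespace alias: {handle!r}")
    directive = "space-" + alias
    if directive not in directives:
        raise KeyError(f"undeclared namespace alias: {alias!r}")
    # Assumption: the tag is the namespace IRI followed by the local name.
    return directives[directive] + name
```

With the header from the example document, resolving dc/creator yields the tag IRI given in the paragraph above.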

Document Description

Example TURF document with a document description.
  space-dc = <http://purl.org/dc/elements/1.1/>
  dc/title = "Example Document"
  dc/creator = "Jane Doe"

! body of document …

This section is experimental. The optional document description provides a description of the document contents, separate from its serialization and independent of the represented URF graph. The document description begins and ends with the NUMBER SIGN # (U+0023), and contains a sequence of name-value pairs. A TURF parser must not return the document description as part of the URF instance graph, but should provide some means for it to be retrieved after parsing is finished.


The document body contains zero or more resources, which may recursively contain other resources.


This section is experimental. The document footer, if present, explicitly indicates the end of the TURF document.


A resource consists of an optional label followed by a resource representation.

A label consists of an identifier, which is either an URF name, a string, or an IRI, surrounded by matching VERTICAL LINE characters | (U+007C). The first occurrence of a label with a particular identifier may include a resource representation; if no resource representation is present at the first appearance of a label with some identifier, an object with no type and no description is implied. Subsequent appearances of a label with the same identifier must not include a resource representation. A nested resource representation may refer to the label of an outer resource in the graph.

The tokens false and true must not appear as handles in a TURF document.

If a label uses an URF name as its identifier, it indicates an alias for referencing resources only within the confines of the TURF document. If the identifier is an IRI, it represents the URF tag of the resource. A tag label must not introduce any resource representation that itself represents a tag, such as a literal or an object represented by a handle.

A string as the identifier indicates the URF ID for a resource. The ID shall be combined with the tag of the resource type to form the resource tag, as prescribed by the URF specification. This implies that an ID must not appear in front of any resource representation other than an object; and if an ID is present an object must indicate a type and must not be represented by a handle. Furthermore because of the URF formula for creating resource tags given an ID and a type tag, if a label indicates an ID the object type must not itself also indicate an ID.


Objects are general resources with an optional type that may be described by a description.

If a type is indicated, it represents an URF statement with the object as the subject and the identified type as the value of the urf-type property. If preceded by a tag reference (that is, a tag label or, equivalently, a handle introduces the object), the indicated tag serves as the tag of the object.


A description must not follow any resource representation other than an object. This restriction may be lifted in a future version of TURF. A description must not contain more than one property with the same handle, and a TURF parser must consider such a condition as a non-recoverable error. (TODO lift restriction here and in SURF for JSON compatibility) A TURF parser must preserve all distinct values of each n-ary property.


TURF literals are lexical representations of tagged resources with lexical ID types. Any TURF literal can also be represented by indicating the resource's lexical representation in an ID label as explained above. The literal 1.23 can also be represented as |"1.23"|*urf-Number, for example. If the resource's canonical lexical representation is also an URF name, it may even be represented as a handle. For example the literal "foo" can be represented as urf-String#foo.

The following definitions in many cases delegate to the respective URF definitions of the canonical lexical representations for the respective types. The summaries of the URF canonical lexical representations when given are for informational purposes only. In addition a production rule may refer to the canonical lexical representation of an URF type. For example urf-EmailAddress_lex refers to the canonical lexical representation of the urf-EmailAddress type.


The TURF binary literal representation for resources with the urf-Binary type begins with the PERCENT SIGN character % (U+0025) and is followed by the canonical lexical representation. As defined by URF, the canonical lexical representation for urf-Binary is zero or more bytes encoded using the “Base 64 Encoding” defined in RFC 4648, beginning with the PERCENT SIGN character % (U+0025), using the “base64url” alphabet with no Base 64 padding.
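This example is non-normative. A minimal sketch of the encoding, using the “base64url” alphabet with padding removed, might look like the following. The function names are hypothetical, and the sketch assumes a single leading PERCENT SIGN before the encoded bytes.

```python
import base64


def to_turf_binary(data: bytes) -> str:
    """Encode bytes as '%' followed by unpadded base64url text."""
    encoded = base64.urlsafe_b64encode(data).decode("ascii")
    return "%" + encoded.rstrip("=")


def from_turf_binary(literal: str) -> bytes:
    """Decode a literal produced by to_turf_binary()."""
    if not literal.startswith("%"):
        raise ValueError("binary literal must begin with '%'")
    body = literal[1:]
    # Restore the padding the decoder expects; it was stripped on encoding.
    padded = body + "=" * (-len(body) % 4)
    return base64.urlsafe_b64decode(padded)
```

The two bytes FB FF exercise both characters that distinguish “base64url” from the standard alphabet: they encode to %-_8.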


The TURF literal representation of urf-Boolean resources is simply the canonical lexical representation of the resource: either of the tokens true or false.


The TURF literal representation for an urf-Character resource is the Unicode character being represented, delimited on both sides by the APOSTROPHE character ' (U+0027). The backslash or REVERSE SOLIDUS \ (U+005C) is used as an escape character. The APOSTROPHE, REVERSE SOLIDUS, and control characters must not appear in a character unless they are escaped. The following escape sequences are allowed:

Any 16-bit Unicode code point encoding, where XXXX is four hexadecimal digits in any case. Escaped Unicode code points outside the Basic Multilingual Plane must be represented as two UTF-16 surrogate characters.

A TURF parser must correctly interpret characters outside the Basic Multilingual Plane, whether represented as a literal character or as an escaped Unicode code point.
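This example is non-normative. The sketch below shows one way escaped UTF-16 surrogate pairs could be combined into a single code point. It assumes the escape takes the form of a backslash, the letter u, and four hexadecimal digits; that form and the function name decode_unicode_escapes are assumptions. The sketch handles only this one escape and does not account for an escaped backslash preceding the letter u.

```python
import re

# Assumed escape form: backslash, 'u', four hexadecimal digits in any case.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Replace escapes with characters, joining UTF-16 surrogate pairs."""
    # Each escape becomes one UTF-16 code unit, possibly a lone surrogate.
    units = _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)
    # A round trip through UTF-16 joins adjacent high and low surrogates.
    # An unpaired surrogate makes the decode step raise UnicodeDecodeError.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")
```

Characters outside the Basic Multilingual Plane that appear literally in the input pass through the round trip unchanged.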

TODO production

Email Address

An email address literal for the urf-EmailAddress type begins with the CIRCUMFLEX ACCENT character ^ (U+005E) commonly known as a “caret”, followed by the URF canonical lexical representation: the “addr-spec” format specified in RFC 5322, without any obsolete elements, comments or folding white space.


The literal representation of urf-Iri is the canonical lexical representation for that type (from RFC 3987) placed between a LESS-THAN SIGN character < (U+003C) and a GREATER-THAN SIGN character > (U+003E).

If an email address, telephone number, or UUID appears between the delimiters, it represents an “IRI short form” that is equivalent to a literal IRI according to the following rules:

The email address is converted into an IRI with a scheme of mailto according to RFC 6068.
The telephone number is converted into an IRI with a scheme of tel according to RFC 3966.
The UUID is converted into an IRI with a scheme of urn and a URN namespace of uuid according to RFC 4122.
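This example is non-normative. The three conversion rules above can be sketched as follows. The function name expand_iri_short_form is hypothetical, and the patterns are deliberately simplified assumptions: the email pattern does not implement the full RFC 5322 “addr-spec” grammar, and no percent-encoding required by RFC 6068 is applied.

```python
import re

_TELEPHONE = re.compile(r"\+[0-9]+")
_UUID = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}"
    r"-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)
# Simplified assumption: one '@' with no whitespace, ':' or '/' around it.
_EMAIL = re.compile(r"[^@\s:/]+@[^@\s:/]+")


def expand_iri_short_form(content: str) -> str:
    """Expand the text between '<' and '>' into a full IRI if it is a short form."""
    if _TELEPHONE.fullmatch(content):
        return "tel:" + content
    if _UUID.fullmatch(content):
        return "urn:uuid:" + content
    if _EMAIL.fullmatch(content):
        return "mailto:" + content
    # Anything else is taken to be an IRI already.
    return content
```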


There are several related literal representations for the urf-Number type and its subclasses. The TURF literal production rules are more lenient than the corresponding URF canonical representations, so TURF number literals must be normalized. The general production rule for the urf-Number types is a base 10 representation that may be negative and may be fractional.

If the literal begins with the DOLLAR SIGN character $ (U+0024), it represents an urf-Decimal instance. A TURF parser must represent a decimal instance exactly without rounding within the supported range. Specifically a TURF parser must not represent decimal numbers using IEEE 754.

If the literal does not begin with the DOLLAR SIGN character $ (U+0024) it represents an instance of urf-Number unless it contains neither a fraction nor an exponent component, in which case it represents an instance of urf-Integer.

A number literal should be in its canonical form:

Nevertheless the presence of any leading zero(s) in the whole component shall not be interpreted as indicating any number base other than base 10.
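This example is non-normative. The sketch below classifies a number literal according to the rules above. The regular expression is an assumed approximation of the production rule, not the rule itself; the function name parse_number is hypothetical; and the use of a binary floating-point value for urf-Number is an implementation choice made only for this sketch.

```python
import re
from decimal import Decimal

# Assumed grammar: optional '$', optional '-', digits, optional fraction,
# optional exponent.
_NUMBER = re.compile(r"(\$?)(-?[0-9]+)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def parse_number(literal: str):
    """Return a (type name, value) pair for a TURF number literal."""
    match = _NUMBER.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a number literal: {literal!r}")
    dollar, _whole, fraction, exponent = match.groups()
    body = literal[1:] if dollar else literal
    if dollar:
        # Decimal keeps the value exact; no IEEE 754 rounding is involved.
        return "urf-Decimal", Decimal(body)
    if fraction is None and exponent is None:
        # int() always reads base 10, so leading zeros do not imply octal.
        return "urf-Integer", int(body)
    return "urf-Number", float(body)
```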

Regular Expression

The TURF literal representation for an urf-RegularExpression is its URF canonical lexical representation surrounded by slash or SOLIDUS characters / (U+002F). The backslash or REVERSE SOLIDUS \ (U+005C) is interpreted as an escape character only if followed by a slash character /.

TODO decide on whether and how to allow flags


An urf-String literal is its URF canonical lexical representation, escaped as necessary, and delimited on both sides by the QUOTATION MARK character " (U+0022). The sequence of Unicode code points in a string should follow Normalization Form C (NFC) as per UAX #15. The backslash or REVERSE SOLIDUS \ (U+005C) is used as an escape character. The QUOTATION MARK, REVERSE SOLIDUS, and control characters must not appear in a string unless they are escaped. The following escape sequences are allowed:

Any 16-bit Unicode code point encoding, where XXXX is four hexadecimal digits in any case. Escaped Unicode code points outside the Basic Multilingual Plane must be represented as two UTF-16 surrogate characters.

TODO production


The literal representation of urf-TelephoneNumber is exactly its URF canonical lexical form: the format prescribed by RFC 3966, which is a PLUS SIGN + (U+002B), followed by at least one digit, with no visual separators. The presence of the beginning PLUS SIGN provides a built-in TURF delimiter.

Example TURF telephone number.
  • +12015550123


Example TURF temporal literals.
  • @2017-02-12T23:29:18.829Z
  • @2017-02-12T15:29:18.829-08:00[America/Los_Angeles]
  • @2017-02-12T15:29:18.829-08:00
  • @2017-02-12-08:00
  • @15:29:18.829-08:00
  • @2017-02-12T15:29:18.829
  • @2017-02-12
  • @15:29:18.829
  • @2017-02
  • @--02-12
  • @2017

The TURF literal representations for the urf-Temporal types are each identical to the URF canonical lexical representation for the respective type, with a prefix of the COMMERCIAL AT character @ (U+0040). The URF canonical lexical representations for the most part comply with ISO 8601, with time zones from the IANA TZ database represented following java.time.format.DateTimeFormatter.ISO_ZONED_DATE_TIME.
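This example is non-normative. As a sketch under stated assumptions, the date-time forms among the examples above can be read by removing the COMMERCIAL AT prefix, separating any bracketed zone ID, and parsing the remainder as ISO 8601. The function name parse_timestamp is hypothetical. The sketch does not cover the partial forms such as a year alone or a month-day, and whether a trailing Z is accepted depends on the Python version.

```python
from datetime import datetime


def parse_timestamp(literal: str):
    """Return (datetime, zone ID or None) for a date-time temporal literal."""
    if not literal.startswith("@"):
        raise ValueError("temporal literal must begin with '@'")
    body = literal[1:]
    zone_id = None
    if body.endswith("]"):
        # Split 'date-time[Zone/Id]' into the date-time and the zone ID.
        body, _, zone_id = body[:-1].partition("[")
    return datetime.fromisoformat(body), zone_id
```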


A TURF literal for the urf-Uuid type is the AMPERSAND character & (U+0026) followed by its URF canonical lexical representation: the “UUID” production of RFC 4122.

Example TURF UUID.
  • &f81d4fae-7dec-11d0-a765-00a0c91e6bf6


Collections represent the urf-Collection abstract data types that can hold other resources.


A TURF list represents an instance of urf-List using an order-significant sequence of zero or more element resources with optional descriptions, beginning with a LEFT SQUARE BRACKET character [ (U+005B) and ending with a RIGHT SQUARE BRACKET character ] (U+005D).

Each element represents an URF statement in which the urf-List instance is the subject, the identified resource is the property value, and the statement property is an instance of urf-Ordinal reflecting the zero-based position of the element within the list. TODO define urf-Ordinal in URF and in TURF
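This example is non-normative. The statements implied by a list can be sketched as one subject-property-value triple per element. The function name list_statements and the tuple used to stand in for an urf-Ordinal instance are hypothetical, since urf-Ordinal is not yet defined.

```python
def list_statements(list_id, elements):
    """Return one (subject, property, value) triple per list element."""
    return [
        # The property is a placeholder for an urf-Ordinal of the
        # element's zero-based position.
        (list_id, ("urf-Ordinal", index), element)
        for index, element in enumerate(elements)
    ]
```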


A TURF map represents an instance of urf-Map using a sequence of associations between a key and a value. A TURF map begins with a LEFT CURLY BRACKET character { (U+007B) and ends with a RIGHT CURLY BRACKET character } (U+007D). Keys and values can be any resources. If a key is an object with a description, the key must be surrounded by the REVERSE SOLIDUS character \ (U+005C). The key and value in each association represent an urf-MapEntry and are separated by a COLON character : (U+003A).

A TURF map should not have entries with duplicate keys, and a TURF serializer must not produce a map with duplicate-key entries. A TURF parser must ignore all but one of each entry with the same key. TODO revisit; this brings JSON compatibility, but could cause problems with tags if a duplicate entry is ignored; also address key equality

Each entry represents four URF statements:

  1. A statement that a new anonymous resource has an urf-Type of urf-MapEntry.
  2. A statement that the urf-key of the urf-MapEntry instance is the key identified in the TURF document.
  3. A statement that the urf-value of the urf-MapEntry instance is the value identified in the TURF document.
  4. A statement that the urf-MapEntry instance is an urf-member+ of the urf-Map instance.
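This example is non-normative. The four statements per entry listed above can be sketched as subject-property-value triples. The function name map_entry_statements and the "_:entryN" identifiers for the anonymous entry resources are hypothetical.

```python
def map_entry_statements(map_id, entries):
    """Return the four (subject, property, value) triples for each entry.

    `entries` is an iterable of (key, value) pairs in document order.
    """
    statements = []
    for number, (key, value) in enumerate(entries):
        # Hypothetical identifier for the new anonymous entry resource.
        entry_id = f"_:entry{number}"
        statements.extend([
            (entry_id, "urf-type", "urf-MapEntry"),
            (entry_id, "urf-key", key),
            (entry_id, "urf-value", value),
            (map_id, "urf-member+", entry_id),
        ])
    return statements
```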


A TURF set represents an instance of urf-Set using an unordered sequence of zero or more member resources with optional descriptions, beginning with a LEFT PARENTHESIS character ( (U+0028) and ending with a RIGHT PARENTHESIS character ) (U+0029). The same resource must not appear more than once in a set.

Each member represents an URF statement that the identified resource is an urf-member+ of the urf-Set instance.


This section is non-normative.


In the JavaScript Object Notation (JSON) (RFC 7159), every object besides strings, numbers, and booleans is an associative array using string keys. The following is a complex JSON object, using every JSON data type available:

  {
    "length": 1234,
    "valid": true,
    "status": "processing",
    "results": [false, 5, "dog", {"code": 9.8}]
  }

As all JSON documents are also valid TURF documents, the above document could be parsed as TURF. It would however have some undesirable traits, because of slight semantic differences and because of limitations in JSON. In TURF the JSON “object” would be considered a map, for example. Note also that URF uses a true ordinal property for each list element, while JSON adds each array element as the value associated with a string containing the lexical form of an integer. Unlike JSON, in TURF the commas separating the values are optional if the values appear on separate lines.

Although the above TURF representation replicates the simple semantics of the given JSON example, a better formulation that takes advantage of URF semantics would most importantly use actual properties rather than associative array string key pseudo-properties. An improved formulation would also indicate the type of the resource (the FooBar class in this example). Such improvements are illustrated in the following reformulation (assuming the "code" string in the list was intended to actually be a key in a map):

  length = 1234
  valid = true
  status = "processing"
  results = [false, 5, "dog", {"code" = 9.8}]

In a TURF document, moreover, the data can be further improved by using true dates, decimal numbers, identifiers such as tags, and vocabularies from formal namespaces.


URF is a semantic superset of RDF, and can represent any construct available in RDF. URF presents equivalents of many RDF classes as well. In terms of representation formats, TURF can represent any semantic information that is available using RDF/XML, yet is more flexible and has fewer restrictions. The following is a sample RDF/XML representation of an RDF data instance using many of the capabilities available in RDF/XML:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
  <foaf:Person rdf:about="http://example.com/example#janedoe">
    <foaf:nick xml:lang="pt-BR">Janinha</foaf:nick>
    <example:age rdf:datatype="&xsd;integer">23</example:age>
    <example:birthdate rdf:datatype="&xsd;date">1980-04-05</example:birthdate>
    <example:motto rdf:parseType="Literal">Do it. Do it <xhtml:em>right</xhtml:em>.</example:motto>
    <example:favoriteSites rdf:parseType="Collection">
      <rdf:Description rdf:about="http://www.globalmentor.com/"/>
      <rdf:Description rdf:about="http://www.garretwilson.com/"/>

The following URF information represented in TURF is semantically equivalent to the RDF information in the previous example represented in RDF/XML:

  space-dc = <http://purl.org/dc/elements/1.1/>
  space-foaf = <http://xmlns.com/foaf/0.1/>
  space-example = <http://example.com/example/>
  example/age = #23
  example/birthdate = @1980-04-05@
  example/favoriteSites = [
  example.motto="Do it. Do it <xhtml:em xmlns:xhtml=\"http://www.w3.org/1999/xhtml\">right</xhtml:em>.":
  example/possibleVacationDestinations = (

Note that, rather than use special literal types or general strings, URF promotes the representation of resources by URI tags. Thus the language tag "pt-BR" for Brazilian Portuguese is represented as a resource with a lexical ID tag with the type urf-Language, that is |<https://urf.name/urf/Language#pt-BR>|, or in its TURF literal form TODO, and the integer value 23 is represented as a resource with a lexical ID tag with the type urf-Integer, that is |<https://urf.name/urf/Integer#23>|, shown here in its TURF literal form, #23.

TODO clarify and fix RDF language tags and properties in the content vocabulary


The URF VCard Ontology provides a representation of VCard [RFC 2426] within a semantic framework. The following is VCard information in traditional vCard MIME Directory Profile syntax as specified by [RFC 2426].

FN:Jane Doe
ORG:Example Corporation;North American Division;Business Development
TITLE:Directory of Business Development
ADR;TYPE=WORK,POSTAL,PARCEL:;Suite 45;123 Some Street;Someplace;CA;12345-6789;USA
LABEL="123 Some Street\nSuite 45\nSomeplace, CA 12345-6789"

The same vCard information in its URF VCard formulation is shown below for the resource «http://example.com/example/janedoe». Note that the URF VCard version provides more semantics by using true classes and properties to describe what [RFC 2426] calls structured values.

    vcard.fn="Jane Doe"
      vcard.additionalName=\"Mary", "Ann"\
      vcard.honorarySuffix=\"M.D.", "Ph.D."\
    vcard.org=\"Example Corporation", "North American Division", "Business Development"\
    vcard.title="Directory of Business Development"
      vcard.extendedAddress="Suite 45"
      vcard.streetAddress="123 Some Street"
    vcard.label="123 Some Street\nSuite 45\nSomeplace, CA 12345-6789"
    vcard.email=\<janedoe@example.com>, <jdoe@example.org>\
TODO convert to new TURF conceptualization


DCMI Namespace
Andy Powell and Harry Wagner. Namespace Policy for the Dublin Core Metadata Initiative (DCMI). Dublin Core Namespace Initiative, 2007.
Guise™ Internet Application Framework. GlobalMentor, Inc.
IANA Charset Registry
IANA Charset Registry. Internet Assigned Numbers Authority.
IEEE 754-2008
IEEE Standard for Floating-Point Arithmetic. IEEE.
ISO 8601
ISO 8601:2004(E): Data elements and interchange formats — Information interchange — Representation of dates and times. International Organization for Standardization, 2004-12-01.
RDF 1.1 XML Syntax, Fabien Gandon (INRIA), Guus Schreiber (VU University Amsterdam), W3C, 2014.
RFC 2046
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, N. Freed (Innosoft), N. Borenstein. IETF.
RFC 2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner (Harvard University). IETF.
RFC 2130
C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R. Atkinson, M. Crispin, and P. Svanberg. RFC 2130: The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996. Internet Engineering Task Force, 1997.
RFC 2278
N. Freed and J. Postel. RFC 2278: IANA Charset Registration Procedures. Internet Engineering Task Force, 1998.
RFC 2426
F. Dawson and T. Howes. RFC 2426: vCard MIME Directory Profile. Internet Engineering Task Force, 1998.
RFC 2445
F. Dawson and D. Stenerson. RFC 2445: Internet Calendaring and Scheduling Core Object Specification (iCalendar). Internet Engineering Task Force, 1998.
RFC 3339
G. Klyne and C. Newman. RFC 3339: Date and Time on the Internet: Timestamps. Internet Engineering Task Force, 2002.
RFC 3966
The tel URI for Telephone Numbers, H. Schulzrinne (Columbia University). IETF.
RFC 3986
Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee (W3C/MIT), R. Fielding (Day Software), L. Masinter (Adobe Systems). IETF.
RFC 3987
Internationalized Resource Identifiers (IRIs), M. Duerst (W3C), M. Suignard (Microsoft Corporation). IETF.
RFC 4122
A Universally Unique IDentifier (UUID) URN Namespace, P. Leach (Microsoft Corporation), M. Mealling (Refactored Networks, LLC), R. Salz (DataPower Technology, Inc.). IETF.
RFC 4627
D. Crockford. RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON). Internet Engineering Task Force, 2006.
RFC 4646
A. Phillips and M. Davis. RFC 4646: Tags for Identifying Languages. Internet Engineering Task Force, 2006.
RFC 4648
The Base16, Base32, and Base64 Data Encodings, S. Josefsson (SJD). IETF.
RFC 5322
Internet Message Format, P. Resnick, Ed. (Qualcomm Incorporated). IETF.
RFC 6068
The 'mailto' URI Scheme, M. Duerst (Aoyama Gakuin University), L. Masinter (Adobe Systems Incorporated), J. Zawinski (DNA Lounge). IETF.
RFC 6657
Update to MIME regarding "charset" Parameter Handling in Textual Media Types, A. Melnikov (Isode Limited), J. Reschke (greenbytes). IETF.
RFC 7159
The JavaScript Object Notation (JSON) Data Interchange Format, T. Bray (Google, Inc.). IETF.
Unicode BOM FAQ
Asmus Freytag and Mark Davis. Unicode Byte Order Mark (BOM) FAQ. Unicode, Inc., Retrieved 2006-06-07.
Uniform Resource Framework (URF) Specification, Garret Wilson (GlobalMentor, Inc.).
UTR #13
Unicode Technical Report #13: Unicode Newline Guidelines, Mark Davis. Unicode, Inc.
XML Schema 2
Paul V. Biron and Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. World Wide Web Consortium, 2004-10-28.