
The Universal Networking Language (UNL)

Specifications

Version 2010

UNDL Foundation

December 2010

The current specifications constitute a comprehensive set of guidelines and standards for the development and implementation of the Universal Networking Language (UNL). Prepared by the UNDL Foundation, they result from an extensive revision of the previous specifications and incorporate the outcomes of several UNL projects, as well as the experience gained through UNL annotation and tool development. These specifications strengthen the language-independence features of UNL and have been thoroughly discussed within the UNL community, particularly through the UNLweb platform. They replace the earlier specifications and provide a more consistent and robust framework for the effective use of UNL across different applications and platforms.

1. Introduction

The UNL, short for Universal Networking Language, is a computer language designed to represent the meanings expressed by natural languages in a machine-readable and language-independent form. Its purpose is not to replicate the communicative functions of human language, but to provide a formal computational framework through which the semantics of any utterance can be explicitly encoded and processed by computers. By allowing machines to handle information at the level of meaning, the UNL enables the emulation of human linguistic abilities grounded in interpretation and comprehension, thereby offering a foundation for multilingual understanding, knowledge processing, and intelligent communication.

1.1 History

The UNL Programme was launched in 1996 as an initiative of the Institute of Advanced Studies of the United Nations University (UNU/IAS) in Tokyo, Japan. In January 2001, the United Nations University established an autonomous organization—the UNDL Foundation—to oversee the development and management of the Programme. The Foundation is a non-profit international organization, independent from the UNU yet maintaining special links with the United Nations. It inherited from the UNU/IAS the mandate to implement the UNL Programme, and its headquarters are located in Geneva, Switzerland.

Since its inception, the UNL Programme has reached several important milestones. The overall architecture of the UNL System has been developed, along with a suite of core software components and tools essential to its operation, which continue to be tested and refined. Over the past years, a substantial collection of linguistic resources in multiple native languages has been accumulated, supported by a robust technical infrastructure that enables the inclusion of many more languages in the system. In parallel, an increasing number of scientific papers and academic dissertations on UNL are being published each year.

One of the Programme’s most notable achievements is the recognition of UNL’s innovative nature and industrial applicability by the Patent Cooperation Treaty (PCT), granted in May 2002 through the World Intellectual Property Organization (WIPO). This was an unprecedented accomplishment within the United Nations system.

This recognition not only affirms the UNL’s value as a pioneering linguistic framework but also underscores its potential for practical applications in fields such as artificial intelligence, natural language processing, and multilingual communication.

1.2 Structure

The UNL is a declarative language designed to express information and knowledge in the form of a semantic hypergraph. A semantic hypergraph is a structured representation made of interconnected pieces of meaning, where each node (or hyper-node) corresponds to a concept, and each arc corresponds to a semantic relation between concepts.

In the UNL framework, meaning can be codified at three different levels, according to its nature: conceptual, relational, and attributive. Accordingly, the UNL semantic hypergraph is composed of three types of discrete semantic units:

  • Universal Words (UWs): represent concepts (they are the nodes or hyper-nodes in the UNL semantic hypergraph);
  • Universal Relations: represent relations between concepts (they are the arcs in the UNL semantic hypergraph);
  • Universal Attributes: represent properties, characteristics, or contextual features of concepts (they function as annotations within the UNL semantic hypergraph).

Consider the following sentences:

{ara} القط السمين يجلس على السجادة. {/ara}
{chn} 胖猫坐在垫子上。 {/chn}
{deu} Die fette Katze sitzt auf dem Teppich. {/deu}
{eng} The fat cat sits on the mat. {/eng}
{esp} El gato gordo está sentado sobre la alfombra. {/esp}
{fra} Le gros chat est assis sur le tapis. {/fra}
{jpn} 太った猫がマットの上に座っています。 {/jpn}
{rus} Толстый кот сидит на коврике. {/rus}

In UNL, all these sentences are said to convey the same meaning and are represented in the following way:

{unl}
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
{/unl}

In this UNL expression:

  • {unl} and {/unl} mark the beginning and end of the UNL segment.
  • The indexes 300986027 (= having an (over)abundance of flesh), 102121620 (= feline mammal usually having thick soft fur and no ability to roar), 201543123 (= to be seated), and 103727837 (= a thick flat pad used as a floor covering) correspond to concepts expressed by Universal Words (UWs), here represented by their Uniform Concept Locators (UCLs). Each UCL is a unique identifier within the UNL Knowledge Base (UNL KB) — a large semantic network in which concepts are formally defined through their relations to other concepts, allowing their meaning to be inferred computationally. For readability, a UW may alternatively be represented by its Uniform Concept Name (UCN), which varies by language. For instance, 102121620 may appear as “قط(icl>سنوريات)”, “猫(icl>猫科)”, “cat(icl>feline)”, “Katze(icl>Katzenartige)”, “gato(icl>félido)”, “chat(icl>félidé)”, “猫(icl>ネコ科)”, or “кошка(icl>кошачьи)”.
  • aoj (attribute of an object), exp (experiencer), and plc (place) are binary semantic relations between UWs. They encode, for example, that 300986027 (“fat”, “gordo”, “gros”, “太った”, “толстый”) is an attribute of 102121620 (“cat”, “gato”, “chat”, “猫”, “кошка”); that 201543123 (“sit”, “sentarse”, “s’asseoir”, “座る”, “сидеть”) denotes an action experienced by 102121620; and that 103727837 (“mat”, “alfombra”, “tapis”, “マット”, “коврик”) indicates the location associated with 201543123.
  • @def, @present, and @superior.@adjacent are attributes that further specify UWs. The attribute @def marks definiteness (a specific referent); @present locates the event in present time; and @superior.@adjacent expresses the spatial relation “on top of” between two entities.
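
The reading of a UNL expression described above can be sketched in a few lines of Python. This is only an illustration, not part of the UNL toolset: the names Node, Relation, parse_node, and parse_relation are hypothetical, and the sketch assumes that each line holds exactly one binary relation whose name is written in lowercase letters and whose attributes are introduced by ".@".

```python
import re
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    uw: str                      # the UW as written (UCL index or UCN)
    attributes: Tuple[str, ...]  # attribute names without the leading "@"


class Relation(NamedTuple):
    name: str
    source: Node
    target: Node


# name(source, target): assumes one binary relation per line
_RELATION = re.compile(r"^\s*([a-z]+)\(\s*(.+?)\s*,\s*(.+?)\s*\)\s*$")


def parse_node(text: str) -> Node:
    """Split '102121620.@def' into the UW and its attributes."""
    uw, *attributes = text.split(".@")
    return Node(uw, tuple(attributes))


def parse_relation(line: str) -> Relation:
    """Parse a line such as 'aoj(300986027, 102121620.@def)'."""
    match = _RELATION.match(line)
    if match is None:
        raise ValueError(f"not a binary relation: {line!r}")
    name, source, target = match.groups()
    return Relation(name, parse_node(source), parse_node(target))


if __name__ == "__main__":
    for line in (
        "aoj(300986027, 102121620.@def)",
        "exp(201543123.@present, 102121620.@def)",
        "plc(201543123.@present, 103727837.@superior.@adjacent.@def)",
    ):
        print(parse_relation(line))
```

Keeping the attributes separate from the UW makes it easy to see that the same node (102121620) takes part in two relations while carrying the same attribute in both.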

This representation captures the semantics of sentences independently of their original language, while providing a structured format that can be computationally processed. Crucially, it is language-independent, enabling cross-linguistic understanding and semantic interoperability. For example, languages without a direct lexical item for “cat” can retrieve its definition through the corresponding UCL in the UNL KB; conversely, languages that encode “to sit on” as a single lexical unit can map that unit to the entire relation (for instance, the plc relation linking two UWs) rather than to a single UW.

1.3 Assumptions

1. Languages convey information

The UNL assumes that one of the primary functions of natural languages is to convey information—that is, to represent what we know about the world. This aboutness of language—its representational role—is the main focus of the UNL. The goal of the UNL is therefore not to replicate what natural languages do, but to formally represent what they represent.

2. Information can be represented as semantic networks

The UNL assumes that any information conveyed through natural language can be formally and usefully represented by a semantic network. This idea is not new: semantic networks have been used in knowledge representation since the work of Charles S. Peirce, and as interlinguas for machine translation since the 1950s. In the UNL framework, this semantic network (the UNL graph) is composed of three types of discrete semantic entities: concepts, relations, and attributes. Concepts correspond to the nodes of the network; relations are the arcs connecting them; and attributes delimit the contextual use of concepts. This three-layered representation model is the cornerstone of the UNL and a distinctive feature compared to other semantic networks, which typically rely on only two levels: edges and vertices.

3. Any information may be expressed in any language

The UNL assumes that the information conveyed by natural languages is translatable—that is, languages differ not in their ability to express meaning, but in how they do so. To guarantee this translatability of information, the UNL requires that the semantic network be independent of any particular natural language (in other words, universal). This universality, however, is not meant to imply that all languages share a common set of meanings or that there are semantic primitives inherent to all languages; the notion of "universal" in UNL refers to the capacity of the system to be used and understood across all languages (as in "universal adapter" or "universal remote control"), rather than to represent a common denominator (as in "Universal Grammar"). This goal is achieved through a standardized set of semantic entities that are universally accessible—that is, whose meanings are explicitly defined and can be inferred from the components of the UNL System, such as the UNL Knowledge Base.

1.4 Commitments

The main goal of the UNL Programme is to construct the UNL, an artificial language that can be used to process information across language barriers.

The major commitments of the UNL are the following:

I - The UNL must represent information

The UNL is first and foremost a knowledge representation language. The most important corollary of this first commitment is that UNL is not a meta-language, i.e., it is not intended to describe or represent natural languages; on the contrary, it is used to represent the information conveyed by natural languages. The goal of UNL is to represent "what was meant" and not "what was said". Accordingly, the UNL provides an interpretation rather than a translation of a given utterance. The UNL version of an existing document is not bound to preserve the lexical and the syntactic choices of the original, but must represent, in a non-ambiguous format, one of its possible meanings, preferably the most conventional one.

II - The UNL must be a language for computers

The UNL is an artificial language shaped to represent knowledge in a machine-tractable format. Like other formal systems, it seeks to provide the infrastructure for computers to handle what is meant by natural languages. Unlike auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), the UNL is not intended to be a human language. We do not expect people to speak UNL or to communicate in UNL. But we do expect computers to process UNL: to generate UNL out of natural language, and vice-versa, with and without human aid. We expect computers to be able to extract information from UNL documents, and to detect paraphrases, entailments, implicatures, presuppositions, inferences, contradictions and redundancies among a set of propositions represented in UNL.

III - The UNL must be self-sufficient

In the UNL approach, there are two basic movements: UNLization and NLization. UNLization is the process of representing in UNL the information conveyed by natural language; NLization, conversely, is the process of generating a natural language document out of UNL. In order to be fully "understandable" (and manageable) by machines, the UNL must be self-sufficient, i.e., it should be as semantically complete and saturated as possible. The UNL representation must not depend on any implicit knowledge, and should explicitly codify all information. This means that UNLization and NLization should be completely independent from each other: the UNLization should not take into consideration the target language or format of any future NLization, and the NLization should not need any information about the original source language or previous structure of any UNL document.

IV - The UNL must be general-purpose

At first glance, the UNL seems to be a pivot language to which source texts are converted before being translated into the target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge. In addition to translation, the UNL is expected to be used in several other tasks, such as text mining, multilingual document generation, summarization, text simplification, information retrieval and extraction, sentiment analysis, etc. Indeed, in UNL-based systems there is no need for the source language to be different from the target language: an English text may be represented in UNL in order to be generated, once again, in English, as a summarized, simplified, localized or simply rephrased version of the original.

V - The UNL must be independent from any particular natural language

The UNL is expected to be the language of the United Nations and, therefore, must not be circumscribed to any existing natural language in particular, under the risk of being rejected by the member states of the General Assembly.

1.5 Properties

Non-Ambiguity

As a formal system, the UNL is not expected to have any ambiguity, at any level. The sentence "The girls saw the boy with the telescope" must be represented, in UNL, in a way that there is no ambiguity concerning the meaning of "saw" (past tense of the verb "to see" vs. present tense of the verb "to saw" vs. noun "saw") or the dependency relations of "with the telescope" ("saw with the telescope" vs. "the boy with the telescope").

Non-Redundancy

As a knowledge representation language, the UNL is not expected to have any redundancy. Expressions such as "free gift", "round circle" and "murder to death" are expected to be represented, in UNL, as "gift", "circle" and "murder", respectively. Likewise, sentences such as "Peter killed John", "Peter murdered John", "It's Peter who killed John" and "John was killed by Peter" are expected to be represented in UNL in the same way.

Compositionality

As a formal system, the UNL is always literal, i.e., fully compositional. UNL expressions must derive their semantic value thoroughly from their components, which must be explicitly defined in the UNL Knowledge Base. Accordingly, the UNL does not allow for any figure of speech, such as metaphor and metonymy. Tropes must be represented, in UNL, by their intended meaning. A sentence such as "John devoured thousands of books", for instance, must be represented, in UNL, as "John read many books eagerly".

Declarativeness

As a knowledge representation language, the UNL is not expected to perform speech acts (such as promises, requests, orders etc.), but only to describe them in a constative manner. For instance, given a performative utterance such as "Can you pass me the salt?", the role of the UNL is to represent "you pass the salt to me" and to indicate that this was a polite request (@polite, @request). The UNL representation itself will not be a request, nor will it be bound to provoke the same (perlocutionary) effect caused by the original utterance.

Completeness

As a fully-explicit semantic system, the UNL is not expected to have ellipses or pro-forms, except when the referent is not present in the document (exophora). A sentence such as "The monkey took the banana and ate it" must be represented, in UNL, as "[The monkey]i took [the banana]j and [the monkey]i ate [the banana]j".

2. Universal Words (UWs)

Universal Words, or simply UWs, are the words of UNL. They correspond to the nodes of a UNL graph, which are interlinked by Universal Relations and specified by Universal Attributes. UWs represent the discrete semantic units conveyed by natural language open lexical categories (noun, verb, adjective and adverb). Any other semantic content (such as that conveyed by articles, prepositions, conjunctions etc.) is represented as attributes or relations. This criterion is not language-biased: if a given semantic value proves to be conveyed, in any language, by a closed class, it should not be represented as a UW, regardless of its realisation in other languages.

2.1 The universality of UWs

As the name suggests, Universal Words (UWs) are intended to be “universal.” This does not mean that they represent a common lexical denominator across all languages or any kind of semantic primitive. In the UNL framework, the notion of “universal” should be understood as “usable and understandable by all” (as in universal adapter, universal remote control, or universal screwdriver), rather than “common to all” (as in Universal Grammar). UWs are “universal” in the sense that they serve as uniform identifiers for entities defined in the UNL Knowledge Base—a comprehensive semantic map of world knowledge designed to make any concept translatable.

UWs may correspond to concepts that are widely lexicalized across languages (such as “cause to die”); to concepts lexicalized only in a few languages (for instance, “to execute someone by suffocation so as to leave the body intact for dissection”); to highly specific concepts lexicalized in a single language (such as “a person who forgives the first offense, tolerates the second, but never the third”); or even to concepts not lexicalized in any known language (for example, “women who typically wear red hats and white shoes in large theaters”).

The universality of a UW lies not in the type of concept it represents, but in the way it represents it. A UW provides a standardized means of encoding a concept so that any natural language can interact with it—either as a single node when the concept is lexicalized, or as a hyper-node (a subgraph) when it is not.

2.2 Principles

Sense

UWs represent sense, not reference. They are associated with the intension (sense, meaning, connotation) rather than with the extension (reference, denotation) of linguistic expressions. For example, the expressions “morning star” and “evening star” have the same reference (the planet Venus) but must be represented by different UWs, since they express distinct “modes of presentation” of the same object—that is, they differ in sense: “the last star to disappear in the morning” versus “the first star to appear in the evening.”

Productivity

UWs correspond exclusively to meanings conveyed by open lexical categories: nouns, verbs, adjectives, and adverbs. Any other semantic content (such as that expressed by articles, prepositions, conjunctions, or particles) must be represented through attributes or relations. This criterion is language-independent: if a given semantic value is expressed by a closed class in any language, it should not be represented as a UW, regardless of its realization elsewhere. The only exceptions are pro-forms, which are represented by a special kind of UW, the pro-UW or null UW.

Compositionality

Simple UWs correspond only to meanings that are non-compositional—that is, to words and multiword expressions whose meanings cannot be fully derived from the combination of existing UWs, attributes, and relations. Compound and complex UWs, on the other hand, must be used whenever a meaning can be fully determined by the meanings of its constituents and by the rules used to combine them.

Comprehensiveness

UWs are “universal” in the sense that they constitute the lexicon of a universal language—one capable of representing meanings expressible in any natural language. This does not mean that all UWs are lexicalized everywhere, nor that they represent semantic primitives or universally shared concepts. Instead, the UW repertoire aims to be as comprehensive as the total set of concepts found across different languages and cultures, regardless of how specific they may be. The UNL lexicon is therefore an open and evolving set, continually expanding to incorporate new cultural and linguistic developments.

Universality

Permanent UWs represent concepts with varying degrees of universality and are accordingly stored in three nested lexical databases, which together form the UNL Dictionary:

  • UNL Core Dictionary: contains only permanent simple UWs that represent concepts presumably lexicalized in all languages;
  • UNL Abridged Dictionary: includes all permanent UWs (simple, compound, or complex) representing concepts lexicalized in at least two different language families;
  • UNL Unabridged Dictionary: includes all permanent UWs (simple, compound, or complex) representing concepts lexicalized in at least one language.

Non-Ambiguity and Non-Redundancy

Each sense must be represented by one and only one UW, and each UW must correspond to one and only one sense. Homonymy, synonymy, and polysemy are therefore excluded from the UNL framework. For the sake of readability, however, the same UW may be displayed under different Uniform Concept Names (UCNs)—such as “قط(icl>سنوريات)”, “猫(icl>猫科)”, “cat(icl>feline)”, “Katze(icl>Katzenartige)”, “gato(icl>félido)”, “chat(icl>félidé)”, “猫(icl>ネコ科)”, or “кошка(icl>кошачьи)”—but all of these correspond to the same Uniform Concept Locator (UCL) and are merely alternative linguistic representations of the same sense.

Simplicity

Simple UWs serve as addresses—not definitions—of senses. A simple UW itself conveys little or no information about its meaning; it functions merely as a label. All information about the sense it represents is provided through the three lexical databases within the UNL framework: the UNL Dictionary, the UNL Knowledge Base, and the UNL Memory.

2.3 Uniform Concept Identifier (UCI)

A Uniform Concept Identifier (UCI) is used to identify a concept. It is a URI (Uniform Resource Identifier) for Universal Words (UWs). In the UNL framework, a UCI is represented either as a UCL (Uniform Concept Locator) or as a UCN (Uniform Concept Name).

Uniform Concept Locators (UCLs)

Uniform Concept Locators (UCLs), like URLs, provide a method for locating a concept in the UNL Knowledge Base. They are represented as:

ucl://<AUTHORITY>/<ID>
Where:
  • ucl is the scheme name for Uniform Concept Locators
  • <AUTHORITY> is the knowledge base responsible for the concept (unlkb.unlarchive.org by default)
  • <ID> is the index of the concept in the knowledge base

For instance, the concept “a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs,” which is lexicalized in English as the noun table, may be located through ucl://unlkb.unlarchive.org/104379964. This address provides all the information concerning the concept—its UNL definition, relations, and attributes—which may be used by languages where this concept is not lexicalized.

Uniform Concept Names (UCNs)

Uniform Concept Names (UCNs), like URNs, do not imply the availability of the identified resource. They are represented as:

ucn:<LID>:<NSS>
Where:
  • ucn is the scheme name for Uniform Concept Names
  • <LID> is the namespace identifier, corresponding to the three-character ISO 639-2 code for the language
  • <NSS> is the namespace-specific string, which is formed by a root and a suffix.

For instance, the concept “a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs,” which is lexicalized in English as table, may be associated with several different names:

  • ucn:eng:table(icl>furniture)
  • ucn:fra:table(icl>mobilier)
  • ucn:spa:mesa(icl>mobiliario)
  • ucn:deu:Tisch(icl>Möbel)
  • ucn:rus:стол(icl>мебель)

UCNs must be unique. The namespace-specific string is normally divided into two parts: a root (e.g.: "table") and a suffix ("(icl>furniture)"), as exemplified above. The root can be a word or a multiword expression, while the suffix—always introduced by a UNL relation—is used to disambiguate the root.

Both UCL and UCN identify Universal Words. The difference is that the UCL is an address pointing to the position of the UW in the UNL Knowledge Base, whereas the UCN is only a readable name for the UW. The same address (UCL) may be associated with several UCNs, but each UCN corresponds to only one UCL. A UCL always describes an available UW, i.e., a UW that has been defined in the UNL KB, whereas a UCN is not necessarily linked to an address. In that sense, UCLs are more “official” than UCNs, which are mainly used to preserve the readability of UNL expressions.
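
The many-to-one relation between names and addresses can be illustrated with a small index. The sketch below is hypothetical (the class ConceptNameIndex and its methods are not part of any UNL tool); it only assumes what is stated above: one UCL may have several UCNs, each UCN points to at most one UCL, and a UCN may exist without an address.

```python
class ConceptNameIndex:
    """Many-to-one index from UCNs to UCLs (illustrative helper)."""

    def __init__(self):
        self._ucl_by_ucn = {}

    def register(self, ucn: str, ucl: str) -> None:
        """Attach a name to an address, refusing a second address for the same name."""
        # setdefault stores the pair only if the UCN is new and returns the stored UCL
        known = self._ucl_by_ucn.setdefault(ucn, ucl)
        if known != ucl:
            raise ValueError(f"{ucn} already names {known}")

    def locate(self, ucn: str):
        """Return the UCL for a UCN, or None if the name has no address yet."""
        return self._ucl_by_ucn.get(ucn)

    def names_for(self, ucl: str):
        """Return every registered UCN for a UCL, in alphabetical order."""
        return sorted(n for n, l in self._ucl_by_ucn.items() if l == ucl)


if __name__ == "__main__":
    index = ConceptNameIndex()
    index.register("ucn:eng:table(icl>furniture)", "104379964")
    index.register("ucn:fra:table(icl>mobilier)", "104379964")
    print(index.names_for("104379964"))
    print(index.locate("ucn:deu:Tisch(icl>Möbel)"))  # not registered in this run
```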

In the UNL Document Structure, UCIs are often abbreviated to their last component, since the scheme, authority, and namespace can be inferred from the document header. For example:

  • 104379964 instead of ucl://unlkb.unlarchive.org/104379964
  • table(icl>furniture) instead of ucn:eng:table(icl>furniture)
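
As a rough illustration of this abbreviation, the following Python sketch expands a short form back into a full UCI. The function name expand_uci and its parameters are assumptions made for the example; in particular, the sketch assumes that a string made only of digits is a UCL index and that anything else is a UCN, and it takes the authority and language as arguments rather than reading them from a document header.

```python
DEFAULT_AUTHORITY = "unlkb.unlarchive.org"  # default authority named in the text


def expand_uci(short: str, authority: str = DEFAULT_AUTHORITY, language: str = "eng") -> str:
    """Expand an abbreviated UCI into its full form.

    Assumption: digits only means a UCL index; anything else is treated as a UCN.
    """
    if short.startswith(("ucl:", "ucn:")):
        return short  # already carries a scheme, nothing to expand
    if short.isascii() and short.isdigit():
        return f"ucl://{authority}/{short}"
    return f"ucn:{language}:{short}"


if __name__ == "__main__":
    print(expand_uci("104379964"))
    print(expand_uci("table(icl>furniture)"))
    print(expand_uci("mesa(icl>mobiliario)", language="spa"))
```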

Formal syntax

Both UCLs and UCNs are types of URIs (Uniform Resource Identifiers), according to the syntax below:

<UW> ::= [ <scheme> ":" ] <hierarchical part> ;
<scheme> ::= "ucl" | "ucn" ;
<hierarchical part> ::= <UCL> | <UCN> ;
<UCL> ::= [ "//" <AUTHORITY> "/" ] <ID>  ;
<AUTHORITY> ::= <character>+ ;
<ID> ::= <digit>{9} ;
<UCN> ::= [ <LID> ":" ] <NSS> ;
<LID> ::= <character>{3} ; (ISO 639-2 language code)
<NSS> ::= <ROOT> <SUFFIX> ;
<ROOT> ::= <word> ;
<SUFFIX> ::= "(" <relation> ">" <word> ")" ;
<relation> ::= relation of the UNL framework (e.g.: icl, iof, equ, etc.) ;
<word> ::= <character>+ ;
<character> ::= UTF-8 character ;
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
Where:
  • <variable> represents a placeholder for a value.
  • "literal" represents a fixed value.
  • ::= represents a definition or assignment.
  • | represents a disjunction ("or")
  • {n} to be used exactly n times
  • [ ] to be used zero or one time
  • + to be used one or more times
  • <scheme> determines the syntax and semantics of the hierarchical part. In the UNL framework, there are two schemes:
    • ucl, used for Uniform Concept Locators (e.g.: ucl://unlkb.unlarchive.org/104379964)
    • ucn, used for Uniform Concept Names (e.g.: ucn:eng:table(icl>furniture))
  • <hierarchical part> holds the identification information, and varies according to the scheme:
    • <UCL> is used when the scheme is ucl, and provides the location of the concept in the UNL Knowledge Base.
    • <UCN> is used when the scheme is ucn, and provides a readable name for the concept.
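
The syntax above can be approximated with two regular expressions. The Python sketch below is an illustration under stated assumptions, not a reference validator: the function parse_uci is hypothetical, <character> is narrowed so that roots and suffix words contain no parentheses, the language code contains no colon, and relation names are taken to be lowercase letters.

```python
import re
from typing import Dict, Optional

# <UCL> ::= [ "//" <AUTHORITY> "/" ] <ID>, optionally preceded by the "ucl:" scheme
_UCL = re.compile(r"(?:ucl:)?(?://(?P<authority>[^/]+)/)?(?P<id>[0-9]{9})")

# <UCN> ::= [ <LID> ":" ] <ROOT> <SUFFIX>, optionally preceded by the "ucn:" scheme
_UCN = re.compile(
    r"(?:ucn:)?(?:(?P<lid>[^:()]{3}):)?"
    r"(?P<root>[^()]+)\((?P<relation>[a-z]+)>(?P<word>[^()]+)\)"
)


def parse_uci(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the components of a UCL or UCN, or None if the text fits neither."""
    for scheme, pattern in (("ucl", _UCL), ("ucn", _UCN)):
        match = pattern.fullmatch(text)
        if match:
            # optional parts that are absent are reported as None
            return {"scheme": scheme, **match.groupdict()}
    return None


if __name__ == "__main__":
    for candidate in (
        "ucl://unlkb.unlarchive.org/104379964",
        "104379964",
        "ucn:eng:table(icl>furniture)",
        "стол(icl>мебель)",
        "table",  # no suffix, so it fits neither production
    ):
        print(candidate, "->", parse_uci(candidate))
```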

2.3 Types of UWs

Universal Words (UWs) can be classified according to their status in the UNL system (as permanent or temporary) and their internal structure (as simple, compound, or complex).

Permanent UWs

Permanent UWs are included in the UNL Dictionary and correspond to concepts that have already been lexicalized in at least one natural language—that is, concepts expressed as single lexical items and listed in dictionaries. Permanent UWs may be simple, compound, or complex.

Their internal structure determines how they are represented:

  • Simple UWs are represented as isolated nodes (Uniform Concept Identifiers, or UCIs). They denote non-compositional concepts—those whose meaning cannot be fully derived from constituent elements. Examples include “butterfly” (> “a flying insect with colorful wings”), “pineapple” (> “a tropical fruit”), or “music” (> “an art form consisting of sound and silence”). These words behave as single lexical units even though their meaning can only be paraphrased through longer expressions.

  • Compound UWs are represented as UCIs combined with Universal Attributes. They are used when a concept can be completely derived from an existing simple UW and a UNL attribute. For instance, the concept expressed by the English word “bigger” corresponds to the UW for “big” specified by the degree attribute “@more” (comparative of superiority).

  • Complex UWs are hyper-nodes—subgraphs within the UNL graph composed of UCIs interlinked by Universal Relations and specified by Universal Attributes. They are used when a concept can be derived from the combination of existing UWs, relations, and attributes. For example, the English expression “to stamp” (meaning “affix a stamp to”) can be represented in UNL as the graph corresponding to that definition.

Temporary UWs

Temporary UWs are provisional entries used to represent:

  • Concepts or entities still undergoing lexicalization (e.g., “googler”, “twittered”);
  • Concepts too specific to be included in the UNL Dictionary (e.g., “Universal Networking Digital Language Foundation”, “Léon Werth”); or
  • Non-translatable entities such as numbers, chemical formulas, or URLs (e.g., “3.14159”, “H2O”, “www.undlfoundation.org”).

Temporary UWs are always represented between double quotation marks and follow the spelling conventions of their source language (e.g., capitalization). For the time being, they are also expected to be transliterated into the Roman script.

Examples

Examples of Universal Words

Type | Concept (in English) | Lexicalization (in English) | UCL | UCN (in English)
Simple UW | above average | big | 301382086 | big(icl>size)
Compound UW | comparative of above average | bigger | 301382086.@more | big(icl>size).@more
Complex UW | affix a stamp to | to stamp | obj(201356370,106796119) | obj(to affix(icl>to attach), stamp(icl>seal))
Temporary UW | UNDL Foundation | UNDL Foundation | "UNDL Foundation" | "UNDL Foundation"
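
The four written forms in these examples can be told apart mechanically, at least when UWs are given as UCLs. The Python sketch below is a heuristic written for illustration only: the function uw_type is hypothetical, it covers only the UCL notation used in the examples (numeric indexes, ".@" attributes, one binary relation, double quotes), and it does not attempt to handle UCNs.

```python
import re


def uw_type(text: str) -> str:
    """Guess the structural type of a UW written in UCL notation (heuristic)."""
    text = text.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "temporary"   # always written between double quotation marks
    if re.fullmatch(r"[a-z]+\(\s*\d+\s*,\s*\d+\s*\)", text):
        return "complex"     # UCIs interlinked by a relation
    if re.fullmatch(r"\d+(?:\.@\w+)+", text):
        return "compound"    # a UCI combined with one or more attributes
    if re.fullmatch(r"\d+", text):
        return "simple"      # an isolated UCI
    raise ValueError(f"unrecognized UW form: {text!r}")


if __name__ == "__main__":
    for form in ("301382086", "301382086.@more", "obj(201356370,106796119)", '"UNDL Foundation"'):
        print(form, "->", uw_type(form))
```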

2.4 Categories of UWs

Permanent Universal Words (UWs) are classified into four main categories according to their semantic roles rather than their grammatical forms. These categories are language-independent and reflect conceptual distinctions within the UNL system.

  • Lexical Category (LEX)
    • Adjectival UWs (J) — designate attributes or qualities of entities. Example: delighting the senses or exciting intellectual or emotional admiration (→ eng: beautiful, with beauty; fra: beau, belle; esp: bello, hermoso, lindo; deu: schön; rus: красивый; etc.).

    • Adverbial UWs (A) — designate circumstances such as time, manner, or degree. Example: in a rapid manner (→ eng: quickly, rapidly, fast, swiftly; fra: rapidement, vite; por: rapidamente, depressa; gre: γρήγορα; heb: במהירות; etc.).

    • Nominal UWs (N) — designate things, objects, or entities. Example: a perennial plant with an elongated stem or trunk (→ eng: tree; fra: arbre; esp: árbol; deu: Baum; rus: дерево; hin: पेड़; zho: 树; ara: شجرة; tur: ağaç; etc.).

    • Verbal UWs (V) — designate actions, processes, or states of being. Example: to move swiftly on foot (→ eng: run; fra: courir; esp: correr; deu: laufen; rus: бежать; swa: kimbia; zul: gijima; mal: berlari; etc.).

These categories are defined by semantic value rather than linguistic form. They describe how a UW functions conceptually within the UNL framework, not how it is realized in any particular natural language.

For instance, an adjectival UW (e.g., “300217728” = “delighting the senses or exciting intellectual or emotional admiration”) may correspond to an adjective in English (“beautiful”), but could also be expressed as a prepositional phrase (“with beauty”) or a verbal phrase (“possessing beauty”), depending on the target language.

2.5 Pro-UWs

The UNL representation is designed to be as semantically saturated as possible. Deictic expressions are expected to be resolved during the UNLization process, meaning that ellipses and natural language pro-forms (such as “he”, “she”, “it”, “they”, etc.) should be replaced by their corresponding antecedents whenever possible.

However, when no substitute can be found for elements that depend on contextual information unavailable within the text, pro-UWs are used. These are represented by the null UW "00", optionally combined with specific attributes.

The main cases include:

Exophora

Refers to entities outside the text, such as personal pronouns (“I”, “you”, “we”, etc.) without textual antecedents. These are represented by the null UW "00" followed by person attributes: @1 (1st person singular), @2 (2nd person singular), @3 (3rd person singular), @1.@pl (1st person plural), @2.@pl (2nd person plural), and @3.@pl (3rd person plural).

Indefinite pronouns

These refer to general or unspecified entities (“none”, “anyone”, “everything”, etc.) and are represented by the null UW "00" combined with determiner attributes: "00.@no" (“none”), "00.@any.@person" (“anyone”), "00.@every.@thing" (“everything”), and so on.

Interrogative pronouns

Used to refer to omitted constituents in a sentence (“who”, “whom”, “where”, etc.), these are represented by "00.@wh". The interpretation depends on the relation: "00.@wh" corresponds to “who” when the target is the argument of an agt (agent) relation, to “when” for a tim (time) relation, and to “where” for a plc (place) relation.
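
The relation-dependent reading of "00.@wh" can be sketched as a small lookup. This is only an illustration: the table `WH_BY_RELATION` and the function `read_wh` are hypothetical names, and the table covers just the three relations named above.

```python
# Hypothetical lookup covering only the three relations named in the text.
WH_BY_RELATION = {"agt": "who", "tim": "when", "plc": "where"}


def read_wh(relation: str) -> str:
    """Return the English wh-word suggested by the relation that holds "00.@wh"."""
    # Relations not mentioned in the text are reported as unspecified
    # rather than guessed.
    return WH_BY_RELATION.get(relation, "unspecified")
```

For example, `read_wh("plc")` yields "where", while a relation outside the table yields "unspecified".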

Interjections

Standalone interjections expressing emotion (“Ouch!”, “Yeah!”, “Shhh!”, etc.) are represented by the null UW "00" combined with an emotional attribute (e.g., @pain, @joy, @anger).

Ellipses

When an omitted element cannot be recovered from context, it is represented by the null UW "00" without attributes. For instance, in “To be or not to be?”, the representation could be aoj:01(exist,00), aoj:02(exist.@not,00), or(:01,:02), because the necessary subject is missing and cannot be linked to any particular referent.
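
As a rough illustration, the three relations in the example above can be read into a simple structure. The sketch assumes that each relation has the shape name, optional ":"-prefixed identifier, and two comma-separated arguments, which is inferred from this one example only; `Relation`, `scope_id`, and `parse_relation` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    name: str                 # e.g. "aoj" or "or"
    scope_id: Optional[str]   # e.g. "01"; None when no identifier is given
    source: str               # first argument
    target: str               # second argument ("00" is the null UW)


# Assumed shape: name[:id](source,target), as in the example above.
_RELATION = re.compile(r"^([a-z]+)(?::(\d+))?\(([^,()]+),([^,()]+)\)$")


def parse_relation(text: str) -> Relation:
    """Parse one relation written in the notation of the example."""
    match = _RELATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a relation in the assumed notation: {text!r}")
    return Relation(*match.groups())


parsed = [
    parse_relation("aoj:01(exist,00)"),
    parse_relation("aoj:02(exist.@not,00)"),
    parse_relation("or(:01,:02)"),
]
```

In the first two relations the target is the bare null UW "00", which is how the unrecoverable subject shows up in the parsed data.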

All cases above involve situations in which semantic information cannot be fully saturated. Whenever possible, pro-forms and ellipses should be replaced by their referents. For example, the pro-UW "00.@3" must not be used in “Peter said that he will not come” if “he” refers to “Peter”. The correct representation would be Peter(i) said that Peter(i) will not come.

Finally, in the UNL framework, pronouns must be distinguished from determiners. For instance, “which” in “Which is that?” functions as an interrogative pronoun and is represented as "00.@wh" when the referent is unknown; in contrast, “which” in “Which book is that?” is a determiner, represented as the attribute .@wh applied to "book" (book.@wh).
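
The contrast between pronoun and determiner can be made concrete with a minimal sketch that attaches attributes to a UW using the dotted notation shown above. The helper `with_attributes` and the constant `NULL_UW` are hypothetical names introduced for illustration.

```python
NULL_UW = "00"  # the null UW used for pro-UWs


def with_attributes(uw: str, *attributes: str) -> str:
    """Append attributes to a UW using the dotted notation seen in the text."""
    return ".".join([uw, *attributes])


# "Which is that?": pronoun, so the attribute goes on the null UW.
which_pronoun = with_attributes(NULL_UW, "@wh")
# "Which book is that?": determiner, so the attribute goes on "book".
which_determiner = with_attributes("book", "@wh")
# Indefinite pronoun "anyone".
anyone = with_attributes(NULL_UW, "@any", "@person")
```

The same attribute (@wh) therefore appears in both cases; only the node that carries it differs.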

2.6 Proper UWs

Most named entities — such as names of people, places, organizations, or brands — are represented as temporary UWs, since it would be impractical to include all of them in the UNL Dictionary. However, certain proper names of broad and lasting relevance (e.g., “England”, “William Shakespeare”, “Romeo and Juliet”, “Romeo”) have been incorporated into the UNL Dictionary and are treated as permanent UWs.

The current criterion for inclusion is based on Wikipedia: if a proper name exists as an entry in Wikipedia, it should be defined as a permanent UW and included in the UNL Dictionary (or the UNL Unabridged Dictionary).

2.7 Lexical Databases

Universal Words (UWs) are organized into several interconnected lexical databases, each serving a specific function within the UNL linguistic infrastructure:

  • The UNL Dictionary is a flat list of UWs with their corresponding semantic features. It is subdivided into three nested dictionaries:

    • The UNL Core Dictionary contains permanent UWs that are expected to be lexicalized in all natural languages.

    • The UNL Abridged Dictionary includes permanent UWs that are lexicalized in at least two different language families, and therefore encompasses the Core Dictionary.

    • The UNL Unabridged Dictionary expands upon the Abridged Dictionary and comprises the complete set of permanent UWs—that is, all concepts lexicalized in at least one natural language.

  • The UNL Knowledge Base (UNL KB) is a semantic network where UWs are interconnected through Universal Relations. Unlike the Dictionary, which provides only general lexical and semantic features (e.g., lexical category, semantic class, abstractness, or cardinality), the UNL KB represents the intension—the conceptual meaning—of each UW. For example, the UW for “dog” is linked to the UWs for “domesticated”, “carnivorous”, and “mammal”.

  • The UNL Ontology is a component of the UNL Knowledge Base that focuses specifically on ontological relations, namely icl (“is-a-kind-of”) and iof (“is-an-instance-of”). It structures the conceptual hierarchy of the UNL system.

  • The UNL Memory is another network of UWs interconnected through Universal Relations. However, while the UNL KB represents the intension (meaning) of a UW, the UNL Memory represents its extension—that is, the set of possible instances or contextual uses of a UW. For instance, in the UNL Memory, the UW “dog” may appear as the agent of “to bite”, the object of “to eat”, or the instrument of “to chase”.
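
The nesting of the three dictionaries can be sketched as a membership test. This is a simplification under stated assumptions: the function `dictionary_levels` and its parameters are hypothetical, and "expected to be lexicalized in all natural languages" is reduced to a flag supplied by the caller, since the text gives no procedure for deciding it.

```python
from typing import List


def dictionary_levels(language_count: int, family_count: int,
                      expected_in_all: bool) -> List[str]:
    """List the nested dictionaries a permanent UW would belong to."""
    levels: List[str] = []
    if language_count >= 1:          # lexicalized in at least one language
        levels.append("Unabridged")
        if family_count >= 2:        # at least two language families
            levels.append("Abridged")
            if expected_in_all:      # expected in all natural languages
                levels.append("Core")
    return levels
```

Because the conditions are nested, a UW listed in the Core Dictionary is always listed in the Abridged and Unabridged Dictionaries as well.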

3. Universal Attributes

Universal Attributes are self-referential arcs—links connecting a node to itself. Unlike Universal Relations, which connect different nodes, attributes function as one-place predicates, that is, operations applying to a single argument. In the UNL framework, attributes are primarily used to encode grammatical information conveyed by natural languages, such as tense, mood, aspect, number, and similar categories.

The set of Universal Attributes is considered to be universal and relatively stable, meaning that new attributes are added only in exceptional cases.

3.1 Syntax

The syntax of attributes is defined as follows:

<attribute> ::= "@"<attribute_name>
<attribute_name> ::= <character>+ 
<character> ::= {"a",...,"z","_"}
                    

where:
< >   variable
" "   terminal symbol
::=   is defined as
{ }   disjunction ("or")
+     to be used one or more times
...   to be repeated more than 0 times

Attribute names are always lower case words or expressions.

In practice, English words (e.g., “past”, “present”) or mnemonic abbreviations (e.g., “def”, “pl”) are typically used for attribute labels. This, however, does not mean that attributes are limited to features found in English. Several attributes correspond to grammatical distinctions not expressed in English, such as @trial and @quadrual (used for quantification), @equivalent and @inferior (used for social deixis), or @recent and @remote (used for absolute tense). English labels are adopted solely to enhance readability and consistency.

No blank space is allowed inside an attribute name.
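
A strict reading of the grammar above can be checked with a regular expression. The sketch below is an assumption-laden illustration, not a reference validator: `is_valid_attribute` is a hypothetical name, and the pattern admits only the characters listed in the rule for <character>. Under this strict reading, digit-based labels such as @1 do not match.

```python
import re

# "@" followed by one or more characters from {"a",...,"z","_"}.
_ATTRIBUTE = re.compile(r"@[a-z_]+")


def is_valid_attribute(text: str) -> bool:
    """Check a string against the attribute grammar, read strictly."""
    return _ATTRIBUTE.fullmatch(text) is not None
```

For instance, "@past" and "@angle_bracket" are accepted, while "@Past", "@ past", and a bare "@" are rejected.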

3.2 Semantics

Attributes are annotations applied to nodes or hypernodes within a UNL hypergraph. They specify the circumstances under which these nodes (or hypernodes) are to be interpreted.

Attributes may convey two main types of information:

  • 1. Grammatical and lexical information — expressed by bound morphemes and closed-class elements such as determiners (articles and demonstratives), adpositions (prepositions, postpositions, and circumpositions), conjunctions, auxiliary and quasi-auxiliary verbs (including auxiliaries, modals, coverbs, and preverbs), and degree adverbs (specifiers):

    • Animacy: @animal, @person, @thing
    • Aspect: @causative, @perfective, @inceptive, @imperfective, @inchoative, …
    • Degree: @minus, @plus, @maximus, @minimus, …
    • Gender: @male, @female, @neuter
    • Manner: @contrast, @cause, @method, @comparison, …
    • Modality: @belief, @condition, @conviction, @desire, …
    • Place: @anterior, @superior, @inferior, @exterior, @proximal, @distal, …
    • Polarity: @affirmative, @negative, @dubitative
    • Quantification: @singular, @dual, @trial, @paucal, @multal, …
    • Specification: @def, @indef, @each, @own, @same, @certain, @only, …
    • Time: @past, @present, @future, @recent, @remote, @ante, @post, …
    • Voice: @active, @passive, @reflexive, @reciprocal, …
  • 2. Pragmatic and contextual information — concerning the external conditions of the utterance, including non-verbal and discourse-level features such as prosody, politeness, rhetorical organization, and social deixis:

    • Conventions: @parenthesis, @brace, @angle_bracket, @single_quote, …
    • Emotions: @anger, @contentment, @pain, @surprise, …
    • Information structure: @comment, @focus, @topic
    • Prosody: @intonation, @stress, @rhythm, …
    • Register: @archaic, @colloquial, @jargon, …
    • Schemes: @interruption, @ellipsis, @parallelism, …
    • Speech acts: @apology, @complaint, @greeting, @request, …
    • Social deixis: @familiar, @intimate, @polite, @reverential, …
    • Text structure: @title, @speech, @vocative, …
    • Tropes: @hyperbole, @irony, @metaphor, @metonymy, …

Since UNL is designed to represent what is meant rather than what is said, some attributes, especially those related to pragmatic and contextual information, may be omitted. The primary role of several attributes is to enable the representation of additional content that, while not essential to the core meaning, may enrich it — for instance, the emotions conveyed by the original utterance or the speech act performed through it.

In this sense, the sentence “Peter devoured a thousand books” should be represented as “Peter read many books eagerly.” This representation may optionally be refined with additional information, indicating, for example, that “many” was originally expressed through hyperbole (book.@multal.@hyperbole), or that “read” was conveyed metaphorically (read.@metaphor). A similar case is “John was killed by Peter,” which should be represented as “Peter killed John,” with the optional attribute @passive indicating that the original sentence was in the passive voice ((Peter kill John).@passive). In any case, the use of such attributes remains optional, as the essential meaning is already preserved.

Attributes can be combined to provide a more precise characterization of a node. For example, book.@def.@singular specifies a definite, singular book, while run.@past.@progressive indicates an action of running that occurred in the past and was ongoing at that time.
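
A combined node such as book.@def.@singular can be separated into its UW and its attributes with a few lines of code. This is a minimal sketch assuming the dotted notation used in the examples above; `split_node` is a hypothetical helper.

```python
from typing import List, Tuple


def split_node(text: str) -> Tuple[str, List[str]]:
    """Split "uw.@a.@b" into the UW and its list of attributes."""
    # Splitting on ".@" rather than "." keeps any dots inside the UW itself.
    head, *names = text.split(".@")
    return head, ["@" + name for name in names]


book = split_node("book.@def.@singular")
run = split_node("run.@past.@progressive")
```

Here `book` is the pair ("book", ["@def", "@singular"]); a node without attributes yields an empty list.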

3.3 Set of attributes

Animacy
  • @animal indicates a non-human animal referent (e.g.: "he" = 00.@male.@animal)
  • @person indicates a human referent (e.g.: "he" = 00.@male.@person)
  • @thing indicates an inanimate referent (e.g.: "it" = 00.@thing)
Aspect
  • @causative: indicates that the subject causes someone or something else to perform an action (e.g., She made him laugh)
  • @continuative: denotes an action that continues over a period of time (e.g., He kept talking all night)
  • @experiential: expresses that the subject has experienced or undergone an action or state (e.g., I have visited Paris)
  • @habitual: describes an action that occurs regularly or habitually (e.g., She drinks coffee every morning)
  • @imperfective: refers to an action that is ongoing, uncompleted, or not viewed as a whole (e.g., He was reading when I arrived)
  • @inceptive: marks the beginning of an action or state (e.g., The flowers began to bloom)
  • @inchoative: indicates a change of state or the process of becoming (e.g., The water turned to ice)
  • @iterative: signals that an action is repeated multiple times (e.g., He knocked on the door repeatedly)
  • @perfect: denotes a completed action with present relevance (e.g., She has finished her homework)
  • @perfective: presents an action as a completed whole (e.g., He wrote the letter yesterday)
  • @permissive: expresses that an action is allowed or permitted (e.g., He let her enter the room)
  • @persistent: indicates an action or state that continues or persists over time (e.g., The noise remained throughout the night)
  • @progressive: marks an action in progress at a specific time (e.g., She is reading a book)
  • @prospective: shows that an action is imminent or about to happen (e.g., He is about to leave)
  • @result: highlights the outcome or consequence of an action (e.g., She cleaned the room, leaving it spotless)
  • @terminative: signals the cessation or end of an action or state (e.g., The rain stopped)
Degree
  • @approximation: indicates closeness or near occurrence (e.g., He almost missed the bus)
  • @addition: indicates addition or inclusion (e.g., She also came to the party)
  • positive
    • @repetition: repeated action (e.g., He did it again)
    • @emphasis: adds force or stress (e.g., She really loves music)
    • @sufficiency: denotes adequacy (e.g., He is tall enough)
    • @extra: indicates surplus or exaggeration (e.g., She added too much sugar)
    • @minus: reduces intensity (e.g., It's a little cold today)
    • @plus: increases intensity (e.g., He is very happy)
  • comparative
    • @major: indicates higher degree (e.g., She is more experienced than him)
    • @minor: indicates lower degree (e.g., This task is less difficult than the previous one)
    • @equal: indicates same degree (e.g., He is as tall as his brother)
  • superlative
    • @maximus: indicates highest degree (e.g., She is the most talented student)
    • @minimus: indicates lowest degree (e.g., This is the least interesting chapter)
Emotions
  • @anger: expresses strong displeasure or hostility (e.g., “Grr!”)
  • @attention: indicates that the speaker is calling for focus or alertness (e.g., “Hey!”)
  • @consent: expresses agreement or permission (e.g., “Okay”)
  • @contentment: indicates satisfaction or happiness (e.g., “Mmm!”)
  • @disagreement: expresses opposition to a statement or idea (e.g., “No!”)
  • @discontentment: indicates dissatisfaction or unhappiness (e.g., “Ugh!”)
  • @dissent: expresses a formal or strong disagreement (e.g., “I don’t think so!”)
  • @hesitation: indicates uncertainty or pause in speech (e.g., “Um…”)
  • @pain: expresses physical or emotional suffering (e.g., “Ouch!”)
  • @relief: indicates alleviation of stress or worry (e.g., “Phew!”)
  • @surprise: expresses astonishment or unexpected reaction (e.g., “Wow!”)
  • @weariness: expresses tiredness or fatigue (e.g., “Sigh…”)
Figure of Speech
Schemes
    • @brachylogia: omission of conjunctions between a series of words (e.g., “I came, I saw, I conquered”)
    • @chiasmus: reversal of grammatical structures in successive clauses (e.g., “Ask not what your country can do for you; ask what you can do for your country”)
    • @climax: arrangement of words or clauses in order of increasing importance (e.g., “He fought bravely, valiantly, and heroically”)
    • @consonance: repetition of consonant sounds without the repetition of vowel sounds (e.g., “Pitter-patter”)
    • @ellipsis: omission of words that are implied by context (e.g., “I ordered the fish, and he the steak”)
    • @epanalepsis: repetition of the initial word or words of a clause or sentence at the end (e.g., “The king is dead, long live the king!”)
    • @interruption: insertion of a clause or sentence that interrupts the natural flow (e.g., “The car—a red one—sped past us”)
    • @parallelism: use of similar grammatical structures in two or more clauses (e.g., “She likes cooking, jogging, and reading”)
    • @pleonasm: use of superfluous or redundant words (e.g., “I saw it with my own eyes”)
    • @polyptoton: repetition of words derived from the same root (e.g., “Choosy mothers choose Jif”)
    • @polysyndeton: repetition of conjunctions between words or clauses (e.g., “We have ships and men and money and stores”)
    • @symploce: combination of anaphora and epistrophe; repetition at both the beginning and end of clauses (e.g., “The people want justice; the streets demand justice”)
Tropes
    • @anthropomorphism: ascribing human characteristics to something that is not human (e.g., The wind whispered through the trees)
    • @antiphrasis: word used contradictory to its usual meaning, often with irony (e.g., Calling a giant “Tiny”)
    • @antonomasia: substitution of a phrase for a proper name or vice versa (e.g., “The Bard” for Shakespeare)
    • @catachresis: use of an existing word to denote something that has no name in the current language (e.g., “The leg of the table”)
    • @double_negative: grammar construction using repeated negatives (e.g., “I don’t know nothing”)
    • @dysphemism: substitution of a harsher, more offensive term for another (e.g., “Croaked” for died)
    • @epanorthosis: immediate and emphatic self-correction (e.g., “I’m sorry—I mean, of course, we must leave now”)
    • @euphemism: substitution of a less offensive term for another (e.g., “Passed away” for died)
    • @hyperbole: use of exaggerated terms for emphasis (e.g., “I’m so hungry I could eat a horse”)
    • @irony: use of a word to convey a meaning opposite to its usual meaning (e.g., “Great! Another traffic jam!”)
    • @metaphor: stating one entity is another for comparative quality (e.g., “Time is a thief”)
    • @metonymy: substitution of a word to suggest what is really meant (e.g., “The crown” for monarchy)
    • @onomatopoeia: words that sound like their meaning (e.g., “Buzz”)
    • @oxymoron: pairing contradictory terms (e.g., “Deafening silence”)
    • @paradox: apparently contradictory ideas pointing to an underlying truth (e.g., “Less is more”)
    • @paronomasia: pun using similar-sounding words with different meanings (e.g., “I used to be a baker, but I couldn’t make enough dough”)
    • @periphrasis: using several words instead of few (e.g., “He passed away” instead of “He died”)
    • @repetition: repeated use of words or groups for effect (e.g., “Alone, alone, all, all alone”)
    • @synecdoche: a part stands for the whole (e.g., “All hands on deck”)
    • @synesthesia: describing one sense using terms of another (e.g., “Loud colors”)
    • @zoomorphism: applying animal characteristics to humans or gods (e.g., “He prowled the room like a tiger”)
Gender
  • @female: indicates a female referent (e.g.: "she" = 00.@female.@person)
  • @male: indicates a male referent (e.g.: "he" = 00.@male.@person)
  • @neuter: indicates a non-binary or gender-neutral referent (e.g.: "they" = 00.@3.@plural.@neuter.@person)
Information Structure
    • @topic: what is being talked about (e.g., “(The sky).@topic, it is very clear today”)
    • @comment: what is being said about the topic (e.g., “The sky, (it is very clear today).@comment”)
    • @focus: the part of the sentence that highlights new or contrastive information, often contrary to what the listener assumes (e.g., “I invited (JOHN).@focus, not Mary”)
Lexical Category
  • @adjective: a word normally in another category (like a noun) used as an adjective (e.g., “chicken.@adjective soup” – the noun “chicken” modifies another noun)
  • @adverb: a word normally in another category (like an adjective) used as an adverb (e.g., “Drive safe.@adverb” – the adjective “safe” used as an adverb)
  • @noun: a word normally in another category (like a verb or adjective) used as a noun (e.g., “the rich.@noun” – adjective “rich” used as a noun)
  • @verb: a word normally in another category (like a noun or adjective) used as a verb (e.g., “to email.@verb” – noun “email” used as a verb)
Manner
  • @absence: indicates lack or not having something; includes: without, off (e.g., He left without saying goodbye)
  • @addition: indicates addition or inclusion; includes: and, in addition to, as well as, besides (e.g., In addition to coffee, she ordered tea)
  • @cause: indicates reason or cause; includes: because, because of, due to, owing to, on account of, thanks to, for (e.g., The flight was canceled due to fog)
  • @comparison: indicates comparison; includes: than, as, like (e.g., She is taller than her brother)
  • @compliance: indicates conformity with rules or standards; includes: in accordance with, as per, pursuant to (e.g., In accordance with the law)
  • @condition: indicates condition; includes: if, even if, if only, in case, in case of, given (e.g., Even if it rains, we’ll go)
  • @consequence: indicates result or consequence; includes: so (e.g., It was late, so we went home)
  • @contrast: indicates contrast between clauses or ideas; includes: although, but, unlike, notwithstanding, pace, despite, in spite of, regardless of (e.g., Although it was raining, we went out)
  • @choice: indicates alternatives or options; includes: or (e.g., Tea or coffee?)
  • @exception: indicates exclusion or exception; includes: except, except for, save, unless, barring, failing (e.g., Everyone except John was present)
  • @extent: indicates scope or degree; includes: as far as, to the extent that, insofar as (e.g., To the extent that you are able, please assist)
  • @instrument: indicates the means or tool used to perform an action; includes: by, with (e.g., He wrote the letter with a pen)
  • @manner: indicates the way something is done; includes: as, like, by means of, for (e.g., He solved the problem by means of logic)
  • @method: indicates the means or technique used to achieve something; includes: by, through (e.g., She achieved success by hard work)
  • @opposition: indicates conflict or opposition; includes: against (e.g., She voted against the proposal)
  • @purpose: indicates intention or goal; includes: in order to, so as to, for the purpose of, for (e.g., He studied hard in order to pass the exam)
  • @reference: indicates topic, reference, or concerning something; includes: regarding, as regards, with respect to, with regard to, with relation to, concerning, that of, per, qua (e.g., Regarding your email, I will reply soon)
  • @substitution: indicates replacement; includes: instead of, in place of, on behalf of (e.g., He substituted tea in place of coffee)
  • @support: indicates favor or agreement; includes: in favor of, pro (e.g., He voted in favor of the motion)
  • @time: indicates a point in time; includes: as of (e.g., As of Monday, the policy changes)
  • @value: indicates worth or importance; includes: worth (e.g., This painting is worth a lot of money)
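
Several markers in the list above appear under more than one attribute ("for", "by", "as", "like"), so a marker alone does not determine the attribute. The sketch below inverts a small excerpt of the list to expose that ambiguity. The names `MANNER_MARKERS`, `candidates_by_marker`, and `INDEX` are hypothetical, and only six of the attributes are included.

```python
from collections import defaultdict
from typing import Dict, List

# Excerpt of the Manner list above (attribute -> markers it includes).
MANNER_MARKERS: Dict[str, List[str]] = {
    "@cause": ["because", "because of", "due to", "owing to",
               "on account of", "thanks to", "for"],
    "@comparison": ["than", "as", "like"],
    "@instrument": ["by", "with"],
    "@manner": ["as", "like", "by means of", "for"],
    "@method": ["by", "through"],
    "@purpose": ["in order to", "so as to", "for the purpose of", "for"],
}


def candidates_by_marker(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the table: marker -> sorted list of candidate attributes."""
    index = defaultdict(list)
    for attribute, markers in table.items():
        for marker in markers:
            index[marker].append(attribute)
    return {marker: sorted(attrs) for marker, attrs in index.items()}


INDEX = candidates_by_marker(MANNER_MARKERS)
```

In this excerpt "for" has three candidate attributes and "by" has two, so choosing among them requires the sentence context rather than a table lookup.
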
Modality
  • @ability: expresses capacity or skill (She can speak French)
  • @advice: recommends a course of action (You should learn German)
  • @agreement: expresses consent or concurrence (I agree with your plan)
  • @assertion: states something as a fact (Climate change is accelerating)
  • @assumption: takes something for granted without proof (He must be the new manager)
  • @belief: expresses personal conviction (I think that honesty is essential)
  • @command: gives an order or instruction (Close the window)
  • @conclusion: draws a result from reasoning (Therefore, we should postpone the meeting)
  • @condition: expresses a requirement or prerequisite (If it rains, the event will be canceled)
  • @confirmation: verifies or affirms something (Yes, that is correct)
  • @consequence: indicates a result or effect (Because you missed the deadline, your application was rejected)
  • @conviction: expresses a strongly held belief (I am certain that we are making the right choice)
  • @decision: communicates a choice or resolution (We will hire a new assistant)
  • @deduction: draws a specific inference from general information (Since all cats are mammals and Felix is a cat, Felix is a mammal)
  • @desire: expresses a wish or want (I would like to travel to Japan)
  • @determination: shows resolve to achieve something (I will finish this project no matter what)
  • @doubt: conveys uncertainty (I am not sure if this is correct)
  • @exclamation: expresses sudden emotion or surprise (What a beautiful sunset!)
  • @exhortation: urges someone to act (Let's do our best to succeed)
  • @expectation: indicates that something is anticipated (He should arrive soon)
  • @fear: expresses concern or apprehension (I might fail the test)
  • @hope: conveys a positive anticipation (I hope it doesn't rain tomorrow)
  • @hypothesis: proposes a tentative explanation (If we increase the temperature, the reaction will accelerate)
  • @intention: expresses a plan to act (I intend to call her tonight)
  • @interrogation: asks a question (Did you complete the assignment?)
  • @invitation: requests someone to join or participate (Would you like to come to my party?)
  • @judgement: expresses an evaluative opinion (This proposal is clearly superior)
  • @narrative: recounts events or a story (Yesterday, I went to the market and met an old friend)
  • @necessity: indicates something required (You must submit the form by Friday)
  • @obligation: conveys a duty or moral requirement (We ought to follow the code of conduct)
  • @opinion: expresses a personal viewpoint (I think this movie is overrated)
  • @permission: grants consent to act (You may borrow my car)
  • @possibility: indicates something that may happen (It might rain later)
  • @prediction: forecasts a future event (It will be sunny tomorrow)
  • @presumption: assumes something based on evidence (He must be the new manager)
  • @probability: expresses likelihood (She is likely to win)
  • @prohibition: forbids an action (You must not park here)
  • @promise: commits to do something (I will help you move)
  • @regret: expresses sorrow about past actions (I should have attended the meeting)
  • @request: politely asks for something (Could you please pass the salt?)
  • @speculation: conjectures without certainty (He might be the one who left the message)
  • @suggestion: proposes a possible action (Why don't we take a break now?)
  • @threat: warns of potential harm (If you do that again, I will call security)
  • @warning: signals possible danger or problem (Watch out for the slippery floor)
Person
  • @1: indicates first person: speaker (e.g., I = 00.@1, we = 00.@1.@plural)
  • @2: indicates second person: addressee (e.g., French tu = 00.@2, vous = 00.@2.@plural)
  • @3: indicates third person: someone or something else (e.g., he = 00.@3.@male, they = 00.@3.@plural)
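
The pro-UW examples given under Animacy, Gender, and Person combine attributes from several categories. The following sketch groups the attributes of a pro-UW by category; the table `CATEGORY_OF` and the function `describe` are hypothetical, and the table contains only the attributes used in those examples.

```python
from typing import Dict

# Category headings as used in this section; only attributes that occur
# in the pro-UW examples are listed.
CATEGORY_OF: Dict[str, str] = {
    "@1": "Person", "@2": "Person", "@3": "Person",
    "@male": "Gender", "@female": "Gender", "@neuter": "Gender",
    "@animal": "Animacy", "@person": "Animacy", "@thing": "Animacy",
    "@singular": "Quantification", "@plural": "Quantification",
}


def describe(pro_uw: str) -> Dict[str, str]:
    """Group the attributes of a pro-UW such as "00.@3.@plural" by category."""
    head, *attributes = pro_uw.split(".")
    if head != "00":
        raise ValueError(f"not a pro-UW: {pro_uw!r}")
    features: Dict[str, str] = {}
    for attribute in attributes:
        if attribute not in CATEGORY_OF:
            raise ValueError(f"attribute not covered by this sketch: {attribute}")
        features[CATEGORY_OF[attribute]] = attribute
    return features
```

Applied to 00.@3.@plural.@neuter.@person, the function returns one attribute each for Person, Quantification, Gender, and Animacy.
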
Place
Location
  • @anterior: located in front of another object or facing forward (before, in front of, ahead of)
  • @central: located at or near the center of a space or group (at the center of, in the middle of)
  • @distributed: spread across or throughout a space (along, across, throughout)
  • @encircling: surrounding an object completely or partially (around, about, round)
  • @exterior: situated outside or beyond a boundary or surface (outside, beyond)
  • @inferior: located lower than or beneath another object (below, under, underneath)
  • @interior: situated within or inside a boundary or enclosure (in, inside, within)
  • @intermediate: located in the space separating two distinct entities or points (between, amid)
  • @intermingled: located within or surrounded by multiple entities, without clear separation (among, amidst, in the midst of)
  • @lateral: located to the side of another object (beside, alongside, by)
    • @left: positioned to the left side of a reference point (to the left of)
    • @right: positioned to the right side of a reference point (to the right of)
  • @medial: situated toward the middle or midline of an object or group (toward the middle of, along the center of)
  • @opposite: facing or situated directly across from another object (opposite, across from, against)
  • @posterior: located behind or toward the back of another object (behind, at the back of, beyond)
  • @superior: located higher than or above another object (above, over, on top of)
Position
  • @adjacent: located in direct contact with or immediately next to another object (on, upon, against, next to, beside)
  • @proximal: situated near or close to a reference point (near, close to, by)
  • @distal: situated far from or away from a reference point (far from, away from, beyond)
Direction
  • @circular: describing motion or extension that curves continuously around a central point or axis (around, about, round)
  • @destination: indicating the endpoint or goal toward which movement is directed (to, into, toward)
  • @horizontal: describing motion or extension that is sideways or to the side (to the side, sideways)
  • @linear: describing motion or extension along a straight or single-dimensional path (along, by)
  • @omnidirectional: extending or moving in all directions from a central point (throughout, all over)
  • @orientation: indicating the facing or directional alignment of an object relative to another (toward, facing, in the direction of)
  • @origin: indicating the point or source from which movement or action begins (from, out of)
  • @radial: describing motion or arrangement extending outward or inward along lines from a center (from, to, toward)
  • @reciprocal: describing alternating or oscillating motion between two opposite directions or positions, often repeating in cycles (back and forth, to and fro, again and again)
  • @transversal: describing motion or extension across or perpendicular to a main axis (across, over, through)
  • @vertical: describing motion or extension along an up–down axis
    • @downward: directed from a higher to a lower point (down, downward, below)
    • @upward: directed from a lower to a higher point (up, upward, above)
Polarity
  • @affirmative: expresses a positive or favorable evaluation, approval, or desirable sentiment (The results are excellent)
  • @negative: expresses a negative or unfavorable evaluation, disapproval, or undesirable sentiment (The results are terrible)
  • @dubitative: expresses doubt, hesitation, or mixed sentiment, without clear positive or negative orientation (The results might be good, but I’m not sure)
  • @neutral: expresses a neutral or balanced evaluation, neither positive nor negative (The results are average)
Punctuation and Delimitation
  • @angle_bracket: marks enclosed material within angle brackets (<text> or ⟨text⟩)
  • @brace: marks enclosed material within curly braces ({text})
  • @double_parenthesis: marks material doubly enclosed in parentheses (((text))), often indicating secondary or editorial commentary
  • @double_quote: marks direct speech or quotation using double quotation marks (“text”)
  • @parenthesis: marks material enclosed within parentheses, usually for asides or clarifications ((text))
  • @single_quote: marks direct speech, quotation, or emphasis using single quotation marks (‘text’)
  • @square_bracket: marks editorial insertions or clarifications within text ([text])
Quantification
  • @collective: denotes a grouped or unified set treated as a single entity (A pair of shoes)
  • @existential: expresses the existence of one or more entities without specifying which (Any student can answer)
  • @half (bipartite): refers to one of two equal parts of a whole (Half the team was absent)
  • @generic: indicates reference to a class as a whole, not to specific instances (The tiger is a wild animal)
  • @majority: refers to more than half of a group (The majority voted in favor)
  • @minority (residual): refers to less than half of a group (The minority opposed the decision)
  • @multiplicative: expresses repetition or frequency (She called three times)
  • @null: indicates absence or nonexistence of any entity in a set (No student was present)
  • @partial: denotes an unspecified portion of a whole (Part of the building collapsed)
  • @plural: refers to more than one entity (They bought books)
    • @dual: refers to exactly two entities (They have two eyes)
    • @multal: refers to a large or indefinite number of entities (Many stars are visible)
    • @paucal: refers to a small, limited number of entities (Few people know the truth)
    • @quadrual: refers to exactly four entities (The four elements are essential)
    • @trial: refers to exactly three entities (The three siblings arrived together)
  • @singular: refers to a single entity, considered the default form (The book is new)
  • @total: denotes the completeness or wholeness of a single entity (The entire city was silent)
  • @unit: denotes a single instance used as a measure or indivisible item (One piece of advice)
  • @universal: refers to every member of a set or category (All students passed the exam)
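
For exact counts, the number attributes above suggest a simple selection rule, sketched below. It is an assumption rather than part of the specification: `number_attribute` is a hypothetical helper, the fallback to @plural for counts above four is a choice made here, and @paucal and @multal are left out because the text defines them only as "small" and "large".

```python
# Exact-count attributes taken from the list above.
_EXACT = {0: "@null", 1: "@singular", 2: "@dual", 3: "@trial", 4: "@quadrual"}


def number_attribute(count: int) -> str:
    """Pick a quantification attribute for an exact, non-negative count."""
    if count < 0:
        raise ValueError("count must not be negative")
    # Counts above four fall back to the general @plural.
    return _EXACT.get(count, "@plural")
```

For example, a count of 3 selects @trial, while a count of 7 selects @plural.
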
Register
  • @archaic: used in an earlier stage of the language and now rarely found in modern usage (Thou shalt not kill)
  • @colloquial: typical of informal spoken interaction and everyday conversation (She’s kinda tired)
  • @dialect: characteristic of a regional or social variety of a language (We was there yesterday)
  • @elder: typical of older generations, sometimes perceived as conservative or old-fashioned (Back in my day, things were different)
  • @formal: used in official, academic, or ceremonious contexts, with careful structure and vocabulary (We hereby submit the report)
  • @humorous: marked by irony, wordplay, or comic intent (I'm on a whiskey diet — I’ve lost three days already)
  • @informal: casual and spontaneous in tone, suitable for friendly or familiar situations (Hey, what’s up?)
  • @internet: typical of online communication, including abbreviations, emojis, and informal syntax (LOL, that’s hilarious 😂)
  • @jargon: specialized vocabulary used by members of a particular profession or technical field (The algorithm converged after ten iterations)
  • @literary: typical of elevated, artistic, or poetic expression found in literature (The moonlight bathed the valley in silver)
  • @neutral: free of emotional, stylistic, or situational coloring; standard and context-independent (The meeting starts at noon)
  • @pejorative: expresses contempt, disapproval, or negative attitude toward someone or something (That politician is a clown)
  • @religious: associated with ritual, scripture, or spiritual discourse (Peace be with you)
  • @slang: very informal, often playful or innovative expressions typical of specific groups (This place is awesome)
  • @taboo: socially or culturally prohibited or offensive expressions, often related to sexuality, religion, or bodily functions (He uttered a swear word)
  • @technical: restricted to specialized domains of knowledge, conveying precise or scientific meaning (DNA replication involves polymerase enzymes)
  • @youth: used primarily by younger speakers, often featuring innovative or expressive vocabulary (That song is fire)
  • @vulgar: coarse or impolite language considered socially unacceptable (He used a rude insult)
Social Deixis
  • @affiliative: used to create or reinforce a sense of social belonging or group identity (We’re all in this together)
  • @distant: expresses social distance or respect, typically marked by formal pronouns (V-form) (Vous êtes très aimable / Could you please sit down?)
  • @deferential: emphasizes submission or humility in the presence of higher authority, sometimes ritualized (I am honored by your presence, my lord)
  • @dominant: marks a higher social position or authority relative to the interlocutor (e.g., “Yes, sir” — shows deference toward someone of higher status).
  • @equivalent: expresses equality or lack of social hierarchy between speakers (e.g., “Hey, Alex!” — used among peers).
  • @familiar: denotes casual or close relationships, often within family or friendships (e.g., “How’s it going, buddy?”).
  • @intimate: used for very close relationships or private contexts, often involving emotional closeness (e.g., “My love, you’re home early.”).
  • @polite: expresses formality, respect, or social distance (e.g., “Would you mind, please?”).
  • @reverential: conveys profound respect or veneration, often in religious or ceremonial contexts (e.g., “Your Excellency, may I speak?”).
  • @solidary: expresses social closeness and mutual familiarity, often marked by informal pronouns (T-form) (Tu es mon ami / You’re my friend)
  • @submissive: indicates lower social position, humility, or subordination relative to the interlocutor (e.g., “At your service, ma’am.”).
Specification
  • @addition: marks inclusion or addition of an element to a previous set (≈ “also”) (e.g., Maria came, and João also joined.)
  • @estimation: indicates estimation or imprecision (≈ “circa”) (e.g., The meeting lasted around two hours.)
  • @def: expresses definiteness or identifiable reference (e.g., The book is on the table.)
    • @both: specifies two known and included entities (e.g., Both candidates attended the debate.)
    • @distal: refers to something far from the speaker (≈ “that”) (e.g., That house is ours.)
    • @each: marks distributive reference to all members individually (e.g., Each student received a book.)
    • @either: expresses a choice between two options, both acceptable (e.g., You may choose either route.)
    • @medial: refers to something near the addressee (≈ “that [near you]”) (e.g., Can you pass me that cup [near you]?)
    • @other: specifies an element distinct from a prior one (e.g., I’ll take the other pen.)
    • @own: marks possession or reflexive specification (e.g., She made her own dress.)
    • @proximal: refers to something near the speaker (≈ “this”) (e.g., This book is mine.)
    • @same: indicates identity or repetition of reference (e.g., They arrived at the same time.)
    • @such: points to exemplification or type reference (e.g., Such behavior is unacceptable.)
  • @even: expresses emphasis by extending inclusion to an unexpected case (e.g., Even the teacher was surprised.)
  • @indef: indicates non-specific or unidentified reference (≈ “a”, “some”, “any”) (e.g., I saw a bird outside.)
    • @certain: marks partial identification or specificity within indefiniteness (e.g., A certain book changed my life.)
    • @wh: used in interrogative or relative reference (e.g., Which person said that?)
  • @neither: indicates exclusion of both alternatives (e.g., Neither answer is correct.)
  • @only: marks exclusivity or restriction (e.g., She invited only her friends.)
  • @ordinal: expresses rank or sequence in an ordered set (e.g., This is the second chapter.)
Discourse and Structural Roles
  • @entry: marks the main or initiating clause of a sentence, often functioning as the syntactic head or matrix clause (She said that he was late.)
  • @heading: indicates a phrase or sentence functioning as a section or text heading (Chapter 1: The Beginning of the Journey)
  • @quotation: signals a segment of text reproduced from another source, whether direct or indirect (According to the report, “sales increased by 20%.”)
  • @relative: introduces or heads a relative clause that modifies a noun phrase (The book that you gave me is excellent.)
  • @speech: marks a portion of direct speech within a narrative or dialogue (She said, “I’m ready.”)
  • @title: identifies a nominal expression functioning as the title of a work, section, or piece (I’m reading War and Peace.)
  • @vocative: indicates direct address to a person or entity within discourse (John, could you help me?)
Time
  • absolute tense: refers to the absolute time of the event
    • @past: refers to a time before the moment of utterance (She worked yesterday.)
    • @present: refers to the time coinciding with the moment of utterance (She works every day.)
    • @future: refers to a time after the moment of utterance (She will work tomorrow.)
    • @gnomic: marks a statement expressing a general, timeless truth or habitual fact, rather than a specific event in time (e.g., The sun rises in the east.).
  • relative tense: refers to the time of the event in relation to another time or event
    • @ante: refers to a time occurring before another reference time (We met before lunch. She had left when I arrived.)
    • @post: refers to a time occurring after another reference time (We met after the meeting. She would leave after he returned.)
    • @recent: refers to a time close to the moment of utterance in the past or future (She arrived a few minutes ago.)
    • @remote: refers to a time distant from the moment of utterance in the past or future (She lived here long ago.)
    • @simultaneous: indicates simultaneity or overlap in time (He slept during the movie.)
    • @immediate: indicates that an event is temporally adjacent to the reference point, occurring just before or just after it. Combined with @past, it expresses a recent event ("just"); combined with @future, it expresses an imminent event ("about to").
    • @since: indicates the beginning of a period extending to the present (She has lived here since 2010.)
    • @until: indicates duration up to a certain time limit (He worked until midnight.)
Voice
  • @active: the subject performs the action expressed by the verb (The chef prepared the meal.)
  • @passive: the subject is affected by or receives the action (The meal was prepared by the chef.)
  • @middle: the subject both performs and is affected by the action, often implying self-involvement or spontaneous action (The door opened easily. / She dressed quickly.)
  • @reflexive: the action is directed back toward the agent itself (She washed herself.)
  • @reciprocal: the action is mutual between two or more agents (They hugged each other.)
  • @anticausative: the event occurs without an external agent, often as an intransitive version of a causative verb (The vase broke, as opposed to He broke the vase.)
  • @impersonal: the agent is unspecified or generic, and the clause lacks a true subject (It is said that... / One should be careful.)

4. Universal Relations

Universal Relations (formerly referred to as “links”) are the labeled arcs that connect one node to another in a UNL graph. They encode two-place semantic predicates, that is, relations that hold between two Universal Words (UWs). Each Universal Relation specifies the semantic role that one concept plays in relation to another—such as agent, object, or instrument. Together, these relations define the internal semantic structure of a UNL representation.

Universal Relations are binary and directed: they connect a source concept to a target concept. The inventory of relations is deliberately restricted and stable, ensuring cross-linguistic consistency and avoiding ad hoc extensions. Their design aims to provide a universal set of semantic functions that can be applied uniformly across languages.

Although the labels of Universal Relations may resemble those used in syntax, they do not correspond to grammatical functions. Instead, they express conceptual roles. For example, the idea of “something that initiates an event,” represented by agt (agent), is conceptually distinct from the grammatical notion of “subject,” even though the two often coincide. The agent role may be realized in different grammatical forms—such as a noun modifier (“the student’s invention”), a prepositional phrase (“made by the student”), or a derived noun (“the builder”). The goal of Universal Relations is precisely to capture such conceptual equivalences beyond their surface grammatical realizations.

4.1 Syntax

Universal relations are represented as follows:

<rel>:<scope>(<source>,<target>)

where:

  • <rel> is the name of the relation (two-character or three-character lower-case strings) (see the complete list of relations below)
  • <scope> is the scope of the relation (a unique two-character identifier). It may be omitted when the relation belongs to the main scope, i.e., :00.
  • <source> is the UW that assigns the relation <rel>, with the corresponding attributes, if any.
  • <target> is the UW that receives the relation <rel>, with the corresponding attributes, if any.
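
As an illustration, the notation above can be read with a short Python sketch. The regular expression, the Relation tuple and the parse_relation helper are assumptions made for this example and are not part of the specifications; a missing scope is taken to mean the main scope, 00.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    rel: str      # relation name, e.g. "agt"
    scope: str    # two-character scope identifier; "00" is the main scope
    source: str   # source UW, with its attributes if any
    target: str   # target UW, with its attributes if any


# <rel>[:<scope>](<source>,<target>)
_PATTERN = re.compile(
    r"^\s*(?P<rel>[a-z]{2,3})"
    r"(?::(?P<scope>\w{2}))?"
    r"\((?P<source>[^,()]+),\s*(?P<target>[^,()]+)\)\s*$"
)


def parse_relation(text: str) -> Relation:
    """Split one relation written in the notation of section 4.1."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a Universal Relation: {text!r}")
    return Relation(
        rel=match["rel"],
        scope=match["scope"] or "00",  # omitted scope = main scope
        source=match["source"].strip(),
        target=match["target"].strip(),
    )


print(parse_relation("agt(kill, Peter)"))
print(parse_relation("plc:01(go, NY.@destination)"))
```

The pattern assumes that neither UW contains a comma or a parenthesis, which holds for the examples in this section but may not hold for every UW.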

4.2 Semantics

The following principles govern the use of Universal Relations in UNL:

  • 1. Relations are directional, not commutative.

    The order of arguments matters. For instance:

    cnt(evidence, absence) ≠ cnt(absence, evidence)

    In other words, “evidence of absence” is not the same as “absence of evidence.”

  • 2. The target defines the relation.

    In the structure <relation>(<source>, <target>), the target is the element that fulfills the role expressed by the relation. For example:

    • agt(kill, Peter) → ‘Peter’ is the agent of ‘kill’
    • obj(kill, Peter) → ‘Peter’ is the patient of ‘kill’
    • tim(kill, yesterday) → ‘yesterday’ is the time of ‘kill’
    • plc(kill, kitchen) → ‘kitchen’ is the place of ‘kill’
    • mod(book, beautiful) → ‘beautiful’ is a modifier of ‘book’
    • icl(document, book) → ‘book’ is a type of ‘document’
    • iof(city, Paris) → ‘Paris’ is an instance of ‘city’
  • 3. Relations express semantic functions, not syntactic ones.

    The same relation may appear in different syntactic roles. For instance, the relation gol (goal) can occur as:

    • Specifier: ‘Peter received the book’ → gol(received, Peter)
    • Complement: ‘Mary gave the book to Peter’ → gol(gave, Peter)
    • Adjunct: ‘Mary bought a book to Peter’ → gol(bought, Peter)
  • 4. Relations disambiguate meaning.

    They help resolve lexical or syntactic ambiguities. For example, the English preposition “in” in ‘Peter works in X’ may represent several meanings:

    • plc(work, X) → physical place (‘Peter works in Geneva’)
    • lpl(work, X) → logical place (‘Peter works in politics’)
    • cnt(work, X) → content (‘Peter works in improving a technology’)
    • tim(work, X) → time (‘Peter works in the summer’)
    • dur(work, X) → duration (‘Peter works in ten hours’)
    • man(work, X) → manner (‘Peter works in intervals’)
  • 5. Relations are independent of lexical category.

    The same semantic relation can link concepts expressed by nouns or verbs:

    • agt: ‘John arrived’ → agt(arrived, John); ‘arrival of John’ → agt(arrival, John)
    • gol: ‘go to NY’ → gol(go, NY); ‘train to NY’ → gol(train, NY)
    • cnt: ‘talk about John’ → cnt(talk, John); ‘book about John’ → cnt(book, John)
  • 6. More general relations can often be refined into more specific ones through the use of attributes.

    In many cases, specific relations can be reformulated using more general relations combined with attributes:

    • ‘come from NY’ → src(come, NY) = plc(come, NY.@origin)
    • ‘go to NY’ → gol(go, NY) = plc(go, NY.@destination)
    • ‘go through Geneva’ → via(go, Geneva) = plc(go, Geneva.@transversal)
    • ‘work since early’ → tmf(work, early) = tim(work, early.@since)
    • ‘work until late’ → tmt(work, late) = tim(work, late.@until)
    • ‘work during the summer’ → dur(work, summer) = tim(work, summer.@simultaneous)
    • ‘kill with a knife’ → ins(kill, knife) = man(kill, knife.@instrument)
  • 7. The choice of relations depends on the semantic structure of the UW.

    Each Universal Word determines which relations are appropriate for its arguments. For instance:

    • To kill: the subject is the agent, and the object is the patient (transformed by the action). → agt(kill, subject), obj(kill, object)
    • To love: the subject is the experiencer, and the object is the content or theme of the feeling. → exp(love, subject), cnt(love, object)
    • To give: the subject is the agent, the given object is the content, and the recipient is the goal. → agt(give, subject), cnt(give, object), gol(give, recipient)
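
Principle 6 can be sketched as a lookup table. The mapping below only restates the seven equivalences listed under that principle; the generalize function and the encoding of a relation as a (relation, source, target) tuple are assumptions made for illustration.

```python
from typing import Tuple

# Specific relation -> (more general relation, attribute added to the target),
# restating the equivalences listed under principle 6.
GENERALIZATIONS = {
    "src": ("plc", "@origin"),
    "gol": ("plc", "@destination"),
    "via": ("plc", "@transversal"),
    "tmf": ("tim", "@since"),
    "tmt": ("tim", "@until"),
    "dur": ("tim", "@simultaneous"),
    "ins": ("man", "@instrument"),
}


def generalize(rel: str, source: str, target: str) -> Tuple[str, str, str]:
    """Rewrite a specific relation as a general relation plus an attribute.

    Relations without a listed equivalence are returned unchanged.
    """
    if rel not in GENERALIZATIONS:
        return (rel, source, target)
    general, attribute = GENERALIZATIONS[rel]
    return (general, source, f"{target}.{attribute}")


print(generalize("src", "come", "NY"))     # ('plc', 'come', 'NY.@origin')
print(generalize("ins", "kill", "knife"))  # ('man', 'kill', 'knife.@instrument')

# Principle 1: argument order matters, so swapping source and target
# gives a different relation.
print(("cnt", "evidence", "absence") != ("cnt", "absence", "evidence"))  # True
```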

4.3 Hierarchy of relations

Universal Relations are organized in a hierarchy in which upper nodes subsume lower nodes. The topmost level is the relation "rel", which simply indicates that there is a semantic relation between two elements.

rel

  • agt (agent)
  • and (conjunction)
  • aoj (object of an attribute)
    • ant (antonym, different from)
    • equ (synonym, equal to)
    • fld (field)
    • icl (hyponym, a kind of)
    • iof (example, instance of)
    • pof (meronym, part of)
  • ben (beneficiary)
  • cnt (content or theme)
  • con (condition)
  • exp (experiencer)
  • mod (modifier)
    • mat (material)
    • nam (name)
    • pos (possessor)
    • qua (quantifier)
  • obj (patient)
    • opl (objective place)
    • res (result)
  • or (disjunction)
  • per (proportion, rate, distribution or basis for a comparison)
    • bas (basis for a comparison)
  • plc (location: physical or logical)
    • gol (final place or state, destination)
    • lpl (logical place, scene)
    • src (initial place or state, origin)
    • via (intermediate place, path)
  • ptn (partner)
  • tim (time)
    • tmf (initial time)
    • tmt (final time)
    • dur (duration)
      • coo (co-occurrence)
  • man (manner)
    • ins (instrument or method)
      • met (method)
    • pur (purpose)
  • rsn (reason)
  • seq (consequence)

4.4 List of relations in alphabetical order

List of Universal Relations
Tag | Relation | Definition | Examples
agt | agent | A participant in an action or process that provokes a change of state or location. | John killed Mary = agt(killed, John); Mary was killed by John = agt(killed, John); arrival of John = agt(arrival, John)
and | conjunction | Used to state a conjunction between two entities. | John and Mary = and(John, Mary); both John and Mary = and(John, Mary)
ant | opposition or concession | Indicates that two entities do not share the same meaning or reference; also used for concession. | John is not Peter = ant(Peter, John); 3 + 2 != 6 = ant(6, 3+2)
aoj | object of an attribute | The subject of a stative verb or the predicative relation between predicate and subject. | John is sad = aoj(sad, John); the book contains many pictures = aoj(contain, book)
ben | beneficiary | A participant who is advantaged or disadvantaged by an event. | John works for Peter = ben(works, Peter)
cnt | content or theme | The object of a stative or experiential verb, or the theme of an entity. | John loves Mary = cnt(love, Mary); a book about Peter = cnt(book, Peter)
con | condition | A condition of an event. | If I see him, I will tell him = con(I will tell him, I see him)
dur | duration or co-occurrence | The duration of an entity or event; co-occurrence of events. | John worked for five hours = dur(worked, five hours)
equ | synonym or paraphrase | Indicates that two entities share the same meaning or reference; also semantic apposition. | The morning star is the evening star = equ(evening star, morning star)
exp | experiencer | A participant who receives a sensory impression or is the locus of an experiential event. | John believes in Mary = exp(believe, John)
fld | field | Indicates the semantic domain of an entity. | sentence (linguistics) = fld(sentence, linguistics)
gol | final state, place, destination or recipient | The final state, place, destination or recipient of an entity or event. | John received the book = gol(received, John); John goes to NY = gol(go, NY)
icl | hyponymy, is a kind of | Refers to a subclass relation (is-a-kind-of). | Dogs are mammals = icl(mammal, dogs)
ins | instrument or method | An inanimate entity or method used by an agent to implement an event. | The cook cut the cake with a knife = ins(cut, knife)
iof | is an instance of | Refers to an instance or individual element of a class. | John is a human being = iof(human being, John)
lpl | logical place | A non-physical place where an entity or event occurs or a state exists. | John works in politics = lpl(works, politics)
man | manner | Indicates how the action, experience or process of an event is carried out. | John bought the car quickly = man(bought, quickly)
mat | material | Indicates the material of which an entity is made. | A statue in bronze = mat(statue, bronze)
mod | modifier | A general modification of an entity. | a beautiful book = mod(book, beautiful)
nam | name | The name of an entity. | The city of New York = nam(city, New York)
obj | patient | A participant undergoing a change of state or location in an action or process. | John killed Mary = obj(killed, Mary)
opl | objective place | A place affected by an action or process. | John was hit in the face = opl(hit, face)
or | disjunction | Indicates a disjunction between two entities. | John or Mary = or(John, Mary)
per | proportion, rate, distribution or basis for a comparison | Indicates a measure or quantification of an event or basis for comparison. | twice a week = per(twice, week); John is more beautiful than Peter = per(beautiful, Peter)
plc | place | The location or spatial orientation of an entity or event. | John works here = plc(work, here); John works in NY = plc(work, NY)
pof | is part of | Refers to a part–whole relation. | John is part of the family = pof(family, John)
pos | possessor | The possessor of a thing. | John's book = pos(book, John)
ptn | partner | A secondary (non-focused) participant in an event. | John wrote the letter with Peter = ptn(wrote, Peter)
pur | purpose | The purpose of an entity or event. | John left early in order to arrive early = pur(John left early, arrive early)
qua | quantity | Expresses the quantity of an entity. | two books = qua(book, 2)
res | result or factitive | A referent that results from an entity or event. | They built a very nice building = res(built, a very nice building)
rsn | reason | The reason of an entity or event. | John left because it was late = rsn(John left, it was late)
seq | consequence | Used to express consequence. | I think therefore I am = seq(I think, I am)
src | initial state, place, origin or source | The initial state, place, origin or source of an entity or event. | John came from NY = src(came, NY)
tim | time | The temporal placement of an entity or event. | John came yesterday = tim(came, yesterday)
tmf | initial time | The initial time of an entity or event. | John worked since early = tmf(worked, early)
tmt | final time | The final time of an entity or event. | John worked until late = tmt(worked, late)
via | intermediate state or place | The intermediate place or state of an entity or event. | John went from NY to Geneva through Paris = via(went, Paris)

5. UNL Sentence

UNL sentences, also called UNL expressions, are hypergraphs made up of nodes (Universal Words) interlinked by binary semantic Universal Relations and modified by Universal Attributes. They are the basic unit of representation in the UNL framework.

Syntax

There are two different ways of representing UNL sentences: the table format and the list format. In the list format, UWs and relations are represented separately; in the table format, they constitute a single structure.

List Format

The syntax for UNL sentences in the list format is the following:

 
<UNL sentence> ::= "[W]" <list of UWs> "[/W]" [ "[R]" <list of relations> "[/R]" ] 
<list of UWs> ::= <UW+attributes> [<UW+attributes>...] 
<UW+attributes> ::= <UW>{:<Scope-ID>}[<attribute list>]:<UW-ID> 
<list of relations> ::= <binary relation>[<binary relation>...] 
<binary relation> ::= <source node><relation>[":"<Scope-ID>]<target node> 
<source node> ::= <UW-ID> 
<target node> ::= <UW-ID>

Table Format

The syntax for UNL sentences in the table format is the following:

<UNL sentence> ::= <list of relations> 
<list of relations> ::= <binary relation>[<binary relation>...] 
<binary relation> ::= <relation> [":"<Scope-ID>] "(" <source node> , <target node> ")" 
<source node> ::= <UW+attributes> 
<target node> ::= <UW+attributes> 
<UW+attributes> ::= <UW>{:<Scope-ID>}[<attribute list>]:<UW-ID> 

Where
" and " indicate a predefined delimiter
< and > indicate a non-terminal symbol
{ and } indicate a range
[ and ] indicate an omissible part
... indicates that the preceding part is repeated one or more times
::= indicates the left part can be replaced by the right part

Example of UNL Sentence

Table Format
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
List Format
[W]
300986027:01
102121620:@def:02
201543123:@present:03
103727837:@superior.@adjacent.@def:04
[/W]
[R]
01aoj02
03exp02
03plc04
[/R]
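
The correspondence between the two formats can be sketched in Python as follows. The table_to_list function is an assumption made for illustration: it numbers nodes in order of first appearance, treats identical node strings as the same node, and does not handle Scope-IDs attached to UWs. Applied to the table-format example above, it reproduces the list-format example.

```python
import re

# <relation>[:<Scope-ID>](<source node>, <target node>)
RELATION = re.compile(r"^([a-z]{2,3})(:\w{2})?\(([^,]+),([^,]+)\)$")


def table_to_list(table: str) -> str:
    """Convert a table-format UNL sentence into the list format."""
    ids = {}        # node text -> two-digit UW-ID, in order of first appearance
    relations = []
    for line in table.strip().splitlines():
        match = RELATION.match(line.strip())
        if match is None:
            raise ValueError(f"cannot parse relation: {line!r}")
        rel, scope, source, target = match.groups()
        source, target = source.strip(), target.strip()
        for node in (source, target):
            ids.setdefault(node, f"{len(ids) + 1:02d}")
        relations.append(f"{ids[source]}{rel}{scope or ''}{ids[target]}")

    words = []
    for node, uw_id in ids.items():
        # "UW.@a.@b" -> UW and its attribute list "@a.@b"
        uw, _, attributes = node.partition(".")
        words.append(f"{uw}:{attributes}:{uw_id}" if attributes else f"{uw}:{uw_id}")
    return "\n".join(["[W]", *words, "[/W]", "[R]", *relations, "[/R]"])


TABLE = """
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
"""
print(table_to_list(TABLE))
```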

6. UNL Document

The UNL/XML document adopts the structural conventions of RDF/XML to ensure compatibility with Semantic Web technologies while preserving the specific representational requirements of the Universal Networking Language (UNL). The root element <unl:UNL> declares the relevant namespaces and schema locations. These include the main UNL namespace (xmlns:unl), which defines the vocabulary for UNL-specific tags; the Dublin Core namespace (xmlns:dc), used for metadata and provenance information; and the XML Schema Instance namespace (xmlns:xsi), which supports schema validation through xsi:schemaLocation. The schema location identifies the authoritative reference for the structure and semantics of UNL/XML documents (typically hosted at https://unlkb.unlarchive.org).

The UNL/XML document is divided into two main sections: a header and a body. The header (<unl:metadata>) specifies essential provenance information such as creator, encoding, schema, and authority. The body (<unl:body>) contains the actual UNL representation, organized hierarchically into headings, paragraphs, and sentences. Each sentence may include the original linguistic content in one or more languages (<unl:org>), the corresponding UNL graph (<unl:unl>), and possibly additional linguistic variants. This structure guarantees that each UNL document is both machine-interpretable and semantically self-contained.

Formal Syntax

<unl_document> ::=  '<?xml version="1.0" encoding="UTF-8"?>'
                    '<unl:UNL' <namespace-declarations>  '>' 
                          <metadata> 
                          <body> 
                    '</unl:UNL>'
<namespace-declarations> ::= 'xmlns:unl="https://unlkb.unlarchive.org/schema/unl#"'
                             'xmlns:dc="http://purl.org/dc/elements/1.1/"'
                             'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
                             'xsi:schemaLocation="https://unlkb.unlarchive.org/schema/unl# https://unlkb.unlarchive.org/schema/unl.xsd"'
<metadata> ::= <unl:metadata>
                  <dc:title>TEXT</dc:title>
                  <dc:creator>TEXT</dc:creator>
                  [<dc:date>DATE</dc:date>]
                  [<dc:language><iso639-code></dc:language>]
                  [<dc:rights>TEXT</dc:rights>]
                  [<unl:encoding>TEXT</unl:encoding>]
                  <unl:scheme>TEXT</unl:scheme>
                  <unl:authority>URI</unl:authority>
               </unl:metadata>
<body> ::= <unl:body> { <heading> | <paragraph> }+ </unl:body>
<heading> ::= <unl:heading level=DIGIT>TEXT</unl:heading>
<paragraph> ::= <unl:paragraph id="<id>"> { <sentence> }+ </unl:paragraph>
<sentence> ::= <unl:sentence id="<id>">
                 { <org> }+ <unl> { <out> }*
               </unl:sentence>
<org> ::= <unl:org lang="<iso639-code>" >TEXT</unl:org>
<unl> ::= <unl:unl uci="<uci>" format="<format>"> { UNL SENTENCE }+ </unl:unl>
<out> ::= <unl:out lang="<iso639-code>">TEXT</unl:out>
<format> ::= 'table' | 'list'
<uci> ::= 'ucl' | <ucn> 
<ucn> ::= <iso639-code>
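
A document following this grammar can be assembled with Python's standard xml.etree.ElementTree module, as in the hedged sketch below. The build_document function, its parameters and the fixed identifiers p1 and s1, s2, ... are assumptions made for this example. Optional metadata elements and the xsi:schemaLocation attribute are left out for brevity.

```python
import xml.etree.ElementTree as ET

UNL_NS = "https://unlkb.unlarchive.org/schema/unl#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("unl", UNL_NS)
ET.register_namespace("dc", DC_NS)


def tag(namespace: str, name: str) -> str:
    """Return the '{namespace}name' form that ElementTree expects."""
    return f"{{{namespace}}}{name}"


def build_document(title, creator, scheme, authority, sentences):
    """Build a one-paragraph UNL/XML document.

    `sentences` is a list of (language code, original text, UNL graph) triples.
    """
    root = ET.Element(tag(UNL_NS, "UNL"))

    metadata = ET.SubElement(root, tag(UNL_NS, "metadata"))
    ET.SubElement(metadata, tag(DC_NS, "title")).text = title
    ET.SubElement(metadata, tag(DC_NS, "creator")).text = creator
    ET.SubElement(metadata, tag(UNL_NS, "scheme")).text = scheme
    ET.SubElement(metadata, tag(UNL_NS, "authority")).text = authority

    body = ET.SubElement(root, tag(UNL_NS, "body"))
    paragraph = ET.SubElement(body, tag(UNL_NS, "paragraph"), {"id": "p1"})
    for number, (lang, original, graph) in enumerate(sentences, start=1):
        sentence = ET.SubElement(
            paragraph, tag(UNL_NS, "sentence"), {"id": f"s{number}"}
        )
        ET.SubElement(sentence, tag(UNL_NS, "org"), {"lang": lang}).text = original
        unl = ET.SubElement(
            sentence, tag(UNL_NS, "unl"), {"uci": "ucl", "format": "table"}
        )
        unl.text = graph
    return ET.tostring(root, encoding="unicode")


xml_text = build_document(
    title="Example of UNL/XML Document",
    creator="John Doe",
    scheme="UNL 2025",
    authority="https://unlkb.unlarchive.org",
    sentences=[("eng", "The fat cat sits on the mat.",
                "aoj(300986027, 102121620.@def)")],
)
print(xml_text)
```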

Example of a UNL Document

<?xml version="1.0" encoding="UTF-8"?>
<unl:UNL
    xmlns:unl="https://unlkb.unlarchive.org/schema/unl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://unlkb.unlarchive.org/schema/unl# https://unlkb.unlarchive.org/schema/unl.xsd">

  <unl:metadata>
    <dc:title>Example of UNL/XML Document</dc:title>
    <dc:creator>John Doe</dc:creator>
    <dc:date>2025-11-05</dc:date>
    <dc:language>en</dc:language>
    <unl:encoding>UTF-8</unl:encoding>
    <unl:scheme>UNL 2025</unl:scheme>
    <unl:authority>https://unlkb.unlarchive.org</unl:authority>
  </unl:metadata>

  <unl:body>
    <unl:heading level="1">Sample Section</unl:heading>
    <unl:paragraph id="p1">
      <unl:sentence id="s1">
        <unl:org lang="eng">The fat cat sits on the mat.</unl:org>
        <unl:unl uci="ucl" format="table">
            aoj(300986027, 102121620.@def)
            exp(201543123.@present, 102121620.@def)
            plc(201543123.@present, 103727837.@superior.@adjacent.@def)
        </unl:unl>
        <unl:out lang="fra">Le gros chat est assis sur le tapis.</unl:out>
        <unl:out lang="deu">Die fette Katze sitzt auf der Matte.</unl:out>
        <unl:out lang="por">O gato gordo está sentado no tapete.</unl:out>
        <unl:out lang="ita">Il grosso gatto è seduto sul tappeto.</unl:out>
      </unl:sentence>
    </unl:paragraph>
  </unl:body>
</unl:UNL>
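
Reading such a document back is straightforward with a namespace-aware XML parser. The following Python sketch uses the standard xml.etree.ElementTree module; the read_sentences function and the shape of the dictionaries it returns are assumptions made for illustration, and the embedded sample is a shortened copy of the document above.

```python
import xml.etree.ElementTree as ET

NS = {"unl": "https://unlkb.unlarchive.org/schema/unl#"}

# Shortened copy of the example document (metadata omitted).
DOCUMENT = """<unl:UNL xmlns:unl="https://unlkb.unlarchive.org/schema/unl#">
  <unl:body>
    <unl:paragraph>
      <unl:sentence id="s1">
        <unl:org lang="eng">The fat cat sits on the mat.</unl:org>
        <unl:unl uci="ucl" format="table">
            aoj(300986027, 102121620.@def)
            exp(201543123.@present, 102121620.@def)
            plc(201543123.@present, 103727837.@superior.@adjacent.@def)
        </unl:unl>
        <unl:out lang="fra">Le gros chat est assis sur le tapis.</unl:out>
      </unl:sentence>
    </unl:paragraph>
  </unl:body>
</unl:UNL>"""


def read_sentences(xml_text):
    """Return one dictionary per <unl:sentence> found in the document."""
    root = ET.fromstring(xml_text)
    result = []
    for sentence in root.iterfind(".//unl:sentence", NS):
        graph = sentence.find("unl:unl", NS)
        result.append({
            "id": sentence.get("id"),
            "org": {e.get("lang"): e.text for e in sentence.findall("unl:org", NS)},
            "format": graph.get("format"),
            # One relation per non-empty line of the table-format graph.
            "relations": [ln.strip() for ln in graph.text.splitlines() if ln.strip()],
            "out": {e.get("lang"): e.text for e in sentence.findall("unl:out", NS)},
        })
    return result


for item in read_sentences(DOCUMENT):
    print(item["id"], item["org"]["eng"])
    for relation in item["relations"]:
        print("  ", relation)
```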

Semantics

At present, a UNL document is conceived as a collection of independent UNL sentences. Nevertheless, it may also be viewed as a higher-level hypergraph, in which each UNL sentence constitutes a sub-hypergraph. These sub-hypergraphs can be interconnected through a special relation, nxt (“next”), which encodes their sequential order within the discourse.

The XUNL Project explores the introduction of semantic relations designed to capture the rhetorical structure of a document — that is, the logical, argumentative, or narrative connections between sentences or paragraphs. Such intersentential relations aim to represent discourse coherence and text organization in a manner analogous to how Universal Relations capture sentence-level meaning. Potential examples include caus (causal), cond (conditional), concs (concessive), expl (explanatory), seq (sequential), contr (contrastive), and supp (supportive). These relations could, for instance, expose the argumentative structure in persuasive texts or the chronological sequence in narratives.

It is important to note, however, that these intersentential relations are still exploratory and under active discussion. For this reason, they have not been included in the present UNL Specifications, which currently address only sentence-level semantic representation.

7. Main changes in UNL 2010

The 2010 version of the UNL Specifications introduces significant improvements and modifications over the 2005 version. The principal changes are summarized as follows:

  1. Full XMLization of UNL Documents:

    The structure of UNL documents has been completely redesigned to follow XML conventions, making it more compatible with RDF and other semantic web technologies. This allows for easier integration, validation, and interchange of UNL data across platforms and tools.

  2. Adoption of the Uniform Concept Identifier (UCI) Framework for Universal Words:

    The representation of Universal Words (UWs) has been significantly overhauled. Each UW is now identified using a Uniform Concept Identifier (UCI), which enables true language independence by supporting multiple Uniform Concept Names for each concept, with English being only one option. The system emphasizes the role of the UNL Knowledge Base, as each UW is defined and located via its Uniform Concept Locator (UCL). Additionally, pro-UWs have been introduced to represent concepts that have no direct textual referent.

  3. Revised Set of Attributes:

    The attribute system has been thoroughly redesigned to support more precise annotation of hypergraph structures. The updated attributes improve expressiveness and facilitate the representation of complex semantic nuances.

  4. Hierarchical Organization of Relations:

    The set of Universal Relations has been reordered into a hierarchical structure. Upper-level relations now subsume lower-level relations, allowing for a more systematic and semantically coherent representation of roles and dependencies between concepts.

These changes collectively enhance the expressive power, interoperability, and conceptual rigor of the UNL framework, making it better suited for multilingual semantic representation and computational processing.