December 2010
The current specifications constitute a comprehensive set of guidelines and standards for the development and implementation of the Universal Networking Language (UNL). Prepared by the UNDL Foundation, they result from an extensive revision of the previous specifications and incorporate the outcomes of several UNL projects, as well as the experience gained through UNL annotation and tool development. These specifications strengthen the language-independence features of UNL and have been thoroughly discussed within the UNL community, particularly through the UNLweb platform. They replace the earlier specifications and provide a more consistent and robust framework for the effective use of UNL across different applications and platforms.
The UNL, short for Universal Networking Language, is a computer language designed to represent the meanings expressed by natural languages in a machine-readable and language-independent form. Its purpose is not to replicate the communicative functions of human language, but to provide a formal computational framework through which the semantics of any utterance can be explicitly encoded and processed by computers. By allowing machines to handle information at the level of meaning, the UNL enables the emulation of human linguistic abilities grounded in interpretation and comprehension, thereby offering a foundation for multilingual understanding, knowledge processing, and intelligent communication.
The UNL Programme was launched in 1996 as an initiative of the Institute of Advanced Studies of the United Nations University (UNU/IAS) in Tokyo, Japan. In January 2001, the United Nations University established an autonomous organization—the UNDL Foundation—to oversee the development and management of the Programme. The Foundation is a non-profit international organization, independent from the UNU yet maintaining special links with the United Nations. It inherited from the UNU/IAS the mandate to implement the UNL Programme, and its headquarters are located in Geneva, Switzerland.
Since its inception, the UNL Programme has reached several important milestones. The overall architecture of the UNL System has been developed, along with a suite of core software components and tools essential to its operation, which continue to be tested and refined. Over the past years, a substantial collection of linguistic resources in multiple native languages has been accumulated, supported by a robust technical infrastructure that enables the inclusion of many more languages in the system. In parallel, an increasing number of scientific papers and academic dissertations on UNL are being published each year.
One of the Programme’s most notable achievements is the recognition of UNL’s innovative nature and industrial applicability by the Patent Cooperation Treaty (PCT), granted in May 2002 through the World Intellectual Property Organization (WIPO). This was an unprecedented accomplishment within the United Nations system.
This recognition not only affirms the UNL’s value as a pioneering linguistic framework but also underscores its potential for practical applications in fields such as artificial intelligence, natural language processing, and multilingual communication.
The UNL is a declarative language designed to express information and knowledge in the form of a semantic hypergraph. A semantic hypergraph is a structured representation made of interconnected pieces of meaning, where each node (or hyper-node) corresponds to a concept, and each arc corresponds to a semantic relation between concepts.
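In the UNL framework, meaning can be codified at three different levels, according to its nature: conceptual, relational, and attributive. Accordingly, the UNL semantic hypergraph is composed of three types of discrete semantic units. Universal Words (UWs) represent concepts and correspond to the nodes of the hypergraph. Universal Relations represent the semantic relations between concepts and correspond to its arcs. Universal Attributes specify the circumstances under which concepts are to be interpreted.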
Consider the following sentences:
{ara} القط السمين يجلس على السجادة. {/ara}
{chn} 胖猫坐在垫子上。 {/chn}
{deu} Die fette Katze sitzt auf dem Teppich. {/deu}
{eng} The fat cat sits on the mat. {/eng}
{esp} El gato gordo está sentado sobre la alfombra. {/esp}
{fra} Le gros chat est assis sur le tapis. {/fra}
{jpn} 太った猫がマットの上に座っています。 {/jpn}
{rus} Толстый кот сидит на коврике. {/rus}
In UNL, all these sentences are said to convey the same meaning and are represented in the following way:
{unl}
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
{/unl}
In this UNL expression, the UW 102121620 may be displayed as “قط(icl>سنوريات)”, “猫(icl>猫科)”, “cat(icl>feline)”, “Katze(icl>Katzenartige)”, “gato(icl>félido)”, “chat(icl>félidé)”, “猫(icl>ネコ科)”, or “кошка(icl>кошачьи)”. The expression states three things. First, 300986027 (“fat”, “gordo”, “gros”, “太った”, “толстый”) is an attribute of 102121620 (“cat”, “gato”, “chat”, “猫”, “кошка”). Second, 201543123 (“sit”, “sentarse”, “s’asseoir”, “座る”, “сидеть”) denotes an action experienced by 102121620. Third, 103727837 (“mat”, “alfombra”, “tapis”, “マット”, “коврик”) indicates the place where the action denoted by 201543123 occurs.
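The three kinds of units can be read directly off such an expression. Each argument is a UW (a node), each line is a Universal Relation (an arc), and each ".@" suffix is a Universal Attribute attached to a node. The following is a minimal Python sketch of that reading, not a full UNL parser. It assumes that arguments are numeric UCLs optionally followed by attributes. It ignores scope identifiers such as ":01". The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A UW occurrence: its identifier plus any attributes attached to it."""
    uw: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relation:
    """A directed arc of the UNL graph: name(source, target)."""
    name: str
    source: Node
    target: Node


# Assumption: each line has the shape relation(arg1, arg2); an optional scope
# suffix such as ":01" after the relation name is accepted and ignored.
# Arguments containing parentheses or commas (e.g. UCNs) are not handled.
RELATION_RE = re.compile(r"\s*([a-z]+)(?::\w+)?\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*")


def parse_arg(text: str) -> Node:
    # "102121620.@def" -> Node("102121620", ["def"])
    uw, *attrs = text.split(".@")
    return Node(uw=uw, attributes=attrs)


def parse_unl(text: str) -> list[Relation]:
    relations = []
    for line in text.strip().splitlines():
        match = RELATION_RE.fullmatch(line)
        if match is None:
            raise ValueError(f"not a relation line: {line!r}")
        name, src, tgt = match.groups()
        relations.append(Relation(name, parse_arg(src), parse_arg(tgt)))
    return relations


EXAMPLE = """
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
"""

graph = parse_unl(EXAMPLE)
for rel in graph:
    print(rel.name, rel.source.uw, rel.source.attributes, "->", rel.target.uw, rel.target.attributes)
```

In this reading the nodes are the four distinct UWs, and the attributes stay attached to the node on which they occur rather than becoming nodes of their own.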
This representation captures the semantics of sentences independently of their original language, while providing a structured format that can be computationally processed. Crucially, it is language-independent, enabling cross-linguistic understanding and semantic interoperability. For example, languages without a direct lexical item for “cat” can retrieve its definition through the corresponding UCL in the UNL KB; conversely, languages that encode “to sit on” as a single lexical unit can map that unit to the entire relation (for instance, the plc relation linking two UWs) rather than to a single UW.
The UNL assumes that one of the primary functions of natural languages is to convey information—that is, to represent what we know about the world. This aboutness of language—its representational role—is the main focus of the UNL. The goal of the UNL is therefore not to replicate what natural languages do, but to formally represent what they represent.
The UNL assumes that any information conveyed through natural language can be formally and usefully represented by a semantic network. This idea is not new: semantic networks have been used in knowledge representation since the work of Charles S. Peirce, and as interlinguas for machine translation since the 1950s. In the UNL framework, this semantic network (the UNL graph) is composed of three types of discrete semantic entities: concepts, relations, and attributes. Concepts correspond to the nodes of the network; relations are the arcs connecting them; and attributes delimit the contextual use of concepts. This three-layered representation model is the cornerstone of the UNL and a distinctive feature compared to other semantic networks, which typically rely on only two levels: edges and vertices.
The UNL assumes that the information conveyed by natural languages is translatable—that is, languages differ not in their ability to express meaning, but in how they do so. To guarantee this translatability of information, the UNL requires that the semantic network be independent of any particular natural language (in other words, universal). This universality, however, is not meant to imply that all languages share a common set of meanings or that there are semantic primitives inherent to all languages; the notion of "universal" in UNL refers to the capacity of the system to be used and understood across all languages (as in "universal adapter" or "universal remote control"), rather than to represent a common denominator (as in "Universal Grammar"). This goal is achieved through a standardized set of semantic entities that are universally accessible—that is, whose meanings are explicitly defined and can be inferred from the components of the UNL System, such as the UNL Knowledge Base.
The main goal of the UNL Programme is to construct the UNL, an artificial language that can be used to process information across language barriers.
The major commitments of the UNL are the following:
The UNL is first and foremost a knowledge representation language. The most important corollary of this first commitment is that UNL is not a meta-language, i.e., it is not intended to describe or represent natural languages; on the contrary, it is used to represent the information conveyed by natural languages. The goal of UNL is to represent "what was meant" and not "what was said". Accordingly, the UNL provides an interpretation rather than a translation of a given utterance. The UNL version of an existing document is not bound to preserve the lexical and syntactic choices of the original, but must represent, in an unambiguous format, one of its possible meanings, preferably the most conventional one.
The UNL is an artificial language shaped to represent knowledge in a machine-tractable format. Like other formal systems, it seeks to provide the infrastructure for computers to handle what is meant by natural languages. Unlike other auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), the UNL is not intended to be a human language. We do not expect people to speak UNL or to communicate in UNL. But we do expect computers to process UNL: to generate UNL from natural language, and vice versa, with or without human aid. We expect computers to be able to extract information from UNL documents, and to detect paraphrases, entailments, implicatures, presuppositions, inferences, contradictions and redundancies among a set of propositions represented in UNL.
In the UNL approach, there are two basic movements: UNLization and NLization. UNLization is the process of representing the information conveyed by natural language in UNL; NLization, conversely, is the process of generating a natural language document from UNL. In order to be fully "understandable" (and manageable) by machines, the UNL must be self-sufficient, i.e., it should be as semantically complete and saturated as possible. The UNL representation must not depend on any implicit knowledge and should explicitly codify all information. This means that UNLization should be completely independent of NLization, and vice versa. UNLization should not take into account the target language or format of any future NLization. NLization should not need any information about the original source language or the previous structure of any UNL document.
At first glance, the UNL seems to be a pivot language to which source texts are converted before being translated into target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge. In addition to translation, the UNL is expected to be used in several other tasks, such as text mining, multilingual document generation, summarization, text simplification, information retrieval and extraction, sentiment analysis, etc. Indeed, in UNL-based systems there is no need for the source language to be different from the target language. An English text may be represented in UNL in order to be generated, once again, in English, as a summarized, simplified, localized or simply rephrased version of the original.
The UNL is expected to be the language of the United Nations and, therefore, must not be restricted to any particular natural language, at the risk of being rejected by the member states of the General Assembly.
As a formal system, the UNL is not expected to have any ambiguity, at any level. The sentence "The girls saw the boy with the telescope" must be represented in UNL in a way that leaves no ambiguity concerning the meaning of "saw" (past tense of the verb "to see" vs. present tense of the verb "to saw" vs. the noun "saw") or the dependency relations of "with the telescope" ("saw with the telescope" vs. "the boy with the telescope").
As a knowledge representation language, the UNL is not expected to have any redundancy. Expressions such as "free gift", "round circle" and "murder to death" are expected to be represented, in UNL, as "gift", "circle" and "murder", respectively. Likewise, sentences such as "Peter killed John", "Peter murdered John", "It's Peter who killed John" and "John was killed by Peter" are expected to be represented in UNL in the same way.
As a formal system, the UNL is always literal, i.e., fully compositional. UNL expressions must derive their semantic value thoroughly from their components, which must be explicitly defined in the UNL Knowledge Base. Accordingly, the UNL does not allow for any figure of speech, such as metaphor and metonymy. Tropes must be represented, in UNL, by their intended meaning. A sentence such as "John devoured thousands of books", for instance, must be represented, in UNL, as "John read many books eagerly".
As a knowledge representation language, the UNL is not expected to perform speech acts (such as promises, requests, orders etc.), but only to describe them in a constative manner. For instance, given a performative utterance such as "Can you pass me the salt?", the role of the UNL is to represent "you pass the salt to me" and to indicate that this was a polite request (@polite, @request). The UNL representation itself will not be a request, nor will it be bound to provoke the same (perlocutionary) effect caused by the original utterance.
As a fully-explicit semantic system, the UNL is not expected to have ellipses or pro-forms, except when the referent is not present in the document (exophora). A sentence such as "The monkey took the banana and ate it" must be represented, in UNL, as "[The monkey]i took [the banana]j and [the monkey]i ate [the banana]j".
Universal Words, or simply UWs, are the words of UNL. They correspond to the nodes of a UNL graph, which are interlinked by Universal Relations and specified by Universal Attributes. They represent discrete semantic units conveyed by the open lexical categories of natural language (noun, verb, adjective and adverb). Any other semantic content (such as that conveyed by articles, prepositions, conjunctions etc.) is represented as attributes or relations. This criterion is not language-biased: if a given semantic value proves to be conveyed, in any language, by a closed class, it should not be represented as a UW, regardless of its realisation in other languages.
As the name suggests, Universal Words (UWs) are intended to be “universal.” This does not mean that they represent a common lexical denominator across all languages or any kind of semantic primitive. In the UNL framework, the notion of “universal” should be understood as “usable and understandable by all” (as in universal adapter, universal remote control, or universal screwdriver), rather than “common to all” (as in Universal Grammar). UWs are “universal” in the sense that they serve as uniform identifiers for entities defined in the UNL Knowledge Base—a comprehensive semantic map of world knowledge designed to make any concept translatable.
UWs may correspond to concepts that are widely lexicalized across languages (such as “cause to die”); to concepts lexicalized only in a few languages (for instance, “to execute someone by suffocation so as to leave the body intact for dissection”); to highly specific concepts lexicalized in a single language (such as “a person who forgives the first offense, tolerates the second, but never the third”); or even to concepts not lexicalized in any known language (for example, “women who typically wear red hats and white shoes in large theaters”).
The universality of a UW lies not in the type of concept it represents, but in the way it represents it. A UW provides a standardized means of encoding a concept so that any natural language can interact with it—either as a single node when the concept is lexicalized, or as a hyper-node (a subgraph) when it is not.
UWs represent sense, not reference. They are associated with the intension (sense, meaning, connotation) rather than with the extension (reference, denotation) of linguistic expressions. For example, the expressions “morning star” and “evening star” have the same reference (the planet Venus) but must be represented by different UWs, since they express distinct “modes of presentation” of the same object—that is, they differ in sense: “the last star to disappear in the morning” versus “the first star to appear in the evening.”
UWs correspond exclusively to meanings conveyed by open lexical categories: nouns, verbs, adjectives, and adverbs. Any other semantic content (such as that expressed by articles, prepositions, conjunctions, or particles) must be represented through attributes or relations. This criterion is language-independent: if a given semantic value is expressed by a closed class in any language, it should not be represented as a UW, regardless of its realization elsewhere. The only exception is pro-forms, which are represented by a special kind of UW, the pro-UW or null UW.
Simple UWs correspond only to meanings that are non-compositional—that is, to words and multiword expressions whose meanings cannot be fully derived from the combination of existing UWs, attributes, and relations. Compound and complex UWs, on the other hand, must be used whenever a meaning can be fully determined by the meanings of its constituents and by the rules used to combine them.
UWs are “universal” in the sense that they constitute the lexicon of a universal language—one capable of representing meanings expressible in any natural language. This does not mean that all UWs are lexicalized everywhere, nor that they represent semantic primitives or universally shared concepts. Instead, the UW repertoire aims to be as comprehensive as the total set of concepts found across different languages and cultures, regardless of how specific they may be. The UNL lexicon is therefore an open and evolving set, continually expanding to incorporate new cultural and linguistic developments.
Permanent UWs represent concepts with varying degrees of universality and are accordingly stored in three nested lexical databases, which together form the UNL Dictionary: the UNL Core Dictionary, the UNL Abridged Dictionary, and the UNL Unabridged Dictionary.
Each sense must be represented by one and only one UW, and each UW must correspond to one and only one sense. Homonymy, synonymy, and polysemy are therefore excluded from the UNL framework. For the sake of readability, however, the same UW may be displayed under different Uniform Concept Names (UCNs)—such as “قط(icl>سنوريات)”, “猫(icl>猫科)”, “cat(icl>feline)”, “Katze(icl>Katzenartige)”, “gato(icl>félido)”, “chat(icl>félidé)”, “猫(icl>ネコ科)”, or “кошка(icl>кошачьи)”—but all of these correspond to the same Uniform Concept Locator (UCL) and are merely alternative linguistic representations of the same sense.
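The relationship between display names and locators described above is many-to-one. The minimal Python sketch below illustrates it with a subset of the names listed in the paragraph. Every name maps to the UCL 102121620 used for "cat" in the example expression. The table and the function names are hypothetical and are not part of any UNL tool.

```python
from collections import defaultdict

# Illustrative table: several UCNs (display names) for the single sense "cat",
# all resolving to one UCL.
UCN_TO_UCL = {
    "cat(icl>feline)": "102121620",
    "Katze(icl>Katzenartige)": "102121620",
    "gato(icl>félido)": "102121620",
    "chat(icl>félidé)": "102121620",
    "кошка(icl>кошачьи)": "102121620",
    "猫(icl>猫科)": "102121620",
    "猫(icl>ネコ科)": "102121620",
}


def names_by_locator(ucn_to_ucl: dict[str, str]) -> dict[str, list[str]]:
    """Invert the table: one UCL may be displayed under several UCNs."""
    grouped = defaultdict(list)
    for ucn, ucl in ucn_to_ucl.items():
        grouped[ucl].append(ucn)
    return dict(grouped)


def same_sense(ucn_a: str, ucn_b: str) -> bool:
    """Two names denote the same UW exactly when they resolve to the same UCL."""
    return UCN_TO_UCL[ucn_a] == UCN_TO_UCL[ucn_b]


print(names_by_locator(UCN_TO_UCL))
print(same_sense("cat(icl>feline)", "кошка(icl>кошачьи)"))
```

A dictionary key can hold only one value, so the table itself enforces the rule that each name corresponds to exactly one locator. Nothing, however, prevents several keys from sharing the same locator.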
Simple UWs serve as addresses—not definitions—of senses. A simple UW itself conveys little or no information about its meaning; it functions merely as a label. All information about the sense it represents is provided through the three lexical databases within the UNL framework: the UNL Dictionary, the UNL Knowledge Base, and the UNL Memory.
A Uniform Concept Identifier (UCI) is used to identify a concept. It is a URI (Uniform Resource Identifier) for Universal Words (UWs). In the UNL framework, UCIs are represented either as UCL (Uniform Concept Locator) or UCN (Uniform Concept Name).
Uniform Concept Locators (UCLs), like URLs, provide a method for locating a concept in the UNL Knowledge Base. They are represented as:
ucl://<AUTHORITY>/<ID>
Where:
ucl is the scheme name for Uniform Concept Locators;
<AUTHORITY> is the authority (host name) of the UNL Knowledge Base (unlkb.unlarchive.org by default);
<ID> is the nine-digit identifier of the concept in the UNL Knowledge Base.
For instance, the concept “a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs,” which is lexicalized in English as the noun table, may be located through ucl://unlkb.unlarchive.org/104379964. This address provides all the information concerning the concept (its UNL definition, relations, and attributes), which may be used by languages in which this concept is not lexicalized.
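As an illustration, a UCL can be assembled from and split back into its parts with a few lines of Python. This is a minimal sketch under stated assumptions. It takes the default authority from the text above. It treats the ID as exactly nine ASCII digits, following the <ID> ::= <digit>{9} rule of the UCI grammar in this section. The function names are hypothetical.

```python
import re

DEFAULT_AUTHORITY = "unlkb.unlarchive.org"  # default authority given in the text

# ucl://<AUTHORITY>/<ID>, with <ID> taken to be exactly nine ASCII digits.
UCL_RE = re.compile(r"ucl://(?P<authority>[^/\s]+)/(?P<id>[0-9]{9})")


def make_ucl(concept_id: str, authority: str = DEFAULT_AUTHORITY) -> str:
    """Build a UCL from a concept ID, using the default authority if none is given."""
    if not re.fullmatch(r"[0-9]{9}", concept_id):
        raise ValueError(f"a UCL ID must have nine digits: {concept_id!r}")
    return f"ucl://{authority}/{concept_id}"


def parse_ucl(ucl: str) -> tuple[str, str]:
    """Split a UCL into (authority, id); raise ValueError if it is malformed."""
    match = UCL_RE.fullmatch(ucl)
    if match is None:
        raise ValueError(f"not a UCL: {ucl!r}")
    return match["authority"], match["id"]


print(make_ucl("104379964"))
print(parse_ucl("ucl://unlkb.unlarchive.org/104379964"))
```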
Uniform Concept Names (UCNs), like URNs, do not imply the availability of the identified resource. They are represented as:
ucn:<LID>:<NSS>
Where:
ucn is the scheme name for Uniform Concept Names;
<LID> is the three-letter language identifier (ISO 639-2 code) of the name;
<NSS> is the namespace-specific string, i.e., the name itself.
For instance, the concept “a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs,” which is lexicalized in English as table, may be associated with several different names:
ucn:eng:table(icl>furniture)
ucn:fra:table(icl>mobilier)
ucn:spa:mesa(icl>mobiliario)
ucn:deu:Tisch(icl>Möbel)
ucn:rus:стол(icl>мебель)
UCNs must be unique. The namespace-specific string is normally divided into two parts: a root (e.g., "table") and a suffix (e.g., "(icl>furniture)"), as exemplified above. The root can be a word or a multiword expression. The suffix is always introduced by a UNL relation and is used to disambiguate the root.
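The division of a UCN into language identifier, root and suffix can be sketched with a single regular expression. The Python below is a minimal illustration, not a reference implementation. It assumes that the root contains no "(" and that the suffix is one final parenthesised group of the form (relation>word). Nested or multiple suffixes are not handled. The names UCN and parse_ucn are hypothetical.

```python
import re
from typing import NamedTuple


class UCN(NamedTuple):
    lid: str       # three-letter language identifier, e.g. "eng"
    root: str      # word or multiword expression, e.g. "table"
    relation: str  # relation introducing the suffix, e.g. "icl"
    word: str      # disambiguating word, e.g. "furniture"


# ucn:<LID>:<ROOT>(<relation>><word>)
# Assumption: the root contains no "(" and the suffix is a single final group.
UCN_RE = re.compile(r"ucn:(?P<lid>[^:]{3}):(?P<root>[^(]+)\((?P<rel>[a-z]+)>(?P<word>[^()]+)\)")


def parse_ucn(text: str) -> UCN:
    match = UCN_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a UCN: {text!r}")
    return UCN(match["lid"], match["root"], match["rel"], match["word"])


for name in ["ucn:eng:table(icl>furniture)", "ucn:spa:mesa(icl>mobiliario)", "ucn:rus:стол(icl>мебель)"]:
    print(parse_ucn(name))
```

"ucn:eng:table(icl>furniture)" and "ucn:fra:table(icl>mobilier)" share the root "table". They remain distinct names because both the language identifier and the suffix differ.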
Both UCL and UCN identify Universal Words. The difference is that the UCL is an address pointing to the position of the UW in the UNL Knowledge Base, whereas the UCN is only a readable name for the UW. The same address (UCL) may be associated with several UCNs, but each UCN corresponds to only one UCL. A UCL always describes an available UW, i.e., a UW that has been defined in the UNL KB, whereas a UCN is not necessarily linked to an address. In that sense, UCLs are more “official” than UCNs, which are mainly used to preserve the readability of UNL expressions.
In the UNL Document Structure, UCIs are often abbreviated to their last component, since the scheme, authority, and namespace can be inferred from the document header. For example:
104379964 instead of ucl://unlkb.unlarchive.org/104379964
table(icl>furniture) instead of ucn:eng:table(icl>furniture)
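The abbreviation convention can be reversed mechanically once the defaults are known. In the minimal Python sketch below, the authority and language identifier are passed in as parameters. They stand in for values read from the document header, whose exact fields are not specified here. The helper expand_uci is hypothetical. It assumes attributes such as ".@def" have already been stripped. Strings in double quotation marks are left unchanged, since temporary UWs are written that way.

```python
import re


def expand_uci(short: str, authority: str, lid: str) -> str:
    """Expand an abbreviated UCI to its full form (hypothetical helper).

    `authority` and `lid` stand in for defaults taken from the document header.
    """
    if short.startswith(('"', "ucl:", "ucn:")):
        return short  # temporary UW or already a full UCI: leave unchanged
    if re.fullmatch(r"[0-9]{9}", short):
        return f"ucl://{authority}/{short}"  # bare nine-digit ID -> UCL
    return f"ucn:{lid}:{short}"  # anything else -> namespace-specific string of a UCN


print(expand_uci("104379964", "unlkb.unlarchive.org", "eng"))
print(expand_uci("table(icl>furniture)", "unlkb.unlarchive.org", "eng"))
```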
Both UCLs and UCNs are types of URIs (Uniform Resource Identifiers), according to the syntax below:
<UW> ::= [ <scheme> ":" ] <hierarchical part> ;
<scheme> ::= "ucl" | "ucn" ;
<hierarchical part> ::= <UCL> | <UCN> ;
<UCL> ::= [ "//" <AUTHORITY> "/" ] <ID> ;
<AUTHORITY> ::= <character>+ ;
<ID> ::= <digit>{9} ;
<UCN> ::= [ <LID> ":" ] <NSS> ;
<LID> ::= <character>{3} ; (ISO 639-2 language code)
<NSS> ::= <ROOT> <SUFFIX> ;
<ROOT> ::= <word>;
<SUFFIX> ::= "(" <relation> ">" <word> ")" ;
<relation> ::= relation of the UNL framework (e.g.: icl, iof, equ, etc.) ;
<word> ::= <character>+;
<character> ::= UTF-8 character;
Where:
<scheme> is either ucl, used for Uniform Concept Locators (e.g.: ucl://unlkb.unlarchive.org/104379964), or ucn, used for Uniform Concept Names (e.g.: ucn:eng:table(icl>furniture));
<hierarchical part> is a <UCL> if the scheme is ucl, and provides the location of the concept in the UNL Knowledge Base; or a <UCN> if the scheme is ucn, and provides a readable name for the concept.
Universal Words (UWs) can be classified according to their status in the UNL system (as permanent or temporary) and their internal structure (as simple, compound, or complex).
Permanent UWs are included in the UNL Dictionary and correspond to concepts that have already been lexicalized in at least one natural language—that is, concepts expressed as single lexical items and listed in dictionaries. Permanent UWs may be simple, compound, or complex.
Their internal structure determines how they are represented:
Simple UWs are represented as isolated nodes (Uniform Concept Identifiers, or UCIs). They denote non-compositional concepts—those whose meaning cannot be fully derived from constituent elements. Examples include “butterfly” (> “a flying insect with colorful wings”), “pineapple” (> “a tropical fruit”), or “music” (> “an art form consisting of sound and silence”). These words behave as single lexical units even though their meaning can only be paraphrased through longer expressions.
Compound UWs are represented as UCIs combined with Universal Attributes. They are used when a concept can be completely derived from an existing simple UW and a UNL attribute. For instance, the concept expressed by the English word “bigger” corresponds to the UW for “big” specified by the degree attribute “@more” (comparative of superiority).
Complex UWs are hyper-nodes—subgraphs within the UNL graph composed of UCIs interlinked by Universal Relations and specified by Universal Attributes. They are used when a concept can be derived from the combination of existing UWs, relations, and attributes. For example, the English expression “to stamp” (meaning “affix a stamp to”) can be represented in UNL as the graph corresponding to that definition.
Temporary UWs are provisional entries used to represent:
Temporary UWs are always represented between double quotation marks and follow the spelling conventions of their source language (e.g., capitalization). For the time being, they are also expected to be transliterated into the Roman script.
| Type | Concept (in English) | Lexicalization (in English) | UCL | UCN (in English) |
|---|---|---|---|---|
| Simple UW | above average | big | 301382086 | big(icl>size) |
| Compound UW | comparative of above average | bigger | 301382086.@more | big(icl>size).@more |
| Complex UW | affix a stamp to | to stamp | obj(201356370,106796119) | obj(to affix(icl>to attach), stamp(icl>seal)) |
| Temporary UW | UNDL Foundation | UNDL Foundation | "UNDL Foundation" | "UNDL Foundation" |
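The four types in the table differ only in how they are built from UCIs, attributes and relations. The minimal Python data model below makes that explicit and reproduces the UCL column of the table. The class names and the render function are hypothetical illustrations, not part of any UNL tool.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SimpleUW:
    ucl: str  # an isolated node, e.g. "301382086"


@dataclass(frozen=True)
class CompoundUW:
    base: SimpleUW
    attributes: tuple[str, ...]  # attribute names without "@", e.g. ("more",)


@dataclass(frozen=True)
class ComplexUW:
    # A hyper-node: a subgraph of (relation, source, target) arcs.
    arcs: tuple[tuple[str, SimpleUW, SimpleUW], ...]


@dataclass(frozen=True)
class TemporaryUW:
    text: str  # written between double quotation marks


UW = Union[SimpleUW, CompoundUW, ComplexUW, TemporaryUW]


def render(uw: UW) -> str:
    """Write a UW in the notation of the UCL column of the table."""
    if isinstance(uw, SimpleUW):
        return uw.ucl
    if isinstance(uw, CompoundUW):
        return render(uw.base) + "".join(f".@{name}" for name in uw.attributes)
    if isinstance(uw, ComplexUW):
        return ", ".join(f"{rel}({render(src)},{render(tgt)})" for rel, src, tgt in uw.arcs)
    if isinstance(uw, TemporaryUW):
        return f'"{uw.text}"'
    raise TypeError(f"unsupported UW: {uw!r}")


big = SimpleUW("301382086")
rows = [
    big,
    CompoundUW(big, ("more",)),
    ComplexUW((("obj", SimpleUW("201356370"), SimpleUW("106796119")),)),
    TemporaryUW("UNDL Foundation"),
]
for uw in rows:
    print(f"{type(uw).__name__:<12} {render(uw)}")
```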
Permanent Universal Words (UWs) are classified into four main categories according to their semantic roles rather than their grammatical forms. These categories are language-independent and reflect conceptual distinctions within the UNL system.
Adjectival UWs (J) — designate attributes or
qualities of entities.
Example:
delighting the senses or exciting intellectual or emotional admiration
(→ eng: beautiful, with beauty; fra: beau, belle; esp: bello, hermoso,
lindo; deu: schön; rus: красивый; etc.).
Adverbial UWs (A) — designate circumstances such as
time, manner, or degree.
Example: in a rapid manner (→ eng: quickly, rapidly,
fast, swiftly; fra: rapidement, vite; por: rapidamente, depressa; gre:
γρήγορα; heb: במהירות; etc.).
Nominal UWs (N) — designate things, objects, or
entities.
Example:
a perennial plant with an elongated stem or trunk (→ eng:
tree; fra: arbre; esp: árbol; deu: Baum; rus: дерево; hin: पेड़; zho: 树;
ara: شجرة; tur: ağaç; etc.).
Verbal UWs (V) — designate actions, processes, or
states of being.
Example: to move swiftly on foot (→ eng: run; fra:
courir; esp: correr; deu: laufen; rus: бежать; swa: kimbia; zul: gijima;
msa: berlari; etc.).
These categories are defined by semantic value rather than linguistic form. They describe how a UW functions conceptually within the UNL framework, not how it is realized in any particular natural language.
For instance, an adjectival UW (e.g., “300217728” = “delighting the senses or exciting intellectual or emotional admiration”) may correspond to an adjective in English (“beautiful”), but could also be expressed as a prepositional phrase (“with beauty”) or a verbal phrase (“possessing beauty”), depending on the target language.
The UNL representation is designed to be as semantically saturated as possible. Deictic expressions are expected to be resolved during the UNLization process, meaning that ellipses and natural language pro-forms (such as “he”, “she”, “it”, “they”, etc.) should be replaced by their corresponding antecedents whenever possible.
However, when no substitute can be found for elements that depend on contextual information
unavailable within the text, pro-UWs are used. These are represented by the
null UW "00", optionally combined with specific attributes.
The main cases include:
Exophoric pro-forms refer to entities outside the text, such as personal pronouns (“I”, “you”, “we”,
etc.) without textual antecedents. These are represented by the null UW
"00" followed by person attributes:
@1 (1st person singular), @2 (2nd person singular),
@3 (3rd person singular),
@1.@pl (1st person plural), @2.@pl (2nd person plural),
and @3.@pl (3rd person plural).
Indefinite pro-forms refer to general or unspecified entities (“none”, “anyone”, “everything”, etc.)
and are represented by the null UW "00" combined with determiner
attributes:
"00.@no" (“none”), "00.@any.@person" (“anyone”),
"00.@every.@thing" (“everything”), and so on.
Wh-pro-forms (“who”, “whom”, “where”, etc.), used to refer to unknown or omitted constituents in a sentence,
are represented by "00.@wh". The interpretation depends on the
relation:
"00.@wh" corresponds to “who” when the target is the argument of an
agt (agent) relation, to “when” for a tim (time) relation,
and to “where” for a plc (place) relation.
Standalone interjections expressing emotion (“Ouch!”, “Yeah!”, “Shhh!”, etc.) are
represented by the null UW "00" combined with an emotional attribute
(e.g., @pain, @joy, @anger).
When an omitted element cannot be recovered from context, it is represented by the
null UW "00" without attributes. For instance, in “To be or not to
be?”, the representation could be
aoj:01(exist,00), aoj:02(exist.@not,00), or(:01,:02), because the
necessary subject is missing and cannot be linked to any particular referent.
All cases above involve situations in which semantic information cannot be fully saturated.
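The cases above can be summarized as a lookup from the attributes of a null UW to a reading. The minimal Python sketch below gives English glosses for illustration only. The function name, the glosses and the fallback strings are hypothetical. The relation argument is needed only for "00.@wh", whose interpretation depends on the relation of which it is the target.

```python
from typing import Optional

# English glosses for the cases listed above; illustrative only.
PERSON = {
    ("1",): "I", ("2",): "you", ("3",): "he/she/it",
    ("1", "pl"): "we", ("2", "pl"): "you (plural)", ("3", "pl"): "they",
}
INDEFINITE = {
    ("no",): "none", ("any", "person"): "anyone", ("every", "thing"): "everything",
}
WH_BY_RELATION = {"agt": "who", "tim": "when", "plc": "where"}


def gloss_pro_uw(node: str, relation: Optional[str] = None) -> str:
    """Gloss a pro-UW; `relation` names the relation whose target is the node."""
    uw, *attrs = node.split(".@")
    if uw != "00":
        raise ValueError(f"not a pro-UW: {node!r}")
    key = tuple(attrs)
    if not key:
        return "(unrecoverable omitted element)"
    if key == ("wh",):
        return WH_BY_RELATION.get(relation, "(wh-word)")
    if key in PERSON:
        return PERSON[key]
    if key in INDEFINITE:
        return INDEFINITE[key]
    return "(interjection or other: " + ", ".join("@" + a for a in attrs) + ")"


for node, rel in [("00.@wh", "agt"), ("00.@wh", "plc"), ("00.@1.@pl", None), ("00.@any.@person", None), ("00", None)]:
    print(node, rel, "->", gloss_pro_uw(node, rel))
```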
Whenever possible, pro-forms and ellipses should be replaced by their referents. For
example, the pro-UW "00.@3" must not be used in “Peter said that he will not
come” if “he” refers to “Peter”. The correct representation would be
Peter(i) said that Peter(i) will not come.
Finally, in the UNL framework, pronouns must be distinguished from determiners. For instance,
“which” in “Which is that?” functions as an interrogative pronoun and is represented as
"00.@wh" when the referent is unknown; in contrast, “which” in “Which book is
that?” is a determiner, represented as the attribute .@wh applied to
"book" (book.@wh).
Most named entities — such as names of people, places, organizations, or brands — are represented as temporary UWs, since it would be impractical to include all of them in the UNL Dictionary. However, certain proper names of broad and lasting relevance (e.g., “England”, “William Shakespeare”, “Romeo and Juliet”, “Romeo”) have been incorporated into the UNL Dictionary and are treated as permanent UWs.
The current criterion for inclusion is based on Wikipedia: if a proper name exists as an entry in Wikipedia, it should be defined as a permanent UW and included in the UNL Dictionary (or the UNL Unabridged Dictionary).
Universal Words (UWs) are organized into several interconnected lexical databases, each serving a specific function within the UNL linguistic infrastructure:
The UNL Dictionary is a flat list of UWs with their corresponding semantic features. It is subdivided into three nested dictionaries:
The UNL Core Dictionary contains permanent UWs that are expected to be lexicalized in all natural languages.
The UNL Abridged Dictionary includes permanent UWs that are lexicalized in at least two different language families, and therefore encompasses the Core Dictionary.
The UNL Unabridged Dictionary expands upon the Abridged Dictionary and comprises the complete set of permanent UWs—that is, all concepts lexicalized in at least one natural language.
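The inclusion criteria of the three nested dictionaries can be expressed as a small function. The Python below is a minimal sketch under loud assumptions. The six-language sample and its family labels are hypothetical. "All natural languages" is approximated by "every language in the sample", which real inclusion decisions could not rely on.

```python
# Hypothetical sample of languages and their families; the real criteria
# refer to all natural languages, which this toy sample only stands in for.
FAMILY = {
    "eng": "Indo-European", "fra": "Indo-European", "rus": "Indo-European",
    "ara": "Afro-Asiatic", "zho": "Sino-Tibetan", "jpn": "Japonic",
}


def dictionaries_containing(lexicalized_in: set[str]) -> list[str]:
    """Return the nested dictionaries that would include a permanent UW,
    given the sample languages in which its concept is lexicalized."""
    if not lexicalized_in:
        return []  # lexicalized nowhere: not a permanent UW
    result = ["Unabridged"]  # at least one language
    families = {FAMILY[lang] for lang in lexicalized_in}
    if len(families) >= 2:
        result.insert(0, "Abridged")  # at least two language families
    if lexicalized_in.issuperset(FAMILY):
        result.insert(0, "Core")  # every language in the sample
    return result


print(dictionaries_containing({"eng", "fra"}))
print(dictionaries_containing({"eng", "jpn"}))
print(dictionaries_containing(set(FAMILY)))
```

The function returns a list rather than a single level to reflect the nesting. A Core UW is also part of the Abridged and Unabridged Dictionaries.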
The UNL Knowledge Base (UNL KB) is a semantic network where UWs are interconnected through Universal Relations. Unlike the Dictionary, which provides only general lexical and semantic features (e.g., lexical category, semantic class, abstractness, or cardinality), the UNL KB represents the intension—the conceptual meaning—of each UW. For example, the UW for “dog” is linked to the UWs for “domesticated”, “carnivorous”, and “mammal”.
The UNL Ontology is a component of the UNL Knowledge Base that
focuses specifically on ontological relations, namely icl
(“is-a-kind-of”) and iof (“is-an-instance-of”). It structures the
conceptual hierarchy of the UNL system.
The UNL Memory is another network of UWs interconnected through Universal Relations. However, while the UNL KB represents the intension (meaning) of a UW, the UNL Memory represents its extension—that is, the set of possible instances or contextual uses of a UW. For instance, in the UNL Memory, the UW “dog” may appear as the agent of “to bite”, the object of “to eat”, or the instrument of “to chase”.
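The difference between the three networks is easiest to see on the "dog" examples given above. In the minimal Python sketch below, English labels stand in for UCLs, and the triples are illustrative rather than actual KB or Memory content. Relations are written relation(source, target) as in the UNL expression at the start of this section. For example, aoj(domesticated, dog) mirrors aoj(fat, cat).

```python
from collections import defaultdict

# Toy (relation, source, target) triples; English labels stand in for UCLs.
# KB: intension of "dog" (what a dog is).
KB = [
    ("icl", "dog", "mammal"),
    ("icl", "mammal", "animal"),
    ("aoj", "domesticated", "dog"),
    ("aoj", "carnivorous", "dog"),
]
# Memory: extension of "dog" (contexts in which it has been used).
MEMORY = [
    ("agt", "bite", "dog"),
    ("obj", "eat", "dog"),
    ("ins", "chase", "dog"),
]


def ancestors(concept: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Follow icl (is-a-kind-of) arcs transitively: the ontology view of the KB."""
    parents = defaultdict(list)
    for rel, src, tgt in triples:
        if rel == "icl":
            parents[src].append(tgt)
    found, stack = [], list(parents[concept])
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(parents[current])
    return found


def roles(concept: str, triples: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """List (relation, source) pairs in which the concept appears as the target."""
    return [(rel, src) for rel, src, tgt in triples if tgt == concept]


print(ancestors("dog", KB))
print(roles("dog", KB))
print(roles("dog", MEMORY))
```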
Universal Attributes are self-referential arcs—links connecting a node to itself. Unlike Universal Relations, which connect different nodes, attributes function as one-place predicates, that is, operations applying to a single argument. In the UNL framework, attributes are primarily used to encode grammatical information conveyed by natural languages, such as tense, mood, aspect, number, and similar categories.
The set of Universal Attributes is considered to be universal and relatively stable, meaning that new attributes are added only in exceptional cases.
The syntax of attributes is defined as follows:
<attribute> ::= "@"<attribute_name>
<attribute_name> ::= <character>+
<character> ::= {"a",...,"z","_"}
where:
< > variable
" " terminal symbol
::= is defined as
{ } disjunction ("or")
+ to be used one or more times
... to be repeated more than 0 times
Attribute names are always lower case words or expressions.
In practice, English words (e.g., “past”, “present”) or mnemonic abbreviations (e.g., “def”, “pl”) are typically used for attribute labels. This, however, does not mean that attributes are limited to features found in English. Several attributes correspond to grammatical distinctions not expressed in English, such as @trial and @quadrual (used for quantification), @equivalent and @inferior (used for social deixis), or @recent and @remote (used for absolute tense). English labels are adopted solely to enhance readability and consistency.
No blank space is allowed inside an attribute name.
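The attribute grammar is simple enough to check with a single regular expression. The following Python sketch is a direct transcription of the BNF above. The function name `is_valid_attribute` is hypothetical, and the check covers syntax only: it does not verify that an attribute belongs to the published inventory.

```python
import re

# Transcription of: <attribute> ::= "@"<attribute_name>
#                   <attribute_name> ::= <character>+
#                   <character> ::= {"a",...,"z","_"}
ATTRIBUTE_RE = re.compile(r"@[a-z_]+")


def is_valid_attribute(label: str) -> bool:
    """Check syntax only: '@' followed by one or more lower-case letters or '_'."""
    return ATTRIBUTE_RE.fullmatch(label) is not None


# Expected: True, True, False (upper case), False (blank space), False (no '@')
for label in ["@past", "@quadrual", "@Past", "@re cent", "past"]:
    print(label, is_valid_attribute(label))
```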
Attributes are annotations applied to nodes or hypernodes within a UNL hypergraph. They specify the circumstances under which these nodes (or hypernodes) are to be interpreted.
Attributes may convey two main types of information:
1. Grammatical and lexical information — expressed by bound morphemes and closed-class elements such as determiners (articles and demonstratives), adpositions (prepositions, postpositions, and circumpositions), conjunctions, auxiliary and quasi-auxiliary verbs (including auxiliaries, modals, coverbs, and preverbs), and degree adverbs (specifiers).
2. Pragmatic and contextual information — concerning the external conditions of the utterance, including non-verbal and discourse-level features such as prosody, politeness, rhetorical organization, and social deixis.
Since UNL is designed to represent what is meant rather than what is said, some attributes, especially those related to pragmatic and contextual information, may be omitted. The primary role of several attributes is to enable the representation of additional content that, while not essential to the core meaning, may enrich it — for instance, the emotions conveyed by the original utterance or the speech act performed through it.
In this sense, the sentence “Peter devoured a thousand books” should be represented as “Peter read many books eagerly.” This representation may optionally be refined with additional information, indicating, for example, that “many” was originally expressed through hyperbole (book.@multal.@hyperbole), or that “read” was conveyed metaphorically (read.@metaphor). A similar case is “John was killed by Peter,” which should be represented as “Peter killed John,” with the optional attribute @passive indicating that the original sentence was in the passive voice ((Peter kill John).@passive). In any case, the use of such attributes remains optional, as the essential meaning is already preserved.
Attributes can be combined to provide a more precise characterization of a node. For example, book.@def.@singular specifies a definite, singular book, while run.@past.@progressive indicates an action of running that occurred in the past and was ongoing at that time.
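When attributes are written inline, as in the examples above, a node can be separated from its attributes by splitting on the “.@” boundary. The Python sketch below assumes that the UW or hypernode itself never contains the sequence “.@”; `split_attributes` is a hypothetical helper, not part of any UNL tool.

```python
def split_attributes(node: str) -> tuple[str, list[str]]:
    """Split an annotated node such as 'book.@def.@singular' into its parts.

    Assumes the UW or hypernode itself never contains the sequence '.@'.
    """
    head, *attributes = node.split(".@")
    return head, ["@" + name for name in attributes]


print(split_attributes("book.@def.@singular"))         # ('book', ['@def', '@singular'])
print(split_attributes("run.@past.@progressive"))      # ('run', ['@past', '@progressive'])
print(split_attributes("(Peter kill John).@passive"))  # ('(Peter kill John)', ['@passive'])
```

Returning the attributes as an ordered list keeps them in the order written, which is sufficient for display; whether order carries meaning is not stated in these specifications.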
Universal Relations (formerly referred to as “links”) are the labeled arcs that connect one node to another in a UNL graph. They encode two-place semantic predicates, that is, relations that hold between two Universal Words (UWs). Each Universal Relation specifies the semantic role that one concept plays in relation to another—such as agent, object, or instrument. Together, these relations define the internal semantic structure of a UNL representation.
Universal Relations are binary and directed: they connect a source concept to a target concept. The inventory of relations is deliberately restricted and stable, ensuring cross-linguistic consistency and avoiding ad hoc extensions. Their design aims to provide a universal set of semantic functions that can be applied uniformly across languages.
Although the labels of Universal Relations may resemble those used in syntax, they do not correspond to grammatical functions. Instead, they express conceptual roles. For example, the idea of “something that initiates an event,” represented by agt (agent), is conceptually distinct from the grammatical notion of “subject,” even though the two often coincide. The agent role may be realized in different grammatical forms—such as a noun modifier (“the student’s invention”), a prepositional phrase (“made by the student”), or a derived noun (“the builder”). The goal of Universal Relations is precisely to capture such conceptual equivalences beyond their surface grammatical realizations.
Universal relations are represented as follows:
<rel>:<scope>(<source>,<target>)
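where <rel> is the relation tag (e.g., agt), <scope> is the optional identifier of the scope to which the relation belongs, <source> is the source node, and <target> is the target node.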
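A minimal Python sketch of a parser for this notation follows. It treats the scope as optional and assumes simple source and target nodes that contain no commas or parentheses; hypernodes such as (Peter kill John) would need a proper tokenizer. `Relation` and `parse_relation` are illustrative names.

```python
import re
from typing import NamedTuple, Optional

# <rel>:<scope>(<source>,<target>), with the ":<scope>" part optional.
RELATION_RE = re.compile(
    r"(?P<rel>[a-z]+)"            # relation tag, e.g. agt
    r"(?::(?P<scope>[^(]+))?"     # optional scope identifier
    r"\((?P<source>[^,()]+),"     # source node (no commas or parentheses)
    r"(?P<target>[^,()]+)\)"      # target node (no commas or parentheses)
)


class Relation(NamedTuple):
    rel: str
    scope: Optional[str]
    source: str
    target: str


def parse_relation(text: str) -> Relation:
    """Parse one relation such as 'agt:01(kill, Peter)' into its four parts."""
    match = RELATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed relation: {text!r}")
    return Relation(
        match["rel"],
        match["scope"],
        match["source"].strip(),
        match["target"].strip(),
    )


print(parse_relation("agt(kill, Peter)"))
# Relation(rel='agt', scope=None, source='kill', target='Peter')
```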
The following principles govern the use of Universal Relations in UNL:
1. Relations are directional, not commutative.
The order of arguments matters. For instance:
cnt(evidence, absence) ≠ cnt(absence, evidence)
In other words, “evidence of absence” is not the same as “absence of evidence.”
2. The target defines the relation.
In the structure <relation>(<source>, <target>), the target is the element that fulfills the role expressed by the relation. For example:
agt(kill, Peter) → ‘Peter’ is the agent of ‘kill’
obj(kill, Peter) → ‘Peter’ is the patient of ‘kill’
tim(kill, yesterday) → ‘yesterday’ is the time of ‘kill’
plc(kill, kitchen) → ‘kitchen’ is the place of ‘kill’
mod(book, beautiful) → ‘beautiful’ is a modifier of ‘book’
icl(document, book) → ‘book’ is a type of ‘document’
iof(city, Paris) → ‘Paris’ is an instance of ‘city’
3. Relations express semantic functions, not syntactic ones.
The same relation may appear in different syntactic roles. For instance, the relation gol (goal) can occur as:
gol(received, Peter)
gol(gave, Peter)
gol(bought, Peter)
4. Relations disambiguate meaning.
They help resolve lexical or syntactic ambiguities. For example, the English preposition “in” in ‘Peter works in X’ may represent several meanings:
plc(work, X) → physical place (‘Peter works in Geneva’)
lpl(work, X) → logical place (‘Peter works in politics’)
cnt(work, X) → content (‘Peter works in improving a technology’)
tim(work, X) → time (‘Peter works in the summer’)
dur(work, X) → duration (‘Peter works in ten hours’)
man(work, X) → manner (‘Peter works in intervals’)
5. Relations are independent of lexical category.
The same semantic relation can link concepts expressed by nouns or verbs:
agt: ‘John arrived’ → agt(arrived, John); ‘arrival of John’ → agt(arrival, John)
gol: ‘go to NY’ → gol(go, NY); ‘train to NY’ → gol(train, NY)
cnt: ‘talk about John’ → cnt(talk, John); ‘book about John’ → cnt(book, John)
6. More general relations can often be refined into more specific ones through the use of attributes.
In many cases, specific relations can be reformulated using more general relations combined with attributes:
src(come, NY) = plc(come, NY.@origin)
gol(go, NY) = plc(go, NY.@destination)
via(go, Geneva) = plc(go, Geneva.@transversal)
tmf(work, early) = tim(work, early.@since)
tmt(work, late) = tim(work, late.@until)
dur(work, summer) = tim(work, summer.@simultaneous)
ins(kill, knife) = man(kill, knife.@instrument)
7. The choice of relations depends on the semantic structure of the UW.
Each Universal Word determines which relations are appropriate for its arguments. For instance:
agt(kill, subject), obj(kill, object)
exp(love, subject), cnt(love, object)
agt(give, subject), cnt(give, object), gol(give, recipient)
Universal Relations are organized in a hierarchy in which upper nodes subsume lower nodes. The topmost level is the relation "rel", which simply indicates that there is a semantic relation between two elements.
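The equivalences listed under principle 6 can be applied mechanically. The Python sketch below rewrites a specific relation as its general counterpart plus an attribute on the target, and folds it back again. It covers only the seven equivalences given there; the table `TO_GENERAL` and the functions `generalize` and `specialize` are hypothetical. Because gol also covers recipients (see the relation table), the plc rewrite may not be meaning-preserving in every case.

```python
from typing import NamedTuple

# Equivalences from principle 6:
# specific relation -> (general relation, attribute added to the target)
TO_GENERAL = {
    "src": ("plc", "@origin"),
    "gol": ("plc", "@destination"),
    "via": ("plc", "@transversal"),
    "tmf": ("tim", "@since"),
    "tmt": ("tim", "@until"),
    "dur": ("tim", "@simultaneous"),
    "ins": ("man", "@instrument"),
}


class Arc(NamedTuple):
    rel: str
    source: str
    target: str


def generalize(arc: Arc) -> Arc:
    """Rewrite a specific relation as general relation + target attribute.

    Relations without a listed equivalence are returned unchanged.
    """
    if arc.rel not in TO_GENERAL:
        return arc
    general, attribute = TO_GENERAL[arc.rel]
    return Arc(general, arc.source, f"{arc.target}.{attribute}")


def specialize(arc: Arc) -> Arc:
    """Inverse rewrite: fold a trailing attribute back into the specific relation."""
    for specific, (general, attribute) in TO_GENERAL.items():
        suffix = "." + attribute
        if arc.rel == general and arc.target.endswith(suffix):
            return Arc(specific, arc.source, arc.target[: -len(suffix)])
    return arc


print(generalize(Arc("src", "come", "NY")))  # Arc(rel='plc', source='come', target='NY.@origin')
```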
| Tag | Relation | Definition | Example |
|---|---|---|---|
| agt | agent | A participant in an action or process that provokes a change of state or location. | John killed Mary = agt(killed, John); Mary was killed by John = agt(killed, John); arrival of John = agt(arrival, John) |
| and | conjunction | Used to state a conjunction between two entities. | John and Mary = and(John, Mary); both John and Mary = and(John, Mary) |
| ant | opposition or concession | Indicates that two entities do not share the same meaning or reference; also used for concession. | John is not Peter = ant(Peter, John); 3 + 2 != 6 = ant(6, 3+2) |
| aoj | object of an attribute | The subject of a stative verb or the predicative relation between predicate and subject. | John is sad = aoj(sad, John); the book contains many pictures = aoj(contain, book) |
| ben | beneficiary | A participant who is advantaged or disadvantaged by an event. | John works for Peter = ben(works, Peter) |
| cnt | content or theme | The object of a stative or experiential verb, or the theme of an entity. | John loves Mary = cnt(love, Mary); a book about Peter = cnt(book, Peter) |
| con | condition | A condition of an event. | If I see him, I will tell him = con(I will tell him, I see him) |
| dur | duration or co-occurrence | The duration of an entity or event; co-occurrence of events. | John worked for five hours = dur(worked, five hours) |
| equ | synonym or paraphrase | Indicates that two entities share the same meaning or reference; also semantic apposition. | The morning star is the evening star = equ(evening star, morning star) |
| exp | experiencer | A participant who receives a sensory impression or is the locus of an experiential event. | John believes in Mary = exp(believe, John) |
| fld | field | Indicates the semantic domain of an entity. | sentence (linguistics) = fld(sentence, linguistics) |
| gol | final state, place, destination or recipient | The final state, place, destination or recipient of an entity or event. | John received the book = gol(received, John); John goes to NY = gol(go, NY) |
| icl | hyponymy, is a kind of | Refers to a subclass relation (is-a-kind-of). | Dogs are mammals = icl(mammal, dogs) |
| ins | instrument or method | An inanimate entity or method used by an agent to implement an event. | The cook cut the cake with a knife = ins(cut, knife) |
| iof | is an instance of | Refers to an instance or individual element of a class. | John is a human being = iof(human being, John) |
| lpl | logical place | A non-physical place where an entity or event occurs or a state exists. | John works in politics = lpl(works, politics) |
| man | manner | Indicates how the action, experience or process of an event is carried out. | John bought the car quickly = man(bought, quickly) |
| mat | material | Indicates the material of which an entity is made. | A statue in bronze = mat(statue, bronze) |
| mod | modifier | A general modification of an entity. | a beautiful book = mod(book, beautiful) |
| nam | name | The name of an entity. | The city of New York = nam(city, New York) |
| obj | patient | A participant undergoing a change of state or location in an action or process. | John killed Mary = obj(killed, Mary) |
| opl | objective place | A place affected by an action or process. | John was hit in the face = opl(hit, face) |
| or | disjunction | Indicates a disjunction between two entities. | John or Mary = or(John, Mary) |
| per | proportion, rate, distribution or basis for a comparison | Indicates a measure or quantification of an event or basis for comparison. | twice a week = per(twice, week); John is more beautiful than Peter = per(beautiful, Peter) |
| plc | place | The location or spatial orientation of an entity or event. | John works here = plc(work, here); John works in NY = plc(work, NY) |
| pof | is part of | Refers to a part–whole relation. | John is part of the family = pof(family, John) |
| pos | possessor | The possessor of a thing. | John's book = pos(book, John) |
| ptn | partner | A secondary (non-focused) participant in an event. | John wrote the letter with Peter = ptn(wrote, Peter) |
| pur | purpose | The purpose of an entity or event. | John left early in order to arrive early = pur(John left early, arrive early) |
| qua | quantity | Expresses the quantity of an entity. | two books = qua(book, 2) |
| res | result or factitive | A referent that results from an entity or event. | They built a very nice building = res(built, a very nice building) |
| rsn | reason | The reason of an entity or event. | John left because it was late = rsn(John left, it was late) |
| seq | consequence | Used to express consequence. | I think therefore I am = seq(I think, I am) |
| src | initial state, place, origin or source | The initial state, place, origin or source of an entity or event. | John came from NY = src(came, NY) |
| tim | time | The temporal placement of an entity or event. | John came yesterday = tim(came, yesterday) |
| tmf | initial time | The initial time of an entity or event. | John worked since early = tmf(worked, early) |
| tmt | final time | The final time of an entity or event. | John worked until late = tmt(worked, late) |
| via | intermediate state or place | The intermediate place or state of an entity or event. | John went from NY to Geneva through Paris = via(went, Paris) |
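
Because the inventory of relations is closed, a UNL graph can be checked against it. The Python sketch below transcribes the tags of the table above into a dictionary and reports any tag outside it. Accepting the topmost tag rel is an assumption based on the description of the hierarchy, and `unknown_relations` is a hypothetical name.

```python
# Relation tags transcribed from the table above.
UNIVERSAL_RELATIONS = {
    "agt": "agent", "and": "conjunction",
    "ant": "opposition or concession", "aoj": "object of an attribute",
    "ben": "beneficiary", "cnt": "content or theme",
    "con": "condition", "dur": "duration or co-occurrence",
    "equ": "synonym or paraphrase", "exp": "experiencer",
    "fld": "field", "gol": "final state, place, destination or recipient",
    "icl": "hyponymy, is a kind of", "ins": "instrument or method",
    "iof": "is an instance of", "lpl": "logical place",
    "man": "manner", "mat": "material",
    "mod": "modifier", "nam": "name",
    "obj": "patient", "opl": "objective place",
    "or": "disjunction",
    "per": "proportion, rate, distribution or basis for a comparison",
    "plc": "place", "pof": "is part of",
    "pos": "possessor", "ptn": "partner",
    "pur": "purpose", "qua": "quantity",
    "res": "result or factitive", "rsn": "reason",
    "seq": "consequence", "src": "initial state, place, origin or source",
    "tim": "time", "tmf": "initial time",
    "tmt": "final time", "via": "intermediate state or place",
}
TOP_RELATION = "rel"  # assumption: the topmost relation is also accepted


def unknown_relations(tags):
    """Return, sorted and de-duplicated, the tags not found in the inventory."""
    return sorted(
        {t for t in tags if t != TOP_RELATION and t not in UNIVERSAL_RELATIONS}
    )


print(len(UNIVERSAL_RELATIONS))                       # 38
print(unknown_relations(["agt", "obj", "foo", "rel"]))  # ['foo']
```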
UNL sentences, or UNL expressions, are sentences of UNL. They are hypergraphs made out of nodes (Universal Words) interlinked by binary semantic Universal Relations and modified by Universal Attributes. UNL sentences have been the basic unit of representation inside the UNL framework.
There are two different ways of representing UNL sentences: the table format and the list format. In the list format, UWs and relations are represented separately; in the table format, they constitute a single structure.
The syntax for UNL sentences in the list format is the following:
<UNL sentence> ::= "[W]" <list of UWs> "[/W]" [ "[R]" <list of relations> "[/R]" ]
<list of UWs> ::= <UW+attributes> [<UW+attributes>...]
<UW+attributes> ::= <UW>{:<Scope-ID>}[<attribute list>]:<UW-ID>
<list of relations> ::= <binary relation>[<binary relation>...]
<binary relation> ::= <source node><relation>[":"<Scope-ID>]<target node>
<source node> ::= <UW-ID>
<target node> ::= <UW-ID>
The syntax for UNL sentences in the table format is the following:
<UNL sentence> ::= <list of relations>
<list of relations> ::= <binary relation>[<binary relation>...]
<binary relation> ::= <relation> [":"<Scope-ID>] "(" <source node> "," <target node> ")"
<source node> ::= <UW+attributes>
<target node> ::= <UW+attributes>
<UW+attributes> ::= <UW>{:<Scope-ID>}[<attribute list>]:<UW-ID>
Where
" and " indicate a predefined delimiter
< and > indicate a non-terminal symbol
{ and } indicate a range
[ and ] indicate an omissible part
... indicates that the preceding part is repeated one or more times
::= indicates that the left part can be replaced by the right part
For example, the sentence “The fat cat sits on the mat.” is represented in the table format as:
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
and in the list format as:
[W] 300986027:01 102121620:@def:02 201543123:@present:03 103727837:@superior.@adjacent.@def:04 [/W] [R] 01aoj02 03exp02 03plc04 [/R]
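The correspondence between the two formats can be automated. The Python sketch below converts table-format relations into the list format by numbering nodes in order of first appearance. It assumes that identical node strings denote the same node, that nodes carry no scope identifiers, and that attributes start at the first “.@”. Run on the table-format example above, it reproduces the list-format example. `table_to_list` is a hypothetical helper.

```python
import re

# One table-format relation per line: <rel>(<source>, <target>)
REL_RE = re.compile(r"([a-z]+)\((.+?),\s*(.+)\)")


def table_to_list(table: str) -> str:
    """Convert table-format relations into a single-line list-format UNL sentence."""
    ids: dict[str, str] = {}  # node string (UW + attributes) -> two-digit UW-ID
    relations: list[str] = []
    for line in table.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines between relations
        match = REL_RE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"unparseable relation: {line!r}")
        rel, source, target = match.groups()
        for node in (source, target):
            if node not in ids:
                ids[node] = f"{len(ids) + 1:02d}"
        relations.append(f"{ids[source]}{rel}{ids[target]}")
    uws = []
    for node, uw_id in ids.items():
        uw, sep, rest = node.partition(".@")  # attributes start at the first ".@"
        uws.append(f"{uw}:@{rest}:{uw_id}" if sep else f"{uw}:{uw_id}")
    return "[W] " + " ".join(uws) + " [/W] [R] " + " ".join(relations) + " [/R]"


TABLE = """
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
"""
print(table_to_list(TABLE))
```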
The UNL/XML document adopts the structural conventions of RDF/XML to ensure compatibility with Semantic Web technologies while preserving the specific representational requirements of the Universal Networking Language (UNL). The root element <unl:UNL> declares the relevant namespaces and schema locations. These include the main UNL namespace (xmlns:unl), which defines the vocabulary for UNL-specific tags; the Dublin Core namespace (xmlns:dc), used for metadata and provenance information; and the XML Schema Instance namespace (xmlns:xsi), which supports schema validation through xsi:schemaLocation. The schema location identifies the authoritative reference for the structure and semantics of UNL/XML documents (typically hosted at https://unlkb.unlarchive.org).
The UNL/XML document is divided into two main sections: a header and a body. The header (<unl:metadata>) specifies essential provenance information such as creator, encoding, schema, and authority. The body (<unl:body>) contains the actual UNL representation, organized hierarchically into headings, paragraphs, and sentences. Each sentence may include the original linguistic content in one or more languages (<unl:org>), the corresponding UNL graph (<unl:unl>), and possibly additional linguistic variants. This structure guarantees that each UNL document is both machine-interpretable and semantically self-contained.
<unl_document> ::= '<?xml version="1.0" encoding="UTF-8"?>'
'<unl:UNL' <namespace-declarations> '>'
<metadata>
<body>
'</unl:UNL>'
<namespace-declarations> ::= 'xmlns:unl="https://unlkb.unlarchive.org/schema/unl#"'
'xmlns:dc="http://purl.org/dc/elements/1.1/"'
'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
'xsi:schemaLocation="https://unlkb.unlarchive.org/schema/unl# https://unlkb.unlarchive.org/schema/unl.xsd"'
<metadata> ::= <unl:metadata>
<dc:title>TEXT</dc:title>
<dc:creator>TEXT</dc:creator>
[<dc:date>DATE</dc:date>]
[<dc:language><iso639-code></dc:language>]
[<dc:rights>TEXT</dc:rights>]
[<unl:encoding>TEXT</unl:encoding>]
<unl:scheme>TEXT</unl:scheme>
<unl:authority>URI</unl:authority>
</unl:metadata>
<body> ::= <unl:body> { <heading> | <paragraph> }+ </unl:body>
<heading> ::= <unl:heading level="DIGIT">TEXT</unl:heading>
<paragraph> ::= <unl:paragraph id="<id>"> { <sentence> }+ </unl:paragraph>
<sentence> ::= <unl:sentence id="<id>">
{ <org> }+ <unl> [ <out> ]*
</unl:sentence>
<org> ::= <unl:org lang="<iso639-code>" >TEXT</unl:org>
<unl> ::= <unl:unl uci="<uci>" format="<format>"> { UNL SENTENCE }+ </unl:unl>
<out> ::= <unl:out lang="<iso639-code>">TEXT</unl:out>
<format> ::= 'table' | 'list'
<uci> ::= 'ucl' | <ucn>
<ucn> ::= <iso639-code>
<?xml version="1.0" encoding="UTF-8"?>
<unl:UNL
xmlns:unl="https://unlkb.unlarchive.org/schema/unl#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://unlkb.unlarchive.org/schema/unl# https://unlkb.unlarchive.org/schema/unl.xsd">
<unl:metadata>
<dc:creator>John Doe</dc:creator>
<dc:title>Example of UNL/XML Document</dc:title>
<dc:date>2025-11-05</dc:date>
<dc:language>en</dc:language>
<unl:encoding>UTF-8</unl:encoding>
<unl:scheme>UNL 2025</unl:scheme>
<unl:authority>https://unlkb.unlarchive.org</unl:authority>
</unl:metadata>
<unl:body>
<unl:heading level="1">Sample Section</unl:heading>
<unl:paragraph id="p1">
<unl:sentence id="s1">
<unl:org lang="eng">The fat cat sits on the mat.</unl:org>
<unl:unl uci="ucl" format="table">
aoj(300986027, 102121620.@def)
exp(201543123.@present, 102121620.@def)
plc(201543123.@present, 103727837.@superior.@adjacent.@def)
</unl:unl>
<unl:out lang="fra">Le gros chat est assis sur le tapis.</unl:out>
<unl:out lang="deu">Die fette Katze sitzt auf der Matte.</unl:out>
<unl:out lang="por">O gato gordo está sentado no tapete.</unl:out>
<unl:out lang="ita">Il grosso gatto è seduto sul tappeto.</unl:out>
</unl:sentence>
</unl:paragraph>
</unl:body>
</unl:UNL>
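A UNL/XML document can be read with Python's standard `xml.etree.ElementTree` module by mapping the unl prefix to its namespace URI. The sketch below collects, for each sentence, the original text, the lines of the UNL graph and the generated outputs. It assumes the structure of the example above, and `read_sentences` is a hypothetical function, not part of any UNL toolkit.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping taken from the namespace declarations above.
NS = {"unl": "https://unlkb.unlarchive.org/schema/unl#"}


def read_sentences(xml_text: str) -> list[dict]:
    """Extract id, original text(s), UNL graph lines and outputs for each sentence."""
    root = ET.fromstring(xml_text)
    sentences = []
    for sentence in root.iterfind(".//unl:sentence", NS):
        graph = sentence.find("unl:unl", NS)
        lines = (graph.text or "").splitlines() if graph is not None else []
        sentences.append({
            "id": sentence.get("id"),
            "org": {o.get("lang"): o.text for o in sentence.findall("unl:org", NS)},
            "format": graph.get("format") if graph is not None else None,
            "relations": [line.strip() for line in lines if line.strip()],
            "out": {o.get("lang"): o.text for o in sentence.findall("unl:out", NS)},
        })
    return sentences
```

Applied to the example document above, it should return a single entry for sentence s1, with three table-format relations and four outputs.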
At present, a UNL document is conceived as a collection of independent UNL sentences. Nevertheless, it may also be viewed as a higher-level hypergraph, in which each UNL sentence constitutes a sub-hypergraph. These sub-hypergraphs can be interconnected through a special relation, nxt (“next”), which encodes their sequential order within the discourse.
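As an illustration of this view, the Python sketch below chains sentence identifiers with nxt arcs. Treating the later sentence as the target of nxt(earlier, later) is an assumption that follows the principle that the target is the element fulfilling the role; `link_sentences` is a hypothetical name.

```python
def link_sentences(sentence_ids: list[str]) -> list[tuple[str, str, str]]:
    """Return nxt(previous, following) arcs linking consecutive sentences."""
    return [
        ("nxt", previous, following)
        for previous, following in zip(sentence_ids, sentence_ids[1:])
    ]


print(link_sentences(["s1", "s2", "s3"]))
# [('nxt', 's1', 's2'), ('nxt', 's2', 's3')]
```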
The XUNL Project explores the introduction of semantic relations designed to capture the rhetorical structure of a document — that is, the logical, argumentative, or narrative connections between sentences or paragraphs. Such intersentential relations aim to represent discourse coherence and text organization in a manner analogous to how Universal Relations capture sentence-level meaning. Potential examples include caus (causal), cond (conditional), concs (concessive), expl (explanatory), seq (sequential), contr (contrastive), and supp (supportive). These relations could, for instance, expose the argumentative structure in persuasive texts or the chronological sequence in narratives.
It is important to note, however, that these intersentential relations are still exploratory and under active discussion. For this reason, they have not been included in the present UNL Specifications, which currently address only sentence-level semantic representation.
The 2010 version of the UNL Specifications introduces significant improvements and modifications over the 2005 version. The principal changes are summarized as follows:
The structure of UNL documents has been completely redesigned to follow XML conventions, making it more compatible with RDF and other semantic web technologies. This allows for easier integration, validation, and interchange of UNL data across platforms and tools.
The representation of Universal Words (UWs) has been significantly overhauled. Each UW is now identified using a Uniform Concept Identifier (UCI), which enables true language independence by supporting multiple Uniform Concept Names for each concept, with English being only one option. The system emphasizes the role of the UNL Knowledge Base, as each UW is defined and located via its Uniform Concept Locator (UCL). Additionally, pro-UWs have been introduced to represent concepts that have no direct textual referent.
The attribute system has been thoroughly redesigned to support more precise annotation of hypergraph structures. The updated attributes improve expressiveness and facilitate the representation of complex semantic nuances.
The set of Universal Relations has been reordered into a hierarchical structure. Upper-level relations now subsume lower-level relations, allowing for a more systematic and semantically coherent representation of roles and dependencies between concepts.
These changes collectively enhance the expressive power, interoperability, and conceptual rigor of the UNL framework, making it better suited for multilingual semantic representation and computational processing.