
The Universal Networking Language (UNL)

Specifications

Version 2010

UNDL Foundation

December 2010

The current specifications constitute a comprehensive set of guidelines and standards for the development and implementation of the Universal Networking Language (UNL). Prepared by the UNDL Foundation, they result from an extensive revision of the previous specifications and incorporate the outcomes of several UNL projects, as well as the experience gained through UNL annotation and tool development. These specifications strengthen the language-independence features of UNL and have been thoroughly discussed within the UNL community, particularly through the UNLweb platform. They replace the earlier specifications and provide a more consistent and robust framework for the effective use of UNL across different applications and platforms.


1. Introduction

The UNL, an acronym for “Universal Networking Language”, is a computer language designed to represent the meaning conveyed by natural languages in a machine-readable and language-independent way. It does not aim to replicate the functions of natural languages in human communication, but rather to provide a formal and computational framework through which the semantics of any utterance can be explicitly encoded and processed by computers. By enabling machines to handle information at the level of meaning, the UNL supports the emulation of human linguistic abilities that rely on interpretation and comprehension, thus offering a foundation for multilingual understanding, knowledge processing, and intelligent communication.

The UNL is a declarative language designed to express information and knowledge in the form of a semantic hypergraph. A semantic hypergraph is a structured representation made of interconnected pieces of meaning, where each node (or hyper-node) corresponds to a concept, and each arc corresponds to a semantic relation between concepts. In the UNL framework, meaning can be codified at three different levels, according to its nature: conceptual, relational, and attributive. Accordingly, the UNL semantic hypergraph is composed of three types of discrete semantic units: Universal Words (UWs), which represent concepts; Universal Relations, which represent the semantic relations between concepts; and Universal Attributes, which specify the circumstances under which concepts are used.

Consider the following English sentence:

{eng} The cat is on the mat. {/eng}

This sentence can be represented in UNL as follows:

{unl:eng} plc( cat(icl>feline).@def , mat(icl>floor cover).@def.@on.@present ) {/unl}

In this UNL expression, {unl:eng} indicates that the sentence is annotated with the UCN scheme for English (see below). The strings "cat(icl>feline)" and "mat(icl>floor cover)" are UWs representing the concepts conveyed by the English words "cat" and "mat"; their suffixes, "(icl>feline)" and "(icl>floor cover)", are restrictions that disambiguate the two words. The relation "plc" (place) links the two concepts. The attribute "@def" indicates that the cat is a specific cat, identifiable by both the speaker and the listener, and the same applies to the mat. The attribute "@on" specifies the position of the cat in relation to the mat, and "@present" indicates that the situation holds at the present time. Accordingly, this UNL expression can be read as "The specific cat is located on the specific mat at the present time".

This representation captures the meaning of the original English sentence while providing a structured format that can be easily processed by computers.
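To make the three kinds of units concrete, the sketch below models the example as a small Python data structure: UWs become nodes carrying a set of attributes, and each relation becomes a directed, labelled arc from a source node to a target node. The class and method names (`Node`, `Arc`, `UNLGraph`, `render`) are hypothetical and not part of the specification; the node contents simply transcribe the expression above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A Universal Word together with the attributes attached to it."""
    uw: str
    attributes: set = field(default_factory=set)


@dataclass(frozen=True)
class Arc:
    """A Universal Relation: a directed, labelled arc from source to target."""
    relation: str
    source: str  # key of the source node
    target: str  # key of the target node


@dataclass
class UNLGraph:
    nodes: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)

    def add_node(self, key: str, uw: str, *attributes: str) -> None:
        self.nodes[key] = Node(uw, set(attributes))

    def relate(self, relation: str, source: str, target: str) -> None:
        # A relation may only link nodes that already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both arguments must be existing nodes")
        self.arcs.append(Arc(relation, source, target))

    def render(self, arc: Arc) -> str:
        """Serialize one arc in the rel(source,target) notation."""
        def node_text(key: str) -> str:
            node = self.nodes[key]
            # Attributes are emitted in alphabetical order for determinism.
            return node.uw + "".join("." + a for a in sorted(node.attributes))
        return f"{arc.relation}({node_text(arc.source)},{node_text(arc.target)})"


# The example sentence "The cat is on the mat."
graph = UNLGraph()
graph.add_node("cat", "cat(icl>feline)", "@def")
graph.add_node("mat", "mat(icl>floor cover)", "@def", "@on", "@present")
graph.relate("plc", "cat", "mat")
print(graph.render(graph.arcs[0]))
# plc(cat(icl>feline).@def,mat(icl>floor cover).@def.@on.@present)
```

Apart from whitespace, the printed string is the expression given above. Sorting the attributes is a choice made by this sketch for reproducible output, not a rule of the specification.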

1.1 History

The UNL Programme started in 1996, as an initiative of the Institute of Advanced Studies of the United Nations University in Tokyo, Japan. In January 2001, the United Nations University set up an autonomous organization, the UNDL Foundation, to be responsible for the development and management of the UNL Programme. The Foundation, a non-profit international organisation, has an independent identity from the United Nations University, although it has special links with the United Nations. It inherited from the UNU/IAS the mandate of implementing the UNL Programme. Its headquarters are based in Geneva, Switzerland.

The UNL Programme has already crossed important milestones. The overall architecture of the UNL System has been developed, together with a set of basic software and tools necessary for its functioning. These are being tested and improved. A vast amount of linguistic resources has been accumulated over the last few years for the various native languages already under development. Moreover, the technical infrastructure for expanding these resources is already in place, thus facilitating the participation of many more languages in the UNL system from now on. A growing number of scientific papers and academic dissertations on the UNL are being published every year.

The most visible accomplishment so far is the recognition, under the Patent Co-operation Treaty (PCT), of the innovative character and industrial applicability of the UNL, which was obtained in May 2002 through the World Intellectual Property Organisation (WIPO). Acquiring the patent for the UNL is a completely novel achievement within the United Nations.

1.2 Commitments

The main goal of the UNL Programme is to construct the UNL, an artificial language that can be used to process information across language barriers.

The major commitments of the UNL are the following:

I - The UNL must represent information
The UNL is first and foremost a knowledge representation language. The most important corollary of this first commitment is that UNL is not a meta-language, i.e., it is not intended to describe or represent natural languages; on the contrary, it is used to represent the information conveyed by natural languages. The goal of UNL is to represent "what was meant" and not "what was said". Accordingly, the UNL provides an interpretation rather than a translation of a given utterance. The UNL version of an existing document is not bound to preserve the lexical and the syntactic choices of the original, but must represent, in a non-ambiguous format, one of its possible meanings, preferably the most conventional one.
II - The UNL must be a language for computers
The UNL is an artificial language shaped to represent knowledge in a machine-tractable format. Like other formal systems, it seeks to provide the infrastructure for computers to handle what is meant by natural languages. Unlike other auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), the UNL is not intended to be a human language. We do not expect people to speak UNL or to communicate in UNL. But we do expect computers to process UNL: to generate UNL from natural language, and vice versa, with or without human aid. We expect computers to be able to extract information from UNL documents, and to detect paraphrases, entailments, implicatures, presuppositions, inferences, contradictions and redundancies among a set of propositions represented in UNL.
III - The UNL must be self-sufficient
In the UNL approach, there are two basic movements: UNLization and NLization. UNLization is the process of representing the information conveyed by natural language in UNL; NLization, conversely, is the process of generating a natural language document from UNL. In order to be fully "understandable" (and manageable) by machines, the UNL must be self-sufficient, i.e., it should be as semantically complete and saturated as possible. The UNL representation must not depend on any implicit knowledge, and should explicitly codify all information. This means that UNLization should be completely independent from NLization, and vice versa: UNLization should not take into account the target language or format of any future NLization, and NLization should not need any information about the original source language or previous structure of any UNL document.
IV - The UNL must be general-purpose
At first glance, the UNL seems to be a pivot language into which source texts are converted before being translated into target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge. In addition to translation, the UNL is expected to be used in several other tasks, such as text mining, multilingual document generation, summarization, text simplification, information retrieval and extraction, sentiment analysis etc. Indeed, in UNL-based systems there is no need for the source language to be different from the target language: an English text may be represented in UNL in order to be generated, once again, in English, as a summarized, simplified, localized or simply rephrased version of the original.
V - The UNL must be independent from any particular natural language
The UNL is expected to be the language of the United Nations and, therefore, must not be tied to any particular existing natural language, at the risk of being rejected by the member states of the General Assembly.

1.3 Assumptions

1. Languages convey information
The UNL assumes that one of the most outstanding uses of natural languages is to convey information, i.e., that natural languages can be used to represent what we know about the world. This "aboutness" of natural languages, i.e., their representational role, is the main object of the UNL, which is expected not to do what natural languages do, but to represent what they represent.
2. Information can be represented by semantic networks
The UNL assumes that any information conveyed by natural language can be formally and usefully represented by a semantic network. This idea is not new. Semantic networks have been used in knowledge representation at least since Charles S. Peirce, and as an interlingua for machine translation since the 1950s. In the UNL approach, this semantic network (or UNL graph) is made of three different types of discrete semantic entities: concepts, relations and attributes. Concepts are nodes in the network; relations are arcs linking nodes; and attributes are used to delimit the use of nodes. This three-layered representation model is the cornerstone of the UNL, and a distinctive feature compared with other semantic networks, which normally propose only two levels: edges and vertices.
3. Any information may be expressed in any language
The UNL assumes that any information conveyed by natural languages is translatable, i.e., that natural languages differ, not in their power to express information, but in the way they do that. The UNL also assumes that, in order to ensure this "translatability" of information, the semantic network must be independent of any natural language in particular (i.e., it must be "universal"[1]). This is achieved by defining a standard (uniform) set of universally-accessible semantic entities, which are the elements of UNL: Universal Words (or UW's), Universal Relations and Universal Attributes.

1.4 Properties

Non-Ambiguity
As a formal system, the UNL is not expected to have any ambiguity, at any level. The sentence "The girls saw the boy with the telescope" must be represented, in UNL, in a way that leaves no ambiguity concerning the meaning of "saw" (past tense of the verb "to see" vs. present tense of the verb "to saw" vs. the noun "saw") or the dependency relations of "with the telescope" ("saw with the telescope" vs. "the boy with the telescope").
Non-Redundancy
As a knowledge representation language, the UNL is not expected to have any redundancy. Expressions such as "free gift", "round circle" and "murder to death" are expected to be represented, in UNL, as "gift", "circle" and "murder", respectively. Likewise, sentences such as "Peter killed Mary", "Peter murdered Mary", "It's Peter who killed Mary" and "Mary was killed by Peter" are expected to be represented in UNL in the same way[2].
Compositionality
As a formal system, the UNL is always literal, i.e., fully compositional. UNL expressions must derive their semantic value thoroughly from their components, which must be explicitly defined in the UNL Knowledge Base. Accordingly, the UNL does not allow for any figure of speech, such as metaphor and metonymy. Tropes must be represented, in UNL, by their intended meaning. A sentence such as "John devoured thousands of books", for instance, must be represented, in UNL, as "John read many books eagerly"[3].
Declarativeness
As a knowledge representation language, the UNL is not expected to perform speech acts (such as promises, requests, orders etc.), but only to describe them in a constative manner. For instance, given a performative utterance such as "Can you pass me the salt?", the role of the UNL is to represent "you pass the salt to me" and to indicate that this was a polite request[4]. The UNL representation itself will not be a request, nor will it be bound to provoke the same (perlocutionary) effect caused by the original utterance.
Completeness
As a fully-explicit semantic system, the UNL is not expected to have ellipses or pro-forms, except when the referent is not present in the document (exophora). A sentence such as "The monkey took the banana and ate it" must be represented, in UNL, as "[The monkey]i took [the banana]j and [the monkey]i ate [the banana]j".

2. Universal Words (UWs)

The basic assumption of the UNL approach is that the information conveyed by natural languages can be formally represented through a semantic network made of three different types of discrete semantic units: Universal Words, Universal Relations and Universal Attributes.

2.1 Definition

Universal Words, or simply UW's, are the words of UNL, and correspond to nodes (to be interlinked by Universal Relations and specified by Universal Attributes) in a UNL graph. They correspond to discrete semantic units conveyed by natural language open lexical categories (noun, verb, adjective and adverb). Any other semantic content (such as the ones conveyed by articles, prepositions, conjunctions etc.) is represented as attributes or relations. This criterion is not language-biased: if a given semantic value proves to be conveyed, in any language, by a closed class, it should not be represented as a UW, regardless of its realisation in other languages.

2.2 The universality of UW's

As the name indicates, Universal Words are expected to be "universal". This does not mean that they represent a sort of common lexical denominator to all languages or a semantic primitive. The concept of "universal", in UNL, must be understood in the sense of "capable of being used and understood by all" (as in "Coordinated Universal Time (UTC)", or in "universal adapter"), rather than "common to all" (as in "Universal Grammar"). They are "universal" in the sense that they are uniform identifiers to the entities defined in the UNL Knowledge Base, which is expected to map everything that we know about the world, and that is used to assign translatability to any concept.

UW's may represent concepts that are believed to be lexicalized[5] in most languages (such as "cause to die"); concepts that are lexicalized only in a few languages (such as "to execute someone by suffocation so as to leave the body intact and suitable for dissection"); concepts that are lexicalized in one single language (such as "a person who is ready to forgive any transgression a first time and then to tolerate it for a second time, but never for a third time"); and concepts that are not lexicalized in any language (such as "women that normally wear red hats and white shoes in big theaters").

The universality of a UW does not come from the type of concept that it represents, but from the way it does that: the UW provides a method for processing the concept, so that any natural language would be able to deal with it, either as a single node, if lexicalized, or as a hyper-node (i.e., a sub-graph), otherwise.

2.3 Permanent UW's and Temporary UW's

UW's can be permanent or temporary.

Permanent UW's
Permanent UW's are included in the UNL Dictionary and correspond to concepts that have been already lexicalized in at least one language (i.e., which are conceived as single lexical items and included therefore in natural language dictionaries). They can be simple, compound or complex (see below).
Temporary UW's
Temporary UW's are words that:

2.4 Simple UW's, Compound UW's and Complex UW's

Permanent UWs can be simple, compound or complex.

Simple UW's
A simple UW is an isolated node in the UNL graph. It is used when the UW represents a concept that is not compositional, i.e., that cannot be fully reduced to constituent concepts, such as "big" (> "above average"), "put" (> "cause to be in a certain state") or "stamp" (> "a small adhesive token").
Compound UW's
A compound UW is an isolated node combined with attributes. It is used when the concept can be fully derived from the combination of an existing simple UW and a UNL attribute, such as the concept conveyed by the English word "bigger", which can be represented simply as the UW corresponding to "big" specified by the degree attribute "@more".
Complex UW's
A complex UW is a hyper-node, i.e., a sub-graph inside the UNL graph. As graphs, complex UWs follow the structure defined for UNL Sentences. They are used when the concept can be fully derived from the combination of existing UW's, relations and attributes, such as in the case of the concept conveyed by the English word "to stamp" (= "affix a stamp to"), which could be represented, in UNL, as the graph corresponding to the definition "affix a stamp to".
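Because a compound UW is written as a simple UW followed by attribute suffixes, software that consumes UNL may need to separate the two parts. A minimal sketch, assuming attributes are always appended as ".@name" after the UW (as in "big(icl>size).@more" and "301382086.@more" from the examples in 2.8) and that no UW root itself contains ".@"; the function name `split_uw` is hypothetical.

```python
def split_uw(expression: str) -> tuple:
    """Split a UW expression into (head, attributes).

    An attribute suffix is recognised only at parenthesis depth 0, so
    material inside a restriction such as "(icl>size)" is never split.
    """
    depth = 0
    for i, ch in enumerate(expression):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and expression.startswith(".@", i):
            # Everything after the first top-level ".@" is a chain of attributes.
            return expression[:i], expression[i + 1:].split(".")
    return expression, []


print(split_uw("big(icl>size).@more"))  # ('big(icl>size)', ['@more'])
print(split_uw("301382086.@more"))      # ('301382086', ['@more'])
print(split_uw("big(icl>size)"))        # ('big(icl>size)', [])
```

A simple UW comes back with an empty attribute list, so the same function can tell simple and compound UWs apart.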

2.5 Principles

Sense
UW's represent sense and not reference. UW's are related to the intension (sense, meaning, connotation) rather than to the extension (reference, denotation) of linguistic expressions. The expressions "morning star" and "evening star", which are said to have the same reference (the planet Venus), must be necessarily represented by different UW's, because they convey different "modes of presentation" of the same object, i.e., have different senses: "the last star to disappear in the morning" and "the first star to appear in the evening", respectively.
Productivity
UW's must correspond to and only to contents conveyed by natural language open lexical categories (nouns, verbs, adjectives and adverbs). Any other semantic content (such as the ones conveyed by articles, prepositions, conjunctions etc.) should be represented as attributes or relations. This criterion is not language-biased: if a given semantic value proves to be conveyed, in any language, by a closed class, it should not be represented as a UW, regardless of its realisation in other languages. The only exception to this principle are the pro-forms, which are represented by a special type of UW, the pro-UW, or null UW (see below).
Compositionality
Simple UW's must correspond to and only to contents expressed by non-compositional lexical items, i.e., words and multiword expressions that cannot be fully reduced to the combination of existing UW's, attributes and relations. Compound and complex UW's must be used when the content can be fully determined by the meanings of constituent expressions and the rules used to combine them.
Comprehensiveness
UW's are "universal" in the sense that they constitute the lexicon of a "universal language", i.e., that they convey ideas that can be expressed in each and every language. They are not universal in the sense that they are lexicalized in all languages. In that sense, UW's are not to be considered semantic primitives, nor should represent only common concepts. The repertoire of UW's is supposed to be as comprehensive as the set of different individual concepts depicted by different cultures, no matter how specific they are. Furthermore, the lexicon of UNL constitutes an open set, subject to permanent increase with new UW's, as UNL is supposed to incessantly incorporate new cultures and cultural changes.
Universality
Permanent UW's may represent concepts with different degrees of universality and are stored accordingly in three nested lexical databases, which are subdivisions of the UNL Dictionary:
Non-Ambiguity and Non-Redundancy
A given sense may not be represented by more than one UW, and one UW may not have more than one sense. There is no homonymy, synonymy or polysemy in UNL.
Simplicity
Simple UW's are names (and not definitions) for senses. The simple UW does not bring much (or any) information about its sense. It is just a label. Any information concerning the sense is expected to be provided by the three different lexical databases available inside the UNL framework: the UNL Dictionary, the UNL Knowledge Base and the UNL Memory.

2.6 Uniform Concept Identifier (UCI)

A Uniform Concept Identifier (UCI) is used to identify a concept. It is a URI (Uniform Resource Identifier) for Universal Words (UW's). In the UNL framework, UCI's are represented either as UCL (Uniform Concept Locator) or UCN (Uniform Concept Name).

The UCI follows the generic syntax defined for URI's:

<scheme name>:<hierarchical part>
Where:

Uniform Concept Locators (UCL), as URL's, provide a method for finding the concept in the UNL Knowledge Base. They are represented as:

ucl://<AUTHORITY>/<ID>
Where:

For instance, the concept "a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs", which is lexicalized in English through the noun "table", may be located through "ucl://unlkb.unlarchive.org/104379964". This address is expected to bring all the information concerning the concept, i.e., its definition in UNL, which may be used by the languages in which this concept is not lexicalized.

Uniform Concept Names (UCN) use the ucn scheme and, as URN's, do not imply availability of the identified resource. They are represented as:

ucn:<LID>:<NSS>
Where:

For instance, the concept "a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs", which is lexicalized in English through the noun "table", may be associated with several different names:

UCN's must be unique and the namespace-specific string is normally split into two different parts: a root and a suffix, as exemplified above. The root can be a word or a multi-word expression. The suffix, which is always introduced by a UNL relation, is used to disambiguate the root.

UCL and UCN are both used to identify UW's. The difference is that a UCL is an address to the position of the UW in the UNL Knowledge Base, whereas a UCN is only the name of the UW. The same address (i.e., UCL) may be associated with different UCN's, but a single UCN may not have more than one UCL. A UCL always describes an available UW, i.e., a UW that has already been defined in the UNL KB, whereas a UCN is not necessarily linked to an address. In that sense, UCL's are more "official" than UCN's, which are normally used in order to preserve the readability of the UNL code.
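The two UCI forms can be told apart by their scheme and then split into their parts. The sketch below assumes the layouts given above ("ucl://<AUTHORITY>/<ID>" and "ucn:<LID>:<NSS>") and further assumes that the disambiguating suffix of a UCN starts at the first "(" of the namespace-specific string. The UCL is the one quoted in the text; the UCN "ucn:eng:table(icl>furniture)" and all function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UCI:
    scheme: str                      # "ucl" (locator) or "ucn" (name)
    authority: Optional[str] = None  # UCL: the <AUTHORITY> part
    ident: Optional[str] = None      # UCL: the <ID> part
    lid: Optional[str] = None        # UCN: the <LID> part
    root: Optional[str] = None       # UCN: root of the namespace-specific string
    suffix: str = ""                 # UCN: disambiguating suffix, if any


def parse_uci(text: str) -> UCI:
    """Parse 'ucl://<AUTHORITY>/<ID>' or 'ucn:<LID>:<NSS>'."""
    if text.startswith("ucl://"):
        authority, sep, ident = text[len("ucl://"):].partition("/")
        if not (authority and sep and ident):
            raise ValueError(f"malformed UCL: {text!r}")
        return UCI("ucl", authority=authority, ident=ident)
    if text.startswith("ucn:"):
        lid, sep, nss = text[len("ucn:"):].partition(":")
        if not (lid and sep and nss):
            raise ValueError(f"malformed UCN: {text!r}")
        # Assumption: the suffix begins at the first "(" of the NSS.
        cut = nss.find("(")
        root, suffix = (nss, "") if cut == -1 else (nss[:cut], nss[cut:])
        return UCI("ucn", lid=lid, root=root, suffix=suffix)
    raise ValueError(f"unsupported UCI scheme: {text!r}")


print(parse_uci("ucl://unlkb.unlarchive.org/104379964"))
print(parse_uci("ucn:eng:table(icl>furniture)"))  # hypothetical UCN
```

Only the UCL result points to a location in the UNL Knowledge Base; the UCN result is just a structured name, in line with the distinction drawn above.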

In the UNL Document Structure, UCI's are always abbreviated to the last part, because the scheme, the authority and the namespace may be inferred from the document header. For instance:

2.7 Structure

Universal Words are represented as follows:

2.8 Examples

Examples of UW's
Simple UW
  Concept (in English): above average
  Lexicalization (in English): big
  UCL: 301382086
  UCN: big(icl>size), grande(icl>tamaño), groß(icl>Größe), grand(icl>taille), ...
Compound UW
  Concept (in English): comparative of above average
  Lexicalization (in English): bigger
  UCL: 301382086.@more
  UCN: big(icl>size).@more, ...
Complex UW
  Concept (in English): affix a stamp to
  Lexicalization (in English): to stamp
  UCL: obj(201356370,106796119)
  UCN: obj(to affix(icl>to attach), stamp(icl>seal))
Temporary UW
  Concept (in English): UNDL Foundation
  Lexicalization (in English): UNDL Foundation
  UCL: "UNDL Foundation"
  UCN: "UNDL Foundation"

2.9 Categories of UW's

Permanent UW's are classified in four different categories, depending on their semantic values:

These categories are semantically-based. They are related to the UW's and are not oriented to any particular language.

In that sense, adjectival UW's (such as "300217728" = "delighting the senses or exciting intellectual or emotional admiration") tend to be associated to English adjectives ("beautiful"), but they can also be realised as prepositional phrases ("with beauty"), verbal phrases ("possessing beauty"), etc.

2.10 Pro-UW's

The UNL representation is expected to be as semantically saturated as possible, and deictics are supposed to be substituted during the UNLization process. In that sense, ellipses and natural language pro-forms (such as "he", "she", "it", "they" etc.) are expected to be replaced by their corresponding antecedents. In many cases, however, it is not possible to find a substitute for words requiring information that is not available inside natural language texts. In these cases, we use pro-UWs, which are represented by the null UW "00" combined with attributes, when applicable.

The main cases are:

Exophora, which is the reference to something that is not inside the text.
This is the case of personal pronouns (such as "I", "you", "we" etc.) for which there is no antecedent in the text (i.e., which refer directly to the context of utterance). These pronouns are represented by the null UW "00" followed by the person attributes (@1, for first person singular; @2, for second person singular; @3, for third person singular; @1.@pl, for first person plural; @2.@pl, for second person plural; and @3.@pl, for third person plural).
Indefinite pronouns (such as "none", "anyone", "everything" etc.), which refer to general categories of people or things.
These pronouns are represented by the null UW "00" followed by determiner attributes ("none" = "00.@no", "anyone" = "00.@any.@person", "everything" = "00.@every.@thing" etc.).
Interrogative pronouns (such as "who", "whom", "where" etc.), which refer to omitted constituents of the syntactic structure.
These pronouns are represented by the null UW "00" followed by the attribute "@wh" ("who" = "00.@wh", "whom" = "00.@wh", "where" = "00.@wh" etc.). The difference between them is determined by the relation in which they appear: "00.@wh" is to be interpreted as "who" when it is the target argument of an "agt" (agent) relation; as "when" when it is the target argument of a "tim" (time) relation; as "where" when it is the target argument of a "plc" (place) relation; and so on.
Interjections (such as "Ouch!", "Yeah!", "Shhh!" etc.), when used in isolation to express an emotion or sentiment on the part of the speaker.
These UWs are always represented by the null UW "00" followed by an emotional attribute (@anger, @pain etc.).
Ellipses that cannot be replaced by any antecedent are represented by the null UW "00" without any specific attribute:
"To be or not to be?", for instance, should be represented either as "aoj(exist,00)" or "aoj(00,00)", depending on the interpretation ("to exist or not to exist" or "to be that or not to be that", respectively), because the necessary subject is missing and cannot be linked to any particular referent.
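The cases above amount to a small decision procedure over the attributes carried by the null UW "00". A sketch, assuming the attributes have already been separated from "00"; the table and function names are hypothetical, and only the three readings of "00.@wh" spelled out in the text (agt, tim, plc) are encoded, since the remaining ones are left open ("and so on").

```python
from typing import Optional

# Person attributes for exophoric personal pronouns, as listed above.
PERSON = {
    frozenset({"@1"}): "first person singular",
    frozenset({"@2"}): "second person singular",
    frozenset({"@3"}): "third person singular",
    frozenset({"@1", "@pl"}): "first person plural",
    frozenset({"@2", "@pl"}): "second person plural",
    frozenset({"@3", "@pl"}): "third person plural",
}

# Only the readings of "00.@wh" spelled out in the text; the rest are open.
WH_BY_RELATION = {"agt": "who", "tim": "when", "plc": "where"}


def describe_pro_uw(attributes, relation: Optional[str] = None) -> str:
    """Describe the null UW "00" carrying `attributes`.

    `relation` is the label of the relation in which "00" is the target;
    it is consulted only for "00.@wh".
    """
    attrs = frozenset(attributes)
    if "@wh" in attrs:
        return WH_BY_RELATION.get(relation, "interrogative (reading not listed)")
    if attrs in PERSON:
        return PERSON[attrs]
    if not attrs:
        return "unresolved ellipsis"
    # Indefinite pronouns and interjections: report the attributes as-is.
    return "00." + ".".join(sorted(attrs))


print(describe_pro_uw({"@wh"}, "tim"))       # when
print(describe_pro_uw({"@3", "@pl"}))        # third person plural
print(describe_pro_uw(set()))                # unresolved ellipsis
print(describe_pro_uw({"@any", "@person"}))  # 00.@any.@person
```

Note that the relation argument matters only for "@wh": as the text says, the same pro-UW "00.@wh" is read differently depending on the relation in which it is the target.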

It is important to stress that all cases above refer to situations where the semantic content cannot be fully saturated. Whenever possible, pro-forms and ellipses are expected to be replaced by their referents. For instance, the pro-UW "00.@3" is not supposed to be used in the case of "Peter said that he will not come", if we are sure that "he" is "Peter". In this case, this sentence is expected to be represented as "Peter(i) said that Peter(i) will not come".

It should also be stressed that, in the UNL approach, pronouns should be differentiated from determiners. The word "which" in "which is that?" is an interrogative pronoun and should therefore be represented by the pro-UW "00.@wh", if we cannot determine what it refers to; but the word "which" in "which book is that?" is a determiner, to be represented as an attribute (@wh) assigned to "book" ("book.@wh").

2.11 Proper UW's

Most named entities (names of people, places, brands etc.) are represented as temporary UW's, because it would not be feasible to include them all in the UNL Dictionary. Nevertheless, some widely used named entities (such as "England", "William Shakespeare", "Romeo and Juliet", "Romeo" etc.) have already been included in the UNL Dictionary and are treated as permanent UW's. Our current criterion is Wikipedia: if a proper name is defined as an entry in Wikipedia, then it should be defined as a permanent UW and included in the UNL Unabridged Dictionary.

2.12 Lexical Databases

UW's are grouped in several different lexical databases:

3. Universal Attributes

Universal Attributes are arcs linking a node to itself. Unlike Universal Relations, they correspond to one-place predicates, i.e., functions that take a single argument. In UNL, attributes have normally been used to represent information conveyed by natural language grammatical categories (such as tense, mood, aspect, number, etc.).

The set of attributes, which is claimed to be universal, is not open to frequent additions.

3.1 Syntax

The syntax of attributes is defined as follows:

<attribute>      ::= "@"<attribute_name>
<attribute_name> ::= <character>+
<character>      ::= {"a",...,"z","_"}

where:
< > variable
" " terminal symbol
::= is defined as
{ } disjunction ("or")
+ to be used one or more times
... to be repeated more than 0 times

Attribute names are always lower case words or expressions.

Normally, English words ("past", "will") or mnemonic abbreviations ("def", "pl") are used for attribute labelling.

No blank space is allowed inside an attribute name.
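The grammar above is regular, so it can be checked with a single regular expression. A minimal sketch that follows the grammar literally; the names `ATTRIBUTE` and `is_valid_attribute` are hypothetical. Read literally, the grammar admits only lowercase letters and underscores, so the person attributes "@1", "@2" and "@3" used in section 2.10 would be rejected; a practical validator would presumably need to admit digits as well.

```python
import re

# <attribute>      ::= "@"<attribute_name>
# <attribute_name> ::= <character>+
# <character>      ::= {"a",...,"z","_"}
ATTRIBUTE = re.compile(r"@[a-z_]+")


def is_valid_attribute(token: str) -> bool:
    """Return True if `token` matches the attribute grammar exactly."""
    return ATTRIBUTE.fullmatch(token) is not None


for token in ["@past", "@def", "@pl", "@Past", "@floor cover", "past", "@1"]:
    print(token, is_valid_attribute(token))
```

Using `fullmatch` rather than `match` ensures that trailing material, such as the blank space in "@floor cover", makes the whole token invalid.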

3.2 Semantics

Attributes are annotations made to nodes or hypernodes of a UNL hypergraph. They denote the circumstances under which these nodes (or hypernodes) are used.

Attributes may convey three different kinds of information:

3.3 Set of attributes


4. Universal Relations

Universal Relations, formerly known as "links", are labelled arcs connecting a node to another node in a UNL graph. They correspond to two-place semantic predicates holding between two Universal Words. In UNL, universal relations have normally been used to represent semantic cases or thematic roles (such as agent, object, instrument, etc.) between UWs. The repertoire of universal relations is defined in the UNL Specs and is not open to frequent additions.

4.1 Definition

In the UNL framework, universal relations describe semantic functions between two UWs. These functions are binary and directed (from a source to a target) and are claimed to be universal. Because of their similarity in name and function to syntactic relations, it may seem that the labels used for relations are different names for special grammatical functions. This is emphatically not the case. The intention is that the labels used denote specific ideas rather than grammatical structures: the idea of “something that initiates an event,” or “agent” for example, is quite different from “grammatical subject of a sentence”, even though many times the subject of a sentence will indicate the agent of the event. The agent of an event may also appear as an adjective or noun modifier, with the preposition “by” or embedded in nouns with “er” suffixes. The whole point of the conceptual relations is to have a name for these very different grammatical structures which are conceptually quite the same. Thus, the conceptual relations used in UNL are much more abstract than the grammatical relations found in sentences.

4.2 Syntax

Universal relations are represented as follows:

 <rel>:<scope>(<source>,<target>)

where:
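A reader for this notation has to split the two arguments at the separator that is not nested inside a UW's own parentheses. A sketch under stated assumptions: <scope> is treated as an optional alphanumeric label introduced by ":" (its exact form is not given here), and ";" is accepted as well as "," between the arguments, because the examples in 4.4 and 4.5 use ";". The scope label "01" and all names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    name: str
    scope: Optional[str]
    source: str
    target: str


# Relation tag, optional ":<scope>", then the opening parenthesis.
HEAD = re.compile(r"\s*([a-z]+)(?::(\w+))?\s*\(")


def split_arguments(args: str) -> tuple:
    """Split at the first ',' or ';' that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in ",;" and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two arguments in {args!r}")


def parse_relation(text: str) -> Relation:
    text = text.strip()
    m = HEAD.match(text)
    if m is None or not text.endswith(")"):
        raise ValueError(f"not a relation: {text!r}")
    source, target = split_arguments(text[m.end():-1])
    return Relation(m.group(1), m.group(2), source, target)


print(parse_relation("plc( cat(icl>feline).@def , mat(icl>floor cover).@def.@on.@present )"))
print(parse_relation("agt(killed;John)"))
print(parse_relation("agt:01(killed;John)"))  # hypothetical scope label "01"
```

Tracking parenthesis depth is what keeps restrictions such as "(icl>to attach)" inside "obj(to affix(icl>to attach), stamp(icl>seal))" from being mistaken for the argument separator.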

4.3 Hierarchy of relations

Universal Relations are organized in a hierarchy where upper nodes subsume lower nodes, i.e., each relation is a more specific case of the relation above it. The topmost level is the relation "rel", which simply indicates that there is a semantic relation between two elements.

rel

4.4 Observations

  1. Arguments of relations are not commutative:
    cnt(evidence;absence), i.e., 'evidence of absence', is different from cnt(absence;evidence), i.e., 'absence of evidence'
  2. The target always defines the relation:[6]
    <relation>(<source>;<target>) => <target> is the <relation> of <source>
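  3. Relations describe semantic dependencies rather than syntactic roles.

    The same relation may correspond to different syntactic roles. Consider, for instance, the case of the relation 'gol' (goal): its target is the grammatical subject in "John received the book" (gol(received;John)), a prepositional complement in "John gave the book to Mary" (gol(gave;Mary)), and a noun modifier in "train to NY" (gol(train;NY)).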

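  4. Lexical and syntactic ambiguities are resolved through relations.

    Consider, for instance, the case of the English preposition "in", as in 'Peter works in X'. Depending on X, "in" is represented by plc (place), as in "John works in NY" (plc(work;NY)), or by lpl (logical place), as in "John works in politics" (lpl(works;politics)).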

  5. Relations are not necessarily bound to a given lexical category.

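    The same relation may be used to describe nominal and verbal structures: agt is used both in "John killed Mary" (agt(killed;John)) and in "arrival of John" (agt(arrival;John)), and src both in "John came from NY" (src(came;NY)) and in "train from NY" (src(train;NY)).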

  6. In most cases, lower relations may be completely replaced by the corresponding upper levels with the help of attributes:
  7. In several cases, however, relations are not completely interchangeable, and replacement implies a significant semantic loss. In these cases, upper levels must be used carefully, and only when there is no other alternative:
  8. The use of relations depends on the internal semantic structure of the UW (see semantic frames).

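    Consider, for instance, the verbs "to kill", "to love" and "to give": "to kill" takes an agent and a patient (agt(killed;John) and obj(killed;Mary) in "John killed Mary"); "to love" takes an experiencer and a content (exp(love;John) and cnt(love;Mary) in "John loves Mary"); and "to give" also takes a recipient, expressed by gol (gol(gave;Mary) in "John gave the book to Mary").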
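Observations 1 and 2 can be checked with a small Python sketch that reads a relation as "<target> is the <relation> of <source>". The names in `NAMES` are taken from the table in 4.5 (with "goal" for gol, as in observation 3); the function name `paraphrase` is hypothetical, and unknown tags are simply echoed back.

```python
# A few relation names from the table in 4.5 (illustrative subset).
NAMES = {"agt": "agent", "cnt": "content", "gol": "goal",
         "obj": "patient", "plc": "place"}


def paraphrase(rel: str, source: str, target: str) -> str:
    """Observation 2: <rel>(<source>;<target>) => <target> is the <rel> of <source>."""
    return f"{target} is the {NAMES.get(rel, rel)} of {source}"
```

Swapping the arguments of cnt changes the reading from "absence is the content of evidence" to "evidence is the content of absence", which is exactly the non-commutativity stated in observation 1.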

4.5 List of relations in alphabetical order

Tag Relation Definition Example
agt agent A participant in an action or process that provokes a change of state or location. John killed Mary = agt(killed;John)
Mary was killed by John = agt(killed;John)
arrival of John = agt(arrival;John)
and conjunction Used to state a conjunction between two entities. John and Mary = and(John;Mary)
both John and Mary = and(John;Mary)
neither John nor Mary = and(John;Mary)
John as well as Mary = and(John;Mary)
ant opposition or concession Used to indicate that two entities do not share the same meaning or reference. Also used to indicate concession. John is not Peter = ant(Peter;John)
3 + 2 != 6 = ant(6;3+2)
Although he's quiet, he's not shy = ant(he's not shy;he's quiet)
aoj object of an attribute The subject of a stative verb. Also used to express the predicative relation between the predicate and the subject. John has two daughters = aoj(have;John)
the book belongs to Mary = aoj(belong;book)
the book contains many pictures = aoj(contain;book)
John is sad = aoj(sad;John)
John looks sad = aoj(sad;John)
ben beneficiary A participant who is advantaged or disadvantaged by an event. John works for Peter = ben(works;Peter)
John gave the book to Mary for Peter = ben(gave;Peter)
cnt content or theme The object of a stative or experiential verb, or the theme of an entity. John has two daughters = cnt(have;two daughters)
the book belongs to Mary = cnt(belong;Mary)
the book contains many pictures = cnt(contain;many pictures)
John believes in Mary = cnt(believe;Mary)
John saw Mary = cnt(saw;Mary)
John loves Mary = cnt(love;Mary)
The explosion was heard by everyone = cnt(hear;explosion)
a book about Peter = cnt(book;Peter)
con condition A condition of an event. If I see him, I will tell him = con(I will tell him;I see him)
I will tell him if I see him = con(I will tell him;I see him)
dur duration or co-occurrence The duration of an entity or event. John worked for five hours = dur(worked;five hours)
John worked hard the whole summer = dur(worked;the whole summer)
John completed the task in ten minutes = dur(completed;ten minutes)
John was reading while Peter was cooking = dur(John was reading;Peter was cooking)
equ synonym or paraphrase Used to indicate that two entities share the same meaning or reference. Also used to indicate semantic apposition. The morning star is the evening star = equ(evening star;morning star)
3 + 2 = 5 = equ(5;3+2)
UN (United Nations) = equ(UN;United Nations)
John, the brother of Mary = equ(John;the brother of Mary)
exp experiencer A participant in an action or process who receives a sensory impression or is the locus of an experiential event. John believes in Mary = exp(believe;John)
John saw Mary = exp(saw;John)
John loves Mary = exp(love;John)
The explosion was heard by everyone = exp(hear;everyone)
fld field Used to indicate the semantic domain of an entity. sentence (linguistics) = fld(sentence;linguistics)
gol final state, place, destination or recipient The final state, place, destination or recipient of an entity or event. John received the book = gol(received;John)
John won the prize = gol(won;John)
John changed from poor to rich = gol(changed;rich)
John gave the book to Mary = gol(gave;Mary)
He threw the book at me = gol(threw;me)
John goes to NY = gol(go;NY)
train to NY = gol(train;NY)
icl hyponymy, is a kind of Used to refer to a subclass of a class. Dogs are mammals = icl(mammal;dogs)
ins instrument or method An inanimate entity or method that an agent uses to implement an event. It is the stimulus or immediate physical cause of an event. The cook cut the cake with a knife = ins(cut;knife)
She used a crayon to scribble a note = ins(used;crayon)
That window was broken by a hammer = ins(broken;hammer)
He solved the problem with a new algorithm = ins(solved;a new algorithm)
He solved the problem using an algorithm = ins(solved;using an algorithm)
He used Mathematics to solve the problem = ins(used;Mathematics)
iof is an instance of Used to refer to an instance or individual element of a class. John is a human being = iof(human being;John)
lpl logical place A non-physical place where an entity or event occurs or a state exists. John works in politics = lpl(works;politics)
John is in love = lpl(John;love)
officer in command = lpl(officer;command)
man manner Used to indicate how the action, experience or process of an event is carried out. John bought the car quickly = man(bought;quickly)
John bought the car in equal payments = man(bought;in equal payments)
John paid in cash = man(paid;in cash)
John wrote the letter in German = man(wrote;in German)
John wrote the letter in a bad manner = man(wrote;in a bad manner)
mat material Used to indicate the material of which an entity is made. A statue in bronze = mat(statue;bronze)
a wood box = mat(box;wood)
a glass mug = mat(mug;glass)
mod modifier A general modification of an entity. a beautiful book = mod(book;beautiful)
an old book = mod(book;old)
a book with 10 pages = mod(book;with 10 pages)
a book in hard cover = mod(book;in hard cover)
a poem in iambic pentameter = mod(poem;in iambic pentameter)
a man in an overcoat = mod(man;in an overcoat)
nam name The name of an entity. The city of New York = nam(city;New York)
my friend Willy = nam(friend;Willy)
obj patient A participant in an action or process undergoing a change of state or location. John killed Mary = obj(killed;Mary)
Mary died = obj(died;Mary)
The snow melts = obj(melts;snow)
opl objective place A place affected by an action or process. John was hit in the face = opl(hit;face)
John fell in the water = opl(fell;water)
or disjunction Used to indicate a disjunction between two entities. John or Mary = or(John;Mary)
either John or Mary = or(John;Mary)
per proportion, rate, distribution, measure or basis for a comparison Used to indicate a measure or quantification of an event. The course was split in two parts = per(split;in two parts)
twice a week = per(twice;week)
The new coat costs $70 = per(cost;$70)
John is more beautiful than Peter = per(beautiful;Peter)
John is as intelligent as Mary = per(intelligent;Mary)
John is the most intelligent of us = per(intelligent;we)
plc place The location or spatial orientation of an entity or event. John works here = plc(work;here)
John works in NY = plc(work;NY)
John works in the office = plc(work;office)
John is in the office = plc(John;office)
a night in Paris = plc(night;Paris)
pof is part of Used to refer to a part of a whole. John is part of the family = pof(family;John)
pos possessor The possessor of a thing. the book of John = pos(book;John)
John's book = pos(book;John)
his book = pos(book;he)
ptn partner A secondary (non-focused) participant in an event. John fights with Peter = ptn(fight;Peter)
John wrote the letter with Peter = ptn(wrote;Peter)
John lives with Peter = ptn(live;Peter)
pur purpose The purpose of an entity or event. John left early in order to arrive early = pur(John left early;arrive early)
You should come to see us = pur(you should come;see us)
book for children = pur(book;children)
qua quantity Used to express the quantity of an entity. two books = qua(book;2)
a group of students = qua(students;group)
res result or factitive A referent that results from an entity or event. The cook bakes a cake = res(bake;cake)
They built a very nice building = res(built;a very nice building)
rsn reason The reason of an entity or event. John left because it was late = rsn(John left;it was late)
John killed Mary because of John = rsn(killed;John)
seq consequence Used to express consequence. I think therefore I am = seq(I think;I am)
src initial state, place, origin or source The initial state, place, origin or source of an entity or event. John came from NY = src(came;NY)
John is from NY = src(John;NY)
train from NY = src(train;NY)
John changed from poor into rich = src(changed;poor)
John received the book from Peter = src(received;Peter)
John withdrew the money from the cashier = src(withdrew;cashier)
tim time The temporal placement of an entity or event. The whistle will sound at noon = tim(sound;noon)
John came yesterday = tim(came;yesterday)
tmf initial time The initial time of an entity or event. John worked since early = tmf(worked;early)
tmt final time The final time of an entity or event. John worked until late = tmt(worked;late)
via intermediate state or place The intermediate place or state of an entity or event. John went from NY to Geneva through Paris = via(went;Paris)
The baby crawled across the room = via(crawled;across the room)
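Since the repertoire of relations is closed, a tool can reject any tag outside it. The Python sketch below collects the tags and names of the table above, plus the topmost relation "rel" from 4.3; the dictionary `RELATIONS` and the function `check_tag` are hypothetical names, not part of the Specs.

```python
# Tags and names from the table in 4.5, plus the topmost relation from 4.3.
RELATIONS = {
    "rel": "relation",
    "agt": "agent",
    "and": "conjunction",
    "ant": "opposition or concession",
    "aoj": "object of an attribute",
    "ben": "beneficiary",
    "cnt": "content or theme",
    "con": "condition",
    "dur": "duration or co-occurrence",
    "equ": "synonym or paraphrase",
    "exp": "experiencer",
    "fld": "field",
    "gol": "final state, place, destination or recipient",
    "icl": "hyponymy, is a kind of",
    "ins": "instrument or method",
    "iof": "is an instance of",
    "lpl": "logical place",
    "man": "manner",
    "mat": "material",
    "mod": "modifier",
    "nam": "name",
    "obj": "patient",
    "opl": "objective place",
    "or": "disjunction",
    "per": "proportion, rate, distribution, measure or basis for a comparison",
    "plc": "place",
    "pof": "is part of",
    "pos": "possessor",
    "ptn": "partner",
    "pur": "purpose",
    "qua": "quantity",
    "res": "result or factitive",
    "rsn": "reason",
    "seq": "consequence",
    "src": "initial state, place, origin or source",
    "tim": "time",
    "tmf": "initial time",
    "tmt": "final time",
    "via": "intermediate state or place",
}


def check_tag(tag: str) -> str:
    """Return the name of `tag`, rejecting tags outside the closed repertoire."""
    if tag not in RELATIONS:
        raise ValueError(f"unknown universal relation: {tag!r}")
    return RELATIONS[tag]
```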

5. UNL Sentence

UNL sentences, or UNL expressions, are sentences of UNL. They are hypergraphs made out of nodes (Universal Words) interlinked by binary semantic Universal Relations and modified by Universal Attributes. UNL sentences have been the basic unit of representation inside the UNL framework.

Syntax

According to the UNL Specs, there are two different ways of representing UNL sentences: the table format and the list format. In the list format, UWs and relations are represented separately; in the table format, they constitute a single structure.

List Format

The syntax for UNL sentences in the list format is the following:

<UNL sentence> ::= "[W]" <list of UWs> "[/W]" [ "[R]" <list of relations> "[/R]" ]
<list of UWs> ::= <UW+attributes> [<UW+attributes>...]
<UW+attributes> ::= <UW>{:<Scope-ID>}[<attribute list>]:<UW-ID>
<list of relations> ::= <binary relation>[<binary relation>...]
<binary relation> ::= <source node><relation>[":"<Scope-ID>]<target node>
<source node> ::= <UW-ID>
<target node> ::= <UW-ID>
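The separation between UWs and relations in the list format can be sketched in Python as a serializer that writes each node once in the [W] block and refers to it only by its UW-ID in the [R] block. Several details are assumptions not fixed by the grammar: attributes are written as ".@<name>" after the UW, UW-IDs are two-digit strings (which keeps "<source node><relation><target node>" readable), each item goes on its own line, and node Scope-IDs are not handled. The attribute "past" and the class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    uw: str                      # the Universal Word, e.g. "kill"
    uw_id: str                   # assumed two-digit identifier, e.g. "01"
    attributes: List[str] = field(default_factory=list)  # e.g. ["past"]


@dataclass
class Link:
    rel: str                     # relation tag, e.g. "agt"
    source_id: str               # UW-ID of the source node
    target_id: str               # UW-ID of the target node
    scope: Optional[str] = None  # optional Scope-ID of the relation


def node_text(node: Node) -> str:
    """<UW>[<attribute list>]:<UW-ID>, with attributes assumed to be '.@name'."""
    attrs = "".join(f".@{a}" for a in node.attributes)
    return f"{node.uw}{attrs}:{node.uw_id}"


def to_list_format(nodes: List[Node], links: List[Link]) -> str:
    """Write each UW once in [W]; relations refer to UWs only by UW-ID in [R]."""
    lines = ["[W]"] + [node_text(n) for n in nodes] + ["[/W]"]
    if links:  # the [R] ... [/R] part is omissible in the grammar
        lines.append("[R]")
        for link in links:
            scope = f":{link.scope}" if link.scope else ""
            lines.append(f"{link.source_id}{link.rel}{scope}{link.target_id}")
        lines.append("[/R]")
    return "\n".join(lines)
```

With kill (attribute past, UW-ID 01), John (02) and Mary (03) linked by agt and obj, the [W] block lists kill.@past:01, John:02 and Mary:03, and the [R] block reads 01agt02 and 01obj03.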

Table Format

The syntax for UNL sentences in the table format is the following:

<UNL sentence> ::= <list of relations>
<list of relations> ::= <binary relation>[<binary relation>...]
<binary relation> ::= <relation> [":"<Scope-ID>] "(" <source node> , <target node> ")"
<source node> ::= <UW+attributes>
<target node> ::= <UW+attributes>
<UW+attributes> ::= <UW>{:<Scope-ID>}[<attribute list>]:<UW-ID>

Where
" and " enclose a predefined delimiter
< and > indicate a non-terminal symbol
{ and } indicate a range
[ and ] indicate an omissible part
... indicates that the preceding part is repeated one or more times
::= indicates that the left part can be replaced by the right part
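In the table format each relation carries its nodes in full, so a UW appears once per relation that uses it. Parsing a line therefore needs to find the comma that separates the two nodes while ignoring commas inside a UW's parentheses. The Python sketch below does this with a depth counter; it assumes that attributes are written as ".@<name>", that the UW-ID follows the last colon of a node, and that nodes carry no Scope-ID. All names are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    uw: str
    attributes: Tuple[str, ...]
    uw_id: str


class Relation(NamedTuple):
    rel: str
    scope: str        # "" when the optional Scope-ID is absent
    source: Node
    target: Node


# <relation>[:<Scope-ID>](<everything up to the final closing parenthesis>)
HEAD_RE = re.compile(r"^([a-z]{2,3})(?::(\w+))?\((.*)\)$")


def split_top_level(args: str) -> Tuple[str, str]:
    """Split at the first comma outside parentheses (UWs may contain commas)."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"no top-level comma in {args!r}")


def parse_node(text: str) -> Node:
    """Assumed node shape: <UW>.@attr1.@attr2:<UW-ID>."""
    body, sep, uw_id = text.rpartition(":")
    if not sep:
        raise ValueError(f"missing UW-ID in {text!r}")
    uw, *attrs = body.split(".@")
    return Node(uw, tuple(attrs), uw_id)


def parse_table(text: str) -> List[Relation]:
    """Parse one relation per non-empty line."""
    relations = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = HEAD_RE.match(line)
        if m is None:
            raise ValueError(f"not a relation: {line!r}")
        src, tgt = split_top_level(m.group(3))
        relations.append(
            Relation(m.group(1), m.group(2) or "", parse_node(src), parse_node(tgt))
        )
    return relations
```

For instance, the line agt(kill(icl>do,agt>thing).@past:01,John:02) is split into a source node with UW kill(icl>do,agt>thing), attribute past and UW-ID 01, and a target node John with UW-ID 02; the UW with a constraint list is an illustrative example.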


6. UNL Document

UNL documents are documents written in UNL. They are plain text files that include UNL Sentences and some special tags. They are the output of the UNLization process and the input of the NLization process.

Syntax

A UNL document is enclosed in the tags “[D:<id>]” and “[/D]”. Within these tags, each paragraph is enclosed in a pair of tags “[P:<id>]” and “[/P]”, and each sentence in a pair of tags “[S:<id>]” and “[/S]”. Inside a sentence, the text of the original sentence is enclosed in “{org:<lang>}” and “{/org}”, and its UNL expression in “{unl:<id>}” and “{/unl}”. Sentences in target languages can also be stored in the UNL document: each target sentence is enclosed in a pair of language tags “{<lang>}” and “{/<lang>}” following the UNL expression of the sentence.

Tags used in UNL Documents
Tag Description
[D:<id>] indicates the beginning of a document
[/D] indicates the end of a document
[P:<id>] indicates the beginning of a paragraph
[/P] indicates the end of a paragraph
[S:<id>] indicates the beginning of a sentence
[/S] indicates the end of a sentence
{org:<lang>=<code>} indicates the beginning of an original/source sentence
{/org} indicates the end of an original sentence
{unl:<id>} indicates the beginning of the UNL expressions of a sentence
{/unl} indicates the end of the UNL expressions of a sentence
{<lang>} indicates the beginning of a target sentence of the language indicated by <lang>
{/<lang>} indicates the end of a target sentence of the language indicated by <lang>

Where

:<id> (optional), which is normally represented by an integer, may be any sequence of characters used to identify the document, the sentence, the paragraph or the UNL expression
:<lang> (optional in the case of {org}) corresponds to the language code in ISO 639-2 or ISO 639-3
:=<code> (optional) corresponds to the character encoding
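The nesting of tags described above can be illustrated with a Python sketch that assembles a document from paragraphs of sentences. It numbers paragraphs and sentences with consecutive integers (sentences across the whole document), omits the optional =<code> encoding, and uses hypothetical dictionary keys (org_lang, org, unl, targets); the language codes in the usage example follow ISO 639-2.

```python
from typing import Dict, List


def unl_document(doc_id: str, paragraphs: List[List[Dict]]) -> str:
    """Wrap sentences in [D], [P] and [S] tags, with {org}, {unl} and target tags.

    Each sentence is a dict with keys "org_lang", "org", "unl" and an optional
    "targets" dict mapping language codes to target sentences (hypothetical keys).
    """
    out = [f"[D:{doc_id}]"]
    s_id = 0
    for p_id, sentences in enumerate(paragraphs, start=1):
        out.append(f"[P:{p_id}]")
        for s in sentences:
            s_id += 1  # sentence ids run across the whole document
            out.append(f"[S:{s_id}]")
            out.append(f"{{org:{s['org_lang']}}}{s['org']}{{/org}}")
            out.append(f"{{unl:{s_id}}}\n{s['unl']}\n{{/unl}}")
            # target sentences follow the UNL expression
            for lang, text in s.get("targets", {}).items():
                out.append(f"{{{lang}}}{text}{{/{lang}}}")
            out.append("[/S]")
        out.append("[/P]")
    out.append("[/D]")
    return "\n".join(out)
```

For a single English sentence with a French translation, `unl_document("1", [[{"org_lang": "eng", "org": "John killed Mary.", "unl": "agt(kill:01,John:02)", "targets": {"fra": "Jean a tué Marie."}}]])` yields [D:1], [P:1], [S:1], the {org:eng} line, the {unl:1} block, the {fra} line and the closing tags, in that order.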

Semantics

For the time being, a UNL document is simply a collection of UNL sentences. However, it can also be treated as a hypergraph itself, comprising several subhypergraphs (the UNL sentences) inter-related by a special relation "nxt" (for "next"), which indicates sequential order. In the XUNL Project, we have been proposing some other strategies for representing cross-sentential relations, which are, however, still under discussion.
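Treating a document as a hypergraph, the sequential order can be made explicit by chaining consecutive sentences with "nxt". The Python sketch assumes, by analogy with observation 2 in 4.4 (the target defines the relation), that nxt(a;b) reads "b is the next of a"; the function name and the sentence identifiers are hypothetical.

```python
from typing import List


def link_sentences(sentence_ids: List[str]) -> List[str]:
    """Chain consecutive sentences with nxt(<previous>;<next>) (assumed direction)."""
    return [f"nxt({a};{b})" for a, b in zip(sentence_ids, sentence_ids[1:])]
```

A document with sentences 1, 2 and 3 thus becomes two arcs, nxt(1;2) and nxt(2;3), while a one-sentence document has none.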


7. References

8. Notes

  1. The idea of "universality", in UNL, must be understood in the sense of "capable of being used and understood by all" (as in "Coordinated Universal Time (UTC)", or in "universal adapter"), rather than "common to all" (as in "Universal Grammar"). See Universal.
  2. The differences between them can be represented by attributes such as @topic and @passive, but this is rather optional, because the goal of UNL is to represent "what was meant" and not "what was said" or "how it was said".
  3. The information that this content has been conveyed through figurative language can be indicated by the corresponding attributes (@metaphor, @hyperbole, etc.), but this is optional.
  4. This can be done by the use of the attributes @polite and @request.
  5. i.e., consolidated as a single indivisible lexical unit.
  6. The order of the arguments is, in many cases, counter-intuitive. Consider, for instance, the case of "icl" (hyponymy) as in "Dogs are mammals". The relation is icl(mammal;dogs) because "dogs" is the target of the hyponymy, i.e., "dogs" is a hyponym of "mammals", and not the opposite. This seems to contradict "Peter is in NY", where we have plc(Peter;NY), but it is important to notice that, in both cases, the general principle of the order (i.e., the relation is always defined by the target) is being followed. In this sense, an important change from the past Specs concerns the order of the relations "and" and "or". Up to UNL2005, "Mary and John" was represented as and(John;Mary); from UNL2010 on, in order to preserve the general principle of the order, the same relation is represented as and(Mary;John).