Representation


The communicative function of an utterance is modeled by a dialogue act, which is one of the labels in a structured hierarchy (see fig. 1). Once a dialogue act has been determined, either statistically or by rules, it is attached as a label to the internal representation of the utterance. A single segment can have multiple, overlapping communicative functions, in which case a list of labels is attached.

The propositional content of an utterance is modeled by a number of objects, both abstract and physical, that are connected by relations. These objects are instances of classes in an ISA hierarchy and refer to real-world objects (a small part of the class tree is shown in fig. 2).

Figure 2: OBJECT subtree of the DIREX class hierarchy
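
To make the notion of an ISA hierarchy concrete, here is a minimal Python sketch of a subsumption test over a class tree. The class names and parent links below are hypothetical placeholders chosen for illustration; they are not taken from the actual tree in fig. 2, and the functions `ancestors` and `isa` are assumptions of this sketch, not part of the system described here.

```python
# Hypothetical child -> parent links. The real class tree is the one
# shown in fig. 2; these names are illustrative assumptions only.
ISA_PARENT = {
    "object": None,
    "abstract_object": "object",
    "physical_object": "object",
    "move": "abstract_object",
    "location": "physical_object",
    "city": "location",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    parent = ISA_PARENT[cls]
    while parent is not None:
        chain.append(parent)
        parent = ISA_PARENT[parent]
    return chain


def isa(cls, super_cls):
    """True if cls equals super_cls or is a descendant of it."""
    return cls == super_cls or super_cls in ancestors(cls)


print(ancestors("city"))
print(isa("city", "object"), isa("move", "location"))
```

A test of this kind is what lets a relation that expects, say, some location accept an instance of a more specific class.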

The representation language is called DRL (discourse representation language); its entities are called DIREXes (discourse representation expressions). Embedded in this language is a special time representation language whose expressions are called TEMPEXes [Endriss1998]. An example representation looks like this:

so how about a three o'clock flight out of Hannover
[has_move:[move,has_departure_time:{time_of_day:3}, has_source_location:[city,has_name='hannover']]]
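
The nesting in the example above can be mirrored by a small data structure. The following Python sketch is only an illustration under assumptions: the classes `Direx` and `Tempex` and the `render` methods are hypothetical names introduced here, and the bracket syntax is inferred from the single example rather than from a DRL specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Tempex:
    """Hypothetical stand-in for a TEMPEX: named temporal features."""
    features: Dict[str, int]

    def render(self) -> str:
        inner = ",".join(f"{k}:{v}" for k, v in self.features.items())
        return "{" + inner + "}"


@dataclass
class Direx:
    """Hypothetical stand-in for a DIREX: an optional class plus slots."""
    cls: Optional[str]
    slots: Dict[str, Union["Direx", Tempex, str]] = field(default_factory=dict)

    def render(self) -> str:
        parts = [] if self.cls is None else [self.cls]
        for name, value in self.slots.items():
            if isinstance(value, str):
                # Atomic fillers are written as slot='value'.
                parts.append(f"{name}='{value}'")
            else:
                # Nested expressions are written as slot:expression.
                parts.append(f"{name}:{value.render()}")
        return "[" + ",".join(parts) + "]"


move = Direx("move", {
    "has_departure_time": Tempex({"time_of_day": 3}),
    "has_source_location": Direx("city", {"has_name": "hannover"}),
})
top = Direx(None, {"has_move": move})
print(top.render())
```

Apart from whitespace, the printed string has the same structure as the example: a move object whose departure time is a TEMPEX and whose source location is a city object.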

The utterance itself is also represented by a DIREX. It contains slots for the dialogue act, the topic, and pointers to the above-mentioned objects. Because ellipses and anaphora occur frequently, the content information delivered by shallow extraction is incomplete and has to be extended with the help of context knowledge (see section 4.3 below). The utterance object points to the extended object as well as to the original object delivered by shallow extraction.
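
The relationship between the original and the extended object can be sketched as follows. This is a deliberately naive illustration, not the completion procedure of section 4.3: the `Utterance` class, the `extend_with_context` function, the plain-dictionary encoding of objects, and the labels `suggest` and `travelling` are all assumptions made for this example.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import List, Optional


def extend_with_context(shallow: dict, context: dict) -> dict:
    """Naive completion (an assumption of this sketch): copy every slot
    that the shallow result lacks from the context object, recursing into
    slots that both sides fill with a nested object."""
    extended = deepcopy(shallow)
    for slot, value in context.items():
        if slot not in extended:
            extended[slot] = deepcopy(value)
        elif isinstance(extended[slot], dict) and isinstance(value, dict):
            extended[slot] = extend_with_context(extended[slot], value)
    return extended


@dataclass
class Utterance:
    dialogue_acts: List[str]   # a list, since functions may overlap
    topic: Optional[str]
    original: dict             # as delivered by shallow extraction
    extended: dict             # after completion from context


# Context from an earlier turn: a flight out of Hannover.
context = {"class": "move",
           "has_source_location": {"class": "city", "has_name": "hannover"}}

# Elliptical follow-up ("at three o'clock"): only the time is extracted.
shallow = {"class": "move", "has_departure_time": {"time_of_day": 3}}

utt = Utterance(
    dialogue_acts=["suggest"],   # hypothetical label
    topic="travelling",          # hypothetical topic
    original=shallow,
    extended=extend_with_context(shallow, context),
)
print(utt.extended)
```

Keeping both pointers means the unmodified extraction result stays available alongside the completed object. A realistic procedure would also have to check class compatibility before merging, which this sketch does not do.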

Jan Alexandersson
Thu Nov 11 15:15:06 MET 1999