
Shallow Extraction

Figure 3: The graphical user interface for automaton development

Prompted by the real-world conditions under which the system operates (see section 2), a number of shallow techniques were devised to cope with the incoming data. The shallow track combines statistical methods and finite state technology to extract the dialogue act together with TEMPEX and DIREX expressions [Reithinger1999].

The dialogue act is determined statistically, using trigrams and deleted interpolation [Reithinger and Klesen1997]. This requires a database of manually annotated dialogues, which serves as training material. We currently use a training corpus of about 660 dialogues for German, 300 for English and 400 for Japanese.
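The statistical step can be illustrated with a minimal sketch. It assumes one word trigram model per dialogue act, interpolated with bigram and unigram estimates, and a prior taken from the relative frequency of each act in the training data. The interpolation weights are set with a leave-one-out variant of deleted interpolation known from part-of-speech tagging; the text above does not say which variant is used, so this choice, the act labels, the toy utterances and the probability floor `FLOOR` are all assumptions made for illustration.

```python
import math
from collections import Counter

FLOOR = 1e-6  # assumed floor probability for events unseen in training


class TrigramModel:
    """Word trigram model for one dialogue act, smoothed by interpolation."""

    def __init__(self, utterances):
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()  # counts of the contexts
        for tokens in utterances:
            seq = ["<s>", "<s>"] + list(tokens) + ["</s>"]
            for i in range(2, len(seq)):
                w1, w2, w3 = seq[i - 2], seq[i - 1], seq[i]
                self.uni[w3] += 1
                self.bi[(w2, w3)] += 1
                self.tri[(w1, w2, w3)] += 1
                self.ctx1[w2] += 1
                self.ctx2[(w1, w2)] += 1
        self.n = sum(self.uni.values())
        self.lambdas = self._estimate_lambdas()

    def _estimate_lambdas(self):
        # Leave-one-out deleted interpolation: each trigram votes, with its
        # count, for the order whose estimate is best once the trigram
        # itself has been removed from the counts.
        weights = [0.0, 0.0, 0.0]  # unigram, bigram, trigram
        for (w1, w2, w3), c in self.tri.items():
            c1, c2 = self.ctx1[w2], self.ctx2[(w1, w2)]
            cand = [
                (self.uni[w3] - 1) / (self.n - 1) if self.n > 1 else 0.0,
                (self.bi[(w2, w3)] - 1) / (c1 - 1) if c1 > 1 else 0.0,
                (c - 1) / (c2 - 1) if c2 > 1 else 0.0,
            ]
            best = max(range(3), key=lambda k: cand[k])
            weights[best] += c
        total = sum(weights)
        return [w / total for w in weights] if total else [1 / 3] * 3

    def prob(self, w1, w2, w3):
        c1, c2 = self.ctx1[w2], self.ctx2[(w1, w2)]
        p_uni = self.uni[w3] / self.n if self.n else 0.0
        p_bi = self.bi[(w2, w3)] / c1 if c1 else 0.0
        p_tri = self.tri[(w1, w2, w3)] / c2 if c2 else 0.0
        l1, l2, l3 = self.lambdas
        return max(l1 * p_uni + l2 * p_bi + l3 * p_tri, FLOOR)

    def log_prob(self, tokens):
        seq = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        return sum(math.log(self.prob(seq[i - 2], seq[i - 1], seq[i]))
                   for i in range(2, len(seq)))


def train(annotated):
    """annotated maps a dialogue act label to its tokenized utterances."""
    total = sum(len(utts) for utts in annotated.values())
    models = {act: TrigramModel(utts) for act, utts in annotated.items()}
    priors = {act: len(utts) / total for act, utts in annotated.items()}
    return models, priors


def classify(tokens, models, priors):
    """Return the act maximizing log P(act) + log P(words | act)."""
    return max(models,
               key=lambda a: math.log(priors[a]) + models[a].log_prob(tokens))


# Toy training data with hypothetical act labels.
TRAINING = {
    "GREET": [["hello"], ["good", "morning"], ["hello", "mr", "smith"]],
    "SUGGEST": [["how", "about", "monday"],
                ["how", "about", "the", "fifth"],
                ["what", "about", "tuesday"]],
    "ACCEPT": [["that", "is", "fine"], ["monday", "is", "fine"],
               ["that", "suits", "me"]],
}
models, priors = train(TRAINING)
print(classify(["how", "about", "tuesday"], models, priors))
```

With real data, the weights would normally be estimated on held-out material and the floor replaced by a proper treatment of unknown words; the sketch only shows how the three estimates are combined.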

The string output of the speech recognizer, together with the recognized dialogue act, is then sent to a language-specific set of finite state automata, which deliver valid TEMPEXes and a number of semantic chunks that are assembled into a DIREX. These automata were manually encoded with a graphically enhanced toolkit (see Figure 3). Each set can be seen as an ordered cascade of automata with special actions on state transitions (allowing, e.g., a global search over the whole string for a single transition). All automata are applied to the input in order, and each one builds on the output of the preceding one. This output consists of bracketed chunks and the remaining string. The final output delivers a bracketed representation of a DIREX. Currently, we use 186 automata for German, 167 for English and 127 for Japanese extraction.
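The cascade idea can be sketched as follows. This is only an illustration: Python regular expressions stand in for the hand-built automata, the chunk labels other than TEMPEX are hypothetical, and the bracket notation is an assumption rather than the system's actual format. Each stage rewrites the string produced by the previous one, so a later stage can match brackets that an earlier stage inserted.

```python
import re

# Ordered stages of (label, pattern). All labels except TEMPEX are
# hypothetical; the last stage matches brackets made by the earlier ones.
CASCADE = [
    ("WEEKDAY", r"\b(?:monday|tuesday|wednesday|thursday|friday)\b"),
    ("DAYNUM", r"\b(?:first|second|third|fourth|fifth)\b"),
    ("MONTH", r"\b(?:june|july|august)\b"),
    ("TEMPEX", r"\[WEEKDAY [^\]]*\]"
               r"(?: the \[DAYNUM [^\]]*\])?"
               r"(?: of \[MONTH [^\]]*\])?"),
]


def apply_cascade(text):
    """Run every stage in order, wrapping each match in a labelled bracket."""
    for label, pattern in CASCADE:
        text = re.sub(pattern, lambda m: f"[{label} {m.group(0)}]", text)
    return text


def split_chunks(bracketed):
    """Separate the top-level bracketed chunks from the remaining string."""
    chunks, rest, depth, start = [], [], 0, 0
    for i, ch in enumerate(bracketed):
        if ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                chunks.append(bracketed[start:i + 1])
        elif depth == 0:
            rest.append(ch)
    return chunks, " ".join("".join(rest).split())


result = apply_cascade("how about monday the fifth of july at my office")
print(result)
# how about [TEMPEX [WEEKDAY monday] the [DAYNUM fifth] of [MONTH july]] at my office
print(split_chunks(result))
```

Because the stages are ordered, the coarse TEMPEX pattern never has to know individual weekday or month names; it only refers to the labels produced earlier in the cascade.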

Dialogue act recognition achieves an accuracy of about 70% on unseen data and performs even better on the actual evaluation dialogues (such as those mentioned in section 2), since dialogues with the running system are often simpler than our sample data.

Content objects found by shallow extraction are also reliable, since they are based on very local linguistic evidence and do not depend on correct syntactic structures. The objects correspond to linguistic forms within the utterance. Inferences concerning co-reference and reference to real-world objects are drawn at a later stage.



Jan Alexandersson
Thu Nov 11 15:15:06 MET 1999