Next: Language Analysis Up: Language-Processing Strategies and Mixed-Initiative Previous: Introduction

System Overview

The architecture of the system is shown in Figure 1. The modules communicate asynchronously by message passing; hence, in principle all of them could run in parallel in different processes. In the current implementation, there are four processes, which handle speech recognition, speech synthesis, database access and everything else, respectively.
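The asynchronous message-passing arrangement can be illustrated with a minimal sketch. It assumes in-process queues and threads rather than the separate processes of the actual system, and the module names, handlers and shutdown sentinel are all hypothetical.

```python
import queue
import threading


def run_module(inbox, outbox, handle):
    """Consume messages from inbox, apply handle, and forward the results."""
    while True:
        msg = inbox.get()
        if msg is None:           # shutdown sentinel (an assumption of this sketch)
            outbox.put(None)      # propagate shutdown to the next module
            break
        outbox.put(handle(msg))


def run_pipeline(utterance):
    """Push one message through a hypothetical recognizer -> analyzer chain."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()

    def recognize(text):          # stand-in for speech recognition
        return text.lower().split()

    def analyze(words):           # stand-in for language analysis
        return {"tokens": words, "length": len(words)}

    workers = [
        threading.Thread(target=run_module, args=(q_in, q_mid, recognize)),
        threading.Thread(target=run_module, args=(q_mid, q_out, analyze)),
    ]
    for worker in workers:
        worker.start()

    q_in.put(utterance)
    q_in.put(None)

    results = []
    while True:
        item = q_out.get()
        if item is None:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results
```

Because each module only reads from its inbox and writes to its outbox, the modules do not need to know whether their neighbours run in the same process or in another one.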

The speech recognizer is a Swedish-language version of the SRI Decipher system [Murveit et al. 1993], developed by SRI International and Telia Research. It sends an N-best list of speech hypotheses to the two language processors: the Core Language Engine (deep analysis) and the Robust Parser (shallow analysis), both described further in the section on language analysis. Each language processor sends its analyses to the dialogue manager (DM). After each system turn, the DM updates the language processors with limited information about the state of the discourse: the most recent question (if any) posed by the system, and the types of objects that are salient at the current point in the dialogue.

The DM uses a two-stage heuristic selection process to advance the dialogue. First, each input analysis is categorized as a move of a certain type, and an appropriate response to that move is selected. References are resolved and contextual information is added at this stage, which further multiplies the number of possible moves and responses. Second, the relative utility of the candidate responses is judged, and the most productive response move is chosen. The dialogue manager is described further in a later section.
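The two-stage selection can be sketched as follows. This is only an illustration: the move names, the context key `last_question` and the utility values are invented for the example and are not taken from the system.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    source: str    # which processor produced it, e.g. "deep" or "shallow"
    content: dict  # slot values extracted from the utterance


@dataclass
class Response:
    move: str
    utility: float


def categorize(analysis, context):
    """Stage 1: classify the analysis as a move and pick a candidate response."""
    question = context.get("last_question")
    if question and question in analysis.content:
        # The user answered the system's most recent question.
        return Response(move="accept_answer", utility=2.0)
    if analysis.content:
        # The user volunteered information that was not asked for.
        return Response(move="accept_information", utility=1.0)
    # Nothing usable was extracted from this analysis.
    return Response(move="ask_repeat", utility=0.1)


def select_response(analyses, context):
    """Stage 2: choose the candidate response with the highest utility."""
    candidates = [categorize(a, context) for a in analyses]
    return max(candidates, key=lambda response: response.utility)
```

For instance, with `context = {"last_question": "destination"}`, an analysis containing a `destination` slot yields a candidate that outranks one produced from an empty analysis.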

The generator produces the surface string representing the actual utterance, using a simple template-based approach. The surface string is then turned into speech by Telia Research's synthesizer LIPHON.
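A template-based generator of this kind might look roughly like the sketch below. The template names and English wordings are hypothetical; the actual system generates Swedish utterances and its templates are not given here.

```python
# Hypothetical templates keyed by response move; slots are filled by name.
TEMPLATES = {
    "ask_slot": "What is your {slot}?",
    "confirm_trip": "You want to travel from {origin} to {destination}.",
}


def generate(move, **slots):
    """Return the surface string for a response move by filling its template."""
    return TEMPLATES[move].format(**slots)
```

The resulting string would then be handed to the speech synthesizer.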

In the current system, the database agent contains a web client that retrieves data from the Travellink database. All query results are cached to keep response times as short as possible. Even so, the response times for most queries would clearly not be acceptable in a commercial system. This led us to develop a version that can continue the dialogue while database access is in progress (for example, the system might ask about the return leg of a trip while the database agent is searching for possible trains or flights for the outbound leg).
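The combination of caching and continuing the dialogue during a lookup can be sketched as below. This is a minimal illustration, assuming a thread pool in place of the separate database process; `DatabaseAgent`, `slow_fetch` and the query format are hypothetical and do not reflect the Travellink interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class DatabaseAgent:
    """Runs queries in the background and caches one result handle per query."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, query):
        # A repeated query reuses the cached (possibly still pending) future.
        if query not in self._cache:
            self._cache[query] = self._pool.submit(self._fetch, query)
        return self._cache[query]

    def close(self):
        self._pool.shutdown()


def slow_fetch(query):
    """Stand-in for a slow web lookup."""
    time.sleep(0.05)
    return f"results for {query}"


def demo():
    agent = DatabaseAgent(slow_fetch)
    pending = agent.submit(("Stockholm", "Uppsala"))   # outbound search starts
    prompt = "When would you like to return?"          # dialogue continues meanwhile
    again = agent.submit(("Stockholm", "Uppsala"))     # served from the cache
    result = pending.result()                          # wait only when the data is needed
    agent.close()
    return prompt, result, again is pending
```

Caching the future rather than the finished result means that a query repeated while the first lookup is still running does not trigger a second request.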

The system described here is fully implemented and has been permanently installed at the Telia Vision Center in Farsta/Stockholm since November 1998.






Mats Wiren
Mon Oct 25 13:51:54 MET DST 1999