Composition of the Corpus
Audio
The VOICE Awards Corpus contains the audio of all 1970 dialogs as separate .wav files. Both system and user were recorded on a single channel, and the audio quality varies considerably.
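Since both speakers share a single channel, downstream tooling should verify that the files really are mono before processing. The following sketch uses Python's standard `wave` module; the directory path is a hypothetical placeholder, as the corpus's actual layout is not documented here.

```python
import wave
from pathlib import Path

# Hypothetical location of the corpus audio; adjust to the real layout.
AUDIO_DIR = Path("voice_awards/audio")

def wav_info(wav_path):
    """Return (channels, duration_in_seconds) for one .wav file."""
    with wave.open(str(wav_path), "rb") as w:
        channels = w.getnchannels()  # expected: 1 (mono, system + user mixed)
        duration = w.getnframes() / w.getframerate()
    return channels, duration

def check_corpus(audio_dir=AUDIO_DIR):
    """Flag any file that is not single-channel."""
    for wav_path in sorted(audio_dir.glob("*.wav")):
        channels, duration = wav_info(wav_path)
        status = "ok" if channels == 1 else "NOT MONO"
        print(f"{wav_path.name}: {channels} ch, {duration:.1f} s [{status}]")
```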
Transcription
The VOICE Awards Corpus was manually transcribed in a single pass using the Transcriber tool. For quality assurance, spell checking and other quality-control scripts were applied to the transcriptions.
During transcription, the dialogs were segmented into turns, encoding each turn's start and end times, the speaker, and any overlapping speech. For further processing, the resulting files were converted into NXT format using the script trs2nxt.prl.
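Transcriber writes XML (.trs) files in which each `Turn` element carries `speaker`, `startTime`, and `endTime` attributes, and a `speaker` attribute listing several ids marks overlapping speech. The sketch below extracts this turn segmentation with the standard library; it assumes the standard Transcriber DTD and is not the corpus's own conversion script (that role is filled by trs2nxt.prl).

```python
import xml.etree.ElementTree as ET

def read_turns(trs_source):
    """Extract the turn segmentation from a Transcriber .trs file.

    Assumes the standard Transcriber DTD: Turn elements with speaker,
    startTime, and endTime attributes; multiple speaker ids in one
    speaker attribute indicate overlapping speech.
    """
    root = ET.parse(trs_source).getroot()
    turns = []
    for turn in root.iter("Turn"):
        speakers = turn.get("speaker", "").split()
        # Collect all text inside the Turn, ignoring layout whitespace.
        text = " ".join(t.strip() for t in turn.itertext() if t.strip())
        turns.append({
            "speakers": speakers,
            "overlap": len(speakers) > 1,
            "start": float(turn.get("startTime")),
            "end": float(turn.get("endTime")),
            "text": text,
        })
    return turns
```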
Annotations
Using the NITE XML Toolkit, the VOICE Awards Corpus was hand-annotated in a single pass. The annotation levels are dialog acts, markers of miscommunication or errors, task success, and repetitions. Because the annotations are used to learn dialog and error strategies for a user simulation, only information that a running spoken dialog application could obtain in real time is marked up.
The table below gives an overview of the annotation options. A detailed description of each item can be found in the annotation schema (in German).
| Level | Category | Tags |
|---|---|---|
| Dialog act | social | hello, bye, thank, sorry |
| | request | open_question, request_info, yes_no_question, alternative_question |
| | confirmation | implicit_confirm, explicit_confirm |
| | metacommunication | instruction, repeat_please, request_instruction |
| | answer | provide_info, accept, reject |
| | other | noise, other |
| Success | success | task_success, subtask_complete |
| | failure | system/user_abort, abort_subtask, escalated, other_failure |
| Error | error | not_understand, misunderstand, state_error, no_input, bad_input |
| | other miscommunication | self_correct, system_command, other_error |
| Repetition | answer | repeat_answer |
| | prompt | repeat_prompt |
Additional potentially useful annotations are performed automatically. These include:
- Answer type for user input: sentence, phrase/fragment, keyword
- Dialog duration
- Turn duration
- Length of dialogs, turns, system requests and user answers
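Given the turn segmentation, the duration and length figures above can be derived mechanically. The sketch below shows one way to do so; the word-count thresholds used to separate keyword, phrase/fragment, and sentence answers are an illustrative assumption, not the corpus's documented classification rule.

```python
def answer_type(text):
    """Rough heuristic (an assumption, not the corpus's actual rule):
    classify a user answer by word count."""
    n = len(text.split())
    if n <= 1:
        return "keyword"
    if n <= 4:
        return "phrase/fragment"
    return "sentence"

def dialog_stats(turns):
    """Derive duration and length figures from turn dicts that carry
    'start'/'end' times (seconds) and the transcribed 'text'."""
    starts = [t["start"] for t in turns]
    ends = [t["end"] for t in turns]
    return {
        "dialog_duration": max(ends) - min(starts),      # seconds
        "turn_durations": [t["end"] - t["start"] for t in turns],
        "num_turns": len(turns),
        "turn_lengths_words": [len(t["text"].split()) for t in turns],
    }
```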
User Ratings
Also available in the corpus are user ratings from all novice users, obtained through a usability survey. The questionnaire covers comprehensibility/learnability, effectiveness, errors and error handling, "hear & feel", and overall acceptance of the dialog systems.