Composition of the Corpus
Audio
The VOICE Awards Corpus contains the audio of all 1970 dialogs as separate .wav files. Both system and user were recorded on a single channel, and the audio quality varies considerably.
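Since both speakers share a single channel, downstream tooling should verify that the files really are mono before processing. The following sketch uses Python's standard `wave` module; the directory path is a hypothetical placeholder, as the corpus's actual layout is not documented here.

```python
import wave
from pathlib import Path

# Hypothetical location of the corpus audio; adjust to the real layout.
AUDIO_DIR = Path("voice_awards/audio")

def wav_info(wav_path):
    """Return (channels, duration_in_seconds) for one .wav file."""
    with wave.open(str(wav_path), "rb") as w:
        channels = w.getnchannels()  # expected: 1 (mono, system + user mixed)
        duration = w.getnframes() / w.getframerate()
    return channels, duration

def check_corpus(audio_dir=AUDIO_DIR):
    """Flag any file that is not single-channel."""
    for wav_path in sorted(audio_dir.glob("*.wav")):
        channels, duration = wav_info(wav_path)
        status = "ok" if channels == 1 else "NOT MONO"
        print(f"{wav_path.name}: {channels} ch, {duration:.1f} s [{status}]")
```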
Transcription
The VOICE Awards Corpus was manually transcribed in a single pass using the Transcriber tool. For quality assurance, spell checking and other quality-control scripts were applied to the transcriptions.
During transcription, the dialogs were segmented into turns, encoding each turn's start and end times, the speaker, and any overlapping speech. For further processing, the resulting files were converted into NXT format using the script trs2nxt.prl.
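Transcriber writes XML (.trs) files in which each `Turn` element carries `speaker`, `startTime`, and `endTime` attributes, and a `speaker` attribute listing several ids marks overlapping speech. The sketch below extracts this turn segmentation with the standard library; it assumes the standard Transcriber DTD and is not the corpus's own conversion script (that role is filled by trs2nxt.prl).

```python
import xml.etree.ElementTree as ET

def read_turns(trs_source):
    """Extract the turn segmentation from a Transcriber .trs file.

    Assumes the standard Transcriber DTD: Turn elements with speaker,
    startTime, and endTime attributes; multiple speaker ids in one
    speaker attribute indicate overlapping speech.
    """
    root = ET.parse(trs_source).getroot()
    turns = []
    for turn in root.iter("Turn"):
        speakers = turn.get("speaker", "").split()
        # Collect all text inside the Turn, ignoring layout whitespace.
        text = " ".join(t.strip() for t in turn.itertext() if t.strip())
        turns.append({
            "speakers": speakers,
            "overlap": len(speakers) > 1,
            "start": float(turn.get("startTime")),
            "end": float(turn.get("endTime")),
            "text": text,
        })
    return turns
```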
Annotations
Using the NITE XML Toolkit, the VOICE Awards Corpus was hand-annotated in a single pass. The annotation levels are dialog acts, markers of miscommunication or errors, task success, and repetitions. Because the annotations are used to learn dialog and error strategies for a user simulation, only information that a running spoken dialog application could obtain in real time is marked up.
The table below gives an overview of the annotation options. A detailed description of each item can be found in the annotation schema (in German).
| Level | Category | Tags |
|---|---|---|
| Dialog act | social | hello, bye, thank, sorry |
| | request | open_question, request_info, yes_no_question, alternative_question |
| | confirmation | implicit_confirm, explicit_confirm |
| | metacommunication | instruction, repeat_please, request_instruction |
| | answer | provide_info, accept, reject |
| | other | noise, other |
| Success | success | task_success, subtask_complete |
| | failure | system/user_abort, abort_subtask, escalated, other_failure |
| Error | error | not_understand, misunderstand, state_error, no_input, bad_input |
| | other miscommunication | self_correct, system_command, other_error |
| Repetition | answer | repeat_answer |
| | prompt | repeat_prompt |
Additional potentially useful annotations are performed automatically. These include:
- Answer type for user input: sentence, phrase/fragment, keyword
- Dialog duration
- Turn duration
- Length of dialogs, turns, system requests and user answers
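Given the turn segmentation, the duration and length figures above can be derived mechanically. The sketch below shows one way to do so; the word-count thresholds used to separate keyword, phrase/fragment, and sentence answers are an illustrative assumption, not the corpus's documented classification rule.

```python
def answer_type(text):
    """Rough heuristic (an assumption, not the corpus's actual rule):
    classify a user answer by word count."""
    n = len(text.split())
    if n <= 1:
        return "keyword"
    if n <= 4:
        return "phrase/fragment"
    return "sentence"

def dialog_stats(turns):
    """Derive duration and length figures from turn dicts that carry
    'start'/'end' times (seconds) and the transcribed 'text'."""
    starts = [t["start"] for t in turns]
    ends = [t["end"] for t in turns]
    return {
        "dialog_duration": max(ends) - min(starts),      # seconds
        "turn_durations": [t["end"] - t["start"] for t in turns],
        "num_turns": len(turns),
        "turn_lengths_words": [len(t["text"].split()) for t in turns],
    }
```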
User Ratings
Also available in the corpus are user ratings from all novice users, obtained through a usability survey. The questionnaire covers comprehensibility/learnability, effectiveness, errors and error handling, "hear & feel", and overall acceptance of the dialog systems.