Callers utter items ranging from isolated digits, through word-spotting phrases and directory assistance words, to phonetically rich sentences. The database is primarily designed for telephony speech recognition applications and has been fully annotated at the word level, with non-speech events also marked. It has been rigorously validated and found to comply with the criteria laid down by the SpeechDat consortium.
SpeechDat Cymru was collected by the University of Wales Swansea Speech and Image Research Group under contract to BT Laboratories. The database will be made available through ELRA on a set of 10 CD-ROMs.
Background information about SpeechDat Cymru is available from the recruitment site; please note that database collection finished in late 1998. General information on SpeechDat is available from the SpeechDat project website. The design document for SpeechDat Cymru is available in Acrobat format (270 KB), RTF (280 KB) or compressed PostScript (145 KB).