SpeechDat Cymru

SpeechDat Cymru is a database of read Welsh, spoken by a demographically balanced collection of 2000 speakers over the public switched telephone network (PSTN). It is part of the SpeechDat(II) collection of 28 speech and speaker recognition databases over fixed and mobile networks for 21 European languages and dialects.

Callers utter items ranging from isolated digits, through word-spotting phrases and directory assistance words, to phonetically rich sentences. The database is primarily designed for telephony speech recognition applications, and has been fully annotated at the word level, with non-speech events also being marked. It has been rigorously validated and found to comply with the criteria laid down by the SpeechDat consortium.

SpeechDat Cymru was collected by the University of Wales Swansea Speech and Image Research Group under contract to BT Laboratories. The database will be made available through ELRA on a set of 10 CD-ROMs.

Background information about SpeechDat Cymru is available from the recruitment site; please note that database collection finished in late 1998. General information on SpeechDat is available from the SpeechDat projects website. The design document for SpeechDat Cymru is available in Acrobat format (270k), RTF (280k) or compressed PostScript (145k).

Sample files

For further information on SpeechDat Cymru, please contact Rhys Jones.



Site designed by MACHADO Lionel.