

13.7 Available lexicons

Festival currently supports a number of different lexicons. They are all defined in the file lib/lexicons.scm, each with a number of common extra words added to its addenda. They are:

CUVOALD
The Computer Users Version of Oxford Advanced Learner's Dictionary is available from the Oxford Text Archive, ftp://ota.ox.ac.uk/pub/ota/public/dicts/710. It contains about 70,000 entries and is a part of the BEEP lexicon. It is more consistent in its marking of stress, though its syllable marking is not what works best for our synthesis methods. Its many syllabic ‘l’s, ‘n’s, and ‘m’s upset the syllabification algorithm, sometimes making the results appear over-reduced. It is, however, our current default lexicon. It is also the only lexicon with part of speech tags that can be distributed (for non-commercial use).
CMU
This is automatically constructed from cmu_dict-0.4, which is available from many places on the net (see the comp.speech archives). It is not in the ‘mrpa’ phoneset because it gives American English pronunciations. Although mappings exist between its phoneset (‘darpa’) and ‘mrpa’, the results for British English speakers are not very good. However, this is probably the biggest, most carefully specified lexicon available. It contains just under 100,000 entries. Our distribution has been modified to include part of speech tags on words we know to be homographs.
mrpa
A version of the CSTR lexicon which has been floating about for years. It contains about 25,000 entries. A new updated free version of this is due to be released soon.
BEEP
A British English rival to cmu_lex. BEEP has been made available by Tony Robinson at Cambridge and is available in many archives. It contains 163,000 entries and has been converted to the ‘mrpa’ phoneset (a trivial mapping). Although large, it suffers from a certain randomness in its stress markings, which makes its use for synthesis dubious.

All of the above lexicons have some distribution restrictions (though mostly pretty light), but as they are mostly freely available, we provide programs that convert the originals into Festival's format.

The MOBY lexicon has recently been released into the public domain and will be converted into our format soon.