13.7 Available lexicons
Festival currently supports a number of different lexicons. They are
all defined in the file lib/lexicons.scm, each with a number of
common extra words added to its addenda. They are:
- ‘CUVOALD’
- The Computer Users Version of Oxford Advanced Learner's Dictionary is
available from the Oxford Text Archive
ftp://ota.ox.ac.uk/pub/ota/public/dicts/710. It contains about
70,000 entries and is a part of the BEEP lexicon. It is more consistent
in its marking of stress, though its syllable marking is not what works
best for our synthesis methods. The many syllabic ‘l’s, ‘n’s,
and ‘m’s upset the syllabification algorithm, sometimes making results
appear over-reduced. It is, however, our current default
lexicon. It is also the only lexicon with part-of-speech tags that
can be distributed (for non-commercial use).
- ‘CMU’
- This is automatically constructed from cmu_dict-0.4, available
from many places on the net (see the comp.speech archives). It is
not in the mrpa phone set because it gives American English pronunciations.
Although mappings exist between its phoneset (‘darpa’) and
‘mrpa’, the results for British English speakers are not very good.
However, this is probably the biggest, most carefully specified lexicon
available. It contains just under 100,000 entries. Our distribution
has been modified to include part-of-speech tags on words we know to be
homographs.
- ‘mrpa’
- A version of the CSTR lexicon which has been floating about for years.
It contains about 25,000 entries. A new updated free version of
this is due to be released soon.
- ‘BEEP’
- A British English rival for the cmu_lex. BEEP has been made
available by Tony Robinson at Cambridge and is available in many
archives. It contains 163,000 entries and has been converted to the
‘mrpa’ phoneset (which was a trivial mapping). Although large, it
suffers from a certain randomness in its stress markings, making use of
it for synthesis dubious.
All of the above lexicons have some distribution restrictions (though
mostly pretty light), but as they are mostly freely available we provide
programs that can convert the originals into Festival's format.
The MOBY lexicon has recently been released into the public domain and
will be converted into our format soon.
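As a rough illustration of how one of the lexicons above might be
selected and queried from the Festival command interpreter, the sketch
below uses lex.list, lex.select and lex.lookup. The lexicon name
"cmu" is an assumption: the names actually defined depend on
lib/lexicons.scm and on which lexicons are installed, so check the
output of lex.list first.

```scheme
;; List the names of the lexicons defined in this session.
(lex.list)

;; Select a lexicon by name. "cmu" is an assumed name; substitute
;; one of the names returned by (lex.list).
(lex.select "cmu")

;; Look up a word in the currently selected lexicon. The second
;; argument is a part-of-speech hint; nil means no preference.
(lex.lookup "hello" nil)
```

Selecting a lexicon whose phone set does not match the current voice
(for example ‘darpa’ entries with an ‘mrpa’ voice) is likely to give
poor results, for the reasons described under ‘CMU’ above.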