24.1 Current voices
Currently there are a number of voices available in Festival and we
expect that number to increase. Each is selected via a function of the
name ‘voice_*’ which sets up the waveform synthesizer, phone set,
lexicon, duration and intonation models (and anything else necessary)
for that speaker. These voice setup functions are defined in
lib/voices.scm.
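As a minimal sketch of how a voice is selected, the following could be typed at the Festival command prompt. It assumes that the voice_rab_diphone voice is installed; any other voice function listed in this section could be substituted.

```scheme
;; Select the British English voice, Roger (assumes this voice is installed).
(voice_rab_diphone)
;; Speak a test sentence with the voice that was just selected.
(SayText "Hello from Festival.")
```

Calling a different voice_* function afterwards switches the current voice, since each function sets up its own synthesizer, phone set, lexicon and prosody models.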
The current voice functions are:
voice_rab_diphone
- A British English male RP speaker, Roger. This uses the UniSyn residual
excited LPC diphone synthesizer. The lexicon is the computer users'
version of the Oxford Advanced Learner's Dictionary, with letter to sound
rules trained from that lexicon. Intonation is provided by a ToBI-like
system using a decision tree to predict accent and end tone position.
The F0 itself is predicted as three points on each syllable, using
linear regression trained from the Boston University FM Radio corpus (f2b)
and mapped to Roger's pitch range. Duration is predicted by a decision
tree, predicting zscore durations for segments, trained from the 460
TIMIT sentences spoken by another British male speaker.
voice_ked_diphone
- An American English male speaker, Kurt. Again this uses the UniSyn
residual excited LPC diphone synthesizer. This uses the CMU lexicon,
and letter to sound rules trained from it. Intonation, as with Roger, is
trained from the Boston University FM Radio corpus. Duration for this
voice also comes from that database.
voice_kal_diphone
- An American English male speaker. Again this uses the UniSyn residual
excited LPC diphone synthesizer. Like ked, it uses the CMU lexicon,
and letter to sound rules trained from it. Intonation, as with Roger, is
trained from the Boston University FM Radio corpus. Duration for this
voice also comes from that database. This voice was built in two days'
work and is at least as good as ked, because we understood the process
better. The diphone labels were autoaligned with hand correction.
voice_don_diphone
- Steve Isard's LPC based diphone synthesizer, Donovan diphones. The
other parts of this voice (lexicon, intonation, and duration) are the
same as voice_rab_diphone described above. The
quality of the diphones is not as good as the other voices because it
uses spike excited LPC. Although the quality is not as good, it
is much faster and the database is much smaller than the others.
voice_el_diphone
- A male Castilian Spanish speaker, using the Eduardo Lopez diphones.
Alistair Conkie and Borja Etxebarria did much to make this. It has
improved recently but is not as comprehensive as our English voices.
voice_gsw_diphone
- This offers a male RP speaker, Gordon, famed for many previous CSTR
synthesizers, using the standard diphone module. Its higher
levels are very similar to the Roger voice above. This voice
is not in the standard distribution, and is unlikely to be added
for commercial reasons, even though it sounds better than Roger.
voice_en1_mbrola
- The Roger diphone set, using the same front end as voice_rab_diphone
but with the MBROLA diphone synthesizer for waveform synthesis. The
MBROLA synthesizer and the Roger diphone database (called en1)
are not distributed by CSTR but are available free for non-commercial use
from http://tcts.fpms.ac.be/synthesis/mbrola.html.
We do however provide the Festival part of the voice in
festvox_en1.tar.gz.
voice_us1_mbrola
- A female American English voice using our standard US English front end
and the us1 database for the MBROLA diphone synthesizer for waveform
synthesis. The MBROLA synthesizer and the us1 diphone database
are not distributed by CSTR but are available free for
non-commercial use from
http://tcts.fpms.ac.be/synthesis/mbrola.html. We
provide the Festival part of the voice in festvox_us1.tar.gz.
voice_us2_mbrola
- A male American English voice using our standard US English front end
and the us2 database for the MBROLA diphone synthesizer for waveform
synthesis. The MBROLA synthesizer and the us2 diphone database
are not distributed by CSTR but are available free for
non-commercial use from
http://tcts.fpms.ac.be/synthesis/mbrola.html. We
provide the Festival part of the voice in festvox_us2.tar.gz.
voice_us3_mbrola
- Another male American English voice using our standard US English front
end and the us3 database for the MBROLA diphone synthesizer for
waveform synthesis. The MBROLA synthesizer and the us3 diphone
database are not distributed by CSTR but are available free for
non-commercial use from http://tcts.fpms.ac.be/synthesis/mbrola.html.
We provide the Festival part of the voice in festvox_us3.tar.gz.
Other voices will become available over time. Groups other than CSTR
are working on new voices. In particular, OGI's CSLU has released a
number of American English voices, two Mexican Spanish voices and two German
voices. All use OGI's own residual excited LPC
synthesizer, which is distributed as a plug-in for Festival
(see http://www.cse.ogi.edu/CSLU/research/TTS for
details).
Other languages are being worked on. Voices for German, Basque, Welsh,
Greek and Polish have already been developed and could be released soon.
CSTR has a set of Klingon diphones, though the text analysis for Klingon
still requires some work. (If anyone has access to a good Klingon
continuous speech corpus, please let us know.)
Pointers and examples of voices developed at CSTR and elsewhere will
be posted on the Festival home page.