A number of utterance types are currently supported. It is easy to add new ones but the standard distribution includes the following.
Text
(Utterance Text "This is an example")
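As a minimal sketch of how an utterance of any of these types is typically used, the utterance is first built, then synthesized, then played. This assumes an interactive Festival session with a voice already loaded; the variable name utt1 is arbitrary.

```scheme
; Build a Text utterance; nothing is synthesized at this point.
(set! utt1 (Utterance Text "This is an example"))
; Run the synthesis modules appropriate to the utterance type.
(utt.synth utt1)
; Send the resulting waveform to the audio device.
(utt.play utt1)
```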
Words
(Utterance Words (this is an example))
Words may be atoms, or lists if further features need to be specified. For example, to specify a word and its part of speech you can use:
(Utterance Words (I (live (pos v)) in (Reading (pos n) (tone H-H%))))
Note: the use of the tone feature requires an intonation mode that supports it.
Any feature and value named in the input will be added to the Word item.
Phrase
(Utterance Phrase ((Phrase ((name B)) I saw the man (in ((EMPH 1))) the park) (Phrase ((name BB)) with the telescope)))
ToBI tones and accents may also be specified on Tokens but these will
only take effect if the selected intonation method uses them.
Segments
(Utterance Segments ((# 0.19 ) (h 0.055 (0 115)) (@ 0.037 (0.018 136)) (l 0.064 ) (ou 0.208 (0.0 134) (0.100 135) (0.208 123)) (# 0.19)))
Note that the times are in seconds, NOT milliseconds. Each segment entry consists of the segment name, its duration in seconds, and a list of target values. Each target value is a pair: a position within the segment (in seconds) and an F0 value in Hz.
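The entry layout can be read off a shortened, annotated version of the example. This is only a sketch: it assumes the current voice uses a phone set containing these phone names, and the variable name utt2 is arbitrary.

```scheme
(set! utt2
  (Utterance Segments
    ((# 0.19)                          ; silence, 0.19 s, no F0 targets
     (h 0.055 (0 115))                 ; 0.055 s, one target: 115 Hz at 0 s
     (ou 0.208 (0.0 134) (0.208 123))  ; 0.208 s, targets at start and end
     (# 0.19))))                       ; closing silence
(utt.synth utt2)
(utt.play utt2)
```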
Phones
Synthesis uses fixed durations (specified in FP_duration, default 100 ms) and monotone intonation (specified in FP_F0, default 120 Hz).
This may be used for simple checks for waveform synthesizers etc.
(Utterance Phones (# h @ l ou #))
Note that the function SayPhones allows synthesis and playing of lists of phones through this utterance type.
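A short sketch of changing the fixed prosody before calling SayPhones follows. It assumes that FP_duration and FP_F0 are ordinary global variables that can be changed with set!, and that the current voice's phone set contains these phone names; the values 150 and 100 are arbitrary.

```scheme
(set! FP_duration 150)        ; fixed duration for every phone, in ms
(set! FP_F0 100)              ; monotone F0, in Hz
; SayPhones takes a quoted list of phone names, then synthesizes and plays it.
(SayPhones '(# h @ l ou #))
```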
Wave
(Utterance Wave fred.wav)
Two further types are Tokens, used in TTS, and SegF0, used by utt.resynth.