When looking up a word, either through the C++ interface, or
Lisp interface, a word is identified by its headword and part of
speech. If no part of speech is specified, nil
is assumed, which matches any part of speech tag.
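For example, from the Scheme interpreter a lookup might look like this (the entry returned depends on the currently selected lexicon):

```scheme
;; Look up "walk" as a verb in the currently selected lexicon
(lex.lookup "walk" 'v)
;; Look up "walk" with no part of speech restriction
(lex.lookup "walk" nil)
```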
The lexicon lookup process first checks the addenda; if there is
a full match (head word plus part of speech), that entry is returned. If
there is an addenda entry whose head word matches and whose part
of speech is nil,
that entry is returned.
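Addenda entries are added with lex.add.entry. The pronunciations below are illustrative sketches assuming a cmu-style phone set:

```scheme
;; Add an entry to the addenda of the current lexicon
;; (pronunciation is a sketch assuming a cmu-style phone set)
(lex.add.entry
 '("festival" n (((f eh s) 1) ((t ih) 0) ((v ax l) 0))))
;; An entry with part of speech nil matches any requested tag
(lex.add.entry
 '("fringe" nil (((f r ih n jh) 1))))
```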
If no match is found in the addenda, the compiled lexicon, if present,
is checked. Again a match requires both head word and part of speech tag
to match, or else either the word being searched for or the entry has a
part of speech of nil. Unlike the addenda,
if no full head word and part of speech tag match is found, the first
word in the lexicon whose head word matches is returned. The rationale
is that the letter to sound rules (the next line of defence) are unlikely to
do better than an alternate pronunciation of the word under a
different part of speech. This is all the more true because, if there is an
entry with the right head word but a different part of speech, the word may
well have an unusual pronunciation that the letter to sound rules have no
chance of producing.
Finally, if the word is not found in the compiled lexicon, it is passed to whatever method is defined for unknown words. This is most likely a letter to sound module. See Letter to sound rules.
Optional pre- and post-lookup hooks can be specified for a lexicon, as a single Lisp function or a list of them. The pre-hooks will be called with two arguments (word and features) and should return a pair (word and features). The post-hooks will be given a lexical entry and should return a lexical entry. The pre- and post-hooks do nothing by default.
Compiled lexicons may be created from lists of lexical entries. A compiled lexicon is much more efficient for lookup than the addenda: compiled lexicons use a binary search while the addenda is searched linearly. Also, it would take a prohibitively long time to load a typical full lexicon as addenda. If you have more than a few hundred entries in your addenda you should seriously consider adding them to your compiled lexicon.
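Compilation is done with lex.compile, which takes a file of lexical entries and writes the compiled lexicon to a second file. The file names here are hypothetical:

```scheme
;; Compile a file of lexical entries (one Lisp entry per line)
;; into a compiled lexicon file; file names are hypothetical
(lex.compile "mylex_entries.scm" "mylex.out")
```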
Because many publicly available lexicons do not have syllable markings for entries, the compilation method supports automatic syllabification. Thus for lexicon entries being compiled, two forms of the pronunciation field are supported: the standard fully syllabified and stressed form, and a simpler linear form found in at least the BEEP and CMU lexicons. If the pronunciation field is a flat atomic list, it is assumed syllabification is required.
Syllabification is done by finding the minimum sonorant position between vowels. It is not guaranteed to be accurate but does give a solution that is sufficient for many purposes. A little work would probably improve this significantly. Of course syllabification requires the entry's phones to be in the current phone set. The sonorant values are calculated from the vc, ctype, and cvox features for the current phoneset. See src/arch/festival/Phone.cc:ph_sonority() for actual definition.
Additionally, in this flat structure vowels (atoms starting with a, e, i, o or u) may have 1, 2 or 0 appended, marking stress. This again follows the form found in the BEEP and CMU lexicons.
Some example entries in the flat form (taken from BEEP) are

("table" nil (t ei1 b l))
("suspicious" nil (s @ s p i1 sh @ s))
Also, if syllabification is required, there is an opportunity to run a set
of "letter-to-sound" rules on the input (actually an arbitrary re-write
rule system). If the variable lex_lts_set
is set, the LTS
ruleset of that name is applied to the flat input before
syllabification. This allows simple predictable changes, such as
conversion of final r into a longer vowel, for English RP from
American-labelled lexicons.
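As a sketch, such a rule set might be defined and selected as follows. The rule set name, the symbol set and the single rule are illustrative only (a working RP conversion would need rules mapping vowel-plus-r sequences to the appropriate long vowels):

```scheme
;; Illustrative rewrite rule set (not a real RP conversion):
;; delete a word-final r after a vowel
(lts.ruleset
 american_to_rp
 ;; a named set of vowel symbols used in rule contexts
 ((V aa ae ah ao aw ax ay eh er ey ih iy ow oy uh uw))
 ;; rules are ( LEFTCONTEXT [ ITEMS ] RIGHTCONTEXT = NEWITEMS )
 (( V [ r ] # = )))
;; apply this rule set to flat entries before syllabification
(set! lex_lts_set 'american_to_rp)
```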
A list of all matching entries in the addenda and the compiled lexicon
may be found with the function lex.lookup_all. This function takes
a word and returns all matching entries irrespective of part of speech.
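For example (the word is arbitrary, and the entries returned depend on the current lexicon):

```scheme
;; Return every matching entry for "lives", e.g. both the
;; noun and verb pronunciations if the lexicon has both
(lex.lookup_all "lives")
```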
You can optionally intercept words as they are looked up, and after
they have been found, through pre_hooks
and post_hooks
for
each lexicon. These allow a function or list of functions to be applied
to a word and its features before lookup, or to the resulting entry after
lookup. The following example shows how to add voice-specific entries
to a general lexicon without affecting other voices that use that
lexicon.
For example, suppose we were trying to use a Scottish English voice with the US English (cmu) lexicon. A number of entries will be inappropriate, but we can redefine some entries thus:
(set! cmu_us_awb::lexicon_addenda
      '(
        ("edinburgh" n (((eh d) 1) ((ax n) 0) ((b r ax) 0)))
        ("poem" n (((p ow) 1) ((y ax m) 0)))
        ("usual" n (((y uw) 1) ((zh ax l) 0)))
        ("air" n (((ey r) 1)))
        ("hair" n (((hh ey r) 1)))
        ("fair" n (((f ey r) 1)))
        ("chair" n (((ch ey r) 1)))))
We can then define a function that checks to see if the word looked up is in the speaker specific exception list and use that entry instead.
(define (cmu_us_awb::cmu_lookup_post entry)
  "(cmu_us_awb::cmu_lookup_post entry)
Speaker specific lexicon addenda."
  (let ((ne (assoc_string (car entry) cmu_us_awb::lexicon_addenda)))
    (if ne
        ne
        entry)))
And then, for the particular voice set-up, we need to add both a selection part and a reset part, following the FestVox conventions for voice set-up.
(define (cmu_us_awb::select_lexicon)
  ...
  (lex.select "cmu")
  ;; Get old hooks, both for reset and to append our function to them
  (set! cmu_us_awb::old_cmu_post_hooks (lex.set.post_hooks nil))
  (lex.set.post_hooks
   (append cmu_us_awb::old_cmu_post_hooks
           (list cmu_us_awb::cmu_lookup_post)))
  ...
)
...
(define (cmu_us_awb::reset_lexicon)
  ...
  ;; reset CMU's post_hooks back to the original
  (lex.set.post_hooks cmu_us_awb::old_cmu_post_hooks)
  ...
)
The above isn't the most efficient method, as the word is looked up in the full lexicon first and only then checked against the speaker-specific list.
The pre_hooks
functions are called with two arguments, the
word and its features; they should return a pair of word and features.
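A minimal pre-hook might look like this (the function name is illustrative; downcase is a built-in Festival Scheme function):

```scheme
;; Sketch of a pre-hook: fold the word to lower case before
;; lookup, leaving the features untouched
(define (my_downcase_pre_hook word feats)
  (list (downcase word) feats))
;; install it on the currently selected lexicon
(lex.set.pre_hooks (list my_downcase_pre_hook))
```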