Post-lexical rules - Festival Speech Synthesis System


13.8 Post-lexical rules

It is the lexicon's job to produce a pronunciation of a given word. However, in most languages the most natural pronunciation of a word cannot be determined in isolation from the context in which it is spoken. Context-dependent phenomena include reduction, phrase-final devoicing and r-insertion. In Festival these are handled by post-lexical rules.

PostLex is a module that runs after accent assignment but before duration and F0 generation. This ordering is needed because knowledge of accent position is necessary for vowel reduction and other post-lexical phenomena, and because changing the segmental items affects durations.

PostLex first applies a set of built-in rules (which could be done in Scheme but for historical reasons are still in C++). It then applies the functions set in the hook postlex_rules_hook. Each of these functions should take an utterance and apply the appropriate rules. The hook should be set up on a per-voice basis.
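As a rough illustration of per-voice setup, the fragment below sets the hook inside a voice definition function. The function name voice_my_example and its return value are hypothetical, and the surrounding voice setup is elided. Only postlex_rules_hook and postlex_apos_s_check come from this section, and it is an assumption here that the hook is given as a list of functions.

```scheme
;; Minimal sketch of a voice definition fragment.
;; voice_my_example is a made-up name used only for illustration.
(define (voice_my_example)
  "(voice_my_example)
Illustrative fragment showing only the post-lexical rule setup."
  ;; ... phoneset, lexicon and other module setup would go here ...

  ;; Assumption: the hook holds a list of functions, each of which
  ;; takes an utterance; they run after the built-in C++ rules.
  (set! postlex_rules_hook (list postlex_apos_s_check))
  'my_example)
```

Because the hook is set when the voice is selected, a voice for another language can install a different list of functions, or none at all.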

Although a rule system could be devised for post-lexical sound rules, it is unclear what their scope should be, so we have left it completely open. Our vowel reduction model uses a CART decision tree to predict which syllables should be reduced, while the "'s" rule is very simple (shown in festival/lib/postlex.scm).

The 's in English may be pronounced in a number of different ways depending on the preceding context. If the preceding consonant is a fricative or affricate, and its place of articulation is not dental, labio-dental or glottal (the place values `d`, `b` and `g` tested in the rule's code), a schwa is required (e.g. bench's); otherwise no schwa is required (e.g. John's). In addition, if the previous phoneme is unvoiced the "s" is rendered as an "s", while in all other cases it is rendered as a "z".
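The decision described above can be sketched as a small stand-alone function, independent of any utterance structure. The helper name apos-s-phones is hypothetical and not part of Festival. The feature values (`f`/`a` for fricative/affricate, `d`/`b`/`g` for the excluded places, `-` for unvoiced) follow the values tested in postlex_apos_s_check, and the phone name "ax" for schwa is an assumption that depends on the phoneset in use.

```scheme
;; Hypothetical helper, not part of Festival: given the features of the
;; phone that precedes "'s", return the phones "'s" should surface as.
;;   prev-ctype:  consonant type  (f = fricative, a = affricate)
;;   prev-cplace: place           (d, b, g are the excluded places)
;;   prev-cvox:   voicing         (- = unvoiced, anything else = voiced)
(define (apos-s-phones prev-ctype prev-cplace prev-cvox)
  (if (and (memq prev-ctype '(f a))
           (not (memq prev-cplace '(d b g))))
      ;; Schwa is kept; the consonant then follows a vowel, so it
      ;; stays voiced.  "ax" as the schwa name is an assumption.
      (list "ax" "z")
      ;; Schwa is dropped; voicing follows the preceding phone.
      (if (eq? prev-cvox '-)
          (list "s")
          (list "z"))))

;; Example calls (feature values for each phone are illustrative):
(apos-s-phones 'a 'p '-)   ; affricate, as in "bench's"
(apos-s-phones 'n 'a '+)   ; voiced nasal, as in "John's"
(apos-s-phones 's 'a '-)   ; unvoiced stop, as in "Pat's"
```

The real rule works on Segment items in place rather than returning a list, but the branching is the same: the schwa test comes first, and the voicing test only matters once the schwa has been removed.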

For our English voices we have a lexical entry for "'s" as a schwa followed by a "z". We use a post-lexical rule function called postlex_apos_s_check to modify this basic form when required. After lexical lookup the Segment relation contains the concatenation of segments taken directly from the lexicon. Post-lexical rules are applied after that.

In the following rule we check each segment to see if it is part of a word labelled "'s". If so, we check whether we are currently looking at the schwa or the z part, and test whether modification is required:

     (define (postlex_apos_s_check utt)
       "(postlex_apos_s_check UTT)
     Deal with possessive s for English (American and British).  Delete
     schwa of 's if previous is not a fricative or affricate, and
     change voiced to unvoiced s if previous is not voiced."
       (mapcar
        (lambda (seg)
          (if (string-equal "'s" (item.feat
                                  seg "R:SylStructure.parent.parent.name"))
              (if (string-equal "a" (item.feat seg 'ph_vlng))
                  (if (and (member_string (item.feat seg 'p.ph_ctype)
                                          '(f a))
                           (not (member_string
                                 (item.feat seg "p.ph_cplace")
                                 '(d b g))))
                      t;; don't delete schwa
                      (item.delete seg))
                  (if (string-equal "-" (item.feat seg "p.ph_cvox"))
                      (item.set_name seg "s")))));; from "z"
        (utt.relation.items utt 'Segment))
       utt)