

25.4 Viterbi decoder

Another common tool is a Viterbi decoder. This C++ class is defined in the speech tools library in speech_tools/include/EST_viterbi.h and speech_tools/stats/EST_viterbi.cc. A Viterbi decoder requires two functions at declaration time: the first constructs candidates at each stage, while the second combines paths. A number of options are available (which may change).

The prototypical example of its use is the part of speech tagger, which uses standard Ngram models to predict probabilities of tags. See src/modules/base/pos.cc for an example.

The Viterbi decoder can also be used through the Scheme function Gen_Viterbi. This function respects the parameters defined in the variable get_vit_params. As with other modules, this parameter list is an assoc list of feature name and value. The supported parameters are:

Relation
     The name of the relation the decoder is to be applied to.
cand_function
     A function that is called for each item and returns a list of
     candidates (with probabilities).
return_feat
     The name of the feature in which the best candidate is returned
     for each item in the named relation.
p_word
     The word previous to the first item in the named relation (only
     used when ngrams are the "language model").
pp_word
     The word previous to p_word, i.e. two before the first item in
     the named relation (only used when ngrams are the "language
     model").
ngramname
     The name of an ngram (loaded by ngram.load) to be used as a
     "language model".
wfstmname
     The name of a WFST (loaded by wfst.load) to be used as a
     "language model"; this is ignored if an ngramname is also
     specified.
debug
     If specified, more debug features are added to the items in the
     relation.
gscale_p
     Grammar scaling factor.

Here is a short example to help make the use of this facility clearer.

There are two parts required for the Viterbi decoder: a set of candidate observations and some "language model". For the math to work properly, the candidate observations must be reverse probabilities (for each candidate, the probability of the observation given the candidate, rather than the probability of the candidate given the observation). These can be calculated as the probability of the candidate given the observation divided by the probability of the candidate in isolation.
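When the probabilities are kept as log probabilities (as in the candidate lists below), this division becomes a simple subtraction; the constant log probability of the observation itself is shared by all candidates and can be dropped. A minimal sketch of the calculation (the helper reverse_log_prob is hypothetical, not part of Festival):

     (define (reverse_log_prob log_p_cand_given_obs log_p_cand)
      ;; log P(observation|candidate) is, up to a constant shared by
      ;; all candidates, log P(candidate|observation) - log P(candidate)
      (- log_p_cand_given_obs log_p_cand))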

For the sake of simplicity, let us assume we have a lexicon mapping words to distributions of part of speech tags with reverse probabilities, and a tri-gram model called pos-tri-gram over sequences of part of speech tags. First we must define the candidate function

     (define (pos_cand_function w)
      ;; select the appropriate lexicon
      (lex.select 'pos_lex)
      ;; return the list of cands with rprobs
      (cadr
       (lex.lookup (item.name w) nil)))
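For instance, given an utterance utt whose Word relation has already been built, this function could be called directly on the first word item (a usage sketch only):

     (pos_cand_function (utt.relation.first utt 'Word))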

The returned candidate list would look something like

     ( (jj -9.872) (vbd -6.284) (vbn -5.565) )
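Such entries could be built by hand in a purpose-made lexicon. The following is a minimal sketch, assuming pos_lex has been created (for example with lex.create) and that the part of speech field of each entry is used to hold the candidate tags with their reverse log probabilities; the word "walked" and its values are purely illustrative.

     (lex.select 'pos_lex)
     ;; entry format is (WORD POS SYLLABLES); here the POS field holds
     ;; the candidate list and the syllable list is left empty
     (lex.add.entry
      '("walked" ((jj -9.872) (vbd -6.284) (vbn -5.565)) ()))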

Our part of speech tagger function would look something like this:

     (define (pos_tagger utt)
       (set! get_vit_params
             (list
              (list 'Relation "Word")
              (list 'return_feat 'pos_tag)
              (list 'p_word "punc")
              (list 'pp_word "nn")
              (list 'ngramname "pos-tri-gram")
              (list 'cand_function 'pos_cand_function)))
       (Gen_Viterbi utt)
       utt)

This will assign the optimal part of speech tag to each word in utt, placing it in the pos_tag feature.
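The assigned tags can then be read back from the pos_tag feature of each word; the helper below is only a usage sketch, not part of Festival.

     (define (show_pos_tags utt)
      ;; print each word together with the tag assigned by pos_tagger
      (mapcar
       (lambda (w)
        (format t "%s/%s " (item.name w) (item.feat w "pos_tag")))
       (utt.relation.items utt 'Word))
      (format t "\n")
      utt)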