Another common tool is a Viterbi decoder. This C++ class is defined in the speech tools library, in speech_tools/include/EST_viterbi.h and speech_tools/stats/EST_viterbi.cc. A Viterbi decoder requires two functions at declaration time: the first constructs candidates at each stage, while the second combines paths. A number of options are available (which may change).
The prototypical example of use is in the part of speech tagger, which uses standard Ngram models to predict probabilities of tags. See src/modules/base/pos.cc for an example.
The Viterbi decoder can also be used through the Scheme function Gen_Viterbi. This function respects the parameters defined in the variable get_vit_params. Like other modules, this parameter list is an assoc list of feature names and values. The parameters supported are:
Relation
cand_function
return_feat
p_word
pp_word
ngramname
The name of an ngram (loaded by ngram.load) to be used as a "language model".
wfstmname
The name of a WFST (loaded by wfst.load) to be used as a "language model"; this is ignored if an ngramname is also specified.
debug
gscale_p
There are two parts required for the Viterbi decode: a set of candidate observations and some "language model". For the math to work properly, the candidate observations must be reverse probabilities (for each candidate, the probability of the observation given the candidate, rather than the probability of the candidate given the observation). These can be calculated from the probability of the candidate given the observation divided by the probability of the candidate in isolation.
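The division described above follows from Bayes' rule. As a short sketch, writing c for a candidate and o for the observation (these symbols are introduced here for illustration and do not appear in the source):

```latex
P(o \mid c) \;=\; \frac{P(c \mid o)\, P(o)}{P(c)}
\;\propto\; \frac{P(c \mid o)}{P(c)}
\qquad\Longrightarrow\qquad
\log P(o \mid c) \;=\; \log P(c \mid o) \;-\; \log P(c) \;+\; \text{const}
```

The term P(o) is the same for every candidate of a given observation, so dropping it does not change which path scores best. The negative values in candidate lists such as the one shown later suggest that scores are held as log probabilities, in which case the division becomes a subtraction; that reading is an assumption rather than something stated here.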
For the sake of simplicity, let us assume we have a lexicon mapping words to distributions of part of speech tags with reverse probabilities, and a tri-gram called pos-tri-gram over sequences of part of speech tags. First we must define the candidate function:
    (define (pos_cand_function w)
      ;; select the appropriate lexicon
      (lex.select 'pos_lex)
      ;; return the list of cands with rprobs
      (cadr (lex.lookup (item.name w) nil)))
The returned candidate list would look something like
( (jj -9.872) (vbd -6.284) (vbn -5.565) )
Our part of speech tagger function would look something like this
    (define (pos_tagger utt)
      (set! get_vit_params
            (list
             (list 'Relation "Word")
             (list 'return_feat 'pos_tag)
             (list 'p_word "punc")
             (list 'pp_word "nn")
             (list 'ngramname "pos-tri-gram")
             (list 'cand_function 'pos_cand_function)))
      (Gen_Viterbi utt)
      utt)
This will assign the optimal part of speech tags to each word in utt.
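To inspect the result, the chosen tags can be read back from the items in the Word relation. The following is a minimal sketch, not part of Festival: the helper name show_pos_tags is hypothetical, and it assumes that pos_tagger and pos_cand_function are defined as above, that the pos_lex lexicon and the pos-tri-gram ngram have already been loaded, that utt already has a Word relation, and that the decoder stores its choice in the feature named by return_feat (here pos_tag).

```scheme
;; Hypothetical helper: run the tagger, then collect (word tag) pairs.
;; Assumes utt already has a populated Word relation.
(define (show_pos_tags utt)
  ;; Run the Viterbi-based tagger defined above
  (pos_tagger utt)
  ;; Pair each word's name with the feature named by return_feat
  (mapcar
   (lambda (w)
     (list (item.name w) (item.feat w "pos_tag")))
   (utt.relation.items utt 'Word)))
```

Calling (show_pos_tags utt) on such an utterance should return one (word tag) pair per item in the Word relation, which is a convenient way to check the tagger's output at the interpreter prompt.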