25.3 Ngrams

Bigrams, trigrams, and general ngrams are used in the part of speech tagger and the phrase break predictor. An Ngram C++ class is defined in the speech tools library, and some simple facilities are added within Festival itself.

Ngrams may be built from files of tokens using the program ngram_build, which is part of the speech tools. See the speech tools documentation for details.
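
As a rough illustration of what building an ngram from a sequence of tokens involves, the following minimal sketch counts ngrams and derives a simple bigram probability. It is not the speech tools Ngram class and not how ngram_build is implemented; the names Tokens, NgramCounts, count_ngrams and bigram_probability are hypothetical and exist only for this example, and the probability is a plain maximum-likelihood estimate with no smoothing or backoff.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper types for this sketch; not part of the speech tools API.
using Tokens = std::vector<std::string>;
using NgramCounts = std::map<Tokens, int>;

// Count every run of n consecutive tokens in the input sequence.
NgramCounts count_ngrams(const Tokens& tokens, std::size_t n) {
    assert(n > 0);
    NgramCounts counts;
    if (tokens.size() < n) return counts;  // too short to hold any ngram
    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        Tokens gram(tokens.begin() + i, tokens.begin() + i + n);
        ++counts[gram];
    }
    return counts;
}

// Maximum-likelihood estimate of P(next | prev) from bigram counts:
// count(prev, next) divided by the count of all bigrams starting with prev.
// Returns 0.0 when prev was never seen as a context.
double bigram_probability(const NgramCounts& bigrams,
                          const std::string& prev,
                          const std::string& next) {
    int pair_count = 0;
    int context_count = 0;
    for (const auto& entry : bigrams) {
        if (entry.first.size() != 2 || entry.first[0] != prev) continue;
        context_count += entry.second;
        if (entry.first[1] == next) pair_count = entry.second;
    }
    if (context_count == 0) return 0.0;
    return static_cast<double>(pair_count) / context_count;
}
```

For instance, given the part of speech tag sequence dt nn vb dt nn in, count_ngrams with n set to 2 records the bigram (dt, nn) twice, and bigram_probability then estimates the probability of nn following dt from those counts. A tagger or phrase break predictor would consult probabilities of this kind, normally with smoothing applied.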

Within Festival, ngrams may be named, loaded from files, and used when required. The LISP function load_ngram takes a name and a filename as arguments and loads the Ngram from that file. For examples of how an Ngram is used once loaded, see src/modules/base/pos.cc or src/modules/base/phrasify.cc.