Building models - Festival Speech Synthesis System

26.3 Building models

This section describes how to build models from data extracted from databases as described in the previous section. It uses the CART building program, wagon, which is available in the speech tools distribution. The data is, however, also suitable for many other types of model building techniques, such as linear regression or neural networks.

Wagon is described in the speech tools manual, though we will cover simple use here. To use Wagon you need a datafile and a data description file.

A datafile consists of a number of vectors, one per line, each containing the same number of fields. This, not coincidentally, is exactly the format produced by dumpfeats, described in the previous section. The data description file describes the fields in the datafile and their range. Fields may be of any of the following types: class (a list of symbols), float, or ignored. Wagon will build a classification tree if the first field (the predictee) is of type class, or a regression tree if the first field is a float. An example data description file would be

     (
     ( duration float )
     ( name # @ @@ a aa ai au b ch d dh e e@ ei f g h i i@ ii jh k l m n
         ng o oi oo ou p r s sh t th u u@ uh uu v w y z zh )
     ( n.name # @ @@ a aa ai au b ch d dh e e@ ei f g h i i@ ii jh k l m n
         ng o oi oo ou p r s sh t th u u@ uh uu v w y z zh )
     ( p.name # @ @@ a aa ai au b ch d dh e e@ ei f g h i i@ ii jh k l m n
         ng o oi oo ou p r s sh t th u u@ uh uu v w y z zh )
     ( R:SylStructure.parent.position_type 0 final initial mid single )
     ( pos_in_syl float )
     ( syl_initial 0 1 )
     ( syl_final 0 1 )
     ( R:SylStructure.parent.R:Syllable.p.syl_break 0 1 3 )
     ( R:SylStructure.parent.syl_break 0 1 3 4 )
     ( R:SylStructure.parent.R:Syllable.n.syl_break 0 1 3 4 )
     ( R:SylStructure.parent.R:Syllable.p.stress 0 1 )
     ( R:SylStructure.parent.stress 0 1 )
     ( R:SylStructure.parent.R:Syllable.n.stress 0 1 )
     )

The script speech_tools/bin/make_wagon_desc goes some way to helping. Given a datafile and a file containing the field names, it will construct an approximation of the description file. This file should still be edited, as make_wagon_desc treats all fields as of type class and you may want to change some of them to float.
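
To illustrate what such an approximate description looks like, here is a minimal Python sketch. It is not the make_wagon_desc script itself: the function make_desc and its float_fields parameter are hypothetical names introduced for this example. The sketch lists every symbol seen in each column as a class field, and writes the fields you name in float_fields as float, which mimics the manual edit described above.

```python
def make_desc(rows, names, float_fields=()):
    """Build a Wagon-style description string (hypothetical helper).

    rows         -- list of vectors, each a list of string field values
    names        -- one field name per column, predictee first
    float_fields -- names to declare as float instead of class
    """
    if any(len(row) != len(names) for row in rows):
        raise ValueError("every vector must have one value per field name")
    lines = ["("]
    for i, name in enumerate(names):
        if name in float_fields:
            lines.append("( %s float )" % name)
        else:
            # Class field: list every symbol seen in this column.
            values = sorted({row[i] for row in rows})
            lines.append("( %s %s )" % (name, " ".join(values)))
    lines.append(")")
    return "\n".join(lines)


# Three made-up vectors in the one-per-line datafile format.
rows = [line.split() for line in [
    "0.12 a 1",
    "0.08 b 0",
    "0.10 a 0",
]]
desc = make_desc(rows, ["duration", "name", "syl_final"],
                 float_fields={"duration"})
print(desc)
```

Note that a class field built this way only lists values present in the data it was given, so symbols that occur only in held-out data would have to be added by hand.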

The data file must be a single file, although the process described in the previous section created a number of feature files. From a list of file ids select, say, 80% of them as training data and cat them into a single datafile. The remaining 20% may be catted together as test data.
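
The split and concatenation can be sketched in Python as follows. This is only one way to do it, under stated assumptions: the file ids, the .feats extension, and the helper names split_ids and cat_files are all made up for the example, and the split simply takes the first 80% of the list rather than a random selection.

```python
import os
import tempfile


def split_ids(file_ids, train_fraction=0.8):
    """Return (train_ids, test_ids), keeping the original order."""
    cut = int(len(file_ids) * train_fraction)
    return file_ids[:cut], file_ids[cut:]


def cat_files(paths, out_path):
    """Concatenate feature files into a single datafile, like cat."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                out.write(f.read())


def count_lines(path):
    with open(path) as f:
        return len(f.readlines())


with tempfile.TemporaryDirectory() as d:
    # Create ten hypothetical one-vector feature files to work on.
    ids = ["utt%02d" % i for i in range(10)]
    for fid in ids:
        with open(os.path.join(d, fid + ".feats"), "w") as f:
            f.write("0.1 a 1\n")

    train_ids, test_ids = split_ids(ids)
    train_file = os.path.join(d, "train.data")
    test_file = os.path.join(d, "test.data")
    cat_files([os.path.join(d, i + ".feats") for i in train_ids], train_file)
    cat_files([os.path.join(d, i + ".feats") for i in test_ids], test_file)

    n_train = count_lines(train_file)
    n_test = count_lines(test_file)

print(n_train, n_test)
```

If the file ids are ordered by speaker or text type, shuffling the list before splitting may give a more representative test set.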

To build a tree use a command like

     wagon -desc DESCFILE -data TRAINFILE -test TESTFILE

The minimum cluster size (default 50) may be reduced using the command line option -stop plus a number.

Varying the features and stop size may improve the results.
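
As a rough sketch of how one might try several stop sizes, the following Python builds one wagon command line per value. It uses only the options shown in this section (-desc, -data, -test and -stop); the helper name wagon_command and the particular stop values are assumptions made for the example, and the commands are printed rather than executed.

```python
def wagon_command(desc, train, test, stop=None):
    """Return a wagon command line as an argument list (hypothetical helper)."""
    cmd = ["wagon", "-desc", desc, "-data", train, "-test", test]
    if stop is not None:
        # Override the default minimum cluster size.
        cmd += ["-stop", str(stop)]
    return cmd


# Try the default of 50 and two smaller minimum cluster sizes.
commands = [wagon_command("DESCFILE", "TRAINFILE", "TESTFILE", stop)
            for stop in (50, 20, 10)]
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run once wagon is on the path, and the reported test figures compared across runs.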

Building the models and getting good figures is only one part of the process. You must integrate this model into Festival if it is going to be of any use. In the case of CART trees generated by Wagon, Festival supports these directly. In the case of CART trees predicting zscores, or factors to modify duration averages, the trees can be used as is.

Note there are other options to Wagon which may help build better CART models. Consult the chapter in the speech tools manual on Wagon for more information.

Other parts of the distributed system use CART trees and linear regression models that were trained using the processes described in this chapter. Some other parts of the distributed system use CART trees which were written by hand and may be improved by properly applying these processes.