13.2 Defining lexicons

As stated above, lexicons consist of three basic parts (compiled form, addenda and unknown word method) plus some other declarations.

Each lexicon in the system has a name, which allows efficient selection among lexicons when switching between voices during synthesis. The basic steps involved in a lexicon definition are as follows.

First, a new lexicon must be created with a new name:

     (lex.create "cstrlex")

A phone set must be declared for the lexicon, both to allow checks on the entries themselves and to allow phone mapping between the different phone sets used in the system:

     (lex.set.phoneset "mrpa")

The phone set must be already declared in the system.

A compiled lexicon, the construction of which is described below, may optionally be specified:

     (lex.set.compile.file "/projects/festival/lib/dicts/cstrlex.out")

The method for dealing with unknown words (see Letter to sound rules) may be set:

     (lex.set.lts.method 'lts_rules)
     (lex.set.lts.ruleset 'nrl)

In this case we are specifying the use of a set of letter to sound rules originally developed by the U.S. Naval Research Laboratories. The default method is to give an error if a word is not found in the addenda or compiled lexicon. (This and other options are discussed more fully below.)

Finally, addenda items may be added for words that are known to be common but are not in the lexicon and cannot reasonably be analysed by the letter to sound rules:

     (lex.add.entry
       '( "awb" n ((( ei ) 1) ((d uh) 1) ((b @ l) 0) ((y uu) 0) ((b ii) 1))))
     (lex.add.entry
       '( "cstr" n ((( s ii ) 1) (( e s ) 1) (( t ii ) 1) (( aa ) 1)) ))
     (lex.add.entry
       '( "Edinburgh" n ((( e m ) 1) (( b r @ ) 0)) ))

Using lex.add.entry again for the same word and part of speech will redefine the current pronunciation. Note that lex.add.entry adds entries to the current lexicon, so it is a good idea to explicitly select the lexicon before you add addenda entries, particularly if you are doing this in your own .festivalrc file.
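
As a minimal sketch of that advice, the following fragment selects the lexicon by name before touching its addenda, then looks the word up to confirm the entry. It assumes the "cstrlex" lexicon from the steps above has already been created; the nil second argument to lex.lookup is used here simply because no particular part of speech is being requested.

     ;; make "cstrlex" the current lexicon before changing its addenda
     (lex.select "cstrlex")
     ;; add (or redefine) the addenda entry in the selected lexicon
     (lex.add.entry
       '( "cstr" n ((( s ii ) 1) (( e s ) 1) (( t ii ) 1) (( aa ) 1)) ))
     ;; look the word up again to check what the lexicon now returns
     (lex.lookup "cstr" nil)

Selecting first costs one extra line and avoids silently adding entries to whichever lexicon the most recently loaded voice happened to select.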

For large lists, compiled lexicons are best. The function lex.compile takes two filename arguments: the name of a file containing a list of lexical entries, and the name of an output file where the compiled lexicon will be saved.
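
A compilation call might look like the following sketch. Both file names are hypothetical placeholders, and the entry shown in the comment assumes that the input file holds unquoted entries in the same word, part of speech, syllables form used for the addenda.

     ;; cstrlex.scm (hypothetical input file) is assumed to hold entries like
     ;;   ("cstr" n ((( s ii ) 1) (( e s ) 1) (( t ii ) 1) (( aa ) 1)))
     ;; first argument: file of entries; second argument: compiled output
     (lex.compile "cstrlex.scm" "cstrlex.out")
     ;; the compiled file can then be attached to the current lexicon
     (lex.set.compile.file "cstrlex.out")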

Compilation can take some time and may require a lot of memory, as all entries are loaded in, checked and then sorted before being written out again. If some entry is malformed, the reading process halts during compilation with a message that is not very helpful. Note that if any of your entries include single or double quotes, the entries will probably be misparsed and cause this kind of error. In such cases try setting

     (debug_output t)

before compilation. This will print out each entry as it is read in, which should help to narrow down where the error is.
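
Put together, a debugging run could look like the sketch below. The file names are hypothetical, and passing nil to debug_output afterwards is assumed to switch the extra output off again.

     ;; echo each entry as it is read so the failing one is easy to spot
     (debug_output t)
     ;; hypothetical entry file and output file
     (lex.compile "cstrlex.scm" "cstrlex.out")
     ;; return to normal, quieter output once the problem is found
     (debug_output nil)

The last entry printed before the error is the most likely culprit, so that is the place to check for stray quotes or unbalanced parentheses.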