Each lexicon may define what action to take when a word cannot be found in the addenda or the compiled lexicon. Several options are available, and more may be added as more general letter to sound rule systems are developed.
The method is set by the command
(lex.set.lts.method METHOD)
Where METHOD can be any of the following
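Letter to sound rules. The name of the rule set to use is defined with the lex.lts.ruleset function. This method runs one set of rules on an exploded form of the word and assumes the rules return a list of phonemes (in the appropriate set). If multiple instances of rules are required, use the function method, in which METHOD names a Lisp function that is called for the unknown word (see the Welsh example at the end of this section).
Nil pronunciation. The entry is given a nil pronunciation field. This will only be valid in very special circumstances.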
The basic letter to sound rule system is very simple but is powerful enough to build reasonably complex letter to sound rules. We have found trained LTS rules to be better than hand-written ones for complex languages. However, where no data is available and rules must be written by hand, the following rule formalism is much easier to use than that generated by the LTS training system (described in the next section).
The basic form of a rule is as follows
( LEFTCONTEXT [ ITEMS ] RIGHTCONTEXT = NEWITEMS )
The interpretation is that if ITEMS appear in the specified left and right context then the output string is to contain NEWITEMS. Any of LEFTCONTEXT, RIGHTCONTEXT or NEWITEMS may be empty. Note that NEWITEMS is written to a different "tape" and hence cannot feed further rules (within this ruleset). An example is
( # [ c h ] C = k )
The special character # denotes a word boundary, and the symbol C denotes the set of all consonants; sets are declared before rules. This rule states that a ch at the start of a word followed by a consonant is to be rendered as the k phoneme.
Symbols in contexts may be followed by the symbol * for zero or more occurrences, or + for one or more occurrences.
The symbols in the rules are treated as set names if they are declared as such or as symbols in the input/output alphabets. The symbols may be more than one character long and the names are case sensitive.
The rules are tried in order until one matches the first symbol (or symbols) on the input tape. That rule is applied, adding its right hand side to the output tape. Matching then starts again from the top of the rule list for the remaining input.
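As a rough illustration of how rule order and contexts interact, the toy rule set below places the context-specific ch rule ahead of the general ones. The rule set name toy_order, the set contents and the output symbols are assumptions made for this sketch and do not come from any distributed voice; lts.ruleset and lts.apply are the access functions described later in this section.

```scheme
;; Toy rule set: the name, set contents and output symbols are
;; illustrative assumptions only.
(lts.ruleset
 'toy_order
 ;; Sets are declared before the rules that refer to them.
 '((C c h m r t))
 '(
   ;; Specific rule first: word-initial "ch" before a consonant.
   ( # [ c h ] C = k )
   ;; General fallback for any other "ch".
   ( [ c h ] = ch )
   ( [ c ] = k )
   ( [ h ] = h )
   ;; "a" followed by zero or more consonants and then the word end.
   ( [ a ] C * # = aa )
   ( [ a ] = a )
   ( [ e ] = e )
   ( [ m ] = m )
   ( [ o ] = o )
   ( [ r ] = r )
   ( [ t ] = t )
   ))

;; "chrome": the first rule matches at the word start (r is in C),
;; so "c h" is consumed together and k is written to the output.
(lts.apply 'chrome 'toy_order)

;; "chat": the first rule fails (a is not in C), so the second rule
;; handles "c h"; the "a" then matches the rule with the C * # context.
(lts.apply 'chat 'toy_order)
```

If the two ch rules were swapped, the general rule would always match first and the word-initial rule would never be reached, which is why the more specific rule is listed earlier.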
When given an atom, the function used to apply a set of rules explodes it into a list of single characters; when given a list, it uses the list as is. This reflects the common use of rewriting the individual letters of a word to phonemes, without excluding more complex manipulations such as multi-pass LTS systems and phoneme conversion.
From Lisp there are three basic access functions; there are corresponding functions in the C/C++ domain.
(lts.ruleset NAME SETS RULES)
Defines a new rule set. NAME is the name for this rule set, SETS is a list of set definitions of the form (SETNAME e0 e1 ...) and RULES is a list of rules as described above.
(lts.apply WORD RULESETNAME)
Applies the rule set named RULESETNAME to WORD. If WORD is a symbol it is exploded into a list of the individual characters in its print name. If WORD is a list it is used as is. If the rules cannot be successfully applied an error is given. The result of (successful) application is returned in a list.
(lts.check_alpha WORD RULESETNAME)
The symbols in WORD are checked against the input alphabet of the rules named RULESETNAME. If they are all contained in that alphabet t is returned, else nil. Note this does not necessarily mean the rules will successfully apply (contexts may restrict the application of the rules), but it allows general checking for numerals, punctuation etc., so that an appropriate rule set can be applied.
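The three access functions might be combined as in the following sketch. The rule set name toy_abc, its rules and the helper toy_safe_apply are hypothetical and exist only for this example; the calls themselves use the signatures given above.

```scheme
;; Toy rule set; name, set and output symbols are assumptions.
(lts.ruleset
 'toy_abc
 '((V a))
 '(
   ( V [ b ] = v )      ; "b" directly after a vowel
   ( [ a ] = aa )
   ( [ b ] = b )
   ( [ c ] = k )
   ))

;; A symbol is exploded into its characters before the rules run ...
(lts.apply 'cab 'toy_abc)
;; ... while a list is used as is, so both calls see c, a, b.
(lts.apply '(c a b) 'toy_abc)

;; Hypothetical helper: only apply the rules when every input symbol
;; is in the rule set's input alphabet, otherwise return nil.
(define (toy_safe_apply word)
  (if (lts.check_alpha word 'toy_abc)
      (lts.apply word 'toy_abc)
      nil))

(toy_safe_apply 'cab)   ; all symbols are covered by the rules
(toy_safe_apply 'cad)   ; "d" appears in no rule, so the check should fail
```

Checking the alphabet first avoids the error that lts.apply gives on input it cannot rewrite, although a word that passes the check may still fail if no rule's context matches.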
The letter to sound rule system may be used directly from Lisp and can easily be used to do relatively complex operations for analyzing words without requiring modification of the C/C++ system. For example, the Welsh letter to sound rule system consists of three rule sets: the first explicitly identifies epenthesis, the second identifies stressed vowels, and the third rewrites this augmented letter string to phonemes. This is achieved by the following function
(define (welsh_lts word features)
  (let (epen str wel)
    (set! epen (lts.apply (downcase word) 'newepen))
    (set! str (lts.apply epen 'newwelstr))
    (set! wel (lts.apply str 'newwel))
    (list word
          nil
          (lex.syllabify.phstress wel))))
The LTS method for the Welsh lexicon is set to welsh_lts, so this function is called when a word is not found in the lexicon. The function first downcases the word and then applies the rule sets in turn, finally calling the syllabification process and returning a constructed lexical entry.
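The same multi-pass pattern can be tried on a much smaller scale. The sketch below chains two hypothetical rule sets, toy_pass1 and toy_pass2, whose names, sets and symbols are invented for illustration. The first pass groups the digraph ch into a single symbol, and the second pass rewrites the resulting list, relying on the fact that symbols may be more than one character long.

```scheme
;; Pass 1 (hypothetical): group the digraph "ch" into one symbol.
(lts.ruleset
 'toy_pass1
 '((V a e i o u))
 '(
   ( [ c h ] = ch )
   ( [ c ] = c )
   ( [ a ] = a )
   ( [ t ] = t )
   ))

;; Pass 2 (hypothetical): rewrite the grouped symbols to toy phonemes.
;; Here "ch" is a single input symbol produced by pass 1.
(lts.ruleset
 'toy_pass2
 '((V a e i o u))
 '(
   ( [ ch ] V = ch )    ; "ch" before a vowel
   ( [ ch ] = k )       ; any other "ch"
   ( [ c ] = k )
   ( [ a ] = a )
   ( [ t ] = t )
   ))

;; Apply the passes in turn, in the same style as welsh_lts.
(define (toy_two_pass word)
  (let (grouped phones)
    (set! grouped (lts.apply (downcase word) 'toy_pass1))
    (set! phones (lts.apply grouped 'toy_pass2))
    phones))

;; Called with a string, as welsh_lts is.
(toy_two_pass "Chat")
```

This function only returns the phone list. To serve as an LTS method it would also need to accept the features argument and wrap the result in a lexical entry, as welsh_lts does.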