In previous versions items had a number of predefined features. This is
no longer the case and all features are optional. In particular, the
start and end features are no longer fixed, though those names are still
used in the relations where they are appropriate. Specific functions are
provided for the name feature, but they are just shorthand for normal
feature access. Simple features directly access the features in the
underlying EST_Feature class in an item.
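As a minimal sketch of simple feature access from Scheme, assuming a running Festival with a default English voice (so that synthesis builds a Word relation). The variable names utt1 and w and the example text are illustrative only:

```scheme
;; Build and synthesize a small utterance (illustrative text).
(set! utt1 (Utterance Text "hello world"))
(utt.synth utt1)

;; Take the first item in the Word relation.
(set! w (utt.relation.first utt1 'Word))

;; General feature access by name ...
(item.feat w "name")
;; ... and the shorthand function provided for the name feature.
(item.name w)
```

Both calls should give the same value, since the shorthand is just normal feature access under another name.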
In addition to simple features there is a mechanism for relating
functions to names, so accessing a feature may actually call a
function. For example, the feature num_syls is defined as a feature
function which counts the number of syllables in the given word, rather
than simply accessing a pre-existing feature. Feature functions usually
depend on the particular relation the item is in, e.g. some feature
functions are only appropriate for items in the Word relation, or only
appropriate for those in the IntEvent relation.
The third aspect of feature names is a path component. These are
parts of the name (each ending in a .) that indicate some traversal
of the utterance structure. For example, the feature name will access
the name feature on the given item. The feature n.name will return the
name feature on the next item (in that item's relation). A number of
basic direction operators are defined.
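n.            next item
p.            previous item
nn.           next next item
pp.           previous previous item
parent.       parent item
daughter1.    first daughter
daughter2.    second daughter
daughtern.    last daughter
first.        most previous item
last.         most next item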
You may also specify traversal to another relation, using the
R:<relationame>. operator. For example, given an item in the Syllable
relation, R:SylStructure.parent.name would give the name of the word
the syllable is in.
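The path components described above can be tried from the Scheme prompt. The following is a sketch only, assuming a default English voice whose synthesis creates the Word, Syllable and SylStructure relations; utt1, w and syl are illustrative names:

```scheme
;; Build and synthesize a small utterance (illustrative text).
(set! utt1 (Utterance Text "hello world"))
(utt.synth utt1)

;; Start from the first item in the Word relation.
(set! w (utt.relation.first utt1 'Word))
(item.feat w "name")      ; name of this word
(item.feat w "n.name")    ; name of the next word in the Word relation

;; Start from the first item in the Syllable relation and cross into
;; SylStructure to reach the word that contains the syllable.
(set! syl (utt.relation.first utt1 'Syllable))
(item.feat syl "R:SylStructure.parent.name")
```

Each path is read left to right: every component moves to another item, and the final part names the feature to fetch there.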
Some more complex examples are as follows, assuming we are starting
from an item in the Syllable relation.
vc
of the final segment in this syllable.
In C++ feature values are of class EST_Val, which may be a string,
int, or a float (or any arbitrary object). In Scheme this distinction
cannot always be made, and sometimes when you expect an int you
actually get a string. Care should be taken to ensure the right matching
functions are used in Scheme. It is recommended you use string-append
or string-match as they will always work.
If a pathname does not identify a valid path for the particular
item (e.g. there is no next item), "0" is returned.
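A small sketch of checking for this case follows. It assumes a default English voice and uses string-equal for the comparison, on the assumption that it is available in your Festival and compares its arguments as strings; utt1, w and prev are illustrative names:

```scheme
;; Build and synthesize a small utterance (illustrative text).
(set! utt1 (Utterance Text "hello world"))
(utt.synth utt1)
(set! w (utt.relation.first utt1 'Word))

;; The first word has no previous item, so the path p.name is not valid here.
(set! prev (item.feat w "p.name"))

;; Compare as strings, since the value may arrive as a string or a number.
(if (string-equal prev "0")
    (format t "no previous word\n")
    (format t "previous word: %s\n" prev))
```

Comparing as strings avoids depending on whether the value comes back as an int or a string.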
When collecting data from speech databases it is often useful to collect a whole set of features from all utterances in a database. These features can then be used for building various models (both CART tree models and linear regression modules use these feature names).
A number of functions exist to help in this task. For example
(utt.features utt1 'Word '(name pos p.pos n.pos))
will return, for each word in the utterance, a list containing the word's name and its part of speech context (its own part of speech and those of the previous and next words).
See Extracting features, for an example of extracting sets of features from a database for use in building stochastic models.
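As a rough sketch of collecting the same features over several utterances, the loop below loads previously saved utterances and prints one feature vector per word. The file names are hypothetical placeholders, and the saved utterances are assumed to contain a Word relation:

```scheme
;; Hypothetical list of saved utterance files; replace with real paths.
(set! utt_files '("utts/utt001.utt" "utts/utt002.utt"))

(mapcar
 (lambda (fname)
   ;; Passing nil as the first argument creates a new utterance to load into.
   (let ((utt (utt.load nil fname)))
     ;; utt.features gives one list of feature values per item in Word.
     (mapcar
      (lambda (vec) (format t "%l\n" vec))
      (utt.features utt 'Word '(name pos p.pos n.pos)))))
 utt_files)
```

Printing one vector per line gives a simple text dump that can be post-processed before model building.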