Regular expressions are a formal method for describing a certain class of mathematical languages. They may be viewed as patterns which match some set of strings. They are common in software tools and scripting languages such as the UNIX shell, Perl, awk and Emacs. Unfortunately the exact form of regular expressions often differs slightly between applications, which can make them a little tricky to use.
Festival supports regular expressions based mainly on the form used in the GNU libg++ Regex class, though we have our own implementation of it. Our implementation (EST_Regex) is actually based on Henry Spencer's regex.c as distributed with BSD 4.4.
Regular expressions are represented as character strings which are interpreted as regular expressions by certain Scheme and C++ functions. Most characters in a regular expression are treated as literals and match only that character, but a number of others have special meaning. Some characters may be escaped with a preceding backslash to change them from operators to literals (or sometimes from literals to operators).
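.
    Matches any single character.
$
    Matches the end of the string.
^
    Matches the beginning of the string.
X*
    Matches zero or more occurrences of X. X may be a character, a range or a parenthesized expression.
X+
    Matches one or more occurrences of X.
X?
    Matches zero or one occurrence of X.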
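[...]
    Matches any one character from the set given inside the brackets. A range of characters may be given with a dash, e.g. a-z for all lower case characters. If the first character of the range is ^ then it matches any character except those specified in the range. If you wish - to be in the range you must put that first.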
\\(...\\)
    Groups the enclosed expression so that it is treated as a single item, allowing *, +, ? etc to operate on more than single characters.
X\\|Y
    Matches either X or Y.
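As a rough illustration of the escaping rules above, the following Python sketch rewrites a pattern in the form described here into the syntax of Python's re module, where grouping and alternation are written without backslashes. The function names festival_to_python and matches are ours and are not part of Festival. The sketch handles only the operators listed above, assumes that the doubled backslashes shown in this section reduce to a single backslash once the string has been read, and assumes that a pattern must match the whole string. Python's re is only a stand-in and may differ from EST_Regex in details.

```python
import re

def festival_to_python(pattern: str) -> str:
    """Rewrite a pattern in the form described above into Python `re` syntax.

    A sketch only: it knows about \\( \\) \\| and bracket expressions.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \| are operators; any other escaped character is a literal.
            out.append(nxt if nxt in "()|" else re.escape(nxt))
            i += 2
        elif ch == "[":
            # Copy a bracket expression through to its closing "]".
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1  # a "]" placed first is a member of the set
            while j < len(pattern) and pattern[j] != "]":
                j += 1
            # Characters inside brackets are literal, so protect backslashes.
            out.append(pattern[i:j + 1].replace("\\", "\\\\"))
            i = j + 1
        elif ch in "()|":
            # Unescaped, these are ordinary characters in the form above.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def matches(text: str, pattern: str) -> bool:
    """True if the translated pattern matches the whole of `text` (assumption)."""
    return re.fullmatch(festival_to_python(pattern), text) is not None

# The doubled backslashes in this Python literal collapse to single ones,
# which we assume is also what happens to the patterns shown in the text.
print(festival_to_python("-?[0-9]+\\(\\.[0-9]+\\)?"))  # -?[0-9]+(\.[0-9]+)?
print(matches("-3.14", "-?[0-9]+\\(\\.[0-9]+\\)?"))     # True
print(matches("(x)", "(x)"))                            # True: bare parentheses are literals
```

Keeping the translation separate from the matching makes it easy to print the rewritten pattern and see which backslashes were treated as operators.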
Some examples may help in understanding the use of regular expressions.
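a.b
    Matches any three character string starting with an a and ending with a b.
.*a
    Matches any string ending in an a.
.*a.*
    Matches any string containing an a.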
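[A-Z].*
    Matches any string starting with a capital letter.
[0-9]+
    Matches any non-empty string of digits.
-?[0-9]+\\(\\.[0-9]+\\)?
    Matches a decimal number with an optional leading minus sign and an optional fractional part. The point is escaped because an unescaped . matches any character.
[^aeiouAEIOU]+
    Matches any non-empty string which does not contain a vowel.
\\([Ss]at\\(urday\\)\\)?\\|\\([Ss]un\\(day\\)\\)
    Matches Saturday or Sunday with an upper or lower case initial letter. As written, the inner groups are not optional, so the abbreviations Sat and Sun do not match, and because the first alternative is optional the empty string does.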
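The example patterns above can be tried out with a short Python sketch. It uses Python's re module as a stand-in, so groups and alternation are written ( ) and | rather than \\( \\) and \\|. The sample strings are our own, and we assume that a pattern has to match the whole string, which is what re.fullmatch checks. Festival's own matcher may differ in details.

```python
import re

# Each entry: (pattern in Python `re` syntax, strings expected to match,
# strings expected not to match). The sample strings are our own.
EXAMPLES = [
    (r"a.b", ["axb", "a-b"], ["ab", "axxb"]),
    (r".*a", ["a", "banana"], ["ab"]),
    (r".*a.*", ["a", "bad"], ["bed"]),
    (r"[A-Z].*", ["Festival", "X"], ["festival"]),
    (r"[0-9]+", ["42"], ["", "4a"]),
    (r"-?[0-9]+(\.[0-9]+)?", ["7", "-3.14"], ["3.", ".5"]),
    (r"[^aeiouAEIOU]+", ["rhythm"], ["", "tree"]),
    # The day-name pattern, transcribed as written: the optional first
    # alternative lets the empty string through, while "Sat" and "Sun" fail.
    (r"([Ss]at(urday))?|([Ss]un(day))", ["Saturday", "sunday", ""], ["Sat", "Sun"]),
]

def check(examples):
    """Return the (pattern, string) pairs that did not behave as listed."""
    failures = []
    for pattern, good, bad in examples:
        for s in good:
            if re.fullmatch(pattern, s) is None:
                failures.append((pattern, s))
        for s in bad:
            if re.fullmatch(pattern, s) is not None:
                failures.append((pattern, s))
    return failures

failures = check(EXAMPLES)
print(failures)  # an empty list means every sample behaved as listed
```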
The Scheme function string-matches takes a string and a regular expression and returns t if the regular expression matches the string and nil otherwise.
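A predicate of this kind is typically used to decide how a piece of text should be treated. The following Python sketch shows the idea with a hypothetical stand-in, string_matches, which takes its arguments in the same order (string first, then pattern) and returns True or False in place of t and nil. It is built on Python's re module, not on Festival, it assumes whole-string matching, and the classify function and token list are invented for illustration.

```python
import re

def string_matches(text: str, pattern: str) -> bool:
    """Hypothetical stand-in: True when `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

def classify(token: str) -> str:
    """Label a token using two of the example patterns (Python `re` syntax)."""
    if string_matches(token, r"-?[0-9]+(\.[0-9]+)?"):
        return "number"
    if string_matches(token, r"[A-Z].*"):
        return "capitalised"
    return "other"

tokens = ["Festival", "1996", "-2.5", "speech", "4th"]
for token in tokens:
    print(token, classify(token))
```

Because the whole token has to match, "4th" is not classed as a number even though it starts with a digit.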