Server/client API - Festival Speech Synthesis System

28.3 Server/client API

Festival offers a BSD socket-based interface. This allows Festival to run as a server that client programs can access. The server offers a new command interpreter for each client that attaches to it. The server forks for each client, but this is much faster than waiting for a Festival process to start from scratch. The server can also run on a bigger machine, offering much faster synthesis.

Note: the Festival server is inherently insecure and may allow arbitrary users access to your machine.

Every effort has been made to minimise the risk of unauthorised access through Festival, and a number of levels of security are provided. However, as with any program offering socket access, such as httpd, sendmail or ftpd, there is a risk that unauthorised access is possible. I trust Festival's security enough to often run it on my own machine and departmental servers, restricting access to within our department. Please read the information below before using the Festival server so that you understand the risks.

28.3.1 Server access control

The following access control is available for Festival when running as a server. When the server starts, it usually begins by loading commands specific to the task it is to be used for. The following variables are used to control access.

server_port

A number identifying the inet socket port. By default this is 1314. It may be changed as required.

server_log_file

If nil, no logging takes place; if t, log messages are printed to standard output; if a file name, log messages are appended to that file. All connections and attempted connections are logged with a time stamp and the name of the client. All commands sent from the client are also logged (output and data input are not logged).

server_deny_list

If non-nil, this identifies which machines are not allowed access to the server. It is a list of regular expressions. If the host name of the client matches any of the regular expressions in this list, the client is denied access. This overrides all other access methods. Remember that hosts are sometimes identified by number rather than by name.

server_access_list

If this is non-nil, only machines whose names match at least one of the regular expressions in this list may connect as clients. Remember that hosts are sometimes identified by number rather than by name, so you should probably take account of the IP number of a machine as well as its name to be properly secure.

server_passwd

If this is non-nil, the client must send this password to the server, followed by a newline, before access is given. This is required even if the machine is included in the access list. This is designed so that servers for specific tasks may be set up with reasonable security.

(set_server_safe_functions FUNCNAMELIST)

If called, this restricts which functions the client may call. This is the most restrictive form of access, and thoroughly recommended. In this mode it would be normal to include only the specific functions the client needs to execute (i.e. the function to set up output, and a tts function). For example, a server could call the following at setup time, thus restricting calls to only those that festival_client --ttw uses.

          (set_server_safe_functions
                  '(tts_return_to_client tts_text tts_textall Parameter.set))

It is strongly recommended that you run Festival in server mode as userid nobody to limit the access the process will have. Running it in a chroot environment is more secure still.

For example, suppose we wish to allow access to all machines in the CSTR domain except for holmes.cstr.ed.ac.uk and adam.cstr.ed.ac.uk. This may be done by adding the following two commands to a file, e.g. server.scm:

     (set! server_deny_list '("holmes\\.cstr\\.ed\\.ac\\.uk"
                              "adam\\.cstr\\.ed\\.ac\\.uk"))
     (set! server_access_list '("[^\\.]*\\.cstr\\.ed\\.ac\\.uk"))

and then running the command

     festival PATH_TO/server.scm --server

This is not complete, though: when DNS is not working, holmes and adam will still be able to access the server (but if our DNS isn't working we probably have more serious problems). However, the above is secure in that only machines in the domain cstr.ed.ac.uk can access the server, though there may be ways to make machines identify themselves as being in that domain even when they are not.
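The way the deny and access lists combine can be sketched in Python. This is only an illustration of the rules described above, not Festival's own code: the function name host_allowed is hypothetical, Python's re module stands in for Festival's regular expressions, and matching against the whole host name is an assumption.

```python
import re

def host_allowed(host, deny_list=None, access_list=None):
    """Return True if `host` may connect under the rules described above."""
    # A match in the deny list overrides every other access method.
    if deny_list and any(re.fullmatch(p, host) for p in deny_list):
        return False
    # A non-empty access list admits only hosts matching at least one pattern.
    if access_list:
        return any(re.fullmatch(p, host) for p in access_list)
    # Assumption: with no access list, any host that is not denied is admitted.
    return True

# The same patterns as in server.scm, written as Python raw strings.
DENY = [r"holmes\.cstr\.ed\.ac\.uk", r"adam\.cstr\.ed\.ac\.uk"]
ACCESS = [r"[^\.]*\.cstr\.ed\.ac\.uk"]
```

With these lists, host_allowed("holmes.cstr.ed.ac.uk", DENY, ACCESS) is rejected by the deny list even though the name also matches the access list.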

By default, Festival in server mode will only accept client connections from localhost.

28.3.2 Client control

An example client program called festival_client is included with the system that provides a wide range of access methods to the server. A number of options for the client are offered.

--server

The name (or IP number) of the server host. By default this is localhost (i.e. the same machine you run the client on).

--port

The port number the Festival server is running on. By default this is 1314.

--output FILENAME

If a waveform is to be synchronously returned, it will be saved in FILENAME. The --ttw option uses this as does the use of the Festival command utt.send.wave.client. If an output waveform file is received by festival_client and no output file has been given the waveform is discarded with an error message.

--passwd PASSWD

If a password is required by the server, it should be given on the client command line. PASSWD is sent, followed by a newline, before any other communication takes place. If this isn't specified and a password is required, you must enter it first. If the --ttw option is used and a password is required but none is specified, access will be denied.

--prolog FILE

FILE is assumed to contain Festival commands, and its contents are sent to the server after the password but before anything else. This is convenient to use in conjunction with --ttw, which otherwise does not offer any way to send commands as well as the text to the server.

--otype OUTPUTTYPE

If an output waveform file is to be used, this specifies the output type of the file. The default is nist, but alaw, riff, ulaw and others supported by the Edinburgh Speech Tools Library are valid. You may use raw too, but note that Festival may return waveforms of various sampling rates depending on the sample rates of the databases it is using. You can of course make Festival return only one particular sample rate by using after_synth_hooks. Note that the byte order will be the native byte order of the client machine if the output format allows it.

--ttw

Text to wave is an attempt to make festival_client useful in many simple applications. Although you can connect to the server and send arbitrary Festival Scheme commands, this option automatically does what you probably want most often. When specified, this option takes text from the specified file (or stdin), synthesizes it (in one go) and saves it in the specified output file. It basically does the following:

          (Parameter.set 'Wavefiletype '<output type>)
          (tts_textall "
          <file/stdin contents>
          ")

Note that this is best used for small, single utterance texts as you have to wait for the whole text to be synthesized before it is returned.

--aucommand COMMAND

Execute COMMAND on each waveform returned by the server. The variable FILE will be set when COMMAND is executed.

--async

To reduce the delay between the text being sent and the first sound being available to play, this option, in conjunction with --ttw, causes the text to be synthesized utterance by utterance and sent back as separate waveforms. Using --aucommand, each waveform may be played locally, and when festival_client is interrupted the sound will stop. Getting the client to connect to an audio server elsewhere means the sound will not necessarily stop when the festival_client process is stopped.

--withlisp

With each command sent to Festival, a Lisp return value is sent back. Lisp expressions may also be sent from the server to the client through the command send_client. If this option is specified, the Lisp expressions are printed to standard output; otherwise this information is discarded.

A typical example use of festival_client is

     festival_client --async --ttw --aucommand 'na_play $FILE' fred.txt

This will use na_play to play each waveform generated for the utterances in fred.txt. Note the single quotes so that the $ in $FILE isn't expanded locally.
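How --aucommand hands each waveform to COMMAND can be pictured with a short Python sketch. It is not the client's actual implementation: the function name run_aucommand is hypothetical, and it assumes that FILE is made available to COMMAND as an environment variable naming a temporary file, which would explain why $FILE must not be expanded by the calling shell.

```python
import os
import subprocess
import tempfile

def run_aucommand(waveform, command):
    """Save one waveform to a temporary file and run `command` on it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(waveform)
        path = tmp.name
    try:
        # Assumption: FILE is exported so the shell running `command`
        # expands $FILE to the path of the saved waveform.
        env = dict(os.environ, FILE=path)
        result = subprocess.run(command, shell=True, env=env,
                                capture_output=True, check=True)
        return result.stdout
    finally:
        os.remove(path)
```

In --async mode a call like this would be made once per utterance, so playback can begin as soon as the first waveform arrives.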

Note the server must be running before you can talk to it. At present Festival is not set up for automatic invocations through inetd and /etc/services. If you do that yourself, note that it is a different type of interface as inetd assumes all communication goes through standard in/out.

Also note that each connection to the server starts a new session. Variables are not persistent over multiple calls to the server so if any initialization is required (e.g. loading of voices) it must be done each time the client starts or more reasonably in the server when it is started.
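Because nothing persists between connections, a client has to send the same preamble every time. The Python sketch below assembles what a --ttw style client would send on a fresh connection, in the order described above: password, prolog commands, then the text-to-wave commands. The names lisp_string and build_session are hypothetical, and the backslash escaping of quotes inside the Lisp string is an assumption about how Festival reads strings.

```python
def lisp_string(text):
    """Quote `text` as a Lisp string literal (escaping rules are assumed)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_session(text, otype="nist", passwd=None, prolog=""):
    """Return everything a --ttw style client sends on a new connection."""
    parts = []
    if passwd is not None:
        # The password goes first, followed by a newline.
        parts.append(passwd + "\n")
    if prolog:
        # Setup commands (e.g. voice loading) must be resent every session.
        parts.append(prolog.rstrip("\n") + "\n")
    parts.append("(Parameter.set 'Wavefiletype '%s)\n" % otype)
    parts.append("(tts_textall %s)\n" % lisp_string(text))
    return "".join(parts)
```

Doing the initialization in the server's startup file instead removes the need for the prolog argument altogether.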

A Perl Festival client is also available in festival/examples/festival_client.pl.

28.3.3 Server/client protocol

The client talks to the server using s-expressions (Lisp). The server replies with a number of different chunks until either OK or, on error, ER is returned. The communication is synchronous: each client request can generate a number of waveform (WV) replies and/or Lisp (LP) replies, and is terminated with an OK (or ER). Lisp is used because it has its own inherent syntax that Festival can already parse.

The following pseudo-code helps define the protocol and shows typical use:

        fprintf(serverfd,"%s\n",s-expression);
        do
           ack = read three character acknowledgement
           if (ack == "WV\n")
              read a waveform
           else if (ack == "LP\n")
              read an s-expression
           else if (ack == "ER\n")
              an error occurred, break;
        while ack != "OK\n"
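The reply loop in the pseudo-code can be written as a runnable Python sketch. It reads from any binary file-like object, such as the result of calling makefile("rb") on a connected socket. The section does not say how a waveform or s-expression payload is delimited, so the terminator below is an assumption taken from the C client shipped with Festival; any escaping of that key inside a payload is ignored here. The function names are hypothetical.

```python
# Assumption: each payload ends with this key, as in the C client
# shipped with Festival; this section does not specify the framing.
TERMINATOR = b"ft_StUfF_key"

def read_payload(stream, terminator=TERMINATOR):
    """Read bytes up to, but not including, the terminator."""
    data = bytearray()
    while not data.endswith(terminator):
        byte = stream.read(1)
        if not byte:
            raise EOFError("connection closed inside a payload")
        data += byte
    return bytes(data[:-len(terminator)])

def read_replies(stream):
    """Collect the WV and LP replies to one request, stopping at OK or ER."""
    waves, exprs = [], []
    while True:
        ack = stream.read(3)  # three-character acknowledgement
        if ack == b"WV\n":
            waves.append(read_payload(stream))
        elif ack == b"LP\n":
            exprs.append(read_payload(stream).decode())
        elif ack == b"OK\n":
            return True, waves, exprs
        elif ack == b"ER\n":
            return False, waves, exprs
        else:
            raise ValueError("unexpected acknowledgement: %r" % ack)
```

A caller would write one s-expression followed by a newline to the socket, then call read_replies once; the first element of the result says whether the request ended with OK or ER.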

The server can send the waveform in an utterance to the client through the function utt.send.wave.client. The server can send a Lisp expression to the client through the command send_client.