Singing Synthesis - Festival Speech Synthesis System

29.2 Singing Synthesis

As an interesting example, a singing mode is included. It offers an XML-based markup for specifying songs, covering both the notes and their durations. This work was done as a student project by Dominic Mazzoni. A number of examples are provided in examples/songs. A song may be sung with:

     festival> (tts "doremi.xml" 'singing)

Each sung word can be given a note (its pitch) and a duration in beats, with the tempo set by the BPM attribute:

     <?xml version="1.0"?>
     <!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN"
           "Singing.v0_1.dtd"
     []>
     <SINGING BPM="30">
     <PITCH NOTE="G3"><DURATION BEATS="0.3">doe</DURATION></PITCH>
     <PITCH NOTE="A3"><DURATION BEATS="0.3">ray</DURATION></PITCH>
     <PITCH NOTE="B3"><DURATION BEATS="0.3">me</DURATION></PITCH>
     <PITCH NOTE="C4"><DURATION BEATS="0.3">fah</DURATION></PITCH>
     <PITCH NOTE="D4"><DURATION BEATS="0.3">sew</DURATION></PITCH>
     <PITCH NOTE="E4"><DURATION BEATS="0.3">lah</DURATION></PITCH>
     <PITCH NOTE="F#4"><DURATION BEATS="0.3">tee</DURATION></PITCH>
     <PITCH NOTE="G4"><DURATION BEATS="0.3">doe</DURATION></PITCH>
     </SINGING>
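
Files like the one above are regular enough to generate from a list of notes. The following is a minimal sketch in Python, not part of Festival: the helper names singing_xml and nominal_seconds are hypothetical, and the beat arithmetic assumes BPM means beats per minute in the usual musical sense, so one beat lasts 60/BPM seconds. The timing Festival actually produces may differ.

```python
from xml.sax.saxutils import escape, quoteattr

# Header lines copied from the example song above.
HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN"\n'
    '      "Singing.v0_1.dtd"\n'
    '[]>\n'
)


def singing_xml(bpm, notes):
    """Return a SINGING document as a string.

    notes is a list of (note_name, beats, lyric) tuples.
    (Hypothetical helper; only the element and attribute names
    come from the example above.)
    """
    lines = [HEADER + "<SINGING BPM=%s>" % quoteattr(str(bpm))]
    for note, beats, lyric in notes:
        lines.append(
            "<PITCH NOTE=%s><DURATION BEATS=%s>%s</DURATION></PITCH>"
            % (quoteattr(note), quoteattr(str(beats)), escape(lyric))
        )
    lines.append("</SINGING>")
    return "\n".join(lines) + "\n"


def nominal_seconds(bpm, beats):
    """Nominal note length, assuming one beat lasts 60/bpm seconds."""
    return beats * 60.0 / bpm


# The first three notes of the scale from the example above.
scale = [("G3", 0.3, "doe"), ("A3", 0.3, "ray"), ("B3", 0.3, "me")]
doc = singing_xml(30, scale)
```

Writing doc to a file such as doremi.xml gives input in the same shape as the hand-written example. Under the stated assumption, each 0.3-beat note at BPM="30" nominally lasts about 0.6 seconds.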

You can construct multi-part songs by synthesizing each part to its own waveform file and then combining the files. For example:

     text2wave -mode singing america1.xml -o america1.wav
     text2wave -mode singing america2.xml -o america2.wav
     text2wave -mode singing america3.xml -o america3.wav
     text2wave -mode singing america4.xml -o america4.wav
     ch_wave -o america.wav -pc longest america?.wav
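
For songs with many parts, the same commands can be assembled by a script. This is a minimal sketch in Python under stated assumptions: build_commands and run_all are hypothetical helper names, and the only program names and flags used are the ones shown in the commands above. The shell pattern america?.wav is replaced by an explicit file list, since no shell expands it here.

```python
import subprocess
from pathlib import Path


def build_commands(part_files, mixed_wav):
    """Return the argument lists for synthesizing each part and combining them.

    Mirrors the text2wave and ch_wave invocations shown above.
    (Hypothetical helper, not part of Festival.)
    """
    # Each part is written next to its XML file with a .wav suffix.
    wavs = [str(Path(p).with_suffix(".wav")) for p in part_files]
    cmds = [
        ["text2wave", "-mode", "singing", str(xml), "-o", wav]
        for xml, wav in zip(part_files, wavs)
    ]
    # Combine all the parts into one output file.
    cmds.append(["ch_wave", "-o", mixed_wav, "-pc", "longest"] + wavs)
    return cmds


def run_all(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


parts = ["america%d.xml" % i for i in range(1, 5)]
commands = build_commands(parts, "america.wav")
```

Calling run_all(commands) executes the five commands, provided text2wave and ch_wave are on the PATH.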

The song is sung with the current voice. Note that the number of syllables in each word, as pronounced by that voice at run time, must match the number the song file assumes. This means a song does not always work across dialects (UK voices sometimes won't work without tweaking).

This technique is simple but effective. For a more serious singing synthesizer, however, we recommend you look at Flinger (http://cslu.cse.ogi.edu/tts/flinger/), which addresses the issues of synthesizing the human singing voice in more detail.