Diphone database format - Festival Speech Synthesis System

21.1 Diphone database format

A diphone database consists of a dictionary file, a set of waveform files, and a set of pitch mark files. These files use the same format as the earlier CSTR (Osprey) synthesizer.

The dictionary file consists of one entry per line. Each entry consists of five fields: a diphone name of the form P1-P2, a filename (without extension), a floating point start position in the file in milliseconds, a mid position in milliseconds (the point where the phone changes), and an end position in milliseconds. Lines starting with a semi-colon and blank lines are ignored. The entries may be in any order.

For example, a partial list of diphones may look like this:

     ch-l  r021   412.035  463.009  518.23
     jh-l  d747   305.841  382.301  446.018
     h-l   d748   356.814  403.54   437.522
     #-@   d404   233.628  297.345  331.327
     @-#   d001   836.814  938.761  1002.48
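
As an illustration only, the entry format described above can be read with a short Python parser. This is a minimal sketch, not Festival's own loader: the names DiphoneEntry and parse_diphone_dictionary are hypothetical, and tolerating leading whitespace before an entry or a semi-colon is an assumption.

```python
from typing import List, NamedTuple


class DiphoneEntry(NamedTuple):
    """One dictionary line (hypothetical container, not a Festival type)."""
    name: str        # diphone name of the form P1-P2, e.g. "ch-l"
    filename: str    # file name without extension, e.g. "r021"
    start_ms: float  # start position in milliseconds
    mid_ms: float    # position of the phone change in milliseconds
    end_ms: float    # end position in milliseconds


def parse_diphone_dictionary(text: str) -> List[DiphoneEntry]:
    """Parse dictionary text, skipping blank lines and ';' comment lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()  # assumption: leading whitespace is harmless
        if not stripped or stripped.startswith(";"):
            continue
        fields = stripped.split()
        if len(fields) != 5:
            raise ValueError(
                f"line {lineno}: expected 5 fields, got {len(fields)}")
        start, mid, end = (float(f) for f in fields[2:])
        entries.append(DiphoneEntry(fields[0], fields[1], start, mid, end))
    return entries
```

Because the entries may appear in any order, a caller would probably index the result by diphone name, for instance with a dict comprehension over the returned list.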

Waveform files may be in any form, headered or unheadered, as long as every file is the same type and the format is supported by the speech tools wave reading functions. These may be standard linear PCM waveform files in the case of PSOLA, or LPC coefficients and residual when using the residual LPC synthesizer. See LPC databases.

Pitch mark files consist of a simple list of positions in milliseconds (with decimal places), in order, one per line, giving each pitch mark in the file. For high quality diphone synthesis these should be derived from laryngograph data. During unvoiced sections pitch marks should be artificially created at reasonable intervals (e.g. 10 ms). In the current format there is no way to distinguish the "real" pitch marks from the "unvoiced" pitch marks.
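
The following Python sketch shows one way to read a pitch mark file and to fill unvoiced stretches with artificial marks. It is a hedged illustration, not the procedure used to build any real database: the function names are hypothetical, and treating any gap longer than max_gap_ms as unvoiced is an assumption made here for simplicity. Only the 10 ms default spacing comes from the text above.

```python
from typing import List, Sequence


def parse_pitch_marks(text: str) -> List[float]:
    """Read one position in milliseconds per line; blank lines are skipped."""
    marks = [float(line) for line in text.splitlines() if line.strip()]
    if any(b < a for a, b in zip(marks, marks[1:])):
        raise ValueError("pitch marks are not in ascending order")
    return marks


def fill_unvoiced_gaps(marks: Sequence[float], end_ms: float,
                       interval_ms: float = 10.0,
                       max_gap_ms: float = 20.0) -> List[float]:
    """Insert evenly spaced artificial marks into long gaps.

    Assumption: a gap longer than max_gap_ms (including the stretch before
    the first mark and after the last one) is an unvoiced section.
    """
    filled: List[float] = []
    previous = 0.0
    for mark in list(marks) + [end_ms]:
        gap = mark - previous
        if gap > max_gap_ms:
            # Split the gap into n roughly interval_ms-long pieces.
            n = max(1, round(gap / interval_ms))
            step = gap / n
            filled.extend(previous + i * step for i in range(1, n))
        filled.append(mark)
        previous = mark
    filled.pop()  # end_ms is only a boundary, not a pitch mark
    return filled
```

The output is a single ordered list, so, as noted above, nothing in it records which marks were measured and which were generated.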

It is normal to hold a diphone database in a directory with a number of sub-directories, namely dic/ containing the dictionary file, wave/ for the waveform files, typically of whole nonsense words (sometimes this directory is called vox/ for historical reasons), and pm/ for the pitch mark files. The filename in the dictionary entry should be the same for the waveform file and the pitch mark file (with different extensions).
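
To make the layout concrete, the sketch below maps a dictionary filename to its waveform and pitch mark paths. The function name diphone_file_paths is hypothetical, and the extensions ".wav" and ".pm" are placeholders chosen for illustration only; the actual extensions depend on the database.

```python
from pathlib import Path
from typing import Tuple


def diphone_file_paths(db_root: str, filename: str,
                       wave_dir: str = "wave", pm_dir: str = "pm",
                       wave_ext: str = ".wav",  # placeholder extension
                       pm_ext: str = ".pm"      # placeholder extension
                       ) -> Tuple[Path, Path]:
    """Return (waveform path, pitch mark path) for one dictionary filename.

    Both files share the same base name and differ only in directory
    and extension, as the conventional layout describes.
    """
    root = Path(db_root)
    wave_path = root / wave_dir / (filename + wave_ext)
    pm_path = root / pm_dir / (filename + pm_ext)
    return wave_path, pm_path
```

For a database that keeps its waveforms in vox/, the same call could be made with wave_dir="vox".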