Edinburgh Speech Tools  2.1-release
Pitch/F0 Detection Algorithm functions

Functions

void pda (EST_Wave &sig, EST_Track &fz, EST_Features &op, EST_String method="")
 
void icda (EST_Wave &sig, EST_Track &fz, EST_Track &speech, EST_Option &op, EST_String method="")
 
void default_pda_options (EST_Features &al)
 
void srpd (EST_Wave &sig, EST_Track &fz, EST_Features &options)
 
void smooth_phrase (EST_Track &c, EST_Track &speech, EST_Features &options, EST_Track &sm)
 
void smooth_portion (EST_Track &c, EST_Option &op)
 

Detailed Description

These functions produce a track of fundamental frequency (F0) against time for a waveform.

Function Documentation

void pda ( EST_Wave & sig,
EST_Track & fz,
EST_Features & op,
EST_String  method = "" 
)

Top level pitch (F0) detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point.

At present, only the srpd pitch tracker is implemented, so srpd is always called regardless of the value of method.

Parameters
sig     input waveform
fz      output f0 contour
op      parameters for pitch tracker
method  pda method to be used

Definition at line 51 of file pda.cc.

void icda ( EST_Wave & sig,
EST_Track & fz,
EST_Track & speech,
EST_Option & op,
EST_String  method = "" 
)

Top level intonation contour detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point. icda differs from pda in that the contour is smoothed, and unvoiced portions have interpolated F0 values.

Parameters
sig     input waveform
fz      output f0 contour
speech  Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.
op      parameters for pitch tracker
method  pda method to be used
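
The role of the speech track can be illustrated with a small, self-contained sketch. It is not the icda implementation: the function name interpolate_unvoiced, the use of plain vectors instead of EST_Track, the convention that an F0 of zero marks an unvoiced frame, and the choice of linear interpolation between the nearest voiced neighbours are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Speech Tools API).
// f0[i] <= 0 is taken to mean "unvoiced"; speech[i] > 0 marks frame i as a
// candidate for interpolation, mirroring the description of the speech track.
std::vector<double> interpolate_unvoiced(const std::vector<double>& f0,
                                         const std::vector<double>& speech)
{
    assert(f0.size() == speech.size());
    std::vector<double> out = f0;
    const std::size_t n = f0.size();

    std::size_t i = 0;
    while (i < n) {
        if (f0[i] > 0.0) { ++i; continue; }

        // [i, j) is a run of unvoiced frames.
        std::size_t j = i;
        while (j < n && f0[j] <= 0.0) ++j;

        // Interpolate only when the run has a voiced frame on both sides.
        if (i > 0 && j < n) {
            const double left = f0[i - 1];
            const double right = f0[j];
            const double span = static_cast<double>(j - (i - 1));
            for (std::size_t k = i; k < j; ++k) {
                if (speech[k] > 0.0) {
                    const double pos = static_cast<double>(k - (i - 1));
                    out[k] = left + (right - left) * pos / span;
                }
            }
        }
        i = j;
    }
    return out;
}
```

Frames whose speech value is not positive are left untouched, so pauses between phrases keep their unvoiced value in this sketch.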

void default_pda_options ( EST_Features & al)

Create a set of sensible defaults for use in pda and icda.

Definition at line 229 of file pda.cc.

void srpd ( EST_Wave & sig,
EST_Track & fz,
EST_Features & options 
)

Super resolution pitch tracker.

srpd is a pitch detection algorithm that produces a fundamental frequency contour from a speech waveform. At present only the super resolution pitch determination algorithm is implemented. See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed description of the algorithm.

Frames of data are read in from sig in chronological order such that each frame is shifted in time from its predecessor by pda_frame_shift. Each frame is analysed in turn.
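
As a rough illustration of the framing step, the sketch below lists the starting sample of each frame. The helper name frame_starts and the use of sample counts for the frame length and shift are assumptions; the real tracker reads its frame parameters from the options and may treat the end of the waveform differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: start index of every complete analysis frame.
// frame_length and frame_shift are given in samples here (an assumption).
std::vector<std::size_t> frame_starts(std::size_t num_samples,
                                      std::size_t frame_length,
                                      std::size_t frame_shift)
{
    assert(frame_length > 0 && frame_shift > 0);
    std::vector<std::size_t> starts;
    // Frames are visited in chronological order, each shifted from its
    // predecessor by frame_shift samples; partial trailing frames are dropped.
    for (std::size_t s = 0; s + frame_length <= num_samples; s += frame_shift)
        starts.push_back(s);
    return starts;
}
```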

The maximum and minimum signal amplitudes are initially found over the duration of two segments, each of length N_min samples. If the sum of their absolute values is below two times noise_floor, the frame is classified as representing silence and no coefficients are calculated. Otherwise, a cross-correlation coefficient is calculated for all n from a period in samples corresponding to min_pitch to a period in samples corresponding to max_pitch, in steps of decimation_factor. In calculating the coefficient, only one in every decimation_factor samples of the two segments is used. Such down-sampling permits rapid estimates of the coefficients to be calculated over the range N_min <= n <= N_max. This results in a cross-correlation track for the frame being analysed.
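
The silence test and the decimated cross-correlation described above can be sketched as follows. This is a simplified illustration, not the srpd source: the function names, the use of integer samples in a plain vector, the placement of the two segments at [0, n) and [n, 2n) within the frame, and the normalisation by the segment energies are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Silence test: take the extreme amplitudes over two segments of n_min
// samples and compare |max| + |min| with twice the noise floor.
bool is_silent(const std::vector<int>& frame, std::size_t n_min, int noise_floor)
{
    assert(n_min > 0 && frame.size() >= 2 * n_min);
    int lo = frame[0];
    int hi = frame[0];
    for (std::size_t i = 1; i < 2 * n_min; ++i) {
        lo = std::min(lo, frame[i]);
        hi = std::max(hi, frame[i]);
    }
    return std::abs(hi) + std::abs(lo) < 2 * noise_floor;
}

// Normalised cross-correlation between segments [0, n) and [n, 2n),
// using only every step-th sample of each segment.
double cross_corr(const std::vector<int>& frame, std::size_t n, std::size_t step)
{
    assert(step > 0 && frame.size() >= 2 * n);
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t j = 0; j < n; j += step) {
        const double x = frame[j];
        const double y = frame[n + j];
        xy += x * y;
        xx += x * x;
        yy += y * y;
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;
}

// Coarse cross-correlation track: (period n, coefficient) pairs for
// n = n_min, n_min + decimation, ... up to n_max.
std::vector<std::pair<std::size_t, double>>
coarse_track(const std::vector<int>& frame, std::size_t n_min,
             std::size_t n_max, std::size_t decimation)
{
    assert(decimation > 0 && n_min > 0 && n_min <= n_max);
    assert(frame.size() >= 2 * n_max);
    std::vector<std::pair<std::size_t, double>> track;
    for (std::size_t n = n_min; n <= n_max; n += decimation)
        track.emplace_back(n, cross_corr(frame, n, decimation));
    return track;
}
```

For a perfectly periodic frame, the coefficient reaches 1 when n equals the period, which is why peaks in the track serve as period candidates.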

Local maxima of the track with a coefficient value above a specified threshold form candidates for the fundamental period. The threshold is adaptive and depends on the values v2uv_coef_thresh, min_v2uv_coef_thresh, and v2uv_coef_thresh_ratio. If the previously analysed frame was classified as unvoiced or silent (which is the initial state), the threshold is set to v2uv_coef_thresh. Otherwise, the previous frame was classified as voiced, and the threshold is set to v2uv_coef_thresh_ratio times the cross-correlation coefficient value at the point of the previous fundamental period in the former coefficients track. This product is not permitted to drop below min_v2uv_coef_thresh.
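
A minimal sketch of the adaptive threshold and the candidate selection is given below. The function names are hypothetical, the track is reduced to a plain vector of coefficients, and the sketch assumes that the lower bound on the voiced-frame threshold is min_v2uv_coef_thresh; the numeric values used in any call are illustrative and are not the library defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Threshold applied to the current frame's cross-correlation track.
// prev_peak_coef is the coefficient at the previous frame's fundamental
// period; it is only consulted when the previous frame was voiced.
double candidate_threshold(bool prev_voiced, double prev_peak_coef,
                           double v2uv_coef_thresh,
                           double min_v2uv_coef_thresh,
                           double v2uv_coef_thresh_ratio)
{
    // Assumed sanity check: the floor should not exceed the initial threshold.
    assert(min_v2uv_coef_thresh <= v2uv_coef_thresh);
    if (!prev_voiced)
        return v2uv_coef_thresh;  // unvoiced or silent, including the initial state
    // Assumption: the product is floored at min_v2uv_coef_thresh.
    return std::max(min_v2uv_coef_thresh,
                    v2uv_coef_thresh_ratio * prev_peak_coef);
}

// Indices of local maxima of the track whose value exceeds the threshold.
std::vector<std::size_t> pick_candidates(const std::vector<double>& track,
                                         double threshold)
{
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < track.size(); ++i) {
        const bool is_peak = track[i] > track[i - 1] && track[i] >= track[i + 1];
        if (is_peak && track[i] > threshold)
            peaks.push_back(i);
    }
    return peaks;
}
```

An empty result from pick_candidates corresponds to the case where the frame would be classified as unvoiced.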

If no candidates for the fundamental period are found, the frame is classified as being unvoiced. Otherwise, the candidates are further processed to identify the most likely true pitch period. During this additional processing, a threshold given by anti_doubling_thresh is used.

If the peak_tracking flag is set to true, biasing is applied to the cross-correlation track as described in (Bagshaw et al., 1993).

Parameters
sig                  input waveform
op                   options regarding pitch tracking parameters
op.min_pitch         minimum permitted F0 value
op.max_pitch         maximum permitted F0 value
op.pda_frame_shift   analysis frame shift
op.pda_frame_length  analysis frame length
op.lpf_cutoff        cut off frequency for low pass filtering
op.lpf_order         order of low pass filtering (must be odd)
op.decimation
op.noise_floor
op.min_v2uv_coef_thresh
op.v2uv_coef_thresh_ratio
op.v2uv_coef_thresh
op.anti_doubling_thresh
op.peak_tracking

Definition at line 85 of file pda.cc.

void smooth_phrase ( EST_Track & c,
EST_Track & speech,
EST_Features & options,
EST_Track & sm 
)

Smooth selected parts of an f0 contour. Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.

Definition at line 54 of file smooth_pda.cc.

void smooth_portion ( EST_Track & c,
EST_Option & op 
)

Smooth all the points in an F0 contour.