Pitch/F0 Detection Algorithm functions

Functions
void	pda (EST_Wave &sig, EST_Track &fz, EST_Features &op, EST_String method="")

void	icda (EST_Wave &sig, EST_Track &fz, EST_Track &speech, EST_Option &op, EST_String method="")

void	default_pda_options (EST_Features &al)

void	srpd (EST_Wave &sig, EST_Track &fz, EST_Features &options)

void	smooth_phrase (EST_Track &c, EST_Track &speech, EST_Features &options, EST_Track &sm)

void	smooth_portion (EST_Track &c, EST_Option &op)

Detailed Description

These functions are used to produce a track of fundamental frequency (F0) against time of a waveform.

Function Documentation

void pda	(	EST_Wave &	sig,
		EST_Track &	fz,
		EST_Features &	op,
		EST_String	method = `""`
	)

Top level pitch (F0) detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point.

At present, only the srpd pitch tracker is implemented, so it is always called regardless of the value of method.

Parameters

sig	input waveform
fz	output f0 contour
op	parameters for pitch tracker
method	pda method to be used.

Definition at line 51 of file pda.cc.

void icda	(	EST_Wave &	sig,
		EST_Track &	fz,
		EST_Track &	speech,
		EST_Option &	op,
		EST_String	method = `""`
	)

Top level intonation contour detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point. icda differs from pda in that the contour is smoothed, and unvoiced portions have interpolated F0 values.

Parameters

sig	input waveform
fz	output f0 contour
speech	Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.
op	parameters for pitch tracker
method	pda method to be used.
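
The role of the speech track can be illustrated with a small standalone sketch. It does not use the library: the function name interpolate_unvoiced is hypothetical, plain vectors stand in for EST_Track, unvoiced frames are assumed to be marked by an F0 of zero or less, and linear interpolation is an assumption, since the text above does not say which interpolation method icda uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the library.
// Fills unvoiced frames (f0 <= 0) by linear interpolation between the
// nearest voiced neighbours, but only where speech[k] > 0.
std::vector<double> interpolate_unvoiced(const std::vector<double>& f0,
                                         const std::vector<double>& speech) {
    assert(f0.size() == speech.size());
    std::vector<double> out = f0;
    const std::size_t n = f0.size();
    std::size_t i = 0;
    while (i < n) {
        if (f0[i] > 0.0) { ++i; continue; }
        // [i, j) is a run of unvoiced frames.
        std::size_t j = i;
        while (j < n && f0[j] <= 0.0) ++j;
        // Interpolate only when the run has a voiced frame on both sides.
        if (i > 0 && j < n) {
            const double left = f0[i - 1];
            const double right = f0[j];
            const double span = static_cast<double>(j - (i - 1));
            for (std::size_t k = i; k < j; ++k) {
                if (speech[k] > 0.0) {
                    const double t = static_cast<double>(k - (i - 1)) / span;
                    out[k] = left + (right - left) * t;
                }
            }
        }
        i = j;
    }
    return out;
}
```

With f0 = {100, 0, 0, 130} and a speech track that is positive everywhere, the two unvoiced frames become 110 and 120. A frame whose speech value is not positive keeps its original value.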

void default_pda_options ( EST_Features & al )

Create a set of sensible defaults for use in pda and icda.

Definition at line 229 of file pda.cc.

void srpd	(	EST_Wave &	sig,
		EST_Track &	fz,
		EST_Features &	options
	)

Super resolution pitch tracker.

srpd is a pitch detection algorithm that produces a fundamental frequency contour from a speech waveform. At present only the super resolution pitch determination algorithm is implemented. See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed description of the algorithm.

Frames of data are read in from sig in chronological order such that each frame is shifted in time from its predecessor by pda_frame_shift. Each frame is analysed in turn.
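
The framing described above can be sketched as a loop over frame start positions. This is a standalone illustration, not library code: the function name frame_starts is hypothetical, and both the frame length and the frame shift are assumed to be already expressed in samples (the units of pda_frame_shift and pda_frame_length are not stated here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: start index of every complete frame, in
// chronological order, each shifted by frame_shift samples.
std::vector<std::size_t> frame_starts(std::size_t num_samples,
                                      std::size_t frame_length,
                                      std::size_t frame_shift) {
    assert(frame_shift > 0);
    std::vector<std::size_t> starts;
    for (std::size_t s = 0; s + frame_length <= num_samples; s += frame_shift) {
        starts.push_back(s);
    }
    return starts;
}
```

For a 10-sample signal with a frame length of 4 and a shift of 2, the frames start at samples 0, 2, 4 and 6.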

The maximum and minimum signal amplitudes are initially found over the duration of two segments, each of length N_min samples. If the sum of their absolute values is below two times noise_floor, the frame is classified as representing silence and no coefficients are calculated. Otherwise, a cross-correlation coefficient is calculated for all n from a period in samples corresponding to min_pitch to a period in samples corresponding to max_pitch, in steps of decimation_factor. In calculating the coefficient, only one in every decimation_factor samples of the two segments is used. Such down-sampling permits rapid estimates of the coefficients to be calculated over the range N_min <= n <= N_max. This results in a cross-correlation track for the frame being analysed.
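
The silence test and the decimated coefficient track can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the two segments are taken to be the adjacent spans [0, n) and [n, 2n) of the frame, and the coefficient is assumed to be the normalised cross-correlation of those spans.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Silence test: find the extreme amplitudes over two segments of n_min
// samples and compare |max| + |min| with twice the noise floor.
bool is_silent(const std::vector<double>& frame, std::size_t n_min,
               double noise_floor) {
    assert(n_min > 0 && frame.size() >= 2 * n_min);
    double lo = frame[0];
    double hi = frame[0];
    for (std::size_t i = 1; i < 2 * n_min; ++i) {
        if (frame[i] < lo) lo = frame[i];
        if (frame[i] > hi) hi = frame[i];
    }
    return std::fabs(hi) + std::fabs(lo) < 2.0 * noise_floor;
}

// Normalised cross-correlation between frame[0, n) and frame[n, 2n),
// using only one in every `decimation` samples (assumed formulation).
double decimated_xcorr(const std::vector<double>& frame, std::size_t n,
                       std::size_t decimation) {
    assert(decimation > 0 && frame.size() >= 2 * n);
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i < n; i += decimation) {
        const double x = frame[i];
        const double y = frame[i + n];
        xy += x * y;
        xx += x * x;
        yy += y * y;
    }
    if (xx == 0.0 || yy == 0.0) return 0.0;  // avoid dividing by zero
    return xy / std::sqrt(xx * yy);
}

// Coefficient track over n_min <= n <= n_max in steps of `decimation`.
std::vector<double> xcorr_track(const std::vector<double>& frame,
                                std::size_t n_min, std::size_t n_max,
                                std::size_t decimation) {
    std::vector<double> track;
    for (std::size_t n = n_min; n <= n_max; n += decimation) {
        track.push_back(decimated_xcorr(frame, n, decimation));
    }
    return track;
}
```

For a signal that repeats every 4 samples, the coefficient at n = 4 is 1, while at n = 2 the two segments are in antiphase and the coefficient is -1.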

Local maxima of the track with a coefficient value above a specified threshold form candidates for the fundamental period. The threshold is adaptive and dependent upon the values v2uv_coef_thresh, min_v2uv_coef_thresh, and v2uv_coef_thresh_ratio. If the previously analysed frame was classified as unvoiced or silent (which is the initial state) then the threshold is set to v2uv_coef_thresh. Otherwise, the previous frame was classified as being voiced, and the threshold is set equal to v2uv_coef_thresh_ratio times the cross-correlation coefficient value at the point of the previous fundamental period in the former coefficients track. This product is not permitted to drop below v2uv_coef_thresh.
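
The adaptive threshold and the candidate selection can be sketched as below. This is a standalone illustration, not library code: the function names candidate_threshold and find_candidates are hypothetical, and the lower bound on the product is passed in as an explicit floor_thresh argument so the sketch does not commit to which option supplies it. The numeric values in the usage note are illustrative only and are not the library's defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Threshold for the current frame, given the state of the previous frame.
// prev_peak_coef is the coefficient at the previous fundamental period.
double candidate_threshold(bool prev_voiced, double prev_peak_coef,
                           double v2uv_coef_thresh,
                           double v2uv_coef_thresh_ratio,
                           double floor_thresh) {
    assert(v2uv_coef_thresh_ratio > 0.0);
    if (!prev_voiced) {
        // Previous frame unvoiced or silent (also the initial state).
        return v2uv_coef_thresh;
    }
    // Scale the previous peak, but never drop below the floor.
    return std::max(v2uv_coef_thresh_ratio * prev_peak_coef, floor_thresh);
}

// Indices of interior local maxima whose coefficient exceeds the threshold.
std::vector<std::size_t> find_candidates(const std::vector<double>& track,
                                         double threshold) {
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < track.size(); ++i) {
        const bool is_peak = track[i] > track[i - 1] && track[i] >= track[i + 1];
        if (is_peak && track[i] > threshold) peaks.push_back(i);
    }
    return peaks;
}
```

With a threshold of 0.88, a ratio of 0.85 and a floor of 0.75, a voiced previous frame with a peak coefficient of 0.5 yields the floor value 0.75, whereas an unvoiced previous frame yields 0.88.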

If no candidates for the fundamental period are found, the frame is classified as being unvoiced. Otherwise, the candidates are further processed to identify the most likely true pitch period. During this additional processing, a threshold given by anti_doubling_thresh is used.

If the peak_tracking flag is set to true, biasing is applied to the cross-correlation track as described in (Bagshaw et al., 1993).

Parameters

sig	input waveform
op	options regarding pitch tracking parameters
op.min_pitch	minimum permitted F0 value
op.max_pitch	maximum permitted F0 value
op.pda_frame_shift	analysis frame shift
op.pda_frame_length	analysis frame length
op.lpf_cutoff	cut off frequency for low pass filtering
op.lpf_order	order of low pass filtering (must be odd)
op.decimation
op.noise_floor
op.min_v2uv_coef_thresh
op.v2uv_coef_thresh_ratio
op.v2uv_coef_thresh
op.anti_doubling_thresh
op.peak_tracking

Definition at line 85 of file pda.cc.

void smooth_phrase	(	EST_Track &	c,
		EST_Track &	speech,
		EST_Features &	options,
		EST_Track &	sm
	)

Smooth selected parts of an f0 contour. Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.

Definition at line 54 of file smooth_pda.cc.

void smooth_portion	(	EST_Track &	c,
		EST_Option &	op
	)

Smooth all the points in an F0 contour.
