Functions | |
void | pda (EST_Wave &sig, EST_Track &fz, EST_Features &op, EST_String method="") |
void | icda (EST_Wave &sig, EST_Track &fz, EST_Track &speech, EST_Option &op, EST_String method="") |
void | default_pda_options (EST_Features &al) |
void | srpd (EST_Wave &sig, EST_Track &fz, EST_Features &options) |
void | smooth_phrase (EST_Track &c, EST_Track &speech, EST_Features &options, EST_Track &sm) |
void | smooth_portion (EST_Track &c, EST_Option &op) |
These functions produce a track of fundamental frequency (F0) against time for a waveform.
void pda (EST_Wave &sig, EST_Track &fz, EST_Features &op, EST_String method="")
Top-level pitch (F0) detection algorithm. Fills fz with a track of evenly spaced frames of speech, each containing an F0 value for that point.
At present, only the srpd pitch tracker is implemented, so it is always called regardless of the value of method.
sig | input waveform |
fz | output f0 contour |
op | parameters for pitch tracker |
method | pda method to be used. |
void icda (EST_Wave &sig, EST_Track &fz, EST_Track &speech, EST_Option &op, EST_String method="")
Top-level intonation contour detection algorithm. Fills fz with a track of evenly spaced frames of speech, each containing an F0 value for that point. icda differs from pda in that the contour is smoothed and unvoiced portions are given interpolated F0 values.
sig | input waveform |
fz | output f0 contour |
speech | Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation. |
op | parameters for pitch tracker |
method | pda method to be used. |
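The role of the speech track can be illustrated with a small self-contained sketch. It is not the library's implementation: the helper name interpolate_unvoiced is hypothetical, plain std::vector stands in for EST_Track, an F0 of 0 is assumed to mark an unvoiced frame, and linear interpolation is assumed as the fill method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library). Fills unvoiced frames
// (F0 <= 0, an assumed convention) by linear interpolation between the
// nearest voiced neighbours, but only where the speech track is positive.
std::vector<double> interpolate_unvoiced(const std::vector<double>& f0,
                                         const std::vector<double>& speech)
{
    assert(f0.size() == speech.size());
    std::vector<double> out = f0;
    const std::size_t n = f0.size();
    std::size_t i = 0;
    while (i < n) {
        if (f0[i] > 0.0) { ++i; continue; }
        // [i, j) is a run of unvoiced frames.
        std::size_t j = i;
        while (j < n && f0[j] <= 0.0) ++j;
        // Interpolate only when the run has a voiced frame on both sides.
        if (i > 0 && j < n) {
            const double left = f0[i - 1];
            const double right = f0[j];
            const double span = static_cast<double>(j - i + 1);
            for (std::size_t k = i; k < j; ++k) {
                if (speech[k] > 0.0) {  // candidate for interpolation
                    const double t = static_cast<double>(k - i + 1) / span;
                    out[k] = left + t * (right - left);
                }
            }
        }
        i = j;
    }
    return out;
}
```

In this sketch, frames whose speech value is not positive keep their original value, and unvoiced runs at the very start or end of the contour are left untouched because they have no voiced neighbour on one side.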
void default_pda_options (EST_Features &al)
void srpd (EST_Wave &sig, EST_Track &fz, EST_Features &options)
Super resolution pitch tracker.
srpd is a pitch detection algorithm that produces a fundamental frequency contour from a speech waveform. At present only the super resolution pitch determination algorithm is implemented. See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed description of the algorithm.
Frames of data are read in from sig in chronological order such that each frame is shifted in time from its predecessor by pda_frame_shift. Each frame is analysed in turn.
The maximum and minimum signal amplitudes are initially found over the duration of two segments, each of length N_min samples. If the sum of their absolute values is below two times noise_floor, the frame is classified as representing silence and no coefficients are calculated. Otherwise, a cross-correlation coefficient is calculated for all n from a period in samples corresponding to min_pitch to a period in samples corresponding to max_pitch, in steps of decimation_factor. In calculating the coefficient, only one in decimation_factor samples of the two segments is used. Such down-sampling permits rapid estimates of the coefficients to be calculated over the range N_min <= n <= N_max. This results in a cross-correlation track for the frame being analysed.
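The silence test and the decimated coefficient calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the frame is a plain std::vector of samples, and the coefficient is assumed to be the normalised cross-correlation of two adjacent segments of length n.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Silence test: |max| + |min| over the first 2 * n_min samples is
// compared against 2 * noise_floor. (Hypothetical helper name.)
bool is_silent(const std::vector<double>& frame, std::size_t n_min,
               double noise_floor)
{
    assert(n_min > 0 && frame.size() >= 2 * n_min);
    const auto first = frame.begin();
    const auto last = first + static_cast<std::ptrdiff_t>(2 * n_min);
    const auto mm = std::minmax_element(first, last);
    return std::fabs(*mm.first) + std::fabs(*mm.second) < 2.0 * noise_floor;
}

// Assumed form of the coefficient: normalised cross-correlation between
// frame[0, n) and frame[n, 2n), using one in `decimation` samples.
double cross_corr(const std::vector<double>& frame, std::size_t n,
                  std::size_t decimation)
{
    assert(decimation > 0 && frame.size() >= 2 * n);
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i < n; i += decimation) {
        const double x = frame[i];
        const double y = frame[i + n];
        xy += x * y;
        xx += x * x;
        yy += y * y;
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;
}

// Cross-correlation track over N_min <= n <= N_max in steps of `decimation`.
std::vector<double> cross_corr_track(const std::vector<double>& frame,
                                     std::size_t n_min, std::size_t n_max,
                                     std::size_t decimation)
{
    assert(decimation > 0);
    std::vector<double> track;
    for (std::size_t n = n_min; n <= n_max; n += decimation)
        track.push_back(cross_corr(frame, n, decimation));
    return track;
}
```

For a signal that repeats exactly every 8 samples, cross_corr at n = 8 is 1, which is why peaks in the track point to candidate periods.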
Local maxima of the track with a coefficient value above a specified threshold form candidates for the fundamental period. The threshold is adaptive and depends on the values v2uv_coef_thresh, min_v2uv_coef_thresh, and v2uv_coef_thresh_ratio. If the previously analysed frame was classified as unvoiced or silent (which is the initial state), the threshold is set to v2uv_coef_thresh. Otherwise, the previous frame was classified as voiced, and the threshold is set equal to v2uv_coef_thresh_ratio
times the cross-correlation coefficient value at the point of the previous fundamental period in the former coefficients track. This product is not permitted to drop below min_v2uv_coef_thresh.
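The adaptive threshold and candidate selection can be sketched in a few lines. The names FrameClass, ThresholdParams, candidate_threshold and find_candidates are hypothetical and only mirror the option names. Two assumptions are made here: the lower bound on the product is taken to be min_v2uv_coef_thresh, and the end points of the track are not treated as local maxima.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Classification of the previously analysed frame (hypothetical type).
enum class FrameClass { Silent, Unvoiced, Voiced };

// Mirrors the three threshold-related options (hypothetical struct).
struct ThresholdParams {
    double v2uv_coef_thresh;
    double min_v2uv_coef_thresh;
    double v2uv_coef_thresh_ratio;
};

// Threshold for the current frame, given the previous frame's class and
// the coefficient found at the previous fundamental period.
double candidate_threshold(FrameClass previous, double previous_peak_coef,
                           const ThresholdParams& p)
{
    assert(p.v2uv_coef_thresh_ratio >= 0.0);  // sanity check for this sketch
    if (previous != FrameClass::Voiced)
        return p.v2uv_coef_thresh;
    // Assumption: the product is floored at min_v2uv_coef_thresh.
    return std::max(p.v2uv_coef_thresh_ratio * previous_peak_coef,
                    p.min_v2uv_coef_thresh);
}

// Indices of interior local maxima whose coefficient exceeds the threshold.
std::vector<std::size_t> find_candidates(const std::vector<double>& track,
                                         double threshold)
{
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < track.size(); ++i) {
        if (track[i] > threshold && track[i] > track[i - 1] &&
            track[i] >= track[i + 1])
            peaks.push_back(i);
    }
    return peaks;
}
```

An empty result from find_candidates corresponds to the case where no candidates are found for the frame.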
If no candidates for the fundamental period are found, the frame is classified as unvoiced. Otherwise, the candidates are further processed to identify the most likely true pitch period. During this additional processing, a threshold given by anti_doubling_thresh is used.
If the peak_tracking flag is set to true, biasing is applied to the cross-correlation track as described in (Bagshaw et al., 1993).
sig | input waveform |
op | options regarding pitch tracking parameters |
op.min_pitch | minimum permitted F0 value |
op.max_pitch | maximum permitted F0 value |
op.pda_frame_shift | analysis frame shift |
op.pda_frame_length | analysis frame length |
op.lpf_cutoff | cut off frequency for low pass filtering |
op.lpf_order | order of low pass filtering (must be odd) |
op.decimation | |
op.noise_floor | |
op.min_v2uv_coef_thresh | |
op.v2uv_coef_thresh_ratio | |
op.v2uv_coef_thresh | |
op.anti_doubling_thresh | |
op.peak_tracking |
void smooth_phrase (EST_Track &c, EST_Track &speech, EST_Features &options, EST_Track &sm)
Smooth selected parts of an F0 contour. Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.
Definition at line 54 of file smooth_pda.cc.
void smooth_portion (EST_Track &c, EST_Option &op)
Smooth all the points in an F0 contour.