Edinburgh Speech Tools  2.1-release
Functions for use with frame based processing

Functions

void sig2coef (EST_Wave &sig, EST_Track &a, EST_String type, float factor=2.0, EST_WindowFunc *wf=EST_Window::creator(DEFAULT_WINDOW_NAME))
 
void sigpr_base (EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)
 
void power (EST_Wave &sig, EST_Track &a, float factor)
 
void energy (EST_Wave &sig, EST_Track &a, float factor)
 
void fbank (EST_Wave &sig, EST_Track &fbank, const float factor, EST_WindowFunc *wf=EST_Window::creator(DEFAULT_WINDOW_NAME), const bool up=false, const bool take_log=true)
 
void melcep (EST_Wave &sig, EST_Track &mfcc_track, float factor, int fbank_order, float liftering_parameter, EST_WindowFunc *wf=EST_Window::creator(DEFAULT_WINDOW_NAME), const bool include_c0=false, const bool up=false)
 

Detailed Description

In the following functions, the input is an EST_Wave waveform, and the output is a (usually multi-channel) EST_Track. The track must be set up appropriately beforehand: it must be resized to the correct number of frames and channels.

The positions of the frames are found by examination of the time array in the EST_Track, which must be filled prior to the function call. The usual requirement is for fixed frame analysis, where each analysis frame is, say, 10ms after the previous one.

A common alternative is to perform pitch-synchronous analysis where the time shift is related to the local pitch period.
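The two ways of placing frames can be sketched without the library. The helpers below are hypothetical and use plain vectors in place of the track's time array; the choice to put frame i at (i + 1) * shift is an assumption made for illustration, not a statement about how the library fills its time array.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: frame times (seconds) for fixed frame analysis.
// Assumption: frame i is placed at (i + 1) * shift.
std::vector<float> fixed_frame_times(float duration, float shift)
{
    assert(shift > 0.0f);
    const int n = static_cast<int>(std::floor(duration / shift));
    std::vector<float> times;
    for (int i = 0; i < n; ++i)
        times.push_back((i + 1) * shift);
    return times;
}

// Hypothetical helper: pitch-synchronous frame times, where each
// frame advances by the local pitch period (seconds).
std::vector<float> pitch_sync_times(const std::vector<float> &periods)
{
    std::vector<float> times;
    float t = 0.0f;
    for (float p : periods) {
        assert(p > 0.0f);
        t += p;
        times.push_back(t);
    }
    return times;
}
```

With a real EST_Track, the equivalent values would be written into the track's time array before calling any of the functions on this page.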

Function Documentation

void sig2coef ( EST_Wave &sig,
EST_Track &a,
EST_String  type,
float  factor = 2.0,
EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME)
)

Produce a single set of coefficients from a waveform. The type of coefficient required is given in the argument type.

Parameters
type: Possible types are:
  • lpc: linear predictive coding
  • cep: cepstrum coding from lpc coefficients
  • melcep: Mel scale cepstrum coding via fbank
  • fbank: Mel scale log filterbank analysis
  • lsf: line spectral frequencies
  • ref: linear prediction reflection coefficients
  • power:
  • f0: srpd algorithm
  • energy: root mean square energy

The order of the analysis is calculated from the number of channels in a. The positions of the analysis windows must be given by filling in the track's time array.

This function windows the waveform at the intervals given by the track time array. The length of each window is factor * the local time shift. The windowing function is given by wf.
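The relationship between the time array, factor, and the window length can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: FrameWindow and frame_windows are hypothetical names, the local time shift is taken as the gap to the previous frame (or to time zero for the first frame), and each window is assumed to be centred on its frame time.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical type: window extent in samples.
struct FrameWindow { long start; long length; };

// Assumptions: local shift of frame i is times[i] - times[i-1]
// (times[0] for the first frame); windows are centred on times[i].
std::vector<FrameWindow> frame_windows(const std::vector<float> &times,
                                       int sample_rate,
                                       float factor = 2.0f)
{
    assert(sample_rate > 0 && factor > 0.0f);
    std::vector<FrameWindow> windows;
    float prev = 0.0f;
    for (float t : times) {
        const float shift = t - prev;
        // Window length is factor * the local time shift.
        const long length = std::lround(factor * shift * sample_rate);
        const long centre = std::lround(t * sample_rate);
        // start may fall before sample 0 or the window may run past
        // the end of the signal; a caller would need to clip or pad.
        windows.push_back({centre - length / 2, length});
        prev = t;
    }
    return windows;
}
```

With a 10 ms shift at 16 kHz and the default factor of 2.0, each window in this sketch covers 320 samples, so adjacent windows overlap by half.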

Parameters
sig: input waveform
a: output coefficients. These have been pre-allocated and the number of channels in a indicates the order of the analysis.
type: the type of coefficients to be produced: "lpc", "cep" etc.
factor: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.
wf: function for windowing. See Windowing mechanisms

Definition at line 399 of file sigpr_utt.cc.

void sigpr_base ( EST_Wave &sig,
EST_Track &fv,
EST_Features &op,
const EST_StrList &slist
)

Produce multiple coefficients from a waveform by repeated calls to sig2coef.

Parameters
sig: input waveform
fv: output coefficients. These have been pre-allocated and the number of channels in fv indicates the order of the analysis.
op: Features structure containing options for analysis order, frame shift etc.
slist: list of types of coefficients required, from the set of possible types that sig2coef can take.

Definition at line 138 of file sigpr_utt.cc.

void power ( EST_Wave &sig,
EST_Track &a,
float  factor
)

Calculate the power for each frame of the waveform.

Parameters
sig: input waveform
a: output power track
factor: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.

Definition at line 422 of file sigpr_utt.cc.

void energy ( EST_Wave &sig,
EST_Track &a,
float  factor
)

Calculate the root mean square (RMS) energy for each frame of the waveform.

This function calls sig2energy
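For reference, the root mean square of one frame is the square root of the mean of the squared samples. The function below is a minimal, library-independent sketch of that definition; frame_rms is a hypothetical name and is not the sig2energy routine itself.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: RMS of the samples in one analysis frame,
// sqrt(sum(x^2) / n). Accumulates in double to limit rounding error.
float frame_rms(const std::vector<float> &frame)
{
    assert(!frame.empty());
    double sum_sq = 0.0;
    for (float x : frame)
        sum_sq += static_cast<double>(x) * x;
    return static_cast<float>(std::sqrt(sum_sq / frame.size()));
}
```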

Parameters
sig: input waveform
a: output coefficients
factor: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.

Definition at line 445 of file sigpr_utt.cc.

void fbank ( EST_Wave &sig,
EST_Track &fbank,
const float  factor,
EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME),
const bool  up = false,
const bool  take_log = true
)

Mel scale filter bank analysis. The Mel scale triangular filters are computed via an FFT (see fastFFT). This routine is required for Mel cepstral analysis (see melcep). The analysis of each frame is done by sig2fbank.

A typical filter bank analysis for speech recognition might use log energy outputs from 20 filters.
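To make the Mel spacing concrete, the sketch below places the centre frequencies of a bank of triangular filters evenly on the Mel scale. It is an illustration only: the Hz to Mel mapping shown is one widely used formula and is assumed here, the filters are assumed to span 0 Hz to the Nyquist frequency, and hz_to_mel, mel_to_hz and mel_centres are hypothetical names rather than library functions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One common Hz <-> Mel mapping (assumed for this sketch).
double hz_to_mel(double hz)  { return 2595.0 * std::log10(1.0 + hz / 700.0); }
double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Centre frequencies (Hz) of n_filters triangular filters spaced
// evenly on the Mel scale between 0 Hz and the Nyquist frequency.
std::vector<double> mel_centres(int n_filters, double sample_rate)
{
    assert(n_filters > 0 && sample_rate > 0.0);
    const double mel_max = hz_to_mel(sample_rate / 2.0);
    const double step = mel_max / (n_filters + 1);
    std::vector<double> centres;
    for (int i = 1; i <= n_filters; ++i)
        centres.push_back(mel_to_hz(i * step));
    return centres;
}
```

Because the spacing is uniform in Mel rather than in Hz, the centres are close together at low frequencies and spread out towards the Nyquist frequency.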

Parameters
sig: input waveform
fbank: the output. The number of filters is determined from the number of channels in this track.
factor: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.
wf: function for windowing. See Windowing mechanisms
up: whether the filterbank analysis should use power rather than energy.
take_log: whether to take logs of the filter outputs
See also
sig2fbank
melcep

Definition at line 496 of file sigpr_utt.cc.

void melcep ( EST_Wave &sig,
EST_Track &mfcc_track,
float  factor,
int  fbank_order,
float  liftering_parameter,
EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME),
const bool  include_c0 = false,
const bool  up = false
)

Mel scale cepstral analysis via filter bank analysis. Cepstral parameters are computed for each frame of speech. The analysis requires fbank. The cepstral analysis of the filterbank outputs is performed by fbank2melcep.

A typical Mel cepstral coefficient (MFCC) analysis for speech recognition might use 12 cepstral coefficients computed from a 20 channel filterbank.
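The step from log filter bank outputs to cepstral coefficients is conventionally a discrete cosine transform followed by optional liftering. The sketch below shows one common formulation (a type-II DCT scaled by sqrt(2/N), with sinusoidal liftering). Both the formulas and the name fbank_to_cepstra are assumptions made for illustration; they are not taken from fbank2melcep.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cepstra from log filter bank outputs.
// Assumed DCT:       c_k = sqrt(2/N) * sum_j m_j * cos(pi * k * (j + 0.5) / N)
// Assumed liftering: c_k *= 1 + (L / 2) * sin(pi * k / L), skipped if L <= 0.
std::vector<double> fbank_to_cepstra(const std::vector<double> &log_fbank,
                                     int n_cep,
                                     double lifter,
                                     bool include_c0 = false)
{
    assert(!log_fbank.empty() && n_cep > 0);
    const double pi = std::acos(-1.0);
    const double n = static_cast<double>(log_fbank.size());
    const double scale = std::sqrt(2.0 / n);
    std::vector<double> cep;
    for (int k = include_c0 ? 0 : 1; k <= n_cep; ++k) {
        double sum = 0.0;
        for (std::size_t j = 0; j < log_fbank.size(); ++j)
            sum += log_fbank[j] * std::cos(pi * k * (j + 0.5) / n);
        double c = scale * sum;
        if (lifter > 0.0)
            c *= 1.0 + 0.5 * lifter * std::sin(pi * k / lifter);
        cep.push_back(c);
    }
    return cep;
}
```

Calling this with a 20-element input and n_cep = 12 mirrors the typical configuration described above: 12 coefficients, or 13 when the zeroth coefficient is included.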

Parameters
sig: input waveform
mfcc_track: the output
factor: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.
fbank_order: the number of Mel scale filters used for the analysis
liftering_parameter: for filtering in the cepstral domain. See fbank2melcep
wf: function for windowing. See Windowing mechanisms
include_c0: whether the zeroth cepstral coefficient is to be included
up: whether the filterbank analysis should use power rather than energy.
See also
fbank
fbank2melcep

Definition at line 540 of file sigpr_utt.cc.