Edinburgh Speech Tools  2.1-release
Frame based filter bank and cepstral analysis

Modules

 Frequency conversion functions
 end of filter bank and cepstral analysis
 

Functions

void sig2fbank (const EST_FVector &sig, EST_FVector &fbank_frame, const float sample_rate, const bool use_power_rather_than_energy, const bool take_log)
 
void sig2fft (const EST_FVector &sig, EST_FVector &fft_vec, const bool use_power_rather_than_energy)
 
void fft2fbank (const EST_FVector &fft_frame, EST_FVector &fbank_vec, const float Hz_per_fft_coeff, const EST_FVector &mel_fbank_frequencies)
 
void fbank2melcep (const EST_FVector &fbank_vec, EST_FVector &mfcc, const float liftering_parameter, const bool include_c0=false)
 
void make_mel_triangular_filter (const float this_mel_centre, const float this_mel_low, const float this_mel_high, const float Hz_per_fft_coeff, const int half_fft_order, int &fft_index_start, EST_FVector &filter)
 

Detailed Description

Function Documentation

void sig2fbank ( const EST_FVector & sig,
EST_FVector & fbank_frame,
const float  sample_rate,
const bool  use_power_rather_than_energy,
const bool  take_log 
)

Calculate the (log) energy (or power) in each channel of a Mel scale filter bank for a frame of speech. The filters are triangular, evenly spaced and of equal width on a Mel scale. The lower and upper cutoffs of each filter lie at the centre frequencies of the adjacent filters, so neighbouring filters overlap by half their width. The Mel scale is described under Hz2Mel.

See also
Hz2Mel
sig2fft
fft2fbank

Definition at line 538 of file sigpr_frame.cc.
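
The filter layout described for sig2fbank can be sketched as follows. This is only an illustration under stated assumptions, not the code in sigpr_frame.cc: the helper names hz_to_mel, mel_to_hz and mel_filter_edges are hypothetical, plain std::vector stands in for EST_FVector, and the Hz to Mel mapping shown is a common textbook form that may differ from the one Hz2Mel uses.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed Hz <-> Mel mapping (a common textbook form); the mapping used by
// the library's Hz2Mel / Mel2Hz may differ.
inline float hz_to_mel(float hz) { return 1127.0f * std::log(1.0f + hz / 700.0f); }
inline float mel_to_hz(float mel) { return 700.0f * (std::exp(mel / 1127.0f) - 1.0f); }

// Hypothetical helper: return n_filters + 2 values in Mel, evenly spaced
// between low_hz and high_hz. Element 0 is the lower edge of the first
// filter, the last element is the upper edge of the last filter, and the
// elements in between are the filter centres.
std::vector<float> mel_filter_edges(int n_filters, float low_hz, float high_hz)
{
    assert(n_filters > 0 && low_hz >= 0.0f && high_hz > low_hz);
    const float mel_low = hz_to_mel(low_hz);
    const float mel_high = hz_to_mel(high_hz);
    const float step = (mel_high - mel_low) / static_cast<float>(n_filters + 1);

    std::vector<float> edges(n_filters + 2);
    for (int i = 0; i < n_filters + 2; ++i)
        edges[i] = mel_low + step * static_cast<float>(i);
    return edges;
}
```

Because each filter's cutoffs coincide with its neighbours' centres, filter k is fully described by elements k, k+1 and k+2 of the returned vector.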

void sig2fft ( const EST_FVector & sig,
EST_FVector & fft_vec,
const bool  use_power_rather_than_energy 
)

Calculate the energy (or power) spectrum of a frame of speech. The FFT order is determined by the number of samples in the frame of speech, and is a power of 2. Note that the FFT vector returned corresponds to frequencies from 0 to half the sample rate. Energy is the magnitude of the FFT; power is the squared magnitude.

See also
fft2fbank
sig2fbank

Definition at line 590 of file sigpr_frame.cc.
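
The distinction between energy and power can be illustrated with a small stand-alone sketch. It assumes a direct O(N^2) DFT in place of the library's FFT, uses std::vector rather than EST_FVector, and returns N/2 values; the function name frame_spectrum and the exact output length are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: energy (magnitude) or power (squared magnitude)
// spectrum of one frame, for bins 0 .. N/2 - 1. A direct DFT stands in for
// the FFT to keep the sketch short.
std::vector<float> frame_spectrum(const std::vector<float> &frame, bool use_power)
{
    const std::size_t n = frame.size();
    assert(n >= 2 && (n & (n - 1)) == 0);  // frame length must be a power of 2
    const double pi = std::acos(-1.0);

    std::vector<float> spec(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            acc += static_cast<double>(frame[t]) *
                   std::complex<double>(std::cos(angle), std::sin(angle));
        }
        const double mag = std::abs(acc);
        spec[k] = static_cast<float>(use_power ? mag * mag : mag);
    }
    return spec;
}
```

For a frame of N samples at sample rate fs, bin k in this sketch corresponds to k * fs / N Hz, which is the spacing the parameter Hz_per_fft_coeff describes elsewhere on this page.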

void fft2fbank ( const EST_FVector & fft_frame,
EST_FVector & fbank_vec,
const float  Hz_per_fft_coeff,
const EST_FVector & mel_fbank_frequencies 
)

Given a Mel filter bank description, bin the FFT coefficients to compute the output of the filters. The first and last elements of mel_fbank_frequencies define the lower and upper bound of the first and last filters respectively and the intervening elements give the filter centre frequencies. That is, mel_fbank_frequencies has two more elements than fbank_vec.

See also
fastFFT
sig2fft
sig2fbank
fbank2melcep

Definition at line 635 of file sigpr_frame.cc.
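
A minimal sketch of the binning step, under stated assumptions: the filter bank description holds Mel values, the triangles are linear on the Mel axis, and the Hz to Mel mapping is a common textbook form. The names hz_to_mel and bin_spectrum are hypothetical, and the library's own weighting may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed Hz -> Mel mapping; the library's Hz2Mel may differ.
inline float hz_to_mel(float hz) { return 1127.0f * std::log(1.0f + hz / 700.0f); }

// Hypothetical helper: apply triangular filters to a spectrum. mel_edges has
// two more elements than the result: filter k rises from mel_edges[k] to a
// peak at mel_edges[k + 1] and falls back to zero at mel_edges[k + 2].
std::vector<float> bin_spectrum(const std::vector<float> &spectrum,
                                float hz_per_coeff,
                                const std::vector<float> &mel_edges)
{
    assert(mel_edges.size() >= 3 && hz_per_coeff > 0.0f);
    const std::size_t n_filters = mel_edges.size() - 2;
    std::vector<float> fbank(n_filters, 0.0f);

    for (std::size_t k = 0; k < n_filters; ++k) {
        const float low = mel_edges[k];
        const float centre = mel_edges[k + 1];
        const float high = mel_edges[k + 2];
        for (std::size_t j = 0; j < spectrum.size(); ++j) {
            const float mel = hz_to_mel(static_cast<float>(j) * hz_per_coeff);
            float w = 0.0f;
            if (mel > low && mel <= centre)
                w = (mel - low) / (centre - low);    // rising slope
            else if (mel > centre && mel < high)
                w = (high - mel) / (high - centre);  // falling slope
            fbank[k] += w * spectrum[j];
        }
    }
    return fbank;
}
```

With evenly spaced edges, the weights of two adjacent filters sum to one at any point between the first and last centres, so a single spectral component in that range is shared between exactly two channels.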

void fbank2melcep ( const EST_FVector & fbank_vec,
EST_FVector & mfcc,
const float  liftering_parameter,
const bool  include_c0 = false 
)

Compute the discrete cosine transform of log Mel-scale filter bank output to get the Mel cepstral coefficients for a frame of speech. Optional liftering (filtering in the cepstral domain) can be applied to normalise the magnitudes of the coefficients. This is useful because, typically, the higher order cepstral coefficients are significantly smaller than the lower ones and it is often desirable to normalise the means and variances across coefficients.

The lifter (cepstral filter) used is:

\[c_i' = \left\{ 1 + \frac{L}{2} \sin \frac{\pi i}{L} \right\} \; c_i\]

A typical value of L used in speech recognition is 22. A value of L=0 is taken to mean no liftering. This is equivalent to L=1, because sin(πi) = 0 for every integer i, which leaves each coefficient unchanged.

See also
sig2fft
fft2fbank
sig2fbank

Definition at line 684 of file sigpr_frame.cc.
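
The transform and the lifter can be sketched together as below. The DCT scaling (a factor of sqrt(2/N)) and the cosine argument follow a widely used convention and are assumptions, as are the function name fbank_to_melcep and the n_cep parameter; the library's own normalisation may differ. The lifter follows the formula given above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: cepstral coefficients c_1 .. c_n_cep (preceded by c_0
// when include_c0 is true) from log filter bank outputs, with optional
// liftering. lifter == 0 means no liftering.
std::vector<float> fbank_to_melcep(const std::vector<float> &log_fbank, int n_cep,
                                   float lifter, bool include_c0)
{
    const int n = static_cast<int>(log_fbank.size());
    assert(n > 0 && n_cep > 0 && n_cep < n && lifter >= 0.0f);
    const double pi = std::acos(-1.0);

    std::vector<float> mfcc;
    for (int i = include_c0 ? 0 : 1; i <= n_cep; ++i) {
        // Discrete cosine transform (assumed DCT-II form and scaling).
        double c = 0.0;
        for (int j = 0; j < n; ++j)
            c += log_fbank[j] * std::cos(pi * i * (j + 0.5) / n);
        c *= std::sqrt(2.0 / n);

        // Lifter: c_i' = (1 + (L / 2) sin(pi i / L)) c_i
        if (lifter > 0.0f)
            c *= 1.0 + 0.5 * lifter * std::sin(pi * i / lifter);
        mfcc.push_back(static_cast<float>(c));
    }
    return mfcc;
}
```

With L = 22 the gain applied to c_1 is 1 + 11 sin(π/22), roughly 2.57, and it grows for higher-order coefficients up to i = 11, which is how the lifter boosts the smaller high-order terms.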

void make_mel_triangular_filter ( const float  this_mel_centre,
const float  this_mel_low,
const float  this_mel_high,
const float  Hz_per_fft_coeff,
const int  half_fft_order,
int & fft_index_start,
EST_FVector & filter 
)

Make a triangular Mel scale filter. The filter is centred at this_mel_centre and extends from this_mel_low to this_mel_high. half_fft_order is the length of a power/energy spectrum covering 0Hz to half the sampling frequency with a resolution of Hz_per_fft_coeff.

The routine returns a vector of weights to be applied to the energy/power spectrum starting at element fft_index_start. The number of points (FFT coefficients) covered by the filter is given by the length of the returned vector filter.

See also
fft2fbank
Hz2Mel
Mel2Hz

Definition at line 730 of file sigpr_frame.cc.
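
How a single filter's weights and start index relate to the spectrum can be sketched as follows. This is an illustration only: the struct TriangularFilter and the functions hz_to_mel and make_triangular_filter are hypothetical, the weights are assumed to be linear on the Mel axis, and the Hz to Mel mapping is a common textbook form that may differ from the library's.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed Hz -> Mel mapping; the library's Hz2Mel may differ.
inline float hz_to_mel(float hz) { return 1127.0f * std::log(1.0f + hz / 700.0f); }

// Hypothetical result type: weights apply to spectrum elements
// fft_index_start, fft_index_start + 1, ...
struct TriangularFilter {
    int fft_index_start;
    std::vector<float> weights;
};

// Hypothetical helper: build one triangular filter covering the spectrum
// bins that fall strictly between mel_low and mel_high.
TriangularFilter make_triangular_filter(float mel_centre, float mel_low,
                                        float mel_high, float hz_per_coeff,
                                        int half_fft_order)
{
    assert(mel_low < mel_centre && mel_centre < mel_high);
    assert(hz_per_coeff > 0.0f && half_fft_order > 0);

    TriangularFilter f{0, {}};
    for (int j = 0; j < half_fft_order; ++j) {
        const float mel = hz_to_mel(static_cast<float>(j) * hz_per_coeff);
        if (mel <= mel_low) continue;   // below the filter
        if (mel >= mel_high) break;     // above the filter; bins only go up
        if (f.weights.empty()) f.fft_index_start = j;
        const float w = (mel <= mel_centre)
                            ? (mel - mel_low) / (mel_centre - mel_low)
                            : (mel_high - mel) / (mel_high - mel_centre);
        f.weights.push_back(w);
    }
    return f;
}
```

Returning only the non-zero span, together with its start index, keeps each filter short, so applying it costs one multiply-add per covered bin rather than one per spectrum element.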