Modules
Frequency conversion functions
end of filter bank and cepstral analysis
Functions
void sig2fbank (const EST_FVector &sig, EST_FVector &fbank_frame, const float sample_rate, const bool use_power_rather_than_energy, const bool take_log)
void sig2fft (const EST_FVector &sig, EST_FVector &fft_vec, const bool use_power_rather_than_energy)
void fft2fbank (const EST_FVector &fft_frame, EST_FVector &fbank_vec, const float Hz_per_fft_coeff, const EST_FVector &mel_fbank_frequencies)
void fbank2melcep (const EST_FVector &fbank_vec, EST_FVector &mfcc, const float liftering_parameter, const bool include_c0=false)
void make_mel_triangular_filter (const float this_mel_centre, const float this_mel_low, const float this_mel_high, const float Hz_per_fft_coeff, const int half_fft_order, int &fft_index_start, EST_FVector &filter)
void sig2fbank (const EST_FVector &sig,
                EST_FVector &fbank_frame,
                const float sample_rate,
                const bool use_power_rather_than_energy,
                const bool take_log)
Calculate the (log) energy (or power) in each channel of a Mel scale filter bank for a frame of speech. The filters are triangular, evenly spaced and of equal width on the Mel scale, and the upper and lower cutoffs of each filter lie at the centre frequencies of the adjacent filters. The Mel scale is described under Hz2Mel.
Definition at line 538 of file sigpr_frame.cc.
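The filter layout described above can be sketched as follows. This is a minimal sketch, not the library's implementation: it uses std::vector<float> in place of EST_FVector, the helper names hz_to_mel_sketch, mel_filter_edges and log_fbank_in_place are hypothetical, and it assumes the common Mel formula 1127 ln(1 + f/700), which may differ in detail from Hz2Mel. It places num_filters + 2 points evenly on the Mel scale from 0 Hz to half the sample rate; each interior point is a filter centre, and its neighbours are that filter's cutoffs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: a common Mel-scale formula (assumed; may differ from Hz2Mel).
float hz_to_mel_sketch(float hz) {
    return 1127.0f * std::log(1.0f + hz / 700.0f);
}

// Hypothetical helper: num_filters + 2 Mel values evenly spaced from 0 Hz to
// half the sample rate. Element 0 is the lower cutoff of the first filter,
// the last element is the upper cutoff of the last filter, and the elements
// in between are the filter centres.
std::vector<float> mel_filter_edges(int num_filters, float sample_rate) {
    assert(num_filters > 0 && sample_rate > 0.0f);
    const float mel_low = 0.0f;
    const float mel_high = hz_to_mel_sketch(sample_rate / 2.0f);
    std::vector<float> edges(num_filters + 2);
    for (int i = 0; i < num_filters + 2; ++i)
        edges[i] = mel_low + i * (mel_high - mel_low) / (num_filters + 1);
    return edges;
}

// Optional log step (the take_log case); a small floor avoids log(0) on silent frames.
void log_fbank_in_place(std::vector<float> &fbank, float floor_value = 1e-10f) {
    for (float &v : fbank)
        v = std::log(std::max(v, floor_value));
}
```

In this sketch the edge vector has exactly the shape that fft2fbank expects for mel_fbank_frequencies: two more elements than there are channels.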
void sig2fft (const EST_FVector &sig,
              EST_FVector &fft_vec,
              const bool use_power_rather_than_energy)
Calculate the energy (or power) spectrum of a frame of speech. The FFT order is determined by the number of samples in the frame of speech, and is a power of 2. Note that the FFT vector returned corresponds to frequencies from 0 to half the sample rate. Energy is the magnitude of the FFT; power is the squared magnitude.
Definition at line 590 of file sigpr_frame.cc.
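The spectrum computation can be sketched as below. This is a hedged sketch rather than the library code: spectrum_sketch and next_power_of_two are hypothetical names, the frame is assumed to be zero-padded to the next power of two N, and a direct O(N^2) DFT is swapped in for an FFT for readability (the values agree up to rounding). Following this document's terminology, "energy" is the magnitude and "power" its square; N/2 bins are returned, covering 0 Hz up to half the sample rate.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical helper: smallest power of two >= n.
int next_power_of_two(int n) {
    int p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Sketch of an energy/power spectrum using a direct DFT (not an FFT).
// Samples beyond sig.size() are treated as zero padding up to N.
std::vector<float> spectrum_sketch(const std::vector<float> &sig,
                                   bool use_power_rather_than_energy) {
    assert(!sig.empty());
    const int n = next_power_of_two(static_cast<int>(sig.size()));
    const double pi = std::acos(-1.0);
    std::vector<float> out(n / 2);
    for (int k = 0; k < n / 2; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (int t = 0; t < static_cast<int>(sig.size()); ++t)  // padded samples add nothing
            acc += static_cast<double>(sig[t]) * std::polar(1.0, -2.0 * pi * k * t / n);
        const double mag = std::abs(acc);  // "energy" in this document's terminology
        out[k] = static_cast<float>(use_power_rather_than_energy ? mag * mag : mag);
    }
    return out;
}
```

With N/2 bins spanning 0 Hz to half the sample rate, the bin spacing that would be passed on as Hz_per_fft_coeff works out to sample_rate / N.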
void fft2fbank (const EST_FVector &fft_frame,
                EST_FVector &fbank_vec,
                const float Hz_per_fft_coeff,
                const EST_FVector &mel_fbank_frequencies)
Given a Mel filter bank description, bin the FFT coefficients to compute the output of the filters. The first and last elements of mel_fbank_frequencies define the lower and upper bound of the first and last filters respectively, and the intervening elements give the filter centre frequencies. That is, mel_fbank_frequencies has two more elements than fbank_vec.
Definition at line 635 of file sigpr_frame.cc.
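The binning step can be sketched as follows, under stated assumptions: bin_fft_sketch, triangle_weight and hz_to_mel_fb are hypothetical names, std::vector<float> stands in for EST_FVector, the frequency vector is assumed to hold Mel values, and the Mel formula 1127 ln(1 + f/700) is assumed. Filter j runs from element j to element j + 2 and peaks at element j + 1, so each filter's cutoffs are its neighbours' centres.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed Mel formula (may differ in detail from the library's Hz2Mel).
float hz_to_mel_fb(float hz) { return 1127.0f * std::log(1.0f + hz / 700.0f); }

// Triangular weight of a point at Mel value m for a filter (lo, centre, hi).
float triangle_weight(float m, float lo, float centre, float hi) {
    if (m <= lo || m >= hi) return 0.0f;
    if (m <= centre) return (m - lo) / (centre - lo);
    return (hi - m) / (hi - centre);
}

// Sketch of fft2fbank-style binning: mel_edges has two more elements than the output.
std::vector<float> bin_fft_sketch(const std::vector<float> &fft_frame,
                                  float hz_per_fft_coeff,
                                  const std::vector<float> &mel_edges) {
    assert(mel_edges.size() >= 3);
    std::vector<float> fbank(mel_edges.size() - 2, 0.0f);
    for (std::size_t k = 0; k < fft_frame.size(); ++k) {
        const float m = hz_to_mel_fb(k * hz_per_fft_coeff);  // Mel value of bin k
        for (std::size_t j = 0; j < fbank.size(); ++j)
            fbank[j] += fft_frame[k] *
                        triangle_weight(m, mel_edges[j], mel_edges[j + 1], mel_edges[j + 2]);
    }
    return fbank;
}
```

Because adjacent triangles share cutoffs and centres, the weights of the two overlapping filters sum to one at any point between the first and last centre frequencies, so a single spectral component is split between neighbouring channels rather than lost or double-counted.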
void fbank2melcep (const EST_FVector &fbank_vec,
                   EST_FVector &mfcc,
                   const float liftering_parameter,
                   const bool include_c0 = false)
Compute the discrete cosine transform of log Mel-scale filter bank output to get the Mel cepstral coefficients for a frame of speech. Optional liftering (filtering in the cepstral domain) can be applied to normalise the magnitudes of the coefficients. This is useful because, typically, the higher order cepstral coefficients are significantly smaller than the lower ones and it is often desirable to normalise the means and variances across coefficients.
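The lifter (cepstral filter) used is:
$$c_i' = \left(1 + \frac{L}{2}\sin\frac{\pi i}{L}\right) c_i$$
where c_i is the i-th cepstral coefficient and L is liftering_parameter. A typical value of L used in speech recognition is 22. A value of L=0 is taken to mean no liftering, which is equivalent to L=1, since sin(πi) = 0 for every integer i.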
Definition at line 684 of file sigpr_frame.cc.
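A sketch of the DCT plus liftering step follows. It is illustrative only: melcep_sketch and lifter_weight are hypothetical names, the input is assumed to already be log filter bank output, and the DCT-II form with sqrt(2/N) scaling (each channel sampled at j + 0.5) is an assumption whose normalisation may differ from the library's. The lifter assumed here is the sinusoidal 1 + (L/2) sin(πi/L), applied only when L > 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sinusoidal lifter weight for cepstral index i (assumed form).
double lifter_weight(int i, double L) {
    const double pi = std::acos(-1.0);
    return 1.0 + (L / 2.0) * std::sin(pi * i / L);
}

// Sketch: DCT-II of log filter bank outputs with optional liftering.
std::vector<float> melcep_sketch(const std::vector<float> &log_fbank,
                                 int num_coeffs, double liftering_parameter,
                                 bool include_c0 = false) {
    assert(!log_fbank.empty() && num_coeffs > 0);
    const int n = static_cast<int>(log_fbank.size());
    const double pi = std::acos(-1.0);
    const double scale = std::sqrt(2.0 / n);           // assumed normalisation
    const bool do_lifter = liftering_parameter > 1.0;  // L = 0 or 1: no liftering
    std::vector<float> mfcc(num_coeffs);
    for (int c = 0; c < num_coeffs; ++c) {
        const int i = include_c0 ? c : c + 1;  // skip c0 unless requested
        double acc = 0.0;
        for (int j = 0; j < n; ++j)
            acc += log_fbank[j] * std::cos(pi * i * (j + 0.5) / n);
        acc *= scale;
        if (do_lifter) acc *= lifter_weight(i, liftering_parameter);
        mfcc[c] = static_cast<float>(acc);
    }
    return mfcc;
}
```

A flat log spectrum puts all of its energy in c0 and leaves the higher coefficients at zero, and with L = 22 the lifter boosts the middle coefficient c11 by a factor of 12.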
void make_mel_triangular_filter (const float this_mel_centre,
                                 const float this_mel_low,
                                 const float this_mel_high,
                                 const float Hz_per_fft_coeff,
                                 const int half_fft_order,
                                 int &fft_index_start,
                                 EST_FVector &filter)
Make a triangular Mel scale filter. The filter is centred at this_mel_centre and extends from this_mel_low to this_mel_high. half_fft_order is the length of a power/energy spectrum covering 0Hz to half the sampling frequency with a resolution of Hz_per_fft_coeff.
The routine returns a vector of weights to be applied to the energy/power spectrum starting at element fft_index_start. The number of points (FFT coefficients) covered by the filter is given by the length of the returned vector filter.
Definition at line 730 of file sigpr_frame.cc.
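The filter construction can be sketched as follows. This is a hedged sketch, not the library routine: triangular_filter_sketch, hz_to_mel_tf and mel_to_hz_tf are hypothetical names, the Mel formulas 1127 ln(1 + f/700) and its inverse are assumed, and std::vector<float> replaces EST_FVector. Only bins whose frequency lies strictly between the Hz equivalents of the lower and upper cutoffs are kept, which is why the result is returned as a start index plus a short weight vector rather than a full-length spectrum mask.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed Mel formulas (may differ in detail from the library's conversions).
float hz_to_mel_tf(float hz) { return 1127.0f * std::log(1.0f + hz / 700.0f); }
float mel_to_hz_tf(float mel) { return 700.0f * (std::exp(mel / 1127.0f) - 1.0f); }

// Sketch of a triangular Mel filter over FFT bins 0 .. half_fft_order - 1.
// On return, filter[i] is the weight for bin fft_index_start + i.
void triangular_filter_sketch(float mel_centre, float mel_low, float mel_high,
                              float hz_per_fft_coeff, int half_fft_order,
                              int &fft_index_start, std::vector<float> &filter) {
    assert(mel_low < mel_centre && mel_centre < mel_high && hz_per_fft_coeff > 0.0f);
    const float low_hz = mel_to_hz_tf(mel_low);
    const float high_hz = mel_to_hz_tf(mel_high);
    int start = 0;  // first bin strictly above the lower cutoff
    while (start < half_fft_order && start * hz_per_fft_coeff <= low_hz) ++start;
    int end = start;  // one past the last bin strictly below the upper cutoff
    while (end < half_fft_order && end * hz_per_fft_coeff < high_hz) ++end;
    fft_index_start = start;
    filter.assign(end - start, 0.0f);
    for (int k = start; k < end; ++k) {
        const float m = hz_to_mel_tf(k * hz_per_fft_coeff);  // Mel value of bin k
        filter[k - start] = (m <= mel_centre)
            ? (m - mel_low) / (mel_centre - mel_low)     // rising edge
            : (mel_high - m) / (mel_high - mel_centre);  // falling edge
    }
}
```

For example, a filter from 0 to 600 Mel centred at 400 Mel, with 50 Hz per coefficient, covers bins 1 to 9 and peaks at bin 6 (300 Hz, about 402 Mel).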