Generate signal processing coefficients from waveforms
Synopsis
sig2fv [input file] -o [output file] [-h ] [-itype string] [-n int] [-f int] [-ibo string] [-iswap ] [-istype string] [-c string] [-start float] [-end float] [-from int] [-to int] [-otype string] [-S float] [-o ofile] [-shift float] [-factor float] [-pm ifile] [-size float] [-coefs string] [-delta string] [-acc string] [-window_type string] [-lpc_order int] [-ref_order int] [-cep_order int] [-melcep_order int] [-fbank_order int] [-preemph float] [-lifter float] [-usepower ] [-include_c0 ] [-order string]
sig2fv performs signal processing feature vector analysis on speech waveforms. The following types of analysis are provided:
- Linear prediction (LPC)
- Cepstrum coding from lpc coefficients
- Mel scale cepstrum coding via fbank
- Mel scale log filterbank analysis
- Line spectral frequencies
- Linear prediction reflection coefficients
- Root mean square energy
- Power
- Fundamental frequency (pitch)
- Delta and acceleration coefficients of all of the above
The -coefs option takes a list of the names of the basic analyses required. The -delta and -acc options take the same kind of list for delta and acceleration coefficients respectively.
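As a rough illustration of what delta and acceleration coefficients are, the sketch below computes a central-difference delta over a track of frames, then applies the same function again to obtain acceleration (delta delta). This is one common definition and is an assumption here: the exact formula sig2fv uses is not stated in this page, and the function name `delta` is hypothetical.

```python
def delta(track):
    """Central-difference delta of a track.

    `track` is a list of frames, each frame a list of floats.
    Edge frames reuse their nearest neighbour (an assumption).
    """
    n = len(track)
    out = []
    for t in range(n):
        prev = track[max(t - 1, 0)]
        nxt = track[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


# A toy one-channel "energy" track with four frames.
energy = [[1.0], [2.0], [4.0], [7.0]]
d = delta(energy)   # delta coefficients
a = delta(d)        # acceleration (delta of the delta)
print(d)
print(a)
```

Because acceleration is just the delta of the delta track, neither needs the basic coefficients to be written to the output, which matches the behaviour described for -delta and -acc.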
Options
- -h: Options help
- -itype: string Input file type (optional). If set to raw, this indicates that the input file does not have a header. While this can be used to specify file types other than raw, this is rarely used for other purposes as the file type of all the existing supported types can be determined automatically from the file's header. If the input file is unheadered, files are assumed to be shorts (16bit). Supported types are nist, est, esps, snd, riff, aiff, audlab, raw, ascii
- -n: int Number of channels in an unheadered input file
- -f: int Sample rate in Hertz for an unheadered input file
- -ibo: string Input byte order in an unheadered input file. Possibilities are: MSB, LSB, native or nonnative. Suns, HP, SGI Mips, M68000 are MSB (big endian). Intel, Alpha, DEC Mips, Vax are LSB (little endian)
- -iswap: Swap bytes. (For use on an unheadered input file)
- -istype: string Sample type in an unheadered input file: short, alaw, mulaw, byte, ascii
- -c: string Select a single channel (starts from 0). Waveforms can have multiple channels. This option extracts a single channel for processing and discards the rest.
- -start: float Extract sub-wave starting at this time, specified in seconds
- -end: float Extract sub-wave ending at this time, specified in seconds
- -from: int Extract sub-wave starting at this sample point
- -to: int Extract sub-wave ending at this sample point
- -otype: string Output file type. If unspecified, ascii is assumed. Types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_mfcc_e, htk_user, htk_discrete, ssff, xmg, xgraph, ema, ema_swapped, ascii, label
- -S: float Frame spacing of output in seconds. If this is different from the internal spacing, the contour is resampled at this spacing
- -o: ofile Output filename, defaults to stdout
- -shift: float Frame spacing in seconds for fixed frame analysis. This doesn't have to be the same as the output file spacing: the -S option can be used to resample the track before saving. Default: 0.010
- -factor: float Frame lengths will be FACTOR times the local pitch period. Default: 2.000
- -pm: ifile Pitch mark file name. This is used to specify the positions of the analysis frames for pitch synchronous analysis. Pitchmark files are just standard track files, but the channel information is ignored and only the time positions are used
- -size: float If specified with -pm, size is used as the fixed window size (times factor) rather than a size derived from the pitchmarks themselves.
- -coefs: string List of basic types of processing required. Permissible types are: lpc (linear predictive coding), cep (cepstrum coding from lpc coefficients), melcep (Mel scale cepstrum coding via fbank), fbank (Mel scale log filterbank analysis), lsf (line spectral frequencies), ref (linear prediction reflection coefficients), power, f0, energy (root mean square energy)
- -delta: string List of delta types of processing required. Basic processing does not need to be specified for this option to work. Permissible types are: lpc (linear predictive coding), cep (cepstrum coding from lpc coefficients), melcep (Mel scale cepstrum coding via fbank), fbank (Mel scale log filterbank analysis), lsf (line spectral frequencies), ref (linear prediction reflection coefficients), power, f0, energy (root mean square energy)
- -acc: string List of acceleration (delta delta) processing required. Basic processing does not need to be specified for this option to work. Permissible types are: lpc (linear predictive coding), cep (cepstrum coding from lpc coefficients), melcep (Mel scale cepstrum coding via fbank), fbank (Mel scale log filterbank analysis), lsf (line spectral frequencies), ref (linear prediction reflection coefficients), power, f0, energy (root mean square energy)
- -window_type: string Type of window used on waveform. Permissible types are: none (unknown window type), rectangle (rectangular window), triangle (triangular window), hanning (Hanning window), hamming (Hamming window). Default: hamming
- -lpc_order: int Order of lpc analysis.
- -ref_order: int Order of lpc reflection coefficient analysis.
- -cep_order: int Order of lpc cepstral analysis.
- -melcep_order: int Order of Mel cepstral analysis.
- -fbank_order: int Order of filter bank analysis.
- -preemph: float Perform pre-emphasis with this factor.
- -lifter: float Lifter coefficient.
- -usepower: Use power rather than energy in filter bank analysis
- -include_c0: Include cepstral coefficient 0
- -order: string Order of analyses
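To make the -preemph and -window_type options more concrete, here is a minimal sketch of the conventional definitions of first-order pre-emphasis and a Hamming window, applied to a single frame. It assumes the textbook formulas (y[n] = x[n] - factor * x[n-1], and w[n] = 0.54 - 0.46 cos(2*pi*n/(N-1))); sig2fv's internal implementation may differ in detail, and the function names are hypothetical.

```python
import math


def preemphasise(samples, factor):
    """First-order pre-emphasis: y[n] = x[n] - factor * x[n-1].

    The first sample is passed through unchanged (an assumption).
    """
    out = list(samples[:1])
    for i in range(1, len(samples)):
        out.append(samples[i] - factor * samples[i - 1])
    return out


def hamming(n):
    """Textbook symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


# A constant frame: pre-emphasis removes most of the DC level,
# and the window tapers the frame edges.
frame = [1.0, 1.0, 1.0, 1.0, 1.0]
emphasised = preemphasise(frame, 0.5)
window = hamming(len(emphasised))
windowed = [s * w for s, w in zip(emphasised, window)]
print(emphasised)
print(windowed)
```

In this sketch the pre-emphasis factor plays the role of the value passed to -preemph, for example 0.5 or 0.96.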
Examples
Fixed frame basic linear prediction:
To produce a set of linear prediction coefficients every 10 ms, using pre-emphasis and saving in EST format:
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
Pitch synchronous linear prediction:
The following uses the set of pitchmarks in kdt_010.pm as the centres of the analysis windows:
$ sig2fv kdt_010.wav -pm kdt_010.pm -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
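The idea of pitch synchronous framing can be sketched as follows: each pitchmark is the centre of a window whose length is the -factor value times the local pitch period. How sig2fv measures the "local pitch period" is not spelled out here, so this sketch assumes it is the gap to the previous pitchmark (or to the next one for the first mark). The function name and the pitchmark times are hypothetical.

```python
def pitch_synchronous_frames(pitchmarks, factor=2.0):
    """Return (start, end) times in seconds of frames centred on pitchmarks.

    Frame length = factor * local pitch period, where the local period
    is assumed to be the distance to the previous pitchmark.
    """
    frames = []
    for i, centre in enumerate(pitchmarks):
        if i > 0:
            period = centre - pitchmarks[i - 1]
        elif len(pitchmarks) > 1:
            period = pitchmarks[1] - centre
        else:
            continue  # a single pitchmark gives no period estimate
        half = factor * period / 2.0
        frames.append((centre - half, centre + half))
    return frames


# Three illustrative pitchmark times in seconds.
pms = [0.010, 0.018, 0.027]
frames = pitch_synchronous_frames(pms, factor=2.0)
for start, end in frames:
    print(f"{start:.3f} .. {end:.3f}")
```

With the default factor of 2.0, each window spans roughly two pitch periods, so adjacent windows overlap.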
F0, Linear prediction and cepstral coefficients:
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "f0 lpc cep" -otype est -shift 0.01
Note that pitch tracking can also be done with the pda program. Both use the same underlying technique, but the pda program offers much finer control over the pitch track specific processing parameters.
Energy, Linear Prediction and Cepstral coefficients, with a 10ms frame shift during analysis but a 5ms frame shift in the output file:
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "energy lpc cep" -otype est -S 0.005 -shift 0.01
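The relationship between the analysis spacing (-shift) and the output spacing (-S) can be sketched by listing the frame times each one implies. The sketch assumes frames are placed at multiples of the spacing, starting at one spacing from the beginning of the file; this placement, the function name and the 50 ms duration are assumptions made for illustration only, and the way sig2fv resamples values between frames is not shown.

```python
def frame_times(duration, spacing):
    """Frame times in seconds at multiples of `spacing` up to `duration`.

    Assumes the first frame sits at t = spacing (an assumption).
    """
    # Small epsilon guards against floating point error in the division.
    count = int(duration / spacing + 1e-9)
    return [round(k * spacing, 6) for k in range(1, count + 1)]


duration = 0.05                          # a 50 ms stretch of waveform
analysis = frame_times(duration, 0.01)   # -shift 0.01
output = frame_times(duration, 0.005)    # -S 0.005
print(len(analysis), analysis)
print(len(output), output)
```

Halving the spacing doubles the number of output frames, so the saved track contains values resampled between the frames that were actually analysed.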
Delta and acc coefficients can be calculated even if their base form is not required. This produces normal energy coefficients and cepstral delta coefficients:
$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "energy" -delta "cep" -otype est
Mel-scaled cepstra, Delta and acc coefficients, as is common in speech recognition:
$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "melcep" -delta "melcep" -acc "melcep" -otype est -preemph 0.96