Edinburgh Speech Tools  2.1-release
scfg_train

Train the parameters of a stochastic context free grammar

Synopsis

scfg_train [options] [-grammar ifile] [-corpus ifile] [-method string] [-passes int] [-startpass int] [-spread int] [-checkpoint int] [-heap int] [-o ofile]

scfg_train takes a stochastic context free grammar (SCFG) and trains its rule probabilities against a given bracketed corpus using the inside-outside algorithm. The implementation essentially follows Pereira and Schabes (1992).

Note: using this program properly may require months of CPU time.
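To give a feel for the quantity each training pass computes, here is a minimal Python sketch of the inside half of the inside-outside algorithm, restricted by bracketing in the spirit of Pereira and Schabes. It is not the scfg_train implementation: the dictionary-based grammar representation, the function names, and the half-open span convention are assumptions made for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
from collections import defaultdict


def crosses(span, brackets):
    """Return True if span (i, j) overlaps some bracket without nesting."""
    i, j = span
    for a, b in brackets:
        if i < a < j < b or a < i < b < j:
            return True
    return False


def inside_probability(words, binary, lexical, start="S", brackets=()):
    """Inside probability of `words` under a CNF grammar (illustrative only).

    binary:   {(A, B, C): prob} for rules A -> B C
    lexical:  {(A, word): prob} for rules A -> word
    brackets: half-open spans (i, j) that no constituent may cross
    """
    n = len(words)
    # chart[(i, j)][A] = probability that A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(float))

    # Width-1 spans come from the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] += p

    # Wider spans combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            if crosses((i, j), brackets):
                continue  # span is incompatible with the corpus bracketing
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                if not left or not right:
                    continue
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        chart[(i, j)][a] += p * left[b] * right[c]

    return chart[(0, n)].get(start, 0.0)


# Toy grammar: S -> S S (0.4), S -> a (0.6)
binary = {("S", "S", "S"): 0.4}
lexical = {("S", "a"): 0.6}
unbracketed = inside_probability(["a", "a", "a"], binary, lexical)
bracketed = inside_probability(["a", "a", "a"], binary, lexical,
                               brackets=[(0, 2)])
```

With the bracket (0, 2), the analysis that groups the last two words is excluded, so `bracketed` is half of `unbracketed` in this toy case. The chart has a cell for every span and every cell considers every split point and rule, which for long sentences and large grammars is one reason training is so expensive.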

Options

  • -grammar ifile: Grammar file, one rule per line.
  • -corpus ifile: Corpus file, one bracketed sentence per line.
  • -method string: Method for training: inout (default: inout).
  • -passes int: Number of training passes (default: 50).
  • -startpass int: Starting at pass N (default: 0).
  • -spread int: Spread training data over N passes.
  • -checkpoint int: Save grammar every N passes.
  • -heap int: Set size of Lisp heap, needed for large corpora (default: 210000).
  • -o ofile: Output file for trained grammar.
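
As an illustration of how the options combine, the following Python sketch assembles a scfg_train command line using only the flags listed above. The helper name build_scfg_train_command and the file names are hypothetical; actually running the command assumes scfg_train is installed and on the PATH.

```python
import subprocess


def build_scfg_train_command(grammar, corpus, output,
                             passes=50, checkpoint=None):
    """Build an argument list for scfg_train (hypothetical helper)."""
    cmd = [
        "scfg_train",
        "-grammar", grammar,   # grammar file, one rule per line
        "-corpus", corpus,     # corpus file, one bracketed sentence per line
        "-method", "inout",    # the only training method listed
        "-passes", str(passes),
        "-o", output,          # where the trained grammar is written
    ]
    if checkpoint is not None:
        # Save the grammar every N passes, useful for very long runs.
        cmd += ["-checkpoint", str(checkpoint)]
    return cmd


# File names below are placeholders, not files shipped with the tools.
command = build_scfg_train_command("initial.gram", "train.brackets",
                                   "trained.gram", passes=50, checkpoint=5)

# Uncomment to run, assuming scfg_train is installed:
# subprocess.run(command, check=True)
```

Given the run times involved, setting -checkpoint is a sensible precaution, since an interrupted run can then be resumed from a saved grammar with -startpass.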