A class used to train (and test) SCFGs; it is an extension of EST_SCFG.
#include <include/EST_SCFG.h>


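Public Member Functions
| Return type | Member | Description |
|---|---|---|
| | EST_SCFG_traintest () | Constructor. |
| | ~EST_SCFG_traintest () | Destructor. |
| void | test_corpus () | Test the grammar against the corpus (cross entropy). |
| void | test_crossbrackets () | Test the grammar against the corpus (bracketing accuracy). |
| void | load_corpus (const EST_String &filename) | Load a corpus from the given file. |
| void | train_inout (int passes, int startpass, int checkpoint, int spread, const EST_String &outfile) | Train a grammar using the loaded corpus. |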
Public Member Functions inherited from EST_SCFG
| Return type | Member | Description |
|---|---|---|
| | EST_SCFG () | |
| | EST_SCFG (LISP rules) | Initialize from a set of rules. |
| | ~EST_SCFG () | |
| EST_read_status | load (const EST_String &filename) | Load grammar from named file. |
| EST_write_status | save (const EST_String &filename) | Save current grammar to named file. |
| void | set_rules (LISP rules) | Set (or reset) rules from an external source after construction. |
| LISP | get_rules () | Return rules as a LISP list. |
| int | distinguished_symbol () const | |
| void | find_terms_nonterms (EST_StrList &nt, EST_StrList &t, LISP rules) | |
| EST_String | nonterminal (int p) const | Convert nonterminal index to string form. |
| EST_String | terminal (int m) const | Convert terminal index to string form. |
| int | nonterminal (const EST_String &p) const | Convert nonterminal string to index. |
| int | terminal (const EST_String &m) const | Convert terminal string to index. |
| int | num_nonterminals () const | Number of nonterminals. |
| int | num_terminals () const | Number of terminals. |
| double | prob_B (int p, int q, int r) const | The rule probability of the given binary rule. |
| double | prob_U (int p, int m) const | The rule probability of the given unary rule. |
| void | set_rule_prob_cache () | (Re-)set rule probability caches. |
Additional Inherited Members
Public Attributes inherited from EST_SCFG
| Type | Member | Description |
|---|---|---|
| SCFGRuleList | rules | The rules themselves. |
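The signatures of prob_B and prob_U suggest a grammar in Chomsky normal form whose rules are either binary, p → q r, or unary, p → m with m a terminal. Assuming the grammar is a properly normalized probabilistic grammar (which inside-outside reestimation is designed to maintain), the rule probabilities for each nonterminal p sum to one. The notation below is ours, with N = num_nonterminals() and M = num_terminals():

```latex
\sum_{q=0}^{N-1} \sum_{r=0}^{N-1} \mathrm{prob\_B}(p, q, r) \;+\; \sum_{m=0}^{M-1} \mathrm{prob\_U}(p, m) \;=\; 1, \qquad p = 0, \dots, N-1
```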
Detailed Description
EST_SCFG_traintest is a class used to train (and test) SCFGs; it is an extension of EST_SCFG.
This class implements the method of Pereira and Schabes, "Inside-Outside Reestimation from Partially Bracketed Corpora", ACL 1992.
An SCFG may be trained from a corpus, optionally containing brackets, over a series of passes; the grammar probabilities are reestimated after each pass. The class extends EST_SCFG with support for a bracketed corpus and with various indexes for efficient use of the grammar.
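The effect of partial bracketing can be illustrated with the inside pass of the algorithm. The following is a minimal C++ sketch, not the EST implementation: it assumes a dense CNF grammar (CnfGrammar, compatible and inside_probability are hypothetical names) and half-open word spans, and it gives zero inside probability to any span that crosses a corpus bracket, so only parses consistent with the bracketing contribute to the sentence probability. The outside pass and the reestimation step are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical dense CNF grammar; not EST's internal representation.
struct CnfGrammar {
    int num_nt;                  // nonterminals 0..num_nt-1
    int num_t;                   // terminals 0..num_t-1
    int start;                   // distinguished symbol
    std::vector<double> binary;  // binary[(p*num_nt + q)*num_nt + r] = P(p -> q r)
    std::vector<double> unary;   // unary[p*num_t + m] = P(p -> m)

    double B(int p, int q, int r) const { return binary[(p * num_nt + q) * num_nt + r]; }
    double U(int p, int m) const { return unary[p * num_t + m]; }
};

using Span = std::pair<int, int>;  // half-open word span [first, second)

// True if the span [s, e) crosses none of the given brackets.
bool compatible(int s, int e, const std::vector<Span>& brackets) {
    for (const Span& b : brackets) {
        bool crosses = (s < b.first && b.first < e && e < b.second) ||
                       (b.first < s && s < b.second && b.second < e);
        if (crosses) return false;
    }
    return true;
}

// Inside probability of the whole sentence, counting only parses whose
// constituents are compatible with `brackets`.
double inside_probability(const CnfGrammar& g, const std::vector<int>& words,
                          const std::vector<Span>& brackets) {
    const int n = static_cast<int>(words.size());
    assert(n > 0);
    // chart entry for (s, e, p) holds P(p derives words[s..e))
    std::vector<double> chart(static_cast<std::size_t>((n + 1) * (n + 1) * g.num_nt), 0.0);
    auto at = [&](int s, int e, int p) -> double& {
        return chart[(static_cast<std::size_t>(s) * (n + 1) + e) * g.num_nt + p];
    };
    for (int s = 0; s < n; ++s)  // single words come from unary rules p -> m
        for (int p = 0; p < g.num_nt; ++p)
            at(s, s + 1, p) = g.U(p, words[s]);
    for (int len = 2; len <= n; ++len) {
        for (int s = 0; s + len <= n; ++s) {
            const int e = s + len;
            if (!compatible(s, e, brackets)) continue;  // crossing span keeps probability 0
            for (int p = 0; p < g.num_nt; ++p) {
                double sum = 0.0;
                for (int k = s + 1; k < e; ++k)  // split point
                    for (int q = 0; q < g.num_nt; ++q)
                        for (int r = 0; r < g.num_nt; ++r)
                            sum += g.B(p, q, r) * at(s, k, q) * at(k, e, r);
                at(s, e, p) = sum;
            }
        }
    }
    return at(0, n, g.start);
}
```

For the one-nonterminal grammar S → S S (0.5), S → a (0.5), the unbracketed string `a a a` has two parses and probability 0.0625; adding the bracket [0, 2) rules out `(a (a a))` and halves the probability to 0.03125.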
Definition at line 259 of file EST_SCFG.h.
EST_SCFG_traintest::EST_SCFG_traintest (void)
Definition at line 196 of file EST_SCFG_inout.cc.
EST_SCFG_traintest::~EST_SCFG_traintest (void)
Definition at line 204 of file EST_SCFG_inout.cc.
void EST_SCFG_traintest::test_corpus ()
Test the current grammar against the current corpus and print a summary.
Only the cross-entropy measure is reported.
Definition at line 561 of file EST_SCFG_inout.cc.
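One common way to summarize how well a grammar fits a corpus is per-word cross entropy, H = -(1/N) Σ log2 P(s), where the sum runs over sentences and N is the total number of words. The C++ sketch below assumes that definition and that sentence probabilities (for example, inside probabilities) are already available; whether test_corpus normalizes by words or by sentences, or which log base it uses, is not stated here, and ScoredSentence and cross_entropy_bits are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-sentence result; not an EST type.
struct ScoredSentence {
    double probability;  // P(sentence | grammar), e.g. an inside probability
    int num_words;
};

// Per-word cross entropy in bits: -(1/N) * sum over sentences of log2 P(s).
double cross_entropy_bits(const std::vector<ScoredSentence>& corpus) {
    double log_sum = 0.0;
    long total_words = 0;
    for (const ScoredSentence& s : corpus) {
        assert(s.probability > 0.0);  // an unparsable sentence would make H infinite
        log_sum += std::log2(s.probability);
        total_words += s.num_words;
    }
    assert(total_words > 0);
    return -log_sum / static_cast<double>(total_words);
}
```

A three-word sentence with probability 1/16 and a one-word sentence with probability 1/4 contribute 4 + 2 = 6 bits over 4 words, giving 1.5 bits per word.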
void EST_SCFG_traintest::test_crossbrackets ()
Test the current grammar against the current corpus and print a summary.
The summary gives the cross-bracketing accuracy as a percentage, together with the percentage of fully correct parses.
Definition at line 519 of file EST_SCFG_Chart.cc.
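Cross-bracketing accuracy compares the brackets of each parse with the brackets in the corpus. The C++ sketch below assumes the usual definitions, which may differ in detail from what test_crossbrackets computes: a predicted bracket counts as correct if it crosses no corpus bracket, and a parse counts as fully correct if none of its brackets cross. Spans are half-open word ranges; crosses, score_brackets and BracketScore are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Span = std::pair<int, int>;  // half-open word span [first, second)

// Two spans cross if they overlap without one containing the other.
bool crosses(const Span& a, const Span& b) {
    return (a.first < b.first && b.first < a.second && a.second < b.second) ||
           (b.first < a.first && a.first < b.second && b.second < a.second);
}

struct BracketScore {
    double bracket_accuracy;  // % of predicted brackets crossing no corpus bracket
    double fully_correct;     // % of sentences with no crossing brackets at all
};

// predicted[i] and gold[i] hold the brackets of sentence i.
BracketScore score_brackets(const std::vector<std::vector<Span>>& predicted,
                            const std::vector<std::vector<Span>>& gold) {
    assert(predicted.size() == gold.size() && !predicted.empty());
    int total = 0, consistent = 0, perfect = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        bool sentence_ok = true;
        for (const Span& p : predicted[i]) {
            bool ok = true;
            for (const Span& g : gold[i])
                if (crosses(p, g)) { ok = false; break; }
            ++total;
            if (ok) ++consistent; else sentence_ok = false;
        }
        if (sentence_ok) ++perfect;
    }
    return {total ? 100.0 * consistent / total : 100.0,
            100.0 * perfect / predicted.size()};
}
```

Counting a bracket as correct when it merely avoids crossing (rather than exactly matching) fits partially bracketed corpora, where many legitimate constituents have no corpus bracket to match.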
void EST_SCFG_traintest::load_corpus (const EST_String &filename)
Load a corpus from the given file.
Each sentence in the corpus should be enclosed in parentheses. Additional parentheses may be used to denote phrasing within a sentence. The corpus is read using the LISP reader, so LISP conventions apply; notably, single quotes should appear within double quotes.
Definition at line 209 of file EST_SCFG_inout.cc.
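To show how nested parentheses become bracket constraints, here is a hypothetical C++ reader for a single corpus sentence. It is not the LISP reader that load_corpus uses: it treats the outermost parentheses as the sentence delimiter, records every inner pair as a half-open word span, and takes double-quoted tokens literally without handling escapes. BracketedSentence and parse_sentence are invented names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Span = std::pair<int, int>;  // half-open word span [first, second)

struct BracketedSentence {
    std::vector<std::string> words;
    std::vector<Span> brackets;  // inner parentheses, in the order they close
};

// Hypothetical reader for one sentence such as "((the cat) (sat))".
BracketedSentence parse_sentence(const std::string& text) {
    BracketedSentence out;
    std::vector<int> open;  // word index at each unmatched '('
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == '(') {
            open.push_back(static_cast<int>(out.words.size()));
            ++i;
        } else if (c == ')') {
            assert(!open.empty());  // unbalanced input
            const int start = open.back();
            open.pop_back();
            // The outermost pair only delimits the sentence, so it is not a bracket.
            if (!open.empty())
                out.brackets.emplace_back(start, static_cast<int>(out.words.size()));
            ++i;
        } else if (c == '"') {  // quoted token such as "don't"
            const std::size_t close = text.find('"', i + 1);
            assert(close != std::string::npos);
            out.words.push_back(text.substr(i + 1, close - i - 1));
            i = close + 1;
        } else {  // bare token runs until whitespace or a parenthesis
            std::size_t j = i;
            while (j < text.size() && !std::isspace(static_cast<unsigned char>(text[j])) &&
                   text[j] != '(' && text[j] != ')')
                ++j;
            out.words.push_back(text.substr(i, j - i));
            i = j;
        }
    }
    assert(open.empty());
    return out;
}
```

For `((the cat) (sat (on the mat)))` the reader yields six words and the spans [0, 2), [3, 6) and [2, 6), in the order the brackets close.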
void EST_SCFG_traintest::train_inout (int passes, int startpass, int checkpoint, int spread, const EST_String &outfile)
Train a grammar using the loaded corpus.
Parameters
| Parameter | Description |
|---|---|
| passes | The number of training passes desired. |
| startpass | The pass from which to start. |
| checkpoint | Save the grammar every checkpoint passes. |
| spread | Percentage of the corpus to use on each pass; successive passes cycle through the corpus. |
| outfile | Output file name. |
Definition at line 484 of file EST_SCFG_inout.cc.
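The parameter descriptions leave the exact pass schedule implicit. The C++ sketch below is one hypothetical reading of them, not the code in EST_SCFG_inout.cc: pass indices run from startpass up to passes - 1, each pass uses spread percent of the corpus starting at an offset derived from the pass number, wrapping at the end of the corpus, and a checkpoint falls after every checkpoint-th pass. PassPlan and plan_passes are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one training pass; not an EST type.
struct PassPlan {
    int pass;                    // pass index
    std::vector<int> sentences;  // corpus indices reestimated on this pass
    bool checkpoint;             // save the grammar after this pass?
};

// Assumed schedule: window size = spread% of the corpus (at least one sentence),
// window start determined by the pass index, checkpoint after every
// `checkpoint` passes.
std::vector<PassPlan> plan_passes(int passes, int startpass, int checkpoint,
                                  int spread, int corpus_size) {
    assert(corpus_size > 0 && checkpoint > 0 && spread > 0 && spread <= 100);
    int window = corpus_size * spread / 100;  // sentences per pass
    if (window == 0) window = 1;
    std::vector<PassPlan> plan;
    for (int pass = startpass; pass < passes; ++pass) {
        PassPlan p{pass, {}, (pass + 1) % checkpoint == 0};
        const int first = (pass * window) % corpus_size;  // depends only on the pass
        for (int k = 0; k < window; ++k)
            p.sentences.push_back((first + k) % corpus_size);  // wrap at corpus end
        plan.push_back(p);
    }
    return plan;
}
```

Deriving each window from the pass index rather than from a running counter keeps the schedule reproducible when training is restarted with a later startpass.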