java.lang.Object
  inpro.irmrsc.parser.SITDBSParser
public class SITDBSParser
A Simple (robust) Incremental Top Down Beam Search Parser.
The parser is fed incrementally with new input tokens. Its candidate analyses are stored in a priority queue that serves the most probable first. The parser searches top down for analyses and moves those that successfully match the current input token into a new queue. It continues to search for further analyses until a probability threshold is reached; this threshold is determined dynamically by the base beam factor, the current size of the new queue, and the highest probability in the new queue. All remaining analyses are pruned.

Additionally, if robust parsing is activated, three robust operations allow the parser to entertain hypotheses about deletions, insertions and repairs of input tokens. Robust operations are restricted in several ways: only one may occur between two real input tokens, each operation incurs a probability malus, and the number of robust operations per sentence is limited.

Inspired by Brian Roark, 2001, Ph.D. thesis, Department of Cognitive and Linguistic Sciences, Brown University.
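The pruning step described above can be illustrated with a minimal, self-contained sketch. It does not use the actual SITDBSParser code: the `Candidate` class, the `matchesToken` flag and the threshold formula (best probability in the new queue, scaled by the base beam factor and the cube of the new queue's size) are assumptions made for illustration, and the real parser may compute its threshold differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BeamSketch {
    /** Hypothetical stand-in for a candidate analysis. */
    static final class Candidate {
        final String label;
        final double probability;
        final boolean matchesToken; // assumed: whether it matches the current input token
        Candidate(String label, double probability, boolean matchesToken) {
            this.label = label;
            this.probability = probability;
            this.matchesToken = matchesToken;
        }
    }

    /** A queue that serves the most probable candidate first. */
    static PriorityQueue<Candidate> newQueue() {
        return new PriorityQueue<>(
            Comparator.comparingDouble((Candidate c) -> c.probability).reversed());
    }

    /** Assumed threshold formula; not taken from the SITDBSParser source. */
    static double threshold(double baseBeamFactor, int newQueueSize, double bestNew) {
        return bestNew * baseBeamFactor * Math.pow(newQueueSize, 3);
    }

    /** Moves matching candidates into a new list until the threshold is reached. */
    static List<Candidate> step(PriorityQueue<Candidate> queue, double baseBeamFactor) {
        List<Candidate> next = new ArrayList<>();
        double bestNew = 0.0;
        while (!queue.isEmpty()) {
            Candidate c = queue.peek();
            // Stop once the best remaining candidate falls below the dynamic threshold.
            if (!next.isEmpty()
                    && c.probability < threshold(baseBeamFactor, next.size(), bestNew)) {
                break;
            }
            queue.poll();
            if (c.matchesToken) {
                next.add(c);
                bestNew = Math.max(bestNew, c.probability);
            }
            // A real parser would expand non-matching candidates top down here.
        }
        queue.clear(); // all remaining analyses are pruned
        return next;
    }
}
```

Because the threshold grows with the size of the new queue, the beam narrows as more matching analyses are found, so low-probability candidates are only explored while few alternatives exist.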
Field Summary | |
---|---|
static boolean |
beRobust
allow or disallow the parser to use robust operations |
static int |
cntDegradations
a counter to keep track of the number of externally degraded analyses (see degradeAnalysis(CandidateAnalysis, double)). |
static int |
cntDerivations
a counter to keep track of the number of derivations that survived a parsing step, i.e. of those that did not fall out of the beam |
static int |
cntExpansions
a counter to keep track of the number of expansions |
static int |
cntPrunes
a counter to keep track of the number of pruned analyses, i.e. of those that fell out of the beam |
static double |
deletionMalus
the probability malus a derivation receives for each repair deletion |
static Symbol |
endOfUtteranceTag
the name of the POS-tag marking the end of utterance |
static java.lang.String |
fillerRuleAndTagName
the name of the POS-tag and the (parser internal) syntactic rule for fillers |
static double |
insertionMalus
the probability malus a derivation receives for each repair insertion |
(package private) static org.apache.log4j.Logger |
logger
|
(package private) static java.lang.String |
logPrefix
|
static int |
maxCandidatesLimit
the maximum number of candidate analyses allowed in the parser's queue |
static int |
maxDeletions
the maximum number of deletion hypotheses allowed per sentence |
static int |
maxInsertions
the maximum number of insertion hypotheses allowed per sentence |
static int |
maxRepairs
the maximum number of repair hypotheses allowed per sentence |
private double |
mBaseBeamFactor
the base beam factor |
private Grammar |
mGrammar
the grammar used for parsing |
private java.util.PriorityQueue<CandidateAnalysis> |
mQueue
the parser's main internal data structure |
static double |
repairMalus
the probability malus a derivation receives for each repair hypothesis |
static Symbol |
unknownTag
the name of the POS-tag for unknown tags |
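The limit and malus fields above (maxRepairs, maxInsertions, maxDeletions and the corresponding malus values) constrain how often robust operations may be used. The following sketch shows one way such bookkeeping could look. The class `RobustBudgetSketch`, its method names, and the assumption that a malus is applied by multiplication are all hypothetical and not taken from the SITDBSParser source.

```java
import java.util.EnumMap;
import java.util.Map;

final class RobustBudgetSketch {
    enum Op { REPAIR, INSERTION, DELETION }

    private final Map<Op, Integer> limit = new EnumMap<>(Op.class);
    private final Map<Op, Double> malus = new EnumMap<>(Op.class);
    private final Map<Op, Integer> used = new EnumMap<>(Op.class);
    private boolean usedSinceLastToken = false;
    double probability = 1.0;

    /** Uses the same (hypothetical) limit and malus for every operation type. */
    RobustBudgetSketch(int maxPerOp, double malusPerOp) {
        for (Op op : Op.values()) {
            limit.put(op, maxPerOp);
            malus.put(op, malusPerOp);
            used.put(op, 0);
        }
    }

    /** Applies a robust operation if the restrictions allow it. */
    boolean tryApply(Op op) {
        // Only one robust operation between two real input tokens,
        // and no more than the per-sentence limit for this operation type.
        if (usedSinceLastToken || used.get(op) >= limit.get(op)) {
            return false;
        }
        used.merge(op, 1, Integer::sum);
        usedSinceLastToken = true;
        probability *= malus.get(op); // assumed: the malus is a multiplicative penalty
        return true;
    }

    /** Called when a real input token has been consumed. */
    void consumeRealToken() {
        usedSinceLastToken = false;
    }
}
```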
Constructor Summary | |
---|---|
SITDBSParser(Grammar grammar)
|
|
SITDBSParser(Grammar grammar,
double bbf)
|
|
SITDBSParser(SITDBSParser p)
copy constructor |
Method Summary | |
---|---|
void |
degradeAnalysis(CandidateAnalysis ca,
double malus)
degrades the probability of a given CandidateAnalysis by a given malus |
void |
feed(java.lang.String nextToken)
feeds the parser with the next input token |
void |
feed(Symbol nextToken)
feeds the parser with the next input token |
int |
getNumberOfCompletableAnalyses()
returns the number of analyses that are completable, i.e. |
java.util.PriorityQueue<CandidateAnalysis> |
getQueue()
returns the parser's queue (or an empty queue if it's null) |
void |
info()
prints all derivations in the queue, for debugging |
void |
reset()
resets the parser's internal queue to its initial state |
void |
setLogger(org.apache.log4j.Logger l)
|
static void |
setRobust(boolean v)
sets whether the parser is allowed to use robust operations or not |
void |
status()
prints some useful information |
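As an illustration of what degrading an analysis inside a priority queue involves, here is a minimal sketch. It is not the SITDBSParser implementation: the `Analysis` class, the `degrade` helper and the multiplicative malus are assumptions. The one library fact it relies on is that `java.util.PriorityQueue` does not reorder an element whose priority changes while it is queued, so the element is removed, updated, and added again.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class DegradeSketch {
    /** Hypothetical stand-in for a candidate analysis with a mutable probability. */
    static final class Analysis {
        final String label;
        double probability;
        Analysis(String label, double probability) {
            this.label = label;
            this.probability = probability;
        }
    }

    /** Counts degradations, in the spirit of the cntDegradations counter. */
    static int degradations = 0;

    /** A queue that serves the most probable analysis first. */
    static PriorityQueue<Analysis> newQueue() {
        return new PriorityQueue<>(
            Comparator.comparingDouble((Analysis a) -> a.probability).reversed());
    }

    /** Scales the probability by the malus and restores the queue order. */
    static void degrade(PriorityQueue<Analysis> queue, Analysis a, double malus) {
        // Remove first: mutating a queued element would leave the heap inconsistent.
        if (queue.remove(a)) {
            a.probability *= malus; // assumed: the malus is a multiplicative penalty
            queue.add(a);
            degradations++;
        }
    }
}
```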
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
static java.lang.String logPrefix
static org.apache.log4j.Logger logger
public static int cntExpansions
public static int cntDegradations
See degradeAnalysis(CandidateAnalysis, double).
public static int cntPrunes
public static int cntDerivations
public static boolean beRobust
public static int maxRepairs
public static double repairMalus
public static int maxInsertions
public static double insertionMalus
public static int maxDeletions
public static double deletionMalus
public static int maxCandidatesLimit
public static final java.lang.String fillerRuleAndTagName
public static final Symbol unknownTag
public static final Symbol endOfUtteranceTag
private java.util.PriorityQueue<CandidateAnalysis> mQueue
private Grammar mGrammar
private double mBaseBeamFactor
Constructor Detail |
---|
public SITDBSParser(Grammar grammar, double bbf)
public SITDBSParser(Grammar grammar)
public SITDBSParser(SITDBSParser p)
Method Detail |
---|
public void feed(java.lang.String nextToken)
public void feed(Symbol nextToken)
public int getNumberOfCompletableAnalyses()
public java.util.PriorityQueue<CandidateAnalysis> getQueue()
public void degradeAnalysis(CandidateAnalysis ca, double malus)
public void reset()
public void info()
public void status()
public void setLogger(org.apache.log4j.Logger l)
public static void setRobust(boolean v)