inpro.irmrsc.parser
Class SITDBSParser

java.lang.Object
  extended by inpro.irmrsc.parser.SITDBSParser

public class SITDBSParser
extends java.lang.Object

A Simple (robust) Incremental Top Down Beam Search Parser.

The parser is fed incrementally with new input tokens. Its candidate analyses are stored in a priority queue that serves the most probable first. The parser searches top-down for analyses and moves those that successfully match the current input token into a new queue. It continues to search for further analyses until a probability threshold is reached; this threshold is determined dynamically by the base beam factor, the current size of the new queue, and the highest probability in the new queue. All remaining analyses are pruned.
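
The pruning step described above can be illustrated with a minimal, self-contained sketch. It is not the parser's actual code: the Candidate class, the method names, and in particular the threshold formula (highest probability times base beam factor times the cube of the new queue's size, in the style of Roark's beam) are assumptions made for illustration. Top-down expansion and token matching are left out, so every candidate is treated as matching.

```java
import java.util.PriorityQueue;

final class BeamPruningSketch {

    /** Hypothetical stand-in for a candidate analysis: a label plus a probability. */
    static final class Candidate {
        final String label;
        final double probability;

        Candidate(String label, double probability) {
            this.label = label;
            this.probability = probability;
        }
    }

    /** Creates a queue that serves the most probable candidate first. */
    static PriorityQueue<Candidate> newQueue() {
        return new PriorityQueue<Candidate>(
                (a, b) -> Double.compare(b.probability, a.probability));
    }

    /** ASSUMED formula: highest probability * base beam factor * (new queue size)^3. */
    static double threshold(double baseBeamFactor, int newQueueSize, double bestProbability) {
        return bestProbability * baseBeamFactor * Math.pow(newQueueSize, 3);
    }

    /**
     * Moves candidates, best first, into a new queue until the next candidate
     * falls below the threshold; whatever is left in the old queue is pruned.
     */
    static PriorityQueue<Candidate> advance(PriorityQueue<Candidate> oldQueue,
                                            double baseBeamFactor) {
        PriorityQueue<Candidate> next = newQueue();
        while (!oldQueue.isEmpty()) {
            Candidate best = oldQueue.peek();
            if (!next.isEmpty() && best.probability
                    < threshold(baseBeamFactor, next.size(), next.peek().probability)) {
                break; // the beam is exhausted
            }
            next.add(oldQueue.poll());
        }
        oldQueue.clear(); // prune all remaining candidates
        return next;
    }
}
```

Under this assumed formula the beam narrows as the new queue grows, so a few strong analyses quickly crowd out weak ones.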

Additionally, if robust parsing is activated, three robust operations allow the parser to entertain hypotheses about deletions, insertions and repairs of input tokens. Robust operations are restricted in several ways: only one may occur between two real input tokens, each operation incurs a probability malus, and the number of robust operations per sentence is limited.
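
The three restrictions on robust operations can be sketched as a small bookkeeping class. This is an illustration only: the class and method names are hypothetical, a single shared limit and malus are used for all three operations, and the malus is assumed to be a multiplicative factor on the probability.

```java
import java.util.EnumMap;
import java.util.Map;

final class RobustBudgetSketch {

    enum Op { DELETION, INSERTION, REPAIR }

    private final Map<Op, Integer> remaining = new EnumMap<>(Op.class);
    private final double malus;              // ASSUMED: multiplicative penalty per operation
    private boolean usedSinceLastRealToken;  // at most one operation between two real tokens
    private double probability;

    RobustBudgetSketch(double initialProbability, int maxPerOp, double malus) {
        this.probability = initialProbability;
        this.malus = malus;
        for (Op op : Op.values()) {
            remaining.put(op, maxPerOp);
        }
    }

    /** Applies the operation if allowed and returns whether it was applied. */
    boolean tryApply(Op op) {
        if (usedSinceLastRealToken || remaining.get(op) <= 0) {
            return false;
        }
        remaining.put(op, remaining.get(op) - 1);
        probability *= malus;
        usedSinceLastRealToken = true;
        return true;
    }

    /** Consuming a real input token re-enables robust operations. */
    void consumeRealToken() {
        usedSinceLastRealToken = false;
    }

    double probability() {
        return probability;
    }
}
```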

Inspired by Brian Roark's 2001 Ph.D. thesis, Department of Cognitive and Linguistic Sciences, Brown University.

Author:
Andreas Peldszus

Field Summary
static boolean beRobust
          allow or disallow the parser to use robust operations
static int cntDegradations
          a counter to keep track of the number of externally degraded analyses (see degradeAnalysis(CandidateAnalysis, double)).
static int cntDerivations
          a counter to keep track of the number of derivations that survived a parsing step, i.e. of those that did not fall out of the beam
static int cntExpansions
          a counter to keep track of the number of expansions
static int cntPrunes
          a counter to keep track of the number of pruned analyses, i.e. of those that fell out of the beam
static double deletionMalus
          the probability malus a derivation receives for each repair deletion
static Symbol endOfUtteranceTag
          the name of the POS-tag marking the end of utterance
static java.lang.String fillerRuleAndTagName
          the name of the POS-tag and the (parser internal) syntactic rule for fillers
static double insertionMalus
          the probability malus a derivation receives for each repair insertion
(package private) static org.apache.log4j.Logger logger
           
(package private) static java.lang.String logPrefix
           
static int maxCandidatesLimit
          the maximum number of candidate analyses allowed in the parser's queue
static int maxDeletions
          the maximum number of deletion hypotheses allowed per sentence
static int maxInsertions
          the maximum number of insertion hypotheses allowed per sentence
static int maxRepairs
          the maximum number of repair hypotheses allowed per sentence
private  double mBaseBeamFactor
          the base beam factor
private  Grammar mGrammar
          the grammar used for parsing
private  java.util.PriorityQueue<CandidateAnalysis> mQueue
          the parser's main internal data structure
static double repairMalus
          the probability malus a derivation receives for each repair hypothesis
static Symbol unknownTag
          the name of the POS-tag for unknown tags
 
Constructor Summary
SITDBSParser(Grammar grammar)
           
SITDBSParser(Grammar grammar, double bbf)
           
SITDBSParser(SITDBSParser p)
          copy constructor
 
Method Summary
 void degradeAnalysis(CandidateAnalysis ca, double malus)
          degrades the probability of a given CandidateAnalysis by a given malus
 void feed(java.lang.String nextToken)
          feeds the parser with the next input token
 void feed(Symbol nextToken)
          feeds the parser with the next input token
 int getNumberOfCompletableAnalyses()
          returns the number of analyses that are completable, i.e. that have no symbols or only eliminable symbols on their stack
 java.util.PriorityQueue<CandidateAnalysis> getQueue()
          returns the parser's queue (or an empty queue if it's null)
 void info()
          prints all derivations in the queue, for debugging
 void reset()
          resets the parser's internal queue to its initial state
 void setLogger(org.apache.log4j.Logger l)
           
static void setRobust(boolean v)
          sets whether the parser is allowed to use robust operations or not
 void status()
          prints some useful information
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logPrefix

static java.lang.String logPrefix

logger

static org.apache.log4j.Logger logger

cntExpansions

public static int cntExpansions
a counter to keep track of the number of expansions


cntDegradations

public static int cntDegradations
a counter to keep track of the number of externally degraded analyses (see degradeAnalysis(CandidateAnalysis, double)).


cntPrunes

public static int cntPrunes
a counter to keep track of the number of pruned analyses, i.e. of those that fell out of the beam


cntDerivations

public static int cntDerivations
a counter to keep track of the number of derivations that survived a parsing step, i.e. of those that did not fall out of the beam


beRobust

public static boolean beRobust
allow or disallow the parser to use robust operations


maxRepairs

public static int maxRepairs
the maximum number of repair hypotheses allowed per sentence


repairMalus

public static double repairMalus
the probability malus a derivation receives for each repair hypothesis


maxInsertions

public static int maxInsertions
the maximum number of insertion hypotheses allowed per sentence


insertionMalus

public static double insertionMalus
the probability malus a derivation receives for each repair insertion


maxDeletions

public static int maxDeletions
the maximum number of deletion hypotheses allowed per sentence


deletionMalus

public static double deletionMalus
the probability malus a derivation receives for each repair deletion


maxCandidatesLimit

public static int maxCandidatesLimit
the maximum number of candidate analyses allowed in the parser's queue


fillerRuleAndTagName

public static final java.lang.String fillerRuleAndTagName
the name of the POS-tag and the (parser internal) syntactic rule for fillers

See Also:
Constant Field Values

unknownTag

public static final Symbol unknownTag
the name of the POS-tag for unknown tags


endOfUtteranceTag

public static final Symbol endOfUtteranceTag
the name of the POS-tag marking the end of utterance


mQueue

private java.util.PriorityQueue<CandidateAnalysis> mQueue
the parser's main internal data structure


mGrammar

private Grammar mGrammar
the grammar used for parsing


mBaseBeamFactor

private double mBaseBeamFactor
the base beam factor

Constructor Detail

SITDBSParser

public SITDBSParser(Grammar grammar,
                    double bbf)

SITDBSParser

public SITDBSParser(Grammar grammar)

SITDBSParser

public SITDBSParser(SITDBSParser p)
copy constructor

Method Detail

feed

public void feed(java.lang.String nextToken)
feeds the parser with the next input token


feed

public void feed(Symbol nextToken)
feeds the parser with the next input token


getNumberOfCompletableAnalyses

public int getNumberOfCompletableAnalyses()
returns the number of analyses that are completable, i.e. that have no symbols or only eliminable symbols on their stack


getQueue

public java.util.PriorityQueue<CandidateAnalysis> getQueue()
returns the parser's queue (or an empty queue if it's null)
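
Because the returned object is a java.util.PriorityQueue, iterating over it directly does not guarantee priority order. The helper below is a hypothetical sketch (not part of this class) of one way to list the queue's elements best-first without modifying the queue itself: it drains a copy, which keeps the original comparator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class QueueOrderSketch {

    /** Returns the queue's elements in priority order, leaving the queue untouched. */
    static <T> List<T> inPriorityOrder(PriorityQueue<T> queue) {
        // Copying from a PriorityQueue preserves its ordering (comparator).
        PriorityQueue<T> copy = new PriorityQueue<>(queue);
        List<T> ordered = new ArrayList<>(copy.size());
        while (!copy.isEmpty()) {
            ordered.add(copy.poll());
        }
        return ordered;
    }
}
```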


degradeAnalysis

public void degradeAnalysis(CandidateAnalysis ca,
                            double malus)
degrades the probability of a given CandidateAnalysis by a given malus
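
One detail worth keeping in mind when degrading an analysis that sits in a java.util.PriorityQueue: the queue does not re-sort itself when an element's key changes in place. The sketch below shows the usual remove, modify, re-insert pattern. It is an illustration under assumptions, not this method's implementation: the Analysis class is a hypothetical stand-in for CandidateAnalysis, and the malus is assumed to be a multiplicative factor.

```java
import java.util.PriorityQueue;

final class DegradeSketch {

    /** Hypothetical stand-in for CandidateAnalysis with a mutable probability. */
    static final class Analysis {
        final String label;
        double probability;

        Analysis(String label, double probability) {
            this.label = label;
            this.probability = probability;
        }
    }

    /** Creates a queue that serves the most probable analysis first. */
    static PriorityQueue<Analysis> newQueue() {
        return new PriorityQueue<Analysis>(
                (a, b) -> Double.compare(b.probability, a.probability));
    }

    /** Degrades the analysis and restores the queue's ordering. */
    static void degrade(PriorityQueue<Analysis> queue, Analysis analysis, double malus) {
        boolean wasQueued = queue.remove(analysis); // take it out before changing its key
        analysis.probability *= malus;              // ASSUMED: multiplicative malus
        if (wasQueued) {
            queue.add(analysis);                    // re-insert at its new position
        }
    }
}
```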


reset

public void reset()
resets the parser's internal queue to its initial state


info

public void info()
prints all derivations in the queue, for debugging


status

public void status()
prints some useful information


setLogger

public void setLogger(org.apache.log4j.Logger l)

setRobust

public static void setRobust(boolean v)
sets whether the parser is allowed to use robust operations or not