inpro.sphinx.frontend
Class QuickSpeechMarker

java.lang.Object
  extended by edu.cmu.sphinx.util.props.ConfigurableAdapter
      extended by edu.cmu.sphinx.frontend.BaseDataProcessor
          extended by inpro.sphinx.frontend.QuickSpeechMarker
All Implemented Interfaces:
edu.cmu.sphinx.frontend.DataProcessor, edu.cmu.sphinx.util.props.Configurable

public class QuickSpeechMarker
extends edu.cmu.sphinx.frontend.BaseDataProcessor

Converts a stream of SpeechClassifiedData objects, each classified as speech or non-speech, into a stream in which the speech regions are marked out. This is done by inserting SPEECH_START and SPEECH_END signals into the stream.

The algorithm for inserting the two signals is as follows.

The algorithm is always in one of two states: 'in-speech' and 'out-of-speech'. In the 'out-of-speech' state, it reads audio until it encounters audio classified as speech. Once more than 'startSpeech' amount of continuous speech has been read, speech is considered to have started, and a SPEECH_START signal is inserted 'speechLeader' time before the point where speech first started. The algorithm then changes to the 'in-speech' state.

Now consider the case when the algorithm is in the 'in-speech' state. Audio classified as speech is scheduled for output directly. When non-speech audio arrives, the algorithm reads ahead until it has seen 'endSilence' amount of continuous non-speech; at that point speech is considered to have ended, a SPEECH_END signal is inserted 'speechTrailer' time after the first non-speech audio, and the algorithm returns to the 'out-of-speech' state. If any speech audio is encountered in between, the silence accounting starts over. While speech audio is being processed, the output delay is lowered to a minimal amount; this helps to segment both slow speech with noticeable pauses and fast speech where pauses are minimal.
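The two-state logic above can be sketched in a few lines. This is a minimal, self-contained illustration that is independent of the Sphinx classes: the fixed 10 ms frame length, the threshold values, and the plain-string "signals" are assumptions for demonstration only, and the real marker additionally buffers frames so that SPEECH_START can be placed speechLeader time earlier and SPEECH_END speechTrailer time after silence begins.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the two-state endpointing algorithm described above.
// Frame length, thresholds, and the string "signals" are illustrative
// assumptions; they are not the actual Sphinx types or defaults.
public class EndpointSketch {
    enum State { NON_SPEECH, IN_SPEECH }

    static final int FRAME_MS = 10;       // assumed fixed frame length
    static final int START_SPEECH = 200;  // ms of speech before SPEECH_START
    static final int END_SILENCE = 500;   // ms of silence before SPEECH_END

    State state = State.NON_SPEECH;
    int currentSpeech = 0;   // ms of continuous speech seen so far
    int currentSilence = 0;  // ms of continuous silence seen so far
    List<String> output = new ArrayList<>();

    /** Feed one classified frame; true = speech, false = non-speech. */
    void process(boolean isSpeech) {
        if (isSpeech) { currentSpeech += FRAME_MS; currentSilence = 0; }
        else          { currentSilence += FRAME_MS; currentSpeech = 0; }

        if (state == State.NON_SPEECH && currentSpeech >= START_SPEECH) {
            output.add("SPEECH_START"); // real code inserts it speechLeader ms earlier
            state = State.IN_SPEECH;
        } else if (state == State.IN_SPEECH && currentSilence >= END_SILENCE) {
            output.add("SPEECH_END");   // real code places it speechTrailer ms into the silence
            state = State.NON_SPEECH;
        }
        output.add(isSpeech ? "speech-frame" : "silence-frame");
    }
}
```

Feeding 200 ms of speech followed by 500 ms of silence produces one SPEECH_START followed later by one SPEECH_END, mirroring the state transitions described above.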


Nested Class Summary
private static class QuickSpeechMarker.State
           
 
Field Summary
private  java.util.List<edu.cmu.sphinx.frontend.Data> buffer
          frames that have to be kept, in order to possibly insert a signal into the stream
private  int currentSilence
           
private  int currentSpeech
           
private  int endSilenceTime
           
private  java.util.Queue<edu.cmu.sphinx.frontend.Data> outputQueue
          processed frames which are ready to be released
static java.lang.String PROP_END_SILENCE
          The property for the amount of time in silence (in milliseconds) to be considered as utterance end.
static java.lang.String PROP_SPEECH_LEADER
          The property for the amount of time (in milliseconds) before speech start to be included as speech data.
static java.lang.String PROP_SPEECH_TRAILER
          The property for the amount of time (in milliseconds) after speech ends to be included as speech data.
static java.lang.String PROP_START_SPEECH
          The property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.
private  int speechLeader
           
private  int speechTrailer
           
private  int startSpeechTime
           
(package private)  QuickSpeechMarker.State state
           
 
Fields inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
logger
 
Constructor Summary
QuickSpeechMarker()
           
QuickSpeechMarker(int startSpeechTime, int endSilenceTime, int speechLeader, int speechTrailer)
           
 
Method Summary
private  void endOfSpeech()
          transition from IN_SPEECH to NON_SPEECH.
private  void flushBuffer()
          flush the internal buffer to outputQueue
 int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
          Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.
(package private)  long getCollectTimeOfNextFrame()
          the collection time of the next frame in the buffer
 edu.cmu.sphinx.frontend.Data getData()
          Returns the next Data object.
private  void handleNewFrameInSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
          what to do when we're in speech.
private  void handleNewFrameNonSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
          what to do when we're out-of-speech: if the incoming audio isSpeech, we check whether startSpeechTime amount of speech has accumulated yet.
 void initialize()
          Initializes this QuickSpeechMarker
 boolean inSpeech()
           
 void newProperties(edu.cmu.sphinx.util.props.PropertySheet ps)
           
 edu.cmu.sphinx.frontend.Data nextOutputFrame()
          gets one Data object from outputQueue, unwraps SpeechClassifiedData and tags DataStartSignals
private  void processOneInputFrame()
          read one frame of input with all the necessary accounting.
private  edu.cmu.sphinx.frontend.Data readData()
          read one data object from the predecessor in the frontend pipeline
private  void reset()
          Resets this QuickSpeechMarker to a starting state.
private  void startOfSpeech()
          transition from NON_SPEECH to IN_SPEECH.
 java.lang.String toString()
           
private  void trimBufferToLeader()
          release the buffer so that it contains just speechLeader amount of audio (and update currentSilence accordingly)
private  void updateCounters(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
          update the counters that track speech/silence duration.
 
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getPredecessor, getTimer, setPredecessor
 
Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
getName, initLogger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_START_SPEECH

@S4Integer(defaultValue=200)
public static final java.lang.String PROP_START_SPEECH
The property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.

See Also:
Constant Field Values

startSpeechTime

private int startSpeechTime

PROP_END_SILENCE

@S4Integer(defaultValue=500)
public static final java.lang.String PROP_END_SILENCE
The property for the amount of time in silence (in milliseconds) to be considered as utterance end.

See Also:
Constant Field Values

endSilenceTime

private int endSilenceTime

PROP_SPEECH_LEADER

@S4Integer(defaultValue=50)
public static final java.lang.String PROP_SPEECH_LEADER
The property for the amount of time (in milliseconds) before speech start to be included as speech data.

See Also:
Constant Field Values

speechLeader

private int speechLeader

PROP_SPEECH_TRAILER

@S4Integer(defaultValue=50)
public static final java.lang.String PROP_SPEECH_TRAILER
The property for the amount of time (in milliseconds) after speech ends to be included as speech data.

See Also:
Constant Field Values

speechTrailer

private int speechTrailer

outputQueue

private java.util.Queue<edu.cmu.sphinx.frontend.Data> outputQueue
processed frames which are ready to be released


buffer

private java.util.List<edu.cmu.sphinx.frontend.Data> buffer
frames that have to be kept, in order to possibly insert a signal into the stream


currentSilence

private int currentSilence

currentSpeech

private int currentSpeech

state

QuickSpeechMarker.State state
Constructor Detail

QuickSpeechMarker

public QuickSpeechMarker(int startSpeechTime,
                         int endSilenceTime,
                         int speechLeader,
                         int speechTrailer)

QuickSpeechMarker

public QuickSpeechMarker()
Method Detail

newProperties

public void newProperties(edu.cmu.sphinx.util.props.PropertySheet ps)
                   throws edu.cmu.sphinx.util.props.PropertyException
Specified by:
newProperties in interface edu.cmu.sphinx.util.props.Configurable
Overrides:
newProperties in class edu.cmu.sphinx.util.props.ConfigurableAdapter
Throws:
edu.cmu.sphinx.util.props.PropertyException

initialize

public void initialize()
Initializes this QuickSpeechMarker

Specified by:
initialize in interface edu.cmu.sphinx.frontend.DataProcessor
Overrides:
initialize in class edu.cmu.sphinx.frontend.BaseDataProcessor

reset

private void reset()
Resets this QuickSpeechMarker to a starting state.


getData

public edu.cmu.sphinx.frontend.Data getData()
Returns the next Data object.

Specified by:
getData in interface edu.cmu.sphinx.frontend.DataProcessor
Specified by:
getData in class edu.cmu.sphinx.frontend.BaseDataProcessor
Returns:
the next Data object, or null after all data has been returned

nextOutputFrame

public edu.cmu.sphinx.frontend.Data nextOutputFrame()
gets one Data object from outputQueue, unwraps SpeechClassifiedData and tags DataStartSignals

Returns:
the next Data object from the outputQueue

processOneInputFrame

private void processOneInputFrame()
read one frame of input with all the necessary accounting. this may or may not add frames to outputQueue


updateCounters

private void updateCounters(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
update the counters that track speech/silence duration. depending on whether incoming audio isSpeech or not, we update currentSpeech/Silence counters correspondingly.


handleNewFrameInSpeech

private void handleNewFrameInSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
what to do when we're in speech. If the incoming audio isSpeech, we can release the buffer directly to the outputQueue. If it is non-speech and the accumulated silence reaches endSilenceTime, we transition to end of speech via endOfSpeech(). If it is non-speech but the silence is still shorter than endSilenceTime, we release the buffer only if we are still within speechTrailer amount of silence; otherwise we have to hold back the remaining frames, because a SpeechEndSignal may have to be inserted before them later.

Parameters:
scd - the frame at this time step

handleNewFrameNonSpeech

private void handleNewFrameNonSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
what to do when we're out-of-speech: if the incoming audio isSpeech, we check whether startSpeechTime amount of speech has accumulated yet. If so, we transition to IN_SPEECH.

Parameters:
scd - the frame at this time step

startOfSpeech

private void startOfSpeech()
transition from NON_SPEECH to IN_SPEECH. we insert a SpeechStartSignal into the outputQueue and release the buffer to the outputQueue


getCollectTimeOfNextFrame

long getCollectTimeOfNextFrame()
the collection time of the next frame in the buffer


endOfSpeech

private void endOfSpeech()
transition from IN_SPEECH to NON_SPEECH. we insert a SpeechEndSignal into the outputQueue and partially release the buffer so that it retains no more audio than is needed for the next speechLeader


flushBuffer

private void flushBuffer()
flush the internal buffer to outputQueue


trimBufferToLeader

private void trimBufferToLeader()
release the buffer so that it contains just speechLeader amount of audio (and update currentSilence accordingly)
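The trimming idea can be illustrated with a self-contained sketch: drop frames from the front of the buffer until at most speechLeader milliseconds of audio remain. The fixed frame length and all names here are assumptions for illustration; in the real marker the released frames go to the outputQueue and currentSilence is updated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of "trim buffer to leader": release buffered frames
// from the front until only leaderMs of audio remains. The fixed 10 ms
// frame length and the method names are assumptions, not the real API.
public class LeaderTrimSketch {
    static final int FRAME_MS = 10; // assumed fixed frame length

    /** Drop frames from the front until at most leaderMs of audio remains;
     *  returns how many frames were released. */
    static <T> int trimToLeader(Deque<T> buffer, int leaderMs) {
        int released = 0;
        while (buffer.size() * FRAME_MS > leaderMs) {
            buffer.removeFirst(); // in the real marker these go to outputQueue
            released++;
        }
        return released;
    }
}
```

With 120 ms buffered and a 50 ms leader, the first 70 ms (seven frames) are released and the most recent five frames stay buffered.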


toString

public java.lang.String toString()
Overrides:
toString in class edu.cmu.sphinx.util.props.ConfigurableAdapter

readData

private edu.cmu.sphinx.frontend.Data readData()
                                       throws edu.cmu.sphinx.frontend.DataProcessingException
read one data object from the predecessor in the frontend pipeline

Throws:
edu.cmu.sphinx.frontend.DataProcessingException

getAudioTime

public int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.

Parameters:
audio - the SpeechClassifiedData object
Returns:
the amount of audio data in milliseconds
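A duration computation of this kind typically divides the sample count by the sample rate. The sketch below shows that arithmetic in isolation; the parameter names are assumptions, and the real method reads these values from the SpeechClassifiedData object.

```java
// Sketch of the millisecond-duration arithmetic a method like getAudioTime
// would perform. Parameter names are assumptions for illustration; the real
// method obtains sample count and rate from SpeechClassifiedData.
public class AudioTimeSketch {
    /** Duration in milliseconds of numSamples samples at sampleRate Hz. */
    static int audioTimeMs(int numSamples, int sampleRate) {
        // use a long intermediate to avoid int overflow for large counts
        return (int) (numSamples * 1000L / sampleRate);
    }
}
```

For example, 160 samples at 16 kHz (a common Sphinx frame size) correspond to 10 ms of audio.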

inSpeech

public boolean inSpeech()