inpro.sphinx.frontend
Class QuickSpeechMarker

java.lang.Object
  extended by edu.cmu.sphinx.util.props.ConfigurableAdapter
      extended by edu.cmu.sphinx.frontend.BaseDataProcessor
          extended by inpro.sphinx.frontend.QuickSpeechMarker
All Implemented Interfaces:
edu.cmu.sphinx.frontend.DataProcessor, edu.cmu.sphinx.util.props.Configurable

public class QuickSpeechMarker
extends edu.cmu.sphinx.frontend.BaseDataProcessor

Converts a stream of SpeechClassifiedData objects, each classified as speech or non-speech, into a stream in which the speech regions are marked out. This is done by inserting SPEECH_START and SPEECH_END signals into the stream.

The algorithm for inserting the two signals is as follows.

The algorithm is always in one of two states: 'in-speech' and 'out-of-speech'. In the 'out-of-speech' state, it reads audio until it encounters audio classified as speech. Once more than 'startSpeech' amount of continuous speech has been read, speech is considered to have started, and a SPEECH_START signal is inserted 'speechLeader' time before the point where speech first started. The algorithm then changes to the 'in-speech' state.

Now consider the case when the algorithm is in the 'in-speech' state. Audio classified as speech is scheduled for output directly. When non-speech audio arrives, the algorithm reads ahead until it has seen 'endSilence' amount of continuous non-speech; at that point speech is considered to have ended, a SPEECH_END signal is inserted 'speechTrailer' time after the first non-speech audio, and the algorithm returns to the 'out-of-speech' state. If any speech audio is encountered in between, the silence accounting starts over. While speech audio is being processed, the output delay is lowered to a minimal amount; this helps to segment both slow speech with noticeable pauses and fast speech where pauses are minimal.
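The two-state logic above can be sketched in a few lines. This is a minimal, self-contained illustration that is independent of the Sphinx classes: the fixed 10 ms frame length, the threshold values, and the plain-string "signals" are assumptions for demonstration only, and the real marker additionally buffers frames so that SPEECH_START can be placed speechLeader time earlier and SPEECH_END speechTrailer time after silence begins.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the two-state endpointing algorithm described above.
// Frame length, thresholds, and the string "signals" are illustrative
// assumptions; they are not the actual Sphinx types or defaults.
public class EndpointSketch {
    enum State { NON_SPEECH, IN_SPEECH }

    static final int FRAME_MS = 10;       // assumed fixed frame length
    static final int START_SPEECH = 200;  // ms of speech before SPEECH_START
    static final int END_SILENCE = 500;   // ms of silence before SPEECH_END

    State state = State.NON_SPEECH;
    int currentSpeech = 0;   // ms of continuous speech seen so far
    int currentSilence = 0;  // ms of continuous silence seen so far
    List<String> output = new ArrayList<>();

    /** Feed one classified frame; true = speech, false = non-speech. */
    void process(boolean isSpeech) {
        if (isSpeech) { currentSpeech += FRAME_MS; currentSilence = 0; }
        else          { currentSilence += FRAME_MS; currentSpeech = 0; }

        if (state == State.NON_SPEECH && currentSpeech >= START_SPEECH) {
            output.add("SPEECH_START"); // real code inserts it speechLeader ms earlier
            state = State.IN_SPEECH;
        } else if (state == State.IN_SPEECH && currentSilence >= END_SILENCE) {
            output.add("SPEECH_END");   // real code places it speechTrailer ms into the silence
            state = State.NON_SPEECH;
        }
        output.add(isSpeech ? "speech-frame" : "silence-frame");
    }
}
```

Feeding 200 ms of speech followed by 500 ms of silence produces one SPEECH_START followed later by one SPEECH_END, mirroring the state transitions described above.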


Nested Class Summary
private static class QuickSpeechMarker.State
           
 
Field Summary
private  java.util.List<edu.cmu.sphinx.frontend.Data> buffer
          frames that have to be kept, in order to possibly insert a signal into the stream
private  int currentSilence
           
private  int currentSpeech
           
private  int endSilenceTime
           
private  java.util.Queue<edu.cmu.sphinx.frontend.Data> outputQueue
          processed frames which are ready to be released
static java.lang.String PROP_END_SILENCE
          The property for the amount of time in silence (in milliseconds) to be considered as utterance end.
static java.lang.String PROP_SPEECH_LEADER
          The property for the amount of time (in milliseconds) before speech start to be included as speech data.
static java.lang.String PROP_SPEECH_TRAILER
          The property for the amount of time (in milliseconds) after speech ends to be included as speech data.
static java.lang.String PROP_START_SPEECH
          The property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.
private  int speechLeader
           
private  int speechTrailer
           
private  int startSpeechTime
           
(package private)  QuickSpeechMarker.State state
           
 
Fields inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
logger
 
Constructor Summary
QuickSpeechMarker()
           
QuickSpeechMarker(int startSpeechTime, int endSilenceTime, int speechLeader, int speechTrailer)
           
 
Method Summary
private  void endOfSpeech()
          transition from IN_SPEECH to NON_SPEECH.
private  void flushBuffer()
          flush the internal buffer to outputQueue
 int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
          Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.
(package private)  long getCollectTimeOfNextFrame()
          the collection time of the next frame in the buffer
 edu.cmu.sphinx.frontend.Data getData()
          Returns the next Data object.
private  void handleNewFrameInSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
          what to do when we're in speech.
private  void handleNewFrameNonSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
          what to do when we're out-of-speech: if the incoming audio isSpeech, we check whether startSpeechTime amount of speech has accumulated yet.
 void initialize()
          Initializes this QuickSpeechMarker
 boolean inSpeech()
           
 void newProperties(edu.cmu.sphinx.util.props.PropertySheet ps)
           
 edu.cmu.sphinx.frontend.Data nextOutputFrame()
          gets one Data object from outputQueue, unwraps SpeechClassifiedData and tags DataStartSignals
private  void processOneInputFrame()
          read one frame of input with all the necessary accounting.
private  edu.cmu.sphinx.frontend.Data readData()
          read one data object from the predecessor in the frontend pipeline
private  void reset()
          Resets this QuickSpeechMarker to a starting state.
private  void startOfSpeech()
          transition from NON_SPEECH to IN_SPEECH.
 java.lang.String toString()
           
private  void trimBufferToLeader()
          release the buffer so that it contains just speechLeader amount of audio (and update currentSilence accordingly)
private  void updateCounters(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
          update the counters that track speech/silence duration.
 
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getPredecessor, getTimer, setPredecessor
 
Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
getName, initLogger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_START_SPEECH

@S4Integer(defaultValue=200)
public static final java.lang.String PROP_START_SPEECH
The property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.

See Also:
Constant Field Values

startSpeechTime

private int startSpeechTime

PROP_END_SILENCE

@S4Integer(defaultValue=500)
public static final java.lang.String PROP_END_SILENCE
The property for the amount of time in silence (in milliseconds) to be considered as utterance end.

See Also:
Constant Field Values

endSilenceTime

private int endSilenceTime

PROP_SPEECH_LEADER

@S4Integer(defaultValue=50)
public static final java.lang.String PROP_SPEECH_LEADER
The property for the amount of time (in milliseconds) before speech start to be included as speech data.

See Also:
Constant Field Values

speechLeader

private int speechLeader

PROP_SPEECH_TRAILER

@S4Integer(defaultValue=50)
public static final java.lang.String PROP_SPEECH_TRAILER
The property for the amount of time (in milliseconds) after speech ends to be included as speech data.

See Also:
Constant Field Values

speechTrailer

private int speechTrailer

outputQueue

private java.util.Queue<edu.cmu.sphinx.frontend.Data> outputQueue
processed frames which are ready to be released


buffer

private java.util.List<edu.cmu.sphinx.frontend.Data> buffer
frames that have to be kept, in order to possibly insert a signal into the stream


currentSilence

private int currentSilence

currentSpeech

private int currentSpeech

state

QuickSpeechMarker.State state
Constructor Detail

QuickSpeechMarker

public QuickSpeechMarker(int startSpeechTime,
                         int endSilenceTime,
                         int speechLeader,
                         int speechTrailer)

QuickSpeechMarker

public QuickSpeechMarker()
Method Detail

newProperties

public void newProperties(edu.cmu.sphinx.util.props.PropertySheet ps)
                   throws edu.cmu.sphinx.util.props.PropertyException
Specified by:
newProperties in interface edu.cmu.sphinx.util.props.Configurable
Overrides:
newProperties in class edu.cmu.sphinx.util.props.ConfigurableAdapter
Throws:
edu.cmu.sphinx.util.props.PropertyException

initialize

public void initialize()
Initializes this QuickSpeechMarker

Specified by:
initialize in interface edu.cmu.sphinx.frontend.DataProcessor
Overrides:
initialize in class edu.cmu.sphinx.frontend.BaseDataProcessor

reset

private void reset()
Resets this QuickSpeechMarker to a starting state.


getData

public edu.cmu.sphinx.frontend.Data getData()
Returns the next Data object.

Specified by:
getData in interface edu.cmu.sphinx.frontend.DataProcessor
Specified by:
getData in class edu.cmu.sphinx.frontend.BaseDataProcessor
Returns:
the next Data object, or null after all data has been returned

nextOutputFrame

public edu.cmu.sphinx.frontend.Data nextOutputFrame()
gets one Data object from outputQueue, unwraps SpeechClassifiedData and tags DataStartSignals

Returns:
the next Data object from the outputQueue

processOneInputFrame

private void processOneInputFrame()
read one frame of input with all the necessary accounting. this may or may not add frames to outputQueue


updateCounters

private void updateCounters(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
update the counters that track speech/silence duration. depending on whether incoming audio isSpeech or not, we update currentSpeech/Silence counters correspondingly.


handleNewFrameInSpeech

private void handleNewFrameInSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
what to do when we're in speech. If the incoming audio isSpeech, we can release the buffer directly to the outputQueue. If it is non-speech and the accumulated silence reaches endSilenceTime, we transition to end of speech via endOfSpeech(). If it is non-speech but the silence is still shorter than endSilenceTime, we release the buffer only if we are still within speechTrailer amount of silence; otherwise we have to hold back the remaining frames, because a SpeechEndSignal may have to be inserted before them later.

Parameters:
scd - the frame at this time step

handleNewFrameNonSpeech

private void handleNewFrameNonSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
what to do when we're out-of-speech: if the incoming audio isSpeech, we check whether startSpeechTime amount of speech has accumulated yet. If so, we transition to IN_SPEECH.

Parameters:
scd - the frame at this time step

startOfSpeech

private void startOfSpeech()
transition from NON_SPEECH to IN_SPEECH. we insert a SpeechStartSignal into the outputQueue and release the buffer to the outputQueue


getCollectTimeOfNextFrame

long getCollectTimeOfNextFrame()
the collection time of the next frame in the buffer


endOfSpeech

private void endOfSpeech()
transition from IN_SPEECH to NON_SPEECH. we insert a SpeechEndSignal into the outputQueue and partially release the buffer so that it retains no more audio than is needed for the next speechLeader


flushBuffer

private void flushBuffer()
flush the internal buffer to outputQueue


trimBufferToLeader

private void trimBufferToLeader()
release the buffer so that it contains just speechLeader amount of audio (and update currentSilence accordingly)
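The trimming idea can be illustrated with a self-contained sketch: drop frames from the front of the buffer until at most speechLeader milliseconds of audio remain. The fixed frame length and all names here are assumptions for illustration; in the real marker the released frames go to the outputQueue and currentSilence is updated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of "trim buffer to leader": release buffered frames
// from the front until only leaderMs of audio remains. The fixed 10 ms
// frame length and the method names are assumptions, not the real API.
public class LeaderTrimSketch {
    static final int FRAME_MS = 10; // assumed fixed frame length

    /** Drop frames from the front until at most leaderMs of audio remains;
     *  returns how many frames were released. */
    static <T> int trimToLeader(Deque<T> buffer, int leaderMs) {
        int released = 0;
        while (buffer.size() * FRAME_MS > leaderMs) {
            buffer.removeFirst(); // in the real marker these go to outputQueue
            released++;
        }
        return released;
    }
}
```

With 120 ms buffered and a 50 ms leader, the first 70 ms (seven frames) are released and the most recent five frames stay buffered.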


toString

public java.lang.String toString()
Overrides:
toString in class edu.cmu.sphinx.util.props.ConfigurableAdapter

readData

private edu.cmu.sphinx.frontend.Data readData()
                                       throws edu.cmu.sphinx.frontend.DataProcessingException
read one data object from the predecessor in the frontend pipeline

Throws:
edu.cmu.sphinx.frontend.DataProcessingException

getAudioTime

public int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.

Parameters:
audio - the SpeechClassifiedData object
Returns:
the amount of audio data in milliseconds
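A duration computation of this kind typically divides the sample count by the sample rate. The sketch below shows that arithmetic in isolation; the parameter names are assumptions, and the real method reads these values from the SpeechClassifiedData object.

```java
// Sketch of the millisecond-duration arithmetic a method like getAudioTime
// would perform. Parameter names are assumptions for illustration; the real
// method obtains sample count and rate from SpeechClassifiedData.
public class AudioTimeSketch {
    /** Duration in milliseconds of numSamples samples at sampleRate Hz. */
    static int audioTimeMs(int numSamples, int sampleRate) {
        // use a long intermediate to avoid int overflow for large counts
        return (int) (numSamples * 1000L / sampleRate);
    }
}
```

For example, 160 samples at 16 kHz (a common Sphinx frame size) correspond to 10 ms of audio.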

inSpeech

public boolean inSpeech()