java.lang.Object
  edu.cmu.sphinx.util.props.ConfigurableAdapter
    edu.cmu.sphinx.frontend.BaseDataProcessor
      inpro.sphinx.frontend.QuickSpeechMarker
public class QuickSpeechMarker
Converts a stream of SpeechClassifiedData objects, each marked as speech or non-speech, and marks out the regions that are considered speech. This is done by inserting SPEECH_START and SPEECH_END signals into the stream.
The algorithm for inserting the two signals is as follows.
The algorithm is always in one of two states: 'in-speech' or 'out-of-speech'. While 'out-of-speech', it reads audio until it encounters audio classified as speech. Once more than 'startSpeech' amount of continuous speech has been read, speech is considered to have started, and a SPEECH_START signal is inserted 'speechLeader' time before the speech first started. The state of the algorithm then changes to 'in-speech'.
Now consider the case when the algorithm is in the 'in-speech' state. If a frame of speech audio is read, it is scheduled for output. If the frame is non-speech, the algorithm reads ahead until it has seen 'endSilence' amount of continuous non-speech; at that point speech is considered to have ended, and a SPEECH_END signal is inserted 'speechTrailer' time after the first non-speech frame. The algorithm then returns to the 'out-of-speech' state. If any speech audio is encountered in between, the accounting starts all over again. While speech audio is being processed, the output delay is lowered to a minimal amount; this helps to segment both slow speech with noticeable pauses and fast speech where pauses are minimal.
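The two-state logic described above can be sketched as a small self-contained state machine. All names here (EndpointerSketch, processFrame, the signal strings) and the fixed frame duration are assumptions for illustration, not the actual QuickSpeechMarker API; the speechLeader/speechTrailer buffering is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class EndpointerSketch {
    enum State { NON_SPEECH, IN_SPEECH }

    private final int startSpeechMs;  // continuous speech needed to enter IN_SPEECH
    private final int endSilenceMs;   // continuous silence needed to leave IN_SPEECH
    private final int frameMs;        // duration of one classified frame
    private State state = State.NON_SPEECH;
    private int currentSpeech = 0;    // ms of continuous speech seen so far
    private int currentSilence = 0;   // ms of continuous silence seen so far
    final List<String> signals = new ArrayList<>();

    EndpointerSketch(int startSpeechMs, int endSilenceMs, int frameMs) {
        this.startSpeechMs = startSpeechMs;
        this.endSilenceMs = endSilenceMs;
        this.frameMs = frameMs;
    }

    /** Feed one speech/non-speech classified frame into the state machine. */
    void processFrame(boolean isSpeech) {
        if (state == State.NON_SPEECH) {
            if (isSpeech) {
                currentSpeech += frameMs;
                if (currentSpeech >= startSpeechMs) { // enough continuous speech
                    signals.add("SPEECH_START");
                    state = State.IN_SPEECH;
                    currentSilence = 0;
                }
            } else {
                currentSpeech = 0; // the accounting starts all over again
            }
        } else { // IN_SPEECH
            if (isSpeech) {
                currentSilence = 0; // the accounting starts all over again
            } else {
                currentSilence += frameMs;
                if (currentSilence >= endSilenceMs) { // enough continuous silence
                    signals.add("SPEECH_END");
                    state = State.NON_SPEECH;
                    currentSpeech = 0;
                }
            }
        }
    }
}
```

With 10 ms frames, a startSpeech of 30 ms and an endSilence of 20 ms, three consecutive speech frames emit SPEECH_START and two subsequent silence frames emit SPEECH_END.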
Nested Class Summary

    private static class QuickSpeechMarker.State
Field Summary

    private java.util.List<edu.cmu.sphinx.frontend.Data> buffer
        Frames that have to be kept, in order to possibly insert a signal into the stream.

    private int currentSilence

    private int currentSpeech

    private int endSilenceTime

    private java.util.Queue<edu.cmu.sphinx.frontend.Data> outputQueue
        Processed frames which are ready to be released.

    static java.lang.String PROP_END_SILENCE
        The property for the amount of time in silence (in milliseconds) to be considered as utterance end.

    static java.lang.String PROP_SPEECH_LEADER
        The property for the amount of time (in milliseconds) before speech start to be included as speech data.

    static java.lang.String PROP_SPEECH_TRAILER
        The property for the amount of time (in milliseconds) after speech ends to be included as speech data.

    static java.lang.String PROP_START_SPEECH
        The property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.

    private int speechLeader

    private int speechTrailer

    private int startSpeechTime

    (package private) QuickSpeechMarker.State state
Fields inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
    logger
Constructor Summary

    QuickSpeechMarker()

    QuickSpeechMarker(int startSpeechTime, int endSilenceTime, int speechLeader, int speechTrailer)
Method Summary

    private void endOfSpeech()
        Transition from IN_SPEECH to NON_SPEECH.

    private void flushBuffer()
        Flush the internal buffer to outputQueue.

    int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
        Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.

    (package private) long getCollectTimeOfNextFrame()
        The collection time of the next frame in the buffer.

    edu.cmu.sphinx.frontend.Data getData()
        Returns the next Data object.

    private void handleNewFrameInSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
        What to do when we're in speech.

    private void handleNewFrameNonSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
        What to do when we're out of speech: if the incoming audio is speech, we check whether startSpeechTime amount of speech has accumulated yet.

    void initialize()
        Initializes this SpeechMarker.

    boolean inSpeech()

    void newProperties(edu.cmu.sphinx.util.props.PropertySheet ps)

    edu.cmu.sphinx.frontend.Data nextOutputFrame()
        Gets one Data object from outputQueue, unwraps SpeechClassifiedData and tags DataStartSignals.

    private void processOneInputFrame()
        Read one frame of input with all the necessary accounting.

    private edu.cmu.sphinx.frontend.Data readData()
        Read one data object from the predecessor in the frontend pipeline.

    private void reset()
        Resets this SpeechMarker to a starting state.

    private void startOfSpeech()
        Transition from NON_SPEECH to IN_SPEECH.

    java.lang.String toString()

    private void trimBufferToLeader()
        Release the buffer so that it contains just speechLeader amount of audio (and update currentSilence accordingly).

    private void updateCounters(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
        Update the counters that track speech/silence duration.
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
    getPredecessor, getTimer, setPredecessor

Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
    getName, initLogger

Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
@S4Integer(defaultValue=200) public static final java.lang.String PROP_START_SPEECH
private int startSpeechTime
@S4Integer(defaultValue=500) public static final java.lang.String PROP_END_SILENCE
private int endSilenceTime
@S4Integer(defaultValue=50) public static final java.lang.String PROP_SPEECH_LEADER
private int speechLeader
@S4Integer(defaultValue=50) public static final java.lang.String PROP_SPEECH_TRAILER
private int speechTrailer
private java.util.Queue<edu.cmu.sphinx.frontend.Data> outputQueue
private java.util.List<edu.cmu.sphinx.frontend.Data> buffer
private int currentSilence
private int currentSpeech
QuickSpeechMarker.State state
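In a Sphinx-style XML configuration file, the four annotated properties might be set as follows. This is a sketch under assumptions: the property-name strings ("startSpeech", "endSilence", "speechLeader", "speechTrailer") are guesses for illustration, since the actual string values of the PROP_* constants are not shown on this page; the values are the @S4Integer defaults listed above.

```xml
<!-- hypothetical configuration fragment; property names are assumed -->
<component name="speechMarker" type="inpro.sphinx.frontend.QuickSpeechMarker">
    <property name="startSpeech" value="200"/>
    <property name="endSilence" value="500"/>
    <property name="speechLeader" value="50"/>
    <property name="speechTrailer" value="50"/>
</component>
```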
Constructor Detail
public QuickSpeechMarker(int startSpeechTime, int endSilenceTime, int speechLeader, int speechTrailer)
public QuickSpeechMarker()
Method Detail
public void newProperties(edu.cmu.sphinx.util.props.PropertySheet ps) throws edu.cmu.sphinx.util.props.PropertyException
    Specified by: newProperties in interface edu.cmu.sphinx.util.props.Configurable
    Overrides: newProperties in class edu.cmu.sphinx.util.props.ConfigurableAdapter
    Throws: edu.cmu.sphinx.util.props.PropertyException
public void initialize()
    Initializes this SpeechMarker.
    Specified by: initialize in interface edu.cmu.sphinx.frontend.DataProcessor
    Overrides: initialize in class edu.cmu.sphinx.frontend.BaseDataProcessor
private void reset()
public edu.cmu.sphinx.frontend.Data getData()
    Returns the next Data object.
    Specified by: getData in interface edu.cmu.sphinx.frontend.DataProcessor
    Overrides: getData in class edu.cmu.sphinx.frontend.BaseDataProcessor
public edu.cmu.sphinx.frontend.Data nextOutputFrame()
private void processOneInputFrame()
private void updateCounters(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
private void handleNewFrameInSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
    What to do when we're in speech: we call endOfSpeech() if it's not speech, and we release the buffer *if* we are still within speechTrailer amount of silence (otherwise, we have to hold back the remaining frames, because a DataEndSignal may have to be inserted later).
    Parameters:
        scd - the frame at this time step

private void handleNewFrameNonSpeech(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData scd)
    Parameters:
        scd - the frame at this time step

private void startOfSpeech()
long getCollectTimeOfNextFrame()
private void endOfSpeech()
private void flushBuffer()
private void trimBufferToLeader()
public java.lang.String toString()
    Overrides: toString in class edu.cmu.sphinx.util.props.ConfigurableAdapter
private edu.cmu.sphinx.frontend.Data readData() throws edu.cmu.sphinx.frontend.DataProcessingException
    Throws: edu.cmu.sphinx.frontend.DataProcessingException
public int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
    Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.
    Parameters:
        audio - the SpeechClassifiedData object
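The frame-duration arithmetic that a method like getAudioTime presumably performs can be sketched as follows. The helper name and the integer truncation are assumptions for illustration; the real method reads the sample count and sample rate from the SpeechClassifiedData object.

```java
class AudioTimeSketch {
    /** Duration in milliseconds of a frame of numSamples samples at sampleRate Hz. */
    static int audioTimeMs(int numSamples, int sampleRate) {
        // milliseconds = samples * 1000 / samples-per-second
        return numSamples * 1000 / sampleRate;
    }
}
```

For example, a 160-sample frame at 16 kHz corresponds to 10 ms of audio.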
public boolean inSpeech()