Using the Speech API to convert speech to text

Some time ago I created a “listen.exe” tool which used SAPI’s ISpRecoContext to listen to the microphone and dump any recognized text to the console.

Today I had to debug an issue with SAPI reading from a .wav file, so I updated the tool to accept a file argument: listen.exe –file foo.wav consumes the audio in foo.wav instead of listening to the microphone.

Pseudocode for the difference:

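// create the recognizer and a stream object
CoCreate(ISpRecognizer);
CoCreate(ISpStream);
// point the stream at the .wav file, then make it the recognizer's input
pSpStream->BindToFile(file);
pSpRecognizer->SetInput(pSpStream);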

Also, we have to tell the ISpRecoContext that we’re interested in SPEI_END_SR_STREAM events as well as SPEI_RECOGNITION events.
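As a rough illustration of what "interested in both events" means, the interest set is a 64-bit mask with one bit per event ordinal. The sketch below rebuilds that mask in portable C++ so it can be read without the Windows SDK. The ordinals and the flag-check bits are written from memory of sapi.h and should be treated as assumptions; eventFlag and listenInterest are hypothetical names, not part of SAPI or of listen.exe.

```cpp
#include <cstdint>

// Event ordinals as recalled from SPEVENTENUM in sapi.h (assumed values;
// real code should include sapi.h and use the SDK's own definitions).
constexpr unsigned kSpeiReserved1   = 30;
constexpr unsigned kSpeiReserved2   = 33;
constexpr unsigned kSpeiEndSrStream = 34;
constexpr unsigned kSpeiRecognition = 38;

// The SDK's SPFEI() macro ORs in two reserved "flag check" bits.
constexpr std::uint64_t kFlagCheck =
    (std::uint64_t{1} << kSpeiReserved1) | (std::uint64_t{1} << kSpeiReserved2);

// Hypothetical stand-in for SPFEI(): turn one event ordinal into its mask.
constexpr std::uint64_t eventFlag(unsigned ordinal) {
    return (std::uint64_t{1} << ordinal) | kFlagCheck;
}

// Hypothetical helper: recognition events always, plus the end-of-stream
// event when the input is a file, so the tool knows when the audio ran out.
constexpr std::uint64_t listenInterest(bool readingFromFile) {
    std::uint64_t mask = eventFlag(kSpeiRecognition);
    if (readingFromFile) {
        mask |= eventFlag(kSpeiEndSrStream);
    }
    return mask;
}
```

With the real headers, the equivalent would be to pass SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_END_SR_STREAM) as both arguments of the context's SetInterest call.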

Full source and binaries attached.

A gotcha: the .wav file must have WAVEFORMATEX.wFormatTag = WAVE_FORMAT_PCM. If it’s anything else, ISpRecoGrammar::SetDictationState fails with SPERR_UNSUPPORTED_FORMAT. Neither WAVE_FORMAT_IEEE_FLOAT nor WAVE_FORMAT_EXTENSIBLE with SubFormat = KSDATAFORMAT_SUBTYPE_PCM works.
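One way to catch this before handing a file to SAPI is to read the format tag out of the .wav header yourself. The following is a minimal sketch, not what listen.exe does: readWavFormatTag and isPlainPcm are hypothetical helpers, and the parser assumes the whole file is already in memory as a standard little-endian RIFF/WAVE layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Format tag values as defined in the Windows multimedia headers.
constexpr std::uint16_t kWaveFormatPcm        = 0x0001;
constexpr std::uint16_t kWaveFormatIeeeFloat  = 0x0003;
constexpr std::uint16_t kWaveFormatExtensible = 0xFFFE;

using Bytes = std::vector<std::uint8_t>;

// Little-endian readers; callers bounds-check before calling.
static std::uint16_t readU16(const Bytes& b, std::size_t i) {
    return static_cast<std::uint16_t>(b[i] | (b[i + 1] << 8));
}
static std::uint32_t readU32(const Bytes& b, std::size_t i) {
    return static_cast<std::uint32_t>(b[i]) |
           (static_cast<std::uint32_t>(b[i + 1]) << 8) |
           (static_cast<std::uint32_t>(b[i + 2]) << 16) |
           (static_cast<std::uint32_t>(b[i + 3]) << 24);
}
// Compare four bytes at offset i against a four-character chunk id.
static bool hasId(const Bytes& b, std::size_t i, const char* id) {
    for (std::size_t k = 0; k < 4; ++k) {
        if (b[i + k] != static_cast<std::uint8_t>(id[k])) return false;
    }
    return true;
}

// Hypothetical helper: returns wFormatTag from the "fmt " chunk, or
// nullopt if the bytes are not a well-formed RIFF/WAVE file.
std::optional<std::uint16_t> readWavFormatTag(const Bytes& bytes) {
    if (bytes.size() < 12 || !hasId(bytes, 0, "RIFF") || !hasId(bytes, 8, "WAVE")) {
        return std::nullopt;
    }
    std::size_t pos = 12;  // first chunk follows the 12-byte RIFF header
    while (pos + 8 <= bytes.size()) {
        const std::uint32_t size = readU32(bytes, pos + 4);
        if (hasId(bytes, pos, "fmt ")) {
            if (size < 2 || pos + 10 > bytes.size()) return std::nullopt;
            return readU16(bytes, pos + 8);  // wFormatTag is the first field
        }
        // Skip this chunk; chunk data is padded to an even length.
        const std::uint64_t advance = 8 + std::uint64_t{size} + (size & 1u);
        if (advance > bytes.size() - pos) break;
        pos += static_cast<std::size_t>(advance);
    }
    return std::nullopt;
}

// Only plain WAVE_FORMAT_PCM passes; float and extensible do not.
bool isPlainPcm(std::uint16_t formatTag) {
    return formatTag == kWaveFormatPcm;
}
```

A caller would load the file into a byte vector, call readWavFormatTag, and convert the audio to plain PCM first if isPlainPcm returns false for the tag.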

Browse source

Download binary
