Voicegain v3

The video below shows Voicegain Live Transcribe providing transcription for an event that was streamed over video.

Here are some details about this particular setup:

the video part is streamed using BoxCast
the audio for transcription is tapped live at the source on site
audio is streamed to the Voicegain Cloud for processing by a small Java client running on a Raspberry Pi computer
the audio client was downloaded pre-configured from the Voicegain portal and reads audio directly from a USB audio device plugged into the Raspberry Pi
speech is transcribed in the Cloud using the Voicegain semi-real-time mode, which delivers results in about 30 seconds (the real-time mode delivers results with less than 1 second of delay)
the transcription output passes through a delay component that lets us dial in the precise delay to match the streaming video delay; in this case the delay was 35.5 seconds
the transcribed words are sent to a Web Client over websocket - each word is sent with the set delay
each word is displayed in a shade of gray that corresponds to its confidence, with spacing proportional to the pause between the spoken words
the Acoustic Model used here has been custom trained with 200+ additional hours of speech from this particular speaker
custom training data consisted simply of previously transcribed speeches by the speaker that were readily available on the website
we are also using a custom Language Model (on top of the base NLM) that was created from a user-provided corpus
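
The delay and display steps in the list above can be illustrated with a short sketch. This is not Voicegain's implementation: the names `DelayBuffer`, `gray_shade`, and `gap_pixels` are hypothetical, and so are the assumptions that confidence is a value between 0 and 1, that the lightest gray is level 200, and that one second of silence maps to 40 pixels. It only shows one plausible way to hold each word until a fixed delay has passed and then style it by confidence and pause length.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    release_at: float   # time (seconds) at which the word may be displayed
    confidence: float   # assumed range: 0.0 (unsure) to 1.0 (certain)
    gap_before: float   # pause (seconds) between the previous word and this one


class DelayBuffer:
    """Holds transcribed words and releases each one a fixed delay after it was spoken."""

    def __init__(self, delay_seconds: float):
        self.delay = delay_seconds
        self._heap = []                  # entries: (release_at, sequence, Word)
        self._seq = itertools.count()    # tie-breaker keeps arrival order stable

    def push(self, text, spoken_at, confidence, gap_before=0.0):
        word = Word(text, spoken_at + self.delay, confidence, gap_before)
        heapq.heappush(self._heap, (word.release_at, next(self._seq), word))

    def pop_due(self, now):
        """Return every word whose release time has arrived, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def gray_shade(confidence):
    """Map confidence to a CSS gray: 1.0 gives black, 0.0 gives light gray."""
    c = min(max(confidence, 0.0), 1.0)
    level = round((1.0 - c) * 200)       # 200 is an assumed lightest gray level
    return f"#{level:02x}{level:02x}{level:02x}"


def gap_pixels(gap_seconds, px_per_second=40.0, max_px=120.0):
    """Map a pause between spoken words to horizontal spacing, capped at max_px."""
    return min(max(gap_seconds, 0.0) * px_per_second, max_px)


if __name__ == "__main__":
    buf = DelayBuffer(delay_seconds=35.5)   # the delay used in the setup above
    buf.push("hello", spoken_at=10.0, confidence=0.9)
    buf.push("world", spoken_at=10.6, confidence=0.4, gap_before=0.2)
    for w in buf.pop_due(now=47.0):
        print(w.text, gray_shade(w.confidence), gap_pixels(w.gap_before))
```

Keying the buffer on the time the word was spoken, rather than the time the transcript arrived, means that variation in transcription latency does not affect when a word appears, as long as each word arrives before its release time.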
