The video below shows an example of Voicegain Live Transcribe used to provide transcription for an event streamed over video.
Here are some details about this particular setup:
- the video part is streamed using BoxCast
- the audio for transcription is tapped live at the source on site
- audio is streamed to Voicegain Cloud for processing using a small Java client running on a Raspberry Pi computer
- the audio client was downloaded pre-configured from the Voicegain portal and reads audio directly from a USB audio device plugged into the Raspberry Pi
- speech is transcribed in the Cloud using the Voicegain semi-real-time mode, which delivers results in about 30 seconds (the real-time mode delivers results with less than 1 second of delay)
- the transcription output goes via a delay component that allows us to dial in the precise delay to match the streaming video delay - in this case the delay was 35.5 seconds
- the transcribed words are sent to a Web Client over websocket - each word is sent with the set delay
- the words are displayed in a shade of gray corresponding to the confidence in each word, with the spacing between displayed words proportional to the pause between the spoken words
- the Acoustic Model used here has been custom trained with an additional 200+ hours of speech from this particular speaker
- custom training data consisted simply of previously transcribed speeches by the speaker that were readily available on the website
- we are also using a custom Language Model (on top of the base NLM) that was created from a user-provided corpus
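
The delay component and the confidence-based shading described in the list above can be illustrated with a minimal sketch. This is not the Voicegain implementation: the `Word`, `DelayBuffer`, and `gray_shade` names are hypothetical, and the sketch assumes that the delay is measured from the moment each word was spoken, that words arrive in spoken order, and that confidence is a value between 0 and 1.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    spoken_at: float   # seconds since stream start when the word was spoken
    confidence: float  # assumed range 0.0 .. 1.0


class DelayBuffer:
    """Holds words and releases each one `delay` seconds after it was spoken."""

    def __init__(self, delay: float):
        self.delay = delay
        self._queue = deque()

    def push(self, word: Word) -> None:
        # Assumes words are pushed in the order they were spoken.
        self._queue.append(word)

    def pop_due(self, now: float) -> list:
        """Return every queued word whose release time has arrived."""
        due = []
        while self._queue and self._queue[0].spoken_at + self.delay <= now:
            due.append(self._queue.popleft())
        return due


def gray_shade(confidence: float) -> str:
    """Map confidence to a hex gray: 1.0 gives black, 0.0 gives light gray."""
    c = min(max(confidence, 0.0), 1.0)   # clamp out-of-range values
    level = round((1.0 - c) * 192)       # 0 (black) .. 192 (light gray)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# Example: a 35.5 second delay, as in the setup described above.
buffer = DelayBuffer(delay=35.5)
buffer.push(Word("hello", spoken_at=1.0, confidence=0.9))
buffer.push(Word("world", spoken_at=1.6, confidence=0.4))
for word in buffer.pop_due(now=40.0):
    print(word.text, gray_shade(word.confidence))
```

In a real deployment the `now` value would come from a clock synchronized with the audio capture, and each released word would be pushed to the Web Client over the websocket rather than printed.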