ASR, Streaming, Contact Center

Voicegain SIP Media Stream B2BUA - Enable real-time transcription for Generative Voice AI Apps in Premise-Based Contact Centers

This article provides an overview of the Voicegain SIP Media Stream Back-to-Back User Agent (B2BUA), a contact center platform-agnostic solution that forks real-time SIP media streams from premise-based contact centers to Voicegain Speech-to-Text for real-time transcription. In SIP telephony, a B2BUA is a network element that can both terminate and originate media streams; this is explained further later in this post.

The Voicegain SIP Media B2BUA is a containerized solution that is deployed in the same network as the Contact Center platform. Once it is configured, developers and enterprise customers get real-time access to speaker-separated transcripts over a WebSocket connection.

The Voicegain SIP Media B2BUA is a solution for enterprises and SaaS ISVs looking to extend their real-time LLM-powered Voice AI applications to premise-based contact centers. Examples of such on-premise contact center platforms include Avaya, Genesys and Cisco. The Generative Voice AI applications supported include Real-Time Agent Assist or Voice AI Co-Pilots, real-time sentiment analysis, voice biometrics and other types of real-time speech analytics apps.

The Challenge: Accessing Real-time voice data from On-Prem Call Centers is hard

Most premise-based Contact Center platforms, whether from Avaya, Genesys, or Cisco, do not provide programmatic access to real-time media streams. While these systems are reliable for call routing and management, they were not designed for modern LLM-powered AI applications.

Traditionally, forking of media streams is handled by a Session Border Controller (SBC), a separate network element that sits "in front" of the Call Center platform. SBCs rely on a protocol called SIPREC to fork these media streams. However, SIPREC is primarily intended for network-based compliance call recording, and commercial compliance recording vendors like NICE or Verint use it to access real-time media streams from premise-based contact center platforms.

This approach has its pain points:

1) Only large enterprises have implemented Session Border Controllers.

2) Even if an enterprise has an SBC, its media forking capacity is used up by the call recording solution. Adding another stream for generative AI requires hardware upgrades and additional software licensing on the SBC.

The Solution: Voicegain SIP Media Stream B2BUA

Voicegain offers a highly scalable, reliable and fully contained SIP Media Stream B2BUA to address the challenge discussed above. This B2BUA is a containerized network element that is deployed in the same network as the premise-based contact center. From a SIP protocol standpoint, this Media Stream Back-to-Back User Agent (B2BUA) acts as a transparent media relay while forking the RTP media streams to real-time Speech-to-Text.

What is a B2BUA? Unlike a simple SIP proxy that only handles signaling, a Back-to-Back User Agent terminates and re-originates both signaling and media streams, allowing it to manipulate call flows while maintaining access to the audio content.
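To make the "relay while forking" idea concrete, here is a minimal sketch in Python. It is not Voicegain's implementation: the `MediaRelay` class and its `forward` and `fork` callbacks are hypothetical names introduced for illustration. Only the fixed 12-byte RTP header layout is taken from the RTP standard (RFC 3550). Each datagram received on one call leg is passed on unchanged to the other leg, and a parsed copy is handed to a second sink that would feed transcription.

```python
import struct
from dataclasses import dataclass
from typing import Callable

RTP_HEADER_LEN = 12  # fixed RTP header size defined in RFC 3550


@dataclass
class RtpPacket:
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Parse the fixed RTP header. Header extensions and padding are not handled."""
    if len(datagram) < RTP_HEADER_LEN:
        raise ValueError("datagram too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:RTP_HEADER_LEN])
    # The low 4 bits of the first byte count 32-bit CSRC entries after the header.
    offset = RTP_HEADER_LEN + 4 * (b0 & 0x0F)
    if offset > len(datagram):
        raise ValueError("CSRC list runs past the end of the datagram")
    return RtpPacket(
        version=b0 >> 6,
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        payload=datagram[offset:],
    )


@dataclass
class MediaRelay:
    """Hypothetical relay for one direction (caller or agent) of a bridged call."""
    channel: str                             # e.g. "caller" or "agent"
    forward: Callable[[bytes], None]         # sends toward the other call leg
    fork: Callable[[str, RtpPacket], None]   # sends a copy toward transcription

    def on_datagram(self, datagram: bytes) -> None:
        self.forward(datagram)                        # relay unchanged
        self.fork(self.channel, parse_rtp(datagram))  # copy for Speech-to-Text
```

In this sketch the call path never depends on the fork: the datagram is forwarded before it is parsed, so a malformed packet affects only the copy sent for transcription.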

Key Components and Architecture

The Voicegain SIP Media Stream B2BUA is:

  • Containerized: Deployed via Docker Compose in your existing network
  • Platform-agnostic: Works with Avaya, Cisco, Genesys, and other SIP-based platforms
  • Lightweight: Minimal processing overhead with efficient media handling
  • Non-disruptive: Integrates with existing call flows through simple SIP routing

The diagram above showcases the call flow used to fork media streams for real-time transcription.

At a high level, most SIP-based Contact Center ACDs like Avaya Communication Manager, Genesys Engage and Cisco UCCE support the creation of SIP Trunks. The Voicegain SIP Media Stream B2BUA is a SIP Server/SIP Peer that connects to the premise-based ACD over a dedicated SIP trunk. It can receive calls from and make calls to the premise-based ACD.

The overall call flow has the following steps:

  1. Install the Voicegain Media Stream B2BUA as a containerized local network element (using Docker Compose).
  2. Create a local SIP Trunk on the ACD and test it by making calls to and receiving calls from the VG Media Stream B2BUA.
  3. Configure the ACD to transfer incoming calls first to the VG Media Stream B2BUA over SIP (this is essentially a SIP INVITE to the Media Stream B2BUA). Include the DID or URI of the Contact Center Queue or the IVR in the routing label (SIP URI). (Step 1)
  4. The VG Media Stream B2BUA has an associated configuration file, which provides any required mapping of the destination DID.
  5. Based on the above, the B2BUA "bridges" the incoming call with the DID (or URI) of the Contact Center Queue or the IVR (Step 2)
  6. The B2BUA also forks the RTP streams of the 2-channel stereo audio (of the caller and the agent) to Voicegain STT for transcription (Step 3)
  7. The real-time transcript is made available on a Websocket connection to the Gen AI App (Step 4)
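Steps 3 to 5 can be sketched as a small lookup in Python. This is an illustration only: the article does not describe the format of the B2BUA configuration file, so `DESTINATION_MAP`, `routing_label` and `resolve_destination`, along with the example hostnames and numbers, are hypothetical. The sketch takes the routing label from the user part of the SIP URI that the ACD sends and resolves it to the queue or IVR destination that the B2BUA would then bridge.

```python
import re
from typing import Dict, Optional

# Hypothetical stand-in for the B2BUA configuration file (step 4).
# Keys are routing labels sent by the ACD; values are queue or IVR destinations.
DESTINATION_MAP: Dict[str, str] = {
    "8001": "sip:5551000@acd.example.com",
    "8002": "sip:ivr-main@acd.example.com",
}

# Simplified SIP URI pattern: optional user part, then host. Passwords,
# URI parameters and headers are ignored in this sketch.
_SIP_URI = re.compile(r"^sips?:(?:(?P<user>[^@;]+)@)?(?P<host>[^;:?]+)")


def routing_label(request_uri: str) -> Optional[str]:
    """Return the user part of a SIP URI, or None if there is none."""
    match = _SIP_URI.match(request_uri.strip())
    return match.group("user") if match else None


def resolve_destination(request_uri: str,
                        mapping: Dict[str, str] = DESTINATION_MAP) -> str:
    """Map the routing label of an incoming INVITE to the destination to bridge."""
    label = routing_label(request_uri)
    if label is None:
        raise ValueError(f"no routing label in {request_uri!r}")
    if label not in mapping:
        raise LookupError(f"no destination configured for label {label!r}")
    return mapping[label]
```

For example, `resolve_destination("sip:8002@b2bua.example.com")` returns the IVR entry from the mapping, which is the URI the B2BUA would send its own INVITE to over the same trunk.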

To summarize, what is needed is to configure a SIP Trunk on the current Contact Center ACD and use it to transfer calls to the Voicegain Media Stream B2BUA (which is the equivalent of initiating a SIP INVITE). The Media Stream SIP B2BUA in turn bridges the DID of the Contact Center Queue or IVR (which is a SIP INVITE to the destination SIP URI) over the same SIP Trunk and then forks the two RTP streams (caller and agent) to Voicegain STT.
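On the receiving side, the Gen AI app reads the speaker-separated transcript from the WebSocket connection. The sketch below shows one way such an app might assemble the two channels into a single conversation. The message schema used here (JSON with `channel`, `text` and `final` fields) is an assumption made for illustration and is not taken from the Voicegain API documentation; the `ConversationLog` class is likewise hypothetical. The WebSocket client itself is left out, and each received text frame would simply be passed to `on_message`.

```python
import json
from typing import Dict, List, Tuple


class ConversationLog:
    """Collects per-channel transcript messages into ordered speaker turns."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []   # finalized (speaker, text) pairs
        self._partial: Dict[str, str] = {}       # latest interim text per speaker

    def on_message(self, raw: str) -> None:
        # Assumed schema: {"channel": "caller" | "agent", "text": str, "final": bool}
        msg = json.loads(raw)
        speaker = msg["channel"]
        text = msg["text"].strip()
        if msg.get("final", False):
            self._partial.pop(speaker, None)     # interim hypothesis is superseded
            if text:
                self.turns.append((speaker, text))
        else:
            self._partial[speaker] = text

    def render(self) -> str:
        """Return the finalized turns, one 'speaker: text' line per turn."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)
```

Keeping interim and final text separate lets an agent-assist app show partial hypotheses as they arrive while passing only finalized turns to an LLM.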

Tested with Avaya

The Voicegain SIP Media Stream B2BUA has been deployed in production for a leading health care customer with an on-premise Avaya Contact Center.

Implementation Requirements

To deploy the Voicegain SIP Media Stream B2BUA, you'll need:

  1. A Docker-compatible server within your contact center network
  2. SIP connectivity to your existing contact center platform
  3. Network access to Voicegain's Speech-to-Text APIs (or on-premises deployment)
  4. Basic configuration to define routing rules and destination mappings

Contact Us to discuss your application

If you have an on-premise contact center and you would like to discuss getting access to the real-time media stream, please contact us at support@voicegain.ai
