In search of the perfect voicebot speech
The tech innovators trying to build bots that sound exactly like humans
Imagine the scene: you call your bank’s customer service number. The phone is answered in one ring - no waiting, no IVR, just “Hello (your name), welcome to (bank’s name), how can I help?”
You explain what you need, and after you answer a few clarifying questions, your request is being dealt with. It is only at this point in the conversation that it dawns on you…
…you were talking to a bot all along.
This experience is not in the realm of science fiction. It is possible using today's technology and, when it works, the effect is incredible.
It’s an important leap forward because, currently, customers don't always react in the most constructive way when they encounter an obvious bot. Many will simply ask to speak to a human from the outset, whilst others will try to second-guess the bot by calling out keywords that they believe will be most effective. What they are not doing is interacting with the bot as though they are talking to a human, which is usually the outcome we want to achieve.
Conversely, I have listened to recordings of voicebot conversations where the customer’s first instinct is that they are talking to a human, and the effect is magical. The customer just engages straight away and the bot is able to do its job effectively.
The range of options
Below is a selection of different technical solutions offered by vendors to create convincing artificial human voices. This is by no means a complete list, nor an endorsement of specific products, but it should give an indication of the range of options to consider when building a voicebot.
The fast and full-featured workhorse: Amazon Polly
Amazon Polly is one of the more common Text to Speech (TTS) tools, used for a range of applications including voicebots. Polly offers around 60 voices in around 30 languages, with a simple markup language (SSML) to ensure specific words are pronounced correctly and to change the tone of the speech.
Whilst it may not offer the most realistic voices out there, it is very fast and flexible to set up.
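To give a flavour of what that markup looks like, here is a minimal Python sketch that wraps a greeting in SSML, spells out an acronym letter by letter, and adjusts the speaking rate. The helper names (build_ssml, synthesize), the "Amy" voice and the output file name are my own choices for illustration, not anything Polly prescribes. The second function assumes the boto3 library is installed and AWS credentials are already configured; treat it as a starting point rather than a finished integration.

```python
from xml.sax.saxutils import escape


def build_ssml(text, spelled_out=(), rate="medium", pause_ms=0):
    """Wrap plain text in SSML (hypothetical helper, for illustration only).

    spelled_out: terms to be read letter by letter, e.g. acronyms.
    rate: speaking rate passed to the <prosody> tag (e.g. "slow", "medium", "fast").
    pause_ms: optional pause, in milliseconds, before the speech starts.
    """
    # Escape characters such as & and < so the result is valid XML.
    body = escape(text)
    for term in spelled_out:
        safe = escape(term)
        body = body.replace(
            safe, f'<say-as interpret-as="characters">{safe}</say-as>'
        )
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms else ""
    return f'<speak>{pause}<prosody rate="{rate}">{body}</prosody></speak>'


def synthesize(ssml, voice_id="Amy", out_path="greeting.mp3"):
    """Send SSML to Amazon Polly and save the audio as an MP3 file.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported here so build_ssml works without AWS tooling

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        VoiceId=voice_id,
        OutputFormat="mp3",
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return out_path


if __name__ == "__main__":
    greeting = build_ssml(
        "Hello, no waiting and no IVR. How can I help?",
        spelled_out=["IVR"],
        rate="medium",
        pause_ms=300,
    )
    print(greeting)
    # Uncomment to call Polly (requires AWS credentials):
    # synthesize(greeting)
```

Keeping the markup step separate from the API call means the wording, pauses and pacing can be tuned and reviewed without making a single request to AWS.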
The GenAI innovator: Eleven Labs
For an example of how sophisticated AI can get in generating voices, it’s worth trying out the demos on Eleven Labs’ website. This product is able to create more human-like voices by modelling some of the non-verbal elements like intonation and pauses.
The effect is near-perfect, although there are occasional moments where the voice dips into the uncanny valley.
For true realism: Poly.ai
Poly.ai’s offering is unique in that it generates speech from a blend of synthesized AI voices and recorded human voices. All those “ums” and “ahhs” sound real because they are real.
This approach is less suited to generating original speech in real time than the other options on this list, but to my ears it produces the most realistic results.
The best of open source: Tortoise TTS
I’ve yet to find an open source text-to-speech model that performs as well as the commercial products, but the benefit of open source is that it can be great for experimenting and building proofs of concept before committing to a more sophisticated solution.
Tortoise TTS is one example that is relatively easy to set up, configure and use.
I think we can expect to see many developments in this field in the coming months and years. How long will it be until you genuinely cannot tell if you are speaking to a bot or a human?
With thanks to my colleague Luke Purcell for some of the leads on this search.
Recommended news articles
AI could kill call centres, says Tata Consultancy Services head (paywall)
Eircom customer service manuals warned staff not to obey law, court told
Who are America's customer service champions? Here is the list of top-ranked companies
The rise of annoying customer satisfaction surveys and questionnaires
Latest perspectives from BCG
Virtual job simulation: explore what it is like to build a GenAI chatbot
Explore firsthand the process of creating an AI-powered chatbot used to analyze financial documents in this self-paced, free virtual module
You will develop skills used to extract financial data, better understand financial statement components and the principles of how to develop an AI-powered chatbot