Generative AI has so far centered largely on text-based interfaces. Voice AI now appears to be the next frontier, and it is emerging quickly. Google recently announced that it will integrate Chirp 3, its speech-to-text and HD text-to-speech models, into its Vertex AI development platform starting the following week.
Last week's announcement introduced eight new voice options for Chirp 3, available in 31 languages. Potential applications include voice assistants, audiobooks, virtual support agents, and video voice-overs. Google made the announcement at DeepMind's offices in London.
Google's push comes as other companies also advance in voice AI. Last week, Sesame, known for its realistic "Maya" and "Miles" AI voice assistants, announced the release of its model, allowing developers to build customized apps and services on its technology.
Chirp 3 will ship with usage restrictions intended to limit potential misuse. Thomas Kurian, CEO of Google Cloud, said at a recent event, "We're just working through some of these things with our safety team."
Well-funded startups are also competing in this space; ElevenLabs, for example, has raised considerable funding to develop its AI voice services. With this integration, Chirp 3 joins the latest versions of Gemini (Google's flagship LLM), the image-generation model Imagen, and the Veo 2 video-generation model on Vertex AI.
It remains to be seen whether Chirp 3 will match the realism of other AI voice projects, especially Sesame's. Demis Hassabis, CEO of DeepMind, framed AI's broader progress as a long-term effort: "In the near term … this idea that [AI is] a silver bullet to everything in the next couple of years, I don't see that happening just yet. Think we're still quite a few years away from something like AGI happening. It's going to change things … over the next decade, so the medium to longer term. It's one of those interesting moments in time."
Introduced in 2021, Vertex AI began as a platform for developers to build cloud-based machine-learning services. Interest in AI, particularly generative AI, surged after OpenAI released its GPT services. Since then, Google has leaned on Vertex AI to catch up with competitors such as Microsoft and Amazon, which also offer generative AI tooling for developers. Beyond building generative AI applications on Gemini, developers can use Vertex AI to classify data, train models, and prepare models for production. It will be worth watching whether Google further expands the platform with models built outside Google.
Google has used the "Chirp" name for years; it first served as the code name for the company's early effort to compete with Amazon's Alexa.