Amazon has been at the forefront of developing voice-based technologies for over a decade, from creating the world’s best personal AI assistant, Alexa, to developing AWS services like Lex, Polly, and Connect. However, for voice AI to drive more real-world value for customers, it must account for the nuance and complexity of human conversation. Words alone can fall flat without acoustic context that gives them depth. How something is said is as important as, if not more important than, what is said.
Introducing Amazon Nova Sonic
Today, Amazon announced Amazon Nova Sonic, a new foundation model that unifies speech understanding and speech generation into a single model. Available via a new API in Amazon Bedrock, Nova Sonic simplifies the development of voice applications across various industries, including customer service call automation and AI agents in travel, education, healthcare, entertainment, and more.
A New Approach to Voice-Enabled Applications
Traditional approaches to building voice-enabled applications involve complex orchestration of multiple models: speech recognition to convert speech to text, large language models (LLMs) to understand and generate responses, and text-to-speech to convert text back to audio. This fragmented approach increases development complexity. It also fails to preserve crucial acoustic context, such as tone, prosody, and speaking style, which is essential for natural conversation.
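To make the limitation concrete, here is a minimal sketch of such a cascaded pipeline. Every name in it (`Utterance`, `CascadedVoicePipeline`, and the three stub stages) is hypothetical and stands in for a real speech recognition, LLM, or text-to-speech service; none of it is an Amazon API. The `tone` field is a simplified stand-in for acoustic context.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Utterance:
    """Hypothetical spoken input: the words plus a stand-in for acoustic context."""
    words: str
    tone: str  # e.g. "excited" or "worried"


@dataclass
class CascadedVoicePipeline:
    """Three separate stages chained together, as in the traditional approach."""
    recognize: Callable[[Utterance], str]   # speech -> text
    respond: Callable[[str], str]           # text -> text
    synthesize: Callable[[str], bytes]      # text -> audio

    def handle_turn(self, utterance: Utterance) -> bytes:
        text_in = self.recognize(utterance)  # only the words survive this step
        text_out = self.respond(text_in)
        return self.synthesize(text_out)


# Stub stages (assumptions for illustration only).
def stub_asr(utterance: Utterance) -> str:
    return utterance.words  # the tone is discarded here


def stub_llm(text: str) -> str:
    return f"Here is what I found for: {text}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedVoicePipeline(stub_asr, stub_llm, stub_tts)
excited = pipeline.handle_turn(Utterance("how much is the trip", tone="excited"))
worried = pipeline.handle_turn(Utterance("how much is the trip", tone="worried"))
```

Because the first stage hands only text to the next one, the two calls produce identical audio even though the speaker's tone differs, which is the loss of acoustic context described above.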

Nova Sonic takes a new approach by unifying understanding and generation capabilities into a single model. This unification enables the model to adapt generated voice responses to the acoustic context (e.g., tone, style) of the spoken input, resulting in more natural dialogue. Nova Sonic also understands the nuances of human conversation, including natural pauses and hesitations: it waits to speak until appropriate and handles barge-ins gracefully.
Real-World Applications
AI Agent for Travel
An example of an AI agent built on Amazon Nova Sonic is a virtual travel assistant helping a customer plan a trip to Hawaii. When the customer’s tone shifts from excitement to concern about costs, the AI’s tone becomes more reassuring as it provides relevant pricing information. Nova Sonic also generates a text transcript of the user’s speech, which developers can use to call specific tools and APIs when building voice-enabled AI agents.
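As a rough illustration of how a transcript could drive tool calls, the sketch below routes transcript text to travel tools. The tool functions, the `TOOLS` table, and the keyword matching are all assumptions made for this example. The article does not describe how tools are selected, and a real agent would more likely let a model choose the tool than match keywords.

```python
from typing import Callable, Dict, List


# Hypothetical tools a travel agent might expose (not real APIs).
def get_flight_prices(destination: str) -> str:
    return f"Looking up flight prices to {destination}"


def get_hotel_prices(destination: str) -> str:
    return f"Looking up hotel prices in {destination}"


# Keyword -> tool table; keyword matching is a deliberately simple stand-in.
TOOLS: Dict[str, Callable[[str], str]] = {
    "flight": get_flight_prices,
    "hotel": get_hotel_prices,
}


def route_transcript(transcript: str, destination: str) -> List[str]:
    """Call every tool whose keyword appears in the transcript text."""
    text = transcript.lower()
    return [tool(destination) for keyword, tool in TOOLS.items() if keyword in text]


results = route_transcript("How much are flights to Hawaii?", "Hawaii")
```

Here `results` holds the output of the flight-price tool only, since the transcript mentions flights but not hotels.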
Enterprise AI Assistant
Nova Sonic can also benefit enterprise customers by grounding responses in company data. A dashboard AI assistant can pull reports and share accurate data in a natural, conversational tone while proactively asking relevant follow-up questions. The fluid dialogue enables multi-turn exchanges without requiring explicit context-setting from the speaker.
With the launch of Nova Sonic, Amazon continues to innovate with state-of-the-art foundation models that deliver real-world value for every Amazon customer. Developers and tech enthusiasts can explore Amazon Nova at nova.amazon.com and access the Amazon Nova Act SDK to build agents that take action in web browsers.