Breaking News in Technology & Business – Tech Geekwire


    OpenAI Unveils New Voice AI Models for Developers, Expanding Capabilities

    By techgeekwire | March 28, 2025

    OpenAI Expands Voice AI Offerings with New Models

    OpenAI is continuing its push into voice AI with the release of three new models designed for developers: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts. These models, built upon the existing GPT-4o technology, aim to provide improved transcription accuracy and customizable speech synthesis, enhancing the tools available for third-party developers. The company has also launched a demo site, OpenAI.fm, showcasing various voice customizations.

    The new models arrive as OpenAI faces public scrutiny over its voice AI offerings, including earlier criticism of a voice that sounded similar to actress Scarlett Johansson. The company says users will be able to control how their AI voices sound, changing accent, pitch, tone, and emotion.
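    In the API, this kind of voice control is expressed as a free-text `instructions` field on speech requests to gpt-4o-mini-tts. The following is a minimal sketch using the official `openai` Python package. It assumes a recent SDK version that accepts `instructions` and an `OPENAI_API_KEY` set in the environment. The helper names, the `"coral"` voice, and the output path are illustrative choices, not details from the article.

```python
from pathlib import Path


def build_speech_request(text: str, style: str, voice: str = "coral") -> dict:
    """Assemble keyword arguments for a gpt-4o-mini-tts speech request.

    `style` is a free-form description of how the voice should sound
    (accent, pitch, tone, emotion); the model treats it as a prompt.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,         # illustrative built-in voice name
        "input": text,          # the words to be spoken
        "instructions": style,  # natural-language direction for delivery
    }


def synthesize(text: str, style: str, out_path: str = "speech.mp3") -> Path:
    """Call the API and write the audio to disk (needs network access and a key)."""
    from openai import OpenAI  # imported lazily so the helper above works offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    path = Path(out_path)
    with client.audio.speech.with_streaming_response.create(
        **build_speech_request(text, style)
    ) as response:
        response.stream_to_file(path)  # stream audio bytes straight to the file
    return path
```

    A call such as `synthesize("Welcome back.", "Speak like a calm yoga instructor: slow, warm, and soothing.")` would change only the `instructions` string to get a different persona. The text and the voice stay the same.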

    Enhanced Transcription and Speech Capabilities

    The new models are based on the GPT-4o model that powers the ChatGPT text and voice experience, but have been post-trained with additional data to excel at transcription and speech.

    Jeff Harris, a member of OpenAI’s technical staff, showed in a demo how the voice can be customized with text prompts. In the demo, the voice sounded like a cackling mad scientist in one case and a calm yoga instructor in another.

    On the transcription side, the models use noise cancellation and semantic voice activity detection to improve accuracy. OpenAI says they perform better in noisy environments and handle diverse accents and varying speech speeds across more than 100 languages. This builds on the company’s open-source Whisper speech-to-text model. According to the company’s website, word error rates for the new gpt-4o-transcribe models have fallen significantly across 33 languages. In English, for example, the word error rate is just 2.46%.
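    The article does not spell out how the error rate is calculated. Assuming it is a standard word error rate (WER), the formula is WER = (S + D + I) / N. Here S, D, and I are the word substitutions, deletions, and insertions needed to turn the model's transcript into a reference transcript, and N is the number of words in the reference. The sketch below is a generic implementation of that definition, not OpenAI's evaluation code. It normalizes text only by lowercasing and splitting on whitespace, while real benchmarks usually apply more thorough text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))  # 0 ref words -> j hyp words: j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)   # i ref words -> 0 hyp words: i deletions
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # +0 on a match
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

    For example, dropping one word from a six-word reference gives a WER of 1/6, about 16.7%. A 2.46% WER works out to roughly one error for every 40 reference words.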

    Comparison of gpt-4o-transcribe models' error rates across languages

    Applications and Developer Tools

    The models are designed for settings such as customer call centers, meeting-note transcription, and AI-powered assistants. OpenAI’s Agents SDK, launched the previous week, makes it easier for developers to add voice interaction to existing text-based agents built on models such as GPT-4o.

    According to Harris, the models support streaming speech-to-text, so developers receive transcribed text continuously instead of waiting for the full recording to be processed. This makes conversations feel more natural. For low-latency, real-time voice experiences, OpenAI recommends its speech-to-speech models in the Realtime API.
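    Streaming transcription can be sketched with the same `openai` Python package. This assumes a recent SDK version in which `audio.transcriptions.create` accepts `stream=True` and yields events of type `"transcript.text.delta"`, each carrying a `delta` string, followed by a final `"transcript.text.done"` event. The function names and the choice of gpt-4o-mini-transcribe are hypothetical. The pure helper is kept separate so it can be checked without network access.

```python
from typing import Iterable, Iterator


def iter_text_deltas(events: Iterable) -> Iterator[str]:
    """Yield each partial transcript fragment as soon as its event arrives."""
    for event in events:
        # Only delta events carry new text; the final "done" event is skipped.
        if getattr(event, "type", None) == "transcript.text.delta":
            yield event.delta


def stream_transcription(audio_path: str) -> str:
    """Transcribe a recorded file, printing text incrementally (needs network and a key)."""
    from openai import OpenAI  # lazy import keeps iter_text_deltas usable offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    parts = []
    with open(audio_path, "rb") as audio_file:
        stream = client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",
            file=audio_file,
            response_format="text",
            stream=True,
        )
        for fragment in iter_text_deltas(stream):
            print(fragment, end="", flush=True)  # show text as soon as it is recognized
            parts.append(fragment)
    print()
    return "".join(parts)
```

    This streams the transcript of an audio file that has already been recorded. For live, two-way conversation, OpenAI points developers to the Realtime API instead, as noted above.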

    Pricing and Competition

    The new models are accessible immediately via OpenAI’s API with the following pricing structure:

    • gpt-4o-transcribe: $6.00 per 1M audio input tokens (~$0.006 per minute)
    • gpt-4o-mini-transcribe: $3.00 per 1M audio input tokens (~$0.003 per minute)
    • gpt-4o-mini-tts: $0.60 per 1M text input tokens, $12.00 per 1M audio output tokens (~$0.015 per minute)
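    The per-minute figures turn budgeting into simple multiplication. Dividing a per-minute price by the matching per-token price also shows roughly how many audio tokens one minute of audio corresponds to. The sketch below hard-codes the prices listed above. The function names are hypothetical, and because actual billing is per token, real costs will vary with the audio.

```python
# Approximate per-minute prices in USD, as quoted in the pricing list.
PRICE_PER_MINUTE_USD = {
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-4o-mini-tts": 0.015,
}

# USD per 1M audio tokens from the same list: audio input for the
# transcription models, audio output for the text-to-speech model.
PRICE_PER_MILLION_AUDIO_TOKENS_USD = {
    "gpt-4o-transcribe": 6.00,
    "gpt-4o-mini-transcribe": 3.00,
    "gpt-4o-mini-tts": 12.00,
}


def estimate_cost(model: str, minutes: float) -> float:
    """Rough cost in USD for `minutes` of audio at the quoted per-minute rate."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return PRICE_PER_MINUTE_USD[model] * minutes


def implied_audio_tokens_per_minute(model: str) -> float:
    """Back-of-envelope token rate implied by the two quoted prices.

    Ignores any text-token charges, so treat the result as approximate.
    """
    per_token = PRICE_PER_MILLION_AUDIO_TOKENS_USD[model] / 1_000_000
    return PRICE_PER_MINUTE_USD[model] / per_token
```

    For example, 10,000 minutes of call-center audio would cost about $60 with gpt-4o-transcribe or about $30 with gpt-4o-mini-transcribe. Both transcription price pairs imply roughly 1,000 audio tokens per minute. For gpt-4o-mini-tts, the implied rate is about 1,250 audio output tokens per minute, ignoring its text-input charge.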

    Competition in AI transcription and speech is increasing, with companies such as ElevenLabs and Hume AI offering their own speech products.

    Early Industry Adoption

    OpenAI has shared testimonials from companies that have integrated the new audio models. EliseAI, which focuses on property management automation, said the text-to-speech model enabled more natural and emotionally rich interactions with tenants. Decagon, which builds AI-powered voice experiences, saw a 30% improvement in transcription accuracy after adopting OpenAI’s speech recognition model.

    Reactions and Future Plans

    While the new models have largely been well-received, not all reactions have been positive. Ben Hylak, co-founder of Dawn AI, suggested that the announcement “feels like a retreat from real-time voice.” The launch was also preceded by an early leak on X (formerly Twitter).

    Looking ahead, OpenAI plans to keep refining its audio models, explore custom voice capabilities, and invest in multimodal AI, including video.
