ChatGPT Voice Mode Runs Old GPT-4o Model, Devs Flag Capability Gap

Krasa AI

2026-05-28

A simmering developer controversy hit the AI press this week: ChatGPT's voice mode — the feature OpenAI markets as the most natural way to talk to its AI — still runs on a GPT-4o-era model with an April 2024 knowledge cutoff. That's roughly 13 months behind the GPT-5.5 model that powers the text interface.

Independent researcher Simon Willison surfaced the gap in a widely shared post, and Andrej Karpathy followed with his own commentary. The pattern is clear: paying $200/month for ChatGPT Pro doesn't get you the same model across modes, and most users have no idea.

What's Actually Happening

When you open ChatGPT and start a voice conversation, the model handling your audio isn't GPT-5.5 Instant or any of the newer flagship models. It's a GPT-4o-derived voice-to-voice model that hasn't been meaningfully updated since 2024.

Willison's test is simple to reproduce: ask voice mode for its knowledge cutoff. It tells you April 2024. Ask the text interface the same question, and you get a much more recent date.

Why this matters: ChatGPT users naturally assume the voice they talk to is the same intelligence they get when they type. The voice feels conversational, fast, and articulate — most of the cues people use to judge AI quality. But it's running on a model that's effectively two generations behind, and that gap shows up in reasoning, factual recall, and tool use.

The Technical Reason

There's a real engineering constraint underneath this. Real-time voice requires low-latency inference. The model has to listen, think, and respond fast enough that the conversation doesn't feel laggy — typically under 500 milliseconds end-to-end.
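The budget arithmetic can be sketched in a few lines. This is a minimal illustration, not a description of OpenAI's pipeline: the stage names and every per-stage number below are hypothetical, and only the 500 ms target comes from the figure above.

```python
from dataclasses import dataclass

BUDGET_MS = 500  # end-to-end target cited above


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float


def total_latency_ms(stages):
    """Sum the per-stage latencies of one conversational turn."""
    return sum(stage.latency_ms for stage in stages)


def fits_budget(stages, budget_ms=BUDGET_MS):
    """Return True when the whole turn stays under the budget."""
    return total_latency_ms(stages) < budget_ms


# Hypothetical numbers, chosen only to illustrate the arithmetic.
small_voice_model = [
    Stage("audio capture and upload", 80),
    Stage("model time to first audio", 250),
    Stage("download and playback start", 90),
]
large_text_model = [
    Stage("audio capture and upload", 80),
    Stage("model time to first audio", 1200),
    Stage("download and playback start", 90),
]

print(total_latency_ms(small_voice_model), fits_budget(small_voice_model))
print(total_latency_ms(large_text_model), fits_budget(large_text_model))
```

With these made-up figures, the model's time to first audio is the only stage that changes between the two cases, and it alone decides whether the turn fits the budget.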

Frontier models like GPT-5.5 are big, expensive to run, and slow at full quality. Running them in a voice loop without major changes would either burn money on inference costs or introduce noticeable delay. So OpenAI keeps voice on a smaller, faster, older model that was specifically trained for voice-to-voice work.

The tradeoff makes sense from a margin perspective. It just hasn't been disclosed clearly to users.

Why Developers Are Pushing Back

The criticism isn't really about the model choice — it's about the disclosure. There's no prominent indicator in the ChatGPT app that voice mode runs on a different, older model. Users paying for Pro pay the same $200/month whether they use text or voice. The implicit promise is feature parity.

Willison's framing has been blunt: it's "non-obvious to many people" that voice is weaker, and people deserve to know. Karpathy's commentary echoed the same point. Developers who've been tracking this are also frustrated that, by some informal tests, OpenAI's voice quality has actually gotten worse over the past few months even as the text models have improved.

The contrast with competitors is sharp. Google's Gemini Live runs on Gemini 3.5 Flash, the same Flash-tier model available in the regular Gemini app. Anthropic doesn't have a comparable real-time voice mode at all, but its Claude text models are consistent across surfaces.

OpenAI's Realtime Models Exist — They're Just Not in ChatGPT

There's an extra wrinkle. OpenAI shipped three new realtime voice models earlier in May, including GPT-5-class reasoning variants and 70-language translation. Those are available through the API.

So the company has built better voice models. They're just not the ones running inside the consumer ChatGPT app.

Why this matters: developers building on OpenAI's API have access to substantially better voice intelligence than ChatGPT Pro subscribers do. That's an unusual configuration: typically the consumer product gets the best version of the technology, not a weaker one.

What Industry Watchers Are Saying

Tech press has been picking up the story since Willison's April post resurfaced this week. Help Net Security and others have flagged the gap. AI news aggregators ranked it among the top stories of May 28.

Within the developer community, the dominant sentiment is that OpenAI needs to either upgrade ChatGPT voice mode to a current-generation model or disclose clearly that voice runs on an older one. The current setup — silently using a weaker model for a premium feature — is the worst of both worlds.

OpenAI has not officially responded to the criticism as of publication.

What's Next

The pressure points are clear. Either OpenAI ships a real-time variant of GPT-5.5 into ChatGPT voice (technically hard, but the obvious customer-pleasing move), or it adds clearer in-app disclosure about which model handles voice. A third option, letting Pro users opt into a slower-but-smarter voice, would split the difference.

For Pro and Plus subscribers, the practical advice for now: if a conversation matters, use text. Voice is fine for casual queries, but it's running on a model that doesn't know anything that happened after April 2024 and reasons noticeably worse than the text version.
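To put a number on that blind spot, here is a small sketch that counts the calendar months between the reported April 2024 cutoff and this article's publication date. The helper name is ours, and the day of the month for the cutoff is an assumption, since the model only reports "April 2024".

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`, ignoring the day."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Day of month is assumed; voice mode only reports "April 2024".
voice_cutoff = date(2024, 4, 1)
# Publication date of this article.
published = date(2026, 5, 28)

gap = months_between(voice_cutoff, published)
print(gap)  # 25
```

By that count, anything from roughly the last two years is outside what voice mode can know without a tool call.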

API customers have a different path: build directly on the new realtime models OpenAI released in May. Those include GPT-5-class reasoning variants that don't have the same capability gap.

The Bottom Line

The voice mode controversy is small in dollar terms but big in trust. ChatGPT's most distinctive consumer feature is running on yesterday's model, and developers are increasingly willing to say it out loud. Whether OpenAI fixes the gap or just gets better at disclosing it, the era of assuming "ChatGPT" means a single model behind the curtain is over.

If you're using ChatGPT voice for anything that requires reasoning, recent facts, or careful instruction-following — switch to text. The model behind the voice isn't the one you think it is.
