OpenAI’s Whisper invents parts of transcriptions — a lot

Imagine going to the doctor, describing your symptoms, and later discovering that false information was added to the transcript and your story changed. That can happen at medical centers that use OpenAI’s transcription tool Whisper.

More than a dozen developers, software engineers and academic researchers have found evidence that Whisper hallucinates – invents fabricated text – including made-up medications, racist slurs and violent comments, the Associated Press reports. Yet, last month, a recent version of Whisper was downloaded 4.2 million times from the open-source AI platform Hugging Face. The tool is also built into Oracle’s and Microsoft’s cloud computing platforms, as well as some versions of ChatGPT.
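Part of why those numbers are so large is how little code it takes to use Whisper. Here’s a minimal sketch using the open-source openai-whisper Python package – the model size and the audio file name are placeholders for illustration, not details from the report:

```python
# Minimal sketch: transcribing audio with the open-source openai-whisper package.
# "audio.mp3" and the "base" model size are placeholders, not from the article.
import whisper

model = whisper.load_model("base")      # downloads model weights on first run
result = model.transcribe("audio.mp3")  # returns a dict with text and segments
print(result["text"])                   # the transcript, with no flag for invented text
```

Note that the output is plain text: nothing in it marks which words actually came from the audio.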

The damning evidence is widespread. A University of Michigan researcher studying public meetings, for example, found fabricated text in eight out of ten audio transcriptions he inspected. In a separate study, computer scientists found 187 hallucinations while analyzing more than 13,000 audio snippets.

The pattern holds at scale: a machine learning engineer found hallucinations in about half of the more than 100 hours of Whisper transcriptions he analyzed, while another developer found them in nearly all of the 26,000 transcripts he created with the tool.
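The article doesn’t describe how the researchers counted hallucinations, but the basic check is easy to illustrate: compare a Whisper transcript against a trusted human reference and flag spans that exist only in the model’s output. A hypothetical sketch using Python’s standard difflib (the sample sentences are invented for illustration, not the studies’ actual method):

```python
# Hypothetical sketch: flag words that appear in a model transcript but not
# in a human reference -- one crude way to surface fabricated insertions.
import difflib

def inserted_spans(reference: str, hypothesis: str) -> list[str]:
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words)
    spans = []
    for op, _, _, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):  # text present only in the model output
            spans.append(" ".join(hyp_words[j1:j2]))
    return spans

reference = "he the boy was going to take the umbrella"
hypothesis = "he the boy took a big piece of a cross and took the umbrella"
print(inserted_spans(reference, hypothesis))  # prints the fabricated spans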

The potential danger becomes even clearer in specific examples. Two professors, Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia, examined clips from a research repository called TalkBank. The pair found that about 40 percent of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.

In one case, Whisper invented that the three people being discussed were Black. In another, it turned a speaker’s sentence “He, the boy, was going to, I’m not sure exactly, take the umbrella” into “He took a big piece of a cross, a teeny, small piece… I’m sure he didn’t have a terror knife so he killed a number of people.”

Whisper’s hallucinations also have risky medical implications. A company called Nabla uses Whisper for its medical transcription tool, which is used by more than 30,000 clinicians and 40 health systems and has transcribed an estimated seven million visits so far. Although the company is aware of the issue and says it is addressing it, there is currently no way to verify a transcript’s accuracy.

According to Martin Raison, Nabla’s chief technology officer, the tool erases the original audio for “data safety reasons,” which also makes it impossible to compare a transcript against the recording it came from. The company says providers should quickly edit and approve transcriptions (do doctors have that much spare time?), though this system may change. In the meantime, privacy rules around doctor-patient conversations mean no outsider can confirm how accurate the transcriptions are.

The rise of AI NPCs has felt like a looming threat for years, as if developers couldn’t wait to drop human writers and offload NPC conversations to generative AI models. At CES 2025, NVIDIA made it abundantly clear that the technology is just around the corner. PUBG developer Krafton, for example, plans to use NVIDIA’s ACE (Avatar Cloud Engine) to power AI companions that assist you and joke around with you during matches. And Krafton isn’t stopping there – it’s also using ACE in its life simulation title inZOI to make characters smarter and to generate objects.

While generative AI in games seems almost inevitable – the medium has always toyed with new ways to make enemies and NPCs smarter and more lifelike – watching several NVIDIA ACE demos back to back really made me cringe.

This wasn’t just slightly smarter enemy AI – ACE can conjure entire conversations out of thin air, synthesize voices and try to give NPCs a sense of personality. It even does all of this locally on your PC, powered by NVIDIA’s RTX GPUs. But while that might sound good on paper, I hated nearly every second of watching the AI NPCs in action.

TiGames’ ZooPunk is a prime example: it relies on NVIDIA ACE to generate dialogue, a synthetic voice and lip syncing for an NPC named Buck. But as you can see in the video above, Buck sounds like a robot with a slightly rustic accent. If he’s supposed to have some kind of relationship with the main character, you’d never guess it from his performance.
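The article doesn’t show how ACE is wired up, but the text half of the pattern it describes – a locally hosted model improvising NPC lines from a persona prompt – can be sketched generically. Everything below (the endpoint, the model name, Buck’s persona) is an assumption for illustration, not NVIDIA’s or TiGames’ actual setup; it assumes any OpenAI-compatible local inference server, such as llama.cpp’s:

```python
# Generic sketch of locally generated NPC dialogue -- NOT the NVIDIA ACE API.
# Assumes an OpenAI-compatible inference server running locally (e.g. llama.cpp's
# server) at the URL below; endpoint, model name, and persona are placeholders.
import json
import urllib.request

PERSONA = "You are Buck, a gruff mechanic NPC. Answer in one short line."  # invented persona

def npc_line(player_says: str) -> str:
    payload = {
        "model": "local-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": player_says},
        ],
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",  # assumed local server
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(npc_line("Can you fix my rifle before the next match?"))
```

The sketch also hints at why these characters feel hollow: the “personality” is a single system prompt, and every line is whatever the model happens to emit.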

I think my deep dislike of NVIDIA’s ACE-powered AI comes down to this: there’s simply nothing charming about it. No joy, no warmth, no humanity. Every ACE AI character feels like a developer cutting corners in the worst way possible, as if you can see their contempt for the audience in every lifeless NPC. I’d much rather scroll through some on-screen text; at least then I don’t have to listen to weird robot voices.
