"Well, you seem like a person, but you're just a voice in a computer"

By: Rhaomi

13 May 2024 at 15:14

OpenAI unveils GPT-4o, a new flagship "omnimodel" capable of processing text, audio, and video. While it delivers big improvements in speed, cost, and reasoning ability, perhaps the most impressive advance is its new voice mode. Where the old version was a clunky speech --> text --> speech pipeline with tons of latency, the new model takes in audio directly and responds in kind, enabling real-time conversations with an eerily realistic voice, one that can recognize multiple speakers and even respond with sarcasm, laughter, and other emotional inflections. Rumor has it Apple is nearing a deal with the company to revamp an aging Siri, and the advance has clear implications for customer service, translation, education, and even virtual companions (or perhaps "lovers", as the allusions to Spike Jonze's Her, the Samantha-esque demo voice, and the opening of the door to mature content imply). Meanwhile, the offloading of most premium ChatGPT features to the free tier suggests something bigger coming down the pike.
