OpenAI unveils GPT-4o, a new flagship "omnimodel" capable of processing text, audio, and video. While it delivers big improvements in speed, cost, and reasoning ability, perhaps the most impressive change is its new voice mode: where
the old version was a clunky speech --> text --> speech approach with tons of latency, the new model takes in audio directly and responds in kind, enabling real-time conversations with an eerily realistic voice, one that can
recognize multiple speakers and even
respond with sarcasm, laughter, and other emotional cues. Rumor has it
Apple is nearing a deal with the company to revamp an aging Siri, and the advance has clear implications for
customer service,
translation,
education, and even
virtual companions (or perhaps
"lovers", as the
allusions to Spike Jonze's
Her, the Samantha-esque demo voice, and
the opening of the door to mature content imply). Meanwhile, the
offloading of most premium ChatGPT features to the free tier suggests something bigger coming down the pike.