
Google’s Astra is its first AI-for-everything agent

14 May 2024 at 13:55

Google is set to introduce a new system called Astra later this year and promises that it will be the most powerful, advanced type of AI assistant it’s ever launched. 

The current generation of AI assistants, such as ChatGPT, can retrieve information and offer answers, but not much more. This year, Google is rebranding its assistants as more advanced “agents,” which it says can show reasoning, planning, and memory skills and take multiple steps to execute tasks. 

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review. 

“We are in very early days [of AI agent development],” Google CEO Sundar Pichai said on a call ahead of Google’s I/O conference today. 

“We’ve always wanted to build a universal agent that will be useful in everyday life,” said Demis Hassabis, the CEO and cofounder of Google DeepMind. “Imagine agents that can see and hear what we do, better understand the context we’re in, and respond quickly in conversation, making the pace and quality of interaction feel much more natural.” That, he says, is what Astra will be. 

Google’s announcement comes a day after competitor OpenAI unveiled its own supercharged AI assistant, GPT-4o. Google DeepMind’s Astra responds to audio and video inputs, much in the same way as GPT-4o (albeit less flirtatiously). 

In a press demo, a user pointed a smartphone camera and smart glasses at things and asked Astra to explain what they were. When the person pointed the device out the window and asked “What neighborhood do you think I’m in?” the AI system was able to identify King’s Cross, London, site of Google DeepMind’s headquarters. It was also able to say that the person’s glasses were on a desk, having recorded them earlier in the interaction. 

The demo showcases Google DeepMind’s vision of multimodal AI (which can handle multiple types of input—voice, video, text, and so on) working in real time, Vinyals says. 

“We are very excited about, in the future, to be able to really just get closer to the user, assist the user with anything that they want,” he says. Google recently upgraded its artificial-intelligence model Gemini to process even larger amounts of data, an upgrade which helps it handle bigger documents and videos, and have longer conversations. 

Tech companies are in the middle of a fierce competition over AI supremacy, and AI agents are the latest effort from Big Tech firms to show they are pushing the frontier of development. Agents also play into a narrative pushed by many tech companies, including OpenAI and Google DeepMind, that aim to build artificial general intelligence, a still-hypothetical idea of superintelligent AI systems. 

“Eventually, you’ll have this one agent that really knows you well, can do lots of things for you, and can work across multiple tasks and domains,” says Chirag Shah, a professor at the University of Washington who specializes in online search.

This vision is still aspirational. But today’s announcement should be seen as Google’s attempt to keep up with competitors. And by rushing these products out, Google can collect even more data from its more than a billion users on how they use its models and what works, Shah says.

Google is unveiling many more new AI capabilities beyond agents today. It’s going to integrate AI more deeply into Search through a new feature called AI Overviews, which gathers information from the internet and packages it into short summaries in response to search queries. The feature, which launches today, will initially be available only in the US, with more countries gaining access later. 

This will help speed up the search process and get users more specific answers to more complex, niche questions, says Felix Simon, a research fellow in AI and digital news at the Reuters Institute for the Study of Journalism. “I think that’s where Search has always struggled,” he says. 

Another new feature of Google’s AI Search offering is better planning. People will soon be able to ask Search to make meal and travel suggestions, for example, much like asking a travel agent to suggest restaurants and hotels. Gemini will be able to help them plan what they need to do or buy to cook recipes, and they will also be able to have conversations with the AI system, asking it to do anything from relatively mundane tasks, such as informing them about the weather forecast, to highly complex ones like helping them prepare for a job interview or an important speech. 

People will also be able to interrupt Gemini midsentence and ask clarifying questions, much as in a real conversation. 

In another move to one-up competitor OpenAI, Google also unveiled Veo, a new video-generating AI system. Veo is able to generate short videos and allows users more control over cinematic styles by understanding prompts like “time lapse” or “aerial shots of a landscape.”

Google has a significant advantage when it comes to training generative video models, because it owns YouTube. It has already announced collaborations with artists such as Donald Glover and Wyclef Jean, who are using its technology to produce their work. 

Earlier this year, OpenAI’s CTO, Mira Murati, fumbled when asked whether the company’s model was trained on YouTube data. Douglas Eck, senior research director at Google DeepMind, was also vague about the training data used to create Veo when asked about it by MIT Technology Review, but he said that it “may be trained on some YouTube content in accordance with our agreements with YouTube creators.”

On one hand, Google is presenting its generative AI as a tool artists can use to make stuff, but the tools likely get their ability to create that stuff by using material from existing artists, says Shah. AI companies such as Google and OpenAI have faced a slew of lawsuits by writers and artists claiming that their intellectual property has been used without consent or compensation.  

“For artists it’s a double-edged sword,” says Shah. 

What to expect at Google I/O

14 May 2024 at 06:42

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In the world of AI, a lot can happen in a year. Last year, at the beginning of Big Tech’s AI wars, Google announced during its annual I/O conference that it was throwing generative AI at everything, integrating it into its suite of products from Docs to email to e-commerce listings and its chatbot Bard. It was an effort to catch up with competitors like Microsoft and OpenAI, which had unveiled snazzy products like coding assistants and ChatGPT, the product that has done more than any other to ignite the current excitement about AI.

Since then, its ChatGPT competitor chatbot Bard (which, you may recall, temporarily wiped $100 billion off Google’s share price when it made a factual error during the demo) has been replaced by the more advanced Gemini. But, for me, the AI revolution hasn’t felt like one. Instead, it’s been a slow slide toward marginal efficiency gains. I see more autocomplete functions in my email and word processing applications, and Google Docs now offers more ready-made templates. They are not groundbreaking features, but they are also reassuringly inoffensive. 

Google is holding its I/O conference tomorrow, May 14, and we expect it to announce a whole new slew of AI features, further embedding the technology into everything it does. The company is tight-lipped about its announcements, but we can make educated guesses. There has been a lot of speculation that it will upgrade its crown jewel, Search, with generative AI features that could, for example, go behind a paywall. Perhaps we will see Google’s version of AI agents, a buzzy term that basically means more capable and useful smart assistants able to do more complex tasks, such as booking flights and hotels much as a travel agent would. 

Google, despite having 90% of the online search market, is in a defensive position this year. Upstarts such as Perplexity AI have launched their own versions of AI-powered search to rave reviews, Microsoft’s AI-powered Bing has managed to increase its market share slightly, and OpenAI is working on its own AI-powered online search function and is also reportedly in conversation with Apple to integrate ChatGPT into smartphones. 

There are some hints about what any new AI-powered search features might look like. Felix Simon, a research fellow at the Reuters Institute for the Study of Journalism, has been part of the Google Search Generative Experience trial, which is the company’s way of testing new products on a small selection of real users. 

Last month, Simon noticed that his Google searches with links and short snippets from online sources had been replaced by more detailed, neatly packaged AI-generated summaries. He was able to get these results from queries related to nature and health, such as “Do snakes have ears?” Most of the information offered to him was correct, which was a surprise, as AI language models have a tendency to “hallucinate” (which means make stuff up), and they have been criticized for being an unreliable source of information. 

To Simon’s surprise, he enjoyed the new feature. “It’s convenient to ask [the AI] to get something presented just for you,” he says. 

Simon then started using the new AI-powered Google function to search for news items rather than scientific information.

For most of these queries, such as what happened in the UK or Ukraine yesterday, he was simply offered links to news sources such as the BBC and Al Jazeera. But he did manage to get the search engine to generate an overview of recent news items from Germany, in the form of a bullet-pointed list of news headlines from the day before. The first entry was about an attack on Franziska Giffey, a Berlin politician who was assaulted in a library. The AI summary had the date of the attack wrong. But it was so close to the truth that Simon didn’t think twice about its accuracy. 

A quick online search during our call revealed that the rest of the AI-generated news summaries were also littered with inaccuracies. Details were wrong, or the events referred to happened years ago. All the stories were also about terrorism, hate crimes, or violence, with one soccer result thrown in. Omitting headlines on politics, culture, and the economy seems like a weird choice.  

People have a tendency to believe computers to be correct even when they are not, and Simon’s experience is an example of the kinds of problems that might arise when AI models hallucinate. The ease of getting results means that people might unknowingly ingest fake news or wrong information. It’s very problematic if even people like Simon, who are trained to fact-check things and know how AI models work, don’t do their due diligence and assume information is correct. 

Whatever Google announces at I/O tomorrow, there is immense pressure for it to be something that would justify its massive investment into AI. And after a year of experimenting, there also need to be serious improvements in making its generative AI tools more accurate and reliable. 

There are some people in the computer science community who say that hallucinations are an intrinsic part of generative AI that can’t ever be fixed, and that we can never fully trust these systems. But hallucinations will make AI-powered products less appealing to users. And it’s highly unlikely that Google will announce it has fixed this problem at I/O tomorrow. 

If you want to learn more about how Google plans to develop and deploy AI, come and hear from its vice president of AI, Jay Yagnik, at our flagship AI conference, EmTech Digital. It’ll be held on the MIT campus and streamed live online next week, May 22-23. I’ll be there, along with AI leaders from companies like OpenAI, AWS, and Nvidia, talking about where AI is going next. Nick Clegg, Meta’s president of global affairs, will also join MIT Technology Review’s executive editor Amy Nordrum for an exclusive interview on stage. See you there! 

Readers of The Algorithm get 30% off tickets with the code ALGORITHMD24.


Now read the rest of The Algorithm

Deeper Learning

Deepfakes of your dead loved ones are a booming Chinese business

Once a week, Sun Kai has a video call with his mother. He opens up about work, the pressures he faces as a middle-aged man, and thoughts that he doesn’t even discuss with his wife. His mother will occasionally make a comment, but mostly, she just listens. That’s because Sun’s mother died five years ago. And the person he’s talking to isn’t actually a person, but a digital replica he made of her—a moving image that can conduct basic conversations. 

AI resurrection: There are plenty of people like Sun who want to use AI to interact with lost loved ones. The market is particularly strong in China, where at least half a dozen companies are now offering such technologies. In some ways, the avatars are the latest manifestation of a cultural tradition: Chinese people have always taken solace from confiding in the dead. Read more from Zeyi Yang.

Bits and Bytes

Google DeepMind’s new AlphaFold can model a much larger slice of biological life
Google DeepMind has released an improved version of its biology prediction tool, AlphaFold, that can predict the structures not only of proteins but of nearly all the elements of biological life. It’s an exciting development that could help accelerate drug discovery and other scientific research. (MIT Technology Review)

The way whales communicate is closer to human language than we realized
Researchers used statistical models to analyze whale “codas” and managed to identify a structure to their language that’s similar to features of the complex vocalizations humans use. It’s a small step forward, but it could help unlock a greater understanding of how whales communicate. (MIT Technology Review)

Tech workers should shine a light on the industry’s secretive work with the military
Despite what happens in Google’s executive suites, workers themselves can force change. William Fitzgerald, who leaked information about Google’s controversial Project Maven, has shared how he thinks they can do this. (MIT Technology Review)

AI systems are getting better at tricking us
A wave of AI systems has “deceived” humans in ways they haven’t been explicitly trained to do, by offering up false explanations for their behavior or concealing the truth from human users and misleading them to achieve a strategic end. This issue highlights how difficult artificial intelligence is to control and the unpredictable ways in which these systems work. (MIT Technology Review)

Why America needs an Apollo program for the age of AI
AI is crucial to the future security and prosperity of the US. We need to lay the groundwork now by investing in computational power, argues Eric Schmidt. (MIT Technology Review)

Fooled by AI? These firms sell deepfake detection that’s “REAL 100%”
The AI detection business is booming. There is one catch, however. Detecting AI-generated content is notoriously unreliable, and the tech is still in its infancy. That hasn’t stopped some startup founders (many of whom have no experience or background in AI) from trying to sell services they claim can do so. (The Washington Post)

The tech-bro turf war over AI’s most hardcore hacker house
A hilarious piece taking an anthropological look at the power struggle between two competing hacker houses in Silicon Valley. The fight is over which house can call itself “AGI House.” (Forbes)

My deepfake shows how valuable our data is in the age of AI

30 April 2024 at 05:23

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Deepfakes are getting good. Like, really good. Earlier this month I went to a studio in East London to get myself digitally cloned by the AI video startup Synthesia. They made a hyperrealistic deepfake that looked and sounded just like me, with realistic intonation. It is a long way away from the glitchiness of earlier generations of AI avatars. The end result was mind-blowing. It could easily fool someone who doesn’t know me well.

Synthesia has managed to create AI avatars that are remarkably humanlike after only one year of tinkering with the latest generation of generative AI. It’s equally exciting and daunting thinking about where this technology is going. It will soon be very difficult to differentiate between what is real and what is not, and this is a particularly acute threat given the record number of elections happening around the world this year. 

We are not ready for what is coming. If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI. 

I just published a story on my deepfake creation experience, and on the big questions about a world where we increasingly can’t tell what’s real. Read it here.

But there is another big question: What happens to our data once we submit it to AI companies? Synthesia says it does not sell the data it collects from actors and customers, although it does release some of it for academic research purposes. The company uses avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company deletes their data.

But other companies are not that transparent about their intentions. As my colleague Eileen Guo reported last year, companies such as Meta license actors’ data—including their faces and expressions—in a way that allows the companies to do whatever they want with it. Actors are paid a small up-front fee, but their likeness can then be used to train AI models in perpetuity without their knowledge. 

Even if contracts for data are transparent, they don’t apply if you die, says Carl Öhman, an assistant professor at Uppsala University who has studied the online data left by deceased people and is the author of a new book, The Afterlife of Data. The data we input into social media platforms or AI models might end up benefiting companies and living on long after we’re gone. 

“Facebook is projected to host, within the next couple of decades, a couple of billion dead profiles,” Öhman says. “They’re not really commercially viable. Dead people don’t click on any ads, but they take up server space nevertheless,” he adds. This data could be used to train new AI models, or to make inferences about the descendants of those deceased users. The whole model of data and consent with AI presumes that both the data subject and the company will live on forever, Öhman says.

Our data is a hot commodity. AI language models are trained by indiscriminately scraping the web, and that includes our personal data. A couple of years ago I tested to see if GPT-3, the predecessor of the language model powering ChatGPT, had anything on me. It struggled, but I found that I was able to retrieve personal information about MIT Technology Review’s editor in chief, Mat Honan. 

High-quality, human-written data is crucial to training the next generation of powerful AI models, and we are on the verge of running out of free online training data. That’s why AI companies are racing to strike deals with news organizations and publishers to access their data treasure chests. 

Old social media sites are also a potential gold mine: when companies go out of business or platforms stop being popular, their assets, including users’ data, get sold to the highest bidder, says Öhman. 

“MySpace data has been bought and sold multiple times since MySpace crashed. And something similar may well happen to Synthesia, or X, or TikTok,” he says. 

Some people may not care much about what happens to their data, says Öhman. But securing exclusive access to high-quality data helps cement the monopoly position of large corporations, and that harms us all. This is something we need to grapple with as a society, he adds. 

Synthesia said it will delete my avatar after my experiment, but the whole experience did make me think of all the cringeworthy photos and posts that haunt me on Facebook and other social media platforms. I think it’s time for a purge.


Now read the rest of The Algorithm

Deeper Learning

Chatbot answers are all made up. This new tool helps you figure out which ones to trust.

Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk. A new tool created by Cleanlab, an AI startup spun out of MIT, is designed to provide a clearer sense of how trustworthy these models really are. 

A BS-o-meter for chatbots: Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. Read more from Will Douglas Heaven.
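Cleanlab hasn’t published the full recipe behind the Trustworthy Language Model’s score, but one common ingredient in this kind of reliability scoring is self-consistency: sample the model several times and measure how often its answers agree. This toy sketch (the function name and scoring rule are illustrative, not Cleanlab’s API) shows the idea of mapping agreement to a 0-to-1 score:

```python
from collections import Counter

def consistency_score(answers):
    """Return a 0-to-1 trust score: the fraction of sampled answers
    that match the most common answer. Illustrative only; real tools
    combine several signals beyond simple agreement."""
    if not answers:
        return 0.0
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)

# A model that answers the same way every time looks trustworthy...
print(consistency_score(["Paris", "Paris", "Paris", "Paris"]))  # 1.0
# ...while one that flip-flops earns a low score.
print(consistency_score(["1912", "1915", "1912", "1908"]))      # 0.5
```

A business could then set a threshold (say, 0.8) and route low-scoring answers to a human reviewer instead of showing them to customers.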

Bits and Bytes

Here’s the defense tech at the center of US aid to Israel, Ukraine, and Taiwan
President Joe Biden signed a $95 billion aid package into law last week. The bill will send a significant quantity of supplies to Ukraine and Israel, while also supporting Taiwan with submarine technology to aid its defenses against China. (MIT Technology Review)

Rishi Sunak promised to make AI safe. Big Tech’s not playing ball.
The UK’s prime minister thought he secured a political win when he got AI power players to agree to voluntary safety testing with the UK’s new AI Safety Institute. Six months on, it turns out pinkie promises don’t go very far. OpenAI and Meta have not granted access to the AI Safety Institute to do prerelease safety testing on their models. (Politico)

Inside the race to find AI’s killer app
The AI hype bubble is starting to deflate as companies try to find a way to make profits out of the eye-wateringly expensive process of developing and running this technology. Tech companies haven’t solved some of the fundamental problems slowing its wider adoption, such as the fact that generative models constantly make things up. (The Washington Post)  

Why the AI industry’s thirst for new data centers can’t be satisfied
The current boom in data-hungry AI means there is now a shortage of parts, property, and power to build data centers. (The Wall Street Journal)

The friends who became rivals in Big Tech’s AI race
This story is a fascinating look into one of the most famous and fractious relationships in AI. Demis Hassabis and Mustafa Suleyman are old friends who grew up in London and went on to cofound AI lab DeepMind. Suleyman was ousted following a bullying scandal, went on to start his own short-lived startup, and now heads rival Microsoft’s AI efforts, while Hassabis still runs DeepMind, which is now Google’s central AI research lab. (The New York Times)

This creamy vegan cheese was made with AI
Startups are using artificial intelligence to design plant-based foods. The companies train algorithms on data sets of ingredients with desirable traits like flavor, scent, or stretchability. Then they use AI to comb troves of data to develop new combinations of those ingredients that perform similarly. (MIT Technology Review)

An AI startup made a hyperrealistic deepfake of me that’s so good it’s scary

25 April 2024 at 01:00

I’m stressed and running late, because what do you wear for the rest of eternity? 

This makes it sound like I’m dying, but it’s the opposite. I am, in a way, about to live forever, thanks to the AI video startup Synthesia. For the past several years, the company has produced AI-generated avatars, but today it launches a new generation, its first to take advantage of the latest advancements in generative AI, and they are more realistic and expressive than anything I’ve ever seen. While today’s release means almost anyone will now be able to make a digital double, on this early April afternoon, before the technology goes public, they’ve agreed to make one of me. 

When I finally arrive at the company’s stylish studio in East London, I am greeted by Tosin Oshinyemi, the company’s production lead. He is going to guide and direct me through the data collection process—and by “data collection,” I mean the capture of my facial features, mannerisms, and more—much like he normally does for actors and Synthesia’s customers. 

In this AI-generated footage, synthetic “Melissa” gives a performance of Hamlet’s famous soliloquy. (The magazine had no role in producing this video.)
SYNTHESIA

He introduces me to a waiting stylist and a makeup artist, and I curse myself for wasting so much time getting ready. Their job is to ensure that people have the kind of clothes that look good on camera and that they look consistent from one shot to the next. The stylist tells me my outfit is fine (phew), and the makeup artist touches up my face and tidies my baby hairs. The dressing room is decorated with hundreds of smiling Polaroids of people who have been digitally cloned before me. 

Apart from the small supercomputer whirring in the corridor, which processes the data generated at the studio, this feels more like going into a news studio than entering a deepfake factory. 

I joke that Oshinyemi has what MIT Technology Review might call a job title of the future: “deepfake creation director.” 

“We like the term ‘synthetic media’ as opposed to ‘deepfake,’” he says. 

It’s a subtle but, some would argue, notable difference in semantics. Both mean AI-generated videos or audio recordings of people doing or saying something that didn’t necessarily happen in real life. But deepfakes have a bad reputation. Since their inception nearly a decade ago, the term has come to signal something unethical, says Alexandru Voica, Synthesia’s head of corporate affairs and policy. Think of sexual content produced without consent, or political campaigns that spread disinformation or propaganda.

“Synthetic media is the more benign, productive version of that,” he argues. And Synthesia wants to offer the best version of that version.  

Until now, all AI-generated videos of people have tended to have some stiffness, glitchiness, or other unnatural elements that make them pretty easy to differentiate from reality. Because they’re so close to the real thing but not quite it, these videos can make people feel annoyed or uneasy or icky—a phenomenon commonly known as the uncanny valley. Synthesia claims its new technology will finally lead us out of the valley. 

Thanks to rapid advancements in generative AI and a glut of training data created by human actors that has been fed into its AI model, Synthesia has been able to produce avatars that are indeed more humanlike and more expressive than their predecessors. The digital clones are better able to match their reactions and intonation to the sentiment of their scripts—acting more upbeat when talking about happy things, for instance, and more serious or sad when talking about unpleasant things. They also do a better job matching facial expressions—the tiny movements that can speak for us without words. 

But this technological progress also signals a much larger social and cultural shift. Increasingly, so much of what we see on our screens is generated (or at least tinkered with) by AI, and it is becoming more and more difficult to distinguish what is real from what is not. This threatens our trust in everything we see, which could have very real, very dangerous consequences. 

“I think we might just have to say goodbye to finding out about the truth in a quick way,” says Sandra Wachter, a professor at the Oxford Internet Institute, who researches the legal and ethical implications of AI. “The idea that you can just quickly Google something and know what’s fact and what’s fiction—I don’t think it works like that anymore.” 

monitor on a video camera showing Heikkilä and Oshinyemi on set in front of the green screen
Tosin Oshinyemi, the company’s production lead, guides and directs actors and customers through the data collection process.
DAVID VINTINER

So while I was excited for Synthesia to make my digital double, I also wondered if the distinction between synthetic media and deepfakes is fundamentally meaningless. Even if the former centers a creator’s intent and, critically, a subject’s consent, is there really a way to make AI avatars safely if the end result is the same? And do we really want to get out of the uncanny valley if it means we can no longer grasp the truth?

But more urgently, it was time to find out what it’s like to see a post-truth version of yourself.

Almost the real thing

A month before my trip to the studio, I visited Synthesia CEO Victor Riparbelli at his office near Oxford Circus. As Riparbelli tells it, Synthesia’s origin story stems from his experiences exploring avant-garde, geeky techno music while growing up in Denmark. The internet allowed him to download software and produce his own songs without buying expensive synthesizers. 

“I’m a huge believer in giving people the ability to express themselves in the way that they can, because I think that that provides for a more meritocratic world,” he tells me. 

He saw the possibility of doing something similar with video when he came across research on using deep learning to transfer expressions from one human face to another on screen. 

“What that showcased was the first time a deep-learning network could produce video frames that looked and felt real,” he says. 

That research was conducted by Matthias Niessner, a professor at the Technical University of Munich, who cofounded Synthesia with Riparbelli in 2017, alongside University College London professor Lourdes Agapito and Steffen Tjerrild, whom Riparbelli had previously worked with on a cryptocurrency project. 

Initially the company built lip-synching and dubbing tools for the entertainment industry, but it found that the bar for this technology’s quality was very high and there wasn’t much demand for it. Synthesia changed direction in 2020 and launched its first generation of AI avatars for corporate clients. That pivot paid off. In 2023, Synthesia achieved unicorn status, meaning it was valued at over $1 billion—making it one of the relatively few European AI companies to do so. 

That first generation of avatars looked clunky, with looped movements and little variation. Subsequent iterations started looking more human, but they still struggled to say complicated words, and things were slightly out of sync. 

The challenge is that people are used to looking at other people’s faces. “We as humans know what real humans do,” says Jonathan Starck, Synthesia’s CTO. Since infancy, “you’re really tuned in to people and faces. You know what’s right, so anything that’s not quite right really jumps out a mile.” 

These earlier AI-generated videos, like deepfakes more broadly, were made using generative adversarial networks, or GANs—an older technique for generating images and videos that uses two neural networks that play off one another. It was a laborious and complicated process, and the technology was unstable. 
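For readers curious about the mechanics, the way the two networks “play off one another” can be sketched with the standard GAN objectives: the discriminator is rewarded for telling real samples from fakes, while the generator is rewarded for fooling it. This toy numpy example (not Synthesia’s code; the scores are made-up numbers) just evaluates the two losses:

```python
import numpy as np

def sigmoid(x):
    # Squash raw scores into probabilities between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(d_real, d_fake):
    # The discriminator wants real samples scored near 1 and fakes near 0.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # The generator wants the discriminator to score its fakes as real.
    return -np.mean(np.log(d_fake))

# Probabilities the discriminator might assign of a sample being "real":
d_real = sigmoid(np.array([2.0, 1.5, 3.0]))    # confident on real data
d_fake = sigmoid(np.array([-1.0, -0.5, 0.2]))  # suspicious of the fakes

print(discriminator_loss(d_real, d_fake))  # low: discriminator is winning
print(generator_loss(d_fake))              # high: generator must improve
```

Training alternates gradient steps on these two losses, and the instability the article mentions comes from that tug-of-war: neither network has a fixed target, because each one’s objective moves as the other improves.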

But in the generative AI boom of the last year or so, the company has found it can create much better avatars using generative neural networks that produce higher-quality output more consistently. The more data these models are fed, the better they learn. Synthesia uses both large language models and diffusion models to do this; the former help the avatars react to the script, and the latter generate the pixels. 
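The division of labor described here (a language model reacting to the script, a diffusion model generating the pixels) can be illustrated with a toy sketch. Nothing below reflects Synthesia's actual models; `llm_script_features` and `diffusion_denoise` are invented stand-ins that only mimic the shape of a conditioning-then-denoising pipeline:

```python
import numpy as np

def llm_script_features(script: str, dim: int = 8) -> np.ndarray:
    """Invented stand-in for a language model: map each word of the
    script to a feature vector the animation stage can condition on."""
    seed = sum(ord(c) for c in script) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal((len(script.split()), dim))

def diffusion_denoise(cond: np.ndarray, steps: int = 10) -> np.ndarray:
    """Invented stand-in for a diffusion model: start from pure noise
    and iteratively nudge the 'frame' toward the conditioning signal."""
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(cond.shape)
    for _ in range(steps):
        frame = frame + 0.3 * (cond - frame)  # one denoising step
    return frame

def render_avatar_video(script: str) -> np.ndarray:
    cond = llm_script_features(script)  # stage 1: language model reads the script
    return np.stack([diffusion_denoise(c) for c in cond])  # stage 2: generate "pixels"

video = render_avatar_video("Hello and welcome back")
print(video.shape)  # one feature 'frame' per word: (4, 8)
```

The real systems condition on far richer signals (audio, emotion, pose), but the structure is the same: one model decides what the avatar should do, and another iteratively refines noise into frames consistent with that decision.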

Despite the leap in quality, the company is still not pitching itself to the entertainment industry. Synthesia continues to see itself as a platform for businesses. Its bet is this: As people spend more time watching videos on YouTube and TikTok, there will be more demand for video content. Young people are already skipping traditional search and defaulting to TikTok for information presented in video form. Riparbelli argues that Synthesia’s tech could help companies convert their boring corporate comms, reports, and training materials into content people will actually watch and engage with. He also suggests it could be used to make marketing materials. 

He claims Synthesia’s technology is used by 56% of the Fortune 100, with the vast majority of those companies using it for internal communication. The company lists Zoom, Xerox, Microsoft, and Reuters as clients. Services start at $22 a month.

This, the company hopes, will be a cheaper and more efficient alternative to video from a professional production company—and one that may be nearly indistinguishable from it. Riparbelli tells me its newest avatars could easily fool a person into thinking they are real. 

“I think we’re 98% there,” he says. 

For better or worse, I am about to see it for myself. 

Don’t be garbage

In AI research, there is a saying: Garbage in, garbage out. If the data that went into training an AI model is trash, that will be reflected in the outputs of the model. The more data points the AI model has captured of my facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be. 

Back in the studio, I’m trying really hard not to be garbage. 

I am standing in front of a green screen, and Oshinyemi guides me through the initial calibration process, where I have to move my head and then eyes in a circular motion. Apparently, this will allow the system to understand my natural colors and facial features. I am then asked to say the sentence “All the boys ate a fish,” which will capture all the mouth movements needed to form vowels and consonants. We also film footage of me “idling” in silence.

image of Melissa standing on her mark in front of a green screen with server racks in background
The more data points the AI system has on facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be.
DAVID VINTINER

He then asks me to read a script for a fictitious YouTuber in different tones, directing me on the spectrum of emotions I should convey. First I’m supposed to read it in a neutral, informative way, then in an encouraging way, an annoyed and complain-y way, and finally an excited, convincing way. 

“Hey, everyone—welcome back to Elevate Her with your host, Jess Mars. It’s great to have you here. We’re about to take on a topic that’s pretty delicate and honestly hits close to home—dealing with criticism in our spiritual journey,” I read off the teleprompter, simultaneously trying to visualize ranting about something to my partner during the complain-y version. “No matter where you look, it feels like there’s always a critical voice ready to chime in, doesn’t it?” 

Don’t be garbage, don’t be garbage, don’t be garbage. 

“That was really good. I was watching it and I was like, ‘Well, this is true. She’s definitely complaining,’” Oshinyemi says, encouragingly. Next time, maybe add some judgment, he suggests.   

We film several takes featuring different variations of the script. In some versions I’m allowed to move my hands around. In others, Oshinyemi asks me to hold a metal pin between my fingers as I do. This is to test the “edges” of the technology’s capabilities when it comes to communicating with hands, Oshinyemi says. 

Historically, making AI avatars look natural and matching mouth movements to speech has been a very difficult challenge, says David Barber, a professor of machine learning at University College London who is not involved in Synthesia’s work. That is because the problem goes far beyond mouth movements; you have to think about eyebrows, all the muscles in the face, shoulder shrugs, and the numerous different small movements that humans use to express themselves. 

motion capture stage with detail of a mocap pattern inset
The motion capture process uses reference patterns to help align footage captured from multiple angles around the subject.
DAVID VINTINER

Synthesia has worked with actors to train its models since 2020, and their doubles make up the 225 stock avatars that are available for customers to animate with their own scripts. But to train its latest generation of avatars, Synthesia needed more data; it has spent the past year working with around 1,000 professional actors in London and New York. (Synthesia says it does not sell the data it collects, although it does release some of it for academic research purposes.)

The actors previously got paid each time their avatar was used, but now the company pays them an up-front fee to train the AI model. Synthesia uses their avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company will delete their data. Synthesia’s enterprise customers can also generate their own custom avatars by sending someone into the studio to do much of what I’m doing.

photograph of a teleprompter screen with three arrows pointing down to "HEAD then EYES>"
The initial calibration process allows the system to understand the subject’s natural colors and facial features.
Melissa recording audio into a boom mic seated in front of a laptop stand
Synthesia also collects voice samples. In the studio, I read a passage indicating that I explicitly consent to having my voice cloned.

Between takes, the makeup artist comes in and does some touch-ups to make sure I look the same in every shot. I can feel myself blushing because of the lights in the studio, but also because of the acting. After the team has collected all the shots it needs to capture my facial expressions, I go downstairs to read more text aloud for voice samples. 

This process requires me to read a passage indicating that I explicitly consent to having my voice cloned, and that it can be used on Voica’s account on the Synthesia platform to generate videos and speech. 

Consent is key

This process is very different from the way many AI avatars, deepfakes, or synthetic media—whatever you want to call them—are created. 

Most deepfakes aren’t created in a studio. Studies have shown that the vast majority of deepfakes online are nonconsensual sexual content, usually using images stolen from social media. Generative AI has made the creation of these deepfakes easy and cheap, and there have been several high-profile cases in the US and Europe of children and women being abused in this way. Experts have also raised alarms that the technology can be used to spread political disinformation, a particularly acute threat given the record number of elections happening around the world this year. 

Synthesia’s policy is to not create avatars of people without their explicit consent. But it hasn’t been immune from abuse. Last year, researchers found pro-China misinformation that was created using Synthesia’s avatars and packaged as news, which the company said violated its terms of service. 

Since then, the company has put more rigorous verification and content moderation systems in place. It applies a watermark with information on where and how the AI avatar videos were created. Where it once had four in-house content moderators, people doing this work now make up 10% of its 300-person staff. It also hired an engineer to build better AI-powered content moderation systems. These filters help Synthesia vet every single thing its customers try to generate. Anything suspicious or ambiguous, such as content about cryptocurrencies or sexual health, gets forwarded to the human content moderators. Synthesia also keeps a record of all the videos its system creates.
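The triage flow described above (automated filters approve routine content, block clear violations, and forward ambiguous cases to human moderators) can be sketched in a few lines. The keyword list and thresholds below are invented for illustration and bear no relation to Synthesia's actual rules:

```python
# Hypothetical moderation triage: score each script, auto-block clear
# violations, and escalate anything ambiguous to a human moderator.
SENSITIVE_TOPICS = {"crypto": 0.6, "sexual": 0.7, "election": 0.8}

def triage(script: str) -> str:
    score = max(
        (w for k, w in SENSITIVE_TOPICS.items() if k in script.lower()),
        default=0.0,
    )
    if score >= 0.8:
        return "blocked"       # clear violation: reject outright
    if score > 0.0:
        return "human_review"  # ambiguous: forward to a moderator
    return "approved"          # nothing flagged: generate the video

print(triage("Quarterly onboarding update"))       # approved
print(triage("Get rich quick with crypto coins"))  # human_review
```

A production system would use trained classifiers rather than keywords, but the routing logic (auto-approve, auto-block, or escalate to a human) is the part that matters.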

And while anyone can join the platform, many features aren’t available until people go through an extensive vetting system similar to that used by the banking industry, which includes talking to the sales team, signing legal contracts, and submitting to security auditing, says Voica. Entry-level customers are limited to producing strictly factual content, and only enterprise customers using custom avatars can generate content that contains opinions. On top of this, only accredited news organizations are allowed to create content on current affairs.

“We can’t claim to be perfect. If people report things to us, we take quick action, [such as] banning or limiting individuals or organizations,” Voica says. But he believes these measures work as a deterrent, which means most bad actors will turn to freely available open-source tools instead. 

I put some of these limits to the test when I head to Synthesia’s office for the next step in my avatar generation process. In order to create the videos that will feature my avatar, I have to write a script. Using Voica’s account, I decide to use passages from Hamlet, as well as previous articles I have written. I also use a new feature on the Synthesia platform, which is an AI assistant that transforms any web link or document into a ready-made script. I try to get my avatar to read news about the European Union’s new sanctions against Iran. 

Voica immediately texts me: “You got me in trouble!” 

The system has flagged his account for trying to generate content that is restricted.

screencap from Synthesia video with text overlay "Your video was moderated for violating our Disinformation & Misinformation: Media Reporting (News) guidelines. If you believe this was an error please submit an appeal here."
AI-powered content filters help Synthesia vet every single thing its customers try to generate. Only accredited news organizations are allowed to create content on current affairs.
COURTESY OF SYNTHESIA

Offering services without these restrictions would be “a great growth strategy,” Riparbelli grumbles. But “ultimately, we have very strict rules on what you can create and what you cannot create. We think the right way to roll out these technologies in society is to be a little bit over-restrictive at the beginning.” 

Still, even if these guardrails operated perfectly, the ultimate result would be an internet where everything is fake. And my experiment makes me wonder how we could possibly prepare ourselves. 

Our information landscape already feels very murky. On the one hand, there is heightened public awareness that AI-generated content is flourishing and could be a powerful tool for misinformation. But on the other, it is still unclear whether deepfakes are used for misinformation at scale and whether they’re broadly moving the needle to change people’s beliefs and behaviors. 

If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI. 

Claire Leibowicz, the head of AI and media integrity at the nonprofit Partnership on AI, says she worries that growing awareness of this gap will make it easier to “plausibly deny and cast doubt on real material or media as evidence in many different contexts, not only in the news, [but] also in the courts, in the financial services industry, and in many of our institutions.” She tells me she’s heartened by the resources Synthesia has devoted to content moderation and consent but says that process is never flawless.

Even Riparbelli admits that in the short term, the proliferation of AI-generated content will probably cause trouble. While people have been trained not to believe everything they read, they still tend to trust images and videos, he adds. He says people now need to test AI products for themselves to see what is possible, and should not trust anything they see online unless they have verified it. 

Never mind that AI regulation is still patchy, and the tech sector’s efforts to verify content provenance are still in their early stages. Can consumers, with their varying degrees of media literacy, really fight the growing wave of harmful AI-generated content through individual action? 

Watch out, PowerPoint

The day after my final visit, Voica emails me the videos with my avatar. When the first one starts playing, I am taken aback. It’s as painful as seeing yourself on camera or hearing a recording of your voice. Then I catch myself. At first I thought the avatar was me. 

The more I watch videos of “myself,” the more I spiral. Do I really squint that much? Blink that much? And move my jaw like that? Jesus. 

It’s good. It’s really good. But it’s not perfect. “Weirdly good animation,” my partner texts me. 

“But the voice sometimes sounds exactly like you, and at other times like a generic American and with a weird tone,” he adds. “Weird AF.” 

He’s right. The voice is sometimes me, but in real life I umm and ahh more. What’s remarkable is that it picked up on an irregularity in the way I talk. My accent is a transatlantic mess, confused by years spent living in the UK, watching American TV, and attending international school. My avatar sometimes says the word “robot” in a British accent and other times in an American accent. It’s something that probably nobody else would notice. But the AI did. 

My avatar’s range of emotions is also limited. It delivers Shakespeare’s “To be or not to be” speech very matter-of-factly. I had guided it to be furious when reading a story I wrote about Taylor Swift’s nonconsensual nude deepfakes; the avatar is complain-y and judgy, for sure, but not angry. 

This isn’t the first time I’ve made myself a test subject for new AI. Not too long ago, I tried generating AI avatar images of myself, only to get a bunch of nudes. That experience was a jarring example of just how biased AI systems can be. But this experience—and this particular way of being immortalized—was definitely on a different level.

Carl Öhman, an assistant professor at Uppsala University who has studied digital remains and is the author of a new book, The Afterlife of Data, calls avatars like the ones I made “digital corpses.” 

“It looks exactly like you, but no one’s home,” he says. “It would be the equivalent of cloning you, but your clone is dead. And then you’re animating the corpse, so that it moves and talks, with electrical impulses.” 

That’s kind of how it feels. The little, nuanced ways I don’t recognize myself are enough to put me off. Then again, the avatar could quite possibly fool anyone who doesn’t know me very well. It really shines when presenting a story I wrote about how the field of robotics could be getting its own ChatGPT moment; the virtual AI assistant summarizes the long read into a decent short video, which my avatar narrates. It is not Shakespeare, but it’s better than many of the corporate presentations I’ve had to sit through. I think if I were using this to deliver an end-of-year report to my colleagues, maybe that level of authenticity would be enough. 

And that is the sell, according to Riparbelli: “What we’re doing is more like PowerPoint than it is like Hollywood.”

Once a likeness has been generated, Synthesia is able to generate video presentations quickly from a script. In this video, synthetic “Melissa” summarizes an article real Melissa wrote about Taylor Swift deepfakes.
SYNTHESIA

The newest generation of avatars certainly aren’t ready for the silver screen. They’re still stuck in portrait mode, only showing the avatar front-facing and from the waist up. But in the not-too-distant future, Riparbelli says, the company hopes to create avatars that can communicate with their hands and have conversations with one another. It is also planning for full-body avatars that can walk and move around in a space that a person has generated. (The rig to enable this technology already exists; in fact it’s where I am in the image at the top of this piece.)

But do we really want that? It feels like a bleak future where humans are consuming AI-generated content presented to them by AI-generated avatars and using AI to repackage that into more content, which will likely be scraped to generate more AI. If nothing else, this experiment made clear to me that the technology sector urgently needs to step up its content moderation practices and ensure that content provenance techniques such as watermarking are robust. 

Even if Synthesia’s technology and content moderation aren’t yet perfect, they’re significantly better than anything I have seen in the field before, and this is after only a year or so of the current boom in generative AI. AI development moves at breakneck speed, and it is both exciting and daunting to consider what AI avatars will look like in just a few years. Maybe in the future we will have to adopt safewords to indicate that you are in fact communicating with a real human, not an AI. 

But that day is not today. 

I found it weirdly comforting that in one of the videos, my avatar rants about nonconsensual deepfakes and says, in a sociopathically happy voice, “The tech giants? Oh! They’re making a killing!” 

I would never. 

Three things we learned about AI from EmTech Digital London

23 April 2024 at 05:55

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Last week, MIT Technology Review held its inaugural EmTech Digital conference in London. It was a great success! I loved seeing so many of you there asking excellent questions, and it was a couple of days full of brain-tickling insights about where AI is going next. 

Here are the three main things I took away from the conference.

1. AI avatars are getting really, really good

UK-based AI unicorn Synthesia teased its next generation of AI avatars, which are far more emotive and realistic than any I have ever seen before. The company is pitching these avatars as a new, more engaging way to communicate. Instead of skimming through pages and pages of onboarding material, for example, new employees could watch a video where a hyperrealistic AI avatar explains what they need to know about their job. This has the potential to change the way we communicate, allowing content creators to outsource their work to custom avatars and making it easier for organizations to share information with their staff. 

2. AI agents are coming 

Thanks to the ChatGPT boom, many of us have interacted with an AI assistant that can retrieve information. But the next generation of these tools, called AI agents, can do much more: they are AI models and algorithms that can autonomously make decisions in a dynamic world. Imagine an AI travel agent that can not only retrieve information and suggest things to do, but also take action to book things for you, from flights to tours and accommodations. Every AI lab worth its salt, from OpenAI to Meta to startups, is racing to build agents that can reason better, memorize more steps, and interact with other apps and websites.  

3. Humans are not perfect either 

One of the best ways we have of ensuring that AI systems don’t go awry is getting humans to audit and evaluate them. But humans are complicated and biased, and we don’t always get things right. In order to build machines that meet our expectations and complement our limitations, we should account for human error from the get-go. In a fascinating presentation, Katie Collins, an AI researcher at the University of Cambridge, explained how she found that allowing people to express how certain or uncertain they are—for example, by using a percentage to indicate how confident they are in labeling data—leads to better accuracy for AI models overall. The only downside with this approach is that it costs more and takes more time.
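Collins's finding can be illustrated with a generic confidence-weighted training sketch (not her actual method): scale each label's contribution to the gradient by the annotator's stated confidence, so uncertain labels pull the model less than confident ones.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))                        # 200 examples, 3 features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)   # "annotator" labels
conf = rng.uniform(0.5, 1.0, size=200)                   # stated confidence per label

def fit_logistic(X, y, weights, steps=500, lr=0.5):
    """Logistic regression where each example's gradient contribution
    is scaled by the annotator's confidence in its label."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (weights * (p - y)) / len(y)  # confidence-weighted step
    return w

def accuracy(w):
    return ((X @ w > 0) == (y == 1)).mean()

w_hard = fit_logistic(X, y, np.ones(200))  # every label trusted equally
w_soft = fit_logistic(X, y, conf)          # uncertain labels down-weighted
print(accuracy(w_hard), accuracy(w_soft))
```

With clean labels the two fits land in roughly the same place; the benefit shows up when low-confidence labels are also noisier, which is exactly the situation human annotators report.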

And we’re doing it all again next month, this time at the mothership. 

Join us for EmTech Digital at the MIT campus in Cambridge, Massachusetts, on May 22-23, 2024. I’ll be there—join me! 

Our fantastic speakers include Nick Clegg, president of global affairs at Meta, who will talk about elections and AI-generated misinformation. We also have the OpenAI researchers who built the video-generation AI Sora, sharing their vision on how generative AI will change Hollywood. Then Max Tegmark, the MIT professor who wrote an open letter last year calling for a pause on AI development, will take stock of what has happened and discuss how to make powerful systems more safe. We also have a bunch of top scientists from the labs at Google, OpenAI, AWS, MIT, Nvidia, and more. 

Readers of The Algorithm get 30% off with the discount code ALGORITHMD24.

I hope to see you there!


Now read the rest of The Algorithm

Deeper Learning

Researchers taught robots to run. Now they’re teaching them to walk.

Researchers at Oregon State University have successfully trained a humanoid robot called Digit V3 to stand, walk, pick up a box, and move it from one location to another. Meanwhile, a separate group of researchers from the University of California, Berkeley, have focused on teaching Digit to walk in unfamiliar environments while carrying different loads, without toppling over. 

What’s the big deal: Both groups are using an AI technique called sim-to-real reinforcement learning, a burgeoning method of training two-legged robots like Digit. Researchers believe it will lead to more robust, reliable two-legged machines capable of interacting with their surroundings more safely—as well as learning much more quickly. Read more from Rhiannon Williams

Bits and Bytes

It’s time to retire the term “user”
The proliferation of AI means we need a new word. Tools we once called AI bots have been assigned lofty titles like “copilot,” “assistant,” and “collaborator” to convey a sense of partnership instead of a sense of automation. But if AI is now a partner, then what are we? (MIT Technology Review) 

Three ways the US could help universities compete with tech companies on AI innovation
Empowering universities to remain at the forefront of AI research will be key to realizing the field’s long-term potential, argue Ylli Bajraktari, Tom Mitchell, and Daniela Rus. (MIT Technology Review) 

AI was supposed to make police body cams better. What happened?
New AI programs that analyze bodycam recordings promise more transparency but are doing little to change culture. This story serves as a useful reminder that technology is never a panacea for these sorts of deep-rooted issues. (MIT Technology Review) 

The World Health Organization’s AI chatbot makes stuff up
The World Health Organization launched a “virtual health worker” to help people with questions about things like mental health, tobacco use, and healthy eating. But the chatbot frequently offers outdated information or simply makes things up, a common issue with AI models. This is a great cautionary tale of why it’s not always a good idea to use AI chatbots. Hallucinating chatbots can lead to serious consequences when they are applied to important tasks such as giving health advice. (Bloomberg) 

Meta is adding AI assistants everywhere in its biggest AI push
The tech giant is rolling out its latest AI model, Llama 3, in most of its apps including Instagram, Facebook, and WhatsApp. People will also be able to ask its AI assistants for advice, or use them to search for information on the internet. (New York Times) 

Stability AI is in trouble
One of the first new generative AI unicorns, the company behind the open-source image-generating AI model Stable Diffusion, is laying off 10% of its workforce. Just a couple of weeks ago its CEO, Emad Mostaque, announced that he was leaving the company. Stability has also lost several high-profile researchers and struggled to monetize its product, and it is facing a slew of lawsuits over copyright. (The Verge) 

Three reasons robots are about to become way more useful 

16 April 2024 at 05:40

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The holy grail of robotics since the field’s beginning has been to build a robot that can do our housework. But for a long time, that has just been a dream. While roboticists have been able to get robots to do impressive things in the lab, such as parkour, this usually requires meticulous planning in a tightly controlled setting. That makes it hard for robots to work reliably in homes, which are full of children and pets, have wildly varying floor plans, and contain all sorts of mess. 

There’s a well-known observation among roboticists called Moravec’s paradox: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Thanks to AI, this is now changing. Robots are starting to become capable of tasks such as folding laundry, cooking, and unloading shopping baskets, which not too long ago were seen as almost impossible. 

In our most recent cover story for the MIT Technology Review print magazine, I looked at how robotics as a field is at an inflection point. You can read more here. A really exciting mix of things are converging in robotics research, which could usher in robots that might—just might—make it out of the lab and into our homes. 

Here are three reasons why robotics is on the brink of having its own “ChatGPT moment.”

1. Cheap hardware makes research more accessible
Robots are expensive. Highly sophisticated robots can easily cost hundreds of thousands of dollars, which makes them inaccessible for most researchers. For example, the PR2, one of the earliest iterations of home robots, weighed 450 pounds (200 kilograms) and cost $400,000. 

But new, cheaper robots are allowing more researchers to do cool stuff. A new robot called Stretch, developed by the startup Hello Robot, launched during the pandemic with a much more reasonable price tag of around $18,000 and a weight of 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends; it can be controlled with a console controller. 

Meanwhile, a team at Stanford has built a system called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”) that learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks. They used off-the-shelf components to cobble together robots with more reasonable price tags in the tens, not hundreds, of thousands.

2. AI is helping us build “robotic brains”
What separates this new crop of robots is their software. Thanks to the AI boom, the focus is now shifting from feats of physical dexterity achieved by expensive robots to building “general-purpose robot brains” in the form of neural networks. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. 

Last summer, Google launched a vision-language-action model called RT-2. This model gets its general understanding of the world from the online text and images it has been trained on, as well as its own interactions. It translates that data into robotic actions. 

And researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks with the help of an AI learning technique called imitation learning, plus generative AI. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements. 
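Imitation learning at its simplest is behavior cloning: fit a policy that maps the states a demonstrator saw to the actions they took. This toy one-dimensional "reach toward a target" task (invented for illustration, vastly simpler than cooking shrimp) shows the mechanic with 20 demonstrations, the same count the Mobile ALOHA team used:

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(20, 2))      # 20 demos: (position, target)
actions = 0.5 * (states[:, 1] - states[:, 0])  # demonstrator moves halfway to target

# Behavior cloning: least-squares fit of a linear policy action = states @ w
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(pos: float, target: float) -> float:
    """Predict an action for a state the demonstrator never visited."""
    return float(np.array([pos, target]) @ w)

print(policy(0.0, 1.0))  # ≈ 0.5: the policy imitates the demonstrator
```

Real systems replace the linear fit with a deep network over camera images and joint angles, but the principle is identical: the robot never receives a reward signal, only examples of what a competent human did.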

Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks. 

3. More data allows robots to learn more skills
The power of large AI models such as GPT-4 lies in the reams and reams of data hoovered from the internet. But that doesn’t really work for robots, which need data that has been specifically collected for them. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded. Right now that data is very scarce, and it takes a long time for humans to collect.

A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.  

Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than with the systems each individual lab was developing on its own.

Read more in my story here


Now read the rest of The Algorithm

Deeper Learning

Generative AI can turn your most precious memories into photos that never existed

Maria grew up in Barcelona, Spain, in the 1940s. Her first memories of her father are vivid. As a six-year-old, Maria would visit a neighbor’s apartment in her building when she wanted to see him. From there, she could peer through the railings of a balcony into the prison below and try to catch a glimpse of him through the small window of his cell, where he was locked up for opposing the dictatorship of Francisco Franco. There is no photo of Maria on that balcony. But she can now hold something like it: a fake photo—or memory-based reconstruction.

Remember this: Dozens of people have now had their memories turned into images in this way via Synthetic Memories, a project run by Barcelona-based design studio Domestic Data Streamers. Read this story by my colleague Will Douglas Heaven to find out more

Bits and Bytes

Why the Chinese government is sparing AI from harsh regulations—for now
The way China regulates its tech industry can seem highly unpredictable. The government can celebrate the achievements of Chinese tech companies one day and then turn against them the next. But there are patterns in China’s approach, and they indicate how it’ll regulate AI. (MIT Technology Review) 

AI could make better beer. Here’s how.
New AI models can accurately identify not only how tasty consumers will deem beers, but also what kinds of compounds brewers should be adding to make them taste better, according to research. (MIT Technology Review) 

OpenAI’s legal troubles are mounting
OpenAI is lawyering up as it faces a deluge of lawsuits both at home and abroad. The company has hired about two dozen in-house lawyers since last spring to work on copyright claims, and is also hiring an antitrust lawyer. The company’s new strategy is to try to position itself as America’s bulwark against China. (The Washington Post)

Did Google’s AI actually discover millions of new materials?
Late last year, Google DeepMind claimed it had discovered millions of new materials using deep learning. But researchers who analyzed a subset of DeepMind’s work found that the company’s claims may have been overhyped, and that the company hadn’t found materials that were useful or credible. (404 Media)

OpenAI and Meta are building new AI models capable of “reasoning”
The next generation of powerful AI models from OpenAI and Meta will be able to do more complex tasks, such as reason, plan, and retain more information. This, tech companies believe, will allow them to be more reliable and not make the kind of silly mistakes that this generation of language models is so prone to. (The Financial Times)

Is robotics about to have its own ChatGPT moment?

11 April 2024 at 05:00

Silent. Rigid. Clumsy.

Henry and Jane Evans are used to awkward houseguests. For more than a decade, the couple, who live in Los Altos Hills, California, have hosted a slew of robots in their home. 

In 2002, at age 40, Henry had a massive stroke, which left him with quadriplegia and an inability to speak. Since then, he’s learned how to communicate by moving his eyes over a letter board, but he is highly reliant on caregivers and his wife, Jane. 

Henry got a glimmer of a different kind of life when he saw Charlie Kemp on CNN in 2010. Kemp, a robotics professor at Georgia Tech, was on TV talking about PR2, a robot developed by the company Willow Garage. PR2 was a massive two-armed machine on wheels that looked like a crude metal butler. Kemp was demonstrating how the robot worked, and talking about his research on how health-care robots could help people. He showed how the PR2 robot could hand some medicine to the television host.    

“All of a sudden, Henry turns to me and says, ‘Why can’t that robot be an extension of my body?’ And I said, ‘Why not?’” Jane says. 

There was a solid reason why not. While engineers have made great progress in getting robots to work in tightly controlled environments like labs and factories, the home has proved difficult to design for. Out in the real, messy world, furniture and floor plans differ wildly; children and pets can jump in a robot’s way; and clothes that need folding come in different shapes, colors, and sizes. Managing such unpredictable settings and varied conditions has been beyond the capabilities of even the most advanced robot prototypes. 

That seems to finally be changing, in large part thanks to artificial intelligence. For decades, roboticists have more or less focused on controlling robots’ “bodies”—their arms, legs, levers, wheels, and the like—via purpose-driven software. But a new generation of scientists and inventors believes that the previously missing ingredient of AI can give robots the ability to learn new skills and adapt to new environments faster than ever before. This new approach, just maybe, can finally bring robots out of the factory and into our homes. 

Progress won’t happen overnight, though, as the Evanses know far too well from their many years of using various robot prototypes. 

PR2 was the first robot they brought in, and it opened entirely new skills for Henry. It would hold a beard shaver and Henry would move his face against it, allowing him to shave and scratch an itch by himself for the first time in a decade. But at 450 pounds (200 kilograms) or so and $400,000, the robot was difficult to have around. “It could easily take out a wall in your house,” Jane says. “I wasn’t a big fan.”

More recently, the Evanses have been testing out a smaller robot called Stretch, which Kemp developed through his startup Hello Robot. The first iteration launched during the pandemic with a much more reasonable price tag of around $18,000. 

Stretch weighs about 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends. It can be controlled with a console controller. Henry controls Stretch using a laptop, with a tool that tracks his head movements to move a cursor around. He is able to move his thumb and index finger enough to click a computer mouse. Last summer, Stretch was with the couple for more than a month, and Henry says it gave him a whole new level of autonomy. “It was practical, and I could see using it every day,” he says. 

Henry Evans used the Stretch robot to brush his hair, eat, and even play with his granddaughter.
PETER ADAMS

Using his laptop, he could get the robot to brush his hair and have it hold fruit kebabs for him to snack on. It also opened up Henry’s relationship with his granddaughter Teddie. Before, they barely interacted. “She didn’t hug him at all goodbye. Nothing like that,” Jane says. But “Papa Wheelie” and Teddie used Stretch to play, engaging in relay races, bowling, and magnetic fishing. 

Stretch doesn’t have much in the way of smarts: it comes with some preinstalled software, such as the web interface that Henry uses to control it, and other capabilities such as AI-enabled navigation. The main benefit of Stretch is that people can plug in their own AI models and use them to do experiments. But it offers a glimpse of what a world with useful home robots could look like. Robots that can do many of the things humans do in the home—tasks such as folding laundry, cooking meals, and cleaning—have been a dream of robotics research since the inception of the field in the 1950s. For a long time, it’s been just that: “Robotics is full of dreamers,” says Kemp.

But the field is at an inflection point, says Ken Goldberg, a robotics professor at the University of California, Berkeley. Previous efforts to build a useful home robot, he says, have emphatically failed to meet the expectations set by popular culture—think the robotic maid from The Jetsons. Now things are very different. Thanks to cheap hardware like Stretch, along with efforts to collect and share data and advances in generative AI, robots are getting more competent and helpful faster than ever before. “We’re at a point where we’re very close to getting capability that is really going to be useful,” Goldberg says. 

Folding laundry, cooking shrimp, wiping surfaces, unloading shopping baskets—today’s AI-powered robots are learning to do tasks that for their predecessors would have been extremely difficult. 

Missing pieces

There’s a well-known observation among roboticists: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Called Moravec’s paradox, it was first articulated in the 1980s by Hans Moravec, then a roboticist at the Robotics Institute of Carnegie Mellon University. A robot can play chess or hold an object still for hours on end with no problem. Tying a shoelace, catching a ball, or having a conversation is another matter. 

There are three reasons for this, says Goldberg. First, robots lack precise control and coordination. Second, their understanding of the surrounding world is limited because they are reliant on cameras and sensors to perceive it. Third, they lack an innate sense of practical physics. 

“Pick up a hammer, and it will probably fall out of your gripper, unless you grab it near the heavy part. But you don’t know that if you just look at it, unless you know how hammers work,” Goldberg says. 

On top of these basic considerations, there are many other technical things that need to be just right, from motors to cameras to Wi-Fi connections, and hardware can be prohibitively expensive. 

Mechanically, we’ve been able to do fairly complex things for a while. In a video from 1957, two large robotic arms are dexterous enough to pinch a cigarette, place it in the mouth of a woman at a typewriter, and reapply her lipstick. But the intelligence and the spatial awareness of that robot came from the person who was operating it. 

In a video from 1957, a man operates two large robotic arms and uses the machine to apply a woman’s lipstick. Robots have come a long way since.
“LIGHTER SIDE OF THE NEWS –ATOMIC ROBOT A HANDY GUY” (1957) VIA YOUTUBE

“The missing piece is: How do we get software to do [these things] automatically?” says Deepak Pathak, an assistant professor of computer science at Carnegie Mellon.  

Researchers training robots have traditionally approached this problem by planning everything the robot does in excruciating detail. Robotics giant Boston Dynamics used this approach when it developed its boogying and parkouring humanoid robot Atlas. Cameras and computer vision are used to identify objects and scenes. Researchers then use that data to make models that can be used to predict with extreme precision what will happen if a robot moves a certain way. Using these models, roboticists plan the motions of their machines by writing a very specific list of actions for them to take. The engineers then test these motions in the laboratory many times and tweak them to perfection. 

This approach has its limits. Robots trained like this are strictly choreographed to work in one specific setting. Take them out of the laboratory and into an unfamiliar location, and they are likely to topple over. 

Compared with other fields, such as computer vision, robotics has been in the dark ages, Pathak says. But that might not be the case for much longer, because the field is seeing a big shake-up. Thanks to the AI boom, he says, the focus is now shifting from feats of physical dexterity to building “general-purpose robot brains” in the form of neural networks. Much as the human brain is adaptable and can control different aspects of the human body, these networks can be adapted to work in different robots and different scenarios. Early signs of this work show promising results. 

Robots, meet AI 

For a long time, robotics research was an unforgiving field, plagued by slow progress. At the Robotics Institute at Carnegie Mellon, where Pathak works, he says, “there used to be a saying that if you touch a robot, you add one year to your PhD.” Now, he says, students get exposure to many robots and see results in a matter of weeks.

What separates this new crop of robots is their software. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. At the same time, new, cheaper hardware, such as off-the-shelf components and robots like Stretch, is making this sort of experimentation more accessible. 

Broadly speaking, there are two popular ways researchers are using AI to train robots. Pathak has been using reinforcement learning, an AI technique that allows systems to improve through trial and error, to get robots to adapt their movements in new environments. This is a technique that Boston Dynamics has also started using in its robot “dogs” called Spot.

Deepak Pathak’s team at Carnegie Mellon has used an AI technique called reinforcement learning to create a robotic dog that can do extreme parkour with minimal pre-programming.

In 2022, Pathak’s team used this method to create four-legged robot “dogs” capable of scrambling up steps and navigating tricky terrain. The robots were first trained to move around in a general way in a simulator. Then they were set loose in the real world, with a single built-in camera and computer vision software to guide them. Other similar robots rely on tightly prescribed internal maps of the world and cannot navigate beyond them.

Pathak says the team’s approach was inspired by human navigation. Humans receive information about the surrounding world from their eyes, and this helps them instinctively place one foot in front of the other to get around in an appropriate way. Humans don’t typically look down at the ground under their feet when they walk, but a few steps ahead, at a spot where they want to go. Pathak’s team trained its robots to take a similar approach to walking: each one used the camera to look ahead. The robot was then able to memorize what was in front of it for long enough to guide its leg placement. The robots learned about the world in real time, without internal maps, and adjusted their behavior accordingly. At the time, experts told MIT Technology Review the technique was a “breakthrough in robot learning and autonomy” and could allow researchers to build legged robots capable of being deployed in the wild.   

Pathak’s robot dogs have since leveled up. The team’s latest algorithm allows a quadruped robot to do extreme parkour. The robot was again trained to move around in a general way in a simulation. But using reinforcement learning, it was then able to teach itself new skills on the go, such as how to jump long distances, walk on its front legs, and clamber up tall boxes twice its height. These behaviors were not something the researchers programmed. Instead, the robot learned through trial and error and visual input from its front camera. “I didn’t believe it was possible three years ago,” Pathak says. 
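Pathak’s actual systems are far more sophisticated, but the trial-and-error loop at the heart of reinforcement learning fits in a few lines. In this sketch, the one-dimensional “terrain,” the rewards, and the hyperparameters are all invented for illustration: the agent learns purely from reward feedback to walk right toward a goal cell.

```python
import random

# A toy 1-D "terrain": the agent starts at cell 0 and must reach cell 4.
# Rewards and transitions here are invented for illustration only.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left or right

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == GOAL else -0.01  # small cost per move
    return next_state, reward

random.seed(0)
for episode in range(200):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# After training, the greedy policy should walk right toward the goal.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # expected: [1, 1, 1, 1]
```

No behavior was programmed here beyond the reward signal; the rightward policy emerges from trial and error, which is the same principle, scaled up enormously, behind the parkour robots.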

In the other popular technique, called imitation learning, models learn to perform tasks by, for example, imitating the actions of a human teleoperating a robot or using a VR headset to collect data on a robot. It’s a technique that has gone in and out of fashion over decades but has recently become more popular with robots that do manipulation tasks, says Russ Tedrake, vice president of robotics research at the Toyota Research Institute and an MIT professor.

By pairing this technique with generative AI, researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements. 

The idea is to start with a human, who manually controls the robot to demonstrate behaviors such as whisking eggs or picking up plates. Using a technique called diffusion policy, the robot is then able to use the data fed into it to learn skills. The researchers have taught robots more than 200 skills, such as peeling vegetables and pouring liquids, and say they are working toward teaching 1,000 skills by the end of the year. 
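Diffusion policies are considerably more elaborate, but the core idea of imitation learning is ordinary supervised learning: fit a policy to reproduce the demonstrated actions. Below is a minimal behavior-cloning sketch in which both the “expert” rule and the data are invented.

```python
# Demonstrations: (observation, action) pairs from a hypothetical teleoperated
# session. Here the "expert" simply moves the gripper halfway toward a target,
# so the true mapping is action = 0.5 * observation. All numbers are invented.
demos = [(obs / 10.0, 0.5 * obs / 10.0) for obs in range(1, 21)]

# Behavior cloning = supervised learning on demonstrations: fit a policy to
# match the expert's actions. A one-parameter linear policy trained with
# gradient descent suffices for this toy mapping.
w, lr = 0.0, 0.1
for _ in range(500):
    for obs, expert_action in demos:
        pred = w * obs
        # Squared-error gradient step toward the demonstrated action.
        w -= lr * 2 * (pred - expert_action) * obs

print(round(w, 3))  # the learned policy recovers the expert's rule: 0.5
```

Real systems replace the linear policy with a deep network mapping camera images to joint motions, but the recipe is the same: collect demonstrations, then regress actions from observations.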

Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks. 

The Toyota Research Institute team hopes this will one day lead to “large behavior models,” which are analogous to large language models, says Tedrake. “A lot of people think behavior cloning is going to get us to a ChatGPT moment for robotics,” he says. 

In a similar demonstration, earlier this year a team at Stanford managed to use a relatively cheap off-the-shelf robot costing $32,000 to do complex manipulation tasks such as cooking shrimp and cleaning stains. It learned those new skills quickly with AI. 

Called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”), the robot learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks, such as tearing off a paper towel or piece of tape. The Stanford researchers found that AI can help robots acquire transferable skills: training on one task can improve a robot’s performance on others.

While the current generation of generative AI works with images and language, researchers at the Toyota Research Institute, Columbia University, and MIT believe the approach can extend to the domain of robot motion.

This is all laying the groundwork for robots that can be useful in homes. Human needs change over time, and teaching robots to reliably do a wide range of tasks is important, as it will help them adapt to us. That is also crucial to commercialization—first-generation home robots will come with a hefty price tag, and the robots need to have enough useful skills for regular consumers to want to invest in them. 

For a long time, a lot of the robotics community was very skeptical of these kinds of approaches, says Chelsea Finn, an assistant professor of computer science and electrical engineering at Stanford University and an advisor for the Mobile ALOHA project. Finn says that nearly a decade ago, learning-based approaches were rare at robotics conferences and disparaged in the robotics community. “The [natural-language-processing] boom has been convincing more of the community that this approach is really, really powerful,” she says. 

There is one catch, however. In order to imitate new behaviors, the AI models need plenty of data. 

More is more

Unlike chatbots, which can be trained by using billions of data points hoovered from the internet, robots need data specifically created for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded, says Lerrel Pinto, an assistant professor of computer science at New York University. Right now that data is very scarce, and it takes a long time for humans to collect.

top frame shows a person recording themself opening a kitchen drawer with a grabber, and the bottom shows a robot attempting the same action
“ON BRINGING ROBOTS HOME,” NUR MUHAMMAD (MAHI) SHAFIULLAH, ET AL.

Some researchers are trying to use existing videos of humans doing things to train robots, hoping the machines will be able to copy the actions without the need for physical demonstrations. 

Pinto’s lab has also developed a neat, cheap data collection approach that connects robotic movements to desired actions. Researchers took a reacher-grabber stick, similar to ones used to pick up trash, and attached an iPhone to it. Human volunteers can use this system to film themselves doing household chores, mimicking the robot’s view of the end of its robotic arm. Using this stand-in for Stretch’s robotic arm and an open-source system called DOBB-E, Pinto’s team was able to get a Stretch robot to learn tasks such as pouring from a cup and opening shower curtains with just 20 minutes of iPhone data.  

But for more complex tasks, robots would need even more data and more demonstrations.  

The requisite scale would be hard to reach with DOBB-E, says Pinto, because you’d basically need to persuade every human on Earth to buy the reacher-grabber system, collect data, and upload it to the internet. 

A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.  

Sergey Levine, a computer scientist at UC Berkeley who participated in the project, says the goal was to create a “robot internet” by collecting data from labs around the world. This would give researchers access to bigger, more scalable, and more diverse data sets. The deep-learning revolution that led to the generative AI of today started in 2012 with the rise of ImageNet, a vast online data set of images. The Open X-Embodiment Collaboration is an attempt by the robotics community to do something similar for robot data. 

Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. 

When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than in the systems each individual lab was developing.

“I don’t think anybody saw that coming,” says Vincent Vanhoucke, Google DeepMind’s head of robotics. “Suddenly there is a path to basically leveraging all these other sources of data to bring about very intelligent behaviors in robotics.”

Many roboticists think that large vision-language models, which are able to analyze image and language data, might offer robots important hints as to how the surrounding world works, Vanhoucke says. They offer semantic clues about the world and could help robots with reasoning, deducing things, and learning by interpreting images. To test this, researchers took a robot that had been trained on the larger model and asked it to point to a picture of Taylor Swift. The researchers had not shown the robot pictures of Swift, but it was still able to identify the pop star because it had a web-scale understanding of who she was even without photos of her in its data set, says Vanhoucke.

RT-2, a recent model for robotic control, was trained on online text and images as well as interactions with the real world.
KELSEY MCCLELLAN

Vanhoucke says Google DeepMind is increasingly using techniques similar to those it would use for machine translation to translate from English to robotics. Last summer, Google introduced a vision-language-action model called RT-2. This model gets its general understanding of the world from online text and images it has been trained on, as well as its own interactions in the real world. It translates that data into robotic actions. Each robot has a slightly different way of translating English into action, he adds.  

“We increasingly feel like a robot is essentially a chatbot that speaks robotese,” Vanhoucke says. 

Baby steps

Despite the fast pace of development, robots still face many challenges before they can be released into the real world. They are still way too clumsy for regular consumers to justify spending tens of thousands of dollars on them. Robots also still lack the sort of common sense that would allow them to multitask. And they need to move from just picking things up and placing them somewhere to putting things together, says Goldberg—for example, putting a deck of cards or a board game back in its box and then into the games cupboard. 

But to judge from the early results of integrating AI into robots, roboticists are not wasting their time, says Pinto. 

“I feel fairly confident that we will see some semblance of a general-purpose home robot. Now, will it be accessible to the general public? I don’t think so,” he says. “But in terms of raw intelligence, we are already seeing signs right now.” 

Building the next generation of robots might not just assist humans in their everyday chores or help people like Henry Evans live a more independent life. For researchers like Pinto, there is an even bigger goal in sight.

Home robotics offers one of the best benchmarks for human-level machine intelligence, he says. The fact that a human can operate intelligently in the home environment, he adds, means we know this is a level of intelligence that can be reached. 

“It’s something which we can potentially solve. We just don’t know how to solve it,” he says. 

Thanks to Stretch, Henry Evans was able to hold his own playing cards for the first time in two decades.
VY NGUYEN

For Henry and Jane Evans, a big win would be to get a robot that simply works reliably. The Stretch robot that the Evanses experimented with is still too buggy to use without researchers present to troubleshoot, and their home doesn’t always have the dependable Wi-Fi connectivity Henry needs in order to communicate with Stretch using a laptop.

Even so, Henry says, one of the greatest benefits of his experiment with robots has been independence: “All I do is lay in bed, and now I can do things for myself that involve manipulating my physical environment.”

Thanks to Stretch, for the first time in two decades, Henry was able to hold his own playing cards during a match. 

“I kicked everyone’s butt several times,” he says. 

“Okay, let’s not talk too big here,” Jane says, and laughs.

A conversation with Dragoș Tudorache, the politician behind the AI Act

8 April 2024 at 05:43

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Dragoș Tudorache is feeling pretty damn good. We’re sitting in a conference room in a chateau overlooking a lake outside Brussels, sipping glasses of cava. The Romanian liberal member of the European Parliament has spent the day hosting a conference on AI, defense, and geopolitics attended by nearly 400 VIP guests. The day is almost over, and Tudorache has promised to squeeze in an interview with me during cocktail hour. 

A former interior minister, Tudorache is one of the most important players in European AI policy. He is one of the two lead negotiators of the AI Act in the European Parliament. The bill, the first sweeping AI law of its kind in the world, will enter into force this year. We first met two years ago, when Tudorache was appointed to his position as negotiator. 

But Tudorache’s interest in AI started much earlier, in 2015. He says reading Nick Bostrom’s book Superintelligence, which explores how an AI superintelligence could be created and what the implications could be, made him realize the potential and dangers of AI and the need to regulate it. (Bostrom was recently embroiled in a scandal over racist views he expressed in emails from the ’90s that were unearthed. Tudorache says he has not followed Bostrom’s career since the book’s publication, and he did not comment on the controversy.) 

When he was elected to the European Parliament in 2019, he says, he arrived determined to work on AI regulation if the opportunity presented itself. 

“When I heard [Ursula] von der Leyen [the European Commission president] say in her first speech in front of Parliament that there will be AI regulation, I said ‘Whoo-ha, this is my moment,’” he recalls. 

Since then, Tudorache has chaired a special committee on AI, and shepherded the AI Act through the European Parliament and into its final form following negotiations with other EU institutions. 

It’s been a wild ride, with intense negotiations, the rise of ChatGPT, lobbying from tech companies, and flip-flopping by some of Europe’s largest economies. But now, as the AI Act has passed into law, Tudorache’s job on it is done and dusted, and he says he has no regrets. Although the act has been criticized—both by civil society for not protecting human rights enough and by industry for being too restrictive—Tudorache says its final form was the sort of compromise he expected. Politics is the art of compromise, after all. 

“There’s going to be a lot of building the plane while flying, and there’s going to be a lot of learning while doing,” he says. “But if the true spirit of what we meant with the legislation is well understood by all concerned, I do think that the outcome can be a positive one.”  

It is still early days—the law comes fully into force two years from now. But Tudorache believes it will change the tech industry for the better and start a process where companies will start to take responsible AI seriously thanks to the legally binding obligations for AI companies to be more transparent about how their models are built. (I wrote about the five things you need to know about the AI Act a couple of months ago here.)

“The fact that we now have a blueprint for how you put the right boundaries, while also leaving room for innovation, is something that will serve society,” says Tudorache. It will also serve businesses, he says, because it offers a predictable path forward on what you can and cannot do with AI. 

But the AI Act is just the beginning, and there is still plenty keeping Tudorache up at night. AI is ushering in big changes across every industry and society. It will change everything from health care to education, labor, defense, and even human creativity. Most countries have not grasped what AI will mean for them, he says, and the responsibility now lies with governments to ensure that citizens and society more broadly are ready for the AI age. 

“The crunch time … starts now,” he says. 

Join Dragoș Tudorache and me at Emtech Digital London on April 16-17! Tudorache will walk you through what companies need to take into account with the AI Act right now. See you next week!


Now read the rest of The Algorithm

Deeper Learning

A conversation with OpenAI’s first artist in residence

Alex Reben’s work is often absurd, sometimes surreal: a mash-up of giant ears imagined by DALL-E and sculpted by hand out of marble; critical burns generated by ChatGPT that thumb the nose at AI art. But its message is relevant to everyone. Reben is interested in the roles humans play in a world filled with machines, and how those roles are changing. He is also OpenAI’s first artist in residence. 
Meet the artist: Officially, the appointment started in January and lasts three months. But he’s been working with OpenAI for years already. Our senior editor for AI, Will Douglas Heaven, sat down with Reben to talk about the role AI can play in art, and the backlash against it from artists. Read more here.

Bits and Bytes

It’s easy to tamper with watermarks from AI-generated text

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t. It’s an especially significant finding because many regulations around the world, including the AI Act, are betting heavily on the development of watermarks to trace AI-generated content. (MIT Technology Review)

How three filmmakers created Sora’s latest stunning videos

In the last month, a handful of filmmakers have taken OpenAI’s new generative AI model Sora for a test drive. The results are amazing. The short films are a big jump up even from the cherry-picked demo videos that OpenAI used to tease Sora just six weeks ago. Here’s how three of the filmmakers did it. (MIT Technology Review)

What’s next for generative video

Generative video will probably upend a wide range of businesses and change the roles of many professionals, from animators to advertisers. Fears of misuse are also growing. The widespread ability to generate fake video will make it easier than ever to flood the internet with propaganda and nonconsensual porn. We can see it coming. The problem is, nobody has a good fix. (MIT Technology Review)

Google is considering charging for AI-powered search

In a major potential shake-up to Google’s business model, the tech giant is considering putting AI-powered search features behind a paywall. But considering how untrustworthy AI search results are, it’s unclear if people will want to pay for them. (Financial Times) 

The fight for AI talent heats up 

As layoffs sweep through the tech sector, AI jobs are still super hot. Tech giants are fighting each other for top talent, even offering seven-figure salaries, and poaching entire engineering teams with experience in generative AI. (Wall Street Journal)

Inside Big Tech’s underground race to buy AI training data

AI models need to be trained on massive data sets, and big tech companies are quietly paying for data, chat logs, and personal photos hidden behind paywalls and login screens. (Reuters)

How tech giants cut corners to harvest data for AI

AI companies are running out of quality training data for their huge AI models. In order to harvest more data, tech companies such as OpenAI, Google, and Meta have cut corners, ignored corporate policies, and debated bending the law, the New York Times found. (New York Times)

It’s easy to tamper with watermarks from AI-generated text

29 March 2024 at 10:51

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t. 

Watermarking works by inserting hidden patterns in AI-generated text, which allow computers to detect that the text comes from an AI system. Watermarks are a fairly new invention, but they have already become a popular solution for fighting AI-generated misinformation and plagiarism. For example, the European Union’s AI Act, which enters into force in May, will require developers to watermark AI-generated content. But the new research shows that the cutting edge of watermarking technology doesn’t live up to regulators’ requirements, says Robin Staab, a PhD student at ETH Zürich, who was part of the team that developed the attacks. The research has yet to be peer reviewed but will be presented at the International Conference on Learning Representations in May.

AI language models work by predicting the next likely word in a sentence, generating one word at a time on the basis of those predictions. Watermarking algorithms for text divide the language model’s vocabulary into words on a “green list” and a “red list,” and then make the AI model choose words from the green list. The more words in a sentence that are from the green list, the more likely it is that the text was generated by a computer. Humans tend to write sentences that include a more random mix of words. 
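The green-list scheme described above can be made concrete with a toy detector. This is a simplified sketch, not one of the watermarks the researchers studied: real schemes partition the model’s vocabulary with a keyed hash of the preceding tokens, and the hash rule and detection threshold below are illustrative assumptions.

```python
import hashlib

def is_green(prev_word: str, word: str) -> bool:
    # Toy rule: hash the (previous word, current word) pair and treat
    # half of all possible words as "green" in each context. Real
    # schemes use a secret key so only the detector can do this.
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    # Fraction of words drawn from the green list. Watermarked
    # (machine-chosen) text should score well above 0.5.
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(p, w) for p, w in zip(words, words[1:]))
    return hits / (len(words) - 1)

def looks_watermarked(text: str, threshold: float = 0.7) -> bool:
    # A detector flags text whose green fraction exceeds a threshold.
    return green_fraction(text) > threshold
```

Because humans pick words without regard to the (secret) green list, human text hovers around a green fraction of 0.5, while watermarked text scores much higher. That is also why an attacker who approximates the list can push text above or below the threshold at will.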

The researchers tampered with five different watermarks that work in this way. They were able to reverse-engineer the watermarks by using an API to access the AI model with the watermark applied and prompting it many times, says Staab. The responses allow the attacker to “steal” the watermark by building an approximate model of the watermarking rules. They do this by analyzing the AI outputs and comparing them with normal text. 
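In spirit, this “stealing” step is a frequency analysis: word patterns that show up far more often in watermarked outputs than in ordinary text are likely on the green list. Here is a minimal illustration of that idea; the bigram counting and ratio threshold are hypothetical simplifications, not the paper’s actual method.

```python
from collections import Counter

def estimate_green_pairs(watermarked_texts, reference_texts, ratio=2.0):
    # Count word bigrams in watermarked outputs and in ordinary text.
    def bigram_counts(texts):
        counts = Counter()
        for text in texts:
            words = text.lower().split()
            counts.update(zip(words, words[1:]))
        return counts

    wm = bigram_counts(watermarked_texts)
    ref = bigram_counts(reference_texts)

    # Bigrams over-represented in watermarked text are guessed to be
    # "green" (add-one smoothing keeps unseen reference counts finite).
    return {pair for pair, n in wm.items() if n / (ref[pair] + 1) >= ratio}
```

An attacker with such an approximate green set can then spoof (compose text mostly from guessed green pairs) or scrub (paraphrase green pairs away), which is exactly what the two attacks below do.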

Once they have an approximate idea of which words are on the green list, the researchers can execute two kinds of attacks. The first, called a spoofing attack, lets malicious actors use the information learned from stealing the watermark to produce text that can be passed off as watermarked. The second lets hackers scrub the watermark from AI-generated text, so it can be passed off as human-written. 

The team had a roughly 80% success rate in spoofing watermarks, and an 85% success rate in stripping AI-generated text of its watermark. 

Researchers not affiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have also found watermarks to be unreliable and vulnerable to spoofing attacks. 

The findings from ETH Zürich confirm that these issues with watermarks persist and extend to the most advanced types of chatbots and large language models being used today, says Feizi. 

The research “underscores the importance of exercising caution when deploying such detection mechanisms on a large scale,” he says. 

Despite the findings, watermarks remain the most promising way to detect AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the research. 

But more research is needed to make watermarks ready for deployment on a large scale, he adds. Until then, we should manage our expectations of how reliable and useful these tools are. “If it’s better than nothing, it is still useful,” he says.  

Update: This research will be presented at the International Conference on Learning Representations. The story has been updated to reflect that.

Meet the MIT Technology Review AI team in London

26 March 2024 at 07:06

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The UK is home to AI powerhouse Google DeepMind, a slew of exciting AI startups, and some of the world’s best universities. It’s also where I live, along with quite a few of my MIT Technology Review colleagues, including our senior AI editor Will Douglas Heaven. 

That’s why I’m super stoked to tell you that we’re gathering some of the brightest minds in AI in Europe for our flagship AI conference, EmTech Digital, in London on April 16 and 17. 

Our speakers include top figures like Zoubin Ghahramani, vice president of research at Google DeepMind; Maja Pantic, AI scientific research lead at Meta; Dragoș Tudorache, a member of the European Parliament and one of the key politicians behind the newly passed EU AI Act; and Victor Riparbelli, CEO of AI avatar company Synthesia. 

We’ll also hear from executives at NVIDIA, Roblox, Faculty, and ElevenLabs, and researchers from the UK’s top universities and AI research institutes. 

They will share their wisdom on how to harness AI and what businesses need to know right now about this transformative technology. 

Here are some sessions I am particularly excited about.

Generating AI’s Path Forward
Where is AI going next? Zoubin Ghahramani, vice president of research at Google DeepMind, will map out realistic timelines for new innovation, and he will discuss the need for an overall strategy for a safe and productive AI future for Europe and beyond.

Digital Assistants for AI Automation
You’ve perhaps heard of AI assistants. But in this session, David Barber, director of the Centre for Artificial Intelligence at University College London, will argue that a major transformation will come with the rise of AI agents, which can complete complex sets of actions such as booking travel, answering messages, and performing data entry. 

AI’s Impact on Democracy
A senior official from the UK’s National Cyber Security Centre will walk us through some of the threats posed by AI that keep him up at night. Based on our speaker prep call, I can tell you that real life really is stranger than fiction. 

The AI Act’s Impacts on Policy and Regulations
The AI Act is here, and companies in the US and the UK will have to comply with it if they want to do business in the EU. I will be sitting down with Dragoș Tudorache, one of the key politicians behind the law, to walk you through what companies need to take into account right now. 

Venturing into AI Opportunity
The European startup scene has long played second fiddle to the US. But with the rise of open-source AI unicorn Mistral and others, hopes are rising that European startups could become more competitive in the global AI marketplace. Paul Murphy, a partner at venture capital firm Lightspeed, one of the first funds to invest in Mistral, will tell us all about his predictions. 

The Business of Solving Big Challenges with AI
Colin Murdoch, Google DeepMind’s chief business officer, will show us why AI is so much more than generative AI and how it can help solve society’s greatest challenges, from gene editing to sustainable energy and computing. 

And the best bit of all: the post-conference drinks! A conference in London would not be nearly as fun without some good old-fashioned networking in a pub afterward. So join us April 16–17 in London, and get the inside scoop on how AI is transforming the world. Get your tickets here.

Before you go… We have a freebie to give you a taster of the event. Join me and MIT Technology Review’s editors Niall Firth and David Rotman for a free half-hour LinkedIn Live session today, March 26. We’ll discuss how AI is changing the way we work. Bring your questions and tune in here at 4pm GMT / 12pm EDT / 9am PDT.


Now read the rest of The Algorithm

Deeper Learning

The tech industry can’t agree on what open-source AI means. That’s a problem.

Suddenly, “open source” is the latest buzzword in AI circles. Meta has pledged to create open-source artificial general intelligence. And Elon Musk is suing OpenAI over its lack of open-source AI models. Meanwhile, a growing number of tech leaders and companies are setting themselves up as open-source champions. But there’s a fundamental problem—no one can agree on what “open-source AI” means. 

Definitions wanted: Open-source AI promises a future where anyone can take part in the technology’s development. That could accelerate innovation, boost transparency, and give users greater control over systems that could soon reshape many aspects of our lives. But what even is it? What makes an AI model open source, and what disqualifies it? The answers could have significant ramifications for the future of the technology. Read more from Edd Gent.

Bits and Bytes

Apple researchers are exploring dropping “Hey Siri” and listening with AI instead
So maybe our phones will be listening to us all the time after all? New research aims to see if AI models can determine when you’re speaking to your phone without needing a trigger phrase. It also shows how Apple, considered a laggard in AI, is determined to catch up. (MIT Technology Review)

An AI-driven “factory of drugs” claims to have hit a big milestone
Insilico is part of a wave of companies betting on AI as the “next amazing revolution” in biology. The company claims to have created the first “true AI drug” that’s advanced to a test of whether it can cure a fatal lung condition in humans. (MIT Technology Review)

Chinese platforms are cracking down on influencers selling AI lessons
Over the last year, a few Chinese influencers have made millions of dollars peddling short video lessons on AI, profiting off people’s fears about the as-yet-unclear impact of the new technology on their livelihoods. Now the platforms they thrived on have started to turn against them. (MIT Technology Review)

Google DeepMind’s new AI assistant helps elite soccer coaches get even better
The system, called TacticAI, can predict the outcome of corner kicks and provide realistic and accurate tactical suggestions in matches. It works by analyzing a dataset of 7,176 corner kicks taken by players for Liverpool FC, one of the world’s biggest soccer clubs. (MIT Technology Review)

How AI taught Cassie the two-legged robot to run and jump
Researchers used an AI technique called reinforcement learning to help a two-legged robot nicknamed Cassie run 400 meters, over varying terrains, and execute standing long jumps and high jumps, without being trained explicitly on each movement. (MIT Technology Review)

France fined Google €250 million over copyright infringements 
The country’s competition watchdog says the tech company failed to broker fair agreements with media outlets for publishing links to their content and plundered press articles to train its AI technology without informing the publishers. This sets an interesting precedent for AI and copyright in Europe, and potentially beyond. (Bloomberg)

China is educating the next generation of top AI talent
New research suggests that China has eclipsed the United States as the biggest producer of AI talent. (New York Times)

DeepMind’s cofounder has ditched his startup to lead Microsoft’s AI initiative
Mustafa Suleyman has now left his conversational AI startup Inflection to lead Microsoft AI, a new organization focused on advancing Microsoft’s Copilot and other consumer AI products. (Microsoft)

How Adobe’s bet on non-exploitative AI is paying off

26 March 2024 at 04:00

Since the beginning of the generative AI boom, there has been a fight over how large AI models are trained. In one camp sit tech companies such as OpenAI that have claimed it is “impossible” to train AI without hoovering the internet of copyrighted data. And in the other camp are artists who argue that AI companies have taken their intellectual property without consent and compensation. 

Adobe is pretty unusual in that it sides with the latter group, with an approach that stands out as an example of how generative AI products can be built without scraping copyrighted data from the internet. Adobe released its image-generating model Firefly, which is integrated into its popular photo editing tool Photoshop, one year ago.

In an exclusive interview with MIT Technology Review, Adobe’s AI leaders are adamant this is the only way forward. At stake is not just the livelihood of creators, they say, but our whole information ecosystem. What they have learned shows that building responsible tech doesn’t have to come at the cost of doing business. 

“We worry that the industry, Silicon Valley in particular, does not pause to ask the ‘how’ or the ‘why.’ Just because you can build something doesn’t mean you should build it without consideration of the impact that you’re creating,” says David Wadhwani, president of Adobe’s digital media business. 

Those questions guided the creation of Firefly. When the generative image boom kicked off in 2022, there was a major backlash against AI from creative communities. Many people were using generative AI models as derivative content machines to create images in the style of another artist, sparking a legal fight over copyright and fair use. The latest generative AI technology has also made it much easier to create deepfakes and misinformation. 

It soon became clear that to offer creators proper credit and businesses legal certainty, the company could not build its models on data scraped from the web, Wadhwani says.  

Adobe wants to reap the benefits of generative AI while still “recognizing that these are built on the back of human labor. And we have to figure out how to fairly compensate people for that labor now and in the future,” says Ely Greenfield, Adobe’s chief technology officer for digital media.  

To scrape or not to scrape

The scraping of online data, commonplace in AI, has recently become highly controversial. AI companies such as OpenAI, Stability AI, Meta, and Google are facing numerous lawsuits over AI training data. Tech companies argue that publicly available data is fair game. Writers and artists disagree and are pushing for a license-based model, where creators would get compensated for having their work included in training datasets. 

Adobe trained Firefly on content that had an explicit license allowing AI training, which means the bulk of the training data comes from Adobe’s library of stock photos, says Greenfield. The company offers creators extra compensation when material is used to train AI models, he adds.  

This is in contrast to the status quo in AI today, where tech companies scrape the web indiscriminately and have a limited understanding of what the training data includes. Because of these practices, the AI datasets inevitably include copyrighted content and personal data, and research has uncovered toxic content, such as child sexual abuse material.

Scraping the internet gives tech companies a cheap way to get lots of AI training data, and traditionally, having more data has allowed developers to build more powerful models. Limiting Firefly to licensed data for training was a risky bet, says Greenfield. 

“To be honest, when we started with Firefly with our image model, we didn’t know whether or not we would be able to satisfy customer needs without scraping the web,” says Greenfield. 

“And we found we could, which was great.” 

Human content moderators also review the training data to weed out objectionable or harmful content, known intellectual property, and images of known people, and the company has licenses for everything its products train on. 

Adobe’s strategy has been to integrate generative AI tools into its existing products, says Greenfield. In Photoshop, for example, Firefly users can fill in areas of an image using text commands. This allows them much more control over the creative process, and it aids their creativity. 

Still, more work needs to be done. The company wants to make Firefly even faster: currently, it takes around 10 seconds for the company’s content moderation algorithms to check the outputs of the model, Greenfield says. Adobe is also trying to figure out how some business customers could generate copyrighted content, such as Marvel characters or Mickey Mouse. It has teamed up with companies such as IBM, Mattel, NVIDIA, and NASCAR, which allows them to use the tool with their intellectual property. It is also working on audio, lip-synching tools, and 3D generation.

Garbage in, garbage out

The decision to not scrape the internet also gives Adobe an edge in content moderation. Generative AI is notoriously difficult to control, and developers themselves don’t know why the models generate the images and texts they do. Generative AI models have put out questionable and toxic content in numerous cases. 

That all comes down to what it has been trained on, Greenfield says. He says Adobe’s model has never seen a picture of Joe Biden or Donald Trump, for example, and it cannot be coaxed into generating political misinformation. The AI model’s training data has no news content or famous people. It has not been trained on any copyrighted material, such as images of Mickey Mouse. 

“It just doesn’t understand what that concept is,” says Greenfield. 

Adobe also applies automated content moderation at the point of creation to check that Firefly’s creations are safe for professional use. The model is prohibited from creating news stories or violent images. Some names of artists are also blocked. Firefly-generated content comes with labels that indicate it has been created using AI, and the image’s edit history. 

During a critical election year, the need to know who made a piece of content, and how, is especially important. Adobe has been a vocal advocate for labels on AI content that tell where it originated, and with whom. 

The company started the Content Authenticity Initiative, an association promoting the use of labels that tell you whether content is AI-generated, along with the New York Times and Twitter (now X). The initiative now has over 2,500 members. Adobe is also part of developing C2PA, an industry-standard label that shows where a piece of content came from and how it was created. 

“We’re long overdue [for] a better education in media literacy and tools that support people’s ability to validate any content that claims to represent reality,” Greenfield says. 

Adobe’s approach highlights the need for AI companies to be thinking deeply about content moderation, says Claire Leibowicz, head of AI and media integrity at the nonprofit Partnership on AI. 

Adobe’s approach toward generative AI serves those societal goals by fighting misinformation as well as promoting business goals, such as preserving creator autonomy and attribution, adds Leibowicz. 

“The business mission of Adobe is not to prevent misinformation, per se,” she says. “It’s to empower creators. And isn’t this a really elegant confluence of mission and tactics, to be able to kill two birds with one stone?” 

Wadhwani agrees. The company says Firefly-powered features are among its most popular, and 90% of Firefly’s web app users are entirely new customers to Adobe. 

 “I think our approach has definitely been good for business,” Wadhwani says.

Correction: An earlier version of this article had David Wadhwani’s title wrong. This has been amended.

The AI Act is done. Here’s what will (and won’t) change

19 March 2024 at 07:17

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

It’s official. After three years, the AI Act, the EU’s new sweeping AI law, jumped through its final bureaucratic hoop last week when the European Parliament voted to approve it. (You can catch up on the five main things you need to know about the AI Act with this story I wrote last year.) 

This also feels like the end of an era for me personally: I was the first reporter to get the scoop on an early draft of the AI Act in 2021, and have followed the ensuing lobbying circus closely ever since. 

But the reality is that the hard work starts now. The law will enter into force in May, and people living in the EU will start seeing changes by the end of the year. Regulators will need to get set up in order to enforce the law properly, and companies will have up to three years to comply with the law.

Here’s what will (and won’t) change:

1. Some AI uses will get banned later this year

The Act places restrictions on AI use cases that pose a high risk to people’s fundamental rights, such as in healthcare, education, and policing. These will be outlawed by the end of the year. 

It also bans some uses that are deemed to pose an “unacceptable risk.” They include some pretty out-there and ambiguous use cases, such as AI systems that deploy “subliminal, manipulative, or deceptive techniques to distort behavior and impair informed decision-making,” or exploit vulnerable people. The AI Act also bans systems that infer sensitive characteristics such as someone’s political opinions or sexual orientation, and the use of real-time facial recognition software in public places. The creation of facial recognition databases by scraping the internet à la Clearview AI will also be outlawed. 

There are some pretty huge caveats, however. Law enforcement agencies are still allowed to use sensitive biometric data, as well as facial recognition software in public places to fight serious crime, such as terrorism or kidnappings. Some civil rights organizations, such as digital rights organization Access Now, have called the AI Act a “failure for human rights” because it did not ban controversial AI use cases such as facial recognition outright. And while companies and schools are not allowed to use software that claims to recognize people’s emotions, they can if it’s for medical or safety reasons.

2. It will be more obvious when you’re interacting with an AI system

Tech companies will be required to label deepfakes and AI-generated content and notify people when they are interacting with a chatbot or other AI system. The AI Act will also require companies to develop AI-generated media in a way that makes it possible to detect. This is promising news in the fight against misinformation, and will give research around watermarking and content provenance a big boost. 

However, this is all easier said than done, and research lags far behind what the regulation requires. Watermarks are still an experimental technology and easy to tamper with. It is still difficult to reliably detect AI-generated content. Some efforts show promise, such as the C2PA, an open-source internet protocol, but far more work is needed to make provenance techniques reliable, and to build an industry-wide standard. 

3. Citizens can complain if they have been harmed by an AI

The AI Act will set up a new European AI Office to coordinate compliance, implementation, and enforcement (and they are hiring). Thanks to the AI Act, citizens in the EU can submit complaints about AI systems when they suspect they have been harmed by one, and can receive explanations of why those systems made the decisions they did. It’s an important first step toward giving people more agency in an increasingly automated world. However, this will require citizens to have a decent level of AI literacy, and to be aware of how algorithmic harms happen. For most people, these are still very foreign and abstract concepts. 

4. AI companies will need to be more transparent

Most AI uses will not require compliance with the AI Act. It’s only AI companies developing technologies in “high risk” sectors, such as critical infrastructure or healthcare, that will have new obligations when the Act fully comes into force in three years. These include better data governance, ensuring human oversight, and assessing how these systems will affect people’s rights.

AI companies that are developing “general purpose AI models,” such as language models, will also need to create and keep technical documentation showing how they built the model and how they respect copyright law, and publish a publicly available summary of the data that went into training the model. 

This is a big change from the current status quo, where tech companies are secretive about the data that went into their models, and it will require an overhaul of the AI sector’s messy data management practices.

The companies with the most powerful AI models, such as GPT-4 and Gemini, will face more onerous requirements, such as having to perform model evaluations, assess and mitigate risks, ensure cybersecurity protections, and report any incidents where the AI system failed. Companies that fail to comply will face huge fines, or their products could be banned from the EU. 

It’s also worth noting that free open-source AI models that share every detail of how the model was built, including the model’s architecture, parameters, and weights, are exempt from many of the obligations of the AI Act.


Now read the rest of The Algorithm

Deeper Learning

Africa’s push to regulate AI starts now

The projected benefit of AI adoption on Africa’s economy is tantalizing. Estimates suggest that Nigeria, Ghana, Kenya, and South Africa alone could rake in up to $136 billion worth of economic benefits by 2030 if businesses there begin using more AI tools. Now the African Union—made up of 55 member nations—is trying to work out how to develop and regulate this emerging technology. 

It’s not going to be easy: If African countries don’t develop their own regulatory frameworks to protect citizens from the technology’s misuse, some experts worry that Africans will be hurt in the process. But if these countries don’t also find a way to harness AI’s benefits, others fear their economies could be left behind. (Read more from Abdullahi Tsanni.) 

Bits and Bytes

An AI that can play Goat Simulator is a step toward more useful machines
A new AI agent from Google DeepMind can play different games, including ones it has never seen before such as Goat Simulator 3, a fun action game with exaggerated physics. It’s a step toward more generalized AI that can transfer skills across multiple environments. (MIT Technology Review)

This self-driving startup is using generative AI to predict traffic
Waabi says its new model can anticipate how pedestrians, trucks, and bicyclists move using lidar data. If you prompt the model with a situation, like a driver recklessly merging onto a highway at high speed, it predicts how the surrounding vehicles will move, then generates a lidar representation of 5 to 10 seconds into the future. (MIT Technology Review)

LLMs become more covertly racist with human intervention
It’s long been clear that large language models like ChatGPT absorb racist views from the millions of pages of the internet they are trained on. Developers have responded by trying to make them less toxic. But new research suggests that those efforts, especially as models get larger, are only curbing racist views that are overt, while letting more covert stereotypes grow stronger and better hidden. (MIT Technology Review)

Let’s not make the same mistakes with AI that we made with social media
Social media’s unregulated evolution over the past decade holds a lot of lessons that apply directly to AI companies and technologies, argue Nathan E. Sanders and Bruce Schneier. (MIT Technology Review)

OpenAI’s CTO Mira Murati fumbled when asked about training data for Sora
In this interview with the Wall Street Journal, the journalist asks Murati whether OpenAI’s new video-generation AI system, Sora, was trained on videos from YouTube. Murati says she is not sure, which is an embarrassing answer from someone who should really know. OpenAI has been hit with copyright lawsuits about the data used to train its other AI models, and I would not be surprised if video was its next legal headache. (Wall Street Journal)

Among the AI doomsayers
I really enjoyed this piece. Writer Andrew Marantz spent time with people who fear that AI poses an existential risk to humanity, and tried to get under their skin. The details in this story are both hilarious and juicy—and raise questions about who we should be listening to when it comes to AI’s harms. (The New Yorker)

An AI that can play Goat Simulator is a step toward more useful machines

13 March 2024 at 10:00

Fly, goat, fly! A new AI agent from Google DeepMind can play different games, including ones it has never seen before such as Goat Simulator 3, a fun action game with exaggerated physics. Researchers were able to get it to follow text commands to play seven different games and move around in three different 3D research environments. It’s a step toward more generalized AI that can transfer skills across multiple environments.  

Google DeepMind has had huge success developing game-playing AI systems. Its system AlphaGo, which beat top professional player Lee Sedol at the game Go in 2016, was a major milestone that showed the power of deep learning. But unlike earlier game-playing AI systems, which mastered only one game or could only follow single goals or commands, this new agent is able to play a variety of different games, including Valheim and No Man’s Sky. It’s called SIMA, an acronym for “scalable, instructable, multiworld agent.”

In training AI systems, games are a good proxy for real-world tasks. “A general game-playing agent could, in principle, learn a lot more about how to navigate our world than anything in a single environment ever could,” says Michael Bernstein, an associate professor of computer science at Stanford University, who was not part of the research. 

“One could imagine one day rather than having superhuman agents which you play against, we could have agents like SIMA playing alongside you in games with you and with your friends,” says Tim Harley, a research engineer at Google DeepMind who was part of the team that developed the agent. 

The team trained SIMA on lots of examples of humans playing video games, both individually and collaboratively, alongside keyboard and mouse input and annotations of what the players did in the game, says Frederic Besse, a research engineer at Google DeepMind.  

Then they used an AI technique called imitation learning to teach the agent to play games as humans would. SIMA can follow 600 basic instructions, such as “Turn left,” “Climb the ladder,” and “Open the map,” each of which can be completed in about 10 seconds or less.
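Imitation learning in its simplest form, behavior cloning, treats this as supervised learning: predict the human’s recorded action from the current observation. The toy sketch below uses made-up discrete observations and actions; it is not Google DeepMind’s actual setup, which learns from raw pixels and keyboard-and-mouse streams.

```python
from collections import Counter, defaultdict

class BehaviorCloner:
    # Maps each observed game state to the action humans took most
    # often in that state -- the simplest form of imitation learning.
    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, demonstrations):
        # demonstrations: iterable of (observation, action) pairs
        # recorded from human play.
        for obs, action in demonstrations:
            self.counts[obs][action] += 1

    def act(self, obs, default="wait"):
        # Pick the most frequent human action for this observation;
        # fall back to a default in states never seen in training.
        if obs not in self.counts:
            return default
        return self.counts[obs].most_common(1)[0][0]
```

Trained on demonstrations like `("ladder_ahead", "climb")`, the agent would reproduce the action humans most often took in each state, while unseen states fall back to the default. Scaling this idea to raw video and continuous controls is where the real research effort lies.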

The team found that a SIMA agent that was trained on many games was better than an agent that learned how to play just one. This is because it was able to take advantage of concepts shared between games to learn better skills and get better at carrying out instructions, says Besse. 

“This is again a really exciting key property, as we have an agent that can play games it has never seen before, essentially,” he says. 

Seeing this sort of knowledge transfer between games is a significant milestone for AI research, says Paulo Rauber, a lecturer in artificial intelligence at Queen Mary University of London. 

The basic idea of learning to execute instructions on the basis of examples provided by humans could lead to more powerful systems in the future, especially with bigger data sets, Rauber says. SIMA’s relatively limited data set is what is holding back its performance, he says. 

Although the number of game environments it’s been trained on is still small, SIMA is on the right track for scaling up, says Jim Fan, a senior research scientist at Nvidia who runs its AI Agents Initiative. 

But the AI system is still not close to human level, says Harley. In the game No Man’s Sky, for example, the agent could complete only 60% of the tasks humans could. And when the researchers prevented humans from giving SIMA instructions, the agent performed much worse than before. 

Next, Besse says, the team is working on improving the agent’s performance. The researchers want to get it to work in as many environments as possible and learn new skills, and they want people to be able to chat with the agent and get a response. The team also wants SIMA to have more generalized skills, allowing it to quickly pick up games it has never seen before, much like a human. 

Humans “can generalize very well to unseen environments and unseen situations,” says Besse. “And we want our agents to be just the same.”  

SIMA inches us closer to a “ChatGPT moment” for autonomous agents, says Roy Fox, an assistant professor at the University of California, Irvine.  

But it is a long way away from actual autonomous AI. That would be “a whole different ball game,” he says. 

Why we need better defenses against VR cyberattacks

12 March 2024 at 06:14

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

I remember the first time I tried on a VR headset. It was the first Oculus Rift, and I nearly fainted after experiencing an intense but visually clumsy VR roller-coaster. But that was a decade ago, and the experience has gotten a lot smoother and more realistic since. That impressive level of immersiveness could be a problem, though: it makes us particularly vulnerable to cyberattacks in VR. 

I just published a story about a new kind of security vulnerability discovered by researchers at the University of Chicago. Inspired by the Christopher Nolan movie Inception, the attack allows hackers to create an app that injects malicious code into the Meta Quest VR system. Then it launches a clone of the home screen and apps that looks identical to the user’s original screen. Once inside, attackers are able to see, record, and modify everything the person does with the VR headset, tracking voice, motion, gestures, keystrokes, browsing activity, and even interactions with other people in real time. New fear = unlocked. 

The findings are pretty mind-bending, in part because the researchers’ unsuspecting test subjects had absolutely no idea they were under attack. You can read more about it in my story here.

It’s shocking to see how fragile and insecure these VR systems are, especially considering that Meta’s Quest headset is the most popular such product on the market, used by millions of people. 

But perhaps more unsettling is how attacks like this can happen without our noticing, and can warp our sense of reality. Past studies have shown how quickly people start treating things in AR or VR as real, says Franzi Roesner, an associate professor of computer science at the University of Washington, who studies security and privacy but was not part of the study. Even in very basic virtual environments, people start stepping around objects as if they were really there. 

VR has the potential to put misinformation, deception, and other problematic content on steroids because it exploits people’s brains, deceiving them physiologically and subconsciously, says Roesner: “The immersion is really powerful.”  

And because VR technology is relatively new, people aren’t vigilantly looking out for security flaws or traps while using it. To test how stealthy the inception attack was, the University of Chicago researchers recruited 27 volunteer VR experts to experience it. One of the participants was Jasmine Lu, a computer science PhD researcher at the University of Chicago. She says she has been using, studying, and working with VR systems regularly since 2017. Despite that, the attack took her and almost all the other participants by surprise. 

“As far as I could tell, there was not any difference except a bit of a slower loading time—things that I think most people would just translate as small glitches in the system,” says Lu.  

One of the fundamental issues people may have to deal with in using VR is whether they can trust what they’re seeing, says Roesner. 

Lu agrees. She says that with online browsers, we have been trained to recognize what looks legitimate and what doesn’t, but with VR, we simply haven’t. People do not know what an attack looks like. 

This is related to a growing problem we’re seeing with the rise of generative AI in text, audio, and video: it is notoriously difficult to distinguish real from AI-generated content. The inception attack shows that we need to think of VR as another dimension in a world where it’s getting increasingly difficult to know what’s real and what’s not. 

As more people use these systems, and more products enter the market, the onus is on the tech sector to develop ways to make them more secure and trustworthy. 

The good news? While VR technologies are commercially available, they’re not all that widely used, says Roesner. So there’s time to start beefing up defenses now. 


Now read the rest of The Algorithm

Deeper Learning

An OpenAI spinoff has built an AI model that helps robots learn tasks like humans

In the summer of 2021, OpenAI quietly shuttered its robotics team, announcing that progress was being stifled by a lack of data necessary to train robots in how to move and reason using artificial intelligence. Now three of OpenAI’s early research scientists say the startup they spun off in 2017, called Covariant, has solved that problem and unveiled a system that combines the reasoning skills of large language models with the physical dexterity of an advanced robot.

Multimodal prompting: The new model, called RFM-1, was trained on years of data collected from Covariant’s small fleet of item-picking robots that customers like Crate & Barrel and Bonprix use in warehouses around the world, as well as words and videos from the internet. Users can prompt the model using five different types of input: text, images, video, robot instructions, and measurements. The company hopes the system will become more capable and efficient as it’s deployed in the real world. Read more from James O’Donnell here. 

Bits and Bytes

You can now use generative AI to turn your stories into comics
By pulling together several different generative models into an easy-to-use package controlled with the push of a button, Lore Machine heralds the arrival of one-click AI. (MIT Technology Review) 

A former Google engineer has been charged with stealing AI trade secrets for Chinese companies
The race to develop ever more powerful AI systems is becoming dirty. The engineer allegedly downloaded confidential files about Google’s supercomputing data centers to his personal Google Cloud account while secretly working for Chinese companies. (US Department of Justice)  

There’s been even more drama in the OpenAI saga
This story truly is the gift that keeps on giving. OpenAI has clapped back at Elon Musk and his lawsuit, which claims the company has betrayed its original mission of doing good for the world, by publishing emails showing that Musk was keen to commercialize OpenAI too. Meanwhile, Sam Altman is back on the OpenAI board after his temporary ouster, and it turns out that chief technology officer Mira Murati played a bigger role in the coup against Altman than initially reported. 

A Microsoft whistleblower has warned that the company’s AI tool creates violent and sexual images, and ignores copyright
Shane Jones, an engineer who works at Microsoft, says his tests with the company’s Copilot Designer gave him concerning and disturbing results. He says the company acknowledged his concerns, but it did not take the product off the market. Jones then sent a letter explaining these concerns to the Federal Trade Commission, and Microsoft has since started blocking some terms that generated toxic content. (CNBC)

Silicon Valley is pricing academics out of AI research
AI research is eye-wateringly expensive, and Big Tech, with its huge salaries and computing resources, is draining academia of top talent. This has serious implications for the technology, causing it to be focused on commercial uses over science. (The Washington Post) 

VR headsets can be hacked with an Inception-style attack

11 March 2024 at 12:52

In the Christopher Nolan movie Inception, Leonardo DiCaprio’s character uses technology to enter his targets’ dreams to steal information and insert false details into their subconscious. 

A new “inception attack” in virtual reality works in a similar way. Researchers at the University of Chicago exploited a security vulnerability in Meta’s Quest VR system that allows hackers to hijack users’ headsets, steal sensitive information, and—with the help of generative AI—manipulate social interactions. 

The attack hasn’t been used in the wild yet, and the bar to executing it is high, because it requires a hacker to gain access to the VR headset user’s Wi-Fi network. However, it is highly sophisticated and leaves those targeted vulnerable to phishing, scams, and grooming, among other risks. 

In the attack, hackers create an app that injects malicious code into the Meta Quest VR system and then launch a clone of the VR system’s home screen and apps that looks identical to the user’s original screen. Once inside, attackers can see, record, and modify everything the person does with the headset. That includes tracking voice, gestures, keystrokes, browsing activity, and even the user’s social interactions. The attacker can even change the content of a user’s messages to other people. The research, which was shared exclusively with MIT Technology Review, has yet to be peer-reviewed.

A spokesperson for Meta said the company plans to review the findings: “We constantly work with academic researchers as part of our bug bounty program and other initiatives.” 

VR headsets have slowly become more popular in recent years, but security research has lagged behind product development, and current defenses against attacks in VR are lacking. What’s more, the immersive nature of virtual reality makes it harder for people to realize they’ve fallen into a trap. 

“The shock in this is how fragile the VR systems of today are,” says Heather Zheng, a professor of computer science at the University of Chicago, who led the team behind the research. 

Stealth attack

The inception attack exploits a loophole in Meta Quest headsets: users must enable “developer mode” to download third-party apps, adjust their headset resolution, or screenshot content, but this mode allows attackers to gain access to the VR headset if they’re using the same Wi-Fi network. 

Developer mode is supposed to give people remote access for debugging purposes. However, that access can be repurposed by a malicious actor to see what a user’s home screen looks like and which apps are installed. (Attackers can also strike if they are able to access a headset physically or if a user downloads apps that include malware.) With this information, the attacker can replicate the victim’s home screen and applications. 

Then the attacker stealthily injects an app with the inception attack in it. The attack is activated and the VR headset hijacked when unsuspecting users exit an application and return to the home screen. The attack also captures the user’s display and audio stream, which can be livestreamed back to the attacker. 

In this way, the researchers were able to see when a user entered login credentials to an online banking site. Then they were able to manipulate the user’s screen to show an incorrect bank balance. When the user tried to pay someone $1 through the headset, the researchers were able to change the amount transferred to $5 without the user realizing. This is because the attacker can control both what the user sees in the system and what the device sends out. 

This banking example is particularly compelling, says Jiasi Chen, an associate professor of computer science at the University of Michigan, who researches virtual reality but was not involved in the research. The attack could probably be combined with other malicious tactics, such as tricking people into clicking on suspicious links, she adds. 

The inception attack can also be used to manipulate social interactions in VR. The researchers cloned Meta Quest’s VRChat app, which allows users to talk to each other through their avatars. They were then able to intercept people’s messages and respond however they wanted. 

Generative AI could make this threat even worse because it allows anyone to instantaneously clone people’s voices and generate visual deepfakes, which malicious actors could then use to manipulate people in their VR interactions, says Zheng. 

Twisting reality

To test how easily people can be fooled by the inception attack, Zheng’s team recruited 27 volunteer VR experts. The participants were asked to explore applications such as a game called Beat Saber, where players control light sabers and try to slash beats of music that fly toward them. They were told the study aimed to investigate their experience with VR apps. Without their knowledge, the researchers launched the inception attack on the volunteers’ headsets. 

The vast majority of participants did not suspect anything. Out of 27 people, only 10 noticed a small “glitch” when the attack began, but most of them brushed it off as normal lag. Only one person flagged some kind of suspicious activity. 

There is no way to authenticate what you are seeing once you go into virtual reality, and the immersiveness of the technology makes people trust it more, says Zheng. This has the potential to make such attacks especially powerful, says Franzi Roesner, an associate professor of computer science at the University of Washington, who studies security and privacy but was not part of the study.

The best defense, the team found, is restoring the headset’s factory settings to remove the app. 

The inception attack gives hackers many different ways to get into the VR system and take advantage of people, says Ben Zhao, a professor of computer science at the University of Chicago, who was part of the team doing the research. But because VR adoption is still limited, there’s time to develop more robust defenses before these headsets become more widespread, he says. 
