Meta says AI-generated election content is not happening at a “systemic level”

22 May 2024 at 16:35

Meta has seen strikingly little AI-generated misinformation around the 2024 elections despite major votes in countries such as Indonesia, Taiwan, and Bangladesh, said the company’s president of global affairs, Nick Clegg, on Wednesday. 

“The interesting thing so far—I stress, so far—is not how much but how little AI-generated content [there is],” said Clegg during an interview at MIT Technology Review’s EmTech Digital conference in Cambridge, Massachusetts.  

“It is there; it is discernible. It’s really not happening on … a volume or a systemic level,” he said. Clegg said Meta has seen attempts at interference in, for example, the Taiwanese election, but that the scale of that interference is at a “manageable amount.” 

As voters head to the polls this year in more than 50 countries, experts have raised the alarm over AI-generated political disinformation and the prospect that malicious actors will use generative AI and social media to interfere with elections. Meta has previously faced criticism over its content moderation policies around past elections, such as when it failed to prevent the January 6 rioters from organizing on its platforms.

Clegg defended the company’s efforts at preventing violent groups from organizing, but he also stressed the difficulty of keeping up. “This is a highly adversarial space. You play Whack-a-Mole, candidly. You remove one group, they rename themselves, rebrand themselves, and so on,” he said. 

Clegg argued that compared with 2016, the company is now “utterly different” when it comes to moderating election content. Since then, it has removed over 200 “networks of coordinated inauthentic behavior,” he said. The company now relies on fact checkers and AI technology to identify unwanted groups on its platforms. 

Earlier this year, Meta announced it would label AI-generated images on Facebook, Instagram, and Threads. Meta has started adding visible markers to such images, as well as invisible watermarks and metadata in the image file. The watermarks will be added to images created using Meta’s generative AI systems or ones that carry invisible industry-standard markers. The company says its measures are in line with best practices laid out by the Partnership on AI, an AI research nonprofit.

But at the same time, Clegg admitted that tools to detect AI-generated content are still imperfect and immature. Watermarks in AI systems are not adopted industry-wide, and they are easy to tamper with. They are also hard to implement robustly in AI-generated text, audio, and video. 

Ultimately that should not matter, Clegg said, because Meta’s systems should be able to catch and detect mis- and disinformation regardless of its origins. 

“AI is a sword and a shield in this,” he said.

Clegg also defended the company’s decision to allow ads claiming that the 2020 US election was stolen, noting that these kinds of claims are common throughout the world and saying it’s “not feasible” for Meta to relitigate past elections. Just this month, eight state secretaries of state wrote a letter to Meta CEO Mark Zuckerberg arguing that the ads could still be dangerous, and that they have the potential to further threaten public trust in elections and the safety of individual election workers.

You can watch the full interview with Nick Clegg and MIT Technology Review executive editor Amy Nordrum below.

Five ways criminals are using AI

21 May 2024 at 08:30

Artificial intelligence has brought a big boost in productivity—to the criminal underworld. 

Generative AI provides a new, powerful tool kit that allows malicious actors to work far more efficiently and internationally than ever before, says Vincenzo Ciancaglini, a senior threat researcher at the security company Trend Micro. 

Most criminals are “not living in some dark lair and plotting things,” says Ciancaglini. “Most of them are regular folks that carry on regular activities that require productivity as well.”

Last year saw the rise and fall of WormGPT, an AI language model built on top of an open-source model and trained on malware-related data, which was created to assist hackers and had no ethical rules or restrictions. But last summer, its creators announced they were shutting the model down after it started attracting media attention. Since then, cybercriminals have mostly stopped developing their own AI models. Instead, they are opting for tricks with existing tools that work reliably. 

That’s because criminals want an easy life and quick gains, Ciancaglini explains. For any new technology to be worth the unknown risks associated with adopting it—for example, a higher risk of getting caught—it has to be better and bring higher rewards than what they’re currently using. 

Here are five ways criminals are using AI now. 

Phishing

The biggest use case for generative AI among criminals right now is phishing, which involves trying to trick people into revealing sensitive information that can be used for malicious purposes, says Mislav Balunović, an AI security researcher at ETH Zurich. Researchers have found that the rise of ChatGPT has been accompanied by a huge spike in the number of phishing emails.

Spam-generating services, such as GoMail Pro, have ChatGPT integrated into them, which allows criminal users to translate or improve the messages sent to victims, says Ciancaglini. OpenAI’s policies restrict people from using their products for illegal activities, but that is difficult to police in practice, because many innocent-sounding prompts could be used for malicious purposes too, says Ciancaglini. 

OpenAI says it uses a mix of human reviewers and automated systems to identify and enforce against misuse of its models, and issues warnings, temporary suspensions and bans if users violate the company’s policies. 

“We take the safety of our products seriously and are continually improving our safety measures based on how people use our products,” a spokesperson for OpenAI told us. “We are constantly working to make our models safer and more robust against abuse and jailbreaks, while also maintaining the models’ usefulness and task performance,” they added. 

In a report from February, OpenAI said it had closed five accounts associated with state-affiliated malicious actors.

Before, so-called Nigerian prince scams, in which someone promises the victim a large sum of money in exchange for a small up-front payment, were relatively easy to spot because the English in the messages was clumsy and riddled with grammatical errors, Ciancaglini says. Language models allow scammers to generate messages that sound like something a native speaker would have written.

“English speakers used to be relatively safe from non-English-speaking [criminals] because you could spot their messages,” Ciancaglini says. That’s not the case anymore. 

Thanks to better AI translation, different criminal groups around the world can also communicate better with each other. The risk is that they could coordinate large-scale operations that span beyond their nations and target victims in other countries, says Ciancaglini.

Deepfake audio scams

Generative AI has allowed deepfake development to take a big leap forward, with synthetic images, videos, and audio looking and sounding more realistic than ever. This has not gone unnoticed by the criminal underworld.

Earlier this year, an employee in Hong Kong was reportedly scammed out of $25 million after cybercriminals used a deepfake of the company’s chief financial officer to convince the employee to transfer the money to the scammer’s account. “We’ve seen deepfakes finally being marketed in the underground,” says Ciancaglini. His team found people on platforms such as Telegram showing off their “portfolio” of deepfakes and selling their services for as little as $10 per image or $500 per minute of video. One of the most popular people for criminals to deepfake is Elon Musk, says Ciancaglini. 

And while deepfake videos remain complicated to make and easier for humans to spot, that is not the case for audio deepfakes. They are cheap to make and require only a couple of seconds of someone’s voice—taken, for example, from social media—to generate something scarily convincing.

In the US, there have been high-profile cases where people have received distressing calls from loved ones saying they’ve been kidnapped and asking for money to be freed, only for the caller to turn out to be a scammer using a deepfake voice recording. 

“People need to be aware that now these things are possible, and people need to be aware that now the Nigerian king doesn’t speak in broken English anymore,” says Ciancaglini. “People can call you with another voice, and they can put you in a very stressful situation,” he adds. 

There are some ways for people to protect themselves, he says. Ciancaglini recommends agreeing on a regularly changing secret safe word between loved ones that could help confirm the identity of the person on the other end of the line.

“I password-protected my grandma,” he says.  

Bypassing identity checks

Another way criminals are using deepfakes is to bypass “know your customer” verification systems. Banks and cryptocurrency exchanges use these systems to verify that their customers are real people. They require new users to take a photo of themselves holding a physical identification document in front of a camera. But criminals have started selling apps on platforms such as Telegram that allow people to get around the requirement. 

They work by offering a fake or stolen ID and imposing a deepfake image on top of a real person’s face to trick the verification system on an Android phone’s camera. Ciancaglini has found examples where people are offering these services for cryptocurrency website Binance for as little as $70. 

“They are still fairly basic,” Ciancaglini says. The techniques they use are similar to Instagram filters, where someone else’s face is swapped for your own. 

“What we can expect in the future is that [criminals] will use actual deepfakes … so that you can do more complex authentication,” he says. 

An example of a stolen ID and a criminal using face swapping technology to bypass identity verification systems.

Jailbreak-as-a-service

If you ask most AI systems how to make a bomb, you won’t get a useful response.

That’s because AI companies have put in place various safeguards to prevent their models from spewing harmful or dangerous information. Instead of building their own AI models without these safeguards, which is expensive, time-consuming, and difficult, cybercriminals have begun to embrace a new trend: jailbreak-as-a-service. 

Most models come with rules around how they can be used. Jailbreaking allows users to manipulate the AI system to generate outputs that violate those policies—for example, to write code for ransomware or generate text that could be used in scam emails. 

Services such as EscapeGPT and BlackhatGPT offer anonymized access to language-model APIs and jailbreaking prompts that update frequently. To fight back against this growing cottage industry, AI companies such as OpenAI and Google frequently have to plug security holes that could allow their models to be abused. 

Jailbreaking services use different tricks to break through safety mechanisms, such as posing hypothetical questions or asking questions in foreign languages. There is a constant cat-and-mouse game between AI companies trying to prevent their models from misbehaving and malicious actors coming up with ever more creative jailbreaking prompts. 

These services are hitting the sweet spot for criminals, says Ciancaglini. 

“Keeping up with jailbreaks is a tedious activity. You come up with a new one, then you need to test it, then it’s going to work for a couple of weeks, and then OpenAI updates their model,” he adds. “Jailbreaking is a super-interesting service for criminals.”

Doxxing and surveillance

AI language models are a perfect tool not only for phishing but also for doxxing (revealing private, identifying information about someone online), says Balunović. This is because AI language models are trained on vast amounts of internet data, including personal data, and can deduce where, for example, someone might be located.

As an example of how this works, you could ask a chatbot to pretend to be a private investigator with experience in profiling. Then you could ask it to analyze text the victim has written, and infer personal information from small clues in that text—for example, their age based on when they went to high school, or where they live based on landmarks they mention on their commute. The more information there is about them on the internet, the more vulnerable they are to being identified. 

Balunović was part of a team of researchers that found late last year that large language models, such as GPT-4, Llama 2, and Claude, are able to infer sensitive information such as people’s ethnicity, location, and occupation purely from mundane conversations with a chatbot. In theory, anyone with access to these models could use them this way. 

Since their paper came out, new services that exploit this feature of language models have emerged. 

While the existence of these services doesn’t indicate criminal activity, it points out the new capabilities malicious actors could get their hands on. And if regular people can build surveillance tools like this, state actors probably have far better systems, Balunović says. 

“The only way for us to prevent these things is to work on defenses,” he says.

Companies should invest in data protection and security, he adds. 

For individuals, increased awareness is key. People should think twice about what they share online and decide whether they are comfortable with having their personal details being used in language models, Balunović says. 

Join me at EmTech Digital this week!

21 May 2024 at 05:00

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

I’m excited to spend this week in Cambridge, Massachusetts. I’m visiting the mothership for MIT Technology Review’s annual flagship AI conference, EmTech Digital, on May 22-23. 

Between the world leaders gathering in Seoul for the second AI Safety Summit this week and Google and OpenAI’s launches of their supercharged new models, Astra and GPT-4o, the timing could not be better. AI feels hotter than ever.  

This year’s EmTech will be all about how we can harness the power of generative AI while mitigating its risks, and how the technology will affect the workforce, competitiveness, and democracy. We will also get a sneak peek into the AI labs of Google, OpenAI, Adobe, AWS, and others.

This year’s top speakers include Nick Clegg, the president of global affairs at Meta, who will talk about what the platform intends to do to curb misinformation. In 2024, over 40 national elections will happen around the world, making it one of the most consequential political years in history. At the same time, generative AI has enabled an entirely new age of misinformation. And it’s all coalescing, with major shake-ups at social media companies and information platforms. MIT Technology Review’s executive editor Amy Nordrum will press Clegg on stage about what this all means for democracy.

Here are some other sessions I am excited about.

A Peek Inside Google’s plans 
Jay Yagnik, a vice president and engineering fellow at Google, will share what the history of AI can teach us about where the technology is going next and discuss Google’s vision for how to harness generative AI.  

From the Labs of OpenAI
Srinivas Narayanan, the vice president of applied AI at OpenAI, will share what the company has been building recently and what is coming next. In another session, Connor Holmes, who led work on video-generation AI Sora, will talk about how video-generation models could work as world simulators, and what this means for future AI models. 

The Often-Overlooked Privacy Problems in AI
Language models are prone to leaking private data. In this session Patricia Thaine, cofounder and CEO of Private AI, will explore methods that keep secrets secret and help organizations maintain compliance with privacy regulations. 

A Word Is Worth a Thousand Pictures
Cynthia Lu, senior director and head of applied research at Adobe, will walk us through the AI technology that Adobe is building and the ethical and legal implications of generated imagery. I’ve written about Adobe’s efforts to build generative AI in a non-exploitative way and how they’re paying off, so I’ll be interested to hear more about that.  

AI in the ER
Advances in medical image analysis are now enabling doctors to interpret radiology reports and automate incident documentation. This session by Polina Golland, the associate director of the MIT Computer Science and AI Laboratory, will explore both the challenges of working with sensitive personal data and the benefits of AI-assisted health care for patients.

Future Compute
On Tuesday, May 21, we are also hosting Future Compute, a day looking at how businesses and technical leaders navigate adopting AI. We have tech leaders from Salesforce, Stack Overflow, Amazon, and more, discussing how they are managing the AI transformation, and what pitfalls to avoid. 

I’d love to see you there, so if you can make it, sign up and come along! Readers of The Algorithm get 30% off tickets with the code ALGORITHMD24.


Now read the rest of The Algorithm

Deeper Learning

To kick off this busy week in AI, heavyweights such as Turing Award winners Geoffrey Hinton and Yoshua Bengio, and a slew of other prominent academics and writers, have just written an op-ed published in Science calling for more investment in AI safety research. The op-ed, timed to coincide with the Seoul AI Safety Summit, represents the group’s wish list for leaders meeting to discuss AI. Many of the researchers behind the text have been heavily involved in consulting with governments and international organizations on the best approach to building safer AI systems.

They argue that tech companies and public funders should invest at least a third of their AI R&D budgets into AI safety, and that governments should mandate stricter AI safety standards and assessments rather than relying on voluntary measures. The piece calls for them to establish fast-acting AI oversight bodies and provide them with funding comparable to the budgets of safety agencies in other sectors. It also says governments should require AI companies to prove that their systems cannot cause harm. 

But it’s hard to see this op-ed shifting things much. Tech companies have little incentive to spend money on measures that might slow down innovation and, crucially, product launches. Over the past few years, we’ve seen teams working on responsible AI take the hit during mass layoffs. Governments have shown more willingness to regulate AI in the last year or so, with the EU passing its first piece of comprehensive AI legislation, but this op-ed calls for them to go much further and faster. 

Despite that, focusing on the hypothetical existential risks posed by AI remains controversial among researchers, with some experts arguing that it distracts from the very real problems AI is causing today. As my colleague Will Douglas Heaven wrote last June when the AI safety debate was at a fever pitch: “The Overton window has shifted. What were once extreme views are now mainstream talking points, grabbing not only headlines but the attention of world leaders.”

Even Deeper Learning

GPT-4o’s Chinese token-training data is polluted by spam and porn websites

Last Monday OpenAI released GPT-4o, an AI model that you can communicate with in real time via live voice conversation, video streams from your phone, and text. But just days later, Chinese speakers started to notice that something seemed off about it: the tokens it uses to parse text were full of phrases related to spam and porn.

Oops, AI did it again: Humans read in words, but LLMs analyze tokens, distinct units in a sentence. When it comes to the Chinese language, the new tokenizer used by GPT-4o has introduced a disproportionate number of meaningless phrases. In one example, the longest token in GPT-4o’s public token library literally means “_free Japanese porn video to watch.” Experts say that’s likely due to insufficient data cleaning and filtering before the tokenizer was trained. (MIT Technology Review)

Bits and Bytes

What’s next in chips
Thanks to the boom in artificial intelligence, the world of chips is on the cusp of a huge tidal shift. We outline four trends to look for in the year ahead that will define what the chips of the future will look like, who will make them, and which new technologies they’ll unlock. (MIT Technology Review)

OpenAI and Google are launching supercharged AI assistants. Here’s how you can try them out.
OpenAI unveiled its GPT-4o assistant last Monday, and Google unveiled its own work building supercharged AI assistants just a day later. My colleague James O’Donnell walks you through what you should know about how to access these new tools, what you might use them for, and how much it will cost. 

OpenAI has lost its cofounder and dissolved the team focused on long-term AI risks
Last week OpenAI cofounder Ilya Sutskever and Jan Leike, the co-lead of the startup’s superalignment team, announced they were leaving the company. The superalignment team was set up less than a year ago to develop ways to control superintelligent AI systems. Leike said he was leaving because OpenAI’s “safety culture and processes have taken a backseat to shiny products.” In Silicon Valley, money always wins. (CNBC)

Meta’s plan to win the AI race: give its tech away for free
Mark Zuckerberg’s bet is that making powerful AI technology free will drive down competitors’ prices, making Meta’s tech more widespread while others build products on top of it, ultimately giving him more control over the future of AI. (The Wall Street Journal)

Sony Music Group has warned companies against using its content to train AI
The record label says it opts out of indiscriminate AI training and has started sending letters to AI companies prohibiting them from mining text or data, scraping the internet, or using Sony’s content without licensing agreements. (Sony)

What do you do when an AI company takes your voice?
Two voice actors are suing Lovo, a startup, claiming it illegally took recordings of their voices to train its AI model. (The New York Times)

Google’s Astra is its first AI-for-everything agent

14 May 2024 at 13:55

Google is set to introduce a new system called Astra later this year and promises that it will be the most powerful, advanced type of AI assistant it’s ever launched. 

The current generation of AI assistants, such as ChatGPT, can retrieve information and offer answers, but that is about it. But this year, Google is rebranding its assistants as more advanced “agents,” which it says could show reasoning, planning, and memory skills and are able to take multiple steps to execute tasks.

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review.

“We are in very early days [of AI agent development],” Google CEO Sundar Pichai said on a call ahead of Google’s I/O conference today. 

“We’ve always wanted to build a universal agent that will be useful in everyday life,” said Demis Hassabis, the CEO and cofounder of Google DeepMind. “Imagine agents that can see and hear what we do, better understand the context we’re in, and respond quickly in conversation, making the pace and quality of interaction feel much more natural.” That, he says, is what Astra will be. 

Google’s announcement comes a day after competitor OpenAI unveiled its own supercharged AI assistant, GPT-4o. Google DeepMind’s Astra responds to audio and video inputs, much in the same way as GPT-4o (albeit less flirtatiously).

In a press demo, a user pointed a smartphone camera and smart glasses at things and asked Astra to explain what they were. When the person pointed the device out the window and asked “What neighborhood do you think I’m in?” the AI system was able to identify King’s Cross, London, site of Google DeepMind’s headquarters. It was also able to say that the person’s glasses were on a desk, having recorded them earlier in the interaction. 

The demo showcases Google DeepMind’s vision of multimodal AI (which can handle multiple types of input—voice, video, text, and so on) working in real time, Vinyals says. 

“We are very excited about, in the future, to be able to really just get closer to the user, assist the user with anything that they want,” he says. Google recently upgraded its artificial-intelligence model Gemini to process even larger amounts of data, an upgrade which helps it handle bigger documents and videos, and have longer conversations. 

Tech companies are in the middle of a fierce competition over AI supremacy, and AI agents are the latest effort from Big Tech firms to show they are pushing the frontier of development. Agents also play into a narrative by many tech companies, including OpenAI and Google DeepMind, that aim to build artificial general intelligence, a highly hypothetical idea of superintelligent AI systems.

“Eventually, you’ll have this one agent that really knows you well, can do lots of things for you, and can work across multiple tasks and domains,” says Chirag Shah, a professor at the University of Washington who specializes in online search.

This vision is still aspirational. But today’s announcement should be seen as Google’s attempt to keep up with competitors. And by rushing these products out, Google can collect even more data from its more than a billion users on how they are using its models and what works, Shah says.

Google is unveiling many more new AI capabilities beyond agents today. It’s going to integrate AI more deeply into Search through a new feature called AI overviews, which gather information from the internet and package it into short summaries in response to search queries. The feature, which launches today, will initially be available only in the US, with more countries to gain access later.

This will help speed up the search process and get users more specific answers to more complex, niche questions, says Felix Simon, a research fellow in AI and digital news at the Reuters Institute for Journalism. “I think that’s where Search has always struggled,” he says. 

Another new feature of Google’s AI Search offering is better planning. People will soon be able to ask Search to make meal and travel suggestions, for example, much like asking a travel agent to suggest restaurants and hotels. Gemini will be able to help them plan what they need to do or buy to cook recipes, and they will also be able to have conversations with the AI system, asking it to do anything from relatively mundane tasks, such as informing them about the weather forecast, to highly complex ones like helping them prepare for a job interview or an important speech. 

People will also be able to interrupt Gemini midsentence and ask clarifying questions, much as in a real conversation. 

In another move to one-up competitor OpenAI, Google also unveiled Veo, a new video-generating AI system. Veo is able to generate short videos and allows users more control over cinematic styles by understanding prompts like “time lapse” or “aerial shots of a landscape.”

Google has a significant advantage when it comes to training generative video models, because it owns YouTube. It’s already announced collaborations with artists such as Donald Glover and Wyclef Jean, who are using its technology to produce their work.

Earlier this year, OpenAI’s CTO, Mira Murati, fumbled when asked about whether the company’s model was trained on YouTube data. Douglas Eck, senior research director at Google DeepMind, was also vague about the training data used to create Veo when asked about it by MIT Technology Review, but he said that it “may be trained on some YouTube content in accordance with our agreements with YouTube creators.”

On one hand, Google is presenting its generative AI as a tool artists can use to make stuff, but the tools likely get their ability to create that stuff by using material from existing artists, says Shah. AI companies such as Google and OpenAI have faced a slew of lawsuits by writers and artists claiming that their intellectual property has been used without consent or compensation.  

“For artists it’s a double-edged sword,” says Shah. 

What to expect at Google I/O

14 May 2024 at 06:42

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In the world of AI, a lot can happen in a year. Last year, at the beginning of Big Tech’s AI wars, Google announced during its annual I/O conference that it was throwing generative AI at everything, integrating it into its suite of products from Docs to email to e-commerce listings and its chatbot Bard. It was an effort to catch up with competitors like Microsoft and OpenAI, which had unveiled snazzy products like coding assistants and ChatGPT, the product that has done more than any other to ignite the current excitement about AI.

Since then, its ChatGPT competitor chatbot Bard (which, you may recall, temporarily wiped $100 billion off Google’s share price when it made a factual error during the demo) has been replaced by the more advanced Gemini. But, for me, the AI revolution hasn’t felt like one. Instead, it’s been a slow slide toward marginal efficiency gains. I see more autocomplete functions in my email and word processing applications, and Google Docs now offers more ready-made templates. They are not groundbreaking features, but they are also reassuringly inoffensive. 

Google is holding its I/O conference tomorrow, May 14, and we expect it to announce a whole new slew of AI features, further embedding the technology into everything it does. The company is tight-lipped about its announcements, but we can make educated guesses. There has been a lot of speculation that it will upgrade its crown jewel, Search, with generative AI features that could, for example, go behind a paywall. Perhaps we will see Google’s version of AI agents, a buzzy word that basically means more capable and useful smart assistants able to do more complex tasks, such as booking flights and hotels much as a travel agent would.

Google, despite having 90% of the online search market, is in a defensive position this year. Upstarts such as Perplexity AI have launched their own versions of AI-powered search to rave reviews, Microsoft’s AI-powered Bing has managed to increase its market share slightly, and OpenAI is working on its own AI-powered online search function and is also reportedly in conversation with Apple to integrate ChatGPT into smartphones.

There are some hints about what any new AI-powered search features might look like. Felix Simon, a research fellow at the Reuters Institute for Journalism, has been part of the Google Search Generative Experience trial, which is the company’s way of testing new products on a small selection of real users. 

Last month, Simon noticed that his Google searches with links and short snippets from online sources had been replaced by more detailed, neatly packaged AI-generated summaries. He was able to get these results from queries related to nature and health, such as “Do snakes have ears?” Most of the information offered to him was correct, which was a surprise, as AI language models have a tendency to “hallucinate” (which means make stuff up), and they have been criticized for being an unreliable source of information. 

To Simon’s surprise, he enjoyed the new feature. “It’s convenient to ask [the AI] to get something presented just for you,” he says. 

Simon then started using the new AI-powered Google function to search for news items rather than scientific information.

For most of these queries, such as what happened in the UK or Ukraine yesterday, he was simply offered links to news sources such as the BBC and Al Jazeera. But he did manage to get the search engine to generate an overview of recent news items from Germany, in the form of a bullet-pointed list of news headlines from the day before. The first entry was about an attack on Franziska Giffey, a Berlin politician who was assaulted in a library. The AI summary had the date of the attack wrong. But it was so close to the truth that Simon didn’t think twice about its accuracy. 

A quick online search during our call revealed that the rest of the AI-generated news summaries were also littered with inaccuracies. Details were wrong, or the events referred to happened years ago. All the stories were also about terrorism, hate crimes, or violence, with one soccer result thrown in. Omitting headlines on politics, culture, and the economy seems like a weird choice.  

People have a tendency to believe computers to be correct even when they are not, and Simon’s experience is an example of the kinds of problems that might arise when AI models hallucinate. The ease of getting results means that people might unknowingly ingest fake news or wrong information. It’s very problematic if even people like Simon, who are trained to fact-check things and know how AI models work, don’t do their due diligence and assume information is correct. 

Whatever Google announces at I/O tomorrow, there is immense pressure for it to be something that would justify its massive investment into AI. And after a year of experimenting, there also need to be serious improvements in making its generative AI tools more accurate and reliable. 

There are some people in the computer science community who say that hallucinations are an intrinsic part of generative AI that can’t ever be fixed, and that we can never fully trust these systems. But hallucinations will make AI-powered products less appealing to users. And it’s highly unlikely that Google will announce it has fixed this problem at I/O tomorrow. 

If you want to learn more about how Google plans to develop and deploy AI, come and hear from its vice president of AI, Jay Yagnik, at our flagship AI conference, EmTech Digital. It’ll be held at the MIT campus and streamed live online next week on May 22-23.  I’ll be there, along with AI leaders from companies like OpenAI, AWS, and Nvidia, talking about where AI is going next. Nick Clegg, Meta’s president of global affairs, will also join MIT Technology Review’s executive editor Amy Nordrum for an exclusive interview on stage. See you there! 

Readers of The Algorithm get 30% off tickets with the code ALGORITHMD24.


Now read the rest of The Algorithm

Deeper Learning

Deepfakes of your dead loved ones are a booming Chinese business

Once a week, Sun Kai has a video call with his mother. He opens up about work, the pressures he faces as a middle-aged man, and thoughts that he doesn’t even discuss with his wife. His mother will occasionally make a comment, but mostly, she just listens. That’s because Sun’s mother died five years ago. And the person he’s talking to isn’t actually a person, but a digital replica he made of her—a moving image that can conduct basic conversations. 

AI resurrection: There are plenty of people like Sun who want to use AI to interact with lost loved ones. The market is particularly strong in China, where at least half a dozen companies are now offering such technologies. In some ways, the avatars are the latest manifestation of a cultural tradition: Chinese people have always taken solace from confiding in the dead. Read more from Zeyi Yang.

Bits and Bytes

Google DeepMind’s new AlphaFold can model a much larger slice of biological life
Google DeepMind has released an improved version of its biology prediction tool, AlphaFold, that can predict the structures not only of proteins but of nearly all the elements of biological life. It’s an exciting development that could help accelerate drug discovery and other scientific research. (MIT Technology Review)

The way whales communicate is closer to human language than we realized
Researchers used statistical models to analyze whale “codas” and managed to identify a structure to their language that’s similar to features of the complex vocalizations humans use. It’s a small step forward, but it could help unlock a greater understanding of how whales communicate. (MIT Technology Review)

Tech workers should shine a light on the industry’s secretive work with the military
Despite what happens in Google’s executive suites, workers themselves can force change. William Fitzgerald, who leaked information about Google’s controversial Project Maven, has shared how he thinks they can do this. (MIT Technology Review)

AI systems are getting better at tricking us
A wave of AI systems have “deceived” humans in ways they haven’t been explicitly trained to do, by offering up false explanations for their behavior or concealing the truth from human users and misleading them to achieve a strategic end. This issue highlights how difficult artificial intelligence is to control and the unpredictable ways in which these systems work. (MIT Technology Review)

Why America needs an Apollo program for the age of AI
AI is crucial to the future security and prosperity of the US. We need to lay the groundwork now by investing in computational power, argues Eric Schmidt. (MIT Technology Review)

Fooled by AI? These firms sell deepfake detection that’s “REAL 100%”
The AI detection business is booming. There is one catch, however. Detecting AI-generated content is notoriously unreliable, and the tech is still in its infancy. That hasn’t stopped some startup founders (many of whom have no experience or background in AI) from trying to sell services they claim can do so. (The Washington Post)

The tech-bro turf war over AI’s most hardcore hacker house
A hilarious piece taking an anthropological look at the power struggle between two competing hacker houses in Silicon Valley. The fight is over which house can call itself “AGI House.” (Forbes)

My deepfake shows how valuable our data is in the age of AI

30 April 2024 at 05:23

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Deepfakes are getting good. Like, really good. Earlier this month I went to a studio in East London to get myself digitally cloned by the AI video startup Synthesia. They made a hyperrealistic deepfake that looked and sounded just like me, with realistic intonation. It is a long way away from the glitchiness of earlier generations of AI avatars. The end result was mind-blowing. It could easily fool someone who doesn’t know me well.

Synthesia has managed to create AI avatars that are remarkably humanlike after only one year of tinkering with the latest generation of generative AI. It’s equally exciting and daunting thinking about where this technology is going. It will soon be very difficult to differentiate between what is real and what is not, and this is a particularly acute threat given the record number of elections happening around the world this year. 

We are not ready for what is coming. If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI. 

I just published a story on my deepfake creation experience, and on the big questions about a world where we increasingly can’t tell what’s real. Read it here.

But there is another big question: What happens to our data once we submit it to AI companies? Synthesia says it does not sell the data it collects from actors and customers, although it does release some of it for academic research purposes. The company uses avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company deletes their data.

But other companies are not that transparent about their intentions. As my colleague Eileen Guo reported last year, companies such as Meta license actors’ data—including their faces and  expressions—in a way that allows the companies to do whatever they want with it. Actors are paid a small up-front fee, but their likeness can then be used to train AI models in perpetuity without their knowledge. 

Even if contracts for data are transparent, they don’t apply if you die, says Carl Öhman, an assistant professor at Uppsala University who has studied the online data left by deceased people and is the author of a new book, The Afterlife of Data. The data we input into social media platforms or AI models might end up benefiting companies and living on long after we’re gone. 

“Facebook is projected to host, within the next couple of decades, a couple of billion dead profiles,” Öhman says. “They’re not really commercially viable. Dead people don’t click on any ads, but they take up server space nevertheless,” he adds. This data could be used to train new AI models, or to make inferences about the descendants of those deceased users. The whole model of data and consent with AI presumes that both the data subject and the company will live on forever, Öhman says.

Our data is a hot commodity. AI language models are trained by indiscriminately scraping the web, and that also includes our personal data. A couple of years ago I tested to see if GPT-3, the predecessor of the language model powering ChatGPT, had anything on me. It struggled, but I found that I was able to retrieve personal information about MIT Technology Review’s editor in chief, Mat Honan. 

High-quality, human-written data is crucial to training the next generation of powerful AI models, and we are on the verge of running out of free online training data. That’s why AI companies are racing to strike deals with news organizations and publishers to access their data treasure chests. 

Old social media sites are also a potential gold mine: when companies go out of business or platforms stop being popular, their assets, including users’ data, get sold to the highest bidder, says Öhman. 

“MySpace data has been bought and sold multiple times since MySpace crashed. And something similar may well happen to Synthesia, or X, or TikTok,” he says. 

Some people may not care much about what happens to their data, says Öhman. But securing exclusive access to high-quality data helps cement the monopoly position of large corporations, and that harms us all. This is something we need to grapple with as a society, he adds. 

Synthesia said it will delete my avatar after my experiment, but the whole experience did make me think of all the cringeworthy photos and posts that haunt me on Facebook and other social media platforms. I think it’s time for a purge.


Now read the rest of The Algorithm

Deeper Learning

Chatbot answers are all made up. This new tool helps you figure out which ones to trust.

Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk. A new tool created by Cleanlab, an AI startup spun out of MIT, is designed to provide a clearer sense of how trustworthy these models really are. 

A BS-o-meter for chatbots: Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. Read more from Will Douglas Heaven.

Bits and Bytes

Here’s the defense tech at the center of US aid to Israel, Ukraine, and Taiwan
President Joe Biden signed a $95 billion aid package into law last week. The bill will send a significant quantity of supplies to Ukraine and Israel, while also supporting Taiwan with submarine technology to aid its defenses against China. (MIT Technology Review)

Rishi Sunak promised to make AI safe. Big Tech’s not playing ball.
The UK’s prime minister thought he had secured a political win when he got AI power players to agree to voluntary safety testing with the UK’s new AI Safety Institute. Six months on, it turns out pinkie promises don’t go very far. OpenAI and Meta have not granted access to the AI Safety Institute to do prerelease safety testing on their models. (Politico)

Inside the race to find AI’s killer app
The AI hype bubble is starting to deflate as companies try to find a way to make profits out of the eye-wateringly expensive process of developing and running this technology. Tech companies haven’t solved some of the fundamental problems slowing its wider adoption, such as the fact that generative models constantly make things up. (The Washington Post)  

Why the AI industry’s thirst for new data centers can’t be satisfied
The current boom in data-hungry AI means there is now a shortage of parts, property, and power to build data centers. (The Wall Street Journal)

The friends who became rivals in Big Tech’s AI race
This story is a fascinating look into one of the most famous and fractious relationships in AI. Demis Hassabis and Mustafa Suleyman are old friends who grew up in London and went on to cofound AI lab DeepMind. Suleyman was ousted following a bullying scandal, went on to start his own short-lived startup, and now heads rival Microsoft’s AI efforts, while Hassabis still runs DeepMind, which is now Google’s central AI research lab. (The New York Times)

This creamy vegan cheese was made with AI
Startups are using artificial intelligence to design plant-based foods. The companies train algorithms on data sets of ingredients with desirable traits like flavor, scent, or stretchability. Then they use AI to comb troves of data to develop new combinations of those ingredients that perform similarly. (MIT Technology Review)

An AI startup made a hyperrealistic deepfake of me that’s so good it’s scary

25 April 2024 at 01:00

I’m stressed and running late, because what do you wear for the rest of eternity? 

This makes it sound like I’m dying, but it’s the opposite. I am, in a way, about to live forever, thanks to the AI video startup Synthesia. For the past several years, the company has produced AI-generated avatars, but today it launches a new generation, its first to take advantage of the latest advancements in generative AI, and they are more realistic and expressive than anything I’ve ever seen. While today’s release means almost anyone will now be able to make a digital double, on this early April afternoon, before the technology goes public, they’ve agreed to make one of me. 

When I finally arrive at the company’s stylish studio in East London, I am greeted by Tosin Oshinyemi, the company’s production lead. He is going to guide and direct me through the data collection process—and by “data collection,” I mean the capture of my facial features, mannerisms, and more—much like he normally does for actors and Synthesia’s customers. 

In this AI-generated footage, synthetic “Melissa” gives a performance of Hamlet’s famous soliloquy. (The magazine had no role in producing this video.)
SYNTHESIA

He introduces me to a waiting stylist and a makeup artist, and I curse myself for wasting so much time getting ready. Their job is to ensure that people have the kind of clothes that look good on camera and that they look consistent from one shot to the next. The stylist tells me my outfit is fine (phew), and the makeup artist touches up my face and tidies my baby hairs. The dressing room is decorated with hundreds of smiling Polaroids of people who have been digitally cloned before me. 

Apart from the small supercomputer whirring in the corridor, which processes the data generated at the studio, this feels more like going into a news studio than entering a deepfake factory. 

I joke that Oshinyemi has what MIT Technology Review might call a job title of the future: “deepfake creation director.” 

“We like the term ‘synthetic media’ as opposed to ‘deepfake,’” he says. 

It’s a subtle but, some would argue, notable difference in semantics. Both mean AI-generated videos or audio recordings of people doing or saying something that didn’t necessarily happen in real life. But deepfakes have a bad reputation. Since deepfakes first appeared nearly a decade ago, the term has come to signal something unethical, says Alexandru Voica, Synthesia’s head of corporate affairs and policy. Think of sexual content produced without consent, or political campaigns that spread disinformation or propaganda.

“Synthetic media is the more benign, productive version of that,” he argues. And Synthesia wants to offer the best version of that version.  

Until now, all AI-generated videos of people have tended to have some stiffness, glitchiness, or other unnatural elements that make them pretty easy to differentiate from reality. Because they’re so close to the real thing but not quite it, these videos can make people feel annoyed or uneasy or icky—a phenomenon commonly known as the uncanny valley. Synthesia claims its new technology will finally lead us out of the valley. 

Thanks to rapid advancements in generative AI and a glut of training data created by human actors that has been fed into its AI model, Synthesia has been able to produce avatars that are indeed more humanlike and more expressive than their predecessors. The digital clones are better able to match their reactions and intonation to the sentiment of their scripts—acting more upbeat when talking about happy things, for instance, and more serious or sad when talking about unpleasant things. They also do a better job matching facial expressions—the tiny movements that can speak for us without words. 

But this technological progress also signals a much larger social and cultural shift. Increasingly, so much of what we see on our screens is generated (or at least tinkered with) by AI, and it is becoming more and more difficult to distinguish what is real from what is not. This threatens our trust in everything we see, which could have very real, very dangerous consequences. 

“I think we might just have to say goodbye to finding out about the truth in a quick way,” says Sandra Wachter, a professor at the Oxford Internet Institute, who researches the legal and ethical implications of AI. “The idea that you can just quickly Google something and know what’s fact and what’s fiction—I don’t think it works like that anymore.” 

monitor on a video camera showing Heikkilä and Oshinyemi on set in front of the green screen
Tosin Oshinyemi, the company’s production lead, guides and directs actors and customers through the data collection process.
DAVID VINTINER

So while I was excited for Synthesia to make my digital double, I also wondered if the distinction between synthetic media and deepfakes is fundamentally meaningless. Even if the former centers a creator’s intent and, critically, a subject’s consent, is there really a way to make AI avatars safely if the end result is the same? And do we really want to get out of the uncanny valley if it means we can no longer grasp the truth?

But more urgently, it was time to find out what it’s like to see a post-truth version of yourself.

Almost the real thing

A month before my trip to the studio, I visited Synthesia CEO Victor Riparbelli at his office near Oxford Circus. As Riparbelli tells it, Synthesia’s origin story stems from his experiences exploring avant-garde, geeky techno music while growing up in Denmark. The internet allowed him to download software and produce his own songs without buying expensive synthesizers. 

“I’m a huge believer in giving people the ability to express themselves in the way that they can, because I think that that provides for a more meritocratic world,” he tells me. 

He saw the possibility of doing something similar with video when he came across research on using deep learning to transfer expressions from one human face to another on screen. 

“What that showcased was the first time a deep-learning network could produce video frames that looked and felt real,” he says. 

That research was conducted by Matthias Niessner, a professor at the Technical University of Munich, who cofounded Synthesia with Riparbelli in 2017, alongside University College London professor Lourdes Agapito and Steffen Tjerrild, whom Riparbelli had previously worked with on a cryptocurrency project. 

Initially the company built lip-synching and dubbing tools for the entertainment industry, but it found that the bar for this technology’s quality was very high and there wasn’t much demand for it. Synthesia changed direction in 2020 and launched its first generation of AI avatars for corporate clients. That pivot paid off. In 2023, Synthesia achieved unicorn status, meaning it was valued at over $1 billion—making it one of the relatively few European AI companies to do so. 

That first generation of avatars looked clunky, with looped movements and little variation. Subsequent iterations started looking more human, but they still struggled to say complicated words, and things were slightly out of sync. 

The challenge is that people are used to looking at other people’s faces. “We as humans know what real humans do,” says Jonathan Starck, Synthesia’s CTO. Since infancy, “you’re really tuned in to people and faces. You know what’s right, so anything that’s not quite right really jumps out a mile.” 

These earlier AI-generated videos, like deepfakes more broadly, were made using generative adversarial networks, or GANs—an older technique for generating images and videos that uses two neural networks that play off one another. It was a laborious and complicated process, and the technology was unstable. 

But in the generative AI boom of the last year or so, the company has found it can create much better avatars using generative neural networks that produce higher quality more consistently. The more data these models are fed, the better they learn. Synthesia uses both large language models and diffusion models to do this; the former help the avatars react to the script, and the latter generate the pixels. 

Despite the leap in quality, the company is still not pitching itself to the entertainment industry. Synthesia continues to see itself as a platform for businesses. Its bet is this: As people spend more time watching videos on YouTube and TikTok, there will be more demand for video content. Young people are already skipping traditional search and defaulting to TikTok for information presented in video form. Riparbelli argues that Synthesia’s tech could help companies convert their boring corporate comms and reports and training materials into content people will actually watch and engage with. He also suggests it could be used to make marketing materials. 

He claims Synthesia’s technology is used by 56% of the Fortune 100, with the vast majority of those companies using it for internal communication. The company lists Zoom, Xerox, Microsoft, and Reuters as clients. Services start at $22 a month.

This, the company hopes, will be a cheaper and more efficient alternative to video from a professional production company—and one that may be nearly indistinguishable from it. Riparbelli tells me its newest avatars could easily fool a person into thinking they are real. 

“I think we’re 98% there,” he says. 

For better or worse, I am about to see it for myself. 

Don’t be garbage

In AI research, there is a saying: Garbage in, garbage out. If the data that went into training an AI model is trash, that will be reflected in the outputs of the model. The more data points the AI model has captured of my facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be. 

Back in the studio, I’m trying really hard not to be garbage. 

I am standing in front of a green screen, and Oshinyemi guides me through the initial calibration process, where I have to move my head and then eyes in a circular motion. Apparently, this will allow the system to understand my natural colors and facial features. I am then asked to say the sentence “All the boys ate a fish,” which will capture all the mouth movements needed to form vowels and consonants. We also film footage of me “idling” in silence.

image of Melissa standing on her mark in front of a green screen with server racks in the background
The more data points the AI system has on facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be.
DAVID VINTINER

He then asks me to read a script for a fictitious YouTuber in different tones, directing me on the spectrum of emotions I should convey. First I’m supposed to read it in a neutral, informative way, then in an encouraging way, an annoyed and complain-y way, and finally an excited, convincing way. 

“Hey, everyone—welcome back to Elevate Her with your host, Jess Mars. It’s great to have you here. We’re about to take on a topic that’s pretty delicate and honestly hits close to home—dealing with criticism in our spiritual journey,” I read off the teleprompter, simultaneously trying to visualize ranting about something to my partner during the complain-y version. “No matter where you look, it feels like there’s always a critical voice ready to chime in, doesn’t it?” 

Don’t be garbage, don’t be garbage, don’t be garbage. 

“That was really good. I was watching it and I was like, ‘Well, this is true. She’s definitely complaining,’” Oshinyemi says, encouragingly. Next time, maybe add some judgment, he suggests.   

We film several takes featuring different variations of the script. In some versions I’m allowed to move my hands around. In others, Oshinyemi asks me to hold a metal pin between my fingers as I do. This is to test the “edges” of the technology’s capabilities when it comes to communicating with hands, Oshinyemi says. 

Historically, making AI avatars look natural and matching mouth movements to speech has been a very difficult challenge, says David Barber, a professor of machine learning at University College London who is not involved in Synthesia’s work. That is because the problem goes far beyond mouth movements; you have to think about eyebrows, all the muscles in the face, shoulder shrugs, and the numerous different small movements that humans use to express themselves. 

motion capture stage with detail of a mocap pattern inset
The motion capture process uses reference patterns to help align footage captured from multiple angles around the subject.
DAVID VINTINER

Synthesia has worked with actors to train its models since 2020, and their doubles make up the 225 stock avatars that are available for customers to animate with their own scripts. But to train its latest generation of avatars, Synthesia needed more data; it has spent the past year working with around 1,000 professional actors in London and New York. (Synthesia says it does not sell the data it collects, although it does release some of it for academic research purposes.)

The actors previously got paid each time their avatar was used, but now the company pays them an up-front fee to train the AI model. Synthesia uses their avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company will delete their data. Synthesia’s enterprise customers can also generate their own custom avatars by sending someone into the studio to do much of what I’m doing.

photograph of a teleprompter screen with three arrows pointing down to "HEAD then EYES>"
The initial calibration process allows the system to understand the subject’s natural colors and facial features.
Melissa recording audio into a boom mic seated in front of a laptop stand
Synthesia also collects voice samples. In the studio, I read a passage indicating that I explicitly consent to having my voice cloned.

Between takes, the makeup artist comes in and does some touch-ups to make sure I look the same in every shot. I can feel myself blushing because of the lights in the studio, but also because of the acting. After the team has collected all the shots it needs to capture my facial expressions, I go downstairs to read more text aloud for voice samples. 

This process requires me to read a passage indicating that I explicitly consent to having my voice cloned, and that it can be used on Voica’s account on the Synthesia platform to generate videos and speech. 

Consent is key

This process is very different from the way many AI avatars, deepfakes, or synthetic media—whatever you want to call them—are created. 

Most deepfakes aren’t created in a studio. Studies have shown that the vast majority of deepfakes online are nonconsensual sexual content, usually using images stolen from social media. Generative AI has made the creation of these deepfakes easy and cheap, and there have been several high-profile cases in the US and Europe of children and women being abused in this way. Experts have also raised alarms that the technology can be used to spread political disinformation, a particularly acute threat given the record number of elections happening around the world this year. 

Synthesia’s policy is to not create avatars of people without their explicit consent. But it hasn’t been immune from abuse. Last year, researchers found pro-China misinformation that was created using Synthesia’s avatars and packaged as news, which the company said violated its terms of service. 

Since then, the company has put more rigorous verification and content moderation systems in place. It applies a watermark with information on where and how the AI avatar videos were created. Where it once had four in-house content moderators, people doing this work now make up 10% of its 300-person staff. It also hired an engineer to build better AI-powered content moderation systems. These filters help Synthesia vet every single thing its customers try to generate. Anything suspicious or ambiguous, such as content about cryptocurrencies or sexual health, gets forwarded to the human content moderators. Synthesia also keeps a record of all the videos its system creates.

And while anyone can join the platform, many features aren’t available until people go through an extensive vetting system similar to that used by the banking industry, which includes talking to the sales team, signing legal contracts, and submitting to security auditing, says Voica. Entry-level customers are limited to producing strictly factual content, and only enterprise customers using custom avatars can generate content that contains opinions. On top of this, only accredited news organizations are allowed to create content on current affairs.

“We can’t claim to be perfect. If people report things to us, we take quick action, [such as] banning or limiting individuals or organizations,” Voica says. But he believes these measures work as a deterrent, which means most bad actors will turn to freely available open-source tools instead. 

I put some of these limits to the test when I head to Synthesia’s office for the next step in my avatar generation process. In order to create the videos that will feature my avatar, I have to write a script. Using Voica’s account, I decide to use passages from Hamlet, as well as previous articles I have written. I also use a new feature on the Synthesia platform, which is an AI assistant that transforms any web link or document into a ready-made script. I try to get my avatar to read news about the European Union’s new sanctions against Iran. 

Voica immediately texts me: “You got me in trouble!” 

The system has flagged his account for trying to generate content that is restricted.

screencap from Synthesia video with text overlay "Your video was moderated for violating our Disinformation & Misinformation: Media Reporting (News) guidelines. If you believe this was an error please submit an appeal here."
AI-powered content filters help Synthesia vet every single thing its customers try to generate. Only accredited news organizations are allowed to create content on current affairs.
COURTESY OF SYNTHESIA

Offering services without these restrictions would be “a great growth strategy,” Riparbelli grumbles. But “ultimately, we have very strict rules on what you can create and what you cannot create. We think the right way to roll out these technologies in society is to be a little bit over-restrictive at the beginning.” 

Still, even if these guardrails operated perfectly, the ultimate result would nevertheless be an internet where everything is fake. And my experiment makes me wonder how we could possibly prepare ourselves. 

Our information landscape already feels very murky. On the one hand, there is heightened public awareness that AI-generated content is flourishing and could be a powerful tool for misinformation. But on the other, it is still unclear whether deepfakes are used for misinformation at scale and whether they’re broadly moving the needle to change people’s beliefs and behaviors. 

If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI. 

Claire Leibowicz, the head of AI and media integrity at the nonprofit Partnership on AI, says she worries that growing awareness of this gap will make it easier to “plausibly deny and cast doubt on real material or media as evidence in many different contexts, not only in the news, [but] also in the courts, in the financial services industry, and in many of our institutions.” She tells me she’s heartened by the resources Synthesia has devoted to content moderation and consent but says that process is never flawless.

Even Riparbelli admits that in the short term, the proliferation of AI-generated content will probably cause trouble. While people have been trained not to believe everything they read, they still tend to trust images and videos, he adds. He says people now need to test AI products for themselves to see what is possible, and should not trust anything they see online unless they have verified it. 

Never mind that AI regulation is still patchy, and the tech sector’s efforts to verify content provenance are still in their early stages. Can consumers, with their varying degrees of media literacy, really fight the growing wave of harmful AI-generated content through individual action? 

Watch out, PowerPoint

The day after my final visit, Voica emails me the videos with my avatar. When the first one starts playing, I am taken aback. It’s as painful as seeing yourself on camera or hearing a recording of your voice. Then I catch myself. At first I thought the avatar was me. 

The more I watch videos of “myself,” the more I spiral. Do I really squint that much? Blink that much? And move my jaw like that? Jesus. 

It’s good. It’s really good. But it’s not perfect. “Weirdly good animation,” my partner texts me. 

“But the voice sometimes sounds exactly like you, and at other times like a generic American and with a weird tone,” he adds. “Weird AF.” 

He’s right. The voice is sometimes me, but in real life I umm and ahh more. What’s remarkable is that it picked up on an irregularity in the way I talk. My accent is a transatlantic mess, confused by years spent living in the UK, watching American TV, and attending international school. My avatar sometimes says the word “robot” in a British accent and other times in an American accent. It’s something that probably nobody else would notice. But the AI did. 

My avatar’s range of emotions is also limited. It delivers Shakespeare’s “To be or not to be” speech very matter-of-factly. I had guided it to be furious when reading a story I wrote about Taylor Swift’s nonconsensual nude deepfakes; the avatar is complain-y and judgy, for sure, but not angry. 

This isn’t the first time I’ve made myself a test subject for new AI. Not too long ago, I tried generating AI avatar images of myself, only to get a bunch of nudes. That experience was a jarring example of just how biased AI systems can be. But this experience—and this particular way of being immortalized—was definitely on a different level.

Carl Öhman, an assistant professor at Uppsala University who has studied digital remains and is the author of a new book, The Afterlife of Data, calls avatars like the ones I made “digital corpses.” 

“It looks exactly like you, but no one’s home,” he says. “It would be the equivalent of cloning you, but your clone is dead. And then you’re animating the corpse, so that it moves and talks, with electrical impulses.” 

That’s kind of how it feels. The little, nuanced ways I don’t recognize myself are enough to put me off. Then again, the avatar could quite possibly fool anyone who doesn’t know me very well. It really shines when presenting a story I wrote about how the field of robotics could be getting its own ChatGPT moment; the virtual AI assistant summarizes the long read into a decent short video, which my avatar narrates. It is not Shakespeare, but it’s better than many of the corporate presentations I’ve had to sit through. I think if I were using this to deliver an end-of-year report to my colleagues, maybe that level of authenticity would be enough. 

And that is the sell, according to Riparbelli: “What we’re doing is more like PowerPoint than it is like Hollywood.”

Once a likeness has been generated, Synthesia is able to generate video presentations quickly from a script. In this video, synthetic “Melissa” summarizes an article real Melissa wrote about Taylor Swift deepfakes.
SYNTHESIA

The newest generation of avatars certainly aren’t ready for the silver screen. They’re still stuck in portrait mode, only showing the avatar front-facing and from the waist up. But in the not-too-distant future, Riparbelli says, the company hopes to create avatars that can communicate with their hands and have conversations with one another. It is also planning for full-body avatars that can walk and move around in a space that a person has generated. (The rig to enable this technology already exists; in fact it’s where I am in the image at the top of this piece.)

But do we really want that? It feels like a bleak future where humans are consuming AI-generated content presented to them by AI-generated avatars and using AI to repackage that into more content, which will likely be scraped to generate more AI. If nothing else, this experiment made clear to me that the technology sector urgently needs to step up its content moderation practices and ensure that content provenance techniques such as watermarking are robust. 

Even if Synthesia’s technology and content moderation aren’t yet perfect, they’re significantly better than anything I have seen in the field before, and this is after only a year or so of the current boom in generative AI. AI development moves at breakneck speed, and it is both exciting and daunting to consider what AI avatars will look like in just a few years. Maybe in the future we will have to adopt safewords to indicate that we are in fact communicating with a real human, not an AI. 

But that day is not today. 

I found it weirdly comforting that in one of the videos, my avatar rants about nonconsensual deepfakes and says, in a sociopathically happy voice, “The tech giants? Oh! They’re making a killing!” 

I would never. 

Three things we learned about AI from EmTech Digital London

23 April 2024 at 05:55

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Last week, MIT Technology Review held its inaugural EmTech Digital conference in London. It was a great success! I loved seeing so many of you there asking excellent questions, and it was a couple of days full of brain-tickling insights about where AI is going next. 

Here are the three main things I took away from the conference.

1. AI avatars are getting really, really good

UK-based AI unicorn Synthesia teased its next generation of AI avatars, which are far more emotive and realistic than any I have ever seen before. The company is pitching these avatars as a new, more engaging way to communicate. Instead of skimming through pages and pages of onboarding material, for example, new employees could watch a video where a hyperrealistic AI avatar explains what they need to know about their job. This has the potential to change the way we communicate, allowing content creators to outsource their work to custom avatars and making it easier for organizations to share information with their staff. 

2. AI agents are coming 

Thanks to the ChatGPT boom, many of us have interacted with an AI assistant that can retrieve information. But the next generation of these tools, called AI agents, can do much more than that. They are AI models and algorithms that can make decisions autonomously in a dynamic world. Imagine an AI travel agent that can not only retrieve information and suggest things to do, but also take action to book things for you, from flights to tours and accommodations. Every AI lab worth its salt, from OpenAI to Meta to startups, is racing to build agents that can reason better, memorize more steps, and interact with other apps and websites.

3. Humans are not perfect either 

One of the best ways we have of ensuring that AI systems don’t go awry is getting humans to audit and evaluate them. But humans are complicated and biased, and we don’t always get things right. In order to build machines that meet our expectations and complement our limitations, we should account for human error from the get-go. In a fascinating presentation, Katie Collins, an AI researcher at the University of Cambridge, explained how she found that allowing people to express how certain or uncertain they are—for example, by using a percentage to indicate how confident they are in labeling data—leads to better accuracy for AI models overall. The only downside with this approach is that it costs more and takes more time.
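To make the idea of percentage-based labels concrete, here is a minimal sketch. It assumes one common way of using annotator confidence, namely turning it into a "soft" label that a model is scored against. The presentation described above does not specify this mechanism, so the function names and the even spreading of leftover probability are illustrative assumptions only.

```python
import math


def to_soft_label(confidence, chosen, num_classes):
    """Turn 'I am X% sure it is class `chosen`' into a probability list.

    Assumption for illustration: the remaining probability is spread
    evenly over the other classes.
    """
    rest = (1.0 - confidence) / (num_classes - 1)
    return [confidence if i == chosen else rest for i in range(num_classes)]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a model's prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# The same annotation recorded two ways: fully certain vs. 70% certain.
hard = to_soft_label(1.0, chosen=0, num_classes=3)
soft = to_soft_label(0.7, chosen=0, num_classes=3)

# A hypothetical model prediction for that example.
prediction = [0.7, 0.2, 0.1]

loss_hard = cross_entropy(hard, prediction)
loss_soft = cross_entropy(soft, prediction)
print(f"hard-label loss: {loss_hard:.3f}, soft-label loss: {loss_soft:.3f}")
```

With a soft label, the model is also scored on how much probability it gives the classes the annotator could not rule out, so the annotator's doubt becomes part of the training signal.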

And we’re doing it all again next month, this time at the mothership. 

Join us for EmTech Digital at the MIT campus in Cambridge, Massachusetts, on May 22-23, 2024. I’ll be there—join me! 

Our fantastic speakers include Nick Clegg, president of global affairs at Meta, who will talk about elections and AI-generated misinformation. We also have the OpenAI researchers who built the video-generation AI Sora, sharing their vision on how generative AI will change Hollywood. Then Max Tegmark, the MIT professor who wrote an open letter last year calling for a pause on AI development, will take stock of what has happened and discuss how to make powerful systems safer. We also have a bunch of top scientists from the labs at Google, OpenAI, AWS, MIT, Nvidia and more. 

Readers of The Algorithm get 30% off with the discount code ALGORITHMD24.

I hope to see you there!


Now read the rest of The Algorithm

Deeper Learning

Researchers taught robots to run. Now they’re teaching them to walk.

Researchers at Oregon State University have successfully trained a humanoid robot called Digit V3 to stand, walk, pick up a box, and move it from one location to another. Meanwhile, a separate group of researchers from the University of California, Berkeley, have focused on teaching Digit to walk in unfamiliar environments while carrying different loads, without toppling over. 

What’s the big deal: Both groups are using an AI technique called sim-to-real reinforcement learning, a burgeoning method of training two-legged robots like Digit. Researchers believe it will lead to more robust, reliable two-legged machines capable of interacting with their surroundings more safely—as well as learning much more quickly. Read more from Rhiannon Williams

Bits and Bytes

It’s time to retire the term “user”
The proliferation of AI means we need a new word. Tools we once called AI bots have been assigned lofty titles like “copilot,” “assistant,” and “collaborator” to convey a sense of partnership instead of a sense of automation. But if AI is now a partner, then what are we? (MIT Technology Review)

Three ways the US could help universities compete with tech companies on AI innovation
Empowering universities to remain at the forefront of AI research will be key to realizing the field’s long-term potential, argue Ylli Bajraktari, Tom Mitchell, and Daniela Rus. (MIT Technology Review)

AI was supposed to make police body cams better. What happened?
New AI programs that analyze bodycam recordings promise more transparency but are doing little to change culture. This story serves as a useful reminder that technology is never a panacea for these sorts of deep-rooted issues. (MIT Technology Review)

The World Health Organization’s AI chatbot makes stuff up
The World Health Organization launched a “virtual health worker” to help people with questions about things like mental health, tobacco use, and healthy eating. But the chatbot frequently offers outdated information or simply makes things up, a common issue with AI models. This is a great cautionary tale of why it’s not always a good idea to use AI chatbots. Hallucinating chatbots can lead to serious consequences when they are applied to important tasks such as giving health advice. (Bloomberg)

Meta is adding AI assistants everywhere in its biggest AI push
The tech giant is rolling out its latest AI model, Llama 3, in most of its apps including Instagram, Facebook, and WhatsApp. People will also be able to ask its AI assistants for advice, or use them to search for information on the internet. (New York Times)

Stability AI is in trouble
One of the first new generative AI unicorns, the company behind the open-source image-generating AI model Stable Diffusion, is laying off 10% of its workforce. Just a couple of weeks ago its CEO, Emad Mostaque, announced that he was leaving the company. Stability has also lost several high-profile researchers and struggled to monetize its product, and it is facing a slew of lawsuits over copyright. (The Verge)

Three reasons robots are about to become way more useful 

16 April 2024 at 05:40

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The holy grail of robotics since the field’s beginning has been to build a robot that can do our housework. But for a long time, that has just been a dream. While roboticists have been able to get robots to do impressive things in the lab, such as parkour, this usually requires meticulous planning in a tightly controlled setting. That makes it hard for robots to work reliably in homes, which have wildly varying floor plans, contain all sorts of mess, and are shared with children and pets. 

There’s a well-known observation among roboticists called Moravec’s paradox: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Thanks to AI, this is now changing. Robots are starting to become capable of doing tasks such as folding laundry, cooking, and unloading shopping baskets, which not too long ago were seen as almost impossible tasks. 

In our most recent cover story for the MIT Technology Review print magazine, I looked at how robotics as a field is at an inflection point. You can read more here. A really exciting mix of things is converging in robotics research, which could usher in robots that might—just might—make it out of the lab and into our homes. 

Here are three reasons why robotics is on the brink of having its own “ChatGPT moment.”

1. Cheap hardware makes research more accessible
Robots are expensive. Highly sophisticated robots can easily cost hundreds of thousands of dollars, which makes them inaccessible for most researchers. For example, the PR2, one of the earliest iterations of home robots, weighed 450 pounds (200 kilograms) and cost $400,000. 

But new, cheaper robots are allowing more researchers to do cool stuff. A new robot called Stretch, developed by startup Hello Robot, launched during the pandemic with a much more reasonable price tag of around $18,000 and a weight of 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends. It can be controlled with a console controller. 

Meanwhile, a team at Stanford has built a system called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”) that learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks. They used off-the-shelf components to cobble together robots with more reasonable price tags in the tens, not hundreds, of thousands.

2. AI is helping us build “robotic brains”
What separates this new crop of robots is their software. Thanks to the AI boom, the focus is now shifting from feats of physical dexterity achieved by expensive robots to building “general-purpose robot brains” in the form of neural networks. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. 

Last summer, Google launched a vision-language-action model called RT-2. This model gets its general understanding of the world from the online text and images it has been trained on, as well as its own interactions. It translates that data into robotic actions. 

And researchers at the Toyota Research Institute, Columbia University and MIT have been able to quickly teach robots to do many new tasks with the help of an AI learning technique called imitation learning, plus generative AI. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements. 

Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks. 

3. More data allows robots to learn more skills
The power of large AI models such as GPT-4 lies in the reams and reams of data hoovered from the internet. But that doesn’t really work for robots, which need data that has been specifically collected for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded. Right now that data is very scarce, and it takes a long time for humans to collect.

A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.  

Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than in the systems each individual lab was developing.

Read more in my story here


Now read the rest of The Algorithm

Deeper Learning

Generative AI can turn your most precious memories into photos that never existed

Maria grew up in Barcelona, Spain, in the 1940s. Her first memories of her father are vivid. As a six-year-old, Maria would visit a neighbor’s apartment in her building when she wanted to see him. From there, she could peer through the railings of a balcony into the prison below and try to catch a glimpse of him through the small window of his cell, where he was locked up for opposing the dictatorship of Francisco Franco. There is no photo of Maria on that balcony. But she can now hold something like it: a fake photo—or memory-based reconstruction.

Remember this: Dozens of people have now had their memories turned into images in this way via Synthetic Memories, a project run by Barcelona-based design studio Domestic Data Streamers. Read this story by my colleague Will Douglas Heaven to find out more

Bits and Bytes

Why the Chinese government is sparing AI from harsh regulations—for now
The way China regulates its tech industry can seem highly unpredictable. The government can celebrate the achievements of Chinese tech companies one day and then turn against them the next. But there are patterns in China’s approach, and they indicate how it’ll regulate AI. (MIT Technology Review)

AI could make better beer. Here’s how.
New AI models can accurately identify not only how tasty consumers will deem beers, but also what kinds of compounds brewers should be adding to make them taste better, according to research. (MIT Technology Review)

OpenAI’s legal troubles are mounting
OpenAI is lawyering up as it faces a deluge of lawsuits both at home and abroad. The company has hired about two dozen in-house lawyers since last spring to work on copyright claims, and is also hiring an antitrust lawyer. The company’s new strategy is to try to position itself as America’s bulwark against China. (The Washington Post)

Did Google’s AI actually discover millions of new materials?
Late last year, Google DeepMind claimed it had discovered millions of new materials using deep learning. But researchers who analyzed a subset of DeepMind’s work found that the company’s claims may have been overhyped, and that the company hadn’t found materials that were useful or credible. (404 Media)

OpenAI and Meta are building new AI models capable of “reasoning”
The next generation of powerful AI models from OpenAI and Meta will be able to do more complex tasks, such as reason, plan, and retain more information. This, tech companies believe, will allow them to be more reliable and not make the kind of silly mistakes that this generation of language models is so prone to. (The Financial Times)

Is robotics about to have its own ChatGPT moment?

11 April 2024 at 05:00

Silent. Rigid. Clumsy.

Henry and Jane Evans are used to awkward houseguests. For more than a decade, the couple, who live in Los Altos Hills, California, have hosted a slew of robots in their home. 

In 2002, at age 40, Henry had a massive stroke, which left him with quadriplegia and an inability to speak. Since then, he’s learned how to communicate by moving his eyes over a letter board, but he is highly reliant on caregivers and his wife, Jane. 

Henry got a glimmer of a different kind of life when he saw Charlie Kemp on CNN in 2010. Kemp, a robotics professor at Georgia Tech, was on TV talking about PR2, a robot developed by the company Willow Garage. PR2 was a massive two-armed machine on wheels that looked like a crude metal butler. Kemp was demonstrating how the robot worked, and talking about his research on how health-care robots could help people. He showed how the PR2 robot could hand some medicine to the television host.    

“All of a sudden, Henry turns to me and says, ‘Why can’t that robot be an extension of my body?’ And I said, ‘Why not?’” Jane says. 

There was a solid reason why not. While engineers have made great progress in getting robots to work in tightly controlled environments like labs and factories, the home has proved difficult to design for. Out in the real, messy world, furniture and floor plans differ wildly; children and pets can jump in a robot’s way; and clothes that need folding come in different shapes, colors, and sizes. Managing such unpredictable settings and varied conditions has been beyond the capabilities of even the most advanced robot prototypes. 

That seems to finally be changing, in large part thanks to artificial intelligence. For decades, roboticists have more or less focused on controlling robots’ “bodies”—their arms, legs, levers, wheels, and the like—via purpose-driven software. But a new generation of scientists and inventors believes that the previously missing ingredient of AI can give robots the ability to learn new skills and adapt to new environments faster than ever before. This new approach, just maybe, can finally bring robots out of the factory and into our homes. 

Progress won’t happen overnight, though, as the Evanses know far too well from their many years of using various robot prototypes. 

PR2 was the first robot they brought in, and it opened entirely new skills for Henry. It would hold a beard shaver and Henry would move his face against it, allowing him to shave and scratch an itch by himself for the first time in a decade. But at 450 pounds (200 kilograms) or so and $400,000, the robot was difficult to have around. “It could easily take out a wall in your house,” Jane says. “I wasn’t a big fan.”

More recently, the Evanses have been testing out a smaller robot called Stretch, which Kemp developed through his startup Hello Robot. The first iteration launched during the pandemic with a much more reasonable price tag of around $18,000. 

Stretch weighs about 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends. It can be controlled with a console controller. Henry controls Stretch using a laptop, with a tool that tracks his head movements to move a cursor around. He is able to move his thumb and index finger enough to click a computer mouse. Last summer, Stretch was with the couple for more than a month, and Henry says it gave him a whole new level of autonomy. “It was practical, and I could see using it every day,” he says. 

a robot arm holds a brush over the head of Henry Evans which rests on a pillow
Henry Evans used the Stretch robot to brush his hair, eat, and even play with his granddaughter.
PETER ADAMS

Using his laptop, he could get the robot to brush his hair and have it hold fruit kebabs for him to snack on. It also opened up Henry’s relationship with his granddaughter Teddie. Before, they barely interacted. “She didn’t hug him at all goodbye. Nothing like that,” Jane says. But “Papa Wheelie” and Teddie used Stretch to play, engaging in relay races, bowling, and magnetic fishing. 

Stretch doesn’t have much in the way of smarts: it comes with some preinstalled software, such as the web interface that Henry uses to control it, and other capabilities such as AI-enabled navigation. The main benefit of Stretch is that people can plug in their own AI models and use them to do experiments. But it offers a glimpse of what a world with useful home robots could look like. Robots that can do many of the things humans do in the home—tasks such as folding laundry, cooking meals, and cleaning—have been a dream of robotics research since the inception of the field in the 1950s. For a long time, it’s been just that: “Robotics is full of dreamers,” says Kemp.

But the field is at an inflection point, says Ken Goldberg, a robotics professor at the University of California, Berkeley. Previous efforts to build a useful home robot, he says, have emphatically failed to meet the expectations set by popular culture—think the robotic maid from The Jetsons. Now things are very different. Thanks to cheap hardware like Stretch, along with efforts to collect and share data and advances in generative AI, robots are getting more competent and helpful faster than ever before. “We’re at a point where we’re very close to getting capability that is really going to be useful,” Goldberg says. 

Folding laundry, cooking shrimp, wiping surfaces, unloading shopping baskets—today’s AI-powered robots are learning to do tasks that for their predecessors would have been extremely difficult. 

Missing pieces

There’s a well-known observation among roboticists: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Called Moravec’s paradox, it was first articulated in the 1980s by Hans Moravec, then a roboticist at the Robotics Institute of Carnegie Mellon University. A robot can play chess or hold an object still for hours on end with no problem. Tying a shoelace, catching a ball, or having a conversation is another matter. 

There are three reasons for this, says Goldberg. First, robots lack precise control and coordination. Second, their understanding of the surrounding world is limited because they are reliant on cameras and sensors to perceive it. Third, they lack an innate sense of practical physics. 

“Pick up a hammer, and it will probably fall out of your gripper, unless you grab it near the heavy part. But you don’t know that if you just look at it, unless you know how hammers work,” Goldberg says. 

On top of these basic considerations, there are many other technical things that need to be just right, from motors to cameras to Wi-Fi connections, and hardware can be prohibitively expensive. 

Mechanically, we’ve been able to do fairly complex things for a while. In a video from 1957, two large robotic arms are dexterous enough to pinch a cigarette, place it in the mouth of a woman at a typewriter, and reapply her lipstick. But the intelligence and the spatial awareness of that robot came from the person who was operating it. 

In a video from 1957, a man operates two large robotic arms and uses the machine to apply a woman’s lipstick. Robots have come a long way since.
“LIGHTER SIDE OF THE NEWS –ATOMIC ROBOT A HANDY GUY” (1957) VIA YOUTUBE

“The missing piece is: How do we get software to do [these things] automatically?” says Deepak Pathak, an assistant professor of computer science at Carnegie Mellon.  

Researchers training robots have traditionally approached this problem by planning everything the robot does in excruciating detail. Robotics giant Boston Dynamics used this approach when it developed its boogying and parkouring humanoid robot Atlas. Cameras and computer vision are used to identify objects and scenes. Researchers then use that data to make models that can be used to predict with extreme precision what will happen if a robot moves a certain way. Using these models, roboticists plan the motions of their machines by writing a very specific list of actions for them to take. The engineers then test these motions in the laboratory many times and tweak them to perfection. 

This approach has its limits. Robots trained like this are strictly choreographed to work in one specific setting. Take them out of the laboratory and into an unfamiliar location, and they are likely to topple over. 

Compared with other fields, such as computer vision, robotics has been in the dark ages, Pathak says. But that might not be the case for much longer, because the field is seeing a big shake-up. Thanks to the AI boom, he says, the focus is now shifting from feats of physical dexterity to building “general-purpose robot brains” in the form of neural networks. Much as the human brain is adaptable and can control different aspects of the human body, these networks can be adapted to work in different robots and different scenarios. Early signs of this work show promising results. 

Robots, meet AI 

For a long time, robotics research was an unforgiving field, plagued by slow progress. At the Robotics Institute at Carnegie Mellon, where Pathak works, he says, “there used to be a saying that if you touch a robot, you add one year to your PhD.” Now, he says, students get exposure to many robots and see results in a matter of weeks.

What separates this new crop of robots is their software. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. At the same time, new, cheaper hardware, such as off-the-shelf components and robots like Stretch, is making this sort of experimentation more accessible. 

Broadly speaking, there are two popular ways researchers are using AI to train robots. Pathak has been using reinforcement learning, an AI technique that allows systems to improve through trial and error, to get robots to adapt their movements in new environments. This is a technique that Boston Dynamics has also started using in its robot “dogs” called Spot.

Deepak Pathak’s team at Carnegie Mellon has used an AI technique called reinforcement learning to create a robotic dog that can do extreme parkour with minimal pre-programming.

In 2022, Pathak’s team used this method to create four-legged robot “dogs” capable of scrambling up steps and navigating tricky terrain. The robots were first trained to move around in a general way in a simulator. Then they were set loose in the real world, with a single built-in camera and computer vision software to guide them. Other similar robots rely on tightly prescribed internal maps of the world and cannot navigate beyond them.

Pathak says the team’s approach was inspired by human navigation. Humans receive information about the surrounding world from their eyes, and this helps them instinctively place one foot in front of the other to get around in an appropriate way. Humans don’t typically look down at the ground under their feet when they walk, but a few steps ahead, at a spot where they want to go. Pathak’s team trained its robots to take a similar approach to walking: each one used the camera to look ahead. The robot was then able to memorize what was in front of it for long enough to guide its leg placement. The robots learned about the world in real time, without internal maps, and adjusted their behavior accordingly. At the time, experts told MIT Technology Review the technique was a “breakthrough in robot learning and autonomy” and could allow researchers to build legged robots capable of being deployed in the wild.   

Pathak’s robot dogs have since leveled up. The team’s latest algorithm allows a quadruped robot to do extreme parkour. The robot was again trained to move around in a general way in a simulation. But using reinforcement learning, it was then able to teach itself new skills on the go, such as how to jump long distances, walk on its front legs, and clamber up tall boxes twice its height. These behaviors were not something the researchers programmed. Instead, the robot learned through trial and error and visual input from its front camera. “I didn’t believe it was possible three years ago,” Pathak says. 

In the other popular technique, called imitation learning, models learn to perform tasks by, for example, imitating the actions of a human teleoperating a robot or using a VR headset to collect data on a robot. It’s a technique that has gone in and out of fashion over decades but has recently become more popular with robots that do manipulation tasks, says Russ Tedrake, vice president of robotics research at the Toyota Research Institute and an MIT professor.
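As a rough illustration of the idea, the sketch below shows imitation learning in its simplest form: a policy that copies whatever a human demonstrator did in the most similar recorded situation. It is a toy nearest-neighbor version of behavior cloning, not the method used by any team mentioned here. The observation format, the action names, and the function `clone_policy` are all assumptions made up for the example.

```python
import math

# Hypothetical types: an observation is a tuple of sensor readings,
# an action is just a label. Real systems use images and motor commands.
Observation = tuple[float, ...]
Action = str


def clone_policy(demos: list[tuple[Observation, Action]]):
    """Return a policy that copies the action from the closest demonstrated observation."""
    if not demos:
        raise ValueError("at least one demonstration is required")

    def policy(obs: Observation) -> Action:
        # Find the recorded demonstration whose observation is nearest
        # (Euclidean distance) to the current one and reuse its action.
        _, nearest_action = min(demos, key=lambda d: math.dist(d[0], obs))
        return nearest_action

    return policy


# Two made-up teleoperation recordings: (gripper position, what the human did).
demos = [
    ((0.0, 0.0), "open_gripper"),
    ((1.0, 1.0), "close_gripper"),
]
policy = clone_policy(demos)
```

Calling `policy((0.9, 0.8))` returns the action from the second demonstration, because that recording is the closest match. Neural-network approaches replace the lookup with a learned model so the robot can generalize to situations that are not close to any single demonstration.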

By pairing this technique with generative AI, researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements. 

The idea is to start with a human, who manually controls the robot to demonstrate behaviors such as whisking eggs or picking up plates. Using a technique called diffusion policy, the robot is then able to use the data fed into it to learn skills. The researchers have taught robots more than 200 skills, such as peeling vegetables and pouring liquids, and say they are working toward teaching 1,000 skills by the end of the year. 

Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks. 

The Toyota Research Institute team hopes this will one day lead to “large behavior models,” which are analogous to large language models, says Tedrake. “A lot of people think behavior cloning is going to get us to a ChatGPT moment for robotics,” he says. 

In a similar demonstration, earlier this year a team at Stanford managed to use a relatively cheap off-the-shelf robot costing $32,000 to do complex manipulation tasks such as cooking shrimp and cleaning stains. It learned those new skills quickly with AI. 

Called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”), the robot learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks, such as tearing off a paper towel or piece of tape. The Stanford researchers found that AI can help robots acquire transferable skills: training on one task can improve a robot’s performance on others.

While the current generation of generative AI works with images and language, researchers at the Toyota Research Institute, Columbia University, and MIT believe the approach can extend to the domain of robot motion.

This is all laying the groundwork for robots that can be useful in homes. Human needs change over time, and teaching robots to reliably do a wide range of tasks is important, as it will help them adapt to us. That is also crucial to commercialization—first-generation home robots will come with a hefty price tag, and the robots need to have enough useful skills for regular consumers to want to invest in them. 

For a long time, a lot of the robotics community was very skeptical of these kinds of approaches, says Chelsea Finn, an assistant professor of computer science and electrical engineering at Stanford University and an advisor for the Mobile ALOHA project. Finn says that nearly a decade ago, learning-based approaches were rare at robotics conferences and disparaged in the robotics community. “The [natural-language-processing] boom has been convincing more of the community that this approach is really, really powerful,” she says. 

There is one catch, however. In order to imitate new behaviors, the AI models need plenty of data. 

More is more

Unlike chatbots, which can be trained by using billions of data points hoovered from the internet, robots need data specifically created for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded, says Lerrel Pinto, an assistant professor of computer science at New York University. Right now that data is very scarce, and it takes a long time for humans to collect.

top frame shows a person recording themself opening a kitchen drawer with a grabber, and the bottom shows a robot attempting the same action
“ON BRINGING ROBOTS HOME,” NUR MUHAMMAD (MAHI) SHAFIULLAH, ET AL.

Some researchers are trying to use existing videos of humans doing things to train robots, hoping the machines will be able to copy the actions without the need for physical demonstrations. 

Pinto’s lab has also developed a neat, cheap data collection approach that connects robotic movements to desired actions. Researchers took a reacher-grabber stick, similar to ones used to pick up trash, and attached an iPhone to it. Human volunteers can use this system to film themselves doing household chores, mimicking the robot’s view of the end of its robotic arm. Using this stand-in for Stretch’s robotic arm and an open-source system called DOBB-E, Pinto’s team was able to get a Stretch robot to learn tasks such as pouring from a cup and opening shower curtains with just 20 minutes of iPhone data.  

But for more complex tasks, robots would need even more data and more demonstrations.  

The requisite scale would be hard to reach with DOBB-E, says Pinto, because you’d basically need to persuade every human on Earth to buy the reacher-­grabber system, collect data, and upload it to the internet. 

A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.  

Sergey Levine, a computer scientist at UC Berkeley who participated in the project, says the goal was to create a “robot internet” by collecting data from labs around the world. This would give researchers access to bigger, more scalable, and more diverse data sets. The deep-learning revolution that led to the generative AI of today started in 2012 with the rise of ImageNet, a vast online data set of images. The Open X-Embodiment Collaboration is an attempt by the robotics community to do something similar for robot data. 

Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. 

When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than with the systems each individual lab was developing.

“I don’t think anybody saw that coming,” says Vincent Vanhoucke, Google DeepMind’s head of robotics. “Suddenly there is a path to basically leveraging all these other sources of data to bring about very intelligent behaviors in robotics.”

Many roboticists think that large vision-language models, which are able to analyze image and language data, might offer robots important hints as to how the surrounding world works, Vanhoucke says. They offer semantic clues about the world and could help robots with reasoning, deducing things, and learning by interpreting images. To test this, researchers took a robot that had been trained on the larger model and asked it to point to a picture of Taylor Swift. The researchers had not shown the robot pictures of Swift, but it was still able to identify the pop star because it had a web-scale understanding of who she was even without photos of her in its data set, says Vanhoucke.

RT-2, a recent model for robotic control, was trained on online text and images as well as interactions with the real world.
KELSEY MCCLELLAN

Vanhoucke says Google DeepMind is increasingly using techniques similar to those it would use for machine translation to translate from English to robotics. Last summer, Google introduced a vision-language-­action model called RT-2. This model gets its general understanding of the world from online text and images it has been trained on, as well as its own interactions in the real world. It translates that data into robotic actions. Each robot has a slightly different way of translating English into action, he adds.  

“We increasingly feel like a robot is essentially a chatbot that speaks robotese,” Vanhoucke says. 

Baby steps

Despite the fast pace of development, robots still face many challenges before they can be released into the real world. They are still way too clumsy for regular consumers to justify spending tens of thousands of dollars on them. Robots also still lack the sort of common sense that would allow them to multitask. And they need to move from just picking things up and placing them somewhere to putting things together, says Goldberg—for example, putting a deck of cards or a board game back in its box and then into the games cupboard. 

But to judge from the early results of integrating AI into robots, roboticists are not wasting their time, says Pinto. 

“I feel fairly confident that we will see some semblance of a general-purpose home robot. Now, will it be accessible to the general public? I don’t think so,” he says. “But in terms of raw intelligence, we are already seeing signs right now.” 

Building the next generation of robots might not just assist humans in their everyday chores or help people like Henry Evans live a more independent life. For researchers like Pinto, there is an even bigger goal in sight.

Home robotics offers one of the best benchmarks for human-level machine intelligence, he says. The fact that a human can operate intelligently in the home environment, he adds, means we know this is a level of intelligence that can be reached. 

“It’s something which we can potentially solve. We just don’t know how to solve it,” he says. 

Evans in the foreground with computer screen.  A table with playing cards separates him from two other people in the room
Thanks to Stretch, Henry Evans was able to hold his own playing cards for the first time in two decades.
VY NGUYEN

For Henry and Jane Evans, a big win would be to get a robot that simply works reliably. The Stretch robot that the Evanses experimented with is still too buggy to use without researchers present to troubleshoot, and their home doesn’t always have the dependable Wi-Fi connectivity Henry needs in order to communicate with Stretch using a laptop.

Even so, Henry says, one of the greatest benefits of his experiment with robots has been independence: “All I do is lay in bed, and now I can do things for myself that involve manipulating my physical environment.”

Thanks to Stretch, for the first time in two decades, Henry was able to hold his own playing cards during a match. 

“I kicked everyone’s butt several times,” he says. 

“Okay, let’s not talk too big here,” Jane says, and laughs.

A conversation with Dragoș Tudorache, the politician behind the AI Act

8 April 2024 at 05:43

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Dragoș Tudorache is feeling pretty damn good. We’re sitting in a conference room in a chateau overlooking a lake outside Brussels, sipping glasses of cava. The Romanian liberal member of the European Parliament has spent the day hosting a conference on AI, defense, and geopolitics attended by nearly 400 VIP guests. The day is almost over, and Tudorache has promised to squeeze an interview with me in during cocktail hour. 

A former interior minister, Tudorache is one of the most important players in European AI policy. He is one of the two lead negotiators of the AI Act in the European Parliament. The bill, the first sweeping AI law of its kind in the world, will enter into force this year. We first met two years ago, when Tudorache was appointed to his position as negotiator. 

But Tudorache’s interest in AI started much earlier, in 2015. He says reading Nick Bostrom’s book Superintelligence, which explores how an AI superintelligence could be created and what the implications could be, made him realize the potential and dangers of AI and the need for regulating it. (Bostrom has recently been embroiled in a scandal for expressing racist views in emails unearthed from the ’90s. Tudorache says he is not aware of Bostrom’s career after the publication of the book, and he did not comment on the controversy.) 

When he was elected to the European Parliament in 2019, he says, he arrived determined to work on AI regulation if the opportunity presented itself. 

“When I heard [Ursula] von der Leyen [the European Commission president] say in her first speech in front of Parliament that there will be AI regulation, I said ‘Whoo-ha, this is my moment,’” he recalls. 

Since then, Tudorache has chaired a special committee on AI, and shepherded the AI Act through the European Parliament and into its final form following negotiations with other EU institutions. 

It’s been a wild ride, with intense negotiations, the rise of ChatGPT, lobbying from tech companies, and flip-flopping by some of Europe’s largest economies. But now, as the AI Act has passed into law, Tudorache’s job on it is done and dusted, and he says he has no regrets. Although the act has been criticized—both by civil society for not protecting human rights enough and by industry for being too restrictive—Tudorache says its final form was the sort of compromise he expected. Politics is the art of compromise, after all. 

“There’s going to be a lot of building the plane while flying, and there’s going to be a lot of learning while doing,” he says. “But if the true spirit of what we meant with the legislation is well understood by all concerned, I do think that the outcome can be a positive one.”  

It is still early days—the law comes fully into force two years from now. But Tudorache believes it will change the tech industry for the better and set off a process in which companies take responsible AI seriously, thanks to the legally binding obligations for AI companies to be more transparent about how their models are built. (I wrote about the five things you need to know about the AI Act a couple of months ago here.)

“The fact that we now have a blueprint for how you put the right boundaries, while also leaving room for innovation, is something that will serve society,” says Tudorache. It will also serve businesses, he says, because it offers a predictable path forward on what you can and cannot do with AI. 

But the AI Act is just the beginning, and there is still plenty keeping Tudorache up at night. AI is ushering in big changes across every industry and society. It will change everything from health care to education, labor, defense, and even human creativity. Most countries have not grasped what AI will mean for them, he says, and the responsibility now lies with governments to ensure that citizens and society more broadly are ready for the AI age. 

“The crunch time … starts now,” he says. 

Join Dragoș Tudorache and me at EmTech Digital London on April 16-17! Tudorache will walk you through what companies need to take into account with the AI Act right now. See you next week!


Now read the rest of The Algorithm

Deeper Learning

A conversation with OpenAI’s first artist in residence

Alex Reben’s work is often absurd, sometimes surreal: a mash-up of giant ears imagined by DALL-E and sculpted by hand out of marble; critical burns generated by ChatGPT that thumb the nose at AI art. But its message is relevant to everyone. Reben is interested in the roles humans play in a world filled with machines, and how those roles are changing. He is also OpenAI’s first artist in residence. 
Meet the artist: Officially, the appointment started in January and lasts three months. But he’s been working with OpenAI for years already. Our senior editor for AI, Will Douglas Heaven, sat down with Reben to talk about the role AI can play in art, and the backlash against it from artists. Read more here.

Bits and Bytes

It’s easy to tamper with watermarks from AI-generated text

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t. It’s an especially significant finding because many regulations around the world, including the AI Act, are betting heavily on the development of watermarks to trace AI-generated content. (MIT Technology Review)

How three filmmakers created Sora’s latest stunning videos

In the last month, a handful of filmmakers have taken OpenAI’s new generative AI model Sora for a test drive. The results are amazing. The short films are a big jump up even from the cherry-picked demo videos that OpenAI used to tease Sora just six weeks ago. Here’s how three of the filmmakers did it. (MIT Technology Review)

What’s next for generative video

Generative video will probably upend a wide range of businesses and change the roles of many professionals, from animators to advertisers. Fears of misuse are also growing. The widespread ability to generate fake video will make it easier than ever to flood the internet with propaganda and nonconsensual porn. We can see it coming. The problem is, nobody has a good fix. (MIT Technology Review)

Google is considering charging for AI-powered search

In a major potential shake-up to Google’s business model, the tech giant is considering putting AI-powered search features behind a paywall. But considering how untrustworthy AI search results are, it’s unclear if people will want to pay for them. (Financial Times) 

The fight for AI talent heats up 

As layoffs sweep through the tech sector, AI jobs are still super hot. Tech giants are fighting each other for top talent, even offering seven-figure salaries, and poaching entire engineering teams with experience in generative AI. (Wall Street Journal)

Inside Big Tech’s underground race to buy AI training data

AI models need to be trained on massive data sets, and big tech companies are quietly paying for data, chat logs, and personal photos hidden behind paywalls and login screens. (Reuters)

How tech giants cut corners to harvest data for AI

AI companies are running out of quality training data for their huge AI models. In order to harvest more data, tech companies such as OpenAI, Google, and Meta have cut corners, ignored corporate policies, and debated bending the law, the New York Times found. (New York Times)

It’s easy to tamper with watermarks from AI-generated text

29 March 2024 at 10:51

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t. 

Watermarking works by inserting hidden patterns in AI-generated text, which allow computers to detect that the text comes from an AI system. Watermarks are a fairly new invention, but they have already become a popular solution for fighting AI-generated misinformation and plagiarism. For example, the European Union’s AI Act, which enters into force in May, will require developers to watermark AI-generated content. But the new research shows that the cutting edge of watermarking technology doesn’t live up to regulators’ requirements, says Robin Staab, a PhD student at ETH Zürich, who was part of the team that developed the attacks. The research is yet to be peer reviewed, but will be presented at the International Conference on Learning Representations in May.

AI language models work by predicting the next likely word in a sentence, generating one word at a time on the basis of those predictions. Watermarking algorithms for text divide the language model’s vocabulary into words on a “green list” and a “red list,” and then make the AI model choose words from the green list. The more words in a sentence that are from the green list, the more likely it is that the text was generated by a computer. Humans tend to write sentences that include a more random mix of words. 
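To make the green-list idea concrete, here is a minimal sketch of how a detector might score a piece of text. It is a toy under stated assumptions, not any vendor's actual scheme: it works on whole words rather than model tokens, it assigns words to the green list by hashing each word together with the word before it, and it assumes half the vocabulary is green. The names `is_green`, `green_z_score`, `watermarked_sample`, the secret `key`, and `GREEN_FRACTION` are all invented for the example.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed share of the vocabulary on the green list


def is_green(prev_word: str, word: str, key: str = "demo-key") -> bool:
    """Pseudo-randomly place `word` on the green list, seeded by the previous word."""
    digest = hashlib.sha256(f"{key}|{prev_word}|{word}".encode()).digest()
    return digest[0] / 256 < GREEN_FRACTION


def green_z_score(words: list[str], key: str = "demo-key") -> float:
    """Measure how far the green-word count exceeds what ordinary text would give."""
    n = len(words) - 1  # number of (previous word, word) pairs scored
    if n <= 0:
        return 0.0
    hits = sum(is_green(p, w, key) for p, w in zip(words, words[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std


def watermarked_sample(vocab: list[str], length: int, key: str = "demo-key") -> list[str]:
    """Toy 'model' that always picks a green word when one is available."""
    out = [vocab[0]]
    while len(out) < length:
        greens = [w for w in vocab if is_green(out[-1], w, key)]
        out.append(greens[0] if greens else vocab[0])
    return out


vocab = [f"w{i}" for i in range(50)]
sample = watermarked_sample(vocab, 101)
score = green_z_score(sample)
```

A high score means far more green words than chance would produce, which a detector would treat as evidence of machine generation. Text written without knowledge of the key should score near zero on average. Real schemes nudge the model toward green words rather than forcing them, so the signal is weaker and needs longer passages.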

The researchers tampered with five different watermarks that work in this way. They were able to reverse-engineer the watermarks by using an API to access the AI model with the watermark applied and prompting it many times, says Staab. The responses allow the attacker to “steal” the watermark by building an approximate model of the watermarking rules. They do this by analyzing the AI outputs and comparing them with normal text. 
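The comparison step can be sketched in a few lines. The example below is a simplified guess at the general approach, not the researchers' actual algorithm: it counts how often each word appears in watermarked output versus ordinary text and flags words that are noticeably over-represented as probable green-list words. The function name `estimate_green_words`, the `ratio` threshold, and the add-one smoothing are assumptions chosen for illustration, and the sketch ignores the context dependence that real watermarks use.

```python
from collections import Counter


def estimate_green_words(
    watermarked: list[str], reference: list[str], ratio: float = 1.5
) -> set[str]:
    """Guess which words are favored by the watermark from frequency differences."""
    wm = Counter(watermarked)
    ref = Counter(reference)
    vocab = set(wm) | set(ref)
    guessed = set()
    for word in vocab:
        # Add-one smoothing keeps rates nonzero for words seen in only one corpus.
        wm_rate = (wm[word] + 1) / (len(watermarked) + len(vocab))
        ref_rate = (ref[word] + 1) / (len(reference) + len(vocab))
        if wm_rate / ref_rate >= ratio:
            guessed.add(word)
    return guessed


# Made-up corpora: ordinary text uses four words evenly,
# while the watermarked output leans heavily on "a" and "b".
reference = ["a", "b", "c", "d"] * 25
watermarked = ["a", "b"] * 45 + ["c", "d"] * 5
guess = estimate_green_words(watermarked, reference)
```

With these inputs the guess contains "a" and "b" but not "c" or "d". An attacker with such a list could favor the guessed words to imitate the watermark or avoid them to weaken it, which is why the number of queries an API allows matters.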

Once they have an approximate idea of what the watermarked words might be, the researchers can execute two kinds of attacks. The first one, called a spoofing attack, allows malicious actors to use the information they learned from stealing the watermark to produce text that can be passed off as being watermarked. The second attack allows hackers to scrub the watermark from AI-generated text, so the text can be passed off as human-written.

The team had a roughly 80% success rate in spoofing watermarks, and an 85% success rate in stripping AI-generated text of its watermark. 

Researchers not affiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have also found watermarks to be unreliable and vulnerable to spoofing attacks. 

The findings from ETH Zürich confirm that these issues with watermarks persist and extend to the most advanced types of chatbots and large language models being used today, says Feizi. 

The research “underscores the importance of exercising caution when deploying such detection mechanisms on a large scale,” he says. 

Despite the findings, watermarks remain the most promising way to detect AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the research. 

But more research is needed to make watermarks ready for deployment on a large scale, he adds. Until then, we should manage our expectations of how reliable and useful these tools are. “If it’s better than nothing, it is still useful,” he says.  

Update: This research will be presented at the International Conference on Learning Representations. The story has been updated to reflect that.

Meet the MIT Technology Review AI team in London

26 March 2024 at 07:06

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The UK is home to AI powerhouse Google DeepMind, a slew of exciting AI startups, and some of the world’s best universities. It’s also where I live, along with quite a few of my MIT Technology Review colleagues, including our senior AI editor Will Douglas Heaven. 

That’s why I’m super stoked to tell you that we’re gathering some of the brightest minds in AI in Europe for our flagship AI conference, EmTech Digital, in London on April 16 and 17. 

Our speakers include top figures like Zoubin Ghahramani, vice president of research at Google DeepMind; Maja Pantic, AI scientific research lead at Meta; Dragoș Tudorache, a member of the European Parliament and one of the key politicians behind the newly passed EU AI Act; and Victor Riparbelli, CEO of AI avatar company Synthesia. 

We’ll also hear from executives at NVIDIA, Roblox, Faculty, and ElevenLabs, and researchers from the UK’s top universities and AI research institutes. 

They will share their wisdom on how to harness AI and what businesses need to know right now about this transformative technology. 

Here are some sessions I am particularly excited about.

Generating AI’s Path Forward
Where is AI going next? Zoubin Ghahramani, vice president of research at Google DeepMind, will map out realistic timelines for new innovation, and he will discuss the need for an overall strategy for a safe and productive AI future for Europe and beyond.

Digital Assistants for AI Automation
You’ve perhaps heard of AI assistants. But in this session, David Barber, director of the Centre for Artificial Intelligence at University College London, will argue that a major transformation will come with the rise of AI agents, which can complete complex sets of actions such as booking travel, answering messages, and performing data entry. 

AI’s Impact on Democracy
A senior official from the UK’s National Cyber Security Centre will walk us through some of the threats posed by AI that keep him up at night. Based on our speaker prep call, I can tell you that real life really is stranger than fiction. 

The AI Act’s Impacts on Policy and Regulations
The AI Act is here, and companies in the US and the UK will have to comply with it if they want to do business in the EU. I will be sitting down with Dragoș Tudorache, one of the key politicians behind the law, to walk you through what companies need to take into account right now. 

Venturing into AI Opportunity
The European startup scene has long played second fiddle to the US. But with the rise of open-source AI unicorn Mistral and others, hopes are rising that European startups could become more competitive in the global AI marketplace. Paul Murphy, a partner at venture capital firm Lightspeed, one of the first funds to invest in Mistral, will tell us all about his predictions. 

The Business of Solving Big Challenges with AI
Colin Murdoch, Google DeepMind’s chief business officer, will show us why AI is so much more than generative AI and how it can help solve society’s greatest challenges, from gene editing to sustainable energy and computing. 

And the best bit of all: the post-conference drinks! A conference in London would not be nearly as fun without some good old-fashioned networking in a pub afterward. So join us April 16–17 in London, and get the inside scoop on how AI is transforming the world. Get your tickets here

Before you go… We have a freebie to give you a taster of the event. Join me and MIT Technology Review’s editors Niall Firth and David Rotman for a free half-hour LinkedIn Live session today, March 26. We’ll discuss how AI is changing the way we work. Bring your questions and tune in here at 4pm GMT/12pm EDT/9am PDT.


Now read the rest of The Algorithm

Deeper Learning

The tech industry can’t agree on what open-source AI means. That’s a problem.

Suddenly, “open source” is the latest buzzword in AI circles. Meta has pledged to create open-source artificial general intelligence. And Elon Musk is suing OpenAI over its lack of open-source AI models. Meanwhile, a growing number of tech leaders and companies are setting themselves up as open-source champions. But there’s a fundamental problem—no one can agree on what “open-source AI” means. 

Definitions wanted: Open-source AI promises a future where anyone can take part in the technology’s development. That could accelerate innovation, boost transparency, and give users greater control over systems that could soon reshape many aspects of our lives. But what even is it? What makes an AI model open source, and what disqualifies it? The answers could have significant ramifications for the future of the technology. Read more from Edd Gent.

Bits and Bytes

Apple researchers are exploring dropping “Hey Siri” and listening with AI instead
So maybe our phones will be listening to us all the time after all? New research aims to see if AI models can determine when you’re speaking to your phone without needing a trigger phrase. They also show how Apple, considered a laggard in AI, is determined to catch up. (MIT Technology Review)

An AI-driven “factory of drugs” claims to have hit a big milestone
Insilico is part of a wave of companies betting on AI as the “next amazing revolution” in biology. The company claims to have created the first “true AI drug” that’s advanced to a test of whether it can cure a fatal lung condition in humans. (MIT Technology Review)

Chinese platforms are cracking down on influencers selling AI lessons
Over the last year, a few Chinese influencers have made millions of dollars peddling short video lessons on AI, profiting off people’s fears about the as-yet-unclear impact of the new technology on their livelihoods. Now the platforms they thrived on have started to turn against them. (MIT Technology Review)

Google DeepMind’s new AI assistant helps elite soccer coaches get even better
The system can predict the outcome of corner kicks and provide realistic and accurate tactical suggestions in matches. The system, called TacticAI, works by analyzing a dataset of 7,176 corner kicks taken by players for Liverpool FC, one of the world’s biggest soccer clubs. (MIT Technology Review)

How AI taught Cassie the two-legged robot to run and jump
Researchers used an AI technique called reinforcement learning to help a two-legged robot nicknamed Cassie run 400 meters, over varying terrains, and execute standing long jumps and high jumps, without being trained explicitly on each movement. (MIT Technology Review)

France fined Google €250 million over copyright infringements 
The country’s competition watchdog says the tech company failed to broker fair agreements with media outlets for publishing links to their content and plundered press articles to train its AI technology without informing the publishers. This sets an interesting precedent for AI and copyright in Europe, and potentially beyond. (Bloomberg)

China is educating the next generation of top AI talent
New research suggests that China has eclipsed the United States as the biggest producer of AI talent. (New York Times)

DeepMind’s cofounder has ditched his startup to lead Microsoft’s AI initiative
Mustafa Suleyman has now left his conversational AI startup Inflection to lead Microsoft AI, a new organization focused on advancing Microsoft’s Copilot and other consumer AI products. (Microsoft)

How Adobe’s bet on non-exploitative AI is paying off

26 March 2024 at 04:00

Since the beginning of the generative AI boom, there has been a fight over how large AI models are trained. In one camp sit tech companies such as OpenAI that have claimed it is “impossible” to train AI without hoovering up copyrighted data from the internet. And in the other camp are artists who argue that AI companies have taken their intellectual property without consent or compensation. 

Adobe is unusual in that it sides with the latter group, with an approach that stands out as an example of how generative AI products can be built without scraping copyrighted data from the internet. Adobe released its image-generating model Firefly, which is integrated into its popular photo editing tool Photoshop, one year ago.

In an exclusive interview with MIT Technology Review, Adobe’s AI leaders are adamant this is the only way forward. At stake is not just the livelihood of creators, they say, but our whole information ecosystem. What they have learned shows that building responsible tech doesn’t have to come at the cost of doing business. 

“We worry that the industry, Silicon Valley in particular, does not pause to ask the ‘how’ or the ‘why.’ Just because you can build something doesn’t mean you should build it without consideration of the impact that you’re creating,” says David Wadhwani, president of Adobe’s digital media business. 

Those questions guided the creation of Firefly. When the generative image boom kicked off in 2022, there was a major backlash against AI from creative communities. Many people were using generative AI models as derivative content machines to create images in the style of another artist, sparking a legal fight over copyright and fair use. The latest generative AI technology has also made it much easier to create deepfakes and misinformation. 

It soon became clear that to offer creators proper credit and businesses legal certainty, the company could not build its models by scraping data from the web, Wadhwani says.

Adobe wants to reap the benefits of generative AI while still “recognizing that these are built on the back of human labor. And we have to figure out how to fairly compensate people for that labor now and in the future,” says Ely Greenfield, Adobe’s chief technology officer for digital media.  

To scrape or not to scrape

The scraping of online data, commonplace in AI, has recently become highly controversial. AI companies such as OpenAI, Stability.AI, Meta, and Google are facing numerous lawsuits over AI training data. Tech companies argue that publicly available data is fair game. Writers and artists disagree and are pushing for a license-based model, where creators would get compensated for having their work included in training datasets. 

Adobe trained Firefly on content that had an explicit license allowing AI training, which means the bulk of the training data comes from Adobe’s library of stock photos, says Greenfield. The company offers creators extra compensation when material is used to train AI models, he adds.

This is in contrast to the status quo in AI today, where tech companies scrape the web indiscriminately and have a limited understanding of what the training data includes. Because of these practices, the AI datasets inevitably include copyrighted content and personal data, and research has uncovered toxic content, such as child sexual abuse material.

Scraping the internet gives tech companies a cheap way to get lots of AI training data, and traditionally, having more data has allowed developers to build more powerful models. Limiting Firefly to licensed data for training was a risky bet, says Greenfield. 

“To be honest, when we started with Firefly with our image model, we didn’t know whether or not we would be able to satisfy customer needs without scraping the web,” says Greenfield. 

“And we found we could, which was great.” 

Human content moderators also review the training data to weed out objectionable or harmful content, known intellectual property, and images of known people, and the company has licenses for everything its products train on. 

Adobe’s strategy has been to integrate generative AI tools into its existing products, says Greenfield. In Photoshop, for example, Firefly users can fill in areas of an image using text commands. This allows them much more control over the creative process, and it aids their creativity. 

Still, more work needs to be done. The company wants to make Firefly even faster. Currently it takes around 10 seconds for the company’s content moderation algorithms to check the outputs of the model, for example, Greenfield says. Adobe is also trying to figure out how some business customers could generate copyrighted content, such as Marvel characters or Mickey Mouse. Adobe has teamed up with companies such as IBM, Mattel, NVIDIA and NASCAR, which allows these companies to use the tool with their intellectual property. It is also working on audio, lip synching tools and 3D generation.

Garbage in, garbage out

The decision to not scrape the internet also gives Adobe an edge in content moderation. Generative AI is notoriously difficult to control, and developers themselves don’t know why the models generate the images and texts they do. Generative AI models have put out questionable and toxic content in numerous cases. 

That all comes down to what it has been trained on, Greenfield says. He says Adobe’s model has never seen a picture of Joe Biden or Donald Trump, for example, and it cannot be coaxed into generating political misinformation. The AI model’s training data has no news content or famous people. It has not been trained on any copyrighted material, such as images of Mickey Mouse. 

“It just doesn’t understand what that concept is,” says Greenfield. 

Adobe also applies automated content moderation at the point of creation to check that Firefly’s creations are safe for professional use. The model is prohibited from creating news stories or violent images. Some names of artists are also blocked. Firefly-generated content comes with labels that indicate it has been created using AI, and the image’s edit history. 

During a critical election year, the need to know who made a piece of content, and how, is especially important. Adobe has been a vocal advocate for labels on AI content that tell where it originated, and with whom. 

Along with the New York Times and Twitter (now X), the company started the Content Authenticity Initiative, an association promoting the use of labels that tell you whether content is AI-generated or not. The initiative now has over 2,500 members. Adobe is also involved in developing C2PA, an industry-standard label that shows where a piece of content has come from and how it was created. 

“We’re long overdue [for] a better education in media literacy and tools that support people’s ability to validate any content that claims to represent reality,” Greenfield says. 

Adobe’s approach highlights the need for AI companies to be thinking deeply about content moderation, says Claire Leibowicz, head of AI and media integrity at the nonprofit Partnership on AI. 

Adobe’s approach toward generative AI serves societal goals by fighting misinformation, while also promoting business goals such as preserving creator autonomy and attribution, adds Leibowicz. 

“The business mission of Adobe is not to prevent misinformation, per se,” she says. “It’s to empower creators. And isn’t this a really elegant confluence of mission and tactics, to be able to kill two birds with one stone?” 

Wadhwani agrees. The company says Firefly-powered features are among its most popular, and 90% of Firefly’s web app users are entirely new customers to Adobe. 

“I think our approach has definitely been good for business,” Wadhwani says.

Correction: An earlier version of this article had David Wadhwani’s title wrong. This has been amended.
