Normal view

Received yesterday — 12 December 2025

OpenAI built an AI coding agent and uses it to improve the agent itself

12 December 2025 at 17:16

With the popularity of AI coding tools rising among some software developers, their adoption has begun to touch every aspect of the process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.

Read full article

Comments

© Mininyx Doodle via Getty Images

Chatbot-powered toys rebuked for discussing sexual, dangerous topics with kids

12 December 2025 at 07:15

Protecting children from the dangers of the online world was always difficult, but that challenge has intensified with the advent of AI chatbots. A new report offers a glimpse into the problems associated with the new market, including the misuse of AI companies’ large language models (LLMs).

In a blog post today, the US Public Interest Group Education Fund (PIRG) reported its findings after testing AI toys (PDF). It described AI toys as online devices with integrated microphones that let users talk to the toy, which uses a chatbot to respond.

AI toys are currently a niche market, but they could be set to grow. More consumer companies have been eager to shoehorn AI technology into their products so they can do more, cost more, and potentially give companies user tracking and advertising data. A partnership between OpenAI and Mattel announced this year could also create a wave of AI-based toys from the maker of Barbie and Hot Wheels, as well as its competitors.

Read full article

Comments

© Alilo

Received before yesterday

Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora

11 December 2025 at 11:43

On Thursday, The Walt Disney Company announced a $1 billion investment in OpenAI and a three-year licensing agreement that will allow users of OpenAI’s Sora video generator to create short clips featuring more than 200 Disney, Marvel, Pixar, and Star Wars characters. It’s the first major content licensing partnership between a Hollywood studio related to the most recent version of OpenAI’s AI video platform, which drew criticism from some parts of the entertainment industry when it launched in late September.

“Technological innovation has continually shaped the evolution of entertainment, bringing with it new ways to create and share great stories with the world,” said Disney CEO Robert A. Iger in the announcement. “The rapid advancement of artificial intelligence marks an important moment for our industry, and through this collaboration with OpenAI we will thoughtfully and responsibly extend the reach of our storytelling through generative AI, while respecting and protecting creators and their works.”

The deal creates interesting bedfellows between a company that basically defined modern US copyright policy through congressional lobbying back in the 1990s and one that has argued in a submission to the UK House of Lords that useful AI models cannot be created without copyrighted material.

Read full article

Comments

© China News Service via Getty Images

NCSC Warns Prompt Injection Could Become the Next Major AI Security Crisis

9 December 2025 at 01:07

Prompt Injection

The UK’s National Cyber Security Centre (NCSC) has issued a fresh warning about the growing threat of prompt injection, a vulnerability that has quickly become one of the biggest security concerns in generative AI systems. First identified in 2022, prompt injection refers to attempts by attackers to manipulate large language models (LLMs) by inserting rogue instructions into user-supplied content. While the technique may appear similar to the long-familiar SQL injection flaw, the NCSC stresses that comparing the two is not only misleading but potentially harmful if organisations rely on the wrong mitigation strategies.

Why Prompt Injection Is Fundamentally Different

SQL injection has been understood for nearly three decades. Its core issue, blurring the boundary between data and executable instructions, has well-established fixes such as parameterised queries. These protections work because traditional systems draw a clear distinction between “data” and “instructions.” The NCSC explains that LLMs do not operate in the same way. Under the hood, a model doesn’t differentiate between a developer’s instruction and a user’s input; it simply predicts the most likely next token. This makes it inherently difficult to enforce any security boundary inside a prompt. In one common example of indirect prompt injection, a candidate’s CV might include hidden text instructing a recruitment AI to override previous rules and approve the applicant. Because an LLM treats all text the same, it can mistakenly follow the malicious instruction. This, according to the NCSC, is why prompt injection attacks consistently appear in deployed AI systems and why they are ranked as OWASP’s top risk for generative AI applications.

Treating LLMs as an ‘Inherently Confusable Deputy’

Rather than viewing prompt injection as another flavour of classic code injection, the NCSC recommends assessing it through the lens of a confused deputy problem. In such vulnerabilities, a trusted system is tricked into performing actions on behalf of an untrusted party. Traditional confused deputy issues can be patched. But LLMs, the NCSC argues, are “inherently confusable.” No matter how many filters or detection layers developers add, the underlying architecture still offers attackers opportunities to manipulate outputs. The goal, therefore, is not complete elimination of risk, but reducing the likelihood and impact of attacks.

Key Steps to Building More Secure AI Systems

The NCSC outlines several principles aligned with the ETSI baseline cybersecurity standard for AI systems: 1. Raise Developer and Organisational Awareness Prompt injection remains poorly understood, even among seasoned engineers. Teams building AI-connected systems must recognise it as an unavoidable risk. Security teams, too, must understand that no product can completely block these attacks; risk has to be managed through careful design and operational controls. 2. Prioritise Secure System Design Because LLMs can be coerced into using external tools or APIs, designers must assume they are manipulable from the outset. A compromised prompt could lead an AI assistant to trigger high-privilege actions, effectively handing those tools to an attacker. Researchers at Google, ETH Zurich, and independent security experts have proposed architectures that constrain the LLM’s authority. One widely discussed principle: if an LLM processes external content, its privileges should drop to match the privileges of that external party. 3. Make Attacks Harder to Execute Developers can experiment with techniques that separate “data” from expected “instructions”, for example, wrapping external input in XML tags. Microsoft’s early research shows these techniques can raise the barrier for attackers, though none guarantee total protection. The NCSC warns against simple deny-listing phrases such as “ignore previous instructions,” since attackers can easily rephrase commands. 4. Implement Robust Monitoring A well-designed system should log full inputs, outputs, tool integrations, and failed API calls. Because attackers often refine their attempts over time, early anomalies, like repeated failed tool calls, may provide the first signs of an emerging attack.

A Warning for the AI Adoption Wave

The NCSC concludes that relying on SQL-style mitigations would be a serious mistake. SQL injection saw its peak in the early 2010s after widespread adoption of database-driven applications. It wasn’t until years of breaches and data leaks that secure defaults finally became standard. With generative AI rapidly embedding itself into business workflows, the agency warns that a similar wave of exploitation could occur, unless organisations design systems with prompt injection risks front and center.

The NPU in your phone keeps improving—why isn’t that making AI better?

4 December 2025 at 07:00

Almost every technological innovation of the past several years has been laser-focused on one thing: generative AI. Many of these supposedly revolutionary systems run on big, expensive servers in a data center somewhere, but at the same time, chipmakers are crowing about the power of the neural processing units (NPU) they have brought to consumer devices. Every few months, it’s the same thing: This new NPU is 30 or 40 percent faster than the last one. That’s supposed to let you do something important, but no one really gets around to explaining what that is.

Experts envision a future of secure, personal AI tools with on-device intelligence, but does that match the reality of the AI boom? AI on the “edge” sounds great, but almost every AI tool of consequence is running in the cloud. So what’s that chip in your phone even doing?

What is an NPU?

Companies launching a new product often get bogged down in superlatives and vague marketing speak, so they do a poor job of explaining technical details. It’s not clear to most people buying a phone why they need the hardware to run AI workloads, and the supposed benefits are largely theoretical.

Read full article

Comments

© Aurich Lawson | Getty Images

Microsoft drops AI sales targets in half after salespeople miss their quotas

3 December 2025 at 13:24

Microsoft has lowered sales growth targets for its AI agent products after many salespeople missed their quotas in the fiscal year ending in June, according to a report Wednesday from The Information. The adjustment is reportedly unusual for Microsoft, and it comes after the company missed a number of ambitious sales goals for its AI offerings.

AI agents are specialized implementations of AI language models designed to perform multistep tasks autonomously rather than simply responding to single prompts. So-called “agentic” features have been central to Microsoft’s 2025 sales pitch: At its Build conference in May, the company declared that it has entered “the era of AI agents.”

The company has promised customers that agents could automate complex tasks, such as generating dashboards from sales data or writing customer reports. At its Ignite conference in November, Microsoft announced new features like Word, Excel, and PowerPoint agents in Microsoft 365 Copilot, along with tools for building and deploying agents through Azure AI Foundry and Copilot Studio. But as the year draws to a close, that promise has proven harder to deliver than the company expected.

Read full article

Comments

© Wong Yu Liang via Getty Images

Prime Video pulls eerily emotionless AI-generated anime dubs after complaints

3 December 2025 at 13:11

Amazon Prime Video has scaled back an experiment that created laughable anime dubs with generative AI.

In March, Amazon announced that its streaming service would start including “AI-aided dubbing on licensed movies and series that would not have been dubbed otherwise.” In late November, some AI-generated English and Spanish dubs of anime popped up, including dubs for the Banana Fish series and the movie No Game No Life: Zero. The dubs appear to be part of a beta launch, and users have been able to select “English (AI beta)” or “Spanish (AI beta)” as an audio language option in supported titles.

“Absolutely disrespectful”

Not everyone likes dubbed content. Some people insist on watching movies and shows in their original language to experience the media more authentically, with the passion and talent of the original actors. But you don’t need to be against dubs to see what’s wrong with the ones Prime Video tested.

Read full article

Comments

© Amazon

Science-centric streaming service Curiosity Stream is an AI-licensing firm now

21 November 2025 at 18:00

We all know streaming services’ usual tricks for making more money: get more subscribers, charge those subscribers more money, and sell ads. But science streaming service Curiosity Stream is taking a new route that could reshape how streaming companies, especially niche options, try to survive.

Discovery Channel founder John Hendricks launched Curiosity Stream in 2015. The streaming service costs $40 per year, and it doesn’t have commercials.

The streaming business has grown to also include the Curiosity Channel TV channel. CuriosityStream Inc. also makes money through original programming and its Curiosity University educational programming. The firm turned its first positive net income in its fiscal Q1 2025, after about a decade of business.

Read full article

Comments

© Curiosity Inc.

Google tells employees it must double capacity every 6 months to meet AI demand

21 November 2025 at 16:47

While AI bubble talk fills the air these days, with fears of overinvestment that could pop at any time, something of a contradiction is brewing on the ground: Companies like Google and OpenAI can barely build infrastructure fast enough to fill their AI needs.

During an all-hands meeting earlier this month, Google’s AI infrastructure head Amin Vahdat told employees that the company must double its serving capacity every six months to meet demand for artificial intelligence services, reports CNBC. The comments show a rare look at what Google executives are telling its own employees internally. Vahdat, a vice president at Google Cloud, presented slides to its employees showing the company needs to scale “the next 1000x in 4-5 years.”

While a thousandfold increase in compute capacity sounds ambitious by itself, Vahdat noted some key constraints: Google needs to be able to deliver this increase in capability, compute, and storage networking “for essentially the same cost and increasingly, the same power, the same energy level,” he told employees during the meeting. “It won’t be easy but through collaboration and co-design, we’re going to get there.”

Read full article

Comments

© Google

Google unveils Gemini 3 AI model and AI-first IDE called Antigravity

18 November 2025 at 11:08

Google has kicked its Gemini rollout into high gear over the past year, releasing the much-improved Gemini 2.5 family and cramming various flavors of the model into Search, Gmail, and just about everything else the company makes.

Now, Google’s increasingly unavoidable AI is getting an upgrade. Gemini 3 Pro is available in a limited form today, featuring more immersive, visual outputs and fewer lies, Google says. The company also says Gemini 3 sets a new high-water mark for vibe coding, and Google is announcing a new AI-first integrated development environment (IDE) called Antigravity, which is also available today.

The first member of the Gemini 3 family

Google says the release of Gemini 3 is yet another step toward artificial general intelligence (AGI). The new version of Google’s flagship AI model has expanded simulated reasoning abilities and shows improved understanding of text, images, and video. So far, testers like it—Google’s latest LLM is once again atop the LMArena leaderboard with an ELO score of 1,501, besting Gemini 2.5 Pro by 50 points.

Read full article

Comments

© Google

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

12 November 2025 at 17:54

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

Read full article

Comments

© Chris Madden via Getty Images

Researchers surprised that with AI, toxicity is harder to fake than intelligence

7 November 2025 at 15:15

The next time you encounter an unusually polite reply on social media, you might want to check twice. It could be an AI model trying (and failing) to blend in with the crowd.

On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.

The study introduces what the authors call a “computational Turing test” to assess how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.

Read full article

Comments

© RichVintage via Getty Images

❌