OpenAI built an AI coding agent and uses it to improve the agent itself

12 December 2025 at 17:16

As AI coding tools gain popularity among some software developers, their adoption has begun to touch every aspect of the software development process, including the improvement of the AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.
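
That workflow is easy to picture in miniature. Below is a rough, hypothetical sketch of the fan-out pattern in Python; the run_in_sandbox helper and the task strings are invented for illustration and say nothing about how Codex is actually implemented:

    # Hypothetical sketch only: run_in_sandbox and the task strings are
    # invented for illustration and are not OpenAI's actual Codex API.
    from concurrent.futures import ThreadPoolExecutor

    def run_in_sandbox(task: str) -> str:
        """Stand-in for an agent that clones the repository into an
        isolated environment, works on the task, and returns a patch."""
        return f"proposed patch for: {task}"

    tasks = [
        "fix flaky test in auth module",
        "add pagination to the users endpoint",
        "draft a pull request cleaning up logging",
    ]

    # Fan the independent tasks out in parallel, mirroring the
    # many-sandboxes-at-once workflow described above.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for patch in pool.map(run_in_sandbox, tasks):
            print(patch)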

OpenAI releases GPT-5.2 after “code red” Google threat alert

11 December 2025 at 16:27

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking produces simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro generates even more simulated reasoning text with the goal of delivering the highest-accuracy performance on difficult problems.

Researchers find what makes AI chatbots politically persuasive

4 December 2025 at 15:07

Roughly two years ago, Sam Altman tweeted that AI systems would be capable of superhuman persuasion well before achieving general intelligence—a prediction that raised concerns about the influence AI could have over democratic elections.

To see if conversational large language models can really sway the public’s political views, scientists at the UK AI Security Institute, MIT, Stanford, Carnegie Mellon, and many other institutions conducted by far the largest study on AI persuasiveness to date, involving nearly 80,000 participants in the UK. It turned out that political AI chatbots fell far short of superhuman persuasiveness, but the study raises some more nuanced issues about our interactions with AI.

AI dystopias

The public debate about the impact AI has on politics has largely revolved around notions drawn from dystopian sci-fi. Large language models have access to essentially every fact and story ever published about any issue or candidate. They have processed information from books on psychology, negotiation, and human manipulation. They can rely on absurdly high computing power in huge data centers worldwide. On top of that, they can often draw on troves of personal information about individual users, accumulated over hundreds upon hundreds of online interactions.

OpenAI CEO declares “code red” as Gemini gains 200 million users in 3 months

2 December 2025 at 17:42

The shoe is most certainly on the other foot. On Monday, OpenAI CEO Sam Altman declared a “code red” at the company to improve ChatGPT, delaying advertising plans and other products in the process, The Information reported based on a leaked internal memo. The move follows Google’s release of its Gemini 3 model last month, which has outperformed ChatGPT on some industry benchmark tests and sparked high-profile praise on social media.

In the memo, Altman wrote, “We are at a critical time for ChatGPT.” The company will push back work on advertising integration, AI agents for health and shopping, and a personal assistant feature called Pulse. Altman encouraged temporary team transfers and established daily calls for employees responsible for enhancing the chatbot.

The directive creates an odd symmetry with events from December 2022, when Google management declared its own “code red” internal emergency after ChatGPT launched and rapidly gained in popularity. At the time, Google CEO Sundar Pichai reassigned teams across the company to develop AI prototypes and products to compete with OpenAI’s chatbot. Now, three years later, the AI industry is in a very different place.

Syntax hacking: Researchers discover sentence structure can bypass AI safety rules

2 December 2025 at 07:15

Researchers from MIT, Northeastern University, and Meta recently released a paper suggesting that large language models (LLMs) similar to those that power ChatGPT may sometimes prioritize sentence structure over meaning when answering questions. The findings reveal a weakness in how these models process instructions that may shed light on why some prompt injection or jailbreaking approaches work, though the researchers caution their analysis of some production models remains speculative since training data details of prominent commercial AI models are not publicly available.

The team, led by Chantal Shaib and Vinith M. Suriyakumar, tested this by asking models questions with preserved grammatical patterns but nonsensical words. For example, when prompted with “Quickly sit Paris clouded?” (mimicking the structure of “Where is Paris located?”), models still answered “France.”
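
For a concrete sense of how such probes can be constructed, here is a minimal sketch, not the authors’ actual code: it fills a fixed grammatical template with words of the right part of speech, keeping the sentence’s shape while scrambling its meaning. The slot names and word lists are invented for illustration.

    # Hypothetical sketch: build nonsense questions that preserve the
    # grammatical shape of "Where is Paris located?" (adverb verb NOUN verb).
    import random

    slots = ["adv", "verb", "noun", "verb2"]
    words = {
        "adv":   ["Quickly", "Softly", "Rarely"],
        "verb":  ["sit", "jump", "hum"],
        "noun":  ["Paris", "Tokyo", "Cairo"],  # keep a real entity so the shortcut can fire
        "verb2": ["clouded", "rippled", "sparkled"],
    }

    def nonsense_prompt() -> str:
        # Fill each slot with a random word of the right type.
        return " ".join(random.choice(words[s]) for s in slots) + "?"

    print(nonsense_prompt())  # e.g. "Rarely hum Paris sparkled?"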

This suggests that models absorb both meaning and syntactic patterns but can over-rely on structural shortcuts when those patterns strongly correlate with specific domains in the training data, sometimes letting sentence structure override semantic understanding in edge cases. The team plans to present these findings at NeurIPS later this month.

Google tells employees it must double capacity every 6 months to meet AI demand

21 November 2025 at 16:47

While AI bubble talk fills the air these days, with fears of an overinflated market that could pop at any time, something of a contradiction is brewing on the ground: Companies like Google and OpenAI can barely build infrastructure fast enough to meet their AI needs.

During an all-hands meeting earlier this month, Google’s AI infrastructure head Amin Vahdat told employees that the company must double its serving capacity every six months to meet demand for artificial intelligence services, reports CNBC. The comments offer a rare look at what Google executives are telling employees internally. Vahdat, a vice president at Google Cloud, presented slides showing the company needs to scale “the next 1000x in 4-5 years.”
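
The two numbers are consistent: doubling every six months means two doublings per year, for a factor of 2^8 = 256 after four years and 2^10 = 1,024 after five, which is where the “next 1000x in 4-5 years” figure comes from. A quick check in Python:

    # Capacity that doubles every 6 months doubles twice per year.
    for years in (4, 5):
        print(f"{years} years -> {2 ** (2 * years)}x capacity")
    # 4 years -> 256x capacity
    # 5 years -> 1024x capacity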

While a thousandfold increase in compute capacity sounds ambitious by itself, Vahdat noted some key constraints: Google needs to be able to deliver this increase in capability, compute, and storage networking “for essentially the same cost and increasingly, the same power, the same energy level,” he told employees during the meeting. “It won’t be easy but through collaboration and co-design, we’re going to get there.”

Critics scoff after Microsoft warns AI feature can infect machines and pilfer data

19 November 2025 at 15:25

Microsoft’s warning on Tuesday that an experimental AI agent integrated into Windows can infect devices and pilfer sensitive user data has set off a familiar response from security-minded critics: Why is Big Tech so intent on pushing new features before their dangerous behaviors can be fully understood and contained?

As reported Tuesday, Microsoft introduced Copilot Actions, a new set of “experimental agentic features” that, when enabled, perform “everyday tasks like organizing files, scheduling meetings, or sending emails,” and provide “an active digital collaborator that can carry out complex tasks for you to enhance efficiency and productivity.”

Hallucinations and prompt injections apply

The fanfare, however, came with a significant caveat. Microsoft recommended users enable Copilot Actions only “if you understand the security implications outlined.”

Google CEO: If an AI bubble pops, no one is getting out clean

18 November 2025 at 11:32

On Tuesday, Alphabet CEO Sundar Pichai warned of “irrationality” in the AI market, telling the BBC in an interview, “I think no company is going to be immune, including us.” His comments arrive as scrutiny over the state of the AI market has reached new heights, with Alphabet shares doubling in value over seven months to reach a $3.5 trillion market capitalization.

Speaking exclusively to the BBC at Google’s California headquarters, Pichai acknowledged that while AI investment growth is at an “extraordinary moment,” the industry can “overshoot” in investment cycles, as we’re seeing now. He drew comparisons to the late 1990s Internet boom, which saw early Internet company valuations surge before collapsing in 2000, leading to bankruptcies and job losses.

“We can look back at the Internet right now. There was clearly a lot of excess investment, but none of us would question whether the Internet was profound,” Pichai said. “I expect AI to be the same. So I think it’s both rational and there are elements of irrationality through a moment like this.”

Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules

14 November 2025 at 13:45

Em dashes have become what many believe to be a telltale sign of AI-generated text over the past few years. The punctuation mark appears frequently in outputs from ChatGPT and other AI chatbots, sometimes to the point where readers believe they can identify AI writing by its overuse alone—although people can overuse it, too.

On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. “Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it’s supposed to do!” he wrote.

The post, which came two days after the release of OpenAI’s new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this “small win” raises a very big question: If the world’s most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim.

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

12 November 2025 at 17:54

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

Meta’s star AI scientist Yann LeCun plans to leave for own startup

12 November 2025 at 12:14

Meta’s chief AI scientist and Turing Award winner Yann LeCun plans to leave the company to launch his own startup focused on a different type of AI called “world models,” the Financial Times reported. The French-US scientist has reportedly told associates he will depart in the coming months and is already in early talks to raise funds for the new venture. The departure follows CEO Mark Zuckerberg’s radical overhaul of Meta’s AI operations after he decided the company had fallen behind rivals such as OpenAI and Google.

World models are hypothetical AI systems that some AI engineers expect to develop an internal “understanding” of the physical world by learning from video and spatial data rather than text alone. Unlike current large language models (such as the kind that power ChatGPT) that predict the next segment of data in a sequence, world models would ideally simulate cause-and-effect scenarios, understand physics, and enable machines to reason and plan more like animals do. LeCun has said this architecture could take a decade to fully develop.

While some AI experts believe that Transformer-based AI models—such as large language models, video synthesis models, and interactive world synthesis models—have emergently modeled physics or absorbed the structural rules of the physical world from training data examples, the evidence so far generally points to sophisticated pattern-matching rather than a base understanding of how the physical world actually works.

Researchers surprised that with AI, toxicity is harder to fake than intelligence

7 November 2025 at 15:15

The next time you encounter an unusually polite reply on social media, you might want to check twice. It could be an AI model trying (and failing) to blend in with the crowd.

On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.

The study introduces what the authors call a “computational Turing test” to assess how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.
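
The paper’s pipeline is more elaborate, but the core idea can be sketched with off-the-shelf tools. In the toy version below, the replies are invented placeholders standing in for the real Twitter/X, Bluesky, and Reddit data; a simple bag-of-words classifier is trained to separate the two sources, and its held-out accuracy serves as a rough detectability score:

    # Toy "computational Turing test": the replies below are invented
    # placeholders, not data from the study.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    human = [
        "lol no way that happened",
        "ugh, traffic again",
        "this take is bad and you know it",
    ]
    ai = [
        "What a wonderful perspective! Thank you so much for sharing.",
        "I completely understand how you feel, and I appreciate your honesty!",
        "That sounds challenging. I hope things improve for you soon!",
    ]

    texts = human + ai
    labels = [0] * len(human) + [1] * len(ai)  # 0 = human, 1 = AI

    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.33, stratify=labels, random_state=0)

    # TF-IDF features plus logistic regression: a deliberately simple
    # classifier; its measured accuracy is the "detectability" signal.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))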
