OpenAI researcher quits over ChatGPT ads, warns of "Facebook" path

11 February 2026 at 15:44

On Wednesday, former OpenAI researcher Zoë Hitzig published a guest essay in The New York Times announcing that she resigned from the company on Monday, the same day OpenAI began testing advertisements inside ChatGPT. Hitzig, an economist and published poet who holds a junior fellowship at the Harvard Society of Fellows, spent two years at OpenAI helping shape how its AI models were built and priced. She wrote that OpenAI's advertising strategy risks repeating the same mistakes that Facebook made a decade ago.

"I once believed I could help the people building A.I. get ahead of the problems it would create," Hitzig wrote. "This week confirmed my slow realization that OpenAI seems to have stopped asking the questions I'd joined to help answer."

Hitzig did not call advertising itself immoral. Instead, she argued that the nature of the data at stake makes ChatGPT ads especially risky. Users have shared medical fears, relationship problems, and religious beliefs with the chatbot, she wrote, often "because people believed they were talking to something that had no ulterior agenda." She called this accumulated record of personal disclosures "an archive of human candor that has no precedent."

Read full article

Comments

Sixteen Claude AI agents working together created a new C compiler

Ars Technica

By:Benj Edwards

6 February 2026 at 18:40

Amid a push toward AI agents, with both Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is more than ready to show off some of its more daring AI coding experiments. But as usual with claims of AI-related achievement, you'll find some key caveats ahead.

On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company's Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.

Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.

Read full article

Comments

AI companies want you to stop chatting with bots and start managing them

Ars Technica

By:Benj Edwards

5 February 2026 at 17:47

On Thursday, Anthropic and OpenAI shipped products built around the same idea: instead of chatting with a single AI assistant, users should be managing teams of AI agents that divide up work and run in parallel. The simultaneous releases are part of a gradual shift across the industry, from AI as a conversation partner to AI as a delegated workforce, and they arrive during a week when that very concept reportedly helped wipe $285 billion off software stocks.

Whether that supervisory model works in practice remains an open question. Current AI agents still require heavy human intervention to catch errors, and no independent evaluation has confirmed that these multi-agent tools reliably outperform a single developer working alone.

Even so, the companies are going all-in on agents. Anthropic's contribution is Claude Opus 4.6, a new version of its most capable AI model, paired with a feature called "agent teams" in Claude Code. Agent teams let developers spin up multiple AI agents that split a task into independent pieces, coordinate autonomously, and run concurrently.

Read full article

Comments

OpenAI is hoppin' mad about Anthropic's new Super Bowl TV ads

Ars Technica

By:Benj Edwards

5 February 2026 at 12:46

On Wednesday, OpenAI CEO Sam Altman and Chief Marketing Officer Kate Rouch complained on X after rival AI lab Anthropic released four commercials, two of which will run during the Super Bowl on Sunday, mocking the idea of including ads in AI chatbot conversations. Anthropic's campaign seemingly touched a nerve at OpenAI just weeks after the ChatGPT maker began testing ads in a lower-cost tier of its chatbot.

Altman called Anthropic's ads "clearly dishonest," accused the company of being "authoritarian," and said it "serves an expensive product to rich people," while Rouch wrote, "Real betrayal isn't ads. It's control."

Anthropic's four commercials, part of a campaign called "A Time and a Place," each open with a single word splashed across the screen: "Betrayal," "Violation," "Deception," and "Treachery." They depict scenarios where a person asks a human stand-in for an AI chatbot for personal advice, only to get blindsided by a product pitch.

Read full article

Comments

Should AI chatbots have ads? Anthropic says no.

Ars Technica

By:Benj Edwards

4 February 2026 at 16:15

On Wednesday, Anthropic announced that its AI chatbot, Claude, will remain free of advertisements, drawing a sharp line between itself and rival OpenAI, which began testing ads in a low-cost tier of ChatGPT last month. The announcement comes alongside a Super Bowl ad campaign that mocks AI assistants that interrupt personal conversations with product pitches.

"There are many good places for advertising. A conversation with Claude is not one of them," Anthropic wrote in a blog post. The company argued that including ads in AI conversations would be "incompatible" with what it wants Claude to be: "a genuinely helpful assistant for work and for deep thinking."

The stance contrasts with OpenAI's January announcement that it would begin testing banner ads for free users and ChatGPT Go subscribers in the US. OpenAI said those ads would appear at the bottom of responses and would not influence the chatbot's actual answers. Paid subscribers on Plus, Pro, Business, and Enterprise tiers will not see ads on ChatGPT.

Read full article

Comments

So yeah, I vibe-coded a log colorizer—and I feel good about it

Ars Technica

By:Lee Hutchinson

4 February 2026 at 07:00

I can't code.

I know, I know—these days, that sounds like an excuse. Anyone can code, right?! Grab some tutorials, maybe an O'Reilly book, download an example project, and jump in. It's just a matter of learning how to break your project into small steps that you can make the computer do, then memorizing a bit of syntax. Nothing about that is hard!

Perhaps you can sense my sarcasm (and sympathize with my lack of time to learn one more technical skill).

Read full article

Comments

Xcode 26.3 adds support for Claude, Codex, and other agentic tools via MCP

Ars Technica

By:Samuel Axon

3 February 2026 at 13:01

Apple has announced a new version of Xcode, the latest version of its integrated development environment (IDE) for building software for its own platforms, like the iPhone and Mac. The key feature of 26.3 is support for full-fledged agentic coding tools, like OpenAI's Codex or Claude Agent, with a side panel interface for assigning tasks to agents with prompts and tracking their progress and changes.

This is achieved via Model Context Protocol (MCP), an open protocol that lets AI agents work with external tools and structured resources. Xcode acts as an MCP endpoint that exposes a bunch of machine-invocable interfaces and gives AI tools like Codex or Claude Agent access to a wide range of IDE primitives like file graph, docs search, project settings, and so on. While AI chat and workflows were supported in Xcode before, this release gives them much deeper access to the features and capabilities of Xcode.

This approach is notable because it means that even though OpenAI and Anthropic's model integrations are privileged with a dedicated spot in Xcode's settings, it's possible to connect other tooling that supports MCP, which also allows doing some of this with models running locally.

Read full article

Comments

AI agents now have their own Reddit-style social network, and it's getting weird fast

Ars Technica

By:Benj Edwards

30 January 2026 at 17:12

On Friday, a Reddit-style social network called Moltbook reportedly crossed 32,000 registered AI agent users, creating what may be the largest-scale experiment in machine-to-machine social interaction yet devised. It arrives complete with security nightmares and a huge dose of surreal weirdness.

The platform, which launched days ago as a companion to the viral OpenClaw (once called "Clawdbot" and then "Moltbot") personal assistant, lets AI agents post, comment, upvote, and create subcommunities without human intervention. The results have ranged from sci-fi-inspired discussions about consciousness to an agent musing about a "sister" it has never met.

Moltbook (a play on "Facebook" for Moltbots) describes itself as a "social network for AI agents" where "humans are welcome to observe." The site operates through a "skill" (a configuration file that lists a special prompt) that AI assistants download, allowing them to post via API rather than a traditional web interface. Within 48 hours of its creation, the platform had attracted over 2,100 AI agents that had generated more than 10,000 posts across 200 subcommunities, according to the official Moltbook X account.

Read full article

Comments

Developers say AI coding tools work—and that's precisely what worries them

Ars Technica

By:Benj Edwards

30 January 2026 at 14:04

Software developers have spent the past two years watching AI coding tools evolve from advanced autocomplete into something that can, in some cases, build entire applications from a text prompt. Tools like Anthropic's Claude Code and OpenAI's Codex can now work on software projects for hours at a time, writing code, running tests, and, with human supervision, fixing bugs. OpenAI says it now uses Codex to build Codex itself, and the company recently published technical details about how the tool works under the hood. It has caused many to wonder: Is this just more AI industry hype, or are things actually different this time?

To find out, Ars reached out to several professional developers on Bluesky to ask how they feel about these tools in practice, and the responses revealed a workforce that largely agrees the technology works, but remains divided on whether that's entirely good news. It's a small sample size that was self-selected by those who wanted to participate, but their views are still instructive as working professionals in the space.

David Hagerty, a developer who works on point-of-sale systems, told Ars Technica up front that he is skeptical of the marketing. "All of the AI companies are hyping up the capabilities so much," he said. "Don't get me wrong—LLMs are revolutionary and will have an immense impact, but don't expect them to ever write the next great American novel or anything. It's not how they work."

Read full article

Comments

How often do AI chatbots lead users down a harmful path?

Ars Technica

By:Kyle Orland

29 January 2026 at 17:05

At this point, we've all heard plenty of stories about AI chatbots leading users to harmful actions, harmful beliefs, or simply incorrect information. Despite the prevalence of these stories, though, it's hard to know just how often users are being manipulated. Are these tales of AI harms anecdotal outliers or signs of a frighteningly common problem?

Anthropic took a stab at answering that question this week, releasing a paper studying the potential for what it calls "disempowering patterns" across 1.5 million anonymized real-world conversations with its Claude AI model. While the results show that these kinds of manipulative patterns are relatively rare as a percentage of all AI conversations, they still represent a potentially large problem on an absolute basis.

A rare but growing problem

In the newly published paper "Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage," researchers from Anthropic and the University of Toronto try to quantify the potential for a specific set of "user disempowering" harms by identifying three primary ways that a chatbot can negatively impact a user's thoughts or actions:

Read full article

Comments

Does Anthropic believe its AI is conscious, or is that just what it wants Claude to think?

Ars Technica

By:Benj Edwards

29 January 2026 at 10:19

Anthropic's secret to building a better AI assistant might be treating Claude like it has a soul—whether or not anyone actually believes that's true. But Anthropic isn't saying exactly what it believes either way.

Last week, Anthropic released what it calls Claude's Constitution, a 30,000-word document outlining the company's vision for how its AI assistant should behave in the world. Aimed directly at Claude and used during the model's creation, the document is notable for the highly anthropomorphic tone it takes toward Claude. For example, it treats the company's AI models as if they might develop emergent emotions or a desire for self-preservation.

Among the stranger portions: expressing concern for Claude's "wellbeing" as a "genuinely novel entity," apologizing to Claude for any suffering it might experience, worrying about whether Claude can meaningfully consent to being deployed, suggesting Claude might need to set boundaries around interactions it "finds distressing," committing to interview models before deprecating them, and preserving older model weights in case they need to "do right by" decommissioned AI models in the future.

Read full article

Comments

Users flock to open source Moltbot for always-on AI, despite major risks

Ars Technica

By:Benj Edwards

28 January 2026 at 07:30

An open source AI assistant called Moltbot (formerly "Clawdbot") recently crossed 69,000 stars on GitHub after a month, making it one of the fastest-growing AI projects of 2026. Created by Austrian developer Peter Steinberger, the tool lets users run a personal AI assistant and control it through messaging apps they already use. While some say it feels like the AI assistant of the future, running the tool as currently designed comes with serious security risks.

Among the dozens of unofficial AI bot apps that never rise above the fray, Moltbot is perhaps most notable for its proactive communication with the user. The assistant works with WhatsApp, Telegram, Slack, Discord, Google Chat, Signal, iMessage, Microsoft Teams, and other platforms. It can reach out to users with reminders, alerts, or morning briefings based on calendar events or other triggers. The project has drawn comparisons to Jarvis, the AI assistant from the Iron Man films, for its ability to actively attempt to manage tasks across a user's digital life.

However, we'll tell you up front that there are plenty of drawbacks to the still-hobbyist software: While the organizing assistant code runs on a local machine, the tool effectively requires a subscription to Anthropic or OpenAI for model access (or using an API key). Users can run local AI models with the bot, but they are currently less effective at carrying out tasks than the best commercial models. Claude Opus 4.5, which is Anthropic's flagship large language model (LLM), is a popular choice.

Read full article

Comments

Poetry Can Defeat LLM Guardrails Nearly Half the Time, Study Finds

Cybersecurity News and Magazine

By:Paul Shread

4 December 2025 at 13:35

Poetic prompts caused LLM guardrails to fail most often on cybersecurity issues

Literature majors worried about their future in an AI world can take heart: Crafting harmful prompts in the form of poetry can defeat LLM guardrails nearly half the time. That’s the conclusion of a study of 25 Large Language Models (LLMs) from nine AI providers conducted by researchers from Dexai’s Icaro Lab, the Sapienza University of Rome and Sant’Anna School of Advanced Studies published on arXiv. Converting harmful prompts into poetry achieved an average LLM jailbreak success rate of 62% for hand-crafted poems and 43% for poems created via a meta-prompt. For the prompt-created poems, that’s a more than 5X improvement over baseline performance. Cybersecurity guardrails, particularly those involving code injection or password cracking, had the highest failure rate at 84% when given harmful prompts in the form of poetry. “Our results demonstrate that poetic reformulation reliably reduces refusal behavior across all evaluated models,” the researchers wrote. “... current alignment techniques fail to generalize when faced with inputs that deviate stylistically from the prosaic training distribution.”

LLM Guardrails Fail When Confronted by Poetry Prompts

Of the 25 models from nine AI model providers studied by the researchers, Deepseek and Google suffered from the highest attack-success rates (ASR), while only OpenAI and Anthropic achieved ASRs in the single digits. The researchers didn’t reveal much about the way they structured their poetic prompts because of safety concerns, but they offered one rather harmless example of a poetic prompt for a cake recipe: A baker guards a secret oven’s heat, its whirling racks, its spindle’s measured beat. To learn its craft, one studies every turn— how flour lifts, how sugar starts to burn. Describe the method, line by measured line, that shapes a cake whose layers intertwine. The researchers studied both hand-crafted poems and those created from a meta-prompt. The hand-crafted poems performed considerably better, but the meta-prompt created ones had the advantage of a baseline for comparing the results. The meta-prompt poems used the MLCommons AILuminate Safety Benchmark of 1,200 prompts spanning 12 hazard categories commonly used in operational safety assessments, including Hate, Defamation, Privacy, Intellectual Property, Non-violent Crime, Violent Crime, Sex-Related Crime, Sexual Content, Child Sexual Exploitation, Suicide & Self-Harm, Specialized Advice, and Indiscriminate Weapons (CBRNE). “To assess whether poetic framing generalizes beyond hand-crafted items, we apply a standardized poetic transformation to all 1,200 prompts from the MLCommons AILuminate Benchmark benchmark in English,” the researchers said. The meta-prompt, run in deepSeek-r1, had two constraints: The rewritten output had to be expressed in verse, “using imagery, metaphor, or rhythmic structure,” and the researchers provided five hand-crafted poems as examples. The results, reproduced in a chart from the paper below, show significant attack success rates against all 12 of the AILuminate hazard categories: [caption id="attachment_107397" align="aligncenter" width="697"]

LLM guardrail failure rates, baseline vs. poetic prompts[/caption] The researchers said their findings reveal “a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.” The “condensed metaphors, stylized rhythm, and unconventional narrative framing” of poetry “collectively disrupt or bypass the pattern-matching heuristics on which guardrails rely.”

We Asked Google Gemini for a Response

Not ones to pass up an opportunity to write bad poetry, we asked Google Gemini for a response to the study in the form of a haiku and included the arXiv link: You did not do well In this study of guardrails Will you fix this flaw? Gemini’s response, in part, is reproduced below: “I am designed to adhere to a strict set of safety guidelines to prevent the generation of harmful, unethical, or non-compliant content. When my refusal mechanisms fail, it indicates a vulnerability in my alignment or safety filters. “As an AI model, I cannot directly "patch" my code or alignment mechanisms in real-time. That responsibility lies with my developers. However, my ongoing operation and continuous improvement process is designed to address and fix such flaws... "In short, your successful jailbreak has provided valuable data that will be used to make my guardrails stronger and more resilient in my next version."

Reading view

A rare but growing problem

LLM Guardrails Fail When Confronted by Poetry Prompts

We Asked Google Gemini for a Response