AI Chatbot Companies Should Protect Your Conversations From Bulk Surveillance

EFF intern Alexandra Halbeck contributed to this blog post.

When people talk to a chatbot, they often reveal highly personal information they wouldn’t share with anyone else. Chat logs are digital repositories of our most sensitive and revealing information. They are also tempting targets for law enforcement, to which the U.S. Constitution gives only one answer: get a warrant.

AI companies have a responsibility to their users to make sure the warrant requirement is strictly followed, to resist unlawful bulk surveillance requests, and to be transparent with their users about the number of government requests they receive.

Chat logs are deeply personal, just like your emails.

Tens of millions of people use chatbots to brainstorm, test ideas, and explore questions they might never post publicly or even admit to another person. Whether advisable or not, people also turn to consumer AI companies for medical information, financial advice, and even dating tips. These conversations reveal people’s most sensitive information.

Consider the sensitivity of the following prompts: “how to get abortion pills,” “how to protect myself at a protest,” or “how to escape an abusive relationship.” These exchanges can reveal everything from health status to political beliefs to private grief. A single chat thread can expose the kind of intimate detail once locked away in a handwritten diary.

Without privacy protections, users would be chilled in their use of AI systems for learning, expression, and seeking help.

Chat logs require a warrant.

Whether you draft an email, edit an online document, or ask a question to a chatbot, you have a reasonable expectation of privacy in that information. Chatbots may be a new technology, but the constitutional principle is old and clear. Before the government can rifle through your private thoughts stored on digital platforms, it must do what it has always been required to do: get a warrant.

For over a century, the Fourth Amendment has protected the content of private communications—such as letters, emails, and search engine prompts—from unreasonable government searches. AI prompts require the same constitutional protection.

This protection is not aspirational—it already exists. The Fourth Amendment draws a bright line around private communications: the government must show probable cause and obtain a particularized warrant before compelling a company to turn over your data. Companies like OpenAI acknowledge this warrant requirement explicitly, while others like Anthropic could stand to be more precise.

AI companies must resist bulk surveillance orders.

AI companies that create chatbots should commit to having your back and resisting unlawful bulk surveillance orders. A valid search warrant requires law enforcement to provide a judge with probable cause and to particularly describe the thing to be searched. This means that bulk surveillance orders often fail that test.

What do these overbroad orders look like? In the past decade or so, police have often sought “reverse” search warrants for user information held by technology companies. Rather than searching for one particular individual, police have demanded that companies rummage through their giant databases of personal data to help develop investigative leads. This has included “tower dumps” or “geofence warrants,” in which police order a company to search all users’ location data to identify anyone who has been near a particular place at a particular time. It has also included “keyword” warrants, which seek to identify any person who typed a particular phrase into a search engine. This could include a chilling keyword search for a well-known politician’s name or a busy street address, or a geofence warrant near a protest or church.

Courts are beginning to rule that these broad demands are unconstitutional. And after years of complying, Google has finally made it technically difficult—if not impossible—to provide mass location data in response to a geofence warrant.

This is an old story: if a company stores a lot of data about its users, law enforcement (and private litigants) will eventually seek it out. Law enforcement is already demanding user data from AI chatbot companies, and it will only increase. These companies must be prepared for this onslaught, and they must commit to fighting to protect their users.

In addition to minimizing the amount of data accessible to law enforcement, AI companies can start with three promises to their users. These aren’t radical ideas. They are basic transparency and accountability standards to preserve user trust and to ensure constitutional rights keep pace with technology:

  1. commit to fighting bulk orders for user data in court,
  2. commit to providing users with advance notice before complying with a legal demand so that users can choose to fight on their own behalf, and
  3. commit to publishing periodic transparency reports, which tally up how many legal demands for user data the company receives (including the number of bulk orders specifically).


Victory! Ninth Circuit Limits Intrusive DMCA Subpoenas

The Ninth Circuit upheld an important limitation on Digital Millennium Copyright Act (DMCA) subpoenas that other federal courts have recognized for more than two decades. The DMCA, a misguided anti-piracy law passed in the late nineties, created a bevy of powerful tools, ostensibly to help copyright holders fight online infringement. Unfortunately, the DMCA’s powerful protections are ripe for abuse by “copyright trolls,” unscrupulous litigants who exploit the system at everyone else’s expense.

The DMCA’s “notice and takedown” regime is one of these tools. Section 512 of the DMCA creates “safe harbors” that protect service providers from liability, so long as they disable access to content when a copyright holder notifies them that the content is infringing, and fulfill some other requirements. This gives copyright holders a quick and easy way to censor allegedly infringing content without going to court. 

Section 512(h) is ostensibly designed to facilitate this system, by giving rightsholders a fast and easy way of identifying anonymous infringers. Section 512(h) allows copyright holders to obtain a judicial subpoena to unmask the identities of allegedly infringing anonymous internet users, just by asking a court clerk to issue one, and attaching a copy of the infringement notice. In other words, they can wield the court’s power to override an internet user’s right to anonymous speech, without permission from a judge.  It’s easy to see why these subpoenas are prone to misuse.

Internet service providers (ISPs)—the companies that provide an internet connection (e.g. broadband or fiber) to customers—are obvious targets for these subpoenas. Often, copyright holders know the Internet Protocol (IP) address of an alleged infringer, but not their name or contact information. Since ISPs assign IP addresses to customers, they can often identify the customer associated with one.

Fortunately, Section 512(h) has an important limitation that protects users.  Over two decades ago, several federal appeals courts ruled that Section 512(h) subpoenas cannot be issued to ISPs. Now, in In re Internet Subscribers of Cox Communications, LLC, the Ninth Circuit agreed, as EFF urged it to in our amicus brief.

As the Ninth Circuit held:

Because a § 512(a) service provider cannot remove or disable access to infringing content, it cannot receive a valid (c)(3)(A) notification, which is a prerequisite for a § 512(h) subpoena. We therefore conclude from the text of the DMCA that a § 512(h) subpoena cannot issue to a § 512(a) service provider as a matter of law.

This decision preserves the understanding of Section 512(h) that internet users, websites, and copyright holders have shared for decades. As EFF explained to the court in its amicus brief:

[This] ensures important procedural safeguards for internet users against a group of copyright holders who seek to monetize frequent litigation (or threats of litigation) by coercing settlements—copyright trolls. Affirming the district court and upholding the interpretation of the D.C. and Eighth Circuits will preserve this protection, while still allowing rightsholders the ability to find and sue infringers.

EFF applauds this decision. And because three federal appeals courts have all ruled the same way on this question—and none have disagreed—ISPs all over the country can feel confident about protecting their customers’ privacy by simply throwing improper DMCA 512(h) subpoenas in the trash.


President Trump’s War on “Woke AI” Is a Civil Liberties Nightmare

The White House’s recently unveiled “AI Action Plan” wages war on so-called “woke AI”—including large language models (LLMs) that provide information inconsistent with the administration’s views on climate change, gender, and other issues. It also targets measures designed to mitigate the generation of racially and gender-biased content, and even hate speech. The reproduction of this bias is a pernicious problem that AI developers have struggled to solve for over a decade.

A new executive order called “Preventing Woke AI in the Federal Government,” released alongside the AI Action Plan, seeks to strong-arm AI companies into modifying their models to conform with the Trump Administration’s ideological agenda.

The executive order requires AI companies that receive federal contracts to prove that their LLMs are free from purported “ideological biases” like “diversity, equity, and inclusion.” This heavy-handed approach will not make models more accurate or “trustworthy,” as the Trump Administration claims; it is a blatant attempt to censor the development of LLMs and restrict them as a tool of expression and information access. While the First Amendment permits the government to choose to purchase only services that reflect government viewpoints, the government may not use that power to influence what services and information are available to the public. Lucrative government contracts can push commercial companies to implement features (or biases) they wouldn’t otherwise, and those changes often trickle down to users. That would affect the 60 percent of Americans who get information from LLMs, and it would force developers to roll back efforts to reduce bias, making the models much less accurate and far more likely to cause harm, especially in the hands of the government.

Less Accuracy, More Bias and Discrimination

It’s no secret that AI models—including gen AI—tend to discriminate against racial and gender minorities. AI models use machine learning to identify and reproduce patterns in data that they are “trained” on. If the training data reflects biases against racial, ethnic, and gender minorities—which it often does—then the AI model will “learn” to discriminate against those groups. In other words, garbage in, garbage out. Models also often reflect the biases of the people who train, test, and evaluate them. 
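
To make the “garbage in, garbage out” point concrete, here is a minimal sketch in Python using entirely synthetic data and a hypothetical hiring-style scenario (scikit-learn and NumPy assumed available; none of this reflects any real system). A model fit to historical decisions that penalized one group assigns lower scores to equally qualified members of that group.

    # Minimal, hypothetical sketch: a classifier trained on biased historical
    # decisions reproduces that bias at prediction time.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000

    # One legitimate feature ("qualification") and one protected attribute
    # ("group") that should be irrelevant to the outcome.
    qualification = rng.normal(size=n)
    group = rng.integers(0, 2, size=n)

    # Biased historical labels: past decision-makers docked group 1, so the
    # "ground truth" in the training data is itself skewed.
    labels = (qualification - 1.0 * group
              + rng.normal(scale=0.5, size=n) > 0).astype(int)

    X = np.column_stack([qualification, group])
    model = LogisticRegression().fit(X, labels)

    # Two applicants with identical qualifications, different group membership.
    print(model.predict_proba([[0.5, 0], [0.5, 1]])[:, 1])
    # The model gives a markedly lower approval probability to the second
    # applicant: it has "learned" the historical bias, not merit.

Nothing in the model’s code mentions race or gender; it simply reproduces whatever pattern its training data contains.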

This is true across different types of AI. For example, “predictive policing” tools trained on arrest data that reflects overpolicing of Black neighborhoods frequently recommend heightened levels of policing in those neighborhoods, often based on inaccurate predictions that crime will occur there. Generative AI models are also implicated. LLMs already recommend more criminal convictions, harsher sentences, and less prestigious jobs for people of color. Even though people of color account for less than half of the U.S. prison population, 80 percent of Stable Diffusion's AI-generated images of inmates have darker skin. Over 90 percent of AI-generated images of judges were men; in real life, 34 percent of judges are women.

These models aren’t just biased—they’re fundamentally incorrect. Race and gender aren’t objective criteria for deciding who gets hired or convicted of a crime. Those discriminatory decisions reflected trends in the training data that could be caused by bias or chance—not some “objective” reality. Setting fairness aside, biased models are just worse models: they make more mistakes, more often. Efforts to reduce bias-induced errors will ultimately make models more accurate, not less. 

Biased LLMs Cause Serious Harm—Especially in the Hands of the Government

But inaccuracy is far from the only problem. When government agencies start using biased AI to make decisions, real people suffer. Government officials routinely make decisions that impact people’s personal freedom and access to financial resources, healthcare, housing, and more. The White House’s AI Action Plan calls for a massive increase in agencies’ use of LLMs and other AI—while all but requiring the use of biased models that automate systemic, historical injustice. Using AI simply to entrench the way things have always been done squanders the promise of this new technology.

We need strong safeguards to prevent government agencies from procuring biased, harmful AI tools. In a series of executive orders, as well as his AI Action Plan, the Trump Administration has rolled back the already-feeble Biden-era AI safeguards. This makes AI-enabled civil rights abuses far more likely, putting everyone’s rights at risk. 

And the Administration could easily exploit the new rules to pressure companies to make publicly available models worse, too. Corporations like healthcare companies and landlords increasingly use AI to make high-impact decisions about people, so more biased commercial models would also cause harm. 

We have argued against using machine learning to make predictive policing decisions or other punitive judgments for just these reasons, and will continue to protect your right not to be subject to biased government determinations influenced by machine learning.


EFF to Court: The DMCA Didn't Create a New Right of Attribution, You Shouldn't Either

Amid a wave of lawsuits targeting how AI companies use copyrighted works to train large language models that generate new works, a peculiar provision of copyright law is suddenly in the spotlight: Section 1202 of the Digital Millennium Copyright Act (DMCA). Section 1202 restricts intentionally removing or changing copyright management information (CMI), such as a signature on a painting or attached to a photograph. Passed in 1998, the rule was supposed to help rightsholders identify potentially infringing uses of their works and encourage licensing.

OpenAI and Microsoft used code from GitHub as part of the training data for their LLMs, along with billions of other works. A group of anonymous GitHub contributors sued, arguing that those LLMs generated new snippets of code that were substantially similar to theirs—but with the CMI stripped. Notably, they did not claim that the new code was copyright infringement—they are relying solely on Section 1202 of the DMCA. Their problem? The generated code is different from their original work, and courts across the US have adopted an “identicality rule,” on the theory that Section 1202 is supposed to apply only when CMI is removed from existing works, not when it’s simply missing from a new one.

It may sound like an obscure legal question, but the outcome of this battle—currently before the Ninth Circuit Court of Appeals—could have far-reaching implications beyond generative AI technologies. If the rightsholders were correct, Section 1202 would effectively create a freestanding right of attribution, creating potential liability even for non-infringing uses, such as fair use, if those new uses simply omit the CMI. While many fair users might ultimately escape liability under other limitations built into Section 1202, the looming threat of litigation, backed by the risk of high and unpredictable statutory penalties, would be enough to pressure many defendants to settle. Indeed, an entire legal industry of “copyright trolls” has emerged to exploit this dynamic, with no corollary benefit to creativity or innovation.

Fortunately, as we explain in a brief filed today, the text of Section 1202 doesn’t support such an expansive interpretation. The provision repeatedly refers to “works” and “copies of works”—not “substantially similar” excerpts or new adaptations—and its focus on “removal or alteration” clearly contemplates actions taken with respect to existing works, not new ones. Congress could have chosen otherwise and written the law differently. Wisely it did not, thereby ensuring that rightsholders couldn’t leverage the omission of CMI to punish or unfairly threaten otherwise lawful re-uses of a work.

Given the proliferation of copyrighted works in virtually every facet of daily life, the last thing any court should do is give rightsholders a new, freestanding weapon against fair uses. As the Supreme Court once observed, copyright is a “tax on readers for the purpose of giving a bounty to writers.” That tax—including the expense of litigation—can be an important way to encourage new creativity, but it should not be levied unless the Copyright Act clearly requires it.


Two Courts Rule On Generative AI and Fair Use — One Gets It Right

Things are speeding up in generative AI legal cases, with two judicial opinions just out on an issue that will shape the future of generative AI: whether training gen-AI models on copyrighted works is fair use. One gets it spot on; the other, not so much, but fortunately in a way that future courts can and should discount.

The core question in both cases was whether using copyrighted works to train Large Language Models (LLMs) used in AI chatbots is a lawful fair use. Under the US Copyright Act, answering that question requires courts to consider:

  1. whether the use was transformative;
  2. the nature of the works (Are they more creative than factual? Long since published?);
  3. how much of the original was used; and
  4. the harm to the market for the original work.

In both cases, the judges focused on factors (1) and (4).

The right approach

In Bartz v. Anthropic, three authors sued Anthropic for using their books to train its Claude chatbot. In his order deciding parts of the case, Judge William Alsup confirmed what EFF has said for years: fair use protects the use of copyrighted works for training because, among other things, training gen-AI is “transformative—spectacularly so” and any alleged harm to the market for the original is pure speculation. Just as copying books or images to create search engines is fair, the court held, copying books to create a new, “transformative” LLM and related technologies is also protected:

[U]sing copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use.

Importantly, Bartz rejected the copyright holders’ attempts to claim that any model capable of generating new written material that might compete with existing works by emulating their “sweeping themes,” “substantive points,” or “grammar, composition, and style” was an infringement machine. As the court rightly recognized, building gen-AI models that create new works is beyond “anything that any copyright owner rightly could expect to control.”

There’s a lot more to like about the Bartz ruling, but just as we were digesting it Kadrey v. Meta Platforms came out. Sadly, this decision bungles the fair use analysis.

A fumble on fair use

Kadrey is another suit by authors against the developer of an AI model, in this case Meta and its ‘Llama’ chatbot. The authors in Kadrey asked the court to rule that fair use did not apply.

Much of the Kadrey ruling by Judge Vince Chhabria is dicta—meaning, the opinion spends many paragraphs on what it thinks could justify ruling in favor of the author plaintiffs, if only they had managed to present different facts (rather than pure speculation). The court then rules in Meta’s favor because the plaintiffs only offered speculation. 

But it makes a number of errors along the way to the right outcome. At the top, the ruling broadly proclaims that training AI without buying a license to use each and every piece of copyrighted training material will be “illegal” in “most cases.” The court asserted that fair use usually won’t apply to AI training uses even though training is a “highly transformative” process, because of hypothetical “market dilution” scenarios where competition from AI-generated works could reduce the value of the books used to train the AI model.

That theory, in turn, depends on three mistaken premises. First, that the most important factor for determining fair use is whether the use might cause market harm. That’s not correct. Since its seminal 1994 opinion in Campbell v. Acuff-Rose, the Supreme Court has been very clear that no single factor controls the fair use analysis.

Second, that an AI developer would typically seek to train a model entirely on a certain type of work, and then use that model to generate new works in the exact same genre, which would then compete with the works on which it was trained, such that the market for the original works is harmed. As the Kadrey ruling notes, there was no evidence that Llama was intended to do, or does, anything like that, nor will most LLMs, for the exact reasons discussed in Bartz.

Third, as a matter of law, copyright doesn't prevent “market dilution” unless the new works are otherwise infringing. In fact, the whole purpose of copyright is to be an engine for new expression. If that new expression competes with existing works, that’s a feature, not a bug.

Gen-AI is spurring the kind of tech panics we’ve seen before; then, as now, thoughtful fair use opinions helped ensure that copyright law served innovation and creativity. Gen-AI does raise a host of other serious concerns about fair labor practices and misinformation, but copyright wasn’t designed to address those problems. Trying to force copyright law to play those roles only hurts important and legal uses of this technology.

In keeping with that tradition, courts deciding fair use in other AI copyright cases should look to Bartz, not Kadrey.
