AI misuse

There are a few different angles to the debate around the use of authors’ work to train AI models, though I’ll give you the bottom line: Authors are going to end up getting screwed one way or another.

Why is that? Because current copyright law isn't structured to enforce authors' rights in this circumstance, while any future protections for authors will likely be watered down to ensure they don't impede progress in AI development.

I’ll explain.

First, on current authors' work that's been used to train AI models. Most authors are rightly outraged that their books have been used, without their permission, to build systems that are going to cannibalize their industry, and it's true that certain AI developers have essentially stolen authors' work.

Meta has been a key target in this respect because, according to reports, it knowingly loaded its AI training dataset with content from LibGen, a “shadow library” which illegally hosts millions of copyrighted books and academic papers.

To be clear, Meta’s likely not the only AI developer to have accessed the LibGen database. But because Meta generates hundreds of billions of dollars from ads, and it’s not paying authors for the use of their work, many people have made Meta the main focus of their anger on this front.

Indeed, several high-profile authors took Meta to court on this front, after demonstrating that Meta’s AI tools were able to accurately recreate large segments of their work. This, the authors claimed, was evidence of copyright infringement, and they sought damages through the US court system.

And failed.

Why?

A US Federal Court ruled that Meta had not violated copyright in building its AI datasets with LibGen content, because the judge in the case found that the company’s use of these works was for a “transformative” purpose, and that Meta’s AI tools are not designed to create competing works.

This comes down to legal technicalities: Yes, Meta likely took this content from LibGen and loaded it into its system. But the copyright infringement in this case was committed by the LibGen database, not by Meta, which merely reused information it accessed in that dataset.

The judge’s view is that AI tools are not designed to infringe on copyright, as such, because they create new works, and only use these datasets as context. It’s a bit like trying to hold a car manufacturer responsible for a person speeding: Meta only built the tool, it doesn’t control what people might do with it.

That also means, as the judge noted, that there may be specific cases where copyright is infringed by AI tools. If you could show that somebody had repurposed your work specifically, and started profiting from that, then there could be a case for compensation. But that would be assessed on a case-by-case basis, not prosecuted under a broader umbrella ruling on such use.

The actual violator in this case would be LibGen, which has proven difficult to shut down, because it regularly shifts domain names, and is hosted in Russia.

But because Meta’s not republishing this content directly, and profiting from that, the legal technicalities mean that Meta will likely be able to argue that such access is “fair use,” so authors won’t be getting any compensation from, or control over, Meta on this front.

This is also further complicated by a broader push by big tech firms to encourage leniency on AI development, in order to ensure that the US is able to lead the AI race.

The White House, for example, recently outlined its AI action plan, in which it noted that it’ll be looking to:

“…work with all Federal agencies to identify, revise, or repeal regulations, rules, memoranda, administrative orders, guidance documents, policy statements, and interagency agreements that unnecessarily hinder AI development or deployment.”

Governments around the world are looking to drive AI innovation, and under pressure from the US, may find it difficult to build in protections that could impede it.

Though some are still exploring their options.

The Australian Productivity Commission (APC) is exploring ways to protect Australian authors’ work, and build AI-specific regulations into future publishing contracts, for example.

Though many don’t believe that the APC is overly invested in the issue, and it seems unlikely that there’ll be any significant regulatory shifts to ensure greater protections. That’s further complicated by the existence of datasets like LibGen, which already operate outside the law; if AI developers are legally allowed to access such datasets, it’s unclear what the APC can actually do.

So, essentially, Australian authors, and authors in general, don’t have a lot of legal recourse for AI misuse, unless there’s a specific case where you can demonstrate that AI tools have been used to rip off your work, and have harmed your business opportunities as a result.

Which sounds a little defeatist, but this is based on the rules as they stand, and the legal precedent established by past cases.

You can’t simply argue that you don’t like it, or that you don’t want your work used. You need to prove that it has been used, for one, then show that you’ve lost out as a result.

Which means that there will be AI copyright cases, but in general, they’ll be difficult to litigate. Which means that authors are going to get screwed. One way or another.
