
November 21, 2024

HarperCollins AI deal: what authors need to know

HarperCollins has confirmed recent reports of an agreement with an unnamed AI technology company to allow “limited use of nonfiction backlist titles for training AI models.” The publisher is in consultation with authors about this opt-in arrangement, offering USD $2,500 per title for a three-year license. This is the first AI licensing agreement to be reached by a major trade publisher and it involves over 150 Australian authors as part of a larger international deal. It is therefore more important than ever for authors and illustrators to come to grips with AI so they can make informed decisions about whether or not they wish to license their work.

The ASA supports and has been advocating for appropriate licensing to replace the uncontrolled, unlicensed, unremunerated use of authors’ and illustrators’ work to train AI models like ChatGPT or Llama. At the same time as we seek redress and compensation for this original infringing use, we are calling for a licensing regime that gives authors the right to say yes or no, to impose limits on licences, and to be compensated fairly if they opt in.

While there are elements of the HarperCollins licensing agreement that raise concerns – which we detail below – it is a step in the right direction that authors are being consulted, their explicit permission is required, and the publisher has set guardrails around use of the material. In contrast, deals negotiated by some educational publishers reportedly involved no consultation with authors, and no ability to opt out. The HarperCollins agreement represents a welcome acknowledgement from publishers that they must seek permission from creators for AI licensing. 

The details of the deal

The ASA has reviewed news reports and spoken with HarperCollins Australia CEO, Jim Demetriou, to better understand the terms of the arrangement.

Three-year licence

The agreement involves a non-exclusive three-year licensing term for use of the authors’ work to train the tech company’s AI model. The AI company would need to renew the licence thereafter, or cease using the material for training, fine-tuning, or testing.

While we are supportive of setting term limits around use of material, more transparency is required to understand how this is feasible. Over the last two years, AI tech companies have maintained that removing material from their LLMs is extraordinarily difficult. Not only is there an overwhelming volume of data ingested by generative AI models; the models also learn statistical relationships between data points, which are not so simple to “unlearn”. Our understanding is that, increasingly, AI companies can prevent content from being regurgitated at the output level, but once a work has been ingested, it has been ingested.

What’s more, if each new iteration of a generative AI model is built on the last, what incentive is there for the tech company to renew the licence, given the work has already been used to train their model? If it’s the case that the tech company is not removing the material from their LLM – and instead, just not using that material for training of future iterations of the model – then this raises questions about the fairness of the $2,500 payment, which could represent the full extent of the remuneration an author is ever likely to receive under this deal.

The problem is that AI tech companies have been less than transparent about data, their practices, and how their systems work. The question becomes: what assurances can they make about removal of training data after a three-year term? How will this be enforced?

We encourage authors to seek more information about this from their publisher to ensure that they are satisfied their work will be properly safeguarded.

Guardrails

As Publishers Lunch reports, HarperCollins has negotiated four key guardrails as part of the licensing agreement. The AI tech company must:

  1. Limit their model’s outputs to no more than 200 consecutive words and/or five percent of the book’s text across multiple outputs during a user session
  2. Prohibit commercial users from attempting to infringe the copyright of the books used for training, and monitor for infringing uses
  3. Pledge not to scrape data from an agreed-upon list of pirate websites
  4. Promptly address any breaches of copyright once they are brought to their attention
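To make the first guardrail concrete, the sketch below shows one way a provider *might* enforce per-session output limits: track the longest verbatim run of a book's words in each output, cap it at 200 consecutive words, and cap the cumulative reproduced words at five percent of the book per session. This is purely illustrative — the class and function names are our own, and nothing here reflects how the unnamed AI company actually implements the guardrails.

```python
# Illustrative sketch only: one *possible* enforcement of the reported
# output limits (max 200 consecutive words; max 5% of the book's text
# across a session). Names and mechanics are assumptions, not deal terms.

MAX_CONSECUTIVE_WORDS = 200
MAX_SESSION_FRACTION = 0.05


def longest_verbatim_run(output_words, book_words):
    """Length of the longest run of consecutive output words that also
    appears as a contiguous run of words in the book."""
    best = 0
    for i in range(len(output_words)):
        for j in range(len(book_words)):
            if output_words[i] != book_words[j]:
                continue
            # extend the match while both sequences agree word-for-word
            k = 0
            while (i + k < len(output_words) and j + k < len(book_words)
                   and output_words[i + k] == book_words[j + k]):
                k += 1
            best = max(best, k)
    return best


class SessionGuardrail:
    """Tracks how much of one licensed book a user session has reproduced."""

    def __init__(self, book_text):
        self.book_words = book_text.split()
        self.reproduced_words = 0  # cumulative matched words this session

    def allows(self, candidate_output):
        """Return True if the candidate output stays within both limits."""
        run = longest_verbatim_run(candidate_output.split(), self.book_words)
        if run > MAX_CONSECUTIVE_WORDS:
            return False  # single output quotes too long a passage
        self.reproduced_words += run
        session_limit = MAX_SESSION_FRACTION * len(self.book_words)
        return self.reproduced_words <= session_limit
```

Even this naive version illustrates the weakness raised below: the cumulative counter resets with each new session, so the per-session cap alone cannot stop a user harvesting a book in small pieces across many sessions.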


The ASA supports limitations on use of the licensed work to prevent infringing or harmful outputs and further theft of copyright material. That said, there is uncertainty about how the guardrails will work, given the opacity of the technology and the limited extent to which LLMs can currently prevent regurgitation. This makes it difficult to determine whether authors’ rights are adequately protected. For example, how would these guardrails prevent users from circumventing the output restrictions by generating bite-sized chunks of text across multiple sessions?

Additionally, we encourage authors to ask their publisher how their moral rights are being protected. In the case of nonfiction authors especially, the right to attribution or against false attribution may be of some concern, and has not been addressed in these guardrails. 

Compensation

The unnamed technology company has reportedly offered USD $5,000 per title to HarperCollins under this agreement, meaning the USD $2,500 offered to authors represents a 50/50 split. The publisher has defended this split citing the amount of work involved in brokering the deal, as well as the administrative work required to prepare files to send to the AI company.

Regardless, the ASA believes a 50/50 split does not represent fair compensation. Not only is it the authors’ expression and ideas – the text – that are of most value in AI training; it is also authors’ and illustrators’ work that is likely to be displaced or supplanted by generative AI technology. We share the view of the US Authors Guild that “the authors should receive most of the revenue, minus only the equivalent of an agent’s fee, plus what is needed to compensate the publisher for additional labor or rights.” Some reports indicate that the 50/50 split is non-negotiable; however, we encourage our members to seek fairer compensation, as it is primarily their work upon which AI models and licensing depend.

While we note that the HarperCollins agreement is a one-off addendum to existing contracts for deep backlist nonfiction titles, we would not like to see this set a precedent for unfair compensation to creators.

We acknowledge that the landscape for artificial intelligence is shifting rapidly and authors may feel understandably ambivalent about entering into such licensing agreements. On one hand, so much is still unknown, tech companies have acted in bad faith by scraping material from the internet without permission or payment, and this technology has the potential to diminish authors’ and illustrators’ future earnings. On the other, this could represent a welcome new income stream for Australian creators, who are currently earning on average only $18,200 per annum.

You are not alone. If you have been in consultation with your publisher about AI licensing or you have any questions, contact the ASA through our free Member Advice Service, or seek legal advice through our subsidiary law firm, Authors Legal.

HarperCollins statement

Jim Demetriou shared a statement from HarperCollins, which reads: 

HarperCollins has reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance. While we believe this deal is attractive, we respect the various views of our authors, and they have the choice to opt in to the agreement or to pass on the opportunity.

HarperCollins has a long history of innovation and experimentation with new business models. Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams. This agreement, with its limited scope and clear guardrails around model output that respects author’s rights, does that.

The ASA will continue to update members on any new licensing agreements as they arise. Read more about our advocacy on artificial intelligence.