
January 24, 2024

Artificial intelligence: the latest developments

There’s been a lot of AI-related news over the holiday break: The New York Times has joined the growing list of writers, illustrators, and organisations suing generative AI developers for copyright infringement; OpenAI has admitted that in-copyright content is essential to train its generative AI systems; AI-generated fake Aboriginal art has begun to crop up; and the Government has released its interim report on safe and responsible AI in Australia.

In late December, The New York Times filed a complaint against OpenAI and Microsoft following the breakdown of months-long negotiations to license The Times’ content for AI training purposes.

The complaint sets out the importance of independent high-quality journalism for democracy and the whole of society and the enormous investment of time, talent, and money required to produce such work. It contends that millions of NYT articles were used by OpenAI – a multi-billion dollar business – to train its automated chatbots, without permission or payment.

The Times sought to address the assertion from OpenAI that such copying is ‘fair use’ and therefore permissible under US law. For context, US law allows for the use of copyright material under particular circumstances according to certain ‘fairness factors’. A key factor in determining ‘fair use’ is whether the use is ‘transformative’; namely whether the use adds something new, with a further purpose or different character, and is not merely a substitute for the original work. Australian copyright law does not include a US-style ‘fair use’ exception. 

According to The Times, “there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it….The law does not permit the kind of systematic and competitive infringement that Defendants have committed.” 

The complaint makes the point that some of ChatGPT’s outputs are verbatim copies of The Times’ content, or summaries and derivatives of The Times’ work that mimic its expressive style, and includes examples in the pleadings. Just as concerning, says The Times, is that ChatGPT has also hallucinated false versions of The Times’ articles or falsely attributed information to The Times, thereby harming The Times’ brand.

OpenAI has disputed the claims in The New York Times’ complaint in a blog post dated 8 January. Like other AI companies, it asserts that copying for AI training purposes is permissible under fair use, and downplays “rare” regurgitation of copyright works by its generative AI systems.

In its blog, OpenAI explains that it is actively working to minimise the regurgitation of training materials:

“[W]e have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.” 

To create software that relies on third-party IP without permission, and then admonish users for using that software to replicate that same IP without regard to OpenAI’s ‘terms of use’, is somewhat galling. It is not yet clear whether the regurgitation problem can be solved.

In its blog, OpenAI also alludes to possible future licensing deals, citing the partnership deals with Associated Press and Axel Springer as examples. On the one hand, this is promising as it hints at the prospect of negotiated agreements. On the other hand, OpenAI and The Times did not reach an agreement after months of negotiation. 

Apple has reportedly commenced discussions with news publishers about using their content to train its generative AI models. To the best of our knowledge, no licensing arrangements are in place with book publishers anywhere in the world.

Many pro-creator commentators have been buoyed by the filing of The New York Times suit, not only because of the thoroughness of the arguments made in the complaint, but because it signals the arrival of media heavyweights in the fight against AI companies’ use of copyright work without permission or payment. (Read an analysis of The Times complaint in Hugh Stephens’ International Copyright Issues blog.)

What’s particularly infuriating for creators is that there are increasing indications that AI developers wilfully used copyright works for AI training even though they knew, or ought to have known, that such use would raise copyright infringement risks. In the class action brought by authors against Meta, the plaintiffs have filed evidence to show that Meta’s lawyers had warned against using the Books3 dataset and advised that AI “models cannot be published if they are trained on that data.” Nevertheless, Meta included Books3 in the training dataset for Llama 1, its large language model.

In early December 2023, in a submission to the UK Government, OpenAI stated, “Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

In our view, this admission confirms what we already believed: OpenAI actively decided to use copyright works for AI training because of the relevance, currency and quality of that work, and it couldn’t have built its Large Language Models without such inputs. (Read more in The Guardian.) That OpenAI is prepared to acknowledge the enormous value of copyright works but not prepared to seek licences and pay for that work – despite its estimated $90 billion valuation – is a source of frustration and anger for the creative community.

Concerns about AI enabling fake Indigenous art are also growing, with such images found for sale on Adobe and Shutterstock. Crikey reported on the concerns of First Nations artists whose work is being used, without permission, to train AI software that then generates images from word prompts. As we raised in our submission to the Government on AI, at a time when the Government is seeking to safeguard First Nations traditional knowledge and cultural expressions, generative AI is undermining Indigenous Cultural and Intellectual Property and threatening the livelihoods of First Nations artists.

Finally, on 17 January, the Federal Minister for Industry and Science, the Hon Ed Husic MP, released the Federal Government’s Interim Report in response to the consultation on Safe and Responsible AI. 

The Report detailed excitement over the potential opportunities offered by AI adoption, including in medical imaging, engineering, managing natural emergencies and improving educational outcomes for children, but also canvassed low public trust in AI systems and broad concern that not enough is being done by the Government to mitigate risks. The Report acknowledged that the speed and scale of AI development is driving this concern and necessitates government intervention.

The Government has adopted a risk-based approach, meaning it will introduce mandatory obligations for developing and deploying AI in high-risk contexts only and allow the use of AI in low-risk settings to flourish unimpeded. 

Overall, we regard this Report as a start – and we know that consultation will be ongoing – but we’re concerned that it reflects an overly cautious approach to regulating AI. We were disappointed to see that while the Report acknowledged that transparency obligations could include labelling or watermarking of outputs, and could require public reporting on the data on which an AI model is trained, these measures will be limited to high-risk settings on a voluntary basis, with mandated requirements left for “longer term consideration”.

We’re yet to understand how high-risk settings will be defined. In our submission to Government, we advocated for mandated transparency on both training inputs and AI-generated outputs.

While the Department of Industry, Science and Resources explicitly stated it was not seeking to consider intellectual property or copyright issues in relation to AI, in our view the harms arising from the way generative AI has been developed must be resolved as an initial priority; the alternative is to condone the enormous land grab at the centre of generative AI training.

Despite the growing list of overseas lawsuits, the Report was light on references to rightsholders’ serious concerns about copyright infringement in AI training and AI-generated outputs, save for a welcome acknowledgement that “data sets could use intellectual property without approval from the owner in a way that breaches relevant intellectual property laws and undermine the legitimate commercial interests of rightsholders.”

The Government also noted that some legislative frameworks, including copyright, may need updating or clarification to remain relevant in an AI context. Lastly, the Report refers to the ongoing work of the Attorney-General’s Department and IP Australia on the implications of AI for copyright law.

Our conclusion is that copyright has been parked as an issue needing further work. We look forward to the establishment of, and input from, the Copyright and AI Reference Group announced by the Attorney-General last December. The ASA will continue to advocate on behalf of authors, including by continuing to call for a commitment from the Government to develop a Code of Conduct on Copyright and AI which makes clear that copyright works used for AI training must be appropriately licensed.