California’s A.B. 412: Championing Transparency or Hindering Innovation?
California legislators are currently deliberating on Assembly Bill 412 (A.B. 412), a bill that would require AI developers to track and disclose all copyrighted works used in training their AI models. While the intention might appear to be a straightforward step toward transparency, the practical implications could be far-reaching, potentially stifling small AI startups and inadvertently consolidating power in the hands of larger tech corporations.
An Onerous Burden on Small Developers
The AI sector is undergoing a period of rapid growth, but there’s a risk of it being dominated by large, well-funded companies. While these tech giants often capture headlines, there’s a vibrant ecosystem of smaller AI companies, some with fewer than ten employees, striving to create innovative solutions in niche areas. A.B. 412 imposes a considerable burden on these developers, requiring them—even those operating as a two-person team or hobbyists experimenting with small software projects—to identify copyrighted materials used in their AI training.
This requirement is inherently complex, particularly given the current state of copyright registration in the U.S. The registration system is not user-friendly or easily searchable, functioning more like an antiquated card catalog than a modern database. It lacks the necessary tools to accurately identify all authors of a work, making it difficult for developers to reliably match materials in their training sets to information within the copyright system. Even for major tech companies, adhering to these new obligations would be a significant challenge.
For small startups, the additional cost of compliance could be a death sentence. If A.B. 412 becomes law, these smaller companies will be forced to divert their limited resources to navigate an unworkable compliance structure, inevitably taking focus away from development and innovation. The risk of facing legal action, potentially from copyright trolls, could deter new ventures from even entering the AI field.
AI Training: A Question of Fair Use
A.B. 412 is based on the problematic assumption that the use of publicly available web content for AI training constitutes copyright infringement. Courts are likely to find that the vast majority of this activity is fair use. It’s a cornerstone of internet law that certain types of online content copying are considered transformative and therefore legal under fair use. Examples include reproducing thumbnail images for image searches or snippets of text used in book searches.
The U.S. copyright system aims to strike a balance between fostering innovation and protecting the rights of creators, and the courts are still in the process of determining how copyright rules should apply to AI training. In many current AI-related cases, courts have not yet reviewed how fair use should apply in this context.
A.B. 412 could pre-empt this process by establishing a vague, excessively broad standard that could ultimately do more harm than good. It is crucial to recognize that these critical court cases will be decided in federal courts. Copyright is governed by federal law, and A.B. 412 inappropriately attempts to impose state-level copyright legislation on a still-evolving issue.
The Unintended Consequence: A Gift to Big Tech
The irony of A.B. 412 is that instead of halting AI development, it may concentrate it within the largest corporations. Big tech firms already have the resources to navigate legal and regulatory complexities. They can afford the costs of compliance, or at least give the impression of compliance, with A.B. 412’s burdensome requirements. Meanwhile, small developers could be forced out of the market or compelled to form partnerships, losing their independence.
The eventual result could be less competition, fewer innovations, and a tech landscape dominated by a handful of enormous companies. Even if lawmakers address some of the problems with A.B. 412 and manage to pass a version of it, they could force programmers to research, and effectively pay off, copyright owners before writing a line of code. If this happens in California, Big Tech will not be discouraged. It will benefit. Only a handful of companies own extensive content libraries or can afford to license the materials necessary to develop a deep learning model. Opportunities for startups and small programmers will shrink, and competition will be so limited that large companies' profits will grow and stay elevated for a generation.