from the reading-by-robots dept
This series of posts explores how we can rethink the intersection of AI, creativity, and policy. From examining outdated regulatory metaphors to questioning copyright norms and highlighting the risks of stifling innovation, each post addresses a different piece of the AI puzzle. Together, they advocate for a more balanced, forward-thinking approach that acknowledges the potential of technological evolution while safeguarding the rights of creators and ensuring AI’s development serves the broader interests of society. You can read the first, second, third, fourth, and fifth posts in the series.
Whenever content is involved, copyright enters the conversation. And when we talk about AI, we’re talking about systems that absorb petabytes of content to meet their training needs. So naturally, copyright issues are at the forefront of the debate.
Interestingly, copyright usually only becomes an issue when there is a perception that someone or something is successful, and that copyright holders are missing out on potential control or revenues. For decades, “reading by robots” has been a part of our digital lives: just think of search engines crawling billions of pages to index them. These robots read far more content than any human ever could. But it wasn’t until AI began learning from this content, and, more crucially, producing content that appeared successful, that rules inspired by the Statute of Anne of 1710 came into play.
The Input Side: Potential Innovation and the Garbage In, Garbage Out Principle
On the input side, generative AI relies heavily on the data it consumes, but under EU law that access is carefully regulated. The 2019 EU Directive on Copyright in the Digital Single Market (CDSM) sets the framework for text and data mining (TDM). Article 3 of the Directive permits TDM for scientific research purposes, while Article 4 allows it more broadly, provided the rightsholder hasn’t expressly reserved their rights; for content made publicly available online, that reservation is typically expected to be expressed by machine-readable means.
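Exactly how a valid Article 4 reservation must be expressed is still debated, but in practice many publishers signal it through robots.txt directives aimed at AI crawlers. As a minimal sketch (the publisher site and page below are hypothetical; the user-agent tokens GPTBot, CCBot, and Google-Extended are real crawler identifiers), a compliant miner could check such a signal with Python’s standard-library robots.txt parser:

```python
# Minimal sketch of checking a robots.txt-style opt-out, one common
# machine-readable convention for an Article 4-style reservation.
# SITE and PAGE are hypothetical; the bot names are real crawler tokens.
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"          # hypothetical publisher site
PAGE = f"{SITE}/articles/some-post"   # hypothetical page to be mined

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetches and parses the live robots.txt

# Check whether known AI-training crawlers may fetch the page.
for bot in ("GPTBot", "CCBot", "Google-Extended"):
    allowed = parser.can_fetch(bot, PAGE)
    print(f"{bot}: {'allowed' if allowed else 'reserved / disallowed'}")
```

Whether a robots.txt entry alone satisfies the Directive’s “expressly reserved” standard remains an open legal question; the sketch only shows the mechanics of reading such a signal.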
With the AI Act, adopted in 2024, referring back to these provisions, we’re left with a raft of questions about the future of AI models. One of the key concerns is the potential for a “data winter”: a scenario in which AI models face limited access to the data they need to evolve and improve.
This brings us to a fundamental concept in AI: Garbage In, Garbage Out. AI models are only as good as the data they are trained on. If access to high-quality, diverse datasets is restricted by rigid copyright rules, AI systems will end up training on lower-quality data, and poor-quality data leads to unreliable, biased, or outright inaccurate outputs. Just as a chef can only make a great dish with fresh ingredients, AI needs high-quality input to deliver reliable, innovative, and useful results. Restricting access on copyright grounds risks leading AI into a data winter where innovation freezes, limited by the garbage fed into the system.
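To make the principle concrete, here is a toy sketch (assuming scikit-learn is available; the dataset, model, and 40% noise level are arbitrary illustrative choices) that trains the same classifier on clean and on corrupted labels and compares the results:

```python
# Toy illustration of Garbage In, Garbage Out: the same model trained on
# clean versus corrupted labels produces very different quality outputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Garbage": flip 40% of the training labels at random.
rng = np.random.default_rng(0)
noisy = y_train.copy()
flip = rng.random(len(noisy)) < 0.4
noisy[flip] = 1 - noisy[flip]

for name, labels in [("clean data", y_train), ("noisy data", noisy)]:
    model = LogisticRegression(max_iter=1000).fit(X_train, labels)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {name}: test accuracy = {acc:.2f}")
```

The absolute numbers are meaningless; the point is the gap between the two runs, which is the “garbage in, garbage out” effect in miniature.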
A data winter not only stifles technological advancement but also risks widening the gap between regions that enforce stricter copyright policies and those that embrace more flexible rules. Ultimately, Europe’s global competitiveness in AI hinges on whether it can provide an environment where AI can access the data it needs without unnecessary restrictions.
But access to diverse data is also important from a cultural perspective: if AI is trained predominantly on Anglo-Saxon or non-European content, it naturally reflects those cultures in its outputs. This could mean that European creativity becomes increasingly marginalised, with AI-generated content lacking in cultural relevance and failing to reflect the diversity of Europe. AI should be a tool that amplifies the diversity of human expression, not one that homogenises it.
Challenges on the Output Side: Copyright Protection for AI-Generated Content
Now let’s look at the output side of generative AI. The assumption that creative works, like movies, video games, or books, are automatically protected by copyright may not hold for AI-generated content. Traditional protection of creative expression hinges on human authorship, and while human contributions such as prompt choices may attract some protection, the level of protection is likely to be much lower than many expect. This could mean that parts of a work, such as AI-generated backgrounds in video games or movies, could be freely copied by others.
This uncertainty could lead to increased pressure from creative industries to modify copyright law, pushing for more familiar levels of protection that might extend copyright to currently unprotected AI-generated content. If such changes happen, we could end up in a spiral where access to knowledge becomes more restricted, stifling creativity and innovation. We’ve seen similar debates before—most notably during the advent of photography, when early courts struggled to determine whether machine-created works could be protected.
The path forward requires a careful balancing act: we need copyright laws that protect human creativity and labour without hampering access to the data that AI—and society—need to innovate and grow. By avoiding a data winter and ensuring AI systems have access to diverse, quality inputs, we can harness AI’s potential to drive the creative industries forward, rather than allow outdated copyright rules to drag progress backward.
Caroline De Cock is a communications and policy expert, author, and entrepreneur. She serves as Managing Director of N-square Consulting and Square-up Agency, and Head of Research at Information Labs. Caroline specializes in digital rights, policy advocacy, and strategic innovation, driven by her commitment to fostering global connectivity and positive change.
Filed Under: access to data, ai, copyright, creativity, creativity and ai, incentives, right to read