Creative Commons, the nonprofit that pioneered open licensing so creators can share their work while retaining copyright, is stepping into the AI era. The organization has announced a new project, CC Signals, that will let dataset holders specify how machines may reuse their data, particularly for training AI models. The goal is to balance the open nature of the internet against the growing demand for data to train AI systems.
Creative Commons warns that unchecked data extraction could erode the internet's openness, pushing site owners to restrict access or put content behind paywalls rather than share it freely. CC Signals aims to provide a legal and technical framework for sharing datasets and to improve communication between the people who control data and the developers who build AI. Demand for such tools is growing, and companies are already adjusting their policies on user data.
Some are restricting AI training on their data, while others are spelling out how user data may be used for AI purposes. X, for example, changed its privacy policy to let third parties train AI models on its public data, then reversed that decision. Reddit has used its robots.txt file to block AI bots from scraping its data, and Cloudflare is exploring ways to charge AI bots for scraping.
Open-source developers are also building tools to obstruct AI crawlers that ignore no-crawl directives. CC Signals takes a different approach: a set of tools meant to be both legally enforceable and ethically grounded, much like the existing Creative Commons licenses that already cover billions of openly licensed works. Creative Commons CEO Anna Tumadóttir says the signals are meant to sustain the commons in the AI age and help build an open AI ecosystem rooted in reciprocity.
Early designs for CC Signals have been published for public feedback. The organization plans an alpha launch in November 2025 and will hold a series of town halls to gather input.