Consultation on Copyright in the Age of Generative Artificial Intelligence: Submissions C-D (2024)

The information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages and privacy requirements.

C

The Canadian Animation Guild, IATSE Local 938

Technical Evidence

Measures to limit risk of producing copyright-infringing work are distinct to the different levels of authority and creative involvement that individuals have during a production. The workers are responsible for sourcing, producing and editing assets, art and writing. Leadership in studios will decide upon the methods, tools and programs that workers must employ on a project. CAG938 is able to provide perspective on what measures are taken by all of these roles to mitigate risks of copyright infringement.

Workers are required to ensure that assets they produce for clients and studios are free of copyright infringing material. This responsibility is easy to meet when working with standard artist tools because the production of the work employs many conscious choices. An artist can decide to not reuse work they have created for a client in the past, and to not plagiarise work made by others. Any risk they are exposed to in the completion of their work is within their own control because they are exercising their judgement at each step of the process.

If workers are required to use AI in the completion of their work it is no longer easy for an individual worker to ensure the output they produce is free of infringing material. Artists are not generally trained in the skills needed to examine the quantity of data needed to train a generative AI model and ensure that dataset is free of copyright-protected material. Additionally, no production schedule allows them the time to fully examine a dataset, should they in fact have the skills to do so. They would not be able to reasonably remove the ability of an AI model to produce infringing output, and would be far less able to recognize infringing works created by AI.

Most animation studios in Canada are not developing their own datasets and models for generative AI in-house. Studios are licensing pre-existing datasets and models. Our AI in the Workplace report form hosted on the CAG938 website (CAG938.ca) has received reports indicating that studios that are using AI tools are employing Open AI, Midjourney, Chat GPT, Adobe Firefly and Dall-E in their explorations. Because the studios are relying solely on these outside tools, they are not taking a role in the curation of the datasets being used and rely on the developers of the tools to report accurately on the use of copyright-protected data.

It is possible for a human artist to create work that is copyright-infringing without intending to, but it happens rarely and at a rate that is reasonable to correct, while work that is created through generative AI is much more prone to infringing on copyright. Creative products are made through the synthesis of human memory and expression and workers create based on what they have seen and felt. It is easy for a worker to know when they are intentionally plagiarising someone. Studio management can thus rely on the workers they employ to not produce infringing material. Generative AI uses no such self-awareness when working with the massive datasets it relies on. Also, it is drawing on a body of work many times larger than any one artist on a team could hold in mind. The massive datasets make it difficult to recognize infringing output made by the generative AI, because there is no guarantee that the workers on a production would be aware of all of the copyright-protected works in the dataset. Taken together, work created by generative AI is prone to “overfitting” and creating work that would violate copyright protections. The Institute of Electrical and Electronics Engineers (IEEE) recently reported on this issue and assembled a useful demonstration of how easy it is to prompt infringing material from generative AI. (https://spectrum.ieee.org/midjourney-copyright) If the team of artists and management both fail to recognize infringing output that a generative AI is likely to produce, it opens both of these groups up to risk.

Largely, the ability to mitigate risk of generative AI producing infringing content is out scope for individuals making artistic and management choices in the Canadian animation industry. The individuals that do have access to the training data are the only ones that can prevent infringing output being produced by managing the data that is input into the model in the first place.

The creative industries in Canada have been involved in the development of AI systems by no choice of our own. Our local’s members and their peers are being relied upon as producers of training data. This data can include the scraping of workers’ independently produced artworks as well as the work they make for hire in studios. It is a massive amount of labour and human experience. Generative AI would not have developed to the point that it has without our work fed into it.

In very simple terms, the developers that gather the datasets upon which generative AI relies, produce them by using web crawlers, scrapers and data mining. They collect massive amounts of data from across the web and from archives, and sort it into usable sets of data using a system such as a Contrastive Language–Image Pre-training program (CLIP). The CLIP identifies what words describe what images, and from the parsing of those relationships it then predicts what it is being asked for when prompted to create something. This explanation is a simplification of how LAION curates their LAION-5B dataset, used by many developers including Stability AI, Midjourney, and Google, and how Open AI describes their own CLIP approach. Deeper intricacies in the process exist, but this is a solid summation of the processes.

Even though the majority of creative workers in the Canadian entertainment industries are not writing code, we are involved in the creation of these systems as subjects of study. The generative AI’s outputs suffer when the input of training data is interrupted, either through cloaking artwork through the Glaze program (https://techcrunch.com/2023/03/17/glaze-generative-ai-art-style-mimicry-protection/) or actively attacking the images in the data source through data “poisoning” tools such as Nightshade (https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/). When we are removed from the generative AI process, it ceases to function. As essential players in the creation of this technology, we have a right to decide how it is used.

Generative AI can show up in myriad ways across the Canadian Animation and Video game industries. Our Local has received reports of the use of AI in pre-visualization, design, writing, compositing and in at least one visual effects studio, all steps of the pipeline. Generative AI has been used to create short animations in their entirety for large clients such as Disney and Warner Music. It has also been introduced into production pipelines in large and small ways from assistive colouring, to automation of processes, writing of emails and sorting of job applications.

Animation, Games and VFX studio management often hopes to use AI as a tool to cut costs by automating work. The 2023 Writers’ Guild of America and Screen Actors’ Guild strikes in the United States spotlighted the lengths to which the The Alliance of Motion Picture & Television Producers were willing to go to have free rein to automate the labour of their workers. The WGA struck for 148 days and SAG-AFTRA struck for 118 days before the final tentative agreements, that included protections against AI, were accepted. The AMPTP are large clients of the Canadian entertainment industries, so their interests are naturally replicated in the Canadian studios that service them. Unregulated AI usage could result in a marked shrinkage of the animation, video games and VFX industries and result in a large loss of jobs.

Another harmful angle that arises from AI being used to cut costs on creative work is the loss of unique Canadian artistic expression in a broader media landscape. Canadians have a distinct and valuable perspective that is appreciated worldwide. Within our nation we also have incredible varieties of human experiences and subjectivities that create a rich culture. AI design asks it to generate output based on the patterns identified in the training data. Therefore, it tends to recreate any bias present in the training data reducing its output to an echoing of dominant attitudes in culture. Because AI is unable to observe and experience the world and develop a sapient subjectivity, it cannot innovate in art in the way Canadians always have. Many great Canadian artists in the animation industry developed their unique artistic voices with the same jobs that are at risk of automation. Canadian animation and its distinct history risk losing the next generation to the cost-cutting associated with generative AI automation.

In the future, there may be room for AI to be used as assistive tools to help make the work of creatives more efficient without resulting in automation of their labour. Canadian software developers have created tools that have become standard across the international animation industry, such as the Toon Boom suite, the industry standard 3D software Maya and the robust VFX software Houdini. Canadian-developed assistive AI software, that could rise to the international standard level that other tools have achieved, risks being lost in the current climate around generative AI. Generative AI banks on false promises about what it can do, and sells itself as a replacement for skilled human labour. The natural distrust for technologies peddled in this way stymies the potential for adoption of actual useful AI tools that could elevate human performance. Copyright legislation that protects the labour of creatives will also create markets for AI ingenuity that can exist alongside creative workers.

Across the board, technical evidence suggests that legislation is needed to make clear how generative AI should interact with copyright.

Text and Data Mining

Clarity and regulation around copyright and TDM, applied appropriately, would help to boost both the AI and creative industries. The ability for copyright holders to opt out of TDM is essential to new legislation. Ideally, law would allow for all Canadians to enjoy security in the knowledge that by default their personal and copyright protected data may not be used for TDM. Regulations that protect the copyright of creatives and ensure that they are fairly compensated for the use of their works, or that allow them to opt out of the use of their works, will give artists the safety they need to be able to work alongside new technology.

Current TDM law in Canada does not guarantee the right to opt out of data mining. This lack of control, combined with the fact that large quantities of data that is being mined was scraped from public internet pages leaves many Canadians feeling exposed and has broken their trust in public virtual spaces. The LAION-5b dataset was created through the TDM of the data collected by Common Crawl. Common Crawl is a project that employs web crawlers and data scraping to maintain what it describes as a copy of the internet. LAION-5b is used as a dataset for Stability AI, Midjourney, Google’s Imagen and more AI models. A developer can download an artist’s body of work, mine it for data, and then output work that either is directly infringing on that artist’s copyrights or so visually similar to that it would be mistaken for their output, without ever speaking to that artist. Where there is consent, there can be no trust in the technology.

When an artist attempts to exercise their copyright and remove their work from TDM for AI, the lack of relevant copyright law can embolden the infringement. This backfiring was the experience of Sam Yang, a Toronto-based artist. (https://shorturl.at/bgsKW) Yang found himself unable to rely on existing copyright law, because it does not adequately define infringement when AI is involved.

When copyright is respected, and the work that can be done in AI is limited to innovation within those parameters, that is when truly useful tools will be developed. Most AI tools on the market currently are polished randomization tools that use mass amounts of copyright-protected work in order to generate flashy output. While that output may look impressive, it betrays a lack of competency or insight. Under scrutiny, the errors that are made by AI are very basic and easily to spot. This casts doubt on the efficacy of the tools and undermines the credibility of the entire field. Smaller, properly licensed datasets will necessitate true innovation in AI.

TDM is happening in Canada, and the copyright protected data of Canadians is being processed using TDM. We have observed the data of our industry peers being used, and the low barrier to conducting TDM makes it easy for anyone to engage in this practice. While it may be possible to list every individual running TDM applications within Canada, it is less useful than examining the evidence that TDM is impacting Canadian industry.

Canadian artists’s data is being mined for training generative AI. Examining the Midjourney name list alone, which was entered into evidence in the Andersen v. Stability AI Ltd. lawsuit, makes it clear that developers relied heavily on the work of Canadian artists in order to train generative AI. (https://shorturl.at/pzGNZ) On this list of names, our union identified a number of our members as well as their clients. Canadian labour is present in the work captured from companies like Riot Games, and is likely captured in works attributed to animation professionals that work with Canadians such as Lauren Faust (My Little Pony) and Justin Roiland (Rick and Morty). Other notable Canadian artists that are on the Midjourney name list are: Danny Antonucci, Seb McKinnon, Joy Ang, Anastasia Ovchinnikova, Jason Rainville, Zara Alfonso, Janine Johnston, John Howe, Michael Walsh, Kate Beaton, Fiona Staples, Attila Adorjany, Bobby Chiu, Nina Matsumoto and more. Further, the presence of these names on the Midjourney list indicates that the program was trained to specifically mimic them. Works by many other artists yet to be identified have been mined to make up the rest of the training data to make broad descriptive categories.

TDM is easily accessible for those with sufficient technical knowledge. A brief internet search quickly turns up the CLIP code that is used by many generative AI developers. (https://shorturl.at/IM258) There are also lists of open source LLMs available (https://shorturl.at/lrPX1) and proprietary options that can be licensed. Some research efforts, such as the BMO Lab for Creative Research, will publish details about their use of TDM (https://shorturl.at/ijFIS) while many for-profit ventures will not. With this ease of access, one can assert that where AI is being developed, TDM is happening. The government has a responsibility to regulate its use.

Canada’s existing copyright law does not adequately require those conducting TDM to approach copyright holders for licensing to access their work. Because these developers need a massive quantity of data and are able to get it for free using datasets like LAION-5b, they have no incentive to ask permission or pay rights holders. Licensing modes for works at the scale needed can’t exist without both parties entering into good faith negotiations about employing them.

Licensing for AI training is being explored most often by companies that sell stock photos, fonts, and creative assets. Their pricing structures reflect that existing industry. A client pays once for access to the assets, pays a higher fee for commercial use, and then pays additional fees to train AI. The right to negotiate costs for widespread or corporate usage of their assets is reserved in line with how they licence their other products. This is a useful framework to apply to licensing work for TDM as it allows for nuance based on usage.

The question of remuneration is difficult to answer without precedent for what copyright the user of the generative AI will hold on the AI’s output. In keeping with current standards for licensing art assets, licensors should be able to determine the cost of the licence based on the demand and what income they stand to lose via automation and all relevant legal terms of service.

Clarifying the scope of permissible TDM must require that TDM only be conducted on licensed data with rights holder’s consent. In line with existing policy, Canada is most lenient in when activities are conducted for research purposes. Canada’s own Panel on Research Ethics states that “An important mechanism for respecting participants' autonomy in research is the requirement to seek their free, informed, and ongoing consent. This requirement reflects the commitment that participation in research, including participation through the use of one's data or biological materials, should be a matter of choice and that, to be meaningful, the choice must be informed.” TDM and the industries, processes and models that rely on it should not be subject to any less strict regulations than researchers would.

Regulating TDM in this way would serve to protect professionals across all sectors. Those who wish to be included in development of AI will be able to engage autonomously. Individuals that desire to opt out will be able to do so. This equitable relationship will allow both workers and technology to flourish.

AI developers should be required to keep records of data used in the training of their models and be able to disclose records when asked. To enable record keeping, metadata should be retained when scraping images and/or conducting TDM. This will allow AI developers to accurately index, curate and credit their datasets. To enforce copyright, one must be able to know if protected work has been used, so records must be kept and reported.

Disclosure of the use of copyright-protected works in an AI product to the public is essential. Especially in cases where generative AI produces output that could be mistaken for the work of an individual artist, it must be made clear that it was made by AI. This would prevent fraud and misrepresentation of an individual’s character. The disclosure must state that AI generated the work and explain how to request the details of the data from the developer.

New legislation in the EU sets a strong example for how to handle AI makes it clear that TDM must respect the EU’s copyright law. It divorces copyright law from AI law in order to prevent AI technology from influencing copyright, or being an exception to the laws.

"Any use of copyright protected content requires the authorization of the rightholder concerned unless relevant copyright exceptions apply”, and “where the rights to opt out has been expressly reserved in an appropriate manner, providers of general-purpose AI models need to obtain an authorisation from right holders if they want to carry out text and data mining over such works.”

Additionally, they go further to state that any TDM activities or AI products or services that seek to do business in the EU must conform to their copyright and privacy standards.

The EU will also require all general purpose AI developers to: “put in place a policy to respect Union copyright law in particular to identify and respect, including through state of the art technologies where applicable, the reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;” And “draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office;”

Its also worthy of note that these obligations apply to general purpose AI developers, as mentioned above. The EU recognises that regulating AI outside of high-risk applications is necessary to the wellbeing of their member nations. Canada should adopt a similar stance.

Authorship and Ownership of Works Generated by AI

Is the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies? If so, how?

The uncertainty around the rights to ownership and authorship of works created with AI is absolutely a stumbling block in the development and adoption of AI tools. The myriad processes involved in the creation of generative AI outputs, coupled with the lack of regulation has left many individuals and businesses interested in the space to wait for precedent to be set before exploring their options. In creative industries, ownership of intellectual property (IP) is an essential element of conducting our business and compensating parties involved in projects. When the rights are unclear in these situations, all other processes that depend on the IP agreements get bottlenecked or threatened with liability. It is impossible for any widespread use of a new technology to be adopted in an entertainment industry when IP rights aren’t clearly delineated.

Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how?

The government should propose clarification on copyright ownership for AI-assisted and AI-generated works, and it should do so in a way that is carefully informed by the nuances of different use cases and the impacts those uses have. Generally, our Local supports copyright legislation that grants authorship only to work created by humans. (ie, in the case of a storybook with AI illustrations and human-written text, the text would be subject to copyright and the illustrations would not.)

Within that stance, we feel that there are some nuances that must inform copyright law for AI-generated works:

-What were the terms of the licence on the training works that the AI-prompter used to generate their product? Works generated using source material from the public domain should not able to be copyrighted by a prompter, content that is generated using works made in “for hire” agreements that predate the application of generative AI should not be able to be copyrighted by the prompter, and works generated by infringing upon copyright or privacy should also not be able to be copyrighted by the prompter. However, in a case where both parties enter into an agreement where the individual from whom the training data is being sourced from is willing to grant copyright to the prompter, some allowances for copyright may be made.

-What were the terms of the licence for the TDM and other AI software the prompter used to generate their product? It is unclear if the hand that an AI developer has in writing TDM parameters entitles them to any editorial authorship. There may be methods outside of the mainstream of generative AI where developers working with LLM could be exerting creative authorship through their curation processes when making more sophisticated generative AI models than Stability AI, Midjourney, Imagen, Dreamup and their competitors. Currently, these generative AI programs take a position of neutrality in the creative process, similar to how an art creation program such as Photoshop does, but there may be other developers in the field that merit a more nuanced position.

-Did the prompter create all of the data/work that was used to train the model? Do they own the copyright to that data/work? If an artist is using AI assistance or generative AI that has been trained solely on their own creative output, that could be looked at as a nuance that allows for copyright to be applied to an AI product. Artists currently make assistive tools such as custom typefaces or art brushes to automate parts of their processes to increase their own productivity without a loss of copyright on the finished product. When made with entirely their own copyrighted material, it should fall within copyright protected works.

-If a work is created with AI-assistance, to what extent was the labour completed by a human and to what extent was the work generated by AI? When authorship based on the above examples is unclear, decisions regarding copyright protection should heavily favour the amount of human labour that goes into producing the end result. This will help to retain value on the physical and creative work of humans, and the innovation and creativity that is intrinsic to that process.

Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

Both the United States and the EU have created precedent that human authorship is an essential element of making anything eligible for copyright protection. As mentioned above in our submission regarding TDM in the EU, our local feels that AI policy in the EU is a solid reference point for Canada. Seeing consensus for this position in our closest trading partner, the United States, reinforces our position that these are good examples to follow when crafting Canadian Legislation.

Infringement and Liability regarding AI

Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright (e.g., AI-generated works including complete reproductions or a substantial part of the works that were used in TDM, licensed or otherwise)?

Existing legal tests for copyright infringement were written with human-scale infringement in mind, and legal frameworks that are reasonable for humans are not reasonable when evaluating machine processes. A new mode of evaluation must be created to specifically evaluate AI-generated material.

Human-scale and generative AI-scale remix and sampling are very different, any output that is produced by these two modes is different as well, even if they appear similar on a surface level. The concepts of transformative works and fair-use function on the transformation created by the synthesis of a human’s intent and subjectivity creating new meaning from the copyright-protected works that are referenced. As of yet, generative AI models have not been proven to be capable of exercising judgement, expressing a subjective stance or demonstrating any awareness of what the output they are creating means in any human-equivalent way. In their paper, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” (https://dl.acm.org/doi/10.1145/3442188.3445922) authors and AI researchers Emily M. Bender, Timnit Gebru and Angelina McMillan-Major warn that generative AI is only capable of creating human-like expression, and that the meaning seen in generative AI is brought to it by the viewer as a result of a very human desire to ascribe meaning to recognizable patterns. What AI produces is stochastic, or random, sequences of forms or words that have a high probability of matching what a prompter has requested. Similar to how a parrot may be capable of saying the words “pretty” and “bird” but has no sapient knowledge of what either of those phrases signify.

AI is factually incapable of the synthesis and meaningful expression that a human being is able to perform. With this essential distinction in mind, the best way to rule on infringement in cases of AI-generated content is to examine the training material for unlawfully-used copyrighted material. To ascribe any weight to the perceptual difference that is created through the generative AI production process as transformative work is to distract oneself from the material realities of the processes and their real-world implications. If an AI-generated work relies on copyright-infringing material to be generated, it is violating copyright.

What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?

Many developers of generative AI are secretive about their datasets and the parameters they apply when using TDM to build their generative AI models. Datasets like LAION-5b are massive and sample even more massive source material. LAION-5b is “5.85 billion pairs of image URLs and the corresponding data”, which is a 240 terabyte bundle of files, and it is based on Common Crawl’s repository of web data, which boasts a size of multiple petabytes of data. (One Petabyte is equal to 1000 terabytes.) While LAION-5b includes the metadata of their image-text pairs in their dataset, the preservation of this metadata is up to the next user of the dataset. If this metadata is destroyed, it becomes incredibly difficult to search a massive repository of data for infringing images.

Additionally, copyright-protected materials can be “laundered” to dodge copyright protection and create the illusion of a generative AI system that is capable of creating works without infringing input. In her paper for the IEEE, "AI Imagery and the Overton Window of Data Laundering” author Sarah K. Amer describes the data laundering process in text to image generators as follows:

“STEP 1 Visual media (pictures, art, illustration, logos, etc.) is scraped from the internet

STEP 2 Scraped media is stored in a dataset or group of datasets

STEP 3 Scraped media is used to train AI text-to-image models using GAN and Diffusion architecture

STEP 4 Training produces and stores latent images based off of the original media

STEP 5 New imagery is later generated from the stored latent image bases by an AI end user to sell”

Amer goes on to describe that companies will gain access to copyright protected works as part of research projects under non-profit and academic entities. Once the data laundering is complete as part of the research stage, the companies will sell their laundered dataset and model and become for-profit organisations. (https://arxiv.org/pdf/2306.00080.pdf) This shell game obfuscates the path the data took to becoming training material for a generative AI model, and it is currently unclear what the legality of this pipeline is.

Our Local, and the entertainment industry at large, feels that this laundering process is not substantially transformative of the copyright-protected source material in theory or in practice. We feel it is clear, for reasons we have stated above, that generative AI models are not capable of the meaningful synthesis that constitutes transformative works and that the products of these generative AI programs still appear to be copyright-infringing after this laundering process. Regulation must close this legal loophole that has been exploited to make it difficult to prove that generative AI is infringing copyright.

When commercialising AI applications, what measures are businesses taking to mitigate risks of liability for infringing AI-generated works?

Use of AI applications in the creative sectors in Canada is largely conducted in an end-user capacity, and not in a developmental capacity. Many large studios are avoiding applications of AI at any large scale because they cannot exercise meaningful oversight of the data used to train AI tools, and their actual liability in the process is legally unclear. Studios insulate themselves from risk by relying on AI for steps of the creative process that are not broadcast or published in an obvious way. This can take the form of ideating works that will be redrawn by human artists, automating image editing, or other uses that create the appearance to the viewer that generative AI was not used.

A small number of studios and individuals have sought out agreements with copyright holders that allow for licensing and consent around content use for training generative AI, and the more successful examples of this look like the copyright regulation suggestions we have made in this submission. Unfortunately these examples are the exception and not the rule.

Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

Yes, the Government should clarify liability in the case of AI-generated works infringing on copyright protected works. Liability for copyright infringement lies with any parties that were responsible for the assembly, curation and direction of the dataset. Whether those parties are the developers creating the generative AI tools, or the clients or users of those tools that freely chose to use those tools and publish the product of them. Developers should be responsible for providing accurate information to clients and users about the copyright status of the materials in their datasets, and should they misrepresent that information, the clients and users should be released from liability.

Additionally, when it is unclear who amongst the developers of generative AI is ultimately liable for infringement caused by AI, claimants should be able to exercise multiple avenues in order to claim compensation. If no individual person can be clearly found to be at fault, strict liability claims should be open to claimants. And in cases where errors have led to the issue, claimants should be able to claim compensation as they would for a defective product.

Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

Again, it behoves the Canadian Government to look to the EU for examples on effective approaches for liability in copyright infringement. While the functionality of Canadian legislation will be different than a multi-state entity like the EU, we urge Canada to look to the ethos behind the EU’s Artificial Intelligence Liability Directive. This directive seeks to make it easier and simpler for the claimants who are harmed by AI to seek compensation for that harm, because it is easier for AI to harm those claimants. This would be an essential step in levelling the playing field between rights holders and any bad actors that are utilising AI to infringe on those rights. When it becomes easier to violate rights, it must also become easier to defend those rights.

Comments and Suggestions

I am submitting these responses on behalf of the Canadian Animation Guild, IATSE Local 938(CAG938). We are a quickly growing union representing animation and video game workers in Canada. Currently we represent 420+ active union members and 600+ workers currently in the process of bargaining their first collective agreements. Additionally, IATSE organisers are collaborating with workers in animation, video games and visual effects (VFX) studios across the country who are seeking union representation. Our local's submissions on this topic are thereby representative of the sentiments of the members of our local as well as the concerns of the broader Canadian entertainment industries that are seeking representation with us or alongside us.

As an international union, IATSE is a member of the Human Artistry Campaign (HAC). As such, our local submission is guided by the Core Principles for Artificial Intelligence Applications in Support of Human Creativity and Accomplishment defined by the HAC. Per the Core Principles for Applications of Artificial Intelligence and Machine Learning Technology published by IATSE on July5th, 2023 we seek to:

-Ensure entertainment workers are fairly compensated when their work is used to train, develop or generate new works by AI systems

-Prioritise the people involved in the creative process and protect owners of intellectual property from theft

-Improve transparency of the use of AI & machine learning systems

-Prevent legal loopholes that can be exploited by individuals, companies, and organisations in the U.S., Canada, and otherwise

CAG938's members have a strong interest in what shape AI legislation in Canada will take, and we are grateful for the opportunity to make a submission on this topic. This submission will focus on our knowledge of and recommendations for the application of copyright law to generative AI in the industries that our work touches. Alongside these points, our Local would like to assert that it is in support of the Canadian Labour Congress's position that AI regulation is essential across all industries that are impacted by its adoption, not just the ones that have been designated as "high-impact" by the Canadian Government.

Legislation should be approached from a point of view that places human rights, labour, societal impact, accountability and privacy at the forefront. It is only by ensuring that those priorities are met that Canada can safely rely on AI to improve the lives of its workers and grow its industries. Without those safeguards in place, Canada is at serious risk of enabling harm on a massive scale to the privacy, safety and livelihoods of its citizens. The Canadian government must not allow industry to circumvent the copyright protections that Canadians rely upon to protect both their labour and culture.

In closing, our Local would like to emphasise how essential it is that copyright law is not considered in a vacuum when dealing with AI. Privacy law is an essential part of this discussion, and many aspects of the generative AI issue must be addressed through a lens of protecting the privacy of Canadians.

Throughout this submission, our arguments have focused on the very real impacts of generative AI and copyright on the professional work and livelihoods of Canadian artists and entertainment workers. For workers in our industries there is not always a clear line between our personal and professional lives, especially when one attempts to draw it around things we have created. We work for studios, we work for ourselves, and we share art in public to build community. With the lack of privacy protections in Canada around TDM that don't guarantee individuals the right to opt-out of these processes, Canadian artists are at risk of losing the spaces they have relied upon in order to share skills, network and build a world-renowned industry. Privacy laws must account for all of these nuances in order to preserve the livelihood and wellbeing of Canadian creative workers.

Privacy and copyright legislation must work together to protect Canada's vibrant creative industries. All regulations must put workers first, and not fail to support the preservation and elevation of Canadian innovation and creativity. Regulations must ensure that rights holders are able to exercise control and consent over how their work is used, that workers are fairly compensated for their labour, that the needs of people are prioritised over the development of technology, that practices around AI are transparent and that all loopholes that allow for the exploitation of workers and rights holders are closed.

The members of CAG938 and the workers that make up Canada's world class entertainment industries will be watching the Canadian Government's decisions on this matter closely. We are hopeful for a bright future that values human expression, labour and fairness for all.

Respectfully submitted on behalf of the Canadian Animation Guild, IATSE Local 938.

Canadian Artists' Representation / Le Front des artistes canadiens (CARFAC)

Technical Evidence

Canadian Artists’ Representation / Le Front des artistes canadiens (CARFAC) is the national association for visual and media artists in Canada. We represent 4,000 artist members across Canada, and we work to affirm the economic and legal rights of all visual and media artists. The non-consensual use of intellectual property by generative AI companies profoundly impacts artists. This submission results from over 220 unique responses from artists to a national survey regarding their concerns about generative AI products; community dialogues at virtual panel events organized by CARFAC in Ontario and Saskatchewan; and hundreds of hours connecting with artists and stakeholders across the cultural sector in Canada and abroad.

We are concerned about how generative AI companies interpret the laws that apply to their business models. For example, Midjourney recently updated its terms of service to clarify that users may not use the product to violate others' intellectual property rights, and that doing so may result in the company taking legal action against the user. Yet, the company has itself been accused of copyright infringement on multiple occasions. The terms of service goes on to state that the company does not guarantee that the service does not infringe on copyright.

Additionally, Open AI recently launched a program called Copyright Shield which promises to pay legal costs for its developer customers who face lawsuits over IP claims. Assertions that these business practices are in compliance with the Copyright Act are inconsistent with these protective measures, and also raise the question why some AI companies are ratifying licensing deals with major publishing and media providers, but argue against the use of licensing models for all content being used as training data. We discuss this issue further in this submission.

Regarding how businesses and consumers use AI-assisted and AI-generated content, Canadian artists face growing labor disruptions. This trend is unsurprising as organizations that once licensed the use of original works can now use generative AI to meet their needs without paying creators.

Text and Data Mining

Using artwork obtained through TDM to train Generative AI products without allowing artists to provide consent, negotiate compensation, or determine if/how they will be credited violates those artists’ rights under the Copyright Act. This consultation is an opportunity for the Government to educate generative AI companies to ensure they comply with the law.

Questions about moral rights are notably absent from this survey; however, the violation of artists' moral rights is inevitable based on the current business models used by most generative AI companies. It is common for generative AI models to distort original works which may harm the reputation of artists, and artists do not commonly have the choice to be credited or remain anonymous. Generative AI also enables an environment in which artists are unable to protect their works from association with causes, products, services, or institutions to which they are personally opposed.

Eighty-two percent of artists responding to our survey indicated they were very concerned that their artwork is used without consent to train generative AI products. This concern is so deep and widespread that independent countermeasures are required. For example, the University of Chicago has developed Glaze, a tool that artists can use to protect their works online from becoming AI training data. While these efforts are appreciated, protecting the IP rights of artists against non-consensual use by some of the world’s largest corporations should be done through copyright law and federal regulation. It should not rely solely on these independent initiatives.

Terms such as “training data” are used frequently, and this language can devalue a creator’s intellectual property. For artists, this is not “training data”; this is their life’s work.

The plurality of stakeholders, the legal uncertainty, the need for more transparency in data management systems, and the opacity of AI systems are insurmountable obstacles for artists to defend their copyrights without support. The critical challenge currently faced by Canadian rights holders in licensing their works is the inability to determine what copyright-protected works have trained generative AI products; this opacity prevents parties from negotiating licensing terms and stifles the development of emerging licensing markets. It is also challenging to establish that the infringing party had access to the original work, that the original work was the source of the copy, and that the work was significant in informing the creation of the new content produced. However, we understand that AI developers and researchers in the sector document their training data. Therefore, greater transparency of this data with rights holders is therefore technically feasible.

Most artists have yet to have opportunities to negotiate licenses for their works already used to train generative AI models. Though many mainstream generative AI companies do not employ licensing models, such business frameworks exist within the AI industry. Getty Images, for example, has released an AI Image Generator trained exclusively on its content. Getty compensates creators for the use of their work in their AI model.

Resistance from generative AI companies to engage in licensing negotiations with the arts sector is another critical challenge in establishing a market-based approach to consent and compensation for artwork used in TDM. Meta, for example, has argued that imposing a licensing regime after the fact would cause chaos for the industry and result in little benefit for artists, given the insignificance of their respective works within larger datasets. Already, however, we are seeing contradictions in these arguments. OpenAI recently entered a licensing deal with Axel Springer SE, the parent company of Business Insider and Politico; such an agreement could become the norm. Companies that regularly violate the Copyright Act should not benefit from an exemption on the grounds that those actions have already occurred. Even when the financial value of an individual work is small, this does not preclude the rights of artists to provide consent and receive payment for using that work.

The Canadian Government should avoid entertaining arguments that complying with the Copyright Act and obtaining prior consent from artists would slow the development process of generative AI products. While AI technology may be complex, basic principles of fairness, justice, and asking permission before taking things are straightforward and baked into Canadian laws and social values.

Generative AI companies are rightfully excited about the products they produce and understandably feel a sense of urgency to accelerate the development of those products. Artists are no different; we must regard their needs with the same level of importance, innovation, and urgency. Neither group can be permitted to operate outside of the law or develop their products in ways that harm individuals or society at large.

The Copyright Act is sufficient and applicable to protect the rights of creators in the context of generative AI. There is no reason to believe that current copyright laws do not, or should not, apply. Situations in which private companies use, without permission, the copyrighted works of Canadian artists to develop and grow the value of their commercial products is precisely the kind of scenario that the Copyright Act should prevent. The Federal Government should refrain from implementing new exceptions for TDM. Doing so would devastate the economic environment for Canadian artists – many of whom live at or under the poverty line. A TDM exception would result in long-term negative social and cultural externalities, including compromising the global competitiveness of Canadian arts and culture and harming small creative businesses.

The current environment must enable rights holders to determine if their works have been used to train generative AI models. An opaque operating model both encourages the unauthorized use of Canadian artists’ works by AI developers and prevents licensing negotiations from taking place. We, therefore, recommend that generative AI companies be required to publish records of copyright-protected works that have trained AI models.

Developers and researchers in the generative AI sector are already documenting their training data, for example, using model cards. Model cards can record structured information, such as the names of domains where training data is collected. Therefore, introducing a record-keeping obligation should not entail additional costs for the AI industry and would provide much-needed transparency.

Artists and generative AI companies should negotiate remuneration for licenses without government intervention. The Government can enable a market-based solution by ensuring that the generative AI companies operating within Canada comply with current Canadian copyright law without exception, and that records of copyright-protected works that trained AI products are made public. Generative AI has negatively impacted labour opportunities in our industry. The Federal Government can contribute to stabilizing this fallout by ensuring that generative AI companies operating in Canada adopt appropriate licensing models.

Authorship and Ownership of Works Generated by AI

Existing copyright laws are sufficient to address authorship and ownership, and no legal amendments are required. As the Supreme Court of Canada noted in CCH v. The Law Society of Upper Canada, “An original work must be the product of an author’s exercise of skill and judgment. The exercise of skill and judgment required to produce the work must not be so trivial that it could be characterized as a purely mechanical exercise”. These same criteria should apply when evaluating the granting of copyright to AI-produced or AI-assisted works. Entering a series of text prompts into an AI Image Generator is decidedly a “purely mechanical exercise,” that does not require the user to exercise “skill and judgment.”

There may be, however, other situations where AI-generated or AI-assisted artwork meets the criteria for copyright protection. For example, suppose an artist designs an AI model and trains that model with their artwork so that the model can understand and interact with the training data in unique ways specified by the artist. In that case, the content resulting from this process may meet the copyright criteria.

The question of authorship of AI-generated works is essential but difficult to consider in a landscape where private companies use Canadian artists' intellectual property without consent, credit, or compensation to develop their products and increase the value of those products. Suggestions that generative AI companies could be able to continue developing their products using unauthorized Canadian artists’ works while simultaneously considering if the resulting content generated by those products should receive copyright protection are concerning.

The devastating impact this would have on the creative economy in Canada is profound and difficult to predict, though it is essential to highlight the specific effects on Indigenous artists. As the theft of original Indigenous cultural expressions is already widespread, its unauthorized use by generative AI companies is unconscionable and contrary to notions of Truth and Reconciliation. Moreover, including counterfeit Indigenous artwork in training datasets accelerates the spread of counterfeit imagery, and the generation of AI content based on authentic or fake Indigenous artwork cannot be permitted. This issue deserves a separate analysis and consultation process.

Infringement and Liability regarding AI

When generative AI companies use Canadian art to train their models and build the commercial value of their products without consent, they commit copyright infringement and must assume liability for those actions.

Requiring generative AI companies to keep and publish records of copyright-protected works used to train their models will address the large-scale copyright infringement that has already happened and provide parties involved with the information needed to negotiate terms for using those works. This will enable the development of licensing markets and strengthen Canada’s creative economy while potentially accelerating growth and competition within the AI industry itself.

Comments and Suggestions

On October 12th, 2023, the Government of Canada announced this Consultation on Copyright in the Age of Generative Artificial Intelligence, with submissions due by December 4th, 2023, and extended to January 15, 2024. We are concerned that this short timeline to prepare recommendations on this complex issue has the danger of capturing uneven results. Some arts and culture sector stakeholders could be unfairly disadvantaged by having to balance limited resources and capacity while preparing thoughtful and researched analysis, while generative AI industries can allocate much deeper resources to the process. While we appreciate the extension to respond to this consultation, future consultations will benefit from longer timelines. And while we are thankful for the opportunity to respond to this survey, we are disappointed that we were not invited to consult on, or contribute to, its design.

CARFAC does not wish to hinder the advancement of AI, but we need to preserve the balance that the Copyright Act underpins, and we must uphold the interests of copyright holders. Indeed, we see the potential of AI: if adequately regulated, it could fuel creativity, promote content discoverability, and equip creators to defend their rights.

Nevertheless, it is essential to be aware of the negative impacts that AI can have on all sectors, on the foundations of our society, and the rights of artists. As generative AI profoundly impacts the cultural industries, creators must be centrally involved in developing the governance and policy frameworks affecting our sector.

In summary, our primary concern is to ensure compliance with the Copyright Act. The "3 Cs" principle (consent, credit, and compensation) must guide the Government's actions in this public consultation and any potential amendments to the Copyright Act. Our request is consistent with those called for by artists in other countries at the present time. Creators' consent must be obtained and the Government must not undercut their options to be paid when their content is used for text and data mining ("TDM") purposes. We also recommend that a transparency obligation be imposed on users. Specifically, this framework should require disclosure of any works used in the context of generative AI. Such a mechanism is feasible and poses no technical difficulties for generative AI companies. Rather, it would lay the foundations of the edifice, in order to ensure fair and equitable remuneration for artists and copyright holders.

Canadian Association of Broadcasters

Technical Evidence

The members of the Canadian Association of Broadcasters (CAB) are meaningful players in the Canadian cultural economy and are uniquely situated as both users and creators of creative content. This dual perspective enables the CAB to appreciate the motivation of the Government to encourage innovation in generative AI as a means to increasing efficiency and economic growth while also ensuring that creators receive the necessary protections for their underlying works that is essential to incentivizing creativity and the cultural economy in this country. In large part, the CAB’s members are in exploratory and experimental phases of engagement with generative AI.

Text and Data Mining

The CAB supports the retention of copyright protection in works that would be otherwise subject to such protection, and does not support a general exception for text and data mining (TDM). The mere existence of generative AI systems does not support the removal of the copyright protection that automatically arises in Canada when original works are created and fixed in a material form. Copyright is a creature of statute, and the Copyright Act states at section 27(1) that “[i]t is an infringement of copyright for any person to do, without the consent of the owner of the copyright, anything that by this Act only the owner of the copyright has the right to do.” Accordingly, the question is whether these generative AI systems are doing anything that only the copyright owner has the right to do.

The technological methods employed to undertake text and data mining in a given situation must be considered in answering the question of whether the use of copyright protected works by generative AI systems is infringing. There may be activities in connection with TDM that, based on the technological methods employed, do not infringe copyright. For example, generative AI systems that engage processes akin to reading the underlying works, much the same way search algorithms read underlying content in order to produce meaningful search results, may not result in copyright infringement. To be clear, this concept of “reading” has to be fully evaluated to determine whether it in fact triggers liability. If generative AI systems are engaged in making reproductions, it may be possible that such reproductions are subject to existing copyright exceptions such as fair dealing at section 29 or the technical process exemption at section 30.71. The answers to these questions lie with the creators of AI systems and are not readily available to the end-users.

If creative works are being engaged in a manner that triggers copyright protection, the owners of the copyright in those works should be entitled to compensation for that use. The existing neighbouring rights regime in the Copyright Act provides an operational example of how copyright owners can be paid for the use of their works even in situations where it may not be possible for them to deny access to their works. Performers and sound recording makers are entitled to be paid equitable remuneration when published sound recordings containing performances are performed in public or communicated to the public via telecommunication. This payment is made to the designated copyright collecting society in the case of sound recordings of performances. The amount of the payment is determined either by way of direct negotiation between the user and the rights holder and/or the collecting society or, in many cases, through the administrative process carried out by the Copyright Board of Canada. If it is determined that generative AI systems are engaging the copyrights of the underlying works being used to train those systems, payment could be made to the underlying rights holders via a system of equitable remuneration similar that already in place for published sound recordings.

Authorship and Ownership of Works Generated by AI

The question of whether works created through generative AI systems should themselves be subject to copyright protection is directly tied to the concept of the author in the Copyright Act. The Act provides at section 13.1 that the author of a work is the first owner of copyright in that work. The Act does not define, author, per se, though it does indicate at section 5(1)(a) that copyright will only subsist in works if the author was “a person”. In addition, as highlighted in the Consultation paper, “Canadian copyright jurisprudence suggests that 'authorship' must be attributed to a natural person who exercises skill and judgment in creating the work, reflective of the fact that the Act ties the term of protection to the life and death of an author.” To date, Canadian copyright law appears to only provide protection for human-generated works.

In the case of works wholly produced by generative AI systems, that is those generated by a system that has received only cursory instructions from an AI user, there is no natural person who has exercised the necessary skill and judgment required to meet the preconditions of authorship and thereby give rise to copyright protection in the autogenerated work. The computer program will have made the creative decisions independent of any human interaction. Whereas, until recently, computers and software used to generate creative outputs have been viewed as a tool used by authors exercising considerable skill and judgment in connection with the creation of a work, new generative AI systems can be instructed without the exercise of skill and judgement that gives rise to copyright protection in Canada. Accordingly, AI work products that can result from basic instructions that lack a level of skill and judgment exercised by traditional authors should not receive the same protection afforded to human-generated works.

Moreover, it is essential that works wholly produced by generative AI systems do not benefit from compensation through existing royalty channels, as this would serve to undermine the existing system that compensates human rightsholders for their creative labours.

Infringement and Liability regarding AI

If one starts with the proposition that the underlying works used to train AI systems are subject to copyright protection, it follows that, under existing copyright law, those works could be infringed by the AI systems themselves as well as by the end users of the AI-generated works. As the Consultation paper rightly points out, it will be near impossible for end-users to know which works were used by the AI systems and who the copyright owners of those works could be. Therefore, it would be unreasonable to put the onus on the end-user of the AI-generated works to avoid involuntary infringement. Only the providers of the AI systems that are inputting the underlying works into those systems have the potential to know what works are being used. In this way, only the creators of the AI systems should be liable for infringement that occurs as a result of the inputs they chose to rely upon and the way they manipulate those inputs.

At section 38.1(1)(a), the Copyright Act provides statutory damages of between $500 and $20,000 per work infringed for commercial purposes. The application of this provision to the underlying works used in generative AI systems could quickly lead to absurdly high damages for end users who have no knowledge of or ability to determine whether and to what extent copyright is engaged by the generative AI systems. If the Government accepts the end-users have no knowledge of the underlying copyrights that may be engaged by their use of generative AI systems, it will be important to clarify that statutory damages do not apply in the case of AI generated works for individual or commercial users and further to ensure that such users are statutorily indemnified by the generative AI system owner or licensor against any and all copyright claims.

Comments and Suggestions

The CAB’s members are active participants in the Canadian cultural economy, as both creators of original content and as users of copyright protected works. The promise of generative AI in the context of broadcasting is nascent but may yield benefits for Canadian private broadcasters and their audiences. CAB members are currently exploring the potential of generative AI in their businesses and are keen to see how the Government shapes the rules surrounding this area of innovation.

The CAB advocates for continued copyright protection in the underlying works used to train AI systems. Where AI systems are infringing that copyright through the technological processes they employ, the owner of copyright in the underlying works deserves to be paid for that use. If applicable, the use of the protected materials may be subject to an exception under the Copyright Act, in which case the use would be permissible. The nature of generative AI is such that the end-user has no knowledge of the copyright protected works that may have been infringed by the AI systems in the production of these works, and therefore it is not reasonable or fair for the end-user to be liable for any copyright infringement that results from that use. The providers of the AI systems are uniquely positioned to know which works are engaged and to make the necessary payments. The statutory damages framework in the Copyright Act should therefore be amended to preclude its application to end users of works produced by generative AI and end users of generative AI should be protected by a statutory indemnity.

Canadian Association of Research Libraries

Technical Evidence

Technical aspects of AI, particularly generative AI, are rapidly evolving. The Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems along with pending legislation will provide a framework for AI systems. Until communities of practice are established for the evolution and use of AI tools, guided by court decisions which will likely draw on the principle of technological neutrality, the research library community believes that the government should not restrict the use of AI, unintentionally or otherwise, as to do so would hamper innovation.

Text and Data Mining

Text and data mining (TDM) is not a new concept; post secondary students and researchers in Canada have long utilized non-generative AI systems that rely on TDM and also use TDM as a research practice. However, the legal status of TDM currently lacks clarity, and the absence of a specific TDM exception in the Canadian Copyright Act hinders researchers' efforts and impedes progress by requiring extensive copyright analysis to ensure compliance. A new statutory provision should be implemented to confirm that the use of a work or other subject matter for the purposes of TDM does not infringe copyright and is thus noncompensable (i.e., any remuneration would be separate from nonconsumptive TDM). The exception should apply to all users and allow commercial and non-commercial uses and allow retention and sharing of copies used for TDM.

In addition, to assist developers and users of AI more broadly, the fair dealing exception (Section 29) should be amended to make the list of purposes illustrative. It should also be made clear that fair dealing is not subject to contractual obligations, that authors and publishers cannot prevent the use of fair dealing (e.g. opting their works out of an LLM training set), and that TPMs can be circumvented for non-infringing purposes. These changes help maintain the Supreme Court of Canada’s description of the provision in CCH v LSUC, that the “fair dealing exception is always available” (para 49) and that, “the availability of a licence is not relevant to deciding whether a dealing has been fair” (para 70).

For context, research libraries typically license electronic resources with mostly non-negotiable terms of use that may prohibit activities including TDM. Publishers and their intermediaries hold the balance of power in this environment. It should not be necessary for users to obtain a secondary license for non-infringing activities, including TDM.

A number of Canada’s key trading partners already have a specific exception for TDM, including Japan, Singapore, United Kingdom, and the EU. In addition, research shows that providing copyright exceptions or other clarifications of the law to permit TDM is associated with increased publication of scientific research in the countries that make the change (Pijipvideo, Empirical Study Pt 2: Impact of Research Exceptions on Scientific OutputJoan-Josep Vallbé, - May 23, YouTube (July 24, 2023), https://www.youtube.com/watch?v=2bs_e7QRDHo&list=PLuk2SmOxN5RI1z40tC6qDxV 6uQdq-kqLq&index=4; Michael Palmedo, The Impact of Copyright Exceptions for Researchers on Scholarly Output, 2 Efil Journal 114 (2017)). CARL supports an exception that applies to both commercial and non-commercial research, as legislated in Japan and described by the Canadian Federation of Library Association's submission to the current consultation.

Authorship and Ownership of Works Generated by AI

Research libraries reflect the risk tolerance of their parent institutions and the adoption of AI tools could be hampered by uncertainty surrounding ownership of AI works. For example, concerns about copyright infringement could deleteriously impact decisions about selecting and using AI tools to support teaching, research, and other library services.

AI generated works do not meet the threshold for copyright protection as they do not involve a human exercise of skill and judgment (e.g., CCH Canadian Ltd. v Law Society of Upper Canada, 2004 SCC 13, [2004] 1 SCR 339, paras 16, 24, etc.) and should not be protected by copyright. CARL supports the recommendation from A Modern Copyright Framework for Artificial Intelligence: IP Scholars' Joint Submission to the Canadian Government Consultation (September 26, 2021) https://ssrn.com/abstract=4115848 that Sections 2 and 5 in the Copyright Act be changed to confirm that “author” is a natural person and that copyright does not subsist unless created by a human being.

AI assisted works are an inevitability. Responsible uses and best practices are emerging that take into consideration the difficulties in acknowledging the use of AI tools in a new creation. This is not an issue that should be addressed by the Copyright Act.

Infringement and Liability regarding AI

Current provisions in the Copyright Act already address infringement and liability related to copyright when a substantial portion of a work is reproduced as an AI-generated output. Before considering any amendments to the Act that pertain to the scope of permissible TDM activities, the courts should be provided an opportunity to consider any emerging issues, including those related to AI, and provide analysis and guidance for any legislative changes.

Copyright is one of multiple policy instruments that can provide appropriate controls related to AI systems, but not the most effective one for issues related to remuneration.

Comments and Suggestions

The potential uses of generative AI cross all areas of research library mission and operation. Libraries will play a critical role as AI continues to evolve and can offer supports related to data discovery, data management, and preservation (IFLA report, 2023), as well as AI literacy itself.

Research libraries support copyright literacy in universities and their staff understand how the balance between protecting creator rights and facilitating the exchange of ideas and promoting creativity benefits society as a whole. While consultation on AI and copyright is important and CARL is pleased to engage in this process, it is critical to point out that not all provided questions are related to the purpose and intent of copyright law. Issues related to author remuneration and record keeping should not be legislated or addressed in the Copyright Act. Any new copyright regulation of AI should not negatively impact the public’s right and ability to access information, knowledge, and culture. In addition, any new copyright regulation of AI should maintain the appropriate balance of rights and interests in Canada’s copyright system, consistent with a robust principle of technological neutrality.

CARL endorses the following submissions related to copyright and AI:

2023 CFLA Consultation on Copyright in the Age of Generative Artificial Intelligence

2021 Craig, Carys J. and Amani, Bita and Bannerman, Sara and Castets-Renard, Céline and Chapdelaine, Pascale and Guibault, L. and Hagen, Gregory R. and Hutchison, Cameron J. and Katz, Ariel and Mogyoros, Alexandra and Reynolds, Graham J. and Rosborough, Anthony D and Scassa, Teresa and Tawfik, Myra, A Modern Copyright Framework for Artificial Intelligence: IP Scholars' Joint Submission to the Canadian Government Consultation, https://ssrn.com/abstract=4115848

2021 Keller, Liwah and Yuan Stevens. Innovation and Balance. Submission to the Government of Canada’s Consultation on Copyright, AI, and IoT. https://cippic.ca/sites/default/files/File/CIPPIC_Submission_-_AI_%2B_IoT_Consultation_-_2021Sept17.pdf

In summary, Canadian research libraries posit that:

AI generated works do not meet the threshold for copyright protection.

A new TDM exception should be implemented and apply to both commercial and non-commercial uses.

Current provisions in the Copyright Act already address infringement and liability, and provide a mechanism for claims related to an AI-generated output, when that output closely replicates an original work that is already protected by copyright.

The government should not restrict the use of AI, unintentionally or otherwise, until court decisions can guide legislative change. To do otherwise would hamper innovation and the emergence of responsible practices.

Canadian Authors Association

Technical Evidence

These questions are not applicable to our organization.

Text and Data Mining

TDM directly engages copyright law in Canada since it involves reproduction of copyright-protected works and creators’ moral rights. Right now, the unauthorized use of such copyright-protected works is contrary to the Copyright Act, unless such use falls within one of the enumerated fair dealing exceptions, and then, only if the six-part legal test is satisfied. Some commentators allege that TDM is only “reading” the text, the way a human would read a book. However, the analogy is facile and false since machines are capable of verbatim regurgitation and manipulation of the works they ingest. By setting policy affirming that TDM requires Copyright Act compliance, and/or amending the Copyright Act to clarify that TDM does require compliance with the Act, would create the clarity that is needed to ensure balance between users and creators.

Providing this clarity ensures:

a. costly litigation will be reduced

b. marketplace solutions will emerge

c. Canadian content makers will maintain control and receive fair remuneration for the use of their creations in AI training, for example

d. no new exception should be created for TDM activities

Yes, TDM activities are being conducted in Canada. Examples include:

a. Books3

b. Cohere AI platform

c. Media monitoring requiring TDM activity

d. LLM research at Canadian universities

e. Globe and Mail: algorithm for front page stories

f. NovelAI

The primary challenge facing rights holders in licensing their works for TDM activities is lack of clarity regarding TDM and the Copyright Act. In the United States, the CCC, Copyright Clearance Center, is a copyright collective providing support for rights holders and engages in direct TDM licensing. This does not exist in Canada. The only collective for writers, publishers and artists, Access Copyright has recently been hollowed out due to failure by the government to remedy the devastating effects of the educational purposes fair dealing exception introduced in 2012, and to remedy the Supreme Court of Canada decision that copyright tariffs are not enforceable.

Because TDM activity is not transparent about which works are used by which data mining companies, rights holders, publishers and Access Copyright have no clarity about whose works are used, for how long and for what purpose. Thus, mechanisms in Canada for creators to obtain compensation for the use of their copyright-protected works has been completely undermined in the last decade by a number of factors: by the decimation of the sole collective agency, Access Copyright, as a direct result of the federal government’s unkept promises to fix the Copyright Act; by the vague wording of the educational purposes exception that prevents educational institutions and related parties from engaging in legitimate licensing processes; and by the gap in the law perceived by the Supreme Court of Canada in holding that Copyright-Board certified tariffs are not mandatory. It is completely untenable for creators to resort individually to litigation to enforce their rights against TDM businesses, even smaller ones, when they have no means to prove the infringements due to lack of transparency. For individual creators to sue the large, foreign, largely US-based platforms is unsustainable.

It is essential for rights holders that there be a rights market as there are in other jurisdictions, where the licensing process is working, and creators have the ability to control when their works should or should not be used and to participate in the compensation process when AI systems use their work.

Rights holders face immense obstacles in licensing right now, because they are being kept in the dark as to which of their works are being used by which TDM companies.

In the AI market with its surge in development, there are many types of licenses and models emerging to address the diverse TDM needs of different companies. Examples include: media monitoring, scientific data, and text based creative data. Opportunities for both collective licensing and direct licensing should be available for creators.

It is vital that the Government recognize that Canada’s cultural industry will unequivocally be negatively impacted by generative AI and that safeguards need to be put in place. Safeguards include clarifying the Copyright Act to ensure that TDM activities are part of the exclusive rights of creators and there should be no exception created for TDM activities in this Act. The Government would thereby ensure rights holders can effectively license, and enforce their rights, including whether or not they wish to allow their work to be used in TDM activities.

The Government should observe case studies such as Britain’s Intellectual Property Office, in 2022, which issued a limitless copyright exemption to AI tech companies, without having to obtain permission. This faced tremendous backlash by creators and Parliamentarians.

The Department of Industry: Innovation, Science and Economic Development (ISED) shares responsibility for guiding policy relating to copyright with the Canadian Heritage Ministry. Thus, any changes to the copyright landscape must be framed by balancing innovation and Canadian creators’ needs and rights.

It is essential that AI developers maintain complete records and disclosures regarding what and how content was used, as well as evidence of permissions granted. AI developers possess the ability to do this and must provide detailed reporting and data analytics ensuring all copyrighted material is clearly identifiable and referenced, whether it has been used in training or excerpted (in whole or in part) in any resulting AI output or product. Use of any copyrighted material at any stage in the process must be compensated through the application of a mandatory license to be paid per use by the developer and/or user of the tool.

Any approach to the authorship of AI-assisted or AI-generated works must be predicated on detailed and precise tracking and documentation of underlying sources, whether or not they are explicitly quoted or footnoted. If unlicensed, copyrighted materials have been used in an AI-assisted or AI-generated work, there should be requirement that both the original rights-holders (e.g. authors, publisher) and the relevant tariffing / licensing agency should be automatically notified of such a proposed use, together with an option for the author to refuse to license that use, or even to refuse it altogether (given the existence of each author's legal right to exercise 'moral rights' over their own material.) Exercise of any "fair dealing" exception should be confined within specific statutory defined limits for fair dealing (still to be defined and embodied within an amended Copyright Act), and must be accompanied by evidence of compliance with a mandatory tariffing regime.

The process of obtaining copyright clearances, such as for film/tv/audiovisual or multi-media uses, is already well understood. AI tool users making use of such content should be required to respect existing law and regulations, obtaining necessary licenses and clearances in order to use such materials. Failure to respect such rights and processes should be understood as a breach of copyright, with appropriate remedies.

The use of metrics on use cases, intended length of use, and intended use of the AI system itself needs to be established as part of the remuneration process. The market for TDM is evolving as are the offerings and costs of various licences for AI training purposes. When the rules around TDM use are fair to both developers and creators, the market will define the level of remuneration for TDM activities.

Canadian tariffing agencies must also be empowered with the right and legal standing to act on behalf of their represented content creators in pursuing remedies for breach of copyright. It would be challenging for individuals to summon the resources to pursue their legal rights when confronted by large institutional, organizational, or corporate infringers of those rights. Further, to avoid undue regulatory and process burden on developers, users, creators, or rights administrators, development of a common, automated settlement tool would enable the simple financial settlement of an established tariffing scheme (built around an e-commerce kind of framework), such that non-exceptional license clearances can be automatically settled.

1. United Kingdom: could serve as a framework for Canada with its balanced approach and growth of a rights market, fair compensation for creators and a growing AI sector.

2. United States: CCC Collective Licensing

Authorship and Ownership of Works Generated by AI

Canadian law is certain that although the owner of a copyright-protected work (literary, artistic, musical, and so on) can be humans, corporations, or other legal entities, only a human can be the author of that work. As discussed below in response to Question 2, Canadian Authors Association clearly advocates that no change to the law should be enacted to disturb that certainty.

While Canadian Authors Association recognizes that some forms of AI technology (such as editing and organizing tools) can be useful tools for writers, CAA is not in favour of creating any form of exception or protection for the AI technology sector. CAA is much more concerned about protecting the position of creators than that of the AI technology industry, which by reason of its sheer size and generally aggressive approach to ingestion, already has a dominant position in the marketplace.

The consumption by AI technology businesses of copyright-protected literary works infringes both the reproductive rights (permission to copy and to benefit from copying) and the moral rights (attribution, integrity and association) of the humans that created the ingested works. The human creators are individual and virtually helpless to assert the infringement of their rights against AI industry players. This certitude creates a heavy imbalance between the rights of creators and users within the copyright landscape. Any form of legislative protection for AI technologies would therefore concentrate even more power in the hands of large technology platforms, which already dominate content discovery and distribution. Therefore, CAA is strongly opposed to introducing any such legislative protection. The diversity of Canadian cultural discourse would be even further diminished if any protections for the AI technology industry were introduced.

In conclusion, CAA maintains that fully AI-generated works are not and should not be copyrightable.

Copyright law in force in Canada today recognizes that human creativity and expression is essential to attracting copyright protection. It is imperative that the articulation by humans of original creativity remains at the heart of Canadian copyrightable works. The legislation should not be changed to alter this well-established principle, to which Canada is bound pursuant to international treaties. Human skill and judgement must be exercised in order to attract copyright protection pursuant to the law right now, reflecting the policy perspective that human expression is vital to Canadian cultural discourse. Put another way, to leave the Copyright Act unaltered in this respect sends a strong message to AI technology developers that that AI-generated products are not capable of attracting copyright protection. Many such AI-generated works are regurgitations of human input and therefore to grant them status as copyrightable works would infringe upon the reproductive and moral rights of the humans whose works were digested without authorization.

As is currently the law in Canada, determining whether a work attracts copyright should continue to focus on the human’s creative processes and the skill and judgement they exercised in making decisions as to the selection, assembly and organization of elements that ultimately are incorporated into the creative work. If a creator uses AI as a tool to create a copyright-protected work, the portion of the work that was created by AI tools should not automatically be deprived of copyright protection. That’s because human skill and judgement were used in deploying the tool to assist with the creation of the work. The Government may choose to clarify or modify the copyright and authorship definition, either through setting maximum limits on percentages of AI- assistance and AI - generated work creators use at the outset to create their work, or using terms like “a substantial proportion of the work must be human-generated” which gives courts the mandate to exercise their discretion.

The Canadian federal government is uniquely positioned to set the stage for a strong, sovereign cultural policy in Canada. AI technology needs no assistance from the federal government to proliferate and thrive in Canada. AI platforms are concerned with profit, not the promotion of a distinctive Canadian cultural voice. The federal government has already pledged hundreds of thousands of dollars to help AI-industry players, without providing the necessary balanced financial incentives to inherently disadvantaged Canadian creators. To compound this imbalance with policies favouring AI-industry players would further stifle the Canadian voice. A strong Canadian voice is necessary not only to maintain and encourage future sovereign cultural identity and diversity, but also, for a strong democratic system.

CAA is aware that the UK government proposed to create an exception for AI-created works, exempting them from having to seek permissions to use copyright-protected work. The backlash against the proposal reached international news and serves as a cautionary tale: governments in charge of copyright and creative policy should not favour technology at the expense of creators. Canadian law explicitly endorses technological neutrality.

Further, the Canadian government itself can learn from its own experience. In creating an indiscriminate fair dealing exception for educational purposes in 2012, the government upended the tenuous balance between the rights of users and creators to the benefit of users – mainly the very large educational sector – and at the expense of creators, a loss currently estimated to be $200 million. Introducing any legislation that would further upset the already heavily imbalanced fair dealing provisions in the copyright landscape would further silence Canadian voices, stifle our culture, and thereby threaten our democracy.

Infringement and Liability regarding AI

There are very real concerns. The current legal framework for fair dealing exceptions within the Canadian Copyright Act has resulted in wholesale and uncompensated copying of copyrighted works; the Supreme Court of Canada has acknowledged that the current wording of the Act fails to define specific, enforceable limits for "fair dealing" while, at the same time, it deprives individual creators of the right to pursue remedies for breach through the tariffing agencies that were meant to collect for the use of their material. The combined findings that tariffs were not mandatory, and that the tariffing agency Access Copyright lacked "legal standing" to act for breach on behalf of its licensed users had the unfortunate effect of leaving Canadian creators without a just option to defend their rights against breach.

It is against this backdrop that creators must assert their rights.

It is therefore critical that AI tools maintain detailed and specific tracking concerning all source materials used to train, or as ongoing inputs to the tool and its subsequent applications, whether by original developers or by subsequent users of the tool. Absent such mechanisms, identifying or enforcement of any infringement would be impossible.

In this context, it could be observed that extending the term of copyright, as Canada has recently done, while depriving authors of the ability to enforce those rights, is at best, flawed. Unenforceable rights are no rights at all.

However, Canadian tariffing agencies must also be empowered with the right and legal standing to act on behalf of their represented content creators in pursuing remedies for breach of copyright. It would be profoundly unjust to deprive creators of their legal remedies arising from sweeping and generalized uses of their copyright-protected materials merely because they cannot, as individuals, summon the resources to pursue their legal rights when confronted by large institutional, organizational, or corporate infringers of those rights.

The most obvious barrier facing creators in determining whether an AI system accessed or copied their copyright-protected work without their consent is that they have no way to prove such infringement. Lack of transparency is a critical obstacle. Therefore, TDM and other AI-industry players must be required to maintain detailed tracking of multiple generations of documents that derive from compounding sources. Overcoming barriers such as these will require robust mechanisms for watermarking licensed sources as well as for detecting infringements of copyright. AI systems must have discoverable and traceable source records and be obliged to make them available to creators. A deemed infringement provision could be added to the Act so that an AI-industry business that fails to keep such records or make them available would be deemed to have infringed the copyright of the creator seeking the remedy. A provision could be added enabling a creator to pursue their rights against any or all of the parties profiting from the unauthorized use of their copyright-protected work: the TDM and other AI-industry business, the platform on which the regurgitated work was published, and/or the end user of the regurgitated work.

Canadian Authors Association is not aware of what measures AI-industry businesses are currently taking to mitigate their liability risks. Such businesses have their own vital interests at stake in the protection of their own intellectual property, as well as for the avoidance of direct or indirect liability arising from their failure to address such risks. This should motivate them toward the development of robust tracking and detection mechanisms that align with emerging public, governmental and industry-wide mechanisms, including a Canadian-made code of conduct, as well as any framework necessary for international treaty obligations.

Absolutely, there should be greater clarity. It should also be a general principle that no copyright can be granted for a generated work that is found to have infringed (or substantially infringed) existing copyrighted materials.

Transparency is required to allow for rights holders to meet the evidentiary burden of proving their work was ingested by an AI platform – discoverable and traceable source records. Conversely, if the onus of proof is on the alleged infringer, the creator of a disputed work must be able to refer to a common and certified master record in defence of a balance of probabilities (or beyond reasonable doubt, if a criminal code test is necessary.)

All rights holders should be able to seek remedies, whether individually or jointly and severally, against Large Language Model creator / operator, AI platform / application provider, and end user of AI generated content.

1. United Kingdom: could serve as a framework for Canada with its balanced approach and growth of a rights market, fair compensation for creators and a growing AI sector.

2. United States: CCC Collective Licensing

Comments and Suggestions

The copyright system is already broken mainly due to the introduction of the educational purposes exception.

The Canadian Bar Association

Technical Evidence

N/A

Text and Data Mining

While some jurisdictions have proposed limited text and data mining (TDM) exceptions to copyright liability for artificial Intelligence (AI) training purposes, the CBA Section does not propose such an exception.

We note that following criticism from creative industries, the United Kingdom (UK) reconsidered a key outcome of its thorough consultation, to widen and replace its existing limited TDM exception that allows for reproduction of copyright works for the purpose of computational analysis for non-commercial research, with a broad TDM exception that allows reproduction for any purpose by anyone, and to database rights with a rightsholders’ option to opt-out. The UK House of Lords Communications and Digital Committee’s Report recommends that the UK Initial Public Offering (IPO) pause its proposed new TDM exception to conduct an impact assessment of the creative sector and if negative effects are found, to pursue alternative approaches. In its response to the Committee’s Report, the UK Government confirmed that it would “not be proceeding with a broad copyright exception for TDM”. Rather, the Government committed to “work with users and rights holders” to discuss “copyright licensing for inputs”, with the hope of producing a “voluntary code of practice”, and foster growth and partnership between the tech and creative sectors. The UK IPO set up a working group made up of representatives from the technical, creative and research sectors, with terms of reference published on its website.

The CBA Section suggests that any amendments to current legislation in Canada would be premature at this time, as the current regime appears more likely to strike a proper balance between copyright holders and users. Until there is clear evidence calling for legislative change, the CBA Section believes restraint is appropriate.

Sections 29 of the Copyright Act already prescribes sufficient exceptions for fair dealing for the purpose of research, private study, education, parody or satire.

Fair dealing in Canada differs from other jurisdictions such as the United States, in that the purpose of the fair dealing must first comply with S.29 of the Copyright Act before fairness of such dealing can be established. The Supreme Court of Canada in Law Society of Upper Canada v. CCH Canadian Limited laid down 6-non-exhaustive factors for conducting a fair dealing analysis as contemplated by Section 29 of the Copyright Act, being “the purpose of the dealing, the character of the dealing, the amount of the dealing, the nature of the work, available alternatives to the dealing, and the effect of the dealing on the work.” The Supreme Court of Canada further explained that fair dealing “must not be interpreted restrictively, and the word “research” must be given a “large and liberal interpretation in order to ensure that users’ rights are not unduly constrained and is not limited to non-commercial or private contexts.” As such TDM, for the purpose of research, private study, or review, may fall within the definition of fair dealing under the Copyright Act.

Furthermore, Section 30.71 of the Copyright Act provides a suitable exception to infringement for the temporary reproduction of works that are essential parts of a technological process, for the duration of that process and where “the reproduction’s only purpose is to facilitate a use that is not an infringement of copyright”.

Should further advances to technology necessitate consideration of a TDM exception, the CBA proposes that there should be an in-depth study of approaches, and conditions/restrictions on the application of the section, which should be carefully studied and considered for Canada. Canada appears to be headed in the right direction, with the proposed Artificial Intelligence and Data Act (AIDA) and the Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems.

The CBA Section proposes that any proposed text and data mining regulation should carefully address at least three categories of text and data: (1) personal information that can be linked to an individual that does not fall into category 3 (Personal Information); (2) works with copyright protection that do not fall into category 3; and (3) documents that are filed with a court, tribunal or other government entity that may have copyright protection:

1. Personal Information should not be collected from sources that are third parties to the collector, AI developer, and/or user, unless it is for a proscribed use or in a proscribed manner. The risk of misuse of AI in targeting human behaviour is high and this type of data collection should be highly regulated.

The CBA Section recommends additional study of the acceptable uses of Personal Information in AI models: in particular, the study of the types of personal information that is a risk of misuse in AI models.

2. Content protected by copyright should be required to be licensed or explicit permission provided for the use of copyright material. The availability of copyrighted material on the internet should not be assumed to authorize its ingestion and reproduction for AI training models, and rights holders should not be required to affirmatively opt out of such uses. These protections are particularly important for works in the creative arts and software programming.

In the alternative or in addition, AI developers should keep records of or disclose what copyright-protected content was used in the training of AI systems. Such transparency and accountability measures are critical to rights holders, who are otherwise unable in practice to determine when and how their content has been accessed and reproduced for such purposes. While we do not propose an amendment to the Copyright Act to provide for such obligations, we note that Bill C-27, The Artificial Intelligence and Data Act (AIDA), currently at consideration in committee in the House of Commons, gives the Governor in Council power to make regulations on topics such as transparency and record keeping obligations. The government has indicated that it intends to hold consultations on this topic, and we urge them to consider such obligations at that time. This aligns with the Bill's purpose of establishing Canadian requirements for the design, development, and use of AI systems.

3. Documents that are filed with a court, tribunal, or other government entity, even if they are subject to copyright protection, should be permitted to be collected for training AI models without the need for a license or explicit consent. The collection of this category of text and data will assist AI models in the legal field and will support access to justice. The costs of legal fees to clients could be reduced by allowing for the use of AI models on documents in this category.

Authorship and Ownership of Works Generated by AI

Although there is no explicit statutory requirement for human authorship in Canada, the provisions of the Copyright Act, coupled with Canadian jurisprudence clearly suggest that human authorship is a requirement for copyright. Therefore, works created entirely by AI will not qualify for copyright protection.

The Supreme Court of Canada explained the Canadian standard for originality in Law Society of Upper Canada v. CCH Canadian Limited as one that “originates from an author and is not copied from another work” and is “the product of an author’s exercise of skill and judgment”, such exercise not being “so trivial that it could be characterized as a purely mechanical exercise” and “will necessarily involve intellectual effort.”

This does not mean that AI cannot be used as a tool by a human author and use of such will not render a work uncopyrightable under Canadian law. Technological tools have been used for some time by creators in a variety of media.

Currently, there is no need for changes to the basic definitions and requirements for copyright to accommodate technological changes in the generative AI field. These should continue to apply to works created by human authors using AI as a tool in the creative process.

Apart from works created by humans with the assistance of AI technology, it is possible for generative AI to produce content following prompts by its users. Users may also further modify works generated by the AI technology or adapt the AI technology itself to meet a specific creative use. With the advancement of AI systems, it is also possible for AI to independently generate creative works with little or no human involvement in producing output. The endless possibilities by which AI can be involved in the creative process with or without human involvement in the input and output process, has created inter-jurisdictional uncertainty as to how these works should be treated, whether they can be considered original or copyrightable, and how the authors and first owners of the work should be identified.

When registering works, the Canadian Intellectual Property Office (CIPO) currently generally requires disclosure of the human author who created the work. CIPO has also registered copyright in a work which listed a human and an AI painting application as co-authors of the artistic work, an approach which received international attention and some criticism. Although CIPO does not conduct a substantive examination of claims made in applications for copyright registration, it would be useful to require disclosure if any elements of the work were created entirely by AI, with no human intervention, in which case that particular element may not be copyrightable. We note that courts in Manitoba, Yukon and the Federal Court similarly require disclosure of when AI was employed in producing court documents, and more Canadian courts may follow suit.

The CBA Section does not believe that it is currently necessary to clarify that copyright protection applies only to works created by humans, about which the applicable language of the Copyright Act and case law is clear.

Because solely AI-generated works do not qualify for copyright, we do not support attributing authorship of AI-generated works to the person who arranged for the work to be created, which would require overturning the requirement of human creativity in copyrighted works. Such approach has been taken in countries like the UK, where the Copyright legislation allows for authorship of computer-generated works by the person who undertakes “the arrangements necessary for the creation of the work”. Recent consultation in the UK supported no changes to the existing law for computer-generated works without a human author. In Canada, there is also no clear legislative intent or jurisprudence that allows for authorship by computer systems or AI, and this is not likely to change soon.

The CBA Section does not support creating a new and unique or sui generis right or set of rights for AI-generated works as there is insufficient evidence to suggest that such approach would fully address the presented issues or maintain the proper balance between the rights of owners and users and the public interest.

Infringement and Liability regarding AI

A. Existing laws are generally sufficient at this time to address AI liability, but developments should be monitored

The Copyright Act and existing Canadian case law provides the necessary legal tests for establishing infringement with respect to: 1) inputs, that is, ingestion and use of copyrighted materials by test data management (TDM) and training of machine learning models, and 2) outputs, where the AI-created work evidences infringement of a substantial part of a copyrighted material.

With respect to inputs, TDM undertaken to feed training of machine learning models, at least in many cases, requires reproductions of the training material. If that training material includes copyright-protected works and other content, and if such reproductions are unauthorized, infringement is clear. Whether such use will be exempted from liability by the fair dealing exception is a case-by-case determination. It should be noted that where the ultimate use is commercial and such use may compete with the ingested content, the application of the fair dealing exception to such activity becomes less tenable.

B. Canada's current copyright framework is for the most part preferable for now

Because AI is an evolving technology, the ramifications of which remain to be seen, application of existing fair dealing provisions provides the most flexible and resilient approach, which can evolve if needed to accommodate economic and technological developments. The established body of existing Canadian case law also bodes in favour of not replacing Canada’s existing fair dealing provision with an alternative untested and completely undeveloped in Canada, such as the US fair use doctrine. For this reason, we believe the existing fair dealing exception is also preferable to attempts to legislate a statutory TDM exception.

With respect to outputs, that is, AI-created works, in at least some scenarios, there will be strong evidence of liability, such as where an output is substantially similar to existing content, and it could likely be demonstrated that the AI system was trained on the pre-existing content. Similarly, where users’ prompts to AI systems result in AI-generated outputs that contain substantial parts of existing content, it could likely be determined that: i) reproductions of existing content has occurred at the input stage, and ii) the output is likely infringing. Where there is strong evidence of AI creations borrowing from a particular creator’s catalogue (if the output bears an uncanny resemblance to the music of Drake, for example), it would be more likely to establish that it was trained on the creator’s catalogue (thereby likely infringing at the input stage), but whether the output infringes a particular work may be more difficult to establish. In this space, it will be useful to closely follow development of case law to see whether the common law, applying interpretive principles such as technological neutrality, continues to adequately protect copyrighted content, or whether statutory adjustments are necessary.

For now, the CBA Section finds that the flexibility and resilience of the current copyright framework, and leaving it to Courts to apply the current framework on a case-by-case basis, appears to be a more prudent approach than trying to legislate what may be static and quickly outdated solutions for a rapidly evolving field.

Outside of the copyright field, the government may want to consider codifying a federal tort of appropriation of personality, or in the case of commercial artists, a federal right of publicity, to protect artists whose likeness or voice is commercially exploited through AI “deepfakes” that, although not necessarily infringing a particular work or song, can exploit the artist’s entire professional catalogue of work. This will also assist because copyright infringement may be difficult to prove in cases where training data has not been copied in the traditional or copyright sense; however, the works of an author have still been used to train the AI. AI platforms should not benefit financially from a prompt that says, “Write me a song in the style of Drake,” for example, where the AI has been trained on the catalogue of Drake, regardless of whether the catalogue itself has been copied within the meaning of the Copyright Act, and whether or not the resulting AI output infringes a specific work.

C. Transparency and accountability requirements can address barriers to barriers to determining whether an AI system accessed or copied a copyright-protected content

With respect to both inputs and outputs, it can be difficult to determine whether an AI system accessed or copied specific copyright-protected content. For this reason, it would be useful for the government to consider rules establishing transparency and accountability for the use of copyrighted content in AI training. As noted above, we do not propose amendments to the Copyright Act, but in the context of rulemaking on The Artificial Intelligence and Data Act (AIDA), the government should consider whether AI Developers and creators of training datasets should maintain records of which content they have ingested, how it was procured, and how it is used in development of AI-generated content, and disclose such information when requested by rightsholders. It is noteworthy that the Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems, introduced in September 2023, includes transparency among the outcomes to which developers and managers of advanced generative systems have committed.

Comments and Suggestions

N/A

Canadian Civil Liberties Association

Technical Evidence

N/A

Text and Data Mining

TDM is a necessary step in training generative AI models. But in the case of generative tools that spit out poetry, music, and illustrations, the “T” and “D” that need to be “M’d” are actually human-made works of art, work that has sprung from the boundless creativity of human minds. There is already something dispiriting about the way generative AI can flatten pre-existing creative works into “text” or “data” for harvesting, and just because these models need lots of data does not mean that that data should be mined with such little regard for the creatives whose work fuels it.

Standardizing TDM usage rights is thus a crucial step toward respecting copyright holders of creative works. These platforms should obtain licenses, pay licensing fees, and bear primary liability for copyright infringement. If platforms lack these licenses, or if no prior authorization is granted, then platforms should be liable for the unauthorized acts of communicating these copyright-protected works to the public, including making them available to the public. Further, standardization of usage rights can create helpful benchmarks for transparency and accountability in how companies approach TDM. This means establishing effective complaint and redress mechanisms along with human oversight of what may otherwise be an automated process. Establishing strict standards of disclosure for users and effective complaint processes for copyright holders can protect individuals when TDM is occurring, and if copyright holders’ work is being used unlawfully.

One such model Canada can look to for protecting rights of copyright holders during TDM is that of English-Corpora.org. In this model, the accessible content has limitations to how it can be used when downloaded. To quote McCracken and Raub (2023), “the vendor manages the limitation of copyright by removing 5% of all the content. Doing this through removing the last 10 of every 200 words, the vendor has created a collection that essentially has no resale value but is still fully valid for linguistic analysis and research.” English-Corpora's model is appealing for many reasons: it creates safe access to content that does not risk delivering the full licensed text collection; vendors would not need to create their own portal to control access to the data; and patrons would have the tools to work on more than one data set and establish reproducibility. Overall, the goal of whatever regulatory scheme is implemented should further protect the rights of the creator and owner of the copyright.

We acknowledge that in some cases, those who wish to train generative AI models may find it necessary to seek exceptions to copyright protections. However, these exceptions should be limited in scope to protect copyright holders' rights and interests. This is consistent with TDM regimes around the world. Though they vary by jurisdiction, countries like Japan, Singapore, Estonia, and Germany allow copyright exceptions for non-commercial uses and research purposes. Both exceptions balance the interests of users and copyright holders: they understand the benefit and value that generative AI research may create for society, while also ensuring that copyright holders derive financial reward from the use of their work in commercial contexts. To balance the moral rights of the owner and the interests of the developer, ISED should implement regulations that require the developer to recognize the use of the owners’ works in non-commercial and research applications.

Another option that could be implemented into the law is a fair use and fair dealing exception that is followed in the United States. This is where the exceptions to the copyright law regarding TDM are determined by a test of proportionality and includes assessing the purpose and character of use, the nature of the copyright, the amount and substantiality of the portion taken during the TDM process, and the effect of the use on the potential market. However, this still needs to consider whether granting this exception to commercial uses would grant a disproportionate amount of control to the technology sector, giving them inordinate power over the creators of copyrighted works. There needs to be attention given to the economic rights of the owner of the copyright so that the law does not allow too many exceptions that the owner can no longer derive financial reward from the use of their work. Moral rights are also an important consideration in TDM and to balance the rights of the owner with the user of their work, regulation should be implemented that requires the user to recognize the use of the owner’s work.

Authorship and Ownership of Works Generated by AI

When it comes to generative AI, there is uncertainty surrounding authorship and ownership for copyright purposes. This uncertainty stems from the separation between types of AI-generated works, and from the difficulty of trying to fit these works into current models of authorship and ownership.

For the first issue, it would be helpful to break up the legal definition for copyright purposes into two categories. One category would be ‘works created solely by AI systems that have no human contribution.’ Under the current copyright laws, these works would not be protected by copyright. The second category would be ‘human works that were made with the assistance of AI.’ If following approaches taken by other jurisdictions, these works could be protected by copyright, if it can be proven that a level of ‘sweat of the brow’ and ‘modicum of creativity’ are demonstrated in the works by a human being. This means that it would need to be determined in law what level of effort of skill, labour, and creative pursuit is necessary to fulfill these two categories, in a sufficient manner to allow for copyright protection (This approach is addressed within Nova Productions v. Mazooma Games [2007] EWCA Civ 219).

The concept of originality has also been considered internationally when addressing how copyright protection is created for a work. Originality is an important factor for copyrighted material and this concept should continue to be considered when works of AI are being analyzed because without ensuring that originality is present, works of AI would merely be a compilation of publicly available or already copyrighted materials. The case of Eastern Book Company v. D.B. Modak (2008) 1 SCC 1. addresses the importance of originality as a guiding principle. This case states that to grant copyright, it would need to be proven that there was expertise, judgement, or skill used in the creation of the work. The case of Rupendra Kashyap v. Jiwan Publishing House Pvt. Ltd., coming out of India, also addresses this issue of originality when considering creations or systems of AI. This court describes how originality could be broken down into the following factors to determine whether its work is original: whether the expression and idea are inherently linked; whether the author applied expertise and effort; whether the least possible level of imagination is present in the work; and whether the resultant work is a product of only work and skill. Both courts in Canada and India have been clear on their view of copyright protection: originality is essential to grant this protection. Without it, there is nothing that the copyright can attach to, and this would be the same result for AI-generated works. That said, the issue of authorship and ownership of AI systems and works still exists and needs to be addressed to answer the question of who will hold the copyright if originality is found in AI-generated works, and consequently who will be held liable for potential issues of infringement.

Although many cases support this traditional view that copyright protection shall only be granted to human-made works, and that our legal ideas of authorship can only exist within this framework, this does not mean that things cannot change. Canada’s living tree approach to our legal system supports reconceptualizing definitions of authorship within the realm of copyright to better suit the technological present and the current needs and interests of society.

Redefining authorship to allow for AI-generated works to fit under current copyright frameworks is another approach that can be taken by the Canadian government. For this, Canada could apply the concept of the “work made for hire” doctrine. This doctrine states that “the individual who created the work is always considered to be the author of the work, regardless of whether the work was created in an employment or independent contractor relationship,” but that the producer’s employer is the first owner of the copyright of works created by an employee in the course of employment (Moskal, 2021). This approach could change how ownership was determined as it would classify AI as employees of whatever company was utilizing the system. Although this doctrine slightly changes the traditional method from granting the ownership of the copyright to the author of the works, to instead granting the ownership of the copyright to the employer of that individual, the work for hire doctrine allows for a more viable solution to issues of infringement and liability that arise with AI-generated works or AI systems. The validity of the work for hire doctrine is supported by Beloff v. Pressdram (1973) 1 All E.R. 241 (Ch. D). It is a solution to honouring the ideas of originality, creativity, and labour that exist in traditional copyright law, while also allowing the legislation surrounding copyright to evolve with the creation of new technology to promote societal evolution and creation.

Although this doctrine has benefits that support the creation of new technologies and provides an answer to the ownership of copyright if it is granted, there are issues with this work-made-for-hire approach to ownership. For one, it could over-reward users, programmers, and companies. It could also allow companies to own every piece of work that the AI program could produce, which has the potential to lead to access inequality issues, where copyright is being obtained at unprecedented rates and access to autonomous AI systems becomes impossible. For these reasons, there would still need to be limits placed upon what can be copyrightable works under this new definition of ownership. A solution would be to still separate AI works into the two above-noted categories, where copyright protection and ownership was only granted over the AI works that were created by humans with the assistance of these AI systems and not solely AI-generated works. This approach has been contested because of the potential issues flagged above, but it is evident that the other option of immediate entrance into the public domain could create issues down the line with liability and infringement. If there is no classified owner over AI systems, then there will be no one to hold liable for infringement over other copyrighted works.

Another approach that is taken for authorship is to grant it to the programmer, in instances where the developer of the AI system is in direct control of their development. This is supported by similar regimes in Hong Kong, India, Ireland, New Zealand, and the United Kingdom. It would need to be determined through law the specific control that a programmer has over their AI system to be able to hold authorship of works created from that system. This would take the place of the human contribution, only if there is a specific vision of the works that the AI will produce. This approach is addressed and supported through the example of “The Portrait of Edmond Belamy”, which was a work created using an AI system that used a data set of 15,000 portraits and sold for massive profits at auction. This is an example of what experts are arguing against allowing copyright protection over, as its creation is evidence of an AI system that has been “designed to access so many variations of input data that developers cannot even imagine what kind of outputs it will lead to” (Budden, 2022). The regimes that support authorship being granted to the programmer support it on the grounds that the result from the AI technology is not too remote from the work of the programmers. It must be ensured that the pattern used by the AI system is discernible to the programmer and not just the AI system; as the basics of copyright law or creative input and labour are lost the moment that the programmer cannot foresee every possible output of the system (Chaudhary, 2022). Granting authorship to the programmer can still fit into the work for hire doctrine if the programmer created the AI system during their employment, giving any copyright ownership to the employer of the programmer.

Infringement and Liability regarding AI

Currently, there is an issue with determining if an AI-generated work has infringed existing copyright, when there are autonomous AI systems that create works beyond the ideas thought of by the programmer. When AI can create and act without input or supervision of a human source, it becomes very difficult to catch infringement unless the programmer or company that is using the AI system is actively keeping record of the sources used in creating its works. One way infringement prevention has been considered internationally is through requiring the implementation of a record-keeping system for anyone involved in the development or deployment of AI systems. An approach to this kind of record keeping is outlined in article 12 of the European Union’s AI Act. It states that “high risk AI systems shall be designed and developed with capabilities enabling the automatic recording of events (‘logs’) while the high-risk AI systems are operating. Those logging capabilities shall conform to recognized standards or common specifications” (EU AI Act, 2023). The Act also states that logging should also include that it the ability to record the period of use, note the reference database which input data has been checked by the system, the input data for which the search has led to a match, and any identification of natural persons involved in the verification of the results. A similar model to this one outlined in article 12 of the EU’s AI Act would be helpful in Canada for limiting infringement and delineating clearer liability standards.

Comments and Suggestions

N/A

Canadian Copyright Institute (CCI)

Technical Evidence

• How do businesses and consumers use AI systems and AI-assisted and AI-generated content in your area of knowledge, work or organization?

AI is currently in use by authors and publishers in the book and periodical industry for research and review and for routine tasks such as checking spelling and grammar in a manuscript for publication or tracking inventory and sales. AI is used by some authors for generating scenarios or drafts, and by some non-author users to generate full texts. It is used by some publishers for generating marketing copy, or for drafting alternative text descriptions of images for use in accessible ebook publishing.

Text and Data Mining

• What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?

Clarification is needed for both understanding and compliance. TDM without authorization from copyright owners and holders whose works are used is a misappropriation of copyright content created by authors. An exception in the Copyright Act for TDM or for works made by AI in reliance on TDM will lead to a flooding of the market for books, magazines and images of works of art by a massive number of machine-generated works that will compete with original works created by human authors and their publishers, reduce their incomes, put some of them out of business and cause others to leave their profession.

• Are rightsholders facing challenges in licensing their works for TDM activities? If so, what is the nature and extent of those challenges?

As an organization, CCI has no direct or specific knowledge of the pirated content of the datasets being used in the process of making publications that will compete with the original works of human authors, but we do know that many rightsholders do not want to license their works for this purpose or to be complicit in encouraging AI developers and platforms that may simply want rightsholder licences as an insurance policy to reduce the risks of making models that spew out machine-made works in reliance on the published copyright works of human authors. Such works without the intellectual involvement of humans exercising their skill and judgment do not have the “originality” required for copyright protection (as delineated by the Supreme Court of Canada) and will impair the market for original works. If the rights of copyright owners and holders and author’s moral rights are apparently infringed by such AI-generated works, there will likely be little they can do to establish anyone’s liability, quite apart from being deterred by formidable costs of endeavouring to do so.

It should be the responsibility of AI developers to develop AI tools that will not allow prompts by users of their systems that could recreate copyright works used as input, whether or not those input works have been licensed for use in TDM.

• If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards? What would be the expected impact of such an exception on your industry and activities?

There should not be a legislated exception to copyright specifically for TDM. Regulations for setting limits on TDM as fair dealing would be premature, but at some point they may be needed.

TDM for AI development should be permitted only if specifically licensed by rightsholders under direct or collective licences. In the absence of negotiated licences, rightsholders should have the option to apply through the Copyright Board for mandatory tariffs that are subject to arbitration by the Board.

AI developers should be required to develop AI tools prohibiting prompts by users of their models that might recreate copyright material used as input for their AI systems, and AI developers as well as AI platforms, any credited publisher and any persons to whom authorship is attributed should be liable or share liability for infringing output.

Although many members and affiliates of collective societies will likely prefer their rights with respect to TDM to be handled by a collective society, a licence from a collective society should not preclude the possibility of direct licensing by an individual copyright owner or holder.

As a safeguard against inaccurate, misleading, manipulative or false information and deep fakes, all published AI-generated content should be labelled as “machine-generated” or “generated by artificial intelligence”. Section 30.71 of the Copyright Act permitting temporary reproductions for technological processes should be amended to specifically exclude TDM if not ruled out earlier by court decisions.

• Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?

AI developers should be required to keep records of all works used to produce or “train” generative-AI models and to release this information promptly to allow public inspection in addition to monitoring by copyright owners or holders, whether or not alleging infringement.

Lack of transparency around what works are used to train generative-AI models also increases the likelihood of infringing authors’ moral right of attribution.

• What level of remuneration would be appropriate for the use of a given work in TDM activities?

Fees should be negotiated by the AI platforms and copyright owners or holders or by collective societies which they have voluntarily joined or with which they have voluntarily affiliated. In the absence of negotiated licences, rightsholders should have the option to apply to the Copyright Board for mandatory tariffs that are subject to arbitration by the Board.

Criminal penalties for copyright infringement and the statutory damages for plaintiffs that are available if opted for by a copyright owner or holder, need to be increased as the infringers or enablers of infringement are most likely to be very large international technological corporations.

Authorship and Ownership of Works Generated by AI

• Is the uncertainty surrounding authorship and ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies?

Neither the user of a generative-AI model nor its AI owner should be treated as an author protected by copyright because a machine-made work lacks the originality without which there is no copyright work.

An AI-generated work should not be protected by copyright. Nor should an AI-assisted work be protected by copyright unless under effective and verifiable human control by a named author and publisher. An AI-assisted work may have a copyright notice, but if published anonymously, pseudonymously or under a pen name, it should be published under an identifiable publisher’s imprint or under the name of an identifiable individual or entity prepared to accept liability if there is an infringement of copyright or if another issue arises, such as libel, invasion of privacy, breach of personality rights or unjust enrichment. Imitating or mimicking the distinctive style of another writer could be viewed as an appropriation of personality rights.

Regulations should include an obligation to name or identify – and publish on every AI-generated publication – a responsible person or entity if liability should fall to anyone other than or in addition to a named author and publisher of the publication.

Infringement and Liability regarding AI

• What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?

Lack of information on the input of copyright works into an AI system is a huge barrier to establishing the access considered necessary by courts to prove the infringement of a work in addition to necessarily subjective assessment of observable substantial similarity between it and infringing AI-generated material – whether there was an actual reproduction of the copyright work or whether the order of words or images in the AI-generated work was predicted by an algorithm. There should be an obligation to keep records of input into an AI system and to make them available promptly for public inspection and for monitoring by rightsholders. These obligations could be required by regulations, which should ideally also state that knowledge or intent to cause harm may be presumed if AI developers or users of their AI tools fail to comply with the regulations.

• Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

AI developers and platforms and those claiming to be authors and publishers of an AI-generated work should be liable and bear liability for infringing copyright works.

Comments and Suggestions

The Canadian Copyright Institute (“CCI”), founded in 1965, a non-profit association of organizations and individuals comprising creators, publishers and distributors of copyright works with an interest in copyright law, observes and submits as follows:

Technology related to artificial intelligence worldwide will likely continue to develop at an astonishing speed. Consequently it is the view of CCI that it is premature to pass any new copyright legislation with respect to AI systems.

No legislation or regulation is needed to clarify that authorization from rightsholders is required prior to any scanning of works into an AI system, except that it may become necessary to put regulatory parameters on “fair dealing” for TDM.

Instead of copyright law continuing, as it has historically since the enactment of the Statute of Anne in 1709, in the British Parliament, to reward human authors for their work and provide an incentive for them to create more works, recognition of copyright in AI-generated works without effective and verifiable human control would disrespect human authors and their publishers and demean their professions as well as reduce incomes in the book and periodical sector of the Canadian economy and put some workers out of work. To legislate change to this basic assumption of human authorship of copyright works would bring a huge cultural shift. AI output of text and images that do not result from authors exercising human skill and judgment should never be protected by copyright.

What is needed right now – prior to eventual amendments to the Copyright Act – is much more information and transparency about any AI systems being used to generate materials including the sources of the data relied on for content. This information must be easily available, not just in case of alleged copyright infringement but in any case. There should be no exceptions now to copyright to accommodate developers of AI and generative-AI platforms. Any legislated exception would encourage more use of entirely AI-generated works that would substitute for, compete with and impair the market for original copyright works created by the skill and judgment of human authors, as well as their creativity and labour, including AI-assisted works.

Parliament should not jump prematurely to enact copyright legislation on AI, before there is certainty that it will be compatible, to the extent reasonably possible, with the copyright laws of Canada’s main trading partners, particularly the United States, Europe, the United Kingdom, as well as former colonies of countries with compatible laws.

Regulations – and any eventual copyright legislation – should require users of AI systems to state on their publications that all or part of the content has been generated by AI and that authorization has been obtained from rightsholders for input material obtained by TDM. In case an author who has made some use of AI wants to be in a good position to defend, as fair dealing, a potential claim of copyright infringement, they may be well-advised to include on their published work any relevant source and, if in the source, the name of the author.

In this continuing extraordinarily disruptive period as Canadian society gets accustomed to generative AI, and as the book and periodical industry accustoms itself to modified practices within the book and periodical sector, it should be remembered that the Copyright Act is based on human authorship including the sole rights vested in authors set out in Section 3.

Article 9(1) of the Berne Convention for the Protection of Literary and Artistic Works states that authors “shall have the exclusive right of authorizing reproduction of their works in any manner or form.” Any exception must pass the 3-step test in Article 9(2) for exceptions and, certainly, any exception allowing use of generated-AI material derived from authors’ works without authorization will “unreasonably prejudice the legitimate interests of the author” and violate that test for exceptions. The 3-step test is also included in the WIPO Copyright Treaty and in the Canada-US-Mexico Agreement.

We should not lose sight of the potentially devastating impact of generative AI on the creative community comprising authors, their publishers and other workers in the book and periodical industry and of how authors’ works benefit those who enjoy, learn from or rely on those works in any way – nor lose sight of how the consumers of some AI-generated materials may be misinformed, misled, deceived, manipulated or otherwise adversely affected. While we marvel at the text and images that can be produced by generative AI and recognize that AI-generated material can have great value if used responsibly in appropriate contexts, let’s not allow use of AI to encroach on respect for human authorship – as expressed in paragraph 2 of Article 27 of the Universal Declaration of Human Rights: “Everyone has the right to the protection of the moral and material interests resulting from any scientific, literary or artistic production of which he is the author.”

Canadian Council of Archives/Conseil Canadien des archives, with the endorsem*nt of l'Association des archivistes du Québec, and the Association of Canadian Archivists

Technical Evidence

How does your organization access and collect copyright-protected content, and encode it in training datasets? Not Applicable (NA)
How does your organization use training datasets to develop AI systems? NA
In your area of knowledge or organization, what measures are taken to mitigate liability risks regarding AI-generated content infringing existing copyright-protected works? NA
In your area of knowledge or organization, what is the involvement of humans in the development of AI systems? NA
How do businesses and consumers use AI systems and AI-assisted and AI-generated content in your area of knowledge, work, or organization?

The holdings in Libraries, Archives and Museums (LAMs) are a major source of documents for AI researchers in their development of training datasets for use in training AI models, particularly those datasets used to train large language models. Canadian archival holdings are a rich treasure trove of records in all formats that serve as the raw material for scholars, students, and ordinary citizens. Archival institutions have eagerly embraced the opportunities provided by the Internet to digitize and make our holdings available online. When digitized, traditional records can be mined for valuable historical information. For example, fur traders’ journals document decades of weather patterns, relations between indigenous people and settlers, and early commercial activities; and tax rolls record the names of residents, which are of great interest to family historians. In addition, archival institutions are acquiring born-digital records and research data sets from their parent institutions and private donors.

Transparency and ensuring non-bias in training datasets is of concern to LAMs because of our public service mission. Although it may be onerous, it is very desirable to create the metadata to be able to identify and link the training datasets to the generative AI output created by the AI tool. This metadata will provide transparency and oversight to the users, and also for the creators and rightsholders whose works are used in the datasets in a non-consumptive way.

Canadian LAMS are also beginning to use some of the emerging generative AI tools for their own purposes. Archives can use AI tools to generate basic metadata, transcriptions, and create important access points for all manner of digitized documents, thereby providing greatly improved access to our holdings for their researchers and the General Public. (See Pavis, Mathilde. Artificial Intelligence: a digital heritage leadership briefing, 2023 https://www.heritagefund.org.uk/about/insight/research/artificial-intelligence-digital-heritage-leadership-briefing.). Along with many other materials, Canadian archives include a large volume of orphan works (works for which the rights holder is unknown or unreachable) on a wide variety of topics and archivists and researchers would benefit greatly from more clarity on whether or not orphan works can be digitized to enable this kind of improved access. The newly created metadata that can be produced by generative AI will unlock access to the content of archival holdings that goes well beyond what is possible with the limited human resources currently available in archival institutions.

Recommendations:

Provide clarity in the Copyright Act for the legal use of data for training generative AI tools
Require AI researchers and developers to ensure training datasets have identifiable metadata that can be linked to generative AI output

Text and Data Mining

What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry? Not Applicable (NA)
Are TDM activities being conducted in Canada? Why or why not? NA

Are rights holders facing challenges in licensing their works for TDM activities? If so, what is the nature and extent of those challenges? NA
What kind of copyright licenses for TDM activities are available, and do these licenses meet the needs of those conducting TDM activities? NA
If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards?

Libraries, archives, and museums (LAMs) are unlikely to be AI developers, so LAMs don’t need an exception that permits them to use copies to train machines. LAMs are more likely to be asked to make copies of their holdings for AI developers upon request. If so, the fair dealing provision (s 29) of the Copyright Act (CA) may well serve, with some changes. The changes proposed below build upon provisions that are part of the balance between the rights of copyright owners and the interests of users already established within the CA.

Before describing the proposed changes, it is important to note that provisions such as fair dealing and the exceptions for LAMs are fundamental to the balance inherent in a well-functioning copyright system. Canada’s approach to the challenges of AI must begin with established principles. The Supreme Court of Canada (SCC) has established that exceptions are not just loopholes, but users’ rights (CCH v LSUC 2004 SCC 13 para 48), and we steadfastly defend their presence (particularly the fair dealing provision) as a fundamental principle. Since fair dealing is not limited to particular user groups, rights, formats, or categories of protected matter, everyone can benefit from it to access and use copyrighted material without authorization or payment, provided that the dealing is fair as determined by the SCC’s two-step test.

As beneficiaries of fair dealing, LAMs already can make copies upon request for the purpose of research. Provided that TDM is appropriately defined to be clear that it is included within a broad and liberal interpretation of research, making copies for TDM falls within one of the allowable purposes of fair dealing. That uncertainty would be clarified if the fair dealing provision were amended by adding TDM or computational data analysis to the list of authorized purposes, OR by making the purposes illustrative rather than exhaustive, i.e., “fair dealing for purposes such as research, private study, ...do not infringe copyright.” A further condition would require the LAM to inform the requester that the copies were provided for research only, that any further uses may require the permission of the rights holder, and that it is the responsibility of the requester to obtain any necessary permissions. Admittedly, the scope of fair dealing may have to be clarified through litigation, since the limited case law cited in the consultation paper does not address situations where the copied images were used to train a machine.

Since users’ rights are fundamental to a balanced copyright system, constraining them through contractual agreements undermines the system. Thus, the Copyright Act must be amended to provide that any contractual provision contrary to the exceptions in the Act shall be unenforceable.

The proposed amendments would provide legal clarity for both LAMs and AI developers by enabling LAMs to provide copies to AI developers to be used in the training of machine learning models.

Recommendations:

Amend the fair dealing provision of the Copyright Act to provide that TDM lies within the scope of fair dealing.
Amend the Copyright Act to provide that copyright exceptions cannot be overridden by contract terms.
What would be the expected impact of such an exception on your industry and activities? NA
Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?

Having sufficient metadata that would identify and link the training datasets to the generative AI output created by the AI tool would be highly desirable. Requiring AI developers to provide such metadata would provide transparency to the users of AI tools, and to the creators and rightsholders whose works are used in the datasets in a non-consumptive way. In order to ensure transparency and clarify rights issues, generative AI output should always be tagged as such.

Recommendation

Require AI researchers and developers to ensure training datasets have identifiable metadata that can be linked to generative AI output.
Generative AI output should always be tagged as such.
What level of remuneration would be appropriate for the use of a given work in TDM activities? NA
Are there TDM approaches in other jurisdictions that could inform a Canadian consideration of this issue?

The possibility of a more general exception to permit TDM falls outside the scope of the archival community’s direct interests. If, however, such an exception is needed, the provisions of Singapore’s Copyright Act pertaining to computational data analysis (sections 243-244) are well thought out in terms of scope and appropriate safeguards. Its strengths are:

Definition of “computational data analysis” (s. 243)
Limited purpose (only Computational data analysis) (s. 244)(2)(a) & (b))
Copy supplied/communicated to another only in very limited circ*mstances (s. 244)(2)(c) & 244)(4))
User must have lawful access to source materials (s. 244)(2)(d))
Infringing source materials can be used subject to specific limited conditions (s. 244)(2)(e))

Authorship and Ownership of Works Generated by AI

Is the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies? If so, how?

In Canada the basic principle is well established that copyright is automatic for original creations that include human skill and judgement, as specified in the Supreme Court decision (CCH Canadian Ltd. v. Law Society of Upper Canada, 2004 SCC 13, para. 25). The output that results from the generative AI mechanical process cannot meet this requirement for skill and judgement and is therefore not protected by copyright. The humanly created algorithm does meet the requirement and is protected by copyright.

The current principle of not assigning copyright protection to generative AI output does not at all appear to be limiting the rapid development and adoption of AI technologies. The lack of certainty is, however, having a profound effect on creators and how they view their future prospects from both an economic and social standpoint.

Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how?

We believe that assigning full intellectual property rights to the output of generative AI processes is inappropriate. LAMs have a long history of advocating for clarity in the Copyright Act and we believe this issue must be addressed in the legislation, to provide as much clarity as possible.

Even with the current constraints and uncertainties, AI is profoundly disruptive in many ways, particularly to the creative communities. Assigning copyright protection to AI output would very negatively affect the work of creators and their contribution to society, resulting in a negative effect on incentive to create. Extending copyright protection to AI output calls into question the value we place on human creativity and expression.

AI processes can be programmed to create mass output that could quickly monopolize the creative space, thereby disrupting in profound ways human creative activity, the copyright balance, and the marketplace.

The rapid development and dissemination of AI has already created considerable disruptions to the creator community, and these will continue to be a major problem. Creators contest that the ingest of their works in the creation of the AI training models without attribution, permission, or financial compensation is a serious problem that will affect them in many ways. But fair dealing and/or a TDM exemption would permit data mining for research purposes of the millions or billions of documents in the data sets used for training.

The prospect of directly compensating creators within the structure of the Copyright Act raises many thorny problems. Copyright law should not be used to address broad societal problems and challenges. However, copyright law is not the only way that we can reward creators. We recommend that the Government create a system outside the copyright regime to reward and acknowledge creators for the part their work plays in generative AI, such as a program in which AI developers are required to contribute to a fund that will be plowed back into the creator community to support a broad spectrum of Canadian creativity. (Other examples of this type of scheme are Canada’s Public Lending Rights, Telefilm). The details of how such a program would work would have to be carefully considered, with input from the creator community, and the outcomes would have to include mandatory contributions by those developing the training datasets, and money paid out to the creator community. This would help redress the balance between human creators and the potential dominance of large corporate AI in the marketplace and the creation landscape. Such a program would enhance Government efforts to ensure support for Canadian creators and creative industries, while simultaneously fostering Canadian AI competitiveness, innovation, and support for maintaining overall access to Canadian creation, all of which are important public policy objectives.

Recommendations:

Amend the Copyright Act to maintain and clarify the basic principle that copyright protects original creations that are the product of human skill and judgement and that the mechanical generative AI output is in the public domain, but the humanly created algorithm is protected by copyright.
Create a system outside the copyright regime, that rewards and acknowledges creators for the part their work plays in generative AI, whereby generative AI developers are required to contribute to a fund that will be plowed back into the creator community to support a broad spectrum of Canadian creativity.
Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

With further study and careful consideration, it is possible to consider very limited rights for AI outputs in particular circ*mstances, a variation of what is sometimes referred to as “thin copyright”, such as the limited rights sometimes accorded to databases. But these should be very limited in both scope and duration.

Recommendation:

Consider very limited rights for AI outputs in particular circ*mstances, a variation of what is sometimes referred to as “thin copyright”.

Infringement and Liability regarding AI

Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright (e.g., AI-generated works including complete reproductions or a substantial part of the works that were used in TDM, licensed or otherwise)? Not Applicable (NA)
What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?

At present, there is no requirement for AI developers to provide metadata that would identify and link the training datasets to the generative AI output created by the AI tool. Requiring AI developers to provide such metadata would assist in determining whether protected material had been copied when generating an infringing output, in addition to providing transparency to users of AI tools, and to the creators and rightsholders whose works are used in the datasets.

Recommendations

Require AI researchers and developers to ensure training datasets have identifiable metadata that can be linked to generative AI output.
When commercialising AI applications, what measures are businesses taking to mitigate risks of liability for infringing AI-generated works? NA
Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

It is clear that there should be greater clarity on where liability lies when AI-generated works infringe copyright-protected works. The current liabilities for copyright infringement should apply, but the issues will be clarified through litigation. Resolving the potential continuum of responsibility that arises with the actual situations in the litigation will be a more realistic approach, rather than rushing into a legislative solution that may have unintended consequences. The solutions must be consistent with the public policy issues discussed in other sections of this questionnaire

Recommendations

Continue to apply current liability provisions and remedies for copyright infringement.
Resolve liability and remedies issues that arise with ongoing litigation, to be consistent with sound public policy
Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue? NA

Comments and Suggestions

Are there any other considerations or elements you wish to share about copyright policy related to AI?

Diagnose and Understand the Problems

In this submission, we have pointed out some areas of deep concern, but have not necessarily proposed concrete or well-developed proposals of how they could or should be addressed. Expedient short-term solutions are not always the best way to proceed with problems arising with rapid change, as the hasty legislative “solutions” can create a whole set of new and unintended negative consequences. Diagnosing and understanding the problems is the important first step in finding viable solutions and this requires time and careful consideration.

Further Consultation Must extend beyond Copyright Stakeholders. We believe that it is crucial that public policy and legislation concerning Artificial Intelligence should be undertaken only after wide public consultation and discussion that includes broad public policy concerns such as protection of privacy and personal information, and other human rights. It is clear AI is introducing significant disruption to current realities, and it is important to remember that this disruption is taking place not only in the commercial marketplace, but also in research and in many other spheres of public life. Across the globe, we are struggling to understand the impacts of this rapidly evolving technology. Despite its many benefits, there are many public interest issues that are of concern to LAMs, concerns that go well beyond copyright—for example machine learning bias, misinformation, privacy issues, data breaches, and protection of freedom of expression. The Archival Community believes that all these issues require careful examination and public consultation that extends beyond this consultation and its copyright concerns. We believe it’s just too early to know how to deal with some of these complex and rapidly emerging problems.

Any changes to policy and legislation, be it copyright or other initiatives, must safeguard the public interest and privacy considerations. Effective public consultation with public interest groups and the broad General Public on the complex issues surrounding AI must take place. It is not only copyright stakeholders and those with a vested interest in the marketplace who should be part of the discussion. Copyright concerns for AI must not be addressed in isolation and without reference to broad public concerns, including the professional, economic, and social disruptions that come with AI. These are extremely important issues for archivists because of the nature of our holdings and our public service mandate. Broad public consultation and discussions are required to develop sound public policy and legislation that will avoid decision-making that is solely market-oriented. The public good will be served by such consultations that are in line with Open Government policies and processes already in place in the Canadian government.

AI is developing at a galloping pace, but we must avoid knee-jerk reactions that do not take into consideration the broad range of public policy issues that are intricately connected to this emerging technology.

Only Litigation Will Solve Some Problems We believe that some of the thorny issues around AI will only be clarified through litigation. This is inevitable in any environment of rapid change. Allowing the litigation to follow its course may be helpful in seeing clearly where the problems are.

Recommendation:

Extend the consultation and discussion of these issues to the General Public to ensure that public policy issues are considered in decision-making concerning AI policy and legislation.

Canadian Electroacoustic Community

Technical Evidence

N/A

Text and Data Mining

"What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?"

More clarity would provide increased security for both the AI and creative industries: confidence within the AI industry that the risk of unwanted, unexpected, or unnecessary legal challenges will be low; and confidence within the creative industry that AI will not be trained on an artist's own work to essentially and effectively replace that artist.

"Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?"

Yes. This is essential, at least in the early stages of developing a framework for the ethical use of AI (i.e. over the coming 5 to 10 years), in order to not only track and monitor what works are being used to train AI systems, but, importantly, to allow for investigations into the relationships between such 'training' works and any AI outputs, in the hopes of establishing and measuring ethical use standards.

Authorship and Ownership of Works Generated by AI

"Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how?"

Yes. However, the primary issue here, it seems to me, is that 'reproduction' of copyright-protected work is no longer an appropriate or adequate measurement in this new context of AI-training and its impact on AI outputs. Thus, either:

a) Copyright law needs a new extension and new framework to tackle this new form of use; or,

b) If copyright law cannot be extended beyond reproduction etc. of copyright-protected work, then copyright law is not the correct context for establishing a legal framework for the training of AI in creative practices, and a new legal framework must be established.

Infringement and Liability regarding AI

"Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright (e.g., AI-generated works including complete reproductions or a substantial part of the works that were used in TDM, licensed or otherwise)?"

Focusing only or primarily on the ‘reproduction’ of a copyright-protected work is insufficient and inadequate. One of the main uses of AI in creative contexts is to produce new works that replicate the style of existing content. This means that, as soon as a human creator produces a creative work, AI is then immediately able to create an infinite number of works that directly replicate the human creator’s creative style, which is a central and essential aspect – or even THE essential aspect – of an artist’s creative communication with their audience, through a personal artistic ‘signature’ – all without the AI directly ‘reproducing’ the original artwork.

In the absence of bespoke protections in this new environment, this would have the same consequences as having an artist’s work immediately fall into public domain upon its release, and a single output by any artist would immediately defeat the need for any further work by that same artist (as AI would be able to self-generate an infinite amount of content in that artist's style).

It is possible, therefore, that traditional copyright is the wrong approach, and that a new bespoke framework should be developed for TDM and AI etc.

"Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?"

Yes. As discussed in the consultation text, there are a number of parties who might bear partial or complete responsibility – or, on the other hand, none at all – for copyright and other ethical transgressions. It is essential for all parties involved that these degrees of responsibility be established with clarity.

Comments and Suggestions

N/A

The Canadian Federation of Library Associations

Technical Evidence

Libraries, archives and museums (LAMs) support AI research in the development of training datasets for use in AI models, particularly those used to train language models. Libraries provide access to large corpora of text and facilitate the licensing of content for AI purposes. Canadian university libraries informally report that researchers are stymied by scholarly publishers’ poor tools and high licensing costs for AI research. These tools are expensive, proprietary, and lack the functionality researchers need. Licensing costs for TDM activities are now a revenue stream for large multinational publishers, requiring libraries to pay multiple times for use, albeit different uses, of the same content. Such actions exemplify the drive to commodify all uses and thereby shrink the commons, threatening the public good and upsetting the Copyright Act’s balance between users and rightsholders.

Some publishers block the non-consumptive use of published works for AI training, while at the same time collecting data and usage patterns of their paying customers to develop AI systems for further commercial purposes (Yoose & Shockey, 2023), threatening privacy and equity standards. Researchers often need access to a wide variety of data sources in order to protect against bias so high costs and extra licenses needed for TDM access can inhibit research.

Libraries are centres of copyright expertise within many organizations, and are called upon by researchers to provide assistance in understanding the copyright implications of an AI research project. Researchers and librarians want to ensure the responsible development of AI and this includes ensuring that copyright is considered and respected. Most potential training datasets are not neatly packaged up, analyzed for copyright issues, and made available under a legally vetted licence. Instead, most training datasets are either vast in size, custom built for training a specific model, or for transfer learning. (Transfer learning is common in LAMs and occurs when researchers use an already trained model and then introduce a small new dataset in order to refine the model so that it better accomplishes a specific task.) Consequently, the use of most training datasets requires a fair dealing assessment in order to mitigate the risk of infringement. Librarians provide copyright guidance to researchers on their proposed use of training datasets, and their use of generative AI systems to create new works. This guidance is needed as the current formulations of sections 29 and 30.71 of the Copyright Act lack clarity for a researcher to know if the training of AI models with their proposed dataset is copyright infringing. Clarity through a specific exception would assist researchers in their AI projects as well as libraries in providing copyright guidance.

In libraries and educational institutions, human input is significant in the development of AI models and datasets. Many developers practice human centred explainable AI, centring the human in AI development, letting us understand and contest generative AI outputs and the decisions underlying those outputs (Ehsan et al., 2023). Thus, to mitigate bias in generative AI models, we need diverse and inclusive datasets. Market solutions providing datasets that are curated and licensed by rights owners is insufficient and public domain materials and openly licensed materials lack sufficient diversity for bias-reduced training of AI models. Thus AI models must be trained on all kinds of works, including unlicensed copyrighted works.

Technological neutrality helps navigate the copyright implications of using datasets containing copyrighted works to train generative AI models. Both the Summary to the Copyright Modernization Act and Supreme Court jurisprudence reminds us of the importance of technological neutrality in preserving the balance between authors and users in the digital environment (Entertainment Software Association v. Society of Composers, Authors and Music Publishers of Canada, 2012, paras 7-8). Technological neutrality implies that Canada’s core copyright understandings must be consistently applied “in a manner that appropriately balances the rights and interests at stake - maintaining in the face of technical change, the steady pursuit of copyright’s policy goals” (Craig, 2017, p. 612).

Generative AI is an evolving technology which enables the analysis and production of information at a speed and scale impossible for human beings. Such technology disrupts our current copyright framework and raises questions about how, or if, this technology implicates the exclusive rights of copyright owners. However, using the lens of technological neutrality allows for copyright to adapt to new disruptive technologies and lets us “maintain normative vigilance as conditions change” (Craig, 2017, p. 617) rather than constantly extending copyright owners’ exclusive rights when the activities of new technologies do not actually engage the copyright owner’s legitimate interests. A work that is copied to be reduced to a collection of discrete elements, or underlying facts and ideas, for training an AI model is not copied for human enjoyment and is not engaging with the author’s interests nor with an incentive to create. A technologically neutral functional equivalence approach tells us that copies made for training AI models do not implicate exclusive rights. To argue otherwise risks entertaining the concept that the acts of reading and memorization of works engages exclusive rights in an infringing way.

There is concern in the research community about training data and mitigating the risk of copyright infringement, but also about ensuring transparency and non-bias in training data. Many of these same researchers are concerned about the impact of the generated products on rightholders and are working on solutions to attribute, or link, training data to the generated works to provide greater transparency to the user. To do this effectively will require that training datasets properly identify the source of each discrete element of content in the dataset.

Canadian LAMs use generative AI tools in multiple ways. For example, university libraries and archives are using computer vision AI and generative AI tools to create extensive metadata for existing analogue image collections. LAMs also use generative AI models to create basic metadata for each document in large scale digitization projects.

Generative AI holds out great promise to enable libraries and archives to provide new access points and greater descriptive metadata to their collections than is currently possible. Generative AI transcription tools, such as Whisper, when trained with specific datasets incorporating content from the collections of libraries or archives, extracts information from audio files (e.g. oral histories and interviews) about the subject matter and the people involved. For example, these tools can extract the titles of all the poems recited and the types of questions asked by the audience in a poetry reading recording; this type of description is too labourious and time consuming without the aid of AI.

Film and media archives use generative AI to move beyond simple descriptions and allow researchers to engage with film in ways we never did in the past, and assist with a wide range of accessibility needs (Mason, 2023). These endeavours are too time consuming for humans to carry out, but generative AI makes it feasible for libraries and archives to provide rich metadata and vastly increased discoverability and access to collections. When utilizing generative AI tools for enriching descriptions and access, the original works are usually in analogue format. Therefore these works need to be converted - copied - into a digital format so that they can separately be ingested into the AI system for individual analysis. These copies are not necessarily for TDM purposes or for dataset training purposes, nor are they being made under the preservation and obsolete format provisions of the Copyright Act. For libraries and archives to use generative AI tools to enhance discoverability and access, they must be confident that the copies they make to utilize the promise of generative AI are not considered compensable or infringing.

Recommendations:

1. Provide clarity around training dataset content by encouraging training datasets to have sufficient metadata such that each content element is identifiable.

2. Ensure that any Copyright Act exception for the creation of non-consumptive copies for the purpose of informational analysis is broad enough to allow LAMs and other users to make non-consumptive copies of works. This would include the ability to circumvent a TPM to make such copies, or for purposes of utilizing technological tools such as generative AI to create metadata and enable superior discovery of those works.

References

Yoose, B., & Shockey, N. (2023). Navigating Risk in Vendor Data Privacy Practices: An Analysis of Elsevier's ScienceDirect (Version v1). SPARC. https://doi.org/10.5281/zenodo.10078610

Ehsan, U., Wintersberger, P., Watkins, E.A., Manger, C., Ramos, G., Weisz, J.D., Daum, H., Reiner, A., Riedl, M.O. (2023). Human-centered explainable AI (HCXAI): Coming of age. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi-org/10.1145/3544549.3573832

Entertainment Software Association v. Society of Composers, Authors and Music Publishers of Canada. 2012 SCC 34. https://scc-csc.lexum.com/scc-csc/scc-csc/en/item/9994/index.do

Craig, C. J. (2017). Technological neutrality: Recalibrating copyright in the Information Age. Osgoode Legal Studies Research Paper Series, 186. http://digitalcommons.osgoode.yorku.ca/olsrps/186

Mason, I. (2023, November 15-17). Organizing within LAMs to address AI [Conference Presentation]. AI4LAMs Conference, Vancouver, BC, Canada.

Text and Data Mining

Text and data mining (TDM) involves the automated identification of patterns within vast datasets, playing a crucial role in the advancement of artificial intelligence (AI). TDM entails creating non-consumptive duplicates (copies that are utilized for purposes other than the works' original objective (eg. reading)) of materials, some of which may be subject to copyright , but are used for technological purposes such as web caching or data processing purposes, like TDM. The legal status of TDM lacks clarity, and the absence of a specific TDM exception in the Canadian Copyright Act hinders researchers' efforts and impedes progress by requiring extensive copyright analysis to ensure compliance.

The comments below build upon our community’s previous submissions and statements, which offer additional examples illustrating the crucial role that libraries play in this domain(CFLA, 2023; CFLA, 2018; Portage, 2018).

Many of the questions that TDM analysis pose are central to larger issues that libraries are struggling with as our collective works move from traditional formats like print where we can rely on copyright laws and exceptions, to digital access where fundamental user rights are eroded under licensing terms and weakened by technological protection measures (TPM). The suggestions and remarks below are embedded into the wider framework of safeguarding the overarching goals of promoting fair access to knowledge and information for the 'public good' (Liber, 2020; IFLA, 2020). These goals are directly impacted by any alterations Canada may make to its copyright legislation to responsibly and equitably address technological advancements such as Generative AI that use TDM.

It is crucial to recognize that generative AI and TDM analysis are distinct tools. TDM is an analytical tool which involves the automated identification of patterns from extensive datasets. Certain TDM applications involve a large corpus of textual data. It is important to articulate that the library community support for TDM is based upon applications of this technology that do not attempt to encroach on the vested copyrights of the original expression of a work, but to facilitate analysis that unearth patterns, information, and correlations, from the facts and ideas behind the works. Librarians and archivists believe this non-expressive/non-consumptive use of a work should be protected in copyright legislation through an exception. Any limitations or regulations applied to TDM will significantly impact the future shape and value of generative AI among other forms of analysis of digital works.

As stated in previous submissions (CFLA, 2021), the library community is familiar with the limitations and chilling effects that current copyright legislation imposes. Libraries are finding efficiencies and technologies to keep up with the proliferation of all formats of works (National Lottery Heritage Fund, 2023). However, restrictive licensing, digital locks and TPMs erode well-established user rights and inhibit access. In one example of how such restrictions affect scholarly work, a Canadian-led group of researchers was forced to retract a paper that had been accepted for publication on COVID-19 vaccine hesitancy because, while the law allowed it, the database contract overrode the statutory rights of the researchers; they had not secured a licence to mine a database of news articles used in the study (RetractionWatch, 2021; CFLA, 2023). Libraries acknowledge the need for mechanisms that allow for and incentivize a market for TDM data, but these incentives cannot come at the expense of basic user rights to the original publication or access to the facts and data of the expression. The non-consumptive nature of these analytical uses of works is an important concept to build into any technologically durable copyright policy.

As outlined in this Consultation’s Paper, two general directions exist that address TDM within copyright legislation in other jurisdictions. The library community supports the introduction of a specific TDM exception. This approach provides a practical basis for users and a solid framework for libraries to support research and creativity; but we caution against overly restrictive language potentially leading to unexpected obstacles as technology and expression evolve. Many of Canada’s key trading partners already have a specific exception for TDM, including Japan, Singapore, UK, and the EU. The library community supports an exception that applies to both commercial and non-commercial research which includes both the reproduction right and communication right. Japan's 2018 TDM exception, based on Article 30-4 of its Copyright Act, specifies that non-consumptive copies do not infringe the rights of the copyright owner. The Japanese exception permits TDM for both commercial and non-commercial purposes and prohibits rights holders from making TDM reservations(Ueno, 2021). Additionally, it nullifies contractual clauses attempting to restrict TDM.

Libraries should be able to override contract restrictions that thwart statutory rights and copyright exceptions so that vendors cannot make TDM reservations and/or fair dealing reservations. Singapore's 2021 TDM exception also allows for commercial and non-commercial TDM, explicitly forbidding contractual overrides. Similar to Singapore’s 2021 Computational Data Analysis amendment, this exception must apply to contracts governed by Canadian law or governed by foreign law “where the choice of foreign law is wholly or mainly to evade any copyright exception”(Kang, 2021).

To safeguard the integrity of the balance of user rights a TDM exception needs to be supplemented with illustrative language within the fair dealing framework. Adding the words “such as” to the purposes given in S.29 of our Act will allow users to confidently apply basic user rights across creative expression (CFLA, 2023). For example, the use of illustrative language in the US has established a solid legal basis within their fair use framework for non-consumptive research, such as TDM, on copyrighted materials. Legislation that anticipates fair and diverse access to information requires an approach that does not over-inflate the expressive capabilities of machine generated output or undervalue the importance of access to the widest possible scope of information that will enable unbiased applications of this form of analysis.

Contrary to licensing as a viable solution for TDM, libraries argue, as articulated by the International Federation of Library Associations (IFLA), that the right to access content should inherently encompass the right to engage in text and data mining.

"[T]he right to read ... content should encompass the right to mine. Further, the sheer volume and diversity of information that can be utilized for text and data mining, which extends far beyond already licensed research databases, and which are not viewed in silos, makes a licence-driven solution close to impossible" (IFLA, 2013).

Since research is often conducted by international teams, CFLA recommends the development of an international TDM instrument at WIPO to ensure that cross border research is not hampered by patchworks of national legislative barriers. The vast and diverse range of information available for TDM, extending beyond licensed research databases and not compartmentalized, makes a license-driven solution nearly impractical.

Recommendations

1. Create a specific exception for TDM. The library community supports the creation of a specific exception that would "facilitate the use of a work or other subject-matter for the purpose of informational analysis".

2. Further facilitating TDM: prohibit contract override and allow circumvention of TPMs for any non-infringing purpose. CFLA recommends introducing an exception that prevents contracts from overriding copyright exceptions for non-infringing purposes. This provision should apply to all future and pre-existing contracts.

3. Make fair dealing purposes illustrative. CFLA supports recommendations in the 2019 Copyright Review related to the enumerated list of purposes under Section 29 of the Copyright Act.

4. Support the creation of a specific international exception for TDM.

References

CARL-ABRC Fair Dealing Comparison Chart (2018). https://www.carl-abrc.ca/wp-content/uploads/2018/07/Fair-dealing-comparison-chart.pdf

CFLA-FCAB Brief to the Government of Canada Consultation on a Modern Framework for AI and the IoT (2021). https://cfla-fcab.ca/wp-content/uploads/2022/01/CFLA-CARL-Brief-Artificial-Intelligence-and-the-Internet-of-Things.pdf

CFLA-FCAB Statement: Copyright and Text and Data Mining (TDM) Research (2023). https://cfla-fcab.ca/wp-content/uploads/2023/07/CFLA_FCAB_Statement-on-Text-and-Data-Mining-Research-1.docx.pdf

IFLA. IFLA Statement on Libraries and Artificial Intelligence (2020). https://www.ifla.org/publications/ifla-statement-on-libraries-and-artificial-intelligence/

IFLA. IFLA Statement on Text and Data Mining. https://www.ifla.org/publications/ifla-statement-on-text-and-data-mining-2013/

Kang, A. (2021). Coming Up in Singapore. https://www.lexology.com/library/detail.aspx?g=1ce9c997-22a1-4953-bd0b-68a95d31bc89

Liber. (2020). Text and Data Mining. https://libereurope.eu/wp-content/uploads/2020/11/TDM-Copyright-Exception.pdf

National Lottery Heritage Fund. Artificial Intelligence. https://www.heritagefund.org.uk/about/insight/research/artificial-intelligence-digital-heritage-lead

Portage. (2018). Brief to INDU Committee on TDM. https://www.ourcommons.ca/Content/Committee/421/INDU/Brief/BR10245456/br-external/PortageNetwork-e.pdf

RetractionWatch. (2021, July 30). A very unfortunate event. https://retractionwatch.com/2021/07/30/a-very-unfortunate-event-paper-on-covid-19-vaccine-hesitancy-retracted/

Ueno, T. (2021). The Flexible Copyright Exception for ‘Non-Enjoyment’ Purposes. GRUR International, 70(2), 145–152. https://doi.org/10.1093/grurint/ikaa184

Authorship and Ownership of Works Generated by AI

The current Copyright Act has achieved a certain balance that would be disrupted by including AI outputs (CFLA, 2023). The current Copyright Act has achieved a certain balance that would be disrupted by including AI outputs. The Copyright Act safeguards works crafted by human authors, including the underlying computer programs of AI. The development and adoption of AI technologies is not inhibited by the current lack of copyright protection of AI-generated works. However, the lack of a policy framework for generative AI is having an impact on creators, and could be addressed in a number of ways outside of copyright.

In Canada, copyright serves to protect the expression of human creativity, encompassing both skill and judgment. Outputs from mechanical and routine processes do not meet the originality standard set by the unanimous CCH decision of the Supreme Court of Canada (CCH Canadian Ltd. v Law Society of Upper Canada, 2004). Without expressive agency and intellectual effort, the outputs of AI processes should not be accorded similar copyright protection as works by human creators. Carys Craig underscores that "authorship involves expressive agency, a quality inherently lacking in AI (Craig, 2021)." Granting machines the status of rights holders is contrary to the current provisions in the Copyright Act.

The outputs of AI processes without significant human intervention represent mechanical exercises devoid of skill and judgment, contrasting with the exercise of skill and judgment in developing an algorithm. Consequently, a computer program is protected by the Copyright Act. Unlike human authors, AI processes do not rely on copyright incentives to produce new works (Gervais, 2020). Expanding protection of intellectual property rights to outputs generated by AI machines could upset the balance of IP protection and discourage other stakeholders.

AI processes possess the ability to generate works more rapidly and systematically than human authors. The substantial output facilitated by AI has the capacity to displace human creativity and introduce economic disruptions, disadvantageous to human authors while favoring the swift and serendipitous outputs of machines. One of the primary purposes of copyright is to strike a balance between the rights of authors and the broader public interest, particularly in education, research, and access to information (WIPO, 1996). The extensive output enabled by AI has the potential to disrupt the economy by placing human authors at a disadvantage. If subjected to a comprehensive spectrum of copyright protections, this volume-driven "autoship" could marginalize human authors' outputs and undermine society's right to access facts and information that would otherwise remain in the public domain. Giving the full duration of copyright protection to AI-generated works could result in copyright overreach on a massive scale, allowing some AI companies the potential ability to crowd out human creators in such areas as music (Obeebo Inc., 2019).

On the matter of authorship, CFLA currently advocates that outputs of AI processes remain unenclosed and open to the public. As cautioned by Craig and others, extending full copyright protection to AI outputs poses a threat to the equilibrium of copyright and challenges the value Canada places on human expression (Craig, 2021; Copyright Review Board, United States Copyright Office, 2022.)

The Copyright Act should remain as is with regards to human authorship. Granting copyright protection to artificial intelligence (AI) outputs could disrupt the intricate and nuanced equilibrium established in the Copyright Act. The Act currently safeguards works crafted by human authors, encompassing the computer programs that form the foundation of AI.

In circ*mstances where sufficient human expressive agency is added to an AI-generated work (e.g. the output of a generative AI process has been substantially re-edited in Photoshop), as the U.S Copyright Office notes works could be afforded copyright protection under certain circ*mstances (United States Copyright Office, March 2023). The granting of any such protection would be judged on a case-by-case basis, and copyright ownership claims could be based on documentation of the exercise of human skill and judgment dedicated to the revision of the AI generated work.

Possible ways to make clear the origins of AI-generated works include the addition of metadata that identifies the work as AI-generated. For example, the private company Stability AI is currently working on a tool that will tag image content generated with their tool with metadata that discloses the AI origin of the work, which could be protective both to distinguish AI generated work from human expressive content and also protect against “deep fakes (Stability AI, 2023).” If AI generated works are marked in some way it will be easier to trace the public domain copyright status of AI outputs, to distinguish them from works that have copyright protection.

The Canadian Intellectual Property Office (CIPO) should refrain from granting copyright registration to AI created works, and refrain from acknowledging AI machines as co-authors or single authors of works. CIPO should be guided by the work done by the Copyright Office in the United States, which produced “Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence in March 2023 (United States Copyright Office, 2023).” This document makes it clear that human authorship is necessary for copyright registration and protection, unless significant human input is made to the resulting output.

AI Authorship Recommendations

1. Artificial intelligence authored works should not be protected by copyright.

2. The Canadian Intellectual Property Office (CIPO) should refrain from granting copyright registration to AI created works, and refrain from acknowledging AI machines as co-authors or single authors of works without the applicant showing significant human input has been made to the outputs.

3. While CFLA recognizes that generative AI can be disruptive to creators, the issue of possible compensation for creators of material used to train AI machines should be separate from the Copyright Act.

References

CCH Canadian Ltd. v Law Society of Upper Canada, 2004 SCC 13, [2004] 1 SCR 339.

CFLA Copyright Committee. (2023). CFLA Statement: AI and Copyright and its application in Cultural Heritage Institutions. https://cfla-fcab.ca/wp-content/uploads/2023/07/CFLA-FCAB_Statement_on_AI__Authorship-1.docx.pdf

Craig, C. J. (2021). AI and Copyright., in Florian Martin-Bariteau & Teresa Scassa, eds., Artificial Intelligence and the Law in Canada. Toronto: LexisNexis Canada.

Copyright Review Board, United States Copyright Office. (2022, February 14). Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise. (Correspondence ID 1-3ZPC6C3; SR # 1-7100387071).” https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf

Gervais, D.(2020). The Machine as Author. Iowa Law Review. 105 (2062), 2053–2106.

Obeebo Inc. (2019). Comments on Intellectual Property Protection for Artificial Intelligence Innovation., submitted to USPTO Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation. https://www.uspto.gov/sites/default/files/documents/Obeebo-Inc_RFC-84-FR-58141.pdf

Stability AI. (2023). Comment from Stability AI. U.S. Copyright Office.

https://www.regulations.gov/comment/COLC-2023-0006-8664

U.S Copyright Office. (2023, March 16). Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence. Federal Register, 16191-16192

https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence

World Intellectual Property Organization. (1996) WIPO Copyright Treaty. Geneva.

Infringement and Liability regarding AI

CFLA’s posits that non-consumptive copies of works used to train AI machines are allowable uses under fair dealing. Thus, on the matter of infringement in training data also see our Text and Data Mining recommendations regarding a TDM exception, contract override provisions and adding illustrative purposes to fair dealing.

AI introduces a host of opportunities for misuse and exploitation of Ai's power to pursue illegal endeavours such as privacy intrusions, large scale copyright infringement, illegal collection of data and other actions. These issues are significant from a public policy perspective, and must be addressed along with the copyright implications of AI.

The Copyright Act already provides legal remedies for copyright infringement in works created through generative AI. If an AI-generated output is determined to be substantially similar to an already existing human created work, it can be subject to a copyright infringement claim which would be decided through the courts. Additional clarification in the Copyright Act is not needed. The question of who would be considered liable (e.g. those providing access to the generative AI product or service, the programmer, or the end user) exists on a continuum and needs to be evaluated by the courts. Generative AI outputs should not have copyright protection and should remain in the public domain. Courts must also protect against copyright misuse (Twigg, 2012), so rights holders seeking protection in areas such as style, ideas, facts and data do not overreach the statutory limits of copyright and encroach on the public domain.

As derivative works created by AI could be influenced by a number of factors, including a dearth of training data, infringement by users of AI services in many circ*mstances may be unintentional. Also, some generative AI services such as ChatGPT disclaim responsibility for similarity to content that is produced by their tools, and thus do not guarantee that similar material will not be created for different users of their service (OpenAI, 2023). Incidental copyright infringement might also be recognized as a possible defence when it comes to accidental infringement in AI-generated works. For example, many AI generated artwork outputs are somewhat random, and keywords used can only guide outputs. In Section 30.7 of the Copyright Act, the “incidental inclusion” provision states: “It is not an infringement of copyright to incidentally and not deliberately: (a) include a work or other subject-matter in another work or other subject-matter; or (b) do any act in relation to a work or other subject-matter that is incidentally and not deliberately included in another work or other subject-matter (RSC,1985, c. C-42).”

Libraries would like to see the ongoing development and use of generative AI centring transparency in how AI models are trained, algorithms are used, and in the design and intentions behind AI tools. Transparency is essential to protect and inform users about how generative AI tools make decisions, especially when it comes to certain areas such as healthcare. This transparency goes far beyond creator and copyright issues in terms of impact on Canadians.

A lack of transparency regarding training data is an obstacle to determining if a non-consumptive copy of a specific work was used in the AI-training process.

Using metadata tags to track training material could help remedy this situation. AI developers should keep records of where training data came from, and be required to disclose training data summaries in response to claims of infringement. Infringement claims should be based on the similarity of AI-created outputs to training materials that have been ingested, not based merely on non-consumptive copying of content, and infringement should be decided in a court of law. However, transparency requirements need to remain flexible, not be retroactive, and allow sufficient time for AI developers to plan and implement.

Some generative AI companies that used creative copyright-protected works to train their machines, such as Stability AI, have taken steps to create tools allowing for creators to opt-out of the inclusion of their work in the companies’ models going forward (Heikkilä, 2022). The ability to opt-out of training data should remain a private ordering and not be legislated. Legislating TDM to allow opt-outs will have significant unintended consequences by limiting the potential sources of data on which AI tools can be trained, thus contributing to existing issues of bias and inequality in AI-generated outputs as well as having serious long term effects on the future reliability of AI machines in certain applications such as health care, autonomous vehicles, etc (Craig, 2021, p.3; Creative Commons, 2021, p.6).

Copyright infringement liability should be determined in the courts. Liability for copyright infringement regarding outputs could lie with the developer, the AI company, or the user, or lie on a continuum. Additionally, liability when it comes to generative AI goes far beyond copyright when it comes to “high risk” consumer applications such as medical uses or self-driving cars, and the evaluation of whether user error or a defect was present in the design of the AI (Long, 2023).

The threat of liability will have an impact on cultural heritage institutions that are mandated to preserve, disseminate, and provide access to knowledge, culture, and history. These public good institutions need clear protection from liability so that they can continue their mission. With so little case law in the area of AI liability many jurisdictions may take a “wait and see approach” before implementing legislation, and Canada may be wise to follow suit (CRS, 2023).

The AI Act in the European Union offers guidance. It stipulates that any image, audio, or video content displaying a noticeable similarity to authentic or truthful content (a ‘deep fake’) must be noted as having been generated through automated means unless it is for some allowable purpose (European Commission, 2021). It may be useful to have identification mechanisms such as metadata for AI generated creative content in order to identify outputs created via generative AI and to distinguish them from copyrighted works, as well as having the ability to identify “deep fakes” (Barney & Wigmore, 2023).

As mentioned in the Text and Data Mining section of our response, a number of Canada’s key trading partners have a specific TDM exception, including Japan, Singapore, UK, and EU. The library community supports an exception that applies to commercial and non-commercial research, and includes the reproduction right and communication right such as Japan's 2018 TDM exception specifying that non-consumptive copies do not infringe the rights of the copyright owner (Ueno, 2021). The Japanese exception permits TDM for commercial and non-commercial purposes, prohibits rights holders from making TDM reservations and nullifies contractual clauses restricting TDM (Ueno, 2021). Singapore's 2021 TDM exception also allows for commercial and non-commercial TDM and forbids contractual overrides (Kang & Oh, 2021).

Recommendations:

1. Please refer to our TDM section.

2. A mechanism for judging copyright infringement in generative AI outputs already exists and thus infringement should be determined in the courts.

3. Liability regarding possibly infringing AI outputs may reside with the developer, the AI company, or the user, or lie on a continuum.

4. There needs to be a consideration of incidental inclusion when it comes to generative AI outputs.

5. With so little case law in the area of AI liability many jurisdictions may take a “wait and see approach” before implementing legislation, and Canada may be wise to follow suit.

References

Barney, N., & Wigmore, I. (2023, March 21). What is Deepfake AI? https://www.techtarget.com/whatis/definition/deepfake

CRS. (2023, September 29). Generative Artificial Intelligence and Copyright Law. (Legal Sidebar LSB10922) https://crsreports.congress.gov/product/pdf/LSB/LSB10922

Craig, C. (2021). Joint Submission of IP Scholars, Re. Consultation on a Modern Copyright Framework for Artificial Intelligence and the Internet of Things. https://digitalcommons.osgoode.yorku.ca/reports/226:

Creative Commons (2021). Submission to Government of Canada Consultation on a Modern Copyright Framework for AI and the IoT. https://wiki.creativecommons.org/images/f/f6/Creative_Commons_submission_Canada_consultation_AI_and_the_internet_of_things.pdf

European Commission (2021, April 4). Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts (COM/2021/206 final). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206

Heikkilä, M. (2022, December 16). Artists can now opt out of the next version of Stable Diffusion. MIT Technology Review. https://www.technologyreview.com/2022/12/16/1065247/artists-can-now-opt-out-of-the-next-version-of-stable-diffusion/

Kang, A. & Oh, P. (2021, September 20). Coming Up in Singapore. https://www.lexology.com/library/detail.aspx?g=1ce9c997-22a1-4953-bd0b-68a95d31bc89

Long, R.E. (2023, March 17). Artificial Intelligence Liability. Center for Internet and Society. https://cyberlaw.stanford.edu/blog/2023/03/artificial-intelligence-liability-rules-are-changing-1

OpenAI terms of use (2023, November 14). https://openai.com/policies/terms-of-use

Twigg, M. (2012). Copyright Misuse: Protecting Copyright in Canada from Overreach and Abuse. Dalhousie Journal of Legal Studies, 21(1). https://digitalcommons.schulichlaw.dal.ca/djls/vol21/iss1/6/

Ueno, T. (2021). Flexible Copyright Exception for ‘Non-Enjoyment’ Purposes. GRUR International, 70 (2). https://doi.org/10.1093/grurint/ikaa184

Comments and Suggestions

Copyright law should not be utilized as a tool to tackle the broader societal challenges that may result from the effects of generative AI on society. Nor should AI innovation be constrained in Canada by laws that are inflexible and have fewer exceptions than other competing jurisdictions, such as the US, which has an expansive fair use doctrine for AI developers and researchers to rely on.

AI possesses the capacity to revolutionize numerous occupations beyond individual creators, and such disruptive innovations have been seen throughout human history such as the printing press, automation in industry, and the digital disruption of the internet, to name a few examples. Addressing the resultant innovative disruption by supporting training for new opportunities in jobs related to AI development or by supporting worker retraining through organizations like community colleges, universities and public libraries, should be approached at an economic and society wide level (Library Copyright Alliance, 2023). As well, the Canadian government should invest in more grants and support for Canadian creative industries and for creators in the long term.

As it currently stands there is a huge swath of information that is unavailable to Canadian higher education researchers and smaller independent AI researchers because of technological protection measures and prohibitive licensing fees to access some data sets. This includes licensed library resources that in many cases require additional text and data mining agreements to be able to be used by institutional researchers for TDM purposes. Researchers may need access to many sets of data in order to complete a project, and there is a real risk that these research projects might not be realized. There is a societal risk of a regime of monopolistic access to data, where large AI or data companies are the only ones that can afford to gather, purchase or assume the risk of accessing data sets (Internet Archive, 2023). Democratic access is reduced under licensing regimes. It is in the public interest for Canadian AI researchers to have robust exceptions when it comes to TDM.

References:

Library Copyright Alliance. (2023, July 10). Library Copyright Alliance Principles for Copyright and Artificial Intelligence. https://www.librarycopyrightalliance.org/wp-content/uploads/2023/06/AI-principles.pdf

Internet Archive. (2023, Nov. 1). Internet Archive’s Public Comments in Response to the Copyright Office Study on Artificial Intelligence. Comment from Internet Archive. U.S. Copyright Office. https://www.regulations.gov/comment/COLC-2023-0006-8836

Canadian Media Producers Association (CMPA)

Technical Evidence

A. Technical Evidence

The use of AI in the production sector

Canada’s independent producers embrace the possibilities of Generative AI when used responsibly, ethically, and lawfully. These principles include respecting the rights in the content being used to train the systems and recognizing the importance of human expression in copyright-protected works and other subject-matter. For CMPA members, AI is a tool that supports – but does not supplant – human creation.

There is a need to distinguish between AI tools that assist with the creative and production process and those that use existing copyright-protected works and other subject-matter to generate content that competes in the market.

Technologies using some form of machine or artificial intelligence in the production of film and television projects are not new. Editing, visual effects (VFX), and computer-generated imagery (CGI) often employ AI-technology to make shows more realistic. AI can be used to streamline location scouting and budgeting, financial forecasting, and modelling. It can also assist with camera automation and scene optimizations, visual effects, colour-correction, sharpening detail, and sound mixing, among other things. When used responsibly, AI can provide tremendous value to the creative and production process and ultimately produce a better product on the screen. AI, like other tools, can also be used to support the creative process by handling routine, repetitive and sometimes menial tasks in production and post-production.

Producers and their suppliers routinely use a range of technological tools, including AI, to create and produce film, television, and digital media content. This saves production costs, time and resources that can be used elsewhere.

As we understand it, these tools are not the sorts of technology that is the subject of this Consultation. Instead, the Consultation is focussed on Generative AI, meaning diffusion models and large language learning models that can generate high-quality text, images, music, and other content based on the content they were trained on.

While Generative AI can be useful to automate monotonous tasks, like writing an email or designing layout, it does raise different copyright policy implications than the first-generation AI tools mentioned above.

As noted in the Consultation, the TDM used in the training of Generative AI systems involves the reproduction of large quantities of data and copyright-protected works. This raises a number of issues.

First, the “inputs” into TDM are often copyright-protected works, for which no licences or permissions are sought. Consequently, there is little, if any, compensation flowing to rightsholders for the use of their works.

Second, the data and works used to train Generative AI technology is a black box. Rightsholders have no insight or information about whether their works or other subject-matter were used to train a specific system. This creates not just licensing and enforcement problems for rightsholders, but also issues with respect to unknown biases, authenticity, attribution, and overall trustworthiness of these systems for users.

Third, Generative AI can output high-quality content that competes with the market for rightsholders’ works. To date, AI-generated videos remain relatively primitive and short and as a result, we have not seen a public Generative AI platform that is producing high-quality full-length television shows or feature films. However, we expect that as the technology advances this type of application will develop.

Fourth, AI can generate deepfakes, holograms, stand-ins, voiceovers, and reproduce an actor’s voice and/or likeness or a writer’s style. While not necessarily triggering copyright per se, these uses are copyright-adjacent and must be considered when examining the copyright policy implications of Generative AI.

Text and Data Mining

B. Text and Data Mining: No New Exceptions are Required

The Copyright Act is technologically neutral, meaning that copyright law has developed and should continue to develop in ways that are independent of any particular technology. Existing copyright laws are adequate to encourage innovation in the development of Generative AI and ensure that rightsholders are properly compensated when their rights are engaged.

As noted in the Consultation, there are two possible exceptions to copyright infringement that could apply to TDM activities: fair dealing under section 29 and the exception for temporary reproductions for technological processes in section 30.71. Both exceptions, or user’s rights, are available to Generative AI developers. It is our view that these exceptions are sufficient to allow certain uses of copyright-protected works in TDM, as appropriate. As such, no new exceptions are required.

Fair dealing for the purpose of research under section 29 of the Act is flexible enough to allow certain TDM required for certain AI uses (i.e., the “inputs”) if determined by a judge to be “fair” based on the facts. A fair dealing assessment will balance the interests of rightsholders and users on a case-by-case basis, by considering several factors, including the purpose, character and amount of the dealing, the nature of the work copied, alternatives to the dealing and the effect of the dealing on the work.

Implementing any sort of specific TDM exception would require a careful examination and balancing of all these factors. Various TDM exceptions around the world have attempted to incorporate some of these safeguards into the exception itself, such as requiring the use be for “non-commercial research only” (UK) or the “sole purpose of research” (France), making the uses available only for certain organizations such as not-for-profit partnerships, public libraries, archives and museums (France). Other TDM exceptions create conditions on the use: for example, the exception is only available if there is lawful access to the original work (UK) or if the first copy is not infringing (Singapore), the use does not “unreasonably prejudice the interests of the copyright owner in light of the nature or purposes of the work or the circ*mstances of its exploitation” (Japan); the copy is accompanied by sufficient acknowledgement (UK), the copies are not shared (UK) and numerous other conditions. Importantly, none of these TDM exceptions have been able to provide the level of balancing between rightsholders’ and users’ interests that Canada’s fair dealing exception can provide.

Additionally, we have not seen any evidence that a new exception for TDM is necessary. Generative AI platforms are flooding the market and there is no evidence that the current copyright framework is disincentivizing innovation in the field. To the contrary, several platforms appear to believe quite strongly that the training of their systems does not infringe copyright. Google, Microsoft, OpenAI, Adobe and Getty are all indemnifying their users against copyright claims that might result from the use of their platforms and the distribution of content generated by them. [See for example, Adobe’s indemnity, which reads: “ With Firefly, Adobe will also be offering enterprise customers an IP indemnity, which means that Adobe would protect customers from third party IP claims about Firefly-generated outputs”, Google’s indemnity here: [https://cloud.google.com/blog/products/ai-machine-learning/protecting-customers-with-generative-ai-indemnification], and Microsoft’s here: https://www.microsoft.com/en-us/licensing/news/microsoft-copilot-copyright-commitment].

The Consultation suggests that “clarity” might be required. But introducing an exception in the name of “clarity” now would mean the Government is merely anticipating the impact of rapidly changing technology in a developing market. We caution the Government on moving quickly to provide broad exceptions for TDM when the technology is changing so rapidly, its potential impacts cannot readily be known and the market is developing.

In fact, the introduction of an exception would disrupt a burgeoning licensing market for the use of copyright-protected works and other subject-matter for TDM in Generative AI. There is no reason to believe that copyright owners and Generative AI developers cannot enter into voluntary licensing arrangements as long as the parties are willing to negotiate. In some industries, direct voluntary licensing is already happening. OpenAI has entered into some licence agreements to pay for content to train its models, such as the photographs, video, graphics, music and “high-quality training data” provided by Shutterstock [https://investor.shutterstock.com/news-releases/news-release-details/shutterstock-expands-partnership-openai-signs-new-six-year] or the literary works provided by the Associated Press [https://www.ap.org/ap-in-the-news/2023/chatgpt-maker-openai-signs-deal-with-ap-to-license-news-stories#:~:text=ChatGPT%2Dmaker%20OpenAI%20and%20The,AP's%20archive%20of%20news%20stories]. These types of market-based solutions represent the desired outcome: they both respect copyright owners’ rights by providing market-determined compensation and facilitate the training of Generative AI systems.

Voluntary licensing is both feasible and desirable. The Government ought to take a laissez-faire approach and let the market work itself out. The fair dealing exception ought to be fit to handle the analysis for appropriate cases. Until and unless the market, Canada’s courts, or the Copyright Board exposes a real gap with respect to Generative AI that needs to be addressed, there is no valid policy reason to introduce any exception for TDM.

Notwithstanding various examples of direct, voluntary licensing in the market, rightsholders are facing challenges in licensing their works for TDM activities. The biggest barrier is caused by the lack of information available to rightsholders in terms of the content that is being used to train Generative AI systems. This information asymmetry causes an imbalance of power between rightsholders and system developers, which is generating market inefficiencies and may ultimately cause potential market failure. A simple fix to this problem is the imposition of transparency principles, which have been implemented in various jurisdictions around the world. We discuss these principles in more detail below.

Authorship and Ownership of Works Generated by AI

C. Authorship and Ownership of Works Generated by AI

Clarity on the question of authorship and ownership of works generated by AI is desirable, but it remains too early to determine what that solution should be. It is the CMPA’s view that this issue requires further study and consultation before the Government makes any anticipatory changes to the Copyright Act.

Certain generative AI outputs must attract copyright protection. From a copyright policy perspective, AI tools ought to be able to produce content that is protectable and exploitable, provided there is an exercise of skill and judgment in its creation.

As noted in the Consultation, while the Copyright Act does not explicitly define the word “author”, case law suggests that authorship must be attributed to a human who exercises skill and judgment in creating the work. Moreover, since the Government’s last copyright consultation on AI in 2021, there seems to be a generally accepted international consensus that human authorship is a bedrock requirement for copyright protection and that content generated by AI without any human involvement is not, and ought not be, protected by copyright.

The existing standard of originality for copyright protection, which requires the exercise of skill and judgement that is not trivial or purely mechanical, remains appropriate for AI-generated works. No other higher standard of originality need be introduced for AI-generated content. As noted in CCH v. The Law Society of Upper Canada, “A standard requiring the exercise of skill and judgment in the production of a work…. provides a workable and appropriate standard for copyright protection that is consistent with the policy objectives of the Copyright Act”. The current (low) standard of originality ensures that authors are not undercompensated for their work and that authors and other rightsholders do not face unnecessary and prohibitive bars to licensing, other exploitation opportunities, and infringement claims. Any push for a higher standard of originality for copyright protection ought to be rejected outright.

As noted above, AI tools are routinely used in the creation and production of audiovisual productions. Producers make significant investments into their productions and must have certainty that these investments are protected and capable of being exploited on an exclusive basis.

There have been some recent U.S. examples of content produced by generative AI systems where copyright protection was denied, despite the exercise of hundreds of hours of work by the creator/users, including experimenting through extensive prompting and inputting numerous revisions. (See for example, Jason Allen’s artwork, “Theatre D’opera Spatial,” created with Midjourney.) These examples illustrate some of the challenges producers and other rightsholders will face with the current uncertainty around authorship and ownership of AI-generated works.

Generative AI output must be protected by copyright to prevent others from exploiting the portions of audiovisual productions that may be generated by AI. If it is not, others will get a free and possibly deceptive ride on the creation, production and marketing costs related to those portions. If these investments and output are not protected by copyright, there will be little incentive for producers to use AI tools in the creation of their productions.

One of the approaches to the question of authorship and ownership identified in the Consultation is that the Government clarify that copyright protection applies only to works created by humans. While the CMPA remains of the view that human authorship should be a requirement for copyright protection, this approach leaves open the possibility that those portions of works that were generated by AI would be ineligible, or excluded, from protection. Rightsholders would be required to parse out the individual pieces of their work that were generated by AI and make claims only for those portions that were not. Indeed, the U.S. Copyright Office appears to be requiring copyright registration applicants to disclaim and disclose AI-generated content that is more than de minimus. If similarly applied in Canada, these types of developments would be prohibitively rigid for producers, not appreciate the use of AI as a tool in the creative process, ignore the significant contributions of human creators, and disregard the exercise of skill and judgment in the creation of a work. As it applies to audiovisual works, which are the product of dozens (if not more) individual contributions, each of which may have used some AI tools in their creation, this is unworkable.

Another possible approach is to attribute authorship to the person who arranged for the work to be created. The Consultation indicates this would be a similar approach to the United Kingdom, but it is also similar to the sound recording maker’s right in section 18 of the Copyright Act pursuant to which “the person who makes the arrangements necessary for the first fixation of the sounds” has a copyright in the sound recording.

This approach may ultimately provide the requisite clarity but requires significant further study and consultation to ensure the appropriate rights and protections are put in place, particularly given that such an approach could apply to various types of content, different stakeholders, and different industries.

The Government’s third approach suggests creating a new and unique set of economic rights for AI-generated works to a person who did not provide any original contribution to such works, such as an AI developer, deployer or user. While the Consultation likens this to the maker’s right, this approach alone would not solve the question of copyright ownership in AI-generated works that do have original contribution. A layering of rights may be appropriate, but more evidence and consultation would be required to establish the rights and policy implications.

The CMPA proposes that no changes be made to the Act at this time to address issues of authorship and ownership of AI-generated works. However, the question is deserving of further study as the technology, Canadian case law and international consensus develops. We propose that the Government continue to monitor international developments, and actively participate in ongoing discussions, maintain a watching brief and continue to consult with all stakeholders on whether and what changes to the Act are necessary to clarify authorship and ownership of AI-generated works and provide certainty in the marketplace.

Infringement and Liability regarding AI

D. Infringement and Liability

As previously discussed, the CMPA’s primary concern about existing legal tests for demonstrating that an AI-generated work infringes copyright is that it is difficult, if not impossible, to determine whether the system that generated the work was trained on or had access to the allegedly infringed work. The content used to train any particular Generative AI system is a black box. Rightsholders have no idea whether their works have been reproduced or whether any alleged similarity to their works are mere coincidence. This creates problems not just for demonstrating infringement, but also impedes licensing opportunities.

The current legal test for establishing infringement in Canada includes a requirement that the plaintiff establish a “causal connection” between the allegedly infringing work and the original work; in other words, that the defendant had access to the original work. But without some knowledge or information about what’s inside the box, rightsholders plaintiffs are left to infer or make assumptions about whether their work was copied.

Various lawsuits launched in other jurisdictions illustrate the issue. In a Stability AI/Midjourney case launched in the US, the defendants claim that the rightsholder plaintiff could not proceed with infringement allegations unless she identified with specificity each of the works she believes were used as training ‘inputs’ for the alleged infringing system. [https://fingfx.thomsonreuters.com/gfx/legaldocs/byprrngynpe/AI%20COPYRIGHT%20LAWSUIT%20mtdruling.pdf] Rightsholders will also undoubtedly face similar evidentiary issues as many datasets used to train Generative AI systems are purportedly destroyed after the initial training is complete.

Comments and Suggestions

E. Conclusion/Comments/Suggestions

The CMPA agrees with the Government that additional evidence and consultation is necessary to inform policy decisions in the area of Generative AI. As such, we applaud this important consultation. We also maintain that the Government should take a very measured approach in its consideration of whether any legislative change is required to deal with issues on Generative AI at this time. Good policy requires good evidence, and enacting legislative change to deal with theoretical issues resulting from nascent technology does not make good policy.

The one clear issue that demands immediate attention is the lack of available information about the content being used to train Generative AI systems. We ask the Government to implement transparency requirements on all developers of Generative AI systems similar to those that are currently being implemented in the EU.

Otherwise, we encourage the Government to exercise restraint in the area of Generative AI, keep a watching brief as these technologies develop, publish any evidence received as part of the Consultation and seek comment on policy considerations based on such evidence.

Canadian Publishers’ Council

Technical Evidence

At this point in their development, AI applications are being developed by CPC members for their potential contribution to greater efficiencies in the publishing business, across Trade, Educational and Professional parts of our sector. Examples are provided below. As a critical principle, CPC members approach this with complete adherence to existing copyright law and a commitment to preservation of creators’ exclusive rights, and argue for the paramount importance of these principles in this submission.
Areas of concern include the unlicensed use of works for training AI systems. In the educational field this includes the use of AI systems to circumvent Assessment tools in areas such as exams and assignments, across the whole education spectrum but particularly in Post-Secondary Education. This is a concern shared of course by Post-Secondary institutions – universities and colleges, both public and private – who are important customers for many CPC member firms.
CPC members are using their own content to train AI systems or when more training data is required, will enter into licenses to be able to legally use copyright content. We note that the developers of the large language models LLMs (such as OpenAI, Meta, and Google) unfairly compete with publishers by using infringing copies of books and other media such as books that are being illegally made available though pirate online libraries like Books2 and Books3.

Text and Data Mining

Our principal concern will always be the upholding of rightsholders’ rights – both authors/creators and publishers – as the market for AI-generated content evolves. It is critical that rightsholders’ consent be obtained for training AI systems. In our view, training AI systems involves multiple reproductions and adaptations of works. There is no question that copying is involved when works are downloaded for training purposes, when works are copied and compressed into AI models, and in many cases the outputs from systems training on publishers’ materials are reproduced verbatim or are adaptations that include substantial parts of ingested materials because the AI systems’ “memorization” capabilities (1).

Further, in our view, the models training, the models ts that are trained using publishers’ works, and many of the outputs would infringe copyright, subject to any defenses such as fair dealing for the purpose of research (discussed below). It must be borne in mind that a work is reproduced when it is copied in “any material form”. This means that a work is reproduced even it is encoded in any language or notation (including tokens generated during AI training) or is stored in a compressed form from which publishers’ works can be perceived by or with the aid of a machine or device (2). Thus, training AI models using unlicensed publishers’ works implicate the reproduction right because of the three different processes involved in training AI models and providing services using them that generate outputs. Further, operators of generative AI systems that produce synthetic outputs that contain reproductions of all or any substantial parts of publishers’ works may also infringe the authorization right and/or the making available rights under the Copyright Act (3).

With that said, any litigation necessary to vindicate publishers’ exclusive rights would be costly and would likely take many years from when a case is commenced until it is ultimately resolved by the Supreme Court. In the interim, many AI operators including the very large platforms will likely continue to exploit publishers’ works without consent or providing any form of compensation. Further, publishers would be unable even to collectively license their works for training purposes using the Copyright Act’s tariff system because of the unfortunate decision by the Supreme Court that held that even certified tariffs are not mandatory. While the law is clarified by the courts, AI operators may continue to unfairly copy publishers’ works and thereby compete with publishers’ commercialisation of their publications and undermine their own adoption of AI systems.

We note that France’s National Assembly Bill aimed at regulating artificial intelligence by copyright (5) would expressly require the authorization of authors or rights holders for training or exploitation of artificial intelligence systems.

For the above reasons, the CPC recommends that the law be clarified to provide an express right of publishers to authorize the use of their works in AI systems that are trained or made available for commercial purposes (“commercial AI use”). Further, to avoid providing foreign AI operators an advantage over Canadian operators, the training authorization right should apply to any AI systems that make the service available in Canada.

As noted above, some uses of publishers’ works could fall within a fair dealing exception. The Supreme Court has construed the fair dealing exception for research and the fair dealing factors as user rights. Under the Supreme Court precedents, the exception for research may include activities that are not limited to non-commercial or private contexts and can include those that “facilitate” research and even large scale infringements can, in certain instanced, also be considered to be fair (6). However, publishers view the research exception as intending to promote the goal of research conducted by human beings and do not accept that training machines even to facilitate research by human beings is compatible with the human centric goal of research under the fair dealing exception. However, the law on this has not yet been settled in Canada.

Of course, all fair dealing determination are questions of fact and “a matter of impression” and are left to the courts. Therefore, publishers recommend that the clarified exclusive right to authorize commercial AI use not be subject to any fair dealing exception.

The CPC strongly objects to any TDM exception. It would be impossible to properly calibrate such an exception to take into account all of the evolving AI business models and uses of publishers’ works for AI purposes. Further, given the evolving market for publishers’ own exploitation and licensing of their works for AI purposes, any TDM exception would undermine market forces and effectively create an uncompensated compulsory license. A TDM exception could violate Canada’s TRIPs and other treaty obligation that prohibits any exceptions to copyright that would violate the Three Step Test.

It is also critical that companies developing AI systems be required to accurately track, retain and disclose content sources used for AI training, without any attempt to rely on existing exceptions (e.g., Research, Education) to circumvent this requirement. In the absence of this protection, it is impossible for rightsholders to know if their works have been ingested for training purposes, given the vast scope of inputs into Generative AI models, and the ‘black box’ nature of their modeling.

There are several ways to help promote transparency.

First, there should be a Bill of Discovery right to enable rights holders to obtain disclosure as to whether their works were used to train AI models.

Second, amendments could be made to Canada’s draft Artificial Intelligence Act (AIDA) to expressly include transparency requirements on persons who make available or manage a general-purpose system along with the other transparency requirement currently being proposed by the Minister. AIDA should also require such persons to respect Canadian copyright laws regardless of where the training and deployment of the AI systems takes place if the AI system is deployed in Canada. Publishers also view the unlicensed use of their copyright content to be very damaging and recommend that the term “harm” in AIDA expressly include the economic and moral harms to authors and publishers associated with generative AI systems.

It is only in these ways that rightsholders can be assured of the opportunity to participate in the value chain created by use of their content for Generative AI.

TDM activities are undoubtedly being conducted in Canada, in a broad range of applications that include media monitoring and scholarly research, especially in as Large Learning Models. Some of our members are developing their own AI solutions. For example, Thomson Reuters offers services called “Westlaw Precision” and “CoCounsel” which jumpstarts AI assisted legal research. A second offering built on Open AI’s GPT-4 provides legal professionals with document review, legal research, contract analysis, compliance and other functionality. Most of our members would also engage with existing AI platforms such as OpenAI’s ChatGTP.

The principal challenge here is ensuring that exclusive rights under copyright law are respected. This includes tracking and disclosure of content use for ingestion into AI models, as mentioned above. Many providers of generative AI systems have disregarded copyright rights in the hopes they can become well established and eventually forestall having to pay for their unlicensed uses of works. The sooner the law is clarified, the easier it will be for publishers to license their works including for uses in Canada.

Existing licencing mechanisms, both direct and collective, are available today to cover content licencing needs for AI model ingestion. For example, the Copyright Clearance Center (CCC) offers TDM licences for the use of literary a works. Many scientific, technical, and medical publishers (STM publishers) offer TDM licences in Canada. An example is Elsevier’s TDM licence which permits non-commercial research (7). Taylor & Francis offers a TDM license which can also include a license for commercial purposes.

Further there are market developments that illustrate that developers of generative AI systems can develop AI systems using their own content or content that has been properly licensed from others. Examples are Getty Images (which uses its extensive library of licensed images), Adobe (with its Firefly generative AI model that is trained on Adobe Stock images, openly licensed content, and public domain content) (9), Meta’s AI image generator (which was trained on public facing Facebook and Instagram images) (10), and, as noted above, Thomson Reuters services called “Westlaw Precision” and “CoCounsel” (which are trained on its own proprietary content and public domain content it makes available).

We believe it is critical that market forces be allowed to develop to facilitate appropriate licencing models for AI content ingestion, based on the framework exclusive rights provided by copyright law. It must be remembered that a TDM exception will provide no remuneration or compensation to authors or publishers. The copyright law provides a market framework by which market prices are set for the uses of copyright works. A TDM exception would operate extremely unfairly by undermining publishers’ own ability to exploit their works including for AI purposes. It would also, effectively, operate as a very unfair wealth transfer mechanism from authors and publishers to AI companies, many of which are very large and well funded companies. (Continued in Comment Section)

Authorship and Ownership of Works Generated by AI

At this point in time, there is no evidence of an adverse impact on the development or adoption of AI technologies due to the copyright status of generated works. However, rights holders would be concerned providing copyright protection for works created entirely by computer generated content.

We are of the view that the Supreme Court of Canada originality test is fit for purpose and would adequately address the copyright status of AI-assisted works. We believe that copyright should be reserved for human-created works that meet the test of originality. AI-generated works that have NO human intervention should not be protected by copyright.

The approaches in the UK and other countries that provide protection for computer generated works were conceived before generative AI was introduced and would potentially provide protection for works created without the requirement of originality, which would not be appropriate.

Infringement and Liability regarding AI

There are no compulsory requirements for GenAI engines to record and disclose the datasets that are ingested for AI training purposes. In the absence of this requirement, it is difficult to determine what copyright-protected content has in fact been used. This adversely affects the ability of publishers to negotiate licenses or seek legal redress. We recommend the approach being adopted in the EU under the draft Artificial Intelligence Act (AIA) that would require transparency, except to the extent a creator uses its own works for training purposes (12).

CPC members are vigilant only to use their own works or works that are licensed. This is not the practice of the large Generative AI companies like OpenAI, Meta’s LLM, or Google. There are instances of AI-generated content, derived from AI training from copyright-protected works, being fraudulently marketed as being authored by the creator of that copyrighted work (e.g., Clare Duffy lawsuit in U.S.)

The extensive litigation in the United States shows that AI companies reject that outputs from their content infringe copyright. Some outputs clearly reproduce all or a substantial part of input data. The New York Times suit (referred to above) is a good example. However, grey area is where the outputs are derived from the training data, but specific outputs may not actual meet the test to be a reproduction under the Copyright Act. For example “style” may not be protected by copyright. But, generative AI systems are capable of and do produce output in the “styles” of authors, artists, and other creators. Further, “facts” are not protected by copyrights. But, here again, generative AI systems can wholly ingest facts as part of the ingestion of works and use these facts in output in ways that could complete with the publishers, a good example being a publication that is comprised of compilations. In a broader sense, all generative AI content is derived from copyright works or a mixture of copyright works and works in the public domain (such as where the copyrights have expired) or works that have been licensed.

This “derived” content is extremely valuable and it would be very unfair for the copyright works to be appropriated to generate massive profits for AI companies with no control or compensation to publishers and other creators. A major challenge with the existing copyright law is that while a specific output may not be derived and reproduce a specific input, when taken in the aggregate many outputs would be derived from the aggregation of unlicensed copying of publishers’ and other creators’ copyright works.

It is undoubtedly for this reason that France’s National Assembly Bill aimed at regulating artificial intelligence by copyright (13) would provide for compensation to be paid to authors for all generative AI content whose origin cannot be determined (14).

This issue is an especially important one for Canadian publishers as, to date, much of the training of LLMs has been done in the United States. This leaves Canadian based publishers with few remedies in Canada for output that is largely derived from copyright works including where specific outputs are not infringing. Also as generative AI systems will often return different results from specific text based prompts, it may be difficult to make out a case of reproduction by relying solely on a specific output.

CPC publishers therefore recommend that the Copyright Act be clarified so that all outputs of generative AI systems that are derived from copyright works, individually or in the aggregate, used without a license be deemed to be infringing reproductions.

Footnotes

(1) See, Exhibit J to the Complaint by the New York Times against OpenAI and Microsoft; also, Extracting Training Data from ChatGPT.

(2) The Copyright Act makes it an infringement to reproduce a work (or a substantial part of a work) in any “material form”. This is a technologically and media neutral term that would make any copying into a form of storage a reproduction, regardless of the form of notation or encoding. See, Apple Computer Inc. v. Mackintosh Computers Ltd. (1986), 28 D.L.R. (4th) 178 (Fed. T.D.), varied (1987), 44 D.L.R. (4th) 74 (Fed. C.A.), affirmed [1990] 2 S.C.R. 209; Canadian Broadcasting Corp. v. SODRAC 2003 Inc., 2015 SCC 57, Robertson v. Thomson Corp., [2006] 2 S.C.R. 363; Labrecque (O Sauna) c. Trudel (Centre Bellaza, s.e.n.c.), 2014 QCCQ 2595.

(3) If the outputs are copied by users, then the AI operators infringe the authorization right. If copies are only made available for viewing, then the AI operators my be liable under the making available right. See,

Society of Composers, Authors and Music Publishers of Canada v. Entertainment Software Association, 2022 SCC 30.

(4) York University v. Canadian Copyright Licensing Agency (Access Copyright), 2021 SCC 32 (“York”)

(5) France National Assembly Bill No 1630 online @ https://www.assemblee-nationale.fr/dyn/16/textes/l16b1630_proposition-loi ““Article L. 131-3 "The integration by artificial intelligence software of intellectual works protected by copyright into its system and a fortiori their exploitation is subject to the general provisions of this Code and therefore to the authorization of authors or rights holders".

(6) Society of Composers, Authors and Music Publishers of Canada v. Bell Canada, 2012 SCC 36; York

(7) https://www.elsevier.com/about/policies-and-standards/text-and-data-mining/license

(8)Getty made an AI generator that only trained on its licensed images - The Verge

(9) Adobe Firefly - Free Generative AI for creatives

(10) Meta’s new AI image generator was trained on 1.1 billion Instagram and Facebook photos | Ars Technica

(11) There is a precedent for this. See ss. 29.4(3), 30.1(2P of the Copyright Act.

(12) "4. Providers of foundation models used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video (“generative AI”) and providers who specialise a foundation model into a generative AI system, shall in addition:

c) without prejudice to national or Union legislation on copyright, document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law.” EU AI Act (draft Compromise Amendments) May 9, 2023

(13) France National Assembly Bill No 1630 online @ https://www.assemblee-nationale.fr/dyn/16/textes/l16b1630_proposition-loi

(14) Article L. 121-2 ““Moreover, in the event that a work of the mind is generated by an artificial intelligence device from works whose origin cannot be determined, taxation intended to promote the creation is established at the benefit of the body responsible for collective management designated by amended article L. 131-3 of this code.

“This taxation is imposed on the company which operates the artificial intelligence system which made it possible to generate said “artificial work”

Comments and Suggestions

Continued from TDM Section...

If these or any other companies want to use AI works for training they should negotiate fair agreements and pay publishers for inputs the same they do for other inputs.

Clearly content creators such as authors and publishers would want TDM activities to be conducted under licence only, and not under any newly-conceived exceptions. The expected impact is clear, and consistent with existing law: the protection of rightsholders’ exclusive rights and the concomitant opportunity to ensure proper compensation for use of their content. These rights and their safeguarding are the foundation that supports the creative industries broadly, and publishing in particular, and is the legal framework that encourages creative endeavours which in turn are critical to Canadian culture and voices. Moreover, this approach reinforces the notion that human creativity is at the heart of what Copyright has been established to protect, and will continue to be the critical input for the cultural sector even as Generative AI exerts more influence in the market for content. Lastly, it is only when AI Developers engage in appropriate licencing mechanisms that fulsome investment in AI development will continue, and consumer confidence in AI outputs will be strengthened.

However, notwithstanding the foregoing, any TDM exception should not apply unless (i) the exception is purely for non-commercial research, (ii) the work used is not infringing and is not in breach of any license or contractual prohibition, (iii) any copying must be ephemeral, (iv) the resulting output cannot reproduce all or any part of the work used for training, (v) the copyright holder must be notified before the work is used, and (vi) there is no circumvention of any technological protection measure (TPM) or violation of the rights management information (RMI) provisions of the Act.

We also note that Canada is obliged by international treaties (including the WCT and WPPT and the CAUSMA) to provide the latter protections, and any TDM exception must respect these obligations as well as the Three Step Test.

Further, we recommend that if a TDM exception is to be proposed it should be subject to these conditions:

Before a work or works from a publisher is used, the publisher must be notified and the AI company must have an obligation to negotiate a license at fair market value. If a price cannot be agreed, the publisher can have the price settled by arbitration or the Copyright Board. We note that copyright owner of books made available in Canada are listed in publicly available sources such as [Booknet?].
Neither the TDM exception (nor any fair dealing exception) should apply if the work is commercially available under a license including a tariff approved by the Copyright Board.

This requirement is absolutely critical to the evolution of legitimate AI industry in Canada and beyond Clear and discoverable records of content use, with full disclosure requirements, must be the foundation for this framework, in order to protect rightsholders’ rights and ensure their ability to participate in the value chain that AI is developing. In the absence of such safeguards, human-based creativity is put at risk and may become increasingly marginalized.

Market-based licencing mechanisms must be allowed to develop in this sphere. If the playing field is balanced between rightsholders and AI systems via the markets created by the system of exclusive rights , the price will set itself via negotiations.. The appropriate price should always be the price that is (or would be) negotiated by a willing buyer and a willing seller.

Alissa Centivany and Kailin Rourke

Technical Evidence

A NOTE ON TECHNICAL EVIDENCE

As the authors are independent experts and do not represent any organization, we are unable or ill-suited to provide responses to many of the technical questions posed. As is so often the case with LLMs and related “big data” systems, the ways these systems are deployed lacks transparency, are difficult to trace, and are not subject to reasonable notice and disclosure policies, and thus we can only assume that the data we generate through our work as a professor and student is being collected and used to train large language models. For example, Western University faculty and staff are provided with Microsoft Office (Microsoft 365) and Microsoft has major investments in OpenAI. We therefore speculate, without hard evidence (as evidence is obfuscated and withheld as a matter of course) that our data is being used without our specific knowledge or consent. In addition, some members of our community choose to use LLMs in the course of their work, either to produce or assist in the production of works or as a site of critical inquiry and research. In terms of the mitigation of risks, it is proceeding in an ad hoc, haphazard fashion. Individual instructors might set course-level policies regarding the use of generative AI, faculties might have their own policies, the University currently has no overarching policy that applies specifically to generative AI, but other policies likely provide some governance oversight, i.e. policies on academic integrity.

Text and Data Mining

Clarity around copyright and TDM in Canada is critical. The following brief discussion aims to highlight what I believe to be the most salient factors to be considered in terms of proceeding with clarifying regulation.

First, it is necessary to be careful and deliberate in terms of how we understand TDM. TDM cannot be clumped into a single monolithic category because, in practice, it actually consists of a number of distinct activities (and often distinct actors) that bear differently on any given copyright analysis. For example, the activity of “compiling a data set” may raise copyright concerns if code is written to scrape and copy protectible works and store the works in a database. Here, the creation and application of the scraping tool(s), the selling of the tool(s) to others, the scraping and copying of protected works, and the storage of those works in a database could all, separately and distinctly, create exposure to liability for copyright infringement. If a user subsequently queries the dataset to pull out key words or phrases, this might raise copyright-relevant concerns depending on various factors such as how much of the original texts are reproduced and for what purpose. Therefore, TDM should be understood as consisting of a series of distinct activities that may raise distinct copyright implications and we would all benefit from greater clarity about “which activities” are protected or exempted.

Second, TDM not only consists of distinct copyright-relevant activities but often also involves distinct actors that may play a role in one part of the TDM “chain” but not other parts. To illustrate, an actor might create an application that scrapes and compiles data, or create an LLM that “trains” on that data, or create a user interface that enables third-parties to query the dataset, but the actor may not utilize or analyze the scraped data to produce new works or insights from it. Under existing law, it is TDM is often accomplished through collaboration or cooperation between distinct actors that might play different roles and, under existing copyright law, each of these actors’ exposure to liability for their part may be different (and unclear). We lack clear rules under existing law on how privity functions with regard to exemptions, rights, and defenses that typically favor end-users/creators of secondary works (e.g. fair dealing) might apply to intermediaries along the TDM chain. Thus, in clarifying the copyright implications of TDM, it is essential that Canada addresses questions of privity (whether intermediary actors that facilitate but do not directly engage in socially productive activities just discussed enjoy the exemptions, rights, and defenses users/creators might under the Act) and contributory or vicarious liability. It is my position that the socially productive use principle should govern how privity and vicarious liability might apply in the TDM context. In the case of emerging LLMs, I would argue that the nexus between creators/owners of the scraped datasets and query tools, i.e. ChatGPT, and users that might generate secondary works is too tenuous and ethically fraught to support claims of privity vis-à-vis fair dealing.

Other jurisdictions have created TDM regulations on the basis of activity-type. For example, Japan permits TMD for data analysis, the UK and France allow it for non-commercial research, the EU permits it for scientific research by research orgs, and Singapore for computational data analysis. In the United States, its Copyright Act does not yet include specific TDM provisions, but case law seems to suggest that courts will consider whether or not a use is a “transformative” fair use. In terms of the aggregation of protected works into large datasets, U.S. case law arising from the mass digitization project undertaken by Google and various libraries is likely to provide the most relevant recent analog. In those cases, Google partnered with libraries to digitize their collections including, at least in the case of the University of Michigan library, millions of in-copyright works. The Authors’ Guild sued for infringement. The Second Circuit Court of Appeals held that digitizing print materials to facilitate search and discovery, and to support the provision of services for vision-impaired patrons, constituted non-infringing transformative uses. The actual mining activities (now made possible through HathiTrust’s research division for its (primarily academic members)) were not litigated and thus we do not know for certain how that Court would have ruled had TDM been at issue. While the copyright implications of TDM are still an open question under U.S. law, good arguments can be made that query’s made to HathiTrust’s corpus would also be exempt as a class of “non-consumptive and non-expressive uses”; users query the database to generate responses that they then interpret to form new insights and transformative secondary works under the supervision of HathiTrust research staff.

The nature of TDM using HathiTrust is distinguishable from generative AI models in a number of key ways. First, the HathiTrust database was generated from lawfully obtained copies (the library’s print corpus) whereas we have no evidence that generative AI systems are comprised of lawfully obtained copies. Second, while TDM was not litigated in the HathiTrust case, the Second Circuit found that mass copying for purposes of facilitating search and discovery was a transformative fair use and this aligns with the kind of TDM activities HathiTrust supports. By contrast, generative AI systems produce outputs that are both consumptive (meant to be consumed or read just as the primary source was) and expressive. Third, HathiTrust is an organization that consists primarily of libraries and thus its mission, values, and culture are strongly aligned with research, education, and serving the public interest. Unlike libraries, which enjoy a special status in society that is also reflected in the Copyright Act, the firms creating generative AI systems are largely technology companies that are private, profit-driven entities whose interests and motives are quite different from libraries.

Canada should clarify the copyright implications of TDM along the lines of what other jurisdictions have done, protecting TDM of protected works for “socially productive” purposes. Socially productive uses would include research, education, criticism, parody, non-consumptive and/or non-expressive uses, and uses that result in secondary works that serve a new and different function or purpose from the original one and are not a substitute for the original. [See Centivany, “Understanding Organizational Responses to Innovative Deviance: A Case Study of HathiTrust”, (2016) dissertation available in UMich Deepblue; Centivany, “Innovative Deviance: A Theoretical Framework Emerging at the Intersection of Copyright Law and Technological Change”, in Proceedings of the iConference (2015); Centivany “Contributory Transformative Use” (2017) manuscript on file with author.]

Canada has an interest in supporting innovation and competition and greater clarity about both the activities and actors would promote those interests. However, it is also critical to note that the purpose of copyright law is not to support innovation (at least not in the dominant understand of innovation) or competition. Rather the aim of copyright law in Canada is to promote the creation and sharing of cultural works for the benefit of all and ensure that adequate incentives exist to foster creativity. It is doubtful that the outputs of generative AI systems satisfy the goals of the Act. Returning to the overarching purpose of the Act is always critical, but it is especially important when emerging technologies create policy vacuums, disequilibria, and instability. We should also be clear that not all innovation is socially productive. To the extent that generative AI is used to disintermediate workers in the cultural industries or intellectual sectors while monetizing data sets comprised entirely of unlawfully obtained works to facilitate the production of simulated creative outputs, it is not clear that this promotes the interests of the Copyright Act. Socially productive use, even where it might appear to transgress existing (and perhaps outdated) legal doctrine, can be a way for the Act to grow and adapt in light of emerging technologies while remaining faithful to the underlying purposes of the Act.

Authorship and Ownership of Works Generated by AI

Most, if not all, jurisdictions condition authorship on “human-ness”. Under Canadian law, works require an author’s “skill and judgment” (meaning the use of one’s “knowledge, developed aptitude or practiced ability in producing the work” and one’s “capacity for discernment or ability to form an opinion or evaluating by comparing different possible options in producing the work”) and an exercise of “intellectual effort” in the “expression of ideas” (CCH). The government should clarify through an amendment to the Act that an “author” must be human.

Questions might be raised as to whether the author of the LLM could be an author of the outputs produced through its use. The nexus between the author of the program or code, and the outputs of the AI, are too tenuous to justify copyright protection by proxy. The programmer does not met the just-stated requirements of authorship under the Act.

Thus, the salient question here is whether or to what extent a generative AI-assisted human can obtain copyright protection for the AI outputs. This is a complex question that, in reality, should probably involve a case-by-case analysis akin to fair use or fair dealing inquiries. However, given that a primary object of the Act is to incentivize authors to create it is worth asking whether the limited monopoly afforded by copyright is the incentivizing factors for AI-assisted works, or rather whether the ease of use and inexpensiveness of generative AI is incentive enough. My position would be that copyright-related incentives are not necessary for AI-generated works and therefore they should not acquire protection. AI-assisted works would likely benefit from a fair dealing or fair use-like analysis, but these are costly and inefficient. Conclusions regarding authorship and ownership of AI-generated works should be guided by the overarching purpose of the Act. On balance, the outcome most consistent with the purpose of the Act and existing precedent is that works resulting from generative AI, absent sufficient human authorship, would enter the public domain.

Infringement and Liability regarding AI

Infringement and liability can arise in a few different ways regarding AI. First, it can result from scraping, copying, and other acts related to the creation of the dataset upon which the LLM or generative AI will train or operate. This issue was addressed in the previous discussion of TDM. Second, infringement can result from the outputs generated by the AI because, for example, they constitute unauthorized derivative works or are substantially similar to existing protected works. Third, if it is determined that the outputs of generative AI models may acquire copyright protection, should these works be treated the same as human-generated works for purposes of liability and infringement. As discussed in the previous section, consistent with other jurisdictions, copyright does not subsist in AI-generated works. For AI-assisted works, the question is more complex, but I conclude that, given the overarching purposes of the Act, the ease and efficiency of production using AI tools likely obviates the need for copyright-based incentives.

With regard to whether AI outputs may constitute infringement works, either because they are unauthorized derivatives or because (using language from U.S. caselaw) they are substantially similar to protected works, the simple answer is “Yes”. Outputs generated by AI systems can infringe. Under fair dealing, research, private study, education, parody or satire, criticism or review, and news reporting may be exempt from liability for infringement if the factors articulated in the CCH test are also satisfied: the purpose, character, and amount of the dealing, alternatives to the dealing, nature of the work, and effect of the dealing on the work. (CCH v. LSUC (2004).

The infringement analysis of AI-generated works would mirror the analysis had generative AI not been involved aside from the questions of contributory liability raised earlier in the TDM section. In other words, would the creator of the AI be contributorily liable for infringing outputs resulting from the queries of users? This is an open question, and we would look to other examples and case law to analogize and distinguish. For example, in the famous U.S. case Sony v. Universal, the court found the maker of the Betamax (a VCR-like device used for recording or “time-shifting” broadcast television shows) not liable because the device was capable of “substantial non-infringing uses”. The government should provide additional guidance on the extent to which the makers of generative AI systems can avail themselves of protection-through-privity or may be subject to vicarious liability.

Comments and Suggestions

This submission was prepared by Dr. Alissa Centivany, Assistant Professor at the Faculty of Information and Media Studies at the University of Western Ontario, and Ms. Kailin Rourke, student in the Master of Library and Information Science program at Western University. These comments reflect the views of the authors not Western University.

As a general matter, we note that emerging technologies always pose challenges for existing legal regimes. From the printing press, to the player piano, to the VCR, to Napster, to ChatGPT, controversies are a normal, perhaps unavoidable, part of sociotechnical change. Technological change tends to be relatively fast-paced, forward-looking, and innovation-focused, while legal institutions and rules tend to change slowly, employing logics that are retrospective, bound to precedent, and (at least partially) concerned with preserving the legitimacy of the status quo. Presently, Generative AI systems are raising important, often complex questions regarding the nature of research and innovation, creativity and collaboration, expression and transformation, reliability and trust, and whether and how institutions critical to democracy such as the press, public education, elections, and parliament itself might be impacted or undermined. In light of the specific questions and interests outlined in ISED’s public consultation, the survey responses offered focus narrowly on the copyright implications of Generative AI systems which include concerns regarding authorship, ownership, infringement, attribution, consent, and fair dealing/use.

Coalition for the diversity of cultural expressions

Technical Evidence

B. Fact Finding: Recent Applications of Generative Artificial Intelligence (AI) in the Cultural Sector

The CDCE has been examining the interaction between AI and culture for several years. [See for example, CDCE (2018), Ethical Principles for the Development of Artificial Intelligence Based on the Diversity of Cultural Expressions : https://cdec-cdce.org/wp-content/uploads/2018/11/EN-CDCE-AI.pdf.] Our members recognize AI’s potential and are exploring the benefits of this new technology. AI, like other tools, can be used to enhance and support human creativity when used responsibly and ethically. In the cultural industries, AI is used as a tool that supports – not replaces – the original human expression of creators’ works. Some creators are using AI as a tool as part of their creative process to reduce some of the time-intensive, rote aspects of their work. Publishers and producers are using it to assist with layout, style, visual effects, color-correction, and sharpening detail, among other things. When used responsibly, AI can provide tremendous value to the creative process. However, when used irresponsibly, AI has the potential to seriously undermine and damage the cultural sector and the diversity of cultural expressions in Canada, and around the world.

This Consultation is focused on “Generative AI”, that is, systems that use deep-learning models that can generate high-quality creative content based on the works, performances, and sound recordings they have been built on. In comparison, “traditional AI”, generally refers to tools or systems that perform specific tasks based on predefined rules and inputs. Using AI to assist in crafting a response to an email is markedly different than using it to produce a song, an illustration, or a poem. When considering whether change is needed to copyright policy, there is a need for careful differentiation between the two. We are pleased that the Government has focused this Consultation on Generative AI and concentrate our response on that next generation of AI.

As a general principle, AI developments can and should co-exist with a copyright system that incentivizes creators to create and disseminate their work and protects the rights of copyright owners. But Generative AI platforms are profiting significantly from the unauthorized uses and reproductions of the works, sound recordings and performances represented by CDCE members.

As noted in the Consultation Paper, the text, and data mining (TDM) used for Generative AI systems involves the reproduction of large quantities of data and copyright-protected works, performances and sound recordings. These “inputs” in TDM are often copyright-protected works, or other subject matter, for which no licenses are sought, nor any compensation flowing to rightsholders.

TDM also infringes authors’ moral rights. No attribution is given to the authors of the text, music, artwork, and other copyright-protected content that is ingested into Generative AI systems, either during the training of the system or during the use of the system. TDM also distorts works, including by cropping photographs, using lower resolutions, and disaggregating lyrics, text, or lines of music into segments and reassembling them into different sequences. These material alterations and mutilations offend the integrity of authors’ works to the prejudice of their honour and reputation.

The outputs of Generative AI similarly pose fundamental and existential issues for the cultural sector. As examples, in the music sector, Jukebox, released by OpenAI in Beta form, can produce a “wide range of music and singing styles, and generalizes to lyrics co-written by a language model and an OpenAI researcher” precisely because it has ingested vast amounts of previously composed and recorded music. [See Jukebox (openai.com) : https://openai.com/research/jukebox] Similarly, MuseNet can generate up to “4-minute musical compositions and can combine styles from country to Mozart to the Beatles”. OpenAI states that MuseNet discovered patterns of harmony, rhythm, and style using the same technology as GPT-2, a large language model that ingested a dataset of 8 million webpages. [See: MuseNet (openai.com): https://openai.com/research/%20musenet]. Books3, used to train Meta’s Generative AI, was based on a collection of more than 191,000 pirated books. [See: These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech - The Atlantic : https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/].

Generative AI outputs also often infringe creators’ copyrights by reproducing and communicating to the public “lookalikes” and “soundalikes” of creators’ original expression, often in response to prompts of users seeking exactly that.

In addition, various applications rely on AI, including TDM, to generate synthetic media, such as deepfakes, holograms, digital replicas, stand-ins, voiceovers, virtual characters, and environments. Applications can reproduce a person’s voice, image and/or likeness. Text-to-speech systems can reproduce a person’s voice, while other systems can rework the facial expressions of actors to assist with dubbing. If the output incorporates performer’s image or likeness, it is a reproduction of a substantial part of that performer’s performance and an infringement.

Finally, AI-generated output can also infringe authors’ moral rights. As one example, if a work is marketed as “in the style of” or “in the sound of” a particular author, the use will often be a bastardization and inferior copy of the author’s work. Infringing output can also be used in inappropriate contexts, like in a political campaign with which the author does not agree. Both examples could cause real prejudice to the honour and reputation of the author.

These types of outputs compete directly with the market for creators’ works, and threaten the livelihoods of Canada’s writers, authors, actors, publishers, musicians, songwriters, composers, visual artists, performers, directors, labels, music publishers, and producers.

All these uses, and the consequent infringements, must be considered when examining the copyright policy implications of Generative AI.

Text and Data Mining

C. Proposed New Exceptions are Unwarranted and Unnecessary

The Consultation paper asks, “What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?” Absolute clarity around copyright and TDM in Canada need not be the goal of this Consultation, or of copyright policy more generally, especially as the market is developing around these new uses. Serious risks arise from assuming that a new form of technology automatically requires the creation of a new exception. Reflexive approaches do not, and cannot, consider the speed at which Generative AI technology is evolving and the resulting impact on affected markets.

Moreover, introducing a new exception for TDM now would interfere with the ability of market participants, namely users and rightsholders, to set the boundaries of that emerging market. It would be particularly disruptive for the Government to introduce a new exception for TDM as licensing models are developing and emerging. Various licensing models for the use of copyright-protected works, performances and sound recordings in Generative AI already exist in the market, including:

the Copyright Clearance Center (CCC) in the United States has offered TDM licences for the use of literary and artistic works for several years;
many Scientific, Technical and Medical journal publishers have been offering TDM licences in Canada for a long time. [See for example, Elsevier’s TDM licence : https://www.elsevier.com/about/policies-and-standards/text-and-data-mining/license and Taylor & Francis’s TDM licence : https://taylorandfrancis.com/our-policies/textanddatamining/];
Getty Images has made a Generative AI platform that uses only its own licensed images [Getty Images Launches Commercially Safe Generative AI Offering - Getty Images]
Meta’s MusicGen reportedly used Meta-owned and licensed music from stock footage and music licensing companies [Introducing AudioCraft: A Generative AI Tool For Audio and Music | Meta (fb.com) : https://about.fb.com/news/2023/08/audiocraft-generative-ai-for-music-and-audio/]; and
Universal Music Group has recently announced collaborations with AI developers to explore how their technology can promote and enhance the creative process, including a partnership with Bandlab, a social music making platform, with a mandate to “advance the companies’ shared commitment to ethical use of AI and the protection of artist and songwriter rights”. [See here: UNIVERSAL MUSIC GROUP AND BANDLAB TECHNOLOGIES ANNOUNCE FIRST-OF-ITS-KIND STRATEGIC AI COLLABORATION - UMG : https://www.universalmusic.com/universal-music-group-and-bandlab-technologies-announce-first-of-its-kind-strategic-ai-collaboration/].

Given the nascent market for the licensing of TDM activity and Generative AI partnerships that are developing between AI developers and rightsholders, now is most certainly not the time for the Government to step in and introduce new exceptions. Instead, the Government should allow the market to work out market-based licensing solutions for TDM uses in Generative AI.

According to the Consultation Paper, “Because of the large quantity of data often involved in training such models, in particular when sources from the Internet, obtaining any necessary authorization from rights holders to make reproductions of the works or other subject matter in the course of these activities could be a significant burden”. Given the above examples, this is most certainly not the case.

Rather, it appears that those who are calling for exceptions prefer to lobby governments around the world as opposed to coming up with market-based solutions. After all, sometimes market-based solutions come with a price. Exceptions are usually free.

The Consultation Paper raises the 2019 EU Directive that requires member states to provide two TDM exceptions: one for research and heritage institutions, and one for any other person and purposes, which rightsholders can “opt out” of. Any suggestion of an “opt-out” regime for TDM is both controversial and impractical. The introduction of an exception that gives rightsholders the ability to “opt-out” of such an exception turns copyright on its head. Copyright is an opt-in system: no formality is required for a work to attract copyright protection. Requiring a copyright owner to advise a platform that it objects to the use of its work in TDM for Generative AI is a formality that violates Canada’s obligations under the Berne Convention. There is no basis to throw out these fundamental principles of copyright law and Canada’s international treaty obligations.

An opt-out system would require all copyright owners to monitor every Generative AI platform available in Canada and send some sort of notice to each Generative AI operator or application developer advising that it has chosen to “opt-out” of the exception for TDM purposes. As discussed below, copyright owners would first have to know that their works, performances, or sound recordings were being used by the Generative AI operator/application developer, or otherwise send a notice to all of them. Rightsholders would also have no remedy regarding any copying that took place before they opted-out. That is a massive burden to put on copyright owners and is wildly disproportionate to this supposed problem.

There have been suggestions in other jurisdictions that a compulsory licensing regime may be appropriate for TDM. Compulsory licensing deprives creators and copyright owners of their exclusive rights to authorize the reproduction of their works, performances, or sound recordings by forcing them to license, deprive them of fair compensation for those uses. It similarly deprives rightsholders of the ability to prohibit the use of their content by a service that might ultimately be cannibalizing the need for their own labour or that produces content that acts as a substitute for their original work. The implementation of such a regime would also impose a significant burden on copyright owners to administer and enforce. Additionally, implementing compulsory licensing for TDM is a solution in search of a problem. Compulsory licensing might make sense in certain special cases when voluntary licensing is impossible, such as the use of copyright-protected works or other subject-matter by retransmitters contemplated under section 31 of the Copyright Act. But here, licensing is not impossible: there is a functioning and growing market for licensing for TDM uses.

There is no reason to conclude that the current law is insufficient to address any uses that might arise with respect to the training of Generative AI. The Copyright Act is sufficiently technologically neutral to accommodate technological development and foster Generative AI innovation. As noted in the Consultation paper, there are existing exceptions in the Copyright Act that may assist users in appropriate cases. Until and unless a Canadian court or the Copyright Board exposes a real deficiency with respect to Generative AI that needs to be addressed, there is no valid policy reason to introduce any exception for TDM.

As AI developers and platforms are keenly aware, entering any market involves assuming responsibility for the impact of their new technology on that market and the players within it. The platforms must be responsible for respecting the copyrights of creators.

Numerous exceptions in Canada’s copyright laws have already caused a structural imbalance in the Copyright Act: an imbalance that is depriving rightsholders of their “just reward” and that may otherwise discourage and disincentivize copyright owners from creating and disseminating their works, performances, and sound recordings. The introduction of any additional exceptions for TDM purposes would only serve to further upend the balance sought in the Copyright Act.

There is no need to introduce exceptions allowing further uses of rightsholders’ works, performances, or sound recordings in Generative AI systems.

Recommendation 1. That the Government neither amend the current exceptions to include TDM, nor implement new exceptions for TDM.

D. Questions of Fair Compensation Should Be Left for the Market to Determine

With respect, when and how rightsholders should be compensated for the use of their works, performances, or sound recordings as inputs into Generative AI systems is not a question that should be answered by the Government. Nor should the Government interfere with the question about what level of remuneration is appropriate. The compensation and fair remuneration required for the use of a given work, sound recording, or performance will, and should, be set by the market or, if need be, the Copyright Board of Canada.

This question is properly the subject of negotiations between rightsholders and Generative AI platforms. There may indeed be instances where rightsholders agree with a platform that compensation is not necessary. But the Government should not decide that question. Doing so would simply impede those negotiations and prevent a market-based solution for TDM.

But additionally, it is not just compensation that should be the focus of the question. Among other things, the Copyright Act provides rightsholders with exclusive rights to reproduce their works, performances and sound recordings, or any substantial part thereof, and to authorize any such acts. These rights are engaged when copyright-protected works, performances and sound recordings are ingested into Generative AI systems. Authorization and permission are as important as compensation, particularly when the system output might compete with, or act as a substitute for, the original work. Again, market negotiations and the developing licensing market ought to decide these questions, not the Government.

Authorship and Ownership of Works Generated by AI

THIS IS THE CONTINUATION OF THE PREVIOUS SECTION (TDM), DUE TO THE CHARACTER LIMIT.

E. Challenges with Licensing, Monitoring and Enforcement

The Consultation Paper asks whether rightsholders are facing challenges in licensing their works or related rights for TDM activities.

Many platforms appear to believe the ingestion of copyright-protected content in their systems is already exempted, or that it does not require authorization, under copyright law in Canada. Globally, Google is so confident that the inputs to some of its Generative AI platforms and the output generated by them are non-infringing that it is indemnifying its users from copyright infringement claims:

“If you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved. To do this we will employ a two-pronged, industry-first approach designed to give you more peace of mind when using our generative AI products. The first prong relates to Google’s use of training data, while the second specifically covers generated output of foundation models. Taken together, these indemnities provide comprehensive coverage for our customers who may be justifiably concerned about the risks associated with this exciting new frontier of generative AI products.” [See: Google: https://cloud.google.com/blog/products/ai-machine-learning/protecting-customers-with-generative-ai-indemnification]

Microsoft is similarly confident, offering to “defend its customers from intellectual property infringement claims arising from the customer’s use and distribution of the output of its Generative AI Copilot services”. [See: Microsoft: https://www.microsoft.com/en-us/licensing/news/microsoft-copilot-copyright-commitment].

Notably, the former VP of audio at Stability AI (the maker of the popular image generator Stable Diffusion) announced that he recently resigned from his role at Stability AI because he did not agree with the company’s opinion that ingesting copyright-protected content into Generative AI models is fair use. [See Stability AI VP quits in 'fair use' copyright protest • The Register: https://www.theregister.com/2023/11/16/stability_ai_vp_quits/]

These types of ingrained and bullish positions pose challenges for rightsholders in licensing their works, performances, and sound recordings for TDM.

In addition, rightsholders have no insight into whether their works, performances or sound recordings have been ingested into any Generative AI platform. TDM on any platform is a black box. This information asymmetry, and resulting imbalance in negotiating power, makes monitoring and the resulting licensing opportunities incredibly difficult for rightsholders. At best, licensing transactions are inefficient: rightsholders are left with the choice of guessing whether a particular Generative AI system has used their works or waiting for the operator of a system to approach them for a licence. At worst, there is complete market failure where operators are free riding off the backs of creators’ content. This imbalance must be corrected. For all these reasons, we recommend the Government apply legally binding transparency obligations to Generative AI systems, like those that were recommended by the European Parliament and contained in the provisional agreement for foundation models [See: European Parliament : https://www.europarl.europa.eu/news/en/headlines/society/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence and European Council: https://www.consilium.europa.eu/en/press/press-releases/2023/12/09/artificial-intelligence-act-council-and-parliament-strike-a-deal-on-the-first-worldwide-rules-for-ai/]

Recommendation 2. That Generative AI platforms be required to comply with transparency requirements, including, but not limited to: (i) publishing records of the copyright-protected works, sound recordings and performances that were ingested into the platform; (ii) designing the model to prevent it from generating illegal or infringing content; and (iii) disclosing that the output produced by the system was generated by AI.

These obligations ought not be limited to “high-impact systems” as defined and contemplated in the Artificial Intelligence and Data Act (AIDA) proposed in Bill C-27 but must apply to all Generative AI large language models.

The imposition of transparency requirements should not cause any real challenges for developers because they already document this data. As one example, the GPT-2AI model card published by Open AI in 2019 includes a list of the top one thousand domains present in their dataset, as well as their frequency. In this log, you will find illegal sites (Pirate Bay), p*rnography (Youp*rn) and sites of rightsholders (Le Monde, CBC) [See https://github.com/openai/gpt-2/blob/master/model_card.md] By requiring transparency, not only will rightsholders have access to essential information for the management of their copyrights and related rights, but users of the systems will have critical information about the sources and biases that may be inherent in the system itself.

While these proposed transparency requirements are not all copyright-specific, and may instead more properly be the subject of Bill C-27 or other legislation, increased transparency in the inputs into and use of Generative AI will help ensure the systems are responsible, lawful, safe, transparent, accountable, and non-discriminatory.

THIS IS THE BEGINNING OF THE NEW SECTION "AUTHORSHIP AND OWNERSHIP"

F. Attribution/Ownership/Authorship

The questions asked in the Consultation regarding authorship and ownership of copyright in AI-generated content raise fundamental questions for the cultural sector.

While the Copyright Act does not explicitly define the word “author”, Canadian case law has already reiterated that original works protected by copyright must be the product of an author’s exercise of skill and judgment, which “must not be so trivial that it could be characterized as a purely mechanical exercise”. [CCH v. The Law Society of Upper Canada, [2004] 1. S.C.R. at para. 25.]

Debates on this issue are ongoing at WIPO and in many other jurisdictions, including the United States and the European Union. However, the international consensus is that human authorship is core to copyright and that content generated by AI without any human involvement is not, and should not be, protected by copyright. CDCE agrees with this consensus. The same principles apply to performer’s performances: only human performances are entitled to rights and protections under the Copyright Act.

The purpose of copyright is, in part, to obtain a just reward for the creator (and prevent someone other than the creator from appropriating whatever benefits may be generated) and to incentivize further creation. At this point in time, there is no need to amend the Copyright Act to create new rights to incentivize the creation of AI-generated content.

Granting copyright or related rights to Generative AI systems for autonomous or mechanical content, without original expression of an idea attributed to a human author or performer, would shift the copyright regime from a paradigm of protecting and promoting human creativity to the pursuit of innovation and revenues for companies of all kinds. This would have far-reaching consequences, the long-term impact of which is difficult to anticipate.

Finally, there is something perverse and frankly, offensive about suggesting that there should be an exception to human authors’ and performers’ copyrights and related rights for the inputs into Generative AI systems, while also providing the same platforms with additional copyright protections for non-human generated outputs. Leaving aside the resulting job losses in the sector, this prospect of mass production of pseudo-cultural content entirely generated by Generative AI systems is a major social concern. “Creation” would be the result of companies seeking solely to make their products mass marketed and profitable, as opposed to having a multitude of diverse creators and artists expressing their own thoughts, views, opinions, commentary, and creativity.

Products resulting from purely mechanical AI generation processes that lack original human expression are not “works” protected by copyright or any sort of neighbouring right and should not become so.

Recommendation 3. That the Government not amend the Act to afford copyright protection to AI-generated content.

In addition, it is important that performer’s performances continue to have rights and protections like those they current enjoy under the Copyright Act, whether or not the underlying content is generated by AI.

Recommendation 4. That performers' performances remain fully protected under the Copyright Act, including when the content performed is AI-generated.

Infringement and Liability regarding AI

G. Infringement and Liability.

To establish infringement, a rightsholder must prove that the defendant copied or made available all or a substantial part of a work, performance, or sound recording, that the defendant had access to the original work, performance or sound recording and that the original work, performance, or sound recording was the source of the copy. Independent creation is a complete defence to infringement.

The biggest barrier to determining whether an AI system accessed or copied a specific copyright-protected work, sound recording, or performance is the lack of transparency described above. Without some knowledge of the copyright-protected content ingested into a Generative AI system, rightsholders can only suspect that their content has been used without authorization. In some cases, this will lead to undetected or unproveable large-scale infringements. It will also lead to a process where rightsholders are forced to sue Generative AI platforms they assume have infringed their copyrights in to (hopefully) obtain disclosure in discoveries, incurring significant time, resources, legal and expert fees, and expenses. At the end of that process, a rightsholder may find out the platform never had access to the copyright-protected content in the first place. This would result in a wildly inefficient system and cannot be the intent of the Government.

Requiring Generative AI systems to publish records of the copyright-protected content that was ingested into the systems is necessary so that copyright owners can protect and monetize their intellectual property. With these transparency obligations in place, liability could potentially arise for primary, secondary, or authorizing infringement, enabling infringement, moral rights infringement, removal of digital rights management information, and circumvention of technological protection measures. From an infringement perspective, the current Copyright Act will be sufficient to address issues specific to Generative AI provided these record-keeping and disclosure obligations are in place.

Perhaps more importantly, transparency requirements will also promote a functioning and more efficient licensing market where rightsholders and users can negotiate on a more level playing field by reducing the information asymmetry that is currently present in the market.

Finally, the Government must consider the impact of Generative AI on authors’ and performers’ moral rights, including their rights to the integrity of their works and performances, as well as name and likeness rights and rights of personality and publicity. While not nearly a full answer to these issues, transparency obligations and disclosure requirements will at least signal that uses like deepfakes and voiceovers, are not sanctioned by the performers they are imitating.

Ultimately, the aim of this Consultation on Generative AI should be to encourage responsible use of copyright protected works and a healthy, functioning licensing market for the use of copyright-protected works, sound recordings and performances in TDM. Transparency obligations that require Generative AI platforms to disclose records of the copyright-protected content that was ingested – as opposed to the creation of new exceptions – is the best way to ensure that Generative AI can continue to innovate alongside a copyright system that incentivizes creators to create and disseminate their work and that provides them with their just reward.

Comments and Suggestions

Here is a summary of the CDCE recommendations for this consultation :

- Recommendation 1. That the Government neither amend the current exceptions to include TDM, nor implement new exceptions for TDM.

- Recommendation 2. That Generative AI platforms be required to comply with transparency requirements, including, but not limited to: (i) publishing records of the copyright-protected works, sound recordings and performances that were ingested into the platform; (ii) designing the model to prevent it from generating illegal or infringing content; and (iii) disclosing that the output produced by the system was generated by AI.

- Recommendation 3. That the Government not amend the Act to afford copyright protection to AI-generated content.

- Recommendation 4. That performers' performances remain fully protected under the Copyright Act, including when the content performed is AI-generated.

H. Other Copyright Act Amendments

The CDCE has made other recommendations to improve the Copyright Act, which are reproduced here. We request that the next copyright reform include these other recommendations, even if they are not subject to the current consultation.

We thank the Government for this opportunity to provide our comments on this important Consultation.

The six urgent recommendations of CDCE members:

1. Amend the fair dealing provisions in the context of education so that they only apply where a work is not commercially available under a license by the rightsholder or a collective society.

2. Incorporate resale right into the Copyright Act.

3. Abolish the public performance royalty exemption for performers and producers for commercial radio stations.

4. Amend the definition of sound recording to include sound recordings that accompany audiovisual works.

5. Amend the Act to confirm the binding nature of tariffs set by the Copyright Board.

6. Restore the private copy regime in the music sector.

The six mid-term recommendations of CDCE members:

1. Ratify the Beijing Treaty and grants moral and economic rights to performing artists on audiovisual media in the Act.

2. Raise the upper and lower limits of statutory damages for non-commercial violations and allow the establishment of higher damages in case of systematic and massive use.

3. Ensure that right holders in the various sectors have the same tools by ensuring that all collecting societies can claim statutory damages of three to ten times the value of the tariff that has not been paid.

4. Improve the private copying regime by allowing the payment of royalties for rights holders in the audiovisual, literary, and visual arts sectors.

5. Amend the exemption in section 32.2(3) to limit its application to acts without motive of gain.

6. Take into account the needs and realities of Indigenous artists, creators, and organizations.

The CDCE

The Coalition for the Diversity of Cultural Expressions (CDCE) brings together the main English and French professional organizations in the cultural sector in Canada. It is composed of 54 organizations that collectively represent the interests of more than 360,000 professionals and 2,900 organizations and businesses in the book, film, television, new media, music, performing arts and visual arts sectors. The CDCE’s main objective is to ensure that cultural goods and services are excluded from trade negotiations and that the diversity of cultural expressions is present in the digital environment.

The Coalition ensures that Canada retains the sovereign right to develop, implement and modify the policies, programs and measures required to ensure we have a robust supply of Canadian artistic expressions of all kinds, in every medium, and from all communities. CDCE also works to protect and promote our artists and cultural industries, and to ensure there is a rich diversity of cultural expressions in Canada and globally, including in the digital environment.

The Copyright Act is one of the key tools available to the Canadian government to foster a viable, sustainable, and diverse cultural ecosystem. The rapid developments in generative artificial intelligence over the past year will undoubtedly impact the diversity of cultural expressions in Canada. It is timely to question the robustness of the Copyright Act in this context. The CDCE, however, wishes to emphasize from the outset that this is not the only legislative tool that can or should be mobilized to protect the diversity of cultural expressions in response to these developments. The cultural sector requests to be included in all Canadian reflections surrounding the governance of AI.

Cohere Inc.

Technical Evidence

(1) How does your organization access and collect copyright-protected content, and encode it in training datasets?

The prevailing method followed by research organizations and companies to train AI models is to filter and encode data from a wide variety of sources, including publicly available, proprietary and synthetic data. In some cases, data that is subject to access controls may be used where access is permitted technologically or authorization has been obtained from the individual or entity who controls access to the data.

The encoding process entails presenting information or other data as machine readable numerical tokens. These tokens are then computationally analyzed to train the model. During this training process, the model learns by identifying patterns occurring across a broad spectrum (represented by numerical tokens), and then compressing its learnings in the form of statistical correlations (e.g., model weights). Post-training, the model undergoes further retraining and safety and model evaluation processes to create a model that would be ready for operation. In addition, the models may be further fine-tuned with additional data, such as customer data or domain-specific proprietary data, in a process that tailors the model for more specific tasks or use-cases. Finally, once in operation, models are continuously monitored and evaluated to allow for the ongoing iteration and improvement of the models.

As such, the inclusion of any individual datapoint in the training of the model (e.g., one individual piece of text) does not determine the model’s result. Rather, the model’s usefulness depends on the quantity and diversity of the data as a whole because AI models learn by synthesizing and aggregating diverse human knowledge that is presented during training and fine-tuning.

Foundational large language models must be trained on large amounts of data (billions, if not trillions of data points), which, by necessity, requires accessing publicly available data online. Large amounts of data are required for a multitude of purposes, including to:

- Develop, build and deploy models that are useful for learning and building on ideas and creating new knowledge and tools;

- Ensure that the model learns statistical information representing all the complexities of semantic meaning and grammatical structure;

- Evaluate and test models;

- Identify and reduce the risk of bias and other potentially harmful outputs; and

- Update and re-train models from time to time.

Fulfilling each of these purposes is dependent on large, diverse data sets. The developer also needs large, diverse datasets so that the resulting model will understand different speech patterns, including everyday human speech, technical jargon, and literature. If the training data is limited—e.g., if it excludes entire categories of speech, or contains only a limited number of examples—the resulting model will not capture all of the nuances and statistical patterns that reflect the complexity of human language. That, in turn, will limit the function, usefulness, and flexibility of the resulting model.

(2) How does your organization use training datasets to develop AI systems?

See response to preceding question for a description of the prevailing method followed by research organizations and companies to train AI models.

(3) In your area of knowledge or organization, what measures are taken to mitigate liability risks regarding AI-generated content infringing existing copyright-protected works?

Among generative AI platforms, some of the available measures to mitigate the risk of AI-generated output infringing copyright include:

- Testing, evaluating and fine-tuning models to reduce the likelihood of output replicating an existing work;

- Deploying content filters and technical controls – e.g., software that evaluates generative output to identify and remove output that may replicate an existing work; and

- Requiring users to comply with codes of conduct or usage guidelines which prohibit the use of AI systems for the purpose of generating infringing output.

As AI-models and filters improve over time, the likelihood of AI-generated output infringing copyright (which is already low) will continue to fall.

(4) In your area of knowledge or organization, what is the involvement of humans in the development of AI systems?

Humans are involved at each stage of development and deployment of an AI system including:

-Defining the objectives and operating parameters of the AI system;

-Designing the model architecture and development the AI system in which such models are embedded within;

-Collecting, selecting, preparing and encoding of data used to train and fine-tune models;

-Collecting, selecting and encoding of post-training data used to enhance and fine-tune models, including related evaluation and experimentation;

-Testing models to assess performance and the risk of biased or harmful output;

-Fine-tuning models to enhance performance and safety, including to mitigate the risk of biased or harmful output; and

-Deploying, operating and monitoring the performance of the AI systems and embedded models, including those that are put into production.

(5) How do businesses and consumers use AI systems and AI-assisted and AI-generated content in your area of knowledge, work, or organization?

Businesses and consumers use AI systems to, among other things:

- Summarize and review large volumes of text to extract meaningful insights or information;

- Generate text that is informed by the AI systems, such as generations for a certain audience;

- Obtain information about publicly available or proprietary information (if fine-tuned on such) to inform decision making; and

- Educate and improve access to disparate sources of publicly available information.

Text and Data Mining

(1) What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?

Clarity around copyright and TDM in Canada are prerequisites to the AI industry in Canada making the transition from a global leader in research and development to a viable market from which to scale up and operate successful companies.

Harnessing the societal benefits of artificial intelligence through the development and commercialization of “safe, secure and trustworthy AI” – which is the underlying objective of the G7’s Hiroshima Process International Code of Conduct for Organizations – requires accessing publicly available data online.

A lack of clarity around copyright and TDM will undermine Canada’s ambitions to be the home of world leading AI companies and ecosystems and risks entrenching global incumbents with access to vast amounts of information and compute, or entities with significant financial resources and ability to exclusively acquire such information to the exclusion of others, and withstand years of regulatory and legal uncertainty. AI companies and ecosystems will migrate to those jurisdictions with laws that either provide this clarity or are otherwise more favourable to AI development and commercialization – such as the United States, Japan and Singapore.

Similarly, the lack of clarity related to TDM activities and copyright has undermined the building of AI infrastructure, specifically access to in Canada. The lack of access to large- scale, cost- effective AI compute has downstream effects, including limiting access to compute for researchers, civil society, and downstream deployers especially in sensitive sectors such as healthcare, finance, and public sectors, and AI sovereignty implications. Canada’s global ranking in AI is currently falling as other countries advance their own AI strategies, attract top AI talent, and invest in AI compute. A report from Tortoise Media found that Canada’s global AI ranking fell from fourth to fifth, and a significant drop in Canada’s global ranking was attributed to Canada’s lack of AI infrastructure.

It is also reasonable to expect that uncertainty around copyright and TDM in Canada may undercut the distribution and deployment of AI systems in this country, further increasing the growing productivity gap between Canada and other leading economies, including the United States. The resulting uncertainty of a short-sighted approach may culminate in a Pyrrhic victory for copyright in Canada—with a potential opportunity to leverage Canada’s world-leading AI talent and resources squandered as the world adopts AI, the result being a tiny fraction of the economic value and wealth that should have been created. Such a loss may be incalculable and irrecoverable.

(2) Are TDM activities being conducted in Canada? Why or why not?

Only limited TDM activities (largely for non-commercial research) are being conducted in Canada.

TDM for commercial purposes is mostly being conducted in the United States, Japan, Israel, South Korea, and other jurisdictions with laws that are more supportive of AI development and commercialization. Given that capital, AI infrastructure development and talent will gravitate to jurisdictions with favourable laws, this trend can be expected to continue as AI development grows.

For TDM activities to become more prevalent in Canada, and for Canada’s capacity to enable the scaling-up of domestic AI-companies, it is critically important for Canada’s laws to provide certainty – either through favourable judicial decisions interpreting existing provisions in the Copyright Act or through the addition of a new exception to the Copyright Act specifying that TDM activities do not infringe copyright.

(3) Are rights holders facing challenges in licensing their works for TDM activities? If so, what is the nature and extent of those challenges?

TDM does not infringe copyright, making the licensing of works for TDM activities unnecessary for publicly available data. The process of training models involves learning concepts and facts from, and identifying patterns found in, data – similar to how an individual who reads a book gains knowledge. As these concepts, facts and patterns are not protected by copyright, copyright law should not be interpreted to prevent AI training.

Furthermore, the computational analysis undertaken by an AI model during training does not involve the consumption of copyright protected data for their expressive content. Rather, such analysis involves mathematical calculations of probabilities, correlations, trends, and other patterns across the entire tokenized data set. Such analysis seeks to understand only the mathematical patterns (e.g., the relationships of specific tokens in relation to other tokens) distributed across the entire data set. These mathematical patterns are themselves not expressive content protected by copyright law.

More broadly, the non-expressive encoding data does not have an impact on the market for an underlying work or related revenue models or the economic interests that copyright is intended to protect, as the rights holder continues to control access to the work. Restricting access to a work allows the rights holder to prevent the work from being used to train AI models. It also provides the rights holder with the option of making access to a work available for a fee. There exists today a multitude of business models and technological protection measures that can be – and are being – used by rights holders to control access to their copyright-protected works to generate revenue streams through voluntary licensing.

(4) If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards? What would be the expected impact of such an exception on your industry and activities?

Canada should adopt an express amendment to the Copyright Act to cover TDM activities. Such an express amendment should extend to:

- Commercial and non-commercial uses;

- All works and other subject matter; all copyright-relevant acts, including retention of data for purposes of verification and validation of results; and

- The provision of AI services (i.e., permit service providers to perform AI activities on behalf of its end-users).

An amendment to the Copyright Act to confirm that TDM activities do not infringe copyright should also make clear that TDM activities do not override access controls. The Copyright Act already includes provisions addressing technological protection measures, making the inclusion of new restrictions unnecessary.

(5) Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?

There has been significant discussion around compelling AI developers to disclose the datasets on which they have trained. We believe that would be a misguided policy decision for several reasons.

First, such a requirement would be largely unworkable in practice:

- LLMs are generally trained on a wide variety and innumerable quantities of publicly-available source material, such as web-crawled or scraped data, the very nature of which is disparate and ever-shifting. This stands in stark contrast to other technologies that rely on smaller datasets, which lend themselves to being provided or managed by a centralized provider. Correspondingly, because there is no central set of data vendors or a standardized approach to referring to individual copyrighted work, the copyright status of individual works contained in billions or trillions of datasets are effectively impossible to discern. Furthermore, there is significant variation in the nature and extent of metadata associated with each item of training data, and none that is specific to the copyrighted status of content that is publicly available.

- LLMs are deployed internationally. Consequently, the burden of harmonizing copyright law for any given dataset across geographies prior to deployment would practically mean that no useful model would ever get off the ground.

- Development of foundational models is an ongoing process that involves substantial amounts of work to filter and refine the training data used for each version of a model. Data used for one version of a model may not be used for a subsequent version and vice versa, multiplying the complexity of maintaining an up-to-date list of every copyrighted work that forms part of the training data for a specific version of a foundational model.

- Every iteration of this technology may give rise to new considerations about data use and integration that are impossible to forecast, infinitely complicating the model training process. Consequently, prior to and at various stages during the model training of a new model iteration, the datasets themselves and aspects thereof need to be refined, structured, categorized or manipulated in a multitude of ways to address various factors from model efficacy to safety.

Second, requiring developers to disclose the datasets used to train models would result in the disclosure of proprietary and commercially sensitive information.

Third, and perhaps most important, compelling disclosure of training data for LLMs is at odds with existing copyright law principles given that TDM activities do not infringe copyright.

To the extent that the Government of Canada still seeks to pursue any record-keeping or disclosure obligations, they should be aligned with globally recognized principles of responsible AI or corresponding current or emerging regulatory obligations and be limited to high-level disclosures (e.g., listing categories of data types used in training).

(6) What level of remuneration would be appropriate for the use of a given work in TDM activities?

Remuneration would not be appropriate given TDM does not infringe copyright. See our response above to Question 3.

Authorship and Ownership of Works Generated by AI

(1) Is the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies? If so, how?

Existing copyright principles can be applied to determine whether an AI-generated work or other subject matter is subject to copyright protection. The use of AI to create a new work entails an individual exercising their skill and judgment to prompt an AI-system or tool to generate a new work. An AI-generated work is often enhanced or modified, both with additional prompts and non-AI activities, to create a finished work. Whether the exercise of skill and judgment by an individual is sufficient to meet the test for originality and intellectual effort under Canadian copyright law can be assessed by the courts.

(2) Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how?

Legislative amendments are not needed. As explained in our response to the preceding question, existing copyright principles can be applied to determine whether an AI-generated work or other subject matter is subject to copyright protection.

Infringement and Liability regarding AI

(1) Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright (e.g., AI-generated works including complete reproductions or a substantial part of the works that were used in TDM, licensed or otherwise)?

Existing copyright principles can be applied to determine whether an AI-generated work or other subject matter infringes copyright.

(2) When commercialising AI applications, what measures are businesses taking to mitigate risks of liability for infringing AI-generated works?

Among generative AI platforms, some of the available measures to mitigate the risk of AI-generated output infringing copyright include:

- Deploying content filters and technical controls – e.g., software that evaluates generative output to identify and remove output that may replicate an existing work; and

- Requiring users to comply with contractual obligations, codes of conduct or usage guidelines which prohibit the use of AI systems for the purpose of generating infringing output.

As AI-models and filters improve over time, the likelihood of AI-generated output infringing copyright (which is already low) will continue to fall.

(3) Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

Existing copyright principles can be applied to determine liability if an AI-generated work infringes an existing copyright-protected work.

Comments and Suggestions

(1) Are there TDM approaches in other jurisdictions that could inform a Canadian consideration of this issue?

Japan, Singapore, Israel, South Korea and the United States are jurisdictions that should be used to inform a Canadian consideration and approach to TDM. Japan and Singapore have amended their copyright laws to specify that TDM (or “computational data analysis”) is permitted for both commercial and non-commercial purposes, providing much-needed clarity for AI companies and ecosystems. While the United States does not have an express exception for TDM in its Copyright Act, it is a jurisdiction that is widely regarded as having laws more favourable to AI development and commercialization. Notably, the “fair use” exception in the US Copyright Act is more flexible than the “fair dealing” exception in Canada, since “fair use” is not restricted to a specific list of enumerated purposes that limit the scope of “fair dealing” in the Canadian Copyright Act and thus could hinder TDM activities in Canada for certain purposes. Israeli and South Korean copyright laws allow for fair use similar to the US’s fair use exception.

Given the global and borderless nature of the development and deployment of generative AI, Cohere believes that it is vital for Canada to maintain interoperability with rules, regulations, and norms in other jurisdictions such as those listed above. To the extent that Canada was to distance itself by failing to adopt an express TDM amendment in copyright law, other jurisdictions would highlight this policy shift to become more attractive hubs of AI training and development. Furthermore, creating an outlier training regime in Canada would diminish access to diverse and differentiated AI models, leading to fewer available models in Canada, depriving Canadian businesses from access to state-of-the-art tools enjoyed by their foreign counterparts. Not only would the number of models be reduced, but the quality of those models would also suffer, as the ability to train on a diverse and global dataset, which is crucial to building AI systems that are safe, accurate, representative, and unbiased across many different regions and cultures, would likewise be significantly diminished.

(2) Technical Measures and Outputs of AI Systems

We are aware of some content creators’ concerns that the outputs of AI systems may be similar to the information on which the AI systems were trained. We view such outputs as a solvable technological challenge in AI systems, one which we are committed to mitigating through further development of AI systems and through the implementation of technical measures. We note, as have others, that such phenomena are difficult to reproduce and have been observed mostly through teams seeking to intentionally and actively force a model to generate those specific outputs. In other words, other than researchers testing the capabilities, weaknesses, and limitations of early LLMs, users have not reliably experienced reproductions of training data through typical usage. We believe that further testing, evaluation and fine-tuning of foundation models will result in a further reduction of the potential for this to occur. Models are trained to, and fundamentally their value is derived from, their ability to effectively and efficiently provide insights into, and unlock information about, an existing corpus of data, not reproduce it.

(3) Conclusion

Canada should adopt an express amendment to the Copyright Act to cover TDM activities. Such an express amendment should extend to:

- Commercial and non-commercial uses;

- All works and other subject matter; all copyright-relevant acts, including retention of data for purposes of verification and validation of results; and

- The provision of AI services (i.e., permit service providers to perform AI activities on behalf of its end-users).

Such an express amendment would:

- Continue to maintain the balance between “promoting the public interest in the encouragement and dissemination of works of the arts and intellect and obtaining a just reward for the creator.”

->This is because the training AI models do not displace the market for the underlying work. Based on how machine learning works and its purpose - to create new content - it is fair to conclude that the “information analysis” undertaken during model training is highly unlikely to harm the primary purpose of the original work. The objective is not to republish or compete with the work used for training purposes but rather to ensure that the text and data inputs can be used for analysis. It is for this reason that commercial use in a TDM exception will not negatively implicate rights holders’ copyright interest.

->The “information analysis” is transformative and not replicative.

- Be consistent with Canada’s international obligations. The TRIPs Agreement and Berne Convention require Member States to ensure that exceptions to copyright are confined to “certain special cases which do not conflict with the normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the rights holder.”

There is a lot at stake in the outcome of this consultation, particularly as it relates to TDM activities. If Canadian law prohibits TDM activities without a license or imposes a statutory remuneration regime, the ability of Canada to realize the many societal benefits of AI – including improved health care delivery, climate change measurement and reduction, enhanced productivity, and globally successful AI companies – will be compromised.

Amending Canadian copyright law to provide rights holders with a new right to control TDM will not result in rights holders acquiring new revenue streams, as any meaningful TDM activities will be conducted outside Canada. Inadvertently, adding such a right will also undermine Canada’s emerging AI ecosystem and all but guarantee that development and commercialization of AI will happen largely elsewhere. More broadly, it also could result in AI systems, including systems that are critical to advancing health care, addressing the climate crisis and closing Canada’s productivity gap, not being made available for use in Canada.

Thank you for the opportunity to submit our responses. We look forward to continuing to work with all stakeholders to clarify the best path forward for all.

Colleges & Institutes Canada

Technical Evidence

N/A

Text and Data Mining

Enhancing Copyright for Canadian Colleges

An important first step to supporting AI use in higher learning would be creating a clear exception for text and data mining (TDM). Adding this exception would modernize Canada’s copyright scheme and maintain the fundamental user-creator balance.

The Supreme Court of Canada has provided that the proper copyright balance, “lies not only in recognizing the creator’s rights but in giving due weight to their limited nature.” The Court has also previously cautioned against tilting this balance in a way that limits creative innovation in the public domain, to support long-term interests in Canadian society. Text and data mining and AI have already proven to be useful tools for enhancing education and research.

For example, the OER Studio at Fanshawe College is using generative AI to create chapter summaries, assessment questions, and supplementary learning materials from the primary content included in an Open Educational Resource (OER). The OER Studio also hopes to build on this by using AI to create more resources, such as audio-visual case studies and adaptive learning modules.

In college applied research, TDM supports the planning, data collection, analysis, processing, design, prototyping, and developing new products and services for industry. Examples of AI applications in applied research projects include:

AI applications to measure customer satisfaction;
AI-based solutions for the health sector;
AI-based systems for product and service deliveries;
AI based systems to enhance sales; and
AI systems to ensure accountability, confirm authorities, and support risk management practices.

Fleming College’s Centre for Advancement in Mechatronics and Industrial Internet of Things is working with industry partners to develop AI-based customer service solutions, customer satisfaction analyses, and automatic checkout processes.

As part of their Digital Sea Project, Cégep de Matane is using AI to recognize, classify, and characterize different marine species. These processes help identify species that are commercially valuable and cases where stock of certain species may be scarce, which can assist in better protection and conservation efforts.

Boosting Education and Innovation with Text and Data Mining

Using TDM in college learning and research promotes creative solutions and efficient project outcomes. Put simply, if a user has lawfully accessed information and copyrighted content, the right to read should also encompass the right to mine. While several other jurisdictions (details below) having already enacted provisions clarifying TDM, Canadian researchers and learners risk being left behind when it comes to AI-driven research and education without government action.

The Copyright Act provides an allowable purpose for TDM of copyrighted works under the fair dealing exceptions for research and education. However, like other infringement exceptions in the Act, the ability to use TDM under fair dealing is not evident. Without specific language on the lawful use of TDM in the Copyright Act, this lack of clarity could negatively impact or prevent prospective research.

Enacting a specific TDM exception for copyright infringement would provide legal certainty for college learners and industry partners. Several jurisdictions have already enacted some form of TDM exception, including the United States, European Union, Japan, the United Kingdom, Singapore and more. For instance, Singapore recently enacted a specific statutory exception for TDM, even though TDM could have been permitted under their existing fair use provisions. Singapore preferred to add a stand-alone TDM exception to increase legal certainty for the AI industry and promote their plans to make Singapore a global AI hub. While other jurisdictions move forward with TDM exceptions, a comparative lack of certainty will put Canada at a disadvantage and could stifle AI research and development.

Canada can also learn from other jurisdiction’s experiences and ensure a new TDM exception properly supports Canadian innovation. For instance, the United Kingdom introduced a TDM exception in 2014 but limited the exception to non-commercial research. Following a recent consultation, the UK Intellectual Property Office (UKIPO) announced plans in 2022 to extend the TDM exception to commercial purposes, providing a comparative advantage for the UK in AI research and development. The UKIPO highlighted this expanded exception would benefit researchers, AI developers, cultural heritage institutions, and the wider public, by speeding up the TDM process and AI development.

Creating a specific exception in the Copyright Act for TDM would ensure Canadians can access the full research, innovation, and educational benefits from AI. In addition, by creating this exception for both commercial and non-commercial purposes, Canada will be empowering colleges and their innovation partners, while supporting a competitive Canadian AI industry.

CICan recommends the Copyright Act be amended to add a specific exception to infringement for text and data mining (TDM), which applies to both commercial and non-commercial purposes.

Complementary Amendments to Facilitate Text and Data Mining

In addition to enacting a specific infringement exception for TDM in the Copyright Act, additional amendments would also help bolster TDM in Canada’s copyright framework.

First, the Government of Canada could also reinforce TDM amendments by making it clear that no exception to copyright can be overridden by contract. Certain contractual obligations can often interfere with a college’s lawful use of copyright exceptions. College libraries spend thousands each year to licence content that supports current education and training needs. However, licencing contracts often forbid TDM, even though colleges are paying for access to the material and these activities are permissible under fair dealing.

The Copyright Act must ensure that when works are accessed legally and the uses do not infringe on one of the exclusive rights of the author, the Act should not be overridden by contract. The law should clearly provide no contract can deem a lawful activity as copyright infringement when activity is permissible under the Act. This will guarantee Canadian researchers and learners can gain the full benefits from emerging technology, such as AI, and content they to which they are paying for access.

CICan recommends the Copyright Act be amended to clarify that no copyright exception can be overridden by contract.

Second, to ensure any TDM exceptions can be fully leveraged by users, the Copyright Act should also be modified to allow technological protection measures (TPMs) to be circumvented for any non-infringing use. Technological protection measures too often act as a barrier to the lawful use of copyright exceptions. Under the present law, circumventing a TPM is considered infringement, even when the use of the underlying material is lawful. This has effectively obstructed the legal use of materials for research and education.

Technological protection measures are especially problematic when colleges are implementing learning supports for persons with disabilities. For instance, in several provinces, providing closed captioning for those with a hearing impairment is an accessibility requirement, but colleges and users are unable to break digital locks to implement this support for the students who require it. To avoid continued use of TPMs to obstruct the legal use of materials, including for activities involving emerging AI technology and TDM, the Government of Canada should amend the Copyright Act to make it explicit that circumventing a TPM for a lawful and required use is not infringement.

CICan recommends the Copyright Act be amended to allow for the circumvention of technological protection measures for any non-infringing purpose.

Authorship and Ownership of Works Generated by AI

Authorship and Ownership of AI-Generated Works

As AI becomes widely used and develops rapidly, questions regarding authorship and ownership are often central to designing policy and legislation around this technology. Colleges specialize in partner-driven AR&D to solve technology, business, health, and social innovation challenges. When these projects conclude, the IP developed remains with the partner/client for any commercial purposes. Generally, any IP retained by colleges is only used for academic and research purposes. However, providing clarity on this point in the law is important for understanding ownership questions over certain research methodologies, such as AI algorithms and statistical methods.

As the Government considers these questions, it should be noted that several instances throughout the Copyright Act provide “authors” of copyrighted works in Canada are human. Any future amendments to the Copyright Act surrounding AI should be framed around this central notion.

CICan recommends any future amendments to the Copyright Act pertaining to artificial intelligence (AI) should remain consistent with the central notion that authors of copyrighted material are human in Canadian law.

Infringement and Liability regarding AI

Finally, any approach to copyright infringement and liability concerning generative AI should consider the legal distinctions between authorizing infringement and simply using AI technology. The Supreme Court has provided, “a person does not authorize infringement by authorizing the mere use of equipment.” This distinction should be considered in developing any liability approach, to ensure researchers and learners are not punished for merely using AI.

CICan recommends any approach to copyright infringement and liability concerning generative artificial intelligence (AI) remain consistent with the legal distinctions between authorizing infringement and using technology, to ensure users are not punished for using AI.

Comments and Suggestions

Introduction

Over 95% of Canadians and 86% of Indigenous peoples live within 50 kilometers of a public college, institute, cégep, or polytechnic. As Canada’s most accessible post-secondary education network, colleges are critical to tackling many challenges facing Canada, including skills for the future economy, the drive to net-zero emissions, and housing shortages. Colleges serve as local gateways to the innovation ecosystem for thousands of small and medium enterprises (SMEs) and other partners, specializing in applied research that solves technology, business, and innovation challenges.

Ensuring Canada’s copyright framework is balanced and responsive to emerging technologies is important to colleges, their students, faculty, and staff, and their research and innovation partners. Copyright legislation affects the way students and educators can access and use copyright-protected materials. This ultimately impacts teaching, learning, and research. The Supreme Court of Canada has been clear that the Copyright Act must maintain an equitable balance between rights for creators and users. In the age of generative artificial intelligence (AI), copyright law must bolster a supportive environment for AI research and innovation in Canada.

Colleges and Artificial Intelligence: Empowering Education and Applied Research

Canada’s colleges are embracing AI’s potential to enhance teaching, learning, and applied research. Generative AI supports students and teachers by strengthening communication processes, personalizing student support, and creating innovative educational resources.

Canada’s colleges are already integrating AI into programs and into pedagogy. For instance, colleges offer several courses, workshops, and programs focused on AI. The college sector is eager to continue integrating AI technology into classrooms to advance teaching and learning. For example, Algonquin College is encouraging faculty to use generative AI in their classrooms to increase learner productivity and efficiency, stimulate creativity in brainstorming activities, support writing and editing, and increase classroom accessibility by catering to various learning styles.

In addition, college applied research & development (AR&D) refers to a range of innovation activities delivered through partnerships between colleges and a private firm, not-for-profit or community organization. Several CICan members have related specialized research hubs for AI, including Durham College, the British Columbia Institute of Technology (BCIT), Niagara College, Saskatchewan Polytechnic, and the Cégep de Sept-Îles.

Through AR&D projects, colleges support their partners with product and prototype development, process improvement, commercialization readiness, and technology adoption. Generative AI has the potential to enhance efficiency and creativity in college AR&D.

Conclusion

As AI development is poised to impact every sector and industry, Canada’s colleges are embracing its potential to strengthen teaching, learning, and research. By providing accessible and high-quality education, the college system is at the forefront of training the workers who will enhance productivity and innovation with AI technology. Our labour market needs diverse, resilient, nimble, and sophisticated workers who can work alongside AI.

As AI becomes widely used and filters into several aspects of education and training, colleges must be able to confidently access the technology and resources that support innovative and responsive education. Copyright law should not hinder AI development, but instead, embrace and empower the potential it holds for learners and researchers.

Beyond this consultation, any potential changes to the Copyright Act, pertaining to AI or otherwise, should be conducted in accordance with a comprehensive parliamentary review. The Government is urged to consider the recommendations put forth in this submission to support creative and innovative education in the age of generative AI.

Copibec

Preuve de nature technique

Copibec est la société québécoise de gestion collective des droits de reproduction, une entreprise d’économie sociale à but non-lucratif spécialisée en gestion des droits d’auteur. Elle représente plus de 30 000 autrices et auteurs, et 1 300 maisons d’édition. Elle offre aux utilisatrices et aux utilisateurs de matériel protégé par le droit d’auteur des solutions simples et adaptées à leurs besoins. À l’échelle internationale, la société de gestion collective a conclu des ententes avec plus de 33 sociétés étrangères afin d’inclure les livres, journaux et revues de ces pays à son répertoire. Elle compte parmi ses membres l’UNEQ, l’ANEL, le RAAV, l’AJIQ, la FPQJ, la SODEP, les Quotidiens du Québec et les Hebdos du Québec.  La consultation portant sur le droit d’auteur à l’ère de l’intelligence artificielle générative lancée en octobre dernier alors qu’une consultation sur un cadre moderne du droit d’auteur pour l’intelligence artificielle et l’Internet des objets avait été initiée en 2021 exprime en soi l’évolution rapide de l’IA.

Cette évolution rapide et les questions que suscite l’intelligence artificielle générative en lien avec le droit d’auteur pourraient nous contraindre à agir avec une trop grande célérité en ne mesurant pas l’ampleur des changements à venir. Il faut, bien sûr, prendre acte des défis que pose l’avènement de l’IA et des questionnements qu’elle suscite. Il ne faut pourtant pas éluder le fait que l’on oppose l’intelligence artificielle à l’intelligence qui, elle, émane d’êtres humains. Certes, les questions sont vertigineuses, mais elles nous ramènent aussi à notre essence. De ce point de vue, la Loi sur le droit d’auteur, modelée sur ce qui nous est propre, nous renvoie à ce qui nous singularise et nous distingue.

Si la dénomination de l’IA ne suffit pas à la cerner, il demeure que les mots permettent de définir ce que nous nommons. Que nous ayons ainsi pensé important d’apposer l’adjectif « artificielle » au substantif « intelligence » dénote bien le fait qu’il faut distinguer l’IA de ce qu’est l’intelligence qui, sans autre qualification est associé à l’intelligence humaine. L’IA force en quelque sorte l’intelligence qui nous est propre à se définir par opposition à ce qui serait un autre ensemble référentiel. L’utilisation de l’expression « machine learning » met déjà plus à distance ce que nous tentons de cerner et peut aussi nous aider à répondre aux questions qui nous sont posées. Un possible amalgame entre “intelligence artificielle” et “intelligence humaine” porte déjà en son sein un glissem*nt qui pourrait s’avérer tendancieux.

Il nous semblait important de nous attarder aux termes utilisés et qui désignent ce que nous devons circonscrire quant à la demande qui nous est faite par cette consultation sur le droit d’auteur à l’ère de l’intelligence artificielle générative. Il faut d’ailleurs noter que le terme « générative » s’est ajouté dans cette nouvelle consultation, alors qu’en 2021 nous n’en étions encore qu’à l’intelligence artificielle.

Il pourrait être facile de s’abandonner à la technicité, de sacrifier les questions éthiques au nom de la science. Ce n’est pas la voie que nous emprunterons ici.

Que la Loi sur le droit d’auteur soit à nouveau interpellée par de nouvelles technologies, qu’elle soit poussée dans ses derniers retranchements, qu’on la questionne et l’interroge face à des objets éloignés de son centre nous oblige en fait à définir ce que nous désirons protéger. Disons d’emblée que la technologie n’est pas neutre et que nous constatons dès à présent les enjeux que sont notamment le manque de diversité, la présence de biais cognitifs et la possibilité, en nous projetant, d’entrevoir une uniformisation en ce qui a trait aux extrants générés par l’IA.

Ainsi, prendre un recul critique face au développement de l’IA permet de mieux la situer. Il semble que ChatGPT fut le déclencheur d’une prise de conscience planétaire des capacités de l’IA, comme si un seuil avait été franchi, et ce n’est pas qu’au Canada que l’on tente de la juguler, de l’endiguer, de la paramétrer. Nous sommes donc devant un effort à l’échelle mondiale pour trouver des réponses satisfaisantes en ce qui concerne l’IA générative et force est de constater que la culture est l’un des vecteurs qu’impacte l’IA à l’ambition totalisante. La prise en compte de l’ensemble de ces vecteurs démontre les aspects problématiques que suscitent l’avènement de l’IA. Il nous est à ce jour difficile d’envisager l’avenir même à très court terme.

Le document de consultation parle à certains moments de « contenus créatifs» en se référant à ce que peuvent générer les systèmes d’intelligence artificielle. Nous avons entendu « artefact», «contenu généré», et il nous semble important encore ici de nous distancier du terme «œuvre» utilisé dans la Loi sur le droit d’auteur.

Ce qui pourraient n’être que des considérations lexicales sont au contraire des éléments qui nous disent d’emblée que la façon d’appréhender l’IA générative, sans nous interroger en amont sur cette nouvelle « réalité », procède d’une analyse qui ferait l’économie de son objet. C’est ce que nous désirons indiquer par des remarques que l’on pourrait qualifier de liminaires, mais qui ne sont pas, de notre point de vue, accessoires.

De plus, les questions identifiées dans le document de consultation produit, tout en nous indiquant la possibilité de commenter de façon plus large, produisent néanmoins un effet d’ancrage chez le répondant qui aura tendance à se modeler à ce qui est proposé et à ne pas questionner les prémisses de base. Nous avons discuté de l’emploi du terme « intelligence artificielle », de la dénomination des extrants générés par l’IA. Ce qu’est un « auteur », une « œuvre» sont autant d’éléments fondateurs dans la Loi sur le droit d’auteur. Qui plus est, la version anglaise de la Loi pose déjà des défis. Son titre lui-même ayant fait l’objet d’analyses et le terme «work», équivalent du terme «œuvre», en langue anglaise ne sont que des exemples de l’importance et de la difficulté que pose le champ lexical en droit d’auteur.

Les questions que pose l’IA vont au-delà de l’objet de cette consultation et la Loi sur le droit d’auteur est le substrat d’éléments partagés et sur lesquels nous avions une compréhension commune. Que la Loi ne définisse pas, par exemple, ce qu’est un auteur et que l’on ait une approche dans laquelle l'historicité de la Loi est évacuée nous semble bancal. Que l’on constate le caractère évolutif des textes est une chose, qu’on le dénature en est un autre.

L’originalité est certes le critère qui définit le mieux ce qui sera protégé de ce qui ne le sera pas – il s’agit d’un élément fondateur et nous y reviendrons lorsque nous parlerons de titularité.

Fouille de textes et de données

Nous sommes fermement en défaveur de l’établissem*nt d’une exception qui permettrait de procéder à toute activité de fouille de textes et de données. Nous n’avons pas, en notre qualité de société de gestion collective des droits de reproduction, reçu de demande concernant la FTD. Par ailleurs, des licences existent déjà afin d’autoriser cette activité et nous croyons qu’un marché peut se développer. Il correspondrait à la logique interne du droit d’auteur selon laquelle les autorisations se transigent en égard aux droits exclusifs des auteurs et qu’une valeur monétaire est offerte en contrepartie d’une utilisation déterminée. Le Copyright Clearance Center (CCC) aux États-Unis offre de telles licences pour la FTD en ce qui a trait aux œuvres artistiques et littéraires.

Nous soumettons que la LDA répond avec efficacité aux besoins exprimés par l’industrie et nous reprenons les termes du document de consultation : « La présente consultation a pour objectif de poursuivre l’important travail de recherche des faits déjà entamé, plus particulièrement en vue d’éclairer la politique sur le droit d’auteur à une époque où le contenu, même s’il peut sembler créatif et original, peut être facilement généré par un système d’IA. » (p. 3)

La recherche des faits qu'évoque le document de consultation est ardue pour nous puisque nous n’avons en aucun cas été sollicité en lien avec la fouille de textes et de données. Nous ne nions pas l’existence de la FTD, mais elle n’apparaît sur aucun de nos écrans radar.

L’entraînement des systèmes d’intelligence artificielle a nécessité l’utilisation d’œuvres protégées par le droit d’auteur sans qu’aucun consentement n’ait été accordé et qu’aucune forme de rémunération n’ait été établie. Nous plaidons pour une exigence de transparence.

Titularité et propriété des œuvres produites par l’IA

Alors que nous avons abordé la question des intrants en égard à l’intelligence artificielle générative, la question des extrants est aussi posée. On ne peut lire les extrants et penser à leur titularité en faisant abstraction des intrants qui ont servi à générer le contenu. Nombre d’œuvres protégées ont servi à l’entraînement des intelligences artificielles génératives sans consentement des titulaires de droit. Isoler les extrants des intrants est une position insoutenable en ce qu’elle fausse le raisonnement en oblitérant le lien de causalité entre des œuvres protégées et la production de contenus générés par une IA. Encore une fois, le besoin de transparence est criant. Se positionner sans avoir les faits permettant de procéder à une analyse témoigne d’une incurie.

Il nous semble pertinent de faire un rappel concernant l’originalité et la protection qu’elle confère à une œuvre. Le critère de l’originalité permet de distinguer ce qui ne sera pas protégé de ce qui le sera. Il est ici utile de citer l’arrêt CCH Canadienne Ltée c. Barreau du Haut-Canada, [2004] 1 R.C.S., 339, par. 16 : « L’élément essentiel à la protection de l’expression d’une idée par le droit d’auteur est l’exercice du talent et du jugement. J’entends par talent le recours aux connaissances personnelles, à une aptitude acquise ou à une compétence issue de l’expérience pour produire l’œuvre. J’entends par jugement la faculté de discernement ou la capacité de se faire une opinion ou de procéder à une évaluation en comparant différentes options possibles pour produire l’œuvre. Cet exercice du talent et du jugement implique nécessairement un effort intellectuel. L’exercice du talent et du jugement que requiert la production de l’œuvre ne doit pas être négligeable au point de pouvoir être assimilé à une entreprise purement mécanique. »

Les œuvres assistées par l’IA et les contenus générés par l’IA créent une ligne de partage. L’incertitude que l’on évoque quant aux solutions qui seront développées quant au contenu généré par l’IA – mentionnons que l’incertitude qu’a créé l’introduction de nombreuses exceptions en 2012 et les enjeux que comportent l’interprétation de nouveaux textes législatifs n’a pas semblé un élément perturbateur pour le législateur – n’est pas en soi un motif pour bousculer l’économie du droit d’auteur.

Nous soumettons que le cadre actuel fournit les outils suffisants dans la détermination de la titularité.

Violation et responsabilité en matière d’IA

Au moment où nous nous parlons, il n’est tout simplement pas possible pour un titulaire de droit qui allègue une violation de droit d’auteur de démontrer que l’œuvre a servi à entraîner un système d’intelligence artificielle. Par ailleurs, le contenu généré est aussi à risque. Le soutien à l’innovation et à l’investissem*nt débute par des pratiques d’affaires qui prennent en compte les coûts associés au modèle de l’IA, à savoir que l’entraînement de l’IA en amont implique des coûts afin d’obtenir les autorisations nécessaires. S’y soustraire entraîne des répercussions.

Que l’on demande des preuves et des données probantes pour clarifier la violation et la responsabilité en matière d’IA alors que les titulaires de droit sont dans l’obscurité quant à ce qui a servi à générer les contenus nous semble une façon de procéder qui équivaut à du « reverse engineering » et nous ne pouvons y souscrire.

L’intelligence artificielle peut agrandir le champ des possibles pour les créateurs. Il s’agit d’un outil qui peut être porteur en termes d’innovation. Réaliser l’intégration de l’IA générative exige au premier chef que la transparence fasse partie de l’ensemble des processus. Cette mise au jour est absolument nécessaire afin que la Loi sur le droit d’auteur puisse jouer le rôle qui lui est dévolu. On ne peut parler de violation ou de responsabilité s’il n’est pas possible de documenter de façon très précise ce qui a servi à entraîner les intelligences artificielles.

Commentaires et suggestions

En tout état de cause, on aura compris que l’IA générative a des conséquences qui vont au-delà de la Loi sur le droit d’auteur mais touchent aussi ce que la Loi protège. Les créateurs, tout comme l’ensemble de la société civile, voient déferler des avancées technologiques que l’on pensait voir survenir dans un futur beaucoup plus éloigné. Le premier réflexe ne doit pas être de voir la Loi comme étant un frein à l’innovation. Elle porte plutôt en son sein son fruit et les sociétés de gestion collective font partie de la solution.

Copyright Office, University of Alberta

Technical Evidence

Post-secondary institutions are involved in all sides of the development and use of assistive and generative AI tools.

A generative AI tool is largely developed by humans, with the selection, accessing, and ingestion of content for training datasets directed by humans, the prompts to direct the generative AI tool to create outputs normally provided by humans, and the decision regarding whether and how to distribute those outputs generally made by humans.

Where there is an alleged copyright infringement in relation a generative AI tool, it could be either that the generative AI tool’s accessing and reproducing copyright-protected works as part of the training datasets is alleged to be infringing, or it could be that the generated output that is made public is alleged to be infringing.

Developers of generative AI tools should be aware of copyright considerations in relation to the text and data those tools ingest for their training datasets. The type of outputs that the tool is designed to generate – which may be the ultimate purpose of the tool – might have a bearing on whether and the extent to which the use by the tool of any copyright-protected text and data might be found to be infringing. This determination will generally involve a fair dealing assessment, in cases where there has been no other authorization to access and use the copyright protected text and data.

Users of these tools are often not the developers of these tools. Therefore, for users who are publishing outputs from generative AI tools, transparency regarding what tools have been used and how they are used in relation to AI-generated outputs, along with good record keeping regarding these, may be useful voluntary measures that can be undertaken to assist in the defense of cases where those outputs are alleged to be infringing and thereby to mitigate legal risks.

Text and Data Mining

TDM is an important research tool, incorporating the latest technology to save time and expense in gathering and analyzing data for research purposes. There are a broad range of texts and datasets accessed via TDM, and not all such sources, nor the ultimate uses of the research outputs, give rise to copyright concerns.

Regarding copyright licenses from rights-holders that might govern TDM activities, these may be beneficial in certain cases where the use of TDM would not reasonably be covered under fair dealing or any other exception. However, to the extent that fair dealing is applicable to TDM activities, it is important to note that “the availability of a license is not relevant to deciding whether a dealing has been fair.” (CCH Canadian Ltd. v. Law Society of Upper Canada [2004] 1 S.C.R. 339 at paragraph 70).0

As the use of generative AI tools broadens and grows, it will be increasingly significant to understand the impacts of any limitations placed on the material those learning datasets can draw upon. If the datasets are not appropriately current and drawn from appropriately diverse sources, the output could be outdated or biased.

In any discussion regarding adding clarity around copyright and TDM in Canada, “clarity” is the operative word. A broad range of cases of TDM could be reasonably considered to be fair dealings, based on the purpose of the ultimate use of the outputs derived from the TDM activity, in conjunction with the other factors of a fair dealing analysis.

There are always (or should always be) concerns about imposing strict limitations in relation to activities that have such a broad range of potential approaches and uses. To establish such strict parameters in the statute might not appropriately allow for new or unforeseen approaches or uses. It would be unfortunate if any such bright lines led to a result other than what a proper fair dealing analysis might determine.

Acknowledging that not all uses of outputs derived from TDM would be fair dealing, it would be useful to make clear to the TDM community that fair dealing is available to them, should an objection be raised under the Copyright Act to the TDM activity itself or to how the outputs arising from that activity are made publicly available.

Any such added “clarity” should do nothing to reduce or limit the application of fair dealing in cases of TDM. This added clarity might take the form of a “TDM exception” that would explicitly allow for the use of TDM in certain defined ways that are deemed to be in the public interest. If such a TDM exception were to become part of the Copyright Act, the application of that exception should not be limited by the terms of contracts governing access to content where those terms purport to override terms of the Act.

Regarding imposing obligations for record-keeping and disclosure regarding content accessed, such an approach would be undesirable and impractical. While voluntary record-keeping regarding content accessed might be a good practice, obligatory practices would be difficult to monitor and enforce, and such obligations would only serve to create a potential violation under the Act that is not copyright infringement. As was mentioned earlier, maintaining such records may be useful to the developer in the event of a claim of copyright infringement against any output generated through its generative AI tool, but that is not a compelling reason to make the burden of such record-keeping obligatory.

Regarding other jurisdictions, the US Copyright Office has issued a notice of inquiry regarding artificial intelligence and copyright. This notice of inquiry was issued on 30 August 2023, and the due date for reply comments has been extended to 06 December 2023. The outcome of that inquiry may be a useful supplement to the results of this consultation. Additionally, the European Union has recently agreed upon a new law known as the A.I. Act.

Authorship and Ownership of Works Generated by AI

There are clearly economic interests that would arise from a determination of authorship or copyright ownership for certain outputs from generative AI tools. In such cases, the determination of who would be the rightsholder in those outputs and what level of copyright protection would apply for what length of term might be important to the parties involved. However, the fact that these interests in AI-generated content are important to certain parties is not sufficient to determine whether it would be in the broader public interest to provide any level of statutory protection to this content under copyright law.

One of the foundational purposes of copyright protection is to protect the interests of (human) authors, ensuring that they can receive a just reward for their creative works. This is intended to provide such (human) authors with an incentive to spend the time and effort and skill and judgment to produce those new creative works. Providing this copyright protection to such (human) authors is in the public interest, as it encourages the ongoing creation of new works that benefit the public.

The extent to which the works created by a generative AI tool require such copyright protection to incentivize their creation is much less clear. While it may be in the public interest to ensure that the developers of generative AI tools can benefit from the value of the outputs of those tools, this benefit can readily be derived from user fees, perhaps including royalties, and other terms of use associated with their tools. Copyright protection is not necessary to further incentivize the developers of these tools, and there is no compelling reason to “reward” the users of these tools with copyright protection in their outputs.

In cases where there is significant human contribution to a “work” that also had a significant contribution from a generative-AI tool, there would be the need to establish reasonable thresholds of human skill and judgment relative to the contribution of the tool to reach the level of “human authorship” and thus copyright protection, as well as effective means of determining whether such thresholds have been reached.

Infringement and Liability regarding AI

Assuming no new changes in what counts as an infringing work under the Copyright Act, then the existing legal tests that are currently applied to decide whether a work authored (entirely) by a human is infringing should be sufficient to determine whether a work generated (in whole or in part) by an AI tool is infringing.

In relation to the outputs from generative AI tools, while it may be good practice for the developers/administrators of generative AI tools to track what copyright-protected content has been accessed and/or copied in the development of the learning dataset, this may be most useful as a defense to copyright infringement. Otherwise, the existing tests for similarities between works in determining whether one infringes upon the other in copyright disputes should be sufficient. The courts are well-equipped to decide these issues.

Regarding providing greater clarity on where liability lies if an AI-generated work is found to be infringing, again, the operative word is “clarity”. The owner/controller of the generative AI tool might reasonably be liable, at least in part, if the outputs of that tool too closely resemble existing copyright-protected works. Similarly, the individuals who made the decision of what prompts to provide the generative AI tool, and whether and how to distribute the outputs from that tool, might also reasonably be liable, at least in part, in cases where those outputs are found to be infringing.

However, given the very broad range of possible specific circ*mstances for such cases, it would be important that any such additional “clarity” would not limit the range of liability that might apply to any of these parties. There are a number of human decisions at play at all stages of the process, by each of the parties to the process, and it is the extent of the connection between those decisions and the specific nature of the alleged infringement that will determine whether and the extent to which each of the parties involved in the matter will be liable, or not be liable, for the ultimate distribution of an infringing output. Again, existing approaches to determining and apportioning liability should be sufficient in such cases, and the courts are well-equipped to decide these matters.

Comments and Suggestions

At the end of the day, generative AI is a tool to serve human purposes, some of which involve the generation of outputs that on their faces might give rise to copyright protection. The statutory copyright regime is in place to provide protection to works where that protection benefits the broader public interest. Before the choice is made to expand or modify copyright protection of works to accommodate the use of generative AI tools to create outputs that are published, it should be properly determined whether existing rights, protections and exceptions under the Copyright Act, and the underlying goals of providing those rights, protections and exceptions, might be sufficient as they are to address any issues that arise.

Some “additional clarity” might be required around how the rights, protections and exceptions apply in these cases, but a minimally intrusive approach has generally been the initial approach of the copyright regime in dealing with the impacts of new technologies. In the case of generative AI, it may be too early to try to do too much. If sufficient guidance can be provided so as not to unduly stifle the development and use of such generative AI tools in ways that serve the broader public interest, that may be the best outcome at this stage.

Council of Canadian Innovators

Technical Evidence

The Council of Canadian Innovators does not work directly to develop AI products and services or use them such that we actively use training datasets.

Text and Data Mining

The use of large amounts of training data is essential to producing viable artificial intelligence and machine-learning (AI/ML)-based products and services. In some but not all cases, data are drawn from publicly available sources, including on the open internet.

At the technical level, how text- and data-mining (TDM) activities provide algorithmic models with training data is widely misunderstood to involve the storage and repeated reading and ‘copying’ of data by algorithms from a database of materials used in training. Most such activities do not work in this way and could plausibly be said to fall under the scope of fair dealing. It is also important to note that some proposals, including that AI training be restricted to data in the public domain, would only increase the likelihood of biased outputs due to the significant age of many such materials, as well as make AI-powered products and services less useful to end users.

Clarity and guidance in establishing which text- and data-mining activities and practices fall within the scope of fair dealing, temporary reproductions or similar clauses could increase certainty for investors. With that said, AI is a broad family of technologies and care should be taken to ensure that regulation, clarification or other interventions into the marketplace treat the technology and the copyright regime holistically and especially do not serve to bring Canada’s existing AI governance regime out of overall alignment and compatibility with jurisdictions like the United States and the European Union.

With regard to obligatory recording and disclosure of the use of copyrighted materials in TDM work, this would create a very high burden on smaller, innovative companies and restrict access to AI training at scale to large, established and mostly multinational players. This would seriously hamper the potential of Canadian companies to scale significantly, particularly if such requirements brought Canada as a jurisdiction out of regulatory alignment with peer countries.

Given the emergence of opt-outs from inclusion in training data sets and other mechanisms in this market to respond to the concerns of copyright owners, as well as ongoing efforts to more broadly regulate AI technologies in Canada and abroad, Canada’s innovators feel that it would be premature for government to do more than offer guidance regarding copyright at this stage and to let the market mature, and then allow more specialized regulatory bodies like the proposed AI and Data Commissioner to re-examine the issue in more detail at a future date.

Authorship and Ownership of Works Generated by AI

The policy space governing works created using AI tools and authorship is still in its infancy, and Canada’s innovators wish to indicate that comments here are of a prefatory nature and could evolve as the space matures.

Given that AI tools are not persons, the assigning of authorship for copyright purposes to works created using AI to the AI tool itself is (given the current state of technology) nonsensical. AI tools should be considered for the purpose of authorship and copyright as equivalent to other software tools used in creative endeavours, such as Adobe’s Photoshop. These tools enable the creation of creative output, and many incorporate AI/ML tools within their suite of options for users, but still rely on human creativity and talent.

Similarly, ‘prompt engineering’ for generative AI tools could be considered comparable to human inputs in other design and creative software for the purposes of determining authorship. The difference in using a natural-language interface in the form of text prompts with an algorithmic tool to edit or create an image or text and using a stylus, mouse or keyboard to do the same is fundamentally not a difference that should matter unduly to a technology-neutral framework like the Copyright Act.

This is an area that would benefit primarily from clarification as case law evolves rather than active attempts to change legislation. From the point of view of Canada’s innovators, given that digital software in the use of creative industries, including the use of AI/ML tools and products, is not fundamentally new, the current law is adequate to meeting the challenge of governing generative AI tools and services.

Infringement and Liability regarding AI

As discussed above, Canada’s innovators believe that AI tools in a copyright context are best considered in comparison to other software tools used in creative endeavours, such as text, video or photo editing software. As also discussed above, itemizing training data would be prohibitive for all but the biggest companies, and in any case would not actually serve to enhance copyright protection since AI models’ interaction with training data is best understood as fair dealing or as a temporary reproduction. If users are using software tools to facilitate the infringement of copyright, the responsibility for infringement rests with users rather than the developer of the tool.

Comments and Suggestions

N/A

Carys Craig

Technical Evidence

It is important to note that, in cataloguing the wide range of uses being made of AI systems in different fields, a distinction should be drawn between text and data mining (or what the INDU Copyright Act Review report called “informational analysis”) and generative AI. The public discourse is currently largely concerned with real and perceived risks presented by generative AI as a source of potential outputs that may compete with human-authored works, while the significant promises that may flow from informational analysis conducted through text and data mining seem to have been somewhat forgotten in the fray. By launching a second consultation focused on generative AI, there is a risk that the kind of text and data mining (TDM) that featured prominently in the previous consultation—and which motivated the INDU Committee to recommend the enactment of a copyright exception to expressly permit informational analysis—has been submerged by a wave of concerns (and something of a moral panic) about generative AI. In reviewing submissions in response to this question, then, the government should take clear note of the distinction between TDM (as a technologically facilitated process with a wide range of uses and applications) and generative AI (as a particular kind of machine-learning AI tool typically defined by the nature of its outputs). In addition to the training requirements and potential uses of generative AI, then, the vital role that TDM and informational analysis can play in research and the advancement of knowledge should remain at the forefront of any copyright policy response to AI. (See Sean M. Fiil-Flynn et al, “Legal reform to enhance global text and data mining research: Outdated copyright laws around the world hinder research”, 378(6623) SCIENCE 951-53 (2 December 2022).

Whether in educational environments like the university or in the professional practice of law, it must be expected that people – students, educators, researchers, administrators, clerks, lawyers, judges, etc. – will use generative AI systems to assist them. As a technological tool, generative AI may be used in ways that enhance productivity, improve efficiency, elevate performance, equalize capacities, and advance inclusivity. It may also be used in ways that are unprofessional, unethical, antisocial, exclusionary, or that exacerbate inequalities.

In legal research, for example, systems such as the Lexum AI tool on Canlii can be used to generate helpful summaries of cases and legislation in a manner that may improve understanding and ultimately advance access to justice. Chat-GPT may be used to efficiently generate, e.g., technical or formulaic legal writing, or even first drafts of documents or papers that will help stimulate ideas or assist with organizing information. Of course, Generative AI may also be misused as an ostensible research tool to produce, e.g., plausible sounding but “hallucinated” cases and citations that will mislead or misinform, or to generate important legal documents and decisions with insufficient (human) consideration, oversight, or developed expertise. It may be used without appropriate acknowledgement to produce outputs that effectively substitute for, e.g., scholarly writings or student assignments, in a manner that violates the letter or spirit of professional ethics and academic honesty.

Generative AI is, after all, a general-purpose technology with potential uses in multiple spheres of activity. The value, appropriateness, and desirability of such uses will depend on the specific context, purpose, and effect.

In law and socio-legal studies, AI systems may also be used to conduct important computational research and informational analysis. Osgoode Hall Law School’s Professor Sean Rehaag, for example, uses GPT-3 to extract data from online Federal Court dockets, enabling the identification of patterns in outcomes in thousands of cases. As Professor Rehaag writes, this project demonstrates “how machine learning can be used to pursue empirical legal research projects that would have been cost-prohibitive or technically challenging only a few years ago – and shows how technology…can…be used to scrutinize legal decision-making…” (See Sean Rehaag, “Luck of the Draw III: Using AI to Examine Decision-Making in Federal Court Stays of Removal (January 11, 2023). Refugee Law Lab Working Paper (11 January 2023), Osgoode Legal Studies Research Paper No. 4322881, https://ssrn.com/abstract=4322881.

More broadly, as explained in the IP Scholars Submission (2021), the growing importance of TDM as a research method cannot be overstated. This is true across the full range of scholarly disciplines, as well as in journalism, education, civil society, and a wide range of commercial research. Unfortunately, copyright law may already be acting as a barrier to some such initiatives, having a chilling effect on research, journalism, and civil society projects.

In legal education, legal practice, and academia – and, indeed, any field of activity -- different AI use-cases present substantially different benefits and harms, many of which remain either anecdotal or speculative. It is therefore a mistake to generalize about AI uses or to extrapolate broad conclusions about the appropriate trajectory of AI technologies from particular use cases. The public interest in facilitating informational analysis in health research, for example (whether commercial or non-commercial) is very different from the public interest in facilitating the AI generation of graphic art. What we can do with these technologies now may be very different from what they enable us to do even a few years into the future. It would be premature and a mistake to anoint copyright law (which is concerned with encouraging the creation and dissemination of works of the arts and intellect (Théberge v Galerie d’Art du Petit Champlain Inc, 2002 SCC 34 (SCC)) with a central regulatory role in attempting to balance the harms and benefits of generative AI, specifically, or artificial intelligence technologies more broadly.

Text and Data Mining

Given the vast volume of inputs required as training data for machine learning, copyright restrictions on the use of protected works place an enormous burden on AI research and development. There is currently a lack of clarity about the lawfulness of TDM projects in Canada. This uncertainty extends to core activities at each stage, from the compilation of large data sets to the technological processes involved in informational analysis and the training of AI models, as well as to any subsequent sharing or storage of data sets or models for, e.g., testing, replication, or transparency purposes. At any stage, the creation of digital reproductions of copyright protected works could present the risk of liability—and that risk could significantly chill AI research, development, and deployment.

Unfortunately, this risk looms large in Canada even although digital copies used for training purposes will typically never reach a public audience. As explained in the IP Scholar's submission (2021), such “non-expressive” uses do not implicate the legitimate interests of copyright holders. Assembling datasets of digital copies is only the necessary first stage in the process of informational analysis whereby data is extracted from the works and tokenized, allowing models to be trained to identify and replicate patterns, frequencies, correlations, and structures. This is what enables generative AI to predict and output text, for example, that sounds plausible and appropriate. In this training process, the expressive work of authorship within a dataset is translated into statistics–the meaning is turned into math. The copyrightable work itself is typically not retained, while the kind of information extracted from the work is beyond the scope of copyright (which does not protect information, data, knowledge, ideas, styles, or commonplace elements). The digital copies assembled in training datasets are made for technical, incidental, and non-public purposes. Nonetheless, the copyright liability risk remains under letter of the current law and jurisprudence. The chilling effect of this uncertainty may be compounded by Canada’s statutory damages provisions since potential liability, calculated on a per copy basis, could be staggering.

The IP Scholars’ Submission (2021) also explained why TDM would normally constitute fair dealing. Fair dealing in Canada is a user right, which “must not be interpreted restrictively.” (CCH [48]) Given the “large and liberal” interpretation to be accorded to fair dealing purposes, most TDM is likely to qualify as research or private study, meeting the first step in the fair dealing analysis (SOCAN v Bell [2012] SCC 36 [15]-[30]) Many uses made for machine-learning purposes are also likely to be “fair” under the second step given the purpose, character, and effect of the dealing and the absence of realistic alternatives. A court would consider the training purpose of the use as well as the fact that the copy typically does not reach the public, constitutes only a minimal fraction of a massive data set, and so does not compromise the core economic or moral interests of the copyright owner nor substitute for the work of the author in the market. It would likely also recognize that obtaining permission for the millions of works typically contained in a dataset would be an impossibility such that there is no realistic alternative to the use (Alberta v Access, 2012 SCC 37, [31]-[32]). Restricting training data only to non-copyright protected materials would produce inferior and incomplete data sets, reducing the capabilities of the AI and its fitness for purpose. In most projects, the use of copyright protected materials is reasonably necessary to achieve the ultimate purpose (CCH, [57]) of the AI researcher/developer. Even if it were somehow possible for an AI developer to license for the vast quantity of materials required for training purposes, this would not be relevant to finding fair dealing (CCH, [70]).

If the vast majority of TDM activities is likely lawful under Canada’s existing copyright law, there is a concerning bent in the questions posed in the questionnaire regarding licensing. The questions seem to assume that securing licenses to conduct TDM activities in Canada is either necessary or appropriate. If TDM is non-infringing, licenses are unnecessary. As the Supreme Court explained, “[i]f a copyright owner were allowed to license people to use its work and then point to a person’s decision not to obtain a licence as proof that his or her dealings were not fair, this would extend the scope of the owner’s monopoly over the use of his or her work” (CCH, [70]). Redundant licensing arrangements produce unnecessary transaction costs, obstacles, and exclusions (especially for smaller, less well-resourced players) while also producing windfall gains and costly rent-seeking behaviour amongst rightsholders and their representatives. They should be avoided and certainly not encouraged as part of a policy response.

Clarity in the law may be an important goal but it is not everything. Any legislative amendment that clearly subjects AI researchers/developers to onerous rights’ clearance requirements, owner opt-ins or opt-outs, or rightsholder remuneration obligations will chill AI research and development in Canada. As noted in the IP Scholars’ 2021 submission, clear copyright obstacles to TDM would also negatively affect competition in the AI industry, incentivize secrecy, and disincentivize transparency, sharing, and testing. It could also reduce the quality of AI models and their outputs, obstructing the use of comprehensive, inclusive, high-quality datasets, thereby encouraging reliance on limited, incomplete, and exclusionary or biased data sets, with detrimental results.

Rather, the Act should be amended to clarify that informational analysis or TDM does not infringe copyright. This could be by means of a new exception, an addition to the incidental inclusion or temporary reproduction for technological processes provisions (ss.30.70, 30.71), and/or as an addition to the enumerated purposes in the fair dealing provisions (s. 29 or a new s. 29.3). Opening fair dealing by adding the words “such as” would allow dealings for non-enumerated purposes to qualify, accommodating most TDM activities while avoiding the risks of technology-specific legislative amendments.

Drawing distinctions between commercial and non-commercial uses and/or users is unnecessary and will create obstacles to TDM and new grounds for confusion about scope and meaning. Requiring lawful access as a condition is unnecessary and redundant (if the copy or the manner in which it was accessed was unlawful, liability would already attach to the infringing actor). Such a restriction on the availability of the exception could create new grounds for uncertainty (defining lawful access; determining which jurisdiction's law applies; assessing the scope and applicability of express and implied contract terms, the relevance of TPM circumvention, the degree of knowledge required by the use, whether lawful access must be to lawful sources, etc.) (See e.g. Thomas Margoni, “Saving Research: Lawful Access to Unlawful Sources”, Kluwer Blog, 22 Dec 2023) Under the current copyright and anti-circumvention law, a lawful access requirement could effectively foreclose the availability and effectiveness of a TDM exception.

If general transparency or disclosure obligations are to be imposed on AI developers (and there may be good reasons for doing so), these should not be tied specifically to “copyright-protected content”. Any requirement to record and disclose every copyright work used in training (bearing in mind that this includes not only traditional works of authorship but any and all original texts, images, photographs, videos, transcripts, etc. scaped from the web and compiled in a vast data set of billions of items) as well as, presumably, its author, owner, current copyright status, etc. would be incredibly (indeed impossibly) onerous, impracticable, and ultimately unfeasible.

The questionnaire asks what level of remuneration would be appropriate for the use of a given work in TDM activities; but this seems to assume that there is a right to remuneration. There is not. Nor is any right implicated for which compensation can be demanded. Moreover, in the context of vast data compilations, it makes little sense to ascribe (more than nominal) value to any particular work. And given that works are used only as sources of data, the value gained from their inclusion has no measurable relation to their market value as works of authorship. Frankly, it will be impossible under current technologies to calibrate (micro-)payments made under collective licensing arrangements to actual usage of individual authors’ works. Attempting to identify an appropriate level of remuneration for individual rightsholders is therefore a misguided exercise. A redistributive levy scheme or tax could, of course, be imposed upon providers of generative AI systems and used to finance social and cultural funds (See Martin Senftleben, Generative AI and Author Remuneration, 54 54(1) IIC 1535 (2023))), but this is a very different—and much more compelling—proposition than purporting to appropriately remunerate individual rightsholders for the use of individual works in massive AI training datasets.

As for preferred approaches, I would point to Japan’s new exception (Article 30(4)) permitting non-expressive uses, as well as to Israel (Opinion on Uses of Copyrighted Materials for Machine Learning (Dec. 2022)). I would also encourage consideration of the US response to this issue. While the scope of fair use for TDM is currently before the US courts, there are strong arguments and precedent to support the conclusion that non-expressive copies made for TDM purposes are lawful in the US (See eg Matthew Sag, “Copyright Safety for Generative AI” 61 Houston L Rev 305 (2023).

Authorship and Ownership of Works Generated by AI

As explained at length in the IP Scholars’ submission (2021), the basic copyright rules regarding the ownership and authorship of AI generated works are currently clear. AI generated works are not copyrightable works of original expression and so they belong in the public domain. In the absence of a human author who exercises more than trivial or mechanical skill and judgment in the expression of an original work, there is no existing basis on which to claim copyright.

This conclusion is consistent with the Canadian legislation and jurisprudence on authorship, copyright ownership, and originality. Under the Act, all copyrightable works require a human author. This is clearly implied or assumed in sections 5 (conditions for subsistence); sections 6 and 9 (copyright term), sections 13 (ownership), section 14 (moral rights), etc. Unlike in the US work-for-hire doctrine, there is no circ*mstance in Canada in which authorship of a work is deemed to belong to someone or something other than a human author who is the de facto source of the original expression. Unlike in, e.g., the UK, there is no specific deeming provision regarding authorship of computer-generated work. In the absence of such a provision, most jurisdictions agree that the defining threshold of "originality" precludes an AI from being an original author—AI-generated works without a human author therefore do not attract copyright protection.

Such a conclusion is also consistent with the purpose of copyright law (to maintain a balance between the encouragement and dissemination of works of the arts and intellect and a just reward for creators (CCH)) and the principle of technological neutrality (which works to maintain that balance as technology evolves (ESA v SOCAN, 2022 SCC 30). Protecting AI-generated works by deeming authorship as a legal fiction would undermine rather than advance the objectives of copyright law. There are many arguments to support this assertion, including the following (for brevity): the AI does not need incentives to create works; the creation of the AI code is already incentivized by the copyright that attaches to it as a literary work; no author is denied their just reward when AI works are refused protection; there is no apparent risk of underproduction of AI-generated works that would necessitate or justify extending copyright to ensure their protection/incentivization; and if copyright offers economic rewards to set off the “costs of expression” (Landes & Posner’s ‘Economic Analysis of Copyright Act (1989)), these costs are significantly reduced in the case of AI generated works, such that the need for an economic return is significantly lessened. Perhaps the most important reason, however, is the potential consequence of allowing copyright to attach to AI generated works. This would create a cultural environment in which vast volumes of privately owned AI works would quickly become an obstacle to human authorship and a liability risk for creators and users alike, particularly when the owners of such vast collections can also employ algorithms to identify any work bearing a "substantial similarity" to any AI-generated work. Such an incentive—and disincentive—structure would surely fly in the face of copyright’s public and cultural policy objectives. Copyright law should focus on encouraging the expressive act of (human) authorship and not on rewarding the automated generation of outputs.

I have made more expansive arguments elsewhere. See, e,g, “The AI-Copyright Challenge: Tech-Neutrality, Authorship, and the Public Interest” in Ryan Abbott (ed), Research Handbook on AI & IP (2022); “AI and Copyright” in Martin-Bariteau & Scassa (eds), Artificial Intelligence and the Law in Canada (2021); “The Death of the AI Author” with Ian Kerr, 52(1) Ottawa L Rev 33-86 (2021).

The IP Scholars (2021) agreed that AI generated works do not meet the requirements of copyright in Canada. It was also suggested, however, that amendments to the Copyright Act could further clarify and confirm this conclusion. Specifically, we recommended confirming in section 2 of the Act, that “author” means a human being/natural person and adding to section 5 a clear statement that “copyright shall not subsist in a work created without a human author.” These recommendations remain sound, and recent developments in generative AI technologies have only underscored their importance.

While it is simple to assert that copyrightable works require a human author, there remains the complicated question of how much human participation in the creation of a work is necessary in order for that person to be an author. More specifically, at what point, if any, might the user of a generative AI tool have done what is required of them to be regarded in law as an author of the output (or at least of any original expression contained in the output)?

Recent decisions of the US Copyright Office have suggested that works created with the assistance of generative AI may be registrable, but that copyright would protect only the original creative expression of the human author contained within the registered work. In the examination of “Zarya of the Dawn”, the Office approved protection for the text and arrangement of images but denied protection for individual images created using Midjourney (notwithstanding arguments on behalf of the applicant Kashtanova that it had taken multiple rounds of careful prompt composition “to get closer and closer to what they wanted to express") (Registration # VAu001480196 2023) The Office found that there was too much “distance” between the Kashtanova’s inputs and the AI outputs, pointing to a lack of predictability and control. Subsequently, the Copyright Office confirmed (88 FR 16190) that works that contain AI generated material may be registrable if the applicant claims only the human-authored content in the work (for example, the original selection and arrangement of elements in a compilation work) and excludes any more than de minimis content that was AI generated. The work “Théâtre D’opéra Spatial” has since been refused registration for containing a more than a de minimis amount of AI-generated content that the applicant was unwilling to disclaim (Copyright Review Board Sept. 5, 2023).

These examples present challenging questions for copyright law. It is important to resist an elevated vision of romantic authorship that imagines the creative process to be wholly independent and originary, and that therefore discounts the kind of iterative, selective process that might gradually shape and define a work with enough specificity to capture the author’s intended expression, even if relying on assistive technologies. There are often intervening actors and technologies involved in creative processes that limit predictability or control. (See Dan Burk, “Thirty-Six Views of Copyright Authorship by Jackson Pollick 58 Houston L Rev 263 (2020)). Photographers may not predict or control how light will be captured in a photograph, or indeed what transpires in front of the camera. Choreographers give detailed instructions to dancers but may lack direct control over the dance as performed and recorded (fixed). These photographers and choreographers are nevertheless considered authors of the resulting work. The difference, however, is that the original expressive elements in the works remain attributable to them, while the ostensibly expressive elements of an AI generated output may not be attributable (i.e. do no originate with) the AI user. The creation of prompts may amount to literary expression, but prompts may also be merely technical or functional instructions, unprotected due to the idea/expression or merger doctrine. The AI user’s inputs may also be considered merely mechanical or trivial (and therefore unoriginal (CCH)). Even tweaks made directly to outputs may be merely editorial and so unoriginal or de minimus.

Ultimately, however, it is conceivable that, in certain cases, an AI user could demonstrate sufficient skill and judgment in the expression of a specific work created using AI as an assistive technology to claim copyright in that work. Their copyright would, however, protect only those elements that constitute their original expression. Elements attributable to the machine would be beyond the scope of their right and free for others to use. In the case of compilation works, it may be simple to separate the author’s original selection and arrangement (copyrightable) from AI-generated components (uncopyrightable). In cases where authored and generated elements merge, this will be more challenging. While it may require careful dissection of works to identify protected elements in particular cases, this is nothing new. Indeed, this is how copyright always works (see Cinar (SCC)).

These issues simply require the application of existing doctrine to new contexts and should therefore be left for the courts to work through. While registration is less central to Canada’s copyright system, CIPO would be well advised to follow the approach taken by the US Copyright Office. CIPO’s apparent acceptance of AI as a designated author for copyright registration purposes is contrary to the Copyright Act. The registration of Suryast (No. 1188619) names an AI App as co-author. If an AI cannot be an author, it cannot be a co-author. Nor can it have the kind of collaborative intent required for joint authorship, nor the capacity to consent to uses of the co-authored work. That copyright term is measured from the death of the last surviving author (s 9) is reason enough to reject the idea of AI-human joint authorship.

While recognizing the evolving conceptual, practical, evidentiary and doctrinal challenges presented by generative AI, it should be clear that the proposed approach here remains as stated above and in the IP Scholars 2021 submissions: AI itself cannot be an author; and works without human authors are not protected by copyright.

Infringement and Liability regarding AI

Qu.: Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright?

I have suggested in previous responses that the activity or process of training an AI on copyright protected inputs typically does not and should not infringe copyright in those works. When considering copyright infringement by generative AI, the focus should not be on the inputs but rather on the outputs of the AI. After all, it in only in respect of the outputs that it might be argued that the generative AI is in some sense competing with human authors and artists -- this is where the effect on the market for traditional human-authored works may be felt. The important question for copyright law is whether the AI outputs constitute substantial reproductions of any particular copyright protected works upon which the AI model was trained.

AI-generated outputs implicate the copyright in pre-existing works when the outputs are substantially similar to specific copyrighted works that were present in the training data. The test is whether the ordinary reasonable observer would regard a given output as substantially similar to the copyrightable elements of a given input. The copyright doctrine of independent creation means that a work produced independently, without copying, will not infringe copyright in a pre-existing work even if it is identical. Where a substantial similarity between an AI output and a protected work is simply a matter of coincidence (i.e. the work—or a derivative thereof—was not in the training data) the similarity is no basis for liability. However, if an AI is trained on a dataset that includes a particular protected work and subsequently produces a substantially similar output, then it is hard to chalk the similarity up to coincidence: there is access to the protected work and so the necessary causal connection. Is the protected input the causa sine qua non of the output, but for which the particular output would not have been made? (See Gondos v. Hardy et al.; (1982), 38 O.R. (2d) 555 at para. 32 (Ont. H.C.J.)). The inscrutability of the algorithm’s operation may make it impossible to say. One could argue that, by virtue of the automated generative system, all of its outputs are effectively independently created. Alternatively, some may argue, the AI’s outputs are effectively derived from its inputs, and the combination of access plus substantial similarity is sufficient to establish prima facie infringement.

The point is that questions of access, causality, and substantiality similarity are always essential to the determination of infringement liability in copyright law. In my view, there is nothing about this infringement inquiry in the context of generative AI that makes it fundamentally different or even more difficult. Indeed, it may be easier to establish on what sources an AI has been trained than it is to know what works may have found their way into an author’s subconscious mind (Gondos).

It is important to stress that, in the absence of a “substantial similarity” between an output and an input in the training dataset, the outputs cannot be regarded as infringing reproductions of preexisting works. As Pamela Samuelson explains, “While it is true that outputs are, in some sense, ‘based upon’ the works on which foundation models were trained, this has never sufficed to support derivative work [infringement] claims; there must be substantial similarity in expressions to infringe that right.” (Samuelson, Fair Use Defenses, supra note 84, Part III-A-2)

If an output does not substantially reproduce the protected expression of an input, qualitatively or quantitatively, it cannot be an infringing reproduction. If it does, however, then it may be found to infringe drawing simply upon established law and precedent. Problems of proof always abound in this area of law, but they are no more pronounced (and indeed may even be more readily be answered) in the case of generative AI than in other more traditional creative contexts.

Qu.: Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

Who is responsible for an AI's infringing output? Copyright infringement is a matter of strict liability and does not require knowledge; but it does require causation and, in line with common law principle of tortious liability, some degree of responsibility for, or control over, the unlawful act (CBS Songs Ltd. v. Amstrad Consumer Electronics plc., 2 All E.R. 484). In the US jurisprudence, this has been articulated as a requirement of some “element of volition” with respect to the infringing conduct. (Mala Chatterjee & Jeanne C. Fromer, “Minds, Machines, and the Law: The Case of Volition in Copyright Law” (2019) 119:7 Colum. L. Rev. 1887) A volitional act requirement coheres with the intuition that individuals should be held responsible only for that which was under their control. In the case of AI outputs, the allocation of liability should depend on who, if anyone, has the necessary degree of control over the production of an infringing copy. In some cases, this may be the AI programmer (who could have programmed the AI to discard outputs that are substantially similar to individual inputs); in others, it may be the AI user whose prompts and selections produced the infringing output. In some cases, there may be joint liability while, in others, it may be that no person had responsibility or control over the infringing act.

Even if individuals are not responsible for doing the infringing act, it could be argued that they “authorize” it (s. 27(1)) where they “sanction, approve or countenance” the infringing act (CCH). Importantly, however, “a person does not authorize infringement by authorizing the mere use of equipment.” Moreover, “[c]ourts should presume that a person who authorizes an activity does so only so far as it is in accordance with the law.” (CCH, par 38) Given the unpredictable nature of an infringing output by the AI system, the programmer or deployer of an AI tool cannot be taken to have authorized the infringement merely by virtue of having provided the technological means by which an infringing reproduction was made. However, if they could have employed, e.g., technological means to prevent the production of outputs that are substantially similar to inputs in the training data, it may be possible to find sufficient control to ground authorization liability in certain instances.

Arguably, the user of an AI system could be said to have authorized the making of infringing copies where their prompts demonstrate a sufficient degree of control over the AI’s outputs (regardless of their knowledge of the particular inputs on which the AI was trained). One might query whether authorizing a machine is the same as authorizing a person to infringe, but section 3 does not require the relevant infringing acts to be carried out by a person. It could be argued, however, that the user of generative AI is entitled to presume that the AI will not produce infringing content that is substantially similar to any particular work on which it was trained, at least in the absence of control over its generative processes.

In the absence of a human actor with de facto control over the machine’s outputs, there may be questions about whether and where liability could attach. In such a case, can the AI itself be said to infringe? The Copyright Act states that “it is an infringement of copyright for any person to do…anything that…only the owner of the copyright has the right to do.” (s. 27(1)) It would seem that, under the current law, the AI itself cannot be an infringer. Applying a principal-agent analysis, however, the machine’s act qua “agent” could potentially produce liability for whichever party notionally fits the role of “principal”—most likely the person that creates or deploys the AI. Notably, the legal questions about infringement liability are not specific to copyright law. These questions remain to be worked out in every area from criminal to contract to negligence and tort.

The matter may appear more academic than practical, as there is little economic reason to have liability attach at the moment that an output is generated and visible on the individual AI user’s computer screen. Economic concerns are more likely presented when such copies begin to circulate, undermining the market for the original. The exploitation of infringing copies produces its own form of secondary liability that could readily extend to those seeking to exploit infringing AI outputs. It can be secondary infringement for a person to sell, rent, or distribute copies of works and other subject matter where the person knows or should have known that the copies infringe copyright (s. 27(2)). This could capture the sale or distribution of both infringing copies within a training dataset, for example (if these are not protected by fair dealing) and, potentially, infringing copies produced by a generative AI. Strictly speaking, however, the latter will be “infringing copies” only if an autonomous AI is deemed capable of infringing copyright in the first place. Secondary infringement requires that, for liability to attach, a person knows or should have known that the copy “infringes copyright or would infringe copyright if it had been made in Canada by the person who made it.” Once solution here may be to amend s. 27(2) to include dealing with “a machine-generated copy that would infringe copyright if it had been made in Canada by an unauthorized person.”

As suggested above, the kinds of questions about legal liability and responsibility presented by AI generated works are not unique to copyright law, however, and remain to be worked out across the board. By and large, existing copyright doctrine is well equipped to address and work through these questions as they arise.

Comments and Suggestions

The joint submission of the fourteen Canadian IP Scholars submitted in the previous Consultation on a Modern Copyright Framework for Artificial Intelligence and cited in the current Consultation Paper addressed many of the issues raised in this Questionnaire. Its authors, all experts in Copyright Law and disinterested commentators on the copyright issues presented by AI, were already familiar with the technological affordances of generative AI at the time of that prior consultation. While it is certainly true that the rapid proliferation and growing ubiquity of tools like Chat-GPT and Midjourney, for example, have brought greater public attention to the arrival of generative AI, developments between the prior consultation and this one should not displace the positions previously articulated in that joint submission. I would urge that this earlier collaborative submission, representing the combined expertise of many of Canada’s leading copyright scholars, should continue to inform policymakers deliberating more specifically on generative AI and copyright at this time.

I would also like to express my concern about the way in which this second consultation paper and questionnaire have shifted baseline expectations and principles. The proliferation of generative AI tools has certainly raised awareness about the technology and its affordances. It has also brought out numerous self-interested actors and market incumbents who see the opportunity to reap windfalls from the arrival of AI through the vehicle of copyright law and its expansion. I would urge the government to carefully evaluate the claims, demands, and assumptions of lobbyists, industry actors, collective management organizations and their representatives whose efforts to cash in on or reap financial gain from TDM and generative AI are perfectly understandable but ought not to guide or dictate Canadian copyright policymaking. Copyright law serves a public interest in encouraging creativity and fostering a vibrant public domain -- if reforms are to be made to the law in response to technological developments, this should be in an effort to ensure that copyright law continues to serve the public interest, maintaining an appropriate balance between incentives and access, and between the rights of owners and users as technology evolves. (See Craig, “The AI-Copyright Challenge: Tech-Neutrality, Authorship, and the Public Interest” in Ryan Abbott (ed), Research Handbook on Artificial Intelligence and Intellectual Property (2022)). While public consultation is commendable, voices representing the public interest may be few and far between, and risk being drowned out by more vociferous stakeholders and their economic interests.

As such, I note with concern the way in which the consultation paper explains the two main objectives in the balance when considering possible copyright policy options: 1) To support innovation and investment in AI and other digital and emerging technologies in all sectors in Canada; 2) To support Canada's creative industries and preserve the incentive to create and invest provided by the rights set out in the Canadian Copyright Act (the Act), including to be adequately remunerated for the use of their works or other copyright subject matter. First, it is worth noting that, under the Act, rightholders have only limited exclusive rights to control reproduction, publication and performance of their works, subject to users’ rights and consistent with the public interest (York University v Access Copyright 2021 SCC 32 [91]-[94]). They do not have a per se right to “adequate remuneration” for "use" of their works. There is no such general guarantee within the Copyright Act. More importantly, the rights and interests in the balance, as stated, include only industries’ interests in incentives to innovate, create and invest, and rightsholders rights to adequate remuneration. Nowhere in this so-called “balance” is any mention made of the public side of the balance – where is the public’s interest in the dissemination of works, for example, or users’ rights to use works for research or private study, education, downstream creativity, etc.? In other words, the consultation paper’s articulation of the objectives to be balanced in no way reflects the copyright balance as repeatedly articulated by Canada’s Supreme Court, and nor does it fairly represent the interests of users and the public that are surely at stake.

Novel as it may be, it should be acknowledged that generative AI is raising age-old questions about the socio-economic structures of cultural production, creative activity, and the public interest. Policy makers are now being called upon to protect creators from the extractive, exploitative, and even existential threat posed by generative AI. In the haste to act, however, there is a risk of running into what I think of as a "copyright trap”: the mistake of assuming that strengthening and expanding copyright law is the best tool to support creators and cultural production (when in fact it may do more harm than good).

One way into the copyright trap is to assume that everything that has value must be privately owned. This leads to the assumption that AI outputs should be protected by copyright, but also that anyone who reaps value from a work without permission has taken something to which they have no right. In fact, many valuable things belong in the public domain and many productive uses are free. The mere fact that AI developers and users gain some value from training AI on pre-existing works, for example, need not render that activity unlawful free-riding; copyright owners are not thereby being denied the original value or enjoyment of their work to which they were otherwise entitled.

Another route into the copyright trap is assuming that copying is inherently wrong such that every unauthorized copy must be unlawful. The reality is that copyright’s focus on copies and copying is an ill fit for the digital age where copying is easy and virtually costless, and almost every consumptive activity in relation to a work involves at least background digital copying. The policy focus should shift away from the mere technicality of making copies to the dynamics of creation and distribution in the modern cultural marketplace. In the context of AI and TDM, a misplaced fixation on copying directs our attention to digital copies in training data sets that are never enjoyed or consumed by public audiences or recipients. Such copies are immaterial and their existence as “copies” ought not to be what shapes and confines the future development of artificial intelligence.

Finally, we risk running headlong into the copyright trap when we assume that the allocation of private copyright control holds the answer to creators’ economic struggles, empowering authors and artists to secure fair returns and future livelihoods. The trope of the starving artist has been dusted off and leveraged by market incumbents with the arrival of each new paradigm-shifting technology since the printing press. Sadly the reality is that copyright ill-serves the artists and creators that it purports to save. The intermediaries demanding their fair treatment are often the ones taking ownership of copyright from creators and extracting the bulk of royalty payments before passing along any remaining benefits. In the fast-moving debate over generative AI, the same dynamics are already apparent. It will be important for the government to assess the benefits that will actually flow to authors for any, e.g., collective licensing model or opt-in/out structure that it may consider putting in place. It seems highly unlikely that, given the vast numbers of works involved in ultimately producing AI outputs with relatively little economic value, a pro-rata micro-share of license fees could possibly make a significant difference in the economic lives of artists and authors.

There are many good reasons to be concerned about the rise of generative AI and the threats that it presents to human creatives, consumers, and the public domain. There are also good reasons to be wary of allowing copyright law to shape the future development of AI technologies. If we want to encourage the development of ethical and responsible AI, we ought to ask what kind of material and training data should be available to AI developers to help to advance that goal. Care should then be taken not to erect copyright barriers that could compound the risks of bias in AI and produce a lack of competition in the AI industry.

Writing about copyright and the Internet in her book ‘Digital Copyright’, Jessica Litman wrote about “The Art of Making Copyright Laws”. She observed that “the threat and promise of the Internet has induced those of us who are copyright lawyers to an act of breathtaking hubris. We define a set of rules that we say ought to be the basic copyright rules of the road, and then we construe those rules to govern every single way that information coded in electrons can move from one computer to another” p30.

As we contemplate the threat and promise of generative AI, I would caution Canada’s lawmakers away from similar copyright overreaching – copyright law is neither apposite nor equipped to govern the way that generative AI is developed, trained, deployed, or enjoyed. Insisting that it should do so, and imagining that it is up to the task, could do far more harm than good.

As stated in the IP Scholars’ 2021 submission, "copyright law should not create barriers to entry in the development and advancement of new innovations.” Whether it is restrictions to training or onerous permissions, payment or record-keeping requirements, such copyright barriers would impede AI-related research and development, effectively determining who can engage in it, and negatively impacting the quality and functionality of AI models that are poised to become a pervasive part of our professional, social, and cultural lives.

D

Digital Media Association (DiMA)

Technical Evidence

First and foremost, Digital Media Association (“DiMA”) appreciates this opportunity to contribute to the Canadian government’s consultation via the Ministry of Innovation, Science and Economic Development as it examines “Copyright in the Age of Generative Artificial Intelligence.” DiMA represents the leading audio streaming services and innovators– Amazon, Apple Music, Feed.fm, Pandora, Spotify, and YouTube. Together these services connect millions of fans across the nation and around the world with essentially the entire history of recorded music, providing unique listening experiences and constantly innovating to strengthen the connection between artists and fans. While DiMA’s member companies differ in size and business model, the revolutionary services they have built rely on the guidance provided by the current Canadian copyright laws that allow our members the flexibility that is critical to continue to innovate and develop global audio platforms that drive the recorded music industry’s revenues and have returned the music industry to growth. DiMA’s members are proud of the services they have built and their substantial efforts to provide legal access to music and reduce piracy while ensuring a dynamic and engaging experience for fans.

DiMA members encourage fans to legally engage with copyrighted content. Equally, DiMA members partner with rightsholders to protect that content online. Over the last decade, streaming has played a central role in the resurgence of the music industry and we urge extreme caution with regard to any legislation or recommendations from the government that may upend (even inadvertently) the reliability that existing copyright laws provide.

Audio streaming services launched in the mid-2010s in Canada. Since that time, the growth has been remarkable, with benefits to Canadian music fans, creators, and rightsholders.

Canada currently ranks eighth in the list of top countries in recorded music revenues, and in recent years streaming has played a larger and larger role in music consumption. (IFPI Global Music Report 2023, available at https://www.ifpi.org/wp-content/uploads/2020/03/Global_Music_Report_2023_State_of_the_Industry.pdf.)
In early 2021, there were over 9 million music streaming subscriptions in Canada, up from less than 8 million the year before. (Music Ally, Canadian music industry encouraged by 2020’s streaming growth, https://musically.com/2021/03/09/canadian-music-industry-encouraged-by-2020s-streaming-growth/)
According to the International Federation of the Phonographic Industry (IFPI), in 2022, streaming revenue in Canada increased by 10.1% year-over-year (and 18% year-over-year in 2021), continuing a trend of annual double-digit growth. (IFPI Global Music Report 2022; see also 2021 Study of the economic impacts of music streaming on the Canadian music industry, Prepared for Canadian Heritage by Wall Communications Inc., at Figure 6 (Jun. 2, 2021), available at 2021 Study of the economic impacts of music streaming on the Canadian music industry - Canada.ca.)

The state of the music industry in 2024 represents a true sea-change from the circ*mstances of just a decade ago, when the music industry was faced with declining revenues and struggling with how to get people to pay for music. DiMA’s members and their rightsholder partners changed that trajectory. “Over six in ten (63%) that have stopped illegally downloading music now use streaming services.” Russell Feldnman, Number of Britons illegally downloading music falls, YouGov (Aug. 2, 2018), available at https://yougov.co.uk/topics/arts/articles-reports/2018/08/02/number-britons-illegally-downloading-music-falls; see also Joost Poort et al., Global Online Piracy Study, 48, Univ. of Amsterdam (July 2018), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3224323 (in 13 countries studied by the University of Amsterdam, streaming is the most commonly used channel to access music, ahead of digital downloads and physical copies).

And such beneficial and robust voluntary partnerships between DiMA members and rightsholders have and should continue to be leveraged to work through AI-related concerns. Voluntary partnerships forged between services and rightsholders are the most direct and effective measures toward addressing the challenges raised by AI-generated content.

Further, DiMA continues to support industry-wide efforts to facilitate adequate identification or information regarding music via metadata, including for AI-related creations. The growing influx of music, along with historic metadata challenges, has led to persistent, challenges throughout the industry at-large to accurately identify works, creators, and rightsholders in order to efficiently and correctly process royalties at the speed of digital commerce.

In this regard, DiMA submits that the initial responsibility to identify the use of AI in music via metadata must rest with the rightsholders or owners of those works, who are in a position to know how they were created. Any subsequent obligation on distributors to include identifiers must require those identifiers to be built in before they reach the streaming services, in a way that services can ingest and deploy in their own systems. There is not a meaningful or practicable way for streaming services to identify these works on an individualized basis, let alone at scale, nor is it appropriate to make the service, rather than the source of the work, responsible for ascertaining how a work was made. Rather, the responsibility of streaming services is to ingest data provided to them and be a good steward of that data as it moves through the supply chain.

Whatever amendments (if any) to the Canadian Copyright Act result from the present consultations, these should not burden music streaming services nor create new obligations, for example with respect to monitoring the validity of AI labels or watermarking and/or ensuring compliance with new labelling rules related to AI-generated content.

Text and Data Mining

The digital music industry has long been vexed by data challenges. The growing influx of commercially released music, along with historic challenges around creating timely, standardized identification of works at the source and flowing that data through multiple stakeholder systems, has led to persistent challenges throughout the industry in accurately identifying works, creators, and rightsholders in order to efficiently and correctly process royalties at the speed of digital commerce. Whereas generative AI has the potential to be a force multiplier to help address those data challenges, new data requirements relating to AI-generated content may also confound these challenges if not carefully tailored.

There are two existing exceptions to copyright infringement that could potentially apply in the context of TDM activities: (1) the fair dealing exception for research in section 29; and (2) the exception for temporary reproductions for technological processes in section 30.71. Current uncertainty as to whether and to what extent these exceptions would apply to TDM activities could stifle innovation in the field of AI.

DiMA submits that any potential exception to copyright infringement for TDM activities should seek a balance between industry interests so as not to stifle innovations while also protecting certain creator rights. Consequently, in the instance that a TDM exception is created, it must:

with respect to the exception for temporary reproductions, permit reproductions that are entirely automated or that employ both computer automation and some human operations (such as labeling by humans of data in the works); and
ensure that any public domain work is not covered; in other words, cover only copyright-protected works.

Authorship and Ownership of Works Generated by AI

AI can now create content that is difficult to distinguish from content created by humans. Although the Copyright Act does not explicitly define the term “author,” Canadian copyright jurisprudence suggests that “authorship” must be attributed to a natural person who exercises skill and judgment in creating the work. This is in line with the approach in other jurisdictions, including the United States. DiMA supports an approach whereby copyright authorship is attributed to the person, natural or corporate, who makes arrangements for the creation of the work, which should be interpreted in a manner consistent with existing Canadian copyright law principles, such as the exercise of skill and judgment by a natural person.

DiMA believes that this approach adheres to the fundamental principle of technological neutrality, which provides that the Copyright Act should not be interpreted or applied to favour or discriminate against any particular form of technology. Further, this approach promotes certainty for the courts and market participants.

Infringement and Liability regarding AI

DiMA members believe that existing copyright laws, including copyrightability (originality, de minimis contributions, scenes a faire, and the idea/expression dichotomy); and infringement (questions of unlawful appropriation, substantial similarity, safe-harbors, and causation), are well-equipped to address novel issues raised by AI technology. Accordingly, DiMA asserts that new legislation to amend Canadian copyright law is neither necessary nor appropriate at this time. Existing legal doctrines as well as the use of contracts can and should be employed to consider liability questions arising from the creation and distribution of AI-generated music.

What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?

This is a highly technical question, and the answer may vary from one AI model to the next, and over time. In that context, there is generally no way for streaming services to know whether any given sound recording or musical work was created with the use of AI, let alone whether such use resulted in copyright infringement. Streaming services provide fans access to millions of songs. They must not be made the arbiters of the legal status of music delivered to them by third parties, whether due to the use of AI or otherwise.

Moreover, the use of a particular streaming service by a third party to train an AI model likewise should not lead to a finding of liability for the streaming service in the event the AI model is found to have engaged in infringing conduct. There is often no way for streaming services to know whether AI is training from their platforms, or how to block any such use at a technical level. While services work to detect and prevent uses that violate their terms of use, services must not be held liable for the actions of third parties. Instead, existing Canadian copyright law and precedent should be the default framework with which to analyze these complex, fact-specific questions.

Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

To the extent the outputs of a generative AI model are found to be infringing, such findings must not create direct or secondary liability for third parties who distribute such materials in good faith. In the case of music streaming, services should not be held liable for merely distributing the music delivered to them by their label and digital aggregator partners to fans. DiMA’s members must not be tasked with being the arbiter of what is or is not created by or using AI, or being the enforcer of AI-related mandates that are reliant on the action or inaction of others, for example, regarding data labeling.

DiMA opposes amendments to the Copyright Act that would unduly burden music streaming services by creating new obligations or liability for streaming services and other intermediary stakeholders in the music supply chain, where such liability should rest with the creator of the content itself. In particular, DiMA opposes the creation or expansion of new notice-and-takedown rules that would expand liability or impose monitoring obligations on streaming services for AI-related activity. DiMA’s members must not be required to proactively determine which of the millions of songs on their services were created using AI, and which of those songs may represent an act of infringement. To do so would, among other concerns, generate costs that could degrade the consumer music streaming experience to the detriment of all stakeholders in the music ecosystem, as well as raise concerns about the technological neutrality of the music streaming in Canada.

Comments and Suggestions

Music streaming is a consumer-driven industry. Any public policy around AI must not undermine consumer choice, for instance by increasing the barriers to entry into the market, or by otherwise limiting access to music that fans legally wish to hear, including if such music is partially or wholly AI-generated.
Canadian copyright law is – and must remain – technology neutral. Technological neutrality has served all stakeholders in the copyright ecosystem well for decades and has allowed for constant innovation without negatively impacting the creative and economic interests of rightsholders and users.

Directors Guild of Canada

Technical Evidence

1. The Directors Guild of Canada (DGC) appreciates the opportunity to submit comments and recommendations as part of the public consultation on Copyright in the Age of Generative Artificial Intelligence, organized by the Department of Innovation, Science, and Economic Development (ISED). In 2023, Artificial Intelligence has become a focal point of interest for the DGC, as the audiovisual industry undergoes rapid disruption due to the widespread use of various forms of artificial intelligence, particularly Generative AI. This disruption is exemplified by the role played by AI in the recent dual strikes by the Writers Guild of America and SAG-AFTRA in Hollywood.

2. The DGC is a national labour organization that represents key creative and logistical personnel in the film, television and digital media industries. It was created in 1962 as an association of Canada’s film and television directors. Today, it has over 7,000 members drawn from 47 different craft and occupational categories covering all areas of direction, production, editing and design of screen-based programming in Canada.

3. In a recent survey of national DGC members, respondents described their relationship with AI tools and expressed concerns about the potential harms and risks of AI in relation to their jobs. More than 85% of members indicated that AI should be considered of high or extremely high importance in the list of DGC advocacy priorities.

4. Seventy-two percent of DGC members are familiar with the concept of Generative AI and machine learning but very few admit to having partial or solid AI skills. Looking ahead one to five years, 56% of members are concerned about the possibility of their expertise and skills being replaced by AI.

5. When it comes to the use of Generative AI, usage varies depending on the DGC Caucus and work category. Members already report a large positive impact of AI on their creative, logistical, and administrative work. It should be noted that Generative AI is generally used as a tool to augment productivity, and members see the value of enhanced tasks, whether they are creative (for instance the creation of a mood board or inspirational visuals for a director) or logistical (a text-to-text Generative AI used by an assistant director for example).

6. For directors’ authorship rights specifically, the risks are multiple: widespread text and data mining, including to train and deploy AI models, have the potential of robbing their creative and economic rights, and the lack of clarity regarding the attribution of copyright when using a Generative AI tool also endangers their economic prospects.

7. For the DGC membership and the screen industry at large, it is rather difficult to predict how AI systems might transform jobs and work routines or affect creative rights. Considering this, the DGC believes that the Government should exercise restraint before having a clear understanding on the impact of TDM on the existing rights and production ecosystem.

8. The DGC considers the copyright-related questions of the Government consultation as being equally relevant to both business and cultural concerns. This is an opportunity to consider how the rules around AI can serve creators and the society in general. In DGC’s view, it is essential that future AI developments preserve and augment human creativity.

Text and Data Mining

The creative industries need clarity around Copyright and Text and Data Mining to operate

1. The present consultation on Copyright and Generative Artificial Intelligence is an opportunity for the Government to provide further clarifications and safeguards to prevent text and data mining (TDM) to further disrupt the existing copyright creative/production ecosystem. The screen production industry, in Canada as well as in other key jurisdictions, including the United States, operates on the clarity of copyright ownership and market-based licensing and remuneration mechanisms.

2. We note the consultation asks “how and when rightsholders could or should be compensated for the use of copyright-protected content as inputs in the development of AI”. In our view this question omits the fundamental consideration as to whether copyright owners have the right to authorize the uses of their works for TDM purposes. This issue is not solely about remuneration; it is whether creators should be able to control whether their works are copied and used for TDM purposes in the first place. We believe they do and should have that right and this technologically neutral right needs to be preserved with reference to TDM activities.

3. Common industry practice with regards to agreements and contracts has developed in such a way that the creator, producer, studio, broadcaster or distributor participate in a production with the assurance of having assigned or secured the rights necessary to exploit it. Licensing and the orderly exploitation of rights is at the centre of the film and television industry. These rights need to be preserved in order to generate revenues, ensure investments are recouped and profits are realized and to ensure continued employment by the creative sector in Canada. These rights also need to be protected, especially when it comes to TDM.

4. Therefore, content creators and audiovisual authors need complete clarity around copyright and TDM. A modern copyright system should continue to protect creators, authors and copyright owners among new AI developments. As a first step, ensuring that AI developers provide more transparency around TDM in order to allow creators to provide consent and be rightfully compensated is one of the most urgent DGC priorities with respect to AI alongside authorship and ownership, which are both part of a general copyright policy centered around the creator.

5. The TDM process used in the training of Generative AI systems involves the cloning of large swaths of copyrighted works. While Generative AI companies now are realizing that they need licenses to operate legally and are starting to negotiate licenses , in the absence of transparency and legal clarity, such licensing may be inhibited which will result in no compensation to rightsholders. Moreover, the relationship between AI companies, TDM activities and rightsholders is not clearly defined yet in Canada and many other similar sized jurisdictions.

6. Both the Generative AI “inputs” and “outputs” also threaten human creation and creators’ livelihoods. When using copyrighted works without consent, Large Language Models (LLMs) risk devaluating human creation. TDM then represents a form of “copyright laundering”, as there is not a clear understanding of the processes at work with machine learning. The public and Government need to understand the different steps when datasets are being treated by LLMs. TDM seemingly converts stolen or copyrighted data so that it can be later sold or used by ostensibly legitimate Generative AI tools.

7. The operators of LLMs claim that their models do not contain copies of works ingested for training purposes. However, that is not the case. Recent research confirms that LLMs are capable of generating outputs that are exact copies of the input data. As such, the contention that only “logical” or “semantic” information is extracted from ingested works simply cannot withstand scrutiny. The operators of AI tools also argue that their uses of works are “transformative”. But the reality is that Generative AI tools do not transform but instead exploit the entirety of the databases they mine.

8. While DGC directors and other members are protected by collective agreements entered with producers, they have no such protection in the case of new players such as AI companies. These players are not bound to DGC contract provisions, and the TDM can potentially harm directors’ artistic expression and remuneration.

9. The DGC believes that there is no compelling argument to create new exemptions for TDM and to do so would adversely affect the copyright balance that promotes innovation in our industry. The current Copyright Act is interpreted to be technology-neutral and is adaptable to technological developments. The balance in the Act both promotes innovation and enables rightsholders to license and exploit their works and be rewarded for their creative efforts.

Recommendations:

10. Should the Canadian government consider a TDM exception, it should include the criteria identified above and be narrowly crafted. The exception should only apply where the work is accessed lawfully by the beneficiary and, insofar as the rightsholders have not reserved in an appropriate manner the rights to make reproductions and extractions for text and data mining - including through agreements and licenses, machine readable means, or through the use of technical measures.

• Therefore, the DGC recommends that the Government should not amend existing exceptions or introduce new ones for text-and-data-mining.

11. The list of copyrighted content used to train AI tools should be made available to the rightsholders and users. Additionally, the content output produced by Generative AI tools should be identified and marked, such as with watermarks, to separate synthetic creations (in part or as a whole) from human ones. Rightsholders should also have the legal right to make demands of any AI company that offers services in Canada, for a list of works used to train the AI model, or at the very least, whether specific identified works owned or licensed by a rightsholder have been used to train the AI model. There should also be a right to require the AI entity to conduct and publish independent audits that include relevant information related to the works used to train the AI models and whether the works remain stored in the models (in any material form), and whether they are capable of generating outputs that reproduce all or any substantial parts of ingested works.

• The DGC recommends that the Government create an obligation to respect Canadian Copyright law within Canada’s Artificial Intelligence Act (AIDA).

12. The proposed amendments to Canada’s draft Artificial Intelligence Act (AIDA) include measures that must be taken before a general-purpose AI system can be made available or changed. The DGC recommends that, in addition to the requirements already included in the proposed amendments, that AIDA expressly include a requirement on persons who make available or manage a general-purpose system to:

(a) Put in place and follow a policy to respect Canadian copyright law. This should apply to any TDM exception should one be enacted. These obligations should apply regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of these systems take place. This is necessary to ensure a level playing field among providers of general-purpose AI systems where no provider should be able to gain a competitive advantage in the Canadian market by applying lower copyright standards than those provided in Canada. This is consistent with the draft EU Artificial Intelligence Act.

(b) If the system generates digital output consisting of text, images or audio or video content, a detailed plain-language description has been prepared and published of the copyright works and other subject matter that have been used to train the system in accordance with the regulations.

13. We note that Bill C-27, before Parliament, is intended to regulate certain high impact AI systems. The DGC submits that any AI system that is developed or deployed in Canada and that uses copyright works for its training data should be subject to regulation under AIDA. The definition of “harm” under AIDA should include any wide scale unauthorized use of works for training AI systems.

• The DGC recommends that the Government should require more transparency from AI companies regarding their text and data mining activities, such as making available public registries of the data used to train the datasets.

Authorship and Ownership of Works Generated by AI

36. The Consultation questions regarding authorship and ownership of copyright in AI-generated works reflect the current ongoing debates taking place in many jurisdictions: can the copyright be attributed to AI-generated works? In the United States and in European jurisdictions, the global trend is to defend human authorship as a central principle over machines. The DGC is concerned that the lingering uncertainty surrounding authorship and AI might negatively affect creators using Generative AI tools and their likes.

37. The DGC is supportive of a comprehensive copyright framework that continues to provide predictability and fair remuneration in the film and television industry for Canadian directors and creators – including the ability to negotiate for such rights on any future platform, including AI entities.

38. The DGC has collective agreements already in place with producers’ associations that recognize directors’ rights. Under the Canadian Copyright Act and DGC’s collective agreement, DGC directors are entitled to authorial rights, including moral and economic rights. However, Canadian directors and writers have no recourse under their respective bargained collective agreements to protect credit on their work if an AI entity, which has no contractual relationship with the Guilds, scrape their work for text and data mining.

Canadian authorship attribution parameters

39. The Copyright Act attributes authorship to an individual human author and considers the originality of a work to be the determining parameter to award copyright. Consequently, the authorial rights should continue to be attributed to human directors, even when AI tools are being used, as long as there is sufficient original human involvement in the process of creation. This position has already been validated with the US Copyright Office ruling of May 2023 that only protects the product of human creativity.

40. A main purpose of the Copyright Act is to reward the authors for their creative labours and to incentivise creation by providing them with exclusive rights. In its current state, the Act protects the creator’s status, including when AI is being used to assist in the creative process. This provides a sound framework for protection.

41. DGC opposes giving protection for synthetic content produced solely using Generative AI tools. This would effectively permit individuals that use Generative AI tools to obtain copyrights derived from the creative labours of our members and this synthetic content would then be able to compete with the original content of our members. This is in line with a recent US court ruling that says that AI computer generated art cannot be protected by copyright.

42. Earlier this year, the Directors Guild of America (DGA) negotiated unprecedented AI protections in their new collective agreement with the U.S. studios. These new protections require that the work and duties of directors must be done by a person, excluding recognizing Generative AI as a person in the contract. The DGA agreement also underlines the importance of director decision-making with regard to the use of Generative AI and other AI technologies in its work.

43. In the face of the rapid development of AI technologies, and as exemplified by the recent strikes from U.S. unions and guilds, collective agreements today represent the most appropriate and quickest way to address questions related to authorship and protect human creativity. Canada’s Copyright Act is technology-agnostic and should be protecting authors and creators’ moral rights and all exclusive rights when being used by AI companies.

44. Despite our view that copyright can only subsist in works if they are original and result from the exercise of skill and judgment of individuals, because of the importance of the issue, we recommend that the Government clarify that the originality required for copyright subsistence must be the skill and judgment of human beings

DGC Recommendation

Recommendation 3: The DGC recommends that the Government make an amendment to the Copyright Act to confirm that the originality requirement in the Act is solely the skill and judgment of human beings.

We also note that the Consultation Paper refers to the approaches in the U.K., Ireland and New Zealand, whereby “in the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”

The DGC opposes any amendment that would deem authorship to “the person by whom the arrangements necessary for the creation of the work are undertaken.” As noted above, under the current law, directors are considered the author of film and television productions. Any such amendment risks introducing a major change in the law that could give producers copyright in film and television productions. Further, where the productions are based in part on the use of Generative AI, it risks giving producers at least some copyright interest in works that under the current law would vest all copyrights in the director.

DGC Recommendation

Recommendation 4: The DGC recommends that the Government not amend the law to deem authorship in a computer-generated work to be in the person by whom the arrangements necessary for the creation of the work are undertaken.

Infringement and Liability regarding AI

The DGC believes that the liability rules related to copyright infringement are well established. The DGC believes that existing laws related to direct infringement (including infringement by reproduction and authorization) and accessorial liability (including inducing, procuring and acting in concert), and defenses to infringement such as fair dealing are sufficient for the present time, subject to the following.

There is currently uncertainty as whether outputs that are derived from copyright works but that do not reproduce all or a substantial part of a work are infringing. This is particularly problematic because many outputs of AI systems are generated based on the uses of many works. The outputs may be in the “style” of an author or otherwise appropriate small portions from many works. Thus, deployers AI systems may be able to evade copyright infringement in their outputs by denying that any particular output is a reproduction of any particular work. This situation could also risk negatively influencing a fair dealing analysis of whether training of AI systems using unlicensed content is infringing, as courts will likely focus on many factors including whether the outputs of Generative AI systems are infringing.

Based on the foregoing, we urgently recommend that the Government study how to protect copyright owners against the generation of outputs that are based upon or derived from the uses of copyright content even though individual outputs may not be infringing. This should include consideration of the need to expand exclusive rights, ensure that a fair dealing analysis is not tainted by a gap in the law that does not recognize collective contributions of authors to the generation of computer outputs, and how to ensure that authors can obtain compensation for this derived synthetic content.

We note that France’s National Assembly Bill aimed at regulating artificial intelligence by copyright would establish a mechanism to provide for compensation to authors for all computer-generated content whose origin cannot be determined.

DGC Recommendation

Recommendation 5: The DGC recommends that the Government study how to protect copyright owners against the generation of outputs that are based upon or derived from the uses of copyright content even though the individual outputs may not be infringing.

Comments and Suggestions

N/A

Annex

Annex: Detailed Questions