Talk Gen AI: Navigating the Legal Landscape

Arte Merritt
Published in TalkGenAI
Jun 13, 2024

As adoption of Generative AI continues to grow, new questions and concerns arise from a legal perspective. How enterprises navigate the legal landscape is one of the hottest topics in generative AI.

We were fortunate to have Barath Chari, a Partner at Wilson Sonsini, at Talk Gen AI to discuss some of the key issues facing companies exploring Generative AI. Wilson Sonsini represents more than half of the top 50 AI companies, including OpenAI, Anthropic, Google, and Stability AI.

Read some of the highlights from Chari’s presentation and watch the video below.

Data rights

To build a generative AI model, you need a lot of data.

When obtaining data, it is important to think about what rights you have in the data and to ensure that you are using it in a compliant way, as Chari points out.

One place companies tend to go for high-quality data is academic universities, i.e. university data sets. While this data is often free to access, it often carries noncommercial-use limitations or other restrictions.

Another common option is open-source data. People think open source means free and that they can do whatever they want with it, but that is not true. As Chari explains, you have to understand what restrictions apply to that use and make sure the license comports with your use case.

The other option folks tend to use is web scraping. As Chari explains, all the foundational models at their core were built on scraping the Internet, and they have all been sued.

Web scraping is a high-risk endeavor. First, there is the risk of copyright infringement. Second, there are potential breach-of-contract claims under the website’s terms of service. Third, there is a “trespass to chattels” claim, based on an old common-law action for interfering with someone’s personal property, such as their horse, and causing it harm. On the Internet, it means that if you place a lot of strain on someone’s server, and the server cannot respond as fast as it should, the site owner could claim you trespassed on their chattel, i.e. their server.
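On the server-strain point specifically, the usual technical mitigation is simply to scrape politely. Below is a minimal sketch in Python (standard library only; the base URL, paths, and user-agent string are hypothetical placeholders) of a crawler that checks robots.txt and rate-limits its requests. It does nothing to address the copyright or contract risks above; it only reduces the load placed on the target server.

```python
import time
import urllib.error
import urllib.request
import urllib.robotparser

# Hypothetical values, for illustration only.
BASE_URL = "https://example.com"
USER_AGENT = "polite-research-bot/0.1"

# Load the site's robots.txt; RobotFileParser treats a missing file as "allow all".
robots = urllib.robotparser.RobotFileParser()
robots.set_url(BASE_URL + "/robots.txt")
robots.read()

def fetch(path: str):
    """Fetch a page only if robots.txt permits it for our user agent."""
    url = BASE_URL + path
    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect the site's crawl rules
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

for path in ["/", "/about"]:  # placeholder paths
    try:
        page = fetch(path)
    except urllib.error.HTTPError:
        page = None  # skip pages that return an error status
    time.sleep(2)  # pause between requests so the crawl does not strain the server
```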

Fair use

What about “fair use”? As Chari states, this is the billion-dollar question. Is what all the model providers have done with scraping fair use?

It is complicated. As Chari explains, “fair use” is not a license. It is not a right to train. It is a defense to an infringement claim.

If you are sued for infringement and raise “fair use” as a defense, you will most likely go through a long trial to try to prove it. For example, Google and Oracle spent more than 10 years in the courts over fair use claims. The issue even went in front of the Supreme Court, and the broader question is still not clear.

Using the “fair use” argument is not a stable foundation on which to build an industry, but it is what we have right now, as Chari points out.

Open vs closed models

Chari is seeing more companies now working on top of existing models, rather than trying to build their own. They use a foundational model, such as an LLM or LAM, or build RAG (retrieval-augmented generation) on top of a model.

This is leading to an interesting dynamic with two ecosystems being built around open or closed models.

It also opens up a potential antitrust angle related to “open early, closed late”: a model starts out open, but after it becomes entrenched, it switches to closed. It is a competition concern, as Chari explains, because the market becomes reliant on the model while it is open, and is then cut off when access is closed.

When building on top of another model, it is important to understand the model’s terms. You also need to look at those terms from both a strategic and a pragmatic perspective. Are you going to attempt to negotiate with the provider? Which terms are really important to negotiate over?

IP ownership

There are two core parts to intellectual property. One is IP rights, which are intangible things like trademarks, patents, copyrights, and trade secrets. The second is what those rights attach to: tangible items of technology, such as software, a Xerox machine, or a computer.

Patents and trademarks are purely legal rights, i.e. the exclusive right to do something. Copyrights and trade secrets, by contrast, are tied to an actual thing, e.g. the copyright to software or to a book.

It is important to keep this in mind as it relates to Gen AI.

For instance, AI-generated output is not copyrightable; to be copyrightable, something has to have been made by a human. Similarly, while AI-assisted inventions are not categorically unpatentable, purely AI-generated output is unpatentable, because an invention has to come from a human inventor.

If you are using AI as part of your process, it is important to know what your role is in that work, and how you can file for something that gets protection.

Trade secrets are a valuable thing that some entrepreneurs may not appreciate. As Chari points out, of the hundreds of startups he works with, hardly any have filed for a patent or copyright, but they do keep their source code secret. He adds that, from a practical perspective, if you use AI to generate code, the terms make clear that the outputs belong to you and are for your use only, and you keep those outputs secret and share them only under NDA, you still have a valuable proprietary right in the source code.

Legal risks

Copyright infringement

Copyright infringement is a key area to keep in mind as it relates to Gen AI outputs.

Copyright owners have the exclusive right to reproduce the work, create derivatives, and distribute copies, and in the case of performance art or music, the right to public performance.

If a model provider scrapes the web to make copies of your copyrighted works in order to train the model, you can claim your exclusive right to make copies was infringed.

What becomes challenging for the model provider is this: if you ask the model to do something in the style of a famous work and get a result that looks a lot like that work, it is hard for the provider to explain how it did that without having copies of the original copyrighted work.

Breach of contract

If you are using code generation tools, there is potential for a breach of contract claim. As Chari explains, there is a case against GitHub and Microsoft, who built GitHub Copilot using the software code hosted on GitHub, much of which is open source. However, the outputs from Copilot do not give attribution to the open-source authors, which is a violation of the open-source license terms.

Regulations

There are new regulations coming that are important to keep in mind.

In the US, at the federal level, the Biden administration put out an Executive Order on AI, given concerns about national security, consumer harms, privacy, and other issues. The Executive Order still has to be implemented, though, and a number of rules need to be passed. States are acting on their own, but those initiatives tend to be more privacy-focused for now.

In the European Union (EU), the EU AI Act passed on May 21st. However, it still needs to be implemented. It creates a risk-based framework for AI models, categorizing them by risk level: unacceptable, high, limited, and minimal. There will be severe penalties if you do not adhere to the regulations, i.e. up to 7% of annual turnover or 35M euros, whichever is higher.

These regulations will most likely start to kick in towards the end of the year.

Watch the video

See the full video to learn even more about the legal aspects of Generative AI.

Arte Merritt is the founder of Reconify, an analytics and optimization platform for Generative AI. Previously, he led the Global Conversational AI partner initiative at AWS. He was the founder and CEO of the leading analytics platform for Conversational AI, leading the company to 20,000 customers, 90B messages processed, and multiple acquisition offers. He is a frequent author and speaker on Generative AI and Conversational AI. Arte is an MIT alum.
