How Strong Policies Can Seal AI Gaps

As AI outpaces copyright laws and regulatory guidelines, researchers offer 3 proposals to keep tech in check

MIT Initiative on the Digital Economy
Jan 29, 2024


By Peter Krass

Artificial intelligence is advancing so quickly that it threatens to outpace the U.S. laws and regulations that might govern the technology.

But policymakers and lawmakers, take note: Those looking to pick up the pace have three new recommendations from researchers at the MIT Initiative on the Digital Economy (IDE) to guide them through the thorny thicket of AI advancements.

1. Unfair ‘Fair Use’

One hot topic is whether generative AI, the technology’s latest iteration, inherently violates copyright.

GenAI builds its models by training on vast amounts of existing content, including books, photographs, music, and videos, and then generating new material from the patterns it learns. But debates are heating up about whether this process violates the copyrights of the content’s creators.
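As a toy illustration of that train-then-generate pattern (vastly simplified compared with a real large language model, and with made-up training text), the sketch below counts which word follows which in a snippet of existing prose and then samples new word sequences from those counts. Real GenAI systems learn far richer statistical patterns, but the basic dependence on existing content is the same.

```python
# Toy illustration only: a bigram text generator, not a real generative AI model.
# It "trains" by counting which word follows which in existing text, then samples
# new sequences from those counts, a crude analogue of learning from content.
import random
from collections import defaultdict

# Made-up "existing content" standing in for a training corpus.
training_text = (
    "the reporter filed the story and the editor read the story "
    "and the editor published the story in the morning edition"
)

# Training: record every observed word-to-next-word transition.
transitions = defaultdict(list)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    transitions[current].append(nxt)

# Generation: produce a new word sequence from the learned transitions.
random.seed(7)
word = "the"
output = [word]
for _ in range(12):
    if word not in transitions:
        break
    word = random.choice(transitions[word])
    output.append(word)

print(" ".join(output))
```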

As Sinan Aral, director of the IDE, pointed out on a recent Yahoo! Finance broadcast, much of the debate comes down to how “fair use” is defined by the courts. Traditionally, a certain amount of copyrighted material could be used without the copyright holder’s permission in order to encourage the exchange of ideas. That’s fair use.

Less clear is how “a certain amount” gets defined. Historically, it’s depended on why the content is being used, the nature of the copyrighted work, the amount used, and the effect (if any) on the copyrighted work’s value.

The issue is at the heart of several recent lawsuits. In one notable case, The New York Times sued OpenAI, the creator of ChatGPT, and Microsoft, accusing the two companies of unlawfully using the Times’ work to create their AI products.

As Aral pointed out, the Times lawsuit casts fair use in a new light. “Training very large language models on millions of pieces of content by The New York Times is a new use. In other words, it hasn’t been considered in the past as being a traditional use of copyrighted material,” Aral explained. “Therefore, we have not decided as a society whether this is fair use or not.”

A lot is at stake, and Aral said the lawsuit’s implications could be broad and far-reaching. “If the courts or a settlement determine that payments [are needed] to the original producers of the training data, that could increase the costs for AI, generative AI companies, and generative AI startups,” he said. It “would make a big difference for how the industry runs.”

2. Labels to Disable Fake News

AI technology is also outrunning the laws and regulations that aim to fight fake news and misinformation. AI’s potential for misuse, mischief, and outright crime is enormous:

· “Deepfake” photographs can violate the privacy of individuals.

· Fake news on social media can mislead voters.

· AI “hallucinations” — instances where the technology generates false information, but presents it as true — can create confusion, even mayhem. (Just ask the New York attorney who used ChatGPT to research case law citations, only to discover, during trial, that none of the citations actually existed.)

One solution may be to use warning labels to alert users to the presence of AI-generated media that may or may not be accurate.

That’s the focus of a recent policy brief co-authored by David Rand, a professor at the MIT Sloan School of Management and the lead of the IDE’s Misinformation & Fake News research group.

The policy brief, published in late November, was released as part of a series of papers from an MIT ad hoc committee on AI governance. Other papers produced by committee members describe an overall framework for ensuring a safe and thriving AI sector; a proposal for “pro-worker AI”; and recommendations for oversight of large language models (LLMs).

How could labeling be enforced? Rand and his co-authors support the U.S. AI Labeling Act of 2023. If passed into law, the act would require developers of GenAI systems to include “clear and conspicuous” disclosures of content developed by AI.

Rand also proposes a set of guidelines with two main goals: Explain the technical processes through which a piece of content was generated or modified, and assess the content’s impact and potential for harm.
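To make the idea concrete, here is a minimal sketch, not taken from the policy brief or from the AI Labeling Act, of how such a disclosure might travel with a piece of generated content as structured metadata. Every field and function name below is invented for illustration; the two descriptive fields loosely mirror Rand’s two goals of documenting how the content was produced and assessing its potential for harm.

```python
# Hypothetical sketch: attaching a disclosure label to AI-generated media.
# Field names and values are illustrative only; they do not reflect any real
# labeling standard or the requirements of the AI Labeling Act.
from dataclasses import dataclass, asdict
import json


@dataclass
class AIContentLabel:
    generated_by_ai: bool   # the "clear and conspicuous" disclosure itself
    generator: str          # which model or system produced the content
    process: str            # how it was generated or modified (first goal)
    harm_assessment: str    # estimated potential for harm (second goal)


def label_content(metadata: dict, label: AIContentLabel) -> dict:
    """Return a copy of the content's metadata with the disclosure attached."""
    tagged = dict(metadata)
    tagged["ai_disclosure"] = asdict(label)
    return tagged


if __name__ == "__main__":
    photo_metadata = {"title": "City skyline at dusk", "format": "image/png"}
    label = AIContentLabel(
        generated_by_ai=True,
        generator="example-image-model",
        process="fully synthetic; generated from a text prompt",
        harm_assessment="low; no real people or events depicted",
    )
    print(json.dumps(label_content(photo_metadata, label), indent=2))
```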

In their policy brief, Labeling AI-Generated Content, the researchers also call for a framework that’s applicable across the entire technology industry. Otherwise, they warn, we risk creating a fragmented or unreliable labeling system that “could engender mistrust and further blur the lines between reality and fiction.”

3. Restraining the Trainers

Regulating corporate giants and the fear of monopolies are other vexing issues. AI innovation could be stifled if the models used to train AI systems are dominated by just one or two corporate giants. And that worries Alex “Sandy” Pentland, director of both the MIT Human Dynamics Lab and the MIT Media Lab Entrepreneurship Program. He also leads the IDE’s Building a Distributed Economy research group.

Pentland co-authored a working paper last year with recommendations to ensure that competition in the GenAI training space remains open and dynamic.


Entitled Competition Between AI Models, the MIT working paper argues that AI training competition — along with the innovation it drives — is at risk. In the worst-case scenario, the authors warn, the entire training sector could consolidate around a handful of strategic, but not necessarily highly innovative, players.

To prevent this and to encourage collaboration on training data, Pentland and co-author Thibault Schrepel, an associate professor of law & technology at Vrije Universiteit Amsterdam, offer five recommendations for encouraging competition and innovation among AI foundation models:

· Ensure new rules and standards for GenAI foundation models are enacted only after the publication of an impact assessment. This assessment would document whether the rules and standards would be likely to lead to monopoly power.

· Give these impact assessments to regulators with specific AI expertise, who would then coordinate with a new, informal council made up of members from different regulatory agencies.

· Create exemptions from antitrust rules for open-source and open-access companies in order to accelerate R&D. Also, let these companies pool their resources in joint ventures without having to notify antitrust agencies.

· Level the playing field for all players in the general-purpose foundation model ecosystem. Once that’s established, permit foundation models to be trained on data that is publicly available, proprietary, and personal. If foundation models could train only on nonproprietary data, that would favor big players able to pay gigantic licensing fees. The same is true for personal data; smaller companies tend to hold less personal data than their large counterparts. Making this exception to data-ownership and privacy laws would encourage competition.

· Implement complex adaptive regulations (CAR). These regulations can be adapted over time, as needed, to protect innovation.

Pentland led sessions at the recent World Economic Forum conference in Davos, Switzerland. The sessions were presented by Imagination in Action, a group co-sponsored by Tata Consultancy Services and MIT that brings together business leaders and academic researchers. On January 17, Pentland co-led sessions on “Navigating the GenAI Revolution,” “AI Enhanced Financial Security,” “Better Decision Making: The Real Value of AI,” and “GenAI: Navigating New Frontiers in Politics.”

Peter Krass is a contributing writer and editor to the MIT IDE.

Additional reading on AI content labeling: https://medium.com/mit-initiative-on-the-digital-economy/how-should-ai-generated-content-be-labeled-ba76a0f08628
