The Landscape of Open Source Licensing in AI: A Primer on LLMs and Vector Databases

Zilliz
6 min read · Apr 28, 2024

Open-source software has been common in the technology industry for decades. However, developers and businesses still need more clarity about the implications and restrictions of different licenses. This guide demystifies open-source licensing as it applies specifically to AI technology, such as vector databases and large language models (LLMs).

Open source means that the creator makes the source of software, hardware designs, or even a large language model available to the community for free, under a license that permits use, modification, and redistribution. These projects are often developed and maintained by community efforts, typically involving collaboration by developers from many different companies. The license under which a project is provided governs how it can be used.

Unexpected changes to a software project’s open-source license can significantly harm companies that have built offerings around that software. This risk adds complexity and underscores the importance of understanding open-source licensing.

Benefits of Open Source Vector DBs and LLMs

Vector Databases

Open-source vector databases like Milvus (provided under Apache License 2.0) benefit the AI ecosystem in several ways. Because they are freely available, developers can rapidly prototype solutions while minimizing the costs of building new applications. Because the code base is open and accessible, developers and businesses can review how it works in detail to make sure it aligns with their plans and standards. This increases trust and confidence and helps users decide how to implement it within or alongside other applications. Finally, Milvus, like other open-source vector databases, was developed in partnership between its creator, Zilliz, and the broader Milvus user community. This has allowed everyone to benefit from the shared development and expertise of other organizations like NVIDIA, IBM, Salesforce, and others.

Large Language Models

Open-source large language models (LLMs) have seen a dramatic increase in availability and adoption in the last year. Proprietary LLMs, such as OpenAI’s GPT models, are owned exclusively by a company and are accessible only to customers who purchase a license. Such licenses often limit how the LLM can be used. In contrast, open-source LLMs are freely accessible, and their licenses permit use for any purpose as well as modification and distribution.

With LLMs, the open source component pertains to the accessibility of the LLM’s code and foundational structure. This accessibility grants any developer or researcher the freedom to use, enhance, or modify the model. This openness reduces long-term costs for developers looking to build solutions that leverage the power of LLMs, especially for organizations without in-house model development and machine learning talent. Open-source LLMs can also be deployed within a company’s own data infrastructure, which reduces the risk of exposing private data to a model controlled by an external, or even competing, company. Because open-source LLMs can be modified, they can be tuned, optimized, and enhanced for an application’s specific use case. Finally, the open code base increases trust and transparency by allowing developers and data scientists to review the model’s construction and training in detail.

The Spectrum of Open Source Licenses

Open-source licenses come in various types, each with its own set of permissions, restrictions, and requirements. It’s important for developers and users to understand the implications of each type of license to ensure compliance with the terms and conditions set forth by the license.

Here are some of the common types:

Permissive Licenses

Permissive licenses give users extensive freedom to use, modify, and distribute the software without many restrictions. Examples include:

  • MIT License: Allows almost unrestricted use, modification, and distribution with minimal requirements.
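  • BSD License: Similar to the MIT License, it permits almost unrestricted use; the 3-clause variant adds a condition against using the authors’ names to endorse derived products.
  • Apache License: Permits the use, modification, and distribution of the software, provided that license and attribution notices are preserved and significant changes are stated. Version 2.0 also includes an explicit patent grant.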

Copyleft Licenses

These licenses require that modified or derived works also be distributed under the same license terms as the original software. Examples include:

  • GNU General Public License (GPL): Requires that any derivative work is distributed under the same GPL terms, ensuring that modifications remain open source.
  • GNU Affero General Public License (AGPL): An extension of the GPL designed for network/server software, requiring that source code also be made available to users who interact with the software over a network.

Weak Copyleft Licenses

These licenses require that only the modified parts of the software be distributed under the same license terms as the original software. Examples include:

  • GNU Lesser General Public License (LGPL): A variant of the GPL that allows linking with non-GPL software under certain conditions.
  • Mozilla Public License (MPL): Applies copyleft at the file level. Modified MPL-covered files must stay under the MPL, but they can be combined with code under other licenses.

Non-Commercial Licenses

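These licenses restrict the use of the software for commercial purposes. Because of that restriction, they do not meet the Open Source Definition maintained by the OSI. Examples include:

  • Creative Commons Attribution-NonCommercial License (CC BY-NC): Permits non-commercial use, modification, and distribution of creative works.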

Public Domain

Some developers choose to release their work into the public domain, effectively relinquishing all rights to the work. Users can freely use, modify, and distribute the software without any restrictions.
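The categories above can be sketched as a small lookup table that a team might use to flag dependencies for legal review. This is a minimal illustration, not legal advice: the keys are SPDX license identifiers, the names `LicenseCategory`, `categorize`, and `needs_review` are hypothetical, and the grouping follows the common convention of treating the GPL and AGPL as strong copyleft and the LGPL and MPL as weak copyleft.

```python
from enum import Enum


class LicenseCategory(Enum):
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copyleft"
    STRONG_COPYLEFT = "strong copyleft"
    NON_COMMERCIAL = "non-commercial"
    PUBLIC_DOMAIN = "public domain"
    UNKNOWN = "unknown"


# Illustrative, non-exhaustive mapping from SPDX identifiers to categories.
LICENSE_CATEGORIES = {
    "MIT": LicenseCategory.PERMISSIVE,
    "BSD-3-Clause": LicenseCategory.PERMISSIVE,
    "Apache-2.0": LicenseCategory.PERMISSIVE,
    "LGPL-3.0-only": LicenseCategory.WEAK_COPYLEFT,
    "MPL-2.0": LicenseCategory.WEAK_COPYLEFT,
    "GPL-3.0-only": LicenseCategory.STRONG_COPYLEFT,
    "AGPL-3.0-only": LicenseCategory.STRONG_COPYLEFT,
    "CC-BY-NC-4.0": LicenseCategory.NON_COMMERCIAL,
    "CC0-1.0": LicenseCategory.PUBLIC_DOMAIN,
}

# Assumption: this hypothetical team only auto-approves these categories.
DEFAULT_ALLOWED = frozenset(
    {LicenseCategory.PERMISSIVE, LicenseCategory.PUBLIC_DOMAIN}
)


def categorize(spdx_id: str) -> LicenseCategory:
    """Return the category for an SPDX identifier, or UNKNOWN if unlisted."""
    return LICENSE_CATEGORIES.get(spdx_id.strip(), LicenseCategory.UNKNOWN)


def needs_review(spdx_id: str, allowed=DEFAULT_ALLOWED) -> bool:
    """True when the license falls outside the auto-approved categories."""
    return categorize(spdx_id) not in allowed
```

Unlisted identifiers fall into `UNKNOWN` and are flagged for review, which is the conservative choice when a license is unfamiliar.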

Governing Bodies and Communities

A few key organizations play a vital role in governing open-source licensing standards, ensuring adherence to principles of openness, transparency, and collaboration. Two prominent entities in this domain are the Open Source Initiative (OSI) and the Free Software Foundation (FSF).

The OSI maintains the Open Source Definition, a set of criteria a software license must meet to be considered open source. It evaluates and approves licenses that meet these criteria, helping to maintain consistency and clarity within the open-source community.

On the other hand, the FSF advocates for free software and promotes the use of licenses, such as the GNU General Public License (GPL), which ensures software freedom.

The Apache Software Foundation (ASF) is another key organization that plays a significant role in the governance of open-source licensing standards. Primarily known for developing widely used software projects such as Apache Hadoop and Apache Kafka, the ASF provides a framework for open and decentralized development and employs a permissive licensing model. The Apache License allows flexible commercial use and does not require derivative works to remain open source, although license and attribution notices must be preserved.

Additionally, community governance is crucial in shaping licensing policies and practices. Open-source projects often have community-driven decision-making processes where contributors and stakeholders discuss and decide on licensing matters. Community involvement helps maintain trust, transparency, and consensus within the open-source ecosystem, fostering innovation and growth while preserving the integrity of open-source software.

The Degrees of Openness

The degrees of openness inherent in different licensing models influence collaboration, innovation, and transparency in AI development. Permissive licenses encourage a broad community of contributors, fostering rapid iteration and experimentation. In contrast, copyleft licenses prioritize the preservation of open-source ideals: they safeguard against the code being absorbed into closed, proprietary products, sometimes at the expense of broader commercial adoption.

Recent License Transitions and Controversies

Notable shifts in licensing models by technology providers like Redis and HashiCorp have sparked debates surrounding sustainability and ethics. Motivations range from protecting revenue streams to addressing concerns about fair compensation for contributions. These transitions underscore the nuanced balance between fostering innovation and safeguarding the principles of open-source collaboration.

When a company changes the license of its open-source project, it can be particularly concerning to users and businesses who have built products on that code. If a company that provides open-source software suddenly closes the source or moves to a more restrictive license, other businesses relying on the last open-source version may have to take on the full burden of maintaining that code and developing new features.
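One way to notice such a change early is to record the license each dependency declares and compare snapshots before and after an upgrade. The sketch below assumes a Python project and reads the `License-Expression` and `License` metadata fields through the standard library's `importlib.metadata`. The helper names `snapshot_licenses` and `diff_licenses` are hypothetical, and many packages leave these fields empty or free-form, so treat the result as a prompt for manual review rather than a definitive answer.

```python
from __future__ import annotations

from importlib import metadata


def snapshot_licenses() -> dict[str, str]:
    """Map each installed distribution name to the license it declares."""
    snapshot: dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with missing or broken metadata
        declared = (
            dist.metadata.get("License-Expression")
            or dist.metadata.get("License")
            or "UNKNOWN"
        )
        snapshot[name.lower()] = declared.strip()
    return snapshot


def diff_licenses(
    old: dict[str, str], new: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {package: (old_license, new_license)} for changed licenses."""
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# Hypothetical package names and license values, for illustration only.
before = {"exampledb": "BSD-3-Clause", "otherlib": "MIT"}
after = {"exampledb": "SSPL-1.0", "otherlib": "MIT"}
changed = diff_licenses(before, after)  # only "exampledb" is reported
```

In practice, a snapshot taken with `snapshot_licenses()` could be stored alongside a lock file and diffed in CI after each dependency update.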

Why Licensing Matters in AI

Licensing is not merely a legal formality but can determine AI technologies’ trajectory. It governs accessibility, adaptability, and equitable distribution, shaping the AI ecosystem. It’s important to balance the protection of intellectual property (IP) with fostering an environment of collaboration in AI to drive innovation and ensure inclusivity.

Currently, the AI industry is expanding at a remarkable speed. New technologies, use cases, and even companies are emerging every day, and everyone seems eager to get in on the frenzy. With this fast-paced innovation and race to market, we can expect companies to adopt open-source code to speed up development and increase innovation through broad collaboration. We may also see a reactive shift toward more restrictive licenses as companies try to preserve their IP and pathways to revenue.

Conclusion

Open-source licensing is the cornerstone of collaborative development and innovation in AI, defining the boundaries of access, usage, and distribution. As we navigate the complexities of licensing models, let us remain informed and proactive in shaping a future where AI technologies serve the collective good. By embracing the spirit of open collaboration, we have the opportunity to create a more inclusive and sustainable AI landscape. To read more about Zilliz’s thoughts on open-source license restrictions and our open-source approach, read here.