OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
OpenFedLLM is a research-oriented code framework facilitating the training of LLMs via FL (experiments can run on a single RTX 3090 GPU and finish within 8 hours) 😈😈😈
While data plays a crucial role in training contemporary AI models, it is widely acknowledged that valuable public data will be exhausted within a few years, directing the world's attention toward massive amounts of decentralized private data.
Trained on massive public data, large language models (LLMs) have demonstrated tremendous success across a broad spectrum of fields in recent years. Nevertheless, an issue of significant concern has emerged amidst this proliferation of LLMs: it has been estimated that high-quality public data will be exhausted before 2026. The scarcity of data can also be discerned from a current trend in which researchers increasingly train data-hungry LLMs by combining existing datasets or using model-generated datasets, rather than collecting and curating new ones. These trends indicate that the development of current LLMs could hit a bottleneck, since the commonly acknowledged scaling laws show that more data usually leads to better performance.
Meanwhile, an abundance of high-quality data is distributed across diverse parties but remains underutilized, as it cannot be publicly shared due to issues such as privacy (e.g., medical and financial data) or physical constraints (e.g., lacking network connections). As a representative case, BloombergGPT, trained on a large amount of private financial data spanning 40 years, demonstrates exceptional performance in finance, indicating the value of high-quality private data. However, the challenge lies in the fact that not every party possesses sufficient data to individually train a well-performing, data-hungry LLM. Considering the limitations of public data and the high utility yet potential scarcity of any single party's private data, it is critical to support the development of modern LLMs through collaborative training on decentralized private data without direct data sharing.
OpenFedLLM is an open-source, research-oriented codebase for training large language models (LLMs) via federated learning.
A recent work, OpenFedLLM, comprehensively explores the potential of training LLMs on decentralized private data via federated learning (FL), a privacy-preserving training paradigm in which multiple parties collaboratively train a model under the coordination of a central server. Specifically, starting from an off-the-shelf base LLM that has been pre-trained on a large corpus, we aim to train/fine-tune the LLM to achieve the desired functionalities via FL, which consists of four iterative steps: global model downloading, local model training, local model uploading, and global model aggregation. Here, in the context of FL, we focus on two critical and representative procedures in the training of contemporary LLMs: instruction tuning and value alignment, positioned as two applications of collaborative and privacy-preserving training of LLMs on decentralized private data.
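The sketch below illustrates these four steps for a single FedAvg-style round. The helper names (`fedavg_round`, `local_train`) are hypothetical and do not correspond to OpenFedLLM's actual API; in practice, typically only the trainable adapter weights (e.g., LoRA) would be exchanged rather than the full model.

```python
# Minimal sketch of one federated round (hypothetical helpers, not the real API).
# Steps: (1) download global model, (2) train locally, (3) upload, (4) aggregate.
import copy


def fedavg_round(global_state, clients, local_train):
    """global_state: dict of tensors (e.g., LoRA adapter weights).
    clients: iterable of client datasets; local_train: fn(state, data) -> state."""
    client_states, weights = [], []
    for data in clients:
        # (1) download: each client starts from a copy of the global weights
        local_state = copy.deepcopy(global_state)
        # (2) local training on the client's private data
        local_state = local_train(local_state, data)
        # (3) upload: collect the locally trained weights
        client_states.append(local_state)
        weights.append(len(data))
    # (4) aggregate: weighted average proportional to local dataset sizes
    total = sum(weights)
    return {
        k: sum(w / total * s[k].float() for w, s in zip(weights, client_states))
        for k in global_state
    }
```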
In OpenFedLLM, users can easily focus on either FL or LLMs without much background knowledge of the other field. OpenFedLLM implements diverse critical features, covering federated instruction tuning, federated value alignment, multiple representative FL baselines (7), diverse training datasets (8) and evaluation metrics (30+), and more. It also makes a significant effort to decouple the implementation of FL from LLM training, reducing the engineering cost for both communities and thus encouraging their joint future contributions. Besides, quantization and parameter-efficient fine-tuning techniques are applied together with memory-saving strategies, making the training executable on a single consumer GPU (e.g., NVIDIA RTX 3090). It is worth noting that OpenFedLLM is the first framework to simultaneously integrate federated instruction tuning, federated value alignment, and diverse FL baselines, helping to bridge the gap between the two communities.
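As a rough illustration of how quantization and parameter-efficient fine-tuning are typically combined to fit local training on a single consumer GPU, the sketch below uses the Hugging Face transformers/peft/bitsandbytes stack; the base model and hyperparameters are example choices and the exact configuration in OpenFedLLM may differ.

```python
# Sketch: 4-bit quantization + LoRA so that only small adapters are trained
# (example settings; not necessarily OpenFedLLM's exact configuration).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize the frozen base model to 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # example base LLM
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # small trainable low-rank adapters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only a tiny fraction of weights is trainable
```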
In specific domains that require expert knowledge, such as finance, FL on the corresponding dataset can even outperform GPT-4.
Here is an example where OpenFedLLM is used for financial sentiment analysis. We use the FinGPT dataset for training and evaluate on four financial sentiment analysis benchmarks, including FPB, FIQA-SA, TFNS, and NWGI, where both accuracy and F1 score are measured.
The above table shows the accuracy and F1 score comparisons among various models. From the table, we see that (1) FedAvg significantly and consistently outperforms local training; specifically, on average (Avg:4), FedAvg outperforms local training by 11.5% relatively. (2) On average, SCAFFOLD, FedAvgM, and FedAdaGrad are the three FL algorithms with the best performance in this financial domain. (3) FL methods > GPT-4 > GPT-3.5 > local training. This shows that participating in the FL system provides clients with a financial model that is even better than GPT-4, which cannot be achieved by training individually. This key observation provides strong motivation for distributed parties to collaboratively train a better LLM.
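For reference, the sketch below shows one simple way to compute accuracy and weighted F1 for such sentiment benchmarks from model generations; the label-parsing helper is a hypothetical simplification and OpenFedLLM's actual evaluation code may differ.

```python
# Illustrative evaluation sketch for sentiment classification (naive label parsing).
from sklearn.metrics import accuracy_score, f1_score

LABELS = ("negative", "neutral", "positive")


def parse_label(generation: str) -> str:
    """Map a model generation to one of the sentiment labels (naive substring match)."""
    text = generation.lower()
    for label in LABELS:
        if label in text:
            return label
    return "neutral"  # fallback when no label is found


def evaluate(generations, references):
    preds = [parse_label(g) for g in generations]
    return {
        "accuracy": accuracy_score(references, preds),
        "f1": f1_score(references, preds, average="weighted"),
    }
```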
In this era of LLMs, we encourage future work in the FL community to implement algorithms on top of OpenFedLLM and examine their performance in such new application scenarios, helping FL evolve with recent trends.
There are many emerging challenges and interesting directions that are worth exploring in the future.
Heterogeneous Preferences in Federated Value Alignment. Despite the significance of FedVA, which injects human values into LLMs and alleviates the requirement that a single party collect massive amounts of annotated preference data, heterogeneous preferences in value alignment pose significant challenges. Since client data is collected independently, diverse clients could hold unique cultural, ethical, and contextual values, making it challenging to train a shared model that harmoniously integrates these varying values.
Personalized Federated Learning for LLMs. Clients may want models tailored to their own data and tasks while still benefiting from collaboration. For instance, in the context of federated instruction tuning, collaboration among clients from various domains could enhance the general capability of LLMs (e.g., chatting capability), while each client is also interested in its own domain (e.g., answering financial questions).
Privacy Preservation in FedLLM. Deep learning models, particularly those of substantial size, have the capacity to memorize training data, which raises privacy concerns. This risk is accentuated in LLMs, which, due to their expansive capacity, can inadvertently memorize and potentially expose even more detailed information. This poses a dual challenge: preserving the model's effectiveness while protecting individual privacy.
Efficiency in FedLLM. Efficiency is a fundamental topic in FL, including training efficiency, since clients need to bear the cost of local training, and communication efficiency, since FL requires multi-round communication between the server and clients. In the realm of FedLLM, efficiency becomes even more critical, since LLMs are usually much larger than the conventional models used in previous FL literature. For example, the smallest Llama2 model has 7 billion parameters, while the models used in previous FL work are usually on the order of millions of parameters (e.g., ResNet).
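As a back-of-the-envelope illustration (not a result from the paper) of why parameter-efficient fine-tuning also matters for communication in FedLLM, the adapter size below is an assumed example value that depends on the chosen rank and target modules.

```python
# Illustrative per-round communication cost: full 7B model vs. a LoRA adapter.
FULL_PARAMS = 7e9          # Llama2-7B parameter count
LORA_PARAMS = 4.2e6        # assumed example adapter size (depends on rank/targets)
BYTES_PER_PARAM = 2        # fp16

full_gb = FULL_PARAMS * BYTES_PER_PARAM / 1e9
lora_mb = LORA_PARAMS * BYTES_PER_PARAM / 1e6
print(f"Full model per upload: ~{full_gb:.0f} GB")    # ~14 GB per client per round
print(f"LoRA adapter per upload: ~{lora_mb:.1f} MB")  # ~8 MB per client per round
```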
For more information, check the original arXiv paper as well as the associated code on GitHub.