How to test Databricks’ Dolly 2.0?
TLDR
We had our first look at the recently released Dolly 2.0, an open-source instruction-following Large Language Model (LLM). The model is similar in spirit to the one behind ChatGPT, but comes with a free license for research and commercial use. Furthermore, you can run it locally! However, running the model efficiently can be a challenge: on my laptop, each prompt took anywhere from 10 minutes to several hours. Follow along with this blog to learn how to make the model run more efficiently on Google Cloud Platform (GCP).
For whom is this blog?
This blog is for anyone interested in recent innovations in LLMs who wants to try out Databricks’ new Dolly 2.0. It will be especially useful for people with access to Google Cloud Platform. Based on this blog, you can have the model up and running in 10 minutes. No specific technical knowledge is required.
Introduction
Dolly 2.0 is Databricks’ latest release: an open-source, instruction-following large language model (LLM). This 12B-parameter model has been fine-tuned on a human-generated instruction dataset and is the first open-source instruction-following LLM suitable for both research and commercial use. By open-sourcing the model, the dataset, and the training code, Databricks is enabling companies and other organizations to create their own customized LLMs without cost or license restrictions.
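The instruction dataset itself (databricks-dolly-15k) is published on the Hugging Face Hub, so you can inspect it directly. Below is a minimal sketch using the Hugging Face datasets library; it assumes you have the library installed (pip install datasets) and just want to eyeball a few records.
# Minimal sketch: inspect the open-sourced instruction dataset (assumes the `datasets` library is installed).
from datasets import load_dataset

dolly_ds = load_dataset("databricks/databricks-dolly-15k", split="train")
print(dolly_ds)     # dataset size and column names
print(dolly_ds[0])  # one human-written instruction/response record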
Upon seeing Databricks’ recent post, I wondered how well it actually works. Perhaps unsurprisingly, on my local machine the model takes quite a while to answer a single prompt. Asking it to count to three took 20 minutes.
This blog is based on the notes of my first attempt to get it running on Google Cloud Platform. In it, I try to achieve an acceptable response time by using a decent machine with a GPU attached. To be sure, there are several other ways to achieve this (not least on a Databricks notebook). This is a mere first attempt and does not pretend to be complete, let alone the best way.
Dolly
You need an active GCP project (with billing enabled) to follow along. The first step is to create a Vertex AI Notebook. You could also use a regular virtual machine, but the advantage of these notebooks is that they come preconfigured, which saves you the trouble of installing TensorFlow and the drivers needed to run an attached GPU.
Navigate to the Vertex AI section, click on Workbench in the left-hand menu, click the “+ NEW NOTEBOOK” button, and then click “CUSTOMIZE”.
The second step is to choose the environment configuration. Click on the Environment tab. I opted for the default Debian operating system and the pre-configured TensorFlow Enterprise 2.11 environment.
Under “Machine type,” I selected “a2-highgpu-1g”, which comes with one NVIDIA A100 GPU attached. Other machine types may be better suited; let me know what you find. The monthly cost estimate may serve as a reminder to shut down the machine after your experimentation. You can leave the rest of the settings at their defaults. Now click “CREATE” and wait a few minutes while the machine is provisioned and the necessary software is installed.
Once the notebook instance is ready, click the “OPEN JUPYTERLAB” button. Alternatively, you can access the instance over SSH by going to the VM instances page and clicking the “SSH” button. I opted for the latter option.
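Before installing anything, it can be worth checking that the environment actually sees the attached GPU. Here is a minimal sketch using the TensorFlow installation that comes preconfigured with the image chosen above (run it in Python on the instance):
# Minimal check that the A100 is visible to the preinstalled TensorFlow environment.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {gpus}")  # expect one entry for the attached A100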
The last step is to install the required dependencies and set a few things straight. In the notebook or SSH terminal, execute the following commands to install the necessary libraries:
Install the required packages:
pip install accelerate==0.17.0
pip install transformers[torch]==4.25.1
Update the LD_LIBRARY_PATH environment variable to include the NVIDIA cuBLAS library path:
export LD_LIBRARY_PATH=/opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH
Create symbolic links for the NVIDIA inference library (libnvinfer) files. There seems to be a version clash. Let me know in the comments if you can think of a more elegant way to solve this.
cd /lib
sudo ln -s libnvinfer.so.8 libnvinfer.so.7
sudo ln -s libnvinfer_plugin.so.8 libnvinfer_plugin.so.7
That’s it! Open Python and execute the following commands to start using the model.
import torch
from transformers import pipeline
generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
If everything works correctly, you should now have a working pipeline using the dolly-v2-12b model from Databricks. You can now use this pipeline to generate text or perform other natural language processing tasks.
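As a quick sanity check, you can send the pipeline a prompt and time the response. The snippet below is a minimal sketch; the toy prompt and the use of time.perf_counter() are just illustrative.
# Minimal sketch: time a single prompt against the pipeline created above.
import time

prompt = "Count to three."  # the same toy prompt that took 20 minutes on my laptop
start = time.perf_counter()
result = generate_text(prompt)
elapsed = time.perf_counter() - start

print(result)                                   # the generated answer
print(f"Response time: {elapsed:.1f} seconds")  # compare against the local-machine baseline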
While Dolly may not be as powerful as ChatGPT (3.5 or 4), it still produces responses that are quite impressive. Its occasional random and humorous replies are charming. We eagerly anticipate the future advancements in open-source LLMs that Dolly’s free license will undoubtedly encourage.
Some results:
Conclusion
This was a first look at Dolly 2.0. The main aim of this post was to share a first attempt at getting the model up and answering within a decent time frame. Happy experimenting!
