How to install Audiocraft (Meta’s open source audio generation tool) locally, create long songs and integrate it into an existing codebase

Alfredo Lhu 🏝️💻🎸
5 min read · Jun 10, 2023


On Friday, June 9, 2023, Meta unveiled yet another amazing AI tool: Audiocraft, a music generator and audio-processing tool powered by deep learning. In contrast to Google’s MusicLM, Audiocraft is open source, giving users the freedom to explore and experiment as much as they like. Today, we will walk through installing it, learn how to raise the 120-second duration limit so we can create full-length songs, and look at how to import and use Audiocraft in a Streamlit app so you can integrate it into your own projects.


You need to have Python 3.9 (tested with 3.10 too) and pip installed on your machine, as well as PyTorch ≥ 2.0 and ffmpeg.
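As a quick sanity check before installing, you can verify the Python version and that ffmpeg is on your PATH. This is a minimal sketch; it does not check PyTorch itself:

```python
import shutil
import sys

def check_prerequisites():
    """Rough sanity check for the prerequisites listed above."""
    return {
        # Audiocraft needs Python 3.9+ (3.10 works too)
        "python_ok": sys.version_info >= (3, 9),
        # ffmpeg must be reachable on the PATH for audio encoding
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
    }

print(check_prerequisites())
```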

The installation is pretty straightforward.

First, clone the GitHub repository:

git clone https://github.com/facebookresearch/audiocraft
cd audiocraft

Make sure you have PyTorch ≥ 2.0 and ffmpeg already installed. If you don’t, install them by running:

# Install pytorch
pip install 'torch>=2.0'

# Install ffmpeg
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install 'ffmpeg<5' -c conda-forge

Now we can proceed to install the rest of the packages with:

pip install -r requirements.txt

And you’re ready to go! Now you can run:

python -m demos.musicgen_app --share

We are ready to start playing music with our new instrument! The Gradio demo UI should now be running locally at http://127.0.0.1:7860/

That’s it! You’re free to generate as much music as you want.

The first time you run a model, it may take a while, since large model files have to be downloaded.
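Beyond the Gradio demo, Audiocraft also exposes a Python API you can call directly. A minimal sketch (it downloads the `facebook/musicgen-small` weights on first use, so it needs a GPU or a patient CPU):

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the smallest MusicGen checkpoint (downloaded on first use)
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # seconds of audio per clip

# Generate one clip per text description
wav = model.generate(['upbeat acoustic guitar with soft drums'])

# Writes upbeat_demo.wav next to the script, loudness-normalized
audio_write('upbeat_demo', wav[0].cpu(), model.sample_rate, strategy='loudness')
```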

Longer audio

If you want to generate songs longer than 2 minutes, you can do so by making a small change in the code. Open the demos/musicgen_app.py file and look for the gr.Slider component that determines the duration of your file (line 240, as of 13/08/2023), and put whatever you want as the maximum. I set the maximum to 200, so I can generate songs 3:20 minutes long. You’ll have to restart the program for the change to take effect.
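For reference, the slider definition looks roughly like this after the change. This is an approximation; the exact label, defaults, and line number may differ in your checkout:

```python
import gradio as gr  # the demo UI is built with Gradio

# Approximate shape of the duration slider in demos/musicgen_app.py;
# only the `maximum` value needs to change.
duration = gr.Slider(
    minimum=1,
    maximum=200,   # raised from the stock 120-second cap
    value=10,
    label="Duration",
    interactive=True,
)
```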

Using Audiocraft as an API in an existing codebase

Let’s dive deeper now into how to use Audiocraft as a tool in an existing codebase. For this purpose, we will create a simple app that generates a song description from a URL. We will then feed Audiocraft this description to create audio based on the URL’s content. You will need an OpenAI API key for this.

Create a new folder named “audiocraft_app”. Then create a Python file “audiocraft_app.py” and a “requirements.txt” text file:

mkdir audiocraft_app
cd audiocraft_app
touch audiocraft_app.py
touch requirements.txt

Open the requirements.txt file and fill it with the necessary libraries.

git+https://github.com/huggingface/transformers.git
scipy
streamlit==1.22.0
langchain==0.0.176
openai==0.27.7
tiktoken==0.4.0
unstructured==0.6.8
tabulate==0.9.0
pdf2image==1.16.3
pytesseract==0.3.10
validators

And install the libraries.

pip install -r requirements.txt

Now open the audiocraft_app.py file and write the following code:

import validators
import streamlit as st
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import UnstructuredURLLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from transformers import AutoProcessor, MusicgenForConditionalGeneration
import scipy.io.wavfile

# Streamlit app
st.subheader('URL to Song:')
st.caption(
    "ChatGPT will turn the contents of this URL's website into a song description and Audiocraft will then create a song out of it and save it as musicgen_out.wav in your project's folder. This might take a while the first time you run it because Audiocraft needs to download the models.")

# Get OpenAI API key and URL to be summarized
with st.sidebar:
    openai_api_key = st.text_input("OpenAI API key", value="", type="password")
    st.caption(
        "*If you don't have an OpenAI API key, get it [here](https://platform.openai.com/account/api-keys).*")
    model = st.selectbox("OpenAI chat model",
                         ("gpt-3.5-turbo", "gpt-3.5-turbo-16k"))
    st.caption("*If the article is long, choose gpt-3.5-turbo-16k.*")
url = st.text_input("URL", label_visibility="collapsed")

# If 'Create song' button is clicked
if st.button("Create song"):
    # Validate inputs
    if not openai_api_key.strip() or not url.strip():
        st.error("Please provide the missing fields.")
    elif not validators.url(url):
        st.error("Please enter a valid URL.")
    else:
        try:
            with st.spinner("Please wait..."):
                # Load URL data
                loader = UnstructuredURLLoader(urls=[url])
                data = loader.load()

                # Initialize the ChatOpenAI module, load and run the summarize chain
                llm = ChatOpenAI(temperature=0, model=model,
                                 openai_api_key=openai_api_key)
                prompt_template = """Write a 1 sentence song description, specifying instruments and style, from the following text:
{text}
"""
                prompt = PromptTemplate(
                    template=prompt_template, input_variables=["text"])
                chain = load_summarize_chain(
                    llm, chain_type="stuff", prompt=prompt)
                # Generate the song description based on the URL's content
                song_description = chain.run(data)

                # Load the MusicGen model
                processor = AutoProcessor.from_pretrained(
                    "facebook/musicgen-small")
                model = MusicgenForConditionalGeneration.from_pretrained(
                    "facebook/musicgen-small")

                # Format the input based on the song description
                inputs = processor(
                    text=[song_description],
                    padding=True,
                    return_tensors="pt",
                )

                # Generate the audio
                audio_values = model.generate(**inputs, max_new_tokens=256)

                sampling_rate = model.config.audio_encoder.sampling_rate
                # Save the wav file into your system
                scipy.io.wavfile.write(
                    "musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())

            # Render a success message with the song description generated by ChatGPT
            st.success("Your song has been successfully created with the following prompt: " + song_description)
        except Exception as e:
            st.exception(f"Exception: {e}")

Save the file and run the app with:

streamlit run audiocraft_app.py

You should see your app running on http://localhost:8501/. Insert your OpenAI API key and choose a ChatGPT model depending on your website’s content length. I used a fairly lengthy Wikipedia article about Erik Satie (a French composer from the late 19th and early 20th centuries), so I selected the “gpt-3.5-turbo-16k” model. When you have your URL ready, write it in the input and hit the “Create song” button. Your first audio might take a while because, again, Audiocraft needs to download the model.

If everything went right, your app should show a success message with the song description generated by ChatGPT, which also means that your wav file should be ready at your project’s root folder.

For simplicity’s sake, the code we wrote generates a short five-second sample using the “facebook/musicgen-small” model, but now that you have everything set up, you are free to experiment with longer durations (by modifying the max_new_tokens variable) and with the other models. More details about the available models and parameters can be found in the MusicGen documentation on Hugging Face.
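If it helps, the relationship between max_new_tokens and clip length can be sketched like this, assuming MusicGen’s default codec rate of roughly 50 token frames per second (so the 256 tokens above yield about five seconds of audio):

```python
FRAME_RATE = 50  # MusicGen's audio codec emits ~50 token frames per second

def max_new_tokens_for(seconds: float) -> int:
    """Approximate token budget for a desired clip length."""
    return round(seconds * FRAME_RATE)

def approx_duration(tokens: int) -> float:
    """Approximate clip length in seconds for a given token budget."""
    return tokens / FRAME_RATE

print(approx_duration(256))    # the article's setting: ~5.1 seconds
print(max_new_tokens_for(30))  # ~1500 tokens for a 30-second clip
```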

We just managed to create music with AI! Isn’t that amazing? The possibilities are endless, and it will only get better from here. The use cases include music, movies, ads, podcasting and so much more. You can even generate your own tracks for studying and working. I hope you find this tutorial helpful and that it inspires you to create awesome tracks and apps.

If you have any questions, feel free to reach out to me on Twitter: https://twitter.com/thedevalweb
