Is it Pop or Rock? Classify songs with Hugging Face 🤗 and Ray on Vertex AI
Introduction
During the holiday season, I dedicated some time to learning about using Transformers for audio through the Hugging Face Audio course. This course provides hands-on experience in leveraging transformers to address various audio-related tasks, such as audio classification and speech recognition. One particular unit focuses on fine-tuning a transformer model for music classification, where songs are categorized into genres like “pop” and “rock.”
Initially, fine-tuning the audio classifier on Colab with an NVIDIA T4 GPU runtime was taking a significant amount of time. To speed up the process, I asked myself: why not use Ray on Vertex AI to train the model faster? As the saying goes, it would be a perfect opportunity to “learn by practice!”
This article covers how to fine-tune an audio classification model using Hugging Face and Ray on Vertex AI. By the end, you will have a better understanding of how Ray on Vertex AI lets you speed up ML workloads with little effort and streamline your audio applications.
The article assumes a basic knowledge of Hugging Face and Ray on Vertex AI. If you are familiar with Hugging Face but not with Ray on Vertex AI, check out this previous article for a quick introduction. Also note that the article uses Ray 2.4.0, the version supported by Ray on Vertex AI at the time of writing.
When HuggingFace meets Ray on Vertex AI
Imagine you have a Hugging Face script that contains your training code. As is often the case with audio data, fine-tuning might take time. To speed up this process, Hugging Face offers the Accelerate library, which is designed to facilitate distributed machine learning workloads with PyTorch. However, it does require some minor code modifications, and you may find the code distribution process to be opaque.
For an alternative approach, you might want to explore using Ray on Vertex AI. As part of Ray AIR, Ray Train provides a HuggingFaceTrainer class (in addition to the HuggingFaceCheckpoint and HuggingFacePredictor classes) to conveniently distribute your Hugging Face training across multiple workers, as mentioned in the Ray 2.4.0 documentation. By leveraging Ray on Vertex AI, you can effortlessly obtain a Ray cluster, enabling you to execute your machine learning workload in a distributed manner.
After setting up a Ray cluster on Vertex AI and defining your training function, you can leverage Ray Train to distribute HuggingFace Transformers on PyTorch training using Vertex AI, as shown in the figure below.
In other words,
- Define the Scaling Configuration: This configuration determines the number of workers and the compute resources (e.g., CPUs or GPUs) required for training.
- Initiate the HuggingFaceTrainer: This wraps the training function to be distributed using the defined scaling configuration.
Then you just call the fit method to run the distributed training. It is worth highlighting that one advantage of Ray Train is that it handles distributed training via PyTorch DDP: by leveraging Ray Actors, Ray Train manages the torch process groups required for distributed PyTorch training, which considerably simplifies the process. Moreover, in my experience, its clean interface requires almost no code modifications to distribute your ML workload.
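Here is a minimal sketch of that pattern with the Ray 2.4.0 API. The worker count is a placeholder, and the datasets and run configuration are omitted for brevity; the full example follows later in the article.

from ray.air.config import ScalingConfig
from ray.train.huggingface import HuggingFaceTrainer

# 1) Define the scaling configuration: how many workers and whether they use GPUs.
scaling_config = ScalingConfig(num_workers=4, use_gpu=True)

# 2) Initiate the HuggingFaceTrainer with the function that builds a
#    transformers.Trainer on each worker (defined later in the article).
#    Datasets and run configuration are omitted in this sketch.
trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=scaling_config,
)

# 3) Run the distributed training.
result = trainer.fit()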
Now that you know how to distribute a HuggingFace Transformers on PyTorch training using Ray on Vertex AI, let’s see how you can leverage Vertex AI to tune an audio classification model using Ray.
Tune HuggingFace Transformers model using Ray on Vertex AI
As mentioned above, you start from the Python function that contains training code you want to distribute. Below you can see a pseudo function to fine-tune an encoder-only transformer model for music classification.
from functools import partial

from transformers import AutoFeatureExtractor
from transformers import AutoModelForAudioClassification
from transformers import TrainingArguments, Trainer


def trainer_init_per_worker(train_dataset, eval_dataset, config):
    """A function to initialize the trainer on each worker."""
    # set training configuration
    model_id = "ntu-spml/distilhubert"
    feature_extractor = AutoFeatureExtractor.from_pretrained(
        model_id, do_normalize=True, return_attention_mask=True
    )
    ...  # label mappings, evaluation metric and other setup omitted

    # fine-tune the model
    model = AutoModelForAudioClassification.from_pretrained(
        model_id,
        num_labels=num_labels,
        label2id=label2id,
        id2label=id2label,
    )
    training_args = TrainingArguments(
        model_name,
        learning_rate=config["learning_rate"],
        per_device_train_batch_size=config["batch_size"],
        gradient_accumulation_steps=config["gradient_accumulation_steps"],
        per_device_eval_batch_size=config["batch_size"],
        # ... other training arguments (epochs, warmup, logging, ...)
    )
    return Trainer(
        model,
        training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=feature_extractor,
        compute_metrics=partial(compute_metrics, metric=metric),
    )
This function is based on the Fine-tuning a model for music classification section of the Hugging Face Audio course. It utilizes the GTZAN dataset, a popular music genre classification dataset of 1,000 song excerpts divided by genre.
The function initializes the feature extractor and model associated with DistilHuBERT, a base model pre-trained on 16kHz sampled speech audio. The AutoFeatureExtractor class normalizes and truncates the audio data, while the AutoModelForAudioClassification class defines a model instance with the appropriate classification head.
Training arguments such as batch size, gradient accumulation steps, number of epochs, and learning rate are set using the TrainingArguments class. The high-level API of Trainer is then used to handle the training process.
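The snippets above and below also reference two helpers that are not shown: preprocess_function and compute_metrics. Here is a minimal sketch of what they might look like, following the Hugging Face Audio course; the max_duration value and the accuracy metric are assumptions you can adapt to your setup.

import numpy as np
import evaluate
from transformers import AutoFeatureExtractor

# the same feature extractor used in trainer_init_per_worker
feature_extractor = AutoFeatureExtractor.from_pretrained(
    "ntu-spml/distilhubert", do_normalize=True, return_attention_mask=True
)
metric = evaluate.load("accuracy")


def preprocess_function(examples, max_duration=30.0):
    """Normalize and truncate the raw audio arrays with the feature extractor."""
    audio_arrays = [x["array"] for x in examples["audio"]]
    return feature_extractor(
        audio_arrays,
        sampling_rate=feature_extractor.sampling_rate,
        max_length=int(feature_extractor.sampling_rate * max_duration),
        truncation=True,
        return_attention_mask=True,
    )


def compute_metrics(eval_pred, metric):
    """Compute accuracy from the predicted logits and the reference labels."""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)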
It’s noteworthy that there are no changes in the code aside from the trainer_init_per_worker interface. This interface is responsible for crafting a Hugging Face Transformers Trainer, which Ray will distribute utilizing Distributed Data Parallelism (using PyTorch’s Distributed backend).
Now that you have the training function, you can use Ray Train to define your distributed training. Below you have Ray Train pseudocode for the distributed fine-tuning of the DistilHuBERT model for music classification.
from functools import partial

from datasets import load_dataset
import ray
from ray.train.huggingface import HuggingFaceTrainer, HuggingFaceCheckpoint
from ray.air.config import RunConfig, ScalingConfig, CheckpointConfig
from ray.tune.syncer import SyncConfig

# config: a dictionary with the training and runtime parameters (defined elsewhere)

# initialize ray session
ray.init()

# read HuggingFace dataset
dataset = load_dataset("marsyas/gtzan", "all", cache_dir='.')
dataset = dataset["train"].train_test_split(
    seed=config['seed'], shuffle=True, test_size=config['test_size']
)

# preprocess dataset
dataset_encoded = dataset.map(
    partial(preprocess_function, max_duration=config["max_duration"]),
    remove_columns=["audio", "file"],
    batched=True,
    batch_size=config["batch_size"]
)
dataset_encoded = dataset_encoded.rename_column("genre", "label")
train_dataset = dataset_encoded["train"]
eval_dataset = dataset_encoded["test"]

# convert to Ray Datasets for distributed fine-tuning
ray_train_dataset = ray.data.from_huggingface(train_dataset)
ray_eval_dataset = ray.data.from_huggingface(eval_dataset)

# scaling configuration: number of workers and GPU usage
scaling_config = ScalingConfig(num_workers=config['num_workers'],
                               use_gpu=config['use_gpu'])

# run configuration: checkpointing and syncing behavior
run_config = RunConfig(
    checkpoint_config=CheckpointConfig(num_to_keep=1),
    sync_config=SyncConfig(
        upload_dir=config['logging_dir'],
    ),
    name=config['experiment_name']
)

# distributed fine-tuning with Ray Train
trainer = HuggingFaceTrainer(
    trainer_init_per_worker=partial(trainer_init_per_worker, config=config),
    datasets={"train": ray_train_dataset, "evaluation": ray_eval_dataset},
    scaling_config=scaling_config,
    run_config=run_config,
)
result = trainer.fit()
After you initiate the Ray session, you preprocess the audio dataset by splitting it, applying the preprocessing function that normalizes and truncates the audio data, and removing unnecessary columns. Then you convert the prepared Hugging Face datasets to Ray Datasets using the from_huggingface() method. It is important to note that the preprocessing in this case uses in-memory (RAM) resources. Although Ray Data provides a great interface for distributed data preprocessing, my attempt to convert a Hugging Face Audio column with Ray 2.4.0 was unsuccessful.
As previously mentioned, at this stage, you need to define the scaling_config parameter, where you can specify the desired number of workers and indicate whether the distributed training process requires GPUs. Additionally, you have the option to define a run_config parameter, which allows you to specify checkpointing and synchronization behaviors. Notably, on Ray on Vertex AI, this feature is particularly advantageous as it enables you to conveniently store model checkpoints directly on Google Cloud Storage!
With your Ray Datasets and your scaling and run configurations set, you can finally execute your distributed fine-tuning job. This is done by initiating a HuggingFaceTrainer and running it with its fit method. Under the hood, the Trainer distributes the process across multiple workers, each with its own copy of the Hugging Face transformers Trainer.
To launch the distributed fine-tuning job on Ray on Vertex AI, you can use the Ray Jobs API as described in the official documentation. All you need to do is provide the runtime dependencies (in the example below, a requirements file is used) and initiate the Ray job client. Then you can submit the job using the submit_job method. Below you have an example of how to submit the distributed fine-tuning job of the DistilHuBERT model on Ray on Vertex AI.
import random
import string

from ray.job_submission import JobSubmissionClient

# prepare the requirements file
# tutorial_path / src_path: local paths to the requirements file and training code (defined elsewhere)
requirements = """
ipywidgets>=8
fsspec==2023.9.2
gcsfs==2023.9.2
torch==2.1.0
ray[data]==2.4.0
ray[train]==2.4.0
ray[tune]==2.4.0
datasets[audio]==2.10.1
transformers @ git+https://github.com/huggingface/transformers.git
sentencepiece==0.1.99
accelerate==0.25.0
evaluate==0.4.1
"""
with open(tutorial_path / "requirements.txt", "w") as f:
    f.write(requirements)

# initiate the Ray job client
client = JobSubmissionClient("<your-dashboard-address>")

# submit the distributed training job
id = "".join(random.choices(string.ascii_lowercase + string.digits, k=4))
job_id = client.submit_job(
    submission_id=f"<your-ray-job-id>-{id}",  # the random suffix keeps the submission id unique
    entrypoint="python3 task.py",
    runtime_env={
        "pip": {"packages": str(tutorial_path / "requirements.txt")},
        "working_dir": str(src_path),
    },
)
where your-dashboard-address is the Ray dashboard address of your cluster. You can find the dashboard address in the Vertex AI console or using the Vertex AI SDK, as sketched below.
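Here is a rough sketch of how you might retrieve it programmatically. It assumes the vertex_ray module that ships with google-cloud-aiplatform[ray]; the get_ray_cluster call, attribute names, and import path may differ across SDK versions, so check the Ray on Vertex AI documentation for your version.

from google.cloud import aiplatform as vertex_ai
import vertex_ray  # shipped with google-cloud-aiplatform[ray]; import path may vary by SDK version

# initialize the Vertex AI SDK for your project and region
vertex_ai.init(project="<your-project-id>", location="<your-region>")

# look up the Ray cluster on Vertex AI and read its dashboard address
cluster = vertex_ray.get_ray_cluster("<your-cluster-resource-name>")
print(cluster.dashboard_address)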
After submitting the job, you can monitor the distributed training using the Ray OSS Dashboard on Ray on Vertex AI. For example, below you can see a log view of the running Ray Train job.
By analyzing the log, you can see that four actors are executing a Hugging Face distributed fine-tuning process over a Ray cluster consisting of four nodes (plus one master), each equipped with four GPUs. This configuration is a result of the job being submitted with the default setting of utilizing GPUs. It's worth noting that monitoring your distributed training job may reveal instances where GPUs are not fully utilized. This underutilization can be attributed to various factors, such as CPU bottlenecks or memory constraints, and it would be worth investigating further to pinpoint the specific cause.
Once the distributed training ends, you can retrieve the latest checkpoint and generate your predictions as shown below.
import re

from transformers import pipeline

# read the job log and extract the model checkpoint uri
log = client.get_job_logs(job_id).split('\t')[-1]
model_checkpoint_uri = re.search(r"uri=(.*).{1}", log).group(1)

# copy the model checkpoint locally (notebook cell)
! gsutil cp -r {model_checkpoint_uri} './checkpoints'

# initiate the audio-classification pipeline from the checkpoint
pipe = pipeline(
    "audio-classification", model='./checkpoints'
)

# compress the audio on the client side (helper discussed below)
audio_64bytestr = compress_audio(audio_path)

# generate predictions (classify_audio is a helper wrapping the pipeline)
predictions = classify_audio(audio_64bytestr)

# print predictions
print('Predicted audio class:', max(predictions, key=predictions.get))
print('Score:', round(max(predictions.values()), 3))
# Predicted audio class: rock
# Score: 0.57
You might have noticed that I didn't use the HuggingFaceCheckpoint class mentioned above. That's because an error occurs when you try to initialize the audio model checkpoint using this class. Also, before scoring the audio, a compress_audio helper function is used. This function is not typically necessary; however, as you'll see later, it is required to meet the Vertex AI prediction request limits.
Now, you should have a deeper understanding of how to utilize Ray on Vertex AI to accelerate machine learning (ML) workloads, as exemplified in this article by fine-tuning a Transformer-based model using audio data. But what about deploying the fine-tuned model on Vertex AI and building a demo with Gradio? That’s what is covered in the bonus section below!
Bonus: Deploy an audio classification model on Vertex AI Endpoint with a Gradio application
To deploy your fine-tuned Hugging Face model on a Vertex AI Endpoint, a managed web service for generating online predictions, you need to take the following steps:
- Create a custom serving container image
- Register the model as Model Resource in Vertex AI Model Registry
- Deploy the model in a Vertex AI Endpoint
About step 1), a custom image may be required because the fine-tuning process uses specific dependencies that you won't find in the Vertex AI Prediction prebuilt images.
According to the official Vertex AI documentation, when you want to serve predictions from a custom-trained model, you must provide Vertex AI with a Docker container image that runs an HTTP server. You can implement the HTTP server in any way you like, but you need to guarantee that it listens and responds to liveness checks, health checks, and prediction requests. Check out the custom container requirements for prediction page to learn more.
Below you have the pseudocode of an HTTP server implemented with FastAPI to serve the Hugging Face Transformer-based music classification model.
# import libraries
from fastapi import FastAPI, Request
from transformers import pipeline
import base64
import io
import os
import pydub
import numpy as np

# variables
model_checkpoint_path = './checkpoint'
p = pipeline("audio-classification", model=model_checkpoint_path)
app = FastAPI()


@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    """Health check for the model."""
    return {"status": "healthy"}


@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    """Predict using the model."""
    body = await request.json()
    # load instances from the request body
    instances = body["instances"]
    # run inference (get_predictions decodes the audio and calls the pipeline)
    predictions = get_predictions(instances)
    return {"predictions": predictions}
Notice that when deploying a model to a public endpoint, you must ensure that each prediction request is under 1.5 MB. This means that you need to compress the audio data into a smaller format like MP3 on the client side before sending the request to the prediction server. Upon receiving the request, the server can decode and decompress the data (either with your own code or with a Custom Prediction Routine) and pass it to the model for predictions.
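To make this concrete, here is a rough sketch of what the client-side compress_audio helper and the server-side get_predictions helper could look like. It assumes pydub for the MP3 conversion and base64 for the encoding, and reuses the pipeline p from the server code above; the bitrate, resampling, and field names are illustrative assumptions rather than the article's exact implementation.

import base64
import io

import numpy as np
import pydub


def compress_audio(audio_path, bitrate="64k"):
    """Client side: convert the audio to MP3 and base64-encode it to stay under 1.5 MB."""
    segment = pydub.AudioSegment.from_file(audio_path)
    buffer = io.BytesIO()
    segment.export(buffer, format="mp3", bitrate=bitrate)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def get_predictions(instances):
    """Server side: decode each instance, run the pipeline p, and return label scores."""
    predictions = []
    for instance in instances:
        audio_bytes = base64.b64decode(instance["bytes"])
        segment = pydub.AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3")
        segment = segment.set_channels(1).set_frame_rate(16000)  # mono, 16 kHz as DistilHuBERT expects
        samples = np.array(segment.get_array_of_samples()).astype(np.float32)
        samples /= float(1 << (8 * segment.sample_width - 1))  # scale to [-1, 1]
        scores = p({"raw": samples, "sampling_rate": 16000})
        predictions.append({
            "id": instance.get("id"),
            "predictions": {s["label"]: s["score"] for s in scores},
        })
    return predictions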
Once you’ve implemented the HTTP server, you can utilize Cloud Build, a serverless CI/CD platform, to build the corresponding Docker image. Cloud Build empowers you to seamlessly build, test, and deploy their applications across various platforms, including serving images on Vertex AI. The resulting image will be effortlessly pushed to Artifact Registry, which is fully managed service that proficiently supports container images.
To build the Docker image, you need to provide the associated Dockerfile as shown below.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.10
# set workdir
WORKDIR app
# install dependencies
RUN apt-get -y update && apt-get install -y --no-install-recommends ffmpeg libavcodec-extra
RUN python3 -m pip install --upgrade pip
RUN pip3 install git+https://github.com/huggingface/transformers.git sentencepiece==0.1.99 torch==2.1.0 pydub==0.25.1
# prepare model
COPY main.py .
COPY ./checkpoint/ ./checkpoint/
# expose port
EXPOSE 8080
# Start the app
CMD ["gunicorn", "-b", "0.0.0.0:8080", "main:app", "-k","uvicorn.workers.UvicornWorker", "--workers","1"]
Note that the serving image includes not only dependencies and prediction server code, but also the model itself.
Finally, you can build the image through a CLI command. Here is an example of the build command using the Dockerfile.
gcloud builds submit <your-source-path> --tag <your-prediction-server-image-tag>
where <your-source-path> is the path where you saved the Dockerfile and the HTTP server code, and <your-prediction-server-image-tag> is the full name of the image to be built. See the documentation to learn more.
Now that you have your prediction server, you can register it as a Model Resource with an associated version on Vertex AI Model Registry. And you can deploy the model to a Vertex AI Endpoint for generating online predictions using the Vertex AI Python SDK.
The following code example demonstrates how to register the model in Vertex AI Model Registry. A key consideration is the definition of the predict route, health route, and ports for the serving container: ensure they are aligned with the definitions in the HTTP server code provided above. Maintaining consistency is essential to avoid deployment issues.
# import library
from google.cloud import aiplatform as vertex_ai

# register the model
model = vertex_ai.Model.upload(
    display_name=EXPERIMENT_NAME,
    description='Finetuned DistilHuBERT model with HuggingFace and Ray on Vertex AI',
    serving_container_image_uri=DEPLOY_IMAGE,
    serving_container_predict_route=PREDICT_ROUTE,
    serving_container_health_route=HEALTH_ROUTE,
    serving_container_ports=SERVING_CONTAINER_PORTS,
    sync=True,
)
And this code snippet demonstrates how to deploy a registered model to a Vertex AI Endpoint, enabling online predictions.
# import library
from google.cloud import aiplatform as vertex_ai

# create an endpoint
endpoint = vertex_ai.Endpoint.create(display_name=EXPERIMENT_NAME + '-endpoint')

# deploy the model to the endpoint
endpoint.deploy(
    model,
    deployed_model_display_name=EXPERIMENT_NAME + '-deployed',
    machine_type='n1-standard-16',
    traffic_split={"0": 100},
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=100,
    sync=True,
)
Notice how the code employs default configuration settings. Check out the Vertex AI documentation to know more about deployment parameters.
Once your model has been registered and deployed, you can use the predict method to classify your song.
To use the predict method, you must create an instance data structure as shown below. You will also need to update your prediction function to call the Vertex AI Endpoint hosting the model and, finally, make sure your compress_audio function adheres to the payload size limits.
# import libraries
from pathlib import Path

from google.cloud import aiplatform as vertex_ai

# helpers
def vertex_classify_audio(audio_path):
    """A function to classify audio with a Vertex AI Endpoint."""
    instances = []
    instances.append({'id': Path(audio_path).stem.replace('.', '_'),
                      'bytes': compress_audio(audio_path)})
    response = endpoint.predict(instances)
    return response.predictions[0]['predictions']

# initiate the endpoint
endpoint = vertex_ai.Endpoint(...)

# get predictions
predictions = vertex_classify_audio("<your-audio-path>")
print('Predicted audio class:', max(predictions, key=predictions.get))
print('Score:', round(max(predictions.values()), 3))
# Predicted audio class: rock
# Score: 0.57
To conclude, consider presenting the code more effectively by building an application with Gradio. Here’s an example of integrating the Vertex AI Python SDK with the Gradio interface.
# import libraries
import gradio as gr

# define gradio interface
audio_demo = gr.Interface(
    fn=vertex_classify_audio,
    inputs=gr.Audio(type="filepath"),
    outputs=gr.Label(),
    title=title,
    description=description,
    article=article,
    theme=gr.themes.Base()
)
audio_demo.launch()
This integration enables you to execute straightforward yet impactful audio demos as you can see here!
Conclusions
In this article, we explored how to utilize Ray on Vertex AI to accelerate machine learning (ML) workloads. We demonstrated how to fine-tune a Transformer-based model using audio data and deploy it on a Vertex AI Endpoint, and we showed how to build a demo with Gradio to interact effectively with the deployed model.
By now, you should have a better understanding of how Vertex AI provides a managed platform that empowers you to build powerful applications. And Ray on Vertex AI can help you accelerate your ML workloads using whatever ML framework you prefer.
If you are interested in exploring Ray on Vertex AI, I highly recommend checking out the Vertex AI documentation. Additionally, I encourage you to review the series below.
Scale AI on Ray on Vertex AI Series
This article is part of the Scale AI on Ray on Vertex AI series, where you learn more about how to scale your AI and Python applications using Ray on Vertex AI. And follow me, as more exciting content is coming your way.
Thanks for reading
I hope you enjoyed the article. If so, please clap or leave your comments. Also let’s connect on LinkedIn or X to share feedback and questions 🤗
Thanks Matthew Tang and Amy Wu for feedback and suggestions!
References
- https://arxiv.org/pdf/2303.11607.pdf
- https://huggingface.co/learn/audio-course
- https://medium.com/google-cloud/finetuning-flan-t5-base-and-online-deployment-in-vertex-ai-bf099c3a4a86
- https://www.usenix.org/system/files/osdi18-moritz.pdf
- https://docs.ray.io/en/releases-2.4.0/ray-overview/index.html
- https://cloud.google.com/vertex-ai/docs/open-source/ray-on-vertex-ai/overview
- https://learning.oreilly.com/library/view/learning-ray
- https://github.com/ray-project/ray-educational-materials