Cry Cry Baby: Deciphering Baby Cries

Authors: Jessica Gochioco, Charline Shen, Adam Stone, Jingwen Zhang

This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course.

Photo by Tim Bish on Unsplash

It’s 3AM. They’re crying. You’re crying. Navigating the world as a new parent is a challenge, especially today, when many of us are raising a family far away from our own. It takes a village to raise a child, and that village might look very different today from what it did even a decade ago. Introducing the newest member of your village: the Cry Cry Baby app! Simply record your baby’s cry, and we’ll decipher what they need, whether it’s hunger, tiredness, a need to burp, or other nuances.

We’ve based our model on the Dunstan Baby Language, a system created by the trained opera singer Priscilla Dunstan. Using her impeccable hearing, Dunstan identified five categories that babies’ sounds fall into:

  • Neh = hunger
  • Eh = needs burping
  • Eairh or earggghh = gassy
  • Heh = physically uncomfortable
  • Owh or oah = tired

The seemingly unconventional idea is rooted in the fact that babies express themselves through physical reflexes, translating into audible “words.” It’s a unique approach aimed at making parenting a bit more predictable in the midst of the beautiful chaos.

PROJECT OVERVIEW

Try Our Website!!

Check out our website at http://34.75.108.68.sslip.io/ !

Star Our GitHub!!

Check out our GitHub at https://github.com/charlineshen/AC215_CryCryBaby !

Web App Flow

This is our envisioned flow: a user uploads a recording, which is first assessed by model 1. If no cry is detected, we prompt the user to try again. If a cry is identified, we proceed to model 2, which evaluates the baby’s needs.
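To make that flow concrete, here is a minimal sketch of the two-stage decision logic. The function, label list, and model interface are illustrative assumptions, not the actual Cry Cry Baby code.

```python
# Illustrative sketch of the two-stage flow; model objects and names are
# placeholders, not the real Cry Cry Baby implementation.
NEEDS = ["hungry", "needs burping", "gassy", "uncomfortable", "tired"]

def classify_recording(spectrogram, cry_detector, needs_classifier, threshold=0.5):
    """Return a predicted need, or None if no cry is detected."""
    # Model 1: is this a cry at all?
    if cry_detector.predict(spectrogram) < threshold:
        return None  # no cry detected: prompt the user to try again
    # Model 2: which need does the cry express?
    scores = needs_classifier.predict(spectrogram)
    return NEEDS[int(scores.argmax())]
```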

System Architecture

Model Containers:

  • Download Data
  • Preprocessing
  • Model 1
  • Model 2

App Containers:

  • API Service
  • Front-End
  • Deployment

MODEL

Photo by Scott Graham on Unsplash

Data

As previously discussed, we’re implementing two distinct models to optimize app functionality. The first model determines whether an input sound qualifies as a cry at all, while the second is dedicated to classifying the specific need associated with a cry. Our labeled dataset is sourced from the Donate-A-Cry corpus, meticulously cleaned and annotated with the Dunstan Baby Language as a guiding framework. We also expand our data by incorporating audio from additional sources, including CryCeleb, CREMA-D, and ESC50. This enriches our dataset and especially benefits our binary classifier by providing a broader spectrum of audio data.

Preprocessing

In our preprocessing pipeline, we standardize our audio files to ensure consistency and improve model performance. This involves resampling the files to 16 kHz, adjusting their duration to a uniform 7 seconds through truncation or padding, and transforming the WAV files into mel spectrograms. To maintain uniformity and facilitate model training, we then normalize the values. The 7-second duration was chosen because it matches the typical length of our audio files, and these steps are all well-established, widely used techniques for handling audio data.
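A minimal sketch of this pipeline using librosa might look like the following; the helper name and exact parameter choices (such as the number of mel bands) are illustrative, not our production code.

```python
# Sketch of the preprocessing steps: resample, fix length, mel spectrogram,
# normalize. Parameter choices beyond 16 kHz and 7 s are assumptions.
import librosa
import numpy as np

SAMPLE_RATE = 16_000       # resample everything to 16 kHz
DURATION_SECONDS = 7       # truncate or pad to a uniform 7 seconds

def wav_to_mel(path: str) -> np.ndarray:
    # Load and resample the audio file.
    audio, _ = librosa.load(path, sr=SAMPLE_RATE)

    # Truncate or zero-pad to a fixed length.
    audio = librosa.util.fix_length(audio, size=SAMPLE_RATE * DURATION_SECONDS)

    # Convert to a mel spectrogram on a decibel scale.
    mel = librosa.feature.melspectrogram(y=audio, sr=SAMPLE_RATE)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Normalize values to [0, 1] for model training.
    return (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
```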

Exploratory Data Analysis (EDA)

Let’s delve into a visual exploration of our spectrogram outputs across the various labels. While some differences are evident, we approach our data with caution. The wide range of children’s ages in our dataset raises concerns, because the Dunstan Baby Language is most effective in the early stages, while a baby’s reflexes are intact and before they begin learning their caregiver’s language at around 2–3 months of age.
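As a rough illustration, one way to eyeball the spectrograms per label is to plot a few examples side by side. The sketch below reuses the wav_to_mel helper from the preprocessing sketch above; the file paths and labels are placeholders.

```python
# Plot one mel spectrogram per label for a quick visual comparison.
# Paths are placeholders; wav_to_mel comes from the preprocessing sketch.
import matplotlib.pyplot as plt

examples = {
    "hungry": "data/hungry_example.wav",
    "tired": "data/tired_example.wav",
}

fig, axes = plt.subplots(1, len(examples), figsize=(10, 3))
for ax, (label, path) in zip(axes, examples.items()):
    ax.imshow(wav_to_mel(path), origin="lower", aspect="auto")
    ax.set_title(label)
    ax.set_xlabel("time frame")
    ax.set_ylabel("mel bin")
plt.tight_layout()
plt.show()
```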

Modeling

Since our project focused more on the MLOps side of the course, we started with a simple toy model, with the plan of coming back to improve it. While our binary classification model performs admirably, the same can’t be said for our needs classification model. It plateaus at approximately 84% accuracy and consistently favors hunger as the predicted need. Despite implementing various strategies, such as data augmentation, upgrading to a more sophisticated model with batch normalization and dropout, and leveraging SMOTE to address class imbalance, our accuracy still has not budged.
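For illustration, a small Keras model of the kind we experimented with for the needs classifier might look like this; the layer sizes and input shape are assumptions, not our final architecture.

```python
# Sketch of a small CNN with batch normalization and dropout for the needs
# classifier. Architecture details and input shape are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers

NUM_NEEDS = 5  # hungry, burp, gassy, uncomfortable, tired

def build_needs_classifier(input_shape=(128, 219, 1)):
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_NEEDS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```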

APPLICATION

Photo by Taras Shypka on Unsplash

Front-end

Our front-end is straightforward: simply upload a baby cry audio file in .wav format, and within a few seconds you’ll receive your output!

demo of a hungry baby cry audio

Backend

We use Python’s FastAPI to build our backend because it’s easy to use and automatically generates interactive API documentation.
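As a rough sketch, an upload-and-predict endpoint could look like the following; the route name and response fields are placeholders rather than our exact API.

```python
# Minimal FastAPI sketch of an upload-and-predict endpoint.
# Route name and response fields are placeholders, not the exact API.
from fastapi import FastAPI, UploadFile

app = FastAPI(title="Cry Cry Baby API")

@app.post("/predict")
async def predict(file: UploadFile):
    audio_bytes = await file.read()
    # In the real service, the bytes are preprocessed into a mel spectrogram
    # and passed through model 1 (cry detection) and model 2 (needs).
    return {"is_cry": True, "need": "hungry"}  # placeholder response
```

Serving this with uvicorn exposes the auto-generated interactive docs at /docs, which is the documentation feature mentioned above.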

Deployment

We utilized Ansible for deployment, which lets us create and deploy virtual machines, install software, and configure networking through YAML scripts. After the initial setup, deployment becomes significantly faster, since manual steps like VM creation are automated with a simple script. Ansible also facilitates collaboration through source control systems like GitHub. Both deployments require pushing containers from local machines to Google Container Registry, which underscores the need for a local development mode that mirrors production, so that bug fixes stay efficient and development stays cost-effective.

Deployment requires a secrets folder for GCP connectivity, granting specific access to GCS buckets, the Container Registry, and the Ansible-enabled automatic setup; our README provides detailed instructions. The process involves pushing containers from the local machine to GCR, deploying a GCP VM and setting up networking, provisioning the instance with Docker and Ansible, deploying the containers from GCR to the VM, and deploying NGINX for efficient request handling. Once these steps are complete, the VM’s external IP can be used to view the app.

To improve our deployment strategy, we’re incorporating Kubernetes for better scaling. Our process now includes setting up Kubernetes manifests for deploying containers, helping us manage these containers effectively across a cluster. This approach includes features like auto-scaling and load balancing, managed through Kubernetes services and ingress. These capabilities allow our application to adjust automatically to changing loads, ensuring it remains reliably available.

Kubernetes cluster

CI/CD App Deployment

We manage our CI/CD deployment using GitHub Actions. Tagging a commit in GitHub triggers Ansible scripts that update our containers in the Google Container Registry and then roll those containers out to our Kubernetes cluster.

Automatic Model Training

We also use GitHub Actions to automate model training in response to new data, new preprocessing methods, or changes to the model architecture and hyperparameters.

When a commit is tagged, GitHub Actions triggers an Ansible deployment of the four containers in the model training pipeline, then launches a Python script that generates a Kubeflow pipeline file and submits it to Vertex AI for training. At the end of training, the new model replaces the old one and is made available to the app’s API container.
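In broad strokes, such a script compiles the pipeline with kfp and submits it as a Vertex AI pipeline job. The sketch below shows a single component for brevity; the image URI, bucket name, and pipeline names are placeholders, and the real pipeline wires up all four containers.

```python
# Sketch of compiling a Kubeflow pipeline and submitting it to Vertex AI.
# Image, bucket, and names are placeholders; only one component is shown.
from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.container_component
def train_model_2():
    # Each pipeline step runs one of our containers (placeholder image).
    return dsl.ContainerSpec(
        image="gcr.io/my-project/model2-training:latest",
        command=["python", "train.py"],
    )

@dsl.pipeline(name="crycrybaby-training")
def training_pipeline():
    train_model_2()

# Compile the pipeline spec file, then submit it as a Vertex AI job.
compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
job = aiplatform.PipelineJob(
    display_name="crycrybaby-training",
    template_path="pipeline.yaml",
    pipeline_root="gs://my-bucket/pipeline-root",  # placeholder bucket
)
job.submit()
```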

Challenges & Lessons Learned

The main challenges we faced during the project were around model performance, non-portable containers, slow deployment feedback, and code maintenance.

  • Model Performance: We started out with toy models in order to move forward with our project milestones. However, when we went back to improve model 2, we found that despite all the techniques described earlier, we never really moved beyond 84% accuracy because of our small dataset.
  • Non-Portable Containers: On the technical collaboration side, despite using containers, we still struggled to get the same output on all of our local machines. We suspect different chip architectures are the reason for our dependency and lock-file issues.
  • Slow Deployment Feedback: We also dealt with slow deployment feedback; although GitHub Actions, Ansible, and Vertex AI let us work in a more automated fashion, it can take 30 minutes per iteration to find out whether our changes made any difference.
  • Code Maintenance: Lastly, we found it difficult to maintain both production and development code and so ended up working primarily on production code.

Future Work/Improvements

We have a lot of ideas of what we can do with our app moving forward.

  • Increase Usability: In order to make our app a lot more usable, we’d like to move from a web app to a mobile one.
  • Improve Data Quality & Quantity: We want to improve both data quantity and quality by letting users submit their recordings along with their own labels, and by integrating a baby tracker that helps parents log feedings, sleep, and so on, adding that data as features alongside the crying audio.
  • Improve Model: Once we’ve collected more data, we want to revisit improving our model, likely by fine-tuning an open-source pretrained model.
  • Improve Experience: Finally, we’d like to improve our users’ experience by integrating an LLM chatbot that is prompted with our API prediction and that users can interact with.
