Effect Bath — Dissection

Matt Miller
May 25, 2018

A look at the code behind a Twitter bot: https://twitter.com/effectbath

This Twitter bot was the most media-heavy one I've created. I wanted to document the process of building something like it from start to finish and share the source code. The goal of the bot is to collage 16,000 audio effects into 60-second videos. Repo here.


We need a listing of all the sound effect files in order to download them. Opening up the network inspection tool in your web browser can help find any metadata files that are being loaded to display content on a site like this:

In this case a CSV file is downloaded from the site with all the needed information, so we find the URL of the CSV and save it.

Now that we have a list of all the WAV files, we need to download them. You could use wget, but I just wrote a small Python script that loads the CSV file and politely downloads all the files synchronously. Downloading all the WAV files took up 284 GB.
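The download step might look something like this minimal sketch. The CSV column name ("location") and the output folder are assumptions, not the bot's actual names; check the real CSV header before running it:

```python
import csv
import io
import time
import urllib.request
from pathlib import Path

def wav_urls(csv_text, url_field="location"):
    # Pull the download URL out of each CSV row. The field name
    # "location" is an assumption; adjust it to the real CSV header.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[url_field] for row in reader if row.get(url_field)]

def download_all(urls, out_dir="wavs", delay=1.0):
    # Synchronous and polite: one file at a time, with a pause
    # between requests so we don't hammer the server.
    Path(out_dir).mkdir(exist_ok=True)
    for url in urls:
        name = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(Path(out_dir) / name))
        time.sleep(delay)
```

The `time.sleep` between requests is what "politely" means here: the server sees at most one request per second.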

Now that we have the data, I wanted to split all the sounds into five-second chunks. I used pydub, a really nice Python wrapper around ffmpeg, and wrote a script that loads all the files and splits them. Since we are entering CPU-bound territory (manipulating data in memory), we can start using multiprocessing to speed things up; my computer has 8 cores, so it works on 8 files at once. We also switch the segment files from WAV to MP3, saving 259 GB.
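A sketch of that splitting step, assuming pydub's millisecond-based slicing; the helper names and file paths are illustrative, and the pydub import is deferred into the worker function so each pool process loads it on its own:

```python
import glob
import multiprocessing

def chunk_spans(duration_ms, chunk_ms=5000):
    # Fixed five-second windows over the clip; the last one may be shorter.
    return [(start, min(start + chunk_ms, duration_ms))
            for start in range(0, duration_ms, chunk_ms)]

def split_file(path):
    # pydub slices AudioSegments by milliseconds and len() is the
    # duration in ms; export() hands encoding off to ffmpeg.
    from pydub import AudioSegment
    audio = AudioSegment.from_wav(path)
    for i, (start, end) in enumerate(chunk_spans(len(audio))):
        audio[start:end].export(f"{path}.seg{i:03d}.mp3", format="mp3")

if __name__ == "__main__":
    files = sorted(glob.glob("wavs/*.wav"))
    with multiprocessing.Pool(8) as pool:  # one worker per core
        pool.map(split_file, files)
```

`Pool(8)` is what "works on 8 files at once" looks like in code: each worker takes one WAV, slices it, and writes out its MP3 segments.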

We now need to gather some information about each of the five-second clips: what the intensity is at various points, and what the average intensity is. This first script processes each segment and saves the data. For each segment it creates data that looks like this: https://gist.github.com/thisismattmiller/56bbb3b6fe27f704a4feac63a03fd8c0
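One way to produce those per-part intensity numbers is RMS loudness over fixed windows. This is a sketch, not the bot's exact script; the tick count and the use of raw samples (rather than pydub's dBFS) are assumptions:

```python
import numpy as np

def intensity_values(samples, ticks=10):
    # Split the segment's samples into `ticks` windows and take the
    # RMS of each one. These are the per-part "values" stored in the
    # per-segment JSON; the window count is an assumption.
    windows = np.array_split(np.asarray(samples, dtype=np.float64), ticks)
    return [float(np.sqrt(np.mean(w ** 2))) for w in windows]

def average_intensity(values):
    # Overall loudness of the segment: the mean of its parts.
    return sum(values) / len(values)
```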

Now that we have data for various parts of each segment (the "values" value), we can match them up and average them. This script does that processing, rounding the values into more general buckets. It then saves them out as lookup files to be loaded later.
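The lookup-building step amounts to bucketing segments by rounded intensity. A minimal sketch, with illustrative names and a one-decimal rounding that is my assumption, not necessarily the bot's:

```python
from collections import defaultdict

def build_lookup(segment_values, ndigits=1):
    # Index segment ids by their rounded opening intensity, so that
    # near-equal loudness levels fall into the same bucket and can
    # be matched against each other later.
    lookup = defaultdict(list)
    for seg_id, values in segment_values.items():
        lookup[round(values[0], ndigits)].append(seg_id)
    return dict(lookup)
```

The rounding is the "more general values" part: 0.12 and 0.14 both land in the 0.1 bucket, so either segment can answer a request for that loudness.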

Now that we have all the data ready to reference we can start matching the sound segments together into a compiled audio file.

Matching and Building

I wanted to do two things: match up segments that had similar intensity at their transitions (build_narative_clip), and overlap a lot of clips that had the same intensity (build_overlap_clip). This script has the two functions that do that. In the last step we stored a bunch of lookup JSON files whose index is the "intensity" value for parts of each segment. So we simply pick a random opening clip and then look in the lookup file for a segment with a matching starting intensity. One wrinkle: I wanted to overlap the clips in this mode, so I needed to look at the intensity a few ticks before the end of the current clip to match it up with the new one. The other mode uses another lookup and just picks an intensity at random, then starts overlapping segments at that level, which is much more straightforward.
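The chaining logic of the first mode can be sketched like this. The function and parameter names are illustrative, as is the two-tick overlap; the real build_narative_clip also mixes the audio, which is omitted here:

```python
import random

def build_narrative_chain(segment_values, lookup, length=12,
                          overlap_ticks=2, ndigits=1):
    # Chain segments whose opening loudness matches the previous
    # segment's loudness a few ticks before its end, i.e. at the
    # point where the next clip will start overlapping it.
    chain = [random.choice(list(segment_values))]
    while len(chain) < length:
        values = segment_values[chain[-1]]
        key = round(values[-1 - overlap_ticks], ndigits)
        candidates = lookup.get(key)
        if not candidates:
            break  # nothing opens at this loudness; stop the chain
        chain.append(random.choice(candidates))
    return chain
```

Looking at `values[-1 - overlap_ticks]` instead of the final value is the "few ticks back" trick: the new clip fades in before the old one ends, so that is the moment their intensities need to agree.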

The results are interesting in the build_narative_clip mode: if it gets into a loud trend it will get very loud, because the intensity at the end of each segment ramps up and it has to find equally intense segments to match.

Likewise, in the build_overlap_clip mode it can land on a really low value and match a lot of clips at that same level, making a very ASMR-ish clip:

We do all this compilation using pydub and save out the audio as an MP3. Each MP3 has a meta.json file that contains all the info about which segments were used and at what timestamp they are played.


We now need to build a visual representation to go with the MP3. The strategy here is to render a lot of still images of the compiled sound effect's waveform and then aggregate them into a video. First we need the waveform images, done in this script. The process uses numpy to generate the waveform values and then writes out 600 frames of the waveform animation:
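The core of a waveform frame is just amplitude-per-column math. Here is a sketch that renders one frame as a raw numpy array; the real script writes PNGs, and the dimensions and bar style here are my assumptions:

```python
import numpy as np

def waveform_frame(samples, width=600, height=200):
    # One frame as a binary image array: each column is a vertical
    # bar, symmetric around the midline, scaled to the peak
    # amplitude of that column's slice of samples.
    frame = np.zeros((height, width), dtype=np.uint8)
    columns = np.array_split(np.abs(np.asarray(samples, dtype=np.float64)), width)
    peak = max(float(c.max()) for c in columns if c.size) or 1.0
    mid = height // 2
    for x, col in enumerate(columns):
        bar = int(float(col.max()) / peak * (mid - 1)) if col.size else 0
        frame[mid - bar : mid + bar + 1, x] = 255
    return frame
```

Sliding a window over the audio and calling this once per step would give the 600-frame animation; each array can then be written out as a numbered PNG.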

So we have 600 images of the waveform playing; we now need to add the text. Using this script we create a large base image of 640x420 (nice) to paste the waveform image into and write out the name of the sound effect. I was going to use a 640x640 image, as per Twitter's recommended media best practices, but I did not need 640 on the vertical so I reduced it. This script does the work of planning out when to display the text, measuring each piece of text and placing it in composition with the waveform image. Since we know the length of each clip and there are 600 frames to work with, we can plan out when to fade in each piece of text.
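Mapping clip timestamps to frame numbers is simple arithmetic once you know the frame rate. A sketch, where the five-frame fade length is an assumption:

```python
def text_fade_frames(clip_starts_s, fps=10, fade_frames=5):
    # Map each clip's start time (seconds) to the frame range over
    # which its title fades in. 10 fps follows from 600 frames over
    # a 60-second clip; the fade length is illustrative.
    return [(int(s * fps), int(s * fps) + fade_frames)
            for s in clip_starts_s]
```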

The result is 600 frames of the combined waveform and text:

All that is left is to make the 600 frames into a real video using a direct ffmpeg call:

os.system(f"ffmpeg -r {rate} -i '{wave_file_name}/gif_frames/%03d.png' -vcodec mpeg4 -vb 15000k -y {wave_file_name}/video.mp4")

We can adjust the frames per second using the -r argument. Since we know there are 600 frames over the course of 60 seconds, we can calculate how long each frame needs to stay on screen for the video to line up with the audio file.
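Worked out, the -r value is:

```python
num_frames = 600
duration_s = 60
rate = num_frames / duration_s    # -r value: 10 frames per second
frame_duration_s = 1 / rate       # each frame is on screen for 0.1 s
```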

Then we can combine the video and audio together:

os.system(f"ffmpeg -i {wave_file_name}/audio.mp3 -i {wave_file_name}/video.mp4 {wave_file_name}/final.mp4")

This re-encodes the two source files into final.mp4.

The next step is to build a generative title for the clip out of the titles of all the clips used to make it. I used textgenrnn for this: this script trains a little neural network on the titles used and generates a new one, then modifies the meta.json file with the new title.

The last step is to clean up the build files.

When I need to generate some new clips I run:

python match_and_build.py && python build_waveform_frames.py && python build_gif_frames.py && python build_titles.py && python remove_build_files.py


We now have the content: the video file we want to tweet. The idea is to have a stockpile of content ready to go and tweet one out every X hours. So I generated a bunch of content using the commands above, each saved in a directory with a unique UUID name.

The idea is to upload all these UUID folders to an AWS S3 bucket, ready to be used. I created a bucket on AWS, and using the AWS command line tool I uploaded all the data:

aws s3 cp . s3://bbc-sound-bot/todo/ --recursive

This copies all the files into a folder called "todo" in my bucket.

Now that the data is in the AWS ecosystem, I can write a Lambda app that posts it to Twitter. I set up a new Lambda and uploaded my code as a zip file; the deployment zip needs to have all the modules compiled and ready to be used (you can use this repo for your bot, you just need to modify and zip it). I just needed to write the Python (2.7 in this case) script that does the posting. The script picks a random UUID directory name from S3, logs into Twitter, posts using the video and meta.json file, and at the end deletes the used UUID folder from S3. You will need credentials from http://apps.twitter.com/ to populate the script. The Lambda setup looks like this:
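The "pick a random UUID folder" step boils down to grouping the S3 keys by their folder component. A sketch of just that selection logic (the real Lambda would use boto3 to list the keys, post via the Twitter API, and then delete the folder, all omitted here):

```python
import random

def pick_todo_key(keys):
    # Group S3 keys like "todo/<uuid>/final.mp4" by their UUID
    # folder and pick one at random. The Lambda would then fetch
    # that folder's video and meta.json, tweet them, and delete
    # the folder so it is never posted twice.
    folders = sorted({k.split("/")[1] for k in keys if k.count("/") >= 2})
    return random.choice(folders) if folders else None
```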

It has permissions to modify S3 and is triggered by a recurring (every 5 hours) CloudWatch event.

I like this setup: I don't need to worry about running a server, I know the function will be triggered every 5 hours, and as long as there is content in the todo folder it will post to Twitter.

That is the general outline of how this bot works, and the general process behind most of my Twitter bots.