Music transcription (converting raw audio into note information) is a sequence-to-sequence task, which recurrent neural networks are especially well suited for. However, there are a few tricks needed to make this work. For one, there is a rate mismatch between the input sequence (which may be sampled at 44.1 kHz) and the desired output sequence (which may be quantized to sixteenth notes at 80 bpm or similar).
The first trick is to perform the (discrete) short-time Fourier transform, a.k.a. STFT, on the input sequences. This achieves two things. First, it extracts spectral information from the input audio, and this spectral information tends to be quite useful for detecting pitch and timbre. Second, it reduces the rate of the input sequence while increasing its dimensionality. For example, if a frame size of 1024 with a frame step of 256 is used on an input audio file sampled at 44.1 kHz, the resulting output sequence will consist of ~172 frames per second, each with 513 dimensions of spectral information. …
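The frame-rate arithmetic above can be checked with a small NumPy sketch. This is only an illustration under stated assumptions, not the code from the article: the helper names stft_shape and stft_magnitude, the Hann window, the lack of edge padding, and the 440 Hz test tone are all choices made here.

```python
import numpy as np

def stft_shape(num_samples, frame_size=1024, frame_step=256):
    """Return (num_frames, num_bins) for an STFT with no edge padding."""
    num_frames = 1 + (num_samples - frame_size) // frame_step
    num_bins = frame_size // 2 + 1  # real FFT keeps the non-negative frequencies
    return num_frames, num_bins

def stft_magnitude(signal, frame_size=1024, frame_step=256):
    """Magnitude STFT using a Hann window (the window choice is an assumption)."""
    num_frames, _ = stft_shape(len(signal), frame_size, frame_step)
    window = np.hanning(frame_size)
    frames = np.stack([
        signal[i * frame_step : i * frame_step + frame_size] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate      # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)          # hypothetical 440 Hz test tone
spec = stft_magnitude(tone)

frames_per_second = sample_rate / 256         # long-run frame rate
print(spec.shape, frames_per_second)
```

One second of audio gives slightly fewer than 172 frames here because no padding is applied at the edges; the long-run rate is 44100 / 256, or roughly 172 frames per second, and each frame has 1024 / 2 + 1 = 513 frequency bins.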
This article is a continuation of a previous article:
In the previous article, we discussed how to vectorize text data and store minibatches of this vectorized data as sparse tensors, for efficiency. In this article, we take a look at how to create a generator for feeding this data into an RNN.
The first step is to create a function to load a particular minibatch from storage into memory. In this case we are reading from a file path, but this could easily be modified to read from a database:
Recall that our sparse tensors were stored via their “shape” together with arrays for “indices” and “values”. We use tf.sparse.SparseTensor to reconstruct a TensorFlow SparseTensor from these three pieces of data, and then use tf.sparse.to_dense to convert it to a dense tensor. We assign the result to a variable “var” of the appropriate shape passed as an argument. …
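To make the “shape”, “indices”, and “values” representation concrete, here is a minimal sketch that rebuilds a dense array from those three pieces. It deliberately uses NumPy as a stand-in rather than tf.sparse.SparseTensor and tf.sparse.to_dense, and the function name sparse_to_dense and the sample data are made up for illustration.

```python
import numpy as np

def sparse_to_dense(indices, values, shape, dtype=np.float32):
    """Rebuild a dense array from COO-style (indices, values, shape) data.

    NumPy stand-in for the SparseTensor -> dense conversion described above;
    positions that are not listed in `indices` are filled with zeros.
    """
    dense = np.zeros(shape, dtype=dtype)
    indices = np.asarray(indices, dtype=np.int64)
    if indices.size:
        # Each row of `indices` is one coordinate; transpose to index per axis.
        dense[tuple(indices.T)] = values
    return dense

# Hypothetical stored minibatch: three non-zero entries in a 2 x 3 array.
indices = [[0, 1], [1, 0], [1, 2]]
values = [3.0, 4.0, 5.0]
shape = (2, 3)

batch = sparse_to_dense(indices, values, shape)
print(batch)
```

The same three arrays could be passed to tf.sparse.SparseTensor in the TensorFlow version; the NumPy form is only meant to show what the dense result contains.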
Recurrent Neural Networks (RNNs) have an uncanny ability to model natural language. The basic idea is to train an RNN to learn the probability distribution over possible next words of a sentence given the start of a sentence. So after training, the RNN can take in a partial sentence like “The dog looked for the” and give the conditional probability of the sixth word of the sentence being “bone” versus “hydrant” versus “cat” versus … etc. given that the first five words of the sentence are “the”, “dog”, “looked”, “for”, and “the”. …
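As a rough illustration of what “a probability distribution over possible next words” means in practice, the sketch below turns per-word scores into probabilities with a softmax. The scores are invented for this example and the three-word vocabulary is far smaller than a real one; in a trained model the scores would come from the RNN's output layer, which is an assumption about the architecture rather than something stated above.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical scores for the sixth word after "The dog looked for the".
vocab = ["bone", "hydrant", "cat"]
logits = np.array([2.0, 0.5, 1.0])      # made-up numbers, not model output

probs = softmax(logits)
next_word = dict(zip(vocab, probs))

for word, p in next_word.items():
    print(f"P({word} | the dog looked for the) = {p:.3f}")
```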
If you’ve used the online Learn.co platform with GitHub integration, you might have accumulated a lot of automatically generated repos storing your work, which clutter up your GitHub account. This can make it harder for people to find the repositories where you are doing original work. For this reason, I’ve developed a relatively simple tool called learnco-nuke to automatically back up and then delete these repos from GitHub for you.
You can get started by cloning the learnco-nuke repository here:
There are instructions in the Jupyter Notebook, but we’ll go over the basics here. The first step is to create a personal access token on GitHub with the “repo” and “delete_repo” authorization scopes. The following link goes over how to create a token: https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line …
Check out Part I of this series here:
In the previous part, we explained the intuition behind autoencoders and gave a general mathematical formulation of what an autoencoder is. In this part, we’ll prove that a certain class of autoencoders can optimally learn polynomial regression.
Let’s suppose that our inputs are tuples of real-valued measurements at times t=1,2,3,…,n, where n is very large (say 100,000), and each input consists of a signal plus some random i.i.d. Gaussian measurement noise at each measurement time. Perhaps we have reason to believe that the underlying signal for each input is a degree k polynomial in t, where k is small relative to n (say 20). …
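For intuition about this setup, the sketch below generates one such input and recovers the signal by ordinary least-squares polynomial regression. It is not the autoencoder from the article; it only shows the regression baseline. The sizes (n = 1000, k = 3), the noise level, and the particular coefficients are assumptions chosen to keep the example small.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

n, k = 1000, 3                    # much smaller than the n and k in the text
t = np.arange(1, n + 1)           # measurement times t = 1, 2, ..., n

# Hypothetical degree-k signal, defined on times rescaled to [-1, 1].
x = np.linspace(-1.0, 1.0, n)
true_poly = Polynomial([0.5, -1.0, 0.0, 2.0])
signal = true_poly(x)

# Each input is the signal plus i.i.d. Gaussian measurement noise.
noisy = signal + rng.normal(scale=0.5, size=n)

# Least-squares fit of a degree-k polynomial in t.
# Polynomial.fit rescales t internally, which keeps the problem well conditioned.
fit = Polynomial.fit(t, noisy, deg=k)
denoised = fit(t)

noisy_rmse = np.sqrt(np.mean((noisy - signal) ** 2))
fit_rmse = np.sqrt(np.mean((denoised - signal) ** 2))
print(noisy_rmse, fit_rmse)
```

Because only k + 1 coefficients are estimated from n noisy measurements, the fitted curve sits much closer to the underlying signal than the raw measurements do.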
This article is targeted towards a mathematical audience. We aim to explain why autoencoders work, and prove rigorously that for certain classes of input data, autoencoders converge to an optimal representation. We’ll try to give necessary background information where feasible, but already having a solid understanding of linear regression and statistics will be helpful here.
The basic structure of an autoencoder is an “encoder” network chained before a “decoder” network, stuck together in the middle like two funnels glued together on their narrow ends. The basic idea is that the encoder network takes high-dimensional input data, and produces a low-dimensional “code” or “latent representation”. …
When we represent uncompressed audio digitally, we often represent it as a one-dimensional array of values, corresponding to the amplitude of the waveform over time.
However, psycho-acoustically, audio does not feel like it is described by a single real number for each moment of time. Rather, when we hear audio, we notice all sorts of qualities of the audio like pitch and timbre. These qualities are best described as “spectral properties”. They are properties most readily identified from the Short-Time Fourier Transform (STFT) of the signal. Here is an example of plotting the STFT of someone speaking the word “one”.
Working in circles with other techies can distort your view of reality in ways that are dangerous for those outside those circles. Designers of algorithms can inflict harms in ways both subtle and profound on people who have no voice in the process, and we should remain constantly aware of this fact.
We are all likely aware of the horror stories coming out of Amazon warehouses: employees being forced to pee in bottles to avoid missing inhumanely strict productivity targets set by machines. Perhaps we’ve also heard stories of people being wrongly denied unemployment benefits by the MiDAS system in Michigan. As much as we’d like to paint these as just mistakes or oversights, the root cause of these problems is that the people writing the algorithms and designing these automated systems don’t feel close to those who will be impacted most by them. …
If you’ve ever tried to collaborate on Jupyter notebooks in a git repository, you’ve probably noticed that merge conflicts emerge far more often than you would expect. The reason for this is that the notebook format, .ipynb, does not play nicely with the line-based diff algorithm that git uses by default.
Fortunately, there’s a relatively simple fix that integrates with git quite easily called nbdime. You can take a look at the documentation here: https://nbdime.readthedocs.io/en/latest/
For now, let’s just go over the basic steps to integrate nbdime with git and the usual workflow. If you have pip, it’s two lines in the terminal.
Running nbdime mergetool will open up a web-based interface for viewing the differences between two notebooks when merging.
Hope this helps you out!
A choropleth is a color-coded map which allows you to visualize a variable in terms of familiar geography. There are a number of libraries capable of generating choropleths in Python, but if we want to allow users to interact with our data, sometimes we’d like to export a choropleth into HTML format.
Here’s an example of what the final product looks like:
All the code used to generate this can be found in the GitHub repository I’ve set up for this project: https://github.com/jmsmdy/html-choropleth-generator
To start with, we need to find a “shapefile” for the geographic region we wish to map. There are many freely available shapefiles online, broken down into many different levels of detail. For US shapefiles, a good source is the US Census website: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html …