Tried LSTM music generation

My friend trained an LSTM network to generate ancient-style Chinese poems. If you can read Chinese, you can tell that the generated results are pretty plausible:


Although the content of the poems is arguably meaningless, it would take an unsuspecting reader a while to figure that out, as ancient Tang poems are meant to be obscure. We immediately noticed two things in the results. First, the format of the poems is exactly correct: each poem consists of either 4 or 8 short half-sentences, and each half-sentence has either 5 or 7 characters. Second, the sentences rhyme, even though the pronunciation of the characters was never part of the input. More than that, Tang poems and many other Chinese literary forms tend to pair words with similar or opposite meanings across adjacent sentences, and the network learned this as well. In the result above, for example, the first half of the second sentence ends with "Snow (雪)", and in the following half the corresponding character is "Frost (霜)".

Similar experiments are easy to set up thanks to the char-rnn project. I tried applying it to the novels of Jin Yong, who I think of as the Chinese counterpart of George R. R. Martin, hoping that a well-trained network could generate some funny fictional pieces. But the network didn't converge well: I couldn't find any phrase longer than 5 characters that made sense. It did get the punctuation marks right, though, so if you stand 5 meters away from the text, it still looks plausible:
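char-rnn trains a character-level LSTM and then generates text one character at a time by sampling from the softmax output; a sampling temperature controls how conservative or adventurous the output is. A minimal sketch of that sampling step (the vocabulary and logits below are made up for illustration, not taken from my runs):

```python
import numpy as np

def sample_next_char(logits, vocab, temperature=1.0):
    """Sample the next character from a char-rnn style output distribution.

    Lower temperature sharpens the distribution (safer, more repetitive
    text); higher temperature flattens it (more surprising, more errors).
    """
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return vocab[np.random.choice(len(vocab), p=probs)]

# Toy vocabulary of four characters; the third logit dominates,
# so at low temperature the sampler almost always picks "雪".
vocab = ["，", "。", "雪", "霜"]
print(sample_next_char([0.1, 0.2, 3.0, 0.5], vocab, temperature=0.5))
```

At generation time this step is repeated in a loop, feeding each sampled character back into the network as the next input.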

LSTM generated Jin Yong novel

In the above result, you can easily find names from Jin Yong's works, such as 姑苏慕容氏, 完颜洪烈, 成昆 and 殷素素. This means the network recognized them as short terms, but it never understood that they can be abstracted into the concept of a name. When generating a fake novel, names can be made up; they don't have to come from the original works. This made me wonder how far we still are from real artificial intelligence.

As a second try, I trained a network to generate music. Because the character vocabulary for music notes is much smaller than that of Chinese, and because judging the quality of a piece of music is not as easy as judging a piece of text, I hoped for more exciting results.
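One way to feed music into a char-rnn is to serialize the notes as plain text, for example in ABC notation, and build the character vocabulary from that corpus, exactly as char-rnn does for any text. The vocabulary ends up a few dozen symbols at most, versus thousands of distinct characters in a Chinese corpus. A sketch of that preprocessing step (the tune fragment is made up for illustration):

```python
# Serialize music as text (ABC notation here) and build the character
# vocabulary, the same preprocessing char-rnn applies to any corpus.
tune = "X:1\nT:Demo\nK:C\nCDEF GABc | cBAG FEDC |"  # made-up fragment

vocab = sorted(set(tune))                       # tiny alphabet
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
encoded = [char_to_idx[ch] for ch in tune]      # integer sequence for the LSTM

print(len(vocab))  # a few dozen symbols at most
```

The smaller the vocabulary, the fewer output classes the softmax layer has to distinguish, which is why a music corpus is in principle an easier target than Chinese text.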

I still haven't found the optimal parameters for training the network; I ran into convergence issues and overfitting. But some of the short generated pieces are actually pretty good.

This is a raw output of my network.

Raw output of LSTM

And this is a piece I composed from several outputs.

Crunchy Apple Chips

I'm still experimenting with the network for better results.

Sheet music of my AI-generated piece