How artificial intelligence will do to music creation what Instagram did to photography

Rethinking creativity and accessibility for music in a world where “there will be ten million songs a day.”

40 min read · Jul 29, 2019

Below is a transcript of an interview with Alex Mitchell, founder/CEO of Boomy, for the Water & Music podcast. Hosted by me, Cherie Hu, the podcast unpacks the fine print behind big ideas at the intersection of music and tech, featuring a curated selection of leaders, artists, thinkers and innovators from across the music business. You can listen to the podcast on Spotify, Google Podcasts, Apple Podcasts, Stitcher, TuneIn, Overcast, Pocket Casts and many other platforms.

Boomy, a startup that just launched out of beta a few weeks ago, is building accessible tools to help users not only make and edit “instant music” on the fly using artificial intelligence, but also distribute and monetize that music directly on paid streaming services like Spotify and Apple Music.

What you’re about to hear is definitely one of the most interesting conversations I’ve ever had about this space of A.I. and music creation. Some of the questions we dive into include:

What’s wrong with the phrase “A.I.-generated music”?
What’s the difference between optimizing for creativity versus for accuracy in a machine-learning algorithm in the context of art?
What’s a “win condition” for A.I. in music — i.e. how do we know if a musical algorithm actually works?
How is A.I. going to have the same effect on music creation that Instagram had on photography?

Hope you enjoy. :)

Cherie Hu: Hey, Alex, thanks so much for joining this podcast.

Alex Mitchell: Yeah, hey, thanks so much for having me.

CH: Today is a super exciting day for you, because as of recording this, you’ve just launched Boomy out of beta.

AM: One hour ago.

CH: One hour ago. Wow. [laughs]

AM: Yeah.

CH: That’s awesome. So, I definitely want to dive into the actual product and your takeaways from the beta — but before that, I want to talk about more high-level concepts in this field of A.I. and music creation.

AM: Sure.

CH: I want to start specifically with something that I know you feel very strongly about, which is the misnomer of “A.I.-generated music.”

AM: Yeah, and I appreciate starting there, too, because we’ve been a little bit quiet — this is the first interview I’m doing about Boomy, and we’ve been in beta and we’ve been seeing what our users are doing, and then other people have kind of picked up on this, and they’ve started writing about us, and they keep describing what we do as “A.I.-generated music.”

I’m curious to get your thoughts on this too, but my thought on this is that you wouldn’t call what Slash does “guitar-generated music.” And you wouldn’t call what Drake does, like, “autotune-generated music.” I think this notion of “A.I.-generated” — it puts kind of a weird emphasis on the tool. And while we all are very proud of the technology that we’ve built, really this is just another way of making music, and a really cool, efficient way of making music. I think many of our users think of themselves as artists, even if they’ve never made music or studied music prior to using Boomy.

CH: I’m totally with you. I think it’s related to how, in general, a lot of the conversation around A.I. and arts, or A.I. and work, is understandably very fearful and very skeptical — to the point where, when we’re talking specifically about music, we do kind of give all responsibility and all credit to the tool. It’s like, this algorithm was “sitting in their bedroom” and producing all this music. [laughs] Instead of an actual human being controlling the software.

AM: And I think there’s a big difference between what we were testing over the last several months — very much as a minimum viable product that had a limited set of functionality, where basically up until today, pretty much all you could do with Boomy was generate songs and save or reject them. You could do some editing, but the editing was limited. Over the last several months, we’ve been talking to our users a lot … and the thing that everybody asked for was, “I want to edit this more. I want to change it, I want to be able to change the influence after the fact.” Nobody really is asking us to make the A.I. better — I mean that’s happening anyway — but what everybody wants to do is manipulate the song.

One of the reasons we kept it in beta [for] so long is because we felt it wasn’t really a complete product experience until you could really change the track — delete things; add things; change sounds; delete sounds; add your own vocals, which is coming very soon — and things like that, where it’s really just a very simple way of creating some really compelling, really powerful music.

CH: Were there any musicians, who maybe could be called “traditional,” who were using Boomy as part of the beta? And if so, what was their experience like using the tool? What did you hear from them?

AM: Yeah, there are a bunch. So one of the cool things that we’ve been able to do is give you the ability to take what you’re doing on Boomy off of the platform. So when you create a song — and that song gets created in five seconds now — you can then take it off in MIDI format, you can take it off in stems and in WAVs, and download MP3s.

I was talking to a pretty prominent songwriter recently and he was creating beats with Boomy, and as soon as he saw how fast and efficient it was, and that it’s actually usable, he was like, “Oh my god, I’m going to use this in sessions all weekend.” … When he works with rappers, you know, they’re just kind of going through beats, and now they can do that without having to worry about licensing, without having to worry about potential issues with rights. We’re seeing composers who have taken inspiration out of some of the things that they’ve made on Boomy. We made sure, pretty much from day one, that if you want to take what Boomy’s done with you, and take it off-platform and integrate it into your own work, we made it super easy to do so.

CH: Thinking about what actually makes an algorithm good, something that you’ve written about on the Boomy blog is this concept of optimizing for creativity versus for accuracy. I was wondering if you could elaborate on what exactly you’re referring to in that situation. Also, you said on the blog in the case of Boomy that you’re aiming for the former, in terms of optimizing for creativity. So if you could talk about both of those things.

AM: Yeah, absolutely. So I think it has a little bit to do with just how deep learning and machine learning works. It might be helpful to talk just for two seconds about what artificial intelligence is — and what it means from an engineering standpoint, versus what it means socially, or what it means in [the] press sometimes.

When we talk about artificial intelligence, you’re really talking about a set of data processing technologies, right? So it’s an advanced way of analyzing a bunch of data very quickly, and allowing the algorithm to figure out what the best way to process that data is. This is technology that you’ve seen be applied to photos pretty extensively. So, you know, here’s a picture of a person, or here’s a bunch of pictures of people, here’s a bunch of pictures of dogs. You, algorithm, go figure out what’s a person and what’s a dog — and then now here’s this new photo, is it a person or is it a dog?

These systems have been designed for accuracy. In a system like that, you want to be able to distinguish — and you’re training the algorithm to distinguish accurately — what is a dog or a person. And there’s been some amazing research in applying this to music, because we have this great format called MIDI, and you can do something very similar to it where you are listening, effectively, or analyzing a bunch of that data, and then trying to replicate something out of it.

So one of the issues that you get into with that is what I call the “over-optimization problem” — where if you’re asking a trained algorithm that’s been trained on a bunch of MIDI data [to] write a song that’s “accurate,” it’s trying to find what’s called a local minimum, so it’s going to write the same song every time — or very similar-sounding songs. If you’re giving it a reference and you’re saying, “here’s what a ‘good’ song sounds like and here’s what a ‘bad’ song sounds like,” you can get into this issue where there isn’t really enough variability and it isn’t super creative.

And so one of the things that we set out to do that was kind of the whole point of, I think, what we’re bringing to this field, is look at two areas where we can actually try to use that same concept a little differently. And instead of trying to make an accurate song, try different creative ideas very rapidly — and do this both on the composition side and on the production side. If you do it fast enough, what you can get is something that sounds like totally stupid, right? Something that doesn’t sound like music at all — it just sounds like a bunch of garbled nonsense. But, that garbled nonsense might actually sound cool to someone. And then over time as we start refining the algorithms and working with the data that we’re getting back from our users, we’re able to refine this workflow into something where, now, we can start training individual creative profiles of our users. And that’s what’s really exciting.

So that’s a lot of words. I think what I’m trying to get across is, if you try to write an accurate song, you’re going to end up with something that kind of sounds the same all the time. If you’re trying to write a creative song, it necessarily has to be bad sometimes. I think a lot of artists have had the experience where you’re in the studio, you’re recording a track, and then, like, the guitar bumps the amp or something — but it ends up sounding really cool, and that ends up adding something to the track that was technically a mistake, but still sounds really good. That’s kind of what we tried to represent, model and include in the tracks that are being created on Boomy. So you’ll get songs that have mistakes, but those mistakes are actually really interesting.

Our theory was: so long as we can create these songs very quickly — it took fifteen minutes to create a track, which was too long; we did a bunch of engineering, and I’m happy to report we’re doing it in under five seconds once you’re connected to a server — if you can do this fast enough, then we can start rapidly trying new kind of out-there, weird ideas, throw them up to the user and say, “hey, does this reflect you?” And the user can save or reject it, and over time, again, we’ve developed that profile.
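The save-or-reject loop Mitchell describes can be sketched as a simple running score per song trait. This is only an assumption about how such a profile might work, not a description of Boomy's system; the `TasteProfile` class, the trait labels and the learning rate are all hypothetical.

```python
class TasteProfile:
    """Toy per-user profile: a running score for each (hypothetical) song trait."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.scores = {}

    def record(self, traits, saved):
        """Nudge each trait's score up on a save and down on a reject."""
        delta = self.learning_rate if saved else -self.learning_rate
        for trait in traits:
            self.scores[trait] = self.scores.get(trait, 0.0) + delta

    def rank(self, candidates):
        """Order candidate songs (lists of traits) by total score, best first."""
        def total(traits):
            return sum(self.scores.get(trait, 0.0) for trait in traits)
        return sorted(candidates, key=total, reverse=True)


profile = TasteProfile()
profile.record(["heavy bass", "fast"], saved=True)   # user saved this song
profile.record(["strings", "slow"], saved=False)     # user rejected this one

ranked = profile.rank([
    ["strings", "fast"],
    ["heavy bass", "fast"],
    ["strings", "slow"],
])
print(ranked[0])  # ['heavy bass', 'fast']
```

The faster songs can be generated, the more save and reject signals a loop like this collects per session, which is why the drop from fifteen minutes to under five seconds matters so much to the approach.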

CH: I think allowing users to edit, and then personalizing output to the users’ tastes, is really key — rather than saying that, like, “this is the definitive output of an upbeat folk song, and this is what you’re going to get, and the individual’s taste isn’t going to be taken into account.”

AM: Right.

CH: What I’m going to play in this episode are some sample tracks you sent me ahead of time; they’re two pairs of hip-hop and strings tracks, one of which in each category is more “accurate” versus more creative.

So in hip-hop, you have a track called “Nameless” that’s more accurate. Supposedly.

AM: And for what it’s worth, these are weird names because we actually have an A.I. that writes the names, too. I think maybe some of them came from users, but on our platform, you can name your songs with A.I., and you can actually create cover art when you distribute with A.I., too. But we can talk about that later.

CH: For sure. So, there’s “Nameless” which is more accurate in hip-hop…

CH: And then the more creative [hip-hop] track, which I’ll play here, is called “Pro-Gladitorial.” Am I pronouncing that correctly?

AM: I have no idea. I guess.

CH: [laughs] OK. I’ll just put that out there.

CH: For strings, the more accurate track is called “Acoustic Screech”…

CH: And then the more creative one is called “Turbulent.”

AM: Yeah.

CH: My gut reaction listening to these tracks was very similar to what you just laid out. For the more accurate tracks, when I was listening through, they were much more rhythmically and melodically predictable. So, like, for the “Nameless” hip-hop track that was more accurate, I was, like, oh yeah, that’s, like, a standard hip-hop beat, and — oh, yeah, now there are all these, like, cow bells and other, like, percussive embellishments on top, and oh, now the same bass line is coming back… so it was like that sense of familiarity that you do sense, actually, in a lot of mainstream music today that may or may not have been made with artificial intelligence.

And then the more creative tracks definitely had a lot more curveballs that I wasn’t expecting, in terms of both rhythm and melody. And it seemed like those tracks were consistently morphing, even just over the span of a minute as I was listening to them — where even at the 30-second mark, we’re at a different place from where we were at the five-second mark.

AM: Right.

CH: It’s definitely more interesting that way.

AM: And they certainly get a lot crazier than some of what I sent you. All you got to do is go to Boomy.com and use the “unlimited” filter a few times, and it’ll go a little bit crazy.

But yeah, I think it’s important that what we’re doing is not just sort of best-guess copies of music that came before, right? If that’s all that this technology is going to do, the market applications are going to be pretty limited. And it’s not really going to fulfill the mission of the company for us, which is to bring music-making to everybody, for free.

The creativity piece really comes in, in leaning into the inconsistency sometimes, and then creating evaluators to figure out, well, is this cool inconsistency? Is this good? Like, what is “good” is really core in what we’re doing. And for us, we don’t necessarily want to assume that. We don’t want to say, you know, make us songs that sound exactly like this other artist or this other thing; we want this to really be a reflection of the user. And so if we start from a very wide base of creativity, the more you save and edit on Boomy, the more we can kind of get to what you would write if you were a musician. That’s what this technology can enable. We think that’s really exciting.

CH: I’m actually thinking back now to a panel that I moderated on this topic at the NY:LON Connect conference in New York. I asked a question that, in hindsight, I realize maybe is a false equivalence … at the time, Google had just launched AlphaGo, which was some kind of machine-learning algorithm that could beat the world champion at the game Go.

One of the first questions I brought up during that panel was, “What is the equivalent of AlphaGo for music?” [laughs] Part of the discussion kind of led to the fact that music is not a board game, arguably. There isn’t a set number of super solid rules. Obviously, there are some things that could be codified in terms of music theory — like, these are the chords that tend to come up in a pop song, or something like that. But the rules are much hazier.

I’m curious to hear your thoughts on that. Like, what is the “AlphaGo equivalent” [for music] — is that kind of going the more “accurate” route? Or might there be a parallel there as well?

*Screenshot from Netflix’s documentary “AlphaGo,” named after the infamous computer program.*

AM: So, yeah, I think the important thing to note with something like AlphaGo, or some of the great research that OpenAI has done around gaming, is that there’s a win. There’s a win condition. You won the game, or you didn’t win the game. Right? The way they did that is, they made that algorithm play [Go] a bazillion times, some crazy number of times on a server, and let it kind of figure out every possible outcome.

For us, we have to stay really honest about: what is a “win condition” in music? Some of the research I think that’s been done by feeding it a bunch of reference tracks that says, hey, make something like “this” — again, it’ll just make that song over and over again. In my last company Audiokite Research [acquired by ReverbNation in 2016], we worked a lot with A&Rs, and any A&R will tell you — I mean, I’m not going to speak for all A&Rs — but a lot of A&Rs will tell you that they’re not looking for something that sounds like what’s popular today. They’re looking for what’s going to sound popular in six months, a year, two years. They want to develop artists; they want to develop new sounds. Right?

So I think, whereas academically it’s super interesting to look at what it takes to analyze folk songs and create new folk songs that kind of sound similar — and there’s an interesting kind of data science going on there — for us, it’s a lot more interesting to say, how do we create something new? How do we create something that’s totally different from anything you’ve heard before? I think if you listen to some of the songs that Boomy’s producing, you know, some of them you can definitely hear influences of genre, but other ones are totally new. They’re totally different. One of our users tweeted the other day, “I just invented a new genre of music.” And we were like, “Yes. You got it. That’s exactly what’s going on here.” Because the songs he was producing, you know, they’re electronic, but it would be really hard to classify them, according to things that came back — but they still sound cool. They still sound good.

It almost is an A&R thing, in some ways — to say that creative music is going to be something that you haven’t heard before, accurate music is going to be something that you have.

CH: Yeah, I think framing it as “what is the win condition” for music is super useful in this day and age when genre is already so fluid, and you have artists who would traditionally be classified as hip-hop totally going into electronic, totally going into country. The whole thing around “Old Town Road,” that’s kind of the epitome of where we are in 2019. In this environment, a win condition is kind of the opposite of imitating what has been done before, for sure. Or like, the win condition is pushing the culture forward in some way, and opening up people’s minds in some way.

AM: Yeah, I think it’s going to be different for each user — and I can tell you what it means to us: I think the win condition for us is meaning. If we can help you make a song that’s meaningful to you, in some way or in any way, then we’ve won. Right?

And again, by using a system that can create an incredibly wide variety of output — that can produce things that are accurate if you want, but that can produce stuff that’s totally crazy and new — then that meaning that you bring to it will inspire you to use music in ways that we might not necessarily even know today. We certainly saw some behaviours during the beta from our users that were super interesting, that we weren’t expecting. And I think it’s because very quickly, they were able to find meaning in their songs.

And so, you get into this, like, well, what’s a “good” song and what’s a “bad” song? I think it’s really simple for us: what is the lower bound of effort that it takes for somebody to get from never having created a song in their life — or maybe [never] even thought about creating a song in their life — to having something that means something to them, and that they can go use?

CH: You mentioned both professional artists and also everyday people are creating music in a whole new way through Boomy. I’m wondering: what other use cases have come up, at least through the beta program? Are people using it primarily as a creative tool, and then maybe exporting it, downloading it to incorporate it into their songs, or are there other use cases that have come up?

AM: It’s part of what we set out to learn. The way I say this to the team is that we’re solving a market problem as much as we’re solving a technology problem.

The way we’re going about this is, we’re designing certain styles — so you can come into Boomy and select a style — and the first style that we designed was Beats. We’re pretty good at that right now. So we can make beats, they sound like hip-hop beats, they can sound like EDM beats — we’re going after hip-hop artists, so we have a lot of artists who are using us to create beats so they don’t have to worry about rights, and then put them into their songs. And that was kind of where we started. It was like, “hip-hop people are going to use this to make songs.” And we did see some of that.

But here’s something else we saw. So we had a whole bunch of gamers sign up for the beta — like sort of younger gamer kids — and they were making these EDM tracks. And we were looking at these song title names that they were giving their songs, they didn’t make any sense to us. So we had to Google it. And as it turns out, what they were doing was they were creating whole EDM tracks just to make fun of another player on Discord for losing at the game. [laughs] And another player would send them this whole EDM song back being like, “haha, no, I beat you this time.” That’s what we were able to discern from that whole thing and from talking to those users, is that they’re taking this and they’re using it to communicate positivity or negativity in a way that we never really expected.

I pull that use case out because it’s the most interesting one to me, because it’s not something we ever would’ve imagined people would actually do with this music. Right? We set out and we were like, you know, DJs are using it to create samples that they can play, hip-hop producers are using it. If you told me six months ago that actually most of the usage would be gamer kids, like, making fun of each other, I would have not been able to guess that. And that’s not something that ever would’ve made sense before you could make a song in five seconds. Right? And so that was an interesting behavior that I would draw attention to.

[There are] a lot of dancers. I mean, anything that you can imagine, right? What we’re really going for is non-musicians, primarily. And musicians are going to use it because they’re going to use it like any other music tool. But we’re really looking at this as a way to kind of onboard people into the music creation process, without having to have this incredible amount of education or access to resources in order to do that.

CH: And I think with Boomy specifically, one of the most interesting aspects to me — also just thinking about what the future of the music industry could look like — is that distribution, and therefore being able to monetize these tracks is directly built into the product.

AM: Right, right.

CH: I think for people listening, it would be helpful if you could just walk through how you’ve structured the distribution aspect of the platform, and sort of your philosophy as well about what rights Boomy as the company owns versus what the user gets to keep — because I think there are still a lot of possibilities, and opportunities to experiment around that.

AM: Sure, so distribution is awesome. I mean, in some ways, it’s start-up 101: what are your users doing? And the first kind of batch of the first 1,000 users — what they were doing, um, was they were creating songs in Boomy, they were downloading them, and they were putting them on streaming services through other distributors. We saw that from a whole bunch of people. I think there’s this natural inclination the first time you make a song, or when you finish a song — what you want to do is share it with somebody. Right? You want to show it to someone. And I think the coolest way to share it is through a streaming service that gets monetized.

Philosophically, it really comes down to expanding access. We have all seen the numbers, we’ve all seen the research on the incredible amount of royalties that are going to be available over the next 10 to 20 years thanks to the increase in streaming. But the issue that we see with that, is that the only people who are going to be able to participate in that economy are people who have musical talent, skill, time, expense — and it just necessarily leaves a lot of people out. A lot of people just don’t have the time or access or money or resources to create music. And so if what we’re doing, fundamentally, is creating an even playing field for everybody to create music and participate in that economy, enabling distribution right through our platform is, one, a response to just that’s what the users are doing anyway, and we want to capture that, and two, it allows us to, like you said, be flexible with how we manage those payments and get it back.

So we have an 80–20 split, to answer your question directly, where you can create songs on Boomy and then after you’re done creating them, you can go into your royalties tab, add a song, add an album — you can distribute singles and albums — and we send them up. We call it an “application.” I think that’s important to note: it isn’t valuable for us, or anyone, for us to be flooding DSPs with a whole bunch of content, so we actually set up as an application for distribution. But we’ve certainly had releases in the hundreds go up without issue in the last several weeks since we’ve been doing that feature. And yeah, we send 80% of our royalties back to the user.
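As a quick worked example of the 80–20 split, the sketch below divides a royalty payment between user and platform. Only the 80% user share comes from the interview; the gross amount, the rounding rule and the function name are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

USER_SHARE = Decimal("0.80")  # from the interview: 80% goes back to the user


def split_royalties(gross):
    """Return (user_cut, platform_cut) for a gross royalty amount.

    The user's cut is rounded to the nearest cent (an assumed rule); the
    platform keeps the remainder so the two parts always sum to the gross.
    """
    gross = Decimal(str(gross))
    user_cut = (gross * USER_SHARE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    platform_cut = gross - user_cut
    return user_cut, platform_cut


# Hypothetical payout of $125.00 in streaming royalties.
user_cut, platform_cut = split_royalties("125.00")
print(user_cut, platform_cut)  # 100.00 25.00
```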

CH: Okay. That’s super interesting, because something I also noticed at the bottom of the Boomy page in beta is how many songs users have generated to date. And it’s past 100,000 [tracks] — you say it’s, like, at least 0.12, 0.13% of “the world’s recorded music.”

Assuming, now that you’re out of beta, that user growth is going to accelerate at a certain rate over the next year, you can very easily get that number to one million within a year, or at least within a couple of years. Depending on how the application process for distribution evolves, a good chunk of that could also be distributed and then monetized, and living on these platforms. Right?

AM: Sure.

CH: I feel like volume is already something that a lot of people in music are already kind of overwhelmed by. You see stats that Spotify has released, around 40,000 tracks being uploaded daily, or every 24 hours. Even without the dominance of A.I. in the music creation process, there’s already this huge volume, and I think A.I. could potentially exacerbate that, just given the ease of creation. I’m curious as to how you think about that, and to what extent you think the music industry is or is not prepared for that shift, because I feel like supporting that and processing that requires a lot more infrastructure.

AM: Sure, so I’ll tell you exactly what I think about it. So two points. The first is that, so, 40,000 tracks a day. Right? I think people in our world — you know, the music world — we hear that, and we think it’s a big number, right? We’re, like, oh my God, 40,000 tracks. Do you know how many photos go to Instagram every day? Off the top of your head? It’s 95 million. 95 million photos go to Instagram every day. And there’s something like 350 million photos that go to Facebook every day.

You’ll hear me make references to photos a lot, because that’s really how I think about this market. I think for a long time, if you wanted to have a photo, you needed all this equipment — you needed all this time and expense, you needed the right lighting. It was this hard thing to do. And then instant photos came along, and all of a sudden it happened in two seconds.

We call what we do “instant music” — we don’t even necessarily call it “music,” right? — and the metric that we track in terms of percentage of the world’s recorded music is there to inspire exactly the thought that you just had, which is getting ready for a world where there will be a million songs a day. There will be ten million songs a day. Of this I’m certain. It’s going to happen. The only question is on what timeline is that going to happen, what’s the role of automation, and what are things like Boomy in the market going to have on that. But we’re going to get there no matter what.

I want you to think about your five favourite photos. You probably took all of them. Or someone took them of you. They might not be “technically brilliant photos,” that, like, a photographer, a professional photographer would take, but they’re still meaningful. Right? And this is what I mean when I say that “the win condition is meaning.”

So I think, again, 40,000, like, is nothing. Right? If you think of it as data, if you think of it as individual contributions of creativity, that’s a tiny number. And I think, frankly, it’s so small that the world outside of our world — and this is the second point I want to make — that in the real world, it just doesn’t register quite as much as other types of media, right? In some ways, I think what we’re trying to do is match the ease of use or ease of creation of photos, and replicate that experience of almost like an iPhone camera — where you don’t know anything about photography but you get to take a photo and now there it is — we’re going to get there with music. And as soon as we are, and arguably Boomy is there today, yes you will see more content going up to DSPs, you will see more content in general, but again I just challenge the notion that 40,000 is a lot. I think it’s nothing. I think it’s very small, compared to where it’s going to go.
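To put the figures quoted in this exchange side by side, here is a small arithmetic sketch. The daily counts are the ones cited in the conversation, not independently verified, and the helper name is made up.

```python
# Daily volumes as quoted in the interview.
TRACKS_PER_DAY = 40_000                  # tracks uploaded to Spotify
INSTAGRAM_PHOTOS_PER_DAY = 95_000_000
FACEBOOK_PHOTOS_PER_DAY = 350_000_000
PREDICTED_SONGS_PER_DAY = 10_000_000     # Mitchell's "ten million songs a day"


def times_larger(bigger, smaller):
    """How many times larger one daily volume is than another."""
    return bigger / smaller


print(f"{times_larger(INSTAGRAM_PHOTOS_PER_DAY, TRACKS_PER_DAY):,.0f}x")  # 2,375x
print(f"{times_larger(FACEBOOK_PHOTOS_PER_DAY, TRACKS_PER_DAY):,.0f}x")   # 8,750x
print(f"{times_larger(PREDICTED_SONGS_PER_DAY, TRACKS_PER_DAY):,.0f}x")   # 250x
```

On these numbers, Instagram alone takes in roughly 2,375 photos for every track uploaded, and reaching ten million songs a day would mean a 250-fold increase over the 40,000 figure.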

CH: That is such a good point. And I’m thinking, a lot of apps like YouTube or TikTok especially that are centered around UGC [user-generated content] that’s connected to music in some way — like, people are used to that level of volume. Obviously, it’s not like every video is producing a new song, but it’s producing a new expression on an existing song. That’s already happening at [the rate of] tens of millions of posts a day.

And so it’s interesting that that hasn’t really translated to the actual music creation process, I guess because there still is a perception that — it is a universal language in terms of people enjoying it, but it’s not a universal language in terms of what it takes to create it, and you still need to build up skill over the course of years. Even the thought of posting something to Soundcloud — like, you have to have a full song ready, or even just, like, a minute to three minutes of a fully produced song, to upload it to even have a chance at getting noticed.

AM: Making music is really hard still. It’s so hard, if you think about it. And even the really simple music apps are hard for most people. And I think a lot of that has to do with the fact that the people who are designing music apps are musicians.

Another thing that we’ve learned over the last few months of beta is that if you really want to capture that everyday user, the friction level has to go essentially to zero. It’s got to be, like, one button. Two buttons. Or else you lose them. A good example of this would be, when we first launched the beta — absolute first version, a couple hundred people on it — we had four options, right? And you could pick intensity, genre influence, tempo, and what you wanted in the song, so drums, bass, melody. Right? And we were like, this is so simple and intuitive. Everyone’s going to get it. It’s, like, four options, and then you press a button, and then a couple seconds later, you get this cool song filled with your options.

People were so confused by that, by just those four things. Like, “what is intensity?” was one thing that came up. “What’s EDM?” Is another question that came up. “I see these influences here, I know what hip-hop is, but, like, what’s trap?” And, like, what do these words mean? What’s the concept of speed in a song? Let alone, what is the word tempo? What is the word melody? People who have music education, and who had been lucky enough to have music education — these are just concepts that we understand intuitively. But to the everyday user, it becomes very like, “I don’t know what that is. I’m not a music person, I can’t do that.” I mean, imagine if every time you wanted to take a photo with your iPhone camera, it was, like, “Cool, so tell me what kind of lighting settings you want, and then we’re going to take this photo.”

And so we sort of went back to the drawing board, and we said, alright, how do we make this even simpler? And we launched what we call “filters,” because people get that. It’s like a photo filter. And the filter doesn’t have any musical thing in it at all. You can preview it, and you can kind of see what it sounds like, but we’re not enforcing any sort of genre on it. And we came up with silly names for them — one of them is “maximum thump,” which is in place of, like, high-intensity EDM. All of our stats went up by a factor of four for the next group. Four times as many songs, four times as many saves, four times as many people who kind of got through the process and understood it.

And so that was a really important lesson for us, in that if you look at the simple music making apps, even the simplest ones usually break it out to bass, melody, chords, and leads, and for a certain kind of user, for that mass user — the person walking around on the street who couldn’t care less about the things you and I care about in the music world — you’ve already lost them when you say the word “melody,” or when you say the word “tempo.” I think music people like us, we can be very esoteric with the way that we approach these things. And so it’s been a learning experience for us, and we’ve been learning from our users, that you’ve just got to make this super, super simple.

CH: I really like that analogy and the name “filter,” and the comparison to photo filters. Because I was just thinking … I’m an avid user of Instagram, and I have no idea what the filter “Clarendon” means. Like, I don’t even know if I’m saying that correctly.

AM: [laughs] Right.

CH: But, like, I have a very clear idea of its effect and how it impacts the photo, and I can go in later and do some customizations. It’s not like the filter is necessarily tied to anything about the skills, or, like, the techniques behind photography or anything. Just saying, “Here’s a filter, do what you want with it, you can still customize it.”

AM: Right, and here’s what it looks like now. And for us, it’s like, here’s what it sounds like.

You can see this in the live version — I don’t have to say beta anymore, that’s so exciting! — you can see in the live version, we actually kept those same four things [in the original beta], and we called it “advanced mode.” So that’s “advanced mode” now. And it’s actually super fun, it’s probably the most fun to play with for people who do understand music. It’s not like we took that away. But again, we’re trying to get to that “instant music” thought without having to know all the stuff that we’ve had to learn about music.

CH: And to go back to a lot of the fear people have about A.I.’s role in music creation, one claim that I think is totally ridiculous, but keeps being thrown around a lot, is that A.I. is going to replace human musicians somehow. Or, like, now that you have “instant music,” why do you need humans to make it? Because you just do it instantly.

AM: It’s still humans making it! [laughs] It’s still humans making it.

CH: Yeah, there we go. It goes back to the whole “A.I.-generated music” framing.

AM: Obviously we’ve heard some of this fear. We get it on Twitter mostly … But if you just look at the history of music and technology in general, there’s never been a significant advancement in music technology that was not immediately met with fear, and eventually was considered something that is normal, and a thing that everyone does. Right?

There was a backlash against synthesizers. Queen used to put “no synths,” proudly, on the back of all their album covers, because they were taking a stand against synthesizers. I think we can all agree that synthesizers were a good thing. At some point in history, someone decided to make a bigger viola. It all started with a viola, and then someone else was like, “hey, if we make a bigger viola, it would make a deeper sound.” And then everyone around them was like, “Are you insane? A bigger viola? How would you even play it? You would have to put it on the ground and bow it horizontally like a moron.”

Queen was notoriously anti-synth.

And so I’m saying, there’s always going to be kind of a discomfort, I think, especially because we are reducing the effort, but I think that’s only going to come from, one, people who don’t necessarily understand that what we’re trying to do is not replace artists or replace anything, we’re trying to help people create music that’s meaningful for themselves in a way that they don’t have to know anything or pay any money to use, and I think that’s hard to argue with. I think it’s hard to say that that’s not, uh, that’s not a noble goal.

To be somebody who’s really “anti-A.I.,” you sort of have to be someone who has the opinion that music is something that should be reserved for an elite class of person, for somebody who has gone through the training and taken the lessons and has the equipment and the time to create music. And anybody who doesn’t have that, or doesn’t have sort of this internal talent, doesn’t deserve to create music. I think that’s an argument you could make; I don’t think it’s one that’s going to age well.

CH: Yeah. Actually just a couple days ago, I was talking to someone about how a lot of people were reacting negatively to autotune at first, but then arguably it was T-Pain who brought autotune into [the] real mainstream — and so many people have tried to replicate what he’s done through autotune, in terms of just the quality of singing, or making the experience sound as great as possible, and it’s actually been really difficult. So he’s successfully been able to express himself and actually make a mark on culture, using autotune just as a tool to do that.

AM: Exactly, and I think you’re going to start seeing something very similar from us — but I want it to happen every day. Every high school and every college should have its own pop star. That isn’t necessarily reflective of the musical skill of the person; it’s just, you know, they are the person who everybody wants to listen to, who has a voice, who now has a musical voice that they may not have had before. I think we’re going to look back on this, and we’re going to look back on it the same way we look back on synthesizers, and autotune, and stereo recording, and electric guitars — the guitar wasn’t taken seriously for many years, you know, don’t forget — and so I think history’s on our side. You could pull out a ton of examples of advances in music technology that are initially met with skepticism and fear, and then, you know, years later, that’s what everyone does. Like, everyone uses autotune now, because it sounds good and people like it.

CH: Yeah. And this is just sort of the last question I want to ask before the final segment: I think one potential source of fear is in who ultimately owns this technology. Like, thinking about A.I. and music creation, specifically, one thing I’ve noticed in terms of what’s being written about this space is that there has been a lot of emphasis on bigger tech companies. Google, IBM, Sony — they’re all creating their own A.I.-driven tools for music creation. And they do seem to be the ones leading the conversation in part because they do have the resources and the engineering talent, importantly, to bring this stuff to life.

But there are also smaller indie start-ups like Boomy; there are lots of indie artists who are working with individual developers, and therefore [with] not as many resources but maybe making more provocative statements about the tech. So I’m wondering how you see that dynamic unfolding in the future, with respect to bigger tech companies versus smaller start-ups [and] singular artists playing with this tech. I personally kind of see it as a rift in terms of, like, approaches and philosophies, but I’d love to hear your thoughts on that.

AM: Sure. I think with the bigger tech companies, it’s less about, like, “let’s make a bunch of music,” and I think it’s more about [how] they are the ones really leading the charge with the underlying technologies that startups like us get to use and employ for our purposes. I’ll give a shoutout to Google Cloud — they have supported us in an incredible way with a lot of credits and a lot of advanced technologies. We would not have been able to build what we’ve built at Boomy three years ago, or even two years ago. A lot of the stuff that we’re using is very bleeding-edge stuff that’s come out of Google.

What I’m hearing from you is that people are uncomfortable with large tech companies owning everything in general these days.

CH: That’s part of it, yeah.

AM: And going into culture is sort of another angle of attack in that argument. At the end of the day, it’s easier to focus on the negative than it is to understand the positive, right? Uber gets a lot of flak, but they’ve also probably saved a lot of lives through the lack of drunk driving, which is not something that’s easily trackable.

And so when it comes to music and kind of the smaller startups versus the bigger startups, we’re trying to become a very big company with this. The win condition for us, right, is if we can quickly enable people to create meaningful music that means something to them, this is going to be the most significant advancement in the creation and distribution of music probably in history. And I really mean that. I think you can get to a very, very large company through something like this.

At the same time, I think that the days, as a technology company, big or small, where you can sort of get away with things that were not good for users, or could [have] sort of a lack of transparency, are over. You can’t pull anything over on anyone anymore, right? We’re very open about our ownership model. We’re very open about how our technology works and what we’re trying to achieve with it. And I think so long as we stay transparent and we have an open dialogue with our users and our customers, ultimately, it’s the consumers and the users who will decide whether or not what we’re doing, what Apple’s doing, what Spotify’s doing, is okay with them. And they will vote with their attention and they’ll vote with their dollars.

But again, I feel like history’s on our side. The easier it becomes to create music, the more people will get to come into this space. I think we’re going to have a huge net positive for this space.

Like, why do hipsters like vintage cameras? You ever ask yourself that question? Like, why, all of a sudden?

CH: [laughs] That is a good question.

AM: I’ll tell you exactly why: because of Instagram. You saw an Instagram filter, and you’re, like, “hey, how does this work,” and the answer is, “well, it’s based on models from old cameras,” and then people wanted to go deeper. Right?

And so I think whether this tool comes from a big tech company or a small tech company or whatever, what’s going to happen to people — and this is already happening with our users, we’ve heard this from our users — it’s going to inspire people to join our world. That’s what we’re trying to achieve here, and I think that’s good, no matter where it comes from. We’ve seen our users be like, “hey, can I change this?” And we’re, like, no. We have certain editing features, but you can’t do that, you can’t track it out and then put it into a DAW [digital audio workstation] and then, you know, go learn how to make music. We’ve certainly inspired, I would say at this point, at least hundreds of people—particularly young people, who went from never making music before, to making music on Boomy, to now participating in the music creation economy in a way that they had never considered before.

And so wherever that comes from, I think so long as you have an open dialogue with your users, it’s going to be a net benefit both for our world of the music industry and kind of the music economy. I think it’s going to be a net benefit for whatever company ends up really breaking through on this. And I expect it to be us.

CH: Yeah, and in the way that Instagram did for photography, just giving people the opportunity to participate in that way. Yeah, thanks so much for that.

So, for the last segment — the over-/underrated segment — I would love for you to start. So, are there any pieces of recent music or entertainment news that you wanted to bring up and talk about?

AM: Aw, man, there’s so much. There’s so much cool stuff. I would give a lot of credit and a shoutout to Splice and their ML [machine-learning] team. They announced some research that I would say is underrated. They came out with some research I think two days ago, um, around splitting out — is this too nerdy? This is probably way nerdy.

CH: I don’t think anything’s too nerdy. [laughs]

AM: [laughs] Yeah. They’re using A.I. and convolutional neural networks to pull out — and in a pretty high quality way, which is the innovation — stems from stereo recordings, which is just awesome in what that’s going to enable for creation once that technology really becomes useful. Right now, we’re kind of limited in the A.I. space as to the data and the formats that we get to analyze to create stuff. We’re using MIDI — we actually had to invent a format for production because one does not actually exist for us to use, and so we’re sort of training on this new way of thinking about how you track production data.

If you look at the research from Splice and from others, soon you’re going to be able to pull out all of the necessary data from all the songs that have ever been written, that’s going to help us analyze both the history and the future of music, because we can then start applying that to our own songs. There are things about the songs that are being created on Boomy that — we might understand the underlying data, but when it gets turned into audio, there’s more to analyze there than what we currently have the ability to analyze. So I would give Splice a shoutout for that, it is cool. They put it on Medium, and they did a great job on breaking down how this stuff works, and so if you’re somebody who doesn’t really understand a lot about neural networks and how they work, it’s a great primer.

CH: That’s awesome. And thinking about stems specifically, a lot of people see that as the next logical step in terms of how existing recorded music or how traditional recordings could potentially be monetized. Be that in making the remixing process a lot easier, because my impression is the process of just getting that done, there’s a lot of legwork involved that arguably might not be necessary, at least from a licensing perspective; [or] in terms of sampling, which is kind of different [because] it doesn’t necessarily require breaking out stems — but people are kind of looking at individual units of a song and seeing what could be done with that creatively.

AM: And we’re sort of answering our own question from earlier: the more advanced a technology becomes, the more opportunities for monetization, and the more markets that open up, become available. That’s cool, and that’s good. And I think that gets lost a little bit when we talk about A.I., which is a phrase that just invites fear and skepticism, like you said. What’s getting lost is the fact that this technology isn’t being built in a vacuum for no reason; it’s being built to open up markets, and to open up economic opportunities, for companies and for people and for users.

We’ve all been hearing rumblings about very interesting new monetization schemes around stems. If you just look at Boomy, it’s a very interesting set of monetization around A.I. music that hasn’t been tried yet. When you find new markets, there’s always opportunity there. I feel like that gets lost sometimes because people are a little bit too — people want to be scared, more than they want to be inspired, sometimes.

CH: Yeah, that’s a really good point. And actually, that’s a good segue to the piece of news that I had in mind, because it’s kind of the opposite dynamic in the podcast world, of a previously completely open market potentially becoming more closed.

So, there have been a lot of mostly podcast distribution platforms, or streaming platforms, now investing in original, exclusive content. Like a couple months ago, when Luminary launched — it’s exclusive and gated. But the latest company to enter this space is Apple. It was announced this week that they’re going to be investing, reportedly, in original podcasts that would be exclusive to its Podcasts app — which is free to everyone, but it won’t be available, like, on Spotify.

I think this is both underrated and overrated at this point. So why I think it’s underrated is it might put the wider podcast community in a really weird place if part of their distribution also becomes their competition. This is actually why I decided not to host my podcast on Anchor, because Spotify bought Anchor, and this podcast, which I hope is a more or less objective view on music, would be governed by one of the biggest music platforms — so that’s why I didn’t go through Anchor. But now there are all these other platforms, and Apple I’m sure will not be the last one to invest in originals. So there’s some kind of conflict of interest there.

On the other hand, this may not be why it’s overrated, but I feel like Apple should really focus on improving its user experience alongside if not even more than investing in original content. Because I think if you invest in original, exclusive content, if the user experience isn’t there, I’m not sure if it’ll actually be worth it. Of course Apple benefits from really wide distribution in that they still command such a large share of podcast listening, but in terms of recommendation and discoverability, it just really isn’t there yet. And that might exacerbate the other point that I mentioned of competition with everyone else who’s already using your platform. So, is there a future where, yes, Apple is investing in originals, but it also makes it easier to discover new shows, or new episodes that you might not have seen before? So I think it’s kind of a double-edged sword there.

And then the last thing is that I think Apple is pretty set on keeping their music and podcast apps separate. They launched a new podcast app for Mac, for desktop. I think in that sense, Spotify also has a competitive advantage of having both [formats] on the same platform. I think that gives them a lot more to work with in terms of recommendation and discoverability, especially across different formats — like seeing any trends in behavior in terms of, like, whether you listen to a certain type of music alongside a certain type of podcast or show. It’s a really interesting development, but I think there are a lot of caveats to Apple succeeding in this realm.

AM: Yeah, so I — like everyone, I’ve been listening to a lot more podcasts lately.

CH: Okay, yeah. [laughs]

AM: It feels strange to me how free all of it is, sometimes. I don’t know that world nearly as well as you do, but I know that a lot of them — and I think we all had the experience of, like, aaand now there’s an ad. Right? So you’re listening, you’re listening, aaand it’s ad time. And you go on, and you’re, like, oh, man, how do I skip through this, and you got to sort of fumble with your phone, and crash your car. [Cherie’s note: oh my god]

I think what you said about user experience is really key. Interrupting a podcast with ads is not awesome, from a user experience perspective. It’s almost even worse sometimes where it comes through in this weird disingenuous way, where all of a sudden we’re like, “and by the way, this mattress company is, like, super comfy, I really love it, and they’re paying me to say that.” And now you’re all suspicious, and you’re like, “do you actually like that mattress company?”

Again, the technology and the platforms opening up new avenues of monetization and new markets is part of what’s at play here. If you can have something where I’m now subscribing to a subscription podcast tier, or maybe it’s bundled in my Apple News subscription and Apple stays editorially neutral and doesn’t enforce standards on creators, then it’s just another way of getting paid, which is important. And I think Apple is a company that you can always count on to make stuff that makes stuff — so their laptops and their hardware, people make things with them. And to the extent that people are making podcasts and consuming podcasts, it makes sense to me that, eventually, there would be a form of monetization that isn’t, like, “hey, but what about this mattress.” Right?

CH: Yeah, because it’s so free, it’s, like, a whole economy that basically just runs on branded content. It’s like native advertising, basically, which — if you compare it to the music world, it’s not like that at all. If you compare it to the online media world, it is kind of like that, but there’s still some scrutiny around native advertising and whether it’s actually good for journalism in general. But in the podcast world, there’s very little criticism around it, because it relies almost entirely on advertising, and in-line advertising.

AM: But that could just be for lack of an alternative, right?

CH: Yes, exactly.

AM: Wouldn’t it be strange if, like, all of our songs included, like, weird references to —

CH: Oh my god. [laughs] Like, after the chorus. “Sponsored by …” Yeah.

AM: Right. Or just in the middle of it. And obviously, there is branded content within music, too — that’s a different subject — but look, like I said before, ultimately it’s the public, it’s the consuming public, they vote with their ears, they vote with their dollars. If Apple releases a whole bunch of things and nobody uses it, they’re going to stop. Right? Or if it becomes a much better way of monetizing podcasts, then, you know, you might see more creators going into that. We don’t get to decide; the market will do what the market’s going to do.

CH: For sure. I don’t know if you had any last comments or points that you wanted to bring up about Boomy — is there anything that maybe we can expect in the coming weeks?

AM: Sure. So we’re really excited — as of today, you can go to Boomy.com, and you can sign up and start making music. You can go from never having made a song in your whole life, to having a pretty cool track on Spotify, in, like, a day. Which is just awesome. So I would encourage people to definitely go check it out, and decide for yourself. Don’t take my word for anything that we’ve said about creativity or anything — go use it, go mess with it, make some cool stuff. It’s a lot of fun.

We’re going to be launching a bunch of new features through the end of the year. This release was really about interface, and it was about consistency in the music, and the backend that we built and some of the engines that run Boomy. You’re going to see things like vocals and the ability to add your vocals very soon. We’re working on synthetic vocals, which is terrifying and weird right now, so we haven’t done it yet. [laughs] A.I. lyrics, the whole thing. If you can think of it, we’re definitely working on it. We’re going to release this to our users, and you’re going to see a lot of really interesting music come out of this platform. It already is not necessarily what we expected, especially in terms of the user response. So we’re learning alongside our users and everybody else. I would just say, go to Boomy.com — you don’t even need a beta code anymore. You can just use it. It’s awesome.

CH: Great! Thanks so much again for joining this podcast, it’s super interesting.

AM: Of course, of course.

If you enjoyed this conversation and want to follow similar analyses on the intersection of music and tech, please listen to the remaining episodes of the Water & Music podcast and/or subscribe to the eponymous newsletter!

You can also reach out to me via Twitter, Instagram and LinkedIn.

Written by Cherie Hu