Above the Clouds: ‘Data’ at SoundCloud
SoundCloud’s data team is entering a new, exciting phase. This is a story about our journey.
SoundCloud is where the music culture of ‘now’ is defined. Every day, millions of people express themselves, listen, create, discover, and connect around music, in ways that define who they are and sometimes change their lives forever.
At the core of our product, there is a unique dataset — one of the richest music datasets ever created on the internet:
- A social network where the biggest stars in the world of music and audio interact with their most loyal fans
- A platform where unsigned artists (the stars of tomorrow) share their first creations with the world, find a community of like-minded peers, and receive feedback
- An ecosystem where amateur and professional curators scan hundreds of tracks to find the next gem that no one has heard before
- A platform where people from every single country in the world have some of their most magical listening experiences
The Early Days
When I joined SoundCloud at the beginning of 2011, I felt like a kid in a candy store. A musician and statistician myself, I couldn’t imagine a better place to combine the things I love the most in life: music, data, technology, and DIY culture. I was the second person on the analytics team, joining a group of thirty people working from a small office in Berlin with a vision to change the music industry.
SoundCloud had around 2 million users at the time and was already getting some attention on the global tech scene. But it wasn’t about the data. The company had grown to this point for a different reason: a rare combination of intuition (creativity) and deep empathy for the user was guiding the team’s day-to-day decisions. And it was working really well.
Data-Informed Decision-Making
Around that time, the ‘big data’ frenzy began taking over the world and the phrase ‘data-driven decision-making’ was showing up all over the internet. When I joined SoundCloud, I saw a massive opportunity to build something unique. SoundCloud was already growing really fast, so there was no reason to disrupt an organizational culture that was working. Instead, we decided to build on the same principles (intuition, empathy) that had made SoundCloud successful over the years.
Our plan was to add data to the equation. The ‘open’ nature of SoundCloud, together with the ‘social’ aspect of the platform, creates a goldmine of user feedback and behavioral evidence on how the world experiences and interacts with sound. We saw our dataset as a key asset that could inform our business and product decisions and enable us to disrupt the way music is shared and discovered across the internet.
In contrast to the majority of the industry at the time, none of us believed that data should always lead the way, so we wanted to avoid being dogmatic with data. We saw intuition, empathy, and data as complementary to each other and we valued them equally when making decisions. This approach helped us create strong partnerships within the organization, even with people who were previously skeptical about the role of data in product development processes. Today, some of the most successful online communities in the world (Airbnb, Kickstarter, and more) openly embrace the same approach for reasons that are well articulated and understood by the data community.
The People Part of the Equation
Building a data-informed culture starts with people, so to realize that change we had to carefully rethink how we recruited and onboarded ‘data’ people and consider what kind of culture we wanted to foster. Operating in an environment that is constantly changing (creation and consumption patterns in the digital music space continue to evolve dramatically), we knew we had to focus more on the process of learning than on just the output.
When recruiting for data roles, we decided that we wanted to look beyond just the brightest flash in the talent pool. On the one hand, we looked for people who could demonstrate strong foundations and expertise in areas that were valuable to us. On the other hand, we wanted people who were comfortable with the exploratory nature of the work and the ambiguity that comes with it.
For that reason, we looked for people who are team-first: people who understand that they are a piece of the bigger puzzle and want to make an impact. We looked for people who know how to communicate and have the emotional intelligence to operate in a team and help it evolve.
It’s the domain expertise, the desire to succeed, the empathy for our users’ problems, and the team that help our employees go the extra step in their learning and career journeys.
Early success stories and the people behind them changed the culture of SoundCloud and led to more investments in music information retrieval, analytics & statistical inference, machine learning, data engineering, and distributed systems.
As our user base grew significantly year after year, we modernized our data stack and invested in machine learning to better connect creators with listeners. And as our company grew, we built tools and curricula to improve data access and analytical literacy across the organization so that every SoundCloud employee could benefit from our efforts.
As we empowered more and more teams to leverage data, it was necessary to reorganize resources throughout the company in order to successfully manage new challenges. We recently evolved our organizational structure around data and organized all our data scientists and data engineers under one common mission and strategy.
Data (Science & Engineering) @ SoundCloud
The newly formed data team (science and engineering) brings together a variety of skills and backgrounds that we would like to harness to continue to serve the most inspiring community of music and audio on the web. We’ve used this opportunity to think critically about how we’ve worked in the past, how other companies have improved their data orgs, and where we believe SoundCloud is headed.
In this new data organization, there are two fundamental pillars. One is ‘data platform’ and the other is ‘data science.’ The focus of the data platform team is empowerment and productivity for data scientists and back-end engineers through data infrastructure and tooling, while the remit of data science is to solve specific product and business problems using evidence, statistical inference, and engineering.
Our team seeks to create the richest music dataset in the world and to make it easily accessible and consumable, so that we can build unique personalized experiences for our users and make informed decisions across the business. Today, data and science are core to the SoundCloud culture and we are really excited about what the future holds.