System design: Music Streaming Services

Ishwarya Hidkimath
3 min read · May 9, 2024


Today I would like to walk you through designing a music streaming service like Spotify, Wynk or Apple Music. I have written another blog on handling the top k songs played, “System Design: Top k songs played on music streaming applications”. This blog covers the generic design of the system as a whole.

1. Requirements

Functional:

  1. User should be able to search for songs.
  2. User should be able to play songs.

Non functional:

  1. System should be fault tolerant, reliable and highly available.

Extensibility: audio encoding can be discussed as a way to improve performance.

2. Back of the envelope estimations

Total users = 1B

Total songs = 100M

Size of each song: 5 MB, total storage = 100M × 5 MB = 500 TB

Metadata storage: 100 B per song, total = 100M × 100 B = 10 GB
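The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sanity check; it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), and the variable names are mine.

```python
# Back-of-the-envelope storage estimate, using decimal units.
MB = 10**6
GB = 10**9
TB = 10**12

total_songs = 100_000_000      # 100M songs
song_size_bytes = 5 * MB       # 5 MB per song
metadata_size_bytes = 100      # 100 B of metadata per song

audio_storage_tb = total_songs * song_size_bytes / TB
metadata_storage_gb = total_songs * metadata_size_bytes / GB

print(f"Audio storage: {audio_storage_tb:.0f} TB")
print(f"Metadata storage: {metadata_storage_gb:.0f} GB")
```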

3. System APIs

We expose three endpoints: /uploadSong, /searchSong and /playSong.
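To make the three endpoints more concrete, here is a minimal in-memory sketch of what each one might accept and return. Only the endpoint names come from the design; the parameters, return types, the `title` field, the `MusicService` class and the `ROUTES` table are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Song:
    song_id: int
    title: str        # hypothetical field, used here for searching
    artist: str
    audio_link: str


@dataclass
class MusicService:
    """Hypothetical in-memory stand-in for the three endpoints."""
    songs: Dict[int, Song] = field(default_factory=dict)

    def upload_song(self, title: str, artist: str, audio_link: str) -> int:
        # Assign the next sequential ID and store the metadata.
        song_id = len(self.songs) + 1
        self.songs[song_id] = Song(song_id, title, artist, audio_link)
        return song_id

    def search_song(self, query: str) -> List[Song]:
        # Case-insensitive substring match on title or artist.
        q = query.lower()
        return [s for s in self.songs.values()
                if q in s.title.lower() or q in s.artist.lower()]

    def play_song(self, song_id: int) -> str:
        # Return the link the client would stream from.
        return self.songs[song_id].audio_link


service = MusicService()

# Map each endpoint path to its handler.
ROUTES = {
    "/uploadSong": service.upload_song,
    "/searchSong": service.search_song,
    "/playSong": service.play_song,
}
```

A real service would sit behind an HTTP framework and add authentication, validation and pagination; the sketch only shows the shape of the calls.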

4. DB Schema

The metadata is small (about 10 GB in total), so it can be stored in any simple SQL database.

Song{
songID: 123
SongURL: https://s3...
Artist: Adele
Genre: Pop
Link_to_album_cover: https://..
audio_link: https://...
}
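As a rough illustration, the record above could map to a table like the following. The sketch uses SQLite only because it ships with Python; the column types and constraints are my assumptions, and the `example.com` URLs are placeholders rather than real links.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE song (
        song_id             INTEGER PRIMARY KEY,
        song_url            TEXT NOT NULL,
        artist              TEXT NOT NULL,
        genre               TEXT,
        link_to_album_cover TEXT,
        audio_link          TEXT NOT NULL
    )
""")

# Insert the example record; all URLs are placeholders.
conn.execute(
    "INSERT INTO song VALUES (?, ?, ?, ?, ?, ?)",
    (
        123,
        "https://example.com/song/123",
        "Adele",
        "Pop",
        "https://example.com/cover/123.jpg",
        "https://example.com/audio/123.mp3",
    ),
)

row = conn.execute(
    "SELECT artist, genre FROM song WHERE song_id = ?", (123,)
).fetchone()
print(row)
```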

5. High Level Design

HLD of a Spotify-like service

6. Design Deep Dive

Search for a song:

  1. The user sends a search request to the load balancer.
  2. The load balancer distributes the traffic across the web servers.
  3. The web server queries the music metadata database for songs matching the search string.
  4. The web server returns the matching data to the user.
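Step 3 of the search flow might look like the sketch below, which runs a parameterized substring query against the metadata store. The `search_songs` helper and the `title` column are hypothetical (the schema above does not list a title), and SQLite stands in for whichever SQL database is actually used.

```python
import sqlite3
from typing import List, Tuple


def search_songs(conn: sqlite3.Connection, query: str,
                 limit: int = 20) -> List[Tuple[int, str, str]]:
    """Return (song_id, title, artist) rows whose title or artist contains query."""
    # Note: '%' and '_' inside the query itself act as LIKE wildcards.
    pattern = f"%{query}%"
    # Parameters are bound by the driver, which avoids SQL injection.
    return conn.execute(
        "SELECT song_id, title, artist FROM song "
        "WHERE title LIKE ? OR artist LIKE ? "
        "ORDER BY song_id LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


# Tiny in-memory dataset for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE song (song_id INTEGER PRIMARY KEY, title TEXT, artist TEXT)"
)
conn.executemany(
    "INSERT INTO song VALUES (?, ?, ?)",
    [(1, "Hello", "Adele"), (2, "Skyfall", "Adele"), (3, "Yellow", "Coldplay")],
)

results = search_songs(conn, "adele")
print(results)
```

A `LIKE '%...%'` scan cannot use an ordinary index, so at 100M songs a full-text index or a dedicated search service would likely be needed; the query here only shows the idea.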

Play song:

  1. When the user clicks the play button, the request is sent to a web server through the load balancer.
  2. The web server queries the music metadata database for the song's audio link and uses it to request the song from S3 storage.
  3. It gets the song from S3 and starts streaming it back to the user.
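The play flow can be sketched as a generator that looks up the audio link and yields the song in fixed-size chunks. Everything here is a stand-in: `METADATA_DB` and `OBJECT_STORE` are plain dictionaries playing the roles of the metadata database and S3, and the 64 KiB chunk size is an arbitrary assumption.

```python
from typing import Dict, Iterator

CHUNK_SIZE = 64 * 1024  # 64 KiB per chunk (assumed value)

# Hypothetical stand-ins for the metadata database and object storage.
METADATA_DB: Dict[int, str] = {123: "audio/123.mp3"}
OBJECT_STORE: Dict[str, bytes] = {"audio/123.mp3": bytes(200_000)}


def stream_song(song_id: int, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the song's bytes chunk by chunk."""
    audio_key = METADATA_DB[song_id]   # step 2: look up the audio link
    data = OBJECT_STORE[audio_key]     # step 3: fetch the song from storage
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


chunks = list(stream_song(123))
print(len(chunks), sum(len(c) for c in chunks))
```

A production server would read byte ranges from the object store instead of loading the whole file into memory, so that playback can start before the full song has been fetched.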

7. Optimizations

  1. Identifying and eliminating bottlenecks: Say a popular artist releases a new song and the majority of requests are for this one song. All the web servers handling these requests hit the same S3 bucket, which becomes a bottleneck. Hence we store popular songs in a CDN (for example, Amazon CloudFront). Serving from the CDN lowers latency and improves the user experience.
  2. Cost optimization: A CDN comes at a high cost, so at sufficient scale building your own CDN can be more cost efficient. Some songs are popular only in certain regions, so storing such songs only in nearby CDN locations is efficient. Store only high-demand songs in the CDN and keep less popular ones on high-capacity storage servers.
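One way to decide which songs go to which CDN location is to count plays per region and keep only the top few in each. The sketch below assumes play events arrive as `(region, song_id)` pairs; the function name, the event format and the `top_n` cutoff are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def plan_cdn_placement(play_events: Iterable[Tuple[str, int]],
                       top_n: int) -> Dict[str, List[int]]:
    """Return, for each region, the top_n most played song IDs."""
    counts_by_region: Dict[str, Counter] = {}
    for region, song_id in play_events:
        counts_by_region.setdefault(region, Counter())[song_id] += 1
    # Songs outside the top_n stay on the cheaper high-capacity storage.
    return {
        region: [song_id for song_id, _ in counts.most_common(top_n)]
        for region, counts in counts_by_region.items()
    }


events = (
    [("IN", 1)] * 3 + [("IN", 2)] * 2 + [("IN", 3)]
    + [("US", 3)] * 4 + [("US", 1)]
)
placement = plan_cdn_placement(events, top_n=2)
print(placement)
```

In practice the counts would come from a streaming aggregation over recent plays rather than from a list held in memory.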

8. Fault Tolerance

  1. Data replication
  2. Redundancy and failover mechanism
  3. Robust monitoring, alerting and logging systems

9. Performance

  1. Load balancers: Choosing the right metric for distributing traffic also improves performance. For this use case, I would balance on network bandwidth rather than CPU, since streaming audio is limited by bandwidth more than by computation.
  2. Users can cache their most played songs on their own devices, which can save a lot of load on the servers.
  3. The server can also keep a limited number of the most played songs in its in-memory cache, to save requests to the database store or CDN.
  4. We can also introduce audio encoders and compressors to reduce song size, making storage more space efficient.
  5. Expanding the database with a leader-follower approach: Our application is far more read-heavy than write-heavy, with a large number of users streaming songs but relatively few artists uploading them, so we can use the leader-follower technique. This means having a primary database (the leader) that handles writes and can also serve reads, alongside multiple secondary databases (followers, also called replicas) dedicated solely to reads, which serve song and user metadata.
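The leader-follower idea can be sketched as a small router that sends writes to the leader and spreads reads across followers in round-robin order. The `ReplicatedDatabase` class and the node names are hypothetical, and treating every statement that starts with `SELECT` as a read is a simplification.

```python
import itertools
from typing import List


class ReplicatedDatabase:
    """Route writes to the leader and reads to followers (round-robin)."""

    def __init__(self, leader: str, followers: List[str]):
        self.leader = leader
        # With no followers configured, the leader serves reads too.
        self._followers = itertools.cycle(followers) if followers else None

    def route(self, sql: str) -> str:
        """Return the name of the node that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._followers is not None:
            return next(self._followers)
        return self.leader


db = ReplicatedDatabase("leader", ["follower-1", "follower-2"])
print(db.route("SELECT * FROM song WHERE song_id = 123"))
print(db.route("INSERT INTO song VALUES (...)"))
```

Followers usually lag slightly behind the leader, so a read issued right after a write may return stale data. For song metadata that is normally acceptable.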

10. Scalability

  1. Amazon S3 provides automatic scaling capabilities.
  2. Introduce horizontal scaling for web servers.

11. CAP theorem

Given the nature of a music streaming service like Spotify, where real-time access to songs and metadata is critical for user experience, the primary focus should be on Availability and Partition Tolerance. Partition tolerance means the system keeps working despite network partitions or failures, so users can access their favorite songs from anywhere, at any time. High availability means users can search for songs and play them even when a server fails. The trade-off is strict consistency: for example, a newly uploaded song may take a short while to appear on every replica, which is acceptable for this use case.

Reference materials:

  1. Google system design interview: Design Spotify (with ex-Google EM).
