Recipes for Tasteful Music Experience

Utkarsh Sopan
Published in Wynk
Jun 2, 2021 · 7 min read

Wynk is a music platform with a catalog of more than 10 million songs and a user base covering roughly a fifth of India.

We host content in almost every Indian language, sourced from publishers around the world.

Music listening is a leisure activity, so a listener will rarely spend much time searching for the music they like. Users are accustomed to a “radio”-style construct: they start with a song they like and then depend mostly on the platform’s recommendations.

Accordingly, there are several recommenders in the Wynk ecosystem:

  • Machine Recommender
  • Human Recommender
  • Coarse Grain Machine Recommender

The Machine Recommender is a machine-learning-based recommender system, nuanced enough to build a taste profile for each user from their listening history.

It’s covered in a separate blog here.

The Human Recommender is a team of musicologists who know what is trending and draw deeper insights from music charts and the industry as a whole. They cater to the large chunk of Wynk consumers who discover new music through these recommendations.

The Coarse-Grain Machine Recommender runs on aggregated user-consumption attributes and works at a segment level rather than a per-user level. It is backed by content attributes populated from various sources.

Many use cases are not the outcome of a single recommender system, because at an organizational level each subsystem has its own roadmap, objectives, and initiatives.

Instead, each subsystem provides a specific ingredient for a truly personalized user experience.

Music consumption follows many patterns, which can be broken down into these broad categories:

  • Leisure listening
  • Event/Party context
  • Nostalgia
  • Exploration, and more…

Building a specialized monolithic product for each need in the above contexts is not a sustainable strategy. Some drawbacks of monolithic products:

  • Release cycles become long and slow
  • Variations of a particular use case become a maintenance overhead
  • Scaling becomes challenging, as each context brings a different set of challenges and requirements

Ideally, we build abstract capabilities as “ingredients” and run them through a workflow engine to assemble a particular playlist/product for the end user to consume, as the sketch below illustrates.
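As a loose illustration of this idea (the interfaces and names here are hypothetical, not Wynk’s actual code), each capability can be modelled as an ingredient that refines a candidate list, and a workflow is just an ordered composition of ingredients:

```java
import java.util.List;

// Hypothetical sketch: each capability is an "ingredient" that refines
// a candidate song list for a user; a workflow chains them in order.
interface Ingredient {
    List<String> apply(String userId, List<String> candidateSongIds);
}

final class Workflow {
    private final List<Ingredient> steps;

    Workflow(List<Ingredient> steps) {
        this.steps = steps;
    }

    // Run every ingredient in sequence, feeding each one the
    // previous ingredient's output.
    List<String> run(String userId, List<String> seedCandidates) {
        List<String> current = seedCandidates;
        for (Ingredient step : steps) {
            current = step.apply(userId, current);
        }
        return current;
    }
}
```

A party playlist and a nostalgia playlist can then share the same engine and differ only in their list of ingredients, which is what makes this approach more sustainable than a monolith per context.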

Some of the capabilities we have built:

  • User Segmentation
  • Recommendations
  • Musicologist Curated Knowledge
  • Content Knowledge API
  • Viewport Templating
  • Artwork Personalization
  • Song Similarity
  • Recently Played
  • User Personalized Re-ranking
  • User level Ingestion and Key-value serving

Let us walk through some of these components in detail.

User Segmentation

Use case: check whether a user belongs to cohorts defined by attributes and lists.

User segmentation relies on clickstream data aggregated at the user level; our scalable data fabric prepares the consumable attributes for the backend segmentation system.

This system is the heart of our current personalization platform. It aggregates various data sources across the business and attaches attributes to each user; these attributes are defined by the analytics and data science teams.

The system can also split a segment on a randomized, per-user basis, which lets us run A/B experiments on any downstream service that works per user; a sketch follows.
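A minimal sketch of such a randomized split, under our own simplifying assumptions: hashing the user ID together with an experiment ID gives every user a stable bucket, so any downstream service can resolve a user’s A/B arm deterministically and statelessly.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public final class SegmentSplitter {
    // Deterministically assign a user to one of `buckets` variants.
    // The same userId always maps to the same bucket for a given
    // experiment, so no per-user assignment needs to be stored.
    public static int bucketOf(String userId, String experimentId, int buckets) {
        CRC32 crc = new CRC32();
        crc.update((experimentId + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % buckets);
    }

    public static void main(String[] args) {
        // 0 = control, 1 = variant (two-arm experiment)
        System.out.println(bucketOf("user-42", "new-ranker-exp", 2));
    }
}
```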

It also provides a scalable way to upload a list of users as a segment; the list is stored as a Bloom filter, which keeps membership checks cheap even at very high cardinality (sketched below).
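A sketch of that path using Guava’s BloomFilter (serialization and storage details are omitted and assumed): the filter answers “is this user possibly in the segment?” in constant time and compact space, trading a small, tunable false-positive rate for never missing a real member.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;

public final class SegmentBloom {
    public static void main(String[] args) {
        // Uploaded list of user IDs that define the segment.
        List<String> uploadedUsers = List.of("u1", "u2", "u3");

        // 1% false-positive rate; a Bloom filter never returns a
        // false negative, so real members are always matched.
        BloomFilter<String> segment = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                uploadedUsers.size(), 0.01);
        uploadedUsers.forEach(segment::put);

        System.out.println(segment.mightContain("u2"));   // true
        System.out.println(segment.mightContain("u999")); // almost certainly false
    }
}
```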

Stack:

We chose a reactive gRPC stack because of the tremendous scalability this API demands. The service also runs as a GraalVM native application, which halves its memory footprint, and it starts up almost instantaneously, giving us aggressive auto-scaling for all kinds of workload spikes.

Musicologist Curated Knowledge

Use case: allow the content team to collaborate with machine learning and segmentation, and serve as a source of content knowledge.

This is where we add human intelligence to our recommendation architecture; it is a system designed to be inclusive and flexible.

Here our content team interacts with the catalog and builds playlists targeted at specific user taste profiles. This is coupled with the segmentation capabilities above, so their curations can target specific user cohorts.

The heart of this system is its inclusive design language. On the backend it touches nearly every underlying service in the Wynk universe, which calls for an API-gateway style of design.

Stack:

  • Frontend: React
  • APIs: Spring Boot
  • Datastores: downstream services; portal IAM: MongoDB

Content Knowledge API

Use case: answer complex questions about content for the recommendation system.

This system gathers various content attributes sourced from the catalog, analytics, machine learning, and content partners.

It has several underlying storage layers and an ingestion architecture that supports adding new content and attribute sources. The API layer is fluent and scalable.

Stack:

  • Storage: JanusGraph (Elasticsearch, Scylla)
  • Aggregate query layer: Apache Druid
  • API layer: Quarkus reactive gRPC, OpenFaaS
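To make the idea concrete, here is a hypothetical Gremlin traversal against JanusGraph; the vertex labels and edge names are invented for illustration and are not Wynk’s actual schema:

```java
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import java.util.List;

public final class ContentKnowledgeQuery {
    public static void main(String[] args) throws Exception {
        // Connect to a remote Gremlin Server fronting JanusGraph.
        GraphTraversalSource g = AnonymousTraversalSource.traversal()
                .withRemote(DriverRemoteConnection.using("janus-host", 8182, "g"));

        // "Which moods are associated with songs by this artist?"
        // The labels ("artist", "song", "mood") and edge names are hypothetical.
        List<Object> moods = g.V().has("artist", "name", "A. R. Rahman")
                .in("performed_by")   // songs performed by the artist
                .out("tagged_with")   // mood tags attached to those songs
                .values("name")
                .dedup()
                .toList();

        System.out.println(moods);
        g.close();
    }
}
```

Druid complements this kind of traversal with aggregate questions (for example, plays per mood over time) that a graph query is not suited for.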

Viewport Templating

Use case: allow the user experience to evolve with the product.

Every app faces this problem: many versions of the app are live in the market, yet the end-user experience must keep improving. This system addresses it by keeping the view template as data in the database rather than in code, assuming a base set of “key classes” that all app versions respect.

This keeps maintenance of the app experience easy and extensible: template responses can be filtered by the app version the client sends in its headers (a sketch follows), which also lets us stage new capabilities to end users on a single template.
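A toy sketch of the version-gating idea, with field and class names of our own invention: each template element carries the app-version range that understands it, and the service filters the stored template against the client’s version header before responding.

```java
import java.util.List;
import java.util.stream.Collectors;

public final class ViewportTemplating {
    // One element of a stored template: a "key class" all apps understand,
    // plus the app-version range in which this element may be served.
    record TemplateElement(String keyClass, int minAppVersion, int maxAppVersion) {}

    // Keep only the elements this app version can render.
    static List<TemplateElement> render(List<TemplateElement> template, int appVersion) {
        return template.stream()
                .filter(e -> appVersion >= e.minAppVersion()
                          && appVersion <= e.maxAppVersion())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The template lives as data (e.g. rows in a database), not as code.
        List<TemplateElement> template = List.of(
                new TemplateElement("RAIL", 1, Integer.MAX_VALUE),
                new TemplateElement("HERO_BANNER", 240, Integer.MAX_VALUE)); // newer apps only

        System.out.println(render(template, 230)); // old app: RAIL only
        System.out.println(render(template, 250)); // new app: both elements
    }
}
```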


Artwork Personalization

Use case: provide a contextual and personalized artwork experience.

Artwork is an important part of the content experience, and a lot of thought goes into how the artwork of a particular playlist should look. This calls for a flexible system that connects content knowledge with artwork sources. Some use cases draw artwork from the various entities relevant to a particular context.

We also need an image-processing layer that can operate at Wynk scale. Our DevOps team built an excellent infrastructure based on the Thumbor server, which let us collaborate with the design team to marry their visual elements with catalog items.

Finally, we needed a flexible mechanism that gathers images from multiple sources and presents them to the client as a single URL, which Thumbor then serves; a sketch of the URL signing follows. This workflow is powered by OpenFaaS, a nano-service framework running on Kubernetes.
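For context, Thumbor serves images through signed URLs: the URL embeds an HMAC-SHA1 signature (URL-safe base64) of the requested operations, so only URLs minted by the backend are honored. A minimal sketch, with a placeholder security key:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class ThumborUrl {
    // Build a signed Thumbor URL: base64url(HMAC-SHA1(key, path))
    // prepended to the unsigned operation path.
    static String sign(String securityKey, String unsignedPath) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(
                securityKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] sig = mac.doFinal(unsignedPath.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().encodeToString(sig) + "/" + unsignedPath;
    }

    public static void main(String[] args) throws Exception {
        // "300x300/smart" asks Thumbor for a smart-cropped 300x300 thumbnail.
        String path = "300x300/smart/images.example.com/playlist-cover.jpg";
        System.out.println("/" + sign("MY_SECURITY_KEY", path));
    }
}
```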

Stack:

  • Image processing: Thumbor
  • Workflow: OpenFaaS on Kubernetes

Beyond the capabilities above, we also have some platform-level capabilities that are very useful for maintaining team morale and time to market.

Zero code Observability

We run a service mesh, Istio, on Kubernetes, which gives us advanced traffic routing and observability out of the box. This helps us identify bottlenecks in the system and scale each component individually based on the requests it receives rather than on generic system-level metrics. Horizontal Pod Autoscaling coupled with Istio metrics gives us a scalable compute fabric.

Canary Releases

In a system of this scale and complexity, every commit ideally needs a quality check. We have a strong QA team behind these systems, but to ensure zero user impact we also run a canary release framework built on our service-mesh capabilities. Flagger gave us Slack integration, simple canary definitions, and monitoring out of the box. It is a powerful tool because it gives the dev and QA teams a last safety net, which makes them more receptive to change requests and helps the business side scale.

Autoscaled Kafka Ingestion

The content discovery platform hosts many datasets coming from various sources. We do not run our own aggregation layers, and most of our serving datasources work on a “per-record” ingestion model, so we need a buffer between data ingestion and storage; Kafka provides that buffer. Many use cases are powered by batch workloads, and maintaining a long-running fleet of Kafka consumers between batches is inefficient for our runtime.

We built lag-based autoscaling consumers aimed at streaming batch-job outputs into our key-value stores; a sketch of the lag check follows. This helped us move fast when hosting new use cases. We used Azkaban, kubectl, and Kafdrop (for lag monitoring) to build the system.
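A sketch of the lag check at the heart of such a scaler, using Kafka’s AdminClient; the consumer group name, threshold, and the scaling action itself (for example, a kubectl scale call) are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public final class LagBasedScaler {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the consumer group.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("kv-ingestion-group")
                    .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions.
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest = admin
                    .listOffsets(committed.keySet().stream()
                            .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
                    .all().get();

            // Lag = end offset minus committed offset, summed over partitions.
            long totalLag = committed.entrySet().stream()
                    .mapToLong(e -> latest.get(e.getKey()).offset() - e.getValue().offset())
                    .sum();

            // Placeholder policy: scale the consumer deployment up when lag
            // crosses a threshold, and back down when the backlog drains.
            System.out.println("total lag = " + totalLag
                    + (totalLag > 100_000 ? " -> scale up" : " -> ok"));
        }
    }
}
```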

Credit goes to the awesome team behind it: Ankit Srivastava, Arpit Anand, Krittam Kothari, Trideep Sharma, Rahul Swami, Randhir Kumar, Shreyans Mongia, Sanchit Malhotra, Dheeraj Sharma, Akhil Sai, Sourabh Kalal, and many more.
