Data Science at Pratilipi
Data science increasingly affects many aspects of our lives, and this is probably even more true for any technology-based company. At Pratilipi, we always think from a user-first perspective, and data science often gives us a better solution than most other approaches.
In this short article, I will try to provide a very high-level overview of some of the use cases where our data science team has been able to significantly improve on our initial approaches. Two points need to be highlighted here:
1. All services are a work in progress. We are actively trying to improve them, and we continuously learn and iterate on the best approach for each use case.
2. Some use cases touch different stakeholders (readers, writers, internal teams, etc.), so we often have to figure out what the right trade-offs are.
Major Use Cases
Recommendation service: This service forms the basis of how we suggest personalised content to our readers. It runs daily and generates a list of content that can then be shown to users when they are on our platform.
Next Read recommendation: This service suggests a list of stories to users when they finish a story. The aim is to surface stories that they would like to read next.
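As an illustration of how a next-read list could be produced, the sketch below ranks stories by how many readers of the just-completed story also read them. This is a simple co-occurrence heuristic chosen for illustration and not necessarily how the production service works; the function name `next_reads` and the sample reading histories are hypothetical.

```python
from collections import Counter


def next_reads(histories, finished_story, top_n=3):
    """Rank stories by how many readers of `finished_story` also read them.

    `histories` maps a user id to the set of story ids that user has read.
    This is a plain co-occurrence count, used only as an illustration.
    """
    co_reads = Counter()
    for stories in histories.values():
        if finished_story in stories:
            # Count every other story read by someone who read the finished one.
            co_reads.update(s for s in stories if s != finished_story)
    # Sort by count (descending), then by story id so ties are deterministic.
    ranked = sorted(co_reads.items(), key=lambda kv: (-kv[1], kv[0]))
    return [story for story, _ in ranked[:top_n]]


# Hypothetical reading histories.
histories = {
    "u1": {"story_a", "story_b", "story_c"},
    "u2": {"story_a", "story_b"},
    "u3": {"story_a", "story_c", "story_d"},
    "u4": {"story_d", "story_e"},
}
print(next_reads(histories, "story_a"))
```

A heuristic like this only reflects what similar readers did; it says nothing about the text of the stories themselves.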
Promo notifications: This service identifies one piece of content to suggest to users once or twice a day as a marketing/promo notification. The primary purpose of promo notifications is to help our readers discover content that they are likely to enjoy.
Social Media Posts: This service identifies content that we post on various social media platforms. The aim is to give writers one additional place of visibility and to give our newer readers an initial flavour of the content on our platform before they become regulars. For some readers, even after they become very active, our social media posts remain a way to both discover and talk about great stories.
Quality check service: This service examines the quality of content published on our platform, based on both its text and its cover image. Once content is flagged, our language team takes appropriate action on it. Content may be flagged for copyright violation, pornographic content, etc. For checking text we have built our own internal models, and for checking images we currently use AWS Rekognition.
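A minimal sketch of how a cover image could be checked with Rekognition's `detect_moderation_labels` call through boto3 is shown below. The helper name `flag_cover_image`, the 80% confidence threshold, and the stub client (with made-up labels, so the example runs without AWS credentials) are assumptions for illustration only.

```python
def flag_cover_image(client, image_bytes, min_confidence=80.0):
    """Return the moderation label names reported for an image.

    `client` is expected to behave like boto3.client("rekognition").
    The 80.0 threshold is an illustrative assumption.
    """
    response = client.detect_moderation_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return [label["Name"] for label in response.get("ModerationLabels", [])]


class StubRekognition:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def detect_moderation_labels(self, Image, MinConfidence):
        # Made-up labels; a real response comes from the Rekognition service.
        labels = [
            {"Name": "Explicit Nudity", "Confidence": 97.1, "ParentName": ""},
            {"Name": "Violence", "Confidence": 42.0, "ParentName": ""},
        ]
        # Mimic the service, which only returns labels at or above MinConfidence.
        kept = [item for item in labels if item["Confidence"] >= MinConfidence]
        return {"ModerationLabels": kept}


flags = flag_cover_image(StubRekognition(), b"fake-image-bytes")
print(flags)
```

With real credentials, the stub would be replaced by `boto3.client("rekognition")`, and any returned labels could be handed to a human reviewer rather than acted on automatically.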
Meaning & Synonyms: For readers, we have built a dictionary that provides the meanings of words and, wherever possible, their synonyms. It is available to our app users in most of our languages. We are thankful to the open source communities for building the base models on which these services run. As a token of our gratitude and to give back, we have opened this up as an API for anyone who wants to use it on their own platform: https://gruite.docs.apiary.io/
Content meta service: For content published on the platform, we identify tags and keywords from the text, which help us understand that content better and enable its discovery by our users.
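One common way to pull candidate keywords out of a story is TF-IDF scoring, where words that are frequent in one story but rare across the others score highest. The sketch below illustrates that general technique and is not necessarily what the content meta service uses; the function name `top_keywords`, the whitespace tokenisation (which would need replacing for many languages), and the sample texts are assumptions.

```python
import math
from collections import Counter


def top_keywords(docs, doc_index, top_n=3):
    """Return the `top_n` highest TF-IDF words for docs[doc_index]."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    tokens = tokenised[doc_index]
    term_freq = Counter(tokens)
    # TF-IDF: relative frequency in this document times log inverse document frequency.
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }
    # Sort by score (descending), then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical story snippets.
docs = [
    "the ghost haunted the old haveli",
    "the detective solved the old case",
    "the ghost scared the detective",
]
print(top_keywords(docs, 0, top_n=2))
```

Words that appear in every snippet, such as "the", get a score of zero and drop to the bottom of the ranking.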
Top performing content: We identify content that is having a greater impact on the platform, based on reads, ratings, reviews, and various other user actions. Different teams then use this content for IP acquisition, conversion to other formats such as audio or comics, etc.
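To make the idea of combining reads, ratings, and reviews concrete, here is a minimal scoring sketch. The weights, the log damping of counts, the 1-5 rating scale, and the field names are all hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ContentStats:
    title: str
    reads: int
    avg_rating: float  # assumed to be on a 1-5 scale
    reviews: int


def engagement_score(stats, w_reads=1.0, w_rating=2.0, w_reviews=1.5):
    """Combine engagement signals into one number (weights are hypothetical)."""
    # log1p dampens large counts so one very large signal does not dominate.
    return (
        w_reads * math.log1p(stats.reads)
        + w_rating * stats.avg_rating
        + w_reviews * math.log1p(stats.reviews)
    )


def top_performing(contents, top_n=2):
    """Return the titles of the `top_n` highest-scoring contents."""
    ranked = sorted(contents, key=engagement_score, reverse=True)
    return [c.title for c in ranked[:top_n]]


# Hypothetical catalogue entries.
catalogue = [
    ContentStats("Story A", reads=12000, avg_rating=4.6, reviews=340),
    ContentStats("Story B", reads=900, avg_rating=4.9, reviews=25),
    ContentStats("Story C", reads=45000, avg_rating=3.2, reviews=120),
]
print(top_performing(catalogue))
```

In practice the choice of weights is itself a trade-off between stakeholders, since a score tuned for raw reach will favour different stories than one tuned for reader satisfaction.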
We believe we have only just started scratching the surface here, and we still have a long way to go to get to where we want to be.
If you are a data scientist or an aspiring data scientist who finds these problems interesting, or even if you just want to have a chat about any of these use cases, please feel free to reach out to me at sunny@pratilipi.com