Data Science at ShareChat: Technologies and Challenges

Ankur Shrivastava
Jan 17 · 5 min read

Abstract

  • It further provides a brief look at all the challenges faced by ShareChat, and how the data science team is helping it resolve these challenges.
  • Simultaneously, it addresses the key element of content recommendation, and how it differs by virtue of being a part of a vernacular language platform.

In our previous article, we gave you a brief description of ShareChat: what we stand for, a closer look at our platform and its demographics, and how we are using the latest technologies to process a vastly increased array of data. Here, we delve further into the specifics of our data, and how our data science team is innovating on core technologies to build a trustworthy, highly customised platform for the Next Billion to voice their opinions.

Decoding the data

While many machine learning (ML) and AI recommendation models are already available on the internet, applying them at ShareChat is a challenge that we face every day. Key to that is the problem of vernacular languages. For instance, while efficient models for recommending English-language video content are readily available online, there is little contextual understanding of vernacular content, and it is important that we understand the content being shared on our platform. Only then will we be able to make the platform a stable, successful one.

The right choice of technology also depends on the scale of data, which is different for every social media platform. For instance, matrix factorisation, a method commonly used in several recommendation engines, doesn't work well at ShareChat. Matrix factorisation performs well when at least 5–10% of the gathered data includes user engagement information, such as likes, shares and favourites, whereas at ShareChat the data density is just 0.03%.
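To make the density figure concrete, here is a minimal sketch. It assumes that "data density" means the fraction of all possible user-content pairs that carry at least one engagement signal, which is the usual reading for recommendation data. The user, content and interaction counts are hypothetical, picked only so that the result matches the 0.03% quoted above; they are not ShareChat's real numbers.

```python
def interaction_density(n_interactions, n_users, n_items):
    """Fraction of the user-item matrix that holds an observed engagement."""
    return n_interactions / (n_users * n_items)


# Lower end of the 5-10% range in which matrix factorisation is said to work well.
MIN_DENSITY_FOR_MF = 0.05

# Hypothetical counts, chosen only to reproduce the 0.03% figure.
n_users, n_items = 10_000, 5_000
n_interactions = 15_000

density = interaction_density(n_interactions, n_users, n_items)
dense_enough = density >= MIN_DENSITY_FOR_MF

print(f"density = {density:.4%}, dense enough for plain MF: {dense_enough}")
```

Under these assumed counts the matrix is more than a hundred times sparser than the 5% lower bound, which is the gap that plain matrix factorisation struggles to bridge.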

A wide variety

The challenges do not end here. Others that we face include the scalability of our technology, the time complexity of recommendation and processing, incremental learning, indexing methods for faster retrieval of data, identifying content evaluation criteria, measuring user retention over short periods, and reducing the latency of our technologies. This is only a brief overview of the challenges of cutting-edge machine learning technology, and we aim to discuss them in greater detail going forward.

Our core technologies

On our platform, these technologies are implemented in two primary channels: our Trending Feed and the Content Processing Pipeline. The right recommendation is crucial for our Trending Feed, for it is our landing page, where our users arrive to consume new content. The core technologies for content recommendation in the Trending Feed include matrix factorisation, factorisation machines and neural collaborative filtering, an advanced deep learning algorithm that further helps us extract meaning from new user data.
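For readers unfamiliar with matrix factorisation, the following is a textbook sketch, not ShareChat's production code. It learns one small vector per user and per piece of content with stochastic gradient descent, so that their dot product approximates the observed engagement. The function name `factorise`, the toy interactions and all hyperparameters are assumptions made for illustration.

```python
import numpy as np


def factorise(interactions, n_users, n_items, rank=2,
              lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, value) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, rank))
    Q = rng.normal(scale=0.1, size=(n_items, rank))
    for _ in range(epochs):
        for u, i, r in interactions:
            p_u = P[u].copy()           # keep the old user vector for the item update
            err = r - p_u @ Q[i]        # prediction error on this observed pair
            P[u] += lr * (err * Q[i] - reg * p_u)
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q


# Toy engagement data: 1.0 = the user engaged with the post, 0.0 = saw it and did not.
interactions = [
    (0, 0, 1.0), (0, 1, 1.0),
    (1, 0, 1.0), (1, 2, 0.0),
    (2, 1, 1.0), (2, 2, 0.0),
]

P, Q = factorise(interactions, n_users=3, n_items=3)
scores = P @ Q.T                        # predicted engagement for every user-item pair
max_error = max(abs(r - scores[u, i]) for u, i, r in interactions)
print(np.round(scores, 2), max_error)
```

The unobserved cells of `scores` are the recommendations. With very sparse data most users and posts receive too few updates for their vectors to become meaningful, which is one reason a plain factorisation is typically combined with other signals.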

In the data stream of our Trending Feed, the matrix factorisation method is fused with the deep learning algorithms, since typical engagement data does not give us enough information to refine our feed. Collaborative data filtering and dynamic feature engineering also provide additional information, and help create real-time data points, thereby allowing our ML algorithms to be trained.

Finally, when it comes to the content processing pipeline, perhaps the most crucial components are the convolutional neural networks, which detect vernacular video data, process it by language and content type, and recommend it based on data points. This is the trickiest part of our data-based technology work, because such work has seldom been done before. Computer vision algorithms and contextual text processing further enable us to understand each language in its native form, and hence recommend efficiently. For instance, it is very important for us to get the essence of a piece of text as well as gauge its tone. Only then can we actually curate it into humour, satire, propaganda or the myriad other classifications.

Food for thought

Ayush Mittal


Originally published at https://blog.sharechat.com on January 17, 2019.

Sharechat

India's fastest growing social network

Written by Ankur Shrivastava

Marketing Analyst / SEO at ShareChat | Full Stack Digital Marketer