Building nGraph: Noon's Social Learning Graph — Part 1

Bijeet Singh · Noon · Dec 17, 2021

Context

In our last post, we talked about social learning, students' engagement in the classroom, and how we at Noon are building a social learning graph to better engage students. Over the course of the next few posts, we will dig deeper into the design choices and architecture of nGraph, the social learning graph that powers the social learning experiences at Noon. But before we get into the graph itself, it is important to understand how the interactions between the three pillars of Noon (Student, Teacher and Content) are captured and how they flow into the system. That is the focus of this post.

Event-First Thinking

There is always something happening on our platform: entities interacting with each other, content being created, and the entities themselves moving through their own lifecycles. Moreover, in order to provide a rich and relevant social learning experience, it is imperative that the system reflects the most recent state of entities and their interactions.

So, how do we reconcile these two needs in a way that is elegant, scalable, and capable of handling the varied and constantly changing types of interactions and activities on our platform?

This is where event-first thinking helps. Each action on our platform is treated as an independent event that can be acted upon asynchronously in real time. Events are first-class citizens in our ecosystem.

As we embarked on the journey of building nGraph, we quickly realised the value of adopting an event-first approach. Our students and teachers interact on the platform in a great many ways, generating tons of events that can be used to derive the affinity between students, teachers, and content, and to infer the behavioural characteristics of entities.

For example:

  • a student attending a live classroom on physics is an event that lets us determine the affinity between that student and the subject, physics.
  • a student's tendency to activate the mic is an important behavioural signal that can be used when forming teams.

With an event-first approach, there are three critical things that have to be addressed:

  1. Event generation
  2. Event hub — the home of events
  3. Setting the events in motion

Event Generation

Event generation is deceptively simple. If not done right, it can compromise the quality of data right at the source. There are two places from which we trigger events:

Backend services — for any action performed by the user on Noon that results in a state mutation, events are triggered from the corresponding backend system. This is an important and conscious decision: wherever feasible, we avoid generating events from the many different clients that we have.

For generating events from the backend, we have developed an event generation library that asynchronously pushes the events to Kafka.

For the backend systems, event generation is a side-effect. So it is critical that event generation does not interfere with the normal flow, has minimal impact on the performance of the backend systems, and is fairly simple to integrate. With these in mind, we do not impose any schema adherence at the time of generating the event. At this point, we just capture the context, request, and response in an event, which is asynchronously pushed to a set of intermediate Kafka topics that sit before the Event Hub.

Integrating the event generation library with a Spring Boot microservice looks something like this:
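What follows is a minimal, hypothetical sketch rather than our exact library API: the NoonEventPublisher bean, the topic name, and the annotations are illustrative assumptions. It demonstrates the properties described above: asynchronous publishing off the request thread, no schema enforcement, and no way for a failed event to break the main flow.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Component;

// Hypothetical sketch of the event generation bean; names are illustrative.
@Component
public class NoonEventPublisher {

    private static final Logger log = LoggerFactory.getLogger(NoonEventPublisher.class);

    private final KafkaTemplate<String, String> kafka;
    private final ObjectMapper mapper = new ObjectMapper();

    public NoonEventPublisher(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka;
    }

    // @Async keeps event generation off the request thread
    // (requires @EnableAsync on a configuration class).
    @Async
    public void publish(String action, Object request, Object response) {
        try {
            // No schema enforcement at this stage: just capture the
            // action, request, and response as-is.
            ObjectNode event = mapper.createObjectNode();
            event.put("action", action);
            event.set("request", mapper.valueToTree(request));
            event.set("response", mapper.valueToTree(response));
            // Intermediate topic that sits before the Event Hub.
            kafka.send("intermediate.user-actions", event.toString());
        } catch (Exception e) {
            // A failed event must never fail the user's request.
            log.warn("Failed to publish event: {}", action, e);
        }
    }
}
```

A backend service would then call events.publish("student.enrolled", request, enrollment) right after the state mutation succeeds; since the call is asynchronous and swallows failures, event generation stays a true side-effect.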

Client side (Web / App) — for read-only kinds of actions, we trigger the events from the clients.

Event Hub

The Event Hub is the middleman for downstream consumption of events. It is essentially a set of Kafka topics organised by domain, where each domain is a logical collection of a related set of activities. For example, all events related to an online classroom are categorised into a live-session domain. This logical organisation of events into domains helps us strike the right balance in the logistics of events: topics are neither too fine-grained nor one monolithic stream.

Simple & Consistent Schema For Events

All the events residing in the Event Hub adhere to our Noon Event Format, which is essentially a semantic triple of subject-predicate-object, along with some associated properties of the triple's entities. The subject is the actor performing an action, the predicate is the type of action, and the object is the entity acted upon. Every user action can be represented as a semantic triple, and that makes events easy to reason about in downstream systems.
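As a hypothetical illustration (the field names here are assumptions, not the actual Noon Event Format schema), a student joining a physics classroom could be captured as:

```json
{
  "subject":   { "type": "student", "id": "student_42", "grade": 9 },
  "predicate": "joined_session",
  "object":    { "type": "session", "id": "session_101", "topic": "physics" },
  "timestamp": "2021-12-01T10:15:00Z"
}
```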

We use our real-time stream processing platform, Pronto (discussed in the next section), to consume events from the intermediate Kafka topics, normalise and transform them into the Noon Event Format, and push them into the Event Hub.

Setting the events in motion

There are quite a few events in our system, and the requirements for downstream consumption are even more varied: the same event can have different interpretations, and correspondingly different mutations, across the different data stores in our ecosystem. There cannot be a single team handling all of this downstream consumption; it has to be democratised. Also, not every product engineer needs to know the nitty-gritty of writing a stream processing application and managing the deployment and monitoring of the jobs. This is why we came up with Pronto.

Pronto

Well, Pronto deserves a post of its own, but the focus of this post would be incomplete without talking about how we set the data in motion. Pronto is a real-time stream processing platform, built on top of Apache Flink, that meets all the real-time processing requirements at Noon. Its goal is to make it easy for product engineering teams to consume, transform, and eventually land the stream of events generated from users' activities into application-specific rich data stores.

Creating a Pronto application requires configuring the sources and the sinks, and providing an implementation of the transformation logic. The Job Compiler, our generic Flink job driver, takes the job configuration and creates the DAG of source, transformers, and sink. It allows you to chain multiple transformers, as sketched below.
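As a rough sketch of where a product engineer's code fits in, assuming purely for illustration that a transformer is a Flink MapFunction over JSON strings (Pronto's actual contract may differ):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.api.common.functions.MapFunction;

// Hypothetical transformer; the product engineer implements only this
// logic, while the Job Compiler wires the Kafka source, the chain of
// transformers, and the sink into a Flink DAG.
public class NoonEventNormaliser implements MapFunction<String, String> {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public String map(String rawEvent) throws Exception {
        JsonNode raw = MAPPER.readTree(rawEvent);
        // Reshape the raw context/request/response capture into the
        // subject-predicate-object triple of the Noon Event Format.
        ObjectNode event = MAPPER.createObjectNode();
        event.set("subject", raw.path("context").path("user"));
        event.put("predicate", raw.path("action").asText());
        event.set("object", raw.path("response").path("entity"));
        return MAPPER.writeValueAsString(event);
    }
}
```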

A Pronto job configuration looks something like this:
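(The sketch below is hypothetical; the field names are illustrative assumptions, not Pronto's actual schema. Note the chain of transformers between the source and the sink.)

```json
{
  "jobName": "live-session-normaliser",
  "source": {
    "type": "kafka",
    "topics": ["intermediate.live-session"],
    "consumerGroup": "pronto-live-session"
  },
  "transformers": [
    "com.noon.pronto.transformers.NoonEventNormaliser",
    "com.noon.pronto.transformers.SessionEnricher"
  ],
  "sink": {
    "type": "kafka",
    "topic": "eventhub.live-session"
  },
  "parallelism": 4
}
```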

Infrastructure, Deployability and Observability

Pronto takes care of job deployment, maintaining the uniqueness of jobs, and application observability.

It ensures that there are no duplicate jobs (jobs having the same combination of source and sink) in a running state: a new job will not be deployed if another job with the same combination is already running.

It allows you to plug custom metrics into the transformers and have application-specific Grafana dashboards. We also have Grafana dashboards for monitoring the health of the cluster.

Pronto runs on an elastic Flink cluster on EMR, and collecting metrics in such a dynamic setup isn't straightforward. We use a hybrid push-pull mechanism for collecting metrics from the Flink Task Managers.

Next

In this post, we covered how we generate events (the event generation library), the home of events at Noon (the Event Hub), and the mechanism that sets the events in motion (Pronto). These are super critical to building the Social Learning Graph, which deals with varied kinds of interactions and activities on the entities that we have.

In the next parts of this series, we will talk about how we ingest data into the graph, what the graph looks like, and how it helps power the social offerings at Noon. We will also talk about the GraphQL-based serving layer of nGraph.

It takes a team to build something as complex and critical to the company’s mission and we at Noon are fortunate to have a great bunch of engineers trying to solve this problem.

If you share the same enthusiasm for solving these kinds of problems, which contribute towards making learning fun and radically changing the way people learn, we would love to hear from you. Please reach out at hrindia@noonacademy.com or ping me on LinkedIn.
