Scaling xAPI

Lenin Lakshminarayanan
10 min read · Jun 24, 2024


Scaling xAPI (Experience API) comes with several challenges that must be addressed to keep the system efficient, reliable, and responsive as it grows. In this segment, we will look at three key challenges:

  1. Data volume of xAPI messages
  2. Performance, latency and reliability
  3. Database management

1. Data volume of xAPI messages

xAPI (Experience API) can capture extremely detailed data, tracking virtually any user interaction you want. For example, consider an “article” with several topics and an embedded video. You can track the following interactions:

  • Article Launch: Indicates that the user opened the article.
  • Topic Viewed: Indicates that the user viewed a topic in the article but has not completed it.
  • Topic Completion: Indicates that the user finished a topic within the article.
  • Media Interaction: Indicates that the user interacted with a media object in the article, such as a flip card.
  • Activity Pause: Indicates that there is no user activity for a defined period of time.
  • Activity Resume: Indicates that the user returned to the article after a pause.
  • Article Completion: Indicates that the user finished the entire article.
  • Started Viewing Video: Indicates that the user started viewing the video.
  • Paused Video: Indicates that the user paused the video.
  • Resumed Video: Indicates that the user resumed a paused video.
  • Video Fast Forwarded: Indicates that the user fast-forwarded the video. Counting these events shows how often users skip ahead in the content.
  • Video Completed: Indicates that the user finished watching the video.

So, even a simple article with a few topics and media objects can generate numerous tracking events per user. Now, imagine tracking interactions for more complex learning experiences across a large user base. The data generated by xAPI can quickly become massive, both in volume and in velocity (the rate at which it arrives).
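To make this concrete, here is a minimal sketch of what a “Topic Completion” statement for the article above might look like as a TypeScript object. The actor, activity ID, and display names are illustrative; only the verb IRI is a standard ADL identifier.

```typescript
// Illustrative "Topic Completion" statement; actor and activity are made up.
const topicCompleted = {
  actor: {
    objectType: "Agent",
    name: "Jane Learner",                                    // hypothetical user
    mbox: "mailto:jane.learner@example.com",
  },
  verb: {
    id: "http://adlnet.gov/expapi/verbs/completed",          // standard ADL verb
    display: { "en-US": "completed" },
  },
  object: {
    objectType: "Activity",
    id: "https://example.com/articles/scaling-xapi/topic-2", // hypothetical IRI
    definition: { name: { "en-US": "Topic 2" } },
  },
  timestamp: new Date().toISOString(),
};
```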

As discussed in the segment “How to get started with xAPI?”, once your objectives are clear, you must carefully decide what data is relevant to capture to meet them. Your xAPI instrumentation should then generate only those relevant events; everything else is noise and should be ignored. This is the first step in dealing with xAPI scaling.
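One simple way to enforce this at the instrumentation layer is an allowlist of verb IRIs. Here is a minimal sketch, assuming your objectives call for launches, completions, and pass/fail results; the actual list must come from your own requirements.

```typescript
// Hypothetical allowlist derived from your analytics objectives.
const RELEVANT_VERBS = new Set([
  "http://adlnet.gov/expapi/verbs/launched",
  "http://adlnet.gov/expapi/verbs/completed",
  "http://adlnet.gov/expapi/verbs/passed",
  "http://adlnet.gov/expapi/verbs/failed",
]);

// Send a statement only if its verb is on the allowlist; drop everything else.
function emitIfRelevant(
  statement: { verb: { id: string } },
  send: (s: unknown) => void,
): void {
  if (RELEVANT_VERBS.has(statement.verb.id)) {
    send(statement);
  }
}
```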

More data about your users’ behavior and interactions is generally valuable, so there is a trade-off between collecting the data your analytics need and maintaining system performance and scalability. In practice, this is an iterative process: you start by enabling certain events, derive insights from the data, and then identify additional data needs. Based on these insights, you can enable more events to refine and expand your analytics.

2. Performance, latency and reliability

Performance is a key aspect when it comes to supporting a large number of xAPI messages. Let’s put this into perspective with an example. Suppose you have a busy learning platform that attracts roughly 5,000 concurrent users. These users are actively engaging with your platform at any given time. As they interact with your learning content, such as launching an article, interacting with media objects like flip cards and videos, or taking a quiz or assessment, each interaction generates an xAPI statement.

In the worst-case scenario, each user may generate one or more xAPI statements per second, meaning your back-end system must reliably handle at least 5,000 xAPI messages per second. As the number of active users grows and the interactions you track increase, this rate rises considerably. And this traffic is in addition to whatever your back-end systems are already handling.

This illustrates the importance of designing a scalable system that can efficiently process a high volume of xAPI messages to ensure reliable performance and a good user experience. Let’s look at a standard, best-practice architecture for managing xAPI messages at scale.

2.1 xAPI Source

xAPI statements can be captured from various sources, with Learning Management Systems (LMSs) being a primary example. An LMS can track events like course completions, quiz attempts, scores, grades, and user logins, generating xAPI statements at the appropriate times. Other sources include custom eLearning course content, modules, and mobile applications. Regardless of the source, it’s crucial to consider how and when these xAPI statements are sent, and what your requirements are if any of them are lost.

For instance, if an LMS sends a ‘course completed’ xAPI statement and your backend is offline, or the client experiences a transient network issue, this xAPI statement may be lost. If you are using these messages to build user course completion trends, losing a few messages might be acceptable. However, if these xAPI statements are used to trigger business rules, message loss is not tolerable. For example, if a business rule awards “course completion certificates” to users, losing the “course completed” xAPI statement means the user won’t receive their certificate, leading to support issues and user dissatisfaction. Therefore, reliable instrumentation to ensure xAPI messages are sent and received without fail is critical.

Another important consideration is the timing of sending xAPI statements from the client side (such as an LMS or custom application). For example, if an article has 10 topics and 5 videos on a single page, sending an xAPI statement immediately after a user finishes each topic or video results in at least 15 separate xAPI statements (REST calls) to the backend. Most default xAPI clients follow this approach, but it is chatty and stresses the client side, especially when users access content on mobile devices or over poor network connections.

One effective approach is to buffer these interactions locally in the user’s browser. When the user navigates from one page to another, a single request can carry all 15 xAPI statements. This bulk sending of statements is far more efficient and addresses the challenges above.
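Here is a minimal sketch of this pattern, assuming a hypothetical bulk endpoint at /xapi/statements. Statements accumulate in localStorage and are removed only after the backend acknowledges them, which also provides the retry behavior discussed below.

```typescript
const BUFFER_KEY = "xapi-buffer"; // arbitrary localStorage key

// Append a statement to the local buffer instead of sending it immediately.
function bufferStatement(statement: object): void {
  const buffer: object[] = JSON.parse(localStorage.getItem(BUFFER_KEY) ?? "[]");
  buffer.push(statement);
  localStorage.setItem(BUFFER_KEY, JSON.stringify(buffer));
}

// Flush the whole buffer in one bulk request, e.g. on page navigation.
// Statements are cleared only after a successful response, so a failed
// send leaves them in place to be retried on the next flush.
async function flushBuffer(): Promise<void> {
  const buffer: object[] = JSON.parse(localStorage.getItem(BUFFER_KEY) ?? "[]");
  if (buffer.length === 0) return;
  try {
    const res = await fetch("/xapi/statements", { // hypothetical bulk endpoint
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buffer),
      keepalive: true, // lets the request outlive the page unload
    });
    if (res.ok) localStorage.removeItem(BUFFER_KEY);
  } catch {
    // Transient network failure: keep the buffer and retry later.
  }
}
```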

However, this approach has its downsides as well. If a user makes partial progress on a page or article (with progress stored locally), and then leaves or their browser crashes, the partial statements stored locally will not be sent to the backend unless the user finishes the rest of the article or page. This means your backend might lag in capturing the user’s progress. If the user clears their browser cache, any unsent local messages will be lost. Despite these downsides, you must assess the trade-offs and choose the approach that works best for your needs. As an organization handling millions of xAPI messages daily, buffering and bulk sending xAPI statements has significantly helped us scale xAPI.

Another benefit of this buffering approach is easier handling of upstream failures, since messages can be resent until they succeed. If an xAPI statement fails to send, it remains in the local browser cache and is resent on the next attempt (e.g., after the next page completion); it is not removed until successfully delivered. You can also give the user a visual cue about pending messages awaiting synchronization with the backend, soliciting user intervention if needed. This method ensures visibility into failed messages and enables retries at appropriate times, improving the reliability of capturing user interaction and progress information.

2.2 API Gateway

An API Gateway provides several benefits in managing, securing, and scaling APIs (a minimal middleware sketch follows the list). Here are the key benefits:

  1. Security: The API Gateway serves as a single entry point for all client requests, ensuring consistent enforcement of authentication, authorization, and HTTPS encryption across all APIs. It centralizes security policies and shields your backend services from direct exposure to clients. Various authentication methods are available, such as API key authentication, OAuth 2.0 tokens, Basic Auth, and custom authentication; depending on your organization’s security requirements, you can choose the most suitable mechanism. Regardless of the chosen method, the gateway guarantees that requests are authenticated before reaching your backend; otherwise, clients receive a “401 Unauthorized” response.
  2. Load Balancing and Routing: It handles routing of client requests to appropriate backend services based on predefined rules and policies. API Gateway can also load balance your incoming traffic across multiple backend instances to improve performance and reliability.
  3. Monitoring and Analytics: API Gateway consolidates logging, monitoring, and analytics functionalities, capturing metrics related to API usage, performance metrics, and error rates. This enables administrators to gain comprehensive insights into API traffic and efficiently diagnose any operational issues.
  4. Rate Limiting: API Gateway implements policies for rate limiting and throttling to safeguard against misuse and prevent APIs from being overwhelmed by excessive requests. This ensures fair utilization of resources and protects backend services, especially in scenarios where xAPI messages generate frequent interactions.
  5. API Lifecycle Management: API Gateway supports the complete lifecycle management of APIs, encompassing versioning, deployment, and retirement processes. It offers tools for version control, ensuring backward compatibility, and facilitates seamless updates to APIs without disrupting client applications.
  6. Scalability: Designed for scalability, API Gateway efficiently manages high volumes of API traffic by horizontally scaling resources based on demand. It dynamically deploys additional instances or nodes to evenly distribute workloads, thereby maintaining optimal performance across fluctuating load conditions.
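These capabilities are normally configured in a managed gateway product rather than written by hand, but a toy Express middleware makes the authentication and rate-limiting points concrete. The API key store and limits below are hypothetical.

```typescript
import express from "express";

const app = express();

// Hypothetical key store; a real gateway backs this with a managed service.
const VALID_API_KEYS = new Set(["demo-key-123"]);

// Naive fixed-window rate limiter: at most 100 requests per key per minute.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 100;
const counters = new Map<string, { windowStart: number; count: number }>();

app.use((req, res, next) => {
  // Authentication: reject requests without a recognized API key.
  const key = req.header("x-api-key");
  if (!key || !VALID_API_KEYS.has(key)) {
    res.status(401).json({ error: "Unauthorized" });
    return;
  }
  // Rate limiting: protect the backend from excessive xAPI traffic.
  const now = Date.now();
  const entry = counters.get(key);
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    counters.set(key, { windowStart: now, count: 1 });
  } else if (++entry.count > MAX_REQUESTS) {
    res.status(429).json({ error: "Too Many Requests" });
    return;
  }
  next();
});

app.listen(3000); // toy listener; a real gateway fronts many backend services
```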

2.3 Application Backend

Once a request reaches this point, it has already been verified as legitimate. To handle large volumes of messages, your application can forward them to a distributed event streaming platform such as Apache Kafka. Kafka is engineered for large, real-time data streams: producers publish messages (events) to topics, and consumers read them in real time. This makes Kafka well suited for environments where data is continuously generated and processed. Kafka also integrates with stream processing frameworks like Apache Flink, Apache Spark, and Kafka Streams, enabling sophisticated event processing, real-time analytics, and data transformations on the streams flowing through its topics. Once messages reach Kafka, various consumers can act on them according to their respective needs.

Now, when producing messages to Kafka, a crucial consideration is how to distribute your xAPI messages effectively. Each message in Kafka comprises two primary components: a key and a value. The key is an optional attribute linked to every message, serving two main purposes:

  1. Message Ordering and Partitioning: Messages sharing the same key are stored in a single partition within a Kafka topic. This guarantees that messages with related keys are processed in sequence by consumers, maintaining the intended order. This capability is particularly critical when handling xAPI messages for scenarios where business rules depend on the sequential processing of events. For instance, when a user attempts a quiz or exam multiple times, sequential processing prevents discrepancies in scoring. The actor’s email address is often used as the partition key (see the producer sketch after this list).
  2. Partition Assignment: Kafka uses the key to determine which partition stores each message within a topic. With well-chosen keys, messages are distributed evenly across partitions, enabling parallel processing and scalability. Highly unique keys such as UUIDs (Universally Unique Identifiers) give the most even distribution, but they sacrifice per-user ordering, since two statements from the same user may land in different partitions. The choice between a user-level key and a UUID should align with the ordering needs and data characteristics of your application. Keeping this in mind greatly facilitates scaling xAPI message processing.
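As a sketch of the keyed-producer pattern using the kafkajs client; the broker list, client ID, and topic name are placeholders.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "xapi-ingest",     // placeholder client ID
  brokers: ["localhost:9092"], // placeholder broker list
});
const producer = kafka.producer();

// Publish one xAPI statement, keyed by the actor's email so that all of a
// user's statements land in the same partition and are consumed in order.
async function publishStatement(statement: { actor: { mbox: string } }): Promise<void> {
  await producer.send({
    topic: "xapi-statements", // placeholder topic name
    messages: [{
      key: statement.actor.mbox, // e.g. "mailto:jane.learner@example.com"
      value: JSON.stringify(statement),
    }],
  });
}

// Connect once at startup and reuse the producer for all sends.
producer.connect().catch(console.error);
```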

2.4 Kafka consumers

This is the pivotal moment. Once xAPI statements enter Kafka, they become versatile. Business rule consumers can process these events to detect specific triggers, such as ‘course completion’, and initiate corresponding actions, like issuing completion certificates. If the messages also need to reach Learning Record Stores (LRSs), dedicated consumers can dispatch them accordingly.
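A minimal kafkajs consumer sketch for the certificate example; the broker list, group ID, and topic are placeholders, and issueCertificate stands in for a real certificate service.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "business-rules",  // placeholder client ID
  brokers: ["localhost:9092"], // placeholder broker list
});
const consumer = kafka.consumer({ groupId: "certificate-issuer" });

// Hypothetical stand-in for your real certificate service.
async function issueCertificate(actorMbox: string, courseId: string): Promise<void> {
  console.log(`Issuing certificate for ${actorMbox} on ${courseId}`);
}

async function main(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topics: ["xapi-statements"] }); // same placeholder topic
  await consumer.run({
    eachMessage: async ({ message }) => {
      const statement = JSON.parse(message.value?.toString() ?? "{}");
      // Act only on "completed" statements; other verbs are handled elsewhere.
      if (statement.verb?.id === "http://adlnet.gov/expapi/verbs/completed") {
        await issueCertificate(statement.actor?.mbox, statement.object?.id);
      }
    },
  });
}

main().catch(console.error);
```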

3. Database management

If you have reasons to send xAPI statements to an LRS and store them, managing a substantial volume of statements there can pose several challenges:

  1. Scalability: As the number of xAPI statements grows, the LRS must efficiently manage the increasing data volume. This necessitates robust scalability in terms of both storage capacity and processing power. The choice of LRS and its underlying storage mechanism, such as NoSQL databases, significantly impacts scalability. For instance, in my experience with open-source Learning Locker, which utilizes MongoDB, optimizing for Input/Output Operations Per Second (IOPS) is critical to minimize latency when interacting with its APIs for sending and receiving xAPI statements.
  2. Performance: Swiftly storing and retrieving xAPI statements is essential to maintain responsiveness for users and applications. Inadequate optimization for handling large datasets can adversely affect query performance.
  3. Indexing: Implementing appropriate indexes on fields commonly used in queries can significantly improve retrieval speed. For xAPI statements, fields like actor, verb, object, and timestamp are frequently queried and should be indexed for efficient access (see the sketch after this list).
  4. Query Optimization: Review and optimize queries to ensure they are well-constructed and leverage indexes effectively. Use query profiling tools to identify slow queries and optimize them accordingly.
  5. Caching: Implement caching mechanisms to store frequently accessed data in memory. This can reduce the need for repetitive queries to the database, improving overall response times.
  6. Partitioning and Sharding: If you are dealing with extremely large datasets, consider partitioning data across multiple servers (sharding) or logical partitions to distribute load and improve scalability.
  7. Regular Maintenance: Perform regular database maintenance tasks such as index rebuilding, statistics updating, and data compaction to ensure optimal performance over time.
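For example, with a MongoDB-backed LRS such as Learning Locker, indexes along these lines can be created with the official Node.js driver. The connection string, database, collection, and field paths are illustrative; verify them against your LRS's actual schema first.

```typescript
import { MongoClient } from "mongodb";

async function createStatementIndexes(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017"); // placeholder URI
  await client.connect();
  // Illustrative database and collection names.
  const statements = client.db("learninglocker").collection("statements");

  // Index the fields most xAPI queries filter on: actor, verb, and time.
  // Field paths assume statements sit under a "statement" subdocument.
  await statements.createIndex({ "statement.actor.mbox": 1, "statement.timestamp": -1 });
  await statements.createIndex({ "statement.verb.id": 1 });

  await client.close();
}

createStatementIndexes().catch(console.error);
```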

Conclusion

Scaling xAPI is a complex challenge that demands careful planning and robust solutions. By addressing the key obstacles and implementing best practices, we can fully harness the power of xAPI, capturing rich learning experiences and driving improved outcomes. It’s crucial to acknowledge the transformative impact that effectively scaled xAPI can have on organizations. It enables deeper insights into the learning process, customization of educational experiences to individual needs, and data-driven decisions that enhance both learning and performance.

Moreover, scaling xAPI is not just a technical task but a strategic one. It necessitates alignment with organizational goals, investment in resources, and a commitment to fostering a data-driven culture. By doing so, we can unlock the full potential of xAPI and create impactful, learner-centric experiences that drive success at all levels.

Related Articles:

  1. How to get started with xAPI?
  2. Demystifying Experience API
  3. State Management in xAPI

