Pull vs push architecture for Mobile

Abhishek Kharb
Microsoft Mobile Engineering
8 min read · Oct 10, 2023

What is pull vs push architecture for Mobile?

Almost all mobile applications need to regularly sync data from servers to be able to show their users fresh information for consumption.


This data sync can be achieved in broadly two ways:
1. Pull-based model: The pull-based model requires apps to initiate an API call in order to fetch the latest data.

This call is usually triggered by a user action that the programmer has chosen, such as navigating to a particular screen in the app or performing a particular action (like pull to refresh). Alternatively, the app can fetch data on its own schedule using the polling technique.

Broadly, polling can be achieved in one of the two following ways:

  • Short polling: Sometimes referred to as regular polling or standard polling. The client makes an HTTP call to the server at regular intervals. If new data is available, the server returns it; otherwise it returns the same old data or an empty response, depending on the implementation.
    The major drawback of this technique is that the refresh interval is fixed by the client: calls made when nothing has changed are redundant, and the client shows stale data whenever something changes on the server before the next call is made.
Short polling-based mechanism for data sync.
  • Long polling: This technique aims to eliminate the redundant short-polling calls that return no new data from the server. The client makes an HTTP call to the server, but the server doesn’t respond immediately if no new data is available. Instead, it holds the request open until new data arrives or the request times out, and only then responds. As soon as the client receives the response (or the request times out), it sends another request, either immediately or after a cool-off period, and the same process is repeated.
Long-polling based mechanism for data sync.
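The difference between the two polling styles can be sketched with a small in-memory simulation. This is only an illustration, not a real network client: `FakeServer`, `short_poll`, and `long_poll` are hypothetical names, and a `threading.Condition` stands in for the server holding an HTTP request open.

```python
import threading


class FakeServer:
    """In-memory stand-in for a backend that exposes a versioned resource."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._data = None

    def publish(self, data):
        """Simulate a change on the server side."""
        with self._cond:
            self._version += 1
            self._data = data
            self._cond.notify_all()

    def short_poll(self, known_version):
        """Respond immediately; data is None when nothing has changed."""
        with self._cond:
            if self._version > known_version:
                return self._version, self._data
            return known_version, None

    def long_poll(self, known_version, timeout):
        """Hold the request until data changes or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > known_version, timeout)
            if self._version > known_version:
                return self._version, self._data
            return known_version, None


server = FakeServer()

# Short polling: three calls at "regular intervals"; only the last finds data.
version, empty_calls = 0, 0
for tick in range(3):
    if tick == 2:
        server.publish("event moved to 3 PM")
    version, data = server.short_poll(version)
    if data is None:
        empty_calls += 1
short_poll_result = data

# Long polling: a single request stays open until the server has news.
timer = threading.Timer(0.05, server.publish, args=("event cancelled",))
timer.start()
version, long_poll_result = server.long_poll(version, timeout=2.0)
timer.join()

print(empty_calls, short_poll_result, long_poll_result)
```

In the short-polling loop the first two calls come back empty, which is exactly the redundancy described above. The long-polling call makes one request and returns as soon as the change is published.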

2. Push-based model: In the push-based model, the server communicates any updates in data to the mobile device. This usually happens either over a socket connection established between client and server or through the push frameworks provided by the mobile platforms (APNs for Apple devices and FCM for Android devices).

A client usually registers itself for such communication as a one-time process. The server then registers that client for all the subsequent pushes. Any change in data on the server side is then communicated to the client via the aforementioned socket connection or Push frameworks.

Such a model has multiple advantages over the pull-based model. The client need not constantly poll to get fresh data, thus saving the client from making multiple API calls and using the network bandwidth unnecessarily. Even if the client app is not active, push frameworks can wake the app in the background with all the new information that can then be processed by the client.

Push based mechanism for data sync.
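The register-once, notify-on-change flow can be sketched as a tiny in-memory publish/subscribe model. `PushServer`, `register`, and `data_changed` are hypothetical names chosen for illustration; a real app would register a device token with APNs or FCM, or keep a socket open, rather than pass a callback.

```python
from typing import Callable, Dict, List


class PushServer:
    """Hypothetical server that remembers registered clients and pushes changes."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Callable[[dict], None]] = {}

    def register(self, client_id: str, on_push: Callable[[dict], None]) -> None:
        """One-time registration; a real app would hand over a device token."""
        self._subscribers[client_id] = on_push

    def data_changed(self, change: dict) -> int:
        """Notify every registered client and return how many were notified."""
        for on_push in self._subscribers.values():
            on_push(change)
        return len(self._subscribers)


received: List[dict] = []
server = PushServer()
server.register("phone-1", received.append)

# The client never asks for data; the server initiates the communication.
notified = server.data_changed({"event_id": "evt-42", "change": "updated"})
print(notified, received)
```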

Existing architecture for Teams Mobile Calendar

Calendar is one of the most essential parts of any productivity app. At Teams Mobile, we’re heavily invested in ensuring that a user’s Calendar always stays up to date: that they are informed in real time about all new Calendar events, and that any changes to existing events are synced to their mobile devices.

Our calendar architecture relied on a pull-based model for any updates to the user’s events. Each time the user navigated to their calendar, fresh data was fetched from the servers and updated in the user’s app. We saw this as an opportunity to improve the experience further by moving to a push-based model!

We encountered two major challenges in moving to a purely push-based model:
1. Large payload size for Calendar events: Each calendar event is a large and complex entity. It contains a variety of fields, ranging from the title and description of the event to information about all of its participants. This adds up to a large payload, which can sometimes exceed the payload limits enforced by push notification frameworks like APNs and FCM.

2. Push for each Calendar event update: Just as a calendar event is a complex entity, there is a wide variety of actions a user could take that trigger an update to it. It could be something as simple as a change in the event time, or a participant performing an RSVP action on the event. Sending a push to a mobile device when the app is not active consumes battery, and the platform has its own checks to ensure that no single app does too much or too frequent CPU-intensive work in the background.
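To make the first challenge concrete, the sketch below measures the serialized size of a calendar event against a payload limit. The 4096-byte figure is an assumption based on the commonly documented limit for regular APNs and FCM messages, so check the current platform documentation before relying on it. The event fields and helper names are made up for illustration.

```python
import json

# Assumed limit: commonly documented maximum for regular APNs/FCM payloads.
MAX_PUSH_PAYLOAD_BYTES = 4096


def payload_size(payload: dict) -> int:
    """Size in bytes of the payload once serialized as compact UTF-8 JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits_in_push(payload: dict) -> bool:
    """True when the serialized payload stays within the assumed limit."""
    return payload_size(payload) <= MAX_PUSH_PAYLOAD_BYTES


# Hypothetical calendar event with a long description and many participants.
full_event = {
    "event_id": "evt-42",
    "title": "Quarterly planning",
    "description": "Agenda item. " * 200,
    "participants": [
        {"email": f"user{i}@example.com", "rsvp": "none"} for i in range(100)
    ],
}

# For contrast: a minimal payload that carries nothing but an identifier.
id_only = {"event_id": full_event["event_id"]}

print(payload_size(full_event), fits_in_push(full_event))
print(payload_size(id_only), fits_in_push(id_only))
```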


These challenges sent us back to the drawing board to understand how we could enhance the user experience without consuming too many system resources, while ensuring that push-based updates are meaningful and relevant to the user.

A hybrid approach.

We could clearly see the value added by moving to a push-based model for the Calendar experience, but it had to be implemented carefully. While we couldn’t rely solely on polling triggered by user actions, we also couldn’t move fully to push notifications for every little update, for the reasons mentioned above.

Hence, we moved to a hybrid approach!

We used the push framework as a way of letting clients know that a certain event has been created or updated. That’s all the information we send in the push notification. Once the client is aware that there is new information to consume, it makes a REST API call to fetch that information for the events that have changed.

This provided us with two major benefits:
1. The payload size of the push notification stays in check, because we don’t send all the event-related information, only the unique event identifier that lets the client identify which event to sync.

2. We can also control which event changes require the client to sync immediately and which can be deferred to a later point in time, depending on the priority of the change. For example, a change in the start time of an event needs an immediate sync, whereas a change in the RSVP status of a participant can be synced at a later stage.

Overall mechanism of the hybrid approach.
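The hybrid flow can be sketched as a client that treats each push as a hint and pulls the details itself. This is a minimal sketch under stated assumptions, not the actual Teams implementation: the class name, the change-type labels, the set of changes treated as urgent, and the dictionary standing in for the REST API are all hypothetical.

```python
from typing import Dict, List

# Assumption: which change types count as urgent is a product decision;
# these labels are hypothetical.
IMMEDIATE_CHANGES = {"created", "time_changed", "cancelled"}


class HybridCalendarClient:
    """Client that treats a push as a hint and pulls the details itself."""

    def __init__(self, backend: Dict[str, dict]) -> None:
        self._backend = backend  # stand-in for the REST API
        self.local_events: Dict[str, dict] = {}
        self.deferred: List[str] = []

    def _fetch(self, event_id: str) -> None:
        """Stand-in for a REST call that returns one event by identifier."""
        self.local_events[event_id] = dict(self._backend[event_id])

    def on_push(self, push: dict) -> None:
        """Sync now for urgent changes, otherwise remember the event for later."""
        event_id = push["event_id"]
        if push["change"] in IMMEDIATE_CHANGES:
            self._fetch(event_id)
        elif event_id not in self.deferred:
            self.deferred.append(event_id)

    def sync_deferred(self) -> None:
        """Pull everything that was postponed, e.g. when the calendar opens."""
        for event_id in self.deferred:
            self._fetch(event_id)
        self.deferred.clear()


backend = {
    "evt-1": {"title": "Standup", "start": "10:00"},
    "evt-2": {"title": "Review", "rsvp_yes": 3},
}
client = HybridCalendarClient(backend)

# The push payloads carry only an identifier and a change type.
client.on_push({"event_id": "evt-1", "change": "time_changed"})
client.on_push({"event_id": "evt-2", "change": "rsvp_changed"})
synced_immediately = sorted(client.local_events)

client.sync_deferred()
synced_after_visit = sorted(client.local_events)
print(synced_immediately, synced_after_visit)
```

The time change is fetched as soon as the push arrives, while the RSVP change waits in the deferred list until the next convenient sync.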

Thus, our push-based model would be more resilient, less likely to be flagged by the platform for consuming too much memory or performing CPU intensive activities, while bringing a better user experience at the same time!

Measuring our success!

While all the above does sound good in theory, we wanted to quantify the difference this change would bring to our users! We wanted to measure how much value this change has actually added!


So, we defined a new metric, Push Sync Score!
In order to measure any change, the first thing we needed was to define a baseline! Something to measure the change against! Something to compare with!
We asked ourselves, how do we measure the current polling model? In the existing model, calendar events sync via polling when the user visits their calendar. So what we had to measure was how many updated events an average user syncs when they visit their calendar (updated events here refers to any additions, deletions, or changes to existing events).
That was the direction we decided to explore! We would first calculate the number of updated events that a user syncs each time they visit their calendar in a given time window. This would provide us with a score for that user under the polling mechanism.

Let x be the number of updated events that a user syncs when visiting their calendar.
Let y be the total number of events during that time window.
The Push Sync Score for the user is then defined as x/y.
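As a worked example of the definition above, the sketch below computes x/y for one user. The function name and the numbers are made up for illustration; they are not measurements from Teams.

```python
def push_sync_score(updated_on_visit: int, total_events: int) -> float:
    """Return x / y for one user over one time window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if updated_on_visit < 0:
        raise ValueError("updated_on_visit must not be negative")
    return updated_on_visit / total_events


# Hypothetical user: 12 updated events synced on calendar visits (x),
# out of 40 events in the window (y).
x, y = 12, 40
score = push_sync_score(x, y)
print(f"Push Sync Score = {x}/{y} = {score:.2f}")
```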

With polling, the only way for a user to sync their calendar is to visit it. So, the above ratio indicates what proportion of a user’s events have changes pending when they visit their calendar, on average.

Now, what happens once we introduce a push-based model?
Each time there is any change in a user’s calendar, a push is sent to the client with that event’s identifier, and the client syncs the event to get the latest information. So, the user doesn’t need to go to their calendar to sync the latest information, but only to consume the latest information that is already present!
So, the Push Sync Score for the same user would start tending to zero, as the push architecture would have already synced any updates that had taken place in the user’s calendar!
Note: We say that the Push Sync Score tends to zero rather than being exactly zero because the OS can still choose not to deliver a push to the client. Even though our push payloads are small and not triggered very often, delivery depends on factors such as available memory and the device battery level.
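The effect of undelivered pushes on the score can be illustrated with a small calculation. This sketch assumes a simplified model in which an updated event counts towards x only if its push was not delivered before the user opened the calendar. The function name and the numbers are hypothetical.

```python
from typing import List


def score_with_push(push_delivered: List[bool], total_events: int) -> float:
    """x / y where x counts only updates still unsynced at calendar visit.

    push_delivered holds one flag per updated event: True means the push
    arrived and the client already synced that event in the background.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    pending_on_visit = sum(1 for delivered in push_delivered if not delivered)
    return pending_on_visit / total_events


TOTAL_EVENTS = 40  # hypothetical y for the time window

# Polling only: none of the 12 updates reached the device ahead of the visit.
polling_only = score_with_push([False] * 12, TOTAL_EVENTS)

# Push enabled: 11 of the 12 pushes were delivered; the OS dropped one.
mostly_delivered = score_with_push([True] * 11 + [False], TOTAL_EVENTS)

print(f"{polling_only:.3f} -> {mostly_delivered:.3f}")
```

Under this model the score drops from 12/40 to 1/40, and it reaches zero only if every push is delivered.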


Impact of the changes made over time!

Graphs showing reduction in Push Sync Score over time.

Final thoughts!

Both pull and push architectures have their own advantages and disadvantages! There’s no reason to blindly write off either approach in favour of the other! It all depends on the use case! For example, if your data doesn’t get refreshed very often, you might not need to invest in building a push-based architecture. Or, if your data changes only in small increments, you might prefer to send those changes directly in a push notification rather than keep polling.
For us, the perfect balance lay in the middle, the hybrid approach! The equilibrium between pull and push that brought all the delight we wanted to bring to our users!
