Part II: Inter-process communication in OpenChange Asynchronous Notifications

Julien Kerihuel
Apr 22, 2015

Introduction

Among the technical decisions that shaped the architecture, the approach to inter-process communication had the biggest impact.

The problem is not about implementing the best solution but rather finding the most harmonious shade of grey

The complexity of the asynchronous notifications architecture in OpenChange lies in the number of links, components and services affected by the scope of this change. It is a complex workflow, and rather than trying to find the best solution for every aspect of the problem, we took a global approach in which the drawbacks and their impact are negligible compared with the benefits.

We mainly had three kinds of communication to handle:

  • between different OpenChange services
  • between OpenChange server and external services such as dovecot
  • between OpenChange server and Outlook

While the latter is pretty straightforward and uses existing channels, we had to design and implement a solution for the first two items.

Communications to handle

Inside OpenChange Server

In usual on-premise deployments, both the emsmdb and asyncemsmdb endpoint services from OpenChange would be running on the same machine, and a Unix domain socket would probably be a no-brainer to manage these inter-process communications.

However OpenChange is also deployed into scalable cloud infrastructures where MAPI traffic is load-balanced and distributed among different nodes. With this architecture, two connections from the same client — for example one for emsmdb traffic and the other one for asyncemsmdb — may end up on two different nodes.

We could have tried to solve this load-balancing issue at the routing level and force every connection from the same client to be redirected to the same node, but this approach also had some pitfalls, such as sizing the connection pools for every client or anticipating how each node would have to be loaded to maximize the number of connections and limit resource consumption.

We needed a global solution, hence a third-party service

It was decided to rely on a single deployment approach working for every use case and to depend on an external service to manage communications between emsmdb and asyncemsmdb.

This service was expected to run in a trusted environment, to share transitory information and to deal with small payloads with a limited lifetime, lasting at most for the duration of the Outlook user session. The choice of service was driven by the following considerations:

  • Using a dedicated message queue service such as RabbitMQ would have required adding yet another service and would have made the architecture more complex. We also didn't particularly need any authentication system, and the communication model to implement (push/pull) was pretty basic. Using RabbitMQ in this regard therefore sounded over-engineered. Furthermore, RabbitMQ client libraries are generally packaged with OpenSSL support (for example on Ubuntu), and this produces a licensing issue with software released under the terms of GPLv3 or later. The case of a custom message queue service built on top of specific network libraries (zeromq, nanomsg) will be covered later.
  • Using a database service such as MySQL was also considered, but MySQL primarily aims at persistent storage and would have added a substantial load to the existing infrastructure that would have had to be addressed sooner or later. The number of read and commit operations would increase exponentially as the number of users grows. This would have been a bottleneck preventing the solution from scaling.
  • Using a key/value store solution such as Redis would have made sense but, once again, would have required adding another service to the architecture. Furthermore, it would have required more investigation to figure out which store would best fit our needs, and as discussed in the introduction, this kind of optimization was out of scope. It was therefore gently discarded.

A deeper look at the existing OpenChange architecture shows that we already had a service used as a transitory cache: memcached, which already caches mapistore indexing data and SOGo information. It was therefore decided to rely on this service again.

Memcached usage was then extended within the asynchronous notification framework to cover the internal architecture needs, namely to:

  1. share the global dcerpc session handle between dcerpc services
  2. store a user's notification subscriptions under a key accessible to both emsmdb and asyncemsmdb
  3. transfer the notification payload from asyncemsmdb to emsmdb, ready to be consumed by Microsoft Outlook.
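
To make the push/pull pattern concrete, here is a minimal sketch of how a transient notification payload could be published by asyncemsmdb and later fetched by emsmdb through memcached, using the libmemcached C API. The key layout, payload format and expiration value are assumptions made for this example only, not OpenChange's actual implementation.

```c
/* Minimal sketch, not OpenChange code: sharing a short-lived notification
 * payload between asyncemsmdb and emsmdb through memcached.  The key layout,
 * payload format and TTL are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libmemcached/memcached.h>

int main(void)
{
    memcached_st        *memc;
    memcached_return_t  rc;
    /* hypothetical key: user name + subscription handle */
    const char          *key = "notif:jdoe:0x12";
    const char          *payload = "newmail;folder=INBOX;mid=42";
    char                *value;
    size_t              value_len;
    uint32_t            flags;

    memc = memcached_create(NULL);
    memcached_server_add(memc, "127.0.0.1", 11211);

    /* asyncemsmdb side: publish the payload with a short expiration,
     * since it is only meaningful while the Outlook session lasts */
    rc = memcached_set(memc, key, strlen(key), payload, strlen(payload),
                       300 /* seconds */, 0);
    if (rc != MEMCACHED_SUCCESS)
        fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

    /* emsmdb side: fetch the pending payload before pushing it to Outlook */
    value = memcached_get(memc, key, strlen(key), &value_len, &flags, &rc);
    if (rc == MEMCACHED_SUCCESS && value != NULL) {
        printf("pending notification: %.*s\n", (int)value_len, value);
        free(value);
    }

    memcached_free(memc);
    return 0;
}
```

In a real deployment both services would of course talk to the memcached instances already configured for OpenChange, and the payload would be a serialized structure rather than a plain string.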

Between OpenChange and external services

The other type of communication to handle was between the OpenChange server and external services such as dovecot. The kind of information and how we interact with it is very specific but also very simple. It is all about blobs of data that tell the server when a new email has arrived, a new object was created, or an existing object was modified, moved, copied or deleted.

There is therefore nothing fundamentally complex about these notifications. At worst, the payload to generate and send has to hold information about the source folder and message, the destination folder and message, along with other configuration data, but this is pretty much all.
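
As an illustration only, such a blob could be modelled with a structure along these lines; the field names and event values are assumptions made for the sketch, not the actual format exchanged with dovecot or carried over the wire.

```c
/* Illustrative sketch only: one possible layout for the notification blob
 * described above.  Field names and event values are assumptions. */
#include <stdint.h>

enum notif_event {
    NOTIF_NEW_MAIL,
    NOTIF_OBJECT_CREATED,
    NOTIF_OBJECT_MODIFIED,
    NOTIF_OBJECT_MOVED,
    NOTIF_OBJECT_COPIED,
    NOTIF_OBJECT_DELETED
};

struct notif_payload {
    enum notif_event event;          /* what happened */
    uint64_t         src_folder_id;  /* folder the change originates from */
    uint64_t         src_message_id; /* message affected, if any */
    uint64_t         dst_folder_id;  /* destination folder for move/copy */
    uint64_t         dst_message_id; /* destination message for move/copy */
    char             username[256];  /* mailbox owner the event applies to */
};
```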

Furthermore, we are relying on information with a limited lifetime. It is only relevant for a short period of time, and while accurate dispatching of notifications is critical to run a mailbox in a production environment, losing one notification among many is not going to have any annoying consequences for the user or the business.

Wasted notifications are acceptable by design

If we needed something that could not suffer any loss, we would have looked for a resilient FIFO message queue where every notification has to be processed sequentially, but the current approach is more relaxed in this regard, and so is Microsoft Outlook.

Conclusion

We have therefore decided to rely on a message queue, but more aspects need to be detailed to understand the global architecture, such as:

  • What is the message queue approach that would best fit our architecture: a broker used as a hub and proxy, or a peer-to-peer approach?
  • Which network library to use as a transport layer for notifications: AMQP, zeromq, nanomsg, or a custom service over plain BSD sockets?

These are the questions to be answered in forthcoming articles of this series.


