Building messaging infra for Powow: Part 1
Requirements, Technologies and Solutions for the Backend Service
At Powow, over the last two months, we have been building a messaging platform. The platform facilitates targeted discovery of people, events and public service providers in a locality. In this post we list the lessons we learned along the way.
Requirements
Functional requirements for the backend service:
- Private chats
  Allow people to converse one-on-one after discovering each other on the platform.
- Group messaging
  Segment conversations into groups that support dynamic participation.
- Real-time events
  Send online/offline/typing/stopped-typing events to the recipients.
- Image sharing
  Manage pictures shared on the platform efficiently and optimise image loading based on quality requirements.
- Notifications
  Send notifications when the app is closed or inactive.
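The post doesn't specify a wire format for the real-time events. As a minimal sketch, assuming JSON text frames over the WebSocket, the four event kinds could be modelled as a small typed envelope that is validated on the way in. The `Event` struct, its field names and the string values are hypothetical, not Powow's actual protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventType enumerates the real-time events pushed to recipients.
// The string values are illustrative only.
type EventType string

const (
	EventOnline        EventType = "online"
	EventOffline       EventType = "offline"
	EventTyping        EventType = "typing"
	EventStoppedTyping EventType = "stopped_typing"
)

// Event is a hypothetical envelope sent as a WebSocket text frame.
type Event struct {
	Type           EventType `json:"type"`
	From           string    `json:"from"`
	ConversationID string    `json:"conversation_id,omitempty"` // empty for presence events
}

// Encode serialises an event into a frame payload.
func Encode(e Event) ([]byte, error) {
	return json.Marshal(e)
}

// Decode parses a frame payload and rejects unknown event types.
func Decode(data []byte) (Event, error) {
	var e Event
	if err := json.Unmarshal(data, &e); err != nil {
		return Event{}, err
	}
	switch e.Type {
	case EventOnline, EventOffline, EventTyping, EventStoppedTyping:
		return e, nil
	default:
		return Event{}, fmt.Errorf("unknown event type %q", e.Type)
	}
}

func main() {
	payload, err := Encode(Event{Type: EventTyping, From: "alice", ConversationID: "c42"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload)) // {"type":"typing","from":"alice","conversation_id":"c42"}
}
```

Validating the type on decode keeps a misbehaving client from pushing arbitrary event kinds to other participants.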
Non-functional requirements for the backend service:
- Horizontally scalable
  Messaging platforms have network effects, so traffic can explode in a short period of time. The design has to take that growth into account.
- Lightweight connections
  Messaging connections are short and frequent, so they need to be set up quickly and maintained without much network overhead.
- Near real-time
  A round-trip time to the server of less than 100 ms is expected; anything slower makes the user experience sluggish.
- Geo-spatial queries
  Powow offers location-based discovery of people on the move and needs efficient methods to find them quickly.
- Easy debugging
  Early-stage product development is extremely dynamic, and iterating quickly requires a decent way to find out what went wrong when things break.
Technology
After brainstorming and evaluating a number of open-source options, we settled on the following technologies for our backend platform:
- Go (language)
  Although the language is relatively young, a lot of open-source tools and libraries are already available for it. Go has excellent support for concurrent programming, which is built into the language itself through goroutines and channels. That makes it a good fit for highly concurrent systems like messaging, with group, broadcast and one-on-one chats.
- Gorilla (WebSockets)
  Be careful when choosing a WebSocket library. Not all of them are compliant with the WebSocket RFC (RFC 6455); Gorilla is. If your library is not compliant, you might have a hard time integrating in a heterogeneous environment. There are intricacies such as the handshake, ping/pong and keep-alive that need to be taken care of.
- BeeGo (framework)
  BeeGo is an application framework for Go. It provides templating, logging, profiling, configuration, an ORM and more, all built into the package.
- HAProxy (load balancer)
  It works as a reverse proxy in front of the application servers and provides insight into how HTTP and WebSocket traffic is reaching them.
- MongoDB (data storage)
  It supports geo-spatial queries, which we need because we often have to find people within an ‘x’ km radius of a user's latitude/longitude. We may have to revisit scalability issues in the long run.
- Hystrix (real-time monitoring)
  An open-source latency and fault-tolerance library by Netflix. It isolates calls to downstream services (using thread pools, timeouts and circuit breakers) so that failures don't cascade in distributed systems.
Apart from the above, the third-party services we use include Amazon SES (email campaigns) and Ionic Platform (app analytics).
WebSockets & Stateful Architecture
Unlike with HTTP APIs, a WebSocket connection has to be maintained by the application server, which makes the server stateful. In a distributed system there can be multiple application servers, so a server may need to send a message over a WebSocket that is connected to a different server altogether. The servers therefore need to communicate with each other to get the work done.
We solved inter-server communication by moving the socket management package out of the application logic and creating a messaging pipeline between the application servers. If a server, say server1, wants to send a message to a WebSocket, say ws1, that is connected to server2, server1 writes the message to the respective queue/topic; server2 reads it and dispatches the message over ws1.
Real-time Monitoring
Investing in a service-oriented architecture (SOA) does not pay off unless you have deep insight into the underlying systems. Hence, from day one, we invested in building a system that is easy to debug and resource-efficient.
Hystrix provides deep insight into how the downstream services are performing. The number of executing threads, response times, thread-pool rejections and timeouts are all shown in an excellent real-time dashboard.
While Hystrix gives insight into downstream services, BeeGo provides information about the application server itself. The application stats can also be pushed to StatsD, which gives us aggregated, historical data to look up.
These dashboards, along with HAProxy, give us ample information on how our systems are behaving and reduce our debugging time.
To see all Powow components, check us out on AngelList. The next part of this blog series will dive into performance benchmarks, bottlenecks and the possibilities of the stack we are using.
If you have any suggestions, feedback or reviews, please reach out to us at hey@powow.info; we would love to hear from you.