Why we rewrote Pydio in Golang

The Pydio team recently announced Pydio Cells, enterprise-ready file-sharing software written in #Golang.


This is a major milestone in a long story that started ten years ago with AjaXplorer, an open source application I developed to easily share music files with my reggae bandmates, at a time when cloud storage solutions like Dropbox did not exist. Since renamed Pydio and turned into a full-fledged sync-and-share platform, the application stuck to its initial technical stack: LAMP, as in Linux/Apache/MySQL/PHP. This choice led us to many creative tricks to make the most out of PHP, but at some point the development team felt we were definitely hitting the limits.

This article dives into the motivations for rewriting entirely in Go, introduces the new architecture and the choices behind it, and tells how the transition went (a.k.a. lessons learnt). It is the first in a series in which we will present the new architecture in more detail.

[Figure: AjaXplorer / Pydio versions timeline]

PHP: a love/hate relationship

Don’t get me wrong: this article is not about PHP-bashing. PHP was and still is a great language but, like any language, it is a perfect fit for certain tasks and less so for others. The PHP community has been extremely active over the years, the latest version (7) brought big improvements in execution time, great frameworks like Symfony help with code maintainability and productivity, and PHP can definitely be the language of choice for developing a website or an evolving backend for a rich web application.

Simple to learn, weakly typed, with a “save-and-refresh” scripting approach, PHP is perfect for easily onboarding developers, designers and open source contributors on a common repository. But at some point, managing files for sync-and-share is not just about “save-and-refresh” coding.

Pydio is deployed on-premises for fleets of thousands of users, each of them accessing their files (which are getting bigger every day) from many devices (web, desktop, mobile). These users are used to a Dropbox-like level of usability and features: ubiquitous access to files, easy sharing with internal or external users, real-time notifications, metadata extraction, etc.
On the administrator side, it’s all about keeping control of the company data, complying with internal security policies, and making sure that data and access to it are always consistent. Not to mention #gdpr and #cloudact…

Implementing these expectations in PHP over the years ended up in a kind of dependency-crippled software:

So naturally the time came when we were just… fed up with PHP.

Looking at emerging technologies, we started small: in 2016, we introduced Pydio Booster, a dedicated Golang “companion” for the main platform, in charge of lightening the load on PHP’s shoulders. It handled downloads and uploads in a separate process, and provided an embedded messaging protocol and a websocket server.

But let’s face it, once you try it, you can hardly look back, and we soon got excited about taking over the whole code in Go.

Rewrite goals

So we took a step back: if we were to abandon the tons of LoC written in PHP, what exactly were our objectives?

Golang Pros / Cons

With that spec in hand, we looked at Go with a fresh eye! The following aspects of the language were the key factors in our choice:

Of course, compared with other languages, Go still has some flaws:

The Go community is very active, and in just the last couple of months it has introduced new versions, or new specs for the next version, that address exactly the points listed above (Go modules for dependencies, error handling to be reworked in Go 2). So we guess the bet was a winning one!

Again, each language has its perfect usage, and I would currently not advise a web agency to fully switch to Go. But for writing a super-performant backend for a REST API, or an application to manage files over a network, Go does the job perfectly.

So after looking at other options as well (Rust: too low-level; NodeJS: too weakly typed unless you use a “transpiled” layer; Java: no, just kidding…), Go was definitely our choice of heart.

Breaking the monolith

Once we settled on the language choice, we could go to the next step: designing an application that would meet our requirements!

While our PHP codebase was very decently organized, plugin-oriented, and had been refactored many times over the years, it was still monolithic: running any sub-feature of the application required “running” the code as a whole. When such an application grows in features, its complexity inevitably grows along with it, and after some time this can lead to two major issues:

In recent years, multiple long-term trends in software engineering (SOA, Agile development, DevOps…) led to the concept of micro-services: instead of managing one huge project, all aspects of an application are split into many much smaller projects. Each brick is in charge of a very specific feature, and is specified to run as an independent application: it implements its own persistence layer, its own API for communicating with the outside world, its own way of loading configurations, etc. Services can even be written in different languages, as long as the API contract is honored.
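To make the idea of a “brick” concrete, here is a minimal sketch of one such service in Go, using only the standard library. It is not code from Pydio Cells: the names `metaStore`, `newMetaHandler`, `call` and the `/meta` route are hypothetical, and the in-memory map merely stands in for a real persistence layer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// metaStore is this service's private persistence layer (hypothetical,
// in-memory for the sketch). No other service touches it directly.
type metaStore struct {
	mu   sync.RWMutex
	data map[string]string
}

func newMetaStore() *metaStore { return &metaStore{data: map[string]string{}} }

func (s *metaStore) set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

func (s *metaStore) get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

// newMetaHandler is the service's API contract: the only way in or out.
func newMetaHandler(s *metaStore) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/meta", func(w http.ResponseWriter, r *http.Request) {
		key := r.URL.Query().Get("key")
		switch r.Method {
		case http.MethodPut:
			s.set(key, r.URL.Query().Get("value"))
			w.WriteHeader(http.StatusNoContent)
		case http.MethodGet:
			v, ok := s.get(key)
			if !ok {
				http.NotFound(w, r)
				return
			}
			w.Header().Set("Content-Type", "application/json")
			json.NewEncoder(w).Encode(map[string]string{"key": key, "value": v})
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	})
	return mux
}

// call sends one in-memory request to the handler and returns status and body.
func call(h http.Handler, method, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := newMetaHandler(newMetaStore())
	call(h, http.MethodPut, "/meta?key=owner&value=alice")
	code, body := call(h, http.MethodGet, "/meta?key=owner")
	fmt.Println(code, body)
}
```

Because callers only ever see the HTTP contract, the map could later be swapped for a database, or the whole service rewritten in another language, without touching its consumers.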

Communication via APIs strongly decouples the service definitions from their actual implementation. Technical debt is kept under control, and code can easily evolve. By monitoring the load on each service, bottlenecks are easily detected and services can be scaled horizontally on demand.

The micro-services architecture has been heavily promoted since 2015, and there are plenty of articles out there about it. Among others, see the patterns bible Microservices.io. It is worth noting that the Microsoft Azure documentation provides very clear articles about micro-services and cloud-oriented patterns. Finally, while working on the new architecture, we also decided to stick to the 12-Factor application methodology.
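One of the twelve factors is to keep configuration in the environment rather than in the code. The sketch below shows what that can look like in Go; it is only an illustration, and the variable names `SERVICE_LISTEN_ADDR` and `SERVICE_MAX_UPLOAD_MB`, the `config` struct and the default values are all assumptions, not actual Pydio Cells settings.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// config holds the settings of a hypothetical service.
type config struct {
	ListenAddr  string
	MaxUploadMB int
}

// loadConfig reads settings through getenv, falling back to defaults.
// Taking getenv as a parameter keeps the function easy to test.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{ListenAddr: ":8080", MaxUploadMB: 64} // assumed defaults
	if v := getenv("SERVICE_LISTEN_ADDR"); v != "" {
		cfg.ListenAddr = v
	}
	if v := getenv("SERVICE_MAX_UPLOAD_MB"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			return config{}, fmt.Errorf("SERVICE_MAX_UPLOAD_MB: invalid value %q", v)
		}
		cfg.MaxUploadMB = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("listening on %s, max upload %d MB\n", cfg.ListenAddr, cfg.MaxUploadMB)
}
```

The same binary can then run unchanged on a laptop, a VM or a container, with only the environment differing between deployments.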

Pydio Cells architecture overview

Behold! The schema below shows how Pydio Cells is designed.

[Figure: Cells General Overview]

Although our final binary contains all the micro-services, each one can be run as an independent process (on its own server, VM or container). They communicate with each other through various channels: gRPC (a performant RPC protocol using Protobuf serialization and running on HTTP/2) for synchronous or streaming requests, an Event Bus for PUB/SUB messaging, and standard HTTP REST APIs. Starting from the top, we can distinguish 4 categories:
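As an aside on the PUB/SUB channel mentioned above, the pattern itself can be sketched in a few lines of Go with plain channels. This is a deliberately simplified, in-process illustration and not the event bus used by Cells; the `bus` and `event` types and the `node.created` topic name are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical message carried by the bus.
type event struct {
	Topic   string
	Payload string
}

// bus fans each published event out to every subscriber of its topic.
type bus struct {
	mu   sync.RWMutex
	subs map[string][]chan event
}

func newBus() *bus { return &bus{subs: map[string][]chan event{}} }

// subscribe registers a new buffered channel for the given topic.
func (b *bus) subscribe(topic string, buffer int) <-chan event {
	ch := make(chan event, buffer)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// publish delivers e to all subscribers of e.Topic and returns how many
// received it. A subscriber whose buffer is full is skipped instead of
// blocking the publisher (a simplification chosen for this sketch).
func (b *bus) publish(e event) int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	delivered := 0
	for _, ch := range b.subs[e.Topic] {
		select {
		case ch <- e:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	b := newBus()
	index := b.subscribe("node.created", 8)
	activity := b.subscribe("node.created", 8)

	n := b.publish(event{Topic: "node.created", Payload: "/docs/report.pdf"})
	fmt.Println("delivered to", n, "subscribers")
	fmt.Println("index got:", (<-index).Payload)
	fmt.Println("activity got:", (<-activity).Payload)
}
```

The publisher does not know who is listening: a new consumer can subscribe to the topic without any change to the service that emits the event.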

So we could now bring answers to each of our requirements:

Although this can look frightening at first sight, in the end all services can be started on one machine with one simple command line:

$ ./cells start
2018-10-14T13:40:14.966+0200 INFO nats started
2018-10-14T13:40:14.975+0200 INFO pydio.grpc.log started
2018-10-14T13:40:14.982+0200 INFO pydio.grpc.data.objects started
2018-10-14T13:40:14.993+0200 INFO pydio.grpc.user-key started
2018-10-14T13:40:14.996+0200 INFO pydio.grpc.policy started
2018-10-14T13:40:14.997+0200 INFO pydio.grpc.acl started
2018-10-14T13:40:14.999+0200 INFO pydio.grpc.config started
2018-10-14T13:40:15.047+0200 INFO pydio.grpc.meta started
2018-10-14T13:40:15.062+0200 INFO pydio.grpc.user-meta started
2018-10-14T13:40:15.924+0200 INFO pydio.grpc.update started
[...]

Assuming you installed it on https://cells.yourcompany.com, opening this URL in your browser gives you a working instance.

[Figure: Cells login screen]

Mission accomplished!

Lessons learnt

Of course, this transition was a long journey. We made beginner’s mistakes and fixed them along the way, but now the whole team is really proud of this new product. With Continuous Integration and test automation in place, we are pretty confident about the quality of the delivered code. Here are some lessons we learnt from this incredible adventure:

To be continued

In the next articles, I will try to go deeper into the architecture and show how we carefully designed each concern of Pydio Cells. If you are interested in reading the code and possibly contributing, you’re welcome! It all starts on GitHub (https://github.com/pydio/cells) as well as in our developer’s doc (https://pydio.com/en/docs/developer-guide).

Thanks for reading!

Written by

Pydio founder — Software Engineer — Gopher — Open Source Advocate — Jazzman
