Migrating to Python 3

Published in

Preply Engineering Blog

Jun 19, 2019

Background

Preply has just migrated to Python 3. It would have been much easier to do so in 2014; however, the lack of libraries and projects supporting Python 3.X at the time kept us on Python 2.X.

The first lines of the Preply repository, dated July 2, 2012, were written in Django and Python 2.X. At the time, Python 3.X adoption was still underway and Django didn't support it yet (Django 1.5, the first release with experimental Python 3 support, came out on February 26, 2013).

As a data-driven tech company, we measure, form hypotheses, iterate, measure again and decide what's next. We also prefer speed over perfection, which is why we stayed on Python 2.X until now.

Introduction

Our monolith consists of approximately 250,000 lines of Python code (~75,000 of them unit tests), alongside a few other applications not written in Python 2.X. At some point, we noticed that:

as time passed, Python 3.X adoption had grown and libraries were dropping support for the Python 2.X branch.
hiring talent had become harder: people want to work with the newest technology stack, which we lacked.
we wanted to run our application on Python 3.X too, but much of our code had been written for Python 2.X without regard for Python 3.X syntax.

This is why we decided to migrate to Python 3.X. Our main requirement was not to pause ongoing development: we had to make the project compatible with Python 3.X without breaking compatibility with Python 2.X. The migration took approximately 6 months and was led by one developer (with help from other developers, of course).

In this article, we walk through the steps we took and share a few more details.

Research

All good things begin with good research, so we started by studying how others had done it.

Others have already written great articles about this migration. Take a look at Python 3 at Facebook and the PyCon 2017 keynote about Python 3 at Instagram.

Step-by-step process

Good test coverage
Fortunately, writing unit tests is common practice at Preply, and we had enough tests. We felt comfortable changing parts of the code because the tests would catch most regressions before they reached production. Of course, bugs can still slip into production, but that is quite rare.
Compatibility of third-party packages
As with any mature project, our monolith has more than two hundred third-party dependencies. We used caniusepython3 to check that all of them are compatible with Python 3.X. It checks each package's "trove classifiers" (or uses a manually maintained override list in specific cases, such as when a package has become part of the Python standard library).
However, not all packages are Python 3.X compatible, and not all of them declare correct trove classifiers. For this reason, we also checked every package manually (the answer can usually be found in the repository's issues or commit history). When a package wasn't compatible, we forked it and added support (this solves the problem for you), then proposed the changes to the original repository (this solves it for the community). In our case, approximately 2% of our packages were incompatible with Python 3.X, which was in line with our expectations.
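To illustrate what a classifier-based check looks at, here is a simplified sketch. It is not caniusepython3's implementation: the function names, the override handling and the sample metadata are hypothetical, and a real check would fetch classifiers from PyPI rather than read a hard-coded dict.

```python
# Simplified illustration of a trove-classifier check; NOT caniusepython3's code.
PY3_CLASSIFIER = "Programming Language :: Python :: 3"


def declares_python3(classifiers):
    """Return True if any classifier declares Python 3 support.

    Matches "... :: 3", "... :: 3.6", "... :: 3 :: Only", and so on.
    """
    return any(c.startswith(PY3_CLASSIFIER) for c in classifiers)


def python3_blockers(packages, overrides=()):
    """Return sorted names of packages that do not declare Python 3 support.

    `packages` maps a package name to its list of trove classifiers.
    `overrides` names packages treated as compatible regardless of their
    classifiers (e.g. backports of modules now in the standard library).
    """
    known_ok = {name.lower() for name in overrides}
    return sorted(
        name
        for name, classifiers in packages.items()
        if name.lower() not in known_ok and not declares_python3(classifiers)
    )


if __name__ == "__main__":
    # Hypothetical metadata for three dependencies.
    packages = {
        "Django": ["Programming Language :: Python :: 2.7",
                   "Programming Language :: Python :: 3"],
        "legacy-widget": ["Programming Language :: Python :: 2.7"],
        "argparse": [],  # backport of a module that is now in the stdlib
    }
    print(python3_blockers(packages, overrides=["argparse"]))
```

Running the script prints `['legacy-widget']`. A missing classifier does not prove a package is incompatible (nor does a present one prove it works), which is why the manual review was still needed.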
Setting up a linter
Unfortunately, the project didn't have full linter coverage (the linter only checked the pull request diff). This topic is too long to cover in detail here, so stay tuned for a separate article. Long story short, we used flake8 as our Python linter. At this step, we reached 100% linter coverage of the Python code (excluding auto-generated *.py files) and started running flake8 under Python 2.X on the whole project for every pull request.
Lint the pull request diff with a Python 3.X linter
In addition to flake8 on Python 2.X checking the entire project, we added flake8 running on Python 3.X as a separate CI job that checks the diff between the pull request and the base branch. This stops new code from using constructs such as xrange(), except Exception, e, unicode() and others that don't exist in Python 3.X, because such code fails CI with an undefined name error or a SyntaxError. As a result, we reduced the amount of code we would have to fix later.
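The sketch below approximates the two kinds of failures described above using only the standard library ast module: Python 2.X-only syntax raises a SyntaxError, and Python 2.X-only builtins show up as undefined names. It is a rough stand-in for what flake8 reports on Python 3.X, not flake8 itself; the name list is a hypothetical, incomplete selection, and the scope handling is much cruder than pyflakes'.

```python
import ast

# Illustrative, incomplete set of builtins that exist only in Python 2.X.
PY2_ONLY_BUILTINS = {
    "basestring", "execfile", "long", "raw_input",
    "reduce", "unichr", "unicode", "xrange",
}


def find_py3_problems(source):
    """Return sorted (line, message) pairs for code that breaks on Python 3.X."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # e.g. `except Exception, e:` or a `print "x"` statement
        return [(exc.lineno, "SyntaxError: %s" % exc.msg)]

    # Names bound by imports or assignments are not Python 2.X leftovers.
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)

    problems = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
                and node.id in PY2_ONLY_BUILTINS and node.id not in bound):
            problems.append((node.lineno, "undefined name %r" % node.id))
    return sorted(problems)
```

For example, `find_py3_problems("for i in xrange(3):\n    print(unicode(i))\n")` returns `[(1, "undefined name 'xrange'"), (2, "undefined name 'unicode'")]`, while the same code after `from functools import reduce` style fixes would come back clean.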
Adding __future__ imports
We added from __future__ import absolute_import, division, print_function, unicode_literals gradually, module by module, because __future__ imports can cause unexpected regressions on Python 2.X that take time to find and fix. The flake8-future-import extension for flake8 was of great help here.
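A minimal sketch of a module after this step, with hypothetical helper functions. On Python 3.X these imports are no-ops; on Python 2.X they switch on Python 3.X semantics, which is exactly where regressions can appear (for example, an integer division that silently becomes a float division).

```python
from __future__ import absolute_import, division, print_function, unicode_literals


def average(values):
    # With `division`, `/` is true division on Python 2.X as well:
    # average([2, 3]) is 2.5 rather than 2.
    return sum(values) / len(values)


def pages_needed(item_count, per_page):
    # Code that relied on Python 2.X integer division must now say `//` explicitly.
    return (item_count + per_page - 1) // per_page


def greeting(name):
    # With `unicode_literals`, "Hello, " is a unicode string on Python 2.X,
    # matching the type of string literals on Python 3.X.
    return "Hello, " + name
```

Note that the `__future__` line has to be the first statement in the module, which is one reason adding it file by file was practical.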
Removing Python 3.X incompatible syntax
Take a look at the django.utils.encoding module and the builtins module (on Python 2.X, builtins is provided by the future package).

- Run the Python 3.X flake8 linter across the entire project, not just pull request diffs, and fix what it reports. For example, unicode(<>) becomes from builtins import str; str(<>), while raw_input(<>) becomes from builtins import input; input(<>), etc. The full list of compatible idioms can be found here. We also ran 2to3 with a specific list of fixers; with it, map(<>) becomes list(map(<>)), etc.

- Run the project on Python 3.X and fix the failing tests. As mentioned at the beginning, we wanted code compatible with both Python 2.X and 3.X, and at this step the django.utils.encoding module helps a lot (we highly recommend reading it). With this module, hashlib.md5(<>) becomes hashlib.md5(force_bytes(<>)) and response.content becomes force_text(response.content).
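The sketch below combines the idioms from this step. to_bytes() and to_text() are hypothetical, simplified stand-ins for django.utils.encoding.force_bytes and force_text (the real functions also handle lazy translation strings, error modes and more), and full_name(), squares() and cache_key() are made-up examples. It targets Python 3.X, where builtins ships with the standard library; on Python 2.X the same import comes from the future package.

```python
import hashlib

# On Python 3.X `builtins` ships with the standard library; on Python 2.X the
# same import is provided by the third-party `future` package.
from builtins import str


def to_bytes(value, encoding="utf-8"):
    """Hypothetical, simplified stand-in for django.utils.encoding.force_bytes."""
    if isinstance(value, bytes):
        return value
    return str(value).encode(encoding)


def to_text(value, encoding="utf-8"):
    """Hypothetical, simplified stand-in for django.utils.encoding.force_text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def full_name(first, last):
    # Before: unicode(first) + u" " + unicode(last)
    return str(first) + " " + str(last)


def squares(values):
    # 2to3's `map` fixer rewrites map(...) as list(map(...)): on Python 3.X
    # map() returns a lazy iterator, so code that calls len() or indexes breaks.
    return list(map(lambda x: x * x, values))


def cache_key(user_id):
    # hashlib.md5() only accepts bytes on Python 3.X, so text is encoded first.
    return hashlib.md5(to_bytes("user:%s" % user_id)).hexdigest()
```

Without the explicit conversion, `hashlib.md5("user:1")` raises a TypeError on Python 3.X, which is the kind of failure the test run surfaces.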

Are we done yet?

Nope. There was one thing about the Migration to Python 3.X that required lots of attention and effort and certainly deserves a separate paragraph:

random.randint method problem

Here at Preply, we use feature flags for developing new features and running A/B tests. Our implementation gives each user a deterministic flag state. A simplified version looks like this:

Coin flipping on Python 2.X
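A minimal sketch of such a flag, assuming a hypothetical percent rollout parameter and a "flag_name:uid" seed string (the exact seeding in Preply's code may differ). It uses a private random.Random instance so that seeding does not disturb the global generator.

```python
import random


def get_flag_value(flag_name, uid, percent):
    """Hypothetical sketch of a deterministic feature flag, Python 2.X style.

    Nothing is stored: the result is recomputed from flag_name and uid,
    so the same user always lands in the same bucket.
    """
    rng = random.Random()  # private generator; the global one is left untouched
    rng.seed("%s:%s" % (flag_name, uid))
    # Bucket the user into 1..100 and enable the flag for `percent`% of users.
    # randint() is the call whose results changed between Python 2.X and 3.X.
    return rng.randint(1, 100) <= percent
```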

An important point here is that we don't store flag_value: it is derived from flag_name and uid. We were surprised when the code above produced different results on Python 2.X and Python 3.X. The reason was random.randint, which behaves differently in the two versions.

Comparison of random.randint output in Python 2.X and 3.X

At the same time, random.random() returns the same value for the same seed on both versions.

Note that on Python 3.X we also passed the version parameter to seed(); it exists to reproduce random sequences from older Python versions.

Issue9025 in the Python bug tracker explains this change and why it was made. Meanwhile, the fact that random.random() behaves identically across Python versions suggested that random.randint() could simply be replaced with a computation based on random.random().
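To make the difference concrete, here is a hypothetical helper, py2_randint(), that reproduces Python 2.X's randint(a, b) for small ranges. In Python 2.X, randrange() returned start + int(random() * width) whenever the width was below 2**53, so each result depended on a single random() call; Python 3.X's randint() draws bits from getrandbits() instead. The seed string is an assumption for illustration.

```python
import random


def py2_randint(rng, a, b):
    """Hypothetical helper reproducing Python 2.X's randint(a, b) for small ranges.

    Python 2.X's randrange() computed start + int(random() * width) whenever
    width < 2**53, so each result consumed exactly one random() call.
    Python 3.X's randint() draws bits from getrandbits() instead, so the same
    seed can map to a different integer. The large-range branch is not covered.
    """
    width = b - a + 1
    return a + int(rng.random() * width)


if __name__ == "__main__":
    key = "new_checkout:42"  # hypothetical "flag_name:uid" seed
    old_style, new_style = random.Random(), random.Random()
    old_style.seed(key, version=1)
    new_style.seed(key, version=1)
    # The two numbers may differ: this is the behavior change described above.
    print(py2_randint(old_style, 1, 100), new_style.randint(1, 100))
```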

We dug deeper and concluded that all we needed was to slightly rewrite get_flag_value to use random.random() instead of random.randint(), so that the code behaves identically on Python 2.X and Python 3.X.

Coin flipping on Python 3.X

With the six module, these two snippets can be merged into one that is compatible with both Python 2.X and 3.X:

Coin flipping Python 2.X and 3.X compatible
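A sketch of a merged, version-agnostic function under hypothetical assumptions (a percent rollout parameter and a "flag_name:uid" seed string). It uses sys.version_info instead of six.PY3, which performs an equivalent check, so the snippet runs with the standard library alone.

```python
import random
import sys

PY3 = sys.version_info[0] >= 3  # six.PY3 performs an equivalent check


def get_flag_value(flag_name, uid, percent):
    """Hypothetical feature-flag check intended to agree on Python 2.X and 3.X."""
    rng = random.Random()
    seed = "%s:%s" % (flag_name, uid)
    if PY3:
        # version=1 reproduces the str seeding used by older Python versions.
        rng.seed(seed, version=1)
    else:
        rng.seed(seed)  # Python 2.X's seed() has no version parameter
    # Same arithmetic Python 2.X's randint(1, 100) used internally,
    # so users should keep their bucket after the migration.
    bucket = 1 + int(rng.random() * 100)
    return bucket <= percent
```

Because the bucket is derived from a single random() call, and random() yields the same value on both versions for the same seed, the flag state should no longer depend on the interpreter version.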

Migrating to Python 3 brings happiness…

…to engineers

We can use new language features and be happier. And who doesn't like using the newest and coolest technology?

Time spent in Python according to New Relic

…to Preply customers

As the New Relic graph above shows, average server response time decreased from 113 ms to 90 ms. Our users and crawlers are now a little happier too. ❤️

Next steps

We're now considering another migration, this time to Django 2.X, to use new framework features such as Model view permissions and Constraint classes. We're also considering adding mypy as a static type checker.