Migration of Old Projects to Python 3
How to deal with 2to3, tests, incompatibilities and dependencies
A while ago, I assigned myself the daunting task of migrating an ancient API project written in Python 2.7 and Django to the latest stable releases, Python 3.7 and Django 2.2. What at first seemed like a piece of cake turned out to be a Hydra, the multi-headed dragon, with its dependencies acting like an octopus.
I have to admit that it was no easy task. I remember days spent banging my head against the wall to get a tiny fix working as expected. In the end, though, it all went well, and I was able to migrate the project into an almost stable piece of code that just works.
The sunsetting of Python 2.7 is nearer than it appears. Python 2 won’t be maintained beyond 1 January 2020, which marks its official end of life. This is inevitable, and it calls for action in the very near future from every company still using it.
The maintainers and the community have worked hard to make everything compatible with Python 3. The readiness score for Python 3 package compatibility is eye-catching: a growing number of packages already support both versions, while some have switched to Python 3 completely, especially libraries written for newly widespread platforms, like shiny new database drivers. Many important packages have also published a timeline for moving to Python 3 only and dropping support for Python 2.
Writing code against both flavors of Python and maintaining that code-base is really hard, since the maintenance cost is high. It is even more costly when you develop a library, since applications can stick with a specific flavor while a library has to support whatever its users run. There is always a trade-off between staying on the old version and having the latest and greatest features. That said, only so much can be backported to the old version, and eventually portability becomes an issue in itself. Many Python 3 packages, whether in the standard library or from third parties, have a backport to Python 2, but this doesn’t hold true for the newer features of Python 3. Two simple examples are the excellent asyncio and aiohttp libraries.
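To make that maintenance cost concrete, here is a minimal sketch of the kind of compatibility shim a code-base supporting both flavors tends to carry around. The `to_text` helper is hypothetical; in practice a library such as `six` provides equivalents (`six.text_type`, `six.moves`).

```python
import sys

try:
    # Python 3 location of urlencode
    from urllib.parse import urlencode
except ImportError:
    # Python 2 fallback, only reached on 2.x
    from urllib import urlencode

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821, this name only exists on Python 2
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return value as text on both flavors (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


# Every call site that mixes bytes and text has to go through helpers like these.
print(to_text(b"caf\xc3\xa9"), urlencode({"q": "python 3"}))
```

Each such branch is one more thing to test on both interpreters, which is exactly why dropping Python 2 pays off.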
As I said above, migration is no easy task: you cannot just change the interpreter version in your configuration files and headers and expect everything to work right away. You should plan, analyze and iterate through the code-base to get everything migrated. I’ll explain each step in greater detail.
In the middle of doing this, I felt the need for a search mechanism that understands the structure of Python code, so that I could search the imports of each file in the project and understand where each module is actually used. So I wrote a script that maps all Python files inside the project to a nested dictionary whose values are the imports of each file. It then lets me search with an XPath-like syntax (via dpath). I’ll make it available soon as FOSS. It’s called “Algae”.
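Since Algae is only described here, the following is a minimal sketch of the same idea, not the tool itself. It uses the standard `ast` module to collect imports into a nested dictionary and, as an assumption, swaps dpath for shell-style glob matching from the standard `fnmatch` module. Files that still contain Python 2-only syntax (such as `print` statements) cannot be parsed by a Python 3 `ast`, so they are recorded as `None`.

```python
import ast
import fnmatch
from pathlib import Path


def imports_of(source):
    """Return the sorted absolute imports found in a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # "from a.b import c" is recorded as "a.b.c"; relative imports are skipped.
            names.update("{}.{}".format(node.module, alias.name) for alias in node.names)
    return sorted(names)


def map_project(root):
    """Map every .py file under root to a nested dict: {dir: {file.py: [imports]}}."""
    root = Path(root)
    tree = {}
    for path in sorted(root.rglob("*.py")):
        parts = path.relative_to(root).parts
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        try:
            node[parts[-1]] = imports_of(path.read_text(encoding="utf-8"))
        except SyntaxError:
            # Python 2-only syntax cannot be parsed by a Python 3 interpreter.
            node[parts[-1]] = None
    return tree


def search(tree, path_glob, import_glob="*", prefix=""):
    """Yield (file_path, import) pairs whose path and import match the globs."""
    for key, value in tree.items():
        path = prefix + "/" + key if prefix else key
        if isinstance(value, dict):
            yield from search(value, path_glob, import_glob, path)
        elif value is not None and fnmatch.fnmatch(path, path_glob):
            for name in value:
                if fnmatch.fnmatch(name, import_glob):
                    yield path, name
```

For example, `list(search(map_project("."), "api/*", "django.core.*"))` would list every `django.core` import under a (hypothetical) `api` package.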
Planning involves deciding how far you want your project to be updated. Sometimes you want to stick with an LTS version of a library, and other times you want the latest cutting-edge features. This trade-off between stability and trendiness is what to weigh while planning the migration.
This planning should also take the business side into account, although the focus of this article is on the engineering complications. Some changes have to be organized with the help of other departments; otherwise it just won’t work. The goal is to deliver the same value or more to the customer while decreasing technical debt and maintenance costs.
Analysis is the next step. You are probably already familiar with the project you are working on, or have at least worked on some parts of it and know what it does. Analysis means knowing what to keep, update or get rid of. Sometimes the project has grown organically and you no longer use some parts of it, like old API versions. Now is a good time to rethink. Sometimes a rewrite is the more viable option, since the code-base is so messy that you no longer want to touch it. Other times, polishing and migrating the same code-base is easier and less costly. Whichever path you take will determine your next moves.
Some changes involve updating backend services, such as moving to a database driver that no longer supports outdated versions of that database. Some will undoubtedly break the tests. This is what to expect while analyzing your code-base. You can’t just update the code and expect it to work without touching anything else. You have to think it through thoroughly.
Iteration is the actual work you do to end up with an up-to-date application. It is called iteration because you cannot migrate everything at once; that is truly infeasible. Just don’t!
On each iteration, you take action by changing each part meticulously. Some parts are easy to change and require a minimal amount of work, like updating small functional dependencies. Other parts require a lot of effort and time to change and test.
1. Use branching to isolate changes
Each change should go into its own branch, e.g. a Git branch. This reduces clutter and friction with other parts of the code, narrows down the scope of each change, and lets you easily revert breaking or malfunctioning changes.
It is okay to build on top of a change by branching off a branch you have already tested and that works as expected. After all, you cannot push people to review your changes right away, and how quickly reviews happen is subjective, unless the company’s future depends on it.
2. Upgrade everything to the latest versions supported by Python 2.7
The first iteration, besides version control and branching, is to update your dependencies to the latest versions that support Python 2.7. This helps you find some bugs and saves many headaches afterwards. For example, the last Django release supporting Python 2.7 is the 1.11.x LTS, and the last Django REST Framework (DRF) release supporting it is 3.9.x; later versions of these libraries have dropped official support for Python 2.7. It’s safe to say that upgrading to these versions causes the least friction, unless your existing Django and DRF are ancient and unsupported.
Sometimes the changes in a given version of Django and/or DRF are not backward compatible, so keep an eye on the release notes and changelogs and adapt your code to the changes.
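A small audit can tell you which packages still need this step. The sketch below hard-codes only the two release lines mentioned above; the `PY27_LINES` table and `py27_status` helper are hypothetical, and a real check would read versions from your `requirements.txt` or installed environment and use a proper version parser.

```python
# Last release lines that still support Python 2.7, as stated in the text above.
PY27_LINES = {
    "django": (1, 11),
    "djangorestframework": (3, 9),
}


def major_minor(version):
    """'1.11.29' -> (1, 11); assumes plain numeric versions (no 'rc1', 'b2', ...)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def py27_status(package, installed_version):
    """Classify an installed version against the last Python 2.7-compatible line."""
    target = PY27_LINES[package.lower()]
    current = major_minor(installed_version)
    if current < target:
        return "upgrade"  # still behind: upgrade while on Python 2.7
    if current == target:
        return "ok"  # already on the last 2.7-compatible line
    return "too new"  # this line no longer supports Python 2.7


print(py27_status("Django", "1.8.19"), py27_status("djangorestframework", "3.9.4"))
```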
3. Monitor tests, Continuous Integration and coverage
In this step, you should ensure that your tests pass and your coverage does not decrease dramatically. One of the biggest forms of technical debt is not having enough tests, which surfaces as users complaining about failures and developers seeing flaky test runs while updating code.
If that’s the case, consider writing more tests and investing more in CI workflows and pipelines. By investment, I don’t necessarily mean money. Sometimes fixing a broken pipeline helps alleviate many issues in the future.
The next step is to monitor your test results for errors and warnings related to your code. Yes, I’ve seen test runs pass while emitting many errors and warnings that were simply ignored. Choose your test runner wisely. Although the “unittest” standard library module is feature-rich, I prefer “pytest” as a more flexible and extensible counterpart. This obviously doesn’t stop you from choosing your preferred test runner or even writing your own.
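One way to stop warnings from being silently ignored is to promote them to errors while testing. The sketch below uses only the standard `warnings` module; `legacy_helper` is a hypothetical stand-in for code that still calls a deprecated API. pytest offers similar behaviour from the command line via `-W error::DeprecationWarning`, and the Python interpreter accepts the same `-W` filter syntax.

```python
import warnings


def legacy_helper():
    # Hypothetical stand-in for code that still relies on a deprecated API.
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def collect_warnings(func, *args, **kwargs):
    """Run func and return the messages of every warning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # never let a filter hide a repeated warning
        func(*args, **kwargs)
    return [str(w.message) for w in caught]


def call_strictly(func, *args, **kwargs):
    """Run func with DeprecationWarning turned into an exception, as a strict CI job would."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


print(collect_warnings(legacy_helper))
```

Recording warnings is useful for an inventory; failing on them is what keeps new ones from creeping back in.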
Your tests should not consist only of unit tests. Integration, end-to-end (E2E), acceptance and performance tests are just some of the available testing methodologies.
4. Run 2to3 on your code-base
After upgrading to the latest versions supported on Python 2.7 and testing to see if everything’s in order, now is the time to run the 2to3 tool on your code-base. This tool ships with the Python standard library; it reads Python 2.x source code and transforms it into valid Python 3.x source code through so-called fixers. Fixers are transformations of the syntax and semantics of the source code from Python 2.x to 3.x. For example, in Python 2.x print is a statement and needs no parentheses around its arguments, whereas in Python 3.x it is a function, so the print fixer rewrites those statements into print() calls all over your source code. Test again and again, and work on a branch so you don’t mess things up.
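Fixers handle syntax well, but some semantic changes are left to you and your tests: 2to3 has no fixer for the `/` operator, and it cannot know whether a given value is bytes or text. The helpers below are hypothetical illustrations of the Python 3 behaviour to check for after running the tool.

```python
# On Python 2.7, adding "from __future__ import division" at the top of a module
# gives "/" the same true-division behaviour shown here before you switch interpreters.


def average(values):
    """True division: about 2.333 for [1, 2, 4] on Python 3 (plain Python 2 gave 2)."""
    return sum(values) / len(values)


def middle_index(values):
    """Use // wherever the old code relied on integer division, e.g. for list indexes."""
    return len(values) // 2


def same_token(received, expected):
    """On Python 3, b"abc" != "abc", so decode bytes before comparing them with text."""
    if isinstance(received, bytes):
        received = received.decode("utf-8")
    return received == expected


print(average([1, 2, 4]), middle_index([10, 20, 30, 40, 50]), same_token(b"abc", "abc"))
```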
5. Change your interpreter and package manager
Now is the time to change your Python version to 3.7. Usually this is done either by installing it with a package manager, like APT, or by changing the FROM statement in your Dockerfile to a Python 3.7 base image.
This is the trickiest part, since many tests may fail due to incompatible requirements and dependencies.
I also recommend upgrading your package manager, pip, to the latest version.
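Once the interpreter changes, it helps to fail fast if some environment (a forgotten server, an old CI image) still runs Python 2. A minimal sketch, assuming a hypothetical `check_interpreter()` called early, for example at the top of `manage.py`; packaged code can express the same constraint with setuptools’ `python_requires=">=3.7"`.

```python
import sys

REQUIRED_PYTHON = (3, 7)


def check_interpreter(required=REQUIRED_PYTHON, current=None):
    """Raise RuntimeError if running on an interpreter older than `required`.

    Written without f-strings so the file still parses on Python 2.7
    and can report the problem instead of crashing with a SyntaxError.
    """
    if current is None:
        current = sys.version_info[:2]
    current = tuple(current)
    if current < tuple(required):
        raise RuntimeError(
            "Python {}.{}+ is required, found {}.{}".format(
                required[0], required[1], current[0], current[1]
            )
        )
    return current
```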
6. Upgrade your dependencies
It is a good idea to check your dependencies before upgrading by running pipdeptree in an environment where your requirements.txt is installed, to detect circular and conflicting dependencies. The next best tool is pip-update-requirements (pur), which helps you upgrade all dependencies in the requirements.txt file.
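pipdeptree does this on a real environment; to show what “circular” means, here is a minimal sketch of cycle detection with a depth-first search over a hypothetical, hand-written graph that maps each package to the packages it depends on. It illustrates the idea only and is not pipdeptree’s implementation.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of package names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {name: WHITE for name in graph}
    stack = []

    def visit(name):
        color[name] = GREY
        stack.append(name)
        for dep in graph.get(name, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we found a cycle
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[name] = BLACK
        return None

    for name in list(graph):
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```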
Some dependencies are not easily upgradable, either because they are no longer supported, especially on Python 3.x, or because newer libraries have replaced them. For example, pycassa is an old Cassandra driver that supports only Python 2.x, with its last commit on 17 January 2017; it only speaks the Thrift protocol and has no CQL support. The official DataStax Python Driver has replaced it: it supports both 2.7 and 3.x, is easier to use, has more features and is based on CQL.
Some other libraries have different issues. For example, the python-social-auth package is deprecated, and you should migrate to social-auth-app-django, which requires changing many parts of your settings.py and your imports.
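Much of that work is mechanical renaming of dotted paths in settings such as `INSTALLED_APPS` and `AUTHENTICATION_BACKENDS`. Here is a minimal sketch of a rewrite helper; the prefix table is an assumption based on the move from the old `social.*` packages to `social_django` and `social_core`, and should be checked against the official migration notes before use.

```python
# Assumed old -> new dotted-path prefixes; verify against the social-auth docs.
OLD_TO_NEW_PREFIXES = {
    "social.apps.django_app.default": "social_django",
    "social.apps.django_app": "social_django",
    "social.backends": "social_core.backends",
    "social.pipeline": "social_core.pipeline",
}


def migrate_dotted_path(path):
    """Rewrite one dotted path; longer prefixes are tried first so specific entries win."""
    for old in sorted(OLD_TO_NEW_PREFIXES, key=len, reverse=True):
        if path == old or path.startswith(old + "."):
            return OLD_TO_NEW_PREFIXES[old] + path[len(old):]
    return path  # not a python-social-auth path, leave it untouched


def migrate_setting(values):
    """Rewrite every entry of a settings list or tuple."""
    return [migrate_dotted_path(value) for value in values]


AUTHENTICATION_BACKENDS = (
    "social.backends.google.GoogleOAuth2",
    "django.contrib.auth.backends.ModelBackend",
)
print(migrate_setting(AUTHENTICATION_BACKENDS))
```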
Some other libraries have been replaced by better counterparts. For example, the python-memcached library has many more up-to-date alternatives, like
7. Upgrade Django and DRF to the latest versions
The Django release process is a good resource for knowing what to use and when to deprecate and get rid of old versions. The release roadmap on the Django download page gives you a quick view of the lifetime of each version.
As of this writing, the latest LTS version of Django, which supports only Python 3.x, is 2.2.x, and the latest version of DRF is 3.10.x.
If you made it through step 6, upgrading these should be relatively easy; then test whether everything holds together.
Upgrading Django from 1.11.x to 2.x was a lot more painful than I thought it would be, and it takes more time to get everything working correctly. There are removals, deprecations and changes you should be aware of so you don’t break your system. Consider upgrading one feature release at a time (1.11 to 2.0, then 2.1, then 2.2); otherwise you’ll be in big trouble hunting multiple unrelated bugs, and the whole process will quickly become exhausting.
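Before the 1.11 to 2.0 jump, a crude text scan can point at some known removals. This is a minimal sketch covering only three Django 2.0 removals (`django.core.urlresolvers`, calling `is_authenticated()`/`is_anonymous()` as methods, and `MIDDLEWARE_CLASSES`); the `scan_source` and `scan_project` helpers are hypothetical, and the release notes remain the authoritative list. Checks that regular expressions handle poorly, such as the `on_delete` argument that `ForeignKey` requires since 2.0, are deliberately left out.

```python
import re
from pathlib import Path

# A few Django 1.x idioms that no longer work in Django 2.0 (far from exhaustive).
REMOVED_IN_DJANGO_2_0 = [
    (re.compile(r"\bdjango\.core\.urlresolvers\b"), "import from django.urls instead"),
    (re.compile(r"\.is_(?:authenticated|anonymous)\(\)"), "is_authenticated/is_anonymous are properties; drop the ()"),
    (re.compile(r"\bMIDDLEWARE_CLASSES\b"), "old-style middleware is gone; use MIDDLEWARE"),
]


def scan_source(source):
    """Return (line_number, hint) pairs for lines that match a known removal."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, hint in REMOVED_IN_DJANGO_2_0:
            if pattern.search(line):
                findings.append((lineno, hint))
    return findings


def scan_project(root):
    """Yield (path, line_number, hint) for every .py file under root."""
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, hint in scan_source(path.read_text(encoding="utf-8")):
            yield str(path), lineno, hint
```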
I hope this gave you an overview of what to expect when upgrading your Django and DRF application to the latest versions supported on Python 3.7. If you have any questions, comments or improvements, I’d be really glad to hear them.