Upgrading To Python 3 And Moving Away From Legacy

Published in

skai engineering blog

6 min read · Nov 13, 2019

Why do we want to migrate to Python 3?

We’ve all seen the pip warning “DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won’t be maintained after that date. A future version of pip will drop support for Python 2.7.”

How do we get past this? (I tried adding the --quiet flag and it didn’t help.)

The real question is “How does an organization upgrade all of its Python-using processes and applications to Python 3?”

Python is widely used throughout our organization. Use of Python ranges from automated processes orchestrating continuous integration and continuous deployment, to complex machine-learning applications.

Upgrading all of that scattered code to Python 3 is a daunting task, to say the least.

So why should we upgrade to Python 3?

As the pip warning states, Python 2 will not be maintained past 2020.

As a result, most PyPI packages that currently support both Python 2 and 3 have pledged to stop supporting Python 2 by 2020 (official pledge). That pledge includes Python projects that Kenshoo heavily depends on, like TensorFlow, Pandas, Requests, NumPy, and more.

So how does this affect us?

After deprecation of Python 2, any project running on Python 2 with third-party libraries will not be able to receive updates of those libraries. They will miss out on security patches, bug fixes, and new features. This will greatly restrict the development of our research departments that depend on Python and Python open source projects. And the lack of maintenance for security issues is a big issue in itself.

Moreover, Python 3 has become, or will soon become, the default Python interpreter for many Linux distributions. Python code executed with the default interpreter will fail if it contains syntax that is not Python 3 compatible, so a process may break when an image or OS is updated. We will likely update our systems, which means platform support for Python 2 will keep decreasing. We could maintain legacy systems, but this is best avoided. We always want to move away from legacy.
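
To make that failure mode concrete, here is a minimal sketch, assuming it runs on a Python 3 interpreter. The built-in compile() rejects a Python 2 print statement before a single line executes. The snippet strings and the helper name are illustrative, not taken from our code base.

```python
# Illustrative only: check whether a source string parses under the
# interpreter running this script (assumed here to be Python 3).

PY2_SNIPPET = 'print "deploying build"\n'    # Python 2 print statement
PY3_SNIPPET = 'print("deploying build")\n'   # valid on Python 2.7 and Python 3


def parses_on_this_interpreter(source):
    """Return True if `source` compiles without a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    for label, snippet in (("py2 style", PY2_SNIPPET), ("py3 style", PY3_SNIPPET)):
        print(label, parses_on_this_interpreter(snippet))
```

Because the error is raised at compile time, a script like this fails immediately when the default interpreter changes, even if the offending line sits in a rarely used branch.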

How do we SAFELY migrate to Python 3?

First, consider our obstacles

As described earlier, Python code is littered throughout our company’s code. This includes scripts that run deployment procedures to benchmark our product and microservice applications. Many of these procedures were written a long time ago, and there’s the possibility that their current users do not maintain these procedures or are not aware of their existence. If they are aware of their existence, they may have little or no experience with Python, and would not have the knowledge to migrate that code.

In addition, these applications, procedures, and scripts run in different environments, and those environments currently lack a Python 3 interpreter.

Lastly, we reuse our code. So a few different applications may depend on a specific internal Python package, and the different processes may need to migrate at different times. That means we could potentially break procedures by upgrading the shared code.

How did we tackle upgrading to Python 3?

Our first task was to add virtualenv and a Python 3 interpreter to all images, slaves, and environments that run Python code.

As we mapped out all Python-executing environments, we also created an internal tool to find projects containing Python code and dependencies that are not Python 3 compatible. Our “Python Detective” tool is an automated Jenkins job. It can scan our entire code base to find problematic projects/repositories or can be executed manually with a specific repository.
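
The scanning idea can be sketched in a few lines. This is not the Python Detective itself, which is internal and relies on 2to3. It is a simplified stand-in that assumes a Python 3 interpreter and only reports files that fail to parse. The function name is hypothetical.

```python
import ast
import os


def find_py3_syntax_failures(repo_root):
    """Return sorted relative paths of .py files that do not parse on Python 3."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read bytes so the parser can honor any encoding declaration.
            with open(path, "rb") as handle:
                source = handle.read()
            try:
                ast.parse(source, filename=path)
            except (SyntaxError, ValueError):
                failures.append(os.path.relpath(path, repo_root))
    return sorted(failures)


if __name__ == "__main__":
    for relative_path in find_py3_syntax_failures("."):
        print(relative_path)
```

A parse check like this only catches syntax problems such as print statements. It says nothing about semantic changes like integer division or renamed standard-library modules, which is one reason to lean on a dedicated conversion tool.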

The Python Detective Tool

The “Python Detective” job clones a project’s repository. After that, the job uses the 2to3 tool (Futurize or Modernize are great alternatives) to convert Python 2 code to Python 3 syntax. The command 2to3 -x import --write . runs 2to3 recursively on the current folder and rewrites every file that is not Python 3 compatible as Python 3 code. We needed to exclude 2to3’s relative import fixer, so we used -x import. It’s important to mention that the resulting Python 3 syntax is not Python 2 compatible, and may break procedures running in a Python 2 environment.
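
For a job that drives 2to3 from Python, the invocation might be wrapped roughly as follows. This is a sketch, not our Jenkins job. The function names and defaults are made up for illustration, and it assumes a 2to3 executable is available on the PATH.

```python
import shutil
import subprocess


def build_2to3_command(target=".", skip_fixers=("import",), write=True):
    """Build the argument list for a 2to3 run (hypothetical helper)."""
    command = ["2to3"]
    for fixer in skip_fixers:
        command += ["-x", fixer]   # -x / --nofix: leave this fixer out
    if write:
        command.append("--write")  # rewrite files in place instead of only printing a diff
    command.append(target)
    return command


def run_2to3(repo_dir, **options):
    """Run 2to3 inside repo_dir; return None when 2to3 is not installed."""
    if shutil.which("2to3") is None:
        return None
    return subprocess.run(build_2to3_command(**options), cwd=repo_dir, check=False)


if __name__ == "__main__":
    print(" ".join(build_2to3_command()))
```

Calling build_2to3_command(write=False) gives a dry run: without --write, 2to3 prints the proposed diff and leaves the files untouched, which is a safer first pass on an unfamiliar repository.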

We also used the caniusepython3 library to check that our dependencies were Python 3 compatible. Problematic dependencies that were found were marked.

The caniusepython3 library did not always correctly determine whether a library is Python 3 compatible. For example, it did not flag old versions of Fabric as Python 3 incompatible.
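
Compatibility checkers of this kind typically rely on the Trove classifiers a package publishes on PyPI. The sketch below uses invented metadata for a hypothetical package to show one way such a check can miss a problem: judging the newest release rather than the version a project actually pins. This is our assumption about the failure mode, not a description of caniusepython3’s internals.

```python
PY3_CLASSIFIER_PREFIX = "Programming Language :: Python :: 3"

# Invented metadata for a hypothetical package; real data would come from PyPI.
EXAMPLELIB_RELEASES = {
    "1.0": ["Programming Language :: Python :: 2.7"],
    "2.0": [
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3.6",
    ],
}


def declares_python3(classifiers):
    """Return True if any classifier advertises Python 3 support."""
    return any(c.startswith(PY3_CLASSIFIER_PREFIX) for c in classifiers)


def pinned_version_declares_python3(releases, pinned_version):
    """Check the pinned release rather than the newest one."""
    return declares_python3(releases[pinned_version])


if __name__ == "__main__":
    for version in sorted(EXAMPLELIB_RELEASES):
        print(version, pinned_version_declares_python3(EXAMPLELIB_RELEASES, version))
```

A check that only looks at release 2.0 would report the package as safe, even though a project pinned to 1.0 would still break on Python 3.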

After converting the code, our process would open a Pull Request with the suggested changes, and tag our team and the last committer to their project. This would trigger our CI process as well.

These PRs were not meant to be merged automatically. They were used to locate problematic code and to let us plan how to upgrade to Python 3.

The repository owner had to decide what steps they would take to maintain, convert (using the 2to3 suggestions), and test the converted code.

Decisions, decisions, decisions…

Deciding between strategies

Not upgrading is a real option. It may be more worthwhile for you to maintain an old environment than to convert a project to Python 3. A cost-benefit analysis must be made for each individual project. However, as mentioned before, we avoid maintaining legacy technologies. The repository owners were given a few strategies for upgrading safely:

1. Use the suggestions from the 2to3 results and upgrade, or find alternatives to, out-of-date dependencies. Run tests, deploy, and pray. Of course, there is the option to add or improve tests. Adding tests is the preferred path, but the cost of taking on that overhead must be weighed against its value.
2. Scrap the old project, then redesign and rebuild it as a Python 3 project (or possibly in another language) with a new set of Python 3 compatible dependencies.
3. Make the project Python 2 and Python 3 compatible. This strategy was necessary for internal Python packages. Because internal projects would migrate to Python 3 at different stages, we needed to keep our internal packages compatible with both Python 2 and 3.
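
For the dual-compatibility strategy, a module typically combines __future__ imports with guarded imports of renamed standard-library modules. The following is a minimal sketch of that pattern. The helper functions are invented for illustration and are not taken from our internal packages.

```python
# Runs unchanged on Python 2.7 and Python 3.
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # Python 2 location

PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # noqa: F821 (unicode exists only on Python 2)


def host_of(url):
    """Return the network location part of a URL."""
    return urlparse(url).netloc


def half(number):
    """True division on both versions thanks to the __future__ import."""
    return number / 2


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes when needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


if __name__ == "__main__":
    print(host_of("https://example.com/path"), half(3), to_text(b"abc"))
```

Libraries such as six or future package up the same idea. Hand-written guards like these are enough for small packages, but they should be removed once every consumer has moved to Python 3.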

Where Python 2 and Python 3 compatibility was needed, we added the tox tool to test against multiple versions of Python. Having tox in place will also make it easier to port to future Python versions.

Dependencies holding us back

External Python packages could be difficult to port.

A few of our package dependencies were not Python 3 compatible and therefore made projects difficult to port. We used caniusepython3 to locate such dependencies. Here are a few examples of dependencies that were not Python 3 compatible, and how we replaced them.

One of our projects was dependent on the Lettuce project. The Lettuce project is no longer maintained and will not become Python 3 compatible. We replaced the use of Lettuce with a forked Python 3 compatible project called aloe. This guide helped us do this.

Many of our deployment processes are dependent on Fabric. Fabric has had a major API change since we started using it. The newer version of Fabric is Python 3 compatible, but adopting it would mean changing a lot of code to match the latest API. Instead of converting to the new API, we replaced Fabric with the Fabric3 library. This library uses the old Fabric API but is Python 3 compatible. Projects using Fabric now have a grace period in which to adopt the new Fabric API.

We were also dependent on the suds library. We switched to zeep. The changes within our code to switch to zeep were minuscule.

Happily ever after… or until Python 4…

Python is a flexible language. That flexibility allows change and invites improvement. We designed a plan and enlisted all affected teams to take on this task and responsibility. Because of these factors, we were able to upgrade approximately 130 of 183 projects within two months. Yes, things did break, but we managed to fix them relatively quickly.