Migrating to Python 3: The HealthifyMe Experience. (Part 2)

Manmohan Sharma
Published in HealthifyMe Tech · 8 min read · Dec 16, 2020
We moved all the fishes 😀

Hello and welcome back. This is Part 2, where we explore how HealthifyMe moved to Python 3 without downtime while development continued. If you missed it, Part 1 covers why you should move to Python 3, the rare-case differences between Python 2 and Python 3, and compatible solutions for them.

Introduction

Our backend team has approximately 12–15 developers. This is our base project, with 7–10 build releases daily that include bug fixes, improvements, security fixes, new feature development and so on. Our main challenge was to migrate without stopping the current development process. We had to make sure our project became compatible with Python 3.X without breaking compatibility with Python 2.X.

Migration is not an easy task. It is not as if you change your configuration and settings files and everything works as expected. It is a continuous process in which we have to plan, analyze, test and iterate through the code base each time we migrate. We explain each of these steps here.

Migration Strategy

Plan/Research

If your application is small and can be refactored quickly, just start fresh and rewrite the code in Python 3.
This was not the case for us. Our application is big and runs at considerable scale, developers are working on different parts of the code base, and we needed a functioning application at all times. We first had to plan how to migrate such a large project. We listed all the components, scripts (Python scripts for staging and prod), environments, and internal and external apps that needed migration.

We spent the time to understand/research on —

Analyze

During our research we found several existing automated tools, which we then used for this migration.

  • caniusepython3: This package takes in a set of dependencies and then figures out which of them are holding you up from porting to Python 3.
  • 2to3: Automated Python 2 to 3 code translation.
  • six: Package intended to support codebases that work on both Python 2 and 3 without modification.
  • Python-Modernize: A very thin wrapper around lib2to3 that makes Python 2 code more modern, with the intention of eventually porting it over to Python 3.
  • python-future: It allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead.

Apart from these, we created our own wrapper to make code compatible with both Python 2 and Python 3. We explained it in Part 1, where we described how some features work differently in Python 2 and Python 3 and what the compatible solution is in such cases.

Migration Process

Isolated Git Branch:- We followed the basic practice of keeping each new change in a separate git branch. To make each change easy to identify, we created a new git branch with the prefix py3- .

__future__ imports:- We added the required __future__ imports to each Python file to make the code compatible with both Python versions. We added from __future__ import absolute_import, division, print_function, unicode_literals, choosing the imports that the code in each file required.
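
As a small illustration of what these imports change (a sketch only, not taken from our codebase; the variable names are made up), the file below should behave the same under Python 2.7 and Python 3:

```python
from __future__ import absolute_import, division, print_function, unicode_literals

# division: "/" is true division on both versions; "//" stays floor division.
half = 7 / 2
floored = 7 // 2

# unicode_literals: plain string literals are text
# (unicode on Python 2, str on Python 3).
label = "kcal"

# print_function: print is a function, so this line is valid syntax on both versions.
print(half, floored, label)
```

Without the division import, 7 / 2 would evaluate to 3 on Python 2, which is the kind of silent difference these imports remove.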

Compatible third-party packages updated:- We have more than 180 third-party package dependencies in our project. Some packages were compatible with both versions, such as django (1.11), simplejson (3.8.1) and Requests (2.22), and some were not, such as redis, django-cacheops and django-fake-model. We updated the packages that were not compatible with both versions and created a separate requirements file for Python 3.

Six package:- We decided to use six, a library that makes it easy to build a codebase that is valid in both Python 2 and 3. We used six functionality such as six.iteritems, six.moves.range, six.moves.urllib.parse.urlencode, six.moves.zip, six.with_metaclass, six.text_type, six.string_types, six.moves.urllib.request, six.viewkeys, six.StringIO, six.moves.html_parser etc.
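
To give a feel for what such helpers do, here is a simplified, standard-library-only sketch of the idea behind six.text_type, six.string_types and six.iteritems. It is not six's actual source, and the describe function and its data are hypothetical:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # These names only exist on Python 2.
    text_type = unicode  # noqa: F821
    string_types = (basestring,)  # noqa: F821

    def iteritems(d):
        return d.iteritems()
else:
    text_type = str
    string_types = (str,)

    def iteritems(d):
        return iter(d.items())


def describe(meal):
    """Hypothetical caller that runs unchanged on both versions."""
    parts = []
    for name, calories in sorted(iteritems(meal)):
        parts.append(text_type("{}={}").format(name, calories))
    return text_type(", ").join(parts)


print(describe({"rice": 200, "dal": 150}))
```

In real code you would import these names from six rather than redefine them; the point is that callers never branch on the interpreter version themselves.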

Custom compatible wrapper:- As we mentioned in Part 1, there were still a lot of places where we needed to write our own compatible solution that works in both Python versions. We created two compatibility layers, one for the unit tests and the other for our main codebase. We replaced the affected calls in the existing codebase with the newly created methods. Some of the methods are:


import base64
import hashlib
import math
import sys

from django.utils.encoding import smart_bytes

# Major version of the running interpreter (2 or 3).
_PY = sys.version_info[0]


def isPY3():
    """Check the current running version is python 3 or not."""
    return _PY == 3


def base64ify(bytes_or_str):
    """Base64-encode bytes or text and return the native str type."""
    if _PY == 3 and isinstance(bytes_or_str, str):
        input_bytes = bytes_or_str.encode('utf8')
    else:
        input_bytes = bytes_or_str
    try:
        output_bytes = base64.b64encode(input_bytes)
    except (UnicodeEncodeError, TypeError):
        # This happens when the input message has
        # non-ascii encodable characters in an unicode string
        # `'`(ascii encodable) vs `’`(non-ascii encodable)
        # In this case, we first need to encode it to utf-8
        # and then do the base64 encoding
        output_bytes = base64.b64encode(input_bytes.encode('utf-8'))
    if _PY == 3:
        return output_bytes.decode('ascii')
    else:
        return output_bytes


def py2min(input_list):
    """Get the minimum item from list."""
    if not input_list:
        raise ValueError('List should not be empty')
    # Python 2 orders None below everything; Python 3 raises TypeError instead.
    return min(input_list) if None not in input_list else None


def py2_round(x, d=0):
    """Round same as PY2 in PY3."""
    p = 10 ** d
    if x >= 0:
        return float(math.floor((x * p) + 0.5)) / p
    else:
        return float(math.ceil((x * p) - 0.5)) / p


def hash_512_create(value):
    """Hash obj creation python 2 and python 3 compatibility."""
    if isPY3():
        if isinstance(value, str):
            # hashlib only accepts bytes on Python 3.
            value = value.encode('utf-8')
    return hashlib.sha512(value)


def django_smart_bytes(value):
    """Django smart_bytes always returns str for python 2 and python 3 compatibility."""
    if isPY3():
        return smart_bytes(value).decode('utf-8')
    return smart_bytes(value)
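
To show why a helper like py2_round is needed, the short sketch below repeats that one function so it runs on its own and compares it with the built-in round. On Python 3 the built-in rounds halves to the nearest even number, while Python 2 rounded halves away from zero:

```python
import math


def py2_round(x, d=0):
    """Round halves away from zero, as Python 2's round() did."""
    p = 10 ** d
    if x >= 0:
        return float(math.floor((x * p) + 0.5)) / p
    return float(math.ceil((x * p) - 0.5)) / p


# Print the built-in result next to the Python 2 style result.
for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round(value), py2_round(value))
```

On Python 3, round(2.5) gives 2 while py2_round(2.5) gives 3.0, so any stored or compared values that were rounded under Python 2 keep matching.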

We also created a wrapper for the unit tests, because assertItemsEqual was renamed to assertCountEqual in Python 3, string-to-bytes comparisons fail, dict ordering differs between the two versions, and the mock library is included in the unittest library in Python 3.
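
A minimal sketch of what such a test wrapper could look like is below. The class and method names (CompatTestCase, assertTextEqual, ExampleTest) are hypothetical and are not the ones from our codebase:

```python
import sys
import unittest

try:
    from unittest import mock  # Python 3: mock is part of unittest
except ImportError:
    import mock  # Python 2: needs the separate "mock" package installed


def _to_text(value):
    """Decode bytes to text so bytes and text values can be compared."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


class CompatTestCase(unittest.TestCase):
    """Hypothetical base class that hides Python 2/3 test differences."""

    if sys.version_info[0] == 2:
        # Python 2 only has assertItemsEqual; expose it under the Python 3 name.
        def assertCountEqual(self, first, second, msg=None):
            return self.assertItemsEqual(first, second, msg)

    def assertTextEqual(self, first, second, msg=None):
        self.assertEqual(_to_text(first), _to_text(second), msg)


class ExampleTest(CompatTestCase):
    def test_unordered(self):
        self.assertCountEqual([3, 1, 2], [1, 2, 3])

    def test_bytes_vs_text(self):
        self.assertTextEqual(b"HealthifyMe", "HealthifyMe")

    def test_mock(self):
        fake = mock.Mock(return_value=42)
        self.assertEqual(fake(), 42)
```

Test classes then inherit from the compatibility base class instead of unittest.TestCase, so individual tests do not need version checks.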

Unit tests:- At HealthifyMe we always try to follow the best engineering practices. Why unit tests are needed can be a separate debate, but one thing we learned from this migration is that unit tests with good coverage save 30–40% of the migration time. At HealthifyMe we have test coverage of more than 90%, which made us comfortable changing parts of the code, since the changes were unlikely to cause lots of bugs in production.

CICD Pipeline:- We started running two CICD pipelines, one for Python 2 and one for Python 3. The goal was to make sure that code, once migrated and compatible, was not later changed in a non-compatible way.

Code changes/Developers learning:- We have more than 150 apps in our monolith Django project. We picked the apps one at a time and made the code of each compatible with both Python versions. Once an app was compatible, we added it to our CICD pipeline. This way, if developers changed code in a migrated app, the CICD pipeline checked its compatibility, and if any part of the code broke in either version, it was the developer's responsibility to make it compatible. This ensured that every migrated app stayed compatible as new code landed, and developers became familiar with writing compatible code. In this way, we migrated all the Django project apps.

Are we done?

No, we are not done. Until this point we had been making sure (using unit tests) that our code was compatible with both Python versions, but that did not mean we were ready for Python 3. We were about 70–75% ready.

Preparation for Dev testing rollout/manual testing:- We use Newrelic to track application performance and Sentry for bug tracking. We created two configurations based on the Python version, and also created separate celery queue configurations for the two versions.

Manual testing bug fixes:- After this, we released our application for DEV testing with Python 3 and tested it manually for a couple of days. We monitored the application using Sentry and Newrelic. Here we found the following issues:-

  • Sentry bugs: Most of the bugs we faced were related to data comparisons, string formatting and hashing.
  • String formatting issue: 'Hello {}'.format(b'HealthifyMe') is 'Hello HealthifyMe' in Python 2, but "Hello b'HealthifyMe'" in Python 3. Python 3 does this silently. (I do not really agree with this. For me, either the behaviour should not change, so that existing code keeps working, or it should throw an exception, which would have been best!)
  • Django-redis-cache slow delete pattern: Newrelic showed that deleting keys by pattern was slow in django-redis-cache.
  • Django model migration issue:- See Part 1, Third-party library fixes, Django.
  • Django admin console xxxxxx display issue:- See Part 1, Third-party library fixes, Django.
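
The string formatting issue in the list above can be reproduced and avoided with a small helper. This is a sketch; to_text is a hypothetical name, and it assumes the bytes are UTF-8 encoded:

```python
def to_text(value, encoding="utf-8"):
    """Return text for bytes input; leave every other value untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


name = b"HealthifyMe"

# On Python 3 this silently embeds the bytes repr in the message.
naive = "Hello {}".format(name)

# Decoding first gives the same message on both versions.
safe = "Hello {}".format(to_text(name))

print(naive)
print(safe)
```

Because Python 3 raises no error here, issues like this tend to surface only in monitoring or manual testing, which is how we found them.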

Python pickle issue while running the application in both versions:- After this, we started checking code compatibility with Python 2 and Python 3 running at the same time. By then we had a UAT instance for each Python version. We started routing our UAT traffic to both instances, starting from 10% and gradually increasing the traffic. While doing this we faced a Python pickle object issue, which we explained in Part 1.
Fix:- There is a ready fix for django-redis-cache, where we specify the pickle protocol version in the Django settings like this:

CACHES = {
    "default": {
        # ...
        "OPTIONS": {
            "PICKLE_VERSION": -1  # Use the latest protocol version (default), for py2 compatibility use 2
        }
    }
}

For django-cacheops, please check the django-cacheops package under Third-party library fixes in Part 1.
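
The reason the protocol version matters can be seen with the standard pickle module alone. Python 2.7 can read pickle protocols up to 2, while the latest protocol on Python 3 is higher, so a value cached by a Python 3 instance with the latest protocol cannot be loaded by a Python 2 instance. The sketch below uses a made-up payload and a hypothetical protocol_of helper:

```python
import pickle

payload = {"user_id": 42, "plan": "premium"}

# What a "latest protocol" setting (-1) amounts to on the running interpreter.
latest_blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Pinning protocol 2 keeps the bytes readable by Python 2.7 as well.
py2_safe_blob = pickle.dumps(payload, protocol=2)


def protocol_of(blob):
    """Read the protocol number from a pickle written with protocol >= 2."""
    data = bytearray(blob)
    # Such pickles start with the PROTO opcode (0x80) followed by the version.
    return data[1] if data[0] == 0x80 else None


print(protocol_of(latest_blob), protocol_of(py2_safe_blob))
```

While both versions share one cache, pinning the protocol to 2 is the safe choice; once Python 2 is gone the setting can go back to the latest protocol.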

Testing with prod DB:- As we are a B2C company handling millions of requests, we did one more round of testing before rolling out the backend to production. We created a prod instance on Python 3 and connected it to the prod DB. We tested manually on this instance to make sure that our code was compatible with prod data and that we could run prod with Python 2 and Python 3 instances at the same time.

Production Rollout:- Now we were ready for the prod rollout, but here too we rolled out gradually, in this manner:

  • Run all the Dev and UAT instances with an equal traffic split.
  • Roll out to low-traffic prod machines and servers without business impact, with 0.01% traffic.
  • After one day, increase traffic from 0.01% to 1%, then 5%, then 10%.
  • Send full traffic for a short period and fix the bugs related to string formatting and comparison.
  • Roll out traffic for the other service instances. (We started by sending traffic to the less busy servers first.)
  • In the end, when we were confident, we started sending traffic to the business-critical (payment-related) servers in the same gradual manner.
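
The gradual percentages above can be pictured with a small routing sketch. This post does not describe the mechanism we actually used to split traffic, so everything below is an assumption for illustration: the hash-based bucketing, the function names and the request keys are all hypothetical.

```python
import hashlib

BUCKETS = 10000  # one bucket = 0.01% of traffic


def bucket_for(request_key):
    """Map a stable key (for example a user id) to a bucket in 0..9999."""
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % BUCKETS


def routes_to_py3(request_key, percent):
    """Return True if this key falls inside the Python 3 share of traffic."""
    threshold = int(round(percent * BUCKETS / 100.0))
    return bucket_for(request_key) < threshold


# The rollout steps from the list above, in percent.
ROLLOUT_STEPS = [0.01, 1, 5, 10, 100]

for step in ROLLOUT_STEPS:
    print(step, routes_to_py3("user-12345", step))
```

With a scheme like this, a key that is routed to Python 3 at a low percentage stays on Python 3 as the percentage grows, which keeps each user's experience consistent during the rollout.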

One thing we want to mention is that we didn’t face any major issue in Production and the production rollout was smooth.

After this, we removed the Python 2 CICD pipeline and the Python 2 servers for all environments.

Conclusion:

Ultimately, HealthifyMe moved to Python 3 without any downtime and with continuous development. After that, we started looking at the migration to Django 2.2. I hope this article helps readers migrate to Python 3.

Special Mention:- This migration would have been impossible without the help of HealthifyMe’s CTO Sachin Shenoy, Team lead and DevOps Tharun (one-man army), my colleague Akhil and the Tech Team.

What’s next

Stay tuned to the HealthifyMe tech blog. We have more interesting posts coming about the Django 2.2 migration, the MySQL migration, and the best practices that we follow.

Special Notes

We are hiring 😀! It’s an exciting time to join the HealthifyMe engineering team. Please have a look at the HealthifyMe careers page. We have a lot of exciting problems to solve in many domains like Scalability, DevOps, Backend, System design etc.
