Migrating to Python 3: The HealthifyMe Experience.

Manmohan Sharma
Published in HealthifyMe Tech
15 min read · Nov 18, 2020

Hello there! This blog is about how to migrate a legacy monolith project to Python 3 without taking any downtime. It comes in two parts.

First blog: Why you should migrate to Python 3, the rare-case differences between Python 2 and Python 3, and compatible solutions for them.

Second blog: How HealthifyMe moved to Python 3 without downtime while development was ongoing.

Introduction

Python is the primary coding language at HealthifyMe. We use Python 3 for all new projects, but our legacy project was still running Python 2.7 with Django 1.11. Our monolith project is 7 years old, with more than 2 million lines of Python code. Some of the reasons why we moved to Python 3:

  1. Support dropped for Python 2: Python 2 reached end of life on January 1, 2020, and no longer receives support.
  2. Python 3 adoption: Most companies and open-source projects have already adopted Python 3. New libraries, tools, modules, and frameworks are written in Python 3.
    Existing open-source projects have also migrated, and new features, fixes, and security improvements land in Python 3.
  3. Software security: Ensuring software security is a legal requirement, especially when you’re dealing with personal information in the realm of GDPR. Keeping your software up to date understandably ranks very high among security best practices, and an outdated Python interpreter would be all but guaranteed to show up as a red flag during a security audit. Security-testing tools like Black Duck have reported many vulnerabilities, exploits, and security issues in Python 2, and most of them are fixed in recent Python 3 versions (3.7.5, 3.8.0).
  4. Performance and new features: Python 3 performs better than Python 2. New software products that use Python 3 have reported a 12% CPU performance boost and a 30% improvement in memory usage.
    Python 3 also gives us:
    * native asynchronous programming.
    * type annotations you can use to improve static code analysis and overall usability.
    * chained exceptions, which are especially useful when debugging.
    * other helpful features that make coding in Python much more efficient.

This list goes on, and it is sure to grow with each new Python 3 release.

These are some of the reasons we migrated to Python 3. Our backend team has approximately 12–15 developers. This is our base project, with 7–10 build releases daily containing bug fixes, improvements, security fixes, new feature development, and so on. Our main challenge was not to stop the current development process. We had to make sure our project was compatible with Python 3.x without breaking compatibility with Python 2.x. The migration was led by one developer (with the help of other developers, of course).

In this article, we’ll try to illustrate all the various steps taken, problems faced, and a few more details.

Differences between Python 2 and Python 3

The common differences are covered here:

https://docs.python.org/3/whatsnew/3.0.html
https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html
https://jaxenter.com/differences-python-2-3-148432.html
https://python-future.org/compatible_idioms.html

Now we will describe in brief some of the rare and edge cases we faced when we started our migration.

Data type comparison:

1. In Python 2, comparing an integer to None works: None is considered less than any integer, even negative ones. You can also compare None with a string, or a string with an int.
Comparing different data types like this is not allowed in Python 3.
Most developers know this, but we hit an edge case where NotImplementedType was compared with an int, which does not work in Python 3.

Code snippet:

class Base(object):
    PHONE_NO_SIZE_LIMIT = NotImplemented
base = Base()
if base.PHONE_NO_SIZE_LIMIT > 10:
    print("Pass correct phone number")
else:
    print("Valid phone number")

If we save this as phone_number_validation.py and run the code:

# Python 2
mac@192 ~ % python phone_number_validation.py
Pass correct phone number
# Python 3
mac@192 ~ % python3.7 phone_number_validation.py
Traceback (most recent call last):
  File "phone_number_validation.py", line 4, in <module>
    if base.PHONE_NO_SIZE_LIMIT > 10:
TypeError: '>' not supported between instances of 'NotImplementedType' and 'int'

Compatible Solution:
We have to check whether base.PHONE_NO_SIZE_LIMIT is implemented, and handle the case where it is not. For example:

if isinstance(base.PHONE_NO_SIZE_LIMIT, type(NotImplemented)):
    # Have logic here, also exit/return here.
    print("Phone size is not implemented")
elif base.PHONE_NO_SIZE_LIMIT > 10:
    print("Pass correct phone number")
else:
    print("Valid phone number")

2. min and max built-in functions:
Because int-to-None, int-to-str, and None-to-str comparisons no longer work in Python 3, the built-in functions min and max behave differently as well.

# Python 2 
>>> max([1, None, 2])
2
>>> max([1, None, 2, "abc"])
'abc'
>>> min([1, None, 2, "abc"])
None
# Python 3
>>> max([1, None, 2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'NoneType' and 'int'
>>> max([1, 2, "abc", None])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'int'

Compatible Solution:
1. The list should hold data of a single type, either string or int.
2. If a single-type list can still contain None, we can handle it with our own method, like this:

def py2max(input_list):
    """Get the maximum item from list."""
    if not input_list:
        raise ValueError('List should not be empty')
    filtered_input_list = [rec for rec in input_list if rec is not None]
    return max(filtered_input_list) if filtered_input_list else None

Hex encoding/decoding

In Python 2 we can hex-encode a string with <str_to_be_encoded>.encode('hex'). This does not work in Python 3.

# ENCODING
# Python 2
>>> "msg_to_be_encoded".encode('hex')
'6d73675f746f5f62655f656e636f646564'
# Python 3
>>> "msg_to_be_encoded".encode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: 'hex' is not a text encoding; use codecs.encode() to handle arbitrary codecs

# DECODING
# Python 2
>>> "6d73675f746f5f62655f656e636f646564".decode('hex')
'msg_to_be_encoded'
# Python 3
>>> b"6d73675f746f5f62655f656e636f646564".decode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: 'hex' is not a text encoding; use codecs.decode() to handle arbitrary codecs

# Also compare the exceptions raised in Python 2 and Python 3.
# Python 3
>>> b"6d73675f746f5f62655f656e636f646564".decode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: 'hex' is not a text encoding; use codecs.decode() to handle arbitrary codecs
# Python 2
>>> "6d73675f746f5f62655f656e636f64656".decode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/hex_codec.py", line 42, in hex_decode
    output = binascii.a2b_hex(input)
TypeError: Odd-length string

Compatible Solution:
We should use codecs in both Python 2 and Python 3. In Python 3, both the input and the output are bytes.

# Python 2
>>> import codecs
>>> message = "msg_to_be_encoded"
>>> codecs.encode(message.encode(), 'hex')
'6d73675f746f5f62655f656e636f646564'
# Python 3
>>> message = "msg_to_be_encoded"
>>> codecs.encode(message.encode(), 'hex')
b'6d73675f746f5f62655f656e636f646564'
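
The snippet above only covers encoding. As a minimal sketch of how both directions could be wrapped, the helpers below accept text or bytes and go through codecs in either Python version. The names hex_encode and hex_decode are hypothetical and not part of our codebase.

```python
import codecs


def hex_encode(message):
    """Return the hex representation of ``message`` as text (hypothetical helper)."""
    if not isinstance(message, bytes):
        # Text must become bytes before it can be hex-encoded.
        message = message.encode("utf-8")
    return codecs.encode(message, "hex").decode("ascii")


def hex_decode(hex_text):
    """Return the raw bytes represented by ``hex_text`` (hypothetical helper)."""
    if not isinstance(hex_text, bytes):
        hex_text = hex_text.encode("ascii")
    return codecs.decode(hex_text, "hex")


print(hex_encode("msg_to_be_encoded"))
print(hex_decode("6d73675f746f5f62655f656e636f646564"))
```

Returning text from hex_encode and bytes from hex_decode is a design choice for this sketch; callers that need a different type can decode or encode the result themselves.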

String uppercase:

string.uppercase doesn’t work in python 3.

# Python 2 
>>> import string
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# Python 3
>>> import string
>>> string.uppercase
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'string' has no attribute 'uppercase'

Compatible Solution:
Use ascii_uppercase

# Python 2
>>> import string
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# Python 3
>>> import string
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

hasattr():

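hasattr() presents a significant edge case when writing code that is compatible with both Python 2 and 3. It checks for the existence of an attribute by trying to retrieve it. The difference, as the documentation of each version shows, lies in which exceptions it swallows: Python 2 treats any exception as "attribute missing", while Python 3 only treats AttributeError that way.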

Python 2

hasattr(object, name)
The arguments are an object and a string. The result is True if the string is the name of one of the object’s attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an exception or not.)

Python 3

hasattr(object, name)
The arguments are an object and a string. The result is True if the string is the name of one of the object’s attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.)

Please find more details here: https://medium.com/@k.wahome/python-2-vs-3-hasattr-behaviour-f1bed48b068

Sample code

class Foo(dict):
    def __init__(self):
        super(Foo, self).__init__()
        self.example_dict = {}

    def __getitem__(self, key):
        try:
            return super(Foo, self).__getitem__(key)
        except KeyError:
            return self.example_dict[key]

    def __getattr__(self, key):
        return self[key]


foo = Foo()
if hasattr(foo, "not_present_key"):
    pass
else:
    print("Not Found")

Save the above snippet as hasattr_test.py

# Python 2
mac@192 ~ % python hasattr_test.py
Not Found
# Python 3
mac@192 ~ % python3.7 hasattr_test.py
Traceback (most recent call last):
  File "hasattr_test.py", line 8, in __getitem__
    return super(Foo, self).__getitem__(key)
KeyError: 'not_present_key'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hasattr_test.py", line 17, in <module>
    if hasattr(foo, "not_present_key"):
  File "hasattr_test.py", line 13, in __getattr__
    return self[key]
  File "hasattr_test.py", line 10, in __getitem__
    return self.example_dict[key]
KeyError: 'not_present_key'

Compatible Solution:
To make the code compatible with both Python 2 and Python 3, we have to change the __getattr__ function as below.

def __getattr__(self, key):
    try:
        return self[key]
    except KeyError:
        raise AttributeError(key)
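
For reference, here is a sketch of the full sample class with that fix applied, so the hasattr() behaviour can be checked end to end. The key present_key is only added for illustration.

```python
class Foo(dict):
    """Dict whose keys can also be read as attributes."""

    def __init__(self):
        super(Foo, self).__init__()
        self.example_dict = {}

    def __getitem__(self, key):
        try:
            return super(Foo, self).__getitem__(key)
        except KeyError:
            return self.example_dict[key]

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # hasattr() in Python 3 only swallows AttributeError,
            # so the KeyError has to be translated.
            raise AttributeError(key)


foo = Foo()
foo["present_key"] = 1  # illustrative key, not from the original sample
print(hasattr(foo, "present_key"))
print(hasattr(foo, "not_present_key"))
```

With the translation in place, the second call returns False in both Python versions instead of raising KeyError in Python 3.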

Dict becomes ordered in Python 3:

From Python 3.6 onwards, dictionaries preserve insertion order by default (an implementation detail of CPython 3.6, guaranteed by the language from 3.7).

# Python 2
>>> sample_dict = {}
>>> sample_dict["a"] = 1
>>> sample_dict["b"] = 2
>>> sample_dict["c"] = 3
>>> sample_dict["d"] = 4
>>> sample_dict
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
# Python 3
>>> sample_dict = {}
>>> sample_dict["a"] = 1
>>> sample_dict["b"] = 2
>>> sample_dict["c"] = 3
>>> sample_dict["d"] = 4
>>> sample_dict
{'a': 1, 'b': 2, 'c': 3, 'd': 4}

Compatible Solution:
Ideally, this should not break application code, because the dict only changed from unordered to ordered. If we still need the same result in both Python versions (the order matters, or a test case is failing), we have to use OrderedDict to keep the output identical in both.
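
A minimal sketch of the OrderedDict approach, reusing the sample_dict keys from the example above. Iteration order is the insertion order in both Python 2.7 and Python 3.

```python
from collections import OrderedDict

sample_dict = OrderedDict()
sample_dict["a"] = 1
sample_dict["b"] = 2
sample_dict["c"] = 3
sample_dict["d"] = 4

# Keys come back in insertion order regardless of the Python version.
print(list(sample_dict.keys()))
```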

Hashing:

In Python 2 the input can be of type unicode or str, but Python 3 needs bytes.

# Python 2
>>> import hashlib
>>> message = "healthify"
>>> hashlib.sha512(message).hexdigest().lower()
'f910c1fa68087a546512ac3b175c99ee7eba21360fa4e579c2aed649c7e4a43466c56bceedcd60d783bc6e7d069a16f0b9c67140d6c129d2a1898af8cfb62719'
# Python 3
>>> message = "healthify"
>>> hashlib.sha512(message).hexdigest().lower()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing

Compatible Solution:
Python 3 needs bytes as input, whereas Python 2 works with both unicode and str. Encoding the message explicitly works in both.

# Python 2
>>> import hashlib
>>> message = "healthify"
>>> hashlib.sha512(message.encode('utf-8')).hexdigest().lower()
'f910c1fa68087a546512ac3b175c99ee7eba21360fa4e579c2aed649c7e4a43466c56bceedcd60d783bc6e7d069a16f0b9c67140d6c129d2a1898af8cfb62719'
# Python 3
>>> message = "healthify"
>>> hashlib.sha512(message.encode('utf-8')).hexdigest().lower()
'f910c1fa68087a546512ac3b175c99ee7eba21360fa4e579c2aed649c7e4a43466c56bceedcd60d783bc6e7d069a16f0b9c67140d6c129d2a1898af8cfb62719'
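
To avoid repeating the encode call at every call site, the conversion can live in one place. This is a sketch only; sha512_hex is a hypothetical helper name, and UTF-8 is assumed as the text encoding.

```python
import hashlib


def sha512_hex(message):
    """Return the SHA-512 hex digest of text or bytes (hypothetical helper)."""
    if not isinstance(message, bytes):
        # Assumption: text input should be hashed as its UTF-8 bytes.
        message = message.encode("utf-8")
    return hashlib.sha512(message).hexdigest()


# Text and bytes inputs produce the same digest.
print(sha512_hex("healthify") == sha512_hex(b"healthify"))
```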

__div__ operator overloading:

In Python 3, the __div__ operator no longer exists; it was replaced entirely by __truediv__.

class item:
    fats = 0.0

    def __div__(self, other):
        self.fats = self.fats / other


it = item()
it.fats = 34.0
it / 3
print(it.fats)
# Python 2 output
11.3333333333
# Python 3 output
Traceback (most recent call last):
  File "div_overloading.py", line 10, in <module>
    it / 3
TypeError: unsupported operand type(s) for /: 'item' and 'int'

Compatible Solution:
In Python 3.x, we have to overload the __truediv__ operator, not __div__. To make the code compatible, we keep both methods:

class item:
    fats = 0.0

    def __div__(self, other):
        self.fats = self.fats / other

    def __truediv__(self, other):
        self.fats = self.fats / other


it = item()
it.fats = 34.0
it / 3
print(it.fats)
# Python 2 output
11.3333333333
# Python 3 output
11.333333333333334
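
A variation worth considering, shown here only as a sketch: define __truediv__ once and alias __div__ to it, so the two methods cannot drift apart. The Item class below is hypothetical and, unlike the sample above, returns a new object instead of mutating itself.

```python
class Item(object):
    """Hypothetical value object holding a fat amount."""

    def __init__(self, fats=0.0):
        self.fats = fats

    def __truediv__(self, other):
        # Return a new Item rather than changing this one in place.
        return Item(self.fats / other)

    # Python 2 (without `from __future__ import division`) looks up __div__.
    __div__ = __truediv__


half = Item(34.0) / 2
print(half.fats)
```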

Base64 encoding:

We do base64 encoding using base64.b64encode(<string_to_be_encoded>). In Python 2 we can pass unicode or str as input, but Python 3 requires bytes.

# Python 2
>>> from base64 import b64encode
>>> b64encode("man")
'bWFu'
>>> b64encode(u"man")
'bWFu'
# Python 3
>>> from base64 import b64encode
>>> b64encode("man")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'

Compatible Solution:
We can write our own method for base64 encoding that accepts both strings and bytes as input.

import base64

import six


def base64ify(bytes_or_str):
    if six.PY3 and isinstance(bytes_or_str, str):
        input_bytes = bytes_or_str.encode('utf8')
    else:
        input_bytes = bytes_or_str
    try:
        output_bytes = base64.b64encode(input_bytes)
    except (UnicodeEncodeError, TypeError):
        # This happens when the input message has
        # non-ascii encodable characters in an unicode string
        # `'`(ascii encodable) vs `’`(non-ascii encodable)
        # In this case, we first need to encode it to utf-8
        # and then do the base64 encoding
        output_bytes = base64.b64encode(input_bytes.encode('utf-8'))
    if six.PY3:
        return output_bytes.decode('ascii')
    else:
        return output_bytes

Built-in round function:

Python 2: Rounding is done away from zero (so, for example, round(0.5) is 1.0 and round(-0.5) is -1.0).
Python 3: Rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2).

# Python 2
>>> round(15.5)
16.0
>>> round(16.5)
17.0
# Python 3
>>> round(15.5)
16
>>> round(16.5)
16

Compatible Solution:
We created our own round method that behaves like the Python 2 round in Python 3 as well.

import math


def py2_round(x, d=0):
    """Round same as PY2 in PY3."""
    p = 10 ** d
    if x >= 0:
        return float(math.floor((x * p) + 0.5)) / p
    else:
        return float(math.ceil((x * p) - 0.5)) / p
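
One caveat with adding 0.5 in floating point is that a value just below a tie, such as 0.49999999999999994, can be pushed over the boundary by the addition itself. As an alternative sketch (not the method we used), the decimal module can do half-away-from-zero rounding on the exact value of the float. The name round_half_away is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(x, ndigits=0):
    """Round ties away from zero, like Python 2's round() (hypothetical helper)."""
    # Decimal(x) holds the exact binary value of the float, so no extra
    # rounding error is introduced before the tie is resolved.
    exponent = Decimal(10) ** -ndigits
    # decimal's ROUND_HALF_UP sends ties away from zero.
    return float(Decimal(x).quantize(exponent, rounding=ROUND_HALF_UP))


print(round_half_away(15.5))
print(round_half_away(16.5))
print(round_half_away(-0.5))
```

This sketch assumes ordinary magnitudes; quantize raises an error if the result would need more digits than the decimal context's precision.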

struct.pack input type:

In Python 2 the input type is str; in Python 3 it must be bytes.

# Python 2
>>> import struct
>>> string = "blab"
>>> s = struct.Struct(b'4s')
>>> packed_data = s.pack(string)
# Python 3
>>> import struct
>>> string = "blab"
>>> s = struct.Struct(b'4s')
>>> packed_data = s.pack(string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a bytes object

Compatible Solution:
Encode the input to bytes before packing.
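
As a sketch of what that looks like, the hypothetical helper pack_text below encodes text before handing it to struct; UTF-8 is an assumed encoding.

```python
import struct


def pack_text(text, size=4):
    """Pack ``text`` into a fixed-size bytes field (hypothetical helper)."""
    if not isinstance(text, bytes):
        # Assumption: text is encoded as UTF-8 before packing.
        text = text.encode("utf-8")
    # The 's' format pads short values with null bytes up to ``size``.
    return struct.Struct("{}s".format(size)).pack(text)


print(pack_text("blab"))
print(pack_text("ab"))
```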

List comprehension variable scope changes:

In Python 3, a list comprehension has its own scope, so its loop variable no longer leaks into the enclosing function. In the examples below, the x set by the comprehension is not accessible elsewhere in the function, which was not the case in Python 2.

# Python 2
>>> def two_or_three():
...     x = 3
...     [0 for x in range(3)]
...     return x
...
>>> two_or_three()
2
>>> def two_or_three_with_method():
...     def print_number():
...         print(x)
...     [0 for x in range(3)]
...     print_number()
...
>>> two_or_three_with_method()
2
# Python 3
>>> def two_or_three():
...     x = 3
...     [0 for x in range(3)]
...     return x
...
>>> two_or_three()
3
>>> def two_or_three_with_method():
...     def print_number():
...         print(x)
...     [0 for x in range(3)]
...     print_number()
...
>>> two_or_three_with_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in two_or_three_with_method
  File "<stdin>", line 3, in print_number
NameError: name 'x' is not defined

Compatible Solution:
We should avoid such cases. For the second function (two_or_three_with_method), we have to pass the value of x as an argument.
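
A sketch of the second function rewritten that way: the value is computed explicitly and passed to the inner function, so nothing depends on a leaked comprehension variable. The names format_number and numbers are illustrative.

```python
def two_or_three_with_method():
    def format_number(number):
        # The value arrives as an argument instead of via the enclosing scope.
        return "number is {}".format(number)

    numbers = list(range(3))
    squares = [n * n for n in numbers]  # n stays private to the comprehension in Python 3
    return format_number(numbers[-1])


print(two_or_three_with_method())
```

This behaves the same in Python 2 and Python 3 because the last element is read from the list itself.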

math.floor and math.ceil return type changed:

In Python 2, floor and ceil return a float, but in Python 3 they return an int.

# Python 2
>>> from math import floor,ceil
>>> floor(4.345)
4.0
>>> ceil(4.345)
5.0
# Python 3
>>> from math import floor,ceil
>>> floor(4.345)
4
>>> ceil(4.345)
5

Compatible Solution:
We can cast the output to float in Python 3, for example float(floor(4.345)). This does not affect Python 2.

Unpickling a Python 2 pickled object in Python 3:

As mentioned, we are making the code compatible with both Python versions. We hit an issue where an object pickled in Python 2 could not be unpickled in Python 3. This can also happen with pickled objects cached in Redis.

By default, pickle.load() tries to decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3

Compatible Solution:
We can use the method below to unpickle objects.

import pickle

import six


def unpickling_py2_to_py3(pickled_value):
    """Unpickle a Python 2 pickle in Python 3."""
    if six.PY3:
        try:
            value = pickle.loads(pickled_value)
        except UnicodeDecodeError:
            value = pickle.loads(pickled_value, encoding="latin1")
    else:
        value = pickle.loads(pickled_value)
    return value
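
To see the fallback in action without a Python 2 interpreter, the Python 3 only sketch below uses a hand-written byte string that imitates what Python 2 would emit for pickle.dumps('caf\xc3\xa9', 2). The constant PY2_PICKLE and the function load_py2_pickle are assumptions made for illustration.

```python
import pickle

# Hand-written bytes imitating a Python 2 protocol 2 pickle of the
# 8-bit str 'caf\xc3\xa9' (the UTF-8 encoding of "café").
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_py2_pickle(data):
    """Unpickle ``data``, falling back to latin1 for Python 2 strings."""
    try:
        return pickle.loads(data)
    except UnicodeDecodeError:
        return pickle.loads(data, encoding="latin1")


value = load_py2_pickle(PY2_PICKLE)
# latin1 maps every byte to exactly one code point, so the original
# bytes can be recovered and decoded with the encoding they really used.
print(value.encode("latin1").decode("utf-8"))
```

The latin1 fallback never fails to decode, but the resulting text is only correct for ASCII data; non-ASCII strings need the extra encode/decode step shown in the last line.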

Third-party library fixes:

Our project uses a lot of third-party packages, and we hit some edge cases while updating them. You can skip this section if you are not using any of them.

  1. Django:
    a. Django migration files
    When we ran Django makemigrations in Python 3, we saw new migration files being generated, but the same did not happen in Python 2. There can be multiple reasons for this.

b prefix: On makemigrations, most of the new files are generated without the b prefix on string values. This is because all string literals used in your models and fields (e.g. ``verbose_name``, ``related_name``, etc.) must be consistently either byte strings or text (unicode) strings in both Python 2 and 3.

Compatible Solution: The easiest way to get a single migration for new changes is to add from __future__ import unicode_literals to all models files. For existing migration files, either run makemigrations once (it should only produce changes that one time), or remove the b prefix from the existing migration files.

Choice Field: In models we use dict.items(). Since dicts are ordered in Python 3, the values returned from dict.items() differ between Python 2 and Python 3.

Compatible Solution: To make this compatible with both, we used sorted(dict.items()) and generated a migration file that is now compatible with both Python versions.
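
A small sketch of the choices fix in plain Python, without Django. STATUS_LABELS and STATUS_CHOICES are hypothetical names; in a real model the sorted list would be passed as the field's choices.

```python
# Hypothetical mapping used to build a model field's choices.
STATUS_LABELS = {"pending": "Pending", "active": "Active", "closed": "Closed"}

# sorted() pins the order, so both Python versions see identical
# choices and makemigrations detects no difference.
STATUS_CHOICES = sorted(STATUS_LABELS.items())

print(STATUS_CHOICES)
```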

b. “<model_name> object” display in admin console
In Python 3, the admin console showed <model_name> object as the field value instead of a readable string. This was happening because our model class only defined this method:

def __unicode__(self):
    return 'MyModel: {}'.format(self.name)

Python 3 ignores __unicode__ and looks for __str__. We can define __str__ and that works in both Python 2 and Python 3, but in Python 2 it will fail if the string contains non-ASCII characters.

Compatible Solution: Following an existing solution, we added the @python_2_unicode_compatible decorator to the models and renamed __unicode__ to __str__.

c. Django query object slicing
Django query objects support slicing to fetch records. With Django 1.11 on Python 2, slicing works with both int and str bounds. On Python 3, it only works with int bounds.

# Python 2
>>> from food import models
>>> foods = models.Foods.objects.all()
>>> foods[1:2] # int slicing
<QuerySet [<Foods: Cashewnut Cutlets >]>
>>> foods["1":"2"] # string slicing
<QuerySet [<Foods: Cashewnut Cutlets >]>
# Python 3
In [2]: from food import models
In [3]: foods = models.Foods.objects.all()
In [4]: foods["1":"2"]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-be74a70cdff4> in <module>
----> 1 foods["1":"2"]
~/Venvs/py3_venv/lib/python3.7/site-packages/django/db/models/query.py in __getitem__(self, k)
265 raise TypeError
266 assert ((not isinstance(k, slice) and (k >= 0)) or
--> 267 (isinstance(k, slice) and (k.start is None or k.start >= 0) and
268 (k.stop is None or k.stop >= 0))), \
269 "Negative indexing is not supported."
TypeError: '>=' not supported between instances of 'str' and 'int'
In [5]: foods[1:2]
Out[5]: <QuerySet [<Foods: Foods object>]>

Compatible Solution: Avoid string slicing, which is not a good approach anyway.

2. Redis: redis is a commonly used Python package. redis-py 3.0 introduces many new features, but required several backwards-incompatible changes in the process.

Compatible Solution:
The changes, and how to write compatible code, are documented at https://pypi.org/project/redis/. We wrote our own Redis helper methods that are compatible with both redis-py versions. For example:

import six


def redis_zadd(redis_connection, key, **values):
    """Redis method zadd for python 2 and python 3 compatibility."""
    if six.PY3:
        redis_connection.zadd(key, values)
    else:
        redis_connection.zadd(key, **values)


def redis_zincrby(redis_connection, key, value, score):
    """Redis method zincrby for python 2 and python 3 compatibility."""
    if six.PY3:
        redis_connection.zincrby(key, score, value)
    else:
        redis_connection.zincrby(key, value, score)

3. django-cacheops: Cacheops is a slick app that supports automatic or manual queryset caching and automatic granular event-driven invalidation. There is a gotcha: when django-cacheops stores values in Redis, it pickles them.

Python 2 has 3 different pickle protocols (0, 1, 2) and the default is 0. Python 3.7 has 5 different protocols (0, 1, 2, 3, 4) and the default is 3. (Python 3.8 adds protocol 5 and changes the default to 4.)

pickle uses the default protocol to dump data. So an object pickled in Python 3 cannot be unpickled in Python 2, because pickle protocol 3 is not available in Python 2.

Compatible Solution:
We can specify the protocol parameter when invoking pickle.dump.
django-cacheops doesn't have an option to set the pickle protocol, so we used monkey patching to sort this out.

import cacheops
from cacheops.cross import pickle


@cacheops.redis.handle_connection_failure
def _custom_cacheops_redis_set(self, cache_key, data, timeout=None):
    # Protocol 2 is valid in both Python versions.
    pickled_data = pickle.dumps(data, 2)
    if timeout is not None:
        self.conn.setex(cache_key, timeout, pickled_data)
    else:
        self.conn.set(cache_key, pickled_data)


cacheops.RedisCache.set = _custom_cacheops_redis_set
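
The effect of pinning the protocol can be checked with the standard library alone. The sketch below uses an illustrative dictionary; it only shows that the stream is written as protocol 2, which Python 2 is able to read.

```python
import pickle

data = {"name": "healthify", "count": 3}  # illustrative payload

# Protocol 2 is the highest protocol that Python 2 understands.
pickled = pickle.dumps(data, protocol=2)

# A protocol 2 stream starts with the PROTO opcode (0x80)
# followed by the protocol number.
print(pickled[:2] == b"\x80\x02")
print(pickle.loads(pickled) == data)
```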

Also, as mentioned above in the section on unpickling a Python 2 pickled object in Python 3, reading the data in Python 3 can raise UnicodeDecodeError because the pickling was done in a different Python version.
We sorted this out with patching as well:

import cacheops
import six
from cacheops.simple import CacheMiss

if six.PY3:
    import pickle
else:
    import cPickle as pickle


def unpickling_py2_to_py3(pickled_value):
    """Unpickle a Python 2 pickle in Python 3."""
    if six.PY3:
        try:
            value = pickle.loads(pickled_value)
        except UnicodeDecodeError:
            value = pickle.loads(pickled_value, encoding="latin1")
    else:
        value = pickle.loads(pickled_value)
    return value


def _custom_cacheops_redis_get(self, cache_key):
    data = self.conn.get(cache_key)
    if data is None:
        raise CacheMiss
    return unpickling_py2_to_py3(data)


cacheops.RedisCache.get = _custom_cacheops_redis_get

4. django-redis-cache: We have a method for deleting keys based on a pattern. In Python 2 we used version 1.6.5, where the key search/delete happened without SCAN. For Python 3 we updated to version 2.1, where the pattern search uses Redis SCAN, which made it very slow and caused problems for us. There is a GitHub issue for this.

Compatible Solution:
We sorted out the issue by going back to the old way of pattern delete. Instead of calling cache.delete_pattern(pattern), we do:

pattern = cache.make_key(pattern)
keys = cache.master_client.keys(pattern)
if keys:
    cache.master_client.delete(*keys)

What’s Next

In part two of this blog, we will explore how to move to Python 3 without downtime while development is ongoing.
