Migrating to Python 3: The HealthifyMe Experience.
Hello there! This blog covers how to migrate a legacy monolith project to Python 3 without taking any downtime. It comes in two parts.
First blog: why you should migrate to Python 3, the rare-case differences between Python 2 and Python 3, and compatible solutions for them.
Second blog: how HealthifyMe moved to Python 3 without downtime while development continued.
Introduction
Python is the primary coding language at HealthifyMe. We use Python 3 for all new projects, but our legacy project was still running on Python 2.7 with Django (1.11). Our monolith project is 7 years old, with more than 2 million lines of Python code. Some of the reasons why we moved to Python 3:
- Support dropped for Python 2: After January 2020, support for Python 2 will no longer be provided.
- Python 3 adoption: Most companies and open-source projects are already adopting Python 3. New libraries, tools, modules, and frameworks are being written in Python 3. Existing open-source projects have also migrated, and new features, fixes, and security improvements are landing in Python 3.
- Software security: Ensuring software security is a legal requirement, especially when you’re dealing with personal information in the realm of GDPR. Keeping your software up to date understandably ranks very high among security best practices, and an outdated Python interpreter would be all but guaranteed to show up as a red flag during a security audit. Security-testing tools like Black Duck have reported many vulnerabilities, exploits, and security issues in Python 2, and most of them are fixed in the latest Python 3 versions (3.7.5, 3.8.0).
- Performance and new features: Python 3 has better performance than Python 2. A new software product that uses Python 3 has reported a 12% CPU performance boost and a 30% improvement in memory usage.
Python 3 also gives us:
* native asynchronous programming.
* type annotations you can use to improve static code analysis and overall usability.
* chained exceptions, which are especially useful when debugging.
* other helpful features that make coding in Python much more efficient.
This list goes on, and it is sure to grow with each new Python 3 release.
These are some of the reasons we migrated to Python 3. Our backend team has approximately 12–15 developers. This is our base project, with 7–10 build releases daily containing bug fixes, improvements, security fixes, new feature development, etc. Our main challenge was not to stop the current development process. We had to make sure our project was compatible with Python 3.x without breaking compatibility with Python 2.x. The migration was led by one developer (of course, with the help of other devs).
In this article, we’ll try to illustrate the various steps taken, the problems faced, and a few more details.
Differences between Python 2 and Python 3
The common differences are covered here:
https://docs.python.org/3/whatsnew/3.0.html
https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html
https://jaxenter.com/differences-python-2-3-148432.html
https://python-future.org/compatible_idioms.html
Now we will briefly describe some of the rare and edge cases we faced when we started our migration.
Data type comparison:
1. In Python 2, comparing an integer to None works: None is considered less than any integer, even negative ones. You can also compare None with a string, or a string with an int.
Comparing different data types is not allowed in Python 3.
Most developers know this, but we hit an edge case where NotImplementedType was compared with int, which does not work in Python 3.
Code snippet:
class Base(object):
    PHONE_NO_SIZE_LIMIT = NotImplemented
base = Base()
if base.PHONE_NO_SIZE_LIMIT > 10:
    print("Pass correct phone number")
else:
    print("Valid phone number")
If we save this as phone_number_validation.py and run it:
# Python 2
mac@192 ~ % python phone_number_validation.py
Pass correct phone number

# Python 3
mac@192 ~ % python3.7 phone_number_validation.py
Traceback (most recent call last):
  File "phone_number_validation.py", line 4, in <module>
    if base.PHONE_NO_SIZE_LIMIT > 10:
TypeError: '>' not supported between instances of 'NotImplementedType' and 'int'
Compatible Solution:
We have to check whether base.PHONE_NO_SIZE_LIMIT is implemented, and handle the case where it is not. For example:
if isinstance(base.PHONE_NO_SIZE_LIMIT, type(NotImplemented)):
    # Handle the missing limit here, and exit/return before any comparison.
    print("Phone size is not implemented")
elif base.PHONE_NO_SIZE_LIMIT > 10:
    print("Pass correct phone number")
else:
    print("Valid phone number")
2. min and max built-in functions:
Since comparisons between int and None, int and str, or None and str do not work in Python 3, the built-in functions min and max have changed as well.
# Python 2
>>> max([1, None, 2])
2
>>> max([1, None, 2, "abc"])
'abc'
>>> print(min([1, None, 2, "abc"]))
None

# Python 3
>>> max([1, None, 2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'NoneType' and 'int'
>>> max([1, 2, "abc", None])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'int'
Compatible Solution:
1. The list should hold data of a single type, either str or int.
2. If None can appear alongside that one type, we can handle it with our own method, like:
def py2max(input_list):
    """Get the maximum item from list."""
    if not input_list:
        raise ValueError('List should not be empty')
    # Drop None entries so that only comparable values reach max().
    formatted_input_list = [rec for rec in input_list if rec is not None]
    return max(formatted_input_list) if formatted_input_list else None
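Sorting a list that may contain None fails in Python 3 for the same reason. Here is a minimal sketch, assuming the remaining values are all of one type. The helper name sort_none_first is hypothetical and not part of our codebase. It reproduces the Python 2 ordering, where None sorts before everything else:

```python
def sort_none_first(values):
    """Sort values, placing None entries first as Python 2 did."""
    # The key is a tuple: False sorts before True, so None entries come
    # first, and real values are only ever compared with each other.
    return sorted(values, key=lambda v: (v is not None, v))


print(sort_none_first([3, None, 1, None, 2]))
```

The tuple key avoids any direct comparison between None and a real value, so the same code runs on both Python versions.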
Hex encoding/decoding
In Python 2 we can hex-encode a string with <str_to_be_encoded>.encode('hex'). This does not work in Python 3.
# ENCODING
# Python 2
>>> "msg_to_be_encoded".encode('hex')
'6d73675f746f5f62655f656e636f646564'

# Python 3
>>> "msg_to_be_encoded".encode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: 'hex' is not a text encoding; use codecs.encode() to handle arbitrary codecs

# DECODING
# Python 2
>>> "6d73675f746f5f62655f656e636f646564".decode('hex')
'msg_to_be_encoded'

# Python 3
>>> b"6d73675f746f5f62655f656e636f646564".decode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: 'hex' is not a text encoding; use codecs.decode() to handle arbitrary codecs

# Also compare the exception types: Python 3 raises the LookupError shown above,
# while Python 2 raises TypeError on malformed (odd-length) input.
# Python 2
>>> "6d73675f746f5f62655f656e636f64656".decode('hex')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/hex_codec.py", line 42, in hex_decode
    output = binascii.a2b_hex(input)
TypeError: Odd-length string
Compatible Solution:
We should use codecs in both Python 2 and Python 3. In Python 3, both the input and the output are of type bytes.
# Python 2
>>> import codecs
>>> message = "msg_to_be_encoded"
>>> codecs.encode(message.encode(), 'hex')
'6d73675f746f5f62655f656e636f646564'

# Python 3
>>> import codecs
>>> message = "msg_to_be_encoded"
>>> codecs.encode(message.encode(), 'hex')
b'6d73675f746f5f62655f656e636f646564'
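Decoding follows the same pattern. Below is a minimal sketch of a hex round trip using only the standard codecs module. The helper names hex_encode and hex_decode are hypothetical, and the sketch assumes the payload is UTF-8 text:

```python
import codecs


def hex_encode(text):
    """Return the hex representation of text as a text string."""
    encoded = codecs.encode(text.encode("utf-8"), "hex")  # bytes in, bytes out
    return encoded.decode("ascii")


def hex_decode(hex_text):
    """Reverse hex_encode and return the original text."""
    raw = codecs.decode(hex_text.encode("ascii"), "hex")  # bytes in, bytes out
    return raw.decode("utf-8")


encoded = hex_encode("msg_to_be_encoded")
print(encoded)
print(hex_decode(encoded))
```

Converting to and from bytes at the edges keeps callers working with text only, which hides the bytes-in, bytes-out contract of the hex codec.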
String uppercase:
string.uppercase does not exist in Python 3.
# Python 2
>>> import string
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Python 3
>>> import string
>>> string.uppercase
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'string' has no attribute 'uppercase'
Compatible Solution:
Use ascii_uppercase.
# Python 2
>>> import string
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Python 3
>>> import string
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
hasattr():
hasattr() presents a huge edge case when writing Python 2 and 3 compatible code. hasattr() checks for the existence of an attribute by trying to retrieve it.
Python 2
hasattr(object, name)
The arguments are an object and a string. The result is True if the string is the name of one of the object’s attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an exception or not.)
Python 3
hasattr(object, name)
The arguments are an object and a string. The result is True if the string is the name of one of the object’s attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.)
In other words, Python 2 swallows any exception raised by the lookup, while Python 3 only swallows AttributeError and lets everything else propagate.
Please find more details here: https://medium.com/@k.wahome/python-2-vs-3-hasattr-behaviour-f1bed48b068
Sample code
class Foo(dict):
    def __init__(self):
        super(Foo, self).__init__()
        self.example_dict = {}

    def __getitem__(self, key):
        try:
            return super(Foo, self).__getitem__(key)
        except KeyError:
            return self.example_dict[key]

    def __getattr__(self, key):
        return self[key]


foo = Foo()
if hasattr(foo, "not_present_key"):
    pass
else:
    print("Not Found")
Save the above snippet as hasattr_test.py and run it:
# Python 2
mac@192 ~ % python hasattr_test.py
Not Found

# Python 3
mac@192 ~ % python3.7 hasattr_test.py
Traceback (most recent call last):
  File "hasattr_test.py", line 8, in __getitem__
    return super(Foo, self).__getitem__(key)
KeyError: 'not_present_key'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hasattr_test.py", line 17, in <module>
    if hasattr(foo, "not_present_key"):
  File "hasattr_test.py", line 13, in __getattr__
    return self[key]
  File "hasattr_test.py", line 10, in __getitem__
    return self.example_dict[key]
KeyError: 'not_present_key'
Compatible Solution:
To make the code compatible with both Python 2 and Python 3, we have to change the __getattr__ function so that a missing key is reported as an AttributeError:
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)
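Putting the pieces together, here is a self-contained sketch of the fixed class. It reuses the Foo sample from above; the key present_key and the getattr fallback are only there for illustration:

```python
class Foo(dict):
    """Dict whose keys can also be read as attributes."""

    def __init__(self):
        super(Foo, self).__init__()
        self.example_dict = {}

    def __getitem__(self, key):
        try:
            return super(Foo, self).__getitem__(key)
        except KeyError:
            return self.example_dict[key]

    def __getattr__(self, key):
        # Translate the KeyError so hasattr()/getattr() see the
        # exception type they expect on both Python versions.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


foo = Foo()
foo["present_key"] = 1
found = hasattr(foo, "present_key")
missing = hasattr(foo, "not_present_key")
fallback = getattr(foo, "not_present_key", "default")
print(found, missing, fallback)
```

The same fix also makes getattr() with a default value work, since it relies on AttributeError in the same way.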
Dict becomes ordered in Python 3:
From Python 3.6 onwards, dictionaries preserve insertion order by default (an implementation detail of CPython 3.6, guaranteed by the language from Python 3.7).
# Python 2
>>> sample_dict = {}
>>> sample_dict["a"] = 1
>>> sample_dict["b"] = 2
>>> sample_dict["c"] = 3
>>> sample_dict["d"] = 4
>>> sample_dict
{'a': 1, 'c': 3, 'b': 2, 'd': 4}

# Python 3
>>> sample_dict = {}
>>> sample_dict["a"] = 1
>>> sample_dict["b"] = 2
>>> sample_dict["c"] = 3
>>> sample_dict["d"] = 4
>>> sample_dict
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
Compatible code:
Ideally, this should not break application code, because the change is from an unordered to an ordered dict. If we still need the same result in both Python versions (order matters, or test cases fail), we have to use OrderedDict to keep the output the same in both.
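As an illustration, here is a small sketch of two ways to get stable output on both versions: OrderedDict when insertion order matters, and explicit key sorting when it does not. The sample data is made up for the example:

```python
import json
from collections import OrderedDict

# Insertion order is preserved on both Python 2.7 and Python 3.
sample_dict = OrderedDict()
sample_dict["a"] = 1
sample_dict["b"] = 2
sample_dict["c"] = 3
sample_dict["d"] = 4
keys = list(sample_dict.keys())

# When only a deterministic order is needed (e.g. in test assertions),
# sorting the keys at serialization time works with a plain dict too.
serialized = json.dumps({"b": 2, "a": 1}, sort_keys=True)

print(keys)
print(serialized)
```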
Hashing:
In Python 2 the input can be of type unicode or str, but Python 3 requires bytes.
# Python 2
>>> import hashlib
>>> message = "healthify"
>>> hashlib.sha512(message).hexdigest().lower()
'f910c1fa68087a546512ac3b175c99ee7eba21360fa4e579c2aed649c7e4a43466c56bceedcd60d783bc6e7d069a16f0b9c67140d6c129d2a1898af8cfb62719'

# Python 3
>>> import hashlib
>>> message = "healthify"
>>> hashlib.sha512(message).hexdigest().lower()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
Compatible Solution:
Python 3 needs bytes as input, whereas Python 2 works with both unicode and str. Encoding the message first works in both.
# Python 2
>>> import hashlib
>>> message = "healthify"
>>> hashlib.sha512(message.encode('utf-8')).hexdigest().lower()
'f910c1fa68087a546512ac3b175c99ee7eba21360fa4e579c2aed649c7e4a43466c56bceedcd60d783bc6e7d069a16f0b9c67140d6c129d2a1898af8cfb62719'

# Python 3
>>> import hashlib
>>> message = "healthify"
>>> hashlib.sha512(message.encode('utf-8')).hexdigest().lower()
'f910c1fa68087a546512ac3b175c99ee7eba21360fa4e579c2aed649c7e4a43466c56bceedcd60d783bc6e7d069a16f0b9c67140d6c129d2a1898af8cfb62719'
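When the same call site may receive either text or bytes, the encoding step can live in one place. A minimal sketch, assuming UTF-8 is the right encoding for text input; the helper name sha512_hex is hypothetical:

```python
import hashlib


def sha512_hex(message):
    """Return the SHA-512 hex digest of a text or bytes message."""
    # On Python 2, bytes is an alias of str, so str passes through
    # unchanged and only unicode objects get encoded.
    if not isinstance(message, bytes):
        message = message.encode("utf-8")
    return hashlib.sha512(message).hexdigest()


print(sha512_hex("healthify") == sha512_hex(b"healthify"))
```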
__div__ operator overloading:
In Python 3, the __div__ method is no longer used: the / operator calls __truediv__ instead.
class item:
    fats = 0.0

    def __div__(self, other):
        self.fats = self.fats / other


it = item()
it.fats = 34.0
it / 3
print(it.fats)

# Python 2 output
11.3333333333

# Python 3 output
Traceback (most recent call last):
  File "div_overloading.py", line 10, in <module>
    it / 3
TypeError: unsupported operand type(s) for /: 'item' and 'int'
Compatible Solution:
In Python 3.x, we have to overload the __truediv__ operator, not the __div__ operator. To make the code compatible, we have to keep both methods, like:
class item:
    fats = 0.0

    def __div__(self, other):
        self.fats = self.fats / other

    def __truediv__(self, other):
        self.fats = self.fats / other


it = item()
it.fats = 34.0
it / 3
print(it.fats)

# Python 2 output
11.3333333333

# Python 3 output
11.333333333333334
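To avoid maintaining two identical method bodies, one can alias __div__ to __truediv__. The sketch below is a variation on the class above, not our production code. It assumes a class named Item and, unlike the original, returns a new object instead of mutating in place:

```python
class Item(object):
    """Food item with a fat content that supports division."""

    def __init__(self, fats=0.0):
        self.fats = fats

    def __truediv__(self, other):
        # Used by "/" on Python 3, and on Python 2 when
        # "from __future__ import division" is active.
        return Item(self.fats / other)

    # Python 2 without the __future__ import looks up __div__ instead.
    __div__ = __truediv__


half = Item(34.0) / 2
print(half.fats)
```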
Base64 encoding:
We do base64 encoding using base64.b64encode(<string_to_be_encoded>). In Python 2 we can pass unicode or str as input, but Python 3 needs bytes as input.
# Python 2
>>> from base64 import b64encode
>>> b64encode("man")
'bWFu'
>>> b64encode(u"man")
'bWFu'

# Python 3
>>> from base64 import b64encode
>>> b64encode("man")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
Compatible Solution:
We can have our own method for base64 encoding that accepts both str and bytes as input.
import base64

import six


def base64ify(bytes_or_str):
    if six.PY3 and isinstance(bytes_or_str, str):
        input_bytes = bytes_or_str.encode('utf8')
    else:
        input_bytes = bytes_or_str
    try:
        output_bytes = base64.b64encode(input_bytes)
    except (UnicodeEncodeError, TypeError):
        # This happens when the input message has
        # non-ascii encodable characters in a unicode string
        # `'`(ascii encodable) vs `’`(non-ascii encodable)
        # In this case, we first need to encode it to utf-8
        # and then do the base64 encoding
        output_bytes = base64.b64encode(input_bytes.encode('utf-8'))
    if six.PY3:
        return output_bytes.decode('ascii')
    else:
        return output_bytes
Built-in round function:
Python 2: Rounding is done away from zero (so, for example, round(0.5) is 1.0 and round(-0.5) is -1.0).
Python 3: Rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2).
# Python 2
>>> round(15.5)
16.0
>>> round(16.5)
17.0

# Python 3
>>> round(15.5)
16
>>> round(16.5)
16
Compatible Solution:
We created our own round method that behaves like the Python 2 round in Python 3 as well.
import math


def py2_round(x, d=0):
    """Round same as PY2 in PY3."""
    p = 10 ** d
    if x >= 0:
        return float(math.floor((x * p) + 0.5)) / p
    else:
        return float(math.ceil((x * p) - 0.5)) / p
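An alternative we did not use, shown here only as a sketch, is to do the rounding with the standard decimal module and its ROUND_HALF_UP mode, which sends ties away from zero. The helper name round_half_away is hypothetical. Note one assumption: it rounds the shortest decimal representation of the float, so for values that are not exactly representable in binary (such as 2.675) it can differ from what Python 2's round returned.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(x, ndigits=0):
    """Round x to ndigits decimals, with ties going away from zero."""
    # Decimal("1e-2") has exponent -2, so quantize() keeps two decimals.
    quantum = Decimal("1e{}".format(-ndigits))
    rounded = Decimal(repr(x)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)


for value in (0.5, -0.5, 15.5, 16.5):
    print(value, round_half_away(value))
```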
struct.pack input type:
In Python 2 the input type is str; in Python 3 it should be bytes.
# Python 2
>>> import struct
>>> string = "blab"
>>> s = struct.Struct(b'4s')
>>> packed_data = s.pack(string)

# Python 3
>>> import struct
>>> string = "blab"
>>> s = struct.Struct(b'4s')
>>> packed_data = s.pack(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a bytes object
Compatible Solution:
Encode the input to bytes before packing it.
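A minimal sketch of that fix, assuming the text is UTF-8 and fits in the 4-byte field used in the example above:

```python
import struct

packer = struct.Struct("4s")
text = u"blab"

# Encode text to bytes before packing; this works on Python 2 and 3.
packed_data = packer.pack(text.encode("utf-8"))

# unpack() returns a tuple of bytes objects, so decode to get text back.
(raw,) = packer.unpack(packed_data)
unpacked_text = raw.decode("utf-8")
print(unpacked_text)
```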
List comprehension variable scope changes:
In Python 3, a list comprehension has its own scope, so its loop variable no longer leaks into the enclosing function and cannot be accessed there afterwards. This was not the case in Python 2.
# Python 2
>>> def two_or_three():
...     x = 3
...     [0 for x in range(3)]
...     return x
...
>>> two_or_three()
2
>>> def two_or_three_with_method():
...     def print_number():
...         print(x)
...     [0 for x in range(3)]
...     print_number()
...
>>> two_or_three_with_method()
2

# Python 3
>>> def two_or_three():
...     x = 3
...     [0 for x in range(3)]
...     return x
...
>>> two_or_three()
3
>>> def two_or_three_with_method():
...     def print_number():
...         print(x)
...     [0 for x in range(3)]
...     print_number()
...
>>> two_or_three_with_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in two_or_three_with_method
  File "<stdin>", line 3, in print_number
NameError: name 'x' is not defined
Compatible Solution:
We should avoid relying on the leaked variable. For the second method (two_or_three_with_method) we have to pass the value of x as an argument.
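One way to write the second function so that it behaves the same on both versions is sketched below. The inner helper format_number and the returned string are illustrative choices, not taken from our code:

```python
def two_or_three_with_method():
    def format_number(x):
        # x is an explicit parameter, so nothing depends on a
        # variable leaking out of the comprehension.
        return "x = {}".format(x)

    values = [x for x in range(3)]
    # Pick the last value explicitly instead of relying on the leak.
    return format_number(values[-1])


print(two_or_three_with_method())
```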
math.floor and math.ceil return data type changed:
In Python 2, floor and ceil return a float, but in Python 3 they return an int.
# Python 2
>>> from math import floor, ceil
>>> floor(4.345)
4.0
>>> ceil(4.345)
5.0

# Python 3
>>> from math import floor, ceil
>>> floor(4.345)
4
>>> ceil(4.345)
5
Compatible Solution:
We can convert the output to a float in Python 3, for example float(floor(4.345)). This does not affect Python 2.
Unpickling a Python 2 pickled object in Python 3:
As mentioned, we were making the code compatible with both Python versions. We faced an issue where an object pickled in Python 2 could not be unpickled in Python 3. This can also happen with pickled objects cached in Redis.
By default, pickle.load() tries to decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3
Compatible Solution:
We can use the method below to unpickle objects.
import pickle

import six


def unpickling_py2_to_py3(pickled_value):
    """Unpickling python 2 pickled in to python 3."""
    if six.PY3:
        try:
            value = pickle.loads(pickled_value)
        except UnicodeDecodeError:
            # Python 2 str data with non-ASCII bytes: retry with latin1,
            # which can decode any byte value.
            value = pickle.loads(pickled_value, encoding="latin1")
    else:
        value = pickle.loads(pickled_value)
    return value
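To see the failure without a Python 2 interpreter at hand, the sketch below uses a hand-written byte string that mimics what Python 2 produces for pickle.dumps('caf\xc3\xa9', 2); treat that payload as an assumption of the example. It also shows the encoding="bytes" option from the documentation quoted above, which returns the raw bytes and leaves the decoding to us. This sketch targets Python 3 only, and load_py2_pickle is a hypothetical name:

```python
import pickle

# Protocol 2 pickle of a Python 2 str holding UTF-8 encoded "café".
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_py2_pickle(data):
    """Load a pickle, falling back to raw bytes for Python 2 strings."""
    try:
        return pickle.loads(data)
    except UnicodeDecodeError:
        # The default ASCII decoding failed; keep 8-bit strings as bytes.
        return pickle.loads(data, encoding="bytes")


raw = load_py2_pickle(PY2_PICKLE)
text = raw.decode("utf-8")
print(text)
```

With encoding="latin1" the same payload loads as a str whose characters map one-to-one to the original bytes, so which option fits depends on whether the stored strings were text or binary data.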
Third-party library fixes:
Our project uses a lot of third-party packages, and we faced some edge cases while updating them. You can skip this if you are not using any of them.
1. Django:
a. Django migration files
When we ran Django makemigrations in Python 3, we saw new migration files, but the same did not happen in Python 2. There can be multiple reasons for this.
b prefix: On makemigrations, most of the new files are generated without the b prefix for string values. This is because all string literals used in your models and fields (e.g. verbose_name, related_name, etc.) must be consistently either byte strings or text (unicode) strings in both Python 2 and 3.
Compatible Solution: The easiest way to end up with a single new migration is to add from __future__ import unicode_literals to all models files. For existing migration files, either run makemigrations (that should only happen once) or remove the b prefix from the existing migration files.
Choice field: In models we use dict.items() to build choices. Since dicts became ordered in Python 3, the values returned by dict.items() come in a different order in Python 2 and Python 3.
Compatible Solution: To be compatible with both, we used sorted(dict.items()) and generated a migration file that is now the same for both Python versions.
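The choices fix can be illustrated without Django. In the sketch below, MEAL_TYPES and MEAL_TYPE_CHOICES are made-up names; in a real model the sorted list would be passed as the choices argument of a field:

```python
# Hypothetical mapping used to build the choices of a model field.
MEAL_TYPES = {
    "lunch": "Lunch",
    "breakfast": "Breakfast",
    "dinner": "Dinner",
}

# sorted() yields the same sequence of (key, label) pairs regardless of
# how the dict happens to order its items, so the generated migration
# does not differ between interpreters.
MEAL_TYPE_CHOICES = sorted(MEAL_TYPES.items())
print(MEAL_TYPE_CHOICES)
```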
b. “<model_name> object” display in admin console
In Python 3, the admin console showed <model_name> object as the field value instead of a meaningful string. This was happening because our model class only defined this method:
def __unicode__(self):
    return 'MyModel: {}'.format(self.name)
We can define a __str__ method instead, and that works for both Python 2 and Python 3, but it can fail when the string contains non-ASCII characters.
Compatible Solution: We added the @python_2_unicode_compatible decorator to the models and renamed __unicode__ to __str__.
c. Django query object slicing
Django querysets support slicing to fetch records. With Django 1.11, Python 2 accepts both int and str slice bounds. Python 3 only accepts int.
# Python 2
>>> from food import models
>>> foods = models.Foods.objects.all()
>>> foods[1:2]  # int slicing
<QuerySet [<Foods: Cashewnut Cutlets >]>
>>> foods["1":"2"]  # string slicing
<QuerySet [<Foods: Cashewnut Cutlets >]>

# Python 3
In [2]: from food import models
In [3]: foods = models.Foods.objects.all()
In [4]: foods["1":"2"]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-be74a70cdff4> in <module>
----> 1 foods["1":"2"]

~/Venvs/py3_venv/lib/python3.7/site-packages/django/db/models/query.py in __getitem__(self, k)
    265             raise TypeError
    266         assert ((not isinstance(k, slice) and (k >= 0)) or
--> 267                 (isinstance(k, slice) and (k.start is None or k.start >= 0) and
    268                  (k.stop is None or k.stop >= 0))), \
    269             "Negative indexing is not supported."

TypeError: '>=' not supported between instances of 'str' and 'int'

In [5]: foods[1:2]
Out[5]: <QuerySet [<Foods: Foods object>]>
Compatible Solution: Avoid string slicing, which is not a good approach anyway.
2. Redis: redis-py is a commonly used Python package. redis-py 3.0 introduces many new features but required several backwards-incompatible changes to be made in the process.
Compatible Solution:
The project page at https://pypi.org/project/redis/ lists the changes and how to write compatible code. We wrote our own Redis helper methods that are compatible with both redis-py versions, like:
import six


def redis_zadd(redis_connection, key, **values):
    """Redis method zadd for python 2 and python 3 compatibility."""
    if six.PY3:
        redis_connection.zadd(key, values)
    else:
        redis_connection.zadd(key, **values)


def redis_zincrby(redis_connection, key, value, score):
    """Redis method zincrby for python 2 and python 3 compatibility."""
    if six.PY3:
        redis_connection.zincrby(key, score, value)
    else:
        redis_connection.zincrby(key, value, score)
3. django-cacheops: Cacheops is a slick app that supports automatic or manual queryset caching and automatic granular event-driven invalidation. There is a gotcha: when django-cacheops stores values in Redis, it pickles them.
Python 2 has 3 pickle protocols (0, 1, 2) and the default is 0. Python 3 has 5 protocols (0, 1, 2, 3, 4) and the default is 3 (as of Python 3.7).
pickle uses the default protocol when dumping data. So an object pickled in Python 3 cannot be unpickled in Python 2, because pickle protocol 3 is not available in Python 2.
Compatible Solution:
We can specify the protocol parameter when invoking pickle.dump.
django-cacheops doesn’t have an option to set the pickle protocol, so we used monkey patching to sort this out.
import cacheops
from cacheops.cross import pickle


@cacheops.redis.handle_connection_failure
def _custom_cacheops_redis_set(self, cache_key, data, timeout=None):
    pickled_data = pickle.dumps(data, 2)  # Protocol 2 is valid in both Python versions.
    if timeout is not None:
        self.conn.setex(cache_key, timeout, pickled_data)
    else:
        self.conn.set(cache_key, pickled_data)


cacheops.RedisCache.set = _custom_cacheops_redis_set
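The effect of pinning the protocol can be checked with the standard library alone. In this small sketch the sample data is made up; it relies on the fact that a pickle written with protocol 2 or higher starts with the PROTO opcode (0x80) followed by the protocol number:

```python
import pickle

data = {"user_id": 42, "tags": ["a", "b"]}

# Default protocol of the running interpreter (version dependent).
default_blob = pickle.dumps(data)

# Protocol 2 is the highest protocol that Python 2 can still read.
portable_blob = pickle.dumps(data, protocol=2)

# bytearray() gives integer indexing on both Python 2 and Python 3.
header = bytearray(portable_blob)[:2]
print(list(header))
print(pickle.loads(portable_blob) == data)
```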
Also, as mentioned above for unpickling a Python 2 pickled object in Python 3: when we read data in Python 3, we can hit a UnicodeDecodeError because the pickling was done in a different Python version.
This is also sorted out using patching:
import cacheops
import six
from cacheops.simple import CacheMiss

if six.PY3:
    import pickle
else:
    import cPickle as pickle


def unpickling_py2_to_py3(pickled_value):
    """Unpickling python 2 pickled in to python 3."""
    if six.PY3:
        try:
            value = pickle.loads(pickled_value)
        except UnicodeDecodeError:
            value = pickle.loads(pickled_value, encoding="latin1")
    else:
        value = pickle.loads(pickled_value)
    return value


def _custom_cacheops_redis_get(self, cache_key):
    data = self.conn.get(cache_key)
    if data is None:
        raise CacheMiss
    return unpickling_py2_to_py3(data)


cacheops.RedisCache.get = _custom_cacheops_redis_get
4. django-redis-cache: We have a method for deleting keys based on a pattern. In Python 2 we used version 1.6.5, where the key search and delete happened without SCAN. For Python 3 we updated to version 2.1, where the pattern search uses Redis SCAN, which made it very slow and was causing problems. There is a GitHub issue for this.
Compatible Solution:
We sorted out the issue by going back to the old way of deleting by pattern. Instead of calling cache.delete_pattern(pattern), we do:
pattern = cache.make_key(pattern)
keys = cache.master_client.keys(pattern)
if keys:
    cache.master_client.delete(*keys)
What’s Next
In part two of this blog, we will explore how to move to Python 3 without downtime while development continues.