My experience of making Python 2 SocketServer code compatible with Python 3
Recently I have been updating some old Python 2 code to make it compatible with Python 3.7 and 3.8. To dig out all the potential compatibility issues, I used the TDD approach: I wrote a whole bunch of unit, integration and system tests that either run the code inside the tests and run those tests under different Python versions, or put the code into a separate module and run that module in another process under a different Python version using the subprocess module.
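As a rough sketch of the second approach, the helper below runs a snippet in a separate interpreter through subprocess and collects its exit code and output. The names run_under, INTERPRETERS and SNIPPET are made up for illustration. In a real setup the list would hold whichever interpreters are installed; here it only contains the current one so the sketch runs anywhere.

```python
import subprocess
import sys


def run_under(interpreter, source):
    """Run `source` with `interpreter`; return (exit code, combined output as text)."""
    proc = subprocess.Popen(
        [interpreter, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")


# Hypothetical list: swap in the interpreters you actually have installed,
# e.g. ["python2.7", "python3.7", "python3.8"].
INTERPRETERS = [sys.executable]

# Stand-in for the code under test: it just prints the interpreter's major version.
SNIPPET = "import sys; print(sys.version_info[0])"

results = {exe: run_under(exe, SNIPPET) for exe in INTERPRETERS}
for exe, (code, output) in results.items():
    print(exe, code, output.strip())
```

A non-zero exit code from any interpreter in the list points at a compatibility problem in the snippet that was run.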
One of the repos I worked on has a class that subclasses socketserver.TCPServer (SocketServer.TCPServer in Python 2). Although it didn’t take long to discover all the surprises thanks to the effectiveness of TDD, a lot of the bugs still gave me some “WTF that doesn’t work in Python 3??!!” moments. Some of them are socketserver specific and some are common pitfalls that anyone porting Python 2 code to Python 3 could encounter. Here are the ones I feel are worth sharing:
socketserver vs. SocketServer
It turns out the module is called SocketServer in Python 2 and socketserver in Python 3. This was an easy catch, since every test failed with an ImportError when run under Python 3. At first I did what a newbie would do:
try:
    import SocketServer as socketserver  # Python 2 name
except ImportError:
    import socketserver  # Python 3 name
A more experienced co-worker pointed out I could use the six module for most of these compatibility issues. Using six I can simply do:
from six.moves import socketserver
Then replace every SocketServer with socketserver, and the problem is solved. This is the issue that first introduced me to the six module, and I have been using it to fix compatibility issues ever since.
Class initialization
The code I was working on is a subclass of socketserver.TCPServer, let’s call it MyServer for convenience. The MyServer class looks like:
class MyServer(SocketServer.TCPServer):
    def __init__(self, *args, **kwargs):
        ...
        SocketServer.TCPServer.__init__(self, *args, **kwargs)
It initializes its parent class SocketServer.TCPServer at the end of its __init__ method. The reason it didn’t call super(MyServer, self).__init__(*args, **kwargs) is that SocketServer.TCPServer is a subclass of SocketServer.BaseServer, which the Python 2.7 SocketServer source code defines as:
class BaseServer:
    ...
Notice it doesn’t inherit from object. That makes it an old-style Python class, which makes SocketServer.TCPServer one as well. super() only works with new-style classes, so we have to call the parent class’s __init__ method explicitly, which is not the ideal way to do it. In the Python 3 socketserver source code the socketserver.BaseServer class definition didn’t change, but every class in Python 3 is a new-style class, so super() works there. To make the code do the right thing in Python 3 and fall back to the old way in Python 2, I had to do something like this:
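class MyServer(socketserver.TCPServer):
    def __init__(self, *args, **kwargs):
        ...
        # issubclass() is False for old-style classes; isinstance() would
        # always be True because every class object is an instance of object.
        if issubclass(socketserver.TCPServer, object):
            # Python 3: new-style class, super() works
            super(MyServer, self).__init__(*args, **kwargs)
        else:
            # Python 2: old-style class, call the parent initializer directly
            socketserver.TCPServer.__init__(self, *args, **kwargs)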
That way it works in both Python 2 and 3, although the Python 2 branch looks awkward.
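Here is a self-contained sketch of this pattern that can be run as-is. It tests issubclass(socketserver.TCPServer, object), which is False for old-style classes, and passes MyServer itself to super(). EchoHandler and the greeting attribute are invented for the example, and binding to port 0 simply asks the OS for any free port.

```python
try:
    import SocketServer as socketserver  # Python 2
except ImportError:
    import socketserver  # Python 3


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler; it is never invoked in this sketch."""

    def handle(self):
        self.wfile.write(self.rfile.readline())


class MyServer(socketserver.TCPServer):
    def __init__(self, *args, **kwargs):
        # Made-up attribute standing in for the real setup work.
        self.greeting = "hello"
        if issubclass(socketserver.TCPServer, object):
            # New-style class (Python 3): super() works.
            super(MyServer, self).__init__(*args, **kwargs)
        else:
            # Old-style class (Python 2): call the parent initializer directly.
            socketserver.TCPServer.__init__(self, *args, **kwargs)


# Constructing the server binds and listens; nothing is served here.
server = MyServer(("127.0.0.1", 0), EchoHandler)
host, port = server.server_address
print("listening on", host, port)
server.server_close()
```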
Sending and receiving data
This issue was caught in the custom stream handler class, which is a subclass of socketserver.StreamRequestHandler. It has a method that sends the response back to the client using self.wfile.write(). The problem was that self.wfile.write() takes a bytes argument, and other parts of the code were passing string literals into it. In Python 2, str and bytes are the same type; in fact bytes is just an alias for str, so you can pass string literals to functions that expect bytes with no issue. In Python 3, however, string literals are Unicode text (str), and you have to convert them to bytes explicitly, either with the b prefix or with the bytes constructor. Instead of doing all the encoding and decoding by hand for Python 3 strings, we can use six.b to get bytes in both Python 2 and 3 (it is designed for string literals and encodes with latin-1 on Python 3, so six.ensure_binary() is the safer choice when the response may contain other characters):
self.wfile.write(six.b(response_str))
But that’s not all. The code also receives data from clients using self.rfile.readline(), which returns bytes. That’s troublesome because the code checks the content of the data by comparing it with string literals, and those literals have different types in Python 2 and 3. In Python 3, bytes never compare equal to str, so the checks silently fail. To make sure the code can safely compare received data with string literals, we can use six.ensure_str(), which returns the native str type on both versions, so the code can process received data like it used to:
six.ensure_str(self.rfile.readline())
Basically, because of the type difference of string literals between Python 2 and 3, the sending side needs to convert data to bytes before writing it out, and the receiving side needs to convert the received data to whatever type the old code expects. The six module comes in very handy in these scenarios since it saves you from writing the version-specific conversion code yourself.
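The round trip can be sketched with the standard library alone. In the example below, to_bytes and to_native_str are hypothetical stand-ins that play the role of the six helpers, so the sketch runs without six installed. PingHandler and the ping/pong protocol are also invented for illustration.

```python
import socket
import threading

try:
    import SocketServer as socketserver  # Python 2
except ImportError:
    import socketserver  # Python 3


def to_bytes(text):
    """Hypothetical helper: return bytes, encoding text as UTF-8 if needed."""
    return text if isinstance(text, bytes) else text.encode("utf-8")


def to_native_str(data):
    """Hypothetical helper: return the native str type of the running Python."""
    if isinstance(data, str):
        return data  # bytes on Python 2, text on Python 3
    if isinstance(data, bytes):
        return data.decode("utf-8")  # Python 3 bytes -> str
    return data.encode("utf-8")  # Python 2 unicode -> str


class PingHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = to_native_str(self.rfile.readline()).strip()
        # The comparison is now str against str on both major versions.
        reply = "pong" if line == "ping" else "unknown"
        self.wfile.write(to_bytes(reply + "\n"))


# Port 0 lets the OS pick a free port; the thread serves exactly one request.
server = socketserver.TCPServer(("127.0.0.1", 0), PingHandler)
worker = threading.Thread(target=server.handle_request)
worker.start()

client = socket.create_connection(server.server_address, timeout=5)
client.sendall(b"ping\n")
reader = client.makefile("rb")
answer = reader.readline()
reader.close()
client.close()
worker.join()
server.server_close()
print(answer)
```

Converting at the boundary like this keeps the rest of the handler working with the native str type, which is what the old comparison code expects.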
String type
As mentioned above, the code receives data and checks whether its format and content are correct. One of the checks verifies that the value of a key in a dictionary is a string by testing it against the basestring type, but Python 3 removed basestring. To solve that we can simply use six.string_types to check whether something is a string:
isinstance(might_be_a_string, six.string_types)
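For illustration, here is a stdlib-only sketch of the same check. The string_types tuple below mimics what six.string_types provides (basestring on Python 2, str on Python 3). has_string_value and the payload contents are made-up names for the example.

```python
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)  # Python 3: basestring no longer exists


def has_string_value(payload, key):
    """Return True if payload[key] exists and is a string."""
    return isinstance(payload.get(key), string_types)


print(has_string_value({"name": "alice"}, "name"))
print(has_string_value({"name": 42}, "name"))
print(has_string_value({}, "name"))
```

One thing to keep in mind: on Python 3 a bytes value does not pass this check, so data read from the socket has to be decoded before it is validated.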
Conclusion
This is the first repo I worked on to ensure Python 2/3 compatibility. There are a lot of hacks to tackle these kinds of issues, but Python developers have already thought of most of them and have developed various tools to make our lives easier; six is one of the most common ones. Here are the lessons I learned:
- six is your best friend when tackling compatibility issues. Check whether six already has a utility that solves your problem before implementing your own.
- The TDD approach eased my mind a lot when working on this repo. The repo didn’t have a lot of tests (only some system tests), and the code coverage of the existing tests was low. Before jumping into fixing everything, I added the missing unit and integration tests and captured all the compatibility issues by simply running those tests under different Python versions.
- If your code involves a client or server that it communicates with over the network, add tests that run your code and the client/server code under different Python versions. You never know what could break when the two sides are running different Python versions.