My experience of making Python 2 SocketServer code compatible with Python 3

Published in The Startup · 5 min read · Jun 20, 2020

Recently I have been changing some old Python 2 code to make it compatible with Python 3.7 and 3.8. To dig out all the potential compatibility issues, I used the TDD approach: write a whole bunch of unit, integration and system tests that either run the code within the tests, with the tests themselves run under different Python versions, or put the code into a separate module and run that module in another process under a different Python version using the subprocess module.
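The second variant can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the helper name run_with_interpreter and the snippet being executed are made up for the example, and it uses sys.executable as a stand-in for whichever interpreters (for instance a python2.7 or python3.8 binary on your PATH) you actually want to cover.

```python
import subprocess
import sys


def run_with_interpreter(interpreter, source):
    """Run `source` with the given Python interpreter in a child process.

    Returns (exit code, decoded stdout, decoded stderr).
    Popen/communicate is used because it behaves the same on Python 2 and 3.
    """
    proc = subprocess.Popen(
        [interpreter, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return proc.returncode, out.decode("utf-8"), err.decode("utf-8")


# Hypothetical snippet standing in for "the module under test".
SNIPPET = "import sys; print(sys.version_info[0])"

# sys.executable is only a placeholder; a real test would loop over
# the interpreters it cares about.
exit_code, stdout, stderr = run_with_interpreter(sys.executable, SNIPPET)
```

A test can then assert on the exit code and output for each interpreter, so the same check fails loudly on whichever version breaks.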

One of the repos I worked on has a class which is a subclass of the Python socketserver.TCPServer class (or SocketServer.TCPServer in Python 2). Although it didn’t take long to discover all the surprises thanks to the effectiveness of TDD, a lot of the bugs still gave me some “WTF that doesn’t work in Python 3??!!” moments. Some of them are socketserver specific and some are common pitfalls that anyone changing their Python 2 code to work with Python 3 could encounter. Here are the ones I feel are worth sharing:

socketserver vs. SocketServer

It turns out the module itself is called SocketServer in Python 2 and socketserver in Python 3. This was an easy catch since all my tests broke with an import error when running in Python 3. At first I did what a newbie would do:

try:
    import SocketServer as socketserver  # Python 2 module name
except ImportError:
    import socketserver  # Python 3 module name

A more experienced co-worker pointed out I could use the six module for most of these compatibility issues. Using six I can simply do:

from six.moves import socketserver

Then replace all SocketServer with socketserver, problem solved. This is the issue that first introduced the six module to me, and I have been using it to fix compatibility issues ever since.

Class initialization

The code I was working on is a subclass of socketserver.TCPServer; let’s call it MyServer for convenience. In the original Python 2 code, the MyServer class looks like:

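import SocketServer  # Python 2 module name

class MyServer(SocketServer.TCPServer):
    def __init__(self, *args, **kwargs):
        ...
        # Old-style class in Python 2, so call the parent initializer explicitly
        SocketServer.TCPServer.__init__(self, *args, **kwargs)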

It initializes its parent class SocketServer.TCPServer at the end of its __init__ method. The reason it doesn’t call super(MyServer, self).__init__(*args, **kwargs) is that SocketServer.TCPServer is a subclass of SocketServer.BaseServer, which the Python 2.7 SocketServer source code defines as:

class BaseServer:
    ......

Notice it doesn’t inherit from object. That means it’s an old-style Python class, which makes SocketServer.TCPServer one as well. super() only works with new-style classes, so we have to call the __init__ method of the parent class explicitly to initialize it, which is not the ideal way to do it. In the Python 3 socketserver source code the socketserver.BaseServer class definition didn’t change, but every class in Python 3 implicitly inherits from object, which automatically makes it a new-style class. In order to make the code use super() in Python 3 and fall back to the old way in Python 2, I did something like this:

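from six.moves import socketserver

class MyServer(socketserver.TCPServer):
    def __init__(self, *args, **kwargs):
        ...
        # issubclass, not isinstance: in Python 2 every value, including an
        # old-style class, is an instance of object, so isinstance is always True.
        if issubclass(socketserver.TCPServer, object):
            # Python 3: new-style class, super() works.
            # Pass MyServer (not TCPServer) so TCPServer.__init__ is not skipped.
            super(MyServer, self).__init__(*args, **kwargs)
        else:
            # Python 2: old-style class, call the parent initializer explicitly
            socketserver.TCPServer.__init__(self, *args, **kwargs)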

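That way it works in both Python 2 and 3, although the Python 2 part looks awkward. It is worth noting that the explicit socketserver.TCPServer.__init__(self, ...) call also works unchanged in Python 3, so the branch is a matter of style rather than a strict necessity.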
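A quick way to check that the parent initializer really ran is to construct the server without binding it. The following is only a sketch using the standard library: the greeting attribute is a hypothetical stand-in for whatever MyServer sets up itself, and the try/except import replaces six.moves so the example has no third-party dependency.

```python
try:
    import socketserver  # Python 3 module name
except ImportError:
    import SocketServer as socketserver  # Python 2 module name


class MyServer(socketserver.TCPServer):
    def __init__(self, *args, **kwargs):
        # Hypothetical subclass-specific setup.
        self.greeting = "hello"
        if issubclass(socketserver.TCPServer, object):
            # New-style class (Python 3): super() is available.
            super(MyServer, self).__init__(*args, **kwargs)
        else:
            # Old-style class (Python 2): explicit parent call.
            socketserver.TCPServer.__init__(self, *args, **kwargs)


# bind_and_activate=False creates the socket but does not bind or listen,
# so no port is actually taken by this check.
server = MyServer(
    ("127.0.0.1", 0),
    socketserver.BaseRequestHandler,
    bind_and_activate=False,
)
handler_class = server.RequestHandlerClass  # set by the parent initializer
server.server_close()
```

If the parent initializer had been skipped, server.RequestHandlerClass would not exist and the attribute access would raise AttributeError.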

Sending and receiving data

This issue was caught in the custom stream handler class, which is a subclass of socketserver.StreamRequestHandler. It has a method that sends a response back to the client using self.wfile.write(). The problem was that self.wfile.write() actually takes a bytes argument, and other code was passing string literals into it. In Python 2 str and bytes are the same type. In fact bytes is just an alias for str, so you could pass string literals to functions that expect bytes with no issue. But in Python 3 string literals are Unicode, and you have to explicitly convert them to bytes by using either the b prefix or an explicit encode. Instead of hand-writing the encoding and decoding for Python 3 Unicode strings, we can use six.b, which returns a bytes string in both Python 2 and 3 (it is designed for string literals and encodes as Latin-1 on Python 3, so six.ensure_binary() is the safer choice for arbitrary text):

self.wfile.write(six.b(response_str))

But that’s not all of it. The code also receives data from clients using self.rfile.readline(), which returns bytes. That’s troublesome for the code because it’s checking the content of the data by comparing it with some string literals, which are of different types in Python 2 and 3. To make sure the code could safely compare received data with string literals, we can use six.ensure_str() to make sure the code can process received data like it used to:

six.ensure_str(self.rfile.readline())

Basically, because of the type difference of string literals between Python 2 and 3, the sending side needs to convert data to bytes before writing it out, and the receiving side needs to convert the received data to whatever type the old code is expecting. The six module comes in very handy in these scenarios since it saves you from writing the conversions by hand.
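To make the bytes-in, bytes-out boundary concrete, here is a small self-contained sketch. It deliberately swaps six.b and six.ensure_str for explicit encode/decode calls so that it runs with the standard library alone; the EchoHandler class, the ask helper and the ping/pong protocol are all invented for the example, and UTF-8 is an assumed wire encoding.

```python
import socket

try:
    import socketserver  # Python 3 module name
except ImportError:
    import SocketServer as socketserver  # Python 2 module name


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: decode on the way in, encode on the way out."""

    def handle(self):
        raw_line = self.rfile.readline()            # always bytes
        command = raw_line.decode("utf-8").strip()  # text, safe to compare
        if command == "ping":
            response = "pong\n"
        else:
            response = "unknown\n"
        self.wfile.write(response.encode("utf-8"))  # wfile needs bytes


def ask(command):
    """Send one command through a connected socket pair and return the reply."""
    client, server_side = socket.socketpair()
    try:
        client.sendall(command.encode("utf-8") + b"\n")
        # Instantiating the handler runs setup(), handle() and finish().
        # No server object is needed for this handler, so None is passed.
        EchoHandler(server_side, ("socketpair", 0), None)
        return client.recv(1024).decode("utf-8")
    finally:
        client.close()
        server_side.close()


reply = ask("ping")
```

Keeping all comparisons on decoded text and all writes on encoded bytes means the handler logic itself never has to care which Python version it runs on.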

String type

As mentioned above, the code receives data and checks whether the format and content of the data are correct. One of the checks is to see if the value of a key in a dictionary is a string by testing it against the basestring type, but Python 3 removed basestring. To solve that we can simply use six.string_types to check if something is a string or not:

isinstance(might_be_a_string, six.string_types)
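For readers without six at hand, the same check can be approximated with the standard library. This is a sketch of the idea rather than six’s actual implementation, and the has_string_value helper and the sample message are hypothetical.

```python
try:
    string_types = (basestring,)  # Python 2: covers str and unicode
except NameError:
    string_types = (str,)  # Python 3: basestring no longer exists


def has_string_value(payload, key):
    """Return True if payload[key] is present and is a text string."""
    return isinstance(payload.get(key), string_types)


# Hypothetical decoded message, as a validator might receive it.
message = {"command": "ping", "retries": 3}

command_ok = has_string_value(message, "command")
retries_ok = has_string_value(message, "retries")
missing_ok = has_string_value(message, "timeout")
```

Here only the command field passes the check; the integer field and the missing key are both rejected.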

Conclusion

This is the first repo I worked on to ensure Python 2/3 compatibility. There are a lot of hacks to tackle these kinds of issues, but Python developers have thought of most of them and have developed various tools to make our lives easier. six is one of the most common ones. Here are the lessons I learned:

six is your best friend when tackling compatibility issues. Check whether six already has a utility that solves your problem before implementing your own.
The TDD approach eased my mind a lot when working on this repo. The repo didn’t have a lot of tests (only some system tests), and the code coverage of the existing tests was low. Before jumping into fixing everything, I added all the missing unit and integration tests and captured all the compatibility issues by simply running those tests in different Python versions.
If your code is a client or a server that communicates with its counterpart over the network, add tests that run your code and the counterpart under different Python versions. You never know what could break when the two sides are running in different Python versions.