An exercise in refactoring

Patryk Zawadzki
Blog | Mirumee
Published in
3 min readOct 20, 2016

Consider this (unfortunately) common pattern in Python programming:

class Extractor:
def __init__(self, source):
self.source = source
def extract(self):
results = []
for token in self.source.split():
results.append(token.upper())
self.results = results
data_source = 'unu du\ntri\tkvar'
ex = Extractor(data_source)
ex.extract()
print(ex.results)

Please excuse the above example for it is rather silly.

The general pattern is that you have a class that you instantiate with all of the necessary arguments, then call its “do the actual work” method and finally access the result using an attribute.

If you have a minute to spare, stick around and follow me as I make a few observations about the above code.

There seems to be a whole lot of boilerplate involved in using the class. Each time I want to use the code, I need to do it in three steps. The code is mostly copy-and-paste save for the arguments passed to __init__().

But is there really a reason to assign the result to an attribute? What if I made the method return the data instead? This will save me some work each time I use the class:

class Extractor:
def __init__(self, source):
self.source = source
def extract(self):
results = []
for token in self.source.split():
results.append(token.upper())
return results
data_source = 'unu du\ntri\tkvar'
print(Extractor(data_source).extract())

That’s better. I’ve turned three lines per use into one. That’s 66.6% better!

But what if I could reuse the instance and only pass the arguments when I’m ready to do the work? The code is synchronous so it should not matter much so let’s create the instance first and then just call its extract() method:

class Extractor:
def extract(self, source):
results = []
for token in source.split():
results.append(token.upper())
return results
extractor = Extractor()
data_source = 'unu du\ntri\tkvar'
print(extractor.extract(data_source))

Much better. But is it perfect? I’ve traded unreadable code for this weird global variable.

Now that I think about it I only create the instance so I can call its method. That means the method is… static, right?

class Extractor:
@staticmethod
def extract(source):
results = []
for token in source.split():
results.append(token.upper())
return results
data_source = 'unu du\ntri\tkvar'
print(Extractor.extract(data_source))

The instance is no more! Let’s call it a day.

Wait a minute. Now that instance is gone and the only method is static, why do I need to keep the class around?

def extract(source):
results = []
for token in source.split():
results.append(token.upper())
return results
data_source = 'unu du\ntri\tkvar'
print(extract(data_source))

Now that’s more like it. I would be happy to leave this code as is but I can do even better.

The function still makes the call on which structure should be used to store the result. Or whether they should be stored at all. What if there were millions of results and I only needed the first three? What if all results could not fit into my computer’s memory?

I can mitigate all that by turning my function into a generator:

def extract(source):
for token in source.split():
yield token.upper()
data_source = 'unu du\ntri\tkvar'
print(list(extract(data_source)))

Now it’s my job to turn the results into a list. Or only fetch the first three. Oh, the possibilities! We’re also down from 14 lines to 6 lines and the code is much easier to read.

--

--