Writing Cleaner Code: Part 1 @ RC

Monique Tuin
Jul 27, 2017 · 5 min read

I’m spending six weeks this summer at The Recurse Center, a self-directed, educational retreat for programmers in New York City. RC’s philosophy is influenced by the educational movement of unschooling, and the goal is to become a dramatically better programmer.

One of my goals at RC is to fill in some of the fundamental computer science knowledge I’m missing as a self-taught programmer. To start, I worked through UC Berkeley’s CS61A: The Structure and Interpretation of Computer Programs, which is a great introduction to programming abstractions in Python.

Mapping Twitter Trends: Filter, Lambda and List Comprehensions

I worked through the project Twitter Trends, which develops a geographic visualization of Twitter data from across the USA. My goal was to apply concepts from the course, and to shift from my usual aim of getting to a working product toward writing more elegant code.

As an example, let’s consider one function, extract_words, which takes a tweet string as input, and outputs a list of words in that tweet. Words should be substrings of the tweet that contain only ASCII letters (i.e. no punctuation or special characters). For example,

>>> extract_words("paperclips! they're so awesome, cool, & useful!")
['paperclips', 'they', 're', 'so', 'awesome', 'cool', 'useful']

Easy enough, right? My first attempt at this function involved a slightly complicated mess of a for loop and nested if statements, iterating through each character and deciding if it should be added to a word or not.

My first attempt at a function for extract_words()
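The original first attempt isn’t reproduced here, so what follows is only a hypothetical reconstruction of the character-by-character approach described above (a for loop with nested if statements). The variable names words and current are my own placeholders, not taken from the original code.

```python
from string import ascii_letters


def extract_words(text):
    """Hypothetical sketch of a loop-based extract_words."""
    words = []    # finished words
    current = ''  # the word being built up, one character at a time
    for char in text:
        if char in ascii_letters:
            current += char
        else:
            # A non-letter ends the current word, if there is one.
            if current != '':
                words.append(current)
                current = ''
    # Don't forget a word that runs to the end of the string.
    if current != '':
        words.append(current)
    return words
```

It does the job, but the bookkeeping around current is exactly the kind of noise the rest of this section tries to remove.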

It worked, but it’s a mess, so let’s try to clean it up. Firstly, Python’s str.split method returns a list of all words in a string, splitting on whitespace characters by default. With regular expressions, we can use re.split to split instead on every character that is not a letter. A useful regex is [^a], which matches any character except a. I therefore use the line

new_text = re.split('[^a-zA-Z]', text)

to split text by any character that is not a lower or upper case letter.

Now new_text contains a list of words in the tweet, with one catch: if two non-letters appear in a row, this causes another split, and our list can be filled with some empty words ''. Rather than using an if statement to find and remove these empty words, we can either apply a filter function, or use a list comprehension to remove the poor empty breaths. I tried both!
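To see those empty words appear, here is a small sketch that runs the same re.split call on the tail end of the example tweet. The variable name pieces is just a placeholder for this illustration.

```python
import re

# ", " and " & " are runs of non-letters, and the string ends with "!",
# so several splits happen with nothing in between.
pieces = re.split('[^a-zA-Z]', "cool, & useful!")
print(pieces)  # ['cool', '', '', '', 'useful', '']
```

Every pair of adjacent non-letters leaves one '' behind, and so does a non-letter at the very end of the string.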

Removing empty words with filter and a lambda function: map, filter and reduce are functions that operate on sequences of elements, without the need for explicit flow control or reference to individual elements, like you’d normally find in a for statement. In particular, filter tests each element in a sequence, and only the elements evaluating to True are kept. (In Python 3, filter returns an iterator, so wrap it in list() to get a new list.)

We can apply filter with a lambda function — a simple, anonymous inline function — which checks whether x is not an empty word ''. With these improvements, the extract_words function now looks like this:

Cleaned up function for extract_words()
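A minimal sketch of what this version might look like, assuming the re.split line from earlier and Python 3 (so the result of filter is wrapped in list()). This is a reconstruction from the description, not necessarily the exact original code.

```python
import re


def extract_words(text):
    # Split on every character that is not an ASCII letter.
    new_text = re.split('[^a-zA-Z]', text)
    # Keep only the elements for which the lambda returns True,
    # i.e. drop the empty strings left behind by adjacent non-letters.
    return list(filter(lambda x: x != '', new_text))
```

Calling extract_words("cool, & useful!") with this sketch returns ['cool', 'useful'].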

Removing empty words using list comprehension: An alternative way we could have removed empty words in new_text is using a filtered list comprehension, which takes the form:

[expression-involving-loop-variable for loop-variable in sequence if boolean-expression-involving-loop-variable]

This evaluates the boolean expression for every item in the list, and only keeps elements for which the expression is True. Our function now looks like this:
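A minimal sketch of the list comprehension version, again assuming the same re.split call (a reconstruction from the description, not necessarily the exact original code):

```python
import re


def extract_words(text):
    # Split on every character that is not an ASCII letter.
    new_text = re.split('[^a-zA-Z]', text)
    # expression: word, loop variable: word, sequence: new_text,
    # boolean expression: word != '' (drops the empty strings).
    return [word for word in new_text if word != '']
```

Both versions return the same list; the comprehension simply keeps the filtering condition inline instead of in a separate lambda.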

A function that started as 14 lines of code is now 4! 🎉 This project was a chance to implement some aspects of functional programming to clean up my code, and the results were some pretty maps of Twitter data (red = 😁, blue = 😔 ):

Tweeters in Nevada and Utah seem to love winter ❄️
Americans are divided on their feelings toward pineapple 🍍

Building a Logo Interpreter: Recursive Functions 🐢

The second project I worked on was to build an interpreter for the Logo language, which is a Lisp dialect. An interpreter is a program that performs instructions written in a programming language. There are several ways to do this; the Logo interpreter translates the source code into an intermediate representation (Python code), and then immediately executes this. This is in contrast to a compiler, which translates the whole program into machine code before the program is run.

Logo is a simple but powerful functional language, meant to be conversational in nature. The course gave a helpful description of the syntax of the language, which is quite straightforward:

? print sum 2 3 
5
? print "hello
hello
? print last [1 2 3]
3

A Logo interpreter makes use of a read-eval-print loop (REPL). First, the read function takes an expression from the user, parses it and stores it in a data structure. Logo words are represented in Python as strings, while Logo sentences are represented in Python as lists. Next, the evaluator evaluates each line of the expression by calling a function. Finally, the print function returns the result to the user.
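To make the read step a little more concrete, here is a toy sketch of how a line of Logo could be turned into Python strings (words) and nested lists (sentences). The function name parse_line and its inner helper read are hypothetical, this is not the course’s parser, and it makes no attempt to report unbalanced brackets.

```python
def parse_line(line):
    """Toy sketch: parse one line of Logo into strings and nested lists."""
    # Put spaces around brackets so that split() yields them as tokens.
    tokens = line.replace('[', ' [ ').replace(']', ' ] ').split()

    def read(position):
        """Read tokens from position until a closing bracket or the end."""
        items = []
        while position < len(tokens):
            token = tokens[position]
            if token == '[':
                # A sentence starts: read it recursively as a nested list.
                sentence, position = read(position + 1)
                items.append(sentence)
            elif token == ']':
                # The current sentence ends: hand it back to the caller.
                return items, position + 1
            else:
                # Anything else is a word, kept as a plain string.
                items.append(token)
                position += 1
        return items, position

    parsed, _ = read(0)
    return parsed


print(parse_line('print last [1 2 3]'))  # ['print', 'last', ['1', '2', '3']]
```

An evaluator would then walk this structure, look up procedures such as print and last, and call the matching Python functions.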

A useful concept to introduce here is that of a recursive function: a function that calls itself. Recursive functions consist of a conditional statement that checks for a base case (a simple problem that can be solved without recursion), followed by a recursive call in which the problem shrinks and moves towards the base case with every call. This is most easily understood through an implementation of the factorial:

def factorial(n):
    if n <= 1:
        return 1  # base case: 0! and 1! are both 1
    else:
        return n * factorial(n - 1)  # recursive call on a smaller problem

When building the evaluator, several recursive functions were required. One of them was logo_type, which uses Python’s print function to implement Logo’s type procedure, printing the contents of any word or sentence, while putting square brackets around any sublist:

? type [a [b c] d]
a [b c] d?

For most inputs, this function is very simple, calling print(x, end='') to print the input without a newline. For nested lists, however, a recursive call to logo_type allows us to evaluate the contents of the list, either printing an element (if it is a primitive type) or calling logo_type again if it is another list. I ran into a funny bug with my if statement, which resulted in me giving a five-minute presentation at RC titled “When list != list”. 🐛
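Here is a rough sketch of how such a recursive function could be structured. It is not the course’s solution: the helper logo_format and its top_level parameter are hypothetical names I’ve introduced so that the formatting logic returns a string, which makes it easy to check.

```python
def logo_format(x, top_level=True):
    """Hypothetical helper: build the text that Logo's type would print."""
    if isinstance(x, list):
        # Recursive call: each element may itself be a nested sentence.
        inner = ' '.join(logo_format(item, top_level=False) for item in x)
        # Only sublists get square brackets, not the outermost sentence.
        return inner if top_level else '[' + inner + ']'
    # Base case: a plain word is printed as-is.
    return str(x)


def logo_type(x):
    # end='' suppresses the newline, which is why the next prompt
    # appears on the same line as the output.
    print(logo_format(x), end='')


logo_type(['a', ['b', 'c'], 'd'])  # prints: a [b c] d
```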

tl;dr, I learned that types like list, dict and tuple are not reserved keywords in Python. ⚠️ Beware!
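The buggy code itself isn’t shown in this post, so the following is only a hypothetical illustration of how this kind of problem can arise. Because list is an ordinary name rather than a keyword, a parameter or variable called list silently replaces the built-in type inside that scope. The function names is_list_buggy and is_list_fixed are made up for this example.

```python
def is_list_buggy(list):
    # Inside this function, `list` is the argument, not the built-in type.
    # This compares the class <class 'list'> with the value itself,
    # so it is False even when a real list is passed in.
    return type(list) == list


def is_list_fixed(value):
    # With a different parameter name, `list` refers to the built-in again.
    return isinstance(value, list)


print(is_list_buggy([1, 2, 3]))  # False
print(is_list_fixed([1, 2, 3]))  # True
```

Python raises no error in the buggy version, which is what makes this sort of mistake so easy to miss.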

Overall, building an interpreter was an interesting task that required thoughtful application of the topics covered in the course. More than once, I spent a couple of hours staring at my screen before writing 10 lines of code, but once I understood the problem, the implementation itself became quite straightforward.

If you’re interested in learning more about programming abstractions, the textbook this course is based on is a great resource. Both these projects were opportunities to take advantage of new concepts and start writing cleaner code. 💯

That’s part 1! ✅ 👩🏼‍💻

Written by Monique Tuin

👩🏼‍💻 Recurse Center Summer 2 '17.
