Python

Programming languages are often judged by how simple or complex it is to print “Hello World!” in them.

Here is the program in C:

#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}

Here’s the equivalent code in Python:

print("Hello World!")

This basic example shows how readable and natural Python is. Python was created by Guido van Rossum in the late 1980s and named after Monty Python’s Flying Circus, a show he loved. At the time of writing, Python ranks as the fifth most popular programming language, and many companies use it for projects such as web development, data science, visualization, natural language processing, graphical user interfaces, and Linux server scripts that automate everyday tasks.

So, what exactly is Python?

First, let’s back up and explain programming.

Programming is simply writing a set of instructions for a computer to follow. As with learning a spoken language, you must learn new syntax and grammar to communicate with your computer. You can write loops that run code hundreds or thousands of times, work that would take far too long for a human to do by hand. For example, in about 10 lines of code you can write a program that reads every line of Harry Potter and the Sorcerer’s Stone (reading each line of text is one iteration of a loop) and reports the 20 most frequent words in the book… all in less than a second!
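The word-count program described above might look something like the following minimal sketch. It assumes the book is saved locally as a plain-text file (the filename sorcerers_stone.txt and the helper names are hypothetical) and treats a “word” as a run of letters with an optional apostrophe, which is a simplifying assumption.

```python
import re
from collections import Counter

# Letters, optionally joined by a straight or curly apostrophe ("don't", "don’t").
WORD_PATTERN = re.compile(r"[a-z]+(?:['’][a-z]+)?")


def top_words(lines, n=20):
    """Return the n most common words across an iterable of lines of text."""
    counts = Counter()
    for line in lines:  # each line read is one iteration of the loop
        # Lowercase so that "The" and "the" count as the same word.
        counts.update(WORD_PATTERN.findall(line.lower()))
    return counts.most_common(n)


def top_words_in_file(path, n=20):
    """Open a plain-text file and count its words line by line."""
    with open(path, encoding="utf-8") as book:
        return top_words(book, n)
```

For example, top_words_in_file("sorcerers_stone.txt") would return a list of (word, count) pairs, most frequent first. Reading the file one line at a time means the whole book never has to be held in memory as a single string.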

Python is just one of many programming languages. Each language has its pros and cons, and programmers gravitate towards certain languages based on their use case (and maybe what their company uses).

Python is:

  • object-oriented: focuses on objects over actions, and data over logic
  • high-level: it is easier to write and read than low-level languages such as machine language. The standard interpreter first compiles your script to bytecode and then runs that bytecode, which adds some time compared with a language like C. Python is also portable, meaning the same code can run on many types of computers with very little, if any, modification.
  • dynamically typed: there’s more on this below, but basically a variable can hold values of different types at different points in your code
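To see the compile-to-bytecode step for yourself, the built-in compile() function and the standard library’s dis module can show the bytecode that CPython (the standard interpreter) produces for the Hello World program. This is a small sketch; the exact instructions that dis prints differ between Python versions.

```python
import dis

source = 'print("Hello World!")'

# Compile the source text into a code object without running it.
code = compile(source, "<hello>", "exec")

# The raw bytecode is stored as bytes on the code object.
print(type(code.co_code))

# Disassemble the bytecode into human-readable instructions.
dis.dis(code)

# Executing the code object is what actually prints the greeting.
exec(code)
```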

Experienced programmers have two main complaints about Python: its use of whitespace (indentation) to structure code, and its speed. To me, indenting code blocks seems natural, and I haven’t run into any issues with processing speed, so neither bothers me, and neither should be a major problem for beginners.
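In Python, indentation rather than braces marks where a block begins and ends. In this minimal sketch, the indented lines form the loop body, and a hypothetical helper, indentation_ok(), uses compile() to show that inconsistent indentation is rejected before any code runs.

```python
def count_evens(numbers):
    count = 0
    for n in numbers:
        # These indented lines are the body of the for loop.
        if n % 2 == 0:
            count += 1
    # Back at the function's indentation level: the loop has ended.
    return count


def indentation_ok(source):
    """Return False if compiling source fails with an IndentationError."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError:
        return False
    return True


good_source = "if True:\n    x = 1\n    y = 2\n"
bad_source = "if True:\n    x = 1\n      y = 2\n"  # last line over-indented

print(count_evens([1, 2, 3, 4]))
print(indentation_ok(good_source), indentation_ok(bad_source))
```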

If you are new to programming and deciding which language to learn, I would absolutely recommend Python. So far, it’s been easier to learn than many of the math classes I took in college. There’s also an active Python community and lots of free, online resources.

The main resource I am using to learn Python is the free book How to Think Like a Computer Scientist. Overall, I think it is an excellent book for beginners. It explains Python using real-life examples and without confusing programming jargon. It discusses how to approach and think about problems in a scientific way, not just how to code a solution. As the book introduces new topics, we revisit previous examples and learn how to write the same programs more efficiently.

I also started an edX Python course, but stopped about three weeks in. Although the material is explained well, I absorbed much more, much more quickly, by reading a book and coding along. This varies from person to person depending on how you learn, but I zone out when someone else is speaking and I’m not actively working.

Python via command line

Initially, I was writing Python in the interactive interpreter on the command line, which comes preinstalled on MacBooks. I don’t like this because all my code is lost after exiting, and there’s no way to go back and review code I’ve written. It’s just not very beginner-friendly.

Here is what it looked like:

[Screenshot: Python via command line]

Python in Jupyter

Then I switched to Jupyter notebooks, a browser-based application previously called the IPython Notebook. The name was changed to avoid confusion, since Jupyter supports over forty programming languages, not just Python.

Here is the same code in Jupyter:

[Screenshot: Python notebook in Jupyter]

Now I can write a line of code, run it to make sure it does what it’s intended to, and write the next line based on that. I can run specific parts of the code without running the whole program. I can switch between Markdown text and code to explain what I’m doing, which makes revisiting and understanding old code much quicker. The code is color-coded, so it’s easier to review. One major perk of Jupyter is tab completion: pressing Tab shows the available attributes of the object you are working with. Also, although I haven’t played with this yet, Jupyter’s magic commands let you write different languages in different cells within one notebook.
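Tab completion is built into the notebook, but plain Python can list an object’s attributes with the built-in dir() function. The helper below (completions, a hypothetical name) is only a rough approximation of the suggestions Jupyter shows, not how Jupyter actually implements completion.

```python
def completions(obj, prefix=""):
    """List public attribute names of obj that start with prefix."""
    return [name for name in dir(obj)
            if name.startswith(prefix) and not name.startswith("_")]


# Roughly what pressing Tab after "hello".st might suggest.
print(completions("hello", "st"))  # ['startswith', 'strip']
```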

Here is Part Two of the syntax guide I made following HTTLACS (How to Think Like a Computer Scientist). Jupyter notebooks are easy to save and share via email, GitHub, or Dropbox, and the shared format looks just as neat as it does in the notebook browser. This could be useful for a presenter giving a demo.

Python vs. Scala

As a small side project to prepare for Spark Summit in NYC, I learned very basic Scala programming using a Databricks notebook. The name Scala is short for “scalable language”, reflecting the idea that Scala grows as your needs grow. Scala is both an object-oriented and a functional programming language. After working with Python, the object-oriented part makes sense to me. In the future, if I need a more powerful, functional language, I may learn more Scala. So far, I’ve read about the first third of Atomic Scala.

Of course, there are some expected syntax differences between the two languages. One major difference is that Python is dynamically typed while Scala is statically typed. Dynamically typed languages are generally easier for beginners to work with and are good for quickly developing prototypes. Statically typed languages catch type errors when the code is compiled, before it ever runs, which makes them better suited to critical applications where bugs can be very costly (air traffic control, telecommunication systems, or large distributed applications that run on hundreds of machines).
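One way to see this trade-off: in a dynamically typed language like Python, a type mistake is only discovered when the faulty line actually runs. In the sketch below (add_tax, checkout, and the 8% rate are made-up examples), the bug hides in a branch that ordinary calls never reach; a statically typed language such as Scala would reject the equivalent code at compile time.

```python
def add_tax(price):
    # Hypothetical 8% sales tax, purely for illustration.
    return price * 1.08


def checkout(price, use_legacy_path=False):
    if use_legacy_path:
        # Bug: passes a string instead of a number.
        return add_tax(str(price))
    return add_tax(price)


print(checkout(10))  # works; the buggy branch never runs

try:
    checkout(10, use_legacy_path=True)
    legacy_error = None
except TypeError as err:
    # Multiplying a str by a float fails, but only at runtime.
    legacy_error = err
    print("Only caught at runtime:", err)
```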

For example, in Python you can set x = 10 and then, on the next line, reassign x to a string, completely changing the data that x holds from a number to text.
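Here is a minimal Python sketch of that: the name x is simply rebound to a new object, and type() reports whatever x refers to at that moment.

```python
x = 10
print(type(x))  # <class 'int'>

# Rebinding x to text is allowed; no type declaration is involved.
x = "ten"
print(type(x))  # <class 'str'>
```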

In Scala, when you create a new name, you have to say whether it will be a variable (var) or a value (val). If you make a variable, you can change its data, but not the type of its data.

For example, declaring var x = 10 and later assigning x = 9 is valid, because x holds an integer (10, then 9) in both cases.

But following var x = 10 with x = "hello" is not valid, because it tries to change the type that x holds from an integer to a string; the Scala compiler rejects it with a type mismatch error.

Scala also has values, along with variables. If you make a value, you can never change its type or the data it holds. So once you declare val x = 10, x remains the integer 10 forever, and trying to reassign it (for example, x = 9) is a compile-time error.

If you only use values in your code, you have to keep creating new values to hold new things instead of modifying old ones. For example, val y = 9 creates a new value, y, to hold the 9 and leaves x unchanged.
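Python has no direct equivalent of Scala’s val, since any name can be rebound. The closest everyday analogue is an immutable object such as a tuple. This sketch shows that modifying a tuple in place fails, so you build a new tuple instead and leave the original untouched, much like creating y above.

```python
point = (10, 20)  # tuples cannot be changed after creation

try:
    point[0] = 9  # attempt to modify the tuple in place
except TypeError as err:
    print("Cannot modify a tuple:", err)

# Instead, create a new tuple and leave the original unchanged.
new_point = (9, point[1])
print(point, new_point)  # (10, 20) (9, 20)
```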

Jupyter vs. Databricks

Here is the code I wrote in the Databricks notebook, which supports R, SQL, Python, and Scala. One cool feature Databricks offers is collaborative notebooks, so more than one person can write code at the same time (although this gets a little slow when more than one person is working in the same cell). I like Databricks’ cell-command interface better than Jupyter’s: the commands are in each cell rather than at the top of the page.

[Screenshot: Cell commands in Databricks are found in each cell via a dropdown menu]
[Screenshot: Cell commands in Jupyter remain at the top of the page regardless of where the cell is]

One issue with Databricks notebooks is that they can’t easily read files directly from my computer. Instead, I must first upload them to some sort of storage, such as Amazon S3 or a database. It is possible to upload comma-separated (CSV) files directly into Databricks tables via the web interface, but unstructured text, such as digital books, cannot be uploaded directly. I guess Databricks is meant for data that is too big to fit on a single laptop, so my use case differs from that of their typical customer.


Overall, I’ve really enjoyed working with Python. It’s not intimidating and I can understand the real life use cases. I’ve started a text analysis mini-project in Python which I’ll blog about soon.

If you’re a programmer and have any additional pros, cons or comparisons between Python, Scala, Jupyter and Databricks, please leave a comment below.