Python For Non-Technical Data Folks
It’s not as scary as you think!
Python is a programming language.
That statement alone is enough to scare many Py-curious folks from diving into the Python data stack.
But what if I told you that you don’t need to be a Computer Science whiz to master one of the most versatile data tools out there?
In fact, if you are comfortable working with tools like SQL or Excel, the leap to Python really is less of a leap and more of a hop. No, it’s not “easy”, but you might be surprised how many skills are transferable from those tools.
You might be skeptical, but one of the reasons Python has become one of the most popular tools for data analysts, scientists, and engineers is the fact that its data stack is built by data professionals for data professionals.
You do not need a software engineer’s understanding of algorithms or software development to leverage Python at each stage of the data analysis workflow. To illustrate, let’s dive into the Python data stack.
The Python Data Stack
In order to perform most analytical tasks, you’ll need to learn the following three pillars:
- Base Python: The building blocks of the language
- Pandas: A powerful library for data manipulation & analysis
- Matplotlib & Seaborn: Data Visualization
Base Python
You need to know the basic building blocks of the Python programming language, but you do not need to be an expert in manipulating them.
As an analogy, you may not be able to write laws or defend clients in court, but you probably know enough about law not to get in trouble.
The same is true for Python for data professionals. You need to understand the syntax and building blocks of the language, but you don’t need to be an expert. While learning base Python is mostly a means to an end, you will also find that learning a proper programming language starts to shape how you think.
You should be able to work with and understand:
- Python Data Types (numbers, text, lists, dictionaries, and more)
- Conditional Logic (if, else, elif)
- Arithmetic
- String Manipulation
- Defining Custom Functions
- Importing External Libraries
- And a few other key topics
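To make the list above concrete, here is a minimal sketch that touches each building block. The revenue figures, region names, and targets are all invented for illustration.

```python
# Data types: a list of numbers, a string, and a dictionary (invented data)
monthly_revenue = [1200, 1500, 900, 1800]
region = "Northeast"
targets = {"Northeast": 1300, "West": 1100}

# A custom function using basic arithmetic
def average(values):
    """Return the mean of a list of numbers."""
    return sum(values) / len(values)

avg = average(monthly_revenue)  # 1350.0

# Conditional logic plus string manipulation (an f-string and .upper())
if avg >= targets[region]:
    message = f"{region.upper()} hit its target: {avg:.1f}"
else:
    message = f"{region.upper()} missed its target: {avg:.1f}"

print(message)  # NORTHEAST hit its target: 1350.0
```

Nothing here requires computer science theory: it reads much like the formulas and IF() logic you already write in Excel.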
Learning enough Python to be competent with data analysis takes about 10–15 hours. The reason you don’t need more is libraries.
Libraries are collections of (free) custom functions & tools developed by others. They make performing data tasks much easier than if we were starting from scratch. Thanks to the work of others, we can scrape data, perform any data manipulation we can in Excel, build machine learning models, and much more.
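Using a library is a single line of code. The aliases below (pd, plt, sns) are the community conventions you’ll see in nearly every tutorial, and the tiny DataFrame is made up just to show one line of pandas in action.

```python
# The conventional aliases for the Python data stack
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# One import line, and you have hundreds of ready-made data tools:
df = pd.DataFrame({"product": ["A", "B"], "units": [10, 25]})
print(df["units"].sum())  # 35
```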
If you are an analyst or data scientist, these are the most important libraries to learn first:
- Pandas : A comprehensive data analysis library (think SQL + Excel)
- Matplotlib: Highly Customizable Data Visualization
- Seaborn: Complements matplotlib with convenient analytical visualizations
Let’s take a quick look at each of these libraries.
Pandas — Python’s foundational data library
Pandas is a library built for data analysis, manipulation, and visualization. While it isn’t a perfect substitute, I like to think of it as a hybrid of Excel and SQL.
Here, I write about the first 5 functions to learn in Pandas.
And here, I talk about selecting rows and columns in your data.
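As a quick taste of row and column selection, here is a minimal sketch on an invented sales table. Note how the filter line mirrors a SQL WHERE clause.

```python
import pandas as pd

# An invented sales table for illustration
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units":  [10, 7, 12, 5],
    "price":  [3.0, 4.5, 3.0, 4.5],
})

one_column = sales["units"]                   # select a single column
east_rows = sales[sales["region"] == "East"]  # filter rows, like WHERE region = 'East'
by_position = sales.iloc[0:2, 0:2]            # rows and columns by position
by_label = sales.loc[sales["units"] > 6, ["region", "units"]]  # condition + named columns
```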
Like Excel, it offers fantastic tools for data wrangling and visualization, and like SQL, it serves as a credible workhorse for building data pipelines. If you have ever worked with Excel, you might appreciate the following function — pivot_table(). Instead of ‘rows’, we use the term ‘index’, but with only one line of code we can generate powerful insights.
I go in depth on Pivot Tables in this article, but let’s take a quick look at how easy it is to pivot data in Pandas.
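Here is a minimal sketch of pivoting in Pandas. The orders data is invented for illustration; the point is that one pivot_table() call does what the Excel pivot dialog does.

```python
import pandas as pd

# Invented order data for illustration
orders = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "revenue": [100, 150, 200, 50, 120],
})

# One line: average revenue with regions as the index (Excel's 'rows')
# and products as the columns
pivot = orders.pivot_table(index="region", columns="product",
                           values="revenue", aggfunc="mean")
print(pivot)
```

Swap aggfunc="mean" for "sum", "count", or a list of functions, just like changing the value-field settings in an Excel pivot table.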
Once you get comfortable with Pandas, you can dive deep into visualization.
Matplotlib — Highly Customizable Visualizations
Matplotlib allows you to build data visualizations from the ground up. The level of control you have over visualizations via code can be intimidating, and to be honest, tools like Excel and PowerBI are often better choices for many visualizations.
But it is quite easy to build charts quickly with helpful functions for line charts, bar charts, scatterplots, histograms, and more. Below is the final product of my course on matplotlib. I don’t have the best design eye, but we can create compelling infographics after just a few hours of learning the basics.
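To give a sense of the ground-up approach, here is a minimal bar chart sketch. The monthly figures are invented, and the Agg backend is set only so the script runs headlessly (e.g. on a server).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt

# Invented monthly figures for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 160]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(months, revenue, color="steelblue")
ax.set_title("Monthly Revenue (illustrative data)")
ax.set_ylabel("Revenue ($K)")
fig.savefig("monthly_revenue.png")  # save the chart to a file
```

Every element (title, labels, colors, size) is controlled explicitly in code, which is exactly where matplotlib’s intimidating flexibility comes from.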
Seaborn — A Dream for Advanced Analytics Visualizations
Seaborn is built on top of matplotlib but makes things much easier for the user. It also includes advanced visualizations that would be very challenging to build in other tools, yet take only one line of code in Seaborn.
Remember our Pandas Pivot tables? Below, we’re creating a heatmap on our pivot table simply by calling seaborn’s heatmap function on it.
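A minimal sketch of that idea: build a pivot table, then hand it straight to sns.heatmap(). The order data is invented, and the Agg backend just keeps the script runnable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented order data for illustration
orders = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 150, 200, 50],
})
pivot = orders.pivot_table(index="region", columns="product", values="revenue")

# One line turns the pivot table into a color-coded heatmap
ax = sns.heatmap(pivot, annot=True, fmt=".0f", cmap="Blues")
ax.set_title("Revenue heatmap (illustrative)")
plt.savefig("revenue_heatmap.png")
```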
There are a host of other great visuals in seaborn, all with similar syntax: sns.plotting_function(dataset_name, x=, y=)
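For instance, a scatterplot follows that same pattern. The ad-spend data below is invented just to fill in the x= and y= slots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import seaborn as sns

# Invented data for illustration
df = pd.DataFrame({"ad_spend": [10, 20, 30, 40], "sales": [15, 22, 35, 41]})

# Same pattern as most seaborn functions: the dataset, then x and y columns
ax = sns.scatterplot(data=df, x="ad_spend", y="sales")
```

Swap scatterplot for barplot, boxplot, or violinplot and the call looks nearly identical, which is what makes seaborn so quick to pick up.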
In all, Python has a steeper learning curve than other analytical tools, but it isn’t that much steeper, thanks to libraries like Pandas, Matplotlib, Seaborn, and many, many more. Once you get the hang of it, you’ll find you are able to create powerful recipes for analysis and data visualization that allow you to automate complex workflows and refresh them with ease.
With about 40 hours of course time and some practice projects, you will start to feel very comfortable, even powerful, with data in Python. You hardly need to be a CS whiz or to have been coding for a decade to get up to speed quickly.
If you are interested in learning more about these topics, check out our statistics, data science, and Python courses at the Maven Analytics Website or Udemy. We will also be kicking off a 10 week Python cohort learning program at Maven on April 24th — sign up now to get a discount and secure your seat! You’ll get a chance to ask yours truly questions I can’t answer ;).