Python For Non-Technical Data Folks

It’s not as scary as you think!

Chris Bruehl
Published in Learning Data
6 min read · Mar 22, 2024

Photo by Hitesh Choudhary on Unsplash

Python is a programming language.

That statement alone is enough to scare many Py-curious folks from diving into the Python data stack.

But what if I told you that you don’t need to be a Computer Science whiz to master one of the most versatile data tools out there?

In fact, if you are comfortable working with tools like SQL or Excel, the leap to Python really is less of a leap and more of a hop. No, it’s not “easy”, but you might be surprised how many skills are transferable from those tools.

You might be skeptical, but one of the reasons Python has become one of the most popular tools for data analysts, scientists, and engineers is that its data stack is built by data professionals for data professionals.

You do not need a software-engineer-level understanding of algorithms or software development to leverage Python and add value at each stage of the data analysis workflow. To illustrate, let’s dive into the Python data stack.

The Python Data Stack

To perform most analytical tasks, you’ll need to learn the following three pillars:

  • Base Python: The building blocks of the language
  • Pandas: A powerful library for data manipulation & analysis
  • Matplotlib & Seaborn: Data Visualization

Base Python

You need to know the basic building blocks of the Python programming language, but you do not need to be an expert in manipulating them.

As an analogy, you may not be able to write laws or defend clients in court, but you probably know enough about law not to get in trouble.

The same is true of Python for data professionals. You need to understand the syntax and building blocks of the language, but you don’t need to be an expert. While learning base Python is mostly a means to an end, you will also find that learning a proper programming language starts to shape how you think.

You should be able to work with and understand:

  • Python Data Types (numbers, text, lists, dictionaries, and more)
  • Conditional Logic (if, else, elif)
  • Arithmetic
  • String Manipulation
  • Defining Custom Functions
  • Importing External Libraries
  • And a few other key topics
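
To make that list concrete, here is a minimal sketch that touches most of those building blocks at once. The order records, function names, and thresholds are all made up for illustration, and `statistics` is part of Python's standard library rather than an external package; it is only here to show what an import looks like.

```python
import statistics  # importing a library (this one ships with Python)

# Data types: a list of dictionaries holding text and numbers.
# These records are hypothetical and exist only for this sketch.
orders = [
    {"customer": "  alice ", "units": 3, "unit_price": 19.99},
    {"customer": "BOB", "units": 12, "unit_price": 4.50},
]


def order_total(units, unit_price):
    """Arithmetic inside a custom function: units times price, rounded to cents."""
    return round(units * unit_price, 2)


def size_label(units):
    """Conditional logic: bucket an order by how many units it contains."""
    if units >= 10:
        return "bulk"
    elif units >= 5:
        return "medium"
    else:
        return "small"


summaries = []
for order in orders:
    # String manipulation: trim stray spaces and fix the capitalization.
    name = order["customer"].strip().title()
    total = order_total(order["units"], order["unit_price"])
    summaries.append(f"{name}: {total:.2f} ({size_label(order['units'])})")

# Use a function from the imported library.
average_units = statistics.mean(order["units"] for order in orders)

print(summaries)
print(average_units)
```

If you can read this snippet and roughly follow what each line does, you already have most of the base Python you need before moving on to libraries.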

Learning enough base Python to be competent with data analysis takes about 10–15 hours. You don’t need more than that because of libraries.

Define A Function Once…
…use it as many times as you like!

Libraries are collections of (free) custom functions & tools developed by others. They make performing data tasks much easier than if we were starting from scratch. We can do things like scrape data, perform any data manipulation we can in Excel, build machine learning models, and much more thanks to the work of others.

If you are an analyst or data scientist, these are the most important libraries to learn first:

  • Pandas : A comprehensive data analysis library (think SQL + Excel)
  • Matplotlib: Highly Customizable Data Visualization
  • Seaborn: Complements matplotlib with convenient analytical visualizations

Let’s take a quick look at each of these libraries.

Pandas — Python’s foundational data library

Pandas is a library built for data analysis, manipulation, and visualization. While it isn’t a perfect substitute, I like to think of it as a hybrid of Excel and SQL.

Here, I write about the first 5 functions to learn in Pandas.

Importing Our Data With Pandas read_csv() function
Inspecting the Top (or head) of our Data With .head()
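
As a rough sketch of those two steps, the snippet below loads a small CSV and looks at the top of it. The data is invented for illustration, and it is read from an in-memory string so the example runs without a file on disk; with a real file you would pass its path to `read_csv()` instead.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file such as "sales.csv".
csv_text = """region,product,units,revenue
East,Widget,10,100.0
West,Widget,4,40.0
East,Gadget,7,140.0
West,Gadget,2,40.0
East,Widget,5,50.0
West,Gadget,6,120.0
"""

# read_csv() accepts a file path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; pass a number for more or fewer.
print(sales.head())
print(sales.head(3))
```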

And here, I talk about selecting rows and columns in your data.

Using the query() method to filter rows with SQL-like syntax
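
Here is a minimal sketch of selecting columns and filtering rows, assuming a small made-up sales table (the column names and values are hypothetical). The `query()` call reads much like a SQL WHERE clause.

```python
import pandas as pd

# Hypothetical data for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "product": ["Widget", "Widget", "Gadget", "Gadget"],
    "units": [10, 4, 7, 2],
})

# Select columns by passing a list of column names.
region_units = sales[["region", "units"]]

# Filter rows with query(); roughly:
#   SELECT * FROM sales WHERE region = 'East' AND units > 5
east_big = sales.query("region == 'East' and units > 5")

# .loc handles rows and columns together: a row condition, then the columns to keep.
west_products = sales.loc[sales["region"] == "West", ["product", "units"]]

print(east_big)
print(west_products)
```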

Like Excel, it offers fantastic tools for data wrangling and visualization, and like SQL, it serves as a credible workhorse for building data pipelines. If you have ever worked with Excel, you might appreciate the following function — pivot_table(). Instead of ‘rows’, we use the term ‘index’, but with only one line of code we can generate powerful insights.

I go in depth on Pivot Tables in this article, but let’s take a quick look at how easy it is to pivot data in Pandas.
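
As a rough sketch of that one-liner, assume a small made-up sales table (the column names and numbers are invented for illustration). The `index` argument plays the part of Excel's "rows", `columns` is the same idea as in Excel, and `values` with `aggfunc` says what to summarize and how.

```python
import pandas as pd

# Hypothetical data for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East", "West"],
    "product": ["Widget", "Widget", "Gadget", "Gadget", "Widget", "Gadget"],
    "revenue": [100, 40, 140, 40, 50, 120],
})

# One line: total revenue with regions as rows and products as columns.
pivot = sales.pivot_table(index="region", columns="product", values="revenue", aggfunc="sum")

print(pivot)
```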

Here, I make a direct comparison between Excel & Pandas

Once you get comfortable with Pandas, you can dive deep into visualization.

Matplotlib — Highly Customizable Visualizations

Matplotlib allows you to build data visualizations from the ground up. The level of control you have over visualizations via code can be intimidating, and to be honest, tools like Excel and PowerBI are often better choices for many visualizations.

But it is quite easy to build charts quickly, thanks to helper functions for line charts, bar charts, scatterplots, histograms, and more. Below is the final product of my course on matplotlib. I don’t have the best design eye, but we can create compelling infographics after just a few hours learning the basics.
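
To give a feel for those helper functions, here is a minimal sketch that draws a line chart and a bar chart side by side. The monthly figures are made up, and this is not the infographic from the course. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display; in a notebook you would normally leave it out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical monthly revenue, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 150]

# One figure with two side-by-side axes.
fig, (line_ax, bar_ax) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart on the left.
line_ax.plot(months, revenue, marker="o")
line_ax.set_title("Monthly revenue (line)")
line_ax.set_ylabel("Revenue")

# Bar chart on the right.
bar_ax.bar(months, revenue, color="steelblue")
bar_ax.set_title("Monthly revenue (bar)")

fig.tight_layout()
# fig.savefig("revenue_charts.png")  # uncomment to write the figure to disk
```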

A short course on matplotlib can help you build standout visualizations

Seaborn — A Dream for Advanced Analytics Visualizations

Seaborn is built on top of matplotlib, but makes things much easier for the user. It also contains some advanced visualizations that would be very challenging to build in other tools but take only one line of code with Seaborn.

Remember our Pandas Pivot tables? Below, we’re creating a heatmap on our pivot table simply by calling seaborn’s heatmap function on it.
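
A minimal sketch of that idea, rebuilding a small pivot table from made-up sales data (the column names and numbers are hypothetical) and handing it to `sns.heatmap()`. The `annot`, `fmt`, and `cmap` arguments are optional styling choices, not requirements.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East", "West"],
    "product": ["Widget", "Widget", "Gadget", "Gadget", "Widget", "Gadget"],
    "revenue": [100.0, 40.0, 140.0, 40.0, 50.0, 120.0],
})

pivot = sales.pivot_table(index="region", columns="product", values="revenue", aggfunc="sum")

fig, ax = plt.subplots()

# The heatmap itself is a single call on the pivot table.
sns.heatmap(pivot, annot=True, fmt=".0f", cmap="Blues", ax=ax)
ax.set_title("Revenue by region and product")
```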

There are a host of other great visuals in seaborn, and all of them share a similar syntax: sns.plotting_function(dataset_name, x=, y=)
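
To show how that shared pattern looks in practice, here is a sketch with two different seaborn functions called in the same way. The DataFrame and its column names are invented for illustration, and the dataset is passed with the `data=` keyword.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East", "West"],
    "ad_spend": [10, 12, 20, 18, 30, 25],
    "revenue": [100, 110, 190, 160, 310, 240],
})

fig, (scatter_ax, bar_ax) = plt.subplots(1, 2, figsize=(10, 4))

# Same shape of call each time: dataset first, then which columns go on x and y.
sns.scatterplot(data=df, x="ad_spend", y="revenue", hue="region", ax=scatter_ax)
sns.barplot(data=df, x="region", y="revenue", ax=bar_ax)

fig.tight_layout()
```

Swapping one chart type for another is mostly a matter of changing the function name, which is a big part of why seaborn is quick to pick up.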

In all, Python has a steeper learning curve than other analytical tools, but it isn’t that much steeper, thanks to libraries like Pandas, Matplotlib, Seaborn, and many, many more. Once you get the hang of it, you’ll find you are able to create powerful recipes for analysis and data visualization that allow you to automate complex workflows and refresh them with ease.

With about 40 hours of course time, and some practice projects, you will start to feel very comfortable, and even powerful with data in Python, and you hardly need to be a CS whiz or have been coding for a decade to get up to speed quickly.

If you are interested in learning more about these topics, check out our statistics, data science, and Python courses at the Maven Analytics Website or Udemy. We will also be kicking off a 10 week Python cohort learning program at Maven on April 24th — sign up now to get a discount and secure your seat! You’ll get a chance to ask yours truly questions I can’t answer ;).
