Product management can be a challenging role, with demands pulling you in many different directions. So anything we can do to minimise time spent on repetitive tasks, or to maximise the value we get from data, is worth making the most of.
Python is a popular open-source programming language which is relatively easy to learn. It can help you automate the boring stuff!
In this series of posts, I’ll show a couple of examples of how mastering Python can free up valuable time and provide deeper insights to you as a Product Manager.
Step one — Install Python
How you install Python depends on your operating system and your requirements.
Recommended — use the Anaconda distribution
Anaconda is a distribution which contains Python and many of the most popular additional packages that you might need. It also gives you a clear and sane way of adding new packages and keeping them up to date. It’s probably the simplest way to get started, and it’s widely used, so there’s lots of help out there.
Installation instructions can be found online and are easy to follow.
If you use Anaconda, you’ll want to familiarise yourself with the conda command for package management. Again, there are good instructions available online.
You can search for packages:
conda search <package>
You can install a package:
conda install <package>
and you can update a package:
conda update <package>
Also good but maybe a bit more complicated — install from Python.org
If you’re using Python without the Anaconda distribution, then pip is the package manager you’ll use.
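Whichever package manager you go with, it can be handy to check from within Python which packages are actually installed. Here’s a minimal sketch using only the standard library (Python 3.8 or later). The helper name installed_version is my own invention, and the package names are just examples:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names here are just examples; swap in whatever you need
for name in ["pandas", "matplotlib", "notebook"]:
    version = installed_version(name)
    print(name, version if version else "not installed")
```

Anything reported as "not installed" can then be added with conda or pip as described above.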
Step two — Jupyter Notebook
There are several ways to run a Python program. Most simply, a text file with a .py extension can be saved and then run with the Python interpreter. Python is an interpreted language, so rather than compiling your code to an executable file (e.g. MyApp.exe) the scripts are interpreted at run time. This makes it good for experimenting!
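As a small illustration, here’s roughly what such a file might look like. The filename hello.py and the function name greeting are just placeholders I’ve picked for this sketch:

```python
# hello.py - a tiny script you could run from a terminal with: python hello.py


def greeting(name):
    """Build a greeting for the given name."""
    return f"Hello {name}"


if __name__ == "__main__":
    # This part only runs when the file is executed as a script
    print(greeting("Product Managers"))
```

Change the text, save the file and run it again: there is no separate compile step to wait for.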
To make the whole process even more immediate, a web-based interactive “notebook” is available through the Jupyter project. This allows you to write and execute Python scripts via your browser in an intuitive way which lends itself to nicely documented, repeatable experiments.
If you’ve used Anaconda, you should have Jupyter installed already. If not, installation instructions are available.
To start the browser-based notebook, run the following from a terminal:
jupyter notebook
All being well, a web browser will open up showing you a file / directory browser.
Add a new notebook for one of the Python versions you have available (e.g. Python 3 in my case).
Just to prove it’s all working, let’s run some Python code. Type or paste the following into the cell:
print("Hello Product Managers")
Now, whilst the cell is selected, press ctrl-return and you should see the output of the code below.
Notebook keyboard shortcuts
There is, of course, a lovely toolbar and menu structure. But for speed, there are a number of keyboard shortcuts available which make life easier. Here are a few to get you started:
- ctrl-enter: Run current cell
- shift-enter: Run current cell and create a new empty cell below
- esc: Enter command mode. This is when you can do commands on the cell, rather than entering text into the cell. The left-hand border of the cell is green in edit mode, and blue in command mode.
- a: New cell above current cell (when in command mode)
- b: New cell below current cell (when in command mode)
That’s probably enough to get started! A few shortcuts go a long way! There are many more shortcuts to learn though, once you’re up and running.
Step three — learn Python
Now, I’m not best placed to teach you Python and its basic syntax. Many people are better equipped to do that, and there is a wealth of information, courses and tutorials out there which will give you a good grounding.
Some pointers to great resources for learning Python:
- Learn Python — interactive tutorials
- The Python Tutorial — the official tutorial and tour of the Python language
- Real Python — more tutorials and tips
Or, you can grab a book!
Step four — Python modules for data analysis
So, I kind of shirked my responsibilities in the last step, didn’t I! Well, to make up for it, assuming you know some basic Python, here’s some stuff that will make it all seem worthwhile. And if you haven’t learned yet, hopefully this will inspire you to get stuck in!
Pandas is a great module that helps you load, transform and analyse data quickly in Python. It’s definitely a module to explore if, like me, you spend a lot of time looking at data and asking questions.
Specifically, Pandas is a useful tool in your workflow for the following common data analysis tasks:
- Loading data from CSV files or directly from a database
- Viewing, filtering and sorting the data
- Looking at common statistics and aggregations of the data
- Joining and merging data together
- Grouping, pivoting and transforming the data
- Plotting and visualising the data
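To give a flavour of a few of these tasks, here’s a minimal sketch using two tiny DataFrames. The data, column names (minutes, plan) and variable names are all made up purely for illustration:

```python
import pandas as pd

# Made-up data purely for illustration
sessions = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "user_id": [10, 11, 10, 12],
    "minutes": [5, 30, 12, 45],
})
users = pd.DataFrame({
    "user_id": [10, 11, 12],
    "plan": ["free", "pro", "pro"],
})

# Filtering and sorting: sessions longer than 10 minutes, longest first
long_sessions = sessions[sessions["minutes"] > 10].sort_values("minutes", ascending=False)

# Common statistics for a column (count, mean, min, max and so on)
stats = sessions["minutes"].describe()

# Joining: attach each user's plan to their sessions
joined = sessions.merge(users, on="user_id", how="left")

# Grouping: total minutes per plan
minutes_per_plan = joined.groupby("plan")["minutes"].sum()

print(long_sessions)
print(stats)
print(minutes_per_plan)
```

Each step is a single line, which is a big part of why Pandas feels so quick to work with once you know the basics.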
We’ll work through examples of each of these in the next instalment, but to make sure you’re hooked, here’s a little example. Imagine we’ve obtained a tab-delimited text file with data about sessions for our SaaS application. Each row represents one session, with columns for:
- A unique ID
- Session start time
- Session end time
- A user ID
My sample file has several years’ worth of data and is about 100 MB in size. Let’s have a look at what we can do with that!
# Show graphs and charts inline in the notebook
%matplotlib inline

# Import some libraries
import pandas as pd
import matplotlib.pyplot as plt

# Set some default plot styles to make things look nice
plt.rcParams['figure.figsize'] = (20.0, 10.0)
plt.style.use('bmh')

# Import our session data from a tab-delimited text file, making sure the start and end dates are loaded as dates
df = pd.read_csv('monthly_sessions.csv', sep='\t', parse_dates=['end_datetime', 'start_datetime'])

# Show the top few rows of the data
df.head()

# Let's see how many sessions we've loaded from the file
len(df.index)
# Output: 1293267

# Let's count sessions per month and plot to see the trend
df.groupby([df.start_datetime.dt.year, df.start_datetime.dt.month]).agg('count').id.plot()
plt.title('Sessions per month')

# Let's calculate the duration of each session
df['session_duration'] = df.end_datetime - df.start_datetime

# And then look at some stats for session_duration
df['session_duration'].describe()

# Let's look at the number of sessions by day of week
df.start_datetime.dt.weekday.hist(bins=[0,1,2,3,4,5,6,7])
Hopefully that gives you a glimpse into how you can load, manipulate and visualise data.
But I can do that in Excel!
Now, you could do that in Excel, but Python really comes into its own in two ways: (depending on your PC specs) it can handle larger files more quickly, and it’s easy to repeat your analysis and even automate it. Imagine a slightly more complex version of the above analysis, pulling data from several databases or files. You can re-run the analysis at the click of a button, obtaining new data and presenting new outputs.
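To make that repeatability a bit more concrete, here’s a minimal sketch of wrapping the monthly count in a function, so the same analysis can be re-run on any new export. The function name sessions_per_month is hypothetical, and I’m assuming the file has the id, start_datetime and end_datetime columns used in the example above:

```python
import pandas as pd


def sessions_per_month(path):
    """Load a tab-delimited session file and count sessions per calendar month."""
    df = pd.read_csv(path, sep="\t", parse_dates=["start_datetime", "end_datetime"])
    # Label each session with the month it started in, e.g. 2019-01
    month = df["start_datetime"].dt.to_period("M")
    return df.groupby(month)["id"].count()


# Hypothetical usage: re-run whenever a fresh export arrives
# monthly = sessions_per_month("monthly_sessions.csv")
# monthly.plot(title="Sessions per month")
```

Once the logic lives in a function like this, pointing it at next month’s file is a one-line change rather than a fresh round of copy and paste.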
In the next instalment, we’ll look in more detail at some common tasks and how to achieve them with Python & Pandas.
If you’ve got any comments, suggestions or requests then please let me know in the comments.
Read the rest of the series
Follow the full series of posts to master Python!
Originally published at productmetrics.net on February 14, 2019.