Product Analytics 📈 Pt.1

I recently took a 40-hour Product Management course taught by Rocky Sharma through Product School. The goal was to reinforce my PM fundamentals and learn from a diverse group of students as well as a seasoned instructor. It turns out that in addition to the live PM course, students also get access to a recording of the Product Analytics course, which is the content I’ll be sharing below.

I plan to break the Product Analytics learnings into two parts, covering the first three topics in part 1 and the latter three in part 2.

  1. Web Analytics
  2. Manage Large Datasets with SQL
  3. Advanced SQL
  4. Data Visualization
  5. Statistical Thinking
  6. Intro to Big Data and Machine Learning

1/ Web Analytics

Web analysis

The most popular web analysis tool is by far Google Analytics, simply because it’s free. The instructor of this course worked at Adobe Analytics, which is often used by larger corporations, such as Marriott. Optimizely and Mixpanel are also very popular options. The goals of web analysis include:

  1. Analyzing traffic sources. The team needs to identify where visitors come from and invest in growing high-converting traffic sources.
  2. Understanding user behavior. Here we want to analyze visitors’ behavior in order to reduce bounce rates and identify popular content.
  3. Identifying the primary market. Finally, it’s time to identify the target audience and the best customers to focus on.

A/B testing

A very popular and forever important tactic in the product and data world is A/B testing, in which a hypothesis about the product is tested in the real world by running an experiment on a specific (or randomized) group of users. The test is run for a limited amount of time, and the results are collected and reviewed to decide whether the variant should be kept or scrapped.

The steps to running an A/B test include:

  1. Define Goals. In order to successfully run the test and make sense of the data, make sure to choose the right conversion metrics.
  2. Launch your experiment. Select a target audience and create the variation.
  3. Turn results into action. This involves compiling the findings and next steps into an email, doc, or presentation, and communicating them to execs, the rest of the team, or the org to drive action and / or reach consensus. Even when you do not end up with the results you had originally hoped for or hypothesized, it is still very important to communicate the conclusions and next steps, and to keep the team updated on what is being experimented on and what to expect.
An example of an A/B test given in the slides.

And when it comes to structuring your A/B test, think in terms of goal, if, then and because:

GOAL: KPI* (e.g. CTR, page views)

IF: variable (e.g. website element)

THEN: predicted outcome (e.g. more signups)

BECAUSE: rationale (e.g. informed hypothesis, customer understanding)

*KPI: Key Performance Indicator is used to measure the success of a product or team in meeting a business goal or objective

There are also limitations to A/B testing. For instance, for an A/B test to be conclusive you must reach statistical significance, which may require a large number of users, more time spent collecting data, or both. For startups or companies with a limited number of users, this can prove challenging. Furthermore, releasing a new feature while an experiment is running can be problematic because you are introducing change. If you must release an update, make sure the change is made to both the experiment and the control group, and consider what impact the update may have had on the experiment or even the hypothesis, which may well be none at all.
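To make the statistical-significance point concrete, here is a minimal sketch of a two-proportion z-test in Python, one common way to evaluate an A/B test on conversion rates. The conversion numbers below are invented for illustration; this specific calculation was not part of the course material.

```python
# Two-proportion z-test (normal approximation) for an A/B test.
# The conversion counts below are made-up illustrative numbers.
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Control: 200 conversions out of 5,000 visitors (4.0%)
# Variant: 260 conversions out of 5,000 visitors (5.2%)
z, p = two_proportion_z(200, 5000, 260, 5000)
significant = p < 0.05  # conventional 5% significance threshold
```

With smaller samples the same 1.2-point lift would not clear the threshold, which is exactly the limitation described above.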

2/ Manage Large Datasets with SQL

Introduction to databases

The course here defined what a database is, what differentiates databases, basic queries to retrieve data, and how to insert, update, and delete records. There is lots of free educational material online for picking up basic SQL, for instance W3Schools.

Before databases, applications queried directly from file systems, which was unsound and insecure in many ways. Issues included:

  1. File systems allowed for multiple file formats and duplication of information in different files, which led to data redundancy and inconsistency.
  2. Integrity constraints, such as account balance > 0, had to be written as part of the program code, which made it difficult to add new constraints or change existing ones.
  3. Because file systems didn’t have auto-rollbacks or an optimized failure sequence to handle incomplete transactions, update failures could leave the database in an inconsistent state with partial updates carried out.
  4. Concurrent access was also a problem. File systems could not manage concurrent user access, which led to data inconsistencies.
  5. Lastly, file systems control access at the file level, as opposed to securing a single entry point, which is what today’s databases use.

Long story short, databases were built and popularized in the 60s and 70s, and the basic components of a DBMS (database management system) are depicted in this image from W3Schools.
diagram portraying components of DBMS from W3Schools

Introduction to SQL

First, SQL is a declarative language, like HTML, which means it contains no loops, conditionals, or explicit step-by-step instructions like in Java or Python. To begin, there are three categories of SQL commands:

  • DDL (Data Definition Language) deals with database schemas and descriptions and how the data should reside in the database.
  • DML (Data Manipulation Language) deals with data manipulation. It includes the most common SQL statements such as SELECT, INSERT, UPDATE, and DELETE, and is used to store, modify, retrieve, and delete data in a database.
  • DCL (Data Control Language) includes commands such as GRANT, and mostly concerns rights, permissions and other controls of the database system.

My understanding is that nearly all data scientists, analysts, and PMs will be running and interpreting Data Manipulation Language, while your database admin uses DDL and DCL for administrative tasks. Happy to learn more about this later.
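As a small illustration of the DDL / DML split, here is a sketch using Python’s built-in sqlite3 module with an invented users table. SQLite has no GRANT command, so DCL is not shown; the table and data are examples, not from the course.

```python
# DDL vs DML demonstrated with Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define the schema
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML: store, modify, and retrieve data
cur.execute("INSERT INTO users (name) VALUES ('Ada')")
cur.execute("INSERT INTO users (name) VALUES ('Grace')")
cur.execute("UPDATE users SET name = 'Ada Lovelace' WHERE name = 'Ada'")
names = [row[0] for row in cur.execute("SELECT name FROM users ORDER BY name")]
conn.close()
```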

Furthermore, the main datatypes are:

  • CHAR(size): fixed-length string
  • TEXT: long strings (up to ~65K characters)
  • SMALLINT(size): integers from -32,768 to 32,767
  • INT(size): integers from ~-2B to ~+2B
  • UNSIGNED INT(size): integers from 0 to ~4B
  • FLOAT(size,d): decimal number
  • DATETIME(): date and time

And constraints on table schema include:

  • NOT NULL: forbids “NULL” or “empty” cells in the corresponding column.
  • UNIQUE: the column cannot contain duplicates.
  • PRIMARY KEY: a combination of NOT NULL and UNIQUE. Uniquely identifies each row in a table.
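To see these constraints in action, here is a sketch using sqlite3; the products table and its rows are invented examples. Each bad insert below violates one of the constraints and is rejected with an integrity error.

```python
# Demonstrating NOT NULL, UNIQUE, and PRIMARY KEY constraints with sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE products (
        sku     TEXT PRIMARY KEY,   -- NOT NULL + UNIQUE combined
        name    TEXT NOT NULL,      -- empty (NULL) names are rejected
        barcode TEXT UNIQUE         -- duplicate barcodes are rejected
    )
""")
cur.execute("INSERT INTO products VALUES ('A1', 'Apple', '111')")

violations = []
for row in [("A1", "Pear", "222"),   # duplicate PRIMARY KEY
            ("B2", None,  "333"),    # NULL in a NOT NULL column
            ("C3", "Plum", "111")]:  # duplicate UNIQUE barcode
    try:
        cur.execute("INSERT INTO products VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        violations.append(row[0])
conn.close()
```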

Basic SQL queries

Basic operations include SELECT, AS, WHERE, LIKE, DISTINCT, and ORDER BY, and common conditions used include:

  • Equality. e.g., Column = 3.2
  • Difference. e.g., Column != ‘Apple’
  • Comparison. e.g., Column > 5
  • Set membership. e.g., Column IN (‘Apple’, ‘Pear’, ‘Orange’)
  • Fuzzy string matching. e.g., LIKE ‘%food%’
  • As well as conditions combined with AND, OR, and NOT operators
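These conditions can be tried out on a throwaway table. Here is a sketch using sqlite3 with an invented fruit table, exercising equality, set membership, fuzzy matching, and an AND combination:

```python
# Trying out common WHERE conditions with sqlite3 on an invented table.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fruit (name TEXT, price REAL)")
cur.executemany("INSERT INTO fruit VALUES (?, ?)",
                [("Apple", 3.2), ("Pear", 6.0), ("Orange", 4.5), ("Kiwi", 7.0)])

# Equality
equal = [r[0] for r in cur.execute("SELECT name FROM fruit WHERE price = 3.2")]

# Set membership combined with a comparison via AND
members = [r[0] for r in cur.execute(
    "SELECT name FROM fruit WHERE name IN ('Apple', 'Pear', 'Orange') AND price > 4")]

# Fuzzy string matching
fuzzy = [r[0] for r in cur.execute("SELECT name FROM fruit WHERE name LIKE '%an%'")]
conn.close()
```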

Some more advanced operations include:

  • COUNT, GROUP BY, AVG
  • Calculating summary statistics, such as mean, medians, percentiles
  • And subqueries, which are queries run inside another query. Here’s an example from sqltutorial.org:

SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id NOT IN (
    SELECT department_id
    FROM departments
    WHERE location_id = 1700
)
ORDER BY first_name, last_name;

The result lists the employee_id, first_name, and last_name of employees who are not in a department at location ID 1700, ordered by first_name, then last_name.

subquery SQL example from sqltutorial.org

3/ Advanced SQL

GROUP BY

The GROUP BY operation aggregates all rows that share the same value in the specified column, and is often used in conjunction with COUNT(column_name) to display aggregate counts. W3Schools explains this operation very well, and also provides an interactive editor to help students learn by editing and executing SQL code live.

GROUP BY also comes with two filters: WHERE and HAVING, and the difference comes down to the ordering of when the data is filtered. The WHERE statement is applied before aggregation, while HAVING is applied after aggregation. For instance,

SELECT COUNT(*) AS NumberProducts, CategoryID
FROM Products
WHERE SupplierID != 2
GROUP BY CategoryID
HAVING NumberProducts > 5;

Because WHERE is applied before aggregation, the query excludes products from the supplier with ID 2 from the count before grouping by CategoryID. The HAVING filter then outputs only categories with more than 5 products, after the grouping (which excluded supplier ID 2) is complete.
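The before / after split can be verified on a tiny invented dataset with sqlite3. The query below mirrors the example above, with the thresholds shrunk to fit five rows:

```python
# WHERE filters rows before grouping; HAVING filters groups after counting.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Products (ProductID INTEGER, SupplierID INTEGER, CategoryID INTEGER)")
rows = [(1, 1, 10), (2, 1, 10), (3, 2, 10),  # category 10: 3 products, 1 from supplier 2
        (4, 1, 20), (5, 2, 20)]              # category 20: 2 products, 1 from supplier 2
cur.executemany("INSERT INTO Products VALUES (?, ?, ?)", rows)

result = cur.execute("""
    SELECT COUNT(*) AS NumberProducts, CategoryID
    FROM Products
    WHERE SupplierID != 2          -- applied to rows before grouping
    GROUP BY CategoryID
    HAVING NumberProducts > 1      -- applied to groups after counting
""").fetchall()
conn.close()
```

After WHERE drops the two supplier-2 rows, category 10 counts 2 and category 20 counts 1, so only category 10 survives the HAVING filter.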

JOIN

The JOIN operation is very common in SQL because most analyses and queries require fields from different tables. To merge data from another table, you match a foreign key to a primary key, which essentially tells your database to combine the two tables using that unique identifier. Although the most common JOINs are the INNER JOIN and the inclusive LEFT, RIGHT, and FULL JOINs, the image below also lists out exclusive JOINs.

Different possible joins to perform.
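The inclusive-vs-inner distinction can be seen on a small invented example with sqlite3: a customers table joined to an orders table on a foreign key, where only one customer has an order.

```python
# INNER JOIN vs inclusive LEFT JOIN with sqlite3, on invented tables.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(100, 1, 9.99)])  # only Ana ordered

# INNER JOIN keeps only customers with a matching order
inner = cur.execute("""
    SELECT c.name, o.total FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# LEFT JOIN keeps every customer; unmatched rows get NULL (None) for order fields
left = cur.execute("""
    SELECT c.name, o.total FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()
conn.close()
```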

UNION

If JOIN merges columns from different tables, UNION concatenates rows, but only from tables (or queries) with matching columns. With UNION you combine and dedupe the results, while UNION ALL combines them and keeps any duplicates. For example,

SELECT CustomerName AS name FROM Customers
UNION
SELECT SupplierName AS name FROM Suppliers;
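The dedupe behavior is easy to confirm with sqlite3; the Customers / Suppliers names below are invented, with one name deliberately appearing in both tables.

```python
# UNION dedupes across the combined result; UNION ALL keeps duplicates.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (CustomerName TEXT)")
cur.execute("CREATE TABLE Suppliers (SupplierName TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?)", [("Acme",), ("Globex",)])
cur.executemany("INSERT INTO Suppliers VALUES (?)", [("Acme",), ("Initech",)])  # "Acme" repeats

deduped = cur.execute("""
    SELECT CustomerName AS name FROM Customers
    UNION
    SELECT SupplierName AS name FROM Suppliers
    ORDER BY name
""").fetchall()

with_dupes = cur.execute("""
    SELECT CustomerName AS name FROM Customers
    UNION ALL
    SELECT SupplierName AS name FROM Suppliers
    ORDER BY name
""").fetchall()
conn.close()
```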

Lastly, because popular database operations involve filtering or aggregating rows and mixing columns from different tables, JOIN typically sees more regular use than UNION.

While going through the course material, I also found this useful SQL cheatsheet by janikvonrotz (on GitHub). It’s very holistic and I imagine it might come in handy in the future.

On a final personal note, I feel very lucky to have done a database course during my technical undergrad. I remember paying much more attention to my public speaking elective, for example (in other words, I found databases very boring at the time), but I’m now very grateful for that early exposure and the opportunity to practice SQL and database basics early in my technology career.

Rox Guo

learning, staying curious