401k Tableau Dashboard Using Google Sheets, BigQuery and Jupyter Notebook With Tableau Desktop Connectors and APIs

Ryder Nguyen
The Startup
Published in
8 min read · Sep 29, 2020

Recommended Prerequisites: Understanding of Python (pandas & working with dataframes), Jupyter Notebook, SQL table operations, Tableau Desktop (calculated fields, parameters) and GitHub

Code can be found here. Below are the separate dashboards created for desktop and mobile.

For this project, I want to create an interface that presents information relevant to my 401k investment in one place instead of having to switch between different platforms. My solution is to combine data from Fidelity Investments, Yahoo Finance, Morningstar, Investopedia and mutual fund universe data from Kaggle for both tracking and research purposes. Utilizing my knowledge of the Google Cloud Platform, coupled with the power of Python and APIs, I created a Tableau dashboard that centralizes the 401k portfolio data I'm most interested in.

  1. WHY CHOOSE ONE PRODUCT/TECHNOLOGY VERSUS ANOTHER
  2. SETUP OF THE WORKSPACE
  3. DESIGN WITH THE FINANCE FOCUS IN MIND
  4. MAKE MAGIC HAPPEN WITH TABLEAU
  5. NEXT STEPS

WHY CHOOSE ONE PRODUCT/TECHNOLOGY VERSUS ANOTHER

One lesson I’m still learning is to choose where and when to transform and store my data. For example, would it be more useful to store all price and return data from all the investment funds in BigQuery storage or Google Sheets?

BigQuery processes queries much faster, but row-level transformations (calculating daily log returns and portfolio weighted returns) are much easier in Jupyter. Because this project is quite math-heavy, it also helps to be able to eyeball the data in Google Sheets. Considering the built-in GOOGLEFINANCE functions and the modest size of the data (about 40 individual investments), Google Sheets is the better choice for the portfolio data. The universe data from Kaggle, however, is massive and would break Google Sheets, so BigQuery storage is the better choice in that case. For complicated calculations, you can always read from BigQuery into Jupyter, transform, and write back to BigQuery. Another thing to note: if you're using Tableau Desktop with a public license, you can still follow this tutorial, since you can connect to Google Sheets without a problem, though you won't be able to connect to BigQuery. The data pipeline is presented below:
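To make the trade-off concrete, here is a minimal sketch of the row operations mentioned above (daily log returns and a portfolio weighted return) in pandas; the tickers, prices and 60/40 weights are made-up placeholders:

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices for two funds (made-up numbers)
prices = pd.DataFrame(
    {"FXAIX": [100.0, 101.0, 99.5, 102.0],
     "FSPSX": [50.0, 50.5, 50.0, 51.0]},
    index=pd.date_range("2020-09-01", periods=4, freq="B"),
)

# Daily log returns: ln(P_t / P_{t-1})
log_returns = np.log(prices / prices.shift(1)).dropna()

# Portfolio weighted return, assuming fixed 60/40 weights
weights = pd.Series({"FXAIX": 0.6, "FSPSX": 0.4})
portfolio_returns = log_returns.mul(weights, axis=1).sum(axis=1)
print(portfolio_returns.round(6))
```

This kind of element-wise, date-aligned arithmetic is exactly the work that is awkward in SQL but trivial in a dataframe.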

SETUP OF THE WORKSPACE

For the most part, working with Jupyter Notebook and the BigQuery API follows the same steps as in my first project (setting up the ETL data pipeline for the COVID-19 Dashboard). The difference this time is setting up communication between Jupyter and Google Sheets. In this tutorial, we will be using gspread, a Python API client for Google Sheets.

  • First, make sure APIs are enabled for these products: Google Sheets, Drive, BigQuery.
  • Second, create a service account with the compute role in your cloud project; call it something along the lines of "compute-engine". Create an API key for the service account and keep it in the same folder as your Jupyter notebook. Grab the service account's email address while you're here!
  • Next, create a Google Sheet named "personal_finance_test" (make sure the name is somewhat unique because the API will pull it by name). Share it with the service account email you copied in the last step (make sure to assign the Editor role). This lets the service account modify the Google Sheet once we make API calls from Jupyter. You can take a look at the Google Sheet I created HERE.
Share the worksheet with compute-engine

Each sheet is then a "dataset" to be joined in Tableau. Please note that the benchmark mapping is on the "Master" sheet. This is maintained manually due to the availability of the benchmarks chosen by Fidelity; if benchmark data is not available, the investment is compared to the S&P 500 by default. Note also that there are formulas to format strings for entry into a Tableau calculated field.

  • Next, authorize Google Sheets access using the following:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Scopes granting access to Sheets and Drive
scope = ["https://spreadsheets.google.com/feeds",
         "https://www.googleapis.com/auth/spreadsheets",
         "https://www.googleapis.com/auth/drive.file",
         "https://www.googleapis.com/auth/drive"]
# Load the service-account key stored next to the notebook
creds = ServiceAccountCredentials.from_json_keyfile_name("{API compute-engine API key}.json", scope)
sheetsclient = gspread.authorize(creds)
# Open the workbook by name
workbook = sheetsclient.open("personal_finance_test")

Now you're ready to modify the Google Sheet by way of the service account "user." The workbook object represents the whole spreadsheet. Take a look at the methods available in gspread HERE.
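As a quick illustration of what those calls look like, here is a hedged sketch of pulling one tab into pandas; `records_to_df` is a hypothetical helper, and the gspread calls (which require the credentials set up above) are shown commented out:

```python
import pandas as pd

def records_to_df(records):
    """Convert gspread's list-of-dicts output into a DataFrame."""
    return pd.DataFrame.from_records(records)

# Assuming `workbook` is the gspread workbook opened earlier:
# master = workbook.worksheet("Master")         # one tab = one "dataset"
# df = records_to_df(master.get_all_records())  # header row becomes columns
```

`get_all_records()` treats the first row of the sheet as the header, which matches the one-tab-per-dataset layout used here.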

DESIGN WITH THE FINANCE FOCUS IN MIND

As I mentioned earlier, the challenge is always being aware of the capabilities and limitations of each technology to get the results you want. We'll take a look at some examples:

1. Yahoo Finance vs. Google Finance

In this project, I used the Python package yfinance to get information about the mutual funds of interest. An alternative would be to use Google Finance formulas to grab the historical prices. Because log returns are usually not reported in most publicly available datasets, the transformation has to be done in Jupyter anyway, making yfinance the better choice. Alternatively, you could get daily prices from Google Finance, read them into Jupyter and then transform them. Both tools provide ways to gather mutual fund ratios and other performance indicators, so for the purposes of this project I pulled data from both (reformatting the dictionary results from yfinance, and using cell formulas in Google Sheets).
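As an illustration of reformatting the dictionary results from yfinance, here is a small sketch; the `info` dict below is made-up sample data standing in for what `yfinance.Ticker(...).info` returns (the available keys vary by security type):

```python
import pandas as pd

# Sample of what a fund's info dict might contain (values are made up)
info = {"symbol": "FXAIX", "ytdReturn": 0.05, "beta3Year": 1.0}

# Keep only the fields of interest and reshape into one tidy row,
# ready to append to a Google Sheet or a BigQuery table
fields = ["symbol", "ytdReturn", "beta3Year"]
row = pd.DataFrame([{k: info.get(k) for k in fields}])
print(row)
```

Using `dict.get` means a missing key becomes a null cell instead of raising an error, which matters when different funds report different fields.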

2. Data Transformation (SQL vs. Python vs. Tableau)

Because this project is quite math-heavy, one of the good things about having most of the data in Google Sheets is that I can do quick spot checks to make sure data flows correctly and calculations are accurate. I did most of the data cleaning in Python to make sure everything meets the same standards (checking for missing data, syncing to the same number of trading dates, portfolio return calculations). You could certainly choose SQL or Tableau for the job; it's a personal preference. However, there are cases where SQL or Tableau does a better job. For example, transforming the mutual fund universe data requires significant compute power, and some operations (PARTITION and LAG) just work much better in SQL. As for Tableau, one of its strongest tools is calculated fields. They are data transformation in a sense, since they can be used to compute certain metrics, and since we're working with financial mathematics, certain calculations are reserved for Tableau (the Sharpe ratio, running cumulative log returns, etc.).
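For illustration, the PARTITION/LAG pattern mentioned above corresponds to `LAG(price) OVER (PARTITION BY fund ORDER BY date)` in BigQuery; here is the pandas equivalent on made-up long-format data:

```python
import pandas as pd

# Long-format fund prices (made-up data)
df = pd.DataFrame({
    "fund":  ["A", "A", "A", "B", "B", "B"],
    "date":  pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-03"] * 2),
    "price": [100.0, 101.0, 99.0, 50.0, 51.0, 52.0],
})

# Equivalent of LAG(price) OVER (PARTITION BY fund ORDER BY date):
# sort within each fund, then shift each fund's prices down one row
df = df.sort_values(["fund", "date"])
df["prev_price"] = df.groupby("fund")["price"].shift(1)
df["daily_return"] = df["price"] / df["prev_price"] - 1
print(df)
```

On the full Kaggle universe this groupby is exactly where pandas runs out of steam and BigQuery's window functions earn their keep.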

MAKE MAGIC HAPPEN WITH TABLEAU

If you haven't worked with Tableau before, the design process goes from worksheets > dashboards > story: each worksheet contains a specific visualization, a dashboard combines several worksheets, and a story strings together multiple dashboards. The only way to learn how to design is to try, fail, learn and retry. Two friends helped me tremendously with feedback; there were things they could see that I couldn't. When you design a user experience, you need user input. There is also a difference between making the dashboard work on desktop and on mobile devices, so I came up with two separate Tableau packaged files, one for each. For example, the desktop layout works better as a story since there is enough display space, while mobile requires navigation buttons to make it feel more like an app. I won't go into much detail about Tableau, but I want to point out some important things that I learned:

  1. Come up with a couple of parameters that are useful for your dashboard.

For example, if I want to filter investment performance by a start and end date and, at the same time, toggle between the whole portfolio and individual investments, I create those parameters first, because they will then be used in calculated fields to filter the data.
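As a rough analogue of what those parameters and calculated fields do, here is a pandas sketch (the column names, tickers and the equal-weight portfolio assumption are all placeholders):

```python
import pandas as pd

# Long-format daily returns (made-up data)
returns = pd.DataFrame({
    "date": pd.to_datetime(["2020-09-01", "2020-09-01",
                            "2020-09-02", "2020-09-02"]),
    "fund": ["FXAIX", "FSPSX", "FXAIX", "FSPSX"],
    "ret":  [0.010, 0.005, -0.002, 0.001],
})

def filter_view(df, start, end, selection="Portfolio"):
    """Mimic the start/end-date and portfolio-vs-fund parameters."""
    window = df[(df["date"] >= start) & (df["date"] <= end)]
    if selection == "Portfolio":
        # Equal-weight average across funds per day (simplifying assumption)
        return window.groupby("date")["ret"].mean()
    return window[window["fund"] == selection].set_index("date")["ret"]

print(filter_view(returns, "2020-09-01", "2020-09-02"))
```

In Tableau, the date bounds and the portfolio/fund toggle are parameters, and the branching logic lives in a calculated field instead of a function.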

2. Tooltips are your friends! They help your users understand what they're looking at, and in Tableau they are built in and highly customizable.

3. Add actions to your dashboard to make the customer journey more enjoyable.

There are many things you can accomplish using actions. The "Go to URL" option gives me more design freedom. Say you want users to explore more about the investments without exiting the dashboard: a click on a text field can display a dropdown of external URL(s). Another use of actions in my dashboard is changing set values, which comes in handy for the comparison basket: it lets you change the member(s) of a predefined set dynamically without taking away from the user experience.

4. Embed external websites using a parameter, exemplified by the Metrics/Terms Explanation page.

When you embed a webpage in Tableau, you can specify the URL, and another cool thing is that you can incorporate a parameter in the URL builder in order to navigate. One thing to keep in mind is that some webpages, such as Yahoo Finance, Morningstar and Investopedia, won't display because of their security settings. You can still use a parameterized URL in a dropdown list linking to those external sites, like the "More Resources" highlight text on the Individual Investment page.

5. Easter Eggs!

Tableau has string splits! I used this when importing data from BigQuery. Specifically, my column labels for the percentile_universe table are in the form "fund_return_3years." Clicking Split in Tableau automatically creates three fields (fund, return and 3). What's great is that the last field is automatically a measure. Cheers!

NEXT STEPS

  • Incorporate machine learning and data models (e.g., Markov-Switching Multifractals, GARCH) to forecast volatility of individual investments and the portfolio
  • Build a sandbox to visualize how a different mix of investments will perform in the future with a set of assumptions
  • Design a robust interface to compare two investments
  • Customize alerts for user-defined thresholds


Data Analyst | A self-identified creative mind trapped inside a mathematical brain | https://rydernguyen.github.io/Portfolio/