Coding Markowitz’s Efficient Frontier with Python and Streamlit

Guilherme Ziegler
14 min read · Mar 26, 2024


In this series of posts, I will delve into crucial technical skills that every economist should master before graduating from university.

In this inaugural part, I will walk you through coding Markowitz’s Efficient Frontier using Python and Streamlit.

In Part 1, we will explore how to run simulations to optimize portfolio allocation. In Part 2, we will cover backtesting trading strategies to evaluate the practical applicability of Markowitz’s theory.

Our primary aim is to decide if we need to rebalance portfolios because of shifts in risk.

If you want to see how this work concludes, just wake up my app in Streamlit and create your own simulations.

Introducing the Efficient Frontier

The efficient frontier (EF) theory, introduced by Nobel Laureate Harry Markowitz in 1952, is a fundamental concept in modern portfolio theory (MPT). It evaluates portfolios based on their return (y-axis) versus risk (x-axis).

The compound annual growth rate (CAGR) represents return, while annualized standard deviation indicates risk.

The efficient frontier visually displays portfolios that maximize returns for a given level of risk. Investors aim to build portfolios that offer high returns and low combined risk.

It relies on statistical covariance among assets to determine if their returns are less synchronized, meaning they are less likely to move in the same direction at the same time. Thus, the portfolio risk decreases.
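To make the effect of covariance concrete, here is a minimal sketch for a two-asset portfolio. The volatilities, weights, and correlations are hypothetical values chosen for illustration, and two_asset_volatility is a helper name introduced here, not part of the app.

```python
import math

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2  # covariance term
    )
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding

# Hypothetical inputs: assets with 20% and 30% annual volatility, equal weights
vols_by_correlation = {
    rho: two_asset_volatility(0.5, 0.20, 0.30, rho)
    for rho in (1.0, 0.5, 0.0, -0.5, -1.0)
}
for rho, vol in vols_by_correlation.items():
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.4f}")
```

With perfectly correlated assets the portfolio volatility is just the weighted average of the two volatilities. As the correlation falls, the same weights produce a less risky portfolio.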

Mean Variance Optimization (MVO)

MVO is an optimization technique that determines the weights of assets in a portfolio to achieve higher returns with lower risks.

It relies on historical data to estimate expected returns and covariance matrices. However, MVO does not differentiate upside from downside risk. Since investors tend to be risk-averse, they might prioritize avoiding negative returns over capturing potential positive returns.

Additionally, MVO does not consider factors such as taxes, transaction costs, liquidity, or technical and fundamental analysis. It also overlooks asymmetric information such as insider trading.

As the bull gores and the bear swipes downward, investors are more likely to fear market drawbacks than enjoy gains.

Getting to know the Sharpe Ratio

To find the best portfolio allocations, we must define three fundamental variables:

  • Rp = Expected portfolio return
  • Rf = Risk-free rate of return
  • StdDev Rp = Standard deviation of portfolio return (or, volatility)

The expected portfolio return is calculated using empirical data from the past, typically by computing the average of daily logarithmic returns. To annualize this return, it’s common practice to consider 252 trading days per year.

The risk-free rate of return represents the investment return that one could achieve without taking any risks. It is usually proxied by the yield on government bonds, such as short-term Treasury bills.

Standard deviation is a dispersion metric that shows how much data points deviate from the mean. Variance is calculated by averaging the squared differences between each data point and the mean, and standard deviation is the square root of variance. In terms of stocks, it can be understood as how much daily returns deviate from the average return.
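As a rough sketch of these definitions, the snippet below computes the mean and standard deviation of daily log returns and annualizes both. The prices are made-up numbers, and 252 trading days is the convention assumed above. Note that statistics.variance uses the sample formula (dividing by n - 1), which is also what pandas does by default.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year

# Hypothetical closing prices for one asset (not real market data)
prices = [100.0, 101.0, 99.5, 100.5, 102.0, 101.2]

# Daily log returns: log of each price divided by the previous one
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

mean_daily = statistics.mean(log_returns)
var_daily = statistics.variance(log_returns)  # sample variance (divides by n - 1)
std_daily = math.sqrt(var_daily)              # standard deviation = sqrt(variance)

annual_return = mean_daily * TRADING_DAYS                # returns scale with time
annual_volatility = std_daily * math.sqrt(TRADING_DAYS)  # volatility scales with sqrt(time)

print(f"annualized return:     {annual_return:.4f}")
print(f"annualized volatility: {annual_volatility:.4f}")
```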

As expected returns and standard deviation are determined by empirical market data, it is a significant assumption to expect them to remain constant in the future. This is a key weakness of Modern Portfolio Theory (MPT), as most time series are recognized to be non-stationary stochastic processes.

An investor who embraces risk would accept higher risk for higher expected returns. Conversely, a risk-averse investor would prioritize lower risk, even at the cost of lower expected returns.

We use those variables to calculate the Sharpe Ratio, a metric that quantifies the relationship between return and risk. A Sharpe Ratio of one means the portfolio earns one unit of excess return for each unit of risk:

Sharpe Ratio = (Rp - Rf) / StdDev Rp

Equation 1. Sharpe Ratio formula
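A minimal sketch of the Sharpe Ratio, using the three variables defined above. The function name sharpe_ratio and the input figures are illustrative assumptions.

```python
def sharpe_ratio(portfolio_return, risk_free_rate, portfolio_volatility):
    """Excess return per unit of risk: (Rp - Rf) / StdDev Rp."""
    if portfolio_volatility <= 0:
        raise ValueError("portfolio_volatility must be positive")
    return (portfolio_return - risk_free_rate) / portfolio_volatility

# Hypothetical annualized figures: 12% return, 5% risk-free rate, 14% volatility
example_sharpe = sharpe_ratio(0.12, 0.05, 0.14)
print(f"Sharpe Ratio: {example_sharpe:.2f}")
```

All three inputs must be expressed over the same horizon (here, annualized), otherwise the ratio is not comparable across portfolios.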

Because the Sharpe Ratio offers a standardized measure of the risk-return tradeoff, portfolios are typically optimized for the maximum Sharpe Ratio.

However, investors still have the flexibility to choose their preferred level of risk and calculate the weight compositions that generate the desired return.

Setting Streamlit up

To start this project, you need to have some familiarity with Streamlit. I will not be covering how to build your application in Streamlit here since they have an excellent tutorial to guide you through the process.

An important hint, though, is how to use Streamlit’s session state to retrieve, save, and modify data throughout the project. For that, we are going to use the following code:

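import streamlit as st
import pandas as pd

class SessionState:
    """Simple container that stores arbitrary Python objects as attributes."""
    def __init__(self, **kwargs):
        for key, val in kwargs.items():
            setattr(self, key, val)

# st.cache(allow_output_mutation=True) is deprecated in recent Streamlit
# releases; st.cache_resource is its replacement for mutable objects.
# Note that a cached resource is shared by every user of the app.
@st.cache_resource
def get_session():
    return SessionState(df=pd.DataFrame())

session_state = get_session()

# Creating a dictionary with some data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 35, 40, 25]
}

# Creating a DataFrame with the data
df = pd.DataFrame(data)

# Assigning this DataFrame to session_state
session_state.df = df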

This code defines the SessionState class, allowing us to store Python objects within it. By invoking the get_session function, we can retrieve any Python objects saved in the session state. In this example, a Pandas DataFrame named df is defined. The code then assigns this DataFrame to the attribute session_state.df within the session state.

Basic libraries for the project

import streamlit as st  # library for building interactive web apps
import pandas as pd # Pandas library for data manipulation and analysis
import numpy as np # NumPy library for numerical computing
from datetime import date # Importing date class from datetime module for handling dates

# Libraries for retrieving and downloading data
import requests # Making HTTP requests
from io import BytesIO # Working with binary data streams
import yfinance as yf # Yahoo Finance library for retrieving financial data
import base64 # Encoding and decoding binary data to/from ASCII

# Plotting libraries
import plotly.express as px # Plotly Express for creating interactive plots
import plotly.graph_objects as go # Plotly Graph Objects for creating more customizable plots

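First of all, if you are using Streamlit to render an application, you may want to open files from GitHub. The function below loads data from a GitHub URL using the raw URL format.

def load_data_from_github(url):
    """Download a pickled object from a raw GitHub URL and load it with pandas."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on a 404 or other HTTP error
    content = BytesIO(response.content)
    # only unpickle files you trust: pickle can execute arbitrary code on load
    data = pd.read_pickle(content)
    return data

It is quite handy for loading static data files into your project, such as pickled dictionaries, Parquet files, and even CSVs (swap pd.read_pickle for the matching pandas reader). It definitely saves you some lines of code.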

Retrieving data

For convenience, we are using the 'yfinance' library to retrieve data from the stock market. The function below accepts either a list of tickers or a dictionary mapping asset names to tickers, and loops over each entry.

def download_data(data, period='1y'):
    dfs = []
    if isinstance(data, dict):
        for name, ticker in data.items():
            ticker_obj = yf.Ticker(ticker)
            hist = ticker_obj.history(period=period)  # timeframe
            # Prefix every column with the asset name
            hist.columns = [f"{name}_{col}" for col in hist.columns]
            # Keep only the date part of the index
            hist.index = pd.to_datetime(hist.index.map(lambda x: x.strftime('%Y-%m-%d')))
            dfs.append(hist)
    elif isinstance(data, list):
        for ticker in data:
            ticker_obj = yf.Ticker(ticker)
            hist = ticker_obj.history(period=period)
            # Prefix every column with the ticker
            hist.columns = [f'{ticker}_{col}' for col in hist.columns]
            hist.index = pd.to_datetime(hist.index.map(lambda x: x.strftime('%Y-%m-%d')))
            dfs.append(hist)
    # Use join='outer' to handle different date indices
    combined_df = pd.concat(dfs, axis=1, join='outer')
    return combined_df

As we are rendering some Streamlit widgets, I allow users to decide which method they are more comfortable with. The Streamlit code to choose assets assumes that you have defined and stored a combination of tickers in a dataframe.

To leave room for future features, I have enabled the download option to retrieve all available historical data. This includes open, close, high, and low prices, as well as adjusted prices. Occasionally, we also receive volume, dividends, stock splits, and capital gains information. We will be working with the Close price columns.
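Since download_data prefixes every column with the asset name, keeping only the Close prices comes down to a column filter. The sketch below assumes columns named like AAPL_Close; the frame and its values are made up to stand in for a real download.

```python
import pandas as pd

# Hypothetical frame shaped like the output of download_data (values are made up)
combined_df = pd.DataFrame(
    {
        "AAPL_Open": [189.0, 190.2],
        "AAPL_Close": [190.1, 191.5],
        "AAPL_Volume": [5.1e7, 4.8e7],
        "GOOG_Open": [140.3, 141.0],
        "GOOG_Close": [141.2, 140.6],
        "GOOG_Volume": [2.0e7, 2.2e7],
    },
    index=pd.to_datetime(["2024-03-25", "2024-03-26"]),
)

suffix = "_Close"
# Keep only the columns ending in "_Close" ...
close_df = combined_df.filter(regex=f"{suffix}$")
# ... and strip the suffix so each column is named after its asset
close_df = close_df.rename(columns=lambda col: col[: -len(suffix)])
print(close_df)
```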

For this project, I have constructed a set of dictionaries containing tickers for the B3 BOVESPA, S&P500, NASDAQ, commodities, cryptocurrencies, and USD pairs.

Additionally, I built features for the user to choose the timeframe for the data, perform resampling, treat missing values, and calculate rolling averages.
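The app's own widgets for these steps are not shown in this post, so the following is only a sketch of the three operations in plain pandas. The price series is made up, and forward-filling is just one possible treatment for missing values.

```python
import pandas as pd

# Hypothetical daily prices with one missing observation (values are made up)
dates = pd.date_range("2024-01-01", periods=14, freq="D")
prices = pd.Series([100.0 + i for i in range(14)], index=dates, name="ASSET")
prices.iloc[3] = float("nan")

filled = prices.ffill()                     # missing values: carry the last price forward
weekly = filled.resample("W").last()        # resampling: last price of each week
smoothed = filled.rolling(window=3).mean()  # rolling average over three observations

print(weekly)
print(smoothed.tail())
```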

# select box widget to choose timeframes
selected_timeframes = st.selectbox('Select Timeframe:', ['1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max'], index=7)

# creating a dictionary of dictionaries with all available tickers
assets_list = {'CURRENCIES': currencies_dict, 'CRYPTO': crypto_dict, 'B3_STOCKS': b3_stocks, 'SP500': sp500_dict, 'NASDAQ100': nasdaq_dict, 'indexes': indexes_dict}

# combining dictionaries when the user selects one or more in assets_list
selected_dict_names = st.multiselect('Select dictionaries to combine', list(assets_list.keys()))
combined_dict = {}
for name in selected_dict_names:
    dictionary = assets_list.get(name)
    if dictionary:
        combined_dict.update(dictionary)

# dictionary to actually store retrieved data
selected_ticker_dict = {}

# looping through the chosen tickers
if selected_dict_names:
    # the list to iterate over tickers
    tickers = st.multiselect('Asset Selection', list(combined_dict.keys()))
    # each button needs its own key because both share the same label
    if tickers and st.button("Download data", key="download_selected"):
        for key in tickers:
            if key in combined_dict:
                selected_ticker_dict[key] = combined_dict[key]
        # Assigning data object as the result of the function download_data
        session_state.data = download_data(selected_ticker_dict, selected_timeframes)

# Handle tickers entered manually
type_tickers = st.text_input('Enter Tickers (comma-separated):')
if type_tickers and st.button("Download data", key="download_typed"):
    tickers = [ticker.strip() for ticker in type_tickers.split(',')]
    # doing the same for tickers separated by commas
    session_state.data = download_data(tickers, selected_timeframes)

If you have done everything correctly, you should be able to render the widget and get a result like this:

Video 1. Download from yahoo finance in streamlit

Simulating the Frontier

This is probably one of my favorite sections, where we delve into portfolio simulations. Firstly, there are various methods to apply MVO (Mean-Variance Optimization). I opted for the traditional approach, but you can easily accomplish the task using PyPortfolioOpt.

Damian Boh did an excellent job in his post titled ‘Easily Optimize a Stock Portfolio using PyPortfolioOpt in Python.’ While we are covering similar topics, if you are pressed for time to complete the task, you can refer to his writings. As a keepsake, I will provide you with a quick example.

from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
import pandas as pd

# Placeholder prices with made-up values (replace them with Close prices downloaded from Yahoo Finance)
stock_data = {
    "AAPL": [150.0, 150.6, 150.2, 151.0, 151.4],
    "GOOG": [140.0, 140.2, 140.9, 140.7, 141.1],
    "MSFT": [410.0, 411.5, 410.8, 412.0, 412.9]
}

stocks_df = pd.DataFrame(stock_data)

# Calculate expected returns and sample covariance matrix (both functions expect prices, not returns)
mu = expected_returns.mean_historical_return(stocks_df)
S = risk_models.sample_cov(stocks_df)

# Create the Efficient Frontier object
ef = EfficientFrontier(mu, S)

# Optimize for maximum Sharpe ratio
weights = ef.max_sharpe()

# Clean the weights (optional): rounds them and zeroes out negligible values
cleaned_weights = ef.clean_weights()

# Print the cleaned weights
print(cleaned_weights)

# Print the expected annual return, annual volatility and Sharpe ratio
ef.portfolio_performance(verbose=True)

# all done!

Finally, the Frontier

It is also possible to use a solver and implement a minimize function. This will save you a lot of time, even though on my 16GB RAM computer, 10,000 simulations took only about one minute to complete. Regardless, applying MVO relies on:

  • Estimating the annualized returns of the assets.
  • Calculating the portfolio variance.
  • Finding the covariance between each pair of assets in your portfolio.
  • Calculating the Sharpe Ratio and determining the optimal weights for each asset that maximize it.
  • At least these constraints are necessary: the weights must sum to one, and each weight must be non-negative (short selling is not allowed in this tutorial).

See below how to implement the code:

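import numpy as np
import cvxpy as cp

# Placeholder inputs with made-up annualized figures; replace them with your own estimates
expected_returns = np.array([0.12, 0.10, 0.15])
covariance_matrix = np.array([
    [0.0400, 0.0060, 0.0120],
    [0.0060, 0.0225, 0.0090],
    [0.0120, 0.0090, 0.0625],
])

# Risk-free rate
risk_free_rate = 0.05

# Number of assets
n_assets = len(expected_returns)

# The Sharpe Ratio itself is not a convex expression, so cvxpy rejects it as
# an objective. A standard workaround optimizes a rescaled vector y that is
# proportional to the weights: minimize y' Σ y subject to (mu - rf)' y = 1.
y = cp.Variable(n_assets)

# Portfolio variance of the rescaled weights
portfolio_variance = cp.quad_form(y, covariance_matrix)

# Define objective function (minimum variance per unit of excess return,
# which is equivalent to maximizing the Sharpe Ratio)
objective = cp.Minimize(portfolio_variance)

# Define constraints
constraints = [
    (expected_returns - risk_free_rate) @ y == 1,  # excess return fixed at one unit
    y >= 0  # weights are non-negative
]

# Extra constraints, such as minimum and maximum weights, must be rewritten
# in terms of the rescaled vector y.

# Solve the optimization problem
problem = cp.Problem(objective, constraints)
problem.solve()

# Rescale so that the weights sum to 1 (fully invested)
weights = y.value / y.value.sum()
portfolio_return = expected_returns @ weights
portfolio_volatility = np.sqrt(weights @ covariance_matrix @ weights)
portfolio_sharpe_ratio = (portfolio_return - risk_free_rate) / portfolio_volatility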

As intelligent investors, our focus lies in constructing a diversified portfolio with positive returns.

To achieve this, we define returns as the logarithmic transformation of tomorrow’s adjusted close price divided by today’s adjusted close price.

Essentially, it breaks down as follows:

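r_t = ln(P_{t+1} / P_t), where P_t is the adjusted close price on day t

Equation 2. Calculating the returns

The underlying concept here is to calculate the rate of growth and the pace at which this rate occurs. In essence, we are computing the first difference of the logarithm of adjusted prices, a discrete analogue of the first derivative.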
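As a small illustration of why log returns are convenient, the sketch below compares them with simple returns on a made-up price series. The numbers are hypothetical.

```python
import numpy as np

# Hypothetical adjusted close prices (not real market data)
prices = np.array([50.0, 51.0, 49.98, 52.479])

simple_returns = prices[1:] / prices[:-1] - 1.0  # percentage change between consecutive prices
log_returns = np.diff(np.log(prices))            # first difference of the log prices

# Both definitions carry the same information: log return = ln(1 + simple return)
assert np.allclose(log_returns, np.log1p(simple_returns))

# Log returns add up over time, while simple returns compound
total_log_return = log_returns.sum()
total_simple_return = np.prod(1.0 + simple_returns) - 1.0

print(f"sum of log returns:        {total_log_return:.6f}")
print(f"compounded simple returns: {total_simple_return:.6f}")
```

Because log returns are additive, averaging them and multiplying by the number of periods is a consistent way to annualize, which is what the rest of this post relies on.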

The code below calculates and plots the log returns for my test portfolio. I selected Banco do Brasil (BBAS3.SA), Biogen (BIIB), Apple (AAPL), and Google (GOOG).

def logreturns(df):
    # Work on a copy and keep only the asset name from columns such as 'AAPL_Close'
    df = df.copy()
    df.columns = df.columns.str.split('_').str[0]
    # Log return: log of each price divided by the previous one
    log_returns = np.log(df / df.shift(1))

    fig = px.line(log_returns, x=log_returns.index, y=log_returns.columns,
                  labels={'value': 'log'},
                  title='Log Returns')
    fig.update_layout(legend_title_text='Assets')
    st.plotly_chart(fig)

    return log_returns

If you have everything set up correctly, you should visualize an image like this for your chosen assets:

Image 1. Log returns for the portfolio.

As you can observe, the log transformation makes the time series closer to stationarity. We can verify this by applying the augmented Dickey-Fuller test. Check out how to work with time series on my new post.

Nonetheless, there may still be a few structural breakpoints for the log returns in which the prices were more volatile than their mean.

It is hard, though, to determine whether the assets showed positive or negative returns just by looking at log returns alone.

To address this, we can plot returns from the initial date by anchoring the first date and iterating through rows. Let’s see how it works:

def return_over_time(df):
    return_df = pd.DataFrame()
    df = df.copy()
    df.columns = df.columns.str.split('_').str[0]
    for col in df.columns:
        # cumulative return relative to the first available price
        return_df[col] = df[col] / df[col].iloc[0] - 1

    fig = px.line(return_df, x=return_df.index, y=return_df.columns,
                  labels={'value': 'Returns to Date'},
                  title='Returns')
    fig.update_layout(legend_title_text='Assets')
    st.plotly_chart(fig)  # for streamlit plots

    return return_df

The results should be something like this:

Image 2. Return over time for the selected tickers.

As depicted in Image 2, only Biogen showed a negative return for the last date considered. Banco do Brasil went through a rough patch from 2020 to 2023, but returns were positive in 2024.

Return of the portfolio

Calculating portfolio returns involves multiplying weights by the average log returns. While the weights are generated randomly for each simulation, the average log returns require us to assume a mean of the data points.

In this case, the steps we follow are:

  1. Resample the dataframe for the chosen timeframe.
  2. Calculate the log returns of the resampled prices and take their average.
  3. Multiply by the number of periods per year (252 for daily data, 4 for quarterly, 1 for annual) to obtain annualized returns.

resampler = 'A'  # for annual ('YE' in pandas 2.2 and later)
periods_per_year = 1  # must match the resampler: 1 for annual, 4 for quarterly, 252 for daily data

# Step 1. Resample keeping only the last values of each year
resampled = df.resample(resampler).last()

# Step 2. Calculate the log returns and take the average
# Step 3. Multiply it by the number of periods per year
annualized_returns = resampled.pct_change().apply(lambda x: np.log(1 + x)).mean() * periods_per_year
simple_returns = np.exp(annualized_returns) - 1

As a reminder, you can annualize returns from any resampling frequency. For instance, if you want a quarterly analysis, change the resampling parameter to ‘3M’, set periods_per_year to 4, and execute the snippet above. Since Mean-Variance Optimization (MVO) assumes returns are normally distributed, it is standard procedure to work with annualized returns.

A more conservative approach would be to take the average for daily log returns without resampling the original dataframe and then multiply it by the number of trading days. All you need to do is follow steps 2 and 3.

The Sharpe Ratio is sensitive to how you calculate average returns, and using it implies that the returns observed up to today will be replicated in the future. This assumption may simply not hold true.

Anyhow, portfolio return can be obtained by multiplying each asset return (ri) by its weight (wi):

Rp = w1 · r1 + w2 · r2 + … + wn · rn

Equation 3. Portfolio return
  • n is the number of assets in the portfolio.
  • ri is the return of asset i.
  • wi is the weight of asset i in the portfolio.
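The weighted sum above can be sketched in a few lines. The returns and weights are hypothetical values chosen for illustration.

```python
# Hypothetical annualized asset returns r_i and portfolio weights w_i
asset_returns = [0.10, 0.06, 0.12]
weights = [0.5, 0.3, 0.2]

assert abs(sum(weights) - 1.0) < 1e-12  # the portfolio is fully invested

# Rp = w1 * r1 + w2 * r2 + ... + wn * rn
portfolio_return = sum(w * r for w, r in zip(weights, asset_returns))
print(f"Portfolio return: {portfolio_return:.3f}")
```

With NumPy arrays the same calculation is np.dot(weights, asset_returns), which is the form used in the simulation later on.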

Portfolio risk

To calculate the risk of a portfolio we need to determine weights of the assets in the portfolio, the variance of each asset alone, and the covariance between each pair of assets:

Var(Rp) = Σi Σj wi · wj · Cov(i, j), with i and j running from 1 to n

For two assets, this expands to Var(Rp) = w1² · Var(1) + w2² · Var(2) + 2 · w1 · w2 · Cov(1,2).

Equation 4. Portfolio variance

The term “product of the weights and covariance between assets” captures the combined influence of asset weights (e.g., w1 and w2) and their covariance (e.g., Cov(1,2)).

A positive covariance between assets 1 and 2 means they move in the same direction, making the portfolio risk higher. Conversely, when covariance is negative, they move in different directions, which lowers the portfolio risk but also means at least one asset is performing poorly.

So, diversification is nothing more than finding assets that move upward in different ways.

In other words, diversification relies on selecting assets that appreciate but not at the same pace or for the same reasons.

We can calculate the portfolio variance as follows:

trading_days = 252
var = cov_matrix.mul(weights, axis=0).mul(weights, axis=1).sum().sum()
sd = np.sqrt(var) # obtain the risk
ann_sd = sd*np.sqrt(trading_days) # to scale the risk for any timeframe
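The chained .mul(...).sum().sum() call above is the double sum from the variance formula, which can also be written as the quadratic form w' Σ w. The sketch below checks that both give the same number; the covariance matrix and weights are made-up values.

```python
import numpy as np

# Hypothetical daily covariance matrix for three assets and a set of weights
cov_matrix = np.array([
    [0.00040, 0.00010, 0.00005],
    [0.00010, 0.00025, 0.00008],
    [0.00005, 0.00008, 0.00030],
])
weights = np.array([0.5, 0.3, 0.2])
trading_days = 252

# Double sum from the variance formula: sum over i and j of w_i * w_j * Cov(i, j)
n = len(weights)
var_double_sum = sum(
    weights[i] * weights[j] * cov_matrix[i, j] for i in range(n) for j in range(n)
)

# The same quantity written as the quadratic form w' Σ w
var_quadratic = weights @ cov_matrix @ weights

daily_sd = np.sqrt(var_quadratic)
ann_sd = daily_sd * np.sqrt(trading_days)  # annualized volatility
print(f"daily variance: {var_quadratic:.6f}, annualized volatility: {ann_sd:.4f}")
```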

What matters here is scaling both returns and risk to the same level. However, in part 2, we are going to take some leaps of faith by scaling returns for much shorter time frames. As the Sharpe Ratio depends on both return and risk, the timeframe matters for finding the highest Sharpe Ratio portfolio.

Simulating the frontier

Finally, we have all we need to simulate the Efficient Frontier. For that purpose, we will generate a thousand portfolio configurations and calculate the Sharpe Ratio of each one from its annualized expected return and risk.

def efficient_frontier(df,
                       trading_days,
                       risk_free_rate,
                       simulations=1000,
                       resampler='A'):
    # `resampler` is kept for interface compatibility but is not used here:
    # daily log returns feed both the mean and the covariance, so return and
    # risk are annualized on the same basis
    log_returns = df.pct_change().apply(lambda x: np.log(1 + x))

    # annualized mean log return of each asset (the same for every simulation)
    returns = log_returns.mean() * trading_days

    # covariance matrix of the daily log returns
    cov_matrix = log_returns.cov()

    # lists to store the results
    portfolio_returns = []
    portfolio_volatility = []
    portfolio_weights = []

    num_assets = len(df.columns)

    for _ in range(simulations):
        # random weights, scaled so they sum to 1
        weights = np.random.random(num_assets)
        weights = weights / np.sum(weights)
        portfolio_weights.append(weights)
        # annualized portfolio return
        annualized_returns = np.dot(weights, returns)
        portfolio_returns.append(annualized_returns)
        # portfolio variance
        var = cov_matrix.mul(weights, axis=0).mul(weights, axis=1).sum().sum()
        sd = np.sqrt(var)  # daily standard deviation
        ann_sd = sd * np.sqrt(trading_days)  # annualized standard deviation
        portfolio_volatility.append(ann_sd)

    # storing returns and volatility in a dataframe
    data = {'Returns': portfolio_returns, 'Volatility': portfolio_volatility}

    for counter, symbol in enumerate(df.columns.tolist()):
        data[symbol + ' weight'] = [w[counter] for w in portfolio_weights]

    simulated_portfolios = pd.DataFrame(data)
    simulated_portfolios['Sharpe_ratio'] = (simulated_portfolios['Returns'] - risk_free_rate) / simulated_portfolios['Volatility']

    return simulated_portfolios
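One detail worth knowing about the weight generation: normalizing independent uniform draws, as the function above does, tends to produce weights clustered around an equal split, so very concentrated portfolios are rare. The sketch below compares it with a Dirichlet draw, which samples uniformly from all non-negative weight vectors that sum to one. The Dirichlet alternative is my suggestion for illustration, not what the app uses, and the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the simulation is reproducible
num_assets, simulations = 4, 1000

# Approach used in efficient_frontier: normalize independent uniform draws
uniform_weights = rng.random((simulations, num_assets))
uniform_weights /= uniform_weights.sum(axis=1, keepdims=True)

# Alternative (an assumption, not the article's method): Dirichlet(1, ..., 1)
dirichlet_weights = rng.dirichlet(np.ones(num_assets), size=simulations)

# How spread out are the weights of a single asset under each scheme?
print("std of first asset weight, normalized uniform:", uniform_weights[:, 0].std().round(3))
print("std of first asset weight, Dirichlet:         ", dirichlet_weights[:, 0].std().round(3))
```

Either array can replace the per-iteration weights in the loop, one row per simulated portfolio.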

Now we can plot the frontier in Streamlit and do some customization by introducing expected_sharpe, expected_return, and risk_taken as the initial configuration we are interested in.

def plot_efficient_frontier(simulated_portfolios, expected_sharpe, expected_return, risk_taken):
    simulated_portfolios = simulated_portfolios.sort_values(by='Volatility')

    # concatenating weights so we can hover on them as we select data points
    simulated_portfolios['Weights'] = simulated_portfolios.iloc[:, 2:-1].apply(
        lambda row: ', '.join([f'{asset}: {weight:.4f}' for asset, weight in zip(simulated_portfolios.columns[2:-1], row)]),
        axis=1
    )
    # creating the plot as a scatter graph
    frontier = px.scatter(simulated_portfolios, x='Volatility', y='Returns', width=800, height=600,
                          title="Markowitz's Efficient Frontier", labels={'Volatility': 'Volatility', 'Returns': 'Return'},
                          hover_name='Weights')

    # getting the index of max Sharpe Ratio and painting it green
    max_sharpe_ratio_portfolio = simulated_portfolios.loc[simulated_portfolios['Sharpe_ratio'].idxmax()]
    frontier.add_trace(go.Scatter(x=[max_sharpe_ratio_portfolio['Volatility']],
                                  y=[max_sharpe_ratio_portfolio['Returns']],
                                  mode='markers',
                                  marker=dict(color='green', size=10),
                                  name='Max Sharpe Ratio',
                                  text=max_sharpe_ratio_portfolio['Weights']))

    # Getting portfolios where returns are above our expectations and below
    # the amount of risk we aim to take
    low_risk_portfolios = simulated_portfolios[
        (simulated_portfolios['Returns'] >= expected_return) &
        (simulated_portfolios['Volatility'] <= risk_taken)
    ]

    frontier.add_trace(go.Scatter(x=low_risk_portfolios['Volatility'],
                                  y=low_risk_portfolios['Returns'],
                                  mode='markers',
                                  marker=dict(color='purple', size=5),
                                  name='Expected Return & Risk Taken',
                                  text=low_risk_portfolios['Weights']))

    # Selecting portfolios with our initial Sharpe ratio expectations
    # and painting the first match orange
    expected_portfolio = simulated_portfolios[
        (simulated_portfolios['Sharpe_ratio'] >= expected_sharpe - 0.001) &
        (simulated_portfolios['Sharpe_ratio'] <= expected_sharpe + 0.001)
    ]

    if not expected_portfolio.empty:
        frontier.add_trace(go.Scatter(x=[expected_portfolio['Volatility'].values[0]],
                                      y=[expected_portfolio['Returns'].values[0]],
                                      mode='markers',
                                      marker=dict(color='orange', size=10),
                                      name='Expected Sharpe Ratio',
                                      text=[expected_portfolio['Weights'].values[0]]))

    # selecting the portfolio with lowest risk (first row after sorting) and painting it red
    frontier.add_trace(go.Scatter(x=[simulated_portfolios.iloc[0]['Volatility']],
                                  y=[simulated_portfolios.iloc[0]['Returns']],
                                  mode='markers',
                                  marker=dict(color='red', size=10),
                                  name='Min Volatility',
                                  text=simulated_portfolios.iloc[0]['Weights']))

    frontier.update_layout(legend=dict(
        orientation='h',
        yanchor='top',
        y=1.1,
        xanchor='center',
        x=0.5))

    st.plotly_chart(frontier)

If you did everything shown here, you are expected to end up with a frontier like this:

Image 3. Efficient Frontier plot

Note that the points along the upper-left edge of the cloud form the efficient frontier: for each level of risk, they offer the highest return. Our initial Sharpe expectations, considering the risk we take, are marked in orange.

The maximum Sharpe Ratio is marked in green, and the lowest risk portfolio is marked in red.

Lastly, purple data points represent portfolios in which expected returns exceed our initial expectations, but we only take a defined amount of risk. Among those, we should prefer the portfolios that also lie on the upper-left edge of the plot.
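The highlighted points can also be pulled out of the simulation table directly. The sketch below uses a tiny made-up table with the same Returns, Volatility, and Sharpe_ratio columns, and an assumed risk-free rate, to pick the maximum Sharpe and minimum volatility portfolios and to approximate the upper-left edge.

```python
import pandas as pd

# Hypothetical simulation output (values are made up for illustration)
simulated = pd.DataFrame({
    "Returns":    [0.05, 0.04, 0.08, 0.07, 0.09],
    "Volatility": [0.10, 0.12, 0.15, 0.18, 0.20],
})
risk_free_rate = 0.02  # assumed
simulated["Sharpe_ratio"] = (simulated["Returns"] - risk_free_rate) / simulated["Volatility"]

max_sharpe = simulated.loc[simulated["Sharpe_ratio"].idxmax()]
min_vol = simulated.loc[simulated["Volatility"].idxmin()]

# Upper-left edge: walking from low to high risk, keep a portfolio only if it
# beats the best return seen at any lower level of risk
by_risk = simulated.sort_values("Volatility")
best_so_far = by_risk["Returns"].cummax().shift(1, fill_value=float("-inf"))
frontier = by_risk[by_risk["Returns"] > best_so_far]

print(frontier)
```

Portfolios dropped by the last step are dominated: another simulated portfolio offers a higher return with less risk.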

Video 2. Streamlit Portfolio Balancer

Final Considerations

In conclusion, the analysis conducted provides valuable insights into portfolio optimization and risk management. By simulating the Efficient Frontier and exploring various portfolio configurations, we have gained a deeper understanding of how to balance risk and return in investment portfolios.

Mean-Variance Optimization (MVO) is mandatory knowledge in modern portfolio theory (MPT). However, because it is fitted to historical data, it only suggests the weights one should have bought; its ability to predict the future is limited. It relies on the premises of diversification and risk avoidance.

Having said that, MVO is one of the most used techniques for rebalancing your portfolio and defining new weights. Nonetheless, does it really work when backtesting empirical data?

In Part 2, I present a script that rebalances portfolios using the previously determined maximum Sharpe Ratio weights and compares the total portfolio return with evenly distributed weights.

Additionally, I will explore whether minimum risk optimization is justified over simply assuming systematic risk.

Can we actually use the Sharpe Ratio in a non-figurative way?
Should we be indifferent to market risk?

Thank you for reading and for your support! If you have read my article, please clap and comment. 😊

References

Streamlit. (2024). Streamlit. Retrieved from https://streamlit.io/.

PyTech Academy. (2023). Streamlit App Deployment: Step-by-Step Guide to Streamlit Cloud. Medium. https://pytechacademy.medium.com/streamlit-app-deployment-step-by-step-guide-to-streamlit-cloud-37c7eb2687d6.

DataDrivenInvestor. (2022). Easily Optimize a Stock Portfolio using PyPortfolioOpt in Python. Medium. Retrieved from https://medium.datadriveninvestor.com/easily-optimize-a-stock-portfolio-using-pyportfolioopt-in-python-80492b83912a.

Markowitz, H. (1952). “Portfolio Selection.” Journal of Finance, 7(1), 77–91.

Sharpe, W. F. (1994). “The Sharpe Ratio.” Journal of Portfolio Management, 21(1), 49–58. Retrieved from https://web.stanford.edu/~wfsharpe/art/sr/sr.htm.
