Machine Learning used to build a Diversified Portfolio: K-Means Clustering

Hugh Donnelly
Published in Analytics Vidhya
12 min read · Sep 30, 2021

Introduction

In this article, we will explore K-Means Clustering:

  • What is K-Means Clustering?
  • Algorithm
  • K-Means Clustering Application: Building a diversified portfolio

Jupyter Notebooks are available on Google Colab and GitHub.

For this project, we use the Python scientific computing libraries imported below.

import time
import kneed
import requests
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
import ipywidgets as widgets
from scipy.stats import mstats
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from datetime import datetime, timedelta
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError
from urllib3.util.retry import Retry  # import Retry from urllib3 directly rather than via the requests.packages alias

What is K-Means Clustering?

K-Means Clustering is a form of unsupervised machine learning (ML), and one of the simplest and most popular techniques of its kind. Unsupervised algorithms work on feature vectors that describe each data point; the data points themselves are not labeled or classified. Our goal is to discover hidden patterns and group the data points in a sensible way based on the similarity of their features. Each group of data points is a cluster, and each cluster has a center (its centroid).
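To make the idea of clusters and centers concrete, here is a minimal sketch using scikit-learn's KMeans on synthetic data. The two blobs, the choice of two clusters, and all variable names are assumptions made for illustration only; they are not the portfolio data used later in the article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, unlabeled data (an assumption for illustration):
# two well-separated blobs of 2-D feature vectors.
rng = np.random.default_rng(seed=0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Fit K-Means with k=2. No labels are passed in; the algorithm
# groups points purely by how close their features are.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)      # cluster index assigned to each point
centers = model.cluster_centers_   # one center per cluster, shape (2, 2)

print("First five labels:", labels[:5])
print("Last five labels: ", labels[-5:])
print("Cluster centers:\n", centers)
```

Each point receives the index of the cluster whose center is nearest to it. Which blob ends up as cluster 0 or cluster 1 is arbitrary, so only the grouping itself is meaningful.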

Hugh founded AlphaWave Data in 2020 and is responsible for risk, attribution, portfolio construction, and investment solutions.