Time Series Classification using Dynamic Time Warping K-Nearest Neighbours

Quant Club, IIT Kharagpur
15 min read · Nov 5, 2023

Have you ever wondered if there’s a stock out there that’s moving in a strikingly similar pattern to the one you’re watching? Well, we’ve been diving deep into this question and found a fascinating approach to unravel this mystery!

Imagine you’ve got a favourite song, and you want to find another track with the same rhythm and vibe. In the world of time series (like stock prices), there’s this cool method called Dynamic Time Warping (DTW). It’s a tool that measures how similar two sequences are, even if they are out of sync.

But that’s only half the puzzle. We pair DTW with the K-Nearest Neighbors (KNN) algorithm. KNN is a straightforward yet powerful algorithm that scans for the “closest” items to your item of interest. In our case, it hunts for stocks that have the most similar price patterns to our chosen stock.

So, by combining DTW’s sequence similarity magic with KNN’s neighbour-finding process, we’ve created a nifty model to spotlight stocks that dance to the same tune as our stock of interest.

In this article, we’ll unwind the magic of the K-Nearest Neighbours (KNN) and Dynamic Time Warping (DTW) methods, and explore how they can be harnessed to classify time series data. With KNN and DTW, we can identify stocks that display similar trends to a particular stock of interest. We hope to equip you with these tools to spot potential correlations and perhaps even predict future stock movements. Let’s embark on this intriguing journey together!

What is a Time Series Analysis?

A time series is a series of data points indexed in time order. Most commonly, it is a sequence taken at successive, equally spaced points in time, making it a sequence of discrete-time data. Time series are most often presented as line charts, and they have wide-ranging real-world applications in statistics, signal processing, pattern recognition, forecasting, healthcare, astronomy, and broadly any domain of science and engineering that involves temporal measurements.

Time series analysis studies this data to extract meaningful statistics and characteristics. It can be used for forecasting — predicting future events using historical data — or attempting to establish a correlation between variables.

An interesting thing to note about time series data is that it is temporal, i.e. inherently time-ordered, and the same framework applies to real-valued continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language).
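To make this concrete, here is a minimal sketch of what such temporally indexed data might look like in code, using pandas (the dates and prices below are invented purely for illustration):

import pandas as pd

# A tiny, made-up daily closing-price series indexed by timestamps.
dates = pd.date_range("2023-01-02", periods=5, freq="D")
prices = pd.Series([101.2, 102.8, 101.9, 103.5, 104.1], index=dates, name="close")

# Each observation is tied to a point in time, which is what makes this a time series.
print(prices)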

Standard models for analysing a time series include autoregressive (AR) models, integrated (I) models and moving-average (MA) models, as well as their combinations such as ARMA, ARIMA and ARFIMA. We won't go into the details of how these models work in this blog; instead, we will focus on the concepts mentioned above.

DTW: What, Why, How?

What is DTW?

Imagine you’re trying to compare two dancers moving to the same song, but each dancing at their own unique rhythm. One might be gracefully gliding with prolonged steps, while the other energetically moves with quick, short strides. Even though they’re dancing to the same tune, comparing their movements directly might make them appear out of sync.

Enter DTW or Dynamic Time Warping. It’s like the genius choreographer who can make both dancers appear in harmony, regardless of their distinct styles.

At its heart, DTW is a time series analysis tool. What makes it special? While traditional methods might struggle when sequences dance to different beats (or, are out of sync or of varying lengths), DTW elegantly “warps” their timelines for an optimal alignment. It understands that two sequences can share the same essence, even if they have different speeds or timings. So, unlike the classic Euclidean distance method, which might frown upon our uniquely paced dancers, DTW celebrates their individuality and still finds harmony.

In simpler terms, think of DTW as a matchmaker, which ensures that two distinct time series sequences, no matter how different, get their perfect moment in sync!

Picture this: You’re trying to measure how alike two dance routines are. One approach (like Euclidean distance) would be like comparing dancers’ positions at specific beats of the music. Sure, it’s a method, but what if one dancer took a dramatic pause, throwing off the whole comparison?

Now, let’s bring in the star: Dynamic Time Warping (DTW). Instead of rigidly sticking to the beats, DTW is like a dance judge who focuses on the moves’ essence. It understands the rhythm and spirit, ensuring that a dramatic twirl in one routine aligns with a similar move in another, even if they happen at slightly different times.

In the side-by-side comparison of the two methods, you’ll notice how DTW beautifully captures the dance’s distinctive patterns. It doesn’t just match based on the exact moment (or timestamp); it ensures that similar moves, no matter when they happen, are paired together. This, undoubtedly, offers a richer and more genuine understanding of similarity than just checking positions at fixed beats.

So, when comparing time series data, do you want to be the strict timekeeper or the dance enthusiast who appreciates rhythm and flow? DTW makes the choice easy!

We will be using Python to visualise and implement DTW:

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import numpy as np

# Two sine waves, the second shifted in time relative to the first.
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)

# Compute the optimal warping path and plot how points of s1 map onto points of s2.
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path)

# DTW distance between the two series (lower means more similar).
distance = dtw.distance(s1, s2)

Imagine looking at two dancers swaying in perfect sync with the rhythm of sine waves. The warping plot produced above illustrates this captivating dance, highlighting how each move of one dancer is matched to the corresponding move of the other.

Now, let’s dive a bit deeper. In the following visualisation, we’re breaking down the magic behind these moves. This is where we introduce the concept of the ‘cost matrix’. Think of it as a dance scorecard, helping us trace the dancers’ steps and ensuring they’re in harmony.

Wondering how we fill out this scorecard or, in other words, compute the entries of the cost matrix? Well, let’s unveil that secret next!

Visualise a vast canvas of countless tiny squares, each with a unique shade. These aren’t random artistic choices, they represent distances between two data points we’re studying.

Now, here’s the fascinating part. The deeper the shade, the closer the two points are. It’s like a dance, where the algorithm skims through the vast floor, looking for the darkest patches. That’s where the magic lies. This choreography culminates in a stunning curve, often a diagonal, aptly named the ‘optimal warping path’, drawn to link these darkest cells.
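To make the scorecard idea concrete, here is a minimal sketch of how a DTW cost matrix can be filled in with the standard recurrence, where each cell accumulates the local distance plus the cheapest of the three cells it can be reached from. This is an illustrative, unoptimised implementation, not the one dtaidistance uses internally:

import numpy as np

def dtw_cost_matrix(a, b):
    # D[i, j] holds the cost of the best alignment of a[:i] with b[:j].
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])        # local distance between the two points
            D[i, j] = cost + min(D[i - 1, j],      # step down
                                 D[i, j - 1],      # step right
                                 D[i - 1, j - 1])  # step diagonally
    return D[1:, 1:]  # drop the padding row and column

The bottom-right entry of this matrix is the DTW distance, and tracing the cheapest predecessors back from it yields the optimal warping path described above.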

But there’s a hiccup. As enchanting as this dance is, DTW has a rhythm that’s often too slow, especially when it comes to time complexity. Imagine wanting to pair this dance with another, like KNN; it could feel like an eternity waiting for the grand performance. So, we need a smart choreographer, which is an optimization strategy, to speed things up without losing the essence of the dance.

# Constrain the search to a window and prune unpromising paths to speed things up.
d, paths = dtw.warping_paths(s1, s2, window=20, use_pruning=True)
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

Imagine choreographing a dance, but there are certain sections of the stage you’re confident the dancers won’t use. Wouldn’t it make sense to remove or avoid those parts, making the performance more streamlined?

That’s precisely what we did here. By using a window=20, we set a boundary, ensuring that only points from s1 and s2 within a specific range can align with each other. Think of it as narrowing down the dance stage based on where the action truly happens.

Now, there’s more! With use_pruning=True, we’re bringing in a master move. It’s like having a keen-eyed director who quickly spots and removes any paths that aren’t adding to the main performance. The aim? To ensure the dance — or, in our case, the algorithm — remains focused on the optimal path, making our computational dance not just beautiful but efficient too.
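One more practical note: because we will soon need DTW distances between many series at once, dtaidistance also offers batch routines for pairwise distance matrices. A minimal sketch, assuming dtw.distance_matrix is available in your installed version (dtw.distance_matrix_fast is the C-accelerated variant, if the compiled extension is present):

from dtaidistance import dtw
import numpy as np

x = np.arange(0, 20, .5)
series = [np.sin(x), np.sin(x - 1), np.cos(x)]

# Pairwise DTW distances between all series, with the same window constraint as above.
D = dtw.distance_matrix(series, window=20)
print(D)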

KNN: What, Why, How?

What is the KNN algorithm?

Imagine you’re at a party, and you don’t know anyone. How would you get to know people? One approach might be to start mingling with those who seem most like you — perhaps they’re wearing a shirt from your favourite band or discussing a topic you’re passionate about.

This is somewhat how the KNN (k-nearest neighbours) algorithm functions. At its core, KNN is like that friendly party-goer, classifying unknown data points by aligning them with their most similar counterparts from a known dataset.

Here’s a simple way to visualise it:

When k equals 1: Think of it as asking just one person at the party for an opinion. In KNN terms, a test example would adopt the label of its nearest neighbour from the training set.

When k is 3: It’s like seeking opinions from three people and going with the majority view. So, if two out of three people give the same advice, that’s what you follow. The KNN algorithm would look at the labels of the three nearest data points and choose the most common label.

The challenge? Picking the right ‘k’. Just as you’d need to decide how many people to consult at a party, in KNN, choosing the right value for ‘k’ is crucial. It’s not always about asking more people, it’s about asking the right number of people. To determine this magical number, techniques like cross-validation come in handy, where we test the algorithm’s accuracy with various ‘k’ values on smaller chunks of data.

In essence, KNN is about finding your crowd in a sea of data and making decisions based on collective wisdom!

How does the KNN algorithm work?

  • Step 1: Select the number k of neighbours.
  • Step 2: Calculate the (Euclidean) distance from the new data point to every point in the training set.
  • Step 3: Take the k nearest neighbours according to these distances.
  • Step 4: Among these k neighbours, count the number of data points in each category.
  • Step 5: Assign the new data point to the category with the most neighbours.
  • Step 6: Our model is ready (a minimal code sketch follows below).
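To see these steps in action, here is a minimal, self-contained sketch using scikit-learn on a toy 2-D dataset (the points and labels are invented purely for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two small clusters labelled "A" and "B".
X_train = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],   # cluster A
                    [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]])  # cluster B
y_train = ["A", "A", "A", "B", "B", "B"]

# Step 1: choose k. Steps 2-5 (distances, neighbour search, voting) happen inside fit/predict.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# A new point near the first cluster gets the majority label of its 3 nearest neighbours.
print(knn.predict([[1.1, 1.0]]))  # expected output: ['A']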

Why do we need KNN?

Imagine you’re at a grand party with two exclusive groups: Team A and Team B. You spot someone new, let’s call them Alex, entering the room. The burning question is, which group will Alex join?

Enter the K-NN (K-Nearest Neighbors) algorithm, our social detective for the evening. With K-NN, we can quickly determine if Alex is more of a Team A or Team B kind of person based on certain characteristics or similarities with members of each group. So, the next time you see a newcomer and wonder where they’ll fit best, remember, K-NN might just have the answer!

How to choose the right value of K?

Imagine setting up a magical telescope that can help you peer into patterns in the universe. But here’s the catch — you have to pick the perfect lens to get the crispest, most accurate view.

Choosing the right ‘k’ in K-NN is a bit like finding that perfect lens. You don’t want to pick a random one and get a blurry vision. So, what do you do? You test various lenses (or ‘k’ values) and choose the one that gives the clearest picture with the least distortion.

In our data galaxy, this means picking a ‘k’ that minimises errors but keeps our algorithm’s predictive power sharp. It’s a balance, but with a little patience and testing, you’ll find that sweet spot and reveal the secrets of your data!

Picture this: you’re looking at an image showing data points grouped into two distinct classes, say Class A and Class B. A new, unknown point appears! Where does it fit in?

If you look closely, you’ll notice something fascinating. If we set our trusty K-NN algorithm to K=3, it seems to lean towards placing our mysterious data point in Class B. But twist the plot a bit and bump K up to 7, and suddenly, our point seems more at home with Class A. Confused? Don’t be.

Finding the ideal ‘k’ isn’t about relying on set-in-stone rules. It’s an artful balance. While there isn’t a one-size-fits-all formula to pin down the best ‘k’, we do have some guidelines.

  • Start Simple: Kick things off with a random ‘k’. It’s a beginning, and every epic story needs one!
  • Small k = Wild Ride: A tiny ‘k’ value can give you decision boundaries that feel like a roller coaster — thrilling but unstable.
  • Large k = Smooth Sailing: Conversely, a bigger ‘k’ will soothe those wild swings and give a more stable (and often more accurate) classification.
  • Plot It Out: Visualise the relationship between ‘k’ values and error rates. Typically, you’ll want to look for that sweet spot where the error rate dips to its lowest (a minimal sketch follows this list).

So next time you’re grappling with ‘k’, remember: it’s not just about numbers. It’s about understanding your data, experimenting, and visualising. Dive in, adjust, and let the data guide you.
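Here is a minimal sketch of the "plot it out" idea: score a range of ‘k’ values with cross-validation and pick the one with the lowest error. The dataset below is a synthetic placeholder; substitute your own features and labels:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic placeholder dataset purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

k_values = range(1, 21)
errors = []
for k in k_values:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors.append(1 - scores.mean())   # mean cross-validated error rate for this k

best_k = list(k_values)[int(np.argmin(errors))]
print(f"Lowest cross-validated error at k = {best_k}")
# Plotting k_values against errors (e.g. with matplotlib) reveals the 'dip' described above.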

Now, let’s combine the magic of DTW (Dynamic Time Warping) with the intuitive prowess of K-NN (k-Nearest Neighbors) to craft a model tailored for predicting stock trends. The goal? To spotlight stocks that dance to the same rhythm as our chosen stock.

DTW, with its knack for syncing diverse time series seamlessly, and K-NN, renowned for its classification brilliance, team up to offer a potent tool in our financial arsenal.

So, as we gear up to unlock the secrets of stock trends, let this dynamic duo lead the way.

MODEL

Imagine each stock in the market as a dancer with its own unique rhythm and moves. Some dancers have eerily similar patterns, even if they dance a bit out of step sometimes. Enter our dynamic duo: Dynamic Time Warping (DTW) and the K-Nearest Neighbors (KNN) algorithm. Together, they’re like our dance judges, spotting pairs and groups with similar grooves and vibes, even if they miss a beat occasionally.

Curious about how we teach our computer to be this perfect judge? Let’s dive right into the techy part.

But, hold on! Before our grand performance, we need to ensure our stage is set and props are ready. By that, we mean we need some digital tools in our toolkit. Here’s a quick command to get all the essentials:

pip install yfinance fastdtw numpy scikit-learn

And now, the moment you’ve all been waiting for: let’s unveil the code!

import yfinance as yf
import numpy as np
from fastdtw import fastdtw
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


# Computes the Dynamic Time Warping (DTW) distance between two stock time series.
# The lower the result, the more similar the two stock patterns are.
def dtw_distance(stock_1, stock_2):
    distance, _ = fastdtw(stock_1, stock_2)
    return float(distance)


# To download Stock Data
# We'll use yfinance to get the closing prices of various companies for a specified time period.
tickers = ["AAPL", "MSFT", "AMZN", "META", "GOOGL", "GOOG", "TSLA", "JPM", "JNJ", "V", "PG", "NVDA", "HD", "UNH", "PYPL"]
start_date = "2022-01-01"
end_date = "2023-01-01"
data = {ticker: yf.download(ticker, start=start_date, end=end_date)['Close'].tolist() for ticker in tickers}


# Prepare data for our model
X = [data[ticker] for ticker in tickers]
y = tickers


# Calculate Similarities Between Stocks
# We're going to compare every stock's price patterns with every other stock's patterns.
dtw_matrix = np.zeros((len(tickers), len(tickers)))
for i in range(len(tickers)):
    for j in range(len(tickers)):
        dtw_matrix[i, j] = dtw_distance(X[i], X[j])

print("\nHow Similar Are These Stocks?")
print(dtw_matrix)


# Find Most Similar Stocks
# Using K-nearest neighbours with our DTW distance metric to find the most similar stocks.
knn = KNeighborsClassifier(metric=dtw_distance)


# Using KFold to split our data into parts (or "folds") for cross-validation.
cv = KFold(n_splits=3, shuffle=True, random_state=42)


# Let's try different values for K (number of neighbors) to find the best one.
param_grid = {'n_neighbors': list(range(1, min(len(tickers)-1, 10)))}


# GridSearchCV will try out each value of K and tell us the best one.
grid_search = GridSearchCV(knn, param_grid, cv=cv)
grid_search.fit(X, y)

print(f"\nBest Number of Neighbors: {grid_search.best_params_['n_neighbors']}")


# For each stock, let's find out which other stock is most similar to it.
distances, neighbors_idx = grid_search.best_estimator_.kneighbors(X, 2)
for i, ticker in enumerate(tickers):
    print(f"\n{ticker}'s closest buddy is {tickers[neighbors_idx[i][1]]} with a similarity score of {distances[i][1]}.")

RESULTS

Alright, let’s break down what our code just did:

Imagine every stock as a person in a huge party. Now, our code played matchmaker and found the best buddy (or ‘nearest neighbour’) for each stock-guest. And guess what? It didn’t just stop there. It also spilled the beans on how close these stock-buddies really are by showing us the DTW distance between them. Think of it as revealing how in sync their dance moves are!

And you know what’s even cooler? Our code didn’t just randomly pick a buddy. It went the extra mile and figured out the perfect number (or the ‘optimal k’) for the group size to ensure the best matches. This is like making sure everyone gets into the perfect-sized dance group where their moves align just right.

So, next time you wonder about which stocks move to the same groove, our code has got the answers. Now we can dive deeper to spot those trendy patterns and make some informed moves!

Best Number of Neighbors: 1

AAPL's closest buddy is PG with a similarity score of 1692.610008239746.

MSFT's closest buddy is TSLA with a similarity score of 5961.919708251953.

AMZN's closest buddy is GOOG with a similarity score of 1904.8935852050781.

META's closest buddy is NVDA with a similarity score of 4677.560478210449.

GOOGL's closest buddy is GOOG with a similarity score of 114.77847290039062.

GOOG's closest buddy is GOOGL with a similarity score of 114.77847290039062.

TSLA's closest buddy is NVDA with a similarity score of 5553.903793334961.

JPM's closest buddy is PG with a similarity score of 1971.1000518798828.

JNJ's closest buddy is AAPL with a similarity score of 3678.819969177246.

V's closest buddy is NVDA with a similarity score of 8016.499649047852.

PG's closest buddy is AAPL with a similarity score of 1670.3999862670898.

NVDA's closest buddy is META with a similarity score of 4684.2404708862305.

HD's closest buddy is MSFT with a similarity score of 6944.269836425781.

UNH's closest buddy is HD with a similarity score of 50169.78009033203.

PYPL's closest buddy is GOOGL with a similarity score of 2227.4922943115234.

CONCLUSION

Okay, let’s talk about time series data for a second. It can be a real head-scratcher, right? With all its sequential patterns and those pesky time shifts, analysing it isn’t always a walk in the park. But guess what? As we’ve discussed, there’s a cool combo that can come to our rescue: Dynamic Time Warping (DTW) and k-Nearest Neighbors (KNN).

Here’s the lowdown: DTW is like that genius friend who can spot similarities between two songs even if one is a tad faster. It aligns time series data, making sure we compare apples to apples, even if one apple ripened a bit earlier. Now, add KNN to the mix, a handy tool that identifies patterns based on what’s common in the ‘neighbourhood’. You’ve got yourself a stellar team to tackle time series classification.

Whether you’re trying to get a grip on stock market movements, spot something off in machine sensor readings, or just wrangle any data that’s tied to time, this pair is definitely worth adding to your data toolkit. But, remember, no tool is magic! You’ve got to tweak it right and really get the feel of your data. But once you do, the insights you can glean with DTW and KNN? Pure gold.

REFERENCES

Time Series —

https://medium.com/@varun030403/time-series-forecasting-theoretical-aspects-part-1-9adf7b1e0ce3

Dynamic time warping —

https://rtavenar.github.io/blog/dtw.html

KNN —

https://medium.com/analytics-vidhya/k-nearest-neighbors-knn-8f027ae1228f
