How to Analyze Volume Profiles With Python

An approach to using volume profiles for algorithmic trading

Minh Nguyen
The Startup
5 min read · Jan 15, 2021


When trading markets such as equities or currencies, it is important to identify value areas to inform our trading decisions. One way to do this is by looking at the volume profile.

In this post, we explore quantitative methods for examining the distribution of volume over a period of time.

More specifically, we’ll be using Python and statistical and signal processing tools in SciPy’s suite of modules. Data plots are rendered with Plotly.

How do we use volume?

This post covers the concept of volume profiles and how to trade them.

In short, volume conveys the amount of an asset transacted during a period of time. Another dimension we can look at is the volume transacted around given price levels. This is known as a volume profile.

The images below show how we can look at volume along these two dimensions.

Volume grouped by time
Volume grouped by price. Volume nodes marked with blue arrows.

Areas where significant volume is accumulated form high volume nodes (or clusters). These volume nodes can be helpful in determining important price levels to watch.

Analyzing the Volume Data

It’s easy to visually spot volume nodes in the charts above, but the idea here is to do it with code. So let’s see how we can do that.

Download the data

First, we’ll fetch data for the asset and timeframe shown in the charts above. We’ll be fetching minute data to allow us to construct a volume profile with enough granularity.

In this example, we’re looking at spot Forex which means we don’t have true volume data. The volume data here is tick volume which serves as a proxy for actual volume transacted.

Function for downloading data is stubbed out
EURUSD minute data from OANDA. Time is UTC
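The download function itself is stubbed out above, but as a rough sketch: assuming candles in OANDA’s v20 REST format (each with a `time`, a tick-volume `volume`, and mid prices under `mid`; the helper name is my own), turning the response into a DataFrame might look like this:

```python
import pandas as pd

def candles_to_df(candles):
    """Convert a list of OANDA-style candle dicts into a DataFrame.

    Assumes each candle looks like:
    {"time": ..., "volume": ..., "mid": {"o": ..., "h": ..., "l": ..., "c": ...}}
    """
    rows = [
        {
            "time": c["time"],
            "close": float(c["mid"]["c"]),
            "volume": c["volume"],  # tick volume, a proxy for true volume
        }
        for c in candles
    ]
    df = pd.DataFrame(rows)
    df["time"] = pd.to_datetime(df["time"], utc=True)  # OANDA times are UTC
    return df.set_index("time")
```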

Now let’s plot the volume profile, which is just a histogram along the price axis.
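As a minimal sketch (the variable names and stand-in data are my own), the profile is a histogram of closing prices weighted by each bar’s volume:

```python
import numpy as np

# Stand-in for the downloaded minute bars: close prices and tick volumes.
close = np.array([1.2210, 1.2212, 1.2212, 1.2230, 1.2231, 1.2231])
volume = np.array([10, 25, 30, 5, 40, 20])

# Volume profile: total volume transacted within each price bin.
hist, edges = np.histogram(close, bins=3, weights=volume)
centers = 0.5 * (edges[:-1] + edges[1:])  # bin midpoints for plotting
```

Plotting `hist` against `centers` as a horizontal bar chart gives the volume-by-price view shown earlier.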

A histogram is a good way of showing the distribution of our data. However, it is coarse, and its shape can change depending on how the bins are defined. Let’s find a more convenient way to model our distribution.

Kernel Density Estimator

A kernel density estimator (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. This allows us to represent our distribution as a smooth and continuous curve.

We’ll use SciPy’s gaussian_kde to get our PDF. We can plot a normalized histogram with the curve to see how it fits.

kde_factor affects the smoothness of the curve. I chose an arbitrary value that seemed like a good fit on our multimodal distribution.
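A sketch of this step, weighting the KDE by tick volume (the synthetic prices and the exact `kde_factor` value here are illustrative):

```python
import numpy as np
from scipy import stats

# Stand-in for the downloaded data: two clusters of minute closes.
rng = np.random.default_rng(0)
close = np.concatenate([rng.normal(1.2210, 0.0005, 300),
                        rng.normal(1.2245, 0.0003, 150)])
volume = rng.integers(1, 50, size=close.size)

kde_factor = 0.05  # bandwidth multiplier; smaller values give a bumpier curve
kde = stats.gaussian_kde(close, weights=volume, bw_method=kde_factor)

# Evaluate the PDF on an evenly spaced price grid.
xr = np.linspace(close.min(), close.max(), 500)
kdy = kde(xr)
```

Overlaying `kdy` on the normalized histogram shows how well the curve fits.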

Finding Volume Nodes

To help us determine the structure of our volume profile, we use find_peaks from SciPy’s signal processing module.
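A minimal sketch of that step, with a synthetic bimodal curve standing in for the evaluated KDE:

```python
import numpy as np
from scipy.signal import find_peaks

# Stand-in for the evaluated PDF: a bimodal curve over a price grid.
xr = np.linspace(1.2200, 1.2260, 500)
kdy = (np.exp(-0.5 * ((xr - 1.2215) / 0.0006) ** 2)
       + 0.6 * np.exp(-0.5 * ((xr - 1.2245) / 0.0004) ** 2))

peaks, properties = find_peaks(kdy)
node_prices = xr[peaks]  # candidate volume-node price levels
```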

At this point, we already have some useful data to work with. For instance, the raw peaks give us candidate price levels derived directly from our volume profile.

But we can do better! find_peaks provides us the ability to specify constraints to further filter out noise.

Prominence

For starters, we can specify a minimum prominence for our peaks. This can help us identify more pronounced volume nodes. Let’s plot some lines representing each peak’s prominence to see what that looks like.

Prominence is defined as the vertical distance between the peak and its lowest contour line. Let’s filter for peaks whose prominence is at least 30% of the curve’s maximum value.

min_prom = kdy.max() * 0.3
Peaks filtered with a minimum prominence
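Putting that together (again with a synthetic curve standing in for the PDF; `peak_prominences` is only used here to get the values for drawing the vertical lines):

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

# Stand-in PDF: the second bump is too shallow to pass the filter.
xr = np.linspace(1.2200, 1.2260, 500)
kdy = (np.exp(-0.5 * ((xr - 1.2215) / 0.0006) ** 2)
       + 0.2 * np.exp(-0.5 * ((xr - 1.2245) / 0.0004) ** 2))

min_prom = kdy.max() * 0.3  # keep peaks at least 30% of the curve's maximum
peaks, props = find_peaks(kdy, prominence=min_prom)

# Prominence of every raw peak, e.g. for plotting the vertical lines.
all_peaks, _ = find_peaks(kdy)
proms = peak_prominences(kdy, all_peaks)[0]
```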

Peak Width

Now let’s say we’re looking for areas of high volume and tight consolidation. In other words, we’re interested in areas where a lot of activity happened within a small range. We can find this by specifying constraints on the width of our peaks.

First let’s plot markers showing the width of the peaks. By default, the width is measured at half of the peak’s prominence (rel_height=0.5).

Red lines show where the width is calculated

On the hourly chart, our average true range during the week was around 14 pips. Let’s set our max width at 20 pips. To do this, we’ll need to convert our range in prices to number of samples.
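One way to sketch that conversion, assuming EURUSD’s pip size of 0.0001 and an evenly spaced price grid (the curve here is synthetic: one tight node and one broad one):

```python
import numpy as np
from scipy.signal import find_peaks

# Stand-in PDF over a 100-pip grid: a tight node and a broad one.
xr = np.linspace(1.2200, 1.2300, 1000)
kdy = (np.exp(-0.5 * ((xr - 1.2220) / 0.0003) ** 2)
       + np.exp(-0.5 * ((xr - 1.2270) / 0.0015) ** 2))

pip_size = 0.0001                              # one pip for EURUSD
price_step = (xr[-1] - xr[0]) / (len(xr) - 1)  # price covered per sample
max_width = 20 * pip_size / price_step         # 20 pips, in samples

min_prom = kdy.max() * 0.3
peaks, props = find_peaks(kdy, prominence=min_prom, width=(1, max_width))
```

Only the tight node survives both filters; the broad bump is wider than 20 pips at half its prominence and is dropped.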

Peaks filtered by both prominence and width

We’ve now spotted an area of tight consolidation and high volume. In terms of technical analysis, this would be a key value area to watch!

Density

Maybe we don’t care so much about the width or range of our volume node. What if instead, we were more interested in the total density of the volume node? The PDF is a continuous curve, so we’ll need to take the integral between a range to get the density. We’ll use the base of the peaks to determine the range to integrate over.

> [0.631087355798145, 0.366996783471251]

Since this is a probability density function, our density values are normalized. As a reminder, the integral over the entire range is equal to 1.
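A sketch of that integration, using the `left_bases`/`right_bases` arrays that `find_peaks` returns when prominence is requested, together with `gaussian_kde.integrate_box_1d` (the data here is synthetic, so the resulting numbers will differ from the output above):

```python
import numpy as np
from scipy import stats
from scipy.signal import find_peaks

# Stand-in data: two price clusters, fit with a KDE as before.
rng = np.random.default_rng(1)
close = np.concatenate([rng.normal(1.2220, 0.0004, 400),
                        rng.normal(1.2260, 0.0006, 200)])
kde = stats.gaussian_kde(close, bw_method=0.1)

xr = np.linspace(close.min(), close.max(), 500)
kdy = kde(xr)
peaks, props = find_peaks(kdy, prominence=kdy.max() * 0.3)

# Integrate the PDF between each peak's bases: the share of the total
# distribution attributed to that node (the whole curve integrates to 1).
densities = [kde.integrate_box_1d(xr[lb], xr[rb])
             for lb, rb in zip(props["left_bases"], props["right_bases"])]
```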

Conclusion

So far we were able to download price and volume data over a time period and view the distribution across price as a volume profile. We used a Gaussian KDE to fit a PDF over this data and utilized some signal processing tools to extract useful information.

I hope this was helpful especially for those interested in algorithmic trading. This is only a starting point and we would need to make it more robust before incorporating in a research or trading system.

Thanks for reading and feel free to reach out on LinkedIn if you have any comments or suggestions!
