A galaxy of daily returns

Intra-portfolio correlation in drift-diffusion asset price models

NTTP
17 min read · Oct 11, 2023
Photo by Bryan Goff on Unsplash

Most retail investors who manage a small portfolio (of their own) have heard the term “diversification” and the importance of it, and perhaps they have heard the advice of trying to find assets to put in their portfolio that are uncorrelated, to reduce risk. True, it is more than “uncorrelated” that they search for. It is possible for an asset to be uncorrelated with the rest of your folio, and also be a losing proposition. This is less than ideal, to put it mildly.

The tangle of prices

But looking at time series price charts is perhaps the most confusing way to understand correlation among assets, and such mere looking is difficult to quantify and may be misleading. To experts, this is all well-known, but using our simple apps [Note 5] aimed at retail investors, we are trying to make some of these more advanced concepts understandable and easily usable by those investors; without them (that is: you, our readers) having to type Excel formulas or learn Python coding. Correlation among assets is one of the concepts that we are trying to clarify and illume. Note that we said among and not between. We will start with between, but correlation among assets is the goal.

First, down the price time series rabbit-hole

Photo by Nicole Baster on Unsplash

If we take a snapshot of a small portfolio of time series charts from finance.yahoo.com of six different common stocks (2023 year to date [Note 4]) in the automotive sector (which we suspect to be correlated, at least partially), the charts are kind of a jumble, even though we show only the raw data and not the myriad of overlay curves derived from different formulae that are popular these days:

https://stockcharts.com/

…and yes, you can see some similar behavior among these time series shown below.

Figure Y: Yahoo Finance chart of a multi asset portfolio YTD 2023

The two boxes we drew on the chart (blue and red) show areas of possible interest to the time series enjoyer: In the blue box we get similar “peak / trough wave” behavior for the non TSLA assets, maybe grouped into two groups; and in the red box we see that maybe even TSLA is behaving the same as the others, in direction at least, if not in magnitude. And the red arrow shows a crossover zone where some of the lower batch of assets are behaving in a negatively correlated manner. But these boxes represent only a couple of months of the year to date; what about the other months, or a year or more back? When you start analyzing time series charts by eye like this and try to come up with explanations regarding what is going on, it starts to seem suspiciously similar to the “agitated man making analysis of corkboard” meme. We won’t put the picture inline due to potential copyright (Is it fair use or not? We are not lawyers, so we don’t know. Even then, the experts often answer, “it depends…”), but, here is the link for your amusement in case you don’t know what we are talking about (and you certainly won’t be the first):

Correlation is difficult to quantify by these manual methods in the price domain “by inspection,” and even if TSLA is running far above (in price) the other companies in this time series chart, is this merely due to more extreme behavior (more volatility) of similar trends as the others… or what?

Before forging ahead too far, we should note that this is an analysis of market data using applied mathematics; we are not investment advisors, and if you don’t have one, you may need one. An investment advisor, that is.

To daily returns

Instead, it is much easier and perhaps more informative to look at daily return data to analyze how groups of assets have behaved. That is, we can look at daily returns instead of daily prices, which returns are easily calculated as percent change of each asset per day in history. We even leave off the * 100% step of computing percent because it is not needed, and we just leave the return data so that, for example, a 1% daily change is a daily return value of 0.01. This we do in our apps automatically after we pull price time series data from our data provider IEX Cloud (for stocks, that is; we use cryptocompare.com for crypto data). In fact, in our apps, we don’t even bother to allow overlays of time series graphs on top of one another, because our apps want the user to focus on daily returns-based analyses (which then get translated back into the (time, price) domain at the final steps), and there are plenty of other web sites and applications that let you do a lot (and possibly far too much) with price time series graphs. No, we are interested in daily returns and the stability of them or lack thereof.
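For readers who do want to see the arithmetic, here is a minimal Python sketch of that daily return calculation. The function name and the sample prices are ours, made up for illustration; this is not the app's code.

```python
def daily_returns(prices):
    """Fractional day-over-day change; 0.01 means a 1% move (no * 100 step)."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


# Made-up closing prices, oldest first.
closes = [100.0, 101.0, 99.99]
print(daily_returns(closes))
```

A series of N prices gives N - 1 returns, since the first day has no prior day to compare against.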

We do throw a bone towards the “price as a function of time” aficionados by showing the price chart of one asset at a time, mainly as kind of a visual check of our data provider’s output and of our “fumble-fingering in” of the symbols.

Price time series for reference, above histogram of daily returns

The time series gives us some visual feedback: Are we pulling the proper symbol’s data, are there any big spikes or flat spots in the time series — which may indicate a data feed problem or possibly a split/reverse_split of the asset — and so on. Splits or reverse splits would show as a step jump in the price series (down or up) if the data provider does not adjust for it. These splits or other outlier data we can filter out once we are in daily returns space by telling the app to “reject all points that move more than 49%,” or whatever cutoff number we want to use.
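The cutoff idea can be sketched in a few lines of Python. This assumes a simple symmetric rule (drop any return whose absolute value exceeds the cutoff); the function name and the sample prices are hypothetical, and the app's actual filter may differ in detail.

```python
def reject_outliers(returns, cutoff=0.49):
    """Keep only returns whose absolute value is at or below the cutoff."""
    return [r for r in returns if abs(r) <= cutoff]


# Made-up prices with an unadjusted 2-for-1 split between day 2 and day 3.
prices = [200.0, 202.0, 101.0, 102.0]
returns = [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]

print(returns)                   # the split shows up as a -50% "return"
print(reject_outliers(returns))  # the false outlier is gone
```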

In the MonteCarlo tab of the app, we can set a cutoff point for daily returns so that false outliers can be filtered out before forecasting.
Portfolio editor (EditFolio) for MCarloRisk, showing 6 assets with 100 shares each. For this exposition, the reader need only be concerned with the symbol and share count portion of this user interface, on the lower portion of above screen shot.

After setting up a portfolio and giving a share count to each asset (let’s say 100 shares per asset for this example), we can start looking into the returns data using a couple of visualization tools. In our system, when we show returns data by itself, the share counts we specified do not figure in to the analysis. Those counts do figure in when we make our final portfolio forecast surface of (time, price, probability). Also, we do not need to do this step of looking at returns histograms for forecasting with the system, since our monte carlo forecaster automatically handles resampling from the historical return data of the portfolio of assets that we set up. However, looking at returns graphs can be enlightening if we had spent too much time in the past agonizing over time series charts of our favorite companies; for returns data is much simpler in geometry than time series data when we look at it en masse, as we shall see.

To start viewing the returns data, we can scroll through this small portfolio of assets and examine the returns distribution in the histogram display (the same assets as in the above Figure Y graph), one asset at a time. By default, the histograms are sorted from low volatility to high volatility, which you can see qualitatively by dragging left and right on the histogram (mouse button down, or finger down on a touch screen). The X axis is scaled the same among all of the assets so that you can notice relative volatility among the assets more easily.

Pan horizontally on histograms to cycle through all assets in your portfolio in the MCarloRisk app. Above is an animated GIF of what you see when panning in the app.

As a reminder, our histograms here are generated from data 1 year back (252 trading days) from the current day (beginning-to-mid Oct 2023), but the Yahoo time series charts (Figure Y) are only year to date 2023 (less than 1 full year), due to an apparent bug in the one year back charting in Yahoo [Note 4].

Examining the histograms, we can see that TM (Toyota) is the least volatile asset in this group, and TSLA is the most volatile. To quantify a little bit more, the daily standard deviation of returns of TM in the last year is 0.016, and the same for TSLA is 0.040. That makes TSLA about 2.5 times as volatile as TM by this measure. But volatility per asset and the associated histograms only refer to one asset at a time, and do not say anything about correlation between pairs of assets, or correlation among them all.
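As a rough sketch of this per-asset volatility ranking, the Python below computes the standard deviation of each asset's daily returns and sorts from low to high. The symbols LOW and HIGH and their returns are made up, and we assume the sample standard deviation here; whether the app uses the sample or population form is not something we state.

```python
from statistics import stdev


def sort_by_volatility(returns_by_symbol):
    """Return (symbol, daily standard deviation) pairs, least volatile first."""
    vol = {sym: stdev(rets) for sym, rets in returns_by_symbol.items()}
    return sorted(vol.items(), key=lambda item: item[1])


# Hypothetical assets with made-up daily returns.
sample = {
    "HIGH": [0.04, -0.04, 0.02, -0.02],
    "LOW": [0.01, -0.01, 0.005, -0.005],
}
for symbol, sigma in sort_by_volatility(sample):
    print(symbol, round(sigma, 4))

# Ratio of the two standard deviations quoted in the text.
print(0.040 / 0.016)
```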

Pairwise correlation

To see traditional pairwise correlation, we go to the Correl tab of our app, which plots every asset versus every other asset in daily return space. Again panning/dragging back and forth across the dot plot, and noting that the correlation graphs are sorted in order of correlation, we see that the two most correlated assets of this small folio are Ford and GM (F and GM), and the two least correlated assets are Honda (HMC) and TSLA. That the Honda/Tesla pair is the least correlated (among the six), an analyst would be hard-pressed to call out from just looking at those six price time series on top of one another (Figure Y above), or from reading news reports. We like to show the dot plots of daily returns like this because there are so many time series charting systems around for stocks… why not show some returns-based graphs, since they represent more stable “stationary” data?… which stationary data we need for robust forecasting.

Most correlated assets over the past year: Ford and GM (of the example portfolio of 6 assets), R = 0.85. Panning left and right on this graph in the Correl tab of the app cycles through all pairs of assets in your folio.
Least correlated daily returns over the past year, Honda and Tesla (of the 6), R = 0.41
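To make the pairwise ranking concrete, here is a small Python sketch that computes the Pearson R for every pair of assets and sorts the pairs, most correlated first. The symbols A, B, C and their returns are made up, and the sort direction is our assumption. With six assets there are 15 distinct pairs to look at.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranked_pairs(returns_by_symbol):
    """All distinct asset pairs with their R, most correlated first."""
    pairs = [
        ((a, b), pearson_r(returns_by_symbol[a], returns_by_symbol[b]))
        for a, b in combinations(sorted(returns_by_symbol), 2)
    ]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Made-up daily returns, aligned by day.
sample = {
    "A": [0.01, -0.02, 0.03, -0.01],
    "B": [0.02, -0.04, 0.06, -0.02],   # exactly 2 x A, so R = 1 with A
    "C": [0.01, 0.02, -0.01, -0.03],
}
for pair, r in ranked_pairs(sample):
    print(pair, round(r, 2))
```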

A colleague of mine once referred to these types of charts as “galaxy charts” (or words implying the same), because they sometimes look like remote galaxies in “outer space,” edge-on: An excellent description.

The dot plots in the Correl tab of the app provide a way to view these “galaxy charts” pairwise (one asset on X axis, one on Y axis), and we see that they show as a kind of point cloud: Typically more dense near the center and more sparse near the edges, much like some real galaxies. However, our returns dot plots do not spiral like some galaxies. They are more like elliptical galaxies, to continue the astro analogy. They are certainly not like these kinds of Galaxies, and in fact just by clicking on that link we can almost smell the aroma of partially burned hydrocarbons…

But back to our dot plots: These are analogous to our bell curve shaped histograms discussed and shown above, which show one asset at a time: The histograms represent dense point distribution (high bars) near zero return (suggesting that, more often than not, daily returns are small-ish), and drop down gradually to zero probability as we move away from zero return, implying that large daily moves in price (by percent) are more rare.

We have some visualization features in the app to help the user see this density a little better by overlaying a color contour type of plot on the same pair of axes as the dots, but this colorization is mainly for qualitative viewing and does not figure into the calculations.

Color contour visualization of returns point cloud, sliced through the (GM, Toyota) plane

In fact, the returns point cloud is of N dimensions (N = the number of assets we are studying: six here). The histograms show one dimension at a time, and the dot plots show two dimensions at a time. The dot plots are a 2D (planar) slice through the N dimensional point cloud parallel to the displayed asset axes, slicing through the zero return center: (0,0,0,0,0,0), assuming our six asset portfolio here. Note that when you pan all the way to the right on the correlation chart, the app also shows the cases where the same asset is on both the X and the Y axis of the graph (the so-called “like-like” plots, which is possibly a misnomer, and probably redundant, but yet has staying power nonetheless), mainly to check the plotting code and the data in general. We had better see a straight line of points at 45 degrees if we plot the same data on X and Y (harumph).

“Like-like” returns plot (same asset on X and Y). Not a bug, just for checking.

Monte carlo forecasting for a portfolio

Now when we resample and remix points to generate monte carlo paths to forecast forward in time the behavior of a portfolio of assets, each re-sample is not merely from the individual return data streams for each asset. Rather, we resample from this N dimensional point cloud; the individual return data streams (per asset) are linked by the common variable of time. Each resample is exactly one of these “galaxy” points (prior to us later adding any tuning or adjustments to the model).

For example, our random selector might pick:

21 days back (then we pull return data from all assets at that number of days back)
57 days back (then we pull return data from all assets at that number of days back)

… and so on for thousands of more re-samples

In other words, the random selections are linked by days backward in time. The random variable generated becomes a “days back across all assets” time index.
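A minimal Python sketch of this linked resampling follows. One random day index is drawn per sample, and that same index is used to pull the return of every asset. The function name, the two symbols, and the returns are made up for illustration, and plain uniform sampling is our assumption.

```python
import random


def resample_joint(returns_by_symbol, n_samples, rng):
    """Draw whole days at random: one day index shared across all assets."""
    symbols = list(returns_by_symbol)
    n_days = len(returns_by_symbol[symbols[0]])
    draws = []
    for _ in range(n_samples):
        day = rng.randrange(n_days)  # the single "days back" index
        draws.append({sym: returns_by_symbol[sym][day] for sym in symbols})
    return draws


# Made-up history: each position is one day, aligned across assets.
history = {
    "F": [0.010, -0.020, 0.005, 0.000],
    "GM": [0.012, -0.018, 0.004, -0.001],
}
for draw in resample_joint(history, 3, random.Random(42)):
    print(draw)
```

Every draw is one actual historical day, so whatever co-movement the assets showed on that day is carried into the sample intact.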

But then since portfolios have share counts per asset, the apropos share count weight is applied to the returns before we re-sum up the result for each random walk generated price path point, as described in our earlier article:

https://medium.com/@nttp/backtest-to-the-future-1ea4db87845c

and in our user training guide. So what we do is resample returns from the past, but taking into account the historical (partial) correlation among all assets in our portfolio. We do not resample from assets assuming that they move independently from one another. This would be an error, since they do not move independently from one another; or, at least, they have not in the past. Sure, these assets are not 100% correlated, and there are differences in how the returns move (from asset to asset), but they are far from independent, statistically speaking.
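One way the share counts could enter is sketched below in Python, under our own assumptions: each asset's price is compounded by its resampled return, and the portfolio value at each step is the sum of shares times price. All names and numbers are hypothetical; the earlier article and the training guide describe the app's actual procedure.

```python
import random


def portfolio_path(last_prices, shares, returns_by_symbol, n_steps, rng):
    """One Monte Carlo path of portfolio value from jointly resampled days."""
    symbols = list(last_prices)
    n_days = len(returns_by_symbol[symbols[0]])
    prices = dict(last_prices)
    path = [sum(shares[s] * prices[s] for s in symbols)]
    for _ in range(n_steps):
        day = rng.randrange(n_days)            # same day for every asset
        for s in symbols:
            prices[s] *= 1.0 + returns_by_symbol[s][day]
        path.append(sum(shares[s] * prices[s] for s in symbols))
    return path


# Made-up inputs: two assets, 100 shares each.
last_prices = {"F": 12.0, "GM": 30.0}
shares = {"F": 100, "GM": 100}
history = {
    "F": [0.010, -0.020, 0.005, 0.000],
    "GM": [0.012, -0.018, 0.004, -0.001],
}
print(portfolio_path(last_prices, shares, history, 5, random.Random(0)))
```

Repeating this for thousands of paths and taking percentiles of the path values at each step is one way to arrive at bands like the (time, price, probability) envelope.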

The monte carlo expanding into the future price envelope generation does not depend on you, the user, viewing the pairwise correlation graphs at all. We added those dot plots in the app to show assets from the point of view of daily returns, rather than from the confusing tangle of time series charts (which can be misleading anyway, and may make a viewer think that there are patterns in the data — which there very well may be — but which patterns may not have any predictive power) [Note 1].

The returns point cloud is always that: A point cloud, dense at the center and sparse at the extremes, elliptical-ish of varying dimensions or tending toward circular shape (the more circular, the less correlation). In fact, our “ground truth” theory reference points out that elliptical distributions are often used to fit these types of multi-dimensional point cloud distributions, if one chooses to do this fitting. Here we stick to viewing and modeling from distributions in their raw data “empirical” form as we mention in prior work, like we do with our single asset modeler. That is, we just resample from the returns point cloud as-is and don’t try to fit any N-dimension multi-moment distribution equation to it.

Occasionally, the analyst will see an outlier “star” in this point cloud galaxy, way at the edge of it, in “empty space.” Such is the nature of reality: Sometimes extreme events occur. This is much like the per-asset histograms: Dense in the center (high bars), sparser as we move away from zero (low bars); and an occasional outlier way on the left or right, with small bar height representing some small N point count. Statisticians will recognize the histograms as “empirical marginal distributions” or something similar with “marginal distribution” in the name:

It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables.
From
https://en.wikipedia.org/wiki/Marginal_distribution [our emphasis]

… that is, these histograms represent the entire N dimensional “galaxy” distribution when viewed from one dimension only, and with all other dimensions flattened to that one dimension. Note that we don’t see any indication of the angled tilt that we see in the 2D dot plots (suggesting correlation) when we view these marginal distribution histograms one at a time… because we are only looking at the data from the point of view of one asset at a time.
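The Python sketch below illustrates why the marginal view cannot show correlation. We build two made-up return series that are perfectly correlated, then shuffle one of them independently. The shuffled series has exactly the same values (so the same histogram), but the day-by-day link is broken. All names and data are ours for illustration.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)
x = [rng.gauss(0.0, 0.02) for _ in range(200)]
y = [2.0 * v for v in x]           # perfectly correlated with x by construction

y_shuffled = y[:]
rng.shuffle(y_shuffled)            # same values, but no longer aligned by day

same_marginal = sorted(y) == sorted(y_shuffled)
r_linked = pearson_r(x, y)
r_unlinked = pearson_r(x, y_shuffled)
print(same_marginal, round(r_linked, 3), round(r_unlinked, 3))
```

This is also why resampling each asset independently would be an error: it preserves every histogram while throwing away the tilt of the galaxy.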

Similarly, the dot plots with their color contours (the colors being just for show) are merely the N dimension returns point cloud viewed two dimensions at a time. We could also make some 3D plots (3 assets at a time) of such an N-dimension point cloud, and maybe 4D (with time / animation for the 4th dim) in the app; we have not done this yet because it does not contribute to automated forecasting (the point of the app); but it might be cool. Interested readers can use Octave or Matlab or any of several other applications to do this plotting. We found an excellent macOS app called Graph-R if you want to play around with higher dimensional plots of returns [Note 3].

Note how we could see some correlation or lack thereof in the time series graphs, over some periods of time (the blue and red boxes we drew on Figure Y), but this is not so easy to see in other periods of time, and it is definitely difficult to quantify correlation from a bunch of time series charts stacked on top of one another. However, it is easy to visualize how much each asset has been correlated with each other asset (historically [Note 2]) with these galaxy-type dot plots and the metrics we next present.

Quantifying pairwise correlation

Those with even basic statistics knowledge will see the next “Thank you, Captain Obvious!” statement coming from a kilometer away:

We can quantify this historical correlation, which is sometimes done in portfolio analysis systems using a “triangle of pairwise correlations” matrix (say, Excel generated or some-such):

We report out pairwise correlation numbers (R and R-square) on each of these pairwise dot plots for the user’s reference, along with some other linear regression parameters which may or may not be of interest. The R is needed because sometimes correlation is negative.
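For reference, these pairwise statistics can be computed with ordinary least squares as in the Python sketch below. The function name and data are made up, and which asset goes on X versus Y is our assumption. The example uses a negatively sloped relationship to show why R is reported alongside R-square: R-square alone hides the sign.

```python
from math import sqrt


def regress(x, y):
    """Least-squares line of y on x, plus R and R-square."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return {"slope": slope, "intercept": intercept, "r": r, "r_squared": r * r}


# Made-up data lying exactly on y = -2x + 1.
stats = regress([1.0, 2.0, 3.0, 4.0], [-1.0, -3.0, -5.0, -7.0])
print(stats)
```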

If we add VXX (a volatility ETF) to our portfolio, we can see negative correlation between TSLA and VXX. The red, green, and blue axes are the first 3 principal components (axes) of the portfolio projected to this (VXX, TSLA) plane. Interested app users can read ahead on this topic (and see more info below).

We do not (yet?) show a triangle grid of correlations in the app, because we have a surprise in store which may be even better than this and help the user understand this N-dimensional “galaxy” of returns more so than the sometimes confusing pairwise triangle.

Correlation of daily returns for Toyota versus Stellantis, arrow showing some statistics of this pairwise analysis

The one possibly non-obvious piece of the puzzle to neophyte analysts is that when performing these correlation analyses, we must be sure to use returns (daily, or perhaps weekly or monthly), not prices. We refer the reader to our prior article which touches upon the relevant concept of stationarity, in addition to more in-depth references:

https://www.investopedia.com/articles/trading/07/stationary.asp

https://www.researchgate.net/post/Is_the_stock_return_series_ALWAYS_stationary
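To see why prices are the wrong input, the Python sketch below generates two return series that are independent by construction, builds a price path from each, and computes the correlation both ways. Everything here is simulated and the names are ours. The returns correlation should come out near zero, while the price-level correlation can land far from zero purely by accident, and it changes a lot from one random seed to the next.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def to_prices(returns, start=100.0):
    """Compound a return series into a price path."""
    prices = [start]
    for r in returns:
        prices.append(prices[-1] * (1.0 + r))
    return prices


rng = random.Random(0)
n = 5000
ret_a = [rng.gauss(0.0, 0.01) for _ in range(n)]
ret_b = [rng.gauss(0.0, 0.01) for _ in range(n)]   # independent of ret_a

r_returns = pearson_r(ret_a, ret_b)
r_prices = pearson_r(to_prices(ret_a), to_prices(ret_b))
print("R of returns:", round(r_returns, 3))
print("R of prices: ", round(r_prices, 3))
```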

For forecasts similar to the ones performed in this app, the stationarity of multi-dimension distributions of particular assets could be investigated, if the reader wishes to do further study. However, this is not yet done in-app. As we note later, our exhaustive backtests should “smoke out” any forecasting problems related to assumptions like this [multi dimensional distribution stationarity]; but the backtests may not be able to point directly at a particular assumption as the cause of possible troubles.

Pairwise confusion

One question that we had when we first saw these correlation triangles a long time ago was: Ok if asset A is 70% correlated with asset B, and asset B is 90% correlated with asset C, can we say anything about the correlation between asset A and C? We can compute the correlation between A and C directly, but then how does this relate to the “transitive” A to B to C correlation? Luckily, quick wits before us have figured this out, which figuring we may be able to cover in a future article. Interested readers can “read ahead” and dive into training slide set 12 of the app for details on this feature:

https://diffent.com/mcrtrain/MCRSlideSetPCAV2.pdf
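As a small taste, one standard result (which may or may not be what the slide set covers) is that knowing two of the three pairwise correlations only pins the third to a range. The range follows from the requirement that a 3 x 3 correlation matrix be positive semidefinite. The Python sketch below computes that range; the function name is ours.

```python
from math import sqrt


def third_correlation_bounds(r_ab, r_bc):
    """Feasible range for r_ac given r_ab and r_bc.

    From positive semidefiniteness of the 3 x 3 correlation matrix:
    r_ab * r_bc -/+ sqrt((1 - r_ab^2) * (1 - r_bc^2))
    """
    centre = r_ab * r_bc
    half_width = sqrt((1.0 - r_ab ** 2) * (1.0 - r_bc ** 2))
    return centre - half_width, centre + half_width


low, high = third_correlation_bounds(0.70, 0.90)
print(round(low, 3), round(high, 3))
```

For the 70% and 90% example, the A to C correlation can be anywhere from roughly 0.32 to 0.94, so the “transitive” chain narrows the answer but does not determine it. The direct computation is still needed.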

As an added clarification, the linear regression parameters (slope, intercept) above the dot plots are not used in the envelope (time, price, probability) forecasting analysis. They may be of interest if you are interested in ideas of “beta to” an asset other than SPY. However, our app has more involved features for this “beta to anything” computation elsewhere in the app, as described by this set of slides. We find that “beta to Bitcoin” might be an interesting study, or beta of an asset versus its related sector ETF.

Backtest away!

Now that you hopefully understand a little better about how we generate forward forecasts from correlated assets, you can run the same kinds of bulk and exhaustive backtests for portfolios (of several assets, with different share counts per asset) that we described in our earlier article for individual assets. For the portfolio section of our app(s), we do not offer as many model tuning or adjustment parameters as we do in the individual asset section of the app(s). Adding these features will be dependent on user / investor interest.

Bulk backtest of our portfolio, 100 days. Reality seems to have been skating around about the 95th percentile (high) level of the forecast. From this, we might improperly conclude that we are over-estimating risk, because reality was so high up in the forecast envelope, and even above that envelope at times. But when we next do an exhaustive backtest, we will see that this is not the case.
Exhaustive validate backtest for our little 6 asset portfolio, 1 year calendar time. Not too bad. Are we over estimating risk? No, probably not. Note how at about 1/4 of the way in from the left, the reality blue curve dips below the 5% risk level a bit. Overall, the reality price curve seems to be pretty well contained within the forecast bands, nicking the top 99th percentile band a bit about 2/3rds of the way in.
As per the single asset modeler, we also provide a table of quantitative estimates on how well the exhaustive backtest matched reality. The model “might-could” use a bit of tuning, but is pretty good for a first model.
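One simple check of this kind, sketched in Python under our own assumptions, is to count how often reality fell below a forecast band. If a band is meant to be the 5% risk level, reality should dip below it on roughly 5% of the tested days. The function name and numbers below are hypothetical and are not taken from the app's table.

```python
def breach_rate(realized, lower_band):
    """Share of days on which the realized value fell below the lower band."""
    breaches = sum(1 for actual, floor in zip(realized, lower_band) if actual < floor)
    return breaches / len(realized)


# Made-up realized portfolio values and a made-up lower forecast band.
realized = [10.0, 9.0, 8.0, 10.0]
band = [9.5, 9.5, 9.5, 9.5]
print(breach_rate(realized, band))
```

A breach rate well above the band's nominal level would suggest the model under-estimates risk; a rate well below it would suggest the opposite.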

After checking that backtests are acceptable and possibly tuning up the model using steps we describe in our training guide (also found in the Help tab of the app), you can then forecast full portfolio price/probability estimates into the future by withholding zero days worth of data.

Notes

[Note 1] Prof. Xiu’s team from the University of Chicago had done an insightful study on using a machine learning image recognizer system to recognize chart patterns that might have predictive power. This is in contrast to many chart patterns that people think might have predictive power by inspection… but do they really? Or did the forecasting analyst merely get lucky a few times? Xiu’s are backtestable chart patterns, since the patterns are recognized by a machine automatically. Has he shown that there may be something to the concept of “chart reading” to forecast asset prices? We will let you read the original study and make your own call:

[Note 2] While we are only looking at historical correlation among assets in this app, past correlation does not imply that future correlation will exist at the same magnitude and direction, or exist at all. Our apps assume that correlation is stationary (or fairly stable, at least), over time. This again is an assumption that is only an approximation to what actually happens. Here are some nice graphs of correlation over time from a higher level asset class point of view, rather than an individual asset point of view, to get you started on this idea:

https://advisor.visualcapitalist.com/asset-class-correlation-over-25-years/

Our motif in the app of using exhaustive backtests to test the assumptions of the model (such as this assumption of relatively constant correlation among portfolio assets) should expose such assumptions if they prove to be significantly wrong for various portfolios, and allow the analyst to make compensatory adjustments in model tuning parameters.

[Note 3] If you try to plot 3D point data with the Graph-R package, be sure to look up the slightly unusual CSV file format it uses, with two extra header lines above the typical single header line CSV:

DataFormat,2,
memo,,
x,y,z
-1.452513966,-1.555869873,-2.252853881
-0.113378685,-2.514367816,-1.810187267
-1.929625426,0.257921887,2.597039601

and so on for the remaining 3D data points

The documentation points out this unusual format, but it is not intuitive that this is the case.
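If you are generating such a file from a script, a Python sketch like the one below reproduces the layout shown above. The two extra header lines are copied verbatim from our example; we do not know what other values Graph-R accepts there, so treat them as fixed text. The function name is ours.

```python
import csv
import io


def graph_r_csv(points):
    """Build CSV text with the two extra header lines shown above."""
    buf = io.StringIO()
    buf.write("DataFormat,2,\n")   # copied verbatim from the example file
    buf.write("memo,,\n")          # copied verbatim from the example file
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y", "z"])
    for point in points:
        writer.writerow(point)
    return buf.getvalue()


text = graph_r_csv([
    (-1.452513966, -1.555869873, -2.252853881),
    (-0.113378685, -2.514367816, -1.810187267),
])
print(text)
```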

[Note 4] The Yahoo 1 year chart setting seems to have some issues when overlaying time series. Note the truncation of the non TSLA series:

Yahoo Finance Charts at 1 year setting: Note red arrow showing several time series truncated. Bug, or are we just “doing it wrong”? The latter is of course a real possibility.

[Note 5] Our Monte Carlo forecasting apps that support portfolios are currently available on macOS and iPhone (and the iPhone version runs on iPad in a small window; or, at least it should). If there is interest in portfolio support for these types of studies on other platforms, please let us know.

https://apps.apple.com/us/app/mcarlorisk3d/id1493844588?mt=12

https://apps.apple.com/us/app/mcarlorisk-for-stocks-etfs/id412346415

Our single asset apps are available on all major app stores: macOS, modern Windows (store.microsoft.com), Google Play, iPhone, and iPad. Search for MCarloRisk3D and MCarloRisk3DLite on your favorite app store.
