An intuitive understanding of the signature method, with practical examples in machine learning
This article introduces feature mapping methods for sequential data, an area that has received increasing attention in recent years in the field of financial engineering. I aim to provide as intuitive and consistent an explanation of the research from the 2016–2023 literature as possible.
Consider a situation in which you wake up on Saturday morning and must decide whether to postpone tomorrow’s picnic. You open the window: black clouds cover the sky, and the breeze feels slightly cooler than last night. At this time of year the temperature is usually stable and rain is plentiful, and once it clouds over, the weather stays poor for a while. Recently, however, the daily temperature swings have begun to widen, and clear days have followed one another. You sense that the relationship between temperature and weather must have changed, and you decide that tomorrow’s weather will be fine after all.
This example shows how multiple time series Xn (X1, X2, …) affect a forecast Y. If we think of the degree of cloudiness Y as a point driven through several sequential dimensions Xn such as temperature, humidity, and wind speed, we can imagine analyzing the behavior of Y from the trajectory they trace. A natural example of such high-dimensional sequential data is option pricing. In this article, I introduce a feature mapping method for sequential data with its application to option pricing in mind.
- Sections 2 and 3 discuss how deeply to analyze the trajectory of Y (Level)
- Section 4 introduces pre-processing used in machine learning
- Section 5 presents a study that improves on the problems of the mapping method
- Section 6 briefly introduces its application to generative models, which I have begun to use in my personal trading
1. Background
If an option price in front of you moves similarly to a past episode, can you really say the price movement is similar to the past once the surrounding economic conditions are taken into account? And how well does machine learning cope with such complex sequential (time series) data?
All machine learning models confront “latent symmetries”¹ when processing high-dimensional, complex data. For example, a model used in image recognition can recognize the original image even after it has been rotated or recolored. This is because the model manages to learn feature maps that are unaffected by those changes.
With sequential data, on the other hand, such “latent symmetries” are harder to capture than with image data. As is the case with option prices, the model generating the data changes over time, and the set of variables that dominate price changes keeps rotating. Situations therefore often arise in which two series have completely different shapes and statistics even though they share the same underlying characteristics.
For example, let φ be the reparameterization that squares t. Figure 1 shows the paths formed by (γx, γy) on the left, (γx∘φ, γy∘φ) as the dashed line in the center, and γx against γy (equivalently, γx∘φ against γy∘φ) on the right. Comparing the left and center plots, the mean, variance, and correlation of the two variables clearly differ before and after reparameterization by φ. On the right, by contrast, the relationship between the two variables is exactly the same regardless of the reparameterization. In other words, for anyone interested only in the relationship between γx and γy, φ is just noise. Statistical moments are therefore not sufficient to extract features from the 3-dimensional Path (including t) shown here.
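To make the “φ is just noise” point concrete, here is a minimal numpy sketch. The helper `sig_level2` is my own illustration (not a library API): it builds the first two signature levels of a piecewise-linear path segment by segment with Chen’s identity. Inserting extra sample points along the same trace, a discrete analogue of reparameterizing time, changes the sample moments but leaves the signature untouched.

```python
import numpy as np

def sig_level2(path):
    """First- and second-level signature terms of a piecewise-linear
    path of shape (n_points, d), accumulated with Chen's identity."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)
        S1 += dx
    return S1, S2

base = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
refined = np.empty((5, 2))              # same trace, sampled twice as densely
refined[0::2] = base
refined[1::2] = (base[:-1] + base[1:]) / 2

S1a, S2a = sig_level2(base)
S1b, S2b = sig_level2(refined)
assert np.allclose(S1a, S1b) and np.allclose(S2a, S2b)           # signature unchanged
assert not np.allclose(base.mean(axis=0), refined.mean(axis=0))  # moments are not
```

The resampled path has different sample moments yet an identical signature, which is exactly why moments fail as features here while the signature does not.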
This article introduces the Signature, considered an ideal feature map for high-dimensional paths, as intuitively as possible. If you are interested in the detailed formulas, check out the following study². It provides many excellent explanations with very clear notation and examples.
2. Basics of signature
First, let me introduce the Signature as intuitively as possible. A few mathematical formulas appear below, but it is not necessary to follow the detailed calculations.
First, let’s picture a Path. Figure 2 depicts, on a daily basis, the price changes of Coca-Cola (NYSE: KO), IBM (NYSE: IBM), and General Electric (NYSE: GE) from Yahoo Finance over the period 2022-01-24 to 2022-06-16. The start of the Path is marked by a blue star and the end by a red circle. In addition, we consider two Sub Paths, each spanning an interval of T = 10 days. The Signature is a mapping that allows us to compare the characteristics of the entire Path on the left with those of the Sub Paths on the right.
The above equation expresses the Path information in the form of a Riemann–Stieltjes integral. If you are not interested in mathematical formulas, you are probably already fed up, but to convey the image behind the Signature, I’ll introduce a few more.
This formula integrates the Path repeatedly across its dimensions. For example, integrating a two-dimensional Path twice (the second level) yields combinations that include several of the first-level terms.
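For readers who do want the formula, the standard definition (as given in the primer²) is the following: for a path X : [a, b] → R^d, the signature is the collection of iterated integrals

```latex
S(X)^{i_1,\dots,i_k}_{a,b}
  = \int_{a<t_1<\dots<t_k<b} dX^{i_1}_{t_1}\cdots\, dX^{i_k}_{t_k},
\qquad i_1,\dots,i_k \in \{1,\dots,d\},
```

so that, for example, the second-level entry is S^{12} = ∫_{a<t1<t2<b} dX¹_{t1} dX²_{t2}, matching the “integrate twice” description above.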
Basically, the number of elements in a Signature is determined by (a) the dimensionality of the Path and (b) the level, which is truncated at a value chosen by weighing computational cost against the quality of the analysis. For this reason, this basic form is called the Truncated Signature.
The level describes how much detail of the Path is attended to and can be chosen independently of the number of dimensions. For example, for a two-dimensional X = {X1, X2}, the third-level Truncated Signature has 14 dimensions, as follows. Note that the output dimensionality is 2 + 2**2 + 2**3 = 14 regardless of the length (here 4) of X.
import numpy as np
import iisignature as isig

X1 = [1, 2, 4, 6]
X2 = [0, 5, 1, 10]
X = [X1, X2]
level = 3
path = np.array(X).transpose()  # iisignature expects shape (length, dimension)
sig = isig.sig(path, level)
print(sig)
# [ 5. 10. 12.5 30.5 19.5
#   50. 20.83333333 65.66666667 21.16666667 106.5
#   38.16666667 92. 51.5 166.66666667]
# [ S1, S2, S11, S12, S21,
#   S22, S111, S112, S121, S122,
#   S211, S212, S221, S222]
This is the basic explanation of Signature. The next thing you may be wondering is what this Truncated Signature means in the end. Let’s check the relationship between a simple Path and a Truncated Signature in the following sections.
3. Intuitive understanding of the Signature
Suppose you have the dataset X = (X1, X2) of length 4 introduced earlier, together with a large number of similar Xs with slightly different values. After a quick visual check, you would probably want to group them with k-means or Euclidean distance for starters. But here I instead introduce the intuitive meaning of using the Truncated Signature.
Here, the Truncated Signature up to the second level is explained. Since the level is the depth of repeated integration, it can be taken to three or higher regardless of the number of dimensions being analyzed. However, levels of three and above are difficult to express intuitively, so this section is limited to the second level.
Figure 3 plots the aforementioned X1 and X2. In the right panel, the horizontal axis is X1 and the vertical axis is X2. Applying the second-level Truncated Signature to the right plot yields the six variables (S1, S2, S11, S12, S21, S22). What do these numbers mean?
(S1, S2) are the increments (the integrals of dX1 and dX2), respectively: (S1, S2) = (5, 10). Whatever values X1 (or X2) takes at t = 1, 2 does not affect the result; only the shadow of the Path projected onto the X1 (or X2) axis, i.e. the displacement from start to end, matters. (S11, S22) is more involved to calculate but has the same character: (S11, S22) = (S1**2/2!, S2**2/2!) = (12.5, 50).
Figure 4 illustrates these values. S12 equals the area in the left plot and S21 equals the area in the center plot. The Truncated Signature has a number of other important properties; for example, the figure on the right can be calculated using the following equation.
The right plot in Figure 4 shows the area of the closed region formed by the Path, where area swept clockwise counts as negative and area swept counterclockwise as positive, viewed along the direction of travel. In the example of Figure 4, the area is (30.5 - 19.5)/2 = 5.5.
Furthermore, the relationship S1 · S2 = S12 + S21 holds, and indeed 5 * 10 = 30.5 + 19.5. Thus, the Truncated Signature extracts information about the closed region formed by the start point, the end point, and the path traced between them.
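These identities are easy to check numerically. The sketch below uses `sig_level2`, a hand-rolled helper (not a library function) that accumulates the first two signature levels of a piecewise-linear path via Chen’s identity, and reproduces the numbers above from X1 and X2:

```python
import numpy as np

def sig_level2(path):
    """First- and second-level signature of a piecewise-linear path,
    accumulated segment by segment via Chen's identity."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)
        S1 += dx
    return S1, S2

path = np.array([[1, 0], [2, 5], [4, 1], [6, 10]])     # columns: X1, X2
S1, S2 = sig_level2(path)

assert np.allclose(S1, [5, 10])                        # S1, S2
assert np.isclose(S2[0, 0], S1[0] ** 2 / 2)            # S11 = S1**2 / 2!
assert np.isclose(S2[0, 1] + S2[1, 0], S1[0] * S1[1])  # S12 + S21 = S1 * S2
assert np.isclose((S2[0, 1] - S2[1, 0]) / 2, 5.5)      # signed (Levy) area
```

Here S12 = 30.5 and S21 = 19.5, matching the iisignature output in section 2.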
Let us consider a more complex example. In Figure 5³, the Truncated Signature (eighth level) is computed from the MNIST handwriting data to estimate the digits (S1, S2, S11, …, S22222221, S22222222). Handwriting data differs from image-recognition data in that the plotting order runs from the start of the stroke to the end, so the path direction is clearly defined. Although the results are not shown here, in this example a Ridge-based model trained on 60,000 samples of Truncated Signatures was able to classify the digits with high accuracy.
As can be seen from the above, the Truncated Signature works well as a feature map for sequential (time series) data. The examples introduced so far are low-dimensional (3-dimensional including t), and it is easy to imagine it working just as well for higher-dimensional data.
4. Path Transformation
So far, we have examined the relationship between a Path of time series data and its Truncated Signature. You can of course use the raw Path as is, but depending on the type of data, it may pay to transform the Path during preprocessing. Next, we introduce two frequently used transformations: Cumulative Sum and Lead-Lag.
Figure 6 shows X2 represented by (left) the Lead-Lag transform and (right) the Lead-Lag transform of the cumulative sum CS(X2). The left panel simply traces the increases and decreases of X2 as they are. The right panel likewise traces the change from X2 = {0, 5, 1, 10} to the cumulative sum CS(X2) = {0, 0+5, 0+5+1, 0+5+1+10}. What is the effect of adding these transformations to the Signature preprocessing?
As we saw in the previous section, the Truncated Signature extracts information about closed areas, which in Figure 6 are the areas formed by the three colored triangles and the dotted line. The specific formulas are beyond the scope of this article, but the closed area formed by the Lead-Lag transform is related to the statistical moments of X2². For example, at the second level it captures the mean and variance of X2, and at higher levels the higher moments. Using Cumulative Sum and Lead-Lag thus lets the Signature take the statistics of the Path into account, and depending on the data, transforming the Path during preprocessing may improve performance.
In the example, only X2 is transformed, but it is common to handle multiple dimensions. How should we transform the Path when there are several variables? At least as far as the Signature literature is concerned, there does not seem to be a universal recipe. Options include transforming each variable separately before taking the Truncated Signature, making the main variable the Lag and all other variables the Lead, making only certain variables the Lead, or accumulating all variables into a single series and Lead-Lagging it. As part of preprocessing validation, it is worth exploring the transformation that best suits your application.
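As an illustration of how the Lead-Lag closed area encodes a second moment, here is a sketch. Both `lead_lag` and `sig_level2` are my own helpers, and the “lead moves first” point ordering is one common convention (the figure may use another). For the Lead-Lag path of X2, the difference S12 - S21 equals the sum of squared increments of X2, i.e. its quadratic variation:

```python
import numpy as np

def sig_level2(path):
    """First- and second-level signature of a piecewise-linear path
    (Chen's identity, segment by segment)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)
        S1 += dx
    return S1, S2

def lead_lag(x):
    """2-D Lead-Lag path of a 1-D series: the lead coordinate jumps to
    the new value first, then the lag coordinate catches up."""
    pts = [(x[0], x[0])]
    for prev, cur in zip(x, x[1:]):
        pts.append((cur, prev))
        pts.append((cur, cur))
    return np.array(pts, dtype=float)

x2 = np.array([0.0, 5.0, 1.0, 10.0])
S1, S2 = sig_level2(lead_lag(x2))
qv = np.sum(np.diff(x2) ** 2)               # 25 + 16 + 81 = 122
assert np.isclose(S2[0, 1] - S2[1, 0], qv)  # closed area encodes the 2nd moment
```

This is the level-2 case of the statement above; higher levels of the Lead-Lag signature pick up correspondingly higher moments.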
5. Extension to kernelization
The Truncated Signature allows you to choose any level, but as you increase the level, the number of dimensions grows exponentially. The following example compares the dimension counts at the second and third levels. As the level increases from 2 to 3, the two-dimensional X grows from 6 to 14 dimensions and the three-dimensional X from 12 to 39.
# continuing from the earlier example (np, isig, X1, X2 already defined)
path = np.array([X1, X2]).transpose()
sig_2 = isig.sig(path, 2)
sig_3 = isig.sig(path, 3)
print(f"sig_2d_lev2: {len(sig_2)}, sig_2d_lev3: {len(sig_3)}")
# sig_2d_lev2: 6, sig_2d_lev3: 14
path = np.array([X1, X2, X1]).transpose()  # duplicate X1 just to make a 3-D path
sig_2 = isig.sig(path, 2)
sig_3 = isig.sig(path, 3)
print(f"sig_3d_lev2: {len(sig_2)}, sig_3d_lev3: {len(sig_3)}")
# sig_3d_lev2: 12, sig_3d_lev3: 39
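The counts printed above follow a simple formula: a d-dimensional path truncated at level L produces d + d² + … + d^L terms (excluding the constant zeroth-level term). A quick sketch, with `sig_dim` as my own helper name:

```python
def sig_dim(d, level):
    """Number of Truncated Signature terms for a d-dimensional path,
    excluding the constant zeroth-level term."""
    return sum(d ** k for k in range(1, level + 1))

assert sig_dim(2, 2) == 6 and sig_dim(2, 3) == 14
assert sig_dim(3, 2) == 12 and sig_dim(3, 3) == 39
print(sig_dim(6, 4))   # 1554
```

For a 6-variable input at level 4 this gives 1,554 terms (the 1,555 quoted below presumably counts the constant term as well), which shows how quickly the dimensionality explodes.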
For example, the following study⁴ applies the Truncated Signature to WTI crude-oil futures prices. The variables employed are 1. the mid-price of the Ask and Bid, 2. the spread, 3. the order imbalance, 4. accumulated volume, and 5. normalized time. Using these 5 variables as Leads plus the mid-price as a Lag, a total of 6 variables were used as input to classify the time period of the price data, achieving 91% out-of-sample (OoS) accuracy. However, with 6 input dimensions, a total of 1555 dimensions must be computed even at level 4, which is not very high. Furthermore, evaluating the variables with LASSO showed that S1515 and S5151 at the fourth level contributed most to the classification, so computing the Signature only up to the third level would have degraded performance.
Thus, when using the Truncated Signature, the level is literally truncated somewhere for reasons of computational cost, and information is inevitably lost along the way. There is no guarantee that the computable levels contain sufficient information, and the trade-off between computational cost and performance depends on the data. Once the need to validate variable combinations and tune model parameters is taken into account, the computational cost tends to grow significantly. The Signature Kernel¹, an extension via kernelization, is an effective way to deal with both the computational burden and the information loss.
The Signature Kernel literally computes the kernel (inner product) of two Signatures Sx and Sy; unlike the Truncated Signature, it does not require choosing a level, and it converts two Sub Paths into a single score. Compared with the Truncated Signature, whose dimensionality grows exponentially, the Signature Kernel performs better in both memory and time because it avoids those extreme computational costs.
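To make “kernel = inner product of Signatures” concrete, here is a level-2 truncated sketch. This is my own illustration: the actual Signature Kernel¹ evaluates the full, untruncated series by solving a Goursat PDE rather than by forming feature vectors, so what follows only conveys the idea.

```python
import numpy as np

def sig_level2(path):
    """First- and second-level signature of a piecewise-linear path."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)
        S1 += dx
    return S1, S2

def trunc_sig_kernel(x, y):
    """Inner product of level-2 truncated signatures (constant term included)."""
    S1x, S2x = sig_level2(x)
    S1y, S2y = sig_level2(y)
    return 1.0 + S1x @ S1y + np.sum(S2x * S2y)

a = np.array([[0, 0], [1, 2], [3, 1]], dtype=float)
b = np.array([[0, 1], [2, 2], [3, 4]], dtype=float)
assert np.isclose(trunc_sig_kernel(a, b), trunc_sig_kernel(b, a))  # symmetric
assert trunc_sig_kernel(a, a) >= 1.0                               # k(x, x) = 1 + |S(x)|^2
```

Such a score can be fed to any kernel method (SVM, kernel ridge) without ever materializing high-level signature coordinates.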
It also excels at preserving information content, not just reducing dimensionality. Research¹ has evaluated classification and regression on multidimensional time series data and Bitcoin prices with Support Vector Machines. Comparing five different kernels, including a Truncated Signature at a realistic level, the Signature Kernel (Sig-PDE in the study) performed better on almost all datasets.
Beyond this study, many others have shown that the Signature Kernel outperforms the Truncated Signature. For these reasons, I personally use Signature Kernel-based methods instead of the Truncated Signature when applying Signatures in my trading.
6. How to use it in trading
Finally, let us imagine a situation in which the price information of the target and of other issues is captured as a marginal probability distribution: naturally, the target’s price and another issue X’s price are compared in a 1:1 relationship. Using the Signature, however, it is possible to map paths that evaluate multiple price series jointly into features. This is like calculating features directly from Figure 2.
There are various ways to utilize the acquired features. One is to build a classifier by associating them with labels, as shown in Figure 5. A particularly interesting approach in Signature research is scoring based on Maximum Mean Discrepancy (MMD). To close, I briefly discuss some studies that have been helpful in implementing my personal trading tools.
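As a toy version of MMD scoring on signature features, consider the sketch below. Everything here is hand-rolled for illustration: `sig_level2` is my own level-2 helper, and with a linear kernel the (biased) MMD reduces to the distance between the two samples’ mean feature vectors; the studies discussed below use the full Signature Kernel instead.

```python
import numpy as np

def sig_level2(path):
    """First- and second-level signature of a piecewise-linear path."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)
        S1 += dx
    return S1, S2

def sig_features(path):
    S1, S2 = sig_level2(path)
    return np.concatenate([S1, S2.ravel()])

def mmd_score(paths_p, paths_q):
    """Linear-kernel MMD on signature features: the distance between the
    mean feature vectors of two samples of paths."""
    mu_p = np.mean([sig_features(p) for p in paths_p], axis=0)
    mu_q = np.mean([sig_features(q) for q in paths_q], axis=0)
    return float(np.linalg.norm(mu_p - mu_q))

rng = np.random.default_rng(0)
make = lambda scale: [np.cumsum(rng.normal(0.0, scale, (20, 2)), axis=0)
                      for _ in range(30)]
calm, volatile = make(1.0), make(3.0)
assert mmd_score(calm, calm) == 0.0        # identical samples score zero
assert mmd_score(calm, volatile) > 0.0     # a regime shift raises the score
```

A rolling window of recent sub-paths scored against a reference window yields a regime signal of the kind plotted in Figure 7.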
Figure 7 shows the result of mapping price changes with the Signature and scoring them⁶, using daily prices of seven tickers (GE, IBM, JPM, KO, PG, XOM, ^GSPC) as the Path data. Only KO is depicted for visibility, and the blue line applies an EMA to the score. Note that the score was of course generated from the 7-dimensional price data, so price changes other than KO’s are also taken into account. Beyond MMD scoring, this study evaluates a variety of approaches to regime analysis with Signatures, including computationally simpler path-to-path scoring methods, offline clustering, and evaluation of data generated by a pricing model.
Here is one more study, which uses the Signature in place of the discriminator of an option price generator. In this study⁵, the price data produced by the generator are mapped by the Signature Kernel and scored with MMD against the actual prices. The baseline for comparison is a GAN with a generic discriminator, and three kinds of data (geometric Brownian motion, rBergomi, and actual currency pairs) are used for validation. In terms of implementation, the approach eliminates the instability that plagues GAN training, and in terms of performance, the results are generally excellent in all settings.
References
[1] Cristopher Salvi, Thomas Cass, James Foster, Terry Lyons, Weixin Yang, “The Signature Kernel is the solution of a Goursat PDE”
[2] Ilya Chevyrev, Andrey Kormilitzin, “A Primer on the Signature Method in Machine Learning”
[3] https://github.com/pafoster/path_signatures_introduction
[4] ”Extracting information from the signature of a financial data stream”
[5] Zacharia Issa, Blanka Horvath, Maud Lemercier, Cristopher Salvi, “Non-adversarial training of Neural SDEs with signature kernel scores”
[6] Zacharia Issa, Blanka Horvath, “Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures”