Unified Stream and Batch Processing of WorldQuant 101 Alphas in DolphinDB

DolphinDB
6 min readOct 21, 2022

--

In 2015, the formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. To conveniently calculate these 101 alphas in DolphinDB, you can use the functions encapsulated in the module wq101alpha (wq101alpha.dos).

This module has the following advantages over Python:

  • Better Performance: DolphinDB outperforms Python by a median of 15.5x. For 27.5% of the alphas, DolphinDB is more than 100 times faster than Python.
  • Unified Stream and Batch Processing: Functions defined in this module can be used for both stream and batch processing.
  • Simplicity: Most of the 101 alphas can be written with DolphinDB built-in functions. No need to develop new functions.

All scripts in this tutorial are compatible with DolphinDB V2.00.8, 1.30.20 or higher.

1. Naming Conventions

  • All function names in module wq101alpha start with “WQAlpha” followed by a number from 1 to 101. For examples, WQAlpha1WQAlpha2, etc.
  • The following is a list of parameters for alphas defined in the wq101alpha module. Each alpha may use a different set of parameters (see appendix).
  • Two types of alphas are defined in the wq101alpha module: alphas with and without industry information.

2. Examples

This chapter expounds how to calculate alphas with specific examples.

2.1 Environment Setup

Add the module file wq101alpha.dos to [home]/modules.

The [home] directory is specified by the configuration parameter home. (To check the value of home, use getHomeDir())

2.2 Data Preparation

You can simulate daily data with DailyDataSimulation and industry information with IndustryInfo.

Alternatively, if you already have tables of daily data and industry information, you need to perform an equal join to combine these two tables, and make sure that the column names are consistent with the parameters defined in the module.

If you need to change the column names, you can use function prepareData in module prepare101 (add prepare101.dos to [home]/modules). It converts the column names to the defined parameters.

  • rawData is a table containing non-industry information.
  • infoData is a table containing industry information.
  • startTime and endTime determine the start time and end time of data.
  • Other parameters are column names to be converted.

Import the wq101alpha module and load the data you prepared:

2.3 Calculating Alphas Without Industry Information

In the wq101alpha module, the calculation of alphas without industry information is generally conducted on two dimensions: time-series and cross-section. For these factors, you need to prepare a matrix as the input, and then call function WQAlpha#. Check Appendix for specific parameters.

For example, you can calculate alpha 1 and alpha 2 as follows:

We provide function prepare# and calAlpha# in the prepare101 module to save your time spent on specifying parameters.

Function prepare# prepares the parameters required for each alpha and function calAlpha# encapsulates function prepare# and wqAlpha#.

Take alpha 1 as an example:

In addition to matrices, parameters of function WQAlpha41, WQAlpha54, and WQAlpha101 can also be vectors.

For example, you can calculate alpha 101 using a SQL statement as follows:

use wq101alpha

res = select tradetime, securityid, `alpha101 as factorname, WQAlpha101(close, open, high, low) as val from data where tradetime between startTime : endTime

2.4 Calculating Alphas with Industry Information

To calculate alphas with industry information, you need to specify a table as the input.

Take alpha 48 as an example:

You can also use function calAlpha# in prepare101 module.

The alpha calculation in the paper 101 Formulatic Alphas adopts several industry classifications, such as ``IndClass``, ``subindustry``, ``IndClass.industry``, ``IndClass.sector``. For the sake of convenience, only ``IndClass`` is used in this module.

Functions in the wq101alpha module return a matrix or a table. You can save your results to database if needed. Please refer to wq101alphaStorage.

3. Performance Comparison

Our testings show that the wq101alpha module of DolphinDB outperforms Python pandas and Numpy.

  • Hardware

CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz

OS: 64-bit CentOS Linux 7 (Core)

  • Data

We use the simulated daily data in a year to conduct performance testing (see TestData).

3.1 DolphinDB vs. Python Pandas

We compare the performance of alpha calculation implemented by DolphinDB module wq101alpha and Python pandas.

The following is the main script for performance testing of the wq101alpha module (see full script in wq101alphaDDBTime):

The following is the main script for performance testing of Python pandas (see full script in wq101alphaPyTime):

The execution time of all 101 alphas can be found in PerformanceComparison.

Alphas that have not yet been implemented in Python pandas or whose results are questionable are excluded.

69 alphas are available for comparison (in millisecond):

The result shows that wq101alpha in DolphinDB outperforms python pandas. DolphinDB is faster than Python by a median of 15.5x. For 27.5% of the alphas, DolphinDB is more than 100 times faster than python.

3.2 DolphinDB vs. NumPy

Considering NumPy may have better performance than pandas, we choose 11 alphas that are time-consuming in pandas and implement them with NumPy. See partialAlphaNumpyTime for test results of NumPy.

Performance comparison of DolphinDB and NumPy:

We can see that while NumPy is faster than pandas, DolphinDB outperforms both.

DolphinDB has optimized the implementation of its window functions. In comparison, NumPy is not optimized for window calculations implemented by numpy.lib.stride_tricks.sliding_window_view.

4. Stream Processing

It is complex to implement most alphas in real time, which requires more than one stream engine. DolphinDB provides the streamEngineParser function to automatically form a pipeline of stream engines to carry out the specified metrics calculation. In streamEngineParser, you can directly call functions in module wq101alpha .

See full script in wq101alphaStreamTest for the implementation of real-time alpha calculations.

Take alpha 1 for example:

Define the schemata of input and output tables.

Call the wq101alpha module and use WQAlpha1 as the metrics for the streamEngineParser function.

Check the status of the stream engines with function getStreamEngineStat().

Append data to stream engines and check the output table ``resultStream``.

5. Conclusion

This tutorial introduces how to calculate 101 alphas with DolphinDB built-in functions in the wq101alpha module. This module features efficiency, speed, and simplicity, and achieves unified batch and stream processing.

Appendix: Required Parameters for Each Alpha

  • Alphas without industry information
  • Alphas with industry information

See Also

Thanks for your reading! To keep up with our latest news, please follow our Twitter @DolphinDB_Inc and Linkedin. You can also join our Slack to chat with the author! 😉

--

--

DolphinDB

Real-time platform for analytics and stream processing, powered by high-performance time series database.