Sparkster’s Algo Backtesting Progress Update

Sajjad Daya
Published in Sparkster
13 min read · Sep 30, 2019

Product Vision

The Sparkster Backtester allows traders to create their own fully backtested strategy using a visual drag-and-drop strategy builder.

We aim to build a tool suitable for every level of experience, from seasoned algo trading professionals, through to brand new enthusiasts who want to follow the trading performance of experts.

With the crypto backtesting tool, you can:

Build Strategies

Mix and match any combination of indicators you can imagine using a simple drag-and-drop strategy builder.

Popular tools like TradingView for technical analysis limit the number of indicators you can use on a chart — unless you pay for a premium account. Even then, learning how a combination of indicators fits together to create entry or exit points takes months of study and practice.

The Sparkster backtesting tool helps traders at any level of experience understand the exact logical composition of indicators behind a strategy, and then view on the chart how any combination of indicators produces precise buy/sell signals.

Test Strategies

Run comprehensive historical data backtests to find the best variations of your strategy for different currency pairs.

Building a strategy is only guesswork unless you can see how much profit (or loss) it would have produced had it been executed against past market events.

What’s more, different strategies perform better or worse with different currency pairs depending on fundamentals such as related market trends, team performance, influence from bitcoin volatility, overall market cap, and so on.

This means that no single strategy is suitable for all currency pairs at all times. Sparkster's backtesting tool lets traders build specific strategies that best fit different currency pairs for maximum profit potential and minimal risk.

Manage Trades

Use your optimised strategies to help guide your trading.

The key to beating the market is being more responsive to market changes, more accurate with trading decisions, and removing the risks of trading on emotion with every pump or dump of the market.

The Sparkster backtesting tool helps traders make reasoned decisions about each and every trade.

Learn Together

Share or view strategies created by others to learn and profit from together.

Traders can ‘publish’ any strategy so it shows up in the Public Strategies section inside the Sparkster backtesting environment. This helps the community learn from each other's experiments to refine and advance strategies. Of course, the very best strategies may be kept private by their creator to trade and profit from without giving too many secrets away.

Current Status

Following are some of the key changes that are now visible in the Sparkster backtesting tool for alpha testers.

15 Indicators:

The list now includes:

  • Chaikin Oscillator
  • EMA
  • MACD
  • RSI
  • WMA
  • Bollinger Bands
  • Chaikin Money Flow
  • Ichimoku Cloud
  • Parabolic SAR
  • Volume
  • On Balance Volume
  • Williams %R
  • Ultimate Oscillator

As our alpha test group requests new indicators, the development team quickly adds them to the available indicator library.

ETH/USD Available

During the current alpha test phase, backtesting is restricted to ETH/USD while the team focuses on back-end infrastructure for multi-threaded scalability.

Alpha Tester Group Results

Three weeks ago, we began a round of user testing with an alpha release test group.


The user testing process involves several tools to help us fully explore the user experience. One of the key tools we use at Sparkster is Hotjar (http://hotjar.com), which includes many features for gathering feedback and insight, including form analysis, surveys, video capture, heatmaps, and more.

Heatmaps show mouse clicks throughout the Sparkster backtesting tool. Here’s a screenshot showing clicks on the performance table columns.

Video screen captures allow our development team to analyse the usability and workflow to help us optimise every stage of building and testing strategies for ease of use.

Our UX team learned that some alpha testers had trouble locating the strategy build window. These simple insights led us to quick test iterations such as making the Open Creator tab more visible and adding a short video demo.

Screen recordings also revealed friction in the user registration and onboarding process. Our front-end development team has taken steps to refine the registration form and tool navigation.

We have also noticed that performance table data may not be readily understood by less experienced traders. We are now refining the data columns displayed for historical trade analysis.

Another very welcome addition to the Sparkster backtest builder is a list of sample strategies, giving new users a solid starting point that is easily customisable with alternative indicators or parameters.

Overall, video screen captures have revealed product features that our testers found unintuitive. We continue to tweak the user experience to make the Sparkster platform as user-friendly as possible.

These insights ensure we are building a highly practical and intuitive tool suitable for all levels of cryptocurrency trading experience. This iterative development process shortens the time our team needs to deliver a feature-rich, functional product ready for public launch.

User Data Protection

As you can see in the screenshot above, certain block elements are not visible (whited-out areas), which is part of Hotjar's features for user data privacy. Further, when collecting data with Recordings, Hotjar automatically suppresses keystroke data on all input fields. In all cases, the data is suppressed client-side, in the visitor's browser, which means it never reaches Hotjar's servers. Hotjar's approach to privacy is described at https://www.hotjar.com/blog/hotjar-approach-privacy/.

Next on the Roadmap

There is so much to do for the Sparkster backtester before an official public launch, and our development team are now working on a series of updates.

Most obvious is the inclusion of more currency pairs. ETH/USD is a good starting point as we build the back-end infrastructure for running all of the data analysis required for accurate backtesting. New currency pairs will be added gradually, with high-volume pairs first, including BTC, ETH, XRP, and so on.

What goes on behind the scenes is also of crucial importance. The number of calculations required to analyse all this data is massive, so increasing the data-crunching capacity of the platform to support more users is the current priority for our back-end development team.

We are also updating the test results table with new columns that provide traders at all levels with detailed clarity about the historical trades executed by their strategy. This includes total gains/losses, average gains/losses, win rates, gains ratio, expected return on investment, time and date stamps per entry/exit, and more.

New alpha test users will also soon have access to a new short demo video for a quick orientation to the tool. As the release version progresses towards public launch, the onboarding process will support new traders with detailed guidelines while giving experienced traders a fast route to building strategies for rapid comparison and selection.

See below for a technical update.

Technical Update and Specs

Since trading platforms are primarily number-crunching programs, we have taken steps to optimise our application to fit this purpose. We are completing optimisations with respect to three major areas of concern: data retrieval, data format, and data storage.

Optimisation 1: Data Retrieval

Our first optimisation came in the form of a database migration. Previously, we were using MySQL, a database program that is great for relational data.

However, MySQL wasn’t performing well when retrieving thousands (or sometimes millions) of rows to fetch a large list of datapoints. This was in spite of the fact that we had indexed the table where we store our datapoints so that we could retrieve data by timestamp in the most efficient manner MySQL had available.

The low performance came from MySQL being an on-disk storage system. This meant that retrievals from the database came at the cost of disk IO which, even on our servers that run SSDs, was incurring overhead.

Our solution to this problem was switching to an in-memory database, and we chose Redis as our new datastore. Unlike MySQL, Redis keeps its database in memory, only writing to disk to persist data. It is also one of the most popular choices for fast-access data (such as, in our case, a large collection of sequential datapoints).

The advantage of storing data in memory is that access to the system's RAM is much faster than access to the disk — even in the case of SSDs, which have near-constant read/write times — because of the transfer overhead involved in getting data from the disk into RAM.

Further, the processor can optimise retrievals by keeping regularly accessed data in its L1 cache, even closer to the CPU than RAM. Our application benefits from this since we perform common operations on mostly sequential data, and since we act on similar ranges of data repeatedly, the data we need tends to stay as close as possible to the CPU for fast access. We should note here that we still employ MySQL for what it was truly designed for: relational data.

Thus, we use MySQL to store profile information and Redis for data that needs to be accessed quickly and efficiently over a wide range.
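
As a minimal sketch of this division of labour (the key names and the ioredis client below are illustrative, not our exact production schema), a Redis sorted set scored by timestamp turns a time-range query into a single command:

```typescript
import Redis from "ioredis";

const redis = new Redis(); // assumes a Redis instance on localhost:6379

// `payload` is the serialised datapoint: JSON at this stage, and the
// base-64 format described in the next optimisation later on.
async function storePoint(pair: string, time: number, payload: string) {
  // Score by timestamp so members stay sorted chronologically.
  await redis.zadd(`candles:${pair}`, time, payload);
}

async function fetchRange(pair: string, from: number, to: number) {
  // One round trip returns every datapoint with a timestamp in [from, to].
  return redis.zrangebyscore(`candles:${pair}`, from, to);
}
```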

Just by switching from MySQL to Redis for our datapoint storage, we saw a dramatic speedup in our application. But, as the common adage among computer scientists goes: “Can we do better?” Yes, we can. This brings us to our next optimisation: the data format.

Optimisation 2: Data Format

When we first switched from MySQL to Redis, we were storing data in Redis in a common format known as JavaScript Object Notation (JSON). This meant that each datapoint would look something like “{close: 50.40, volume: 5000, time: 1569518396}”. While this might look nice, there are a couple of problems with this approach.

The first problem is that computers don’t understand this string. Therefore, when we retrieve the string from Redis, it must be parsed into actual datatypes. For example, the “close” field is turned into an actual decimal number (called a floating-point number), the “volume” field is similarly converted, and the “time” field is converted into an integer. This process is expensive: the string has to be scanned to find the list of fields, then each field's data has to be parsed, and the parser must accurately guess what datatype the field contains. For the “close” field, for example, the process might be akin to the following:

  1. Start at the 5.
  2. Go forward until we find a comma, right-brace, or decimal point (this consumes the 5 and the 0).
  3. Since we found a decimal point, we know this is a floating-point number. Consume until we find a comma or a right-brace (this consumes the 4 and the 0).
  4. Convert this number into its appropriate datatype (this involves additional steps we won’t cover here).

Since we intend to retrieve millions of datapoints with one call to Redis, performing even these four steps on every datapoint multiplies the cost of every call to Redis by four. Granted, constants are dropped in time-complexity analysis, but the extra computation here is significant enough to be of some concern.
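
To make that cost concrete, here is a minimal illustration (our own example, using valid quoted-key JSON rather than the shorthand shown earlier) of the conversion every single datapoint must go through:

```typescript
// Every stored string is scanned character by character and converted
// into real datatypes before the back-end can compute with it.
const raw = '{"close": 50.40, "volume": 5000, "time": 1569518396}';

const point = JSON.parse(raw) as { close: number; volume: number; time: number };

// Only after parsing can the fields be treated as numbers rather than text.
console.log(point.close * 2); // 100.8
```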

Optimisation 3: Data Storage

The second problem with storing data in JSON format is that it can get quite large. If we were to store the string presented above into Redis, it would occupy a minimum of 46 bytes in memory. While this might seem trivial at first glance, keep in mind that we have millions of datapoints, each taking on a similar format.

Further, 46 bytes is the lower end of the estimate. We have purposely shortened the numbers to make the string presentable for this article; in reality, we have seen JSON strings of up to 70 bytes in size. If we take a happy medium of 60 bytes per string, four million datapoints will occupy a minimum of 228 MB of RAM (4,000,000 × 60 bytes ≈ 228 MB).

This doesn’t come close to the amount of RAM installed in most modern computers today (especially servers), but our concern is all the data transfers between Redis and our application. Given the size we have demonstrated, the data transfer time is no longer negligible. Add to this the high cost of parsing the JSON strings we have shown, and it becomes obvious that we can do better.

Our solution to both of these problems was to store data in Redis in a more binary-friendly manner. Why binary? Because computers work natively in binary, so storing our data this way incurs less processing overhead.

Considering this, we rewrote our back-end code to store and retrieve data from Redis in base-64 notation. Base-64 encoding allows binary data to be represented as a string, so it can be stored and retrieved in any medium. The data are encoded by splitting the bytes into six-bit segments, and each six-bit segment has a one-to-one mapping with a character. It follows that there are 2⁶, or 64, possible characters that base-64 can generate, which is where base-64 gets its name. Further, since base-64 encoding acts directly on the raw bitstream, its computation time is small enough to be neglected in most cases. In our case it can certainly be neglected, since we always encode a fixed number of bytes: four bytes for the close price, four bytes for the volume, and four bytes for the timestamp (the standard size of a single-precision floating-point number for the first two, and of a 32-bit integer for the timestamp).

Therefore, we encode twelve bytes of data, resulting in strings that are only sixteen characters long (12 bytes × 8 bits/byte ÷ 6 bits per character = 16 characters). This is a dramatic improvement over the sixty-byte average we saw earlier: we now occupy a minimum of 61 MB compared to 228 MB.

Not only do we see a significant reduction in storage size, but decoding the base-64 strings into their original bitstreams is trivial as well, drastically reducing the computation time for data retrieval. This is because of the one-to-one mapping we discussed earlier. Essentially, each character maps to a stream of six bits, so converting a character back into its six-bit value is accomplished in O(1) time. Once we return to the original bitstream, we examine the first four bytes for the close price, the next four bytes for the volume, and the final four bytes for the timestamp.

For the sake of thoroughness, we should add that we must also account for the endianness of the stored data and act accordingly; the reader is free to explore this topic further should they wish to dig into the caveats of dealing with bitstreams.
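
As a concrete sketch of the whole scheme (the twelve-byte layout follows the description above; the little-endian byte order and the Node.js Buffer API are illustrative choices, not necessarily our production code):

```typescript
// Pack one datapoint into 12 bytes, then base-64 encode it to a
// 16-character string; decoding reverses the steps. Little-endian is an
// illustrative choice; the real system just needs to pick one and stick to it.
function encodePoint(close: number, volume: number, time: number): string {
  const buf = Buffer.alloc(12);
  buf.writeFloatLE(close, 0);    // bytes 0-3: close price (float32)
  buf.writeFloatLE(volume, 4);   // bytes 4-7: volume (float32)
  buf.writeUInt32LE(time, 8);    // bytes 8-11: unix timestamp (uint32)
  return buf.toString("base64"); // 12 bytes -> exactly 16 characters
}

function decodePoint(s: string): { close: number; volume: number; time: number } {
  const buf = Buffer.from(s, "base64");
  return {
    close: buf.readFloatLE(0),
    volume: buf.readFloatLE(4),
    time: buf.readUInt32LE(8),
  };
}

const token = encodePoint(50.4, 5000, 1569518396);
console.log(token.length);       // 16
console.log(decodePoint(token)); // close ~50.4 (float32 precision), volume 5000, time 1569518396
```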

Now that our data is stored efficiently, we are seeing quick responses to client-side requests for data, which was our goal.

Another area we explored was the client being overloaded with data. We saw problems where, if too much data was requested, the client's browser would stop responding and eventually crash. The issue was that we were loading the entire dataset at once and relying on the client's browser to cache the data itself.

We now take a more conservative approach: we reasoned that there is no need to load data the user either doesn’t care about or cannot see at the moment. As the user scrolls left or right causing data to fall off the screen, we silently load more data in the background from the server.

The amount of data we load is directly dependent on how much the user scrolls. For example, if they scroll one full day, we load an additional day of data onto the graph. By contrast, if they scroll one-half of a day, we load only one-half of a day’s worth of data. By tracking the user’s scroll position, we can efficiently load data on-demand as opposed to loading thousands of datapoints at once, making the user experience better.
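
A sketch of this on-demand loading follows; the endpoint path and the chart helper are hypothetical stand-ins, not our actual API:

```typescript
// Stand-in for the charting library's API; purely illustrative.
declare const chart: { prependData(points: object[]): void };

let loadedFrom = 1569500000; // earliest timestamp currently on the chart

async function onScrollLeft(visibleFrom: number): Promise<void> {
  if (visibleFrom >= loadedFrom) return; // still inside the loaded window

  // Request exactly the gap the user has exposed, nothing more.
  const res = await fetch(`/api/candles?from=${visibleFrom}&to=${loadedFrom}`);
  const points: { close: number; volume: number; time: number }[] = await res.json();

  chart.prependData(points); // draw the new candles to the left
  loadedFrom = visibleFrom;  // widen the loaded window
}
```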

In addition to loading additional graph data in real time, we must also consider the signals generated by a strategy. As the user scrolls left or right, they would also like to see the arrows indicating “buy” and “sell” signals, just like the arrows they see when they first load the graph. On account of this, we also push the signal information to the client as the user scrolls so that the signals stay in view.

In Summary

As part of our ongoing optimisations, we have corrected the delay users would experience when first loading the graph on the “View” page, during which the graph would briefly not appear. This problem was caused by the client waiting for the server to tell it which indicators to load onto the graph: the client sent the unique ID of the strategy it wanted indicator information for, the server processed the list of indicators appropriate for that strategy, and then delivered that information back to the client. We have eliminated this round trip and now embed the information directly into the page, so the graph loads instantly without waiting for a response from the server.
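
A common way to do this, sketched here with illustrative names (this is not our actual markup), is to render the strategy's indicator list into the page as embedded JSON that the client reads synchronously:

```typescript
// Server side, while rendering the View page: inline the indicator list.
const indicatorConfig = JSON.stringify(["EMA", "MACD", "RSI"]); // from the strategy record
const html = `<script id="indicator-config" type="application/json">${indicatorConfig}</script>`;

// Client side: read the embedded JSON; no network round trip, no waiting.
const el = document.getElementById("indicator-config");
const indicators: string[] = JSON.parse(el?.textContent ?? "[]");
```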

As part of general work, we have introduced the ability for users to like and share strategies. Strategies created by users are unshared by default, meaning they are private. If users wish to share a strategy so that even visitors who are not logged in can view it, they can click the “Share” button, which immediately places the strategy on the home page.

We have also made significant changes to the expression evaluator that runs strategies. We use the expression evaluator to take the code generated by the blocks and translate it into a language our application understands; this step is necessary so the generated strategy can run successfully on the server. As part of the work on the evaluator, we have added many new indicators for users.
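
As a highly simplified illustration of what such a translation step can look like (our block language and evaluator internals are not shown here, so every name below is hypothetical), a block condition might compile into a predicate the server runs against each bar of data:

```typescript
// Hypothetical sketch: turn a declarative block condition such as
// "RSI < 30" into an executable function the server can evaluate per bar.
type Bar = { rsi: number; close: number };
type Predicate = (bar: Bar) => boolean;

function compileCondition(field: keyof Bar, op: "<" | ">", value: number): Predicate {
  return (bar) => (op === "<" ? bar[field] < value : bar[field] > value);
}

const oversold = compileCondition("rsi", "<", 30);
console.log(oversold({ rsi: 25, close: 180 })); // true -> emit a buy signal
```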

The Sparkster Team!
