Charting at Angel One

Abhin Hattikudru
Published in Angel One Square
7 min read · Jun 26, 2023
Charting

In order of importance to me are: 1) the long term trend, 2) the current chart pattern, and 3) picking a good spot to buy or sell.

- Ed Seykota (Market Wizard)

Charts are among the most important tools a trader uses to make trading decisions. Here at Angel One, charting is designated as a Tier 1 service: any issue with chart formation immediately affects our customers. A couple of years ago, our charting solution went through a major revamp, as noted here. Since then we have learned more and improved on our previous design, so we thought it was time to present a technical article about charting.

A typical trading chart looks like the one below. It involves 5 major aspects -

  1. Timeframe — The time period represented by a single candlestick
  2. Price information — The Open, High, Low and Close prices of that timeframe
  3. Candlestick — A visual representation of the Open, High, Low and Close of the timeframe
  4. Visible Duration — A form of zoom level that decides the number of candles the customer wishes to see.
  5. Technical Indicators — These are usually implemented by UI libraries.
Components of a chart

Now a charting solution should be able to -

  1. Calculate the candles as soon as price changes happen (within a couple of seconds). This is usually implemented as a combination of
    - the UI reading the price stream directly from the feed to keep the last candle up to date, and
    - the backend also reading the price stream and caching the candles in databases, ready to be served on request.
  2. Return the pre-calculated candles at scale (in the order of thousands of TPS during peak load).

The charting solution

Our charting solution involves an on-premise cluster for collecting the prices and a Kafka MirrorMaker-based replication to AWS, where we use some of the PaaS offerings of AWS to build charts, as shown in the diagram below.

Charting Flow

This article will cover -

  1. Some details of the Kafka setup that helps move data over to AWS,
  2. The mathematical properties of the price stream that the Kafka consumer uses to construct candles, and
  3. The choice of storage (MySQL) over some alternatives.

UI aspects of the Charting solution will be covered in a future article.

Moving data across Geographies or Datacenters

While our charting solution is on AWS, the pricing information is generated on our on-premise clusters, so we need a reliable way to transfer the data from the on-premise cluster over to AWS. To achieve this we use Kafka to store the prices locally, and MirrorMaker for geo-replication. Kafka, as a pub-sub system, scales to very high throughput (gigabytes per second). However, when moving this data across geographies, some parameters need to be looked at closely, both to prevent data loss and to get higher throughput -

  1. enable.auto.commit = false: Consumers that auto-commit run the risk of committing offsets even though the destination has not yet received the message.
  2. acks = -1: Ensures the write is done on multiple machines before it is acknowledged, else there is a risk of data loss during broker failures. acks = -1 (or all) does not mean every secondary replica must acknowledge the write, only that the minimum number of in-sync replicas needed (usually 2, including the leader) have acknowledged it.
  3. max.in.flight.requests.per.connection = 1: If the ordering of messages is important to the consuming system, the default of 5 could result in out-of-order messages. This tends to happen when the producer retries a previously failed write while a later request is already in flight.
  4. batch.size: Depending on how much memory and processing power the underlying hardware can spare, a larger batch size reduces chattiness when travelling over a high-latency network.
  5. compression.type: Compressed data is always faster over a high-latency network. If the producer can spare the CPU cycles for compression, this should be enabled.

There are, of course, more configurations in Kafka, like the replication factor, the rack awareness of the Kafka cluster, etc., but the ones mentioned above are some of the configurations that producing and consuming applications should account for.
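As an illustration, here is a minimal sketch of these settings using the confluent-kafka Python client. The broker addresses, group id and exact values are placeholders rather than our production configuration, and MirrorMaker exposes equivalent overrides for its own producer and consumer.

from confluent_kafka import Consumer, Producer

# Producer on the replicating side: favour durability and ordering over raw speed.
producer = Producer({
    "bootstrap.servers": "kafka-aws.example.internal:9092",  # placeholder
    "acks": "all",                               # wait for the in-sync replicas
    "max.in.flight.requests.per.connection": 1,  # preserve ordering across retries
    "compression.type": "lz4",                   # fewer bytes over the high-latency link
    "batch.size": 262144,                        # larger batches, fewer round trips
    "linger.ms": 50,                             # wait briefly so batches can fill up
})

# Consumer on the source side: commit offsets only after delivery is confirmed.
consumer = Consumer({
    "bootstrap.servers": "kafka-onprem.example.internal:9092",  # placeholder
    "group.id": "charting-replicator",                          # placeholder
    "enable.auto.commit": False,  # commit manually once the destination has the message
    "auto.offset.reset": "earliest",
})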

Now that the data is available in AWS, we will move on to how we construct a candle for charts.

The math behind a candle

We consume the stream of price changes for the various scrips traded on the exchanges, each of which is called a Tick. A Tick comprises -

  1. Scrip Code — The stock/symbol to which the tick belongs.
  2. Timestamp — The time at which the price change occurred.
  3. Price — The price at which the trade occurred
  4. TotalVolume — The total volume traded for the day at that point in time.
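In code, a tick is just a small record. A minimal sketch in Python (the field names are illustrative, not our exact wire format):

from dataclasses import dataclass

@dataclass
class Tick:
    scrip_code: str    # stock/symbol the tick belongs to
    timestamp: int     # epoch milliseconds of the price change
    price: float       # price at which the trade occurred
    total_volume: int  # cumulative volume traded for the day so far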

A candle for a Timeframe (say 5 mins) is represented as shown in the diagram below -

So building a candle for a Timeframe effectively boils down to collecting all the Ticks for that Timeframe and calculating the Highest, Lowest, First and Last price among them. By tracking these values for the candle, along with the timestamps of the first and the last Tick, any new tick can be used to update the state (candle). This is done by comparing the data in the candle with the data in the tick, as shown in the calculation below.

Candle.High = max(Candle.High, Tick.Price)
Candle.Low = min(Candle.Low, Tick.Price)
Candle.Open = Candle.OpenTimestamp > Tick.Timestamp ? Tick.Price : Candle.Open
Candle.OpenTimestamp = min(Candle.OpenTimestamp, Tick.Timestamp)
Candle.Close = Candle.CloseTimestamp < Tick.Timestamp ? Tick.Price : Candle.Close
Candle.CloseTimestamp = max(Candle.CloseTimestamp, Tick.Timestamp)

These updates can be implemented as an upsert on almost any of the commonly used databases. The functions are, in theory, order-independent, and hence we can consume the ticks out of order.
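For example, on MySQL (anticipating the storage choice discussed later) the whole update can be expressed as a single INSERT ... ON DUPLICATE KEY UPDATE. The sketch below is illustrative rather than our production code: it assumes a hypothetical candles table keyed by (scrip_code, timeframe, bucket_start), a DB-API cursor such as one from mysql-connector-python, and the Tick record from above.

UPSERT_CANDLE = """
INSERT INTO candles
    (scrip_code, timeframe, bucket_start,
     open_price, open_ts, high_price, low_price, close_price, close_ts)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
ON DUPLICATE KEY UPDATE
    high_price  = GREATEST(high_price, VALUES(high_price)),
    low_price   = LEAST(low_price, VALUES(low_price)),
    open_price  = IF(VALUES(open_ts) < open_ts, VALUES(open_price), open_price),
    open_ts     = LEAST(open_ts, VALUES(open_ts)),
    close_price = IF(VALUES(close_ts) > close_ts, VALUES(close_price), close_price),
    close_ts    = GREATEST(close_ts, VALUES(close_ts))
"""

def apply_tick(cursor, tick, timeframe="1m", bucket_ms=60_000):
    """Fold a single tick into its candle; the first tick of a bucket seeds the row.

    The price assignments rely on MySQL applying the updates left to right, so
    open_price/close_price are decided before their timestamps are overwritten.
    """
    bucket_start = tick.timestamp - (tick.timestamp % bucket_ms)
    cursor.execute(UPSERT_CANDLE, (
        tick.scrip_code, timeframe, bucket_start,
        tick.price, tick.timestamp,  # open_price, open_ts
        tick.price,                  # high_price
        tick.price,                  # low_price
        tick.price, tick.timestamp,  # close_price, close_ts
    ))

Because every assignment is a min, max or timestamp comparison, replaying the same tick twice is also harmless, which keeps the ingestion idempotent.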

Higher-timeframe candles can be constructed from candles of smaller timeframes: a 30-minute candle can be built from the individual 1-minute candles within the same window by picking the Highest, Lowest, First and Last price of all the constituent candles. So, while we ingest and prepare the 1-minute candles from the stream in real time, candles of higher timeframes are built asynchronously in batches from the 1-minute candles. During an API call, if we find a higher-timeframe candle missing, we build it from the 1-minute candles on the fly, so the data is still immediately available to customers.
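A rough sketch of that roll-up, reusing the illustrative field names from the upsert above, is just another fold over the smaller candles:

def roll_up(minute_candles):
    """Fold 1-minute candle rows (in any order) into one higher-timeframe candle."""
    first = min(minute_candles, key=lambda c: c["open_ts"])
    last = max(minute_candles, key=lambda c: c["close_ts"])
    return {
        "open_price":  first["open_price"],
        "open_ts":     first["open_ts"],
        "high_price":  max(c["high_price"] for c in minute_candles),
        "low_price":   min(c["low_price"] for c in minute_candles),
        "close_price": last["close_price"],
        "close_ts":    last["close_ts"],
    }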

Volume can also be rolled into the candles in a similar fashion, but we will keep that out of the scope of this post.

The storage choice

Once prices are stored in a database, they can also be viewed as a time series. So one of the first choices we made was to use TimescaleDB to store the prices. However, TimescaleDB came with its limitations -

  1. Lack of High Availability — The open-source version does not have out-of-the-box HA features.
  2. Cloud-native versions are not present in AWS India, and the managed time-series offerings provide limited visibility into HA.

Other columnstore databases in the market are also oriented more toward batch use cases and analytical queries. They could support neither the high TPS we were looking for nor the high update rate (without a lot of tuning). We then noted that our application needs to be optimised for only one type of query -

select * from candles where
timeframe = 'X' and
timestamp between start_time and end_time and
scrip_code = 'S'

So we decided instead to simply store the data in MySQL and tune the indexes for this one query. Since we were on AWS, we stored the candles on AWS Aurora MySQL. Clustering the index by the query filter (scrip code, timeframe and timestamp) brought the access latency down to single-digit milliseconds and scaled to thousands of read TPS.
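For illustration, a hypothetical table definition along those lines, with column names and types matching the earlier sketches rather than our actual schema. InnoDB clusters rows by the primary key, so leading it with the query filter keeps each read a short, contiguous range scan.

CANDLES_DDL = """
CREATE TABLE IF NOT EXISTS candles (
    scrip_code   VARCHAR(32)    NOT NULL,
    timeframe    VARCHAR(8)     NOT NULL,
    bucket_start BIGINT         NOT NULL,  -- candle start time, epoch milliseconds
    open_price   DECIMAL(18, 4) NOT NULL,
    open_ts      BIGINT         NOT NULL,
    high_price   DECIMAL(18, 4) NOT NULL,
    low_price    DECIMAL(18, 4) NOT NULL,
    close_price  DECIMAL(18, 4) NOT NULL,
    close_ts     BIGINT         NOT NULL,
    -- The clustered (primary key) index mirrors the one read pattern:
    -- scrip code + timeframe + a time range.
    PRIMARY KEY (scrip_code, timeframe, bucket_start)
) ENGINE=InnoDB
"""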

We also started metering the timestamp of the last tick processed against the system time when ingesting the data. This helps us better monitor the delay of the ticks (over and above monitoring Kafka lag). We observe that the overall delay from tick to chart is around a second in most cases. Given below are the p99 (~1000 ms), p95 (~960 ms) and p50 (~550 ms) delays of tick ingestion on a typical day.

Tick processing delay
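A minimal sketch of that metering, assuming the ingestion loop sees the tick's exchange timestamp in epoch milliseconds and reports to some histogram-style metrics client (a placeholder here):

import time

def record_tick_delay(tick, delay_histogram):
    """Meter system time minus tick time, so p50/p95/p99 ingestion delay can be tracked."""
    delay_ms = int(time.time() * 1000) - tick.timestamp
    delay_histogram.observe(delay_ms)  # placeholder metrics client with an observe()-style API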

With the data stored in the DB, and with the help of a few other secondary indexes, we are also able to identify “special scrips” in the watchlist that have made the biggest moves by volume, price or value, and flag whether they have reached highs or lows for the year or month.

Tags on watchlist rendered by charts data.

Conclusion

While the newer charting solution is far more reliable and scalable, we still have some distance to go. With the recent announcement of the new AWS Hyderabad region, we plan on using it as an active disaster recovery site for charts. We are also working on several other projects to improve the ingestion delays and the scale of the overall application, which we will cover in future articles.

This project was successfully implemented with contributions from Rohit Miryala (Software Development Engineer 3), Vikas Rajoria (Director of Engineering) and Aditya Sharma (Staff Site Reliability Engineer), among many others. We want to take this opportunity to thank them for their contributions.
