C++ Backtest Environment — Part 1: Yahoo Finance Data Interface in C++

Evan Kirkiles
Published in Evan Kirkiles Blog · 5 min read · May 27, 2018

As a summer project, I have decided to build my own backtesting environment for trading algorithms in C++, both to learn the language and to acquire a wider understanding of important quant finance topics, including risk management, trading biases, and backtest design. I first learned about this idea from a talk by Michael Halls-Moore, the founder of QuantStart.com, who strongly recommended it not only for practical but also for educational reasons. I am by no means an expert in any of the fields required, and I am learning C++ as I go. Fortunately, there are many resources available documenting similar aspects of this project.

The first step toward a working environment is formulating a plan of action. I aim to create an event-driven backtest system that can take historical data of varying frequencies and use it for algorithmic calculations. One roadblock immediately cropped up: a good, free historical data feed was incredibly difficult to come by, especially with the Yahoo Finance API discontinued about a year ago. Yahoo was by no means a truly reliable data feed, as it has many inconsistencies in its data (judging from the complaints of many of its users). However, it seems to have been the industry standard for free data before its API was dismantled, so I decided to build my system around it.

Of course, the loss of the Yahoo Finance API meant I had to find another way to get at the wealth of data in their system. After a little research, I came across a small article by Brad Lucas illustrating a new way to sidestep the cookie-based wall preventing easy access to the CSVs that are free for download on Yahoo Finance. Unfortunately, the method used Python, not C++, so I had to translate it for use in my environment. This process is documented throughout the rest of this post.

To ‘protect’ the CSV files, Yahoo Finance has two security blocks in place: a crumb embedded in the download URL and a cookie check that prevents direct remote access. Both are fairly easy to bypass with the `libcurl` C library.

Libcurl supports cookie importing and exporting, so getting through the cookie wall simply required grabbing the cookies from the Yahoo Finance page and then presenting them to the CSV download link, like so:

// Create URL for the Yahoo Finance history page for the stock
string cookie_url = string("https://finance.yahoo.com/quote/") + string(symbol) +
                    string("/history?p=") + string(symbol);
// Point the handle at finance.yahoo.com and prepare the cookies file
curl_easy_setopt(cookiecurl, CURLOPT_URL, cookie_url.c_str());
curl_easy_setopt(cookiecurl, CURLOPT_COOKIEFILE, cookiefilename);
curl_easy_setopt(cookiecurl, CURLOPT_COOKIELIST, "ALL");
// Build a Netscape-format cookie
snprintf(nline, sizeof(nline), "%s\t%s\t%s\t%s\t%lu\t%s\t%s",
         ".example.com", "TRUE", "/", "FALSE",
         (unsigned long)time(NULL) + 31337UL,
         "PREF", "hello example, i like you very much!");
// Set the file as the cookie jar
curl_easy_setopt(cookiecurl, CURLOPT_COOKIESESSION, true);
curl_easy_setopt(cookiecurl, CURLOPT_COOKIEJAR, cookiefilename);
curl_easy_setopt(cookiecurl, CURLOPT_COOKIELIST, nline);
curl_easy_setopt(cookiecurl, CURLOPT_FOLLOWLOCATION, 1);
curl_easy_setopt(cookiecurl, CURLOPT_HEADER, 0);
// Perform the HTTP request
curl_easy_perform(cookiecurl);
// Write all retrieved cookies to the file and clean up
curl_easy_cleanup(cookiecurl);
curl_global_cleanup();

Having downloaded the cookies to a .txt file in my project directory, I could then simply set that file as the cookie file to use when later sending an HTTP request to the download link through the libcurl handle ‘curl’:

// Set cookies
curl_easy_setopt(curl, CURLOPT_COOKIEFILE, cookiefilename);

The second security measure Yahoo Finance put in place, a crumb embedded in the download URL, was much less straightforward. The crumb is hidden in the body of the Yahoo Finance history page, and acquiring it required searching that body for a specific string. Fortunately, the crumb is always 11 characters long, so I only had to find the position of the string that precedes it. To get the body of the page in the first place, I again used libcurl:

// Open the file
crumbfile = fopen(crumbfilename, "wb");
if (crumbfile) {
    // Write the page body to this file handle
    curl_easy_setopt(cookiecurl, CURLOPT_WRITEDATA, crumbfile);
    // Perform the blocking file transfer
    curl_easy_perform(cookiecurl);
    // Close the file
    fclose(crumbfile);
}

Then I parsed this body of text for the characters "CrumbStore\":{\"crumb\":\"" (with ‘\’ escaping each quotation mark inside the C++ string literal, of course) and took the 11-character substring that starts where the located string ends.

// Search for the crumb in the newly created body text file
ifstream searchFile(crumbfilename);
string str;
while (getline(searchFile, str)) {
    if (str.find("CrumbStore\":{\"crumb\":\"") != str.npos) {
        size_t pos = str.find("CrumbStore\":{\"crumb\":\"") + 22;
        crumb = str.substr(pos, 11);
        break;
    }
}

With the crumb in hand, I used string manipulation to build the download URL for the OHLC data (open, high, low, close, adjusted close, and volume) of the stock over a given period at a specific interval (daily, weekly, monthly, etc.).

// Get the crumb and use it to create the download URL
string down_url = string("https://query1.finance.yahoo.com/v7/finance/download/") +
                  string(symbol) +
                  string("?period1=") + get_time(startdate) +
                  string("&period2=") + get_time(enddate) +
                  string("&interval=") + string(interval) +
                  string("&events=history&crumb=") +
                  get_crumb_and_cookies(symbol, cookiefilename, crumbfilename);
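The `get_time` helper used above never appears in the post itself. A minimal sketch of one plausible implementation, assuming it converts a "YYYY-MM-DD" date string into the epoch-seconds string that the `period1`/`period2` query parameters expect (only the name and call site are taken from the snippet above; the body is my own):

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Convert a "YYYY-MM-DD" date string to a Unix-epoch-seconds string.
// timegm() interprets the tm as midnight UTC (a POSIX extension;
// the Windows equivalent is _mkgmtime).
std::string get_time(const std::string &date) {
    std::tm t = {};
    std::istringstream ss(date);
    ss >> std::get_time(&t, "%Y-%m-%d");
    return std::to_string(static_cast<long long>(timegm(&t)));
}
```

With this sketch, `get_time("2018-01-25")` yields "1516838400", i.e. midnight UTC on that date.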

Finally, using this download URL together with the cookies exported earlier from Yahoo Finance, I ran a libcurl HTTP request that successfully retrieved the .csv. Hooray!

Of course, this only meant I had the .csv file of the data, which I still had to format for use in my backtest environment. I used a simple ifstream and stringstream to do this, with ‘,’ as the delimiter separating the fields on each line. In this way, I was able to read all the .csv data into a map of maps, with the date as the primary key and ‘open’, ‘high’, ‘low’, ‘close’, ‘adj’, and ‘volume’ as the secondary keys for each value:

// MarketEventQueue constructor
// Creates the two-dimensional map 'data'
ifstream csv(csv_file);
string line;
// Skip the CSV header row (Date,Open,High,...)
getline(csv, line);
// Iterate through the csv file
while (getline(csv, line)) {
    string date;
    // Replace commas so the stringstream splits on whitespace
    replace(line.begin(), line.end(), ',', ' ');
    stringstream ss(line);
    // Get each data point
    map<string, double> dataRow;
    ss >> date;
    ss >> dataRow["open"];
    ss >> dataRow["high"];
    ss >> dataRow["low"];
    ss >> dataRow["close"];
    ss >> dataRow["adj"];
    ss >> dataRow["volume"];
    // Put the date's row into the map
    data[date] = dataRow;
}

By initializing the MarketEventQueue in my CSVReader, I was then able to easily query the Yahoo Finance historical data. Example:

// moves is a MarketEventQueue object from a YahooFinanceCSVReader holding
// the daily prices of AAPL between 2018-01-25 and 2018-02-25
moves.data["2018-01-29"]["open"]

The output for the above code was 170.16, the correct open price for AAPL on January 29, 2018 as determined by Yahoo Finance.

With the data feed running smoothly, I have now set my sights on the event-driven pipeline, so that I can feed the data into an algorithm bar by bar as if it were live trading.

Code used in this post available at: https://github.com/evankirkiles/backtest-environment
