Proof Engineering: Security Master

How we compile a US Equities security master every morning

Prerak Sanghvi
Proof Reading
12 min read · Mar 1, 2022

“Security Master” is a fancy industry term referring to the list of securities known to a trading system on any given day. It sounds innocuous — how hard can it be to list out all publicly-traded US stocks? — but it is a deceptively hard job. This semi-technical article talks about the basics of a US equities security master, why it is a challenge, and how we compile a scrappy but fairly accurate security master each morning from free or inexpensive sources.

Note: We only trade NMS stocks (aka listed stocks), which means this discussion will omit OTC stocks like pink sheets and bulletin boards.

Background

Security masters are hard. In most equity trading organizations, there is typically a dedicated person or team, whose thankless job it is to compile a working security master for the trading system. If you are that person, no one ever commends you for getting it right day in and day out, but one day you get a single security attribute wrong, and suddenly there are people at your desk demanding root cause analysis and post-mortems. What you would like to tell them is: “Meh, there is no such thing as a security master — this list is based on vendor data and compiled based on heuristics. Today, the upstream data was just wrong or violated some of our assumptions. There are no real action items here other than pointing it out to the vendor.”

This article is about how we take data from multiple sources, stitch it all together, and derive each security attribute from the most reliable source for that attribute.

Wait, is this even a problem?

US equities markets are probably the most mature markets in the world, and yet, on any given morning, there is no definitive list of stocks that are trading on the exchanges (or a definitive list of attribute changes for existing stocks, known as Corporate Actions). To be sure, lists exist, but they’re just not fully reliable or timely (keep in mind that trading begins at 4am ET), which is why nearly every vendor offers an hourly or intraday refresh process.

Some vendors will tell you they have such a list, but trust me, they don’t. There are small quirks and not-so-occasional data issues with every one of them (and they’ll tell you it’s because of upstream data). FINRA CAT produces the best symbol master list in our view, but it doesn’t include most of the attributes you actually need for trading and clearing/settlement. Even DTCC doesn’t have the most accurate data until later in the day — if you happen to be in a position to get data files from them, you’ll find that they’re often a day behind. Listing exchanges (aka primary exchanges) are supposed to be the “horse’s mouth”, the original source, but no, they don’t have a timely accurate list either.

The way this is supposed to work is that issuers provide information to listing exchanges about new listings or corporate actions, and the exchange is supposed to record and disseminate this information to everyone — to the DTCC, FINRA, the SIPs (Securities Information Processors), data vendors, and so on. And there are deadlines for when the information needs to be furnished (e.g. it might be 6pm ET on T-1 for the SIPs). But there are changes that can happen after these cutoff times (e.g. a corporate deal gets finalized after the cutoff time), and the communication of some of this information is manual (someone gets on a phone or sends an email or a fax). Couple this with the fact that the exchanges don’t have a public-facing feed for this information, and no one really knows what the “final” set of information is. Or perhaps a better way to put it is that there is no final set — changes are happening even at 9am ET.

Corporate Actions

You may have heard of a Corporate Actions feed, and you may be wondering if you can just ingest that and apply the updates to your existing security master to derive the next day’s security master. Yes, in theory, and in some cases, you must (e.g. if you support Good-Til-Canceled orders and need to adjust orders based on a stock split).

But there are a few reasons not to do this, if you can at all avoid it. First, the data timeliness issues I mentioned above are applicable to corporate actions lists as well. This makes sense because fundamentally, this is the same information. Second, it is hard — some of the changes can get wonky and are not easy to apply (e.g. a merger with both cash and stock components). Or if you’ve been in this industry long enough, you may recall the extremely confusing GOOG → GOOCV/GOOAV → GOOGL switcheroo. Third, there is no such publicly available feed (established vendors like Bloomberg can charge over $100K/year for this data).

For trading purposes, it is much easier to just compile a fresh list of stocks each morning. You can then compare it with the previous day’s security master and generate an Adds/Deletes/Updates list as a sanity check (and perhaps compare to a list like this and this).
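As a rough illustration of that sanity check, here is a minimal sketch of a day-over-day diff. It assumes each day's security master is held as a dict keyed by some identifier (symbol here, though a durable identifier works better across renames) with a dict of attributes as the value. The function name and the attribute names in the usage note are hypothetical, not our production code.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def diff_masters(
    prev: Dict[str, Record], curr: Dict[str, Record]
) -> Tuple[List[str], List[str], Dict[str, Dict[str, Tuple[Any, Any]]]]:
    """Return (adds, deletes, updates) between two security masters."""
    # Keys present today but not yesterday, and vice versa.
    adds = sorted(curr.keys() - prev.keys())
    deletes = sorted(prev.keys() - curr.keys())

    # For keys present on both days, record (old, new) per changed attribute.
    updates: Dict[str, Dict[str, Tuple[Any, Any]]] = {}
    for key in sorted(curr.keys() & prev.keys()):
        old, new = prev[key], curr[key]
        changed = {
            field: (old.get(field), new.get(field))
            for field in sorted(set(old) | set(new))
            if old.get(field) != new.get(field)
        }
        if changed:
            updates[key] = changed
    return adds, deletes, updates
```

Calling `diff_masters(yesterday, today)` on two small dicts gives three collections that are easy to eyeball or to reconcile against a published adds/deletes list.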

The one piece of information that does require understanding corporate actions is if you need the adjusted previous closing price as part of your security master (we do). See more on that below in the IEX Cloud section.

Security Master Sources

Below are all of the sources we use to compile our security master.

FINRA CAT Reference Data

This is easily the most definitive list available publicly (or as definitive as these things get). We use this as the bones of our security master, which means that our list will not include any security outside of the FINRA list.

It is not perfect, and it may not be timely enough for those who need this at 4am ET when US Equities markets start trading, but it is reliable and complete. The start-of-day file is available around 6am and intraday updates start at 10:30am. The biggest issue though is that it only contains 3 attributes: the ticker, the primary exchange, and whether the security is a test security. You need a whole lot more to actually trade equities.

Exchange Symbol Directories

NASDAQ’s symbol directory is the richest publicly available source, and one of the few that includes the “round lot size” (MEMX/MIAX are the others).

NASDAQ’s symbol directory is timely (published prior to 4am ET, although it does incorporate updates throughout the day), contains nearly everything you need to trade (but not the adjusted previous closing price), and is available in a machine-readable format (pipe-separated values).
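To give a feel for how little code the pipe-separated format needs, here is a minimal parsing sketch. The sample text, its column names, and the trailing "File Creation Time" footer row are assumptions made for illustration; check the header of the actual file you download, since the different directory files do not all share the same columns.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample in the general shape of a pipe-separated directory file.
SAMPLE = """Symbol|Security Name|Listing Exchange|Round Lot Size|Test Issue
ABC|Abc Corp Common Stock|N|100|N
XYZ|Xyz Inc Class A|Q|10|N
ZTEST|Test Security|Q|100|Y
File Creation Time: 0301202206:00||||
"""


def parse_symbol_directory(text: str) -> List[Dict[str, object]]:
    """Parse pipe-separated directory text into a list of records."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        symbol = row["Symbol"]
        # Assumed footer row carrying the file timestamp; skip it.
        if symbol.startswith("File Creation Time"):
            continue
        rows.append(
            {
                "symbol": symbol,
                "name": row["Security Name"],
                "listing_exchange": row["Listing Exchange"],
                "lot_size": int(row["Round Lot Size"]),
                "is_test": row["Test Issue"] == "Y",
            }
        )
    return rows
```

`parse_symbol_directory(SAMPLE)` yields three records here, with the footer row dropped and the lot size converted to an integer.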

NASDAQ actually has multiple versions of this list: NASDAQ-listed, other-listed, NASDAQ-traded, NASDAQ-BX-traded, and NASDAQ-PSX-traded. From my experience/observation, the NASDAQ-BX-traded securities list is typically the most complete and updates ahead of the others (not sure why). That is the one we use.

NASDAQ also produces a few other lists that may be of interest (these include data for all exchanges, not just NASDAQ): (1) Add-Deletes list, (2) Symbol-Change list (the JSON API version is here), (3) Splits (API here), and (4) Dividends (API here). And there’s more if you browse around: IPO Calendar, Earnings, etc.

Kudos to NASDAQ for providing all of this information. Every other listing exchange seems to be stingy in this regard — NYSE doesn’t even bother with a public list (they sell it instead), while CBOE took the term “symbol master” literally and provides just a list of symbols (they hilariously include a single-column CSV download). Among non-listing exchanges, MEMX and MIAX are the next best when it comes to an instrument directory, IEX only provides a human-readable list of symbols, and I can’t find any such list on the LTSE website.

Vendor — IEX Cloud

IEX Cloud is a data distribution business owned by the parent company of the IEX Exchange. It is an incomplete choice for Equities reference data (e.g. there is no lot size information on any API call), but there are a few reasons it is worth a look.

First, they are a good source for OpenFIGI identifiers (FIGI = Financial Instrument Global Identifier). A quick intro to OpenFIGI: there is a deep history there, but a one-line description is that OpenFIGI is Bloomberg’s contribution of its “Bloomberg Global Identifier” to the industry as an open symbology standard. So, if you’re looking for a durable identifier that can identify a security/issue across name changes and corporate actions, OpenFIGI is a good candidate. OpenFIGI’s own website provides a free API, so you don’t really need a vendor like IEX Cloud to provide it, but the API is not entirely straightforward to use (e.g. you are required to know the security type when querying for a symbol, or you have to know when you can use the TICKER field vs the ID_EXCH_SYMBOL field - see OpenFIGI section below for more).

Second, they are a good source for the adjusted previous close price. After almost any corporate action, even one as simple as a dividend, the stock price needs to be adjusted (e.g. in the case of a dividend, the official previous close is decremented by the amount of the dividend; if you thought a dividend was free money, sorry to break this to you). This fully-adjusted previous closing price is not an easy piece of information to get, and IEX Cloud does a terrific job at it (we use the fClose field in this API call).
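The arithmetic behind the simplest adjustments can be sketched as follows. This is deliberately simplified: it covers only a cash dividend (as described above) and a plain stock split, and the function names are hypothetical. Real-world adjustments (spin-offs, mergers with cash and stock components, and so on) are messier, which is exactly why we take this value from a vendor.

```python
from decimal import Decimal


def adjust_for_cash_dividend(prev_close: Decimal, dividend: Decimal) -> Decimal:
    """Previous close reduced by the per-share cash dividend."""
    return prev_close - dividend


def adjust_for_split(prev_close: Decimal, new_shares: int, old_shares: int) -> Decimal:
    """Previous close scaled for a new_shares-for-old_shares split.

    For a 3-for-1 split, each old share becomes three, so the price
    is scaled by 1/3.
    """
    return prev_close * Decimal(old_shares) / Decimal(new_shares)
```

For example, a stock that closed at 50.00 and goes ex-dividend by 0.25 has an adjusted previous close of 49.75, and a 300.00 close before a 3-for-1 split adjusts to 100.00.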

Third, the cost. If all you want is this reference data, it is possible to get it from IEX Cloud for as little as $9/mo. Enough said.

Security Attributes

If you think of the security master as a single table (which is a bit simplistic), the FINRA CAT list tells us what rows this table should have, while the below security attributes are the columns.

Security Master compiled using multiple sources of information
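The rows-and-columns framing can be sketched in a few lines. In this sketch the FINRA CAT list decides which symbols exist, and each attribute is taken from the first source in a ranked list that has a value for it. The source keys, attribute names, and the ranking itself are illustrative assumptions, not our actual configuration.

```python
from typing import Any, Dict, List

# Hypothetical ranking: for each attribute, sources in order of preference.
ATTRIBUTE_SOURCES: Dict[str, List[str]] = {
    "primary_exchange": ["finra_cat", "nasdaq_dir"],
    "lot_size": ["nasdaq_dir"],
    "name": ["nasdaq_dir", "iex_cloud"],
    "figi": ["openfigi", "iex_cloud"],
    "adj_prev_close": ["iex_cloud"],
}

# sources[source_name][symbol] -> dict of attributes from that source
Sources = Dict[str, Dict[str, Dict[str, Any]]]


def compile_master(sources: Sources) -> Dict[str, Dict[str, Any]]:
    """Build one record per FINRA CAT symbol, one attribute at a time."""
    master: Dict[str, Dict[str, Any]] = {}
    # FINRA CAT supplies the rows: symbols missing from it are excluded.
    for symbol in sources["finra_cat"]:
        record: Dict[str, Any] = {"symbol": symbol}
        for attr, ranked in ATTRIBUTE_SOURCES.items():
            record[attr] = None  # stays None if no source has a value
            for source in ranked:
                value = sources.get(source, {}).get(symbol, {}).get(attr)
                if value is not None:
                    record[attr] = value
                    break
        master[symbol] = record
    return master
```

Keeping the ranking in a plain dict makes it cheap to change your mind about which source is most trustworthy for a given attribute.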

Durable Security Identifier

We want our security master to contain a durable identifier that remains stable across name changes and corporate actions. This may not seem like a real requirement at first glance, but it is an absolute must if you perform any sort of security-based analysis on your data (and even more so if historical data drives your trading decisions in some form).

There are essentially two no-cost choices in this regard — Bloomberg’s OpenFIGI or Refinitiv’s PermID. We’ve chosen to adopt OpenFIGI (PermID seems fine as well, but I couldn’t find any rights or warrants on their website, so didn’t dig into it further). More on OpenFIGI below.

You may be thinking about the other commonly used identifiers in Equities such as CUSIP (and CINS), ISIN, and SEDOL. Well, those identifiers are licensed and have costs associated with using them. Last I checked, the base license for CUSIP use started at $50K/year (and it goes up from there depending on how many unique identifiers you use). CUSIP use is unavoidable for clearing and settlement but there is a carve-out for that particular use case (we get our CUSIPs from our clearing firm outside of this process).

This blog post is a good read on this topic.

Ticker Symbol

Speaking of durable security identifiers, the ticker symbol is definitely not one. Symbols change all the time, and also there is no consistent symbology between participants of the equities ecosystem.

Take this example: we receive an order for Berkshire Class B shares as symbol=“BRK”, suffix=“B”. We route a child order to NYSE as “BRK B” and to Nasdaq as “BRK.B”. We use “BRKB” as the symbol with our clearing broker. We look up this security in Bloomberg as “BRK/B”, in Reuters as “BRKb.N”, and in Yahoo Finance as “BRK-B”. That’s 7 variations right there.

At this point, the canonical document for converting between CMS (NYSE family of exchanges + MEMX) and INET (Nasdaq, CBOE, IEX, MIAX) symbologies is this page.

Here is a Rosetta Stone of symbology that I wish someone had given me years ago (please feel free to let me know what the “?” values are supposed to be and I’ll update):

Here are a few handy Python functions to convert between symbologies:
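As a minimal sketch of the idea, the function below formats a share-class symbol in each of the conventions from the Berkshire example above. It only handles the share-class case, which is the one case spelled out in this article; preferreds, warrants, units, rights, and when-issued suffixes each have their own mappings that should be taken from the conversion page rather than guessed. The convention labels and the function name are hypothetical, and the "clearing" format is simply what our clearing broker happens to use.

```python
def format_class_share(
    root: str, share_class: str, convention: str, reuters_exchange: str = "N"
) -> str:
    """Format a root symbol plus share class in a given naming convention."""
    formats = {
        "cms": f"{root} {share_class}",        # NYSE-style: "BRK B"
        "inet": f"{root}.{share_class}",       # Nasdaq-style: "BRK.B"
        "clearing": f"{root}{share_class}",    # our clearing broker: "BRKB"
        "bloomberg": f"{root}/{share_class}",  # "BRK/B"
        # Reuters lower-cases the class and appends an exchange code.
        "reuters": f"{root}{share_class.lower()}.{reuters_exchange}",
        "yahoo": f"{root}-{share_class}",      # "BRK-B"
    }
    try:
        return formats[convention]
    except KeyError:
        raise ValueError(f"unknown convention: {convention!r}") from None
```

Storing the symbol internally as a (root, suffix) pair and formatting it at the edges keeps each counterparty's convention out of the core of the system.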

Security Description/Name

Not much to say here other than that the security name must not be used for anything other than display purposes. These things are extremely unstable and non-uniform.

Primary Exchange

A security’s primary exchange is used for a few different things:

  • Where to subscribe for regulatory halts and other security status information
  • Where to route opening and closing orders
  • Where to post orders intraday (regardless of other heuristics, we typically post on the primary exchange)

Lot Size

Rant about lot size: consider that (1) industry-standard market data (SIP) is only published in units of “lots”, (2) exchanges only protect and display round lot or larger orders, and (3) there is intense industry debate on how round lot sizes should be established. And then realize that there is no official way to actually look up the lot size of an equity security. It’s baffling, really.

Out of the 12,000+ equity securities, only 11 have a lot size other than 100 at the time of this writing. In our system, the lot size is used by the trading strategy for multiple heuristics. For example, it is used to determine the minQty for an order, the size of the opportunistic slice, and the minimum size of a VWAP bucket, to name a few.

Security Type

We currently do not derive this information from any source, but it is on our list to add. It would be good to know if the security is a Common Stock, Preferred Stock, Right, Warrant, Unit, ADR, ETF, ETN, Closed-end-fund, or another type of security.

For now, the only thing that we really need to know is if a security is When Issued or When Distributed, because that affects its settlement parameters. Such securities do not settle on the usual T+2 date; instead, they settle at some point in the future once they are issued or distributed (hence the name!). The common trick for making this determination is: (1) a security is When-Issued if its INET ticker ends in “#” or it is a 5-letter NASDAQ-listed symbol ending in “V”; (2) a security is When-Distributed if its INET ticker ends in “$” or it is a 5-letter NASDAQ-listed symbol ending in “Z”.
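That heuristic translates almost directly into code. The sketch below assumes the caller already knows whether the symbol is NASDAQ-listed (for instance from the primary exchange attribute); the function name and the returned labels are hypothetical.

```python
def settlement_category(inet_symbol: str, nasdaq_listed: bool) -> str:
    """Classify a symbol as WHEN_ISSUED, WHEN_DISTRIBUTED, or REGULAR."""
    # The 5-letter rule only applies to plain alphabetic NASDAQ-listed symbols.
    five_letter_nasdaq = (
        nasdaq_listed and len(inet_symbol) == 5 and inet_symbol.isalpha()
    )
    if inet_symbol.endswith("#") or (
        five_letter_nasdaq and inet_symbol.endswith("V")
    ):
        return "WHEN_ISSUED"
    if inet_symbol.endswith("$") or (
        five_letter_nasdaq and inet_symbol.endswith("Z")
    ):
        return "WHEN_DISTRIBUTED"
    return "REGULAR"
```

Being a heuristic, this is worth pairing with an alert whenever it flags a symbol, so that a human can confirm the settlement terms.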

OpenFIGI

We use OpenFIGI primarily for translating from the exchange symbol (which we get from the FINRA CAT symbol master) to the FIGI. The idea is that if symbol ABC yesterday pointed to FIGI F1 and if symbol XYZ today also points to FIGI F1, they are the same security, despite the ticker change.
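The ABC/XYZ reasoning can be sketched as a comparison of two symbol-to-FIGI maps, one per day. The function name is hypothetical, and the sketch assumes each FIGI maps to at most one symbol on a given day.

```python
from typing import Dict, List, Tuple


def detect_symbol_changes(
    prev: Dict[str, str], curr: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (old_symbol, new_symbol, figi) for FIGIs whose symbol changed."""
    # Invert yesterday's map so we can look up the old symbol by FIGI.
    prev_by_figi = {figi: symbol for symbol, figi in prev.items()}
    changes = []
    for symbol, figi in curr.items():
        old_symbol = prev_by_figi.get(figi)
        if old_symbol is not None and old_symbol != symbol:
            changes.append((old_symbol, symbol, figi))
    return sorted(changes)
```

With `prev = {"ABC": "F1"}` and `curr = {"XYZ": "F1"}`, the result is `[("ABC", "XYZ", "F1")]`: the same security under a new ticker.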

When using OpenFIGI, we found a few quirks that we took note of:

Use idType=TICKER when looking up common stock, warrants, rights, units (basically everything other than Preferred stocks). You may need to provide securityType, securityType2, marketSecDes, or exchCode to narrow down your results.

$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"TICKER", "idValue":"CLAA/U", "securityType": "Unit", "exchCode": "US"}]'
[{"data":[{"figi":"BBG00Z8MRNQ7","name":"COLONNADE ACQUISITION CORP I","ticker":"CLAA/U","exchCode":"US","compositeFIGI":"BBG00Z8MRNQ7","uniqueID":"EQ0000000089313187","securityType":"Unit","marketSector":"Equity","shareClassFIGI":"BBG00Z8MRNR6","uniqueIDFutOpt":null,"securityType2":"Unit","securityDescription":"CLAA/U"}]}]
$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"TICKER", "idValue":"OXY/WS", "securityType2": "Warrant", "exchCode": "US"}]'
[{"data":[{"figi":"BBG00VTL2DT1","name":"OCCIDENTAL PETROLEUM-CW27","ticker":"OXY/WS","exchCode":"US","compositeFIGI":"BBG00VTL2DT1","securityType":"Equity WRT","marketSector":"Equity","shareClassFIGI":null,"securityType2":"Warrant","securityDescription":"OXY/WS"}]}]
$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"TICKER", "idValue":"POST-W", "exchCode": "US"}]'
[{"data":[{"figi":"BBG015H4T6M1","name":"POST HOLDINGS INC - W/I","ticker":"POST-W","exchCode":"US","compositeFIGI":"BBG015H4T6M1","securityType":"Common Stock","marketSector":"Equity","shareClassFIGI":"BBG015H4T7T2","securityType2":"Common Stock","securityDescription":"POST-W"}]}]
$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"TICKER", "idValue":"CELG-R", "marketSecDes": "Equity", "exchCode": "US"}]'
[{"data":[{"figi":"BBG001732JG2","name":"BRISTOL MYERS - CELGENE CVR","ticker":"CELG-R","exchCode":"US","compositeFIGI":"BBG001732JG2","uniqueID":"EQ0000000011412743","securityType":"Right","marketSector":"Equity","shareClassFIGI":"BBG001TCQ0Q2","uniqueIDFutOpt":null,"securityType2":"Right","securityDescription":"CELG-R"}]}]
$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"TICKER", "idValue":"IBM", "marketSecDes": "Equity", "exchCode": "US"}]'
[{"data":[{"figi":"BBG000BLNNH6","name":"INTL BUSINESS MACHINES CORP","ticker":"IBM","exchCode":"US","compositeFIGI":"BBG000BLNNH6","securityType":"Common Stock","marketSector":"Equity","shareClassFIGI":"BBG001S5S399","securityType2":"Common Stock","securityDescription":"IBM"}]}]

Use idType=ID_EXCH_SYMBOL and securityType2=Preferred Stock when looking up Preferred stocks.

$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"ID_EXCH_SYMBOL", "idValue":"GJH", "securityType2": "Preferred Stock"}]'
[{"data":[{"figi":"BBG000007QN8","name":"STRATS-US-04-6","ticker":"STRATS 6.375 12/15/33 USM","exchCode":"NEW YORK","compositeFIGI":null,"uniqueID":"PFEP0115808","securityType":"PUBLIC","marketSector":"Pfd","shareClassFIGI":null,"uniqueIDFutOpt":null,"securityType2":"Preferred Stock","securityDescription":"STRATS 6 3/8 12/15/33"}]}]
$ curl https://api.openfigi.com/v3/mapping --request POST --header 'Content-Type: application/json' --data '[{"idType":"ID_EXCH_SYMBOL", "idValue":"CEQPp", "securityType2": "Preferred Stock"}]'
[{"data":[{"figi":"BBG00BM3XN70","name":"CRESTWOOD EQUITY PARTNER","ticker":"CEQP 9.25 PERP *","exchCode":"NEW YORK","compositeFIGI":null,"uniqueID":"PFEP0497453","securityType":"PUBLIC","marketSector":"Pfd","shareClassFIGI":null,"uniqueIDFutOpt":null,"securityType2":"Preferred Stock","securityDescription":"CEQP 9 1/4 PERP"}]}]

Of course, you don’t always know just from the symbol whether it is a Preferred security (e.g. GJH in the example above). In that case, you may need to try it both ways and see if one of them returns the expected results.
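The "try it both ways" approach might look something like the sketch below, which reuses the same endpoint and request fields as the curl examples above. Treating "exactly one match" as the expected result is an assumption of this sketch, as are the function names. The HTTP call is passed in as a parameter so the fallback logic can be exercised without touching the network.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

MAPPING_URL = "https://api.openfigi.com/v3/mapping"
Job = Dict[str, str]


def post_mapping(jobs: List[Job]) -> List[dict]:
    """POST mapping jobs to OpenFIGI and return the decoded JSON response."""
    request = urllib.request.Request(
        MAPPING_URL,
        data=json.dumps(jobs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def lookup_figi(
    ticker: str,
    exch_symbol: Optional[str] = None,
    post: Callable[[List[Job]], List[dict]] = post_mapping,
) -> Optional[str]:
    """Try a TICKER lookup first, then fall back to a Preferred lookup."""
    jobs = [
        {"idType": "TICKER", "idValue": ticker,
         "marketSecDes": "Equity", "exchCode": "US"},
        # The exchange symbol can differ from the ticker (e.g. "CEQPp" above).
        {"idType": "ID_EXCH_SYMBOL", "idValue": exch_symbol or ticker,
         "securityType2": "Preferred Stock"},
    ]
    for job in jobs:
        results = post([job])
        matches = results[0].get("data", []) if results else []
        # Assumption: exactly one match counts as the "expected result".
        if len(matches) == 1:
            return matches[0]["figi"]
    return None
```

Since the mapping endpoint accepts an array of jobs, both attempts could also be sent in a single request; they are kept sequential here to make the fallback explicit.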

Closing Thoughts

We realize this is a fairly limited view of the “Security Master”. This one doesn’t span geographies or asset classes, and it doesn’t even include some of the common identifiers such as CUSIP or ISIN. However, I’m hoping that the general thought process of stitching together multiple sources to derive a composite list is helpful to someone. Oh, and the symbology conversion table — I think that has to be useful to someone. If you have thoughts or comments, you can reach out to me on Twitter: @preraksanghvi or drop us a note at info @ prooftrading.com.

If you think we’re building cool stuff and would like to join us, please reach out to us at careers @ prooftrading.com. See open roles here.
