By: Edward Tam
Every vehicle on the road has a 17-character vehicle identification number (VIN) attached to it, as illustrated below. When someone buys or sells a vehicle, though, they want to know much more about it than the VIN reveals on its face. Translating a VIN into those details is called VIN decoding. The data engineering team at TrueCar oversees the VIN decoding service, one of the most heavily used APIs at TrueCar. The service matches a given VIN to a specific make, model, and, where possible, trim. This article covers why VIN decoding is important to our business, why it is challenging, how the core VIN decoding process works at a high level, and, finally, the technical design of the VIN decoder at TrueCar.
VIN Decoder Business Use Cases
VIN decoding is central to TrueCar. It runs throughout our data pipelines, and almost every application uses it in some way.
- Inventory: All vehicles shown to users go through the inventory pipeline, which uses VIN decoding to retrieve trim-level information for each vehicle. When a user selects a vehicle on the page, all trim-related information for each listing (e.g., engine, style, MPG) comes from VIN decoding.
- Dealer Portal: Dealers price their inventory off a vehicle’s invoice price or MSRP, and those figures, along with other vehicle attributes, come from VIN decoding.
- Market Pricing: We retrieve transaction data from multiple sources. One of the major factors in determining a vehicle’s market price is its invoice price or MSRP, so our transaction pipeline uses VIN decoding to retrieve all the pricing and trim-related information needed for market pricing calculations.
The VIN Decoding Problem
So what makes VIN decoding difficult? The 17-character VIN by itself is not always sufficient to identify the trim. A trim is how an original equipment manufacturer (OEM) specifies which set of features comes standard on a vehicle. For example, the L trim of a Toyota Corolla may not include leather seats as standard, but the SE trim may. The drop-down menu in the image below shows the trims for a 2020 Toyota Corolla.
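The post does not describe how UVD parses a VIN, but the standard layout helps show why the trim is hard to read off directly. The sketch below splits a VIN into the sections used for North American vehicles (49 CFR Part 565) and verifies the check digit in position 9, using the example VIN 1FTEW1E42KKC69677 discussed later in this post. The function names are illustrative only, not TrueCar's code. Only a few positions have fixed meanings; trim-related attributes, where they are encoded at all, sit inside the manufacturer-defined vehicle descriptor section (VDS), so decoding them requires outside reference data.

```python
# Sketch: split a 17-character VIN into its standard North American sections
# and validate the check digit (position 9). Illustrative only.

# Letter-to-number transliteration used by the check-digit calculation.
# I, O, and Q never appear in a VIN, so they are deliberately absent.
TRANSLITERATION = {str(d): d for d in range(10)}
TRANSLITERATION.update({
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
})

# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def expected_check_digit(vin: str) -> str:
    """Weighted sum modulo 11; a remainder of 10 is written as 'X'."""
    total = sum(TRANSLITERATION[ch] * weight for ch, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def split_vin(vin: str) -> dict:
    vin = vin.strip().upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        raise ValueError(f"not a well-formed VIN: {vin!r}")
    return {
        "wmi": vin[0:3],          # world manufacturer identifier
        "vds": vin[3:8],          # vehicle descriptor section (OEM-defined)
        "check_digit": vin[8],
        "model_year_code": vin[9],
        "plant_code": vin[10],
        "serial": vin[11:17],
        "check_digit_ok": expected_check_digit(vin) == vin[8],
    }


if __name__ == "__main__":
    print(split_vin("1FTEW1E42KKC69677"))
```

For the example VIN, the weighted sum is 365, and 365 mod 11 = 2, which matches the ninth character. None of the returned fields names a trim.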
And why does it matter to decode to a specific trim? Pricing. Prices differ between trims, and depending on the vehicle, the difference can range from a few hundred to tens of thousands of dollars. For example, an Audi RS 7 performance trim is $131,675, while the standard trim is $114,875, a gap of $16,800. Because accurate decoding matters so much, TrueCar initially ran multiple VIN decoders before consolidating on one.
TrueCar VIN Decoding Flowchart
Decoding proceeds in up to three stages:
- Normal VIN decoding using multiple data sources. We combine data points from several sources to try to decode the VIN to the correct trim.
- Build data lookup. If normal decoding does not narrow the VIN to a single trim, we try to get even more specific. Build data is provided at the individual VIN level and states exactly what is on the vehicle, including its options. We perform a lookup against our HBase data store to check whether build data exists for the given VIN.
- Calculate confidence score. When the existing data points do not map to a single trim and no build data exists, an algorithm calculates a confidence score so that we can return the most likely trim to the client.
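The post does not publish UVD's code or its scoring algorithm, so the following is only a sketch of the three-stage flow under stated assumptions. Plain dictionaries stand in for the in-memory reference data and for the HBase build-data table (a single get by exact VIN key, mirroring a point lookup). The `vin_pattern` key, the per-trim weights, and the normalized-weight score in stage 3 are hypothetical placeholders, not TrueCar's confidence algorithm.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class DecodeResult:
    trim: Optional[str]
    stage: str         # which stage produced the answer
    confidence: float  # 1.0 when the answer is unambiguous


def vin_pattern(vin: str) -> str:
    # Hypothetical reference-data key: positions 1-8 plus the model-year
    # code (position 10), ignoring the check digit and serial number.
    return vin[:8] + vin[9]


def decode(
    vin: str,
    candidates_by_pattern: Mapping[str, Sequence[str]],
    build_data: Mapping[str, str],
    trim_weights: Mapping[str, float],
) -> DecodeResult:
    vin = vin.strip().upper()

    # Stage 1: normal decoding narrows the VIN to a list of candidate trims.
    candidates = list(candidates_by_pattern.get(vin_pattern(vin), ()))
    if not candidates:
        return DecodeResult(None, "no_match", 0.0)
    if len(candidates) == 1:
        return DecodeResult(candidates[0], "reference_data", 1.0)

    # Stage 2: point lookup of per-VIN build data by exact key
    # (a dict stands in for the HBase table here).
    built_trim = build_data.get(vin)
    if built_trim is not None and built_trim in candidates:
        return DecodeResult(built_trim, "build_data", 1.0)

    # Stage 3 (placeholder scoring): normalize non-negative per-trim weights
    # into scores and return the highest-scoring candidate.
    weights = [max(trim_weights.get(t, 0.0), 0.0) for t in candidates]
    total = sum(weights)
    if total == 0:
        # No usable weights: fall back to a uniform score.
        weights, total = [1.0] * len(candidates), float(len(candidates))
    best = max(range(len(candidates)), key=weights.__getitem__)
    return DecodeResult(candidates[best], "confidence_score", weights[best] / total)


if __name__ == "__main__":
    vin = "1FTEW1E42KKC69677"
    candidates = {vin_pattern(vin): ["Trim A", "Trim B", "Trim C"]}  # made-up data
    weights = {"Trim A": 1.0, "Trim B": 3.0, "Trim C": 1.0}          # made-up data
    print(decode(vin, candidates, {}, weights))
    print(decode(vin, candidates, {vin: "Trim C"}, weights))
```

With no build data, the placeholder score picks Trim B (3 of the 5 total weight, or 0.6); adding a build-data entry for the VIN short-circuits stage 3. Ordering the stages this way means the scoring step only runs when the more direct sources cannot settle the trim.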
Let’s walk through an example with VIN 1FTEW1E42KKC69677. The regular VIN decoding process maps this VIN to ten trims, as shown below, so we move on to the build data step. In this instance, build data is available and identifies the correct trim: Lariat SuperCrew 5.5' Box 4WD. Had build data not been available, we would have moved on to the final step and calculated a confidence score instead.
With the business use cases, challenges, and decoding process covered, we will now dive into the technical details of TrueCar’s single VIN decoder.
TrueCar VIN Decoder
A great deal of work has gone into making UVD, TrueCar’s VIN decoding service, operate the way it does today. UVD is currently the most heavily used API in the company. Its average response time is 10–20 milliseconds per request, it decodes to a single trim 93% of the time, and it can handle up to 15,000 requests per second. As mentioned earlier, UVD draws on multiple data sources from different providers, which it loads into memory at deployment. It also performs lookups against HBase; we designed the tables so that every HBase read is a point lookup, which takes only milliseconds on average and explains why response times stay so low. We can also easily scale horizontally should we ever need to in the future.

To keep its cached files current, UVD uses the wormhole design: whenever a file that UVD depends on is updated, a message is sent to UVD, and every running instance reloads that file.
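The post does not describe the internals of the wormhole design, so the following is a generic sketch of message-triggered reloading, not TrueCar’s implementation. Each hypothetical `DecoderInstance` keeps its in-memory data as one snapshot; when an update message arrives, the instance loads a fresh snapshot off to the side and then swaps it in with a single reference assignment, so concurrent lookups see either the old data or the new data, never a partially loaded mix. The in-process `UpdateBus` is a hypothetical stand-in for whatever messaging system delivers the update to every running instance.

```python
import threading
from typing import Callable, Dict, List, Mapping, Optional

Snapshot = Mapping[str, str]
Message = Dict[str, object]


class DecoderInstance:
    """Hypothetical service instance holding reference data in memory."""

    def __init__(self, name: str, loader: Callable[[], Snapshot]) -> None:
        self.name = name
        self._loader = loader
        self._reload_lock = threading.Lock()  # serializes reloads, not reads
        self._snapshot: Snapshot = loader()   # loaded once at deployment

    def lookup(self, key: str) -> Optional[str]:
        # Read the reference once so a request uses one consistent snapshot.
        snapshot = self._snapshot
        return snapshot.get(key)

    def on_files_updated(self, message: Message) -> None:
        with self._reload_lock:
            fresh = self._loader()  # build the new snapshot without touching the live one
            self._snapshot = fresh  # swap it in with one reference assignment


class UpdateBus:
    """In-process stand-in for a broadcast messaging system."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Message], None]] = []

    def subscribe(self, callback: Callable[[Message], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: Message) -> None:
        # Deliver the same update notice to every subscribed instance.
        for callback in list(self._subscribers):
            callback(message)


if __name__ == "__main__":
    files = {"reference": {"1FTEW1E4K": "v1"}}  # made-up file contents

    def load_reference() -> Snapshot:
        return dict(files["reference"])

    bus = UpdateBus()
    fleet = [DecoderInstance(f"uvd-{i}", load_reference) for i in range(3)]
    for instance in fleet:
        bus.subscribe(instance.on_files_updated)

    files["reference"] = {"1FTEW1E4K": "v2"}    # a file changes...
    bus.publish({"updated": ["reference"]})     # ...and a message goes out
    print([instance.lookup("1FTEW1E4K") for instance in fleet])
```

Because the new snapshot is built before the swap, a reload that raises leaves the instance serving its previous data. A production message system would also need to isolate failures per instance, which this simple loop does not do.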
VIN decoding is tricky because, much of the time, the VIN alone is not sufficient. Combining different data sources helps, and that is how TrueCar has been able to decode to the correct trim 93% of the time. We hope this post has given you insight into how VIN decoding is used within TrueCar, why it matters, how we tackled its challenges, and where we are today with UVD.