How AI and Web Crawling Can Empower Customs Risk Management Teams

Abishek Bhat
Nov 13 · 4 min read

Customs agencies across the globe are waking up to a new challenge: rapidly growing volumes of international small-package shipments. United States Customs and Border Protection (CBP) noted that international mail shipments have increased by more than 200 percent over the past five years, from roughly 150 million per year to nearly 500 million.

What could the reason for this be? Cross-border e-commerce has been on a tremendous growth trajectory, growing 25% year over year. It’s reasonable to attribute the surge in international mail shipments to this growth. This trend significantly increases the surface area for customs risk, a problem made worse by the fact that most checks are carried out manually and at random.

Here’s how a typical manual check works:

  1. A standard shipment is paired with a declaration that enumerates an HS Code, the weight of the shipment, the declared value and a description of the good.
  2. An independent estimate of the good’s price and weight provides a strong risk signal: a significant deviation between the declared and the estimated price or weight makes a strong case for further investigation.
  3. Estimating price and weight without an easy-to-search database is non-trivial and requires matching the shipment to a very specific product sold online.
  4. Customs officers often resort to web searches on Google/Bing using the shipping information, in an attempt to find products that match the description and arrive at the estimates. This method is far from bulletproof, since shipping descriptions often do not translate well into search queries.
  5. The officer has to spend a significant amount of time rephrasing and re-engineering the search terms to get a relevant answer.
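
The deviation check in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Declaration` fields, the `needs_review` helper, the placeholder HS code and the 50% threshold are all hypothetical choices, not values taken from any customs agency.

```python
from dataclasses import dataclass


@dataclass
class Declaration:
    """Fields a shipment declaration carries (names are illustrative)."""
    hs_code: str
    description: str
    declared_value_usd: float
    declared_weight_kg: float


def relative_deviation(declared: float, estimated: float) -> float:
    """Return |declared - estimated| / estimated."""
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    return abs(declared - estimated) / estimated


def needs_review(decl: Declaration, est_value_usd: float,
                 est_weight_kg: float, threshold: float = 0.5) -> bool:
    """Flag the shipment if value OR weight deviates beyond the threshold.

    The 0.5 default is an assumed cut-off, chosen only for the example.
    """
    return (
        relative_deviation(decl.declared_value_usd, est_value_usd) > threshold
        or relative_deviation(decl.declared_weight_kg, est_weight_kg) > threshold
    )


# "0000.00" is a placeholder, not a real HS code.
under_declared = Declaration("0000.00", "iphone xs max", 100.0, 0.2)
print(needs_review(under_declared, est_value_usd=1000.0, est_weight_kg=0.21))
```

The hard part is not this comparison but obtaining `est_value_usd` and `est_weight_kg` in the first place, which is what the remaining steps describe.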

This manual process is error-prone and hard to scale, which makes it a poor fit for growing shipment traffic. Is there a better approach?

We believe that AI (Artificial Intelligence) can make a huge difference here, and we’d like to show you how.

Over the last few years, our team at Semantics3 has made significant investments in building cutting-edge technology to crawl, categorize, organize and deeply understand product information in a highly scalable fashion.

Our expertise in large-scale unsupervised content extraction and information enrichment enables us to build a highly scalable solution that can help streamline manual processes and make risk assessment a lot more efficient.

To this end, we’ve built an app that helps customs officers quickly get estimates of price, weight and HS code for any shipment.

Estimates for ‘iphone xs max’

Behind the scenes

  1. NER, Named Entity Recognition: We take shipping information/customs declaration as input and extract meaningful structured data from it. This allows us to tabulate key attributes like the reported weight, declared value and other granular attributes from the product information.
  2. SQE, Search Query Engineering: The manual process involves figuring out the right set of keywords from the product information to formulate a web search query such that relevant results are returned. We approach this as a reinforcement learning problem, where our AI model tweaks and builds search queries using the structured data from NER.
  3. UCE, Unsupervised Content Extraction: Each search result is fed into our patent-pending content extraction system, which visits every URL in the search result and extracts structured information from images, tables, and text on the page.
  4. Price/Weight Estimation: All the meaningful information gathered from each one of the product pages is fed into our multi-headed regression model trained on the weight and price data from our catalog of over 200 million products spanning 20,000 categories.
  5. HS Code Estimation: Similar to the price/weight estimation, we use the data from product pages and the input product description to estimate an appropriate HS Code using a deep hierarchical text classification model.
  6. Human Intelligence/Human in the loop: AI systems do not always provide the right answers. In cases where our system isn’t very confident about its estimates, we allow the user to mark the wrong ones, and this continuous feedback helps the larger system get better over time.
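
To make the flow of data through these stages more concrete, here is a deliberately simplified Python sketch of the first and last steps. It is not our production system: a regular expression stands in for the NER model, a median over page prices stands in for the regression model, and the function names, the input format and the 25% spread cut-off are all assumptions made for the example.

```python
import re
from statistics import median


def extract_entities(text: str) -> dict:
    """Toy stand-in for NER: pull weight and value out of a free-text declaration."""
    weight = re.search(r"(\d+(?:\.\d+)?)\s*kg", text, re.IGNORECASE)
    value = re.search(r"USD\s*(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    # Whatever remains after removing the matched spans is treated as the product.
    product = text
    for match in (weight, value):
        if match:
            product = product.replace(match.group(0), " ")
    return {
        "product": " ".join(product.split()),
        "weight_kg": float(weight.group(1)) if weight else None,
        "value_usd": float(value.group(1)) if value else None,
    }


def estimate_price(page_prices: list, max_spread: float = 0.25):
    """Return (estimate, needs_human_review).

    The median replaces the regression model; a wide spread between the
    cheapest and most expensive page is used as a crude low-confidence signal.
    """
    if not page_prices:
        return None, True
    mid = median(page_prices)
    if mid <= 0:
        return mid, True
    spread = (max(page_prices) - min(page_prices)) / mid
    return mid, spread > max_spread


entities = extract_entities("iphone xs max 0.2 kg USD 300")
estimate, review = estimate_price([1000.0, 1050.0, 1100.0])
print(entities, estimate, review)
```

When `needs_human_review` is true, the estimate would be shown to the officer for correction rather than trusted as-is, which is the feedback loop described in step 6.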

If you’re curious to try out the app, please schedule a demo with us!

This article was originally published on the Semantics3 Blog
