Retail Industry Crime Pattern Detection with IBM Confidential Computing

Sandeep Batta
Mar 22, 2022


The Wall Street Journal wrote the following about Organized Retail Crime (ORC): “Retailers are spending millions a year to battle organized crime rings that steal from their stores in bulk and then peddle the goods online, often on Amazon.com Inc.’s retail platform, according to retail investigators, law-enforcement officers and court documents. It is a menace that has been supercharged by the pandemic and the rapid growth of online commerce that has accompanied it.”

According to Home Depot Inc, in the same WSJ article, “its investigations into these kinds of criminal networks has grown 86% since 2016 and exceeded 400 cases last year. The majority involved e-commerce.”

The statistics are staggering:

  • 97% of retailers have been victimized by ORC
  • 68% increase in ORC activity just in the last year
  • It is a $40 billion problem!

*Source: https://nrf.com/media-center/press-releases/two-thirds-retailers-see-increase-organized-thefts

Every retailer has an Asset Loss / Prevention team with its own applications and databases for tracking loss and pilferage in its stores around the United States. Retailers have increased surveillance and committed more resources to the problem, but with limited success against crimes that span the retail landscape. The problem is exacerbated by the legal cost of bringing charges: each individual theft often involves a dollar amount that qualifies only as a misdemeanor, whereas the same thefts, when aggregated, can support felony charges.

But retailers operate in isolation from one another and therefore lack visibility into theft patterns across the industry. The retail industry needs a “National ORC Information Sharing Platform” that can address the following:

  • Identify crime patterns across the aggregated data sets of retailers to find evidence of theft without revealing sensitive data to other retailers
  • Provide assurance that data from one retailer will not be shared with other retailers participating on the platform
  • Provide assurance that retailers will have complete control over their proprietary crime data so that they can opt in to or opt out of collaboration on the platform with no hassles
  • Provide the retailers with actionable insights to be proactive in preventing loss in the first place

The IBM Cloud Hyper Protect team took this challenge head on, designing a solution that meets and exceeds the above requirements and running a PoC that was highly successful with big-name retailers in the US.

What is Hyper Protect?

IBM Cloud Hyper Protect is a family of services that includes a key management service (KMS) and cloud hardware security module (HSM) supporting industry standards such as PKCS #11. It implements security with “technical assurance”: customer data is always owned and controlled by the customer, and IBM Cloud operators cannot access it. In contrast, “operational assurance” depends on procedures and promises from the Cloud Service Provider (CSP). When those are flouted, the resulting bad press and headline stories can cause extensive damage to the customer’s brand image.

The Hyper Protect platform is based on LinuxONE / Secure Service Container (SSC) technology, which provides a “confidential computing” platform that prevents anyone other than the customer from accessing customer data, as illustrated in Fig 1.

Fig 1. Technical Assurance of IBM Cloud Hyper Protect Services

At the time of writing, services built on the Hyper Protect platform include:

  1. Hyper Protect Crypto Services (HPCS)
  2. Hyper Protect Virtual Server
  3. Hyper Protect DBaaS (HP DBaaS) for MongoDB & PostgreSQL

The IBM Cloud Hyper Protect Crypto Service enables data protection with a single-tenant, dedicated KYOK (Keep Your Own Key) key management service (KMS) that provides access to a FIPS 140-2 Level 4 certified hardware security module (HSM). FIPS 140-2 Level 4 certification is a de facto standard for the financial services industry and is the cornerstone of the FS Cloud initiative.

Hyper Protect for ORC

The National ORC platform brings together retailers who do not inherently trust each other with their data. To meet the data-confidentiality requirement, each retailer’s data must stay in its own protected enclave while it is being processed and stay protected while it is at rest. The IBM Cloud Hyper Protect Virtual Server (HPVS), which is also available as an on-premises offering if you have access to an IBM LinuxONE machine, fits this requirement well: it is a “confidential computing” platform because of the technical assurance of the platform on which it is built.

To address the opt-in / opt-out requirement, each participating retailer needs a “red button” control: all of the retailer’s data must become inaccessible at the push of a button. HPCS, with its unique KYOK feature, provides this ability by giving control of the Master Key to the retailer. Because the Master Key protects the retailer’s Data Encryption Keys, the retailer can make all of its data inaccessible simply by disabling the Master Key, which renders all the downstream encryption keys unusable as well.
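The “red button” idea can be illustrated with a small sketch of a key hierarchy. This is not the HPCS API: the `MasterKey` class, the `encrypt_record` / `decrypt_record` helpers, and the toy XOR cipher are all hypothetical and exist only to show why disabling one top-level key locks every record protected by keys beneath it. The cipher is deliberately simplistic and must not be used to protect real data.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR cipher using SHA-256 in counter mode. Illustration only, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class MasterKey:
    """Hypothetical stand-in for a retailer-controlled master key."""

    def __init__(self) -> None:
        self._material = secrets.token_bytes(32)
        self.enabled = True

    def _require_enabled(self) -> None:
        if not self.enabled:
            raise PermissionError("master key is disabled")

    def wrap(self, dek: bytes) -> bytes:
        """Encrypt a data encryption key (DEK) under the master key."""
        self._require_enabled()
        return _keystream_xor(self._material, dek)

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        """Recover a DEK; fails once the master key has been disabled."""
        self._require_enabled()
        return _keystream_xor(self._material, wrapped_dek)


def encrypt_record(master: MasterKey, plaintext: bytes):
    """Encrypt with a fresh DEK and keep only the wrapped DEK with the ciphertext."""
    dek = secrets.token_bytes(32)
    return master.wrap(dek), _keystream_xor(dek, plaintext)


def decrypt_record(master: MasterKey, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the DEK with the master key, then decrypt the record."""
    dek = master.unwrap(wrapped_dek)
    return _keystream_xor(dek, ciphertext)


master = MasterKey()
record = b"incident 1234, store 17"
wrapped, ciphertext = encrypt_record(master, record)
recovered = decrypt_record(master, wrapped, ciphertext)  # works while enabled

master.enabled = False  # the retailer presses the "red button"
try:
    decrypt_record(master, wrapped, ciphertext)
    locked_out = False
except PermissionError:
    locked_out = True
```

Because only wrapped DEKs are ever stored, the stored data does not need to be touched to revoke access: once the master key refuses to unwrap, every ciphertext beneath it is unreadable.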

Federated Learning for ORC

To address the requirement of generating insights and finding crime patterns that span retailers, without combining or aggregating data from the participating retailers, the IBM Research team brought their A game and got Federated Learning working on IBM Cloud Hyper Protect Virtual Server. Federated Learning is best explained by the diagram in Fig 2.

Fig 2. Federated Learning with Hyper Protect Virtual Servers

For the ORC use case, each retailer owns an HPVS instance where all of that retailer’s data resides and is operated on. ORC data brought in by a retailer is never shared or combined. The Aggregator function, which can also run on an HPVS instance, pushes a machine learning model to each participating retailer’s HPVS for training. Training happens in the individual retailer enclaves, and what each retailer sends back to the Aggregator is a hash function that is representative of that retailer’s data. The Aggregator derives the insights and pushes results back to each retailer individually.
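The round trip between the Aggregator and the retailers can be sketched in a few lines. The article does not describe the PoC’s model or the exact form of what each retailer returns, so this sketch assumes plain federated averaging: each party returns locally trained model parameters, and the Aggregator averages them weighted by sample count. The tiny linear model, the retailer names, and the data are all made up for illustration.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]   # (feature, label)
Weights = Tuple[float, float]  # (slope, intercept)


def local_train(weights: Weights, data: List[Sample],
                lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run a few gradient-descent epochs on one retailer's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def aggregate(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average the returned parameters, weighted by each party's sample count."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return w, b


# Hypothetical private datasets; every point lies on y = 2x + 1.
retailer_data: Dict[str, List[Sample]] = {
    "retailer_a": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "retailer_b": [(0.5, 2.0), (1.5, 4.0)],
    "retailer_c": [(1.0, 3.0), (3.0, 7.0), (2.5, 6.0), (0.2, 1.4)],
}

global_weights: Weights = (0.0, 0.0)
for _ in range(200):
    # Each retailer trains inside its own enclave; raw data never leaves it.
    updates = [(local_train(global_weights, data), len(data))
               for data in retailer_data.values()]
    # Only the trained parameters and sample counts reach the Aggregator.
    global_weights = aggregate(updates)
```

The Aggregator never sees a single `(feature, label)` pair, yet the shared model ends up fitting the pattern present across all three datasets.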

Putting everything together

Multiple resources in IBM Cloud made up the PoC environment. Fig 3 shows the complete list and setup.

Fig 3. Complete PoC Setup

  • Each retailer gets a full complement of resources in its own IBM Cloud account, which is isolated from and inaccessible to every other retailer
  • Each retailer owns the Master Key for their own instance of HPCS
  • Each retailer’s data in IBM Cloud Object Storage (COS) is encrypted / protected by keys from the HPCS instance they own
  • Hyper Protect DBaaS is used for metadata
  • Model training happens in individual HPVS instances in accordance with the principles of federated learning

Working with ORC Data

Big-name retailers from the following industry segments participated in the PoC:

  • Home improvement
  • Pharmacy
  • Clothing
  • Big Box Department Store chains
  • Grocery

The Data Flow, as illustrated in Fig 4, was kept simple to shield the crime prevention folks from the technology that powered the solution:

1. Log on to a Custom Portal

2. Upload data in a pre-determined CSV format

3. Get patterns / insights of crimes happening within and across stores in a locality, by zip code, or across the complete eastern seaboard

Fig 4. Data Flow Diagram
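As a rough sketch of the upload and summary steps in the data flow, the snippet below parses a CSV upload and rolls incidents up by zip code. The article does not give the PoC’s actual CSV format, so the column names (`incident_id`, `store_id`, `zip_code`, `date`, `item_category`, `loss_usd`) and the sample rows are assumptions made purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical upload; the real pre-determined format is not described in the article.
SAMPLE_UPLOAD = """incident_id,store_id,zip_code,date,item_category,loss_usd
1001,S17,10001,2022-01-03,power tools,349.00
1002,S17,10001,2022-01-04,power tools,279.50
1003,S22,10002,2022-01-04,clothing,45.00
1004,S31,10001,2022-01-05,power tools,399.99
"""


def load_incidents(csv_text: str) -> list:
    """Parse uploaded CSV text into a list of dicts, converting the loss to a float."""
    incidents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["loss_usd"] = float(row["loss_usd"])
        incidents.append(row)
    return incidents


def summarize_by_zip(incidents: list) -> dict:
    """Count incidents and total the loss for each zip code."""
    counts = Counter()
    losses = Counter()
    for incident in incidents:
        counts[incident["zip_code"]] += 1
        losses[incident["zip_code"]] += incident["loss_usd"]
    return {
        zip_code: {"incidents": counts[zip_code],
                   "loss_usd": round(losses[zip_code], 2)}
        for zip_code in counts
    }


incidents = load_incidents(SAMPLE_UPLOAD)
summary = summarize_by_zip(incidents)
```

The same grouping key could be swapped for a store, a city, or a wider region to produce the locality-level and seaboard-level views mentioned in step 3.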

The Data Science magic that happens in the background is key to getting meaningful insights. It took a few tries, some heartache, and a whole lot of tweaks to find patterns, and the patterns were mind-boggling. The insights confirmed what the WSJ article quoted earlier in this blog described, and validated our approach. Here is a closer look at some of the patterns:

  • Home-Improvement: Individuals, identified by clothing, physique, and vehicle, going from one store to another between about 9am and 6pm, only on weekdays, and taking expensive tools of a certain brand. Almost like a regular job!
  • Department Stores: A single individual, identified by name, stealing clothing worth a small dollar amount each time. Individually, each theft would have been classified as a misdemeanor not worth prosecuting, but added up over a 90-day period they totaled $11,000. This retailer got enough evidence through our analysis to start felony proceedings against the individual.
  • Home-Improvement / Clothing: A single individual, identified by physical characteristics and by stolen clothing, going into multiple home-improvement stores in a certain zip code and stealing tools
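The department store pattern comes down to adding up small thefts per individual over a rolling window. The sketch below shows one way to do that with a sliding 90-day window. It is not the PoC’s analysis code: the subject identifiers, dates, amounts, and the `FELONY_THRESHOLD_USD` value are hypothetical, and real felony thresholds vary by jurisdiction.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical threshold, for illustration only.
FELONY_THRESHOLD_USD = 1000.00
WINDOW = timedelta(days=90)


def flag_repeat_offenders(incidents, threshold=FELONY_THRESHOLD_USD, window=WINDOW):
    """Return {subject_id: largest windowed total} for subjects whose thefts
    within any single window add up to at least the threshold."""
    by_subject = defaultdict(list)
    for subject_id, when, amount in incidents:
        by_subject[subject_id].append((when, amount))

    flagged = {}
    for subject_id, events in by_subject.items():
        events.sort()  # chronological order
        start = 0
        running = 0.0
        best = 0.0
        for when, amount in events:
            running += amount
            # Drop incidents that fall outside the window ending at `when`.
            while when - events[start][0] > window:
                running -= events[start][1]
                start += 1
            best = max(best, running)
        if best >= threshold:
            flagged[subject_id] = round(best, 2)
    return flagged


# Made-up incidents: (subject_id, date, loss in USD)
sample_incidents = [
    ("subject_1", date(2022, 1, 5), 180.0),
    ("subject_1", date(2022, 1, 19), 240.0),
    ("subject_1", date(2022, 2, 2), 310.0),
    ("subject_1", date(2022, 3, 1), 295.0),
    ("subject_2", date(2022, 1, 10), 60.0),
    ("subject_2", date(2022, 6, 20), 75.0),
]

flagged = flag_repeat_offenders(sample_incidents)
```

In this sample, none of the individual thefts reaches the threshold on its own, but the first subject’s four incidents fall inside one window and together exceed it, while the second subject’s two incidents are too far apart to be combined.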


Sandeep Batta

Sandeep is passionate about bringing various technologies together to develop use cases and patterns that can solve real world problems