Spotting Spam with Similarity Hashes

Many researchers have looked for methods that hash a string in such a way that the hashes themselves can be compared for similarity. Content can then be searched for and matched in its hashed form rather than as the original text, which makes it more efficient to store and to process. A good example is spam detection, where we receive two spam emails which differ only in the target’s contact name (taken from a real spam email):

Dear Sir, target@home !!Reference #PP-003-AC7-993-014
Account Security Alret.
We need your help resolving an issue with your account.
What's going on?
Your debit or credit card issuer let us know that someone used your
card without your permission. We want to make sure that you authorised
any recent PayPal payments.

and:

Dear Sir, other@home !!Reference #PP-003-AC7-993-014
Account Security Alret.
We need your help resolving an issue with your account.
What's going on?
Your debit or credit card issuer let us know that someone used your
card without your permission. We want to make sure that you authorised
any recent PayPal payments.

With this, we can match an incoming message against a known spam message and obtain a similarity score. If the score goes over a chosen threshold, we can quarantine the message.
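As a rough illustration of the scoring step, the sketch below compares the two messages above using Jaccard similarity over three-word shingles. It works on the raw text rather than on a hash, and is not necessarily the method used in practice; the function names, the shingle size and the 0.8 threshold are all assumptions chosen for illustration.

```python
# Hypothetical helper names; shingle size and threshold are illustrative choices.
TEMPLATE = """Dear Sir, {name} !!Reference #PP-003-AC7-993-014
Account Security Alret.
We need your help resolving an issue with your account.
What's going on?
Your debit or credit card issuer let us know that someone used your
card without your permission. We want to make sure that you authorised
any recent PayPal payments."""


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_score(a, b):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def should_quarantine(message, known_spam, threshold=0.8):
    """Flag the message when its score reaches the chosen threshold."""
    return similarity_score(message, known_spam) >= threshold


known_spam = TEMPLATE.format(name="target@home")
incoming = TEMPLATE.format(name="other@home")
unrelated = "Lunch at noon tomorrow? Let me know if that works."

print(similarity_score(incoming, known_spam))
print(should_quarantine(incoming, known_spam))
print(should_quarantine(unrelated, known_spam))
```

Changing only the contact name alters just the few shingles that contain it, so the score stays close to 1, while an unrelated message shares no shingles with the known spam.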

Charikar
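
A minimal sketch of a Charikar-style similarity hash (often called simhash) is given below, under some assumptions: features are three-word shingles, each feature is hashed with SHA-256 truncated to 64 bits, and all features carry equal weight. The function names are hypothetical, and a production implementation may choose features, weights and hash functions differently.

```python
import hashlib


def simhash(text, bits=64, k=3):
    """Charikar-style fingerprint; bits must be a multiple of 8, at most 256."""
    words = text.lower().split()
    features = [" ".join(words[i:i + k])
                for i in range(max(len(words) - k + 1, 1))]
    counts = [0] * bits
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:  # the sign of the tally gives the output bit
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def hash_similarity(a, b, bits=64):
    """Fraction of matching bits, from 0.0 to 1.0."""
    return 1 - hamming_distance(a, b) / bits


body = ("!!Reference #PP-003-AC7-993-014 Account Security Alret. "
        "We need your help resolving an issue with your account. "
        "What's going on? Your debit or credit card issuer let us know "
        "that someone used your card without your permission. We want to "
        "make sure that you authorised any recent PayPal payments.")
spam_a = simhash("Dear Sir, target@home " + body)
spam_b = simhash("Dear Sir, other@home " + body)

print(hamming_distance(spam_a, spam_b))
print(hash_similarity(spam_a, spam_b))
```

Only the fingerprints need to be stored and compared: messages that share most of their features should differ in only a few bits, whereas unrelated messages tend to differ in around half of them.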

Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.