Malware/Spear Phishing Detection: Meet TLSH — A Locality Sensitive Hash
In digital forensics and cybersecurity, we often use a cryptographic hash to produce a fixed-length value that is, for practical purposes, unique to a given set of data. We can then use this hash to determine whether the data has been changed. Sometimes, though, we need a hash that shows one file is similar to another. This is the case with malware and spear phishing emails, where the data has often been copied from one source to another and changed only a little.
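To see why a cryptographic hash cannot express similarity, here is a minimal sketch using only the Python standard library. The two email bodies are hypothetical examples, and the difflib ratio is a generic text-similarity measure used purely for contrast; it is not TLSH.

```python
import hashlib
from difflib import SequenceMatcher

# Two hypothetical email bodies that differ only in the recipient's name.
original = b"Dear Alice, please review the attached invoice and reply today."
modified = b"Dear Carol, please review the attached invoice and reply today."


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


digest_a = sha256_hex(original)
digest_b = sha256_hex(modified)

# The digests tell us the inputs differ, but not by how much.
print(digest_a)
print(digest_b)
print("hex characters that differ:",
      differing_positions(digest_a, digest_b), "of", len(digest_a))

# A generic similarity ratio (not TLSH) shows how close the inputs really are.
similarity = SequenceMatcher(None, original.decode(), modified.decode()).ratio()
print(f"input similarity ratio: {similarity:.2f}")
```

The digests only answer "same or different", while the similarity ratio stays high for near-identical inputs. A similarity hash aims to give the second kind of answer from compact hash values alone.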
Similarity hashing
In malware analysis and in text similarity, we often have to identify areas of data within files that are similar. One method is TLSH, a locality sensitive hash defined by Oliver et al. [1].
VirusTotal uses it, along with ssdeep, to compute a similarity hash for malware samples.
It is a fuzzy hashing method that requires at least 50 bytes of data. The hash itself is 35 bytes long with…