Paper: Detecting Near-Duplicates for Web Crawling
Three guys from Google published the paper Detecting Near-Duplicates for Web Crawling
at the 2007 WWW Conference, presenting a technique for detecting near-duplicates in a set of web pages.
They have developed a method aimed at performing…