How We Built Our 60-Node (Almost) Distributed Web Crawler
Web crawling is one of those tasks that is easy in theory (you visit some pages, extract the outgoing links, work out which ones haven’t been visited, queue them up, pop the queue, visit the page, repeat), but really hard…
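
To make the "easy in theory" part concrete, here is a minimal single-process sketch of that loop. It is only an illustration of the textbook idea, not the crawler this post is about. The names `LinkExtractor`, `extract_links`, `crawl`, the injected `fetch` callable, and the toy `SITE` dictionary are all assumptions made up for the example; a real crawler would fetch over HTTP and deal with politeness, retries, and far more URL normalization.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs for the links found in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links and drop "#fragment" so duplicates collapse.
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch(url)` is a hypothetical callable returning the page HTML,
    or None if the page could not be retrieved.
    """
    queue = deque([seed])   # URLs waiting to be visited
    seen = {seed}           # URLs already queued or visited
    visited = []            # URLs fetched successfully, in visit order
    while queue and len(visited) < max_pages:
        url = queue.popleft()          # pop the queue
        html = fetch(url)              # visit the page
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):   # outgoing links
            if link not in seen:                # not visited yet?
                seen.add(link)
                queue.append(link)              # queue it up
    return visited


# Toy in-memory "web" so the sketch runs without network access.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}

print(crawl("http://example.test/", SITE.get))
```

Everything here lives in one process: the queue and the `seen` set are plain in-memory structures, and pages are fetched one at a time.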