What Hacker News Wants to Know From Paul Graham: A What Would Paul Graham Do Postmortem
Two weeks ago (10/15/2012), “What Would Paul Graham Do?” (WWPGD), a site that I developed, made it to the front page of Hacker News. After being posted at approximately 11:00, the site saw 12,783 visits within a 13-hour window. Subsequently, it was covered on TechCrunch and Inc and saw 2,846 visits the next day.
Site Architecture: I Love Heroku
A question that developers, myself included, often ask about a successful site is how it was built. I scraped PG’s essays from his site and queried the HN Search API for his comments using a few Python scripts. These documents were sanitized and indexed using functions from NLTK. The site was hosted on Heroku and used the MongoHQ and MemCachier add-ons. The MongoHQ instance stored the documents, and MemCachier served as a simple cache for the results of database queries. I added the caching only after discovering that the database queries were the performance bottleneck. The front end is a simple Flask interface that queries these stores.
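The caching layer described above follows the common cache-aside pattern. Here is a minimal sketch of that pattern, assuming a plain dictionary as a stand-in for MemCachier and a hypothetical `search_documents` function as a stand-in for the MongoDB query; none of these names come from the actual site code:

```python
import hashlib

# Stand-in for a memcached client; the real site used the MemCachier add-on.
_cache = {}

# Tracks how often the (hypothetical) database lookup actually runs.
stats = {"db_calls": 0}


def search_documents(query):
    """Hypothetical slow database lookup; returns ids of matching documents."""
    stats["db_calls"] += 1
    corpus = {1: "how to start a startup", 2: "where to go for lunch"}
    terms = query.lower().split()
    return [doc_id for doc_id, text in corpus.items()
            if any(term in text.split() for term in terms)]


def cache_key(query):
    """Hash the normalized query so the key is short and contains no spaces."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_search(query):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = cache_key(query)
    if key in _cache:
        return _cache[key]
    results = search_documents(query)
    _cache[key] = results
    return results
```

With this shape, repeated queries (and a front-page spike tends to produce many of them) hit the database only once; a real deployment would also set an expiry time on each cached entry.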
What HN Queried
Out of the 12,783 visits there were a total of 15,204 queries and 8,340 unique queries. The top 10 queries and their frequencies were as follows:
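Figures like these can be derived from a raw query log with a simple frequency count. The sketch below assumes the log is a list of query strings and that queries are compared after lowercasing and collapsing whitespace; the sample data is made up for illustration, and the normalization actually used for the numbers above may differ:

```python
from collections import Counter


def summarize_queries(queries, top_n=10):
    """Return (total queries, unique queries, top_n most frequent queries)."""
    # Lowercase and collapse whitespace so trivial variants count as one query.
    normalized = [" ".join(q.lower().split()) for q in queries]
    # Drop empty submissions.
    normalized = [q for q in normalized if q]
    counts = Counter(normalized)
    return len(normalized), len(counts), counts.most_common(top_n)


# Hypothetical sample log, not real data from the site.
sample_log = [
    "where to go for lunch",
    "Where to go for lunch ",
    "should i quit my job",
    "lunch",
    "",
]

total, unique, top = summarize_queries(sample_log, top_n=2)
```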
I’m not sure why the top result is the URL of the site itself, but the second and fourth are easily explained by the fact that PG commented on the thread and mentioned that “It appears to work. I asked it where to go for lunch”. In fact, the word “lunch” was mentioned in a total of 1,322 queries, which is 12% of all queries! It’s interesting to note that there were a total of 1,472 queries for lunch over the lifetime of the site; the additional 150 queries probably came from TC or Inc visitors. This, along with a graph of queries for “lunch” over time, demonstrates the heavy influence of PG’s comment on the queries, and more generally the impact of comments on site usage.
Queries for “lunch” are counted in 10-minute bins. Each count is normalized by the total number of queries within the same 10 minutes.
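The binning and normalization described in this caption can be sketched as follows. This assumes each log entry is a (timestamp, query) pair and that a query counts as a “lunch” query when it contains that word as a substring; the function name and the sample entries are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def lunch_fraction_by_bin(entries, term="lunch", bin_minutes=10):
    """Map each bin start time to the fraction of its queries containing term.

    bin_minutes is assumed to divide 60 evenly.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for timestamp, query in entries:
        # Round the timestamp down to the start of its bin.
        bin_start = timestamp.replace(
            minute=timestamp.minute - timestamp.minute % bin_minutes,
            second=0,
            microsecond=0,
        )
        totals[bin_start] += 1
        if term in query.lower():
            matches[bin_start] += 1
    # Normalize each bin's match count by that bin's total query count.
    return {b: matches[b] / totals[b] for b in sorted(totals)}


# Hypothetical sample entries, not real data from the site.
entries = [
    (datetime(2012, 10, 15, 12, 38, 0), "how to raise money"),
    (datetime(2012, 10, 15, 12, 41, 10), "where to go for lunch"),
    (datetime(2012, 10, 15, 12, 45, 0), "lunch"),
    (datetime(2012, 10, 15, 12, 49, 59), "should i quit my job"),
    (datetime(2012, 10, 15, 12, 50, 0), "Lunch?"),
]

fractions = lunch_fraction_by_bin(entries)
```

Normalizing by the bin total keeps a general traffic surge from looking like a surge in “lunch” queries specifically.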
Looking up PG’s comment about lunch through the HN Search API reveals that he posted it at 12:40:56 PST. In the graph above, the largest spike occurs directly after that comment, and the steepest slope is between 12:30 and 12:50.
Looking through the top 100 queries manually, I found a few intriguing ones:
- should i quit my job (59)
- how do you cook steak (57)
- should i raise money (34)
- what is the meaning of life (33)
- kevin wu sucks and should grow up (20)
The query containing my name led me to search for other queries about me. Two other amusing queries that contained my name are: “how should i invite kevin wu to dinner” and “how should i kill kevin wu”. It’s also interesting to note that there were a handful of simple, non-malicious XSS and SQL injection attempts.
I was also curious about which languages HNers were most interested in. Here’s a graph of a few popular languages along with the number of queries for each.
The Future of WWPGD
I’m planning to keep WWPGD up and running, since people have reported that it offers some value to them. Thanks to some advice from Andres Morey (@morey_) of Octopart, I’ve refined the indexing of PG’s comments from HN to encompass all of his comments. To all the supporters, thanks for all the fish!
Originally published at kevinformatics.com on 2012-11-02.