Using the Hacker News API

Chris Opperwall
Published in The OpperBlog
5 min read · Dec 7, 2014


Compliments and Criticisms

I’ve been getting a portion of my daily information from Hacker News for a couple of years now. I found that the content aggregated on the site was almost identical to the content I received by reading eight to ten technology and programming subreddits. Since the information was all in the same place, I found it easier to consume (this was before multireddits, which are awesome).

I don’t have too much of a problem reading Hacker News on a full-size monitor, but reading posts on a phone is a less pleasant experience. The website is nearly impossible to read on a phone (I’ve heard YCombinator is trying to remedy this), and the third-party apps didn’t quite have the look and feel that I wanted.

I had been looking for a good project idea from which to build a standalone browser app. By standalone I mean that it can run completely off JavaScript in the user’s browser without any assistance from any backend of mine. Most of my experience has been with Java and PHP, so I really wanted to improve my frontend chops.

When YCombinator announced an official Hacker News API, I decided to make a Hacker News web app that would be readable on both desktops and phones, using only JavaScript in the browser.

The Hacker News API is only the third API I’ve used (the other two being Reddit’s and iFixit’s), so I’m not an expert on how they’re supposed to be structured. That being said, I found the HN API to be surprisingly simple, yet frustrating for the specific task I was trying to use it for.

For a little background information, and to show how simple the API is, I’ll quickly go over the parts and endpoints I made the most use of. The base URL for the endpoints is http://hacker-news.firebaseio.com/

Items

/v0/item/<integerid>.json

All link posts, comments, jobs, Ask HN posts, and polls are categorized as “items”. They can be accessed by their unique id at this endpoint.
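The app itself is browser JavaScript, but the request shape is easy to show in a few lines of Python. This is a minimal sketch, assuming the HTTPS form of the base URL; `get_json`, `fetch_item`, and `describe` are hypothetical helper names, and the `fetch` parameter exists only so a fake can stand in for the network.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_item(item_id, fetch=get_json):
    """Fetch a single item; the API answers null (None) for ids with no item."""
    return fetch(f"{BASE_URL}/item/{item_id}.json")


def describe(item):
    """Summarize an item using its "type", "id", and "by" fields."""
    if item is None:
        return "missing item"
    # "type" is what tells a story apart from a comment, job, or poll.
    return f"{item['type']} {item['id']} by {item.get('by', '?')}"
```

Calling `describe(fetch_item(1))` costs exactly one network request, which is the unit of work everything below keeps counting.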

Users

/v0/user/<userid>.json

User information can be accessed at the user endpoint with their specific id (their username).
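A similar Python sketch for the user endpoint, assuming the HTTPS base URL and the `karma` and `submitted` fields of the user object; `fetch_user` and `user_summary` are hypothetical names, and `fetch` can be replaced with a fake for offline use.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_user(username, fetch=get_json):
    """The user endpoint is keyed by the username itself."""
    return fetch(f"{BASE_URL}/user/{username}.json")


def user_summary(user):
    """Pick out a few fields; "submitted" holds the ids of the user's items."""
    return {
        "id": user["id"],
        "karma": user.get("karma", 0),
        "submissions": len(user.get("submitted", [])),
    }
```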

Top Stories

/v0/topstories.json

This endpoint returns the item ids for the top 100 posts on HN. I use this endpoint to grab the ids of the top items, and then load them in batches with the item endpoint as the user pages through the list of top posts.
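The batching described above can be sketched in Python as one request for the ranked ids followed by one item request per id on the current page. The page size of 20 and the helper names (`fetch_top_ids`, `page_ids`, `load_page`) are assumptions for illustration, not the app's actual code.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"
PAGE_SIZE = 20  # assumption: 20 posts per page, like the app's home page


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_top_ids(fetch=get_json):
    """One request: the ranked list of top story ids."""
    return fetch(f"{BASE_URL}/topstories.json")


def page_ids(top_ids, page, page_size=PAGE_SIZE):
    """Ids for a zero-based page; past the end this is an empty list."""
    start = page * page_size
    return top_ids[start:start + page_size]


def load_page(top_ids, page, fetch=get_json, page_size=PAGE_SIZE):
    """One item request per id on the requested page, in ranked order."""
    return [fetch(f"{BASE_URL}/item/{i}.json") for i in page_ids(top_ids, page, page_size)]
```

Fetching the id list once and reusing it for later pages keeps each page turn down to 20 requests.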

They also have an endpoint for the max item id (/v0/maxitem.json) and an endpoint for item and user ids that have changed (/v0/updates.json). I haven’t found much use for either of them in my app. All in all, the API consists of just five endpoints. Pretty simple.
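To make the five-endpoint count concrete, here is the whole URL surface as a small Python sketch. It assumes the HTTPS form of the base URL, and the function and constant names are hypothetical.

```python
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Stories, comments, jobs, Ask HN posts, and polls all live here."""
    return f"{BASE_URL}/item/{item_id}.json"


def user_url(username: str) -> str:
    """Users are looked up by username."""
    return f"{BASE_URL}/user/{username}.json"


# The other three endpoints take no parameters.
TOP_STORIES_URL = f"{BASE_URL}/topstories.json"
MAX_ITEM_URL = f"{BASE_URL}/maxitem.json"
UPDATES_URL = f"{BASE_URL}/updates.json"
```

Every request a client makes is a GET against one of these five shapes.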

While I appreciate the simplicity, I’ve found that the API requires me to make a lot of network requests.

To build the home page of the app (the top 20 posts), I have to make one network request to grab the top 100 post ids, and then one network request for each of the 20 posts I show initially. That seems like a lot to me. I suppose I could cache the item objects in the browser, so that each visit only needs one request for the current top 100 ids, then requests for just the items that aren’t already cached, and finally orders the items by their position in the top 100 list. Actually, I like the way that sounds. I’ll try that.
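The caching idea can be sketched in Python with a plain dict standing in for browser storage (an assumption; the app might use something else). `front_page` is a hypothetical name: it always makes one request for the ranking, requests only uncached items, and orders the result by the current ranking.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def front_page(cache, count=20, fetch=get_json):
    """Return the top `count` items, requesting only ids missing from `cache`.

    `cache` maps item id -> item dict and persists between calls.
    """
    top_ids = fetch(f"{BASE_URL}/topstories.json")[:count]  # always one request
    for item_id in top_ids:
        if item_id not in cache:
            cache[item_id] = fetch(f"{BASE_URL}/item/{item_id}.json")
    # Order by the current ranking, not by when each item entered the cache.
    return [cache[item_id] for item_id in top_ids]
```

The trade-off is staleness: a cached item keeps whatever score and `kids` it had when it was fetched, so some expiry rule would likely be needed.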

My biggest complaint is the process of grabbing all comments associated with a post. Each item contains an array of kids, which are the item ids of its top-level comments. This makes it pretty easy to load the top-level comments by iterating over that array and making a network request for each item, but if a post has 50 top-level comments (which isn’t too uncommon for top posts), the application has to make an additional 50 requests just for the top-level comments.
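Those 50 requests are independent of each other, so they can at least be in flight at the same time. In a browser that happens naturally with asynchronous requests; a Python sketch of the same idea uses a thread pool, which is a swapped-in technique for illustration. `top_level_comments` and the worker count are assumptions.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def top_level_comments(post, fetch=get_json, workers=8):
    """Fetch every direct reply of `post`: one request per id in its "kids"."""
    urls = [f"{BASE_URL}/item/{kid}.json" for kid in post.get("kids", [])]
    # Requests overlap in time, but pool.map still returns results
    # in the same order as the "kids" array.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Concurrency shortens the wait but not the count: a post with 50 kids still costs 50 requests.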

Furthermore, the application must have the parent comment object to know the ids of its children. This makes the process of loading the entire comment tree a big recursive process:

1. Wait for a comment to load.
2. Grab its child ids.
3. Make a request for each child comment.
4. Repeat this process for each child comment until a comment has no more child ids.
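Those four steps translate almost directly into a recursive Python function. This is a sketch, not the app's code: `load_tree` is a hypothetical name, it attaches the loaded replies under a made-up `children` key, and it runs sequentially and depth-first.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def load_tree(item_id, fetch=get_json):
    """Depth-first: load an item, then each of its kids, until none are left."""
    item = fetch(f"{BASE_URL}/item/{item_id}.json")  # wait for this item to load
    if item is None:
        return None
    # Only now are the child ids known; each child is another request.
    children = (load_tree(kid, fetch) for kid in item.get("kids", []))
    item["children"] = [child for child in children if child is not None]
    return item
```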

While recursion is fun, I’m blocked on waiting for each parent comment to load before I can begin to ask for its children. If a comment conversation is five replies deep, I have to wait for five network requests to complete before I can even know the id of (and start a request for) the fifth-level comment. The same difficulty shows up in getting a total count of comments for a post. The post item only tells me the number of top-level comments it has, so to show the total number of comments for a post in the top posts view, I have to grab comments until I am sure I’ve reached the end of every branch of the comment tree.
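Loading the tree one level at a time makes the blocking visible. In this Python sketch (hypothetical name `count_comments`, thread pool as an illustrative stand-in for async requests), all requests within a level run together, but each level waits on the previous one, so the number of sequential rounds equals the depth of the deepest reply chain.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def item_url(item_id):
    return f"{BASE_URL}/item/{item_id}.json"


def count_comments(post_id, fetch=get_json, workers=8):
    """Return (total comments, sequential rounds) for the tree under post_id.

    Requests within one level run concurrently, but a level cannot start
    until the previous one arrives, because only a parent knows its children's
    ids. The request for the post itself is not counted as a round.
    """
    post = fetch(item_url(post_id))
    if post is None:
        return 0, 0
    frontier = list(post.get("kids", []))
    total = rounds = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            rounds += 1
            level = [c for c in pool.map(fetch, map(item_url, frontier)) if c is not None]
            total += len(level)
            frontier = [kid for c in level for kid in c.get("kids", [])]
    return total, rounds
```

No amount of parallelism removes those rounds, and the total is still one request per comment.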

To display an accurate number of comments for each of the top 20 posts (the opening view of the app), the application might need to make hundreds of network requests. Most of this data would not even be needed yet (or ever, if the user never views a post’s comments). This initial overhead could be avoided by adding a field for the total number of comments to non-comment items such as stories and polls.

In contrast, Reddit’s API has an endpoint that returns the entire comment tree for an article. This lets an application grab a comment tree, with a specified depth and maximum number of comments, in a single request.

I’m not sure how constructing a comment tree server side and sending it in one go compares to handling tens to hundreds of network requests for single comment “nodes” in terms of server load. However, I feel that the former makes the API more friendly for developers building third-party apps for HN.

From another point of view, this API makes it very simple to iterate over a large number of posts, comments, polls, and job postings. I’m sure there are plenty of interesting experiments that can be done on a dataset as large as HN’s post and comment history. You could write a Python script that iterates over every item from 0 to maxitem and counts the number of appearances of a certain phrase.
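Such a script might look like this sketch. `count_phrase` and its `max_id` cap are hypothetical; it assumes missing ids come back as null and that the phrase should be matched case-insensitively in the `title` and `text` fields.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def count_phrase(phrase, fetch=get_json, max_id=None):
    """Count case-insensitive appearances of `phrase` in item titles and text.

    Walks every id from 0 to maxitem with one request per id, so over the
    full site this is very slow; `max_id` caps the walk for experiments.
    """
    if max_id is None:
        max_id = fetch(f"{BASE_URL}/maxitem.json")
    needle = phrase.lower()
    total = 0
    for item_id in range(max_id + 1):
        item = fetch(f"{BASE_URL}/item/{item_id}.json")
        if item is None:  # ids with no item come back as JSON null
            continue
        for field in ("title", "text"):
            total += (item.get(field) or "").lower().count(needle)
    return total
```

The `text` field holds HTML, so a phrase containing characters like `&` or `<` may appear escaped and would need unescaping first.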

You could also throw together a script that finds out how many of the first 100 posts were Paul Graham’s.

Spoiler Alert: It’s 19.

Okay, maybe that’s not super interesting, but I’m sure you can be more creative than I was. ☺
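A sketch of that count in Python, assuming Paul Graham’s username is `pg` and that item ids start at 1. `count_by_author` and its `types` filter are hypothetical; the filter exists because whether comments count as “posts” changes the answer.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Perform one GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def count_by_author(author, first_n=100, fetch=get_json, types=None):
    """Count items 1..first_n whose "by" field equals `author`.

    If `types` is given (for example ("story",)), only items whose "type"
    is in it are counted.
    """
    count = 0
    for item_id in range(1, first_n + 1):
        item = fetch(f"{BASE_URL}/item/{item_id}.json")
        if item is None or item.get("by") != author:
            continue
        if types is None or item.get("type") in types:
            count += 1
    return count
```

`count_by_author("pg")` makes 100 sequential item requests.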

So the API is good at some things and not so good at others. Maybe that’s just a life lesson or something about being resourceful with what you have. Trying to get around the more difficult use cases forced me to come up with more creative solutions for my HN reader, and trying to understand the API’s strong points gave me a couple of ideas for working on HN’s large set of information.

If you’d like to try out my web app, you can find it at http://devopps.me/hn and in the Firefox Marketplace.

The source can be viewed on GitHub. Comments or criticism of any type would be really appreciated.

Software Engineer, but I write about my own stuff. I like Linux, FreeBSD, open source things, and bicycles.