Build a Hacker News analytics app in 73 lines of Python
People posting on Hacker News want to optimize the day & time they post to get maximum visibility and the highest score. HN Analytics helps you do this quickly.
Links: Live App | GitHub | YouTube | Pycob | Google Slides
App demo
We have a single page for this app with a few key charts.
- Posts by hour and day of week. This heatmap shows the number of posts by hour and day of week. As is evident, most posts happen on weekday mornings, especially Tuesday.
- Mean and median HN score by day of week. Interestingly, weekdays are the worst time to post. Perhaps because there’s so little activity on weekends, weekend posts actually get higher scores.
- Histogram of HN scores on log scale. Most posts receive very low scores, and a handful score over 1,000 (wonder what they are!)
- Probability that a post has a score > 20. The time of day doesn’t seem to have a strong relationship with post performance, but the weekend certainly does.
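The post doesn't show the code behind the heatmap or the probability chart. Below is a minimal sketch of those two aggregations. It assumes they are computed with pandas from the `pacific_day_of_week`, `pacific_hour`, and `score` columns created in the Code section. The tiny `sample` frame is made-up data. The plotting call is left out; something like `px.imshow(posts_by_slot)` is one option, not necessarily what the app uses.

```python
import pandas as pd

# Made-up rows in the same shape as hn_data after the Pacific-time columns
# are added (pacific_day_of_week: Monday=0 ... Sunday=6).
sample = pd.DataFrame({
    "pacific_day_of_week": [1, 1, 1, 5, 5, 6],
    "pacific_hour": [9, 9, 10, 9, 14, 9],
    "score": [3, 45, 1, 120, 2, 30],
})

# Heatmap input: number of posts in each (day of week, hour) cell.
posts_by_slot = pd.crosstab(sample["pacific_day_of_week"], sample["pacific_hour"])

# Empirical probability of scoring above 20: the share of posts with score > 20.
above_20 = sample["score"] > 20
p_by_day = above_20.groupby(sample["pacific_day_of_week"]).mean()
p_by_hour = above_20.groupby(sample["pacific_hour"]).mean()

print(posts_by_slot)
print(p_by_day)
```

In this toy data, `p_by_day` for Tuesday (day 1) is 1/3, because only one of its three posts clears 20.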
In addition to these charts, you’ll notice a form at the very bottom of the page for filtering the stories by keyword. In this example, we filter to story titles containing “python”. Results are below.
Interestingly, for Python posts Friday is the second-best day to post, whereas for posts overall it is Saturday. People hungry to code on Friday? Or just random noise in the data?
Code
Let’s dive into the important parts of the code.
First, we need to get the data. It comes from the Hacker News dataset in Google Cloud’s Public Dataset Program. We pull it with the BigQuery query below and store the result with Pycob’s convenient cloud pickling functionality.
# DATA
# We are getting the data from a pickle file that was created in a notebook. It is basically just the result of running this query:
# %%bigquery hn_data --project pycob-prod
# SELECT *
# FROM (
# SELECT *
# FROM `bigquery-public-data.hacker_news.full`
# TABLESAMPLE SYSTEM (1 PERCENT)
# )
# WHERE score is not null and score > 0
# AND type = 'story'
# ORDER BY score desc
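You may want to reproduce the dataset without Pycob's cloud storage, for example from a BigQuery result you already hold as a DataFrame. In that case, the query's WHERE and ORDER BY clauses translate directly to pandas, and a local pickle file can stand in for the cloud pickle. This is a sketch under those assumptions. `clean_hn_stories` is a hypothetical helper, and the `raw` rows are made up. `TABLESAMPLE SYSTEM` has no exact equivalent here, since BigQuery samples storage blocks rather than individual rows.

```python
import os
import tempfile

import pandas as pd


def clean_hn_stories(raw: pd.DataFrame) -> pd.DataFrame:
    """Mirror the query's WHERE and ORDER BY clauses in pandas."""
    # WHERE score IS NOT NULL AND score > 0 AND type = 'story'
    mask = raw["score"].notna() & (raw["score"] > 0) & (raw["type"] == "story")
    # ORDER BY score DESC
    return raw[mask].sort_values("score", ascending=False)


raw = pd.DataFrame({
    "title": ["Show HN: A", "Ask HN: B", "C", "D"],
    "type": ["story", "story", "comment", "story"],
    "score": [12.0, None, 5.0, 300.0],
})
hn_data = clean_hn_stories(raw)

# Local stand-in for Pycob's cloud pickle: a plain pandas pickle round trip.
path = os.path.join(tempfile.gettempdir(), "hn_data.pkl")
hn_data.to_pickle(path)
restored = pd.read_pickle(path)
print(restored)
```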
Now, in the Pycob app, we load the pickle and do a few small transformations on the DataFrame.
hn_data = app.from_cloud_pickle('hn_data.pkl')
# Convert the UTC timestamps to Pacific time once, then extract the hour and weekday (Monday=0)
pacific_time = hn_data['timestamp'].dt.tz_convert('America/Los_Angeles')
hn_data['pacific_hour'] = pacific_time.dt.hour
hn_data['pacific_day_of_week'] = pacific_time.dt.day_of_week
hn_data['score'] = hn_data['score'].astype('int64')
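The time-zone conversion matters for two reasons. A UTC timestamp can fall on a different calendar day in Pacific time, and the offset switches between UTC-8 and UTC-7 with daylight saving time. Here is a small worked example on two made-up timestamps. It assumes the `timestamp` column is timezone-aware UTC, which is how BigQuery TIMESTAMP values typically load into pandas. `tz_convert` raises an error on naive datetimes, which would need `tz_localize('UTC')` first.

```python
import pandas as pd

# Two made-up UTC timestamps, one in winter (PST, UTC-8) and one in summer (PDT, UTC-7).
ts = pd.Series(pd.to_datetime(["2023-01-03 02:00:00", "2023-07-04 16:30:00"], utc=True))

pacific = ts.dt.tz_convert("America/Los_Angeles")
hours = pacific.dt.hour.tolist()
days = pacific.dt.day_of_week.tolist()  # Monday=0 ... Sunday=6

print(hours, days)  # [18, 9] [0, 1]
```

The first post is a Tuesday post in UTC but a Monday-evening post in Pacific time, so it counts toward Monday in the charts.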
Next, we generate the Plotly charts. None takes more than four lines; here is the one for the bar chart by day of week:
import plotly.express as px
# df is hn_data after the optional keyword filter shown below
pivot_by_day = df.pivot_table(index=['pacific_day_of_week'], values=['score'], aggfunc=['mean', 'median'])
pivot_by_day.index = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
pivot_by_day.columns = ['mean', 'median']
fig2 = px.bar(pivot_by_day, barmode='group', title="<b>Mean and Median HN Score By Day of Week</b>")
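Assigning seven day names to `pivot_by_day.index` assumes every day has at least one post. After a narrow keyword filter, a day with no matching posts would make that assignment fail with a length-mismatch error. Below is a more defensive sketch. It swaps `pivot_table` for an equivalent `groupby(...).agg(...)` and reindexes to all seven days first. `score_by_day` and `DAY_NAMES` are hypothetical names, and the `subset` rows are made up. The result can be passed to `px.bar(..., barmode='group')` as before.

```python
import pandas as pd

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def score_by_day(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median score per Pacific day of week, always one row per weekday."""
    pivot = df.groupby("pacific_day_of_week")["score"].agg(["mean", "median"])
    # reindex keeps all seven rows (NaN for days with no posts),
    # so the seven labels below always line up with the data.
    pivot = pivot.reindex(range(7))
    pivot.index = DAY_NAMES
    return pivot


# Made-up filtered subset with no posts at all on Wednesday (day 2).
subset = pd.DataFrame({"pacific_day_of_week": [0, 0, 1, 5], "score": [2, 10, 7, 40]})
by_day = score_by_day(subset)
print(by_day)
```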
Finally, we use just a few lines of Pycob to display the charts:
with page.add_container(grid_columns=2) as container:
    container.add_plotlyfigure(fig1)
    container.add_plotlyfigure(fig2)
with page.add_container(grid_columns=2) as container:
    container.add_plotlyfigure(fig3)
    container.add_plotlyfigure(fig4)
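The layout above places figures two at a time into `add_container(grid_columns=2)` containers. If you add more charts, a small loop keeps the layout code from growing. This sketch uses only the two Pycob calls shown above. `add_figures_in_grid` is a hypothetical helper, and `FakePage`/`FakeContainer` are test doubles that record calls; they are not part of Pycob. In the app itself you would call `add_figures_in_grid(page, [fig1, fig2, fig3, fig4])`.

```python
from contextlib import contextmanager


def add_figures_in_grid(page, figures, columns=2):
    """Add figures `columns` at a time, one grid container per group."""
    for start in range(0, len(figures), columns):
        with page.add_container(grid_columns=columns) as container:
            for fig in figures[start:start + columns]:
                container.add_plotlyfigure(fig)


# Stand-in objects that only record calls, so the helper runs without Pycob.
class FakeContainer:
    def __init__(self, row):
        self.row = row

    def add_plotlyfigure(self, fig):
        self.row.append(fig)


class FakePage:
    def __init__(self):
        self.rows = []

    @contextmanager
    def add_container(self, grid_columns=1):
        row = []
        self.rows.append(row)
        yield FakeContainer(row)


page = FakePage()
add_figures_in_grid(page, ["fig1", "fig2", "fig3", "fig4"])
print(page.rows)  # [['fig1', 'fig2'], ['fig3', 'fig4']]
```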
Now, let’s take a peek at the functionality that filters the posts by keyword. It takes four lines of code to add the card and form that take input from the user:
with page.add_card() as card:
    with card.add_form() as form:
        form.add_formtext("Filter the Story Title", "title_filter", "text", value=title_filter)
        form.add_formsubmit("Filter")
If a keyword is present, it takes just one line to filter the DataFrame down:
df = hn_data[hn_data['title'].str.contains(title_filter, case=False, na=False, regex=False)]
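Two details are worth handling when the keyword comes from a text box. First, an empty input should return the full dataset. Second, `str.contains` treats its pattern as a regular expression by default, so input such as `c++` raises an error unless `regex=False` is passed. Below is a sketch of the filter as a function. `filter_by_title` is a hypothetical name, stripping whitespace is an added assumption, and the `stories` rows are made up.

```python
from typing import Optional

import pandas as pd


def filter_by_title(hn_data: pd.DataFrame, title_filter: Optional[str]) -> pd.DataFrame:
    """Keep stories whose title contains the keyword (case-insensitive, literal match)."""
    keyword = (title_filter or "").strip()
    if not keyword:
        return hn_data  # no keyword entered: keep every story
    # regex=False treats input like "c++" as plain text instead of an invalid regex;
    # na=False drops rows whose title is missing.
    mask = hn_data["title"].str.contains(keyword, case=False, na=False, regex=False)
    return hn_data[mask]


stories = pd.DataFrame({
    "title": ["Python 3.12 released", "Why I left C++", None, "Rust vs python"],
    "score": [50, 8, 3, 20],
})
print(filter_by_title(stories, "python")["title"].tolist())  # ['Python 3.12 released', 'Rust vs python']
print(filter_by_title(stories, "c++")["title"].tolist())  # ['Why I left C++']
print(len(filter_by_title(stories, "")))  # 4
```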
Deployment
Now that we have the app running on our local machine, let’s deploy it to Pycob’s cloud hosting so other people in our organization can access it. Deploying is super simple and takes just one step once you have your API key. All you need to do is run:
python3 -m pycob.deploy
Wait about five minutes, and the app is live on the server and ready to go!
Using the Hacker News app for yourself
This entire app is 100% free and open source, and can be run locally within your environment or hosted externally through Pycob. There are many modifications and enhancements you may want to make to suit your needs, for example:
- Actually see what the top posts are
- Get analytics on the best keywords that the HN community likes
- Go beyond scores and look at comments and other activity as well
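The first two ideas need only pandas on the existing `title` and `score` columns. Below is a rough sketch using a made-up `stories` frame in place of `hn_data`. The tokenizing regex and the use of median score as the keyword metric are assumptions, not part of the original app.

```python
import pandas as pd

stories = pd.DataFrame({
    "title": ["Show HN: My Python tool", "Rust in production", "Python tips", "Ask HN: Hiring?"],
    "score": [120, 300, 15, 4],
})

# Top posts: the highest-scoring titles.
top_posts = stories.nlargest(2, "score")[["title", "score"]]

# Naive keyword analytics: split each title into lowercase words, then compare
# how many posts contain each word with the median score of those posts.
words = (
    stories.assign(word=stories["title"].str.lower().str.findall(r"[a-z0-9+#]+"))
    .explode("word")
    .drop_duplicates(["title", "word"])  # count a word once per title
)
keyword_stats = (
    words.groupby("word")["score"]
    .agg(posts="size", median_score="median")
    .sort_values(["posts", "median_score"], ascending=False)
)
print(top_posts)
print(keyword_stats.head())
```

On real data you would likely drop stop words and the "Show HN"/"Ask HN" prefixes, and require a minimum number of posts per word so one lucky story doesn't dominate.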