I trained an LSTM neural net to generate Hacker News submissions…

Dan Hon
Aug 6, 2017

…you won’t believe what I learned about Rust, node.js and React Native.

After using an LSTM neural network (torch-rnn) to generate British placenames (Hatlet Backlingham, Sattle Boslaw, Fuckley) and Ask Metafilter questions (Am I a Dog?, “I am library, how do I do?”, “How do I stop being a therapist?”), I was on the lookout for my next project.

Just like Goldilocks, I needed a corpus that was just right. It had to:

a) be large enough (at least 5MB of text to train on);

b) be nicely constrained to a particular domain; and

c) plausibly generate interesting insights (e.g. draw attention to trends or patterns, or be used to poke gentle fun at the domain).

🚕 Uber says it doesnt want to be discovered in 2017

Hacker News is the internet’s acknowledgement of the following universal truth: young single men in possession of the knowledge that there is a right way and a wrong way to produce software are in want of the opportunity to disrupt the world.

Arguably an influential online community, Hacker News (launched in 2007 by serial disruptor Paul Graham with the online handle “pg”) is the epicenter, or what hackers would call “the gibson”, of the intersection of computer science and entrepreneurship. Startups, fortunes and reputations are made and broken by their reception on Hacker News.

🏅 The Man Who Could Be a Bitcoin Will Never Be a Billion Dollars in Social Media

Like Condé Nast mainstream news media website Reddit (founded in 2005, 2 years before Hacker News), Hacker News allows registered users to post to the website. Posts attract votes that, through the application of an “algorithm”, allow them to be displayed above the fold on the prestigious front page.

So: what better corpus than Hacker News post titles?

The theory here is that an LSTM neural network trained on Hacker News posts will reflect back the culture of this influential online community. It may help us understand what the community thinks is important. Also, we may discern some clues as to where the hockey puck may be skating to next.

💸 Show HN: Universal Basic Income in Elixir

With that, onward!

Method

  1. First, you need a training corpus or dataset. I used the Python scripts get-all-hacker-news-submissions-comments from minimaxir on GitHub. Thanks, minimaxir! One of the Python scripts uses the Hacker News Algolia API to grab all of the submissions and store them in a local PostgreSQL database.
  2. Wait a while. I waited a few hours and got about 500,000 submissions downloaded. You could wait longer, if you want.
  3. Have a look in the hn_submissions table of the hacker_news database. There will be a lot of submissions there!
  4. Now, have a think: do you want to use all of the submissions? Maybe not! Some of them will be no good! Maybe we only want to use the ones with more than 1 point. Or more than 10 points. Whatever, write some SQL to eyeball some numbers.
  5. Once you’ve figured out the kind of submissions you want, use the SQL COPY command to dump just the titles out to a text file.
  6. As usual, use torch-rnn: pre-process the data and train some models. You’ll want to train the models on EC2 or on any computer that isn’t a Mac because fuck Apple and their stupid approach to GPUs. Once you’ve got some models, you can use whatever computer you want; sampling isn’t particularly processor intensive.
  7. Play around with sampling parameters, generate hundreds of thousands of characters’ worth and, in a bid to convince yourself that you’ll still have a job in the future, exercise human creative discretion in picking out the ones that you like. I’ve put sample output, models and data on GitHub; use it at your own risk, etc.
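
If steps 4 and 5 feel a bit hand-wavy, here is a rough sketch of the "filter by points, dump the titles" idea. It is not the SQL I actually ran: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere, the column names `title` and `num_points` are assumptions about the schema, the `dump_titles` helper is made up for illustration, and the sample rows are placeholders rather than real submissions.

```python
import os
import sqlite3
import tempfile


def dump_titles(conn, out_path, min_points=10):
    """Write one title per line for submissions with more than min_points points.

    Hypothetical helper: assumes a table hn_submissions with columns
    title and num_points (assumed names, check your own schema).
    """
    rows = conn.execute(
        "SELECT title FROM hn_submissions "
        "WHERE num_points > ? AND title IS NOT NULL "
        "ORDER BY rowid",
        (min_points,),
    )
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for (title,) in rows:
            f.write(title.strip() + "\n")
            count += 1
    return count


# In-memory stand-in for the real database, filled with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hn_submissions (title TEXT, num_points INTEGER)")
conn.executemany(
    "INSERT INTO hn_submissions VALUES (?, ?)",
    [
        ("Example title A", 1),
        ("Example title B", 25),
        ("Example title C", 11),
        ("Example title D", 10),
    ],
)

# Eyeball some numbers first (step 4), then dump the titles (step 5).
total = conn.execute("SELECT COUNT(*) FROM hn_submissions").fetchone()[0]
out_path = os.path.join(tempfile.mkdtemp(), "titles.txt")
kept = dump_titles(conn, out_path, min_points=10)
print(f"kept {kept} of {total} titles in {out_path}")

# In PostgreSQL the dump would be a COPY of the same query, roughly:
#   COPY (SELECT title FROM hn_submissions WHERE num_points > 10)
#   TO '/some/path/titles.txt';
# (the path and the num_points column name are assumptions)
```

The threshold is the interesting knob: a higher `min_points` gives a smaller corpus, so it is worth checking that whatever survives the filter is still big enough to train on.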

That’s it!
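
For the torch-rnn part, the three commands involved are preprocess, train and sample. The sketch below only builds and prints the command lines rather than running them, so you can adapt them before burning GPU hours. The script names and flags are the ones documented in the torch-rnn README; the file names, the checkpoint path and the `torch_rnn_commands` helper are made up for illustration, and the length and temperature values are placeholders, not the settings I used.

```python
import shlex


def torch_rnn_commands(titles_txt, h5_path, json_path, checkpoint,
                       length=2000, temperature=0.7):
    """Return the preprocess, train and sample commands as argument lists.

    Hypothetical helper; all paths are supplied by the caller.
    """
    preprocess = [
        "python", "scripts/preprocess.py",
        "--input_txt", titles_txt,
        "--output_h5", h5_path,
        "--output_json", json_path,
    ]
    train = [
        "th", "train.lua",
        "-input_h5", h5_path,
        "-input_json", json_path,
    ]
    sample = [
        "th", "sample.lua",
        "-checkpoint", checkpoint,
        "-length", str(length),            # number of characters to generate
        "-temperature", str(temperature),  # lower is safer, higher is weirder
        "-gpu", "-1",                      # CPU mode is fine for sampling
    ]
    return [preprocess, train, sample]


# Placeholder paths; the checkpoint name depends on how long you trained.
for cmd in torch_rnn_commands("titles.txt", "titles.h5", "titles.json",
                              "cv/checkpoint_10000.t7"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Temperature is the sampling parameter most worth playing with: low values give you plausible but repetitive titles, high values give you more of the unhinged ones.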

Some of my favorite neural network generated Hacker News submissions

Emoji selections author’s own; even the Culture recognized the need to let the meat sacks feel like they’re involved.

🎶 Show HN: Simple Blockchain for All Twitter

🛰 Introducing Google Space

⛈ Self-Driving Frontier in the Cloud

🌋 React Native Conversation with React Native

🤔 Ask HN: What do you use for developers?

💡 Brain Design Protocols [video]

💸 Show HN: Universal Basic Income in Elixir

🌍 Active Map of Hyperloop Engineers (2005)

💣 Ask HN: What are some useful computers to learn Rust?

💉 Peter Thiel Extensions to Hack Your Development Environment in Computer Science

💰 The Case for Everything with Profits (2015)

🖥 Show HN: Simple macOS app to sell your startups

👴🏻 The Man Who Continues to Code (2013)

🔥 The Computer Science of Artificial Intelligence Considered Harmful

🐍 Emacs and Deep Learning in Python

🌈 Building a Common Lisp in Tech Companies

🎷 The Post HoloLens Algorithms with Node.js

🎯 The Gig Economy of a Product Hunt

🤼‍♂️ A Programming Language Translation of Men

📰 The End of the Subscription Model

💰 The Simple Command Line Toolkit for Money

🇺🇸 The American Freedom of the Web

👔 A brief history of sexual harassment claims

🏎 Tesla Model 3 released with Secure Passwords (2015)

🌯 How to build a social media diet

🥑 A Slack Could Be More Than $100,000

🏅 The Man Who Could Be a Bitcoin Will Never Be a Billion Dollars in Social Media

🍔 Ask HN: What are the best ways to be a large developer?

🍩 Developing for the Human Spy Discontinued

❓ Ask HN: What are some thoughts on a startup?

⚠️ Large Silicon Valley is a pill business made from source code

🚿 Mark Zuckerberg is a web development with a college study of the web

🔮 Explore the Holocaust read the world as an AI company

🌿 The Art of Marijuana Access to Finding a Messenger Bot (2014)

🐙 Deep Learning for Alexa Server with Node.js

👊🏼 Why So Many Millions of Startups Want You

👶🏾 Show HN: Get the future of a Haskell programmer

🇨🇦 Canada wants to be in the browser

💶 How to Start a Basic Income as a Service

💯 Ask HN: What is the best way to sell the blockchain in 2017?

📱 The Chat App for a Startup Invented the Internet of Tomorrow

🌀 The Strange Startup School Story

🤖 Restricted Robots and the Apple Machine Learning Company

👔 The Man Who Made the Future of America

🌿 Brain drivers are reality as a service

💉 Peter Thiel could be used to share your life

🎧 Ask HN: How do you manage your data scientists?

🚛 Self-Driving Cars Using Docker Components

👭 How to Setup a Software Developer Species

🚕 Uber says it doesnt want to be discovered in 2017

🌮 Why Im a Social Media Founder?

🍱 The Secret Neural Network Series A Revolution

🍑 Deep Learning for Hacker News (2015)

👔 The Conversation in His Wife for Software Engineers

🌉 How I Beat San Francisco

🎁 Dropbox is a product that has died

‼️ Mark Zuckerberg Is Broken (2016)

💤 A Tale of Software Engineers

🤦🏻‍♂️ The Future of Discrimination

🤔 Deep Learning with Node.js in C++ in Python

🤷🏼‍♀️ Show HN: React Native vs. Arctic Lisp

📈 Mathematicians Are So Many Causes of Technical Debt

🎉 Technology reveals the best way to stop using Rust

⌨️ Machine Learning for Electron

🔍 Regex with Node.js

👻 The Anti-Marketing of Things

🎈 React Native vs. React and React Native with Docker

💎 How to Make a Neural Network to Acquire Them

🍺 Drinking Up in Haskell and Authentication for Free

💉 Peter Thiel to release decentralized routers from scratch

⏱ Show HN: A command-line tool for creating a startup in 100 mins

🔮 Show HN: I made a web server in the future

👨🏾‍💻 The President Obama Programming Language (2006)

🙋🏼‍♂️ The Dark Startup Call for Humans

💊 The State of Addiction in Scala

🛒 Ask HN: What is the best way to sell your life? (2015)

🛡 The State of the Post Neural Network Attack

📉 How to Start a Mistake (2012)

📬 A Simple Email Client for Programming Languages

🚀 Show HN: SpaceX and the Startup Idea Took Over the World

🇨🇳 Chinas Plan to Send Universal Basic Income for You

🇺🇸 Elon Musk and Control of the American Dream (1999)

🌐 Elon Musk says he will look at Google and the entire internet project

💯 Elon Musk says he is a post-truth about the brain

🎱 Elon Musk says he will be more than a secret artificial intelligence

🎯 Elon Musk says Mark Zuckerberg wants to stop the first time

🕵🏻 Mark Zuckerberg really should know about everyone

🔮 Mark Zuckerberg Predicting the Future of All Time

🌎 Mark Zuckerberg Is Now Our World

🌕 10 Signs Youre Working in the Moon

🏛 The Computer Model of the Government

🗞 The New York Times of AI (2013)

🏎 The Fastest Interview with Elixir, Part 1

🇨🇳 China has died

🔦 Quantum Computing Programming in Python

🤼‍♂️ Mark Zuckerberg has a new language that could change the world

🏌🏼 The Trump Win for MongoDB

🙈 MailChimp Get Trump to Support a Container Internet Architecture

💸 The Metaphysics of the Brain Detection of Bitcoin

🕹 Video Games on the Blockchain

⛓ Show HN: Continuous integration for the blockchain

💁🏼 Introducing Chelsea Manning: A tool for developers and women in a large computer

🤦🏿‍♀️ What is your favorite experiment at a black woman?

🙅🏻 A Woman Who Stopped All Their Safety

👨🏻‍🚀 The Man Who Made the World with Convolutional Neural Networks

👨🏽‍🎨 The Man Who Did It Makes His Devices in Scala

👨🏾‍✈️ The Man Who Became the Future of Cars

👨‍🏫 The Man Who Did Not to Deliver Subscribers

👨🏻‍⚕️ The Man Who Is Making Humans

👷🏾 The Man Who Made a Product Manager

👨🏽‍🎤 Show HN: The man who went to the blockchain

☠️ Internet of Things and Black Mirror is now available

👨‍💻 Computer Science company is the new internet of white internet

🎩 Containers with Elixir

🍜 Why I Studied to Death in Clojure

📝 Marc Andreessen is looking for technical lists

🤦🏻‍♀️ Show HN: List of sexual harassment in the world

🤦🏽‍♀️ Show HN: A web browser for harassment in 2017

🦄 Slack is the future of the world
