Using GPT-3 and Hacker News for slightly creepy market research

Tuhin Nair
13 min read · Dec 18, 2022


“What if I fed GPT-3 a community’s online conversations? Could I use that to simulate interviewing a community member for market research?” That was the spark.

I know, I know, a developer writes an article about avoiding interacting with people…but that’s not what this is. I start with a poorly thought-out goal and end with OK results (you can find them just below). Still, along the way, I learn a bit about the general problem of computational market research. Here’s a tl;dr:

  • Analysis that you don’t trust or understand will not affect your uncertainty. Your personal context affects what techniques are most effective.
  • The two pillars of computational market research are high-precision search functionality and SQL as a query interface. They help solve the most fundamental analysis problem: What data contains the answers to my questions?
  • Your interpersonal skills and intuition have high transferability to marketing. So try and use them.

Final Results

I used ‘note-taking’ as an example domain for research. I ranked users based on how many note-taking-related posts they commented on (since the start of 2022), then used GPT-3 to generate reports about the top-ranked users. Here are the reports for the two highest-ranking users (renamed ‘Alpha’ and ‘Bravo’).

Alpha’s Report:

User Alpha is an active user on Hacker News and has been engaging in conversations related to note-taking for the past 12 months. He is knowledgeable about Obsidian, Confluence, Markdown, WYSIWYG editors, and open source alternatives to Evernote. He is also familiar with Chrome, Chromebooks, Rust, and Microsoft Windows. He values file uploads, does not like tags for querying, and prefers a directory approach to organization. He is also aware of potential risks of using new software and is willing to ask questions. He mentions holding multiple roles in academia.

Bravo’s Report:

User Bravo is an active user on Hacker News who has posted and commented on various topics related to note-taking over the past 12 months. He is an ex-Roam, daily Obsidian user who uses Obsidian for navigation and discovery, with a template to fill in most of what he needs. He is impressed by Obsidian and Dataview, and likes the Excalidraw plugin for its power for visually-oriented users. He prefers to keep notes offline, local, and portable. He is interested in Xournalpp++ and the ReMarkable tablet, and is open to trying new software. He highly recommends Obsidian as a note-taking tool, and finds it powerful, flexible, and extensible. He is interested in self-dialogue as a journaling strategy, mental health, and self-improvement. He enjoys humor and is open to other people’s opinions. He is interested in productivity, creative output, and personal development. He knows about Obsidian’s Outliner plugin and Workflowy.

Act 1: How do I wield GPT-3?

Getting a year’s worth of Hacker News data:

I’m no data scientist or statistician. I specialize in agonizing over which 3rd party APIs to choose. So that’s where I start.

I find the Hacker News (HN) Firebase API. It’s not the most ergonomic API, but it’s free and has no rate limits.
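The un-ergonomic part: everything is an “item” fetched one id at a time, and you rebuild comment threads by walking each item’s kids ids, which means a lot of round trips. A minimal example:

# Fetch a single item (post or comment) from the HN Firebase API.
# Comments are linked via the `kids` list of child item ids.
import requests

item = requests.get("https://hacker-news.firebaseio.com/v0/item/8863.json").json()
print(item["title"], item["score"], len(item.get("kids", [])))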

I also find the Big Query Hacker News Public Dataset that provides a SQL interface.

I use both sources to collect nearly a year’s worth of data. I want the information to be fresh and relevant, and one year should be a good cut-off point. I don’t collect ALL the data, just everything for posts (stories, Ask HN, Show HN, etc.) that:

  • have at least 10 upvotes
  • have at least 1 comment
  • and were posted anytime between the start of the year and 5PM UTC on Dec 9th, 2022.

I end up with 51,275 posts in a Postgres database. Good start.
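For reference, the BigQuery side of that collection boils down to a query along these lines. This is a minimal sketch, assuming the public hacker_news.full table and its score, descendants, and timestamp columns; it isn’t my exact query.

# Sketch: pull candidate posts from the BigQuery Hacker News public dataset.
# Assumes the `bigquery-public-data.hacker_news.full` table; in practice the
# rows would be written into Postgres rather than printed.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT id, title, `by` AS author, score, descendants, timestamp
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story'                      -- stories, Ask HN, Show HN, etc.
  AND score >= 10                         -- at least 10 upvotes
  AND descendants >= 1                    -- at least 1 comment
  AND timestamp BETWEEN TIMESTAMP('2022-01-01')
                    AND TIMESTAMP('2022-12-09 17:00:00 UTC')
"""

for row in client.query(sql).result():
    print(row.id, row.score, row.title)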

Assembling a GPT-3 toolkit:

I read the OpenAI docs. Transformer models (GPT-3 included) have a limit on how much text they can work with at once (the best OpenAI model tops out at roughly 3,000 words at the time of writing). That might be a problem. I also notice the ability to fine-tune a model, which sounds interesting, but I’ll keep exploring for now.

I venture into the OpenAI community forum and… Uh oh! People have tried to do what I’m doing. The general problem is getting GPT-3 to absorb a large amount of specific text and then generate answers based on it. Users report either underwhelming results or flat-out failure.

But I keep digging. I note down the techniques I think may be helpful with my task:

  • Chunking and Context Passing: OpenAI has a blog post about recursively summarizing large books in chunks. This chunking approach is popular and shows OK results. I’ll need to ensure important context is preserved between distinct chunks.
  • The Lorebook: I find this Reddit post about how story-telling tools like NovelAI use something called a lore-book. Essentially all context is saved in a database, and a subset of the context is dynamically included in relevant prompts when it makes sense. (They analyze the prompts for entities and prepend the prompts with little bits of context related to those entities).
  • Fine-Tuning: Fine-tuning is training the GPT-3 model for a specific task using examples we provide. The examples used for fine-tuning have to be in the form of prompt-completion pairs like so:
    {"prompt": "<prompt text>", "completion": "<ideal generated text>"}
    We can fine-tune the model using examples where the prompt is empty for a broad fine-tuning task (where the task isn’t clearly defined). The completions are the new data we want to “refresh” the model with. I also note that multiple people in the forums claim this method did little for them. I learned this technique from an OpenAI researcher in the video Creating fine-tuned GPT-3 models via the OpenAI fine-tuning API at timestamp 45:08.
  • Classifier for Quality Control: GPT-3 can sometimes output unrelated and nonsensical things with complete confidence. To ensure that GPT-3 does what we want, we can generate multiple outputs for the same prompt, use a classifier to validate and rank them based on some criteria, and then pick the best one (a rough sketch of this follows the list). I got this from the document Playbook: Train a fine-tuning discriminator to increase truthfulness of generations that an OpenAI engineer posted in their community forum.
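To make that last idea concrete, here’s a minimal sketch of the generate-then-rank loop, assuming the 2022-era openai Python client. The discriminator model name is hypothetical; you’d have to fine-tune one on labeled good/bad completions first.

# Minimal sketch of "generate several candidates, keep the best one".
# Assumes the 2022-era `openai` Python client and a *hypothetical*
# fine-tuned discriminator model that answers " yes" / " no".
import openai

def best_of_n(prompt: str, n: int = 5) -> str:
    candidates = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        n=n,                 # n generations for the same prompt
        max_tokens=300,
        temperature=0.7,
    )["choices"]

    def score(text: str) -> float:
        # Ask the (hypothetical) discriminator how likely " yes" is.
        judged = openai.Completion.create(
            model="ada:ft-my-org:summary-discriminator",  # hypothetical model name
            prompt=f"{prompt}\n\nCandidate:\n{text}\n\nAccurate?:",
            max_tokens=1,
            temperature=0,
            logprobs=2,
        )
        top = judged["choices"][0]["logprobs"]["top_logprobs"][0]
        return top.get(" yes", float("-inf"))

    return max((c["text"] for c in candidates), key=score)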

I’m beginning to see how much work and tinkering may be required to get good results. But at this point, something inside starts to nag at me.

Realizing that GPT-3 isn’t enough:

GPT-3 is a probabilistic model that outputs predictions based on what it has previously seen. How should I interpret the results of my experiment?

  • Would it be the most probable answer by a community member?
  • Or would it represent the average answer from a community?
  • How can I tell if it truly represents the community and not other data it was trained on?
  • Am I just building a tool for luck-based ideation?

But there was still the most important question to ask myself: would I trust the output enough to reduce my need for other research? I had a feeling that…it wouldn’t. Knowing my market was a significant enough problem that I couldn’t just cede trust without a certain degree of confidence. And what was nagging me was that I was doing something I’d seen before.

I’ve read many data science posts that focus very little on the impact of the analysis and spend much time on the tools of analysis. And I get it. These are data scientists/ML specialists writing for their peers. Of course, the technicalities of their trade are what’s most interesting to them. But I always think, That was sooo much work for…that? A dubious prediction and almost no perceptible change in my uncertainty about the domain.

I realize I need to formulate an analysis plan. I want to make something that I can really use. But I also want its usefulness to be easily explainable.

Act 2: How do I do useful research?

Looking at what marketers are doing:

I want to use social media data for marketing insight. I know some companies have built products around this problem, so I look at them to see what they’re doing.

  • I browse g2.com (a peer-to-peer business software review site) until I come across what they categorize as Audience Intelligence tools.
  • From their site: “Audience intelligence platforms gather and analyze public data from online sources to help businesses gain in-depth insights into their target audiences. Marketers use the information gathered through these platforms to create customer segments, discover influencers, conduct market research, and inform decision making.”

The businesses’ websites were OK, but I found the reviews more revealing:

  • Users just wanted to get at the data that was most relevant to them. Every company tried hard to build its own custom interface for search, navigation, and query building.
  • The real value was that data from multiple platforms was viewable in one place. One of the most common complaints was the lack of data from closed-off platforms like Facebook and TikTok.
  • More advanced users complained that the prepackaged analysis lacked transparency.

But I still couldn’t answer the question that was bothering me: how do marketers have any confidence in this data? What conceptual framework underlies all this analysis?

Designing research that I can understand:

For me to trust my research, I’d need to design it from the ground up using small pieces that I can accept and understand. And so I went looking for those pieces:

  • Statistics is about trying to infer something about a large group of things by observing a smaller, representative group. Sampling is the act of picking that smaller, representative group of things. Online data happens to be very difficult to sample from and is prone to sampling bias [0].
  • Highly explorative, loosely designed studies make good sense when time is not a factor or when trying to study highly complex phenomena [1].
  • When you are entirely new to a domain, you’re still trying to figure out the most helpful question. Start with any question. It might not be the final or only question you answer. In fact, it’s likely to change because as you develop a conceptual understanding of a domain, you’ll formulate more profound, specific questions [2].
  • Social media can be performative [3][4]. Instead of figuring out some ground truth, it’s better to think about social media as containing the multiple, sometimes competing, perspectives of the humans that make up the community.
  • Insight may lie in well-selected extreme cases, cases not statistically representative of their groups. Sometimes it’s better to sacrifice generalizability to explore an information-rich case (as is common in ethnographic research) [5].

And so, my totally biased foundations for online research:

  • Research must be quick and cheap to perform. We don’t expect to generate surprising breakthroughs from social data but only use it as a supplementary, de-biasing tool. Your personal context and intuition are still the primary driver for insight.
  • Research must ask an answerable question. We take an iterative approach but not a purely explorative one. This means our primary focus isn’t on data collection and mining but on using inquiry to try and progressively generate more exciting questions.
  • We don’t try and make generalizable inferences. We ask questions limited to the dataset. Ideally, the observation itself is the answer to our question.
  • Our ultimate benchmark is if our analysis had a noticeable effect on our perceived uncertainty.

Then…what makes a good market research question?

Generating a market research question:

I’ve read books on marketing strategy before, and I usually think, I don’t see how this is superior to just plain critical thinking when required. But that’s on me. I should have taken the time to really think about the act of marketing.

So I start on the Market Research Wikipedia page (The older I get, the more I go to Wikipedia for just about every first introduction). And sure enough, I find what I expect. I’m paraphrasing, but “gathering information on the target market”, “what segments exist”, etc. But something interesting does come to mind.

The primary questions market research tries to answer don’t seem very different from what a person might ask themselves:

  • Where do I find people I like?
  • Will they like me back?
  • Who do they like right now?
  • What can I do to be perceived well?

It’s not a profound definition, but I need one to figure out what a good research question is. And so: marketing is the act and skill of positively interacting with a heterogeneous group of people. Entering a new market is like being the new kid on the first day of school.

It’s important to note that market research can be performed using different units of analysis. It can be done at the industry level, firm level, or at the level of the individual [6]. Each unit of analysis suggests a different line of questioning. We’ll use the individual as our unit, just to pick one (and I think it goes well with our data).

So, market research is any question that tells us something about the individuals that make up the market. Our options narrow considerably if we prioritize questions that are quick, specific, ethical, and easy to answer. So we’ll start with this: “If I post something relevant to my domain on Hacker News, who might show up?”

Act 3: How do I answer a question about the market?

Locating data that might answer our question:

It’s at this point that I discover the Hacker News Search API provided by Algolia. I would’ve just used this at the start if I’d known about it.

But I had a database full of Hacker News data. And I’d noticed OpenAI’s Text Search Embeddings while going through their docs earlier, so…

I use both as a combined search to find posts relevant to my domain (I picked ‘note-taking’ because I’d written about it a month ago). You can read more about the search implementation in the Optional section on my blog. I end up with a highly relevant set of 108 posts.
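The Optional section has the real implementation, but roughly, the combined search looks like the sketch below: a keyword pass through Algolia’s API plus an embedding-similarity pass over the posts sitting in Postgres. The embedding model, similarity threshold, and column names here are stand-in assumptions rather than my exact setup.

# Rough sketch of combining Algolia's HN Search API (lexical) with OpenAI
# embeddings (semantic) to find posts about a domain. Model name, threshold,
# and table/column names are illustrative assumptions.
import numpy as np
import openai
import psycopg2
import requests

QUERY = "note-taking apps"

# 1) Lexical pass: Algolia's Hacker News Search API.
algolia = requests.get(
    "https://hn.algolia.com/api/v1/search",
    params={"query": QUERY, "tags": "story", "hitsPerPage": 100},
).json()
lexical_ids = {int(hit["objectID"]) for hit in algolia["hits"]}

# 2) Semantic pass: cosine similarity between the query embedding and
#    post-title embeddings (assumed precomputed and stored as float arrays).
def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

query_vec = embed(QUERY)

conn = psycopg2.connect("dbname=hn")
cur = conn.cursor()
cur.execute("SELECT id, title_embedding FROM posts")  # assumed columns

semantic_ids = set()
for post_id, emb in cur.fetchall():
    vec = np.array(emb)
    sim = vec @ query_vec / (np.linalg.norm(vec) * np.linalg.norm(query_vec))
    if sim > 0.80:  # arbitrary cut-off
        semantic_ids.add(post_id)

relevant = lexical_ids | semantic_ids
print(len(relevant), "candidate posts")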

I browse through the posts and read through the comments. It takes me around 30 minutes to get a sense of what Hacker News thinks about the domain. And I realize it isn’t that hard to do: reading each post’s top 4–5 comments is good enough. Sheesh! Did I waste all this time? Could I have just spent an hour on the website and come away with a good understanding of the community? I probably could’ve…

But I still wanted to see this through. I needed to answer my research question.

Using a computer to maintain temporal context:

Since I had all my data in Postgres, I could figure out the following with some basic SQL. In those 108 note-taking posts:

  • there were a total of 7,411 comments made
  • by 3,990 unique users
  • who each commented on an average of 1.3 note-taking-related posts.
  • The average score per post was ~131 points.
  • The average number of comments per post was ~68.

I wanted to see who showed up the most. Here’s a list of the top 4 users who commented on the most posts (usernames changed; a sketch of the kind of query I used follows this list):

  • Alpha — 15 posts (~14% of the 108 posts)
  • Bravo — 13 posts (~12% of the 108 posts)
  • Charlie — 11 posts (~10% of the 108 posts)
  • Delta — 10 posts (~9% of the 108 posts)
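The counting itself is nothing fancy. Here’s a sketch of the kind of queries involved, assuming comments(post_id, author) and posts(id, score, descendants) tables plus a relevant_posts table holding the 108 ids; my real schema differs in the details.

# Sketch of the counting queries. Table and column names are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=hn")
cur = conn.cursor()

# Totals across the relevant posts' comments.
cur.execute("""
    SELECT count(*) AS total_comments,
           count(DISTINCT author) AS unique_users
    FROM comments
    WHERE post_id IN (SELECT id FROM relevant_posts)
""")
print(cur.fetchone())

# Per-post averages (score and comment count).
cur.execute("""
    SELECT avg(score) AS avg_score, avg(descendants) AS avg_comments
    FROM posts
    WHERE id IN (SELECT id FROM relevant_posts)
""")
print(cur.fetchone())

# Who shows up on the most distinct relevant posts?
cur.execute("""
    SELECT author, count(DISTINCT post_id) AS posts_commented_on
    FROM comments
    WHERE post_id IN (SELECT id FROM relevant_posts)
    GROUP BY author
    ORDER BY posts_commented_on DESC
    LIMIT 4
""")
print(cur.fetchall())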

I was disappointed that the most prolific commenters showed up only around 10% of the time. A statistician would do more work to check whether these numbers are actually noteworthy. Still, in making these calculations, I realized I’d done something that felt different.

I could do decent summarization and sentiment analysis just by reading the posts myself. What the computer made easy was analysis that required keeping track of context across time and space (such as between different pages on a website). The analysis was understandable (simple counting) yet hard to do with only my eyes and working memory.

Yes, the data includes outliers, covers only a small percentage of posts, and the analysis is simplistic. But it very quickly gave me rough numbers where previously all I had was a blind guess. Even if it’s not perfect, just the fact that it’s understandable information (bias and all) that I didn’t have before made it feel useful. And I think that’s precisely how the marketers used those Audience Intelligence tools. They weren’t trying to be “data-driven”. They were just looking for quick signals to avoid flying completely blind.

Now, I’d used SQL to count who commented on the most note-taking posts. But I still didn’t know who they were. That’s where GPT-3 comes in.

Failing to impress but learning to simplify:

Note: You can find the final results at the beginning of this article.

I start by summarizing each comment thread (from the 108 posts) the user participated in. I experiment for a bit and end up using this as a prompt template (with slight modifications for single comments or multiple users):

Hacker News is a website where users share links and discuss the content using comments. The following is a conversation between user '<USER A>' and user '<USER B>' under a post titled '<POST TITLE>':

original comment by <USER A> : <root comment>

<USER B> replying to <USER A>'s comment: <child comment>

Summarize <USER A>'s conversation in a first person point of view. After the summary include a list of relevant details about <USER A> that would be useful to a marketer.
Summary:

After summarizing a user’s involvement in note-taking posts, I summarize their entire activity over a year using the following prompt template:

Hacker News is a website where users share links and discuss the content using comments. The following is a timeline containing summaries and relevant details of <USER A>'s activity over the last 12 months.

post: <POST 1>
posted: 12 Months Ago
<USER A>'s activity: <SUMMARY GENERATED FROM PREVIOUS STEP>
relevant details:
<LIST OF DETAILS GENERATED FROM PREVIOUS STEP>

post: <POST 2>
posted: 10 Months Ago
<USER A>'s activity: <SUMMARY GENERATED FROM PREVIOUS STEP>
relevant details:
<LIST OF DETAILS GENERATED FROM PREVIOUS STEP>

I am a researcher trying to conduct market research online. I'm currently trying to understand the note-taking app market. Here's my detailed summary of what I learned about <USER A> with regard to note-taking.
Summary:
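Mechanically, the pipeline is just two rounds of Completion calls wired together: fill the first template for each thread, then paste those outputs into the second template. Here is a stripped-down sketch, assuming the 2022-era openai client; the helper names and sampling parameters are made up, and the relevant-details lists are omitted for brevity.

# Stripped-down sketch of the two-step pipeline: summarize each thread with the
# first template, then feed those summaries into the second template for a
# user-level report. Helper names and sampling parameters are illustrative.
import openai

THREAD_TEMPLATE = (
    "Hacker News is a website where users share links and discuss the content "
    "using comments. The following is a conversation between user '{user_a}' and "
    "user '{user_b}' under a post titled '{title}':\n\n"
    "original comment by {user_a} : {root_comment}\n\n"
    "{user_b} replying to {user_a}'s comment: {child_comment}\n\n"
    "Summarize {user_a}'s conversation in a first person point of view. After the "
    "summary include a list of relevant details about {user_a} that would be "
    "useful to a marketer.\nSummary:"
)

def complete(prompt: str) -> str:
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=400,
        temperature=0.4,
    )
    return resp["choices"][0]["text"].strip()

def summarize_thread(user_a, user_b, title, root_comment, child_comment):
    return complete(THREAD_TEMPLATE.format(
        user_a=user_a, user_b=user_b, title=title,
        root_comment=root_comment, child_comment=child_comment,
    ))

def summarize_user(user_a, thread_summaries):
    # thread_summaries: list of (post title, "N Months Ago", thread summary)
    timeline = "\n\n".join(
        f"post: {title}\nposted: {age}\n{user_a}'s activity: {summary}"
        for title, age, summary in thread_summaries
    )
    prompt = (
        "Hacker News is a website where users share links and discuss the content "
        "using comments. The following is a timeline containing summaries and "
        f"relevant details of {user_a}'s activity over the last 12 months.\n\n"
        f"{timeline}\n\n"
        "I am a researcher trying to conduct market research online. I'm currently "
        "trying to understand the note-taking app market. Here's my detailed "
        f"summary of what I learned about {user_a} with regard to note-taking.\n"
        "Summary:"
    )
    return complete(prompt)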

The prompts could be better. But I noticed that small changes had drastic effects on the generated output. It takes trial and error to start seeing good results consistently. But the truth is…

I’ve failed you, Supreme Reader (I just finished watching the sequel trilogy)! I’m disappointed with the final results, and I focused too much on the generative capabilities of GPT-3 (my apologies to the data scientists and ML specialists). My analysis may not be the most creative, and there are definitely more interesting questions to ask of the data, but…

Given how satisfied I was with search + SQL, and what the user reviews said, I think a handful of classifiers to enrich the data, good search functionality, and SQL may be all we need. Something like this, perhaps:

SELECT * FROM comments WHERE 'notion' = ANY(product_mentioned) AND sentiment = 'delighted';
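Filling in columns like product_mentioned and sentiment would be a classification pass over each comment. A rough sketch, using a zero-shot prompt as a stand-in for a proper fine-tuned classifier (the prompt wording and label set are made up):

# Sketch of the "classifiers to enrich data" idea: one GPT-3 call per comment,
# extracting product mentions and a coarse sentiment label to write back into
# Postgres. You'd want validation and retries around json.loads in practice.
import json
import openai

def enrich(comment_text: str) -> dict:
    prompt = (
        "Classify this Hacker News comment about note-taking tools.\n"
        'Return JSON like {"product_mentioned": ["obsidian"], "sentiment": "delighted"},\n'
        'where sentiment is one of "delighted", "neutral", or "frustrated".\n\n'
        f"Comment: {comment_text}\n\nJSON:"
    )
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=100,
        temperature=0,
    )
    return json.loads(resp["choices"][0]["text"])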

But would any of it be truly useful? I don’t know. Did it feel a little creepy to generate summaries of people? Yes. One big thing I left out was lead generation. It’s not that useful on Hacker News, but it’s popular on other platforms where you can message users. Ethical or not, I didn’t think it was an interesting enough problem. You could use a classifier to figure out who’d make a good lead, but that still felt like data extraction rather than research.

Originally posted and formatted for my blog here. There you can find links to the embeddings and the Optional section that goes into more detail about how I combined vector and lexical search.
