Baby Can I Get Your Auth Token?

How To Get Textually Active on Your Favorite Dating Apps

Hello again! Nice to see your shining faces. I’m sorry these posts have been sporadic; I’ve actually been off Tinder for a while now. But what about you? Has Tinder worn you down yet, or are you still too scared to start?

The first comment I always get when I talk to people about online dating is:

Ginny, I’d try online dating, but I wouldn’t know which app to use until I’ve seen a word cloud and a latent Dirichlet allocation analysis of the profiles you’d see on at least six dating apps.

And that, friends, is the reason for this blog post. I wanted to compare what kinds of profiles you’d see on different dating apps so that if you’re not sure which one is for you, you can get an idea of who’s smangin’ around each one.

Getting the Data

The dating apps I analyzed for this blog post are JSwipe, Hinge, Happn, OkCupid, CoffeeMeetsBagel, and Tinder. I pulled all the data from the profiles on each of these apps. I didn’t include messages or stuff like Facebook tags. For CoffeeMeetsBagel and OkCupid, where users write essays in response to several prompts, I combined the text from all the prompts into one character string per user.
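If you want to try the "one string per user" step at home, here is a minimal sketch in Python. It is not the script used for this post (that work was done in R), and the row layout of user ID, prompt, and answer is an assumption made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, prompt, answer). The layout is an assumption.
rows = [
    ("u1", "About me", "I love dogs and wine."),
    ("u1", "On weekends", "Hiking, mostly."),
    ("u2", "About me", "Startup person."),
]

def combine_prompts(rows):
    """Join every answer a user wrote into one string, in the order seen."""
    answers = defaultdict(list)
    for user_id, _prompt, answer in rows:
        text = answer.strip()
        if text:  # skip blank answers
            answers[user_id].append(text)
    return {user: " ".join(parts) for user, parts in answers.items()}

profiles = combine_prompts(rows)
print(profiles["u1"])
```

Each user ends up with a single blob of text, which is the shape the later analysis steps expect.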

The hardest part was collecting the data. There are a lot of tools that make Tinder’s API super easy to use, so that wasn’t too tough. For CoffeeMeetsBagel and Hinge, I already had most of the profiles saved in an Excel spreadsheet, and I added the rest. Happn has an API, but it was way harder to use than Tinder’s. Instead, I saved all the profiles as PNGs, converted those into PDFs, used Google Drive’s OCR tools to make text documents, and cleaned them up in R. I had to do the same thing for JSwipe. In both cases, it was painful. For OkCupid, my boyfriend handily happened to already have a script that could scrape OkCupid and read the profiles into a CSV file (not to be a tool or anything, but we’re kinda the Kardashian-Wests of creepy online dating analytics. One of those trendy data_scientist_software_engineer couples). A word of caution, though: if it seems like a good idea to let your boyfriend download your entire OkCupid history onto his laptop, your boyfriend might need a new girlfriend and/or laptop.

Once I had all the data, I ran LDA on the profiles from each of the dating apps. I used the RTextTools, topicmodels, and SnowballC packages in R for modeling. The preprocessing automatically stems all the words so that words with the same root match up (e.g., cook and cooking), so if the words you see aren’t actually English words, that could be why. I also removed English stopwords such as “the” and “I” from the text.
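To see why stemmed output can look like non-words, here is a toy sketch in Python of stopword removal plus stemming. The stopword list is a tiny made-up sample, and the suffix stripper is a deliberately crude stand-in; it is not the Snowball algorithm that SnowballC implements, so real output will differ.

```python
import re

# A tiny stopword list for illustration; real lists are much longer.
STOPWORDS = {"the", "i", "a", "an", "and", "to", "of", "in", "is", "am", "my"}

# Suffixes stripped by this toy stemmer, checked in order.
# This is NOT the Snowball algorithm, just a rough imitation.
SUFFIXES = ("ing", "es", "ed", "s", "e")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem what is left."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("I love cooking and the adventure of it"))
# prints ['lov', 'cook', 'adventur', 'it']
```

Note that "cook" and "cooking" both come out as "cook", which is the whole point, and that "adventure" loses its final e along the way.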

LDA Analysis

I identified the key topics in each of the dating apps using latent Dirichlet allocation (http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation). Basically, LDA is a technique to take a body of text and boil it down to a few key words/themes. In this case, each body was the set of all profiles I had pulled from each of the dating apps. My goal here was to summarize the types of profiles you’d see on Hinge, Happn, OkCupid, JSwipe, and Tinder in a few key terms so you can compare across apps. Of course, this isn’t really a valid comparison for a number of reasons.

First, the structure of profiles varies greatly across apps, and this affects the number of words per profile. OkCupid has much longer profiles with multiple essay questions. Many people on Tinder and Hinge don’t even fill out their ‘About Me’ section (although I skipped those lame-o’s: in life and in this blog post). CoffeeMeetsBagel asks a number of questions but people give short answers. Basically, there was a ton of text for OkCupid and almost none for any of the others.

Second, I used the same number of topics for each run of LDA. There are ways to figure out the optimal number of topics for a data set (http://blog.cigrainger.com/2014/07/lda-number.html). However, even though this was an unsupervised learning problem, I had a feeling there should be about 5 topics per app because I see about five different types of profiles on each app (5'1"–5'4", 5'5"–5'6", 5'7", 5'8"–5'8.5", 5'9"+).
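For the curious, here is roughly what fitting LDA with a fixed number of topics can look like. This is a toy collapsed Gibbs sampler in plain Python, which is one standard way to fit LDA; it is not the code behind this post, and the R packages may fit the model differently. The function names, the hyperparameters alpha and beta, and the tiny example profiles are all assumptions for illustration.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d on topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w on topic k
    topic_total = [0] * num_topics                       # all tokens on topic k
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Take this token's current assignment out of the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by how much the document and the word favor it.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word

def top_terms(topic_word, n=5):
    """Return up to n of the most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common() if c > 0][:n] for counts in topic_word]

# Made-up, already-stemmed "profiles" for illustration only.
docs = [
    ["dog", "wine", "dog", "hike"],
    ["startup", "curious", "startup", "coffee"],
    ["wine", "hike", "dog"],
    ["coffee", "startup", "curious"],
]
topics = fit_lda(docs, num_topics=2)
for k, terms in enumerate(top_terms(topics, n=3)):
    print(k, terms)
```

The number of topics is an input you choose up front, which is why picking 5 for every app is a judgment call rather than something the model decides for you.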

These were the terms I got when I ran LDA with 5 topics on each dating app’s set of profiles:

JSWIPE

HAPPN

I hope it’s French like the fries not the kissing.

HINGE

TINDER

Classic Tinder — just don’t.

OKCUPID

This only matches my profile for Topic #2.

Alright, so take a look and see what appeals to you. I don’t want to influence your decision too much, but it’s pretty clear to me that OkCupid is full of downers and druggies.

Word Clouds

The analysis above is helpful if you want 5 words per app, but some people might need more than that to commit to 7+ minutes/day of strenuous thumb exercise. I also wanted to make a longer list of words people wrote on their profiles on each app. I used a word/tag cloud (http://en.wikipedia.org/wiki/Tag_cloud), which is generated by identifying frequently occurring words and drawing the most common ones largest. This uses the same body of text as the previous section: profile information for each of the dating apps.
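Under the hood, a word cloud is mostly a word count with a font size attached. Here is a minimal Python sketch of that idea; the linear scaling and the size range of 10 to 48 are assumptions for illustration, not what any particular word cloud tool does.

```python
from collections import Counter

def cloud_sizes(tokens, max_words=5, min_size=10, max_size=48):
    """Map the most frequent tokens to font sizes, scaled linearly by count."""
    counts = Counter(tokens).most_common(max_words)
    if not counts:
        return {}
    top = counts[0][1]
    low = counts[-1][1]
    span = max(top - low, 1)  # avoid dividing by zero when all counts match
    return {
        word: min_size + (max_size - min_size) * (count - low) / span
        for word, count in counts
    }

# Made-up tokens for illustration only.
tokens = ["like"] * 5 + ["love"] * 3 + ["dog", "wine"]
sizes = cloud_sizes(tokens)
print(sizes)
```

The most common word gets the biggest font and everything else scales down from there, so a cloud dominated by one giant word just means that word showed up a lot.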

The word cloud is among the most rigorous of machine learning techniques, competing in accuracy only with the storied data analysis methods of “just-look-at-it”, “just-think-about-it”, and “plot-it-in-yellow-so-no-one-can-see-it.” In fact, I’ve heard Google and Facebook store all their data in the word cloud. It’s powerful.

For the CoffeeMeetsBagel word cloud, I identified categories that I thought matched up with terms in my previous blog post (https://medium.com/@mydatablog/coffee-meets-bagel-meets-ginny-ea33fd0780ae). As a quick summary, I took the words used in a CoffeeMeetsBagel profile or message thread that were most highly correlated with my enjoying the date, and the top five were Harvard, Google, Drink, Math, and Cornell. This analysis is different because I don’t use the message text at all, and I’m including all profiles, not only those of people I went on dates with. Still, there’s clearly some overlap, and I noted the sections of the word cloud that I believe corresponded to top-scoring words in my previous analysis. I do this, as always, with my characteristic wit that you’ve all come to know and love.

I totally giggled when I first looked at this because I thought none of the guys on CMB knew ‘adventure’ had an e at the end. Then I remembered that I had stemmed the words.

The Tinder word cloud is strikingly different.

Tinder is for overachievers.

The JSwipe word cloud was also pretty interesting. I like that ‘enjoy’ was one of the largest words; I’ve heard the ‘J’ in JSwipe stands for ‘Joy’.

Next up is Hinge. This literally could be an advertisement for a startup incubator that invents its own words. But, like, one of the startup incubators that you have to pay to join. Not like one of the elite startup incubators where the innovative new pita chip delivery services of tomorrow are born; at those incubators, curious only has two u’s.

OkCupid is bigger since I had more data for it. It’s also pretty generic, as the ‘like like like love like’ thing would suggest.

I am the dog-wine girl!!! It’s me — I’m the one you’ve been looking for, OkCupid amalgamated text-man.

Anyways, I’d suggest you read carefully through each and figure out which one gives you the best vibe. Do you want good-looking and adventurous, stationary deadbeat loser who pees in the sack, Asian love, curiuous and startup-y, or good love like food? It’s a personal choice, and you don’t have to give me an answer just yet. You do, however, have to give me an answer tomorrow, so don’t procrastinate too too much.

Conclusion

Pick a fucking app and get on it!!!

To contact the author of this post, please email ginny5hogan@gmail.com.