Data Analysis of Coffee Meets Bagel

David Hsu
Published in P.S. I Love You, Feb 14, 2017

Last year on Valentine’s Day, I did an informal analysis of the state of Coffee Meets Bagel (or CMB) and the cliches and trends I saw in the online profiles girls wrote (posted on a different site). However, I didn’t have hard facts to back up what I saw, only anecdotal musings and common words I noticed while digging through the hundreds of profiles presented to me. This year, I have data to back up my observations and we’re going to dive into it.

Data Mining (or Technical Details for What I Did)

First off, I had to find a way to get the text data out of the mobile app. The network data and local cache are encrypted, so instead I took screenshots and ran them through OCR to get the text. I did a few manually to see if it would work, and it worked well, but going through hundreds of profiles and manually copying text into a Google Sheet would be tedious, so I had to automate this.

Android has a nice automation API called MonkeyRunner, and there is an open source Python version called AndroidViewClient, which gave me full access to the Python libraries I already had. I spent a day coding the script, and using Python, AndroidViewClient, PIL, and PyTesseract, I managed to comb through all the profiles in under an hour. All of this was imported into a Google Sheet, then downloaded into a Jupyter notebook where I ran more Python scripts using Pandas, NLTK, and Seaborn to filter the data and generate the graphs below.
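For a rough idea of what the OCR step looks like, here is a minimal sketch. It is an illustration rather than the actual script: the helper names `clean_ocr_text` and `ocr_screenshot` and the file name in the comment are hypothetical, and it assumes Pillow, pytesseract, and the Tesseract binary are installed. The device automation part is left out.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces that OCR leaves behind."""
    return re.sub(r"\s+", " ", raw).strip()


def ocr_screenshot(path: str) -> str:
    """Run Tesseract on one saved screenshot and return the cleaned text."""
    # Imported here so clean_ocr_text stays usable without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return clean_ocr_text(pytesseract.image_to_string(img))


# Usage with a placeholder file name:
#   text = ocr_screenshot("profile_0001.png")
```

Cleaning the whitespace right away matters for the later word counts, since OCR output keeps the on-screen line wrapping.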

In total, I collected text from 2025 profiles.

Data Disclaimer

CMB tailors the profiles it shows to the viewer’s own profile, so the data I mined is tilted toward my preferences and doesn’t represent all profiles. However, even from this, you can already see trends in how girls write their profiles. The data you’re seeing comes from my profile: an Asian male in his 30s living in the Seattle area.

Number of Profiles per Day

The way CMB works is that every day at noon you get a new profile to view, which you can either pass or like. You can only talk to people if there’s a mutual like. Sometimes you get a bonus profile or two (or four) to view. That used to be the case, but around July 2016 they relaxed that policy to show up to 21 profiles per day, as you can see from the sudden spike. The flat lines around March 2016 and Sept 2016 are when I deactivated the app to take a break, so there are some data points I missed since I didn’t receive any profiles during that time. Of the profiles seen, about 9.4% had empty sections or were otherwise incomplete.
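Counting profiles per day and spotting the gaps can be sketched like this. The dates below are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List


def profiles_per_day(seen_on: List[date]) -> Counter:
    """Count how many profiles were seen on each calendar day."""
    return Counter(seen_on)


def missing_days(counts: Counter) -> List[date]:
    """List the days between the first and last record with no profiles."""
    first, last = min(counts), max(counts)
    span = (last - first).days
    all_days = (first + timedelta(days=i) for i in range(span + 1))
    return [d for d in all_days if d not in counts]


# Hypothetical log: one profile on July 1, none on July 2, three on July 3.
seen = [date(2016, 7, 1), date(2016, 7, 3), date(2016, 7, 3), date(2016, 7, 3)]
counts = profiles_per_day(seen)
gaps = missing_days(counts)
```

Days in `gaps` are the ones that show up as flat stretches when the counts are plotted over time.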

Age

Since the app is showing profiles tailored toward my profile, the age grouping is pretty reasonable. However, I’ve noticed that a few profiles list the wrong age, whether intentionally or not. Usually, they state this in the profile by saying “my age is actually ##” instead of the listed age. It’s either someone young trying to be older (an 18 year old listing themselves as 23) or someone older listing themselves as younger (a 39 year old listing themselves as 36). These are rare cases compared to the number of profiles.

Profile Lengths

Profile length was an interesting data point. Since this is a mobile phone app, people won’t be typing out too much (not to mention that writing a full essay with their UI is difficult, as it wasn’t made for long text). The average number of words girls wrote was 47.5 with a standard deviation of 32.1. If we drop any rows that contain empty sections, the average number of words is 49.7 with a standard deviation of 31.6, so not much of a difference. A significant number of people wrote 10 words or fewer (9%). A rare few wrote only in emoji or used emoji in 75% of their profile. One or two wrote their profile in Chinese. In both of those cases, the OCR returned one ASCII mess of a word, as the text was just a blob to the text recognition.
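The word-count numbers above boil down to a few lines of arithmetic. Here is a minimal sketch using only the standard library and made-up profile texts (not the real data). Treating an empty string as an "empty section" is a simplification for the example.

```python
from statistics import mean, stdev
from typing import List


def word_count(text: str) -> int:
    """Count whitespace-separated tokens; an empty profile counts as 0."""
    return len(text.split())


def share_at_most(counts: List[int], limit: int) -> float:
    """Fraction of profiles with `limit` words or fewer."""
    return sum(c <= limit for c in counts) / len(counts)


# Hypothetical profile texts, not real data.
profiles = [
    "willing to try new things",
    "coffee",
    "",
    "I like dogs and long hikes on weekends",
]

counts = [word_count(p) for p in profiles]   # [5, 1, 0, 8]
avg_all = mean(counts)                       # 3.5
sd_all = stdev(counts)                       # sample standard deviation

# Same statistics after dropping the empty profile.
non_empty = [c for c in counts if c > 0]
avg_non_empty = mean(non_empty)
sd_non_empty = stdev(non_empty)
```

Because the split is on whitespace, an OCR blob with no spaces in it counts as a single word, which matches what happened with the emoji and Chinese profiles.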

Ethnicity vs Word Count

A small note on the way CMB does ethnicity. Users can select multiple ethnicities for themselves, which will show up as “White/Caucasian, Pacific Islander, Asian,” with each ethnicity separated by a comma. However, from what I can tell, this doesn’t happen often, so I graphed the data based on the Primary Ethnicity, which I designated as the first listed ethnicity. I did make another graph with all the different ethnicities summed together, but it made such a small difference in the graphs that it wasn’t worth the effort to parse the data that way. Only 6% of the profiles had mixed ethnicity listed.
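Picking the Primary Ethnicity out of the comma-separated field is a simple split. A minimal sketch, with hypothetical function names:

```python
from typing import List, Optional


def split_ethnicities(raw: str) -> List[str]:
    """Split the comma-separated field into a clean list of labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def primary_ethnicity(raw: str) -> Optional[str]:
    """Return the first listed ethnicity, or None if the field is empty."""
    parts = split_ethnicities(raw)
    return parts[0] if parts else None


def is_mixed(raw: str) -> bool:
    """True when more than one ethnicity is listed."""
    return len(split_ethnicities(raw)) > 1


example = "White/Caucasian, Pacific Islander, Asian"
first = primary_ethnicity(example)   # "White/Caucasian"
```

Note that the split is on commas only, so a label like "White/Caucasian" stays in one piece.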

The graphs comparing word count to ethnicity show that most people hover around the average with a small standard deviation. The lines (in the first graph) for a few of the groups are long because those groups have a small sample size, so their averages are much less certain.
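To see why small groups get long lines, here is a sketch that assumes the lines reflect uncertainty in each group's average (that is an assumption about the plot, not something stated above). It compares the standard error of the mean for two hypothetical groups with similar spread but very different sizes.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def summarize(word_counts: List[int]) -> Dict[str, float]:
    """Mean, sample standard deviation, and standard error of the mean.

    Needs at least two values, since stdev is undefined for one point.
    """
    n = len(word_counts)
    sd = stdev(word_counts)
    return {"n": n, "mean": mean(word_counts), "sd": sd, "sem": sd / sqrt(n)}


# Hypothetical word counts: same values, but 90 profiles versus 3.
large_group = summarize([40, 50, 60] * 30)
small_group = summarize([40, 50, 60])
```

Both groups average 50 words with a similar spread, but the standard error for the 3-profile group is several times larger, which is what stretches its line on the graph.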

Religion vs Word Count

As for religion, the only significant difference is that Hindu and Sikh profiles tend to be much shorter than those of other listed religions. One curiosity is a Caucasian girl who listed her religion as Shinto. Kind of an odd profile.

If you noticed from the graphs, there’s one huge outlier in profile length at 322 words. This person wrote enough to take up more than the screen height on my phone, which required me to manually enter the data due to my script not handling that edge case.

Height Map

There are a few outliers that you may have noticed. A few people listed their height as 99+ inches, or 8 feet 3 inches. This is not their actual height, so either they fat-fingered the info and never changed it, or it was done intentionally. Why would they do this intentionally? It could be to avoid filters (you can filter by height in CMB), though I’m still not sure why they would do this. On the other end, there’s one height listed at 44 inches, or 3 feet 8 inches. This wasn’t entered in the height field, as CMB restricts the lowest height to 4 feet, but rather was listed in the profile text.
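The inch-to-feet conversions above are easy to check, and the same helper can flag suspicious heights. In this sketch the 48-inch floor comes from CMB's 4-foot minimum mentioned above, while the 96-inch ceiling is an arbitrary cutoff picked for the example.

```python
from typing import Tuple


def to_feet_inches(total_inches: int) -> Tuple[int, int]:
    """Convert a height in inches to (feet, remaining inches)."""
    return divmod(total_inches, 12)


def is_plausible(total_inches: int, low: int = 48, high: int = 96) -> bool:
    """Flag heights outside the range; `high` is a hypothetical cutoff."""
    return low <= total_inches <= high


tall = to_feet_inches(99)    # (8, 3)
short = to_feet_inches(44)   # (3, 8)
```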

Ethnicity vs Religion

If we graph religion against ethnicity, the results are pretty much what you’d expect. In terms of overall trends, you can see Hindus mostly in the South Asian category and Buddhists in Asian.

Common Words and Phrases

Table of words and the total number of times they appear in profiles.

By far, the most common word used in profiles is the word “new.” This generally comes from the phrase “willing to try new things” as you can tell from the table of top words. This word by itself is in 50% of all profiles. To put that into context, people used the word “new” almost as much as they used “the” when compared across all words in all profiles.
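There are two different numbers in play here: how many times a word appears in total, and how many profiles contain it at least once. A minimal sketch of both, using a plain regex tokenizer from the standard library and made-up profile texts (the tokenizer is a stand-in, not the one used for the table):

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical profile texts, not real data.
profiles = [
    "I love trying new things and new food",
    "New to Seattle, looking for the one",
    "Coffee, dogs, and the outdoors",
    "Hiking and travel",
]

# Total occurrences across all profiles.
term_counts = Counter(tok for p in profiles for tok in tokenize(p))

# Number of profiles that contain the word at least once.
doc_counts = Counter(tok for p in profiles for tok in set(tokenize(p)))

share_with_new = doc_counts["new"] / len(profiles)
```

In this toy data "new" appears 3 times in total but only in 2 of the 4 profiles, so the total count and the share of profiles tell slightly different stories.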

Some other interesting numbers from the data (compared against profiles that are completely filled out):

  • 20% of Christian profiles (Christians account for 20% of the profiles) will explicitly list “God,” “faith,” “Jesus Christ,” or “Christian” somewhere in their profile. In fact, besides the one Catholic who mentions faith, Christians are the only ones to mention their faith and specifically request other Christians.
  • 24% have “sense of humor” as something they want in a partner. Another phrase, “make(s) me laugh,” also accompanies it, but is much less frequent.
  • 22% have “trying/try new things” somewhere in their profile.
  • 4% list they want someone who “is a gentlemen.” Some also use “chivalrous” (as in “is chivalrous”), but this only occurs in 1% of profiles.
  • 1.4% have the phrase “extroverted introvert.” This was significant enough to show up periodically. In the reverse case, “introverted extrovert” showed up in 6 profiles.
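Phrase percentages like the ones in this list can be computed by checking each profile against a pattern. Here is a minimal sketch with regular expressions. The patterns and the sample profiles are illustrative guesses, not the exact ones used for the numbers above.

```python
import re
from typing import List, Pattern

# Hypothetical patterns covering the variants mentioned in the list.
PATTERNS = {
    "try new things": re.compile(r"\btr(?:y|ying) new things\b", re.IGNORECASE),
    "makes me laugh": re.compile(r"\bmakes? me laugh\b", re.IGNORECASE),
    "sense of humor": re.compile(r"\bsense of humor\b", re.IGNORECASE),
}


def phrase_share(profiles: List[str], pattern: Pattern[str]) -> float:
    """Fraction of profiles in which the pattern matches at least once."""
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if pattern.search(p)) / len(profiles)


# Hypothetical profile texts, not real data.
profiles = [
    "I enjoy trying new things.",
    "Looking for someone who makes me laugh and will try new things",
    "Must have a sense of humor",
    "Just ask",
]

shares = {name: phrase_share(profiles, pat) for name, pat in PATTERNS.items()}
```

Each profile counts once per phrase no matter how many times the phrase repeats, which is what a "percent of profiles" figure needs.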

Success Rates?

So the bigger question is how successful CMB is at meeting people and getting dates. Here are some graphs straight from their app (for 199 beans, their in-app currency). Note again that these are my results and will vary for other people.

In-app graphs for like rate and rank on my profile over 3 months. I added dots to the graph for clarity as the original does not have points on the graph. Each dot represents one week.

Based on the provided graphs, it looks like the average like rate in CMB is 27%, and over the span of 3 months my average liked rate is 7.7% (assuming each week has the same number of people viewing my profile). If you noticed, there are 7 weeks where I had no likes, yet on the ranking graph, it’s still at 18%. The fact that having a 25% liked rate puts me in the top 40% shows you how skewed the average rate is. The problem with the average of 27% is that it’s most likely tilted toward the most attractive people, as with any online dating site, and heavily weighted toward larger cities where more people use it.
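The "same number of viewers each week" assumption matters because averaging weekly rates only matches the overall rate when every week has equal views. A small sketch with made-up (likes, views) pairs shows the difference. The numbers and function names are hypothetical.

```python
from typing import List, Tuple

Week = Tuple[int, int]  # (likes, views) for one week


def mean_of_weekly_rates(weeks: List[Week]) -> float:
    """Average the per-week like rates, giving every week equal weight."""
    return sum(likes / views for likes, views in weeks) / len(weeks)


def pooled_rate(weeks: List[Week]) -> float:
    """Total likes divided by total views across all weeks."""
    return sum(l for l, _ in weeks) / sum(v for _, v in weeks)


# Hypothetical data: equal views per week versus uneven views.
equal_views = [(1, 10), (0, 10), (2, 10)]
uneven_views = [(1, 10), (0, 30), (2, 10)]
```

With equal views both methods give 10%. With uneven views the weekly average stays at 10% while the pooled rate drops to 6%, so an average read off weekly dots is only as good as that assumption.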

CMB also has a feature called Reveal that allows you to see all the people who liked you over the span of 3 months from the current day, and since I had extra beans, I bought a report to get numbers at the end of 2015. The report showed only two girls, both of whom I connected with (so at most a 2% average like rate, assuming my profile was shown to only one girl per day, as this was before their profile change).

As for my personal stats, out of all the profiles presented to me over the past 2 years, I connected with 19 people (about 1%). Half of the connections never responded. I met with 7 of them (about 0.4% of the total), and most of those never led to a second date.
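The funnel can be worked out in a few lines. This sketch assumes the 2025 collected profiles are the denominator. The actual number of profiles presented may differ somewhat, so treat the results as approximate.

```python
TOTAL_PROFILES = 2025  # profiles collected; assumed here to be the denominator
CONNECTIONS = 19
MET_IN_PERSON = 7


def pct(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100 * part / whole


connection_rate = pct(CONNECTIONS, TOTAL_PROFILES)      # about 0.94
met_rate = pct(MET_IN_PERSON, TOTAL_PROFILES)           # about 0.35
met_given_connection = pct(MET_IN_PERSON, CONNECTIONS)  # about 36.8
```

The last figure is the more encouraging way to read it: of the people I actually connected with, a bit over a third turned into an in-person meeting.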

For anyone experienced with online dating, the numbers probably don’t surprise you. It’s still interesting to see the data.

David Hsu: Photographer, Designer, Engineer, Video Game Programmer, and all-around random craftsman