Learning to find a Girlfriend at the University of Waterloo by Logistic Regression
The University of Waterloo is well known for its lack of social life and difficulty of finding romantic relationships. Like many other Waterloo CS majors, I wouldn’t be able to find a girlfriend, even if my life depended on it.
Some people feel love is unquantifiable, and you should “just be yourself”. Well, I’m a UW data scientist, so I respectfully disagree. Why not learn how to find a girlfriend with…😎 machine learning?
I made an app to estimate your probability of finding a girlfriend! #innovation #sideproject #wow
Methodology
The question for this study is: what are the attributes that tend to correlate with having a girlfriend among male Waterloo students? It is commonly assumed that having a high paying job will make you more attractive. Physical characteristics like height and muscle may also play a role. We try to identify which attributes are the most predictive, and which are mere assumptions not supported by data.
Off the top of my head, I came up with the following attributes:
- Dating (target variable): person has a girlfriend, or had one for at least 6 months over the last 5 years
- International: person is an international student
- CS: person majors in CS, SE, or ECE
- Career: person is successful in academics and finds “good” jobs for internships
- Interesting: person has interesting things to talk about
- Social: person is outgoing and tries to meet new people
- Confident: person appears confident
- Tall: person is taller than me (>175cm)
- Glasses: person wears glasses
- Gym: person regularly works out at the gym or plays sports
- Fashion: person cares about wearing nice clothes
- Canada: person mostly lived and worked in Canada for the last 5 years
- Asian: person is East Asian ethnicity
You might notice that some of these are quite subjective: what qualifies a person as interesting? In these cases, I tried to assign 1 to about half the population, and 0 to the other half. Therefore, we’re measuring the relation between my own (biased) perception of other people’s interestingness and their ability to find a girlfriend.
Yeah, if you expected a statistically rigorous study, you can stop reading now.
To collect data, I tabulated every person I could think of and rated them either 1 or 0 on each of these attributes. The resulting dataset has N=70 rows. If you’re a guy, go to Waterloo, and have talked to me in the last 2 years, then you’re probably included.
Analysis
First, we perform Fisher’s exact test on the target dating variable against each explanatory variable. The three variables that are the most significant are:
- Gym: guys who go to the gym or play sports regularly are more than twice as likely to have a girlfriend (p-value = 0.02).
- Glasses: guys who don’t wear glasses are about 70% more likely to have a girlfriend than guys who do (p-value = 0.08).
- Confident: guys who appear confident are more likely to have a girlfriend (p-value = 0.09).
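For anyone who wants to check the arithmetic, here is a minimal sketch of a two-sided Fisher’s exact test in plain Python. The 2x2 counts are an assumption: I reconstructed them by multiplying each reported group size by its rounded proportion (14 of 22 gym-goers dating, 15 of 48 non-gym-goers dating). The function name is made up for illustration, and this is not necessarily the exact routine used for the results in this post.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every possible table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Assumed counts, reconstructed from N and the rounded proportions:
# rows = (gym, no gym), columns = (dating, not dating).
gym_p = fisher_exact_two_sided(14, 8, 15, 33)

# "More than twice as likely" is the ratio of the two dating rates.
relative_rate = (14 / 22) / (15 / 48)

print(f"p-value = {gym_p:.3f}, relative rate = {relative_rate:.2f}")
```

The same function can be pointed at any of the other variables by plugging in their reconstructed counts.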
Muscular and confident guys are attractive, as expected. I was quite surprised by the large effect of glasses, and wondered if it was an indication of something else, like general nerdiness. So I looked for more careful studies and confirmed that indeed, the majority of people consider glasses to be unattractive for both genders.
Some variables may be slightly predictive of dating success, but it’s hard to say for sure due to small sample size:
- International students have better success with dating than domestic students
- Asian men have worse chances with dating than men of other ethnicities
- Controlling for other factors, guys in CS seem not to be at a disadvantage, despite the lack of women
The rest of the variables (height, career/academics, interestingness, sociability, fashion, Canada/US) show little correlation with dating. Sorry, but even if you go to Facebook in Menlo Park in 4A, you will still not have a girlfriend.
Full results of this experiment:
Variable: international
N(international)=10, N(~international)=60
p(dating|international)=0.60, p(dating|~international)=0.38
p-value=0.299

Variable: cs
N(cs)=56, N(~cs)=14
p(dating|cs)=0.45, p(dating|~cs)=0.29
p-value=0.368

Variable: career
N(career)=46, N(~career)=24
p(dating|career)=0.43, p(dating|~career)=0.38
p-value=0.799

Variable: interesting
N(interesting)=34, N(~interesting)=36
p(dating|interesting)=0.47, p(dating|~interesting)=0.36
p-value=0.467

Variable: social
N(social)=29, N(~social)=41
p(dating|social)=0.45, p(dating|~social)=0.39
p-value=0.806

Variable: confident
N(confident)=37, N(~confident)=33
p(dating|confident)=0.51, p(dating|~confident)=0.30
p-value=0.092

Variable: tall
N(tall)=26, N(~tall)=44
p(dating|tall)=0.46, p(dating|~tall)=0.39
p-value=0.619

Variable: glasses
N(glasses)=41, N(~glasses)=29
p(dating|glasses)=0.32, p(dating|~glasses)=0.55
p-value=0.084

Variable: gym
N(gym)=22, N(~gym)=48
p(dating|gym)=0.64, p(dating|~gym)=0.31
p-value=0.018

Variable: fashion
N(fashion)=17, N(~fashion)=53
p(dating|fashion)=0.41, p(dating|~fashion)=0.42
p-value=1.000

Variable: canada
N(canada)=31, N(~canada)=39
p(dating|canada)=0.42, p(dating|~canada)=0.41
p-value=1.000

Variable: asian
N(asian)=59, N(~asian)=11
p(dating|asian)=0.37, p(dating|~asian)=0.64
p-value=0.181
Next, we examine the pairwise correlations between the variates; this can help identify incorrect model assumptions. Red means positive correlation, blue means negative correlation. We only show correlations with a p-value below 0.1, so most pairs of variates are blank.
It appears that {having girlfriend, appearing confident, going to the gym, not wearing glasses} are all mutually correlated.
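To put a number on “correlated” for a pair of 0/1 variables, one option is the phi coefficient, which is the same thing as the Pearson correlation computed on binary data. The sketch below assumes that measure (the plot may have used something else) and reuses counts I reconstructed from the results table, so treat the inputs as approximate.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]].

    Equal to the Pearson correlation between two 0/1 variables.
    """
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Assumed counts, rows = (has attribute, lacks it),
# columns = (dating, not dating).
phi_gym = phi_coefficient(14, 8, 15, 33)       # gym vs. dating
phi_glasses = phi_coefficient(13, 28, 16, 13)  # glasses vs. dating

print(f"gym: {phi_gym:+.2f}, glasses: {phi_glasses:+.2f}")
```

A positive value means the two attributes tend to show up together, a negative value means one tends to show up without the other.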
Before we go on, I should emphasize that the demographics of my friend groups do not represent the general UW population. I either meet people in classes or at work (wide variety of backgrounds but all doing CS), or through mutual friends (lots of different majors but mostly East Asians that grew up in Canada).
Any model trained on this data will reflect these biases. In the future, I might look into doing a wider survey to get more data.
Girlfriend Prediction with Logistic Regression
Wouldn’t it be great for an algorithm to predict your chances of finding a girlfriend? Let’s do it!
I trained a logistic regression GLM with elastic net regularization to predict the dating variable from all of the explanatory variates, using the glmnet and caret packages in R. Hyperparameters were chosen by a standard grid search, using leave-one-out cross validation and selecting the combination with the best Cohen’s kappa coefficient.
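Why kappa rather than plain accuracy? Fewer than half of the people in the dataset are dating, so a model that always says “no” already looks decent on accuracy. Cohen’s kappa subtracts the agreement you would expect by chance. Here is a minimal sketch in plain Python; the labels are made up for illustration, and the function is my own stand-in, not the caret implementation.

```python
def cohens_kappa(actual, predicted):
    """Cohen's kappa for two equal-length sequences of 0/1 labels."""
    n = len(actual)
    # Fraction of cases where prediction and truth agree.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Agreement expected if predictions were independent of the truth.
    p_actual = sum(actual) / n
    p_pred = sum(predicted) / n
    expected = p_actual * p_pred + (1 - p_actual) * (1 - p_pred)
    if expected == 1.0:
        return 0.0  # degenerate case: everything is a single class
    return (observed - expected) / (1 - expected)

# Made-up example: 4 of 10 people are dating.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
always_no = [0] * 10

kappa_model = cohens_kappa(actual, predicted)
kappa_lazy = cohens_kappa(actual, always_no)

print(f"model kappa = {kappa_model:.2f}, always-no kappa = {kappa_lazy:.2f}")
```

The always-no predictor gets 60% accuracy on this toy data but a kappa of zero, which is why kappa is a more honest target for the grid search.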
The resulting model has a cross-validated ROC AUC of 0.673, meaning it predicts your chances of finding a girlfriend better than random guessing (which would score 0.5), but there is still a lot of inherent uncertainty.
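If ROC AUC is unfamiliar: it is the probability that a randomly chosen person who is dating gets a higher predicted score than a randomly chosen person who is not. A minimal sketch of that definition in plain Python, with made-up labels and scores (not the real model outputs):

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = dating, scores = predicted probabilities.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.8, 0.3, 0.4, 0.5, 0.2, 0.7]

auc = roc_auc(labels, scores)
# A model that gives everyone the same score cannot rank anyone.
auc_constant = roc_auc(labels, [0.5] * len(labels))

print(f"AUC = {auc:.3f}, constant-score AUC = {auc_constant:.3f}")
```

On this scale, 0.5 is coin-flipping and 1.0 is a perfect ranking.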
I deployed the model as an RStudio Shiny app here for you to play with.
Well, that’s it for now. Time to hit the gym and book a LASIK appointment.