How San Francisco’s gender disparity affects the attractiveness pairings of couples

Steven Tang
Fun with data and stats
Nov 19, 2015

There seems to have been quite a bit of misunderstanding stirred up by this article, so please read this disclaimer: this analysis makes absolutely no value judgements about how attractive men & women in SF are, or how attractive they should feel. I lay out all the simplifying assumptions and I’ve tried to explain that this is not how the real world works. Nor do I believe this is how the real world works. No sane human should heed any advice from this article. None of this has any basis in reality. It’s not supposed to. This is just a thought experiment about how one might build an economic model for dating with gender ratio imbalances. I’ve preserved the entirety of the original text below. There’s plenty of room for miscommunication because the assumptions are buried inside the text. That’s my fault. But I urge everyone to read the piece in its entirety before jumping to conclusions.

There’s a joke that I’ve heard passed around the circles of frustrated single men in San Francisco. They claim that this city is home to 49ers — girls that are 4’s but think they’re 9’s in terms of attractiveness. Whether the ineptitude of San Franciscan men or the confidence of San Franciscan women bears responsibility for this sentiment I cannot say, but it did make me curious about whether the numbers might be able to reveal anything.

The numbers in question, of course, are the gender ratio numbers. If the shortage of women is as bad as the guys lament, then it shouldn’t be surprising that a woman can date a man who is more attractive than she is. I tried to find out how large that attractiveness difference should be (can 4's reasonably think they should be matched with 9's?), given a few simplifying assumptions about behaviors and distributions that, like any good economic model, may or may not bear any resemblance to reality.

Distribution of attractiveness

We’ll assume that men’s and women’s attractiveness is distributed identically along the classic 0–10 scale. Said another way, both have the same probability density function. There is research on the statistical distribution of attractiveness, but it’s all pretty bad. So let’s assume something sensible and simple: that attractiveness follows a normal distribution. Since we want a minimum of 0 and a maximum of 10, we need to truncate it, which gives us the truncated normal distribution. Here’s the truncated normal distribution with various standard deviations:

Truncated normal probability density functions

I’m also constraining the 0–10 scale to be relative to SF. The hottest guy/girl in SF is a 10 and the ugliest is a 0, irrespective of how attractive people in New York are. This analysis assumes that the mean is 5 throughout. We’ll analyze the effects of changing the variance of the distribution, but we’ll start off by assuming that the standard deviation is 1.75 (I picked this because, subjectively, the density function looks about right to me).
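As a rough illustration of the distribution described above, here is a minimal Python sketch of the truncated normal density with mean 5 and standard deviation 1.75 on the 0–10 scale. It is not the code behind the plots. The function names are made up for this example, and it uses only the standard library.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal CDF (the Phi function), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_pdf(x, mean=5.0, sd=1.75, lo=0.0, hi=10.0):
    """Density of a normal(mean, sd) restricted to [lo, hi] and rescaled to integrate to 1."""
    if x < lo or x > hi:
        return 0.0
    # Probability mass the untruncated normal puts on [lo, hi].
    mass = normal_cdf((hi - mean) / sd) - normal_cdf((lo - mean) / sd)
    return normal_pdf((x - mean) / sd) / (sd * mass)


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# Sanity check: the truncated density should integrate to 1 over the scale.
total = integrate(truncnorm_pdf, 0.0, 10.0)
```

Changing the `sd` argument gives the family of curves with different standard deviations.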

The gender balance

I got the data for the number of singles aged 20 and older in SF from Census Reporter, which pulled 2014 data. Single in this case refers to marital status (that’s the best we can do here). There’s not a big gender gap across all singles in SF, but the differences are much larger in the younger age groups. The men of the 35–49 age group have it the roughest. I think there are a lot more women in the 65+ age group because women tend to live longer than men, so that’s not a very good cohort to examine.

Multiplying probability densities by populations gives us the respective attractiveness distributions for each gender. The 20–34 age group looks like:

Density functions with mean=5 and SD=1.75

At each level of attractiveness there are more men than women, but the difference in absolute quantity is most pronounced near the mean.

Dating behavior

This piece forms the basis of the entire analysis. It might get a bit confusing so bear with me. Here are the rules of our dating game:

  1. People will date based solely on attractiveness score.
  2. Single men will only date single women and vice versa.
  3. Assuming every single person in SF was required to find a partner, they’ll match with the most attractive partner they can find.

The last behavior is the most important one. A 10 woman will date a 10 man since it’s optimal for both to date one another. However, a 0 woman doesn’t have to date a 0 man. In the 20–34 group, there are 11,617 more men than women, so if all dating follows rule 3, the 11,617 least attractive men in SF won’t find a mate. The least attractive woman in SF should be able to date the 11,618th least attractive man in SF and that man is definitely not a 0.

The 100 most attractive men and women will date one another, the 5000 most attractive men and women will date one another, and so on. But because of the gender disparity, the 5000th most attractive man should be more attractive than the 5000th most attractive woman.

I’ll call these pairs (m,w), where the n-th most attractive man, with attractiveness m, matches with the n-th most attractive woman, with attractiveness w, “attractiveness equivalences”. If there are more men than women, we’ll always have m ≥ w, with equality only at the very top of the scale.
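The matching rule can be sketched on a toy example. The scores below are made up purely for illustration (six men, four women), and `match_by_rank` is a hypothetical helper name. The idea is simply to sort each side from most to least attractive and pair people off rank by rank.

```python
def match_by_rank(men, women):
    """Pair the n-th most attractive man with the n-th most attractive woman.

    Returns the list of (m, w) pairs plus whoever is left over on each side.
    """
    men_sorted = sorted(men, reverse=True)
    women_sorted = sorted(women, reverse=True)
    # zip stops at the shorter list, so the surplus side is left unmatched.
    pairs = list(zip(men_sorted, women_sorted))
    n = len(pairs)
    return pairs, men_sorted[n:], women_sorted[n:]


# Made-up scores, not real data.
men = [9.1, 7.4, 6.0, 5.2, 3.3, 1.8]
women = [9.0, 6.5, 5.0, 2.1]

pairs, unmatched_men, unmatched_women = match_by_rank(men, women)
```

Here the two lowest-scoring men go unmatched, and the lowest-scoring woman (2.1) ends up paired with a man well above her own score (5.2), which is the effect the rest of the analysis tries to quantify.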

We can visualize this with the cumulative distribution functions:

The points of equivalent attractiveness are those in which areas under the density curves are equal, starting from the right tail (both 5000 in the graphs above). So to find all attractiveness equivalences (m,w), we need to solve

which we can simplify to solving an equation with a cumulative distribution function

where the truncated normal cdf with truncation range [a,b] is

and Φ is the normal CDF.
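Written out under the assumptions stated so far, those three relations would read roughly as follows. The notation is chosen here for illustration: N_m and N_w are the numbers of single men and women, f and F are the truncated normal density and CDF, and μ and σ are the mean and standard deviation (5 and 1.75 in the base case), with a = 0 and b = 10.

```latex
% Equal numbers of men above m and women above w (right-tail areas):
N_m \int_m^{10} f(x)\,dx \;=\; N_w \int_w^{10} f(x)\,dx

% The same condition in terms of the CDF:
N_m \,\bigl[1 - F(m)\bigr] \;=\; N_w \,\bigl[1 - F(w)\bigr]

% Truncated normal CDF on [a, b]:
F(x) \;=\; \frac{\Phi\!\left(\frac{x-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)}
               {\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)},
\qquad a \le x \le b
```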

I couldn’t find a good way to solve this for m in terms of w, so I wrote an R script to find the points where the CDFs are equal at discrete intervals.
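For readers who want to experiment, here is a minimal Python sketch of one way to do that numerically. It is not the R script mentioned above: it uses bisection rather than a grid search, and the population figures are hypothetical placeholders, not the Census Reporter numbers. The function and variable names are also made up for this example.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_cdf(x, mean=5.0, sd=1.75, lo=0.0, hi=10.0):
    """CDF of a normal(mean, sd) truncated to [lo, hi]."""
    a = normal_cdf((lo - mean) / sd)
    b = normal_cdf((hi - mean) / sd)
    x = min(max(x, lo), hi)  # clamp to the scale
    return (normal_cdf((x - mean) / sd) - a) / (b - a)


def matched_man(w, n_men, n_women, sd=1.75, tol=1e-9):
    """Attractiveness m of the man who shares a rank with a woman of attractiveness w.

    Solves n_men * (1 - F(m)) = n_women * (1 - F(w)) by bisection on [0, 10].
    Assumes n_men >= n_women, so a solution always exists on the scale.
    """
    target = n_women * (1.0 - truncnorm_cdf(w, sd=sd))  # women ranked above w
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # If more men than `target` sit above mid, the answer lies further right.
        if n_men * (1.0 - truncnorm_cdf(mid, sd=sd)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical populations for illustration only.
N_MEN, N_WOMEN = 100_000, 90_000

# Equivalence pairs (w, m) at whole-number values of w.
curve = [(float(w), matched_man(float(w), N_MEN, N_WOMEN)) for w in range(0, 11)]
```

Bisection works here because the number of men above a given attractiveness level only decreases as that level rises, so there is exactly one crossing point to find.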

Results

Equivalence attractiveness curve with SD=1.75

Let’s start off by examining the 20–34 age group. The purple line shows how men and women would be matched if there were no gender disparity: each person matches with another of exactly the same attractiveness. For 20–34 year olds in SF it’s a different story. Each point on the blue curve shows a point of equivalent attractiveness for men and women. A woman with attractiveness 0 matches with a man with attractiveness 2.895 (recall this is the 11,618th least attractive man aged 20–34 in SF). A 5.0 woman matches with a 5.249 man. This curve is the plot of all attractiveness equivalence pairs (m,w).

Note that the difference gets smaller as attractiveness increases. This makes sense. According to our dating rules, 10s are always going to date 10s, so at the higher end of attractiveness the differences will be smaller. As we move toward the lower end, the surplus of men accumulates: the gap between the number of men and the number of women above a given attractiveness level keeps growing.

The equivalence curves vary widely by age because the gender ratio varies.

Recall that the Total curve is skewed because it incorporates the 65+ cohort. Since there are more women than men in that group, the 65+ curve bends the other way. Unsurprisingly, a larger gender disparity means a bigger difference in attractiveness pairings.

We can think of these equivalence curves as the match-ups in which the dating markets “clear”. In order for every man and woman to be matched up according to our rules, men and women on the same point of the equivalence curve should date one another.

Of course, changing the standard deviation of our distribution will also affect our results. Lower variance means that there are more people who are closer to the average. Let’s examine how standard deviation affects equivalence curves for the 20–34 age group.

When attractiveness is more spread out, the differences are larger at the more attractive ends, but don’t increase as drastically on the lower ends.
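The effect of the standard deviation can be explored with a short Python sketch. This one takes a different route from a grid search: it inverts the truncated normal CDF directly using `statistics.NormalDist.inv_cdf` from the standard library. The 0.9 women-to-men ratio is a hypothetical placeholder rather than the census figure, and the names are made up for this example.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is its inverse


def matched_man(w, ratio, sd, mean=5.0, lo=0.0, hi=10.0):
    """Man's attractiveness m matched to a woman at w.

    `ratio` is n_women / n_men and is assumed to lie in (0, 1].
    """
    a = STD.cdf((lo - mean) / sd)
    b = STD.cdf((hi - mean) / sd)
    f_w = (STD.cdf((w - mean) / sd) - a) / (b - a)  # truncated CDF at w
    f_m = 1.0 - ratio * (1.0 - f_w)                 # CDF value required at m
    p = a + f_m * (b - a)                           # back to the untruncated scale
    return mean + sd * STD.inv_cdf(p)


RATIO = 0.9  # hypothetical: 9 single women for every 10 single men

# Gap m - w at a low, middle and high score, for three standard deviations.
gap_by_sd = {
    sd: {w: matched_man(w, RATIO, sd) - w for w in (2.0, 5.0, 8.0)}
    for sd in (1.0, 1.75, 2.5)
}

for sd, gaps in gap_by_sd.items():
    print(sd, {w: round(g, 3) for w, g in gaps.items()})
```

With these placeholder inputs, the gap at the high score grows as the standard deviation increases, while the gap at the low score shrinks.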

So what does this all say about 49ers? Under this model, girls who are 4’s have little reason to think they’re 9’s, despite the gender disparity among SF singles. In the 20–34 age group, 4’s more plausibly think they’re somewhere around 4.393. The largest difference is in the 35–49 group, where women who are 4’s are matched with men who are 4.837’s. Here are some discrete points to consider:

The ones who have it really bad are the guys at the bottom of the spectrum. That’s where the difference is largest, and if you’re below a 2 or 3, you probably won’t find any matches. Of course, this model has no real consequence in the real world because the assumptions are unrealistic. But given the gender disparity in SF, 49ers isn’t a very accurate term. The next time your single guy friend complains, tell him that it’s 44.837ers at best.
