We Found Love In A Hopeless Place
Linguistic Analysis of “Men Seeking Women” on Craigslist
Craigslist is all about lowering barriers to entry. Whether you are looking to sell old Pokemon games or find true love, potential opportunities lie hidden somewhere within Craigslist’s ’90s-inspired webpage.
This project looks at the words used in the “personals” section of Craigslist, where users try to seduce strangers into friendship, romance, and a hodgepodge of random fetishes. Their words read less like a sonnet from Shakespeare and more like a letter from James Joyce.
Who are these Craigslist poets, and what inspires them to seek love online? Is there some universal “Craigslist form”, or do they inherit their styles from their local areas?
These questions will be answered by taking a look at Craigslist posts from around the country and looking for patterns. Hopefully, this gives a glimpse of love in the 21st century.
The list of all Craigslist sites was broken down into four broad regions: Northeast, Southeast, Midwest, and Far West. From these regions, I randomly sampled the “Men Seeking Women” section of 10 different sites.
Posts contain basic information like time, title, and body text. Users can additionally volunteer information such as age, physical characteristics, or STI status (surprisingly, very few people advertise that they have chlamydia). We’ll look at communities ranging from highly urban areas like New York City and Los Angeles to less populated sites like Bloomington, Indiana, and Maine.
Tools of the Trade
This project was put together in a Jupyter Notebook and used BeautifulSoup for the initial data acquisition, which was then loaded into Pandas dataframes for easier manipulation. Data visualization uses the Matplotlib package.
Since Craigslist’s terms of service specifically prohibit the use of web scrapers, I will not be posting the full notebook. Sorry! If you have specific questions about how any of the data has been manipulated or presented, then feel free to reach out!
Before we get too deep into the minutiae of how people craft their posts, let’s look at some basic demographic data.
What do we know about the primary users of Craigslist? Most online boards will tend to have an older skew. Is Craigslist any different?
Overall, the average poster was 40 years old with a standard deviation of 10 years. The distribution of ages is roughly normal, and there is no statistically significant difference in average posting age between sites. There is a bit of a peak around 43 years old, but nothing overly significant.
What People Talk About
What do 40-year-old men like to talk about anyway? I assume that it’s a weird combination of stock trading, dietary fiber, and ham radios.
Well, by using the WordCloud package, we can get an idea of what people like to talk about the most! The larger a word is, the more frequently it appears in the sampled text. Nothing here seems particularly out of the ordinary, but it is nice to know that a lot of people are “looking” for “love.”
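Under the hood, a word cloud is just a word count with font sizes attached. Here is a minimal sketch of that counting step using only the standard library. The posts are invented for illustration, and the stop-word list and the `word_frequencies` name are assumptions, not part of the original notebook.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "for", "i", "am", "the", "to", "is", "of", "in", "who"}

def word_frequencies(posts):
    """Count how often each non-stop word appears across all posts."""
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z']+", post.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up posts for illustration only, not real data.
sample_posts = [
    "Looking for love and a good woman",
    "I am looking for someone who likes to laugh",
]
freqs = word_frequencies(sample_posts)
print(freqs.most_common(3))
```

A mapping like `freqs` is the kind of input the wordcloud package can draw from (its `generate_from_frequencies` method takes word counts directly), so the biggest words in the picture are simply the largest counts here.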
On its face, this makes Craigslist look like a more respectable corner of the web. Maybe even wholesome! But a quick visit to your local Craigslist will paint a canvas full of crass and sexual language. What does the typical post really look like?
To test this, I created a bot that randomly generates Craigslist posts based on the information that it has been fed. It is powered by Markov chains, a mathematical model that looks at how frequently certain words appear next to each other. It’s not perfect, and there are more effective ways to randomly generate text, but this tool is sufficient for initial analysis. To wit:
Lets get naked and tell me about my day. I like to fool around, massage your back as well as all over your bottom, up your place or my place or I will do my own laundry. Please email with your honey-do lists.
While some “boring” words are essential to crafting a seductive Craigslist post, there is a huge diversity of sexual interests and fetishes. With a community full of people who get their kicks from doing their own laundry and those that just want strangers to tell them about their days, it is difficult for any particular fetish to stand out on Craigslist.
Truly a tragedy of the modern era.
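For the curious, a word-level Markov chain generator fits in a few lines. This is a minimal sketch of the general technique, assuming a first-order chain (each word depends only on the one before it). The training posts are invented, and `build_chain` and `generate` are hypothetical names rather than the bot’s actual code.

```python
import random
from collections import defaultdict

def build_chain(posts):
    """Map each word to the list of words that followed it in the training posts."""
    chain = defaultdict(list)
    for post in posts:
        words = post.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, start, max_words=30, rng=random):
    """Walk the chain from `start`, choosing each next word in proportion
    to how often it followed the current word in the training text."""
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Made-up training text for illustration only.
training_posts = [
    "I like long walks on the beach",
    "I like to cook dinner on the weekend",
]
chain = build_chain(training_posts)
print(generate(chain, "I", rng=random.Random(0)))
```

Because repeated followers are stored as repeated list entries, `rng.choice` samples them in proportion to their frequency. That is also why the output drifts between topics: the chain only ever remembers one word of context.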
Next, I ran all of the posts through Flesch-Kincaid readability tests. These tests estimate the reading grade level of a piece of writing (corresponding to the school grades of most American public schools), scoring it on the average number of words per sentence and syllables per word. Most conversational English falls at around a 6th grade level. Craigslist personal posts were graded as 5th grade, the lowest band on the standard Flesch reading-ease scale.
It is not recommended to introduce your 10-year-old nephew to Craigslist.
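The two Flesch-Kincaid scores are simple arithmetic once you have counts of sentences, words, and syllables. Below is a rough sketch using the published formulas. The syllable counter is a crude vowel-group heuristic of my own choosing (an assumption; dedicated libraries do this more carefully), and the function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_counts(text):
    """Return (sentences, words, syllables) for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(text):
    """Higher scores mean easier text."""
    sentences, words, syllables = text_counts(text)
    if not sentences or not words:
        return 0.0
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(text):
    """Approximate U.S. school grade needed to read the text."""
    sentences, words, syllables = text_counts(text)
    if not sentences or not words:
        return 0.0
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

simple = "The cat sat. The dog ran."
wordy = "Extraordinarily pulchritudinous individuals occasionally communicate electronically."
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(wordy))
```

Short sentences and short words push the grade down, which is why terse, punchy posts score so low.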
Each area of the country has a different culture for courtship and romance. Naturally, these linguistic variations appear as users spin their heartfelt requests for love. Do people keep things PG-13, or do they let their carnal desires be known?
To figure this out, I looked at the rate at which each site swore in its posts. I’ll skip the gory details here, but swears include the *ahem* traditional cusses and sexual vernacular. Some pretty obvious innuendos can slip past this filter, but Craigslist is not the most subtle place, so I’ll assume this filter is acceptable.
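A word-list filter like this is easy to sketch. The version below assumes the rate is measured per word (swears divided by total words on a site), which is one reasonable reading of “rate” but not necessarily the exact definition used here. The swear list is a family-friendly placeholder, and the site names and posts are invented.

```python
import re

# Placeholder tokens standing in for the real (unpublished) swear list.
SWEARS = {"darn", "heck"}

def swear_rate(posts, swears=SWEARS):
    """Fraction of all words in `posts` that appear in the swear list."""
    words = [w for post in posts for w in re.findall(r"[a-z']+", post.lower())]
    if not words:
        return 0.0
    return sum(w in swears for w in words) / len(words)

# Made-up posts for illustration only.
posts_by_site = {
    "site_a": ["Well heck, I am lonely", "Darn it all to heck"],
    "site_b": ["Looking for a kind woman"],
}
rates = {site: swear_rate(posts) for site, posts in posts_by_site.items()}
print(rates)
```

Exact-match filtering is what lets innuendo slip through: a word only counts if it is literally on the list.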
As it turns out, older areas aren’t any more likely to swear than younger areas. When it comes to potty mouths, we’re all about the same… except for those ****** in the northeast.
By looking at the language of each region, we can get a better idea of what words are most unique to an area. Think of this as the “lingo” of the area, except this lingo is specifically geared to attract women. To get this information, I counted how many times a word appeared in a certain region and then compared that to the rest of the country.
For the sake of getting the most interesting data, this metric ignores words that are used extraordinarily rarely (no bonus points to the guy in Albuquerque who used the word “pulchritudinous” in his post) and geographic terms (does it surprise anyone that people in Boston use “Boston” more than anyone else?).
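One way to turn “compare a region to the rest of the country” into a number is a ratio of relative frequencies. The sketch below is an assumption about how such a metric could work, not the exact one used for these results: it divides a word’s share of a region’s text by its share of everyone else’s, skips rare words via a `min_count` threshold, and skips an `exclude` set for place names. All names and the sample posts are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def distinctive_words(posts_by_region, region, min_count=2, exclude=frozenset()):
    """Rank words by (share of the region's words) / (share of all other regions' words)."""
    inside, outside = Counter(), Counter()
    for name, posts in posts_by_region.items():
        target = inside if name == region else outside
        for post in posts:
            target.update(tokenize(post))
    in_total = sum(inside.values())
    out_total = sum(outside.values())
    scores = {}
    for word, count in inside.items():
        if count < min_count or word in exclude:
            continue  # drop rare words and geographic terms
        in_rate = count / in_total
        # Add-one smoothing so words never seen elsewhere do not divide by zero.
        out_rate = (outside[word] + 1) / (out_total + 1)
        scores[word] = in_rate / out_rate
    return sorted(scores, key=scores.get, reverse=True)

# Made-up posts for illustration only.
posts_by_region = {
    "south": ["church on sunday then the beach", "church and the beach"],
    "west": ["hiking and the mountains", "the ocean and the city"],
}
print(distinctive_words(posts_by_region, "south"))
```

Common words like “the” end up with a ratio below 1 because they are used everywhere, so they sink to the bottom of the ranking without needing a stop-word list.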
The most “East” words follow a very vulgar trend, as people most love words like “suck,” “tight,” and “bi.” Pretty to the point, eh?
Key West Coast phrases reference its Hispanic population, highlighting words like “Hispanic” and “una.” Apparently they are also fairly confused about geography, since they talk about the “east” a lot as well.
The Southern United States showed a predilection for the intermingling of faith and romance, including phrases like “church” and “God” at the highest rate. And beaches. Beaches are fun too!
But when it comes to the Midwest… it’s not really clear what is going on. The strongest association is with the word “hate,” and both “overweight” and “petite” appear with high frequency. The word “inch” is mostly used exactly as you would expect.
Bear in mind that these texts have been pulled from a relatively small sample of each region, and each region is sufficiently diverse that many populations have not been included in this dataset. If you really want to stand out from the crowd on the West Coast, it’s probably best to avoid talking about Inclusive Hispanic Boating trips. Then again, I don’t think you’d get much luck with that anywhere.
This is just an introductory exploration of the data. It looks like the Eastern United States needs to wash its mouth out with soap and the Midwest has unique interests. By using some questionable Python scripts and natural language processing techniques, we can crawl into a dark corner of the Internet and get a (very vague) idea of how the language of love changes across the country.
Oscar Wilde wrote that a life without love is like a “sunless garden when the flowers are dead.” Craigslist won’t be mistaken for the Gardens of Versailles anytime soon. It’s more like a small potato patch growing in your backyard: it’s not very pretty and might be poisonous, but it can give you hearty sustenance in a pinch.
And that’s gotta count for something, right?
This section describes the sites that were sampled, grouped by geographic region.
East: Central New Jersey, New Haven, Philadelphia, Vermont, Baltimore, Boston, Delaware, New York City, Maine
Midwest: Chicago, Peoria, Lincoln, St. Louis, Madison, Indianapolis, Bloomington, Lawrence, Springfield, Cincinnati
South: Auburn, Little Rock, Treasure Coast, Jackson, Chattanooga, Tri Cities Tennessee, Columbia, New Orleans, Atlanta, Raleigh
West: Stockton, Orange County, San Diego, Los Angeles, SF Bay, Portland, Seattle, Salt Lake City, Albuquerque, Las Vegas