The “tiny” field guide to dating - A Data Science Tutorial

Before I get into it, spoiler alert! Just as reading about martial arts doesn’t make you a black belt, reading this article won’t land you dates directly (if it does, do buy me a pizza). What it might do is improve your chances and, hopefully, give you some insight into how to extract information from data.

A possibility?

Dating has always been one of the most interesting and hard-to-understand parts of life for almost everyone. But can we identify some trends that could potentially help you get a date? I found a dataset that will allow us to explore this question a little further. So let us begin.

There will be lots of figures so go ahead and look at them.

A little introduction to the dataset is in order before we use it. The data comes from a speed dating experiment conducted by Columbia University. Experimental speed dating events held from 2002 to 2004 were observed, and at the end of every speed date the participants were asked whether they would like to see their date again. At different points in the process, further data was gathered, such as demographics, beliefs, and even how participants rated their own attributes.

So let’s dive into it. Follow along in this notebook

We first have a look at the data and find that there are 195 attributes and more than 8,000 rows.
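If you are not following the notebook, here is a minimal sketch of that first look. The file name and encoding in the comment are assumptions, and the tiny DataFrame is made-up stand-in data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# With the real file you would load it instead, for example:
# df = pd.read_csv("Speed Dating Data.csv", encoding="ISO-8859-1")  # assumed file name
# Toy stand-in data (made up) so the snippet is self-contained.
df = pd.DataFrame(
    {
        "iid": [1, 1, 2, 2, 3],
        "age": [21, 21, 27, 27, np.nan],
        "match": [0, 1, 0, 0, 1],
    }
)

# shape is (number of rows, number of columns)
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} attributes")

# Missing values per column, largest first
missing_per_column = df.isna().sum().sort_values(ascending=False)
print(missing_per_column.head())
```

On the full dataset, the same two calls should report the row and attribute counts mentioned above.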

Image for post

We also find that there are a lot of missing values. Rather than trying to repair them, we drop every attribute that has more than 4,000 missing rows, because such sparse columns might skew our analysis and give us a headache later on.
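One way to sketch this column drop is shown below. The helper name `drop_sparse_columns` is hypothetical and the toy data is made up; only the idea of a missing-value threshold comes from the text.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, max_missing: int) -> pd.DataFrame:
    """Keep only the columns with at most `max_missing` missing values."""
    missing = frame.isna().sum()
    return frame.loc[:, missing <= max_missing]


# Toy data (made up): "income" is missing in 3 of 4 rows
toy = pd.DataFrame(
    {
        "age": [21, 27, np.nan, 30],
        "income": [np.nan, np.nan, np.nan, 50000],
        "match": [0, 1, 0, 1],
    }
)

cleaned = drop_sparse_columns(toy, max_missing=2)
print(list(cleaned.columns))
```

On the full dataset the call would look like `drop_sparse_columns(df, max_missing=4000)`.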

That leaves us with 130 attributes, which is still a lot. Now we keep only the people who actually did get a second date, because at the end of the day that is the aim of our analysis.
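A rough sketch of that filter follows. It assumes the outcome is stored in a 0/1 column called `match`, which is an assumption about the column name, and it uses made-up toy rows.

```python
import pandas as pd

# Toy data (made up): one of five rows ended in a match
toy = pd.DataFrame({"iid": [1, 2, 3, 4, 5], "match": [0, 0, 1, 0, 0]})

# Boolean mask keeps only the rows where both sides said yes
matched = toy[toy["match"] == 1]

# Share of rows that ended in a match
match_share = len(matched) / len(toy)
print(f"{match_share:.0%} of rows ended in a match")
```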

We find that fewer than 21% of the participants actually got a second date. But are we really surprised? (xD) Let’s dig in a little more. How about we take some of the attributes and try to extract a little more information from them?

Let us first take ethnicity as an attribute. We use a function called crosstab from the pandas library. It cross-tabulates two columns, counting how many rows fall into each combination of their unique values.

We find that more of the people who got a second date were not of the same ethnicity as their date. It should be noted that this result could be skewed by how the data was collected.
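The crosstab step can be sketched like this. The column names `samerace` and `match` are assumptions, and the rows are made up purely to show the shape of the output.

```python
import pandas as pd

# Toy data (made up): samerace is 1 when both people share an ethnicity
toy = pd.DataFrame(
    {
        "samerace": [0, 0, 0, 1, 1, 0],
        "match": [1, 0, 1, 1, 0, 0],
    }
)

# Raw counts for every (samerace, match) combination
counts = pd.crosstab(toy["samerace"], toy["match"])
print(counts)

# Row-wise shares: the match rate within each samerace group
shares = pd.crosstab(toy["samerace"], toy["match"], normalize="index")
print(shares)
```

Raw counts favour whichever group is larger, so looking at the row-wise shares as well can help when the two groups differ a lot in size.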

How about we consider age? We now plot age against the number of people who actually got a second date.

Image for post

We find that the majority are between 21 and 30 years of age. If you are older than 30, your chances are relatively smaller, but the very fact that they are not zero shows that you still have a chance. Go for it!
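If you would rather see age groups than single years, a sketch with `pd.cut` could look like the following. The bin edges, labels and ages are all made up for illustration, and `age` is an assumed column name.

```python
import pandas as pd

# Toy ages of matched participants (made up)
matched = pd.DataFrame({"age": [19, 22, 24, 26, 29, 30, 33, 41]})

# Bin edges are right-inclusive: (17, 20], (20, 25], (25, 30], ...
bins = [17, 20, 25, 30, 35, 60]
labels = ["18-20", "21-25", "26-30", "31-35", "36+"]
matched["age_group"] = pd.cut(matched["age"], bins=bins, labels=labels)

# Count per group, kept in age order rather than sorted by frequency
age_counts = matched["age_group"].value_counts().sort_index()
print(age_counts)
```

With matplotlib installed, `age_counts.plot(kind="bar")` would turn these counts into a bar chart.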

Now let us consider jobs.

Image for post

Sadly for the engineers and the math nerds out there, you might have a slightly harder time. But we all knew that, didn’t we? If you are in business, economics or finance, or even physics and chemistry, you seem to fare better. (Hats off.)

Now just for fun let us consider an attribute that tries to identify why the person registered for the speed date in the first place.

Image for post

From this, we find that among the people who did get a match, the majority were out either to meet new people or because they felt it would be a fun night out. My tip? Go out and have fun. Even if you do not get a date, at least you will have had fun.

Those looking for a serious relationship don’t seem to like speed dates. I wonder why?
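To make a plot like this readable, it helps to turn numeric codes into labels before counting. Here is a sketch; the column name `goal` and the code-to-label mapping are assumptions that should be checked against the data key, and the rows are made up.

```python
import pandas as pd

# Assumed code-to-label mapping; verify it against the dataset's data key
GOAL_LABELS = {
    1: "Seemed like a fun night out",
    2: "To meet new people",
    3: "To get a date",
    4: "Looking for a serious relationship",
    5: "To say I did it",
    6: "Other",
}

# Toy goal codes of matched participants (made up)
matched = pd.DataFrame({"goal": [1, 2, 2, 1, 3, 1, 6]})

# Replace codes with labels, then count, most frequent first
goal_counts = matched["goal"].map(GOAL_LABELS).value_counts()
print(goal_counts)
```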

Now, how about deciding what to do on your first date? To do this, we look at both genders and find the activities that overlap between them.

Image for post

That is some very tiny text. But just by looking at the plots (and zooming in a lot), we can say that you might have a better chance at getting a good date if you work on Dining (good food sure does matter), Music (come on..), Reading (you can always bring up interesting topics), hiking or clubbing. These are common hobbies, so they might just make it easier to start a conversation.
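If zooming into plots is not your thing, the overlap can also be computed directly. This is only one possible approach, not necessarily what the notebook does: average each interest score per gender, take the top few for each, and intersect the two sets. The column names and scores are made up.

```python
import pandas as pd

activities = ["dining", "music", "reading", "hiking", "gaming", "tv"]

# Toy interest ratings of matched participants (made up), 1 to 10
matched = pd.DataFrame(
    {
        "gender": [0, 0, 1, 1],
        "dining": [9, 8, 8, 9],
        "music": [8, 9, 9, 7],
        "reading": [7, 8, 6, 8],
        "hiking": [6, 5, 7, 6],
        "gaming": [2, 3, 8, 9],
        "tv": [7, 9, 3, 4],
    }
)

# Average interest per activity, one row per gender
mean_interest = matched.groupby("gender")[activities].mean()

# Top-rated activities for each gender
top_n = 3
top_by_gender = {
    gender: set(row.nlargest(top_n).index)
    for gender, row in mean_interest.iterrows()
}

# Activities that appear in both top lists
shared = set.intersection(*top_by_gender.values())
print(sorted(shared))
```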

Now let us look at how many times the person goes out every week. We first write a small function which we will reuse later that gives us access to the plots from our matched dataset in a single line.
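A helper along those lines might look like the sketch below. The name `matched_counts`, its `plot` flag and the column name `go_out` are all hypothetical, and the data is made up.

```python
import pandas as pd


def matched_counts(matched: pd.DataFrame, column: str, plot: bool = False) -> pd.Series:
    """Count how often each value of `column` occurs among matched rows."""
    counts = matched[column].value_counts().sort_index()
    if plot:
        # Needs matplotlib installed; draws one bar per distinct value
        counts.plot(kind="bar", title=f"{column} among matched participants")
    return counts


# Toy data (made up): coded answers to "how often do you go out?"
matched = pd.DataFrame({"go_out": [1, 1, 2, 1, 3, 2, 1]})

go_out_counts = matched_counts(matched, "go_out")
print(go_out_counts)
```

After that, every further attribute is a one-liner such as `matched_counts(matched, "age", plot=True)`.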

Image for post

We can see that most of the people who got a match are people who go out a lot. That does not mean that stepping out of your room 5 times a week somehow makes you more desirable. (Depends on where you go from there though. Especially if you go to some fancy office. Just kidding.)

Now for something I found interesting. I wanted to see how many of these people who got themselves a date thought themselves attractive or felt that they were fun to be around.

Image for post
Image for post

Okay, wow. This is surprising. We can see that these people have a pretty great amount of self-confidence. Awesome! It is great to be confident in yourself and believe that you deserve it.

But… maybe not too much. Let us take the attribute that records how many matches each person expects to get, and then divide our data into two parts: the ones who got a match and the ones who did not.

Image for post
Image for post

Oops. The ones who did not get a match seem to have significantly higher self-confidence. Maybe a bit too much(?). So, believe in yourself, but do not believe too much. Just the right amount.
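The split into the two groups can be sketched with a `groupby`. The column names `attr3_1` (self-rated attractiveness) and `expnum` (expected number of interested people) are assumptions, and the numbers are invented only to show the mechanics, so they say nothing about the real data.

```python
import pandas as pd

# Toy data (made up): self-rating and expected matches per person
toy = pd.DataFrame(
    {
        "match": [1, 1, 1, 0, 0, 0],
        "attr3_1": [7, 8, 6, 9, 8, 10],
        "expnum": [3, 4, 2, 6, 5, 7],
    }
)

# Average of each column within the matched and unmatched groups
summary = toy.groupby("match")[["attr3_1", "expnum"]].mean()

# Swap the 0/1 index for readable labels
summary.index = summary.index.map({0: "no match", 1: "match"})
print(summary)
```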

Another interesting aspect of this data is an attribute called rating. Among the ones who did get a match, the participants were asked to rate their date. And here we go.

Image for post

Hold on.. did.. someone.. get.. a.. 10?!! Several people seem to have. I will skip over the code as I am using the same things as before except this little bit.

Do feel free to use this snippet and extend it for the other bits we talked about. Maybe you might come up with something new.

YAY!

Hopefully, you now have a date and a nice Jupyter Notebook too.

PS. This is a data analytics tutorial. So don’t come running after me because what you used from here did not get you a date.



Written by Subhaditya Mukherjee

I am a dreamer and coder. Using my computer to get my thoughts to reality and trying to make the world better, one smile at a time :)

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
