When “Good Enough” Isn’t Good Enough (1/4) — Creating Fake Contacts for iPhone using Python

Data Science Filmmaker
4 min read · Aug 28, 2023

This is the story of me going way overboard for a tiny detail in a movie that will be onscreen for less than 3 seconds. There are a thousand easier ways I could accomplish this, but then I would not learn a bunch of new things. It begins with a problem, and it ends with a (double!) plot twist worthy of M. Night Shyamalan. (I’m talking about this blog entry, not the movie. Though that is also true of the movie!)

The problem: the script calls for a shot of the main character’s phone as she scrolls through her contacts. For legal reasons, I can’t show real people’s names, emails, and phone numbers on screen. Thus, I need some fake contacts!

The easy way to do this — which I actually did on my last film — is to manually create some fake contacts. I don’t need hundreds of them. Just a few. But what if I did need hundreds of them? How would I go about generating them?

My first step was to analyze the contacts in my own phone so that I knew what a realistic list of contacts might look like. Assuming my list is representative, I noted a few things right away. Not everyone has a first and last name. Some have just first names. Some just last. Sometimes the “First Name” contains the entire name, with “Last Name” empty, and vice versa. A subset of these have the full name, but formatted as “Last, First”. Some portion of the names are initials. Almost no one has data in the “Middle Name” field, but several have middle names included as part of “First Name”. There are some company names without a given name. And lastly, some have no names at all. When this happens, the list view displays only the email address.

Since we are primarily going to be interacting on screen with the contacts list and not individual contact details, the most important thing for me is to capture this mix of names in a realistic way.

I needed a more quantitative way to generate this mix, so I exported my contacts from my phone and pulled them into Python. I used some third-party software to generate a CSV, imported it directly into a pandas DataFrame, counted how many total entries there were (1331), and got rid of the columns I didn’t need right now (there were, no joke, about 70 columns labeled some variation of “Email Address.4 (7)”). A handful of entries had the last name listed as the middle name, so I fixed those.

import pandas as pd
import re

# Contacts exported from the phone and converted to CSV with third-party software
contacts_df = pd.read_csv('data/Contacts/Contacts.csv', low_memory=False)
n_contacts = len(contacts_df.index)

# Keep only the name-related columns (copy so later edits don't trigger
# pandas chained-assignment warnings)
names_df = contacts_df.loc[:, ['FirstName', 'MiddleName',
                               'LastName', 'Nickname', 'Company']].copy()

# If there's a middle name and no last name, copy the middle name
# into the last name spot
middle_as_last = names_df["MiddleName"].notna() & names_df["LastName"].isna()
names_df.loc[middle_as_last, "LastName"] = names_df.loc[middle_as_last, "MiddleName"]
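
To make the middle-name fix concrete, here is a minimal sketch on a three-row toy table. The rows and the name `needs_fix` are made up for illustration; only the column names come from the code above.

```python
import pandas as pd

# Toy data (invented for illustration): only Bob's row has a middle name
# but no last name, so only his row should change.
toy_df = pd.DataFrame({
    "FirstName": ["Bob", "Ann", "Cy"],
    "MiddleName": ["Jones", "Lee", None],
    "LastName": [None, "Smith", None],
})

# Boolean mask: middle name present AND last name missing
needs_fix = toy_df["MiddleName"].notna() & toy_df["LastName"].isna()

# Copy the middle name into the last name for the masked rows only
toy_df.loc[needs_fix, "LastName"] = toy_df.loc[needs_fix, "MiddleName"]

print(toy_df[["FirstName", "LastName"]])
```

Building the mask once and reusing it on both sides of the assignment keeps the two selections in sync, which is easy to get wrong when the condition is typed out twice.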

The first thing I did was count the entries that had only a first name or only a last name. These counts exclude the entries where the entire name was stored in that one field (e.g. “Bob Jones” listed in the “FirstName” field). Next, I counted all the ones with only a company name, and finally the ones without any name at all, just an email address.

# Calculate various stats

# Only a first name: no last name, no company, and the first name field
# holds neither an email address ('@') nor a full name (contains a space)
first_names_only = names_df.loc[names_df["LastName"].isna() &
                                names_df["Company"].isna()]
n_first_names_only = len(first_names_only.loc[
    ~first_names_only["FirstName"].str.contains('@', na=False) &
    ~first_names_only["FirstName"].str.contains(' ', na=False)])

# Only a last name: same idea with the fields swapped
last_names_only = names_df.loc[names_df["FirstName"].isna() &
                               names_df["Company"].isna()]
n_last_names_only = len(last_names_only.loc[
    ~last_names_only["LastName"].str.contains('@', na=False) &
    ~last_names_only["LastName"].str.contains(' ', na=False)])

# Only a company name: no first or last name
n_company_name_only = len(names_df.loc[names_df["LastName"].isna()
                                       & names_df["Company"].notna()
                                       & names_df["FirstName"].isna()])

# Only an email address, stored in either the first or last name field
n_email_only = len(names_df.loc[names_df["LastName"].isna() &
                                names_df["FirstName"].str.contains('@', na=False)]) + \
               len(names_df.loc[names_df["FirstName"].isna() &
                                names_df["LastName"].str.contains('@', na=False)])
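
A quick way to sanity-check counting logic like this is to run it on a handful of rows where the right answers are obvious. The sketch below does that with invented data; the helper `lacks` and the variable names are mine, not from the project. It also requires the first name to be present explicitly, which is a slightly stricter condition than the code above uses.

```python
import pandas as pd

# Invented rows covering the cases described in the text
toy_names = pd.DataFrame({
    "FirstName": ["Bob", "Bob Jones", "amy@example.com", None, None, "Cara"],
    "LastName":  [None, None, None, "Nguyen", None, "Diaz"],
    "Company":   [None, None, None, None, "Acme", None],
})

def lacks(column, text):
    """True where the value is missing or does not contain `text` literally."""
    return ~column.str.contains(text, regex=False, na=False)

first = toy_names["FirstName"]
no_last_or_company = toy_names["LastName"].isna() & toy_names["Company"].isna()

# A real first name alone: present, not an email, not a full name with a space
n_first_only = int((no_last_or_company & first.notna()
                    & lacks(first, "@") & lacks(first, " ")).sum())

# An email address sitting in the first name field
n_email_only = int((toy_names["LastName"].isna()
                    & first.str.contains("@", regex=False, na=False)).sum())

# A company with no personal name at all
n_company_only = int((first.isna() & toy_names["LastName"].isna()
                      & toy_names["Company"].notna()).sum())

print(n_first_only, n_email_only, n_company_only)
```

Summing a boolean mask gives the same number as taking `len` of the filtered rows, and it avoids building an intermediate DataFrame for each count.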

Next, I counted the entries that contained a comma in the first name, indicating that the name was formatted “Last, First”, and then the entries that contained a period, indicating either a middle initial (“Bob F. Jones”) or a first name plus middle or last initial (“Bob J.” or “Bob F.”). (There were some weird edge cases, like “Dr.” and “Sr.” The latter was for my ex-father-in-law, who shares a name with my ex-wife. The one time early in our marriage that I accidentally confused them and texted him instead of her, it was… awkward, so he got a “Sr.” added to his first name from then on. Safe to say that’s almost certainly not generalizable!)

Finally, I realized there were some people who had their own names plus a company name. This doesn’t display on the contacts list, but it would if you clicked through, so for completeness I counted those, too.

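# "Last, First" entries: a literal comma in the first name field.
# Missing first names are treated as "no comma" (na=False).
n_backwards = len(first_names_only.loc[
    ~first_names_only["FirstName"].str.contains('@', na=False) &
    first_names_only["FirstName"].str.contains(',', regex=False, na=False)])

# Initials: a literal period in the first name, excluding email addresses
n_initials = len(names_df.loc[
    ~names_df["FirstName"].str.contains('@', na=False) &
    ~names_df["LastName"].str.contains('@', na=False) &
    names_df["FirstName"].str.contains('.', regex=False, na=False)])

# A person's own name plus a company name
n_company_plus_name = len(names_df.loc[names_df["Company"].notna() &
                                       names_df["FirstName"].notna()])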
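
One detail worth spelling out: `str.contains` treats its pattern as a regular expression by default, and in a regex an unescaped `.` matches any character. The sketch below, on made-up names, shows the difference between a literal and a regex search, and why email addresses have to be filtered out before counting periods.

```python
import pandas as pd

# Made-up first-name values, including a missing one
first_names = pd.Series(["Jones, Bob", "Bob F.", "Bob", "bob@example.com", None])

# Literal comma: only the "Last, First" entry matches
has_comma = first_names.str.contains(",", regex=False, na=False)

# Literal period: matches the initial, but also the email address
has_period_literal = first_names.str.contains(".", regex=False, na=False)

# Regex "." matches any character, so every non-missing value matches
has_period_regex = first_names.str.contains(".", regex=True, na=False)

print(has_comma.tolist())
print(has_period_literal.tolist())
print(has_period_regex.tolist())
```

Passing `regex=False` sidesteps escaping altogether, and `na=False` makes missing values count as "no match" rather than leaving gaps that need filling afterwards.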

So now I had a rough idea of the frequencies with which different scenarios occurred in my own contacts. It was time to start generating fake ones!
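
Turning raw counts into frequencies is just a division by the total number of contacts. Here is a minimal sketch; the function name `to_frequencies` is hypothetical, and the counts and total are placeholder numbers, not the real figures from my phone.

```python
def to_frequencies(counts, total):
    """Convert a dict of raw counts into fractions of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {label: count / total for label, count in counts.items()}

# Placeholder numbers for illustration only (not real results)
toy_counts = {
    "first_name_only": 30,
    "last_name_only": 10,
    "company_only": 20,
    "email_only": 5,
}
toy_frequencies = to_frequencies(toy_counts, total=200)

for label, fraction in toy_frequencies.items():
    print(f"{label}: {fraction:.1%}")
```

Because some of the categories can overlap (a contact with an initial can also have a company name, for example), the fractions are not guaranteed to add up to 1.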

I’ll tackle that in the next installment, which will get you one step closer to the twist ending you will never see coming!

Complete code for this project is available at https://github.com/stevendegennaro/datasciencefilmmaker/tree/main/contacts_generator
