In Search of a Fantasy Football Name

Brian Benbenek
Published in The Startup
6 min read · Sep 14, 2020


Looking for a new fantasy football team name? I extracted over 500,000 real fantasy football team names to help you find your next one! (Shameless plug: FantasyNameSearch.com)

Background

I am the type of person who likes to build things. I am not the typical creative type, nor am I the most witty or punny person you’ll meet. So in a sense, this project was a match made in heaven: I created an app to fake my way into clever team names. I’ve never been more ashamed and proud of myself at the same time.

Gone are the days of needing to search through countless blog posts to find a new team name. Through my app, users are able to search through actual fantasy team names based on players on their own fantasy team or other keywords. For example, I have Todd Gurley and Saquon Barkley on my fantasy team. Searching for “Gurley” or “Saquon” returns team names such as “A Gurley Has No Name” and “Obi Saquon Kenobi” (among other more vulgar results), along with thousands more to choose from. I created FantasyNameSearch.com to make it easy for like-minded, wit-deprived people such as myself to find their next team name.

That’s enough soapboxing; let’s get to the meat of the discussion, shall we?

Data Extraction

The data was extracted from the Yahoo Fantasy Sports API using Python. If you’ve ever tried to use it, you’ll know it’s not the most well-documented API, but in the last few years there has been a significant increase in third-party utilities that make it easier to connect and extract data. Python API wrappers such as YFPY have helped make it more accessible and also help clean up the messy JSON response objects.
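As an illustration, querying a league through YFPY can look something like the sketch below. The constructor arguments have changed across YFPY versions and the league ID is a placeholder, so treat this as a rough shape rather than copy-paste-ready code:

from yfpy.query import YahooFantasySportsQuery

# Point YFPY at a directory holding your Yahoo API credentials;
# '123456' is a placeholder league ID (an assumption, not a real league)
query = YahooFantasySportsQuery('./auth', '123456', game_code='nfl')

# YFPY returns cleaned Python objects instead of Yahoo's raw nested JSON
for team in query.get_league_teams():
    print(team.name)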

As a disclaimer, I am not affiliated with Yahoo. Their terms of service require that all data extracted from their API be acknowledged as such. Carry on!

To get started with minimal effort, the following code will get you a response from the API. You’ll need to sign up for a Yahoo developer account, create an app, and generate access tokens before executing this code.

from yahoo_oauth import OAuth2
import json

# Load the access/secret API tokens generated for your Yahoo app
oauth = OAuth2(None, None, from_file='./auth/yahoo_api_creds.json')
if not oauth.token_is_valid():
    oauth.refresh_access_token()

game_key = '390'      # 2019 game key
league_id = '123456'  # whatever league you want to access
url = 'https://fantasysports.yahooapis.com/fantasy/v2/league/' + game_key + '.l.' + league_id + '/'  # or other resource URL
response = oauth.session.get(url, params={'format': 'json'})
raw_response = response.json()
# Do what you want with the raw JSON

Building upon this foundation, we are now able to access all public league data and any private league data for leagues we are members of. I’m not saying you should, but if you wanted to, you could loop through every possible League ID and extract all the data that is available. That’s what I did, at least. It took damn near five weeks to go through a million-plus League IDs one by one, but boy was it satisfying when it finished.
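The article doesn’t show the crawl itself, but a stripped-down version of that loop might look like this (the ID range, status check, and sleep interval are my assumptions, not the author’s exact setup):

import time

public_leagues = []
for league_num in range(1, 1_000_001):  # assumed ID range; adjust as needed
    league_id = str(league_num)
    url = 'https://fantasysports.yahooapis.com/fantasy/v2/league/' + game_key + '.l.' + league_id + '/'
    response = oauth.session.get(url, params={'format': 'json'})
    if response.status_code == 200:  # private or nonexistent leagues come back as errors
        public_leagues.append(response.json())
    time.sleep(0.1)  # be polite to the API between requests

At one request every tenth of a second, a million IDs is already more than a day of wall-clock time before retries and rate-limit backoffs, which is how a crawl like this stretches into weeks.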

Data Preparation

Now that we have a list of all the public leagues, we can reuse the same script above to extract League and Team names simply by changing the URL input. The Fantasy Sports Yahoo Developer Guide provides a lot of examples of the types of information you can extract with different URLs and combinations of elements. The examples are written in PHP, but the URLs are the same regardless of programming language.
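For example, appending the teams sub-resource to the league URL returns each team in a league, names included. This is a sketch based on the URL patterns in Yahoo’s developer guide; the exact path to each name inside the response is deeply nested, so verify it against a live response:

# Teams (and their names) for one league via the /teams sub-resource
teams_url = ('https://fantasysports.yahooapis.com/fantasy/v2/league/'
             + game_key + '.l.' + league_id + '/teams')
response = oauth.session.get(teams_url, params={'format': 'json'})
raw = response.json()

# Everything nests under 'fantasy_content'; inspecting the pretty-printed
# JSON is the quickest way to find the team name fields
print(json.dumps(raw['fantasy_content'], indent=2))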

Once we have a list of the names from each league, we can clean and normalize them to make searching easier and faster. First up is removing unwanted characters from the name strings. Surprisingly (or maybe not), there are a LOT of team names with emojis. Although useful in their own right, in our context we’re not interested in them. We’re also not interested in special characters that could break our code, like “\” or “*”, or any others that could interfere with normalizing the names. “❤️Josh Allen’s Shorts!!!❤️” and “Josh Allens Shorts😍” should really be the same name, but all the extra characters make it hard for an algorithm to determine that. (Is Josh Allen elite? Yes.)

import emoji
import re

def clean_text(word):
    # One-off replacement for a capitalized apostrophe-S, applied before
    # the apostrophes themselves are stripped out below
    word = re.sub(r"'S", r"'s", word)
    chars_to_remove = ['*', '+', '.', '!', '@', '#', '$', '%', "’", '^', "'",
                       '(', ')', '{', '}', ';', '~', '=', '|', '?', '[', ']', '¿']
    rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
    word = re.sub(rx, '', word)  # remove the list of chars defined above
    # Drop any token containing an emoji (emoji.UNICODE_EMOJI exists in
    # emoji < 2.0; newer versions expose emoji.EMOJI_DATA instead)
    split_word = [token for token in word.split()
                  if not any(e in token for e in emoji.UNICODE_EMOJI)]
    word = ' '.join(split_word)
    return word.strip()

Next we want to remove some bad words and limit the searchable names. I removed any racial slurs or homophobic language that I could, because who really needs to perpetuate that type of ignorance in 2020? I also removed any names shorter than four characters, because the search algorithm splits the data into trigrams (substrings of three characters), and 1–3 character strings cause lengthy searches due to the flood of trivial 100% matches they produce. The cleansing code is below:

from tqdm.notebook import tqdm  # for nice progress bars in Jupyter Notebook

def format_words(list_x):
    master_list = []
    bad_words = ['bad_word_1', 'bad_word_2']  # curses, slurs, and other bad words
    for word in tqdm(list_x):
        if any(bad_word in word for bad_word in bad_words):
            continue  # skip names containing bad words
        if len(word) > 3:  # don't add names of 3 or fewer characters
            master_list.append(word)
    # A set, while useful, is not ordered; dict.fromkeys removes
    # duplicates while preserving order
    return list(dict.fromkeys(master_list))

Name Search

Determining the best way to search for a name brought me some real grief. Fuzzy matching is not an easy thing to do, especially on a large corpus of textual data. I first started with the FuzzyWuzzy library. It uses Levenshtein distance to calculate the “closeness” of one string to another. FuzzyWuzzy does a great job matching on a variety of different methods, including partial string matches and exact phrases, but the downside is that it’s slow. On 500,000 names, the average search time was 25–35 seconds. Imagine if your preferred search engine took that long; you’d throw your device out the window. I know I’ll never be able to rival the speed of Google searches, but I could do better than 25 seconds at least!
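For reference, a FuzzyWuzzy search over the name list looks something like this minimal sketch (the scorer choice and result limit here are my assumptions):

from fuzzywuzzy import fuzz, process

team_data = ['A Gurley Has No Name', 'Obi Saquon Kenobi', 'Josh Allens Shorts']

# Levenshtein-based scoring; partial_ratio rewards substring-style matches
for name, score in process.extract('Gurley', team_data,
                                   scorer=fuzz.partial_ratio, limit=10):
    print(name, score)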

Enter rapidfuzz. Because FuzzyWuzzy is under the GPL-2.0 license, with the limitations that entails (I won’t pretend to know the intricacies), rapidfuzz was developed to provide the same functionality under the MIT license. It also implements faster C++ code to reduce compute time. Thanks to that speed increase, my searches dropped to an average of 8 seconds, which in my use case is much more manageable.

To limit the input length and character set, I first implemented a regex check to verify that the search term contains between 3 and 20 alphanumeric characters (white space included). The code I used is below:

import re
from rapidfuzz import process as rapid_process

search_word = "Thielen"

def bad_char_search(strg, search=re.compile(r'^[A-Za-z0-9 ]{3,20}$').search):
    # True only if the term is 3-20 characters of letters, digits, and spaces
    return bool(search(strg))

def generate(search_name):
    # team_data is the cleaned list of names from the steps above.
    # iterExtract yields matches lazily as they are found (it is named
    # process.extract_iter in newer rapidfuzz releases)
    for name in rapid_process.iterExtract(search_name, team_data, score_cutoff=75):
        # encoding certain characters gets funky when displaying on HTML
        yield name[0].encode('utf-16', 'surrogatepass').decode('utf-16')

if bad_char_search(search_word):
    for name in generate(search_word):
        print(name)

I actually developed the website to stream the search results in real time, which is why I created a Python generator and use a for loop to print the results. Constructing it this way allowed me to start loading results into the HTML tables immediately, without needing to wait for the entire search to run. By the time the search completes, the user already has plenty of names to sift through.
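The article doesn’t name the web framework, but as an illustration, here is how streaming a generator’s output to the browser can look in Flask (my assumption, not necessarily the author’s actual stack):

from flask import Flask, Response, stream_with_context

app = Flask(__name__)

@app.route('/search/<term>')
def search(term):
    if not bad_char_search(term):
        return 'Invalid search term', 400
    # Each yielded name is flushed to the client as it is produced, so the
    # HTML table can start filling in before the search finishes
    rows = (name + '\n' for name in generate(term))
    return Response(stream_with_context(rows), mimetype='text/plain')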

Coming soon: In-depth name analysis, more years of data, and more sports! Stay tuned.
