Human and Coyote Interactions: A Data Science View

Using tweet based sentiment analysis to measure the attitude towards coyote spottings

Isaac Lo
10 min read · Feb 3, 2022
California coyote

Introduction

Currently, there is a large information gap in human-coyote conflict reports in California. In a state with over 40 million people, only a few thousand official reports are received each year. A majority of these reports describe negative encounters (e.g., seeing a coyote in your backyard or a pet being attacked). If positive and neutral encounters are occurring, they are rarely reported. In partnership with the California Department of Fish and Wildlife, I explore the hypothesis that positive and neutral encounters do occur and are simply not reported. Using tweets sourced from the state of California, I intend to uncover the overall sentiment of human-coyote encounters in California.

Data Source

The data used for this project comes directly from Twitter. Because the platform is so often used to share personal experiences, Twitter is a great source of unfiltered reports of wildlife sightings.

Data Acquisition

Data acquisition pipeline

1. Create a Developer Account

You will need an API key and secret, plus access tokens, to make requests to the Twitter API. Furthermore, the Twitter API only accepts HTTP requests, so to increase efficiency and scalability I used Tweepy, a Python library that makes these HTTP requests for me. Tweepy includes a set of methods and classes that represent Twitter's API endpoints and handles implementation details behind the scenes, making the Twitter API easy and convenient to use.
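
As a rough sketch, authenticating with Tweepy looks something like the following; the credential values are placeholders for the keys from a developer account dashboard.

import tweepy

# Placeholder credentials from a Twitter developer account (hypothetical values)
API_KEY = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

# OAuth 1.0a user authentication handled by Tweepy
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)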

2. Get Geographical Specifics

Since the goal is to query tweets and map them to a specific location, latitude, longitude, and radius are specified via Tweepy’s API.search_tweets() function when pulling the Twitter data. For example, for all tweets within a two-mile radius of San Francisco’s Golden Gate Park, you would use ‘37.7694,-122.4862,2mi’.

2-mile radius around Golden Gate Park

3. Querying the API

The Twitter API returns a collection of tweets matching the parameters of the given query. Each tweet within this collection is returned in JSON format and includes the tweet’s contents, hashtags, user information, and the tweet’s metadata.

The following is an example of the tweet-level data that I found to be relevant to the project. Given that the API returns nested JSON data, I reformatted the data so it could be more easily used for exploratory analysis as well as modeling.

Features gathered from tweets

One thing to note is that the standard Twitter developer account will only pull tweets from the past seven days. Therefore, I set up the queries to run every couple of days and later filtered out duplicate tweets. This allowed for a larger data set than I would have been able to create with a single data pull.

Here is how I used Tweepy to obtain the tweets.
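
A minimal sketch of the pull, assuming the authenticated `api` object from above; the helper name fetch_tweets and the exact fields kept are illustrative rather than the original code.

import tweepy

# Illustrative helper: pull tweets matching a query around a given point.
# Assumes `api` is the authenticated tweepy.API object created earlier.
def fetch_tweets(api, query, geocode, max_tweets=1000):
    tweets = []
    # tweepy.Cursor paginates over API.search_tweets (Twitter API v1.1)
    for status in tweepy.Cursor(
        api.search_tweets,
        q=query,
        geocode=geocode,        # "lat,long,radius", e.g. "34.0522,-118.2437,50mi"
        tweet_mode="extended",  # return full, untruncated tweet text
        count=100,              # tweets per page
    ).items(max_tweets):
        tweets.append({
            "id": status.id,
            "created_at": status.created_at,
            "text": status.full_text,
            "user": status.user.screen_name,
            "coordinates": status.coordinates,
        })
    return tweets

# Example: a 50-mile radius around Los Angeles, one of the hotspots described below
la_tweets = fetch_tweets(api, "coyotes OR coyote -is:retweet", "34.0522,-118.2437,50mi")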

Initial Query

The California Department of Fish and Wildlife identified three coyote hotspot cities: San Francisco, Los Angeles, and San Diego. These became the initial targets, so I queried a 50-mile radius around each of these locations.

Coyote hotspots

To get a feel for the data available, my initial query conducted a broad search with retweets omitted to reduce the possibility of duplicate tweets.

query = 'coyotes OR coyote -is:retweet'

Initial Query Results

My initial query returned only 422 tweets.

Many of these 422 tweets were random, irrelevant, or ambiguous. Returned tweets often included references to school mascots and unspecified events:

  • “Hey Coyotes! Next week we do not have school on Thursday, November 11th. Please plan accordingly.”
  • “Coyote revolution has begun btw”

Luckily, tweets relevant to human-coyote interactions were also present:

  • “Saw a group of about five coyotes today walking down the street, definitely hanging out.”
  • “There was a coyote in my backyard this morning. Why?”

Isolating Relevant Tweets

To build a data set that can be used to perform sentiment analysis on human-coyote interaction tweets, irrelevant tweets must be filtered out of the query results. This can be achieved in one of two ways:

1. Optimize the query used to pull data from the Twitter API or

2. Create a tweet classification model

I decided to go with the latter. Given that there was very little data to train a text classification model, I built a few simple models, first a naive Bayes and then support vector machines.

Tweet Classification

To begin, I first manually labeled the returned tweets as relevant for human-wildlife interactions, or not relevant. Of the 422 tweets from the initial query, 105 were labeled as relevant.

I split my data into:

  • Training set (75%) — Train and select candidate models via cross-validation
  • Test set (25%) — Assess the final model

Due to the small sample size of 422 tweets, splitting the data into separate train, validation, and test sets would drastically reduce the number of observations the models are trained and tested on. This raises the possibility that a model’s performance depends on how the data happens to be split. To mitigate this, I used a single train/test split and relied on cross-validation within the training data for hyperparameter tuning and final model selection. The final model was then evaluated on the test set.
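
A sketch of the split, assuming the labeled tweets sit in a pandas data frame with hypothetical columns `text` and `relevant`:

from sklearn.model_selection import train_test_split

# `df` is assumed to hold the labeled tweets, with hypothetical columns
# `text` (raw tweet) and `relevant` (manual 0/1 label).
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["relevant"],
    test_size=0.25,            # 75% train / 25% test, as described above
    stratify=df["relevant"],   # keep the relevant/irrelevant ratio similar in both sets
    random_state=42,
)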

Tweet Preprocessing

In order to be fed into a model, the tweets must first be processed. To do so I used spaCy, an NLP library that tokenizes and normalizes text.

Tweets are notoriously noisy. They’re filled with special characters, digits, and pieces of text that can interfere with tweet classification. Noisy text also impedes other preprocessing steps like lemmatization and normalization, so the tweets need to be cleaned first. I began by removing URLs and user mentions, which are unique to each tweet and therefore add little to no value. Next, I substituted special characters with their associated words, for example replacing “@” with “at” and “&” with “and”. Lastly, I removed the remaining special characters, digit tokens, and punctuation.

The next preprocessing step was text normalization. I expanded contractions such as “they’re” to “they are”, lemmatized all tokens, and removed #s in front of hashtags.

Here is how I went about creating my preprocessor.
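
The version below is a simplified sketch along those lines rather than the exact original, assuming spaCy’s small English pipeline (en_core_web_sm) and a small hand-rolled contraction map.

import re
import spacy

# Small English pipeline; the parser and NER are not needed for lemmatization
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

# A few common contractions; a fuller mapping (or a dedicated library) could be swapped in
CONTRACTIONS = {
    "they're": "they are", "we're": "we are", "it's": "it is",
    "don't": "do not", "can't": "cannot", "i'm": "i am",
}

def preprocess(tweet):
    text = tweet
    text = re.sub(r"http\S+|www\.\S+", " ", text)            # remove URLs
    text = re.sub(r"@\w+", " ", text)                         # remove user mentions
    text = text.replace("&", " and ").replace("@", " at ")    # special characters to words
    text = re.sub(r"#(\w+)", r"\1", text)                     # strip '#' but keep hashtag text
    for short, full in CONTRACTIONS.items():                  # expand contractions
        text = re.sub(short, full, text, flags=re.IGNORECASE)
    text = re.sub(r"[^A-Za-z\s]", " ", text)                  # drop leftover special chars, digits, punctuation
    doc = nlp(text)                                           # tokenize and lemmatize with spaCy
    return " ".join(token.lemma_ for token in doc if not token.is_space)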

You might have noticed that I didn’t lowercase the tokens or remove stopwords in my preprocessor. These are hyperparameters I will tweak in the CountVectorizer and TfidfVectorizer later.

Converting Tweets to Numerical Values

For this project, I used word counts and the TF-IDF scores as the numeric values passed into the models.

Using the word counts consists of converting the tweets into an n×m matrix of token counts, where n is the number of tweets and m is the number of unique words across all tweets.

TF-IDF or term frequency-inverse document frequency is a metric used to evaluate how relevant a word is to a document in the context of a collection of documents. In the case of tweets, it measures the relevancy of each word in a specific tweet relative to the whole collection of tweets. You can read more about how TF-IDF scores are calculated here.

To calculate the word counts and TF-IDF I used the sklearn CountVectorizer and TfidfVectorizer.
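
In sketch form, after applying the preprocessor from the previous step to the training tweets:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Clean the training tweets with the preprocessor sketched earlier
X_train_clean = X_train.apply(preprocess)

# Word counts: an n x m matrix of token counts (n tweets, m unique tokens)
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(X_train_clean)

# TF-IDF: same shape, but counts are weighted down for terms common across all tweets
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(X_train_clean)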

Candidate Models

The models used for this project were Multinomial Naive Bayes and Support Vector Machines. Each model was trained on tweets converted to both word counts and TF-IDF scores:

  • CountVectorizer -> MultinomialNB
  • TfidfVectorizer -> MultinomialNB
  • CountVectorizer -> SVC (support vector classification)
  • TfidfVectorizer -> SVC

The hyperparameters tuned for the vectorizers were lowercasing, stopword removal, the minimum document frequency (the minimum number of tweets a word must appear in), and the n-gram range.

The SVC hyperparameters tuned were the kernel type, the regularization parameter, and the class weight (since there are more irrelevant tweets than relevant ones).

For MultinomialNB, only the Laplace smoothing parameter was adjusted.

I did a three-fold grid search with log loss scoring to find the best hyperparameters. The code below is for CountVectorizer -> MultinomialNB and TfidfVectorizer -> SVC.
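
As a sketch of the first of those setups (CountVectorizer -> MultinomialNB), with an illustrative hyperparameter grid rather than the exact values searched; log loss scoring needs probability estimates, so the SVC variants would additionally be fit with probability=True.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# CountVectorizer -> MultinomialNB pipeline
nb_pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# Illustrative grid covering the hyperparameters described above
param_grid = {
    "vec__lowercase": [True, False],
    "vec__stop_words": [None, "english"],
    "vec__min_df": [1, 2, 3],                # minimum number of tweets a token must appear in
    "vec__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 0.5, 1.0],           # Laplace smoothing
}

# 3-fold grid search scored on log loss (sklearn maximizes, so the metric is negated)
grid = GridSearchCV(nb_pipeline, param_grid, cv=3, scoring="neg_log_loss")
grid.fit(X_train_clean, y_train)
print(grid.best_params_, -grid.best_score_)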

3-fold CV for hyperparameter selection

Model Selection

After the optimal hyperparameters were found, I performed ten-fold cross-validation for each model to retrieve their loss metrics.
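
A sketch of that step, assuming a hypothetical `candidates` dictionary mapping a label to each pipeline refit with its best hyperparameters:

import numpy as np
from sklearn.model_selection import cross_val_score

# `candidates` is assumed to look like
# {"CountVectorizer -> MultinomialNB": nb_best, "CountVectorizer -> SVC": svc_best, ...}
for name, pipeline in candidates.items():
    scores = cross_val_score(pipeline, X_train_clean, y_train, cv=10, scoring="neg_log_loss")
    print(f"{name}: log loss = {-np.mean(scores):.3f}")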

10-fold CV for model selection

The results of each candidate model are shown below when optimizing for log loss.

  • CountVectorizer -> MultinomialNB: log loss = 0.457
  • TfidfVectorizer -> MultinomialNB: log loss = 0.524
  • CountVectorizer -> SVC: log loss = 0.426
  • TfidfVectorizer -> SVC: log loss = 0.509

CountVectorizer -> SVC had the lowest log loss and thus was chosen as the final model. This model produced a log loss of 0.412 on the test set.

Implementing Final Model

With the final model, new tweets can be passed in and classified as relevant or not.

Here is an example of the classifier on a random sample of unseen tweets.
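
The sketch below shows the general shape of that step with made-up tweets; `final_model` is assumed to be the tuned CountVectorizer -> SVC pipeline, fit with probability=True so class probabilities are available.

# Hypothetical unseen tweets
new_tweets = [
    "Just saw a coyote trotting across the golf course this morning",
    "Go Coyotes! Big game this Friday night",
]

# Clean the tweets the same way as the training data, then classify
cleaned = [preprocess(t) for t in new_tweets]
labels = final_model.predict(cleaned)
probs = final_model.predict_proba(cleaned)

for tweet, label, prob in zip(new_tweets, labels, probs):
    print(f"relevant={label} (p={prob.max():.2f}): {tweet}")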

Tweet Classification Example

Sentiment Analysis

With the relevant tweets isolated, the next step was to perform sentiment analysis. Rather than training a sentiment model from scratch, I used the pretrained vaderSentiment model.

The vaderSentiment model is attuned to social media text and is therefore able to give valence scores to emojis, acronyms (lol, ty, gn), emoticons such as :-) and :/, and more.

Metrics

  • Positive Score: Ratio of words that are classified as positive
  • Neutral Score: Ratio of words that are classified as neutral
  • Negative Score: Ratio of words that are classified as negative
  • Compound Score: Sum of the valence scores of each word in a tweet, normalized to be between -1 (most extreme negative) and +1 (most extreme positive)
  • Sentiment Classification: Positive: compound score ≥ 0.05, Neutral: -0.05 < compound score < 0.05, Negative: compound score ≤ -0.05

Passing a tweet through the vaderSentiment model returns the metrics above. The two most important metrics are:

1. Compound score

2. Sentiment classification

These two metrics give the overall sentiment of the tweet which is needed to test the larger hypothesis.

Here is how I implemented vaderSentiment.
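
The sketch below captures the general shape of that step; `relevant_df` is a hypothetical data frame holding the tweets classified as relevant, with a `text` column.

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def score_tweet(text):
    # polarity_scores returns {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    scores = analyzer.polarity_scores(text)
    # Classify the tweet from its compound score using the thresholds above
    if scores["compound"] >= 0.05:
        scores["sentiment"] = "positive"
    elif scores["compound"] <= -0.05:
        scores["sentiment"] = "negative"
    else:
        scores["sentiment"] = "neutral"
    return scores

# Score every relevant tweet and join the metrics back onto the data frame
sentiment = relevant_df["text"].apply(score_tweet).apply(pd.Series)
relevant_df = pd.concat([relevant_df, sentiment], axis=1)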

vaderSentiment implementation

The dictionary returned is joined to its respective tweet in the data frame.

Dataframe with sentiment metrics

Analysis

Sentiment Classification

Sentiment class frequency

Looking at the sentiment class counts, most were positive with negative and neutral not too far behind.

Taking a closer look at the data, I noticed that vaderSentiment seemed to have trouble picking up things like sarcasm. For example, the tweet “Love to hear a pack of coyotes go absolutely buck wild like 2 minutes before we're going to take the dog for a walk” was given a positive classification, but it is most likely a sarcastic statement that in fact reflects a negative sentiment.

Thus, I would be hesitant to conclude that the most common human-coyote interactions are “positive”.

Average Compound Score

  • Average compound sentiment score: 0.054
  • Standard deviation: 0.49

Looking at the average compound score, we again come to the conclusion that the most common human-coyote interactions are positive (compound score ≥ 0.05), but only by 0.004. However, the standard deviation tells us that there is extremely high variance: one standard deviation in either direction completely changes the sign of the average compound score.

Average compound score over time

Looking at how the average score varies over time also shows us that there is high variance.

Similar to sentiment class analysis, I am not comfortable stating the average compound score is positive.

Conclusion

The overall sentiment of tweets relating to human-coyote interactions is marginally positive. This can mainly be attributed to chance, shortcomings in the sentiment model, and potential misclassification stemming from the small amount of data I had to work with.

Over the data collection period, roughly 200 relevant tweets were gathered. More data will need to be collected over time to provide a more stable and conclusive answer regarding the overall sentiment of human-coyote interactions.

Challenges

The data used for this project was collected from October 2021 to early December 2021. This posed two problems:

First, I had a small amount of data to work with, which causes high variability and ultimately reduces the power of my study. Increasing the number of tweets will help reduce my sampling error and provide more valid results.

Second, data from such a short window may not be representative of tweets throughout the whole year. The goal of this project is to uncover the overall sentiment of human-coyote interactions, but sentiment may vary during different months of the year. For example, coyotes begin having pups in the springtime, so mothers may be more aggressive in protecting their young.

Collecting data over the course of the whole year will paint a more complete illustration of the sentiment.

Next Steps

The current next steps are to automate the data collection and scale the project to the entire state of California.

The automation of data collection will be achieved by streaming data from the Twitter API. This can be done by creating a pipeline that processes each streamed tweet, labels it (relevance and sentiment class), and stores it in a corpus in the cloud, such as an S3 bucket.
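
As a very rough sketch of the processing step of such a pipeline; the bucket name, the tweet dictionary shape, and the reuse of final_model, preprocess, and score_tweet from the earlier sketches are all assumptions.

import json
import boto3

s3 = boto3.client("s3")
BUCKET = "coyote-tweet-corpus"   # hypothetical bucket name

def process_streamed_tweet(tweet):
    """Label a single streamed tweet and persist it to S3 (sketch)."""
    text = tweet["text"]
    # Relevance label from the tweet classifier, sentiment metrics from vaderSentiment
    tweet["relevant"] = bool(final_model.predict([preprocess(text)])[0])
    tweet["sentiment"] = score_tweet(text)
    # Store one JSON object per tweet, keyed by tweet id
    s3.put_object(
        Bucket=BUCKET,
        Key=f"tweets/{tweet['id']}.json",
        Body=json.dumps(tweet, default=str),
    )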

Scaling the project to the whole of California will first consist of querying secondary coyote hotspots like Redding, Sacramento, and Fresno. I can then create a 10 km by 10 km grid of California where each square is its own location and is tied to census data to determine its weight in the overall sentiment.

Thanks for reading!

Thank you for letting me share this project with you. I'm more than happy to receive any feedback you may have to offer! Feel free to take a look at the project GitHub.


Isaac Lo

MS in Data Science Graduate Student at the University of San Francisco