Exploring Public Venmo Transactions

By: Nick Barker

Jan 31, 2019

What if I told you everyone was able to see every payment you made? The transactions we make on a daily basis can be a direct line to our personal behaviors, habits, and preferences. The value of these insights goes far beyond simply knowing how much we spend. How often we make payments, who we pay, and what we pay for are only a few of the insights that, if stitched together, could highlight trends in our spending and preferences that we did not even realize existed. This paper will expose just that.

Venmo is a digital wallet platform that facilitates electronic peer-to-peer payments. This means that it makes it easier to pay Johnny back for the beer last week (or, unfortunately, for Johnny to send you a request for the money) or to pay your babysitter at the end of the night when you don’t have cash. Venmo users have to create an account on the platform with a username and password, either through their own email or an existing Facebook account. Users can then add other users as friends and build a social network similar to those on other social media platforms. This network of friends is used to create a news feed that displays the payments your friends are making to others, according to their account settings. For each transaction, users are required to include a message along with the transaction amount. The message is meant to explain the purpose of the transaction, but it is frequently a field that only hints at the real nature of the payment through inside jokes, emojis, and other innuendos.

The purpose of this analysis was to make sense of what seem to be nonsensical or random payments but are in fact a treasure trove of data. The first part explores the data and the different components of a Venmo transaction. The second part is an attempt to understand each transaction’s message in order to cluster like-transactions and understand what types of payments users are making on Venmo.

The Public Venmo API:

All Venmo transactions are public by default.

A user’s transactions are made private only once the user changes their account settings to hide them from the public. These public transactions can be obtained via Venmo’s public API here. Earlier reports mention that other arguments (“since” and “until”) within the API call could be used to retrieve historical transactions. However, after this was discovered and bots were created to exploit the feature, Venmo limited its public API call to return only the 50 most recent transactions. While accessing historical transactions no longer seems to be possible, being able to gather current transactions still provides a wealth of insight into the types of payments that Venmo users make every day. The data for this analysis was gathered over the span of several days around Halloween in 2018.

In addition, there are rate limits that restrict the number of times you can ping the Venmo API. For the purposes of this analysis, it was not critical that every transaction be captured in sequential order, so the script that gathered the data only pinged the servers every 30 seconds. (I assume this is well below their rate limit because I was able to let the script run for multiple days in a row.)
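The polling loop described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the analysis: `fetch_latest` is a hypothetical callable standing in for the actual API request, and the assumption that each transaction is a dictionary with a `payment_id` key is mine. Since consecutive polls can return overlapping transactions, the sketch keeps each payment only once.

```python
import time


def collect_transactions(fetch_latest, polls, interval_seconds=30, sleep=time.sleep):
    """Call `fetch_latest` `polls` times and keep each transaction once.

    `fetch_latest` is a hypothetical function that returns the most recent
    public transactions as a list of dicts with a "payment_id" key.
    """
    seen = {}
    for attempt in range(polls):
        for transaction in fetch_latest():
            # Overlapping batches are deduplicated on the payment id.
            seen.setdefault(transaction["payment_id"], transaction)
        if attempt < polls - 1:
            sleep(interval_seconds)  # wait between polls to stay under the rate limit
    return list(seen.values())


# Usage with a fake fetcher, so no network access or waiting is needed.
fake_batches = iter([
    [{"payment_id": 1}, {"payment_id": 2}],
    [{"payment_id": 2}, {"payment_id": 3}],
])
waits = []
collected = collect_transactions(lambda: next(fake_batches), polls=2, sleep=waits.append)
```

Passing the sleep function in as a parameter is only a convenience for trying the loop without actually waiting 30 seconds between polls.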

A Venmo Transaction:

Each Venmo transaction is structured the same way and contains the same type of data. The following fields are a selection of the available features that will be used as part of this analysis:

— — — — — — — — — — — — — — — — — — —

Payment_id: A unique ID created for each Venmo transaction

Actor: Contains the following fields that pertain to the account that initiated the transaction

Actor / username: The username of the user

Actor / is_business: Signifies whether it is a business account or a personal account

Actor / name: The full name of the user

Actor / first_name: The first name of the user

Actor / last_name: The last name of the user

Actor / date_created: The date on which the user’s account was created

Actor / id: The unique id of the user

Transaction: Contains the following fields that pertain to the account(s) that are the recipients of the transactions

**Depending on the type of the transaction, the recipient(s) can either be the recipient of a payment or a request. There can also be more than one recipient.**

Transaction / target / username: The username of the user

Transaction / is_business: Signifies whether it is a business account or a personal account

Transaction / name: The full name of the user

Transaction / first_name: The first name of the user

Transaction / last_name: The last name of the user

Transaction / date_created: The date on which the user’s account was created

Transaction / id: The unique id of the user

Created_time: The time and date of the transaction

Message: The field that is supposed to contain a description of the transaction

Type: Signifies whether the transaction was a payment or request

— — — — — — — — — — — — — — — — — — —

Each of the above fields was captured for every transaction that the API call returned. Personal information such as first and last name was only used to predict the gender of the user. For the purposes of the analysis, the transactions were collected and saved in a CSV file so that the data gathering script did not have to be rerun each time.
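One way to save nested transaction records to a CSV file is to flatten each record into a single row first. The sketch below assumes each transaction is a nested dictionary; the key names and the dotted column naming are illustrative assumptions, not the actual API response format, and transactions with several recipients would need extra handling.

```python
import csv
import io


def flatten(record, prefix=""):
    """Turn a nested dict into a flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def write_csv(records, handle):
    """Write flattened records to an open text file handle."""
    rows = [flatten(record) for record in records]
    fieldnames = sorted({name for row in rows for name in row})
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical record using a few of the field names listed above.
example = {
    "payment_id": 1,
    "actor": {"username": "example_user", "is_business": False},
    "message": "pizza",
    "type": "payment",
}
buffer = io.StringIO()  # stands in for open("transactions.csv", "w", newline="")
write_csv([example], buffer)
csv_lines = buffer.getvalue().splitlines()
```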

Processing Emojis:

An important characteristic of Venmo transactions is the emojis that are used to describe their purpose. Emojis are graphical symbols and pictures that users are able to include as part of the text in their messages. Users do this for multiple reasons, including to convey the meaning more creatively or to describe an illicit or intimate transaction discreetly. Since emojis are regular text and can be used within the message field, it is critical to be able to parse them out, because they are often the most meaningful parts of a message.

Emojis are images that can be included as regular text because they also have a text representation. The Unicode-text for each emoji can be found here. The text representation of emojis follows a consistent structure. Rather than trying to describe it, below is an example of the Unicode-text of similar emojis:

As seen in the structure above, emojis are composed of strings separated by the “\” symbol. For description purposes, we will call these strings “segments”. Similar emojis often share a subset of segments and differ on only one segment. In addition, some segments can be considered “descriptors” rather than objects. In the example below, all the emojis described are very different objects but share the characteristic of “medium_dark_skin_tone”, represented by the “U0001F3FE” segment at the end of the Unicode text. Compare this to the example above, in which all the emojis are different types of adult emojis.
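As an illustration of the segment structure (this is not the original figure), Python's built-in `unicode_escape` codec produces this kind of text representation. The thumbs-up emoji is my own choice of example, and the helper name `segments` is hypothetical.

```python
import re

# Matches "\UXXXXXXXX" and "\uXXXX" escapes and captures the part after the backslash.
SEGMENT_RE = re.compile(r"\\(U[0-9a-fA-F]{8}|u[0-9a-fA-F]{4})")


def segments(text):
    """Return the Unicode-text segments found in `text`, in order."""
    escaped = text.encode("unicode_escape").decode("ascii")
    return [segment.upper() for segment in SEGMENT_RE.findall(escaped)]


thumbs_up = "\U0001F44D"
thumbs_up_medium_dark = "\U0001F44D\U0001F3FE"  # same object plus a skin-tone descriptor

plain = segments(thumbs_up)
toned = segments(thumbs_up_medium_dark)
```

Here the two emojis share their first segment, and the second one only adds the “U0001F3FE” descriptor segment.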

Given the structure of the text representation of the emojis, a tree was constructed in order to organize the emojis. The emoji-tree can be defined by the following characteristics:

- There is one root node for the entire tree

- Each node in the tree, except the root, corresponds to a segment

- The first segment of each emoji is a child of the root node

- Each subsequent segment in an emoji’s textual representation is a child of the node for the segment that precedes it

- The number of levels an emoji spans in the tree equals its number of segments plus one, due to the root node

- The English interpretation of a given emoji was stored as the “value” of the node that represented the last segment in its Unicode text
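The characteristics above describe what is commonly called a trie. A minimal sketch using nested dictionaries could look like the following; the node layout and the function names are assumptions made for illustration, not the original implementation.

```python
def new_node():
    """A node holds an optional English name and its child segments."""
    return {"value": None, "children": {}}


def insert(root, emoji_segments, name):
    """Add one emoji to the tree, creating a node per segment as needed."""
    node = root
    for segment in emoji_segments:
        node = node["children"].setdefault(segment, new_node())
    node["value"] = name  # the name lives on the node of the last segment


root = new_node()
insert(root, ["U0001F44D"], "thumbs_up")
insert(root, ["U0001F44D", "U0001F3FE"], "thumbs_up_medium_dark_skin_tone")
insert(root, ["U0001F355"], "pizza")
```

Both thumbs-up variants share the node for their first segment, so the skin-tone variant only adds one child beneath it.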

Visual representation of the emoji tree:

Parsing Emojis:

The tree structure coincides with the modularity of the emoji’s structure and thus facilitates the parsing of emojis within a body of text. The first step in the process was extracting the Unicode segments from the rest of the message. This was done using a simple pattern recognition script that identified the Unicode segments based on their syntax. Once the segments were extracted, the tree could be traversed segment by segment until the next segment was not a child of the current node. This represented the end of one emoji and the beginning of another. In this event, the current value of the node was deemed to be the emoji used and the process was repeated from the root node until there were no more segments.
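The parsing procedure can be sketched as below, under a few assumptions: the tiny hard-coded tree stands in for the full emoji tree, the regular expression is my guess at a “simple pattern” for the segments, and a message containing a literal backslash followed by “U” and eight hex digits would be misread by this simplified version.

```python
import re

SEGMENT_RE = re.compile(r"\\(U[0-9a-fA-F]{8}|u[0-9a-fA-F]{4})")

# Tiny hypothetical tree: pizza, thumbs up, and thumbs up with a skin tone.
TREE = {"value": None, "children": {
    "U0001F355": {"value": "pizza", "children": {}},
    "U0001F44D": {"value": "thumbs_up", "children": {
        "U0001F3FE": {"value": "thumbs_up_medium_dark_skin_tone", "children": {}},
    }},
}}


def extract_segments(message):
    """Pull the Unicode segments out of the rest of the message."""
    escaped = message.encode("unicode_escape").decode("ascii")
    return [segment.upper() for segment in SEGMENT_RE.findall(escaped)]


def parse_emojis(message, root=TREE):
    """Walk the tree segment by segment and return the emoji names found."""
    found = []
    node = root
    for segment in extract_segments(message):
        child = node["children"].get(segment)
        if child is None:
            # The current emoji has ended: record it and restart from the root.
            if node is not root and node["value"] is not None:
                found.append(node["value"])
            node = root
            child = node["children"].get(segment)
            if child is None:
                continue  # segment is not in the tree at all, skip it
        node = child
    if node is not root and node["value"] is not None:
        found.append(node["value"])  # record the last emoji in the message
    return found


names = parse_emojis("pizza night \U0001F355\U0001F44D\U0001F3FE\U0001F44D")
```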

Exploratory Data Analysis:

Gender Classification of Users:

Given that Venmo only provides the first and last name of the users, we wanted to predict the gender of the users to determine whether there were any differences in how males engaged with the platform vs. females. To do so, we used a corpus of English names with their corresponding genders. Some assumptions were made regarding the typical gender of a name, and these were accounted for by manually changing the mapping. A list of these names can be seen below. Obviously, this does not allow us to predict the gender for every name, but for the purposes of this analysis, it was sufficient.
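A lookup of this kind might be sketched as follows. The tiny name table and the manual override are made up for illustration and are not the corpus or mapping used here. I am also assuming that “Neutral” means a name listed under both genders and “Misc.” means a name missing from the corpus.

```python
from collections import Counter

# Hypothetical stand-in for a corpus of English names with genders.
NAME_GENDERS = {
    "james": {"male"},
    "mary": {"female"},
    "taylor": {"male", "female"},
    "alex": {"male", "female"},
}
# Hypothetical manual mapping for a name the corpus lists under both genders.
MANUAL_OVERRIDES = {"alex": "male"}


def classify(first_name, overrides=None):
    """Return "male", "female", "neutral", or "misc" for a first name."""
    key = first_name.strip().lower()
    if overrides and key in overrides:
        return overrides[key]
    genders = NAME_GENDERS.get(key)
    if not genders:
        return "misc"  # name not found in the corpus
    if len(genders) > 1:
        return "neutral"  # name listed under both genders
    return next(iter(genders))


def distribution(first_names, overrides=None):
    """Percentage of names falling into each label."""
    counts = Counter(classify(name, overrides) for name in first_names)
    total = sum(counts.values())
    return {label: round(100 * count / total, 2) for label, count in counts.items()}


names = ["James", "Mary", "Taylor", "Alex", "Zyx"]
before = distribution(names)
after = distribution(names, overrides=MANUAL_OVERRIDES)
```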

Initial gender distribution of the user that initiated the transaction before manual mapping:

Male = 28.81%

Female = 37.57%

Neutral = 20.65%

Misc. = 12.96%

New gender distribution after manual mapping of below names to genders:

Male = 34.31%

Female = 43.41%

Neutral = 9.32%

Misc. = 12.96%

While it cannot be said for sure due to names that cannot be mapped to a gender or unisex names, the results above suggest that more transactions seem to originate from female users than male. This conclusion could be used to understand who engages with the Venmo platform and segment user behavior based on gender.

Transaction Frequency of Users:

An analysis of the frequency of transactions for unique users can also highlight how deeply Venmo is rooted in an individual user’s daily life.

Transaction Frequency based on the full name of the user:

As seen in the above charts, most users made only a single Venmo transaction during the time period in which the data was collected. However, some users made 10 or more transactions, which suggests that Venmo is highly integrated into their daily habits. One person even made 44 transactions!
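A frequency table like this can be computed with two counting passes: first the transactions per user, then the number of users at each transaction count. The sketch below uses made-up names and assumes, as in the chart title, that a user is identified by full name.

```python
from collections import Counter


def frequency_distribution(actor_names):
    """Map each transaction count to the number of users with that count."""
    per_user = Counter(actor_names)  # transactions per full name
    return Counter(per_user.values())  # users per transaction count


# Hypothetical actor names, one entry per transaction.
actors = ["Ann Lee", "Bo Kim", "Ann Lee", "Cy Roe", "Ann Lee"]
users_by_count = frequency_distribution(actors)
```

Two people who share a full name would be counted as one user in this sketch.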

Emoji Analysis:

The analysis of the emojis used in the messages of Venmo transactions can shed additional light on the purpose of the transaction. The top 25 emojis used in Venmo transactions can be found below:

The tokenization of the messages that contain emojis enables a better interpretation of the meanings of emojis. The primary example is the relationship between the “Money with Wings” emoji and the “House” emoji to represent a payment of rent. Without the other emoji, it would have been difficult to interpret either emoji independently as a rent payment. Additional significant emojis included those that were representative of transactions made for food, drinks, utilities, Halloween, and sports (expected to be regarding betting and fantasy football).
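One way to surface relationships such as the one between “Money with Wings” and “House” is to count how often pairs of emojis appear in the same message. The sketch below assumes each message has already been reduced to a list of emoji names. The sample data is invented, and this is not necessarily how the relationship was identified in the analysis.

```python
from collections import Counter
from itertools import combinations


def emoji_pair_counts(messages):
    """Count how many messages contain each unordered pair of emojis."""
    pairs = Counter()
    for emoji_names in messages:
        unique_names = sorted(set(emoji_names))  # ignore repeats within one message
        pairs.update(combinations(unique_names, 2))
    return pairs


# Hypothetical messages, each reduced to the emoji names it contains.
sample = [
    ["money_with_wings", "house"],
    ["house", "money_with_wings", "pizza"],
    ["pizza"],
]
pair_counts = emoji_pair_counts(sample)
top_pair = pair_counts.most_common(1)[0]
```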

Transaction Clustering:

A deeper analysis of the transactions was needed in order to extract meaningful insights that speak to users’ engagement with the Venmo platform. To accomplish this, each transaction’s message was assumed to be representative of the purpose of the transaction. While this is a logical assumption, it does not always hold true, as some transaction messages are completely irrelevant or random. This seems to stem from the social aspect of Venmo, which allows users to interact with their friends.

In order to cluster like-messages, features needed to be extracted from the text. This was done by creating a vector for each message using the Doc2Vec model. Doc2Vec builds upon the Word2Vec model and quantifies an entire document, or message, by building a vector based on the words in the body of text. A more detailed explanation of Doc2Vec can be found here.

Preprocessing:

In order to properly tokenize the messages, “stop” words had to be removed to ensure that only meaningful words were included in the vectors. The corpus of English “stop” words from the NLTK library (a Python library for natural language processing) was used as the initial set of “stop” words. However, slang and informal versions of the original words also had to be added to the corpus.

Additions to the corpus of “stop” words:

Once the tokens were created and lemmatized, the ones that represented the Unicode text for emojis were replaced with their English word representations. Colons were used to distinguish an emoji representation from an English word that was written as regular text. For example, “pizza” is interpreted as the English word for the type of food while “:pizza:” represents the emoji for a pizza slice. This is useful when interpreting the results of the most similar “words” for a given entity.
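The preprocessing steps can be sketched in simplified form as follows. The stop-word sets are small stand-ins (not the NLTK list or the actual slang additions), lemmatization is left out, and the emoji lookup handles only single-segment emojis rather than using the emoji tree.

```python
import re

# Tiny stand-ins for the NLTK stop words and the slang additions.
BASE_STOP_WORDS = {"the", "a", "for", "and", "to", "of"}
SLANG_STOP_WORDS = {"u", "ur", "thx", "ya"}  # hypothetical additions
STOP_WORDS = BASE_STOP_WORDS | SLANG_STOP_WORDS

# Hypothetical lookup from a single-character emoji to its English name.
EMOJI_NAMES = {"\U0001F355": "pizza", "\U0001F3E0": "house"}

# A token is either a run of letters/apostrophes or one symbol character.
TOKEN_RE = re.compile(r"[a-z']+|[^\w\s]")


def preprocess(message):
    """Lowercase, tokenize, drop stop words, and rename emojis as :name:."""
    tokens = []
    for token in TOKEN_RE.findall(message.lower()):
        if token in EMOJI_NAMES:
            tokens.append(f":{EMOJI_NAMES[token]}:")  # colons mark an emoji
        elif token[0].isalpha() and token not in STOP_WORDS:
            tokens.append(token)
        # punctuation and unknown symbols are dropped
    return tokens


tokens = preprocess("Thx for the pizza \U0001F355!")
```

The word and the emoji remain distinct tokens (“pizza” and “:pizza:”), which is the distinction the colon convention is meant to preserve.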

K-means Model for Clustering:

The elbow method was used to determine the optimal number of clusters in our data. Because the distortion value of the model keeps decreasing as clusters are added, the goal of the elbow method is to find the smallest number of clusters beyond which adding more yields only a marginal reduction in distortion. To create the graph, the distortion value was plotted against the number of clusters. As seen in the graph below, the optimal number of clusters for this dataset was 5.
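To make the elbow computation concrete, here is a small pure-Python sketch of k-means that returns the distortion (sum of squared distances to the nearest centroid) for a range of cluster counts. It is a toy stand-in run on made-up 2-D points, not the model or the message vectors used in this analysis.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_distortion(points, k, iterations=20, seed=0):
    """Run a basic k-means and return the resulting distortion."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Made-up points forming three tight groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
          (9.0, 0.0), (9.1, 0.2), (9.2, 0.1)]
distortions = {k: kmeans_distortion(points, k) for k in range(1, 7)}
```

Plotting `distortions` against `k` gives the elbow graph; the elbow is the point after which the curve flattens out.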

Interpreting the Clusters:

Once we knew the optimal number of clusters in which to organize our transactions, a Latent Dirichlet Allocation (LDA) model was used to derive a “topic” or “structure” for each cluster. The vectors that came out of the Doc2Vec model were used as the input to the LDA model and the number of topics was determined by the optimal number of clusters from the Elbow graph. It is important to keep in mind that the Venmo transaction messages can almost be considered “random” bodies of text (both topically and structurally) when attempting to classify them into a limited number of topics. The results of the LDA model are below and we can see that most of them coincide with meanings of the top emojis used:

Most-Similar Results:

In order to further understand the meaning of each cluster/topic, we identified the top 10 words that were most similar to each cluster topic. The topics derived from the LDA model above were used to determine which key words were to be used in the “Most Similar Word” analysis below (among other words):
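A “most similar” lookup of this kind is typically based on cosine similarity between vectors. The sketch below shows the idea with made-up two-dimensional vectors; real vectors would come from the trained model, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_n=10):
    """Return the `top_n` other words ranked by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine_similarity(target, vector))
              for other, vector in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up vectors for illustration only.
toy_vectors = {
    "rent": (1.0, 0.1),
    ":house:": (0.9, 0.2),
    "pizza": (0.1, 1.0),
    ":pizza:": (0.2, 0.9),
}
ranking = most_similar("rent", toy_vectors, top_n=3)
```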

Closing Remarks:

The topics of discussion in this analysis highlighted many underlying trends in how users engage with the Venmo platform and concluded that the most common types of transactions were for rent/utilities, food/drink, and Halloween!

Furthermore, we were able to derive insights into the composition of the user base, highlight trends in the frequency and types of transactions between users, and identify the top 25 emojis used in the transaction messages.

However, the inability to access historical transactions is this work’s main limitation. As is the case for many studies, not being able to apply this analysis on a wider scope of data prevents us from understanding how the trends we identified changed over time. We are only able to comment on what kind of users engage with Venmo and how for a certain period of time. Understanding how these trends change over time could help explain how user preferences and habits evolve and help set the direction for the platform moving forward. Overall, the experience of this analysis showcased that valuable insights can be derived from user interaction data, but reinforced the importance of access to historical data to understand the evolution of user engagement.
