Targeted Advertising–The Power of Data Science in E-commerce Wealth Building

Published in Data Science Student Society @ UC San Diego

Jan 5, 2023

In the e-commerce industry, no-code web design tools allow sellers to advertise their products to a wider audience. Online sales surged during the COVID-19 pandemic as lockdowns pushed more people toward online shopping. With the emergence of low-cost targeted advertising, small sellers could, for the first time, compete for some of this new traffic. Targeted advertising is the practice of presenting ads to a narrow demographic believed to be the most likely to engage with them. There are four main types of targeted advertising:

Contextual advertising: Targets media whose audience's interests align with what the product offers (e.g., placing ads for a football-related product on ESPN). This is considered the most basic form of targeted advertising.
Geotargeting: Uses the geolocation of potential customers to target them.
Social media targeting: Uses data about users' social influences and connections as the basis for marketing.
Behavioral: Analyzes user behavior data to predict what users are likely to engage with.
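Geotargeting can be illustrated as a simple radius filter. The idea is to keep only users whose last known location falls within some distance of a store or delivery area. The sketch below is illustrative only. The haversine great-circle formula, the `users_in_radius` helper, the radius, and the coordinates are all assumptions, not how any particular ad platform implements geotargeting.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def users_in_radius(users, center, radius_km):
    """Hypothetical helper: IDs of users whose (lat, lon) lies within radius_km of center."""
    lat0, lon0 = center
    return [uid for uid, (lat, lon) in users.items()
            if haversine_km(lat, lon, lat0, lon0) <= radius_km]


# Hypothetical user locations and store location.
users = {
    "user_a": (32.88, -117.23),  # about 1 km from the store
    "user_b": (32.72, -117.16),  # about 19 km from the store
    "user_c": (34.05, -118.24),  # Los Angeles, far outside the radius
}
store = (32.88, -117.24)
print(users_in_radius(users, store, radius_km=25))
```

A production system would usually query a spatial index rather than scan every user. The distance test itself is the same.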

Modern targeted advertising is driven by artificial intelligence (AI) technologies that learn from customers' vast digital footprints. Specifically, data science and AI let sellers dropship items in bulk to small, precisely targeted audiences through advertising tools, most notably Facebook Ads Manager and Google Marketing Platform.

How Targeted Advertising Works

Both the Facebook and Google marketing platforms use in-house algorithms to set ad prices through a fully automated process called ad bidding.

Ad bidding is the practice of matching ads with the best potential customers. One of the three matching metrics is the cost per acquisition (CPA): the amount spent on advertising per conversion (i.e., per purchase of the product). Optimizing for CPA tells the algorithm to minimize the cost of each sale by reaching the right audience, rather than simply finding a general demographic whose interests match the product. The algorithm is biased toward demographics with demonstrated interests and toward historically successful advertisers. As a result, ad prices can differ significantly between well-known advertisers and less popular ones. The ad price is also adjusted in an auction in which sellers bid against one another. An advertiser's bid weight is higher or lower depending on three factors: the number of clicks per impression (the ad's click-through rate, or CTR), the expected ad relevance, and the landing page experience.
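The sketch below makes these terms concrete. It computes CTR and CPA for two hypothetical advertisers and ranks them in a toy auction. The ranking rule is an assumption for illustration only: each advertiser's maximum bid is multiplied by a bid weight, taken here as the product of CTR, relevance, and landing page score. The real Facebook and Google formulas are proprietary. The `Advertiser` class, its fields, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Advertiser:
    """Hypothetical advertiser record for a toy ad auction."""
    name: str
    max_bid: float       # most the advertiser will pay per click, in dollars
    impressions: int     # times the ad was shown
    clicks: int          # times the ad was clicked
    spend: float         # total ad spend, in dollars
    conversions: int     # purchases attributed to the ad
    relevance: float     # expected ad relevance, in [0, 1]
    landing_page: float  # landing page experience score, in [0, 1]

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def cpa(self) -> float:
        """Cost per acquisition: ad spend per conversion."""
        return self.spend / self.conversions if self.conversions else float("inf")


def bid_weight(ad: Advertiser) -> float:
    # Assumed rule: multiply the three quality signals together.
    return ad.ctr * ad.relevance * ad.landing_page


def rank_auction(ads: list[Advertiser]) -> list[Advertiser]:
    # Assumed ad rank: max bid scaled by bid weight; highest rank is shown first.
    return sorted(ads, key=lambda ad: ad.max_bid * bid_weight(ad), reverse=True)


a = Advertiser("A", max_bid=2.00, impressions=1000, clicks=20, spend=100.0,
               conversions=4, relevance=0.5, landing_page=0.5)
b = Advertiser("B", max_bid=1.00, impressions=1000, clicks=50, spend=100.0,
               conversions=10, relevance=0.9, landing_page=0.8)

print(f"A: CTR={a.ctr:.1%}, CPA=${a.cpa:.2f}")
print(f"B: CTR={b.ctr:.1%}, CPA=${b.cpa:.2f}")
print("Auction winner:", rank_auction([a, b])[0].name)
```

In this toy setup, advertiser B wins despite bidding half as much. Its higher click-through rate, relevance, and landing page score give it a larger bid weight.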

Ad relevance (how relevant an ad is to the user's search query) is another key factor in ad bidding, and it can be estimated using natural language processing (NLP). Word2Vec is a popular NLP technique that represents words as vectors, so that words used in similar contexts end up close together. It can be applied to search queries to find terms similar to the ones a user typed. There are two main ways to train Word2Vec: continuous bag-of-words (CBOW) and skip-gram. For simplicity, this article covers only the latter. In skip-gram training, each word in a large body of text is used to predict the words that appear near it. Each word starts out as a vector. Whenever the model fails to predict a neighboring word, the vectors are adjusted to make that prediction more likely. Over many passes, this produces a vector space in which similar words are clustered together. The similarity of two words can then be measured by the cosine similarity of their vectors (the cosine of the angle between them), which is closer to 1 when the words are closely related.

Figure 1: CBOW and Skip-gram training. Source: https://towardsdatascience.com/word2vec-explained-49c52b4ccb71
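The skip-gram loop described above can be sketched in a few dozen lines of NumPy. This is a minimal, unoptimized sketch that assumes a full softmax over a tiny toy vocabulary. Production Word2Vec implementations add tricks such as negative sampling. The corpus, vector size, window, and learning rate here are arbitrary choices.

```python
import numpy as np


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def train_skipgram(corpus, dim=8, window=2, lr=0.1, epochs=200, seed=0):
    """Train toy skip-gram word vectors with a full softmax and plain SGD."""
    vocab = sorted({word for sentence in corpus for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # word vectors
    w_out = rng.normal(scale=0.1, size=(dim, len(vocab)))  # output (context) weights
    pairs = [(index[c], index[o])
             for sentence in corpus for c, o in skipgram_pairs(sentence, window)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for c, o in pairs:
            h = w_in[c].copy()                # vector of the center word
            scores = h @ w_out                # one score per vocabulary word
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()              # softmax: predicted context distribution
            total += -np.log(probs[o])        # cost is high if the true context is unlikely
            grad = probs.copy()
            grad[o] -= 1.0                    # gradient of the cost w.r.t. the scores
            grad_h = w_out @ grad             # gradient w.r.t. the center word's vector
            w_out -= lr * np.outer(h, grad)   # adjust output weights
            w_in[c] -= lr * grad_h            # adjust the center word's vector
        losses.append(total / len(pairs))
    return vocab, w_in, losses


corpus = [
    ["cheap", "running", "shoes"],
    ["buy", "running", "shoes"],
    ["cheap", "football", "boots"],
    ["buy", "football", "boots"],
]
vocab, vectors, losses = train_skipgram(corpus)
print(f"average cost: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

The average cost falls over the epochs as the vectors are adjusted to predict neighboring words. The rows of `vectors` are the learned word vectors.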

Once Word2Vec is trained, relevance scores can be assigned between each key term in the search query and the keywords associated with the ad.
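One simple way to turn word vectors into a relevance score works in three steps. First, compute the cosine similarity between each query term and each ad keyword. Next, keep each query term's best match. Finally, average those best matches. Both the aggregation rule and the three-dimensional toy vectors are assumptions for illustration. Real Word2Vec vectors typically have hundreds of dimensions and would come from a trained model.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means they point the same way."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def relevance_score(query_terms, ad_keywords, vectors):
    """Average, over query terms, of each term's best cosine match among ad keywords.

    Words missing from `vectors` are skipped (an assumed policy).
    """
    keywords = [k for k in ad_keywords if k in vectors]
    terms = [q for q in query_terms if q in vectors]
    if not keywords or not terms:
        return 0.0
    best = [max(cosine_similarity(vectors[q], vectors[k]) for k in keywords) for q in terms]
    return sum(best) / len(best)


# Hypothetical toy vectors standing in for trained Word2Vec output.
vectors = {
    "running": np.array([0.7, 0.3, 0.0]),
    "sneakers": np.array([0.9, 0.1, 0.0]),
    "shoes": np.array([0.8, 0.2, 0.1]),
    "laptop": np.array([0.0, 0.1, 0.9]),
}
query = ["running", "sneakers"]
print("shoe ad:", round(relevance_score(query, ["shoes"], vectors), 3))
print("laptop ad:", round(relevance_score(query, ["laptop"], vectors), 3))
```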

Landing page experience, in turn, can be estimated from factors that reflect how users interact with the page after clicking the ad. These factors include time on page, close-out rate, and on-page actions, each given some weight.
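As a sketch of such a weighting, the function below combines the three signals into a score between 0 and 1. The 60-second target time, the weights, and the function name are hypothetical. The article does not specify how the factors are weighed.

```python
def landing_page_score(avg_seconds_on_page, close_out_rate, action_rate,
                       target_seconds=60.0, weights=(0.4, 0.3, 0.3)):
    """Combine post-click signals into a score in [0, 1] (hypothetical weighting).

    avg_seconds_on_page: average time visitors spend on the page
    close_out_rate: fraction of visitors who close the page right away
    action_rate: fraction of visitors who take an on-page action (e.g. add to cart)
    """
    for name, rate in (("close_out_rate", close_out_rate), ("action_rate", action_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    time_score = min(avg_seconds_on_page / target_seconds, 1.0)  # capped at the target time
    stay_score = 1.0 - close_out_rate                             # fewer close-outs is better
    w_time, w_stay, w_action = weights
    return w_time * time_score + w_stay * stay_score + w_action * action_rate


print(landing_page_score(90, close_out_rate=0.2, action_rate=0.3))   # engaged visitors
print(landing_page_score(5, close_out_rate=0.9, action_rate=0.01))   # visitors leave quickly
```

Because the weights sum to 1 and each signal lies in [0, 1], the score stays in [0, 1]. This makes it easy to combine with other quality signals.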

Once the ad bidding algorithm has determined the bid weight, it sets the maximum bid price. Because customer interest drives purchases, the maximum bid price is the product of the expected selling profit and the predicted probability that the user clicks on the ad.
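In symbols, max bid = expected profit × P(click). For example, a hypothetical product with $40 of expected profit and a 3% predicted click probability would justify a maximum bid of $1.20. Here is a minimal sketch of that calculation; the function name is an assumption.

```python
def max_bid(expected_profit: float, click_probability: float) -> float:
    """Maximum bid = expected selling profit x predicted click probability."""
    if not 0.0 <= click_probability <= 1.0:
        raise ValueError("click_probability must be between 0 and 1")
    return expected_profit * click_probability


print(f"${max_bid(40.0, 0.03):.2f}")
```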

Neural Networks: An Advancement of Targeted Advertising

An effective targeting algorithm needs to account for all types of targeted advertising. Neural networks come into play where algorithms like Word2Vec fall short. Word2Vec works well for contextual targeting, but the ultimate goal is to reach a wide range of individuals who do not explicitly search for these products. Here, the neural network is trained with supervised learning, a type of machine learning that needs labeled outputs to train the model. In other words, the algorithm needs to know whether each ad generated a click. When a prediction is wrong, the network's weights and biases are adjusted using gradient descent. In short, gradient descent fine-tunes the model by attaching a cost function to the output layer, which measures the prediction error. From calculus, the negative gradient of the cost (the vector of its partial derivatives with respect to every weight and bias) points in the direction in which the cost decreases fastest. The weights and biases are repeatedly adjusted in that direction until the cost “descends” into a local minimum.
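Gradient descent is easiest to see on the smallest possible network: a single sigmoid neuron (equivalent to logistic regression) that predicts whether an ad is clicked. It uses binary cross-entropy as the cost function. The two input features and the four labeled examples below are made up for illustration. The learning rate and step count are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(p, y):
    """Binary cross-entropy cost: large when confident predictions are wrong."""
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_click_neuron(X, y, lr=0.5, steps=1000):
    """Fit one sigmoid neuron to click labels with batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    costs = []
    for _ in range(steps):
        p = sigmoid(X @ w + b)            # predicted click probabilities
        costs.append(cross_entropy(p, y))
        error = p - y                     # d(cost)/d(z) for sigmoid + cross-entropy
        grad_w = X.T @ error / n_samples  # partial derivatives w.r.t. each weight
        grad_b = float(error.mean())      # partial derivative w.r.t. the bias
        w -= lr * grad_w                  # step along the negative gradient
        b -= lr * grad_b
    return w, b, costs


# Hypothetical features per user: [time on site, past purchases], both scaled to [0, 1].
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])  # 1 = the user clicked the ad
w, b, costs = train_click_neuron(X, y)
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}")
print("predicted clicks:", (sigmoid(X @ w + b) > 0.5).astype(int))
```

The cost starts at about 0.693, which is the cost of guessing 50% for every user. Each step moves the weights and bias against the gradient, so the recorded cost falls toward zero on this small, separable dataset. A deeper network is trained the same way, with backpropagation supplying the gradient for every layer.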

A neural network can take in many data points and use them to make predictions. In this case, user data such as geolocation, usage times, and browsing history are the model's inputs, held in the input layer. Each input node (i.e., a user data point) is multiplied by a weight. The weighted inputs are then summed, along with a bias, to form a linear combination for each node in the next layer. Each of these values passes through an activation function before being used by the following layer. This process of weighting and combining repeats from layer to layer. Given large enough datasets, additional “hidden” layers can be added so the model produces better outputs at the final nodes in the output layer. Here, the output indicates whether the ad is likely to be clicked.

Figure 2: Neural network. Source: https://towardsdatascience.com/applied-deep-learning-part-1-artificial-neural-networks-d7834f67a4f6
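The layer-by-layer computation described above, known as the forward pass, can be sketched with NumPy. Several details are assumptions: the four input features, the layer sizes, the ReLU activation on hidden layers, and the random untrained weights. In practice, the weights would be learned by gradient descent from labeled click data.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layers(sizes, seed=0):
    """Random (weights, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Feed inputs through each layer: linear combination, then activation."""
    a = x
    for i, (W, b) in enumerate(layers):
        z = a @ W + b  # weighted sum of the previous layer's nodes plus a bias
        is_output = i == len(layers) - 1
        a = sigmoid(z) if is_output else relu(z)  # sigmoid turns the output into a probability
    return a


# Hypothetical normalized features per user:
# [distance to store, hour of day, pages viewed, past purchases]
users = np.array([
    [0.1, 0.8, 0.6, 0.3],
    [0.9, 0.2, 0.1, 0.0],
])
layers = init_layers([4, 5, 3, 1])  # input layer, two hidden layers, output layer
print(forward(users, layers))       # one (untrained) click probability per user
```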

Conclusion

Initially, ads had to be targeted at a single, common point of interest. Word2Vec improved on this approach by providing a way to quantify the similarity of keywords. Neural networks took this a step further by targeting ads at the person rather than the search. A neural network can take in large amounts of data and predict how well an ad will perform. The downside is that it has to be trained for that specific ad, rather than relying on a pre-trained model such as Word2Vec. The upside is that it can find demographics invisible to the human eye and form connections that would be impossible with old-school conventional advertising.

Unfortunately, this ability comes at the cost of privacy. With abundant data available on most of the population, ad audiences can be targeted very precisely. This practice is already widespread in web advertising. However, privacy concerns grow as smart devices become more prevalent and one's digital footprint becomes inescapable, and all signs point to a future where every part of our lives is quantified.


Written by Liam Manatt