Building the future of knowledge sharing — A closer look at Lunyr’s advertising system
Time is moving forward and so is the progress on Lunyr’s components. Today we’ll take a closer look at the Lunyr advertising system.
The Lunyr ad system uses powerful technologies like IPFS and Deep Learning. Like BitTorrent, IPFS uses a Distributed Hash Table (DHT) to locate content across its decentralized storage network. Deep Learning is a branch of machine learning that uses neural networks with multiple layers (loosely inspired by the layered structure of the neocortex) to learn hierarchical representations of data. You can do lots of cool stuff with deep learning, from image recognition to simulating hallucinations to magically applying the art style of a painting to your photos. See this video for a presentation on some of the more philosophical aspects of neural networks.
So let’s get down to the design
What does it look like behind-the-scenes when advertisers participate?
Advertisers spend LUN to purchase impressions on pages that are matched with ads for relevance. They submit a quadruplet (A,K,B,G) where
- A is a textual ad
- K is a list of keywords, not appearing in A, that they’d like their ad to be associated with
- B is the maximum amount they’d bid, in LUN per 1000 impressions
- G is the total budget for ads in LUN
Keep in mind that LUN are divisible up to 18 decimal places, so you can send 0.123456789123456789 LUN if you want. Advertisers call the LUN Pool contract on the blockchain, providing the hash of the ad plus keywords along with their bid, and transfer their budget G of LUN to the LUN Pool. (Advertisers must purchase LUN to advertise on the Lunyr platform.) Advertisers also send (A,K,B,G) to the ad auction module, which cross-references the submission against the blockchain and then computes an ad rank.
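The on-chain commitment and the auction module’s cross-check could look roughly like the sketch below. It rests on several assumptions: LUN amounts are handled as integer base units of 10^-18 LUN, the commitment is SHA-256 over a JSON encoding of the ad and keywords (an Ethereum contract would more likely use Keccak-256), and the on-chain record is modeled as a plain dict. `AdSubmission`, `to_base_units`, and `matches_chain` are hypothetical names, not Lunyr’s actual code.

```python
import hashlib
import json
from dataclasses import dataclass
from decimal import Decimal, localcontext

LUN_DECIMALS = 18  # LUN is divisible to 18 decimal places


def to_base_units(amount_lun: str) -> int:
    """Convert a decimal LUN amount (given as a string) to integer base units."""
    with localcontext() as ctx:
        ctx.prec = 100  # plenty of digits, so the multiplication is exact
        units = Decimal(amount_lun) * (10 ** LUN_DECIMALS)
        if units != units.to_integral_value():
            raise ValueError("LUN amounts have at most 18 decimal places")
        return int(units)


@dataclass(frozen=True)
class AdSubmission:
    ad_text: str      # A: the textual ad
    keywords: tuple   # K: extra keywords not appearing in A
    max_bid: str      # B: maximum bid, LUN per 1000 impressions
    budget: str       # G: total budget in LUN

    def commitment(self) -> str:
        """Hash of ad + keywords sent to the LUN Pool contract (assumed encoding)."""
        payload = json.dumps([self.ad_text, list(self.keywords)]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def matches_chain(sub: AdSubmission, onchain: dict) -> bool:
    """Auction-side check that an off-chain (A, K, B, G) agrees with the on-chain record."""
    return (
        onchain.get("ad_hash") == sub.commitment()
        and onchain.get("bid") == to_base_units(sub.max_bid)
        and onchain.get("deposit") == to_base_units(sub.budget)
    )
```

Keeping amounts as integers sidesteps floating-point rounding, which matters when a value like 0.123456789123456789 LUN has to match the chain exactly.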
Using our word embedding model, we can map each document (a collection of words) to a vector, where the closer two document vectors are, the more semantically similar the documents are. This lets us define a function relevance(doc1, doc2) that returns a number indicating how relevant two documents are to each other. Applying this function to A concatenated with K (A | K) and each web page gives a relevance score, which we then combine with B to determine the rank of the ad.
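Here is a minimal sketch of relevance and ad rank, assuming cosine similarity as the relevance measure and assuming that "combine" means multiplying relevance by the bid B (common in search-ad auctions, but not stated in the post). `toy_embed` is a hypothetical bag-of-words stand-in for the trained word embedding model, used only so the example runs on plain text.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for the word embedding model: a bag-of-words count vector.

    A real system would use a trained model (e.g. doc2vec) instead.
    """
    return Counter(text.lower().split())


def cosine(v1, v2):
    """Cosine similarity of two sparse vectors; 1.0 means same direction."""
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm = math.sqrt(sum(x * x for x in v1.values())) * math.sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0


def relevance(doc1, doc2):
    """Relevance of two texts = cosine similarity of their (toy) vectors."""
    return cosine(toy_embed(doc1), toy_embed(doc2))


def ad_rank(ad_text, keywords, page_text, max_bid):
    """Hypothetical ad rank: relevance of (A | K) to the page, times the bid B.

    Multiplying is an assumption; the post only says the two are combined.
    """
    a_concat_k = ad_text + " " + " ".join(keywords)  # A | K
    return relevance(a_concat_k, page_text) * max_bid
```

With this combination rule, a relevant ad with a modest bid can outrank an irrelevant ad with a much larger bid.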
Impression price = (the amount your next-lower-ranked competitor pays / your quality score) + a small increment.
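This pricing rule resembles the generalized second-price auction used in search advertising. A minimal sketch, assuming that "the amount your next-lower-ranked competitor pays" means that competitor’s ad rank (bid × quality), and that `increment` and `reserve` are hypothetical parameters not defined in the post:

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    max_bid: float   # B, in LUN per 1000 impressions
    quality: float   # quality (relevance) score of the (ad, page) pair, must be > 0


def impression_prices(bids, increment=0.0001, reserve=0.0):
    """Return {advertiser: price per 1000 impressions} for ads ranked by bid * quality.

    Each ad pays just enough to keep its rank above the ad directly below it:
    (lower ad's rank / own quality) + increment, capped at its own max bid.
    """
    ranked = sorted(bids, key=lambda b: b.max_bid * b.quality, reverse=True)
    prices = {}
    for i, bid in enumerate(ranked):
        if i + 1 < len(ranked):
            lower = ranked[i + 1]
            price = lower.max_bid * lower.quality / bid.quality + increment
        else:
            price = reserve  # nobody below: pay the (hypothetical) reserve price
        # Never charge more than the advertiser's own maximum bid.
        prices[bid.advertiser] = min(price, bid.max_bid)
    return prices


prices = impression_prices([Bid("a", 4.0, 0.9), Bid("b", 5.0, 0.5), Bid("c", 2.0, 0.8)])
```

In this example b bids more than a but ranks second, and it pays more per 1000 impressions than a does, because its ad is less relevant to the page. Dividing by your own quality score is what rewards relevant ads with lower prices.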
How do we update content?
IPFS has a name service, IPNS, that is similar to DNS. It adds a layer of indirection between DNS and IPFS content, which lets us make a permanent DNS record pointing to our IPFS node’s ID:
app.lunyr.com -> ipfs.lunyr.com/ipns/QmdhR251bTBD6d9jNjneBtnVbd4aqbSStKsS336mbh9LMu
We then associate our node’s ID with the latest content, which we can update very easily:
ipfs name publish <new content hash>
When anyone resolves our node’s ID, IPFS automatically looks for the content hash published under that name with the largest sequence number (i.e. the latest).
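To make the sequence-number rule concrete, here is a toy, in-memory model of IPNS-style name resolution. It is an illustration only: real IPNS records are signed with the node’s key and carry validity information, and `ToyNameService` and `NameRecord` are hypothetical names, not part of any IPFS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """Toy stand-in for an IPNS record: a name (node ID) pointing at a content path."""
    name: str
    value: str      # e.g. "/ipfs/<content hash>"
    sequence: int   # increases with every publish


class ToyNameService:
    """Hypothetical model of IPNS-style resolution; no signatures, no expiry."""

    def __init__(self):
        self.records = []

    def publish(self, name, value):
        # Each new publish for a name gets a sequence number one higher than the last.
        seq = max((r.sequence for r in self.records if r.name == name), default=-1) + 1
        record = NameRecord(name, value, seq)
        self.records.append(record)
        return record

    def resolve(self, name):
        # Among all records seen for this name, the one with the largest sequence wins,
        # regardless of the order in which records arrived.
        candidates = [r for r in self.records if r.name == name]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda r: r.sequence).value
```

Because resolution depends only on the sequence number, a stale record that reaches a peer late cannot override a newer publish.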
What is word embedding and why do we want to use it?
The idea behind word embedding is that words used in similar contexts probably have similar meanings. So if we train a neural network to recognize when words are in context and when they are out of context, that network will end up encoding a lot of semantic information. The approach is flexible because the notion of context is really flexible: it simply specifies what the geometry of the media vector space *should* look like locally. For example, if we have known context matches (via peer review) that include images related to text, we could train another neural network to map images to vectors that are close to the word vectors of in-context words and far from the word vectors of out-of-context words. We can do this with any media. We can even do it across languages, by treating known pairs of synonymous words as being in context.
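One well-known way to train a network to tell in-context from out-of-context words is word2vec’s skip-gram with negative sampling. The pure-Python sketch below illustrates that technique on a toy corpus; it is not Lunyr’s model. It simplifies word2vec by drawing negative (out-of-context) words uniformly rather than from a frequency-weighted distribution, and `context_pairs` and `TinySkipGram` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def context_pairs(tokens, window=2):
    """All (center, context) pairs where context is within `window` positions of center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


class TinySkipGram:
    """Toy skip-gram model trained with negative sampling (illustrative only)."""

    def __init__(self, vocab, dim=8, seed=0):
        self.rng = random.Random(seed)
        self.vocab = sorted(set(vocab))
        # Separate word and context vectors, as in word2vec; small random init.
        self.w = {t: [self.rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for t in self.vocab}
        self.c = {t: [self.rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for t in self.vocab}

    def score(self, word, ctx):
        """Model's probability that `ctx` appears in the context of `word`."""
        return sigmoid(sum(a * b for a, b in zip(self.w[word], self.c[ctx])))

    def step(self, word, ctx, label, lr=0.1):
        """One gradient step on the logistic loss; label 1 = in context, 0 = out of context."""
        g = lr * (label - self.score(word, ctx))
        u, v = self.w[word], self.c[ctx]
        self.w[word] = [a + g * b for a, b in zip(u, v)]
        self.c[ctx] = [b + g * a for a, b in zip(u, v)]

    def train(self, tokens, epochs=20, negatives=2, window=2):
        for _ in range(epochs):
            for word, ctx in context_pairs(tokens, window):
                self.step(word, ctx, 1)  # observed pair: pull the vectors together
                for _ in range(negatives):
                    noise = self.rng.choice(self.vocab)  # random word, treated as out of context
                    if noise != ctx:
                        self.step(word, noise, 0)  # push the vectors apart
```

The same loss works for any media: swap the word vectors on one side for image (or other-language) vectors and supply the known in-context pairs.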
In a nutshell, this technique is both state-of-the-art and very flexible. It was developed at Google, marrying older Natural Language Processing (NLP) ideas with newer neural network ideas. Its document-level extension, doc2vec, has been shown to do very well at identifying duplicate questions in Q&A forums. That is essentially what we want to do: determine how similar two bodies of text are.
How do we use word embeddings?
We keep a database that stores, for each document, its IPFS hash, its previous hash (documents are edited), its latest vector, and the version of the model that produced that vector (the model may be retrained). We also keep an R-tree (or something like it), a structure that stores a large number of vectors in a hierarchy of bounding rectangles so that the nearest neighbors of a given vector can be found quickly.
To match an ad, we compute the vector corresponding to the text ad and use the R-tree to look up its N nearest neighbors. We then look up the document hashes for those vectors in the database, which gives us the N most relevant pages for that ad, sorted by distance from the ad vector. The quality of an (ad, document) pair is essentially how close together their vectors are in this vector space. This quality is then combined with the bid for that ad and the bids for other nearby ads to determine the ad rank. We store the word embedding model on IPFS so that anyone who wants to audit the process may do so, and we periodically recompute the document vectors to account for changing content.
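The lookup can be sketched as follows. This is a minimal sketch, assuming the database is a plain list of records; brute-force nearest-neighbor search stands in for the R-tree (same result, just slower on large collections), and `DocRecord`, `nearest_documents`, and `stale_records` are hypothetical names.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocRecord:
    ipfs_hash: str                 # hash of the current version
    previous_hash: Optional[str]   # hash of the previous version (documents are edited)
    vector: tuple                  # latest embedding vector
    model_version: str             # model that produced `vector`


def nearest_documents(ad_vector, records, n, model_version):
    """Return [(ipfs_hash, distance)] for the n documents closest to the ad, nearest first.

    Only vectors from the current model version are comparable, so older
    records are skipped until they are re-embedded.
    """
    current = [r for r in records if r.model_version == model_version]
    closest = heapq.nsmallest(n, current, key=lambda r: math.dist(ad_vector, r.vector))
    return [(r.ipfs_hash, math.dist(ad_vector, r.vector)) for r in closest]


def stale_records(records, model_version):
    """Hashes of documents whose vectors must be recomputed after retraining."""
    return [r.ipfs_hash for r in records if r.model_version != model_version]
```

Tracking the model version per vector is what makes periodic recomputation safe: vectors from two different models live in different spaces and should never be compared.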
How do the ads get served?
What is the Ad Repository?
What is the Ad Performance Module?
The ad performance module records impressions and clicks for every ad, so that advertisers can view the performance of their ads.
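A minimal sketch of such a tracker, assuming ads are identified by a string ID and that click-through rate is one of the figures advertisers would want to see. `AdPerformance` is a hypothetical name, and a real module would persist these counts rather than keep them in memory.

```python
from collections import defaultdict


class AdPerformance:
    """Hypothetical in-memory tracker of impressions and clicks per ad."""

    def __init__(self):
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record_impression(self, ad_id):
        self.impressions[ad_id] += 1

    def record_click(self, ad_id):
        self.clicks[ad_id] += 1

    def report(self, ad_id):
        """Summary an advertiser could view: counts plus click-through rate."""
        shown = self.impressions.get(ad_id, 0)
        clicked = self.clicks.get(ad_id, 0)
        return {"impressions": shown, "clicks": clicked,
                "ctr": clicked / shown if shown else 0.0}
```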
What is the Ad Auction?
This is the engine that determines ad rank. To do so, it interfaces with IPFS and can view the ads and bids that advertisers submit.
What is the LUN Pool?
The LUN Pool stores all the LUN that advertisers pay, along with newly created LUN. These tokens are distributed at the end of every pay period to Lunyr and the contributors in proportion to the CBN they earn.
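The pro-rata payout at the end of each pay period could be sketched as follows. This assumes balances are tracked in integer base units (10^-18 LUN) and CBN as integers. It also assumes that Lunyr’s own share is modeled as just another entry in the CBN table and that any indivisible remainder stays in the pool for the next period; the post specifies neither. `distribute_pool` is a hypothetical name.

```python
def distribute_pool(pool_base_units: int, cbn_earned: dict) -> tuple:
    """Split the pool's LUN (integer base units) in proportion to CBN earned.

    Returns (payouts, remainder). Integer division avoids rounding errors;
    the remainder is whatever cannot be split exactly.
    """
    total_cbn = sum(cbn_earned.values())
    if total_cbn == 0:
        return {}, pool_base_units  # nobody earned CBN this period
    payouts = {who: pool_base_units * cbn // total_cbn for who, cbn in cbn_earned.items()}
    remainder = pool_base_units - sum(payouts.values())
    return payouts, remainder
```

Multiplying before dividing keeps each share exact to the base unit, so the payouts plus the remainder always add back up to the pool.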
The overall system design will be revealed soon. Stay tuned for more details.