Modeling NFT Prices with Smoothed Isotonic Regression

Published in The BRR, May 6, 2022

A peek under the hood of BeyondRarity 2.0’s Pricing Analysis Engine

Before diving straight into the charts, we’re going to establish a few key points to make sure you get the full value out of this inside look:
1. The info used as the basis for scoring (spoiler alert: metadata).
2. How ranking is established using that metadata.
3. How price is correlated with and projected onto that ranking.

As of this writing, May 2022, typical daily NFT trading volume exceeds $200M USD, with over 100,000 NFTs transacted. This large and growing market, with a capitalization of more than $19 billion, contains thousands of collections.

Have you ever sat staring at a large NFT collection, overwhelmed by the abundance of images and metadata?

Perhaps you were debating which item to purchase or struggling to price an item you wished to sell. With BeyondRarity, we aim to assist you with these decisions through collection-wide price modeling and recognition of below-market-value items.

Collections and Metadata

NFTs may represent all sorts of different digital media, including images, videos, songs, certificates, and more. For the purposes of ranking and rarity, we are typically only concerned with generative NFT collections whose individual metadata differ across items in the collection.

This is how a generative NFT collection is produced:

One starts with a variety of image layers and/or components.
Then, image-generation software (such as the one described in Wall Street Dads’ Intelligent Design — A Technological Breakthrough In NFT Creation) randomly chooses layers and components to combine into a single image.
This is repeated thousands of times to generate thousands of random, unique images (check out A Match Made in the Metaverse).

NFTs allow for representation of not just the images alone, but also carry with them associated metadata. This metadata will be populated with information describing the layers and components that exist in the image.

For example, an image of an ape may have metadata describing the fur color, the type of clothes worn, the facial expression, etc. These different metadata are commonly referred to as “traits,” e.g. fur color trait, background color trait, etc.

Price differences within NFT collections

Traits and their values are determined by the NFT creator. For a generative collection, rather than combining traits purely at random, creators typically set a target frequency of occurrence for each trait value in the collection.

For example, perhaps an ape with gold fur is targeted to occur in only 5% of NFTs in the collection, while brown fur is targeted to occur in 50% of NFTs in the collection. In this example, gold fur is a rarer trait value compared to brown fur.

In the marketplace, NFTs with rarer and more unique trait values trade at a premium over NFTs with common and less unique trait values. However, as we’ll see below, some traits may carry more significance than others, and project communities may consider particular trait value combinations more valuable.

Generating a Single Metric to Encapsulate Rarity, Trait Weights, and Vibes

We will now shift our focus to building a mathematical model that can output a price for each NFT based on its traits and current market conditions. We begin by generating for each NFT the VibeRater Score, a single metric that encapsulates rarity, trait weights, and vibes. This metric enables us to rank each NFT in a collection, beginning with rank number 1, the best and most valuable individual NFT.

Let’s get a better understanding of this with a real example walkthrough.

Navigate to BeyondRarity.com

Navigate to BeyondRarity for an interactive application showcasing the VibeRater Score along with each trait, rarity, and vibes. Specifically, take a look at the Wall Street Dads collection token ranked #18 (click the settings gear in the upper right-hand corner).

Rarity

Take a look at the grid of 21 traits and their associated elements on the right side of the image above. Each of these 21 traits can take one of several possible elements. The number and percentage of NFTs that share the same element for a trait are given at the bottom of each trait box.

For instance, this particular NFT has no arm tats, which is the same as 2,549 other dads, or 85% of the collection. That particular trait is quite common.

However, the hair trait has a much rarer element, long tan, which appears in only 13 dads, or less than half a percent of the collection. Many different metrics can be used to represent rarity, and all of them are built on the probability of each trait element.
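To make the idea of a probability-based rarity metric concrete, here is a minimal sketch. It assumes one common choice, summing the information content (negative log probability) of each trait element, which is only one of the many possible metrics and not necessarily the one BeyondRarity uses. The function names and the toy collection are made up for illustration.

```python
import math
from collections import Counter

def trait_frequencies(collection):
    """Map each (trait, element) pair to the fraction of NFTs that have it."""
    counts = Counter()
    for nft in collection:
        for trait, element in nft.items():
            counts[(trait, element)] += 1
    total = len(collection)
    return {key: n / total for key, n in counts.items()}

def rarity_score(nft, freqs):
    """Sum of -log2(p) over the NFT's trait elements: rarer elements add more."""
    return sum(-math.log2(freqs[(trait, element)]) for trait, element in nft.items())

# Toy collection (made-up data, not the Wall Street Dads metadata).
collection = [
    {"hair": "short brown", "arm tats": "none"},
    {"hair": "short brown", "arm tats": "none"},
    {"hair": "short brown", "arm tats": "anchor"},
    {"hair": "long tan", "arm tats": "none"},
]
freqs = trait_frequencies(collection)
scores = [rarity_score(nft, freqs) for nft in collection]
```

In this toy data the NFT with the rare hair element scores higher than the ones with only common elements, which is the behavior any rarity metric of this kind should show.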

Trait Weights

As previously mentioned, rarity alone isn’t nearly enough to determine value. A very rare but uninteresting trait may add no value. Similarly, a rare but bad trait, such as low health for NFTs used in a game, would correlate with lower value. Traits themselves must be given importance by weighting them. So, how does one determine which traits are worth more than others?

While trait weights could possibly be learned from historical price data, the NFT creators themselves are an obvious choice for helping understand which traits are more important. Currently, the trait weights are gathered directly from the creators.

Vibes

Vibes are defined as two or more traits that have specific elements that “vibe” together in the same NFT. There can be any number of vibes within a collection, each composed of any number of traits. The NFT in the image above has all three trait elements of the Roaring Daddies Vibe.

Having a vibe provides a boost to the overall VibeRater Score. Possessing some, but not all, elements of a vibe still increases the score, as seen in the #19 ranked Wall Street Dad below, which has two partial vibes.

So, how do we know which trait elements combine together to form a vibe? While it is possible to use advanced clustering algorithms to automate the discovery of vibes, we currently have them provided directly from the NFT creators, just as with the trait weights.

VibeRater Score

Taking the rarity, weighting it by trait, and boosting it by vibes yields the VibeRater Score, our single metric used to rank all NFTs in a collection. The VibeRater Score provides an excellent single-number summary of each individual NFT’s ranking within a collection. It helps NFT enthusiasts quickly navigate these collections, which often contain thousands of items and countless combinations of elements.
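The exact formula behind the score is not given here, so the following is only a sketch of how rarity, trait weights, and vibes could be combined. It assumes a weighted sum of per-trait rarity multiplied by a vibe boost that grows with the share of a vibe's elements present. Every name, the `full_bonus` parameter, and the example vibe are hypothetical.

```python
import math

def vibe_boost(nft, vibes, full_bonus=0.5):
    """Return a multiplier >= 1; partial vibes earn a proportional share.

    `vibes` maps a vibe name to the {trait: element} pairs that define it.
    `full_bonus` is a hypothetical boost for holding every element of a vibe.
    """
    boost = 1.0
    for required in vibes.values():
        matched = sum(1 for t, e in required.items() if nft.get(t) == e)
        boost += full_bonus * matched / len(required)
    return boost

def combined_score(nft, freqs, weights, vibes):
    """Weighted rarity (sum of weight * -log2(p)) scaled by the vibe boost."""
    weighted_rarity = sum(
        weights.get(trait, 1.0) * -math.log2(freqs[(trait, element)])
        for trait, element in nft.items()
    )
    return weighted_rarity * vibe_boost(nft, vibes)

# Made-up frequencies, creator weights, and a two-element example vibe.
freqs = {
    ("hair", "long tan"): 0.25,
    ("hat", "fedora"): 0.5,
    ("suit", "pinstripe"): 0.5,
    ("suit", "plain"): 0.5,
}
weights = {"hair": 2.0, "hat": 1.0, "suit": 1.0}
vibes = {"example vibe": {"hat": "fedora", "suit": "pinstripe"}}

full = combined_score({"hair": "long tan", "hat": "fedora", "suit": "pinstripe"}, freqs, weights, vibes)
partial = combined_score({"hair": "long tan", "hat": "fedora", "suit": "plain"}, freqs, weights, vibes)
```

Both example NFTs have the same weighted rarity, so the difference between `full` and `partial` comes entirely from the vibe boost.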

Modeling Individual NFT Price using VibeRater Score

While having each individual NFT scored and ranked is great, it’d be even better to have a price projection for each one. In this section, we’ll build a complex mathematical model using the VibeRater Score and historical sales data to accurately price each individual NFT.

Whether you intend to model NFTs, baseball cards, homes, stocks, or any other item that can be bought and sold, you’ll need historical sales data. Thanks to the openness of the crypto markets, we can gather all historical sales data for each NFT collection.

Sparse sales data

It’s important to understand some of the dynamics of the NFT marketplace. Individual NFTs are sold rarely, even those from the most popular projects. All collections have NFTs that have never sold. For example, here we have some popular projects with the percentage of NFTs that have only been minted and never resold in the marketplace.

Bored Ape Yacht Club — 14%
Meebits — 59%
CyberKongz — 44%

Even for those NFTs that have been sold, it’s rare for there to be more than a handful of ownership changes. This sparsity of data makes for an interesting challenge.

The NFT marketplace resembles something close to a traditional housing market, where each house is unique and sold infrequently. A particular home’s current valuation is not based on its last sale price (which may have occurred decades ago), but on recently sold homes with similar features in the same area.

With NFTs, the VibeRater Score is a proxy for similarity. We can expect individual NFTs with similar VibeRater Scores to be valued approximately the same. This means that it is possible to develop a model that prices every NFT in a collection even though only a small fraction of them have sold.

Price Variation

Significant variation in pricing can occur within a single NFT collection. The plot below displays the distribution of sale prices as a box plot for each day of a recent 30-day period for the Bored Ape Yacht Club.

As you can see, individual NFTs can differ by 50% or more, even when sold on the same day.

Accounting for Price Variation with VibeRater Score

Much of this variation in price comes from NFT investors placing different values on different traits and vibes.

Below, the median price sold for both the top and bottom 20% of VibeRater Scores for the BAYC project is plotted. You can clearly see how important the VibeRater Score is to NFT collectors.

Ultimately, we would like to build a model that takes the VibeRater Score as input and outputs the price an NFT would be expected to fetch if it were sold right now. To better show the relationship between VibeRater Score and price, a scatterplot of all NFTs sold for the BAYC is plotted below.

Choosing the model

If you’ve taken statistics courses, then you’ve likely seen a scatterplot similar to the one above and know that a type of regression model is needed.

For this data, we chose isotonic regression, which fits a monotonic (non-decreasing or non-increasing) function to the data. Here we use the non-decreasing form: as the VibeRater Score increases, the price projection will either increase or remain the same. It will never decrease.

Here, we zoom in on the region where most of the points are concentrated and plot our modeled regression line.

Focusing on the fitted line, you’ll notice that it has several jumps and flat spots. It is not the smooth, easy-on-the-eye line that you may have come to expect from other statistical models such as linear regression.

This rigidity is a product of the algorithm used to enforce the monotonicity.
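The jumps and flat spots are easy to see in a small example. Below is a minimal sketch of the Pool Adjacent Violators approach, the standard way to compute a least-squares isotonic fit. We are assuming it here for illustration, and the prices are made up. Whenever a price is lower than the one before it, the offending neighbors are pooled and replaced by their average, which produces a flat step.

```python
def isotonic_fit(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y.

    `y` must already be ordered by increasing predictor (here, the score).
    """
    blocks = []  # each block is [sum_of_values, count]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # one flat step per block
    return fitted

prices = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]  # made-up sale prices, sorted by score
fit = isotonic_fit(prices)
```

In this example the out-of-order pairs (3.0, 2.0) and (4.0, 3.5) are each pooled into a single flat step, while the remaining points are left untouched.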

Smoothing the line

To better connect the points, a rolling average is applied across the entire fit.
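As a rough illustration of the smoothing step, here is a sketch of a centered rolling average. The window size and the way the window shrinks at the ends are assumptions made for this example, not the production settings.

```python
def rolling_average(values, window=3):
    """Centered moving average; the window shrinks near the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

step_fit = [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]  # a made-up step-shaped isotonic fit
smooth_fit = rolling_average(step_fit, window=3)
```

A useful property of this kind of average is that it keeps a non-decreasing input non-decreasing, so the smoothed line still respects the relationship the isotonic fit enforced.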

Why Isotonic Regression?

Isotonic regression is a relatively uncommon model, appropriate only when there is a known monotonic (here, positive) relationship between the predictor variable (VibeRater Score) and the output (Price Last Sold).

In our case, we consulted the NFT creators themselves and therefore have expert knowledge of trait weights and vibes and know that the VibeRater Score has a positive correlation with price. Isotonic regression helps maintain this known relationship.

Accounting for change in market conditions

Our model above was built over a 30-day period of data, but as all investors know, market conditions can change drastically over a period of this length.

Take a look at the median monthly sales price of BAYC over the last year.

A mechanism must be put in place that gives more weight to the most recent data. Again, if thousands of NFTs from one collection were sold every hour, then we could just look at the most recent sales to understand the price. As it is, we are dealing with limited data, where often less than 1% of the collection changes hands each day.

See the BAYC bar chart below, which shows that most days have fewer than 20 sales, or about 0.2% of the collection.

It’s clear that the most recent sales have the highest importance and should be given the most weight in the model, but what is the precise definition of recent?

And how much less should we weight a sale that occurred one month ago versus one that occurred one week ago?

These are all open questions that require more research. Our current solution uses an “S”-shaped curve generated by the sigmoid function.

As you can see in this particular version of the sigmoid function, sales within the last few days have the most weight. A precipitous fall in weight then occurs before slowing down again. NFTs sold past 30 days are given almost no weight. Parameters are available to change the shape and slope of this function for different weighting scenarios.
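A decreasing sigmoid of this kind can be sketched in a few lines. The `midpoint` and `steepness` parameters and their default values below are hypothetical stand-ins for the shape and slope parameters mentioned above, chosen only so that weights fall to nearly zero by 30 days.

```python
import math

def recency_weight(days_ago, midpoint=15.0, steepness=0.4):
    """Decreasing sigmoid: near 1 for fresh sales, near 0 for old ones.

    `midpoint` is the age (in days) at which the weight is 0.5, and
    `steepness` controls how sharply the weight falls around it.
    """
    return 1.0 / (1.0 + math.exp(steepness * (days_ago - midpoint)))

# Weights for sales of various ages, in days.
weights = {d: recency_weight(d) for d in (0, 7, 15, 23, 30)}
```

Lowering `midpoint` moves the drop-off closer to today, and raising `steepness` makes the fall more abrupt.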

The Final Model

Our final model is a smoothed isotonic regression in which each sale is weighted by the number of days since it occurred. We can see a slight upward shift in price for the weighted model, as there was an upward trend in pricing during the last two weeks.
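To show how sale weights change the fit, here is a sketch of a weighted version of the Pool Adjacent Violators approach, assumed here for illustration. Pooled steps become weighted averages, so an old, low-weight sale pulls its step far less than a recent one. The prices and weights are made up.

```python
def weighted_isotonic_fit(y, w):
    """Weighted Pool Adjacent Violators; y is ordered by increasing score.

    Every weight in `w` must be positive.
    """
    blocks = []  # each block: [sum of w*y, sum of w, number of points]
    for value, weight in zip(y, w):
        blocks.append([weight * value, weight, 1])
        # Merge backwards while the previous weighted mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            wy, ws, n = blocks.pop()
            blocks[-1][0] += wy
            blocks[-1][1] += ws
            blocks[-1][2] += n
    fitted = []
    for wy, ws, n in blocks:
        fitted.extend([wy / ws] * n)
    return fitted

prices = [10.0, 14.0, 12.0, 20.0]  # made-up prices, sorted by score
weights = [1.0, 1.0, 0.1, 1.0]     # the third sale is old, so it counts less
unweighted = weighted_isotonic_fit(prices, [1.0] * len(prices))
weighted = weighted_isotonic_fit(prices, weights)
```

In this example the old sale at 12.0 is the lower of the two pooled prices, so down-weighting it shifts that step upward relative to the unweighted fit, the same kind of shift described above.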

Using the Model

With this model, it’s possible to find undervalued NFTs for purchase. By perusing NFT marketplaces such as OpenSea, one can find currently listed items.

At the time this data was extracted, exactly one BAYC item (ranked near the bottom at #9582) was priced below the projection, by 1.5 ETH.
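The comparison itself is simple once a projection exists for every item. The sketch below assumes two hypothetical dictionaries, one of current asking prices and one of model projections, both keyed by token ID. The token IDs and prices are made up.

```python
def find_undervalued(listings, projected_price):
    """Return (token_id, discount) pairs for listings priced below projection.

    Results are sorted so the largest discount comes first.
    """
    deals = []
    for token_id, ask in listings.items():
        discount = projected_price[token_id] - ask
        if discount > 0:
            deals.append((token_id, discount))
    return sorted(deals, key=lambda deal: deal[1], reverse=True)

# Made-up asking prices and model projections, in ETH.
listings = {101: 95.0, 102: 110.0, 103: 120.0}
projected_price = {101: 100.0, 102: 105.0, 103: 121.5}
deals = find_undervalued(listings, projected_price)
```

Here tokens 101 and 103 are listed below their projections, while token 102 is listed above its projection and is left out.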

Model Summary

Historical sales data is retrieved for every NFT for a particular collection.
Using metadata and input from NFT designers, the VibeRater Score is generated representing a synthesis of rarity, trait weights, and vibes.
Isotonic regression is chosen to model the price as a function of the VibeRater Score.
NFTs sold more recently are given more weight through the sigmoid function.
A rolling average is used to smooth the final line.

A Few Thoughts To Wrap Up

We don’t use the word ultimate lightly when talking about BeyondRarity 2.0.

On the surface, it looks like a simple ranking tool (we did this on purpose). But under the hood lies a pricing analysis engine with an algorithm that prioritizes creator input and combines it with market pricing data.

This is a precision instrument built with patient attention to detail. If you’d like the game-changing experience of having your project on BeyondRarity 2.0, visit BeyondRarity.com to sign up.