How to accurately estimate the price of individual NFTs — Part 2

Published in NFTBank.ai · 8 min read · Apr 13, 2022

Recap of Part 1 & Preview of Part 2

Many rely on an item’s latest transaction price or the collection’s Floor Price for NFT valuation. However, neither accurately captures the value of an item, because prices vary significantly within an NFT collection and the NFT trade dataset is highly sparse. (This is comparable to the challenges of pricing real estate, particularly properties with few transaction records.)

Despite the challenge, this is a crucial problem to solve. Accurate NFT valuation — NFT price estimates close to the transaction price — gives valuable information to NFT collectors, whether they are managing their portfolio or assessing opportunities to buy or sell NFTs.

Our previous article gave you a sneak peek at how NFTBank solves this problem. We introduced some of the difficulties in building a valuation model, or a conventional time-series model, for NFT projects with low transaction volume, which leads to sparse trade datasets and extreme price changes.

We also discussed the nuances of estimating NFT prices in bundle sales, which occur quite frequently, and how categorical trait values can be converted into meaningful numerical data rather than simply mapping each value to a constant, such as “Gold” to 1.

We developed a baseline model that addresses these difficulties and nuances and provides highly accurate price estimates for hundreds of different NFT projects. This model is what allowed us to serve numerous users and partners.

Since the model described in Part 1, we have tested and applied various state-of-the-art ML models to improve performance, and we have made significant strides in prediction accuracy. In this article, we share what the improvement looks like, the additional problem we needed to solve, and how we solved it to get to where we are now.

What an Effective NFT Price Estimate Looks Like

Let’s address the most important question before we start: how effective is NFTBank’s price estimate? Take a look at the predicted and actual sale prices of the BAYC collection from 2022-02-01 to 2022-04-06; the blue box plots show our price estimates, whereas the red box plots show actual sale prices. You can see that our predicted prices stay close to the actual sale prices, even for traits such as ‘gold fur’ whose prices deviate far from the average.

How did we get here?

Low volume and high price volatility continued to pose challenges

The low volume problem and the high price volatility problem still lingered as challenges in applying ML-based prediction to NFT assets.

The low volume problem led to over- or underestimation of NFTs with rare traits, because a few unusual trades can skew the model significantly. This turned out to be the greatest challenge, which we describe further below.

High price volatility led to incorrect estimates because of how ML models work. An ML model learns from past data, so when the price moves suddenly, it takes time for the model to absorb the change, which results in lagging estimates.

Solving the low volume problem enabled us to better predict the values of rarer NFTs, which can drastically affect the asset efficiency of super rare NFTs, while solving the high price volatility problem allowed our model to track actual price movements more closely. In this article, we’ll focus on the low volume problem and address the high price volatility problem in our next one.

Solving the low volume problem in NFT ML prediction with Trait Reconstruction

We’ve found that the key to handling the challenge of low transaction volume is tracking and processing price information of “similar” items. In other words, we need to follow the prices of comparable NFTs that are sold, determine how similar they are, and find appropriate prices. This is important because if similarity between items can be determined, the information from sparse data can be approximated with that of similar items.

To find the similarity between NFTs, we started from an intuitive but core principle: the most critical factor in determining the price of an NFT is its “trait”.

Here is an oversimplified version of what we do:

1. Decompose the traits of NFT items
2. Regroup NFT items by trait
3. Compute the price statistics for each group
4. Reconstruct the value of an NFT item using the computed statistics
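The four steps can be sketched in a few lines of Python. This is only a toy illustration, not NFTBank's actual model: the sales data, the function names `trait_price_stats` and `reconstruct`, the use of the median as the per-trait statistic, and the plain unweighted average in Step 4 are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy sales: (token_id, traits, sale price in ETH).
sales = [
    (1, {"fur": "Brown", "mouth": "Bored"}, 100.0),
    (2, {"fur": "Brown", "mouth": "Grin"}, 110.0),
    (3, {"fur": "Solid Gold", "mouth": "Bored"}, 900.0),
    (4, {"fur": "Brown", "mouth": "Bored"}, 90.0),
]

def trait_price_stats(sales):
    """Steps 1-3: decompose items into traits, regroup prices by trait,
    and compute one statistic (here, the median) per trait."""
    groups = defaultdict(list)
    for _token_id, traits, price in sales:
        for trait_type, value in traits.items():       # Step 1: decompose
            groups[(trait_type, value)].append(price)  # Step 2: regroup
    return {key: median(prices) for key, prices in groups.items()}  # Step 3

def reconstruct(traits, stats):
    """Step 4 (naive): average the statistics of the item's traded traits.
    Returns None when none of the item's traits has ever been traded."""
    known = [stats[(t, v)] for t, v in traits.items() if (t, v) in stats]
    if not known:
        return None
    return sum(known) / len(known)

stats = trait_price_stats(sales)
# This trait combination never appears in the sales above.
estimate = reconstruct({"fur": "Solid Gold", "mouth": "Grin"}, stats)
print(estimate)
```

The naive average in `reconstruct` treats every trait as equally informative, which is exactly where the difficulty lies: a real reconstruction has to decide which traits matter and how much.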

It may look simple and easy to implement, but getting a model to recognize and learn this principle is not as simple as you might think. For example, just feeding the traits into the model often causes underestimation, because most transactions stay near the floor, and predicting floor-level values is then an efficient way for the ML model to minimize its objective function.
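A small numeric example makes the pull toward the floor concrete. Assuming a mean-absolute-error objective, the best single constant prediction is the median of the prices, and when most trades sit near the floor the median sits there too. The prices and the helper `mae` below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical sale prices (ETH): four trades near the floor, one rare item.
prices = [100.0, 101.0, 102.0, 103.0, 900.0]

def mae(constant, values):
    """Mean absolute error of predicting the same constant for every sale."""
    return sum(abs(v - constant) for v in values) / len(values)

floor_like = median(prices)  # the constant that minimizes MAE
average = mean(prices)

mae_floor = mae(floor_like, prices)
mae_mean = mae(average, prices)

# The floor-like constant wins on MAE, yet badly undervalues the rare item.
rare_item_error = abs(900.0 - floor_like)
print(floor_like, mae_floor, mae_mean, rare_item_error)
```

A model that cannot separate the rare item from the rest therefore scores well overall while being far off on exactly the items where an accurate estimate matters most.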

In the NFT market, Steps 1–3 above can be solved easily by treating traits as properties: we can simply split each item into the traits it possesses and calculate statistics for each trait. The challenge is in properly collecting the statistics for individual traits and turning them into a statistic that can represent an item. In other words, we need to figure out which traits significantly impact price and then reconstruct trait information into items appropriately. Just because a few NFTs with an ‘Orange’ background sold at a high price does not guarantee that all other NFTs with an ‘Orange’ background will also sell at a high price.

Let’s take a look at a real-life example. In BAYC, apes with “Solid Gold” fur have a high price range compared to apes with “Brown” fur. This implies that decomposing the traits of NFT items (Step 1), regrouping NFT items by trait (Step 2), and computing the price statistics for each group (Step 3) can be solved directly.

Time-series price trend of apes with two traits: (red) Solid Gold, (blue) Brown; x-axis: timestamp, y-axis: log(price)

However, the problem arises in Step 4: how do we reconstruct the value of an NFT item using the computed statistics? For example, consider the “Bored unshaven” and “Bored pizza” mouths, which are possessed by 3690 apes and 30 apes, respectively. Looking at the price trend for NFTs with each trait, we see that NFTs with “Bored pizza” are priced higher most of the time, but at times NFTs with “Bored unshaven” sell at a higher price.

Time-series price trend of apes with Bored unshaven and Bored pizza mouths; x-axis: timestamp, y-axis: log(price)

The inversions occurred when those NFTs had other valuable traits, such as “Solid Gold”, “Trippy”, or “Black suit”. This example pinpoints the difficulty of Step 4, that is, effectively building item information from a set of trait information.

How did NFTBank solve it?

TL;DR

We solved this challenge by devising a model that automatically detects key traits and brings together individual trait information to form item-specific information. We built this logic into the model and confirmed through numerous experiments that it improves performance. For example, the graph below shows the effectiveness of our trait reconstruction. You can see that the predicted price (red line) follows the sold price trend closely, and much more closely than the project’s floor price trend does.

Time-series trend of item price prediction based on trait reconstruction; x-axis: timestamp, y-axis: log(price); (blue): sold price, (red): item price prediction based on trait reconstruction, (green): project floor price

Full Version

In machine learning, the response variable is often transformed with a logarithmic transformation, a power transformation, or, more generally, a Box-Cox transformation. These transformations serve a variety of purposes, but generally they make the data better satisfy some statistical assumptions. For example, in price modeling, you can reduce the skewness of the target variable’s distribution by using a logarithmic transformation.
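As a rough illustration of that effect, the sketch below computes the skewness of a made-up, right-skewed price sample before and after a log transform. The helper names `skewness` and `box_cox` and the sample prices are assumptions for the example; `box_cox` uses the standard one-parameter Box-Cox definition, where a parameter of 0 corresponds to the natural log.

```python
import math

def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    third = sum((v - mu) ** 3 for v in values) / n
    return third / var ** 1.5

def box_cox(value, lam):
    """One-parameter Box-Cox transform for positive values.
    lam == 0 is the natural log; otherwise (value**lam - 1) / lam."""
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam

# Hypothetical right-skewed prices (ETH).
prices = [100.0, 105.0, 110.0, 120.0, 150.0, 300.0, 900.0]

raw_skew = skewness(prices)
log_skew = skewness([box_cox(p, 0) for p in prices])
print(raw_skew, log_skew)
```

On this sample the log transform lowers the skewness but does not remove it, so a heavy right tail can survive the transform.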

We found that NFT prices not only remain skewed even after a logarithmic transformation, but also often show multiple modes. Thus, we developed a new method, inspired by forward variable selection and the Box-Cox transformation, to gather statistics from each trait.

Forward variable selection is one of the traditional statistical variable selection methods. It begins with a model that contains no variables (in our case, traits). Then it adds the most significant variables one after another until a pre-specified stopping rule is reached. Here, we defined the stopping rule in terms of the mean absolute error (MAE), the same metric used in the objective function of the model.
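A minimal sketch of forward selection over traits with an MAE-based stopping rule might look like the following. Everything here is illustrative rather than NFTBank's implementation: the toy sales, the per-trait median as the statistic, the simple averaging predictor, the `min_gain` threshold, and the in-sample evaluation are all assumptions (a real setup would score candidates on held-out sales).

```python
from statistics import median

# Hypothetical toy data: each sale is (set of traits, price in ETH).
sales = [
    ({"Brown fur", "Orange bg"}, 100.0),
    ({"Brown fur", "Blue bg"}, 102.0),
    ({"Solid Gold fur", "Orange bg"}, 900.0),
    ({"Solid Gold fur", "Blue bg"}, 880.0),
    ({"Brown fur", "Orange bg"}, 98.0),
]

def predict(traits, selected, trait_median, fallback):
    """Average the medians of the item's selected traits, else use the fallback."""
    used = [trait_median[t] for t in traits if t in selected]
    return sum(used) / len(used) if used else fallback

def mae(selected, sales, trait_median, fallback):
    errors = [abs(price - predict(traits, selected, trait_median, fallback))
              for traits, price in sales]
    return sum(errors) / len(errors)

def forward_select(sales, min_gain=1e-9):
    """Start with no traits; repeatedly add the trait that lowers MAE the most;
    stop when no remaining trait improves MAE by more than min_gain."""
    all_traits = set().union(*(traits for traits, _ in sales))
    trait_median = {t: median(p for tr, p in sales if t in tr) for t in all_traits}
    fallback = median(p for _, p in sales)
    selected = set()
    best = mae(selected, sales, trait_median, fallback)
    while True:
        candidates = sorted(all_traits - selected)
        if not candidates:
            break
        score, trait = min(
            (mae(selected | {t}, sales, trait_median, fallback), t)
            for t in candidates
        )
        if best - score <= min_gain:  # stopping rule: MAE no longer improves
            break
        selected.add(trait)
        best = score
    return selected, best

selected, best_mae = forward_select(sales)
print(sorted(selected), best_mae)
```

On this toy data the procedure keeps the two fur traits and leaves out both backgrounds, since adding a background only makes the error worse.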

The approximated item price is constructed as a weighted average of the information from the selected traits (the weights depend on time: the more recent the data, the larger the weight). As a result, the approximated item price calculated through our transformation behaves very similarly to the actual selling price.
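One simple way to make recent data count more is an exponentially decaying weight. The sketch below is an assumption for illustration only: the article does not state the actual weighting scheme, and the function name `time_weighted_price`, the `half_life_days` parameter, and the observations are hypothetical.

```python
def time_weighted_price(observations, now, half_life_days=7.0):
    """Weighted average of (day, price) observations.
    Each weight halves for every half_life_days of age, so newer data dominates."""
    if not observations:
        raise ValueError("need at least one observation")
    weights = [0.5 ** ((now - day) / half_life_days) for day, _ in observations]
    weighted_sum = sum(w * price for w, (_, price) in zip(weights, observations))
    return weighted_sum / sum(weights)

# Hypothetical trait-level prices observed on days 0, 7 and 14 (ETH).
observations = [(0, 100.0), (7, 120.0), (14, 160.0)]

estimate = time_weighted_price(observations, now=14)
plain_mean = sum(price for _, price in observations) / len(observations)
print(estimate, plain_mean)
```

With a rising price series, the time-weighted estimate lands above the plain mean because the most recent observation carries the largest weight.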

The strength of our ML system is that it can correct for sparsity by capturing the trends in each trait. In other words, our model can estimate the price of an item that has never been traded, or whose exact combination of traits has never been sold.

Limitation

Our fundamental approach is to approximate the price of an item from the prices of its traits. Naturally, it is impossible to estimate a trait that has never been traded (but who knows? We may correct ourselves 😉). It also implies that our method is vulnerable to wash-trading scenarios. If the same NFT has been traded many times at unreasonable (too-high or too-low) prices, our trait information will be affected no matter how robust it is. As we expected, we experienced a temporary decline in model performance due to wash trading.

Conclusion

The model performance of NFTBank is constantly being improved through numerous experiments and efforts. All team members are working day and night to make NFTBank the most effective tool for managing NFT assets. A big thank you to our users for supporting us and giving us great feedback. It is a huge motivator and means a lot to us.

Please reach out to us here if you want to use our Price Estimate for your product or service. We love to partner!