Data Science

The bitgrit NFT Challenge Winner’s Circle

An interview with the top of the leaderboard

Zia i.e. Holder
bitgrit Data Science Publication

--

Popular NFT avatars

bitgrit recently wrapped up its NFT Price Prediction Challenge and the frontrunners worked up until the last moment to clinch victory.

Known by their online handles Professeur, Kakarot ui, and Gozie, the three data-science aficionados sound like a heist crew, but what they actually pulled off was building AI models to predict the prices of NFTs from public information. No easy task.

The competition was just the latest in a series of AI-focused challenges set by bitgrit and featured a cash prize pool and NFTs for the top 15 scorers on the leaderboard.

This interview explores the winning strategies and backgrounds of this very diverse group of winners. Join us as we learn more about them and their submissions.

Why did you decide to join this NFT Price Prediction Challenge?

PR: I generally love participating in hackathons, they help me learn faster and boost my critical thinking skills. Particularly, I was attracted to the NFT Price Prediction Challenge as I saw it as an opportunity to grow and learn about the NFT space.

KA: I was aware of this platform and encountered this challenge when I was looking for some ML problems to work with.

GO: Firstly, I stumbled upon bitgrit’s website when I was searching for data science and ML competition platforms. I decided to join the NFT price prediction challenge just for the fun of it, to sharpen my data science skills, while also hoping to win the prize money.

What was your impression of the dataset and problem statement for this competition?

PR: The data and problem statement were interesting to work on, and I even found the intended use case fascinating. Machine learning indeed has the potential to drive business growth exponentially.

KA: The problem statement was really interesting; I always want to work on problems like this. The dataset was a bit tricky — the usual methods I follow didn’t work here, so I had to brainstorm about the problem, do research, and try new things.

GO: None in particular, although the dataset was quite large.

Please explain your winning solution and the process you used to build it.

PR: For my solution, I used all the provided datasets, combining the files to build a single model that produced the predictions. I also did some simple feature engineering and data manipulation, which included:

1. Datetime feature extraction: I extracted month and year features from the creation and last sale date, I also calculated the number of days between those dates.

2. Variable mapping: I mapped the Boolean data: Openrarity_enabled, has_website, has_medium, has_own_twitter and has_discord. I did a simple mapping of "1" for "True" and "0" for "False". Then I also created a feature that sums them up, which served as a rating.

3. Target Transformation: I used a logarithmic transformation to handle the skewed nature of the target variable.

4. Extra Feature Engineering: I created statistical features for Rarity score, Total Supply, Number of traits, Seller Fees, and Platform Fees. To create these features, I grouped the data by the NFT_IDs and aggregated the Minimum, Maximum, Mean, Sum, Median, and Standard deviation.

5. Hyperparameter search: I also tried a range of hyperparameter settings.
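A minimal pandas sketch of steps 1–4 above. The column names (`creation_date`, `rarity_score`, `price`, and so on) are hypothetical, since the actual dataset schema isn’t shown in the interview:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names standing in for the real schema
df = pd.DataFrame({
    "nft_id": [1, 1, 2],
    "creation_date": pd.to_datetime(["2021-01-10", "2021-01-10", "2022-03-05"]),
    "last_sale_date": pd.to_datetime(["2021-06-01", "2021-09-15", "2022-04-01"]),
    "has_website": [True, True, False],
    "has_discord": [True, False, False],
    "rarity_score": [10.0, 12.0, 3.0],
    "price": [5.0, 120.0, 0.8],
})

# 1. Datetime features: month/year components and days between the two dates
df["sale_month"] = df["last_sale_date"].dt.month
df["sale_year"] = df["last_sale_date"].dt.year
df["days_held"] = (df["last_sale_date"] - df["creation_date"]).dt.days

# 2. Boolean mapping (True -> 1, False -> 0) plus a summed "rating" feature
bool_cols = ["has_website", "has_discord"]
df[bool_cols] = df[bool_cols].astype(int)
df["rating"] = df[bool_cols].sum(axis=1)

# 3. Log transform of the skewed target
df["log_price"] = np.log1p(df["price"])

# 4. Per-NFT-ID statistics of a numeric feature, merged back onto the rows
stats = (df.groupby("nft_id")["rarity_score"]
           .agg(["min", "max", "mean", "sum", "median", "std"])
           .add_prefix("rarity_")
           .reset_index())
df = df.merge(stats, on="nft_id", how="left")
```

The same `groupby`/`agg` pattern extends directly to the other numeric columns (total supply, number of traits, fees) mentioned above.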

KA: For feature engineering, I applied log transformations to the numerical features and treated the date-time features as categories (since they had only a few unique values), followed by frequency-based labeling, both with and without a log transformation. I trained multiple algorithms on the different feature-engineering variants and stacked their outputs. As something new, I used a method called panel regression with tree-based algorithms, applied to subsets created by K-Means clustering.
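Two of the ideas above — frequency-based labeling of a categorical feature and fitting separate models on K-Means subsets — might be sketched as follows. This is a toy illustration on synthetic data: plain decision trees stand in for the tree-based algorithms, and the feature names are invented:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "collection": rng.choice(["a", "b", "c"], size=200),  # categorical feature
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
y = X["x1"] * 2 + rng.normal(scale=0.1, size=200)

# Frequency-based labeling: replace each category with its count (and a log variant)
freq = X["collection"].map(X["collection"].value_counts())
X["collection_freq"] = freq
X["collection_freq_log"] = np.log1p(freq)

# Split the rows into K-Means subsets and fit one model per cluster
features = ["x1", "x2", "collection_freq", "collection_freq_log"]
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[features])
models = {}
for c in np.unique(clusters):
    mask = clusters == c
    models[c] = DecisionTreeRegressor(max_depth=4).fit(X.loc[mask, features], y[mask])

# Route each row to the model for its cluster
preds = np.empty(len(X))
for c, model in models.items():
    mask = clusters == c
    preds[mask] = model.predict(X.loc[mask, features])
```

The stacking step and the panel-regression formulation aren’t shown here; the point is only the subset-then-model pattern the winner describes.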

GO: My solution was a single CatBoost model. I engineered features such as each collection’s mean and standard deviation, both per sale month and overall, and I clustered the collections’ Twitter data. I also built a LightGBM classification model to predict the probability that an NFT, given a set of features, would sell above the mean NFT price; that probability was then used as a feature for the regression.

No missing values were imputed, since CatBoost can handle them natively (this may have been a big mistake on my part).

For feature selection, n models (one per predictor) were fit: at each iteration, one predictor was dropped and a model was fit without it. The difference in performance between that model and the baseline (which used all features) was recorded at each iteration. Only the features that improved the evaluation metric were used to fit the final model. This selection was done using a 5-fold cross-validation approach.

This improved the performance on the public leaderboard.
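The leave-one-feature-out selection loop described above can be sketched as below. Scikit-learn’s gradient boosting stands in for CatBoost to keep the example self-contained, and the data is synthetic — this is an illustration of the procedure, not the winner’s actual code:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data: 6 features, only 3 of them informative
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)

def cv_rmse(cols):
    """Mean 5-fold RMSE using only the given feature columns."""
    model = GradientBoostingRegressor(random_state=0)
    scores = cross_val_score(model, X[:, cols], y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

all_cols = list(range(X.shape[1]))
baseline = cv_rmse(all_cols)  # performance with all features

# Drop each feature in turn; keep only the features whose removal hurts,
# i.e. the features that improve the evaluation metric.
kept = [c for c in all_cols
        if cv_rmse([f for f in all_cols if f != c]) > baseline]

final_model = GradientBoostingRegressor(random_state=0).fit(X[:, kept], y)
```

Each of the n candidate models is scored with the same 5-fold cross-validation as the baseline, so the comparison reflects generalization rather than training fit.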

Why do you think you were able to create a winning solution?

PR: I think two things that changed the game for me were feature engineering and choosing the most optimal hyperparameters.

KA: I think I was persistent and was thinking about the problem all day.

GO: I think I tried so many strategies that didn’t work well, which taught me to trust my local test set. I also researched online to get more ideas.

Did you learn anything new by participating in this challenge? If so, what?

PR: Yes, I learnt a lot about NFTs. I never really understood them before now. Technically, I also learnt that searching for the right hyperparameters is sometimes worth the effort.

KA: Yes, the concept of breaking down the data into subsets and building different algorithms on each set.

GO: Trust your gut sometimes.

Did you face any problems or difficulties in this challenge? Please explain.

PR: Not really. Although I had to do a lot of research and speak to many NFT traders to understand their perception of what affects the prices of NFTs.

KA: I had only about 2 weeks to work on this problem, and having 3 submissions per day made it really exciting; it ensured I tried only the things I was super confident about.

GO: Having the time to focus on the task was the major problem.

Do you use a standard step-by-step approach in data science competitions like this one?

PR: Usually, when I approach a data science competition:

1. I do a fast submission (like a baseline)

2. I go back to the problem statement to study every bit of it.

3. Perform proper data cleaning and feature engineering.

4. Try several models.

5. Tune hyperparameters.

KA: Start with a baseline model and then build on it. Try as many techniques as you can think of, and keep track of which ones worked and which didn’t.

GO: My standard approach is simply understanding the data by using charts and distributions and finding connections between features. Then I apply a sound feature engineering technique, especially for categorical features; I use about five or six encoding techniques. Finally, I build multiple algorithms and stack them to get a final model that works suitably well across the entire distribution of the data.

Do you have any advice for newbies looking to get started on a machine-learning challenge like this?

PR: Yes, my advice would be: “always be willing to learn something new”. Be open-minded.

KA: I would say, start doing it and research whenever you face an issue. That’s how you will have so many working methods in your pocket.

GO: My advice is to join such ML challenges and see them as a learning process to build up your experience.


Be sure to follow the bitgrit Data Science Publication to keep updated!

Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!

Follow Bitgrit below to stay updated on workshops and upcoming competitions!

Discord | Website | Twitter | LinkedIn | Instagram | Facebook | YouTube
