Predicting Magic the Gathering Card Prices

Dylan N
6 min read · Nov 22, 2019

Most trading card stores and online marketplaces will let you sell your cards for either in-store credit or real-world currency. However, not all cards have a price tag, usually because few copies are currently being sold or the card was recently released. Using predictive modeling on the attributes of prior cards, we can predict a price for future cards or cards without a known price.

The Data

To begin, I created a data set from the card sets that are legal in Standard. The cards were obtained from Scryfall. There were around 30,000 cards in total, which I filtered by legality, leaving around 1,300 cards for my model. From there, I selected the relevant columns, which were the attributes of the card:

  • price, the price of the card in USD
  • rarity, the rarity of the card: common, uncommon, rare, or mythic
  • color_identity, the card's color identity
  • colors, the colors of mana required to cast the card
  • mana_cost, the mana cost, including colored mana, to cast the card
  • cmc, the converted mana cost, or the total amount of mana needed to cast the card
  • power, the power of the creature
  • toughness, the toughness of the creature
  • type_line, the type of card it is
  • released_at, the date the card was released
  • flavor_text, the descriptive text, usually lore or something with no bearing on how the card plays
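
The filtering step can be sketched with pandas. The tiny hand-made frame below stands in for the real ~30,000-card Scryfall bulk export, and the nested `legalities`/`prices` dicts mirror the shape of Scryfall's JSON; the exact column handling here is an assumption, not the article's code:

```python
import pandas as pd

# Tiny stand-in for the Scryfall bulk data (~30,000 cards in reality).
cards = pd.DataFrame({
    "name": ["Bolt", "Shock", "Old Card"],
    "legalities": [{"standard": "legal"}, {"standard": "legal"}, {"standard": "not_legal"}],
    "rarity": ["common", "common", "rare"],
    "prices": [{"usd": "0.50"}, {"usd": "0.25"}, {"usd": "12.00"}],
})

# Keep only standard-legal cards, then pull the USD price out of the
# nested `prices` dict into a numeric column.
standard = cards[cards["legalities"].apply(lambda d: d.get("standard") == "legal")].copy()
standard["price"] = standard["prices"].apply(lambda d: float(d["usd"]))

print(len(standard))  # 2 cards survive the legality filter
```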

Metrics and Splitting

I tested several different types of models. To evaluate them, I used three metrics: mean squared error (MSE), median absolute error (MAE), and the coefficient of determination (r²). The first two measure how erroneous the predictions are, while the third measures how much of the variance is explained by the model. I used a three-way split for validation: the training, validation, and test sets were 64%, 16%, and 20% of the data set, respectively. Because the card prices were heavily right-skewed (most cards are cheap, with a long tail of expensive ones), I performed a log transformation on the price column of the dataset.
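
The split and transform can be sketched as follows. The feature matrix is random placeholder data, and `log1p` is an assumption, since the article only says "log transformation" (log1p keeps any $0.00 cards finite):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and a skewed, lognormal-ish price vector;
# the real values come from the Scryfall data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1300, 5))
price = rng.lognormal(mean=0.0, sigma=1.0, size=1300)

# Log-transform the skewed target.
y = np.log1p(price)

# 20% held out for test first, then 20% of the remainder for
# validation -> 64% train / 16% val / 20% test overall.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.20, random_state=42)
```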

Baseline

For a baseline, I fit a dummy regressor with the train set. The dummy regressor simply predicted the mean value of the train set. I then created predictions for the validation set and evaluated the model. Unsurprisingly, the dummy regressor didn't do a very good job of predicting the values. Its resulting metrics for the validation set were:

Val_score MSE:	0.26426934298718585
Val_score MAE: 0.27470715285087466
Val_score r2: -0.003255598039607177

Obviously this regressor is not good: the coefficient of determination is at most 1, and a negative score means it did worse than simply predicting the validation set's own mean. The errors are also relatively high.
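
A minimal sketch of the baseline with scikit-learn's `DummyRegressor`, on placeholder data rather than the real card features:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, median_absolute_error, r2_score

# Placeholder train/validation data standing in for the card features.
rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(832, 5)), rng.normal(size=832)
X_val, y_val = rng.normal(size=(208, 5)), rng.normal(size=208)

# The dummy regressor ignores the features and predicts the train mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
pred = baseline.predict(X_val)

mse = mean_squared_error(y_val, pred)
mae = median_absolute_error(y_val, pred)
r2 = r2_score(y_val, pred)
```

A constant prediction can never beat r² = 0 on held-out data, which is why the baseline's r² lands at or below zero.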

Model Testing

I tested four different models: linear regression, decision tree regressor, random forest regressor, and gradient boosting regressor. I began with the gradient boosting regressor, as I thought it would do the best, but I have ordered the results below from what I believe to be the least to the most complex model.

Linear Regression

Starting off, a simple linear regression had the following results:

Val_score MSE:	0.141941338972894
Val_score MAE: 0.18028352351129165
Val_score r2: 0.4611427821777332

As you can see, even a simple linear regression beats the baseline by quite a large margin. The mean squared error is almost half of the baseline's, while the median absolute error is still somewhat high. The r² shows the model explaining almost half of the variance.
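
Since features like rarity are categorical, a linear regression needs them encoded numerically first. A minimal sketch using a one-hot encoding pipeline; the toy frame and target values here are hypothetical:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy frame: rarity is categorical, cmc is already numeric.
train = pd.DataFrame({
    "rarity": ["common", "rare", "mythic", "uncommon", "rare", "common"],
    "cmc": [1, 3, 5, 2, 4, 1],
})
y_train = [0.1, 1.2, 2.5, 0.3, 1.5, 0.12]  # hypothetical log prices

# One-hot encode rarity, pass cmc through untouched, then fit OLS.
model = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["rarity"]),
        remainder="passthrough",
    ),
    LinearRegression(),
)
model.fit(train, y_train)
preds = model.predict(train)
```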

Decision Tree Regressor

For a more advanced model, I tried a decision tree regressor. I used brute-force trial and error to determine the best max depth for the tree, which turned out to be 3. Below are the metrics from this model:

Val_score MSE:	0.11750400201930211
Val_score MAE: 0.09002683428291486
Val_score r2: 0.5539151591123518

The model is improving: it explains over half the variance and has an even lower mean squared error and median absolute error than the simple linear regression.
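
The brute-force depth search can be sketched as a loop over candidate depths, scoring each on the validation set. The data here is synthetic; the article's best depth of 3 came from the real card data:

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with signal on the first feature only.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 4))
y_train = X_train[:, 0] * 2 + rng.normal(scale=0.1, size=500)
X_val = rng.normal(size=(100, 4))
y_val = X_val[:, 0] * 2 + rng.normal(scale=0.1, size=100)

# Sweep max_depth, keeping whichever value scores best on validation.
best_depth, best_r2 = None, -np.inf
for depth in range(1, 11):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    score = r2_score(y_val, tree.predict(X_val))
    if score > best_r2:
        best_depth, best_r2 = depth, score
```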

Feature importances for the decision tree regressor

However, looking at the feature importances, it is clear that the rarity of the card is extremely important in determining the price.

Random Forest Regressor

For the next model, I used a random forest regressor. I used grid search cross-validation to determine the best values for the number of estimators and the max depth of the trees. The metrics were similar to the decision tree regressor's:

Val_score MSE:	0.12048983370318032
Val_score MAE: 0.09038065969162751
Val_score r2: 0.5425799345350534

I found it surprising that the random forest was very slightly beaten by a single decision tree.

Feature importances for the random forest regressor

The feature importances of the random forest reveal that the rarity of the card is still the most important factor in determining its price. This makes logical sense: the less common a card is, the more valuable it should be. However, the flavor text length is the second most important feature, which is quite interesting, as flavor text has no bearing on how the card plays.
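
The grid search over the forest's hyperparameters, plus pulling feature importances from the refitted best estimator, might look like this. The data is synthetic and the parameter grid values are assumptions, since the article does not list the candidates it tried:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data where feature 0 carries essentially all the signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X[:, 0] * 3 + rng.normal(scale=0.1, size=300)

# Cross-validated search over the two hyperparameters tuned in the article.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=3,
)
grid.fit(X, y)

# Importances from the refitted best forest; they sum to 1, and the
# dominant feature should be column 0 in this toy setup.
importances = grid.best_estimator_.feature_importances_
```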

Gradient Boosting Regressor

For the last model, I used a gradient boosting regressor. I intuitively expected this model to perform the best. However, looking at its resulting metrics, I may have been mistaken:

Val_score MSE:	0.12035860566603304
Val_score MAE: 0.13154170496730305
Val_score r2: 0.5430781204441701

The mean squared error was higher than the decision tree's and essentially tied with the random forest's, the median absolute error was noticeably higher than both, and the r² score was only marginally higher than the random forest's.

Looking at the feature importances, rarity, unsurprisingly, is still the most important feature. However, in this model, the type of card is the second most important feature for determining the price of the card.
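
A sketch of the final model type with scikit-learn defaults, again on synthetic data; the article does not state which hyperparameters, if any, were tuned for this one:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import median_absolute_error

# Synthetic data with signal on feature 1.
rng = np.random.default_rng(4)
X_train = rng.normal(size=(400, 4))
y_train = X_train[:, 1] + rng.normal(scale=0.1, size=400)
X_val = rng.normal(size=(100, 4))
y_val = X_val[:, 1] + rng.normal(scale=0.1, size=100)

# Default boosting settings (100 trees of depth 3).
gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
val_mae = median_absolute_error(y_val, gbr.predict(X_val))

# The model also exposes feature importances, used for the plot above.
top_feature = int(np.argmax(gbr.feature_importances_))
```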

Evaluation of all the models

Sorted from least to greatest r², the graph shows each model's median absolute error (MAE) and mean squared error (MSE). It shows that although the random forest explained the second-least variance, it had a lower median absolute error than the model with the next-best r² score. It also shows that all the models, even a simple linear regression, had errors well below the baseline's.

Price of card v. rarity of card

In this graph, we look at the rarity of each card against its price. There are some outliers, but most common and uncommon cards have a lower price than the rare or mythic cards.

Test Results

I chose the gradient boosting model as my final model because it was the model I started with, and it explained the second-most variance with reasonably low errors. Here are the test set results for each metric:

Test_score MSE:	0.09462939943356202
Test_score MAE: 0.10900239644409848
Test_score r2: 0.6258594138476284

As you can see, the errors are about 0.02–0.03 lower than the validation set results, and the test set r² is about 8 percentage points higher than the validation r².
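
One practical note: because the target was log-transformed, the model's raw predictions are log prices, and getting dollar amounts back requires inverting the transform (assuming `log1p`, since the article only says "log transformation"):

```python
import numpy as np

# Hypothetical log1p(price) predictions from the final model.
log_pred = np.array([0.4, 1.2, 2.0])

# expm1 inverts log1p, recovering dollar amounts.
dollar_pred = np.expm1(log_pred)
```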

The Results

The results were mostly adequate and could be used to predict card prices. However, I feel that using only Standard-legal cards limited the complexity of my model. The various models I used demonstrated that the best model may not be the most complex one: the decision tree outperformed both the gradient boosting and random forest regressors. In the future, I may improve my model by using all 30,000 cards, spanning all the way back to the Alpha set in 1993.
