Machine Learning Insights to Get Funded on Kickstarter in 2020

Published in The Startup · 8 min read · May 6, 2020

Kickstarter: The Premier Crowdfunding Platform

A few years ago, a small tech company in Taiwan released a 360° Camera on Kickstarter. As their intern, I got the opportunity to write marketing materials and design the camera’s Kickstarter campaign. While I wasn’t around for the final launch, the campaign did end up raising over $60,000, opening my eyes to Kickstarter’s power in raising both consumer awareness and funds.

Kickstarter is one of the most popular crowdfunding platforms, with over 4.6 billion dollars pledged by 17 million people to fund 445,000 projects. It follows an all-or-nothing model (you must hit your dollar target to receive funding, or leave empty-handed), which lends itself well to classification modeling.

What contributes to getting successfully funded on Kickstarter? Yes, you need a good idea. Yes, you need good marketing assets. And yes, you need product incentives to attract potential backers. Funders on Kickstarter aren’t just donating their money out of goodwill; they are there to try out the latest in fashion, tech, and art. But a quick glance at Kickstarter could already tell you all this. My goal is to predict success on Kickstarter and uncover which factors contribute beyond the obvious.

Note: Most of this article concerns the analysis portion. For takeaways, feel free to skip to the takeaway section at the end. For code, please see this link.

Key Questions:

Categories: What categories have the most success on Kickstarter?
Countries: Can a product from Brazil have the same chance at success as a product from San Francisco?
Page Elements: Do offerings like “Featured Project” help?
Goals: How high should you set your target dollar goal, and how long should you open the funding period for?
Text: Do text elements like length of your project title matter?

Data Collection

I took Kickstarter data from July 2019 to December 2019 from webrobots.io, leaving me with a DataFrame of 1.4M rows and 40 columns. Since my computer couldn’t handle this much data quickly, I took a random sample of 45% of the data and dropped duplicates. I ended up with a sample size of ~193,000 Kickstarter campaigns.
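As a rough illustration of the sampling step, here is a minimal pandas sketch on a toy DataFrame. The column names `id` and `name` and the random seed are assumptions made for the example, not details of the actual scrape.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the scraped data; 'id' and 'name' are assumed column names.
rng = np.random.default_rng(0)
ids = rng.integers(0, 500, size=2000)  # repeated ids mimic projects scraped more than once
raw = pd.DataFrame({"id": ids, "name": ["project_" + str(i) for i in ids]})

# Draw a reproducible 45% random sample, then drop exact duplicate rows.
sample = raw.sample(frac=0.45, random_state=42)
sample = sample.drop_duplicates().reset_index(drop=True)

print(len(raw), "->", len(sample))
```

Fixing `random_state` keeps the sample reproducible between runs, which helps when comparing models later.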

Next, I removed all text columns that had too many unique values and were clearly not relevant to predicting Kickstarter success — for instance, url information.
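One way to spot such columns is to measure how many distinct values each text column holds. The sketch below assumes a toy DataFrame and an arbitrary 50% uniqueness cutoff; neither comes from the original analysis.

```python
import pandas as pd

# Toy data: 'urls' is unique per row, 'country' is a low-cardinality category.
df = pd.DataFrame({
    "urls": ["https://example.com/" + str(i) for i in range(100)],
    "country": ["US", "GB", "CA", "TW"] * 25,
    "goal": [1000.0 + i for i in range(100)],
})

# Treat every non-numeric column as text.
text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Flag text columns where more than half of the values are distinct.
# The 0.5 cutoff is an illustrative assumption.
too_unique = [c for c in text_cols if df[c].nunique() / len(df) > 0.5]

df = df.drop(columns=too_unique)
print(too_unique)
```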

Understanding the Target Variable

The target variable I created was ‘goal_reached’: whether or not the project met its funding goal. Interestingly, 53.5% of the projects reached their funding goal, indicating that Kickstarter already gives entrepreneurs a very good chance at getting funded. Or, it could indicate that the scraper from webrobots.io collected mostly successful projects.
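A binary target like this can be derived by comparing the amount pledged with the goal. The sketch below assumes columns named `pledged` and `goal` and uses made-up numbers; the real dataset may name or encode these differently.

```python
import pandas as pd

# Made-up campaigns; 'goal' and 'pledged' are assumed column names.
df = pd.DataFrame({
    "goal":    [5000, 1000, 20000, 300],
    "pledged": [6200,  400, 20000,   0],
})

# 1 if the campaign raised at least its goal, else 0.
df["goal_reached"] = (df["pledged"] >= df["goal"]).astype(int)

# Share of each class, useful for checking how balanced the target is.
balance = df["goal_reached"].value_counts(normalize=True)
print(balance)
```

Checking the class balance first matters because accuracy is only easy to interpret when the two classes are roughly even, as they are here.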

Feature Creation and Engineering

I kept feature creation and engineering for this project simple. Overall, I transformed 3 features and created 7 new ones. I removed 4 columns because more than 90% of their values were null. PCA did not make sense for this data because I already had fewer than 40 columns in total.
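Dropping mostly-empty columns can be done by computing the share of nulls per column. This is a minimal sketch on toy data; the column names are placeholders rather than the four columns that were actually removed.

```python
import numpy as np
import pandas as pd

# Toy data with placeholder column names.
df = pd.DataFrame({
    "goal": np.arange(20, dtype=float),
    "friends": [np.nan] * 19 + [1.0],         # 95% null
    "blurb_len": [np.nan] * 5 + [10.0] * 15,  # 25% null
})

# Fraction of missing values in each column.
null_share = df.isna().mean()

# Drop every column where more than 90% of the values are null.
mostly_null = null_share[null_share > 0.9].index.tolist()
df = df.drop(columns=mostly_null)
print(mostly_null)
```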

Here are the features I created:

And here are the features I modified. These three variables had right skews greater than 0.2. I fixed them with a log transformation, which added about 0.05 to my model accuracy.

To fix right skew: df['column_name'] = np.log1p(df['column_name'])
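To see the effect of the transformation, it helps to measure skew before and after. The sketch below uses synthetic log-normal data as a stand-in for a heavily right-skewed column such as a funding goal; the column name `goal` and the distribution parameters are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data standing in for a column like the funding goal.
rng = np.random.default_rng(0)
df = pd.DataFrame({"goal": rng.lognormal(mean=8, sigma=1.5, size=5000)})

skew_before = df["goal"].skew()

# log1p computes log(1 + x), so zero values stay finite.
df["goal_log"] = np.log1p(df["goal"])
skew_after = df["goal_log"].skew()

print(round(skew_before, 2), "->", round(skew_after, 2))
```

`np.log1p` is a safer choice than `np.log` here because columns such as pledged amounts can contain zeros.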

Modeling + Analysis

The measures I used for these models were accuracy and precision.

I believed that precision was important because it measures how good the model is at minimizing false positives: in this case, Kickstarters that were predicted to succeed but actually failed. In business, one wants confidence that an investment of time and money predicted to succeed will actually do so.
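For reference, precision is TP / (TP + FP), so it drops whenever a campaign is predicted successful but actually fails. Here is a small worked example in plain Python with made-up labels, where 1 means "funded":

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted funded, was funded
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted funded, actually failed
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision

# Made-up labels: 3 true positives, 2 false positives, 2 true negatives, 1 false negative.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(accuracy, precision)  # 0.625 0.6
```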

Model 0: Random Forest Classifier

I started off with a random forest classifier model and the results, while strong, were not informative.

While the model predicted Kickstarter success with ~93% accuracy and precision, we see that the variables “Backer Count” and “USD Pledged” overwhelm the model. These two variables are too obvious. Of course the more people you have pledging, the more likely you are to meet your goal. Because the model weighs these two variables so heavily, it fails to elucidate the granular mechanisms that make Kickstarters work. We needed to remove these two variables from our set.

Model 1: Random Forest Classifier

After removing “Backer Count” and “USD Pledged”, I wanted to try the Random Forest Classifier again. Running a grid search, I found that the model with n_estimators = 10 performed best with a max_depth of 20.
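A grid search over tree depth might look roughly like the following scikit-learn sketch. The synthetic data, the candidate depths, and the 3-fold cross-validation are assumptions for illustration and do not reproduce the original search.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Kickstarter feature matrix.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hold n_estimators at 10 and search over tree depth only.
param_grid = {"max_depth": [5, 10, 20, None]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Swapping `scoring="accuracy"` for `scoring="precision"` would tune for the other metric used in this article.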

The model returned an accuracy of .77 and a precision of .76 — not bad but perhaps another model could improve on this.

Model 2: Gradient Boosted Classifier

The second model I tried was another tree model — Gradient Boosted Classifier. With a grid search, I found the optimal learning rate at 0.8.

The model returned a slightly better accuracy and precision. Definitely the best model so far.

Model 3: K-Nearest Neighbors Classifier

The last model I tried was K-Nearest Neighbors (KNN). Optimizing for n_neighbors, I found that the best classifier was at n_neighbors = 9. Surprisingly, while the accuracy score of .73 was below that of the random forest and gradient boosted models, the precision was slightly better than either. However, I chose to drop this model because it took over half an hour to run, making analysis too tedious.

Ultimately, I chose the Gradient Boosted Classifier because it provided the best mix of accuracy and precision and ran within a reasonable time. I did try SVM models as well, but like KNN, they took too long to run.
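The trade-off between score and run time can be made explicit by timing each candidate on the same split. The sketch below plugs the hyperparameters mentioned above into scikit-learn estimators, but it runs on small synthetic data, so its scores and timings will not match the ones reported in this article.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the Kickstarter features and target.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=10, max_depth=20, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(learning_rate=0.8, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=9),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, preds),
        "precision": precision_score(y_test, preds),
        "seconds": time.perf_counter() - start,  # fit + predict time
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

KNN does most of its work at prediction time, which is one reason it slows down sharply on a dataset with hundreds of thousands of rows.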

Takeaways

With my final model, I was able to produce an accuracy and precision of ~79%. In my opinion, this model predicts success on Kickstarter relatively well given the time spent and computing power available. What features are most important to the model? I compiled 2 charts to answer this question.

This first chart shows feature importance from the gradient boosting classifier package.
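For readers who want to build a similar ranking, scikit-learn exposes impurity-based importances on a fitted gradient boosting model through `feature_importances_`. This sketch uses synthetic data and placeholder feature names, so the ranking it prints says nothing about Kickstarter.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder feature names and synthetic data.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

model = GradientBoostingClassifier(learning_rate=0.8, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort to get a ranking.
importances = (
    pd.Series(model.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

These importances show how much each feature is used by the trees, not whether its effect on the prediction is positive or negative.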

This second chart shows feature importance according to SHAP (SHapley Additive exPlanations) impact scores, which measure the marginal contribution of each feature to the model’s predictions:
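To make "marginal contribution" concrete, here is a tiny exact Shapley value calculation in plain Python. It is not the SHAP library and not the model from this article: the `toy_value` function and its numbers are hypothetical, chosen only to show how credit is averaged over every order in which features can be added.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        seen = set()
        for f in order:
            before = value(frozenset(seen))
            seen.add(f)
            totals[f] += value(frozenset(seen)) - before
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_value(subset):
    """Hypothetical change in predicted success when only `subset` is known."""
    score = 0.0
    if "staff_pick" in subset:
        score += 0.20
    if "goal" in subset:
        score -= 0.10
    if "staff_pick" in subset and "repeat_creator" in subset:
        score += 0.06  # interaction effect, shared between the two features
    return score

phi = shapley_values(["staff_pick", "goal", "repeat_creator"], toy_value)
print({f: round(v, 2) for f, v in phi.items()})
# {'staff_pick': 0.23, 'goal': -0.1, 'repeat_creator': 0.03}
```

The values sum to the total effect of all three features (0.16), and the sign of each value shows the direction of its contribution, which is what makes a SHAP chart more informative than a plain importance ranking.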

Now I will go through key insights that we can glean from both these charts.

Page Position Matters. This is the most important feature in both charts. The higher a project sits on its Kickstarter page, the better its chance of success. Try to understand how you can make Kickstarter’s page ranking algorithm work in your favor.
Lower your goal. A lower goal is correlated with getting funded more often. The lower your goal, the fewer backers and the less funding you need to hit it. The median goal on Kickstarter is 5,000 USD.
Get Picked by Staff. staff_pick is a high-impact, positively correlated feature. Kickstarter spotlights such as Kickstarter’s “Featured Product” or “Project We Love” really matter. How does one get picked? If you believe your product is exceptional, then email stories@kickstarter.com until they take action.
Pick a shorter timeline. Somewhat counterintuitively, funding period is negatively correlated with success. Perhaps a shorter timeline forces potential funders to act quickly instead of dawdling and deliberating. The median timeline is 720 hours, or about 30 days (Kickstarter caps campaigns at 60 days). Pick something shorter than this.
Find someone experienced with Kickstarter. repeat_creator, which measures whether a project was created by a user with multiple Kickstarters under their belt, was a key feature in both charts.
Category Analysis: According to SHAP, food, games, tech, and art had the highest impact on the model. Of these 4, only games was positively correlated, suggesting that food, tech, and art are the most competitive. Only games and food were in the top 10 in terms of model impact.
Capitalization and Title / Blurb Length barely matter. Of the variables pertaining to capitalization and length, only title capitalization made the top 10. Even so, its impact on the model was low.
is_starrable had a negative correlation with success. However, I couldn’t find any information online about what is_starrable refers to.

Note that you cannot infer causality directly from these graphs, but I do believe they contain enough information to make good business decisions.

Revisiting Key Questions

To answer the key questions raised at the beginning of the article:

1. Categories: What categories have the most success on Kickstarter?

All categories except for food and games were low impact, implying that the category you have positioned yourself in sets you up for neither hard failure nor success. Food was negatively correlated, which may imply that it is a tough category to succeed in.

2. Countries: Can a product from Brazil have the same chance at success as a product from San Francisco?

Features regarding location had little to no impact on the model. On Kickstarter, any business in the world can make it.

3. Page Elements: Do offerings like “Featured Project” help?

Very much so. Try your best to get picked by staff, especially for the staff spotlight.

4. Goals: How high should you set your target dollar goal, and how long should you open the funding period for?

The true answer is probably specific to each product category as well as your own funding goals. For instance, if you actually need $100,000 to launch your product but set your goal at only $5,000, then success on Kickstarter doesn’t do you much good anyway. However, in general, a lower goal amount and a shorter duration will set you up for success.
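To look at this per category, one could group the data and compare the median goal with the success rate. The sketch below uses a handful of made-up rows and assumed column names (`category`, `goal`, `goal_reached`), so its numbers are purely illustrative.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "category": ["games", "games", "games", "food", "food", "food"],
    "goal": [3000, 8000, 5000, 10000, 20000, 4000],
    "goal_reached": [1, 1, 0, 0, 0, 1],
})

# Median goal and share of funded campaigns per category.
summary = df.groupby("category").agg(
    median_goal=("goal", "median"),
    success_rate=("goal_reached", "mean"),
)
print(summary)
```

Running the same summary on the real data would show what a typical goal looks like among campaigns in your own category.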

5. Text: Do text elements like length of your project title matter?

Barely: only title capitalization made the top 10, and its impact was low.