How to get upvotes for a kernel on Kaggle?
An analysis of Kaggle’s public data
I recently joined Kaggle and started creating public kernels. My kernels get a lot of views, but no upvotes. Fortunately, there is the Meta Kaggle dataset, which contains data on competitions, users, submissions, and kernels. We can use this dataset to find out:
- How many kernels receive votes at all;
- How different factors (for example, characteristics of the author or the source dataset) affect the number of votes;
- What recommendations follow for making a kernel useful enough that other Kaggle users upvote it.
Explore statistics for the number of kernels
If we just count the total number of kernels and compare it to the number of ones which have upvotes, we can see that writing a popular kernel is not an easy task.
There are more than 220 thousand kernels on Kaggle in total. Only 20% of them were upvoted by Kaggle users, and only 4% are awarded, meaning they have more than 5 upvotes (see Plot 1).
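Shares like these can be computed with a few lines of pandas. The snippet below is a minimal sketch on made-up toy data, not the real Meta Kaggle numbers. The column name `TotalVotes` is an assumption about how the kernels table is laid out.

```python
import pandas as pd

# Toy stand-in for the kernels table (values are invented for illustration).
# The column name TotalVotes is an assumption, not confirmed by the article.
kernels = pd.DataFrame({"TotalVotes": [0, 0, 0, 0, 1, 0, 0, 12, 0, 3]})

total = len(kernels)
# Share of kernels with at least one upvote.
upvoted_share = (kernels["TotalVotes"] > 0).mean()
# Share of "awarded" kernels, using the article's definition: more than 5 upvotes.
awarded_share = (kernels["TotalVotes"] > 5).mean()

print(f"{total} kernels, {upvoted_share:.0%} upvoted, {awarded_share:.0%} awarded")
```

Taking the mean of a boolean column gives the fraction of rows where the condition holds, which avoids a separate count-and-divide step.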
How does the number of views and number of comments affect the number of votes?
My hypothesis is that the more people view or discuss a kernel, the more votes it gets. To test this assumption, let’s first look at the correlation between the number of votes and the number of views and comments:
These numbers are indeed highly correlated. We can also plot the number of votes against the number of views and comments and add a fitted regression line that shows the relationship between them:
Looking at the plots, the data is consistent with my idea. To gain upvotes from users, a kernel needs to be shared with others, seen, and discussed.
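For readers who want to reproduce this kind of check, here is a rough sketch of a correlation table and a straight-line fit. The data is a made-up toy sample, and the column names `TotalViews`, `TotalComments`, and `TotalVotes` are assumptions. The article does not say which correlation method was used, so Pearson is only one plausible choice.

```python
import numpy as np
import pandas as pd

# Toy sample (invented values); column names are assumptions.
kernels = pd.DataFrame({
    "TotalViews": [10, 50, 120, 300, 800, 1500],
    "TotalComments": [0, 0, 1, 2, 5, 9],
    "TotalVotes": [0, 1, 2, 4, 11, 20],
})

# Pairwise Pearson correlation between all numeric columns.
corr = kernels.corr(method="pearson")
# Keep only the correlations of votes with the other two columns.
votes_corr = corr["TotalVotes"].drop("TotalVotes")
print(votes_corr)

# Least-squares straight line: votes ~ slope * views + intercept.
slope, intercept = np.polyfit(kernels["TotalViews"], kernels["TotalVotes"], deg=1)
print(f"votes ~ {slope:.4f} * views + {intercept:.2f}")
```

A high correlation only shows that the quantities move together. It does not by itself show that views cause votes.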
How does the status of the author affect the number of votes per kernel?
Kaggle has its own progression system: users are assigned performance tiers depending on their proficiency and contribution.
Indeed, Plot 5 shows that kernels created by more proficient authors gain more votes on average.
How does the dataset related to a kernel affect the number of votes?
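An average per tier could be obtained by joining kernels to their authors and grouping. The sketch below uses invented toy tables. The names `Id`, `PerformanceTier`, `AuthorUserId`, and `TotalVotes` are assumptions about the table layout.

```python
import pandas as pd

# Toy tables (invented values); all column names are assumptions.
users = pd.DataFrame({"Id": [1, 2, 3], "PerformanceTier": [0, 2, 4]})
kernels = pd.DataFrame({
    "AuthorUserId": [1, 1, 2, 2, 3, 3],
    "TotalVotes": [0, 1, 2, 4, 10, 30],
})

# Attach each author's tier to their kernels.
merged = kernels.merge(users, left_on="AuthorUserId", right_on="Id", how="left")
# Average number of votes per kernel within each tier.
votes_by_tier = merged.groupby("PerformanceTier")["TotalVotes"].mean()
print(votes_by_tier)
```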
Kernels on Kaggle use datasets released on Kaggle as data sources. Datasets also gain votes from users. Let’s try to find out how the popularity of a dataset affects the number of votes for the related kernels.
I plotted the average number of kernel votes against the number of votes and downloads for the dataset that was used as the data source (see Plot 6 and Plot 7).
It looks like there is no dependency between the number of votes for a kernel and the number of votes or downloads for the dataset used as a data source. I suppose that we can create a really helpful and popular kernel for an unpopular dataset and vice versa.
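One way to look for such a dependency is to bucket datasets by popularity and compare the average kernel votes per bucket. This is a sketch on invented toy data. The column names `DatasetVotes` and `KernelVotes` and the bin edges are hypothetical choices for illustration, not the ones behind the plots.

```python
import pandas as pd

# Toy sample (invented values): one row per kernel, with the vote count of
# its source dataset. Column names are hypothetical.
kernels = pd.DataFrame({
    "DatasetVotes": [1, 3, 8, 15, 40, 90, 200, 450],
    "KernelVotes": [2, 0, 5, 1, 0, 4, 3, 1],
})

# Bucket datasets by popularity; the bin edges are an arbitrary choice.
kernels["DatasetVotesBin"] = pd.cut(
    kernels["DatasetVotes"],
    bins=[0, 10, 100, 1000],
    labels=["1-10", "11-100", "101-1000"],
)

# Average kernel votes within each popularity bucket.
avg_by_bin = kernels.groupby("DatasetVotesBin", observed=True)["KernelVotes"].mean()
print(avg_by_bin)
```

If the averages show no clear trend from the least to the most popular bucket, that supports the "no dependency" reading.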
How does kernel language affect the number of votes?
Kernels on Kaggle can have different language types, for example, Python scripts, Python notebooks, R scripts etc. The plot below shows the average number of votes for each language type:
It looks like descriptive kernels that use markdown are more appreciated by Kaggle users and gain more votes on average.
How do kernel tags affect the number of votes?
Authors can add tags to their kernels. We can plot the average number of kernel votes for each of the 20 most popular tags on Kaggle:
It is also interesting to find out which tags have the greatest average number of votes:
According to the plots, kernels tagged with the most popular tags on Kaggle do not score the greatest number of votes.
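The two rankings (most used tags versus best voted tags) can be derived from the same per-tag summary. The sketch below uses invented toy tables. The names `KernelId`, `Tag`, and `TotalVotes` are hypothetical.

```python
import pandas as pd

# Toy tables (invented values); column names are hypothetical.
# One row per (kernel, tag) pair, since a kernel can carry several tags.
kernel_tags = pd.DataFrame({
    "KernelId": [1, 1, 2, 3, 3, 4, 5],
    "Tag": ["eda", "beginner", "eda", "eda", "nlp", "beginner", "nlp"],
})
kernels = pd.DataFrame({"KernelId": [1, 2, 3, 4, 5], "TotalVotes": [2, 0, 9, 1, 15]})

# Attach vote counts to every (kernel, tag) pair.
tagged = kernel_tags.merge(kernels, on="KernelId")
# Per tag: how many kernels use it and how many votes they get on average.
stats = tagged.groupby("Tag")["TotalVotes"].agg(kernel_count="count", mean_votes="mean")

most_used = stats.sort_values("kernel_count", ascending=False).head(1)
best_voted = stats.sort_values("mean_votes", ascending=False).head(1)
print(most_used)
print(best_voted)
```

In this toy sample the most used tag and the best voted tag differ, which mirrors the pattern described above. On the real data, `head(20)` would give the top-20 lists.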
In conclusion, I would like to summarize all the findings and recommendations from this analysis:
- It is hard to create a really helpful kernel, which will be appreciated and upvoted by Kagglers: only 20% of kernels have upvotes and only 4% of kernels have awards (have more than 5 upvotes).
- Views and comments bring upvotes: consider giving the kernel a captivating title and sharing the link with others. The more people view the kernel, the more people will find it useful.
- Active authors have more votes: try to be an active author and gain visibility. Experience in writing kernels and feedback from others will eventually help to get votes.
- It doesn’t really matter what topic the kernel is related to, but it matters how the kernel material is presented: notebooks tend to be more appreciated by Kagglers.