Understanding Research Trends in Recommender Systems from Word Cloud

Takuya Kitazawa
Sep 1, 2018

The field of recommender systems is growing rapidly alongside the recent development of practical intelligent systems. Although the field is exceptionally practical compared with other computer science topics, many researchers also actively study recommendation techniques in their labs. That raises a question: what are the research trends in recommender systems?

I tried to understand the trends from word clouds built from the abstracts of papers accepted at the ACM RecSys Conference, one of the major conferences on recommender systems.

Collecting abstracts of accepted papers

Luckily, the abstracts of accepted RecSys papers are published on web pages with a consistent HTML structure, e.g., RecSys 2017 Accepted Contributions. So, first we collect the data as text with a scraping tool; this article uses Scrapy:

pip install scrapy

Using Scrapy is quite easy: once we implement a spider module and run it from the command line, the scraped abstracts are stored in a CSV file:

$ scrapy runspider recsys_spider.py -a yy=17 -o csv/recsys17.csv

Since the well-structured “accepted contributions” page only exists for RecSys 2014 to 2017, the argument yy takes one of just four values: 14, 15, 16, or 17. Note that xargs is handy for collecting the abstracts of all four years at once:

$ printf '%s\n' 14 15 16 17 | xargs -I{} scrapy runspider recsys_spider.py -a yy={} -o csv/recsys{}.csv
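
For readers who would rather stay in Python than use xargs, the same loop can be sketched with the standard library. This is only a sketch: the helper names build_command and scrape_all are made up for illustration, and the output path assumes the csv/ directory used in the single-year command above.

```python
import subprocess

# The four values of yy for which an "accepted contributions" page exists.
YEARS = ["14", "15", "16", "17"]


def build_command(yy):
    """Return the argument list that scrapes one year's accepted papers."""
    return [
        "scrapy", "runspider", "recsys_spider.py",
        "-a", f"yy={yy}",
        "-o", f"csv/recsys{yy}.csv",
    ]


def scrape_all(years=YEARS):
    """Run the spider once per year and stop at the first failing run."""
    for yy in years:
        subprocess.run(build_command(yy), check=True)
```

Calling scrape_all() from the directory that contains recsys_spider.py should then produce one CSV file per year, provided Scrapy is installed.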

Creating word clouds

A word cloud generator written in Python gives us a really simple way to create visually stimulating word clouds from text data:

pip install wordcloud

In fact, the generator can be used from within our own Python code, but for the sake of simplicity this article just uses its command-line tool:

$ wordcloud_cli.py --text input.csv --imagefile output.png

Let’s create one word cloud image per year from the aggregated abstracts of the four years of RecSys papers:

$ printf '%s\n' 14 15 16 17 | xargs -I{} wordcloud_cli.py --text csv/recsys{}.csv --imagefile png/recsys{}.png

[Word cloud images for RecSys 2014, 2015, 2016, and 2017]

Okay… the result is trivial. Everyone uses the common terminology of the field, such as “recommendation,” “model,” “user,” and “item,” when writing an abstract.
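
The same effect shows up without drawing anything. The sketch below counts word frequencies using only the standard library. The function name top_terms and the sample sentences are made up for illustration and are not taken from the scraped data, and the simple regular expression is not necessarily how the word cloud generator tokenizes text.

```python
import re
from collections import Counter


def top_terms(text, n=5):
    """Return the n most frequent lowercase words in text with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)


# Made-up sentences standing in for scraped abstracts.
sample = (
    "We propose a recommendation model for user and item data. "
    "The model learns user preferences from item ratings."
)

print(top_terms(sample, 3))
```

Even in this tiny sample, the three most frequent words are the generic terms "model," "user," and "item," each appearing twice.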

To make the word cloud images more meaningful, we can use custom stop words, i.e., a list of words that we want to exclude when creating the images. As an example, here are some frequently used terms that should be omitted:

user users item items recommendation recommendations model models content based algorithm algorithms recommender system systems data method using new show use proposed result paper information propose approach approaches dataset technique techniques provide different problem methods method one two present work task results However feature preference preferences

(The complete list, which also contains other uninformative terms such as prepositions, can be found here.)

Again, create the word cloud images, this time with the --stopwords option:

$ printf '%s\n' 14 15 16 17 | xargs -I{} wordcloud_cli.py --text csv/recsys{}.csv --imagefile png/recsys{}.png --stopwords stopwords.txt
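
To get a feel for what stop words do to the counts, here is a minimal standard-library sketch. It assumes the stop words are separated by whitespace (spaces or newlines) and matches them case-insensitively. The names load_stopwords and top_terms_filtered, the sample text, and the few extra function words are my own additions for illustration, and the tokenization is not necessarily what the word cloud generator does internally.

```python
import re
from collections import Counter


def load_stopwords(raw):
    """Split stop-word text on any whitespace and lowercase each entry."""
    return {word.lower() for word in raw.split()}


def top_terms_filtered(text, stopwords, n=5):
    """Return the n most frequent words in text that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in stopwords]
    return Counter(kept).most_common(n)


# Some entries from the list above plus a few function words (an assumption).
stopwords = load_stopwords(
    "user users item items model models However preference preferences "
    "the a and to each"
)

# Made-up sentences standing in for scraped abstracts.
sample = (
    "However, the model maps each user and item to a latent embedding. "
    "Latent factors explain user preferences."
)

print(top_terms_filtered(sample, stopwords, 1))
```

With the generic terms filtered out, the most frequent remaining word in the sample is "latent," which is the kind of topic-specific term we hope the word clouds will surface.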

Keep reading original post at takuti.me

Written by Takuya Kitazawa, a data science engineer at Arm Treasure Data and committer of Apache Hivemall.
