Building a web app to capture sports trends on Instagram

How Decathlon capitalizes on social listening

Yan Gobeil
Decathlon Digital
7 min read · Sep 8, 2020


One of the strengths of the Artificial Intelligence team at Decathlon Canada is extracting information from images. In the past we have spent a good amount of time building algorithms to analyse sports images (see, for example, parts one, two and three of our image classification story). This work led us to release the Sport Vision API, which can be used to extract the following data from an image:

  • Whether a sport is displayed or not
  • Sport practiced
  • Location (indoor/outdoor)
  • Sports equipment displayed and its color attributes

Of course, all this information is cool to have, but the question is: what do we do with it? In this post I will describe how we used it to build a social listening web app.

What is social listening?

Social media platforms like Facebook, Twitter and Instagram have become an integral part of our lives in recent years, so much so that "as of July 2020, more than half of all the people on Earth use social media". The users of those platforms are highly active: according to this source, every minute 147k images are uploaded to Facebook and 347k stories to Instagram.

There are more than 950k posts on Instagram that contain #decathlon

Given this huge amount of continuously created content, the data that can be obtained about social media users is practically limitless. A large part of these users are also consumers so, as a company, it is worth asking what kind of information we could collect that would be useful to us. Monitoring social media to gain insights about how to improve your brand is known as social listening.

Many different reasons can motivate a company to do this. In our case, we wanted to know what people think about the Decathlon brands and how customers are using our products, and even to find influencers to work with. We also wanted to use social media to collect statistics about sports popularity per country and language, and to find images to use in marketing campaigns.

“Listening” to Instagram images

The most popular way of listening to social media is to look at what people are writing. Given the expertise of our team, we decided to take another direction and look at the images that people post online. We focused our attention on the obvious platform for this: Instagram.

We built a web app that effectively works as a search engine. The user selects values for various fields, and the app displays the Instagram images that match them. The available fields are the following:

  • Hashtag, from a list of Decathlon brands (kalenji, domyos, artengo, etc.)
  • Sport, from a list of 152 (taken from our Sports API)
  • Location (indoor/outdoor)
  • Language of the caption

The user can also choose between images posted in the last week or in the last month. There is also an option to show only trending media, as characterized by Instagram. Finally, an advanced search option lets the user write any sentence, and the images that best match the description are displayed.

Menu to search images on Instagram
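To make the search fields above more concrete, here is a minimal sketch of how a form submission could be translated into a MongoDB-style query document. The field names (is_sport, hashtags, sport, location, language, trending, timestamp) and the 30-day definition of "last month" are assumptions for illustration, not the app's actual schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed time windows; "month" is approximated as 30 days.
PERIODS = {"week": timedelta(days=7), "month": timedelta(days=30)}


def build_filter(
    hashtag: Optional[str] = None,
    sport: Optional[str] = None,
    location: Optional[str] = None,
    language: Optional[str] = None,
    period: str = "week",
    trending_only: bool = False,
    now: Optional[datetime] = None,
) -> dict:
    """Translate the (hypothetical) search-form fields into a MongoDB-style query document."""
    now = now or datetime.now(timezone.utc)
    # Only images labelled as sport are searched; restrict to the chosen time window.
    query: dict = {"is_sport": True, "timestamp": {"$gte": now - PERIODS[period]}}
    if hashtag:
        # In MongoDB, an equality match on an array field matches if the array contains the value.
        query["hashtags"] = hashtag.lower()
    if sport:
        query["sport"] = sport
    if location:
        query["location"] = location
    if language:
        query["language"] = language
    if trending_only:
        query["trending"] = True
    return query
```

For example, build_filter(hashtag="decathlon", sport="kayaking", period="month") would describe the kayaking search shown in the next figure. Fields left empty simply add no constraint.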

Once the request is done, the images are displayed in a grid, each with links to download it or to open the original post for more details, and an indicator showing whether the post is trending. Clicking on an image shows the information that was collected about it and used for the search.

Example of results obtained when searching for images of kayaking with #decathlon posted in the last month.

As an example of a direct application, suppose the social media team of Decathlon Italy wants to promote their yoga products. They can search for images of yoga with captions in Italian and focus on the trending posts. The results can then be used to find user-generated content to share in their communications with customers, or even to find influencers to work with to promote their products.

How we do it

Now that you know what the app looks like and what it does, let me describe in more detail how we collect all the information and where the AI comes into play. The main tool that we use is the Instagram API. Every day, it allows us to collect all the images posted in the last 24 hours with specific hashtags. We also get some interesting data about each of these posts, like the caption (from which we can extract the hashtags), the number of likes and the publication date. Another endpoint of the API gives us a list of trending posts for our chosen hashtags, which lets us add a trending tag to some images.
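As an illustration of the per-post processing described above, the sketch below extracts hashtags from a caption and keeps the likes and publication date. It assumes raw posts shaped like Instagram Graph API media objects (fields caption, like_count, timestamp, with timestamps such as "2020-09-08T14:03:12+0000"); the article does not say which endpoints or fields were actually used, and the hashtag pattern is a simplification.

```python
import re
from datetime import datetime

# Simplified hashtag pattern (an assumption): '#' followed by letters, digits or '_'.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption: str) -> list[str]:
    """Return the unique, lower-cased hashtags of a caption, in order of first appearance."""
    tags: list[str] = []
    for tag in HASHTAG_RE.findall(caption or ""):
        if tag.lower() not in tags:
            tags.append(tag.lower())
    return tags


def parse_post(raw: dict) -> dict:
    """Keep the fields used downstream: caption, its hashtags, likes and publication date."""
    caption = raw.get("caption", "")
    return {
        "id": raw["id"],
        "caption": caption,
        "hashtags": extract_hashtags(caption),
        "likes": raw.get("like_count", 0),
        # %z accepts offsets written as "+0000".
        "published": datetime.strptime(raw["timestamp"], "%Y-%m-%dT%H:%M:%S%z"),
    }
```

Storing hashtags as a lower-cased list makes later filtering and counting case-insensitive, so #Kalenji and #kalenji are treated as the same tag.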

After getting the images, we send them to the different endpoints of the Sport Vision API, which return all the information about their contents mentioned above. We also use a Language Detection API to detect the language of each caption. The final step is to compute a vector representation of each image, which is used in the advanced search. Without going into too much detail, this is done with a pretrained neural network (SBERT) that takes all the words characterizing the image (sport, caption, objects, etc.) and encodes them into a single vector. All the information is finally saved in a database.
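A minimal sketch of this enrichment step might look as follows. The shape of the vision dictionary (a sport, a location and a list of (object, color) pairs) is a hypothetical summary of the Sport Vision API results, and encode is passed in as a parameter; with the sentence-transformers library it could be the encode method of a SentenceTransformer model, but the exact model the team used is not stated.

```python
from typing import Callable, Sequence


def describe_image(vision: dict, caption: str) -> str:
    """Concatenate the words characterizing an image into one text to encode.

    `vision` is a hypothetical summary of the Sport Vision API results, e.g.
    {"sport": "yoga", "location": "indoor", "objects": [("mat", "blue")]}.
    """
    parts = [vision.get("sport", ""), vision.get("location", "")]
    parts += [f"{color} {name}" for name, color in vision.get("objects", [])]
    parts.append(caption)
    return " ".join(p for p in parts if p)


def build_record(post: dict, vision: dict, language: str,
                 encode: Callable[[str], Sequence[float]]) -> dict:
    """Merge post metadata, vision labels, caption language and the text embedding."""
    text = describe_image(vision, post.get("caption", ""))
    # Convert to plain floats so the vector can be stored in a database document,
    # even if the encoder returns a NumPy array.
    embedding = [float(x) for x in encode(text)]
    return {**post, **vision, "language": language, "embedding": embedding}
```

Keeping the encoder as a parameter makes it easy to swap models or to test the pipeline without downloading a neural network.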

Only the images that are labelled as sport are considered when performing a search. When a sentence is written in the advanced search, the search is performed in two steps. First, the sentence is encoded into a vector by the neural network, together with all the words in the search fields. The vectors in the database that are closest to this query vector are retrieved, and the filters are then applied to keep only the relevant images. If no advanced search is done, the search simply compares the values of the fields.

Results of a search for top media with “swimsuit” in the advanced search. Returned images most likely contain a swimsuit or have mentions related to swimsuits in their captions.
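The two-step advanced search can be sketched as a brute-force nearest-neighbour ranking followed by filtering. Cosine similarity is an assumption here (the article only says "closest"), as are the record fields is_sport and embedding; a production system would more likely rely on a vector index than scan every record.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec: Sequence[float], records: list[dict],
                    keep: Callable[[dict], bool], k: int = 50) -> list[dict]:
    """Step 1: rank sport images by similarity to the query vector and keep the k closest.
    Step 2: apply the field filters (the `keep` predicate) to those candidates."""
    candidates = [r for r in records if r.get("is_sport")]
    candidates.sort(key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)
    return [r for r in candidates[:k] if keep(r)]
```

Because the filters are applied after the top k images are retrieved, a restrictive filter can return fewer than k results; filtering first and ranking second is the alternative trade-off.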

For those interested in the technical details, the app is split into three components: the cron tasks, the backend and the frontend, all hosted on Heroku. The cron tasks use the Heroku scheduler to run a Python script daily that performs all the data collection. The collected information is saved in a cloud-hosted MongoDB database, which the backend, also written in Python, queries on demand. The frontend is written with Vue.js (more specifically Vuetify) and calls the backend for information.
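The daily cron script could be organized around a single entry point like the hypothetical run_daily_collection below. Every dependency (fetching recent posts, fetching trending post IDs, enrichment and saving) is injected as a function, so none of the real APIs are assumed; with PyMongo, save could for instance be lambda r: collection.update_one({"id": r["id"]}, {"$set": r}, upsert=True).

```python
from typing import Callable, Iterable


def run_daily_collection(
    hashtags: Iterable[str],
    fetch_recent: Callable[[str], list[dict]],
    fetch_top_ids: Callable[[str], Iterable[str]],
    enrich: Callable[[dict], dict],
    save: Callable[[dict], None],
) -> int:
    """Collect, enrich and save each recent post once; return the number of posts saved."""
    tags = list(hashtags)
    # A post counts as trending if it appears among the top posts of any tracked hashtag.
    trending_ids = set().union(*(fetch_top_ids(t) for t in tags))
    posts: dict[str, dict] = {}
    for tag in tags:
        for post in fetch_recent(tag):
            # Deduplicate posts that use several tracked hashtags.
            posts.setdefault(post["id"], post)
    for post in posts.values():
        save(enrich({**post, "trending": post["id"] in trending_ids}))
    return len(posts)
```

Deduplicating by post ID before enrichment avoids calling the vision and language APIs twice for an image tagged, say, with both #decathlon and #kalenji.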

Future steps

So far our tool looks great and is functional, but it is still in its early stages. We are working with testers to evaluate its performance and gather feedback on useful features that we could add. In the future we plan to monetize the app by selling it to users across the Décathlon network. This will be a big step and will involve a lot of work.

One aspect of the app that I have not discussed is the statistics that we collect. Some pages display plots of the number of posts per hashtag and of the most popular sports for each hashtag. Other plots show information collected from other sources: we study the popularity of sports per country based on the number of Google searches and the number of tweets that mention each sport. We plan to add new data sources.

Top 10 most popular sports in the images with #decathlon posted in the last month.
Top 5 most popular sports in Canada in the last week, based on Google searches.
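The per-hashtag statistics behind plots like these can be sketched with collections.Counter. The record fields (hashtags, is_sport, sport) are the same illustrative assumptions as elsewhere; the same counts could also be computed inside MongoDB with an aggregation pipeline using $unwind and $group.

```python
from collections import Counter, defaultdict


def posts_per_hashtag(records: list[dict]) -> Counter:
    """Number of posts containing each hashtag (each post counted once per tag)."""
    return Counter(tag for r in records for tag in set(r.get("hashtags", [])))


def top_sports_per_hashtag(records: list[dict], n: int = 10) -> dict[str, list[tuple[str, int]]]:
    """For each hashtag, the n most frequent sports among images labelled as sport."""
    per_tag: dict[str, Counter] = defaultdict(Counter)
    for r in records:
        if r.get("is_sport") and r.get("sport"):
            for tag in set(r.get("hashtags", [])):
                per_tag[tag][r["sport"]] += 1
    return {tag: counts.most_common(n) for tag, counts in per_tag.items()}
```

With n=10 and the records of the last month, top_sports_per_hashtag would produce the kind of ranking shown for #decathlon above.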

We are hiring!

Are you interested in computer vision and the application of AI to improve sport accessibility? Luckily for you, we are hiring! Follow https://developers.decathlon.com/careers to see the different exciting opportunities.

Let us know if you have any comments or suggestions about the topic of this article, and don't hesitate to share it with your network if you liked it :)

A special thanks to the members of the AI team at Décathlon Canada for the comments and review.


I am a data scientist at Décathlon Canada working on generating intelligence from sports images. I aim to learn as much as I can about AI and programming.