Implicit ALS for community recommendation

Jean-Baptiste Delafosse
Published in LumApps Experts · Apr 23, 2021 · 6 min read
Photo by Dylan Gillis on Unsplash

At LumApps, we believe horizontal communication can have a massive impact on people and organizations. It helps people learn more, faster, and better. It makes innovation easier by encouraging interdisciplinary cooperation. It helps organizations do what matters to their customers more efficiently.

We implemented this idea within LumApps in what we call “Communities”, where individuals can communicate and collaborate within their organization in any way they deem necessary.

We’ve seen great use cases for communities implemented by our customers: from the usual communities of practice that foster creativity and expertise, to communities for people who want to talk about their pets.

Many kinds of communities exist within LumApps

Having too many communities on your intranet can be a burden, though: users can experience fear of missing out. A recommendation system for communities is one of the many solutions that could help. Such a system should help users discover new communities they might like.

We wanted to assess how hard it would be to implement such a recommendation system using Spark MLlib. While the math is fairly straightforward, we hit some roadblocks along the way that we want to share.

Starting the project

After taking a look at the state of the art for recommendation algorithms, we quickly decided on a collaborative filtering approach using the Alternating Least Squares (ALS) method. This algorithm suits our needs:

  • Fast to put in place compared to content-based approaches
  • Well documented
  • Already tested with success for many use cases

What is great about collaborative filtering is that you only need data that tells you a user is interested in an item: in our case, a community.

We don’t really have reliable explicit feedback from users on a community in LumApps: you can’t give a 5-star review to a community. We do have implicit feedback metrics already in place, though. We used the number of views a user has on a community as implicit feedback of their interest: the more views a user has on a community, the more likely they are to be interested in that community.

We could start the project.

Evaluating the model

Evaluating the model and tuning its hyperparameters was harder than anticipated.

All precision-based metrics (for example, Mean Squared Error) make no sense with implicit feedback, as the input label represents views while the output is a preference computed by ALS. One could argue we could scale views and preferences from 0 to 1, but that would be hacky at best. It should also be noted that precision-based metrics require knowing which items were disliked, which is not possible in an implicit feedback situation: a workaround would be to randomly add user-community tuples with 0 views.

Binary classification metrics (for example, accuracy) cannot be used directly either, as we output a preference, not a class. We could convert that preference to a class by defining a threshold above which the recommendation system recommends a community. The threshold itself could be considered a hyperparameter. Even so, binary classification metrics remain hacky because we still don’t know which items were disliked.
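For illustration, the thresholding workaround could look like the sketch below. The threshold value, community names, and scores are made up; in practice the threshold would be tuned like any other hyperparameter.

```python
# Turning an ALS preference score into a recommend / don't-recommend decision.
# The 0.5 threshold and the sample scores are illustrative only.
def to_recommendation(preference: float, threshold: float = 0.5) -> bool:
    """Recommend the community only if the predicted preference clears the threshold."""
    return preference >= threshold

scores = {"sales-tips": 0.91, "pets": 0.12, "python-guild": 0.55}
recommended = [c for c, p in scores.items() if to_recommendation(p)]
# recommended == ["sales-tips", "python-guild"]
```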

Recall-based metrics, such as ranking-oriented ones, are the best fit for such a recommendation system. We wanted to use a ranking evaluator in conjunction with cross-validation to find the right hyperparameters. If you want a ranking evaluator in Spark ML that is directly usable with cross-validation and ALS, you need to implement it yourself.

At first, we tried the usual implementation: rank the communities recommended for a user and compare them to the ones the user actually viewed. We quickly came to the conclusion that this was not the right approach. Our users interact with very few communities. Let’s say “John” interacted with 3 communities: if you perform a 3-fold cross-validation, you are evaluating your ranking metric on 2 communities at best.

Our second approach was to build a ranking metric of users for a given community. For a given community, we rank users based on the number of times they viewed it. We then turn the model upside down by making it predict users for a community instead of communities for a user. This works well because each community is viewed by many users.
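A toy sketch of the idea, using precision@k as a stand-in for whatever ranking metric you prefer: for one community, the model's ranked users are compared against the users who actually viewed it most. User names, view counts, and the relevance cutoff are all illustrative.

```python
# Per-community ranking evaluation: rank users by predicted preference for a
# community and compare against its actual top viewers. Data is illustrative.
def precision_at_k(predicted: list, relevant: set, k: int) -> float:
    """Fraction of the top-k predicted users that are truly relevant."""
    top_k = predicted[:k]
    return sum(1 for u in top_k if u in relevant) / k

# Actual view counts for one community: its top viewers are the "relevant" users.
actual_views = {"alice": 40, "bob": 22, "carol": 15, "dave": 2, "erin": 0}
relevant = {u for u, v in actual_views.items() if v >= 10}  # alice, bob, carol

# Users ranked by the model's predicted preference for this community.
predicted_ranking = ["alice", "carol", "erin", "bob", "dave"]

p3 = precision_at_k(predicted_ranking, relevant, k=3)  # 2/3: alice and carol in top 3
```

Averaging such a metric over all communities gives a single number usable inside cross-validation, and each community contributes many users, so the small-folds problem above disappears.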

Since we were going to use the ranking metric described above, we decided to stratify our whole dataset by community before performing an 80%/20% split for testing purposes.

In the end, our dataset is used in the following way:

  • Test set: an 80%/20% split stratified by community
  • Validation: through 3-fold cross-validation
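The stratified split can be sketched as follows: each community contributes roughly 80% of its rows to the training set and 20% to the test set, so every community is represented on both sides. The data, ratio, and seed are illustrative.

```python
# Sketch: 80%/20% split stratified by community. Illustrative data and seed.
import random
from collections import defaultdict

def stratified_split(rows, test_ratio=0.2, seed=42):
    """rows: (user, community, views) tuples; split 80/20 within each community."""
    by_community = defaultdict(list)
    for row in rows:
        by_community[row[1]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for community_rows in by_community.values():
        rng.shuffle(community_rows)
        n_test = max(1, int(len(community_rows) * test_ratio))
        test.extend(community_rows[:n_test])
        train.extend(community_rows[n_test:])
    return train, test

# 10 users viewing each of 2 communities (synthetic view counts).
rows = [(u, c, u * 3 + 1) for u in range(10) for c in ("sales", "eng")]
train, test = stratified_split(rows)
# Both communities appear in both train and test.
```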

Results

Despite all these roadblocks, we managed to train a decent recommendation system on our own LumApps deployment (dogfooding at its finest).

While the algorithm did not use any data about users’ business units, we definitely observed a correlation between the learned latent variables and the business units of users. This was expected: a lot of communities within LumApps are work-related. The biggest separation in community consumption we observed at LumApps is between “Sales” and “Engineering”.

Engineering and Sales teams use different communities and are separated by the ALS

Our UX team also conducted user interviews with a selected customer, which showed that the recommendations provided by such a system would be a great addition to our product.

Conclusion

We want to give our users great community recommendations so they can discover what their peers are up to.

The collaborative filtering approach available natively in Spark is a great way to build such a system, as it provides horizontal scalability out of the box. We had yet to test how it behaves on production data: it’s now done.

We faced a few challenges when using this algorithm: evaluating the recommendation system was not straightforward with the data we had. We found an appropriate solution by implementing a ranking evaluator that ranks users instead of communities.

The results observed are consistent with what we expected: the learned user factors were highly correlated with business units, and the community factors were correlated with the topic of each community.

User interviews performed in the field by our UX team confirm the recommendation system could be of great value for our users.

We also realised ALS could be used as a decent feature engineering method before applying other models.

The next step is to bring this POC to production 🚀
