We have a challenge at Catalant Technologies: there are more than 30,000 business experts and boutique firms using our platform, all looking for the right projects to tackle. These experts are alumni of traditional consulting firms, SMEs in niche technical fields, veterans of the world’s largest enterprises, and everything in between. The projects on our platform are just as diverse as the experts working on them, and we need to find the right projects for every expert. This is the challenge that my colleague Andy Luther and I have been working on.
When an expert signs up for Catalant, they can provide a description of themselves in their ‘About Me’ section and tag themselves with skills and industries that represent their expertise. Similarly, when a business posts a project to our site, they write a project description and can tag their project with the skills they need. Experts are able to search for projects on their own, but we want to make it easy by recommending the most relevant projects for them.
One way we could generate these recommendations is by tracking which projects experts like and dislike (‘bookmark’ or ‘hide’ on Catalant), comparing these interactions to other experts’ likes and dislikes, and recommending projects that similar experts have liked. This is called ‘collaborative filtering,’ and it covers many algorithms with their own strengths and weaknesses. Collaborative filters are widely used on shopping websites like Amazon, news services, and video content providers like YouTube.
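To make the idea concrete, here is a minimal sketch of one simple flavor of collaborative filtering. The expert names, project IDs, and the choice of Jaccard similarity over bookmarked projects are all assumptions made for illustration; this is not a description of any production system.

```python
# Hypothetical bookmark data: expert -> set of project IDs they bookmarked.
bookmarks = {
    "ana": {"p1", "p2", "p3"},
    "ben": {"p2", "p3", "p4"},
    "cho": {"p5"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(expert, bookmarks, top_n=3):
    """Score unseen projects by the similarity of the experts who liked them."""
    seen = bookmarks[expert]
    scores = {}
    for other, liked in bookmarks.items():
        if other == expert:
            continue
        similarity = jaccard(seen, liked)
        if similarity == 0:
            continue  # experts with no overlap tell us nothing
        for project in liked - seen:
            scores[project] = scores.get(project, 0.0) + similarity
    # Highest score first; ties broken by project ID for a stable order.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]


print(recommend("ana", bookmarks))
```

Note that the sketch can only recommend a project once at least one similar expert has bookmarked it.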
There is an important difference between how websites like Amazon incorporate collaborative filters and how Catalant must recommend projects: Catalant project lifetimes are very short. Products stay in Amazon’s catalog for months, and in that time they will be purchased and rated by potentially thousands of users. This allows Amazon to develop a very rich understanding of what kinds of items are purchased by the same customers and to make recommendations based on these user clusters. A Catalant project will only be accepting bids for a few days, may be interacted with by fewer than thirty experts, and needs to be recommended to the right experts the moment it is posted. This short lifetime and small number of interactions would hobble many collaborative filtering strategies.
An alternative to collaborative filtering is content-based filtering. A content-based recommender matches a user with items whose metadata (tags, genres, etc.) is similar to that of items the user has already liked. On Netflix, that means breaking down movies by their genre, actors, subject matter, or other criteria.
For Catalant, a content-based approach means recommending projects to a user that have similar industry and skill tags to projects the user has already bookmarked. There are two major drawbacks to content-based recommendation systems: they require a lot of manual processing to ensure that the project metadata is thorough and rich, and they miss out on the potential to learn from comparing different users’ behavior (called ‘transfer learning’).
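A content-based approach along these lines could be sketched as follows. The project IDs, the tags, and the scoring rule (summing how often each of a candidate's tags appears among bookmarked projects) are assumptions chosen to keep the example short.

```python
# Hypothetical tags for projects the expert has already bookmarked.
bookmarked_projects = {
    "p1": {"pricing", "retail", "market research"},
    "p2": {"pricing", "consumer goods"},
}

# Hypothetical tags for projects that are currently out for bid.
open_projects = {
    "p7": {"pricing", "retail"},
    "p8": {"supply chain", "manufacturing"},
    "p9": {"market research", "healthcare"},
}


def tag_profile(projects):
    """Count how often each tag appears across the bookmarked projects."""
    profile = {}
    for tags in projects.values():
        for tag in tags:
            profile[tag] = profile.get(tag, 0) + 1
    return profile


def content_score(tags, profile):
    """Sum the profile weight of every tag the candidate project carries."""
    return sum(profile.get(tag, 0) for tag in tags)


profile = tag_profile(bookmarked_projects)
# Highest score first; ties broken by project ID for a stable order.
ranked = sorted(
    open_projects,
    key=lambda p: (-content_score(open_projects[p], profile), p),
)
print(ranked)
```

Both drawbacks show up even in this toy: a project with sparse or inconsistent tags scores poorly no matter how relevant it is, and nothing here looks at what other experts have done.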
The appropriate solution for Catalant recommendations is somewhere in between collaborative filters and content-based recommenders. That is where LightFM comes in.
LightFM is a hybrid model that incorporates both content-based recommendations and the transfer learning of collaborative filtering methods; it gives us the best of both worlds. Developed by Lyst, a fashion e-commerce site based in London, LightFM figures out what users like by learning relationships that map users and user metadata to the projects and project metadata that they like. These relationships are called ‘embeddings.’ In order to build these embeddings, LightFM utilizes three sets of information: the expert metadata, the project metadata, and the interactions between them.
For expert metadata, we build a matrix containing each expert’s industry and skill tags, as well as selected words from their profile tagline and ‘About Me’ section. That matrix looks something like this:
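As a rough sketch of that layout, assuming one row per expert and one column per distinct feature, the matrix can be built like this. The expert IDs, the tags and words, and the `industry:` / `skill:` / `word:` prefixes are invented for illustration.

```python
# Hypothetical experts; the tags and words are invented for illustration.
experts = {
    "expert_1": ["industry:retail", "skill:pricing", "word:strategy"],
    "expert_2": ["industry:healthcare", "skill:pricing"],
    "expert_3": ["industry:retail", "skill:supply chain", "word:operations"],
}

# One column per distinct feature, in a stable (sorted) order.
columns = sorted({feature for features in experts.values() for feature in features})

# One row per expert: 1 if the expert has the feature, 0 otherwise.
matrix = {
    name: [1 if column in features else 0 for column in columns]
    for name, features in experts.items()
}

print(columns)
for name, row in matrix.items():
    print(name, row)
```

With thousands of experts and features, almost every entry is zero, so a matrix like this would normally be stored in a sparse format rather than as nested lists.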
Similarly, for project metadata, we build a matrix containing each project’s industry tags, skill tags, selected words from the project name and description, and budget range:
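One way to turn a single project into the set of features that fills its row is sketched below. The bucket boundaries, the stop-word list, the field names, and the example project are all hypothetical; how words are actually selected and how budget ranges are defined is not specified here.

```python
# Hypothetical budget buckets: (exclusive upper bound in dollars, label).
BUDGET_BUCKETS = [
    (10_000, "budget:under_10k"),
    (50_000, "budget:10k_to_50k"),
]
TOP_BUCKET = "budget:over_50k"

# Hypothetical list of words that are too common to be useful.
STOP_WORDS = {"a", "an", "and", "for", "need", "of", "our", "the", "to", "we"}


def budget_feature(budget):
    """Map a numeric budget to a categorical range label."""
    for upper, label in BUDGET_BUCKETS:
        if budget < upper:
            return label
    return TOP_BUCKET


def project_features(project):
    """Collect tag, word, and budget features for one project."""
    features = {f"industry:{tag}" for tag in project["industries"]}
    features |= {f"skill:{tag}" for tag in project["skills"]}
    text = f"{project['name']} {project['description']}".lower()
    words = (word.strip(".,!?") for word in text.split())
    features |= {f"word:{w}" for w in words if w and w not in STOP_WORDS}
    features.add(budget_feature(project["budget"]))
    return features


project = {
    "name": "Pricing strategy",
    "description": "We need a pricing model for our retail stores.",
    "industries": ["retail"],
    "skills": ["pricing"],
    "budget": 25_000,
}
print(sorted(project_features(project)))
```

Treating the budget as a range label rather than a raw number lets it sit in the same kind of 0/1 column as the tags and words.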
The interactions are also put into a matrix where every positive value represents a like and every negative value represents a dislike:
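A small sketch of that matrix, assuming +1 for a bookmark, -1 for a hide, and 0 where an expert has not interacted with a project. The event log and IDs are hypothetical.

```python
# Hypothetical interaction log: (expert, project, action).
events = [
    ("expert_1", "project_a", "bookmark"),
    ("expert_1", "project_b", "hide"),
    ("expert_2", "project_b", "bookmark"),
    ("expert_3", "project_c", "hide"),
]

# Assumed encoding: +1 for a bookmark (like), -1 for a hide (dislike).
VALUES = {"bookmark": 1, "hide": -1}

experts = sorted({expert for expert, _, _ in events})
projects = sorted({project for _, project, _ in events})

# Start with zeros, meaning "no interaction recorded".
interactions = [[0] * len(projects) for _ in experts]
for expert, project, action in events:
    row = experts.index(expert)
    col = projects.index(project)
    interactions[row][col] = VALUES[action]

print(projects)
for expert, row in zip(experts, interactions):
    print(expert, row)
```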
LightFM then takes these three matrices and solves for the embeddings that will allow it to most accurately predict the values in the interactions matrix. You can read more about how LightFM learns these embeddings, using a method called ‘stochastic gradient descent,’ in this paper by Maciej Kula from Lyst.
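To give a feel for what "solving for the embeddings" means, here is a toy version of the idea in plain Python. It is not LightFM's implementation: it assumes a logistic loss, leaves out bias terms, and uses invented features and interactions. Each expert and project is represented as the sum of its features' embeddings, and every training example nudges those embeddings a little in the direction that reduces the prediction error.

```python
import math
import random

random.seed(0)
DIM = 2              # embedding size (assumed)
LEARNING_RATE = 0.1  # step size (assumed)
EPOCHS = 300

# Hypothetical metadata: each expert and project is a list of features.
expert_features = {"e1": ["skill:pricing"], "e2": ["skill:logistics"]}
project_features = {"pa": ["skill:pricing"], "pb": ["skill:logistics"]}

# Hypothetical interactions: +1 is a bookmark, -1 is a hide.
interactions = [("e1", "pa", 1), ("e1", "pb", -1), ("e2", "pb", 1), ("e2", "pa", -1)]


def init_embeddings(feature_lists):
    """Give every distinct feature a small random vector."""
    names = {f for features in feature_lists.values() for f in features}
    return {f: [random.uniform(-0.1, 0.1) for _ in range(DIM)] for f in sorted(names)}


def represent(features, embeddings):
    """An expert or project is the sum of its features' embeddings."""
    return [sum(embeddings[f][k] for f in features) for k in range(DIM)]


def predict(expert, project):
    """Estimated probability that the expert likes the project."""
    u = represent(expert_features[expert], expert_emb)
    v = represent(project_features[project], project_emb)
    score = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-score))


expert_emb = init_embeddings(expert_features)
project_emb = init_embeddings(project_features)

for _ in range(EPOCHS):
    random.shuffle(interactions)  # "stochastic": visit examples in random order
    for expert, project, value in interactions:
        u = represent(expert_features[expert], expert_emb)
        v = represent(project_features[project], project_emb)
        target = 1.0 if value > 0 else 0.0
        # Gradient of the log loss with respect to the score.
        error = predict(expert, project) - target
        # Nudge every feature involved in this one example.
        for f in expert_features[expert]:
            for k in range(DIM):
                expert_emb[f][k] -= LEARNING_RATE * error * v[k]
        for f in project_features[project]:
            for k in range(DIM):
                project_emb[f][k] -= LEARNING_RATE * error * u[k]

print(round(predict("e1", "pa"), 2), round(predict("e1", "pb"), 2))
```

After training, the bookmarked pairs should receive a higher predicted probability than the hidden ones.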
To create recommendations for an expert, we give LightFM that expert’s metadata and the metadata of all out-for-bid projects. LightFM then uses the embeddings that it has learned to score each project on how much the expert will like it. The top-scoring projects are the expert’s recommendations.
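The scoring step can be sketched in the same toy style. The embedding values below are made up rather than learned, and the dot-product score without bias terms is a simplifying assumption.

```python
# Hypothetical learned embeddings, one 2-d vector per feature.
expert_emb = {
    "skill:pricing": [1.0, 0.0],
    "industry:retail": [0.5, 0.5],
}
project_emb = {
    "skill:pricing": [1.0, 0.0],
    "skill:logistics": [-1.0, 0.5],
    "industry:retail": [0.0, 1.0],
}

# Hypothetical out-for-bid projects, described only by their features.
open_projects = {
    "p1": ["skill:pricing", "industry:retail"],
    "p2": ["skill:logistics"],
    "p3": ["skill:pricing"],
}


def represent(features, embeddings):
    """Sum the embeddings of every feature the expert or project has."""
    dim = len(next(iter(embeddings.values())))
    return [sum(embeddings[f][k] for f in features) for k in range(dim)]


def recommend(expert_feats, open_projects, top_n=2):
    """Score every open project for one expert and keep the best ones."""
    u = represent(expert_feats, expert_emb)
    scores = {
        name: sum(a * b for a, b in zip(u, represent(feats, project_emb)))
        for name, feats in open_projects.items()
    }
    # Highest score first; ties broken by project ID for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_n]


print(recommend(["skill:pricing", "industry:retail"], open_projects))
```

Because a project is scored from its features alone, a project posted a moment ago can be ranked before anyone has bookmarked or hidden it.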
This system is something between a collaborative filter and a content-based system. It learns how groups of experts interact with various projects, like a collaborative filter, but it is also learning relationships between expert and project metadata, like a content-based system. By working in both worlds, LightFM gives us the strengths of both, and helps us match our diverse projects to our equally diverse experts.
In future posts, we will dive further into how we measure success, how we break down expert/project metadata, how we optimize these models, and plans for maximizing sick wheelie hang-time.
If these problems are interesting to you, we have plenty more to solve at Catalant Technologies. We are looking for Data Science Engineers and Software Engineers. Learn more here.