Data, Machine Learning, and Marketplace Optimization at Upwork (Summary)

Summary of Data Science Opportunities and Challenges at Upwork

Thanh Tran
upwork-datascience
10 min read · Aug 2, 2019


Opportunities

The goal of Upwork is to shape the future of work by running an online platform to connect clients and freelancers and to help them get things done through flexible work arrangements.

1. Help Individual Users Grow

Taking the client’s and the freelancer’s viewpoints, we see a huge opportunity for data science to play a crucial role. We mapped out the client’s and freelancer’s journeys as conversion funnels and identified the critical steps where data, machine learning and optimization methods can be applied to add value, make users happy and increase their conversion. The biggest opportunity lies in building effective search and recommendation applications that empower clients and freelancers to find their match in our complex two-sided marketplace. The unique opportunity, however, is to take a long-term view and to support users as they grow beyond their first successful hire and work arrangement. Beyond matching, there is value in building smart applications that help track successful job execution, leverage implicit and explicit job feedback, predict the risk of churn and auto-generate tailored recommendations for clients and freelancers to become more successful on our platform. From the user conversion viewpoint, the main opportunities are:

  • Identify and expose content (jobs, freelancers) through landing pages that attract the interest of visitors to optimize for clicks and registrations (signup conversion).
  • Help find and proactively highlight results to optimize for clients to hire and for freelancers to take on a job (start conversion).
  • Provide insights into how to become more successful on the platform and generate tailored recommendations for clients to build bigger teams and freelancers to get bigger jobs (grow conversion).

2. Help the Marketplace Grow

We used an analogy to illustrate how user conversion can be seen as part of a system that churns out fast cars. Without a central coordination piece that helps balance traffic, some users might get pushed and others get left behind. We argued that the much bigger opportunity, especially in terms of business metric impact, lies in understanding overall marketplace health and sustainability and in building a central coordination piece that helps ensure both. Overall, we identified these main opportunities:

  • Break down the marketplace into homogeneous segments with predictable behavior.
  • Understand optimal pricing within every segment and implement policies to tilt towards equilibrium pricing.
  • Quantify and predict supply and demand to identify the current and future level of congestion. Implement growth strategies to address congestion and optimize for better flows of jobs throughout all supply and demand “highways”.
  • Understand the value of jobs and users to leverage unit economics and to shift from a throughput-based approach to a value-based optimization of flows (jobs vs. revenues vs. long-term value).
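To make the supply and demand bullet more concrete, here is a minimal sketch that computes a simple congestion indicator per segment as the ratio of active freelancers to open jobs. The segment names, the counts and the two thresholds are hypothetical and chosen only for illustration; they are not Upwork's actual segment definitions or metrics.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    name: str                # hypothetical segment label
    open_jobs: int           # demand: jobs currently looking for a freelancer
    active_freelancers: int  # supply: freelancers currently looking for work


def congestion_ratio(stats: SegmentStats) -> float:
    """Supply per unit of demand; infinite if the segment has no open jobs."""
    if stats.open_jobs == 0:
        return float("inf")
    return stats.active_freelancers / stats.open_jobs


def classify(stats: SegmentStats, low: float = 2.0, high: float = 10.0) -> str:
    """Label a segment using two illustrative (assumed) thresholds."""
    ratio = congestion_ratio(stats)
    if ratio > high:
        return "oversupplied"
    if ratio < low:
        return "undersupplied"
    return "balanced"


# Made-up numbers for three hypothetical segments.
segments = [
    SegmentStats("logo-design", open_jobs=100, active_freelancers=2500),
    SegmentStats("ml-engineering", open_jobs=80, active_freelancers=120),
    SegmentStats("web-dev", open_jobs=400, active_freelancers=2000),
]
labels = {s.name: classify(s) for s in segments}
```

In a setting like this, an oversupplied segment would call for demand-side growth rather than more freelancer signups, and the reverse for an undersupplied one.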

Challenges

Many of the applications we built and the technical solutions we developed are based on known recipes. In our previous presentations (Part 1, Part 2), you can see how we relied on the large body of literature and adopted known solutions to these problems:

  • Text understanding: unsupervised word embeddings, supervised text labeling, knowledge graph, named entity recognition and entity linking.
  • Search and recommendation: click-through-rate prediction, learning to rank, collaborative filtering, content-based filtering and wide-and-deep learning for recommender systems.
  • Interpretable models: partial dependence plot and Shapley values (SHAP).

Clearly, there have been many improvements in these areas, and at the current rate, the democratization of deep learning and the widespread availability of pre-trained solutions such as BERT will help us achieve even bigger leaps forward as we ride this wave.

There are, however, challenges that are unique to Upwork. We believe our data science team is in pole position to solve the following problems, which will have an even bigger impact on the success of our business and on the future of online marketplaces similar to Upwork.

1. Large-scale and Contextually Rich Data

Running the largest platform in the online labor space introduces opportunities but also unique challenges in the scale, heterogeneity, and complexity of the data. Because we are in the business of both hiring and job execution, there is a deep and rich funnel of click, behavior and feedback data. We know what our freelancers clicked and viewed, how and when they submitted proposals and how they performed in interviews. Once they are hired, we know their job progress and how they communicated with their clients, and we learn about their final job outcome from client feedback. Analogously, we track how clients search, explore, invite, interview, hire and work with their freelancers towards job completion. We also collect freelancers’ feedback on their clients’ conduct.

With this wealth of data, we cannot resist the temptation to model the user’s entire journey and predict their trajectory. However, skewed data presents a major challenge. (1) While top-of-funnel data such as impressions and clicks is plentiful, data becomes increasingly sparse as we go further down the funnel to invites, hires, and feedback. (2) There is much more data available for a small percentage of active and loyal users, while for the majority of new users, data is sparse.
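Both kinds of skew can be measured directly from an event log. The sketch below is only an illustration: the stage names, the `(user_id, stage)` log format and the synthetic numbers are assumptions, not Upwork's actual schema or data.

```python
from collections import Counter

# Hypothetical funnel stages, ordered from top to bottom of the funnel.
FUNNEL = ["impression", "click", "invite", "hire", "feedback"]


def stage_counts(events):
    """Number of events per funnel stage, in funnel order."""
    counts = Counter(stage for _, stage in events)
    return [counts.get(stage, 0) for stage in FUNNEL]


def step_conversion(counts):
    """Ratio of each stage to the previous one; None if the previous stage is empty."""
    return [(nxt / prev) if prev else None for prev, nxt in zip(counts, counts[1:])]


def top_user_share(events, fraction=0.1):
    """Share of all events generated by the most active `fraction` of users."""
    per_user = sorted(Counter(user for user, _ in events).values(), reverse=True)
    k = max(1, int(len(per_user) * fraction))
    return sum(per_user[:k]) / sum(per_user)


# Synthetic log: one very active user and nine new users with impressions only.
events = []
for stage, n in zip(FUNNEL, [1000, 100, 10, 2, 1]):
    events += [("u0", stage)] * n
for i in range(1, 10):
    events += [(f"u{i}", "impression")] * 10

counts = stage_counts(events)               # events per stage, top to bottom
conversions = step_conversion(counts)       # skew (1): sparsity down the funnel
share = top_user_share(events, fraction=0.1)  # skew (2): concentration on few users
```

On this synthetic log, the single most active user accounts for more than 90% of all events, which is the kind of imbalance described in point (2).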

2. Numerous Metric Targets and Complex User Conversion Problems

The core business metrics we target for most optimization problems are short-term revenue (the fill rate of jobs) and long-term health (growth and retention of user spend and earnings). But as you might expect, it is not possible to optimize all models and applications directly for these targets.

Rather, the holy grail is to understand and attribute the impact of various user behavior patterns and intermediate outcomes on revenue and retention. Under the lens of a user funnel, we identified key behavior and outcome targets and built step-wise solutions to optimize for their conversion. But if we have to make trade-offs, how important is one conversion goal compared to another?

Many complex attribution models have been proposed for the purpose of justifying marketing spend, but applying them is not straightforward, especially in our context of a deep user funnel. What’s the impact of a click on a search result vs. an interview invite? The latter seems to be more immediately related to hires and revenue, but as we have learned, click optimization is also important if we take a long-term view: many clients are not ready to hire, but if they find what they are looking for, they will come back. Modeling the impact of different user actions and making the appropriate trade-off between short- and long-term goals is key to building machine learning and optimization solutions that have real business impact.

3. Two-sided Matching to Convert Both Sides of the Market

As we build solutions to convert users, we have to view the marketplace from both sides, sellers and buyers. Take invite conversion as an example: clients are more inclined to hire if they see freelancers who match their job requirements, but their invites only get accepted when the freelancers are also interested in the jobs they offer.

The problem of matching is well studied. When we conceive of client and freelancer preferences as objects, we can borrow from pairwise matching solutions such as text-based matching for document retrieval (query vs. document), structured matching for entity resolution (entity vs. entity) and generic strategies such as learning to match. We can also simply formulate two-sided matching as a conversion optimization problem (e.g. invite conversion). But the core challenge in all of these lies in learning and representing preferences. We invested a great deal in the manual engineering of features to capture preferences as well as pairwise preference interactions and matches. We may already have hit the ceiling with this approach, and a more complex deep learning-based feature representation may help tap the full potential of Upwork’s large and rich data and take us to the next level of two-sided matching.
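The invite example can be sketched in a few lines. The snippet below is an illustration under strong assumptions, not the approach used at Upwork: preferences are hand-made non-negative vectors, each side's fit is a cosine similarity, and the two fits are combined with a geometric mean so that a pair scores low whenever either side is uninterested. All names and numbers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mutual_score(client_fit, freelancer_fit):
    """Geometric mean of both sides' fit (each assumed to lie in [0, 1])."""
    return math.sqrt(client_fit * freelancer_fit)


# Hypothetical job: required skills (python, ml, design) and
# job attributes (long_term, high_budget).
job_requirements = [1, 1, 0]
job_attributes = [1, 0]

# Hypothetical freelancers: (skills, preferred job attributes).
freelancers = {
    "A": ([1, 1, 0], [0, 1]),  # perfect skill match, but wants high-budget jobs
    "B": ([1, 0, 0], [1, 0]),  # partial skill match, wants long-term jobs
}

client_side = {
    name: cosine(job_requirements, skills)
    for name, (skills, _) in freelancers.items()
}
two_sided = {
    name: mutual_score(client_side[name], cosine(prefs, job_attributes))
    for name, (_, prefs) in freelancers.items()
}

rank_client_only = sorted(client_side, key=client_side.get, reverse=True)
rank_two_sided = sorted(two_sided, key=two_sided.get, reverse=True)
```

Ranking by the client's view alone puts freelancer A first, while the two-sided score prefers B, because A has no interest in this kind of job and would likely decline the invite.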

4. Tailored Strategies for Segment-based Conversion

It seems no matter the business domain, the prevalent belief is that understanding different user segments and implementing a tailored solution for every segment will improve overall conversion. For example, a recurring question at Upwork is whether we correctly group users in terms of their lifetime value and put customized solutions in place to differentiate the treatment of high-value vs. low-value clients.

Unit economics aside, the machine learning question here is whether to build a single model or multiple models (one for every segment). Counter-intuitive as it may seem, it has been shown that for domains such as credit risk scoring, segment-specific models do not necessarily outperform a single-model approach on core metrics such as AUC. At the same time, our understanding of complex single-model solutions, and the computing infrastructure to support them, have evolved. A single-model solution could be based on a complex ensemble of classifiers or have built-in support for segment-specific effects, as generalized linear mixed models do through ID-level (or segment-level) features. With the single-model approach, we can take advantage of the whole dataset to automatically learn and capture differences and interactions between segments.
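The trade-off between one model and one model per segment can be illustrated with partial pooling. The sketch below is a deliberately simplified stand-in for what a mixed model does, not a generalized linear mixed model itself: each segment's conversion rate is shrunk towards the global rate, with more shrinkage for segments that have little data. The segment names, the counts and the `prior_strength` pseudo-count are assumptions.

```python
def pooled_rate(segments):
    """Single global conversion rate, ignoring segments entirely."""
    successes = sum(s for s, _ in segments.values())
    trials = sum(n for _, n in segments.values())
    return successes / trials


def raw_rates(segments):
    """One independent estimate per segment (the 'one model per segment' extreme)."""
    return {name: s / n for name, (s, n) in segments.items()}


def shrunk_rates(segments, prior_strength=50.0):
    """Partial pooling: blend each segment's rate with the global rate.

    `prior_strength` acts like a number of pseudo-observations at the global
    rate, so small segments stay close to it and large segments barely move.
    """
    global_rate = pooled_rate(segments)
    return {
        name: (s + prior_strength * global_rate) / (n + prior_strength)
        for name, (s, n) in segments.items()
    }


# Hypothetical (conversions, trials) per segment.
segments = {
    "enterprise": (300, 1000),
    "small-business": (900, 9000),
    "new-region": (2, 4),  # very sparse segment
}

global_rate = pooled_rate(segments)
raw = raw_rates(segments)
shrunk = shrunk_rates(segments)
```

With these made-up numbers, the sparse segment's raw rate of 0.5 is pulled most of the way towards the global rate of about 0.12, while the large segments keep estimates close to their own data.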

However, as we explored both directions, we found the answer is not clear-cut. Upwork users differ considerably in terms of business size, location, job category, and value orientation. Across different segmentation strategies, we found highly distinct behavior within segments. This problem of user heterogeneity, coupled with data skew and sparsity, seems to put a limit on how much a single-model approach can automatically capture segment-specific behaviors.

5. From User Conversion to Marketplace Optimization

We showed that user conversion, while being the main target of our machine learning and optimization methods, is in and of itself not the right lens through which to attain globally optimal solutions. As an example, converting freelancers to sign up might not yield any benefit, or might even do harm by increasing supply-side competition, when there are not enough jobs. The challenge then becomes how to leverage core user conversion to implement and boost user growth strategies, which are conceived, optimized and balanced through the lens of flow-based market-level coordination.

Implementing such a coordination mechanism poses interesting system and implementation challenges, as it requires central and full control of exposure and match logic over all the marketplace touchpoints.

It also requires solving core conceptual problems, which we identified as marketplace segmentation, supply and demand prediction, pricing and user/job value estimation. We have discussed solutions for these problems in isolation, but in reality they are interdependent. As an operator, Upwork controls the exposure of jobs and freelancers, and through this exposure, clear boundaries can be drawn around segments. Changing the segmentation leads to different marketplace definitions, which result in supply and demand shifts. In turn, prices move to balance the new changes. Besides market dynamics, prices also change as a direct consequence of Upwork’s policies, such as minimum supported rates. There are interesting theoretical results for solving this as a holistic optimization problem of finding segmentations that maximize marketplace revenue. However, there is still a long way to go towards practical solutions for large and complex marketplaces like Upwork. Besides the high complexity of the underlying optimization problem, matters are complicated further when we relax the closed-system assumption: supply and demand are driven both by internal marketplace dynamics and by the external marketing, sales and support programs Upwork implements to acquire and retain users.

As a final thought, we’d like to explore the differences between jobs vs. revenue vs. value. First, we’d be happy if we could leverage central coordination to optimize the flow of jobs and increase the total number of transactions. From there, it is only an incremental step to move to a revenue-based solution. What is most challenging, however, is to optimize for value. Whereas revenue is a reflection of current success, the notion of value takes a long-term view. The ultimate challenge for us is to implement central coordination that maximizes the total estimated lifetime value of all our users.
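The difference between revenue and value can be shown with a textbook lifetime value formula. The sketch below assumes a constant revenue per period, a constant retention probability and a constant discount rate, which is a strong simplification and not how Upwork estimates value; the client names and numbers are made up.

```python
def lifetime_value(revenue_per_period, retention, discount=0.1):
    """Discounted lifetime value under constant retention (an assumed model).

    LTV = sum over t >= 0 of revenue * (retention / (1 + discount)) ** t
        = revenue * (1 + discount) / (1 + discount - retention)
    """
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must be a probability in [0, 1]")
    if discount <= 0.0:
        raise ValueError("discount must be positive so that the sum converges")
    return revenue_per_period * (1 + discount) / (1 + discount - retention)


# Hypothetical clients: (revenue per period, retention probability).
clients = {
    "one-off-big-spender": (100.0, 0.2),
    "loyal-small-spender": (60.0, 0.9),
}

by_revenue = sorted(clients, key=lambda c: clients[c][0], reverse=True)
by_value = sorted(clients, key=lambda c: lifetime_value(*clients[c]), reverse=True)
```

Under these assumptions, ranking by current revenue favors the one-off big spender, while ranking by estimated lifetime value favors the loyal small spender, which is why the two objectives can lead to different coordination decisions.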

In Part 1 and Part 2 of the series, we presented applications that can take advantage of these opportunities and discussed technical solutions for addressing the challenges above. This summary should make it clear that the journey we have embarked on is long and the mission we are pursuing is exciting, and we hope it has sparked the interest of our readers. We are very open to collaboration. Because we leverage our own platform for contract management, data sharing, and online workplace facilitation, it has been frictionless for us to work with external collaborators and easy to integrate new members from all over the world into our team. We encourage every interested reader to share ideas and feedback and to reach out to us to work together towards shaping the future of work through smart data science and machine learning applications!

About the Authors

Thanh Tran is the head of data science at Upwork, where he works with a team of 30+ scientists and engineers to innovate the core engine behind the world’s largest platform for freelancing and flexible work. As an entrepreneur and advisor of Bay Area startups, he helped build teams, raised capital for many companies and successfully shipped innovative technology solutions and end-user applications. Thanh previously served as a professor at Karlsruhe Institute of Technology (KIT) and Stanford (visiting), where he led a worldwide top research group in semantic search. He earned various awards and recognition for his academic work (Most Cited Article 5-years award, among top-5 in Semantic Search, and top-50 in Web Search per 2016 Google Scholar Global Index).

The article was reviewed by, and the actual work presented was done by, the following data science team members: Alexander Krainov, Amro Tork, Andrei Demus, Artem Moskvin, Danylo D., Dimitris Manikis, Eva Mok, George Barelas, Giannis Koutsoubos, Hemanth Ratakonda, Igor Korsunov, Ivan Portyanko, João Vieira, Le Gu, Lei Zhang, Mikhail Baturov, Nimit Pattanasri, Pablo Celayes, Quang Hieu Vu, Roman Tkachuk, Samur Cardoso De Araujo, Sibo Lu, Siddharth Kumar, Silvestre Losada, Spyros Kapnissis, Vasily Ryazanov, Veli Bicer, Vinh Dang, Yongtao Ma, Zarko Celebic.
