Data, Machine Learning, and Marketplace Optimization at Upwork (Part 2: Market Level Growth)

How We Use Data and ML to Optimize and Grow our Marketplace

Thanh Tran
upwork-datascience
15 min read · Aug 2, 2019

Think of core user conversion as the part of our engine that churns out core users, and think of core users as fast cars. If there were no rules and regulations, there would be cars that get pushed and cars that get left behind. What if some of these cars were our loyal core users who cannot keep up with the new pace? What if all the roads were already so crowded that new fast cars never get to live up to their full potential?

We are introducing FLOW, the centralized coordination piece of our marketplace engine. It aims to derive and consolidate marketplace insights, and to use this information to accelerate, divert or slow down core users in order to balance their growth. Ultimately, it aims to ensure optimal flows of jobs and revenues through the system.

This Part 2 of our series is structured as follows:

1) Motivation: the business impact of central marketplace coordination.

2) Data and machine learning foundation of FLOW, our coordination engine.

3) How we use FLOW to conceive and implement core growth strategies.

The Business Impact of FLOW, our Marketplace Coordination Engine

Using the analogy of a traffic system, we conceive of every unit of job supply and demand as a car moving along a highway. Then, the optimization goals for our business, such as growing job and revenue streams, can be accomplished through the tasks of

  1. building more and bigger highways and,
  2. pushing more and bigger cars through the highways.

With FLOW, we can also see marketplace congestion at a glance. The example below shows the market for Mobile Development jobs. For the sake of illustration, it is segmented along the dimensions of expertise level and platform experience. A bigger highway for demand on the left compared to the supply side on the right indicates that there is a supply gap in Mobile Development. Zooming in, we can further see that this gap is mainly due to the lack of supply in the high-expertise segment. This gap in turn means we also have a problem on the demand side: because the supply of high-expertise freelancers is limited, our clients have to compete for them. Incoming newbie clients further increase the competition, which makes it more difficult for our core clients to find freelancers and fill their jobs. With FLOW, we can detect this risk of core client churn.

FLOW analysis for the Mobile Development segment.

With FLOW, we can systematically expose growth opportunities and analyze market risks in the form of congestion. As examples, we saw two types of congestion: supply gap and client churn. In fact, we can derive from our data that filling this supply gap for Mobile Development will help us gain ~43M in revenue. On the other hand, if we do not close this gap, there is a risk we will lose ~4M through core client churn. Mobile Development is just one of the many job categories you can find on Upwork. Understanding flows and fixing congestion in even just one of the hundreds of these “highways” has a huge business impact. In fact, if we can optimize and grow at the global market level, our learnings and estimates indicate that the opportunities in terms of core business metrics (number of visitors, fill rate, spend retention and overall core user growth) are orders of magnitude bigger than those from conversion at the user level.

We will now discuss how to build the foundation of FLOW, and use it to systematically conceive and implement growth strategies that shall have a lasting impact on the health and sustainability of our marketplace.

Part 2.1: Use ML and Marketplace Optimization Methods to Build FLOW

We need a conceptual foundation for our marketplace to build a system like FLOW. This foundation is established through our continuous efforts in improving our solutions for the four core problems illustrated below.

Foundation of marketplace optimization.

Marketplace Segmentation

At the core of everything is the challenge of defining a marketplace. There is a gamut of attributes we can use and should consider for segmentation, including job categories (Mobile Development vs. Customer Service), skill requirements (Java vs. Adobe Photoshop), expertise level, price, and geography. Ultimately, what we want is for every defined segment to be homogeneous such that its behavior is predictable. Understanding and effectively coordinating the growth of a segment requires its constituents to be comparable or somewhat substitutable products. Only then would we know how to effectively control supply and demand in these segments, and be able to predict the impact that policies such as constraining supply would have on the clearing rate (price), fill rate (throughput) and other attributes we aim to optimize for these segments.

We are taking a practical approach to solving this. The idea is to start with skill, skill level, and geography as core segmentation attributes to define segments such as “Level 3 Java US-to-US jobs”. Since individual markets have grown at different paces, this uniform segmentation rule results in segments of different sizes. We assess their homogeneity based on offline metrics and, especially for large segments, we apply additional decomposition rules when they do not behave in line with our policy predictions.

As part of the solution for this segmentation problem, we use statistical learning methods to identify skill substitutability (to form segments based on substitutable skills). Another challenge we have solved is how to derive expertise level requirements from job posts (i.e., to identify the demand-side expertise level): the level requirement explicitly specified by our clients is often not consistent with their job descriptions. We formulate this as a classification problem and use job metadata, text embeddings, and behavior data such as the clicks and applications we observed for the job post to infer its level requirement. Predicting the expertise level of our freelancers, i.e., the supply-side expertise level, poses an even more interesting challenge. Here too, while the level “claimed” by our freelancers is useful as an explicit signal, we need a more reliable and low-resolution (1–10) indicator for level. We employ statistical modeling to leverage our data and capture the process of job proposals, interviews, hires and job outcome feedback as a competition with the clients as the judges. Intuitively, freelancers are assigned a higher expertise level if, relative to the competition, they won more interviews, were hired for more and bigger jobs, and obtained more positive job feedback.
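To make the "competition with the clients as the judges" idea concrete, here is a minimal sketch that uses an Elo-style pairwise rating. This is a stand-in technique chosen for illustration, not necessarily the statistical model used in FLOW; the step size `K`, the starting rating, the signal weights, the bucketing bounds and the freelancer names are all assumptions.

```python
from collections import defaultdict

K = 32.0       # update step size (assumed)
BASE = 1500.0  # starting rating for every freelancer (assumed)


def expected_win(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings(ratings, winner, losers, weight=1.0):
    """Treat one client decision (interview, hire, ...) as the winner
    beating every other applicant on the same job."""
    for loser in losers:
        p = expected_win(ratings[winner], ratings[loser])
        delta = K * weight * (1.0 - p)
        ratings[winner] += delta
        ratings[loser] -= delta


def to_level(rating, lo=1200.0, hi=1800.0):
    """Map a raw rating onto a 1-10 scale by clipping and bucketing."""
    frac = min(max((rating - lo) / (hi - lo), 0.0), 1.0)
    return 1 + min(int(frac * 10), 9)


ratings = defaultdict(lambda: BASE)
# Hypothetical events: (winner, other applicants, weight of the signal)
events = [
    ("ana", ["bo", "cy"], 0.5),  # ana won the interview
    ("ana", ["bo"], 1.0),        # ana was hired
    ("bo", ["cy"], 1.0),         # bo was hired on another job
]
for winner, losers, weight in events:
    update_ratings(ratings, winner, losers, weight)

levels = {name: to_level(r) for name, r in ratings.items()}
```

In this toy run, the freelancer who wins more client decisions ends up with the highest rating. A production model would also need to handle job size, feedback scores and rating uncertainty, none of which this sketch covers.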

Supply and Demand Prediction

The task here is to quantify the units of supply and demand for every segment, e.g. how many hours of supply and demand we have for “Level 3 Java US-to-US”. This is non-trivial because on Upwork we have differences in (1) job types (fixed price vs. hourly jobs), (2) client intent (explore vs. ready-to-hire) and (3) freelancer availability (moonlighting vs. full-commitment).

The value of a fixed-price job is set by our clients and might differ from the actual payment issued at the end of the contract (e.g. a bonus paid to compensate for a change in scope). The contract duration is not known, and therefore converting these contracts to time units requires an estimation of the scope and of the rate the freelancer charges. Hourly contracts, on the other hand, can be directly represented as time units. However, their scope, duration, and value are flexible quantities that are subject to mutual client-freelancer satisfaction during the lifetime of the project. The task of demand prediction goes beyond counting the actual work hours observed so far: it has to account for the entire scope of all current contracts, which in some cases extend more than 24 months into the future (a large percentage of Upwork freelancers seek long-term engagement). So as a unit of measure, we also use value as an alternative to time (1 USD vs. 1 h). We formulated contract value prediction as a regression problem and built a model to obtain a value-based quantification of demand.
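The two units of measure can be related with simple arithmetic. The sketch below is illustrative only: the `Contract` fields are hypothetical, and the hourly rate attached to a fixed-price contract is assumed to come from some upstream estimate rather than from observed data.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    kind: str            # "hourly" or "fixed"
    hourly_rate: float   # USD per hour; an estimate for fixed-price jobs
    hours: float = 0.0   # expected total hours (hourly contracts)
    budget: float = 0.0  # agreed price in USD (fixed-price contracts)


def demand_in_hours(c: Contract) -> float:
    if c.kind == "hourly":
        return c.hours
    # Fixed price: approximate the scope as budget / estimated rate.
    return c.budget / c.hourly_rate


def demand_in_usd(c: Contract) -> float:
    if c.kind == "hourly":
        return c.hours * c.hourly_rate
    return c.budget


contracts = [
    Contract(kind="hourly", hourly_rate=40.0, hours=100.0),
    Contract(kind="fixed", hourly_rate=50.0, budget=2000.0),
]
total_hours = sum(demand_in_hours(c) for c in contracts)  # 100 h + 40 h
total_usd = sum(demand_in_usd(c) for c in contracts)      # 4000 + 2000 USD
```

Either total can serve as the demand figure for a segment; the value-based one avoids having to estimate a rate for fixed-price work.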

We think of our demand as the total of all ongoing contracts plus the job posts that still seek to hire. But hiring intent varies, so how do we distinguish clients who are still exploring from clients who are ready to hire? Fortunately, there are various types of engagement signals we can derive from the click log to infer different levels of hiring intent (will invite vs. will interview vs. will hire a freelancer).
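One simple way to combine ongoing contracts with open job posts is to discount each open post by how likely it is to end in a hire. The sketch below assumes this expected-value formulation; the intent labels mirror the three levels named above, while the probabilities and dollar amounts are made up for illustration.

```python
# Assumed probabilities that a post at each intent level ends in a hire.
HIRE_PROBABILITY = {"will_invite": 0.2, "will_interview": 0.5, "will_hire": 0.9}


def expected_demand(ongoing_contract_values, open_posts):
    """Ongoing contracts count in full; open posts are discounted by intent.

    open_posts is a list of (intent_label, predicted_value_usd) pairs.
    """
    committed = sum(ongoing_contract_values)
    prospective = sum(
        HIRE_PROBABILITY[intent] * predicted_value
        for intent, predicted_value in open_posts
    )
    return committed + prospective


demand = expected_demand(
    ongoing_contract_values=[1000.0, 2500.0],
    open_posts=[("will_invite", 500.0), ("will_hire", 1000.0)],
)
```

In practice the probabilities would come from a model trained on engagement signals rather than from a fixed lookup table.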

On the supply side, the amount of available work hours largely depends on the freelancer’s commitment. Even though we ask our freelancers to announce their availability in terms of work hours per week, this signal does not line up with our actual observation of the freelancer’s engagement and responses to clients’ hiring requests. Rather than attempting to predict availability directly, we built a model to predict availability signals such as the probability that a freelancer will accept a job invite. Analogous to the value quantification of demand, we also build models to directly predict the value, i.e. the earnings, our freelancers will be able to generate on Upwork.

Conceptually, a clean and systematic solution for our supply and demand prediction requires establishing one single unit of measure, e.g. dollar value, and building models to predict the value of jobs and the earnings of freelancers. While this does represent our ongoing efforts, accurate value prediction is challenging due to the diversity we have in terms of job categories, client intent and requirements, and freelancer availability and commitment. Furthermore, effective supply and demand is not only a question of buyers vs. sellers but also of how well we, as the marketplace operator, facilitate the finding and exchange of goods. To account for that, we add the dimension of exposure to capture whether, and to what extent, a job/freelancer is seen by freelancers/clients.

Since there is not yet a perfect single solution, we are taking a pragmatic approach. We use several units of measure and proxies (time and value, but also other proxies for demand and supply such as the number of job posts, client-initiated invites, and freelancer-initiated applications), and refine these estimates based on our understanding of marketplace exposure, client intent and freelancer availability to identify supply and demand imbalances at different levels of approximation.

Optimal Pricing

In a complex marketplace like Upwork, where segmentation and what defines a “product” are hard problems, the pricing policy implementations we have seen are far from systematic and well understood. We recognize pricing as a core lever to coordinate supply and demand and to boost short-term profit margin. When not done well, it can also hurt the long-term health and sustainability of the marketplace.

In an ideal world, we want to establish an optimal price (i.e., clearing rate for contracts) for every segment, one that perfectly balances supply and demand, i.e., an equilibrium price at which the quantity supplied is equal to the quantity demanded. That is the theory. In practice, on marketplaces like Upwork, (1) the price is not set by the market operator but is established as a mutual agreement between buyer and seller, (2) seller growth usually scales faster and is cheaper than buyer growth, and (3) paired with the seller’s propensity to go below their market value during their initial period to “explore” and to establish a reputation, buyers often have relatively high bargaining power.
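As a textbook illustration of the equilibrium idea, the sketch below assumes linear demand and supply curves, Q_d = a - b·p and Q_s = c + d·p, and solves for the price at which they meet. The linear form and all coefficients are assumptions made for the example; real segment curves would have to be estimated from data.

```python
def equilibrium_price(a, b, c, d):
    """Solve a - b*p = c + d*p for p (linear demand and supply curves)."""
    if b + d <= 0:
        raise ValueError("slopes must make the two curves cross")
    return (a - c) / (b + d)


def excess_demand(p, a, b, c, d):
    """Quantity demanded minus quantity supplied at price p."""
    return (a - b * p) - (c + d * p)


# Hypothetical segment: quantities in hours, price in USD per hour.
A, B, C, D = 1000.0, 10.0, 100.0, 20.0

p_star = equilibrium_price(A, B, C, D)   # 900 / 30
quantity = A - B * p_star                # hours cleared at equilibrium
shortage = excess_demand(20.0, A, B, C, D)  # unmet hours at a below-equilibrium rate
```

At the equilibrium price the excess demand is zero, while at a rate below it (here 20 USD per hour) demand exceeds supply. This is the kind of imbalance that a "tilt" policy would try to correct.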

On Upwork, we exercise partial control over pricing by adjusting the exposure of freelancers/jobs to clients/freelancers. As generic policies, we distinguish between price adjustment and tilt: The former is simply based on aligning prices to current behaviors of marketplace participants while the latter aims to shift towards a level that is more optimal in terms of achieving equilibrium.

We find that pricing and marketplace segmentation go hand in hand. In fact, we assess the quality of a segment, and decompose it further if need be, based on the price homogeneity of its constituent contracts. As a result of segmentation, we obtain well-defined price ranges that can be used to inform current pricing decisions. We decrease the rank of freelancers whose rates or bids tend to fall below these ranges. We also built a model that predicts a rate the freelancer can use as a guide to winning a contract. We see these current policies as simple adjustments, whereas the more optimal strategies that we are currently working on will account for supply and demand imbalances and tilt prices to address them.
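The price-range adjustment can be sketched as follows. This is an illustration under stated assumptions, not the actual ranking logic: the range is taken as the 10th to 90th percentile of observed contract rates in a segment, and the demotion is a fixed multiplicative penalty on a hypothetical relevance score. Both choices are placeholders.

```python
def percentile(sorted_values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    rank = max(1, (q * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def price_range(rates, low_q=10, high_q=90):
    """Return the (low, high) rate band observed in a segment."""
    ordered = sorted(rates)
    return percentile(ordered, low_q), percentile(ordered, high_q)


def adjusted_score(relevance, rate, low, penalty=0.5):
    """Demote candidates whose rate falls below the segment's range."""
    return relevance * penalty if rate < low else relevance


# Hypothetical hourly rates (USD) of contracts in one segment.
segment_rates = [18, 25, 30, 32, 35, 38, 40, 45, 50, 80]
low, high = price_range(segment_rates)

# Hypothetical candidates: (name, relevance score, hourly rate).
candidates = [("dev_a", 0.9, 12.0), ("dev_b", 0.8, 35.0)]
ranked = sorted(
    candidates,
    key=lambda c: adjusted_score(c[1], c[2], low),
    reverse=True,
)
```

Here `dev_a` has the higher raw relevance but a rate below the segment's range, so it ends up ranked behind `dev_b`.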

Value Prediction

We have discussed the problem of predicting the value of jobs, clients, and freelancers in the context of value-based supply and demand estimation. In and of itself, a solution to this problem does not seem fundamental to marketplace coordination mechanics like FLOW, especially if we think of the marketplace as a closed system. In such a system, once we have built the highways (marketplace segmentation) and found a way to quantify the traffic and identify congestion (supply and demand), we could implement more direct constraints to limit the exposure of freelancers and jobs to regulate traffic (equilibrium pricing).

Upwork is, however, not a closed system, and we spend a substantial portion of our budget on acquiring users. With value prediction, we gain an understanding of our unit economics and can systematically explore and exploit acquisition strategies in terms of lifetime value and cost of acquisition. Beyond acquisition, we apply the same concept of unit economics to account for the cost of operations, in order to maximize the return on investment of the high-touch user success programs that we run to help clients hire and freelancers become more successful. Given operational costs and constraints and the need to balance supply and demand, high-value clients, jobs, and freelancers should be prioritized over low-value ones. Viewing the problem through this lens helps to formulate the coordination mechanics FLOW aims to solve as a value optimization problem as opposed to a throughput optimization problem.

As discussed, we approach value prediction as a regression problem. The challenge in predicting client and freelancer values is that early estimates are more valuable but also more difficult to obtain. We built models for different lifecycle stages and recognize that the accuracy we have for users upon registration, for instance, is much lower than for someone who already has several months of spending/earning on the platform.

Interested to know more? Please contact our team members working on these efforts:

- Danylo D., Samur Cardoso De Araujo, Sibo Lu: Modeling segmentation, supply and demand and equilibrium pricing in online labor marketplaces.

Part 2.2: Implement Growth Strategies with FLOW

With this conceptual foundation set, we will now show how we use FLOW to systematically devise growth strategies that can address congestion issues and optimize and grow our flow of jobs and revenues, as shown below.

Implement core growth strategies with FLOW.

Targeted Growth

For every segment, we run supply and demand prediction to derive well-defined talent orders representing supply gaps. Traditionally, we regulated heavy competition by constraining the supply of freelancers via a policy of limited admission and exposure that, essentially, rejects account registrations and reduces the exposure of newbie freelancers based on general quality signals. With the concept of talent orders, we now pursue a targeted growth strategy. The idea here is to focus growth efforts on filling these orders. Put another way, when we find congestion on the supply side, we remove market-wide growth constraints and, furthermore, invest in additional measures to boost growth by increasing the exposure of in-demand talent and fast-tracking in-demand newbies through high-touch freelancer success programs. That is, we are moving from a general policy of limited exposure to a policy of targeted progressive exposure.
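A talent order can be read as the positive part of the gap between predicted demand and predicted supply in a segment. The sketch below assumes that simple definition. The `talent_orders` helper, the segment names other than the “Level 3 Java US-to-US” example, and all hour figures are hypothetical.

```python
def talent_orders(demand, supply, min_gap=0.0):
    """Return segments where predicted demand exceeds predicted supply,
    largest gap first. Segments missing from `supply` count as zero."""
    orders = {}
    for segment, demanded in demand.items():
        gap = demanded - supply.get(segment, 0.0)
        if gap > min_gap:
            orders[segment] = gap
    return dict(sorted(orders.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical predictions in hours per month.
predicted_demand = {
    "Level 3 Java US-to-US": 1200.0,
    "Level 1 Java US-to-US": 400.0,
    "Level 3 Photoshop US-to-US": 300.0,
}
predicted_supply = {
    "Level 3 Java US-to-US": 800.0,
    "Level 1 Java US-to-US": 900.0,
}

orders = talent_orders(predicted_demand, predicted_supply)
```

The oversupplied “Level 1” segment produces no order, while the two undersupplied segments are listed with the size of their gap. This is the quantity a targeted growth effort would try to fill.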

With our work on value-based prediction of supply and demand, we are also very close to quantifying the business impact of filling the talent orders and, given cost information for acquisition and support efforts, to estimating their return on investment.

In this context, FLOW provides the insights for policy formulation and also acts as an enabler for policy implementation. For instance, FLOW adds a layer of business logic constraints to all our search and recommendation logic, which ultimately dictates which jobs a freelancer can see and which freelancers are visible to a client.

Sustainable Growth

Making new users happy may come at the cost of harming our existing users. With FLOW and especially based on the fine-grained supply and demand predictions, we have insights into the loyalty and commitment of existing users that we can leverage to better protect and retain them. We understand that the key to the sustainability of our business lies in the retention of loyal freelancers that use the platform extensively. But even moonlighters, especially freelancers that have grown with the platform over the years, play a major role.

We devise our segmentation logic to consider the dimensions of platform experience (newbie vs. experienced freelancers) and usage commitment (full-time vs. part-time vs. uncommitted) to understand congestion in these particular submarkets and to take measures towards protection and retention. For instance, as a counterpart to the policy of progressive exposure discussed above, we are implementing a systematic policy of prioritized exposure & access that is applied in times of oversupply. In fact, with the insights into demand and pricing and the control mechanisms that FLOW is able to provide, we believe we will soon be in a position to ensure a stable stream of jobs and provide assured ranges of income for all committed freelancers on Upwork.

Profitable Growth

While long-term sustainability is key, what also matters is understanding and exercising levers that have a direct impact on the bottom line, i.e., profitability. To this end, our efforts in value prediction lay the foundation for policy implementations that help us to double down on users with high-value potential. Viewed through the lens of unit economics, we assess policies based on their return on investment and focus on proposals that target the highest-value users.

On the acquisition side, we leverage client lifetime value predictions to optimize our ad spend over marketing channels such as Search Engine Marketing. As an example, we share our client value predictions directly with Google to take advantage of advanced bid strategies such as Target ROAS (bidding based on a target return on ad spend). Internally, Google runs a machine learning algorithm that, given better data about client conversion, can better optimize the return on spend.

We pursue a similar optimization strategy with our operations team, which runs support programs targeting user success and retention. On the client side, manual support includes helping new clients post jobs, find relevant freelancers and make hiring decisions. Freelancers also receive support in various ways, such as guidance on how to present themselves and tailored insights into becoming more competitive and successful on Upwork. Because support resources are constrained, we make the most of them by focusing high-touch support on the users that have the highest value predictions.
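In its simplest form, focusing a limited support capacity on the highest-value users is a top-k selection. The sketch below assumes every user costs the same amount of support effort, which is a simplification; the `select_for_support` helper, the user ids and the predicted values are hypothetical.

```python
import heapq


def select_for_support(predicted_values, capacity):
    """Pick the `capacity` users with the highest predicted value.

    predicted_values maps a user id to a predicted value in USD.
    """
    return heapq.nlargest(capacity, predicted_values, key=predicted_values.get)


# Hypothetical value predictions for four new clients.
predictions = {"u1": 120.0, "u2": 950.0, "u3": 400.0, "u4": 60.0}

# Suppose the operations team can handle two high-touch cases this week.
selected = select_for_support(predictions, capacity=2)
```

If support cost varied per user, ranking by predicted value per unit of cost would be a natural refinement of the same idea.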

Fast Growth

Finally, and especially because Upwork has established itself as a significant player in the market of online labor, insights into our marketplace are valuable not only for internal optimization but also for external promotion. For instance, we think that with high-resolution supply and demand data and insights, we can better optimize our visitor sites. The amount of supply and demand we have is the largest in online labor. For almost every job category, we think the volume and quality of talent we have are superior to those of all other players, including niche players that focus on a particular domain (e.g. Design). We are working on exposing hard facts we can derive from our data to make our landing pages more appealing to visitors.

Understanding our market in terms of fine-grained segments, and identifying competitive edges in areas such as supply, demand, pricing, and quality in every segment, helps us to form strategies and to clearly communicate our dominance in terms of hard data, facts, and insights, leading to faster user acquisition.

About the Authors

Thanh Tran is the head of data science at Upwork, where he works with a team of 30+ scientists and engineers to innovate the core engine behind the world’s largest platform for freelancing and flexible work. As an entrepreneur and advisor of Bay Area startups, he helped build teams, raised capital for many companies and successfully shipped innovative technology solutions and end-user applications. Thanh previously served as a professor at Karlsruhe Institute of Technology (KIT) and Stanford (visiting), where he led a worldwide top research group in semantic search. He earned various awards and recognition for his academic work (Most Cited Article 5-years award, among top-5 in Semantic Search, and top-50 in Web Search per 2016 Google Scholar Global Index).

This article was reviewed, and the actual work presented was done, by the following data science team members: Alexander Krainov, Amro Tork, Andrei Demus, Artem Moskvin, Danylo D., Dimitris Manikis, Eva Mok, George Barelas, Giannis Koutsoubos, Hemanth Ratakonda, Igor Korsunov, Ivan Portyanko, João Vieira, Le Gu, Lei Zhang, Mikhail Baturov, Nimit Pattanasri, Pablo Celayes, Quang Hieu Vu, Roman Tkachuk, Samur Cardoso De Araujo, Sibo Lu, Siddharth Kumar, Silvestre Losada, Spyros Kapnissis, Vasily Ryazanov, Veli Bicer, Vinh Dang, Yongtao Ma, Zarko Celebic.
