How Data Science helps to build JobTech at StepStone

Timm Lochmann
the-stepstone-group-tech-blog
5 min read · Aug 1, 2022

With about 4,000 employees, The Stepstone Group is a company that strongly shapes how jobseekers find their next career opportunity and how employers find their next talent. With a large set of cross-functional teams comprising over 1,000 people in product, tech & marketing, we are building the infrastructure for JobTech, i.e., for digitizing the labor market and making it more functional. As a concrete example of such improvements, the StepStone platform combines and enriches data from various domains regarding jobseeker interests and current market demand. This enriched data can then be used by automated, machine-learning-based processes to better support companies in specifying their hiring needs and making sure they find the right talent.

Building JobTech

In this mission, the closely related fields of data science, data engineering, and machine learning engineering play a key role: to make processes smart, frictionless, and useful, we need the right data and the capability to extract the relevant information contained in it. For example, analyzing millions of job vacancies lets us identify trending skills in the market, and knowing what users search for on StepStone helps us understand which job options are most attractive to people with a specific profile. And to make the whole ecosystem work smoothly, different processes need to talk to each other so that this information is available at the right decision points. While building such an ecosystem requires aligning multiple domains ranging from business and sales to architecture, infrastructure, and platform engineering, our focus here is on the data domains.
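
To make the trending-skills example concrete, here is a minimal, illustrative sketch of the underlying idea: counting how often known skills are mentioned across job postings. The postings and the skill list below are toy stand-ins, not StepStone data or code.

```python
# Illustrative sketch: surfacing frequently mentioned skills in job postings.
# Both the postings and the skill vocabulary are made-up examples.
from collections import Counter

postings = [
    "Data engineer with Spark and Python experience",
    "Backend developer, Python, AWS",
    "ML engineer: Python, PyTorch, AWS",
]
skills = ["python", "spark", "aws", "pytorch"]

# Count a skill once per posting in which it appears.
counts = Counter(
    skill for text in postings for skill in skills if skill in text.lower()
)
print(counts.most_common())  # e.g. [('python', 3), ('aws', 2), ...]
```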

Data Science in the cloud: Infrastructure and off-the-shelf tools for data science have matured

As the volumes of data increase and the types of data become more diverse, the set of available data science tools has evolved, too. Not only has the need to scale changed the type of machinery on which typical data science processes run (e.g., from local hardware to managed services in the cloud); the algorithms themselves have also become more diverse and mature. For example, deep learning models like the transformer architecture have grown significantly in importance, as they have proven especially useful in Natural Language Processing. A large part of the data at StepStone is text, so these methods let us use the valuable information contained therein. A second example is the family of boosted tree methods such as XGBoost and LightGBM, which have proven very efficient and are now often the go-to method for problems of small to intermediate complexity.
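
As an illustration of the second example, the sketch below trains a gradient-boosted tree classifier with LightGBM on a public scikit-learn toy dataset standing in for a real tabular problem; it is a minimal sketch, not StepStone code.

```python
# Minimal sketch: a gradient-boosted tree classifier with LightGBM.
# The dataset is a public toy set used purely for illustration.
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A small number of shallow trees is often a strong baseline on tabular data.
model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)

print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```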

This evolution of methods is obviously not specific to StepStone but reflects the accelerating digitization, the corresponding increase in available data, and the value to be leveraged from it (see here). Furthermore, the impact of COVID-19 and the need to understand its temporal evolution have increased the usage of forecasting methodologies.

From individual processes to systems

The increasing scale of data volumes and the potential influence of data science insights on business processes have also fueled the professionalization of how we put data science models into action. For example, the ability to efficiently access and aggregate large volumes of data via distributed systems such as Apache Spark, or highly efficient databases such as Amazon Redshift, now makes it possible to gain insights from much larger numbers of jobseekers. Furthermore, managed services like AWS SageMaker let us flexibly scale the computational infrastructure on which machine learning models are trained, depending on the size and complexity of the datasets and models involved.

These two developments enable data scientists to experiment with many different combinations of data and models. It is thus not surprising to see the evolution of platforms like MLflow that allow us to track those experiments, keep an overview of which combinations have been successful, and select the best ones for rollout. Such tools for machine learning operations (MLOps) reduce the cost of developing and maintaining new data science models, sharing them with other data scientists, and monitoring their performance after they have been put into production. Furthermore, they make it feasible and profitable to build new types of event-driven systems that operate and learn in real time.

These developments not only speed up how quickly value can be generated from specific insights; they also enable systems whose processes interact with and listen to each other. This has huge potential for synergies, but it also requires understanding the potential impact of adaptive, self-learning mechanisms and how we can make such processes unbiased, fair, and robust (see here).
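
As a small illustration of experiment tracking, the sketch below logs the parameters and a metric of a single training run with MLflow. The run name, hyperparameters, and metric value are made-up placeholders, not an actual StepStone pipeline.

```python
# Minimal sketch: tracking one training run with MLflow.
# Run name, parameters, and the metric value are illustrative placeholders.
import mlflow

with mlflow.start_run(run_name="example-run"):
    # Record the hyperparameters used for this experiment.
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("learning_rate", 0.05)

    # ... train and evaluate a model here ...

    # Record the evaluation metric so runs can be compared later in the UI.
    mlflow.log_metric("test_auc", 0.91)
```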

This last point explains the growing importance of data ethics: the increasing impact of data science calls not only for understanding the technical aspects involved, but also for addressing the social impact that such socio-technical systems have on various levels. These effects range from microscopic (e.g., the unbiasedness of individual recommendations) over mesoscopic (e.g., fairness regarding multiple target audiences and the distribution of talent across employers, such as gender bias) to macroscopic implications (how sustainable are such processes, and how much CO2 do they produce?). Expect more content on these topics in our next blog posts :)

Embedding data science deeper into the business: Cross-functional teams

The catchphrase “form follows function” holds not only in nature; it is also a powerful observation about organizations. The idea that the characteristics of software built by a team reflect that team’s organizational structure can be turned around to improve productivity. Using this insight to shape organizational design has been called the “Inverse Conway Maneuver.” It is one of the reasons why, at StepStone, we have decided to build autonomous, empowered, cross-functional teams in many product, tech, and marketing domains. This organizational structure facilitates building modular software systems made of reusable building blocks that are easy to use and reliable, and that foster the creativity to combine them in new ways. Most of our cross-functional teams work in sprints, with occasional research spikes to balance feature delivery and phases of time-boxed research.

To keep a critical mass of data scientists in close exchange, we have also implemented a chapter model that organically connects data scientists across teams. This horizontal structure provides career support, facilitates knowledge exchange between data scientists in different teams, and helps create a community that is diverse, inclusive, and fun to be a part of.

This combination of the challenges we tackle, our tech stack, and our organizational setup makes StepStone a highly interdisciplinary place to work in data science and to help find solutions for the tightening labor market, one of the major upcoming societal challenges.

Read more about the technologies we use or take an inside look at our organisation & processes.
Interested in working at StepStone? Check out our careers page.
