Mind the skills gap! Measure it with a GloVe (pun intended)

Nuri
Jul 16, 2019

I was honoured to take part in the Asia Pacific Economic Cooperation (APEC) initiative focused on closing the digital skills gap. We spent the day discussing how to keep up with emerging technologies and how academia, government and the private sector could work together to re-skill the workforce.

Project DARE (Data Analytics Raising Employment) focuses on preparing the population for the fast-changing demands of the world. I have enclosed a photograph of the event and a link summarising previous activities in Vietnam.

Asia Pacific Economic Cooperation: Closing the Digital Skills Gap Forum, July 15th 2019. Photo credit: Mohammad Juffry Mohamed Najib Mohamad, Chor Meng Tan

How to measure skills gaps?

A key topic that came up in the discussion was “how to measure skills gaps”. I am interested in exchanging ideas on developing rigorous approaches to measuring the supply, demand and evolution of skills in real time, so feel free to reach out if you have thoughts on this!

A real-life example: setting the context

In my earlier post, I wrote about using word embeddings for talent acquisition. Here, I apply a similar framework as a general approach to quantifying skills gaps.

As an example, take my current academic profile (note that this is a skeleton version listing only my position title and key skills; a real profile would normally contain much more descriptive text and detail):

Position title: Data scientist

Skills: Python, Machine Learning, SQL

I want to apply for a new job, but no role fits my current set of competencies exactly. So how do I decide which one to apply for? I only have a subset of the skills each position requires, so how do I measure my distance from each job and get recommendations on how to close my “skills gap”? Consider the examples below:

Job 1 position title: Machine Learning Engineer

Required competencies: Python, Machine Learning, Deep Learning

— — — — — — — — — — — — -

Job 2 position title: Data Engineer

Required competencies: Hadoop, Scala, Spark, Hive, Flask

— — — — — — — — — — -

Job 3 position title: Full Stack Developer

Required competencies: JavaScript, Node.js, AngularJS, SQL

I’d like to know how far I am from each job and, where my profile only partially matches the requirements, what I should do to become fully qualified for the position. See Figure 1 for an illustration of my problem.

Figure 1: Looking to apply for a new job, but I’m not sure which position fits best given my profile
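Before reaching for embeddings, it helps to see what a purely literal comparison gives. The sketch below is a hypothetical baseline, not the method used in this post: it treats each skills list as a set and reports the missing skills, the Jaccard index (shared skills over all distinct skills) and the share of the job’s requirements already covered.

```python
# Skills copied from the example profile and job postings above.
PROFILE = {"Python", "Machine Learning", "SQL"}

JOBS = {
    "Machine Learning Engineer": {"Python", "Machine Learning", "Deep Learning"},
    "Data Engineer": {"Hadoop", "Scala", "Spark", "Hive", "Flask"},
    "Full Stack Developer": {"JavaScript", "Node.js", "AngularJS", "SQL"},
}


def exact_skill_gap(profile: set, required: set) -> dict:
    """Compare a profile with a job using exact (case-sensitive) string matches only."""
    matched = profile & required
    missing = required - profile
    union = profile | required
    return {
        "matched": sorted(matched),
        "missing": sorted(missing),  # the literal "skills gap" for this job
        # Jaccard index: shared skills over all distinct skills in either set.
        "jaccard": len(matched) / len(union) if union else 0.0,
        # Fraction of the job's requirements already covered by the profile.
        "coverage": len(matched) / len(required) if required else 0.0,
    }


if __name__ == "__main__":
    for title, required in JOBS.items():
        gap = exact_skill_gap(PROFILE, required)
        print(f"{title}: coverage={gap['coverage']:.2f} "
              f"jaccard={gap['jaccard']:.2f} missing={gap['missing']}")
```

For the machine learning engineer role this reports Deep Learning as the only missing skill, with two of three requirements covered. For the data engineer role the coverage is zero, because exact matching cannot see any relationship between my skills and the Hadoop ecosystem; capturing that kind of relatedness is what the word-embedding approach aims to do.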

How do I quantify the distance between myself and each job?

So here we code… I mean, here we go! Consider a public set of GloVe vectors pre-trained on Twitter data (see the figure below and the explanatory link on GloVe vectors from Stanford NLP).

A model pre-trained on Twitter may not be the best choice for this use case, because it contains a lot of text that is irrelevant to the context and can add noise to the inferences. However, training a new model takes time, so this public GloVe model serves as a good example.

Figure 2: Loading a public GloVe model pre-trained on Twitter data.

Figure 2 shows how to load a pre-trained model with the gensim library and find the terms most related to any word of interest in its vocabulary. The cosine similarity between two word vectors serves as a measure of how similar the words are. According to this model, “research” is the term most similar to “data science”, which suggests that Twitter data is probably not the best context for mining skills, as other, more skill-related terms could be more relevant.
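Figure 2 used gensim; for example, gensim.downloader.load("glove-twitter-25") returns a set of Twitter GloVe vectors with most_similar and similarity methods. As a dependency-light sketch of the same two operations, the code below parses GloVe’s plain-text format (a token followed by its vector components on each line) and ranks neighbours by cosine similarity. The four toy vectors are invented for illustration only; in practice the lines would come from one of the Stanford Twitter GloVe files.

```python
import numpy as np

# Invented 3-dimensional vectors in GloVe's text format ("token v1 v2 ..." per line).
# Real Twitter GloVe files use the same layout with 25 to 200 dimensions.
TOY_GLOVE = """\
data 0.9 0.1 0.0
science 0.8 0.3 0.1
research 0.7 0.4 0.1
football 0.0 0.1 0.9
"""


def load_glove(lines) -> dict:
    """Parse GloVe text lines into a {token: vector} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_similar(vectors: dict, word: str, topn: int = 3) -> list:
    """Return the `topn` other tokens closest to `word` by cosine similarity."""
    query = vectors[word]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    vecs = load_glove(TOY_GLOVE.splitlines())
    print(most_similar(vecs, "data"))
```

With these toy vectors, “science” comes out closest to “data”, followed by “research”; a real model’s neighbours will of course differ. To read an actual file, pass an open text file handle (UTF-8) to load_glove instead of the toy lines.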

With this approach, a vector representation of a multi-word term can be created (Figure 3) by averaging the vectors of its individual words (there are many methods for embedding sentences; the average is used here for simplicity).

Figure 3: Embedding the term “data science” using a GloVe model pre-trained on Twitter data.
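The averaging step can be sketched as follows: split a multi-word term into tokens, look each token up, and average the vectors that exist. Lowercasing the input and skipping out-of-vocabulary tokens are assumptions of this sketch, and the toy vectors are invented.

```python
import numpy as np

# Invented toy vectors standing in for pre-trained GloVe embeddings.
TOY_VECTORS = {
    "data": np.array([0.9, 0.1, 0.0]),
    "science": np.array([0.8, 0.3, 0.1]),
}


def embed_phrase(phrase: str, vectors: dict):
    """Average the vectors of the phrase's in-vocabulary tokens.

    Returns None when no token is known, so callers can decide how to handle it.
    """
    tokens = [tok for tok in phrase.lower().split() if tok in vectors]
    if not tokens:
        return None
    return np.mean([vectors[tok] for tok in tokens], axis=0)


if __name__ == "__main__":
    print(embed_phrase("Data Science", TOY_VECTORS))
```

Averaging discards word order and weights every token equally; weighting tokens (for example by TF-IDF) is a common refinement.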

Once terms can be embedded, the same technique can be applied to encode my skills profile and each job description. The goal is to measure the distance between myself (the data scientist job seeker) and each of the job postings available to me.

Figure 4: In the last part of the code, I encode my profile and the job description skills as vectors derived from the pre-trained GloVe model, which already contains a notion of distance between words based on millions of tweets. The closest match to my profile is the machine learning engineer role, followed by the data engineer and the full stack developer roles.

The results in Figure 4 show that the cosine similarity between my profile and the machine learning engineer role is 0.93, compared with 0.81 for the data engineer and 0.75 for the full stack developer. The best match for my profile therefore appears to be the machine learning engineer role.
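The ranking step can be approximated with the sketch below: embed each skill by averaging its token vectors, average the skill vectors into one vector per profile or job, and sort the jobs by cosine similarity to my profile. The 3-dimensional vectors are hand-picked toy values, not GloVe output, chosen so that the ordering mirrors Figure 4; the scores are not meant to reproduce the figure.

```python
import numpy as np

# Hand-picked toy vectors; dimensions loosely mean (ML, data engineering, web).
# They are NOT real GloVe vectors and only illustrate the mechanics.
TOY_VECTORS = {token: np.array(vec) for token, vec in {
    "python": [0.6, 0.5, 0.3], "machine": [0.9, 0.1, 0.0],
    "learning": [0.9, 0.1, 0.1], "deep": [0.9, 0.0, 0.0],
    "sql": [0.3, 0.7, 0.3], "hadoop": [0.1, 0.9, 0.0],
    "scala": [0.2, 0.8, 0.1], "spark": [0.3, 0.9, 0.0],
    "hive": [0.1, 0.9, 0.1], "flask": [0.2, 0.3, 0.8],
    "javascript": [0.0, 0.1, 0.9], "node.js": [0.0, 0.2, 0.9],
    "angularjs": [0.0, 0.1, 1.0],
}.items()}

PROFILE = ["Python", "Machine Learning", "SQL"]
JOBS = {
    "Machine Learning Engineer": ["Python", "Machine Learning", "Deep Learning"],
    "Data Engineer": ["Hadoop", "Scala", "Spark", "Hive", "Flask"],
    "Full Stack Developer": ["JavaScript", "Node.js", "AngularJS", "SQL"],
}


def embed_skills(skills, vectors):
    """Average token vectors within each skill, then average across skills."""
    skill_vecs = []
    for skill in skills:
        tokens = [t for t in skill.lower().split() if t in vectors]
        if tokens:  # skills with no known token are skipped
            skill_vecs.append(np.mean([vectors[t] for t in tokens], axis=0))
    if not skill_vecs:
        raise ValueError("none of the skills are in the vocabulary")
    return np.mean(skill_vecs, axis=0)


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_jobs(profile, jobs, vectors):
    """Return (job title, similarity) pairs, most similar first."""
    me = embed_skills(profile, vectors)
    scores = [(title, cosine_similarity(me, embed_skills(skills, vectors)))
              for title, skills in jobs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for title, score in rank_jobs(PROFILE, JOBS, TOY_VECTORS):
        print(f"{title}: {score:.2f}")
```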

However, the machine learning engineer role scores highest partly because several of its skills match mine word for word. That will not always be the case, since different companies use different taxonomies and terminology to describe similar sets of skills and competencies. The real-life problem is therefore more complex, and some industry-wide standards are needed.
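One possible way to soften the exact-match problem is to compare skills individually in embedding space: a required skill counts as covered if its closest skill in my profile exceeds a similarity threshold. The toy vectors, the use of “ML” as a synonym for “Machine Learning” and the 0.8 threshold are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical toy vectors; "ml" is deliberately placed next to "machine learning".
TOY_VECTORS = {token: np.array(vec) for token, vec in {
    "python": [0.6, 0.5, 0.3], "machine": [0.9, 0.1, 0.0],
    "learning": [0.9, 0.1, 0.1], "ml": [0.85, 0.15, 0.05],
    "sql": [0.3, 0.7, 0.3], "kubernetes": [0.1, 0.3, 0.9],
}.items()}


def embed_phrase(phrase, vectors):
    """Average the vectors of known tokens; None if the phrase is fully out of vocabulary."""
    tokens = [t for t in phrase.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in tokens], axis=0) if tokens else None


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def soft_skill_gap(profile, required, vectors, threshold=0.8):
    """Match each required skill to its closest profile skill; report uncovered ones."""
    profile_vecs = [(s, embed_phrase(s, vectors)) for s in profile]
    profile_vecs = [(s, v) for s, v in profile_vecs if v is not None]
    matches, missing = {}, []
    for skill in required:
        vec = embed_phrase(skill, vectors)
        if vec is None or not profile_vecs:
            missing.append(skill)  # unknown skills count as gaps
            continue
        score, closest = max((cosine_similarity(vec, pv), ps) for ps, pv in profile_vecs)
        if score >= threshold:
            matches[skill] = (closest, round(score, 3))
        else:
            missing.append(skill)
    return matches, missing


if __name__ == "__main__":
    print(soft_skill_gap(["Python", "Machine Learning", "SQL"],
                         ["ML", "Python", "Kubernetes"], TOY_VECTORS))
```

Exact string matching would flag both “ML” and “Kubernetes” as gaps; this version flags only “Kubernetes”. The threshold trades spurious matches against missed synonyms and would need tuning on real data.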

Ideally, I would not only get the distance between my profile and each role, and so find the role with the smallest gap, but also receive concrete actions I can take to upskill myself for the machine learning engineer role (Figure 5). The idea is to offer personalized learning recommendations based on my professional profile: the shortest path from my current skillset to another career path, with the estimated time and the courses I would need to take.

Figure 5: Visualizing the distance between myself and the three possible roles I could apply for. Since I don’t have the exact skillset for any of the roles, I am given recommendations on how I can train myself to become qualified for those positions and the estimated effort required to learn.
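A first version of the recommendation step could be as simple as this: list the skills each role requires that I lack, attach an estimated effort to each, and order the roles by total effort so the “shortest path” comes first. The effort mapping is a hypothetical input (for example, hours per course) that a real system would need to source from course catalogues; without it, each missing skill counts as one unit.

```python
PROFILE = ["Python", "Machine Learning", "SQL"]
JOBS = {
    "Machine Learning Engineer": ["Python", "Machine Learning", "Deep Learning"],
    "Data Engineer": ["Hadoop", "Scala", "Spark", "Hive", "Flask"],
    "Full Stack Developer": ["JavaScript", "Node.js", "AngularJS", "SQL"],
}


def learning_plan(profile, jobs, effort=None):
    """Rank roles by the total estimated effort to learn their missing skills.

    `effort` maps a skill to a hypothetical cost (for example, hours); skills absent
    from it cost 1.0, so with no mapping the total is the number of missing skills.
    Mixing that default with real hour estimates would mix units.
    """
    effort = effort or {}
    have = set(profile)
    plans = []
    for title, required in jobs.items():
        missing = sorted(set(required) - have)
        total = sum(effort.get(skill, 1.0) for skill in missing)
        plans.append((title, missing, total))
    # Smallest total effort first: the "shortest path" to a new role.
    return sorted(plans, key=lambda plan: plan[2])


if __name__ == "__main__":
    for title, missing, total in learning_plan(PROFILE, JOBS):
        print(f"{title}: learn {missing} (effort {total:g})")
```

Counting missing skills swaps the data engineer and full stack developer roles relative to Figure 4 (five missing skills versus three), so which ordering is more useful depends on whether the question is “closest in meaning” or “fewest things to learn”.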

Wider considerations

The framework presented here is an overly simplified approach to closing skills gaps. The challenge is to develop an approach that generalizes across taxonomies, markets and industries. In addition, resumes, job descriptions and web-crawled text can be very noisy, which reduces the accuracy of the matches. A named-entity recognition (NER) step that tags skill-related terms before embedding might help extract only the words that relate to skills and expertise. Other dimensions such as time in role, length of role, depth of expertise and academic background are also important to consider, so this framework is just a starting point.
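As a much simpler stand-in for the named-entity recognition step suggested above, a dictionary (gazetteer) lookup can filter noisy text down to known skill terms before embedding. This swaps in plain string matching rather than NER, and the skill list is hypothetical; a real system would use a curated taxonomy or a trained NER model.

```python
import re

# Hypothetical skill dictionary; in practice this would come from a curated taxonomy.
SKILL_GAZETTEER = ["machine learning", "deep learning", "python", "sql", "spark", "hadoop"]


def extract_skills(text: str, gazetteer=SKILL_GAZETTEER) -> list:
    """Return gazetteer skills found in `text`, ordered by first appearance."""
    lowered = text.lower()
    found = []
    for skill in gazetteer:
        # Require a non-alphanumeric character (or the string edge) on both sides,
        # so "python" does not fire inside "pythonic".
        pattern = r"(?<![a-z0-9])" + re.escape(skill) + r"(?![a-z0-9])"
        match = re.search(pattern, lowered)
        if match:
            found.append((match.start(), skill))
    return [skill for _, skill in sorted(found)]


if __name__ == "__main__":
    posting = ("We are looking for a rockstar engineer who loves Python, SQL "
               "and machine learning; bonus points for Spark!")
    print(extract_skills(posting))
```

The extracted terms can then be embedded in place of the raw text, so that words such as “rockstar” do not pull the averaged vector away from the skills.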

Useful reports

LinkedIn’s Economic Graph provides interesting reports and analytics on industry skills gaps and career gap analysis, offering a consolidated worldwide view of various markets and industries. This can help set the context and framework for any profession we are interested in researching. Burning Glass Technologies also publishes research on trending skills, looking at the demand (job posting) side. SkillsFuture Singapore and the NTUC learning center are spearheading continuous professional development efforts in Singapore, which will be crucial for training the workforce of the future. In addition, IBM’s Watson Career Coach aims to give individual learners a personalized experience based on their interests and career aspirations.

Renewable agile workforce

A common misconception about skills gaps is that once they are closed, we are done. The AI revolution is upon us: technology is evolving faster than policy, many roles are rapidly becoming obsolete, and several industries are being disrupted by automation.

AI also provides a huge opportunity to create new types of job roles. Given the fast pace of innovation, we can no longer expect a static career in which learning ends once we leave university. A renewable, agile workforce will require continuous re-training and a growth mindset, with people constantly acquiring new skills to keep up with market demands. Going back to university every time we need to learn something new is costly and time-consuming. Therefore, we must all be ready to renew our skills and leverage the online resources available to us to keep up with emerging technologies.

To take my own example: I was not born a data scientist, and I am not sure my profession will look the same 5–10 years from now. What really helped me get here was leveraging open source resources, short courses from Udacity, Udemy and Coursera, and IBM badges and data science certifications, which let me start applying new techniques to my day-to-day business problems.

The key to success is learning how to learn and taking a flexible approach to our work, knowing that things will be in constant movement. Adaptability is essential, and keeping up with the fast pace of innovation will require a change in mindset.

I am a data scientist with a passion for sustainability and social impact projects.