Week 2 — This is the way.

Erdem Korhan Erdem
AIN311 Fall 2022 Projects
4 min read · Nov 20, 2022

by Baran Orhan and Erdem Korhan Erdem

Hello again, and welcome to the second weekly blog post of our very own course recommendation project. Last week, we introduced the key ideas and the main steps of our project. This week, we are going to get into more technical detail and talk about the machine learning techniques we are planning to use.

As explained in the previous week, our plan is to recommend courses which have outcomes related to the skills of engineers working in user-specified fields. In this case, our main goal is to extract skills from raw course outcome texts. To fulfill this task, we have come up with two approaches:

Named Entity Recognition (NER)

Since extracting skills from raw text is an NLP problem, we are going to use an NLP approach: NER.

Named Entity Recognition is used to identify named entities in unstructured text. An example makes this clearer. Assume that a raw course outcome contains the term HTML and that we have already labeled the course category as Web Development. When a new raw text containing HTML arrives, the model will be highly likely to assign the course to the Web Development category.
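To make the HTML example concrete, here is a minimal sketch of the simplest possible baseline: a dictionary (gazetteer) lookup that finds known skill terms in a course outcome and reports their category. This is not a trained NER model and not necessarily what we will end up using; the `SKILL_GAZETTEER` entries, the category names, and the `find_skills` helper are all made up for illustration.

```python
import re

# Hypothetical gazetteer: skill phrase -> entity category (illustrative only).
SKILL_GAZETTEER = {
    "html": "WEB_DEVELOPMENT",
    "css": "WEB_DEVELOPMENT",
    "web development": "WEB_DEVELOPMENT",
    "machine learning": "DATA_SCIENCE",
}


def find_skills(text):
    """Return (start, end, surface_text, category) for every gazetteer hit."""
    hits = []
    # Try longer phrases first so multi-word skills win over shorter ones.
    for phrase in sorted(SKILL_GAZETTEER, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip matches that overlap a span we have already accepted.
            if any(match.start() < e and s < match.end() for s, e, _, _ in hits):
                continue
            hits.append(
                (match.start(), match.end(), match.group(0), SKILL_GAZETTEER[phrase])
            )
    return sorted(hits)


outcome = "Build responsive pages with HTML and CSS."
for start, end, surface, category in find_skills(outcome):
    print(f"{surface!r} [{start}:{end}] -> {category}")
```

A lookup like this only finds terms it already knows, which is exactly the limitation a learned NER model is meant to overcome.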

Our main purpose is to detect entities that refer to skills. Web development is an entity with two tokens. But how can our model know that these words refer to an entity? To handle this, we need to define entity categories. As an example from a different field, names like Facebook and Amazon refer to companies. If we cannot find ready-to-use labeled data, we can also create our own entity examples for our project. We are still searching for entities related to our work.
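If we end up labeling data ourselves, one common way to mark multi-token entities is the BIO scheme: the first token of an entity gets a B- tag, the following tokens get I- tags, and everything else gets O. The sketch below assumes this scheme and uses the hypothetical labels SKILL and COMPANY; the sentence and the `bio_tags` helper are only illustrative.

```python
def bio_tags(tokens, entities):
    """Label tokens with BIO tags.

    entities is a list of (start_token, end_token_exclusive, label) triples.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = ["Learn", "web", "development", "at", "Facebook"]
# "web development" covers tokens 1..2, "Facebook" is token 4.
entities = [(1, 3, "SKILL"), (4, 5, "COMPANY")]

for token, tag in zip(tokens, bio_tags(tokens, entities)):
    print(f"{token}\t{tag}")
```

With this encoding, the two tokens of "web development" stay tied together as a single SKILL entity.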

The research paper SKILL: A System for Skill Identification and Normalization [1] is an example of using NER and NEN to extract skills. This automated system can be used to speed up recruitment processes.

The figure illustrates how the concept works. It is taken from SKILLSPAN [2], which extracts soft and hard skills from text. In that work, Dev/Sec ops is labeled as knowledge; do not be confused by the terminology.

After we do some more research and understand the concepts of NER and NEN, we will be able to give more information about libraries.

Open source libraries for NER:

Named Entity Normalization (NEN)

After entity recognition, the next issue we need to consider is entity normalization. In many text documents, we can encounter entities that might be written in various forms. This discrepancy might be encountered in two different cases: ambiguity and synonymy. [3]

Firstly, synonymy arises when different names refer to the same entity. For instance, the strings AI and ML can be used to refer to the concepts artificial intelligence and machine learning, respectively.

Secondly, ambiguity arises when distinct entities share the same name: e.g., the word Hacettepe may refer to Hacettepe University, Hacettepe Hospital, or Hacettepe Street.

In these cases, names should not only be identified, as NER does, but also be normalized to the concepts they refer to. One approach to this normalization task is Named Entity Normalization (NEN). We expect to encounter both ambiguity and synonymy in our course outcome texts. To handle them and boost the performance of NER, we think that using NEN alongside NER will give more satisfactory results.
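As a rough illustration of the two cases, the sketch below resolves synonymy with an alias table and ambiguity with a simple context-word overlap. Both tables, the cue words, and the `normalize` helper are assumptions made up for this example; a real NEN system would typically link mentions to a knowledge base with a learned model.

```python
import re

# Hypothetical alias table for synonymy: surface form -> canonical concept.
ALIASES = {
    "ai": "artificial intelligence",
    "artificial intelligence": "artificial intelligence",
    "ml": "machine learning",
    "machine learning": "machine learning",
}

# Hypothetical candidates for an ambiguous name, each with a few cue words.
AMBIGUOUS = {
    "hacettepe": {
        "Hacettepe University": {"student", "course", "campus", "degree"},
        "Hacettepe Hospital": {"patient", "doctor", "clinic"},
        "Hacettepe Street": {"address", "street", "road"},
    }
}


def normalize(mention, context=""):
    """Map a mention to a canonical concept, or None if it cannot be resolved."""
    key = mention.strip().lower()
    if key in ALIASES:                       # synonymy: direct lookup
        return ALIASES[key]
    if key in AMBIGUOUS:                     # ambiguity: use the context
        words = set(re.findall(r"[a-z]+", context.lower()))
        scores = {c: len(cues & words) for c, cues in AMBIGUOUS[key].items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None
    return None


print(normalize("ML"))
print(normalize("Hacettepe", "The patient saw a doctor at Hacettepe."))
print(normalize("Hacettepe"))
```

Without any context, the ambiguous mention is left unresolved rather than guessed.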

Data Collection

Besides looking for approaches we might use, we are also starting to collect the required data. From this week on, we are going to gather engineers’ skills and online course outcomes as our main data sources.

We are going to use LinkedIn’s ‘skills’ section to collect engineers’ skills.
We are going to use Udemy’s course outcome section to collect course outcomes.
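We have not written the scraper yet, but the parsing step could look roughly like the sketch below. It pulls the text of list items out of an HTML fragment using only Python's standard library. The `SAMPLE_HTML` markup is invented for the example; the real pages will almost certainly be structured differently, and the `extract_list_items` helper is hypothetical.

```python
from html.parser import HTMLParser


class ListItemExtractor(HTMLParser):
    """Collect the text of every <li> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_li = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._inside_li = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._inside_li:
            self._inside_li = False
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self._inside_li:
            self._buffer.append(data)


def extract_list_items(html):
    parser = ListItemExtractor()
    parser.feed(html)
    return parser.items


# Invented markup, standing in for a course outcome list.
SAMPLE_HTML = (
    "<ul><li>Build pages with <b>HTML</b> and CSS</li>"
    "<li>  Deploy a site  </li></ul>"
)

for outcome in extract_list_items(SAMPLE_HTML):
    print(outcome)
```

The extracted strings would then be the raw course outcome texts that feed the NER step.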

Given all this, it seems that the next couple of weeks will include more research on how to obtain labeled data, as well as web scraping. Thanks for reading and until next week, au revoir.

Keep in mind, this is the way.
