Week 3 — This is the way!

Baran Orhan
AIN311 Fall 2022 Projects
Dec 5, 2022

by Erdem Korhan Erdem and Baran Orhan

How was the journey with the Razor Crest since the Week 2 blog?

https://www.behance.net/gallery/94933395/The-Mandalorians-Razor-Crest-cross-section

After a tiring week, we want to share our progress with you.

Please keep in mind that we are not using any ready-to-use dataset, and think about how hard it is to scrape the data ourselves.

This week we focused on collecting data and on handling the problems that came up while getting it ready.

Skills Dataset

Starting with LinkedIn, everything is going well so far. We collected the skills of 20 software developers to check that nothing goes wrong.

This is one example of the data we get from LinkedIn. We do not want any problems with either LinkedIn or the developers, so we are not showing names.

All skills are obtained from real software developers who are still working at good companies with good lawyers :)

0,{'https://www.linkedin.com/in/********/': {'name': '******', 'skill': ['Microsoft Office', 'Teamwork', 'Software Development', 'Team Leadership', 'Management', 'Public Speaking', 'Research', 'Web Development', 'SQL', 'Team Management', 'Project Planning', 'Project Management', 'C++', 'Leadership', 'Computational Photography', 'Robotics', 'iOS', 'Business Analysis', 'Matlab', 'JavaScript', 'Software Development', 'Research', 'Web Development', 'Project Planning', 'Project Management', 'Computational Photography', 'Robotics', 'Business Analysis', 'Calligraphy', 'Microsoft Office', 'SQL', 'C++', 'iOS', 'Matlab', 'JavaScript', 'Teamwork', 'Team Leadership', 'Management', 'Public Speaking', 'Team Management', 'Leadership']}}

Part of the skill dataset
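The sample record above lists several skills more than once (for example, SQL and C++ each appear twice). Here is a minimal sketch of how such duplicates could be removed while keeping the original order. The helper name `unique_skills` and the shortened record are our own illustration, not part of the scraper itself.

```python
def unique_skills(skills):
    """Return the skills with duplicates removed, keeping first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so the first
    # occurrence of every skill survives and later repeats are dropped.
    return list(dict.fromkeys(skills))


# A shortened, anonymized record in the same shape as the sample above.
record = {
    "https://www.linkedin.com/in/********/": {
        "name": "******",
        "skill": ["SQL", "C++", "Teamwork", "SQL", "Calligraphy", "C++"],
    }
}

# Rebuild every profile with a deduplicated skill list.
cleaned = {
    url: {**profile, "skill": unique_skills(profile["skill"])}
    for url, profile in record.items()
}

for url, profile in cleaned.items():
    print(url, profile["skill"])
```

Keeping the order (instead of using a plain `set`) makes the cleaned list easier to compare with the profile it came from.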

Course Outcome Dataset

Thanks to Udemy, automated systems are not allowed on their site, which made collecting this data harder. Still, we came out with guns blazing. Here is our dataset for the outcomes of courses.

We will collect 15 courses for planned areas such as ML, frontend, backend, and cyber security.

0,{'https://www.udemy.com/course/machinelearning/': {'Outcomes': ['Master Machine Learning on Python & R', 'Have a great intuition of many Machine Learning models', 'Make accurate predictions', 'Make powerful analysis', 'Make robust Machine Learning models', 'Create strong added value to your business', 'Use Machine Learning for personal purpose', 'Handle specific topics like Reinforcement Learning, NLP and Deep Learning', 'Handle advanced techniques like Dimensionality Reduction', 'Know which Machine Learning model to choose for each type of problem', 'Build an army of powerful Machine Learning models and know how to combine them to solve any problem']}}

Part of the course outcome dataset
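Each row in the sample above is an index, a comma, and then a Python-style dictionary. Below is a rough sketch of how one such row could be read back into Python. It assumes the rows are stored with plain ASCII quotes, and the function name `parse_row` and the shortened row are only for illustration.

```python
import ast


def parse_row(line):
    """Split an 'index,{...}' row into (int index, dict record)."""
    # Split only on the first comma: the dictionary part contains commas too.
    index_text, record_text = line.split(",", 1)
    # literal_eval parses Python literals without executing arbitrary code.
    return int(index_text), ast.literal_eval(record_text.strip())


# A shortened row in the same shape as the sample above.
row = (
    "0,{'https://www.udemy.com/course/machinelearning/': "
    "{'Outcomes': ['Make accurate predictions', 'Make powerful analysis']}}"
)

index, record = parse_row(row)
for url, fields in record.items():
    print(index, url, len(fields["Outcomes"]), "outcomes")
```

We use `ast.literal_eval` here instead of `json.loads` because the rows use single quotes, which JSON does not accept.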

Update about last week

As explained in the previous week, we are currently seeking ways to obtain labeled data. Unfortunately, we could not find an entity dataset specific to the CS domain. If we eventually find an entity dataset suitable for our task, we plan to use it to train the model. However, we know that we may not end up with a ready-to-use entity dataset. For this reason, in the worst case we will do the labeling ourselves. Therefore, we are also looking for convenient labeling techniques for NER datasets.

Below you can see a sample of a labelled entity dataset[1]:

Wolves and Lions labeled
Apollo is labeled
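Labeled NER data of this kind is commonly stored with one token and its tag per line, and a blank line between sentences. The sketch below reads that layout and pulls out the labeled entities. It assumes tab-separated `token<TAB>tag` lines with BIO tags (`B-` begins an entity, `I-` continues it, `O` is outside). The sample sentence and the `skill` tag are invented for illustration and do not come from the referenced dataset.

```python
def read_conll(lines):
    """Group 'token<TAB>tag' lines into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence):
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens."""
    entities, tokens, etype = [], [], None
    for token, tag in sentence:
        if tag.startswith("B-"):
            # A new entity starts; store the one we were building, if any.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == etype:
            tokens.append(token)
        else:
            # 'O' (or a stray I- tag) ends the current entity.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [], None
    if tokens:
        entities.append((" ".join(tokens), etype))
    return entities


# Invented example with a hypothetical 'skill' tag.
sample = [
    "I\tO", "learned\tO", "Machine\tB-skill", "Learning\tI-skill",
    "and\tO", "SQL\tB-skill", "",
]

for sentence in read_conll(sample):
    print(extract_entities(sentence))
```

If we label our own data, keeping it in the same layout would let us reuse one reader for both ready-made and hand-labeled datasets.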

See you in next week's blog. You think I forgot something, but no.

Reference:

[1] https://github.com/aritter/twitter_nlp/blob/master/data/annotated/wnut16/data/train
