Visa-friendly countries/locations for international Data Science professionals

Yaping Lang
7 min read · Nov 7, 2022

Conclusions first:

For Data Scientist, Data Engineer, Machine Learning, or Software Engineer professionals interested in relocating to another country with visa sponsorship, across cities in these 11 regions (Germany, Netherlands, United States, France, United Kingdom, Poland, Canada, India, Brazil, Italy, England), targeting the cities and skill sets below when preparing for relocation could be a good bet:

  • Job opportunities:

Data Engineer > Software Engineer > Machine Learning > Data Scientist

  • Top cities with the most visa-sponsoring jobs:

Berlin > Amsterdam > Paris > Bengaluru (India) > Munich > Oxford

  • Top 5 countries with the most visa-sponsoring jobs:

Germany > Netherlands > United States > United Kingdom > France

  • Top skills/experience required:

Frameworks: Docker > Pandas > NumPy > TensorFlow > Scikit-learn

Platforms: Microsoft Azure > Google Cloud > IBM > Amazon AWS

Databases: MySQL > Snowflake > Cassandra > Elasticsearch > MongoDB

Languages: Python > SQL > Java > Go > R

Background:

This is the first project in the Udacity Data Scientist Nanodegree, where you are asked to apply the Cross-Industry Standard Process for Data Mining (CRISP-DM) to a topic of interest.

In practice, the steps are:

  • Come up with three questions you are interested in answering.
  • Extract the necessary data to answer these questions.
  • Perform necessary cleaning, analysis, and modeling.
  • Evaluate your results.
  • Share your insights

The questions I’m interested in answering, and my motivation:

I’ve always wanted to explore the digital-nomad lifestyle, where you live and work in different places of interest for a period of time. But to be able to work legally in a place, you first need to look for opportunities that sponsor visas or work permits. A vast number of jobs get posted, but it is time-consuming to search, click them one by one, look for visa-sponsorship terms in the job description, and then pick the ones that match. So I want to take a job-posting dataset and perform text analysis on the visa-related text in the job descriptions, to quickly filter out the jobs that provide visa sponsorship and relocation support for international applicants. Once I have the filtered result, in addition to the job list itself, we can also derive some valuable clues as to which countries/cities are more open to global talent, and which skill sets these job opportunities need most. So let’s dive in!

Process:

  1. Come up with questions you are interested in answering.

I want to know:

  • Among Data Engineer, Data Scientist, Machine Learning, and Software Engineer roles, which job is most in demand in the job market?
  • Which country or location has the most such job openings? This should suggest which countries are actively developing in tech.
  • Which country or location has the most job openings that provide visa sponsorship to worldwide talent? This should indicate which countries are more open to international tech talent.
  • Among those jobs that provide visa sponsorship, what are the generally required skills/experience?

2. Extract the necessary data to answer these questions.

The original idea was to scrape the latest worldwide job-posting data from LinkedIn. But from previous experience scraping sites with rate limiting and authentication restrictions, I knew it would take quite a long time to work around those. So, given the time constraints of this project, I decided to look for a similar dataset on Kaggle and use that directly.

  • Data source: https://www.kaggle.com/datasets/mertguvencli/linkedin-jobs
  • Time of data: collected as of 2022/2/26 21:56:06 from LinkedIn Jobs
  • Search keywords: [‘Data Scientist’, ‘Data Engineer’, ‘Machine Learning’, ‘Software Engineer’]
  • Countries: [‘United States’, ‘Canada’, ‘Netherlands’, ‘Germany’, ‘England’, ‘India’, ‘United Kingdom’, ‘France’, ‘Brazil’, ‘Poland’, ‘Italy’]
  • Volume: 26,565 items
  • Fields included: [‘row_id’, ‘created_at’, ‘modified_at’, ‘task_id’, ‘keyword’, ‘country’, ‘job_id’, ‘company’, ‘title’, ‘location’, ‘salary’, ‘description’, ‘skills_frameworks’, ‘skills_databases’, ‘skills_platform’, ‘skills_prog_langs’]
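Loading and inspecting the dataset boils down to a single pandas call. The filename below is hypothetical (use whatever the Kaggle download is called), so this sketch builds a tiny stand-in frame with a subset of the real columns instead:

```python
import pandas as pd

# Hypothetical path -- point this at the downloaded Kaggle CSV instead.
# df = pd.read_csv("linkedin-jobs.csv")

# Tiny stand-in frame with a subset of the real columns, just to show the shape:
df = pd.DataFrame({
    "keyword": ["Data Engineer", "Data Scientist"],
    "country": ["Germany", "Netherlands"],
    "title": ["Senior Data Engineer", "Data Scientist"],
    "location": ["Berlin", "Amsterdam"],
    "description": ["We sponsor visas ...", "Relocation support offered ..."],
})

print(df.shape)               # (2, 5)
print(sorted(df["country"]))  # ['Germany', 'Netherlands']
```

On the real CSV, `df.shape` should report 26,565 rows and the 16 fields listed above.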

3. Perform necessary cleaning, analysis, and modeling.

See https://github.com/lilyyapinglang/linkedin_visajobs
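One of the cleaning steps is dropping non-English postings. As an illustration only (the notebook may well use a proper language-detection library), here is a crude stopword-ratio heuristic for flagging English text, with no extra dependencies:

```python
# Crude English-language check: fraction of tokens that are common English
# stopwords. A real pipeline would likely use a language-detection library,
# but this illustrates the idea.
COMMON_ENGLISH = {"the", "and", "to", "of", "a", "in", "we", "you",
                  "for", "with", "is", "are"}

def looks_english(text, threshold=0.1):
    tokens = text.lower().split()
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in COMMON_ENGLISH)
    return hits / len(tokens) >= threshold

print(looks_english("We are looking for a data engineer to join the team"))  # True
print(looks_english("Wir suchen einen Dateningenieur für unser Team"))       # False
```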

4. Evaluate your results.

  • Among Data Engineers, Data Scientists, Machine Learning, and Software Engineers, which job is most in demand in the job market?
Sort on Keyword

We can see from these results that more jobs match the Data Engineer search keyword than the other three.

Sort on Job Title

When sorting on the actual job-posting `title`, we get finer-grained but similar results.
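The demand comparison is essentially a frequency count over the keyword (or title) column. A toy version of that count, with made-up rows mirroring the real ranking:

```python
import pandas as pd

# Made-up sample mirroring the article's ranking; the real analysis runs the
# same value_counts over all 26,565 rows.
keywords = pd.Series([
    "Data Engineer", "Data Engineer", "Data Engineer",
    "Software Engineer", "Software Engineer",
    "Machine Learning",
    "Data Scientist",
])

counts = keywords.value_counts()  # sorted descending by frequency
print(counts.index[0])            # Data Engineer -- the most frequent keyword
```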

  • Which country or location has the most such job openings? This should suggest which countries are actively developing in tech.
Job counts by Country

We can see that the US takes the lead, followed by Germany and Canada. But we can also see that there’s no dramatic difference between countries.

When we look at the company level and count postings, we get a sense of which companies are actively hiring in these fields; it is no surprise that global tech giants like Amazon, Meta, and IBM take the lead.

  • Which country or location has the most job openings that provide visa sponsorship to worldwide talent? This should indicate which countries are more open to international tech talent.

After removing the job posts that are not in English, that don’t mention visa support, or that don’t support visas or relocation, we are left with roughly 312 entries.
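The first pass of that filter can be a regular-expression match on the description column. The pattern below uses the article's four filter words and is only a sketch — the sentiment check that weeds out negative mentions ("no visa sponsorship") still has to follow:

```python
import pandas as pd

# The article's filter words: visa, visas, work permit, relocation.
VISA_PATTERN = r"\b(?:visa|visas|work permit|relocation)\b"

descriptions = pd.Series([
    "We offer visa sponsorship and relocation support.",
    "Must already be authorized to work; no visa sponsorship.",
    "Great benefits and a hybrid office.",
])

mentions_visa = descriptions.str.contains(VISA_PATTERN, case=False, regex=True)
print(int(mentions_visa.sum()))  # 2 -- note the second post still mentions the word
```

This deliberately over-matches; the second posting is exactly the kind of false positive the later sentiment step needs to remove.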

VisaJobs By Country

We can see that Germany and the Netherlands are the most welcoming of international tech talent (data science, software engineering).

Visa job by City

When looking at cities, Berlin and Amsterdam outweigh the other cities on the list.

  • Among those jobs that provide visa sponsorship, what are some general required skills/experiences?

As I didn’t find an appropriate NLTK tech-words corpus to do the string/text segmentation, I used single-word frequency counts to derive these results. This is a problem for skills_prog_langs in particular, as many programming-language names are single letters or not dictionary words. An improvement would be to use a tech-words NLTK corpus, or a manually created set of special words, to group them into meaningful tech skills.
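A minimal sketch of that single-word frequency count, using toy skills_frameworks values in place of the real column:

```python
from collections import Counter

# Toy skills_frameworks values; the real data comes from the Kaggle columns.
rows = [
    "Docker Pandas Numpy",
    "Docker Tensorflow",
    "Pandas Docker",
]

# Split each row into single words and tally them across all rows.
freq = Counter()
for row in rows:
    freq.update(row.lower().split())

print(freq.most_common(3))  # Docker is the most frequent word in this toy sample
```

This is exactly where multi-word skills like "Apache Spark" or "Scikit Learn" get split apart, which is the limitation described above.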

Skills_Frameworks

We can see that Docker, Pandas, NumPy, TensorFlow, Scikit-learn, and Apache Spark are the most in-demand framework skills.

Skills_Platform

We can see that MS Azure, Google Cloud Platform, IBM Cloud, and AWS are the most commonly required platform skills.

Skills_Database

When it comes to databases, the competitive skills are MySQL, Snowflake, Cassandra, Elasticsearch, and MongoDB.

Skills_Programming_Languages

By manually identifying the sensible entries in this frequency table, we can see that these programming languages are in high demand for data science jobs: Python, SQL, Java, Go, R, and JavaScript.

5. Share your insights

Shared at the beginning of the article.

Limitations & improvement ideas:

  1. Data preparation:
  • Scrape more recent data; maybe also make it a scheduled job.
  • Add more similar keywords, perhaps with the help of synonym lookup via NLTK WordNet in Python.
  • Include more information about each job posting, such as post date, job type, company employee count, company industry, number of applicants, etc.
  • Include job postings from multiple mainstream sites instead of just one: LinkedIn Jobs, Indeed, etc.
  • Include job postings from more countries, worldwide if possible.

2. Data cleaning:

  • The current filter words are: visa, visas, work permit, relocation. We could train a model to find similar words and expand the filter list accordingly, which may produce more results.
  • When detecting the sentiment of sentences that contain the filter words, neither nltk nor textblob gives fully correct results. A few negative sentences get recognized as positive, which hurts the accuracy of the results.
  • For the skills-related columns, it is hard to find a tech-phrases library to do accurate entity extraction before frequency counting. Currently raw word frequency is used; we need a smarter way to delimit the text in those columns.
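One cheap alternative to full sentiment analysis, sketched here with hypothetical filter and negator word lists, is to reject a sentence whenever a simple negator co-occurs with the visa term. This is far from robust (it would wrongly reject "no prior experience needed, visa sponsored"), but it shows why plain keyword matching alone is not enough:

```python
import re

# Hypothetical word lists -- not the project's actual configuration.
VISA_TERMS = {"visa", "visas", "relocation", "permit"}
NEGATORS = {"no", "not", "cannot", "can't", "unable", "without", "don't"}

def sponsors_visa(sentence):
    """Crude check: sentence mentions a visa term and contains no negator.

    A real solution would need a proper sentiment or dependency model; this
    only illustrates the negation problem described above.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not any(t in VISA_TERMS for t in tokens):
        return False
    return not any(t in NEGATORS for t in tokens)

print(sponsors_visa("We provide visa sponsorship and relocation."))  # True
print(sponsors_visa("We cannot provide visa sponsorship."))          # False
```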


Yaping Lang

Software dev, Scrum, data science. Interested in exploring what tech can do for social good > being exploited by consumerism. https://github.com/lilyyapinglang