Visa-friendly countries/locations for international Data Science professionals

Yaping Lang
7 min read · Nov 7, 2022

Conclusions first:

For Data Scientist, Data Engineer, Machine Learning, or Software Engineer professionals interested in relocating to another country with visa sponsorship, across cities in these 11 regions (Germany, Netherlands, United States, France, United Kingdom, Poland, Canada, India, Brazil, Italy, England), targeting the cities and skill sets below when preparing for relocation could be a good bet:

  • Job opportunities:

Data Engineer > Software Engineer > Machine Learning > Data Scientist

  • Top cities with the most visa-sponsoring jobs:

Berlin > Amsterdam > Paris > Bengaluru (India) > Munich > Oxford

  • Top 5 countries with the most visa-sponsoring jobs:

Germany > Netherlands > United States > United Kingdom > France

  • Top skills/experience required:

Frameworks: Docker > Pandas > NumPy > TensorFlow > Scikit-learn

Platforms: Microsoft Azure > Google Cloud > IBM > Amazon AWS

Databases: MySQL > Snowflake > Cassandra > Elasticsearch > MongoDB

Languages: Python > SQL > Java > Go > R

Background:

This is the first project in the Udacity Data Scientist Nanodegree, where you are asked to apply the Cross-Industry Standard Process for Data Mining (CRISP-DM) to a topic of interest.

In practice, the steps are:

  • Come up with three questions you are interested in answering.
  • Extract the necessary data to answer these questions.
  • Perform necessary cleaning, analysis, and modeling.
  • Evaluate your results.
  • Share your insights

The questions I’m interested in answering, and my motivation:

I’ve always wanted to explore the digital-nomad lifestyle, where you live and work in different places of interest for a period of time. But to be able to work legally in a place, you first need to look for opportunities that sponsor visas or work permits. A vast number of jobs get posted, but it is time-consuming to search, click them one by one, look for visa-sponsorship terms in the job description, and then pick the ones that match. So I want to take a job-posting dataset and perform text analysis on the visa-related text in the job descriptions, to quickly filter out the jobs that provide visa sponsorship and relocation support for international applicants. Once I have the filtered result, in addition to the job list itself, we can also derive some valuable clues as to which countries/cities are more open to global talent, and which skill sets these job opportunities need most. So let’s dive in!

Process:

  1. Come up with questions you are interested in answering.

I want to know:

  • Among Data Engineer, Data Scientist, Machine Learning, and Software Engineer roles, which job is most in demand in the job market?
  • Which country or location has the most such job openings? This should suggest which countries are actively developing in tech.
  • Which country or location has the most job openings that provide visa sponsorship to worldwide talent? This should indicate which countries are more open to international tech talent.
  • Among those jobs that provide visa sponsorship, what are the generally required skills/experience?

2. Extract the necessary data to answer these questions.

The original idea was to scrape the latest worldwide job-posting data from LinkedIn. But from previous experience scraping sites with rate limiting and authentication restrictions, I knew it would take quite a long time to work around those. So, given the time constraints of this project, I decided to look for a similar dataset on Kaggle and use that directly.

  • Data source: https://www.kaggle.com/datasets/mertguvencli/linkedin-jobs
  • Time of data: collected as of 2022/2/26 21:56:06 from LinkedIn Jobs
  • Search keywords: [‘Data Scientist’, ‘Data Engineer’, ‘Machine Learning’, ‘Software Engineer’]
  • Countries: [‘United States’, ‘Canada’, ‘Netherlands’, ‘Germany’, ‘England’, ‘India’, ‘United Kingdom’, ‘France’, ‘Brazil’, ‘Poland’, ‘Italy’]
  • Volume: 26,565 items
  • Fields included: [‘row_id’, ‘created_at’, ‘modified_at’, ‘task_id’, ‘keyword’, ‘country’, ‘job_id’, ‘company’, ‘title’, ‘location’, ‘salary’, ‘description’, ‘skills_frameworks’, ‘skills_databases’, ‘skills_platform’, ‘skills_prog_langs’]
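Loading and inspecting the dataset boils down to a single pandas call. The filename below is hypothetical (use whatever the Kaggle download is called), so this sketch builds a tiny stand-in frame with a subset of the real columns instead:

```python
import pandas as pd

# Hypothetical path -- point this at the downloaded Kaggle CSV instead.
# df = pd.read_csv("linkedin-jobs.csv")

# Tiny stand-in frame with a subset of the real columns, just to show the shape:
df = pd.DataFrame({
    "keyword": ["Data Engineer", "Data Scientist"],
    "country": ["Germany", "Netherlands"],
    "title": ["Senior Data Engineer", "Data Scientist"],
    "location": ["Berlin", "Amsterdam"],
    "description": ["We sponsor visas ...", "Relocation support offered ..."],
})

print(df.shape)               # (2, 5)
print(sorted(df["country"]))  # ['Germany', 'Netherlands']
```

On the real CSV, `df.shape` should report 26,565 rows and the 16 fields listed above.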

3. Perform necessary cleaning, analysis, and modeling.

See https://github.com/lilyyapinglang/linkedin_visajobs
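One of the cleaning steps is dropping non-English postings. As an illustration only (the notebook may well use a proper language-detection library), here is a crude stopword-ratio heuristic for flagging English text, with no extra dependencies:

```python
# Crude English-language check: fraction of tokens that are common English
# stopwords. A real pipeline would likely use a language-detection library,
# but this illustrates the idea.
COMMON_ENGLISH = {"the", "and", "to", "of", "a", "in", "we", "you",
                  "for", "with", "is", "are"}

def looks_english(text, threshold=0.1):
    tokens = text.lower().split()
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in COMMON_ENGLISH)
    return hits / len(tokens) >= threshold

print(looks_english("We are looking for a data engineer to join the team"))  # True
print(looks_english("Wir suchen einen Dateningenieur für unser Team"))       # False
```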

4. Evaluate your results.

  • Among Data Engineers, Data Scientists, Machine Learning, and Software Engineers, which job is most in demand in the job market?
Sort on Keyword

We can see from these results that more jobs match the Data Engineer search keyword than the other three.

Sort on Job Title

When sorting on the actual job-posting `title`, we get finer-grained but similar results.
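The demand comparison is essentially a frequency count over the keyword (or title) column. A toy version of that count, with made-up rows mirroring the real ranking:

```python
import pandas as pd

# Made-up sample mirroring the article's ranking; the real analysis runs the
# same value_counts over all 26,565 rows.
keywords = pd.Series([
    "Data Engineer", "Data Engineer", "Data Engineer",
    "Software Engineer", "Software Engineer",
    "Machine Learning",
    "Data Scientist",
])

counts = keywords.value_counts()  # sorted descending by frequency
print(counts.index[0])            # Data Engineer -- the most frequent keyword
```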

  • Which country or location has the most such job openings? This should suggest which countries are actively developing in tech.
Job counts by Country

We can see that the US takes the lead, followed by Germany and Canada. But we can also see that there’s no dramatic difference between countries.

When we look at the company level and count postings, we get a sense of which companies are actively hiring in these fields; it is no surprise that global tech giants like Amazon, Meta, and IBM take the lead.

  • Which country or location has the most job openings that provide visa sponsorship to worldwide talent? This should indicate which countries are more open to international tech talent.

After removing the job posts that are not in English, that don’t mention visa support, or that don’t support visas or relocation, we are left with roughly 312 entries.
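The first pass of that filter can be a regular-expression match on the description column. The pattern below uses the article's four filter words and is only a sketch — the sentiment check that weeds out negative mentions ("no visa sponsorship") still has to follow:

```python
import pandas as pd

# The article's filter words: visa, visas, work permit, relocation.
VISA_PATTERN = r"\b(?:visa|visas|work permit|relocation)\b"

descriptions = pd.Series([
    "We offer visa sponsorship and relocation support.",
    "Must already be authorized to work; no visa sponsorship.",
    "Great benefits and a hybrid office.",
])

mentions_visa = descriptions.str.contains(VISA_PATTERN, case=False, regex=True)
print(int(mentions_visa.sum()))  # 2 -- note the second post still mentions the word
```

This deliberately over-matches; the second posting is exactly the kind of false positive the later sentiment step needs to remove.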

VisaJobs By Country

We can see that Germany and the Netherlands are the most welcoming of international tech talent (data science, software engineering).

Visa job by City

When looking at cities, Berlin and Amsterdam outweigh the other cities on the list.

  • Among those jobs that provide visa sponsorship, what are some general required skills/experiences?

As I didn’t find an appropriate NLTK tech-words corpus to do the string/text segmentation, I used single-word frequency counts to derive these results. This is a problem for skills_prog_langs in particular, as many programming-language names are single letters or not dictionary words. An improvement would be to use a tech-words NLTK corpus, or a manually created set of special words, to group them into meaningful tech skills.
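A minimal sketch of that single-word frequency count, using toy skills_frameworks values in place of the real column:

```python
from collections import Counter

# Toy skills_frameworks values; the real data comes from the Kaggle columns.
rows = [
    "Docker Pandas Numpy",
    "Docker Tensorflow",
    "Pandas Docker",
]

# Split each row into single words and tally them across all rows.
freq = Counter()
for row in rows:
    freq.update(row.lower().split())

print(freq.most_common(3))  # Docker is the most frequent word in this toy sample
```

This is exactly where multi-word skills like "Apache Spark" or "Scikit Learn" get split apart, which is the limitation described above.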

Skills_Frameworks

We can see that Docker, Pandas, NumPy, TensorFlow, Scikit-learn, and Apache Spark are the most in-demand framework skills.

Skills_Platform

We can see that MS Azure, Google Cloud Platform, IBM Cloud, and AWS are the most commonly required platform skills.

Skills_Database

When it comes to databases, the competitive skills are MySQL, Snowflake, Cassandra, Elasticsearch, and MongoDB.

Skills_Programming_Languages

By manually identifying the sensible entries in this frequency table, we can see that these programming languages are in high demand for data science jobs: Python, SQL, Java, Go, R, and JavaScript.

5. Share your insights

Shared at the beginning of the article.

Limitations & improvement ideas:

  1. Data preparation:
  • Scrape more recent data; maybe also make it a scheduled job.
  • Add more similar keywords, perhaps with the help of synonym lookup via NLTK WordNet in Python.
  • Include more information about each job posting, such as post date, job type, company employee count, company industry, number of applicants, etc.
  • Include job postings from multiple mainstream sites instead of just one: LinkedIn Jobs, Indeed, etc.
  • Include job postings from more countries, worldwide if possible.

2. Data cleaning:

  • The current filter words are: visa, visas, work permit, relocation. We could train a model to find similar words and expand the filter list accordingly, which may produce more results.
  • When detecting the sentiment of sentences that contain the filter words, neither nltk nor textblob gives fully correct results. A few negative sentences get recognized as positive, which hurts the accuracy of the results.
  • For the skills-related columns, it is hard to find a tech-phrases library to do accurate entity extraction before frequency counting. Currently raw word frequency is used; we need a smarter way to delimit the text in those columns.
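One cheap alternative to full sentiment analysis, sketched here with hypothetical filter and negator word lists, is to reject a sentence whenever a simple negator co-occurs with the visa term. This is far from robust (it would wrongly reject "no prior experience needed, visa sponsored"), but it shows why plain keyword matching alone is not enough:

```python
import re

# Hypothetical word lists -- not the project's actual configuration.
VISA_TERMS = {"visa", "visas", "relocation", "permit"}
NEGATORS = {"no", "not", "cannot", "can't", "unable", "without", "don't"}

def sponsors_visa(sentence):
    """Crude check: sentence mentions a visa term and contains no negator.

    A real solution would need a proper sentiment or dependency model; this
    only illustrates the negation problem described above.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not any(t in VISA_TERMS for t in tokens):
        return False
    return not any(t in NEGATORS for t in tokens)

print(sponsors_visa("We provide visa sponsorship and relocation."))  # True
print(sponsors_visa("We cannot provide visa sponsorship."))          # False
```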


Yaping Lang

Software dev, Scrum, data science. Interested in exploring what tech can do for social good > being exploited by consumerism. https://github.com/lilyyapinglang