Software Developer Jobs in Berlin: A Textual Analysis

Scraping and analyzing job entries from Glassdoor.

Vlad Gheorghe
The Startup
6 min read · Oct 10, 2020


Introduction

Motivation

Landing the first job in the IT industry is not easy. As I moved to Berlin and started browsing through job entries, I figured I might benefit from a few data-driven insights. Here is a brief report of what I’ve learned.

Scraping Jobs with Selenium

For my data source, I selected the popular jobs site Glassdoor. Since their API is not directly accessible, I had to scrape the jobs on my own. The process was not simple: I had to access the website, log in with my details, and examine every job posting in turn.

I knew that Selenium WebDriver was the tool for the job. I found a relevant project on GitHub, but it didn’t work for me. (Hardly surprising, as the code is two years old, and popular websites change all the time.) So I wrote a new script from scratch. It takes a job title and a location as input and executes the search.

You can access the code at my GitHub.

The data

On September 4, 2020 I extracted 834 job entries. I used the search term ‘software developer’ and set the location to Berlin. For each job I collected the title, description, and the name and rating of the company.

The data and the notebook with the analysis are available at my GitHub.

Insights

1. German vs. English

Some postings were in English, others in German. To proceed with the text analysis, I had to bring them all into a single language. Moreover, I asked myself: how important is knowledge of German for these jobs?

Fortunately, Google Cloud provides free access to its translation service through the ‘Free trial credit’ program. I used the Google Translate API to translate all the postings. Before translating, the service detects the original language, so I could see that a majority of postings were written in English.

Figure 1. Language of the job description.

However, there is a selection effect at work, as my search query was in English. In fact, at least 38% of the job descriptions mention the German language. The importance of knowing German should not be underestimated.
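The share of descriptions that mention German can be estimated with a simple pattern match. The sketch below is only an illustration: the pattern, the function name `share_mentioning_german`, and the sample texts are assumptions, not the code or data from the original notebook.

```python
import re

# Assumed pattern: "German" or "Deutsch" as whole words, case-insensitive.
# The whole-word match deliberately skips "Germany".
GERMAN_PATTERN = re.compile(r"\b(german|deutsch)\b", re.IGNORECASE)


def share_mentioning_german(descriptions):
    """Return the fraction of descriptions that mention the German language."""
    if not descriptions:
        return 0.0
    hits = sum(1 for text in descriptions if GERMAN_PATTERN.search(text))
    return hits / len(descriptions)


# Toy descriptions for illustration only, not the scraped dataset.
sample = [
    "Fluent German is required for this role.",
    "We are an English-speaking team working with Python.",
    "Deutsch auf C1-Niveau.",
    "Experience with Germany-based clients.",
]
print(share_mentioning_german(sample))
```

A match only shows that the language is mentioned, not that it is required, which is one reason to read the resulting share as a rough indicator.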

2. Companies

Berlin’s IT scene is quite rich. For this dataset, I identified 432 distinct companies. The company with the most postings was Amazon, with 31 entries. The other companies had no more than 11 entries each.

Figure 2. Companies word cloud. Size is proportional to the number of entries.
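Counting distinct companies and their number of postings takes only a few lines. This is a minimal sketch with a hypothetical helper `company_stats` and made-up company names (apart from Amazon); it is not taken from the original notebook.

```python
from collections import Counter


def company_stats(companies, top_n=3):
    """Return (number of distinct companies, the top_n most frequent ones)."""
    # Strip whitespace so "Amazon " and "Amazon" count as the same company,
    # and skip missing or empty names.
    counts = Counter(name.strip() for name in companies if name and name.strip())
    return len(counts), counts.most_common(top_n)


# Toy list for illustration only; one company name per job entry.
sample = ["Amazon", "Example GmbH", "Amazon ", "Sample AG", "Example GmbH", "Amazon"]
distinct, top = company_stats(sample)
print(distinct, top)
```

The same counts can serve as the weights of a word cloud, where size is proportional to the number of entries.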

Glassdoor provides a rating from 1 to 5 for most companies. By looking at the rating distribution, we obtain a general assessment of job satisfaction in this domain.

Figure 3. Companies’ ratings distribution.

The distribution is skewed toward higher ratings. This suggests that employees are generally satisfied with these companies.

3. Job titles

‘Software developer’ is a broad term. To gain more insight into the nature of the jobs, we can look at the word cloud for job titles.

Figure 4. Word cloud for job titles.

I was surprised by the preponderance of DevOps jobs. Among the specializations that appear in job titles, DevOps stands out.

Figure 5. DevOps is mentioned often in job titles.

If we look at seniority qualifications, we find that almost 300 job titles ask for seniors, while only 28 mention juniors or students.

My guess is that the majority of positions want someone who’s in the middle — neither too green nor too ripe. More on that below.

Figure 6. Seniority qualifications in job titles.
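One way to count seniority qualifications in job titles is keyword matching. The keyword groups and the function `count_seniority` below are assumptions made for illustration; the original analysis may have used different terms.

```python
import re
from collections import Counter

# Assumed keyword groups; the exact terms used in the article are not known.
SENIORITY_PATTERNS = {
    "senior": re.compile(r"\b(senior|sr)\b", re.IGNORECASE),
    "junior_or_student": re.compile(r"\b(junior|jr|student)\b", re.IGNORECASE),
}


def count_seniority(titles):
    """Count titles per seniority group; titles with no match are 'unspecified'."""
    counts = Counter()
    for title in titles:
        matched = False
        for label, pattern in SENIORITY_PATTERNS.items():
            if pattern.search(title):
                counts[label] += 1
                matched = True
        if not matched:
            counts["unspecified"] += 1
    return counts


# Toy titles for illustration only.
sample = [
    "Senior Software Developer (m/f/d)",
    "Sr. DevOps Engineer",
    "Junior Java Developer",
    "Working Student Backend",
    "Software Engineer",
]
print(count_seniority(sample))
```

A title that names both levels is counted in both groups, so the group counts may add up to more than the number of titles.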

4. Technologies

IT jobs are strongly defined by the technologies that the applicant is required to understand. To study which technologies were included in my data, I parsed the job descriptions for upper-cased words. I examined the words that were mentioned in at least 20 descriptions and selected those that represented a specific technology. I then mapped each word to a universal referent to avoid confusion and repetition (e.g. Go and Golang are the same technology). In the end I obtained a list of relevant technologies.

I then scored each technology on the number of job descriptions in which it appeared.

Figure 7. Technologies by mentions in job descriptions.
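The extraction and scoring steps can be sketched roughly as follows. The token pattern, the alias map, and all function names are assumptions for illustration. The sketch also applies the alias mapping before counting, which is a simplification. The manual step of picking out actual technologies from the candidate words is not automated here.

```python
import re
from collections import Counter

# Assumed alias map: spelling variants point to one canonical name.
ALIASES = {"Golang": "Go", "JS": "JavaScript", "Javascript": "JavaScript"}

# Words starting with an uppercase letter; "+" and "#" are allowed
# so that names like "C++" or "C#" stay in one piece.
TOKEN = re.compile(r"\b[A-Z][A-Za-z0-9+#]*")


def capitalized_tokens(text):
    """Return the set of canonical capitalized words in one description."""
    return {ALIASES.get(tok, tok) for tok in TOKEN.findall(text)}


def document_frequency(descriptions):
    """Count, for each word, the number of descriptions it appears in."""
    counts = Counter()
    for text in descriptions:
        # A set per description, so repeated mentions count only once.
        counts.update(capitalized_tokens(text))
    return counts


def frequent_candidates(descriptions, min_docs=20):
    """Keep words that appear in at least min_docs descriptions."""
    freq = document_frequency(descriptions)
    return {word: n for word, n in freq.items() if n >= min_docs}


# Toy descriptions for illustration only; a low threshold suits the tiny sample.
sample = [
    "We use Go and Docker.",
    "Golang experience required. Docker is a plus.",
    "Java and SQL. We also use Docker.",
]
print(frequent_candidates(sample, min_docs=2))
```

Ordinary sentence-initial words such as "We" pass the filter too, which is why a manual selection of real technologies is still needed afterwards.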

JavaScript, Java and SQL were the most popular technologies, followed by AWS and Docker. Python was only sixth — not very surprising, given that most of these jobs are not data-related.

If we look at technologies mentioned in job titles, Java is by far the most popular. PHP is second, despite its shadow of decline, and Python is third.

After this analysis, I am even stronger in my resolve to learn React. It’s the only framework that has 15 job titles all to itself!

Figure 8. Technologies by mention in job titles.

5. Experience

How many years of experience are typically required? This is a crucial question. I didn’t have that information, so I had to extract it from the descriptions. For this purpose, I looked for “n-years expressions”. These are parts of a text where we find the word ‘years’ preceded by a number.

From a qualitative analysis, I concluded that “n-years expressions” serve three main functions in job descriptions: they indicate the duration of the contract, they relate to the history of the company, or they refer to the years of experience. I was only interested in the last of these, so I selected expressions where the subsequent 10 words included the word ‘experience’.

The numerical expressions sometimes indicated ranges, such as ‘5–7 years’ or ‘5+ years’. In that case, I selected the lower end of the range (5 in both examples). To filter out false positives, I manually verified that no description asked for more than 10 years.

Some descriptions had more than one “n-years expression”: for example, a job may ask for 5 years of experience in Java and 2 years in Go. I wanted to find the binding conditions for obtaining the job, so I selected the highest requirement.

Overall, this method allowed me to identify the required years of experience for 245 jobs.
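The rules described above can be sketched with a regular expression. This is a minimal illustration, not the code from the original notebook: the pattern, the function `required_years`, and its parameters are assumptions. In particular, the `max_years` cap is an automated stand-in for the manual check mentioned earlier.

```python
import re

# A number, optionally followed by "+" or by a range end ("5-7", "5–7"),
# then the word "year" or "years". Group 1 is the lower end of the range.
N_YEARS = re.compile(r"(\d+)\s*(?:\+|[-–]\s*\d+)?\s*years?\b", re.IGNORECASE)


def required_years(description, window=10, max_years=10):
    """Return the highest experience requirement found, or None if there is none."""
    requirements = []
    for match in N_YEARS.finditer(description):
        # Keep the expression only if "experience" occurs in the next `window` words.
        following = description[match.end():].split()[:window]
        if not any("experience" in word.lower() for word in following):
            continue
        years = int(match.group(1))
        # Assumed cap to discard implausible values (likely false positives).
        if years <= max_years:
            requirements.append(years)
    # Several requirements may appear; the highest one is the binding condition.
    return max(requirements) if requirements else None


# Toy descriptions for illustration only.
print(required_years("5+ years of experience in Java"))
print(required_years("Founded 20 years ago, we offer a 2 years contract."))
```

Descriptions for which the function returns None simply drop out of the analysis, which matches the fact that only part of the postings yields a usable number.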

Figure 9. Required years of experience.

Most jobs require between 3 and 5 years of experience. From the cumulative plot, we see that around 50% of the jobs require at least 3 years of experience.

Figure 10. Required years of experience (cumulative).

However, this should not be too disheartening for beginners. Most likely, recruiters who are willing to accept fewer years of experience are not going to state it explicitly.

Findings

This is a preliminary analysis and there is plenty of opportunity for more investigations.

Here are the findings so far:

  1. At least a third of the jobs place importance on the German language.
  2. A great number of companies are posting jobs in Berlin. Their employee ratings are generally high.
  3. DevOps is more popular than I expected as far as specializations go.
  4. Most jobs require between 3 and 5 years of experience. The majority of positions don’t mention seniority qualifications explicitly.
  5. If you don’t know which languages to study, Java, JavaScript and SQL are your best bets. If you have more time, study AWS and Docker. Picking JavaScript and React is a strong move.
