INST414 Assignment 1

Michael Kelley
INST414: Data Science Techniques
Feb 15, 2022

Using the GraphQL Jobs API, I extracted data to answer some questions I had about job availability. The key question was whether certain types of jobs are more common in certain cities. The answer could inform where someone chooses to live, depending on the type of employment they seek, as well as the inverse: which jobs to pursue depending on where they already live.

The data that can answer this question is a set of job postings from cities throughout the United States and around the world. The GraphQL Jobs API aggregates these postings, which makes it a convenient source for collecting and analyzing them.

The GraphQL Jobs API comes with predefined queries and types that made it relatively simple to collect the data needed for this research. With some rudimentary queries typed into the console, I extracted data on the types of jobs that employers are posting and where those jobs are being offered.
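The queries for this post were typed into the console, but the same kind of request can be sketched from Python. The example below is a minimal sketch, not the exact query used here: the endpoint URL and the field names (`cities`, `name`, `jobs`, `title`, `company`) are assumptions about the GraphQL Jobs API schema, and `build_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint for the GraphQL Jobs API; it is not stated in this post.
ENDPOINT = "https://api.graphql.jobs/"

# Assumed field names. This asks for every city and the jobs listed in it,
# so no per-city input argument is required.
CITIES_QUERY = """
query {
  cities {
    name
    jobs {
      title
      company { name }
    }
  }
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Wrap a GraphQL query string in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the "data" part of the response."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        payload = json.load(resp)
    if "errors" in payload:
        # GraphQL reports problems in an "errors" list in the response body.
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Calling `run_query(CITIES_QUERY)` would return a dictionary with a `cities` list, provided the endpoint is reachable and the schema matches the assumptions above.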

Based on an initial analysis of the query results, the cities with the most postings are San Francisco, Berlin, London, and New York. These cities return the greatest number of results when the query is run. This does not necessarily mean they are the easiest places to find a job, given their large populations and the varying qualifications employers seek, but it does suggest that more jobs are available there overall. That could make these cities more appealing to job seekers who are willing to relocate.
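The ranking described above amounts to counting the postings returned for each city and sorting the counts. Here is a minimal sketch of that step, assuming the response is a dictionary with a `cities` list in which each city has a `name` and a `jobs` list. The function name `rank_cities` is hypothetical, and the sample data is a placeholder, not real results from the API.

```python
def rank_cities(data, top_n=4):
    """Return the top_n (city name, job count) pairs, most postings first."""
    counts = {
        city["name"]: len(city.get("jobs") or [])
        for city in data["cities"]
    }
    # Sort by count descending; break ties alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Placeholder data shaped like the assumed response (not real API output).
sample = {
    "cities": [
        {"name": "City A", "jobs": [{"title": "Job 1"}, {"title": "Job 2"}]},
        {"name": "City B", "jobs": [{"title": "Job 3"}]},
        {"name": "City C", "jobs": []},
    ]
}

top_two = rank_cities(sample, top_n=2)
print(top_two)
```

A count like this measures only how many postings the API holds for each city, which is why it cannot say how easy it is to get hired there.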

The bugs I encountered included syntax errors, missing required inputs, and variables of the wrong type. These are to be expected, since I have less experience with APIs, this one included, than with programming in general. I fixed the syntax errors and type mismatches by changing my queries to match the format GraphQL requires. I worked around the missing required inputs by querying cities overall rather than any specific city, and having the console list the available jobs within each city returned.
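Errors like these usually come back in the response body rather than as a crash. The sketch below assumes the API follows the standard GraphQL response format, in which each entry in an `errors` list has a `message` and optional `locations`. The helper name `format_errors` and the sample error text are illustrative only, not output captured from the API.

```python
def format_errors(payload):
    """Turn a GraphQL response's "errors" list into readable strings."""
    lines = []
    for err in payload.get("errors", []):
        message = err.get("message", "unknown error")
        locations = err.get("locations") or []
        where = ", ".join(
            f"line {loc['line']}, column {loc['column']}" for loc in locations
        )
        lines.append(f"{message} ({where})" if where else message)
    return lines


# Illustrative payload; the message text is made up for this example.
sample_response = {
    "errors": [
        {
            "message": "Example: required argument is missing",
            "locations": [{"line": 2, "column": 3}],
        }
    ]
}

for line in format_errors(sample_response):
    print(line)
```

Reading the reported line and column against the query text makes it easier to tell a syntax error from a missing input or a type mismatch.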

Here is a visual summary of the data.

Overall, this approach was a worthwhile learning exercise, but not a reliable source of information. This is largely due to the nature of the data available to the GraphQL Jobs API. It does not scrape the internet as a whole; it only returns jobs that were posted to it ahead of time. A more thorough search that drew on external sources of job postings, rather than only the API's own library, would produce far more results from a wider array of cities. My analysis is also relatively basic: it does not consider job requirements, salaries, job security, or the many other factors one would weigh when seeking employment, especially when the search is tied to a decision about where to live. In the future, a more detailed query with more detailed results would likely support better analysis.
