Where inventions happen: Geo-locating global intellectual property

Arash Hajikhani
8 min readJan 23, 2020

--

  • Getting to know where on the glob the inventions happen.
  • Tapping on Google’s BigQuery, which is a serverless, highly scalable enterprise data warehouse.
  • Detailed steps on getting BigQuery access established and browsing the public patent data set.
  • Step by step process of navigating in Google’s patent database, learning the database and its various tables and constructing a table needed for Geo-locating the inventions.
  • Producing a visualized map of all inventions which has been patented by a Finland based Inventors.
  • The code has been prepared in advance so you do not need to follow the steps one by one.

I will start giving some background information about intellectual property and patents. Some context is always good to have so make the later data crunching exercise more meaningful. I will reference to other sources heavily not necessarily scientific articles but more public websites for the sake of free access and their popular contents to follow. The referencing exercise will reduce the length of the article. I use many pictures and provide the easy to copy codes for fast experimentations. Hope you enjoy it and write to me about your experience and of course what to explore next!

Background

A patent is a right granted to an inventor by the federal government that permits the inventor to exclude others from making, selling or using the invention for a period. Therefore, patents are known as one of the means of the invention.

Government agencies typically handle and approve applications for patents. In the United States, the U.S. Patent and Trademark Office (USPTO), which is part of the Department of Commerce, handles applications and grants approvals. In the case of Finland is Finnish patent and registration office (https://www.prh.fi/en/patentit.html).

Patents provide an incentive for companies or individuals to continue developing innovative products or services without the fear of infringement. For example, large pharmaceutical companies can spend billions of dollars on Research and Development (R&D). Without patents, their drugs and medicines could be duplicated and sold by companies that did not research or invest the needed capital for R&D. In other words, patents protect the intellectual property of companies to help their profitability. However, patents also serve as bragging rights for companies demonstrating their innovativeness.

Here is an example of a patent document with the main sections explained:

A sample patent document

Patents are somewhat complex documents with detail different pieces of vital information. Patent documents have a consistent structure that makes it easy to read with practice. The consistency and structured way of data and information presentation makes the data analytics parts all exciting when small individual numbers become aggregated!

Examples of Patents that you might have used or encounter in your lives:

  • One of the most notable patents in the past 40 years was the personal computer filed in 1980 by Steve Jobs and three other employees of Apple Inc.
  • King C. Gillette patented the razor in 1904 and was dubbed a “safety razor.”
  • Garrett Morgan was granted a patent for the traffic light in 1923.
  • The patent for the television was issued in 1930 to Philo Taylor Farnsworth for the “first television system.” At age 20, Farnsworth had created the first electric television image and went on to invent an early model of the electronic microscope.

Patent statistics and analytics

Patent analytics are a set of techniques and tools for studying the information presented within and attached to patents or a major form of intellectual property. Patent analytics service helps to extract and exploit the value of patent portfolios and patent information which unlock detailed insights that help companies to plan and create profitable IP and Research & Development strategies.

There are various services which enable for patent search such as Google itself offering the Google Patents. Google cloud platform hosts various public databases including Patents.

In this post, I will demonstrate how to utilize the patent database in BigQuery to get some insights. The idea is to show the knowledge hidden in various tables and the power of the story they reveal when the careful merging of various tables across BigQuery is practiced.

Setup access to Google BigQuery:

The first thing is to set up your access to “Google Cloud Platform” & “BigQuery”. Get familiar with it here. The Post 1 and Post 2 are other guidelines digging deeply on setting up a “Google BigQuery”.

Let’s do a little warm-up in BigQuery:

BigQuery console is the place where you interact with your data. Adding data sources, writing query, validating the queries, running the queries and seeing the results.

Google’s BigQuery UI

In order to work with the patent data deposited in BigQuery, you should navigate to “ADD DATA” and selecting “Explore Public Data Sets”. There are 125 public data sets deposited by default at BigQuery and “Google Patent Public” dataset is being one of them.

Google DataBase for Patents

By adding the dataset to your resources, you are able to see the various databased and tables in your resource section.

Tables available in Google’s Patent Data

From now on, we will interact with these tables with SQL scripts. On quick SQL query is to see what is in the table. So now we are constructing the query to see the 100 first records in the “Publication” table which is located at “patents-public-data” and “google_patents_research”:

“SELECT * FROM `patents-public-data.google_patents_research.publications` LIMIT 1000”

This is the query processing descriptive showing how long it took to initiate the query and retrieve the results and the size of the processed data.

The results of the query can be extracted in many ways as well (i.e. JSON, CSV)

Now with knowing the preliminary knowledge for interacting with BigQuery, I recommend you to explore the dataset.

Exploring Patent Dataset

In this article, my interest is to Geolocate the patents so to know which invention is happening where. The data we need to do this task is not structured straight forward so in order to get this question answered we need to explore the multiple tables and find the needed data. This requires knowing the data and understanding the ways to structure a new data set by picking particular data points and merging multiple tables.

Here I show you my journey to find the relevant data for this task.

Obviously, I need the location in longitude and latitude measures to map it spatially which can be found from this table -> `patents-public-data.patentsview.location`

Retrieving the patent information from this table -> `patents-public-data.patentsview.brf_sum_text`

In order to match this table to my location table, I need to understand where to find a place to match the “patent_id”. This table has a connection with “patent_id” and “inventor_id“-> `patents-public-data.patentsview.patent_inventor`

Finally, to match it to the location table, I utilize this table to match the “inventor_id” and “location_id” -> `patents-public-data.patentsview.location_inventor`

All these steps are beautifully written in a short and concise SQL code now!

SELECT t3.patent_id, t5.name_first, t5.name_last, t1.city, t1.country, t1.latitude, t1.longitude, t4.textFROM `patents-public-data.patentsview.location` as t1INNER JOIN `patents-public-data.patentsview.location_inventor` as t2ON t1.id = t2.location_idINNER JOIN `patents-public-data.patentsview.inventor` as t5ON t2.inventor_id = t5.idINNER JOIN `patents-public-data.patentsview.patent_inventor` as t3ON t2.inventor_id = t3.inventor_idINNER JOIN `patents-public-data.patentsview.brf_sum_text` as t4ON t3.patent_id = t4.patent_idWHERE t1.country='FI'

In the case of my experiment, I was interested in Finland therefore I have a condition at the end where I isolate the country to be FI which is the ISO code of Finland. You can remove either the condition to get it globally or modify the condition for the country(s) you are interested. I have to warn that the queries will cost! As a new user, you get some free credits which is enough for experimentation but make sure to be cautious as loads of data and complexity of query will cost you actual money.

When I run the query:

It took 27.1 seconds and 56.34 GB of data was processed. In the results tab, you can see the preview of the data which has been constructed. Now you can save it in BigQuery for further processing or download it for use in other tools.

I want to visualize the results in another Google product name “Data studio” which is already integrated with BigQuery. One step I need to do is to combine the Latitude and Longitude columns into one table so the data studio figure templates can recognize the geo type of my data. Some other tools like Tableau and Power BI already recognize the Latitude and Longitude data points separately so you don’t need that if you want to visualize the data with them.

In order to achieve the creation of a Lat-Long column, I saved the view that I got from running the query and this will create a dataset for me in BigQuery.

I save the view as “FI_inventor_patentid_name_city_lati_long” Under FN dataset name. Then in order to merge Lat and Long column I run this code.

SELECT CONCAT(c.latitude, ', ', c.longitude) AS latlong,c.patent_idFROM   `sturdy-pier-228012.mydataset.FI_inventor_patentid_name_city_lati_long` c;

Now you need to switch to look for “Data Studio” in “Explore Data” drop-down for the steps relating to visualizing the data.

In Google Studio environment, you will select the “Map Icon” which is a template that handles Latitude and Longitude data points and place them on the map. You drag the “Patent-id” as the level of our analysis (want to know where the individual patent is coming from) to filter section. The “latlong” column should be placed in dimension section. In addition, you need to define the type of data point here which is “Geo” and “Latitude, Longitude”.

This is the final reward! Now it is possible to read and know, who, when and where patented what (Case of Finland).

This exercise is supposed to give you an idea of what is possible to do. The journey is set to give you a practical guide to get familiar with most of the production aspect using BigQuery, SQL, operationalizing and visualizing the data.

--

--