How to access Open Data with Oracle Autonomous Database 🌳🌲🎄

Loïc Lefèvre
db-one
Published in
4 min readDec 18, 2020

Since August 2020, the Autonomous Databases have seen a new member: Autonomous JSON Database. With it came the support of the new OSON format to efficiently store JSON documents while allowing for powerful analytics on it using modern SQL and the best performances on the market for this kind of data.

Now, this is definitely just the beginning of this data story since JSON data is everywhere today and one of the most used formats to exchange data across the world.

As such, it is time to see how one can effectively bring Open Data in the game and especially make it queryable right from your Autonomous Database.

In this post, I’ll show you a technic that one can use to quickly access such Open Data to enrich your applications and perform on the fly analytics on it.

Source of data: Direction des Espaces Verts et de l’Environnement — Ville de Paris, 11/12/2020, license ODbL.

These JSON data about the remarkable trees in Paris are available using an API.

First, we’ll create a table that will be used as a local cache of the downloaded JSON data:

create table json_cache ( 
downloaded_at date default sysdate not null,
url varchar2(32767) not null,
cached clob
)
lob(cached) store as securefile (DEDUPLICATE CACHE COMPRESS HIGH);

alter table json_cache add constraint
json_cache_cached_is_json check (cached is json);`

Then we’ll create an Oracle Table Function that will do the magic:

  • Check the cache to see if the page has been downloaded since the latest 10 minutes, if not present or too old, redownload it,
  • Parse the parametrized JSON path to look only for JSON objects from the array using the JSON_TABLE function and a dynamic PL/SQL cursor,
  • Generate new rows for further analysis usingJSON_VALUE function

In order to work, a Table Function needs a type and table of this type to be returned:

CREATE OR REPLACE TYPE t_opendata_json_doc AS OBJECT ( 
json_document CLOB
);
CREATE OR REPLACE TYPE t_opendata_json_doc_collection IS TABLE OF t_opendata_json_doc;

I’m usingCLOBhere so that it is compatible with 19c version and it also automatically manages Unicode conversion of the Open Data. JSON standard expects these data to be UTF-8 encoded.

Now the Table Function looks like this:

CREATE OR REPLACE FUNCTION opendata( p_my_url IN varchar2,
p_json_path IN varchar2)
RETURN t_opendata_json_doc_collection PIPELINED AS
TYPE OpenDataCursorType IS REF CURSOR;

l_clob clob;
l_blob blob;
cur OpenDataCursorType;
procedure cache_opendata as
pragma autonomous_transaction;
begin
begin
-- retrieve the latest data cached at most 10 minutes ago
-- purging this cache is let as an exercise (hint: you could use dbms_scheduler)
select cached into l_clob
from json_cache
where url=p_my_url
and sysdate-downloaded_at < 10/24/60
order by downloaded_at desc
fetch first 1 ROW ONLY;
exception when no_data_found then
-- nothing inside the cache, then download the Open Data
l_clob := apex_web_service.make_rest_request(
p_url => p_my_url,
p_http_method => 'GET'
);

insert into json_cache (url, cached)
values (p_my_url, l_clob);
commit;
end;
end;
BEGIN
cache_opendata;
open cur for 'select d.json_document from json_table(:l_clob, ''$'' columns (nested PATH ''' || p_json_path || ''' columns(json_document format json path ''$''))) d' using l_clob;

loop
fetch cur into l_clob;
exit when cur%NOTFOUND;
PIPE ROW(t_opendata_json_doc(l_clob));
end loop;

RETURN;
END;
/
show errors;

Once it’s compiled, you can test it using such SQL query:

select j.json_document.fields.libellefrancais as tree_name, 
j.json_document.fields.arrondissement as location,
j.json_document.fields.hauteurenm as heighth_in_meters,
extract(year from systimestamp) - extract(year from to_timestamp_tz(j.json_document.fields.dateplantation, 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM')) as age_in_years,
count(*) over (partition by j.json_document.fields.libellefrancais) as total_trees_with_same_name
from (
select json_query(json_document,'$') as json_document
from table(opendata('https://opendata.paris.fr/api/records/1.0/search/?dataset=arbresremarquablesparis&q=&lang=en&rows=200&facet=genre&facet=espece&facet=stadedeveloppement&facet=varieteoucultivar&facet=dateplantation&facet=libellefrancais',
'$.records[*]'))
) j
order by 4 desc;

Explanations:

  • we can invoke the function such as:select count(*) from TABLE(opendata( <url>, <JSON path to JSON documents array> ));
  • as it returns CLOBs, you can’t use the dot notation just yet… you need to convert it first using the JSON_QUERYfunction
  • once it’s done, you’ve got a result set or a row source that is composed of the full JSON documents present in the array… and you can start analyzing your data!
  • Hint: in the case, the URL for the Open Data contains a & char, then you can run this command to avoid variable substitution: set define off

This query in fact returns the list of the oldest remarkable trees in Paris, their height in meters, and age. It also provides the location of the tree and the number of such trees identified as remarkable trees in Paris.

SQL query over fresh Open Data

Hence, the oldest tree in Paris is a small Robinier of 9 meters which was born back in 1602 and it is the only one…

For the curious, it’s close to Notre Dame de Paris:

Now you can enjoy any Open Data from your Autonomous Database…

--

--