Object Storage document access in MongoDB? No problem .. with the Oracle MongoDB API

Hermann Bär
Oracle Developers
Published in
4 min readSep 17, 2024

Have you ever wondered how to work with your documents in Object Storage or file system in the fastest and most efficient way? Yes, you know about loading them, but that takes time, and the external data is constantly changing and growing. You no longer have to wonder: you can access your documents in place without any data loading, ad hoc, and hassle-free.

Photo by Becca Tapert on Unsplash

Interested? Then read on, and let me quickly show you how to do this with the Oracle MongoDB API in one-two-three.

It’s ultimately pretty simple: We added a new aggregation pipeline operator called $external, which allows you to access external documents in Object Storage (or file systems), with or without authentication, and use them directly in an aggregation pipeline.

Let’s see what we have in some sample data sets we can find on GitHub to play with:

jason> db.aggregate([{$external:"https://raw.githubusercontent.com/neelabalan/mongodb-sample-dataset/main/sample_weatherdata/data.json"},
... {$project: {"callLetters":1, "airTemperature":1, "pressure":1}},
... {$limit: 3}])
[
{
_id: ObjectId('5553a998e4b02cf7151190b8'),
callLetters: 'VCSZ',
airTemperature: { value: -3.1, quality: '1' },
pressure: { value: 1015.3, quality: '1' }
},
{
_id: ObjectId('5553a998e4b02cf7151190b9'),
callLetters: 'VC81',
airTemperature: { value: -4.7, quality: '1' },
pressure: { value: 1025.9, quality: '1' }
},
{
_id: ObjectId('5553a998e4b02cf7151190ba'),
callLetters: 'PLAT',
airTemperature: { value: 4.4, quality: '1' },
pressure: { value: 1030.8, quality: '1' }
}
]

That wasn’t too hard, was it? All you needed to know was the URL that pointed to the public data sets containing your JSON documents. All is done at runtime, with no object creation or data loading required.

If you want to do the same, but with data that is private and requires authentication, you can do so quickly, too:

jason> db.aggregate([{$external:{"location": "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idx0zryezzs0/b/JSON/o/movies*.json",
... "credential": "OBJSTORE_CRED"}},
... {$limit: 1}])
[
{
studio: null,
title: "'Gator Bait II: Cajun Justice",
summary: "' Gator Bait II: Cajun Justice is a 1988 sequel to the 1974 film 'Gator Bait , written, produced and directed by Beverly Sebastian and Ferd Sebastian. Largely ignored upon release, the film received a second life on cable television and home video.",
sku: 'COO3790',
list_price: 3.99,
year: 1988,
awards: null,
runtime: 95,
gross: null,
cast: null,
movie_id: 1,
crew: [
{ job: 'director', names: [ 'Beverly Sebastion' ] },
{ job: 'screenwriter', names: [ 'Beverly Sebastion' ] }
],
main_subject: null,
nominations: null,
budget: null,
opening_date: '1988-01-01',
image_url: 'https://upload.wikimedia.org/wikipedia/en/9/91/Gatorbait2.jpg',
genre: [ 'Thriller' ],
wiki_article: "'Gator_Bait_II:_Cajun_Justice",
views: 6
}
]

What were we doing here? In addition to the URL, we needed to specify a so-called credential object, which stores the authentication information in the database in an encrypted format (what this exactly means in the context of an Autonomous Database is documented here).

And if you wanted to even look only at a subset of data in your documents based on a JSON path expression, without even streaming them to the database, you can do so, too:

jason> db.aggregate([{$external:{"location": "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idx0zryezzs0/b/JSON/o/movies*.json",
... "credential": "OBJSTORE_CRED", "path": "$.crew[*]"}},
... {$limit: 10}])
[
{ job: 'director', names: [ 'Beverly Sebastion' ] },
{ job: 'screenwriter', names: [ 'Beverly Sebastion' ] },
{
job: 'producer',
names: [ 'Christian Colson', 'Danny Boyle', 'John Smithson' ]
},
{ job: 'director', names: [ 'Danny Boyle' ] },
{ job: 'screenwriter', names: [ 'Simon Beaufoy', 'Danny Boyle' ] },
{ job: 'producer', names: [ 'Iván Angelusz', 'Ferenc Török' ] },
{ job: 'director', names: [ 'Ferenc Török' ] },
{
job: 'screenwriter',
names: [ 'Gábor T. Szántó', 'Ferenc Török', 'Krisztina Esztergályos' ]
},
{ job: 'producer', names: [ 'Paul Bales', 'David Michael Latt' ] },
{ job: 'director', names: [ 'Alexis Lloyd' ] }
]

If you want to use this to persist the data as a JSON Collection, just add a $out operator, and you’re in business.

I mentioned earlier that this also works with files in the file system. The concept is similar; it’s just that access to the underlying Oracle database is done through Oracle’s directory objects and the database’s built-in access control. This means that the database needs to be able to reach that source directory from the server where the database is running. You only need to specify the directory where your documents reside and the documents. Having said that, this is not really cloud-native but more a functionality for classical on-premises systems. The details for the whole syntax are in the documentation.

Allowing direct and transparent access to your external JSON documents in Object Storage or file system increases your productivity significantly. It streamlines your code to work with data in real-time without creating persistent objects or even loading data. Having said that, this only works with textual JSON documents that can be processed as-is. If you are looking for access to more comprehensive external data, possibly in other file formats or even relational data, you can always do so by creating an external table in your Oracle database and building a JSON Collection View over it or even loading the data. Whatever you need to do, we’ve got you covered.

That was easy, wasn’t it? No degree in rocket science is required, only access to an Oracle Database 23ai with the MongoDB API, which all of you can get for free. If you don’t have one, just follow the instructions here and sign up for an Autonomous Database 23ai. This Oracle-specific operator also works with any other Oracle Database 23ai deployment with ORDS installed — the MongoDB API is part of ORDS — but you need to wait for the next release of ORDS to get your hands on it, and you need to install DBMS_CLOUD manually in your database.

That’s it for today, folks. Please let us know what you want us to cover in the future, and if you have any questions, feel free to reach out to us at any time. We’re here for you.

--

--

Hermann Bär
Oracle Developers

I work in Product Management in the Oracle Database organization. I thrive in helping customers leverage Oracle technology to make them successful.