New Features of GeoParquet Downloader QGIS Plugin
It’s a true pleasure to share that, since my last couple of posts about the QGIS plugin for downloading GeoParquet data, a real community of contributors has formed and is making awesome advances to the plugin.
I got hooked on open source over 20 years ago when I wrote code and others showed up and made it better, and after a long hiatus from actual coding it’s been awesome to tap into that feeling again.
Latest Plugin Enhancements
So this time I get to mostly highlight the recent contributions from others. I did a couple of smaller things too, but the big recent advances have all come from a couple of awesome contributors. These are spread across three releases (0.4, 0.5 and 0.6). You can get all the features by searching for ‘GeoParquet Downloader’ in the QGIS plugin manager, which should give you 0.6 (if it doesn’t, just refresh it).
The first enhancement was to improve the installation process, reporting to the user when DuckDB is getting downloaded and installed.
This was from Till Frankenbach, and I think the installation process should now hopefully work for most people. He’s then followed that up with a number of great improvements, including the most visible change in the recent releases. We’ve reduced from 3 buttons down to one:
The default QGIS toolbar has a lot of buttons on it, and most people installing plugins have even more buttons to add, so that real estate becomes really precious. In the first iteration we had three buttons: one for Overture, one for Source Cooperative, and one for custom downloads. But they all went to the same dialog, though they started on different views. So we decided to just use one button. Till then followed up with a nice improvement to save the state of the radio button, so that if you’re usually using one of the tabs then it’ll be there when you go back to use it again.
Soon after Till started contributing we also had Sam Jackson create a number of great improvements. The first was FlatGeobuf support — my second favorite vector format after GeoParquet, and one I use routinely.
Then he added GeoJSON support, and even implemented my desire to properly warn people if they were going to download a ‘huge ass’ GeoJSON file, and ask them if they’d prefer a format that will handle things better.
<screenshot of dialog for GeoJSON>
I’ve had some bad experiences with GeoJSON recently. When I downloaded the entire Planet SkySat catalog, I found that once you get into gigabytes most tools really struggle, QGIS included. Formats like GeoParquet and FlatGeobuf will be much smaller (like 20% of the size at most, if not 10% or less), and they’ll also perform much better if they do get to tens of gigabytes.
His PR also added shapefile support, but after some good discussion we decided that the format is ‘increasingly obsolete’ and we don’t want to support it. I’m still open to someone making the case that we must have it. But I think it’s pretty easy to use any of the other formats and then export to shapefile from QGIS. And that way the plugin doesn’t silently cut off column names that are longer than 10 characters.
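To make the column name problem concrete, here is a minimal sketch of what naive truncation to 10 characters does. It is not what the plugin or any particular shapefile writer does (real writers may rename or de-duplicate fields in their own way); `truncated_name_collisions` and the `admin:country_name` column are hypothetical, added only for illustration.

```python
from collections import defaultdict

# Shapefile attribute tables (DBF) limit field names to 10 characters.
DBF_FIELD_NAME_LIMIT = 10


def truncated_name_collisions(column_names, limit=DBF_FIELD_NAME_LIMIT):
    """Group column names by their truncated form and return only the clashes.

    Assumption: plain truncation. Actual writers may disambiguate differently.
    """
    groups = defaultdict(list)
    for name in column_names:
        groups[name[:limit]].append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


# "admin:country_name" is a made-up sibling column for the example.
columns = ["admin:country_code", "admin:country_name", "id"]
collisions = truncated_name_collisions(columns)
print(collisions)
```

Both `admin:` columns collapse to the same 10-character prefix, which is the kind of silent loss that exporting from QGIS at least makes visible.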
And Sam also added what I think is my favorite new feature — the ability to select multiple Overture layers and download them all at once:
It makes it much easier to just get all the data you need for a given area, and it’s cool to just see all the data get added to the map as it comes in.
I also added a couple small improvements. I upgraded the Foursquare places data, and their latest release included this snippet:
These improvements are in line with what I’ve been writing up as best practices for large GeoParquet distributions (and I’ve been working on tools to make it easier for people to test and implement them). Downloading Foursquare places from Hugging Face is now much faster, down from over a minute to around ten seconds on my connection.
I also added a couple of little fixes to make things more robust, which came from working with some interesting data. I was trying out some fiboa field boundary data and realized that columns like admin:country_code weren’t working right with GeoPackage data, so I fixed that. I’ve also been experimenting with NHD data, converting it to Parquet using best practices and putting it on source.coop. There I realized that the code that uses the bbox column to accelerate querying only worked if the column was named bbox. But the spec allows any column name, as long as you specify the name in the metadata, and GDAL/OGR uses geometry_bbox if your column name is geometry. So I fixed that too.
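For anyone curious how the bbox column name can be discovered, here is a minimal sketch, not the plugin’s actual code. It assumes the file carries GeoParquet 1.1 style `geo` metadata with a `covering.bbox` entry; the helper names `find_bbox_column` and `bbox_filter` are hypothetical, and the generated filter assumes a SQL engine that reads struct fields with dot notation.

```python
import json


def find_bbox_column(geo_metadata_json):
    """Return the bbox covering column named in the 'geo' metadata, or None."""
    geo = json.loads(geo_metadata_json)
    primary = geo.get("primary_column", "geometry")
    column_meta = geo.get("columns", {}).get(primary, {})
    covering = column_meta.get("covering", {}).get("bbox")
    if not covering:
        return None  # no covering declared; the caller decides on a fallback
    # Each entry is a path such as ["geometry_bbox", "xmin"];
    # the first element is the name of the bbox column.
    return covering["xmin"][0]


def bbox_filter(column, xmin, ymin, xmax, ymax):
    """Build a SQL predicate selecting rows whose bbox intersects the extent."""
    col = '"' + column.replace('"', '""') + '"'  # quote the identifier
    return (
        f"{col}.xmax >= {xmin} AND {col}.xmin <= {xmax} AND "
        f"{col}.ymax >= {ymin} AND {col}.ymin <= {ymax}"
    )


# Example metadata shaped like what a writer using 'geometry_bbox' would emit.
example = json.dumps({
    "version": "1.1.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": [],
            "covering": {"bbox": {
                "xmin": ["geometry_bbox", "xmin"],
                "ymin": ["geometry_bbox", "ymin"],
                "xmax": ["geometry_bbox", "xmax"],
                "ymax": ["geometry_bbox", "ymax"],
            }},
        }
    },
})

bbox_column = find_bbox_column(example)
where_clause = bbox_filter(bbox_column, -10.0, 35.0, 5.0, 45.0)
print(bbox_column)
print(where_clause)
```

In practice the metadata string would come from the Parquet file’s key-value metadata under the `geo` key rather than from a hand-built dictionary.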
What’s Next
So a huge thanks to Till and Sam, and I’m hoping they’ll continue to contribute great features. And hopefully someone will eventually surpass me on the commits leaderboard, as always happens in my most successful projects, since others almost always prove better at making things robust and real than I do.
Till made a great contribution by refactoring the mess of code that comes from a ‘my first qgis plugin’ project, which is what this one was for me. I’m really excited about it, as it should enable us to set up robust testing and hook it up to continuous integration, so that multiple people can collaborate more confidently. I just landed it on ‘main’, and we need to add back in a couple of Sam’s latest features before we release. We’re also discussing whether it makes sense to move it off my personal repo to an organization where it’d have a good home. Matt Travis, who was the very first outside contributor, also just recently added a better workflow to create ‘releases’, and I hope to land that soon.
If you’re interested in diving in and contributing, please do! There’s a good bit to help with on the refactoring, like writing tests and making sure all the features work as they previously did. And I’ve still got a number of ‘good first issues’ tagged in the issue tracker. Feel free to grab one, or add your own ideas to the issue tracker.