Evaluating linked data using the Good Growth Plan

Dean Allemang
5 min read · Apr 21, 2023

One of the issues (for me, anyway) around having lots of data in the linked data cloud or on data.world has been what to do with it. I’m pretty sure there’s a lot of interesting insight in a lot of these datasets, but finding those insights is a pretty daunting task. One tool we have for finding such insights is data visualization, and there are a lot of data visualization tools out there. Every once in a while, someone tells me about one, and I think I will learn about it, but this just goes on the stack of books I want to read, APIs I want to learn, languages I want to speak, and a lot of other self-improvement things that never make it to the top of a priority list. So a lot of data lies around, collecting dust.

Once again, however, ChatGPT comes to the rescue. GPT is much maligned for its factual inaccuracies, but a lot of the value in an LLM doesn’t come from the facts; it comes from things about the facts. This is why data and LLMs seem like a match made in heaven. GPT is good with things like metadata (as I have shown in some other blogs), and it is good at writing programs. So I started off earlier this week by giving ChatGPT the following prompt:

There’s an ORGANIZATION called ‘syngentagoodgrowthplan’, a DATASET called ‘good-growth-plan-open-data’ and a SPREADSHEET called ‘syt_ggp_c2soil_data_2021_0’. In that spreadsheet, there’s a column called ‘country’ and one called ‘benefitedhectares’. Build a web page that will show a map of the world, where each country is color-coded in grayscale; the darkest ones have the largest sum of benefitedhectares, and the lightest have the least benefitedhectares. Use whatever open source mapping software you please. Make it into a web page I can just drop into my browser and see the map.

I had already told it I was using data.world, so it knew what I meant by ORGANIZATION, DATASET and SPREADSHEET. The response wasn’t perfect out of the box, and in fact I had to spend a day or so fixing a bug in the package it recommended. But it was a far cry faster than if I had learned how to use the package myself (it’s called Leaflet; I had never heard of it before this week, and I had to look up its doc page just as I was writing this blog). But the first thing out of the box did all of the following:

  1. It wrote a query against the data.world datasets to find the required information (in SQL — I don’t even know SQL, but now I’m using it),
  2. It formulated a call using the data.world API,
  3. It did all the imports needed for leaflet as well as a world map to display things on,
  4. It wrote all the async javascript code to make the whole thing work (I tried writing async javascript for a project a few months ago. There were so many things I never got to work. I honestly think that project, which failed, would have succeeded if I had had ChatGPT to help me with the async code),
  5. It helped me debug an incompatibility between the world map it imported and Leaflet’s display policies.
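To give a sense of steps 1 and 2, here is a sketch of what a query-plus-API-call against that dataset might look like. The endpoint shape is data.world’s documented SQL API, but the helper names and the exact code ChatGPT produced are my reconstruction, not the original:

```javascript
// Step 1: an SQL query summing benefitedhectares per country
// (table name taken from the prompt above)
const QUERY =
  "SELECT country, SUM(benefitedhectares) AS total " +
  "FROM syt_ggp_c2soil_data_2021_0 GROUP BY country";

// Step 2: data.world's SQL endpoint for an organization's dataset
function sqlEndpoint(owner, dataset) {
  return `https://api.data.world/v0/sql/${owner}/${dataset}`;
}

// The async call itself (requires a personal data.world API token)
async function fetchTotals(token) {
  const resp = await fetch(
    sqlEndpoint("syngentagoodgrowthplan", "good-growth-plan-open-data"),
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        Accept: "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: new URLSearchParams({ query: QUERY }),
    }
  );
  // data.world returns the result rows as a JSON array of objects,
  // e.g. [{ country: "Brazil", total: ... }, ...]
  return resp.json();
}
```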

I had a similar experience in an earlier blog, where I used Cytoscape to display graph data. ChatGPT has made a whole array of software tools available to even casual programmers. All those specs and documentation that we’ve been writing have paid off.

So, here’s a link to the page I created. I haven’t asked ChatGPT to style it to look more like a 21st century web page rather than HTML homework from 1995, but that might come later. The two maps show data from the soil conservation practices dataset. The first shows the number of hectares where good practice policies were implemented, and the second shows the number of hectares that benefited from these practices.
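The grayscale coloring the page does is straightforward to sketch. The function names and the assumption that the world map’s GeoJSON features carry a `properties.name` matching the spreadsheet’s country names are mine, for illustration:

```javascript
// Turn the API rows into a lookup keyed by country, tracking the maximum
// so the largest total maps to black and zero maps to white.
function toLookup(rows) {
  const byCountry = {};
  let max = 0;
  for (const { country, total } of rows) {
    byCountry[country] = total;
    if (total > max) max = total;
  }
  return { byCountry, max };
}

// Style callback invoked for each country polygon: darkest = largest sum
function styleFeature(feature, lookup) {
  const total = lookup.byCountry[feature.properties.name] || 0;
  const shade = Math.round(255 * (1 - total / lookup.max));
  return {
    fillColor: `rgb(${shade},${shade},${shade})`,
    fillOpacity: 1,
    color: "#555", // border color
    weight: 1,
  };
}

// In the browser this plugs into Leaflet roughly as:
//   L.geoJSON(worldGeoJson, { style: f => styleFeature(f, lookup) }).addTo(map);
```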

I invite you to hover over Brazil; you’ll see that nearly 9 million hectares benefited from the improved soil practices that Syngenta implemented, out of a total 7.5 million hectares that were actually implemented. You might wonder how so many more hectares benefited than were actually implemented. I wonder that, too. I’m going to check my work, but in the spirit of checking the work, I think this is a good question to take back to the folks who collect the data for Syngenta.

A real photo of me working with ChatGPT on this project.

I am personally looking forward to more reports of this sort; any visualization tool, any analytical tool, anything that processes data to provide more value than we get from the data on its own, is now available at our fingertips, with a minimum of study, implementation and testing.

The “facts” that ChatGPT had to know were all the details of the APIs I used, and the programming idioms that are useful in JavaScript. Unlike uses of ChatGPT to discuss things in the real world, I can test the ‘veracity’ of its utterances by seeing whether the code works. In the discussions we had when there were bugs (the bounding box of Russia goes beyond the 180th meridian, and Leaflet doesn’t handle that very well), ChatGPT was like a very well-informed coding partner who could code up variants in the blink of an eye, and discuss in detail why a new solution failed. I ended up learning a lot about how JavaScript and geospatial packages work.
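To illustrate the kind of bug this was: a GeoJSON polygon that crosses the 180th meridian has its far-eastern points expressed as negative longitudes, so a naive renderer draws a band smeared across the whole map. One common workaround (an assumption on my part, not necessarily the exact fix we landed on) is to shift those negative longitudes by +360 so the ring stays continuous:

```javascript
// Unwrap one polygon ring of [lng, lat] pairs (GeoJSON coordinate order).
// If the ring spans more than 180 degrees of longitude, we assume it
// crosses the antimeridian and shift its negative longitudes east.
function unwrapRing(ring) {
  const lngs = ring.map(([lng]) => lng);
  const crossesAntimeridian = Math.max(...lngs) - Math.min(...lngs) > 180;
  if (!crossesAntimeridian) return ring;
  return ring.map(([lng, lat]) => [lng < 0 ? lng + 360 : lng, lat]);
}
```

Leaflet will happily draw longitudes past 180, so the shifted ring renders as one contiguous shape instead of wrapping around the map.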

A friend of mine asked me how I would know that the code I got out of this actually works, or, worse yet, how I would know that it doesn’t contain malware. For something as simple as this, I can actually review all the code (which I had to do, to debug it). For more complex things, we have to fall back on the usual, more comprehensive methodologies, like software testing. I’ve only done a little of that here, but this is just a small application.

Watch out for more analyses of this sort in future blogs. This is actually a lot of fun.


Dean Allemang

Mathematician/computer scientist, my passion is sharing data on a massive scale. Author of Semantic Web for the Working Ontologist.