Data Processing in PHP, and Other Natural Disasters

Zoe Gadon-Thompson
Nov 5 · 7 min read

I started my first dev job at a small company called Flax & Teal in January. I get to work on some really cool things, and this project is one of them. It’s a really awesome example of how systems can work when there are a few different components.


This was first-generation code written for an R&D project a few years ago. In super trivial terms: it’s a natural disaster simulator written in PHP, with help from some other things.

It’s fine for us to read about a disaster, or see a video, and feel empathy. But what would we do in that situation? What local services would we lose first, and how would we prepare?

This web application was made for schools to help kids prepare for the impact of natural disasters on human geography. It pulls together open data, PHP, and Python. For all the flaws the public sector has, ahem, there is a lot of open data on our country. There are co-ordinates for (probably) every street light in the country, if you ever need them. And the best part? It’s free!

I haven’t been in the dev game long, but I have noticed people tend to mock PHP developers. I can’t imagine why. While it isn’t good for actually processing data, it is good for seeding data, running migrations, and managing datasets. Laravel is also good at putting together webpages. It works well with Kubernetes and Docker because of how it scales and how it can be controlled.

Our Raging Planet is a single page application that communicates through various APIs. Front and backend are developed in separate repositories, and running docker-compose in development runs the entire stack from the different repos. The backend is made in Laravel, which seeds datasets from an internal CKAN instance (CKAN is like a content management system for Open Data). Laravel moves the data from CKAN to Python, then Python uses it to run mathematical simulations using FEniCS and other mathematical tools.

The front end of ORP is made in VueJS. It uses the data returned from the backend to display the markers at the co-ordinates contained in the database. This gives a visual representation of local open data with co-ordinates & images on top of a map using Leaflet, an open source mapping tool.
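To make that concrete, here is a sketch of the kind of data shaping involved, going from database-style rows with co-ordinates to GeoJSON that Leaflet can render. The row fields (name, lat, lng, icon) are invented for illustration; the real ORP schema will differ.

```javascript
// Toy sketch: shape co-ordinate rows into GeoJSON for Leaflet's L.geoJSON().
// The row fields here are made up for illustration.
function rowsToGeoJson(rows) {
  return {
    type: "FeatureCollection",
    features: rows.map(row => ({
      type: "Feature",
      geometry: {
        type: "Point",
        // GeoJSON uses [longitude, latitude] order
        coordinates: [row.lng, row.lat]
      },
      properties: { name: row.name, icon: row.icon }
    }))
  };
}

// In the browser you would then hand this to Leaflet, roughly:
//   L.geoJSON(rowsToGeoJson(rows)).addTo(map);
const geo = rowsToGeoJson([
  { name: "City Hall", lat: 54.596, lng: -5.93, icon: "listed" }
]);
console.log(geo.features[0].geometry.coordinates); // [-5.93, 54.596]
```

The co-ordinate swap is the classic gotcha here: databases and humans tend to say latitude first, while GeoJSON mandates longitude first.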

I mainly worked on Laravel and Vue.js, and you can make significant visual and usability changes to the application by focusing on just those two components, which is a huge benefit of having different services running in the application. Having them separate makes it easier to find issues, rather than getting 10,000 errors in one terminal window.

There are some datasets that need to be visualised on that map that take up a lot of memory and sometimes crash the app. There are datasets for (I think) all the gullies in Northern Ireland, in case you ever need to know that. The dataset for Belfast has around 25,000 rows, so putting this on a map is not fun. So how do you get a tonne of data like that to display on a map, using little icons that obviously show they are gullies?

A map of Belfast with icons showing gullies, listed buildings, industrial heritage sites, schools and train stations

It doesn’t really look like a lot is going on here, but the map is trying to display 26,328 gullies and 1,259 listed buildings, and it’s not going well. You can cluster the data using Leaflet.js, so as you zoom in and out, icons are rendered depending on how far in you’re zoomed. This way the map only needs to load a few points of data, rather than 26k different ones.
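The real app uses Leaflet’s clustering plugin for this, but the underlying idea is simple enough to show in plain JS: snap points to a grid whose cell size depends on the zoom level, and draw one marker per occupied cell. This is just a toy illustration of the concept, not the plugin’s actual algorithm.

```javascript
// Toy illustration of marker clustering: bucket points into grid cells
// sized by zoom level, and return one averaged marker per cell.
// (The real app would use the Leaflet.markercluster plugin instead.)
function clusterPoints(points, zoom) {
  // Coarser grid when zoomed out, finer grid when zoomed in (made-up scale)
  const cellSize = 1 / Math.pow(2, zoom);
  const cells = new Map();
  for (const [lat, lng] of points) {
    const key = `${Math.floor(lat / cellSize)}:${Math.floor(lng / cellSize)}`;
    const cell = cells.get(key) || { lat: 0, lng: 0, count: 0 };
    cell.lat += lat;
    cell.lng += lng;
    cell.count += 1;
    cells.set(key, cell);
  }
  // One marker per cell, placed at the average position of its points
  return [...cells.values()].map(c => ({
    lat: c.lat / c.count,
    lng: c.lng / c.count,
    count: c.count
  }));
}

// Two nearby gullies and one far away: at a low zoom the near pair merges
const clusters = clusterPoints(
  [[54.60, -5.93], [54.60, -5.94], [55.00, -7.30]],
  2
);
console.log(clusters.length); // 2
```

Zooming in shrinks the cells, so clusters split apart into individual icons, which is exactly the behaviour you see on the map.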

// For each returned arc, attach its health values (h) to the matching feature chunk
featureArcs.data.forEach(function (arc) {
    if (arc.feature_chunk_id in features) {
        features[arc.feature_chunk_id].arc = {
            h: arc.arc.map(point => point.h)
        };
    }
});

The chunks of data are used to identify landmarks in the simulation. The location is set in the Case Context (which can be user-defined), and the feature ID is found when the simulation runs to fill in the feature arc (or chunk). FeatureArcs represent large amounts of 3D data: the structural health of features like buildings in the simulation. Places get damaged over time depending on the weather phenomenon, so this needs to be put into the data. Separate APIs are used because some of the feature data isn’t needed, like icons, images, and textual descriptions. These probably aren’t going to change based on health or time. The data can be recombined in the browser, and because it only needs to know the health values, there is less strain on the backend than sending a tonne of data.
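Recombining the two payloads in the browser amounts to a join on feature ID. This sketch uses invented shapes for the two responses (a static metadata list and a health-by-ID object); the real ORP API shapes will differ.

```javascript
// Sketch of recombining the static-metadata API payload with the
// changing health payload in the browser. Both shapes are invented
// for illustration.
function mergeHealth(staticFeatures, healthUpdate) {
  // Index the static metadata (icons, names, etc.) by feature ID...
  const byId = new Map(staticFeatures.map(f => [f.id, { ...f }]));
  // ...then overlay only the health values, the part that changes over time
  for (const [id, health] of Object.entries(healthUpdate)) {
    const feature = byId.get(Number(id));
    if (feature) feature.health = health;
  }
  return [...byId.values()];
}

const merged = mergeHealth(
  [
    { id: 1, name: "School", icon: "school" },
    { id: 2, name: "Station", icon: "rail" }
  ],
  { 1: 0.4 } // only feature 1 was damaged in this tick
);
console.log(merged[0].health); // 0.4
```

The win is bandwidth: icons, names, and descriptions cross the wire once, while the health updates, which are just numbers, can be sent as often as the simulation ticks.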

A map of Belfast with a blue area of effect to show where the disaster is located

When the simulation runs, it shows an area of effect on the map. This is where the Python data models are used. The simulation data is sent from Python using OpenFaaS, and the simulation is made using FEniCS, a library running inside the OpenFaaS functions. There’s one class of FEniCS/numerical functions for each type of simulation that runs. FEniCS is an open-source tool for mathematical simulations, written in Python. Laravel gets the data from CKAN using the seeders, then gives it to Python to be processed. This means the simulation can do things like show lava flowing down a mountain, or water levels in a hurricane storm surge.

The area of effect on the map is created with a JavaScript library that uses the CONREC contouring algorithm. It finds the outer edge of the phenomenon by finding structures with affected health, and it draws a geometric shape around that part of the map. So it draws around the data, rendering lines to create a 3D model on top of the map. This helps to visualise the extent of the damage.
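CONREC itself interpolates contour lines over a grid of values, which is too much to show here. As a much simpler stand-in for the idea of "draw a shape around the affected structures", here is a gift-wrapping convex hull over a set of damaged points; it is not what the library does, just the simplest outline-drawing algorithm that conveys the concept.

```javascript
// Gift-wrapping (Jarvis march) convex hull: a toy stand-in for CONREC,
// outlining a set of affected points with a polygon.
function convexHull(points) {
  if (points.length < 3) return points.slice();
  // Start from the leftmost point, then repeatedly pick the next point
  // on the boundary until we wrap back around to the start
  const start = points.reduce((a, b) => (b[0] < a[0] ? b : a));
  const hull = [];
  let current = start;
  do {
    hull.push(current);
    let candidate = points[0] === current ? points[1] : points[0];
    for (const p of points) {
      const cross =
        (candidate[0] - current[0]) * (p[1] - current[1]) -
        (candidate[1] - current[1]) * (p[0] - current[0]);
      // cross < 0 means p is outside the current candidate edge
      if (p !== current && cross < 0) candidate = p;
    }
    current = candidate;
  } while (current !== start);
  return hull; // ready to hand to e.g. L.polygon(hull)
}

// Square of affected buildings plus one interior point: the interior
// point should not appear on the outline
const outline = convexHull([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]);
console.log(outline.length); // 4
```

A real contouring algorithm handles concave shapes and multiple disconnected regions, which a convex hull cannot, but the output is the same kind of thing: a list of vertices you can hand to Leaflet as a polygon.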

A diagram showing the infrastructure, with openfaas, kubernetes, containers (VMs)

OpenFaaS provides serverless functions, using a series of Docker containers that communicate with Kubernetes, telling it to scale up or down depending on the processes running. The simulations run in these processes since they are essentially smallish-data, big-computation Python functions. However, as long-running serverless functions were pretty new when Our Raging Planet was started, this is a recent addition: we can fire up any number of long-running simulation functions for each type of natural disaster. When the simulation data is returned there are gaps between the points in time, which is why linear interpolation is needed. This is the equation for it:

y = y1 + (x - x1) * (y2 - y1) / (x2 - x1)

It adds a percentage of the difference from point 0 to point 1, depending on how far between the two times it is, to the previous time’s value. This allowed us to save and transfer only two snapshots at two spaced-out timepoints via the API, then rapidly calculate the extent of disaster damage in JavaScript for any time in between.
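The equation above translates directly into a one-line function:

```javascript
// Linear interpolation: given the value at two snapshot times
// (x1, y1) and (x2, y2), estimate the value at time x between them.
function lerp(x, x1, y1, x2, y2) {
  return y1 + (x - x1) * (y2 - y1) / (x2 - x1);
}

// Health was 1.0 at t=0 and 0.2 at t=10; halfway through it's 0.6
console.log(lerp(5, 0, 1.0, 10, 0.2)); // ≈ 0.6
```

At x = x1 the second term vanishes and you get y1 back exactly; at x = x2 the fraction is 1 and you get y2, so the line passes through both snapshots.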

And it’s used in this bit of code, which, for each point of the simulation returned from Python, finds the two points in time:

// y1 + (x - x1) * (y2 - y1) / (x2 - x1), where:
//   v[0]       = y1, the value at the earlier snapshot
//   j[0]       = x1, the earlier snapshot's time
//   k[0]       = x2, the later snapshot's time
//   k[1][l][0] = y2, the value at the later snapshot
v[0] + (timeOffset - j[0]) * (k[1][l][0] - v[0]) / (k[0] - j[0])

It checks whether it’s at the start or end point of the two snapshots, then sets the values of the two points in time.
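Pulling the pieces together, here is what interpolating a whole snapshot looks like as a standalone function. The snapshot shape ({ time, health: [...] }) is invented for illustration; the real payload indexes health per feature chunk.

```javascript
// Interpolate every feature's health between two snapshots at a given
// time offset. Snapshot shape is invented for illustration.
function interpolateSnapshots(before, after, timeOffset) {
  const span = after.time - before.time;
  return before.health.map((h, i) =>
    h + (timeOffset - before.time) * (after.health[i] - h) / span
  );
}

const before = { time: 0, health: [1.0, 0.5] };
const after = { time: 4, health: [0.0, 0.5] };
console.log(interpolateSnapshots(before, after, 1)); // [0.75, 0.5]
```

This is why only two snapshots need to cross the API: any frame the map wants to draw between them is a cheap per-feature calculation in the browser.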

So there’s a lot going on here, and I know I basically skimmed over a lot of it. But this project shows how container-based web apps can be used, and how messy things can get when there are so many services doing different things.

Since there’s a heavy emphasis on data science, containers and serverless analytics functions can be integrated easily and scalably into a complex architecture like this one.

Working with free resources such as FOSS and Open Data has really strong benefits, and I’ll always advocate for transparency, especially within Government-published Open Data. With this web application there was a lot of freedom in which datasets could be used: anything with co-ordinates can be thrown onto the map. Because PHP seeds the data from CKAN, which already has all the management functionality for open data, the application itself isn’t holding a tonne of data.
