A Short Introduction to Environmental Data

Abhijeet Singh
Published in The Startup
6 min read · Jan 20, 2020

This article gives a brief overview of environmental data: what it is, where to find it and some of its use cases.

Credits: Kevin M. Gill

In the internet age, data has emerged as one of the most valuable resources. However, many organisations are still unable to harness the full potential of the data sets available on the data market. Among these, environmental data is one of the most underrated, even though it can help optimise existing processes. At the same time, industry players are aiming to meet the Sustainable Development Goals (SDGs) as we venture into the next decade.

In this article, I will highlight the advantages of using environmental data and how it can help boost your business and make it future-ready. Let’s start with what environmental data sets are.

Environmental data captures environmental parameters such as pollution levels, land-use change, water quality, soil quality, vegetation, public health and habitat fragmentation, although it is not limited to these.

The DPSIR (Drivers, Pressures, State, Impact and Response) intervention model is a convenient way to represent environmental data for industries. Within it, the ‘P’, ‘S’ and ‘I’ categories make up environmental data, but more on that later. Environmental databases are usually available in structured formats such as XML and JSON as well as unstructured ones such as PDF. These formats can be consumed by different models and integrated with your business software.
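To make the point about formats concrete, here is a minimal Python sketch that reads the same reading from a JSON payload and from an XML payload and normalises both into one record. The payloads and the field names (`station`, `pm25`, `unit`) are invented for illustration; a real provider's schema will differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the field names and values are assumptions, not a real provider's schema.
JSON_PAYLOAD = '{"station": "A1", "pm25": 12.5, "unit": "ug/m3"}'
XML_PAYLOAD = (
    "<reading><station>A1</station><pm25>12.5</pm25><unit>ug/m3</unit></reading>"
)


def from_json(text: str) -> dict:
    """Parse a JSON reading into a normalised record."""
    raw = json.loads(text)
    return {"station": raw["station"], "pm25": float(raw["pm25"]), "unit": raw["unit"]}


def from_xml(text: str) -> dict:
    """Parse an XML reading into the same normalised record."""
    root = ET.fromstring(text)
    return {
        "station": root.findtext("station"),
        "pm25": float(root.findtext("pm25")),
        "unit": root.findtext("unit"),
    }


json_record = from_json(JSON_PAYLOAD)
xml_record = from_xml(XML_PAYLOAD)
print(json_record == xml_record)  # True: both formats yield the same record
```

Normalising early like this means the rest of your pipeline does not need to care which format the provider delivered.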

The importance of environmental data becomes self-evident when you consider the wide array of use cases that stem from it. A few examples are precision agriculture and optimisation in the shipping, mining and construction industries. Let’s look at how environmental data can be used to optimise processes in these industries.

Precision agriculture: One of the prime use cases of environmental data is precision agriculture, which benefits local farmers and large farming corporations alike. Data such as soil nutrient levels and moisture content are collected, and the farm is cultivated accordingly. The idea is to apply only the necessary amount of compost, fertiliser and water to each part of the field, rather than a uniform quantity across the entire field. This cuts the cost of compost, fertilisers, fuel for farm equipment and water, and it also optimises the farm yield.
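As a rough illustration of the idea, the Python sketch below compares applying fertiliser per zone with applying one uniform rate across the whole field. All numbers, the zone names and the target value are made up for the example; real targets come from agronomic advice.

```python
# Assumed target nitrogen level for the crop, in kg per hectare (illustrative only).
TARGET_N_KG_PER_HA = 120.0

# Hypothetical soil-test results per field zone.
zones = [
    {"zone": "north", "area_ha": 2.0, "soil_n_kg_per_ha": 80.0},
    {"zone": "centre", "area_ha": 3.0, "soil_n_kg_per_ha": 120.0},
    {"zone": "south", "area_ha": 1.5, "soil_n_kg_per_ha": 40.0},
]


def deficit_per_ha(zone: dict, target: float) -> float:
    """Nitrogen still missing in a zone, never negative."""
    return max(0.0, target - zone["soil_n_kg_per_ha"])


# Variable-rate: each zone receives only what it is missing.
variable_total = sum(deficit_per_ha(z, TARGET_N_KG_PER_HA) * z["area_ha"] for z in zones)

# Uniform: the whole field receives the rate needed by the poorest zone.
worst_deficit = max(deficit_per_ha(z, TARGET_N_KG_PER_HA) for z in zones)
total_area = sum(z["area_ha"] for z in zones)
uniform_total = worst_deficit * total_area

print(variable_total)  # 200.0 kg
print(uniform_total)   # 520.0 kg
```

In this toy field, the per-zone plan uses less than half the fertiliser of the uniform plan, which is where the cost saving comes from.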

Outage prediction and rerouting: The shipping industry is another sector that can use environmental data to optimise its cargo delivery and transport systems and to comply with the SDGs. Besides minimising the industry’s effects on the environment, a major boost can come from deploying machine learning algorithms on real-time data such as ocean weather conditions.

Construction industry: Using environmental data is crucial to the sustainability of any construction project and to achieving the SDGs. It also helps the company keep track of harmful construction materials that might otherwise end up in the project. Firms in the field can make sure that their projects, large or small, comply with government norms on environmental impact; non-compliance can otherwise delay deadlines.

Sustainable mining: Mining is one of those industries where environmental data is vital, both for the well-being of ecosystems and for the organisation itself. Moreover, as consumers begin to question the origins of the products they use, the importance of environmental data increases manyfold, be it for lithium-ion batteries or silicon for computer hardware.

Tree plantation projects: Environmental data sets are a major tool for identifying the spread of trees in urban or rural areas. They can help plantation companies optimise their operations by synthesising and deploying this data.

The Attributes of Environmental Data

Let’s look at the attributes of environmental data. Earlier in this article I mentioned the pressures (P), states (S) and impact (I) from the DPSIR model of intervention; these three categories are where our environmental data attributes reside.

Pressures: This category holds attributes such as pollution, population growth, extraction of resources and land-use change. In essence, these are the factors that exert pressure on the environment and ultimately have an impact.

States: These can be summed up as the current condition of natural resources such as vegetation, biodiversity, water quality, air quality and habitat.

Impact: Last but not least, the impact of human activity on the environment shows up as biodiversity depletion, deforestation, poor public health, economic crises, environmental damage, etc.

The Data Collection Process

Environmental data can be collected via different methods, and the method varies from one use case to another depending on the attributes involved. Hence, the primary question is: what data should be collected? To start with, the role of each collectable variable needs to be defined: dependent or independent. Next, the “scale” of the data needs to be considered: continuous or discrete. Then comes the “type” of data: binary, series, continuous, etc.
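One way to pin these decisions down before collecting anything is a small data dictionary. The Python sketch below records, for each variable, whether it is dependent or independent, its scale and its type. The class names, the variables and the units are hypothetical and only serve as an example for an imagined crop-yield study.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DEPENDENT = "dependent"
    INDEPENDENT = "independent"


class Scale(Enum):
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"


@dataclass(frozen=True)
class Variable:
    name: str
    role: Role
    scale: Scale
    kind: str  # the data type, e.g. "binary", "series" or "continuous"
    unit: str = ""


# Hypothetical variables for an imagined crop-yield study.
data_dictionary = [
    Variable("crop_yield", Role.DEPENDENT, Scale.CONTINUOUS, "continuous", "t/ha"),
    Variable("soil_moisture", Role.INDEPENDENT, Scale.CONTINUOUS, "series", "%"),
    Variable("irrigated", Role.INDEPENDENT, Scale.DISCRETE, "binary"),
]

# The independent variables are the ones that need to be collected as inputs.
predictors = [v.name for v in data_dictionary if v.role is Role.INDEPENDENT]
print(predictors)  # ['soil_moisture', 'irrigated']
```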

The next step is choosing a data collection methodology. There are different ways to collect environmental data, and the approach varies from one data set to another. To help you visualise this, here are a few example data sets along with their collection methodologies.

EARTHDATA: NASA’s open data set, collected using satellites, aircraft and field measurements.

Climate Change Data: The World Bank’s climate change data, which also draws on satellites and aircraft as well as demographic surveys.

Knowledge Network for Biocomplexity (KNB): Data for ecological and environmental research, collected via field laboratories, research sites and independent researchers.

The Quality of Data

The latest machine learning algorithms are only as good as the data you feed into them. But how do we ascertain that the data set we have is clean and verified? One answer is to stick to credible data sources.

However, if the source is not well established, the primary dimensions for assessing data are its completeness, accuracy, uniqueness, consistency, validity and timeliness. Overlooking these simple dimensions when assessing a data set can lead to unpredictable results and an unsatisfactory outcome.
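Some of these dimensions can be checked with very little code. The Python sketch below scores a tiny, made-up set of air-quality records on completeness, uniqueness, validity and timeliness. The records, field names, valid range and age threshold are all assumptions for illustration. Accuracy and consistency are left out because they need reference data or a second source to compare against.

```python
from datetime import date

# Hypothetical air-quality records; every value here is invented for the example.
records = [
    {"id": 1, "station": "A1", "pm25": 12.5, "measured_on": date(2020, 1, 18)},
    {"id": 2, "station": "A2", "pm25": None, "measured_on": date(2020, 1, 18)},
    {"id": 2, "station": "A2", "pm25": None, "measured_on": date(2020, 1, 18)},  # duplicate row
    {"id": 3, "station": "A3", "pm25": -4.0, "measured_on": date(2019, 6, 1)},  # invalid and stale
]


def completeness(rows, field):
    """Share of rows where the field is filled in."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Share of rows that carry a distinct key."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, low, high):
    """Share of filled-in values that fall inside the allowed range."""
    present = [r[field] for r in rows if r[field] is not None]
    if not present:
        return 0.0
    return sum(low <= v <= high for v in present) / len(present)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows measured within max_age_days of the reference date."""
    return sum((as_of - r[field]).days <= max_age_days for r in rows) / len(rows)


report = {
    "completeness": completeness(records, "pm25"),
    "uniqueness": uniqueness(records, "id"),
    "validity": validity(records, "pm25", 0.0, 1000.0),
    "timeliness": timeliness(records, "measured_on", date(2020, 1, 20), 30),
}
print(report)
# {'completeness': 0.5, 'uniqueness': 0.75, 'validity': 0.5, 'timeliness': 0.75}
```

Scores like these do not prove a data set is good, but low values are a cheap early warning before any model is trained on it.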

The Price of Environmental Data

The price varies from institution to institution and also depends on how the data is collected. To give you an idea, real-time data tends to cost substantially more than a data set that is updated every two months. A yearly subscription can cost anywhere from as little as EUR 1,000 to as much as EUR 4,500, but it all depends on who is selling the data and how it was collected.

Real-time sensor data is usually the most expensive because the initial cost of the hardware involved is relatively high compared with a methodology that collects data periodically.

Challenges when Purchasing Environmental Data

The prime challenges when procuring environmental data are not knowing where to find the ideal data for your particular use case, assessing the quality of the data set, choosing the right algorithms to run on it, etc.

The way out can be to put the data engineer at your organisation in touch with the data provider. The aim should be to ask about:

  1. The provenance of the data.
  2. How and when it was collected.
  3. Any recommendations from the data provider on how to use the data for your particular use case.
  4. The shortcomings of the data: Is it biased in certain applications? Is it unpredictable because of some key indicators?

This was a brief overview to help you get started. However, finding a credible data set provider is often challenging. The best approach can be to try out freely available data sets at the onset of your research. Some credible free data providers are mentioned in this short guide; the European Union Open Data Portal is also a good place to start, both for experimental purposes and, depending on the use case, for real-world applications.

Feel free to drop comments if you have any questions or suggestions. Cheers.
