Using the CKAN API for bulk data uploads onto openAFRICA

Published in Code For Africa, Apr 18, 2024

The CKAN API offers a robust interface for various tasks such as dataset and resource management, search functionality, analytics, and more.

openAFRICA is powered by CKAN, an open source data management system. (Source: CfA)

Did you know that there is a way for governments, organisations, activists, researchers and citizens to freely share datasets with the public? In this technical guide, I’ll walk through how to use the CKAN API in Python to do bulk uploads of data files to the openAFRICA platform.

openAFRICA is a pioneering open data repository focused on the African continent and the largest independent platform for open data related to African countries, economies, development, and more.

The platform is powered by CKAN (Comprehensive Knowledge Archive Network), an open source data management system for powering data hubs and portals. CKAN makes all the core features and functionality of openAFRICA available through its Action API. This API lets you interact with openAFRICA programmatically to automate tasks like creating datasets, uploading data files, editing metadata, and more. The full list of available actions is described in CKAN's API documentation.
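
To make the idea of an "action" concrete: on a standard CKAN install, each action is exposed at a URL of the form /api/3/action/<action_name> under the portal's base address. The helper below is a minimal sketch of that convention; the function name action_url is my own and not part of CKAN or ckanapi.

```python
def action_url(base_url, action):
    """Build the Action API URL for a CKAN action such as 'package_show'.

    Assumes the portal uses CKAN's standard /api/3/action/ path.
    """
    return f"{base_url.rstrip('/')}/api/3/action/{action}"
```

For example, action_url('https://africaopendata.org', 'resource_create') gives the endpoint that a raw HTTP upload would post to.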

I’ll use a real example of uploading datasets from Afrobarometer’s Round 8 survey data on governance and democracy across 34 African countries.

Afrobarometer is an independent, non-partisan research project that conducts public attitude surveys on democracy, governance, economic conditions, and related issues in African countries. It is considered one of the most reliable sources of data on African citizens’ views and experiences. Afrobarometer surveys are conducted in waves across a sample of African nations, with Round 8 covering surveys in 34 countries between 2019–2021. The data from these surveys provides crucial insights into Africans’ evaluations of their country’s economic and political trajectory.

Prerequisites

Before starting, you’ll need:

A user account on openAFRICA (sign up at https://openafrica.net/user/register)
An API key from your openAFRICA user account
Python 3 installed
The ckanapi Python library installed (pip install ckanapi)

You’ll also need a collection of data files (CSV, JSON, etc.) that you want to upload to openAFRICA. For this demo, we will be using Afrobarometer’s merged round 8 dataset which is in .SAV format.

Setting Up

First, let’s install the required Python libraries:

!pip install ckanapi
!pip install --upgrade google-auth google-auth-oauthlib google-auth-httplib2 gspread

We’ll use ckanapi to interact with the CKAN API, and gspread to authenticate with Google Sheets in case we need to retrieve any data from spreadsheets.

Next, we import the necessary libraries and authenticate with the openAFRICA API using our API key:

import ckanapi
import requests
import os

api_token = "my_api_token" # Replace with your actual API key
APIKEY = api_token

ckan = ckanapi.RemoteCKAN('https://africaopendata.org', apikey=APIKEY)
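
Hard-coding the key works for a quick test, but it is easy to leak when a notebook or script gets shared. One option is to read it from an environment variable instead. This is a minimal sketch: the variable name OPENAFRICA_API_KEY and the helper load_api_key are assumptions made for this example, not something openAFRICA or ckanapi defines.

```python
import os


def load_api_key(env_var="OPENAFRICA_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name suits your own setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With this in place, APIKEY = load_api_key() can replace the hard-coded string before the RemoteCKAN client is created.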

Creating a new dataset

The first step is to create a new dataset on openAFRICA that will contain all our uploaded data files. We can use the package_create action from the CKAN API:

package = ckan.action.package_create(
    name="afrobarometer-round-eight-survey-data",
    title="Afrobarometer Round Eight Survey Data", 
    owner_org="afrobarometer"
)

This creates a new dataset package with the provided name, title and owner organisation. The owner_org should be an existing organisation on openAFRICA that you have permission to publish under. You can browse the existing organisations on the openAFRICA site.

If the dataset already exists, the package_create call will raise a ValidationError. We can catch this and either skip creation or update the existing dataset.
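
A rough sketch of the "skip creation" option is shown here. The helper get_or_create_dataset is hypothetical. It takes the client and the exception class as arguments (in practice, ckanapi.errors.ValidationError) so that it can be tried out without a live portal, and it falls back to package_show, which accepts a dataset name as its id.

```python
def get_or_create_dataset(ckan, already_exists_error, name, title, owner_org):
    """Create the dataset, or fetch the existing one if the name is taken.

    `ckan` is assumed to be a ckanapi.RemoteCKAN instance, and
    `already_exists_error` the exception class raised on a name clash
    (ckanapi.errors.ValidationError when using ckanapi).
    """
    try:
        return ckan.action.package_create(name=name, title=title, owner_org=owner_org)
    except already_exists_error:
        # A validation error can also mean a bad field value. In that case
        # the dataset does not exist and package_show raises its own error.
        return ckan.action.package_show(id=name)
```

Called as get_or_create_dataset(ckan, ckanapi.errors.ValidationError, name=..., title=..., owner_org=...), it returns the dataset dictionary either way, so the rest of the script can keep using package['id'].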

Uploading data files

With the new dataset created, we can now upload our data files as resources associated with this dataset. The example code loops through all .SAV files in a local directory:

dataset_folders = "/path/to/data/files/"  # Path to the local directory where your files are stored
# Standard CKAN Action API endpoint for creating a resource
api_url = 'https://africaopendata.org/api/3/action/resource_create'

for item in os.listdir(dataset_folders):
    if item.lower().endswith('.sav'):  # Match .SAV files regardless of case
        try:
            resource_name = os.path.splitext(item)[0]
            extension = 'SAV'
            path = os.path.join(dataset_folders, item)

            # Open the file in a with-block so the handle is closed after the upload
            with open(path, 'rb') as f:
                r = requests.post(api_url, data={
                    'package_id': package['id'],
                    'name': resource_name,
                    'format': extension,
                    'description': resource_name,
                    'url': item,
                }, headers={'Authorization': APIKEY}, files={'upload': f})

            if r.status_code != 200:
                print(f'Error uploading resource "{item}": HTTP {r.status_code}')

        except Exception as e:
            print(f'Error processing file "{item}": {e}')

For each .SAV file in the directory, it constructs the resource metadata:

resource_name: Name of the resource derived from file name
extension: File extension/format
path: Full local path to the file

It then makes a POST request to the resource_create API action URL, passing in:

The package_id of the dataset the resource belongs to
Resource metadata fields like name, format and description
The url field, set to the filename for uploaded files
The raw file data, sent in the request’s files parameter

This associates each file as a new resource under the previously created dataset.
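
The same upload can also go through the ckanapi client instead of a hand-built request, since ckanapi accepts a file object for the upload field of resource_create. The function below is a sketch under that assumption. The name upload_folder and its parameters are made up for this example, and the client is passed in so the logic can be tried against a stand-in object.

```python
import os


def upload_folder(ckan, package_id, folder, suffix=".sav"):
    """Upload every file in `folder` ending in `suffix` as a resource.

    `ckan` is assumed to be a ckanapi.RemoteCKAN instance. Returns the
    list of file names that were uploaded, in alphabetical order.
    """
    uploaded = []
    for item in sorted(os.listdir(folder)):
        if not item.lower().endswith(suffix):
            continue  # skip files of other types
        resource_name = os.path.splitext(item)[0]
        path = os.path.join(folder, item)
        with open(path, "rb") as handle:
            # ckanapi sends a multipart request when given a file object
            ckan.action.resource_create(
                package_id=package_id,
                name=resource_name,
                format=suffix.lstrip(".").upper(),
                description=resource_name,
                upload=handle,
            )
        uploaded.append(item)
    return uploaded
```

Failures raise ckanapi exceptions rather than returning a status code, so wrap the call in try/except if one bad file should not stop the whole batch.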

Handling errors

The example has some basic error handling. If the resource_create API call fails (non-200 status code), it prints an error message.
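
The status code alone does not say what went wrong. CKAN's Action API replies with a JSON body containing a success flag and, on failure, an error object, so a slightly more informative check is possible. The helper below is a sketch; the name explain_ckan_response is hypothetical.

```python
import json


def explain_ckan_response(status_code, body_text):
    """Return None on success, or a short error string for a CKAN reply."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Proxies and gateways may answer with HTML instead of JSON
        return f"HTTP {status_code}: response was not JSON"
    if not isinstance(payload, dict):
        return f"HTTP {status_code}: unexpected response body"
    if status_code == 200 and payload.get("success"):
        return None
    error = payload.get("error", {})
    return f"HTTP {status_code}: {json.dumps(error, sort_keys=True)}"
```

In the upload loop this could be used as problem = explain_ckan_response(r.status_code, r.text), printing problem whenever it is not None.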

For any other exceptions during file processing, like PermissionError for file access, it catches the exception and prints an error message including the problematic filename. You may want to enhance the error handling based on your requirements, like retrying failed uploads, logging errors to a file, etc.
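
As one possible enhancement, here is a small retry wrapper that logs each failed attempt. It is a generic sketch using only the standard library. The name with_retries, the logger name bulk_upload and the default values are arbitrary choices for this example.

```python
import logging
import time


def with_retries(func, attempts=3, delay=2.0, logger=None):
    """Call func() until it succeeds or `attempts` tries have failed."""
    log = logger or logging.getLogger("bulk_upload")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as err:
            log.warning("Attempt %d of %d failed: %s", attempt, attempts, err)
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)  # wait before the next try
```

Calling logging.basicConfig(filename="upload_errors.log", level=logging.WARNING) once at the start of the script sends these messages to a file (the file name is just an example). Each upload can then be wrapped as with_retries(lambda: upload_one(path)), where upload_one stands for whatever function performs a single upload.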

Publishing the dataset

After uploading all data files as resources to the new dataset, you can optionally publish the dataset to make it available on the public openAFRICA portal.

First, check if the dataset was created in the ‘draft’ state:

dataset_dict = ckan.action.package_show(id=package['id'])
state = dataset_dict['state']
print(f"Dataset state: {state}")

If state is ‘draft’, you can publish it:

if state == 'draft':
    ckan.action.package_patch(
        id=package['id'],
        state='active'
    )

This marks the dataset as publicly available and active on openAFRICA.
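
To confirm the result, it can help to look at what package_show reports after the uploads. The sketch below summarises the dictionary that package_show returns, relying on its state and resources keys. The helper name summarise_dataset is hypothetical.

```python
def summarise_dataset(dataset_dict):
    """Return the dataset state and the names of its resources."""
    resources = dataset_dict.get("resources", [])
    return {
        "state": dataset_dict.get("state"),
        "resource_count": len(resources),
        "resource_names": [res.get("name") for res in resources],
    }
```

For instance, summarise_dataset(ckan.action.package_show(id=package['id'])) gives a quick view of whether every file ended up attached to the dataset.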

Conclusion

By using the CKAN API, we can efficiently and programmatically upload entire data collections to the openAFRICA open data repository. This allows streamlining and automating the process of sharing Africa-related open data on the openAFRICA platform. The API enables integrating openAFRICA into data publishing and sharing workflows. The key steps involved are:

Install ckanapi Python library
Obtain an API key from your openAFRICA account
Create a new dataset package using package_create
Upload data files as resources to that package using resource_create
Optionally, publish the dataset to make it publicly available

The CKAN API provides a powerful interface to perform all kinds of operations like dataset/resource management, searching, analytics and more. You can extend this example to build custom data pipelines, scripts and applications that interact with openAFRICA programmatically.

This data blog was written by CfA DataLab data analyst Stephane Njoki. It was edited and reviewed by CfA copy editor Kiprotich Koros.

Code for Africa (CfA) is the continent’s largest network of civic technology and data journalism labs, with teams in 21 countries. CfA builds digital democracy solutions that give citizens unfettered access to actionable information that empowers them to make informed decisions and strengthens civic engagement for improved public governance and accountability. This includes building infrastructure like the continent’s largest open data portals at openAFRICA and sourceAFRICA. CfA incubates initiatives as diverse as the africanDRONE network, the PesaCheck fact-checking initiative, the sensors.AFRICA air quality sensor network, and the research and analysis programme CivicSignal.

CfA also manages the African Network of Centres for Investigative Reporting (ANCIR), which provides the continent’s best muckraking newsrooms with the newest possible forensic data tools, digital security, and whistleblower encryption to help to improve their ability to tackle crooked politicians, organised crime, and predatory big business. CfA also runs one of Africa’s largest skills development initiatives for digital journalists, and seed funds cross-border collaboration.