Connecting a Kaggle dataset to Streamlit — Ukraine Missile Dashboard example

Fede Caprari
Apr 23, 2024

In this tutorial, we will cover how to connect a Kaggle dataset to a Streamlit web app to visualize data. Kaggle is an online data-science community that hosts thousands of public datasets, and Streamlit is an open-source Python framework (with free hosting on Streamlit Community Cloud) that turns Python scripts into interactive web apps.

Being an intelligence analyst myself, I find it useful to visualize data on conflicts and geopolitical events to extract valuable insights. In this example, I will use the dataset “Massive Missile Attacks on Ukraine” (credits to Kaggle user Petro) to check the evolution of Ukrainian missile interception rates through an interactive dashboard. My goal is to see whether interceptions have decreased substantially in recent months due to ammunition shortages.

If you are curious, this is the end result:

You can access the dashboard here.

The main requirements for this tutorial are a Kaggle account (which is free) and some Python knowledge. Let's begin!

Checking the dataset

The “Massive Missile Attacks on Ukraine” database is updated often and includes the number of Russian missiles launched daily, those intercepted, missile type, and other data. It is all based on information published by the Ukrainian Air Force.

Overview of the dataset in Kaggle.

After viewing the dataset in Kaggle, you must create an account to be able to download it (and later obtain an API token). The download is a .ZIP file containing two files, “missile_attacks_daily.csv” and “missile_and_UAV.csv”. For this tutorial, we will use “missile_attacks_daily.csv”.

CSV file opened with the relevant columns for this tutorial highlighted

This step is optional, but I opened the dataset in Excel to check more details. In this case, I’m interested only in the “time_start”, “launched” and “destroyed” columns (highlighted in the picture above). Later on, I will clean out the unnecessary data with Pandas (see the “Cleaning the data and creating graphs” section below).
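If you would rather skip Excel, a quick look with Pandas works too. This is a minimal sketch, assuming the CSV is already in your working folder:

import pandas as pd

df = pd.read_csv("missile_attacks_daily.csv")
print(df.columns.tolist())                                 # list all available columns
print(df[['time_start', 'launched', 'destroyed']].head())  # preview the columns we care about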

Connecting to Streamlit

The goal is to have an interactive dashboard that any user can access, where the dataset can be loaded by simply pressing a button, without the user having to upload it manually. First, you need to obtain a Kaggle API token.

  • Go to your Kaggle account > Settings > Create New Token. This downloads a kaggle.json file, which contains your username and an API key.
  • Go to C:\Users\”YOUR USER”\.kaggle (if the folder does not exist, create it) and move kaggle.json there (see the quick check after this list).
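To confirm the token is in place and readable, you can run a small check like this (a minimal sketch; it simply reads the file you just moved):

import json
from pathlib import Path

# kaggle.json holds two fields: "username" and "key"
creds = json.loads((Path.home() / ".kaggle" / "kaggle.json").read_text())
print(creds["username"])  # should print your Kaggle username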

Now install the Kaggle package in your working environment:

pip install kaggle

Let's write the code to load the data:

import streamlit as st
from kaggle.api.kaggle_api_extended import KaggleApi

def download_dataset():
    # Initialize the Kaggle API client and authenticate using Streamlit secrets
    api = KaggleApi()
    api.set_config_value('username', st.secrets["kaggle"]["username"])
    api.set_config_value('key', st.secrets["kaggle"]["key"])
    api.authenticate()

    # Define the dataset and the path where files will be downloaded
    dataset = 'piterfm/massive-missile-attacks-on-ukraine'
    path = '.'

    # Download and unzip the dataset
    api.dataset_download_files(dataset, path=path, unzip=True)

# The dataset is now downloaded in the Streamlit environment

And done! Now let’s create a Streamlit button that calls that function to get the data.

import pandas as pd
import streamlit as st

if st.sidebar.button('Get Data', type="primary"):
    download_dataset()
# The button appears in the sidebar; use st.button if you prefer it in the main page

data = pd.read_csv("missile_attacks_daily.csv")
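One caveat: pd.read_csv will raise an error on the very first run, before anyone has pressed the button and the CSV exists in the app’s folder. If you want, you can replace the plain pd.read_csv call with a small guard (a sketch, not part of the original code):

import os

# Only read the CSV once it has actually been downloaded
if os.path.exists("missile_attacks_daily.csv"):
    data = pd.read_csv("missile_attacks_daily.csv")
else:
    st.info("Press 'Get Data' to download the dataset first.")
    st.stop()  # halt the script here until the file is available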

Cleaning the data and creating graphs

We can now clean the dataset. With Pandas, I will drop all the unnecessary columns, rename “time_start” to “date”, and convert it with pd.to_datetime so it plots properly in a graph later on. Since the “time_start” column contains both dates and times, I will also remove the time component. This is the final code:

import pandas as pd

def process_dataset(data):
    # Drop unnecessary columns, including the original 'time_end'
    data.drop(columns=['time_end', 'model', 'launch_place', 'target', 'destroyed_details', 'carrier', 'source'], inplace=True)

    # Ensure that the time is removed and only the date is kept
    data['time_start'] = data['time_start'].astype(str).apply(lambda x: x.split(' ')[0])

    # Rename 'time_start' to 'date'
    data.rename(columns={'time_start': 'date'}, inplace=True)

    # Convert 'date' to a datetime object and keep only the date part
    data['date'] = pd.to_datetime(data['date']).dt.date

    return data

# Calling the function:
data_processed = process_dataset(data.copy())
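If you want to sanity-check the result inside the app, one optional line (not in the original code) displays the first rows:

st.dataframe(data_processed.head())  # preview the cleaned data in the dashboard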

Finally, I will use Plotly to create two interactive charts: a bar graph of the daily number of missiles launched vs. intercepted, and a chart of the monthly average missile interception rate. You can find the full code for the graphs in my GitHub repository here. The end result looks like this:
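If you just want a feel for how such charts can be put together, here is a rough sketch (not the exact code from the repository; it assumes the cleaned columns 'date', 'launched' and 'destroyed', and computes the monthly rate as total destroyed divided by total launched per month):

import pandas as pd
import plotly.express as px
import streamlit as st

# Daily launched vs. intercepted missiles as a grouped bar chart
fig_daily = px.bar(
    data_processed,
    x='date',
    y=['launched', 'destroyed'],
    barmode='group',
    labels={'value': 'Missiles', 'variable': ''},
    title='Daily missiles launched vs. intercepted',
)
st.plotly_chart(fig_daily, use_container_width=True)

# Monthly interception rate: total destroyed / total launched per month
monthly = data_processed.copy()
monthly['month'] = pd.to_datetime(monthly['date']).dt.to_period('M').astype(str)
rate = (
    monthly.groupby('month')[['launched', 'destroyed']].sum()
    .assign(rate=lambda d: 100 * d['destroyed'] / d['launched'])
    .reset_index()
)
fig_rate = px.line(rate, x='month', y='rate', title='Monthly interception rate (%)')
st.plotly_chart(fig_rate, use_container_width=True)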

Testing and deploying

We can now run the Streamlit app locally to test it, and if everything works well, we can deploy it.
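To test locally, run the app from a terminal (assuming the script is saved as app.py; the filename is just an example):

streamlit run app.py

Keep in mind that the st.secrets lookups in download_dataset() also need to resolve locally: you can put the same [kaggle] section shown in the next step into a .streamlit/secrets.toml file inside your project folder, which is where Streamlit reads secrets from when running locally.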

Important!

This is the most crucial step for the web app to work. The kaggle.json file with your credentials is stored locally, and it will not be available when you deploy your app on Streamlit Community Cloud. It is also not recommended to store your credentials on GitHub. Instead, you need to provide your Kaggle API credentials through Streamlit secrets.

To do so, follow these steps:

  • Deploy your app normally through Streamlit. You can follow a step-by-step guide here.
  • Go to your app’s Settings > Secrets. There, add the following:
[kaggle]
username = "your_username"
key = "your_key"
  • Replace the username and key with the exact values from your kaggle.json file.
  • DONE!

Connecting a Kaggle dataset this way can serve many purposes. In some cases, the datasets may come in formats other than .CSV, but you can adjust the code accordingly. The most important things are to obtain the API token and to set up Streamlit secrets properly.

I hope you have enjoyed this tutorial. Please comment below if you have any questions, or reach out to me directly on LinkedIn.
