Voyages dataset use case: post route ballast distribution

Dylan Simpson
VorTECHsa
Published in
7 min readFeb 21, 2023

Join one of our Freight Analysts as we do a deep dive into the code behind one of our use cases for the new Voyages dataset.

Introduction

Our Freight team has created a use case for our Voyages dataset which allows clients to track the behaviour of ballast vessels after discharging on a specific route: the post-route ballast distribution. The output shows counts as well as proportions of post-route ballast legs that head to different destinations after discharging. This can help traders and charterers track supply patterns of vessels and make informed decisions about freight rate movements and vessel supply dynamics in the market. The time series presentation of this solution also allows clients to track any changes in vessel behaviour in order to update any assumptions made when trading/chartering. That said, few things are better explained without the use of an example, so here goes.

Applied example

Assumption: On all routes, a vessel will discharge and ballast back to its origin to load again and repeat.

Our post-route ballast distribution can help to test this, and see to what extent it is true or false. In this example, we’ll look at two routes: TC2 (Europe-to-US Atlantic Coast gasoline on MR tankers) and TD3C (Middle East-to-China crude on VLCCs).

Post-route ballast distribution for TD3C (voyage counts)

Clearly, for TD3C, our assumption is partially true, however the data does highlight other ballast destinations.

Post-route ballast distribution for TC2 (voyage counts)

However, for TC2, our assumption is not true, clearly vessels travel to a diverse set of destinations after discharging in PADD 1. In fact, they are more likely to ballast to the US Gulf to load diesel to take to Europe, this is an example of triangulation. Using our assumption above could lead to potential overestimation of vessel supply in Europe as well as overestimation of the average ballast mileage attached to this voyage.

The code

Now that we have described the use case and applied an example, let’s dive into the SDK code that produces this output. We will focus on the most important parts of the code, specifically, we will focus on the function. The full code is available in the appendix.

Let us begin with the inputs for the function:

origin: a list of one or more origin IDs to include

origin_excl: a list of one or more origin IDs to exclude

destination: a list of one or more destination IDs to include

destination_excl: a list of one or more destination IDs to exclude

vessel: a list of one or more vessel classes or vessel IDs to include

product: a list of one or more product IDs to include

product_excl: a list of one or more product IDs to exclude

start_y: an integer indicating which year to start in

start_m: an integer indicating which month of the year to start in

start_d: an integer indicating which day of the month to start on

end_y: an integer indicating which year to end in

end_m: an integer indicating which month of the year to end in

end_d: an integer indicating which day of the month to end on

end_date: a string containing the end date in the format “yyyy-mm-dd”

The first thing the function does is query voyage information for the original laden voyage, giving us an initial data frame to work from.

route = VoyagesSearchEnriched().search(
origins = origin,
origins_excluded = origin_excl,
destinations = destination,
destinations_excluded = destination_excl,
time_min = datetime(start_y, start_m, start_d),
time_max = datetime(end_y, end_m, end_d, 23, 59, 59),
vessels = vessel,
products = product,
products_excluded = product_excl
columns = "all")
route = pd.DataFrame(route)

We then need to do some rearranging to make the data easier to work with:

Sort by end_timestamp — the date filter uses voyage end dates.

route["end_timestamp"] = pd.to_datetime(route["end_timestamp"])
route.sort_values(by='end_timestamp', ascending = True, inplace=True)

Remove null end_timestamps.

route = route.dropna(subset=['end_timestamp'])

Make sure to remove voyages that end past the specified end date, this can happen since the time filters refer to voyages which were active in that date range.

route = route[(route['end_timestamp'] <= end_date)]

Now we are ready to query information about the ballast voyages. To do this, we can obtain a list of ‘next voyage IDs’, from our original query of laden voyages. All of these will be ballast voyages.

Get the next voyage IDs.

next_voyage_id_list = route["next_voyage_id"].unique()
next_voyage_id_list = next_voyage_id_list.tolist()

Get voyages corresponding to the next voyage IDs.

post_route = VoyagesSearchEnriched().search(
voyage_id = next_voyage_id_list,
columns = "all")

post_route_df = post_route.to_df()

Now some rearranging of the ballast voyages. An important step here is to sort by the start date of the ballast voyage, this is effectively the end date of the original laden voyages, and is what we use as the date when the ballast destination was declared.

Sort by start dates.

post_route_df["START DATE"] = pd.to_datetime(post_route_df["START DATE"])
post_route_df.sort_values(by='START DATE', ascending = True, inplace=True)

Relabel blank destinations as ‘Undetermined’.

post_route_df['DESTINATION SHIPPING REGION'] = post_route_df['DESTINATION SHIPPING REGION'].replace([''],'Undetermined')

At Vortexa, we use a timestamp format for our dates. This can make it difficult to aggregate by specific time intervals, so the code below adds a column to our post_route_df data frame which contains ballast voyage start dates in the “mm-yyyy” format. This allows us to aggregate destination counts by month, but users can also change this to be daily, weekly or annually if required.

Convert dates of ballast voyages to months and years for counting purposes.

post_route_df["months"] = post_route_df['START DATE'].dt.strftime('%m-%Y')

Now we are ready to start counting the number of times each destination is called in each month, the code below does the aggregation.

Obtain counts per month and destination

counts_per_month_and_destination = df.groupby(["months", "DESTINATION SHIPPING REGION"]).size()

Obtain counts per month to help calculate the proportion each destination contributes to the total destination calls in each month

counts_per_month = counts_per_month_and_destination.groupby(level=0).transform(sum)

Obtain proportions by dividing each destination’s counts by the total number of destination calls made each month

ratios_per_month_and_destination = counts_per_month_and_destination / counts_per_month

Reset indices to structure the data in a more readable way

ratios_per_month_and_destination = ratios_per_month_and_destination.reset_index()
counts_per_month_and_destination = counts_per_month_and_destination.reset_index()

Label the columns.

counts_per_month_and_destination.columns = ['Date', 'Destination Shipping Region', 'Count']
ratios_per_month_and_destination.columns = ['Date', 'Destination Shipping Region', 'Proportion']

The final data frames should look like this:

Example: Output of the TD3C ballast distribution counts and proportions

We now have the information we need in order to export to Excel and make the charts above. The below is just a check that makes sure that all the voyages we are considering are indeed ballast.

final_check = post_route_df["VOYAGE STATUS"].unique()

Print the confirmation when you run the function.

print("All voyages are", final_check)

The function also automatically exports the data frames (destination counts and percentages) to a csv on your Desktop.

Change names of the output files here (file path may need to be edited in Windows).

counts_per_month_and_destination.to_csv("~/Desktop/Monthly ballast distribution counts.csv", index=False)
ratios_per_month_and_destination.to_csv("~/Desktop/Monthly ballast distribution percentages.csv", index=False)

Lastly, it returns the data frames in python for you to view.

return counts_per_month_and_destination, ratios_per_month_and_destination

Below is an example of how to call the function in order to see where vessels (LR1s, LR2s and MRs) ballast to after discharging Russian diesel in Brazil (a market development we are watching very closely in 2023).

monthly_post_route_ballast_distribution(Russia,
None,
Brazil,
None,
['aframax', 'panamax', 'handymax'],
diesel_id,
None,
2021, 1, 1,
2023, 1, 31,
"2023–1–31")

Appendix

Full code

# Post voyage ballast distribution retrieval
# Import your libraries
from vortexasdk import VoyagesSearchEnriched
from datetime import datetime
import pandas as pd
import numpy as np
import time

def monthly_post_route_ballast_distribution(origin, origin_excl, destination, destination_excl, vessel, product, product_excl, start_y, start_m, start_d, end_y, end_m, end_d, end_date):

# Pull the laden voyages which occurred in the required timeframe
route = VoyagesSearchEnriched().search(
origins = origin,
origins_excluded = origin_excl,
destinations = destination,
destinations_excluded = destination_excl,
time_min = datetime(start_y, start_m, start_d),
time_max = datetime(end_y, end_m, end_d, 23, 59, 59),
vessels = vessel,
products = product,
products_excluded = product_excl
columns = "all")

# Convert to dataframe
route = pd.DataFrame(route)

# Sort by end_timestamp
route["end_timestamp"] = pd.to_datetime(route["end_timestamp"])
route.sort_values(by='end_timestamp', ascending = True, inplace=True)

# Remove null end_timestamps
route = route.dropna(subset=['end_timestamp'])

# Remove voyages that end past the specified end date
route = route[(route['end_timestamp'] <= end_date)]

# Get the next voyage IDs
next_voyage_id_list = route["next_voyage_id"].unique()
next_voyage_id_list = next_voyage_id_list.tolist()

# Get voyages corresponding to the next voyage IDs
post_route = VoyagesSearchEnriched().search(
voyage_id = next_voyage_id_list,
columns = "all")

# Convert this to dataframe
post_route_df = post_route.to_df()

# Sort them by their start dates (end date of laden voyage/discharge date)
post_route_df["START DATE"] = pd.to_datetime(post_route_df["START DATE"])
post_route_df.sort_values(by='START DATE', ascending = True, inplace=True)

# Relabel blank destinations as Undetermined
post_route_df['DESTINATION SHIPPING REGION'] = post_route_df['DESTINATION SHIPPING REGION'].replace([''],'Undetermined')

# Convert dates of ballast voyages to months and years for counting purposes
df["months"] = df['START DATE'].dt.strftime('%m-%Y')

# Obtain counts per month per destination
counts_per_month_and_destination = df.groupby(["months", "DESTINATION SHIPPING REGION"]).size()

# Obtain counts per month to help calculate proportions
counts_per_month = counts_per_month_and_destination.groupby(level=0).transform(sum)

# Obtain proportions by dividing each destination's counts by the counts in each month
ratios_per_month_and_destination = counts_per_month_and_destination / counts_per_month

# Reset indices to make data more readable
ratios_per_month_and_destination = ratios_per_month_and_destination.reset_index()
counts_per_month_and_destination = counts_per_month_and_destination.reset_index()

# Label the columns
counts_per_month_and_destination.columns = ['Date', 'Destination Shipping Region', 'Count']
ratios_per_month_and_destination.columns = ['Date', 'Destination Shipping Region', 'Proportion']

# Check that voyages are in fact all ballast
final_check = post_route_df["VOYAGE STATUS"].unique()
print("All voyages are", final_check)

# Change names of the output files here (filepath may need to be edited in Windows)
counts_per_month_and_destination.to_csv("~/Desktop/Monthly ballast distribution counts.csv", index=False)
ratios_per_month_and_destination.to_csv("~/Desktop/Monthly ballast distribution percentages.csv", index=False)

return counts_per_month_and_destination, ratios_per_month_and_destination

--

--