Watching Airport Traffic in Real-Time

Tim Spann
Cloudera
Jul 14, 2023 · 9 min read

Apache NiFi, Apache Kafka, Apache Flink, Java, Spring — For Real-Time Airport Arrivals and Departures

openskyairport — nifi — kafka — flink sql

This is the next installment in a series on optimizing travel with real-time streaming. For the start of the series, check out “Finding the Best Way Around”.

In a following article we will revisit and update our ADS-B receiver article and describe how we ingest and transform data from an antenna.

In this stage we are building up all of the feeds we need to make smart decisions quickly and to provide the necessary data to AI and ML models, for example to answer LLM/NLP chat questions such as “How should I get somewhere if I am leaving tomorrow, now, or soon?” This will incorporate weather, air quality, roads, buses, light rail, rail, planes, social media, travel advisories and more. As part of this we will provide real-time notifications to users via Slack, Discord, email, WebSocket front ends and other dashboards. I am open to working with collaborators in open source and to suggestions for end-user applications and other data processors like my friends at RisingWave, Timeplus, StarTree Pinot, and LLM/vector database collaborators like Zilliz Milvus, IBM watsonx.ai and others.

We use the OpenSky REST API to obtain airport information (KEWR is Newark Liberty International Airport in New Jersey):

https://opensky-network.org/api/flights/departure?airport=KEWR
&begin=${now():toNumber():divide(1000):minus(604800)}
&end=${now():toNumber():divide(1000)}
https://opensky-network.org/api/flights/arrival?airport=${airport}
&begin=${now():toNumber():divide(1000):minus(604800)}
&end=${now():toNumber():divide(1000)}

The links above use the standard REST endpoints and set the begin and end parameters with NiFi’s Expression Language, which converts the current time to UNIX seconds (dividing milliseconds by 1,000) and subtracts 604,800 seconds (one week). In this example I am looking at the last week of departures in the first URL and the last week of arrivals in the second.
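Outside of NiFi, the same one-week window can be computed in plain Python. This is a sketch for illustration only (it builds the URL but does not call the API):

```python
import time

def opensky_departure_url(airport: str, days_back: int = 7) -> str:
    """Build the OpenSky departures URL with the same UNIX-second
    window the NiFi Expression Language computes."""
    end = int(time.time())           # now, in epoch seconds
    begin = end - days_back * 86400  # 7 * 86400 = 604800 seconds
    return (
        "https://opensky-network.org/api/flights/departure"
        f"?airport={airport}&begin={begin}&end={end}"
    )

url = opensky_departure_url("KEWR")
```

The arrivals endpoint works identically; only the path segment changes.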

Why stick with just my local airport? I decided to iterate through a list of most of the largest airports, retrieving both departures and arrivals since they use the same format.

[
{"airport":"KATL"},
{"airport":"KEWR"},
{"airport":"KJFK"},
{"airport":"KLGA"},
{"airport":"KDFW"},
{"airport":"KDEN"},
{"airport":"KORD"},
{"airport":"KLAX"},
{"airport":"KLAS"},
{"airport":"KMCO"},
{"airport":"KMIA"},
{"airport":"KCLT"},
{"airport":"KSEA"},
{"airport":"KPHX"},
{"airport":"KSFO"},
{"airport":"KIAH"},
{"airport":"KBOS"},
{"airport":"KFLL"},
{"airport":"KMSP"},
{"airport":"KPHL"},
{"airport":"KDCA"},
{"airport":"KSAN"},
{"airport":"KBWI"},
{"airport":"KTPA"},
{"airport":"KAUS"},
{"airport":"KIAD"},
{"airport":"KMDW"}
]
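The fan-out NiFi performs over this list can be sketched in a few lines of Python (a shortened airport list is used here for illustration; the URL shapes come from the REST endpoints above):

```python
import json
import time

# Shortened airport list for illustration; the full list is above.
airports_json = '[{"airport":"KEWR"},{"airport":"KJFK"},{"airport":"KATL"}]'

end = int(time.time())
begin = end - 604800  # one week back, in seconds

urls = []
for entry in json.loads(airports_json):
    # Departures and arrivals share the same URL shape and response format.
    for kind in ("departure", "arrival"):
        urls.append(
            f"https://opensky-network.org/api/flights/{kind}"
            f"?airport={entry['airport']}&begin={begin}&end={end}"
        )
```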

CODE

All source code for tables, SQL, HTML, JavaScript, JSON, formatting, Kafka and NiFi is made available. We also link to free open source environments to run this code.

Waiting For Data the Old Fashioned Way

Schema Data

{"type":"record","name":"openskyairport",
"namespace":"dev.datainmotion",
"fields":[
{"name":"icao24","type":["string","null"]},
{"name":"firstSeen","type":["int","null"]},
{"name":"estDepartureAirport","type":["string","null"]},
{"name":"lastSeen","type":["int","null"]},
{"name":"estArrivalAirport","type":["string","null"]},
{"name":"callsign","type":["string","null"]},
{"name":"estDepartureAirportHorizDistance","type":["int","null"]},
{"name":"estDepartureAirportVertDistance","type":["int","null"]},
{"name":"estArrivalAirportHorizDistance","type":["int","null"]},
{"name":"estArrivalAirportVertDistance","type":["int","null"]},
{"name":"departureAirportCandidatesCount","type":["int","null"]},
{"name":"arrivalAirportCandidatesCount","type":["int","null"]},
{"name":"ts","type":["string","null"]},
{"name":"uuid","type":["string","null"]}
]
}

If you wish to create this in Cloudera/Hortonworks Schema Registry, Confluent Schema Registry, NiFi Avro Schema Registry or just in files, feel free to do so. NiFi and SQL Stream Builder can simply infer the schema for now.
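If you do want to sanity-check records against the schema without a registry, a minimal stdlib-only sketch (not part of the article's pipeline) can compare field names, since every field in this schema is a nullable union:

```python
import json

# A trimmed copy of the Avro schema above (two fields for brevity).
schema = json.loads(
    '{"type":"record","name":"openskyairport","fields":['
    '{"name":"icao24","type":["string","null"]},'
    '{"name":"callsign","type":["string","null"]}]}'
)

def missing_fields(record: dict, schema: dict) -> set:
    """Return schema field names absent from a record; all fields here
    are nullable unions, so presence is the only real concern."""
    return {f["name"] for f in schema["fields"]} - set(record)

gaps = missing_fields({"icao24": "a46cc1"}, schema)
```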

Example JSON Data

{
"icao24" : "a46cc1",
"firstSeen" : 1688869070,
"estDepartureAirport" : "KEWR",
"lastSeen" : 1688869079,
"estArrivalAirport" : null,
"callsign" : "UAL1317",
"estDepartureAirportHorizDistance" : 645,
"estDepartureAirportVertDistance" : 32,
"estArrivalAirportHorizDistance" : null,
"estArrivalAirportVertDistance" : null,
"departureAirportCandidatesCount" : 325,
"arrivalAirportCandidatesCount" : 0,
"ts" : "1688869093501",
"uuid" : "30682e35-e695-4524-8d1b-1abd0c7cffaf"
}

This is what our augmented JSON data looks like: we added ts and uuid to the raw data and trimmed spaces from callsign.
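The enrichment that NiFi's UpdateRecord step performs can be approximated in Python like so (a sketch, not the actual flow):

```python
import time
import uuid

def enrich(record: dict) -> dict:
    """Approximate the UpdateRecord step: trim the callsign, then add
    a millisecond timestamp (ts) and a unique id (uuid)."""
    out = dict(record)
    if out.get("callsign"):
        out["callsign"] = out["callsign"].strip()
    out["ts"] = str(int(time.time() * 1000))
    out["uuid"] = str(uuid.uuid4())
    return out

raw = {"icao24": "a46cc1", "callsign": "UAL1317 ", "estDepartureAirport": "KEWR"}
enriched = enrich(raw)
```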

NiFi Flow to Acquire Data

Original EWR Only Departures Data
In this updated version we ingest from 25+ airports for arrivals and departures.
Split out individual records and slow them down for demo speed
JSON reader to JSON writer, building out an Avro schema. For now, the SQL returns all rows and all fields.
Write our JSON Records with avro.schema as a NiFi attribute
UpdateRecord to add timestamp and unique ID.
Write out our stream of records as JSON to the openskyairport topic on our Kafka cluster
Set everything as a parameter for easy deployment via NiFi CLI, CDF Public Cloud or REST API
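The split-and-throttle step in the flow above can be sketched as a simple Python generator (a demo-only illustration; in the real flow NiFi processors handle the splitting and rate control):

```python
import json
import time

def drip_feed(payload: str, delay_s: float = 0.0):
    """Split a JSON array response into single records, pausing between
    each one (the demo flow uses about one second; 0 here for testing)."""
    for record in json.loads(payload):
        yield record
        time.sleep(delay_s)

batch = '[{"icao24":"a46cc1"},{"icao24":"c060b9"}]'
records = list(drip_feed(batch))
```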
Provenance Data from our JSON Rows

Kafka Data Viewed in Cloudera Streams Messaging Manager (SMM)

SMM lets us view our Kafka data without changing active consumers
We can view any JSON/Avro records without affecting the live stream

Flink SQL Table Against Kafka Topic (openskyairport)

CREATE TABLE `ssb`.`Meetups`.`openskyairport` (
`icao24` VARCHAR(2147483647),
`firstSeen` BIGINT,
`estDepartureAirport` VARCHAR(2147483647),
`lastSeen` BIGINT,
`estArrivalAirport` VARCHAR(2147483647),
`callsign` VARCHAR(2147483647),
`estDepartureAirportHorizDistance` BIGINT,
`estDepartureAirportVertDistance` BIGINT,
`estArrivalAirportHorizDistance` VARCHAR(2147483647),
`estArrivalAirportVertDistance` VARCHAR(2147483647),
`departureAirportCandidatesCount` BIGINT,
`arrivalAirportCandidatesCount` BIGINT,
`ts` VARCHAR(2147483647),
`uuid` VARCHAR(2147483647),
`eventTimeStamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp',
WATERMARK FOR `eventTimeStamp` AS `eventTimeStamp` - INTERVAL '3' SECOND
) WITH (
'scan.startup.mode' = 'group-offsets',
'deserialization.failure.policy' = 'ignore_and_log',
'properties.request.timeout.ms' = '120000',
'properties.auto.offset.reset' = 'earliest',
'format' = 'json',
'properties.bootstrap.servers' = 'kafka:9092',
'connector' = 'kafka',
'properties.transaction.timeout.ms' = '900000',
'topic' = 'openskyairport',
'properties.group.id' = 'openskyairportflrdrgrp'
)
This is the Flink SQL table that was autogenerated for us by inferring data from the Kafka topic.

Flink SQL Query Against Kafka Table

select icao24, callsign, firstSeen, lastSeen, estDepartureAirport, arrivalAirportCandidatesCount,
estDepartureAirportHorizDistance, estDepartureAirportVertDistance, estArrivalAirportHorizDistance,
estArrivalAirportVertDistance, departureAirportCandidatesCount
from openskyairport

This is an example query; we can add time windows, aggregates (max/min/average/sum), joins and more. We can also set up upsert tables to insert results into Kafka topics (or into JDBC tables).

SQL Stream Builder (Apache Flink SQL / PostgreSQL) Materialized View in HTML/JSON

[{"icao24":"c060b9","callsign":"POE2136","firstSeen":"1689193028",
"lastSeen":"1689197805","estDepartureAirport":"KEWR",
"arrivalAirportCandidatesCount":"3","estDepartureAirportHorizDistance":"357",
"estDepartureAirportVertDistance":"24","estArrivalAirportHorizDistance":"591",
"estArrivalAirportVertDistance":"14","departureAirportCandidatesCount":"1"},{"icao24":"a9b85b","callsign":"RPA3462","firstSeen":"1689192822","lastSeen":"1689196463","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"6","estDepartureAirportHorizDistance":"788","estDepartureAirportVertDistance":"9","estArrivalAirportHorizDistance":"2017","estArrivalAirportVertDistance":"30","departureAirportCandidatesCount":"1"},{"icao24":"a4b205","callsign":"N401TD","firstSeen":"1689192818","lastSeen":"1689198430","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"4","estDepartureAirportHorizDistance":"13461","estDepartureAirportVertDistance":"24","estArrivalAirportHorizDistance":"204","estArrivalAirportVertDistance":"8","departureAirportCandidatesCount":"1"},{"icao24":"a6eed5","callsign":"GJS4485","firstSeen":"1689192782","lastSeen":"1689195255","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"4","estDepartureAirportHorizDistance":"451","estDepartureAirportVertDistance":"17","estArrivalAirportHorizDistance":"1961","estArrivalAirportVertDistance":"56","departureAirportCandidatesCount":"1"},{"icao24":"a64996","callsign":"JBU1527","firstSeen":"1689192458","lastSeen":"1689200228","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"5","estDepartureAirportHorizDistance":"750","estDepartureAirportVertDistance":"9","estArrivalAirportHorizDistance":"4698","estArrivalAirportVertDistance":"107","departureAirportCandidatesCount":"1"},{"icao24":"aa8548","callsign":"N777ZA","firstSeen":"1689192423","lastSeen":"1689194898","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"4","estDepartureAirportHorizDistance":"13554","estDepartureAirportVertDistance":"55","estArrivalAirportHorizDistance":"13735","estArrivalAirportVertDistance":"32","departureAirportCandidatesCount":"1"}]

This JSON data can now be read in web pages, Jupyter notebooks, Python code, mobile phones or wherever.
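As a sketch of that consumption, here is how Python might process the materialized view output (the live fetch is commented out with a placeholder host since the real endpoint URL is environment-specific; we parse an inline sample from the output above instead):

```python
import json

# Fetching the live view would look roughly like this:
#   import urllib.request
#   rows = json.load(urllib.request.urlopen(
#       "https://<ssb-host>/<materialized-view-endpoint>"))
sample = ('[{"icao24":"c060b9","callsign":"POE2136","firstSeen":"1689193028",'
          '"estDepartureAirport":"KEWR","estDepartureAirportHorizDistance":"357"}]')

rows = json.loads(sample)
# The view serializes numbers as strings, so cast before doing math.
horiz = [int(r["estDepartureAirportHorizDistance"]) for r in rows]
```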

Materialized View Endpoint Creation

Build a Materialized View from our SQL

Our Dashboard Feed from that Materialized View

Step-by-Step Building an Airport Arrivals and Departures Streaming Pipeline

  1. NiFi: schedules the REST calls
  2. NiFi: calls the arrivals REST endpoint, iterating through all 25+ airports
  3. NiFi: calls the departures REST endpoint, iterating through all 25+ airports
  4. NiFi: extracts an Avro schema from the JSON data
  5. NiFi: updates records, adding a unique ID and timestamp to each record
  6. NiFi: (for demos, we split record batches into single records and drip-feed one record per second)
  7. NiFi: publishes records to the Kafka topic: openskyairport
  8. Kafka: records arrive in the cluster, in order, as JSON
  9. Flink SQL: table built by inferring the schema from JSON data in the Kafka topic
  10. SSB: interactive SQL is launched as a Flink job on a Flink cluster on Kubernetes
  11. SSB: creates a materialized view from the SQL results
  12. SSB: hosts the materialized view as a JSON REST endpoint
  13. HTML/JSON dashboard reads the JSON REST endpoint and feeds it to jQuery DataTables
  14. Data: live data feed published via REST endpoint, Kafka topic, Slack channel, Discord channel, and future sinks. We will add Apache Iceberg and Apache Kudu storage. Please suggest other endpoints.

Flink SQL Consuming all the Kafka Messages

Data is consumed in mass quantities, quickly.

Video

References

Data


Data Provided By OpenSky Network

The OpenSky Network, https://opensky-network.org

Matthias Schäfer, Martin Strohmeier, Vincent Lenders, Ivan Martinovic and Matthias Wilhelm.
"Bringing Up OpenSky: A Large-scale ADS-B Sensor Network for Research".
In Proceedings of the 13th IEEE/ACM International Symposium on Information Processing in Sensor Networks (IPSN), pages 83-94, April 2014.
Remote NiFi

To practice what we preach, I had ChatGPT generate a summary of this article, which I include below as sample output.

Title: Real-Time Airport Traffic Analysis: Unveiling Insights with NiFi, Kafka, and Flink SQL

Introduction: In today’s fast-paced world, airports serve as vital hubs for global travel and transportation. With millions of flights taking off and landing worldwide, the ability to monitor and analyze airport traffic data in real-time becomes invaluable. OpenSky Networks, a collaborative open-source initiative, provides a wealth of real-time aircraft tracking data. In this article, we explore how NiFi, Kafka, and Flink SQL can be leveraged to collect, process, and analyze airport data from OpenSky Networks, enabling us to gain valuable insights into airport traffic patterns.

  1. Understanding NiFi: Data Collection and Flow Management: Apache NiFi, a powerful data integration and flow management tool, acts as the foundation for our data collection pipeline. NiFi enables us to fetch real-time airport data from OpenSky Networks through their API. By utilizing NiFi’s processors, we can extract relevant information such as flight details, aircraft positions, and metadata. Additionally, NiFi provides various processors for data transformation, enrichment, and filtering, allowing us to refine the incoming data to meet our specific requirements.
  2. Enabling Real-Time Data Streaming with Kafka: Once the data is collected and processed by NiFi, we need a robust messaging system to enable real-time data streaming. Apache Kafka, a distributed event streaming platform, serves as an ideal choice. NiFi integrates seamlessly with Kafka, enabling us to publish the processed airport data as Kafka messages. Kafka’s fault-tolerant and scalable architecture ensures reliable and efficient data streaming, making it an excellent choice for handling high volumes of real-time airport traffic data.
  3. Leveraging Flink SQL for Real-Time Analysis: With the data streaming into Kafka, we can now utilize Apache Flink SQL to perform real-time analysis on the airport traffic data. Flink SQL provides a high-level SQL-like interface to express complex analytical queries over continuous data streams. By defining Flink SQL queries, we can extract valuable insights from the airport data in real-time. For example, we can monitor flight delays, track airport congestion, identify unusual flight patterns, or even detect potential security threats.
  4. Extracting Insights and Visualization: Once we have processed and analyzed the airport traffic data using Flink SQL, we can extract meaningful insights to aid decision-making and operational efficiency. By visualizing the results using tools like Apache Superset or Tableau, we can create intuitive dashboards and reports that provide a comprehensive overview of airport traffic patterns. These visualizations can help airport authorities, airlines, and ground services teams make informed decisions regarding flight schedules, resource allocation, and infrastructure planning.
  5. Enhancing Airport Operations and Safety: The utilization of NiFi, Kafka, and Flink SQL for real-time airport traffic analysis brings numerous benefits to airport operations and safety. By monitoring real-time flight data, airports can optimize gate assignments, runway utilization, and baggage handling processes, leading to reduced delays and improved passenger experiences. Furthermore, the ability to detect anomalies and security threats in real-time enhances airport safety and security protocols, ensuring a safer environment for travelers.

Conclusion: Real-time analysis of airport traffic data plays a crucial role in enhancing operational efficiency and safety in the aviation industry. By leveraging the combined power of NiFi, Kafka, and Flink SQL, airports can collect, process, and analyze data from OpenSky Networks in real-time, enabling them to make data-driven decisions and improve overall airport operations. The ability to monitor flight patterns, detect anomalies, and optimize resources ultimately leads to improved passenger experiences and safer travel environments. With advancements in data streaming and analytics technologies, the future of airport traffic analysis looks promising, revolutionizing the way we manage and optimize air travel.


Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/