Apache NiFi, Apache Kafka, Apache Flink, Java, Spring — For Real-Time Airport Arrivals and Departures
This is the next article in the series on optimizing travel with real-time streaming. For the start of the series, check out “Finding the Best Way Around”.
In a following article we will revisit and update our ADSB-Y receiver article and describe how we ingest and transform data from an antenna.
In this stage we are starting to build up all of the feeds we need to make smart decisions quickly and to provide the necessary data to AI and ML models, for example to answer LLM/NLP chat questions such as how I should get somewhere if I am leaving tomorrow, now or soon. This will incorporate weather, air quality, roads, buses, light rail, rail, planes, social media, travel advisories and more. As part of this we will provide real-time notifications to users via Slack, Discord, email, WebSocket front-ends and other dashboards. I am open to working with collaborators in open source and to suggestions for end-user applications and other data processors, like my friends at RisingWave, Timeplus and StarTree Pinot, and LLM/vector database collaborators like Zilliz Milvus, IBM watsonx.ai and others.
REST API to obtain airport information (KEWR is Newark New Jersey Airport)
https://opensky-network.org/api/flights/departure?airport=KEWR
&begin=${now():toNumber():divide(1000):minus(604800)}
&end=${now():toNumber():divide(1000)}
https://opensky-network.org/api/flights/arrival?airport=${airport}
&begin=${now():toNumber():divide(1000):minus(604800)}
&end=${now():toNumber():divide(1000)}
The above links take the standard REST URLs and enhance them by setting the begin and end parameters with NiFi’s Expression Language, which gets the current time as UNIX time in seconds. In this example I am looking at the last week of data (604800 seconds): airport departures in the first URL and arrivals in the second URL.
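For readers who want to check the time-window arithmetic outside NiFi, here is a minimal Python sketch of the same calculation. The function name `opensky_url` and its parameters are illustrative assumptions, not part of NiFi or the OpenSky API; only the URL shape and the 604800-second offset come from the links above.

```python
import time
from urllib.parse import urlencode

BASE = "https://opensky-network.org/api/flights"
WEEK_SECONDS = 604800  # 7 days, the same offset used in the NiFi expression


def opensky_url(direction, airport, now_seconds=None):
    """Build a departure or arrival URL covering the last seven days.

    direction is "departure" or "arrival". now_seconds is UNIX time in
    seconds and defaults to the current time, which mirrors
    now():toNumber():divide(1000) in the NiFi expression.
    """
    if direction not in ("departure", "arrival"):
        raise ValueError("direction must be 'departure' or 'arrival'")
    end = int(time.time()) if now_seconds is None else int(now_seconds)
    begin = end - WEEK_SECONDS
    query = urlencode({"airport": airport, "begin": begin, "end": end})
    return f"{BASE}/{direction}?{query}"
```

Calling `opensky_url("departure", "KEWR")` yields the first link above with concrete begin and end values filled in for the moment of the call.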
So why stick with just my local airport? I decided to iterate through a list of most of the largest airports, requesting both departures and arrivals since they use the same format.
[
{"airport":"KATL"},
{"airport":"KEWR"},
{"airport":"KJFK"},
{"airport":"KLGA"},
{"airport":"KDFW"},
{"airport":"KDEN"},
{"airport":"KORD"},
{"airport":"KLAX"},
{"airport":"KLAS"},
{"airport":"KMCO"},
{"airport":"KMIA"},
{"airport":"KCLT"},
{"airport":"KSEA"},
{"airport":"KPHX"},
{"airport":"KSFO"},
{"airport":"KIAH"},
{"airport":"KBOS"},
{"airport":"KFLL"},
{"airport":"KMSP"},
{"airport":"KPHL"},
{"airport":"KDCA"},
{"airport":"KSAN"},
{"airport":"KBWI"},
{"airport":"KTPA"},
{"airport":"KAUS"},
{"airport":"KIAD"},
{"airport":"KMDW"}
]
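The iteration over this list can be sketched in Python as follows. In the actual flow NiFi does this work; the function name `build_requests`, the three-airport excerpt and the fixed timestamp in the usage note are assumptions made only for illustration.

```python
import json

# A short excerpt of the airport list above; the full list works the same way.
AIRPORTS_JSON = '[{"airport":"KATL"},{"airport":"KEWR"},{"airport":"KJFK"}]'

URL_TEMPLATE = (
    "https://opensky-network.org/api/flights/{direction}"
    "?airport={airport}&begin={begin}&end={end}"
)


def build_requests(airports_json, now_seconds, window_seconds=604800):
    """Return one URL per (airport, direction) pair for the given time window."""
    airports = [entry["airport"] for entry in json.loads(airports_json)]
    begin = now_seconds - window_seconds
    urls = []
    for airport in airports:
        # Departures and arrivals share the same request format.
        for direction in ("departure", "arrival"):
            urls.append(
                URL_TEMPLATE.format(
                    direction=direction,
                    airport=airport,
                    begin=begin,
                    end=now_seconds,
                )
            )
    return urls
```

For example, `build_requests(AIRPORTS_JSON, 1688869093)` returns six URLs: a departure and an arrival request for each of the three airports in the excerpt.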
CODE
All source code for tables, SQL, HTML, JavaScript, JSON, formatting, Kafka and NiFi is made available. We also link to free open source environments to run this code.
Waiting For Data the Old Fashioned Way
Schema Data
{"type":"record","name":"openskyairport",
"namespace":"dev.datainmotion",
"fields":[
{"name":"icao24","type":["string","null"]},
{"name":"firstSeen","type":["int","null"]},
{"name":"estDepartureAirport","type":["string","null"]},
{"name":"lastSeen","type":["int","null"]},
{"name":"estArrivalAirport","type":["string","null"]},
{"name":"callsign","type":["string","null"]},
{"name":"estDepartureAirportHorizDistance","type":["int","null"]},
{"name":"estDepartureAirportVertDistance","type":["int","null"]},
{"name":"estArrivalAirportHorizDistance","type":["int","null"]},
{"name":"estArrivalAirportVertDistance","type":["int","null"]},
{"name":"departureAirportCandidatesCount","type":["int","null"]},
{"name":"arrivalAirportCandidatesCount","type":["int","null"]},
{"name":"ts","type":["string","null"]},
{"name":"uuid","type":["string","null"]}
]
}
If you wish to create this in Cloudera/Hortonworks Schema Registry, Confluent Schema Registry, NiFi Avro Schema Registry or just in files, feel free to do so. NiFi and SQL Stream Builder can just infer the schema for now.
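If you keep the schema in a file, a quick sanity check of records against it can be sketched with the Python standard library alone. This is not a full Avro validator (it does no 32-bit range checks and handles only the string, int and null types used above), and the names `AVRO_TO_PYTHON` and `mismatched_fields` are hypothetical helpers, not part of any Avro library.

```python
# Map the Avro primitive types used in the schema above to Python types.
AVRO_TO_PYTHON = {"string": str, "int": int, "null": type(None)}

# A few fields from the schema above; the remaining fields follow the same pattern.
SCHEMA = {
    "type": "record",
    "name": "openskyairport",
    "namespace": "dev.datainmotion",
    "fields": [
        {"name": "icao24", "type": ["string", "null"]},
        {"name": "firstSeen", "type": ["int", "null"]},
        {"name": "estArrivalAirport", "type": ["string", "null"]},
        {"name": "callsign", "type": ["string", "null"]},
    ],
}


def mismatched_fields(record, schema=SCHEMA):
    """Return names of fields whose value does not match the schema's type union."""
    bad = []
    for field in schema["fields"]:
        allowed = tuple(AVRO_TO_PYTHON[t] for t in field["type"])
        value = record.get(field["name"])  # a missing key is treated as null
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, allowed):
            bad.append(field["name"])
    return bad
```

An empty list means every checked field matched its union; any names returned point at fields worth inspecting before publishing.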
Example JSON Data
{
"icao24" : "a46cc1",
"firstSeen" : 1688869070,
"estDepartureAirport" : "KEWR",
"lastSeen" : 1688869079,
"estArrivalAirport" : null,
"callsign" : "UAL1317",
"estDepartureAirportHorizDistance" : 645,
"estDepartureAirportVertDistance" : 32,
"estArrivalAirportHorizDistance" : null,
"estArrivalAirportVertDistance" : null,
"departureAirportCandidatesCount" : 325,
"arrivalAirportCandidatesCount" : 0,
"ts" : "1688869093501",
"uuid" : "30682e35-e695-4524-8d1b-1abd0c7cffaf"
}
This is what our augmented JSON data looks like. We added ts and uuid to the raw data, and we also trimmed spaces from callsign.
NiFi Flow to Acquire Data
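The augmentation itself happens inside the NiFi flow. Purely as an illustration of the same transformation, here is a small Python sketch; the function name `augment` is an assumption, and the choice of epoch milliseconds for ts and a random (version 4) UUID is inferred from the example record above rather than taken from the flow definition.

```python
import time
import uuid


def augment(record):
    """Return a copy of a raw flight record with ts and uuid added.

    The callsign is stripped of surrounding spaces when it is a string;
    a null callsign is left as-is.
    """
    enriched = dict(record)  # do not modify the caller's record
    callsign = enriched.get("callsign")
    if isinstance(callsign, str):
        enriched["callsign"] = callsign.strip()
    enriched["ts"] = str(int(time.time() * 1000))  # epoch milliseconds as a string
    enriched["uuid"] = str(uuid.uuid4())  # random unique ID per record
    return enriched
```

Because the function returns a copy, the raw record stays available for comparison or replay.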
Kafka Data Viewed in Cloudera Streams Messaging Manager (SMM)
Flink SQL Table Against Kafka Topic (openskyairport)
CREATE TABLE `ssb`.`Meetups`.`openskyairport` (
`icao24` VARCHAR(2147483647),
`firstSeen` BIGINT,
`estDepartureAirport` VARCHAR(2147483647),
`lastSeen` BIGINT,
`estArrivalAirport` VARCHAR(2147483647),
`callsign` VARCHAR(2147483647),
`estDepartureAirportHorizDistance` BIGINT,
`estDepartureAirportVertDistance` BIGINT,
`estArrivalAirportHorizDistance` VARCHAR(2147483647),
`estArrivalAirportVertDistance` VARCHAR(2147483647),
`departureAirportCandidatesCount` BIGINT,
`arrivalAirportCandidatesCount` BIGINT,
`ts` VARCHAR(2147483647),
`uuid` VARCHAR(2147483647),
`eventTimeStamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp',
WATERMARK FOR `eventTimeStamp` AS `eventTimeStamp` - INTERVAL '3' SECOND
) WITH (
'scan.startup.mode' = 'group-offsets',
'deserialization.failure.policy' = 'ignore_and_log',
'properties.request.timeout.ms' = '120000',
'properties.auto.offset.reset' = 'earliest',
'format' = 'json',
'properties.bootstrap.servers' = 'kafka:9092',
'connector' = 'kafka',
'properties.transaction.timeout.ms' = '900000',
'topic' = 'openskyairport',
'properties.group.id' = 'openskyairportflrdrgrp'
)
Flink SQL Query Against Kafka Table
select icao24, callsign, firstSeen, lastSeen, estDepartureAirport, arrivalAirportCandidatesCount,
estDepartureAirportHorizDistance, estDepartureAirportVertDistance, estArrivalAirportHorizDistance,
estArrivalAirportVertDistance, departureAirportCandidatesCount
from openskyairport
This is an example query. We can also add time windows, aggregates (max/min/average/sum), joins and more, and we can set up upsert tables to insert results into Kafka topics (or into JDBC tables).
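To make the idea of a time-windowed aggregate concrete without a running Flink cluster, here is a small Python sketch of what a tumbling-window count per departure airport computes. This is plain Python standing in for a Flink SQL aggregate, not Flink itself. It windows on firstSeen for simplicity, whereas the table above declares its watermark on eventTimeStamp; the window size and the function name are illustrative assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 3600  # one-hour tumbling windows (an arbitrary choice)


def departures_per_window(records, window_seconds=WINDOW_SECONDS):
    """Count records per (departure airport, window start) pair.

    Each record's firstSeen (UNIX seconds) is floored to the start of the
    fixed-size, non-overlapping window that contains it.
    """
    counts = Counter()
    for record in records:
        window_start = (record["firstSeen"] // window_seconds) * window_seconds
        counts[(record["estDepartureAirport"], window_start)] += 1
    return dict(counts)
```

Each key in the result is one row of the aggregate: an airport, the start of a window, and the number of flights first seen in that window.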
SQL Stream Builder (Apache Flink SQL / PostgreSQL) Materialized View in HTML/JSON
[{"icao24":"c060b9","callsign":"POE2136","firstSeen":"1689193028",
"lastSeen":"1689197805","estDepartureAirport":"KEWR",
"arrivalAirportCandidatesCount":"3","estDepartureAirportHorizDistance":"357",
"estDepartureAirportVertDistance":"24","estArrivalAirportHorizDistance":"591",
"estArrivalAirportVertDistance":"14","departureAirportCandidatesCount":"1"},{"icao24":"a9b85b","callsign":"RPA3462","firstSeen":"1689192822","lastSeen":"1689196463","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"6","estDepartureAirportHorizDistance":"788","estDepartureAirportVertDistance":"9","estArrivalAirportHorizDistance":"2017","estArrivalAirportVertDistance":"30","departureAirportCandidatesCount":"1"},{"icao24":"a4b205","callsign":"N401TD","firstSeen":"1689192818","lastSeen":"1689198430","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"4","estDepartureAirportHorizDistance":"13461","estDepartureAirportVertDistance":"24","estArrivalAirportHorizDistance":"204","estArrivalAirportVertDistance":"8","departureAirportCandidatesCount":"1"},{"icao24":"a6eed5","callsign":"GJS4485","firstSeen":"1689192782","lastSeen":"1689195255","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"4","estDepartureAirportHorizDistance":"451","estDepartureAirportVertDistance":"17","estArrivalAirportHorizDistance":"1961","estArrivalAirportVertDistance":"56","departureAirportCandidatesCount":"1"},{"icao24":"a64996","callsign":"JBU1527","firstSeen":"1689192458","lastSeen":"1689200228","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"5","estDepartureAirportHorizDistance":"750","estDepartureAirportVertDistance":"9","estArrivalAirportHorizDistance":"4698","estArrivalAirportVertDistance":"107","departureAirportCandidatesCount":"1"},{"icao24":"aa8548","callsign":"N777ZA","firstSeen":"1689192423","lastSeen":"1689194898","estDepartureAirport":"KEWR","arrivalAirportCandidatesCount":"4","estDepartureAirportHorizDistance":"13554","estDepartureAirportVertDistance":"55","estArrivalAirportHorizDistance":"13735","estArrivalAirportVertDistance":"32","departureAirportCandidatesCount":"1"}]
This JSON data can now be read in web pages, Jupyter notebooks, Python code, mobile phones or wherever.
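As a minimal sketch of the Python case, the snippet below parses two rows of the output above (trimmed to a few fields) and computes how long each flight was tracked. The materialized view endpoint URL is not shown in this article, so the sketch reads from an inline string instead of making an HTTP call; the function name `tracked_seconds` is hypothetical.

```python
import json

# Two rows from the materialized view output above, trimmed to a few fields.
SAMPLE = """[
 {"icao24":"c060b9","callsign":"POE2136","firstSeen":"1689193028",
  "lastSeen":"1689197805","estDepartureAirport":"KEWR"},
 {"icao24":"a9b85b","callsign":"RPA3462","firstSeen":"1689192822",
  "lastSeen":"1689196463","estDepartureAirport":"KEWR"}
]"""


def tracked_seconds(payload):
    """Map each callsign to the seconds between firstSeen and lastSeen.

    The view returns every value as a JSON string, so the timestamps are
    converted to int before subtracting.
    """
    rows = json.loads(payload)
    return {
        row["callsign"]: int(row["lastSeen"]) - int(row["firstSeen"])
        for row in rows
    }
```

The same function works on the body of an HTTP response once you have the endpoint for your own materialized view.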
Materialized View Endpoint Creation
Our Dashboard Feed from that Materialized View
Step-by-Step Building an Airport Arrivals and Departures Streaming Pipeline
- NiFi: NiFi schedules REST Calls
- NiFi: Calls Arrivals REST Endpoint, iterating over all airports in the list
- NiFi: Calls Departure REST Endpoint, iterating over all airports in the list
- NiFi: Extracts Avro Schema for JSON data
- NiFi: Updates records adding a unique ID and timestamp for each record
- NiFi: (For demos, we split record batches into single records and drip feed 1 record per second)
- NiFi: We publish records to Kafka topic: openskyairport
- Kafka: Records arrive in the topic on the cluster, in order, as JSON
- Flink SQL: Table built by inferring JSON data from Kafka topic
- SSB: Interactive SQL is launched as a Flink job on a Flink cluster in Kubernetes (K8s)
- SSB: Create a materialized view from SQL results
- SSB: Hosts materialized view as JSON REST Endpoint
- HTML/JSON Dashboard reads JSON REST Endpoint and feeds it to jQuery DataTables.
- Data: Live and available data feed published via REST Endpoint, Kafka topic, Slack channel, Discord channel, and future sinks. We will add Apache Iceberg and Apache Kudu storage. Please suggest other endpoints.
Flink SQL Consuming all the Kafka Messages
Data is consumed in mass quantities, quickly.
Video
References
- https://github.com/tspannhw/pulsar-adsb-function
- https://flightaware.com/adsb/stats/user/TimothySpann
- https://www.linkedin.com/pulse/flight-monitor-cloudera-best-flow-contest-alexis-naranjo-escalona/
- https://github.com/tspannhw/raspberry-pi-adsb
- https://github.com/tspannhw/java-adsb
- https://opensky-network.org/
- https://github.com/tspannhw/FLiP-Py-ADS-B
Data
- https://opensky-network.org/api/states/all?extended=1
- https://www.radarbox.com
- https://openskynetwork.github.io/opensky-api/rest.html
- https://openskynetwork.github.io/opensky-api/
Data Provided By OpenSky Network
The OpenSky Network, https://opensky-network.org
Matthias Schäfer, Martin Strohmeier, Vincent Lenders, Ivan Martinovic and Matthias Wilhelm.
"Bringing Up OpenSky: A Large-scale ADS-B Sensor Network for Research".
In Proceedings of the 13th IEEE/ACM International Symposium on Information Processing in Sensor Networks (IPSN), pages 83-94, April 2014.
To practice what we preach, I had ChatGPT generate a summary of my article, and I include it below as extracted output.
Title: Real-Time Airport Traffic Analysis: Unveiling Insights with NiFi, Kafka, and Flink SQL
Introduction: In today’s fast-paced world, airports serve as vital hubs for global travel and transportation. With millions of flights taking off and landing worldwide, the ability to monitor and analyze airport traffic data in real-time becomes invaluable. OpenSky Networks, a collaborative open-source initiative, provides a wealth of real-time aircraft tracking data. In this article, we explore how NiFi, Kafka, and Flink SQL can be leveraged to collect, process, and analyze airport data from OpenSky Networks, enabling us to gain valuable insights into airport traffic patterns.
- Understanding NiFi: Data Collection and Flow Management: Apache NiFi, a powerful data integration and flow management tool, acts as the foundation for our data collection pipeline. NiFi enables us to fetch real-time airport data from OpenSky Networks through their API. By utilizing NiFi’s processors, we can extract relevant information such as flight details, aircraft positions, and metadata. Additionally, NiFi provides various processors for data transformation, enrichment, and filtering, allowing us to refine the incoming data to meet our specific requirements.
- Enabling Real-Time Data Streaming with Kafka: Once the data is collected and processed by NiFi, we need a robust messaging system to enable real-time data streaming. Apache Kafka, a distributed event streaming platform, serves as an ideal choice. NiFi integrates seamlessly with Kafka, enabling us to publish the processed airport data as Kafka messages. Kafka’s fault-tolerant and scalable architecture ensures reliable and efficient data streaming, making it an excellent choice for handling high volumes of real-time airport traffic data.
- Leveraging Flink SQL for Real-Time Analysis: With the data streaming into Kafka, we can now utilize Apache Flink SQL to perform real-time analysis on the airport traffic data. Flink SQL provides a high-level SQL-like interface to express complex analytical queries over continuous data streams. By defining Flink SQL queries, we can extract valuable insights from the airport data in real-time. For example, we can monitor flight delays, track airport congestion, identify unusual flight patterns, or even detect potential security threats.
- Extracting Insights and Visualization: Once we have processed and analyzed the airport traffic data using Flink SQL, we can extract meaningful insights to aid decision-making and operational efficiency. By visualizing the results using tools like Apache Superset or Tableau, we can create intuitive dashboards and reports that provide a comprehensive overview of airport traffic patterns. These visualizations can help airport authorities, airlines, and ground services teams make informed decisions regarding flight schedules, resource allocation, and infrastructure planning.
- Enhancing Airport Operations and Safety: The utilization of NiFi, Kafka, and Flink SQL for real-time airport traffic analysis brings numerous benefits to airport operations and safety. By monitoring real-time flight data, airports can optimize gate assignments, runway utilization, and baggage handling processes, leading to reduced delays and improved passenger experiences. Furthermore, the ability to detect anomalies and security threats in real-time enhances airport safety and security protocols, ensuring a safer environment for travelers.
Conclusion: Real-time analysis of airport traffic data plays a crucial role in enhancing operational efficiency and safety in the aviation industry. By leveraging the combined power of NiFi, Kafka, and Flink SQL, airports can collect, process, and analyze data from OpenSky Networks in real-time, enabling them to make data-driven decisions and improve overall airport operations. The ability to monitor flight patterns, detect anomalies, and optimize resources ultimately leads to improved passenger experiences and safer travel environments. With advancements in data streaming and analytics technologies, the future of airport traffic analysis looks promising, revolutionizing the way we manage and optimize air travel.