Mastering NetFlow Traffic Analysis with Nfdump: A Comprehensive Guide

Yousef Alkhanafseh
Published in TurkNet Technology
Apr 25, 2023

Everything you need to know about analyzing NetFlow traffic data with Nfdump

Illustrative image that represents data traffic

HISTORY

The first version of Nfdump, version 1.1, was released in 2004. Since then it has been updated many times; the most recent version, Nfdump 1.7.2, was released in March 2023. The name Nfdump is derived from “NetFlow dump,” reflecting its original purpose of processing NetFlow data.

INTRODUCTION

This tutorial focuses on using Nfdump to collect and analyze NetFlow traffic data. Nfdump is an open-source command-line tool for processing and analyzing NetFlow data generated by network devices such as routers, switches, and firewalls. It is also used for real-time monitoring and for creating different types of network traffic reports. Nfdump is therefore an essential tool for analyzing network performance, security, and usage patterns.

NetFlow is a networking protocol developed by Cisco Systems for collecting network traffic data. A network protocol is a set of rules and conventions for communication between devices in a computer network; it defines how data is transmitted, received, and processed between devices and applications. The records (observations) inside NetFlow data are called flows, and each flow is either ingress (entering) or egress (exiting).

NFDUMP FEATURES

The features of Nfdump can be divided into five categories, namely compatibility, flexibility, reporting, filtering, and exporting:

A. Compatibility: it is compatible with a variety of NetFlow versions, including v5, v7, v9, and IPFIX. It also supports NSEL (Network Event Security Logging), JunOS NAT Event Logging, and NEL (NAT Event Logging), and it can read both IPv4 and IPv6 traffic data.

B. Flexibility: it can process large amounts of NetFlow data quickly and efficiently, making it suitable for use in high-speed networks. It can also handle different types of input data, including compressed and binary files.

C. Reporting skill: it can generate various types of reports such as summary reports, traffic volume reports, and protocol distribution reports. Graphical reports, such as pie charts and bar charts that help visualize the data, are typically produced by a separate front end such as NfSen rather than by the command-line tool itself.

D. Filtering skill: it supports flexible filtering based on different criteria, which makes it easy to extract information from large volumes of NetFlow data.

E. Exporting skill: it can export NetFlow data in different formats, such as CSV and JSON.

To use Nfdump, it must first be installed on Linux or another Unix-like environment, such as Windows Subsystem for Linux (WSL); see the Nfdump installation instructions. After installation, the Nfdump version and the list of available tags can be printed with the nfdump -V and nfdump -h commands, respectively. It can also be installed as a Python module, pynfdump. Once installed, Nfdump is run from the command line with various options and arguments.

DATA FILES

The data files used in this tutorial are available on GitHub. Sensitive features in these files, such as source and destination IPs and ports, have been anonymized using the Nfanon software. The data consists of four files captured from three different routers that belong to TurkNet. Each file contains nearly 100,000 flows. A sample of 10 flows from the data set is shown in Figure 1.

Sample of NetFlow data printed in the Nfdump default output format. It consists of the following columns: Date first seen, Duration, Event, XEvent, Proto, Src IP Addr:Port, Dst IP Addr:Port, X-Src IP Addr:Port, X-Dst IP Addr:Port, In Byte, Out Byte, and Flow. Some summary statistics are shown at the bottom of the image.
Figure 1. Sample of NetFlow Data with Default Output

PROCESSING NETFLOW TRAFFIC DATA WITH NFDUMP

This section explains how to use Nfdump by examining its most important tags. It is divided into three main parts: reading, filtering & aggregating, and exporting NetFlow data.

1. Reading NetFlow data

There are three tags for reading NetFlow data with Nfdump: -r, -R, and -M. They are explained as follows:

* Reading a single NetFlow file (-r):

A single file of NetFlow data can be read using the following command:

nfdump -r <filename>

> nfdump -r router_1/nfcapd.202304172010

* Reading all NetFlow files found inside one directory (-R)

Multiple files of NetFlow data found inside one directory can be read using the following command:

nfdump -R <dirname>

> nfdump -R router_1

* Reading all/specific NetFlow files found inside different directories (-M)

NetFlow files located in different directories are read with the -M tag, which takes the directory names separated by colons (:). To read all files in those directories, follow it with the -R tag and a dot (.). To read multiple specific files instead, replace the dot with the file names separated by a colon. Lastly, if only one file is needed, use the -M tag followed by -r and <filename>.

  • All NetFlow files found inside different directories

nfdump -M <dirname1>:<dirname2>:<dirname3> -R .

> nfdump -M router_1:router_2 -R .

  • A specific NetFlow file found inside different directories

nfdump -M <dirname1>:<dirname2>:<dirname3> -r <filename>

> nfdump -M router_1:router_3 -r nfcapd.202304172020
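When these reading modes are driven from a script, it can help to assemble the arguments programmatically. The following is a minimal Python sketch under that assumption; the helper name read_args is hypothetical and not part of Nfdump, and it only builds the -r, -R, and -M arguments described above without running anything.

```python
from typing import List, Optional, Sequence


def read_args(dirs: Sequence[str] = (), filename: Optional[str] = None) -> List[str]:
    """Build the nfdump input arguments for the reading modes described above.

    Hypothetical helper: it only assembles arguments, it does not run nfdump.
    """
    if not dirs:
        if filename is None:
            raise ValueError("need a file name, a directory, or both")
        return ["-r", filename]                  # a single file
    if len(dirs) == 1 and filename is None:
        return ["-R", dirs[0]]                   # every file in one directory
    source = ["-M", ":".join(dirs)]              # colon-separated directory list
    # "-R ." reads all files in each directory; "-r <file>" reads one named file
    return source + (["-R", "."] if filename is None else ["-r", filename])


print(["nfdump"] + read_args(dirs=["router_1", "router_2"]))
```

Returning a list rather than a single string keeps each argument separate, which is convenient if the list is later passed to a process runner.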

2. Filtering & Aggregating NetFlow data

This part is intended to be completed using only one NetFlow data file, which is router_1/nfcapd.202304172010. However, the filtering & aggregating tags that will be explained can also be applied to multiple files by reading them using either the -R or -M tags, as mentioned earlier.

* Returning a specific number of flows (-c)

This tag prints only the first <num of flows> flows.

nfdump -r <filename> -c <num of flows>

> nfdump -r router_1/nfcapd.202304172010 -c 10

* Filtering flows based on columns

Flows can be filtered on several columns, which are listed in the Nfdump documentation. A column name is followed by a greater-than sign (>) to match larger values, a less-than sign (<) to match smaller values, or just a space to match equal values. Two or more conditions can be combined using the “and” or “or” operators.

nfdump -r <filename> '<col name> <num/type>'

> nfdump -r router_1/nfcapd.202304172010 'Packets > 0 and Bytes > 0'
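Filter strings like the one above can also be composed in code. The sketch below is a hypothetical helper (build_filter is not an Nfdump function) that joins column conditions with “and” or “or”, following the comparison rules described in this part; it only produces the string and does not validate column names.

```python
from typing import Iterable, Tuple


def build_filter(conditions: Iterable[Tuple[str, str, int]], joiner: str = "and") -> str:
    """Join (column, comparator, value) triples into one filter string.

    Hypothetical helper; an empty comparator means "equal", as described above.
    """
    if joiner not in ("and", "or"):
        raise ValueError("joiner must be 'and' or 'or'")
    parts = []
    for column, comparator, value in conditions:
        if comparator not in (">", "<", ""):
            raise ValueError(f"unsupported comparator: {comparator!r}")
        # Collapse the double space that an empty comparator would leave behind
        parts.append(" ".join(f"{column} {comparator} {value}".split()))
    return f" {joiner} ".join(parts)


print(build_filter([("Packets", ">", 0), ("Bytes", ">", 0)]))
```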

* Returning specific features/columns (-o)

The default output format of Nfdump may not always be sufficient, and additional columns may be needed. There are two ways to specify the output format: using one of Nfdump's predefined formats or defining a user-based format. Note that the comma (,) between column tags can be replaced with any separator character. The complete list of column tags and their explanations can be found in the Nfdump documentation.

  • Default Nfdump output formats

The predefined Nfdump output formats and their explanations are listed in the Nfdump documentation.

nfdump -r <filename> -o <format>

> nfdump -r router_1/nfcapd.202304172010 -o csv

  • User-based output format

nfdump -r <filename> -o 'fmt:<col1 tag>,<col2 tag>,<col3 tag>'

> nfdump -r router_1/nfcapd.202304172010 -o 'fmt:%ts|%td|%pr|%sa|%da|%sp|%dp|%ibyt|%obyt'
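Output produced with a user-based format and a fixed separator is straightforward to post-process. As a rough sketch, the Python snippet below splits one “|”-separated line into named fields. The column names mirror the format tags used above, the function name parse_line is hypothetical, and the sample line is made up for illustration rather than taken from the tutorial data.

```python
from typing import Dict

# Column names chosen here to mirror the fmt tags in the example above
COLUMNS = ["ts", "td", "pr", "sa", "da", "sp", "dp", "ibyt", "obyt"]


def parse_line(line: str) -> Dict[str, str]:
    """Split one '|'-separated output line into a dict keyed by column tag."""
    values = [field.strip() for field in line.split("|")]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


# Made-up sample line for illustration only, not real captured traffic
sample = "2023-04-17 20:09:01.000|    0.120|TCP  |   10.0.0.1|   10.0.0.2|   443| 51514|    1200|       0"
flow = parse_line(sample)
print(flow["sa"], "->", flow["da"], flow["ibyt"], "bytes")
```

Values are kept as strings here; converting them to numbers is easier when the output contains plain numbers rather than scaled values.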

* Filtering flows based on timestamp range (-t)

This option returns only the flows that fall within the given time range. The start and end timestamps must be in the format YYYY/MM/dd.hh:mm:ss. If any trailing field is omitted, nfdump replaces it with zeros. For example, 2023/04/17.12 is the same as 2023/04/17.12:00:00.

nfdump -r <filename> -t <start time>-<end time>

> nfdump -r router_1/nfcapd.202304172010 -t 2023/04/17.20:09:00-2023/04/17.20:10:00
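The time range string is easy to mistype by hand, so it can be generated from datetime objects instead. This is a small Python sketch under that assumption; the helper name time_window is hypothetical, and it simply formats both ends as YYYY/MM/dd.hh:mm:ss joined by a plain hyphen.

```python
from datetime import datetime

# strftime pattern matching the YYYY/MM/dd.hh:mm:ss layout described above
NFDUMP_TIME_FORMAT = "%Y/%m/%d.%H:%M:%S"


def time_window(start: datetime, end: datetime) -> str:
    """Format a start/end pair as the argument expected after -t."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return f"{start.strftime(NFDUMP_TIME_FORMAT)}-{end.strftime(NFDUMP_TIME_FORMAT)}"


window = time_window(datetime(2023, 4, 17, 20, 9), datetime(2023, 4, 17, 20, 10))
print(window)  # 2023/04/17.20:09:00-2023/04/17.20:10:00
```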

* Filtering flows with IPv4 or IPv6

Sometimes only flows with IP version 4 or IP version 6 are required. To filter flows on this condition, the following commands can be used:

  • IPv4

To obtain only flows that contain IPs with version 4, use the following command:

nfdump -r <filename> 'ipv4'

> nfdump -r router_1/nfcapd.202304172010 'ipv4'

  • IPv6

To obtain only flows that contain IPs with version 6, use the following command:

nfdump -r <filename> 'inet6'

> nfdump -r router_1/nfcapd.202304172010 'inet6'

* Removing the header and statistics from the output (-q)

In general, Nfdump output starts with a header that displays the column names and ends with a tail that displays statistics about the NetFlow data. When this information is not wanted, it can be suppressed using the following command.

nfdump -r <filename> -q

> nfdump -r router_1/nfcapd.202304172010 -q

* Replacing string columns with their reference numbers (-N)

The -N tag replaces the strings found in some columns with plain numbers: network protocol names are replaced with their reference numbers, and values in the Bytes columns that are scaled with KB, MB, or GB suffixes are printed as unscaled numbers.

nfdump -r <filename> -N

> nfdump -r router_1/nfcapd.202304172010 -N

* Aggregating flows (Unidirectional)

Two tags in nfdump perform unidirectional flow aggregation: -a and -A. The -a tag aggregates flows using Nfdump’s default list of columns, which are protocol, srcip, dstip, srcport, and dstport. The -A tag, on the other hand, lets users define their own aggregation based on selected columns. The columns that can be used with the -A tag are listed in the Nfdump documentation. Note that the column names must be comma-separated.

  • Default aggregation

nfdump -r <filename> -a

> nfdump -r router_1/nfcapd.202304172010 -a

  • User-based aggregation

nfdump -r <filename> -A <col1>,<col2>,<col3>

> nfdump -r router_1/nfcapd.202304172010 -A proto,srcip,dstip
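To make the idea of aggregation concrete, the following Python sketch mimics it on a few in-memory records: flows that share the same values for the chosen columns are merged and their counters are summed. This is only a conceptual illustration, not how Nfdump is implemented; the function name aggregate, the record layout, and the sample flows are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Flow = Dict[str, object]


def aggregate(flows: Iterable[Flow], keys: Tuple[str, ...]) -> List[Flow]:
    """Merge flows that share the same values for `keys`, summing counters."""
    totals: Dict[tuple, Dict[str, int]] = defaultdict(
        lambda: {"packets": 0, "bytes": 0, "flows": 0}
    )
    for flow in flows:
        bucket = totals[tuple(flow[k] for k in keys)]
        bucket["packets"] += int(flow["packets"])
        bucket["bytes"] += int(flow["bytes"])
        bucket["flows"] += 1  # number of original flows merged into this row
    return [dict(zip(keys, key), **counters) for key, counters in totals.items()]


# Made-up flows for illustration only
flows = [
    {"proto": "TCP", "srcip": "10.0.0.1", "dstip": "10.0.0.2", "packets": 3, "bytes": 180},
    {"proto": "TCP", "srcip": "10.0.0.1", "dstip": "10.0.0.2", "packets": 2, "bytes": 120},
    {"proto": "UDP", "srcip": "10.0.0.3", "dstip": "10.0.0.2", "packets": 1, "bytes": 76},
]
result = aggregate(flows, ("proto", "srcip", "dstip"))
for row in result:
    print(row)
```

Here the two TCP flows collapse into one row with 5 packets and 300 bytes, while the UDP flow stays on its own.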

* Flow statistics (-s)

By default, the -s tag returns the top 10 statistics for the specified field; the fields that can be used with -s are listed in the Nfdump documentation. To change the number of entries returned, add -n <num> next to -s. In addition, the -l and -L tags can be used along with -s to keep only flows whose packet and byte amounts, respectively, are above (+) or below (-) a given value.

nfdump -r <filename> -s <field> -n <num> -l <- or +><num> -L <- or +><num>

> nfdump -r router_1/nfcapd.202304172010 -s srcip -n 30 -l +40 -L +20
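Conceptually, a top-N statistic on a field resembles counting how many flows each value appears in and keeping the largest counts. The Python sketch below illustrates that idea for source IPs. It assumes ranking by flow count, does not model the -l and -L limits, and uses a hypothetical function name (top_talkers) with made-up addresses.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(src_ips: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n source IPs that appear in the most flows."""
    return Counter(src_ips).most_common(n)


# Made-up source addresses, one entry per flow
seen = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"]
top = top_talkers(seen, n=2)
print(top)
```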

* Ordering/sorting flows (-O)

To sort flows by a given field, the following command can be used. The fields that can be used with -O are listed in the Nfdump documentation. If no field is specified, the flow field is used by default.

nfdump -r <filename> -O <field>

> nfdump -r router_1/nfcapd.202304172010 -O tstart

3. Exporting NetFlow data

NetFlow data can be exported to a new file that is either readable or unreadable by Nfdump.

* Extracting flows to another binary, Nfdump-readable file (-w)

To extract Netflow data to another binary file that is readable by Nfdump, the following command can be used.

nfdump -r <filename> -w <output_filename>

> nfdump -r router_1/nfcapd.202304172010 -w mynewfile.202304171300

* Extracting flows to another user-specified, Nfdump-unreadable file (>)

To extract Netflow data to another file that is unreadable by Nfdump, the following command can be used.

nfdump -r <filename> > <output_filename>

> nfdump -r router_1/nfcapd.202304172010 > mynewfile.csv

Example

The example shown in Figure 2 combines almost all of the previously explained tags. It reads all NetFlow files found inside the router_1 directory, removes the header and tail, converts strings to plain numbers, sets the output format to include the <start time, duration, network protocol, source IP, destination IP, source port, destination port, input bytes, output bytes> columns, keeps only flows between 2023/04/17.20:09:00 and 2023/04/17.20:10:00, sorts flows by start time, aggregates flows that have the same <network protocol, source IP, destination IP, source port, destination port, router>, keeps only flows that have more than zero packets and use IP version 4, and finally writes the result to the output_data.csv file.

> nfdump -R router_1 -q -N -o 'fmt:%ts|%td|%pr|%sa|%da|%sp|%dp|%ibyt|%obyt' -t 2023/04/17.20:09:00-2023/04/17.20:10:00 -O tstart -A proto,srcip,dstip,srcport,dstport,router 'Packets >0 and ipv4' > output_data.csv
Figure 2. Nfdump Example
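The same pipeline can be launched from a script. The following Python sketch assembles the command from Figure 2 as an argument list and, only if an nfdump binary is found on the PATH, runs it with standard output redirected to a file. The function names build_command and run_to_file are hypothetical, and the sketch assumes the router_1 directory from this tutorial is available.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(directory: str, start: str, end: str) -> List[str]:
    """Assemble the Figure 2 command as a list of arguments."""
    return [
        "nfdump",
        "-R", directory,                    # read every file in the directory
        "-q", "-N",                         # no header/tail, plain numbers
        "-o", "fmt:%ts|%td|%pr|%sa|%da|%sp|%dp|%ibyt|%obyt",
        "-t", f"{start}-{end}",             # time range
        "-O", "tstart",                     # sort by start time
        "-A", "proto,srcip,dstip,srcport,dstport,router",
        "Packets >0 and ipv4",              # filter expression
    ]


def run_to_file(command: List[str], output_path: str) -> Optional[int]:
    """Run the command, writing stdout to output_path.

    Returns the exit code, or None when the nfdump binary is not installed.
    """
    if shutil.which(command[0]) is None:
        return None
    with open(output_path, "w") as handle:
        return subprocess.run(command, stdout=handle, check=False).returncode


command = build_command("router_1", "2023/04/17.20:09:00", "2023/04/17.20:10:00")
print(" ".join(command))
# To execute it: run_to_file(command, "output_data.csv")
```

Passing the arguments as a list avoids shell quoting issues with the format string and the filter expression, which is why no quotes appear around them here.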

CONCLUSION

In conclusion, Nfdump is a powerful tool that can be used efficiently to analyze network traffic data captured using NetFlow. With Nfdump, NetFlow data can be read, filtered, aggregated, and exported, which in turn yields significant insights into the data. As a result, several benefits related to network performance, security, and usage patterns can be obtained. Principally, these are monitoring network performance metrics, such as bandwidth usage and packet loss; detecting security threats, such as malware infections and Denial-of-Service (DoS) attacks; and analyzing user behavior and application usage. Consequently, well-informed network optimization and capacity planning decisions can be made. This tutorial has covered the history, benefits, and, above all, the usage of Nfdump.
