
TCP-INT: Intel’s Lightweight Network Telemetry Improves Visibility and Control for TCP Workloads

Simon Wass, Staff Engineer & Jeongkeun Lee, Senior Principal Engineer

--

Today’s data centers serve challenging scale-out workloads such as microservices, AI training, and disaggregated storage. In this environment, the network must deliver high bandwidth and low latency, and it plays a critical role in the performance of those workloads. Deep visibility into network performance is necessary to identify problems and address them quickly.

We are introducing a new solution to the Linux community: TCP-INT, a lightweight in-band telemetry technology for end-to-end (e2e) visibility and closed-loop control of TCP workloads in data centers.

Telemetry for control

Intel has been a leader in providing packet-level network telemetry technologies such as In-band Network Telemetry (INT). INT is a form of packet telemetry developed to instrument and trace production data packets. The power of INT is that, unlike out-of-band techniques such as ICMP ping or separate probe connections, it experiences the same routing and load-balancing decisions, queuing, and latency across the network as the production data packets.

After developing a suite of INT technologies, we identified several key requirements for evolving INT to enable closed-loop control of network issues such as congestion and load imbalance on a timescale of tens of microseconds. First, it must be lightweight, minimizing both the byte overhead added to data packets and the real-time processing of per-packet telemetry data. Second, the fabric telemetry must be integrated with end-hosts to deliver closed-loop control within a single network round-trip. Integrating network telemetry with the host stack also enables application-aware reporting of the telemetry data, minimizing report and processing overhead in the monitoring system. Finally, it must be flexible: to optimize for different data-center designs, workloads, and network issues, the solution must be easy to adapt to different metrics while eliminating redundant information.

Key features of TCP-INT

Unlike traditional INT, which accumulates a header stack to which each switch adds more data, TCP-INT stays lightweight and constant in size regardless of the network size by aggregating key metrics (e.g., max or min) across the entire path of a given TCP flow. The max/min aggregation captures the critical congestion and resource information of the “worst” bottleneck point in the e2e path.

In this concise form, TCP-INT fits into a TCP header option and is delivered directly to the end-host TCP stack, integrating local TCP state with that of the network fabric without requiring new network protocol support and while preserving interoperability with legacy systems.

We leverage P4 and eBPF programmability to implement TCP-INT on switches and end-hosts. This programmability, coupled with TCP-INT’s lightweight design, allows the metrics collected from hosts and switches to be customized for the specific requirements of each data center. Some of the metrics the current version of TCP-INT carries in the TCP option header are listed below, with an illustrative layout sketched after the list:

  • Maximum queue depth
  • Maximum bandwidth utilization (or minimum available bandwidth)
  • Sum of switch latencies
  • Switch Hop ID of the worst bottleneck point
  • Echo of the above metrics
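
To make the option concrete, here is a minimal C sketch of the fields above, assuming a simple packed layout; the struct name, field names, widths, and ordering are illustrative assumptions only, and the authoritative wire format is documented in the p4app-TCP-INT repository.

    /* Illustrative TCP-INT option layout; not the authoritative wire format. */
    #include <stdint.h>

    struct tcp_int_opt {
        uint8_t  kind;         /* TCP option kind */
        uint8_t  len;          /* total option length in bytes */
        uint8_t  max_qdepth;   /* maximum queue depth on the path (scaled) */
        uint8_t  max_util;     /* maximum link utilization on the path (scaled) */
        uint32_t lat_sum_ns;   /* sum of per-switch latencies */
        uint8_t  hop_id;       /* hop ID of the worst bottleneck switch */
        uint8_t  echo_qdepth;  /* echo: reverse-path maximum queue depth */
        uint8_t  echo_util;    /* echo: reverse-path maximum utilization */
        /* ... echoes of the remaining metrics ... */
    } __attribute__((packed));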

TCP-INT’s kernel integration correlates and enriches this fabric telemetry with local TCP state such as the source/destination IP addresses and ports, the arrival timestamp, throughput, and TCP congestion information such as round-trip time (RTT), in-flight packets, dropped packets, and retransmissions. This provides unparalleled, real-time visibility into the end-to-end performance of the infrastructure.
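
As a rough illustration of this correlation, the sketch below pairs the fabric metrics with a few pieces of socket state in a single record; the structure and field names are assumptions for illustration, not the actual TCP-INT kernel data structures.

    /* Illustrative enriched record combining fabric telemetry with TCP state. */
    #include <stdint.h>

    struct tcp_int_record {
        uint32_t saddr, daddr;       /* source/destination IPv4 addresses */
        uint16_t sport, dport;       /* source/destination ports */
        uint64_t ts_ns;              /* arrival timestamp (nanoseconds) */
        uint32_t srtt_us;            /* smoothed RTT reported by the TCP stack */
        uint32_t packets_in_flight;  /* unacknowledged packets */
        uint32_t retrans;            /* retransmitted segments */
        uint8_t  max_qdepth;         /* fabric: worst queue depth on the path */
        uint8_t  max_util;           /* fabric: worst link utilization */
        uint32_t lat_sum_ns;         /* fabric: sum of switch latencies */
        uint8_t  bottleneck_hop;     /* fabric: hop ID of the bottleneck */
    };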

How it works

Figure 1. TCP-INT operation and building blocks

When TCP-INT is deployed on Linux hosts or in the IPU networking stack, the source host adds the TCP-INT option to the TCP headers of the application flows chosen by the user. Each switch in the path compares the incoming header metrics to its current local state and conditionally updates them if its conditions are “worse” than those of previous hops.
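
The per-switch update amounts to a handful of comparisons. The C sketch below shows the idea using the illustrative option struct from earlier; the real data plane is written in P4, and the choice of which metric selects the bottleneck hop ID is an assumption here.

    /* Sketch of the conditional per-switch update (the real data plane is P4). */
    void tcp_int_switch_update(struct tcp_int_opt *opt,
                               uint8_t local_qdepth, uint8_t local_util,
                               uint32_t local_lat_ns, uint8_t my_hop_id)
    {
        /* Keep only the "worst" values observed so far on the path. */
        if (local_qdepth > opt->max_qdepth) {
            opt->max_qdepth = local_qdepth;
            opt->hop_id = my_hop_id;   /* remember where the bottleneck is */
        }
        if (local_util > opt->max_util)
            opt->max_util = local_util;

        /* Latency is summed across hops rather than aggregated by max/min. */
        opt->lat_sum_ns += local_lat_ns;
    }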

The destination host stores the fabric telemetry with its local socket information and writes it back into the echo-reply fields of the TCP-INT header on the next outbound ACK packet. Switches update the outbound TCP-INT header fields but not the echo-reply fields, so when the source host receives the returned data, it contains fabric telemetry for both directions.
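
A minimal sketch of that echo step, again using the illustrative struct and assuming the forward-path fields of the ACK are re-initialized before switches on the reverse path update them:

    /* Destination-host sketch: echo received fabric metrics on the next ACK. */
    void tcp_int_echo(const struct tcp_int_opt *rx, struct tcp_int_opt *tx_ack)
    {
        /* Copy the forward-path results into the echo-reply fields. */
        tx_ack->echo_qdepth = rx->max_qdepth;
        tx_ack->echo_util   = rx->max_util;

        /* Reset the forward-path fields; reverse-path switches will update
         * them but leave the echo-reply fields untouched. */
        tx_ack->max_qdepth = 0;
        tx_ack->max_util   = 0;
        tx_ack->lat_sum_ns = 0;
        tx_ack->hop_id     = 0;
    }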

The source and destination host operations can be performed simultaneously in a bidirectional TCP conversation. The source host can use the information directly for closed-loop control, pass it to user-space applications, or stream it to a monitoring system for visualization and network-wide analysis.

Use cases

TCP-INT opens many opportunities to improve the performance and efficiency of the data center, including:

  1. End-to-end latency dissection: the rich combination of fabric and host telemetry allows operators to quickly pinpoint the bottleneck anywhere in the infrastructure, from the application to the kernel TCP stack, the NIC/IPU, or the switching fabric.
  2. A precise congestion signal: TCP-INT provides a precise and concise signal of congestion queuing and bottleneck-link bandwidth availability. This enables fast and decisive reaction to network congestion and available bandwidth, unlike the heuristic guesses made by today’s congestion control algorithms from coarse-grained signals such as RTT, packet drops, or 1-bit ECN (Explicit Congestion Notification); see the sketch after this list.
  3. Application-aware adaptation: with the fabric telemetry available at the host stack, applications running on the host or IPU can directly access the e2e telemetry and use it to quickly adjust application behavior via scheduling or admission control.
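
As a purely illustrative example of use case 2, the sketch below reacts to the reported bottleneck queue depth and utilization instead of waiting for drops or ECN marks; the function name, thresholds, scaling, and window-update rule are assumptions, not TCP-INT’s or any existing congestion control algorithm’s behavior.

    #include <stdint.h>

    /* Hypothetical window adjustment driven by TCP-INT fabric metrics. */
    uint32_t adjust_cwnd(uint32_t cwnd, uint8_t max_qdepth, uint8_t max_util)
    {
        const uint8_t qdepth_target = 16;  /* assumed target queue depth */
        const uint8_t util_headroom = 90;  /* assumed utilization threshold (%) */

        if (max_qdepth > qdepth_target) {
            /* Queue building at the bottleneck: back off proportionally. */
            cwnd = cwnd * qdepth_target / max_qdepth;
        } else if (max_util < util_headroom) {
            /* Bottleneck link has headroom: probe for more bandwidth. */
            cwnd += 1;
        }
        return cwnd < 2 ? 2 : cwnd;        /* keep a minimal window */
    }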

How to use it?

Visit the P4 language GitHub space at https://github.com/p4lang/p4app-TCP-INT to obtain the TCP-INT end-host code and find out how to get started with TCP-INT. There you can learn about important design details such as the current format of the TCP option header, the interactions with TSO and GRO, and how we minimize the performance impact with an adaptive INT tagging policy.

To integrate easily with existing monitoring solutions, we include a protobuf definition and an example gRPC streaming application to export fabric telemetry and key TCP stack information. One consumer of TCP-INT streaming telemetry today is Intel® Deep Insight Network Analytics Software, which provides fine-grained, end-to-end visibility to pinpoint exactly when and where network issues arise (see Figure 2).

Figure 2. Intel® Deep Insight Network Analytics Software

OCP Global Summit

Join us at the Open Compute Project (OCP) Global Summit, where we will demonstrate Intel’s TCP-INT technology providing visibility into distributed NVMe storage clusters and exporting TCP-INT telemetry to Intel® Deep Insight Network Analytics Software. Find us at the Intel booth, A12, demo kiosk #2. The summit will be held from October 18th through October 20th, 2022, at the McEnery Convention Center in San Jose, CA, USA.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
