Mark Zhu
bpf-istio
Published in
5 min read · Apr 8, 2022


Notice: The original article comes from https://blog.mygraphql.com/en/posts/low-tec/trace/trace-istio/trace-istio-part4/ . If the pictures are unclear, please refer to the original article.

There is a Chinese version too.

Why

I can’t believe I actually kept going long enough to write Part 4. Believe it or not, the “Why” section of each part is the hardest thing to write :) . If you’re reading this series for the first time, don’t worry: each part is relatively independent.

Most likely you have seen Envoy’s features described elsewhere like this:

  • Written in C++, native and close to the hardware, with no stop-the-world GC pauses, so performance is excellent
  • Asynchronous, event-driven, and multiplexed, a good fit for the C10k problem
  • Because a single thread is responsible for multiple connections, the memory overhead of a large number of threads and the CPU overhead of context switching are reduced when there are many connections.
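To make the "one thread, many connections" point concrete, here is a minimal sketch in Python rather than Envoy's C++. It is not Envoy code: the `serve_once` function and the use of `socket.socketpair()` as stand-in connections are assumptions made only for illustration. One thread and one event loop service 50 connections without creating a thread per connection.

```python
import selectors
import socket


def serve_once(pairs):
    """Answer one message per connection using a single thread and one event loop."""
    sel = selectors.DefaultSelector()
    for _client, server in pairs:
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)

    handled = 0
    while handled < len(pairs):
        # One select() call reports every connection that became readable.
        for key, _mask in sel.select(timeout=1.0):
            conn = key.fileobj
            data = conn.recv(4096)
            conn.sendall(data.upper())
            sel.unregister(conn)
            handled += 1
    sel.close()
    return handled


# Hypothetical workload: 50 in-process "connections", each with one pending message.
pairs = [socket.socketpair() for _ in range(50)]
for i, (client, _server) in enumerate(pairs):
    client.sendall(f"hello {i}".encode())

handled = serve_once(pairs)
replies = [client.recv(4096) for client, _server in pairs]

for client, server in pairs:
    client.close()
    server.close()
```

The loop never blocks on a single connection; it only reacts to readiness events, which is the same idea the bullet points describe.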

These descriptions are reasonable, of course. But many things that look beautiful from a distance reveal interesting and valuable details once you zoom in. I believe that if we dive deep, it is always possible to make some meaningful optimizations for our actual operating environment and traffic characteristics. It may be a change to the Envoy or kernel configuration, or a change to a single line of Envoy’s code. Or it may be a change to your app’s behavior, such as the size of the buffer used each time it writes to the socket.

All of these require an understanding of the implementation details, unless you feel lucky, or are experienced enough to guess correctly.

[BPF tracing Istio/Envoy] series

Before you begin, here is a preview of what the [BPF tracing Istio/Envoy] series includes (or will include):

In this series, I’ll show you how to use bpftrace to "read" object data in the memory of the Envoy process, which is written in C++11, at runtime. In order not to scare people away, I try to show more pictures and less code, although some diagrams are a little complicated. Uncle Programmer begins his story. 🚜

High-level process of HTTP reverse proxy

The overall process of a socket event-driven HTTP reverse proxy:

As you can see in the diagram, there are 4 types of events that drive the entire process. The next few sections analyze them one by one.

To avoid getting lost in the details of each step at once, let’s take a look at the overall flow of all the steps:

Figure: Istio/Envoy module collaboration overview

Downstream Read Request collaboration

Figure: Downstream Read-Ready collaboration

Explain the process at a high level:

  1. Downstream socket readable callback.
  2. Http::ConnectionManagerImpl reads data from the downstream socket and incrementally feeds it into Http1::ConnectionImpl.
  3. Http1::ConnectionImpl calls nghttp2 to incrementally parse the HTTP request.
  4. If nghttp2 determines that the HTTP request has been read completely, it calls Http::ServerConnection::onMessageCompleteBase().
  5. Http::ServerConnection::onMessageCompleteBase() stops listening for downstream ReadReady events.
  6. Http::ServerConnection::onMessageCompleteBase() calls Http::FilterManager to start the decodeHeaders iteration over the HTTP filter chain.
  7. In general, the last HTTP filter in the chain is Router::Filter, so Router::Filter::decodeHeaders() is called at the end.
  8. The logic of Router::Filter::decodeHeaders() is shown in the next figure.
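
The steps above can be sketched as a small Python model. This is an illustration only, not Envoy's implementation: `Http1RequestReader`, `DownstreamConnection`, `tracing_filter` and `router_filter` are hypothetical names, the parser assumes a request without a body, and the "stop"/"continue" return values are a simplification of the filter status.

```python
class Http1RequestReader:
    """Hypothetical incremental parser; assumes requests without a body."""

    def __init__(self, on_message_complete):
        self.buffer = b""
        self.complete = False
        self.on_message_complete = on_message_complete

    def feed(self, chunk):
        # Data arrives in pieces; nothing happens until the header block ends.
        self.buffer += chunk
        if self.complete or b"\r\n\r\n" not in self.buffer:
            return
        head, _, _rest = self.buffer.partition(b"\r\n\r\n")
        lines = head.decode("latin-1").split("\r\n")
        method, path, _version = lines[0].split(" ", 2)
        headers = {":method": method, ":path": path}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        self.complete = True
        self.on_message_complete(headers)


class DownstreamConnection:
    """Hypothetical glue between read-ready events and the filter chain."""

    def __init__(self, filters):
        self.read_enabled = True
        self.filters = filters
        self.headers = None
        self.reader = Http1RequestReader(self.on_message_complete)

    def on_read_ready(self, chunk):
        if self.read_enabled:
            self.reader.feed(chunk)

    def on_message_complete(self, headers):
        self.read_enabled = False          # stop listening for read-ready
        self.headers = headers
        for decode_headers in self.filters:  # decodeHeaders iteration
            if decode_headers(headers) == "stop":
                break


trace = []

def tracing_filter(headers):
    trace.append("tracing")
    return "continue"

def router_filter(headers):
    trace.append("router:" + headers[":path"])
    return "stop"

conn = DownstreamConnection([tracing_filter, router_filter])
conn.on_read_ready(b"GET /a HTTP/1.1\r\nHo")   # first read: request incomplete
partial_trace = list(trace)
conn.on_read_ready(b"st: example\r\n\r\n")      # second read completes it
```

The filter chain only runs after the second read, once the request is complete, and the router filter runs last.
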
Figure: Downstream Request Router collaboration

Explain the process:

  1. Router::Filter::decodeHeaders() is called.
  2. Select a cluster according to the configured Router rules.
  3. If the Cluster connection pool object does not exist, create a new one.
  4. Create a new Envoy::Router::UpstreamRequest object.
  5. Call Envoy::Router::UpstreamRequest::encodeHeaders(bool end_stream) to encode HTTP header.
  6. After a series of load balancing algorithms, match to the upstream host (endpoint).
  7. If the connection pool has no available connection to the selected upstream host, then:
  8. Open a new socket fd (not yet connected).
  9. Register the WriteReady / Connected event for the upstream socket fd, so that the upstream request can be written when the event callback fires.
  10. Initiate an asynchronous connection to the upstream host with the socket fd.
  11. Associate the downstream and upstream fds.
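
The routing steps above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Envoy's code: `ConnPool`, `Router`, `fake_connect`, prefix-based route matching and round-robin host selection are all chosen here for illustration, and the host addresses are made up.

```python
import itertools


class ConnPool:
    """Hypothetical per-cluster pool with round-robin host selection."""

    def __init__(self, hosts):
        self._next_host = itertools.cycle(hosts)
        self.idle = {host: [] for host in hosts}

    def pick_host(self):
        return next(self._next_host)

    def release(self, host, conn):
        self.idle[host].append(conn)


class Router:
    def __init__(self, routes, cluster_hosts, connect):
        self.routes = routes                # list of (path prefix, cluster name)
        self.cluster_hosts = cluster_hosts  # cluster name -> list of hosts
        self.connect = connect              # callable that opens a new connection
        self.pools = {}                     # cluster name -> ConnPool, created lazily

    def decode_headers(self, headers):
        path = headers[":path"]
        # Select a cluster according to the route rules.
        cluster = next(name for prefix, name in self.routes if path.startswith(prefix))
        # Create the cluster's pool only if it does not exist yet.
        pool = self.pools.get(cluster)
        if pool is None:
            pool = self.pools[cluster] = ConnPool(self.cluster_hosts[cluster])
        # Load balance to an upstream host, then reuse or open a connection.
        host = pool.pick_host()
        if pool.idle[host]:
            return host, pool.idle[host].pop(), False
        return host, self.connect(host), True


counter = itertools.count(1)

def fake_connect(host):
    # Stand-in for "open a socket and start an asynchronous connect".
    return f"conn-{next(counter)}"

router = Router(
    routes=[("/api", "api"), ("/", "web")],
    cluster_hosts={"api": ["10.0.0.1:80", "10.0.0.2:80"], "web": ["10.0.1.1:80"]},
    connect=fake_connect,
)
h1, c1, new1 = router.decode_headers({":path": "/api/x"})
router.pools["api"].release(h1, c1)   # request finished, connection goes back to the pool
h2, c2, new2 = router.decode_headers({":path": "/api/y"})
h3, c3, new3 = router.decode_headers({":path": "/api/z"})
```

The third request lands on the first host again and reuses the pooled connection, so no new socket is opened for it.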

Upstream Write Request Collaboration

Figure: Upstream Write Request Collaboration

Explain the process:

  1. The upstream socket write-ready callback fires for the first time.
  2. The callback detects that the event means the connection succeeded, and associates the upstream socket with ConnectionPool::ActiveClient.
  3. The upstream socket write-ready callback fires a second time.
  4. The callback detects that the event means the connection is writable, and writes the upstream HTTP request.
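
The two write-ready callbacks can be reproduced with a non-blocking socket in Python. This is a minimal sketch, not Envoy's code: `send_request_when_ready` is a hypothetical helper, the loopback listener stands in for the upstream host, and Python's selector is level-triggered, so the second write-ready event arrives immediately after the first.

```python
import selectors
import socket


def send_request_when_ready(addr, request):
    """Connect asynchronously, then write the request on the next write-ready event."""
    sel = selectors.DefaultSelector()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    sock.connect_ex(addr)                    # returns at once, usually "in progress"
    sel.register(sock, selectors.EVENT_WRITE)

    events = []
    connected = False
    while True:
        if not sel.select(timeout=2.0):
            raise TimeoutError("no write-ready event")
        if not connected:
            # First write-ready: the connect finished; check whether it succeeded.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, "connect failed")
            connected = True
            events.append("connected")
        else:
            # Second write-ready: the socket is writable, send the request.
            sock.sendall(request)
            events.append("request written")
            break
    sel.unregister(sock)
    sel.close()
    return sock, events


# Hypothetical upstream: a listener on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(2.0)

upstream, events = send_request_when_ready(
    server.getsockname(), b"GET / HTTP/1.1\r\nHost: upstream\r\n\r\n"
)
peer, _addr = server.accept()
peer.settimeout(2.0)
received = peer.recv(4096)

for s in (upstream, peer, server):
    s.close()
```

Checking `SO_ERROR` on the first event is what distinguishes "connected" from "connect failed"; only after that does a write-ready event mean "you may send data".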

Upstream Read Response Collaboration

Figure: Upstream Read-Response Collaboration

Downstream Write Response Collaboration

Figure: Downstream Write Response Collaboration

bpftrace output

The figures above are based not only on the source code, but also on the output of the bpftrace script and its tracepoints. The bpftrace script works as follows:

  1. Record the downstream FD (the file descriptor of the socket), which can be thought of as the socket id within the process.
  2. Add kernel tracepoints and application uprobes. Record the input, output, and stack of each probe.
  3. Associate the downstream FD with the upstream FD.
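
The article does not spell out how the FDs are associated, so the following Python sketch is only one possible approach, under a loud assumption: the upstream `connect` happens on the same worker thread that is handling the downstream read-ready callback. The event records, the probe names `accept`, `read`, `connect`, `close`, and the `associate_fds` function are all hypothetical; they stand in for post-processing of probe output, not for the bpftrace script itself.

```python
def associate_fds(events):
    """Pair each downstream fd with the upstream fd opened while handling it."""
    downstream_fds = set()
    current_downstream = {}   # thread id -> downstream fd being handled
    pairs = {}                # downstream fd -> upstream fd

    for ev in events:
        probe, tid, fd = ev["probe"], ev["tid"], ev["fd"]
        if probe == "accept":
            # Step 1: record the downstream fd.
            downstream_fds.add(fd)
        elif probe == "read" and fd in downstream_fds:
            current_downstream[tid] = fd
        elif probe == "connect" and tid in current_downstream:
            # Step 3: the thread is still handling a downstream request.
            pairs[current_downstream[tid]] = fd
        elif probe == "close":
            # Forget closed downstream fds so reused fd numbers do not mismatch.
            downstream_fds.discard(fd)
            pairs.pop(fd, None)
            current_downstream = {
                t: d for t, d in current_downstream.items() if d != fd
            }
    return pairs


# Hypothetical probe output from two worker threads.
events = [
    {"probe": "accept", "tid": 7, "fd": 40},
    {"probe": "accept", "tid": 8, "fd": 50},
    {"probe": "read", "tid": 7, "fd": 40},
    {"probe": "read", "tid": 8, "fd": 50},
    {"probe": "connect", "tid": 8, "fd": 51},
    {"probe": "connect", "tid": 7, "fd": 41},
    {"probe": "close", "tid": 8, "fd": 50},
]
fd_pairs = associate_fds(events)
```

Keying the state by thread id keeps concurrent workers from being mixed up, and dropping entries on `close` matters because the kernel reuses fd numbers.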

Of course, there are many details, but I am not going to go through them one by one. Anyone who wants to know more can contact me to discuss. The bpftrace script in the next section gives more detail.

bpftrace script

End

This part studied the main process of Envoy as a reverse proxy from a socket event-driven perspective. I think I learned something. How about you?

This is an old photo taken a few years ago.
