Rust gRPC server with postgres PART IV
This is the last episode of the series. In this part, we will implement logging and distributed tracing using OpenTelemetry.
In the complex landscape of modern software development, where applications are often distributed across multiple services and platforms, gaining insights into their behavior and performance can be a daunting task. OpenTelemetry emerges as a powerful solution to this challenge, offering a vendor-neutral and open-source framework for collecting, processing, and exporting telemetry data such as traces or logs. By providing a standardized approach to observability, OpenTelemetry empowers developers to gain a deeper understanding of their systems, troubleshoot issues more effectively, and ultimately deliver better user experiences.
Logs and traces
- Logs: Logs serve as the chronicle of events within a system, capturing discrete messages or records at specific points in time. They provide valuable information about system activities, errors, warnings, and other noteworthy occurrences.
- Traces: Traces offer a more holistic view by tracking the end-to-end journey of a request or transaction as it traverses various components and services within a distributed system. They capture timing information, dependencies, and contextual data, enabling developers to visualize the flow of execution and pinpoint performance bottlenecks or errors.
Together, logs and traces form a powerful duo in the realm of observability, providing complementary perspectives on system behavior. Logs offer detailed insights into individual events, while traces illuminate the broader context and interrelationships between those events. By leveraging both logs and traces, developers gain a comprehensive understanding of their systems, empowering them to troubleshoot issues, optimize performance, and ensure the smooth operation of their applications.
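With the tracing crate that we will use in this part, this relationship is very concrete: spans carry the trace context, and log events are emitted inside them. A tiny illustrative example (not code from the repository):
use tracing::{info, info_span};

fn handle_request(id: i64) {
    // A span marks one unit of work and becomes part of the request's trace.
    let span = info_span!("handle_request", id);
    let _guard = span.enter();

    // Log events emitted while the span is entered are correlated with it.
    info!("looking up agenda entry");
}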
Implementation
For this implementation, we'll use the opentelemetry and tracing crates. Please note that we'll exclusively export traces to Jaeger via OTLP and logs to Loki. If you wish to export logs and traces to a different destination, such as those offered by public cloud providers like GCP or AWS, you'll need to implement that functionality yourself.
All the code is in the repository https://github.com/pestebani/tonic-server/tree/v0.0.4
Structure of the code
In this part, the file structure will be as follows:
tonic-server/
|_ src/
|_ database/
|_ postgres/
|_ mod.rs
|_ database_object.rs
|_ error.rs
|_ mod.rs
|_ model/
|_ mod.rs
|_ otel/
|_ mod.rs
|_ logger.rs
|_ tracer.rs
|_ proto/
|_ agenda/
|_ v1/
|_ agenda.proto
|_ service/
|_ mod.rs
|_ main.rs
|_ build.rs
|_ Cargo.lock
|_ Cargo.toml
Note that the files are the same as before, with the addition of the src/otel folder.
Cargo.toml file
We need to add the following dependencies:
once_cell = "1.19.0"
opentelemetry = { version = "0.24.0", features = ["metrics", "logs"] }
opentelemetry-otlp = { version="0.17.0", features = ["metrics", "logs", "grpc-tonic"] }
opentelemetry_sdk = { version = "*", features = ["async-std", "rt-tokio"] }
opentelemetry-stdout = "0.5.0"
opentelemetry-semantic-conventions = "0.16.0"
tracing = "0.1.40"
tracing-opentelemetry = "0.25.0"
tracing-loki = "0.2.5"
tracing-subscriber = { version = "0.3.18" , features = ["env-filter"]}
otel module
tracer.rs file
The tracer.rs module gives the Rust application the ability to generate and export traces that adhere to the OpenTelemetry standard. It offers the flexibility to send traces to a centralized collector (for production environments) or to simply display them on the console (for development and testing). The console option is invaluable for gaining insights into the inner workings of the system and troubleshooting any issues that may arise.
To select between the available options, we need to configure the OTEL_EXPORTER_TRACES environment variable. Setting it to "otlp" will export traces to the endpoint specified by the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable.
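The full tracer.rs is in the repository linked above; the sketch below only illustrates the idea (it assumes the opentelemetry-otlp tonic exporter builder and the opentelemetry_sdk tracer-provider builder, not the exact code from the repo):
use opentelemetry::{global, trace::TracerProvider as _, KeyValue};
use opentelemetry_otlp::WithExportConfig;
use opentelemetry_sdk::{runtime, trace as sdktrace, Resource};
use opentelemetry_semantic_conventions::resource::SERVICE_NAME;

pub fn init_tracer() -> Result<sdktrace::Tracer, Box<dyn std::error::Error>> {
    let config = sdktrace::Config::default()
        .with_resource(Resource::new(vec![KeyValue::new(SERVICE_NAME, "tonic-server")]));

    let provider = if std::env::var("OTEL_EXPORTER_TRACES").as_deref() == Ok("otlp") {
        // Export spans over gRPC (OTLP) to the configured collector endpoint.
        let endpoint = std::env::var("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
            .unwrap_or_else(|_| "http://localhost:4317".to_string());
        let exporter = opentelemetry_otlp::new_exporter()
            .tonic()
            .with_endpoint(endpoint)
            .build_span_exporter()?;
        sdktrace::TracerProvider::builder()
            .with_batch_exporter(exporter, runtime::Tokio)
            .with_config(config)
            .build()
    } else {
        // Anything else: print spans to stdout, which is handy in development.
        sdktrace::TracerProvider::builder()
            .with_simple_exporter(opentelemetry_stdout::SpanExporter::default())
            .with_config(config)
            .build()
    };

    // Keep a handle for the tracing-opentelemetry layer and register the
    // provider globally so it can be flushed on shutdown.
    let tracer = provider.tracer("tonic-server");
    global::set_tracer_provider(provider);
    Ok(tracer)
}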
logger.rs file
The logger.rs module sets up a flexible logging mechanism for the Rust application. It offers the option to send structured logs to Loki for centralized analysis or to simply print them to the console during development. The crucial addition of OpenTelemetry integration connects the logs and traces, enabling powerful observability and troubleshooting capabilities.
To choose between the different logging options, we must configure the OTEL_EXPORTER_LOGS environment variable. If it is set to "loki", logs will be exported to Loki using the endpoint specified by the OTEL_EXPORTER_OTLP_LOGS_ENDPOINT environment variable.
To control the level of logging and tracing, set the RUST_LOG environment variable. In development environments, setting it to info or debug can be helpful for gaining detailed insights into the application's behavior. In production environments, however, it's generally recommended to set it to warn or error to reduce log volume and focus on potential issues.
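The full logger.rs is likewise in the repository; a minimal sketch of the Loki branch could look like this (the init_logger name and the service label are illustrative assumptions):
use tracing_loki::url::Url; // tracing-loki re-exports the url crate

pub fn init_logger() -> Result<Option<tracing_loki::Layer>, Box<dyn std::error::Error>> {
    // Only push logs to Loki when OTEL_EXPORTER_LOGS is set to "loki";
    // otherwise the console output from tracing-subscriber is enough.
    if std::env::var("OTEL_EXPORTER_LOGS").as_deref() != Ok("loki") {
        return Ok(None);
    }

    let endpoint = std::env::var("OTEL_EXPORTER_OTLP_LOGS_ENDPOINT")
        .unwrap_or_else(|_| "http://localhost:3100".to_string());

    // build_url returns the subscriber layer plus a background task that
    // ships the buffered log batches to Loki.
    let (layer, task) = tracing_loki::builder()
        .label("service", "tonic-server")?
        .build_url(Url::parse(&endpoint)?)?;

    // The background task has to run on the Tokio runtime.
    tokio::spawn(task);

    Ok(Some(layer))
}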
otel/mod.rs file
This file acts as the orchestrator, bringing together the tracing and logging components. It handles the initialization process, ensuring both systems are correctly set up and interconnected. The shutdown function provides a clean way to wrap up tracing activities when the application finishes.
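Sketching the idea (again illustrative code, reusing the hypothetical init_tracer and init_logger helpers from the sketches above rather than the repository's exact implementation):
mod logger;
mod tracer;

use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::EnvFilter;

pub fn init_tracer_and_logger() -> Result<(), Box<dyn std::error::Error>> {
    // RUST_LOG controls the level of both logs and spans; default to "info".
    let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));

    // OTLP or stdout tracer, depending on OTEL_EXPORTER_TRACES.
    let tracer = tracer::init_tracer()?;
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    // Optional Loki layer, depending on OTEL_EXPORTER_LOGS.
    let loki_layer = logger::init_logger()?;

    tracing_subscriber::registry()
        .with(filter)
        .with(tracing_subscriber::fmt::layer())
        .with(otel_layer)
        .with(loki_layer)
        .init();

    Ok(())
}

pub fn stop_tracer_and_logger() {
    // Flush any spans that are still buffered in the batch exporter.
    opentelemetry::global::shutdown_tracer_provider();
}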
Rest of the application
main.rs file
Here we need to add the mod otel declaration, initialize the otel module before starting the server, and shut it down when the server receives a stop signal.
mod otel;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "[::]:50051".parse()?;
    let agenda_service = service::CustomAgendaService::new().await;

    // Initialize tracing and logging before the server starts accepting requests.
    let _ = init_tracer_and_logger()?;

    Server::builder()
        .add_service(AgendaServiceServer::new(agenda_service))
        .serve_with_shutdown(addr, async {
            tokio::signal::ctrl_c().await.expect("failed to install CTRL+C signal handler");
            // Flush pending telemetry and give the exporters a moment to finish.
            stop_tracer_and_logger();
            time::sleep(Duration::from_secs(1)).await;
        })
        .await?;

    Ok(())
}
src/service/mod.rs
We will create the following macro.
macro_rules! trace_and_handle_error {
// Match the block of code within the macro invocation
({ $($code:tt)* }) => {{
// Get the current span for tracing
let current_span = tracing::Span::current();
// Log the entry into the function
tracing::info!("Entered function in service module");
// Execute the provided code within an async block
let result = async {
$($code)*
}.await;
// Match on the result to handle success and error cases
match &result {
Ok(_) => {
tracing::info!("Function executed successfully");
current_span.record("success", &true);
},
Err(err) if err.code() == Code::Internal => {
tracing::error!("Error in function: {}", err.to_string());
current_span.record("error", &format!("{}", err.to_string()));
},
Err(err) => {
tracing::warn!("Error in function: {}", err.to_string());
current_span.record("warn", &format!("{}", err.to_string()));
}
}
result
}};
}
This macro wraps the body of each function, emitting an "info"-level log when the function is entered and another one after it has executed. It also adjusts the severity based on the result: a controlled error is recorded at "warn", while an uncontrolled (internal) error is recorded at "error".
Additionally, we will apply the tracing::instrument procedural macro to each function to create a corresponding span for detailed tracing. As an example, here's how the delete_agenda function from the AgendaService trait implementation would look:
#[instrument(level = "info", target = "service::delete_agenda")]
async fn delete_agenda(
&self,
request: Request<DeleteAgendaRequest>,
) -> Result<Response<DeleteAgendaResponse>, Status> {
trace_and_handle_error!({
self.database
.clone()
.get_db_handler()
.unwrap()
.delete_agenda(request.into_inner().id)
.await?;
Ok::<Response<DeleteAgendaResponse>, Status>(Response::new(DeleteAgendaResponse {}))
})
}
src/database/mod.rs
In this module, we will create a macro that can be used by any implementation of the Database trait. It works just like the trace_and_handle_error macro, but it handles DatabaseError as the error type instead of Status.
#[macro_export]
macro_rules! trace_and_handle_error_database {
// Match the block of code within the macro invocation
({ $($code:tt)* }) => {{
// Get the current span for tracing
let current_span = tracing::Span::current();
// Log the entry into the function
tracing::info!("Entered function in database module");
// Execute the provided code within an async block
let result = async {
$($code)*
}.await;
// Match on the result to handle success and error cases
match &result {
Ok(_) => {
tracing::info!("Function executed successfully");
current_span.record("success", &true);
},
Err(DatabaseError::UnknownError{ error: err }) => {
tracing::error!("Error in function: {}", err);
current_span.record("error", &format!("{}", err));
},
Err(err) => {
tracing::warn!("Error in function: {}", err);
current_span.record("warn", &format!("{}", err));
}
}
result
}};
}
src/database/postgres/mod.rs
Just as we did in the service module, we'll apply the tracing::instrument procedural macro to create spans. Additionally, we will use the trace_and_handle_error_database macro to generate logs and, if an error occurs, adjust the span level accordingly. Here's an example of how the delete_agenda function would look:
#[instrument(level = "info")]
async fn delete_agenda(self, id: i64) -> Result<(), DatabaseError> {
trace_and_handle_error_database!({
let query = "DELETE from my_table WHERE id = $1";
let deleted_elements_query: PgQueryResult = sqlx::query(query)
.bind(id)
.execute(&self.pool)
.await?;
if deleted_elements_query.rows_affected() < 1 {
Err(DatabaseError::NotFoundError {id})
} else {
Ok(())
}
})
}
docker-compose.yaml
To run the server with the implemented tracing and logging features, we need to include the following services in our setup:
- Loki: For aggregating and storing logs.
- Jaeger: For collecting and visualizing traces.
- Grafana: For creating dashboards and visualizing both logs and traces.
We will also need to set the environment variables mentioned in the logging and tracing code:
- RUST_LOG: info
- OTEL_EXPORTER_LOGS: loki
- OTEL_EXPORTER_OTLP_LOGS_ENDPOINT: "http://loki:3100"
- OTEL_EXPORTER_TRACES: otlp
- OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: "http://otel-collector:4317"
With these in place, the configuration file should look similar to this:
services:
postgres:
image: postgres:16.3-alpine3.20
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
POSTGRES_DB: postgres
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
restart: always
ports:
- "5432:5432"
otel-collector:
image: jaegertracing/all-in-one:1.58.1
ports:
- "4317:4317"
- "16686:16686"
loki:
image: grafana/loki:latest
ports:
- "3100:3100"
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
tonic-server:
build:
context: .
ports:
- "50051:50051/tcp"
depends_on:
- postgres
- otel-collector
- loki
environment:
DATABASE_TYPE: postgres
DATABASE_URL: "postgres://postgres:postgres@postgres:5432/postgres"
RUST_LOG: info
OTEL_EXPORTER_LOGS: loki
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT: "http://loki:3100"
OTEL_EXPORTER_TRACES: otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: "http://otel-collector:4317"
volumes:
postgres_data:
Running the server
As we did in the previous part, we will use the command docker compose up to run the code. Keep in mind that this might take a while because it needs to build the new image. If it starts up quickly, you might need to delete the image built in the previous part, because otherwise the latest changes are not incorporated.
# Test connection works
grpcurl -plaintext -proto src/proto/agenda/v1/agenda.proto [::1]:50051 agenda.v1.AgendaService/Ping
### OUTPUT
{
"response": "pong"
}
Now let's take a look at the Jaeger user interface at http://localhost:16686.
We can see that the ping implementation took around 39 microseconds to complete.
Let's try another endpoint.
grpcurl -plaintext -proto src/proto/agenda/v1/agenda.proto -d '{"id":1}' [::1]:50051 agenda.v1.AgendaService/GetAgenda
Now, if we analyze it in Jaeger, we see the following:
We can see that it has two spans embedded. We can also see that the request took 6.85 ms, around 200 times longer. Looking at the spans, we can see that almost all of that time was spent in the database function.
See the logs
To see the logs, we need to go to Grafana at http://localhost:3000 and configure Loki as a data source. NOTE: the user is admin and the password is also admin.
In the URL field, type http://loki:3100, then press Save & test at the bottom. Finally, in the Explore section of the menu, we can see the logs.
Conclusions
Thank you for reading! Throughout this journey, we have successfully built a gRPC server using Rust and Postgres, complete with instrumentation for observability.
If you have the time, I would greatly appreciate any feedback you could provide on these four articles. Please feel free to ask if you have any questions, and I’ll do my best to respond promptly.