Rust gRPC server with postgres PART IV

This is the last episode of the series. In this part we will implement logging and distributed tracing using OpenTelemetry.

In the complex landscape of modern software development, where applications are often distributed across multiple services and platforms, gaining insights into their behavior and performance can be a daunting task. OpenTelemetry emerges as a powerful solution to this challenge, offering a vendor-neutral and open-source framework for collecting, processing, and exporting telemetry data such as traces or logs. By providing a standardized approach to observability, OpenTelemetry empowers developers to gain a deeper understanding of their systems, troubleshoot issues more effectively, and ultimately deliver better user experiences.

Logs and traces

  • Logs: Logs serve as the chronicle of events within a system, capturing discrete messages or records at specific points in time. They provide valuable information about system activities, errors, warnings, and other noteworthy occurrences.
  • Traces: Traces offer a more holistic view by tracking the end-to-end journey of a request or transaction as it traverses various components and services within a distributed system. They capture timing information, dependencies, and contextual data, enabling developers to visualize the flow of execution and pinpoint performance bottlenecks or errors.

Together, logs and traces form a powerful duo in the realm of observability, providing complementary perspectives on system behavior. Logs offer detailed insights into individual events, while traces illuminate the broader context and interrelationships between those events. By leveraging both logs and traces, developers gain a comprehensive understanding of their systems, empowering them to troubleshoot issues, optimize performance, and ensure the smooth operation of their applications.

Implementation

For this implementation, we’ll use the opentelemetry and tracing crates. Please note that we’ll only export traces to Jaeger via OTLP and logs to Loki. If you wish to export logs and traces to a different destination, such as the managed services offered by public cloud providers like GCP or AWS, you’ll need to implement that functionality yourself.

All the code for this part is available in the repository: https://github.com/pestebani/tonic-server/tree/v0.0.4

Structure of the code

In this part, the file structure will be as follows:

tonic-server/
|_ src/
|  |_ database/
|  |  |_ postgres/
|  |  |  |_ mod.rs
|  |  |_ database_object.rs
|  |  |_ error.rs
|  |  |_ mod.rs
|  |_ model/
|  |  |_ mod.rs
|  |_ otel/
|  |  |_ mod.rs
|  |  |_ logger.rs
|  |  |_ tracer.rs
|  |_ proto/
|  |  |_ agenda/
|  |  |  |_ v1/
|  |  |  |  |_ agenda.proto
|  |_ service/
|  |  |_ mod.rs
|  |_ main.rs
|_ build.rs
|_ Cargo.lock
|_ Cargo.toml

Note that the files are the same as in the previous part, with the addition of the src/otel folder.

Cargo.toml file

We need to add the following dependencies:

once_cell = "1.19.0"
opentelemetry = { version = "0.24.0", features = ["metrics", "logs"] }
opentelemetry-otlp = { version="0.17.0", features = ["metrics", "logs", "grpc-tonic"] }
opentelemetry_sdk = { version = "*", features = ["async-std", "rt-tokio"] }
opentelemetry-stdout = "0.5.0"
opentelemetry-semantic-conventions = "0.16.0"
tracing = "0.1.40"
tracing-opentelemetry = "0.25.0"
tracing-loki = "0.2.5"
tracing-subscriber = { version = "0.3.18" , features = ["env-filter"]}

otel module

tracer.rs file

The tracer.rs module gives the Rust application the ability to generate and export traces that adhere to the OpenTelemetry standard. It offers the flexibility to choose between sending traces to a centralized collector (for production environments) or simply printing them to the console (for development and testing). Either way, tracing is invaluable for gaining insight into the inner workings of the system and for troubleshooting any issues that arise.

To select between the available options, we need to configure the OTEL_EXPORTER_TRACES environment variable. Setting it to "otlp" will export traces to the endpoint specified by the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable.
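To make this concrete, here is a minimal sketch of what such an initializer can look like. It is not the repository’s exact code: the function name is illustrative, error handling is simplified, and it assumes the opentelemetry-otlp 0.17 pipeline API, where install_batch returns a TracerProvider that we then register globally ourselves.

// src/otel/tracer.rs — a simplified sketch, not the repository's exact code.
use opentelemetry::global;
use opentelemetry::trace::TracerProvider as _; // brings `tracer()` into scope
use opentelemetry_otlp::WithExportConfig;      // brings `with_endpoint()` into scope
use opentelemetry_sdk::{runtime, trace as sdktrace};

pub fn init_tracer() -> Result<sdktrace::Tracer, Box<dyn std::error::Error>> {
    let provider = if std::env::var("OTEL_EXPORTER_TRACES").as_deref() == Ok("otlp") {
        // Production path: batch-export spans over OTLP/gRPC to the collector.
        let endpoint = std::env::var("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")?;
        opentelemetry_otlp::new_pipeline()
            .tracing()
            .with_exporter(opentelemetry_otlp::new_exporter().tonic().with_endpoint(endpoint))
            .install_batch(runtime::Tokio)?
    } else {
        // Development path: print every finished span to stdout.
        sdktrace::TracerProvider::builder()
            .with_simple_exporter(opentelemetry_stdout::SpanExporter::default())
            .build()
    };

    // Register the provider globally so the shutdown call in otel/mod.rs flushes it.
    global::set_tracer_provider(provider.clone());
    Ok(provider.tracer("tonic-server"))
}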

logger.rs file

The logger.rs module sets up a flexible logging mechanism for the Rust application. It offers the option to send structured logs to Loki for centralized analysis or simply print them to the console during development. The crucial addition of OpenTelemetry integration connects the logs and traces, enabling powerful observability and troubleshooting capabilities.

To choose between the different logging options, we must configure the OTEL_EXPORTER_LOGS environment variable. If it is set to "loki", logs will be exported to Loki using the endpoint specified by the OTEL_EXPORTER_OTLP_LOGS_ENDPOINT environment variable.

To control the level of logging and tracing, you should set the RUST_LOG environment variable. In development environments, setting it to info or debug can be helpful for gaining detailed insights into the application's behavior. However, in production environments, it's generally recommended to set it to warn or error to reduce log volume and focus on potential issues.
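As an illustration, here is a minimal sketch of how the Loki layer can be built with the tracing-loki crate. It is not the repository’s code: the function name and the service_name label are made up for the example, and it assumes the url crate is available for parsing the endpoint. The returned layer buffers log events, while the accompanying background task performs the actual HTTP pushes to Loki.

// src/otel/logger.rs — a simplified sketch, not the repository's exact code.
pub fn init_loki_layer()
    -> Result<Option<(tracing_loki::Layer, tracing_loki::BackgroundTask)>, Box<dyn std::error::Error>>
{
    // Only ship logs to Loki when it is explicitly requested.
    if std::env::var("OTEL_EXPORTER_LOGS").as_deref() != Ok("loki") {
        return Ok(None);
    }

    let endpoint = std::env::var("OTEL_EXPORTER_OTLP_LOGS_ENDPOINT")?;
    let (layer, task) = tracing_loki::builder()
        // Labels become queryable in Grafana, e.g. {service_name="tonic-server"}.
        .label("service_name", "tonic-server")?
        .build_url(url::Url::parse(&endpoint)?)?;

    Ok(Some((layer, task)))
}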

otel/mod.rs file

This file acts as the orchestrator, bringing together the tracing and logging components. It handles the initialization process, ensuring both systems are correctly set up and interconnected. The shutdown function provides a clean way to wrap up tracing activities when the application finishes.
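Putting the two pieces together, and reusing the hypothetical init_tracer and init_loki_layer helpers sketched above, the wiring in otel/mod.rs can look roughly like this; the repository version differs in naming and error handling.

// src/otel/mod.rs — a simplified sketch of the wiring, not the exact code.
pub mod logger;
pub mod tracer;

use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;

pub fn init_tracer_and_logger() -> Result<(), Box<dyn std::error::Error>> {
    // Spans: exported via OTLP or printed to stdout, depending on OTEL_EXPORTER_TRACES.
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer::init_tracer()?);

    // Logs: optionally shipped to Loki. `Option<Layer>` is itself a Layer,
    // so the subscriber builds the same way whether or not Loki is enabled.
    let (loki_layer, loki_task) = match logger::init_loki_layer()? {
        Some((layer, task)) => (Some(layer), Some(task)),
        None => (None, None),
    };

    tracing_subscriber::registry()
        .with(tracing_subscriber::EnvFilter::from_default_env()) // honours RUST_LOG
        .with(tracing_subscriber::fmt::layer())                  // console output
        .with(loki_layer)
        .with(otel_layer)
        .init();

    // The Loki layer only buffers events; this background task pushes them out.
    if let Some(task) = loki_task {
        tokio::spawn(task);
    }
    Ok(())
}

pub fn stop_tracer_and_logger() {
    // Flush pending spans and shut down the exporter before the process exits.
    opentelemetry::global::shutdown_tracer_provider();
}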

Rest of the application

main.rs file

Here we need to declare the module with mod otel, initialize it before starting the server, and shut it down when the server receives a stop signal.

mod otel;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "[::]:50051".parse()?;
    let agenda_service = service::CustomAgendaService::new().await;

    // Initialize tracing and logging before the server starts accepting requests.
    let _ = init_tracer_and_logger()?;

    Server::builder()
        .add_service(AgendaServiceServer::new(agenda_service))
        .serve_with_shutdown(addr, async {
            tokio::signal::ctrl_c().await.expect("failed to install CTRL+C signal handler");
            // Flush telemetry and give the exporters a moment to finish.
            stop_tracer_and_logger();
            time::sleep(Duration::from_secs(1)).await;
        })
        .await?;

    Ok(())
}

src/service/mod.rs

We will create the following macro.

macro_rules! trace_and_handle_error {
    // Match the block of code within the macro invocation
    ({ $($code:tt)* }) => {{
        // Get the current span for tracing
        let current_span = tracing::Span::current();

        // Log the entry into the function
        tracing::info!("Entered function in service module");

        // Execute the provided code within an async block
        let result = async {
            $($code)*
        }.await;

        // Match on the result to handle success and error cases
        match &result {
            Ok(_) => {
                tracing::info!("Function executed successfully");
                current_span.record("success", &true);
            },
            Err(err) if err.code() == Code::Internal => {
                tracing::error!("Error in function: {}", err.to_string());
                current_span.record("error", &format!("{}", err.to_string()));
            },
            Err(err) => {
                tracing::warn!("Error in function: {}", err.to_string());
                current_span.record("warn", &format!("{}", err.to_string()));
            }
        }

        result
    }};
}

This macro wraps the body of each service function, emitting an “info” level log before the function’s code runs and another after it has executed. It also adjusts the span according to the outcome: a controlled error is recorded as “warn”, while an uncontrolled (internal) error is recorded as “error”.

Additionally, we will apply the tracing::instrument procedural macro to each function to create a corresponding span for detailed tracing. As an example, here’s how the delete_agenda function from the AgendaService trait implementation would look:

#[instrument(level = "info", target = "service::delete_agenda")]
async fn delete_agenda(
    &self,
    request: Request<DeleteAgendaRequest>,
) -> Result<Response<DeleteAgendaResponse>, Status> {
    trace_and_handle_error!({
        self.database
            .clone()
            .get_db_handler()
            .unwrap()
            .delete_agenda(request.into_inner().id)
            .await?;
        Ok::<Response<DeleteAgendaResponse>, Status>(Response::new(DeleteAgendaResponse {}))
    })
}

src/database/mod.rs

In this module, we will create a macro that can be used by any implementation of the Database trait. This macro functions similarly to the trace_and_handle_error macro, but it specifically handles DatabaseError as an error instead of Status.

#[macro_export]
macro_rules! trace_and_handle_error_database {
    // Match the block of code within the macro invocation
    ({ $($code:tt)* }) => {{
        // Get the current span for tracing
        let current_span = tracing::Span::current();

        // Log the entry into the function
        tracing::info!("Entered function in database module");

        // Execute the provided code within an async block
        let result = async {
            $($code)*
        }.await;

        // Match on the result to handle success and error cases
        match &result {
            Ok(_) => {
                tracing::info!("Function executed successfully");
                current_span.record("success", &true);
            },
            Err(DatabaseError::UnknownError { error: err }) => {
                tracing::error!("Error in function: {}", err);
                current_span.record("error", &format!("{}", err));
            },
            Err(err) => {
                tracing::warn!("Error in function: {}", err);
                current_span.record("warn", &format!("{}", err));
            }
        }

        result
    }};
}

src/database/postgres/mod.rs

Just as we did in the service module, we’ll employ the tracing::instrument procedural macro to create spans. Additionally, we will utilize the trace_and_handle_error_database macro to generate logs and, if an error occurs, adjust the span level accordingly. Here's an example of how the delete_agenda function would look:

#[instrument(level = "info")]
async fn delete_agenda(self, id: i64) -> Result<(), DatabaseError> {
    trace_and_handle_error_database!({
        let query = "DELETE from my_table WHERE id = $1";
        let deleted_elements_query: PgQueryResult = sqlx::query(query)
            .bind(id)
            .execute(&self.pool)
            .await?;

        if deleted_elements_query.rows_affected() < 1 {
            Err(DatabaseError::NotFoundError { id })
        } else {
            Ok(())
        }
    })
}

docker-compose.yaml

To run the server with the implemented tracing and logging features, we need to include the following services in our setup:

  • Loki: For aggregating and storing logs.
  • Jaeger: For collecting and visualizing traces.
  • Grafana: For creating dashboards and visualizing both logs and traces.

We will also need to set the environment variables mentioned in the logging and tracing code:

  • RUST_LOG: info
  • OTEL_EXPORTER_LOGS: loki
  • OTEL_EXPORTER_OTLP_LOGS_ENDPOINT: "http://loki:3100"
  • OTEL_EXPORTER_TRACES: otlp
  • OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: "http://otel-collector:4317"

With these in place, the configuration file should look similar to this:

services:
  postgres:
    image: postgres:16.3-alpine3.20
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    restart: always
    ports:
      - "5432:5432"
  otel-collector:
    image: jaegertracing/all-in-one:1.58.1
    ports:
      - "4317:4317"
      - "16686:16686"
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
  tonic-server:
    build:
      context: .
    ports:
      - "50051:50051/tcp"
    depends_on:
      - postgres
      - otel-collector
      - loki
    environment:
      DATABASE_TYPE: postgres
      DATABASE_URL: "postgres://postgres:postgres@postgres:5432/postgres"
      RUST_LOG: info
      OTEL_EXPORTER_LOGS: loki
      OTEL_EXPORTER_OTLP_LOGS_ENDPOINT: "http://loki:3100"
      OTEL_EXPORTER_TRACES: otlp
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: "http://otel-collector:4317"

volumes:
  postgres_data:

Running the server

As we did in the previous section, we will use the command docker compose up to run the code. Keep in mind that this might take a while because it needs to build the new image. If it starts up quickly, you might need to delete the image built in the last part because the latest changes are not incorporated.

# Test that the connection works
grpcurl -plaintext -proto src/proto/agenda/v1/agenda.proto [::1]:50051 agenda.v1.AgendaService/Ping

### OUTPUT
{
  "response": "pong"
}

Now let's take a look at the Jaeger user interface at http://localhost:16686.

We can see that the Ping implementation took around 39 microseconds to complete.

Let's try another endpoint.

grpcurl -plaintext -proto src/proto/agenda/v1/agenda.proto -d '{"id":1}' [::1]:50051 agenda.v1.AgendaService/GetAgenda

Now, if we analyze it in Jaeger, we see the following:

We can see that it contains two nested spans. We can also see that the request took 6.85 ms, roughly 200 times longer than the Ping call. Looking at the spans, almost all of that time was spent in the database function.

See the logs

To retrieve the logs, we need to go to Grafana at http://localhost:3000 and configure Loki as a data source. NOTE: the user is admin and the password is also admin.

In the URL field, type http://loki:3100, then press Save & test at the bottom. Finally, in the Explore section of the menu, we can see the logs.
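Alternatively, instead of clicking through the UI, Grafana can pick up the Loki data source from a provisioning file mounted into its container. This is an optional sketch rather than part of the repository setup; the file path and volume mount are assumptions, and only the type and url fields are strictly required.

# grafana/provisioning/datasources/loki.yaml (hypothetical path; mount it at
# /etc/grafana/provisioning/datasources/ inside the grafana container)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true

With the data source in place, the logs can be queried in Explore with a label selector such as {service_name="tonic-server"}, depending on the labels configured in logger.rs.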

Conclusions

Thank you for reading! Throughout this journey, we have successfully built a gRPC server using Rust and Postgres, complete with instrumentation for observability.

If you have the time, I would greatly appreciate any feedback you could provide on these four articles. Please feel free to ask if you have any questions, and I’ll do my best to respond promptly.
