Multi-Cloud Observability — Leveraging Fluent Bit to feed a unified view with OCI Logging Analytics

Phil Wilkins
Published in Oracle Developers
17 min read · Jan 8, 2024

Photo by Ben Krb on Unsplash

In the cloud native space, we’ve seen a lot of evolution in observability and monitoring. Not only have we seen the coalescing of handling the different signal types (Logs, Metrics, and Traces), but also the drive towards using native binaries to maximize performance and minimize overheads. While a lot of change may be associated with the CNCF (Cloud Native Computing Foundation) projects, the changes are relevant across all of IT, something that is true of more projects than is often realized.

This post will take a brief look at those changes and their impact, particularly on multi-cloud stories. A deeper look at the evolution, the rationale, and the return on investment discussions is a subject for another blog. Here, we will look at the mechanics of quickly piecing things together, showing how easy it is to achieve basic multi-cloud observability, with a focus on log handling. OCI can also handle Metrics and Traces through the Application Performance Monitoring (APM) part of the OCI Observability and Management services.

Evolving Landscape

The latest evolution in the observability space is the shift from Fluentd to Fluent Bit as a key technology. Several factors drive this:

  • The performance efficiency drivers in the cloud native space propel the adoption of technologies that run as native binaries rather than languages run with just-in-time compiling engines or interpreters. Making this shift reduces overhead and additional startup time of an interpreter or just-in-time compiler.
  • The convergence of handling metrics, traces, and logging in the form of OpenTelemetry, with a formalized and unified set of APIs supported by a range of tools to help instrument applications, has made the ability to support the OpenTelemetry specification important.

Fluent Bit has retained its ability to be pluggable and extensible, an important driver for Fluentd, with many of the same sources and targets that can be supported. This is why the major cloud vendors such as AWS, Azure, and Oracle have implemented plugins into the Fluent Bit core that support the use of their technologies.

Fluent Bit and Fluentd

Oracle’s first plugin has brought together functionality that makes sending events (logs and logged traces) to its Logging Analytics platform easy. A Fluentd adaptor has existed for a couple of years. Still, Fluent Bit’s ability to support the evolving goals described earlier and to monitor anything anywhere fits well with Oracle’s philosophy towards multi-cloud. Using Fluent Bit or Fluentd on OCI keeps solutions portable, but at the price of not getting the same efficiencies as leveraging the Logging Analytics and Application Performance Monitoring capabilities that are built in across OCI’s services.

The rest of this article will walk through a basic setup of Fluent Bit to ingest data into OCI Logging Analytics. Once you’ve got something basic that works and can see how it hangs together, it’s time to tighten up the controls. So don’t take this configuration as fully hardened and enterprise-ready. Driving data into APM using Fluent Bit is the subject of a future blog, but combining the two blogs will show you how we achieve a complete OpenTelemetry capability.

Note: Oracle has contributed to both Fluent Bit and Fluentd with plugins and other enhancements, such as work on the Fluent Operator.

A multi-cloud architecture

Fluent Bit allows you to deploy in a vast range of configurations. Given Fluent Bit’s tiny footprint and the ease with which we can establish a network of Fluent Bit nodes (which can be combined with Fluentd if needed), a concentrator network is a good model to adopt. That means how the data is collected (log files or APIs, pushed or pulled) is hidden from the downstream setup, so only the Fluent Bit node closest to an application needs to know how that application is configured (e.g., where its log files are in the file system).

Additionally, a concentrator network lets us control the points through which data egresses. This task is sometimes performed by a proxy server, except that a proxy server may not understand the data being transferred.

A hypothetical multi-cloud setup — solutions in other clouds have data collected and routed through a central concentrator deployment of Fluent Bit, which can direct the traffic into Log Analytics (and APM)

OCI Setup

The following steps will guide you through a process that is just enough for configuring Logging Analytics capabilities, a single instance central node VM, and the means to simulate source solutions. We’ve provided a list at the end of the document with references to the OCI core documentation for further reading.

OCI User group

We need to define policies granting permissions. It is possible to do this for individual users; however, it is better and safer to attribute the policy permissions to groups. So, we need to create a user group called Logging-Analytics-Admins in OCI and add ourselves to that group. We can navigate the console to do this from the Hamburger menu drop-down (top left of the console page): Identity & Security → Groups (note, from here on, all menu navigations start here unless indicated otherwise). This will display a list of existing groups and include the button Create Group. We then get a dialogue like this:

OCI Group configuration UI

Once we’ve provided a name and description, the Create button will take us to the following UI, where we can add users to the group, including ourselves.

OCI Add user to a group

This step doesn’t need to consider the compartment involved, as users and groups are tenancy-wide constructs. However, all the following setup steps will need to take into account the compartment. So, it is worth checking the compartment setting each time an OCI resource is created.

The Oracle documentation on these steps is here. During or after the group creation, you need to add yourself to the group.

OCI Policies

With our group ready, we can now set up policies and associate them with the group. To do that, we must select the Identity & Security → Identity → Policies option from the menu. With the correct compartment selected, we can select the Create Policy button to launch the necessary dialogue. We need to provide a name, and then we can edit the policies. The following policies are defined with minimal controls. For a production setup, we’d need to revisit and tighten things up (for example, rather than the policy being tenancy-wide, it should be restricted to the relevant compartment, and the use-related policies should be applied to an additional user level of the group).

The policies needed are:

  • allow service loganalytics to MANAGE loganalytics-features-family in tenancy
  • allow group Logging-Analytics-Admins to MANAGE loganalytics-features-family in tenancy
  • allow group Logging-Analytics-Admins to READ compartments in tenancy
  • allow group Logging-Analytics-Admins to MANAGE loganalytics-resources-family in tenancy
  • allow group Logging-Analytics-Admins to MANAGE management-dashboard-family in tenancy
  • allow group Logging-Analytics-Admins to MANAGE loganalytics-log-group in tenancy
  • allow group Logging-Analytics-Admins to USE loganalytics-entity in tenancy
  • allow group Logging-Analytics-Admins to USE loganalytics-log-group in tenancy
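
As a rough sketch of the tightening mentioned earlier, the group statements could be generated with a compartment scope instead of "in tenancy". The compartment name observability-demo and the helper scoped_policies are hypothetical, and whether every one of these resource families can be scoped below the tenancy is an assumption to verify against the Oracle policy documentation. The script only builds strings; it does not call OCI.

```python
# Sketch only: builds OCI policy statements as plain strings; it does not call OCI.
GROUP = "Logging-Analytics-Admins"

# (verb, resource-type) pairs taken from the group policies listed in this article.
GROUP_GRANTS = [
    ("MANAGE", "loganalytics-features-family"),
    ("READ", "compartments"),
    ("MANAGE", "loganalytics-resources-family"),
    ("MANAGE", "management-dashboard-family"),
    ("MANAGE", "loganalytics-log-group"),
    ("USE", "loganalytics-entity"),
    ("USE", "loganalytics-log-group"),
]


def scoped_policies(group, compartment, grants=GROUP_GRANTS):
    """Return one policy statement per grant, limited to a single compartment."""
    return [
        f"allow group {group} to {verb} {resource} in compartment {compartment}"
        for verb, resource in grants
    ]


# "observability-demo" is a hypothetical compartment name.
for statement in scoped_policies(GROUP, "observability-demo"):
    print(statement)
```

The printed statements can be pasted into the policy editor in place of the tenancy-wide versions once the scoping has been confirmed to work for your setup.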

The process of adding these policies should look something like this:

Configured policies needed for our logging.

The Oracle documentation for policy creation is here.

Logging Analytics

If you’ve not yet activated logging analytics in your tenancy – that’s the first step. We can do this from the Observability & Management → Logging Analytics → Administration menu navigation. If you’ve not started Logging Analytics, you’ll be shown a landing page with a button to start using Log Analytics.

Log Analytics dashboard with start option for Log Analytics

Setting up a Log Group

Logs need to belong to Log Groups. So we need to go into the Log Analytics administration to configure the Log Group and the Source that will receive the log events using the menu navigation Observability & Management → Logging Analytics → Administration.

Logging Analytics Admin home page

To do this, we need to select the Groups tile or Log Groups from the left-hand Resources list to get the UI that lists the existing Log Groups and includes the Create Log Group button. Before creating the Log Group, ensure the correct compartment is shown in the left-hand Scope section. Selecting the Create Log Group button will launch a simple form like this:

Log Group creation UI

We only need to provide a simple, logical name. Once the group has been created, we will need to obtain the OCID, which will be needed for the Fluent Bit configuration. We can do this by clicking on the group's name in the list of groups. We then get a page as follows:

Created Log Group UI

Logging Entity

In addition to the log group, we also need to establish a logging entity. Select the Entities tile from the same administration page (using the Create Entity shortcut in the tile) or via the left list. If you don’t use the tile shortcut, you’re presented with an Entities list page, which includes the Create Entity button. Both this button and the tile shortcut will present us with a web form to complete.

The Entity Type gives Logging Analytics the means to infer structure and meaning from the received log events. To get things up and running, we’re less concerned about the entity type than about confirming that we can consume the payload, so we can set the value to OCI Generic Type, which can be found by starting to type the name in the field.

We must put the entity into the proper compartment to help manage access and visibility. As we’re using the Fluent Bit plugin, which under the hood uses the REST APIs, we can leave that value unset along with the Cloud Resource ID; with that, we can use the Create Entity button. Once created, we need to capture the resource’s OCID again.

Logging Entity creation UI

Hosting Fluent Bit

Deploying Fluent Bit is straightforward. Most will ultimately want to use Fluent Bit in a Kubernetes environment, and we can also use Fluent Bit to support more traditional deployments such as VMs, bare metal, etc. While getting the hang of Fluent Bit, we recommend using a VM approach as it sets aside the complexities of understanding the Kubernetes landscape. To make this infrastructure work, we need to establish several resources, including:

  • Virtual Cloud Network
  • Public Subnet
  • Network routing rules to allow traffic in and out of the executable environment.
  • A Virtual Machine with a public IP, configured firewalls, and, of course, Fluent Bit.

OCI Network setup

The most convenient approach is to use the VCN Wizard, which can be accessed from the Network Overview console (menu: Networking → Overview).

The wizard will offer a form like this, where, at a minimum, we need to provide a VCN name and ensure the correct compartment is being used. All the other resources created will be derived from this.

VCN configuration UI

If you want Fluent Bit to receive events from outside of OCI, then you’ll need to amend the Security Lists to allow the traffic to flow through to the VM’s port. This is a point we’ll come back to later.

OCI VM to host Fluent Bit

Once the networks are in place, we need a VM to use. Like the networks, the most expedient way to do this is through the OCI Console. The Compute Overview panel has a button to launch a wizard to complete this (menu Compute → Overview).

This step is a little more complex, as we have a choice of compute shapes and OS images in the Image and Shape section of the wizard. We recommend adopting Ubuntu 22.04 because it is mainstream enough that most people will be familiar with it. In terms of compute shape, this doesn’t need to be very big (1 OCPU and 2GB of memory is sufficient).

In the Primary VNIC information section, we need to ensure that the correct VCN and its public subnet have been selected. The last essential step before creating the VM is to provide or download generated keys for the VM.

Create VM UI

Once the instance is created, we can view all the details for the VM. We will need to take note of the public IP.

Preparing for external application calls

To invoke OCI services as an application, we need to provide user credentials. Monitoring processes are not part of a user workflow, so we can’t link them to a user login. For this, we need to create a user with appropriate permissions or, for simplicity, use our own identity. In addition to this, the plugin for Fluent Bit requires a few other details beyond the resources we’ve already created. So let’s gather the necessary details.

The OCI Namespace

While the OCI namespace for the tenancy may line up with the name we see, it is best not to assume an exact match. There are several ways to obtain the OCI namespace to be used. One way is to use the OCI command line from the Cloud Shell, which can be accessed from the main menu bar, as shown here:

Accessing the Cloud Shell

Then, within the cloud shell, we need to run the command:

oci os ns get

This will return a result like the one shown next. From the result, we want the value of the data attribute, without the quotes, for our configuration.

Cloud shell executing the Namespace query
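
If you would rather not copy the value by hand, the JSON that the command prints can be parsed. The following is a minimal sketch; the helper name extract_namespace and the sample namespace value are hypothetical, and it assumes the output is a JSON object with a single data attribute, as described above.

```python
import json


def extract_namespace(cli_output):
    """Return the value of the "data" attribute from `oci os ns get` JSON output."""
    return json.loads(cli_output)["data"]


# Hypothetical sample output; a real namespace value will differ.
sample = '{"data": "examplenamespace"}'
print(f"export OCI_NAMESPACE={extract_namespace(sample)}")
```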

OCI Properties File

As the Fluent Bit configuration for the OCI Logging Analytics plugin needs details to authenticate the connection to OCI, the plugin expects to find a properties file, just as the CLI or the SDK would. So, we need to configure a file that looks like this:

[DEFAULT]
user=user-OCID
fingerprint=User-fingerprint
tenancy=tenancy-OCID
region=us-ashburn-1
key_file=/home/ubuntu/demo-flb-key.pem
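
Because a missing or misspelled key in this file tends to surface only as an authentication failure at runtime, a quick check before starting Fluent Bit can save time. This is a sketch using Python’s standard configparser; the helper name missing_keys is hypothetical, and the list of required keys is simply the five shown above, not a statement of what the plugin itself validates.

```python
import configparser

# The five keys shown in the article's properties file.
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_keys(properties_text, profile="DEFAULT"):
    """List required keys that are absent or empty in the given profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(properties_text)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key, fallback="").strip()]


SAMPLE = """\
[DEFAULT]
user=user-OCID
fingerprint=User-fingerprint
tenancy=tenancy-OCID
region=us-ashburn-1
key_file=/home/ubuntu/demo-flb-key.pem
"""

print(missing_keys(SAMPLE))
```

To check a real file, read its contents and pass the text to missing_keys; an empty list means all five keys are present.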

Accessing the Linux VM

When we created the VM, we will have used or downloaded key files, which we’ll need now. We also need a local SSH client such as OpenSSH. To access the VM, we need to use the following command, substituting the key file name and the VM’s IP:

ssh -i <key-file-name>.pem ubuntu@<public IP>

Installing and Configuring Fluent Bit

Getting Fluent Bit ready for use needs us to do a little bit of preparation as the images Oracle provides have a little bit of default hardening that needs to be changed to allow network traffic to flow as necessary. The specifics of this are addressed here.

With Linux ready, we can install Fluent Bit.

Installing Fluent Bit

Fluent Bit is not in the default APT (Advanced Package Tool) repositories. So, the easiest way to install Fluent Bit is to use the command line provided by the Fluent Bit project:

curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh

This downloads and executes an installation script. At the end of the process, you should find Fluent Bit installed in /opt/fluent-bit/bin/fluent-bit. But it hasn’t modified details such as $PATH. A word of warning: the script does check which Linux flavor is to be used, so if you try to run the script on an OS it hasn’t taken into account, the installation will fail. If you’re comfortable with bash scripts, it doesn’t look too scary to tweak.

Setting up the environment

To make a demo environment easy to set up, we’ve found that putting the setup into a shell script (here called demo-setup.sh) is quick and convenient. If you need to change any configuration, it is easy to just source the shell file again rather than refresh the entire environment or VM through a reboot, etc. To simplify things further, we can also set the .profile file in your home directory to execute the shell script, e.g.

source demo-setup.sh

In our setup, we’re going to need the following environment variables set up:

  • OCI_LA_ENTITY_OCID
  • OCI_LA_ENTITY_TYPE
  • OCI_LA_GROUP_ID
  • OCI_LA_SOURCE_NAME
  • OCI_NAMESPACE
  • FLB_CENTRAL_NODE

We should end up with export statements looking like this:

export OCI_LA_SOURCE_NAME="flb-demo-source"
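
Fluent Bit substitutes these variables into its configuration, so an unset one can be awkward to spot. As a small sketch, the check below lists any of the six names above that are unset or empty in the current environment; the helper name missing_vars is hypothetical.

```python
import os

# The environment variable names listed in this article.
REQUIRED_VARS = [
    "OCI_LA_ENTITY_OCID",
    "OCI_LA_ENTITY_TYPE",
    "OCI_LA_GROUP_ID",
    "OCI_LA_SOURCE_NAME",
    "OCI_NAMESPACE",
    "FLB_CENTRAL_NODE",
]


def missing_vars(env=None, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


print("Missing:", missing_vars())
```

Running it after sourcing demo-setup.sh should print an empty list once every export statement is in place.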

Rather than messing with $PATH to add Fluent Bit, we can add to our script:

alias fluent-bit=/opt/fluent-bit/bin/fluent-bit

This gives the same practical convenience of running Fluent Bit without using the full path.

Fluent Bit Configuration Overview

Fluent Bit can be configured in several different ways. The most common way is to use a configuration file that can follow a newer YAML format or its classic custom notation. Currently, the classic notation is more prevalent, and there is plenty of content to support its understanding; it is also less nuanced than YAML, using a simple key-value pair model with the separation between the key and the value being a space character after the key.
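
To make the key-value model concrete, here is a toy reader for the classic notation. It is an illustration only and not Fluent Bit’s actual parser: it ignores indentation rules, normalizes the case of section and key names for convenience, and keeps only the last value when a key is repeated. The function name parse_classic is hypothetical.

```python
def parse_classic(text):
    """Parse classic-style text into a list of (section, {key: value}) pairs."""
    sections = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].upper(), {}))
        elif sections:
            # Split on the first run of whitespace only; the value may contain spaces.
            parts = line.split(None, 1)
            sections[-1][1][parts[0].lower()] = parts[1] if len(parts) > 1 else ""
    return sections


SAMPLE = """
[INPUT]
    name forward
    port 9090

[FILTER]
    name rewrite_tag
    rule $message .*[HELP].* error false
"""

for section, entries in parse_classic(SAMPLE):
    print(section, entries)
```

The rule entry shows why only the first space matters: everything after the key, including further spaces, is treated as the value.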

As this node is the concentration point for connecting to OCI Logging Analytics, we need to receive log events from other nodes. We can do this in several ways (such as acting as an OpenTelemetry collector and accepting HTTP calls). But the simplest is to use the Forward protocol that allows Fluent Bit and Fluentd, as well as other services, to communicate, and it doesn’t require any data transformation. This means we should define our source in Fluent Bit as the forward input plugin. For output, we obviously want to define log events going to OCI Log Analytics, but we might also want to keep a local copy of the events, which can be done with a simple file output. This gives us a configuration we’ve called fluent-bit-demo-receiver.conf.

The resulting configuration looks like this:

[SERVICE]
    flush 1
    # means we will push any cached data to its target every second

[INPUT]
    name forward
    port 9090
    listen 0.0.0.0

[FILTER]
    name rewrite_tag
    match demo.*
    rule $message .*[HELP].* error false

[OUTPUT]
    name file
    match *
    file demo.txt
    path /home/ubuntu
    mkdir Yes

[OUTPUT]
    name oracle_log_analytics
    match *
    profile_name DEFAULT
    tls on
    tls.verify off
    config_file_location /home/ubuntu/LA_conf.properties
    namespace ${OCI_NAMESPACE}
    oci_la_log_source_name ${OCI_LA_SOURCE_NAME}
    oci_la_log_group_id ${OCI_LA_GROUP_ID}
    oci_la_entity_type "${OCI_LA_ENTITY_TYPE}"
    oci_la_entity_id ${OCI_LA_ENTITY_OCID}

[OUTPUT]
    name stdout
    match *
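
Two details of the rewrite_tag filter may be worth checking before relying on it. In regular expression syntax, square brackets normally introduce a character class, so .*[HELP].* matches any message containing an H, E, L, or P rather than the literal text [HELP]. The filter also only applies to tags matching demo.*, whereas the dummy source used later in this article tags its events local-dummy. The sketch below uses Python’s re and fnmatch modules as stand-ins for Fluent Bit’s regex engine and tag matcher; that substitution is an assumption, so treat the output as an illustration of the general behaviour rather than a test of Fluent Bit itself.

```python
import re
from fnmatch import fnmatchcase

# [HELP] without escapes is a character class: any one of H, E, L or P.
as_written = re.compile(r".*[HELP].*")
# Escaped brackets match the literal text "[HELP]".
literal = re.compile(r".*\[HELP\].*")

for message in ("[HELP] disk full", "Please wait", "all quiet"):
    print(message, bool(as_written.search(message)), bool(literal.search(message)))

# Tag matching approximated with shell-style wildcards (an assumption).
print(fnmatchcase("local-dummy", "demo.*"))
print(fnmatchcase("demo.app", "demo.*"))
```

If the intent is to re-tag only messages containing the literal marker, escaping the brackets in the rule and tagging the source events with a demo. prefix are the two changes to try.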

Fluent Bit deployment

With this, we can now fire up the Fluent Bit instance with the command on the console:

fluent-bit -c fluent-bit-demo-receiver.conf

At this stage, the configuration won’t do anything, as any processing that is triggered is predicated on the receipt of events over the Forward protocol. To address this easily, we can take advantage of Fluent Bit’s dummy plugin, which enables it to create very simple log events at regular intervals (second or sub-second intervals). Combining this with the Fluent Bit forward output, we can easily direct a flow of simple dummy events into our Fluent Bit receiver. The following configuration can be deployed anywhere that can send traffic to our VM’s public IP, either another VM on OCI or your desktop (assuming you have Fluent Bit already installed).

[SERVICE]
    log_level debug
    flush 1

[INPUT]
    name dummy
    tag local-dummy
    dummy {"message" : "separate FLB instance generating this"}

[OUTPUT]
    name forward
    match *
    port 9090
    host ${FLB_CENTRAL_NODE}

[OUTPUT]
    name stdout
    match *

For this to work, the local environment needs the environment variable, FLB_CENTRAL_NODE, to be defined with the IP of our VM hosting the Fluent Bit central instance. By saving this configuration as fluent-bit-demo-source.conf, we can now run it with the command:

fluent-bit -c fluent-bit-demo-source.conf
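
If no events reach the central node, a first thing to rule out is the network path, since both the VCN Security Lists and the VM’s own firewall sit between the two Fluent Bit instances. The following is a minimal sketch of a TCP reachability check; the helper name can_connect is hypothetical, and a successful connection only shows that the port is open, not that Fluent Bit is the process listening on it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (not executed here), using the port from the receiver configuration:
#   import os
#   print(can_connect(os.environ["FLB_CENTRAL_NODE"], 9090))
```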

Seeing results

With the second instance of Fluent Bit generating traffic, we can confirm that the log events are being created as our central node is both forwarding the traffic to OCI Log Analytics and writing to a file called demo.txt in /home/ubuntu, so the content can be inspected by tailing the file.
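
For anything beyond eyeballing the tail output, the lines in demo.txt can be split into their parts. This sketch assumes each line has the layout tag, colon, then a JSON array of timestamp and record; that layout is an assumption about the file output’s default format, so check a line from your own file first. The helper name parse_line and the sample line are hypothetical.

```python
import json


def parse_line(line):
    """Split 'tag: [timestamp, {record}]' into (tag, timestamp, record)."""
    tag, _, payload = line.partition(": ")
    timestamp, record = json.loads(payload)
    return tag, timestamp, record


# Hypothetical sample line in the assumed layout.
sample = 'local-dummy: [1704700000.123456, {"message": "separate FLB instance generating this"}]'
print(parse_line(sample))
```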

To see the results in OCI’s Logging Analytics

The best solution is to create your own Dashboard for rapid access back to log views and related reports. But to jump directly to the right information, we can navigate Observability & Management → Logging Analytics → Log Explorer. Then, if the chart is not showing the data you expect, in the expression bar, click the X on the right-hand side, and insert the following expression, replacing the <log source name> with the name used during the setup (e.g., flb-demo-source):

'Log Source' = '<log source name>' | timestats count as logrecords by 'Log Source' | sort -logrecords

Then, hit the run command. If this is the first time it's been queried, the data might take a moment to load, as it needs to be retrieved from the backing store and cached. As a result, you’ll see a screen like this:

Log Analytics — log view

We can then drill into the details of each record using the arrow next to the entity or the right-hand menu on each row. We can also drill in by clicking on the graph to focus on a specific range of events. The details for drilling in include views like these:

Expanding the information relating to a specific log event.
Looking at a specific log event. With our dummy data, this is of limited value, but in the real world, such views are essential, as an event can carry very rich data or a stack trace.

Conclusion

As you can see, it is possible to use Fluent Bit as a cloud/deployment-neutral tool to feed data to OCI Logging Analytics. While we have cheated with Fluent Bit by using an input that generates dummy events, Fluent Bit has plenty of mechanisms to capture events, particularly from log sources. One of the natural next steps in this demo is to evolve the event capture. If we know the kinds of sources we want, we can simulate the source using the open-source Log Simulator (which can simulate more than just logs now). This means it is easy to replay log events into the Fluent Bit setup and OCI Logging Analytics to test routing, filtering, and alerting configurations without running a real application.

To get our Fluent Bit set up to be a production fit, we need to consider the following kinds of improvements:

  • Rather than using a public IP to address Fluent Bit on OCI, move to a DNS setup – it will make for a more robust configuration.
  • We need to protect the inbound flows to OCI from malicious actors. The first step is firewalling the ingress point.
  • Ideally, we look at securing the ‘pipes’ connecting the different cloud ingress/egress points. This will help protect the payloads and reduce the opportunities for data theft. Ideally, such infrastructure will be in place for the application itself.
  • Review the deployment of Fluent Bit – is a single VM sufficient, or do we want to use Virtual instances or Kubernetes? As the Fluent Bit node is a focal point for potentially many sources – it will probably warrant setting up some resilience, such as a balanced pair of nodes. So we need load balancing configured if VMs or Virtual Instances are adopted.
  • Configure Fluent Bit to use SSL/TLS between nodes. Consider establishing VPNs from remote locations, preferably as an additional layer, but failing that as an alternative.
  • As we said at the start, we’re minimizing the security settings for ease and speed of setup. This needs to be revisited and tightened to reflect best practices (a story for another time).
  • Adding additional metadata to the log events to further understand the context and origin of the event.

OCI Document Reference

The following list links to the OCI documentation relevant to the OCI setup.

Other Useful resources

Phil Wilkins
Oracle Developers

Techie, author, blogger, https://blog.mp3monster.org -- my blog covering tech https://cloud-native.info I work for Oracle, but all opinions are my own.