Quick words on Vector

hdhoang
Coccoc Engineering Blog
Dec 29, 2021

Today’s post is a quick introduction to Timber’s Vector. It’s a multi-purpose metrics & log routing tool, positioned as a sending agent on source hosts and/or an aggregation node. We first trialed it in several log-to-metric pipelines, aggregating error rates & latency histograms out of app logs. But the most impactful application it delivered is as a logging sidecar.
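For flavor, a log-to-metric stage of that kind can be sketched with Vector’s log_to_metric transform. The transform type is real, but the input name and field names below are hypothetical, not taken from our actual pipelines:

```toml
# hypothetical sketch: derive a response counter & a latency histogram from app log events
[transforms.app_metrics]
type = "log_to_metric"
inputs = ["app_log"]        # assumed upstream log source

[[transforms.app_metrics.metrics]]
type = "counter"
field = "status"            # count one event per log line carrying a status field
name = "responses_total"
tags.status = "{{status}}"  # error rate = rate of the 5xx-tagged series

[[transforms.app_metrics.metrics]]
type = "histogram"
field = "latency"           # assumed numeric latency field
name = "request_latency_seconds"
```

The tagged counter lets a downstream Prometheus query compute the error rate, while the histogram feeds latency percentiles.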

Originally, our php-fpm apps wrote messages directly to kafka. However, the CGI-style request-response model proved problematic:

  • connection setup, including a TCP roundtrip, then a SASL login roundtrip
  • each connection delivered only 1 message
  • connection teardown, using up pod->broker TCP tuples

This raised users’ request latencies, held up the php-fpm process pool, and, without message batching, used kafka inefficiently.

Fortunately, the info we wanted to deliver was already available via the pod’s nginx log formatting. We set up a sidecar Vector listening on loopback syslog, applying some minimal data editing, then producing to kafka a batch at a time.

# nginx config
log_format demo escape=json '{'
'"user_agent":"$http_user_agent",'
'"cookies":"$http_cookie",'
'"event_time_float":"$msec",'
'"datetime":"$time_iso8601",'
'"ip":"$remote_addr",'
'"uri":"$request_uri",'
'"http_response_code":"$status"'
'}';
map "$request_uri" $loggable {
~^/interesting/ 1;
default 0;
}
access_log syslog:server=127.0.0.1:1514 demo if=$loggable;

# vector config in TOML format
# prometheus-format metrics at pod's ip:8998/metrics
[sources.internal_metrics]
type = "internal_metrics"
[sinks.vector_metrics]
type = "prometheus"
inputs = ["internal_metrics"]
address = "0.0.0.0:8998"
# extract, transform, and store pipeline
[sources.syslog]
type = "syslog"
mode = "udp"
address = "127.0.0.1:1514"

[transforms.parse_json]
type = "remap"
inputs = ["syslog"]

source = """
log_data, err = parse_json(.message)
# nginx log time_iso8601 sample "datetime":"2019-08-12T16:23:38+07:00"
# to parse event date/time from it:
datetime, err = split(log_data.datetime, pattern: "+")[0]
log_data.event_time = replace(datetime, "T", " ")
log_data.event_date, err = split(log_data.event_time, " ")[0]
del(log_data.datetime)

. = log_data
"""

# For initial debugging
# [sinks.stdout]
# type = "console"
# inputs = ["parse_json"]
# encoding.codec = "json"

[sinks.logstore]
type = "kafka"
inputs = ["parse_json"]
encoding.codec = "json"

batch = { max_events = 250 }
# disable vector-level compression; let librdkafka apply zstd per batch instead
compression = "none"
librdkafka_options."compression.codec" = "zstd"

healthcheck = true
librdkafka_options."client.id" = "${HOSTNAME}"

sasl.enabled = true
bootstrap_servers = "${KAFKA_BROKER_LIST}"
sasl.mechanism = "SCRAM-SHA-256"
sasl.password = "${KAFKA_SASL_PASSWORD}"
sasl.username = "${KAFKA_SASL_ACCOUNT}"
topic = "${KAFKA_TOPIC}"

Because the messages are batched, we can apply in-protocol compression to them, reducing transfer & storage costs on the kafka brokers. This pattern is now well-used throughout our k8s deployments.

The parse_json transform above demonstrates vector’s home-grown Vector Remap Language (VRL). It has similar utility to Logstash grok or JRuby, but with an explicit compilation step at config-parsing time, to validate the program and avoid runtime errors. The Go-esque error handling (if err != nil), coupled with a forceful quick override (fallible_op!() to abort processing on error), affords good iteration speed during early development, while still marking fallible calls explicitly for future review. Rust-inspired compile error messages with actionable suggestions save us from runtime surprises in the face of real-world complications. When surprises happen anyway, we can route such erroneous data to a secondary pipeline for manual inspection & replay. VRL also provides functions to redact sensitive data, or enrich raw data with more descriptive values.

To capture & detect regressions in processing, vector specifies a unit-testing scheme for transforms. Config writers can inject data at several steps to exercise the transforms, then assert that output events appear as expected. There were such out-there experiments as WASM-binary transforms, but they were removed in favor of Lua and VRL.
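As a sketch of that unit-testing scheme applied to the parse_json transform above (the test name and sample message here are our own illustration, not part of the production config), one could write:

```toml
# inject a sample message at parse_json, then assert on the transform's output;
# run with: vector test <config files>
[[tests]]
name = "parse_json extracts event date & time"

[[tests.inputs]]
insert_at = "parse_json"
type = "log"
log_fields.message = '{"datetime":"2019-08-12T16:23:38+07:00","uri":"/interesting/page"}'

[[tests.outputs]]
extract_from = "parse_json"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert!(.event_time == "2019-08-12 16:23:38")
assert!(.event_date == "2019-08-12")
'''
```

Keeping such tests next to the pipeline config means a refactor of the VRL source fails fast at vector test time, instead of corrupting events in production.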

Compared to the Elastic beats+logstash combo, or the Treasure Data fluent-bit/fluentd ecosystem, there are pros (single tool, testable transform stages, flexible routing with numerous sources & sinks), and several cons:

  • evolving configuration scheme, with no 1.0 on roadmap yet
  • lack of examples/tutorials, as well as of old versions’ docs on the webpage
  • being 3rd-party to every potential sink & source

In early 2021, the acquisition by Datadog, a big player in the ObsOps space, proved a big boon. Datadog places vector at the center of their upcoming pipeline platform, and the release cadence has picked up pace, with clear upgrade guidance between versions.

In early 2022, the ClickHouse community added a guide on using Vector to ship logs straight into CH: https://clickhouse.com/docs/en/integrations/vector-to-clickhouse/.

We have another kafka-to-compressed-file pipeline. Due to Zstandard’s clear compression-ratio and decompression-speed advantages, we agreed on hourly rotation with zstd. We ended up using td-agent (a distribution of fluentd+plugins) calling the zstd command line, mostly because vector did not yet support zstd in its file sink.

Regarding perf, the website used to have a benchmark table sourced from their test-harness repo, but dropped it in a recent redesign. There are some evaluations around, but this article from 2021–09 doesn’t specify the versions of the tested software. In our experience with the log-to-metrics extraction, a couple of vector pods could keep up with a k8s log volume that had previously overwhelmed 8 logstash instances, at a fraction of the memory & CPU usage. While I’m writing this, vector released 0.19, which fixed some perf regressions and added opportunistic parallelism to some transform types.

I hope to have piqued your interest in the tool. Try browsing through their VRL samples & find inspiration!
