Data Logging with Golang: How to Store Customer Details Securely

Vadzim Zapolski-Dounar
Published in Pipedrive R&D Blog, Apr 22, 2020


Like many other software companies, Pipedrive relies heavily on logging, because logs are extremely useful for analyzing issues when they come up. Logging is important, but so is privacy, which is one of our core values when it comes to development.

For logging to be really useful, it should capture as much contextual information as possible. The more context you have, the less time it takes to understand what went wrong in a particular situation (the quality of error messages is just as important, but that’s a topic for another day).

While more context makes an issue easier to understand, exposing private information (names, emails, addresses, etc.) in logs is not an option for us. We value the privacy of our customers and do not want to betray their trust.

The problem: How do we find a balance between full contextual logging and preventing any privacy violations?

As it turns out, using Golang and protocol buffers to communicate between microservices seems to solve the problem fairly seamlessly.

In early March this year, The Go Blog published an article titled “A New Go API for Protocol Buffers”. It announced a new version of the API that comes with very handy reflection support for protocol messages. We gave this API a spin and tried to white-list our logged fields automatically, right at the stage where the protocol messages are designed.

Hands on: Sample Logger and its pitfalls

Sample logger wrapper:

Sample logger
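A minimal sketch of what such a wrapper might look like, using only the standard library. The names formatError and Error are hypothetical, and the wrapper is an assumption about the general shape, not the listing used at Pipedrive.

```go
package main

import (
	"fmt"
	"log"
)

// formatError builds the log line: the "error:" marker, the message, and
// every parameter rendered with its default formatting.
// (Hypothetical helper, introduced for illustration.)
func formatError(msg string, params ...interface{}) string {
	parts := append([]interface{}{"error:", msg}, params...)
	// Sprintln puts a space between operands; drop its trailing newline.
	line := fmt.Sprintln(parts...)
	return line[:len(line)-1]
}

// Error writes the formatted line to the standard logger, which adds the
// timestamp prefix.
func Error(msg string, params ...interface{}) {
	log.Print(formatError(msg, params...))
}

func main() {
	Error("failed to process company", map[string]int{"id": 11})
}
```

Note that nothing in this wrapper inspects the parameters: whatever is passed in ends up in the output.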

The wrapper simply writes to the standard logger. Let’s imagine we’re using it in some controller:

Sample controller

Our main function calls the controller. (In a real application the data would not be hard-coded in another file; it usually arrives through something like a form, but for the sake of simplicity we won’t build a web page.)

Main function
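The controller and main function could be sketched roughly as follows. This is an illustration under assumptions: Company is a hand-written struct standing in for the generated protocol message, processCompany and its roll parameter are hypothetical, and the standard logger is called directly to keep the block self-contained.

```go
package main

import (
	"errors"
	"log"
	"math/rand"
)

// Company is a hand-written stand-in for the generated message type.
type Company struct {
	ID         int64
	OwnerName  string
	OwnerEmail string
}

var errProcess = errors.New("failed to process company")

// processCompany fails when roll is 0. With roll drawn from [0, 3),
// roughly one call in three fails.
func processCompany(c *Company, roll int) error {
	if roll == 0 {
		// The whole message is handed to the logger, sensitive fields included.
		log.Println("error:", errProcess, *c)
		return errProcess
	}
	return nil
}

func main() {
	c := &Company{ID: 11, OwnerName: "Batman", OwnerEmail: "batman@cave.com"}
	// Retry a few times so that a random failure is likely to show up.
	for i := 0; i < 10; i++ {
		if err := processCompany(c, rand.Intn(3)); err != nil {
			return
		}
	}
}
```

The failure path logs the complete struct, which is exactly the pitfall the rest of the article addresses.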

This all looks fine until we see what kind of information is stored inside the Company message:

Protocol buffers for Company

If we run our main function a couple of times (until a random failure), we will eventually receive an output showing:

1970/01/01 00:00:00 error: failed to process company id:11 owner:{id:1 name:"Batman" email:"batman@cave.com" title:{id:100001 name:"CLSO - Chief Life Savior Officer"}} coOwner:{id:2 name:"Catwoman" email:"catwoman@box.com" title:{id:100002 name:"CCO - Chef Cuddling Officer"}} size:3

Clearly, there’s a lot of sensitive information here which we’d like to avoid having in our logs.

Advanced Logger with parameter sanitization

It seems that inside our logger we need to sanitize the data before sending it to the output:

Sanitizing before logging

From the protocol message, we want to keep only those fields that aren’t considered to contain any sensitive information. Ideally, there should be a way to state which fields can be logged (whitelisting) inside the protocol message itself.

As it turns out, custom options for protocol messages are exactly what we were looking for. (The documentation can be found in the Language Guide under Custom Options. It is written for the proto2 syntax, but custom options work the same way in proto3.)

For what we need, we’re specifically interested in the section on custom Field Options. Here we introduce our own custom field option, logField, which states the name under which a field will be logged:

Protocol options extension

Here we’re extending the default google.protobuf.FieldOptions by providing our own options. Pay special attention to the number for our extension:

One last thing: Since custom options are extensions, they must be assigned field numbers like any other field or extension. In the examples above, we have used field numbers in the range 50000–99999. This range is reserved for internal use within individual organizations, so you can use numbers in this range freely for in-house applications. — https://developers.google.com/protocol-buffers/docs/proto#customoptions

Below, you’ll see how we use our option by changing the company message description:

company.proto with the newly introduced option for logField
Differences after applying newly introduced option for logField

We simply add the option to all non-sensitive fields, so that any other field inside the message is considered to be sensitive.

Reflection magic for sanitizing

The reflection API has been improved in the recent version of the Go API for Protocol Buffers (https://blog.golang.org/protobuf-apiv2).

Among the improvements, the Range method is particularly handy for walking through the fields of a protocol message, so let’s use it!

In the sanitize function, the case protoreflect.ProtoMessage branch is where sanitizeProtoMessage gets called:

Sanitize function which calls sanitizeProtoMessage
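The dispatch step can be sketched with a type switch. To keep the block free of external modules, a hypothetical Message marker interface stands in for protoreflect.ProtoMessage, and sanitizeMessage is only a placeholder for the reflection-based walk. Passing non-message values through unchanged is an assumption; a stricter logger could drop them instead.

```go
package main

import "fmt"

// Message is a hypothetical marker interface standing in for
// protoreflect.ProtoMessage.
type Message interface {
	IsMessage()
}

// sanitizeMessage is a placeholder for the reflection-based walk. Here it
// returns an empty map, i.e. nothing from the message is white-listed.
func sanitizeMessage(m Message) map[string]interface{} {
	return map[string]interface{}{}
}

// sanitize prepares log parameters: messages are reduced to their
// white-listed fields, everything else is passed through unchanged
// (an assumption made for this sketch).
func sanitize(params ...interface{}) []interface{} {
	clean := make([]interface{}, 0, len(params))
	for _, p := range params {
		switch v := p.(type) {
		case Message:
			clean = append(clean, sanitizeMessage(v))
		default:
			clean = append(clean, v)
		}
	}
	return clean
}

// company is a toy message type used only in this example.
type company struct{ Name string }

func (company) IsMessage() {}

func main() {
	fmt.Println(sanitize("retry", 3, company{Name: "Wayne Enterprises"})...)
}
```

The point of the switch is that callers keep passing whole messages to the logger, and the decision about what may be printed is made in one place.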

With this, we go over all populated fields in the message and process those that have the logField option specified. If the field holds a simple value, the code combines the prefix with the field’s logField name and stores the value under that key in a map[string]interface{}. If the field is itself a protocol message, the function recurses into it with the combined name as the new prefix. This process guarantees that only fields with the logField option end up in the log output:

sanitizeProtoMessage recursive — goes over all Message fields to extract the logField
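The recursive walk can be illustrated with the standard library alone. This is not the protoreflect-based code: plain structs stand in for the generated messages, a `log` struct tag stands in for the logField option, and the reflect package stands in for Range. The type names, tag values, and sanitizeStruct are all hypothetical. Unlike Range, this loop also visits fields holding zero values.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hand-written stand-ins for the generated messages. Only tagged fields
// are considered safe to log.
type Title struct {
	ID   int64 `log:"id"`
	Name string
}

type User struct {
	ID    int64 `log:"user_id"`
	Name  string
	Email string
	Title *Title `log:"profession"`
}

type Company struct {
	ID      int64 `log:"company_id"`
	Owner   *User `log:"owner"`
	CoOwner *User `log:"co_owner"`
	Size    int64 `log:"size"`
}

// sanitizeStruct copies only tagged fields of msg into out, descending
// into nested structs and joining names with "_".
func sanitizeStruct(prefix string, msg interface{}, out map[string]interface{}) {
	v := reflect.Indirect(reflect.ValueOf(msg))
	if v.Kind() != reflect.Struct {
		return
	}
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name, ok := t.Field(i).Tag.Lookup("log")
		if !ok {
			continue // untagged fields are treated as sensitive
		}
		if prefix != "" {
			name = prefix + "_" + name
		}
		f := v.Field(i)
		if f.Kind() == reflect.Ptr {
			if f.IsNil() {
				continue // nothing to log for an unset nested message
			}
			f = f.Elem()
		}
		if f.Kind() == reflect.Struct {
			sanitizeStruct(name, f.Interface(), out) // nested message: recurse
			continue
		}
		out[name] = f.Interface()
	}
}

func main() {
	c := &Company{
		ID:      11,
		Owner:   &User{ID: 1, Name: "Batman", Email: "batman@cave.com", Title: &Title{ID: 100001, Name: "CLSO"}},
		CoOwner: &User{ID: 2, Name: "Catwoman", Email: "catwoman@box.com", Title: &Title{ID: 100002, Name: "CCO"}},
		Size:    3,
	}
	out := map[string]interface{}{}
	sanitizeStruct("", c, out)
	fmt.Println(out)
}
```

With this sample data the resulting map holds six keys, such as company_id and owner_profession_id, and none of the names or emails.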

The extractLogField function extracts the value of the custom field option logField:

Extracting logField option for specific field

The logFieldName function filters out empty prefixes and joins the remaining parts with _:

Filtering out empty prefixes
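A minimal sketch of such a helper, assuming a variadic signature (the signature is a guess; only the function name and its behavior come from the description above):

```go
package main

import (
	"fmt"
	"strings"
)

// logFieldName drops empty parts and joins the rest with "_".
func logFieldName(parts ...string) string {
	kept := make([]string, 0, len(parts))
	for _, p := range parts {
		if p != "" {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, "_")
}

func main() {
	fmt.Println(logFieldName("", "company_id"))            // company_id
	fmt.Println(logFieldName("owner", "profession", "id")) // owner_profession_id
}
```

Filtering matters at the top level, where the prefix is still empty: without it, every top-level key would start with a stray underscore.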

After all these manipulations, our new logger prints only the white-listed information to the output:

1970/01/01 00:00:00 error: failed to process company map[co_owner_profession_id:100002 co_owner_user_id:2 company_id:11 owner_profession_id:100001 owner_user_id:1 size:3]
The difference after applying sanitization of fields based on reflection

Perfect! We can now keep our logs free of sensitive information, while still having enough context to debug issues.

This technique works for any type of protocol message with any depth of nesting, and we can be sure that only the white-listed fields of a message will end up in our logging system.

Thank you very much for reading, and make sure to keep your customers’ data safe!

Principal developer @ Pipedrive, hacking in NodeJS and Golang