Create and Run a Protobuf Plugin — Basic Data

Part 2: How to handle the input a plugin receives from protoc

Published in Cloud Native Daily · 7 min read · Jun 13, 2023

So now you know how to create a simple hello-world plugin for protoc. You don’t? Then check out the first part of this series before you continue! In this article, we’re going to focus on how to handle the input a plugin receives from protoc. Our goal is to generate a small overview website that contains some basic stats about a protobuf file. All code is available on GitHub. The code snippets below are all in Python.

The following two screenshots show the final result: basic information such as the package name and the number of imports. The page also acts as a simple linter by flagging a missing package name:

The final outcome: Protobuf with proto3 syntax, 3 imports and a defined package

Problem detected: This protobuf has no package name.

1. Reading input from protoc — The CodeGeneratorRequest

When executing a plugin, protoc sends us a protobuf called CodeGeneratorRequest (definition available here) via stdin. So as a very first step, we’ll have to deserialize that protobuf. Python lets us read the raw bytes from stdin by calling sys.stdin.buffer.read(), and the result can be passed directly to the CodeGeneratorRequest.FromString() method.

The CodeGeneratorRequest contains a handful of fields, but for the purpose of this tutorial, the important ones are:

file_to_generate: This is a list of the file names that were explicitly passed by the user on the command line. We need to generate code for every file in this list.
proto_file: This is a list of FileDescriptorProto messages, each of which describes pretty much everything in one proto-file. We get one for each file in file_to_generate and for each file they import, directly or transitively (imports are resolved against the path passed with the -I argument). That means there’s going to be an entry here for every name in file_to_generate.

In addition to handling the file they are currently generating code for, more sophisticated plugins often need to handle imported protos as well. In our case though, we just want to generate some simple statistics about each requested file, so we can filter out everything else:

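import sys

from google.protobuf.compiler.plugin_pb2 import (
    CodeGeneratorRequest,
    CodeGeneratorResponse,
)

if __name__ == "__main__":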
    request = CodeGeneratorRequest.FromString(sys.stdin.buffer.read())
    response = CodeGeneratorResponse()

    for proto_file in request.proto_file:                         # 1
        if proto_file.name in request.file_to_generate:           # 2
            f = generate_for_proto(proto_file)                    # 3
            response.file.append(f)                               # 4

    sys.stdout.buffer.write(response.SerializeToString())

The lines that read the request and the final line that writes the response are required to communicate with protoc. Refer to the previous part of this tutorial if you need a refresher. In the rest of this snippet, we iterate over all known proto-files (1) and keep only the ones we’re asked to generate code for (2). Next, we generate the code we need for that file (3) and add it to the final result (4).
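
To try the filtering logic without invoking protoc, one option is to build a CodeGeneratorRequest by hand. The sketch below assumes the protobuf Python package is installed. The file names and the files_to_process helper are made up for illustration and are not part of the original plugin.

```python
from google.protobuf.compiler.plugin_pb2 import CodeGeneratorRequest


def files_to_process(request: CodeGeneratorRequest) -> list:
    """Returns the names of the descriptors protoc asked us to generate code for."""
    wanted = set(request.file_to_generate)
    return [f.name for f in request.proto_file if f.name in wanted]


# Hypothetical request: one file passed on the command line plus one import.
request = CodeGeneratorRequest()
request.file_to_generate.append("example/b.proto")
request.proto_file.add(name="example/a.proto", syntax="proto3")
request.proto_file.add(
    name="example/b.proto",
    package="example",
    syntax="proto3",
    dependency=["example/a.proto"],
)

# Round-trip through bytes, mimicking what the plugin reads from stdin.
parsed = CodeGeneratorRequest.FromString(request.SerializeToString())
selected = files_to_process(parsed)
print(selected)
```

Both descriptors arrive in proto_file, but only the one named in file_to_generate survives the filter.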

The first version of the generate_for_proto function looks like this:

def generate_for_proto(
    file_descriptor: FileDescriptorProto,
) -> CodeGeneratorResponse.File:
    """Generates code for one specific proto-file"""
    generated_file = CodeGeneratorResponse.File()                         # 1
    generated_file.name = file_descriptor.name.replace(".proto", ".html") # 2

    # This is where content-generation will happen

    return generated_file                                                 # 3

It takes a file_descriptor as an argument, which contains a lot of the information we need, most importantly the name of the proto-file we’re generating code for. First, we create a new file (1) and then set its name by replacing the “.proto” ending with “.html” (2). We’ll add content in the next step. For now, we’re just creating empty files, so we can return right away (3).
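
One caveat: str.replace swaps every occurrence of “.proto”, including one inside a directory name. A possible alternative, sketched here with a hypothetical helper html_name that is not part of the original code, changes only the file extension. It assumes that protoc reports file names with forward slashes.

```python
from pathlib import PurePosixPath


def html_name(proto_name: str) -> str:
    """Swaps only the final extension of a proto path for .html."""
    # Assumption: proto file names use forward slashes, hence PurePosixPath.
    return str(PurePosixPath(proto_name).with_suffix(".html"))


print(html_name("solar_system/mars/marsians.proto"))
print(html_name("my.proto.files/a.proto"))
```

For the example files in this tutorial, both approaches produce the same names. The difference only shows up for paths like the second one.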

Now download the example protofiles from here and compile them with:

protoc                                             \
   -I ./proto                                      \
   ./proto/solar_system/mars/mars_services.proto   \
   ./proto/solar_system/mars/marsians.proto        \
   --demo_out=/tmp/generated                       \
   --plugin=protoc-gen-demo=./proto_summarizer/generator.py

Looking inside the generated directory, there should be a solar_system/mars subdirectory, containing two empty html files, mars_services.html and marsians.html.

2. Add simple content

The first piece of content we’re adding is the tab-title and headline. Our goal is to create something similar to this:

Title and headline of the generated code: a browser window showing the path of the proto-file as tab title and headline.

The first step for this is to add a new function called render that takes the proto-path:

def render(
        proto_path: str,
) -> str:
    """Renders html-code for the given proto-data"""
    return f"""
        <html>
          <head>
            <title>{proto_path}</title>
          </head>
          <body>
            <h1>Summary of <br/> {proto_path}</h1>
          </body>
        </html>
        """

This is an example where f-strings (or similar types of string-interpolation in other languages) can really shine. The code almost reads like html, only replacing the proto-path in the places where it’s needed.
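
One thing f-strings do not do is escape their values. A file name containing characters such as < or & would end up in the page verbatim. Here is a minimal sketch of a safer variant, using html.escape from the standard library (render_title is a hypothetical name, not taken from the original code):

```python
from html import escape


def render_title(proto_path: str) -> str:
    """Renders a title element with the proto path HTML-escaped."""
    return f"<title>{escape(proto_path)}</title>"


print(render_title("solar_system/mars/marsians.proto"))
print(render_title("a<b>&c.proto"))
```

For typical proto paths the output is unchanged, so the escaping costs nothing in the common case.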

To generate that code, extend the generate_for_proto function with a call to render:

def generate_for_proto(
        file_descriptor: FileDescriptorProto,
) -> CodeGeneratorResponse.File:
    
    ...
       
    generated_file.content = render(file_descriptor.name)

    return generated_file

Next, run the protoc command again and look in the generated directory. It should now contain two simple html files. If you open them, they should look similar to the screenshot above.

3. Collecting basic information — syntax, package and dependencies

Our next goal is to collect some basic information about the proto we’re generating code for and list it on the summary page: the proto syntax used (proto2 or proto3), the package name and the number of imported protobufs. We’ll also add a little status symbol that warns the user if the package name was not defined.

Basic information about the proto-file, including checkmarks and warnings if there’s an issue.

The syntax is available in file_descriptor.syntax and the package name in file_descriptor.package. Imported protobufs are available as a list of file names in file_descriptor.dependency, so len(file_descriptor.dependency) gives us their number. When passing all that information to the render function, I recommend using keyword arguments to avoid mixing up the values. The changes to generate_for_proto look like this:

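# ...
generated_file.content = render(proto_path=file_descriptor.name,
                                syntax=file_descriptor.syntax,
                                dependencies=len(file_descriptor.dependency),
                                # An unset package arrives as "", which render treats as missing
                                package_name=file_descriptor.package)
# ...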
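
One detail to watch out for: as far as I know, protoc leaves the syntax field empty for files that use proto2, so the page would show a blank where the syntax should be. Below is a small sketch of a fallback. syntax_label is a hypothetical helper, and the empty-means-proto2 behavior is an assumption worth verifying against your protoc version.

```python
def syntax_label(syntax: str) -> str:
    """Returns a printable syntax name, treating an empty value as proto2."""
    # Assumption: an empty syntax field means the file uses proto2.
    return syntax or "proto2"


print(syntax_label("proto3"))
print(syntax_label(""))
```

With such a helper, the call above could pass syntax=syntax_label(file_descriptor.syntax) instead of the raw field.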

To use all this information in the render function, we’ll define three constants, CHECK (✅), CROSS (❌) and INFO (ℹ), that hold the HTML character references of the symbols we want to use. In the HTML code, we add the checkmark and a message about the syntax used. Next, we add a ✅ if package_name is defined; otherwise, we add a ❌. Similarly, we show the package name only if it’s present; otherwise, the user gets an error message. Finally, we print an ℹ along with the number of dependencies.

from typing import Optional

CHECK = "&#9989;"
CROSS = "&#10060;"
INFO = "&#8505;"

def render(
        proto_path: str,
        syntax: str,
        dependencies: int,
        package_name: Optional[str],
) -> str:
    """Renders html-code for the given proto-data"""
    return f"""
        <html>
          <head>
            <title>{proto_path}</title>
          </head>
          <body>
            <h1>Summary of <br/> {proto_path}</h1>
            {CHECK} Nice protobuf you defined there using {syntax} syntax.<br/>

            {CHECK if package_name else CROSS} Its package name is
               {package_name if package_name else 'NOT DEFINED'}.<br/>
            {INFO} It is importing {dependencies} other protobufs.
          </body>
        </html>
        """

Again, run the protoc command and check the output in the generated directory. It should look similar to the screenshot.
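
Because an unset string field in a protobuf message reads as an empty string in Python, the truthiness check above also covers the missing-package case. The logic can be exercised on its own. package_line below is a hypothetical helper written for illustration, not a function from the original code, and the package name is made up.

```python
CHECK = "&#9989;"
CROSS = "&#10060;"


def package_line(package_name: str) -> str:
    """Builds the package line of the summary, flagging a missing package."""
    symbol = CHECK if package_name else CROSS
    shown = package_name if package_name else "NOT DEFINED"
    return f"{symbol} Its package name is {shown}.<br/>"


# Hypothetical package name, followed by the unset case.
print(package_line("solar_system.mars"))
print(package_line(""))
```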

4. Adding style

This part is unrelated to protobuf, but who likes websites without style? To keep things simple, I just embedded some CSS in the generated file, inside a <style> element in the <head>. I also wrapped the package name in a <span> so I can highlight it. You can’t argue about taste, but for me, this looks ok 😀:

f"""
    <html>
      <head>
        <style>
            body {{
              background-color: #161b24;
              font-family: sans-serif;
              color: #e9ecf2;
              line-height: 1.6;
            }}

            h1 {{
              color: #6ef093;
            }}

            .package {{
              color: orange;
            }}
        </style>
        <title>{proto_path}</title>
      </head>
      <body>
        <h1>Summary of <br/> {proto_path}</h1>
        {CHECK} Nice protobuf you defined there using {syntax} syntax. <br />

        {CHECK if package_name else CROSS} Its package name is
            <span class="package">
                {package_name if package_name else 'NOT DEFINED'}
            </span>.<br/>
        {INFO} It is importing {dependencies} other protobufs.
      </body>
    </html>
    """
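
Note the doubled braces in the CSS: inside an f-string, {{ and }} produce literal { and }, while single braces mark a placeholder. A short sketch (the names are illustrative only):

```python
color = "orange"

# Doubled braces become literal braces; {color} is substituted.
css = f".package {{ color: {color}; }}"
print(css)
```

Forgetting to double a brace in the CSS usually surfaces as a SyntaxError or NameError when the plugin runs, so it tends to be caught quickly.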

Some other small improvements I’ve made from here are a prettier way of showing the proto-path and running an HTML formatter over the generated output.

Conclusion

In this tutorial, you learned how to extract basic information from the CodeGeneratorRequest. You used that information to create a little summary website that shows these statistics.

In the next part, I will dig deeper into generating code for messages and services. Follow me to make sure you don’t miss it :-)

Written by David Groemling