What are source{d} Lookout analyzers and how to create one?

Published in sourcedtech · Mar 5, 2019

Earlier this month, we announced the release of a new format analyzer powered by Machine Learning in source{d} Lookout, our brand new assisted code review framework. source{d} Lookout is our first step towards a full suite of Machine Learning on Code applications. It’s a framework for developing and deploying new code analyzers for assisted code review on GitHub pull requests. Analyzers benefit from language-agnostic representations of source code, the Universal Abstract Syntax Trees (UASTs) available in source{d} Engine, which avoid the need for separate parsing steps.

For more information on source{d} Lookout and the underlying architecture, you can watch the video recording below from our last source{d} Online meetup.

Available Analyzers

This is the list of the known implemented analyzers for source{d} Lookout:

source{d} Lookout analyzers available as of 3/1/2019

While only a handful of analyzers are available at the moment, we’re actively working on adding more and invite developers to create analyzers for their own use cases. Here is a quick tutorial on how to create your own source{d} Lookout analyzer.

Implementing Your Own Analyzer

For a brief description of what an analyzer is, read the source{d} Lookout Analyzers documentation. If you are new to Protocol Buffers, refer to the official Protocol Buffers documentation to get started. To implement your own analyzer from scratch, you need to create a gRPC service implementing the Analyzer service interface:

service Analyzer {
  rpc NotifyReviewEvent (ReviewEvent) returns (EventResponse);
  rpc NotifyPushEvent (PushEvent) returns (EventResponse);
}

You can create a new analyzer in any language that supports protocol buffers, generating code from the .proto definitions. The resulting code will provide data access classes, with accessors for each field, as well as methods to serialize/parse the message structures to/from bytes.
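To make the shape of the service concrete, here is a minimal Python sketch of the two RPCs. It does not use gRPC or the real generated code: the dataclasses and their fields (base_hash, head_hash, comments, and so on) are hypothetical stand-ins for what protoc would generate from the .proto definitions, and the method names are simply snake_case versions of the RPC names.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the classes protoc would generate from the
# Lookout .proto files; the real messages have different, richer fields.
@dataclass
class Comment:
    file: str
    line: int
    text: str


@dataclass
class ReviewEvent:
    base_hash: str
    head_hash: str


@dataclass
class PushEvent:
    head_hash: str


@dataclass
class EventResponse:
    analyzer_version: str
    comments: List[Comment] = field(default_factory=list)


class MyAnalyzer:
    """Mirrors the two RPCs of the Analyzer service, without any gRPC wiring."""

    version = "0.1.0"

    def notify_review_event(self, event: ReviewEvent) -> EventResponse:
        # A real analyzer would inspect the changes between base and head.
        note = Comment(file="", line=0,
                       text=f"reviewed {event.base_hash}..{event.head_hash}")
        return EventResponse(self.version, [note])

    def notify_push_event(self, event: PushEvent) -> EventResponse:
        # This sketch posts no comments for push events.
        return EventResponse(self.version)


response = MyAnalyzer().notify_review_event(ReviewEvent("abc123", "def456"))
print(response.comments[0].text)
```

In a real analyzer, the same two methods would be registered on a gRPC server and would receive and return the generated message classes instead of these dataclasses.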

Caveats

All analyzers should take into account the caveats described in the SDK documentation.

Fetching Changes, UASTs or Languages from DataService

source{d} Lookout will take care of dealing with Git repositories, UAST extraction, programming language detection, etc. Your analyzer will be able to use the DataService to query all this data.

You can read more about it in the source{d} Lookout Server section.

How to Test an Analyzer Locally

Please refer to the lookout-sdk docs to see how to test an analyzer locally without accessing GitHub at all.

Using pre-generated Code from the SDK

If you’re creating your analyzer in Golang or Python, you’ll find pre-generated libraries in the lookout-sdk repository. The SDK libraries also come with helpers to deal with gRPC caveats.

The lookout-sdk repository contains a quickstart example, implemented in Go and in Python, of an analyzer that detects the language and the number of functions for every file.
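The idea behind such an analyzer can be sketched with the Python standard library alone. This is not the quickstart code from the SDK: it swaps UASTs for Python's ast module (so only Python files get a function count) and swaps real language detection for a hypothetical file-extension table. The helper names detect_language, count_functions and describe are made up for this illustration.

```python
import ast
from pathlib import PurePosixPath

# Hypothetical extension table; Lookout performs real language detection.
LANGUAGES = {".py": "Python", ".go": "Go", ".js": "JavaScript"}


def detect_language(path: str) -> str:
    """Guess the language from the file extension."""
    return LANGUAGES.get(PurePosixPath(path).suffix, "Unknown")


def count_functions(source: str) -> int:
    """Count function definitions, including methods and nested functions."""
    tree = ast.parse(source)
    return sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
               for node in ast.walk(tree))


def describe(files: dict) -> list:
    """Build one comment string per file."""
    comments = []
    for path, source in files.items():
        language = detect_language(path)
        if language == "Python":
            count = count_functions(source)
            comments.append(f"{path}: {language}, {count} function(s)")
        else:
            # Only Python can be parsed with the ast module in this sketch.
            comments.append(f"{path}: {language}")
    return comments


files = {
    "app.py": "def a():\n    pass\n\nclass B:\n    def c(self):\n        pass\n",
    "main.go": "package main\n",
}
for line in describe(files):
    print(line)
```

With UASTs, the same function count would work uniformly across languages, which is the main reason to rely on the data that source{d} Lookout provides rather than on per-language parsers.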

Machine Learning analyzers (Python)

There is an easier way to create new analyzers, tailored for Pythonistas: leverage the higher-level abstractions from the lookout-sdk-ml Python package. You won’t have to mess with gRPC internals, and you will get optional access to the rich MLonCode research ecosystem developed by source{d}. In brief, you need to implement two methods and you are ready to go:

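# Import paths as found in lookout-sdk-ml; verify against your installed version.
from lookout.core.analyzer import Analyzer
from lookout.core.data_requests import (
    with_changed_uasts_and_contents, with_uasts_and_contents)


class MyAnalyzer(Analyzer):
    @with_changed_uasts_and_contents
    def analyze(self, ptr_from, ptr_to, data_service, changes):
        ...  # TODO

    @classmethod
    @with_uasts_and_contents
    def train(cls, ptr, config, data_service, files):
        ...  # TODO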

Then watch how it works directly on GitHub:

analyzer package my_analyzer -u your_user -t your_token -r your/repo -y

Refer to the getting started guide for the complete instructions. This is how we actually developed the announced format analyzer.
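The split between the two methods, train learning something from the repository and analyze applying it to a change, can be illustrated with a toy example that has no dependency on lookout-sdk-ml. Everything here is an assumption made for illustration: ToyAnalyzer, IndentModel and the simplified signatures (plain dictionaries of path to content instead of pointers, a data service and UASTs) are hypothetical, and the "model" is just the most common indentation character.

```python
from collections import Counter


class IndentModel:
    """Toy 'model': the most common indentation character seen in training."""

    def __init__(self, indent_char: str):
        self.indent_char = indent_char


class ToyAnalyzer:
    def __init__(self, model: IndentModel):
        self.model = model

    @classmethod
    def train(cls, files: dict) -> IndentModel:
        # Learn the dominant indentation character from the existing files.
        counts = Counter(
            line[0]
            for content in files.values()
            for line in content.splitlines()
            if line[:1] in (" ", "\t")
        )
        most_common = counts.most_common(1)
        # Fall back to a space when no indented line was seen.
        return IndentModel(most_common[0][0] if most_common else " ")

    def analyze(self, changes: dict) -> list:
        # Flag changed lines whose indentation differs from the learned one.
        comments = []
        for path, content in changes.items():
            for number, line in enumerate(content.splitlines(), start=1):
                if line[:1] in (" ", "\t") and line[0] != self.model.indent_char:
                    comments.append(
                        (path, number, "unexpected indentation character"))
        return comments


model = ToyAnalyzer.train({"a.py": "def f():\n    return 1\n"})
print(ToyAnalyzer(model).analyze({"b.py": "def g():\n\treturn 2\n"}))
```

The real format analyzer learns far richer formatting rules, but the flow is the same in spirit: a model is trained on the repository's existing code and then used to comment on incoming changes.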

Learn more about source{d} Lookout and MLonCode: