Chapter-3: The MILESTONE

Published in Outreachy Diary, Feb 28, 2018 (9 min read)

Here comes the description of what is indeed the jewel in the crown of my Outreachy internship!

A special thanks to my mentor Adam Michael Wood and co-ordinator Hélène Martin for making the implementation a cakewalk for me!

This one’s for all those people who are looking out for a custom style guide testing tool for their documentation.

To begin with, if you are looking for a tool that checks a bunch of spelling and grammar rules, Proselint is a great one. Apart from that, if you are using Sphinx for your documentation, you can use the Sphinx spelling checker to get rid of the spelling errors that have sneaked into your docs.

I incorporated both of these in the docs. However, the docs follow a custom style guide, and we wanted a tool that checked for the rules defined in it.

What we needed was a tool that could extend Proselint and define custom checks.

So, let us take a brief tour through the code structure of Proselint. Proselint uses various functions to check for the existence of a pattern in text, the consistency of the words used, and the preferred usage of words. You can have a look at them here. Then, there is a set of tests which check for various grammatical issues using these functions. To enable or disable any test, you set the dictionary entry for the test name to true or false by modifying proselintrc. So, in simple terms, Proselint defines some tests which can be enabled or disabled, and these tests use some other functions for checking existence or preference.

Another important thing is the use of regular expressions in most of the tests, which I was going to use in the custom checks as well. So, if you are new to regular expressions, go through this guide to become an expert.

Now, let us dive into the process of making a tool for a custom style guide check. I am not walking through the code line by line here; I am focusing on describing the logic, along with screenshots of code!

Step 1: Defining a custom test

This is an example of how to define a custom test. I followed the same pattern as the tests defined in proselint. The best way to get to know this pattern is to go through the proselint tools and tests. I will explain a custom test here which checks for a missing space after the end of a sentence.

Custom test to check missing space after sentence end

The test uses two utility functions in proselint: memoize to cache the result for faster execution and existence_check for finding the pattern in the given text file.

The function existence_check returns a list of tuples containing information about the error and the test: [(start, end, err, msg, replacement)]. Here start and end denote the position of the matched text, err is the name of the test, msg is the message to be displayed and replacement is the preferred form, if one is given. For existence_check, the replacement returned is None. The custom function simply returns this same list.
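To make the shape of such a test concrete, here is a minimal, self-contained sketch. It does not import proselint: the existence_check below is a simplified stand-in written for this example (it reports zero-based regex offsets, which may differ from what the real helper reports), functools.lru_cache stands in for memoize, and the check name and the regular expression are hypothetical.

```python
import functools
import re


def existence_check(text, patterns, err, msg):
    """Simplified stand-in for proselint's helper of the same name (assumption)."""
    errors = []
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            # same tuple shape as described above; replacement is always None
            errors.append((m.start(), m.end(), err, msg.format(m.group(0)), None))
    return errors


@functools.lru_cache(maxsize=None)  # stand-in for proselint's memoize
def check_missing_space(text):
    """Flag a sentence end that is directly followed by the next sentence."""
    err = "style-guide.missing-space"  # hypothetical check name
    msg = "Missing space after sentence end in '{}'."
    # lowercase letter, end punctuation, then an uppercase letter with no space
    patterns = (r"[a-z][.?!][A-Z]",)
    return existence_check(text, patterns, err, msg)


for error in check_missing_space("It works.Next one is fine. Good."):
    print(error)
```

A pattern this simple will also flag things like file names or abbreviations that happen to match, so a real check would likely need a few exclusions.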

You can check the code for all the utility functions in proselint here.

Step 2: Copy over custom tests and enable them

After defining all custom tests in a new folder, we need to copy the custom tests folder over to the proselint tests folder and then enable the tests in proselintrc. The copy can be done easily in Python with shutil.copytree. Before each copy, the previous copy is removed using shutil.rmtree.

All checks reside in the folder style guide which was copied over
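The remove-then-copy step can be sketched as follows. The function name, its parameters and the default folder name style_guide are assumptions made for this example; the destination is whatever directory your proselint installation keeps its checks in.

```python
import shutil
from pathlib import Path


def copy_custom_checks(custom_dir, proselint_checks_dir, folder_name="style_guide"):
    """Replace any earlier copy of the custom checks with a fresh one."""
    target = Path(proselint_checks_dir) / folder_name
    if target.exists():
        shutil.rmtree(target)  # remove the previous copy first
    shutil.copytree(custom_dir, target)  # copytree creates the target folder itself
    return target
```

Removing the old copy first matters because shutil.copytree fails by default if the target already exists, and because stale check files from an earlier run would otherwise linger.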

Now, to enable or disable the tests, you can keep a dictionary mapping each of your tests to true or false and copy it over to proselintrc. Again, we need to be careful to remove any previous mapping. That can be done by looking for the first occurrence of your tests and then ignoring any further lines.

The text file contains a mapping for all tests like: "style-guide.check-comma": true,
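Here is one possible way to refresh that mapping. Note that it swaps in a different technique from the line-based approach described above: it assumes proselintrc is a JSON file with a top-level "checks" object and edits it with the json module. The function name and the prefix parameter are hypothetical.

```python
import json


def enable_checks(rc_path, mapping, prefix="style-guide."):
    """Rewrite the check mapping in a proselintrc-style JSON file."""
    with open(rc_path) as f:
        config = json.load(f)
    checks = config.setdefault("checks", {})  # assumed layout of the file
    # drop earlier entries for our checks so stale names do not linger
    for name in [n for n in checks if n.startswith(prefix)]:
        del checks[name]
    checks.update(mapping)
    with open(rc_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Mapping a check name to False with the same function disables that check, so one helper can cover both enabling custom checks and switching off unwanted ones.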

All said and done, it would be way easier if Proselint provided a way to add custom tests and a custom path to proselintrc. I have filed a PR for that. Hopefully it gets merged soon and we can then safely skip this step!

It looks like we are done: we can now easily add custom tests and follow the style guide. But apart from this, there is a lot more to be done!

Step 3: Add feature to allow the linter to ignore text

I used comments in Sphinx to add this feature. Any piece of text enclosed between the comments .. startignore and .. endignore is ignored by the linter. Some of the Sphinx directives, links, references, etc. were to be ignored as well. To check for the directives, I defined a list of directive names, compared each line against it and then ignored the indented blocks that followed. Any text enclosed within backticks was checked in a similar way.

Replacing ignored lines with **ignore_line\n**

Another thing to keep in mind is not to remove the ignored lines, as that may affect the row and column numbers of other results. Instead, you can replace each ignored line with some other text ending in a newline character. Apart from ignoring some text, directives and inline literals, I created a utility function to mask quote marks in the text, since proselint does not check any text that is surrounded by quotes. The quote marks are replaced with asterisks rather than deleted, so the row and column of other matches are not affected. This might not be necessary: if you do not want the text within quotes to be checked, this utility is not needed.
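The two masking ideas above can be sketched like this. The function names are hypothetical, and the sketch only handles the .. startignore / .. endignore markers, not directives or backticks.

```python
PLACEHOLDER = "ignore_line\n"


def mask_ignored_blocks(lines):
    """Swap every line between the ignore markers for a placeholder."""
    masked = []
    ignoring = False
    for line in lines:
        stripped = line.strip()
        if stripped == ".. startignore":
            ignoring = True
            masked.append(PLACEHOLDER)
        elif stripped == ".. endignore":
            ignoring = False
            masked.append(PLACEHOLDER)
        elif ignoring:
            masked.append(PLACEHOLDER)
        else:
            masked.append(line)
    return masked  # same number of lines as the input


def mask_quotes(line):
    """Replace quote marks with asterisks, keeping column positions intact."""
    return line.translate(str.maketrans({'"': "*", "\u201c": "*", "\u201d": "*"}))
```

Because both helpers replace text one-for-one (a line for a line, a character for a character), the row and column numbers reported later still point at the right place in the original file.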

Step 4: Classify errors and warnings

The next step is to decide which of the tests are classified as errors and which as warnings. The tests whose results we are pretty sure are correct can be classified as errors. I prepared a list of test names which were to be considered errors and then checked against it while generating the output.

Step 5: Add feature to exclude proselint checks

Some of the tests in proselint duplicated some of the custom checks. Also, there were checks which were not relevant to our documentation. So, they were to be disabled in proselintrc. I defined a function which contains a list of checks to be disabled and then maps them to false in proselintrc. This list can be extended whenever we want a test to be excluded.

Step 6: Add feature for automatic fixing of errors

All the checks classified as errors provide a replacement depending on the preferred usage. This replacement can be used for fixing the errors.
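As an illustration of how replacements might be applied, here is a small sketch. It assumes each fix is a (start, end, replacement) tuple with zero-based character offsets into the text, end exclusive, and that fixes do not overlap. The function name and the tuple layout are assumptions for this example.

```python
def apply_fixes(text, fixes):
    """Apply non-overlapping (start, end, replacement) fixes to text."""
    # work from the end of the text so earlier offsets stay valid
    for start, end, replacement in sorted(fixes, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


print(apply_fixes("alot of docs", [(0, 4, "a lot")]))
```

Applying the fixes back to front is the important part: a replacement that is longer or shorter than the matched text would otherwise shift every offset after it.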

Step 7: Add feature to produce formatted output on terminal

The next step is to ensure that proper output is generated on the terminal. I generated colored output using the blessings package.

The procedure was to get an error list for each file being tested, and then display the errors in the format shown in the image above.

The row and column numbers were used to extract some text from the file, to give the user better insight.

Utility function to obtain a portion of matched text

After displaying all errors and warnings, the total count of errors and warnings is displayed as well.
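A utility of this kind could look roughly like the sketch below. It assumes the row and column are 1-based, and the function name and the width parameter are hypothetical.

```python
def get_excerpt(lines, row, col, width=30):
    """Return up to `width` characters of line `row`, starting at column `col`.

    row and col are assumed to be 1-based.
    """
    line = lines[row - 1].rstrip("\n")
    start = max(col - 1, 0)
    return line[start:start + width]
```

For example, with the lines of a file read via readlines(), get_excerpt(lines, 2, 5) returns the text starting at the fifth character of the second line.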

Utility function to display error and warning count

Step 8: Add a feature to bind all utilities and run the checks

Finally, we need a function which accepts the paths to run checks on and then performs the testing on the list of files specified. It uses the other utility functions to obtain a path list, ignore lines, classify output and produce a final error list which is used for output generation on terminal. It also produces a list of fixable errors which is used for automatic fixing of errors.

Let me explain the functioning in two chunks:

Part 1: Perform testing

The code above performs the following tasks:

Obtains a list of complete paths using the filenames provided by the user. Another utility function is used to generate a valid path list.

Utility function to obtain a valid path list

Obtains a list of check names classified as errors.
Reads the file line by line and removes the ignored blocks.
Imports the extra checks, which run independently without using the built-in proselint functions, and then runs them.
Runs the custom proselint-dependent checks after removing quote marks, using the lint function in proselint, which generates a richer error list providing row, column, severity and replacement. (Proselint-independent checks are defined to return a similar list for consistency.)
Sorts the errors according to row and column.
Note that a temporary file is created to be given as input to the checks, because the checks are defined to accept a file as input instead of a list.

Utility function to generate a temporary file for testing
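Writing the processed lines to a temporary file might be done along these lines. The function name and the .rst suffix are assumptions; the caller is responsible for deleting the file afterwards.

```python
import os
import tempfile


def write_temp_file(lines, suffix=".rst"):
    """Write the processed lines to a temporary file and return its path."""
    # delete=False keeps the file on disk after it is closed
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as tmp:
        tmp.writelines(lines)
        return tmp.name


path = write_temp_file(["First line.\n", "ignore_line\n"])
print(path)
os.remove(path)  # clean up once the checks have run
```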

Part 2: Error list and fix list generation

Proselint specifies the severity of all checks as warning. So, we need to redefine the severity based on the checks classified as errors.
The final error list also contains the filename, to be displayed in the output.
A list of fixable errors (those for which a valid replacement is present) is prepared.
The utilities are used to display results and fix errors, depending on the user's request.
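The list-building steps above can be sketched as follows. The sketch assumes each lint result is a tuple of the form (check, message, line, column, start, end, extent, severity, replacement). Treat that layout, the function name and the shape of the output tuples as assumptions for this example.

```python
def build_lists(filename, lint_results, error_checks):
    """Return (error_list, fix_list) for one file."""
    error_list, fix_list = [], []
    # sort by row, then column
    for result in sorted(lint_results, key=lambda r: (r[2], r[3])):
        check, message, line, column, start, end, _extent, _severity, replacement = result
        # the incoming severity is ignored and redefined from our own list
        severity = "error" if check in error_checks else "warning"
        error_list.append((filename, check, message, line, column, severity, replacement))
        if severity == "error" and replacement is not None:
            fix_list.append((start, end, replacement))
    return error_list, fix_list
```

Keeping the fix list separate from the display list means the output code never has to know about character offsets, and the fixing code never has to know about severities.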

Step 9: Addition of special features (optional)

Additional features:

Running tests only on files modified by the user, i.e. using the output of git diff --name-only as the path input. I used the GitPython library for this purpose.

Generating CSV output using Python's csv module.

Generating an error list without any other output. Run the checks with disp and fix arguments as false.

Literate programming: Define the code in the same place where you define the rules, and then parse the code-blocks to generate all checks. Any checks which are too lengthy to be defined with the rules can be written in a separate file, with a small snippet added to the rules.

Parse code-blocks for proselint dependent and independent checks into two separate files

These scripts are removed after testing is completed, unless the user asks to keep them for debugging purposes.

Utility function to remove scripts generated after code-blocks parsing
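To give an idea of the code-block parsing mentioned above, here is a minimal sketch that pulls the bodies of .. code-block:: directives out of reStructuredText. It is not a full reST parser, it does not split the blocks into proselint-dependent and independent files, and the function names are hypothetical.

```python
import textwrap


def _finish(block_lines, blocks):
    """Dedent a collected block and keep it if it is not empty."""
    body = textwrap.dedent("\n".join(block_lines)).strip()
    if body:
        blocks.append(body)


def extract_code_blocks(rst_text, language="python"):
    """Collect the bodies of code-block directives for one language."""
    marker = ".. code-block:: " + language
    blocks = []
    current = None  # lines of the block being collected, or None
    for line in rst_text.splitlines():
        if line.strip() == marker:
            if current is not None:
                _finish(current, blocks)
            current = []
        elif current is not None:
            if not line.strip() or line[0] in " \t":
                current.append(line)  # blank or indented: still in the block
            else:
                _finish(current, blocks)  # unindented text ends the block
                current = None
    if current is not None:
        _finish(current, blocks)
    return blocks
```

The returned strings can then be written to a script file, which is the part that would be removed again after testing.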

Step 10: Implement a CLI for style testing

I used the Click package to implement a command line interface for style testing.

Define context settings to enable help option.

CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help'])
@click.command(context_settings=CONTEXT_SETTINGS)

Add options for:

Running testing based on git diff
Automatic fixing of errors
Generating csv output after accepting an output path
Storing scripts generated by parsing code-blocks

Add argument to accept file paths for testing.

@click.argument('in_path', nargs=-1, type=click.Path())
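Put together, the pieces above could form a command like the following sketch. It requires the Click package. The option names (--diff, --fix, --out-path, --store), the function name and the echo in the body are placeholders chosen for this example, not the options of the actual script.

```python
import click

CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])


@click.command(context_settings=CONTEXT_SETTINGS)
@click.option("--diff", "-d", is_flag=True, help="Test only files reported by git diff.")
@click.option("--fix", "-f", is_flag=True, help="Fix errors automatically.")
@click.option("--out-path", "-o", type=click.Path(), help="Write CSV output to this path.")
@click.option("--store", "-s", is_flag=True, help="Keep scripts generated from code-blocks.")
@click.argument("in_path", nargs=-1, type=click.Path())
def style_test(in_path, diff, fix, out_path, store):
    """Run the style guide checks on IN_PATH."""
    # placeholder body: a real tool would call the testing function here
    click.echo(
        "paths=%s diff=%s fix=%s out_path=%s store=%s"
        % (list(in_path), diff, fix, out_path, store)
    )
```

To use it as a script, call style_test() at the bottom of the file; running the script with -h or --help then prints the generated help text.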

Boom! There you go! A tool is ready to perform custom style guide testing.

You can refer to the complete script here.

Some tips:

It is not necessary to know the syntax of a language to implement something. It is necessary to have a clear idea of what you want to implement and the determination to do it.
Googling things and trying them out is the best way to learn.
Asking your questions without any hesitation solves most of the troubles.
Try to explore this tool and implement it in an even better way if your documentation needs this.

Feel free to share your reviews and queries and stay tuned for Chapter-4!!!