PHP static code analysis based on the example of PHPStan, Phan and Psalm

Published in

Bumble Tech

24 min readDec 4, 2018

As a company Badoo has been around for over 12 years now. We have a lot of PHP code (millions of lines of code) and there are probably still strings of code that we use which were written all 12 years ago. Some of them were written back in the time of PHP 4 and PHP 5. We deploy code twice a day and each time about 10–20 tasks are involved. Besides that, our programmers sometimes deploy urgent patches that require small changes. And each day, we can get up to 20 patches. All in all, our code changes pretty continuously.

We are always looking out for ways to speed up development as well as improve code quality. So, one day, we decided to introduce static code analysis. Scroll down to find out how this worked out for us.

Strict types: why we don’t use them yet

One day a discussion came up on our corporate PHP chat: one of our new joiners was telling us how, at their previous job, they had introduced obligatory strict_types + scalar type hints for the whole code — and this had significantly lowered the number of bugs at the production stage.

Most of the veterans on internal chat came out against the idea. Their main reason was that PHP doesn’t have a compiler to check at the compilation stage that all the types in the code match up. The risk is that if you don’t have 100% testing coverage for your code, errors could crop up in production — obviously something we want to avoid.

Of course, strict_types will find some bugs due to types mismatch and to the fact that PHP ‘silently’ performs type conversion. However, many experienced PHP programmers already know how the type system works in PHP, the rules for type conversion and in most cases their code works fine.

However, we did like the idea of having a system that identified types mismatches in our code, so we started to think about finding alternatives to strict_types.

At first, we even thought of patching PHP. If a function accepts a given scalar type (let’s say, int), and, if a different scalar type (for example, float) comes in, what we wanted was not for a TypeError to be thrown up (which is essentially an exception), but instead for type conversion to occur and the event to be logged in the error.log. This would allow us to locate all the places where we have made incorrect assumptions regarding types. In the end, though, we decided that a patch of this kind might be risky, and that problems might also arise with external dependencies which are unprepared for this kind of behaviour.

So, we gave up on the idea of patching PHP. However, the timing of all this coincided with the first releases of the static analyser Phan in which the first commits were made by Rasmus Lerdorf himself. And this is how we came up with the idea of giving static code analysers a try.

What’s static code analysis?

Static code analysers simply read code and try to find errors. They are able to perform both very simple and straightforward tests (for example, for the existence of classes, methods and functions), as well as ‘smarter’ tests (such as finding non-matching types, race conditions or vulnerabilities in the code). The key is that analysers do not run the code; rather, they analyse the program text and test it for typical (or not-so-typical) errors.

The most obvious example of a PHP code static analyser is the inspections in PHPStorm: when you write code, it highlights wrong function and method calling, non-matching types and so on. PHPStorm does this without running your PHP code but simply analyses it.

I should point out that in this article we are discussing analysers which look for errors in the code. There is another class of analysers which checks the code style, cyclomatic complexity, method size, code line length, etc. This latter class of analysers is not being covered here.

That being said, not everything found by the analysers is actually an error. By ‘error’ I mean code which generates a ‘wrong’ response at production. Very often what analysers find is merely an inaccuracy. For example, PHPDoc may show an incorrect parameter type but such an inaccuracy would not affect the working code. However, clearly the code is going to evolve over the time and so, at some point a future programmer might let an error slip through.

Existing PHP code analysers

There are three popular PHP code analysers:

1. PHPStan

2. Psalm

3. Phan

Exakat is another but we have not tried it yet.

From the user’s perspective, all three analysers are the same: you install them (probably via Composer), and configure them, at which point you can then run an analysis of your whole project or a group of files. As a rule, the analyser is able to nicely export the results to the console. The results can also be exported in JSON format and used in CI.

All three projects are now actively developing. Their maintainers respond very actively to issues on GitHub — often you can expect an answer within 24 hours. This is very helpful because many of the bugs we identified were fixed within a couple of days. I particularly like the fact that the maintainers of the projects actively interact among each other, report bugs and sending pull requests.

We have introduced, and now use, all three analysers. They each have their issues and their own bugs but using all three analysers at the same time makes it easier to understand what represents a real problem and what constitutes a false positive.

What analysers can do

The three analysers share a lot of features so we shall first consider these before moving on to explore the specifics of each one.

Standard checks

Of course, the analysers perform all the standard checks on the code, to ensure that:

there are no syntax errors;
all the classes, methods, functions and constants exist;
the variables exist;
the hints in PHPDoc correspond to reality;
there are no arguments or variables unused.

Many of these errors inevitably lead to ‘fatals’ in the code.

At first glance you might think that good programmers wouldn’t make such errors, but sometimes we are in a hurry, sometimes we copy-paste and sometimes we are just careless. In all such cases these checks are very much a lifeline.

Data type checks

Of course, static analysers also perform standard checks related to data types. If it is written in the code that a function accepts, say, int, then the analyser will check that there aren’t any places where an object has been sent to this function. Most analysers allow you to configure the level of strictness of checking and imitate strict_types: they check that String or Boolean aren’t passed to this function.

Besides standard checking, analysers are also designed to do a whole lot more.

Union types

All three analysers support the Union types concept. Let’s say you have a function like this:

/**  
 * @var string|int|bool $yes_or_no
 */
function isYes($yes_or_no) :bool 
{
    if (\is_bool($yes_or_no)) {
         return $yes_or_no;
     } elseif (is_numeric($yes_or_no)) {
         return $yes_or_no > 0;
     } else {
         return strtoupper($yes_or_no) == 'YES';
     }
}

The content is not that important; what is important is the type of incoming parameter string|int|bool, that is to say the variable $yes_or_no — either a string or an integer number or a boolean.

It is not possible to describe this type of function parameter using PHP. However, this is possible in PHPDoc and many editors (such as PHPStorm) understand it.

In static analysers this type is called a union type, and they are very good at testing this type of data. For example, if we had written the function referred to above in the following way (without checking for boolean):

/**  
 * @var string|int|bool $yes_or_no
 */
function isYes($yes_or_no) :bool 
{
     if (is_numeric($yes_or_no)) {
         return $yes_or_no > 0;
     } else {
         return strtoupper($yes_or_no) == 'YES';
     }
}

The analysers would have noticed that either a String or Boolean might be passed to strtoupper and would return an error: you cannot pass Boolean to strtoupper.

This type of test helps programmers to correctly process errors and situations when a function cannot return data. After all, we often write functions which may return some data or null:

// load() returns null or \User object \User
$User = UserLoader::load($user_id);
$User->getName();

For such a piece of code the analyser provides a hint that the variable $User here may be equal to null and therefore this code may lead to fatals.

‘False’ type

The PHP language itself contains lots of functions which may either return a given value or a ‘false’. If we had written this function, how would we document its type?

/** @return resource|bool */
 function fopen(...) {
        …
 }

In formal terms, everything here is basically correct: fopen returns either resource or a false value (of Boolean type). But when we say that a function returns a given type, this means that it could return any value from a set belonging to this type. In our example, for the analyser this means that fopen()may also return true. And, for example, in the case of the following code:

$fp = fopen(‘some.file’,’r’);
if($fp === false) {
      return false;
}
fwrite($fp, "some string");

the analysers would complain that fwrite accepts resource as the first parameter, and we pass it bool (because the analyser sees that true is possible). For this reason all analysers understand this ‘artificial’ type as false, and in our example we can write @return false|resource. PHPStorm also understands this type.

Array shapes

Very often arrays in PHP are used as a record type: a structure with a defined list of fields, where each field has its own type. Of course, lots of programmers already use classes for this. But we have a lot of legacy code at Badoo, and it uses arrays a lot. It also happens that programmers may be too lazy to set up a separate class for a given one-off structure and, in these instances, arrays are also often used.

The problem with these arrays is that there is no clear description of this structure (list of fields and their types) in the code. Programmers can make mistakes working with such structures, perhaps forgetting mandatory fields or adding ‘random’ keys, making the code even messier.

Analysers allow you to define structures for the arrays:

/** @param array{scheme:string,host:string,path:string} $parsed_url */
function showUrl(array $parsed_url) { … }

In this example, we described the array with three string fields: scheme, host and path. If, within the function we call another field, the analyser will show an error.

If you don’t describe types, the analysers will try to ‘guess’ the structure of the array. However, experience has shown us that, at least in the case of our own code, they are not very good at guessing :)

This approach has one main drawback. Let’s say that we have a structure which is used a lot in the code. You can’t identify a pseudotype in one place and then use it everywhere. You have to specify PHPDoc everywhere in the code with a description of the array. This is very inconvenient, especially if the array has lots of fields. It will also be a problem to edit (adding or removing fields).

Description of array key types

PHP arrays can contain integer and string keys. Sometimes types can be important for static analysis (and for the programmer as well). Static analysers allow you to describe array keys in PHPDoc:

/** @var array<int, \User> $users */
$users = UserLoaders::loadUsers($user_ids);

In this example, with the help of PHPDoc, we added a hint that in the $users array the keys are ints and the values are \User class objects. We could describe the type as \User[], this would tell the analyser that there are \User class objects in the array, but wouldn’t tell us anything about the type of keys.

PHPStorm supports this format for describing arrays starting from version 2018.3.

Your own namespace in PHPDoc

PHPStorm (and other editors too) and static analysers may understand PHPDoc in different ways. For example, analysers support the following format:

/** @param array{scheme:string,host:string,path:string} $parsed_url */ 
function showUrl($parsed_url) { … }

But PHPStorm does not understand it. However, we can write the following:

/** 
 * @param array
 * @phan-param array{scheme:string,host:string,path:string} $parsed_url
 * @psalm-param array{scheme:string,host:string,path:string} $parsed_url  
*/
function showUrl($parsed_url) { … }

In this case, the analysers and PHPStorm will both be happy. PHPStorm will use @param, and analysers will use their PHPDoc tags.

Tests related to the distinctives of PHP

This type of test is best explained using an example.

Do we all know what the explode() function may return? If you skim-read the documentation, it seems to return ‘array’. However, closer study reveals that it can also still return ‘false’. In actual fact, it may also return ‘null’ and ‘error’ if the wrong types are passed to it, but passing the wrong value with the wrong type of data already constitutes an error. And this is why we’re not interested in this option at the moment.

Formally, from the point of view of the analyser, if a function may return ‘false’ or ‘array’, most likely, there then has to be a check for false in the code. However, the explode() function only returns ‘false’ if the separator (first parameter) is equal to an empty line. Often it is explicitly stated in the code and analysers can check that it is not empty, which means that in the present place the explode() function definitely returns ‘array’ and there is no need to check for ‘false’.

PHP has no shortage of these kinds of peculiarities. Analysers gradually add the relevant checks or enhance them so we programmers don’t have to remember all these peculiarities.

Let’s now move on to the description of the individual analysers.

PHPStan

PHPStan was developed by a certain Ondřej Mirtes from the Czech Republic. It has been under active development since the end of 2016.

In order to start using PHPStan, you need to:

Install it (the simplest way is via Composer).
Configure it (optional).
Run it (in the simplest case:

vendor/bin/phpstan analyse ./src

(instead of src there may be a list of specific files which you would like to check).

PHPStan will read the PHP code from the passed files. If it encounters unknown classes, then it will try to autoload them and understand their interface by reflection. You can also configure a path to the Bootstrap file via which you can configure autoload, as well as include() some additional files in order to simplify the PHPStan analysis.

Key features:

You don’t have to analyse the whole code base, but just a part of it: PHPStan will try to autoload unknown classes.
If, for any reason, some of your classes are not in autoload, PHPStan will not be able to find them and will return an error.
If you actively use magical methods via __call / __get / __set, you can write a plug-in for PHPStan. Plug-ins are already available for Symfony, Doctrine, Laravel, Mockery and others.
In actual fact, PHPStan doesn’t only perform autoload in the case of unknown classes, but it also does so for all classes. We have a lot of old code which was written before the appearance of anonymous classes. We create a given class in one file and then instance it straightaway and, possibly, also call some methods. If you autoload (include) these files, it leads to errors because the code is not being run in its usual environment.
It uses neon-format for configuration (I’m not aware of this format being still use anywhere else).
There is no support for its PHPDoc tags @phpstan-var, @phpstan-return etc.
PhpStan has a playground website https://phpstan.org/. This is very useful for bug reports: you can recreate the error and provide a link to GitHub.

Another feature is that errors have text, but don’t have a type. In other words, the text of the error is returned to us, e.g.:

Method \SomeClass::getAge() should return int but returns int|null
Method \SomeOtherClass::getName() should return string but returns string|null

This example contains two errors which are basically roughly the same: the method should return one particular type and actually it returns another one. But despite being similar the errors have different texts. For this reason, if you want to filter off some errors in PHPStan, do so using regular expressions only.

By way of comparison, errors have a type in other analysers. For example, in Phan this error is of type PhanPossiblyNullTypeReturn, and you can specify in the configuration that you don’t need to check for this error type. Also, for example, if errors have a type, it is easy to collect error statistics.

Since we don’t use Laravel, Symfony, Doctrine and similar solutions, and since we don’t use magical method actively, in our case it turns out that the main feature of PHPStan isn’t something we actually need (moreover, since PHPStan includes all classes to be checked, sometimes its analysis simply doesn’t work with our code base).

Nevertheless, we still find PHPStan useful:

When checking several files, PHPStan is noticeably faster than Phan and a little faster (20–50%) than Psalm.
PHPStan reports make it simpler for us to find cases of false-positive in other analysers. Usually, if a fatal is clearly identified in a code, it is shown by all the analysers (or by at least two out of three).

Phan

Developed by the Etsy company. First commits by Rasmus Lerdorf.

Of the three analysers being considered, Phan is the only genuine static analyser (in the sense that it doesn’t run any of your files; it parses all your code base before analysing what you tell it to analyse). Even for the purpose of analysing several files in our code base it requires about 6 Gb of RAM, and the process takes about 4–5 minutes to complete. On the other hand, a full analysis of the whole code base takes 6–7 minutes. For comparison, Psalm requires tens of minutes to perform this same task. As for PHPStan, we couldn’t get it to perform a full analysis of the whole code base at all, because it includes classes.

Phan has pluses and minuses. On the one hand, it is the highest quality and most stable analyser available, it finds lots of things and creates the fewest problems when analysing a whole code base. But on the other hand, it has two unfortunate features.

Firstly, ‘under the bonnet’ Phan uses the php-ast extension. It would seem that this is one of the reasons why Phan is able to analyse the whole code base relatively quickly. However, php-ast displays the AST tree internally in such a way that it is displayed within PHP itself. And within PHP itself the AST tree does not contain information about comments inside the function. In other words, if you have written something like this:

/**
 * @param int $type
 */
function doSomething($type) {
    /** @var \My\Object $obj **/
    $obj = MyFactory::createObjectByType($type);
    …
}

then the AST tree contains information as to the external PHPDoc for the doSomething() function, but there is no PHPDoc hint information inside the function. Correspondingly, Phan knows nothing about it either. This is the most common reason for a false-positive in Phan. There are recommendations available as to how to place hints (inside Strings or via asserts), but unfortunately, they are very different to what our programmers are used to. In part, we solved this problem by writing a plugin for Phan. We will discuss about plugins below.

The second unfortunate particularity of Phan is that it is poor at analysing the properties of objects. Here is an example:

class A {
    /**
     * @var string|null
     */
     private $a;
 
     public function __construct(string $a = null)
     {
        $this->a = $a;
     }
 
     public function doSomething()
     {
        if ($this->a && strpos($this->a, 'a') === 0) {
             var_dump("test1");
        }
     }
}

In this example Phan will tell you that you can send a null to strpos. You can find out more about this problem here: https://github.com/phan/phan/issues/204.

Summing up: despite some complications, Phan is a really great, useful tool. Apart from these two types of false-positive, it almost never makes an error or, if it does, it does so on very complex code. We also liked the fact that configuration is inside the PHP file; this provides a certain degree of flexibility. Phan is also able to function as a language server although we have not used this feature, PHPStorm being sufficient for what we needed.

Plugins

Phan has a well-developed API for developing plugins. You can add your own checks and improve type inference for your code. This API has documentation, but a particularly great aspect is that Phan already contains ready-to-use operational plugins, which can be used as examples.

We managed to write two plugins. The first was intended for one-off checks. We wanted to assess the extent to which our code was ready for PHP 7.3 (in particular to find out whether or not a case-insensitive constant exists). We were practically certain that there weren’t any such constants, but anything can happen in 12 years and so it was worth checking. So, our first Phan was designed to kick up a fuss if a third parameter were used in define().

The plugin is very simple.

<?php declare(strict_types=1);use Phan\AST\ContextNode;
use Phan\CodeBase;
use Phan\Language\Context;
use Phan\Language\Element\Func;
use Phan\PluginV2;
use Phan\PluginV2\AnalyzeFunctionCallCapability;
use ast\Node;class DefineThirdParamTrue extends PluginV2 implements AnalyzeFunctionCallCapability
{
 
    public function getAnalyzeFunctionCallClosures(CodeBase $code_base) : array
   {  
     $define_callback = function ( 
                CodeBase $code_base, 
                   Context $context, 
                   Func $function, 
                   array $args
      ) {
          if (\count($args) < 3) {    
              return;
           } 
           $this->emitIssue(   
                $code_base,
                $context,
                'PhanDefineCaseInsensitiv',
                'Define with 3 arguments',
                []
          );
      };
      return [   
          'define'  => $define_callback,
      ];
     }
}return new DefineThirdParamTrue();

In Phan different plugins may be used for different events. In particular, plugins with the AnalyzeFunctionCallCapability interface will activate when ‘call function’ is analysed. We built this plugin in such a way that, when the define() function is called, our anonymous function is called, which checks that define() doesn’t have more than two arguments. Then we simply run Phan, find all the places where define() was called with three arguments and check that we don’t have case-insensitive-constants.

With the help of the plugin we also partly solved the false-positive problem, when Phan doesn’t see PHPDoc hints inside the code.

We often use factory methods which we pass that constant and it creates an object based on it. Often the code looks something like this:

/** @var \Objects\Controllers\My $Object */
$Object = \Objects\Factory::create(\Objects\Config::MY_CONTROLLER);

Phan doesn’t understand these PHPDoc hints, however in this code the class of the object may be obtained from the name of the constant passed to the create()method. Phan allows you to write a plugin which activates when the returned value of the method is analysed. And, with the help of this plugin, you can let the analyser know which particular type the method returns.

Our plugin is quite complicated for the article, but you will find a good example in Phan’s github.

Overall, we are very happy with the Phan analyser. We learned how to filter out the cases of false-positive listed above (in simple cases, with simple code) and having done this, Phan became for us an almost exemplary analyser. However, the fact that the whole code base needs to be parsed straightaway (taking time and a lot of memory) continues to complicate the process of introducing this analyser.

Psalm

Psalm was developed by the Vimeo company. To be honest, I didn’t even know that PHP was used in Vimeo until I discovered Psalm.

This analyser is the youngest of our threesome. When I first read the news that Vimeo had released Psalm, I was surprised. Why invest resources in developing Psalm, when Phan and PHPStan already existed? But it turns out that Psalm has its own useful features.

Psalm has followed in the footsteps of PHPStan: you can give it a list of files to analyse and it will analyse them, and it will include classes it can’t find using autoload. At the same time, it only include() classes it hasn’t found, while files which we asked it to analyse will not be included (this is the difference between Psalm and PHPStan). The config is saved in an XML file — for us this is more of a drawback, but not really critical.

Psalm has a site with a ‘playground’ where you can write code in PHP and analyse it. This is very convenient for bug reports: you can recreate the error on the site and provide a link to GitHub. Incidentally, the site describes every possible type of error (by way of comparison, in PHPStan errors do not have types, and Phan does have types but doesn’t provide a single list to help).

We also liked the fact that when outputting errors Psalm straightaway shows the lines of code where they were found. This very much simplifies the reading of reports.

However, Psalm’s most interesting feature that sets it apart is probably its custom PHPDoc tags that allow you to improve analysis (in particular, specifying types). Here are the most interesting custom PHPDoc tags.

@psalm-ignore-nullable-return

Sometimes a method may in principle be capable of returning null, but the code has already been arranged in such a way that this never happens. In this case it is very convenient that you are able to add a PHPDoc hint to the method/function and Psalm will assume that a null has not been returned.

A similar hint also exists for false: @psalm-ignore-falsable-return.

Types for closure

If you have ever taken an interest in functional programming, you may have noticed that a function may often return another function or accept some function as a parameter. In PHP, this style may really confuse your colleagues, one reason being that PHP has no standards for documenting these functions. For example:

function my_filter(array $ar, \Closure $func) { … }

How can a programmer know which interface the function has in the second parameter? What parameters should it accept? What should it return?

Psalm supports syntax for describing functions in PHPDoc:

/**
  * @param array $ar
  * @psalm-param Closure(int):bool $func
  */
function my_filter(array $ar, \Closure $func) { … }

In case of this description it is already clear that you need to send an anonymous function to my_filter which accepts int at input and will return bool. And, of course, Psalm will check that in your code this particular function is sent here.

Enums

Let’s say you have a function that accepts a string parameter and you can only send given lines of code to it:

function isYes(string $yes_or_no) : bool
{
      $yes_or_no = strtolower($yes_or_no)
      switch($yes_or_no)  {
            case ‘yes’:
                  return true;
         case ‘no’:
                  return false;
            default:
               throw new \InvalidArgumentException(…);
      }
}

Psalm allows you to describe the parameter for this function in the following way:

/** @psalm-param ‘Yes’|’No’ $yes_or_no **/
function isYes(string $yes_or_no) : bool { … }

In this case, Psalm will attempt to understand which specific values are sent to this function and will issue errors in the case of values other than Yes and No.

Type aliases

Earlier on, when describing array shapes, I mentioned that, although analysers allow you to describe the structure of arrays, this feature isn’t very convenient to use because the array description needs to be duplicated in different places. The right solution, of course, is to use classes instead of arrays. However, in the case of legacy code going back many years, this is not always possible.

In actual fact, this problem doesn’t only apply to arrays but to any type that is not a class:

array;
closure;
union type (for example, several classes or a class and other types);
enum.

Any type of this kind, if it is used in several places, has to be duplicated in PHPDoc and, if changes are made, the code must be corrected everywhere. For this reason, Psalm represents a slight improvement in this regard. You can identify alias for a type and then use this alias in PHPDoc.

Unfortunately, there is a limitation: it only works within the scope of a single PHP file. However, this already simplifies things in respect of type description. Having said that, it only does so in the case of Psalm.

Generics aka templates

Let’s consider this feature based on an example. Let’s say we have the following function:

function identity($x) { return $x; }

How can we describe the type for this function? What type does it accept at input? What does it return?

Probably the first thing that comes to mind is mixed, that is to say it may accept any value at input and return any value.

For static analysis, if you come across mixed, it’s a disaster. It means that there is absolutely no information as to the type and you cannot make any assumptions. However, in actual fact, although the identity() function accepts/returns any types, it does have a certain logic: it returns the same type that it accepted. And for static analysis that is already something. This means that in the code:

$i = 5; // int
$y = identity($i);

the analyser may determine the type of incoming argument (int), and that means that it can also determine the type of variable, $y (also int).

However, how can we pass on this information to the analyser? Psalm has special PHPDoc tags for this purpose:

/**
  * @template T
  * @psalm-param T $x
  * @psalm-return T
  */
 function identity($x) { $return $x; }

Within Psalm there are good examples of work with templates:

— vendor/vimeo/psalm/src/Psalm/Stubs/CoreGenericFunctions.php;

— vendor/vimeo/psalm/src/Psalm/Stubs/CoreGenericClasses.php.

Phan also has similar functionality, however it only works with classes: https://github.com/phan/phan/wiki/Generic-Types.

Overall, we very much liked Psalm. It would seem that the author is attempting to ‘bolt on’ a smarter type system as well as smarter and more practical hints for the analyser. We also liked the fact that Psalm straightaway shows the particular lines of code in which errors were found and we even managed to achieve the same thing for Phan and PHPStan. But we will come to that a bit later.

Code inspection in PHPStorm

Analysers suffer from the same general drawback: you don’t get information about a given error when you are in the process of writing the code, but only much later. Usually, you write code, then open the console and run the analysers, after which you obtain the report.

For programmers it would be more useful to receive information about errors while editing the code. Phan is moving in this direction, developing its own language server. However, in PHPStorm this is an inconvenience.

That being said, fortunately PHPStorm has its own excellent analyser (code inspection), which in terms of quality is comparable to the solutions described above. And, in addition, it has a great plug-in called Php Inspections (EA Extended). The main difference of the analysers is their convenience for the programmer, namely that errors are visible in the editor as the code is being written. Also, these inspections can also be very finely tuned. For example, in your project you can select different file scopes and configure inspections for them in different ways.

I would also like to draw attention to a very useful plug-in called deep-assoc-completion. It understands the structure of arrays very well and simplifies autocomplete for keys.

Use of analysers at Badoo

How does it work in our case?

At the present time several of our teams are using static code analysis, and we plan to introduce this practice for everyone.

The only files we analyse are amended files — right down to the lines of code. That is to say, when the developer completes their task, we take their git diff and run the analysers only in respect of amended/added lines, and from the list obtained we remove those files which relate to old (unamended) code. In this way, we ‘hide’ from the developer any mistakes that were made previously.

Of course, this approach is not quite right: a programmer working on their code might ‘break’ something beyond the scope of their git diff. This is where we made a compromise. Even in this situation using static analysis yields results in the form of errors found in the new code. And we don’t want to force a programmer to correct old mistakes. However, of course, in future, when from an analyser’s point of view our code is cleaner, we will be reviewing this decision.

With reports from the three analysers in hand we then combine them into a single report, in which errors are grouped in terms of files and lines of code:

As we were in the process of compiling the above report we also attempted to delete some cases of false-positive. For example, remember that Phan has problems determining the type for object properties? So, if there is a given line of code which only Phan complains about, and if the error type is similar to that which Phan often gets it wrong, then we ‘hide’ the error in question from the programmer.

The place of analysers in our QA

We try to reduce the number of bugs at production in all sorts of ways:

We use several kinds of autotests;
We have code review;
We have manual testing.

Static analysers basically constitute another tool in this list and they complement the other items well. Static analysers have a number of advantages:

They deal with 100% of the code (unlike tests written for each section of code);
They often flag up errors which are difficult to identify during the code review process;
They are even able to analyse code which is difficult or impossible to run when using manual testing.

Introducing static analysis originally resulted from the idea of introducing strict types. However, the outcome has been that static analysers have yielded more checks than strict types, and have proved to be more flexible:

Analysers work for the whole code, whereas in order to see strict types errors you need to run the code;
An analyser’s error can be corrected later, if it is not critical (this is possibly not best practice, but in some cases it can be useful);
The type system in static analysers is even more flexible than in PHP itself (for example, it supports union types which do not exist in PHP);
Static analysers move us closer to the adoption of strict types, since they are able to emulate such strict checks.

Analyser reports: programmer opinion

It cannot be said that all programmers are happy with static analysers. There are several reasons for this.

Firstly, lots of people are somewhat sceptical about analysers believing them capable of finding only the most basic of errors that we normally wouldn’t make in the first place.

Secondly, as it has already been stated above, a lot of what is contained in the analyser reports simply inaccuracies, for example, wrongly-identified types in PHPDoc. Some programmers are not really bothered about such errors; after all, the code still works.

Thirdly, some programmers have high expectations. They thought that analysers would find all kinds of smart bugs, and instead analysers force them to add type tests and to correct PHPDoc. :)

Overall, the benefits that analysers deliver outweigh these somewhat insignificant downsides. And whichever way you look at it, they are a good investment for future code.

Tags:

● php

● static code analysis

● PHPStan

● Phan

● Psalm