Ability to Infer Types or What Python Got to Do With It

Frank Fischer
Published in DeepCodeAI
Apr 20, 2020

DeepCode recently added a feature that infers types from hints in the source code. For statically typed languages like Java, types are pretty straightforward. For languages like Python, they are not. Let us start with a small example and dig into the details after that.

Example First

Our example below uses Python, and if you want to follow along, you can use this link.

The suggestion on the left says that a variable holding a list might be accessed with the index operator using a string. Normally this would lead to a crash, but here the code wraps each access in a try/except/pass block (lines 118 to 129), which is a rather creative use of the exception mechanism. While it seems smart at first, I would not consider it best practice: it masks other exceptions that might occur, and recovering from an exception carries a performance penalty. Let us unveil what happens here.

As you can see, the variable params is initialized with the return value of get_params() (see line 113). Since we later use the index operator [] with a string (see lines 119, 123, and 127), params needs to be a dictionary (signalled by {}) or something along those lines. A list (signalled by []) accepts integers (and slices) as indexes, but not strings.
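To make the difference concrete, here is a minimal standalone sketch. The helper lookup() and the sample values are our own illustration, not code from the analysed project: subscripting a dict with a string works, while the same subscript on a list raises a TypeError.

```python
def lookup(container, key):
    """Return container[key], or a short description of the error it raises."""
    try:
        return container[key]
    except (TypeError, KeyError, IndexError) as exc:
        return f"{type(exc).__name__}: {exc}"


as_dict = {"mode": "play"}
as_list = []

print(lookup(as_dict, "mode"))  # play
print(lookup(as_list, "mode"))  # TypeError: list indices must be integers or slices, not str
```

Note that the TypeError is raised because of the key's type, not because the list happens to be empty; a non-empty list rejects a string index just the same.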

Let us have a look at get_params() now:

As you can see in line 81, the return value is the variable param, which is initialized to a list in line 66. If you follow the paths, you see that param becomes a dictionary in line 74, but only if len(paramstring) is greater than or equal to two (line 68). Otherwise, param stays a list, and the creative use of the exception mechanism kicks in. I will spare any comment on whether exceptions should be used this way and focus on what happens from a language point of view.
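Since the code itself is not reproduced in this text, here is a hypothetical, simplified reconstruction of the pattern just described. It is not the project's actual code: passing paramstring as an argument, the query-string format "?mode=1&name=foo", and the read_settings() caller with its mode and name keys are all assumptions made for illustration.

```python
def get_params(paramstring):
    # Hypothetical reconstruction: param starts out as a list ...
    param = []
    if len(paramstring) >= 2:
        # ... and only becomes a dict when there is something to parse.
        param = {}
        for pair in paramstring.lstrip("?").split("&"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                param[key] = value
    return param  # a list OR a dict, depending on the input


def read_settings(paramstring):
    params = get_params(paramstring)
    mode = name = None
    # The try/except/pass pattern: if params is still a list,
    # params["mode"] raises TypeError, which is silently swallowed.
    try:
        mode = params["mode"]
    except Exception:
        pass
    try:
        name = params["name"]
    except Exception:
        pass
    return mode, name


print(read_settings("?mode=1&name=foo"))  # ('1', 'foo')
print(read_settings(""))                  # (None, None)
```

Both calls "work", but in the second one the result comes from a swallowed TypeError, which is exactly the kind of hidden type change the suggestion points at.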

Behind the Scenes

The DeepCode engine analysed that piece of code and inferred that the return value of get_params() can be either a list or a dictionary. Python is less strict about types than, for example, Java, and in Python it is quite common to assign different types to the same variable to signal different states. Following the flow of the application, DeepCode noticed that an access mechanism is used later that is not compatible with all possible values. This is what the suggestion points out.

The DeepCode engine solves several tasks here. First, it determines the possible types of an element in memory. Next, it tracks the path this element takes through the code; as in our case, variables may be assigned the results of function calls or other variables. Last, DeepCode checks how these elements are used, for example in function calls or with operators. The engine learns which types are acceptable where by combining open-source code usage, documentation scanning, and human engineers reviewing the rules. Augmented AI at its best.
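DeepCode's engine is not described in code here, so the following is only a toy illustration of the three steps, built on Python's standard ast module (Python 3.9 or later): it collects the possible types of a returned variable from list and dict literals, follows a call result into the variable it is assigned to, and flags string-constant subscripts on a variable that may hold a list. The function name check_source and the restriction to literal assignments are assumptions of this sketch; a real engine handles far more cases, scopes, and types.

```python
import ast


def literal_type(node):
    """Return "list" or "dict" for a list/dict literal, else None."""
    if isinstance(node, ast.List):
        return "list"
    if isinstance(node, ast.Dict):
        return "dict"
    return None


def simple_assignments(node):
    """Yield (name, value) pairs for plain `name = value` statements."""
    if isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name):
                yield target.id, node.value


def check_source(source):
    """Toy check: warn about string subscripts on values that may be lists."""
    tree = ast.parse(source)

    # Step 1: possible types of each function's return value, based only on
    # list/dict literals assigned to the name that the function returns.
    returns = {}
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        local_types = {}
        for node in ast.walk(func):
            for name, value in simple_assignments(node):
                kind = literal_type(value)
                if kind:
                    local_types.setdefault(name, set()).add(kind)
        kinds = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Return) and isinstance(node.value, ast.Name):
                kinds |= local_types.get(node.value.id, set())
        returns[func.name] = kinds

    # Step 2: follow call results into the variables they are assigned to.
    var_types = {}
    for node in ast.walk(tree):
        for name, value in simple_assignments(node):
            if isinstance(value, ast.Call) and isinstance(value.func, ast.Name):
                var_types.setdefault(name, set()).update(
                    returns.get(value.func.id, set()))

    # Step 3: check usage, i.e. string subscripts on names that may be lists.
    warnings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and "list" in var_types.get(node.value.id, set())
                and isinstance(node.slice, ast.Constant)
                and isinstance(node.slice.value, str)):
            warnings.append(
                f"line {node.lineno}: {node.value.id!r} may be a list "
                f"but is indexed with the string {node.slice.value!r}")
    return warnings


SAMPLE = '''
def get_params(paramstring):
    param = []
    if len(paramstring) >= 2:
        param = {}
    return param

params = get_params("")
mode = params["mode"]
'''

for warning in check_source(SAMPLE):
    print(warning)
# prints: line 9: 'params' may be a list but is indexed with the string 'mode'
```

The sketch deliberately ignores control flow: it treats every assignment as a possible path, which over-approximates the set of types. That is the safe direction for a warning like this one, at the cost of some false positives.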

This feature is of special interest for developers using languages like Python or JavaScript. In these languages, variables can change type. Many developers put this to great use, relying on the type as a flag to carry information. But sometimes you can entangle yourself, and these faults are hard to trace. In the example above, instead of making the call and catching the exception, our developer could have returned None when there are no parameters, checked for this case, and acted accordingly.
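A minimal sketch of that alternative, assuming a query-style parameter string such as "?mode=1&name=foo" (the function names, the format, and the mode and name keys are hypothetical): get_params() returns either a dict or None, and the caller tests for None explicitly instead of relying on an exception.

```python
from typing import Optional


def get_params(paramstring: str) -> Optional[dict]:
    """Return the parsed parameters, or None if there are none."""
    if len(paramstring) < 2:
        return None
    params = {}
    for pair in paramstring.lstrip("?").split("&"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            params[key] = value
    return params


def read_settings(paramstring: str):
    params = get_params(paramstring)
    if params is None:
        # An explicit check replaces the try/except/pass blocks.
        return None, None
    # dict.get returns None for a missing key instead of raising.
    return params.get("mode"), params.get("name")


print(read_settings("?mode=1&name=foo"))  # ('1', 'foo')
print(read_settings(""))                  # (None, None)
```

The return type is now Optional[dict] rather than "list or dict", which both a human reader and a type checker such as mypy can follow. Returning an empty dict would be another reasonable choice; None simply makes the "no parameters" state explicit.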

We added several new rules that find patterns such as passing linear data structures into **kwargs or map data into *args, attribute access on primitive values, mutation of immutable objects, iteration over non-iterables, and more.
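For illustration, here is what each of these patterns does at runtime in CPython. The snippets and the helpers target() and outcome() are our own minimal examples, not DeepCode's rule definitions.

```python
def target(*args, **kwargs):
    """Echo back the positional and keyword arguments it receives."""
    return args, kwargs


def outcome(action):
    """Call a zero-argument function; return its result or the exception name."""
    try:
        return action()
    except Exception as exc:
        return type(exc).__name__


def list_into_kwargs():
    return target(**[1, 2])  # ** needs a mapping, a list is rejected


def dict_into_args():
    return target(*{"a": 1, "b": 2})  # no error: only the keys are passed


def attribute_on_int():
    return (42).name  # ints have no arbitrary attributes


def mutate_tuple():
    point = (1, 2)
    point[0] = 3  # tuples do not support item assignment


def iterate_int():
    return [x for x in 42]  # an int is not iterable


for case in (list_into_kwargs, dict_into_args, attribute_on_int,
             mutate_tuple, iterate_int):
    print(f"{case.__name__}: {outcome(case)}")
```

Four of the five cases fail loudly with a TypeError or AttributeError. Passing a dict into *args is the sneaky one: it runs without complaint and silently passes (('a', 'b'), {}), which is why such patterns are worth catching statically.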

At DeepCode, we are committed to serving the developer community to achieve the best possible code quality. Our service is free for teams of fewer than 30 people and for open-source projects. Simply check it out at DeepCode.AI.
