Basic Python: a concise guide

Sam Campitiello
Published in Towards Dev · 12 min read · Sep 25, 2022


I started using Python during the first year of my master’s in Astrophysics (so, several years ago :D). I needed a powerful and fast tool to compute integrals of complex functions without an exact solution. In the beginning, I was surprised to find that what I had been doing until then was a very simple task in Python, which later simplified my PhD life a bit :D From that moment, Python has been the main tool I use for complex computations.

According to Wikipedia:

Python is a high-level general-purpose programming language

widely used for a very large number of tasks, and it comes preinstalled on most Linux distributions. I never took notes on this programming language, given that I learned everything I needed “in the field”. However, this article collects some notes to keep everything tidy and organized, even though what I will show is very little compared to the whole system of functions and functionalities that Python has. Usually, I work with data in IPython, which is a more interactive Python command shell.

Anyway, let’s go into it and remember: I may have missed something in writing this article, but the topics discussed here can be considered a starting point!

Access, variables, and arrays

It is possible to access IPython from the terminal by typing ipython, and to exit from it by typing exit. Once in the environment, commands can be executed. It is also possible to write them in a file (with extension .py) and then run that file from the terminal by typing python followed by the file name (e.g. python file_name.py).

As in all programming languages, objects like variables, vectors, matrices, and arrays have their own commands for creation and modification.

A list is a sequence of elements that can be modified, contrary to a tuple, whose elements cannot be modified. A dictionary, instead, is a way of defining things made of a name (the key) and an element or elements (the value). Variables can be displayed in several ways:
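As a minimal sketch of the three containers and of a few ways to display a variable (all names and values here are illustrative assumptions, not the article's original snippet):

```python
# A list is mutable: its elements can be changed after creation.
my_list = [1, 2, 3]
my_list[0] = 10

# A tuple is immutable: assigning to one of its elements raises TypeError.
my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 10
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# A dictionary maps names (keys) to elements (values).
my_dict = {"name": "Sirius", "magnitude": -1.46}

# Different ways of displaying a variable.
x = 3.14159
print(x)                  # plain value
print("x =", x)           # label followed by the value
print(f"x = {x:.2f}")     # f-string, two decimals
print("x = %.3f" % x)     # old-style formatting, three decimals
formatted = f"{x:.2f}"
```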

The function print is used for displaying objects of different kinds (e.g., numbers, strings,…), optionally specifying the number of decimals (when dealing with numbers). Arrays are objects that can have any number of dimensions and they are widely used (they can contain all kinds of elements). To define them, the library numpy is needed (it also contains a lot of very useful functions for data manipulation):
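A short sketch of array creation with numpy (the values are made up for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])                 # 1-D array
b = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array: 2 rows, 3 columns
c = np.array(["a", "b", "c"])           # arrays can also hold strings

print(a)
print(b)
print(c)
```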

The first line imports the library numpy and renames it as np: this means that, in order to use the related functions, it is possible to write np instead of the whole name. Like arrays, matrices can be defined with the same library (notice that matrices have only two dimensions). It is possible to manipulate and extract information from arrays and matrices:
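The sketch below shows a few common ways to inspect and reshape a 2-D array. It assumes a plain 2-D numpy array plays the role of the matrix; the array itself is an illustrative example.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.shape)   # (number of rows, number of columns)
print(m.ndim)    # number of dimensions
print(m.size)    # total number of elements
print(m.dtype)   # type of the elements

m_t = m.T                    # transpose: rows become columns
flat = m.flatten()           # 1-D copy of all the elements
reshaped = m.reshape(3, 2)   # same elements, arranged in 3 rows and 2 columns
```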

Important: contrary to other programming languages (like Octave, R, or Julia), in Python, the first element of an object is represented (or called) with the index 0 (not 1)! Here is a list of some special arrays and matrices:
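A few special arrays and matrices that numpy can build directly (a small, non-exhaustive selection chosen for illustration):

```python
import numpy as np

zeros = np.zeros(3)               # array of three zeros
ones = np.ones((2, 2))            # 2x2 matrix of ones
identity = np.eye(3)              # 3x3 identity matrix
steps = np.arange(0, 10, 2)       # from 0 (included) to 10 (excluded), step 2
grid = np.linspace(0, 1, 5)       # 5 equally spaced values between 0 and 1

first = steps[0]                  # indexing starts at 0
```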

Element selection and search

Accessing the elements of an array can be done by specifying the related index (or indices):
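A minimal sketch of index-based selection on 1-D and 2-D arrays (example values are assumptions):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
first = a[0]        # first element
last = a[-1]        # last element
middle = a[1:4]     # elements with index 1, 2, 3 (the end index is excluded)

m = np.array([[1, 2, 3], [4, 5, 6]])
element = m[1, 2]   # second row, third column
row = m[0, :]       # whole first row
col = m[:, 1]       # whole second column
```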

When dealing with arrays, finding particular elements given a certain condition (or conditions) may be useful:
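One possible way to search by condition is a boolean mask, optionally combined with np.where to recover the positions. The array below is illustrative.

```python
import numpy as np

a = np.array([3, 8, 1, 9, 4])

mask = a > 3                       # boolean array, True where the condition holds
selected = a[mask]                 # elements fulfilling the condition
positions = np.where(a > 3)[0]     # indices of those elements
both = a[(a > 3) & (a < 9)]        # two conditions combined with & (and)
```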

Most of the commands above also work on other kinds of objects, including data-frames, for which some extra functions are used for locating particular elements.

Basic operations

The numpy library is very useful for mathematical operations:
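A small sketch of element-wise operations, including the difference between np.add and + on plain lists (the inputs are illustrative):

```python
import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]

summed = np.add(x, y)    # element-wise sum, works on lists too
joined = x + y           # on lists, + concatenates instead of summing

xa, ya = np.array(x), np.array(y)
array_sum = xa + ya      # on arrays, + is the element-wise sum
product = xa * ya        # element-wise product
roots = np.sqrt(xa)      # element-wise square root
scalar = np.dot(xa, ya)  # scalar product: 1*4 + 2*5 + 3*6
```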

The first command works for both arrays and lists (of numbers). However, the equivalent x+y works differently for lists (i.e., it concatenates them instead of summing up the elements). Notice that functions written without np do not belong to the numpy library (they are built into Python or come from other modules). Other useful functions are:
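A few more functions that are often handy, as a hedged selection rather than a complete list. Here len is a Python built-in, while the others come from numpy.

```python
import numpy as np

a = np.array([1.5, -2.25, 3.0])

n = len(a)                 # built-in: number of elements
total = np.sum(a)          # sum of all the elements
smallest = np.min(a)       # minimum
largest = np.max(a)        # maximum
absolute = np.abs(a)       # element-wise absolute value
running = np.cumsum(a)     # cumulative sum
```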

Arrays can be combined in different ways and elements can be added:
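A sketch of some ways to combine arrays and add elements with numpy (illustrative values):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate((a, b))          # one after the other
stacked_rows = np.vstack((a, b))         # a and b as rows, shape (2, 3)
stacked_cols = np.column_stack((a, b))   # a and b as columns, shape (3, 2)
appended = np.append(a, 7)               # new element at the end
inserted = np.insert(a, 1, 99)           # new element at index 1
```

Note that these functions return new arrays; the original a and b are left unchanged.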

If statements and loops

As in all programming languages, if statements and loops are very useful ways of creating conditional operations and performing repeated tasks:
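A minimal sketch of the three constructs (variable names and values are illustrative):

```python
# if / elif / else
x = 7
if x > 10:
    size = "large"
elif x > 5:
    size = "medium"
else:
    size = "small"

# while loop: the counter i is updated inside the loop
i = 0
squares = []
while i < 3:
    squares.append(i ** 2)
    i = i + 1

# for loop over the elements of a list
total = 0
for value in [1, 2, 3]:
    total += value

# for loop where the range is only used as a counter
repeated = []
for _ in range(3):
    repeated.append("hello")
```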

In if statements, multiple conditions can be defined by adding elif; when only one condition is present, if is sufficient (even without the else), meaning that if the condition is False, nothing happens. In while loops, the condition usually refers to the counter i, which must be updated inside the loop (of course, the increment can be different, e.g. i=i**2, depending on the task or other conditions). Finally, the for loop is very similar to the previous one except for the absence of an explicit counter: it performs a task for each element present in a range (or in any other sequence). If the task is not related to those elements, the range simply acts as a counter.

Functions

A powerful way of using all the above commands for specific tasks is the creation of functions, which allow simple or complex operations to be reused without rewriting them each time. The syntax for creating them is the following:
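A sketch of the general syntax; the body of the function is an arbitrary example chosen for illustration:

```python
def function_name(a, b, c):
    """Add a and b, then multiply the sum by c."""
    result = (a + b) * c
    return result

value = function_name(1, 2, 3)
print(value)
```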

The operations under function_name use the specified arguments (a,b,c,…), and the result is specified with return (a function without return gives back None). It is also possible to create a multiple output function:
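A minimal example of a function with more than one output (min_and_max is a hypothetical name used only here). The values are returned together as a tuple and can be unpacked into separate variables:

```python
def min_and_max(values):
    """Return the smallest and the largest element of a sequence."""
    return min(values), max(values)

lowest, highest = min_and_max([4, 1, 7])   # unpack the two outputs
both = min_and_max([4, 1, 7])              # or keep them together as a tuple
```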

If a function (function_name) is created and saved in a file (file_name.py), it is possible to import it by writing from file_name import function_name, and then use it by calling it (file_name does not have .py at the end!).
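To make the import step reproducible, the sketch below first writes a tiny file_name.py into a temporary folder and then imports the function from it. The temporary folder and the body of function_name are assumptions made only for this demonstration; normally the file would simply sit next to the script or in the working directory.

```python
import importlib
import os
import sys
import tempfile

# Create a small module file to import from (demo only).
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "file_name.py"), "w") as f:
    f.write("def function_name(a, b):\n    return a + b\n")

# Make the folder visible to Python's import system.
sys.path.insert(0, folder)
importlib.invalidate_caches()

from file_name import function_name   # no .py at the end

result = function_name(2, 3)
print(result)
```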

Import and export data

When data are listed in a file, it is possible to import them and work with them. In the same way, once data have been created, it is possible to export and save them into an output file:
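A hedged sketch of one common approach: np.savetxt and np.loadtxt for plain numeric files, and the pandas functions to_csv and read_csv for tables. File names and data are made up, and a temporary folder is used so that the example does not touch existing files.

```python
import os
import tempfile

import numpy as np
import pandas as pd

folder = tempfile.mkdtemp()

# Export a numeric array to a text file and import it again.
data = np.array([[1.0, 2.0], [3.0, 4.0]])
txt_path = os.path.join(folder, "data.txt")
np.savetxt(txt_path, data)
loaded = np.loadtxt(txt_path)

# Export a table to CSV (without the index column) and import it again.
df = pd.DataFrame({"x": [1, 2], "y": [3.5, 4.5]})
csv_path = os.path.join(folder, "data.csv")
df.to_csv(csv_path, index=False)
df_loaded = pd.read_csv(csv_path)
```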

Data-frames: definition and information

Data-frames are tables containing data, organized in rows (i.e., records) and columns (i.e. features). In Python, it is possible to work with data-frames using the library pandas containing a large number of different functions. Remember that, once data are imported with pandas (see the previous section), they are already organized in a data-frame. A data-frame can be defined in two ways:
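A sketch of data-frame creation from a dictionary, with and without explicit index names, plus the array-based variant (names and values are illustrative):

```python
import numpy as np
import pandas as pd

data = {"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 47]}

# Without an index list, records are numbered 0, 1, 2, ...
df_default = pd.DataFrame(data)

# With an index list, each record gets the given name.
df_named = pd.DataFrame(data, index=["r1", "r2", "r3"])

# From an array: columns are numbered as well.
df_array = pd.DataFrame(np.array([[1, 2], [3, 4]]))
```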

As for the index names associated with each record, if a list of names is not given, the record indices are integers starting from 0. Notice that data are defined here as a dictionary, but it is always possible to define a data-frame from an array. Once a data-frame is created, it is possible to access some information about the structure and the data it contains:
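Some commonly used attributes and methods for inspecting a data-frame, ending with a type check that can be used as a condition (the data-frame is an illustrative assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [31, 25, 47],
    "height": [1.70, 1.82, 1.65],
})

print(df.shape)        # (number of records, number of columns)
print(df.columns)      # column names
print(df.index)        # record indices
print(df.dtypes)       # type of each column (the label for strings may depend on the pandas version)
print(df.head(2))      # first two records
print(df.describe())   # summary statistics of the numeric columns
df.info()              # compact overview of the structure

age_is_int = df["age"].dtype == "int64"   # True/False, usable in an if statement
```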

The last line can be used in e.g. loops and decision-making statements. The type “int64” is associated with integers, “float64” with floats, and “object” with strings.

Data-frames: element selection

As already seen in the last sections, elements (i.e. records and columns) can be selected in different ways by specifying the corresponding row/column indices. Moreover, the selection can be made by adding particular conditions on the data:
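A sketch of column selection and of selection by condition (the table is made up for the example):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [31, 25, 47],
    "city": ["Rome", "Oslo", "Rome"],
})

ages = df["age"]                        # one column (a Series)
subset = df[["name", "age"]]            # several columns (a data-frame)

older = df[df["age"] > 30]                                  # records fulfilling a condition
names_older = df.loc[df["age"] > 30, "name"]                # same condition, one column only
combined = df[(df["age"] > 30) & (df["city"] == "Rome")]    # two conditions together
```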

The last line can be used to select only those records fulfilling the particular condition, for a specific column, or for the whole data-frame. Important: the difference between iloc and loc is explained below:
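A minimal illustration of the difference, using a data-frame whose index labels are deliberately not in order (an assumption made to keep the two cases visibly distinct):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=[4, 5, 1])

by_position = df.iloc[1, 0]        # second row in order of appearance
by_label = df.loc[1, "value"]      # row whose index label is 1
```

Here iloc picks the row labelled 5 (the second one), while loc picks the row labelled 1 (the third one).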

The former is used with the positions of the elements in order of appearance (i.e. index 1: row 2, index 4: row 5,…), while the latter is used with the labels as reported in “index”.

Data-frames: (re)name columns

A data-frame can be created by assigning to each column a name. However, when it is created from arrays, columns are marked by numbers and can be called only with indices. In the first case, the column names can be changed with new ones while, in the second case, it is always possible to define them:
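A sketch of both cases, renaming existing column names and assigning names to numbered columns (all names are illustrative):

```python
import numpy as np
import pandas as pd

# Case 1: columns already have names, which are replaced with new ones.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df = df.rename(columns={"a": "alpha", "b": "beta"})

# Case 2: columns created from an array are numbered, and names are assigned.
df_array = pd.DataFrame(np.array([[1, 2], [3, 4]]))
df_array.columns = ["first", "second"]

# Same kind of renaming, applied directly to the data-frame.
df.rename(columns={"alpha": "A"}, inplace=True)
```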

Modifications are saved in the data-frame by assigning the result of the command back to it (df = ...). However, it is also possible to do that without reassignment by inserting the argument inplace = True, meaning that the modification is applied directly to the data-frame (this argument is accepted by many, though not all, of the pandas methods that modify a data-frame).

Data-frames: concatenation and merging

Data-frames can be merged or concatenated, inline (vertically), or as columns (horizontally, one after the other):
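A sketch with pd.concat and pd.merge on two small tables whose reference columns have different names (the tables are assumptions for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "y": [20, 30, 40]})

# Concatenation: one below the other, or side by side.
vertical = pd.concat([df1, df1], axis=0, ignore_index=True)
horizontal = pd.concat([df1, df2], axis=1)

# Merging on the reference columns "id" and "key".
inner = pd.merge(df1, df2, left_on="id", right_on="key", how="inner")
left = pd.merge(df1, df2, left_on="id", right_on="key", how="left")
outer = pd.merge(df1, df2, left_on="id", right_on="key", how="outer")
```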

It is easy to see that merging data-frames is very similar to the equivalent SQL task (a join): the arguments left_on and right_on specify the reference columns of the two data-frames for selecting the common values. The merging type how can be “inner” for a classic intersection, “left” for the intersection plus the remaining elements of the first data-frame, “right” for the intersection plus the remaining elements of the second data-frame, and “outer” for the union of the two data-frames.

Data-frames: melt and pivot

The function melt is very similar to the one defined in R. A column (or a list of columns) is selected as “identifier” while the others are treated like variables along with their values:
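A minimal sketch of melt, where the column "type" is the identifier and the remaining columns become variable/value pairs (column names and numbers are made up):

```python
import pandas as pd

df = pd.DataFrame({"type": ["A", "B"], "y2020": [1, 2], "y2021": [3, 4]})

long = pd.melt(df, id_vars="type", var_name="year", value_name="count")
print(long)
```

The result has one record per combination of identifier and original column.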

Similar to melt, the function pivot can organize data according to some columns. There are indices (representing records in rows) and columns (representing features) whose values are those related to another column:
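A sketch of pivot on a small, hypothetical table with columns type, year, and value. It is not the article's original dataset; the numbers are chosen so that some combinations are missing.

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "year": [60, 70, 60, 65],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Rows: unique values of "type"; columns: unique values of "year".
wide = df.pivot(index="type", columns="year", values="value")
print(wide)
```

Combinations that do not appear in the data (here, type A with year 65 and type B with year 70) are filled with NaN.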

When a particular value is not present (e.g., for type A, there are no records when the year is equal to 65), it is substituted with NaN. In the second example, the two identifiers are combined together (more precisely, their unique values are combined).

Data-frames: add, replace and remove elements

It is possible to add new records (i.e., rows) and new features (i.e., columns), contained in lists or data-frames:
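A sketch of adding records and columns. It uses pd.concat for the new rows, since the older DataFrame.append method was removed in pandas 2.0; names and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})

# New record(s), contained in another data-frame.
new_rows = pd.DataFrame({"name": ["Cid"], "age": [47]})
df = pd.concat([df, new_rows], ignore_index=True)

# New column at a chosen position (here: first).
df.insert(0, "id", [101, 102, 103])

# New column from a list, placed after the existing ones.
df["city"] = ["Rome", "Oslo", "Lima"]
```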

The last command puts the new column in the last position, i.e. it is a sort of concatenation between the data-frame and the new column. It is also possible to delete records and columns, or replace them with new values:
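A sketch of drop and replace on an illustrative table. The replacement is shown both on the whole data-frame and on a single column with two lists of equal length.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "grade": ["low", "high", "low"],
    "age": [31, 25, 47],
})

no_first_row = df.drop(index=0)        # delete the record with index 0
no_age = df.drop(columns="age")        # delete a column

replaced_all = df.replace("low", "basic")                          # whole data-frame
replaced_col = df["grade"].replace(["low", "high"], ["L", "H"])    # one column, list to list
```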

In the second line related to the function “replace”, the two lists must have the same number of elements. Moreover, if the replacement is not applied to specified columns, it will be done for the whole data-frame!

The replacing command is very useful for converting categorical string values into numeric values. Moreover, particular substitutions can be done also with the command map:
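A sketch of map for turning categories into numbers, followed by a count of the elements containing a given piece of text via str.contains (column names and strings are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "grade": ["low", "high", "low"],
    "note": ["good string", "other", "string here"],
})

# Substitute each category with a number through a dictionary.
df["grade_num"] = df["grade"].map({"low": 0, "high": 1})

# True where "string" appears in the element; the sum counts the True values.
n_matches = df["note"].str.contains("string").sum()
```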

In this last example, the command checks if “string” is present in each element of the specified column and then it makes the sum of all the verified conditions (i.e., the sum of True values).

Data-frames: grouping and sorting

Dealing with data-frames often leads to the need to work with specific data or with groups defined by particular conditions. Data can be grouped together with the function groupby. Once done, it is possible to apply a specific function:
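A minimal sketch of grouping followed by one or more functions, plus value_counts on a categorical column (data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["A", "B", "A", "B", "A"],
    "value": [1, 2, 3, 4, 5],
})

means = df.groupby("type")["value"].mean()                        # one function
summary = df.groupby("type")["value"].agg(["min", "max", "sum"])  # several functions
counts = df["type"].value_counts()                                # records per unique value
```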

The command value_counts is used for counting unique values in a Series of data (useful for counting categorical features). So, in a nutshell, the function groupby is used only for gathering data together (according to specific columns), on which a function (or functions) can then be applied.

Another way of grouping data is to split them into bins. For this task, the function cut is appropriate: for a chosen numeric column, the function finds the maximum and minimum values and uses them to split the range into a certain number of bins. For example, for the array [1, 2, 3, 4, 5, 6], the maximum and minimum values are 6 and 1, respectively; suppose we split the array into 2 segments: the bins will have a width of (6 − 1)/2 = 2.5, therefore the two segments will be [1, 3.5] and [3.5, 6].

So, for a column whose values range from 60 to 75, the two intervals are [60, 67.5] and [67.5, 75]. Once this splitting is done, it is possible to apply specific functions. Also, it is possible to label each value according to its interval:
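A sketch of cut with two bins on a hypothetical column "weight" ranging from 60 to 75. The column name, the values, and the labels are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"weight": [60, 62, 66, 70, 75]})

# Two equal-width bins between the minimum and the maximum.
# pandas reports them as half-open intervals and slightly lowers the
# first edge so that the minimum value is included.
intervals = pd.cut(df["weight"], bins=2)

# Same bins, with a label instead of the interval.
df["class"] = pd.cut(df["weight"], bins=2, labels=["light", "heavy"])

# A function applied to each bin.
mean_by_class = df.groupby("class", observed=True)["weight"].mean()
```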

Finally, values inside a data-frame can be sorted according to specific columns:
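A short sketch of sorting by one or more columns (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Bob", "Ann", "Ann"],
    "age": [25, 31, 47],
})

by_age = df.sort_values("age")                        # ascending order
by_age_desc = df.sort_values("age", ascending=False)  # descending order

# Sort by name first; within the same name, by age in descending order.
multi = df.sort_values(["name", "age"], ascending=[True, False])
```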

Data-frames: null values

Data-frames can often contain null values i.e. cells where no value is present. Several operations can be done on those cells:
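A sketch of the usual operations on null values: counting, removing, and substituting them (the table is made up for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

n_null = df.isnull().sum()         # number of null values per column
n_not_null = df.notnull().sum()    # number of non-null values per column

dropped = df.dropna()                # keep only records without null values
filled_zero = df.fillna(0)           # substitute null values with 0
filled_mean = df.fillna(df.mean())   # substitute with the mean of each column
```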

In the first commands, by adding .sum(), it is possible to make the count of null (or not null) values. The last commands are very useful in data analysis when null values must be substituted (if not removed) before moving on.

Statistics on data

Python provides different functions for statistical analyses that can be applied to arrays and data-frames. Here is a list of some functions:
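A hedged selection of statistical functions, applied to an array and to a data-frame holding the same numbers. Note the different defaults for the standard deviation in numpy and pandas.

```python
import numpy as np
import pandas as pd

a = np.array([1.0, 2.0, 3.0, 4.0])
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

mean_np = np.mean(a)          # mean with numpy
mean_pd = df["x"].mean()      # mean with pandas
median_np = np.median(a)      # median

std_np = np.std(a)            # population standard deviation (ddof=0)
std_pd = df["x"].std()        # sample standard deviation (ddof=1)

corr = df.corr()              # correlation matrix between the columns

m = np.zeros((3, 3))
np.fill_diagonal(m, 1.0)      # fill the main diagonal in place
```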

The commands with and without np (i.e. the numpy library) can be used on both arrays and data-frames, usually giving the same results. One exception is the standard deviation: np.std computes the population value by default (ddof=0), while the pandas method std computes the sample value (ddof=1). The command fill_diagonal can be used in particular cases when only the diagonal cells of a matrix must be filled with a value (e.g. regularization in regression or classification models). Finally, the correlation matrix is very useful to study the correlation coefficients between different features (it can also be represented with the library Seaborn; see the next section).

Plots and charts: data visualization

There are several ways to display data. The commands used for describing the style, the width, the color, labels, titles, etc. are almost always the same for all kinds of plots. First of all, once a chart has been created, it is possible to save it into an external .PNG file:
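A sketch of a simple line plot with a few style options, saved to a PNG file. It assumes the library matplotlib; the non-interactive "Agg" backend and the temporary output folder are choices made only to keep the example self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")              # draw to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color="red", linewidth=2, linestyle="--", label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

out_path = os.path.join(tempfile.mkdtemp(), "figure.png")
fig.savefig(out_path, dpi=150)     # save the chart into an external PNG file
plt.close(fig)
```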

Here is a list of the main visuals and charts (for more details and plot options/parameters, see the online Documentation):
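As an illustrative and non-exhaustive selection, assuming matplotlib, here are four common chart types drawn in one figure (data are made up):

```python
import matplotlib
matplotlib.use("Agg")              # draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = x ** 2
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(x, y)                                  # line plot
axes[0, 1].scatter(x, y)                               # scatter plot
counts, edges, _ = axes[1, 0].hist(values, bins=5)     # histogram
axes[1, 1].bar(["a", "b", "c"], [3, 5, 2])             # bar chart
fig.tight_layout()
plt.close(fig)
```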

The visuals above can also be built using the Python library Seaborn.

Seaborn library

It is a library based on matplotlib and it provides a high-level interface for drawing attractive and informative statistical graphics. Here is a list of some of them (for more details, visit the online Documentation; here data is organized in data-frames, which are easier to use):

Basics of Object-Oriented Programming

Before going into some details about it, a distinction must be made to have a clearer and very broad view of this kind of programming:

  • Procedural Programming (PP): something is done through the usage of functions.
  • Object-Oriented Programming (OOP): something is done through the usage of objects.

In the first case, data and functions are separated: they are handled separately, and to use a function, it must first be defined (or imported) and then data must be passed into it. In the second case, instead, data and functions are bundled together in an object, so data do not have to be passed again to every function. In this way, the code is easier to maintain and it is harder to introduce bugs in it. Here is the example “Area and Perimeter of a Rectangle”:

  • PP: we must define the function “Area” and the function “Perimeter”. Then we pass the data into the function “Area” and into the function “Perimeter” to obtain the final results.
  • OOP: data must be passed once in order to calculate Area and Perimeter. Everything is inside the so-called class.

A. Definition: attributes & methods

A class is made of two features: attributes (they are variables, e.g., base, height, radius,…), and methods (they are functions, e.g., “Area”, “Perimeter”,…). An object created from a class is called an instance:
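A minimal sketch of the rectangle example as a class. The method names area and perimeter and the numbers are assumptions; the base and height are passed once and then reused by both methods.

```python
class Rect:
    def __init__(self, b, h):
        self.b = b    # attribute: base
        self.h = h    # attribute: height

    def area(self):
        return self.b * self.h

    def perimeter(self):
        return 2 * (self.b + self.h)


Myrect = Rect(3, 2)           # instance created from the class
print(Myrect.area())
print(Myrect.perimeter())
```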

In this example above, Myrect is the instance created from the class Rect where the base b and the height h are specified.

In the following example, a class Test will be defined: it will have one class attribute (i.e. an attribute defined in the class and shared by all its instances) and one instance attribute (i.e. an attribute attached to a single instance of the class). Moreover, three functions will be defined: a function to print “Hello”, a function to print a name, and a function to return the square of a number:

The function __init__ is needed to define instance attributes (self is always there!). When defining a method, there may also be other arguments after self (like in func3). To display strings, the function to use is print; to give back an attribute or the output of a function, use return. Remember that print must be placed before return, otherwise it will never be executed! All attributes, functions, and outputs can be called and printed in this way:
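One possible version of the class Test described above, followed by the calls. The names func1, func2, class_attr, and name are assumptions (only func3 is named in the text).

```python
class Test:
    class_attr = "shared by all instances"    # class attribute

    def __init__(self, name):
        self.name = name                      # instance attribute

    def func1(self):
        print("Hello")

    def func2(self):
        print(self.name)

    def func3(self, x):
        return x ** 2


t = Test("Sam")
t.func1()                  # prints Hello
t.func2()                  # prints the name
squared = t.func3(4)       # returns the square of 4
print(squared)
print(Test.class_attr)     # class attribute, reachable from the class...
print(t.class_attr)        # ...and from any instance
print(t.name)              # instance attribute
```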

Inside the class, each attribute can be used as self.attribute. From the examples above, it is possible to change the value of an instance attribute but it is also possible to add new attributes:
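A short sketch of changing an instance attribute and adding a new one (the class is reduced to the minimum; names are illustrative):

```python
class Test:
    def __init__(self, name):
        self.name = name


t = Test("Sam")
t.name = "Alex"       # change the value of an existing attribute
t.age = 30            # add a new attribute to this instance only

other = Test("Kim")
other_has_age = hasattr(other, "age")   # other instances are not affected
```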

B. Methods and functions calling each other

It is also possible to define a method that calls some other methods inside a class:

Inside func2, the method is called as self.func1 and its result can be used like any other variable. It is also possible to define simple functions (which are not methods!) inside a class:

To call the simple function inside the class, its name must be written along with the name of the class (i.e. Test.simple_func).
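Both patterns can be sketched in one small class (the bodies of the functions are assumptions chosen for illustration):

```python
class Test:
    def __init__(self, x):
        self.x = x

    def func1(self):
        return self.x * 2

    def func2(self):
        doubled = self.func1()    # a method calling another method
        return doubled + 1

    def simple_func(a, b):        # no self: a simple function, not a method
        return a + b


t = Test(5)
from_method = t.func2()
from_simple = Test.simple_func(1, 2)    # called through the class name
```

Calling t.simple_func(1, 2) on the instance would fail, because the instance itself would be passed as an extra first argument. Decorating the function with @staticmethod is the usual way to make both call styles work.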

C. Inheritance

When a Class is created based on another existing one, we talk about inheritance: the class “Rectangle” is the base class while the class “Square” can be considered as a subclass (i.e., the particular case when the base is equal to the height, i.e. b = h). When a class is built on several other classes, this is called multiple inheritance. Suppose we have defined a class “Person” (i.e. base class) with an attribute A and a method B:
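A sketch of a base class Person with an attribute A and a method B, and of a subclass that inherits from it. The body of B and the extra method C are assumptions added for illustration.

```python
class Person:
    def __init__(self, A):
        self.A = A                      # attribute A

    def B(self):                        # method B
        return f"attribute A is {self.A}"


class Time(Person):                     # Time inherits from Person
    def C(self):                        # new method, defined only in Time
        return "only available in Time"


t = Time("x")
p = Person("y")

inherited = t.B()                       # defined in Person, usable from Time
person_has_C = hasattr(p, "C")          # Person does not get the new method
```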

In this example, the class “Time” will have the same attributes and methods as “Person” without re-defining them. However, if a new method is defined in “Time”, it can be called only by “Time” and not by “Person”.

D. Special Methods

There are many special methods that can be defined inside a class to have more information about attributes and to perform particular operations. Here is a small list:
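A few special methods, shown on a small rectangle class (a hedged selection for illustration; the class and its behavior are assumptions):

```python
class Rect:
    def __init__(self, b, h):            # called when an instance is created
        self.b = b
        self.h = h

    def __str__(self):                   # used by print() and str()
        return f"Rect with base {self.b} and height {self.h}"

    def __repr__(self):                  # used in the interactive shell
        return f"Rect({self.b}, {self.h})"

    def __eq__(self, other):             # used by the operator ==
        return (self.b, self.h) == (other.b, other.h)

    def __add__(self, other):            # used by the operator +
        return Rect(self.b + other.b, self.h + other.h)


r = Rect(3, 2)
print(r)
total = Rect(1, 2) + Rect(3, 4)
```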

E. Importing a class

As with other modules and libraries (e.g. math, numpy,…), once a class is created, it can be imported and used along with its attributes and methods:

  • Create the class (or classes) class_name and save it (them) in a file file_name.py
  • Call it (them) by writing, in IPython (or in a Python script), from file_name import class_name (file_name without .py at the end).
  • At this point, the class has been imported with all its methods and attributes.


I am a Data Analyst with a Ph.D. in Astrophysics who follows his passions, from science to sport, up to the Ancient Egyptian culture and the Data Science world!