I’ve been a Mac user for many years now, and I’ve seen people use Macs in many ways.

Professionals should be able to use the tools available to them properly, otherwise those tools are mostly overkill, and amateurs should be able to enjoy their system without too many difficulties.

That applies to the great macOS as well, and there are 3 things that are usually overlooked by most of its users, even the most experienced ones.

Configure the dock properly

The macOS dock is, by default, located at the bottom of the screen, fixed in that position. That is actually not the best choice, and…

My final submission for the Applied Data Science Capstone project from the IBM Data Science Professional certification offered by Coursera.

All the code for this project can be found on the dedicated GitHub repository.


In a few months, I’m moving to Cambridge, UK to start a new job as a software developer. I’m currently looking for a flat, and I’ll soon be looking for a bicycle to commute easily and keep healthy.
This brought me to a simple but stimulating idea for this final project.

Suppose we want to open a new bicycle shop in Cambridge, UK. I’ve already…

Most commonly, data are shared and worked with using a tabular format, which means observations are stored in rows and variables in columns.

Based on the number of available rows and columns, the file size of these data can range from a few KB to many GB. There is also an additional factor that determines the final size of the data: the data type of each variable; numeric variables usually require less space than more complicated data types, like characters. …
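As a rough illustration of the point about data types, the sketch below stores the same three made-up values once as fixed-width 8-byte floats and once as comma-separated text, then compares the sizes in bytes. The values and the choice of `struct` are assumptions made for the example; real file formats add their own overhead.

```python
import struct

# Made-up values, used only to compare the two representations.
values = [3.14159265, 2.71828183, 1.41421356]

# Numeric storage: each value becomes a fixed-width 8-byte float.
as_numbers = struct.pack(f"{len(values)}d", *values)

# Text storage: each digit, dot and separator takes one byte in UTF-8.
as_text = ",".join(str(v) for v in values).encode("utf-8")

numeric_size = len(as_numbers)
text_size = len(as_text)
print(f"numeric: {numeric_size} bytes, text: {text_size} bytes")
```

With these particular values the text form is larger, but short numbers written as text can be smaller than 8 bytes, which is why "usually" is the right word.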

XML is a markup language used to represent and distribute data structures that are often difficult to express in more standard tabular formats.

Basically, the XML format is similar to HTML (which is also a markup language), in that data are organised in elements, which define the type of information they hold, and each element carries the actual value as content or as attributes.
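To make elements, content and attributes more concrete, here is a minimal sketch using `xml.etree.ElementTree` from the Python standard library. The `library` document and all of its tag and attribute names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, made up for illustration.
xml_text = """
<library>
    <book id="1" language="en">
        <title>First Book</title>
        <year>2001</year>
    </book>
    <book id="2" language="it">
        <title>Second Book</title>
        <year>2015</year>
    </book>
</library>
"""

root = ET.fromstring(xml_text)

books = []
for book in root.findall("book"):
    books.append({
        # Values stored as attributes of the <book> element.
        "id": book.get("id"),
        "language": book.get("language"),
        # Values stored as the content of child elements.
        "title": book.findtext("title"),
        "year": int(book.findtext("year")),
    })

print(books)
```

Each `book` element ends up as one row-like dictionary, which hints at how nested XML can be flattened into a table when the structure allows it.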

The XML page on Wikipedia offers an extensive overview of all the details and technicalities of this format, but the key concepts are simple. …

When you need to concatenate a Python string with some variables, there are actually several ways to achieve the same results.
Some of them are quite basic, others are less recommended, but all of them get the job done for string formatting purposes.

The basic way

The simplest method for concatenating strings and variables is the following:

>>> print("hello" + " " + "world")
hello world
>>> age = 28
>>> print("I am " + str(age) + " years old")
I am 28 years old
>>> print("Pi is equal to " + str(3.14159265) + " and so on")
Pi is equal to 3.14159265 …
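Beyond plain concatenation, two other widely used options in Python are `str.format()` and f-strings (available from Python 3.6). The sketch below is only an illustration of those two built-in features, reusing the `age` value from the session above; it is not necessarily the set of methods the full article covers.

```python
age = 28
pi = 3.14159265

# Concatenation with an explicit str() conversion, as in the basic way.
concatenated = "I am " + str(age) + " years old"

# str.format() fills the {} placeholder and converts the value itself.
formatted = "I am {} years old".format(age)

# An f-string evaluates the expression inside the braces in place.
interpolated = f"I am {age} years old"

# A format specification controls how the value is rendered,
# here a float with two decimal places.
rounded = f"Pi is roughly {pi:.2f}"

print(concatenated)
print(formatted)
print(interpolated)
print(rounded)
```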

The Linux PATH is an environment variable that contains all the directories that the shell will search for executable files when a user issues a command.

It can be modified temporarily or permanently to include specific software, so that typing the full path to that software is no longer needed. Setting the PATH variable can also be useful if a user wants to use a different version of a program already included in the PATH.
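The idea of an ordered list of directories can be sketched in Python. The helper names `path_directories` and `prepend_to_path` and the directory `/opt/mytool/bin` are hypothetical, chosen only for this example; the code builds a new PATH-style string and does not change the PATH of any shell.

```python
import os
import shutil


def path_directories(path_value=None):
    """Split a PATH-style string into its individual directories."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [d for d in path_value.split(os.pathsep) if d]


def prepend_to_path(directory, path_value):
    """Return a new PATH-style string in which `directory` is searched first."""
    return os.pathsep.join([directory] + path_directories(path_value))


# Example value assembled by hand, so the result does not depend on the machine.
example_path = os.pathsep.join(["/usr/local/bin", "/usr/bin", "/bin"])
updated_path = prepend_to_path("/opt/mytool/bin", example_path)
print(path_directories(updated_path))

# shutil.which reports which executable a command name would resolve to
# for a given search path, or None if nothing matches.
print(shutil.which("ls", path=updated_path))
```

Directories are searched in order, so placing a directory first is what lets a different version of a program take precedence over one found later in the list.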

Since I often use different types of shells to do my job, I decided to highlight how to change the PATH…

One of the most important things to do after setting up a new Linux server (or after taking over an existing one) is to create a new user, possibly with sudo powers. Sudo is a special Linux command that allows users to perform administrator tasks even if they are not system admins.

The main reason for having a sudo user (or sudoer) is that logging in as root is usually not desirable, since it can cause trouble more often than not, but we may still want to be able to perform administrator tasks with a non-root user. …

Next Generation Sequencing techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a value called Phred quality score (or Q score).

The Phred score of a base is an integer value that represents the estimated probability of an error in base calling. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:

Q = -10 log10 P
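The formula and its inverse, P = 10^(-Q/10), can be sketched as two small Python functions. The function names are made up for this example, and the results are left as floats rather than rounded to integers.

```python
import math


def error_probability_to_phred(p):
    """Return the Q score for an error probability p, using Q = -10 log10 P."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def phred_to_error_probability(q):
    """Return the error probability for a Q score, using P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f}")
```

Every increase of 10 in the Q score corresponds to a tenfold drop in the estimated error probability.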

In the real world, a quality score of 20 means that there is a 1 in 100…

A well-established bioinformatician usually has a handful of appropriate informatics tools to manipulate and analyse genomic data, for example counting sequences in a file.

Nonetheless, in some cases it may be useful to rely on standard Unix commands, for example when your trusty laptop is not available or you’re working on someone else’s machine.

FASTA files

A .fasta file is a simple plain text file in which every sequence is represented by a header line, beginning with > and containing the sequence identifier and details, followed by a number of lines containing the actual sequence:
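As a rough illustration of this layout, the sketch below holds a tiny made-up FASTA record set in a string and counts the sequences by counting header lines. It is written in Python rather than with Unix commands, and the function name `count_fasta_sequences` and the sequences themselves are assumptions made for the example.

```python
def count_fasta_sequences(lines):
    """Count sequences by counting header lines, i.e. lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical FASTA content: two sequences, the first split over two lines.
example_fasta = """>seq1 first made-up sequence
ACGTACGTACGT
ACGTAC
>seq2 second made-up sequence
TTGACCA
"""

n_sequences = count_fasta_sequences(example_fasta.splitlines())
print(n_sequences)
```

The same function accepts an open file handle, since iterating over a text file also yields one line at a time.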


I often find myself looking for some shortcuts and quick commands while working in the Unix terminal, so here is a list of the most useful ones to perform common tasks.

File system

ls – list items in the current directory
ls -l – list items in the current directory in a long format, to see permissions, size and modification date
ls -a – list all items in the current directory, including hidden files
ls -F – list items in the current directory showing directories with a slash and executables with a star
ls [dir] – list items in the directory [dir]
cd [dir] – change directory…

Roberto Preste

I’m a Scientific Software Developer, with a PhD in Bioinformatics, located in Cambridge (UK). I like writing about programming, data science and bioinformatics.

