A Comprehensive Beginner’s Introduction To Linux For The Data Scientist

Kaelyn Parris
Apr 12, 2023


For the data scientist first entering it, the world of Linux can be intimidating. Downloading a distribution isn’t enough: the user is thrust into confusing terminal commands and an alien vocabulary, and on top of that, someone else’s Linux can look completely different from yours and take totally different commands.

Thankfully, getting an overview of the complex open source landscape doesn’t need to be painful, or require years of running Linux yourself. By familiarizing themselves with a few terms, data scientists totally new to the world of Linux can quickly get up to speed and choose the tools they need.

But before we begin: a lot of terms are going to be thrown around, and it helps to have a place to come back to when you see a word you don’t understand. I’ve tried to condense everything the beginner needs to know into one section. Feel free to read it at the start, or refer back to it later.

What is Linux?:

Linux is a kernel. Technically the operating system we think of as Linux is GNU/Linux — the GNU operating system running with the Linux kernel. Nowadays it is just called Linux, and the difference is irrelevant for the home user.

What is a Linux distro?:

Short for distribution. A Linux distro is a complete bundle of software and packages that together form a working Linux desktop. Distros can loosely be compared to Windows releases, but while Windows offers only one ‘distro’ per generation, Linux has many distributions available at the same time.

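What is a package manager?:

There are two primary Linux package formats (besides compiling from source): .deb, used by Debian and Ubuntu, and .rpm (originally the Red Hat Package Manager). A package manager is the software that installs, updates, and removes these packages, and some package managers are easier to use than others. The most common one is apt (along with its older front end, apt-get), which is used by Debian- and Ubuntu-based distributions.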

Why does the base of the distribution matter?:

Many distributions are based on Ubuntu, and for beginners this is a very good thing. Almost all support articles on the internet are written for Ubuntu, and their instructions will also work on any of its derivatives. On distributions with other bases and other package managers, help may be harder to find, or you may need to go to that distribution’s community.
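Because so much depends on the base, it helps to know how to check it on a running system. Most modern distributions describe themselves in the `/etc/os-release` file. Its `ID` field names the distribution, and `ID_LIKE` lists the bases it derives from. The Python sketch below is a minimal example that parses that KEY=value format and guesses the usual command-line package manager. The mapping table and the function names are my own illustrative assumptions, not an official API. On Python 3.10 and later, `platform.freedesktop_os_release()` can also read the file for you.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Illustrative (hypothetical) mapping from a distribution ID to its usual
# command-line package manager. It is not exhaustive.
PACKAGE_MANAGERS = {
    "debian": "apt",
    "ubuntu": "apt",
    "fedora": "dnf",
    "rhel": "dnf",
    "opensuse": "zypper",
    "suse": "zypper",
    "arch": "pacman",
    "gentoo": "emerge",
}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dictionary."""
    fields: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be shell-quoted; shlex removes the quotes and escapes.
        fields[key] = " ".join(shlex.split(value))
    return fields


def guess_package_manager(fields: dict[str, str]) -> str | None:
    """Try ID first, then each base listed in ID_LIKE, against the table above."""
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for distro_id in candidates:
        if distro_id in PACKAGE_MANAGERS:
            return PACKAGE_MANAGERS[distro_id]
    return None


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.exists():
        info = parse_os_release(os_release.read_text())
        print(info.get("PRETTY_NAME", "unknown"), "->", guess_package_manager(info))
```

For example, take abbreviated Mint-style values of `ID=linuxmint` and `ID_LIKE="ubuntu debian"`. The ID itself is not in the table, so the sketch falls through to `ubuntu` and suggests `apt`. This is exactly why Ubuntu instructions tend to carry over to its derivatives.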

What is a LTS (long-term support) release?:

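The term is most closely associated with Ubuntu, and it is important to know. Ubuntu puts out a new release every six months with the latest software and kernel. Every two years it also publishes a long-term support release, which receives five years of standard updates and provides a stable desktop.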

What is a rolling distribution?:

There are two options when it comes to distributions: snapshot distributions (often called fixed or point releases) and rolling distributions. Snapshot distributions are the most common. Each release is a “snapshot” of extensive developer work to create a comprehensive package set that is stable and usable on the home desktop. Software versions are usually frozen and receive only security updates, so you will need to install the next release every few years.

Rolling releases constantly provide the user with the latest software and security updates. The primary drawback of rolling releases is they are prone to breaking, and require much more work to maintain on the part of the user. Think of snapshot distributions as having that work done for you, while on rolling distributions you have to do it yourself.

How do I find Linux distributions to download?:

The single best resource on the internet is DistroWatch.

What affects the user:

For a beginner coming into the world of Linux, there are two main things to consider:

The back end:

The back end includes a critical program that you will be interacting with constantly, even if you don’t realize it: the package manager. It also includes the distribution’s repos (repositories), which are places to pull trusted, up-to-date packages from. Windows users, especially those who used Windows before version 8 introduced an app store, will recall downloading programs from websites all over the internet. That also meant having to tell malware apart from legitimate software.

The back end also includes an init system and service manager, like systemd, and a display server protocol, like Wayland or the X Window System. These are much lower-level programs, and for a beginner, simply knowing they exist is enough. Should you want to dive to that level, the options are there.

The front end:

This is where you will be interacting with your system 99% of the time, and it includes a desktop environment. If you are not familiar with the term, think of the Windows 11 desktop. The desktop environment is the taskbar, the window title bars, the desktop itself: everything you interact with. Windows 11 has a different, but similar, desktop environment to Windows 8 and 7.

In Linux, there are lots of different desktop environments to choose from, and which one you decide on mostly comes down to personal preference, and how fast your system is. The modern desktop environments will typically be the preferred choice when system resources aren’t an issue, but which modern desktop comes down to how you like to use your system, or whether you would like to try something new.
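If you are unsure which display server or desktop environment a machine is running, the login session usually advertises both through environment variables. `XDG_SESSION_TYPE` holds a value such as `wayland` or `x11`, and `XDG_CURRENT_DESKTOP` holds a colon-separated list such as `ubuntu:GNOME`. The Python sketch below reads them and falls back to `WAYLAND_DISPLAY` and `DISPLAY` when the session type is unset. The function name is hypothetical, and not every login setup exports these variables, so treat the result as a hint rather than a guarantee.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def describe_session(env: Mapping[str, str]) -> tuple[str, list[str]]:
    """Guess (display server, desktop names) from a session's environment variables."""
    session = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if not session:
        # Fallback: check Wayland first, because Wayland sessions usually
        # also set DISPLAY for XWayland (X11 compatibility).
        if env.get("WAYLAND_DISPLAY"):
            session = "wayland"
        elif env.get("DISPLAY"):
            session = "x11"
        else:
            session = "unknown"
    # XDG_CURRENT_DESKTOP may list several names, e.g. "ubuntu:GNOME".
    desktops = [name for name in env.get("XDG_CURRENT_DESKTOP", "").split(":") if name]
    return session, desktops


if __name__ == "__main__":
    server, desktops = describe_session(os.environ)
    print(f"display server: {server}; desktop: {', '.join(desktops) or 'unknown'}")
```

Wayland is checked before `DISPLAY` on purpose. Older X11 applications run under Wayland through XWayland, so a Wayland session will often have both variables set.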

The Primary Desktop Environments:

This is by no means an exhaustive list of desktop environments, but basic familiarity with the following will expose the user to the two primary toolkits used to build Linux applications: GTK (typically associated with Gnome) and Qt (typically associated with KDE).

Modern Desktops:

KDE Plasma 5:

This is the desktop Windows users will find most familiar, and also my personal favorite. KDE, the project most closely associated with the Qt toolkit, was once an extremely powerful and advanced desktop with two problems. It was plagued by a horrible default theme (Oxygen was egregious, no one can convince me otherwise), and by a waterfall of options and customization that made the desktop cluttered and clunky. Starting with Plasma 5, KDE rebuilt its entire aesthetic design from the ground up and redesigned its menus to be much more user friendly. Today KDE stands as a powerful, modern, fast, and flexible desktop. Its out-of-the-box theme doesn’t need changing, and its flow is modern and intuitive. KDE is also the most customizable Linux desktop, bar none. Some users want a unique desktop, or old features of the Compiz window manager such as wobbly windows and the 3D desktop. They will find that KDE’s window manager, KWin, fits all their needs.

Gnome 3:

Gnome was once the flagship of the traditional desktop world and is the project behind the much loved and widely used GTK toolkit. With Gnome 3 it changed course towards a more modern experience, to an outcry of both love and hate. In its original release Gnome 3 was very limited, but it offered glimpses of a new paradigm of work. If you see the desktop as minimalistic, rather than as a tablet interface, its potential for speed and focus becomes clear. The desktop is unlike any other on the market. It has a unique way of launching applications and an innovation of its own: dynamic workspaces, which are added and removed automatically as you need them. Gnome 3 excels at single-task focus, with ideally one or a few windows open, and it has matured into a solid and unique desktop experience. Given a chance, and not modified straight away to be more traditional, Gnome 3 can greatly improve focus, though some users may find its flow too hard to adapt to.

Cinnamon:

Cinnamon is a unique desktop developed by the Linux Mint project for its own Linux Mint distributions. It was created after the traditional Gnome 2 desktop was abandoned and the tablet-like Gnome 3 became the central focus of the Gnome project. It is a GTK-based fork of Gnome 3 with a traditional layout. Windows users will find this desktop familiar and pleasant to use.

Unity:

Unity is a desktop environment created by the Ubuntu project, which abandoned it in favor of Gnome 3 starting with Ubuntu 17.10. It is still maintained by the Ubuntu Unity project, allowing users to once again use the much loved, and much hated, Unity. This is one desktop environment I really enjoyed. I especially liked its theme, which adapts to the desktop wallpaper, and the global menu that Mac OS users will be intimately familiar with. It is an easy to use but non-traditional desktop that sets itself apart from both Windows and Mac.

Pantheon:

Pantheon is a unique desktop environment built by the elementary OS project. It uses GTK, runs on Ubuntu’s LTS base, and pairs a suite of elementary’s own apps with a custom-built, unified desktop experience. At first glance most people will think it looks like Mac OS, but its workflow and experience are very different. It offers a very nice-looking and fast experience for those who like their desktop to look pretty, and for those more familiar with the Mac side.

Traditional Desktops:

These desktops follow a traditional, older paradigm, but are by no means outdated. These desktops are extremely lightweight, unbelievably fast, and can be used to breathe life into old hardware once again. They offer an older way of doing things without sacrificing the conveniences of modern software.

A big selling point of the following desktops is their ability to use both Qt and GTK applications (especially LXDE). When crossing lanes the themes will still clash, but it will be much less harsh than on KDE or Gnome.

Mate:

Mate (pronounced “mah-tay”) is a fork of the Gnome 2 code, maintained and updated into the modern day. The code and applications are modern, but the desktop once again provides the much loved and familiar Applications-Places-System menus and two-panel layout that was once known as “the Linux desktop look”. Users craving that classic Linux look will find the environment heavenly. Those looking for a more modern workflow (such as a Windows-style menu to start applications) will find enough customization options to adapt the desktop to their preferences.

Xfce:

Xfce is a fast and lightweight GTK-based desktop environment, and it stands as one of the smallest (in terms of download size) and most popular Linux desktops. It offers modern aesthetics running on the X Window System, a suite of traditional apps of its own, and continued development and updates to this day. Xfce can be adapted to almost any workflow, and it offers a level of customization surpassed only by KDE.

Lxde:

LXDE is an extremely lightweight GTK-based desktop (its Qt-based successor is LXQt). It is one of the fastest desktops out there, and also my least favorite. Its primary use is on old computers with limited power. Users will find it extremely minimalistic, perhaps to the point of being too limited.

Tiling Window Managers:

Here there be dragons! This is not an individual desktop but a group, including window managers such as i3 and Awesome WM. Users looking for a “hacker” desktop will find this to be everything they’ve ever dreamed of and more. These window managers are created by developers, for developers, and let you configure, or even code, almost every aspect of your workspace yourself. Those with the programming know-how and the initiative will find them a very fun experience. Those without will find them extremely limiting and frustrating, as most applications aren’t designed for such window managers and end up maximized anyway. Getting the most out of one requires thorough mastery of the command line and of command-line programs.

Deciding On A Distro:

What follows is a list of what I think are the most important Linux distributions for a new user to become familiar with. Running each of these distributions at least once will give the user a wide range of experience and comfort with both the front end and the back end of Linux.

The distributions are organized in order of increasing difficulty.

NOTE: Do not think that the goal is to reach an “advanced” distribution. The difficulty reflects the level of knowledge required to maintain the system. Linux is an ocean. You can dive very deep, exploring and customizing as your needs dictate and as you see fit, but you can also enjoy floating on the surface. How much detail your system demands will depend both on your needs and on how much you want to put into it. It can be as boring as you like, or as exciting as you like (and also likely to break).

Easy Distributions:

Linux Mint:

Linux Mint is by far the best distribution for a beginner. It has a large and helpful community for Linux newcomers, as well as a desktop made by the project itself and totally focused on making the new-user experience as easy as possible. If you want a system that is as boring, stable, and reliable as possible, Linux Mint is an excellent choice. It is based on Ubuntu LTS and benefits from all the support that Ubuntu does.

Ubuntu:

Ubuntu is not quite as easy as Linux Mint, but it is still the most common distribution among beginners. It is legendary for being one of the first attempts (if not the first, then the most successful) to bring a Linux desktop to the home user, rather than to developers and servers. Based on Debian, it made the Gnome 2 desktop the face of Linux and, along with the Compiz window manager, was once the most widely used distribution and desktop. Then Gnome 2 was abandoned and Canonical (the company behind Ubuntu) took a new direction. Ubuntu moved towards a more modern, tablet-like interface with Unity and alienated many users. It is still one of the most popular and easiest distributions to use, and it is what most support articles are written for.

Alternative Ubuntu Flavors:

Ubuntu currently uses a modified Gnome 3 as its flagship desktop. If that is not to your liking, there are community-maintained official flavors with alternative desktops, such as Kubuntu (KDE), Lubuntu (LXQt, formerly LXDE), Xubuntu (Xfce), and others.

Elementary OS:

Designed by elementary, elementary OS boasts a totally unique and well-designed desktop that is fast and functional. It is based on the latest Ubuntu LTS.

Ubuntu Mate:

Ubuntu Mate is a relative newcomer to the beginner desktop space, and it offers a user-friendly, beginner-friendly experience that rivals Linux Mint. Users will find the traditional Gnome 2-style desktop, along with easy options to switch to other layouts.

Intermediate:

All of the above distributions are based on Ubuntu, and therefore all use the same package manager, apt. From here on, the package manager and significant portions of the back end change. Many articles written specifically for apt will no longer apply, and the user may need to seek support in each distribution’s own community.

Fedora:

Fedora is developed by the Fedora Project and sponsored by Red Hat, the titan of Linux servers. It is a distribution designed for the home user and features cutting-edge software in its repos. This is the closest the home user can get to the cutting edge without using Arch Linux. Fedora is RPM-based, uses DNF (the successor to YUM) as its package manager, and ships Gnome 3 as its flagship desktop.

openSUSE:

openSUSE is also an RPM-based distribution, and also a home desktop created by a large provider of server Linux software: SUSE. It offers a wide variety of choices, including a choice between a classic snapshot distribution (Leap) and a rolling distribution (Tumbleweed).

Manjaro:

This distribution is based on Arch Linux, with tools and a user guide designed to make the experience a whole lot easier. Manjaro took the Linux world by storm when it came out and still stands as one of the most popular Linux distributions. It offers a wide variety of desktop environments.

Advanced:

Arch Linux:

Arch Linux is one of the most notorious, and also most misunderstood, distributions. Its install process is legendary, having the user build their desktop up from a command line and basic Linux tools. Installation has since become much easier. In fact, the official install image now includes a guided installer, archinstall, that can automate the whole thing. The Arch community once had a negative reputation, and a common complaint was that newcomers were told to RTFM (read the freakin’ manual). Arch Linux has changed greatly over the years and really isn’t that intimidating to install or maintain. The community has also changed, and whatever you may have heard about it, it deserves a fair chance. This distribution also boasts an incredibly powerful and popular package manager, pacman, a huge selection of packages, and the legendary AUR (Arch User Repository). The AUR is entirely community maintained, and well-maintained, high-quality packages sometimes make it into the official repos. If you can’t find a piece of software in the Arch repos, chances are someone has uploaded it to the AUR.

Gentoo Linux:

Here we reach what is probably the most advanced distribution on DistroWatch, and also one of the most niche. Gentoo has a very similar install process to Arch, and users with Arch experience will find themselves carrying out many of the same steps. What sets Gentoo apart is that Arch (despite its already high difficulty) automates far more of the install process. The primary selling point of Gentoo is complete control over the software installed on the computer. The distro is built around compiling packages from source, which lets the user optimize them for their specific hardware. This is a distribution that appeals to a niche of a niche of a niche. Nowadays hardware is standardized and powerful enough that such optimization rarely makes a noticeable difference, but the distribution still exists for those who want it.

And with that the reader should know everything they need to understand the basic Linux landscape. Good luck, and happy coding!
