Gigantum Quick-start #1

Published in Gigantum, May 9, 2019

This is Quick-start #1 for Gigantum, the first in a series going progressively deeper into the platform’s capabilities.

In this first post we answer:

What is Gigantum?
What is a Project?
What is the Activity Feed?

This Quick-start is meant to be interactive, and to follow along you can use the cloud demo available here. If you would prefer to work locally, you can see how to download Gigantum here.

1. What is Gigantum?

Gigantum is a data science platform that lets you work locally but share globally with a single click. It combines a browser-based work environment with a publishing platform to make data science and scientific research easy to reproduce.

This is the main Projects overview page. Note it is pointing at localhost.

The MIT-licensed Gigantum Client runs locally in a Docker container and automates everything needed to create, use & share data science work. It also provides development environments in the form of Jupyter, JupyterLab, and now RStudio.

Gigantum Projects are repositories that organize and automatically version your data, code & environment configuration.

Gigantum Cloud is a storage & collaboration service for dissemination & sharing that provides backups & permissioned access for collaborators.

It also has an open and simple portal for public Projects.

2. What is a Project?

A Gigantum Project is a Git repository that bundles code, data and environment configuration along with a history of attributed actions and executions called the Activity Feed.

This is the folder structure of a Gigantum Project as seen on disk.

Projects are visible on disk & can be browsed like a directory but you typically interact with Projects in the Client.
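Since a Project is just a folder on disk, you can also peek at it from a script. Here is a minimal sketch, assuming only that a Project is an ordinary directory containing a Git repository. The function name `summarize_project` is ours, not part of Gigantum, and the path is a placeholder you would replace with the location of a Project on your machine.

```python
from pathlib import Path


def summarize_project(project_dir):
    """Return (is_git_repo, entries) for a Project folder on disk.

    Hypothetical helper for illustration; not part of the Gigantum Client.
    """
    root = Path(project_dir)
    # A Git repository normally carries a .git entry at its top level.
    is_git_repo = (root / ".git").exists()
    # List the top-level entries, marking folders with a trailing slash.
    entries = sorted(
        p.name + ("/" if p.is_dir() else "")
        for p in root.iterdir()
        if p.name != ".git"
    )
    return is_git_repo, entries


if __name__ == "__main__":
    # "." is a placeholder; point this at a Project folder on your machine.
    is_repo, names = summarize_project(".")
    print("Git repository:", is_repo)
    for name in names:
        print(" ", name)
```

This only reads the folder. Changes to a Project are best made through the Client so that they show up in the Activity Feed.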

So let’s get a Client’s eye view of a Project.

The demo Project

When you first log in to the Client, you see the main Projects listing page. There will only be one Project, titled my-first-project, which analyzes per capita income against developer salary information in various countries. It is a mini-tutorial on basic, and not so basic, aspects of Gigantum.

This is the Project overview page, and you can see the tabs for the 4 major components.

You will also see the “guide” as a bunch of yellow dots on top of the application. Mousing over the dots gives information on different features. You can turn it off via the help menu on the bottom right-hand side of the page.

Click on the Project card to go to the Overview page where there is a Readme and summary.

What is in a Project

Projects have 4 major components: Code, Data, Environment & Activity.

Step 1: Look in the Code directory.

To use files in Gigantum, they have to be in a Project, typically in the Code, Input Data, or Output Data directories. So let’s look at those directories.

If you haven’t yet clicked the Project card, then do it to open the Overview.
Click the Code tab to see the filename for the demo notebook.

This directory is automatically versioned, and you should put your code here.

*Adding files in Gigantum is fairly easy. Just drag and drop or upload files using the buttons in the directories.*

Step 2: Look in the Input and Output Data Directories.

Input and Output Data directories are automatically versioned, and they are the proper locations for your data and outputs.

Click the Input Data tab to see an Excel file and a blue subdirectory. The blue subdirectory is special, but ignore it for now.
Click the Output Data tab. You will only see the untracked subdirectory.

The untracked subdirectory is the exception to automated versioning. This subdirectory is for things you don’t want versioned or synced to the Cloud because they are sensitive, too large, or don’t actually need to be versioned.
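To make the rule concrete, here is a toy sketch of the idea that everything is versioned except what sits under a folder named untracked. This is an illustration of the rule as described above, not how the Client implements it. The function name and the example paths are made up, and matching on the folder name anywhere in the path is our assumption.

```python
from pathlib import PurePosixPath

# Folder name taken from the post; the matching rule below is an assumption.
UNTRACKED_DIR_NAME = "untracked"


def is_versioned(relative_path):
    """Return False if any folder in the path is named 'untracked'."""
    parts = PurePosixPath(relative_path).parts
    # Only folder components count, so a file that happens to be called
    # "untracked" is still treated as versioned.
    return UNTRACKED_DIR_NAME not in parts[:-1]


# Example paths are hypothetical.
print(is_versioned("output/results.csv"))
print(is_versioned("output/untracked/huge_model.bin"))
```

The first call reports a versioned file and the second reports one that would be left out of versioning and syncing.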

Step 3: Inspect the Environment.

Each Project has a Docker container that provides a computational environment. The container is built automatically on Project creation, and it is modifiable (and versioned) throughout the Project’s life.

You can see the configuration of that environment in the Environment tab.

Click on the Environment tab to see the installed pip packages.
Click on the conda tab, and you will see that there are no conda packages.
Click on the apt tab, and again there are no apt packages.

In our next post in this Quick-start series, we’ll dig into how you edit a Project’s environment, including how to use package managers and directly modify the Project’s Dockerfile.

3. What is the Activity Feed?

The history of a Project, i.e. who did what and when, is available in the Activity Feed. This feature dynamically captures and illustrates changes to files, as well as code executions and outputs.

Step 1: Viewing the history

The main thing the Activity Feed provides is a history of who did what, when.

In the Client, click on the Activity tab.
Scroll down and look at the individual entries. Click on the drop down arrow to see what files were altered and what code was executed.

The video is a little blurry, but it should give you the idea.

The Activity has timestamped entries indicating who did what. As you scroll down, you will see the username gigantum-examples. That is the robot we use to make our examples.

The Activity is a living thing, and the best way to see this is with a live execution environment.

Step 2: Start the Project container and launch a Jupyter notebook.

Let’s start a Project container and open the demo notebook. Even if you know nothing about containers, it doesn’t matter. The Client will handle it.

Click the blue Launch jupyterlab button on the tool bar.
If you prefer classic Jupyter, then use the drop down menu to select Notebook.
A Jupyter(Lab) instance will open in a new tab. Click the filename to open the notebook.

A close up of the Launch button in the top right hand side of the app. Note how to select classic notebook.

If there was a problem, then you probably have a popup blocker. Wrangle your blocker & then do the following:

Stop the Project container by clicking the status toggle to the right of the Launch button.
Once the toggle turns grey and says “Stopped”, go back and launch another Project container.

The status toggle is the basic “Off Button” for when you want to stop the Project container, e.g. when you are finished computing.

Step 3: View recent activity

The Activity keeps track of things as they happen.

The final thing to show in this post is how the Activity Feed updates while you work in your notebook.

As you work, Gigantum monitors Jupyter to extract information that later renders into the Activity Feed. The rendering happens when the kernel goes idle for a bit, & at that point, Gigantum automatically creates a new version and extracts results for viewing in the Activity Feed.

As you work, Gigantum will let you roll back to any point in time where data was written to disk. This means that if you write a file into the Output Data directory, or you hit save for the Jupyter notebook, or Jupyter’s checkpointing runs (at whatever frequency), you will get a rollback point.

Even for executions that don’t provide a rollback point, the Activity Feed still provides an easy way to capture the code that was executed, because it will have its own entry.
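One way to picture this is a toy model in which every execution gets an entry, but only entries that wrote data to disk can be rolled back to. The sketch below is purely illustrative: the `ActivityEntry` class, its fields, and the sample entries are our own invention and do not reflect how Gigantum stores activity internally.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActivityEntry:
    """Hypothetical stand-in for one entry in the Activity Feed."""
    username: str
    timestamp: datetime
    summary: str
    wrote_to_disk: bool


def rollback_points(entries):
    """Keep only the entries that correspond to data written to disk."""
    return [entry for entry in entries if entry.wrote_to_disk]


# Made-up sample feed: one in-memory execution and two writes to disk.
feed = [
    ActivityEntry("gigantum-examples", datetime(2019, 5, 9, 10, 0),
                  "Executed a cell that only printed a value", False),
    ActivityEntry("gigantum-examples", datetime(2019, 5, 9, 10, 5),
                  "Saved the notebook", True),
    ActivityEntry("gigantum-examples", datetime(2019, 5, 9, 10, 9),
                  "Wrote a file to Output Data", True),
]

for entry in rollback_points(feed):
    print(entry.timestamp.isoformat(), entry.summary)
```

All three entries would appear in the feed, while only the last two would count as points you can return to.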

What’s up next?

In our next post we will dig into the demo notebook and show a variety of things about the Activity, as well as show you how to Publish your Projects for the purposes of storage and sharing.

In Gigantum Quick-start #2 we’ll go over:

Using the Activity Feed to recover lost work.
Suppressing entries in the Activity Feed.
Publishing to Gigantum Cloud and importing cloud Projects.

Until next time!

Written by Tyler Whitehouse (CEO), Dean Kleissas (CTO), and Dav Clark (HDS) at Gigantum.

Follow us on Twitter or just try the software!

For the last two years we have worked hard to develop the MIT-licensed Gigantum Client. While it isn’t completely finished yet, you can read about it here, try it here, or even download it here.