This is Quick-start #1 for Gigantum, the first in a series going progressively deeper into the platform’s capabilities.
In this first post we answer:
- What is Gigantum?
- What is a Project?
- What is the Activity Feed?
1. What is Gigantum?
Gigantum is a data science platform that lets you work locally but share globally with a single click. It combines a browser-based work environment with a publishing platform to enable easy reproducibility of data science and scientific research.
The MIT-licensed Gigantum Client runs locally in a Docker container and automates everything needed to create, use & share data science. It also provides development environments in the form of Jupyter, JupyterLab, and now RStudio.
Gigantum Projects are repositories that organize and automatically version your data, code & environment configuration.
Gigantum Cloud is a storage & collaboration service for dissemination & sharing that provides backups & permissioned access for collaborators.
It also has an open and simple portal for public Projects.
2. What is a Project?
A Gigantum Project is a Git repository that bundles code, data and environment configuration along with a history of attributed actions and executions called the Activity Feed.
Projects are visible on disk & can be browsed like a directory but you typically interact with Projects in the Client.
So let’s get a Client’s eye view of a Project.
The demo Project
When you first log in to the Client, you see the main Projects listing page. There will only be one Project, titled my-first-project, which analyzes per capita income against developer salary information in various countries. It is a mini-tutorial on basic, and not so basic, aspects of Gigantum.
You will also see the “guide” as a bunch of yellow dots on top of the application. Mousing over the dots gives information on different features. You can turn it off via the help menu on the bottom right-hand side of the page.
Click on the Project card to go to the Overview page where there is a Readme and summary.
What is in a Project
Projects have four major components: Code, Data, Environment & Activity.
Step 1: Look in the Code directory.
To use files in Gigantum, they have to be in a Project, typically in the Code, Input Data, or Output Data directories. So let’s look at those directories.
- If you haven’t yet clicked the Project card, then do it to open the Overview.
- Click the Code tab to see the filename for the demo notebook.
This directory is automatically versioned, and you should put your code here.
Step 2: Look in the Input and Output Data Directories.
Input and Output Data directories are automatically versioned, and they are the proper locations for your data and outputs.
- Click the Input Data tab to see an Excel file and a blue sub directory. The blue sub directory is special, but ignore it for now.
- Click the Output Data tab. You will only see the untracked sub directory.
The untracked sub directory is the exception to automated versioning. This sub directory is for things you don’t want versioned or synced to the Cloud because they are sensitive, too large, or don’t actually need to be versioned.
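The choice between a versioned directory and the untracked sub directory can be sketched in a few lines of Python. This is only an illustration of the rule of thumb above: the helper `choose_output_dir`, the on-disk names `output` and `output/untracked`, and the 100 MB threshold are all assumptions made for this example, not part of Gigantum.

```python
from pathlib import Path

# Hypothetical threshold; pick whatever "too large to version" means for you.
MAX_VERSIONED_BYTES = 100 * 1024 * 1024


def choose_output_dir(project_root, size_bytes, sensitive=False):
    """Return the directory a result file should be written to.

    Assumes, for illustration only, that Output Data lives in
    <project_root>/output and that its unversioned sub directory is
    <project_root>/output/untracked.
    """
    output_dir = Path(project_root) / "output"
    if sensitive or size_bytes > MAX_VERSIONED_BYTES:
        # Sensitive or very large files stay out of versioning and sync.
        return output_dir / "untracked"
    return output_dir


if __name__ == "__main__":
    print(choose_output_dir("my-first-project", 2_000))
    print(choose_output_dir("my-first-project", 5 * 1024**3))
```

Small, shareable results go to the versioned directory, while anything flagged as sensitive or above the threshold is routed to the untracked sub directory.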
Step 3: Inspect the Environment.
Each Project has a Docker container that provides a computational environment. The container is built automatically on Project creation, and it is modifiable (and versioned) throughout the Project’s life.
You can see configuration of that environment in the Environment tab.
- Click on the Environment tab to see the installed packages.
- Click on the conda tab, and you will see that there are no conda packages.
- Click on the apt tab, and again there are no apt packages.
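If you want to cross-check the package list from inside a running notebook, a minimal sketch using only the Python standard library might look like the following. It is not a Gigantum API: `installed_packages` is a hypothetical helper, and it only sees Python distributions visible to the current interpreter, so it does not replace the Environment tab and will not list apt packages.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for the running interpreter."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip installs whose metadata has no name
            packages.add((name, dist.version))
    # Sort case-insensitively by package name.
    return sorted(packages, key=lambda item: item[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```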
In our next post in this Quick-start series, we’ll dig into how you edit a Project’s environment, including how to use package managers and directly modify the Project’s Dockerfile.
3. What is the Activity Feed?
The history of a Project, i.e. who did what and when, is available in the Activity Feed. This feature dynamically captures and illustrates changes to files, as well as code executions and outputs.
Step 1: Viewing the history
The main thing the Activity Feed provides is a history of who did what, when.
- In the Client, click on the Activity tab.
- Scroll down and look at the individual entries. Click on the drop down arrow to see what files were altered and what code was executed.
The Activity has time-stamped entries indicating who did what. As you scroll down, you will see the username gigantum-examples. That is the robot we use to make our examples.
The Activity is a living thing, and the best way to see this is with a live execution environment.
Step 2: Start the Project container and launch a Jupyter notebook.
Let’s start a Project container and open the demo notebook. Even if you know nothing about containers, it doesn’t matter. The Client will handle it.
- Click the blue Launch jupyterlab button on the toolbar.
- If you prefer classic Jupyter, then use the drop down menu to select Notebook.
- A Jupyter(Lab) instance will open in a new tab. Click the filename to open the notebook.
If there was a problem, then you probably have a popup blocker. Wrangle your blocker & then do the following:
- Stop the Project container by clicking the status toggle to the right of the Launch button.
- Once the toggle turns grey and says “Stopped”, go back and launch another Project container.
The status toggle is the basic “Off Button” for when you want to stop the Project container, e.g. when you are finished computing.
Step 3: View recent activity
The final thing to show in this post is how the Activity Feed updates while you work in your notebook.
As you work, Gigantum monitors Jupyter to extract information that later renders into the Activity Feed. The rendering happens when the kernel goes idle for a bit, & at that point, Gigantum automatically creates a new version and extracts results for viewing in the Activity Feed.
As you work, Gigantum will let you roll back to any point in time where data was written to disk. This means that if you write a file into the Output Data directory, or you hit save for the Jupyter notebook, or the Jupyter checkpointing runs at whatever frequency it is set to, you will get a rollback point.
Even for executions that don’t provide a rollback point, the Activity Feed still provides an easy way to capture the code that was executed, because it will have its own entry.
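As described above, writing a file to disk is one of the events that gives you a rollback point. A notebook cell along these lines would do that. Treat it as a sketch: the helper `save_results`, the file name `results.csv`, the placeholder rows, and the relative `output` path standing in for the Output Data directory are all assumptions made for this example.

```python
import csv
from pathlib import Path


def save_results(rows, output_dir, filename="results.csv"):
    """Write rows (a list of dicts sharing the same keys) to a CSV file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / filename
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Placeholder data; not taken from the demo Project.
    example_rows = [
        {"country": "A", "value": 1},
        {"country": "B", "value": 2},
    ]
    # "output" is an assumed location for the Output Data directory.
    print(save_results(example_rows, "output"))
```

Running a cell like this both executes code, which gets its own Activity entry, and writes data to disk, which is what makes a rollback point possible.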
What’s up next?
In our next post we will dig into the demo notebook and show a variety of things about the Activity, as well as show you how to Publish your Projects for the purposes of storage and sharing.
In Gigantum Quick-start #2 we’ll go over:
- Using the Activity Feed to recover lost work.
- Suppressing entries in the Activity Feed.
- Publishing to Gigantum Cloud and importing cloud Projects.
Until next time!