Published in qri.io

An Intro to Qri Command Line

Qri is ultimately about collaborating on datasets, and the command line is a great place to start as it shows off a lot of what qri can do. We cut this video (and accompanying blog post) to help you get started.

Download Qri CLI & Get Started

curl -fsSL https://qri.io/install.sh | sh

Video Breakdown

0:47 Save a dataset

Once you’ve got qri installed, the first step is creating a dataset. In a sense, I’m introducing a file I already have locally (synths.csv) to qri via the ‘save’ command:

$ qri save --body synths.csv

Proof that the dataset saved:

More on that ‘body’ flag later. For now it’s worth knowing that my synths.csv file has been saved to a qri repository.

1:43 Your Local Qri repo

To see what datasets you have in your qri repo at any given point, run ‘qri list’:

$ qri list

If you’ve already created a few datasets, as I have, you’ll get something like this:

2:20 Push to qri.cloud

Next step is to push that dataset to qri cloud using the ‘qri push’ command:

$ qri push me/synths

Once you’ve pushed the dataset, you (and the world) can view it on the cloud. In this case, the URL generated would be https://qri.cloud/b5/synths. Voilà:

3:25 Dataset Components

Qri datasets are more than just data. This is our point of view on what makes others’ datasets easier to work with. If you’re going to use someone else’s data/information, you need to be able to understand what it is for yourself. Qri dataset components help you do this.

See also: What are dataset components on Qri?

Using HTML as an analogy, the ‘body’ of a qri dataset is kind of like the ‘body’ of an HTML page. The ‘body’ is…the data, the stuff we care the most about.

You can read all about the other Qri dataset components here.

4:05 Version Control

Qri is, at its core, about version control. This is the most important difference between Qri and any other dataset tool. Every dataset in qri is versioned, and new versions can be created very easily, again, with the save command. In the example below, we’re creating a new version of our Synths dataset by adding two new components, a readme.md file and a meta.json file with the following command(s):

$ qri save --file readme.md --file meta.json me/synths

You can see that new version (and eventually all previous versions) with the ‘qri log’ command:

$ qri log me/synths

The result is a new version!

You may also notice qri has inferred a commit message, “updated meta and readme.” This is particularly useful when qri is working inside of data pipelines, where machines are doing much of the data movement and manipulation, and are unable to add human-readable context to key changes.
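To illustrate the idea of an inferred commit message, here is a minimal Python sketch. It is not Qri’s actual implementation: the function name `infer_commit_message` and its wording rules are assumptions made for this example, chosen only so the result resembles the message quoted above.

```python
def infer_commit_message(changed_components):
    """Build a short message from the names of the components that changed.

    Hypothetical helper for illustration only; Qri's real logic may differ.
    """
    names = sorted(changed_components)
    if not names:
        return "no changes"
    if len(names) == 1:
        return f"updated {names[0]}"
    # Join all but the last name with commas, then add "and <last>".
    return "updated " + ", ".join(names[:-1]) + " and " + names[-1]


message = infer_commit_message({"readme", "meta"})
print(message)  # updated meta and readme
```

The point of the sketch is that a machine can describe what changed from the list of touched components alone, without a human writing the message.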

Using the push command you’re already familiar with, you can now share this new version (enhanced with metadata and a readme) with the world on qri cloud.

5:46 Structure

Qri automatically infers and assigns a structure (or schema) to each dataset, which defines how the dataset becomes machine-readable. Structure includes the format (CSV, JSON) and the schema (JSON Schema, also used by OpenAPI). In this case qri correctly identified the data (body) content in columns 1 and 2 as strings, and column 3 as integers.

A view on the Structure component of a dataset on qri.cloud
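As a rough illustration of what inferring a structure from a CSV body can look like, here is a small Python sketch. It is not how Qri does it: the function `infer_column_types`, the column names, and the sample rows are all made up for this example, and it only distinguishes integers from strings.

```python
import csv
import io


def _is_int(value):
    """Return True if the text parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text):
    """Guess 'integer' or 'string' for each column of a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = {}
    for i, name in enumerate(header):
        values = [row[i] for row in data]
        # A column counts as integer only if it has values and all of them parse.
        is_int = bool(values) and all(_is_int(v) for v in values)
        types[name] = "integer" if is_int else "string"
    return types


# Placeholder data: two text columns followed by one numeric column.
sample = "name,maker,year\nSynth A,Maker X,1970\nSynth B,Maker Y,1981\n"
print(infer_column_types(sample))
# {'name': 'string', 'maker': 'string', 'year': 'integer'}
```

A real schema would carry more detail than this, but the result has the same shape as the one described above: two string columns and one integer column.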

This comes in handy for the next step…

6:20 Dataset SQL

Let’s pretend another user named “b6” comes along. b6 can use the ‘qri sql’ command to run SQL directly against any dataset, even those b6 does not have locally.

$ qri sql 

In the example below, b6 joins two datasets that have similar structures and are therefore joinable using country codes as the primary key.

Qri then finds those datasets, pulls the latest versions down to my local repository, runs the sql command, and spits out the join:

FUN!

From here, once qri is installed, you can hand someone a SQL statement. When they run it, you’ll know the statement is being run against the latest versions of the datasets in question.
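To make the kind of join described in this section concrete, here is a minimal sketch that uses Python’s built-in `sqlite3` module rather than ‘qri sql’. The table names, column names, and rows are placeholders invented for the example, not the contents of any real Qri dataset, and the query is ordinary SQLite SQL, not Qri’s syntax.

```python
import sqlite3

# In-memory database with two small tables keyed by country code.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE population (country_code TEXT PRIMARY KEY, population INTEGER);
    CREATE TABLE regions (country_code TEXT PRIMARY KEY, region TEXT);
    INSERT INTO population VALUES ('AAA', 100), ('BBB', 200);
    INSERT INTO regions VALUES ('AAA', 'Region 1'), ('BBB', 'Region 2'), ('CCC', 'Region 3');
    """
)

# Join the two tables on the shared country code.
rows = conn.execute(
    """
    SELECT p.country_code, r.region, p.population
    FROM population AS p
    JOIN regions AS r ON r.country_code = p.country_code
    ORDER BY p.country_code
    """
).fetchall()
conn.close()

print(rows)  # [('AAA', 'Region 1', 100), ('BBB', 'Region 2', 200)]
```

Only the codes present in both tables appear in the result, which is why a consistent structure and a shared key matter for joining datasets from different authors.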

Using the qri log command…

$ qri log b5/world_bank_population 

…returns a log of that dataset. This will show you which versions of the dataset you have locally (local), and which are held by others (remote) — either as peers or on qri cloud.

You can easily fetch older versions and work with them directly.

Read also: “Putting the Query back into Qri”

8:30 Local Checkout

Turn that dataset back into individual files with the ‘checkout’ command:

$ qri checkout b5/world_bank_population

This creates a linked working directory that I can use with other tools. Here the components are ‘broken up’ into normal, standard files (JSON, Markdown, CSV) that other tools and apps know how to work with.
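Because the checked-out components are plain files, any language with CSV and JSON support can read them. The sketch below shows this in Python under stated assumptions: the file names `body.csv` and `meta.json`, the helper `load_components`, and the file contents are placeholders for illustration, written into a temporary directory so the example runs without qri.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_components(directory):
    """Read a CSV body and a JSON meta file from a working directory.

    The file names are assumptions for this sketch, not confirmed Qri output.
    """
    d = Path(directory)
    with open(d / "body.csv", newline="") as f:
        body = list(csv.DictReader(f))
    meta = json.loads((d / "meta.json").read_text())
    return body, meta


# Stand-in for a checked-out dataset directory, filled with placeholder data.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "body.csv").write_text("country_code,population\nAAA,100\n")
    Path(tmp, "meta.json").write_text(json.dumps({"title": "Placeholder title"}))
    body, meta = load_components(tmp)

print(body[0]["population"], meta["title"])  # 100 Placeholder title
```

Note that the CSV reader returns every value as text, so the population comes back as the string "100".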

Qri Pull

Data changes (updates) all the time. To be assured you’re working with the latest version of a dataset, use the ‘qri pull’ command:

$ qri pull b5/world_bank_population

9:17 Conclusion

We think these features add up to more than the sum of their parts. Among other key benefits, Qri datasets are:

  • interoperable
  • easier to version
  • easier to move
  • easier to understand and contextualize as you inspect and prepare to work with the data (body).
  • …and ultimately easier to collaborate on, which means less work for everybody.

It can be very difficult to rely on one another’s data, so we need version histories and consistent structures to help us stay organized and informed. We hope making data easier to work with in this way will bring down the barriers preventing us from working on data together, and allow us to build on each other’s work.

Rico Gardaphe


Head of Business Development for Qri — free and open source dataset versioning software. Former strategy consultant and Obama White House staffer.