Tracking, organization, and collaboration for data science projects with Neptune.

Jakub Czakon
Mar 12, 2019 · 5 min read

I am proud to announce that we have just made Neptune open and free for non-organizations!


Some of you are excited already, while the rest are probably thinking: what is Neptune, anyway?

Neptune is an experiment tracking hub that brings organization and collaboration to your data science team. It is a hosted knowledge repository that you can share with anyone. It also keeps backups of your work, so your mind stays clear and at peace.

It works with any infrastructure setup, framework or working style, and lets you track hyperparameters, metrics, code versions, data versions and more.

Once everything is safely logged to Neptune, you can organize it, share your knowledge and ideas with others, and actually collaborate on data science projects.

Let me share this quick start project with you to show you what I mean.


Looks interesting?

Cool, let’s get you started then!

Get started

Sign up

Go to https://neptune.ml/ and sign up. It is free for non-organizations, so you can work on your own projects at no cost. No strings attached.

Get your API token

In order to start working with Neptune, you need to get the API token first.
To do that, click on the `Get API Token` button on the top left.

Get API token.

Create a project

Click on Project and then New project. Choose a name for it and decide whether you want it public or private.

Create a project.

Invite others

Go to your project, click Settings and send invites!

Invite people.

Install Neptune client

pip install neptune-client

Ok, it seems that you are ready to go. Let’s learn how to track stuff with Neptune.

Intro to tracking

Initialize Neptune

Toward the top of your script insert the following snippet.

You can (and should) keep your API token in the environment variable NEPTUNE_API_TOKEN. If you do so, Neptune will find it and you won’t have to pass it at initialization.
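Here is a minimal sketch of that setup. The environment lookup is plain Python. The `neptune.init` call in the comment is an assumption about the neptune-client API of the time, and `USER_NAME/PROJECT_NAME` is a placeholder for your own project.

```python
import os

# Look the token up in the environment so it never has to appear in the script.
api_token = os.environ.get("NEPTUNE_API_TOKEN")

if api_token is None:
    print("NEPTUNE_API_TOKEN is not set; export it in your shell first.")
else:
    print("Found an API token in the environment.")

# Assumed initialization call (legacy neptune-client); the project name is a placeholder:
#
#   import neptune
#   neptune.init(project_qualified_name="USER_NAME/PROJECT_NAME")
```

Keeping the token out of the source file also means you can commit the script without leaking credentials.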

Create an experiment

You can treat every piece of work that you want to record as an experiment.
For example, let’s train a random forest model on a synthetic dataset.

We need some training and testing data so let’s use the make_classification function from sklearn.datasets to generate it.
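A short sketch of the data generation could look like this. The sample count, feature count, split ratio and seed are illustrative choices, not values taken from this post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification problem (sizes and seed are illustrative).
X, y = make_classification(n_samples=2000, n_features=20, random_state=1234)

# Hold out half of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)

print(X_train.shape, X_test.shape)
```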

Now, we can start experimenting. Let’s train a model with some (quite randomly chosen) hyperparameters.
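One way to do this is to keep the hyperparameters in a single dictionary, so the same object goes to the model and to the tracker. The values below are made up for illustration, and the `neptune.create_experiment(params=...)` call in the comment is an assumption about the legacy neptune-client API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)

# Illustrative hyperparameters; one dict feeds both the model and the tracker.
params = {
    "n_estimators": 10,
    "criterion": "gini",
    "max_depth": 5,
    "min_samples_split": 10,
    "min_samples_leaf": 2,
    "random_state": 1234,
}

# Assumed legacy neptune-client call that records the dict with the experiment:
#   neptune.create_experiment(params=params)

clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)
```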

Neptune records the parameters for every experiment and gives you a customizable dashboard to organize information however you want. You can filter on values of the hyperparameters to get a better idea of what is important to your model.

Track hyperparameters.

Track metrics

It is always a good idea to know your model performance so let’s score our model and log that information to Neptune.
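As a sketch, scoring could use ROC AUC on the held-out data. The metric choice and the channel name `roc_auc` are assumptions made for this example, and the `neptune.send_metric` call in the comment assumes the legacy neptune-client API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)
clf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=1234)
clf.fit(X_train, y_train)

# Column 1 of predict_proba holds the predicted probability of the positive class.
y_test_pred = clf.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_test_pred)
print("ROC AUC on the test set:", round(roc_auc, 4))

# Assumed legacy neptune-client call; "roc_auc" is a channel name chosen for this example:
#   neptune.send_metric("roc_auc", roc_auc)
```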

You can sort by the best results and always know which idea is performing best.

Track metric.

If you want to track metrics after every step of training, all you need to do is send metrics to the same channel.
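To illustrate, the sketch below treats each batch of added trees as one training step and scores the forest after every step. Growing the forest with `warm_start` is just a stand-in for any iterative training loop, and the commented `neptune.send_metric` call assumes the legacy neptune-client API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)

# warm_start=True keeps the existing trees and only fits the newly requested ones.
clf = RandomForestClassifier(n_estimators=5, warm_start=True, random_state=1234)

history = []
for n_trees in range(5, 30, 5):
    clf.set_params(n_estimators=n_trees)
    clf.fit(X_train, y_train)
    roc_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    history.append(roc_auc)
    # Sending every value to the same channel is what builds the chart (assumed call):
    #   neptune.send_metric("roc_auc", roc_auc)

print(len(history), "values logged")
```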

Neptune will automatically create charts for them.

Track metrics during training.

Track images

You can also log images to Neptune. You can either pass the PIL.Image object or a path to an image that you want to log. For example, you could log model diagnostics charts like a ROC curve or a confusion matrix.

Your image is now attached to the experiment. You can compare diagnostic charts across experiments, share them with others, or come back to them later when you need them.
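A sketch of producing such a chart is shown here. The labels and scores are synthetic stand-ins for real model output, the file name is arbitrary, and the `neptune.send_image` calls in the comments assume the legacy neptune-client API.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also works without a display
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from sklearn.metrics import roc_curve

# Synthetic labels and scores standing in for real model predictions.
rng = np.random.RandomState(1234)
y_true = rng.randint(0, 2, size=500)
y_score = 0.35 * y_true + rng.uniform(0.0, 0.65, size=500)

fpr, tpr, _ = roc_curve(y_true, y_score)
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot(fpr, tpr)
ax.plot([0, 1], [0, 1], linestyle="--")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")

image_path = os.path.join(tempfile.mkdtemp(), "roc_curve.png")
fig.savefig(image_path)
plt.close(fig)

# Either form can be logged: the path on disk or the loaded PIL.Image object.
image = Image.open(image_path)
# Assumed legacy neptune-client calls:
#   neptune.send_image("roc_curve", image_path)
#   neptune.send_image("roc_curve", image)
```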

Model diagnostics channels.
Model diagnostics chart.

If you send multiple images to the same channel you will have a collection of images. Let’s see how it works:
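A possible sketch: draw one confusion matrix per decision threshold and send each image to the same channel. The scores are synthetic, the thresholds are arbitrary, and the commented `neptune.send_image` call assumes the legacy neptune-client API.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from sklearn.metrics import confusion_matrix

# Synthetic labels and scores standing in for real model predictions.
rng = np.random.RandomState(1234)
y_true = rng.randint(0, 2, size=500)
y_score = 0.35 * y_true + rng.uniform(0.0, 0.65, size=500)

thresholds = [0.3, 0.5, 0.7]
matrices, images = [], []
for threshold in thresholds:
    y_pred = (y_score > threshold).astype(int)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    matrices.append(cm)

    fig, ax = plt.subplots(figsize=(3, 3))
    ax.imshow(cm, cmap="Blues")
    ax.set_title("threshold = {}".format(threshold))
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")

    # Render the figure into memory and load it back as a PIL image.
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)
    buffer.seek(0)
    image = Image.open(buffer)
    image.load()
    images.append(image)

    # Same channel name on every iteration, so the images form a collection (assumed call):
    #   neptune.send_image("confusion_matrix", image)
```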

Confusion matrix per threshold.

Track data versions

You can log data versions to Neptune to make sure that you are not comparing apples to oranges. The idea is very simple yet powerful. Calculate a hash of your data and save that as a string.
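A minimal sketch of such a hash for in-memory NumPy arrays follows. The helper `data_version` is a name invented for this example, SHA-1 is one reasonable choice among many, and the commented `neptune.send_text` call assumes the legacy neptune-client API.

```python
import hashlib

import numpy as np


def data_version(*arrays):
    """Hash the shape, dtype and raw bytes of each array into one hex string."""
    digest = hashlib.sha1()
    for array in arrays:
        array = np.ascontiguousarray(array)
        digest.update(str(array.shape).encode("utf-8"))
        digest.update(str(array.dtype).encode("utf-8"))
        digest.update(array.tobytes())
    return digest.hexdigest()


X = np.arange(12, dtype=np.float64).reshape(4, 3)
version = data_version(X)

# Changing a single value produces a different version string.
X_changed = X.copy()
X_changed[0, 0] = 99.0
changed_version = data_version(X_changed)

print(version)
print(changed_version)

# Assumed legacy neptune-client call that stores the string with the experiment:
#   neptune.send_text("data_version", version)
```

Including the shape and dtype in the hash guards against two differently shaped arrays that happen to share the same bytes.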

And Neptune will display a column with your data version. You can now filter on it and compare your models on the same data versions. Pretty sweet.

Data version.

Track artifacts

You can save model weights and any other artifact that you created during your experiment.
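As a sketch, you could serialize the model with joblib and then hand the file to Neptune. The file name is arbitrary, the tiny model is only there to have something to save, and the commented `neptune.send_artifact` call assumes the legacy neptune-client API.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A small model, trained only so there is something to save.
X, y = make_classification(n_samples=200, n_features=10, random_state=1234)
clf = RandomForestClassifier(n_estimators=10, random_state=1234).fit(X, y)

# Write the fitted model to disk.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(clf, model_path)

# Assumed legacy neptune-client call that uploads the file to the experiment:
#   neptune.send_artifact(model_path)

# Loading the file back confirms that the artifact is usable.
restored = joblib.load(model_path)
print((restored.predict(X) == clf.predict(X)).all())
```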

Now, whenever you want to share your best model with your colleague, you can simply send her a link to the Outputs.

Track model weights.

Track code

You can track your codebase too.
Just choose the files that you want to send to Neptune and specify them when creating your experiment. It can save you a lot of trouble, especially if you are doing a lot of quick and dirty work in Jupyter notebooks.

Whenever you create an experiment a snapshot of the specified files will be logged to Neptune. Safe and simple.
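One way to choose the files is to collect them by pattern, as in the sketch below. The helper `select_source_files` is a name invented for this example, and the `upload_source_files` argument in the comment is an assumption about the legacy neptune-client API.

```python
from pathlib import Path


def select_source_files(root=".", patterns=("*.py", "*.ipynb")):
    """Return a sorted list of files directly inside root that match any pattern."""
    root = Path(root)
    matches = set()
    for pattern in patterns:
        matches.update(str(path) for path in root.glob(pattern) if path.is_file())
    return sorted(matches)


source_files = select_source_files()
print(source_files)

# Assumed legacy neptune-client call that snapshots the listed files:
#   neptune.create_experiment(upload_source_files=source_files)
```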

Track your code.

Stop experiment

To mark the experiment as succeeded, you need to stop it explicitly.

You can also run your experiment using the with statement to make things cleaner and more pythonic:
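The difference between the two styles can be sketched with a stand-in class. `DemoExperiment` is hypothetical and not part of neptune-client. It only mimics the behavior described here, plus the assumption that an experiment is marked as failed when the block raises. The Neptune calls in the comments assume the legacy client API.

```python
class DemoExperiment:
    """Hypothetical stand-in for a tracked experiment; not part of neptune-client."""

    def __init__(self):
        self.state = "running"

    def stop(self):
        self.state = "succeeded"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block stops the experiment; an exception marks it as failed.
        if exc_type is None:
            self.stop()
        else:
            self.state = "failed"
        return False  # never swallow the exception


# Explicit style, roughly: neptune.create_experiment() ... neptune.stop()
exp = DemoExperiment()
exp.stop()

# Context manager style, roughly: with neptune.create_experiment(): ...
with DemoExperiment() as managed:
    pass  # training code would go here

# If the block raises, the experiment is not marked as succeeded.
failed = DemoExperiment()
try:
    with failed:
        raise ValueError("training crashed")
except ValueError:
    pass

print(exp.state, managed.state, failed.state)
```

The `with` form is harder to get wrong, because you cannot forget the final stop call.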

Conclusions

Neptune is a simple, yet powerful tool that can track your work and keep it safe, organized and easy to share. You can now sleep peacefully and let your experiments run.

Sweet dreams :)

Oh, but before you do, don’t forget to visit https://neptune.ml/ and sign up 😃.

PS

For those of you who would like to see the full script for this tutorial, here it is:


If you liked this, you can find more posts like this on our Neptune blog.

Your feedback is more than welcome. You can find me tweeting @NeptuneML or experimenting @ neptune.ml/jakub-czakon
