Tracking, organization, and collaboration for data science projects with Neptune.

Jakub Czakon
Mar 12 · 5 min read

I am proud to announce that we have just made Neptune open and free for non-organizations!

Some of you are excited already, while the rest are probably thinking:

Ok, that’s cool but what the heck is Neptune?

Neptune is an experiment tracking hub that brings organization and collaboration to your data science team. It is a hosted knowledge repository that you can share with anyone. It also provides backups and keeps your mind clear and at peace.

It works with any infrastructure setup, framework, or working style, and lets you track hyperparameters, metrics, code versions, data versions, and more.

Once you have all of that safely logged to Neptune, you can organize it, share your knowledge and ideas with others, and actually collaborate on data science projects.

Let me share this quick start project with you to show you what I mean.

Looks interesting?

Cool, let’s get you started then!

Get started

Sign up

Go to https://neptune.ml/ and sign up. It is free for non-organizations so you can work on your projects with Neptune for free. No strings attached.

Get your API token

In order to start working with Neptune, you need to get the API token first.
To do that, click on the `Get API Token` button on the top left.

Create a project

Click on Project and then New project. Choose a name for it and whether you want it public or private.

Invite others

Go to your project, click Settings and send invites!

Install Neptune client

pip install neptune-client

Ok, it seems that you are ready to go. Let’s learn how to track stuff with Neptune.

Intro to tracking

Initialize Neptune

Toward the top of your script insert the following snippet.

You can (and should) keep your API token in the environment variable NEPTUNE_API_TOKEN. If you do so, Neptune will find it and you won’t have to pass it at initialization.
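To make the lookup order concrete, here is a minimal sketch of the same idea in plain Python. The helper name `resolve_api_token` is made up for illustration and is not part of the Neptune client, which does its own lookup internally.

```python
import os


# Hypothetical helper for illustration only; not part of the Neptune client.
def resolve_api_token(explicit_token=None):
    """Return the explicit token if given, else the NEPTUNE_API_TOKEN variable."""
    token = explicit_token or os.environ.get("NEPTUNE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "No API token found: pass one explicitly or set NEPTUNE_API_TOKEN."
        )
    return token
```

Setting the variable once in your shell (for example `export NEPTUNE_API_TOKEN="your-token"`) keeps the token out of your scripts and out of version control.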

Create an experiment

You can treat every piece of work that you want to record as an experiment.
For example, let’s train a random forest model on a synthetic dataset.

We need some training and testing data so let’s use the make_classification function from sklearn.datasets to generate it.

Now, we can start experimenting. Let’s train a model with some (quite randomly chosen) hyperparameters.

Neptune records the parameters for every experiment and gives you a customizable dashboard to organize information however you want. You can filter on values of the hyperparameters to get a better idea of what is important to your model.

Track metrics

It is always a good idea to know your model performance so let’s score our model and log that information to Neptune.
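The steps up to this point could look roughly like the sketch below: generate data, train a random forest, and score it. The hyperparameter values, the dataset size, the fixed seed, and the choice of ROC AUC as the metric are my assumptions for illustration. The `params` dictionary and `score` are the things you would then send to Neptune; the client calls themselves are left out here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, split into train and test sets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1234
)

# Illustrative hyperparameters; these are the values you would record.
params = {"n_estimators": 100, "max_depth": 5, "max_features": 3}

model = RandomForestClassifier(random_state=1234, **params)
model.fit(X_train, y_train)

# ROC AUC on the held-out data, from the predicted probability of class 1.
y_test_pred = model.predict_proba(X_test)[:, 1]
score = roc_auc_score(y_test, y_test_pred)
print(f"test AUC: {score:.3f}")
```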

You can sort by the best results and always know which idea is performing best.

If you want to track metrics after every step of training, all you need to do is send metrics to the same channel.

Neptune will automatically create charts for them.
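If the idea of a channel feels abstract, this tiny in-memory stand-in shows the behavior I mean: every value sent under the same channel name is appended to one series, and that series is what ends up on a chart. `ChannelLog` is a hypothetical class written for this post, not Neptune's API, and the loss values are made up.

```python
from collections import defaultdict


class ChannelLog:
    """Hypothetical in-memory stand-in for a metric tracker (not Neptune's API)."""

    def __init__(self):
        self.channels = defaultdict(list)

    def send_metric(self, channel_name, value):
        # Repeated sends to one channel name append to the same series.
        self.channels[channel_name].append(float(value))


log = ChannelLog()
for step_loss in [0.9, 0.6, 0.45, 0.4]:  # made-up per-step values
    log.send_metric("loss", step_loss)

print(log.channels["loss"])
```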

Track images

You can also log images to Neptune. You can either pass the PIL.Image object or a path to an image that you want to log. For example, you could log model diagnostic charts such as a ROC curve or a confusion matrix.

Your image is now attached to the experiment. You can compare diagnostic charts across experiments, share them with others, or come back to them later when you need them.

If you send multiple images to the same channel, you will have a collection of images. Let’s see how it works:

Track data versions

You can log data versions to Neptune to make sure that you are not comparing apples to oranges. The idea is very simple yet powerful. Calculate a hash of your data and save that as a string.
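One way to compute such a hash, sketched with only the standard library. The function name `data_version` and the choice of SHA-1 are assumptions on my side; any stable hash of the file contents would serve the same purpose.

```python
import hashlib


def data_version(path, chunk_size=1 << 20):
    """Return a SHA-1 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is what you would store as the data version. If your data lives in a NumPy array rather than a file, hashing `X.tobytes()` follows the same idea.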

And Neptune will display a column with your data version. You can now filter on it and compare your models on the same data versions. Pretty sweet.

Track artifacts

You can save model weights and any other artifact that you created during your experiment.
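Before you can send an artifact anywhere, it has to exist as a file. Here is a minimal sketch of that step using pickle; the helper names `save_artifact` and `load_artifact` are made up for this example, and the upload call itself is not shown.

```python
import pickle


def save_artifact(obj, path):
    """Serialize obj to path with pickle and return the path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_artifact(path):
    """Load a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

A fitted scikit-learn model can be saved this way, for example `save_artifact(model, "model.pkl")`. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.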

Now, whenever you want to share your best model with your colleague you can simply send her a link to the Outputs.

Track code

You can track your codebase too.
Just choose the files that you want to send to Neptune and specify them when creating your experiment. It can save you a lot of trouble, especially if you are doing a lot of quick and dirty work in Jupyter notebooks.

Whenever you create an experiment, a snapshot of the specified files will be logged to Neptune. Safe and simple.

Stop experiment

To finish an experiment and have it marked as succeeded, you need to stop it explicitly.

You can also run your experiment using the with statement to make things cleaner and more Pythonic:
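To show why the with statement helps, here is a minimal sketch of the pattern using a hypothetical stand-in class, not Neptune's own experiment object. The point is that `stop` runs on every exit path, so you cannot forget it, and a run that raises can be marked as failed.

```python
class TrackedExperiment:
    """Hypothetical stand-in for an experiment object (not Neptune's class)."""

    def __init__(self):
        self.state = "running"

    def stop(self, failed=False):
        self.state = "failed" if failed else "succeeded"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Stop on every exit path; mark failure if an exception escaped.
        self.stop(failed=exc_type is not None)
        return False  # do not swallow exceptions


with TrackedExperiment() as experiment:
    pass  # training and logging would go here

print(experiment.state)
```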

Conclusions

Neptune is a simple, yet powerful tool that can track your work and keep it safe, organized and easy to share. You can now sleep peacefully and let your experiments run.

Oh, but before you do, don’t forget to visit https://neptune.ml/ and sign up 😃.

PS

For those of you who would like to see the full script for this tutorial, here it is:


Your feedback is more than welcome. You can find me tweeting @NeptuneML or experimenting @ neptune.ml/jakub-czakon
