Tracking, organization, and collaboration for data science projects with Neptune.
I am proud to announce that we have just made Neptune open and free for non-organizations!
Some of you are excited already, while the rest are probably thinking:
Ok, that’s cool but what the heck is Neptune?
Neptune is an experiment tracking hub that brings organization and collaboration to your data science team. It is a hosted knowledge repository that you can share with anyone. It provides backups and keeps your mind clear and at peace.
It works with any infrastructure setup, framework or working style, and lets you track the hyperparameters, metrics, code versions, data versions and more.
Once everything is safely logged to Neptune, you can organize it, share your knowledge and ideas with others, and actually collaborate on data science projects.
Let me share this quick start project with you to show you what I mean.
Cool, let’s get you started then!
Go to https://neptune.ml/ and sign up. It is free for non-organizations so you can work on your projects with Neptune for free. No strings attached.
Get your API token
In order to start working with Neptune, you need to get the API token first.
To do that, click on the `Get API Token` button on the top left.
Create a project
Click on `Project` and then `New project`. Choose a name for it and whether you want it public or private.
Go to your project, click `Settings` and send invites!
Install Neptune client
pip install neptune-client
Ok, it seems that you are ready to go. Let’s learn how to track stuff with Neptune.
Intro to tracking
Toward the top of your script insert the following snippet.
You can (and should) keep your API token in the environment variable `NEPTUNE_API_TOKEN`. If you do so, Neptune will find it and you won't have to pass it at initialization.
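Here is a minimal sketch of what that initialization could look like. It assumes the legacy `neptune-client` API (`neptune.init` with `project_qualified_name` and `api_token`), which may differ in newer versions of the client. The helper names `get_neptune_token` and `init_neptune` are mine, made up for illustration.

```python
import os


def get_neptune_token(env_var="NEPTUNE_API_TOKEN"):
    """Return the API token from the environment, or fail with a clear message."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} before initializing Neptune")
    return token


def init_neptune(project_qualified_name):
    """Initialize the client for a project such as 'USERNAME/PROJECT_NAME'.

    Assumption: the legacy neptune-client signature. The import lives here
    so the token helper above can be used without the client installed.
    """
    import neptune

    return neptune.init(
        project_qualified_name=project_qualified_name,
        api_token=get_neptune_token(),
    )
```

Passing the token explicitly is optional when the variable is set. I do it here only so a missing token fails early with a readable error.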
Create an experiment
You can treat every piece of work that you want to record as an experiment.
For example, let’s train a random forest model on a synthetic dataset.
We need some training and testing data so let's use the `make_classification` function from `sklearn.datasets` to generate it.
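Something along these lines would do. The sample count, feature count, split ratio and seed are arbitrary values I picked for the sketch, not anything prescribed by Neptune.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (sizes and seed are illustrative).
X, y = make_classification(n_samples=2000, n_features=20, random_state=1234)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)
```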
Now, we can start experimenting. Let’s train a model with some (quite randomly chosen) hyperparameters.
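As a rough sketch, the hyperparameters can live in a plain dictionary that is used both to build the model and to create the experiment. The values are illustrative, the helper names are hypothetical, and `neptune.create_experiment(params=...)` is the legacy `neptune-client` call as I understand it.

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative hyperparameters, not tuned values.
params = {"n_estimators": 10, "max_depth": 5, "min_samples_split": 10}


def start_experiment(params):
    """Create an experiment that records `params`.

    Assumption: legacy neptune-client API, called after neptune.init(...).
    """
    import neptune

    return neptune.create_experiment(params=params)


def build_model(params, seed=1234):
    """Build a random forest from the same dictionary that gets logged."""
    return RandomForestClassifier(random_state=seed, **params)
```

Using one dictionary for both keeps what you log in sync with what you actually trained.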
Neptune records the parameters for every experiment and gives you a customizable dashboard to organize information however you want. You can filter on values of the hyperparameters to get a better idea of what is important to your model.
It is always a good idea to know your model performance so let’s score our model and log that information to Neptune.
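A minimal sketch of that step is below. It assumes `experiment` is the object returned by `neptune.create_experiment()` in the legacy client and that it exposes `send_metric(channel_name, value)`. The function name `log_auc` and the channel name `auc` are my own choices.

```python
from sklearn.metrics import roc_auc_score


def log_auc(experiment, model, X_test, y_test, channel="auc"):
    """Score a fitted binary classifier and send the result as a metric."""
    # Probability of the positive class for every test row.
    y_prob = model.predict_proba(X_test)[:, 1]
    score = roc_auc_score(y_test, y_prob)
    experiment.send_metric(channel, score)
    return score
```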
You can sort by the best results and always know which idea is performing best.
If you want to track metrics after every step of training, all you need to do is send metrics to the same channel.
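In code, that could be as simple as the loop below. It is a sketch that assumes the legacy `send_metric(channel_name, value)` method on the experiment object. `log_training_curve` and the `log_loss` channel name are hypothetical.

```python
def log_training_curve(experiment, losses, channel="log_loss"):
    """Send one value per training step to a single metric channel."""
    for loss in losses:
        # Every send to the same channel name adds one more point to it.
        experiment.send_metric(channel, loss)
```

In a real training loop you would call `send_metric` right after each step instead of collecting the values first.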
Neptune will automatically create charts for them.
You can also log images to Neptune. You can either pass a `PIL.Image` object or a path to an image that you want to log. For example, you could log model diagnostics charts like a ROC curve or a confusion matrix.
Your image is now attached to the experiment. You can compare diagnostic charts for different experiments, share them with others, or come back to them later when you need them.
If you send multiple images to the same channel you will have a collection of images. Let’s see how it works:
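A small sketch, assuming the legacy `send_image(channel_name, image)` method and that the chart files already exist on disk. The function name and the `diagnostics` channel are made up for the example.

```python
def log_images(experiment, image_paths, channel="diagnostics"):
    """Send several image files to one channel so they form a collection."""
    for path in image_paths:
        # A file path is used here; a PIL.Image object could be passed instead.
        experiment.send_image(channel, path)
```

Calling it with, say, `["roc_curve.png", "confusion_matrix.png"]` would put both charts in the same channel.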
Track data versions
You can log data versions to Neptune to make sure that you are not comparing apples to oranges. The idea is very simple yet powerful. Calculate a hash of your data and save that as a string.
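Here is one way to sketch it with the standard library. The choice of SHA-256 and the helper names are mine. `set_property(key, value)` is the legacy experiment method as I understand it, and `data_version` is just a property name I picked.

```python
import hashlib


def data_version(*arrays):
    """Return a hex digest that changes whenever any input's bytes change."""
    digest = hashlib.sha256()
    for array in arrays:
        # NumPy arrays expose tobytes(); bytes-like inputs are converted directly.
        raw = array.tobytes() if hasattr(array, "tobytes") else bytes(array)
        digest.update(raw)
    return digest.hexdigest()


def log_data_version(experiment, *arrays, key="data_version"):
    """Hash the data and store the digest as a string property."""
    version = data_version(*arrays)
    experiment.set_property(key, version)
    return version
```

Hashing the train and test arrays together, for example `log_data_version(experiment, X_train, X_test)`, gives one string per dataset version.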
And Neptune will display a column with your data version. You can now filter on it and compare your models on the same data versions. Pretty sweet.
You can save model weights and any other artifact that you created during your experiment.
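For instance, a fitted model could be pickled and uploaded like this. The sketch assumes the legacy `send_artifact(path)` method on the experiment object. The function name and the default file name are hypothetical.

```python
import pickle


def save_and_log_model(experiment, model, path="model.pkl"):
    """Serialize the model to disk, then upload the file as an artifact."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
    experiment.send_artifact(path)
    return path
```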
Now, whenever you want to share your best model with your colleague you can simply send her a link to the
You can track your codebase too.
Just choose the files that you want to send to Neptune and specify them when creating your experiment. It can save you a lot of trouble, especially if you are doing a lot of quick and dirty work in Jupyter notebooks.
Whenever you create an experiment a snapshot of the specified files will be logged to Neptune. Safe and simple.
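A sketch of how the file list could be put together and handed over. It assumes the legacy `neptune.create_experiment(upload_source_files=[...])` parameter. Both helper names are hypothetical, and snapshotting every `*.py` file in a folder is just one possible choice.

```python
from pathlib import Path


def collect_source_files(root=".", patterns=("*.py",)):
    """List the files under `root` that should be snapshotted."""
    files = []
    for pattern in patterns:
        files.extend(str(p) for p in sorted(Path(root).glob(pattern)))
    return files


def create_experiment_with_code(source_files, **kwargs):
    """Create an experiment that uploads a snapshot of `source_files`.

    Assumption: legacy neptune-client API, called after neptune.init(...).
    """
    import neptune

    return neptune.create_experiment(upload_source_files=source_files, **kwargs)
```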
To end the experiment and mark it as succeeded, you need to stop it explicitly.
You can also run your experiment using the `with` statement to make things cleaner and more Pythonic:
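A minimal sketch of that pattern follows. The wrapper `run_tracked` is hypothetical. It takes the experiment factory as an argument, so in practice you would pass `neptune.create_experiment` from the legacy client.

```python
def run_tracked(create_experiment, params, work):
    """Run `work(experiment)` inside a `with` block.

    The experiment is closed when the block exits, so no explicit
    stop call is needed afterwards.
    """
    with create_experiment(params=params) as experiment:
        return work(experiment)
```

With the client initialized, a call could look like `run_tracked(neptune.create_experiment, params, train)`, where `train` is your own function that takes the experiment and logs to it.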
Neptune is a simple, yet powerful tool that can track your work and keep it safe, organized and easy to share. You can now sleep peacefully and let your experiments run.
Oh, but before you do, don’t forget to visit https://neptune.ml/ and sign up 😃.
For those of you who would like to see the full script for this tutorial, here it is: