Hopsworks is an open-source, project-based, multi-tenant platform for data-parallel programming and horizontally scalable machine learning pipelines. It is built on multiple open-source projects and uses Jupyter notebooks as the main interface for users.
Hopsworks provides a notebook-centric experience that complements the existing GCP Spark offering for machine learning pipelines.
Hopsworks provides a Feature Store, GPU support, parallel experimentation and distributed training using Spark for TensorFlow and PyTorch.
Hopsworks has added support for GCP, which allows users to get access to an experimental image.
In this post, we will show you how to get access to this experimental Hopsworks image without any additional configuration steps.
What is Hopsworks?
Hopsworks is an on-prem managed platform for scale-out data science, with support for both GPUs and Big Data, in a familiar development environment. Hopsworks can be used either through its user interface or via a REST API. Hopsworks’ unique features are:
- A user-friendly UI for development with the latest open-source platforms for Data Science (Jupyter, Conda, etc.),
- GitHub-like Projects to manage teams/products/workflows/data,
- Managed GPUs as a Resource: scale out Deep Learning training and hyperparameter optimization,
- The world’s fastest, most-scalable distributed hierarchical file system, HopsFS,
- A REST API for the whole Hopsworks Platform,
- A TLS Certificate based security model with extensive auditing and data provenance capabilities,
- End-to-end support for Python-based Deep Learning workflows with: a Feature Store, Data and Model Validation, Model Serving on Kubernetes, workflow orchestration in Airflow.
Hopsworks supports the following open-source platforms for Data Science:
- Development: Jupyter, plugin to IDEs (via the REST API), Conda/Pip;
- Machine learning frameworks: TensorFlow, Keras, PyTorch, scikit-learn;
- Data analytics and BI: SparkSQL, Hive;
- Stream processing: Spark Streaming, Flink, Kafka;
- Model serving: Kubernetes/Docker.
Because installing and configuring all of Hopsworks’ services is complex, we recommend installing Hopsworks with the automated installer Karamel/Chef, http://www.karamel.io. The existing open-source version has no detailed documentation of the installation and configuration steps for each service. Instead, the Chef cookbooks contain all the steps needed to install and configure Hopsworks. The cookbooks are available at https://github.com/logicalclocks.
Installation in Google Cloud Platform
We have created a custom image to make this easier. This guide will help you set up Hopsworks on Google Cloud Platform.
The cost of running this tutorial varies by section.
The estimated cost of running this Hopsworks installation is approximately USD 35 per day. This estimate is based on the following specifications:
- 1 preemptible VM instance: n1-standard-32 (vCPUs: 32, RAM 120GB)
- 4 NVIDIA Tesla V100 GPUs
These costs were estimated using the Google Cloud pricing calculator.
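For budgeting, the daily figure can be converted to other time spans with simple arithmetic. The sketch below relies only on the estimate of roughly USD 35 per day quoted above; the function name and the 30-day month are illustrative assumptions, and real charges will depend on your region, preemptible pricing and how long the instance actually runs.

```python
# Rough budgeting helper based on the ~USD 35/day estimate quoted above.
# The function name and the 30-day month are illustrative assumptions.
DAILY_COST_USD = 35.0


def estimate_cost(hours: float, daily_cost: float = DAILY_COST_USD) -> float:
    """Return the approximate cost in USD of running the instance for `hours`."""
    return daily_cost / 24.0 * hours


if __name__ == "__main__":
    print(f"1 hour:  ~USD {estimate_cost(1):.2f}")
    print(f"8 hours: ~USD {estimate_cost(8):.2f}")
    print(f"30 days: ~USD {estimate_cost(24 * 30):.2f}")
```

Stopping or deleting the instance when you are not using it is the simplest way to keep the bill close to the hourly figure rather than the monthly one.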
Before you can run any of the samples in this repository, you’ll need to set up your GCP account and environment. The main steps are:
- Have a GCP account and create/select a GCP project in the GCP Console.
- Enable billing for your GCP project (see the GCP billing documentation for more information).
To install the Google Cloud version of the Hopsworks open-source platform, follow these steps:
Before you can create a Compute Instance with Hopsworks, you need to import the public image.
Create a Hopsworks Image
Open Google Cloud Console > Compute Engine > Create Image
Source: Cloud Storage file
Cloud Storage file: hopsworks/hopsworks-0-10-0.tar.gz
Leave the other fields as default.
Click on the Create button
Note: The image import process may take 25–30 minutes, so you can go grab a coffee and come back! You only need to do this once.
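If you prefer the command line to the Console, the same import can probably be done with `gcloud compute images create` and its `--source-uri` flag. The sketch below only builds and prints the command so you can review it first. The image name is a hypothetical choice, and the `gs://` URI assumes the file lives in a Cloud Storage bucket called `hopsworks` with plain hyphens in its name; neither detail is confirmed here, so adjust both to what you see in the Console.

```python
import shlex
from typing import List

# Assumptions (not confirmed by this post): the image name is a hypothetical
# choice, and the tarball is assumed to live in a bucket named "hopsworks".
IMAGE_NAME = "hopsworks-0-10-0"
SOURCE_URI = "gs://hopsworks/hopsworks-0-10-0.tar.gz"


def image_create_command(name: str = IMAGE_NAME,
                         source_uri: str = SOURCE_URI) -> List[str]:
    """Build the gcloud command that imports a tarball as a custom image."""
    return [
        "gcloud", "compute", "images", "create", name,
        "--source-uri", source_uri,
    ]


if __name__ == "__main__":
    # Print the command for review. To execute it, pass the list to
    # subprocess.run(cmd, check=True) on a machine where gcloud is
    # installed and authenticated against your project.
    print(shlex.join(image_create_command()))
```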
Once the image is created, you can go ahead and create a Compute Instance.
Create Compute Instance
Open Google Cloud Console > Compute Engine > Create instance
Region: Select your region
Zone: Select your zone
Machine type: n1-standard-32 (you can use n1-standard-8 or higher)
Boot disk: Click Change, select the Custom images tab, and choose the Hopsworks image you created in the previous step.
Boot disk size: 64GB
GPUs: 4 NVIDIA Tesla V100
Firewall: Check Allow HTTPS
Click on the Create button
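The Console settings above map fairly directly onto `gcloud compute instances create` flags. The following is a minimal sketch that builds such a command without running it. The instance name, zone and image name in the usage example are placeholders, not values from this guide. The `https-server` tag is the network tag the Console applies when you check Allow HTTPS, and it only has an effect if a matching firewall rule exists in your project.

```python
import shlex
from typing import List


def instance_create_command(
    name: str,
    zone: str,
    image: str,
    machine_type: str = "n1-standard-32",
    gpu_count: int = 4,
    boot_disk_size_gb: int = 64,
    preemptible: bool = False,
) -> List[str]:
    """Build a gcloud command mirroring the Console settings listed above."""
    cmd = [
        "gcloud", "compute", "instances", "create", name,
        "--zone", zone,
        "--machine-type", machine_type,
        "--image", image,
        "--boot-disk-size", f"{boot_disk_size_gb}GB",
        "--accelerator", f"type=nvidia-tesla-v100,count={gpu_count}",
        # GPU instances cannot live-migrate, so they must be set to
        # terminate on host maintenance.
        "--maintenance-policy", "TERMINATE",
        # Network tag applied by the Console's "Allow HTTPS" checkbox.
        "--tags", "https-server",
    ]
    if preemptible:
        # Matches the preemptible VM assumed in the cost estimate.
        cmd.append("--preemptible")
    return cmd


if __name__ == "__main__":
    # Placeholder values: replace them with your own name, zone and image.
    cmd = instance_create_command("hopsworks-demo", "us-west1-b", "hopsworks-0-10-0")
    print(shlex.join(cmd))
```

Building the argument list separately from executing it makes it easy to review the command, or to lower `gpu_count` and `machine_type` for a cheaper trial run, before anything is billed.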
Instance creation will take a couple of minutes. Once the instance is created, you can access the Hopsworks UI by entering the public IP address of the Compute Instance in your browser over HTTPS.
You should land on the Hopsworks main page!
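If the page does not load right away, the services may still be starting. A simple way to check is to poll the HTTPS port until it accepts connections. The sketch below uses only the Python standard library; the function names, retry counts and delays are illustrative assumptions, and an open port only tells you that something is listening, not that Hopsworks has finished starting.

```python
import socket
import sys
import time


def https_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_https(host: str, port: int = 443,
                   attempts: int = 20, delay: float = 15.0) -> bool:
    """Poll the port until it accepts connections or the attempts run out."""
    for attempt in range(attempts):
        if https_port_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wait_for_hopsworks.py <public-ip-of-your-instance>
    host = sys.argv[1]
    if wait_for_https(host):
        print(f"Port 443 is open on {host}; try the UI in your browser.")
    else:
        print(f"Port 443 on {host} is still closed; check the firewall settings.")
```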
Click on the Deep Learning Tour to create a new demo project.
You will see different options, including the available GPUs.
Once the demo project is created, you will be able to access and use the Hopsworks platform!