Hopsworks in GCP

Gonzalo Gasca Meza
5 min read · Oct 28, 2019


Hopsworks is an open-source, project-based, multi-tenant platform for both data-parallel programming and horizontally scalable machine learning pipelines. It is built on multiple open-source projects and uses Jupyter Notebooks as the main interface for users.

Hopsworks provides a notebook-centric experience which complements the existing GCP Spark offering for Machine Learning pipelines.

Hopsworks provides a Feature Store, GPU support, parallel experimentation, and distributed training using Spark for TensorFlow and PyTorch.

Hopsworks added support for GCP, which allows users to get access to an experimental image.

In this post we will show you how to get access to this Hopsworks experimental image without any additional configuration steps.

What is Hopsworks?

Hopsworks is an on-prem managed platform for scale-out data science, with support for both GPUs and Big Data, in a familiar development environment. Hopsworks can be used either through its user interface or via a REST API. Hopsworks' unique features are:

  • A user-friendly UI for development with the latest open-source platforms for Data Science (Jupyter, Conda, etc.),
  • GitHub-like Projects to manage teams/products/workflows/data,
  • Managed GPUs as a resource — scale-out Deep Learning training and hyperparameter optimization,
  • The world’s fastest, most-scalable distributed hierarchical file system, HopsFS,
  • A REST API for the whole Hopsworks Platform,
  • A TLS Certificate based security model with extensive auditing and data provenance capabilities,
  • End-to-end support for Python-based Deep Learning workflows with: a Feature Store, Data and Model Validation, Model Serving on Kubernetes, workflow orchestration in Airflow.

Hopsworks supports the following open-source platforms for Data Science:

  • Development: Jupyter, plugins for IDEs (via the REST API), Conda/Pip;
  • Machine learning frameworks: TensorFlow, Keras, PyTorch, scikit-learn;
  • Data analytics and BI: SparkSQL, Hive;
  • Stream processing: Spark streaming, Flink, Kafka;
  • Model serving: Kubernetes/Docker.

Installation overview

Due to the complexity of installing and configuring all Hopsworks’ services, we recommend installing Hopsworks using the automated installer Karamel/Chef, http://www.karamel.io. For the existing open source version there is no detailed documentation on the steps for installing and configuring all services in Hopsworks. Instead, Chef cookbooks contain all the installation and configuration steps needed to install and configure Hopsworks. The Chef cookbooks are available at https://github.com/logicalclocks.

Installation in Google Cloud Platform

To make this easier, we have created a custom image; this guide will help you set up Hopsworks in Google Cloud Platform.

Costs

The cost of running this tutorial varies by section.

The estimated cost to run Hopsworks is approximately USD $35 per day. This cost is estimated based on the following specifications:

  • 1 preemptible VM instance: n1-standard-32 (vCPUs: 32, RAM 120GB)
  • 4 NVIDIA Tesla V100 GPUs

These costs were estimated by using the pricing calculator.

Project setup

Before you can run any of the steps in this guide, you'll need to set up your GCP account and environment. The main steps are:

  1. Have a GCP account and create/select a GCP project on GCP Console.
  2. Enable the billing for your GCP project. Click here for more information.
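The two setup steps above can also be sketched with the gcloud CLI. This is a hedged sketch, not part of the original guide: `hopsworks-demo` is a placeholder project ID, and billing must still be linked in the Console.

```shell
# Create and select a GCP project ("hopsworks-demo" is a placeholder ID).
gcloud projects create hopsworks-demo
gcloud config set project hopsworks-demo

# Enable the Compute Engine API (requires billing to be enabled first).
gcloud services enable compute.googleapis.com
```

These commands require an authenticated gcloud session (`gcloud auth login`) and sufficient IAM permissions on your account.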

To install the Google Cloud version of the Hopsworks open-source platform, follow these steps:

Before you can create a Compute Instance with Hopsworks, you need to import the Public Image.

Create a Hopsworks Image

Open Google Cloud Console > Compute Engine > Create Image

Name: hopsworks-0-10

Source: Cloud Storage File: hopsworks/hopsworks-0-10-0.tar.gz

Leave the other fields as default.

Click on the Create button

Note: The image import process may take between 25–30 minutes, you can go grab a coffee and come back! You need to do this only once.
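If you prefer the command line, the same image import can be sketched with gcloud. This assumes the Cloud Storage path shown in the Source field above; the import still takes the same 25-30 minutes.

```shell
# Import the Hopsworks image from Cloud Storage (path taken from the
# Source field above; the import runs asynchronously on GCP's side).
gcloud compute images create hopsworks-0-10 \
    --source-uri gs://hopsworks/hopsworks-0-10-0.tar.gz
```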

Once the image is created, you can go ahead and create a Compute Instance.

Create Compute Instance

Open Google Cloud Console > Compute Engine > Create instance

Name: hopsworks-0-10

Region: Select your region

Zone: Select your zone

Machine type: n1-standard-32*

  • You can use n1-standard-8 or higher

Boot disk: Click change and select Custom Images tab, from there select:

hopsworks-0-10

Boot disk size: 64GB

GPU: 4 NVIDIA TESLA V100

Firewall: Check Allow HTTPS

Click on the Create button
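The instance configuration above can also be sketched as a single gcloud command. The zone is a placeholder (pick one that offers V100 GPUs), `hopsworks-0-10` is the custom image created earlier, and `--tags https-server` corresponds to the "Allow HTTPS" firewall checkbox; GPU instances additionally require the TERMINATE maintenance policy.

```shell
# Create the Hopsworks instance with 4 V100 GPUs (zone is a placeholder).
gcloud compute instances create hopsworks-0-10 \
    --zone us-central1-a \
    --machine-type n1-standard-32 \
    --image hopsworks-0-10 \
    --boot-disk-size 64GB \
    --accelerator type=nvidia-tesla-v100,count=4 \
    --maintenance-policy TERMINATE \
    --tags https-server
```

Add `--preemptible` if you want the lower preemptible pricing used in the cost estimate above.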

Instance creation will take a couple of minutes. Once the instance is running, you can access the Hopsworks UI by entering the public IP address of the Compute Instance:

https://<Public IP Address>/hopsworks/#!/login

Username: admin@hopsworks.ai

Password: admin
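Before opening the browser you can check that the UI is reachable. This is a small sketch, assuming the image serves HTTPS with a self-signed certificate (hence `-k` to skip verification); substitute the instance's public IP.

```shell
# Check that the Hopsworks UI responds (self-signed cert, so use -k).
curl -k -I https://<Public IP Address>/hopsworks/
```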

You should land on the Hopsworks main page!

You can click on the Deep Learning Tour, and this will create a new project.

You will see different options including the available GPUs.

Once the demo project is created, you will be able to access and use the Hopsworks platform!

Next steps

To get started, please follow the documentation and use the different examples available here.

Jim Dowling has put together a list of videos which will guide you through Hopsworks Open Source platform in Google Cloud.
