What to expect from Vertex TensorBoard

J Berzborn
6 min read · Jun 23, 2023


A bit more than seven years have passed since Google announced the alpha release of Cloud Machine Learning, a product that has grown into what is now called Vertex AI. Many features have been added over the years. Among them is Vertex Experiments. It “lets you track and analyze different model architectures, hyperparameters, and training environments”, with experiment results being accessible to your team members via the web UI.

Like Vertex AI as a whole, Vertex Experiments has grown over the years. And while there is a lot you can do with it, it also has its limitations. This is where Vertex TensorBoard comes into the picture. In this article I will explain how Vertex TensorBoard

  • extends the functionality of Vertex Experiments and how it
  • differs from its open source sibling.

Also, after reading the article you will know why there is a separate tab “TensorBoard instances” in the Google Cloud UI and whether you should care about it.

TensorBoard instances tab in the Google Cloud UI

How Vertex TensorBoard is integrated into Vertex Experiments

As mentioned, Vertex Experiments has a lot of features and comes in quite handy when comparing machine learning experiments. Sascha Heyer has written a great article on its features, which will also introduce you to its Python API. The one thing Vertex Experiments cannot handle internally is tracking time series entries. For this it relies on the backend implementation of Vertex TensorBoard.

Logging time series metrics is required if you need to track training metrics such as the training loss over time. Good news first: If this is the only thing you would like to achieve, you do not need to dig too deep into Vertex TensorBoard and the associated TensorBoard instances. Google will handle everything for you. All you need to do is call aiplatform.log_time_series_metrics() in the scope of your Vertex ExperimentRun.
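A minimal sketch of this pattern could look as follows; the project, location, experiment and run names as well as the loss values are placeholders:

```python
from google.cloud import aiplatform

# Placeholders: replace with your own project, region and experiment names.
aiplatform.init(
    project="your-project",
    location="europe-west1",
    experiment="experiment-name",
)

aiplatform.start_run("run-1")

# Log the training loss per step as a time series backed by Vertex TensorBoard.
for step, loss in enumerate([0.9, 0.6, 0.4, 0.3]):
    aiplatform.log_time_series_metrics({"train_loss": loss}, step=step)

aiplatform.end_run()
```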

Below you will find an equivalent implementation that calls run.log_time_series_metrics() from within a Python context manager. To give an impression of what a full implementation could look like, I have also integrated the calls to the Vertex Experiments “native” parameter and metric logging.
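A sketch of this pattern, again with placeholder project, location, run names and values:

```python
from google.cloud import aiplatform

# Placeholders: replace with your own project, region and experiment names.
aiplatform.init(
    project="your-project",
    location="europe-west1",
    experiment="experiment-name",
)

with aiplatform.start_run("run-2") as run:
    # "Native" Vertex Experiments logging of parameters and summary metrics.
    run.log_params({"learning_rate": 0.01, "batch_size": 32})

    # Time series metrics are stored via the backing Vertex TensorBoard instance.
    for step, loss in enumerate([0.8, 0.5, 0.35, 0.28]):
        run.log_time_series_metrics({"train_loss": loss}, step=step)

    run.log_metrics({"final_accuracy": 0.91})
```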

In the UI you can then click on “Open TensorBoard” to view the logged time series metrics. A separate tab will open and take you to the TensorBoard UI, in which your time series logs have been integrated.

What will happen in the background

Not only is the time series data shown in a separate UI — it is also stored differently from the logged parameters and metrics. This is mostly due to technical reasons on Google’s side.

The call to aiplatform.init(…, experiment=’experiment-name’) checks for an available TensorBoard instance in the background (this behaviour is new as of aiplatform==1.25.0). If there is no instance yet, it creates a default one for you and assigns it to the experiment. Simplifying things a bit, you can think of a TensorBoard instance as a database. That database stores information on objects which it references as TensorBoard experiments. For each of these objects it can store more detailed information; in our example, this would be the time series metrics.
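If you are curious which instances already exist in your project, a quick way to check from Python might be the following sketch (project and region are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="europe-west1")

# List all Vertex TensorBoard instances in the project/region, including
# the default one created by aiplatform.init(..., experiment=...).
for tb in aiplatform.Tensorboard.list():
    print(tb.display_name, tb.resource_name)
```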

Both the creation of the TensorBoard experiment and the storage of your data are taken care of by Google, and you do not need to worry about them. The only scenario that requires you to intervene is when you need to restrict access to your time series logs or grant different people access to different experiments. Within a given project this can only be achieved by using a separate Vertex TensorBoard instance for each experiment. To do so, assign a custom TensorBoard instance to an experiment upon initialisation of the experiment via the parameter experiment_tensorboard:
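A minimal sketch, assuming you have already created a dedicated TensorBoard instance and know its ID (all names below are placeholders):

```python
from google.cloud import aiplatform

# Full resource name of your custom TensorBoard instance (placeholder values).
tensorboard_resource_name = (
    "projects/your-project/locations/your-location/tensorboards/1234567890"
)

aiplatform.init(
    project="your-project",
    location="your-location",
    experiment="experiment-name",
    # Assign the custom instance instead of the project's default one.
    experiment_tensorboard=tensorboard_resource_name,
)
```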

The value passed as tensorboard_resource_name is a string of the form projects/{your-project}/locations/{your-location}/tensorboards/{tensorboard-instance-id}, where the instance ID can also be found in the Google Cloud UI.

Please note: The backing TensorBoard instance associated with your Vertex Experiment cannot be changed. Do not be surprised by the assign_backing_tensorboard method of your Vertex Experiment Python object: it can only be called once. This is because creating the Vertex TensorBoard experiment object creates multiple database entries, and there is currently no mechanism to transfer them to another TensorBoard instance. Nothing for you to worry about if you have followed me so far.

A word on costs

While Vertex Experiments itself is free of charge and you only pay for the metadata storage costs associated with your stored parameters, metrics and artifacts, the same is not (yet) true for Vertex TensorBoard. Currently, you will be billed $300/month per active user. Access to TensorBoard is not granted by default.

Vertex TensorBoard access is currently disabled by default

Fortunately, the pricing model will be unified with the pricing for Vertex Metadata starting August 2023 — soon you will only pay a small amount for the storage costs of your logging data in Vertex TensorBoard.

That being said, Vertex TensorBoard would not be named after its open source sibling if its usage were restricted to logging time series metrics through the Google Cloud SDK.

Why you should use TensorBoard

TensorBoard is an open source tool maintained by the TensorFlow community. It lets you compare different model training runs across time series and scalar metrics, helps you visualise the distribution of your model weights over time, and lets you debug computational performance. See the project’s documentation on how to get started.
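As a minimal example of what writing such logs looks like in plain TensorFlow, the sketch below logs a scalar time series; the log directory and loss values are placeholders:

```python
import tensorflow as tf

# Write scalar summaries that the TensorBoard UI renders as time series.
writer = tf.summary.create_file_writer("logs/run-1")

with writer.as_default():
    for step, loss in enumerate([0.9, 0.6, 0.4, 0.3]):
        tf.summary.scalar("train_loss", loss, step=step)

# Then start the UI locally with: tensorboard --logdir logs
```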

TensorBoard web UI

Using TensorBoard in addition to Vertex Experiments is a great idea if you need to debug your model in detail, or if you would like to log custom-generated charts across your training runs. For example, in the above screenshot I have saved a custom precision-recall plot to the TensorBoard logs. Storing an image this way is as easy as:
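The snippet below sketches the standard approach from the TensorBoard documentation: render the chart with Matplotlib, convert it to a PNG tensor and write it with tf.summary.image. The precision and recall values are placeholders:

```python
import io

import matplotlib.pyplot as plt
import tensorflow as tf

# Placeholder precision-recall values; in practice these come from your model.
recall = [0.0, 0.25, 0.5, 0.75, 1.0]
precision = [1.0, 0.95, 0.85, 0.7, 0.5]

figure = plt.figure()
plt.plot(recall, precision)
plt.xlabel("recall")
plt.ylabel("precision")

# Render the Matplotlib figure to a PNG and wrap it in a batched image tensor.
buffer = io.BytesIO()
figure.savefig(buffer, format="png")
plt.close(figure)
buffer.seek(0)
image = tf.expand_dims(tf.image.decode_png(buffer.getvalue(), channels=4), 0)

writer = tf.summary.create_file_writer("logs/run-1")
with writer.as_default():
    tf.summary.image("precision_recall", image, step=0)
```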

This is just one example of how easily TensorBoard can be used to customize your experiment tracking. For more features, please refer to the TensorBoard documentation.

How Vertex TensorBoard is different (it’s not — just faster)

Vertex TensorBoard is built upon the open source TensorBoard. The main difference is the way the logs are stored. I have already touched upon this earlier in this article: while open source TensorBoard logs are stored as files in your local file system or in a GCS bucket on Google Cloud, Vertex TensorBoard uses a database to store its logs.

This is especially beneficial to you as a user, once the number of experiment runs grows and the TensorBoard UI gets loaded with data. While for the open source version a data refresh can take some time, your Vertex TensorBoard will fetch the changes in a fraction of the time.

Also, no code adjustment is required on your side. If you are already writing TensorBoard logs to your local file system, Google’s recommendation is to use the continuous upload functionality. The upload is handled in a separate thread that keeps monitoring your local log directory for new data.
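A sketch of what this could look like with the Vertex AI SDK’s continuous upload helpers; the exact parameters may vary between SDK versions, and all names and IDs below are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="europe-west1")

# Start a background thread that watches the local log directory and
# continuously uploads new TensorBoard logs to the given instance.
aiplatform.start_upload_tb_log(
    tensorboard_id="1234567890",                    # placeholder instance ID
    tensorboard_experiment_name="experiment-name",  # placeholder experiment
    logdir="logs/run-1",
)

# ... run your training here, writing TensorBoard logs to logs/run-1 ...

# Flush any remaining data and stop the uploader thread.
aiplatform.end_upload_tb_log()
```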

What other benefits do I get?

Besides the performance improvement, let me close this article with two additional benefits that come with Vertex TensorBoard:

  • You will get a persistent, shareable link to your experiment’s dashboard.
  • When you set up Cloud Profiler, it will be integrated immediately into your TensorBoard UI (see the sketch below).

TensorBoard Profiler UI
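For the second point, a hedged sketch of the profiler setup inside a Vertex AI custom training job, assuming google-cloud-aiplatform is installed with the cloud_profiler extra:

```python
# Inside your training script running as a Vertex AI custom training job.
from google.cloud.aiplatform.training_utils import cloud_profiler

# Initialise the profiler plugin so traces show up in the TensorBoard UI.
cloud_profiler.init()

# ... your model training code ...
```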
