Setting up the Eclipse Theia IDE on an Amazon SageMaker Notebook Instance
Jupyter Notebooks provide useful environments to interactively explore and experiment during an ML project. However, while helping many teams deliver ML solutions for large enterprises on AWS, I have often seen a point in a project where data scientists and ML engineers need a full-fledged, cloud-based IDE with better code completion and debugging capabilities for containers running in SageMaker.
Amazon SageMaker is a fully managed service bringing together a broad set of capabilities to help you prepare, build, train, and deploy high-quality ML models quickly.
SageMaker Studio and Notebook Instances are both good options for accessing an ML environment running Jupyter, with compute and popular ML libraries pre-installed. SageMaker manages the creation of the underlying instances and resources so you can get started quickly on your ML project.
The Eclipse Theia IDE
Eclipse Theia is a cloud and desktop IDE framework implemented in TypeScript. It reuses components such as the Monaco code editor from the VS Code open source project and can run in a web browser. See here for a comparison of Theia with VS Code.
It is compatible with VS Code extensions, which means you can install many existing extensions without modification. You can also add new features to the IDE via plugins. See here for the differences between extensions and plugins.
In the following section, I will show you how to automate the Theia setup on a Notebook Instance with a Lifecycle Configuration.
Running Theia on a Notebook Instance
The following SageMaker lifecycle configuration installs Theia on your notebook instance and configures the Jupyter server proxy so you can access the IDE alongside your Jupyter notebooks when the instance is running.
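To illustrate only the proxy step, here is a minimal sketch of a jupyter-server-proxy entry that would start Theia on demand. It is not the full lifecycle script. The helper names, the `/home/ec2-user/theia` install directory, the launcher title, and the exact `yarn theia start` invocation are assumptions made for the example.

```python
import json


def theia_proxy_entry(workspace_dir, theia_dir, timeout_s=30):
    """Build a jupyter-server-proxy server entry that launches Theia.

    Both directories are illustrative assumptions; adjust them to wherever
    Theia is installed and to the folder you want to open as a workspace.
    """
    return {
        # jupyter-server-proxy replaces the literal "{port}" with a free port.
        "command": [
            "yarn", "--cwd", theia_dir,
            "theia", "start", workspace_dir,
            "--port", "{port}",
        ],
        # Seconds to wait for Theia to answer before the proxy gives up.
        "timeout": timeout_s,
        # Label for the launcher entry (hypothetical title).
        "launcher_entry": {"title": "Theia IDE"},
    }


def render_config(entry, name="theia"):
    """Render a line suitable for appending to jupyter_notebook_config.py."""
    # The entry holds only strings, ints, lists, and dicts, so its JSON
    # form is also a valid Python literal.
    return "c.ServerProxy.servers = " + json.dumps({name: entry}, indent=4) + "\n"


if __name__ == "__main__":
    entry = theia_proxy_entry("/home/ec2-user/SageMaker", "/home/ec2-user/theia")
    print(render_config(entry))
```

A lifecycle script could append the rendered text to the Jupyter configuration file and restart the Jupyter server, after which the IDE would be reachable under the `/theia/` path of the notebook URL.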
For your convenience, I have placed example package.json, settings.json, and launch.json files in my GitHub repository to set up Theia. Feel free to adjust the IDE features by following the instructions here.
I have set the following line in settings.json so Theia can use the Anaconda environments from the notebook instance:
Create the Theia Lifecycle Configuration
- Go to the SageMaker console
- Select Lifecycle configurations
- Go to Create configuration and give it a name
- Copy the script above and paste its content into Start notebook
- Attach the lifecycle configuration to your notebook instance
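If you prefer to script these console steps, the sketch below does the same thing with boto3. It assumes boto3 is installed, AWS credentials are configured, and the notebook instance is stopped when the configuration is attached. The function names and the configuration, script, and instance names are placeholders for the example.

```python
import base64


def build_lifecycle_request(config_name, on_start_script):
    """Build the request for create_notebook_instance_lifecycle_config.

    SageMaker expects the script content as a base64-encoded string.
    """
    encoded = base64.b64encode(on_start_script.encode("utf-8")).decode("ascii")
    return {
        "NotebookInstanceLifecycleConfigName": config_name,
        # OnStart corresponds to the "Start notebook" tab in the console.
        "OnStart": [{"Content": encoded}],
    }


def create_and_attach(config_name, on_start_script, instance_name):
    """Create the lifecycle configuration and attach it to an instance."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    sm = boto3.client("sagemaker")
    sm.create_notebook_instance_lifecycle_config(
        **build_lifecycle_request(config_name, on_start_script)
    )
    # The instance must be in the Stopped state for this update to succeed.
    sm.update_notebook_instance(
        NotebookInstanceName=instance_name,
        LifecycleConfigName=config_name,
    )


if __name__ == "__main__":
    # Placeholder file and resource names; replace them with your own.
    with open("on-start.sh") as f:
        create_and_attach("theia-lifecycle", f.read(), "my-notebook-instance")
```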
Start your notebook instance and access the IDE
Starting your notebook instance should take roughly 5 minutes. When the instance is InService, you can launch Theia by clicking on the following button:
You can also choose a Python interpreter from one of the Notebook Instance's Anaconda environments.
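If you would rather script the wait for the InService status than watch the console, a small polling loop is enough. This is a sketch that assumes a boto3 SageMaker client is passed in. The function name, polling interval, and retry limit are arbitrary choices for the example.

```python
import time


def wait_until_in_service(sm_client, instance_name, poll_s=15, max_polls=60):
    """Poll the notebook instance until it reports InService.

    Returns True once the instance is InService, False if max_polls is
    exhausted first, and raises if the instance ends up in Failed.
    """
    for _ in range(max_polls):
        response = sm_client.describe_notebook_instance(
            NotebookInstanceName=instance_name
        )
        status = response["NotebookInstanceStatus"]
        if status == "InService":
            return True
        if status == "Failed":
            raise RuntimeError(f"Notebook instance {instance_name} failed to start")
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    import boto3  # requires configured AWS credentials

    client = boto3.client("sagemaker")
    # Placeholder instance name; replace it with your own.
    print(wait_until_in_service(client, "my-notebook-instance"))
```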