How to prepare your Azure environment to use Azure ML Studio

Gabriel Reversi
May 1, 2023

How to prepare your environment for building a pipeline in Azure ML Studio.

Azure Machine Learning (Azure ML) is a cloud-based machine learning service provided by Microsoft that enables users to build, train, and deploy machine learning models at scale. Azure ML provides a wide range of tools and services for machine learning, including data preparation, experimentation, model training, and deployment.

Azure Machine Learning Studio is a web-based integrated development environment (IDE) that allows users to build, train, and deploy machine learning models using drag-and-drop interfaces, pre-built templates, and a wide range of machine learning algorithms. Azure ML Studio also provides access to various Azure services, including Azure Databricks, Azure Data Factory, and Azure Cognitive Services.

In Azure ML Studio, users can create and manage experiments, which are workflows that define the steps needed to build and train a machine learning model. Users can choose from a wide range of algorithms, and can also use Python or R scripts for more advanced customization. Additionally, Azure ML Studio provides tools for data preparation, feature engineering, and model evaluation, allowing users to build and deploy models quickly and easily.

Overall, Azure Machine Learning and Azure Machine Learning Studio provide a powerful and flexible platform for building and deploying machine learning models at scale, with a wide range of tools and services to support the entire machine learning lifecycle.

The purpose of this article is to show you how to set up everything you need to build a simple pipeline in Azure ML Studio, even if you have never used it before.

Create an Azure Machine Learning resource

The first thing you need is a resource group, because you’ll create your Azure Machine Learning resource inside it. The same applies to every resource or service that you create in the Azure portal: each one must belong to a resource group.

As you can see below, search for “Azure Machine Learning” in the search bar, or use the Create a resource button.

https://cdn-images-1.medium.com/max/800/1*OWOSr_Yy9TMnZDIhDYCrYQ.png

Click the Azure Machine Learning service in the results, then click Create > New Workspace, as in the image below.

https://cdn-images-1.medium.com/max/800/1*i9AIMh8aDGgR2VM6qtiSGw.png

Since the purpose of this article is to get you ready to build a pipeline, we won’t go into the configuration details of this resource. Just fill in the Resource group, Workspace name, and Region fields, then click Review + Create.

https://cdn-images-1.medium.com/max/800/1*rZ0Iayuqfx5vqq-R4KlI4A.png
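The same three fields can also be supplied from code. The following is a minimal sketch, not the method used in this article, and it assumes the Azure ML Python SDK v2 (the azure-ai-ml and azure-identity packages) is installed and that you are already signed in to Azure. The function names workspace_settings and create_workspace are hypothetical helpers introduced only for illustration.

```python
def workspace_settings(name: str, region: str, resource_group: str) -> dict:
    """Collect the three fields that the portal form asks for."""
    fields = {"name": name, "location": region, "resource_group": resource_group}
    for label, value in fields.items():
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
    return {label: value.strip() for label, value in fields.items()}


def create_workspace(subscription_id: str, settings: dict):
    """Create the workspace described by `settings` (hypothetical helper)."""
    # Imported lazily so workspace_settings() works without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import Workspace
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=settings["resource_group"],
    )
    workspace = Workspace(name=settings["name"], location=settings["location"])
    # begin_create returns a poller; result() blocks until provisioning ends.
    return ml_client.workspaces.begin_create(workspace).result()
```

A call would look like create_workspace("<your-subscription-id>", workspace_settings("my-workspace", "westeurope", "my-resource-group")), where all three values are placeholders and the resource group must already exist.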

After creating this resource you will see inside the resource group other services besides Azure Machine Learning. These services are created automatically.

  • Application Insights: Used to monitor prediction services in the workspace.
  • Storage Account: Used for storing notebooks and files that will be used to train models.
  • Key Vault: Used for storing secrets, authentication keys, and credentials used in the workspace.

https://cdn-images-1.medium.com/max/800/1*wRbfJ3twttR21lGLoZGzBQ.png

Before finishing this step, you need to create a blob container in your Storage Account to hold the data that will be used to train the model.

Create a blob container

You can create the blob container in the Storage Account that Azure created automatically in the previous step. In some cases, however, it is better to create a separate storage account so that you can manage your data independently of the machine learning service.

Creating a new storage account works the same way as creating the Azure Machine Learning resource: look for Storage Account in the search bar and click Create.

Then fill in the Resource group and Storage Account name fields.

Now we have two storage accounts in our resource group. Open the one you just created, click Containers, and then click + Container.

Next, enter a name for your container and click Create. Your container now exists and you can upload data to it.
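If you would rather script this step, the sketch below does the same thing with the azure-storage-blob package, which is an assumption and is not used in this article. It first checks the name against the Azure container naming rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit, no consecutive hyphens). The helper names and the connection string parameter are hypothetical.

```python
import re

# 3-63 characters; lowercase letters, digits, and hyphens; must start and end
# with a letter or digit; two hyphens in a row are not allowed.
_CONTAINER_NAME = re.compile(r"(?=.{3,63}\Z)[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` follows the blob container naming rules."""
    return _CONTAINER_NAME.fullmatch(name) is not None


def create_container(connection_string: str, name: str):
    """Create a blob container (hypothetical helper, needs azure-storage-blob)."""
    if not is_valid_container_name(name):
        raise ValueError(f"invalid container name: {name!r}")
    # Imported lazily so the validation above works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    # Raises an error if a container with this name already exists.
    return service.create_container(name)
```

The connection string is listed next to the account keys in the Storage Account, under Access keys.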

Upload the Data

After that, you need to upload the data to the container. Just click the container that you created, click Upload, and select your file.

And that’s it: you will now see the file inside the container.
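The upload can be scripted as well. This is a minimal sketch that again assumes the azure-storage-blob package and a storage connection string. The names blob_name_for and upload_file, and the optional folder prefix, are hypothetical and only illustrate the idea.

```python
from pathlib import Path


def blob_name_for(local_path: str, prefix: str = "") -> str:
    """Build the blob name: an optional virtual folder plus the file name."""
    file_name = Path(local_path).name
    prefix = prefix.strip("/")
    return f"{prefix}/{file_name}" if prefix else file_name


def upload_file(connection_string: str, container: str,
                local_path: str, prefix: str = "") -> str:
    """Upload one local file to the container and return its blob name."""
    # Imported lazily so blob_name_for() works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(
        container=container, blob=blob_name_for(local_path, prefix)
    )
    with open(local_path, "rb") as data:
        # overwrite=True replaces a blob with the same name, if one exists.
        blob.upload_blob(data, overwrite=True)
    return blob.blob_name
```

Keeping the name logic in its own function makes it easy to see where the file will land in the container before anything is sent.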

Now we can move on to what matters: building the first machine learning pipeline in Azure ML Studio.

Compute Power

Azure ML Studio is where you will carry out most of the steps of a machine learning project, such as data exploration, data transformation, and model training. Go to your Azure Machine Learning service inside the resource group and click Launch Studio.

First, we need to create a compute cluster. What does this mean? Whenever you train a model or run a pipeline, you need compute power to execute the training and the data transformations. Some workloads need more compute power than others, and you can choose the right machine for each situation.

For this article, I chose the lowest configuration for the compute cluster, and you can do the same.

If you look closely, you can create other kinds of compute besides Compute Clusters. At the top, you can see Compute Instances, Kubernetes Clusters, and Attached Compute. We will focus on Compute Clusters and Compute Instances.

Compute Clusters are designed for long-running jobs such as training complex machine learning models, while Compute Instances are designed for shorter tasks such as data visualization and code debugging.

If you’re running an experiment that involves training complex models or compute-intensive tasks, a Compute Cluster is likely the best option. If you need a computing environment for shorter or interactive tasks such as data visualization or code debugging, a Compute Instance may be the best option.

Additionally, consider the size and resource capacity that your experiment requires. A Compute Cluster scales horizontally: you set a minimum and a maximum number of nodes, and the cluster adds or removes nodes within that range according to the jobs submitted. A Compute Instance is a single machine that does not scale out.

In summary, the choice between Compute Clusters and Compute Instances depends on the specific needs of your experiment. For shorter or interactive tasks, a Compute Instance may be sufficient, while for long-running or compute-intensive tasks, a Compute Cluster is generally more suitable.
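To make the cluster settings concrete, here is a hedged sketch that uses the Azure ML Python SDK v2 (azure-ai-ml), which this article does not use. It assumes you already have an authenticated MLClient for your workspace. The VM size in the usage note is only a placeholder, not necessarily the lowest configuration available to you, and cluster_settings and create_cluster are hypothetical helper names.

```python
def cluster_settings(name: str, vm_size: str, max_nodes: int = 1,
                     idle_seconds: int = 120) -> dict:
    """Describe a small cluster that scales down to zero nodes when idle."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    return {
        "name": name,
        "size": vm_size,
        "min_instances": 0,  # no nodes are kept running while nothing is queued
        "max_instances": max_nodes,
        "idle_time_before_scale_down": idle_seconds,
    }


def create_cluster(ml_client, settings: dict):
    """Create or update the cluster through an authenticated MLClient."""
    # Imported lazily so cluster_settings() works without the SDK installed.
    from azure.ai.ml.entities import AmlCompute

    cluster = AmlCompute(**settings)
    return ml_client.compute.begin_create_or_update(cluster).result()
```

For example, create_cluster(ml_client, cluster_settings("cpu-cluster", "STANDARD_DS3_V2")) would request a cluster of at most one node. Setting the minimum to zero is a design choice that keeps an idle cluster from holding any nodes.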

Create a Datastore in Azure ML Studio

To finish preparing the environment, the last task is to bring the data that we uploaded to the container into Azure ML Studio. This is important because this data is what we will use to build the pipeline and train the model.

Click Data, then Data Assets, and then + Create.

In the first step, fill in the data asset name and click Next. In the second step, choose From Azure storage and click Next.

Click Create Datastore and fill in the fields: enter a Datastore name, choose the storage account that you created last, and choose the blob container that you created in it.

The last field is Account key. You will find this value in the Storage Account sidebar, under Access keys. Just copy the value and paste it there.
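The datastore form maps closely onto the Azure ML Python SDK v2 (azure-ai-ml). The sketch below is an illustration under stated assumptions, not this article's method: it expects an authenticated MLClient, and it reads the account key from an environment variable so that the key never appears in the script. The variable name STORAGE_ACCOUNT_KEY and both helper names are hypothetical.

```python
import os


def read_account_key(env_var: str = "STORAGE_ACCOUNT_KEY") -> str:
    """Read the storage account key from the environment, never from code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return key


def register_blob_datastore(ml_client, datastore_name: str, account_name: str,
                            container_name: str, account_key: str):
    """Register the blob container as a datastore in the workspace."""
    # Imported lazily so read_account_key() works without the SDK installed.
    from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore

    store = AzureBlobDatastore(
        name=datastore_name,
        account_name=account_name,
        container_name=container_name,
        credentials=AccountKeyConfiguration(account_key=account_key),
    )
    return ml_client.datastores.create_or_update(store)
```

The four arguments correspond to the Datastore name, storage account, blob container, and Account key fields of the form.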

Click Create, select the datastore that you just created, and click Next. In step 4, select the path of the file in the blob container and click Next. In step 5, you will see a preview of the data in table format, so you can check that it was read correctly.

If any column has the wrong data type, you can change it in step 6.

If everything is correct you can click on Create.
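For reference, a data asset can also be registered from code. Note that this sketch swaps in a simpler technique than the wizard above: it registers a plain file reference (a uri_file asset) rather than a tabular asset with a schema preview. It assumes the Azure ML Python SDK v2 (azure-ai-ml) and an authenticated MLClient; datastore_uri and register_file_asset are hypothetical helper names.

```python
def datastore_uri(datastore_name: str, path: str) -> str:
    """Build the azureml:// URI that points at a path inside a datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{path.lstrip('/')}"


def register_file_asset(ml_client, asset_name: str,
                        datastore_name: str, path: str):
    """Register a single file in the datastore as a uri_file data asset."""
    # Imported lazily so datastore_uri() works without the SDK installed.
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data

    asset = Data(
        name=asset_name,
        path=datastore_uri(datastore_name, path),
        type=AssetTypes.URI_FILE,
    )
    return ml_client.data.create_or_update(asset)
```

The path argument is the location of the uploaded file inside the blob container, for example a file name such as sales.csv (a placeholder).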

Conclusion

Preparing your Azure environment before using Azure ML Studio is an essential step in ensuring that your machine learning workflows run smoothly and efficiently. By following the steps outlined in this article, you can create and configure the necessary resources and access keys needed to get started with machine learning on Azure.

In the next part of this series, we’ll dive deeper into using Azure ML Studio, exploring its features and capabilities, and demonstrating how to create, train, and deploy machine learning models on the platform.

Gabriel Reversi

Hi, I’m a data analyst and data scientist. Here I share content about data, tools, methods, and business. https://www.linkedin.com/in/gabrielreversi/