Developing Large AI Models Faster: Leverage SFS Turbo for High-Performance ModelArts Workflows

Hüseyin Çayırlı · Published in Huawei Developers · 7 min read · Jun 10, 2024
SFS Architecture

Introduction

Hello everyone. In this article, we will introduce SFS Turbo and ModelArts, two Huawei Cloud services for developers who want to accelerate the development of AI applications.

SFS Turbo is a high-performance, scalable network-attached storage (NAS) solution that provides shared file access to Elastic Cloud Servers (ECSs), Bare Metal Servers (BMSs), and containers running on Cloud Container Engine (CCE). This allows you to store all the data you need for your AI projects in a single, accessible location.

ModelArts is a comprehensive AI development platform that enables all developers and data scientists to quickly build, train, and deploy models. With features such as data preprocessing and automatic labeling, distributed training, automatic model building, and code-free workflows, ModelArts significantly accelerates the AI development process.

SFS Turbo

AI projects often work with large amounts of data. Processing and storing this data requires a reliable, high-performance storage method. Huawei Cloud’s Scalable File Service (SFS Turbo) offers many advantages over traditional file storage methods, speeding up your AI development process.

Advantages of SFS Turbo

  • File Sharing: Servers in multiple Availability Zones (AZs) of the same region can access the same file system simultaneously. This keeps all the data for your AI projects in one centralized location and facilitates collaboration.
  • Elastic Scaling: Scale storage up or down with a few clicks to adapt dynamically to service changes, ensuring your applications always have the capacity they need without interruption.
  • Superior Performance and Reliability: File system performance grows as capacity grows, and data durability is high. The back-end storage system supports both HDD and SSD storage media.
  • Seamless Integration: SFS Turbo supports the Network File System (NFS) protocol, so a wide range of applications can read and write data on the file system through a standard interface.
  • Easy Management and Low Cost: A graphical user interface (GUI) makes it easy to create and manage file systems, and pay-per-use billing keeps costs down.

Accessing SFS Turbo

You can access SFS Turbo from the management console or through APIs by sending HTTPS requests.

APIs: If you need to integrate SFS Turbo with a third-party system for secondary development, you can use APIs. For detailed operations, refer to the Scalable File Service API Reference.
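As a sketch of API access, the snippet below lists SFS Turbo file systems over HTTPS. The endpoint pattern, region name, and `X-Auth-Token` header are assumptions based on common Huawei Cloud API conventions — check the Scalable File Service API Reference for the exact paths.

```python
# Hypothetical sketch of listing SFS Turbo file systems via the REST API.
# The URL pattern and headers are assumptions; verify them against the
# Scalable File Service API Reference before use.
import json
import urllib.request

def list_shares_url(region: str, project_id: str) -> str:
    # Assumed URL pattern for the "list file systems" operation.
    return (f"https://sfs-turbo.{region}.myhuaweicloud.com"
            f"/v1/{project_id}/sfs-turbo/shares")

def list_shares(region: str, project_id: str, token: str) -> dict:
    # The X-Auth-Token header carries an IAM token obtained beforehand.
    req = urllib.request.Request(
        list_shares_url(region, project_id),
        headers={"X-Auth-Token": token},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```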

Management Console: If you prefer a web-based interface to perform operations, you can use the console.

In the rest of this article, we will examine how SFS Turbo is used with ModelArts and how it accelerates the AI development process.

Setting Up and Using SFS Turbo with ModelArts

First, we need to create a new VPC. Open the VPC Console and click My VPCs. Then click Create VPC.

VPC Creation

Select a region and type a name for the VPC. Select an IPv4 CIDR block in the 10.0.0.0/8–24 range. Give the subnet a name, then click Create Now.

VPC Configurations

Go to the Scalable File Service (SFS) console. Go to SFS Turbo > File Systems and click Create File System.

SFS Creation

Select the SFS Turbo type, then select the VPC and subnet you created. Give the file system a name and click Create Now.

SFS Configurations

The created SFS Turbo will look as follows.

Getting Created SFS Turbo Ready

Open ModelArts Console. Click Dedicated Resource Pools > Elastic Cluster. Click Create.

Creating a Dedicated Resource Pool

Give a name for the dedicated resource pool. Click Pay-per-use. Select Job Type. Create and select a new network. Select the flavors. Click Next.

Dedicated Resource Pool Configurations

After the selected job types are enabled, the status will change to Running. Now, click Network.

Getting the Dedicated Resource Pool Ready

Click on Interconnect VPC.

Getting Dedicated Resource Pool Ready

Select the created VPC and Subnet. Click OK.

Selecting the Created VPC and Subnet

After the Interconnect VPC process is complete, click More and click Add sfsturbo.

Connecting the SFS Turbo

Select the SFS Turbo file system you created, then click OK.

Selecting the Created SFS Turbo

After SFS Turbo is enabled, the dedicated resource pool will be ready.

Getting Dedicated Resource Pool Ready

Purchase an ECS running Ubuntu and connect to it via SSH.

Creating an ECS

Open the SFS console again and click the SFS Turbo file system you created.

Accessing Created SFS Turbo Information

Copy the mount command shown on the file system's details page. (You can change the mount path if you want.)

Accessing the Mount Command

Paste the mount command into the terminal of the ECS you created. Once the file system is mounted, you can upload your data to it.

Mounting to ECS
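For reference, mounting typically looks like the following on an Ubuntu ECS. The file-system IP and mount path below are placeholders — use the exact command shown in your SFS console.

```shell
# Placeholder values: replace 192.168.0.100 with the shared path shown in
# the SFS console, and /mnt/sfs_turbo with any mount point you prefer.
sudo apt-get install -y nfs-common        # NFS client for Ubuntu
sudo mkdir -p /mnt/sfs_turbo              # local mount point
sudo mount -t nfs -o vers=3,timeo=600,noresvport,nolock \
    192.168.0.100:/ /mnt/sfs_turbo
df -h /mnt/sfs_turbo                      # verify the mount succeeded
```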

Now open the ModelArts Console again. Click DevEnviron > Notebook. Click Create. In this step we will see how to connect SFS Turbo to a ModelArts Notebook workspace.

Creating ModelArts Notebook

Give the notebook a name and select an auto-stop time. Select the image you want, then click Dedicated Resource Pool and select the resource pool you created. Select the flavor you want to use. Click SFS and type the path to the folder you created on the ECS. Click Next, then click Submit.

Notebook Configurations

After the notebook is created, the status will be Running. Click Open.

Getting Notebook Ready

A notebook page will open. As you can see from the image below, the data you uploaded from the ECS is available in the notebook workspace. You can now develop your algorithms in this environment, and your work is saved to SFS Turbo instantly.

ModelArts Notebook with SFS Turbo
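A quick way to confirm the mounted data is visible from inside the notebook is to list it with a few lines of Python. The mount path is whatever you entered when creating the notebook, so the helper below takes it as an argument rather than assuming one.

```python
# List files under the SFS Turbo directory mounted into the notebook.
# Pass the mount path you configured when creating the notebook.
from pathlib import Path

def list_dataset(root: str, limit: int = 10) -> list:
    # Return up to `limit` file paths under the mounted directory, sorted.
    return sorted(str(p) for p in Path(root).rglob("*") if p.is_file())[:limit]
```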

Now let us see how SFS Turbo can be used for model training. The training code must be in an OBS bucket: the training job pulls the code from OBS, but the large dataset used for training lives in SFS Turbo. Without SFS Turbo, you would have to store the large dataset in OBS as well and wait for the training job to download it into the EVS disk it creates. Mounting SFS Turbo directly into the training environment skips this download step and makes the process much faster.
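In that setup, the training script simply reads its dataset from the mount directory instead of downloading a copy from OBS. The sketch below assumes the mount path is passed in as a `--data_dir` argument — the argument name and default path are illustrative, so adapt them to your own job configuration.

```python
# Hypothetical training-script skeleton stored in OBS; it reads data from
# the SFS Turbo mount directory instead of downloading a copy from OBS.
import argparse
import os

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", default="/mnt/sfs_turbo/dataset",
                        help="directory where SFS Turbo is mounted")
    args = parser.parse_args(argv)
    entries = os.listdir(args.data_dir) if os.path.isdir(args.data_dir) else []
    print(f"found {len(entries)} entries under {args.data_dir}")
    # ... load the dataset and run the training loop here ...
    return entries

if __name__ == "__main__":
    main()
```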

Now we can create a training job. In the ModelArts Console, click Training Management>Training Jobs. Click Create Training Job.

Creating Training Job

Specify the configurations as desired, then select the code directory (already uploaded to OBS) and the training script within it.

Training Job Configurations — 1
Training Job Configurations — 2

Specify the SFS Turbo mount path, and in the SFS Turbo Directory field enter the directory you want to mount. Then, submit the training job.

As a result, you will see the results of the training as in the image below.

Completion of Training Job

Conclusion

Using ModelArts with SFS Turbo significantly accelerates and optimizes the development process of AI projects. SFS Turbo’s high-performance and scalable storage solutions enable fast and reliable processing of large data sets. By integrating with the ModelArts platform, data scientists and developers can train and deploy large AI models faster.

SFS Turbo’s high-speed data access and low latency speed up the training process and minimize data loading times. This reduces latency when working with large data sets, and models can be trained in less time. In addition, the elastic scaling feature allows storage capacity to be quickly adapted to project needs, resulting in cost efficiencies.

With its seamless integration capabilities, you can easily integrate SFS Turbo into the ModelArts platform and enable data sharing between different applications. This integration simplifies data management and makes the workflow of AI projects more efficient.

In summary, ModelArts powered by SFS Turbo increases performance and reduces costs at every stage of AI development. This integration enables you to gain speed and efficiency in projects working with large data sets so that innovative AI solutions can be realized in less time.
