Kubernetes with AWS EFS

Gautam Kumar
Published in DeepLearning-101
Oct 8, 2018

Recently I was trying to perform distributed training of a ResNet50 model using TensorFlow. I set up an EKS (Elastic Kubernetes Service) cluster with 3 nodes by following the AWS EKS getting started page. There were many challenges in setting up the cluster, integrating Kubeflow, etc., but one of the main challenges was selecting the right kind of storage to share data between pods sitting on different worker nodes for distributed training.

I had the following choices:

  • Use S3: This works well, but the data eventually has to be downloaded to local storage (EBS or instance store).
  • Use EBS (Elastic Block Store): This works well as long as separate worker nodes access separate data sets, but it doesn’t work when you want to share data across different worker nodes.
  • Use EFS (Elastic File System): This works well when you want to share a data set among different worker nodes and, eventually, among pods.

Prerequisites:

  • A basic understanding of Kubernetes
  • A running Kubernetes cluster with at least two nodes

Follow the steps below to set up EFS and mount it on the worker nodes.

  1. Set up an EFS file system by following the AWS EFS Getting Started page.
  2. Mount the EFS volume on each worker node by following the NFS section of the Amazon EC2 mounting instructions, linked from the page of the file system you just created; a typical invocation is sketched below.
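The mount command usually looks like the following sketch. The options are the ones AWS recommends for mounting EFS over NFSv4.1; the /mnt/efs mount point is an assumption, and you should substitute your own file system ID and region:

# Run on each worker node: create a mount point and mount EFS over NFSv4.1
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-<fid>.efs.<region>.amazonaws.com:/ /mnt/efs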

The next step is to create a volume in Kubernetes backed by EFS. A complete guide to the various types of volumes is available here.

There are basically three steps to using EFS storage as a volume in a pod:

PersistentVolume -> PersistentVolumeClaim -> Volume

  • Create a PersistentVolume (PV) using the YAML below. Specify the storage capacity based on your estimate of the upper limit your application needs. You can also use any of the access modes supported by Kubernetes. The default path ‘/’ works well for EFS. The reclaim policy matters: set it to Retain if you want the data to remain accessible even after the pod’s life cycle is over.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-data   # PVs are cluster-scoped, so no namespace is needed
spec:
  accessModes:
    - ReadWriteMany          # multiple nodes can mount the volume read-write
  capacity:
    storage: 5Gi
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /
    server: fs-<fid>.efs.<region>.amazonaws.com
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is released
  storageClassName: nfs-external
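Apply the manifest and check that the volume shows up (pv.yaml is an assumed file name):

kubectl apply -f pv.yaml
kubectl get pv nfs-data   # STATUS is Available until a claim binds it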
  • Create a PersistentVolumeClaim (PVC) using the YAML below. For the claim to bind to the available persistent volume, the requested capacity must be at most the storage capacity of that persistent volume. In my case the PVC requests 2Gi, while my PV offers 5Gi.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: nfs-external   # matches the PV’s storageClassName
  name: nfs-external
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
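Apply it and confirm the claim binds to the PV (pvc.yaml is an assumed file name):

kubectl apply -f pvc.yaml
kubectl get pvc nfs-external   # STATUS should become Bound, with VOLUME nfs-data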
  • Create a pod with a volume that uses the PVC, using the YAML below. Note that claimName must match the name of the PVC created above.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod               # names must be valid DNS labels, so no underscores
spec:
  containers:
    - image: my-image
      name: my-container
      volumeMounts:
        - name: efs-volume
          mountPath: /path/inside/the/container
  volumes:
    - name: efs-volume
      persistentVolumeClaim:
        claimName: nfs-external   # the PVC name, not the PV name
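Apply the pod manifest and wait for the pod to start (pod.yaml is an assumed file name):

kubectl apply -f pod.yaml
kubectl get pod my-pod -w   # wait until STATUS shows Running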

Now create a file (or write any data) from the container at the mountPath (/path/inside/the/container, e.g. /var/log/) inside my-pod; it will land on the EFS volume, and all the data on the EFS volume will be available to the container running inside my-pod at the mountPath in read-write mode.
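A quick end-to-end check is to write a file from inside the pod and read it back from a worker node’s EFS mount (the test file name and the /mnt/efs mount point from step 2 are assumptions):

# Write a file through the pod’s mountPath
kubectl exec my-pod -- sh -c 'echo hello-from-pod > /path/inside/the/container/test.txt'

# On any worker node that mounted the EFS file system earlier:
cat /mnt/efs/test.txt   # prints: hello-from-pod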
