How to Use Portworx Software-Defined Storage in Your Kubernetes Cluster

Kirill Goltsman
May 9, 2019

In a previous tutorial, we discussed the architecture and key features of software-defined storage (SDS) systems and reviewed key SDS solutions for Kubernetes. We’ll now show you how to add SDS functionality to your Kubernetes cluster using Portworx SDS.

This post is organized as follows. In the first part, we discuss basic features and benefits of Portworx SDS. Next, we walk you through the process of deploying Portworx to your K8s cluster, creating Portworx volumes, and using them with stateful applications running in your cluster. Let’s get started!

What Is Portworx?

Portworx is an SDS system optimized for container environments and container orchestrators like Kubernetes and Mesos. It offers all the benefits of traditional SDS, such as storage virtualization and pooling.

What sets Portworx apart from other SDS systems is its deep integration with the container environment and its awareness of the orchestrator’s native scheduling functionality. This makes Portworx an excellent storage solution for applications running in your K8s cluster.

Below are some other reasons why Portworx fits well into your Kubernetes cluster:

  • Portworx creates a storage pool that can be tiered across classes of service, availability zones, and IOPS. The SDS can design various storage tiers for your stateful applications based on the required performance, IOPS, storage size, file system type, availability zone, and other parameters. Storage tiering introduces additional cost savings by placing workloads on the most cost-efficient storage.
  • Container-granular replication. Portworx can guarantee storage backups for all containers with mounted Portworx volumes. It ensures that storage used by containers is replicated across availability zones and nodes in your cluster.
  • Support for storage-aware orchestration that extends native scheduling of the K8s. Portworx introduced the STORK (STorage Orchestrator Runtime for Kubernetes) add-on in early 2018. STORK implements storage-aware scheduling that extends Kubernetes scheduler to ensure the optimal placement of volumes in the cluster. The component offers container-data hyperconvergence, storage health monitoring, snapshot-lifecycle features, and failure-domain awareness for applications in your cluster. Thanks to STORK, Portworx volumes can be placed on the most secure, healthy, and performant nodes and can be co-located with applications that use Portworx storage.
  • Data security. Portworx provides secure, key-managed encryption for container volumes. Its encryption component integrates well with the popular key management systems such as Hashicorp Vault and AWS KMS. Also, with Portworx, you can implement access control policies for volumes and data in your stateful applications.
  • Storage health monitoring. Portworx scans hard drives for media errors and tries to repair broken drives and volumes. If a volume can’t be repaired, Portworx can automatically attach a new one. This solves the problem of containers continuing to run unaware of disk errors: even though the application running in the container can’t write to the volume, users and/or admins regard it as healthy. Portworx’s media error detection addresses this problem.

All these features make Portworx a great SDS solution for your Kubernetes cluster.

We’ll now show you how to deploy Portworx to Kubernetes and use Dynamic Volume Provisioning to mount Portworx volumes to applications in your K8s cluster.

Tutorial

To complete the examples in this tutorial, we used:

  • A Kubernetes 1.11.6 cluster deployed on AWS with Kops. To reproduce all steps of this tutorial, you’ll need a running Kops cluster. Here is a detailed guide for deploying a K8s cluster on AWS with Kops.
  • AWS CLI tools for managing the AWS cluster. Read this guide to install the AWS CLI.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Portworx Requirements

To successfully run Portworx, a worker node should have at minimum:

  • 4 CPU cores
  • 4 GB memory
  • 128 GB of raw unformatted storage
  • 10 Gbps network speed

Note: Most Portworx SDS services are paid, but you can deploy Portworx on Kubernetes using a 31-day trial.

Step #1: Granting AWS Permissions to Portworx

First, we need to create an AWS IAM user for Portworx with a custom policy allowing Portworx to manage EBS volumes in your AWS cluster. We need to create the following IAM user policy:
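The policy below is a sketch of what such a definition looks like; the list of EC2 actions follows what Portworx typically requires for EBS management, so treat it as an assumption and check the official Portworx documentation for the authoritative version:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:DetachVolume",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:DeleteTags",
        "ec2:DeleteVolume",
        "ec2:DescribeTags",
        "ec2:DescribeVolumeAttribute",
        "ec2:DescribeVolumesModifications",
        "ec2:DescribeVolumeStatus",
        "ec2:DescribeVolumes",
        "ec2:DescribeInstances"
      ],
      "Resource": ["*"]
    }
  ]
}
```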

You can do this with the AWS CLI or directly from your AWS console. Let’s do it the easiest way: using the AWS console.

  1. Sign in to the AWS Management Console and open the IAM console.
  2. Choose “Users” and then “Add user” in the navigation menu.
  3. Select a name for the Portworx user, for example, “Portworx.”
  4. Select the access type for your Portworx user. We need to select “Programmatic access” because Portworx will use the AWS API to manage volumes.

5. Next, we need to create a policy for the Portworx user. If you have an existing policy with the permissions specified above, you can reuse it. We’re going to create a new policy. Click “Create policy” to open a new browser tab with an editable JSON file. Paste the policy definition we provided above into this window.


6. Give the new policy a name and finalize the process.


7. Now, we can attach the new policy to our Portworx user. Click “Attach existing policies directly” and select the policy (ours is named portworx-ebs-policy).


8. Finally, activate the Portworx user. Don’t forget to download and save the Access Key and Secret Key generated for your Portworx user. You will provide these credentials to Portworx later so it can manage your EBS volumes.

For more information about creating a new IAM user, see the official AWS docs.

Step #2: Preparing Portworx DaemonSet for Kubernetes

The Portworx website features a Kubernetes spec generator that helps you optimize the Portworx DaemonSet for your Kubernetes environment (click on “Generating the Portworx specs” on this page to access the generator). This is quite useful because storage and networking options may differ across cloud providers and environments, and users don’t necessarily know all the details.


First, we need to provide the K8s version used by our cluster. You can find it by running:
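For example, the standard kubectl command prints both the client and server versions:

```
kubectl version --short
```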

Configuring ETCD

Portworx requires an etcd cluster to maintain its metadata and cluster state. You can choose among the following options for etcd:

  • Using your own etcd cluster. In this scenario, you should point Portworx to the existing etcd cluster endpoint (e.g., etcd-1.com.net:2379).
  • A Portworx-hosted etcd. With this option selected, Portworx will use its own hosted etcd cluster. However, this option is not recommended for production.
  • Built-in cluster. In this case, Portworx will create and manage an internal key-value store (kvdb) cluster. Users can restrict the built-in etcd to certain nodes by attaching the label px/metadata-node=true to them. Only the nodes with this label will be able to participate in the kvdb cluster.

We don’t have our own etcd cluster and don’t want to use the Portworx-hosted one, so we chose the built-in cluster option. Click “Next” to go to the “Storage” configuration page.


Here, we have to select the environment in which our Kubernetes cluster is running. Because we’ve deployed our K8s cluster on AWS with Kops, first select “Cloud” and then “AWS.” This will open the AWS configuration dialogue.


Here, we can configure AWS storage devices. The wizard recommends using the Portworx Auto-Scaling Groups (ASG) feature that allows Portworx to manage the entire lifecycle of EBS volume storage. This feature is available if your EC2 instances are part of the ASG (Auto-Scaling Groups). Under the hood, Kops uses the AWS ASG, so we can go ahead with this feature and select “Create Using a Spec.”

Note: an AWS Auto-Scaling Group is a collection of EC2 instances treated as a logical grouping for scaling and management purposes. AWS EC2 Auto Scaling is needed for Portworx to dynamically provision EBS volumes, create snapshots, etc.

Therefore, if we choose the “Create Using a Spec” option, Portworx will create its own EBS volumes. We selected a GP2 (General Purpose) volume type with 30 GB of storage. You can add as many volumes as you want. Optionally, you can set “Max Storage nodes per availability zone.” If a value is specified, Portworx will ensure that no more than that many storage nodes exist in each availability zone.

Because we specified the built-in KVDB option in the previous section, it is recommended to allocate a separate device for storing internal KVDB metadata on production clusters. This separates metadata I/O from storage I/O.


Please note that the minimum size of the metadata device is 64 GB. Once all edits are made, click “Next” to go to the “Network” configuration.


We don’t want to change anything here. Just leave the “auto” setting for the Data Network Interface and the Management Network Interface so that Portworx uses its networking defaults.

Final Step: Customize

We need to add several finishing touches to prepare our Portworx installation: specify environmental variables, registry and image settings, and some advanced settings.


You should pay particular attention to the “Environment Variables” tab inside this dialogue.


As you remember, Portworx needs access to the AWS API to manage EBS volumes. We have created a Portworx user with an Access Key, a Secret Key, and a policy to manage EBS volumes. Now we have to provide these credentials to Portworx as environment variables: the AWS Access Key goes in AWS_ACCESS_KEY_ID, and the AWS Secret in AWS_SECRET_ACCESS_KEY.

If you are planning to use a custom container registry with Portworx, you can specify it under “Registry and Image Settings.”


Finally, in the “Advanced” settings, we can enable or disable Stork, the GUI, and monitoring for the Portworx cluster. We recommend keeping all the suggested options here. For example, you’ll need Stork for storage-aware placement of Pods, health monitoring, and data-application hyperconvergence.


That’s it! Now, click “Finish,” and the wizard will generate the Portworx spec for Kubernetes.


The spec is lengthy, so just copy the URL displayed in the window. You can see that the AWS credentials required by Portworx were added to the URL as parameters.

Step #3: Deploy Portworx to your Kubernetes Cluster

Now, let’s use the spec generated by the wizard to deploy Portworx to your Kubernetes cluster. Run kubectl apply with the spec’s URL as the -f value:
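The URL is unique to each configuration, so the one below is a placeholder showing the general shape of the command (the query parameters are illustrative, not the real generated values):

```
kubectl apply -f 'https://install.portworx.com/?kbver=1.11.6&...'
```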

configmap/stork-config created
serviceaccount/stork-account created
clusterrole.rbac.authorization.k8s.io/stork-role created
clusterrolebinding.rbac.authorization.k8s.io/stork-role-binding created
service/stork-service created
deployment.extensions/stork created
storageclass.storage.k8s.io/stork-snapshot-sc created
serviceaccount/stork-scheduler-account created
clusterrole.rbac.authorization.k8s.io/stork-scheduler-role created
-----

It will take some time for Portworx to create the KVDB and data volumes we specified in the previous step. Portworx will create a data volume and a metadata volume for each node in your cluster. Access your AWS console to verify this.


Also, let’s verify that Portworx has successfully launched Pods. Because we’ve deployed Portworx as a DaemonSet, it had to launch one Pod per node in your cluster:
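One way to list them (the label selector assumes the name=portworx label that the generated DaemonSet typically applies; verify it against your spec):

```
kubectl get pods -n kube-system -l name=portworx -o wide
```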

NAME             READY   STATUS    RESTARTS   AGE   IP              NODE                                               NOMINATED NODE
portworx-cm6vs   1/1     Running   0          10m   172.20.56.137   ip-172-20-56-137.ap-southeast-2.compute.internal   <none>
portworx-gp2tz   1/1     Running   0          10m   172.20.36.169   ip-172-20-36-169.ap-southeast-2.compute.internal   <none>

Step #4: Using Portworx

Now that you have successfully deployed Portworx to the cluster, let’s learn how to use it. To manage Portworx, we can use the pxctl tool available on every node where Portworx is running. You can access the Portworx CLI inside the container at /opt/pwx/bin/pxctl or directly on the host.

First, let’s use the CLI to check the Portworx cluster status. Below, we first save the Portworx Pod name to PX_POD shell variable for later reuse and then get a shell to the Portworx container to run pxctl status command:

PX_POD=$(kubectl get pods -n kube-system -l name=portworx -o jsonpath='{.items[0].metadata.name}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status

Status: PX is operational
License: Trial (expires in 31 days)
Node ID: 1d64f394-a237-4096-846d-084b0f7eb29a
IP: 172.20.56.137
Local Storage Pool: 1 pool
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 LOW raid0 30 GiB 1.8 GiB Online ap-southeast-2a ap-southeast-2
Local Storage Devices: 1 device
Device Path Media Type Size Last-Scan
0:1 /dev/xvdf STORAGE_MEDIUM_SSD 30 GiB 29 Jan 19 13:08 UTC
total - 30 GiB
Metadata Device:
1 /dev/xvdg STORAGE_MEDIUM_SSD
Cluster Summary
Cluster ID: px-cluster-c9924822-6299-4250-9c2d-f3b016f2abdb
Cluster UUID: 2a15fc88-b843-40c5-b075-6dbe3dc39599
Scheduler: kubernetes
Nodes: 2 node(s) with storage (2 online)
IP ID SchedulerNodeName StorageNode Used Capacity Status StorageStatus Version Kernel OS
172.20.36.169 fb7259b2-3e8a-4ef9-815e-aa45a001fcec ip-172-20-36-169.ap-southeast-2.compute.internal Yes 1.8 GiB 30 GiB Online Up 2.0.2.0-e346215 4.9.0-7-amd64 Debian GNU/Linux 9 (stretch)
172.20.56.137 1d64f394-a237-4096-846d-084b0f7eb29a ip-172-20-56-137.ap-southeast-2.compute.internal Yes 1.8 GiB 30 GiB Online Up (This node) 2.0.2.0-e346215 4.9.0-7-amd64 Debian GNU/Linux 9 (stretch)
Global Storage Pool
Total Used : 3.5 GiB
Total Capacity : 60 GiB

The status details above tell us that the PX cluster is operational. The total capacity of the storage cluster is 60 GiB, of which 3.5 GiB has been used so far.

To use storage capacity allocated to Portworx, you should create a Portworx volume and expose it to your Pod. It can be done through manual pre-provisioning or using Kubernetes dynamic volume provisioning.

Pre-provisioning a Portworx Volume

You can pre-provision a Portworx volume using the pxctl tool. To access the tool, you can either get a shell to one of the PX Pods as we did in the example above or ssh to one of the nodes in your K8s cluster and run pxctl directly at /opt/pwx/bin/pxctl on the instance. Below is an example of the second option:

/opt/pwx/bin/pxctl volume create \
--size=5 \
--repl=2 \
--fs=ext4 \
test-disk

Here, we used the pxctl volume create command, which has the following format: pxctl volume create [command options] volume-name. This command creates a 5 GB volume named “test-disk” with the ext4 file system and two copies across the cluster. We specified the number of copies using the --repl argument. Please check the official pxctl CLI reference for more information about this command.

Dynamic Volume Provisioning

With the DVP, you don’t need to pre-provision Portworx volumes before using them in your applications. Cluster administrators can create Storage Classes that define different classes of Portworx Volumes offered in the cluster. Thereafter, applications can request dynamic provisioning of these volumes.

Below is the example of the Portworx StorageClass:
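The sketch below reconstructs such a StorageClass from the parameters discussed next; the class name px-sc is a placeholder, and the snap_schedule value matches the 60-minute/10-snapshot schedule described below:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  fs: "ext4"
  snap_schedule: "periodic=60,10"
```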

Please note that we need to specify kubernetes.io/portworx-volume as the storage provisioner to link this StorageClass to Portworx.

Also, in the parameters field, we specified the Portworx volume parameters for provisioning volumes. Let’s briefly describe them:

  • repl — a number of Portworx volume replicas.
  • fs — a filesystem type used by the volume.
  • shared — a Boolean flag to create a globally shared volume that can be used by multiple Pods. It is useful when you want multiple Pods to access the same volume at the same time even if the Pods are running on different hosts.
  • sticky — “sticky” volumes cannot be deleted until the “sticky” flag is disabled.
  • snap_schedule — this parameter defines a snapshot schedule for PX volumes (PX 1.3 and higher). The following formats are accepted: periodic=mins,snaps-to-keep, daily=hh:mm,snaps-to-keep,
    weekly=weekday@hh:mm,snaps-to-keep, and monthly=day@hh:mm,snaps-to-keep. We used a periodic snap schedule with a period of 60 minutes and 10 snaps to keep.

You can find an exhaustive list of all available parameters for the Portworx volume in this article.

Step #5: Deploying PostgreSQL with Portworx

In what follows, we’ll demonstrate how to dynamically provision Portworx volumes for PostgreSQL database in Kubernetes.

As in the example above, let’s first create a StorageClass for Portworx volumes. We’ll use a simple spec with just a few necessary parameters. This StorageClass allows Portworx volumes to be shared between Pods.
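A sketch of this spec, reconstructed from the kubectl describe output shown below (name px-postgres-sc, repl=2, shared=true):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-postgres-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  shared: "true"
```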

Save this spec to postgresql.yml and run:
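For example:

```
kubectl apply -f postgresql.yml
```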

Verify that the StorageClass was created:
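For instance:

```
kubectl describe storageclass px-postgres-sc
```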

Name: px-postgres-sc
IsDefaultClass: No
Annotations: <none>
Provisioner: kubernetes.io/portworx-volume
Parameters: repl=2,shared=true
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>

Next, we need to create a Persistent Volume Claim (PVC) that requests Portworx provisioner to dynamically provision the volume type specified in the StorageClass . The PVC spec may look something like this:
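A PVC sketch consistent with the details used throughout this tutorial (name postgres-data, 3 Gi of storage, the px-postgres-sc class):

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres-data
spec:
  storageClassName: px-postgres-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
```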

Please note that we need to set the spec.accessModes of this PVC to ReadWriteMany to allow mounting this PVC to multiple Pods.

Save this spec to postgres-pvc.yml and create the PVC:
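For example:

```
kubectl apply -f postgres-pvc.yml
```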

Let’s verify that the PVC has successfully provisioned the volume using kubernetes.io/portworx-volume provisioner:
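For example (postgres-data is the PVC name used throughout this tutorial):

```
kubectl describe pvc postgres-data
```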

As the events section suggests, the Portworx Volume storage provisioner has successfully provisioned the volume and bound our PVC to it.

Now, we are ready to use the Portworx “shared” volume in the PostgreSQL Deployment. But before doing this, let’s inspect the Portworx volume we’ve just created:
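One way to inspect it, reusing the PX_POD variable from earlier (the jsonpath lookup of the bound volume name is a convenience, not part of the original walkthrough):

```
VOL=$(kubectl get pvc postgres-data -o jsonpath='{.spec.volumeName}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume inspect $VOL
```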

As we requested, the PX volume has 3 GiB of storage and two replicas spread across two nodes of our cluster. However, the volume is currently detached, as the “State” section indicates. Let’s change this by deploying PostgreSQL. Portworx documentation recommends doing this using STORK. As you remember, we included STORK during Portworx spec generation, so it was automatically deployed to our K8s cluster.

To deploy PostgreSQL, you’ll need to define the following environment variables for security credentials:


  • POSTGRES_USER — PostgreSQL username.
  • POSTGRES_PASSWORD — PostgreSQL user password.
  • PGDATA — Data Directory for PostgreSQL Database.

And the Deployment Spec looks something like this:
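The sketch below is consistent with the details in this tutorial (mount path, PVC name, STORK scheduler); the image tag, credential values, and PGDATA subdirectory are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      schedulerName: stork            # let STORK place the Pod near its volume
      containers:
      - name: postgres
        image: postgres:10            # image tag is an assumption
        env:
        - name: POSTGRES_USER
          value: "pgadmin"            # placeholder credential
        - name: POSTGRES_PASSWORD
          value: "changeme"           # placeholder credential
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: postgres-data
        persistentVolumeClaim:
          claimName: postgres-data
```

In a real deployment you would pull the credentials from a Kubernetes Secret rather than hard-coding them; literal values are used here only to keep the sketch self-contained.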

As you see, we defined a volume with the PVC created above. This PVC will mount the requested Portworx volume to the PostgreSQL container at /var/lib/postgresql/data.

Now, save this spec to postgres-deploy.yml and create the Deployment:
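For example:

```
kubectl apply -f postgres-deploy.yml
```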

It may take some time for the Deployment controller to start the PostgreSQL Pod and attach the Portworx volume to it. Wait some time and then inspect the PX volume again:

As you see, the volume is now attached and used by the PostgreSQL consumer. As the “Replication Status” section suggests, two volume replicas have been created and are up. So far, everything worked out as we expected!

Remember that we created a shared Portworx Volume, right? Let’s verify that two different applications can use the same Portworx volume by attaching it to another Pod:
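A minimal second consumer, sketched here with a busybox container (the image, command, and mount path are assumptions; the Pod name pod2 and the PVC match the inspect output below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  schedulerName: stork
  containers:
  - name: busybox
    image: busybox               # image is an assumption
    command: ["sleep", "3600"]   # keep the Pod running so the volume stays mounted
    volumeMounts:
    - name: postgres-data
      mountPath: /data           # mount path is an assumption
  volumes:
  - name: postgres-data
    persistentVolumeClaim:
      claimName: postgres-data
```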

This Pod uses the same PVC as the PostgreSQL deployment. Save this manifest to pod2.yml and run:
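For example:

```
kubectl apply -f pod2.yml
```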

Now, inspect the Portworx volume again:

 Volume :  271214083253399685
Name : pvc-ae6530e0-23f8-11e9-b286-02233ca4dbf4
Size : 3.0 GiB
Format : ext4
HA : 2
IO Priority : LOW
Creation time : Jan 29 19:04:34 UTC 2019
Shared : yes
Status : up
State : Attached: fb7259b2-3e8a-4ef9-815e-aa45a001fcec (172.20.36.169)
Device Path : /dev/pxd/pxd271214083253399685
Labels : namespace=default,pvc=postgres-data
Reads : 11
Reads MS : 12
Bytes Read : 45056
Writes : 4251
Writes MS : 27928
Bytes Written : 140308480
IOs in progress : 0
Bytes used : 52 MiB
Replica sets on nodes:
Set 0
Node : 172.20.36.169 (Pool 0)
Node : 172.20.56.137 (Pool 0)
Replication Status : Up
Volume consumers :
- Name : pod2 (1bff22ee-2400-11e9-b286-02233ca4dbf4) (Pod)
Namespace : default
Running on : ip-172-20-56-137.ap-southeast-2.compute.internal

- Name : postgres-6695688dcb-8qkv5 (cfc69590-23fd-11e9-b286-02233ca4dbf4) (Pod)
Namespace : default
Running on : ip-172-20-56-137.ap-southeast-2.compute.internal
Controlled by : postgres-6695688dcb (ReplicaSet)

As you see, the volume is now shared by two consumers: the new “pod2” consumer and the PostgreSQL database. As simple as that!

Conclusion

In this tutorial, you learned how to deploy Portworx to Kubernetes and use it to dynamically provision “shared” volumes for applications running in your Kubernetes cluster. With Portworx, you can create multiple volume replicas, set snapshot policies, and make volumes shareable between applications. In this article, we just scratched the surface of Portworx features in Kubernetes. Portworx SDS can be used to perform a number of storage management tasks such as volume migration, snapshotting, monitoring, disaster recovery, and co-location with the application data (hyperconvergence). We’ll show you how to use these Portworx features in Kubernetes in subsequent tutorials. Stay tuned to the Supergiant blog to find out more!

Originally published at https://supergiant.io.

Supergiant.io

Kubernetes how-to’s and tutorials brought to you by Supergiant.io, the Kubernetes Certified Service Provider

Kirill Goltsman

Written by

I am a tech writer with an interest in cloud-native technologies and AI/ML.
