Build AWS S3 compatible cloud storage on GCP with Minio and Kubernetes
Applications today generate more data than ever, and this upward trend is expected to continue for the foreseeable future. How do you handle the ever-growing storage requirements of your application? A storage solution that runs where your application runs and scales with it in an automated manner is the way to go. Add multi-tenant capabilities and it becomes near perfect!
Minio provides a reliable, lightweight object storage service. Running it on an orchestration platform like Kubernetes adds automated storage mapping and multi-tenant capabilities. Such a setup has a clear separation of concerns, one of the most important parameters for scalability. It also makes errors much easier to find and isolate.
Minio running on orchestration platforms like Kubernetes is a perfect solution for growing storage needs.
In this post, we’ll see how to build an AWS S3 compatible object storage server on Google Cloud Platform with Minio and Kubernetes. We’ll also see how you can scale this setup for a multi-tenant environment.
What is Minio?
Minio is a lightweight, AWS S3 compatible object storage server. It is best suited for storing unstructured data such as photos, videos, log files, backups, and VM and container images. The size of an object can range from a few KBs to a maximum of 5TB. Salient Minio features include:
- Erasure coding for bitrot protection.
- Lambda function support through event notification service.
- S3 V4 and S3 V2 signature support.
- Distributed mode.
For readers not aware of Kubernetes terminology, I’ll quickly go through all the terms used in this post.
Pod: A pod is the smallest unit of computing in Kubernetes. It is a group of containers running in shared context.
ReplicaSets: A ReplicaSet ensures a specific number of pod replicas are always up and running. While ReplicaSets are independent entities, they are mainly used by Deployments as a mechanism to orchestrate pod creation, deletion and updates.
Deployment: A deployment can be thought of as an abstraction containing Pods and ReplicaSets.
Service: A service defines a logical set of Pods and a policy by which to access them. The set of Pods targeted by a Service is determined by a Label Selector (defined in Service’s yaml file).
Persistent Volumes: A Persistent Volume (PV) is a piece of networked storage in the cluster with storage specific details abstracted away.
Persistent Volume Claims: A Persistent Volume Claim (PVC) is a request for storage by an application/pod.
To get started you’ll need a Kubernetes cluster running on Google Compute Engine (GCE). Follow these detailed steps on setting up a Kubernetes cluster on GCE.
With persistent volumes (PV) and persistent volume claims (PVC), Kubernetes makes it very easy to abstract away physical storage details from your application. You create PVs backed by the physical storage in your cluster and then let your application ask for the storage it needs via PVCs. When a storage request is made via a PVC, Kubernetes maps it to actual storage (a PV) automatically.
Let’s explore this further in the Google Compute Engine context. GCE has disks that serve as physical storage for your compute nodes. On Kubernetes, you can create PVs that use these disks as the backing physical storage.
Later, as you deploy Minio on the Kubernetes cluster, you can create PVCs to request the storage you need for that particular Minio instance. Kubernetes automatically binds a matching PV to the PVC. This is known as static provisioning in the Kubernetes world, and yes, there is dynamic provisioning too, but we’ll skip that for now. Read more about binding here.
Now that you have a clear picture of how things work, let’s start by creating a GCE disk.
$ gcloud compute disks create minio-1 --size=10GiB
This creates a disk named minio-1 with a size of 10GiB. Now create a PV based on the GCE disk we just created.
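The original manifest is not reproduced here, so the following is only a minimal sketch of what a PV backed by the minio-1 disk could look like. The PV name minio-pv-1 and the ext4 filesystem are assumptions made for illustration. The sketch also uses the in-tree gcePersistentDisk volume type; recent Kubernetes versions handle GCE disks through the Compute Engine persistent disk CSI driver instead, so check what your cluster supports.

```yaml
# minio-gce-pv.yaml (illustrative sketch, not the original file)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-pv-1            # hypothetical name, pick your own
spec:
  capacity:
    storage: 10Gi             # matches the 10GiB disk created with gcloud
  accessModes:
    - ReadWriteOnce           # a GCE disk can be mounted read-write by one node at a time
  gcePersistentDisk:
    pdName: minio-1           # name of the GCE disk created above
    fsType: ext4              # assumed filesystem
```

The disk must live in the same zone as the node that will mount it.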
Download and save the file as minio-gce-pv.yaml. You can then create a persistent volume using the command:
$ kubectl create -f minio-gce-pv.yaml
A deployment encapsulates replica sets and pods, so if a pod goes down, the replica set makes sure another pod comes up automatically. This way you won’t need to worry about pod failures and will have a stable Minio service available.
But before creating the deployment, we need to create a persistent volume claim (PVC) to request storage for the Minio instance. As explained above, Kubernetes looks for a PV in the cluster that matches the PVC request and binds it to the PVC automatically.
This automation can come in very handy if you need a large scale multi-tenant environment with varying storage requirements. You can spin up a Minio deployment (with a PVC requesting appropriate storage therein), per tenant. Kubernetes automatically binds the PVCs to PVs. This way you have a multi-tenant, stable, S3 compatible object storage server at your command!
Here is how you can create a PVC and a single-pod deployment running the Minio Docker image.
Download and save the file as minio-standalone-deployment.yaml. Notice that we create the PVC first and then the deployment uses it as its volume. You can then deploy Minio using the command:
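As a rough sketch of such a file (the original manifest is not reproduced here), the PVC and deployment could look like the following. The resource names, the app: minio label, the /storage mount path and the credential values are all placeholders chosen for illustration. The sketch assumes the apps/v1 Deployment API and the MINIO_ACCESS_KEY and MINIO_SECRET_KEY environment variables; newer Minio releases use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD instead, so adjust to the image version you run.

```yaml
# minio-standalone-deployment.yaml (illustrative sketch, not the original file)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim          # hypothetical name
spec:
  storageClassName: ""          # empty class: bind only to a pre-created PV that has no storage class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi             # amount of storage requested for this Minio instance
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio-deployment        # hypothetical name
spec:
  replicas: 1                   # single-pod, standalone Minio
  strategy:
    type: Recreate              # stop the old pod before starting a new one, since the volume is ReadWriteOnce
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio              # label that a Service can select on
    spec:
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: minio-pv-claim   # the PVC defined above
      containers:
        - name: minio
          image: minio/minio
          args:
            - server
            - /storage          # directory Minio serves objects from
          env:
            - name: MINIO_ACCESS_KEY
              value: "minio"        # placeholder, replace with your own
            - name: MINIO_SECRET_KEY
              value: "minio123"     # placeholder, replace with your own
          ports:
            - containerPort: 9000   # Minio's default port
          volumeMounts:
            - name: storage
              mountPath: /storage
```

For a multi-tenant setup, the same pattern can be repeated per tenant with different names and storage requests. In anything beyond a demo, the credentials would be better kept in a Kubernetes Secret than in plain environment values.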
$ kubectl create -f minio-standalone-deployment.yaml
Expose Minio as a service
Now that you have a Minio deployment running, you may either want to access it internally (within the cluster) or expose it as a Service onto an external (outside of your cluster, maybe public internet) IP address, depending on your use case.
You can achieve this using Services. There are three major service types. The default type is ClusterIP, which exposes a service to connections from inside the cluster. NodePort and LoadBalancer are the two types that expose services to external traffic. Read more about services here.
The YAML file below configures a LoadBalancer service for your Minio deployment.
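A minimal sketch of such a service definition follows; the original file is not reproduced here. The service name minio-service is hypothetical, and the app: minio selector assumes that your Minio pods carry that label. Port 9000 is Minio's default listening port.

```yaml
# minio-service.yaml (illustrative sketch, not the original file)
apiVersion: v1
kind: Service
metadata:
  name: minio-service     # hypothetical name
spec:
  type: LoadBalancer      # asks the cloud provider for an external load balancer
  selector:
    app: minio            # assumed pod label; must match the labels on your Minio pods
  ports:
    - port: 9000          # port exposed on the load balancer
      targetPort: 9000    # port the Minio container listens on
      protocol: TCP
```

Switching the type to ClusterIP would keep Minio reachable only from inside the cluster.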
Download and save this file as minio-service.yaml and run the command:
$ kubectl create -f minio-service.yaml
The external IP address for the service generally takes a couple of minutes to be provisioned after the above command is run. You can check the IP using:
$ kubectl get services
Once you have the IP address available, you can access Minio at that address on the port the service exposes (9000 is Minio’s default port).
Access Key and Secret Key remain the same as the environment variables set in minio-standalone-deployment.yaml.
Note that LoadBalancer will work only if the underlying cloud provider supports external load balancing.
Kubernetes comes bundled with a neat dashboard. You can easily track your Minio pod’s memory, CPU usage and many other metrics via the dashboard.
To access the dashboard, execute the command:
$ kubectl cluster-info
Access the URL mentioned against kubernetes-dashboard.
Need help? We hang out on Slack. Join us!