Multi-Segment Distributed Storage for Kubernetes

Dmitry Yusupov
Published in EdgeFS
May 26, 2019

Working with a group of researchers involved in the Calit2 project helped us nail down an interesting EdgeFS use case: stretching EdgeFS within a single Kubernetes cluster and across continents over a high-throughput yet high-latency network backend.

The challenge of long-distance, high-throughput data transfer is an old problem, and scientific communities, such as Cenic, have solved it. However, sharing data among researchers, even over a dedicated DMZ, is a super complex task, with data management being a real pain.

Latency across geographies is high, even with optical backbones, and as such, stretching a single storage namespace isn’t going to be efficient.
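To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes a simplified model in which every request waits for one full network round trip before the next one starts. The function name, the 10 Gbit/s link speed, the 4 MiB request size, and both round-trip times are illustrative assumptions, not measurements from the Calit2 setup or anything specific to EdgeFS.

```python
def serialized_throughput_mbps(rtt_s: float, link_gbps: float, request_bytes: int) -> float:
    """Effective throughput in Mbit/s when requests are issued one at a time.

    Simplified model (an assumption for illustration): each request costs
    one round trip plus the time needed to push its payload over the link.
    """
    bits = request_bytes * 8
    transfer_s = bits / (link_gbps * 1e9)  # time on the wire for the payload
    return bits / (rtt_s + transfer_s) / 1e6


if __name__ == "__main__":
    request = 4 * 1024 * 1024  # hypothetical 4 MiB request
    # Hypothetical round-trip times: 0.5 ms locally, 150 ms across continents.
    for label, rtt in [("same data center", 0.0005), ("cross-continent", 0.150)]:
        mbps = serialized_throughput_mbps(rtt, link_gbps=10.0, request_bytes=request)
        print(f"{label}: {mbps:.0f} Mbit/s effective on a 10 Gbit/s link")
```

Under this model the long-distance case uses only a small fraction of the link, even though the raw bandwidth is identical. Real systems pipeline and batch requests, so treat this as an intuition aid rather than a prediction.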

Datasets are distributed, and some can be very large. Copying the data spreads it out, which creates security-control challenges and uncertainty about content consistency. The good news is that in the majority of cases not all datasets need to be accessed at once.

So, we thought that EdgeFS could help here.

What is EdgeFS? It is a new storage provider added to the CNCF Rook project. While it is a scale-out storage cluster, it can still operate in a so-called “solo” mode: a single-node Docker container that can scale out as your deployment grows, simply by connecting more nodes and/or geographically distributed cluster segments to it.

And the nice thing about Kubernetes is that by providing built-in namespace isolation, segmentation within same…
