In Part 1, I covered some of the basic features of the Linode Cluster Toolkit (LCT) and LinodeTool. If those names and terms like “Cluster Plans” are new to you, you may want to read Part 1 first.

In this article, I’ll introduce some of their more advanced features, including predefined cluster plans, firewall management, advanced disk image management, and DNS management.

Predefined Cluster Plans

While LCT and LinodeTool make it easy to create clusters from cluster plans, preparing a cluster plan itself often requires domain knowledge and prior experience with the software stacks it’s deploying.

LCT and LinodeTool come packaged with…


Nowadays, deployment — be it big data, highly available databases or load balanced web applications — involves multiple clusters of many servers providing web serving, data processing, storage or coordination services.

In this article, I’ll introduce the Linode Cluster Toolkit, a software library, and LinodeTool, a command-line tool, that together make cluster deployments on the Linode cloud quick, simple and secure.

Motivation

If you are a Linode customer, you have probably used the Linode Manager web application. It’s a user interface — to be used by people.

Linode also provides APIs — Application Programming Interfaces — to be used by applications


One reason I love statistics and machine learning is that they provide techniques that let computers quickly and smartly solve problems that would otherwise require considerable manual effort and time.

In this article, I describe how I approached one such problem related to content discovery and recommendations, using unsupervised machine learning techniques.

I also used the opportunity to explore a solution using Apache Spark, instead of using a more common machine learning platform like Python’s scikit-learn. An advantage of Spark’s machine learning implementations is that they support distributed cluster processing out of the box, unlike scikit-learn’s implementations.

I chose to…


In this article, I explore the Linode cloud’s capabilities at running challenging computer vision tasks like deep learning, multiple object detection and face recognition.

I think mining the content of photos and videos is very useful. Such a mining engine opens up the possibility of running rich queries like “show me all my 2005 videos that have me, mom, and our pets” over your photo collections.

So I wrote a tool called deepvisualminer to “visually mine” photos and videos, by discovering objects and recognizing individuals in them, using a mixture of deep learning techniques and traditional computer vision techniques.


In part 1 of this series, we looked at GlusterFS. In part 2, we looked at the Ceph Object Store. In this concluding part, we look at HDFS, which is arguably the most popular of the three in big data ecosystems and is also quite different from the other two.

HDFS — Hadoop Distributed File System — is a file system that was designed for the Hadoop distributed data processing system. It remains the file system of choice in the big data ecosystems of Hadoop and Spark because it has proved itself in many large deployments of major software companies.


In part 1 of this series, we looked at GlusterFS. Now in part 2, we look at an entirely different kind of storage system - the Ceph Object Store.

Ceph is actually an ecosystem of technologies offering three different storage models: object storage, block storage and filesystem storage. Interestingly, Ceph’s approach is to treat object storage as its foundation, and provide block and filesystem capabilities as layers built upon that foundation. Its scope is far broader than that of GlusterFS, and consequently its architecture is more complex.

In this article, I’ll cover Ceph’s object store - its architecture, its deployment…


We live in the Age of Data. Data analytics, machine learning and big data have become critical factors — and in some cases, even unique selling points and outright products — for many businesses and verticals.

This explosion in data often involves persisting large volumes of data to storage. The distributed and scalable nature of big data processing systems often imposes similar demands of scalability and performance on their storage layers.

In this 3-part series, we take up three such storage systems — GlusterFS, Ceph and Hadoop DFS — and explore their architectures, deployment and performance on the Linode cloud.

Introduction

Karthik Shiraly

Tech lover. Data Science | Big Data | Machine Learning. Pathbreak Consulting. Always on the path less traveled.
