Automate Vertica Cluster provisioning with SaltStack

Giandomenico Avelluto
Quantyca
Published in
9 min readJul 29, 2019

1. Introduction

Hello everyone!

My name is Giandomenico Avelluto and I work as Lead Site Reliability Engineer @Quantyca.

I want to share how the adoption of Salt has facilitated and optimized the applications provisioning and testing process.

Note: If you are not familiar with the Salt system, check out my Salt Essentials article:

In this article, I’ll share with you how we’ve automated the creation of Vertica Database Clusters using SaltStack.

In essence, Vertica already provides a usable virtual environment, but regrettably, it is limited to a single node instance. As a consequence, it does not allow us to take full advantage of Vertica data partitioning and load distribution capabilities.

For this reason, we have created an automatic procedure to set up a Vertica Cluster that allows us to test all the features that the database provides.

For this demo, we’ll use Vagrant for virtual instances provisioning and Salt Provisioner for Vagrant to automate the Vertica Cluster creation using SaltStack.

This article will focus on the following topics:

  • What Vertica is
  • Vertica Cluster Quick Start
  • Salt Orchestration walkthrough
  • Conclusion

2. What Vertica is

Vertica is a stacked analytic database designed to handle large volumes of data that, enables fast query performance in intensive scenarios.

Vertica’s design features include:

  • Column-oriented storage organization. A stacked storage organization that increases the performance of sequential record access at the expense of common transactional operations with key features such as: single record retrieval, updates, and deletes.
  • Massively parallel processing (MPP) architecture. A stacked structure designed to distribute queries on independent nodes and scale performance linearly.
  • Standard SQL interface. A domain language with many analytics capabilities built-in, such as time series gap filling/interpolation, event-based windowing, and sessionization, pattern matching, event series joins, statistical computation (e.g., regression analysis), and geospatial analysis.
  • In-database machine learning. A platform that include the following features: categorization, fitting and prediction to enhance processing speed by eliminating the need for down-sampling and data movement. Vertica offers a variety of in-database algorithms, including linear regression, logistic regression, k-means clustering, Naive Bayes classification, random forest decision trees, and support vector machine regression and classification. Vertica also allows the deployment of ML models to multiple clusters.
  • Compression. A process that reduces storage costs and I/O bandwidth. High compression is possible because columns of a homogeneous datatype are stored together and updates to the main store are batched.
  • Shared-nothing architecture. An Architecture that reduces system contention for shared resources and allows gradual degradation of performance in the face of hardware failure.
  • Easy to use and maintain through automated workload management, data replication, server recovery, query optimization, and storage optimization.
  • Native integration with open source big data technologies like Apache Kafka and Apache Spark.
  • Support for standard programming interfaces, including ODBC, JDBC, ADO.NET, and OLEDB.
  • High-performance and parallel data transfer to statistical tools such as built-in machine learning algorithms based on R, and the ability to store machine learning models, and use them for in-database scoring.

Vertica Analytics Platform Community Edition was made available for free with certain limitations, such as a maximum of one terabyte of raw data, three-node (servers) cluster at most, and community-based support.

3. How to get started

3.1 Requirements

In order to follow this tutorial, you need to download and install the following tools:

3.2 Environment configuration

First of all, you will need to install the following vagrant plugin by running on terminal:

vagrant plugin install vagrant-vbguest
vagrant plugin install vagrant-hostmanager

The vbguest plugin will ensure that the correct version of Virtual Box Guest Additions is automatically installed on servers. Instead, the hostmanager plugin will enable hostname resolution of multi-machine environment.

Next, you can download our Vertica salt-formula from GitHub:

git clone https://github.com/Quantyca/demo-sre-vertica-salt-formula.git

Note: Formulas are pre-written Salt States that offer reusable bundles which package all the necessary piece to automate a specific task.

Now, move into the repository:

cd demo-sre-vertica-salt-formula

Note: By default, the Vagrantfile is configured to bootstrap a three-node Vertica Cluster. You can run the setup on a higher number of nodes as long as you hold an Enterprise license.

Next, it will be necessary to register on the Vertica web site and download the Vertica Server package for “Red Hat Enterprise Linux 6 and 7”:

then rename the downloaded package in “vertica.rpm” and move it in demo-sre-vertica-salt-formula/salt/vertica/packages folder.

That’s it! Now you can come back on terminal and create your Three Node Vertica Cluster by running:

vagrant up

Note: After a while vagrant will ask you to pass your user password. This is needed in order to map automatically vagrant instances hostname in your /etc/hosts file.

At the end of the procedure if no error occurred, you should see an output like this:

...    ...    ...
... ... ...
Summary for 10.0.2.15_master
------------
Succeeded: 8 (changed=8)
Failed: 0
------------
Total states run: 8
Total run time: 672.630 s
[INFO ] Runner completed: 20190720104900190410

Then, let’s try to do some query to the database. To do that you can enter in one of three instances by running on terminal:

vagrant ssh vertica01

Then we can use VSQL, a SQL client provided by Vertica, to do the queries. For instance:

sudo su - dbadmin
vsql -U dbadmin -w 'Vertica!'

At this point, VSQL shell should appear. So let’s try to run a query:

SELECT node_name, node_state, node_address FROM NODES order by 1;

Output:

 node_name          | node_state | node_address 
--------------------+------------+--------------
v_vertica_node0001 | UP | 192.168.99.2
v_vertica_node0002 | UP | 192.168.99.3
v_vertica_node0003 | UP | 192.168.99.4
(3 rows)

Congratulation! Your cluster is Up & Running!

You can also connect with a graphical SQL client to the cluster and start testing its functionality. You can use the following convention, in order to connect to the database:

  • JDBC Connection: jdbc:vertica://vertica01:5433/vertica
  • Credentials:
    * Username: dbadmin
    * Password: Vertica!

You can control these virtual instances with vagrant by running:

vagrant halt # This will shutdown the instances without losing data
vagrant up # This will start the instances
vagrant destroy # This will destroy the instances and all related data

4. Salt Orchestration Walkthrough

Next, let us cover what the Salt Orchestration does.

Let’s begin by checking the structure of our formula repository and identify its key components:

demo-sre-vertica-salt-formula/
├── README.md
├── Vagrantfile
├── pillar
│ ├── top.sls
│ └── vertica.sls
├── salt
│ └── vertica
│ ├── map.jinja
│ ├── mine
│ │ └── init.sls
│ ├── orchestration
│ │ ├── check_data_dir.sls
│ │ ├── cleanup.sls
│ │ ├── create_database.sls
│ │ ├── init.sls
│ │ └── install_vertica.sls
│ ├── packages
│ │ ├── README.md
│ │ └── vertica.rpm
│ └── requirements
│ ├── generate_ssh_key.sls
│ ├── init.sls
│ ├── mine_refresh.sls
│ └── propagate_ssh_key.sls
└── vagrant
├── config
│ ├── master
│ ├── minion1
│ ├── minion2
│ └── minion3
└── pki
├── master.pem
├── master.pub
├── vertica01.pem
├── vertica01.pub
├── vertica02.pem
├── vertica02.pub
├── vertica03.pem
└── vertica03.pub
  • Vagrantfile: it contains all the information for virtual instances provisioning plus some of Salt provisioner configurations
  • pillar: this folder contains the files that will be shared with the Salt Master in order to configure the necessary Pillars for orchestration, such as defining the Mine functions useful for distributing ssh keys for configuring the ssh passwordless on virtual instances
  • salt/vertica/map.jinja: it contains the definition of the variables used in the orchestration
  • salt/vertica/orchestration/init.sls: it contains the Vertica Salt Orchestration functions
  • salt/vertica/packages: this folder contains the Vertica database installation package
  • salt/vertica/requirements: it contains all the states for Vertica requirements check to be fulfilled before proceeding with the installation
  • vagrant/pki: the folder contains the public and private keys that vagrant will use for configuring Salt Minions and Master
  • vagrant/config: it contains the specific Salt Minions and Master configuration files

Next, we shall analyze our salt/vertica/orchestration/init.sls file:

{% from "vertica/map.jinja" import options with context %}
{% from "vertica/map.jinja" import data_dir with context %}
{% from "vertica/map.jinja" import dbadmin_passwd with context %}
{% from "vertica/map.jinja" import vertica_user with context %}
{% from "vertica/map.jinja" import vertica_group with context %}
{% from "vertica/map.jinja" import vertica_user_home with context %}
{% from "vertica/map.jinja" import tech_user with context %}
{% from "vertica/map.jinja" import tech_user_home with context %}
{% from "vertica/map.jinja" import vertica_db with context %}
{% from "vertica/map.jinja" import dev with context %}
#1
Check Requirements:
salt.state:
- sls: vertica.requirements
- tgt: 'roles:vertica_*'
- tgt_type: grain
- pillar:
dev: {{ dev }}
tech_user: {{ tech_user }}
tech_user_home: {{ tech_user_home }}
#2
generate_ssh_key:
salt.state:
- sls: vertica.requirements.generate_ssh_key
- tgt: 'roles:vertica_init'
- tgt_type: grain
- require:
- salt: Check Requirements
- pillar:
tech_user: {{ tech_user }}
tech_user_home: {{ tech_user_home }}
#3
Refresh Mine functions:
salt.state:
- sls: vertica.requirements.mine_refresh
- tgt: 'roles:vertica_*'
- tgt_type: grain
- require:
- salt: generate_ssh_key
#4
Propagate ssh-key on all host for passwordless setup:
salt.state:
- sls: vertica.requirements.propagate_ssh_key
- tgt: 'roles:vertica_node'
- tgt_type: grain
- require:
- salt: Refresh Mine functions
- pillar:
tech_user: {{ tech_user }}
tech_user_home: {{ tech_user_home }}
#5
Install Vertica:
salt.state:
- sls: vertica.orchestration.install_vertica
- tgt: 'roles:vertica_init'
- tgt_type: grain
- require:
- salt: Propagate ssh-key on all host for passwordless setup
- pillar:
data_dir: {{ data_dir }}
dbadmin_passwd: {{ dbadmin_passwd }}
options: {{ options }}
tech_user: {{ tech_user }}
tech_user_home: {{ tech_user_home }}
vertica_user: {{ vertica_user }}
vertica_user_home: {{ vertica_user_home }}
vertica_group: {{ vertica_group }}
#6
Check for data dir:
salt.state:
- sls: vertica.orchestration.check_data_dir
- tgt: 'roles:vertica_*'
- tgt_type: grain
- require:
- salt: Install Vertica
- pillar:
data_dir: {{ data_dir }}
vertica_user: {{ vertica_user }}
vertica_user_home: {{ vertica_user_home }}
#7
Create Vertica Database:
salt.state:
- sls: vertica.orchestration.create_database
- tgt: 'roles:vertica_init'
- tgt_type: grain
- pillar:
vertica_user: {{ vertica_user }}
dbadmin_passwd: {{ dbadmin_passwd }}
vertica_db: {{ vertica_db }}
- require:
- salt: Check for data dir
#8
Cleanup:
salt.state:
- sls: vertica.orchestration.cleanup
- tgt: 'roles:vertica_*'
- tgt_type: grain
- require:
- salt: Create Vertica Database
- pillar:
data_dir: {{ data_dir }}
tech_user: {{ tech_user }}
tech_user_home: {{ tech_user_home }}

In the first part, we find all the necessary variables imports. Next, step by step:

  • #1 - Check Requirements: it verifies that all the Vertica requirements are met
  • #2 - generate_ssh_key: during setup, Vertica needs, to distribute its configurations, to connect to all nodes via ssh. For this reason, we generate the necessary keys to set up the ssh passwordless
  • #3 - Refresh Mine functions: in order to retrieve ssh keys, you need to reload Mine functions
  • #4 - Propagate ssh-key on all host for passwordless setup: as the title already explains, in this step Salt propagate the ssh keys to all the nodes to enable the ssh-passwordless capability
  • #5 - Install Vertica: we install Vertica using its installation script

Tips: the official Vertica installation script requires the list of nodes to be configured in the cluster as an argument. So, we use the power of Salt Mine functions to retrieve this information at runtime and pass it to the script. This allows us to be able to automatically create Vertica Clusters of any size in any Environment

  • #6 - Check for data dir: before creating a database in our new Vertica Cluster, we make sure that the necessary folders exist and that the correct permissions are set
  • #7 - Create Vertica Database: in this step, we create a test database which we will call “vertica
  • #8 - Cleanup: finally, we remove the no longer necessary configurations

Note: all the steps are dependent on the previous one. So only in case of correct execution of the previous step, the following one is executed. Otherwise, the orchestration stops.

5. Conclusions

As you can be noted, is quite simple to orchestrate Vertica Cluster setup for test purposes, by using Vagrant and SaltStack.

I hope this article will useful for you to build intelligent and powerful orchestrations for your applications.

If you see room for improvement, let me know or open an Issue on GitHub! Thanks for following me and see you at the next article! Below, you will find some useful links.

5.1 Resources:

--

--