Our Pipeline Journey: Setting up our servers

PXYData
Jan 25, 2019

Servers are crucial for any data pipeline. Our servers will provide a centralized platform to run our tasks day and night and, when paired with source control and a deployment strategy, will allow multiple people to edit the tasks running on them. Since we control what software runs on our servers, we can tune them for the performance of our tasks. We’ve historically used AWS, but for now we will use Digital Ocean (DO), whose billing model and configuration settings are easier for beginners.

Configure your Server

First and foremost, register for Digital Ocean. Before you use any services, set up billing alerts so you don’t spend more than you intend. A small droplet (Digital Ocean’s term for a server) that covers your data pipeline needs can cost as little as $15 a month.

You can choose from a variety of operating systems for your server. The community and documentation around Ubuntu 16 are excellent, so we will use it. For our sandbox, we’ll use a droplet with 4GB of RAM and 80GB of storage. You can always scale up both later (or spin up more servers).

Next, we’ll set up SSH keys. Password authentication is less secure than SSH keys, and SSH keys are simple to maintain. We’ll follow these directions to generate a key pair. DO lets us upload SSH keys when we create a droplet, so once you create your key pair, upload your public key (the .pub file you created in the tutorial). Finish the wizard using the defaults and, HOORAY, you are now renting space on “the cloud”.
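The key-generation step can be sketched with ssh-keygen. This is a minimal sketch, not the directions the post links to: make_droplet_key is a hypothetical helper name, the 4096-bit RSA key type is our choice, and -N "" (an empty passphrase) is only there so the sketch runs unattended. Leaving -N "" out makes ssh-keygen prompt for a passphrase, which protects the key if your laptop is lost.

```shell
#!/bin/sh
# make_droplet_key KEY_FILE: hypothetical wrapper around ssh-keygen.
make_droplet_key() {
  key_file="$1"
  # Make sure the target directory exists (owner-only if we create it).
  mkdir -p -m 700 "$(dirname "$key_file")"
  # -t rsa -b 4096 : a 4096-bit RSA key
  # -C             : free-form comment stored at the end of the public key
  # -N ""          : empty passphrase so this runs unattended; drop it to be prompted
  # -f             : private key path; the public key is written to KEY_FILE.pub
  ssh-keygen -t rsa -b 4096 -C "data-pipeline" -N "" -f "$key_file"
}

# Example: make_droplet_key "$HOME/.ssh/id_rsa_droplet"
```

The KEY_FILE.pub file is the one to upload to DO; the private key stays on your machine. If you save the key somewhere other than the default ~/.ssh/id_rsa, point ssh at it with -i, for example ssh -i ~/.ssh/id_rsa_droplet root@your-server-ip.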

Now we want to connect to the server. If you successfully uploaded your SSH key, the following line should do it:

ssh root@your-server-ip

Configure Secure Shell (SSH)

There are a few settings we need to turn on or off in order to use our deployment tools, so we’re going to create a file called sshd_config. We recommend either Sublime Text or VS Code as your editor; both have syntax highlighting and are easy to get started with. You can copy the settings below, paste them into the text editor, and save the file as “sshd_config”.
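A minimal sketch of such a file, limited to the directives explained in this post. The values are an assumption inferred from those explanations (for example, the 120-second grace time), not the authors’ exact file. In practice, keep the other lines of Ubuntu’s stock /etc/ssh/sshd_config, such as its Subsystem sftp line, rather than uploading only this fragment.

```
# Sketch: only the directives discussed in this post.
LoginGraceTime 120
# "prohibit-password" is a stricter choice that still allows root key logins.
PermitRootLogin yes
StrictModes yes

# RSAAuthentication and RhostsRSAAuthentication only apply to SSH protocol 1;
# OpenSSH 7.4 and later report them as deprecated.
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile %h/.ssh/authorized_keys

IgnoreRhosts yes
RhostsRSAAuthentication no
HostbasedAuthentication no

PermitEmptyPasswords no
ChallengeResponseAuthentication no
PasswordAuthentication no

X11Forwarding yes
```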

This is a lengthy file, but server SSH configs are important, so we’ll explain what these lines mean in chunks:

LoginGraceTime is how long the server waits for you to finish logging in. Set to 120, it gives you 120 seconds from the time you connect to complete authentication before the server drops the connection.

PermitRootLogin allows you to log in as the root user, the highest level of authority on the server.

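StrictModes makes the server check the ownership and permissions of the user’s home directory and key files before accepting a login, and refuse it if, for example, other users can write to them.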

RSAAuthentication specifies whether to allow pure RSA authentication. Despite the name, it only applies to the old SSH protocol version 1: the RSA key pairs you create with the ssh-keygen utility are used over protocol 2 through PubkeyAuthentication, and OpenSSH 7.4 and later report RSAAuthentication as a deprecated option.

PubkeyAuthentication allows public key authentication, which is how we logged in to the server in the first place.

AuthorizedKeysFile tells the SSH server which file lists the public keys allowed to log in. The %h is replaced by the home directory of the user attempting to log in.

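IgnoreRhosts, RhostsRSAAuthentication, and HostbasedAuthentication: together these three disable rhosts and host-based authentication, older and less secure methods of SSH authentication that trust the connecting machine rather than a user’s key.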

PermitEmptyPasswords set to no prevents logins to accounts that have an empty password.

ChallengeResponseAuthentication set to no disables challenge-response (keyboard-interactive) logins, which on Ubuntu are another way the server can prompt for a password. Combined with PasswordAuthentication, this forces users to authenticate with an SSH key.

PasswordAuthentication set to no means that a user cannot log in by sending a password.

X11Forwarding set to yes lets us display graphical (GUI) applications running on the server on our own machine, if we ever need it.

The rest are the default settings.
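Because StrictModes refuses key logins when the home directory or key files can be written by other users, it can help to have a quick way to reset those permissions on the server if a login is rejected. A minimal sketch, assuming the default %h/.ssh/authorized_keys location; fix_ssh_perms is a hypothetical helper name.

```shell
#!/bin/sh
# fix_ssh_perms [HOME_DIR]: hypothetical helper, run on the server as the user
# who logs in. Defaults to the current user's home directory.
fix_ssh_perms() {
  home_dir="${1:-$HOME}"
  # StrictModes rejects a home directory that group or others can write to.
  chmod go-w "$home_dir"
  # Owner-only access to the key directory and the authorized keys list
  # (stricter than StrictModes requires, but the usual convention).
  chmod 700 "$home_dir/.ssh"
  chmod 600 "$home_dir/.ssh/authorized_keys"
}

# Example, as root on the droplet: fix_ssh_perms /root
```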

Now that we have created the file, we need to get it onto the server. Find the path to the file you created earlier, then run the following line to copy it to the server:

scp /path/to/local/file root@your-server-ip:/etc/ssh/sshd_config

scp uses ssh to securely copy a file.
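A mistake in this file can lock you out of the server, so it may be worth checking the local copy before running that scp command. A minimal sketch, assuming a POSIX shell with awk; check_sshd_config is a hypothetical helper name, and it only reads plain “Keyword value” lines. It relies on the sshd_config rule that the first value given for a keyword is the one used.

```shell
#!/bin/sh
# check_sshd_config FILE: hypothetical pre-upload check. Prints one line per
# problem and exits non-zero if FILE would still allow password logins or
# would turn key logins off. Ignores Match blocks and the Keyword=value form.
check_sshd_config() {
  awk '
    # Tolerate Windows line endings from editors on Windows.
    { sub(/\r$/, "") }
    # Skip blank lines and comments.
    /^[ \t]*(#|$)/ { next }
    {
      key = tolower($1)
      # sshd uses the first value it finds for a keyword, so keep only that one.
      if (!(key in first)) first[key] = tolower($2)
    }
    END {
      ok = 1
      # PubkeyAuthentication defaults to yes, so only an explicit "no" is a problem.
      if (first["pubkeyauthentication"] == "no") {
        print "PubkeyAuthentication is no: key logins would be refused"; ok = 0
      }
      # These default to yes, so they must be set to "no" explicitly.
      if (first["passwordauthentication"] != "no") {
        print "PasswordAuthentication is not set to no"; ok = 0
      }
      if (first["challengeresponseauthentication"] != "no") {
        print "ChallengeResponseAuthentication is not set to no"; ok = 0
      }
      if (first["permitemptypasswords"] == "yes") {
        print "PermitEmptyPasswords is yes"; ok = 0
      }
      exit (ok ? 0 : 1)
    }
  ' "$1"
}
```

For example, check_sshd_config /path/to/local/file && scp /path/to/local/file root@your-server-ip:/etc/ssh/sshd_config only uploads the file if the check passes.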

These settings enable ansible, our job runner and code deployment tool, to authenticate and configure the server.

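The SSH server keeps using its old settings until it is restarted. Once the file is copied, log in with ssh as before, run /usr/sbin/sshd -t to check the new file for errors, and then run service ssh restart. Keep that session open, and from a new terminal try to ssh in again; if you can log in, nothing broke!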


PXY DATA is a data design agency that will help solve your most complex business problems.