AWS and Public Key Authentication

Want to simplify connecting your existing and future cloud instances to a central server in a secure, automated and reliable way?
Read on to find out…

This article explains how to establish Public Key Authentication (aka PKA) between a central server and various client machines. PKA has numerous advantages over conventional password-based authentication, including increased security, stronger identity verification and freedom from maintaining ledgers of individual usernames and passwords.

One often-cited disadvantage of PKA is the complexity of distributing public keys and making the relevant configuration changes on each client machine before it is ready for password-less authentication.

In this article, we discuss one way to circumvent these shortcomings, at least in cloud environments, by using event-driven, serverless computing capabilities to automate most of the prerequisites. We make use of AWS EventBridge and AWS Lambda to accomplish this. Although this article concentrates on AWS, the same can be accomplished in Azure using Azure Functions and Event Grid.

Fig: Proposed workflow in 3 parts as explained in the article

As depicted in the high level workflow above, this article is divided into 3 parts:

Part I: Here we discuss the creation of an SSH key pair on the central server (the machine which should be able to log in to all other client machines using its private key). The public key generated here will be used in subsequent steps.

Part II: We discuss the various steps which need to be performed on a client machine so that the central server can authenticate to it without any password. Each step is performed manually on one sample client machine to make the process easy to follow; all of these steps are automated in Part III.

Part III: Finally, we automate the steps performed in Part II to rapidly enable PKA between machines and make the entire process much more scalable. We utilize serverless compute capabilities provided by AWS to set this up.

Part I: Generate SSH key pair on central server

Let's start by creating an SSH key pair (a public and a private key) on the central server, which needs to authenticate to the various client machines. The public key will be distributed to the client machines in later steps.

  1. Log in to the central server and create an SSH key pair using the RSA algorithm:
ssh-keygen -t rsa
Fig: Generating SSH key pair on the central server
  • The '-t' flag specifies the algorithm used to generate the key pair. Options include dsa, ecdsa, ed25519 and rsa (chosen here); older OpenSSH releases also supported rsa1.
  • The command also prompts for a passphrase, which is used to encrypt the private key on disk so that it is useless to anyone who steals or leaks it.
  • In our use case, we are creating service accounts on client machines which are non-interactive in nature. If we set a passphrase, it can disrupt any automation routine that relies on PKA, as the service account will be prompted for the passphrase to proceed. This is undesirable, and therefore we generate our key pair without a passphrase.
  • Once complete, the key pair by default is generated under the /root/.ssh directory. The 'id_rsa' file contains the private key, while the corresponding public key is stored in the 'id_rsa.pub' file.
Fig: Generated key pair and actual contents of the public key
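Since we want no passphrase anyway, the entire generation can also be done non-interactively in a single command. A minimal sketch, assuming the default root key path:

ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa

Here -b sets the key size in bits, -N "" supplies an empty passphrase and -f sets the output file, so no prompts appear.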

2. Copy the contents of the public key file (id_rsa.pub) for later use. Note that the private key should never leave the central server, and even there it should be accessible only to authorized users.

Now that we have generated and obtained the public key of the central server, we can proceed to make our client machines store this key for authentication purposes.

We'll do this manually for one machine in Part II of this article and then automate the complete process using AWS Lambda and AWS EventBridge in Part III.

Part II: Copy the public key to client machines

The steps below (1–8) establish PKA between the central server and one of the client machines using the public key which was obtained as the output of Part I of this article.

We will copy the public key from the central server to the client machine, mark it as authorized and change certain SSH configurations to enable PKA.

Many of these steps can be accomplished using the ssh-copy-id utility. However, ssh-copy-id requires one-time password-based authentication to the target machine, which is fine when working manually.
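For reference, the manual shortcut looks something like this (the client IP is a placeholder, and you would be prompted for the target account's password once):

ssh-copy-id -i /root/.ssh/id_rsa.pub svc_user@<client-ip>

Since we are automating the complete process in Part III of this article, we instead proceed with the seemingly longer process illustrated below: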

  1. Log in to a client machine and create an account (named svc_user) with no password:
useradd -m svc_user
  • The -m flag creates the user's home directory. We require this for our use case to create the user's .ssh directory (and to place the authorized_keys file within it)

In case the same account needs to be set up with a password as well (not recommended), we can use the command below:

useradd -m -p $(openssl passwd -1 '<<password>>') svc_user

2. Add the user to the sudo group on the target node:

usermod -aG sudo svc_user
  • This step applies to Debian-based distributions such as Ubuntu. On RHEL-based distributions the equivalent group is typically 'wheel', so if this prompts an error like 'usermod: group sudo does not exist', add the user to 'wheel' instead or simply skip this step.

3. Edit the sudoers file to waive the password requirement for privilege escalation by the 'svc_user' account:

echo 'svc_user ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
  • If this step is not implemented, the service account will require a password to escalate its privileges. Since this is a service account meant for automation, we don't want any password prompts when executing commands with sudo privileges.
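As a safer variant on distributions that support /etc/sudoers.d, you can place the entry in a drop-in file and validate it with visudo before it takes effect (a sketch; the file name is arbitrary):

echo 'svc_user ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/svc_user
chmod 440 /etc/sudoers.d/svc_user
visudo -cf /etc/sudoers.d/svc_user

The -c flag asks visudo to only check syntax, and -f points it at the drop-in file; a syntax error in sudoers can otherwise lock you out of sudo entirely.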

4. Create a .ssh directory under the new user account created in step 1 to store the authorized public key(s):

mkdir -m 700 /home/svc_user/.ssh

5. Change the owner of the .ssh directory from the default root user to the new user account:

chown svc_user /home/svc_user/.ssh
  • The owner of this directory and its contents is the root user by default. We need to change it to the service account for full access.

6. Create and update the authorized_keys file under the new .ssh directory with the public key of the central server:

echo '<<insert public key of central server here>>' >> /home/svc_user/.ssh/authorized_keys

7. Change access permissions on the authorized_keys file:

chmod 600 /home/svc_user/.ssh/authorized_keys

8. Change the owner of the authorized_keys file from default root user to the new user account:

chown svc_user /home/svc_user/.ssh/authorized_keys
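For convenience, here are steps 1 through 8 combined into one block that can be pasted into a root shell on the client machine (replace the placeholder with the actual public key):

useradd -m svc_user
usermod -aG sudo svc_user
echo 'svc_user ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
mkdir -m 700 /home/svc_user/.ssh
chown svc_user /home/svc_user/.ssh
echo '<<insert public key of central server here>>' >> /home/svc_user/.ssh/authorized_keys
chmod 600 /home/svc_user/.ssh/authorized_keys
chown svc_user /home/svc_user/.ssh/authorized_keys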

Once done, log back into your central server and attempt to ssh into the client machine:

ssh svc_user@172.31.37.52
Fig: Successful authentication using PKA (no password required)

As seen in the figure above, authentication works without the use of any password. We have successfully set up PKA between the two machines.
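To also confirm that password-less sudo works for the service account, you can run a quick non-interactive check from the central server (sudo's -n flag fails instead of prompting if a password would be required):

ssh svc_user@172.31.37.52 'sudo -n true && echo "passwordless sudo OK"'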

As you might have gauged by now, the manual procedure to establish PKA between machines is not straightforward, and the approach is not tenable if several client machines need to be accessible from the central server.

This brings us to the final part of this article, where we discuss automating PKA establishment in the cloud (AWS in this example). The automation will apply to new Linux-based machines provisioned from this point forward.

Part III: Automate PKA in Cloud environments

Finally, in this part we automate the PKA process in AWS so that any new EC2 instance created in the future automatically receives the central server's public key and the configuration necessary to make PKA work.

We primarily make use of AWS Lambda, triggered by AWS EventBridge rules that watch for events from AWS Systems Manager. If you are unfamiliar with these services, I strongly recommend glancing through their official AWS documentation.

In a nutshell,

  • AWS Lambda lets you run code in the programming language of your choice using a serverless compute model, which means you don't have to provision servers or manage runtimes yourself. The useful part is that you can define custom or predefined triggers: sets of conditions which, when met, instruct Lambda to execute your code.
  • Using lightweight agents (known as SSM Agents), AWS Systems Manager lets you manage servers remotely and gives you the ability to execute management and configuration tasks. It lets you run commands on your cloud instances without needing to log in.
  • AWS EventBridge lets you define event-based rules (or conditions) and gives the ability to route data or instructions between applications or even multiple AWS services.

Using these AWS services, we do two things:

  • Define a Lambda trigger based on event rules configured in AWS EventBridge (which in turn are based on EC2 instance association status within AWS Systems Manager); essentially, whenever an instance is successfully associated with Systems Manager, our Lambda function should get triggered automatically
  • Configure the Lambda function code to perform all the PKA setup changes defined in Part II of this article (copy the SSH public key, make the necessary configuration changes and assign the required permissions)

  1. Let's start by creating a Lambda function:
  • From the list of AWS services, select Lambda
  • Click the 'Create function' button
  • Choose ‘Author from scratch’ from the options tab
  • Give your function a relevant name
  • Select ‘Python 3.8’ as the Runtime language
Fig: Creating a Python based Lambda function

2. Once the function has been created, let's create an EventBridge rule to define when the function should be triggered. (Note: this is just one of several rules that can be used to define trigger conditions. Other rules might make more sense for your environment and constraints.)

  • From the list of AWS services, select Amazon EventBridge
  • Click the 'Create rule' button and give your rule a relevant name
  • Define the trigger rule using the details depicted in the screenshot below (a CLI equivalent is sketched after this list):
Fig: Defining a trigger using AWS EventBridge
  • Under the 'Target' section, select the Lambda function created in step 1 and click Create
  • Go back to the AWS Lambda console and edit your new function
  • Click the 'Add trigger' button under the Configuration tab, select 'EventBridge' and choose the newly created rule from the drop-down
  • The designer tab should look something like this:
Fig: Design level view of the new Lambda function
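For those who prefer the CLI, here is a minimal sketch of an equivalent rule. The rule name is arbitrary, and the detail-type shown assumes the Systems Manager instance association event; confirm the exact pattern against the sample events in your EventBridge console:

aws events put-rule \
  --name pka-on-ssm-association \
  --event-pattern '{
    "source": ["aws.ssm"],
    "detail-type": ["EC2 State Manager Instance Association State Change"],
    "detail": { "status": ["Success"] }
  }'

The Lambda function is then attached as the rule's target, which the console steps above take care of.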

3. Finally, let's write the function code, i.e. the Python snippet to be executed whenever the trigger rule is matched:

  • Under the ‘Function code’ section, insert the Python snippet as shown in the screenshot below:
Fig: Python Function code to specify the course of action once the trigger rules are met

Below is the reusable code snippet which can be copied for future use (replace the INSERT PUBLIC KEY HERE placeholder with the actual public key obtained from the central server):

import boto3

def lambda_handler(event, context):
    ssm = boto3.client('ssm')

    # Extract the ID of the EC2 instance that fired the matching event
    instance_id = str(event['detail']['instance-id'])

    # The same shell commands we ran manually in Part II
    script = [
        'useradd -m svc_user',
        'usermod -aG sudo svc_user',
        'echo "svc_user ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers',
        'mkdir -m 700 /home/svc_user/.ssh',
        'chown svc_user /home/svc_user/.ssh',
        'echo "INSERT PUBLIC KEY HERE" >> /home/svc_user/.ssh/authorized_keys',
        'chmod 600 /home/svc_user/.ssh/authorized_keys',
        'chown svc_user /home/svc_user/.ssh/authorized_keys'
    ]

    # Execute the commands on the instance through the SSM agent
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName='AWS-RunShellScript',
        Parameters={'commands': script}
    )

    return {
        'statusCode': 200
    }

The code does two things:

  • Retrieves the instance ID of the EC2 instance for which the trigger rule fired owing to a matching event
  • Uses the SSM agent deployed on the instance to execute a Run Command (in this case a shell script) locally; a sketch for checking the command's status follows below
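
If you want the function to report whether the commands actually succeeded, the response returned by send_command can be used to poll the invocation status. A minimal sketch, meant to run inside lambda_handler just before the return statement (error handling omitted; the short sleep is needed because the invocation may not be queryable immediately after submission):

import time

command_id = response['Command']['CommandId']
time.sleep(2)  # give SSM a moment to register the invocation
result = ssm.get_command_invocation(
    CommandId=command_id,
    InstanceId=instance_id
)
print(result['Status'])  # e.g. 'InProgress', 'Success' or 'Failed'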

Once the Lambda function is deployed, we are all DONE!

Now whenever a new EC2 instance is spun up and the instance gets associated with AWS Systems Manager, our Lambda function should automatically execute the list of shell commands to copy the central server's public key and make the necessary permission and configuration changes.

You can SSH to any of the new machines by executing the command below (replace host-ip with the actual IP of the new machine) on the central server, and it should work without the need for any password.

ssh svc_user@host-ip

We have successfully deployed an automated solution for provisioning password-less service accounts on cloud instances using a scalable, event-driven approach.

There are a number of applications for this solution. Think of the various DevOps tools (Ansible, Chef, Puppet, Salt, Terraform, etc.), many of which rely on authentication through such service accounts to push configuration changes.

Establishing connectivity between the central server (where these tools are hosted) and the client machines is perhaps the most cumbersome piece, which has been taken care of by this solution.