Remote Access

Kev Jackson
THG Tech Blog
Jan 5, 2022

The current world situation has accelerated the steady move to more remote working for employees. Tech companies are naturally in the vanguard of this move to “remote working” or telecommuting (depending on how far away from the main office the worker is).

Once an organisation has taken the leap to allow much of their workforce to work from home, the next inevitable step is “digital transformation” — where processes and practices are changed to replace the default assumption that your colleagues are co-located in the same office.

“Within days of the outbreak, work from home (WFH), until then practiced sporadically by companies and organizations, became mandatory — a question of physical and financial survival.”

Zero-trust Architecture

In parallel with this rapid migration to remote working, a concept known as zero-trust has been gaining momentum since Google published a white paper giving an overview of their BeyondCorp security posture.

A strong message from this publication is that the standard model of a hardened perimeter with VPN access for remote workers needed to be rethought. The traditional perimeter security model defines two tiers of access:

  • external to the corporate network == no access
  • internal to the corporate network == full access

In terms of trust, there are then only two viable levels — untrusted or trusted. This lack of flexibility limits the design of the network and encourages a mode of operation where security is positioned as a “strong wall” to keep out intruders — think of a classic medieval fort.

https://en.wikipedia.org/wiki/Eston_Nab

The walls exist to keep out invaders; however, they offer no protection at all once the invaders are inside. Hill-forts, castles and the like were therefore designed with layers of defences — an external ditch, followed by a wall, with a bailey or stronghold in the centre. You might be trusted to enter within the walls, for example, but would not be allowed into the stronghold itself.

Models of networking

In networking terms this can be likened to external, DMZ and internal networks — typically webservers would live in a DMZ network whereas a DB server would live deep inside the internal network.

This is better than a binary external/internal split; however, without rules defining who & what can communicate with other devices within the same network (east-west traffic), it is still very much open to exploitation — once an attacker is in the DMZ, they can impersonate a trusted resource in that tier and gain access to the internal network.

A typical network architecture with 3 subnets: DMZ, App & Data

In the example diagram, there are logical firewalls or security groups between each network tier. The rules for accessing a given App Server are stored in the common firewall rules / security group rules. Crucially, this model follows the pattern of trusting interactions between services within the same network tier, i.e. east-west traffic is unrestricted by firewall rules / security groups.

An alternative to this pattern of network tiers is the concept of micro-segmentation, commonly seen in Kubernetes deployments. With micro-segmentation, instead of relying on an external firewall to prohibit access to resources, each VM or host runs its own firewall (typically iptables or firewalld). The advantage of this pattern is that no host trusts another host simply because they inhabit the same network; instead, trust is explicitly defined on a host-by-host basis.

The major downside of this approach is that there are significantly more places to update firewall rules than in the standard external firewall approach.
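As a rough illustration, the host-level policy on a database server might look something like the following iptables rules (the subnets and the PostgreSQL port are assumptions for the example). Every other host in the fleet needs its own equivalent ruleset, which is exactly where the maintenance burden comes from.

# default-deny all inbound traffic on the database host
iptables -P INPUT DROP
# keep established connections and loopback traffic working
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
# allow only the app-server subnet to reach PostgreSQL
iptables -A INPUT -p tcp -s 10.0.2.0/24 --dport 5432 -j ACCEPT
# allow ssh from the admin subnet only
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 22 -j ACCEPT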

Service Meshes & ACLs

Managing and maintaining firewall rules on each host becomes less and less feasible as the fleet of machines expands. To mitigate this, a common tactic is to embed a service mesh in your infrastructure. This allows all services / hosts to be available on an “open” network while still controlling access between services.

A service mesh can act as a super-charged firewall, tightly controlling access between services deployed to the host hardware. For example, an application server needs to access a database server on a specific port; typical host-based firewall rules would need to be modified to allow this traffic.

However, with a modern architecture where services often move from host to host (e.g. within a Kubernetes cluster), it is harder to create a firewall rule that is flexible enough to cope with a service changing IP addresses while still providing the correct security.

A service mesh can solve this issue by basing security on the service identity rather than the service location (IP/host). This service-based security scoping is known as “intentions” in Consul, or more generically as ACLs. These intentions enforce mutual TLS between services and define which services can access each other.
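For example, with Consul the equivalent of a firewall rule can be expressed purely in terms of service names (the service names below are illustrative, not taken from a real deployment):

# default-deny traffic between all services in the mesh
consul intention create -deny '*' '*'
# explicitly allow the application to talk to the database
consul intention create -allow app-server db-server
# check whether a connection between two services would be permitted
consul intention check app-server db-server

Because the mesh enforces these rules with mutual TLS at each service’s proxy, the policy keeps working regardless of which host or IP address the services happen to land on.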

This is just one of the benefits of using a service-mesh, however it is key to providing service-to-service security within a cloud-based environment where hosts don’t have a fixed address.

BeyondCorp

The key takeaway from the BeyondCorp concept is not network security tiers, it is identity (of users and services) and what identity means in terms of access to protected resources.

X-Files https://en.wikipedia.org/wiki/The_X-Files

Instead of trusting users inside the network, treat everyone as untrusted, even users and services making requests for resources inside what would be considered the internal network.

Implementing BeyondCorp-style security therefore requires a solution for both service-to-service communications and user-to-service communications.

Assuming that a service-mesh (Consul, Istio etc.) works for services, what are the options for user access?

Remote user access — BeyondCorp style

While service meshes have been around for some time and are now the de facto solution for securing access between services (and for other tasks such as canary testing, blue-green deployments, traffic shifting etc), user access via identity is a less well-trodden path.

Typical enterprise access patterns for users involve a VPN. VPNs are the epitome of allowing access by virtue of “location” instead of identity. By connecting to a VPN, a user acts as if they are “on the same network” and as such now has access to “internal only” network resources. This is precisely the opposite of establishing trust by identity.

Instead of accessing network resources by connecting temporarily to an internal network via VPN, a company can (and arguably should) place network resources on the public network and then correctly identify users and determine the correct level of access based on that identity.

This can be achieved with an Identity Provider (IdP) and an authentication mechanism. Social logins wrap both of these together, but they can be separate installations. With these building blocks we can start to create a BeyondCorp-style remote access solution for almost all users of enterprise network resources. There is just one missing piece to the puzzle: how do we manage remote access to software engineering & sysadmin resources?

Teleport

https://www.flickr.com/photos/kschlot1/4539496972/

Traditional access to bare metal servers and virtual machine hosts is via ssh. This has some positives:

  • Installed on every Linux machine by default
  • Well-understood by the sysadmin community
  • Can be integrated with Yubikeys and other 2FA mechanisms

It also comes with some significant downsides:

  • Must open a port on the firewall (typically 22)
  • Easy for pockets of unmanaged access to appear over time, which makes revoking access when someone leaves difficult

To solve these and other problems associated with “bastion” or “jump” hosts, Teleport was created:

a Unified Access Plane for your infrastructure.

Implementing Teleport

At THG we decided to investigate Teleport as an alternative (and potential replacement) for the usual ssh & bastion hosts used by our sysadmins and DevOps-minded engineers.

As a prerequisite for deploying Teleport clusters, it’s important to define some terminology and concepts. Teleport is designed to allow access to multiple separate Teleport environments — this allows each part of the organisation to manage its own user access. To achieve this you can model your Teleport setup as a tree: a root cluster which connects to leaf clusters:

A teleport configuration spanning public & private clouds

In this example diagram there is a “root” cluster deployed to Google’s Cloud (GCP) with two leaf clusters: one in a private cloud using OpenStack and a second leaf deployed directly to bare metal servers. Each of these leaf clusters can have separate ownership and operations responsibilities.

In this worked example the SecOps team “owns” the root Teleport cluster in GCP, and two other engineering teams have decided to replace legacy ssh access with Teleport access: one team uses OpenStack and the other works with bare metal servers.
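In Teleport terms, a leaf joins the root by defining a trusted_cluster resource. A minimal sketch of what that might look like on a leaf cluster, assuming the root proxy is exposed at tele.example.com, with invented role names and with ports that depend on how the root proxy is configured:

# on a leaf cluster: declare the root as a trusted cluster and map roles
cat > trusted_cluster.yaml <<'EOF'
kind: trusted_cluster
version: v2
metadata:
  name: root-gcp                  # the root cluster owned by SecOps
spec:
  enabled: true
  token: <join-token-issued-by-the-root-cluster>
  web_proxy_addr: tele.example.com:443
  tunnel_addr: tele.example.com:3024
  role_map:                       # map root-cluster roles onto local roles
    - remote: secops-admin
      local: [admin]
EOF
tctl create trusted_cluster.yaml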

Teleport documents a manual GCP deployment; however, we need to ensure that as much as possible is automated, so we adjusted & adapted the given instructions to fit our requirements.

The authentication and identity piece is out of scope for this discussion, but suffice it to say that Teleport offers support for various IdP backends — let’s assume we use a Keycloak server to perform this task.

GCP Access

To be able to configure the Teleport root cluster in GCP, your account will need to have admin privileges.

The first step is to create the appropriate GCP service account, assign the correct roles to this account and download the x509 PEM of the service account key to use in later steps:
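With the gcloud CLI that looks something like this (the project name, service account name and roles are placeholders; the roles needed depend on the storage backends used later, here Cloud Storage and Firestore):

# create a dedicated service account for teleport
gcloud iam service-accounts create teleport --display-name "teleport"
# grant access to the GCS bucket and Firestore used for sessions & audit logs
gcloud projects add-iam-policy-binding my-gcp-project \
  --member "serviceAccount:teleport@my-gcp-project.iam.gserviceaccount.com" \
  --role "roles/storage.objectAdmin"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member "serviceAccount:teleport@my-gcp-project.iam.gserviceaccount.com" \
  --role "roles/datastore.user"
# create and download a key for the service account
# (JSON format here; the post uses the x509 PEM form of the key)
gcloud iam service-accounts keys create teleport-sa-key.json \
  --iam-account teleport@my-gcp-project.iam.gserviceaccount.com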

We’re using Pulumi as the infrastructure-as-code solution for this project; for an overview of Pulumi and how we’ve used it previously, see our earlier posts.

GCP Instances, networking & load balancing

As Teleport acts as a proxy for secure connections, it’s a requirement to already have the full certificate chain and private key for the domain that you are using. Let’s Encrypt is an easy/free way of creating the required certificates. Newer versions of Teleport have integrated support for Let’s Encrypt via a simple command:

sudo teleport configure --acme --acme-email=your-email@example.com --cluster-name=tele.example.com -o file

As with the prior DevOps solutions we have presented, we are using a mix of Pulumi to create the base resources and Ansible to configure the VM resources after boot via cloud-init. This requires setting some metadata on the VM resource and rendering a set of values into a cloud-config file that is injected into the VM under the user-data key.

One of the values that is rendered into the cloud-config template is the GCP x509 certificate downloaded after creating the service account.
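The cloud-config template is not reproduced here, but its shape is roughly as follows (a sketch only; the file paths, variable names and the way Ansible is invoked are assumptions rather than the actual THG template):

# the kind of user-data pulumi renders and attaches to the VM at creation time
cat > user-data.yaml <<'EOF'
#cloud-config
write_files:
  # the GCP service account credential, rendered in by pulumi
  - path: /var/lib/teleport/gcp-credentials
    permissions: '0600'
    content: |
      ${gcp_service_account_key}
runcmd:
  # run the Ansible tasks described below on first boot
  - ansible-playbook /opt/bootstrap/teleport.yml
EOF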

Lastly, for the GCP configuration we need the Ansible tasks that are executed as part of the cloud-init boot process.

These tasks are fairly simple: they just install the teleport package and copy a config template to the correct location.
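A sketch of what that teleport.yaml template might contain (the bucket, collection and project names are placeholders, and the exact storage fields should be checked against the Teleport GCP documentation):

# the rendered config that the Ansible tasks copy to /etc/teleport.yaml
cat > /etc/teleport.yaml <<'EOF'
teleport:
  storage:
    type: firestore                          # cluster state in Firestore
    project_id: my-gcp-project
    collection_name: teleport-backend
    credentials_path: /var/lib/teleport/gcp-credentials
    audit_events_uri: ['firestore://teleport-audit-events']
    audit_sessions_uri: 'gs://teleport-session-recordings'   # recorded sessions in a GCS bucket
auth_service:
  enabled: yes
  cluster_name: tele.example.com
proxy_service:
  enabled: yes
  public_addr: tele.example.com:443
ssh_service:
  enabled: no
EOF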

With this teleport config in place, the GCP instance will behave as a Teleport Auth and Proxy node, with sessions stored in a GCS bucket and audit log events in Firestore.

When all is configured and running, it is a simple case of using the tsh wrapper around ssh to log in to the auth/proxy node.
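For example (the connector and node names are illustrative):

# authenticate against the proxy via the configured IdP connector
tsh login --proxy=tele.example.com --auth=keycloak
# list the nodes this identity is allowed to see
tsh ls
# open an ssh session through the proxy
tsh ssh ubuntu@app-server-01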

Teleport Auth/Proxy in GCP

A similar set of Pulumi & Ansible scripts can be created for other environments, such as the OpenStack cluster described in the diagram above, or a fully-fledged Teleport cluster providing access to a Kubernetes cluster.

Open source

All the code for this is available in THG’s open source GitHub organisation.
