Published in Nerd For Tech

Photo by FLY:D on Unsplash

Threat Modeling — EKS

Identify Service/App

In this step, we define the scope of the threat modeling.

Create a Responsibility & ownership matrix

In this step, we define who is responsible for each component in question: AWS or the customer. In EKS, the control plane, Fargate, and managed node groups are managed by AWS, and the data plane is managed by us. Custom node groups, custom controllers, and mutating webhooks are all our responsibility.

Image source: AWS docs.
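The ownership split described in this step can also be kept as data, which makes it easy to query during a review. The following is a minimal sketch; the component names are taken from the text above, while the `Owner` enum and the `components_owned_by` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Owner(Enum):
    AWS = "AWS"
    CUSTOMER = "Customer"


# Ownership as stated in this article; adjust it to your own cluster layout.
RESPONSIBILITY = {
    "control plane": Owner.AWS,
    "fargate": Owner.AWS,
    "managed node group": Owner.AWS,
    "data plane": Owner.CUSTOMER,
    "custom node group": Owner.CUSTOMER,
    "custom controllers": Owner.CUSTOMER,
    "mutating webhooks": Owner.CUSTOMER,
}


def components_owned_by(owner):
    """Return the sorted component names assigned to the given owner."""
    return sorted(name for name, who in RESPONSIBILITY.items() if who is owner)


if __name__ == "__main__":
    for owner in Owner:
        print(owner.value, "->", ", ".join(components_owned_by(owner)))
```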

Identify Admin APIs

  • CreateNodegroup
  • DeleteNodegroup
  • UpdateNodegroupConfig
  • UpdateNodegroupVersion
  • CreateFargateProfile
  • DeleteFargateProfile
  • CreateCluster
  • DeleteCluster
  • UpdateClusterConfig
  • UpdateClusterVersion
  • AssociateIdentityProviderConfig
  • DisassociateIdentityProviderConfig
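One way to use this list is to flag calls to these APIs in an audit trail. The sketch below assumes CloudTrail-style records, which carry `eventSource` and `eventName` fields; the function names `is_admin_api_call` and `admin_calls` are hypothetical, and how the records are fetched is left out.

```python
# The Admin APIs listed above.
ADMIN_API_NAMES = frozenset({
    "CreateNodegroup", "DeleteNodegroup",
    "UpdateNodegroupConfig", "UpdateNodegroupVersion",
    "CreateFargateProfile", "DeleteFargateProfile",
    "CreateCluster", "DeleteCluster",
    "UpdateClusterConfig", "UpdateClusterVersion",
    "AssociateIdentityProviderConfig", "DisassociateIdentityProviderConfig",
})


def is_admin_api_call(event):
    """True if a CloudTrail-style record is an EKS Admin API call."""
    return (
        event.get("eventSource") == "eks.amazonaws.com"
        and event.get("eventName") in ADMIN_API_NAMES
    )


def admin_calls(events):
    """Keep only the records that touch an EKS Admin API, in input order."""
    return [event for event in events if is_admin_api_call(event)]
```

Feeding the result into an alerting or review queue is a separate design decision; this only shows the filtering step.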

Identify Access/Entry Points

In this step, we find all entry points of the service in question.

  • API Endpoint
  • SSH to Host
  • Access to Container by kubelet
  • Access to cluster by launching rogue containers

Identify Exfiltration Points

In this step, we find all the ways in which data can be moved out of the service.

  • Access to upload artifacts to outside/unintended system (S3)
  • Downloading and uploading images from ECR to other ECR or artifactory
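For the image-related exfiltration path, a simple check is to compare the registry in an image reference against an allowlist. This is a minimal sketch, not a complete image-reference parser: it follows the common convention that the first path component is a registry host only if it contains a dot or a colon, or is `localhost`. The account ID, region, and function names are placeholders chosen for illustration.

```python
def registry_of(image):
    """Return the registry host of an image reference, or docker.io if none is given."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_allowed_destination(image, allowed_registries):
    """True if the image reference points at an approved registry."""
    return registry_of(image) in allowed_registries


# Hypothetical allowlist with a single approved ECR registry.
ALLOWED = {"111122223333.dkr.ecr.us-east-1.amazonaws.com"}

if __name__ == "__main__":
    for ref in (
        "111122223333.dkr.ecr.us-east-1.amazonaws.com/app:1.0",
        "example.jfrog.io/team/app:1.0",
        "nginx:latest",
    ):
        print(ref, "->", is_allowed_destination(ref, ALLOWED))
```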

Identify places where you can put controls

The entry points and exfiltration points found in the previous two steps are the places where we need to put a control mechanism.

Dataflow diagram for EKS (Security View)

Identify Actors

In this step, we identify everyone and everything that can cause a security concern in the service.

  • External Attackers
  • Malicious Containers (Tampered image)
  • Vulnerable third party packages
  • Malicious User / Stolen Credentials
  • Misuse of legitimate privileges

Identify Threats (What can go wrong?)

In this step, we find the threats associated with the actors. This is basically what an actor can do in the service.

  • People who have no access to the cluster may be able to reach the applications running on it and/or the management port(s) over a network
  • An attacker has access to a single container and would like to expand their access to take over the whole cluster
  • An attacker has valid credentials to execute commands against the Kubernetes API, as well as network access to the port
  • A user who has access to the Admin APIs can modify the cluster configuration

Identify the Impact of the threat

In this step, we evaluate the impact of the action performed by various actors.

  • Data destruction (deleting configurations, storage)
  • Resource Hijacking (running digital currency mining)
  • Denial of service (makes the service unavailable)
  • Disruption of service

Identify Persistence of breach

In this step, we try to find the ways in which a breach can stay present in the system.

  • Backdoor container
  • Writable Hostpath (creating a cron job on the host)
  • Kubernetes Cronjob (scheduled pods in the cluster)
  • Malicious Admission controller
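The writable hostPath item above can be checked mechanically against a pod spec. The sketch below works on a pod spec already loaded into a Python dict (for example from JSON output of the API) and reports hostPath volumes that are mounted without `readOnly: true`. It covers only that one persistence technique, and the function name is hypothetical.

```python
def writable_host_paths(pod_spec):
    """Return the sorted names of hostPath volumes mounted writable by any container."""
    host_volumes = {
        volume["name"]
        for volume in pod_spec.get("volumes", [])
        if "hostPath" in volume
    }
    findings = set()
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        for mount in container.get("volumeMounts", []):
            # A volume mount is writable unless readOnly is explicitly true.
            if mount["name"] in host_volumes and not mount.get("readOnly", False):
                findings.add(mount["name"])
    return sorted(findings)
```

A pod that mounts a host directory such as `/etc` writable could drop a cron job on the node, which is the scenario the list describes.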

Identify Lateral movements

In this step, we assess all the touchpoints the malicious actor can reach. This includes systems and services outside the scope of service being evaluated.

  • Access cloud services
  • Application Credentials in container
  • Kubernetes secrets
  • CoreDNS poisoning
  • Writable volume mounts in the host node

Identify Preventive measures

In this step we find the preventive controls to mitigate the threats identified.

  • Ensure that management services (API server, kubelet) are not exposed to untrusted networks without authentication controls in place
    - API Server Authentication
    - API Server Authorisation
    - Kubelet Authentication
  • Ensure that service accounts are either not mounted in containers or have restricted rights (i.e. not cluster-admin)
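  • RBAC, IRSA (IAM Roles for Service Accounts), Pod Security Policy (removed in Kubernetes 1.25; Pod Security Admission is the built-in replacement)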
  • Separate security groups for control plane & workers
  • Calico Network Policies
  • Private registry authentication
  • Ensure Admin APIs are granted only to Admin roles, and that those roles can be assumed only by specific entities from the corporate network.
  • Input validation, Authentication, Session handling, and contextual bound handling
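As an illustration of restricting the Admin APIs to the corporate network, the sketch below builds an IAM policy document that denies the EKS Admin actions when the request does not come from an approved IP range. It uses the `aws:SourceIp` condition key, which is not present on requests made through VPC endpoints, so this assumes calls arrive over public AWS endpoints. The CIDR range, the `Sid`, and the function name are placeholders, not values from the article.

```python
import json

# The Admin APIs identified earlier in this article.
ADMIN_ACTIONS = (
    "CreateNodegroup", "DeleteNodegroup",
    "UpdateNodegroupConfig", "UpdateNodegroupVersion",
    "CreateFargateProfile", "DeleteFargateProfile",
    "CreateCluster", "DeleteCluster",
    "UpdateClusterConfig", "UpdateClusterVersion",
    "AssociateIdentityProviderConfig", "DisassociateIdentityProviderConfig",
)


def deny_admin_outside_network(cidrs):
    """Build a policy that denies EKS Admin actions from outside the given CIDRs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyEksAdminOutsideCorpNetwork",
                "Effect": "Deny",
                "Action": [f"eks:{name}" for name in ADMIN_ACTIONS],
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": list(cidrs)}},
            }
        ],
    }


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation range used here as a stand-in.
    print(json.dumps(deny_admin_outside_network(["203.0.113.0/24"]), indent=2))
```

An explicit deny is used because it overrides any allow attached elsewhere, so the restriction holds even if a role is granted broad EKS permissions.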

Create a controls matrix

Once we have the controls, we create a controls matrix that maps each threat to all the controls for that threat. We separate the controls into categories like the ones below. The matrix needs to be reviewed and approved by a security SME.

  • Directive
    - Put configuration in the product against the threats.
    - Calico Network policy
  • Preventive
    - Instead of enabling SSH access, use SSM Session Manager when you need to remote into a host.
  • Detective
    - Event-based detection of misconfiguration (AWS Config & Custom Security tool)
    - Periodically run Kube-bench to verify compliance with CIS Amazon EKS Benchmark
    - Periodically use Amazon Inspector to assess hosts for exposure, vulnerabilities, and deviations from best practices
    - Periodically Scan your container images
  • Remediative
    - Incident response plan
  • Corrective
    - Iterative Hardening

Threat control matrix
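A controls matrix can be kept as a simple mapping from threat to controls per category, which makes gaps easy to spot before the SME review. The sketch below is illustrative only: the two threats and their assigned controls are an example pairing built from items in this article, not the author's actual matrix, and `missing_categories` is a hypothetical helper.

```python
CATEGORIES = ("Directive", "Preventive", "Detective", "Remediative", "Corrective")

# Example pairing only; a real matrix comes out of the review process.
MATRIX = {
    "Unauthenticated network access to management ports": {
        "Directive": ["Calico network policy"],
        "Preventive": ["Use SSM Session Manager instead of SSH"],
        "Detective": ["Event-based detection of misconfiguration"],
    },
    "Tampered container image": {
        "Detective": ["Periodic container image scanning"],
    },
}


def missing_categories(matrix):
    """Map each threat to the control categories that have no control yet."""
    return {
        threat: [category for category in CATEGORIES if not controls.get(category)]
        for threat, controls in matrix.items()
    }


if __name__ == "__main__":
    for threat, gaps in missing_categories(MATRIX).items():
        print(f"{threat}: missing {', '.join(gaps) or 'nothing'}")
```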

Create an Incident Response plan

This step defines how to deal with an identified threat if it actually occurs. The output is a runbook or SOP for each threat.

  • Identify the offending Pod and worker node (by worker node, by deployment, by label, or by service account name)
  • Isolate the pod/node (Network policy)
  • Revoke temporary security credentials assigned to pod/node
  • Cordon the worker node
  • Enable termination protection on the impacted worker node
  • Capture volatile artifacts on the worker node (operating system memory, netstat tree dump)
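The isolation step can be sketched as a Kubernetes NetworkPolicy that selects the offending pod by label and lists both policy types without any allow rules, which denies all ingress and egress for the selected pods. This assumes the cluster runs a network plugin that enforces network policies (such as Calico, mentioned earlier); the policy name, label, and function name are placeholders.

```python
import json


def isolation_policy(namespace, label_key, label_value, name="deny-all-isolated"):
    """Build a NetworkPolicy manifest that blocks all traffic for matching pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Only pods carrying this label are isolated.
            "podSelector": {"matchLabels": {label_key: label_value}},
            # Both directions are listed and no rules are given, so nothing is allowed.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # The JSON output can be saved to a file and applied with kubectl apply -f.
    print(json.dumps(isolation_policy("default", "quarantine", "true"), indent=2))
```

Labeling the pod rather than deleting it keeps the workload available for the evidence-capture steps in the runbook.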

Amit Singh Rathore

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML