Data on EKS Platform Series 1: A Comprehensive Guide to Permission Management in Big Data Components

Tom Xiao
Dec 6, 2023

1. Introduction

In the realm of big data, securing data on cloud platforms is crucial. This blog explores the deployment of Apache Ranger for permission management within the Hadoop ecosystem on Amazon EKS. Through practical case studies, we’ll see how Ranger integrates with big data components such as Hive, Spark, Trino, YARN, and HDFS to keep data secure and well managed in a cloud environment. Join us as we navigate these security strategies in the context of Kubernetes and cloud computing.

2. About DEP (Data on EKS Platform)

The DEP (Data on EKS Platform) is a Kubernetes-based, cloud-native big data platform that revolutionizes the way we handle data in EKS environments. Developed by our company, DEP integrates with familiar components like Hive, Spark, Flink, Trino, HDFS, Kafka, and more, making it a versatile and comprehensive solution for data management and BI platforms.

3. Crucial Role of Permission Management in Big Data

Effective permission management is crucial for several key reasons:

  1. Enhanced Security: Proper permission management ensures that sensitive data is only accessible to authorized individuals, thereby safeguarding against unauthorized access and potential security breaches. This is especially important in industries handling large volumes of sensitive or personal data.
  2. Operational Efficiency: By defining clear user roles and permissions, organizations can streamline workflows and reduce administrative overhead. This system simplifies managing user access, saves time, and minimizes the risk of configuration errors.
  3. Scalability and Compliance: As businesses grow and evolve, a scalable permission management system helps in smoothly adjusting user roles and access rights. This adaptability is essential for maintaining compliance with various data privacy regulations like GDPR and HIPAA, ensuring that the organization’s data practices are legally sound and up to date.
  4. Addressing Big Data Challenges: Big data comes with unique challenges like managing large volumes of rapidly evolving data across multiple platforms. Effective permission management helps tackle these challenges by controlling how data is accessed and used, ensuring data integrity and minimizing the risk of data breaches.

4. About Apache Ranger

Apache Ranger is a comprehensive framework designed for data governance and security in Hadoop ecosystems. It provides a centralized platform to define, administer, and manage security policies consistently across various Hadoop components. Ranger specializes in fine-grained access control, offering detailed management of user permissions and auditing capabilities.

Image source: https://kymr.github.io/files/hadoop-summit/security/ranger_architecture.png

4.1 Key Components

Apache Ranger’s architecture is built to integrate seamlessly with a variety of big data tools like Hadoop, Hive, HBase, and Spark. The key components of Ranger include:

  1. Ranger Admin Service: This is the central component where all security policies are created and managed. It provides a web-based user interface for policy management and an API for programmatic configuration.
  2. Ranger Usersync Service: Responsible for syncing user/group information from a directory service like LDAP/AD into Ranger.
  3. Ranger Plugins: These are installed on each component of the Hadoop ecosystem (like Hive, HBase, etc.). Plugins pull policies from the Ranger Admin Service and enforce them locally.
  4. Ranger Auditing: Ranger captures access audit logs and stores them for compliance and monitoring purposes. It can integrate with external tools for advanced analytics on these audit logs.
  5. Ranger KMS (Key Management Service): Provides encryption and key management, extending Hadoop’s HDFS Transparent Data Encryption (TDE).

4.2 Permission Model

At the core of this model are policies composed of three elements: users, resources, and permissions.

  1. Users: Ranger allows the grouping of users, where a single user may be a member of multiple groups. It enables the configuration of permissions for individual users or user groups, facilitating granular access control.
  2. Resources: In Ranger, the term ‘resource’ varies across different big data components. For instance, in HDFS, a resource is identified by its file path, whereas, in Hive, it can be as broad as a Database or as specific as a Table or even a Column.
  3. Permissions: Ranger can restrict actions such as read, write, and execute on resources. Permissions are managed through four condition lists: whitelist (allow), whitelist exclude, blacklist (deny), and blacklist exclude. In the Ranger Admin UI these appear as Allow Conditions, Exclude from Allow Conditions, Deny Conditions, and Exclude from Deny Conditions.

When a Ranger administrator sets up policies for a component and installs the corresponding Ranger plugin, the permissions become active. Now, when a user attempts to access a resource, Ranger initiates a permission validation process as follows:

  • The system retrieves all policies associated with the requested resource.
  • It then iterates through these policies to determine if the user has permission to access the resource based on the established blacklist and whitelist criteria.

From the flowchart, it’s evident that the priority levels for matching policies are as follows:

  • Blacklist takes precedence over whitelist.
  • Blacklist exclude has a higher priority than blacklist.
  • Whitelist exclude has a higher priority than whitelist.
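
The evaluation order above can be sketched in a few lines of Python. This is a simplified model written for illustration, not Ranger's actual implementation: the class and function names are hypothetical, resources are matched by exact name (no wildcards or recursive paths), and policy priorities, tag-based policies, and masking/row-filter policies are ignored. A request that no allow or deny condition decides is returned as "not determined", which the plugin then handles with its own fallback (the HDFS plugin, for example, can fall back to native HDFS permissions).

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NOT_DETERMINED = "not determined"  # left to the plugin's fallback behaviour


@dataclass
class PolicyItem:
    """One condition row: the users/groups it names and the accesses it covers."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    accesses: set = field(default_factory=set)

    def matches(self, user, user_groups, access):
        named = user in self.users or bool(self.groups & user_groups)
        return named and access in self.accesses


@dataclass
class Policy:
    resources: set  # simplified: exact resource names, no wildcards
    allow: list = field(default_factory=list)          # whitelist
    allow_exclude: list = field(default_factory=list)  # whitelist exclude
    deny: list = field(default_factory=list)           # blacklist
    deny_exclude: list = field(default_factory=list)   # blacklist exclude


def _hit(items, user, user_groups, access):
    return any(item.matches(user, user_groups, access) for item in items)


def evaluate(policies, resource, user, user_groups, access):
    # Step 1: retrieve the policies associated with the requested resource.
    candidates = [p for p in policies if resource in p.resources]
    # Step 2: blacklist beats whitelist, but a blacklist exclude cancels the deny.
    for p in candidates:
        if _hit(p.deny, user, user_groups, access) and not _hit(p.deny_exclude, user, user_groups, access):
            return Decision.DENY
    # Step 3: whitelist grants access unless a whitelist exclude cancels it.
    for p in candidates:
        if _hit(p.allow, user, user_groups, access) and not _hit(p.allow_exclude, user, user_groups, access):
            return Decision.ALLOW
    return Decision.NOT_DETERMINED


# Hypothetical example: analysts may SELECT from sales.orders, except bob.
orders_read = Policy(
    resources={"sales.orders"},
    allow=[PolicyItem(groups={"analysts"}, accesses={"select"})],
    allow_exclude=[PolicyItem(users={"bob"}, accesses={"select"})],
)
print(evaluate([orders_read], "sales.orders", "alice", {"analysts"}, "select"))  # Decision.ALLOW
print(evaluate([orders_read], "sales.orders", "bob", {"analysts"}, "select"))    # Decision.NOT_DETERMINED
```

Note that bob is not denied here; he is simply no longer allowed by this policy. Denying him outright would need a blacklist entry.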

5. Practical Case: Deploying Big Data Cluster with Apache Ranger on EKS

In this section, we delve into a case study that outlines the deployment of a fully containerized big data environment on Amazon EKS, using AWS CloudFormation for resource provisioning.

5.1 Deployment Structure

Our EKS-based deployment will encompass the following components:

  1. S3 Buckets: Leveraged for scalable and durable Hive data storage.
  2. MySQL Database: Utilized for storing Hive metadata, facilitating efficient metadata retrieval and management.
  3. EKS Cluster: Comprising three distinct node groups named platform, hadoop, and trino, each tailored for specific operational needs.
  4. Hadoop Cluster Applications: Including HDFS for distributed storage and YARN for managing cluster resources.
  5. Trino Cluster Application: To enable distributed SQL query execution for analytics.
  6. Apache Ranger: Serving as the central security management tool for access policy across the big data components.
  7. OpenLDAP: Integrated as the LDAP service to provide a centralized user information repository, essential for user authentication and authorization.
  8. Other Cloud Services Resources: Including a dedicated VPC for network security and isolation.

By the end of this deployment process, the following benefits will be realized:

  • A high-performing, scalable big data platform that can handle complex data workflows with ease.
  • Enhanced security through centralized management of authentication and authorization, provided by the integration of OpenLDAP and Apache Ranger.
  • Cost-effective infrastructure management and operation, thanks to the containerized nature of services on EKS.
  • Support for data security and privacy compliance requirements, through Apache Ranger’s policy enforcement and audit logging.

Source code is available at:

  1. TomXiaoYZ/dep-public (github.com)
  2. TomXiaoYZ/big-data-platform-public (github.com)

Prerequisites

  1. Set up an AWS account for testing purposes. You may refer to this page to access the AWS Free Tier (note that EKS, NAT gateways, and the db.m5.large RDS instance used here are not covered by the Free Tier).
  2. Configure the proper permissions for your account.
  3. Follow this guide to install the AWS CLI, eksctl, and kubectl for Amazon EKS.

TL;DR: Quick Deployment with CloudFormation

For those eager to get started with deploying a big data environment on EKS with Apache Ranger, you can immediately kick off the process using my provided CloudFormation templates. Simply add the template to your AWS CloudFormation console, and the deployment of the entire stack, from the EKS cluster to the integrated Apache Ranger, will be automated for you.

Here’s how to do it:

  1. Go to AWS CloudFormation: Log into your AWS Management Console and navigate to the CloudFormation service.
  2. Create New Stack: Select ‘Create Stack’ > ‘With new resources (standard)’.
  3. Specify Template: Download the template file from GitHub, choose ‘Upload a template file’, and upload it. Alternatively, choose ‘Amazon S3 URL’ and paste this URL directly: https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep.template
  4. Configure Stack Options: Set your stack name and other parameters required by the template.
  5. Review and Launch: Review your configuration details, acknowledge any required capabilities, and click ‘Create stack’.
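
If you prefer the command line, the same stack can be created with ‘aws cloudformation create-stack’. The sketch below only assembles and prints the command; the stack name and the ‘my-iam-user’ value are placeholders. The capability list is my reading of the template: it uses the AWS::Serverless transform and nested stacks (which need CAPABILITY_AUTO_EXPAND) and creates IAM roles with explicit names (which need CAPABILITY_NAMED_IAM).

```python
import shlex

TEMPLATE_URL = "https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep.template"

# Transform + nested stacks -> CAPABILITY_AUTO_EXPAND; named IAM roles -> CAPABILITY_NAMED_IAM.
CAPABILITIES = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]


def create_stack_command(stack_name, region, parameters):
    """Assemble (but do not run) an `aws cloudformation create-stack` command line."""
    cmd = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--region", region,
        "--template-url", TEMPLATE_URL,
        "--capabilities", *CAPABILITIES,
    ]
    if parameters:
        cmd.append("--parameters")
        cmd.extend(f"ParameterKey={key},ParameterValue={value}" for key, value in parameters.items())
    return cmd


# Placeholder values: use your own stack name and IAM user.
print(shlex.join(create_stack_command("dep-demo", "ap-northeast-1", {"CliUsername": "my-iam-user"})))
```

CliUsername has no default, so it must always be supplied; parameters you omit keep their template defaults. Values containing commas need escaping in the shorthand ParameterKey=Name,ParameterValue=Value syntax.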

After a few minutes, you’ll have a fully functional big data environment with robust security management ready for your analytical workloads.

Resources Created Successfully

Now it's time to see how this template works.

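In the AWS web console, find the name of your EKS cluster (you can also list clusters with ‘aws eks list-clusters --region ap-northeast-1’). In my case it is ‘dep-demo-eks-cluster-ap-northeast-1’; substitute your own cluster name in the command below.

aws eks update-kubeconfig --name dep-demo-eks-cluster-ap-northeast-1 --region ap-northeast-1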

## Check pod status.
kubectl get pods --namespace hadoop
kubectl get pods --namespace platform
kubectl get pods --namespace trino
Check Pod Status
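
With many pods across three namespaces, scanning the output by eye gets tedious. A small helper, assuming kubectl is already pointed at the cluster, can list the pods whose Ready condition is not True from ‘kubectl get pods -o json’. The function names are hypothetical; completed job pods (phase Succeeded) are skipped because they never become Ready.

```python
import json
import subprocess


def not_ready_pods(pod_list):
    """Names of pods whose Ready condition is not True, from `kubectl get pods -o json`."""
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":  # completed job pods are never Ready
            continue
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if not ready:
            names.append(pod["metadata"]["name"])
    return names


def check_namespace(namespace):
    """Run kubectl for one namespace and return the pods that are not ready."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return not_ready_pods(json.loads(result.stdout))


# Usage, against a live cluster:
# for ns in ("hadoop", "platform", "trino"):
#     print(ns, check_namespace(ns) or "all pods ready")
```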

Next, let’s look at how Ranger Admin works in this template. First, forward its service port to your local machine:

kubectl port-forward service/ranger-admin 6080:6080 --namespace platform

Once Ranger Admin is forwarded to port 6080 on localhost, open ‘localhost:6080’ in your browser and log in with the username ‘admin’ and the Ranger password (RangerPassword) you entered when creating the stack.
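
Besides the web UI, Ranger Admin exposes a REST API, which is handy for checking the default policies from a script. A minimal sketch using only the Python standard library, assuming the port-forward above is running and that this Ranger version serves the public v2 policy endpoint (/service/public/v2/api/policy); the helper names are hypothetical:

```python
import base64
import json
import urllib.request

RANGER_URL = "http://localhost:6080"  # assumes the kubectl port-forward above


def ranger_request(path, username, password):
    """Build a GET request to Ranger Admin with HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        RANGER_URL + path,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def list_policies(username, password):
    """Return (service, policy name) pairs for all policies visible to this user."""
    request = ranger_request("/service/public/v2/api/policy", username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        policies = json.load(response)
    return [(p["service"], p["name"]) for p in policies]


# Usage, with the port-forward running:
# print(list_policies("admin", "<your RangerPassword>"))
```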

By default, the template creates two Ranger policies, one for Hive and one for Trino, each granting all access to the LDAP user created above (depadmin in my case).

Hive Policy
Trino Policy

The Ranger Usersync service is also set up and automatically syncs all users from the OpenLDAP service created by this template.

Ranger Usersync Service

5.2 Explanation of template parameters

## dep-public/series-1/dep.template

---
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: 'Deploy Dep (owner: tom.xiao@hotmail.com)'
Metadata:
  LICENSE: Apache License Version 2.0
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: DEP Basic Configuration
        Parameters:
          - EnvironmentName
          - CliUsername

      - Label:
          default: VPC Network Configuration
        Parameters:
          - VpcCIDR
          - PrivateSubnet1CIDR
          - PrivateSubnet2CIDR
          - PublicSubnet1CIDR
          - PublicSubnet2CIDR
          - RemoteAccessCIDR
          - AccessCIDR

      - Label:
          default: S3 Configuration
        Parameters:
          - S3BucketName

      - Label:
          default: MySQL Configuration
        Parameters:
          - DBAllocatedStorage
          - DBAutoMinorVersionUpgrade
          - DBBackupRetentionPeriod
          - DBInstanceClass
          - DBIops
          - DBMasterUsername
          - DBMasterUserPassword
          - DBMultiAZ
          - DBStorageType

      - Label:
          default: Ldap Configuration
        Parameters:
          - LdapDomainName
          - LdapAdmin
          - LdapAdminPassword
          - LdapUser
          - LdapUserPassword

      - Label:
          default: Redis Configuration
        Parameters:
          - RedisPassword

      - Label:
          default: Others
        Parameters:
          - RangerPassword

    ParameterLabels:
      # DEP Basic Configuration
      EnvironmentName:
        default: Set your Dep application name.
      # Cli username
      CliUsername:
        default: The aws username to grant eks cli access to.

      # VPC Network Configuration
      VpcCIDR:
        default: VPC CIDR.
      PrivateSubnet1CIDR:
        default: Private subnet 1 CIDR.
      PrivateSubnet2CIDR:
        default: Private subnet 2 CIDR.
      PublicSubnet1CIDR:
        default: Public subnet 1 CIDR.
      PublicSubnet2CIDR:
        default: Public subnet 2 CIDR.
      RemoteAccessCIDR:
        default: Allowed WebServer external access CIDR.
      AccessCIDR:
        default: Permitted IP range.

      # S3 Configuration
      S3BucketName:
        default: S3 bucket name for Hive data storage.

      # MySQL Configuration
      DBAllocatedStorage:
        default: The size of the database in gigabytes (GiB).
      DBAutoMinorVersionUpgrade:
        default: Select true/false to setup Auto Minor Version upgrade.
      DBBackupRetentionPeriod:
        default: The number of days for which automatic DB snapshots are retained.
      DBInstanceClass:
        default: The name of the compute and memory capacity class of the Amazon mysql DB instance.
      DBIops:
        default: DB Iops. Used only when io1 specified for the StorageType property.
      DBMasterUsername:
        default: DB admin username.
      DBMasterUserPassword:
        default: DB admin password.
      DBMultiAZ:
        default: Specifies if the database instance is a multiple Availability Zone deployment.
      DBStorageType:
        default: The storage type associated with this database instance.

      # Ldap Configuration
      LdapDomainName:
        default: Ldap domain name.
      LdapAdmin:
        default: Ldap admin username.
      LdapAdminPassword:
        default: Ldap admin password.
      LdapUser:
        default: Ldap user to grant admin access to.
      LdapUserPassword:
        default: Password for Ldap user to grant admin access to.

      # Redis Configuration
      RedisPassword:
        default: Redis root password.

      RangerPassword:
        default: Admin password for ranger.

Parameters:
  # DEP Basic Configuration
  EnvironmentName:
    Description: Set your dep application name.
    Default: dep
    Type: String

  CliUsername:
    Description: The aws username to grant eks cli access to.
    Type: String

  # VPC Network Configuration
  VpcCIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Default: 10.99.0.0/16
    Description: CIDR Block for the VPC.
    Type: String
  PrivateSubnet1CIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Default: 10.99.128.0/20
    Description: CIDR block for private subnet 1 located in Availability Zone 1.
    Type: String
  PrivateSubnet2CIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Default: 10.99.144.0/20
    Description: CIDR block for private subnet 2 located in Availability Zone 2.
    Type: String
  PublicSubnet1CIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Default: 10.99.0.0/20
    Description: CIDR Block for the public DMZ subnet 1 located in Availability Zone 1.
    Type: String
  PublicSubnet2CIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Default: 10.99.16.0/20
    Description: CIDR Block for the public DMZ subnet 2 located in Availability Zone 2.
    Type: String
  RemoteAccessCIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: CIDR block parameter must be in the form x.x.x.x/x
    Default: 0.0.0.0/0
    Description: Allowed CIDR block for external SSH access to the WebServers.
    Type: String
  AccessCIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Default: 0.0.0.0/0
    Description: 'The CIDR IP range that is permitted to access DEP instances.
      Note: a value of 0.0.0.0/0 will allow access from ANY ip address.'
    Type: String

  # S3 Configuration
  S3BucketName:
    Default: ''
    Description: S3 bucket name for Hive data storage. By default 'dep-{AccountId}-{Region}'
    Type: String

  # MySQL Configuration
  DBAllocatedStorage:
    ConstraintDescription: must be between 5 and 4096 GiB. If Iops specified, AllocatedStorage
      must be at least 100 GiB and with minimum Iops value of 1000
    Default: '50'
    Description: The size of the database in gigabytes (GiB)
    MaxValue: '4096'
    MinValue: '5'
    Type: Number
  DBAutoMinorVersionUpgrade:
    AllowedValues:
      - 'true'
      - 'false'
    Default: 'false'
    Description: Select true/false to setup Auto Minor Version upgrade
    Type: String
  DBBackupRetentionPeriod:
    Default: '7'
    Description: The number of days for which automatic DB snapshots are retained.
    Type: String
  DBInstanceClass:
    AllowedValues:
      - db.m6g.large
      - db.m6g.xlarge
      - db.m6g.2xlarge
      - db.m6g.4xlarge
      - db.m6g.8xlarge
      - db.m6g.12xlarge
      - db.m6g.16xlarge
      - db.m5.large
      - db.m5.xlarge
      - db.m5.2xlarge
      - db.m5.8xlarge
    ConstraintDescription: Must select a valid database instance type.
    Default: db.m5.large
    Description: The name of the compute and memory capacity class of the Amazon mysql DB instance.
    Type: String
  DBIops:
    AllowedValues:
      - '1000'
      - '2000'
      - '3000'
      - '4000'
      - '5000'
      - '6000'
      - '7000'
      - '8000'
      - '9000'
      - '10000'
    ConstraintDescription: '1000 Iops min and increased in 1K increments. '
    Default: '1000'
    Description: DB Iops. Used only when io1 specified for the StorageType property
    Type: Number
  DBMasterUsername:
    Default: depadmin
    Type: String
  DBMasterUserPassword:
    Default: 'XwgAST0%QsgTx['
    Type: String
  DBMultiAZ:
    AllowedValues:
      - 'true'
      - 'false'
    Default: 'false'
    Description: Specifies if the database instance is a multiple Availability Zone
      deployment.
    Type: String
  DBStorageType:
    AllowedValues:
      - standard
      - gp2
      - io1
    Default: standard
    Description: The storage type associated with this database instance
    Type: String

  LdapDomainName:
    Description: Ldap domain name.
    Type: String
    Default: 'dep'
  LdapAdmin:
    Description: Ldap admin username.
    Type: String
    Default: 'admin'
  LdapAdminPassword:
    Description: Ldap admin password.
    Type: String
    Default: 'admin123456'
  LdapUser:
    Description: Ldap user to grant admin access to.
    Type: String
    Default: 'depadmin'
  LdapUserPassword:
    Description: Password for Ldap user to grant admin access to.
    Type: String
    Default: 'dep123456'

  RedisPassword:
    Description: Redis root password.
    Type: String
    Default: 'depredis123456'

  RangerPassword:
    Description: Admin password for ranger.
    Type: String
    Default: 'dqwW988Ca012n4!'

DEP Basic Configuration

  • EnvironmentName: This parameter allows you to set a unique name for your DEP application, helping identify resources related to your deployment.
  • CliUsername: The IAM user name of your AWS CLI identity, which will be granted EKS access. This allows you to run eksctl and kubectl commands from your command line tools.
Example of CliUsername

VPC Network Configuration

  • VpcCIDR: Defines the IP range for your VPC, creating an isolated network space for your resources.
  • PrivateSubnet1CIDR & PrivateSubnet2CIDR: These specify the IP ranges for private subnets where your EKS and database resources will reside, ensuring secure internal access.
  • PublicSubnet1CIDR & PublicSubnet2CIDR: These are for public subnets, typically used for resources that need to be accessed over the internet, like load balancers.
  • RemoteAccessCIDR: This controls which external IP ranges are allowed SSH access to the servers, enhancing security by restricting access.
  • AccessCIDR: Specifies the range of IP addresses that are permitted to access the instances, a critical setting for controlling access to your application.
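
Because the five CIDR parameters depend on each other, it can be worth checking a custom layout before launching the stack. The sketch below reuses the template's AllowedPattern regex and then uses Python's ipaddress module to confirm that each subnet sits inside the VPC and that no two subnets overlap. The function name is hypothetical, and the check only covers what is described here, not every constraint AWS enforces.

```python
import ipaddress
import re
from itertools import combinations

# AllowedPattern shared by the CIDR parameters in dep.template.
CIDR_PATTERN = re.compile(
    r"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}"
    r"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$"
)


def check_network(vpc_cidr, subnets):
    """Return a list of problems with a VPC/subnet layout; an empty list means OK."""
    problems = [
        f"{name}: {cidr} does not match AllowedPattern"
        for name, cidr in {"VpcCIDR": vpc_cidr, **subnets}.items()
        if not CIDR_PATTERN.match(cidr)
    ]
    if problems:
        return problems
    try:
        vpc = ipaddress.ip_network(vpc_cidr)
        nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    except ValueError as exc:  # e.g. host bits set, as in 10.99.1.0/16
        return [str(exc)]
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name}: {net} is outside the VPC {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} and {name_b} overlap")
    return problems


DEFAULTS = {
    "PrivateSubnet1CIDR": "10.99.128.0/20",
    "PrivateSubnet2CIDR": "10.99.144.0/20",
    "PublicSubnet1CIDR": "10.99.0.0/20",
    "PublicSubnet2CIDR": "10.99.16.0/20",
}
print(check_network("10.99.0.0/16", DEFAULTS))  # [] for the template defaults
```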

S3 Configuration

  • S3BucketName : S3 bucket name for Hive data storage. By default ‘dep-{AccountId}-{Region}’.

MySQL Configuration

  • DBAllocatedStorage: Determines the initial storage capacity allocated for your database.
  • DBAutoMinorVersionUpgrade: Allows you to enable automatic updates to newer minor versions for your MySQL database.
  • DBBackupRetentionPeriod: Sets how many days your automated database backups are retained.
  • DBInstanceClass: Selects the class of your database instance, balancing compute and memory capacity according to your workload.
  • DBIops: Specifies the provisioned IOPS for the database (used only when DBStorageType is io1), enhancing performance for I/O-intensive applications.
  • DBMasterUsername & DBMasterUserPassword: These parameters are for setting the administrative credentials for your MySQL database.
  • DBMultiAZ: Enables deployment across multiple availability zones for high availability.
  • DBStorageType: Lets you choose the type of storage used by the database instance.
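
The MySQL parameters also interact: the template's own ConstraintDescription and Description text say DBIops applies only to io1 storage and that provisioned IOPS requires at least 100 GiB of storage. A hypothetical pre-flight check that encodes just those stated rules (not the full set of RDS limits) might look like this:

```python
def check_db_params(allocated_storage_gib, storage_type, iops=None):
    """Check MySQL settings against the rules stated in dep.template's parameter text."""
    problems = []
    if not 5 <= allocated_storage_gib <= 4096:
        problems.append("DBAllocatedStorage must be between 5 and 4096 GiB")
    if storage_type not in ("standard", "gp2", "io1"):
        problems.append(f"DBStorageType {storage_type!r} is not an allowed value")
    if storage_type == "io1":  # DBIops is only used with io1 storage
        if iops not in range(1000, 10001, 1000):
            problems.append("DBIops must be a multiple of 1000 between 1000 and 10000")
        if allocated_storage_gib < 100:
            problems.append("provisioned IOPS requires DBAllocatedStorage of at least 100 GiB")
    return problems


print(check_db_params(50, "standard"))  # [] with the template defaults
print(check_db_params(50, "io1", 1000))
```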

Ldap Configuration

  • LdapDomainName: Sets the domain name for the LDAP service.
  • LdapAdmin & LdapAdminPassword: These are the credentials for the LDAP administrator, crucial for setting up and managing the LDAP service.
  • LdapUser & LdapUserPassword: These are the credentials for the default LDAP user with admin access.

Redis Configuration

  • RedisPassword : Root password for Redis service.

Ranger Configuration

  • RangerPassword : Admin password for Ranger admin service.

5.3 Create VPC

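This nested stack automates the creation of a network architecture that follows AWS best practices for high availability: a VPC with a public and a private subnet in each of two Availability Zones, an internet gateway for the public subnets, and one NAT gateway per Availability Zone for outbound traffic from the private subnets.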

## dep-public/series-1/dep.template

Resources:
  DepVpcStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-vpc.template'
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        PrivateSubnet1CIDR: !Ref PrivateSubnet1CIDR
        PrivateSubnet2CIDR: !Ref PrivateSubnet2CIDR
        PublicSubnet1CIDR: !Ref PublicSubnet1CIDR
        PublicSubnet2CIDR: !Ref PublicSubnet2CIDR
        VpcCIDR: !Ref VpcCIDR

## dep-public/series-1/dep-vpc.template

AWSTemplateFormatVersion: 2010-09-09
Description: This template deploys a VPC, with a pair of public and private subnets spread
  across two Availability Zones. It deploys an internet gateway, with a default
  route on the public subnets. It deploys a pair of NAT gateways (one in each AZ),
  and default routes for them in the private subnets.

Parameters:
  EnvironmentName:
    Description: An environment name that is prefixed to resource names
    Type: String

  VpcCIDR:
    Description: Please enter the IP range (CIDR notation) for this VPC
    Type: String
    Default: 10.192.0.0/16

  PublicSubnet1CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the first Availability Zone
    Type: String
    Default: 10.192.10.0/24

  PublicSubnet2CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the second Availability Zone
    Type: String
    Default: 10.192.11.0/24

  PrivateSubnet1CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the first Availability Zone
    Type: String
    Default: 10.192.20.0/24

  PrivateSubnet2CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the second Availability Zone
    Type: String
    Default: 10.192.21.0/24

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCIDR
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - vpc

  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - igw

  InternetGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet1CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - public-subnet-1
        - Key: kubernetes.io/role/elb
          Value: 1

  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet2CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - public-subnet-2
        - Key: kubernetes.io/role/elb
          Value: 1

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet1CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - private-subnet-1
        - Key: kubernetes.io/role/internal-elb
          Value: 1

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet2CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - private-subnet-2
        - Key: kubernetes.io/role/internal-elb
          Value: 1

  NatGateway1EIP:
    Type: AWS::EC2::EIP
    DependsOn: InternetGatewayAttachment
    Properties:
      Domain: vpc

  NatGateway2EIP:
    Type: AWS::EC2::EIP
    DependsOn: InternetGatewayAttachment
    Properties:
      Domain: vpc

  NatGateway1:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGateway1EIP.AllocationId
      SubnetId: !Ref PublicSubnet1

  NatGateway2:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGateway2EIP.AllocationId
      SubnetId: !Ref PublicSubnet2

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - public-route-table

  DefaultPublicRoute:
    Type: AWS::EC2::Route
    DependsOn: InternetGatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PublicSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet1

  PublicSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet2

  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - private-route-table-1

  DefaultPrivateRoute1:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway1

  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      SubnetId: !Ref PrivateSubnet1

  PrivateRouteTable2:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - private-route-table-2

  DefaultPrivateRoute2:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway2

  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      SubnetId: !Ref PrivateSubnet2

Outputs:
  VPC:
    Description: A reference to the created VPC.
    Value: !Ref VPC

  PublicSubnets:
    Description: A list of the public subnets.
    Value: !Join [ ",", [ !Ref PublicSubnet1, !Ref PublicSubnet2 ]]

  PrivateSubnets:
    Description: A list of the private subnets.
    Value: !Join [ ",", [ !Ref PrivateSubnet1, !Ref PrivateSubnet2 ]]

  PublicSubnet1:
    Description: A reference to the public subnet in the 1st Availability Zone.
    Value: !Ref PublicSubnet1

  PublicSubnet2:
    Description: A reference to the public subnet in the 2nd Availability Zone.
    Value: !Ref PublicSubnet2

  PrivateSubnet1:
    Description: A reference to the private subnet in the 1st Availability Zone.
    Value: !Ref PrivateSubnet1

  PrivateSubnet2:
    Description: A reference to the private subnet in the 2nd Availability Zone.
    Value: !Ref PrivateSubnet2

  CidrBlock:
    Description: The primary IPv4 CIDR block for the VPC. For example, 10.0.0.0/16.
    Value: !GetAtt VPC.CidrBlock
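
The kubernetes.io/role/elb and kubernetes.io/role/internal-elb tags on the subnets are what load balancer integrations on EKS, such as the AWS Load Balancer Controller, use to discover where to place internet-facing and internal load balancers. A simplified sketch of that discovery follows; the subnet IDs and AZ names are hypothetical, and the real controller applies additional selection rules that are omitted here.

```python
# Subnet tags as set in dep-vpc.template; IDs and AZ names are hypothetical.
SUBNETS = [
    {"id": "subnet-public-1", "az": "ap-northeast-1a", "tags": {"kubernetes.io/role/elb": "1"}},
    {"id": "subnet-public-2", "az": "ap-northeast-1c", "tags": {"kubernetes.io/role/elb": "1"}},
    {"id": "subnet-private-1", "az": "ap-northeast-1a", "tags": {"kubernetes.io/role/internal-elb": "1"}},
    {"id": "subnet-private-2", "az": "ap-northeast-1c", "tags": {"kubernetes.io/role/internal-elb": "1"}},
]


def subnets_for_load_balancer(subnets, internal):
    """Pick one tagged subnet per Availability Zone for an internet-facing or internal LB."""
    role_tag = "kubernetes.io/role/internal-elb" if internal else "kubernetes.io/role/elb"
    per_az = {}
    for subnet in sorted(subnets, key=lambda s: s["id"]):
        if subnet["tags"].get(role_tag) in ("1", ""):
            per_az.setdefault(subnet["az"], subnet["id"])  # keep the first match per AZ
    return sorted(per_az.values())


print(subnets_for_load_balancer(SUBNETS, internal=False))  # the two public subnets
print(subnets_for_load_balancer(SUBNETS, internal=True))   # the two private subnets
```

This is why the public subnets carry the elb tag and the private ones the internal-elb tag: an internet-facing Service lands in the public subnets, while an internal one stays in the private subnets.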

5.4 Create security groups

This template sets up the necessary network security parameters for a MySQL database and an EKS cluster in AWS, ensuring that the appropriate ports are open and traffic is allowed from and to the specified IP ranges. It’s a crucial step in setting up a secure and functional environment for applications that require database and Kubernetes cluster services.

## dep-public/series-1/dep.template

Resources:
  DepSecurityGroupStack:
    Type: AWS::CloudFormation::Stack
    DependsOn:
      - DepVpcStack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-securitygroup.template'
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        AccessCIDR: !GetAtt DepVpcStack.Outputs.CidrBlock
        VPCID: !GetAtt DepVpcStack.Outputs.VPC
        VPCCIDR: !GetAtt DepVpcStack.Outputs.CidrBlock

## dep-public/series-1/dep-securitygroup.template

AWSTemplateFormatVersion: 2010-09-09
Description: Dep Security Groups template
Parameters:
  EnvironmentName:
    Description: An environment name that is prefixed to resource names
    Type: String
  AccessCIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Description: 'The CIDR IP range that is permitted to access the EKS security group. Note: a value of
      0.0.0.0/0 will allow access from ANY ip address'
    Type: String
  VPCID:
    Description: VPC ID of your existing Virtual Private Cloud (VPC) where you want
      to deploy RDS.
    Type: AWS::EC2::VPC::Id
  VPCCIDR:
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    ConstraintDescription: Must be a valid IP range in x.x.x.x/x notation
    Description: The CIDR block for VPC
    Type: String

Resources:
  MySQLRDSSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub ${EnvironmentName}-mysql-sg
      GroupDescription: Allow access to MySQL Port
      VpcId: !Ref VPCID
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 3306
          ToPort: 3306
          CidrIp: !Ref VPCCIDR
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0

  EKSSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub ${EnvironmentName}-eks-sg
      GroupDescription: EKS security group
      VpcId: !Ref VPCID
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 0
          ToPort: 65535
          CidrIp: !Ref AccessCIDR
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 0
          ToPort: 65535
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}-eks-cluster
          Value: shared

Outputs:
  MySQLRDSSecurityGroup:
    Description: MySQL Security Group
    Value: !Ref MySQLRDSSecurityGroup
  EKSSecurityGroup:
    Description: EKS Security Group
    Value: !Ref EKSSecurityGroup
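
Security groups are stateful and contain only allow rules, so an inbound connection is accepted if any ingress rule matches its protocol, port, and source. The sketch below transcribes the two ingress rules from this template and checks example connections against them. It assumes the default VPC CIDR 10.99.0.0/16, since dep.template passes the VPC CIDR as AccessCIDR for the EKS group; the function name and IP addresses are illustrative.

```python
import ipaddress

VPC_CIDR = "10.99.0.0/16"  # dep.template default; also passed as AccessCIDR

# Ingress rules transcribed from dep-securitygroup.template.
INGRESS_RULES = {
    "mysql-sg": [{"protocol": "tcp", "from_port": 3306, "to_port": 3306, "cidr": VPC_CIDR}],
    "eks-sg": [{"protocol": "tcp", "from_port": 0, "to_port": 65535, "cidr": VPC_CIDR}],
}


def ingress_allowed(group, source_ip, port, protocol="tcp"):
    """True if any ingress rule of the group matches protocol, port range, and source CIDR."""
    source = ipaddress.ip_address(source_ip)
    return any(
        rule["protocol"] == protocol
        and rule["from_port"] <= port <= rule["to_port"]
        and source in ipaddress.ip_network(rule["cidr"])
        for rule in INGRESS_RULES[group]
    )


print(ingress_allowed("mysql-sg", "10.99.130.25", 3306))  # True: a pod in a private subnet
print(ingress_allowed("mysql-sg", "203.0.113.7", 3306))   # False: outside the VPC
```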

5.5 Create S3 Bucket

This template automates the setup of an S3 bucket for DEP (Data on EKS Platform), with lifecycle rules that help control storage costs: objects move to S3 Intelligent-Tiering after 7 days, and noncurrent object versions expire after 1 day.

*Notes:

1. Before deleting the CloudFormation stack, manually empty the S3 bucket it created. CloudFormation cannot delete a non-empty bucket, so the stack deletion will otherwise fail.
2. Use this template for demo purposes only.

## dep-public/series-1/dep.template

Conditions:
  # True when S3BucketName is left empty
  CustomS3BucketName: !Equals ['', !Ref S3BucketName]

Resources:
  DepS3Stack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-s3.template'
      Parameters:
        # Empty S3BucketName: fall back to '{EnvironmentName}-{AccountId}-{Region}'; otherwise use the given name
        BucketName: !If [CustomS3BucketName, !Join ['-', [!Ref EnvironmentName, !Ref AWS::AccountId, !Ref AWS::Region]], !Ref S3BucketName]

## dep-public/series-1/dep-s3.template

AWSTemplateFormatVersion: 2010-09-09
Description: 'Create Dep S3 Bucket'
Parameters:
  # S3 Configuration
  BucketName:
    Type: String
Resources:
  DepS3:
    Type: 'AWS::S3::Bucket'
    # DeletionPolicy: Retain
    Properties:
      BucketName: !Ref BucketName
      LifecycleConfiguration:
        Rules:
          - Id: IntelligentTieringTransition
            Status: Enabled
            Transitions:
              - TransitionInDays: 7
                StorageClass: INTELLIGENT_TIERING
          - Id: NoncurrentVersionExpirationInDays
            Status: Enabled
            NoncurrentVersionExpirationInDays: 1
Outputs:
  BucketName:
    Description: S3 Bucket Name
    Value: !Ref DepS3
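
The S3BucketName parameter is meant to fall back to a default name, ‘{EnvironmentName}-{AccountId}-{Region}’, when left empty. A small sketch of that intended logic is useful because EnvironmentName ends up in the bucket name: an uppercase EnvironmentName, for instance, would produce an invalid bucket name. The function name is hypothetical, and only a subset of the S3 naming rules is checked.

```python
import re

# Partial S3 naming check: 3-63 chars of lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit. Other rules (e.g. no IP-address form) are not checked.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def resolve_bucket_name(s3_bucket_name, environment_name, account_id, region):
    """Empty S3BucketName -> '{EnvironmentName}-{AccountId}-{Region}', otherwise keep it."""
    if s3_bucket_name == "":
        name = "-".join([environment_name, account_id, region])
    else:
        name = s3_bucket_name
    if not BUCKET_NAME.match(name):
        raise ValueError(f"{name!r} is not a valid S3 bucket name")
    return name


print(resolve_bucket_name("", "dep", "123456789012", "ap-northeast-1"))  # dep-123456789012-ap-northeast-1
```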

5.6 Create IAM roles and assign policies

This template creates the IAM roles required to operate a DEP environment on AWS, so that each service and EKS node group has the permissions it needs to run and to access the resources it depends on.

  • The DepIamRole is intended for administrative tasks that need broad permissions. It can be assumed by EKS, CloudFormation, Lambda, API Gateway, and EventBridge, and grants access to EKS cluster management, Lambda functions, Secrets Manager, CloudWatch Logs, Auto Scaling, and the DEP S3 bucket.
  • The DepEksWorkerIamRole is tailored for EKS worker nodes, granting the permissions needed for EKS operations, read-only access to Amazon ECR, and access to the DEP S3 bucket.
  • The DepLambdaIamRole is the Lambda execution role; it is passed to the AWSQS::EKS::Cluster resource as its Lambda role.

## dep-public/series-1/dep.template

Resources:
  DepIamStack:
    Type: AWS::CloudFormation::Stack
    DependsOn:
      - DepS3Stack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-iam.template'
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        S3BucketName: !GetAtt DepS3Stack.Outputs.BucketName

## dep-public/series-1/dep-iam.template

AWSTemplateFormatVersion: 2010-09-09
Description: Dep IAM Role
Parameters:
  EnvironmentName:
    Description: An environment name that is prefixed to resource names
    Type: String
  S3BucketName:
    Description: S3 bucket name to grant access to.
    Type: String

Resources:
  DepIamRole:
    Type: AWS::IAM::Role
    Properties:
      Description: "IAM role created for managing dep resources"
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
                - resources.cloudformation.amazonaws.com
                - cloudformation.amazonaws.com
                - lambda.amazonaws.com
                - apigateway.amazonaws.com
                - events.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: "/"
      RoleName: !Join
        - "-"
        - - !Ref EnvironmentName
          - "iam-role"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
        - arn:aws:iam::aws:policy/SecretsManagerReadWrite
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
      Policies:
        - PolicyName: DEPS3AccessPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - s3:*
                Effect: Allow
                Resource:
                  - !Join
                    - ""
                    - - "arn:aws:s3:::"
                      - !Ref S3BucketName
                  - !Join
                    - ""
                    - - "arn:aws:s3:::"
                      - !Ref S3BucketName
                      - "/*"
        - PolicyName: ResourceTypePolicy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - "sts:GetCallerIdentity"
                  - "eks:CreateCluster"
                  - "eks:DeleteCluster"
                  - "eks:DescribeCluster"
                  - "eks:ListTagsForResource"
                  - "eks:UpdateClusterVersion"
                  - "eks:UpdateClusterConfig"
                  - "eks:TagResource"
                  - "eks:UntagResource"
                  - "iam:PassRole"
                  - "sts:AssumeRole"
                  - "lambda:UpdateFunctionConfiguration"
                  - "lambda:DeleteFunction"
                  - "lambda:GetFunction"
                  - "lambda:InvokeFunction"
                  - "lambda:CreateFunction"
                  - "lambda:UpdateFunctionCode"
                  - "ec2:DescribeVpcs"
                  - "ec2:DescribeSubnets"
                  - "ec2:DescribeSecurityGroups"
                  - "kms:CreateGrant"
                  - "kms:DescribeKey"
                  - "logs:CreateLogGroup"
                  - "logs:CreateLogStream"
                  - "logs:DescribeLogGroups"
                  - "logs:DescribeLogStreams"
                  - "logs:PutLogEvents"
                  - "cloudwatch:ListMetrics"
                  - "cloudwatch:PutMetricData"
                Resource: "*"
        - PolicyName: EbsCsi
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - "kms:Decrypt"
                  - "kms:GenerateDataKeyWithoutPlaintext"
                  - "kms:CreateGrant"
                Effect: Allow
                Resource: "*"
        - PolicyName: EksAutoScaling
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - "autoscaling:DescribeAutoScalingGroups"
                  - "autoscaling:DescribeAutoScalingInstances"
                  - "autoscaling:DescribeLaunchConfigurations"
                  - "autoscaling:DescribeTags"
                  - "autoscaling:SetDesiredCapacity"
                  - "autoscaling:TerminateInstanceInAutoScalingGroup"
                  - "ec2:DescribeLaunchTemplateVersions"
                Effect: Allow
                Resource: "*"

  DepEksWorkerIamRole:
    Type: AWS::IAM::Role
    Properties:
      Description: "IAM role created for managing eks node groups"
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
                - resources.cloudformation.amazonaws.com
                - cloudformation.amazonaws.com
                - lambda.amazonaws.com
                - apigateway.amazonaws.com
                - events.amazonaws.com
                - ec2.amazonaws.com
            Action:
              - "sts:AssumeRole"
      RoleName: !Join
        - "-"
        - - !Ref EnvironmentName
          - "worker-role"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
        - arn:aws:iam::aws:policy/SecretsManagerReadWrite
      Policies:
        - PolicyName: DEPS3AccessPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - s3:*
                Effect: Allow
                Resource:
                  - !Join
                    - ""
                    - - "arn:aws:s3:::"
                      - !Ref S3BucketName
                  - !Join
                    - ""
                    - - "arn:aws:s3:::"
                      - !Ref S3BucketName
                      - "/*"
        - PolicyName: ResourceTypePolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - "eks:CreateCluster"
                  - "eks:DeleteCluster"
                  - "iam:PassRole"
                  - "sts:AssumeRole"
                  - "lambda:InvokeFunction"
                  - "lambda:CreateFunction"
                Effect: Allow
                Resource: !Sub arn:aws:eks:${AWS::Region}:${AWS::AccountId}:cluster/${EnvironmentName}-eks-cluster/*
        - PolicyName: EbsCsi
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - "kms:Decrypt"
                  - "kms:GenerateDataKeyWithoutPlaintext"
                  - "kms:CreateGrant"
                Effect: Allow
                Resource: "*"
        - PolicyName: AutoScaler
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - "autoscaling:DescribeAutoScalingGroups"
                  - "autoscaling:DescribeAutoScalingInstances"
                  - "autoscaling:DescribeLaunchConfigurations"
                  - "autoscaling:DescribeTags"
                  - "autoscaling:SetDesiredCapacity"
                  - "autoscaling:TerminateInstanceInAutoScalingGroup"
                  - "eks:DescribeNodegroup"
                Effect: Allow
                Resource: "*"

  DepLambdaIamRole:
    Type: AWS::IAM::Role
    Properties:
      Description: "IAM role created for lambda execution"
      RoleName: !Join
        - "-"
        - - !Ref EnvironmentName
          - "lambda-role"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
                - lambda.amazonaws.com
            Action: sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEKSClusterPolicy'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEKSServicePolicy'
        - arn:aws:iam::aws:policy/service-role/AWSLambdaENIManagementAccess
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Outputs:
  DepIamRole:
    Description: IAM role created for managing dep resources
    Value: !GetAtt DepIamRole.Arn
  DepEksWorkerIamRole:
    Description: IAM role created for managing eks node groups
    Value: !GetAtt DepEksWorkerIamRole.Arn
  DepLambdaIamRole:
    Description: IAM role created for lambda execution
    Value: !GetAtt DepLambdaIamRole.Arn
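
DEPS3AccessPolicy lists two resources because S3 authorizes bucket-level actions (such as s3:ListBucket) against the bucket ARN and object-level actions (such as s3:GetObject) against object ARNs. The sketch below rebuilds the same policy document in Python and uses a deliberately simplified wildcard matcher to show which ARNs each entry covers. The helper names are hypothetical, and `fnmatch` only approximates IAM's `*` wildcard; the sketch ignores actions, conditions and Deny statements, so it is not a substitute for IAM policy evaluation.

```python
import fnmatch
import json


def dep_s3_access_policy(bucket_name: str) -> dict:
    """Rebuild the DEPS3AccessPolicy document from dep-iam.template."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["s3:*"],
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in it
                ],
            }
        ],
    }


def matches_resource(policy: dict, resource_arn: str) -> bool:
    """Simplified check: does any Allow statement list a matching Resource?

    fnmatch's '*' matches any run of characters (including '/'), like the
    IAM '*' wildcard; actions, conditions and Deny statements are ignored.
    """
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if any(fnmatch.fnmatchcase(resource_arn, pattern)
               for pattern in statement["Resource"]):
            return True
    return False


if __name__ == "__main__":
    policy = dep_s3_access_policy("dep-123456789012-ap-northeast-1")
    print(json.dumps(policy, indent=2))
```

Dropping the `/*` entry would still allow listing the bucket but would block reads and writes of the Hive warehouse objects stored under it.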

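5.7 Create EC2 SSH key pair

This snippet creates the EC2 key pair that the EKS managed node groups in chapter 5.10 use for SSH access (RemoteAccess.Ec2SshKey). When a new key pair is created through CloudFormation, its private key is stored in AWS Systems Manager Parameter Store under /ec2/keypair/<key_pair_id>.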

## dep-public/series-1/dep.template

Resources:
  DepEc2SshKeyPair:
    Type: AWS::EC2::KeyPair
    Properties:
      KeyName: !Sub ${EnvironmentName}-eks-worker-key-pair

5.8 Enable third-party extension for EKS to manage Kubernetes resources

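This template registers the AWSQS third-party resource types (AWSQS::EKS::Cluster, AWSQS::Kubernetes::Helm, AWSQS::Kubernetes::Resource, and AWSQS::Kubernetes::Get) in the CloudFormation registry and sets them as the default versions, together with an execution role and a log delivery role for them. This lets the later templates create the EKS cluster and deploy Kubernetes manifests and Helm charts directly from CloudFormation.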

Reference: AWS CloudFormation Resource Types for Kubernetes (amazon.com)

## dep-public/series-1/dep.template

Resources:
  # Enable third-party cloudformation extension to manage eks cluster
  DepCloudformationExtensionsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-extensions.template'

## dep-public/series-1/dep-extensions.template

AWSTemplateFormatVersion: '2010-09-09'
Description: Extensions
Resources:
  # AWSQS::EKS::Cluster
  AWSQSEKSClusterExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join
        - '-'
        - - 'AWSQSExecutionIAMRole'
          - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
      MaxSessionDuration: 8400
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - resources.cloudformation.amazonaws.com
                - cloudformation.amazonaws.com
                - lambda.amazonaws.com
            Action: sts:AssumeRole
      Path: '/'
      Policies:
        - PolicyName: !Join
            - '-'
            - - 'AWSQSExecutionIAMRole'
              - 'policy'
              - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - sts:GetCallerIdentity
                  - eks:CreateCluster
                  - eks:DeleteCluster
                  - eks:DescribeCluster
                  - eks:ListTagsForResource
                  - eks:UpdateClusterVersion
                  - eks:UpdateClusterConfig
                  - eks:TagResource
                  - eks:UntagResource
                  - lambda:UpdateFunctionConfiguration
                  - lambda:DeleteFunction
                  - lambda:GetFunction
                  - lambda:InvokeFunction
                  - lambda:CreateFunction
                  - lambda:UpdateFunctionCode
                  - ec2:DescribeVpcs
                  - ec2:DescribeSubnets
                  - ec2:DescribeSecurityGroups
                  - kms:CreateGrant
                  - kms:DescribeKey
                Resource:
                  - '*'
              - Effect: Allow
                Action:
                  - iam:PassRole
                Resource:
                  - !Sub arn:aws:iam::${AWS::AccountId}:role/*
                Condition:
                  StringEquals:
                    iam:PassedToService:
                      - eks.amazonaws.com
                      - lambda.amazonaws.com
                  StringLike:
                    iam:AssociatedResourceArn:
                      - !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:*
                      - !Sub arn:aws:eks:${AWS::Region}:${AWS::AccountId}:cluster/*
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - 'AWSQSExecutionIAMRole'
              - !Select [2, !Split [ '/', !Ref AWS::StackId ]]

  AWSQSEKSClusterLogDeliveryRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join
        - '-'
        - - 'AWSQSLogDeliveryIAMRole'
          - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
      MaxSessionDuration: 8400
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - resources.cloudformation.amazonaws.com
                - cloudformation.amazonaws.com
            Action: sts:AssumeRole
      Path: '/'
      Policies:
        - PolicyName: !Join
            - '-'
            - - 'AWSQSLogDeliveryIAMRole'
              - 'policy'
              - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:DescribeLogGroups
                  - logs:DescribeLogStreams
                  - logs:PutLogEvents
                  - cloudwatch:ListMetrics
                  - cloudwatch:PutMetricData
                Resource:
                  - '*'
      Tags:
        - Key: Name
          Value: !Join
            - '-'
            - - 'AWSQSLogDeliveryIAMRole'
              - !Select [2, !Split [ '/', !Ref AWS::StackId ]]

  AWSQSEKSClusterResourceVersion:
    Type: AWS::CloudFormation::ResourceVersion
    Properties:
      TypeName: AWSQS::EKS::Cluster
      ExecutionRoleArn: !GetAtt AWSQSEKSClusterExecutionRole.Arn
      LoggingConfig:
        LogGroupName: !Join
          - '-'
          - - '/aws/cloudformation/registry/AWSQSEKSClusterLogGroup'
            - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
        LogRoleArn: !GetAtt AWSQSEKSClusterLogDeliveryRole.Arn
      SchemaHandlerPackage: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/extensions/awsqs-eks-cluster.zip'

  AWSQSEKubernetesHelmVersion:
    Type: AWS::CloudFormation::ResourceVersion
    Properties:
      TypeName: AWSQS::Kubernetes::Helm
      ExecutionRoleArn: !GetAtt AWSQSEKSClusterExecutionRole.Arn
      LoggingConfig:
        LogGroupName: !Join
          - '-'
          - - '/aws/cloudformation/registry/AWSQSEKSClusterLogGroup'
            - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
        LogRoleArn: !GetAtt AWSQSEKSClusterLogDeliveryRole.Arn
      SchemaHandlerPackage: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/extensions/awsqs-kubernetes-helm.zip'

  AWSQSEKubernetesResourceVersion:
    Type: AWS::CloudFormation::ResourceVersion
    Properties:
      TypeName: AWSQS::Kubernetes::Resource
      ExecutionRoleArn: !GetAtt AWSQSEKSClusterExecutionRole.Arn
      LoggingConfig:
        LogGroupName: !Join
          - '-'
          - - '/aws/cloudformation/registry/AWSQSEKSClusterLogGroup'
            - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
        LogRoleArn: !GetAtt AWSQSEKSClusterLogDeliveryRole.Arn
      SchemaHandlerPackage: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/extensions/awsqs-kubernetes-apply.zip'

  AWSQSEKubernetesResourceGetVersion:
    Type: AWS::CloudFormation::ResourceVersion
    Properties:
      TypeName: AWSQS::Kubernetes::Get
      ExecutionRoleArn: !GetAtt AWSQSEKSClusterExecutionRole.Arn
      LoggingConfig:
        LogGroupName: !Join
          - '-'
          - - '/aws/cloudformation/registry/AWSQSEKSClusterLogGroup'
            - !Select [2, !Split [ '/', !Ref AWS::StackId ]]
        LogRoleArn: !GetAtt AWSQSEKSClusterLogDeliveryRole.Arn
      SchemaHandlerPackage: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/extensions/awsqs-kubernetes-get.zip'

  AWSQSEKSClusterResourceDefaultVersion:
    Type: AWS::CloudFormation::ResourceDefaultVersion
    Properties:
      TypeVersionArn: !Ref AWSQSEKSClusterResourceVersion

  AWSQSEKubernetesHelmDefaultVersion:
    Type: AWS::CloudFormation::ResourceDefaultVersion
    Properties:
      TypeVersionArn: !Ref AWSQSEKubernetesHelmVersion

  AWSQSEKubernetesResourceDefaultVersion:
    Type: AWS::CloudFormation::ResourceDefaultVersion
    Properties:
      TypeVersionArn: !Ref AWSQSEKubernetesResourceVersion

  AWSQSEKubernetesResourceGetDefaultVersion:
    Type: AWS::CloudFormation::ResourceDefaultVersion
    Properties:
      TypeVersionArn: !Ref AWSQSEKubernetesResourceGetVersion
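
The role names, policy names and log group names above all end with `!Select [2, !Split ['/', !Ref AWS::StackId]]`. A stack ID has the form `arn:aws:cloudformation:<region>:<account>:stack/<stack-name>/<uuid>`, so splitting on `/` and taking index 2 yields the stack's UUID. Because IAM role names must be unique within an account, this suffix keeps names from colliding if the extensions stack is deployed more than once. The Python sketch below mirrors that expression and checks the 64-character limit on IAM role names; the function names and the example stack ID are made up for illustration.

```python
IAM_ROLE_NAME_MAX = 64  # IAM limit on role name length


def stack_uuid(stack_id: str) -> str:
    """Python equivalent of !Select [2, !Split ['/', !Ref AWS::StackId]]."""
    return stack_id.split("/")[2]


def unique_role_name(prefix: str, stack_id: str) -> str:
    """Join a prefix and the stack UUID with '-', as the !Join above does."""
    name = "-".join([prefix, stack_uuid(stack_id)])
    if len(name) > IAM_ROLE_NAME_MAX:
        raise ValueError(f"role name has {len(name)} characters: {name}")
    return name


if __name__ == "__main__":
    # A made-up stack ID in the standard CloudFormation format.
    sid = ("arn:aws:cloudformation:ap-northeast-1:123456789012:"
           "stack/dep-extensions/1c2fa620-982a-11e3-aff7-50e2416294e0")
    print(unique_role_name("AWSQSExecutionIAMRole", sid))
    print(unique_role_name("AWSQSLogDeliveryIAMRole", sid))
```

A UUID is 36 characters long, so both prefixes used here stay within the limit (58 and 60 characters); a prefix longer than 27 characters would not.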

5.9 Create RDS MySQL instance

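This template creates an Amazon RDS for MySQL 8.0 instance, together with its DB subnet group, a DB parameter group, and a Secrets Manager secret that stores the master credentials. The instance uses the MySQL security group created earlier and serves as the metadata database for Ranger and Hive.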

## dep-public/series-1/dep.template

Resources:
  DepDBInstanceStack:
    Type: AWS::CloudFormation::Stack
    DependsOn:
      - DepVpcStack
      - DepSecurityGroupStack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-mysql.template'
      Parameters:
        DBInstanceIdentifier: !Join
          - '-'
          - - !Ref EnvironmentName
            - mysql
        DBAllocatedStorage: !Ref DBAllocatedStorage
        DBAutoMinorVersionUpgrade: !Ref DBAutoMinorVersionUpgrade
        DBBackupRetentionPeriod: !Ref DBBackupRetentionPeriod
        DBInstanceClass: !Ref DBInstanceClass
        DBIops: !Ref DBIops
        DBMasterUsername: !Ref DBMasterUsername
        DBMasterUserPassword: !Ref DBMasterUserPassword
        DBMultiAZ: !Ref DBMultiAZ
        DBStorageType: !Ref DBStorageType
        CustomDBSecurityGroup: !GetAtt DepSecurityGroupStack.Outputs.MySQLRDSSecurityGroup
        Subnet1ID: !GetAtt DepVpcStack.Outputs.PublicSubnet1
        Subnet2ID: !GetAtt DepVpcStack.Outputs.PublicSubnet2
        VPCID: !GetAtt DepVpcStack.Outputs.VPC

## dep-public/series-1/dep-mysql.template

---
AWSTemplateFormatVersion: '2010-09-09'
Description: RDS MySQL instance template
Parameters:
  DBInstanceIdentifier:
    Description: A name for the DB instance. If you specify a name, AWS CloudFormation converts it to lowercase.
      If you don't specify a name, AWS CloudFormation generates a unique physical ID and uses that ID for the DB instance.
    Type: String
    MinLength: '1'
    MaxLength: '63'
  DBAllocatedStorage:
    ConstraintDescription: must be between 5 and 4096 GiB. If Iops specified, AllocatedStorage
      must be at least 100 GiB and with minimum Iops value of 1000.
    Default: '50'
    Description: The size of the database in gigabytes (GiB).
    MaxValue: '4096'
    MinValue: '5'
    Type: Number
  DBAutoMinorVersionUpgrade:
    AllowedValues:
      - 'true'
      - 'false'
    Default: 'false'
    Description: Select true/false to setup Auto Minor Version upgrade.
    Type: String
  DBBackupRetentionPeriod:
    Default: '7'
    Description: The number of days for which automatic DB snapshots are retained.
    Type: String
  DBInstanceClass:
    AllowedValues:
      - db.m6g.large
      - db.m6g.xlarge
      - db.m6g.2xlarge
      - db.m6g.4xlarge
      - db.m6g.8xlarge
      - db.m6g.12xlarge
      - db.m6g.16xlarge
      - db.m5.large
      - db.m5.xlarge
      - db.m5.2xlarge
      - db.m5.8xlarge
    ConstraintDescription: Must select a valid database instance type.
    Default: db.m5.large
    Description: The name of the compute and memory capacity class of the Amazon mysql DB instance.
    Type: String
  DBIops:
    AllowedValues:
      - '1000'
      - '2000'
      - '3000'
      - '4000'
      - '5000'
      - '6000'
      - '7000'
      - '8000'
      - '9000'
      - '10000'
    ConstraintDescription: '1000 Iops min and increased in 1K increments. '
    Default: '1000'
    Description: DB Iops. Used only when io1 specified for the StorageType property.
    Type: Number
  DBMasterUsername:
    Type: String
  DBMasterUserPassword:
    Type: String
  DBMultiAZ:
    AllowedValues:
      - 'true'
      - 'false'
    Default: 'false'
    Description: Specifies if the database instance is a multiple Availability Zone deployment.
    Type: String
  DBStorageType:
    AllowedValues:
      - standard
      - gp2
      - io1
    Default: standard
    Description: The storage type associated with this database instance.
    Type: String
  CustomDBSecurityGroup:
    Description: MySQL Security Group.
    Type: AWS::EC2::SecurityGroup::Id
  Subnet1ID:
    Description: The ID of the subnet in Availability Zone 1.
    Type: 'AWS::EC2::Subnet::Id'
  Subnet2ID:
    Description: The ID of the subnet in Availability Zone 2.
    Type: 'AWS::EC2::Subnet::Id'
  VPCID:
    Description: ID of the VPC you are deploying into (e.g., vpc-0343606e).
    Type: 'AWS::EC2::VPC::Id'
    Default: ''
Conditions:
  IOPSStorageType: !Equals
    - !Ref DBStorageType
    - io1
Resources:
  DBParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: Parameter group of mysql instance
      Family: mysql8.0
      Parameters:
        log_bin_trust_function_creators: 1

  DBSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnets available for the RDS mysql DB Instance
      SubnetIds:
        - !Ref Subnet1ID
        - !Ref Subnet2ID

  DBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      AllocatedStorage: !Ref DBAllocatedStorage
      DBInstanceClass: !Ref DBInstanceClass
      DBInstanceIdentifier: !Ref DBInstanceIdentifier
      Engine: mysql
      EngineVersion: '8.0.33'
      MasterUsername: !Ref DBMasterUsername
      MasterUserPassword: !Ref DBMasterUserPassword
      DBParameterGroupName: !GetAtt DBParameterGroup.DBParameterGroupName
      DBSubnetGroupName: !Ref DBSubnetGroup
      VPCSecurityGroups:
        - !Ref CustomDBSecurityGroup
      MultiAZ: !Ref DBMultiAZ
      StorageType: !Ref DBStorageType
      AutoMinorVersionUpgrade: !Ref DBAutoMinorVersionUpgrade
      BackupRetentionPeriod: !Ref DBBackupRetentionPeriod
      Iops: !If
        - IOPSStorageType
        - !Ref DBIops
        - !Ref AWS::NoValue
      Tags:
        - Key: Name
          Value: !Sub MySQLDB-${AWS::StackName}

  DBSecret:
    Type: 'AWS::SecretsManager::Secret'
    Properties:
      Name: dep-db-instance-secret
      SecretString: !Sub '{"username":"${DBMasterUsername}","password":"${DBMasterUserPassword}"}'

Outputs:
  RDSEndPoints:
    Description: Amazon RDS Endpoint to connect
    Value: !Sub ${DBInstance.Endpoint.Address}:${DBInstance.Endpoint.Port}
  RDSEndPointAddress:
    Description: Amazon RDS Endpoint to connect
    Value: !Sub ${DBInstance.Endpoint.Address}
  RDSEndPointPort:
    Description: Amazon RDS Endpoint to connect
    Value: !Sub ${DBInstance.Endpoint.Port}
  DBInstanceArn:
    Description: The Amazon Resource Name (ARN) for the DB instance.
    Value: !GetAtt DBInstance.DBInstanceArn
  DBSecretArn:
    Description: The ARN of the secret.
    Value: !GetAtt DBSecret.Id
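
The storage parameters interact: the Iops property is only sent when DBStorageType is io1, because for any other storage type the IOPSStorageType condition substitutes AWS::NoValue, which removes the property entirely. The Python sketch below mirrors those rules and the constraints stated in the parameter definitions, which can help when choosing values before launching the stack. It is a hypothetical pre-flight check, not something the templates run, and it only encodes the limits written in this template.

```python
from typing import Optional

ALLOWED_STORAGE_TYPES = ("standard", "gp2", "io1")
ALLOWED_IOPS = tuple(range(1000, 10001, 1000))  # 1000, 2000, ..., 10000


def check_storage(allocated_gib: int, storage_type: str) -> None:
    """Mirror the DBAllocatedStorage / DBStorageType constraints."""
    if storage_type not in ALLOWED_STORAGE_TYPES:
        raise ValueError(f"unsupported storage type: {storage_type}")
    if not 5 <= allocated_gib <= 4096:
        raise ValueError("allocated storage must be between 5 and 4096 GiB")
    if storage_type == "io1" and allocated_gib < 100:
        raise ValueError("io1 (provisioned IOPS) needs at least 100 GiB")


def effective_iops(storage_type: str, iops: int) -> Optional[int]:
    """Mirror the IOPSStorageType condition on the Iops property.

    Returns the Iops value only for io1; for any other storage type the
    template substitutes AWS::NoValue, i.e. Iops is omitted (None here).
    """
    if iops not in ALLOWED_IOPS:
        raise ValueError(f"Iops must be one of {ALLOWED_IOPS}")
    return iops if storage_type == "io1" else None


if __name__ == "__main__":
    check_storage(50, "standard")  # the template defaults
    print(effective_iops("standard", 1000))
    print(effective_iops("io1", 3000))
```

Note that the default DBIops value of 1000 is simply ignored with the default standard storage type.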

5.10 Create EKS cluster and node groups

This template creates the Amazon EKS cluster (Kubernetes 1.28) with the EBS CSI driver add-on, three managed node groups (platform, hadoop, and trino) running Arm-based Graviton Spot instances, the matching Kubernetes namespaces, the ebs-sc storage class, and the Cluster Autoscaler.

## dep-public/series-1/dep.template

Resources:
  DepEksStack:
    Type: AWS::CloudFormation::Stack
    DependsOn:
      - DepVpcStack
      - DepSecurityGroupStack
      - DepIamStack
      - DepCloudformationExtensionsStack
      - DepDBInstanceStack
    Properties:
      TemplateURL: 'https://dep-public.s3.ap-northeast-1.amazonaws.com/series-1/dep-eks.template'
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        LambdaRoleArn: !GetAtt DepIamStack.Outputs.DepLambdaIamRole
        ClusterRoleArn: !GetAtt DepIamStack.Outputs.DepIamRole
        WorkerRoleArn: !GetAtt DepIamStack.Outputs.DepEksWorkerIamRole
        SecurityGroupIds: !GetAtt DepSecurityGroupStack.Outputs.EKSSecurityGroup
        PrivateSubnet1Id: !GetAtt DepVpcStack.Outputs.PrivateSubnet1
        PrivateSubnet2Id: !GetAtt DepVpcStack.Outputs.PrivateSubnet2
        PublicSubnet1Id: !GetAtt DepVpcStack.Outputs.PublicSubnet1
        PublicSubnet2Id: !GetAtt DepVpcStack.Outputs.PublicSubnet2
        VpcCIDR: !Ref VpcCIDR
        AmiType: 'AL2_ARM_64'
        CapacityType: 'SPOT'
        InstanceTypes: 'm6g.large'
        Ec2SshKey: !Ref DepEc2SshKeyPair
        DbEndpoint: !GetAtt DepDBInstanceStack.Outputs.RDSEndPointAddress
        DbPort: !GetAtt DepDBInstanceStack.Outputs.RDSEndPointPort
        DbUserName: !Ref DBMasterUsername
        DbUserPassword: !Ref DBMasterUserPassword
        S3BucketName: !GetAtt DepS3Stack.Outputs.BucketName
        LdapDomainName: !Ref LdapDomainName
        LdapAdmin: !Ref LdapAdmin
        LdapAdminPassword: !Ref LdapAdminPassword
        LdapUser: !Ref LdapUser
        LdapUserPassword: !Ref LdapUserPassword
        RedisPassword: !Ref RedisPassword
        CliUsername: !Ref CliUsername
        RangerPassword: !Ref RangerPassword

## dep-public/series-1/dep-eks.template

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Create Dep kubernetes resources.'
Parameters:
  # EKS Cluster Configuration
  EnvironmentName:
    Type: String
  LambdaRoleArn:
    Type: String
  ClusterRoleArn:
    Type: String
  WorkerRoleArn:
    Type: String
  SecurityGroupIds:
    Type: String
  PrivateSubnet1Id:
    Type: String
  PrivateSubnet2Id:
    Type: String
  PublicSubnet1Id:
    Type: String
  PublicSubnet2Id:
    Type: String
  VpcCIDR:
    Type: String

  # EKS Nodegroup Configuration
  AmiType:
    Description: The AMI type for your node group.
    Type: String
  CapacityType:
    Description: The capacity type of your managed node group.
    Type: String
  DiskSize:
    Description: The root device disk size (in GiB) for your node group instances.
    Type: Number
    Default: 100
  ForceUpdateEnabled:
    Description: Force the update if the existing node group's pods are unable to be drained due to a pod disruption budget issue.
    Type: String
    Default: false
  InstanceTypes:
    Description: Specify the instance types for a node group.
    Type: String
  Ec2SshKey:
    Description: The Amazon EC2 SSH key name that provides access for SSH communication with the nodes in the managed node group.
    Type: String

  # MySql Configuration
  DbEndpoint:
    Type: String
  DbPort:
    Type: String
  DbUserName:
    Type: String
  DbUserPassword:
    Type: String
  S3BucketName:
    Type: String

  # LDAP Configuration
  LdapDomainName:
    Description: Ldap domain name.
    Type: String
  LdapAdmin:
    Description: Ldap admin username.
    Type: String
  LdapAdminPassword:
    Description: Ldap admin password.
    Type: String
  LdapUser:
    Description: Ldap user to grant admin access to.
    Type: String
    Default: 'depadmin'
  LdapUserPassword:
    Description: Password for Ldap user to grant admin access to.
    Type: String
    Default: 'dep123456'

  # Redis Configuration
  RedisPassword:
    Description: Redis password.
    Type: String

  CliUsername:
    Description: The aws username to grant eks cli access to.
    Type: String

  RangerPassword:
    Description: Admin password for ranger.
    Type: String

Resources:
  EksCluster:
    Type: AWSQS::EKS::Cluster
    Properties:
      Name: !Join
        - '-'
        - - !Ref EnvironmentName
          - eks-cluster
          - !Ref AWS::Region
      Version: 1.28
      LambdaRoleName: !Ref LambdaRoleArn
      RoleArn: !Ref ClusterRoleArn
      ResourcesVpcConfig:
        SubnetIds:
          - !Ref PrivateSubnet1Id
          - !Ref PrivateSubnet2Id
          - !Ref PublicSubnet1Id
          - !Ref PublicSubnet2Id
        SecurityGroupIds:
          - !Ref SecurityGroupIds
        EndpointPrivateAccess: true
        EndpointPublicAccess: true
      EnabledClusterLoggingTypes: ["audit"]
      KubernetesApiAccess:
        Roles:
          # nodeRole so the extension can update the kubernetes api access
          - Arn: !Ref LambdaRoleArn
            Username: "Lambda"
            Groups: [ "aws-auth-admin" ]
          - Arn: !Ref ClusterRoleArn
            Username: "Admin"
            Groups: ["system:masters", "kube-system:aws-auth" ]
          - Arn: !Ref WorkerRoleArn
            Username: "Worker"
            Groups: [ "system:masters", "kube-system:aws-auth", "system:nodes" ]
          ## Temporary use
          - Arn: !Sub "arn:${AWS::Partition}:iam::${AWS::AccountId}:user/${CliUsername}"
            Username: "CliUser"
            Groups: [ "system:masters", "kube-system:aws-auth" ]
      Tags:
        - Key: ClusterName
          Value: !Join
            - '-'
            - - !Ref EnvironmentName
              - eks-cluster

  EbsCsiAddon:
    Type: AWS::EKS::Addon
    DependsOn:
      - EksCluster
    Properties:
      AddonName: aws-ebs-csi-driver
      ClusterName: !Ref EksCluster

  PlatformNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      AmiType: !Ref AmiType
      CapacityType: !Ref CapacityType
      ClusterName: !Ref EksCluster
      DiskSize: !Ref DiskSize
      ForceUpdateEnabled: !Ref ForceUpdateEnabled
      InstanceTypes:
        - m6g.large
        - m6g.xlarge
        - m6g.2xlarge
        - m6g.4xlarge
        - r6g.2xlarge
        - c6g.2xlarge
        - c6gd.2xlarge
        - c6g.large
        - c6g.xlarge
        - c6gn.xlarge
      NodegroupName: platform
      NodeRole: !Ref WorkerRoleArn
      RemoteAccess:
        Ec2SshKey: !Ref Ec2SshKey
      ScalingConfig:
        DesiredSize: 2
        MaxSize: 5
        MinSize: 2
      Subnets:
        - !Ref PublicSubnet1Id
        - !Ref PublicSubnet2Id
      Labels:
        nodegroup: platform

  HadoopNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      AmiType: !Ref AmiType
      CapacityType: !Ref CapacityType
      ClusterName: !Ref EksCluster
      DiskSize: !Ref DiskSize
      ForceUpdateEnabled: !Ref ForceUpdateEnabled
      InstanceTypes:
        - m6g.large
        - m6g.xlarge
        - m6g.2xlarge
        - m6g.4xlarge
        - r6g.2xlarge
        - c6g.2xlarge
        - c6gd.2xlarge
        - c6g.large
        - c6g.xlarge
        - c6gn.xlarge
      NodegroupName: hadoop
      NodeRole: !Ref WorkerRoleArn
      RemoteAccess:
        Ec2SshKey: !Ref Ec2SshKey
      ScalingConfig:
        DesiredSize: 2
        MaxSize: 5
        MinSize: 2
      Subnets:
        - !Ref PublicSubnet1Id
        - !Ref PublicSubnet2Id
      Labels:
        nodegroup: hadoop

  TrinoNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      AmiType: !Ref AmiType
      CapacityType: !Ref CapacityType
      ClusterName: !Ref EksCluster
      DiskSize: !Ref DiskSize
      ForceUpdateEnabled: !Ref ForceUpdateEnabled
      InstanceTypes:
        - m6g.large
        - m6g.xlarge
        - m6g.2xlarge
        - m6g.4xlarge
        - r6g.2xlarge
        - c6g.2xlarge
        - c6gd.2xlarge
        - c6g.large
        - c6g.xlarge
        - c6gn.xlarge
      NodegroupName: trino
      NodeRole: !Ref WorkerRoleArn
      RemoteAccess:
        Ec2SshKey: !Ref Ec2SshKey
      ScalingConfig:
        DesiredSize: 2
        MaxSize: 5
        MinSize: 2
      Subnets:
        - !Ref PublicSubnet1Id
        - !Ref PublicSubnet2Id
      Labels:
        nodegroup: trino

  ClusterNamespace:
    Type: AWSQS::Kubernetes::Resource
    Properties:
      ClusterName: !Ref EksCluster
      Namespace: default
      Manifest: |
        ---
        apiVersion: v1
        kind: Namespace
        metadata:
          name: platform
        ---
        apiVersion: v1
        kind: Namespace
        metadata:
          name: hadoop
        ---
        apiVersion: v1
        kind: Namespace
        metadata:
          name: trino

  EbsStorageClass:
    Type: AWSQS::Kubernetes::Resource
    Properties:
      ClusterName: !Ref EksCluster
      Namespace: default
      Manifest: |
        ---
        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: ebs-sc
        provisioner: ebs.csi.aws.com
        volumeBindingMode: WaitForFirstConsumer

  ClusterAutoscaler:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - PlatformNodeGroup
      - HadoopNodeGroup
      - TrinoNodeGroup
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: dep/cluster-autoscaler
      Namespace: kube-system
      Name: cluster-autoscaler
      Version: 1.0.0
      ValueYaml: !Sub |
        autoDiscovery:
          clusterName: ${EnvironmentName}-eks-cluster-${AWS::Region}
        awsRegion: ${AWS::Region}

Outputs:
  Arn:
    Description: The ARN of the cluster, such as arn:aws:eks:us-west-2:666666666666:cluster/prod.
    Value: !GetAtt EksCluster.Arn
  CertificateAuthorityData:
    Description: The certificate-authority-data for your cluster.
    Value: !GetAtt EksCluster.CertificateAuthorityData
  ClusterSecurityGroupId:
    Description: The cluster security group that was created by Amazon EKS for the cluster.
      Managed node groups use this security group for control plane to data plane communication.
    Value: !GetAtt EksCluster.ClusterSecurityGroupId

  Endpoint:
    Description: The endpoint for your Kubernetes API server,
      such as https://5E1D0CEXAMPLEA591B746AFC5AB30262.yl4.us-west-2.eks.amazonaws.com.
    Value: !GetAtt EksCluster.Endpoint
  Name:
    Description: EKS cluster name
    Value: !Ref EksCluster
  OIDCIssuerURL:
    Description: The issuer URL for the OIDC identity provider.
    Value: !GetAtt EksCluster.OIDCIssuerURL
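
The Cluster Autoscaler only manages node groups it can discover, so `autoDiscovery.clusterName` has to equal the real cluster name, which EksCluster builds as `<EnvironmentName>-eks-cluster-<Region>`. The ClusterName tag on the cluster omits the region suffix, so it is not the value to use here. EKS managed node groups automatically tag their Auto Scaling groups for auto-discovery; the sketch below assumes the dep/cluster-autoscaler chart keeps the upstream chart's default tag keys. The helper names are hypothetical.

```python
from typing import List


def eks_cluster_name(environment_name: str, region: str) -> str:
    """Same composition as the !Join in EksCluster.Name."""
    return "-".join([environment_name, "eks-cluster", region])


def autoscaler_discovery_tag_keys(cluster_name: str) -> List[str]:
    """ASG tag keys used for auto-discovery (upstream chart defaults, assumed)."""
    return [
        "k8s.io/cluster-autoscaler/enabled",
        f"k8s.io/cluster-autoscaler/{cluster_name}",
    ]


if __name__ == "__main__":
    name = eks_cluster_name("dep", "ap-northeast-1")
    print(name)
    for key in autoscaler_discovery_tag_keys(name):
        print(key)
```

If the two names drifted apart, the autoscaler would start but find no Auto Scaling groups to scale.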

Next, we deploy the related applications on the EKS cluster.

5.11 Deploy OpenLDAP and phpLDAPadmin services

The OpenLDAP image is available in the following registry:

  • public.ecr.aws/y8d7x2g6/dep-public/openldap

To deploy it with Helm from CloudFormation:

## dep-public/series-1/dep-eks.template

Resources:
  OpenldapService:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - PlatformNodeGroup
      - EbsStorageClass
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: dep/openldap
      Namespace: platform
      Name: openldap
      Version: 1.0.0
      ValueYaml: !Sub |
        global:
          ldapDomain: "dc=${LdapDomainName},dc=com"
          adminUser: "${LdapAdmin}"
          adminPassword: "${LdapAdminPassword}"
          configUserEnabled: false
          username: "${LdapUser}"
          userPassword: "${LdapUserPassword}"
        customLdifFiles:
          00-root.ldif: |-
            # Root creation
            dn: dc=${LdapDomainName},dc=com
            objectClass: dcObject
            objectClass: organization
            o: ${LdapDomainName}
            dc: ${LdapDomainName}
          01-users-group.ldif: |-
            dn: ou=People,dc=${LdapDomainName},dc=com
            ou: People
            objectClass: organizationalUnit
            objectClass: top
        customAcls: |-
          dn: olcDatabase={2}mdb,cn=config
          changetype: modify
          replace: olcAccess
          olcAccess: {0}to *
            by dn.exact=gidNumber=0+uidNumber=1001,cn=peercred,cn=external,cn=auth manage
            by * break
          olcAccess: {1}to attrs=userPassword,shadowLastChange
            by self write
            by dn="cn=${LdapAdmin},dc=${LdapDomainName},dc=com" write
            by set="user/employeeType & [ldap_admin]" write
            by anonymous auth by * none
          olcAccess: {2}to *
            by dn="cn=${LdapAdmin},dc=${LdapDomainName},dc=com" write
            by set="user/employeeType & [ldap_admin]" write
            by self read
            by * none
        persistence:
          storageClass: ebs-sc
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: nodegroup
                      operator: In
                      values:
                        - platform

This template deploys OpenLDAP in the platform namespace, creates the LDAP admin user and the user given by LdapUser with the supplied passwords, and persists LDAP data on volumes provisioned through the ebs-sc (EBS) storage class. No separate config user is created, because configUserEnabled is set to false.

OpenLDAP is exposed on the standard port 389, and phpLDAPadmin on port 80.
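
The same few distinguished names (DNs) are derived from LdapDomainName and LdapAdmin in the OpenLDAP, Ranger and Hadoop values, and they must agree for authentication and user sync to work. The Python sketch below spells out how each one is composed in the `!Sub` expressions; the function name is hypothetical. Because the templates hard-code the `dc=com` suffix, LdapDomainName is expected to be a single label such as `example`, not a full domain such as `example.org`.

```python
from typing import Dict


def ldap_dns(domain_name: str, admin_user: str) -> Dict[str, str]:
    """Derive the DNs used in the OpenLDAP, Ranger and Hadoop values."""
    base_dn = f"dc={domain_name},dc=com"
    return {
        # ldapDomain (OpenLDAP), XA_LDAP_BASE_DN and SYNC_LDAP_SEARCH_BASE (Ranger)
        "base_dn": base_dn,
        # XA_LDAP_BIND_DN, SYNC_LDAP_BIND_DN, ldap.bind_dn (Hadoop) and the ACLs
        "admin_bind_dn": f"cn={admin_user},{base_dn}",
        # 01-users-group.ldif, SYNC_LDAP_USER_SEARCH_BASE, ldap.base_dn (Hadoop)
        "people_ou": f"ou=People,{base_dn}",
    }


if __name__ == "__main__":
    for key, dn in ldap_dns("example", "admin").items():
        print(f"{key}: {dn}")
```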

5.12 Deploy ranger-related services

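This template sets up ranger-admin, ranger-solr, and ranger-usersync. ranger-admin stores its policies in the ranger database on the RDS instance, sends audit logs to ranger-solr, and authenticates users against OpenLDAP, while ranger-usersync synchronizes users from the People organizational unit in OpenLDAP.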

Ranger images are available in the following registries:

  • public.ecr.aws/y8d7x2g6/dep-public/ranger/admin
  • public.ecr.aws/y8d7x2g6/dep-public/ranger/solr
  • public.ecr.aws/y8d7x2g6/dep-public/ranger/usersync

## dep-public/series-1/dep-eks.template

Resources:
  RangerService:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - PlatformNodeGroup
      - OpenldapService
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: dep/ranger
      Namespace: platform
      Name: ranger
      Version: 1.0.0
      ValueYaml: !Sub |
        env:
          admin:
            DB_HOST: ${DbEndpoint}
            DB_ROOT_USER: ${DbUserName}
            DB_ROOT_PASSWORD: ${DbUserPassword}
            DB_NAME: ranger
            DB_USER: ${DbUserName}
            DB_PASSWORD: ${DbUserPassword}
            AUDIT_SOLR_URLS: http://ranger-solr.platform.svc.cluster.local:8983/solr/ranger_audits
            POLICYMGR_EXTERNAL_URL: http://ranger-admin.platform.svc.cluster.local:6080
            XA_LDAP_URL: ldap://openldap.platform.svc.cluster.local:389
            XA_LDAP_BASE_DN: dc=${LdapDomainName},dc=com
            XA_LDAP_BIND_DN: cn=${LdapAdmin},dc=${LdapDomainName},dc=com
            XA_LDAP_BIND_PASSWORD: ${LdapAdminPassword}
            LDAP_USER: ${LdapUser}
            LDAP_PASSWORD: ${LdapUserPassword}
            RANGER_PASSWORD: ${RangerPassword}
          usersync:
            POLICY_MGR_URL: http://ranger-admin.platform.svc.cluster.local:6080
            SYNC_SOURCE: ldap
            SYNC_LDAP_URL: ldap://openldap.platform.svc.cluster.local:389
            SYNC_LDAP_BIND_DN: cn=${LdapAdmin},dc=${LdapDomainName},dc=com
            SYNC_LDAP_BIND_PASSWORD: ${LdapAdminPassword}
            SYNC_LDAP_SEARCH_BASE: dc=${LdapDomainName},dc=com
            SYNC_LDAP_USER_SEARCH_BASE: ou=People,dc=${LdapDomainName},dc=com
            SYNC_LDAP_USER_NAME_ATTRIBUTE: cn
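
The URLs in these values rely on Kubernetes cluster DNS, which gives every Service the name `<service>.<namespace>.svc.<cluster-domain>`. Using the fully qualified form lets components running in other namespaces (for example, the Ranger plugins in the hadoop and trino namespaces) reach ranger-admin, ranger-solr and OpenLDAP in the platform namespace. The Python sketch below reproduces the pattern; it assumes the default `cluster.local` cluster domain, and the function name is hypothetical.

```python
def cluster_service_url(scheme: str, service: str, namespace: str, port: int,
                        path: str = "", cluster_domain: str = "cluster.local") -> str:
    """Build <scheme>://<service>.<namespace>.svc.<domain>:<port><path>."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}{path}"


if __name__ == "__main__":
    print(cluster_service_url("http", "ranger-admin", "platform", 6080))
    print(cluster_service_url("http", "ranger-solr", "platform", 8983,
                              "/solr/ranger_audits"))
    print(cluster_service_url("ldap", "openldap", "platform", 389))
```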

5.13 Deploy standalone Hive metastore service

This template deploys a standalone Hive metastore service, which acts as the metastore for the Trino engine.

The metadata backend is the RDS MySQL instance created in chapter 5.9, and the warehouse location is s3://${S3BucketName}/hive/, where S3BucketName refers to the bucket created in chapter 5.5.

Image is available in the following registry:

  • public.ecr.aws/y8d7x2g6/dep-public/hive-metastore-standalone

## dep-public/series-1/dep-eks.template

Resources:
  HiveMetastoreService:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - HadoopNodeGroup
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: dep/hive-metastore
      Version: 1.0.0
      Namespace: hadoop
      Name: hive-metastore
      ValueYaml: !Sub |
        env:
          HIVE_DB_JDBC_URL: "jdbc:mysql://${DbEndpoint}:${DbPort}/hive"
          HIVE_DB_USER: "${DbUserName}"
          HIVE_DB_PASS: "${DbUserPassword}"
          HIVE_WAREHOUSE_S3LOCATION: "s3://${S3BucketName}/hive/"
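
DbEndpoint and DbPort come from the RDSEndPointAddress and RDSEndPointPort outputs of the MySQL stack, and the metastore always uses a database named `hive` on that instance. The Python sketch below assembles the same env block from those inputs and splits the warehouse URI into bucket and prefix, which is a quick way to check the values before deploying. The function names and the example endpoint are made up for illustration.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse


def metastore_env(db_endpoint: str, db_port: str, db_user: str,
                  db_password: str, bucket_name: str) -> Dict[str, str]:
    """Assemble the env block passed to the dep/hive-metastore chart."""
    return {
        "HIVE_DB_JDBC_URL": f"jdbc:mysql://{db_endpoint}:{db_port}/hive",
        "HIVE_DB_USER": db_user,
        "HIVE_DB_PASS": db_password,
        "HIVE_WAREHOUSE_S3LOCATION": f"s3://{bucket_name}/hive/",
    }


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/prefix into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    env = metastore_env("dep-mysql.example.ap-northeast-1.rds.amazonaws.com",
                        "3306", "admin", "change-me", "dep-bucket")
    print(env["HIVE_DB_JDBC_URL"])
    print(split_s3_uri(env["HIVE_WAREHOUSE_S3LOCATION"]))
```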

5.14 Deploy Hadoop and Hive services with ranger plugin setup

This template installs the Hadoop and Hive services with the Ranger plugins set up, together with a ZooKeeper service.

The image is available in the following registry:

  • public.ecr.aws/y8d7x2g6/dep-public/emr

This image bundles hadoop-hdfs, hadoop-yarn, HiveServer2, and the Hive metastore service.

## dep-public/series-1/dep-eks.template

Resources:
  HadoopService:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - HadoopNodeGroup
      - HiveMetastoreService
      - OpenldapService
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: dep/hadoop
      Namespace: hadoop
      Name: hadoop
      Version: 1.0.0
      ValueYaml: !Sub |
        ldap:
          url: ldap://openldap.platform.svc.cluster.local:389
          bind_dn: cn=${LdapAdmin},dc=${LdapDomainName},dc=com
          bind_password: ${LdapAdminPassword}
          base_dn: ou=People,dc=${LdapDomainName},dc=com
        hive:
          mysql:
            endpoint: ${DbEndpoint}
            database: hive
            port: ${DbPort}
            username: ${DbUserName}
            password: ${DbUserPassword}
          warehouse:
            location: s3://${S3BucketName}/hive/
        domain_database: ${LdapDomainName}

  ZookeeperService:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - HadoopNodeGroup
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: zookeeper
      Namespace: hadoop
      Name: zookeeper
      Version: 1.0.0
      ValueYaml: |
        persistence:
          storageClass: ebs-sc
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: nodegroup
                      operator: In
                      values:
                        - hadoop
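The affinity block pins the ZooKeeper pods to nodes labelled nodegroup=hadoop. The label itself is assumed to be set on the HadoopNodeGroup nodes elsewhere in the template. Under requiredDuringSchedulingIgnoredDuringExecution, the entries in nodeSelectorTerms are ORed, while the matchExpressions inside a single term are ANDed. The Python sketch here models that rule for the In, NotIn, Exists, and DoesNotExist operators only. It is a simplification of the Kubernetes scheduler, which also supports Gt, Lt, and matchFields.

```python
# Simplified model of requiredDuringSchedulingIgnoredDuringExecution matching:
# nodeSelectorTerms are ORed, matchExpressions within one term are ANDed.
# Only In, NotIn, Exists and DoesNotExist are modelled; Gt, Lt and matchFields
# are left out of this sketch.


def expression_matches(expr: dict, labels: dict[str, str]) -> bool:
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modelled in this sketch: {op}")


def node_matches(terms: list[dict], labels: dict[str, str]) -> bool:
    for term in terms:
        exprs = term.get("matchExpressions") or []
        # An empty term matches no nodes, so it never makes the node eligible.
        if exprs and all(expression_matches(e, labels) for e in exprs):
            return True
    return False


# The nodeSelectorTerms from the ZooKeeper values above.
ZOOKEEPER_TERMS = [
    {"matchExpressions": [{"key": "nodegroup", "operator": "In", "values": ["hadoop"]}]}
]

if __name__ == "__main__":
    print(node_matches(ZOOKEEPER_TERMS, {"nodegroup": "hadoop"}))  # True
    print(node_matches(ZOOKEEPER_TERMS, {"nodegroup": "trino"}))   # False
```

Because the rule is "required", a ZooKeeper pod stays Pending if no node carries nodegroup=hadoop, rather than falling back to another node group.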

5.15 Deploy Trino engine with Ranger plugin setup

This template installs the Trino service with the Ranger plugin set up.

It also creates a catalog named hive that points to the Hive metastore service deployed in chapter 5.13. In addition, the template exposes JMX metrics on port 9081 so that we can monitor the status of the Trino server.

The image is available in the following registry:

  • public.ecr.aws/y8d7x2g6/dep-public/trino

*Note: The Trino service in this demo is not secured by TLS and thus is not recommended for production use.

## dep-public/series-1/dep-eks.template

Resources:
  TrinoService:
    Type: AWSQS::Kubernetes::Helm
    DependsOn:
      - EbsCsiAddon
      - EksCluster
      - ClusterNamespace
      - TrinoNodeGroup
      - OpenldapService
    Properties:
      ClusterID: !Ref EksCluster
      Repository: https://tomxiaoyz.github.io/big-data-platform-public/
      Chart: dep/trino
      Namespace: trino
      Name: trino
      Version: 1.0.0
      ValueYaml: !Sub |
        server:
          workers: 4
          exchangeManager:
            baseDir: s3://${S3BucketName}/trino/
        additionalCatalogs:
          hive: |-
            hive.metastore-refresh-interval=1m
            connector.name=hive
            hive.metastore-cache-ttl=60s
            hive.metastore=thrift
            hive.metastore.uri=thrift://hive-metastore.hadoop.svc.cluster.local:9083
            hive.non-managed-table-writes-enabled=true
            hive.hdfs.impersonation.enabled=true
            hive.recursive-directories=true
            hive.allow-drop-table=true
            hive.parquet.use-column-names=true
            hive.max-partitions-per-writers=3000
            parquet.ignore-statistics=true
            hive.allow-rename-table=true
        additionalConfigProperties:
          - retry-policy=TASK
          - query.remote-task.max-error-duration=10m
          - task-retry-attempts-per-task=3
          - retry-initial-delay=10s
          - retry-max-delay=1m
          - retry-delay-scale-factor=2.0
          - fault-tolerant-execution-task-descriptor-storage-max-memory=10GB
        additionalExchangeManagerProperties:
          - exchange.s3.region=${AWS::Region}
        domain_database: ${LdapDomainName}
        domain_username: ${LdapUser}
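With retry-policy=TASK, Trino retries individual failed tasks, which is why an exchange manager backed by s3://${S3BucketName}/trino/ is configured. Here is a back-of-the-envelope sketch to build intuition for the retry settings above. It assumes that the wait starts at retry-initial-delay, grows by retry-delay-scale-factor after each failure, and is capped at retry-max-delay. This approximates the documented meaning of these properties and is not Trino's scheduler code. The function name retry_delays is hypothetical.

```python
# Back-of-the-envelope model of the task retry settings above. Assumption: the
# wait starts at retry-initial-delay, is multiplied by retry-delay-scale-factor
# after every failure, and never exceeds retry-max-delay.


def retry_delays(initial_s: float, max_s: float, factor: float, attempts: int) -> list[float]:
    """Seconds to wait before each of `attempts` consecutive retries of one task."""
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, max_s))
        delay *= factor
    return delays


if __name__ == "__main__":
    # retry-initial-delay=10s, retry-max-delay=1m, retry-delay-scale-factor=2.0,
    # task-retry-attempts-per-task=3
    print(retry_delays(10.0, 60.0, 2.0, 3))  # [10.0, 20.0, 40.0]
```

Under this model, the configured values give waits of 10, 20, and 40 seconds. The 1-minute cap would only take effect from a fourth retry onward (80 s would be clipped to 60 s).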

6. Cleaning up resources

  • Go to Amazon S3: Sign in to the AWS Management Console and navigate to the S3 service.
  • Empty the S3 bucket created by this template; CloudFormation cannot delete a bucket that still contains objects.
  • Go to AWS CloudFormation: Navigate to the CloudFormation service.
  • Select the stack and click ‘Delete’.
  • Go to Amazon EC2: Navigate to the EC2 service.
  • Under ‘Volumes’, select the volumes created by this deployment that remain after the stack is deleted, and delete them.
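To script the last step instead, the hypothetical Python helper here filters a response from the EC2 DescribeVolumes API (for example, boto3's ec2.describe_volumes()) down to detached volumes that carry a given tag. The tag that identifies volumes from this deployment is an assumption, so inspect the tags on your own volumes before deleting anything. The sample data is made up.

```python
# Hypothetical helper: pick the EBS volumes to delete from a DescribeVolumes
# response (e.g. boto3.client("ec2").describe_volumes()). Only detached volumes
# ("available") whose tag `tag_key` equals `tag_value` are selected; the tag to
# use is an assumption and must be checked against your own volumes.


def volumes_to_delete(response: dict, tag_key: str, tag_value: str) -> list[str]:
    ids = []
    for volume in response.get("Volumes", []):
        tags = {t["Key"]: t["Value"] for t in volume.get("Tags", [])}
        if volume.get("State") == "available" and tags.get(tag_key) == tag_value:
            ids.append(volume["VolumeId"])
    return ids


if __name__ == "__main__":
    # Made-up sample in the shape returned by DescribeVolumes.
    sample = {
        "Volumes": [
            {"VolumeId": "vol-0aaa", "State": "available",
             "Tags": [{"Key": "example/owner", "Value": "dep-eks"}]},
            {"VolumeId": "vol-0bbb", "State": "in-use",
             "Tags": [{"Key": "example/owner", "Value": "dep-eks"}]},
        ]
    }
    print(volumes_to_delete(sample, "example/owner", "dep-eks"))  # ['vol-0aaa']
```

Each returned ID could then be passed to the EC2 client's delete_volume(VolumeId=...). In accounts with many volumes, collect the responses through get_paginator("describe_volumes") rather than a single call.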

7. Conclusion

This guide has walked through permission management for big data workloads on Amazon EKS with Apache Ranger. By applying the strategies and deploying the components described here, you can manage permissions effectively and keep your big data environment secure and compliant.
