Red Hat OpenShift — how to improve security and stability
The default OpenShift 3 cluster installation may not meet the security requirements of an enterprise, and without additional configuration it may lead to environment stability issues. In this blog I will share configuration that can be applied to address these issues and provide operational stability for an OpenShift cluster. The solutions in this blog are meant for OpenShift 3 environments only. I plan to create a follow-up blog covering how such issues can be addressed in an OpenShift 4 environment as well. Stay tuned…
Configure the journald service to limit the disk space used for logging
journald is a system service that collects and stores logging data such as kernel logs, system logs, and standard output and error messages. The logs are stored under /var/log/journal. The service uses the configuration file /etc/systemd/journald.conf, which specifies the size, location, and storage behavior of the logs.
By default, the journald service in an OpenShift cluster is configured to use a maximum of 8GB of disk space. If the log filesystem /var/log is allocated less than 8GB, over time the entire filesystem will fill up, resulting in pod startup failures due to lack of disk space.
Solution
The journald service can be configured not to exceed a given disk space size. The following options can be set under the [Journal] section of journald.conf to control the disk space used by the journal:
- Storage: controls where to store the journal data.
- SystemMaxFileSize: controls how large individual journal files may grow at most.
- SystemMaxUse: controls how much disk space the journal may use at most.
- SystemKeepFree: controls how much disk space the journal shall leave free for other applications.
Two options are available:
1. As part of the OpenShift cluster build: add the following variable to the OpenShift inventory file. The OpenShift 3 installer playbooks will configure the journald service with the given values.
journald_vars_to_replace:
  - {var: Storage, value: persistent}
  - {var: SystemMaxFileSize, value: 100M}
  - {var: SystemMaxUse, value: 2G}
  - {var: SystemKeepFree, value: 4G}
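2. On an existing cluster: the same settings can be applied directly by editing /etc/systemd/journald.conf on each node and restarting the service. A minimal sketch, assuming the same values as above:

[Journal]
Storage=persistent
SystemMaxFileSize=100M
SystemMaxUse=2G
SystemKeepFree=4G

$ systemctl restart systemd-journald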
Configure Docker logging to limit the max size of log files
Containers running on OpenShift nodes generate log files that are stored on the node. The filesystem dedicated to containers may run out of space if a limit is not set on the container logs.
Solution
Update the OpenShift cluster install inventory file to add Docker options that set the maximum log file size and number of files:
openshift_docker_options: "--log-driver=json-file --signature-verification=false --log-opt max-size=2M --log-opt max-file=5"
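On an already-built node, the equivalent limits can be sketched in Docker's daemon configuration file /etc/docker/daemon.json, followed by a restart of the docker service (note that OpenShift 3 nodes may also manage these options via /etc/sysconfig/docker):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "2m",
    "max-file": "5"
  }
}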
Remove self-provisioning
An OpenShift cluster creates a default role binding that grants any authenticated user permission to create new projects. In a restricted environment, only a given set of users should have this permission.
Solution
Update the self-provisioners role binding to remove the assignment of this permission:
oc patch clusterrolebinding.rbac self-provisioners -p '{"subjects": null}'
Or, if other user groups are also granted permission through this role binding, remove only the default group rather than clearing all subjects:
oc adm policy remove-cluster-role-from-group self-provisioner system:authenticated:oauth
For more details, refer to the OpenShift documentation: https://docs.openshift.com/container-platform/3.11/admin_guide/managing_projects.html
Auto-approve certificate signing requests
The OpenShift API server communicates with internal components such as etcd and with applications using certificate-based authentication. These components generate certificate signing requests (CSRs) that must be approved by the API server in time; otherwise, communication between the component and the API server breaks, resulting in failures.
Solution
Update the inventory file with the variable below. The OpenShift installation playbooks will enable auto approval.
openshift_master_bootstrap_auto_approve=true
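If auto approval is not in place, pending CSRs can be listed and approved manually; a quick sketch:

$ oc get csr
$ oc adm certificate approve <csr_name>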
Check certificate expiry
OpenShift provides a default certificate authority (CA) for granting and approving new certificate signing requests. The CA certificate expires by default in 3 years and needs to be renewed in time to ensure the cluster keeps functioning without downtime. Other OpenShift component certificates, such as the etcd, worker node, and master node certificates, need renewal as well.
Solution
OpenShift provides a playbook that can find the expiry dates of all certificates used in the cluster and generate HTML and JSON report files:
$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook -v -i /etc/ansible/openshift/aws-poc/playbook/hosts.aws-poc playbooks/openshift-checks/certificate_expiry/easy-mode.yaml
A job can be scheduled to run this playbook periodically and send alerts if any certificates are expiring in the near future.
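A minimal cron sketch, reusing the inventory path from the example above, that runs the check weekly (alerting on the generated report would need an additional script or monitoring hook):

0 2 * * 0 cd /usr/share/ansible/openshift-ansible && ansible-playbook -i /etc/ansible/openshift/aws-poc/playbook/hosts.aws-poc playbooks/openshift-checks/certificate_expiry/easy-mode.yaml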
Refer to the OpenShift documentation for more details.
Update default session timeout
By default, OpenShift has a web console session timeout of 24 hours, which may not meet the security needs of an enterprise.
The OAuthClient object definition has a parameter, accessTokenMaxAgeSeconds, that holds the session timeout value. If it is not initialized, the timeout value is taken from the config file /etc/origin/master/master-config.yaml.
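A quick sketch of setting a one-hour timeout on the web console client (the 3600-second value is illustrative):

$ oc patch oauthclient openshift-web-console -p '{"accessTokenMaxAgeSeconds": 3600}'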
Refer to the KB article for more details.
Assigning static IPs for external project traffic
An OpenShift cluster can be configured to be a multi-tenant environment using software-defined networking (SDN, the ovs-networkpolicy plugin) and network policies. Applications running in a given project/namespace cannot communicate with applications in other projects unless a network policy allows it.
Application containers may need to access services running outside the cluster, and by default the IP of the host on which the container is running is sent as part of the request to the target service. If the target service is configured to allow requests originating from a given IP for better access management, then it is ideal to have all requests originating from a project/namespace use a unique static IP.
Solution
A static egress IP can be assigned to a namespace, and hosted on a node, using the commands below:
oc patch netnamespace <project_name> -p '{"egressIPs": ["<IP_address>"]}'
oc patch hostsubnet <node_name> -p '{"egressIPs": ["<IP_address_1>", "<IP_address_2>"]}'
Refer to the OpenShift documentation for more details.
Create ingress and egress network policies
In clusters built with software-defined networking (SDN, the ovs-networkpolicy plugin), projects can communicate with each other without restriction. To achieve multi-tenancy and isolate each project, default network policies can be applied as part of the project-request template. This ensures that whenever a new project is created these network policies are applied, preventing projects from talking to each other by default.
Sample network policies:
- A network policy to allow connections only between pods in the same project
- An egress policy to deny traffic from a namespace to other namespaces
These policies can be adjusted per project as requirements dictate, and rules can be added to allow or deny traffic from target projects; the first policy is sketched below.
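A minimal sketch of the same-project policy (the policy name is illustrative; note that the ovs-networkpolicy plugin in OpenShift 3 enforces ingress rules, so namespace isolation is typically achieved with ingress policies like this one):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}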
Refer to the OpenShift documentation for more details.
Configure the monitoring stack (Prometheus)
OpenShift ships with a pre-configured, self-updating monitoring stack based on the Prometheus open source project and its wider ecosystem. It provides monitoring of cluster components such as etcd, the API server, and the kubelet, and includes a set of alerts to immediately notify cluster administrators about errors, along with a set of Grafana dashboards.
The monitoring stack can be enabled by adding the variable below to the inventory file:
openshift_cluster_monitoring_operator_install=true
Refer to the OpenShift documentation for more details.
Create custom roles as needed
A default OpenShift installation provides roles such as cluster-admin and cluster-reader that grant permission to perform operations on the cluster. Custom roles may need to be defined to achieve segregation of duties, for example an IAM admin role that can only create roles and grant permissions to different users/groups but cannot manage the cluster, while cluster admins can only manage the cluster but not create new cluster roles or assign permissions.
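As a sketch, a hypothetical iam-admin cluster role limited to managing roles and role bindings could be created and assigned like this:

$ oc create clusterrole iam-admin --verb=get,list,create,update,delete --resource=roles,rolebindings,clusterroles,clusterrolebindings
$ oc adm policy add-cluster-role-to-user iam-admin <user_name>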
Refer to the OpenShift documentation for how to create new custom roles.
Integrate with LDAP
OpenShift allows integrating with an existing LDAP server, such as Active Directory (AD), and syncing the existing AD users/groups into OpenShift. Permissions/roles can then be assigned at the AD group level to manage access to the cluster.
https://docs.openshift.com/container-platform/3.11/install_config/syncing_groups_with_ldap.html
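A minimal sketch of running a group sync, assuming a hypothetical sync configuration file ldap-sync-config.yaml that points at the LDAP server (its format is described in the documentation linked above):

$ oc adm groups sync --sync-config=ldap-sync-config.yaml --confirm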
Enable etcd encryption
etcd is the key-value data store used by OpenShift to store manifest data for objects such as deployments, configmaps, and secrets. By default the stored data is not encrypted, and its contents can be retrieved and viewed using the oc or etcdctl command line utilities. Some data, secrets for example, should be stored in encrypted form for security, and enabling etcd encryption achieves this.
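A minimal sketch of the OpenShift 3.11 approach, assuming an encryption config file at /etc/origin/master/encryption-config.yaml referenced from master-config.yaml (the key name and secret are placeholders):

kind: EncryptionConfig
apiVersion: v1
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>
      - identity: {}

In master-config.yaml:

kubernetesMasterConfig:
  apiServerArguments:
    experimental-encryption-provider-config:
      - /etc/origin/master/encryption-config.yaml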
Regular operational maintenance tasks
- Defragment etcd to decrease DB size
- Address etcd startup failures
- Prune objects to reduce DB size (example commands sketched below)
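A quick sketch of object pruning with oc (the retention values are illustrative):

$ oc adm prune builds --confirm
$ oc adm prune deployments --confirm
$ oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm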
Install logging and monitoring agents
OpenShift comes with a default monitoring stack (Prometheus). Enterprises may have their own external logging (for example, Splunk) and application monitoring solutions (for example, Dynatrace). These external applications can be integrated with the OpenShift cluster by installing the respective agents to send cluster logs and other data in real time.
Install host and container scanning tools
OpenShift hosts and container images need to be regularly scanned for new vulnerabilities and remediated with patches. Tools are available that perform this scanning and remediation, and the respective agents need to be installed.
Examples: Qualys, Red Hat advanced security management
Disable unsupported and insecure TLS cipher suites
Enable only the TLS versions and cipher suites recommended by the security team.
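A minimal sketch using inventory variables recognized by the OpenShift 3 installer playbooks (the minimum version and suite list are illustrative and should come from the security team):

openshift_master_min_tls_version=VersionTLS12
openshift_master_cipher_suites=['TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256', 'TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384']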