
RHACM and Policies – An Introduction

As Kubernetes adoption grows, so does the need for tools to manage diverse and widespread installations. Red Hat’s answer to that challenge is Red Hat Advanced Cluster Management for Kubernetes (RHACM). Its range of features is divided into four main areas, which can also easily be seen in its UI (note: the screenshots were taken with RHACM 2.2; RHACM 2.3 is already out, so the UI might look slightly different by now):

In the top left corner, we see “End-to-end visibility”. In the bottom left corner, we see “Application lifecycle”. In the top right corner, we see “Cluster lifecycle”, and in the bottom right corner, we see “Governance, Risk, and Compliance”.

This article provides an introduction to the concepts behind the “Governance, Risk and Compliance” section, sometimes shortened to “GRC” and also known as the “Policy Engine”.

The architecture of the Policy Engine in RHACM

Quoting from https://github.com/open-cluster-management/governance-policy-framework :

“The Policy is the Custom Resource Definition (CRD), created for policy framework controllers to monitor. It acts as a vehicle to deliver policies to managed cluster and collect results to send to the hub cluster.”

So, here we find a couple of interesting aspects:

  1. A Policy is a CRD, which means it uses standard Kubernetes capabilities and therefore works on any Kubernetes distribution, not only on OpenShift.
  2. It interacts with so-called “policy framework controllers”, meaning there can be different controllers.
  3. It interacts with managed clusters, meaning there can be more than one, and they can be selected individually.
  4. Results are sent back to the hub cluster, so that there is one central place of control.
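
As an illustration of that last point (a sketch following the policy framework’s conventions, not taken from the examples below, and with a made-up cluster name), the collected compliance results appear in the Policy’s status section on the hub, roughly like this:

status:
  compliant: NonCompliant      # aggregated result over all selected clusters
  status:                      # one entry per managed cluster
    - clustername: dev-cluster
      clusternamespace: dev-cluster
      compliant: NonCompliant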

The above-mentioned URL also has an architecture image of the whole policy engine setup:

Red Hat’s standard slide deck for the introduction of RHACM has a slightly simplified and therefore different image:

What we can see in the lower image is that there are “Out-of-box Policy Controllers”, “Policy Controllers” and “Custom Policy Controllers”.

So RHACM comes with a set of ready-to-use policy controllers, the “Out-of-box Policy Controllers”.

According to the documentation, there are currently three such “Out-of-box Policy Controllers”:

  1. The Kubernetes configuration policy controller
  2. The certificate policy controller
  3. The IAM policy controller

More information on the specifics and the allowed parameters of these controllers can be found in the controllers’ GitHub repositories:

An additional important piece of information here: Policy Controllers can, in principle, also enforce their policies, rather than only report the status of adherence or compliance back. Of the “Out-of-box Policy Controllers”, so far only the configuration policy controller supports the enforce feature.
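
In the policy YAML, this is controlled by the “remediationAction” field; both example policies below ship with “inform”, and switching to “enforce” tells the controller to remediate instead of only reporting:

spec:
  remediationAction: inform    # only report violations
  # remediationAction: enforce # actively remediate (so far only supported by the configuration policy controller)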

What makes a Policy?

Now that we have taken a brief look at the overall architecture, let’s dive a bit deeper into policies.

As mentioned above, a Policy is a CRD and can therefore be represented in a YAML file. For a policy to be effective, it needs to consist of three parts: the Policy, the PlacementBinding and the PlacementRule:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule

Here we simply see the top-level definitions of the three parts. What they can contain and how they interact will be shown later in this blog entry and in upcoming blog entries.

As a quick start, let’s note that the PlacementBinding connects the Policy to the PlacementRule, and the PlacementRule defines where the Policy should be active.
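
Condensed from Appendix A below (with the Policy body omitted), the wiring between the three parts looks like this:

apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-gatekeeper-operator
placementRef:                    # points to the PlacementRule...
  name: placement-policy-gatekeeper-operator
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:                        # ...and binds the Policy to it
  - name: policy-gatekeeper-operator
    kind: Policy
    apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-gatekeeper-operator
spec:
  clusterSelector:               # defines where the Policy should be active
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}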

With that, we have covered the basics of what a Policy in RHACM is, and how it works.

Some Examples

Okay, let’s look at some example policies to better understand what they do.

Like all Red Hat products, RHACM has a so-called “upstream” community version, which is available at https://open-cluster-management.io/. Its source code can be found on GitHub: https://github.com/open-cluster-management-io. The “downstream” product also has some GitHub repositories; here we are specifically interested in the policies, which can be found at https://github.com/open-cluster-management/policy-collection

For this exercise, let’s pick two specific policies:

  1. An operator-installation policy
  2. A config check policy

In the first example, we will use the gatekeeper operator policy, which checks for the existence of the gatekeeper operator (and can also enforce its installation). Gatekeeper itself is not something we will look into here; check out these articles if you would like to learn more, or have a look at its source code repository.

The second policy checks whether the SCCs (Security Context Constraints) adhere to predefined settings (for example, whether it is allowed to run a container with root privileges).

So, the policy for 1.) can be found here: https://github.com/open-cluster-management/policy-collection/blob/main/community/CM-Configuration-Management/policy-gatekeeper-operator.yaml 

The policy for 2.) can be found here: https://github.com/open-cluster-management/policy-collection/blob/main/stable/SC-System-and-Communications-Protection/policy-scc.yaml

They can also be found below in the appendices for reference.

Example 1

Let’s start with a quick look at the gatekeeper policy:

It defines some namespaces and operators to be installed. We can see these in the YAML, each with its own “objectDefinition” section under the “Policy”. They are:

  1. A “Namespace” called “openshift-gatekeeper-operator”
  2. A “CatalogSource” (via the ConfigurationPolicy “gatekeeper-operator-catalog-source”) named “gatekeeper-operator”, to be deployed in the aforementioned new Namespace “openshift-gatekeeper-operator”, using the image “quay.io/gatekeeper/gatekeeper-operator-bundle-index:latest”
  3. An “OperatorGroup” (via the ConfigurationPolicy “gatekeeper-operator-group”), again to be deployed in the Namespace created in 1.
  4. A “Subscription” named “gatekeeper-operator-sub” that subscribes to the operator and triggers the actual installation
  5. The “Gatekeeper” instance itself, using the image “docker.io/openpolicyagent/gatekeeper:v3.3.0”

This is all described in the YAML of our example 1. We see that a policy can check for, and enforce, the existence of multiple elements at the same time. We will not go into more detail here on what needs to go into the YAML and what these objects do; that is left for a later blog entry. Here, we simply want to show how easy it is to make use of such policies.
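
As an illustration, here is the first of these “objectDefinition” sections, taken from the full policy in Appendix A; the “complianceType: musthave” declares that the Namespace must exist:

- objectDefinition:
    apiVersion: policy.open-cluster-management.io/v1
    kind: ConfigurationPolicy
    metadata:
      name: gatekeeper-operator-ns
    spec:
      remediationAction: inform
      severity: high
      object-templates:
        - complianceType: musthave    # the object below must exist on the cluster
          objectDefinition:
            apiVersion: v1
            kind: Namespace
            metadata:
              name: openshift-gatekeeper-operator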

For demo purposes, I have two OCP clusters, one of which runs RHACM. We can see them when we go to the “Cluster lifecycle” section of RHACM:

An important note here: One of the clusters was prepared with the label “environment=dev” (circled in red above).

No GRC policies have been created in this RHACM yet, so let’s do that with our first example.

To start, let’s go to the bottom left group on the RHACM start page, and we’ll see the UI for Governance and risk (also called: Policy engine):

We click on “Create Policy”:

By default, we see the YAML code on the right side, which also makes it easy to import the above-mentioned first policy. Let’s go to the GitHub page, click on “Raw” for the policy YAML, and copy the YAML code from GitHub into the YAML section of RHACM. Note: before pasting into RHACM, clear the YAML section there. Typically you do a <ctrl>-a <ctrl>-c in the GitHub window, and a <ctrl>-a <ctrl>-v in the RHACM window. After pasting the policy into the YAML edit window in RHACM, you should see the following:

In the last line of the policy code, in the “PlacementRule” section, we see that this policy shall be applied to all clusters that have the label “environment” with the value “dev”. Before we can press the “Create” button, we still need to select a namespace in which this policy object shall live. This is for internal organization purposes only; it does NOT affect the results of the policy engine itself. So, here I simply select the “default” namespace on the left side. I could also have created dedicated policy-engine namespaces in advance to group policies more efficiently. Also note that I do not yet select the “Enforce if supported” option.
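
In the resulting Policy object, this selection simply ends up as the “metadata.namespace” of the Policy on the hub (a sketch, shown with the “default” namespace we just picked):

metadata:
  name: policy-gatekeeper-operator
  namespace: default    # hub namespace used for organizing policies; does not affect evaluation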

Before we create the policy, let’s again check the list of installed operators on the cluster itself in its OpenShift UI:

We see that this cluster, too, is a fresh one; no additional operators are installed.

So, let’s create the policy by clicking on the “Create” button in the top right corner of the “Create policy” dialog in RHACM.

We are forwarded to a screen which, after a couple of moments, looks like this:

We see that RHACM detected that the policy shall be applied to 1 cluster, and that the policy is NOT adhered to on this cluster; therefore, we have one policy violation. We can click on the policy name to get a more detailed overview, where we select the “Status” tab:

We see: The required operator elements are missing, which is why the policy failed.

If we go back to the policy overview (two images above), we see three vertically stacked dots at the end of the row in which the policy is listed. If we click on those, we get a popup box in which we can select an action for the policy:

Let’s click on “Enforce” and confirm that action in the next popup box.

A couple of moments later the image changes to:

And, when we again check the details of the policy, we see:

And we can confirm in our cluster with the “environment=dev” label, that the operator has been installed:

This concludes our first example. We learned how a policy can ensure or simply check for the installation of a specific operator and all the elements it needs to run.

Example 2

Just as with example 1, let’s first look at the policy.

Unlike our first example, this policy has only a single “objectDefinition” section. It is of the kind “ConfigurationPolicy” and refers to an “object-template” of the kind “SecurityContextConstraints”.

SecurityContextConstraints (SCCs) in OpenShift define what permissions running containers have. They consist of a number of attributes, for example “allowPrivilegedContainer” or “allowPrivilegeEscalation”.
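
Condensed from the policy in Appendix B, the “object-template” describes the desired SCC, including exactly such attributes:

object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: security.openshift.io/v1
      kind: SecurityContextConstraints
      metadata:
        name: sample-restricted-scc
      allowPrivilegedContainer: false   # no privileged containers
      allowPrivilegeEscalation: true
      priority: 10
      runAsUser:
        type: MustRunAsRange            # containers must run within the namespace's UID range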

We can check for the default setup of these SCCs via:

[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc get scc
NAME                              PRIV    CAPS         SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY     READONLYROOTFS   VOLUMES
anyuid                            false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    10           false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess                        false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid                  false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork                       false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
machine-api-termination-handler   false   <no value>   MustRunAs   RunAsAny           MustRunAs   MustRunAs   <no value>   false            ["downwardAPI","hostPath"]
node-exporter                     true    <no value>   RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
nonroot                           false   <no value>   MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged                        true    ["*"]        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
restricted                        false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
[mpfuetzn@mpfuetzn oc4.6.25]$

We see that there are a couple of predefined ones. We can look at, for example, the “restricted” one and its definition via:

[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc describe scc restricted
Name:                                           restricted
Priority:                                       <none>
Access:
  Users:                                        <none>
  Groups:                                       system:authenticated
Settings:
  Allow Privileged:                             false
  Allow Privilege Escalation:                   true
  Default Add Capabilities:                     <none>
  Required Drop Capabilities:                   KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:                         <none>
  Allowed Seccomp Profiles:                     <none>
  Allowed Volume Types:                         configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                          <all>
  Allowed Unsafe Sysctls:                       <none>
  Forbidden Sysctls:                            <none>
  Allow Host Network:                           false
  Allow Host Ports:                             false
  Allow Host PID:                               false
  Allow Host IPC:                               false
  Read Only Root Filesystem:                    false
  Run As User Strategy: MustRunAsRange
    UID:                                        <none>
    UID Range Min:                              <none>
    UID Range Max:                              <none>
  SELinux Context Strategy: MustRunAs
    User:                                       <none>
    Role:                                       <none>
    Type:                                       <none>
    Level:                                      <none>
  FSGroup Strategy: MustRunAs
    Ranges:                                     <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                                     <none>
[mpfuetzn@mpfuetzn oc4.6.25]$

In the YAML of our example, we see that this policy checks for the existence of (or, when enforced, creates) an SCC named “sample-restricted-scc”.

In this example, we will now use the policy to make sure that the “restricted” SCC will look like the definition in the policy.

So, let’s again create a new policy by copying and pasting the YAML code into our RHACM policy definition window.

To achieve this, we again need to select a namespace in which the policy shall live; I again select “default” (see my remarks above on this selection). But there is a second thing we need to do: in line 39 of the YAML we still see “sample-restricted-scc”, so let’s change that to “restricted”:
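
That is, we change the SCC name in the policy’s object-template:

# before (as published in the policy collection):
  name: sample-restricted-scc
# after our edit, so the policy targets the built-in SCC:
  name: restricted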

Note again that the last line of this policy restricts it to clusters with the “environment=dev” label, so we are safe in this example.

Again, this policy fails, because it defines a slightly different setup than the current state we saw above. So let’s look at the results of the policy check (note: I again did not set the “enforce” option when creating the policy):

A drill-down reveals:

Let’s view the details:

If we again set the policy to “Enforce”, as we did in the first example, it will “correct” the error, which leads to the following output:

[mpfuetzn@mpfuetzn oc4.6.25]$ ./oc describe scc restricted
Name:                                           restricted
Priority:                                       10
Access:
  Users:                                        <none>
  Groups:                                       system:authenticated
Settings:
  Allow Privileged:                             false
  Allow Privilege Escalation:                   true
  Default Add Capabilities:                     <none>
  Required Drop Capabilities:                   KILL,MKNOD,SETUID,SETGID
  Allowed Capabilities:                         <none>
  Allowed Seccomp Profiles:                     <none>
  Allowed Volume Types:                         configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
  Allowed Flexvolumes:                          <all>
  Allowed Unsafe Sysctls:                       <none>
  Forbidden Sysctls:                            <none>
  Allow Host Network:                           false
  Allow Host Ports:                             false
  Allow Host PID:                               false
  Allow Host IPC:                               false
  Read Only Root Filesystem:                    false
  Run As User Strategy: MustRunAsRange
    UID:                                        <none>
    UID Range Min:                              <none>
    UID Range Max:                              <none>
  SELinux Context Strategy: MustRunAs
    User:                                       <none>
    Role:                                       <none>
    Type:                                       <none>
    Level:                                      <none>
  FSGroup Strategy: MustRunAs
    Ranges:                                     <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                                     <none>
[mpfuetzn@mpfuetzn oc4.6.25]$

If we look in more detail at where the original SCC and the newly enforced SCC differ, we see:

[mpfuetzn@mpfuetzn oc4.6.25]$ diff orig new
3c3
< Priority:                                       <none>
---
> Priority:                                       10
[mpfuetzn@mpfuetzn oc4.6.25]$

It’s a small change, but still, it’s a difference… 🙂 And regardless, the policy ensures that the SCC stays as intended and doesn’t get modified by accident.
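
For reference, the Priority difference traces back to this line in the policy’s SCC definition (see Appendix B), which the default “restricted” SCC does not set:

priority: 10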

This finishes our introduction to Policies in RHACM. More to come in future blog entries, starting here: https://www.opensourcerers.org/2021/10/11/rhacm-and-policies-more-details/

Appendix A: Example 1:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-gatekeeper-operator
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-ns
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: openshift-gatekeeper-operator
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-catalog-source
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: CatalogSource
                metadata:
                  name: gatekeeper-operator
                  namespace: openshift-gatekeeper-operator
                spec:
                  displayName: Gatekeeper Operator Upstream
                  publisher: github.com/font/gatekeeper-operator
                  sourceType: grpc
                  image: 'quay.io/gatekeeper/gatekeeper-operator-bundle-index:latest'
                  updateStrategy:
                    registryPoll:
                      interval: 45m
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-group
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1
                kind: OperatorGroup
                metadata:
                  name: gatekeeper-operator
                  namespace: openshift-gatekeeper-operator
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper-operator-subscription
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: Subscription
                metadata:
                  name: gatekeeper-operator-sub
                  namespace: openshift-gatekeeper-operator
                spec:
                  channel: stable
                  name: gatekeeper-operator
                  source: gatekeeper-operator
                  sourceNamespace: openshift-gatekeeper-operator
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: gatekeeper
        spec:
          remediationAction: inform
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operator.gatekeeper.sh/v1alpha1
                kind: Gatekeeper
                metadata:
                  name: gatekeeper
                spec:
                  audit:
                    logLevel: INFO
                    replicas: 1
                  image:
                    image: 'docker.io/openpolicyagent/gatekeeper:v3.3.0'
                  validatingWebhook: Enabled
                  mutatingWebhook: Disabled
                  webhook:
                    emitAdmissionEvents: Enabled
                    logLevel: INFO
                    replicas: 2
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-gatekeeper-operator
placementRef:
  name: placement-policy-gatekeeper-operator
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: policy-gatekeeper-operator
    kind: Policy
    apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-gatekeeper-operator
spec:
  clusterConditions:
    - status: "True"
      type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}

Appendix B: Example 2:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-securitycontextconstraints
  annotations:
    policy.open-cluster-management.io/standards: NIST SP 800-53
    policy.open-cluster-management.io/categories: SC System and Communications Protection
    policy.open-cluster-management.io/controls: SC-4 Information In Shared Resources
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: policy-securitycontextconstraints-example
        spec:
          remediationAction: inform # the policy-template spec.remediationAction is overridden by the preceding parameter value for spec.remediationAction.
          severity: high
          namespaceSelector:
            exclude: ["kube-*"]
            include: ["default"]
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: security.openshift.io/v1
                kind: SecurityContextConstraints # restricted scc
                metadata:
                  annotations:
                    kubernetes.io/description: restricted denies access to all host features and requires pods to be run with a UID, and SELinux context that are allocated to the namespace. This is the most restrictive SCC and it is used by default for authenticated users.
                  name: sample-restricted-scc
                allowHostDirVolumePlugin: false
                allowHostIPC: false
                allowHostNetwork: false
                allowHostPID: false
                allowHostPorts: false
                allowPrivilegeEscalation: true
                allowPrivilegedContainer: false
                allowedCapabilities: []
                defaultAddCapabilities: []
                fsGroup:
                  type: MustRunAs
                groups:
                  - system:authenticated
                priority: 10
                readOnlyRootFilesystem: false
                requiredDropCapabilities:
                  - KILL
                  - MKNOD
                  - SETUID
                  - SETGID
                runAsUser:
                  type: MustRunAsRange
                seLinuxContext:
                  type: MustRunAs
                supplementalGroups:
                  type: RunAsAny
                users: []
                volumes:
                  - configMap
                  - downwardAPI
                  - emptyDir
                  - persistentVolumeClaim
                  - projected
                  - secret
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-securitycontextconstraints
placementRef:
  name: placement-policy-securitycontextconstraints
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: policy-securitycontextconstraints
    kind: Policy
    apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-securitycontextconstraints
spec:
  clusterConditions:
    - status: "True"
      type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: environment, operator: In, values: ["dev"]}
