
Backup and Recovery with OpenShift APIs for Data Protection (OADP)

Introduction

The well-known phrase “Data is the new oil”, coined by Clive Humby many years ago, holds true to this day. If that is the case, protecting their data should be one of the top priorities for companies – including data that is generated and processed by cloud-native applications running on OpenShift. Yet, a large number of enterprises still suffer from data loss. [1, 2] And while OpenShift solves a lot of problems, it cannot prevent disasters (e.g. caused by human error) from happening.

As a result, there is a need for a modern solution that addresses the backup and recovery requirements of highly dynamic, containerized applications that run on OpenShift – both on premises and in the cloud. With that in mind, the OpenShift APIs for Data Protection (OADP) project was started in mid-2020. Based on the popular Velero tool, OADP aims to provide the following APIs while offering a solid, tested integration with OpenShift:

  • Backup – what to back up?
  • Restore – what to restore?
  • Schedule – when to back up?
  • BackupStorageLocation – where to store backups?
  • VolumeSnapshotLocation – where to place storage snapshots?

The main goal of these APIs is to back up and restore Kubernetes resources (“YAML files”), as well as container images and data stored in persistent volumes. That being said, OADP is not a full end-to-end data protection solution: it does not provide a standalone storage solution, an enhanced scheduling or policy engine, or enhanced data movement features. [3]

With that said, let’s go ahead and install the OADP Operator, deploy an example project and simulate a disaster in order to see if we can recover from it.

Installing the OADP Operator

As with any Operator, one of the easiest ways to get it up and running on OpenShift is via the built-in OperatorHub, which is essentially a registry for Kubernetes operators provided by the community, independent software vendors, or Red Hat.

Fig. 1: Installing the OADP Operator via the OperatorHub

After locating the Operator, clicking the Install button and confirming the installation, we are ready to consume the provided APIs. The next step is to instantiate Velero by creating a Custom Resource. In the following example, we specify that all of our backups shall be stored in an S3-compatible object storage – in this case AWS S3 (OpenShift Data Foundation, MinIO, Azure Blob Storage, Google Cloud Storage or similar would work as well):

$ cat example-velero.yaml
apiVersion: konveyor.openshift.io/v1alpha1
kind: Velero
metadata:
  name: example-velero
  namespace: oadp-operator
spec:
  olm_managed: true
  backup_storage_locations:
    - config:
        profile: default
        region: us-east-1
      credentials_secret_ref:
        name: cloud-credentials
        namespace: oadp-operator
      name: default
      object_storage:
        bucket: oadp
        prefix: velero
      provider: aws
  default_velero_plugins:
    - aws
    - csi
    - openshift
  velero_feature_flags: EnableCSI
  enable_restic: true
  volume_snapshot_locations:
    - config:
        profile: default
        region: us-east-1
      name: default
      provider: aws

Note that restic is enabled. It is a useful tool for backing up persistent volumes that are backed by a storage provider that does not support volume snapshots (e.g. NFS or emptyDir). Deploying the above specification results in the following resources being created and managed by the OADP Operator:
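As an aside, backing up all volumes by default is not the only option: Velero also supports an opt-in model, where individual pods are annotated with the volumes that restic should pick up. The following is only a hedged sketch – the pod, container and volume names are illustrative and would need to match your actual workload:

```yaml
# Hypothetical pod fragment opting a single volume into a restic backup.
# The backup.velero.io/backup-volumes annotation lists volume names to back up.
apiVersion: v1
kind: Pod
metadata:
  name: postgresql-example
  namespace: my-persistent-application
  annotations:
    backup.velero.io/backup-volumes: postgresql-data
spec:
  containers:
    - name: postgresql
      image: registry.redhat.io/rhel8/postgresql-10
      volumeMounts:
        - name: postgresql-data
          mountPath: /var/lib/pgsql/data
  volumes:
    - name: postgresql-data
      persistentVolumeClaim:
        claimName: postgresql
```

In this article we will rely on the default, opt-out behavior instead, so no annotations are required.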

Fig. 2: Resources managed by the OADP Operator

This example represents the bare-minimum needed to get up and running. If you are interested in a more detailed configuration, see the upstream documentation to learn more about resource requests and limits, TLS certificates, plugins and custom backup storage and volume snapshot locations.

Creating an example project

Next, we are going to deploy a stateful application and generate some data. Thankfully, OpenShift Container Platform exposes a set of example applications via the web console’s developer perspective that should do just fine. Hence, we don’t need to invest a lot of time in building an actual application.

Fig. 3: Creating a PostgreSQL database from a Template in the web console

After hitting “Instantiate Template” and accepting the default values, OpenShift takes care of provisioning not only the database instance, but also a PersistentVolume which stores our user data independently of the respective pod.
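Under the hood, the template submits a PersistentVolumeClaim roughly like the following sketch; the name and requested size are illustrative and may differ depending on the template parameters in your cluster:

```yaml
# Hypothetical PVC resembling what the postgresql-persistent template creates.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql
  namespace: my-persistent-application
spec:
  accessModes:
    - ReadWriteOnce        # a single node may mount the volume read-write
  resources:
    requests:
      storage: 1Gi         # default size; adjustable via template parameters
```

Because the claim is a standalone object, the volume's lifecycle is decoupled from any individual pod – which is exactly what we will verify next.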

As we don’t have a proper front-end application, we are going to create a database and populate it with content the old-fashioned way: via the command line:

$ oc -n my-persistent-application exec -it postgresql-1-gf5q6 -- psql
psql (10.15)
Type "help" for help.

postgres=# CREATE DATABASE restore_me;
CREATE DATABASE
postgres=# \connect restore_me;
You are now connected to database "restore_me" as user "postgres".
postgres=# CREATE TABLE me_as_well ( id INT PRIMARY KEY, text VARCHAR );
CREATE TABLE
postgres=# \dt
        List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | me_as_well | table | postgres
(1 row)

postgres=# INSERT INTO me_as_well VALUES (1,'hello there');
INSERT 0 1
postgres=# SELECT * from me_as_well;
 id |    text
----+-------------
  1 | hello there
(1 row)

In order to make sure that our data is decoupled from the pod, let’s simply delete it and see if we can still access our data:

$ oc -n my-persistent-application delete pod -l name=postgresql
pod "postgresql-1-gf5q6" deleted
$ oc get pods
NAME                 READY   STATUS    RESTARTS   AGE
postgresql-1-9plwt   1/1     Running   0          70s
$ oc -n my-persistent-application exec -it postgresql-1-9plwt -- psql
psql (10.15)
Type "help" for help.

postgres=# \connect restore_me;
You are now connected to database "restore_me" as user "postgres".
restore_me=# \dt
        List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | me_as_well | table | postgres
(1 row)

restore_me=# SELECT * FROM me_as_well;
 id |    text
----+-------------
  1 | hello there
(1 row)

Perfect, this seems to have worked. Now let’s go ahead and create a backup specification, make sure the backup is executed and then inspect the bits that have been stored.

Creating and inspecting a backup

The following specification is one of the simplest ways to instruct Velero to fully back up a single namespace, in this case “my-persistent-application”. Furthermore, because our PersistentVolume is backed by NFS, we tell Velero to use restic to back it up.

$ cat example-backup.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  namespace: oadp-operator
  name: backup-my-persistent-application
spec:
  defaultVolumesToRestic: true
  includedNamespaces:
    - my-persistent-application

Once we create the above custom resource, Velero takes care of the rest. After a few minutes, we can see that the Status field flips from Phase: InProgress to Phase: Completed.

Fig. 4: Overview of a completed backup in the OpenShift Console

Remember, our backup location is an S3 bucket. Let’s find out what has been stored there in order to get an understanding of the items that have been backed up.

$ mkdir inspect-backup
$ aws s3 sync s3://oadp/velero/ inspect-backup/
$ tree inspect-backup/
inspect-backup/
├── backups
│   └── backup-my-persistent-application
│       ├── backup-my-persistent-application-csi-volumesnapshotcontents.json.gz
│       ├── backup-my-persistent-application-csi-volumesnapshots.json.gz
│       ├── backup-my-persistent-application-logs.gz
│       ├── backup-my-persistent-application-podvolumebackups.json.gz
│       ├── backup-my-persistent-application-resource-list.json.gz
│       ├── backup-my-persistent-application.tar.gz
│       ├── backup-my-persistent-application-volumesnapshots.json.gz
│       └── velero-backup.json
└── restic
    └── my-persistent-application
        ├── config
        ├── data
        │   ├── 1e
        │   │   └── 1e97234f2d00e619e3dc074425abd0ab4174b4641fcac01eba147c1abe593b42
        │   ├── 52
        │   │   └── 524872dfc547974b306603f387fd65caf737ce76b766b4d19906fdae91947e24
        │   ├── 7f
        │   │   └── 7f0b3bb51500e68ebcd8fd87c0d877225811036a5eb3f8a81f938900dce7fe99
        │   ├── cc
        │   │   ├── cc728c09d3f0908955cbac81296fde285b7a24cc27c5d773dae306b6b8eed107
        │   │   └── cce7d2b057c9d97779aa1062c1e850b76144de3c6e1b6a31efa95a9909930e9d
        │   ├── e5
        │   │   └── e5514a0e5d30efc6d7f40fbba66a2904324f1b7ed0197379ca7419c93fd07442
        │   └── fe
        │       └── fe95884a3f4498476fb60097e9c2f7d39112b57cb8ec28db94d3a65ceed7fe83
        ├── index
        │   └── f8baafd312d3c7bf76944fb0227049d843ec03eb7ac38becdf96914add68c4bd
        ├── keys
        │   └── 59915de2daffd4312c28398c93877128c40602236aeb7face3e1540928923eb9
        └── snapshots
            └── 8c9370199d4cb588e1022944b526302a6161a10c04d782fe5daea8b18ff53cf6

14 directories, 19 files

We can tell at first glance that the restic directory consists exclusively of binary files. As a result, we are going to need the restic command line interface as well as the repository password that is used to encrypt our backup:

$ oc -n oadp-operator get secret/velero-restic-credentials -o yaml | grep "repository-password" | awk {'print $2'} | base64 -d > tmp.txt
$ restic --password-file=tmp.txt --repo inspect-backup/restic/my-persistent-application ls -l latest | tail
drwx------ 1000820000     0        0 2021-05-03 17:04:36 /userdata/pg_twophase
drwx------ 1000820000     0        0 2021-05-03 17:04:36 /userdata/pg_wal
-rw------- 1000820000     0 16777216 2021-05-03 17:36:03 /userdata/pg_wal/000000010000000000000001
drwx------ 1000820000     0        0 2021-05-03 17:04:36 /userdata/pg_wal/archive_status
drwx------ 1000820000     0        0 2021-05-03 17:04:36 /userdata/pg_xact
-rw------- 1000820000     0     8192 2021-05-03 17:35:57 /userdata/pg_xact/0000
-rw------- 1000820000     0       88 2021-05-03 17:04:36 /userdata/postgresql.auto.conf
-rw------- 1000820000     0    23066 2021-05-03 17:04:43 /userdata/postgresql.conf
-rw------- 1000820000     0       18 2021-05-03 17:30:57 /userdata/postmaster.opts
-rw------- 1000820000     0       98 2021-05-03 17:30:57 /userdata/postmaster.pid
$ rm -f tmp.txt

The result looks promising, which means we can be confident that our PostgreSQL data has been backed up successfully. Next, let’s have a look at the Kubernetes objects after extracting them:

$ gunzip inspect-backup/backups/backup-my-persistent-application/*.gz
$ tar -xf inspect-backup/backups/backup-my-persistent-application/*.tar -C inspect-backup/backups/backup-my-persistent-application/
$ tree inspect-backup/backups/backup-my-persistent-application/
inspect-backup/backups/backup-my-persistent-application/
├── backup-my-persistent-application-csi-volumesnapshotcontents.json
├── backup-my-persistent-application-csi-volumesnapshots.json
├── backup-my-persistent-application-logs
├── backup-my-persistent-application-podvolumebackups.json
├── backup-my-persistent-application-resource-list.json
├── backup-my-persistent-application.tar
├── backup-my-persistent-application-volumesnapshots.json
├── metadata
│   └── version
├── resources
│   ├── clusterserviceversions.operators.coreos.com
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── redhat-openshift-pipelines-operator.v1.2.3.json
│   │   └── v1alpha1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── redhat-openshift-pipelines-operator.v1.2.3.json
│   ├── configmaps
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── kube-root-ca.crt.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── kube-root-ca.crt.json
│   ├── customresourcedefinitions.apiextensions.k8s.io
│   │   ├── cluster
│   │   │   └── clusterserviceversions.operators.coreos.com.json
│   │   └── v1-preferredversion
│   │       └── cluster
│   │           └── clusterserviceversions.operators.coreos.com.json
│   ├── deploymentconfigs.apps.openshift.io
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql.json
│   ├── endpoints
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql.json
│   ├── endpointslices.discovery.k8s.io
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql-2pfdn.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql-2pfdn.json
│   ├── namespaces
│   │   ├── cluster
│   │   │   └── my-persistent-application.json
│   │   └── v1-preferredversion
│   │       └── cluster
│   │           └── my-persistent-application.json
│   ├── persistentvolumeclaims
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql.json
│   ├── persistentvolumes
│   │   ├── cluster
│   │   │   └── pvc-1f2b90c8-70d7-42ac-a25a-471185be5a37.json
│   │   └── v1-preferredversion
│   │       └── cluster
│   │           └── pvc-1f2b90c8-70d7-42ac-a25a-471185be5a37.json
│   ├── pods
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql-1-9plwt.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql-1-9plwt.json
│   ├── replicationcontrollers
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql-1.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql-1.json
│   ├── rolebindings.authorization.openshift.io
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       ├── admin.json
│   │   │       ├── edit.json
│   │   │       ├── system:deployers.json
│   │   │       ├── system:image-builders.json
│   │   │       └── system:image-pullers.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               ├── admin.json
│   │               ├── edit.json
│   │               ├── system:deployers.json
│   │               ├── system:image-builders.json
│   │               └── system:image-pullers.json
│   ├── rolebindings.rbac.authorization.k8s.io
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       ├── admin.json
│   │   │       ├── edit.json
│   │   │       ├── system:deployers.json
│   │   │       ├── system:image-builders.json
│   │   │       └── system:image-pullers.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               ├── admin.json
│   │               ├── edit.json
│   │               ├── system:deployers.json
│   │               ├── system:image-builders.json
│   │               └── system:image-pullers.json
│   ├── secrets
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       ├── builder-dockercfg-l72px.json
│   │   │       ├── builder-token-5vrf5.json
│   │   │       ├── builder-token-pvmrc.json
│   │   │       ├── default-dockercfg-xnqx5.json
│   │   │       ├── default-token-ch4bx.json
│   │   │       ├── default-token-gz6h2.json
│   │   │       ├── deployer-dockercfg-pnd7c.json
│   │   │       ├── deployer-token-84mnb.json
│   │   │       ├── deployer-token-r9s4l.json
│   │   │       ├── pipeline-dockercfg-w2855.json
│   │   │       ├── pipeline-token-qqbj4.json
│   │   │       ├── pipeline-token-zxnxb.json
│   │   │       ├── postgresql.json
│   │   │       └── postgresql-persistent-parameters-55bdh.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               ├── builder-dockercfg-l72px.json
│   │               ├── builder-token-5vrf5.json
│   │               ├── builder-token-pvmrc.json
│   │               ├── default-dockercfg-xnqx5.json
│   │               ├── default-token-ch4bx.json
│   │               ├── default-token-gz6h2.json
│   │               ├── deployer-dockercfg-pnd7c.json
│   │               ├── deployer-token-84mnb.json
│   │               ├── deployer-token-r9s4l.json
│   │               ├── pipeline-dockercfg-w2855.json
│   │               ├── pipeline-token-qqbj4.json
│   │               ├── pipeline-token-zxnxb.json
│   │               ├── postgresql.json
│   │               └── postgresql-persistent-parameters-55bdh.json
│   ├── serviceaccounts
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       ├── builder.json
│   │   │       ├── default.json
│   │   │       ├── deployer.json
│   │   │       └── pipeline.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               ├── builder.json
│   │               ├── default.json
│   │               ├── deployer.json
│   │               └── pipeline.json
│   ├── services
│   │   ├── namespaces
│   │   │   └── my-persistent-application
│   │   │       └── postgresql.json
│   │   └── v1-preferredversion
│   │       └── namespaces
│   │           └── my-persistent-application
│   │               └── postgresql.json
│   └── templateinstances.template.openshift.io
│       ├── namespaces
│       │   └── my-persistent-application
│       │       └── postgresql-persistent-d5c86.json
│       └── v1-preferredversion
│           └── namespaces
│               └── my-persistent-application
│                   └── postgresql-persistent-d5c86.json
└── velero-backup.json

98 directories, 91 files

In addition to quite a bit of metadata, the backup includes pretty much everything you can think of when it comes to Kubernetes objects: Services, ServiceAccounts, Secrets and much more. However, a backup that cannot be restored is just a waste of storage space – so let’s see if we can actually recover from a disaster.

Simulating a disaster and trying to recover

Let’s go ahead and simulate the complete destruction of our application, including the OpenShift-related configuration, secrets and logs. The easiest way to do that is to “accidentally” remove the OpenShift project:

$ oc delete project my-persistent-application
project.project.openshift.io "my-persistent-application" deleted
$ oc -n my-persistent-application get all
No resources found in my-persistent-application namespace.

To rule out that any bit survived our simulated outage, we are going to restore our backup into a completely new namespace called “my-restored-application”.

$ cat example-restore.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  namespace: oadp-operator
  name: restore
spec:
  backupName: backup-my-persistent-application
  restorePVs: true
  includedNamespaces:
    - my-persistent-application
  namespaceMapping:
    my-persistent-application: my-restored-application

After a few minutes, the restore operation completes. This can be observed in the OpenShift console:

Fig. 5: Overview of a completed restore in the OpenShift Console

And indeed, the most important OpenShift objects seem to be up and running:

$ oc -n my-restored-application get all
NAME                      READY   STATUS      RESTARTS   AGE
pod/postgresql-1-9hnjq    1/1     Running     0          65s
pod/postgresql-1-deploy   0/1     Completed   0          72s

NAME                                 DESIRED   CURRENT   READY   AGE
replicationcontroller/postgresql-1   1         1         1       72s

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/postgresql   ClusterIP   172.30.1.56   <none>        5432/TCP   62s

NAME                                            REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/postgresql   1          1         1         config,image(postgresql:10-el8)

The most important question to ask now is: did my data, and with it my business logic, survive the disaster? Let’s check by querying our database once again:

$ oc -n my-restored-application exec -it postgresql-1-9hnjq -- psql
psql (10.15)
Type "help" for help.

postgres=# \connect restore_me;
You are now connected to database "restore_me" as user "postgres".
restore_me=# \dt
        List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | me_as_well | table | postgres
(1 row)

restore_me=# SELECT * FROM me_as_well;
 id |    text
----+-------------
  1 | hello there
(1 row)

And yes, as expected, my OpenShift project as well as my data have been restored and can be accessed as usual.

A word on scale and compliance

The majority of the tasks outlined above were conducted manually. This might not scale too well if you are operating many clusters. Furthermore, forgetting to install the OADP operator after installing a cluster, regularly checking whether backups are in place, etc. can be time-consuming and error-prone. Here are two ideas that you could look into in order to address these challenges:

  • Red Hat OpenShift GitOps, which recently became generally available (GA), can be used to ensure that all of your clusters are configured in a similar fashion. For example, you could centrally store your custom resources for Velero, Backups, etc. and apply them to a fleet of clusters.
  • Red Hat Advanced Cluster Management for Kubernetes (RHACM) comes with a powerful policy engine that lets you define and enforce predefined as well as custom policies across your fleet of clusters. One way to make use of this would be to enforce that the OADP operator is installed and up to date. In addition, one could check that at least one `Backup` custom resource per project is defined.

The good thing about both approaches is that they are highly customizable, which means that it is possible to adjust the technology to your individual needs with very little effort.
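To make the second idea a bit more concrete, an RHACM policy asserting that an OADP operator subscription exists on every managed cluster could look roughly like the sketch below. All names, namespaces and the subscription details here are assumptions and would need to be adapted to your environment:

```yaml
# Hypothetical RHACM policy fragment: flag clusters where no OADP
# operator Subscription exists. Names and namespaces are illustrative.
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-oadp-operator-installed
  namespace: rhacm-policies
spec:
  remediationAction: inform   # "enforce" would let RHACM create the object itself
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: oadp-operator-subscription
        spec:
          severity: high
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: Subscription
                metadata:
                  name: oadp-operator
                  namespace: oadp-operator
```

Bound to a set of clusters via a PlacementRule and PlacementBinding, such a policy surfaces non-compliant clusters centrally instead of requiring manual checks.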

Summary and outlook

This article demonstrated that it is fairly simple to create and restore a full backup of a given OpenShift project. The OADP operator offers many more APIs that I did not cover but highly recommend looking into. Especially the `Schedule` API is key to making sure that backups are taken on a regular basis.
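A `Schedule` resource essentially wraps the backup specification we used earlier in a cron expression. As a hedged sketch (the name and cron expression are illustrative), a daily 1 a.m. backup of our example namespace could look like this:

```yaml
# Hypothetical daily schedule; Velero creates a Backup from the
# template each time the cron expression fires.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  namespace: oadp-operator
  name: daily-backup-my-persistent-application
spec:
  schedule: "0 1 * * *"        # every day at 01:00
  template:                    # same fields as a Backup spec
    defaultVolumesToRestic: true
    includedNamespaces:
      - my-persistent-application
```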

At the time of writing, OADP is exclusively available as a community-driven project. As a result, Red Hat customers cannot make use of the value of a Red Hat subscription when it comes to the OADP component.


To address this gap, Red Hat is working on a fully supported, and thus tested, maintained and hardened version of the OADP operator, which will be tightly integrated into OpenShift Container Platform. This offering is expected to be available in the first half of 2021, but for various reasons this is subject to change. [4]
