Forensic container checkpointing in OpenShift

September 11, 2023

Photo by Daniel van den Berg on Unsplash

On a warm summer day, I visited Kubernetes Community Days Munich and enjoyed Adrian Reber’s talk “Forensic container checkpointing and analysis”. Now I want to try that with OpenShift 4.13! This blog post mainly covers how to enable and use checkpointing on OpenShift 4.13. For all the details about forensic container checkpointing, read these two great blog posts by my colleague Adrian Reber:

On Tuesday, September 12th, at the hybrid conference ContainerDays in Hamburg, Adrian Reber will present his great talk again at 9:45 CEST on Stage K1. Don’t miss the talk on-site or virtually! I will add the recording to this blog post as soon as it is available.

Let’s start at the beginning.

What is “Forensic container checkpointing”? The important part here is “checkpointing”: in computing, checkpointing usually refers to saving the state of a system or an application (in our case, a container) at a particular point in time. The saved state can later be used for recovery after a failure, or for analysis and debugging.

The technical foundation is Checkpoint/Restore In Userspace (CRIU), which is integrated in runc, crun, CRI-O and containerd, which is to say in most container runtimes. In OpenShift, we use CRI-O and runc by default, but you can also switch to crun if you like. Two things have to be enabled: CRIU support in CRI-O, and the ContainerCheckpoint feature gate at the Kubelet level, which lets the Kubelet API create a checkpoint of a container.

The downside is that checkpointing is an alpha-level feature in both CRI-O and the Kubelet/Kubernetes. Additionally, enabling this feature in OpenShift means you lose support for the cluster, because the feature is not yet supported by Red Hat. In case you are interested, a Feature Request (RFE) is available:

Enabling checkpointing

Let’s get our hands dirty and change the OpenShift Cluster configuration to enable checkpointing:

Step 1: pause the machine config pools

We first need to pause the machine config pools so that all changes roll out at once, which avoids unnecessary node reboots:

$ oc patch mcp/{master,worker} --type merge -p '{"spec":{"paused": true}}'
patched
patched

Step 2: enable checkpointing at the CRI-O level for all worker nodes

To do this, we roll out an additional CRI-O configuration via a MachineConfig. To create the MachineConfig object, we use a tool called Butane. If you want to learn more about that, I recommend reading the documentation: Creating machine configs with Butane.

$ curl -L -O

$ cat 05-worker-enable-criu.bu
variant: openshift
version: 4.13.0
metadata:
  name: 05-worker-enable-criu
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
  - path: /etc/crio/crio.conf.d/05-enable-criu
    mode: 0644
    overwrite: true
    contents:
      inline: |
        [crio.runtime]
        enable_criu_support = true

$ butane 05-worker-enable-criu.bu -o 05-worker-enable-criu.yaml

$ cat 05-worker-enable-criu.yaml
# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-enable-criu
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            compression: ""
            source: data:,%5Bcrio.runtime%5D%0Aenable_criu_support%20%3D%20true%0A
          mode: 420
          overwrite: true
          path: /etc/crio/crio.conf.d/05-enable-criu

$ oc apply -f 05-worker-enable-criu.yaml
created
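
To double-check what the generated MachineConfig will actually write to the node, the percent-encoded `source` value can be decoded locally. This is only a minimal sketch: the helper name `decode_data_url` is my own (it is not part of Butane or oc), and it assumes bash, because it relies on `printf '%b'` expanding `\xHH` escapes.

```bash
#!/usr/bin/env bash
# decode_data_url is a hypothetical helper name (not part of Butane or oc).
# It decodes the percent-encoded payload of an Ignition "data:," URL.
decode_data_url() {
  local payload="${1#data:,}"       # strip the "data:," prefix
  printf '%b' "${payload//%/\\x}"   # rewrite %XX as \xXX and let printf expand it
}

# The source value from the generated 05-worker-enable-criu.yaml
decode_data_url 'data:,%5Bcrio.runtime%5D%0Aenable_criu_support%20%3D%20true%0A'
```

The decoded text is the CRI-O drop-in file: a `[crio.runtime]` section followed by `enable_criu_support = true`.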

Step 3: enable checkpointing at the Kubelet level

To do this, we have to enable the ContainerCheckpoint feature gate in an existing custom resource. If you want to learn more about that, I recommend reading the documentation: Enabling features using feature gates. We have two options to adjust the custom resource object, `oc edit` or `oc patch`:

Option 1) oc edit

oc edit featuregate/cluster

Edit the YAML and add or adjust:

spec:
  customNoUpgrade:
    enabled:
      - ContainerCheckpoint
  featureSet: CustomNoUpgrade

Option 2) oc patch

$ oc patch featuregate/cluster \
    --type='json' \
    --patch='[
        {"op": "add", "path": "/spec/featureSet", "value": "CustomNoUpgrade"},
        {"op": "add", "path": "/spec/customNoUpgrade", "value": {"enabled": ["ContainerCheckpoint"]}}
    ]'
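
If you ever enable more than one gate, writing the JSON patch by hand gets error-prone. Here is a small sketch that assembles the same kind of patch document as in Option 2. `feature_gate_patch` is a hypothetical helper name, and the gate names are passed through without any validation.

```bash
#!/usr/bin/env bash
# feature_gate_patch: hypothetical helper that prints a JSON patch which sets
# featureSet to CustomNoUpgrade and enables the gates given as arguments.
feature_gate_patch() {
  local gates="" gate
  for gate in "$@"; do
    gates="${gates:+$gates,}\"$gate\""   # build a comma-separated list of quoted names
  done
  printf '[{"op":"add","path":"/spec/featureSet","value":"CustomNoUpgrade"},'
  printf '{"op":"add","path":"/spec/customNoUpgrade","value":{"enabled":[%s]}}]\n' "$gates"
}

feature_gate_patch ContainerCheckpoint

# Possible usage (not run here):
#   oc patch featuregate/cluster --type=json -p "$(feature_gate_patch ContainerCheckpoint)"
```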

Step 4: unpause the Machine Config Pools and rollout all changes on the Nodes

Now we unpause the Machine Config Pools that we paused in the first step, which triggers the rollout of the CRI-O and Kubelet configuration on the Nodes.

$ oc patch mcp/{master,worker} --type merge -p '{"spec":{"paused": false}}'
patched
patched

Wait until all machine config pools report UPDATED True, UPDATING False and DEGRADED False. You can check with:

oc get mcp
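
Instead of eyeballing the table, the check can be scripted. The sketch below reads `oc get mcp` output on stdin and succeeds only when every pool is updated, not updating and not degraded. It assumes the default column order (NAME, CONFIG, UPDATED, UPDATING, DEGRADED, ...); `mcp_rollout_done` is a hypothetical helper name, and the sample rows are made up for illustration.

```bash
#!/usr/bin/env bash
# mcp_rollout_done: hypothetical helper; reads `oc get mcp` output on stdin.
# Assumes columns 3-5 are UPDATED, UPDATING and DEGRADED (default table layout).
mcp_rollout_done() {
  awk 'NR > 1 && !($3 == "True" && $4 == "False" && $5 == "False") { pending = 1 }
       END { exit pending + 0 }'
}

# Illustrative sample only; the config names are placeholders, not real output.
sample='NAME     CONFIG            UPDATED   UPDATING   DEGRADED
master   rendered-master   True      False      False
worker   rendered-worker   False     True       False'

if printf '%s\n' "$sample" | mcp_rollout_done; then
  echo "rollout finished"
else
  echo "rollout still in progress"
fi
```

Against a real cluster, a loop such as `until oc get mcp | mcp_rollout_done; do sleep 30; done` would wait for the rollout to finish.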


After a successful machine config rollout, let’s deploy our demo application, the counters app, and a checkpoint analyser helper to analyze the checkpoint created from it.

Deployment of the counters application

I created a git repository with all the deployment artifacts and application code:

Let’s create a new project and deploy the application:

oc new-project demo
oc apply -k

Wait until the application is running:

$ oc get pods -l app=counters
NAME                        READY   STATUS    RESTARTS   AGE
counters-857d7978fd-jnkck   1/1     Running   0          123m

Let’s fetch some information for later commands and tests:

# Get counter-app URL
export COUNTER_URL=$(oc get route/counters -o jsonpath="https://{.spec.host}")

# Get node where Pod is running
export NODE_NAME=$(oc get pods -l app=counters -o jsonpath="{.items[0].spec.nodeName}")

# Get pod name
export POD_NAME=$(oc get pods -l app=counters -o jsonpath="{.items[0].metadata.name}")
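
The later commands silently break if one of these variables is empty, for example when the Pod is not running yet. A minimal guard, assuming bash (it uses indirect expansion); `require_vars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# require_vars: hypothetical helper; fails if any named variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then      # ${!name} looks up the variable called $name
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars COUNTER_URL NODE_NAME POD_NAME \
  || echo "fetch the values again before continuing"
```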

Deployment of our checkpoint analyser helper pods

We deploy the checkpoint analyser in the same demo project as the counters application:

oc apply -k

Now we come to the checkpointing part

Again, in this blog post I’m focusing on the OpenShift part. Everything is now running, and we are ready to follow Adrian Reber’s blog post.

Run queries against the counters app to write a file and to store something in memory:

$ curl "${COUNTER_URL}/create?test-file"
counter: 0
$ curl "${COUNTER_URL}/secret?RANDOM_1432_KEY"
counter: 1
$ curl "${COUNTER_URL}/"
counter: 2

Let’s create the checkpoint through the OpenShift / Kubernetes API at the Kubelet:

$ export TOKEN=$(oc whoami -t)
$ curl -k -X POST --header "Authorization: Bearer $TOKEN"$NODE_NAME/proxy/checkpoint/demo/$POD_NAME/counter
{"items":["/var/lib/kubelet/checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar"]}
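
For reference, the Kubelet serves the checkpoint endpoint as `POST /checkpoint/{namespace}/{pod}/{container}`, and the API server can proxy to it under `/api/v1/nodes/{node}/proxy/`. The sketch below only assembles such a URL. `checkpoint_url` is a hypothetical helper, and the API server address is an assumption you have to supply yourself (for example from `oc whoami --show-server`), since it depends on your cluster.

```bash
#!/usr/bin/env bash
# checkpoint_url: hypothetical helper that builds the API-server proxy URL
# for the Kubelet checkpoint endpoint.
# Arguments: <api-server> <node> <namespace> <pod> <container>
checkpoint_url() {
  local api_server="${1%/}"   # drop a trailing slash, if any
  printf '%s/api/v1/nodes/%s/proxy/checkpoint/%s/%s/%s\n' \
    "$api_server" "$2" "$3" "$4" "$5"
}

# Placeholder values for illustration; substitute your own.
checkpoint_url "https://api.example.invalid:6443" "worker-0" "demo" \
  "counters-857d7978fd-jnkck" "counter"

# Possible usage with the variables exported earlier (not run here):
#   curl -k -X POST --header "Authorization: Bearer $TOKEN" \
#     "$(checkpoint_url "$(oc whoami --show-server)" "$NODE_NAME" demo "$POD_NAME" counter)"
```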

Now, finally, we have our checkpoint!

Keep in mind, a checkpoint contains everything from memory to filesystem and process information. If you create a checkpoint from an application with sensitive information in memory, you can easily export and discover that sensitive information! 

Let’s explore the checkpoint a bit.

Get the checkpoint analyser Pod that runs on the same node where we created the checkpoint:

$ export CHECKPOINT_POD_NAME=$(oc get pods -l -o jsonpath="{.items[?(@.spec.nodeName=='${NODE_NAME}')].metadata.name}")

“Log in” to the checkpoint analyser pod:

$ oc rsh $CHECKPOINT_POD_NAME

Now we are inside the checkpoint analyser pod and can explore the checkpoint:

sh-5.2# cd /checkpoints/
sh-5.2# ls
checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar
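
The file name itself already carries useful metadata. Judging from the example, it appears to follow the pattern `checkpoint-<pod>_<namespace>-<container>-<timestamp>.tar`. The sketch below splits it with a bash regular expression. `parse_checkpoint_name` is a hypothetical helper, and because namespace and container names may both contain hyphens, it leaves that part unsplit.

```bash
#!/usr/bin/env bash
# parse_checkpoint_name: hypothetical helper that splits a checkpoint archive name.
parse_checkpoint_name() {
  local name="${1##*/}"   # accept a full path, keep only the file name
  local re='^checkpoint-([^_]+)_(.+)-([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+Z)\.tar$'
  if [[ $name =~ $re ]]; then
    echo "pod=${BASH_REMATCH[1]}"
    echo "namespace-container=${BASH_REMATCH[2]}"
    echo "timestamp=${BASH_REMATCH[3]}"
  else
    echo "unrecognized checkpoint name: $name" >&2
    return 1
  fi
}

parse_checkpoint_name 'checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar'
```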

With the checkpointctl tool you can show some information:

sh-5.2# checkpointctl show checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar

Displaying container checkpoint data from checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11:24:18Z.tar

+-----------+-------+--------------+---------+--------------------------------+--------+----+------------+-------------------+
| CONTAINER | IMAGE |      ID      | RUNTIME |            CREATED             | ENGINE | IP | CHKPT SIZE | ROOT FS DIFF SIZE |
+-----------+-------+--------------+---------+--------------------------------+--------+----+------------+-------------------+
| counter   |       | b7fe1c786b7d | runc    | 2023-08-24T11:19:38.607090024Z | CRI-O  |    | 8.7 MiB    | 3.0 KiB           |
+-----------+-------+--------------+---------+--------------------------------+--------+----+------------+-------------------+

Let’s unpack the checkpoint and take a look inside. The archive contains the following files and directories:

  • bind.mounts – this file contains information about bind mounts and is needed during restore to mount all external files and directories at the right location
  • checkpoint/ – this directory contains the actual checkpoint as created by CRIU
  • config.dump and spec.dump – these files contain metadata about the container which is needed during restore
  • dump.log – this file contains the debug output of CRIU created during checkpointing
  • stats-dump – this file contains the data which is used by checkpointctl to display dump statistics (--print-stats)
  • rootfs-diff.tar – this file contains all changed files on the container’s file-system

sh-5.2# cd /tmp/
sh-5.2# mkdir checkpoint
sh-5.2# cd checkpoint/
sh-5.2# tar xf /checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar
sh-5.2# ls
bind.mounts  checkpoint  config.dump  dump.log  io.kubernetes.cri-o.LogPath  rootfs-diff.tar  spec.dump  stats-dump
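
Before digging in, it can help to confirm that an archive has the expected layout without unpacking it. Here is a minimal sketch that reads a `tar tf` listing on stdin and reports which of the entries described above are missing. `check_checkpoint_listing` is a hypothetical helper, and the expected set is simply taken from that list, so a checkpoint from another runtime or version may legitimately differ.

```bash
#!/usr/bin/env bash
# check_checkpoint_listing: hypothetical helper; reads `tar tf` output on stdin.
check_checkpoint_listing() {
  local top_level entry status=0
  # Reduce every path to its first component, e.g. ./checkpoint/pstree.img -> checkpoint
  top_level="$(sed -e 's|^\./||' -e 's|/.*$||' | sort -u)"
  for entry in bind.mounts checkpoint config.dump spec.dump dump.log stats-dump rootfs-diff.tar; do
    if ! printf '%s\n' "$top_level" | grep -qxF -- "$entry"; then
      echo "missing: $entry" >&2
      status=1
    fi
  done
  return "$status"
}

# Illustrative listing only (not real tar output); rootfs-diff.tar is left out on purpose.
printf '%s\n' bind.mounts checkpoint/ checkpoint/pstree.img config.dump \
  spec.dump dump.log stats-dump | check_checkpoint_listing \
  || echo "archive layout differs from the expected one"
```

With a real archive, the listing would come from `tar tf` on the checkpoint file, piped into the function.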

CRIU Image Tool (CRIT) is another tool for analyzing the CRIU images in the checkpoint/ directory.

sh-5.2# crit show checkpoint/pstree.img | jq '.entries[].pid'
sh-5.2# crit show checkpoint/core-1.img | jq '.entries[0].tc.comm'
"Python3"

Here is an important example. As mentioned above, the whole memory is also stored on disk, possibly including sensitive information. With our application, we stored a “secret” key, “RANDOM_1432_KEY”, in memory, and we can easily find it:

sh-5.2# ls checkpoint/pages-*
sh-5.2# grep -ao RANDOM_1432_KEY checkpoint/pages-*
RANDOM_1432_KEY
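
The same approach works as a quick audit before sharing a checkpoint with someone else. Below is a sketch that counts how often a fixed string occurs in the memory page images. `count_in_pages` is a hypothetical helper, it assumes GNU grep (`-a` treats binary files as text, `-o` prints each match on its own line), and the demo runs on a throwaway file rather than a real checkpoint.

```bash
#!/usr/bin/env bash
# count_in_pages: hypothetical helper; counts occurrences of a fixed string
# in the pages-* files of an unpacked checkpoint directory.
count_in_pages() {
  local dir="$1" pattern="$2"
  # -h suppresses file names so that only the matches themselves are counted
  grep -aohF -- "$pattern" "$dir"/pages-* 2>/dev/null | wc -l | tr -d ' '
}

# Throwaway demo data: a small binary file that contains the key twice.
demo_dir="$(mktemp -d)"
printf 'junk\0\1RANDOM_1432_KEY\0more\0RANDOM_1432_KEY\2' > "$demo_dir/pages-1.img"
count_in_pages "$demo_dir" RANDOM_1432_KEY
rm -rf "$demo_dir"
```

Inside the analyser pod, the call would be `count_in_pages checkpoint RANDOM_1432_KEY`.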

In case you want to debug your application with gdb, you can convert the checkpoint to a coredump:

sh-5.2# cd checkpoint/
sh-5.2# pwd
sh-5.2# coredump-python3
sh-5.2# echo info registers | gdb --core core.1 -q
BFD: warning: /tmp/checkpoint/checkpoint/core.1 has a segment extending past end of file

warning: malformed note - filename area is too big
[New LWP 1]
Missing separate debuginfo for the main executable file
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/3e/6eae34c82de9e112e48289c49532ee80ab3929

warning: Unexpected size of section `.reg-xstate/1' in core file.
Core was generated by `python3'.

warning: Unexpected size of section `.reg-xstate/1' in core file.
#0  0x00007f563e142937 in ?? ()
(gdb) rax            0xfffffffffffffffc  -4
rbx            0x1f4               500
rcx            0x7f563e142937      140008385423671
rdx            0x1f4               500
rsi            0x1                 1
rdi            0x7f563de4c6b0      140008382318256
rbp            0x4345886f1693      0x4345886f1693
rsp            0x7ffd7fbf3a68      0x7ffd7fbf3a68
r8             0x0                 0
r9             0x0                 0
r10            0x4345518d0200      73965000000000
r11            0x246               582
r12            0x7f563e7741c0      140008391918016
r13            0x7f563df226c0      140008383194816
r14            0x7f563e72dbf8      140008391629816
r15            0x7f563dc8bfc0      140008380481472
rip            0x7f563e142937      0x7f563e142937
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) sh-5.2#

Another option is to copy the checkpoint to your local machine and analyze it there:

$ oc cp $CHECKPOINT_POD_NAME:/checkpoints/checkpoint-counters-857d7978fd-jnkck_demo-counter-2023-08-24T11\:24\:18Z.tar checkpoint-counters.tar


Forensic analysis is just one among the various use cases of container checkpointing. Consider the following scenarios, and there are likely many more:

  • Long-Running Processes: Applications with prolonged processes or computations benefit from checkpointing. When a container needs temporary pausing or stopping, checkpointing allows for resuming from the interruption point. For instance, this is useful during node maintenance to apply operating system updates. Similarly, it enables starting a long-running process with higher priority and resuming the lower priority process after the higher priority task completes.
  • Backup and Recovery: Creating backups of running containers is critical for swift recovery in the event of hardware failures or crashes. These checkpoints can restore container states and data on alternative infrastructure, ensuring business continuity.
  • Pre-Warming and Caching: Another valuable application is pre-warming or caching an application’s startup. By initiating an application, creating a checkpoint, and then quickly starting from the checkpoint, the startup time can be significantly reduced. A proposal by Adrian Reber at the Open Container Initiative explores the idea of storing checkpoints for later startups and other use-cases. You can find the proposal here: OCI Proposal (still a work-in-progress).

While this concept is in its early stages, it’s exciting to witness the possibilities that lie ahead.
