Pre-pulling k8s images

#k8s

I was surprised to learn that Kubernetes doesn’t include a built-in mechanism to pre-pull images before initiating a deployment rollout. In my mind, it makes perfect sense to pull the image before any other action.

Of course, this choice aligns with the Kubernetes philosophy of managing stateless, ephemeral workloads where multiple replicas and nodes are expected to mitigate the impact of pod startup times.

However, in real-world scenarios, conditions are often less than ideal. I’m sure this won’t be the last time I encounter a single-replica application with frequent updates that needs the fastest possible pod startup time.

Solutions

I’ve seen people run DaemonSets to pre-pull the images, use pull-through cache registries, or even deploy operators that manage caches of multiple images.
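
For the cache-registry route, the idea is to point each node’s container runtime at a pull-through cache so pulls hit a nearby mirror instead of the upstream registry. Here’s a rough sketch for containerd; the mirror address registry-cache.internal:5000 is a placeholder, and this assumes containerd’s registry config_path is set to /etc/containerd/certs.d:

# Sketch: register a pull-through cache for Docker Hub with containerd.
# "registry-cache.internal:5000" is a placeholder, not a real mirror.
sudo mkdir -p /etc/containerd/certs.d/docker.io
sudo tee /etc/containerd/certs.d/docker.io/hosts.toml <<'EOF' >/dev/null
server = "https://registry-1.docker.io"

[host."https://registry-cache.internal:5000"]
  capabilities = ["pull", "resolve"]
EOF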

Naive implementation using a DaemonSet

Now, I want to demonstrate how to implement an image pre-puller using a DaemonSet, mainly because it’s a fun solution to try out!

I’ll also track the startup time of the pods to see how effective this approach can be.

Test deployment

Our initial deployment consists of 3 replicas. At startup, each replica generates a file at /tmp/start.txt containing the current timestamp to track the startup time. We’ll change the image later on.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-deployment
  name: my-deployment
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-deployment
  template:
    metadata:
      labels:
        app: my-deployment
    spec:
      containers:
        - name: app
          image: busybox:latest
          command:
            - /usr/bin/env
          args:
            - sh
            - -c
            - date +%s > /tmp/start.txt; while true; do sleep 10; done

Let’s apply the deployment and check the startup time.

$ kubectl apply -f deployment.yaml

To check the startup time, I’ll use the following bash script (check-startup-time.sh):

#!/usr/bin/env bash
set -eu -o pipefail

namespace="default"
pod_selector="app=my-deployment"

# For each pod: compare the timestamp the container wrote at startup with
# the pod's creationTimestamp as reported by the API server.
kubectl get pods \
    --namespace "$namespace" \
    --selector "$pod_selector" \
    -o jsonpath='{range .items[*]}{.metadata.name} {.spec.containers[0].image} {.spec.nodeName} {.metadata.creationTimestamp}{"\n"}{end}' |
    while read -r pod img node ts; do
        startTimestamp="$(kubectl exec --namespace "$namespace" "$pod" -- head -n 1 /tmp/start.txt)"
        creationTimestamp="$(date --date="$ts" +'%s')"
        echo "Pod '$pod' ($img) on node '$node' started in $((startTimestamp - creationTimestamp)) seconds"
    done

If we run the script, we get the following:

$ ./check-startup-time.sh
Pod 'my-deployment-db7577cf4-srjpf' (busybox:latest) on node 'worker01' started in 4 seconds
Pod 'my-deployment-db7577cf4-vgnbd' (busybox:latest) on node 'worker02' started in 3 seconds
Pod 'my-deployment-db7577cf4-wjz22' (busybox:latest) on node 'worker02' started in 4 seconds

Of course, the busybox image is pretty lightweight:

$ crictl --image-endpoint unix:///run/containerd/containerd.sock images | grep busy
docker.io/library/busybox           latest              27a71e19c9562       2.17MB

DaemonSet pre-puller

Let’s also create the DaemonSet that will be in charge of pre-pulling the images, using the same busybox image for the moment.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pre-pull
  namespace: default
spec:
  selector:
    matchLabels:
      name: pre-pull
  template:
    metadata:
      labels:
        name: pre-pull
    spec:
      # The init container's only job is to trigger the pull: with
      # imagePullPolicy: Always the kubelet contacts the registry on every
      # pod start, and `env true` exits immediately once the image is there.
      initContainers:
        - name: pre-pull
          image: busybox:latest
          imagePullPolicy: Always
          command: ["/usr/bin/env"]
          args: ["true"]

      containers:
        # Pause container to keep it running with the lowest resource consumption possible
        - name: pause
          image: gcr.io/google-containers/pause:latest
      tolerations:
        # these tolerations are to have the daemonset runnable on control plane nodes
        # remove them if your control plane nodes should not run pods
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      terminationGracePeriodSeconds: 5

Apply it:

$ kubectl apply -f pre-pull-ds.yaml
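
As a quick sanity check, we can verify that a pre-pull pod is running on every node:

$ kubectl get pods --selector name=pre-pull -o wide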

Deploying a heavier image

Now, let’s test with a heavier image. Imagine we’re deploying a new version of a critical single-replica app that needs to be up and running as fast as possible.

To simulate this, I’ll update our deployment’s image:

$ kubectl set image deployment my-deployment app=archlinux:multilib-devel

After the rollout completes, check the startup times:

$ ./check-startup-time.sh
Pod 'my-deployment-7f44948958-95qh9' (archlinux:multilib-devel) on node 'worker01' started in 3 seconds
Pod 'my-deployment-7f44948958-db988' (archlinux:multilib-devel) on node 'worker02' started in 41 seconds
Pod 'my-deployment-7f44948958-mmv8q' (archlinux:multilib-devel) on node 'worker01' started in 32 seconds

As you can see, the startup time increased by roughly 10x on the nodes where the image wasn’t already pulled. And this is with “only” 322MB; I’ve seen worse ;)

$ crictl --image-endpoint unix:///run/containerd/containerd.sock images | grep archlinux
docker.io/library/archlinux         multilib-devel      e6ea8b8396eac       322MB

Trying the pre-puller DaemonSet

First, let’s clean up the environment.

Reset the deployment’s image:

$ kubectl set image deployment my-deployment app=busybox:latest

Then, delete the image to make sure it’s not cached anymore. Note that crictl talks to the local container runtime, so this has to be run on each worker node:

$ crictl --image-endpoint unix:///run/containerd/containerd.sock rmi docker.io/library/archlinux:multilib-devel
Deleted: docker.io/library/archlinux:multilib-devel
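
With more than a couple of nodes, a small loop over SSH saves some typing. A quick sketch, assuming SSH access to the nodes and crictl installed on each of them (the node names match my test cluster):

# Sketch: delete the image on every worker node over SSH.
# Assumes SSH access; worker01/worker02 are my test cluster's nodes.
for node in worker01 worker02; do
    ssh "$node" crictl --image-endpoint unix:///run/containerd/containerd.sock \
        rmi docker.io/library/archlinux:multilib-devel
done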

Next, we’ll run a set of commands that could easily be part of a CI pipeline.

# Update the pre-pull DS with the new image
$ kubectl set image daemonset pre-pull pre-pull=archlinux:multilib-devel
daemonset.apps/pre-pull image updated
# Wait for the rollout to complete
$ kubectl rollout status daemonset pre-pull --timeout 120s
Waiting for daemon set "pre-pull" rollout to finish: 1 out of 3 new pods have been updated...
Waiting for daemon set "pre-pull" rollout to finish: 1 out of 3 new pods have been updated...
Waiting for daemon set "pre-pull" rollout to finish: 2 out of 3 new pods have been updated...
Waiting for daemon set "pre-pull" rollout to finish: 2 out of 3 new pods have been updated...
Waiting for daemon set "pre-pull" rollout to finish: 2 of 3 updated pods are available...
daemon set "pre-pull" successfully rolled out
# Update our deployment with the new image
$ kubectl set image deployment my-deployment app=archlinux:multilib-devel
deployment.apps/my-deployment image updated
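
Bundled into a script, the whole sequence becomes a single pipeline step. Here’s a minimal sketch (the script name is mine; it assumes the pre-pull DaemonSet and my-deployment from this post):

#!/usr/bin/env bash
# pre-pull-and-deploy.sh -- sketch of a pipeline step; assumes the
# 'pre-pull' DaemonSet and 'my-deployment' Deployment defined above.
set -eu -o pipefail

image="${1:?usage: $0 <image>}"

# Pre-pull the new image on every node via the DaemonSet
kubectl set image daemonset pre-pull pre-pull="$image"
kubectl rollout status daemonset pre-pull --timeout 120s

# The image is now cached everywhere; roll out the actual deployment
kubectl set image deployment my-deployment app="$image"
kubectl rollout status deployment my-deployment --timeout 120s

Invoked as, for example: ./pre-pull-and-deploy.sh archlinux:multilib-devel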

Finally, check the startup time of the pods.

$ ./check-startup-time.sh
Pod 'my-deployment-7f44948958-w4fbn' (archlinux:multilib-devel) on node 'worker01' started in 5 seconds
Pod 'my-deployment-7f44948958-wjjcs' (archlinux:multilib-devel) on node 'worker01' started in 3 seconds
Pod 'my-deployment-7f44948958-wvpck' (archlinux:multilib-devel) on node 'worker02' started in 3 seconds

Much better! =)