Understanding kubectl debug from the Source Code: A Kubernetes Debugging Guide

18 min read · Dec 23, 2024

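In day-to-day Kubernetes operations and development, troubleshooting a running Pod is unavoidable. Traditional approaches such as kubectl exec fall short with minimal container images (for example, Distroless): the image may have no shell and none of the binaries you would reach for (nslookup, nc, awscli, and so on), so you cannot run meaningful tests from inside the container.

To address this, Kubernetes promoted ephemeral containers, the feature behind kubectl debug, to stable in v1.25. With it, users can inject an ephemeral container into a Pod without restarting the Pod and carry out deeper troubleshooting and diagnosis.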

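Features

The kubectl debug command offers a direct, interactive way to debug resources in a Kubernetes cluster. Its main capabilities are:

Workload debugging: create a copy of an existing Pod and modify its attributes as needed, for example changing the image tag to use a newer version.
Ephemeral container injection: add an ephemeral container that carries debugging tools to a running Pod, so you can troubleshoot without restarting the Pod.
Node debugging: create a new Pod on a node; the Pod runs in the node's host namespaces and can access the node's filesystem, which makes node-level problems easier to diagnose.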

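How It Works

At its core, kubectl debug relies on Kubernetes ephemeral containers. Ephemeral containers are added through the Pod's ephemeralcontainers subresource and, unlike regular containers, are meant for debugging and inspection rather than for running applications. They can be injected while the Pod is running, so you can carry out debugging tasks without disturbing the existing containers.

The following example shows this in practice.

First, create a Deployment that uses a distroless image.

I built this image with Node.js from the Dockerfile, package.json, and server.js below. It starts a Node.js server that listens on port 3000 and responds with Hello, Distroless Node.js!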

Dockerfile

FROM node:18 AS builder
WORKDIR /app
COPY /app/package.json .
RUN npm install
COPY /app/server.js .

# Stage 2: build the distroless image
FROM gcr.io/distroless/nodejs18
WORKDIR /app
COPY --from=builder /app /app
EXPOSE 3000
CMD ["server.js"]

package.json

{
    "name": "distroless-node-app",
    "version": "1.0.0",
    "main": "server.js",
    "scripts": {
      "start": "node server.js"
    },
    "dependencies": {
      "express": "^4.18.2"
    }
}

server.js

const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.send('Hello, Distroless Node.js!');
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});

Create the Deployment

$ echo "
apiVersion: apps/v1
kind: Deployment
metadata:
  name: distroless-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: distroless-image
  template:
    metadata:
      labels:
        app: distroless-image
    spec:
      containers:
      - name: distroless-image
        image: jim123820/distroless-node-app
        imagePullPolicy: Always
" | kubectl apply -f -
deployment.apps/distroless-deployment created

$ kubectl get pod
NAME                                     READY   STATUS    RESTARTS   AGE
distroless-deployment-654ffdd58c-6w9ld   1/1     Running   0          40m

Try to get a shell with kubectl exec

$ kubectl exec -it distroless-deployment-654ffdd58c-6w9ld -- /bin/bash
error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "95bfb48af9b9263bea1229d5c2dce7fd5d48f3404b241a9c7626f6ce1505c0bd": OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown

$ kubectl exec -it distroless-deployment-654ffdd58c-6w9ld -- /bin/sh
error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "03c9a894c5f156d550a9f54a11c3ee7a2a9979b02e9b9391eb5b055f4aaa611f": OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown

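Neither /bin/bash nor /bin/sh exists in the image, so there is no way to get a shell inside the container.

Next, let's use kubectl debug to start an ephemeral container and add it to the Pod.

Then use ss -nltp to see which ports are being listened on, and test the port with curl.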

$ kubectl debug distroless-deployment-654ffdd58c-6w9ld -it --image=nicolaka/netshoot
--profile=legacy is deprecated and will be removed in the future. It is recommended to explicitly specify a profile, for example "--profile=general".
Defaulting debug container name to debugger-5fb5j.
If you don't see a command prompt, try pressing enter.
                    dP            dP                           dP
                    88            88                           88
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP
Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.13


$ distroless-deployment-654ffdd58c-6w9ld  ~  ss -nltp
State            Recv-Q           Send-Q                     Local Address:Port                       Peer Address:Port           Process
LISTEN           0                511                                    *:3000                                  *:*

$ distroless-deployment-654ffdd58c-6w9ld  ~  curl localhost:3000
Hello, Distroless Node.js!#

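Because the ephemeral container shares the Pod's network namespace, ss and curl see the application's port 3000. This lets us observe how the original container is behaving and troubleshoot accordingly.

Besides adding an ephemeral container to a Pod, another important feature of kubectl debug is node debugging: the container runs in the host namespaces, and the host's filesystem is mounted at /host.

Why is this useful? If you cannot SSH into the host to inspect it, this gives you another way in. Whether you want to check running processes or collect logs from /var/log (available under /host/var/log inside the debug container), it helps a lot.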

$ kubectl get node
NAME                 STATUS   ROLES           AGE    VERSION
kind-control-plane   Ready    control-plane   3d5h   v1.32.0

$ kubectl debug node/kind-control-plane -it --image=busybox
Creating debugging pod node-debugger-kind-control-plane-tqhp8 with container debugger on node kind-control-plane.
If you don't see a command prompt, try pressing enter.
/ # ls
bin           etc           host          lib64         product_uuid  sys           usr
dev           home          lib           proc          root          tmp           var

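Source Code Analysis

To analyze what happens when kubectl debug runs, we can refer to the source file kubectl/pkg/cmd/debug/debug.go in the Kubernetes project.

Let's walk through the execution flow.

Flow

The main entry point of the debug command is the Run function: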

// Run executes a kubectl debug.
func (o *DebugOptions) Run(restClientGetter genericclioptions.RESTClientGetter, cmd *cobra.Command) error {
...
        // Dispatch to different handling logic based on the resource type:
        // a Pod goes to visitPod
        // a Node goes to visitNode
        switch obj := info.Object.(type) {
        case *corev1.Node:
            debugPod, containerName, visitErr = o.visitNode(ctx, obj)
        case *corev1.Pod:
            debugPod, containerName, visitErr = o.visitPod(ctx, obj)
        default:
            visitErr = fmt.Errorf("%q not supported by debug", info.Mapping.GroupVersionKind)
        }
...
    })
}

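visitPod handles a Pod target: depending on the options passed, it takes one of two paths and returns the result.

If the user specifies --copy-to (meaning the Pod should be copied and the copy debugged), it follows the "copy Pod" path.
If --copy-to is not specified, it follows the other path: adding an ephemeral container directly to the existing Pod.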

// visitPod handles debugging for pod targets by (depending on options):
//  1. Creating an ephemeral debug container in an existing pod, OR
//  2. Making a copy of pod with certain attributes changed
//
// visitPod returns a pod and debug container name for subsequent attach, if applicable.
func (o *DebugOptions) visitPod(ctx context.Context, pod *corev1.Pod) (*corev1.Pod, string, error) {
	if len(o.CopyTo) > 0 {
		return o.debugByCopy(ctx, pod)
	}
	return o.debugByEphemeralContainer(ctx, pod)
}
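
The branching in Run and visitPod comes down to two decisions: the kind of the target, and whether --copy-to was set. The following is a minimal, self-contained sketch of that decision. The type debugOptions, the function selectMode, and the mode strings are hypothetical names used only for illustration; the real code switches on *corev1.Node and *corev1.Pod values rather than on strings.

```go
package main

import "fmt"

// debugOptions is a hypothetical stand-in for the CopyTo field of kubectl's DebugOptions.
type debugOptions struct {
	CopyTo string // value of --copy-to; empty when the flag is not set
}

// selectMode mirrors the branching described above, using plain strings
// instead of typed API objects.
func selectMode(kind string, o debugOptions) (string, error) {
	switch kind {
	case "Node":
		return "node-debug-pod", nil
	case "Pod":
		if len(o.CopyTo) > 0 {
			return "pod-copy", nil
		}
		return "ephemeral-container", nil
	default:
		return "", fmt.Errorf("%q not supported by debug", kind)
	}
}

func main() {
	for _, kind := range []string{"Pod", "Node", "Service"} {
		mode, err := selectMode(kind, debugOptions{})
		fmt.Println(kind, mode, err)
	}
}
```

Any kind other than Pod or Node falls through to the default branch and returns an error, just as the switch in Run does.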

1. Copy mode (debugByCopy):

// debugByCopy runs a copy of the target Pod with a debug container added or an original container modified
func (o *DebugOptions) debugByCopy(ctx context.Context, pod *corev1.Pod) (*corev1.Pod, string, error) {
	// Generate a copy of the pod with the debug container added
	copied, dc, err := o.generatePodCopyWithDebugContainer(pod)
...
	// Create the new debug pod
	created, err := o.podClient.Pods(copied.Namespace).Create(ctx, copied, metav1.CreateOptions{})
...
	// If Replace is set, delete the original pod
	if o.Replace {
		err := o.podClient.Pods(pod.Namespace).Delete(ctx, pod.Name, *metav1.NewDeleteOptions(0))
		if err != nil {
			return nil, "", err
		}
	}
	return created, dc, nil
}
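
The important property of copy mode is that the original Pod object is never modified: the debug container goes into a separate, renamed copy. The sketch below illustrates only that idea, using hypothetical trimmed-down pod and container types and a hypothetical copyWithDebugContainer helper. It is not the logic of generatePodCopyWithDebugContainer, which handles many more fields.

```go
package main

import "fmt"

// container and pod are hypothetical, heavily trimmed stand-ins for the corev1 types.
type container struct {
	Name  string
	Image string
}

type pod struct {
	Name       string
	Containers []container
}

// copyWithDebugContainer returns a renamed copy of orig with debug appended.
// The container slice is copied first so that orig is left untouched.
func copyWithDebugContainer(orig pod, copyTo string, debug container) pod {
	containers := make([]container, len(orig.Containers), len(orig.Containers)+1)
	copy(containers, orig.Containers)
	return pod{Name: copyTo, Containers: append(containers, debug)}
}

func main() {
	orig := pod{
		Name:       "distroless-deployment-654ffdd58c-6w9ld",
		Containers: []container{{Name: "distroless-image", Image: "jim123820/distroless-node-app"}},
	}
	dbg := copyWithDebugContainer(orig, "distroless-debug", container{Name: "debugger", Image: "busybox"})
	fmt.Println(orig.Name, len(orig.Containers))
	fmt.Println(dbg.Name, len(dbg.Containers))
}
```

Here "distroless-debug" plays the role of the value passed to --copy-to; the name itself is made up for the example.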

2. Ephemeral container mode (debugByEphemeralContainer):

// debugByEphemeralContainer runs an EphemeralContainer in the target Pod for use as a debug container
func (o *DebugOptions) debugByEphemeralContainer(ctx context.Context, pod *corev1.Pod) (*corev1.Pod, string, error) {
...
	// Generate the debug container
	debugPod, debugContainer, err := o.generateDebugContainer(pod)
...
	// Create the patch
	patch, err := strategicpatch.CreateTwoWayMergePatch(podJS, debugJS, pod)

	// Apply the patch to the target pod
	pods := o.podClient.Pods(pod.Namespace)
	result, err := pods.Patch(ctx, pod.Name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "ephemeralcontainers")
...
	return result, debugContainer.Name, nil
}
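
Conceptually, the patch sent to the ephemeralcontainers subresource only needs to describe the new ephemeral container; the regular containers are left alone. The sketch below builds such a document with encoding/json. The container, podSpec, and pod types and the ephemeralPatch function are hypothetical simplifications, and the output is an illustration of the idea, not the exact bytes that strategicpatch.CreateTwoWayMergePatch produces (a real strategic merge patch may carry additional directives).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// container, podSpec and pod are hypothetical, trimmed stand-ins for the corev1 types.
type container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type podSpec struct {
	Containers          []container `json:"containers,omitempty"`
	EphemeralContainers []container `json:"ephemeralContainers,omitempty"`
}

type pod struct {
	Spec podSpec `json:"spec"`
}

// ephemeralPatch returns a JSON document that carries only the ephemeral
// containers of the modified pod: the existing ones plus the new debug container.
func ephemeralPatch(original pod, debug container) ([]byte, error) {
	merged := append(append([]container{}, original.Spec.EphemeralContainers...), debug)
	return json.Marshal(pod{Spec: podSpec{EphemeralContainers: merged}})
}

func main() {
	p := pod{Spec: podSpec{Containers: []container{
		{Name: "distroless-image", Image: "jim123820/distroless-node-app"},
	}}}
	patch, err := ephemeralPatch(p, container{Name: "debugger-5fb5j", Image: "nicolaka/netshoot"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
	// prints {"spec":{"ephemeralContainers":[{"name":"debugger-5fb5j","image":"nicolaka/netshoot"}]}}
}
```

Note that spec.containers does not appear in the output at all, which is why the application container keeps running undisturbed.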

Node debugging is implemented by the visitNode function:

// visitNode handles debugging for node targets by creating a privileged pod running in the host namespaces.
// Returns an already created pod and container name for subsequent attach, if applicable.
func (o *DebugOptions) visitNode(ctx context.Context, node *corev1.Node) (*corev1.Pod, string, error) {
	pods := o.podClient.Pods(o.Namespace)
	// Generate the pod used to debug the node
	debugPod, err := o.generateNodeDebugPod(node)
...
	// Create the debug pod
	newPod, err := pods.Create(ctx, debugPod, metav1.CreateOptions{})
...
	return newPod, newPod.Spec.Containers[0].Name, nil
}

The Pod itself is built in generateNodeDebugPod:

// generateNodeDebugPod generates a debugging pod that schedules on the specified node.
// The generated pod will run in the host PID, Network & IPC namespaces, and it will have the node's filesystem mounted at /host.
func (o *DebugOptions) generateNodeDebugPod(node *corev1.Node) (*corev1.Pod, error) {
...
	// The name of the debugging pod is based on the target node, and it's not configurable to
	// limit the number of command line flags. There may be a collision on the name, but this
	// should be rare enough that it's not worth the API round trip to check.
	pn := fmt.Sprintf("node-debugger-%s-%s", node.Name, nameSuffixFunc(5))
	if !o.Quiet {
		fmt.Fprintf(o.Out, "Creating debugging pod %s with container %s on node %s.\n", pn, cn, node.Name)
	}

	p := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: pn,
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{
					Name:                     cn,
					Env:                      o.Env,
					Image:                    o.Image,
					ImagePullPolicy:          o.PullPolicy,
					Stdin:                    o.Interactive,
					TerminationMessagePolicy: corev1.TerminationMessageReadFile,
					TTY:                      o.TTY,
				},
			},
			// Pin the pod to the target node
			NodeName: node.Name,
			// Never restart the pod
			RestartPolicy: corev1.RestartPolicyNever,
			// Tolerate every taint so the pod can run on the node
			Tolerations: []corev1.Toleration{
				{
					Operator: corev1.TolerationOpExists,
				},
			},
		},
	}
...
	return p, nil
}
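
The naming scheme explains the Pod name seen earlier, node-debugger-kind-control-plane-tqhp8: a fixed prefix, the node name, and a five-character random suffix. The sketch below reproduces that format. The randomSuffix helper and its alphabet are assumptions made for illustration; in kubectl the suffix comes from nameSuffixFunc, whose implementation is not shown in this article.

```go
package main

import (
	"fmt"
	"math/rand"
)

// suffixAlphabet is an assumed character set, chosen only for this sketch.
const suffixAlphabet = "bcdfghjklmnpqrstvwxz2456789"

// randomSuffix is a hypothetical replacement for nameSuffixFunc: it returns
// n random characters drawn from suffixAlphabet.
func randomSuffix(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = suffixAlphabet[rand.Intn(len(suffixAlphabet))]
	}
	return string(b)
}

// nodeDebugPodName uses the same format string as generateNodeDebugPod.
func nodeDebugPodName(nodeName string, suffix func(int) string) string {
	return fmt.Sprintf("node-debugger-%s-%s", nodeName, suffix(5))
}

func main() {
	fmt.Println(nodeDebugPodName("kind-control-plane", randomSuffix))
}
```

Passing the suffix generator as a parameter keeps the function easy to test with a fixed suffix, and it makes the trade-off mentioned in the source comment visible: two calls can collide, but with a random five-character suffix that should be rare.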

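Conclusion

kubectl debug is a powerful, flexible debugging tool that offers three modes for different scenarios:

Ephemeral container mode: suited to quick diagnosis
Pod copy mode: suited to deeper testing
Node debugging mode: suited to node-level problems

Understanding how it works internally and how each mode is used lets us apply the tool more effectively to problems in a Kubernetes cluster. The key is to choose the mode that fits the problem at hand and to follow best practices so that debugging stays efficient and safe.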

References

kubectl debug source code: https://github.com/kubernetes/kubectl/blob/master/pkg/cmd/debug/debug.go
Kubernetes Enhancement Proposals (KEPs)

I hope this deep dive helps you understand and use kubectl debug. If you have questions or suggestions, feel free to discuss them in the comments.