들어가며

오랜만에 다시 스터디를 시작합니다. 이번에도 CloudNet@ 팀에서 진행하는 스터디로, 고맙게도 참여할 수 있게 되었습니다. 이번 스터디는 Cilium 공식 문서를 기반으로 실습해보는 스터디입니다. Cilium 한 가지 주제로 진행되는 만큼 깊고 진하게 학습할 수 있을 것 같아 기대가 됩니다.

첫 주차에는 실습 환경을 구성하고 Cilium을 설치하는 방법을 알아보겠습니다.


실습 환경 구성

실습 환경 구성 준비

저는 macOS를 사용하고 있기 때문에 Homebrew를 이용하여 VirtualBox와 Vagrant를 설치하였습니다.

  • VirtualBox 설치
$ brew install --cask virtualbox
# => 🍺  virtualbox was successfully installed!

$ VBoxManage --version
# => 7.1.10r169112
  • Vagrant 설치
$ brew install --cask vagrant
# => 🍺  vagrant was successfully installed!

$ vagrant version
# => Installed Version: 2.4.7
#    Latest Version: 2.4.7
#    
#    You're running an up-to-date version of Vagrant!

실습 환경 소개

실습 환경을 도식화하면 다음과 같습니다.

img.png

  • 배포되는 가상 머신은 컨트롤플레인 k8s-ctr과 워커노드 k8s-w1, k8s-w2로 구성되어 있습니다.
    • eth0 : 10.0.2.15 (모든 노드가 동일)
    • eth1 : 192.168.10.100~102
  • 초기 프로비저닝 시 kubeadm init과 kubeadm join을 실행하여 클러스터를 구성하며, 초기에는 CNI가 설치되어 있지 않습니다.

실습 환경 배포 파일 작성

Vagrantfile

  • 가상 머신을 정의하고 부팅 시 실행할 프로비저닝을 설정합니다.
# Variables
K8SV = '1.33.2-1.1' # Kubernetes Version : apt list -a kubelet , ex) 1.32.5-1.1
CONTAINERDV = '1.7.27-1' # Containerd Version : apt list -a containerd.io , ex) 1.6.33-1
N = 2 # max number of worker nodes

# Base Image  https://portal.cloud.hashicorp.com/vagrant/discover/bento/ubuntu-24.04
## Rocky linux Image https://portal.cloud.hashicorp.com/vagrant/discover/rockylinux
BOX_IMAGE = "bento/ubuntu-24.04"
BOX_VERSION = "202502.21.0"

Vagrant.configure("2") do |config|
#-ControlPlane Node
    config.vm.define "k8s-ctr" do |subconfig|
      subconfig.vm.box = BOX_IMAGE
      subconfig.vm.box_version = BOX_VERSION
      subconfig.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--groups", "/Cilium-Lab"]
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
        vb.name = "k8s-ctr"
        vb.cpus = 2
        vb.memory = 2048
        vb.linked_clone = true
      end
      subconfig.vm.host_name = "k8s-ctr"
      subconfig.vm.network "private_network", ip: "192.168.10.100"
      subconfig.vm.network "forwarded_port", guest: 22, host: 60000, auto_correct: true, id: "ssh"
      subconfig.vm.synced_folder "./", "/vagrant", disabled: true
      subconfig.vm.provision "shell", path: "init_cfg.sh", args: [ K8SV, CONTAINERDV]
      subconfig.vm.provision "shell", path: "k8s-ctr.sh", args: [ N ]
    end

#-Worker Nodes Subnet1
  (1..N).each do |i|
    config.vm.define "k8s-w#{i}" do |subconfig|
      subconfig.vm.box = BOX_IMAGE
      subconfig.vm.box_version = BOX_VERSION
      subconfig.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--groups", "/Cilium-Lab"]
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
        vb.name = "k8s-w#{i}"
        vb.cpus = 2
        vb.memory = 1536
        vb.linked_clone = true
      end
      subconfig.vm.host_name = "k8s-w#{i}"
      subconfig.vm.network "private_network", ip: "192.168.10.10#{i}"
      subconfig.vm.network "forwarded_port", guest: 22, host: "6000#{i}", auto_correct: true, id: "ssh"
      subconfig.vm.synced_folder "./", "/vagrant", disabled: true
      subconfig.vm.provision "shell", path: "init_cfg.sh", args: [ K8SV, CONTAINERDV]
      subconfig.vm.provision "shell", path: "k8s-w.sh"
    end
  end
end
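  • 작성한 Vagrantfile은 vagrant up 전에 vagrant validate 명령으로 문법 오류를 점검해볼 수 있습니다. (Vagrantfile이 있는 디렉터리에서 실행한다는 가정의 간단한 예시입니다.)
# Vagrantfile 문법 검증
$ vagrant validate
# => Vagrantfile validated successfully.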

init_cfg.sh

  • 프로비저닝 시 Vagrant가 실행할 초기 설정 스크립트입니다. 인자로 Kubernetes 버전과 containerd 버전 등을 받아서 설치합니다.
#!/usr/bin/env bash

echo ">>>> Initial Config Start <<<<"

echo "[TASK 1] Setting Profile & Change Timezone"
echo 'alias vi=vim' >> /etc/profile
echo "sudo su -" >> /home/vagrant/.bashrc
ln -sf /usr/share/zoneinfo/Asia/Seoul /etc/localtime

echo "[TASK 2] Disable AppArmor"
systemctl stop ufw && systemctl disable ufw >/dev/null 2>&1
systemctl stop apparmor && systemctl disable apparmor >/dev/null 2>&1

echo "[TASK 3] Disable and turn off SWAP"
swapoff -a && sed -i '/swap/s/^/#/' /etc/fstab

echo "[TASK 4] Install Packages"
apt update -qq >/dev/null 2>&1
apt-get install apt-transport-https ca-certificates curl gpg -y -qq >/dev/null 2>&1

# Download the public signing key for the Kubernetes package repositories.
mkdir -p -m 755 /etc/apt/keyrings
K8SMMV=$(echo $1 | sed -En 's/^([0-9]+\.[0-9]+)\..*/\1/p')
curl -fsSL https://pkgs.k8s.io/core:/stable:/v$K8SMMV/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$K8SMMV/deb/ /" >> /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

# packets traversing the bridge are processed by iptables for filtering
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf

# enable br_netfilter for iptables 
modprobe br_netfilter
modprobe overlay
echo "br_netfilter" >> /etc/modules-load.d/k8s.conf
echo "overlay" >> /etc/modules-load.d/k8s.conf

echo "[TASK 5] Install Kubernetes components (kubeadm, kubelet and kubectl)"
# Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version
apt update >/dev/null 2>&1

# apt list -a kubelet ; apt list -a containerd.io
apt-get install -y kubelet=$1 kubectl=$1 kubeadm=$1 containerd.io=$2 >/dev/null 2>&1
apt-mark hold kubelet kubeadm kubectl >/dev/null 2>&1

# containerd configure to default and cgroup managed by systemd
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml

# avoid WARN&ERRO(default endpoints) when crictl run  
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF

# ready to install for k8s 
systemctl restart containerd && systemctl enable containerd
systemctl enable --now kubelet

echo "[TASK 6] Install Packages & Helm"
apt-get install -y bridge-utils sshpass net-tools conntrack ngrep tcpdump ipset arping wireguard jq tree bash-completion unzip kubecolor >/dev/null 2>&1
curl -s https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash >/dev/null 2>&1

echo ">>>> Initial Config End <<<<"

k8s-ctr.sh

  • kubeadm init으로 컨트롤플레인을 구성하고, 편의를 위한 k, kc 등의 alias를 설정합니다.
#!/usr/bin/env bash

echo ">>>> K8S Controlplane config Start <<<<"

echo "[TASK 1] Initial Kubernetes"
kubeadm init --token 123456.1234567890123456 --token-ttl 0 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/16 --apiserver-advertise-address=192.168.10.100 --cri-socket=unix:///run/containerd/containerd.sock >/dev/null 2>&1


echo "[TASK 2] Setting kube config file"
mkdir -p /root/.kube
cp -i /etc/kubernetes/admin.conf /root/.kube/config
chown $(id -u):$(id -g) /root/.kube/config


echo "[TASK 3] Source the completion"
echo 'source <(kubectl completion bash)' >> /etc/profile
echo 'source <(kubeadm completion bash)' >> /etc/profile


echo "[TASK 4] Alias kubectl to k"
echo 'alias k=kubectl' >> /etc/profile
echo 'alias kc=kubecolor' >> /etc/profile
echo 'complete -F __start_kubectl k' >> /etc/profile


echo "[TASK 5] Install Kubectx & Kubens"
git clone https://github.com/ahmetb/kubectx /opt/kubectx >/dev/null 2>&1
ln -s /opt/kubectx/kubens /usr/local/bin/kubens
ln -s /opt/kubectx/kubectx /usr/local/bin/kubectx


echo "[TASK 6] Install Kubeps & Setting PS1"
git clone https://github.com/jonmosco/kube-ps1.git /root/kube-ps1 >/dev/null 2>&1
cat <<"EOT" >> /root/.bash_profile
source /root/kube-ps1/kube-ps1.sh
KUBE_PS1_SYMBOL_ENABLE=true
function get_cluster_short() {
  echo "$1" | cut -d . -f1
}
KUBE_PS1_CLUSTER_FUNCTION=get_cluster_short
KUBE_PS1_SUFFIX=') '
PS1='$(kube_ps1)'$PS1
EOT
kubectl config rename-context "kubernetes-admin@kubernetes" "HomeLab" >/dev/null 2>&1


echo "[TASK 6] Install Kubeps & Setting PS1"
echo "192.168.10.100 k8s-ctr" >> /etc/hosts
for (( i=1; i<=$1; i++  )); do echo "192.168.10.10$i k8s-w$i" >> /etc/hosts; done


echo ">>>> K8S Controlplane Config End <<<<"

k8s-w.sh

  • 워커노드에서 kubeadm join을 실행하여 컨트롤플레인에 조인합니다.
#!/usr/bin/env bash

echo ">>>> K8S Node config Start <<<<"

echo "[TASK 1] K8S Controlplane Join" 
kubeadm join --token 123456.1234567890123456 --discovery-token-unsafe-skip-ca-verification 192.168.10.100:6443  >/dev/null 2>&1

echo ">>>> K8S Node config End <<<<"

실습 환경 배포

  • 실습 환경 배포를 위한 파일이 준비되었으니 vagrant up 명령을 이용하여 가상 머신을 배포하겠습니다.
$ vagrant up
# => Bringing machine 'k8s-ctr' up with 'virtualbox' provider...
#    Bringing machine 'k8s-w1' up with 'virtualbox' provider...
#    Bringing machine 'k8s-w2' up with 'virtualbox' provider...
#    ==> k8s-ctr: Box 'bento/ubuntu-24.04' could not be found. Attempting to find and install...
#        k8s-ctr: Box Provider: virtualbox
#        k8s-ctr: Box Version: 202502.21.0
#    ==> k8s-ctr: Loading metadata for box 'bento/ubuntu-24.04'
#        k8s-ctr: URL: https://vagrantcloud.com/api/v2/vagrant/bento/ubuntu-24.04
#    ==> k8s-ctr: Adding box 'bento/ubuntu-24.04' (v202502.21.0) for provider: virtualbox (arm64)
#        k8s-ctr: Downloading: https://vagrantcloud.com/bento/boxes/ubuntu-24.04/versions/202502.21.0/providers/virtualbox/arm64/vagrant.box
#    ==> k8s-ctr: Successfully added box 'bento/ubuntu-24.04' (v202502.21.0) for 'virtualbox (arm64)'!
#    ==> k8s-ctr: Preparing master VM for linked clones...
#    ...
#        k8s-w2: >>>> K8S Node config End <<<<
  • 배포 후 각 노드에 ssh로 접속하여 ip를 확인해 보겠습니다.
$ for i in ctr w1 w2 ; do echo ">> node : k8s-$i <<"; vagrant ssh k8s-$i -c 'ip -c -4 addr show dev eth0'; echo; done #
# => >> node : k8s-ctr <<
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        altname enp0s8
#        inet 10.0.2.15/24 metric 100 brd 10.0.2.255 scope global dynamic eth0
#           valid_lft 85500sec preferred_lft 85500sec
#    
#    >> node : k8s-w1 <<
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        altname enp0s8
#        inet 10.0.2.15/24 metric 100 brd 10.0.2.255 scope global dynamic eth0
#           valid_lft 85707sec preferred_lft 85707sec
#    
#    >> node : k8s-w2 <<
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        altname enp0s8
#        inet 10.0.2.15/24 metric 100 brd 10.0.2.255 scope global dynamic eth0
#           valid_lft 85781sec preferred_lft 85781sec
  • k8s-ctr 노드에 접속하여 기본 정보를 확인해 보겠습니다.
$ vagrant ssh k8s-ctr
---
# => Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-53-generic aarch64)
#    ...
#    (⎈|HomeLab:N/A) root@k8s-ctr:~#
$ whoami
# => root
$ pwd
# => /root
$ hostnamectl
# =>  Static hostname: k8s-ctr
#           Icon name: computer-vm
#             Chassis: vm
#          Machine ID: 3d6bd65db7dd43d392b2d5229abb5654
#             Boot ID: 2d9ede04fd294425988e58c588dd201c
#      Virtualization: qemu
#    Operating System: Ubuntu 24.04.2 LTS
#              Kernel: Linux 6.8.0-53-generic
#        Architecture: arm64
$ htop

$ cat /etc/hosts
# => 127.0.0.1 localhost
#    127.0.1.1 vagrant
#    ...
#    127.0.2.1 k8s-ctr k8s-ctr
#    192.168.10.100 k8s-ctr
#    192.168.10.101 k8s-w1
#    192.168.10.102 k8s-w2
$ ping -c 1 k8s-w1
# => PING k8s-w1 (192.168.10.101) 56(84) bytes of data.
#    64 bytes from k8s-w1 (192.168.10.101): icmp_seq=1 ttl=64 time=0.795 ms
#    
#    --- k8s-w1 ping statistics ---
#    1 packets transmitted, 1 received, 0% packet loss, time 0ms
#    rtt min/avg/max/mdev = 0.795/0.795/0.795/0.000 ms
$ ping -c 1 k8s-w2
# => PING k8s-w2 (192.168.10.102) 56(84) bytes of data.
#    64 bytes from k8s-w2 (192.168.10.102): icmp_seq=1 ttl=64 time=1.20 ms
#    ...
$ sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-w1 hostname
# => k8s-w1
$ sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-w2 hostname
# => k8s-w2

# vagrant ssh 로 접속 시 tcp 연결 정보 : NAT Mode 10.0.2.2(Gateway)
$ ss -tnp |grep sshd
# => ESTAB 0      0           [::ffff:10.0.2.15]:22          [::ffff:10.0.2.2]:63578 users:(("sshd",pid=5141,fd=4),("sshd",pid=5094,fd=4))

# nic 정보
$ ip -c addr
# => 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
#       ...
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        link/ether 08:00:27:71:19:d8 brd ff:ff:ff:ff:ff:ff
#        altname enp0s8
#        inet 10.0.2.15/24 metric 100 brd 10.0.2.255 scope global dynamic eth0
#           valid_lft 82445sec preferred_lft 82445sec
#    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        link/ether 08:00:27:da:24:93 brd ff:ff:ff:ff:ff:ff
#        altname enp0s9
#        inet 192.168.10.100/24 brd 192.168.10.255 scope global eth1
#           valid_lft forever preferred_lft forever

# default 라우팅 정보 
$ ip -c route
# => default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.100

# dns 서버 정보 : NAT Mode 10.0.2.3
$ resolvectl
# => Global
#             Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
#      resolv.conf mode: stub
#    
#    Link 2 (eth0)
#        Current Scopes: DNS
#             Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
#    Current DNS Server: 10.0.2.3
#           DNS Servers: 10.0.2.3
#    
#    Link 3 (eth1)
#        Current Scopes: none
#             Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
$ exit
---
  • k8s-ctr k8s 정보 확인
# 클러스터 정보 확인
$ kubectl cluster-info
# => Kubernetes control plane is running at https://192.168.10.100:6443
#    CoreDNS is running at https://192.168.10.100:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
#    
#    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# 노드 정보 : 상태, INTERNAL-IP 확인
$ kubectl get node -owide
# => NAME      STATUS     ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
#    k8s-ctr   NotReady   control-plane   2d    v1.33.2   192.168.10.100   <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
#    k8s-w1    NotReady   <none>          2d    v1.33.2   10.0.2.15        <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
#    k8s-w2    NotReady   <none>          2d    v1.33.2   10.0.2.15        <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27

# 파드 정보 : 상태, 파드 IP 확인 - kube-proxy 확인
$ kubectl get pod -A -owide
# => NAMESPACE     NAME                              READY   STATUS    RESTARTS      AGE   IP               NODE      NOMINATED NODE   READINESS GATES
#    kube-system   coredns-674b8bbfcf-79mbb          0/1     Pending   0             2d    <none>           <none>    <none>           <none>
#    kube-system   coredns-674b8bbfcf-rtx95          0/1     Pending   0             2d    <none>           <none>    <none>           <none>
#    kube-system   etcd-k8s-ctr                      1/1     Running   1 (12m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-apiserver-k8s-ctr            1/1     Running   1 (12m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-controller-manager-k8s-ctr   1/1     Running   1 (12m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-proxy-hdffr                  1/1     Running   1 (11m ago)   2d    10.0.2.15        k8s-w1    <none>           <none>
#    kube-system   kube-proxy-r96sz                  1/1     Running   1 (12m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-proxy-swgmb                  1/1     Running   1 (11m ago)   2d    10.0.2.15        k8s-w2    <none>           <none>
#    kube-system   kube-scheduler-k8s-ctr            1/1     Running   1 (12m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>

# 단축어 확인(kc = kubecolor) & coredns 파드 상태 확인
$ k  describe pod -n kube-system -l k8s-app=kube-dns
# => Name:                 coredns-674b8bbfcf-79mbb
#    Namespace:            kube-system
#    Priority:             2000000000
#    Priority Class Name:  system-cluster-critical
#    Service Account:      coredns
#    Node:                 <none>
#    Labels:               k8s-app=kube-dns
#                          pod-template-hash=674b8bbfcf
#    Annotations:          <none>
#    Status:               Pending
#    IP:
#    IPs:                  <none>
#    Controlled By:        ReplicaSet/coredns-674b8bbfcf
#    Containers:
#      coredns:
#        Image:       registry.k8s.io/coredns/coredns:v1.12.0
#        Ports:       53/UDP, 53/TCP, 9153/TCP
#        Host Ports:  0/UDP, 0/TCP, 0/TCP
#        Args:
#          -conf
#          /etc/coredns/Corefile
#        Limits:
#          memory:  170Mi
#        Requests:
#          cpu:        100m
#          memory:     70Mi
#        Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
#        Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
#        Environment:  <none>
#        Mounts:
#          /etc/coredns from config-volume (ro)
#          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vrqlj (ro)
#    Conditions:
#      Type           Status
#      PodScheduled   False
#    Volumes:
#      config-volume:
#        Type:      ConfigMap (a volume populated by a ConfigMap)
#        Name:      coredns
#        Optional:  false
#      kube-api-access-vrqlj:
#        Type:                    Projected (a volume that contains injected data from multiple sources)
#        TokenExpirationSeconds:  3607
#        ConfigMapName:           kube-root-ca.crt
#        Optional:                false
#        DownwardAPI:             true
#    QoS Class:                   Burstable
#    Node-Selectors:              kubernetes.io/os=linux
#    Tolerations:                 CriticalAddonsOnly op=Exists
#                                 node-role.kubernetes.io/control-plane:NoSchedule
#                                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
#                                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
#    Events:
#      Type     Reason            Age                  From               Message
#      ----     ------            ----                 ----               -------
#      Warning  FailedScheduling  7m18s (x2 over 12m)  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
#      Warning  FailedScheduling  47h(x12 over 2d)    default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
#    ...
$ kc describe pod -n kube-system -l k8s-app=kube-dns
  • k8s-ctr INTERNAL-IP 변경 설정
#
$ cat /var/lib/kubelet/kubeadm-flags.env
# => KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.10"

# INTERNAL-IP 변경 설정
$ NODEIP=$(ip -4 addr show eth1 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
$ sed -i "s/^\(KUBELET_KUBEADM_ARGS=\"\)/\1--node-ip=${NODEIP} /" /var/lib/kubelet/kubeadm-flags.env
$ systemctl daemon-reexec && systemctl restart kubelet

$ cat /var/lib/kubelet/kubeadm-flags.env
# => KUBELET_KUBEADM_ARGS="--node-ip=192.168.10.100 --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.10"

#
$ kubectl get node -owide
# => NAME      STATUS     ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
#    k8s-ctr   NotReady   control-plane   2d    v1.33.2   192.168.10.100   <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
#    k8s-w1    NotReady   <none>          2d    v1.33.2   10.0.2.15        <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
#    k8s-w2    NotReady   <none>          2d    v1.33.2   10.0.2.15        <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
  • k8s-w1, k8s-w2 에도 위와 동일한 방법으로 INTERNAL-IP를 192.168.10.x로 변경합니다.
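# (참고) 아래는 워커 노드에 접속하여 컨트롤플레인과 동일한 절차를 수행하는 예시입니다.
# (각 노드의 eth1에 192.168.10.x 대역이 할당되어 있고, 접속 시 root 셸로 전환된다는 가정이며, k8s-w2도 동일하게 수행합니다.)
$ sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-w1

# eth1의 IP를 추출하여 kubelet 실행 인자에 --node-ip 를 추가한 뒤 kubelet 재시작
$ NODEIP=$(ip -4 addr show eth1 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
$ sed -i "s/^\(KUBELET_KUBEADM_ARGS=\"\)/\1--node-ip=${NODEIP} /" /var/lib/kubelet/kubeadm-flags.env
$ systemctl daemon-reexec && systemctl restart kubelet
$ exit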
  • k8s-w1/w2 설정 완료 후 INTERNAL-IP 확인
$ kubectl get node -owide
# => NAME      STATUS     ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
#    k8s-ctr   NotReady   control-plane   2d    v1.33.2   192.168.10.100   <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
#    k8s-w1    NotReady   <none>          2d    v1.33.2   192.168.10.101   <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27
#    k8s-w2    NotReady   <none>          2d    v1.33.2   192.168.10.102   <none>        Ubuntu 24.04.2 LTS   6.8.0-53-generic   containerd://1.7.27

$ kubectl get pod -A -owide
# => NAMESPACE     NAME                              READY   STATUS    RESTARTS      AGE   IP               NODE      NOMINATED NODE   READINESS GATES
#    kube-system   coredns-674b8bbfcf-79mbb          0/1     Pending   0             2d    <none>           <none>    <none>           <none>
#    kube-system   coredns-674b8bbfcf-rtx95          0/1     Pending   0             2d    <none>           <none>    <none>           <none>
#    kube-system   etcd-k8s-ctr                      1/1     Running   1 (27m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-apiserver-k8s-ctr            1/1     Running   1 (27m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-controller-manager-k8s-ctr   1/1     Running   1 (27m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-proxy-hdffr                  1/1     Running   1 (26m ago)   2d    192.168.10.101   k8s-w1    <none>           <none>
#    kube-system   kube-proxy-r96sz                  1/1     Running   1 (27m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-proxy-swgmb                  1/1     Running   1 (26m ago)   2d    192.168.10.102   k8s-w2    <none>           <none>
#    kube-system   kube-scheduler-k8s-ctr            1/1     Running   1 (27m ago)   2d    192.168.10.100   k8s-ctr   <none>           <none>
  • k8s-ctr static pod의 IP 변경 설정
#
$ tree /etc/kubernetes/manifests
# => /etc/kubernetes/manifests
#    ├── etcd.yaml
#    ├── kube-apiserver.yaml
#    ├── kube-controller-manager.yaml
#    └── kube-scheduler.yaml

# etcd 정보 확인
$ cat /etc/kubernetes/manifests/etcd.yaml
# =>   ...
#      volumes:
#      - hostPath:
#          path: /etc/kubernetes/pki/etcd
#          type: DirectoryOrCreate
#        name: etcd-certs
#      - hostPath:
#          path: /var/lib/etcd
#          type: DirectoryOrCreate
#        name: etcd-data
#      ...

$ tree /var/lib/etcd/
# => /var/lib/etcd/
#    └── member
#        ├── snap
#        │   ├── 0000000000000003-0000000000002711.snap
#        │   └── db
#        └── wal
#            ├── 0000000000000000-0000000000000000.wal
#            └── 0.tmp

# k8s-ctr 재부팅
$ reboot

Flannel CNI

Flannel 소개

  • Flannel은 쿠버네티스의 네트워크 요구사항을 충족하는 가장 간단하고 사용하기 쉬운 오버레이 네트워크 플러그인입니다.
  • Flannel은 가상 네트워크를 생성하여 파드 간 통신을 가능하게 하며, VXLAN, UDP, Host-GW 등 다양한 백엔드를 지원합니다. 이 중에서는 VXLAN 사용이 가장 권장됩니다.
  • VXLAN(Virtual eXtensible Local Area Network)은 물리적인 네트워크 환경 위에 논리적인 가상 네트워크를 구성하는 기술로, UDP 8472 포트를 통해 노드 간 터널링 방식으로 통신합니다.

img.png Flannel 구조 (출처: 추가예정)

  • 위 그림처럼 파드의 eth0 네트워크 인터페이스는 호스트 네임스페이스의 veth 인터페이스와 연결되고, veth는 cni0와 연결됩니다.
  • 같은 노드 내에서는 cni0 브릿지를 통해 파드 간 통신이 이루어지며, 다른 노드와의 통신은 VXLAN을 통해 처리됩니다.
  • VXLAN 경로에서는 cni0 브릿지를 거쳐 flannel.1 인터페이스로 패킷이 전달되고, flannel.1은 호스트의 eth0을 통해 다른 노드로 전송합니다. 이때 flannel.1은 VTEP(Vxlan Tunnel End Point) 역할을 하며, 패킷을 캡슐화하여 대상 노드의 IP로 전송하고, 도착한 노드에서는 캡슐을 해제해 해당 파드로 전달합니다.
  • 각 노드는 파드에 할당할 수 있는 IP 대역(Pod CIDR)을 가지고 있으며, flannel이 etcd나 Kubernetes API를 통해 공유한 정보를 바탕으로 모든 노드가 자신의 라우팅 테이블을 업데이트합니다. 이를 통해 서로 다른 노드의 파드끼리도 파드 IP 주소로 직접 통신할 수 있습니다.
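  • 아래는 Flannel 설치 후 노드에서 VXLAN(VTEP) 구성과 라우팅을 확인해볼 수 있는 명령 예시입니다. (flannel이 VXLAN 백엔드로 설치되어 있다는 가정이며, 실제 설치와 확인은 다음 절에서 진행합니다.)
# flannel.1 인터페이스의 VXLAN 상세 정보(VNI, dstport 8472 등) 확인
$ ip -d link show flannel.1

# VTEP FDB : 원격 노드 VTEP의 MAC 주소와 해당 노드 IP의 매핑 확인
$ bridge fdb show dev flannel.1

# 다른 노드의 파드 대역이 flannel.1 을 경유하도록 라우팅되는지 확인
$ ip route | grep flannel.1

# 같은 노드의 파드들이 연결되는 cni0 브릿지 확인
$ brctl show cni0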

Flannel 설치 및 확인

  • 설치 전 확인
# IP 주소 범위 확인
$ kubectl cluster-info dump | grep -m 2 -E "cluster-cidr|service-cluster-ip-range"
# =>                             "--service-cluster-ip-range=10.96.0.0/16",
#                                "--cluster-cidr=10.244.0.0/16",

# coredns 파드 상태 확인
$ kubectl get pod -n kube-system -l k8s-app=kube-dns -owide
# => NAME                       READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
#    coredns-674b8bbfcf-79mbb   0/1     Pending   0          2d14h   <none>   <none>   <none>           <none>
#    coredns-674b8bbfcf-rtx95   0/1     Pending   0          2d14h   <none>   <none>   <none>           <none>
#    👉 CNI가 설치되지 않아서 Pending 상태입니다

#
$ ip -c link
# => ...
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#        link/ether 08:00:27:71:19:d8 brd ff:ff:ff:ff:ff:ff
#    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#        link/ether 08:00:27:da:24:93 brd ff:ff:ff:ff:ff:ff
$ ip -c route
# => default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.100
$ brctl show
# => 없음

$ ip -c addr
# => ...
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        link/ether 08:00:27:71:19:d8 brd ff:ff:ff:ff:ff:ff
#        inet 10.0.2.15/24 metric 100 brd 10.0.2.255 scope global dynamic eth0
#           valid_lft 85957sec preferred_lft 85957sec
#    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
#        link/ether 08:00:27:da:24:93 brd ff:ff:ff:ff:ff:ff
#        inet 192.168.10.100/24 brd 192.168.10.255 scope global eth1
#           valid_lft forever preferred_lft forever
$ ifconfig | grep -iEA1 'eth[0-9]:'
# => eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
#            inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
#    --
#    eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
#            inet 192.168.10.100  netmask 255.255.255.0  broadcast 192.168.10.255

#
$ iptables-save
$ iptables -t nat -S
$ iptables -t filter -S
$ iptables -t mangle -S

# flannel 설치 후 비교를 위해 설치 전의 iptables 설정을 저장합니다.
$ iptables-save > iptables-before-flannel.txt

#
$ tree /etc/cni/net.d/
# => /etc/cni/net.d/
#    
#    0 directories, 0 files
  • Flannel 설치
# helm에 의한 namespace 생성 오류 방지를 위해 kube-flannel 네임스페이스를 수동으로 생성합니다.
$ kubectl create ns kube-flannel
# => namespace/kube-flannel created
$ kubectl label --overwrite ns kube-flannel pod-security.kubernetes.io/enforce=privileged
# => namespace/kube-flannel labeled

$ helm repo add flannel https://flannel-io.github.io/flannel/
# => "flannel" has been added to your repositories
$ helm repo list
# => NAME    URL
#    flannel https://flannel-io.github.io/flannel/

$ helm search repo flannel
# => NAME            CHART VERSION   APP VERSION     DESCRIPTION
#    flannel/flannel v0.27.1         v0.27.1         Install Flannel Network Plugin.
$ helm show values flannel/flannel
# => ...
#    podCidr: "10.244.0.0/16"
#    ...
#      cniBinDir: "/opt/cni/bin"
#      cniConfDir: "/etc/cni/net.d"
#      skipCNIConfigInstallation: false
#      enableNFTables: false
#      args:
#      - "--ip-masq"
#      - "--kube-subnet-mgr"
#      backend: "vxlan"
#    ...

# k8s 관련 트래픽이 통신할 NIC(eth1)을 지정
$ cat << EOF > flannel-values.yaml
podCidr: "10.244.0.0/16"

flannel:
  args:
  - "--ip-masq"
  - "--kube-subnet-mgr"
  - "--iface=eth1"  
EOF

# helm 설치
$ helm install flannel --namespace kube-flannel flannel/flannel -f flannel-values.yaml
# => NAME: flannel
#    LAST DEPLOYED: Mon Jan 19 13:52:04 2025
#    NAMESPACE: kube-flannel
#    STATUS: deployed
#    REVISION: 1
#    TEST SUITE: None
$ helm list -A
# => NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
#    flannel kube-flannel    1               2025-01-19 13:52:04.427781204 +0900 KST deployed        flannel-v0.27.1 v0.27.1

# 확인 : install-cni-plugin, install-cni
$ kc describe pod -n kube-flannel -l app=flannel
# => Name:                 kube-flannel-ds-5fm6l
#    Namespace:            kube-flannel
#    Priority:             2000001000
#    Priority Class Name:  system-node-critical
#    Service Account:      flannel
#    Node:                 k8s-w1/192.168.10.101
#    Start Time:           Sat, 19 Jul 2025 13:52:06 +0900
#    Labels:               app=flannel
#                          controller-revision-hash=66c5c78475
#                          pod-template-generation=1
#                          tier=node
#    Annotations:          <none>
#    Status:               Running
#    IP:                   192.168.10.101
#    IPs:
#      IP:           192.168.10.101
#    Controlled By:  DaemonSet/kube-flannel-ds
#    Init Containers:
#      install-cni-plugin:
#      ...
#      install-cni:
#      ....
#    Containers:
#      kube-flannel:
#        Container ID:  containerd://c6a1e24ae6193491289908c4b10a8ce6f9a36e000114aaf61dc60da43bdc50ca
#        Image:         ghcr.io/flannel-io/flannel:v0.27.1
#        Image ID:      ghcr.io/flannel-io/flannel@sha256:0c95c822b690f83dc827189d691015f92ab7e249e238876b56442b580c492d85
#        Port:          <none>
#        Host Port:     <none>
#    ...
#    Name:                 kube-flannel-ds-dstmv
#    Namespace:            kube-flannel
#    Priority:             2000001000
#    Priority Class Name:  system-node-critical
#    Service Account:      flannel
#    Node:                 k8s-w2/192.168.10.102
#    Start Time:           Sat, 19 Jul 2025 13:52:05 +0900
#    ...
#    IP:                   192.168.10.102
#    ...
#    Name:                 kube-flannel-ds-lsf7h
#    Namespace:            kube-flannel
#    Priority:             2000001000
#    Priority Class Name:  system-node-critical
#    Service Account:      flannel
#    Node:                 k8s-ctr/192.168.10.100
#    Start Time:           Sat, 19 Jul 2025 13:52:04 +0900
#    Labels:               app=flannel
#                          controller-revision-hash=66c5c78475
#                          pod-template-generation=1
#                          tier=node
#    Annotations:          <none>
#    Status:               Running
#    IP:                   192.168.10.100
#    ...

$ tree /opt/cni/bin/ # flannel
# => /opt/cni/bin/
#    ├── bandwidth
#    ├── bridge
#    ├── dhcp
#    ├── dummy
#    ├── firewall
#    ├── flannel
#    ├── host-device
#    ├── host-local
#    ├── ipvlan
#    ├── LICENSE
#    ├── loopback
#    ├── macvlan
#    ├── portmap
#    ├── ptp
#    ├── README.md
#    ├── sbr
#    ├── static
#    ├── tap
#    ├── tuning
#    ├── vlan
#    └── vrf
#    
#    1 directory, 21 files
$ tree /etc/cni/net.d/
# => /etc/cni/net.d/
#    └── 10-flannel.conflist
#    
#    1 directory, 1 file
$ cat /etc/cni/net.d/10-flannel.conflist | jq
# => {
#      "name": "cbr0",
#      "cniVersion": "0.3.1",
#      "plugins": [
#        {
#          "type": "flannel",
#          "delegate": {
#            "hairpinMode": true,
#            "isDefaultGateway": true
#          }
#        },
#        {
#          "type": "portmap",
#          "capabilities": {
#            "portMappings": true
#          }
#        }
#      ]
#    }
$ kc describe cm -n kube-flannel kube-flannel-cfg
# => ...
#    net-conf.json:
#    ----
#    {
#      "Network": "10.244.0.0/16",
#      "Backend": {
#        "Type": "vxlan"
#      }
#    }

# 설치 전과 비교해보겠습니다.
$ ip -c link
# => ...
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#        link/ether 08:00:27:71:19:d8 brd ff:ff:ff:ff:ff:ff
#    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#        link/ether 08:00:27:da:24:93 brd ff:ff:ff:ff:ff:ff
#    4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
#        link/ether aa:3f:e5:cd:ae:92 brd ff:ff:ff:ff:ff:ff
$ ip -c route | grep 10.244.
# => 10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
#    10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink

$ ping -c 1 10.244.1.0
# => PING 10.244.1.0 (10.244.1.0) 56(84) bytes of data.
#    64 bytes from 10.244.1.0: icmp_seq=1 ttl=64 time=1.31 ms
#    
#    --- 10.244.1.0 ping statistics ---
#    1 packets transmitted, 1 received, 0% packet loss, time 0ms
#    rtt min/avg/max/mdev = 1.314/1.314/1.314/0.000 ms
$ ping -c 1 10.244.2.0
# => PING 10.244.2.0 (10.244.2.0) 56(84) bytes of data.
#    64 bytes from 10.244.2.0: icmp_seq=1 ttl=64 time=1.31 ms
#    
#    --- 10.244.2.0 ping statistics ---
#    1 packets transmitted, 1 received, 0% packet loss, time 0ms
#    rtt min/avg/max/mdev = 1.312/1.312/1.312/0.000 ms

$ brctl show
$ iptables-save
$ iptables -t nat -S
$ iptables -t filter -S
$ iptables-save > iptables-after-flannel.txt
# 설치 전과 후의 iptables 설정을 비교합니다.
$ diff -u iptables-before-flannel.txt iptables-after-flannel.txt

# k8s-w1, k8s-w2 정보 확인
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c link ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c route ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i brctl show ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i sudo iptables -t nat -S ; echo; done

샘플 애플리케이션 배포 및 확인

# 샘플 애플리케이션 배포
$ cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webpod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webpod
  template:
    metadata:
      labels:
        app: webpod
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - sample-app
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: webpod
        image: traefik/whoami
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: webpod
  labels:
    app: webpod
spec:
  selector:
    app: webpod
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
EOF
# => deployment.apps/webpod created
#    service/webpod created

# k8s-ctr 노드에 curl-pod 파드 배포
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: curl-pod
  labels:
    app: curl
spec:
  nodeName: k8s-ctr
  containers:
    - name: curl
      image: alpine/curl
      command: ["sleep", "36000"]
EOF
# => pod/curl-pod created

# 컨트롤플레인 노드(k8s-ctr)에서 파드 확인
$ crictl ps
# => CONTAINER     IMAGE         CREATED        STATE   NAME                    ATTEMPT POD ID        POD                               NAMESPACE
#    ba7ddc59fa138 1fb7da88b3320 1 second ago   Running curl                    0       a87775d3d7098 curl-pod                          default
#    2c095a0d795b7 83a2e3e54aa1e 19 minutes ago Running kube-flannel            0       49d7f057d7491 kube-flannel-ds-lsf7h             kube-flannel
#    13ff95772dd16 738e99dbd7325 35 minutes ago Running kube-proxy              3       0ce95a5226767 kube-proxy-r96sz                  kube-system
#    625a7ec089f93 c03972dff86ba 35 minutes ago Running kube-scheduler          3       7691ca47ac391 kube-scheduler-k8s-ctr            kube-system
#    3b02267780926 ef439b94d49d4 35 minutes ago Running kube-controller-manager 3       232075758b77f kube-controller-manager-k8s-ctr   kube-system
#    f956731d12744 31747a36ce712 35 minutes ago Running etcd                    3       7ac2514bac9cb etcd-k8s-ctr                      kube-system
#    9f50506b3ca66 c0425f3fe3fbf 35 minutes ago Running kube-apiserver          3       d5246dd2d31b9 kube-apiserver-k8s-ctr            kube-system
# 👉 curl-pod는 nodeName: 을 통해 컨트롤플레인 노드(k8s-ctr)에 배포되었습니다.

# 워커 노드(k8s-w1, k8s-w2)에서 파드 확인
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i sudo crictl ps ; echo; done
# => CONTAINER     IMAGE         CREATED        STATE   NAME                    ATTEMPT POD ID        POD                       NAMESPACE
#    c934a221a4fec ab541801c8cc5 57 seconds ago Running webpod                  0       7323e0f4eced8 webpod-697b545f57-7j5vt   default
#    c6a1e24ae6193 83a2e3e54aa1e 20 minutes ago Running kube-flannel            0       3a86bc505126a kube-flannel-ds-5fm6l     kube-flannel
#    b55a66b5cd0a6 738e99dbd7325 35 minutes ago Running kube-proxy              2       8bc7a54488b35 kube-proxy-hdffr          kube-system
#    
#    >> node : k8s-w2 <<
#    CONTAINER     IMAGE         CREATED        STATE   NAME                    ATTEMPT POD ID              POD                        NAMESPACE
#    54658fa9cfc29 ab541801c8cc5 58 seconds ago Running webpod                  0       8dec7a5a7aed1 webpod-697b545f57-sdv4l    default
#    9b2a414ee1acc f72407be9e08c 19 minutes ago Running coredns                 0       23071bf7a21e4 coredns-674b8bbfcf-rtx95   kube-system
#    e1c86c4fa20fe f72407be9e08c 20 minutes ago Running coredns                 0       757397c6bcd8f coredns-674b8bbfcf-79mbb   kube-system
#    eeac62c8beba7 83a2e3e54aa1e 20 minutes ago Running kube-flannel            0       1b4ba4f721424 kube-flannel-ds-dstmv      kube-flannel
#    0a6112c11e948 738e99dbd7325 35 minutes ago Running kube-proxy              2       f9f19975aed04 kube-proxy-swgmb           kube-system
# 👉 webpod는 별도로 nodeName: 을 지정하지 않았기 때문에 워커 노드(k8s-w1, k8s-w2)에 배포되었습니다.
  • 확인
# 배포 확인
$ kubectl get deploy,svc,ep webpod -owide
# => NAME                     READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES           SELECTOR
#    deployment.apps/webpod   2/2     2            2           18m   webpod       traefik/whoami   app=webpod
#    
#    NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE   SELECTOR
#    service/webpod   ClusterIP   10.96.62.184   <none>        80/TCP    18m   app=webpod
#    
#    NAME               ENDPOINTS                     AGE
#    endpoints/webpod   10.244.1.2:80,10.244.2.4:80   18m

#
$ kubectl api-resources | grep -i endpoint
# => endpoints                           ep           v1                                true         Endpoints
#    endpointslices                                   discovery.k8s.io/v1               true         EndpointSlice

$ kubectl get endpointslices -l app=webpod
# => NAME           ADDRESSTYPE   PORTS   ENDPOINTS               AGE
#    webpod-9pfs7   IPv4          80      10.244.2.4,10.244.1.2   18m

# 배포 전과 비교해보겠습니다.
$ ip -c link
# => ...
#    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#        link/ether 08:00:27:71:19:d8 brd ff:ff:ff:ff:ff:ff
#    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#        link/ether 08:00:27:da:24:93 brd ff:ff:ff:ff:ff:ff
#    4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
#        link/ether aa:3f:e5:cd:ae:92 brd ff:ff:ff:ff:ff:ff
#    5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether b2:e2:a2:aa:4e:5c brd ff:ff:ff:ff:ff:ff
#    6: veth0911be7c@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether 6a:8b:92:5e:74:b3 brd ff:ff:ff:ff:ff:ff link-netns cni-15400ffe-d5f7-c6c2-78d9-dbbbc2f08db7
$ brctl show
# => bridge name     bridge id               STP enabled     interfaces
#    cni0            8000.b2e2a2aa4e5c       no              veth0911be7c
# 👉 veth 인터페이스를 통해 파드와 연결된 cni0 브릿지가 생성되었습니다.
$ iptables-save
$ iptables -t nat -S
$ iptables-save > iptables-after-deployment.txt
$ diff iptables-after-flannel.txt iptables-after-deployment.txt
# => ...
#    62a63,64
#    > :KUBE-SEP-PQBQBGZJJ5FKN3TB - [0:0]
#    > :KUBE-SEP-R5LRHDMUTGTM635J - [0:0]
#    66a69
#    > :KUBE-SVC-CNZCPOCNCNOROALA - [0:0]
#    92a96,99
#    > -A KUBE-SEP-PQBQBGZJJ5FKN3TB -s 10.244.1.2/32 -m comment --comment "default/webpod" -j KUBE-MARK-MASQ
#    > -A KUBE-SEP-PQBQBGZJJ5FKN3TB -p tcp -m comment --comment "default/webpod" -m tcp -j DNAT --to-destination 10.244.1.2:80
#    > -A KUBE-SEP-R5LRHDMUTGTM635J -s 10.244.2.4/32 -m comment --comment "default/webpod" -j KUBE-MARK-MASQ
#    > -A KUBE-SEP-R5LRHDMUTGTM635J -p tcp -m comment --comment "default/webpod" -m tcp -j DNAT --to-destination 10.244.2.4:80
#    98a106
#    > -A KUBE-SERVICES -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-SVC-CNZCPOCNCNOROALA
#    103a112,114
#    > -A KUBE-SVC-CNZCPOCNCNOROALA ! -s 10.244.0.0/16 -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
#    > -A KUBE-SVC-CNZCPOCNCNOROALA -m comment --comment "default/webpod -> 10.244.1.2:80" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-PQBQBGZJJ5FKN3TB
#    > -A KUBE-SVC-CNZCPOCNCNOROALA -m comment --comment "default/webpod -> 10.244.2.4:80" -j KUBE-SEP-R5LRHDMUTGTM635J
#    ...

# k8s-w1, k8s-w2 정보 확인
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c link ; echo; done
# => >> node : k8s-w1 <<
#    ...
#    5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether 56:8b:b8:09:e1:a0 brd ff:ff:ff:ff:ff:ff
#    6: veth52205e86@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether ba:b1:d5:a8:5e:c6 brd ff:ff:ff:ff:ff:ff link-netns cni-42b4483c-e253-de82-a5c3-2cbf657cc6ed
#    
#    >> node : k8s-w2 <<
#    ...
#    5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether b6:aa:16:04:b0:58 brd ff:ff:ff:ff:ff:ff
#    6: veth605dad7b@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether 36:5a:70:ff:de:e9 brd ff:ff:ff:ff:ff:ff link-netns cni-e020c420-373a-900d-bf44-34fbe4622f7e
#    7: veth002efe84@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether be:86:35:e2:1a:5d brd ff:ff:ff:ff:ff:ff link-netns cni-af271963-86ee-26b6-35b9-39173672cd1a
#    8: veth1dce6530@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether fa:d9:af:04:69:e2 brd ff:ff:ff:ff:ff:ff link-netns cni-ca1eedc6-43ff-e346-318d-ba345e0ba532
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c route ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i brctl show ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i sudo iptables -t nat -S ; echo; done
  • 통신 확인
#
$ kubectl get pod -l app=webpod -owide
# => NAME                      READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
#    webpod-697b545f57-7j5vt   1/1     Running   0          24m   10.244.1.2   k8s-w1   <none>           <none>
#    webpod-697b545f57-sdv4l   1/1     Running   0          24m   10.244.2.4   k8s-w2   <none>           <none>
$ POD1IP=10.244.1.2
$ kubectl exec -it curl-pod -- curl $POD1IP
# => Hostname: webpod-697b545f57-7j5vt
#    IP: 127.0.0.1
#    IP: ::1
#    IP: 10.244.1.2
#    IP: fe80::dc28:1fff:fe46:abd0
#    RemoteAddr: 10.244.0.2:46774
#    GET / HTTP/1.1
#    Host: 10.244.1.2
#    User-Agent: curl/8.14.1
#    Accept: */*

#
$ kubectl get svc,ep webpod
# => NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
#    service/webpod   ClusterIP   10.96.62.184   <none>        80/TCP    25m
#    
#    NAME               ENDPOINTS                     AGE
#    endpoints/webpod   10.244.1.2:80,10.244.2.4:80   25m
$ kubectl exec -it curl-pod -- curl webpod
# => Hostname: webpod-697b545f57-7j5vt
#    IP: 127.0.0.1
#    IP: ::1
#    IP: 10.244.1.2
#    IP: fe80::dc28:1fff:fe46:abd0
#    RemoteAddr: 10.244.0.2:55684
#    GET / HTTP/1.1
#    Host: webpod
#    User-Agent: curl/8.14.1
#    Accept: */*
$ kubectl exec -it curl-pod -- curl webpod | grep Hostname
# => Hostname: webpod-697b545f57-7j5vt
$ kubectl exec -it curl-pod -- sh -c 'while true; do curl -s webpod | grep Hostname; sleep 1; done'
# => Hostname: webpod-697b545f57-7j5vt
#    Hostname: webpod-697b545f57-sdv4l
#    Hostname: webpod-697b545f57-7j5vt
#    ...

# Service 동작 처리에 iptables 규칙 활용 확인 >> Service 가 100개 , 1000개 , 10000개 증가 되면???
$ kubectl get svc webpod -o jsonpath="{.spec.clusterIP}"
# => 10.96.62.184
$ SVCIP=$(kubectl get svc webpod -o jsonpath="{.spec.clusterIP}")
$ iptables -t nat -S | grep $SVCIP
# => -A KUBE-SERVICES -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-SVC-CNZCPOCNCNOROALA
#    -A KUBE-SVC-CNZCPOCNCNOROALA ! -s 10.244.0.0/16 -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i sudo iptables -t nat -S | grep $SVCIP ; echo; done
# => >> node : k8s-w1 <<
#    -A KUBE-SERVICES -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-SVC-CNZCPOCNCNOROALA
#    -A KUBE-SVC-CNZCPOCNCNOROALA ! -s 10.244.0.0/16 -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
#    
#    >> node : k8s-w2 <<
#    -A KUBE-SERVICES -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-SVC-CNZCPOCNCNOROALA
#    -A KUBE-SVC-CNZCPOCNCNOROALA ! -s 10.244.0.0/16 -d 10.96.62.184/32 -p tcp -m comment --comment "default/webpod cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
  • 대규모 환경에서 iptables 단점
    • kube-proxy에 의해 생성되는 iptables 규칙이 많아질수록 성능 저하가 발생할 수 있습니다.
    • 특히 서비스가 많은 경우 iptables 규칙이 급격히 증가하여 성능에 영향을 미칠 수 있습니다(규칙 수를 세어보는 예시는 아래 참고).
    • 테스트 클러스터에서 3800개 노드의 19000개 파드를 배포한 결과, iptables 규칙이 24,000개 이상 생성되었습니다.
    • 이로 인한 성능 저하는 다음과 같습니다.
      • 통신 연결시 1.2ms의 지연이 발생했습니다.
      • 클러스터의 iptables 규칙 갱신이 5분 이상 소요되었습니다.
      • 53%의 CPU 오버헤드가 발생했습니다.
    • 이러한 문제로 인해 iptables 대신 eBPF를 사용하는 Cilium과 같은 CNI 플러그인이 대안으로 인기를 얻고 있습니다.
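  • 현재 실습 클러스터에서도 서비스와 엔드포인트가 늘어날수록 kube-proxy가 생성하는 규칙이 함께 늘어나는 것을 아래처럼 간단히 세어볼 수 있습니다. (규칙 수는 환경에 따라 달라집니다.)
# nat 테이블의 전체 KUBE-* 규칙 수 확인
$ iptables -t nat -S | grep -c 'KUBE-'

# 서비스/엔드포인트 단위로 생성되는 KUBE-SVC / KUBE-SEP 체인 규칙 수 확인
$ iptables -t nat -S | grep -cE 'KUBE-(SVC|SEP)'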

Cilium CNI

Cilium CNI 소개

  • Cilium은 기존의 복잡한 네트워크 스택을 eBPF를 통해 간소화하고, 빠르게 처리할 수 있도록 하는 CNI 플러그인입니다.
  • iptables 기반의 kube-proxy를 대체하여, 앞서 살펴본 기존 iptables 기반 CNI 플러그인들의 단점을 대부분 해결할 수 있습니다.

img.png https://isovalent.com/blog/post/migrating-from-metallb-to-cilium/

  • Cilium eBPF는 추가적인 코드 수정이나 설정 변경 없이, 리눅스 커널에서 동작하는 바이트코드를 자유롭게 프로그래밍하여 커널에 로딩해 동작시킬 수 있습니다. 링크 img.png
  • 또한 eBPF는 모든 패킷을 가로채기 위해 수신 NIC의 ingress TC(Traffic Control) hook을 사용할 수 있습니다(확인 예시는 아래 참고). img.png NIC의 TC Hooks에 eBPF 프로그램이 attach 된 예
  • Cilium은 터널 모드(VXLAN, GENEVE)와 네이티브 라우팅 모드, 2가지 네트워크 모드를 제공합니다. Docs
    • 터널 모드 : Cilium이 VXLAN(UDP 8472), GENEVE(UDP 6081) 인터페이스를 만들어 이를 통해 트래픽을 전달합니다. Encapsulation 모드라고도 합니다.
    • 네이티브 라우팅 모드 : Cilium이 패킷 전달을 위한 구성을 직접 만들지 않고, 외부에서 제공되는 패킷 전달 방법(클라우드 라우팅 또는 BGP 라우팅 등)을 사용합니다. Direct Routing 모드라고도 합니다. img.png img.png
  • 2021년 10월 Cilium은 CNCF에 채택되었습니다. 링크
  • Google GKE Dataplane V2와 AWS EKS Anywhere에서 기본 CNI로 Cilium을 사용하고 있습니다. 링크
  • Cilium은 Kube-Proxy를 100% 대체 가능합니다.
    • iptables를 거의 사용하지 않고도 동작하며, 이를 통해 성능을 향상시킬 수 있습니다.
    • 하지만 istio, FHRP, VRRP 등과 같이 iptables 기능을 활용하는 동작들은 이슈가 발생할 수 있으며, 차츰 해결해 나가고 있습니다.
    • 동작 소개 1 - Youtube
    • 동작 소개 2 : ByteDance 사례 - Youtube, CNCF
  • 구성요소 - 링크 img.png Cilium 아키텍처 - 출처
    • Cilium Operator : K8S 클러스터 전체에서 한 번만 처리하면 되는 작업을 담당합니다.
    • Cilium Agent : 데몬셋으로 실행되며, K8S API 서버에서 받은 설정을 바탕으로 네트워크 설정, 네트워크 정책, 서비스 부하분산, 모니터링 등을 수행하고 eBPF 프로그램을 관리합니다.
    • Cilium Client (CLI) : Cilium 커맨드라인 툴로, eBPF map에 직접 접근하여 상태를 확인할 수 있습니다.
    • Hubble : 네트워크와 보안 모니터링 플랫폼 역할을 하며, Server, Relay, Client, Graphical UI로 구성되어 있습니다.
    • Data Store : Cilium Agent 간의 상태를 저장하고 전파하는 데이터 저장소로, K8S CRDs와 Key-Value Store 두 가지 중 선택할 수 있습니다. img.png img_1.png
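  • (참고) Cilium 설치 후에는 아래와 같은 일반적인 tc/bpftool 명령으로 eBPF 프로그램이 네트워크 디바이스에 attach 되었는지 확인해볼 수 있습니다. (커널과 Cilium 버전에 따라 attach 방식이 tc, tcx 등으로 달라질 수 있다는 가정하의 예시이며, 설치는 다음 장에서 진행합니다.)
# 네트워크 디바이스에 attach 된 eBPF 프로그램(tc/xdp 등) 확인
$ bpftool net show

# 특정 인터페이스의 ingress TC hook에 걸린 필터 확인
$ tc filter show dev eth1 ingress

# 커널에 로드된 eBPF 프로그램 목록 확인
$ bpftool prog show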

Cilium CNI 설치

Cilium 시스템 요구 사항 확인 - 공식 문서

  • AMD64 또는 AArch64 CPU 아키텍처를 사용하는 호스트
  • Linux 커널 5.4 이상 또는 동등 버전(예: RHEL 8.6의 경우 4.18)
    $ arch
    # => aarch64
          
    $ uname -r
    # => 6.8.0-53-generic
    
  • 커널 구성 옵션 활성화
    # [커널 구성 옵션] 기본 요구 사항 
    $ grep -E 'CONFIG_BPF|CONFIG_BPF_SYSCALL|CONFIG_NET_CLS_BPF|CONFIG_BPF_JIT|CONFIG_NET_CLS_ACT|CONFIG_NET_SCH_INGRESS|CONFIG_CRYPTO_SHA1|CONFIG_CRYPTO_USER_API_HASH|CONFIG_CGROUPS|CONFIG_CGROUP_BPF|CONFIG_PERF_EVENTS|CONFIG_SCHEDSTATS' /boot/config-$(uname -r)
    # => CONFIG_BPF=y
    #    CONFIG_BPF_SYSCALL=y
    #    CONFIG_BPF_JIT=y
    #    CONFIG_BPF_JIT_ALWAYS_ON=y
    #    CONFIG_BPF_JIT_DEFAULT_ON=y
    #    CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
    #    # CONFIG_BPF_PRELOAD is not set
    #    CONFIG_BPF_LSM=y
    #    CONFIG_CGROUPS=y
    #    CONFIG_CGROUP_BPF=y
    #    CONFIG_PERF_EVENTS=y
    #    CONFIG_NET_SCH_INGRESS=m
    #    CONFIG_NET_CLS_BPF=m
    #    CONFIG_NET_CLS_ACT=y
    #    CONFIG_BPF_STREAM_PARSER=y
    #    CONFIG_CRYPTO_SHA1=y
    #    CONFIG_CRYPTO_USER_API_HASH=m
    #    CONFIG_CRYPTO_SHA1_ARM64_CE=m
    #    CONFIG_SCHEDSTATS=y
    #    CONFIG_BPF_EVENTS=y
    #    CONFIG_BPF_KPROBE_OVERRIDE=y
        
    # [커널 구성 옵션] Requirements for Tunneling and Routing
    $ grep -E 'CONFIG_VXLAN=y|CONFIG_VXLAN=m|CONFIG_GENEVE=y|CONFIG_GENEVE=m|CONFIG_FIB_RULES=y' /boot/config-$(uname -r)
    # => CONFIG_FIB_RULES=y  # 커널에 내장됨
    #    CONFIG_VXLAN=m      # 모듈로 컴파일됨 → 커널에 로드해서 사용
    #    CONFIG_GENEVE=m     # 모듈로 컴파일됨 → 커널에 로드해서 사용
        
    ## (참고) 커널 로드
    $ lsmod | grep -E 'vxlan|geneve'
    # => vxlan                 147456  0
    #    ip6_udp_tunnel         16384  1 vxlan
    #    udp_tunnel             36864  1 vxlan
    $ modprobe geneve
    $ lsmod | grep -E 'vxlan|geneve'
    # => geneve                 45056  0
    #    vxlan                 147456  0
    #    ip6_udp_tunnel         16384  2 geneve,vxlan
    #    udp_tunnel             36864  2 geneve,vxlan
        
    # [커널 구성 옵션] Requirements for L7 and FQDN Policies
    $ grep -E 'CONFIG_NETFILTER_XT_TARGET_TPROXY|CONFIG_NETFILTER_XT_TARGET_MARK|CONFIG_NETFILTER_XT_TARGET_CT|CONFIG_NETFILTER_XT_MATCH_MARK|CONFIG_NETFILTER_XT_MATCH_SOCKET' /boot/config-$(uname -r)
    # => CONFIG_NETFILTER_XT_TARGET_CT=m
    #    CONFIG_NETFILTER_XT_TARGET_MARK=m
    #    CONFIG_NETFILTER_XT_TARGET_TPROXY=m
    #    CONFIG_NETFILTER_XT_MATCH_MARK=m
    #    CONFIG_NETFILTER_XT_MATCH_SOCKET=m
        
    # [커널 구성 옵션] Requirements for Netkit Device Mode
    $ grep -E 'CONFIG_NETKIT=y|CONFIG_NETKIT=m' /boot/config-$(uname -r)
    # => CONFIG_NETKIT=y
    
  • 고급 기능 동작을 위한 최소 커널 버전 - Docs

    Cilium Feature                                 Minimum Kernel Version
    WireGuard Transparent Encryption               >= 5.6
    Full support for Session Affinity              >= 5.7
    BPF-based proxy redirection                    >= 5.7
    Socket-level LB bypass in pod netns            >= 5.7
    L3 devices                                     >= 5.8
    BPF-based host routing                         >= 5.10
    Multicast Support in Cilium (Beta) (AMD64)     >= 5.10
    IPv6 BIG TCP support                           >= 5.19
    Multicast Support in Cilium (Beta) (AArch64)   >= 6.0
    IPv4 BIG TCP support                           >= 6.3
  • Cilium 동작(Node 간)을 위한 방화벽 규칙 : 해당 포트 인/아웃 허용 필요 - Docs
  • Mounted eBPF filesystem : 일부 배포판 마운트되어 있음, 혹은 Cilium 설치 시 마운트 시도 - Docs
    # eBPF 파일 시스템 마운트 확인
    $ mount | grep /sys/fs/bpf
    # => bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
    
  • Privileges : Cilium 동작을 위해서 관리자 수준 권한 필요 - Docs
    • cilium은 네트워킹 작업과 보안 정책을 구현하는 eBPF 프로그램 설치를 위해 리눅스 커널과 상호작용합니다. 이 작업은 관리자 권한이 필요하며, cilium은 이를 위해 CAP_SYS_ADMIN 권한을 사용합니다. 또한 해당 권한은 cilium-agent 컨테이너에 부여되어야 합니다.
    • 가장 편리한 방법은 cilium-agent를 root 사용자나 privileged 모드로 실행하는 것입니다.
    • cilium은 또한 호스트 네트워킹 네임스페이스에 대한 접근을 필요로 합니다. 따라서 cilium 파드는 호스트 네트워킹 네임스페이스를 직접 사용할 수 있도록 설정되어야 합니다.

kube-proxy 제거

  • 기존 Flannel CNI를 제거합니다.
    $ helm uninstall -n kube-flannel flannel
    # => release "flannel" uninstalled
    $ helm list -A
    # => NAME    NAMESPACE       REVISION        UPDATED STATUS  CHART   APP VERSION
      
    #
    $ kubectl get all -n kube-flannel
    # => No resources found in kube-flannel namespace.
    $ kubectl delete ns kube-flannel
    # => namespace "kube-flannel" deleted
      
    $ kubectl get pod -A -owide
    
  • k8s-ctr, k8s-w1, k8s-w2 모든 노드에서 아래 명령을 실행하여 flannel 관련 인터페이스를 제거합니다.
# 제거 전 확인
$ ip -c link
# => ...
#    4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
#        link/ether aa:3f:e5:cd:ae:92 brd ff:ff:ff:ff:ff:ff
#    5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether 1e:a9:44:a0:00:e1 brd ff:ff:ff:ff:ff:ff
#    6: veth322e34b5@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether 8e:ea:40:c5:46:a7 brd ff:ff:ff:ff:ff:ff link-netns cni-62beabc5-97a9-e6cf-7f8f-bd4de413d33c
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c link ; echo; done
# => >> node : k8s-w1 <<
#    ...
#    4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
#        link/ether c2:b2:62:af:c2:93 brd ff:ff:ff:ff:ff:ff
#    5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether 06:8a:4c:62:12:de brd ff:ff:ff:ff:ff:ff
#    6: vethd8fb7cb1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether e2:3f:03:c3:be:a2 brd ff:ff:ff:ff:ff:ff link-netns cni-81f15ae4-4a35-bce7-f755-657f3b8e39ea
#    
#    >> node : k8s-w2 <<
#    ...
#    4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
#        link/ether 02:dd:56:d3:f6:3f brd ff:ff:ff:ff:ff:ff
#    5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether 06:57:05:39:42:57 brd ff:ff:ff:ff:ff:ff
#    6: veth390f8e9e@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether 5a:23:ff:ba:28:90 brd ff:ff:ff:ff:ff:ff link-netns cni-a27cec88-43c0-acf5-0bc5-f64e945bded3
#    7: veth357a49b9@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether b6:6c:69:43:29:a6 brd ff:ff:ff:ff:ff:ff link-netns cni-36af9b39-bcb8-ad52-beb5-4b67475b404f
#    8: vethf9bb5584@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default qlen 1000
#        link/ether 4a:7c:ab:42:e7:ca brd ff:ff:ff:ff:ff:ff link-netns cni-29e132ee-4860-b74c-c4f5-d5d27b341b83

$ brctl show
# => bridge name     bridge id               STP enabled     interfaces
#    cni0            8000.1ea944a000e1       no              veth322e34b5
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i brctl show ; echo; done
# => >> node : k8s-w1 <<
#    bridge name     bridge id               STP enabled     interfaces
#    cni0            8000.068a4c6212de       no              vethd8fb7cb1
#    
#    >> node : k8s-w2 <<
#    bridge name     bridge id               STP enabled     interfaces
#    cni0            8000.065705394257       no              veth357a49b9
#                                                            veth390f8e9e
#                                                            vethf9bb5584

$ ip -c route
# => default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
#    10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
#    10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.100
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c route ; echo; done
# => >> node : k8s-w1 <<
#    default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
#    10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
#    10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.101
#    
#    >> node : k8s-w2 <<
#    default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
#    10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
#    10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.102

# vnic 제거
$ ip link del flannel.1
$ ip link del cni0

$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i sudo ip link del flannel.1 ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i sudo ip link del cni0 ; echo; done

# 제거 확인
$ ip -c link
# => ...
#    6: veth322e34b5@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#        link/ether 8e:ea:40:c5:46:a7 brd ff:ff:ff:ff:ff:ff link-netns cni-62beabc5-97a9-e6cf-7f8f-bd4de413d33c
# <span style="color: green;">👉 flannel.1과 cni0가 삭제되었습니다.</span>
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c link ; echo; done

$ brctl show
# => <span style="color: green;">없음</span>
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i brctl show ; echo; done
# => >> node : k8s-w1 <<
#    >> node : k8s-w2 <<

$ ip -c route
# => default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.100
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh -o StrictHostKeyChecking=no vagrant@k8s-$i ip -c route ; echo; done
# => >> node : k8s-w1 <<
#    default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.101
#    
#    >> node : k8s-w2 <<
#    default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
#    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
#    10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
#    192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.102
# <span style="color: green;">👉 flannel.1과 cni0가 삭제되어, 관련 라우팅 정보가 삭제되었습니다.</span>
  • 기존 kube-proxy를 제거합니다.
#
$ kubectl -n kube-system delete ds kube-proxy
# => daemonset.apps "kube-proxy" deleted
$ kubectl -n kube-system delete cm kube-proxy
# => configmap "kube-proxy" deleted

# 배포된 파드의 IP는 남겨져 있습니다.
$ kubectl get pod -A -owide
# => NAMESPACE     NAME                              READY   STATUS             RESTARTS         AGE     IP               NODE      NOMINATED NODE   READINESS GATES
#    default       curl-pod                          1/1     Running            1 (143m ago)     3h1m    10.244.0.3       k8s-ctr   <none>           <none>
#    default       webpod-697b545f57-7j5vt           1/1     Running            1 (142m ago)     3h2m    10.244.1.3       k8s-w1    <none>           <none>
#    default       webpod-697b545f57-sdv4l           1/1     Running            1 (142m ago)     3h2m    10.244.2.7       k8s-w2    <none>           <none>
#    kube-system   coredns-674b8bbfcf-79mbb          0/1     CrashLoopBackOff   27 (4m ago)      2d17h   10.244.2.5       k8s-w2    <none>           <none>
#    kube-system   coredns-674b8bbfcf-rtx95          0/1     CrashLoopBackOff   27 (4m26s ago)   2d17h   10.244.2.6       k8s-w2    <none>           <none>
#    kube-system   etcd-k8s-ctr                      1/1     Running            4 (143m ago)     2d17h   192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-apiserver-k8s-ctr            1/1     Running            4 (143m ago)     2d17h   192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-controller-manager-k8s-ctr   1/1     Running            4 (143m ago)     2d17h   192.168.10.100   k8s-ctr   <none>           <none>
#    kube-system   kube-scheduler-k8s-ctr            1/1     Running            4 (143m ago)     2d17h   192.168.10.100   k8s-ctr   <none>           <none>

#
$ kubectl exec -it curl-pod -- curl webpod
# => curl: (6) Could not resolve host: webpod
#    command terminated with exit code 6
# <span style="color: green;">👉 kube-proxy와 CNI의 삭제로 coredns가 동작하지 않아서 webpod 서비스에 접근할 수 없습니다.</span>

#
$ iptables-save

# Run on each node with root permissions:
$ iptables-save | grep -v KUBE | grep -v FLANNEL | iptables-restore
$ iptables-save

$ sshpass -p 'vagrant' ssh vagrant@k8s-w1 "sudo iptables-save | grep -v KUBE | grep -v FLANNEL | sudo iptables-restore"
$ sshpass -p 'vagrant' ssh vagrant@k8s-w1 sudo iptables-save

$ sshpass -p 'vagrant' ssh vagrant@k8s-w2 "sudo iptables-save | grep -v KUBE | grep -v FLANNEL | sudo iptables-restore"
$ sshpass -p 'vagrant' ssh vagrant@k8s-w2 sudo iptables-save

#
$ kubectl get pod -owide
# => NAME                      READY   STATUS    RESTARTS       AGE     IP           NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                  1/1     Running   1 (155m ago)   3h14m   10.244.0.3   k8s-ctr   <none>           <none>
#    webpod-697b545f57-7j5vt   1/1     Running   1 (155m ago)   3h14m   10.244.1.3   k8s-w1    <none>           <none>
#    webpod-697b545f57-sdv4l   1/1     Running   1 (155m ago)   3h14m   10.244.2.7   k8s-w2    <none>           <none>
  • 노드별 파드에 할당되는 IPAM(PodCIDR) 정보를 확인해보겠습니다.
#--allocate-node-cidrs=true 로 설정된 kube-controller-manager에서 CIDR을 자동 할당함
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# => k8s-ctr 10.244.0.0/24
#    k8s-w1  10.244.1.0/24
#    k8s-w2  10.244.2.0/24
$ kubectl get pod -owide
# => NAME                      READY   STATUS    RESTARTS       AGE     IP           NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                  1/1     Running   1 (157m ago)   3h15m   10.244.0.3   k8s-ctr   <none>           <none>
#    webpod-697b545f57-7j5vt   1/1     Running   1 (156m ago)   3h16m   10.244.1.3   k8s-w1    <none>           <none>
#    webpod-697b545f57-sdv4l   1/1     Running   1 (156m ago)   3h16m   10.244.2.7   k8s-w2    <none>           <none>

#
$ kc describe pod -n kube-system kube-controller-manager-k8s-ctr
# => ...
#       Command:
#          kube-controller-manager
#          --allocate-node-cidrs=true
#          --cluster-cidr=10.244.0.0/16
#          --service-cluster-ip-range=10.96.0.0/16
#    ... 

Cilium CNI 설치 with Helm

# Cilium 설치 with Helm
$ helm repo add cilium https://helm.cilium.io/
# => "cilium" has been added to your repositories

# 모든 NIC 지정 + bpf.masq=true + NoIptablesRules
$ helm install cilium cilium/cilium --version 1.17.5 --namespace kube-system \
  --set k8sServiceHost=192.168.10.100 --set k8sServicePort=6443 \
  --set kubeProxyReplacement=true \
  --set routingMode=native \
  --set autoDirectNodeRoutes=true \
  --set ipam.mode="cluster-pool" \
  --set ipam.operator.clusterPoolIPv4PodCIDRList={"172.20.0.0/16"} \
  --set ipv4NativeRoutingCIDR=172.20.0.0/16 \
  --set endpointRoutes.enabled=true \
  --set installNoConntrackIptablesRules=true \
  --set bpf.masquerade=true \
  --set ipv6.enabled=false
# => NAME: cilium
#    LAST DEPLOYED: Sat Jul 19 17:34:05 2025
#    NAMESPACE: kube-system
#    STATUS: deployed
#    REVISION: 1
#    TEST SUITE: None
#    NOTES:
#    You have successfully installed Cilium with Hubble.
#    
#    Your release version is 1.17.5.

# 확인
$ helm get values cilium -n kube-system
# => USER-SUPPLIED VALUES:
#    autoDirectNodeRoutes: true
#    bpf:
#      masquerade: true
#    endpointRoutes:
#      enabled: true
#    installNoConntrackIptablesRules: true
#    ipam:
#      mode: cluster-pool
#      operator:
#        clusterPoolIPv4PodCIDRList:
#        - 172.20.0.0/16
#    ipv4NativeRoutingCIDR: 172.20.0.0/16
#    ipv6:
#      enabled: false
#    k8sServiceHost: 192.168.10.100
#    k8sServicePort: 6443
#    kubeProxyReplacement: true
#    routingMode: native
$ helm list -A
# => NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
#    cilium  kube-system     1               2025-07-19 17:34:05.700270399 +0900 KST deployed        cilium-1.17.5   1.17.5
$ kubectl get crd
# => No resources found
$ watch -d kubectl get pod -A
# => Every 2.0s: kubectl get pod -A                                                                                                                 k8s-ctr: Sat Jul 19 17:36:42 2025
#    
#    NAMESPACE     NAME                               READY   STATUS    RESTARTS       AGE
#    default       curl-pod                           1/1     Running   1 (166m ago)   3h24m
#    default       webpod-697b545f57-7j5vt            1/1     Running   1 (165m ago)   3h25m
#    default       webpod-697b545f57-sdv4l            1/1     Running   1 (165m ago)   3h25m
#    kube-system   <span style="color: green;">cilium-envoy-b2mn9</span>                 1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-envoy-dgdmn</span>                 1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-envoy-pjn95</span>                 1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-fl689</span>                       1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-mqnkn</span>                       1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-operator-865bc7f457-hpwvh</span>   1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-operator-865bc7f457-v5k84</span>   1/1     Running   0              2m36s
#    kube-system   <span style="color: green;">cilium-zz9k4</span>                       1/1     Running   0              2m36s
#    kube-system   coredns-674b8bbfcf-7q52c           1/1     <span style="color: green;">Running</span>   0              52s
#    kube-system   coredns-674b8bbfcf-bvsfb           1/1     <span style="color: green;">Running</span>   0              66s
#    kube-system   etcd-k8s-ctr                       1/1     Running   4 (166m ago)   2d18h
#    kube-system   kube-apiserver-k8s-ctr             1/1     Running   4 (166m ago)   2d18h
#    kube-system   kube-controller-manager-k8s-ctr    1/1     Running   4 (166m ago)   2d18h
#    kube-system   kube-scheduler-k8s-ctr             1/1     Running   4 (166m ago)   2d18h
# <span style="color: green;">👉 cilium 관련된 파드가 배포되었고 coredns 파드도 정상적으로 동작합니다.</span>

$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg status --verbose
# => KubeProxyReplacement:   True   [eth0    10.0.2.15 fd17:625c:f037:2:a00:27ff:fe71:19d8 fe80::a00:27ff:fe71:19d8, eth1   192.168.10.102 fe80::a00:27ff:fe79:58ac (Direct Routing)]
#    ...
#    Routing:                Network: Native   Host: BPF
#    ...
#    Masquerading:           BPF   [eth0, eth1]   172.20.0.0/16 [IPv4: Enabled, IPv6: Disabled]
#    ...

# 노드에 iptables 확인
$ iptables -t nat -S
# => -P PREROUTING ACCEPT
#    -P INPUT ACCEPT
#    -P OUTPUT ACCEPT
#    -P POSTROUTING ACCEPT
#    -N CILIUM_OUTPUT_nat
#    -N CILIUM_POST_nat
#    -N CILIUM_PRE_nat
#    -N KUBE-KUBELET-CANARY
#    -A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_nat" -j CILIUM_PRE_nat
#    -A OUTPUT -m comment --comment "cilium-feeder: CILIUM_OUTPUT_nat" -j CILIUM_OUTPUT_nat
#    -A POSTROUTING -m comment --comment "cilium-feeder: CILIUM_POST_nat" -j CILIUM_POST_nat
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i sudo iptables -t nat -S ; echo; done

$ iptables-save
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i sudo iptables-save ; echo; done 
#
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# => k8s-ctr 10.244.0.0/24
#    k8s-w1  10.244.1.0/24
#    k8s-w2  10.244.2.0/24

# 파드 IP 확인
$ kubectl get pod -owide
# => NAME                      READY   STATUS    RESTARTS       AGE     IP           NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                  1/1     Running   1 (172m ago)   3h30m   10.244.0.3   k8s-ctr   <none>           <none>
#    webpod-697b545f57-7j5vt   1/1     Running   1 (171m ago)   3h30m   10.244.1.3   k8s-w1    <none>           <none>
#    webpod-697b545f57-sdv4l   1/1     Running   1 (171m ago)   3h30m   10.244.2.7   k8s-w2    <none>           <none>

#
$ kubectl get ciliumnodes
# => NAME      CILIUMINTERNALIP   INTERNALIP       AGE
#    k8s-ctr   172.20.2.68        192.168.10.100   6m11s
#    k8s-w1    172.20.1.88        192.168.10.101   6m50s
#    k8s-w2    172.20.0.235       192.168.10.102   7m8s
$ kubectl get ciliumnodes -o json | grep podCIDRs -A2
# =>                     "podCIDRs": [
#                            "172.20.2.0/24"
#                        ],
#    --
#                        "podCIDRs": [
#                            "172.20.1.0/24"
#                        ],
#    --
#                        "podCIDRs": [
#                            "172.20.0.0/24"
#                        ],

#
$ kubectl rollout restart deployment webpod
# => deployment.apps/webpod restarted
$ kubectl get pod -owide
# => NAME                      READY   STATUS    RESTARTS       AGE     IP             NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                  1/1     Running   1 (172m ago)   3h31m   10.244.0.3     k8s-ctr   <none>           <none>
#    webpod-86f878c468-448pc   1/1     Running   0              11s     172.20.0.202   k8s-w2    <none>           <none>
#    webpod-86f878c468-ttbs2   1/1     Running   0              15s     172.20.1.123   k8s-w1    <none>           <none>

# k8s-ctr 노드에 curl-pod 파드 배포
$ kubectl delete pod curl-pod --grace-period=0
# => pod "curl-pod" deleted

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: curl-pod
  labels:
    app: curl
spec:
  nodeName: k8s-ctr
  containers:
  - name: curl
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF
# => pod/curl-pod created

$ kubectl get pod -owide
# => NAME                      READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                  1/1     Running   0          33s   172.20.2.15    k8s-ctr   <none>           <none>
#    webpod-86f878c468-448pc   1/1     Running   0          66s   172.20.0.202   k8s-w2    <none>           <none>
#    webpod-86f878c468-ttbs2   1/1     Running   0          70s   172.20.1.123   k8s-w1    <none>           <none>
$ kubectl get ciliumendpoints
# => NAME                      SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    curl-pod                  5180                ready            172.20.2.15
#    webpod-86f878c468-448pc   34270               ready            172.20.0.202
#    webpod-86f878c468-ttbs2   34270               ready            172.20.1.123
$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg endpoint list
# => ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                  IPv6   IPv4           STATUS
#               ENFORCEMENT        ENFORCEMENT
#    60         Disabled           Disabled          20407      k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system          172.20.0.167   ready
#                                                               k8s:io.cilium.k8s.policy.cluster=default
#                                                               k8s:io.cilium.k8s.policy.serviceaccount=coredns
#                                                               k8s:io.kubernetes.pod.namespace=kube-system
#                                                               k8s:k8s-app=kube-dns
#    71         Disabled           Disabled          4          reserved:health                                                                     172.20.0.114   ready
#    1368       Disabled           Disabled          20407      k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system          172.20.0.92    ready
#                                                               k8s:io.cilium.k8s.policy.cluster=default
#                                                               k8s:io.cilium.k8s.policy.serviceaccount=coredns
#                                                               k8s:io.kubernetes.pod.namespace=kube-system
#                                                               k8s:k8s-app=kube-dns
#    2533       Disabled           Disabled          1          reserved:host                                                                                      ready
#    2605       Disabled           Disabled          34270      k8s:app=webpod                                                                      172.20.0.202   ready
#                                                               k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
#                                                               k8s:io.cilium.k8s.policy.cluster=default
#                                                               k8s:io.cilium.k8s.policy.serviceaccount=default
#                                                               k8s:io.kubernetes.pod.namespace=default

# 통신 확인
$ kubectl exec -it curl-pod -- curl webpod | grep Hostname
# => Hostname: webpod-86f878c468-448pc

Cilium 설치 확인

  • cilium cli를 설치하여 Cilium 상태를 확인해 보겠습니다.
# cilium cli 설치
$ CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
$ CLI_ARCH=amd64
$ if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
$ curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz >/dev/null 2>&1
$ tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
# => cilium
$ rm cilium-linux-${CLI_ARCH}.tar.gz

# cilium 상태 확인
$ which cilium
# => /usr/local/bin/cilium
$ cilium status
# =>     /¯¯\
#     /¯¯\__/¯¯\    Cilium:             OK
#     \__/¯¯\__/    Operator:           OK
#     /¯¯\__/¯¯\    Envoy DaemonSet:    OK
#     \__/¯¯\__/    Hubble Relay:       disabled
#        \__/       ClusterMesh:        disabled
#    
#    DaemonSet              cilium                   Desired: 3, Ready: 3/3, Available: 3/3
#    DaemonSet              cilium-envoy             Desired: 3, Ready: 3/3, Available: 3/3
#    Deployment             cilium-operator          Desired: 2, Ready: 2/2, Available: 2/2
#    Containers:            cilium                   Running: 3
#                           cilium-envoy             Running: 3
#                           cilium-operator          Running: 2
#                           clustermesh-apiserver
#                           hubble-relay
#    Cluster Pods:          5/5 managed by Cilium
#    Helm chart version:    1.17.5
#    Image versions         cilium             quay.io/cilium/cilium:v1.17.5@sha256:baf8541723ee0b72d6c489c741c81a6fdc5228940d66cb76ef5ea2ce3c639ea6: 3
#                           cilium-envoy       quay.io/cilium/cilium-envoy:v1.32.6-1749271279-0864395884b263913eac200ee2048fd985f8e626@sha256:9f69e290a7ea3d4edf9192acd81694089af048ae0d8a67fb63bd62dc1d72203e: 3
#                           cilium-operator    quay.io/cilium/operator-generic:v1.17.5@sha256:f954c97eeb1b47ed67d08cc8fb4108fb829f869373cbb3e698a7f8ef1085b09e: 2
$ cilium config view
# => ...
#    cluster-pool-ipv4-cidr                            172.20.0.0/16
#    default-lb-service-ipam                           lbipam
#    ipam                                              cluster-pool
#    ipam-cilium-node-update-rate                      15s
#    iptables-random-fully                             false
#    ipv4-native-routing-cidr                          172.20.0.0/16
#    kube-proxy-replacement                            true
#    ...
$ kubectl get cm -n kube-system cilium-config -o json | jq

#
$ cilium config set debug true && watch kubectl get pod -A
# => ✨ Patching ConfigMap cilium-config with debug=true...
#    ♻️  Restarted Cilium pods
$ cilium config view | grep -i debug
# => debug                                             true
#    debug-verbose

# cilium daemon = cilium-dbg
$ kubectl exec -n kube-system -c cilium-agent -it ds/cilium -- cilium-dbg config
# => ##### Read-write configurations #####
#    ConntrackAccounting               : Disabled
#    ConntrackLocal                    : Disabled
#    Debug                             : Disabled
#    DebugLB                           : Disabled
#    DebugPolicy                       : Enabled
#    DropNotification                  : Enabled
#    MonitorAggregationLevel           : Medium
#    PolicyAccounting                  : Enabled
#    PolicyAuditMode                   : Disabled
#    PolicyTracing                     : Disabled
#    PolicyVerdictNotification         : Enabled
#    SourceIPVerification              : Enabled
#    TraceNotification                 : Enabled
#    MonitorNumPages                   : 64
#    PolicyEnforcement                 : default
$ kubectl exec -n kube-system -c cilium-agent -it ds/cilium -- cilium-dbg status --verbose
# => ...
#    KubeProxyReplacement:   True   [eth0    10.0.2.15 fd17:625c:f037:2:a00:27ff:fe71:19d8 fe80::a00:27ff:fe71:19d8, eth1   192.168.10.102 fe80::a00:27ff:fe79:58ac (Direct Routing)]
#    Routing:                Network: Native   Host: BPF
#    Attach Mode:            TCX
#    Device Mode:            veth
#    ...
#    KubeProxyReplacement Details:
#      Status:                 True
#      Socket LB:              Enabled
#      Socket LB Tracing:      Enabled
#      Socket LB Coverage:     Full
#      Devices:                eth0    10.0.2.15 fd17:625c:f037:2:a00:27ff:fe71:19d8 fe80::a00:27ff:fe71:19d8, eth1   192.168.10.102 fe80::a00:27ff:fe79:58ac (Direct Routing)
#      Mode:                   SNAT
#      Backend Selection:      Random
#      Session Affinity:       Enabled
#      Graceful Termination:   Enabled
#      NAT46/64 Support:       Disabled
#      XDP Acceleration:       Disabled
#      Services:
#      - ClusterIP:      Enabled
#      - NodePort:       Enabled (Range: 30000-32767)
#      - LoadBalancer:   Enabled
#      - externalIPs:    Enabled
#      - HostPort:       Enabled
#    ...
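  • (참고) 공식 문서에서는 cilium CLI의 connectivity test 명령으로 엔드투엔드 연결성 검증도 안내하고 있습니다. 실행 시 테스트용 파드들이 별도 네임스페이스(cilium-test 계열)에 배포되므로, 확인 후 정리하면 됩니다.
# 연결성 테스트 (테스트 파드 배포로 수 분 정도 소요될 수 있음)
$ cilium connectivity test

# 테스트 종료 후 생성된 네임스페이스 확인 및 삭제
$ kubectl get ns | grep cilium-test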
  • cilium_host, cilium_net, lxc_health 등 Cilium이 생성한 네트워크 인터페이스의 기본 정보를 확인해보겠습니다.

img.png img_1.png 출처 : https://arthurchiao.art/blog/ctrip-network-arch-evolution/

#
$ ip -c addr
# => ...
#    7: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether 4e:28:ea:3e:83:e0 brd ff:ff:ff:ff:ff:ff
#        inet6 fe80::4c28:eaff:fe3e:83e0/64 scope link
#           valid_lft forever preferred_lft forever
#    8: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether 22:ad:62:34:21:8e brd ff:ff:ff:ff:ff:ff
#        inet 172.20.2.68/32 scope global cilium_host
#           valid_lft forever preferred_lft forever
#        inet6 fe80::20ad:62ff:fe34:218e/64 scope link
#           valid_lft forever preferred_lft forever
#    12: lxcc4a3ffff7931@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether 76:62:d3:8d:58:1f brd ff:ff:ff:ff:ff:ff link-netns cni-ca74ac02-08e1-9092-74ad-f60026576c19
#        inet6 fe80::7462:d3ff:fe8d:581f/64 scope link
#           valid_lft forever preferred_lft forever
#    14: lxc_health@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether ca:67:c2:a6:88:89 brd ff:ff:ff:ff:ff:ff link-netnsid 2
#        inet6 fe80::c867:c2ff:fea6:8889/64 scope link
#           valid_lft forever preferred_lft forever
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i ip -c addr ; echo; done

#
$ ip -c addr show cilium_net
# => 7: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether 4e:28:ea:3e:83:e0 brd ff:ff:ff:ff:ff:ff
#        inet6 fe80::4c28:eaff:fe3e:83e0/64 scope link
#           valid_lft forever preferred_lft forever
$ ip -c addr show cilium_host
# => 8: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether 22:ad:62:34:21:8e brd ff:ff:ff:ff:ff:ff
#        inet 172.20.2.68/32 scope global cilium_host
#           valid_lft forever preferred_lft forever
#        inet6 fe80::20ad:62ff:fe34:218e/64 scope link
#           valid_lft forever preferred_lft forever
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i ip -c addr show cilium_net  ; echo; done
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i ip -c addr show cilium_host ; echo; done

# lxc_health 인터페이스는 호스트와 분리된 별도 네트워크 네임스페이스(cilium-health)와 연결된 veth pair 이다 - 링크
# 반대편(네임스페이스 쪽) 인터페이스에 health 용 IP가 할당되어 있으며, cilium-health-responder 가 동작한다
$ ip -c addr show lxc_health
# => 14: lxc_health@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
#        link/ether ca:67:c2:a6:88:89 brd ff:ff:ff:ff:ff:ff link-netnsid 2
#        inet6 fe80::c867:c2ff:fea6:8889/64 scope link
#           valid_lft forever preferred_lft forever
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i ip -c addr show lxc_health  ; echo; done

# IP 확인
$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg status --verbose
# => ....
#    Name              IP              Node   Endpoints
#      k8s-w2 (localhost):
#        Host connectivity to 192.168.10.102:   <span style="color: green;"># <-- NodeIP</span>
#          ICMP to stack:   OK, RTT=440.958µs
#          HTTP to agent:   OK, RTT=539.708µs
#        Endpoint connectivity to 172.20.0.114: <span style="color: green;"># <-- HealthIP</span>
#          ICMP to stack:   OK, RTT=189µs
#          HTTP to agent:   OK, RTT=502µs
#      k8s-ctr:
#        Host connectivity to 192.168.10.100:   <span style="color: green;"># <-- NodeIP</span>
#          ICMP to stack:   OK, RTT=1.011167ms
#          HTTP to agent:   OK, RTT=1.071083ms
#        Endpoint connectivity to 172.20.2.223: <span style="color: green;"># <-- HealthIP</span>
#          ICMP to stack:   OK, RTT=1.027125ms
#          HTTP to agent:   OK, RTT=7.677708ms
#      k8s-w1:
#        Host connectivity to 192.168.10.101:   <span style="color: green;"># <-- NodeIP</span>
#          ICMP to stack:   OK, RTT=888.417µs
#          HTTP to agent:   OK, RTT=1.167333ms
#        Endpoint connectivity to 172.20.1.229: <span style="color: green;"># <-- HealthIP</span>
#          ICMP to stack:   OK, RTT=1.806167ms
#          HTTP to agent:   OK, RTT=1.903ms
#    ....

$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg endpoint list | grep health
# => 2955  Disabled  Disabled  4  reserved:health  172.20.0.114  ready                                                             172.20.1.40    ready 

$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg status --all-addresses
# => ...
#    Allocated addresses:
#      172.20.0.114 (health)
#      172.20.0.167 (kube-system/coredns-674b8bbfcf-bvsfb [restored])
#      172.20.0.202 (default/webpod-86f878c468-448pc [restored])
#      172.20.0.235 (router)
#      172.20.0.92 (kube-system/coredns-674b8bbfcf-7q52c [restored])
#    ...

# Check health info in CT/NAT tables : ICMP records in Conntrack (CT) table and NAT table
$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium bpf ct list global | grep ICMP |head -n4
# => ICMP IN 192.168.10.101:19430 -> 172.20.0.114:0 expires=11874 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=11814 TxFlagsSeen=0x00 LastTxReport=11814 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=6 IfIndex=0 BackendID=0
#    ICMP IN 192.168.10.101:0 -> 172.20.0.114:0 related expires=11874 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=11814 TxFlagsSeen=0x00 LastTxReport=0 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=6 IfIndex=0 BackendID=0
#    ICMP IN 192.168.10.101:2374 -> 172.20.0.114:0 expires=11535 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=11475 TxFlagsSeen=0x00 LastTxReport=11475 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=6 IfIndex=0 BackendID=0
#    ICMP IN 192.168.10.101:47855 -> 172.20.0.114:0 expires=11415 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=11355 TxFlagsSeen=0x00 LastTxReport=11355 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=6 IfIndex=0 BackendID=0

$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium bpf nat list | grep ICMP |head -n4
# => ICMP OUT 192.168.10.102:35430 -> 172.20.1.229:0 XLATE_SRC 192.168.10.102:35430 Created=164sec ago NeedsCT=1
#    ICMP IN 172.20.2.223:0 -> 192.168.10.102:47029 XLATE_DST 192.168.10.102:47029 Created=464sec ago NeedsCT=1
#    ICMP IN 172.20.2.223:0 -> 192.168.10.102:52326 XLATE_DST 192.168.10.102:52326 Created=54sec ago NeedsCT=1
#    ICMP OUT 192.168.10.102:47029 -> 172.20.2.223:0 XLATE_SRC 192.168.10.102:47029 Created=464sec ago NeedsCT=1

img.png node 및 endpoint health check 절차

  • routing 정보 확인해보겠습니다.
# Native-Routing + autoDirectNodeRoutes=true
$ ip -c route | grep 172.20 | grep eth1
# => 172.20.0.0/24 via 192.168.10.102 dev eth1 proto kernel
#    172.20.1.0/24 via 192.168.10.101 dev eth1 proto kernel
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i ip -c route | grep 172.20 | grep eth1 ; echo; done

# hostNetwork 를 사용하지 않는 파드의 경우 endpointRoutes.enabled=true 설정으로 lxcY 인터페이스 생성됨
$ kubectl get ciliumendpoints -A
# => NAMESPACE     NAME                       SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    default       curl-pod                   5180                ready            172.20.2.15
#    default       webpod-86f878c468-448pc    34270               ready            172.20.0.202
#    default       webpod-86f878c468-ttbs2    34270               ready            172.20.1.123
#    kube-system   coredns-674b8bbfcf-7q52c   20407               ready            172.20.0.92
#    kube-system   coredns-674b8bbfcf-bvsfb   20407               ready            172.20.0.167
$ ip -c route | grep lxc
# => 172.20.2.15 dev lxcc4a3ffff7931 proto kernel scope link
#    172.20.2.223 dev lxc_health proto kernel scope link
$ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i ip -c route | grep lxc ; echo; done
  • 보다 상세한 내용은 공식문서를 참고하세요.

통신 확인

노드간 ‘파드 -> 파드’ 통신

img.png 파드에서 빠져나갈 때

img_1.png 파드로 들어올 때

  • cilium 정보 확인
# 먼저 아래의 cheatsheet을 참고하여 c0, c0bpf 등의 단축키(alias)를 지정한 후에 진행합니다.

# 엔드포인트 정보 확인
$ kubectl get pod -owide
# => NAME                      READY   STATUS    RESTARTS   AGE     IP             NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                  1/1     Running   0          3h40m   172.20.2.15    k8s-ctr   <none>           <none>
#    webpod-86f878c468-448pc   1/1     Running   0          3h41m   <span style="color: green;">172.20.0.202</span>   k8s-w2    <none>           <none>
#    webpod-86f878c468-ttbs2   1/1     Running   0          3h41m   172.20.1.123   k8s-w1    <none>           <none>
$ kubectl get svc,ep webpod
# => NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
#    service/webpod   ClusterIP   10.96.62.184   <none>        80/TCP    7h12m
#    
#    NAME               ENDPOINTS                         AGE
#    endpoints/webpod   172.20.0.202:80,172.20.1.123:80   7h12m

# 첫번째 webpod의 IP 주소를 WEBPOD1IP 변수에 저장합니다.
$ WEBPOD1IP=172.20.0.202

# BPF maps : 목적지 파드와 통신 시 어느곳으로 보내야 될지 확인할 수 있다
$ c0 map get cilium_ipcache
$ c0 map get cilium_ipcache | grep $WEBPOD1IP
# => 172.20.0.202/32     identity=34270 encryptkey=0 tunnelendpoint=192.168.10.102 flags=<none>   sync

# curl-pod 의 LXC 변수 지정
# $ LXC=<k8s-ctr에서 가장 최근에 생성된 lxc 인터페이스 이름(lxc_health 제외)>
$ ip -c route | grep lxc
# => 172.20.2.15 dev <span style="color: green;">lxcc4a3ffff7931</span> proto kernel scope link
#    172.20.2.223 dev lxc_health proto kernel scope link
$ LXC=lxcc4a3ffff7931

# Node’s eBPF programs
## list of eBPF programs
$ c0bpf net show
$ c0bpf net show | grep $LXC 
# => lxcc4a3ffff7931(12) tcx/ingress cil_from_container prog_id 1212 link_id 22
#    lxcc4a3ffff7931(12) tcx/egress cil_to_container prog_id 1214 link_id 23 

## Use bpftool prog show id to view additional information about a program, including a list of attached eBPF maps:
# $ c0bpf prog show id <출력된 prog id 입력>
$ c0bpf prog show id 1214
# => 1214: sched_cls  name cil_to_container  tag 0b3125767ba1861c  gpl
#            loaded_at 2025-07-19T08:50:37+0000  uid 0
#            xlated 1448B  jited 1144B  memlock 4096B  map_ids 219,41,218
#            btf_id 468

$ c0bpf map list
# => ...
#    41: percpu_hash  name cilium_metrics  flags 0x1
#            key 8B  value 16B  max_entries 1024  memlock 19024B
#    ...
#    227: array  name .rodata.config  flags 0x480
#            key 4B  value 52B  max_entries 1  memlock 8192B
#            btf_id 496  frozen
#    228: prog_array  name cilium_calls_ne  flags 0x0
#            key 4B  value 4B  max_entries 50  memlock 720B
#            owner_prog_type sched_cls  owner jited
#    ...
  • 다른 노드 간 ‘파드 -> 파드’ 통신을 확인해보겠습니다.
# vagrant ssh k8s-w1 , # vagrant ssh k8s-w2 각각 터미널 접속 후 아래 실행
$ ngrep -tW byline -d eth1 '' 'tcp port 80'

# [k8s-ctr] curl-pod 에서 curl 요청 시도
$ kubectl exec -it curl-pod -- curl $WEBPOD1IP
# 각각 터미널에서 출력 확인 : 파드의 소스 IP와 목적지 IP가 다른 노드의 서버 NIC에서 확인! : Native-Routing
# => ####
#    T 2025/07/19 21:36:42.198609 172.20.2.15:46708 -> 172.20.0.202:80 [AP] #4
#    GET / HTTP/1.1.
#    ...
#    ##
#    T 2025/07/19 21:36:42.200368 172.20.0.202:80 -> 172.20.2.15:46708 [AP] #6
#    HTTP/1.1 200 OK.
#    ...

img.png

노드간 ‘파드 -> 서비스’ 통신

img.png 네트워크기반 로드밸런싱 vs 소켓기반 로드밸런싱 비교

  • Pod1 안에서 동작하는 앱이 connect() 시스템콜을 이용하여 소켓을 연결할 때 목적지 주소가 서비스 주소(10.10.8.55)이면 소켓의 목적지 주소를 바로 백엔드 주소(10.0.0.31)로 설정합니다.
  • 이후 앱에서 해당 소켓을 통해 보내는 모든 패킷의 목적지 주소는 이미 백엔드 주소(10.0.0.31)로 설정되어 있기 때문에 중간에 DNAT 변환 및 역변환 과정이 필요없어집니다.
  • Destination NAT 변환은 시스템 콜 레벨에서 발생하며, 패킷이 커널에 의해 생성되기도 전에 수행됩니다.
  • Socket operations : BPF socket operations program 은 root cgroup 에 연결되며 TCP event(ESTABLISHED) 에서 실행됩니다.
  • Socket send/recv : Socket send/recv hook 은 TCP socket 의 모든 송수신 작업에서 실행되며, hook 에서 검사/삭제/리다이렉션을 할 수 있습니다. img.png https://cilium.io/blog/2020/11/10/ebpf-future-of-networking/

  • 파드 네임스페이스에서 Socket-Based LoadBalancing 기법을 그림으로 정리해보면 아래와 같습니다.

img.png 출처 : [K8S/Cilium] Socket-Based LoadBalancing 기법

  • 그림상의 좌측은 네트워크 기반 로드밸런싱 기법을 사용한 경우이고, 우측은 소켓 기반 로드밸런싱 기법을 사용한 경우입니다.
  • 소켓 기반 로드밸런싱 기법은 네트워크 기반 로드밸런싱 기법과 비교하여 DNAT 변환 및 역변환 과정이 필요 없기 때문에 성능이 향상됩니다.

  • connect() 와 sendto() 소켓 함수에 연결된 프로그램(connect4, sendmsg4)에서는 소켓의 목적지 주소를 백엔드 주소와 포트로 변환하고, cilium_lb4_backends 맵에 백엔드 주소와 포트를 등록해놓습니다.
  • 이후 recvmsg() 소켓 함수에 연결된 프로그램(recvmsg4)에서는 cilium_lb4_reverse_nat 맵을 이용해서 목적지 주소와 포트를 다시 서비스 주소와 포트로 변환합니다. img.png https://k8s.networkop.co.uk/services/clusterip/dataplane/ebpf/
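  • (참고) 위에서 설명한 소켓 훅 프로그램과 관련 BPF 맵은 아래처럼 확인해볼 수 있습니다. cheatsheet의 c0/c0bpf alias와 기본 Helm 값의 cgroup 마운트 경로(/run/cilium/cgroupv2)를 가정한 예시입니다.
# root cgroup 에 attach 된 소켓 관련 BPF 프로그램 확인
$ c0bpf cgroup tree /run/cilium/cgroupv2

# 백엔드 주소 -> 서비스 주소 역변환에 사용되는 맵 확인
$ c0 map get cilium_lb4_reverse_nat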

  • 실습 확인
# curl 호출
$ kubectl exec -it curl-pod -- curl webpod

# 신규 터미널 : 파드에서 SVC(ClusterIP) 접속 시 tcpdump 로 확인 : ClusterIP가 소켓 레벨에서 이미 Endpoint 로 변경되었음을 확인!
$ kubectl exec curl-pod -- tcpdump -enni any -q
# => ...
#    14:09:05.318403 eth0  Out ifindex 11 d6:ab:bb:21:3a:93 172.20.2.15.40982 > 172.20.1.123.80: tcp 0
#    14:09:05.319286 eth0  In  ifindex 11 76:62:d3:8d:58:1f 172.20.1.123.80 > 172.20.2.15.40982: tcp 0
#    ...
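
# (참고) 위 tcpdump 에서 보인 백엔드 IP(172.20.1.123)가 webpod 서비스(ClusterIP 10.96.62.184)의 백엔드로 등록되어 있는지도 확인해볼 수 있습니다. (cheatsheet의 c0 alias 사용 가정)
$ c0 service list | grep -A2 10.96.62.184
$ c0 bpf lb list | grep 10.96.62.184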

# Socket-Based LoadBalancing 관련 설정들 확인
$ c0 status --verbose
# => ...
#    KubeProxyReplacement Details:
#      Status:                 True
#      Socket LB:              Enabled
#      Socket LB Tracing:      Enabled
#      Socket LB Coverage:     Full
#      Devices:                eth0    10.0.2.15 fd17:625c:f037:2:a00:27ff:fe71:19d8 fe80::a00:27ff:fe71:19d8, eth1   192.168.10.100 fe80::a00:27ff:feda:2493 (Direct Routing)
#      Mode:                   SNAT
#      Backend Selection:      Random
#      Session Affinity:       Enabled
#      Graceful Termination:   Enabled
#      NAT46/64 Support:       Disabled
#      XDP Acceleration:       Disabled
#      Services:
#      - ClusterIP:      Enabled
#      - NodePort:       Enabled (Range: 30000-32767)
#      - LoadBalancer:   Enabled
  
# syscall 호출 확인
$ kubectl exec curl-pod -- strace -c curl -s webpod
# => ...
#    % time     seconds  usecs/call     calls    errors syscall
#    ------ ----------- ----------- --------- --------- ----------------
#     <span style="color: green;">19.00    0.001003         334         3         1 connect</span>
#     15.97    0.000843         281         3           sendto
#     15.82    0.000835          23        35           munmap
#     10.59    0.000559          93         6         3 recvfrom
#     10.33    0.000545           8        63           mmap
#      7.58    0.000400           8        47        30 openat
#      4.32    0.000228          10        22           close
#      3.87    0.000204          20        10           lseek
#      2.94    0.000155           6        24           fcntl
#      2.56    0.000135           4        28           rt_sigaction
#      1.46    0.000077           8         9           ppoll
#      1.23    0.000065          16         4           socket
#      0.72    0.000038          12         3         3 ioctl
#      0.68    0.000036          36         1           newfstatat
#      0.63    0.000033           2        14           mprotect
#      <span style="color: green;">0.63    0.000033           1        27           read</span>
#      0.55    0.000029           9         3           readv
#      0.21    0.000011           0        12           fstat
#      0.21    0.000011           0        14           rt_sigprocmask
#      0.17    0.000009           9         1           writev
#      0.15    0.000008           1         5           setsockopt
#      <span style="color: green;">0.15    0.000008           1         5           getsockname</span>
#      0.09    0.000005           5         1           eventfd2
#      0.06    0.000003           0         4           brk
#      0.04    0.000002           2         1           getrandom
#      <span style="color: green;">0.04    0.000002           2         1           getsockopt</span>
#      0.02    0.000001           1         1           getgid
#      0.00    0.000000           0         1           set_tid_address
#      0.00    0.000000           0         1           getuid
#      0.00    0.000000           0         2           geteuid
#      0.00    0.000000           0         1           getegid
#      0.00    0.000000           0         1           execve
#    ------ ----------- ----------- --------- --------- ----------------
#    100.00    0.005278          14       353        37 total

# 상세 출력
$ kubectl exec curl-pod -- strace -s 65535 -f -tt curl -s webpod

# 특정 이벤트 필터링 : -e
## connect 로 출력되는 10.96.62.184 는 webpod Service 의 ClusterIP입니다.
$ kubectl exec curl-pod -- strace -e trace=connect     curl -s webpod
# => connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.96.62.184")}, 16) = 0
#    connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.96.62.184")}, 16) = -1 EINPROGRESS (Operation in progress)
#    ...

## getsockname 으로 출력되는 172.20.2.15 는 curl-pod 의 파드 IP입니다. -> 목적지 webpod 파드 IP가 아닙니다.
$ kubectl exec curl-pod -- strace -e trace=getsockname curl -s webpod
# => getsockname(4, {sa_family=AF_INET, sin_port=htons(52951), sin_addr=inet_addr("172.20.2.15")}, [128 => 16]) = 0
#    getsockname(5, {sa_family=AF_INET, sin_port=htons(42089), sin_addr=inet_addr("172.20.2.15")}, [16]) = 0
#    getsockname(4, {sa_family=AF_INET, sin_port=htons(60934), sin_addr=inet_addr("172.20.2.15")}, [128 => 16]) = 0
#    getsockname(4, {sa_family=AF_INET, sin_port=htons(60934), sin_addr=inet_addr("172.20.2.15")}, [128 => 16]) = 0
#    getsockname(4, {sa_family=AF_INET, sin_port=htons(60934), sin_addr=inet_addr("172.20.2.15")}, [128 => 16]) = 0

$ kubectl exec curl-pod -- strace -e trace=getsockopt curl -s webpod
# => getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 # 소켓 연결 성공
  • strace로 서비스 IP가 백엔드 IP로 바뀌는 과정을 직접 확인하려 했지만, 변환이 connect() 시스템 콜 시점에 커널의 eBPF 훅에서 이미 완료되기 때문에 strace 출력에는 서비스 IP만 보이고 변환 결과는 나타나지 않습니다.
  • 이는 패킷마다 수행되던 DNAT 변환 및 역변환 과정을 없애 성능을 높이는 Cilium 소켓 기반 로드밸런싱의 특징 중 하나입니다.
  • ℹ️ 참고로 strace는 시스템 콜을 추적하는 도구로 다음과 같은 기능들로 사용할 수 있습니다.
    # 중단점 트레이싱 : -ttt(첫 열에 기준시간으로부터 흐른 시간 표시) , -T(마지막 필드 time에 시스템 콜에 걸린 시간을 표시) , -p PID(프로세스 ID가 PID 인 프로세스를 트레이싱)
    $ strace -ttt -T -p 1884
      
    # 시스템 콜별 통계
    $ strace -c -p 1884
      
    # 프로그램 실행시 시스템 콜 추적
    $ strace ls
      
    # 옵션 사용해보기 : -s(출력 string 결과 최댓값 지정), -tt(첫 열에 기준시간으로부터 흐른 시간 표시, ms단위), -f(멀티 스레드,멀티 프로레스의 자식 프로세스의 시스템 콜 추적)
    $ strace -s 65535 -f -T -tt -o <파일명> -p <pid>
      
    # hostname 명령 분석하기 : -o <파일명> 출력 결과를 파일로 떨구기
    $ strace -s 65535 -f -T -tt -o hostname_f_trace hostname -f
      
    # 특정 이벤트 : -e
    $ strace -e trace=connect curl ipinfo.io
    

Cilium 사용시 주의사항

  • 소켓 기반 로드밸런싱 이용 시 Istio(EnvoyProxy)와 같은 사이드카 프록시 우회 문제가 있을 수 있습니다. 링크 img.png
    • 앞서 확인한 것처럼 서비스 IP가 이미 백엔드 IP로 변환되기 때문에, 서비스 IP 기반으로 동작하는 모든 필터가 우회되는 현상입니다.
    • 해결 방안은 파드 네임스페이스에서는 소켓 기반 로드밸런싱을 사용하지 않는 것, 즉 호스트 네임스페이스에서만 사용하도록 설정하는 것입니다.
      • HTTP의 경우 Envoy의 HTTP 필터가 host 헤더를 기준으로 패킷의 목적지 주소를 서비스 IP에서 백엔드 IP로 잘 변환해 줍니다.
      • 하지만 HTTP가 아닌 일반 TCP 서비스(예: Telnet 등)는 위 환경에서 문제가 발생합니다.
      # 설정
      $ VERSION=1.17.5
      $ helm upgrade cilium cilium/cilium --version $VERSION --namespace kube-system --reuse-values \
        --set socketLB.hostNamespaceOnly=true
      $ kubectl -n kube-system rollout restart ds/cilium
      # => daemonset.apps/cilium restarted
        
      $ cilium config view | grep bpf-lb-sock-hostns-only
      # => bpf-lb-sock-hostns-only                           true
        
      # 확인
      # 지속적으로 접속 트래픽 발생
      $ while true; do kubectl exec curl-pod -- curl -s $SVCIP | grep Hostname;echo "-----";sleep 1;done
      # => Hostname: webpod-86f878c468-448pc
      #    -----
      #    Hostname: webpod-86f878c468-ttbs2
      #    -----
      #    Hostname: webpod-86f878c468-ttbs2
      #    ...
        
      # 파드에서 SVC(ClusterIP) 접속 시 tcpdump 로 확인 >> 파드 내부 캡쳐인데, SVC(10.96.62.184) 트래픽이 보인다!
      $ kubectl exec curl-pod -- tcpdump -enni any -q
      # => 14:38:41.369005 eth0  Out ifindex 11 d6:ab:bb:21:3a:93 172.20.2.15.52976 > <span style="color: green;">10.96.62.184.80</span>: tcp 0
      #    14:38:41.369050 eth0  Out ifindex 11 d6:ab:bb:21:3a:93 172.20.2.15.52976 > <span style="color: green;">10.96.62.184.80</span>: tcp 76
      #    14:38:41.369767 eth0  In  ifindex 11 76:62:d3:8d:58:1f <span style="color: green;">10.96.62.184.80</span> > 172.20.2.15.52976: tcp 0
      #    14:38:41.370802 eth0  In  ifindex 11 76:62:d3:8d:58:1f <span style="color: green;">10.96.62.184.80</span> > 172.20.2.15.52976: tcp 327
      #    ...	
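        
      # (참고) 확인을 마친 뒤 다시 소켓 기반 로드밸런싱으로 되돌리려면 같은 방식으로 값을 원복하면 됩니다.
      $ helm upgrade cilium cilium/cilium --version $VERSION --namespace kube-system --reuse-values \
        --set socketLB.hostNamespaceOnly=false
      $ kubectl -n kube-system rollout restart ds/cilium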
      
  • Service ClusterIP로 NFS나 SMB 같은 프로토콜을 사용하면 문제가 발생할 수 있습니다. (Longhorn, Portworx, Robin 등) - Docs, Issue
    • Cilium의 eBPF를 통한 kube-proxy 대체 기능은 socket기반 로드밸런싱을 사용하기 때문에, 앞서 살펴본것 처럼 서비스 IP가 백엔드 IP로 변환되어 사용됩니다.
    • NFS나 SMB 프로토콜은 서비스 IP를 사용하여 통신하기 때문에, socket기반 로드밸런싱을 사용하면 문제가 발생할 수 있습니다. 이 문제는 Longhorn, Portworx, Robin 등과 같은 스토리지 시스템에서 발생할 수 있으며, ReadWriteMany 모드를 사용하는 다른 스토리지 시스템에서도 발생할 수 있습니다.
    • 이를 해결하기 위해서는 다음의 패치들이 커널에 포함되어있어야 합니다.
      • 0bdf399342c5 ("net: Avoid address overwrite in kernel_connect")
      • 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()")
      • 01b2885d9415 ("net: Save and restore msg_namelen in sock_sendmsg")
      • cedc019b9f26 ("smb: use kernel_connect() and kernel_bind()") (SMB only)
    • 위의 패치들은 많은 안정화 커널버전에 백포트 되었으며, 아래의 배포판 중 해당 버전 이상에서는 해결되었습니다.
      • Ubuntu: 5.4.0-187-generic, 5.15.0-113-generic, 6.5.0-41-generic or newer.
      • RHEL 8: 4.18.0-553.8.1.el8_10.x86_64 or newer (RHEL 8.10+).
      • RHEL 9: kernel-5.14.0-427.31.1.el9_4 or newer (RHEL 9.4+).
    • 보다 자세한 사항은 Github Issue 21541를 확인하세요.
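    • 참고로 현재 실습 노드들의 커널 버전은 아래와 같이 확인해볼 수 있습니다.
      $ uname -r
      $ for i in w1 w2 ; do echo ">> node : k8s-$i <<"; sshpass -p 'vagrant' ssh vagrant@k8s-$i uname -r ; echo; done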
  • Cilium은 kubernetes에서 중추적인 역할을 하는 kube-proxy를 대체하기 때문에, Linux Network Stack을 사용하는 애플리케이션을 적용할 때는 꼭 사전 검증이 필요합니다.

마치며

이번 주에는 가장 기본적인 CNI인 Flannel과 가장 고도화된 CNI 중의 하나인 Cilium을 살펴보았습니다. 한번에 두가지를 비교해 보면서 Cilium의 특징과 장점을 확인하는 시간이었습니다. 또한 Cilium이 기능적으로 혁신적이고 최근 기술인 만큼 아직 엣지 케이스가 많이 남아있다는 것도 알 수 있었습니다.

줌 영상 스터디로 설명을 한 번 듣고 정리된 실습 자료를 따라 하는 데에도 시간이 쭉쭉 가고 이해가 잘 안 가는 부분도 많은데, 스터디를 준비해주시는 CloudNet@ 팀 분들이 얼마나 정성과 시간을 쏟으셨을지 생각하면 감사한 마음이 듭니다.

마지막으로 실습환경을 삭제하며 마치겠습니다.

$ vagrant destroy -f && rm -rf .vagrant

  • 💁 참고 : Cilium CMD Cheatsheet
    • Cheatsheet - Docs
    • CMD References - Docs
    # cilium 파드 이름
    $ export CILIUMPOD0=$(kubectl get -l k8s-app=cilium pods -n kube-system --field-selector spec.nodeName=k8s-ctr -o jsonpath='{.items[0].metadata.name}')
    $ export CILIUMPOD1=$(kubectl get -l k8s-app=cilium pods -n kube-system --field-selector spec.nodeName=k8s-w1  -o jsonpath='{.items[0].metadata.name}')
    $ export CILIUMPOD2=$(kubectl get -l k8s-app=cilium pods -n kube-system --field-selector spec.nodeName=k8s-w2  -o jsonpath='{.items[0].metadata.name}')
    $ echo $CILIUMPOD0 $CILIUMPOD1 $CILIUMPOD2
      
    # 단축키(alias) 지정
    $ alias c0="kubectl exec -it $CILIUMPOD0 -n kube-system -c cilium-agent -- cilium"
    $ alias c1="kubectl exec -it $CILIUMPOD1 -n kube-system -c cilium-agent -- cilium"
    $ alias c2="kubectl exec -it $CILIUMPOD2 -n kube-system -c cilium-agent -- cilium"
      
    $ alias c0bpf="kubectl exec -it $CILIUMPOD0 -n kube-system -c cilium-agent -- bpftool"
    $ alias c1bpf="kubectl exec -it $CILIUMPOD1 -n kube-system -c cilium-agent -- bpftool"
    $ alias c2bpf="kubectl exec -it $CILIUMPOD2 -n kube-system -c cilium-agent -- bpftool"
      
    # endpoint
    $ c0 endpoint list
    $ c0 endpoint list -o json
    $ c1 endpoint list
    $ c2 endpoint list
      
    $ c1 endpoint get <id>
    $ c1 endpoint log <id>
      
    ## Enable debugging output on the cilium-dbg monitor for this endpoint
    $ c1 endpoint config <id> Debug=true
      
    # monitor
    $ c1 monitor
    $ c1 monitor -v
    $ c1 monitor -v -v
      
    ## Filter for only the events related to endpoint
    $ c1 monitor --related-to=<id>
      
    ## Show notifications only for dropped packet events
    $ c1 monitor --type drop
      
    ## Don’t dissect packet payload, display payload in hex information
    $ c1 monitor -v -v --hex
      
    ## Layer7
    $ c1 monitor -v --type l7
      
    # Manage IP addresses and associated information - IP List
    $ c0 ip list
      
    # IDENTITY :  1(host), 2(world), 4(health), 6(remote), 파드마다 개별 ID
    $ c0 ip list -n
      
    # Retrieve information about an identity
    $ c0 identity list
      
    # 엔드포인트 기준 ID
    $ c0 identity list --endpoints
      
    # 엔드포인트 설정 확인 및 변경
    $ c0 endpoint config <엔트포인트ID>
      
    # 엔드포인트 상세 정보 확인
    $ c0 endpoint get <엔트포인트ID>
      
    # 엔드포인트 로그 확인
    $ c0 endpoint log <엔트포인트ID>
      
    # Show bpf filesystem mount details
    $ c0 bpf fs show
      
    # bfp 마운트 폴더 확인
    $ tree /sys/fs/bpf
      
    # Get list of loadbalancer services
    $ c0 service list
    $ c1 service list
    $ c2 service list
      
    ## Or you can get the loadbalancer information using bpf list
    $ c0 bpf lb list
    $ c1 bpf lb list
    $ c2 bpf lb list
      
    ## List reverse NAT entries
    $ c1 bpf lb list --revnat
    $ c2 bpf lb list --revnat
      
    # List connection tracking entries
    $ c0 bpf ct list global
    $ c1 bpf ct list global
    $ c2 bpf ct list global
      
    # Flush connection tracking entries
    $ c0 bpf ct flush
    $ c1 bpf ct flush
    $ c2 bpf ct flush
      
    # List all NAT mapping entries
    $ c0 bpf nat list
    $ c1 bpf nat list
    $ c2 bpf nat list
      
    # Flush all NAT mapping entries
    $ c0 bpf nat flush
    $ c1 bpf nat flush
    $ c2 bpf nat flush
      
    # Manage the IPCache mappings for IP/CIDR <-> Identity
    $ c0 bpf ipcache list
      
    # Display cgroup metadata maintained by Cilium
    $ c0 cgroups list
    $ c1 cgroups list
    $ c2 cgroups list
      
    # List all open BPF maps
    $ c0 map list
    $ c1 map list --verbose
    $ c2 map list --verbose
      
    $ c1 map events cilium_lb4_services_v2
    $ c1 map events cilium_lb4_reverse_nat
    $ c1 map events cilium_lxc
    $ c1 map events cilium_ipcache
      
    # List all metrics
    $ c1 metrics list
      
    # List contents of a policy BPF map : Dump all policy maps
    $ c0 bpf policy get --all
    $ c1 bpf policy get --all -n
    $ c2 bpf policy get --all -n
      
    # Dump StateDB contents as JSON
    $ c0 statedb dump
      
    #
    $ c0 shell -- db/show devices
    $ c1 shell -- db/show devices
    $ c2 shell -- db/show devices