[Cilium] Networking - Pod-to-Pod Communication on Nodes in Detail, Part 1

Introduction

In this post, we look at Cilium networking: how Cilium handles communication between the pods on the nodes, covering IP address management (IPAM), routing, masquerading, and DNS configuration.

Lab Environment Setup

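As shown in the following figure, this lab removes one worker node and adds a router node.
- Virtual machines: k8s-ctr, k8s-w1, router
- router: connects to the internal corporate network (10.10.0.0/16). It is not joined to the k8s cluster and hosts the dummy interfaces loop1/loop2.
- The VMs are deployed with the Cilium CNI already installed.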

Lab Deployment Files

Vagrantfile: defines the virtual machines and the initial provisioning that runs at boot.

# Variables
K8SV = '1.33.2-1.1' # Kubernetes Version : apt list -a kubelet, ex) 1.32.5-1.1
CONTAINERDV = '1.7.27-1' # Containerd Version : apt list -a containerd.io, ex) 1.6.33-1
CILIUMV = '1.17.6' # Cilium CNI Version : https://github.com/cilium/cilium/tags
N = 1 # max number of worker nodes
  
# Base Image  https://portal.cloud.hashicorp.com/vagrant/discover/bento/ubuntu-24.04
BOX_IMAGE = "bento/ubuntu-24.04"
BOX_VERSION = "202502.21.0"
  
Vagrant.configure("2") do |config|
#-ControlPlane Node
    config.vm.define "k8s-ctr" do |subconfig|
      subconfig.vm.box = BOX_IMAGE
        
      subconfig.vm.box_version = BOX_VERSION
      subconfig.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--groups", "/Cilium-Lab"]
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
        vb.name = "k8s-ctr"
        vb.cpus = 2
        vb.memory = 2560
        vb.linked_clone = true
      end
      subconfig.vm.host_name = "k8s-ctr"
      subconfig.vm.network "private_network", ip: "192.168.10.100"
      subconfig.vm.network "forwarded_port", guest: 22, host: 60000, auto_correct: true, id: "ssh"
      subconfig.vm.synced_folder "./", "/vagrant", disabled: true
      subconfig.vm.provision "shell", path: "init_cfg.sh", args: [ K8SV, CONTAINERDV ]
      subconfig.vm.provision "shell", path: "k8s-ctr.sh", args: [ N, CILIUMV, K8SV ]
      subconfig.vm.provision "shell", path: "route-add1.sh"
    end
  
#-Worker Nodes Subnet1
  (1..N).each do |i|
    config.vm.define "k8s-w#{i}" do |subconfig|
      subconfig.vm.box = BOX_IMAGE
      subconfig.vm.box_version = BOX_VERSION
      subconfig.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--groups", "/Cilium-Lab"]
        vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
        vb.name = "k8s-w#{i}"
        vb.cpus = 2
        vb.memory = 1536
        vb.linked_clone = true
      end
      subconfig.vm.host_name = "k8s-w#{i}"
      subconfig.vm.network "private_network", ip: "192.168.10.10#{i}"
      subconfig.vm.network "forwarded_port", guest: 22, host: "6000#{i}", auto_correct: true, id: "ssh"
      subconfig.vm.synced_folder "./", "/vagrant", disabled: true
      subconfig.vm.provision "shell", path: "init_cfg.sh", args: [ K8SV, CONTAINERDV]
      subconfig.vm.provision "shell", path: "k8s-w.sh"
      subconfig.vm.provision "shell", path: "route-add1.sh"
    end
  end
  
#-Router Node
    config.vm.define "router" do |subconfig|
      subconfig.vm.box = BOX_IMAGE
      subconfig.vm.box_version = BOX_VERSION
      subconfig.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--groups", "/Cilium-Lab"]
        vb.name = "router"
        vb.cpus = 1
        vb.memory = 768
        vb.linked_clone = true
      end
      subconfig.vm.host_name = "router"
      subconfig.vm.network "private_network", ip: "192.168.10.200"
      subconfig.vm.network "forwarded_port", guest: 22, host: 60009, auto_correct: true, id: "ssh"
      subconfig.vm.synced_folder "./", "/vagrant", disabled: true
      subconfig.vm.provision "shell", path: "router.sh"
    end    
end  
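The Vagrantfile builds each worker's private IP and forwarded SSH port by string concatenation ("192.168.10.10#{i}" and "6000#{i}"). TASK 10 of k8s-ctr.sh writes /etc/hosts the same way. The bash sketch below prints the resulting layout for a given N; the function name `node_table` is hypothetical. It rejects N above 9 because i=10 would produce 192.168.10.1010, which is not a valid IPv4 address. At i=9 the worker's SSH port string, 60009, would also coincide with the router's.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name, private IP, and host SSH port that the
# Vagrantfile's string concatenation produces for N worker nodes.
node_table() {
  local n=$1 i
  if (( n < 1 || n > 9 )); then
    # "192.168.10.10#{i}" only yields a valid last octet for single-digit i
    echo "N must be between 1 and 9" >&2
    return 1
  fi
  echo "k8s-ctr 192.168.10.100 60000"
  for (( i = 1; i <= n; i++ )); do
    echo "k8s-w${i} 192.168.10.10${i} 6000${i}"
  done
  echo "router 192.168.10.200 60009"
}

# Layout used in this lab (N = 1)
node_table 1
```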

init_cfg.sh: performs the initial configuration using the arguments passed to it (the Kubernetes and containerd package versions).

#!/usr/bin/env bash
  
echo ">>>> Initial Config Start <<<<"
  
echo "[TASK 1] Setting Profile & Bashrc"
echo 'alias vi=vim' >> /etc/profile
echo "sudo su -" >> /home/vagrant/.bashrc
ln -sf /usr/share/zoneinfo/Asia/Seoul /etc/localtime # Change Timezone
  
echo "[TASK 2] Disable AppArmor"
systemctl stop ufw && systemctl disable ufw >/dev/null 2>&1
systemctl stop apparmor && systemctl disable apparmor >/dev/null 2>&1
  
echo "[TASK 3] Disable and turn off SWAP"
swapoff -a && sed -i '/swap/s/^/#/' /etc/fstab
  
echo "[TASK 4] Install Packages"
apt update -qq >/dev/null 2>&1
apt-get install apt-transport-https ca-certificates curl gpg -y -qq >/dev/null 2>&1
  
# Download the public signing key for the Kubernetes package repositories.
mkdir -p -m 755 /etc/apt/keyrings
K8SMMV=$(echo $1 | sed -En 's/^([0-9]+\.[0-9]+)\..*/\1/p')
curl -fsSL https://pkgs.k8s.io/core:/stable:/v$K8SMMV/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$K8SMMV/deb/ /" >> /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
  
# enable IPv4 forwarding now and persist it across reboots
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf
  
# enable br_netfilter for iptables 
modprobe br_netfilter
modprobe overlay
echo "br_netfilter" >> /etc/modules-load.d/k8s.conf
echo "overlay" >> /etc/modules-load.d/k8s.conf
  
echo "[TASK 5] Install Kubernetes components (kubeadm, kubelet and kubectl)"
# Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version
apt update >/dev/null 2>&1
  
# apt list -a kubelet ; apt list -a containerd.io
apt-get install -y kubelet=$1 kubectl=$1 kubeadm=$1 containerd.io=$2 >/dev/null 2>&1
apt-mark hold kubelet kubeadm kubectl >/dev/null 2>&1
  
# containerd configure to default and cgroup managed by systemd
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
  
# avoid WARN&ERRO(default endpoints) when crictl run  
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
  
# ready to install for k8s 
systemctl restart containerd && systemctl enable containerd
systemctl enable --now kubelet
  
echo "[TASK 6] Install Packages & Helm"
export DEBIAN_FRONTEND=noninteractive
apt-get install -y bridge-utils sshpass net-tools conntrack ngrep tcpdump ipset arping wireguard jq yq tree bash-completion unzip kubecolor termshark >/dev/null 2>&1
curl -s https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash >/dev/null 2>&1
  
echo ">>>> Initial Config End <<<<"
  

k8s-ctr.sh: sets up the Kubernetes control-plane node with kubeadm init, installs the Cilium CNI, and adds convenience aliases (k, kc).

#!/usr/bin/env bash
  
echo ">>>> K8S Controlplane config Start <<<<"
  
echo "[TASK 1] Initial Kubernetes"
curl --silent -o /root/kubeadm-init-ctr-config.yaml https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/cilium-study/kubeadm-init-ctr-config.yaml
K8SMMV=$(echo $3 | sed -En 's/^([0-9]+\.[0-9]+\.[0-9]+).*/\1/p')
sed -i "s/K8S_VERSION_PLACEHOLDER/v${K8SMMV}/g" /root/kubeadm-init-ctr-config.yaml
kubeadm init --config="/root/kubeadm-init-ctr-config.yaml"  >/dev/null 2>&1
  
echo "[TASK 2] Setting kube config file"
mkdir -p /root/.kube
cp -i /etc/kubernetes/admin.conf /root/.kube/config
chown $(id -u):$(id -g) /root/.kube/config
  
echo "[TASK 3] Source the completion"
echo 'source <(kubectl completion bash)' >> /etc/profile
echo 'source <(kubeadm completion bash)' >> /etc/profile
  
echo "[TASK 4] Alias kubectl to k"
echo 'alias k=kubectl' >> /etc/profile
echo 'alias kc=kubecolor' >> /etc/profile
echo 'complete -F __start_kubectl k' >> /etc/profile
  
echo "[TASK 5] Install Kubectx & Kubens"
git clone https://github.com/ahmetb/kubectx /opt/kubectx >/dev/null 2>&1
ln -s /opt/kubectx/kubens /usr/local/bin/kubens
ln -s /opt/kubectx/kubectx /usr/local/bin/kubectx
  
echo "[TASK 6] Install Kubeps & Setting PS1"
git clone https://github.com/jonmosco/kube-ps1.git /root/kube-ps1 >/dev/null 2>&1
cat <<"EOT" >> /root/.bash_profile
source /root/kube-ps1/kube-ps1.sh
KUBE_PS1_SYMBOL_ENABLE=true
function get_cluster_short() {
  echo "$1" | cut -d . -f1
}
KUBE_PS1_CLUSTER_FUNCTION=get_cluster_short
KUBE_PS1_SUFFIX=') '
PS1='$(kube_ps1)'$PS1
EOT
kubectl config rename-context "kubernetes-admin@kubernetes" "HomeLab" >/dev/null 2>&1
  
echo "[TASK 7] Install Cilium CNI"
NODEIP=$(ip -4 addr show eth1 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
helm repo add cilium https://helm.cilium.io/ >/dev/null 2>&1
helm repo update >/dev/null 2>&1
helm install cilium cilium/cilium --version $2 --namespace kube-system \
--set k8sServiceHost=192.168.10.100 --set k8sServicePort=6443 \
--set ipam.mode="kubernetes" --set k8s.requireIPv4PodCIDR=true --set ipv4NativeRoutingCIDR=10.244.0.0/16 \
--set routingMode=native --set autoDirectNodeRoutes=true --set endpointRoutes.enabled=true \
--set kubeProxyReplacement=true --set bpf.masquerade=true --set installNoConntrackIptablesRules=true \
--set endpointHealthChecking.enabled=false --set healthChecking=false \
--set hubble.enabled=true --set hubble.relay.enabled=true --set hubble.ui.enabled=true \
--set hubble.ui.service.type=NodePort --set hubble.ui.service.nodePort=30003 \
--set prometheus.enabled=true --set operator.prometheus.enabled=true --set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set operator.replicas=1 --set debug.enabled=true >/dev/null 2>&1
#--set ipam.mode="cluster-pool" --set ipam.operator.clusterPoolIPv4PodCIDRList={"172.20.0.0/16"} --set ipv4NativeRoutingCIDR=172.20.0.0/16 \
  
echo "[TASK 8] Install Cilium / Hubble CLI"
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz >/dev/null 2>&1
tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz
  
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then HUBBLE_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz >/dev/null 2>&1
tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /usr/local/bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz
  
echo "[TASK 9] Remove node taint"
kubectl taint nodes k8s-ctr node-role.kubernetes.io/control-plane-
  
echo "[TASK 10] local DNS with hosts file"
echo "192.168.10.100 k8s-ctr" >> /etc/hosts
for (( i=1; i<=$1; i++  )); do echo "192.168.10.10$i k8s-w$i" >> /etc/hosts; done
  
echo "[TASK 11] Install Prometheus & Grafana"
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.17.6/examples/kubernetes/addons/prometheus/monitoring-example.yaml >/dev/null 2>&1
kubectl patch svc -n cilium-monitoring prometheus -p '{"spec": {"type": "NodePort", "ports": [{"port": 9090, "targetPort": 9090, "nodePort": 30001}]}}' >/dev/null 2>&1
kubectl patch svc -n cilium-monitoring grafana -p '{"spec": {"type": "NodePort", "ports": [{"port": 3000, "targetPort": 3000, "nodePort": 30002}]}}' >/dev/null 2>&1
  
echo "[TASK 12] Dynamically provisioning persistent local storage with Kubernetes"
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml >/dev/null 2>&1
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' >/dev/null 2>&1
  
echo ">>>> K8S Controlplane Config End <<<<"

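In this lab, the --set endpointHealthChecking.enabled=false and --set healthChecking=false options turn off Cilium's health checking entirely. This covers both the endpoint health checks and the connectivity health checks.
For reference, the docs recommend enabling this health-check feature only on relatively small clusters (3 to 10 nodes), because in large clusters firewall policies or hypervisor settings can cause packet loss. (Docs)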
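Both provisioning scripts derive other version strings from the apt package version in K8SV with sed. Both store the result in a variable named K8SMMV, even though the values differ:

- init_cfg.sh keeps only major.minor, because pkgs.k8s.io has a separate repository path per minor release.
- k8s-ctr.sh keeps major.minor.patch and prefixes it with v for kubeadm's kubernetesVersion.

The sketch below reuses the same regular expressions; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: the two version derivations used by the provisioning scripts.

# init_cfg.sh style: 1.33.2-1.1 -> 1.33 (per-minor pkgs.k8s.io repository)
repo_minor() {
  echo "$1" | sed -En 's/^([0-9]+\.[0-9]+)\..*/\1/p'
}

# k8s-ctr.sh style: 1.33.2-1.1 -> v1.33.2 (kubeadm kubernetesVersion)
kubeadm_version() {
  echo "v$(echo "$1" | sed -En 's/^([0-9]+\.[0-9]+\.[0-9]+).*/\1/p')"
}

K8SV='1.33.2-1.1'
echo "apt repo : https://pkgs.k8s.io/core:/stable:/v$(repo_minor "$K8SV")/deb/"
echo "kubeadm  : $(kubeadm_version "$K8SV")"
```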

kubeadm-init-ctr-config.yaml

apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
bootstrapTokens:
- token: "123456.1234567890123456"
  ttl: "0s"
  usages:
  - signing
  - authentication
localAPIEndpoint:
  advertiseAddress: "192.168.10.100"
nodeRegistration:
  kubeletExtraArgs:
    - name: node-ip
      value: "192.168.10.100"
  criSocket: "unix:///run/containerd/containerd.sock"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "K8S_VERSION_PLACEHOLDER"
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/16"

k8s-w.sh: configures the Kubernetes worker node and joins it to the cluster with kubeadm join. Cilium itself reaches the worker through the DaemonSet installed from k8s-ctr.

#!/usr/bin/env bash
  
echo ">>>> K8S Node config Start <<<<"
  
echo "[TASK 1] K8S Controlplane Join"
curl --silent -o /root/kubeadm-join-worker-config.yaml https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/cilium-study/2w/kubeadm-join-worker-config.yaml
NODEIP=$(ip -4 addr show eth1 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
sed -i "s/NODE_IP_PLACEHOLDER/${NODEIP}/g" /root/kubeadm-join-worker-config.yaml
kubeadm join --config="/root/kubeadm-join-worker-config.yaml" > /dev/null 2>&1
  
echo ">>>> K8S Node config End <<<<"

kubeadm-join-worker-config.yaml

apiVersion: kubeadm.k8s.io/v1beta4
kind: JoinConfiguration
discovery:
  bootstrapToken:
    token: "123456.1234567890123456"
    apiServerEndpoint: "192.168.10.100:6443"
    unsafeSkipCAVerification: true
nodeRegistration:
  criSocket: "unix:///run/containerd/containerd.sock"
  kubeletExtraArgs:
    - name: node-ip
      value: "NODE_IP_PLACEHOLDER"
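k8s-w.sh fills in this join configuration in two steps: it reads the node IP from eth1, then substitutes it for NODE_IP_PLACEHOLDER. The sketch below reproduces that flow on an illustrative, made-up sample of `ip -4 addr show eth1` output, not one captured from the VM. It uses awk instead of the script's `grep -oP`, so it does not depend on grep being built with PCRE support. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the NODE_IP_PLACEHOLDER substitution performed by k8s-w.sh.

first_ipv4() {
  # print the address part of the first "inet a.b.c.d/nn" line on stdin
  awk '$1 == "inet" { split($2, a, "/"); print a[1]; exit }'
}

render_join_config() {
  # $1 = node IP; template on stdin, rendered config on stdout
  sed "s/NODE_IP_PLACEHOLDER/$1/g"
}

# Illustrative sample of `ip -4 addr show eth1` output (assumed, not captured)
sample_ip_output='3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.10.101/24 brd 192.168.10.255 scope global eth1
       valid_lft forever preferred_lft forever'

NODEIP=$(printf '%s\n' "$sample_ip_output" | first_ipv4)
printf '%s\n' 'kubeletExtraArgs:' '  - name: node-ip' '    value: "NODE_IP_PLACEHOLDER"' \
  | render_join_config "$NODEIP"
```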

route-add1.sh: adds a route so that the k8s nodes can reach the (simulated) internal corporate network 10.10.0.0/16 through the router.

#!/usr/bin/env bash
  
echo ">>>> Route Add Config Start <<<<"
  
chmod 600 /etc/netplan/01-netcfg.yaml
chmod 600 /etc/netplan/50-vagrant.yaml
  
cat <<EOT>> /etc/netplan/50-vagrant.yaml
      routes:
      - to: 10.10.0.0/16
        via: 192.168.10.200
EOT
  
netplan apply
  
echo ">>>> Route Add Config End <<<<"
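route-add1.sh appends the routes block with a heredoc. This relies on two things: 50-vagrant.yaml must end with the eth1 interface block, and the script must run only once. A second run would append another routes key under the same interface. Below is a hypothetical idempotent variant that keeps the same assumption about the file layout. The sample file content in the demo is assumed, not copied from the VM.

```bash
#!/usr/bin/env bash
# Hypothetical idempotent variant of route-add1.sh's heredoc append.
# Assumption (same as the original script): the netplan file ends with the eth1
# block, so a 6-space-indented "routes:" key lands under that interface.
add_route_once() {
  local file=$1 cidr=$2 gw=$3
  if grep -qF "to: ${cidr}" "$file"; then
    echo "route to ${cidr} already present in ${file}"
    return 0
  fi
  cat <<EOT >> "$file"
      routes:
      - to: ${cidr}
        via: ${gw}
EOT
}

# Demo on a temporary copy with an assumed 50-vagrant.yaml layout
demo=$(mktemp)
printf '%s\n' 'network:' '  version: 2' '  ethernets:' '    eth1:' \
  '      addresses:' '      - 192.168.10.101/24' > "$demo"
add_route_once "$demo" 10.10.0.0/16 192.168.10.200
add_route_once "$demo" 10.10.0.0/16 192.168.10.200   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

On a node, this would be invoked as `add_route_once /etc/netplan/50-vagrant.yaml 10.10.0.0/16 192.168.10.200` followed by `netplan apply`.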

router.sh: performs the initial setup of the server that acts as the router and also as a web server.

#!/usr/bin/env bash
  
echo ">>>> Initial Config Start <<<<"
  
echo "[TASK 1] Setting Profile & Bashrc"
echo 'alias vi=vim' >> /etc/profile
echo "sudo su -" >> /home/vagrant/.bashrc
ln -sf /usr/share/zoneinfo/Asia/Seoul /etc/localtime
  
echo "[TASK 2] Disable AppArmor"
systemctl stop ufw && systemctl disable ufw >/dev/null 2>&1
systemctl stop apparmor && systemctl disable apparmor >/dev/null 2>&1
  
echo "[TASK 3] Add Kernel setting - IP Forwarding"
sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/g' /etc/sysctl.conf
sysctl -p >/dev/null 2>&1
  
echo "[TASK 4] Setting Dummy Interface"
modprobe dummy
ip link add loop1 type dummy
ip link set loop1 up
ip addr add 10.10.1.200/24 dev loop1
  
ip link add loop2 type dummy
ip link set loop2 up
ip addr add 10.10.2.200/24 dev loop2
  
echo "[TASK 5] Install Packages"
export DEBIAN_FRONTEND=noninteractive
apt update -qq >/dev/null 2>&1
apt-get install net-tools jq tree ngrep tcpdump arping -y -qq >/dev/null 2>&1
  
echo "[TASK 6] Install Apache"
apt install apache2 -y >/dev/null 2>&1
echo -e "<h1>Web Server : $(hostname)</h1>" > /var/www/html/index.html
  
echo ">>>> Initial Config End <<<<"
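Once route-add1.sh is applied, a k8s node can reach the router's dummy interfaces: 10.10.1.200 on loop1 and 10.10.2.200 on loop2. This works because the kernel picks the most specific matching route, and the 10.10.0.0/16 route via 192.168.10.200 is more specific than the default route.

The sketch below illustrates longest-prefix matching in bash over a simplified routing table. The table is an assumption for illustration, not output captured from the lab. That includes the default gateway 10.0.2.2, which is the usual VirtualBox NAT gateway. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: longest-prefix-match route lookup over an assumed, simplified table.

ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {
  # $1 = IP, $2 = network/prefix; succeeds if the IP is inside the network
  local ip net len mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

lookup_route() {
  # $1 = destination IP; routes on stdin as "<cidr> <next-hop>"
  local dst=$1 cidr hop best_len=-1 best_hop=""
  while read -r cidr hop; do
    [ -z "$cidr" ] && continue
    if in_cidr "$dst" "$cidr" && (( ${cidr#*/} > best_len )); then
      best_len=${cidr#*/}
      best_hop=$hop
    fi
  done
  echo "$best_hop"
}

ROUTES='0.0.0.0/0 10.0.2.2
192.168.10.0/24 direct(eth1)
10.10.0.0/16 192.168.10.200'

for dst in 10.10.1.200 10.10.2.200 192.168.10.101 8.8.8.8; do
  echo "$dst -> $(lookup_route "$dst" <<< "$ROUTES")"
done
```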

Deploying the Lab Environment and Installing Analysis Tools

Deploying the lab environment

$ vagrant up
# =>     ...
#        router: [TASK 5] Install Packages
#        router: [TASK 6] Install Apache
#        router: >>>> Initial Config End <<<<

To check the k8s-ctr and Cilium installation details, see last week's post. (link)

k9s

This week, we install and look at k9s, a CLI-based Kubernetes dashboard tool. (GitHub)

Installation and launch

# For arm64 CPUs
$ wget https://github.com/derailed/k9s/releases/latest/download/k9s_linux_arm64.deb -O /tmp/k9s_linux_arm64.deb
$ apt install /tmp/k9s_linux_arm64.deb
# => ...
#    Preparing to unpack /tmp/k9s_linux_arm64.deb ...
#    Unpacking k9s (0.50.9) ...
#    Setting up k9s (0.50.9) ...
    
# For amd64 CPUs
$ wget https://github.com/derailed/k9s/releases/latest/download/k9s_linux_amd64.deb -O /tmp/k9s_linux_amd64.deb
$ apt install /tmp/k9s_linux_amd64.deb
    
# Check where k9s was installed
$ which k9s
# => /usr/bin/k9s
    
# Run k9s
$ k9s

(Screenshot: k9s running) Although it runs in a terminal, the layout makes the cluster state easy to take in at a glance.
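When k8s-ctr.sh downloads the Cilium and Hubble CLIs, it already maps `uname -m` to a release architecture. The same idea can pick the right k9s package, so you don't have to choose between the two install commands above by hand. The sketch below uses the release URL pattern from those commands; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: pick the k9s .deb for a given `uname -m` value, mirroring the
# CLI_ARCH mapping used in k8s-ctr.sh.
deb_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

k9s_deb_url() {
  local arch
  arch=$(deb_arch "$1") || return 1
  echo "https://github.com/derailed/k9s/releases/latest/download/k9s_linux_${arch}.deb"
}

k9s_deb_url x86_64
```

On a node, for example: `wget "$(k9s_deb_url "$(uname -m)")" -O /tmp/k9s.deb && apt install /tmp/k9s.deb`.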

Basic k9s usage

# Check the version
$ k9s version
# =>  ____  __ ________
#    |    |/  /   __   \______
#    |       /\____    /  ___/
#    |    \   \  /    /\___  \
#    |____|\__ \/____//____  /
#             \/           \/
#    Version:    v0.50.9
#    Commit:     ffdc7b70f044e1f26c2f6fbb93b5495e4ebdb1ad
    
# Show information about the k9s runtime
$ k9s info
# => ...
#    Version:           v0.50.9
#    Config:            /root/.config/k9s/config.yaml
#    Custom Views:      /root/.config/k9s/views.yaml
#    Plugins:           /root/.config/k9s/plugins.yaml
#    Hotkeys:           /root/.config/k9s/hotkeys.yaml
#    Aliases:           /root/.config/k9s/aliases.yaml
#    Skins:             /root/.config/k9s/skins
#    Context Configs:   /root/.local/share/k9s/clusters
#    Logs:              /root/.local/state/k9s/k9s.log
#    Benchmarks:        /root/.local/state/k9s/benchmarks
#    ScreenDumps:       /root/.local/state/k9s/screen-dumps
    
# CLI help
$ k9s help
    
# Start K9s in a specific namespace
$ k9s -n mycoolns
    
# Start K9s with a context that exists in your kubeconfig
$ k9s --context coolCtx
    
# Start K9s in read-only mode: commands that modify the cluster are disabled.
$ k9s --readonly

termshark

A tool for inspecting packets in the terminal, Wireshark-style. (GitHub, Home)

# Already installed (by init_cfg.sh)
$ export DEBIAN_FRONTEND=noninteractive
$ apt-get install -y termshark
# => Reading package lists... Done
#    Building dependency tree... Done
#    Reading state information... Done
#    termshark is already the newest version (2.4.0-1ubuntu0.24.04.3).
#    0 upgraded, 0 newly installed, 0 to remove and 170 not upgraded.
    
# Analyze a pcap file
$ termshark -r test.pcap
    
# Capture ping (ICMP) packets on the eth0 interface.
$ termshark -i eth0 icmp

(Screenshot: termshark running)

IPAM

IPAM stands for IP Address Management. It is the system that allocates and manages IP addresses for network endpoints such as containers. (Docs)

Feature	Kubernetes Host Scope	Cluster Scope (default)	Multi-Pool (Beta)	CRD-backed	AWS ENI…
Tunnel routing	✅	✅	❌	❌	❌
Direct routing	✅	✅	✅	✅	✅
CIDR Configuration	Kubernetes	Cilium	Cilium	External	External (AWS)
Multiple CIDRs per cluster	❌	✅	✅	N/A	N/A
Multiple CIDRs per node	❌	❌	✅	N/A	N/A
Dynamic CIDR/IP allocation	❌	❌	✅	✅	✅

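Do not change the IPAM mode of an existing cluster.
Changing the IPAM mode in a live environment can cause persistent connectivity disruption for existing workloads.
The safest way to change the IPAM mode is to install a new Kubernetes cluster with the new IPAM configuration.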

Kubernetes Host Scope

Docs

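The Kubernetes host-scope IPAM mode is enabled with ipam: kubernetes. It delegates address allocation to each individual node in the cluster.
IPs are allocated from the PodCIDR range that Kubernetes associates with each node. In other words, Kubernetes is the one that configures the CIDRs.
In this mode, the Cilium agent waits at startup until the PodCIDR ranges for all enabled address families are available through the Kubernetes v1.Node object. In this lab, kube-controller-manager populates them because it runs with --allocate-node-cidrs=true, as the output below shows.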

# Check cluster information
$ kubectl cluster-info dump | grep -m 2 -E "cluster-cidr|service-cluster-ip-range"
# =>                             "--service-cluster-ip-range=10.96.0.0/16",
#                                "--cluster-cidr=10.244.0.0/16",

# Check the IPAM mode
$ cilium config view | grep ^ipam
# => ipam                                              kubernetes
#    ipam-cilium-node-update-rate                      15s

# Check the per-node PodCIDR used to assign pod IPs
# kube-controller-manager, running with --allocate-node-cidrs=true, allocates the CIDRs automatically
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# => k8s-ctr 10.244.0.0/24
#    k8s-w1  10.244.1.0/24

$ kc describe pod -n kube-system kube-controller-manager-k8s-ctr
# => ....
#    Command:
#          kube-controller-manager
#          --allocate-node-cidrs=true
#          --cluster-cidr=10.244.0.0/16
#          --service-cluster-ip-range=10.96.0.0/16
#    ...

$ kubectl get ciliumnode -o json | grep podCIDRs -A2
# =>                     "podCIDRs": [
#                            "10.244.0.0/24"
#                        ],
#    --
#                        "podCIDRs": [
#                            "10.244.1.0/24"
#                        ],

# Pod info: check endpoint state and pod IPs
$ kubectl get ciliumendpoints.cilium.io -A
# => NAMESPACE            NAME                                      SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    cilium-monitoring    grafana-5c69859d9-zgx9k                   22364               ready            10.244.0.5
#    cilium-monitoring    prometheus-6fc896bc5d-5rbpx               15628               ready            10.244.0.143
#    kube-system          coredns-674b8bbfcf-dhckj                  28257               ready            10.244.0.155
#    kube-system          coredns-674b8bbfcf-n4xbw                  28257               ready            10.244.0.7
#    kube-system          hubble-relay-5dcd46f5c-9bcxl              9702                ready            10.244.0.59
#    kube-system          hubble-ui-76d4965bb6-rq9gv                64346               ready            10.244.0.166
#    local-path-storage   local-path-provisioner-74f9666bc9-rrn2r   29718               ready            10.244.0.130
# 👉 All pods are currently running on k8s-ctr, so they were all assigned IPs from 10.244.0.0/24.
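In Kubernetes Host Scope mode, the per-node PodCIDRs above are carved out of the 10.244.0.0/16 cluster CIDR. Assume kube-controller-manager uses its default --node-cidr-mask-size of 24 for IPv4, which matches the /24s above. Then the /16 yields 256 node blocks.

The sketch below does two things:

- It computes the n-th block of the cluster CIDR.
- It maps a pod IP back to the node whose PodCIDR contains it, using the node CIDRs shown above.

The function names are hypothetical. The index-to-node order is only illustrative, because the controller picks the free blocks itself.

```bash
#!/usr/bin/env bash
# Sketch: per-node PodCIDR carving and pod-IP-to-node lookup.

ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

in_cidr() {
  # $1 = IP, $2 = network/prefix
  local ip net len mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# nth_node_cidr <cluster-cidr> <node-mask> <index>
nth_node_cidr() {
  local base=${1%/*} node_mask=$2 idx=$3
  local size=$(( 1 << (32 - node_mask) ))
  echo "$(int_to_ip $(( $(ip_to_int "$base") + idx * size )))/${node_mask}"
}

# Node PodCIDRs as reported by `kubectl get nodes` above
NODE_CIDRS='k8s-ctr 10.244.0.0/24
k8s-w1 10.244.1.0/24'

node_for_ip() {
  local ip=$1 node cidr
  while read -r node cidr; do
    if in_cidr "$ip" "$cidr"; then
      echo "$node"
      return 0
    fi
  done <<< "$NODE_CIDRS"
  echo "unknown"
  return 1
}

echo "node blocks in 10.244.0.0/16 with /24: $(( 1 << (24 - 16) ))"
for i in 0 1; do nth_node_cidr 10.244.0.0/16 24 "$i"; done
for ip in 10.244.0.5 10.244.0.155 10.244.0.130; do
  echo "$ip -> $(node_for_ip "$ip")"
done
```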

Deploying and checking a sample application

Deploy a sample application and verify that IPAM works correctly.

# Deploy the sample application
$ cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webpod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webpod
  template:
    metadata:
      labels:
        app: webpod
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - webpod
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: webpod
        image: traefik/whoami
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: webpod
  labels:
    app: webpod
spec:
  selector:
    app: webpod
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
EOF
# => deployment.apps/webpod created
#    service/webpod created

# Deploy curl-pod on the k8s-ctr node
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: curl-pod
  labels:
    app: curl
spec:
  nodeName: k8s-ctr
  containers:
  - name: curl
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF
# => pod/curl-pod created

Verifying the deployment

# Verify the deployment
$ kubectl get deploy,svc,ep webpod -owide
# => NAME                     READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES           SELECTOR
#    deployment.apps/webpod   2/2     2            2           77s   webpod       traefik/whoami   app=webpod
#    
#    NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE   SELECTOR
#    service/webpod   ClusterIP   10.96.194.47   <none>        80/TCP    77s   app=webpod
#    
#    NAME               ENDPOINTS                       AGE
#    endpoints/webpod   10.244.0.2:80,10.244.1.188:80   77s
$ kubectl get endpointslices -l app=webpod
# => NAME           ADDRESSTYPE   PORTS   ENDPOINTS                 AGE
#    webpod-j45jt   IPv4          80      10.244.0.2,10.244.1.188   95s
$ kubectl get ciliumendpoints # check pod IPs
# => NAME                      SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    curl-pod                  1072                ready            10.244.0.188
#    webpod-697b545f57-2zpdp   24748               ready            10.244.0.2
#    webpod-697b545f57-thl79   24748               ready            10.244.1.188
$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg endpoint list

# Check connectivity
$ kubectl exec -it curl-pod -- curl webpod | grep Hostname
# => Hostname: webpod-697b545f57-2zpdp
$ kubectl exec -it curl-pod -- sh -c 'while true; do curl -s webpod | grep Hostname; sleep 1; done'
# => Hostname: webpod-697b545f57-thl79
#    Hostname: webpod-697b545f57-2zpdp
#    Hostname: webpod-697b545f57-thl79
#    Hostname: webpod-697b545f57-thl79
#    Hostname: webpod-697b545f57-2zpdp
#    ...
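The loop output above alternates between the two webpod backends. To summarize the distribution over a longer run, count the Hostname lines. The sketch below does that; the function name is hypothetical. It also strips the carriage returns that a `kubectl exec -it` TTY may add to each line.

```bash
#!/usr/bin/env bash
# Sketch: count how many responses each webpod backend served.
# Input: "Hostname: <pod>" lines, e.g. saved output of the curl loop.
tally_backends() {
  tr -d '\r' \
    | awk '$1 == "Hostname:" { n[$2]++ } END { for (h in n) print n[h], h }' \
    | sort -k2
}

printf 'Hostname: %s\n' webpod-697b545f57-thl79 webpod-697b545f57-2zpdp \
  webpod-697b545f57-thl79 webpod-697b545f57-thl79 webpod-697b545f57-2zpdp \
  | tally_backends
```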

Checking with Hubble

# Hubble UI web address (select the default namespace in the UI)
$ NODEIP=$(ip -4 addr show eth1 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
$ echo -e "http://$NODEIP:30003"
# => http://192.168.10.100:30003

# Start Hubble Relay port forwarding
$ cilium hubble port-forward&
# => ℹ️   Hubble Relay is available at 127.0.0.1:4245
$ hubble status
# => Healthcheck (via localhost:4245): Ok
#    Current/Max Flows: 5,284/8,190 (64.52%)
#    Flows/s: 33.82
#    Connected Nodes: 2/2

# Monitor flow logs
$ hubble observe -f --protocol tcp --to-pod curl-pod
# => Aug  2 08:36:29.808: default/curl-pod:56772 (ID:1072) <- default/webpod-697b545f57-thl79:80 (ID:24748) to-network FORWARDED (TCP Flags: ACK, FIN)
#    Aug  2 08:36:30.530: default/curl-pod:50248 (ID:1072) <- default/webpod-697b545f57-2zpdp:80 (ID:24748) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
#    Aug  2 08:36:30.533: default/curl-pod:50248 (ID:1072) <- default/webpod-697b545f57-2zpdp:80 (ID:24748) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
#    ...
$ hubble observe -f --protocol tcp --from-pod curl-pod
# => Aug  2 08:36:57.991: default/curl-pod (ID:1072) <> 10.96.194.47:80 (world) pre-xlate-fwd TRACED (TCP)
#    Aug  2 08:36:57.991: default/curl-pod (ID:1072) <> default/webpod-697b545f57-thl79:80 (ID:24748) post-xlate-fwd TRANSLATED (TCP)
#    Aug  2 08:36:57.991: default/curl-pod:51316 (ID:1072) -> default/webpod-697b545f57-thl79:80 (ID:24748) to-network FORWARDED (TCP Flags: SYN)
#    Aug  2 08:36:57.992: default/curl-pod:51316 (ID:1072) -> default/webpod-697b545f57-thl79:80 (ID:24748) to-network FORWARDED (TCP Flags: ACK, PSH)
#    Aug  2 08:36:57.993: default/curl-pod:51316 (ID:1072) -> default/webpod-697b545f57-thl79:80 (ID:24748) to-network FORWARDED (TCP Flags: ACK)
#    Aug  2 08:36:57.999: default/curl-pod:51316 (ID:1072) -> default/webpod-697b545f57-thl79:80 (ID:24748) to-network FORWARDED (TCP Flags: ACK, FIN)
#    Aug  2 08:36:57.999: default/curl-pod:51316 (ID:1072) -> default/webpod-697b545f57-thl79:80 (ID:24748) to-network FORWARDED (TCP Flags: ACK)
$ hubble observe -f --protocol tcp --pod curl-pod
# => Aug  2 08:37:22.316: default/curl-pod (ID:1072) <> 10.96.194.47:80 (world) pre-xlate-fwd TRACED (TCP)
#    Aug  2 08:37:22.316: default/curl-pod (ID:1072) <> default/webpod-697b545f57-2zpdp:80 (ID:24748) post-xlate-fwd TRANSLATED (TCP)
#    Aug  2 08:37:22.317: default/curl-pod:51022 (ID:1072) -> default/webpod-697b545f57-2zpdp:80 (ID:24748) to-endpoint FORWARDED (TCP Flags: SYN)
#    Aug  2 08:37:22.317: default/curl-pod:51022 (ID:1072) <- default/webpod-697b545f57-2zpdp:80 (ID:24748) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
#    Aug  2 08:37:22.317: default/curl-pod:51022 (ID:1072) -> default/webpod-697b545f57-2zpdp:80 (ID:24748) to-endpoint FORWARDED (TCP Flags: ACK)
# 👉 pre-xlate-fwd, TRACED: a traced flow before NAT (address translation)
#    post-xlate-fwd, TRANSLATED: the flow after NAT; the address translation has taken place

# Send requests
$ kubectl exec -it curl-pod -- curl webpod | grep Hostname
# => Hostname: webpod-697b545f57-2zpdp
$ kubectl exec -it curl-pod -- curl webpod | grep Hostname
# => Hostname: webpod-697b545f57-thl79
# or
$ kubectl exec -it curl-pod -- sh -c 'while true; do curl -s webpod | grep Hostname; sleep 1; done'

# Check with tcpdump: the pod IPs are visible on the wire
$ tcpdump -i eth1 tcp port 80 -nn
# => 20:11:41.085067 IP 10.244.0.188.56954 > 10.244.1.188.80: Flags [P.], seq 1:71, ack 1, win 502, options [nop,nop,TS val 2064675417 ecr 2283779636], length 70: HTTP: GET / HTTP/1.1

# Capture HTTP packets
$ tcpdump -i eth1 tcp port 80 -w /tmp/http.pcap

# Analyze the pcap file with termshark
$ termshark -r /tmp/http.pcap
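The tcpdump line above shows the original pod IPs (10.244.0.188 to 10.244.1.188) directly on eth1. There are two reasons: with routingMode=native there is no tunnel header, and traffic destined for ipv4NativeRoutingCIDR (10.244.0.0/16) is not masqueraded. The awk sketch below pulls the source and destination addresses out of tcpdump's default IPv4 line format; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: extract "src dst" IPs (ports stripped) from tcpdump's default IPv4 lines.
tcpdump_endpoints() {
  awk '$2 == "IP" && $4 == ">" {
    src = $3; dst = $5
    sub(/:$/, "", dst)                                  # drop trailing colon
    sub(/\.[0-9]+$/, "", src); sub(/\.[0-9]+$/, "", dst) # drop ".port"
    print src, dst
  }'
}

echo '20:11:41.085067 IP 10.244.0.188.56954 > 10.244.1.188.80: Flags [P.], seq 1:71, ack 1, win 502, length 70: HTTP: GET / HTTP/1.1' \
  | tcpdump_endpoints
```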

(Screenshot: flow information in the Hubble UI)

(Screenshot: packet details in termshark)
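In the hubble observe output earlier, each service connection shows up as two lines:

- a pre-xlate-fwd TRACED line with the Service VIP
- a post-xlate-fwd TRANSLATED line with the selected backend

The awk sketch below pairs the two lines to show which backend each request to 10.96.194.47:80 was translated to. It assumes Hubble's default compact text output. It also assumes the two lines of one connection appear next to each other, which may not hold when many flows interleave. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: pair pre-xlate-fwd (VIP) and post-xlate-fwd (backend) Hubble lines.
xlate_pairs() {
  awk '
    {
      dst = ""
      for (i = 1; i < NF; i++) if ($i == "<>") { dst = $(i + 1); break }
      if (dst == "") next
      if ($0 ~ /pre-xlate-fwd/) vip = dst
      else if ($0 ~ /post-xlate-fwd/ && vip != "") { print vip " -> " dst; vip = "" }
    }'
}

cat <<'EOF' | xlate_pairs
Aug  2 08:37:22.316: default/curl-pod (ID:1072) <> 10.96.194.47:80 (world) pre-xlate-fwd TRACED (TCP)
Aug  2 08:37:22.316: default/curl-pod (ID:1072) <> default/webpod-697b545f57-2zpdp:80 (ID:24748) post-xlate-fwd TRANSLATED (TCP)
Aug  2 08:37:22.317: default/curl-pod:51022 (ID:1072) -> default/webpod-697b545f57-2zpdp:80 (ID:24748) to-endpoint FORWARDED (TCP Flags: SYN)
EOF
```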

[Cilium] Cluster Scope

Docs, IPAM

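Cluster Scope assigns a per-node PodCIDR range to each node and allocates pod IPs on each node with a host-scope allocator.
This mode is similar to the Kubernetes Host Scope IPAM mode. The difference is that Cilium manages the per-node PodCIDR ranges itself, through a custom resource (CRD) called v2.CiliumNode.
The advantage is that this mode does not depend on Kubernetes being configured to hand out per-node PodCIDRs; Cilium allocates them itself.
The smallest per-node block allowed is a /30, and at least a /29 is recommended. The reason is that 2 addresses in each block are reserved: the network address and the broadcast address.
The default pod CIDR is 10.0.0.0/8.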
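The mask-size guidance is plain arithmetic: a /m block has 2^(32-m) addresses, minus the 2 reserved ones. Below is a quick bash check; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: usable pod IPs in a per-node block of prefix length /m,
# excluding the network and broadcast addresses.
usable_per_node() {
  echo $(( (1 << (32 - $1)) - 2 ))
}

for m in 30 29 24; do
  echo "/$m per node: $(usable_per_node "$m") pod IPs"
done
```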

Changing the IPAM mode to Cluster Scope

As mentioned earlier, do not change the IPAM mode in a live environment.

# Keep a request loop running (in a separate terminal)
$ kubectl exec -it curl-pod -- sh -c 'while true; do curl -s webpod | grep Hostname; sleep 1; done'

# Switch the configuration to Cluster Scope
$ helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set ipam.mode="cluster-pool" --set ipam.operator.clusterPoolIPv4PodCIDRList={"172.20.0.0/16"} \
  --set ipv4NativeRoutingCIDR=172.20.0.0/16

$ kubectl -n kube-system rollout restart deploy/cilium-operator # the operator must be restarted
# => deployment.apps/cilium-operator restarted
$ kubectl -n kube-system rollout restart ds/cilium
# => daemonset.apps/cilium restarted

# Check the change
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# => k8s-ctr 10.244.0.0/24
#    k8s-w1  10.244.1.0/24
$ cilium config view | grep ^ipam
# => ipam                                              cluster-pool
#    ipam-cilium-node-update-rate                      15s

$ kubectl get ciliumnode -o json | grep podCIDRs -A2
# =>                     "podCIDRs": [
#                            "10.244.0.0/24"
#                        ],
#    --
#                        "podCIDRs": [
#                            "10.244.1.0/24"
#                        ],
$ kubectl get ciliumendpoints.cilium.io -A
# => NAMESPACE            NAME                                      SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    cilium-monitoring    grafana-5c69859d9-zgx9k                   22364               ready            10.244.0.5
#    cilium-monitoring    prometheus-6fc896bc5d-5rbpx               15628               ready            10.244.0.143
#    default              curl-pod                                  1072                ready            10.244.0.188
#    default              webpod-697b545f57-2zpdp                   24748               ready            10.244.0.2
#    default              webpod-697b545f57-thl79                   24748               ready            10.244.1.188
#    kube-system          coredns-674b8bbfcf-dhckj                  28257               ready            10.244.0.155
#    kube-system          coredns-674b8bbfcf-n4xbw                  28257               ready            10.244.0.7
#    local-path-storage   local-path-provisioner-74f9666bc9-rrn2r   29718               ready            10.244.0.130
# 👉 The IPAM mode has changed, but the podCIDRs and pod IPs have not changed yet.

# To apply the new IPAM mode, delete the CiliumNode resource and restart the DaemonSet.
$ kubectl delete ciliumnode k8s-w1
# => ciliumnode.cilium.io "k8s-w1" deleted
$ kubectl -n kube-system rollout restart ds/cilium
# => daemonset.apps/cilium restarted
$ kubectl get ciliumnode -o json | grep podCIDRs -A2
# =>                     "podCIDRs": [
#                            "10.244.0.0/24"
#                        ],
#    --
#                        "podCIDRs": [
#                            "172.20.0.0/24"
#                        ],
# 👉 The podCIDR of k8s-w1 has changed.
$ kubectl get ciliumendpoints.cilium.io -A
# => NAMESPACE            NAME                                      SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    cilium-monitoring    grafana-5c69859d9-zgx9k                   22364               ready            10.244.0.5
#    cilium-monitoring    prometheus-6fc896bc5d-5rbpx               15628               ready            10.244.0.143
#    default              curl-pod                                  1072                ready            10.244.0.188
#    default              webpod-697b545f57-2zpdp                   24748               ready            10.244.0.2
#    kube-system          coredns-674b8bbfcf-dhckj                  28257               ready            10.244.0.155
#    kube-system          coredns-674b8bbfcf-n4xbw                  28257               ready            10.244.0.7
#    kube-system          hubble-relay-5b48c999f9-qktv5             9702                ready            172.20.0.167
#    kube-system          hubble-ui-655f947f96-ts4n6                64346               ready            172.20.0.122
#    local-path-storage   local-path-provisioner-74f9666bc9-rrn2r   29718               ready            10.244.0.130

# Change the podCIDR of the k8s-ctr node in the same way.
$ kubectl delete ciliumnode k8s-ctr
# => ciliumnode.cilium.io "k8s-ctr" deleted
$ kubectl -n kube-system rollout restart ds/cilium
# => daemonset.apps/cilium restarted
$ kubectl get ciliumnode -o json | grep podCIDRs -A2
# =>                     "podCIDRs": [
#                            "172.20.1.0/24"
#                        ],
#    --
#                        "podCIDRs": [
#                            "172.20.0.0/24"
#                        ],
$ kubectl get ciliumendpoints.cilium.io -A # Have the pod IPs changed?
# => NAMESPACE     NAME                            SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    kube-system   coredns-674b8bbfcf-tg9ll        28257               ready            172.20.0.56
#    kube-system   hubble-relay-5b48c999f9-qktv5   9702                ready            172.20.0.167
#    kube-system   hubble-ui-655f947f96-ts4n6      64346               ready            172.20.0.122
# 👉 The listed endpoints now use IPs from the new range.

# Verify that the nodes' podCIDR static routes were updated automatically
$ ip -c route
# => ...
#    172.20.1.113 dev lxc781feae60918 proto kernel scope link
$ sshpass -p 'vagrant' ssh vagrant@k8s-w1 ip -c route
# => ...
#    172.20.0.56 dev lxc1bf5de6d4ec4 proto kernel scope link
#    172.20.0.122 dev lxcb644cb2f80be proto kernel scope link
#    172.20.0.167 dev lxc9c5d083a0332 proto kernel scope link

# Restart the remaining workloads manually with rollout restart!
$ kubectl get pod -A -owide | grep 10.244.
# => cilium-monitoring    grafana-5c69859d9-zgx9k                   0/1     Running   1 (4h58m ago)   20h     10.244.0.5       k8s-ctr   <none>           <none>
#    cilium-monitoring    prometheus-6fc896bc5d-5rbpx               1/1     Running   1 (4h58m ago)   20h     10.244.0.143     k8s-ctr   <none>           <none>
#    default              curl-pod                                  1/1     Running   0               4h18m   10.244.0.188     k8s-ctr   <none>           <none>
#    default              webpod-697b545f57-2zpdp                   1/1     Running   0               4h18m   10.244.0.2       k8s-ctr   <none>           <none>
#    default              webpod-697b545f57-thl79                   1/1     Running   0               4h18m   10.244.1.188     k8s-w1    <none>           <none>
#    local-path-storage   local-path-provisioner-74f9666bc9-rrn2r   1/1     Running   1 (4h58m ago)   20h     10.244.0.130     k8s-ctr   <none>           <none>
# 👉 Some pods still have old IPs, so restart them.

$ kubectl -n kube-system rollout restart deploy/hubble-relay deploy/hubble-ui
# => deployment.apps/hubble-relay restarted
#    deployment.apps/hubble-ui restarted
$ kubectl -n cilium-monitoring rollout restart deploy/prometheus deploy/grafana
# => deployment.apps/prometheus restarted
#    deployment.apps/grafana restarted
$ kubectl rollout restart deploy/webpod
# => deployment.apps/webpod restarted

#
$ cilium hubble port-forward&
# => ℹ️ Hubble Relay is available at 127.0.0.1:4245

# curl-pod was deployed as a bare Pod (not through a Deployment), so delete it and create it again.
$ kubectl delete pod curl-pod
# => pod "curl-pod" deleted
# Deploy the curl-pod Pod on the k8s-ctr node
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: curl-pod
  labels:
    app: curl
spec:
  nodeName: k8s-ctr
  containers:
  - name: curl
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF
# => pod/curl-pod created

# Verify that the pod IPs have changed!
$ kubectl get ciliumendpoints.cilium.io -A
# => NAMESPACE           NAME                            SECURITY IDENTITY   ENDPOINT STATE   IPV4           IPV6
#    cilium-monitoring   grafana-74f45ff4b-8s8jk         22364               ready            172.20.0.129
#    cilium-monitoring   prometheus-5cd9888b5c-nq2jh     15628               ready            172.20.0.22
#    default             curl-pod                        1072                ready            172.20.1.80
#    default             webpod-bb8b9557f-7rn8t          24748               ready            172.20.0.130
#    default             webpod-bb8b9557f-9tzvj          24748               ready            172.20.1.31
#    kube-system         coredns-674b8bbfcf-58c7n        28257               ready            172.20.1.113
#    kube-system         coredns-674b8bbfcf-tg9ll        28257               ready            172.20.0.56
#    kube-system         hubble-relay-575fc84f49-m7bkm   9702                ready            172.20.0.177
#    kube-system         hubble-ui-5b686f8966-cnqxd      64346               ready            172.20.0.244
# 👉 All pod IPs have changed.

# Send requests repeatedly
$ kubectl exec -it curl-pod -- sh -c 'while true; do curl -s webpod | grep Hostname; sleep 1; done'

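As shown above, changing the IPAM mode does not change the IPs of pods that are already running; they keep their old IPs until they are recreated. For this reason, changing the IPAM mode on a cluster in operation is not recommended.
The safest approach is to install a new cluster with the new IPAM mode and migrate the workloads from the existing cluster.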

[Cilium CNI Chaining] AWS VPC CNI plugin

Docs

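Next, let's look at how to use Cilium together with the AWS VPC CNI plugin.
In this hybrid mode, the AWS VPC CNI plugin is responsible for setting up the virtual network devices as well as for IP address management (IPAM) through ENIs.
After the initial networking has been set up for a given pod, the Cilium CNI plugin is called to attach eBPF programs to the network devices set up by the AWS VPC CNI plugin, in order to enforce network policies, perform load balancing, and provide encryption.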

AWS-CNI role: Device plumbing, IPAM (ENI), Routing (Native-Routing, etc.)
Cilium role: LB, Network Policy, Encryption, Multi-Cluster, Visibility

Configuration

helm install cilium cilium/cilium --version 1.17.6 \
  --namespace kube-system \
  --set cni.chainingMode=aws-cni \
  --set cni.exclusive=false \
  --set enableIPv4Masquerade=false \
  --set routingMode=native

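AWS ENI IPAM mode Docs
The AWS ENI allocator is specific to Cilium deployments running in the AWS cloud; it talks to the AWS EC2 API and allocates IPs from the IPs of AWS Elastic Network Interfaces (ENIs).
To avoid rate-limiting problems in large clusters, this mode ensures that only a single operator talks to the EC2 service API.
A pre-allocation watermark is used to keep a number of IP addresses available on each node at all times, so that no EC2 API call is needed when a new pod is scheduled in the cluster.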
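To make the watermark idea concrete, here is a minimal shell sketch. It is a simplified model, not Cilium's operator code: the function `ips_to_allocate` and its arguments are hypothetical, the value 8 is only an example, and the real allocator also takes other allocation settings into account that are ignored here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (simplified model, not Cilium's operator code):
# how many extra ENI IPs a node would request so that at least
# PRE_ALLOCATE addresses stay free for pods scheduled later.
ips_to_allocate() {
  local available="$1" used="$2" pre_allocate="$3"
  local free=$(( available - used ))        # IPs already attached but unused
  local deficit=$(( pre_allocate - free ))  # how far below the watermark we are
  if (( deficit > 0 )); then
    echo "$deficit"
  else
    echo 0                                  # watermark satisfied, no EC2 API call needed
  fi
}

ips_to_allocate 10 8 8   # 2 free, 8 wanted -> 6
ips_to_allocate 20 4 8   # 16 free          -> 0
```

Keeping a buffer of free IPs means pod creation only consumes an address that is already attached to the node, and the EC2 API is called in the background to refill the buffer.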

Routing

Cilium supports both Encapsulation and Native Routing. Let's look at each of them.

Method 1. Encapsulation (VXLAN, GENEVE)

Docs

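Encapsulation mode has no special requirements on the underlying infrastructure, which is why Cilium uses it by default.
In this mode, all cluster nodes communicate with each other through tunnels using UDP-based VXLAN or GENEVE.
All traffic between Cilium nodes is encapsulated.
Encapsulation relies only on normal node-to-node connectivity; in other words, if the Cilium nodes can already reach each other, encapsulation mode can be used.
The underlying network must support IPv4, and the following UDP ports must be allowed through the firewall:
- VXLAN (Default): UDP 8472
- GENEVE: UDP 6081
Advantages
- Simplicity
  - The network connecting the cluster nodes does not need to be aware of the PodCIDRs.
  - Cluster nodes can span multiple routing or link-layer domains.
  - The topology of the underlying network does not matter as long as the cluster nodes can reach each other using IP/UDP.
- Identity context
  - Encapsulation protocols allow metadata to be carried along with network packets.
  - Cilium uses this ability to transfer metadata such as the source security identity.
  - Transferring the identity is an optimization designed to avoid one identity lookup on the remote node.
Disadvantages
- MTU Overhead
  - The added encapsulation header reduces the effective MTU available for the payload (50 bytes for VXLAN, 60 bytes for GENEVE).
  - This results in a lower maximum throughput for a given network connection.
  - Enabling jumbo frames to raise the MTU largely mitigates this, but not every network device supports jumbo frames.
- Encapsulation/Decapsulation Overhead
  - Encapsulation and decapsulation add CPU overhead.
  - This overhead is usually small compared with the network bandwidth, but it can affect performance in large clusters.
Configuration
- tunnel-protocol: sets the encapsulation protocol to vxlan or geneve (default: vxlan).
- tunnel-port: sets the UDP port used by the encapsulation protocol (default: 8472 for vxlan, 6081 for geneve).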
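The 50-byte VXLAN figure above can be reproduced by adding up the headers. A small sketch, assuming an IPv4 underlay with no VLAN tags or IP options; `effective_mtu` is a hypothetical helper, and the jumbo-frame MTU of 9000 is only an example value.

```shell
#!/usr/bin/env bash
# VXLAN overhead on an IPv4 underlay (no VLAN tags, no IP options):
#   outer IPv4 header 20 + outer UDP header 8 + VXLAN header 8
#   + inner Ethernet header 14 = 50 bytes
VXLAN_OVERHEAD=$(( 20 + 8 + 8 + 14 ))

# Hypothetical helper: MTU left for the pod's inner IP packet
effective_mtu() {
  local link_mtu="$1" overhead="${2:-$VXLAN_OVERHEAD}"
  echo $(( link_mtu - overhead ))
}

effective_mtu 1500   # standard Ethernet -> 1450
effective_mtu 9000   # jumbo frames      -> 8950
```

The overhead is a fixed number of bytes per packet, so it costs about 3.3% of a 1500-byte frame but well under 1% of a 9000-byte jumbo frame, which is why jumbo frames mitigate the problem.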

Method 2. Native Routing

Docs

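Native Routing mode lets Cilium deliver traffic between Pods directly, without encapsulation.
Instead of encapsulating packets, it relies on the routing capabilities of the network Cilium runs on.
In native routing mode, Cilium delegates every packet that is not addressed to another local endpoint to the routing subsystem of the Linux kernel.
This means the packet is routed as if a local process had emitted it.
As a result, the network connecting the cluster nodes must be aware of the PodCIDRs and must be able to route them.
PodCIDR routing option 1
- Each node knows all Pod IPs of all other nodes, and routes representing them are inserted into the Linux kernel routing table.
- If all nodes share a single L2 network, this can be handled with auto-direct-node-routes: true.
- Otherwise, an additional system component such as a BGP daemon must be run to distribute the routes.
PodCIDR routing option 2
- The nodes themselves do not know how to route all Pod IPs, but a router that knows how to reach all Pods exists in the network.
- In this scenario, the Linux nodes are configured with a default route pointing to that router.
- This model is used for cloud provider network integration. See Google Cloud, AWS ENI, and Azure IPAM for details.
Configuration
- routing-mode: native: enables native routing mode.
- ipv4-native-routing-cidr: x.x.x.x/y: sets the CIDR in which native routing is performed (the range that covers the PodCIDRs).
- auto-direct-node-routes: true: when all nodes share an L2 network, inserts a route for each node's PodCIDR into the Linux kernel routing table.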
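With auto-direct-node-routes: true, each node ends up with one route per remote podCIDR that points at that node's IP. The sketch below only prints those routes from a hypothetical node table filled with this lab's values (k8s-ctr 192.168.10.100 with 172.20.1.0/24, k8s-w1 192.168.10.101 with 172.20.0.0/24, both on eth1). It does not touch the routing table; Cilium installs the real routes itself.

```shell
#!/usr/bin/env bash
# Hypothetical node table: name, node IP on eth1, podCIDR (values from this lab)
NODES="k8s-ctr 192.168.10.100 172.20.1.0/24
k8s-w1 192.168.10.101 172.20.0.0/24"

# Print (do not apply) the routes node $1 needs towards every other node's podCIDR
direct_routes() {
  local self="$1" dev="${2:-eth1}" name ip cidr
  while read -r name ip cidr; do
    [ "$name" = "$self" ] && continue       # no route needed for our own podCIDR
    echo "$cidr via $ip dev $dev"
  done <<< "$NODES"
}

direct_routes k8s-ctr   # -> 172.20.0.0/24 via 192.168.10.101 dev eth1
direct_routes k8s-w1    # -> 172.20.1.0/24 via 192.168.10.100 dev eth1
```

The first line is the same route that appears in `ip -c route` on k8s-ctr later in this lab. It only works because both nodes are on one L2 segment, so the next hop (the other node's IP) is directly reachable.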

Setting up Cilium agent shortcuts for the Native Routing lab

# Cilium agent pod names
$ export CILIUMPOD0=$(kubectl get -l k8s-app=cilium pods -n kube-system --field-selector spec.nodeName=k8s-ctr -o jsonpath='{.items[0].metadata.name}')
$ export CILIUMPOD1=$(kubectl get -l k8s-app=cilium pods -n kube-system --field-selector spec.nodeName=k8s-w1  -o jsonpath='{.items[0].metadata.name}')
$ echo $CILIUMPOD0 $CILIUMPOD1
# => cilium-6ggxf cilium-hb6jp
  
# Define aliases
$ alias c0="kubectl exec -it $CILIUMPOD0 -n kube-system -c cilium-agent -- cilium"
$ alias c1="kubectl exec -it $CILIUMPOD1 -n kube-system -c cilium-agent -- cilium"

Inspecting inter-node pod communication in detail with Native Routing

#
$ kubectl get pod -owide
# => NAME                     READY   STATUS    RESTARTS   AGE    IP             NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                 1/1     Running   0          104m   172.20.1.80    k8s-ctr   <none>           <none>
#    webpod-bb8b9557f-7rn8t   1/1     Running   0          105m   172.20.0.130   k8s-w1    <none>           <none>
#    webpod-bb8b9557f-9tzvj   1/1     Running   0          105m   172.20.1.31    k8s-ctr   <none>           <none>

# IPs of webpod 1 and 2
$ export WEBPODIP1=$(kubectl get -l app=webpod pods --field-selector spec.nodeName=k8s-ctr -o jsonpath='{.items[0].status.podIP}')
$ export WEBPODIP2=$(kubectl get -l app=webpod pods --field-selector spec.nodeName=k8s-w1  -o jsonpath='{.items[0].status.podIP}')
$ echo $WEBPODIP1 $WEBPODIP2
# => 172.20.1.31 172.20.0.130

# Ping WEBPODIP2 from curl-pod
$ kubectl exec -it curl-pod -- ping $WEBPODIP2

# Check the kernel routing table
$ ip -c route
# => ...
#    172.20.0.0/24 via 192.168.10.101 dev eth1 proto kernel
# 👉 On k8s-ctr, where curl-pod runs, packets for WEBPODIP2 (172.20.0.130, inside 172.20.0.0/24) are routed to 192.168.10.101 (the k8s-w1 node IP).

$ sshpass -p 'vagrant' ssh vagrant@k8s-w1 ip -c route
# => ...
#    172.20.0.130 dev lxc9938d1653585 proto kernel scope link
# 👉 On k8s-w1, packets for 172.20.0.130 are delivered to that pod's veth (lxc) interface.

#
$ cilium hubble port-forward&
# => ℹ️   Hubble Relay is available at 127.0.0.1:4245
$ hubble observe -f --pod curl-pod
# => Aug  2 14:29:00.271: default/curl-pod (ID:1072) -> default/webpod-bb8b9557f-7rn8t (ID:24748) to-network FORWARDED (ICMPv4 EchoRequest)
#    Aug  2 14:29:00.272: default/curl-pod (ID:1072) <- default/webpod-bb8b9557f-7rn8t (ID:24748) to-endpoint FORWARDED (ICMPv4 EchoReply)

#
$ tcpdump -i eth1 icmp
# => tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
#    listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
#    23:29:18.464606 IP 172.20.1.80 > 172.20.0.130: ICMP echo request, id 13, seq 815, length 64
#    23:29:18.465333 IP 172.20.0.130 > 172.20.1.80: ICMP echo reply, id 13, seq 815, length 64

#
$ tcpdump -i eth1 icmp -w /tmp/icmp.pcap
$ termshark -r /tmp/icmp.pcap

ICMP packet details as seen in termshark

ICMP flow as seen in Hubble UI

Masquerading

Masquerading 소개

Docs

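Masquerading translates a Pod's IP address into the Cilium node's IP address when the Pod communicates with an external network.
IPv4 addresses used by Pods are typically allocated from the RFC1918 private address space and therefore cannot be routed externally.
Cilium translates these Pod IPs into the IP of the Cilium node, which is already routable on the network, so that Pods can communicate with external networks.
To disable masquerading, set enable-ipv4-masquerade: false and enable-ipv6-masquerade: false.
By default, every destination within the local node's IP allocation CIDR is excluded from masquerading.
In other words, no masquerading is performed when a Pod communicates with another Pod within the local node's IP allocation CIDR.
To exclude a wider CIDR range, set it with ipv4-native-routing-cidr: 10.0.0.0/8 (or ipv6-native-routing-cidr: fd00::/100 for IPv6). In that case, no destination within that CIDR is masqueraded.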

eBPF-based Masquerading

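eBPF-based masquerading can be enabled with the bpf.masquerade=true option.
By default, BPF masquerading also enables BPF Host-Routing mode. See the eBPF Host-Routing documentation for the benefits and limitations of that mode.
Masquerading only takes place on the devices that run the eBPF masquerading program.

This means that a packet sent from a Pod to an external address is masqueraded (to the IPv4 address of the egress device) only if the egress device runs the program.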

$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent  -- cilium status | grep Masquerading
# => Masquerading:            BPF   [eth0, eth1]   172.20.0.0/16  [IPv4: Enabled, IPv6: Disabled]

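If no devices are specified, the program is attached automatically to the devices selected by the BPF NodePort device detection mechanism.
To change this manually, use the devices helm option.
eBPF-based masquerading supports the TCP, UDP, and ICMP protocols.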

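By default, every packet from a Pod destined to an IP address outside the ipv4-native-routing-cidr range is masqueraded, except for packets destined to other cluster nodes (Node IPs). When eBPF masquerading is enabled, traffic from Pods to the External IPs of cluster nodes is not masqueraded either.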
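The two default exclusion rules can be expressed as a small decision function. This is a minimal sketch assuming this lab's values (ipv4-native-routing-cidr 172.20.0.0/16, node IPs 192.168.10.100 and 192.168.10.101); `masq_decision`, `in_cidr` and `ip2int` are hypothetical helpers, and the real datapath makes this decision in eBPF with more cases (for example, node External IPs) than shown here.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to an integer
ip2int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr IP CIDR: exit status 0 if IP lies inside CIDR
in_cidr() {
  local ip net len mask
  ip=$(ip2int "$1"); net=$(ip2int "${2%/*}"); len=${2#*/}
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

NATIVE_ROUTING_CIDR="172.20.0.0/16"        # ipv4-native-routing-cidr in this lab
NODE_IPS="192.168.10.100 192.168.10.101"   # cluster node IPs (assumed list)

# Hypothetical helper modelling the two default exclusions
masq_decision() {
  local dst="$1" n
  if in_cidr "$dst" "$NATIVE_ROUTING_CIDR"; then echo "keep pod IP"; return; fi
  for n in $NODE_IPS; do
    if [ "$dst" = "$n" ]; then echo "keep pod IP"; return; fi
  done
  echo "SNAT to node IP"
}

masq_decision 172.20.0.130     # webpod on k8s-w1 -> keep pod IP
masq_decision 192.168.10.101   # k8s-w1 node IP   -> keep pod IP
masq_decision 192.168.10.200   # router eth1      -> SNAT to node IP
```

These three results line up with the captures in this lab: pings to a node IP carry the Pod IP, while traffic to the router arrives from 192.168.10.100.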

#
$ cilium config view  | grep ipv4-native-routing-cidr
# => ipv4-native-routing-cidr                          172.20.0.0/16
  
# Check traffic to a node IP
$ tcpdump -i eth1 icmp -nn
# => 23:58:58.175157 IP 172.20.1.80 > 192.168.10.101: ICMP echo request, id 31, seq 1, length 64
#    23:58:58.175918 IP 192.168.10.101 > 172.20.1.80: ICMP echo reply, id 31, seq 1, length 64
#    ...
# 👉 Packets to a node IP are not masqueraded; the Pod IP is used as the source.
$ kubectl exec -it curl-pod -- ping 192.168.10.101
# => PING 192.168.10.101 (192.168.10.101) 56(84) bytes of data.
#    64 bytes from 192.168.10.101: icmp_seq=1 ttl=63 time=0.888 ms
#    ...

iptables-based Masquerading

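This mode is the legacy implementation that works on all kernel versions.
It uses iptables to masquerade traffic leaving through non-Cilium network devices.
To restrict masquerading to specific network devices, use the egress-masquerade-interfaces: eth0 option.
For advanced configurations in which different source addresses are used depending on the destination network CIDR, set enable-masquerade-to-route-source: "true" to masquerade to the source addresses of the routes instead of the address of the main network interface.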

Masquerading lab

Lab environment
- router: connected to the internal network 10.10.0.0/16, a web server that is not joined to k8s, with loop1/loop2 dummy interfaces
Checking the current state

# Check the current configuration
$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent  -- cilium status | grep Masquerading
# => Masquerading:            BPF   [eth0, eth1]   172.20.0.0/16  [IPv4: Enabled, IPv6: Disabled]
#
$ cilium config view  | grep ipv4-native-routing-cidr
# => ipv4-native-routing-cidr                          172.20.0.0/16

# Remove leftover kube-proxy (KUBE-*) rules from iptables, then check iptables
$ iptables-save | grep -v KUBE | iptables-restore
$ iptables-save

$ sshpass -p 'vagrant' ssh vagrant@k8s-w1 "sudo iptables-save | grep -v KUBE | sudo iptables-restore"
$ sshpass -p 'vagrant' ssh vagrant@k8s-w1 sudo iptables-save

# Check connectivity
$ kubectl exec -it curl-pod -- curl -s webpod | grep Hostname
# => Hostname: webpod-bb8b9557f-7rn8t
$ kubectl exec -it curl-pod -- curl -s webpod | grep Hostname
# => Hostname: webpod-bb8b9557f-9tzvj

Checking communication with router eth1 192.168.10.200

# Use two terminals
[k8s-ctr] $ tcpdump -i eth1 icmp -nn # or: hubble observe -f --pod curl-pod
[router] $ tcpdump -i eth1 icmp -nn

# Ping k8s-w1 (192.168.10.101) and router eth1 (192.168.10.200), then compare the source IPs!
$ kubectl exec -it curl-pod -- ping 192.168.10.101
# 👉 Packets are captured only on k8s-ctr
$ kubectl exec -it curl-pod -- ping 192.168.10.200
# 👉 Packets are captured on both k8s-ctr and router

# k8s-ctr capture result (ping to 192.168.10.200)
# => 00:45:19.552476 IP 192.168.10.100 > 192.168.10.200: ICMP echo request, id 73, seq 1, length 64
#    00:45:19.553044 IP 192.168.10.200 > 192.168.10.100: ICMP echo reply, id 73, seq 1, length 64
#    ...

# router capture result (ping to 192.168.10.200)
# => 00:45:19.494633 IP 192.168.10.100 > 192.168.10.200: ICMP echo request, id 73, seq 1, length 64
#    00:45:19.494758 IP 192.168.10.200 > 192.168.10.100: ICMP echo reply, id 73, seq 1, length 64
#    ...

---
# Use two terminals
[k8s-ctr] $ tcpdump -i eth1 tcp port 80 -nnq # or: hubble observe -f --pod curl-pod
[router] $ tcpdump -i eth1 tcp port 80 -nnq

# curl the webpod Service and router eth1 192.168.10.200, then check the IPs!
$ kubectl exec -it curl-pod -- curl -s webpod
# => Hostname: webpod-bb8b9557f-9tzvj
#    IP: 127.0.0.1
#    IP: ::1
#    IP: 172.20.1.31
#    IP: fe80::6857:8fff:fe68:c5d5
#    RemoteAddr: 172.20.1.80:56614
#    GET / HTTP/1.1
#    Host: webpod
#    User-Agent: curl/8.14.1
#    Accept: */*
# 👉 This request was served by the webpod on k8s-ctr (same node), so it never left through eth1 and nothing was captured on k8s-ctr
# 👉 Nothing was captured on router either

# k8s-ctr capture result (request served by the webpod on k8s-w1, 172.20.0.130)
# => 00:58:25.714438 IP 172.20.1.80.58682 > 172.20.0.130.80: tcp 70
#    ...
# 👉 Source and destination are Pod IPs from the Pod CIDR (no masquerading)

$ kubectl exec -it curl-pod -- curl -s webpod
# => Hostname: webpod-bb8b9557f-7rn8t
#    IP: 172.20.0.130
#    RemoteAddr: 172.20.1.80:60086
#    ...
# 👉 Requests served by the webpod on k8s-w1 leave through k8s-ctr eth1 with Pod IPs (as in the capture above); nothing is captured on router

$ kubectl exec -it curl-pod -- curl -s 192.168.10.200
# => <h1>Web Server : router</h1>
# 👉 Captured on both k8s-ctr and router

# k8s-ctr capture result
# => 01:01:34.458560 IP 192.168.10.100.45668 > 192.168.10.200.80: tcp 78
#    ...
# router capture result
# => 01:01:34.501812 IP 192.168.10.100.45668 > 192.168.10.200.80: tcp 78
#    ...
# 👉 Traffic to router, a server outside the cluster, uses the node IP (192.168.10.100) as its source, i.e. it is masqueraded

ip-masq-agent configuration

Docs

The eBPF-based ip-masq-agent supports the nonMasqueradeCIDRs, masqLinkLocal, and masqLinkLocalIPv6 options through its configuration file.
nonMasqueradeCIDRs specifies the CIDR ranges that are not masqueraded.

If this setting is absent, the agent uses the following non-masquerade CIDRs:

10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
100.64.0.0/10
192.0.0.0/24
192.0.2.0/24
192.88.99.0/24
198.18.0.0/15
198.51.100.0/24
203.0.113.0/24
240.0.0.0/4

If masqLinkLocal is false or not set, 169.254.0.0/16 is also added to the non-masquerade CIDRs.

ipMasqAgent configuration

# Applying the values below automatically restarts the cilium DaemonSet
$ helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set ipMasqAgent.enabled=true --set ipMasqAgent.config.nonMasqueradeCIDRs='{10.10.1.0/24,10.10.2.0/24}'
  
$ cilium hubble port-forward&
# => ℹ️   Hubble Relay is available at 127.0.0.1:4245
  
# Verify that the ip-masq-agent ConfigMap was created
$ kubectl get cm -n kube-system ip-masq-agent -o yaml | yq
# => ...
#        "config": "{\"nonMasqueradeCIDRs\":[\"10.10.1.0/24\",\"10.10.2.0/24\"]}"
#    ...
#        "name": "ip-masq-agent",
#    ...
$ kc describe cm -n kube-system ip-masq-agent 
# => Data
#    ====
#    config:
#    ----
#    {"nonMasqueradeCIDRs":["10.10.1.0/24","10.10.2.0/24"]}
#    ...
$ k9s 
  
#
$ cilium config view  | grep -i ip-masq
# => enable-ip-masq-agent                              true
  
#
$ kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg bpf ipmasq list
# => IP PREFIX/ADDRESS
#    10.10.1.0/24
#    10.10.2.0/24
#    169.254.0.0/16
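Conceptually, the ipmasq map above works as a prefix list: if the destination falls inside any listed prefix the Pod IP is kept, otherwise the packet is SNATed to the node IP. The sketch below models only that lookup (node-IP and other exclusions are ignored); the helper names are hypothetical and 10.10.1.200 / 10.10.3.1 are made-up destinations.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to an integer
ip2int() { local IFS=. a b c d; read -r a b c d <<< "$1"; echo $(( (a << 24) | (b << 16) | (c << 8) | d )); }

in_cidr() {  # exit status 0 if IP $1 lies inside CIDR $2
  local ip net len mask
  ip=$(ip2int "$1"); net=$(ip2int "${2%/*}"); len=${2#*/}
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Prefixes from the ipmasq BPF map listed above
NON_MASQ_CIDRS="10.10.1.0/24 10.10.2.0/24 169.254.0.0/16"

# Hypothetical helper: models only the ip-masq-agent lookup
masq_by_agent() {
  local dst="$1" c
  for c in $NON_MASQ_CIDRS; do
    if in_cidr "$dst" "$c"; then echo "keep pod IP ($c)"; return; fi
  done
  echo "SNAT to node IP"
}

masq_by_agent 10.10.1.200   # made-up host in 10.10.1.0/24   -> keep pod IP (10.10.1.0/24)
masq_by_agent 10.10.3.1     # made-up host outside the list  -> SNAT to node IP
```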

CoreDNS, NodeLocalDNS

CoreDNS

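CoreDNS introduction - Docs, Home, Plugins, Youtube
- CoreDNS is the DNS server of a Kubernetes cluster; it translates the names of Services and Pods in the cluster into IP addresses.
- Unlike traditional DNS servers such as BIND, Knot, and PowerDNS, CoreDNS implements most of its functionality as plugins, so it can be extended flexibly.
Checking the CoreDNS configuration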

# Check the pod's DNS configuration
$ kubectl exec -it curl-pod -- cat /etc/resolv.conf
# => search default.svc.cluster.local svc.cluster.local cluster.local
#    nameserver 10.96.0.10
#    options ndots:5

#
$ cat /var/lib/kubelet/config.yaml | grep cluster -A1
# => clusterDNS:
#    - 10.96.0.10
#    clusterDomain: cluster.local

#
$ kubectl get svc,ep -n kube-system kube-dns
# => NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
#    service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   24h
#    
#    NAME                 ENDPOINTS                                                   AGE
#    endpoints/kube-dns   172.20.0.56:53,172.20.1.113:53,172.20.0.56:53 + 3 more...   24h

$ kubectl get pod -n kube-system -l k8s-app=kube-dns
# => NAME                       READY   STATUS    RESTARTS   AGE
#    coredns-674b8bbfcf-58c7n   1/1     Running   0          4h17m
#    coredns-674b8bbfcf-tg9ll   1/1     Running   0          4h17m

#
$ kc describe pod -n kube-system -l k8s-app=kube-dns
# => ...
#     config-volume:
#        Type:      ConfigMap (a volume populated by a ConfigMap)
#        Name:      coredns
#        Optional:  false
#    ...

$ kc describe cm -n kube-system coredns
# => ...
#    Corefile:
#    ----
#    .:53 {              # listen on port 53 for queries for all domains
#        errors          # log errors that occur while serving DNS responses
#        health {        # expose a health endpoint for status checks
#           lameduck 5s  # on shutdown, stay in lameduck mode for 5s, draining traffic before exiting
#        }
#        ready           # ready endpoint: the HTTP endpoint on port 8181 returns 200 OK once all plugins have signaled that they are ready
#        kubernetes cluster.local in-addr.arpa ip6.arpa {    # Kubernetes DNS plugin (answers in-cluster names); cluster.local: the cluster domain
#           pods insecure                         # answer IP-based pod A records without verifying that the pod exists
#           fallthrough in-addr.arpa ip6.arpa     # if these zones have no result, pass the query to the next plugin
#           ttl 30                                # TTL of the returned records (30s)
#        }
#        prometheus :9153 # expose Prometheus metrics
#        forward . /etc/resolv.conf {             # forward names CoreDNS does not know to the upstream (usually external DNS); .: all queries
#           max_concurrent 1000                   # at most 1000 concurrent forwarded queries
#        }
#        cache 30 {                        # DNS response cache with a 30s TTL
#           disable success cluster.local  # do not cache successful responses for cluster.local
#           disable denial cluster.local   # do not cache NXDOMAIN (denial) responses for cluster.local either
#        } 
#        loop         # detect simple forwarding loops and halt the CoreDNS process if one is found
#        reload       # reload the Corefile automatically when it changes; after editing the ConfigMap, it takes about 2 minutes for the change to apply
#        loadbalance  # round-robin DNS load balancer that randomizes the order of A, AAAA, and MX records in responses
#    }

#
$ cat /etc/resolv.conf
# => nameserver 127.0.0.53
#    options edns0 trust-ad
#    search .

$ resolvectl 
# => Link 2 (eth0)
#        Current Scopes: DNS
#             Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
#    Current DNS Server: 10.0.2.3
#           DNS Servers: 10.0.2.3
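The `search` list and `options ndots:5` in the pod's /etc/resolv.conf decide which names are actually queried. Below is a simplified shell model of the common resolver rule: a name with fewer dots than ndots is tried with each search domain first and as-is last. `expand_query` is a hypothetical helper, and the exact fallback behavior differs between resolver libraries such as glibc and musl.

```shell
#!/usr/bin/env bash
# expand_query NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: print the names a resolver would query, in order
expand_query() {
  local name="$1" ndots="$2" d; shift 2
  if [[ "$name" == *. ]]; then          # trailing dot: already absolute, no search list
    echo "$name"; return
  fi
  local dots="${name//[!.]/}"          # keep only the dots so they can be counted
  if (( ${#dots} >= ndots )); then      # enough dots: try the name as-is first
    echo "$name."
    for d in "$@"; do echo "$name.$d."; done
  else                                  # too few dots: search domains first
    for d in "$@"; do echo "$name.$d."; done
    echo "$name."
  fi
}

# Values from the pod's /etc/resolv.conf shown above
expand_query google.com 5 default.svc.cluster.local svc.cluster.local cluster.local
# google.com.default.svc.cluster.local.
# google.com.svc.cluster.local.
# google.com.cluster.local.
# google.com.
```

This is why an external name such as google.com costs three NXDOMAIN round trips before the real query; writing the name with a trailing dot (google.com.) skips the search list entirely.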

(Note) forward plugin - Docs

# Use case 1: when a DNS server manages the '.consul.local' domain, configure CoreDNS to send queries for that domain to it
consul.local:53 {
    errors
    cache 30
    forward . 10.150.0.1
}
  
# Use case 2: to send all non-cluster DNS lookups to a specific nameserver at 172.16.0.1, point forward at that nameserver instead of /etc/resolv.conf
forward .  172.16.0.1
  
# Example configuration that includes use cases 1 and 2
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 172.16.0.1  # use case 2
        cache 30
        loop
        reload
        loadbalance
    }
    consul.local:53 {         # use case 1
        errors
        cache 30
        forward . 10.150.0.1
    }  

Checking DNS queries from a pod - Docs, DNS for Services and Pods, Autoscale the DNS Service in a Cluster

# Monitoring 1
$ cilium hubble port-forward&
$ hubble observe -f --port 53
$ hubble observe -f --port 53 --protocol UDP

# Monitoring 2
$ tcpdump -i any udp port 53 -nn

# Check pod IPs
$ kubectl get pod -owide
# => NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE      NOMINATED NODE   READINESS GATES
#    curl-pod                 1/1     Running   0          3h55m   172.20.1.80    k8s-ctr   <none>           <none>
#    webpod-bb8b9557f-7rn8t   1/1     Running   0          3h55m   172.20.0.130   k8s-w1    <none>           <none>
#    webpod-bb8b9557f-9tzvj   1/1     Running   0          3h55m   172.20.1.31    k8s-ctr   <none>           <none>

$ kubectl get pod -n kube-system -l k8s-app=kube-dns -owide
# => NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE      NOMINATED NODE   READINESS GATES
#    coredns-674b8bbfcf-58c7n   1/1     Running   0          4h29m   172.20.1.113   k8s-ctr   <none>           <none>
#    coredns-674b8bbfcf-tg9ll   1/1     Running   0          4h29m   172.20.0.56    k8s-w1    <none>           <none>

$ kubectl exec -it curl-pod -- cat /etc/resolv.conf
# => search default.svc.cluster.local svc.cluster.local cluster.local
#    nameserver 10.96.0.10
#    options ndots:5

# Scale CoreDNS down to a single pod to make the lab easier to follow
$ kubectl scale deployment -n kube-system coredns --replicas 1
# => deployment.apps/coredns scaled
$ kubectl get pod -n kube-system -l k8s-app=kube-dns -owide
# => NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
#    coredns-674b8bbfcf-tg9ll   1/1     Running   0          4h29m   172.20.0.56   k8s-w1   <none>           <none>

#
$ kubectl exec -it curl-pod -- curl kube-dns.kube-system.svc:9153/metrics | grep coredns_cache_ | grep -v ^#
# => coredns_cache_entries{server="dns://:53",type="denial",view="",zones="."} 1
#    coredns_cache_entries{server="dns://:53",type="success",view="",zones="."} 0
#    coredns_cache_misses_total{server="dns://:53",view="",zones="."} 46
#    coredns_cache_requests_total{server="dns://:53",view="",zones="."} 46

# Query a domain
$ kubectl exec -it curl-pod -- nslookup webpod
# => Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    Name:   webpod.default.svc.cluster.local
#    Address: 10.96.194.47

# DNS query packets seen in tcpdump
# => 01:27:05.029100 lxc9dcdf61704e7 In  IP 172.20.1.80.41316 > 172.20.0.56.53: 62435+ A? webpod.default.svc.cluster.local. (50)
#    01:27:05.029378 eth1  Out IP 172.20.1.80.41316 > 172.20.0.56.53: 62435+ A? webpod.default.svc.cluster.local. (50)
#    01:27:05.032936 eth1  In  IP 172.20.0.56.53 > 172.20.1.80.41316: 62435*- 1/0/0 A 10.96.194.47 (98)

$ kubectl exec -it curl-pod -- nslookup -debug webpod
$ kubectl exec -it curl-pod -- nslookup -debug google.com
# => Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    ------------
#        QUESTIONS:
#            google.com.default.svc.cluster.local, type = A, class = IN
#        ...
#    ------------
#    ** server can't find google.com.default.svc.cluster.local: NXDOMAIN
#    ;; Got recursion not available from 10.96.0.10
#    Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    ------------
#        QUESTIONS:
#            google.com.svc.cluster.local, type = A, class = IN
#        ...
#    ------------
#    ** server can't find google.com.svc.cluster.local: NXDOMAIN
#    ;; Got recursion not available from 10.96.0.10
#    Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    ------------
#        QUESTIONS:
#            google.com.cluster.local, type = A, class = IN
#        ...
#    ------------
#    ** server can't find google.com.cluster.local: NXDOMAIN
#    Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    ------------
#        QUESTIONS:
#            google.com, type = A, class = IN
#        ANSWERS:
#        ->  google.com
#            internet address = 142.250.206.238
#            ttl = 30
#        AUTHORITY RECORDS:
#        ADDITIONAL RECORDS:
#    ------------
#    Non-authoritative answer:
#    Name:   google.com
#    Address: 142.250.206.238
#    ------------
#        QUESTIONS:
#            google.com, type = AAAA, class = IN
#        ANSWERS:
#        ->  google.com
#            has AAAA address 2404:6800:400a:804::200e
#            ttl = 30
#        AUTHORITY RECORDS:
#        ADDITIONAL RECORDS:
#    ------------
#    Name:   google.com
#    Address: 2404:6800:400a:804::200e
# 👉 Because google.com has fewer dots than ndots:5, the resolver first tries the names built from the in-cluster search domains
#    (CoreDNS answers each with NXDOMAIN) and only then queries google.com., which CoreDNS forwards to the external DNS server.

# DNS query packets seen in tcpdump
# => 01:33:33.852442 lxc9dcdf61704e7 In  IP 172.20.1.80.58442 > 172.20.0.56.53: 52842+ A? google.com.default.svc.cluster.local. (54)
#    01:33:33.852610 eth1  Out IP 172.20.1.80.58442 > 172.20.0.56.53: 52842+ A? google.com.default.svc.cluster.local. (54)
#    01:33:33.855947 eth1  In  IP 172.20.0.56.53 > 172.20.1.80.58442: 52842 NXDomain*- 0/1/0 (147)
#    01:33:33.861141 lxc9dcdf61704e7 In  IP 172.20.1.80.36323 > 172.20.0.56.53: 10644+ A? google.com.svc.cluster.local. (46)
#    01:33:33.861515 eth1  Out IP 172.20.1.80.36323 > 172.20.0.56.53: 10644+ A? google.com.svc.cluster.local. (46)
#    01:33:33.862582 eth1  In  IP 172.20.0.56.53 > 172.20.1.80.36323: 10644 NXDomain*- 0/1/0 (139)
#    01:33:33.868623 lxc9dcdf61704e7 In  IP 172.20.1.80.48326 > 172.20.0.56.53: 52392+ A? google.com.cluster.local. (42)
#    01:33:33.868857 eth1  Out IP 172.20.1.80.48326 > 172.20.0.56.53: 52392+ A? google.com.cluster.local. (42)
#    01:33:33.870240 eth1  In  IP 172.20.0.56.53 > 172.20.1.80.48326: 52392 NXDomain*- 0/1/0 (135)
#    01:33:33.874341 lxc9dcdf61704e7 In  IP 172.20.1.80.50810 > 172.20.0.56.53: 16999+ A? google.com. (28)
#    01:33:33.874556 eth1  Out IP 172.20.1.80.50810 > 172.20.0.56.53: 16999+ A? google.com. (28)
#    01:33:33.928831 eth1  In  IP 172.20.0.56.53 > 172.20.1.80.50810: 16999 1/0/0 A 142.250.206.238 (54)

# Enable CoreDNS logging and debugging
# k9s → configmap → select coredns → E (edit) → add log and debug as shown below, then exit
---
    .:53 {
        log
        debug
        errors
---

# Log monitoring 3
$ kubectl -n kube-system logs -l k8s-app=kube-dns -f

# Query a domain
$ kubectl exec -it curl-pod -- nslookup webpod
# => ;; Got recursion not available from 10.96.0.10
#    Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    Name:   webpod.default.svc.cluster.local
#    Address: 10.96.194.47
#    ;; Got recursion not available from 10.96.0.10

# Check the CoreDNS logs
# => [INFO] 172.20.1.80:46753 - 59201 "A IN webpod.default.svc.cluster.local. udp 50 false 512" NOERROR qr,aa,rd 98 0.000500084s
#    [INFO] 172.20.1.80:39996 - 54949 "AAAA IN webpod.default.svc.cluster.local. udp 50 false 512" NOERROR qr,aa,rd 143 0.000554333s

$ kubectl exec -it curl-pod -- nslookup google.com
# => Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    Non-authoritative answer:
#    Name:   google.com
#    Address: 142.250.206.238
#    Name:   google.com
#    Address: 2404:6800:400a:804::200e

# Check the CoreDNS logs
# => [INFO] 172.20.1.80:43389 - 1366 "A IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.001736458s
#    [INFO] 172.20.1.80:34460 - 17323 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000483917s
#    [INFO] 172.20.1.80:40469 - 26458 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000331334s
#    [INFO] 172.20.1.80:56627 - 19160 "A IN google.com. udp 28 false 512" NOERROR qr,rd,ra 54 0.054453541s
#    [INFO] 172.20.1.80:48324 - 36047 "AAAA IN google.com. udp 28 false 512" NOERROR qr,rd,ra 66 0.044442084s

# If CoreDNS uses the prometheus plugin, cache-related information can be collected from the metrics port (:9153).
## coredns_cache_entries	number of entries currently in the cache; type: success or denial (successful responses or NXDOMAIN, etc.)
## coredns_cache_hits_total	number of cache hits
## coredns_cache_misses_total	number of cache misses
## coredns_cache_requests_total	total number of cache requests

$ kubectl exec -it curl-pod -- curl kube-dns.kube-system.svc:9153/metrics | grep coredns_cache_ | grep -v ^#
# => coredns_cache_entries{server="dns://:53",type="denial",view="",zones="."} 1
#    coredns_cache_entries{server="dns://:53",type="success",view="",zones="."} 2
#    coredns_cache_hits_total{server="dns://:53",type="success",view="",zones="."} 4
#    coredns_cache_misses_total{server="dns://:53",view="",zones="."} 116
#    coredns_cache_requests_total{server="dns://:53",view="",zones="."} 120
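The raw counters above are easier to read as a hit ratio. A minimal sketch, assuming the Prometheus text format shown above and using only the `coredns_cache_hits_total` and `coredns_cache_requests_total` series; `coredns_cache_hit_ratio` is a hypothetical helper, not part of CoreDNS:

```sh
# coredns_cache_hit_ratio (hypothetical helper): read Prometheus text-format
# metrics on stdin and print cache hits as a percentage of cache requests.
# Every series of each metric is summed, so hits labelled type="success" and
# type="denial" both count toward the total.
coredns_cache_hit_ratio() {
  awk '
    /^coredns_cache_hits_total/     { hits += $NF }
    /^coredns_cache_requests_total/ { reqs += $NF }
    END {
      if (reqs > 0) printf "%.2f\n", 100 * hits / reqs
      else print "n/a"
    }'
}

# Usage (no -t, since the output is piped):
#   kubectl exec curl-pod -- curl -s kube-dns.kube-system.svc:9153/metrics | coredns_cache_hit_ratio
```

With the sample values above (4 hits out of 120 requests) the helper prints `3.33`, i.e. almost every query so far was a cache miss.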

NodeLocalDNS

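Introduction blog post

NodeLocal DNSCache improves cluster DNS performance by running a DNS caching agent on cluster nodes as a DaemonSet.
In the existing architecture, Pods in 'ClusterFirst' DNS mode reach the kube-dns service IP for their DNS queries.
That IP is translated to a kube-dns/CoreDNS endpoint by iptables rules added by kube-proxy.
With the new architecture, Pods reach a DNS caching agent running on the same node, which lets them avoid the iptables DNAT rules and connection tracking.
The local caching agent queries the kube-dns service on cache misses for cluster hostnames (the "cluster.local" suffix by default).
In the existing DNS architecture, the Pods with the highest DNS QPS may have to reach out to a different node if there is no local kube-dns/CoreDNS instance. A local cache helps improve latency in such scenarios.
Skipping iptables DNAT and connection tracking helps reduce conntrack races and keeps UDP DNS entries from filling up the conntrack table.
Connections from the local caching agent to the kube-dns service can be upgraded to TCP. TCP conntrack entries are removed when the connection closes, unlike UDP entries, which have to time out (the default nf_conntrack_udp_timeout is 30 seconds).
Upgrading DNS queries from UDP to TCP reduces the tail latency caused by dropped UDP packets and DNS timeouts, which is usually up to 30 seconds (3 retries + 10-second timeout). Because the node-local cache still listens for UDP DNS queries, applications do not need to change.
It provides metrics and visibility into DNS requests at the node level.
Negative caching can be re-enabled, reducing the number of queries to the kube-dns service.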
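How to install NodeLocal DNSCache - Docs
- The local listening IP address of NodeLocal DNSCache can be any address that does not collide with an existing IP in the cluster.
- A local-scope address is recommended, for example from the IPv4 link-local range 169.254.0.0/16, or from the IPv6 Unique Local Address range fd00::/8.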
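To sanity-check a candidate listen address against ranges already in use, a small shell sketch can test CIDR membership with integer arithmetic. The helper names are hypothetical, and the CIDRs in the usage line are assumptions: 10.96.0.0/12 is the kubeadm default Service CIDR, and 172.20.0.0/16 is only inferred from the Pod IPs seen in this lab's logs, so substitute your cluster's actual ranges:

```sh
# ipv4_to_int ADDR (hypothetical helper): dotted-quad IPv4 -> 32-bit integer.
ipv4_to_int() {
  local old_ifs
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# ipv4_in_cidr ADDR CIDR (hypothetical helper): exit 0 if ADDR is inside CIDR.
ipv4_in_cidr() {
  local net len mask a n
  net=${2%/*}
  len=${2#*/}
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  a=$(ipv4_to_int "$1")
  n=$(ipv4_to_int "$net")
  [ $(( a & mask )) -eq $(( n & mask )) ]
}

# check_localdns ADDR CIDR... (hypothetical helper): fail if ADDR overlaps
# any of the in-use ranges you pass (Service CIDR, Pod CIDR, ...).
check_localdns() {
  local addr c
  addr=$1
  shift
  for c in "$@"; do
    if ipv4_in_cidr "$addr" "$c"; then
      echo "conflict: $addr is inside $c"
      return 1
    fi
  done
  echo "ok: $addr does not overlap: $*"
}

# Usage (CIDRs are assumptions, see above):
#   check_localdns 169.254.20.10 10.96.0.0/12 172.20.0.0/16
```

The usage line prints `ok: 169.254.20.10 does not overlap: 10.96.0.0/12 172.20.0.0/16`, while `check_localdns 10.96.0.10 10.96.0.0/12` reports a conflict, which is why the kube-dns ClusterIP itself cannot serve as the node-local address.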

$ curl -LO https://raw.githubusercontent.com/kubernetes/kubernetes/refs/heads/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

# Replace the following values with ones appropriate for your cluster.
$ kubedns=`kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP}` # ClusterIP of the coredns (kube-dns) Service
# $ domain=<cluster-domain> # usually the default, cluster.local
$ domain=cluster.local
# $ localdns=<node-local-address> # local listen IP address chosen for NodeLocal DNSCache
$ localdns=169.254.20.10
$ echo $kubedns $domain $localdns
# => 10.96.0.10 cluster.local 169.254.20.10

# case 1) kube-proxy in IPTABLES mode
$ sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml
# In this mode, the node-local-dns pods listen on both the kube-dns service IP and <node-local-address>, so pods can look up DNS records using either IP address.

# case 2) kube-proxy in IPVS mode
$ sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/,__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" nodelocaldns.yaml
# - In this mode, the `node-local-dns` pods listen only on <node-local-address>.
# - The `node-local-dns` interface cannot bind the kube-dns cluster IP, because the interface used for IPVS load balancing already uses that address.
# - `__PILLAR__UPSTREAM__SERVERS__` is populated by the `node-local-dns` pods.

$ kubectl create -f nodelocaldns.yaml
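The two `sed` commands differ only in how `__PILLAR__DNS__SERVER__` and `__PILLAR__CLUSTER__DNS__` are handled. Wrapping them in one function that reads the template on stdin makes that difference explicit and leaves the downloaded file untouched. This is a sketch; the function name and the output file name in the usage line are hypothetical:

```sh
# render_nodelocaldns MODE LOCALDNS DOMAIN KUBEDNS < template > manifest
# (hypothetical helper) applies the same substitutions as the commands above.
render_nodelocaldns() {
  local mode localdns domain kubedns
  mode=$1; localdns=$2; domain=$3; kubedns=$4
  case $mode in
    iptables)
      # Bind both the local address and the kube-dns ClusterIP.
      # __PILLAR__CLUSTER__DNS__ is left for the node-local-dns pods to fill in.
      sed "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" ;;
    ipvs)
      # Bind only the local address; the ClusterIP becomes the upstream.
      sed "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/,__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" ;;
    *)
      echo "usage: render_nodelocaldns iptables|ipvs LOCALDNS DOMAIN KUBEDNS" >&2
      return 2 ;;
  esac
}

# Usage:
#   render_nodelocaldns iptables "$localdns" "$domain" "$kubedns" \
#     < nodelocaldns.yaml > nodelocaldns-rendered.yaml
```

In iptables mode `__PILLAR__CLUSTER__DNS__` intentionally survives the substitution, which matches the `forward . __PILLAR__CLUSTER__DNS__` line still visible in the node-local-dns ConfigMap later in this post.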

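Once enabled, node-local-dns Pods run in the kube-system namespace on every cluster node.
These Pods run CoreDNS in cache mode, so all CoreDNS metrics exposed by the various plugins are available on a per-node basis.
You can disable the feature by removing the DaemonSet with kubectl delete -f <manifest>. You should also revert any changes you made to the kubelet configuration.
The StubDomains and upstream servers specified in the kube-dns ConfigMap are also used by node-local-dns.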
[Figure: Difference between iptables mode and ipvs mode (panels: iptables mode / ipvs mode)]
- The difference between iptables mode and ipvs mode is that in ipvs mode, Pods send domain-resolution requests not to 10.96.0.10, the ClusterIP of the CoreDNS Service, but to <node-local-address>, the local address configured for the NodeLocal DNSCache CoreDNS (169.254.20.10 in this lab).
- Therefore, if the Kubernetes cluster runs kube-proxy in ipvs mode, NodeLocal DNSCache cannot be turned on or off transparently. Every change requires updating the Pod DNS server address in the kubelet configuration and restarting kubelet, and the Pods must also be restarted so that the DNS server address they use is updated.
- This is because, when kube-proxy runs in ipvs mode, IPVS ignores the iptables NOTRACK rules and load-balances the traffic anyway.
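In ipvs mode, the kubelet on every node has to hand Pods the node-local address instead of the ClusterIP. A minimal sketch of that edit, assuming a kubeadm-provisioned node whose kubelet reads `/var/lib/kubelet/config.yaml` and whose `clusterDNS` list has a single entry written as `- <ip>` on the next line; `set_kubelet_cluster_dns` is a hypothetical helper:

```sh
# set_kubelet_cluster_dns FILE ADDR (hypothetical helper)
# Replace the first entry under "clusterDNS:" in a KubeletConfiguration file,
# assuming the kubeadm layout:
#   clusterDNS:
#   - 10.96.0.10
set_kubelet_cluster_dns() {
  local file addr tmp status
  file=$1; addr=$2
  if ! grep -q '^clusterDNS:' "$file"; then
    echo "no clusterDNS entry in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # On the line after "clusterDNS:", rewrite the "- <ip>" list item.
  sed "/^clusterDNS:/{n;s/^- .*/- $addr/;}" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # write back in place, keeping owner and mode
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage on each node (as root), only when kube-proxy runs in ipvs mode:
#   set_kubelet_cluster_dns /var/lib/kubelet/config.yaml 169.254.20.10 &&
#     systemctl restart kubelet
# then recreate the Pods so their /etc/resolv.conf picks up the new nameserver.
```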

Installing and verifying NodeLocal DNSCache

# Check iptables (snapshot before installing)
$ iptables-save | tee before.txt

#
$ wget https://github.com/kubernetes/kubernetes/raw/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
# => 2025-08-03 02:19:25 (534 KB/s) - ‘nodelocaldns.yaml’ saved [5377/5377]

# Store the ClusterIP of the coredns Service in kubedns
$ kubedns=`kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP}`
$ domain='cluster.local'    ## default value
$ localdns='169.254.20.10'  ## example value from the Kubernetes docs
$ echo $kubedns $domain $localdns
# => 10.96.0.10 cluster.local 169.254.20.10

# kube-proxy is in iptables mode here, so run the following command
$ sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml

# Install nodelocaldns
$ kubectl apply -f nodelocaldns.yaml
# => serviceaccount/node-local-dns created
#    service/kube-dns-upstream created
#    configmap/node-local-dns created
#    daemonset.apps/node-local-dns created
#    service/node-local-dns created

#
$ kubectl get pod -n kube-system -l k8s-app=node-local-dns -owide
# => NAME                   READY   STATUS              RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES
#    node-local-dns-6gzpj   0/1     ContainerCreating   0          10s   192.168.10.100   k8s-ctr   <none>           <none>
#    node-local-dns-c2846   0/1     ContainerCreating   0          10s   192.168.10.101   k8s-w1    <none>           <none>

#
$ kubectl edit cm -n kube-system node-local-dns # add log and debug to the 'cluster.local' and '.:53' blocks
# => configmap/node-local-dns edited
$ kubectl -n kube-system rollout restart ds node-local-dns
# => daemonset.apps/node-local-dns restarted

$ kubectl describe cm -n kube-system node-local-dns
# => cluster.local:53 {
#        log
#        debug
#        errors
#        cache {
#                success 9984 30
#                denial 9984 5
#        }
#        reload
#        loop
#        bind 169.254.20.10 10.96.0.10
#        forward . __PILLAR__CLUSTER__DNS__ {
#                force_tcp
#        }
#        prometheus :9253
#        health 169.254.20.10:8080
#        }
#    ...
#    .:53 {
#        log
#        debug
#        errors
#        cache 30
#        reload
#        loop
#        bind 169.254.20.10 10.96.0.10
#        forward . __PILLAR__UPSTREAM__SERVERS__
#        prometheus :9253
#        }

# Check iptables: the rules take a little while to be updated!
$ iptables-save | tee after.txt
$ diff before.txt after.txt

##
$ iptables -t filter -S | grep -i dns
# => -A INPUT -d 10.96.0.10/32 -p udp -m udp --dport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A INPUT -d 10.96.0.10/32 -p tcp -m tcp --dport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A INPUT -d 169.254.20.10/32 -p udp -m udp --dport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A INPUT -d 169.254.20.10/32 -p tcp -m tcp --dport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A OUTPUT -s 10.96.0.10/32 -p udp -m udp --sport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A OUTPUT -s 10.96.0.10/32 -p tcp -m tcp --sport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A OUTPUT -s 169.254.20.10/32 -p udp -m udp --sport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT
#    -A OUTPUT -s 169.254.20.10/32 -p tcp -m tcp --sport 53 -m comment --comment "NodeLocal DNS Cache: allow DNS traffic" -j ACCEPT

##
$ iptables -t raw -S | grep -i dns
# => -A PREROUTING -d 10.96.0.10/32 -p udp -m udp --dport 53 -m comment --comment "NodeLocal DNS Cache: skip conntrack" -j NOTRACK
#    -A PREROUTING -d 10.96.0.10/32 -p tcp -m tcp --dport 53 -m comment --comment "NodeLocal DNS Cache: skip conntrack" -j NOTRACK
#    -A PREROUTING -d 169.254.20.10/32 -p udp -m udp --dport 53 -m comment --comment "NodeLocal DNS Cache: skip conntrack" -j NOTRACK
#    -A PREROUTING -d 169.254.20.10/32 -p tcp -m tcp --dport 53 -m comment --comment "NodeLocal DNS Cache: skip conntrack" -j NOTRACK
#    -A OUTPUT -s 10.96.0.10/32 -p tcp -m tcp --sport 8080 -m comment --comment "NodeLocal DNS Cache: skip conntrack" -j NOTRACK
#    -A OUTPUT -d 10.96.0.10/32 -p tcp -m tcp --dport 8080 -m comment --comment "NodeLocal DNS Cache: skip conntrack" -j NOTRACK
#    ...

# logs : 
$ kubectl -n kube-system logs -l k8s-app=kube-dns -f
$ kubectl -n kube-system logs -l k8s-app=node-local-dns -f

#
$ kubectl exec -it curl-pod -- cat /etc/resolv.conf
# => search default.svc.cluster.local svc.cluster.local cluster.local
#    nameserver 10.96.0.10
#    options ndots:5

# 
$ kubectl exec -it curl-pod -- nslookup webpod
# Check the kube-dns logs
# => [INFO] 172.20.1.80:57227 - 46178 "A IN webpod.default.svc.cluster.local. udp 50 false 512" NOERROR qr,aa,rd 98 0.000729042s
#    [INFO] 172.20.1.80:49087 - 7946 "AAAA IN webpod.default.svc.cluster.local. udp 50 false 512" NOERROR qr,aa,rd 143 0.000557625s

$ kubectl exec -it curl-pod -- nslookup google.com
# Check the kube-dns logs
# => [INFO] 172.20.1.80:52008 - 19063 "A IN google.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000802417s
#    [INFO] 172.20.1.80:53056 - 1029 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000377583s
#    [INFO] 172.20.1.80:40165 - 2061 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000287959s
#    [INFO] 172.20.1.80:42852 - 42332 "A IN google.com. udp 28 false 512" NOERROR qr,aa,rd,ra 54 0.000450083s
#    [INFO] 172.20.1.80:52047 - 947 "AAAA IN google.com. udp 28 false 512" NOERROR qr,aa,rd,ra 66 0.000662125s
# 👉 The logs appear only on the kube-dns side.

#
$ kubectl delete pod curl-pod

$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: curl-pod
  labels:
    app: curl
spec:
  containers:
  - name: curl
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF
# => pod/curl-pod created

$ kubectl exec -it curl-pod -- cat /etc/resolv.conf
# => search default.svc.cluster.local svc.cluster.local cluster.local
#    nameserver 10.96.0.10
#    options ndots:5

# The logs show that nodelocaldns is still not being used!
$ kubectl -n kube-system logs -l k8s-app=kube-dns -f
$ kubectl -n kube-system logs -l k8s-app=node-local-dns -f

#
$ kubectl exec -it curl-pod -- nslookup webpod
$ kubectl exec -it curl-pod -- nslookup google.com
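The three NXDOMAIN lines before the successful `google.com.` lookup in the kube-dns logs come from the search list and `options ndots:5` in the Pod's /etc/resolv.conf: a name with fewer dots than `ndots` is tried with each search domain first, and only then as an absolute name. A minimal sketch of that ordering, modeled on the usual glibc-style resolver behavior (an assumption; other resolvers such as musl may differ in details); `resolv_candidates` is a hypothetical helper:

```sh
# resolv_candidates NAME NDOTS "SEARCH LIST" (hypothetical helper)
# Print the fully qualified names a stub resolver would try, in order.
resolv_candidates() {
  local name ndots search dots d
  name=$1; ndots=$2; search=$3
  # A trailing dot marks an absolute name: no search-list expansion.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  # Enough dots: try the name as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
  for d in $search; do
    printf '%s.%s.\n' "$name" "$d"
  done
  # Too few dots: the absolute name is tried last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Usage with the resolv.conf shown above:
#   resolv_candidates google.com 5 "default.svc.cluster.local svc.cluster.local cluster.local"
```

This prints `google.com.default.svc.cluster.local.`, `google.com.svc.cluster.local.`, `google.com.cluster.local.` and `google.com.`, the same four names in the same order as the kube-dns log. For `webpod` the first candidate, `webpod.default.svc.cluster.local.`, already resolves, so the lookup stops there.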

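In the example above, NodeLocal DNSCache is still not being used: the Pod's resolv.conf still points at 10.96.0.10, and the queries appear only in the kube-dns logs. This is presumably because Cilium's eBPF service load balancing translates the kube-dns ClusterIP to a CoreDNS Pod IP before the iptables rules installed by node-local-dns can take effect; Cilium's Local Redirect Policy is the mechanism used below to route these queries to the node-local cache.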

Cilium Local Redirect Policy

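When Local Redirect Policy is enabled with --set localRedirectPolicy=true and a CiliumLocalRedirectPolicy pointing at the node-local-dns Pods is applied, Cilium redirects DNS requests to NodeLocal DNSCache. Docs
The documentation describes how to configure Cilium's Local Redirect Policy, which uses eBPF to redirect Pod traffic destined for an IP address and Port/Protocol tuple, or for a Kubernetes Service, locally to backend Pods on the same node.
The namespace of the backend Pods must match the namespace of the policy.
CiliumLocalRedirectPolicy is configured as a CustomResourceDefinition.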
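Besides `serviceMatcher`, which the nodelocaldns policy later in this post uses, the frontend of a Local Redirect Policy can be an IP address plus port/protocol tuple. A minimal sketch that prints such a policy, assuming the `addressMatcher` field layout from the Cilium Local Redirect Policy documentation (verify with `kubectl explain ciliumlocalredirectpolicy.spec` on your Cilium version). The function name and the example IP, ports and labels are hypothetical, and a single namespace argument is used for the policy because the backend Pods must live in the same namespace:

```sh
# lrp_address_manifest NAME NAMESPACE FRONTEND_IP FRONTEND_PORT LABEL_KEY LABEL_VALUE BACKEND_PORT
# (hypothetical helper) prints an address-based CiliumLocalRedirectPolicy.
lrp_address_manifest() {
  local name ns ip fport key value bport
  name=$1; ns=$2; ip=$3; fport=$4; key=$5; value=$6; bport=$7
  cat <<EOF
apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "$name"
  namespace: "$ns"
spec:
  redirectFrontend:
    addressMatcher:
      ip: "$ip"
      toPorts:
        - port: "$fport"
          protocol: TCP
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        $key: $value
    toPorts:
      - port: "$bport"
        protocol: TCP
EOF
}

# Usage (values are illustrative only):
#   lrp_address_manifest lrp-addr default 169.254.169.254 8080 app proxy 80 | kubectl apply -f -
```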

#
$ helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set localRedirectPolicy=true

$ kubectl rollout restart deploy cilium-operator -n kube-system
# => deployment.apps/cilium-operator restarted
$ kubectl rollout restart ds cilium -n kube-system
# => daemonset.apps/cilium restarted

#
$ wget https://raw.githubusercontent.com/cilium/cilium/1.17.6/examples/kubernetes-local-redirect/node-local-dns.yaml

$ kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP})
$ sed -i "s/__PILLAR__DNS__SERVER__/$kubedns/g;" node-local-dns.yaml
$ vi -d nodelocaldns.yaml node-local-dns.yaml
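`vi -d` shows the differences interactively. For a quick non-interactive check, a sketch like the following reports whether a manifest passes the three node-cache flags named in the Cilium steps below (`-skipteardown=true`, `-setupinterface=false`, `-setupiptables=false`) and no longer sets `hostNetwork: true`. It only greps the file text, so it is a rough sanity check, not YAML validation; `check_lrp_node_cache` is a hypothetical helper:

```sh
# check_lrp_node_cache FILE (hypothetical helper)
# Exit 0 if FILE contains the node-cache flags used with Cilium LRP and does
# not set hostNetwork: true; print one status line per check.
check_lrp_node_cache() {
  local file flag missing
  file=$1
  missing=0
  for flag in -skipteardown=true -setupinterface=false -setupiptables=false; do
    if grep -qF -- "\"$flag\"" "$file"; then
      printf 'found:   %s\n' "$flag"
    else
      printf 'missing: %s\n' "$flag"
      missing=1
    fi
  done
  # The LRP variant runs node-cache outside the host network namespace.
  if grep -Eq '^[[:space:]]*hostNetwork:[[:space:]]*true' "$file"; then
    printf 'hostNetwork: true is still set\n'
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_lrp_node_cache node-local-dns.yaml   # Cilium variant, expected to pass
#   check_lrp_node_cache nodelocaldns.yaml     # upstream variant, expected to fail
```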

Diff between nodelocaldns.yaml and node-local-dns.yaml

## before
# args: [ "-localip", "169.254.20.10,10.96.0.10", "-conf", "/etc/Corefile", "-upstreamsvc", "kube-dns-upstream" ]

## after
# args: [ "-localip", "169.254.20.10,10.96.0.10", "-conf", "/etc/Corefile", "-upstreamsvc", "kube-dns-upstream", "-skipteardown=true", "-setupinterface=false", "-setupiptables=false" ]


# Deploy
# Modify Node-local DNS cache’s deployment yaml to pass these additional arguments to node-cache: 
## -skipteardown=true, -setupinterface=false, and -setupiptables=false.

# Modify Node-local DNS cache’s deployment yaml to put it in non-host namespace by setting hostNetwork: false for the daemonset.
# In the Corefile, bind to 0.0.0.0 instead of the static IP.
$ kubectl apply -f node-local-dns.yaml
# => serviceaccount/node-local-dns configured
#    service/kube-dns-upstream configured
#    configmap/node-local-dns configured
#    daemonset.apps/node-local-dns configured

#
$ kubectl edit cm -n kube-system node-local-dns # add log and debug
# => configmap/node-local-dns edited
$ kubectl -n kube-system rollout restart ds node-local-dns
# => daemonset.apps/node-local-dns restarted

$ kubectl describe cm -n kube-system node-local-dns
# => ...
#    cluster.local:53 {
#        log
#        debug
#        errors
#        cache {
#                success 9984 30
#                denial 9984 5
#        }
#        reload
#        loop
#        bind 0.0.0.0
#        forward . __PILLAR__CLUSTER__DNS__ {
#                force_tcp
#        }
#        prometheus :9253
#        health
#        }
#    ...
#    .:53 {
#        log
#        debug
#        errors
#        cache 30
#        reload
#        loop
#        bind 0.0.0.0
#        forward . __PILLAR__UPSTREAM__SERVERS__
#        prometheus :9253
#        }
#    ...

#
$ wget https://raw.githubusercontent.com/cilium/cilium/1.17.6/examples/kubernetes-local-redirect/node-local-dns-lrp.yaml
$ cat node-local-dns-lrp.yaml
# => apiVersion: "cilium.io/v2"
#    kind: CiliumLocalRedirectPolicy
#    metadata:
#      name: "nodelocaldns"
#      namespace: kube-system
#    spec:
#      redirectFrontend:
#        serviceMatcher:
#          serviceName: kube-dns
#          namespace: kube-system
#      redirectBackend:
#        localEndpointSelector:
#          matchLabels:
#            k8s-app: node-local-dns
#        toPorts:
#          - port: "53"
#            name: dns
#            protocol: UDP
#          - port: "53"
#            name: dns-tcp
#            protocol: TCP
        
$ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.17.6/examples/kubernetes-local-redirect/node-local-dns-lrp.yaml
# => ciliumlocalredirectpolicy.cilium.io/nodelocaldns created

#
$ kubectl get CiliumLocalRedirectPolicy -A
# => NAMESPACE     NAME           AGE
#    kube-system   nodelocaldns   8s

#
$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg lrp list
# => LRP namespace   LRP name       FrontendType                Matching Service
#    kube-system     nodelocaldns   clusterIP + all svc ports   kube-system/kube-dns

$ kubectl exec -it -n kube-system ds/cilium -c cilium-agent -- cilium-dbg service list | grep LocalRedirect
# => 16   10.96.0.10:53/UDP       LocalRedirect   1 => 172.20.0.73:53/UDP (active)
#    17   10.96.0.10:53/TCP       LocalRedirect   1 => 172.20.0.73:53/TCP (active)

# logs
$ kubectl -n kube-system logs -l k8s-app=kube-dns -f
$ kubectl -n kube-system logs -l k8s-app=node-local-dns -f

#
$ kubectl exec -it curl-pod -- nslookup www.google.com
# => Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    Non-authoritative answer:
#    Name:   www.google.com
#    Address: 142.250.206.228
#    Name:   www.google.com
#    Address: 2404:6800:400a:804::2004

# Check the kube-dns logs
# => [INFO] 172.20.1.178:55860 - 32731 "A IN www.google.com.default.svc.cluster.local. tcp 58 false 65535" NXDOMAIN qr,aa,rd 151 0.002254584s
#    [INFO] 172.20.1.178:55860 - 50477 "A IN www.google.com.svc.cluster.local. tcp 50 false 65535" NXDOMAIN qr,aa,rd 143 0.000426625s
#    [INFO] 172.20.1.178:55860 - 42463 "A IN www.google.com.cluster.local. tcp 46 false 65535" NXDOMAIN qr,aa,rd 139 0.000225583s

# Check the node-local-dns logs
# => [INFO] 172.20.1.20:52275 - 32731 "A IN www.google.com.default.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.009171834s
#    [INFO] 172.20.1.20:60077 - 50477 "A IN www.google.com.svc.cluster.local. udp 50 false 512" NXDOMAIN qr,aa,rd 143 0.001657537s
#    [INFO] 172.20.1.20:48089 - 42463 "A IN www.google.com.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.001401951s
#    [INFO] 172.20.1.20:58721 - 18703 "A IN www.google.com. udp 32 false 512" NOERROR qr,rd,ra 62 0.046337824s
#    [INFO] 172.20.1.20:42914 - 43270 "AAAA IN www.google.com. udp 32 false 512" NOERROR qr,rd,ra 74 0.042382055s

# Query DNS once more
$ kubectl exec -it curl-pod -- nslookup www.google.com
# => Server:         10.96.0.10
#    Address:        10.96.0.10#53
#    
#    Non-authoritative answer:
#    Name:   www.google.com
#    Address: 142.250.206.228
#    Name:   www.google.com
#    Address: 2404:6800:400a:804::2004

# Check the kube-dns logs
# => (no logs)

# Check the node-local-dns logs
# => [INFO] 172.20.1.20:52275 - 32731 "A IN www.google.com.default.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.009171834s
#    [INFO] 172.20.1.20:60077 - 50477 "A IN www.google.com.svc.cluster.local. udp 50 false 512" NXDOMAIN qr,aa,rd 143 0.001657537s
#    [INFO] 172.20.1.20:48089 - 42463 "A IN www.google.com.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.001401951s
#    [INFO] 172.20.1.20:58721 - 18703 "A IN www.google.com. udp 32 false 512" NOERROR qr,rd,ra 62 0.046337824s
#    [INFO] 172.20.1.20:42914 - 43270 "AAAA IN www.google.com. udp 32 false 512" NOERROR qr,rd,ra 74 0.042382055s
# 👉 On repeated lookups the answers are cached in node-local-dns, so fewer queries reach kube-dns.

# Confirm that nodelocaldns answers queries directly from its cache!
$ kubectl -n kube-system logs -l k8s-app=node-local-dns -f
# => [INFO] 172.20.1.20:52275 - 32731 "A IN www.google.com.default.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.009171834s
#    [INFO] 172.20.1.20:60077 - 50477 "A IN www.google.com.svc.cluster.local. udp 50 false 512" NXDOMAIN qr,aa,rd 143 0.001657537s
#    [INFO] 172.20.1.20:48089 - 42463 "A IN www.google.com.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.001401951s
#    [INFO] 172.20.1.20:58721 - 18703 "A IN www.google.com. udp 32 false 512" NOERROR qr,rd,ra 62 0.046337824s
#    [INFO] 172.20.1.20:42914 - 43270 "AAAA IN www.google.com. udp 32 false 512" NOERROR qr,rd,ra 74 0.042382055s
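Comparing the two log streams is easier with one line per query. In the output above, kube-dns sees `tcp` queries (the `force_tcp` option in the `cluster.local:53` block of node-local-dns), while node-local-dns sees `udp` queries from the Pod; only the `cluster.local` names reach kube-dns, consistent with the Corefile where only that block forwards to the cluster DNS. A minimal sketch that extracts protocol, type, name and response code, assuming the CoreDNS `log` plugin line format shown above; `summarize_coredns_log` is a hypothetical helper:

```sh
# summarize_coredns_log (hypothetical helper): read CoreDNS log-plugin lines
# on stdin and print "PROTO TYPE NAME RCODE" for each query.
summarize_coredns_log() {
  awk -F'"' '/\[INFO\]/ && NF >= 3 {
    split($2, q, " ")   # q[1]=type q[2]=class q[3]=name q[4]=proto
    split($3, r, " ")   # r[1]=response code
    print q[4], q[1], q[3], r[1]
  }'
}

# Usage:
#   kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50 | summarize_coredns_log | sort | uniq -c
#   kubectl -n kube-system logs -l k8s-app=node-local-dns --tail=50 | summarize_coredns_log | sort | uniq -c
```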

Wrapping up

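In this post, we looked at how communication between Pods on a node, communication with the outside world, and DNS requests are handled. Networking feels hard every time I look at it, but little by little it is starting to make sense and become familiar. I can feel myself moving forward one step at a time.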

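We covered a wide range of topics, and in every part I could see the traces of countless people who worked to make networking even a little more efficient. Thanks to them we can use the internet and the cloud the way we do today, so although I have never met them, I want to express my gratitude.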

Appendix

Key K9s shortcuts

Action	Command	Comment
Show active keyboard mnemonics and help	`?`
Show all available resource alias	`ctrl-a`
To bail out of K9s	`:quit` `:q` `ctrl-c`
To go up/back to the previous view	`esc`	If you have crumbs on, this will go to the previous one
View a Kubernetes resource using singular/plural or short-name	`:pod`	accepts singular, plural, short-name or alias ie pod or pods
View a Kubernetes resource in a given namespace	`:pod ns-x`
View filtered pods (New v0.30.0!)	`:pod /fred`	View all pods filtered by fred
View labeled pods (New v0.30.0!)	`:pod app=fred,env=dev`	View all pods with labels matching app=fred and env=dev
View pods in a given context (New v0.30.0!)	`:pod @ctx1`	View all pods in context ctx1. Switches out your current k9s context!
Filter out a resource view given a filter	`/filter`	Regex2 supported ie `fred`
Inverse regex filter	`/! filter`	Keep everything that doesn’t match.
Filter resource view by labels	`/-l label-selector`
Fuzzy find a resource given a filter	`/-f filter`
Bails out of view/command/filter mode	`<esc>`
Key mapping to describe, view, edit, view logs,…	`d`, `v`, `e`, `l`,…
To view and switch to another Kubernetes context (Pod view)	`:ctx`
To view and switch directly to another Kubernetes context (Last used view)	`:ctx context-name`
To view and switch to another Kubernetes namespace	`:ns`
To switch back to the last active command (like how "cd -" works)	`-`	Navigation that adds breadcrumbs to the bottom are not commands
To go back and forward through the command history	back: `[`, forward: `]`	Same as above
To view all saved resources	`:screendump` or `:sd`
To delete a resource (TAB and ENTER to confirm)	`ctrl-d`
To kill a resource (no confirmation dialog, equivalent to kubectl delete --now)	`ctrl-k`
Launch pulses view	`:pulses` or `:pu`
Launch XRay view	`:xray RESOURCE [NAMESPACE]`	RESOURCE can be one of po, svc, dp, rs, sts, ds, NAMESPACE is optional
Launch Popeye view	`:popeye` or `:pop`	See popeye