Life of a Packet in Kubernetes — Part 1

Topics — Part 1

  1. Linux namespaces
  2. Container networking
  3. What is CNI?
  4. Pod network namespace

Topics — Part 2

  1. Calico CNI

Topics — Part 3

  1. Pod-to-Pod
  2. Pod-to-External
  3. Pod-to-Service
  4. External-to-Pod
  5. External Traffic Policy
  6. Kube-Proxy
  7. iptables rules processing flow
  8. Network Policy basics

Topics — Part 4

  1. Ingress Controller
  2. Ingress Resources Example
  3. Nginx
  4. Envoy+Contour
  5. Ingress with MetalLB

Topics — Part 5

  1. ISTIO service mesh

Linux Namespaces

  1. Mount — isolate filesystem mount points
  2. UTS — isolate hostname and domain name
  3. IPC — isolate interprocess communication (IPC) resources
  4. PID — isolate the PID number space
  5. Network — isolate network interfaces
  6. User — isolate UID/GID number spaces
  7. Cgroup — isolate cgroup root directory
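You can poke at these namespaces directly with the util-linux tools. A minimal sketch of UTS and network isolation using unshare and lsns (run as root):

# start a shell in fresh UTS and network namespaces: the hostname change
# stays inside, and the namespace sees only an isolated loopback device
sudo unshare --uts --net bash -c 'hostname demo-ns; hostname; ip link'
hostname        # the host's hostname is unchanged
lsns -t net     # list the network namespaces visible on the host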

Container Networking (Network Namespace)

Let's create two network namespaces, client and server, and connect them with a veth pair:

master# ip netns add client
master# ip netns add server
master# ip netns list
server
client
master# ip link add veth-client type veth peer name veth-server
master# ip link list | grep veth
4: veth-server@veth-client: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
5: veth-client@veth-server: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
master# ip link set veth-client netns client
master# ip link set veth-server netns server
master# ip link list | grep veth # the veth pair no longer exists in the host network namespace
master# ip netns exec client ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth-client@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ca:e8:30:2e:f9:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 1
master# ip netns exec server ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth-server@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 42:96:f0:ae:f0:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
master# ip netns exec client ip address add 10.0.0.11/24 dev veth-client
master# ip netns exec client ip link set veth-client up
master# ip netns exec server ip address add 10.0.0.12/24 dev veth-server
master# ip netns exec server ip link set veth-server up
master#
master# ip netns exec client ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth-client@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ca:e8:30:2e:f9:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 10.0.0.11/24 scope global veth-client
valid_lft forever preferred_lft forever
inet6 fe80::c8e8:30ff:fe2e:f9d2/64 scope link
valid_lft forever preferred_lft forever
master#
master# ip netns exec server ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth-server@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 42:96:f0:ae:f0:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.0.12/24 scope global veth-server
valid_lft forever preferred_lft forever
inet6 fe80::4096:f0ff:feae:f0c5/64 scope link
valid_lft forever preferred_lft forever
master#
master# ip netns exec client ping 10.0.0.12
PING 10.0.0.12 (10.0.0.12) 56(84) bytes of data.
64 bytes from 10.0.0.12: icmp_seq=1 ttl=64 time=0.101 ms
64 bytes from 10.0.0.12: icmp_seq=2 ttl=64 time=0.072 ms
64 bytes from 10.0.0.12: icmp_seq=3 ttl=64 time=0.084 ms
64 bytes from 10.0.0.12: icmp_seq=4 ttl=64 time=0.077 ms
64 bytes from 10.0.0.12: icmp_seq=5 ttl=64 time=0.079 ms
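A veth pair can only connect two endpoints; to join more namespaces together (and to the host), we need a Linux bridge. Clean up first (a sketch, assuming the namespace names above; deleting a namespace also destroys the veth end inside it):

ip netns del client
ip netns del server

The all-in-one script below rebuilds the topology around a bridge: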
# All in one: two namespaces joined by a Linux bridge
BR=bridge1
HOST_IP=172.17.0.23
# create a veth pair for each namespace
ip link add client1-veth type veth peer name client1-veth-br
ip link add server1-veth type veth peer name server1-veth-br
# create the bridge and the namespaces
ip link add $BR type bridge
ip netns add client1
ip netns add server1
# move one end of each pair into its namespace
ip link set client1-veth netns client1
ip link set server1-veth netns server1
# plug the other ends into the bridge
ip link set client1-veth-br master $BR
ip link set server1-veth-br master $BR
# bring everything up
ip link set $BR up
ip link set client1-veth-br up
ip link set server1-veth-br up
ip netns exec client1 ip link set client1-veth up
ip netns exec server1 ip link set server1-veth up
# assign addresses inside the namespaces
ip netns exec client1 ip addr add 172.30.0.11/24 dev client1-veth
ip netns exec server1 ip addr add 172.30.0.12/24 dev server1-veth
ip netns exec client1 ping 172.30.0.12 -c 5
# give the bridge itself an address so the host can reach the namespaces
ip addr add 172.30.0.1/24 dev $BR
ip netns exec client1 ping 172.30.0.12 -c 5
ip netns exec client1 ping 172.30.0.1 -c 5
controlplane $ ip netns exec client1 ping 172.30.0.12 -c 5
PING 172.30.0.12 (172.30.0.12) 56(84) bytes of data.
64 bytes from 172.30.0.12: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 172.30.0.12: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 172.30.0.12: icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from 172.30.0.12: icmp_seq=4 ttl=64 time=0.070 ms
64 bytes from 172.30.0.12: icmp_seq=5 ttl=64 time=0.107 ms
controlplane $ ip netns exec client1 ping $HOST_IP -c 2
connect: Network is unreachable
client1 only has a route for 172.30.0.0/24, so the host IP is unreachable. Add a default route via the bridge address in both namespaces:
controlplane $ ip netns exec client1 ip route add default via 172.30.0.1
controlplane $ ip netns exec server1 ip route add default via 172.30.0.1
controlplane $ ip netns exec client1 ping $HOST_IP -c 5
PING 172.17.0.23 (172.17.0.23) 56(84) bytes of data.
64 bytes from 172.17.0.23: icmp_seq=1 ttl=64 time=0.053 ms
64 bytes from 172.17.0.23: icmp_seq=2 ttl=64 time=0.121 ms
64 bytes from 172.17.0.23: icmp_seq=3 ttl=64 time=0.078 ms
64 bytes from 172.17.0.23: icmp_seq=4 ttl=64 time=0.129 ms
64 bytes from 172.17.0.23: icmp_seq=5 ttl=64 time=0.119 ms
--- 172.17.0.23 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.053/0.100/0.129/0.029 ms
controlplane $ ping 8.8.8.8 -c 2
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=3.40 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=3.81 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 3.403/3.610/3.817/0.207 ms
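The host can reach the internet, but the namespaces still can't: their packets would leave with 172.30.0.x source addresses that the outside world can't route back. A source-NAT (masquerade) rule fixes that. A minimal sketch, assuming the bridge subnet above and the default routes we just added:

sysctl -w net.ipv4.ip_forward=1       # allow the host to forward packets
iptables -t nat -A POSTROUTING -s 172.30.0.0/24 ! -o bridge1 -j MASQUERADE
ip netns exec client1 ping 8.8.8.8 -c 2   # should now succeed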

How to reach the private network from external servers?

Docker builds the same construct for its containers: a bridge named docker0 on the host, with each container's veth peer attached to it:

docker0   Link encap:Ethernet  HWaddr 02:42:e2:44:07:39
inet addr:172.18.0.1 Bcast:172.18.0.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
controlplane $ docker run -d --name web --rm nginx
efff2d2c98f94671f69cddc5cc88bb7a0a5a2ea15dc3c98d911e39bf2764a556
controlplane $ WEB_IP=`docker inspect -f "{{ .NetworkSettings.IPAddress }}" web`
controlplane $ docker inspect web --format '{{ .NetworkSettings.SandboxKey }}'
/var/run/docker/netns/c009f2a4be71
The ip netns command only lists namespaces under /var/run/netns, while Docker keeps its sandboxes under /var/run/docker/netns, so symlink the container's sandbox into place to make it visible:
controlplane $ container_id=web
controlplane $ container_netns=$(docker inspect ${container_id} --format '{{ .NetworkSettings.SandboxKey }}')
controlplane $ mkdir -p /var/run/netns
controlplane $ rm -f /var/run/netns/${container_id}
controlplane $ ln -sv ${container_netns} /var/run/netns/${container_id}
'/var/run/netns/web' -> '/var/run/docker/netns/c009f2a4be71'
controlplane $ ip netns list
web (id: 3)
server1 (id: 1)
client1 (id: 0)
controlplane $ ip netns exec web ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.3/24 brd 172.18.0.255 scope global eth0
valid_lft forever preferred_lft forever
controlplane $ WEB_IP=`docker inspect -f "{{ .NetworkSettings.IPAddress }}" web`
controlplane $ echo $WEB_IP
172.18.0.3
controlplane $ curl $WEB_IP
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
The container is now reachable from the host, but not from other machines. A PREROUTING DNAT rule that rewrites the destination of inbound port-80 traffic to the container IP fixes that:
controlplane $ iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination $WEB_IP:80
controlplane $ echo $HOST_IP
172.17.0.23
node01 $ curl 172.17.0.23
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
node01 $
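To confirm the DNAT rule is actually matching traffic, check the packet counters in the NAT table on the host (the column layout varies slightly between iptables versions):

iptables -t nat -L PREROUTING -n -v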

What is CNI?

  • The spec defines a container as a Linux network namespace. That definition is easy to get comfortable with, since container runtimes like Docker create a new network namespace for each container.
  • Network definitions for CNI are stored as JSON files.
  • The network definition is streamed to the plugin through STDIN; the plugin itself never reads a configuration file off the host.
  • Other arguments are passed to the plugin via environment variables (see the invocation sketch after this list).
  • A CNI plugin is implemented as an executable.
  • The CNI plugin is responsible for wiring up the container, i.e. it must do all the work to get the container on the network. With Docker, this includes connecting the container network namespace back to the host somehow.
  • The CNI plugin is responsible for IPAM, which includes IP address assignment and installing any required routes.
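In other words, the runtime exports the parameters as environment variables and streams the JSON definition to the plugin executable on STDIN. Sketched as a shell invocation (the angle-bracket values are illustrative placeholders; the demo below does this for real with the bridge plugin):

CNI_COMMAND=ADD \
CNI_CONTAINERID=<container-id> \
CNI_NETNS=/var/run/netns/<namespace> \
CNI_IFNAME=eth0 \
CNI_PATH=/opt/cni/bin \
/opt/cni/bin/<plugin> < /etc/cni/net.d/10-demo.conf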

Let's simulate Pod creation manually, without Kubernetes, and assign an IP via a CNI plugin rather than hand-running a pile of CLI commands. By the end of this demo, you will understand what a Pod in Kubernetes actually is.

controlplane $ mkdir cni
controlplane $ cd cni
controlplane $ curl -O -L https://github.com/containernetworking/cni/releases/download/v0.4.0/cni-amd64-v0.4.0.tgz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 644 100 644 0 0 1934 0 --:--:-- --:--:-- --:--:-- 1933
100 15.3M 100 15.3M 0 0 233k 0 0:01:07 0:01:07 --:--:-- 104k
controlplane $ tar -xvf cni-amd64-v0.4.0.tgz
./
./macvlan
./dhcp
./loopback
./ptp
./ipvlan
./bridge
./tuning
./noop
./host-local
./cnitool
./flannel
controlplane $
cat > /tmp/00-demo.conf <<"EOF"
{
"cniVersion": "0.2.0",
"name": "demo_br",
"type": "bridge",
"bridge": "cni_net0",
"isGateway": true,
"ipMasq": true,
"ipam": {
"type": "host-local",
"subnet": "10.0.10.0/24",
"routes": [
{ "dst": "0.0.0.0/0" },
{ "dst": "1.1.1.1/32", "gw":"10.0.10.1"}
]
}
}
EOF
CNI generic parameters:

  • cniVersion: the version of the CNI spec that the definition works with
  • name: the network name
  • type: the name of the plugin you wish to use; in this case, the actual name of the plugin executable
  • args: optional additional parameters
  • ipMasq: configure outbound masquerade (source NAT) for this network
  • ipam:
      type: the name of the IPAM plugin executable
      subnet: the subnet to allocate out of (this is actually handled by the IPAM plugin)
      routes:
        dst: the subnet you wish to reach
        gw: the IP address of the next hop to reach the dst; if not specified, the default gateway for the subnet is assumed
  • dns:
      nameservers: a list of nameservers to use with this network
      domain: the search domain to use for DNS requests
      search: a list of search domains
      options: a list of options to be passed to the receiver
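The 00-demo.conf above doesn't set any dns keys; if it did, the block would sit alongside ipam in the definition and look something like this (the values here are purely illustrative):

"dns": {
  "nameservers": ["10.0.10.1"],
  "domain": "example.local",
  "search": ["example.local"]
}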
First, create a pause container with no networking attached; it will own the network namespace that the rest of the "Pod" shares:
controlplane $ docker run --name pause_demo -d --rm --network none kubernetes/pause
Unable to find image 'kubernetes/pause:latest' locally
latest: Pulling from kubernetes/pause
4f4fb700ef54: Pull complete
b9c8ec465f6b: Pull complete
Digest: sha256:b31bfb4d0213f254d361e0079deaaebefa4f82ba7aa76ef82e90b4935ad5b105
Status: Downloaded newer image for kubernetes/pause:latest
763d3ef7d3e943907a1f01f01e13c7cb6c389b1a16857141e7eac0ac10a6fe82
controlplane $ container_id=pause_demo
controlplane $ container_netns=$(docker inspect ${container_id} --format '{{ .NetworkSettings.SandboxKey }}')
controlplane $ mkdir -p /var/run/netns
controlplane $ rm -f /var/run/netns/${container_id}
controlplane $ ln -sv ${container_netns} /var/run/netns/${container_id}
'/var/run/netns/pause_demo' -> '/var/run/docker/netns/0297681f79b5'
controlplane $ ip netns list
pause_demo
controlplane $ ip netns exec $container_id ifconfig
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Now invoke the bridge plugin with the ADD command, streaming the network definition on STDIN:
controlplane $ CNI_CONTAINERID=$container_id CNI_IFNAME=eth10 CNI_COMMAND=ADD CNI_NETNS=/var/run/netns/$container_id CNI_PATH=`pwd` ./bridge </tmp/00-demo.conf
2020/10/17 17:32:37 Error retriving last reserved ip: Failed to retrieve last reserved ip: open /var/lib/cni/networks/demo_br/last_reserved_ip: no such file or directory
{
"ip4": {
"ip": "10.0.10.2/24",
"gateway": "10.0.10.1",
"routes": [
{
"dst": "0.0.0.0/0"
},
{
"dst": "1.1.1.1/32",
"gw": "10.0.10.1"
}
]
},
"dns": {}
  • CNI_COMMAND=ADD — the action to perform (available values: ADD/DEL/CHECK)
  • CNI_CONTAINERID=pause_demo — tells CNI that the container (network namespace) we want to work on is called 'pause_demo'
  • CNI_NETNS=/var/run/netns/pause_demo — the path to the namespace in question
  • CNI_IFNAME=eth10 — the name of the interface to create on the container side of the connection
  • CNI_PATH=`pwd` — tells CNI where the plugin executables live; since we're already in the 'cni' directory, the backticks around pwd substitute the present working directory
The plugin has created eth10 inside the namespace and assigned it 10.0.10.2 from the host-local IPAM subnet:
controlplane $ ip netns exec pause_demo ifconfig
eth10 Link encap:Ethernet HWaddr 0a:58:0a:00:0a:02
inet addr:10.0.10.2 Bcast:0.0.0.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:18 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1476 (1.4 KB) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
The namespace's routing table mirrors the routes section of 00-demo.conf:
controlplane $ ip netns exec pause_demo ip route
default via 10.0.10.1 dev eth10
1.1.1.1 via 10.0.10.1 dev eth10
10.0.10.0/24 dev eth10 proto kernel scope link src 10.0.10.2
controlplane $
On the host side, the plugin has created the cni_net0 bridge named in the config and given it the gateway address:
controlplane $ ifconfig
cni_net0 Link encap:Ethernet HWaddr 0a:58:0a:00:0a:01
inet addr:10.0.10.1 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::c4a4:2dff:fe4b:aa1b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1174 (1.1 KB) TX bytes:1545 (1.5 KB)
Finally, start nginx in the pause container's network namespace — exactly how Kubernetes attaches containers to a Pod's sandbox:
controlplane $ docker run --name web_demo -d --rm --network container:$container_id nginx
8fadcf2925b779de6781b4215534b32231685b8515f998b2a66a3c7e38333e30
controlplane $ curl `cat /var/lib/cni/networks/demo_br/last_reserved_ip`
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
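When you are done, the same plugin contract tears the wiring down: reuse the variables from the ADD call above with CNI_COMMAND=DEL (a sketch; stop the web_demo container first for a clean teardown):

CNI_CONTAINERID=$container_id CNI_IFNAME=eth10 CNI_COMMAND=DEL CNI_NETNS=/var/run/netns/$container_id CNI_PATH=`pwd` ./bridge </tmp/00-demo.conf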

Pod Network Namespace

Putting the demo together: a Kubernetes Pod is a group of containers that share one network namespace. The pause container owns that namespace (it does nothing else but sleep), the application containers join it with the equivalent of --network container:…, thereby sharing its IP address, interfaces, and routes, and a CNI plugin is what wires the namespace to the node network and assigns the Pod IP.

References:

https://man7.org/linux/man-pages/man7/namespaces.7.html
https://github.com/containernetworking/cni/blob/master/SPEC.md
https://github.com/containernetworking/cni/tree/master/cnitool
https://github.com/containernetworking/cni
https://tldp.org/HOWTO/BRIDGE-STP-HOWTO/set-up-the-bridge.html
https://kubernetes.io/
https://www.dasblinkenlichten.com/
