Life of a Packet in Kubernetes — Part 1

Kubernetes cluster networking can be more than a bit confusing, even for engineers with hands-on experience with virtual networks and request routing. This article aims to help you understand the fundamentals of Kubernetes networking. The initial plan was to dive deep into the complexities of Kubernetes networking by following the journey of an HTTP request to a service running on a Kubernetes cluster. However, the life of a packet cannot be told without namespaces, CNI, and Calico, so we will start with Linux networking and cover the other topics in later parts.

Covering everything at once would make this article far too long, so the topics are divided into several parts to give each one enough attention.

Topics — Part 1

  1. Linux namespaces
  2. Container networking
  3. What is CNI?
  4. Pod network namespace

Link: This article.

Topics — Part 2

  1. Calico CNI

Link: Life of a Packet in Kubernetes — Part 2

Topics — Part 3

  1. Pod-to-Pod
  2. Pod-to-External
  3. Pod-to-Service
  4. External-to-Pod
  5. External Traffic Policy
  6. Kube-Proxy
  7. iptables rules processing flow
  8. Network Policy basics

Link: Life of a Packet in Kubernetes — Part 3

Topics — Part 4

  1. Ingress Controller
  2. Ingress Resources Example
  3. Nginx
  4. Envoy+Contour
  5. Ingress with MetalLB

Link: Life of a Packet in Kubernetes — Part 4

Topics — Part 5

  1. Istio service mesh

Link: Coming soon…

Linux Namespaces

Linux namespaces comprise some of the fundamental technologies behind most modern-day container implementations. At a high level, they allow for the isolation of global system resources between independent processes. For example, the PID namespace isolates the process ID number space. This means that two processes running on the same host can have the same PID!

This level of isolation is clearly useful in the world of containers. Without namespaces, a process running in container A could, for example, unmount an important filesystem in container B, change the hostname of container C, or remove a network interface from container D. By namespacing these resources, the process in container A isn’t even aware that the processes in containers B, C, and D exist. The Linux namespaces involved are:

  1. Mount — isolate filesystem mount points
  2. UTS — isolate hostname and domain name
  3. IPC — isolate interprocess communication (IPC) resources
  4. PID — isolate the PID number space
  5. Network — isolate network interfaces
  6. User — isolate UID/GID number spaces
  7. Cgroup — isolate cgroup root directory

Most container implementations make use of the above namespaces in order to provide the highest level of isolation between separate container processes. Note that the cgroup namespace is more recent than the others and isn’t yet as widely used.

Container Networking (Network Namespace)

Before we jump into understanding the various options provided by CNI and Docker, let’s explore the core technology that powers container networking. The Linux kernel has various features that have been developed to provide multi-tenancy on hosts. Namespaces provide functionality that offers different kinds of isolation, with network namespace being the one that provides network isolation.

It’s very easy to create network namespaces using the ip command on any Linux operating system. Let’s create two network namespaces and name them client and server.
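A minimal sketch of the commands (the names client and server are arbitrary):

```bash
# Create two network namespaces
sudo ip netns add client
sudo ip netns add server

# List the namespaces we just created
ip netns list
```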

Create a veth pair to connect these network namespaces. Think of a veth pair as a network cable with connectors at both ends.
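For example:

```bash
# Create a veth pair; think of the two ends as the connectors of a cable
sudo ip link add veth-client type veth peer name veth-server

# Both ends currently live in the host (root) network namespace
ip link | grep veth
```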

The veth pair (our cable) currently lives in the host network namespace. Now let’s move the two ends of the veth pair into the respective namespaces that we created earlier.
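Continuing the sketch:

```bash
# Move each end of the veth pair into its namespace
sudo ip link set veth-client netns client
sudo ip link set veth-server netns server

# The veth interfaces no longer show up in the host namespace
ip link | grep veth
```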

Let’s verify the veth ends actually exist in the namespaces. We’ll start with the client namespace,
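Using the example names from above:

```bash
sudo ip netns exec client ip link
# 'lo' and 'veth-client' should be listed (both still DOWN)
```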

Now let’s check the server namespace,
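```bash
sudo ip netns exec server ip link
# 'lo' and 'veth-server' should be listed
```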

Now let’s assign IP addresses to these interfaces and bring them up
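In this sketch the addresses come from 172.16.0.0/24, which is just an example range:

```bash
# Assign an address to each end of the veth pair
sudo ip netns exec client ip address add 172.16.0.11/24 dev veth-client
sudo ip netns exec server ip address add 172.16.0.12/24 dev veth-server

# Bring the interfaces up
sudo ip netns exec client ip link set veth-client up
sudo ip netns exec server ip link set veth-server up
```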

Using the ping command, we can verify the two network namespaces have been connected and are reachable,
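For example:

```bash
# Ping the server end from the client namespace
sudo ip netns exec client ping -c 3 172.16.0.12
```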

If we would like to create more network namespaces and connect them together, it might not be a scalable solution to create a veth pair for every combination of namespaces. Instead, one can create a Linux bridge and hook up these network namespaces to the bridge to get connectivity. And that’s exactly how Docker sets up networking between containers running on the same host!

Let’s create the namespaces and attach them to a bridge.
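A sketch of the setup, starting from scratch; the bridge name v-net-0 and the 172.16.0.0/24 addressing are arbitrary choices for this demo:

```bash
# Start clean: remove the namespaces from the previous example
sudo ip netns del client
sudo ip netns del server

# Create a Linux bridge on the host and bring it up
sudo ip link add v-net-0 type bridge
sudo ip link set v-net-0 up

# Re-create the namespaces
sudo ip netns add client
sudo ip netns add server

# One veth pair per namespace: one end in the namespace, the other end on the bridge
sudo ip link add veth-client type veth peer name veth-client-br
sudo ip link add veth-server type veth peer name veth-server-br

sudo ip link set veth-client netns client
sudo ip link set veth-server netns server
sudo ip link set veth-client-br master v-net-0
sudo ip link set veth-server-br master v-net-0

# Addresses and link state
sudo ip netns exec client ip address add 172.16.0.11/24 dev veth-client
sudo ip netns exec server ip address add 172.16.0.12/24 dev veth-server
sudo ip netns exec client ip link set veth-client up
sudo ip netns exec server ip link set veth-server up
sudo ip link set veth-client-br up
sudo ip link set veth-server-br up

# Give the bridge an address on the same subnet so the host can reach the namespaces
sudo ip addr add 172.16.0.1/24 dev v-net-0
```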

Again using the ping command, we can verify that the two network namespaces are now connected through the bridge and reachable,
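```bash
sudo ip netns exec client ping -c 3 172.16.0.12
```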

Let’s ping the HOST_IP from the namespace,
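Assuming the host’s primary address is 10.0.0.4 (substitute your own HOST_IP):

```bash
sudo ip netns exec client ping 10.0.0.4
# connect: Network is unreachable
```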

Yes, we get ‘Network is unreachable’ because no route is configured in the newly created namespaces. Let’s add a default route,
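The bridge address (172.16.0.1 in this sketch) becomes the gateway for the namespaces:

```bash
sudo ip netns exec client ip route add default via 172.16.0.1
sudo ip netns exec server ip route add default via 172.16.0.1

# The host is now reachable from the namespace
sudo ip netns exec client ping -c 3 10.0.0.4
```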

Now the default route to external networks goes via the bridge, so the namespaces can consume external network services, provided the host forwards and NATs their traffic,
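A sketch of the host-side plumbing, assuming the 172.16.0.0/24 subnet and bridge name from above:

```bash
# Allow the host to forward packets, and masquerade traffic leaving the bridge subnet
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -s 172.16.0.0/24 ! -o v-net-0 -j MASQUERADE

# An external address is now reachable from the namespace
sudo ip netns exec client ping -c 3 8.8.8.8
```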

How to reach the private network from external servers?

As you can see, the machine we’re demoing on already has Docker installed, which has led to the creation of the docker0 bridge. Running a webserver from a network namespace alone isn’t easy, since all of the other Linux namespaces would have to interwork to simulate that scenario, so let’s use Docker to simulate it instead.

Let’s spin up an Nginx container now and inspect it.
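For example (the container name web is arbitrary):

```bash
# Run an Nginx container on the default docker0 bridge network
docker run -d --name web nginx

# Inspect its network-namespace details: the process PID and the netns path (SandboxKey)
docker inspect --format '{{.State.Pid}} {{.NetworkSettings.SandboxKey}}' web
```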

Since Docker doesn’t create the netns in the default location, ip netns list doesn’t show this network namespace. We can create a symlink to the expected location to overcome that limitation.
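A sketch, using the PID captured from docker inspect above:

```bash
# Expose the container's network namespace to 'ip netns' under the name 'web'
WEB_PID=$(docker inspect --format '{{.State.Pid}}' web)
sudo mkdir -p /var/run/netns
sudo ln -sf /proc/$WEB_PID/ns/net /var/run/netns/web

ip netns list
```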

Let’s check the IP address in the web namespace,
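```bash
sudo ip netns exec web ip address show eth0
# expect an address from the docker0 subnet, e.g. 172.17.0.2/16
```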

Let’s check the IP address in the docker container,
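The official nginx image doesn’t ship the ip tool, so we can ask Docker directly; the address should match what we saw inside the web namespace:

```bash
docker inspect --format '{{.NetworkSettings.IPAddress}}' web
```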

It is clear that Docker uses all of the Linux namespaces to isolate the container from the host. Let’s try to reach the WebApp running in the web network namespace from the HOST server.
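Assuming the container received 172.17.0.2, as in the step above:

```bash
curl -s http://172.17.0.2 | head -n 4
```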

Is it possible to reach this webserver from an external network? Yes, by adding port forwarding.
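A sketch of the idea using a DNAT rule; in practice `docker run -p 8080:80 …` programs equivalent NAT and filter rules for you. The container address 172.17.0.2 and host port 8080 are assumptions from this demo:

```bash
# Forward TCP port 8080 on the host to port 80 of the container
sudo iptables -t nat -A PREROUTING -p tcp --dport 8080 \
  -j DNAT --to-destination 172.17.0.2:80

# Make sure forwarded traffic to the container is allowed
sudo iptables -A FORWARD -p tcp -d 172.17.0.2 --dport 80 -j ACCEPT
```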

Let’s try to reach the webserver using the HOST IP address.
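From another machine on the network, using the host’s address (10.0.0.4 in this example):

```bash
curl http://10.0.0.4:8080
```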

The CNI plugin executes commands like the ones above (not exactly the same, but similar) to set up the loopback interface and eth0, and to assign the IP address to the container. The container runtime or orchestrator (e.g. Kubernetes, Podman, etc.) makes use of CNI plugins to set up the Pod network. Let’s discuss CNI in the next section.

What is CNI?

A CNI plugin is responsible for inserting a network interface into the container network namespace (e.g. one end of a veth pair) and making any necessary changes on the host (e.g. attaching the other end of the veth to a bridge). It should then assign an IP to the interface and set up the routes consistent with the IP Address Management section by invoking the appropriate IPAM plugin.

Wait, does this look familiar? Yes, we did exactly this in the “Container Networking” section.

CNI (Container Network Interface), a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. CNI concerns itself only with network connectivity of containers and removing allocated resources when the container is deleted. Because of this focus, CNI has a wide range of support and the specification is simple to implement.

Note: The runtime can be anything, e.g. Kubernetes, Podman, Cloud Foundry, etc.

Detailed CNI Specification

https://github.com/containernetworking/cni/blob/master/SPEC.md

Here are a couple of points I found interesting during my first read-through…

  • The spec defines a container as a Linux network namespace. We should be comfortable with that definition, as container runtimes like Docker create a new network namespace for each container.
  • Network definitions for CNI are stored as JSON files.
  • The network definitions are streamed to the plugin through STDIN; i.e. there are no configuration files sitting on the host for the network configuration.
  • Other arguments are passed to the plugin via environment variables.
  • A CNI plugin is implemented as an executable.
  • The CNI plugin is responsible for wiring up the container, i.e. it needs to do all the work to get the container onto the network. In Docker, this includes connecting the container network namespace back to the host somehow.
  • The CNI plugin is responsible for IPAM, which includes IP address assignment and installing any required routes.

Let’s simulate Pod creation manually, without Kubernetes, and assign an IP via a CNI plugin rather than typing out lots of CLI commands ourselves. By the end of this demo, you will understand what a Pod in Kubernetes is.

Step 1: Download the CNI plugins.
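For example (the v0.8.7 release below is just the version used while writing this; pick a current release from the containernetworking/plugins repository):

```bash
mkdir cni && cd cni
curl -sSLO https://github.com/containernetworking/plugins/releases/download/v0.8.7/cni-plugins-linux-amd64-v0.8.7.tgz
tar -xzvf cni-plugins-linux-amd64-v0.8.7.tgz
ls   # bridge, host-local, loopback, macvlan, portmap, ...
```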

Step 2: Create a CNI configuration in a JSON format.
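A minimal bridge-plugin configuration written to mybridge.conf. The network name, the bridge name cni_bridge0, and the 10.0.10.0/24 subnet are choices made for this demo; note the extra 1.1.1.1/32 route, which we will meet again in Step 5:

```bash
cat > mybridge.conf <<"EOF"
{
  "cniVersion": "0.3.1",
  "name": "demo_net",
  "type": "bridge",
  "bridge": "cni_bridge0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.0.10.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" },
      { "dst": "1.1.1.1/32", "gw": "10.0.10.1" }
    ]
  }
}
EOF
```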

CNI Configuration parameters,
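For the sketch above:

  • cniVersion: the version of the CNI specification the configuration conforms to
  • name / type: the network name and the plugin binary to invoke (the bridge plugin here)
  • bridge: the name of the Linux bridge the plugin creates or reuses on the host (cni_bridge0)
  • isGateway: give the bridge an IP address so it can act as the gateway for the attached namespaces
  • ipMasq: configure source NAT (masquerading) for traffic leaving this network
  • ipam: the IPAM plugin to use (host-local), plus the subnet and routes it should hand out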

Step 3: Create a container with the ‘none’ network so that the container won’t have any IP address to use. You can use any image for this container; however, I’m using the ‘pause’ container to mimic Kubernetes.
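A sketch; any pause image will do (the registry.k8s.io/pause:3.9 tag below is just one example), and the symlink makes the container’s network namespace visible to ip netns:

```bash
# Run the pause container with no networking at all
docker run --name pause_demo -d --network none registry.k8s.io/pause:3.9

# Expose its network namespace to 'ip netns' as 'pause_demo'
PAUSE_PID=$(docker inspect --format '{{.State.Pid}}' pause_demo)
sudo mkdir -p /var/run/netns
sudo ln -sf /proc/$PAUSE_PID/ns/net /var/run/netns/pause_demo

# Only the loopback interface exists so far
sudo ip netns exec pause_demo ip address
```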

Step 4: Invoke the CNI plugin with the CNI configuration file.
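Run from the ‘cni’ directory where the plugin binaries were extracted; the configuration goes in on STDIN and the parameters below are passed as environment variables:

```bash
sudo CNI_COMMAND=ADD \
     CNI_CONTAINERID=pause_demo \
     CNI_NETNS=/var/run/netns/pause_demo \
     CNI_IFNAME=eth10 \
     CNI_PATH=`pwd` \
     ./bridge < mybridge.conf
```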

  • CNI_COMMAND=ADD — the action to perform (available values: ADD/DEL/CHECK)
  • CNI_CONTAINERID=pause_demo — we’re telling CNI that the container (network namespace) we want to work with is called ‘pause_demo’
  • CNI_NETNS=/var/run/netns/pause_demo — the path to the namespace in question
  • CNI_IFNAME=eth10 — the name of the interface we wish to use on the container side of the connection
  • CNI_PATH=`pwd` — where the plugin executables live; since we’re already in the ‘cni’ directory, we simply reference the present working directory (the backticks make the shell substitute the output of pwd)

I strongly recommend reading the CNI specification to get more information about the plugins and their functions. You can use more than one plugin in the same JSON file to chain actions, e.g. adding a firewall rule.

Step 5: Running the above command returns a couple of things. First, it returns an error, because the IPAM driver can’t find the file it uses to store IP information locally. If we ran this again for a different namespace, we wouldn’t get this error, since the file is created the first time the plugin runs. The second thing we get is a JSON response indicating the relevant IP configuration applied by the plugin. In this case, the bridge itself should have received the IP address 10.0.10.1/24 and the namespace interface should have received 10.0.10.2/24. The plugin also added the default route and the 1.1.1.1/32 route that we defined in the network configuration JSON. Let’s check what it did,
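Checking from inside the pause_demo namespace (the names come from the sketch above):

```bash
# eth10 carries the address handed out by the host-local IPAM plugin
sudo ip netns exec pause_demo ip address show eth10
sudo ip netns exec pause_demo ip route
```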

CNI also created the bridge on the host and configured it as per the network configuration,
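For example:

```bash
# On the host: the bridge from the configuration, holding the gateway address
ip address show cni_bridge0

# And the host-side veth end attached to it
ip link show master cni_bridge0
```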

Step 6: Start a web server that shares the ‘pause’ container’s namespace.
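For example:

```bash
# Start Nginx attached to the pause container's network namespace
docker run -d --name web_demo --network container:pause_demo nginx
```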

Step 7: Browse the web page using the pause container’s IP address.
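Assuming the 10.0.10.2 address assigned in Step 5:

```bash
curl -s http://10.0.10.2 | head -n 4
```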

Now let’s see the definition of a Pod in the following section.

Pod Network Namespace

The first thing to understand in Kubernetes is that a Pod is not actually the equivalent of a container, but a collection of containers. All the containers of that collection share a network stack. Kubernetes manages this by setting up the network on the pause container, which you will find for every Pod you create. All other containers attach to the network of the pause container, which itself does nothing but provide the network. Therefore, one container can talk to a service in a different container defined in the same Pod via localhost.
