High-Performance Containerized Applications in Kubernetes

The Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device with multiple pods. SR-IOV lets you segment a compliant network device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs) and make them available to pods for direct I/O.

You can use SR-IOV network devices with additional networks on your Kubernetes cluster for applications that require high bandwidth or low latency.

In this article, we will look at how to set up a pod to use an SR-IOV VF by using the device plugin and Multus. We assume that the worker nodes already have enough VFs configured and that you are familiar with the SR-IOV concept. The OpenStack documentation listed in the References will help you set up SR-IOV VFs.
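
The assumption about available VFs can be checked on a worker node before anything is deployed. The sketch below is one possible way to do that in Python and is not part of the device plugin. It reads the standard Linux sysfs attributes sriov_numvfs and sriov_totalvfs of a PF. The function name vf_counts, the sysfs_root parameter, and the interface name ens1f0 are illustrative assumptions.

```python
from pathlib import Path
from typing import Tuple


def vf_counts(pf_name: str, sysfs_root: str = "/sys") -> Tuple[int, int]:
    """Return (configured VFs, maximum VFs) for the physical function pf_name.

    Raises FileNotFoundError if the interface does not expose SR-IOV attributes.
    """
    device_dir = Path(sysfs_root) / "class" / "net" / pf_name / "device"
    # sriov_numvfs: number of VFs currently enabled on this PF.
    num_vfs = int((device_dir / "sriov_numvfs").read_text().strip())
    # sriov_totalvfs: maximum number of VFs the hardware supports.
    total_vfs = int((device_dir / "sriov_totalvfs").read_text().strip())
    return num_vfs, total_vfs
```

On a node, calling vf_counts("ens1f0") (with ens1f0 replaced by the real PF name) returns the two counts. A first value of 0 means that no VFs have been created yet. The sysfs_root parameter exists only so that the function can be exercised against a test directory.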

A disadvantage of SR-IOV is its dependency on specific vendor hardware. When it is used for cloud-native network functions (CNFs), it is difficult to make a network function fully portable from node to node unless the nodes have identical NICs.

Kubernetes natively discovers only a minimal set of resources, primarily CPU and memory, and very few devices are handled natively by the kubelet. Expecting every hardware vendor to add vendor-specific code inside Kubernetes to make its devices usable is not a sustainable solution.

Requirements

  • I want to use a particular device type in my pod.
  • I should be able to use that device without writing custom Kubernetes code.
  • I want a consistent and portable solution to consume hardware devices across k8s clusters.

This can be achieved by using a device plugin. Instead of customizing the code of Kubernetes itself, vendors can implement a device plugin that you deploy either manually or as a DaemonSet. The targeted devices include GPUs, high-performance NICs, FPGAs, InfiniBand adapters, and other similar computing resources that may require vendor-specific initialization and setup.

The SR-IOV network device plugin is a Kubernetes device plugin for discovering and advertising SR-IOV network virtual functions (VFs) on a Kubernetes host. To deploy workloads with SR-IOV VFs, this plugin needs to work together with a CNI plugin.

How does it work?

Device plugins are simple gRPC servers that may run in a container deployed through the pod mechanism or in bare metal mode.

The device plugin is structured in 3 parts:

  1. Registration: The device plugin advertises its presence to Kubelet
  2. ListAndWatch: The device plugin advertises a list of Devices to Kubelet and sends it again if the state of a Device changes
  3. Allocate: When creating containers, Kubelet calls the device plugin’s Allocate function so that it can run device-specific instructions (GPU cleanup, QRNG initialization, ...) and instruct Kubelet how to make the device available in the container.

These servers implement the device plugin gRPC interface. Once the device plugin has made itself known to the kubelet, the kubelet interacts with the device through two simple functions:

  1. A ListAndWatch function, which lets the kubelet discover the devices and their properties and be notified of any status change (for example, a device becoming unhealthy).
  2. An Allocate function, which is called before creating a user container that consumes any of the exported devices.

Picture credit: Kubernetes Device Manager Proposal
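
The three steps above can be sketched as a small in-memory model. This is a toy illustration only: a real device plugin talks to the kubelet over gRPC, and ListAndWatch is a stream rather than a single call. The class names ToyDevicePlugin and ToyKubelet, the ALLOCATED_DEVICES environment variable, and the PCI addresses in the usage note are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Device:
    id: str               # for a VF this would typically be its PCI address
    healthy: bool = True


@dataclass
class ToyDevicePlugin:
    """In-memory stand-in for a device plugin (no gRPC involved)."""
    resource_name: str
    devices: Dict[str, Device] = field(default_factory=dict)

    def list_and_watch(self) -> List[Device]:
        # The real call streams updates; here we return the current snapshot.
        return list(self.devices.values())

    def allocate(self, device_ids: List[str]) -> Dict[str, str]:
        # Device-specific setup would run here; the return value tells the
        # kubelet what to expose inside the container.
        for dev_id in device_ids:
            if not self.devices[dev_id].healthy:
                raise RuntimeError(f"device {dev_id} is unhealthy")
        return {"ALLOCATED_DEVICES": ",".join(device_ids)}  # hypothetical env var


class ToyKubelet:
    """Tracks registered plugins and which devices are already in use."""

    def __init__(self) -> None:
        self.plugins: Dict[str, ToyDevicePlugin] = {}
        self.in_use: Set[Tuple[str, str]] = set()

    def register(self, plugin: ToyDevicePlugin) -> None:
        # Step 1, Registration: the plugin announces itself.
        self.plugins[plugin.resource_name] = plugin

    def free_devices(self, resource_name: str) -> List[str]:
        # Step 2, ListAndWatch: only healthy, unused devices can be handed out.
        plugin = self.plugins[resource_name]
        return [
            d.id for d in plugin.list_and_watch()
            if d.healthy and (resource_name, d.id) not in self.in_use
        ]

    def start_container(self, resource_name: str, count: int) -> Dict[str, str]:
        # Step 3, Allocate: called before the container is created.
        free = self.free_devices(resource_name)
        if len(free) < count:
            raise RuntimeError("not enough healthy devices")
        chosen = free[:count]
        envs = self.plugins[resource_name].allocate(chosen)
        self.in_use.update((resource_name, dev_id) for dev_id in chosen)
        return envs
```

For example, after registering a plugin that advertises one healthy and one unhealthy device, free_devices returns only the healthy one, and a second start_container call for the same resource fails because that device is already in use.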

Installation and Configuration

The end result will be similar to the following picture, except for the SR-IOV CNI and the DPDK userspace.

Picture credit: redhat.com

When setting up the cluster, the admin knows what kinds of devices are present on the different machines and can therefore select which devices to enable. For example, a cluster admin who knows that the cluster has Intel NICs deploys the device plugin that supports them with:

kubectl create -f deployments/k8s-v1.16/sriovdp-daemonset.yaml

Note: Make sure that the host-device CNI plugin is installed. It is required to move an already-existing device into a container.

Supported SR-IOV NICs

The following NICs were tested with this implementation. However, other SR-IOV capable NICs should work as well.

  • Intel® Ethernet Controller X710 Series 4x10G
      • PF driver: v2.4.6
      • VF driver: v3.5.6

Please refer to the Intel download center to install the latest Intel® Ethernet Controller X710 Series drivers.

  • Intel® 82599ES 10 Gigabit Ethernet Controller
      • PF driver: v4.4.0-k
      • VF driver: v3.2.2-k

Please refer to the Intel download center to install the latest Intel® 82599ES 10 Gigabit Ethernet drivers.

  • Mellanox ConnectX®-4 Lx EN Adapter
  • Mellanox ConnectX®-5 Adapter

Network card drivers are available as part of the various Linux distributions and upstream. The latest Mellanox NIC drivers can also be downloaded from the vendor.

Device Driver Information:

https://www.kernel.org/doc/Documentation/networking/device_drivers/

The SR-IOV network device plugin needs a configuration to discover and advertise SR-IOV network virtual functions (VFs) on a Kubernetes host. The ConfigMap looks as follows.

sriovdp-config.yaml (AWS)

# Vendor:Devices
# 1d0f:0ec2 - ENA PF
# 1d0f:1ec2 - ENA PF with LLQ support
# 1d0f:ec20 - ENA VF
# 1d0f:ec21 - ENA VF with LLQ support
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
          "resourceName": "amazon_ena",
          "selectors": {
            "vendors": ["1d0f"],
            "devices": ["ec20"],
            "drivers": ["ena"]
          }
        }
      ]
    }

sriovdp-config.yaml (Intel)

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
          "resourceName": "intel_sriov_netdevice",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["154c", "10ed"],
            "drivers": ["i40evf", "ixgbevf"]
          }
        }
      ]
    }
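
To make the role of the selectors more concrete, here is a simplified Python sketch of how such a configuration could map discovered devices to resource pools. It assumes that a device must satisfy every selector key that is present (vendors, devices, drivers) and that any value within a key's list is accepted. The real plugin supports more selector types, so treat this as an approximation and not as the plugin's implementation. The function names and the device inventory format are made up for the example.

```python
import json
from typing import Dict, List

CONFIG_JSON = """
{
  "resourceList": [{
      "resourceName": "intel_sriov_netdevice",
      "selectors": {
        "vendors": ["8086"],
        "devices": ["154c", "10ed"],
        "drivers": ["i40evf", "ixgbevf"]
      }
    }
  ]
}
"""

# Maps each selector key to the device attribute it is compared against.
SELECTOR_TO_ATTR = {"vendors": "vendor", "devices": "device", "drivers": "driver"}


def matches(device: Dict[str, str], selectors: Dict[str, List[str]]) -> bool:
    """True when every non-empty selector list contains the device's value."""
    for key, attr in SELECTOR_TO_ATTR.items():
        allowed = selectors.get(key)
        if allowed and device.get(attr) not in allowed:
            return False
    return True


def group_by_resource(config: dict, devices: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Return {resourceName: [PCI addresses]} for the devices each entry selects."""
    pools: Dict[str, List[str]] = {}
    for resource in config["resourceList"]:
        selectors = resource.get("selectors", {})
        pools[resource["resourceName"]] = [
            d["pci"] for d in devices if matches(d, selectors)
        ]
    return pools


def load_config(text: str = CONFIG_JSON) -> dict:
    """Parse the config.json payload carried in the ConfigMap."""
    return json.loads(text)
```

With a hypothetical inventory of a VF bound to i40evf (device ID 154c), a PF bound to i40e (device ID 1572), and a VF bound to vfio-pci (device ID 10ed), only the first device lands in the intel_sriov_netdevice pool. The PF fails the devices selector and the third device fails the drivers selector.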

sriovdp-daemonset.yaml:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sriov-device-plugin
  namespace: kube-system

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-sriov-device-plugin-amd64
  namespace: kube-system
  labels:
    tier: node
    app: sriovdp
spec:
  selector:
    matchLabels:
      name: sriov-device-plugin
  template:
    metadata:
      labels:
        name: sriov-device-plugin
        tier: node
        app: sriovdp
    spec:
      hostNetwork: true
      hostPID: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: sriov-device-plugin
      containers:
      - name: kube-sriovdp
        image: nfvpe/sriov-device-plugin:v3.2
        imagePullPolicy: IfNotPresent
        args:
        - --log-dir=sriovdp
        - --log-level=10
        securityContext:
          privileged: true
        volumeMounts:
        - name: devicesock
          mountPath: /var/lib/kubelet/
          readOnly: false
        - name: log
          mountPath: /var/log
        - name: config-volume
          mountPath: /etc/pcidp
      volumes:
      - name: devicesock
        hostPath:
          path: /var/lib/kubelet/
      - name: log
        hostPath:
          path: /var/log
      - name: config-volume
        configMap:
          name: sriovdp-config
          items:
          - key: config.json
            path: config.json

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-sriov-device-plugin-ppc64le
  namespace: kube-system
  labels:
    tier: node
    app: sriovdp
spec:
  selector:
    matchLabels:
      name: sriov-device-plugin
  template:
    metadata:
      labels:
        name: sriov-device-plugin
        tier: node
        app: sriovdp
    spec:
      hostNetwork: true
      hostPID: true
      nodeSelector:
        beta.kubernetes.io/arch: ppc64le
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: sriov-device-plugin
      containers:
      - name: kube-sriovdp
        image: nfvpe/sriov-device-plugin:ppc64le
        imagePullPolicy: IfNotPresent
        args:
        - --log-dir=sriovdp
        - --log-level=10
        securityContext:
          privileged: true
        volumeMounts:
        - name: devicesock
          mountPath: /var/lib/kubelet/
          readOnly: false
        - name: log
          mountPath: /var/log
        - name: config-volume
          mountPath: /etc/pcidp
      volumes:
      - name: devicesock
        hostPath:
          path: /var/lib/kubelet/
      - name: log
        hostPath:
          path: /var/log
      - name: config-volume
        configMap:
          name: sriovdp-config
          items:
          - key: config.json
            path: config.json
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-sriov-device-plugin-arm64
  namespace: kube-system
  labels:
    tier: node
    app: sriovdp
spec:
  selector:
    matchLabels:
      name: sriov-device-plugin
  template:
    metadata:
      labels:
        name: sriov-device-plugin
        tier: node
        app: sriovdp
    spec:
      hostNetwork: true
      hostPID: true
      nodeSelector:
        beta.kubernetes.io/arch: arm64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: sriov-device-plugin
      containers:
      - name: kube-sriovdp
        image: alexeyperevalov/arm64-sriov-device-plugin
        imagePullPolicy: IfNotPresent
        args:
        - --log-dir=sriovdp
        - --log-level=10
        securityContext:
          privileged: true
        volumeMounts:
        - name: devicesock
          mountPath: /var/lib/kubelet/
          readOnly: false
        - name: log
          mountPath: /var/log
        - name: config-volume
          mountPath: /etc/pcidp
      volumes:
      - name: devicesock
        hostPath:
          path: /var/lib/kubelet/
      - name: log
        hostPath:
          path: /var/log
      - name: config-volume
        configMap:
          name: sriovdp-config
          items:
          - key: config.json
            path: config.json

The device plugin lands on every node of the cluster. If it detects that there are no VFs, it terminates (assuming restart: OnFailure). When VFs are present, it reports them to the kubelet and starts its gRPC server to monitor the devices and hook into the container creation process.

Devices reported by device plugins are advertised as extended resources of the form vendor-domain/vendor-device.
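
One way to confirm that the VFs were advertised is to look at the allocatable resources of a node. The following Python sketch, offered as an illustration and not as an official tool, picks the extended resources out of a Node object as returned by kubectl get node <node-name> -o json. It assumes that extended resources are the domain-qualified names outside the kubernetes.io domain. The function name extended_resources is made up for the example.

```python
from typing import Dict


def extended_resources(node: dict) -> Dict[str, str]:
    """Return the extended resources listed in a Node's status.allocatable."""
    allocatable = node.get("status", {}).get("allocatable", {})
    return {
        name: quantity
        for name, quantity in allocatable.items()
        # Extended resources are domain-qualified (they contain a "/") and
        # live outside the kubernetes.io domain.
        if "/" in name and not name.split("/", 1)[0].endswith("kubernetes.io")
    }
```

For a node whose allocatable map contains cpu, memory, pods, and intel.com/intel_sriov_netdevice, only the last entry is returned, together with the number of VFs that the scheduler can hand out.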

Install Multus CNI

Multus CNI is a container network interface (CNI) plugin for Kubernetes that enables attaching multiple network interfaces to pods. Typically, each pod in Kubernetes has only one network interface (apart from a loopback). With Multus, you can create a multi-homed pod that has multiple interfaces. Multus accomplishes this by acting as a "meta-plugin", a CNI plugin that can call multiple other CNI plugins.

Kubernetes Deployment

Deploy the application with annotations so that a discovered VF is attached to the pod.

apiVersion: v1
kind: Namespace
metadata:
  name: hpsample
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-dev1
  namespace: hpsample
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/amazon_ena
spec:
  config: |
    {
      "type": "host-device",
      "cniVersion": "0.3.0",
      "name": "sriov-network",
      "ipam": {}
    }
---
# apps/v1 replaces extensions/v1beta1, which no longer serves Deployments as of Kubernetes 1.16.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hp-deployment-v1
  namespace: hpsample
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod template labels.
  selector:
    matchLabels:
      name: hp-deployment
      version: v1
  template:
    metadata:
      labels:
        name: hp-deployment
        version: v1
      annotations:
        k8s.v1.cni.cncf.io/networks: hpsample/sriov-dev1
    spec:
      containers:
      - name: hpcontainer
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo The app is running! && sleep 3600']
        resources:
          requests:
            intel.com/amazon_ena: '1'
          limits:
            intel.com/amazon_ena: '1'
        securityContext:
          privileged: true
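
The value of the k8s.v1.cni.cncf.io/networks annotation tells Multus which additional networks to attach. As a rough illustration, the Python sketch below parses the comma-separated short form of that annotation, where each entry is a network name with an optional namespace prefix and an optional @interface suffix. It does not handle the JSON list form, and the names NetworkSelection and parse_networks are made up for this example.

```python
from typing import List, NamedTuple, Optional


class NetworkSelection(NamedTuple):
    namespace: Optional[str]   # None means "use the pod's own namespace"
    name: str                  # name of the NetworkAttachmentDefinition
    interface: Optional[str]   # None means "let Multus pick the interface name"


def parse_networks(annotation: str) -> List[NetworkSelection]:
    """Parse the short form: [namespace/]name[@interface], comma-separated."""
    selections = []
    for item in annotation.split(","):
        item = item.strip()
        if not item:
            continue
        # Split off the optional "@interface" suffix first.
        item, _, interface = item.partition("@")
        # Then split off the optional "namespace/" prefix.
        namespace, sep, name = item.rpartition("/")
        selections.append(
            NetworkSelection(namespace if sep else None, name, interface or None)
        )
    return selections
```

Applied to the annotation used in the Deployment, parse_networks("hpsample/sriov-dev1") yields a single selection with namespace hpsample, name sriov-dev1, and no explicit interface name.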

References

https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/

https://github.com/containernetworking/plugins

https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin

https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/blob/master/docs/vf-setup.md

https://github.com/kubernetes/community/tree/master/contributors/design-proposals

https://docs.openstack.org/mitaka/networking-guide/config-sriov.html

https://docs.openshift.com/container-platform

https://github.com/Mellanox/docker-sriov-plugin

https://www.openness.org/docs/doc/enhanced-platform-awareness/openness-sriov-multiple-interfaces