OpenShift Networking from a container/workload point of view – Part 2: Container Networking on an OpenShift Node

July 15, 2016

In OpenShift, networking is equally simple from a container point of view. Within the container’s network namespace, an eth0 interface is configured and services such as DNS just work. You can still use dedicated NICs on the host to isolate specific types of traffic. So what’s the difference?

It turns out there is hardly any difference if you use docker commands to launch a container (yes, that still works). The most notable difference is that the bridge used for container networking in that case is called lbr0 instead of docker0.

But what about containers running within Kubernetes/OpenShift pods? A pod is a grouping of one or more containers between which the namespacing has been relaxed to a certain extent: all containers within a pod share the same view of the network interfaces, hostname, etc. This is the first difference: two containers (Linux processes) within a pod can simply use “localhost” to communicate with each other, which is a great enabler of interesting deployment patterns.
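
To make this concrete, here is a minimal sketch of a two-container pod in which one container reaches the other via localhost. The pod name, images and port are purely illustrative and are not taken from the cluster used later in this article:

[code language="bash"]
# Hypothetical two-container pod; name, images and port are illustrative only.
oc create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
  - name: sidecar
    image: centos
    command: ["sleep", "3600"]
EOF

# Both containers share the pod's network namespace, so the sidecar can
# reach the web server simply via localhost:
oc exec localhost-demo -c sidecar -- curl -s http://localhost:80/
[/code]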

OpenShift promises that all pods in a namespace can talk to each other, no matter where they are running. This is achieved via the OpenShift SDN component, which comes in two flavors: one in which all pods in all namespaces can see each other on the network (the ovs-subnet plugin), and one where each namespace acts as a private network (the ovs-multitenant plugin).

OpenShift SDN allocates a /16 subnet in the RFC1918 private IP address space (by default 10.1.0.0/16) as the cluster network. Within this cluster network, each node is allocated a /24 subnet from which IP addresses are assigned to pods. The node itself takes the subnet’s gateway address (e.g. 10.1.x.1), and the subnet’s broadcast address (10.1.x.255) can be assigned to neither pods nor the node. This means that by default a cluster is limited to 256 nodes and 254 pods per node (all of this is configurable).
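
Both values come from the master configuration, and the per-node allocations can be inspected at runtime. A small sketch, assuming the usual OpenShift 3.x config file location (path and output columns may differ between versions):

[code language="bash"]
# Cluster network CIDR and host subnet length live in the networkConfig
# section of the master configuration (assumed path, may differ):
grep -A5 '^networkConfig' /etc/origin/master/master-config.yaml

# The /24 allocated to each node is tracked in a HostSubnet resource:
oc get hostsubnets
[/code]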

After the subnet is allocated, Docker is responsible for allocating IPs to the pods. OpenShift then queries which IP was assigned, detaches the pod’s vethXXXX virtual device (the peer interface of the pod’s eth0) from the lbr0 bridge and connects it to an Open vSwitch bridge device called “br0” instead.
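
This re-plugging can be verified by hand. The sketch below (the container ID is a placeholder) resolves a pod container’s eth0 to its host-side veth peer and asks Open vSwitch which bridge the peer is attached to:

[code language="bash"]
# Placeholder container ID; substitute one from `docker ps`.
CID=abc123def456

# eth0 inside the container records the ifindex of its host-side veth peer.
PEER_IDX=$(docker exec "$CID" cat /sys/class/net/eth0/iflink)

# Find the host interface with that index (the vethXXXX device) ...
VETH=$(ip -o link | awk -F': ' -v idx="$PEER_IDX" '$1 == idx {print $2}')
VETH=${VETH%%@*}

# ... and ask OVS which bridge it belongs to; for a pod this should
# report br0 rather than lbr0.
ovs-vsctl port-to-br "$VETH"
[/code]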


This OVS bridge br0 is in turn connected to the lbr0 bridge via the virtual network device pair “vovsbr”/“vlinuxbr”, so that pods can communicate with containers started via docker commands. It is also connected to the node’s host network via the virtual device tun0, to be able to communicate with the outside world. The device tun0 acts as the default gateway for all OpenShift pods, whereas lbr0 acts as the default gateway for plain docker containers. Both therefore have the node subnet’s gateway IP address assigned. This does not cause conflicts, since containers and pods only ever see their respective lbr0 or tun0 counterpart.
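
Conceptually, the wiring between the two bridges boils down to a veth pair plus an OVS internal port. A simplified sketch of roughly equivalent commands (OpenShift SDN sets all of this up itself, including the flow rules on top) would be:

[code language="bash"]
# Simplified sketch only: roughly what OpenShift SDN wires up,
# not the actual setup code it runs.

# veth pair connecting the docker bridge (lbr0) to the OVS bridge (br0):
ip link add vovsbr type veth peer name vlinuxbr
brctl addif lbr0 vlinuxbr
ovs-vsctl add-port br0 vovsbr
ip link set vlinuxbr up
ip link set vovsbr up

# tun0 is an OVS internal port carrying the node subnet's gateway address,
# connecting the pods to the host network:
ovs-vsctl add-port br0 tun0 -- set Interface tun0 type=internal
ip addr add 10.1.0.1/24 dev tun0
ip link set tun0 up
[/code]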

OpenShift SDN configures the OVS flows so that packets destined for a pod running on the same node, for the physical network, or for containers connected to the lbr0 bridge are switched to the appropriate endpoints. If the multitenant-capable SDN is selected, all packets are tagged in the kernel’s OpenFlow data path with a namespace-specific virtual network ID (VNID), and OVS flow rules ensure that communication paths are segregated between namespaces.
Therefore OpenShift adds the following paths to the ones established for a plain Docker host:

  • From a container within a pod to another container within the same pod: Pod lo → Pod lo
  • Between pods on the same node: PodA eth0 → vethXXXX → (ovs) br0 → vethYYYY → PodB eth0
  • From a pod to a plain docker container on the same node: Pod eth0 → vethXXXX → ovs br0 → vovsbr → vlinuxbr → lbr0 → vethYYYY → Container eth0
  • From a plain docker container to a pod on the same node: Container eth0 → vethXXXX → lbr0 → vlinuxbr → vovsbr → br0 → vethYYYY → Pod eth0
  • Outbound from pod: Pod eth0 → vethXXXX → (ovs) br0 → tun0 → (IPTables NAT) → host network
  • There is still the capability to bind a container to a host port to allow inbound network traffic: host network → IPTables DNAT → tun0 → (ovs) br0 → vethXXXX → Pod eth0, but it is more common to use the OpenShift Router for inbound traffic (see the inspection sketch after this list).
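
The switching and NAT behaviour behind these paths can be inspected directly on the node; a short sketch (output omitted, and the exact flow contents differ between plugin versions):

[code language="bash"]
# Dump the OpenFlow rules OpenShift SDN programmed on br0; with the
# multitenant plugin the VNIDs appear in the flow matches and actions.
ovs-ofctl dump-flows br0
# (newer versions may require: ovs-ofctl -O OpenFlow13 dump-flows br0)

# Outbound pod traffic leaving via tun0 is masqueraded by iptables:
iptables -t nat -S POSTROUTING | grep -i masquerade
[/code]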

A practical example

The following (truncated) listing shows that an OpenShift node has a number of additional network interfaces, among them (5) the OVS bridge device br0, (7) the docker bridge lbr0, (8) and (9) the vovsbr/vlinuxbr device pair, and (10) the tun0 device. Note that (7) lbr0 and (10) tun0 share the same IP address.

[code language=”bash”][root@openshift ~]# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:12:96:98 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
valid_lft 77432sec preferred_lft 77432sec
inet6 fe80::a00:27ff:fe12:9698/64 scope link
valid_lft forever preferred_lft forever
3: ovs-system: mtu 1500 qdisc noop state DOWN
link/ether 0e:49:a7:f2:e2:7b brd ff:ff:ff:ff:ff:ff
5: br0: mtu 1450 qdisc noop state DOWN
link/ether ae:5f:43:6d:37:4d brd ff:ff:ff:ff:ff:ff
7: lbr0: mtu 1450 qdisc noqueue state UP
link/ether 8a:8a:b7:6b:5b:50 brd ff:ff:ff:ff:ff:ff
inet 10.1.0.1/24 scope global lbr0
valid_lft forever preferred_lft forever
inet6 fe80::d8a0:74ff:fe19:d4be/64 scope link
valid_lft forever preferred_lft forever
8: vovsbr@vlinuxbr: mtu 1450 qdisc pfifo_fast master ovs-system state UP
link/ether 76:49:cc:f0:08:42 brd ff:ff:ff:ff:ff:ff
inet6 fe80::7449:ccff:fef0:842/64 scope link
valid_lft forever preferred_lft forever
9: vlinuxbr@vovsbr: mtu 1450 qdisc pfifo_fast master lbr0 state UP
link/ether 8a:8a:b7:6b:5b:50 brd ff:ff:ff:ff:ff:ff
inet6 fe80::888a:b7ff:fe6b:5b50/64 scope link
valid_lft forever preferred_lft forever
10: tun0: mtu 1450 qdisc noqueue state UNKNOWN
link/ether ca:8e:2f:26:4f:bf brd ff:ff:ff:ff:ff:ff
inet 10.1.0.1/24 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::c88e:2fff:fe26:4fbf/64 scope link
valid_lft forever preferred_lft forever
12: vethd6edc06@if11: mtu 1450 qdisc noqueue master ovs-system state UP
link/ether 0e:04:d8:cd:3c:92 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::c04:d8ff:fecd:3c92/64 scope link
valid_lft forever preferred_lft forever
[…truncated…][/code]

Even though a number of pods (resulting in about 20 containers) are running on the node, no vethXXXX device is connected to the lbr0 bridge:

[code language=”bash”][root@openshift ~]# docker ps | wc -l
20
[root@openshift ~]# brctl show lbr0
bridge name bridge id STP enabled interfaces
lbr0 8000.8a8ab76b5b50 no vlinuxbr
[root@openshift ~]#[/code]

They are instead connected to the OVS bridge br0. Note that the number of veth devices is much lower than 20: a pod usually consists of at least two containers (the pod infrastructure container plus one or more workload containers), yet it only gets a single network interface.

[code language=”bash”][root@openshift ~]# ovs-vsctl show
7c0a94c8-63f5-4be9-bd31-fcd94a06cc47
Bridge "br0"
fail_mode: secure
Port vovsbr
Interface vovsbr
Port "veth1e54157"
Interface "veth1e54157"
Port "br0"
Interface "br0"
type: internal
Port "vethd43b712"
Interface "vethd43b712"
Port "veth175be20"
Interface "veth175be20"
Port "veth3516ca3"
Interface "veth3516ca3"
Port "vxlan0"
Interface "vxlan0"
type: vxlan
options: {key=flow, remote_ip=flow}
Port "veth6d98fc1"
Interface "veth6d98fc1"
Port "veth9a1ff7c"
Interface "veth9a1ff7c"
Port "vethb03e6a9"
Interface "vethb03e6a9"
Port "tun0"
Interface "tun0"
type: internal
Port "vethd6edc06"
Interface "vethd6edc06"
ovs_version: "2.4.0"
[root@openshift ~]#[/code]
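
Counting the veth ports on br0 against the pods scheduled to this node confirms that every pod contributes exactly one veth device, no matter how many containers it contains. A rough sketch (assuming cluster-admin rights and that the node name matches the hostname):

[code language="bash"]
# One veth port on br0 per pod, regardless of the number of containers:
ovs-vsctl list-ports br0 | grep -c '^veth'

# Compare with the number of pods scheduled to this node:
oc get pods --all-namespaces -o wide | grep -c "$(hostname)"
[/code]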

Querying the interfaces from within a running pod shows that it looks very much like the plain docker example. The service (the embedded docker registry) listens on a single port:

[code language=”bash”][root@openshift ~]# oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-1-3u208 1/1 Running 0 33d
image-registry-n2c5e 1/1 Running 0 33d
router-1-ms7i2 1/1 Running 0 33d
[root@openshift ~]# oc rsh image-registry-n2c5e
root@image-registry-n2c5e:/# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
11: eth0@if12: mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:01:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.1.0.2/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe01:2/64 scope link
valid_lft forever preferred_lft forever
root@image-registry-n2c5e:/# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:5000 *:* LISTEN
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
root@image-registry-n2c5e:/#[/code]
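
Since tun0 gives the node itself an address on the pod subnet, the registry pod is also reachable directly from the host. A quick check (the /v2/ path assumes a Docker registry v2 API and is meant only as an illustration):

[code language="bash"]
# Reach the registry pod directly from the node; 10.1.0.2 is the pod's IP
# on the cluster network, routed via tun0 (10.1.0.1).
curl -s http://10.1.0.2:5000/v2/
[/code]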

The OpenShift pod can easily be accessed from a plain docker container running on the same host:

[code language=”bash”]
[root@openshift ~]# docker run -i -t centos /bin/sh
[…truncated…]
sh-4.2# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
29: eth0@if30: mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:01:00:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.1.0.11/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe01:b/64 scope link
valid_lft forever preferred_lft forever
sh-4.2# ping -w 3 10.1.0.2
PING 10.1.0.2 (10.1.0.2) 56(84) bytes of data.
64 bytes from 10.1.0.2: icmp_seq=1 ttl=64 time=0.078 ms
64 bytes from 10.1.0.2: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 10.1.0.2: icmp_seq=3 ttl=64 time=0.094 ms
64 bytes from 10.1.0.2: icmp_seq=4 ttl=64 time=0.091 ms

--- 10.1.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.078/0.088/0.094/0.011 ms
sh-4.2#
[/code]

And vice versa:

[code language=”bash”]
root@image-registry-n2c5e:/# ping -w 3 10.1.0.11
PING 10.1.0.11 (10.1.0.11) 56(84) bytes of data.
64 bytes from 10.1.0.11: icmp_seq=1 ttl=64 time=0.276 ms
64 bytes from 10.1.0.11: icmp_seq=2 ttl=64 time=0.051 ms
64 bytes from 10.1.0.11: icmp_seq=3 ttl=64 time=0.073 ms

--- 10.1.0.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.051/0.133/0.276/0.101 ms
root@image-registry-n2c5e:/#
[/code]

Meanwhile, this plain docker container remains attached to the lbr0 bridge:

[code language=”bash”]
[root@openshift ~]# brctl show lbr0
bridge name bridge id STP enabled interfaces
lbr0 8000.866a139a1945 no veth3bb4c4b
vlinuxbr
[root@openshift ~]#
[/code]
