Netperf and tbench benchmarks

Introduction

The network namespace patchset has been benchmarked with tbench and netperf. The tests are simple: they measure the impact of network virtualization on TCP throughput and on CPU usage.

Two kernels were tested, in several configurations:

  • 2.6.20 for the reference values
  • 2.6.20-lxc8
    1. network namespace compiled out (CONFIG_NET_NS=n)
    2. network namespace compiled in (CONFIG_NET_NS=y)
      1. without container
      2. inside a container with a real network device
      3. inside a container with ip_forward, route and etun
      4. inside a container with a bridge and etun

Each benchmark was run on 2 machines running netperf and tbench. A
dedicated machine with a RH4 kernel ran the benchmark servers.

For each benchmark, netperf and tbench, the tests were run on:

  1. Intel Xeon EM64T, dual-processor 2.8 GHz with hyperthreading enabled, 4 GB of RAM and Gigabit NIC (tg3)
  2. AMD Athlon MP 1800+, dual-processor 1.5 GHz, 1 GB of RAM and Gigabit NIC (dl2000)

Every test is run on both machines so that the overhead can be related to the CPU power of the machine.

Vanilla 2.6.20

Netperf      CPU usage (%)   Throughput (Mbits/s)   Service Demand (us/KB)
on xeon      5.99            941.38                 2.084
on athlon    28.17           844.82                 5.462

Tbench       Throughput (Mbits/s)
on xeon      66.35
on athlon    65.31
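
The Service Demand column can be cross-checked against the other two netperf columns. The sketch below assumes that service demand is the CPU time, summed over all CPUs, spent per KB transferred, that 1 KB is 1024 bytes and 1 Mbit is 10^6 bits, and that the xeon exposes 4 logical CPUs (2 processors with hyperthreading) while the athlon exposes 2. The function name `service_demand` and the CPU counts are illustrative assumptions, not part of the original test setup.

```python
def service_demand(cpu_percent, throughput_mbits, n_cpus):
    """Return CPU microseconds spent per KB transferred (us/KB)."""
    # CPU time consumed per second of wall-clock time, over all CPUs, in us.
    cpu_us_per_s = cpu_percent / 100.0 * n_cpus * 1e6
    # Data moved per second, in KB (assuming 1 KB = 1024 bytes).
    kb_per_s = throughput_mbits * 1e6 / 8 / 1024
    return cpu_us_per_s / kb_per_s


# Vanilla 2.6.20 values from the netperf table.
xeon = service_demand(5.99, 941.38, n_cpus=4)     # assumed: 4 logical CPUs
athlon = service_demand(28.17, 844.82, n_cpus=2)  # assumed: 2 CPUs
print(f"xeon: {xeon:.3f} us/KB, athlon: {athlon:.3f} us/KB")
```

Under these assumptions the computed values agree with the reported 2.084 and 5.462 us/KB to about two decimal places, which suggests the CPU usage figures are averages over all logical CPUs.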

lxc 2.6.20-lxc8

With net_ns compiled out

Netperf      CPU usage (%) / overhead   Throughput (Mbits/s) / change   Service Demand (us/KB)
on xeon      6.04 / +0.8 %              941.33 / 0 %                    2.104
on athlon    28.45 / +1 %               840.76 / -0.5 %                 5.545

Tbench       Throughput (Mbits/s) / change
on xeon      65.69 / -1 %
on athlon    65.35 / -0.2 %

Observation: no noticeable overhead.
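
The overhead and change percentages in these tables appear to be relative differences against the vanilla 2.6.20 reference values. A minimal sketch of that calculation, assuming this is how the percentages were derived (the helper name `relative_change` is hypothetical):

```python
def relative_change(value, reference):
    """Return the change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


# xeon netperf CPU usage: 2.6.20-lxc8 with net_ns compiled out vs vanilla 2.6.20.
cpu_overhead = relative_change(6.04, 5.99)
# athlon netperf throughput for the same pair of kernels.
throughput_change = relative_change(840.76, 844.82)
print(f"CPU overhead: {cpu_overhead:+.1f} %, throughput: {throughput_change:+.1f} %")
```

Rounded to one decimal place, these two results match the +0.8 % and -0.5 % entries reported for this configuration.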

With net_ns compiled in

Without container

Netperf      CPU usage (%) / overhead   Throughput (Mbits/s) / change   Service Demand (us/KB)
on xeon      6.02 / +0.5 %              941.34 / 0 %                    2.097
on athlon    27.93 / -0.8 %             833.53 / -1.3 %                 5.490

Tbench       Throughput (Mbits/s) / change
on xeon      66.00 / -0.5 %
on athlon    64.94 / -0.9 %

Observation: no noticeable overhead.

Inside the container with real device

Netperf      CPU usage (%) / overhead   Throughput (Mbits/s) / change   Service Demand (us/KB)
on xeon      5.60 / -6.5 %              941.42 / 0 %                    1.949
on athlon    27.73 / -1.5 %             835.11 / +1.5 %                 5.440

Tbench       Throughput (Mbits/s) / change
on xeon      74.36 / +12 %
on athlon    70.87 / +8.2 %

Observation: no noticeable overhead. The network interface is used
only by the container, so I guess it does not compete with any other
network traffic, which would explain the better performance.

Inside the container with etun and routes

Netperf      CPU usage (%) / overhead   Throughput (Mbits/s) / change   Service Demand (us/KB)
on xeon      16.25 / +171 %             941.31 / 0 %                    5.657
on athlon    49.99 / +77 %              828.94 / -1.9 %                 9.880

Tbench       Throughput (Mbits/s) / change
on xeon      65.61 / -1.1 %
on athlon    62.58 / -4.5 %

Observation: the CPU overhead is very large. Throughput is slightly
reduced on the less powerful machine.

Inside the container with etun and bridge

Netperf      CPU usage (%) / overhead   Throughput (Mbits/s) / change   Service Demand (us/KB)
on xeon      18.39 / +207 %             941.30 / 0 %                    6.400
on athlon    49.94 / +77 %              823.75 / -2.5 %                 9.933

Tbench       Throughput (Mbits/s) / change
on xeon      66.52 / +0.2 %
on athlon    61.07 / -6.8 %

Observation: the CPU overhead is very large. Throughput is slightly
reduced on the less powerful machine.

General observations

The objective of having no performance degradation when the network
namespace is compiled out of the kernel is reached.

When the network is used outside a container and the network
namespace is compiled in, there is no performance degradation.

The patchset allows network devices to be moved between namespaces,
which is clearly a good feature. It also shows that the network namespace code adds no overhead when the physical network device is used directly inside the container.

The loss of performance is very noticeable inside the container and
seems to be directly related to the use of the pair device and the
specific network configuration needed for the container. When
packets are sent by the container, the MAC address is that of the pair
device but the IP address is not owned by the host. The host
therefore has to act as a router and forward the packets, which adds
a noticeable overhead.

A hack has been made in the ip_forward function to avoid a useless skb_cow when using the pair device/tunnel device, and it reduces the overhead by half. When the bridge configuration is used and CONFIG_BRIDGE_NETFILTER is off, the CPU overhead is also reduced by half.

One Comment

  1. admin says:

    The patches associated with these old tests on 2.6.20 are no longer available.