qemu/hw/net
Thomas Huth 831e882253 hw/net/spapr_llan: Fix receive buffer handling for better performance
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.

Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).

After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU simply tries to squeeze all receive
buffer descriptors into the single page that the guest supplies
during the H_REGISTER_LOGICAL_LAN call, without taking the different
buffer sizes into account. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.

Though it seems at first glance like PAPR says that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is what this
patch does: it introduces proper RX buffer pools, sorted by buffer
size, so that we can hand out the buffer with the best-fitting size
when a packet has been received.
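
The best-fit idea described above can be sketched as follows. This is
a minimal illustration only, not the code from spapr_llan.c: the type
names, the function names (rx_pools_add, rx_pools_get) and the limits
RX_MAX_POOLS and RX_POOL_MAX_BDS are hypothetical and chosen for
readability. Pools are kept sorted by ascending buffer size, so the
first non-empty pool whose buffers are large enough is the best fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_MAX_POOLS    5   /* hypothetical limit on distinct buffer sizes */
#define RX_POOL_MAX_BDS 8   /* hypothetical per-pool capacity, kept small */

typedef struct RxPool {
    uint32_t bufsize;               /* size of every buffer in this pool */
    int count;                      /* number of descriptors queued */
    uint64_t bds[RX_POOL_MAX_BDS];  /* opaque descriptors from the guest */
} RxPool;

typedef struct RxPools {
    RxPool pool[RX_MAX_POOLS];      /* sorted by ascending bufsize */
    int used;                       /* number of pools in use */
} RxPools;

/* Queue a guest buffer, creating a pool for its size if needed. */
static int rx_pools_add(RxPools *p, uint32_t bufsize, uint64_t bd)
{
    int i, j;

    assert(p != NULL && bufsize > 0);

    /* Find the first pool whose buffers are at least this large. */
    for (i = 0; i < p->used; i++) {
        if (p->pool[i].bufsize >= bufsize) {
            break;
        }
    }
    if (i == p->used || p->pool[i].bufsize != bufsize) {
        /* No pool for this size yet: insert one at index i. */
        if (p->used == RX_MAX_POOLS) {
            return -1;
        }
        for (j = p->used; j > i; j--) {
            p->pool[j] = p->pool[j - 1];
        }
        p->pool[i].bufsize = bufsize;
        p->pool[i].count = 0;
        p->used++;
    }
    if (p->pool[i].count == RX_POOL_MAX_BDS) {
        return -1;                  /* pool is full, reject the buffer */
    }
    p->pool[i].bds[p->pool[i].count++] = bd;
    return 0;
}

/* Take the smallest available buffer that can hold pktsize bytes. */
static int rx_pools_get(RxPools *p, uint32_t pktsize,
                        uint64_t *bd, uint32_t *bufsize)
{
    int i;

    for (i = 0; i < p->used; i++) {
        RxPool *pool = &p->pool[i];

        if (pool->bufsize >= pktsize && pool->count > 0) {
            *bd = pool->bds[--pool->count];
            *bufsize = pool->bufsize;
            return 0;
        }
    }
    return -1;                      /* no fitting buffer: drop the packet */
}
```

With pools of 512, 2048 and 65536 bytes filled, a 1500 byte packet
would be served from the 2048 byte pool in this sketch, and a 64k
buffer is only used once the smaller fitting pools are empty.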

To avoid problems with migration from/to older versions of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.

Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from a creeping 3 MB/s to 20 MB/s!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
fsl_etsec all: Clean up includes 2016-02-23 12:43:05 +00:00
rocker rocker: allow user to specify rocker world by property 2016-03-08 15:34:18 +08:00
allwinner_emac.c arm: Clean up includes 2016-01-29 15:07:23 +00:00
cadence_gem.c cadence_gem: fix buffer overflow 2016-02-04 13:22:06 +08:00
dp8393x.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
e1000.c e1000: eliminate infinite loops on out-of-bounds transfer start 2016-02-04 14:13:11 +08:00
e1000_regs.h e1000: Trivial implementation of various MAC registers 2015-11-12 15:26:53 +08:00
eepro100.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
etraxfs_eth.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
imx_fec.c i.MX: Add missing descriptions in devices. 2016-03-16 17:42:18 +00:00
lan9118.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
lance.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
Makefile.objs i.MX: Add FEC Ethernet Emulator 2015-09-07 10:39:30 +01:00
mcf_fec.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
milkymist-minimac2.c lm32: Clean up includes 2016-01-29 15:07:22 +00:00
mipsnet.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
ne2000-isa.c qom: Swap 'name' next to visitor in ObjectPropertyAccessor 2016-02-08 17:29:56 +01:00
ne2000.c net: ne2000: check ring buffer control registers 2016-03-08 15:34:09 +08:00
ne2000.h ne2000: Drop ne2000_can_receive 2015-09-02 14:51:07 +01:00
opencores_eth.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
pcnet-pci.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
pcnet.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
pcnet.h pcnet: Drop pcnet_can_receive 2015-07-27 14:12:18 +01:00
rtl8139.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
smc91c111.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
spapr_llan.c hw/net/spapr_llan: Fix receive buffer handling for better performance 2016-03-24 11:17:34 +11:00
stellaris_enet.c arm: Clean up includes 2016-01-29 15:07:23 +00:00
vhost_net.c vhost-user interrupt management fixes 2016-02-18 16:13:56 +02:00
virtio-net.c virtio-net: use the backend cross-endian capabilities 2016-02-16 12:05:17 +02:00
vmware_utils.h fpu: Replace uint8 typedef with uint8_t 2016-01-22 15:09:21 +00:00
vmxnet3.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
vmxnet3.h vmxnet3: Add support for VMXNET3_CMD_GET_ADAPTIVE_RING_INFO command 2015-10-12 13:19:29 +08:00
vmxnet_debug.h net/vmxnet3: fix debug macro pattern for vmxnet3 2016-01-11 11:01:34 +08:00
vmxnet_rx_pkt.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
vmxnet_rx_pkt.h net/vmxnet3: Refactor 'vmxnet_rx_pkt_attach_data' 2015-07-20 17:39:05 +01:00
vmxnet_tx_pkt.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
vmxnet_tx_pkt.h hw: move target-independent files to subdirectories 2013-04-08 18:13:12 +02:00
xen_nic.c xen: Clean up includes 2016-01-29 15:07:23 +00:00
xgmac.c arm: Clean up includes 2016-01-29 15:07:23 +00:00
xilinx_axienet.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00
xilinx_ethlite.c hw/net: Clean up includes 2016-01-29 15:07:23 +00:00