linux/include
Achiad Shochat 88a85f99e5 net/mlx5e: TX latency optimization to save DMA reads
A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains the whole packet, which saves the DMA reads
of the regular WQE gather entries. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory, which
saves the DMA read that fetches the WQE.
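
As a rough illustration of the accounting above, the following user-space
sketch counts the DMA reads needed per TX WQE. It is a hypothetical model,
not driver code: the function name tx_wqe_dma_reads() and its parameters
are assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model (not mlx5 driver code): number of DMA reads the
 * device performs to execute one TX WQE, per the description above.
 */
static unsigned int tx_wqe_dma_reads(unsigned int gather_entries,
                                     bool inlined, bool blue_flame)
{
    unsigned int reads = 0;

    /* Per the commit text, an inline WQE is always Blue Flamed. */
    assert(!inlined || blue_flame);

    if (!blue_flame)
        reads += 1;                 /* fetch the WQE from host memory */
    if (!inlined)
        reads += gather_entries;    /* one read per gather entry */

    return reads;
}
```

Under this model a regular WQE with one gather entry costs two reads, a
BF WQE with one gather entry costs one, and an inline (hence BF) WQE
costs none.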

The BF WQE I/O write must be in cache line granularity, so it uses
the CPU write combining mechanism.
A BF WQE I/O write also acts as a TX doorbell, notifying the
device of new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, so this address is mapped twice - once by ioremap()
and once by io_mapping_map_wc().
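
The cache line granularity requirement can be sketched in plain user-space
C. This is only an illustration under stated assumptions: bf_write() and
CACHE_LINE are hypothetical names, a 64-byte cache line is assumed, and an
ordinary buffer stands in for the write-combining mapping, so memcpy()
replaces the I/O copy the driver would actually perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed cache line size, in bytes */

/*
 * Hypothetical model: copy a WQE into the BF area in whole cache lines.
 * 'bf' stands in for the write-combining mapped device memory.
 * Returns the number of bytes written (a multiple of CACHE_LINE),
 * or 0 if the padded WQE does not fit in 'bf_cap' bytes.
 */
static size_t bf_write(uint8_t *bf, size_t bf_cap,
                       const void *wqe, size_t wqe_len)
{
    const uint8_t *src = wqe;
    size_t full = wqe_len / CACHE_LINE * CACHE_LINE;
    size_t tail = wqe_len - full;
    size_t total = full + (tail ? CACHE_LINE : 0);
    size_t off;

    if (total > bf_cap)
        return 0;

    /* Whole cache lines are copied one line at a time. */
    for (off = 0; off < full; off += CACHE_LINE)
        memcpy(bf + off, src + off, CACHE_LINE);

    /* A partial last line is zero-padded so a full line is written. */
    if (tail) {
        uint8_t line[CACHE_LINE] = { 0 };

        memcpy(line, src + full, tail);
        memcpy(bf + full, line, CACHE_LINE);
    }

    return total;
}
```

For example, an 80-byte WQE is written as two 64-byte lines, with the
last 48 bytes zeroed.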

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
  written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce here a heuristic algorithm that
strives to avoid using these two mechanisms when the TX queue is
back-pressured by the device, and to limit their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory) while a BF WQE may not be inlined (may contain
gather entries).
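
One way to picture such a heuristic is a per-queue budget, sketched below
in user-space C. This is an assumption made for illustration and is not
taken from the patch: struct txq_model, BF_BUDGET, txq_use_bf(),
txq_use_inline() and txq_refill() are all hypothetical names, and the
actual algorithm in the driver may differ.

```c
#include <stdbool.h>

#define BF_BUDGET 16    /* hypothetical per-queue allowance */

/* Hypothetical model of the TX queue state the heuristic looks at. */
struct txq_model {
    bool back_pressured;        /* device is not keeping up with the queue */
    unsigned int bf_budget;     /* BF writes still allowed before a refill */
};

/* Decide whether the next WQE may be Blue Flamed; consumes budget if so. */
static bool txq_use_bf(struct txq_model *q)
{
    if (q->back_pressured || q->bf_budget == 0)
        return false;
    q->bf_budget--;
    return true;
}

/*
 * Decide whether the next packet may be inlined. An inline WQE is always
 * Blue Flamed, so inlining is allowed only when BF is allowed as well.
 */
static bool txq_use_inline(struct txq_model *q, unsigned int pkt_len,
                           unsigned int max_inline)
{
    if (pkt_len > max_inline)
        return false;
    return txq_use_bf(q);
}

/* Restore the budget, e.g. once the device has made progress. */
static void txq_refill(struct txq_model *q)
{
    q->bf_budget = BF_BUDGET;
}
```

In this sketch a back-pressured queue never uses either mechanism, and an
idle queue uses them at most BF_BUDGET times between refills, which
bounds the extra CPU cost.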

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
..
acpi
asm-generic mm: clean up per architecture MM hook header files 2015-07-17 16:39:53 -07:00
clocksource
crypto
drm
dt-bindings
keys
kvm
linux net/mlx5e: TX latency optimization to save DMA reads 2015-07-27 00:29:17 -07:00
math-emu
media
memory
misc
net ip_tunnel: Call ip_tunnel_core_init() from inet_init() 2015-07-23 01:28:21 -07:00
pcmcia
ras
rdma IB: Add rdma_cap_ib_switch helper and use where appropriate 2015-07-14 13:20:08 -04:00
rxrpc
scsi IB/srp: Avoid using uninitialized variable 2015-07-14 13:20:09 -04:00
soc
sound
target
trace
uapi lwtunnel: export linux/lwtunnel.h to userspace 2015-07-26 21:45:54 -07:00
video
xen
Kbuild