linux/include/net
Eric Dumazet 5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
..
9p 9p: Reduce object size with CONFIG_NET_9P_DEBUG 2012-01-05 10:51:44 -06:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-09-15 11:43:53 -04:00
caif caif-hsi: Remove use of module parameters 2012-06-25 16:44:12 -07:00
irda
iucv af_iucv: add shutdown for HS transport 2012-03-07 22:52:24 -08:00
netfilter netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
netns ipv6: add a new namespace for nf_conntrack_reasm 2012-09-19 17:23:28 -04:00
nfc netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
sctp sctp: fix a compile error in sctp.h 2012-08-15 03:43:43 -07:00
tc_act
act_api.h net: sched: constify tcf_proto and tc_action 2011-07-06 02:52:16 -07:00
addrconf.h netfilter: ip6tables: add MASQUERADE target 2012-08-30 03:00:18 +02:00
af_ieee802154.h
af_rxrpc.h
af_unix.h af_unix: speedup /proc/net/unix 2012-06-08 14:27:23 -07:00
ah.h
arp.h net: Dont use ifindices in hash fns 2012-08-09 16:18:06 -07:00
atmclip.h atm: clip: Use device neigh support on top of "arp_tbl". 2011-11-30 18:51:03 -05:00
ax25.h userns: Convert net/ax25 to use kuid_t where appropriate 2012-08-14 21:49:42 -07:00
ax88796.h
cfg80211-wext.h cfg80211: remove unused wext handler exports 2011-08-08 14:26:29 -04:00
cfg80211.h Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-09-14 13:53:49 -04:00
checksum.h net: core: add function for incremental IPv6 pseudo header checksum updates 2012-08-30 03:00:16 +02:00
cipso_ipv4.h cipso: handle CIPSO options correctly when NetLabel is disabled 2012-06-01 14:18:29 -04:00
cls_cgroup.h
codel.h codel: refine one condition to avoid a nul rec_inv_sqrt 2012-08-10 16:52:54 -07:00
compat.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
datalink.h
dcbevent.h dcb: Add stub routines for !CONFIG_DCB 2011-10-06 15:49:51 -04:00
dcbnl.h net/dcb: Add an optional max rate attribute 2012-04-05 05:08:04 -04:00
dn.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dn_dev.h
dn_fib.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dn_neigh.h
dn_nsp.h
dn_route.h decnet: Use neighbours privately in dn_route struct. 2012-07-05 01:12:14 -07:00
dsa.h dsa: Include linux/if_ether.h to fix build error 2011-12-01 11:41:06 -05:00
dsfield.h
dst.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
dst_ops.h net: Fix warnings in dst_ops.h 2012-07-19 10:43:03 -07:00
esp.h
ethoc.h
fib_rules.h ipv4: Elide fib_validate_source() completely when possible. 2012-06-29 01:36:36 -07:00
flow.h ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. 2012-07-20 13:36:54 -07:00
flow_keys.h flow_dissector: use a 64bit load/store 2011-11-29 13:17:03 -05:00
garp.h
gen_stats.h
genetlink.h netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
gre.h
icmp.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ieee80211_radiotap.h wireless: add radiotap A-MPDU status field 2012-08-20 13:53:09 +02:00
ieee802154.h 6LoWPAN: add fragmentation support 2011-11-14 00:19:42 -05:00
ieee802154_netdev.h mac802154: declare reduced mlme operations 2012-05-16 15:16:56 -04:00
if_inet6.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
inet6_connection_sock.h ipv6: Add helper inet6_csk_update_pmtu(). 2012-07-16 03:44:56 -07:00
inet6_hashtables.h ipv6: Early TCP socket demux 2012-07-26 15:50:39 -07:00
inet_common.h net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) 2012-07-19 11:02:03 -07:00
inet_connection_sock.h net: ipv6: fix TCP early demux 2012-08-06 13:33:21 -07:00
inet_ecn.h inet: add rfc 3168 extract in front of INET_ECN_encapsulate() 2011-10-22 01:25:23 -04:00
inet_frag.h ipv6: unify fragment thresh handling code 2012-09-19 17:23:28 -04:00
inet_hashtables.h ipv4: Early TCP socket demux. 2012-06-19 21:22:05 -07:00
inet_sock.h net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
inet_timewait_sock.h inet: remove rcu protection on tw_net 2011-12-14 13:34:55 -05:00
inetpeer.h ipv4: Maintain redirect and PMTU info in struct rtable again. 2012-07-10 22:40:14 -07:00
ip.h ipv4: fix path MTU discovery with connection tracking 2012-08-26 19:13:55 +02:00
ip6_checksum.h
ip6_fib.h ipv6: fix handling of blackhole and prohibit routes 2012-09-05 17:49:28 -04:00
ip6_route.h ipv6: fix inet6_csk_xmit() 2012-07-18 08:59:58 -07:00
ip6_tunnel.h gre: Support GRE over IPv6 2012-08-14 14:28:32 -07:00
ip_fib.h ipv4: Cache routes in nexthop exception entries. 2012-07-31 15:02:02 -07:00
ip_vs.h ipvs: add pmtu_disc option to disable IP DF for TUN packets 2012-08-10 10:35:07 +09:00
ipcomp.h
ipconfig.h
ipip.h tunnel: implement 64 bits statistics 2012-04-14 14:47:05 -04:00
ipv6.h ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline 2012-09-19 17:23:28 -04:00
ipx.h
iw_handler.h
lapb.h lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
lib80211.h include: replace linux/module.h with "struct module" wherever possible 2011-10-31 19:32:32 -04:00
llc.h llc: Remove stray reference to sysctl_llc_station_ack_timeout. 2012-09-17 13:13:24 -04:00
llc_c_ac.h
llc_c_ev.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
mac80211.h mac80211: add IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF 2012-08-20 13:58:23 +02:00
mac802154.h mac802154: add wpan device-class support 2012-06-26 21:06:11 -07:00
mip6.h
mld.h
ndisc.h net: Dont use ifindices in hash fns 2012-08-09 16:18:06 -07:00
neighbour.h net: output path optimizations 2012-08-07 16:24:55 -07:00
net_namespace.h ipv6: add a new namespace for nf_conntrack_reasm 2012-09-19 17:23:28 -04:00
net_ratelimit.h net: Kill ratelimit.h dependency in linux/net.h 2011-05-27 13:41:33 -04:00
netdma.h
netevent.h net: Pass neighbours and dest address into NETEVENT_REDIRECT events. 2012-07-05 02:21:55 -07:00
netlabel.h doc: Update the email address for Paul Moore in various source files 2011-08-01 17:58:33 -07:00
netlink.h netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
netprio_cgroup.h net: netprio_cgroup: rework update socket logic 2012-07-22 12:44:01 -07:00
netrom.h
nexthop.h
nl802154.h
p8022.h
ping.h
pkt_cls.h
pkt_sched.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
protocol.h ipv6: Early TCP socket demux 2012-07-26 15:50:39 -07:00
psnap.h
raw.h
rawv6.h ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
red.h net_sched: red: Make minor corrections to comments 2012-04-16 23:53:11 -04:00
regulatory.h cfg80211: add cellular base station regulatory hint support 2012-07-17 12:16:39 +02:00
request_sock.h tcp: TCP Fast Open Server - support TFO listeners 2012-08-31 20:02:19 -04:00
rose.h
route.h ipv4/route: arg delay is useless in rt_cache_flush() 2012-09-07 14:44:08 -04:00
rtnetlink.h rtnl: allow to specify different num for rx and tx queue count 2012-07-20 11:06:59 -07:00
sch_generic.h net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
scm.h net: Remove unnecessary NULL check in scm_destroy(). 2012-09-24 15:52:33 -04:00
secure_seq.h tcp: add const qualifiers where possible 2011-10-21 05:22:42 -04:00
slhc_vj.h
snmp.h net: avoid reloads in SNMP_UPD_PO_STATS 2012-08-06 13:40:47 -07:00
sock.h net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
stp.h
tcp.h tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS 2012-09-22 15:47:10 -04:00
tcp_memcontrol.h cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg 2012-04-10 10:04:07 -07:00
tcp_states.h
timewait_sock.h [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. 2012-06-09 14:56:12 -07:00
transp_v6.h net: relax PKTINFO non local ipv6 udp xmit check 2011-08-30 17:39:01 -04:00
udp.h net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6 2012-04-28 22:21:51 -04:00
udplite.h net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
wext.h
wimax.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wpan-phy.h mac802154: monitor device support 2012-05-16 15:17:08 -04:00
x25.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
x25device.h
xfrm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-09-15 11:43:53 -04:00