Commit graph

548224 commits

Author SHA1 Message Date
David S. Miller
21c4c073f1 Revert "regmap: Allow installing custom reg_update_bits function"
This reverts commit 7741c373cf.
2015-10-06 06:25:43 -07:00
David S. Miller
6a27a6c3be Revert "net: Microchip encx24j600 driver"
This reverts commit 04fbfce7a2.
2015-10-06 06:25:36 -07:00
David S. Miller
c664bc6d94 Revert "net: encx24j600_exit() can be static"
This reverts commit 9886ce2b9d.
2015-10-06 06:25:29 -07:00
Peter Nørlund
0a837fe472 ipv4: Fix compilation errors in fib_rebalance
This fixes:

net/built-in.o: In function `fib_rebalance':
fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3'

and

net/built-in.o: In function `fib_rebalance':
net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod'
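
These link errors appear when a plain 64-bit '/' is used on 32-bit
architectures: gcc emits libgcc helpers (__divdi3, __aeabi_ldivmod)
that the kernel does not provide. A minimal C sketch of the general
pattern, not necessarily the exact fix applied here (the function and
parameter names are illustrative):

  #include <linux/math64.h>

  /* div64_u64() works on both 32-bit and 64-bit architectures,
   * unlike a plain u64 division with the '/' operator. */
  static u64 scale_upper_bound(u64 weight_scaled, u64 total_weight)
  {
          return div64_u64(weight_scaled, total_weight);
  }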

Fixes: 0e884c78ee ("ipv4: L3 hash-based multipath")

Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 23:48:09 -07:00
Santosh Shilimkar
0676651323 RDS: IB: split mr pool to improve 8K messages performance
8K message sizes are an important use case for current RDS workloads,
so we make provision to have 8K MRs available in the pool. Based on
the number of SGs in the RDS message, we pick a pool to use.

Also, to make sure that we don't underutilise MRs when, say, 8K
messages dominate, which could lead to the 8K pool being exhausted, we
fall back to the 1M pool until the 8K pool recovers.
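
A hedged sketch of the pool selection (the identifier names here are
assumed from the commit message, not verified against the source):

  /* Small messages draw from the 8K pool, larger ones from 1M;
   * if the 8K pool is exhausted, fall back to the 1M pool. */
  if (npages <= RDS_MR_8K_MSG_SIZE + 1)
          pool = rds_ibdev->mr_8k_pool;
  else
          pool = rds_ibdev->mr_1m_pool;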

This helps to push at least ~55 kB/s of bidirectional data, which is
a nice improvement.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
41a4e96462 RDS: IB: use max_mr from HCA caps than max_fmr
All HCA drivers seem to populate the max_mr capability, and a few of
them populate both max_mr and max_fmr.

Hence, update the RDS code to make use of max_mr.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
67161e250a RDS: IB: mark rds_ib_fmr_wq static
Fix the warning below by marking rds_ib_fmr_wq static:

net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static?
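
The fix is a one-line change to the file-scope definition:

  /* net/rds/ib_rdma.c: the workqueue is only used in this file */
  static struct workqueue_struct *rds_ib_fmr_wq;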

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
26139dc1db RDS: IB: use already available pool handle from ibmr
rds_ib_mr already keeps the handle of the pool it is associated with.
Let's use that instead of the roundabout way of fetching it from
rds_ib_device.

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
2e1d6b813a RDS: IB: fix the rds_ib_fmr_wq kick call
The RDS IB MR pool has its own workqueue, 'rds_ib_fmr_wq', so we need
to use queue_delayed_work() to kick the work onto it. This was hurting
performance, since pool maintenance was triggered less often from
other paths.
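
A minimal sketch of the change (the flush_worker field name is taken
from context and may differ in the source):

  /* Queue onto the pool's dedicated workqueue; the previous
   * schedule_delayed_work() call targeted the system workqueue. */
  queue_delayed_work(rds_ib_fmr_wq, &pool->flush_worker, 10);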

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
9441c973e1 RDS: IB: handle rds_ibdev release case instead of crashing the kernel
Just in case we are still handling the QP receive completion while the
rds_ibdev is released, drop the connection instead of crashing the kernel.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
0c28c04500 RDS: IB: split send completion handling and do batch ack
Similar to what we did with receive CQ completion handling, we split
the transmit completion handler, which lets us implement batched work
completion handling.

We re-use the cq_poll routine and make use of RDS_IB_SEND_OP to
distinguish send from receive completion event handler invocations.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
f4f943c958 RDS: IB: ack more receive completions to improve performance
For better performance, we split the receive completion IRQ handler.
That lets us acknowledge several WC events in one call. We also cap
the number of WCs polled per call at 32 to bound latency. Acknowledging
several completions in one call instead of one call each provides
better performance, since fewer mutual-exclusion locks are taken.
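
A hedged sketch of the batched polling loop (RDS_IB_WC_MAX and
handle_wc() are assumed names for illustration; types come from
<rdma/ib_verbs.h>):

  #define RDS_IB_WC_MAX 32

  static void poll_cq(struct ib_cq *cq, struct ib_wc *wcs)
  {
          int nr, i;

          /* Pull up to 32 work completions per ib_poll_cq() call
           * instead of acknowledging them one at a time. */
          while ((nr = ib_poll_cq(cq, RDS_IB_WC_MAX, wcs)) > 0)
                  for (i = 0; i < nr; i++)
                          handle_wc(&wcs[i]);
  }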

In the next patch, send completion handling is also split; it re-uses
poll_cq(), hence the code is moved to ib_cm.c.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
db6526dcb5 RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULL
In the transport-independent rds_sendmsg(), we shouldn't make
decisions based on RDS_LL_SEND_FULL, which is used to manage the ring
for RDMA-based transports. We can safely call rds_send_xmit() and use
its return value to decide on deferred work. This also fixes the
scenario where, at times, we see connections stuck with the
LL_SEND_FULL bit set and never cleared.

We now kick krdsd any time we see -ENOMEM or -EAGAIN from the ring
allocation code.
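
A sketch of the resulting pattern (rds_wq and c_send_w are RDS's work
queue and per-connection send work; the exact call site is assumed):

  ret = rds_send_xmit(conn);
  if (ret == -ENOMEM || ret == -EAGAIN)
          /* ring is full or out of memory: let krdsd retry */
          queue_delayed_work(rds_wq, &conn->c_send_w, 1);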

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
4bebdd7a4d RDS: defer the over_batch work to send worker
Currently the process gives up if its send work is over the batch
limit. The work queue will get kicked to finish off any other
requests. This fixes the remainder condition from commit 443be0e5af
("RDS: make sure not to loop forever inside rds_send_xmit").

The restart condition is only for the case where we reached the
over_batch code for some other reason, so we just retry once more
before giving up.

While at it, make sure we use the already available
'send_batch_count' parameter instead of a magic value. The batch count
threshold of 1024 came via commit 443be0e5af ("RDS: make sure not to
loop forever inside rds_send_xmit"). The idea is to process as big a
batch as we can, but at the same time not hold up other processes
waiting to send. Hence, back off after the send_batch_count limit
(1024) to avoid soft lockups.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:18:45 -07:00
Andrzej Hajda
5edfcee5ed mac80211: make ieee80211_new_mesh_header return unsigned
The function always returns non-negative values.

The problem has been detected using a proposed semantic patch,
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-05 17:54:16 +02:00
Daniel Borkmann
0cdf5640e4 ebpf: include perf_event only where really needed
Commit ea317b267e ("bpf: Add new bpf map type to store the pointer
to struct perf_event") added perf_event.h to the main eBPF header, so
it gets included for all users. perf_event.h is actually only needed
on the array map side, so let's sanitize this a bit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 07:04:08 -07:00
Nicolas Schichan
4560cdff03 ARM: net: support BPF_ALU | BPF_MOD instructions in the BPF JIT.
For ARMv7 with UDIV instruction support, generate a UDIV instruction
followed by an MLS instruction.

For other ARM variants, generate code calling a C wrapper similar to
the jit_udiv() function used for BPF_ALU | BPF_DIV instructions.
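
On ARMv7 the pair computes the remainder inline: UDIV yields
q = n / d, then MLS (multiply-and-subtract, Rd = Ra - Rn * Rm)
yields n - q * d. A hedged sketch of the C wrapper for older cores
(the wrapper name is assumed, mirroring jit_udiv):

  /* Called from JITed code on ARM cores without UDIV support. */
  static u32 jit_mod(u32 dividend, u32 divisor)
  {
          return dividend % divisor;
  }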

Some performance numbers reported by the test_bpf module (the
duration per filter run is reported in nanoseconds, between
"jitted:<x>" and "PASS"):

ARMv7 QEMU nojit:	test_bpf: #3 DIV_MOD_KX jited:0 2196 PASS
ARMv7 QEMU jit:		test_bpf: #3 DIV_MOD_KX jited:1 104 PASS
ARMv5 QEMU nojit:	test_bpf: #3 DIV_MOD_KX jited:0 2176 PASS
ARMv5 QEMU jit:		test_bpf: #3 DIV_MOD_KX jited:1 1104 PASS
ARMv5 kirkwood nojit:	test_bpf: #3 DIV_MOD_KX jited:0 1103 PASS
ARMv5 kirkwood jit:	test_bpf: #3 DIV_MOD_KX jited:1 311 PASS

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 07:02:42 -07:00
David S. Miller
df7b601542 Merge branch 'asix-rx-mem-handling'
Mark Craske says:

====================
Improve ASIX RX memory allocation error handling

The ASIX RX handler algorithm is weak on error handling. There is a
design flaw in it, because the implementation for handling RX Ethernet
frames for the DUB-E100 C1 can have Ethernet frames spanning multiple
URBs. This means that payload data from more than one URB is sometimes
needed to fill the socket buffer with a complete Ethernet frame. When
the URB with the start of an Ethernet frame is received, an attempt is
made to allocate a socket buffer. If the memory allocation fails, the
algorithm sets the buffer pointer member to NULL and the function
exits (no crash yet). Subsequently, the RX handler is called again to
process the next URB; it assumes a socket buffer is available, and the
kernel crashes when there is none.

This patchset implements an improvement to the RX handling algorithm to
avoid a crash when no memory is available for the socket buffer.

The patchset will apply cleanly to the net-next master branch but the
created kernel has not been tested. The driver was tested on ARM kernels
v3.8 and v3.14 for a commercial product.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:58:51 -07:00
Dean Jenkins
6a570814cd asix: Continue processing URB if no RX netdev buffer
Avoid a loss of synchronisation of the Ethernet Data header 32-bit
word due to a failure to get a netdev socket buffer.

The ASIX RX handling algorithm returned 0 upon failure to allocate a
netdev socket buffer. This causes URB processing to stop, which
potentially causes a loss of synchronisation with the Ethernet Data
header 32-bit word. Therefore, subsequent URBs may be rejected due to
the loss of synchronisation, causing additional good Ethernet frames
to be discarded along with the output of synchronisation error
messages.

Implement a solution which checks whether a netdev socket buffer has
been allocated before trying to copy the Ethernet frame into it, but
which continues to process the URB so that synchronisation is
maintained. This way, only a single Ethernet frame is discarded when
no netdev socket buffer is available.
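
A hedged sketch of the copy path (ax_skb is the netdev buffer field
in the driver's rx fixup state; surrounding loop code is elided):

  /* Only copy when the earlier allocation succeeded, but keep
   * parsing the URB either way so header sync is preserved. */
  if (rx->ax_skb)
          memcpy(skb_put(rx->ax_skb, copy_length),
                 skb->data + offset, copy_length);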

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:58:43 -07:00
Dean Jenkins
3f30b158eb asix: On RX avoid creating bad Ethernet frames
When RX Ethernet frames span multiple URB socket buffers,
the data stream may suffer a discontinuity which will cause
the current Ethernet frame in the netdev socket buffer
to be incomplete. This frame needs to be discarded instead
of appending unrelated data from the current URB socket buffer
to the Ethernet frame in the netdev socket buffer. This avoids
creating a corrupted Ethernet frame in the netdev socket buffer.

A discontinuity can occur when the previous URB socket buffer held an
incomplete Ethernet frame due to truncation, or when a URB socket
buffer containing the end of the Ethernet frame went missing.

Therefore, add a sanity test for when an Ethernet frame
spans multiple URB socket buffers to check that the remaining
bytes of the currently received Ethernet frame point to
a good Data header 32-bit word of the next Ethernet
frame. Upon error, reset the remaining bytes variable to
zero and discard the current netdev socket buffer.
Assume that the Data header is located at the start of
the current socket buffer and attempt to process the next
Ethernet frame from there. This avoids unnecessarily
discarding a good URB socket buffer that contains a new
Ethernet frame.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:58:43 -07:00
Dean Jenkins
9a5ccd8e03 asix: Simplify asix_rx_fixup_internal() netdev alloc
The code is checking that the Ethernet frame will fit into a
netdev-allocated socket buffer within the constraints of the MTU size
plus the Ethernet header length plus the VLAN header length.

The original code checked rx->remaining on each iteration of the
while loop that processes multiple Ethernet frames per URB and/or
Ethernet frames that span URBs. rx->remaining decreases on each
iteration, so there is no point in repeatedly checking that the
(remaining part of the) Ethernet frame will fit into the netdev socket
buffer.

The modification checks that the size of the Ethernet frame will fit
the netdev socket buffer before allocating it. This avoids grabbing
memory, deciding that the Ethernet frame is too big, and then freeing
the memory.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:58:41 -07:00
Dean Jenkins
3bfc69abf8 asix: Tidy-up 32-bit header word synchronisation
Tidy-up the Data header 32-bit word synchronisation logic in
asix_rx_fixup_internal() by removing redundant logic tests.

The code is looking at the following cases of the Data header
32-bit word that is present before each Ethernet frame:

a) all 32 bits of the Data header word are in the URB socket buffer
b) first 16 bits of the Data header word are at the end of the URB
   socket buffer
c) last 16 bits of the Data header word are at the start of the URB
   socket buffer, e.g. split_head = true

Note that rx->split_head outlives the function call and is accessed
while processing each URB. Therefore, split_head being true acts on
the next URB to be processed.

To check for b), the offset will be 16 bits (2 bytes) from the end of
the buffer; in that case indicate split_head is true.
To check for c), split_head must be true, because the first 16 bits
have already been found.
Otherwise, all 32 bits are present, which is case a).

Note that the || logic of the old code included the state
(skb->len - offset == sizeof(u16) && rx->split_head), which is not
possible: split_head cannot be true whilst checking for b), because
split_head indicates that the first 16 bits have already been found,
and that cannot be the case whilst we are still looking for those
first 16 bits. Therefore, simplify the logic.
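
A hedged sketch of the simplified header synchronisation (field names
follow the driver's rx fixup state; the exact code may differ):

  if (skb->len - offset == sizeof(u16)) {
          /* b) first 16 bits sit at the end of this URB */
          rx->header = get_unaligned_le16(skb->data + offset);
          rx->split_head = true;
          offset += sizeof(u16);
  } else if (rx->split_head) {
          /* c) last 16 bits sit at the start of this URB */
          rx->header |= (u32)get_unaligned_le16(skb->data + offset) << 16;
          rx->split_head = false;
          offset += sizeof(u16);
  } else {
          /* a) the whole 32-bit header word is in this URB */
          rx->header = get_unaligned_le32(skb->data + offset);
          offset += sizeof(u32);
  }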

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:58:40 -07:00
Dean Jenkins
7b0378f517 asix: Rename remaining and size for clarity
The Data header synchronisation is easier to understand
if the variables "remaining" and "size" are renamed.

The lifetime of the "remaining" variable now extends outside of
asix_rx_fixup_internal(); it is used to indicate any remaining pending
bytes of the Ethernet frame that need to be obtained from the next
socket buffer. This allows an Ethernet frame to span multiple socket
buffers.

"size" is now local to asix_rx_fixup_internal() and contains
the size read from the Data header 32-bit word.

Add "copy_length" to hold the number of the Ethernet frame
bytes (maybe a part of a full frame) that are to be copied
out of the socket buffer.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:58:38 -07:00
Daniel Borkmann
bab1899187 bpf, seccomp: prepare for upcoming criu support
The current ongoing effort to dump existing cBPF seccomp filters back
to user space requires holding the pre-transformed instructions, as we
do in the case of socket filters from the sk_attach_filter() side, so
they can be reloaded in their original form at a later point in time
by utilities such as criu.

To prepare for this, simply extend the bpf_prog_create_from_user()
API with a flag that tells whether we should store the original or
not. Also, fanout filters could make use of that in the future for
things like diag. While fanout filters already use bpf_prog_destroy(),
move seccomp over to it as well to handle original programs when
present.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:47:05 -07:00
WANG Cong
0a15afd2ea vrf: fix a kernel warning
This fixes:

 tried to remove device ip6gre0 from (null)
 ------------[ cut here ]------------
 kernel BUG at net/core/dev.c:5219!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
 CPU: 3 PID: 161 Comm: kworker/u8:2 Not tainted 4.3.0-rc2+ #1142
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: netns cleanup_net
 task: ffff8800d784a9c0 ti: ffff8800d74a4000 task.ti: ffff8800d74a4000
 RIP: 0010:[<ffffffff817f0797>]  [<ffffffff817f0797>] __netdev_adjacent_dev_remove+0x40/0xec
 RSP: 0018:ffff8800d74a7a98  EFLAGS: 00010282
 RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
 RDX: ffff88011adcf701 RSI: ffff88011adccbf8 RDI: ffff88011adccbf8
 RBP: ffff8800d74a7ab8 R08: 0000000000000001 R09: 0000000000000000
 R10: ffffffff81d190ff R11: 00000000ffffffff R12: ffff8800d599e7c0
 R13: 0000000000000000 R14: ffff8800d599e890 R15: ffffffff82385e00
 FS:  0000000000000000(0000) GS:ffff88011ac00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00007ffd6f003000 CR3: 000000000220c000 CR4: 00000000000006e0
 Stack:
  0000000000000000 ffff8800d599e7c0 0000000000000b00 ffff8800d599e8a0
  ffff8800d74a7ad8 ffffffff817f0861 0000000000000000 ffff8800d599e7c0
  ffff8800d74a7af8 ffffffff817f088f 0000000000000000 ffff8800d599e7c0
 Call Trace:
  [<ffffffff817f0861>] __netdev_adjacent_dev_unlink+0x1e/0x35
  [<ffffffff817f088f>] __netdev_adjacent_dev_unlink_neighbour+0x17/0x41
  [<ffffffff817f56e6>] netdev_upper_dev_unlink+0x6c/0x13d
  [<ffffffff81674a3d>] vrf_del_slave+0x26/0x7d
  [<ffffffff81674ac3>] vrf_device_event+0x2f/0x34
  [<ffffffff81098c40>] notifier_call_chain+0x75/0x9c
  [<ffffffff81098fa2>] raw_notifier_call_chain+0x14/0x16
  [<ffffffff817ee129>] call_netdevice_notifiers_info+0x52/0x59
  [<ffffffff817f179d>] call_netdevice_notifiers+0x13/0x15
  [<ffffffff817f6f18>] rollback_registered_many+0x14f/0x24f
  [<ffffffff817f70f2>] unregister_netdevice_many+0x19/0x64
  [<ffffffff819a2455>] ip6gre_exit_net+0x163/0x177
  [<ffffffff817eb019>] ops_exit_list+0x44/0x55
  [<ffffffff817ebcb7>] cleanup_net+0x193/0x226
  [<ffffffff81091e1c>] process_one_work+0x26c/0x4d8
  [<ffffffff81091d20>] ? process_one_work+0x170/0x4d8
  [<ffffffff81092296>] worker_thread+0x1df/0x2c2
  [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f
  [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f
  [<ffffffff81097a20>] kthread+0xd4/0xdc
  [<ffffffff810bc523>] ? trace_hardirqs_on_caller+0x17d/0x199
  [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83
  [<ffffffff81a5240f>] ret_from_fork+0x3f/0x70
  [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83

Fixes: 93a7e7e837 ("net: Remove the now unused vrf_ptr")
Cc: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:35:51 -07:00
kbuild test robot
9886ce2b9d net: encx24j600_exit() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 04:02:43 -07:00
Jon Ringle
04fbfce7a2 net: Microchip encx24j600 driver
This Ethernet driver supports the Microchip enc424j600/626j600
Ethernet controller over an SPI bus interface. The driver makes use of
the regmap API to optimize access to registers by caching registers
where possible.

Datasheet:
http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 04:02:41 -07:00
Jon Ringle
7741c373cf regmap: Allow installing custom reg_update_bits function
This commit allows installing a custom reg_update_bits function for cases where
the hardware provides a mechanism to set or clear register bits without a
read/modify/write cycle. Such is the case with the Microchip ENCX24J600.
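
A hedged sketch of how a bus driver installs such a callback (the
callback body and bus name here are illustrative only):

  static int encx_reg_update_bits(void *context, unsigned int reg,
                                  unsigned int mask, unsigned int val)
  {
          /* issue the hardware's native bit set/clear commands
           * instead of a read/modify/write sequence */
          return 0;
  }

  static struct regmap_bus encx_regmap_bus = {
          .reg_update_bits = encx_reg_update_bits,
          /* ... read/write callbacks ... */
  };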

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 04:02:40 -07:00
Govindarajulu Varadarajan
937317c7c1 enic: do hang reset only in case of tx timeout
The current code invokes hang reset in case of error interrupt. We should
hang reset only in case of tx timeout. This because of the way hang reset
is implemented in firmware. Hang reset takes more firmware resources than
soft reset. Adaptor does not generate error interrupt in case of tx
timeout.

Hang-reset only in case of a tx timeout, in .ndo_tx_timeout, and do a
soft reset otherwise. Introduce deferred work, enic_tx_hang_reset, to
perform the hang reset.
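
A minimal sketch of the shape of the change (the work field name is
assumed; error handling is elided):

  /* .ndo_tx_timeout: defer the expensive hang reset to a work item */
  static void enic_tx_timeout(struct net_device *netdev)
  {
          struct enic *enic = netdev_priv(netdev);

          schedule_work(&enic->tx_hang_reset);
  }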

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:51:35 -07:00
Govindarajulu Varadarajan
cc809237e1 enic: handle spurious error interrupt
Some of the enic adapters are known to generate spurious interrupts.
When an error interrupt is generated, the driver just resets the
device. This patch resets the device only when an error has actually
occurred.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:51:33 -07:00
David S. Miller
2905f5bb1c Merge branch 'cxgb4-next'
Hariprasad Shenai says:

====================
cxgb4: Trivial fixes for cxgb4

Fixes the following issues:
Don't read non-existent T4/T5/T6 adapter registers for the ethtool
dump. For T4, don't read the mailbox control registers. Add a new
devlog facility and report the correct link speed for unsupported
ones.

This patch series has been created against the net-next tree and
includes patches for the cxgb4 driver.

We have included all the maintainers of the respective drivers. Kindly
review the changes and let us know if you have any review comments.
====================

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:48:45 -07:00
Hariprasad Shenai
85412255ef cxgb4: Report correct link speed for unsupported ones
When we get garbage from the firmware, with weird Port Speeds etc.,
we should emit a warning regarding unsupported speeds rather than use
the bogus default of "10Mbps", which isn't even an option in the
firmware Port Information message.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:48:41 -07:00
Hariprasad Shenai
da4976e17b cxgb4: Adds a new Device Log Facility FW_DEVLOG_FACILITY_CF
The firmware team added a new Device Log Facility,
FW_DEVLOG_FACILITY_CF, but the driver has been decoding Device Log
messages with that Facility as "(NULL)". Fix that.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:48:41 -07:00
Hariprasad Shenai
b3695540ba cxgb4: For T4, don't read the Firmware Mailbox Control register
T4 doesn't have the Shadow copy of the register which we can read without
side effect. So don't read mbox control register for T4 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:48:40 -07:00
Hariprasad Shenai
8119c01800 cxgb4 : Update T4/T5/T6 register ranges
Update the T4/T5/T6 adapter register ranges so that we don't read
non-existent registers when dumping them via ethtool.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:48:39 -07:00
David S. Miller
40e106801e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next
Eric W. Biederman says:

====================
net: Pass net through ip fragmention

This is the next installment of my work to pass struct net through the
output path so the code does not need to guess how to figure out which
network namespace it is in, and ultimately routes can have output
devices in another network namespace.

This round focuses on passing net through ip fragmentation, which we
seem to call from about everywhere: the main ip output paths, the
bridge netfilter code, and openvswitch. This has to happen at once
across the tree, as function pointers are involved.

First some prep work is done, then ipv4 and ipv6 are converted, and
finally the temporary helper functions are removed.
====================

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:39:31 -07:00
David S. Miller
7e2832f17f Merge branch 'rds-perf'
Sowmini Varadhan says:

====================
RDS: RDS-TCP perf enhancements

A 3-part patchset that (a) improves current RDS-TCP perf
by 2X-3X and (b) refactors earlier robustness code for
better observability/scaling.

Patch 1 is an enhancement of earlier robustness fixes that had used
separate sockets for client and server endpoints to resolve race
conditions. It is possible to have an equivalent solution that does
not use two sockets. The benefit of a single-socket solution is that
it results in more predictable and observable behavior for the
underlying TCP pipe of an RDS connection.

Patches 2 and 3 are simple, straightforward perf bug fixes
that align the RDS TCP socket with other parts of the kernel stack.

v2: fix kbuild-test-robot warnings; address comments from Sergei
    Shtylov and Santosh Shilimkar.
====================

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:35:29 -07:00
Sowmini Varadhan
76b29ef120 RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit
For the same reasons as commit 2f53384424 ("tcp: allow splice() to
build full TSO packets") and commit 35f9c09fe9 ("tcp: tcp_sendpages()
should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
send, so use MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
tcp_sendpage().
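
A hedged sketch of the flagging pattern (the variable names are
illustrative; the driver reaches sendpage through the socket's ops):

  int flags = MSG_DONTWAIT | MSG_NOSIGNAL;

  /* Every page except the last hints that more data follows,
   * letting TCP coalesce pages into full-sized TSO frames. */
  if (!last_page)
          flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
  ret = kernel_sendpage(sock, page, offset, len, flags);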

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:34:53 -07:00
Sowmini Varadhan
1edd6a14d2 RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K) clobbers efficient
use of TSO, because it inflates the size_goal computed in
tcp_sendmsg/tcp_sendpage and skews packet latency; the default values
for these parameters actually result in significantly better
performance.

Request-response tests using rds-stress with a packet size of 100K
and 16 threads (test parameters -q 100000 -a 256 -t16 -d16) between a
single pair of IP addresses achieve a throughput of 6-8 Gbps. Without
this patch, throughput maxes out at 2-3 Gbps under equivalent
conditions on these platforms.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:34:53 -07:00
Sowmini Varadhan
3b20fc3897 RDS: Use a single TCP socket for both send and receive.
Commit f711a6ae06 ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port.  The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.

However, having separate sockets for the active and passive sides is
avoidable, and the simpler model of a single TCP socket for both sends
and receives of all RDS connections associated with that TCP socket
makes for easier observability. We avoid the race conditions from
f711a6ae by attempting reconnects in rds_conn_shutdown if, and only
if, the (new) c_outgoing bit is set for RDS_TRANS_TCP. The c_outgoing
bit is initialized in __rds_conn_create().

A side-effect of re-using the client rds_connection for an incoming
SYN is the potential for duelling SYNs, i.e., we have an outgoing
socket in RDS_CONN_CONNECTING state when we get the incoming SYN. The
logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the side with the smaller IP address backs off from the
connection attempt.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:34:51 -07:00
David S. Miller
393159e917 Merge branch 'xgbe-next'
Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2015-09-30

The following patches are included in this driver update series:

- Remove unneeded semi-colon
- Follow the DT/ACPI precedence used by the device_ APIs
- Add ethtool support for getting and setting the msglevel
- Add ethtool error and debug messages
- Simplify the hardware FIFO assignment calculations
- Add receive buffer unavailable statistic
- Use the device workqueue instead of the system workqueue
- Remove the use of a link state bit

This patch series is based on net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:40 -07:00
Lendacky, Thomas
50789845cf amd-xgbe: Remove the XGBE_LINK state bit
The XGBE_LINK bit is used just to determine whether to call the
netif_carrier_on/off functions. Rather than define and use this bit,
just call the functions. The netif_carrier_ok function can be used in
place of checking the XGBE_LINK bit in the future.
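
A minimal sketch of the pattern (link_up and netdev stand in for the
driver's own state and device pointer):

  /* Drive the carrier state directly instead of mirroring it
   * in a private XGBE_LINK bit. */
  if (link_up)
          netif_carrier_on(netdev);
  else
          netif_carrier_off(netdev);

  /* Elsewhere, test the link with netif_carrier_ok(netdev). */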

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:27 -07:00
Lendacky, Thomas
afb43e8a0a amd-xgbe: Use device workqueue instead of system workqueue
The driver creates, flushes and destroys a device workqueue but queues
work to the system workqueue. Switch from using the system workqueue to
the device workqueue.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:26 -07:00
Lendacky, Thomas
72c9ac4e1f amd-xgbe: Add receive buffer unavailable statistic
Add a statistic that tracks how many times an interrupt is generated
because a receive buffer is not available to the hardware, preventing
the hardware from DMAing the received data.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:26 -07:00
Lendacky, Thomas
9c439e4b73 amd-xgbe: Simplify calculation and setting of queue fifos
The Tx and Rx fifo sizes can be calculated rather than hardcoded in a
switch statement. Additionally, the per-queue fifo sizes can be
calculated rather than hardcoded using if/else-if statements that can
underutilize the available fifo area.

Change the code to calculate the fifo sizes and the per-queue fifo
sizes, to simplify the code and make the best use of the available
fifo.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:25 -07:00
Lendacky, Thomas
e5dd8b8110 amd-xgbe: Add ethtool error and debug messages
Add error and dynamic debug messages to various ethtool functions in
the driver, while also removing the DBGPR debug print calls. Also,
change the message level for some error messages from alert to err.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:25 -07:00
Lendacky, Thomas
349fb2d700 amd-xgbe: Add ethtool support for setting the msglevel
Provide the ethtool functions to support getting and setting the
msglevel for the driver.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:23 -07:00
Lendacky, Thomas
47f2e6c275 amd-xgbe: Use proper DT / ACPI precedence checking
Device tree presence takes precedence over ACPI in the device_* APIs.
The amd-xgbe driver should follow the same precedence, so update the
check on whether to use DT or ACPI accordingly.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:22 -07:00
Lendacky, Thomas
3947d78a54 amd-xgbe: Remove an unneeded semicolon on a switch statement
Remove an unneeded semicolon at the end of a switch statement block.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:23:22 -07:00
Eric Dumazet
ac8cfc7bb8 tcp: restore fastopen operations
I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc(),
while max_qlen can be set before listen() is called, e.g. via the
TCP_FASTOPEN socket option.
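
The userspace pattern the fix preserves: a fastopen queue length set
before listen() must survive into the listening socket.

  int qlen = 16;  /* max pending TFO requests */

  setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
  listen(fd, SOMAXCONN);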

Fixes: 0536fcc039 ("tcp: prepare fastopen code for upcoming listener changes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:19:06 -07:00