Commit graph

707529 commits

Author SHA1 Message Date
Jacob Keller 0a3b4f702f i40evf: enable support for VF VLAN tag stripping control
A recent commit 809481484e5d ("i40e/i40evf: support for VF VLAN tag
stripping control") added support for VFs to negotiate the control of
VLAN tag stripping. This should have allowed VFs to disable the feature.
Unfortunately, the flag was set only in netdev->feature flags and not in
netdev->hw_features.

This ultimately causes the stack to assume that it cannot change the
flag, so it was unchangeable and marked as [fixed] in the ethtool -k
output.

Fix this by setting the feature in hw_features first, just as we do for
the PF code. This enables ethtool -K to disable the feature correctly,
and fully enables user control of the VLAN tag stripping feature.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:32 -07:00
Mariusz Stachura 052b93d0c2 i40e: do not enter PHY debug mode while setting LEDs behaviour
Previous implementation of LED set/get functions required to enter
PHY debug mode, in order to prevent access to it from FW and SW at
the same time. Reset of all ports was a unwanted side effect.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:32 -07:00
Alan Brady 19b7960b2d i40e: implement split PCI error reset handler
This patch implements the PCI error handler reset_prepare and reset_done.
This allows us to handle function level reset.  Without this patch we
are unable to perform and recover from an FLR correctly and this will cause
VFs to be unable to recover from an FLR on the PF.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:32 -07:00
Filip Sadowski 013df598d6 i40e: Properly maintain flow director filters list
When there is no space for more flow director filters and user requested to
add a new one it is rejected by firmware and automatically removed from the
filter list maintained by driver. This behaviour is correct. Afterwards
existing filter can be removed making free slot for the new one. This
however causes the newly added filter to be accepted by firmware but
removed from driver filter list resulting in not showing after issuing
'ethtool -n <dev_name>'.

This happened due to not clearing the variable pf->fd_inv which stores
filter number to be removed from the list when firmware refused to add the
requested filter. It caused the filter with this specific ID to be
constantly removed once it was added to the list although it has been
accepted by firmware and effectively applied to the NIC.
It was fixed by clearing pf->fd_inv variable after removal of the filter
from the list when it was rejected by firmware.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:32 -07:00
Filip Sadowski 9a858178ef i40e: Display error message if module does not meet thermal requirements
This patch causes error message to be displayed when NIC detects
insertion of module that does not meet thermal requirements.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Alice Michael 7f66182263 i40e: fix merge error
This patch removes some code that was accidentally added to
the wrong function with a merge error.  Fixes: c53934c6d1
("i40e: fix: do not sleep in netdev_ops")

Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Jesse Brandeburg bd6cd4e6dd i40e/i40evf: use DECLARE_BITMAP for state
When using set_bit and friends, we should be using actual
bitmaps, and fix all the locations where we might access
it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Mitch Williams 0a0d9af5bc i40e: fix incorrect register definition
This register was defined incorrectly. Fix the increment value to 8, and
replace the iterator with _i to make the definition consistent with
other statistics registers.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Mitch Williams 60518a0489 i40e: redfine I40E_PHY_TYPE_MAX
Since I40E_PHY_TYPE_MAX is used as an iterator, usually combined with
some sort of bit-shifting, it should only include actual PHY types and
not error cases. Move it up in the enum declaration so that loops only
iterate across valid PHY types.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Alan Brady c3d26b75c2 i40e: re-enable PTP L4 capabilities for XL710 if FW >6.0
Starting with XL710 FW 5.3 PTP L4 was disabled for XL710 due to a bug.  The
bug has since been resolved in XL710 FW >6.0 and PTP L4 can now be
re-enabled on those devices with updated firmware.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Jacob Keller be664cbefc i40e/i40evf: spread CPU affinity hints across online CPUs only
Currently, when setting up the IRQ for a q_vector, we set an affinity
hint based on the v_idx of that q_vector. Meaning a loop iterates on
v_idx, which is an incremental value, and the cpumask is created based
on this value.

This is a problem in systems with multiple logical CPUs per core (like in
simultaneous multithreading (SMT) scenarios). If we disable some logical
CPUs, by turning SMT off for example, we will end up with a sparse
cpu_online_mask, i.e., only the first CPU in a core is online, and
incremental filling in q_vector cpumask might lead to multiple offline
CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1)  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.

Instead, we should only assign hints for CPUs which are online. Even
better, the kernel already provides a function, cpumask_local_spread()
which takes an index and returns a CPU, spreading the interrupts across
local NUMA nodes first, and then remote ones if necessary.

Since we generally have a 1:1 mapping between vectors and CPUs, there
is no real advantage to spreading vectors to local CPUs first. In order
to avoid mismatch of the default XPS hints, we'll pass -1 so that it
spreads across all CPUs without regard to the node locality.

Note that we don't need to change the q_vector->affinity_mask as this is
initialized to cpu_possible_mask, until an actual affinity is set and
then notified back to us.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Mitch Williams 64615b5418 i40e: add private flag to control source pruning
By default, our devices do source pruning, that is, they drop receive
packets that have the source MAC matching one of the receive filters.
Unfortunately, this breaks ARP monitoring in channel bonding, as the
bonding driver expects devices to receive ARPs containing their own
source address.

Add an ethtool private flag to control this feature.

Also, remove the netif_running() check when we process our private
flags. It's OK to reset when the device is closed and in most cases we
need the reset the apply these changes.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Rami Rosen ec2f25d203 i40e: fix a typo in i40e_pf documentation
This patch fixes a typo in i40e_pf object documentation; num_req_vfs
refers to the number of VFs requested for the PF.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Tim Hansen cc71b7b071 net/ipv6: remove unused err variable on icmpv6_push_pending_frames
int err is unused by icmpv6_push_pending_frames(), this patch returns removes the variable and returns the function with 0.

git bisect shows this variable has been around since linux has been in git in commit 1da177e4c3.

This was found by running make coccicheck M=net/ipv6/ on linus' tree on commit 77ede3a014 (current HEAD as of this patch).

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:56:26 -07:00
Lin Zhang 380537b4f7 net: ipv6: remove unused code in ipv6_find_hdr()
Storing the left length of skb into 'len' actually has no effect
so we can remove it.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:53:02 -07:00
David S. Miller d433b3f3e8 Merge branch 'libbpf-support-more-map-options'
Craig Gallek says:

====================
libbpf: support more map options

The functional change to this series is the ability to use flags when
creating maps from object files loaded by libbpf.  In order to do this,
the first patch updates the library to handle map definitions that
differ in size from libbpf's struct bpf_map_def.

For object files with a larger map definition, libbpf will continue to load
if the unknown fields are all zero, otherwise the map is rejected.  If the
map definition in the object file is smaller than expected, libbpf will use
zero as a default value in the missing fields.
====================

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:45:17 -07:00
Craig Gallek fe9b5f774b libbpf: use map_flags when creating maps
This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type
which requires flags.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:42:28 -07:00
Craig Gallek b13c5c14db libbpf: parse maps sections of varying size
This library previously assumed a fixed-size map options structure.
Any new options were ignored.  In order to allow the options structure
to grow and to support parsing older programs, this patch updates
the maps section parsing to handle varying sizes.

Object files with maps sections smaller than expected will have the new
fields initialized to zero.  Object files which have larger than expected
maps sections will be rejected unless all of the unrecognized data is zero.

This change still assumes that each map definition in the maps section
is the same size.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:42:28 -07:00
Colin Ian King d009313c99 net: qcom/emac: make function emac_isr static
The function emac_isr is local to the source and does not need to
be in global scope, so make it static.

Cleans up sparse warnings:
symbol 'emac_isr' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:27:02 -07:00
David S. Miller cec451ce60 Merge branch 'tcp-improving-RACK-cpu-performance'
Yuchung Cheng says:

====================
tcp: improving RACK cpu performance

This patch set improves the CPU consumption of the RACK TCP loss
recovery algorithm, in particular for high-speed networks. Currently,
for every ACK in recovery RACK can potentially iterate over all sent
packets in the write queue. On large BDP networks with non-trivial
losses the RACK write queue walk CPU usage becomes unreasonably high.

This patch introduces a new queue in TCP that keeps only skbs sent and
not yet (s)acked or marked lost, in time order instead of sequence
order.  With that, RACK can examine this time-sorted list and only
check packets that were sent recently, within the reordering window,
per ACK. This is the fastest way without any write queue walks. The
number of skbs examined per ACK is reduced by orders of magnitude.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:24:48 -07:00
Yuchung Cheng bef0622308 tcp: a small refactor of RACK loss detection
Refactor the RACK loop to improve readability and speed up the checks.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:24:47 -07:00
Yuchung Cheng 043b87d759 tcp: more efficient RACK loss detection
Use the new time-ordered list to speed up RACK. The detection
logic is identical. But since the list is chronologically ordered
by skb_mstamp and contains only skbs not yet acked or sacked,
RACK can abort the loop upon hitting skbs that were sent more
recently. On YouTube servers this patch reduces the iterations on
write queue by 40x. The improvement is even bigger with large
BDP networks.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:24:47 -07:00
Eric Dumazet e2080072ed tcp: new list for sent but unacked skbs for RACK recovery
This patch adds a new queue (list) that tracks the sent but not yet
acked or SACKed skbs for a TCP connection. The list is chronologically
ordered by skb->skb_mstamp (the head is the oldest sent skb).

This list will be used to optimize TCP Rack recovery, which checks
an skb's timestamp to judge if it has been lost and needs to be
retransmitted. Since TCP write queue is ordered by sequence instead
of sent time, RACK has to scan over the write queue to catch all
eligible packets to detect lost retransmission, and iterates through
SACKed skbs repeatedly.

Special cares for rare events:
1. TCP repair fakes skb transmission so the send queue needs adjusted
2. SACK reneging would require re-inserting SACKed skbs into the
   send queue. For now I believe it's not worth the complexity to
   make RACK work perfectly on SACK reneging, so we do nothing here.
3. Fast Open: currently for non-TFO, send-queue correctly queues
   the pure SYN packet. For TFO which queues a pure SYN and
   then a data packet, send-queue only queues the data packet but
   not the pure SYN due to the structure of TFO code. This is okay
   because the SYN receiver would never respond with a SACK on a
   missing SYN (i.e. SYN is never fast-retransmitted by SACK/RACK).

In order to not grow sk_buff, we use an union for the new list and
_skb_refdst/destructor fields. This is a bit complicated because
we need to make sure _skb_refdst and destructor are properly zeroed
before skb is cloned/copied at transmit, and before being freed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:24:47 -07:00
Avinash Repaka b1fb67fa50 RDS: IB: Initialize max_items based on underlying device attributes
Use max_1m_mrs/max_8k_mrs while setting max_items, as the former
variables are set based on the underlying device attributes.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:16:33 -07:00
Avinash Repaka 9dff9936f0 RDS: IB: Limit the scope of has_fr/has_fmr variables
This patch fixes the scope of has_fr and has_fmr variables as they are
needed only in rds_ib_add_one().

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:16:33 -07:00
Tim Hansen 1bcdca3ffb net/ipv4: Remove unused variable in route.c
int rc is unmodified after initalization in net/ipv4/route.c, this patch simply cleans up that variable and returns 0.

This was found with coccicheck M=net/ipv4/ on linus' tree.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:16:06 -07:00
Wei Wang 6d05081e55 tcp: clean up TFO server's initial tcp_rearm_rto() call
This commit does a cleanup and moves tcp_rearm_rto() call in the TFO
server case into a previous spot in tcp_rcv_state_process() to make
it more compact.
This is only a cosmetic change.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:10:16 -07:00
Wei Wang 27204aaa9d tcp: uniform the set up of sockets after successful connection
Currently in the TCP code, the initialization sequence for cached
metrics, congestion control, BPF, etc, after successful connection
is very inconsistent. This introduces inconsistent bevhavior and is
prone to bugs. The current call sequence is as follows:

(1) for active case (tcp_finish_connect() case):
        tcp_mtup_init(sk);
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_init_metrics(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_init_buffer_space(sk);

(2) for passive case (tcp_rcv_state_process() TCP_SYN_RECV case):
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_mtup_init(sk);
        tcp_init_buffer_space(sk);
        tcp_init_metrics(sk);

(3) for TFO passive case (tcp_fastopen_create_child()):
        inet_csk(child)->icsk_af_ops->rebuild_header(child);
        tcp_init_congestion_control(child);
        tcp_mtup_init(child);
        tcp_init_metrics(child);
        tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
        tcp_init_buffer_space(child);

This commit uniforms the above functions to have the following sequence:
        tcp_mtup_init(sk);
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_init_metrics(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_init_buffer_space(sk);
This sequence is the same as the (1) active case. We pick this sequence
because this order correctly allows BPF to override the settings
including congestion control module and initial cwnd, etc from
the route, and then allows the CC module to see those settings.

Suggested-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 21:10:16 -07:00
David S. Miller 5820299a27 Merge branch 'VSOCK-sock_diag'
Stefan Hajnoczi says:

====================
VSOCK: add sock_diag interface

v3:
 * Rebased onto net-next/master and resolved Hyper-V transport conflict

v2:
 * Moved tests to tools/testing/vsock/.  I was unable to put them in selftests/
   because they require manual setup of a VMware/KVM guest.
 * Moved to __vsock_in_bound/connected_table() to af_vsock.h
 * Fixed local variable ordering in Patch 4

There is currently no way for userspace to query open AF_VSOCK sockets.  This
means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets.

This patch series adds the netlink sock_diag interface for AF_VSOCK.  Userspace
programs sent a DUMP request including an sk_state bitmap to filter sockets
based on their state (connected, listening, etc).  The vsock_diag.ko module
replies with information about matching sockets.  This userspace ABI is defined
in <linux/vm_sockets_diag.h>.

The final patch adds a test suite that exercises the basic cases.

Jorgen and Dexuan: I have only tested the virtio transport but this should also
work for VMCI and Hyper-V.  Please give it a shot if you have time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:18 -07:00
Stefan Hajnoczi 0b02503384 VSOCK: add tools/testing/vsock/vsock_diag_test
This patch adds tests for the vsock_diag.ko module.

These tests are not self-tests because they require manual set up of a
KVM or VMware guest.  Please see tools/testing/vsock/README for
instructions.

The control.h and timeout.h infrastructure can be used for additional
AF_VSOCK tests in the future.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:17 -07:00
Stefan Hajnoczi 413a4317ac VSOCK: add sock_diag interface
This patch adds the sock_diag interface for querying sockets from
userspace.  Tools like ss(8) and netstat(8) can use this interface to
list open sockets.

The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes
netlink request and response structs.  The request can query sockets
based on their sk_state (e.g. listening sockets only) and the response
contains socket information fields including the local/remote addresses,
inode number, etc.

This patch does not dump VMCI pending sockets because I have only tested
the virtio transport, which does not use pending sockets.  Support can
be added later by extending vsock_diag_dump() if needed by VMCI users.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:17 -07:00
Stefan Hajnoczi 3b4477d2dc VSOCK: use TCP state constants for sk_state
There are two state fields: socket->state and sock->sk_state.  The
socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
sock->sk_state typically uses values that match TCP state constants
(TCP_CLOSE, TCP_ESTABLISHED).  AF_VSOCK does not follow this convention
and instead uses SS_* constants for both fields.

The sk_state field will be exposed to userspace through the vsock_diag
interface for ss(8), netstat(8), and other programs.

This patch switches sk_state to TCP state constants so that the meaning
of this field is consistent with other address families.  Not just
AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.

The following mapping was used to convert the code:

  SS_FREE -> TCP_CLOSE
  SS_UNCONNECTED -> TCP_CLOSE
  SS_CONNECTING -> TCP_SYN_SENT
  SS_CONNECTED -> TCP_ESTABLISHED
  SS_DISCONNECTING -> TCP_CLOSING
  VSOCK_SS_LISTEN -> TCP_LISTEN

In __vsock_create() the sk_state initialization was dropped because
sock_init_data() already initializes sk_state to TCP_CLOSE.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:17 -07:00
Stefan Hajnoczi bf359b8127 VSOCK: move __vsock_in_bound/connected_table() to af_vsock.h
The vsock_diag.ko module will need to check socket table membership.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:17 -07:00
Stefan Hajnoczi 44f209807e VSOCK: export socket tables for sock_diag interface
The socket table symbols need to be exported from vsock.ko so that the
vsock_diag.ko module will be able to traverse sockets.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:17 -07:00
David S. Miller 53954cf8c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:19:22 -07:00
Linus Torvalds 7a92616c0b Power management fix for v4.14-rc4
This fixes a code ordering issue in the main suspend-to-idle loop
 that causes some "low power S0 idle" conditions to be incorrectly
 reported as unmet with suspend/resume debug messages enabled.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZ1rGKAAoJEILEb/54YlRxONQP/0NzL79PrxFgtbsnhZZfDZ+U
 os6pcNYl9J+n2YVq5NAP9nnqEhN3E2gctJwjYMRIuJC/g6uU8Z6Ym6h7D+QfZrv1
 yzEsbQgLh8N9nR+lUGRi+meoF8BolOLeXnNgB18uP1ZShZLikvAELxMkmJUx+TjW
 CVQaOkXe4I/Ey5O4Jjur+tgVn+ik+xl40akw7+wHAnY+I7KwzLffP7nwHBGavHsd
 dCtbcRogWzpcihgpLpgJMaixjZXakJ2n/Zmg+IdpnYt9WRIMy2ztTu3+bPRPEVVJ
 hcP9p93r1BZclkyyYNyM0QMv4Ac96xBOe8qigXP/9EdtWYHLeV+N+N+EFbeutHgn
 LsiuO4h9FCrv4ltn7jm88sTyna0IpuKSva9pWk9nopI8e5r/Yplvi00ZJW1lz/B7
 xrPWHdm6ozuK7UGQQoeeb5CVOuClTCBC3PFTw/XTD0emogjI+xJi802zIG00eRPm
 VBAvMilLeS1ezegVWJte6wPptF3wUW2Ss6jYzh4zVgnJ5XKgfxbANhLeQizAhGFj
 lxxW9/MS1APxTrV9VluY7zyqq/s9H1iWUMyf2H06zjetBfKPiqEn+oL4/k4rnYN6
 SB4WrT8CwWxsDusMxN0aVjfftRwnGQ1+kPX3EP7mYQFP+zwiK44iilL7D37Rdyoe
 Z8Hdmdf/sCORK3zXZWd8
 =6lm4
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "This fixes a code ordering issue in the main suspend-to-idle loop that
  causes some "low power S0 idle" conditions to be incorrectly reported
  as unmet with suspend/resume debug messages enabled"

* tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / s2idle: Invoke the ->wake() platform callback earlier
2017-10-05 15:51:37 -07:00
Rafael J. Wysocki ca935f8e76 Merge branch 'pm-sleep'
* pm-sleep:
  PM / s2idle: Invoke the ->wake() platform callback earlier
2017-10-06 00:24:14 +02:00
Linus Torvalds 076264ada9 - A stable fix for the alignment of the event number reported at the
end of the 'DM_LIST_DEVICES' ioctl.
 
 - A couple stable fixes for the DM crypt target.
 
 - A DM raid health status reporting fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZ1pR1AAoJEMUj8QotnQNa48kIAJ+HTqeNjVhspxqKyJHPl78W
 3N/B11dWJ/CQ4xN7tbpC2gmsbnBBHE8RFTJzk3xQo7yoKsD0muqH35n0XA7X2A29
 i7DoYro/7F6ZuPlgzhzcCjA7eTugR4vcp5dTFYoIQG0DaOKAkN/+gJTVjNDjpRR5
 oGljZhKTeS4UNJTv/+ZjSMuAPycZq8LKRMOn/EgqT9MD4cIQ9VHN2qGc8jQt0Xrb
 m58URvAoFesGnSjZcypk+JG2SbUfJ4WB3Db7+A+X7lu2219FIroFhNHMk9obYhXG
 mkrhEnAsVsq/paPhCY4gdXWmSe7RNiAeSJeWhUSrNfjUACf1GF+l4CgBeBWIX+0=
 =V40h
 -----END PGP SIGNATURE-----

Merge tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - a stable fix for the alignment of the event number reported at the
   end of the 'DM_LIST_DEVICES' ioctl.

 - a couple stable fixes for the DM crypt target.

 - a DM raid health status reporting fix.

* tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm raid: fix incorrect status output at the end of a "recover" process
  dm crypt: reject sector_size feature if device length is not aligned to it
  dm crypt: fix memory leak in crypt_ctr_cipher_old()
  dm ioctl: fix alignment of event number in the device list
2017-10-05 15:17:40 -07:00
Jonathan Brassow 41dcf197ad dm raid: fix incorrect status output at the end of a "recover" process
There are three important fields that indicate the overall health and
status of an array: dev_health, sync_ratio, and sync_action.  They tell
us the condition of the devices in the array, and the degree to which
the array is synchronized.

This commit fixes a condition that is reported incorrectly.  When a member
of the array is being rebuilt or a new device is added, the "recover"
process is used to synchronize it with the rest of the array.  When the
process is complete, but the sync thread hasn't yet been reaped, it is
possible for the state of MD to be:
 mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
 curr_resync_completed = <max dev size> (but not MaxSector)
 and all rdevs to be In_sync.
This causes the 'array_in_sync' output parameter that is passed to
rs_get_progress() to be computed incorrectly and reported as 'false' --
or not in-sync.  This in turn causes the dev_health status characters to
be reported as all 'a', rather than the proper 'A'.

This can cause erroneous output for several seconds at a time when tools
will want to be checking the condition due to events that are raised at
the end of a sync process.  Fix this by properly calculating the
'array_in_sync' return parameter in rs_get_progress().

Also, remove an unnecessary intermediate 'recovery_cp' variable in
rs_get_progress().

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-10-05 16:21:30 -04:00
Linus Torvalds 0f380715e5 sound fixes for 4.14-rc4
A collection of small fixes, mostly with stable ones:
 - X32 ABI fix for PCM;
   likely not so many people suffer from it, but still better to fix
 - Two minor kernel warning fixes on USB audio devices spotted by
   syzkaller
 - Regression fix of echoaudio due to its inconsistent dimension
 - Fix for HBR support on Intel DP audio, on some recent chips
 - USB-audio quirk for yet another Plantronics devices
 - Fix for potential double-fetch in ASIHPI FIFO queue
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZ1chwAAoJEGwxgFQ9KSmk6OgP/2otFJK3ToOYKcEa27vMq32Y
 Cnqjxe4kWPDoaVL0EwFCCyk3UHyN+TG+ybmS4ViAIWOr64SjvN1Z51gd35HwWer6
 rrPGT4qhQfnA9D9yg9zSYhMQlM6Epbrje67hEbHMttRBSI3Jxb488beO9ZHRCIaS
 gGHzBxJnkmmsvH1Rm/4QKsm0hFxjU1P00nemFSGi4wGneDUhxe9T3J1ez9XogKiA
 x6mEqJmPwudS3qH2VAqEu/cgrnKPgKF51CrdbqoJ8kql60ejcWa9sizGVyjQ/4BX
 eHFAem4I1uLLOSG8gjPaMAPFqBn7hGr0ONPzwXdEJDDM48jJoYvdmNpk1zPZFs5+
 BSocKq5iX8NrJxVu7FlbFip/9TGM399Ep9JVIm8SZRbJCiu+Y6IVsiPM8Ww+tDIE
 jY4Qeoq4Li3O+BY1LqW8vGyvPrfyKpR/WvnQY7Gg4m7hWOUU9xBlzfbIONlXH9tF
 s5XGA9CLfM3f5zN+YCj/tRtcN8LSfL7XtMsLg99HMWD7Rkjp2GK1zNr7u/Xuz6/J
 RksFOCg/s5YwMYxYR4tE1S/HAgdyv8uZQCl4cpIIgryY+OOi7vYHiDlOaMAAVx5f
 g0X8EUKLmRa7f1EAH6E9lVNxNZtVGbh6cyqE3gggZd3mUSOSzNsNVIVu9USSo0Yh
 lBwdxB1qn+ik98ZNDlYe
 =uVz7
 -----END PGP SIGNATURE-----

Merge tag 'sound-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes, mostly with stable ones:

 - X32 ABI fix for PCM; likely not so many people suffer from it, but
   still better to fix

 - Two minor kernel warning fixes on USB audio devices spotted by
   syzkaller

 - Regression fix of echoaudio due to its inconsistent dimension

 - Fix for HBR support on Intel DP audio, on some recent chips

 - USB-audio quirk for yet another Plantronics devices

 - Fix for potential double-fetch in ASIHPI FIFO queue"

* tag 'sound-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usx2y: Suppress kernel warning at page allocation failures
  Revert "ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members"
  ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
  ALSA: pcm: Fix structure definition for X32 ABI
  ALSA: usb-audio: Add sample rate quirk for Plantronics C310/C520-M
  ALSA: hda - program ICT bits to support HBR audio
  ALSA: asihpi: fix a potential double-fetch bug when copying puhm
  ALSA: compress: Remove unused variable
2017-10-05 10:39:29 -07:00
Linus Torvalds 77ede3a014 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID subsystem fixes from Jiri Kosina:

 - buffer management size fix for i2c-hid driver, from Adrian Salido

 - tool ID regression fixes for Wacom driver from Jason Gerecke

 - a few small assorted fixes and a few device ID additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  Revert "HID: multitouch: Support ALPS PTP stick with pid 0x120A"
  HID: hidraw: fix power sequence when closing device
  HID: wacom: Always increment hdev refcount within wacom_get_hdev_data
  HID: wacom: generic: Clear ABS_MISC when tool leaves proximity
  HID: wacom: generic: Send MSC_SERIAL and ABS_MISC when leaving prox
  HID: i2c-hid: allocate hid buffers for real worst case
  HID: rmi: Make sure the HID device is opened on resume
  HID: multitouch: Support ALPS PTP stick with pid 0x120A
  HID: multitouch: support buttons and trackpoint on Lenovo X1 Tab Gen2
  HID: wacom: Correct coordinate system of touchring and pen twist
  HID: wacom: Properly report negative values from Intuos Pro 2 Bluetooth
  HID: multitouch: Fix system-control buttons not working
  HID: add multi-input quirk for IDC6680 touchscreen
  HID: wacom: leds: Don't try to control the EKR's read-only LEDs
  HID: wacom: bits shifted too much for 9th and 10th buttons
2017-10-05 10:28:12 -07:00
Linus Torvalds 9a431ef962 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Check iwlwifi 9000 reorder buffer out-of-space condition properly,
    from Sara Sharon.

 2) Fix RCU splat in qualcomm rmnet driver, from Subash Abhinov
    Kasiviswanathan.

 3) Fix session and tunnel release races in l2tp, from Guillaume Nault
    and Sabrina Dubroca.

 4) Fix endian bug in sctp_diag_dump(), from Dan Carpenter.

 5) Several mlx5 driver fixes from the Mellanox folks (max flow counters
    cap check, invalid memory access in IPoIB support, etc.)

 6) tun_get_user() should bail if skb->len is zero, from Alexander
    Potapenko.

 7) Fix RCU lookups in inetpeer, from Eric Dumazet.

 8) Fix locking in packet_do_bund().

 9) Handle cb->start() error properly in netlink dump code, from Jason
    A. Donenfeld.

10) Handle multicast properly in UDP socket early demux code. From Paolo
    Abeni.

11) Several erspan bug fixes in ip_gre, from Xin Long.

12) Fix use-after-free in socket filter code, in order to handle the
    fact that listener lock is no longer taken during the three-way TCP
    handshake. From Eric Dumazet.

13) Fix infoleak in RTM_GETSTATS, from Nikolay Aleksandrov.

14) Fix tail call generation in x86-64 BPF JIT, from Alexei Starovoitov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  net: 8021q: skip packets if the vlan is down
  bpf: fix bpf_tail_call() x64 JIT
  net: stmmac: dwmac-rk: Add RK3128 GMAC support
  rndis_host: support Novatel Verizon USB730L
  net: rtnetlink: fix info leak in RTM_GETSTATS call
  socket, bpf: fix possible use after free
  mlxsw: spectrum_router: Track RIF of IPIP next hops
  mlxsw: spectrum_router: Move VRF refcounting
  net: hns3: Fix an error handling path in 'hclge_rss_init_hw()'
  net: mvpp2: Fix clock resource by adding an optional bus clock
  r8152: add Linksys USB3GIGV1 id
  l2tp: fix l2tp_eth module loading
  ip_gre: erspan device should keep dst
  ip_gre: set tunnel hlen properly in erspan_tunnel_init
  ip_gre: check packet length and mtu correctly in erspan_xmit
  ip_gre: get key from session_id correctly in erspan_rcv
  tipc: use only positive error codes in messages
  ppp: fix __percpu annotation
  udp: perform source validation for mcast early demux
  IPv4: early demux can return an error code
  ...
2017-10-05 08:40:09 -07:00
David S. Miller 4b54db1375 Merge branch 'bpftool'
Jakub Kicinski says:

====================
tools: add bpftool

This set adds bpftool to the tools/ directory.  The first
patch renames tools/net to tools/bpf, the second one adds
the new code, while the third adds simple documentation.

v4:
 - rename docs *.txt -> *.rst (Jesper).
v3:
 - address Alexei's comments about output and docs.
v2:
 - report names, map ids, load time, uid;
 - add docs/man pages;
 - general cleanups & fixes.
====================

Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:46:56 -07:00
Jakub Kicinski ff69c21a85 tools: bpftool: add documentation
Add documentation for bpftool.  Separate files for each subcommand.
Use rst format.  Documentation is compiled into man pages using
rst2man.

Signed-off-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:46:14 -07:00
Jakub Kicinski 71bb428fe2 tools: bpf: add bpftool
Add a simple tool for querying and updating BPF objects on the system.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:45:06 -07:00
Jakub Kicinski a92bb546cf tools: rename tools/net directory to tools/bpf
We currently only have BPF tools in the tools/net directory.
We are about to add more BPF tools there, not necessarily
networking related, rename the directory and related Makefile
targets to bpf.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:45:05 -07:00
David S. Miller c331501c88 Merge branch 'enslavement-extack'
David Ahern says:

====================
net: Plumb extack error reporting to enslavements

Another round of extending extack error reporting, this time for
enslavements through ndo_add_slave and notifiers.

v2
- changed how the messages are added to bonding driver per Jiri's request
- fixed spectrum message for LAG overflow per Ido's comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:34 -07:00
David Ahern e58376e1df mlxsw: spectrum: Add extack messages for enslave failures
mlxsw fails device enslavement for a number of reasons. Use the extack
facility to return an error message to the user stating why the enslave
is failing.

Messages are prefixed with "spectrum" so users know it is a constraint
imposed by the hardware driver. For example:
    $ ip li add br0.11 link br0 type vlan id 11
    $ ip li set swp11 master br0
    Error: spectrum: Enslaving a port to a device that already has an upper device is not supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:34 -07:00
David Ahern ca752be006 net: bridge: Pass extack to down to netdev_master_upper_dev_link
Pass extack arg to br_add_if. Add messages for a couple of failures
and pass arg to netdev_master_upper_dev_link.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:34 -07:00
David Ahern 759088bda2 net: bonding: Add extack messages for some enslave failures
A number of bond_enslave errors are logged using the netdev_err API.
Return those messages to userspace via the extack facility.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:34 -07:00