Commit graph

9085 commits

Author SHA1 Message Date
David S. Miller 6f3de75cdf We have a number of changes
* code cleanups and fixups as usual
  * AQL & internal TXQ improvements from Felix
  * some mesh 802.1X support bits
  * some injection improvements from Mathy of KRACK
    fame, so we'll see what this results in ;-)
  * some more initial S1G supports bits, this time
    (some of?) the userspace APIs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl8kTFUACgkQB8qZga/f
 l8SQdQ/+MfiHePQzCFcnlXCzBa6wwaPQz5XbbWMS+1yVN5WECHomtF8Hb9YA5ITd
 Iw6HXcDzYk1bt1EqIPAQNQ0t2VfEvC+HtQKxL1LAXKYrlpD4JoL7hU2FK6CPA6/e
 h8qwrrhOeexeen+0r6LEslxFuVKSbXuBEIszsWkQNmUOoCFWdvbyD7kPNfoQ/wcu
 1TYR51Z7TiiyadptP5R0PSxlid1N1qSbnQ9Rq4ow4lfEwCfNb0ksv3eb6paQOeGX
 b5MIDpZ9FHVhWAMjzt7b7KgbmPreoHbZ+bSv0YgR+g32yLIpcH3wRuPH/eTIV1uG
 xTQtvHjnOtF7HTJzAzaCY4RH0ozf0rwTVwtcUxl/qVcV/JYXLKXnTOrAidj06gTr
 Ic5eaUaFALRX82wnmQbjkIezolmdQXom2VEg1yPT1azXnhi/bWQlDW3XcSMAfd8o
 E+HiTaGo8jD+tll1kvh5sLokecnmnKFgo5dPqlVOQ5qbJKOiLC/3g7/yjSdCKauG
 COQN0b1t+lHuvu8I2tdp+S1YI1sCES6OdHGKdQQiem5D66mUBaiYN+YuqGnvRF75
 8ZU9H5Mn9YRT2n6KEGsVfSnrOeqN15ZTRKepkwYgRzPtAGmVY+iNWwhE6I/clQgQ
 pt9jzq4jZsFMSO1flBYrFUy+fB4IYYki+VBSk5RFq3Yr8ElN4J0=
 =FR6T
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2020-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
We have a number of changes
 * code cleanups and fixups as usual
 * AQL & internal TXQ improvements from Felix
 * some mesh 802.1X support bits
 * some injection improvements from Mathy of KRACK
   fame, so we'll see what this results in ;-)
 * some more initial S1G supports bits, this time
   (some of?) the userspace APIs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31 18:51:40 -07:00
Roopa Prabhu 829eb208e8 rtnetlink: add support for protodown reason
netdev protodown is a mechanism that allows protocols to
hold an interface down. It was initially introduced in
the kernel to hold links down by a multihoming protocol.
There was also an attempt to introduce protodown
reason at the time but was rejected. protodown and protodown reason
is supported by almost every switching and routing platform.
It was ok for a while to live without a protodown reason.
But, its become more critical now given more than
one protocol may need to keep a link down on a system
at the same time. eg: vrrp peer node, port security,
multihoming protocol. Its common for Network operators and
protocol developers to look for such a reason on a networking
box (Its also known as errDisable by most networking operators)

This patch adds support for link protodown reason
attribute. There are two ways to maintain protodown
reasons.
(a) enumerate every possible reason code in kernel
    - A protocol developer has to make a request and
      have that appear in a certain kernel version
(b) provide the bits in the kernel, and allow user-space
(sysadmin or NOS distributions) to manage the bit-to-reasonname
map.
	- This makes extending reason codes easier (kind of like
      the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)

This patch takes approach (b).

a few things about the patch:
- It treats the protodown reason bits as counter to indicate
active protodown users
- Since protodown attribute is already an exposed UAPI,
the reason is not enforced on a protodown set. Its a no-op
if not used.
the patch follows the below algorithm:
  - presence of reason bits set indicates protodown
    is in use
  - user can set protodown and protodown reason in a
    single or multiple setlink operations
  - setlink operation to clear protodown, will return -EBUSY
    if there are active protodown reason bits
  - reason is not included in link dumps if not used

example with patched iproute2:
$cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ip link set dev vxlan0 protodown on protodown_reason vrrp on
$ip link set dev vxlan0 protodown_reason mlag on
$ip link show
14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>

$ip link set dev vxlan0 protodown_reason mlag off
$ip link set dev vxlan0 protodown off protodown_reason vrrp off

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31 18:49:16 -07:00
Yousuk Seung 48040793fa tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS
This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
the earliest departure time(EDT) of the timestamped skb. By tracking EDT
values of the skb from different timestamps, we can observe when and how
much the value changed. This allows to measure the precise delay
injected on the sender host e.g. by a bpf-base throttler.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31 17:00:44 -07:00
Chung-Hsien Hsu f96622749a nl80211: support 4-way handshake offloading for WPA/WPA2-PSK in AP mode
Let drivers advertise support for AP-mode WPA/WPA2-PSK 4-way handshake
offloading with a new NL80211_EXT_FEATURE_4WAY_HANDSHAKE_AP_PSK flag.

Extend use of NL80211_ATTR_PMK attribute indicating it might be passed
as part of NL80211_CMD_START_AP command, and contain the PSK (which is
the PMK, hence the name).

The driver is assumed to handle the 4-way handshake by itself in this
case, instead of relying on userspace.

Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Link: https://lore.kernel.org/r/20200623134938.39997-2-chi-hsien.lin@cypress.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:27:02 +02:00
Veerendranath Jakkam fd17dba1c8 cfg80211: Add support to advertize OCV support
Add a new feature flag that drivers can use to advertize support for
Operating Channel Validation (OCV) when using driver's SME for RSNA
handshakes.

Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Link: https://lore.kernel.org/r/20200720074225.8990-1-vjakkam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:26:59 +02:00
Markus Theil 1303a51c24 cfg80211/mac80211: add connected to auth server to station info
This patch adds the necessary bits to later query the auth server
flag for every peer from iw.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200611140238.427461-2-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:24:24 +02:00
Markus Theil 184eebe664 cfg80211/mac80211: add connected to auth server to meshconf
Besides information about num of peerings and gate connectivity,
the mesh formation byte also contains a flag for authentication
server connectivity, that currently cannot be set in the mesh conf.
This patch adds this capability, which is necessary to implement
802.1X authentication in mesh mode.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200611140238.427461-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:24:24 +02:00
Linus Lüssing e3718a6114 cfg80211/mac80211: add mesh_param "mesh_nolearn" to skip path discovery
Currently, before being able to forward a packet between two 802.11s
nodes, both a PLINK handshake is performed upon receiving a beacon and
then later a PREQ/PREP exchange for path discovery is performed on
demand upon receiving a data frame to forward.

When running a mesh protocol on top of an 802.11s interface, like
batman-adv, we do not need the multi-hop mesh routing capabilities of
802.11s and usually set mesh_fwding=0. However, even with mesh_fwding=0
the PREQ/PREP path discovery is still performed on demand. Even though
in this scenario the next hop PREQ/PREP will determine is always the
direct 11s neighbor node.

The new mesh_nolearn parameter allows to skip the PREQ/PREP exchange in
this scenario, leading to a reduced delay, reduced packet buffering and
simplifies HWMP in general.

mesh_nolearn is still rather conservative in that if the packet destination
is not a direct 11s neighbor, it will fall back to PREQ/PREP path
discovery.

For normal, multi-hop 802.11s mesh routing it is usually not advisable
to enable mesh_nolearn as a transmission to a direct but distant neighbor
might be worse than reaching that same node via a more robust /
higher throughput etc. multi-hop path.

Cc: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Linus Lüssing <ll@simonwunderlich.de>
Link: https://lore.kernel.org/r/20200617073034.26149-1-linus.luessing@c0d3.blue
[fix nl80211 policy to range 0/1 only]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:24:23 +02:00
Randy Dunlap 0f55c0c500 net/wireless: wireless.h: drop duplicate word in comments
Drop doubled word "threshold" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: linux-wireless@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20200715164325.9109-2-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:24:23 +02:00
Randy Dunlap 987021726f net/wireless: nl80211.h: drop duplicate words in comments
Drop doubled words in several comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: linux-wireless@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20200715164325.9109-1-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:24:22 +02:00
Thomas Pedersen df78a0c0b6 nl80211: S1G band and channel definitions
Gives drivers the definitions needed to advertise support
for S1G bands.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200602062247.23212-1-thomas@adapt-ip.com
Link: https://lore.kernel.org/r/20200731055636.795173-1-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-07-31 09:24:13 +02:00
David S. Miller 3c2d19cb8d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-07-30

Please note that I did the first time now --no-ff merges
of my testing branch into the master branch to include
the [PATCH 0/n] message of a patchset. Please let me
know if this is desirable, or if I should do it any
different.

1) Introduce a oseq-may-wrap flag to disable anti-replay
   protection for manually distributed ICVs as suggested
   in RFC 4303. From Petr Vaněk.

2) Patchset to fully support IPCOMP for vti4, vti6 and
   xfrm interfaces. From Xin Long.

3) Switch from a linear list to a hash list for xfrm interface
   lookups. From Eyal Birger.

4) Fixes to not register one xfrm(6)_tunnel object twice.
   From Xin Long.

5) Fix two compile errors that were introduced with the
   IPCOMP support for vti and xfrm interfaces.
   Also from Xin Long.

6) Make the policy hold queue work with VTI. This was
   forgotten when VTI was implemented.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30 14:39:31 -07:00
Murali Karicheri 8f4c0e0178 hsr: enhance netlink socket interface to support PRP
Parallel Redundancy Protocol (PRP) is another redundancy protocol
introduced by IEC 63439 standard. It is similar to HSR in many
aspects:-

 - Use a pair of Ethernet interfaces to created the PRP device
 - Use a 6 byte redundancy protocol part (RCT, Redundancy Check
   Trailer) similar to HSR Tag.
 - Has Link Redundancy Entity (LRE) that works with RCT to implement
   redundancy.

Key difference is that the protocol unit is a trailer instead of a
prefix as in HSR. That makes it inter-operable with tradition network
components such as bridges/switches which treat it as pad bytes,
whereas HSR nodes requires some kind of translators (Called redbox) to
talk to regular network devices. This features allows regular linux box
to be converted to a DAN-P box. DAN-P stands for Dual Attached Node - PRP
similar to DAN-H (Dual Attached Node - HSR).

Add a comment at the header/source code to explicitly state that the
driver files also handles PRP protocol as well.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:20:40 -07:00
David S. Miller a57066b1a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25 17:49:04 -07:00
Willem de Bruijn 01370434df icmp6: support rfc 4884
Extend the rfc 4884 read interface introduced for ipv4 in
commit eba75c587e ("icmp: support rfc 4884") to ipv6.

Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.

Changes v1->v2:
  - make ipv6_icmp_error_rfc4884 static (file scope)

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 17:12:41 -07:00
Ariel Levkovich 5923b8f7fa net/sched: cls_flower: Add hash info to flow classification
Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 15:23:31 -07:00
David S. Miller dee72f8a0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-21

The following pull-request contains BPF updates for your *net-next* tree.

We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).

The main changes are:

1) Run BPF program on socket lookup, from Jakub.

2) Introduce cpumap, from Lorenzo.

3) s390 JIT fixes, from Ilya.

4) teach riscv JIT to emit compressed insns, from Luke.

5) use build time computed BTF ids in bpf iter, from Yonghong.
====================

Purely independent overlapping changes in both filter.h and xdp.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 12:35:33 -07:00
Martin Varghese 4787dd582d bareudp: Reverted support to enable & disable rx metadata collection
The commit fe80536acf ("bareudp: Added attribute to enable & disable
rx metadata collection") breaks the the original(5.7) default behavior of
bareudp module to collect RX metadadata at the receive. It was added to
avoid the crash at the kernel neighbour subsytem when packet with metadata
from bareudp is processed. But it is no more needed as the
commit 394de110a7 ("net: Added pointer check for
dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash.

Fixes: fe80536acf ("bareudp: Added attribute to enable & disable rx metadata collection")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 18:30:47 -07:00
Vladimir Oltean b6bd41363a ptp: introduce a phase offset in the periodic output request
Some PHCs like the ocelot/felix switch cannot emit generic periodic
output, but just PPS (pulse per second) signals, which:
- don't start from arbitrary absolute times, but are rather
  phase-aligned to the beginning of [the closest next] second.
- have an optional phase offset relative to that beginning of the
  second.

For those, it was initially established that they should reject any
other absolute time for the PTP_PEROUT_REQUEST than 0.000000000 [1].

But when it actually came to writing an application [2] that makes use
of this functionality, we realized that we can't really deal generically
with PHCs that support absolute start time, and with PHCs that don't,
without an explicit interface. Namely, in an ideal world, PHC drivers
would ensure that the "perout.start" value written to hardware will
result in a functional output. This means that if the PTP time has
become in the past of this PHC's current time, it should be
automatically fast-forwarded by the driver into a close enough future
time that is known to work (note: this is necessary only if the hardware
doesn't do this fast-forward by itself). But we don't really know what
is the status for PHC drivers in use today, so in the general sense,
user space would be risking to have a non-functional periodic output if
it simply asked for a start time of 0.000000000.

So let's introduce a flag for this type of reduced-functionality
hardware, named PTP_PEROUT_PHASE. The start time is just "soon", the
only thing we know for sure about this signal is that its rising edge
events, Rn, occur at:

Rn = perout.phase + n * perout.period

The "phase" in the periodic output structure is simply an alias to the
"start" time, since both cannot logically be specified at the same time.
Therefore, the binary layout of the structure is not affected.

[1]: https://patchwork.ozlabs.org/project/netdev/patch/20200320103726.32559-7-yangbo.lu@nxp.com/
[2]: https://www.mail-archive.com/linuxptp-devel@lists.sourceforge.net/msg04142.html

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:22:56 -07:00
Vladimir Oltean f65b71aa25 ptp: add ability to configure duty cycle for periodic output
There are external event timestampers (PHCs with support for
PTP_EXTTS_REQUEST) that timestamp both event edges.

When those edges are very close (such as in the case of a short pulse),
there is a chance that the collected timestamp might be of the rising,
or of the falling edge, we never know.

There are also PHCs capable of generating periodic output with a
configurable duty cycle. This is good news, because we can space the
rising and falling edge out enough in time, that the risks to overrun
the 1-entry timestamp FIFO of the extts PHC are lower (example: the
perout PHC can be configured for a period of 1 second, and an "on" time
of 0.5 seconds, resulting in a duty cycle of 50%).

A flag is introduced for signaling that an on time is present in the
perout request structure, for preserving compatibility. Logically
speaking, the duty cycle cannot exceed 100% and the PTP core checks for
this.

PHC drivers that don't support this flag emit a periodic output of an
unspecified duty cycle, same as before.

The duty cycle is encoded as an "on" time, similar to the "start" and
"period" times, and reuses the reserved space while preserving overall
binary layout.

Pahole reported before:

struct ptp_perout_request {
        struct ptp_clock_time start;                     /*     0    16 */
        struct ptp_clock_time period;                    /*    16    16 */
        unsigned int               index;                /*    32     4 */
        unsigned int               flags;                /*    36     4 */
        unsigned int               rsv[4];               /*    40    16 */

        /* size: 56, cachelines: 1, members: 5 */
        /* last cacheline: 56 bytes */
};

And now:

struct ptp_perout_request {
        struct ptp_clock_time start;                     /*     0    16 */
        struct ptp_clock_time period;                    /*    16    16 */
        unsigned int               index;                /*    32     4 */
        unsigned int               flags;                /*    36     4 */
        union {
                struct ptp_clock_time on;                /*    40    16 */
                unsigned int       rsv[4];               /*    40    16 */
        };                                               /*    40    16 */

        /* size: 56, cachelines: 1, members: 5 */
        /* last cacheline: 56 bytes */
};

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:22:56 -07:00
Willem de Bruijn eba75c587e icmp: support rfc 4884
Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an
extension struct if present.

ICMP messages may include an extension structure after the original
datagram. RFC 4884 standardized this behavior. It stores the offset
in words to the extension header in u8 icmphdr.un.reserved[1].

The field is valid only for ICMP types destination unreachable, time
exceeded and parameter problem, if length is at least 128 bytes and
entire packet does not exceed 576 bytes.

Return the offset to the start of the extension struct when reading an
ICMP error from the error queue, if it matches the above constraints.

Do not return the raw u8 field. Return the offset from the start of
the user buffer, in bytes. The kernel does not return the network and
transport headers, so subtract those.

Also validate the headers. Return the offset regardless of validation,
as an invalid extension must still not be misinterpreted as part of
the original datagram. Note that !invalid does not imply valid. If
the extension version does not match, no validation can take place,
for instance.

For backward compatibility, make this optional, set by setsockopt
SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see
github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c

For forward compatibility, reserve only setsockopt value 1, leaving
other bits for additional icmp extensions.

Changes
  v1->v2:
  - convert word offset to byte offset from start of user buffer
    - return in ee_data as u8 may be insufficient
  - define extension struct and object header structs
  - return len only if constraints met
  - if returning len, also validate

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:20:22 -07:00
Christoph Hellwig 55db9c0e85 net: remove compat_sys_{get,set}sockopt
Now that the ->compat_{get,set}sockopt proto_ops methods are gone
there is no good reason left to keep the compat syscalls separate.

This fixes the odd use of unsigned int for the compat_setsockopt
optlen and the missing sock_use_custom_sol_socket.

It would also easily allow running the eBPF hooks for the compat
syscalls, but such a large change in behavior does not belong into
a consolidation patch like this one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Michael Walle c4471ad9a5 net: phy: add USXGMII link partner ability constants
The constants are taken from the USXGMII Singleport Copper Interface
specification. The naming are based on the SGMII ones, but with an MDIO_
prefix.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:05:49 -07:00
Jakub Sitnicki e9ddbb7707 bpf: Introduce SK_LOOKUP program type with a dedicated attach point
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, on fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
Priyaranjan Jha e3a5a1e8b6 tcp: add SNMP counter for no. of duplicate segments reported by DSACK
There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv,
which are incremented depending on whether the DSACKed range is below
the cumulative ACK sequence number or not. Unfortunately, these both
implicitly assume each DSACK covers only one segment. This makes these
counters unusable for estimating spurious retransmit rates,
or real/non-spurious loss rate.

This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks
the estimated number of duplicate segments based on:
(DSACKed sequence range) / MSS. This counter is usable for estimating
spurious retransmit rates, or real/non-spurious loss rate.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:54:30 -07:00
Randy Dunlap bfdfa51702 bpf: Drop duplicated words in uapi helper comments
Drop doubled words "will" and "attach".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org
2020-07-16 21:00:09 +02:00
Linus Torvalds 3e543a4d30 Char/Misc fixes for 5.8-rc6
Here are number of small char/misc driver fixes for 5.8-rc6
 
 Not that many complex fixes here, just a number of small fixes for
 reported issues, and some new device ids.  Nothing fancy.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXxBvPg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynJEQCfTmYNFFZ7WTx1FNsG/ScZLvgEC/QAnA19tK48
 HBVDaLxodkGEa24u582V
 =EcB/
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc into master

Pull char/misc fixes from Greg KH:
 "Here are number of small char/misc driver fixes for 5.8-rc6

  Not that many complex fixes here, just a number of small fixes for
  reported issues, and some new device ids. Nothing fancy.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  virtio: virtio_console: add missing MODULE_DEVICE_TABLE() for rproc serial
  intel_th: Fix a NULL dereference when hub driver is not loaded
  intel_th: pci: Add Emmitsburg PCH support
  intel_th: pci: Add Tiger Lake PCH-H support
  intel_th: pci: Add Jasper Lake CPU support
  virt: vbox: Fix guest capabilities mask check
  virt: vbox: Fix VBGL_IOCTL_VMMDEV_REQUEST_BIG and _LOG req numbers to match upstream
  uio_pdrv_genirq: fix use without device tree and no interrupt
  uio_pdrv_genirq: Remove warning when irq is not specified
  coresight: etmv4: Fix CPU power management setup in probe() function
  coresight: cti: Fix error handling in probe
  Revert "zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()"
  mei: bus: don't clean driver pointer
  misc: atmel-ssc: lock with mutex instead of spinlock
  phy: sun4i-usb: fix dereference of pointer phy0 before it is null checked
  phy: rockchip: Fix return value of inno_dsidphy_probe()
  phy: ti: j721e-wiz: Constify structs
  phy: ti: am654-serdes: Constify regmap_config
  phy: intel: fix enum type mismatch warning
  phy: intel: Fix compilation error on FIELD_PREP usage
  ...
2020-07-16 11:26:40 -07:00
Lorenzo Bianconi 9216477449 bpf: cpumap: Add the possibility to attach an eBPF program to cpumap
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.

This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:

$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
Lorenzo Bianconi 644bfe51fa cpumap: Formalize map value as a named struct
As it has been already done for devmap, introduce 'struct bpf_cpumap_val'
to formalize the expected values that can be passed in for a CPUMAP.
Update cpumap code to use the struct.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
Randy Dunlap c201324b54 net: caif: drop duplicate words in comments
Drop doubled words "or" and "the" in several comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-07-15 20:34:11 -07:00
Linus Torvalds 0665a4e9a1 dmaengine fixes for v5.5-rc6
- Update dmaengine tree location to k.org
  - dmatest fix for completing threads
  - Driver fixes for:
    - k3dma
    - fsl-dma
    - idxd
    - tegra
    - and few other in bunch of other drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl8Oy/IACgkQfBQHDyUj
 g0d73g/+PGhjOBwt+nb4e4Vo/cvXrkThfIe1xXb92ZFIh3AGRct6QPlpPrMautYJ
 i0xglxenmrOblr3OFaEwg5DmhV31UqCWN1U1aagNHe7daNEW0MZQ29dtpJnUT2+B
 MVBhrNcDPIjgWl8ujhO7y+5vkCWc2iGxNt7lgpE2lAelwmJztaxZiubDD9cVJOjb
 FOYPvQFrr4Rs3Xkg8yUOs1T7xbiL5R4xyGXQ12jfl0PCoAaQVi2JKholI9zs0j64
 4TYQjIixxFRxJd3b3Ml0u21IC514+NFlN91MNmlsNXstME4tE9emT9MHVznYPRgA
 cV1Fam9sRiX+zCLpktXaKws/MWTnBvgwx7ypbM7NCKBfOY85Ta7Xi47wixFwcQBB
 PjjloTNIINq3/ye3YEaI4ia8TxmDIUiovvAaYXmgdmQd3fUs+B/iHRGfXOHpTVVM
 sc0S8Q+alZ4Zj/qjz3VmfyCGp2ppDeFC5JcpDYsv0aTkUqpRZEdlZB6/Nh2IysLT
 2U4BFDnkuW5g5Was7hiWtdynqgEPCQFYe69azBQjh+ehj+cLOjPyQxP2yvKJBT5j
 O5sOb7CYTFu3TMH0bAFCbChsWRsK7hhYA+tjoC06Qfj4xm8QXJyGhJXgr60ob+9d
 5rKRY/AxdnR7dHALxICuUktj2q8YzZlCy2Oln0q0abJ4R4KyUm4=
 =F+IX
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine into master

Pull dmaengine fixes from Vinod Koul:

 - update dmaengine tree location to kernel.org

 - dmatest fix for completing threads

 - driver fixes for k3dma, fsl-dma, idxd, ,tegra, and few other drivers

* tag 'dmaengine-fix-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (21 commits)
  dmaengine: ioat setting ioat timeout as module parameter
  dmaengine: fsl-edma: fix wrong tcd endianness for big-endian cpu
  dmaengine: dmatest: stop completed threads when running without set channel
  dmaengine: fsl-edma-common: correct DSIZE_32BYTE
  dmaengine: dw: Initialize channel before each transfer
  dmaengine: idxd: fix misc interrupt handler thread unmasking
  dmaengine: idxd: cleanup workqueue config after disabling
  dmaengine: tegra210-adma: Fix runtime PM imbalance on error
  dmaengine: mcf-edma: Fix NULL pointer exception in mcf_edma_tx_handler
  dmaengine: fsl-edma: Fix NULL pointer exception in fsl_edma_tx_handler
  dmaengine: fsl-edma: Add lockdep assert for exported function
  dmaengine: idxd: fix hw descriptor fields for delta record
  dmaengine: ti: k3-udma: add missing put_device() call in of_xudma_dev_get()
  dmaengine: sh: usb-dmac: set tx_result parameters
  dmaengine: ti: k3-udma: Fix delayed_work usage for tx drain workaround
  dmaengine: idxd: fix cdev locking for open and release
  dmaengine: imx-sdma: Fix: Remove 'always true' comparison
  MAINTAINERS: switch dmaengine tree to kernel.org
  dmaengine: ti: k3-udma: Fix the running channel handling in alloc_chan_resources
  dmaengine: ti: k3-udma: Fix cleanup code for alloc_chan_resources
  ...
2020-07-15 15:58:11 -07:00
Horatiu Vultur ffb3adba64 net: bridge: Add port attribute IFLA_BRPORT_MRP_IN_OPEN
This patch adds a new port attribute, IFLA_BRPORT_MRP_IN_OPEN, which
allows to notify the userspace when the node lost the contiuity of
MRP_InTest frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:43 -07:00
Horatiu Vultur 559139cb04 bridge: uapi: mrp: Extend MRP_INFO attributes for interconnect status
Extend the existing MRP_INFO to return status of MRP interconnect. In
case there is no MRP interconnect on the node then the role will be
disabled so the other attributes can be ignored.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:43 -07:00
Horatiu Vultur 2801758391 bridge: uapi: mrp: Extend MRP attributes for MRP interconnect
Extend the existing MRP netlink attributes to allow to configure MRP
Interconnect:

IFLA_BRIDGE_MRP_IN_ROLE - the parameter type is br_mrp_in_role which
  contains the interconnect id, the ring id, the interconnect role(MIM
  or MIC) and the port ifindex that represents the interconnect port.

IFLA_BRIDGE_MRP_IN_STATE - the parameter type is br_mrp_in_state which
  contains the interconnect id and the interconnect state.

IFLA_BRIDGE_MRP_IN_TEST - the parameter type is br_mrp_start_in_test
  which contains the interconnect id, the interval at which to send
  MRP_InTest frames, how many test frames can be missed before declaring
  the interconnect ring open and the period which represents for how long
  to send MRP_InTest frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:42 -07:00
Linus Torvalds e9919e11e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "A few quirks for the Elan touchpad driver, another Thinkpad is being
  switched over from PS/2 to native RMI4 interface, and we gave a brand
  new SW_MACHINE_COVER switch definition"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - add more hardware ID for Lenovo laptops
  Input: i8042 - add Lenovo XiaoXin Air 12 to i8042 nomux list
  Revert "Input: elants_i2c - report resolution information for touch major"
  Input: elan_i2c - only increment wakeup count on touch
  Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen
  ARM: dts: n900: remove mmc1 card detect gpio
  Input: add `SW_MACHINE_COVER`
2020-07-13 18:31:15 -07:00
David S. Miller 07dd1b7e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).

The main changes are:

1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
   its own tracing event, from Alan.

2) Better libbpf support on older kernels, from Andrii.

3) Additional AF_XDP stats, from Ciara.

4) build time resolution of BTF IDs, from Jiri.

5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 18:04:05 -07:00
Alexander A. Klimov ed757328c3 atm: Replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:01:44 -07:00
Ciara Loftus 0d80cb4612 xsk: Add xdp statistics to xsk_diag
Add xdp statistics to the information dumped through the xsk_diag interface

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200708072835.4427-4-ciara.loftus@intel.com
2020-07-13 15:32:56 -07:00
Ciara Loftus 8aa5a33578 xsk: Add new statistics
It can be useful for the user to know the reason behind a dropped packet.
Introduce new counters which track drops on the receive path caused by:
1. rx ring being full
2. fill ring being empty

Also, on the tx path introduce a counter which tracks the number of times
we attempt pull from the tx ring when it is empty.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200708072835.4427-2-ciara.loftus@intel.com
2020-07-13 15:32:56 -07:00
David S. Miller 71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Linus Torvalds 5a764898af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
    BPF programs, from Maciej Żenczykowski.

 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
    Mariappan.

 3) Slay memory leak in nl80211 bss color attribute parsing code, from
    Luca Coelho.

 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.

 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
    Dumazet.

 6) xsk code dips too deeply into DMA mapping implementation internals.
    Add dma_need_sync and use it. From Christoph Hellwig

 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.

 8) Check for disallowed attributes when loading flow dissector BPF
    programs. From Lorenz Bauer.

 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
    Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it. From Alex Elder.

11) Resolve several issues in TCP MD5 signature support. Missing memory
    barriers, bogus options emitted when using syncookies, and failure
    to allow md5 key changes in established states. All from Eric
    Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast code,
    from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation. Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols. From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  mlxsw: pci: Fix use-after-free in case of failed devlink reload
  mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
  net: macb: fix call to pm_runtime in the suspend/resume functions
  net: macb: fix macb_suspend() by removing call to netif_carrier_off()
  net: macb: fix macb_get/set_wol() when moving to phylink
  net: macb: mark device wake capable when "magic-packet" property present
  net: macb: fix wakeup test in runtime suspend/resume routines
  bnxt_en: fix NULL dereference in case SR-IOV configuration fails
  libbpf: Fix libbpf hashmap on (I)LP32 architectures
  net/mlx5e: CT: Fix memory leak in cleanup
  net/mlx5e: Fix port buffers cell size value
  net/mlx5e: Fix 50G per lane indication
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix usage of rcu-protected pointer
  net/mxl5e: Verify that rpriv is not NULL
  net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  net/mlx5: Fix eeprom support for SFP module
  cgroup: Fix sock_cgroup_data on big-endian.
  selftests: bpf: Fix detach from sockmap tests
  ...
2020-07-10 18:16:22 -07:00
Jakub Kicinski c7d759eb7b ethtool: add tunnel info interface
Add an interface to report offloaded UDP ports via ethtool netlink.

Now that core takes care of tracking which UDP tunnel ports the NICs
are aware of we can quite easily export this information out to
user space.

The responsibility of writing the netlink dumps is split between
ethtool code and udp_tunnel_nic.c - since udp_tunnel module may
not always be loaded, yet we should always report the capabilities
of the NIC.

$ ethtool --show-tunnels eth0
Tunnel information for eth0:
  UDP port table 0:
    Size: 4
    Types: vxlan
    No entries
  UDP port table 1:
    Size: 4
    Types: geneve, vxlan-gpe
    Entries (1):
        port 1230, vxlan-gpe

v4:
 - back to v2, build fix is now directly in udp_tunnel.h
v3:
 - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET
   not set.
v2:
 - fix string set count,
 - reorder enums in the uAPI,
 - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset
   in docs and comments.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 13:54:00 -07:00
Linus Torvalds a581387e41 io_uring-5.8-2020-07-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl8IkFIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgph7tEACdkJ9CupSjVQBJ35/2wBiUCU6lh0iUbru3
 oURuw9AdjzT75en68LOyrhfdGCkNX8xrIoHnh9pGukoPid//R3rAsk+5j4Doc7SY
 FBZuH7HbOC/gdTAzSd0tLyaddmf06eq0PPmlWQGUYu29Vy49nvLDM2V+455OgkT9
 csCNk/Fq/np7IKVZW56KZ1Iz90Dtp8BVwFJeFdzFK5cpCjKXTyvzS/5GEFqz0rc3
 AwvxkBCpDjXQqh5OOs2qM18QUeJAQV0PX1x+XjtASMB3dr4W8D5KDEhaQwiXiGOT
 j7yHX/JThtOYI41IiKpF/LP9EQAkavIT1n7ZjXwablq6ecQjj24DxCe0h1uwnKyW
 G5Hztjcb2YMhhDVj4q6m/uzD13+ccIeEmVoKjRuGj+0YDFETIBYK8U4GElu7VUIr
 yUKqy7jFEHKFZj5eQz5JDl08+zq/ocZHJctqZhCB3DWMbE5f+VCzNGEqGGs2IiRS
 J7/S8jB1j7Te8jFXBN3uaNpuj+U/JlYCPqO83fxF14ar3wZ1mblML4cw0sXUJazO
 1YI3LU91qBeCNK/9xUc43MzzG/SgdIKJjfoaYYPBPMG5SzVYAtRoUUiNf942UE1k
 7u8/leql+RqwSLedjU/AQqzs29p7wcK8nzFBApmTfiiBHYcRbJ6Fxp3/UwTOR3tQ
 kwmkUtivtQ==
 =31iz
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Fix memleak for error path in registered files (Yang)

 - Export CQ overflow state in flags, necessary to fix a case where
   liburing doesn't know if it needs to enter the kernel (Xiaoguang)

 - Fix for a regression in when user memory is accounted freed, causing
   issues with back-to-back ring exit + init if the ulimit -l setting is
   very tight.

* tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block:
  io_uring: account user memory freed when exit has been queued
  io_uring: fix memleak in io_sqe_files_register()
  io_uring: fix memleak in __io_sqe_files_update()
  io_uring: export cq overflow status to userspace
2020-07-10 09:57:57 -07:00
Hans de Goede f794db6841 virt: vbox: Fix VBGL_IOCTL_VMMDEV_REQUEST_BIG and _LOG req numbers to match upstream
Until this commit the mainline kernel version (this version) of the
vboxguest module contained a bug where it defined
VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG using
_IOC(_IOC_READ | _IOC_WRITE, 'V', ...) instead of
_IO(V, ...) as the out of tree VirtualBox upstream version does.

Since the VirtualBox userspace bits are always built against VirtualBox
upstream's headers, this means that so far the mainline kernel version
of the vboxguest module has been failing these 2 ioctls with -ENOTTY.
I guess that VBGL_IOCTL_VMMDEV_REQUEST_BIG is never used causing us to
not hit that one and sofar the vboxguest driver has failed to actually
log any log messages passed it through VBGL_IOCTL_LOG.

This commit changes the VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG
defines to match the out of tree VirtualBox upstream vboxguest version,
while keeping compatibility with the old wrong request defines so as
to not break the kernel ABI in case someone has been using the old
request defines.

Fixes: f6ddd094f5 ("virt: Add vboxguest driver for Virtual Box Guest integration UAPI")
Cc: stable@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-2-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 13:40:19 +02:00
Danielle Ratson a0f49b5486 devlink: Add a new devlink port split ability attribute and pass to netlink
Add a new attribute that indicates the split ability of devlink port.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:30 -07:00
Danielle Ratson a21cf0a833 devlink: Add a new devlink port lanes attribute and pass to netlink
Add a new devlink port attribute that indicates the port's number of lanes.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

The attribute is not passed to user space in case the number of lanes is
invalid (0).

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Paolo Abeni ac3b45f609 mptcp: add MPTCP socket diag interface
exposes basic inet socket attribute, plus some MPTCP socket
fields comprising PM status and MPTCP-level sequence numbers.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:38:41 -07:00
Paolo Abeni 3f935c75eb inet_diag: support for wider protocol numbers
After commit bf9765145b ("sock: Make sk_protocol a 16-bit value")
the current size of 'sdiag_protocol' is not sufficient to represent
the possible protocol values.

This change introduces a new inet diag request attribute to let
user space specify the relevant protocol number using u32 values.

The attribute is parsed by inet diag core on get/dump command
and the extended protocol value, if available, is preferred to
'sdiag_protocol' to lookup the diag handler.

The parse attributed are exposed to all the diag handlers via
the cb->data.

Note that inet_diag_dump_one_icsk() is left unmodified, as it
will not be used by protocol using the extended attribute.

Suggested-by: David S. Miller <davem@davemloft.net>
Co-developed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:38:41 -07:00
Xiaoguang Wang 6d5f904904 io_uring: export cq overflow status to userspace
For those applications which are not willing to use io_uring_enter()
to reap and handle cqes, they may completely rely on liburing's
io_uring_peek_cqe(), but if cq ring has overflowed, currently because
io_uring_peek_cqe() is not aware of this overflow, it won't enter
kernel to flush cqes, below test program can reveal this bug:

static void test_cq_overflow(struct io_uring *ring)
{
        struct io_uring_cqe *cqe;
        struct io_uring_sqe *sqe;
        int issued = 0;
        int ret = 0;

        do {
                sqe = io_uring_get_sqe(ring);
                if (!sqe) {
                        fprintf(stderr, "get sqe failed\n");
                        break;;
                }
                ret = io_uring_submit(ring);
                if (ret <= 0) {
                        if (ret != -EBUSY)
                                fprintf(stderr, "sqe submit failed: %d\n", ret);
                        break;
                }
                issued++;
        } while (ret > 0);
        assert(ret == -EBUSY);

        printf("issued requests: %d\n", issued);

        while (issued) {
                ret = io_uring_peek_cqe(ring, &cqe);
                if (ret) {
                        if (ret != -EAGAIN) {
                                fprintf(stderr, "peek completion failed: %s\n",
                                        strerror(ret));
                                break;
                        }
                        printf("left requets: %d\n", issued);
                        continue;
                }
                io_uring_cqe_seen(ring, cqe);
                issued--;
                printf("left requets: %d\n", issued);
        }
}

int main(int argc, char *argv[])
{
        int ret;
        struct io_uring ring;

        ret = io_uring_queue_init(16, &ring, 0);
        if (ret) {
                fprintf(stderr, "ring setup failed: %d\n", ret);
                return 1;
        }

        test_cq_overflow(&ring);
        return 0;
}

To fix this issue, export cq overflow status to userspace by adding new
IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as
io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08 19:17:06 -06:00
Meir Lichtinger 065e0d42a0 ethtool: Add support for 100Gbps per lane link modes
Define 100G, 200G and 400G link modes using 100Gbps per lane

LR, ER and FR are defined as a single link mode because they are
using same technology and by design are fully interoperable.
EEPROM content indicates if the module is LR, ER, or FR, and the
user space ethtool decoder is planned to support decoding these
modes in the EEPROM.

Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:30:42 -07:00