Commit graph

1062075 commits

Author SHA1 Message Date
Toke Høiland-Jørgensen 35b2e54989 page_pool: Add callback to init pages when they are allocated
Add a new callback function to page_pool that, if set, will be called every
time a new page is allocated. This will be used from bpf_test_run() to
initialise the page data with the data provided by userspace when running
XDP programs with redirect turned on.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220103150812.87914-3-toke@redhat.com
2022-01-05 19:46:32 -08:00
Toke Høiland-Jørgensen 4a48ef70b9 xdp: Allow registering memory model without rxq reference
The functions that register an XDP memory model take a struct xdp_rxq as
parameter, but the RXQ is not actually used for anything other than pulling
out the struct xdp_mem_info that it embeds. So refactor the register
functions and export variants that just take a pointer to the xdp_mem_info.

This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a
page_pool instance that is not connected to any network device.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com
2022-01-05 19:46:32 -08:00
Alexei Starovoitov 640a171c93 Merge branch 'samples/bpf: xdpsock app enhancements'
Ong Boon says:

====================

First of all, sorry for taking more time to get back to this series and
thanks to all valuble feedback in series-1 at [1] from Jesper and Song
Liu.

Since then I have looked into what Jesper suggested in [2] and worked on
revising the patch series into several patches for ease of review:

v1->v2:
1/7: [No change]. Add VLAN tag (ID & Priority) to the generated Tx-Only
     frames.

2/7: [No change]. Add DMAC and SMAC setting to the generated Tx-Only
     frames. If parameters are not set, previous DMAC and SMAC are used.

3/7: [New]. Add support for selecting different CLOCK for clock_gettime()
     used in get_nsecs.

4/7: [New]. This is a total rework from series-1 3/4-patch [3]. It uses
     clock_nanosleep() suggested by Jesper. In addition, added statistic
     for Tx schedule variance under application stat (-a|--app-stats).
     Make the cyclic Tx operation and --poll mode to be mutually-
     exclusive. Still, the ability to specify TX cycle time and used
     together with batch size and packet count remain the same.

5/7: [New]. Add the support for TX process schedule policy and priority
     setting. By default, SCHED_OTHER policy is used. This too is matching
     the schedule policy setting in [2].

6/7: [Change]. This is update from series-1 4/4-patch [4]. Added TX clean
     process time-out in 1s granularity with configurable retries count
     (-O|--retries).

7/7: [New]. Added timestamp for TX packet following pktgen_hdr format
     matching the implementation in [2]. However, the sequence ID remains
     the same as it is instead of process schedule diff in [2].

To summarize on what program options have been added with v2 series
using an example below:-

 DMAC (-G)                 = fa:8d:f1:e2:0b:e8
 SMAC (-H)                 = ce:17:07:17:3e:3a

 VLAN tagged (-V)
 VLAN ID (-J)              = 12
 VLAN Pri (-K)             = 3

 Tx Queue (-q)             = 3
 Cycle Time in us (-T)     = 1000
 Batch (-b)                = 2
 Packet Count              = 6
 Tx schedule policy (-W)   = FIFO
 Tx schedule priority (-U) = 50
 Clock selection (-w)      = REALTIME

 Tx timeout retries(-O)    = 5
 Tx timestamp (-y)
 Cyclic Tx schedule stat (-a)

Note: xdpsock sets UDP dest-port and src-port to 0x1000 as default.

 Sending Board
 =============
 $ xdpsock -i eth0 -t -N -z -H ce:17:07:17:3e:3a -G fa:8d:f1:e2:0b:e8 \
   -V -J 12 -K 3 -q 3 \
   -T 1000 -b 2 -C 6 -W FIFO -U 50 -w REALTIME \
   -O 5 -y -a

  sock0@eth0:3 txonly xdp-drv
                    pps            pkts           0.00
 rx                 0              0
 tx                 0              6

                    calls/s        count
 rx empty polls     0              0
 fill fail polls    0              0
 copy tx sendtos    0              0
 tx wakeup sendtos  0              5
 opt polls          0              0

                    period     min        ave        max        cycle
 Cyclic TX          1000000    31033      32009      33397      3

 Receiving Board
 ===============
 $ tcpdump -nei eth0 udp port 0x1000 -vv -Q in -X \
    --time-stamp-precision nano
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
03:46:40.520111580 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
    10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
        0x0000:  4500 002c 0000 0000 4011 527e 0a0a 0a10  E..,....@.R~....
        0x0010:  0a0a 0a20 1000 1000 0018 e997 be9b e955  ...............U
        0x0020:  0000 0000 61cd 2ba1 0006 987c            ....a.+....|
03:46:40.520112163 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
    10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
        0x0000:  4500 002c 0000 0000 4011 527e 0a0a 0a10  E..,....@.R~....
        0x0010:  0a0a 0a20 1000 1000 0018 e996 be9b e955  ...............U
        0x0020:  0000 0001 61cd 2ba1 0006 987c            ....a.+....|
03:46:40.521066860 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
    10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
        0x0000:  4500 002c 0000 0000 4011 527e 0a0a 0a10  E..,....@.R~....
        0x0010:  0a0a 0a20 1000 1000 0018 e5af be9b e955  ...............U
        0x0020:  0000 0002 61cd 2ba1 0006 9c62            ....a.+....b
03:46:40.521067012 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
    10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
        0x0000:  4500 002c 0000 0000 4011 527e 0a0a 0a10  E..,....@.R~....
        0x0010:  0a0a 0a20 1000 1000 0018 e5ae be9b e955  ...............U
        0x0020:  0000 0003 61cd 2ba1 0006 9c62            ....a.+....b
03:46:40.522061935 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
    10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
        0x0000:  4500 002c 0000 0000 4011 527e 0a0a 0a10  E..,....@.R~....
        0x0010:  0a0a 0a20 1000 1000 0018 e1c5 be9b e955  ...............U
        0x0020:  0000 0004 61cd 2ba1 0006 a04a            ....a.+....J
03:46:40.522062173 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
    10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
        0x0000:  4500 002c 0000 0000 4011 527e 0a0a 0a10  E..,....@.R~....
        0x0010:  0a0a 0a20 1000 1000 0018 e1c4 be9b e955  ...............U
        0x0020:  0000 0005 61cd 2ba1 0006 a04a            ....a.+....J

I have tested the above with both tagged and untagged packet format and
based on the timestamp in tcpdump found that the timing of the batch
cyclic transmission is correct.

Appreciate if community can give the patch series v2 a try and point out
any gap.

Thanks
Boon Leong

[1] https://patchwork.kernel.org/project/netdevbpf/cover/20211124091821.3916046-1-boon.leong.ong@intel.com/
[2] https://github.com/netoptimizer/network-testing/blob/master/src/udp_pacer.c
[3] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-4-boon.leong.ong@intel.com/
[4] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-5-boon.leong.ong@intel.com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05 17:53:25 -08:00
Ong Boon Leong eb68db45b7 samples/bpf: xdpsock: Add timestamp for Tx-only operation
It may be useful to add timestamp for Tx packets for continuous or cyclic
transmit operation. The timestamp and sequence ID of a Tx packet are
stored according to pktgen header format. To enable per-packet timestamp,
use -y|--tstamp option. If timestamp is off, pktgen header is not
included in the UDP payload. This means receiving side can use the magic
number for pktgen for differentiation.

The implementation supports both VLAN tagged and untagged option. By
default, the minimum packet size is set at 64B. However, if VLAN tagged
is on (-V), the minimum packet size is increased to 66B just so to fit
the pktgen_hdr size.

Added hex_dump() into the code path just for future cross-checking.
As before, simply change to "#define DEBUG_HEXDUMP 1" to inspect the
accuracy of TX packet.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-8-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Ong Boon Leong 8121e78932 samples/bpf: xdpsock: Add time-out for cleaning Tx
When user sets tx-pkt-count and in case where there are invalid Tx frame,
the complete_tx_only_all() process polls indefinitely. So, this patch
adds a time-out mechanism into the process so that the application
can terminate automatically after it retries 3*polling interval duration.

v1->v2:
 Thanks to Jesper's and Song Liu's suggestion.
 - clean-up git message to remove polling log
 - make the Tx time-out retries configurable with 1s granularity

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-7-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Ong Boon Leong fa24d0b1d5 samples/bpf: xdpsock: Add sched policy and priority support
By default, TX schedule policy is SCHED_OTHER (round-robin time-sharing).
To improve TX cyclic scheduling, we add SCHED_FIFO policy and its priority
by using -W FIFO or --policy=FIFO and -U <PRIO> or --schpri=<PRIO>.

A) From xdpsock --app-stats, for SCHED_OTHER policy:
   $ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a

                      period     min        ave        max        cycle
   Cyclic TX          1000000    53507      75334      712642     6250

B) For SCHED_FIFO policy and schpri=50:
   $ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a -W FIFO -U 50

                      period     min        ave        max        cycle
   Cyclic TX          1000000    3699       24859      54397      6250

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-6-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Ong Boon Leong fa0d27a1d5 samples/bpf: xdpsock: Add cyclic TX operation capability
Tx cycle time is in micro-seconds unit. By combining the batch size (-b M)
and Tx cycle time (-T|--tx-cycle N), xdpsock now can transmit batch-size of
packets every N-us periodically. Cyclic TX operation is not applicable if
--poll mode is used.

To transmit 16 packets every 1ms cycle time for total of 100000 packets
silently:
 $ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000

To print cyclic TX schedule variance stats, use --app-stats|-a:
 $ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000 -a

 sock0@eth0:0 txonly xdp-drv
                   pps            pkts           0.00
rx                 0              0
tx                 0              100000

                   calls/s        count
rx empty polls     0              0
fill fail polls    0              0
copy tx sendtos    0              0
tx wakeup sendtos  0              6254
opt polls          0              0

                   period     min        ave        max        cycle
Cyclic TX          1000000    53507      75334      712642     6250

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-5-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Ong Boon Leong 5a3882542a samples/bpf: xdpsock: Add clockid selection support
User specifies the clock selection by using -w CLOCK or --clock=CLOCK
where CLOCK=[REALTIME, TAI, BOOTTIME, MONOTONIC].

The default CLOCK selection is MONOTONIC.

The implementation of clock selection parsing is borrowed from
iproute2/tc/q_taprio.c

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-4-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Ong Boon Leong 6440a6c23f samples/bpf: xdpsock: Add Dest and Src MAC setting for Tx-only operation
To set Dest MAC address (-G|--tx-dmac) only:
 $ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff

To set Source MAC address (-H|--tx-smac) only:
 $ xdpsock -i eth0 -t -N -z -H 11:22:33:44:55:66

To set both Dest and Source MAC address:
 $ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff \
   -H 11:22:33:44:55:66

The default Dest and Source MAC address remain the same as before.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20211230035447.523177-3-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Ong Boon Leong 2741a0493c samples/bpf: xdpsock: Add VLAN support for Tx-only operation
In multi-queue environment testing, the support for VLAN-tag based
steering is useful. So, this patch adds the capability to add
VLAN tag (VLAN ID and Priority) to the generated Tx frame.

To set the VLAN ID=10 and Priority=2 for Tx only through TxQ=3:
 $ xdpsock -i eth0 -t -N -z -q 3 -V -J 10 -K 2

If VLAN ID (-J) and Priority (-K) is set, it default to
  VLAN ID = 1
  VLAN Priority = 0.

For example, VLAN-tagged Tx only, xdp copy mode through TxQ=1:
 $ xdpsock -i eth0 -t -N -c -q 1 -V

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211230035447.523177-2-boon.leong.ong@intel.com
2022-01-05 17:53:24 -08:00
Christy Lee 5f60826428 libbpf 1.0: Deprecate bpf_object__find_map_by_offset() API
API created with simplistic assumptions about BPF map definitions.
It hasn’t worked for a while, deprecate it in preparation for
libbpf 1.0.

  [0] Closes: https://github.com/libbpf/libbpf/issues/302

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220105003120.2222673-1-christylee@fb.com
2022-01-05 16:11:32 -08:00
Christy Lee 9855c131b9 libbpf 1.0: Deprecate bpf_map__is_offload_neutral()
Deprecate bpf_map__is_offload_neutral(). It’s most probably broken
already. PERF_EVENT_ARRAY isn’t the only map that’s not suitable
for hardware offloading. Applications can directly check map type
instead.

  [0] Closes: https://github.com/libbpf/libbpf/issues/306

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220105000601.2090044-1-christylee@fb.com
2022-01-05 16:09:06 -08:00
Qiang Wang 51a33c60f1 libbpf: Support repeated legacy kprobes on same function
If repeated legacy kprobes on same function in one process,
libbpf will register using the same probe name and got -EBUSY
error. So append index to the probe name format to fix this
problem.

Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211227130713.66933-2-wangqiang.wq.frank@bytedance.com
2022-01-05 15:38:21 -08:00
Qiang Wang 71cff670ba libbpf: Use probe_name for legacy kprobe
Fix a bug in commit 46ed5fc33d, which wrongly used the
func_name instead of probe_name to register legacy kprobe.

Fixes: 46ed5fc33d ("libbpf: Refactor and simplify legacy kprobe code")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20211227130713.66933-1-wangqiang.wq.frank@bytedance.com
2022-01-05 15:38:21 -08:00
Christy Lee 7218c28c87 libbpf: Deprecate bpf_perf_event_read_simple() API
With perf_buffer__poll() and perf_buffer__consume() APIs available,
there is no reason to expose bpf_perf_event_read_simple() API to
users. If users need custom perf buffer, they could re-implement
the function.

Mark bpf_perf_event_read_simple() and move the logic to a new
static function so it can still be called by other functions in the
same file.

  [0] Closes: https://github.com/libbpf/libbpf/issues/310

Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211229204156.13569-1-christylee@fb.com
2022-01-05 15:27:43 -08:00
Kuniyuki Iwashima 28479934f2 bpf: Add SO_RCVBUF/SO_SNDBUF in _bpf_getsockopt().
This patch exposes SO_RCVBUF/SO_SNDBUF through bpf_getsockopt().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220104013153.97906-3-kuniyu@amazon.co.jp
2022-01-05 14:16:07 -08:00
Kuniyuki Iwashima 04c350b1ae bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt().
The commit 4057765f2d ("sock: consistent handling of extreme
SO_SNDBUF/SO_RCVBUF values") added a change to prevent underflow
in setsockopt() around SO_SNDBUF/SO_RCVBUF.

This patch adds the same change to _bpf_setsockopt().

Fixes: 4057765f2d ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220104013153.97906-2-kuniyu@amazon.co.jp
2022-01-05 14:15:42 -08:00
Kris Van Hees a5bebc4f00 bpf: Fix verifier support for validation of async callbacks
Commit bfc6bb74e4 ("bpf: Implement verifier support for validation of async callbacks.")
added support for BPF_FUNC_timer_set_callback to
the __check_func_call() function.  The test in __check_func_call() is
flaweed because it can mis-interpret a regular BPF-to-BPF pseudo-call
as a BPF_FUNC_timer_set_callback callback call.

Consider the conditional in the code:

	if (insn->code == (BPF_JMP | BPF_CALL) &&
	    insn->imm == BPF_FUNC_timer_set_callback) {

The BPF_FUNC_timer_set_callback has value 170.  This means that if you
have a BPF program that contains a pseudo-call with an instruction delta
of 170, this conditional will be found to be true by the verifier, and
it will interpret the pseudo-call as a callback.  This leads to a mess
with the verification of the program because it makes the wrong
assumptions about the nature of this call.

Solution: include an explicit check to ensure that insn->src_reg == 0.
This ensures that calls cannot be mis-interpreted as an async callback
call.

Fixes: bfc6bb74e4 ("bpf: Implement verifier support for validation of async callbacks.")
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220105210150.GH1559@oracle.com
2022-01-05 13:38:22 -08:00
Christoph Hellwig 58d8a3fc4a bpf, docs: Fully document the JMP mode modifiers
Add a description for all the modifiers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-7-hch@lst.de
2022-01-05 13:11:26 -08:00
Christoph Hellwig 9e533e22b5 bpf, docs: Fully document the JMP opcodes
Add pseudo-code to document all the different BPF_JMP / BPF_JMP64
opcodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-6-hch@lst.de
2022-01-05 13:11:26 -08:00
Christoph Hellwig 03c517ee9e bpf, docs: Fully document the ALU opcodes
Add pseudo-code to document all the different BPF_ALU / BPF_ALU64
opcodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-5-hch@lst.de
2022-01-05 13:11:26 -08:00
Christoph Hellwig 894cda554c bpf, docs: Document the opcode classes
Add a description for each opcode class.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-4-hch@lst.de
2022-01-05 13:11:26 -08:00
Christoph Hellwig be3193cded bpf, docs: Add subsections for ALU and JMP instructions
Add a little more stucture to the ALU/JMP documentation with sections and
improve the example text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-3-hch@lst.de
2022-01-05 13:11:26 -08:00
Christoph Hellwig 62e4683849 bpf, docs: Add a setion to explain the basic instruction encoding
The eBPF instruction set document does not currently document the basic
instruction encoding.  Add a section to do that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-2-hch@lst.de
2022-01-05 13:11:26 -08:00
Daniel Borkmann ca796fe66f bpf, selftests: Add verifier test for mem_or_null register with offset.
Add a new test case with mem_or_null typed register with off > 0 to ensure
it gets rejected by the verifier:

  # ./test_verifier 1011
  #1009/u check with invalid reg offset 0 OK
  #1009/p check with invalid reg offset 0 OK
  Summary: 2 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05 12:00:19 -08:00
Daniel Borkmann e60b0d12a9 bpf: Don't promote bogus looking registers after null check.
If we ever get to a point again where we convert a bogus looking <ptr>_or_null
typed register containing a non-zero fixed or variable offset, then lets not
reset these bounds to zero since they are not and also don't promote the register
to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to a unknown
register could be an avenue as well, but then if we run into this case it would
allow to leak a kernel pointer this way.

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05 12:00:19 -08:00
John Fastabend 218d747a41 bpf, sockmap: Fix double bpf_prog_put on error case in map_link
sock_map_link() is called to update a sockmap entry with a sk. But, if the
sock_map_init_proto() call fails then we return an error to the map_update
op against the sockmap. In the error path though we need to cleanup psock
and dec the refcnt on any programs associated with the map, because we
refcnt them early in the update process to ensure they are pinned for the
psock. (This avoids a race where user deletes programs while also updating
the map with new socks.)

In current code we do the prog refcnt dec explicitely by calling
bpf_prog_put() when the program was found in the map. But, after commit
'38207a5e81230' in this error path we've already done the prog to psock
assignment so the programs have a reference from the psock as well. This
then causes the psock tear down logic, invoked by sk_psock_put() in the
error path, to similarly call bpf_prog_put on the programs there.

To be explicit this logic does the prog->psock assignment:

  if (msg_*)
    psock_set_prog(...)

Then the error path under the out_progs label does a similar check and
dec with:

  if (msg_*)
     bpf_prog_put(...)

And the teardown logic sk_psock_put() does ...

  psock_set_prog(msg_*, NULL)

... triggering another bpf_prog_put(...). Then KASAN gives us this splat,
found by syzbot because we've created an inbalance between bpf_prog_inc and
bpf_prog_put calling put twice on the program.

  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline]
  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829
  BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829
  Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641

To fix clean up error path so it doesn't try to do the bpf_prog_put in the
error path once progs are assigned then it relies on the normal psock
tear down logic to do complete cleanup.

For completness we also cover the case whereh sk_psock_init_strp() fails,
but this is not expected because it indicates an incorrect socket type
and should be caught earlier.

Fixes: 38207a5e81 ("bpf, sockmap: Attach map progs to psock early for feature probes")
Reported-by: syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
2022-01-05 20:43:08 +01:00
John Fastabend 5b2c5540b8 bpf, sockmap: Fix return codes from tcp_bpf_recvmsg_parser()
Applications can be confused slightly because we do not always return the
same error code as expected, e.g. what the TCP stack normally returns. For
example on a sock err sk->sk_err instead of returning the sock_error we
return EAGAIN. This usually means the application will 'try again'
instead of aborting immediately. Another example, when a shutdown event
is received we should immediately abort instead of waiting for data when
the user provides a timeout.

These tend to not be fatal, applications usually recover, but introduces
bogus errors to the user or introduces unexpected latency. Before
'c5d2177a72a16' we fell back to the TCP stack when no data was available
so we managed to catch many of the cases here, although with the extra
latency cost of calling tcp_msg_wait_data() first.

To fix lets duplicate the error handling in TCP stack into tcp_bpf so
that we get the same error codes.

These were found in our CI tests that run applications against sockmap
and do longer lived testing, at least compared to test_sockmap that
does short-lived ping/pong tests, and in some of our test clusters
we deploy.

Its non-trivial to do these in a shorter form CI tests that would be
appropriate for BPF selftests, but we are looking into it so we can
ensure this keeps working going forward. As a preview one idea is to
pull in the packetdrill testing which catches some of this.

Fixes: c5d2177a72 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220104205918.286416-1-john.fastabend@gmail.com
2022-01-05 20:43:08 +01:00
Hou Tao e4a41c2c1f bpf, arm64: Use emit_addr_mov_i64() for BPF_PSEUDO_FUNC
The following error is reported when running "./test_progs -t for_each"
under arm64:

  bpf_jit: multi-func JIT bug 58 != 56
  [...]
  JIT doesn't support bpf-to-bpf calls

The root cause is the size of BPF_PSEUDO_FUNC instruction increases
from 2 to 3 after the address of called bpf-function is settled and
there are two bpf-to-bpf calls in test_pkt_access. The generated
instructions are shown below:

  0x48:  21 00 C0 D2    movz x1, #0x1, lsl #32
  0x4c:  21 00 80 F2    movk x1, #0x1

  0x48:  E1 3F C0 92    movn x1, #0x1ff, lsl #32
  0x4c:  41 FE A2 F2    movk x1, #0x17f2, lsl #16
  0x50:  81 70 9F F2    movk x1, #0xfb84

Fixing it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so
the size of jited image will not change.

Fixes: 69c087ba62 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211231151018.3781550-1-houtao1@huawei.com
2022-01-05 20:43:08 +01:00
Jiri Olsa 5e22dd1862 bpf/selftests: Fix namespace mount setup in tc_redirect
The tc_redirect umounts /sys in the new namespace, which can be
mounted as shared and cause global umount. The lazy umount also
takes down mounted trees under /sys like debugfs, which won't be
available after sysfs mounts again and could cause fails in other
tests.

  # cat /proc/self/mountinfo | grep debugfs
  34 23 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:14 - debugfs debugfs rw
  # cat /proc/self/mountinfo | grep sysfs
  23 86 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
  # mount | grep debugfs
  debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)

  # ./test_progs -t tc_redirect
  #164 tc_redirect:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

  # mount | grep debugfs
  # cat /proc/self/mountinfo | grep debugfs
  # cat /proc/self/mountinfo | grep sysfs
  25 86 0:22 / /sys rw,relatime shared:2 - sysfs sysfs rw

Making the sysfs private under the new namespace so the umount won't
trigger the global sysfs umount.

Reported-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jussi Maki <joamaki@gmail.com>
Link: https://lore.kernel.org/bpf/20220104121030.138216-1-jolsa@kernel.org
2022-01-05 13:35:18 +01:00
Paul Chaignon 0fd800b245 bpftool: Probe for instruction set extensions
This patch introduces new probes to check whether the kernel supports
instruction set extensions v2 and v3. The first introduced eBPF
instructions BPF_J{LT,LE,SLT,SLE} in commit 92b31a9af7 ("bpf: add
BPF_J{LT,LE,SLT,SLE} instructions"). The second introduces 32-bit
variants of all jump instructions in commit 092ed0968b ("bpf:
verifier support JMP32").

These probes are useful for userspace BPF projects that want to use newer
instruction set extensions on newer kernels, to reduce the programs'
sizes or their complexity. LLVM already provides an mcpu=probe option to
automatically probe the kernel and select the newest-supported
instruction set extension. That is however not flexible enough for all
use cases. For example, in Cilium, we only want to use the v3
instruction set extension on v5.10+, even though it is supported on all
kernels v5.1+.

Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/3bfedcd9898c1f41ac67ca61f144fec84c6c3a92.1641314075.git.paul@isovalent.com
2022-01-05 13:31:40 +01:00
Paul Chaignon c04fb2b0bd bpftool: Probe for bounded loop support
This patch introduces a new probe to check whether the verifier supports
bounded loops as introduced in commit 2589726d12 ("bpf: introduce
bounded loops"). This patch will allow BPF users such as Cilium to probe
for loop support on startup and only unconditionally unroll loops on
older kernels.

The results are displayed as part of the miscellaneous section, as shown
below.

  $ bpftool feature probe | grep loops
  Bounded loop support is available
  $ bpftool feature probe macro | grep LOOPS
  #define HAVE_BOUNDED_LOOPS
  $ bpftool feature probe -j | jq .misc
  {
    "have_large_insn_limit": true,
    "have_bounded_loops": true
  }

Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/f7807c0b27d79f48e71de7b5a99c680ca4bd0151.1641314075.git.paul@isovalent.com
2022-01-05 13:31:40 +01:00
Paul Chaignon b22bf1b997 bpftool: Refactor misc. feature probe
There is currently a single miscellaneous feature probe,
HAVE_LARGE_INSN_LIMIT, to check for the 1M instructions limit in the
verifier. Subsequent patches will add additional miscellaneous probes,
which follow the same pattern at the existing probe. This patch
therefore refactors the probe to avoid code duplication in subsequent
patches.

The BPF program type and the checked error numbers in the
HAVE_LARGE_INSN_LIMIT probe are changed to better generalize to other
probes. The feature probe retains its current behavior despite those
changes.

Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/956c9329a932c75941194f91790d01f31dfbe01b.1641314075.git.paul@isovalent.com
2022-01-05 13:31:40 +01:00
David S. Miller c5bcdd8228 Merge branch 'lan966x-extend-switchdev-and-mdb-support'
Horatiu Vultur says:

====================
net: lan966x: Extend switchdev with mdb support

This patch series extends lan966x with mdb support by implementing
the switchdev callbacks: SWITCHDEV_OBJ_ID_PORT_MDB and
SWITCHDEV_OBJ_ID_HOST_MDB.
It adds support for both ipv4/ipv6 entries and l2 entries.

v2->v3:
- rename PGID_FIRST and PGID_LAST to PGID_GP_START and PGID_GP_END
- don't forget and relearn an entry for the CPU if there are more
  references to the cpu.

v1->v2:
- rename lan966x_mac_learn_impl to __lan966x_mac_learn
- rename lan966x_mac_cpu_copy to lan966x_mac_ip_learn
- fix grammar and typos in comments and commit messages
- add reference counter for entries that copy frames to CPU
====================

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:25:14 +00:00
Horatiu Vultur 7aacb894b1 net: lan966x: Extend switchdev with mdb support
Extend lan966x driver with mdb support by implementing the switchdev
calls: SWITCHDEV_OBJ_ID_PORT_MDB and SWITCHDEV_OBJ_ID_HOST_MDB.
It is allowed to add both ipv4/ipv6 entries and l2 entries. To add
ipv4/ipv6 entries is not required to use the PGID table while for l2
entries it is required. The PGID table is much smaller than MAC table
so only fewer l2 entries can be added.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:25:13 +00:00
Horatiu Vultur 11b0a27772 net: lan966x: Add PGID_GP_START and PGID_GP_END
The first entries in the PGID table are used by the front ports while
the last entries are used for different purposes like flooding mask,
copy to CPU, etc. So add these macros to define which entries can be
used for general purpose.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:25:13 +00:00
Horatiu Vultur fc0c3fe748 net: lan966x: Add function lan966x_mac_ip_learn()
Extend mac functionality with the function lan966x_mac_ip_learn. This
function adds an entry in the MAC table for IP multicast addresses.
These entries can copy a frame to the CPU but also can forward on the
front ports.
This functionality is needed for mdb support. In case the CPU and some
of the front ports subscribe to an IP multicast address.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:25:13 +00:00
David S. Miller 2a5ab39beb Merge branch 'mtk_eth_soc-refactoring-and-clause45'
Daniel Golle says:

====================
net: ethernet: mtk_eth_soc: refactoring and Clause 45

Rework value and type of mdio read and write functions in mtk_eth_soc
and generally clean up and unify both functions.
Then add support to access Clause 45 phy registers, using newly
introduced helper inline functions added by a patch Russell King has
suggested in a reply to an earlier version of this series [1].

All three commits are tested on the Bananapi BPi-R64 board having
MediaTek MT7531BE DSA gigE switch using clause 22 MDIO and
Ubiquiti UniFi 6 LR access point having Aquantia AQR112C PHY using
clause 45 MDIO.

[1]: https://lore.kernel.org/netdev/Ycr5Cna76eg2B0An@shell.armlinux.org.uk/

v11: also address return value of mtk_mdio_busy_wait
v10: correct order of SoB lines in 2/3, change patch order in series
v9: improved formatting and Cc missing maintainer
v8: add patch from Russel King, switch to bitfield helper macros
v7: remove unneeded variables and order OR-ed call parameters
v6: further clean up functions and more cleanly separate patches
v5: fix wrong variable name in first patch covered by follow-up patch
v4: clean-up return values and types, split into two commits
v3: return -1 instead of 0xffff on error in _mtk_mdio_write
v2: use MII_DEVADDR_C45_SHIFT and MII_REGADDR_C45_MASK to extract
    device id and register address. Unify read and write functions to
    have identical types and parameter names where possible as we are
    anyway already replacing both function bodies.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:22:17 +00:00
Daniel Golle e2e7f6e29c net: ethernet: mtk_eth_soc: implement Clause 45 MDIO access
Implement read and write access to IEEE 802.3 Clause 45 Ethernet
phy registers while making use of new mdiobus_c45_regad and
mdiobus_c45_devad helpers.

Tested on the Ubiquiti UniFi 6 LR access point featuring
MediaTek MT7622BV WiSoC with Aquantia AQR112C.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:22:17 +00:00
Russell King (Oracle) c6af53f038 net: mdio: add helpers to extract clause 45 regad and devad fields
Add a couple of helpers and definitions to extract the clause 45 regad
and devad fields from the regnum passed into MDIO drivers.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:22:17 +00:00
Daniel Golle eda80b249d net: ethernet: mtk_eth_soc: fix return values and refactor MDIO ops
Instead of returning -1 (-EPERM) when MDIO bus is stuck busy
while writing or 0xffff if it happens while reading, return the
appropriate -ETIMEDOUT. Also fix return type to int instead of u32.
Refactor functions to use bitfield helpers instead of having various
masking and shifting constants in the code, which also results in the
register definitions in the header file being more obviously related
to what is stated in the MediaTek's Reference Manual.

Fixes: 656e705243 ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05 11:22:17 +00:00
M Chetan Kumar ffd32ea6b1 Revert "net: wwan: iosm: Keep device at D0 for s2idle case"
Depending on BIOS configuration IOSM driver exchanges
protocol required for putting device into D3L2 or D3L1.2.

ipc_pcie_suspend_s2idle() is implemented to put device to D3L1.2.

This patch forces PCI core know this device should stay at D0.
- pci_save_state()is expensive since it does a lot of slow PCI
config reads.

The reported issue is not observed on x86 platform. The supurios
wake on AMD platform needs to be futher debugged with orignal patch
submitter [1]. Also the impact of adding pci_save_state() needs to be
assessed by testing it on other platforms.

This reverts commit f4dd5174e273("net: wwan: iosm: Keep device
at D0 for s2idle case").

[1] https://lore.kernel.org/all/20211224081914.345292-2-kai.heng.feng@canonical.com/

Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Link: https://lore.kernel.org/r/20220104150213.1894-1-m.chetan.kumar@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-04 18:15:17 -08:00
Jakub Kicinski 18343b8069 Just a few more changes:
* mac80211: allow non-standard VHT MCSes 10/11
  * mac80211: add sleepable station iterator for drivers
  * nl80211: clarify a comment
  * mac80211: small cleanup to use typed element helpers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmHUX+8ACgkQB8qZga/f
 l8RHoQ//aihUQkeiQc1D/xpGtpg9Bnphv0hQExxtwyCmI3FUupzqEcgWLFrJvkmM
 N/e0h1Fl6GFosDTsThVg3u39ERJfVvxY4MO7gQ07R3tBSFIwwdxLwi6b6B5TbsPE
 OVaU5clmsMjr8B4o1VRzKI9Rsgn5VVrj5MyDgBcR1mwPAwXwCgvUoNFGJ74dN0Wz
 Cr0zVGZvX+cVI4B977U0EnUU25JjEJouV2AqviNbfIeRCUmlFIUO8UOOr3VgY/pL
 1/ywf0s6ARlBpvz9HjqnZJfDgPcysJqP9k4bgyqW+ox0hFtzhivuzPw8AUVr1Xg1
 3IXLS0CjAedEV73aps9F6Xx+DJW79YgD0aZrHs097BEdlHEq1HqH+2CDDTZc03vT
 tySMtbO+CSMLCjjVZH+JsGwSfQ8bLoBInrFsZH/2Lc9hdQ2lWEMHuz7Vj+FFMw37
 wBsvdcUQBc3TvjjNSa34jnvfH5dhXAj4K/n8mmtZori8AYjOr8VoDLNMy8lACBHt
 AubJhHUXElvXylaN+fxvQf+ps0eajEoGP5Q86+5l+YWfwXki6M9tQxjhwOaJGH05
 rGJW3zQx9YALrjfU6R9m89v1akJhg67H6FcGwPG3ASI7ZhuQ7Al2TrhLWOmPvt43
 xfL+zfMtlExV8bj6qrHLw2YJJUSxqxRMe2ZLCqkJksYqxa0TNRs=
 =m+bZ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Just a few more changes:
 - mac80211: allow non-standard VHT MCSes 10/11
 - mac80211: add sleepable station iterator for drivers
 - nl80211: clarify a comment
 - mac80211: small cleanup to use typed element helpers

* tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next:
  mac80211: use ieee80211_bss_get_elem()
  nl80211: clarify comment for mesh PLINK_BLOCKED state
  mac80211: Add stations iterator where the iterator function may sleep
  mac80211: allow non-standard VHT MCS-10/11
====================

Link: https://lore.kernel.org/r/20220104153403.69749-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-04 08:13:03 -08:00
Johannes Berg b3c1906ed0 mac80211: use ieee80211_bss_get_elem()
Instead of ieee80211_bss_get_ie(), use the more typed
ieee80211_bss_get_elem().

Link: https://lore.kernel.org/r/20211220113609.56f8e2a70152.Id5a56afb8a4f9b38d10445e5a1874e93e84b5251@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04 15:50:36 +01:00
Felix Fietkau 5bc03b28ec nl80211: clarify comment for mesh PLINK_BLOCKED state
When a mesh link is in blocked state, it is very useful to still allow
auth requests from the peer to re-establish it.
When a remote node is power cycled, the peer state can easily end up
in blocked state if multiple auth attempts are performed. Since this
can lead to several minutes of downtime, we should accept auth attempts
of the peer after it has come back.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20211220105147.88625-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04 15:50:23 +01:00
Martin Blumenstingl acb99b9b2a mac80211: Add stations iterator where the iterator function may sleep
ieee80211_iterate_active_interfaces() and
ieee80211_iterate_active_interfaces_atomic() already exist, where the
former allows the iterator function to sleep. Add
ieee80211_iterate_stations() which is similar to
ieee80211_iterate_stations_atomic() but allows the iterator to sleep.
This is needed for adding SDIO support to the rtw88 driver. Some
interators there are reading or writing registers. With the SDIO ops
(sdio_readb, sdio_writeb and friends) this means that the iterator
function may sleep.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20211228211501.468981-2-martin.blumenstingl@googlemail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04 15:47:15 +01:00
Ping-Ke Shih 04be6d337d mac80211: allow non-standard VHT MCS-10/11
Some AP can possibly try non-standard VHT rate and mac80211 warns and drops
packets, and leads low TCP throughput.

    Rate marked as a VHT rate but data is invalid: MCS: 10, NSS: 2
    WARNING: CPU: 1 PID: 7817 at net/mac80211/rx.c:4856 ieee80211_rx_list+0x223/0x2f0 [mac8021

Since commit c27aa56a72 ("cfg80211: add VHT rate entries for MCS-10 and MCS-11")
has added, mac80211 adds this support as well.

After this patch, throughput is good and iw can get the bitrate:
    rx bitrate:	975.1 MBit/s VHT-MCS 10 80MHz short GI VHT-NSS 2
or
    rx bitrate:	1083.3 MBit/s VHT-MCS 11 80MHz short GI VHT-NSS 2

Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1192891
Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20220103013623.17052-1-pkshih@realtek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04 15:45:17 +01:00
Minghao Chi 416b27439d ethernet/sfc: remove redundant rc variable
Return value from efx_mcdi_rpc() directly instead
of taking this in another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04 12:41:41 +00:00
David S. Miller a0619a9e9e Merge branch 'namespacify-mtu-ipv4'
xu xin says:

====================
ipv4: Namespaceify two sysctls related with mtu

The following patch series enables the min_pmtu and mtu_expires to
be visible and configurable per net namespace. Different namespace
application might have different requirements on the setting of
min_pmtu and mtu_expires.

If these two patches are applied, inside a net namespace we create,
we can see two more sysctls under /proc/sys/net/ipv4/route:
1. min_pmtu
2. mtu_expires

where min_pmtu and mtu_expires are configurable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04 12:40:22 +00:00
xu xin 1135fad204 Namespaceify mtu_expires sysctl
This patch enables the sysctl mtu_expires to be configured per net
namespace.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04 12:40:22 +00:00