linux

mirror of https://github.com/torvalds/linux synced 2024-10-02 01:10:22 +00:00

Author	SHA1	Message	Date
Andrew Halaney	f3c2caacee	net: stmmac: don't create a MDIO bus if unnecessary Currently a MDIO bus is created if the devicetree description is either: 1. Not fixed-link 2. fixed-link but contains a MDIO bus as well The "1" case above isn't always accurate. If there's a phy-handle, it could be referencing a phy on another MDIO controller's bus[1]. In this case, where the MDIO bus is not described at all, currently stmmac will make a MDIO bus and scan its address space to discover phys (of which there are none). This process takes time scanning a bus that is known to be empty, delaying time to complete probe. There are also a lot of upstream devicetrees[2] that expect a MDIO bus to be created, scanned for phys, and the first one found connected to the MAC. This case can be inferred from the platform description by not having a phy-handle && not being fixed-link. This hits case "1" in the current driver's logic, and must be handled in any logic change here since it is a valid legacy dt-binding. Let's improve the logic to create a MDIO bus if either: - Devicetree contains a MDIO bus - !fixed-link && !phy-handle (legacy handling) This way the case where no MDIO bus should be made is handled, as well as retaining backwards compatibility with the valid cases. Below devicetree snippets can be found that explain some of the cases above more concretely. Here's[0] a devicetree example where the MAC is both fixed-link and driving a switch on MDIO (case "2" above). 
This needs a MDIO bus to be created: &fec1 { phy-mode = "rmii"; fixed-link { speed = <100>; full-duplex; }; mdio1: mdio { switch0: switch0@0 { compatible = "marvell,mv88e6190"; pinctrl-0 = <&pinctrl_gpio_switch0>; }; }; }; Here's[1] an example where there is no MDIO bus or fixed-link for the ethernet1 MAC, so no MDIO bus should be created since ethernet0 is the MDIO master for ethernet1's phy: &ethernet0 { phy-mode = "sgmii"; phy-handle = <&sgmii_phy0>; mdio { compatible = "snps,dwmac-mdio"; sgmii_phy0: phy@8 { compatible = "ethernet-phy-id0141.0dd4"; reg = <0x8>; device_type = "ethernet-phy"; }; sgmii_phy1: phy@a { compatible = "ethernet-phy-id0141.0dd4"; reg = <0xa>; device_type = "ethernet-phy"; }; }; }; &ethernet1 { phy-mode = "sgmii"; phy-handle = <&sgmii_phy1>; }; Finally there's descriptions like this[2] which don't describe the MDIO bus but expect it to be created and the whole address space scanned for a phy since there's no phy-handle or fixed-link described: &gmac { phy-supply = <&vcc_lan>; phy-mode = "rmii"; snps,reset-gpio = <&gpio3 RK_PB4 GPIO_ACTIVE_HIGH>; snps,reset-active-low; snps,reset-delays-us = <0 10000 1000000>; }; [0] https://elixir.bootlin.com/linux/v6.5-rc5/source/arch/arm/boot/dts/nxp/vf/vf610-zii-ssmb-dtu.dts [1] https://elixir.bootlin.com/linux/v6.6-rc5/source/arch/arm64/boot/dts/qcom/sa8775p-ride.dts [2] https://elixir.bootlin.com/linux/v6.6-rc5/source/arch/arm64/boot/dts/rockchip/rk3368-r88.dts#L164 Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Co-developed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-15 09:30:21 +00:00
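The decision rule described in this commit message can be sketched as a small truth function. This is an illustrative model only, not the driver's code: the `MacNode` class and the `needs_mdio_bus()` name are hypothetical, and the three booleans stand in for the presence of an `mdio` child node, a `fixed-link` node, and a `phy-handle` property in the MAC's devicetree node.

```python
from dataclasses import dataclass


@dataclass
class MacNode:
    """Hypothetical, simplified view of a MAC's devicetree node."""
    has_mdio_node: bool = False  # an MDIO bus child node is described
    fixed_link: bool = False     # a "fixed-link" node is present
    phy_handle: bool = False     # a "phy-handle" property is present


def needs_mdio_bus(node: MacNode) -> bool:
    """Return True if an MDIO bus should be created for this MAC."""
    # Rule 1: the devicetree explicitly describes an MDIO bus.
    if node.has_mdio_node:
        return True
    # Rule 2 (legacy binding): neither fixed-link nor phy-handle is given,
    # so the bus is created and scanned for the first PHY.
    return not node.fixed_link and not node.phy_handle


# The devicetree examples quoted in the commit message, reduced to flags.
fec1 = MacNode(has_mdio_node=True, fixed_link=True)       # example [0]
ethernet0 = MacNode(has_mdio_node=True, phy_handle=True)  # example [1]
ethernet1 = MacNode(phy_handle=True)                      # example [1]
gmac = MacNode()                                          # example [2]

for name, node in [("fec1", fec1), ("ethernet0", ethernet0),
                   ("ethernet1", ethernet1), ("gmac", gmac)]:
    print(name, needs_mdio_bus(node))
```

Only `ethernet1` evaluates to false here, which is the case the commit wants to stop probing: its PHY sits on another controller's bus, so scanning an empty bus would only delay probe.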
Daniel Xu	7489723c2e	bpf: xdp: Register generic_kfunc_set with XDP programs Registering generic_kfunc_set with XDP programs enables some of the newer BPF features inside XDP -- namely tree based data structures and BPF exceptions. The current motivation for this commit is to enable assertions inside XDP bpf progs. Assertions are a standard and useful tool to encode intent. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/d07d4614b81ca6aada44fcb89bb6b618fb66e4ca.1702594357.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 19:12:16 -08:00
Jason Xing	8d182d5869	i40e: remove fake support of rx-frames-irq Since we never support this feature for I40E driver, we don't have to display the value when using 'ethtool -c eth0'. Before this patch applied, the rx-frames-irq is 256 which is consistent with tx-frames-irq. Apparently it could mislead users. Signed-off-by: Jason Xing <kernelxing@tencent.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231213184406.1306602-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:56:46 -08:00
Jakub Kicinski	0d2f3b87d5	Merge branch 'mdio-mux-cleanup' Vladimir Oltean says: ==================== MDIO mux cleanup This small patch set resolves some technical debt in the MDIO mux driver which was discovered during the investigation for commit `1f9f2143f2` ("net: mdio-mux: fix C45 access returning -EIO after API change"). The patches have been sitting for 2 months in the NXP SDK kernel and haven't caused issues. ==================== Link: https://lore.kernel.org/r/20231213152712.320842-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:55:40 -08:00
Vladimir Oltean	10ad63da5c	net: mdio-mux: be compatible with parent buses which only support C45 After the mii_bus API conversion to a split read() / read_c45(), there might be MDIO parent buses which only populate the read_c45() and write_c45() function pointers but not the C22 variants. We haven't seen these in the wild paired with MDIO multiplexers, but Andrew points out we should treat the corner case. Link: https://lore.kernel.org/netdev/4ccd7dc9-b611-48aa-865f-68d3a1327ce8@lunn.ch/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231213152712.320842-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:55:38 -08:00
Vladimir Oltean	d215ab4d6a	net: mdio-mux: show errors on probe failure Showing the precise error symbols can help debugging probe issues, such as the recent -EIO error in of_mdiobus_register() caused by the lack of bus->read_c45() and bus->write_c45() methods. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231213152712.320842-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:55:38 -08:00
Igor Russkikh	b3cb7a830a	net: atlantic: eliminate double free in error handling logic Driver has a logic leak in ring data allocation/free, where aq_ring_free could be called multiple times on same ring, if system is under stress and got memory allocation error. Ring pointer was used as an indicator of failure, but this is not correct since only ring data is allocated/deallocated. Ring itself is an array member. Changing ring allocation functions to return error code directly. This simplifies error handling and eliminates aq_ring_free on higher layer. Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Link: https://lore.kernel.org/r/20231213095044.23146-1-irusskikh@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:53:54 -08:00
Jakub Kicinski	1891cfe3b3	Merge branch 'convert-net-selftests-to-run-in-unique-namespace-part-3' Hangbin Liu says: ==================== Convert net selftests to run in unique namespace (Part 3) Here is the 3rd part of converting net selftests to run in unique namespace. This part converts all srv6 and fib tests. Note that patch 06 is a fix for testing fib_nexthop_multiprefix. Here is the part 1 link: https://lore.kernel.org/netdev/20231202020110.362433-1-liuhangbin@gmail.com And part 2 link: https://lore.kernel.org/netdev/20231206070801.1691247-1-liuhangbin@gmail.com ==================== Link: https://lore.kernel.org/r/20231213060856.4030084-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:38 -08:00
Hangbin Liu	b795db185e	selftests/net: convert fdb_flush.sh to run it in unique namespace Here is the test result after conversion. # ./fdb_flush.sh TEST: vx10: Expected 5 FDB entries, got 5 [ OK ] TEST: vx20: Expected 5 FDB entries, got 5 [ OK ] ... TEST: vx10: Expected 5 FDB entries, got 5 [ OK ] TEST: Test entries with dst 192.0.2.1 [ OK ] Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20231213060856.4030084-14-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:36 -08:00
Hangbin Liu	f6fc5b9499	selftests/net: convert fib_tests.sh to run it in unique namespace Here is the test result after conversion. # ./fib_tests.sh Single path route test Start point TEST: IPv4 fibmatch [ OK ] ... Fib6 garbage collection test TEST: ipv6 route garbage collection [ OK ] IPv4 multipath list receive tests TEST: Multipath route hit ratio (1.00) [ OK ] IPv6 multipath list receive tests TEST: Multipath route hit ratio (1.00) [ OK ] Tests passed: 225 Tests failed: 0 Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20231213060856.4030084-13-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:36 -08:00
Hangbin Liu	6c0ee7b4d6	selftests/net: convert fib_rule_tests.sh to run it in unique namespace Here is the test result after conversion. ]# ./fib_rule_tests.sh TEST: rule6 check: oif redirect to table [ OK ] ... TEST: rule4 dsfield tcp connect (dsfield 0x07) [ OK ] Tests passed: 66 Tests failed: 0 Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-12-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:36 -08:00
Hangbin Liu	3a06833b2a	selftests/net: convert fib-onlink-tests.sh to run it in unique namespace Remove PEER_CMD, which is not used in this test Here is the test result after conversion. ]# ./fib-onlink-tests.sh Error: ipv4: FIB table does not exist. Flush terminated Error: ipv6: FIB table does not exist. Flush terminated ######################################## Configuring interfaces ... TEST: Gateway resolves to wrong nexthop device - VRF [ OK ] Tests passed: 38 Tests failed: 0 Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-11-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	39333e3167	selftests/net: convert fib_nexthops.sh to run it in unique namespace Here is the test result after conversion. ]# ./fib_nexthops.sh Basic functional tests ---------------------- TEST: List with nothing defined [ OK ] TEST: Nexthop get on non-existent id [ OK ] ... TEST: IPv6 resilient nexthop group torture test [ OK ] Tests passed: 234 Tests failed: 0 Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20231213060856.4030084-10-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	d2168ea792	selftests/net: convert fib_nexthop_nongw.sh to run it in unique namespace Here is the test result after conversion. ]# ./fib_nexthop_nongw.sh TEST: nexthop: get route with nexthop without gw [ OK ] TEST: nexthop: ping through nexthop without gw [ OK ] Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-9-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	5ae89fe43a	selftests/net: convert fib_nexthop_multiprefix to run it in unique namespace Here is the test result after conversion. ]# ./fib_nexthop_multiprefix.sh TEST: IPv4: host 0 to host 1, mtu 1300 [ OK ] TEST: IPv6: host 0 to host 1, mtu 1300 [ OK ] TEST: IPv4: host 0 to host 2, mtu 1350 [ OK ] TEST: IPv6: host 0 to host 2, mtu 1350 [ OK ] TEST: IPv4: host 0 to host 3, mtu 1400 [ OK ] TEST: IPv6: host 0 to host 3, mtu 1400 [ OK ] TEST: IPv4: host 0 to host 1, mtu 1300 [ OK ] TEST: IPv6: host 0 to host 1, mtu 1300 [ OK ] TEST: IPv4: host 0 to host 2, mtu 1350 [ OK ] TEST: IPv6: host 0 to host 2, mtu 1350 [ OK ] TEST: IPv4: host 0 to host 3, mtu 1400 [ OK ] TEST: IPv6: host 0 to host 3, mtu 1400 [ OK ] Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-8-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	a33e9da347	selftests/net: fix grep checking for fib_nexthop_multiprefix When running fib_nexthop_multiprefix test I saw all IPv6 test failed. e.g. ]# ./fib_nexthop_multiprefix.sh TEST: IPv4: host 0 to host 1, mtu 1300 [ OK ] TEST: IPv6: host 0 to host 1, mtu 1300 [FAIL] With -v it shows COMMAND: ip netns exec h0 /usr/sbin/ping6 -s 1350 -c5 -w5 2001:db8:101::1 PING 2001:db8:101::1(2001:db8:101::1) 1350 data bytes From 2001:db8:100::64 icmp_seq=1 Packet too big: mtu=1300 --- 2001:db8:101::1 ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms Route get 2001:db8:101::1 via 2001:db8:100::64 dev eth0 src 2001:db8:100::1 metric 1024 expires 599sec mtu 1300 pref medium Searching for: 2001:db8:101::1 from :: via 2001:db8:100::64 dev eth0 src 2001:db8:100::1 .* mtu 1300 The reason is when CONFIG_IPV6_SUBTREES is not enabled, rt6_fill_node() will not put RTA_SRC info. After fix: ]# ./fib_nexthop_multiprefix.sh TEST: IPv4: host 0 to host 1, mtu 1300 [ OK ] TEST: IPv6: host 0 to host 1, mtu 1300 [ OK ] Fixes: `735ab2f65d` ("selftests: Add test with multiple prefixes using single nexthop") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-7-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	779283b777	selftests/net: convert fcnal-test.sh to run it in unique namespace Here is the test result after conversion. There are some failures, but they also exist on my system without this patch, so they are not affected by this patch and I will check the reason later. ]# time ./fcnal-test.sh /usr/bin/which: no nettest in (/root/.local/bin:/root/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin) ########################################################################### IPv4 ping ########################################################################### ################################################################# No VRF SYSCTL: net.ipv4.raw_l3mdev_accept=0 TEST: ping out - ns-B IP [ OK ] TEST: ping out, device bind - ns-B IP [ OK ] TEST: ping out, address bind - ns-B IP [ OK ] ... ################################################################# SNAT on VRF TEST: IPv4 TCP connection over VRF with SNAT [ OK ] TEST: IPv6 TCP connection over VRF with SNAT [ OK ] Tests passed: 893 Tests failed: 21 real 52m48.178s user 0m34.158s sys 1m42.976s BTW, this test needs a really long time. So expand the timeout to 1h. Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-6-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	792cd1dbc8	selftests/net: convert srv6_end_dt6_l3vpn_test.sh to run it in unique namespace As the name \${rt-${rt}} may confuse the reader, convert the variables hs/rt in setup_rt/hs to hid, rid. Here is the test result after conversion. ]# ./srv6_end_dt6_l3vpn_test.sh ################################################################################ TEST SECTION: IPv6 routers connectivity test ################################################################################ TEST: Routers connectivity: rt-1 -> rt-2 [ OK ] TEST: Routers connectivity: rt-2 -> rt-1 [ OK ] ... TEST: Hosts isolation: hs-t200-4 -X-> hs-t100-2 [ OK ] Tests passed: 18 Tests failed: 0 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:35 -08:00
Hangbin Liu	7b2d941c81	selftests/net: convert srv6_end_dt4_l3vpn_test.sh to run it in unique namespace As the name \${rt-${rt}} may confuse the reader, convert the variables hs/rt in setup_rt/hs to hid, rid. Here is the test result after conversion. ]# ./srv6_end_dt4_l3vpn_test.sh ################################################################################ TEST SECTION: IPv6 routers connectivity test ################################################################################ TEST: Routers connectivity: rt-1 -> rt-2 [ OK ] TEST: Routers connectivity: rt-2 -> rt-1 [ OK ] ... TEST: Hosts isolation: hs-t200-4 -X-> hs-t100-2 [ OK ] Tests passed: 18 Tests failed: 0 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:34 -08:00
Hangbin Liu	59cac2efd3	selftests/net: convert srv6_end_dt46_l3vpn_test.sh to run it in unique namespace As the name \${rt-${rt}} may confuse the reader, convert the variables hs/rt in setup_rt/hs to hid, rid. Here is the test result after conversion. ]# ./srv6_end_dt46_l3vpn_test.sh ################################################################################ TEST SECTION: IPv6 routers connectivity test ################################################################################ TEST: Routers connectivity: rt-1 -> rt-2 [ OK ] TEST: Routers connectivity: rt-2 -> rt-1 [ OK ] ... TEST: IPv4 Hosts isolation: hs-t200-4 -X-> hs-t100-2 [ OK ] Tests passed: 34 Tests failed: 0 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:34 -08:00
Hangbin Liu	b6925b4ed5	selftests/net: add variable NS_LIST for lib.sh Add a global variable NS_LIST to store all the namespaces that setup_ns created, so the caller can call cleanup_all_ns() instead of remembering all the netns names when using cleanup_ns(). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231213060856.4030084-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:38:34 -08:00
Randy Dunlap	fcb29877f7	page_pool: fix typos and punctuation Correct spelling (s/and/any) and a run-on sentence. Spell out "multi". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20231213043650.12672-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 18:00:20 -08:00
Randy Dunlap	bf873a800a	net: skbuff: fix spelling errors Correct spelling as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231213043511.10357-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:59:52 -08:00
Jakub Kicinski	81d56f567a	Merge branch 'net-mdio-mdio-bcm-unimac-optimizations-and-clean-up' Justin Chen says: ==================== net: mdio: mdio-bcm-unimac: optimizations and clean up Clean up mdio poll to use read_poll_timeout() and reduce the potential poll time. ==================== Link: https://lore.kernel.org/r/20231213222744.2891184-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:59:00 -08:00
Justin Chen	54a600ed21	net: mdio: mdio-bcm-unimac: Use read_poll_timeout Simplify the code by using read_poll_timeout(). Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231213222744.2891184-3-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:58:58 -08:00
Justin Chen	268531be21	net: mdio: mdio-bcm-unimac: Delay before first poll With a clock interval of 400 nsec and 64-bit transactions (32 bit preamble & 16 bit control & 16 bit data), it is reasonable to assume the mdio transaction will take 25.6 usec. Add a 30 usec delay before the first poll to reduce the chance of a 1000-2000 usec sleep. Reduce the timeout from 1000ms to 100ms as it is unlikely for the bus to take this long. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231213222744.2891184-2-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:58:58 -08:00
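The 25.6 usec figure in this commit message follows directly from the clock interval and the transaction length. A short worked calculation, using only the numbers quoted above (the constant and function names are made up for illustration):

```python
CLOCK_PERIOD_NS = 400                # MDIO clock interval from the commit message
BITS_PER_TRANSACTION = 32 + 16 + 16  # preamble + control + data = 64 bits
INITIAL_DELAY_US = 30                # delay added before the first poll


def transaction_time_us(clock_period_ns: int = CLOCK_PERIOD_NS,
                        bits: int = BITS_PER_TRANSACTION) -> float:
    """Time to shift one whole transaction onto the bus, in microseconds."""
    return clock_period_ns * bits / 1000


expected_us = transaction_time_us()
margin_us = INITIAL_DELAY_US - expected_us
print(f"transaction: {expected_us} usec, margin before first poll: {margin_us:.1f} usec")
```

Since the 30 usec delay exceeds the expected transaction time, the first poll should normally find the transaction already complete, which is how the change reduces the chance of falling into the longer sleep.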
Jakub Kicinski	c2d919cdfe	Merge branch 'tools-ynl-gen-fill-in-the-gaps-in-support-of-legacy-families' Jakub Kicinski says: ==================== tools: ynl-gen: fill in the gaps in support of legacy families Fill in the gaps in YNL C code gen so that we can generate user space code for all genetlink families for which we have specs. The two major changes we need are support for fixed headers and support for recursive nests. For fixed header support - place the struct for the fixed header directly in the request struct (and don't bother generating access helpers). The member of a fixed header can't be too complex, and also are by definition not optional so the user has to fill them in. The YNL core needs a bit of a tweak to understand that the attrs may now start at a fixed offset, which is not necessarily equal to sizeof(struct genlmsghdr). Dealing with nested sets is much harder. Previously we'd gen the nested structs as: struct outer { struct inner inner; }; If structs are recursive (e.g. inner contains outer again) we must break this chain and allocate one of the structs dynamically (store a pointer rather than full struct). ==================== Link: https://lore.kernel.org/r/20231213231432.2944749-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:24 -08:00
Jakub Kicinski	7b5fe80ebc	tools: ynl-gen: print prototypes for recursive stuff We avoid printing forward declarations and prototypes for most types by sorting things topologically. But if structs nest we do need the forward declarations, there's no other way. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
Jakub Kicinski	461f25a2e4	tools: ynl-gen: store recursive nests by a pointer To avoid infinite nesting store recursive structs by pointer. If recursive struct is placed in the op directly - the first instance can be stored by value. That makes the code much less of a pain for majority of practical uses. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
Jakub Kicinski	aa75783b95	tools: ynl-gen: re-sort ignoring recursive nests We try to keep the structures and helpers "topologically sorted", to avoid forward declarations. When recursive nests are at play we need to sort twice, because structs which end up being marked as recursive will get a full set of forward declarations, so we should ignore them for the purpose of sorting. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
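The idea of sorting while ignoring recursive nests can be illustrated with Python's standard `graphlib` module. This is a sketch under assumptions, not the ynl-gen implementation: the `sort_structs()` helper, the `deps` mapping, and the struct names are hypothetical, and the set of recursive structs is assumed to have been computed beforehand.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def sort_structs(deps: Dict[str, List[str]], recursive: Set[str]) -> List[str]:
    """Order structs so that each one comes after the structs it embeds.

    Dependencies on structs marked recursive are dropped: those structs get
    forward declarations, so they impose no ordering constraint.
    """
    sorter = TopologicalSorter()
    for name, nested in deps.items():
        sorter.add(name, *[n for n in nested if n not in recursive])
    return list(sorter.static_order())


# Hypothetical nesting: "outer" and "inner" contain each other.
deps = {
    "op": ["leaf", "outer"],
    "outer": ["inner"],
    "inner": ["outer"],
    "leaf": [],
}
order = sort_structs(deps, recursive={"outer", "inner"})
print(order)
```

Without the filter, the `outer` and `inner` cycle would make `static_order()` raise `graphlib.CycleError`. With it, the only remaining constraint is that `leaf` precedes `op`.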
Jakub Kicinski	38329fcfb7	tools: ynl-gen: record information about recursive nests Track which nests are recursive. Non-recursive nesting gets rendered in C as directly nested structs. For recursive ones we need to put a pointer in, rather than full struct. Track this information, no change to generated code, yet. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
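One way to decide which nests are recursive is to check whether an attribute set can reach itself by following nested-set references. The sketch below assumes a simplified model in which each set name maps to the list of sets it nests. The `find_recursive_nests()` helper and the example data are hypothetical and are not taken from ynl-gen.

```python
from typing import Dict, List, Set


def find_recursive_nests(nests: Dict[str, List[str]]) -> Set[str]:
    """Return the names of sets that can reach themselves through nesting."""

    def reachable(start: str) -> Set[str]:
        # Iterative depth-first walk over everything nested under `start`.
        seen: Set[str] = set()
        stack = list(nests.get(start, []))
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(nests.get(cur, []))
        return seen

    return {name for name in nests if name in reachable(name)}


# "outer" nests "inner", which nests "outer" again; "op" merely uses "outer".
nests = {
    "op": ["outer"],
    "outer": ["inner"],
    "inner": ["outer"],
    "leaf": [],
}
print(sorted(find_recursive_nests(nests)))
```

In generated C, members of the returned set would be the ones stored by pointer rather than by value, while `op` and `leaf` could keep directly nested structs.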
Jakub Kicinski	f967a498fc	tools: ynl-gen: fill in implementations for TypeUnused Fill in more empty handlers for TypeUnused. When 'unused' attr gets specified in a nested set we have to cleanly skip it during code generation. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
Jakub Kicinski	f6805072c2	tools: ynl-gen: support fixed headers in genetlink Support genetlink families using simple fixed headers. Assume fixed header is identical for all ops of the family for now. Fixed headers are added to the request and reply structs as a _hdr member, and copied to/from netlink messages appropriately. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
Jakub Kicinski	139c163b5b	tools: ynl-gen: use enum user type for members and args Commit `30c9020015` ("tools: ynl-gen: use enum name from the spec") added pre-cooked user type for enums. Use it to fix ignoring enum-name provided in the spec. This changes a type in struct ethtool_tunnel_udp_entry but is generally inconsequential for current families. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:21 -08:00
Jakub Kicinski	4dc27587dc	tools: ynl-gen: add missing request free helpers for dumps The code gen generates a prototype for dump request free in the header, but no implementation in the source. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20231213231432.2944749-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:51:20 -08:00
Alexei Starovoitov	0f5d5454c7	Merge branch 'bpf-fs-mount-options-parsing-follow-ups' Andrii Nakryiko says: ==================== BPF FS mount options parsing follow ups Original BPF token patch set ([0]) added delegate_xxx mount options which supported only special "any" value and hexadecimal bitmask. This patch set attempts to make specifying and inspecting these mount options more human-friendly by supporting string constants matching corresponding bpf_cmd, bpf_map_type, bpf_prog_type, and bpf_attach_type enumerators. This implementation relies on BTF information to find all supported symbolic names. If kernel wasn't built with BTF, BPF FS will still support "any" and hex-based mask. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=805707&state=* v1->v2: - strip BPF_, BPF_MAP_TYPE_, and BPF_PROG_TYPE_ prefixes, do case-insensitive comparison, normalize to lower case (Alexei). ==================== Link: https://lore.kernel.org/r/20231214225016.1209867-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:30:27 -08:00
Andrii Nakryiko	f2d0ffee1f	selftests/bpf: utilize string values for delegate_xxx mount options Use both hex-based and string-based way to specify delegate mount options for BPF FS. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231214225016.1209867-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:30:27 -08:00
Andrii Nakryiko	c5707b2146	bpf: support symbolic BPF FS delegation mount options Besides already supported special "any" value and hex bit mask, support string-based parsing of delegation masks based on exact enumerator names. Utilize BTF information of `enum bpf_cmd`, `enum bpf_map_type`, `enum bpf_prog_type`, and `enum bpf_attach_type` types to find supported symbolic names (ignoring __MAX_xxx guard values and stripping repetitive prefixes like BPF_ for cmd and attach types, BPF_MAP_TYPE_ for maps, and BPF_PROG_TYPE_ for prog types). The case doesn't matter, but it is normalized to lower case in mount option output. So "PROG_LOAD", "prog_load", and "MAP_create" are all valid values to specify for delegate_cmds options, "array" is among supported for map types, etc. Besides supporting string values, we also support multiple values specified at the same time, using colon (':') separator. There are corresponding changes on bpf_show_options side to use known values to print them in human-readable format, falling back to hex mask printing, if there are any unrecognized bits. This shouldn't be necessary when enum BTF information is present, but in general we should always be able to fall back to this even if kernel was built without BTF. As mentioned, emitted symbolic names are normalized to be all lower case. 
Example below shows various ways to specify delegate_cmds options through mount command and how mount options are printed back: 12/14 14:39:07.604 vmuser@archvm:~/local/linux/tools/testing/selftests/bpf $ mount | rg token $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp $ mount | grep token bpffs on /sys/fs/bpf/token type bpf (rw,relatime,delegate_cmds=map_create:prog_load,delegate_progs=kprobe,delegate_attachs=xdp) Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231214225016.1209867-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:30:27 -08:00
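The parsing and printing rules in this commit message (prefix stripping, case-insensitive matching, the colon separator, the special "any" value, and the hex fallback) can be modelled in user space roughly as follows. This is a sketch, not the kernel code: the kernel discovers names through BTF, whereas here `BPF_CMD` is a small hand-written subset of `enum bpf_cmd`. The helper names are hypothetical, and treating "any" as all 64 bits set and printing the whole mask in hex when unrecognized bits are present are assumptions.

```python
from typing import Dict

ALL_BITS = (1 << 64) - 1  # assumption: "any" means every bit of a 64-bit mask

# Illustrative subset of `enum bpf_cmd` (enumerator name -> value).
BPF_CMD: Dict[str, int] = {
    "BPF_MAP_CREATE": 0,
    "BPF_MAP_LOOKUP_ELEM": 1,
    "BPF_PROG_LOAD": 5,
}


def _names(enum: Dict[str, int], prefix: str) -> Dict[str, int]:
    """Strip the repetitive prefix and normalize names to lower case."""
    return {n[len(prefix):].lower(): v for n, v in enum.items() if n.startswith(prefix)}


def parse_mask(value: str, enum: Dict[str, int], prefix: str) -> int:
    """Turn 'prog_load:MAP_CREATE', 'any', or '0x21' into a bit mask."""
    names = _names(enum, prefix)
    mask = 0
    for token in value.split(":"):
        key = token.lower()
        if key == "any":
            mask |= ALL_BITS
        elif key in names:
            mask |= 1 << names[key]
        else:
            mask |= int(token, 0)  # numeric fallback; ValueError on unknown names
    return mask


def show_mask(mask: int, enum: Dict[str, int], prefix: str) -> str:
    """Print a mask symbolically, or in hex if it has unrecognized bits."""
    if mask == ALL_BITS:
        return "any"
    names = _names(enum, prefix)
    known = sum(1 << v for v in names.values())
    if mask & ~known:
        return hex(mask)
    ordered = sorted(names.items(), key=lambda item: item[1])
    return ":".join(n for n, v in ordered if mask & (1 << v))


mask = parse_mask("prog_load:MAP_CREATE", BPF_CMD, "BPF_")
print(hex(mask), show_mask(mask, BPF_CMD, "BPF_"))
```

Printing in ascending bit order is what turns the input `prog_load:MAP_CREATE` into `map_create:prog_load`, the same normalization visible in the mount output quoted in the commit message.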
Jakub Kicinski	8f674972d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/intel/iavf/iavf_ethtool.c `3a0b5a2929` ("iavf: Introduce new state machines for flow director") `95260816b4` ("iavf: use iavf_schedule_aq_request() helper") https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/ drivers/net/ethernet/broadcom/bnxt/bnxt.c `c13e268c07` ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic") `c2f8063309` ("bnxt_en: Refactor RX VLAN acceleration logic.") `a7445d6980` ("bnxt_en: Add support for new RX and TPA_START completion types for P7") `1c7fd6ee2f` ("bnxt_en: Rename some macros for the P5 chips") https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c `bd6781c18c` ("bnxt_en: Fix wrong return value check in bnxt_close_nic()") `84793a4995` ("bnxt_en: Skip nic close/open when configuring tstamp filters") https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/ drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c `3d7a3f2612` ("net/mlx5: Nack sync reset request when HotPlug is enabled") `cecf44ea1a` ("net/mlx5: Allow sync reset flow when BF MGT interface device is present") https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-14 17:14:41 -08:00
Alexei Starovoitov	403f3e8fda	Merge branch 'add-bpf_xdp_get_xfrm_state-kfunc' Daniel Xu says: ==================== Add bpf_xdp_get_xfrm_state() kfunc This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and xfrm_state_put(). The intent is to support software RSS (via XDP) for the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed on (hopefully) reproducible AWS testbeds indicate that single tunnel pcpu ipsec can reach line rate on 100G ENA nics. Note this patchset only tests/shows generic xfrm_state access. The "secret sauce" (if you can really even call it that) involves accessing a soon-to-be-upstreamed pcpu_num field in xfrm_state. Early example is available here [1]. [0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/ [1]: `e89a1c617a/xdp-bench/xdp_redirect_cpumap.bpf.c (L385-L406)` Changes from v5: * Improve kfunc doc comments * Remove extraneous replay-window setting on selftest reverse path * Squash two kfunc commits into one * Rebase to bpf-next to pick up bitfield write patches * Remove testing of opts.error in selftest prog Changes from v4: * Fixup commit message for selftest * Set opts->error -ENOENT for !x * Revert single file xfrm + bpf Changes from v3: * Place all xfrm bpf integrations in xfrm_bpf.c * Avoid using nval as a temporary * Rebase to bpf-next * Remove extraneous __failure_unpriv annotation for verifier tests Changes from v2: * Fix/simplify BPF_CORE_WRITE_BITFIELD() algorithm * Added verifier tests for bitfield writes * Fix state leakage across test_tunnel subtests Changes from v1: * Move xfrm tunnel tests to test_progs * Fix writing to opts->error when opts is invalid * Use __bpf_kfunc_start_defs() * Remove unused vxlanhdr definition * Add and use BPF_CORE_WRITE_BITFIELD() macro * Make series bisect clean Changes from RFCv2: * Rebased to ipsec-next * Fix netns leak Changes from RFCv1: * Add Antony's commit tags * Add 
KF_ACQUIRE and KF_RELEASE semantics ==================== Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/cover.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:12:56 -08:00
Daniel Xu	2cd07b0eb0	bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state() This commit extends test_tunnel selftest to test the new XDP xfrm state lookup kfunc. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/e704e9a4332e3eac7b458e4bfdec8fcc6984cdb6.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:12:49 -08:00
Daniel Xu	e7adc8291a	bpf: selftests: Move xfrm tunnel test to test_progs test_progs is better than a shell script b/c C is a bit easier to maintain than shell. Also it's easier to use new infra like memory mapped global variables from C via bpf skeleton. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/a350db9e08520c64544562d88ec005a039124d9b.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:12:49 -08:00
Daniel Xu	02b4e126e6	bpf: selftests: test_tunnel: Use vmlinux.h declarations vmlinux.h declarations are more ergonomic, especially when working with kfuncs. The uapi headers are often incomplete for kfunc definitions. This commit also switches bitfield accesses to use CO-RE helpers. Switching to vmlinux.h definitions makes the verifier very unhappy with raw bitfield accesses. The error is: ; md.u.md2.dir = direction; 33: (69) r1 = *(u16 *)(r2 +11) misaligned stack access off (0x0; 0x0)+-64+11 size 2 Fix by using CO-RE-aware bitfield reads and writes. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/884bde1d9a351d126a3923886b945ea6b1b0776b.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:12:49 -08:00
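The bitfield fix described in 02b4e126e6 replaces a raw, possibly misaligned bitfield store with a helper-driven read-modify-write. The following is a minimal userspace sketch of that general mask-and-shift technique only. It is not the libbpf macro: the helper names `write_bitfield16` and `read_bitfield16` are hypothetical, and the shift and width are passed in by hand, whereas the CO-RE helpers are understood to derive them from relocation information.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of libbpf): store `val` into the
 * `width`-bit field that starts at bit `shift` of `*word`, leaving
 * all other bits of the word untouched. Assumes shift + width <= 16.
 */
static void write_bitfield16(uint16_t *word, unsigned shift,
			     unsigned width, uint16_t val)
{
	/* Build a mask covering exactly the target field. */
	uint16_t mask = (uint16_t)(((1u << width) - 1u) << shift);

	/* Clear the field, then OR in the new value, truncated to the field. */
	*word = (uint16_t)((*word & ~mask) | (((unsigned)val << shift) & mask));
}

/* Hypothetical counterpart: extract the same field from `word`. */
static uint16_t read_bitfield16(uint16_t word, unsigned shift, unsigned width)
{
	return (uint16_t)((word >> shift) & ((1u << width) - 1u));
}
```

The point of the sketch is that the field is always accessed through a whole, naturally aligned word, which is the property the verifier complaint above is about.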
Daniel Xu	77a7a8220f	bpf: selftests: test_tunnel: Setup fresh topology for each subtest This helps with determinism b/c individual setup/teardown prevents leaking state between different subtests. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/0fb59fa16fb58cca7def5239df606005a3e8dd0e.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:12:49 -08:00
Daniel Xu	8f0ec8c681	bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc This commit adds an unstable kfunc helper to access internal xfrm_state associated with an SA. This is intended to be used for the upcoming IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other words: for custom software RSS. That being said, the function that this kfunc wraps is fairly generic and used for a lot of xfrm tasks. I'm sure people will find uses elsewhere over time. This commit also adds a corresponding bpf_xdp_xfrm_state_release() kfunc to release the refcnt acquired by bpf_xdp_get_xfrm_state(). The verifier will require that all acquired xfrm_state's are released. Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/a29699c42f5fad456b875c98dd11c6afc3ffb707.1702593901.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:12:49 -08:00
Yonghong Song	56925f389e	selftests/bpf: Remove flaky test_btf_id test With the previous patch, one of the subtests in test_btf_id becomes flaky and may fail. The following is a failing example: Error: #26 btf Error: #26/174 btf/BTF ID Error: #26/174 btf/BTF ID btf_raw_create:PASS:check 0 nsec btf_raw_create:PASS:check 0 nsec test_btf_id:PASS:check 0 nsec ... test_btf_id:PASS:check 0 nsec test_btf_id:FAIL:check BTF lingers do_test_get_info:FAIL:check failed: -1 The test tries to prove that a btf_id is no longer available after the map is closed. But the btf_id is now freed only after a workqueue run and an RCU grace period, whereas previously it was freed right after an RCU grace period. Depending on system workload, the workqueue could take quite some time to execute bpf_map_free_deferred(), which may cause the test failure. Instead of adding arbitrary delays, let us remove the logic that checks btf_id availability after the map is closed. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231214203820.1469402-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:10:32 -08:00
Yonghong Song	59e5791f59	bpf: Fix a race condition between btf_put() and map_free() When running `./test_progs -j` in my local vm with latest kernel, I once hit a kasan error like below: [ 1887.184724] BUG: KASAN: slab-use-after-free in bpf_rb_root_free+0x1f8/0x2b0 [ 1887.185599] Read of size 4 at addr ffff888106806910 by task kworker/u12:2/2830 [ 1887.186498] [ 1887.186712] CPU: 3 PID: 2830 Comm: kworker/u12:2 Tainted: G OEL 6.7.0-rc3-00699-g90679706d486-dirty #494 [ 1887.188034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1887.189618] Workqueue: events_unbound bpf_map_free_deferred [ 1887.190341] Call Trace: [ 1887.190666] <TASK> [ 1887.190949] dump_stack_lvl+0xac/0xe0 [ 1887.191423] ? nf_tcp_handle_invalid+0x1b0/0x1b0 [ 1887.192019] ? panic+0x3c0/0x3c0 [ 1887.192449] print_report+0x14f/0x720 [ 1887.192930] ? preempt_count_sub+0x1c/0xd0 [ 1887.193459] ? __virt_addr_valid+0xac/0x120 [ 1887.194004] ? bpf_rb_root_free+0x1f8/0x2b0 [ 1887.194572] kasan_report+0xc3/0x100 [ 1887.195085] ? bpf_rb_root_free+0x1f8/0x2b0 [ 1887.195668] bpf_rb_root_free+0x1f8/0x2b0 [ 1887.196183] ? __bpf_obj_drop_impl+0xb0/0xb0 [ 1887.196736] ? preempt_count_sub+0x1c/0xd0 [ 1887.197270] ? preempt_count_sub+0x1c/0xd0 [ 1887.197802] ? _raw_spin_unlock+0x1f/0x40 [ 1887.198319] bpf_obj_free_fields+0x1d4/0x260 [ 1887.198883] array_map_free+0x1a3/0x260 [ 1887.199380] bpf_map_free_deferred+0x7b/0xe0 [ 1887.199943] process_scheduled_works+0x3a2/0x6c0 [ 1887.200549] worker_thread+0x633/0x890 [ 1887.201047] ? __kthread_parkme+0xd7/0xf0 [ 1887.201574] ? kthread+0x102/0x1d0 [ 1887.202020] kthread+0x1ab/0x1d0 [ 1887.202447] ? pr_cont_work+0x270/0x270 [ 1887.202954] ? kthread_blkcg+0x50/0x50 [ 1887.203444] ret_from_fork+0x34/0x50 [ 1887.203914] ? 
kthread_blkcg+0x50/0x50 [ 1887.204397] ret_from_fork_asm+0x11/0x20 [ 1887.204913] </TASK> [ 1887.204913] </TASK> [ 1887.205209] [ 1887.205416] Allocated by task 2197: [ 1887.205881] kasan_set_track+0x3f/0x60 [ 1887.206366] __kasan_kmalloc+0x6e/0x80 [ 1887.206856] __kmalloc+0xac/0x1a0 [ 1887.207293] btf_parse_fields+0xa15/0x1480 [ 1887.207836] btf_parse_struct_metas+0x566/0x670 [ 1887.208387] btf_new_fd+0x294/0x4d0 [ 1887.208851] __sys_bpf+0x4ba/0x600 [ 1887.209292] __x64_sys_bpf+0x41/0x50 [ 1887.209762] do_syscall_64+0x4c/0xf0 [ 1887.210222] entry_SYSCALL_64_after_hwframe+0x63/0x6b [ 1887.210868] [ 1887.211074] Freed by task 36: [ 1887.211460] kasan_set_track+0x3f/0x60 [ 1887.211951] kasan_save_free_info+0x28/0x40 [ 1887.212485] ____kasan_slab_free+0x101/0x180 [ 1887.213027] __kmem_cache_free+0xe4/0x210 [ 1887.213514] btf_free+0x5b/0x130 [ 1887.213918] rcu_core+0x638/0xcc0 [ 1887.214347] __do_softirq+0x114/0x37e The error happens at bpf_rb_root_free+0x1f8/0x2b0: 00000000000034c0 <bpf_rb_root_free>: ; { 34c0: f3 0f 1e fa endbr64 34c4: e8 00 00 00 00 callq 0x34c9 <bpf_rb_root_free+0x9> 34c9: 55 pushq %rbp 34ca: 48 89 e5 movq %rsp, %rbp ... ; if (rec && rec->refcount_off >= 0 && 36aa: 4d 85 ed testq %r13, %r13 36ad: 74 a9 je 0x3658 <bpf_rb_root_free+0x198> 36af: 49 8d 7d 10 leaq 0x10(%r13), %rdi 36b3: e8 00 00 00 00 callq 0x36b8 <bpf_rb_root_free+0x1f8> <==== kasan function 36b8: 45 8b 7d 10 movl 0x10(%r13), %r15d <==== use-after-free load 36bc: 45 85 ff testl %r15d, %r15d 36bf: 78 8c js 0x364d <bpf_rb_root_free+0x18d> So the problem is at rec->refcount_off in the above. I did some source code analysis and find the reason. CPU A CPU B bpf_map_put: ... btf_put with rcu callback ... bpf_map_free_deferred with system_unbound_wq ... ... ... ... btf_free_rcu: ... ... ... bpf_map_free_deferred: ... ... ... ---------> btf_struct_metas_free() ... \| race condition ... ... ---------> map->ops->map_free() ... ... 
btf->struct_meta_tab = NULL In the above, map_free() corresponds to array_map_free(), which eventually calls bpf_rb_root_free(), which in turn calls: ... __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false); ... Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with the following code: meta = btf_find_struct_meta(btf, btf_id); if (!meta) return -EFAULT; rec->fields[i].graph_root.value_rec = meta->record; So basically, 'value_rec' is a pointer to the record in struct_meta_tab. It is possible that this particular record has already been freed by btf_struct_metas_free(), hence the kasan error here. It is very hard to reproduce the failure with current bpf/bpf-next code; I only got the above error once. To increase reproducibility, I added a delay in bpf_map_free_deferred() to delay map->ops->map_free(), which significantly increased reproducibility. diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5e43ddd1b83f..aae5b5213e93 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work) struct bpf_map *map = container_of(work, struct bpf_map, work); struct btf_record *rec = map->record; + mdelay(100); security_bpf_map_free(map); bpf_map_release_memcg(map); /* implementation dependent freeing */ Hou also provided test cases ([1]) for easily reproducing the above issue. There are two ways to fix the issue: the v1 of the patch ([2]) moves btf_put() after the map_free callback, and the v5 of the patch ([3]) uses a kptr style fix which tries to get a btf reference during map_check_btf(). Each approach has its pros and cons. The first approach delays freeing btf, while the second approach needs to acquire a reference depending on context, which makes the logic not very elegant and may complicate things with future new data structures. Alexei suggested in [4] going back to v1, which is what this patch tries to do. 
I reran './test_progs -j' with the above mdelay() hack a couple of times and did not observe the error for the above rb_root test cases. Running Hou's test ([1]) was also successful. [1] https://lore.kernel.org/bpf/20231207141500.917136-1-houtao@huaweicloud.com/ [2] v1: https://lore.kernel.org/bpf/20231204173946.3066377-1-yonghong.song@linux.dev/ [3] v5: https://lore.kernel.org/bpf/20231208041621.2968241-1-yonghong.song@linux.dev/ [4] v4: https://lore.kernel.org/bpf/CAADnVQJ3FiXUhZJwX_81sjZvSYYKCFB3BT6P8D59RS2Gu+0Z7g@mail.gmail.com/ Cc: Hou Tao <houtao@huaweicloud.com> Fixes: `958cf2e273` ("bpf: Introduce bpf_obj_new") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231214203815.1469107-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-14 17:10:32 -08:00
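The ordering fix described in 59e5791f59 (drop the btf reference only after the map_free callback has run) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: every `model_*` and `teardown_*` name is hypothetical, the real teardown is asynchronous (RCU callback plus workqueue), and the model simply plays out the worst-case interleaving deterministically.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct btf: one reference, one set of records. */
struct model_btf {
	int refcnt;
	bool records_freed;	/* models btf_struct_metas_free() having run */
};

/* Hypothetical stand-in for a map whose value_rec points into the btf. */
struct model_map {
	struct model_btf *btf;
	bool touched_freed_record;	/* models what KASAN would report */
};

/* Drop one reference; the last put frees the records. */
static void model_btf_put(struct model_btf *btf)
{
	if (--btf->refcnt == 0)
		btf->records_freed = true;
}

/* Models map_free() dereferencing value_rec during teardown. */
static void model_map_free(struct model_map *map)
{
	if (map->btf->records_freed)
		map->touched_freed_record = true;
}

/* Worst case of the old order: the btf reference is dropped first. */
static bool teardown_put_first(void)
{
	struct model_btf btf = { .refcnt = 1 };
	struct model_map map = { .btf = &btf };

	model_btf_put(&btf);
	model_map_free(&map);
	return map.touched_freed_record;
}

/* Order after the fix: free the map, then drop the btf reference. */
static bool teardown_free_first(void)
{
	struct model_btf btf = { .refcnt = 1 };
	struct model_map map = { .btf = &btf };

	model_map_free(&map);
	model_btf_put(&btf);
	return map.touched_freed_record;
}
```

In the model, `teardown_put_first()` reports a touch of freed records while `teardown_free_first()` does not, which mirrors why moving btf_put() after map_free closes the window.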
Linus Torvalds	c7402612e2	Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Current release - regressions: - tcp: fix tcp_disordered_ack() vs usec TS resolution Current release - new code bugs: - dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set() - eth: octeon_ep: initialise control mbox tasks before using APIs Previous releases - regressions: - io_uring/af_unix: disable sending io_uring over sockets - eth: mlx5e: - TC, don't offload post action rule if not supported - fix possible deadlock on mlx5e_tx_timeout_work - eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close - eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb() - eth: ena: fix DMA syncing in XDP path when SWIOTLB is on - eth: team: fix use-after-free when an option instance allocation fails Previous releases - always broken: - neighbour: don't let neigh_forced_gc() disable preemption for long - net: prevent mss overflow in skb_segment() - ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX - tcp: remove acked SYN flag from packet in the transmit queue correctly - eth: octeontx2-af: - fix a use-after-free in rvu_nix_register_reporters - fix promisc mcam entry action - eth: dwmac-loongson: make sure MDIO is initialized before use - eth: atlantic: fix double free in ring reinit logic" * tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net: atlantic: fix double free in ring reinit logic appletalk: Fix Use-After-Free in atalk_ioctl net: stmmac: Handle disabled MDIO busses from devicetree net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX dpaa2-switch: do not ask for MDB, VLAN and FDB replay dpaa2-switch: fix size of the dma_unmap net: prevent mss overflow in skb_segment() vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space() Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set" MIPS: dts: loongson: drop incorrect dwmac fallback compatible stmmac: dwmac-loongson: drop useless check for compatible fallback stmmac: dwmac-loongson: Make sure MDIO is initialized before use tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set() net: ena: Fix XDP redirection error net: ena: Fix DMA syncing in XDP path when SWIOTLB is on net: ena: Fix xdp drops handling due to multibuf packets net: ena: Destroy correct number of xdp queues upon failure net: Remove acked SYN flag from packet in the transmit queue correctly qed: Fix a potential use-after-free in qed_cxt_tables_alloc ...	2023-12-14 13:11:49 -08:00
Linus Torvalds	bdb2701f0b	Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Some fixes to quota accounting code, mostly around error handling and correctness: - free reserves on various error paths, after IO errors or transaction abort - don't clear reserved range at the folio release time, it'll be properly cleared after final write - fix integer overflow due to int used when passing around size of freed reservations - fix a regression in squota accounting that missed some cases with delayed refs" * tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: ensure releasing squota reserve on head refs btrfs: don't clear qgroup reserved bit in release_folio btrfs: free qgroup pertrans reserve on transaction abort btrfs: fix qgroup_free_reserved_data int overflow btrfs: free qgroup reserve when ORDERED_IOERR is set	2023-12-14 11:53:00 -08:00
Paul M Stillwell Jr	d96f04e05f	ice: add documentation for FW logging Add documentation for FW logging in Documentation/networking/device_drivers/ethernet/intel/ice.rst Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-12-14 09:51:02 -08:00
1236364 commits