Commit graph

1296623 commits

Author SHA1 Message Date
Linus Torvalds 3ec3f5fc4a Livepatching selftest fixup for 6.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmbNnWsACgkQUqAMR0iA
 lPK8sQ//Y6fsXu3TIoFRu0mCgwV2SyyseCBLOHMPOvwIQawBc3EdwzAfktAcObqW
 6+y/RZN6sWk/7IcNhRYt29L6eBglcz08SQ0UFimaNPgh9BD8TOzyzoJw4kvrejUs
 EalecacAo/C2ntDi0N9UNARm7X4zcr6hnIkmF/HDuMGzkRngiACOuL7MFJyOIIfK
 diz8Pu0CGPfjEnby1eUDSunUrVc5TFz6rt+AGnO6GAaH5YUASRby3NvGxyk98MX4
 oYjr+Q7dVAhDfazG7wW0JeCH6XQafQllyuZZKjt+udEavLOLmVvNfE3zVCf8+E5V
 kSymUfQaMpVqaWsJFS1uxchWwkQnbqBEgx/WJlA7esP1RsydjS1ui65Wi+P+WYpL
 L9l68N0NZJ10mr6+tzn9qj9Rr69rm5lmuA/hJglAwgZydWdl2bb8EXZzSJofhIbg
 KjjPDcWCl4GKSXt7lseYO7tDZofhD4E24tzk9xme8gOPRwdqzAcpFKoobqhk/9jO
 /HlgDNtXbkq6JfOmcg1wIA0+XO31ezP7w2aSn5oIn9YoQdFxT0xY8hj7W690FOfU
 SOLtXWC43XWJzGStt19b9inMO4E9dmmNffX3CAzqI3mEQcwfFdXv8S+ZHQn0jUAF
 Y/lx9qDgtYjdpe/FbS1+bICFgbWxCi0NCSF3GaQjFGf7Rz77K7k=
 =dua0
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching fix from Petr Mladek:
 "Selftest regression fix"

* tag 'livepatching-for-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  selftests/livepatch: wait for atomic replace to occur
2024-08-28 06:34:08 +12:00
Linus Torvalds 41594663c3 Pin control fixes for the v6.11 series:
- Fix the hwirq map and pin offsets in the Qualcomm X1E80100 driver.
 
 - Fix the pin range handling in the AT91 driver so it works again.
 
 - Fix a NULL-dereference risk in pinctrl single.
 
 - Fix a serious biasing bug in the Mediatek driver.
 
 - Fix the level trigged IRQ in the StarFive JH7110.
 
 - Fix the iomux width in the Rockchip GPIO2-B pin handling.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmbN7ksACgkQQRCzN7AZ
 XXN4LRAAq1SfehCkuTOcJLEMgRUyWXoY9GadNNBoW/j7wV/XkBiBnZGBxWSlbSM7
 2NP/Rdnmadx/PXp39ptMYPyQjOouROpMnq0VNv9GqU3lF5JonXA0ccXQeFEqCtDq
 57d/orkcRrVJ8AxerEp/SDasfm6yUDOBKA0M/lc4aFNebMsH785LZt6zfBTuxAUs
 dU453uilizOM8w2qS280BcAdpW+hhm0Ev6dWuRN7ALNSXnLM/LXx6vutf984rCmH
 Ua+9CxbQ46CO2gZcdBWRZjkbgz6tI6sCELAum/bWEZQytx9BpAPA//0iJcCPtVS+
 BVAFAAABLtDEP4p9pRtMcXhBkA9CJ1Uv+Fx3nK+qoYv0M4BkpUC0nE7csoKs2ya0
 odfXJOjnprY6T93yHdrnUxb7ME7jkr4wpQJA6dYo8rBpwHmvyXRdtRZLwtsG97Wi
 DZohQTTt+M/CnduuMDNebER6k9lHjgHx7VyaCrKfBMm/F4fjrFgK2z7gnz8XmMZQ
 MyLPN0zA2or/LuyIk7hnEREPdvqhTd+3ORWLSBX08BTQMkmBWTLuAoX3cN5kpCBH
 6exbVCqR+KrX2C7r8oEBrpH7wncoBon6br7Bh08y4UhzA/5e2Rl4pgjULfc/gTGW
 Fs+yRZwgJuFN9GzaCzSudxvPD/bUt56ceyx2w9CdBrjXdyAmxj4=
 =+YGQ
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

 - Fix the hwirq map and pin offsets in the Qualcomm X1E80100 driver

 - Fix the pin range handling in the AT91 driver so it works again

 - Fix a NULL-dereference risk in pinctrl single

 - Fix a serious biasing bug in the Mediatek driver

 - Fix the level trigged IRQ in the StarFive JH7110

 - Fix the iomux width in the Rockchip GPIO2-B pin handling

* tag 'pinctrl-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: rockchip: correct RK3328 iomux width flag for GPIO2-B pins
  pinctrl: starfive: jh7110: Correct the level trigger configuration of iev register
  pinctrl: qcom: x1e80100: Fix special pin offsets
  pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE
  pinctrl: single: fix potential NULL dereference in pcs_get_function()
  pinctrl: at91: make it work with current gpiolib
  pinctrl: qcom: x1e80100: Update PDC hwirq map
2024-08-28 06:26:32 +12:00
Linus Torvalds 6ace1c7ea2 sound fixes for 6.11-rc6
It became a bit larger collection of fixes than wished at this time,
 but all changes are small and mostly device-specific fixes that
 should be fairly safe to apply.  Majority of fixes are about ASoC
 for AMD SOF, Cirrus codecs, lpass, etc, in addition to the usual
 HD-audio quirks / fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmbNkGYOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE/yZxAAwFz2l10Kia6qPJuqmlfZU4o6Wwt7x4MPO+hS
 v0t2P9HqesiFA3ObYpOHYTsR1ufz1Jvh8X99Ie5GinjxOvAO95TAvAL1UVq9zWTw
 J1pUQ8eFJjRAufAzFFo4Ffn2F5VmYdTK8HYPzy+3+ck5pdFSNcF3x1oc3b6gOb7f
 xA9I6L6xLQdcKYAOfUFRuh1HvV970vdPNwJnxQXwy4R/UwJWqkkVlnwPmnpiz/Xg
 7u95QdtlL8H46AqNgufDBZ0i+1reO3vMaJY5Xrz2c7egxyQqbH4fcHEqgRRVKZ7N
 WF9Bti8YXcruBB0FLIuNIPs2dRnnYmZy1fZy0gOtJzhIUZI08+zr+gm5BxAat/q2
 /ktjPWTc7LGRQLrZ1T9x6zmnJ6Cq8vCF4zyN/bSI7VhWr/KtfyZzu5PKpgx81zS9
 r3UP+edaOcf4Xvsg95D6QQxFTkg46kg730jUeKAg47R1dnCqZvZKLkZWS/IusV9u
 TBnV1JKrgkyJu3a/1RP1hN1eGWaAPm1Z8aI2VorG16/FUNlBkyn6KFiLRJk7phRL
 RtHN8oZUiyxV+swLprYnZ1e8uYhLEklqQ3HZkchj2DKOj/Oe180udlEQpxAzLnd+
 4G2iY2A/4x2Aab1YgH5xuPoC6sw1l7EJ0lZbxhlE6vHnOBL8+wx0cwnmH51QxftS
 lk0eCSI=
 =u+nW
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "It became a bit larger collection of fixes than wished at this time,
  but all changes are small and mostly device-specific fixes that should
  be fairly safe to apply.

  Majority of fixes are about ASoC for AMD SOF, Cirrus codecs, lpass,
  etc, in addition to the usual HD-audio quirks / fixes"

* tag 'sound-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (22 commits)
  ALSA: hda: hda_component: Fix mutex crash if nothing ever binds
  ALSA: hda/realtek: support HP Pavilion Aero 13-bg0xxx Mute LED
  ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book3 Ultra
  ASoC: cs-amp-lib: Ignore empty UEFI calibration entries
  ASoC: cs-amp-lib-test: Force test calibration blob entries to be valid
  ALSA: hda/realtek - FIxed ALC285 headphone no sound
  ALSA: hda/realtek - Fixed ALC256 headphone no sound
  ASoC: allow module autoloading for table board_ids
  ASoC: allow module autoloading for table db1200_pids
  ALSA: hda: cs35l56: Don't use the device index as a calibration index
  ALSA: seq: Skip event type filtering for UMP events
  ALSA: hda/realtek: Enable mute/micmute LEDs on HP Laptop 14-ey0xxx
  ASoC: SOF: amd: Fix for acp init sequence
  ASoC: amd: acp: fix module autoloading
  ASoC: mediatek: mt8188: Mark AFE_DAC_CON0 register as volatile
  ASoC: codecs: wcd937x: Fix missing de-assert of reset GPIO
  ASoC: SOF: mediatek: Add missing board compatible
  ASoC: MAINTAINERS: Drop Banajit Goswami from Qualcomm sound drivers
  ASoC: SOF: amd: Fix for incorrect acp error register offsets
  ASoC: SOF: amd: move iram-dram fence register programming sequence
  ...
2024-08-28 06:24:22 +12:00
Stefan Berger 08d08e2e9f tpm: ibmvtpm: Call tpm2_sessions_init() to initialize session support
Commit d2add27cf2 ("tpm: Add NULL primary creation") introduced
CONFIG_TCG_TPM2_HMAC. When this option is enabled on ppc64 then the
following message appears in the kernel log due to a missing call to
tpm2_sessions_init().

[    2.654549] tpm tpm0: auth session is not active

Add the missing call to tpm2_session_init() to the ibmvtpm driver to
resolve this issue.

Cc: stable@vger.kernel.org # v6.10+
Fixes: d2add27cf2 ("tpm: Add NULL primary creation")
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-08-27 21:11:44 +03:00
Pablo Neira Ayuso 70c261d500 netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
From netdev/egress, skb->len can include the ethernet header, therefore,
subtract network offset from skb->len when validating IPv6 packet length.

Fixes: 42df6e1d22 ("netfilter: Introduce egress hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-27 18:11:56 +02:00
Eric Dumazet bc21000e99 net_sched: sch_fq: fix incorrect behavior for small weights
fq_dequeue() has a complex logic to find packets in one of the 3 bands.

As Neal found out, it is possible that one band has a deficit smaller
than its weight. fq_dequeue() can return NULL while some packets are
elligible for immediate transmit.

In this case, more than one iteration is needed to refill pband->credit.

With default parameters (weights 589824 196608 65536) bug can trigger
if large BIG TCP packets are sent to the lowest priority band.

Bisected-by: John Sperbeck <jsperbeck@google.com>
Diagnosed-by: Neal Cardwell <ncardwell@google.com>
Fixes: 29f834aa32 ("net_sched: sch_fq: add 3 bands and WRR scheduling")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20240824181901.953776-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27 08:20:45 -07:00
Filipe Manana ecb54277cb btrfs: fix uninitialized return value from btrfs_reclaim_sweep()
The return variable 'ret' at btrfs_reclaim_sweep() is never assigned if
none of the space infos is reclaimable (for example if periodic reclaim
is disabled, which is the default), so we return an undefined value.

This can be fixed my making btrfs_reclaim_sweep() not return any value
as well as do_reclaim_sweep() because:

1) do_reclaim_sweep() always returns 0, so we can make it return void;

2) The only caller of btrfs_reclaim_sweep() (btrfs_reclaim_bgs()) doesn't
   care about its return value, and in its context there's nothing to do
   about any errors anyway.

Therefore remove the return value from btrfs_reclaim_sweep() and
do_reclaim_sweep().

Fixes: e4ca3932ae ("btrfs: periodic block_group reclaim")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-27 16:42:09 +02:00
Brett Creeley 4786fe29f5 ionic: Prevent tx_timeout due to frequent doorbell ringing
With recent work to the doorbell workaround code a small hole was
introduced that could cause a tx_timeout. This happens if the rx
dbell_deadline goes beyond the netdev watchdog timeout set by the driver
(i.e. 2 seconds). Fix this by changing the netdev watchdog timeout to 5
seconds and reduce the max rx dbell_deadline to 4 seconds.

The test that can reproduce the issue being fixed is a multi-queue send
test via pktgen with the "burst" setting to 1. This causes the queue's
doorbell to be rung on every packet sent to the driver, which may result
in the device missing doorbells due to the high doorbell rate.

Cc: stable@vger.kernel.org
Fixes: 4ded136c78 ("ionic: add work item for missed-doorbell check")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240822192557.9089-1-brett.creeley@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27 13:22:42 +02:00
Paolo Abeni f8fdda9e4f Merge branch 'tc-adjust-network-header-after-2nd-vlan-push'
Boris Sukholitko says:

====================
tc: adjust network header after 2nd vlan push

<tldr>
skb network header of the single-tagged vlan packet continues to point the
vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This
causes problem at the dissector which expects double-tagged packet network
header to point to the inner vlan.

The fix is to adjust network header in tcf_act_vlan.c but requires
refactoring of skb_vlan_push function.
</tldr>

Consider the following shell script snippet configuring TC rules on the
veth interface:

ip link add veth0 type veth peer veth1
ip link set veth0 up
ip link set veth1 up

tc qdisc add dev veth0 clsact

tc filter add dev veth0 ingress pref 10 chain 0 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5
tc filter add dev veth0 ingress pref 20 chain 0 flower \
	num_of_vlans 1 action vlan push id 100 \
	protocol 0x8100 action goto chain 5
tc filter add dev veth0 ingress pref 30 chain 5 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success"

Sending double-tagged vlan packet with the IP payload inside:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64   ..........."...d
0010  81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11   ......E..&......
0020  18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12   ................
0030  e1 c7 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to
the dmesg.

OTOH, sending single-tagged vlan packet:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14   ..........."....
0010  08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00   ..E..*..........
0020  00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00   ................
0030  00 00 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 20, will push the second vlan tag but will *not* match
rule 30. IOW, the match at rule 30 fails if the second vlan was freshly
pushed by the kernel.

Lets look at  __skb_flow_dissect working on the double-tagged vlan packet.
Here is the relevant code from around net/core/flow_dissector.c:1277
copy-pasted here for convenience:

	if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX &&
	    skb && skb_vlan_tag_present(skb)) {
		proto = skb->protocol;
	} else {
		vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
					    data, hlen, &_vlan);
		if (!vlan) {
			fdret = FLOW_DISSECT_RET_OUT_BAD;
			break;
		}

		proto = vlan->h_vlan_encapsulated_proto;
		nhoff += sizeof(*vlan);
	}

The "else" clause above gets the protocol of the encapsulated packet from
the skb data at the network header location. printk debugging has showed
that in the good double-tagged packet case proto is
htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet
case proto is garbage leading to the failure to match tc filter 30.

proto is being set from the skb header pointed by nhoff parameter which is
defined at the beginning of __skb_flow_dissect
(net/core/flow_dissector.c:1055 in the current version):

		nhoff = skb_network_offset(skb);

Therefore the culprit seems to be that the skb network offset is different
between double-tagged packet received from the interface and single-tagged
packet having its vlan tag pushed by TC.

Lets look at the interesting points of the lifetime of the single/double
tagged packets as they traverse our packet flow.

Both of them will start at __netif_receive_skb_core where the first vlan
tag will be stripped:

	if (eth_type_vlan(skb->protocol)) {
		skb = skb_vlan_untag(skb);
		if (unlikely(!skb))
			goto out;
	}

At this stage in double-tagged case skb->data points to the second vlan tag
while in single-tagged case skb->data points to the network (eg. IP)
header.

Looking at TC vlan push action (net/sched/act_vlan.c) we have the following
code at tcf_vlan_act (interesting points are in square brackets):

	if (skb_at_tc_ingress(skb))
[1]		skb_push_rcsum(skb, skb->mac_len);

	....

	case TCA_VLAN_ACT_PUSH:
		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
				    (p->tcfv_push_prio << VLAN_PRIO_SHIFT),
				    0);
		if (err)
			goto drop;
		break;

	....

out:
	if (skb_at_tc_ingress(skb))
[3]		skb_pull_rcsum(skb, skb->mac_len);

And skb_vlan_push (net/core/skbuff.c:6204) function does:

		err = __vlan_insert_tag(skb, skb->vlan_proto,
					skb_vlan_tag_get(skb));
		if (err)
			return err;

		skb->protocol = skb->vlan_proto;
[2]		skb->mac_len += VLAN_HLEN;

in the case of pushing the second tag. Lets look at what happens with
skb->data of the single-tagged packet at each of the above points:

1. As a result of the skb_push_rcsum, skb->data is moved back to the start
   of the packet.

2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is
   incremented, skb->data still points to the start of the packet.

3. As a result of the skb_pull_rcsum, skb->data is moved forward by the
   modified skb->mac_len, thus pointing to the network header again.

Then __skb_flow_dissect will get confused by having double-tagged vlan
packet with the skb->data at the network header.

The solution for the bug is to preserve "skb->data at second vlan header"
semantics in the skb_vlan_push function. We do this by manipulating
skb->network_header rather than skb->mac_len. skb_vlan_push callers are
updated to do skb_reset_mac_len.

More about the patch series:

* patch 1 fixes skb_vlan_push and the callers
* patch 2 adds ingress tc_actions test
* patch 3 adds egress tc_actions test
====================

Link: https://patch.msgid.link/20240822103510.468293-1-boris.sukholitko@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27 11:37:46 +02:00
Boris Sukholitko 2da44703a5 selftests: tc_actions: test egress 2nd vlan push
Add new test checking the correctness of inner vlan flushing to the skb
data when outer vlan tag is added through act_vlan on egress.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27 11:37:43 +02:00
Boris Sukholitko 59c330ecce selftests: tc_actions: test ingress 2nd vlan push
Add new test checking the correctness of inner vlan flushing to the skb
data when outer vlan tag is added through act_vlan on ingress.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27 11:37:42 +02:00
Boris Sukholitko 9388637270 tc: adjust network header after 2nd vlan push
<tldr>
skb network header of the single-tagged vlan packet continues to point the
vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This
causes problem at the dissector which expects double-tagged packet network
header to point to the inner vlan.

The fix is to adjust network header in tcf_act_vlan.c but requires
refactoring of skb_vlan_push function.
</tldr>

Consider the following shell script snippet configuring TC rules on the
veth interface:

ip link add veth0 type veth peer veth1
ip link set veth0 up
ip link set veth1 up

tc qdisc add dev veth0 clsact

tc filter add dev veth0 ingress pref 10 chain 0 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5
tc filter add dev veth0 ingress pref 20 chain 0 flower \
	num_of_vlans 1 action vlan push id 100 \
	protocol 0x8100 action goto chain 5
tc filter add dev veth0 ingress pref 30 chain 5 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success"

Sending double-tagged vlan packet with the IP payload inside:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64   ..........."...d
0010  81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11   ......E..&......
0020  18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12   ................
0030  e1 c7 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to
the dmesg.

OTOH, sending single-tagged vlan packet:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14   ..........."....
0010  08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00   ..E..*..........
0020  00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00   ................
0030  00 00 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 20, will push the second vlan tag but will *not* match
rule 30. IOW, the match at rule 30 fails if the second vlan was freshly
pushed by the kernel.

Lets look at  __skb_flow_dissect working on the double-tagged vlan packet.
Here is the relevant code from around net/core/flow_dissector.c:1277
copy-pasted here for convenience:

	if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX &&
	    skb && skb_vlan_tag_present(skb)) {
		proto = skb->protocol;
	} else {
		vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
					    data, hlen, &_vlan);
		if (!vlan) {
			fdret = FLOW_DISSECT_RET_OUT_BAD;
			break;
		}

		proto = vlan->h_vlan_encapsulated_proto;
		nhoff += sizeof(*vlan);
	}

The "else" clause above gets the protocol of the encapsulated packet from
the skb data at the network header location. printk debugging has showed
that in the good double-tagged packet case proto is
htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet
case proto is garbage leading to the failure to match tc filter 30.

proto is being set from the skb header pointed by nhoff parameter which is
defined at the beginning of __skb_flow_dissect
(net/core/flow_dissector.c:1055 in the current version):

		nhoff = skb_network_offset(skb);

Therefore the culprit seems to be that the skb network offset is different
between double-tagged packet received from the interface and single-tagged
packet having its vlan tag pushed by TC.

Lets look at the interesting points of the lifetime of the single/double
tagged packets as they traverse our packet flow.

Both of them will start at __netif_receive_skb_core where the first vlan
tag will be stripped:

	if (eth_type_vlan(skb->protocol)) {
		skb = skb_vlan_untag(skb);
		if (unlikely(!skb))
			goto out;
	}

At this stage in double-tagged case skb->data points to the second vlan tag
while in single-tagged case skb->data points to the network (eg. IP)
header.

Looking at TC vlan push action (net/sched/act_vlan.c) we have the following
code at tcf_vlan_act (interesting points are in square brackets):

	if (skb_at_tc_ingress(skb))
[1]		skb_push_rcsum(skb, skb->mac_len);

	....

	case TCA_VLAN_ACT_PUSH:
		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
				    (p->tcfv_push_prio << VLAN_PRIO_SHIFT),
				    0);
		if (err)
			goto drop;
		break;

	....

out:
	if (skb_at_tc_ingress(skb))
[3]		skb_pull_rcsum(skb, skb->mac_len);

And skb_vlan_push (net/core/skbuff.c:6204) function does:

		err = __vlan_insert_tag(skb, skb->vlan_proto,
					skb_vlan_tag_get(skb));
		if (err)
			return err;

		skb->protocol = skb->vlan_proto;
[2]		skb->mac_len += VLAN_HLEN;

in the case of pushing the second tag. Lets look at what happens with
skb->data of the single-tagged packet at each of the above points:

1. As a result of the skb_push_rcsum, skb->data is moved back to the start
   of the packet.

2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is
   incremented, skb->data still points to the start of the packet.

3. As a result of the skb_pull_rcsum, skb->data is moved forward by the
   modified skb->mac_len, thus pointing to the network header again.

Then __skb_flow_dissect will get confused by having double-tagged vlan
packet with the skb->data at the network header.

The solution for the bug is to preserve "skb->data at second vlan header"
semantics in the skb_vlan_push function. We do this by manipulating
skb->network_header rather than skb->mac_len. skb_vlan_push callers are
updated to do skb_reset_mac_len.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27 11:37:42 +02:00
Emmanuel Grumbach 094513f8a2 wifi: iwlwifi: clear trans->state earlier upon error
When the firmware crashes, we first told the op_mode and only then,
changed the transport's state. This is a problem if the op_mode's
nic_error() handler needs to send a host command: it'll see that the
transport's state still reflects that the firmware is alive.

Today, this has no consequences since we set the STATUS_FW_ERROR bit and
that will prevent sending host commands. iwl_fw_dbg_stop_restart_recording
looks at this bit to know not to send a host command for example.

To fix the hibernation, we needed to reset the firmware without having
an error and checking STATUS_FW_ERROR to see whether the firmware is
alive will no longer hold, so this change is necessary as well.

Change the flow a bit.
Change trans->state before calling the op_mode's nic_error() method and
check trans->state instead of STATUS_FW_ERROR. This will keep the
current behavior of iwl_fw_dbg_stop_restart_recording upon firmware
error, and it'll allow us to call iwl_fw_dbg_stop_restart_recording
safely even if STATUS_FW_ERROR is clear, but yet, the firmware is not
alive.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.9d7427fbdfd7.Ia056ca57029a382c921d6f7b6a6b28fc480f2f22@changeid
[I missed this was a dependency for the hibernation fix, changed
 the commit message a bit accordingly]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-08-27 09:54:24 +02:00
Alexander Sverdlin 6d30bb88f6 wifi: wfx: repair open network AP mode
RSN IE missing in beacon is normal in open networks.
Avoid returning -EINVAL in this case.

Steps to reproduce:

$ cat /etc/wpa_supplicant.conf
network={
	ssid="testNet"
	mode=2
	key_mgmt=NONE
}

$ wpa_supplicant -iwlan0 -c /etc/wpa_supplicant.conf
nl80211: Beacon set failed: -22 (Invalid argument)
Failed to set beacon parameters
Interface initialization failed
wlan0: interface state UNINITIALIZED->DISABLED
wlan0: AP-DISABLED
wlan0: Unable to setup interface.
Failed to initialize AP interface

After the change:

$ wpa_supplicant -iwlan0 -c /etc/wpa_supplicant.conf
Successfully initialized wpa_supplicant
wlan0: interface state UNINITIALIZED->ENABLED
wlan0: AP-ENABLED

Cc: stable@vger.kernel.org
Fixes: fe0a7776d4 ("wifi: wfx: fix possible NULL pointer dereference in wfx_set_mfp_ap()")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20240823131521.3309073-1-alexander.sverdlin@siemens.com
2024-08-27 10:49:26 +03:00
Linus Torvalds 3e9bff3bbe vfs-6.11-rc6.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZsxg5QAKCRCRxhvAZXjc
 olSiAQDvFvim4YtMmUDagC3yWTBsf+o3lYdAIuzNE0NtSn4vpAEAl/HVhQCaEDjv
 mcE3jokEsbvyXLnzs78PrY0Heua2mQg=
 =AHAd
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.11-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "VFS:

   - Ensure that backing files uses file->f_ops->splice_write() for
     splice

  netfs:

   - Revert the removal of PG_private_2 from netfs_release_folio() as
     cephfs still relies on this

   - When AS_RELEASE_ALWAYS is set on a mapping the folio needs to
     always be invalidated during truncation

   - Fix losing untruncated data in a folio by making letting
     netfs_release_folio() return false if the folio is dirty

   - Fix trimming of streaming-write folios in netfs_inval_folio()

   - Reset iterator before retrying a short read

   - Fix interaction of streaming writes with zero-point tracker

  afs:

   - During truncation afs currently calls truncate_setsize() which sets
     i_size, expands the pagecache and truncates it. The first two
     operations aren't needed because they will have already been done.
     So call truncate_pagecache() instead and skip the redundant parts

  overlayfs:

   - Fix checking of the number of allowed lower layers so 500 layers
     can actually be used instead of just 499

   - Add missing '\n' to pr_err() output

   - Pass string to ovl_parse_layer() and thus allow it to be used for
     Opt_lowerdir as well

  pidfd:

   - Revert blocking the creation of pidfds for kthread as apparently
     userspace relies on this. Specifically, it breaks systemd during
     shutdown

  romfs:

   - Fix romfs_read_folio() to use the correct offset with
     folio_zero_tail()"

* tag 'vfs-6.11-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix interaction of streaming writes with zero-point tracker
  netfs: Fix missing iterator reset on retry of short read
  netfs: Fix trimming of streaming-write folios in netfs_inval_folio()
  netfs: Fix netfs_release_folio() to say no if folio dirty
  afs: Fix post-setattr file edit to do truncation correctly
  mm: Fix missing folio invalidation calls during truncation
  ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_err
  ovl: fix wrong lowerdir number check for parameter Opt_lowerdir
  ovl: pass string to ovl_parse_layer()
  backing-file: convert to using fops->splice_write
  Revert "pidfd: prevent creation of pidfds for kthreads"
  romfs: fix romfs_read_folio()
  netfs, ceph: Partially revert "netfs: Replace PG_fscache by setting folio->private and marking dirty"
2024-08-27 16:57:35 +12:00
Jakub Kicinski d0cb324c47 Merge branch 'add-embedded-sync-feature-for-a-dpll-s-pin'
Arkadiusz Kubalewski says:

====================
Add Embedded SYNC feature for a dpll's pin

Introduce and allow DPLL subsystem users to get/set capabilities of
Embedded SYNC on a dpll's pin.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
====================

Link: https://patch.msgid.link/20240822222513.255179-1-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 19:21:16 -07:00
Arkadiusz Kubalewski 87abc5666a ice: add callbacks for Embedded SYNC enablement on dpll pins
Allow the user to get and set configuration of Embedded SYNC feature
on the ice driver dpll pins.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240822222513.255179-3-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 19:21:14 -07:00
Arkadiusz Kubalewski cda1fba15c dpll: add Embedded SYNC feature for a pin
Implement and document new pin attributes for providing Embedded SYNC
capabilities to the DPLL subsystem users through a netlink pin-get
do/dump messages. Allow the user to set Embedded SYNC frequency with
pin-set do netlink message.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240822222513.255179-2-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 19:21:14 -07:00
Qu Wenruo 10d9d8c351 btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk()
[BUG]
There is an internal report that KASAN is reporting use-after-free, with
the following backtrace:

  BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs]
  Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45
  CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ #76
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
  Call Trace:
   dump_stack_lvl+0x61/0x80
   print_address_description.constprop.0+0x5e/0x2f0
   print_report+0x118/0x216
   kasan_report+0x11d/0x1f0
   btrfs_check_read_bio+0xa68/0xb70 [btrfs]
   process_one_work+0xce0/0x12a0
   worker_thread+0x717/0x1250
   kthread+0x2e3/0x3c0
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x11/0x20

  Allocated by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   __kasan_slab_alloc+0x7d/0x80
   kmem_cache_alloc_noprof+0x16e/0x3e0
   mempool_alloc_noprof+0x12e/0x310
   bio_alloc_bioset+0x3f0/0x7a0
   btrfs_bio_alloc+0x2e/0x50 [btrfs]
   submit_extent_page+0x4d1/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Freed by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x50
   __kasan_slab_free+0x4b/0x60
   kmem_cache_free+0x214/0x5d0
   bio_free+0xed/0x180
   end_bbio_data_read+0x1cc/0x580 [btrfs]
   btrfs_submit_chunk+0x98d/0x1880 [btrfs]
   btrfs_submit_bio+0x33/0x70 [btrfs]
   submit_one_bio+0xd4/0x130 [btrfs]
   submit_extent_page+0x3ea/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

[CAUSE]
Although I cannot reproduce the error, the report itself is good enough
to pin down the cause.

The call trace is the regular endio workqueue context, but the
free-by-task trace is showing that during btrfs_submit_chunk() we
already hit a critical error, and is calling btrfs_bio_end_io() to error
out.  And the original endio function called bio_put() to free the whole
bio.

This means a double freeing thus causing use-after-free, e.g.:

1. Enter btrfs_submit_bio() with a read bio
   The read bio length is 128K, crossing two 64K stripes.

2. The first run of btrfs_submit_chunk()

2.1 Call btrfs_map_block(), which returns 64K
2.2 Call btrfs_split_bio()
    Now there are two bios, one referring to the first 64K, the other
    referring to the second 64K.
2.3 The first half is submitted.

3. The second run of btrfs_submit_chunk()

3.1 Call btrfs_map_block(), which by somehow failed
    Now we call btrfs_bio_end_io() to handle the error

3.2 btrfs_bio_end_io() calls the original endio function
    Which is end_bbio_data_read(), and it calls bio_put() for the
    original bio.

    Now the original bio is freed.

4. The submitted first 64K bio finished
   Now we call into btrfs_check_read_bio() and tries to advance the bio
   iter.
   But since the original bio (thus its iter) is already freed, we
   trigger the above use-after free.

   And even if the memory is not poisoned/corrupted, we will later call
   the original endio function, causing a double freeing.

[FIX]
Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(),
which has the extra check on split bios and do the proper refcounting
for cloned bios.

Furthermore there is already one extra btrfs_cleanup_bio() call, but
that is duplicated to btrfs_orig_bbio_end_io() call, so remove that
label completely.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 852eee62d3 ("btrfs: allow btrfs_submit_bio to split bios")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-27 01:34:08 +02:00
Jeff Layton 7e8ae8486e fs/nfsd: fix update of inode attrs in CB_GETATTR
Currently, we copy the mtime and ctime to the in-core inode and then
mark the inode dirty. This is fine for certain types of filesystems, but
not all. Some require a real setattr to properly change these values
(e.g. ceph or reexported NFS).

Fix this code to call notify_change() instead, which is the proper way
to effect a setattr. There is one problem though:

In this case, the client is holding a write delegation and has sent us
attributes to update our cache. We don't want to break the delegation
for this since that would defeat the purpose. Add a new ATTR_DELEG flag
that makes notify_change bypass the try_break_deleg call.

Fixes: c5967721e1 ("NFSD: handle GETATTR conflict with write delegation")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-08-26 19:04:00 -04:00
MD Danish Anwar e846be0fba net: ti: icssg-prueth: Fix 10M Link issue on AM64x
Crash is seen on AM64x 10M link when connecting / disconnecting multiple
times.

The fix for this is to enable quirk_10m_link_issue for AM64x.

Fixes: b256e13378 ("net: ti: icssg-prueth: Add AM64x icssg support")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://patch.msgid.link/20240823120412.1262536-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 15:20:33 -07:00
Xi Huang 2c163922de net: dpaa: reduce number of synchronize_net() calls
In the function dpaa_napi_del(), we execute the netif_napi_del()
for each cpu, which is actually a high overhead operation
because each call to netif_napi_del() contains a synchronize_net(),
i.e. an RCU operation. In fact, it is only necessary to call
 __netif_napi_del and use synchronize_net() once outside of the loop.
This change is similar to commit 2543a6000e ("gro_cells: reduce
number of synchronize_net() calls") and commit 5198d545db (" net:
remove napi_hash_del() from driver-facing API") 5198d545db.

Signed-off-by: Xi Huang <xuiagnh@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240822072042.42750-1-xuiagnh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 15:16:59 -07:00
Eric Dumazet 89683b45f1 ipv6: avoid indirect calls for SOL_IP socket options
ipv6_setsockopt() can directly call ip_setsockopt()
instead of going through udp_prot.setsockopt()

ipv6_getsockopt() can directly call ip_getsockopt()
instead of going through udp_prot.getsockopt()

These indirections predate git history, not sure why they
were there.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240823140019.3727643-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 14:53:50 -07:00
Hongbo Li 9ceebd7a26 net/ipv4: fix macro definition sk_for_each_bound_bhash
The macro sk_for_each_bound_bhash accepts a parameter
__sk, but it was not used, rather the sk2 is directly
used, so we replace the sk2 with __sk in macro.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20240823070453.3327832-1-lihongbo22@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 14:04:25 -07:00
Jamie Bainbridge a699781c79 ethtool: check device is present when getting link settings
A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd7fb
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17e2d ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd7fb ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 14:03:02 -07:00
Jason Xing 0d9e5df4a2 tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process
We found that one close-wait socket was reset by the other side
due to a new connection reusing the same port which is beyond our
expectation, so we have to investigate the underlying reason.

The following experiment is conducted in the test environment. We
limit the port range from 40000 to 40010 and delay the time to close()
after receiving a fin from the active close side, which can help us
easily reproduce like what happened in production.

Here are three connections captured by tcpdump:
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191
127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070
127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1
// a few seconds later, within 60 seconds
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2
127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193
// later, very quickly
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805
127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1

As we can see, the first flow is reset because:
1) client starts a new connection, I mean, the second one
2) client tries to find a suitable port which is a timewait socket
   (its state is timewait, substate is fin_wait2)
3) client occupies that timewait port to send a SYN
4) server finds a corresponding close-wait socket in ehash table,
   then replies with a challenge ack
5) client sends an RST to terminate this old close-wait socket.

I don't think the port selection algo can choose a FIN_WAIT2 socket
when we turn on tcp_tw_reuse because on the server side there
remain unread data. In some cases, if one side haven't call close() yet,
we should not consider it as expendable and treat it at will.

Even though, sometimes, the server isn't able to call close() as soon
as possible like what we expect, it can not be terminated easily,
especially due to a second unrelated connection happening.

After this patch, we can see the expected failure if we start a
connection when all the ports are occupied in fin_wait2 state:
"Ncat: Cannot assign requested address."

Reported-by: Jade Dong <jadedong@tencent.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240823001152.31004-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 13:28:51 -07:00
Jakub Kicinski 73b22ba0ae Merge branch 'net-pse-pd-tps23881-reset-gpio-support'
Kyle Swenson says:

====================
net: pse-pd: tps23881: Reset GPIO support

On some boards, the TPS2388x's reset line (active low) is pulled low to
keep the chip in reset until the SoC pulls the device out of reset.
This series updates the device-tree binding for the tps23881 and then
adds support for the reset gpio handling in the tps23881 driver.

v1: https://lore.kernel.org/20240819190151.93253-1-kyle.swenson@est.tech
====================

Link: https://patch.msgid.link/20240822220100.3030184-1-kyle.swenson@est.tech
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 13:24:12 -07:00
Kyle Swenson 69f47cad3a net: pse-pd: tps23881: Support reset-gpios
The TPS23880/1 has an active-low reset pin that some boards connect to
the SoC to control when the TPS23880 is pulled out of reset.

Add support for this via a reset-gpios property in the DTS.

Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240822220100.3030184-3-kyle.swenson@est.tech
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 13:24:11 -07:00
Kyle Swenson ec82fa2c87 dt-bindings: pse: tps23881: add reset-gpios
The TPS23881 has an active-low reset pin that can be connected to an
SoC.  Document this with the device-tree binding.

Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240822220100.3030184-2-kyle.swenson@est.tech
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 13:24:10 -07:00
Rosen Penev d2ab3bb890 net: ag71xx: move clk_eth out of struct
It's only used in one place. It doesn't need to be in the struct.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240822192758.141201-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:59:19 -07:00
Cong Wang 1461f5a3d8 l2tp: avoid overriding sk->sk_user_data
Although commit 4a4cd70369 ("l2tp: don't set sk_user_data in tunnel socket")
removed sk->sk_user_data usage, setup_udp_tunnel_sock() still touches
sk->sk_user_data, this conflicts with sockmap which also leverages
sk->sk_user_data to save psock.

Restore this sk->sk_user_data check to avoid such conflicts.

Fixes: 4a4cd70369 ("l2tp: don't set sk_user_data in tunnel socket")
Reported-by: syzbot+8dbe3133b840c470da0e@syzkaller.appspotmail.com
Cc: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Tested-by: James Chapman <jchapman@katalix.com>
Reviewed-by: James Chapman <jchapman@katalix.com>
Link: https://patch.msgid.link/20240822182544.378169-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:55:40 -07:00
Jakub Kicinski 7888173eb1 Merge branch 'net-xilinx-axienet-multicast-fixes-and-improvements'
Sean Anderson says:

====================
net: xilinx: axienet: Multicast fixes and improvements

This series has a few small patches improving the handling of multicast
addresses. In particular, it makes the driver a whole lot less spammy,
and adjusts things so we aren't in promiscuous mode when we have more
than four multicast addresses (a common occurance on modern systems).

As the hardware has a 4-entry CAM, the ideal method would be to "pack"
multiple addresses into one CAM entry. Something like:

entry.address = address[0] | address[1];
entry.mask = ~(address[0] ^ address[1]);

Which would make the entry match both addresses (along with some others
that would need to be filtered in software).

Mapping addresses to entries in an efficient way is a bit tricky. If
anyone knows of an in-tree example of something like this, I'd be glad
to hear about it.
====================

Link: https://patch.msgid.link/20240822154059.1066595-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:52:07 -07:00
Sean Anderson 749e67d5b2 net: xilinx: axienet: Support IFF_ALLMULTI
Add support for IFF_ALLMULTI by configuring a single filter to match the
multicast address bit. This allows us to keep promiscuous mode disabled,
even when we have more than four multicast addresses. An even better
solution would be to "pack" addresses into the available CAM registers,
but that can wait for a future series.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-6-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:52:03 -07:00
Sean Anderson 7a826fb3e4 net: xilinx: axienet: Don't set IFF_PROMISC in ndev->flags
Contrary to the comment, we don't have to inform the net subsystem.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-5-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:52:02 -07:00
Sean Anderson cd039e6787 net: xilinx: axienet: Don't print if we go into promiscuous mode
A message about being in promiscuous mode is printed every time each
additional multicast address beyond four is added. Suppress this message
like is done in other drivers.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-4-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:52:02 -07:00
Aleksandr Mishin 62fdaf9e80 ice: Adjust over allocation of memory in ice_sched_add_root_node() and ice_sched_add_node()
In ice_sched_add_root_node() and ice_sched_add_node() there are calls to
devm_kcalloc() in order to allocate memory for array of pointers to
'ice_sched_node' structure. But incorrect types are used as sizeof()
arguments in these calls (structures instead of pointers) which leads to
over allocation of memory.

Adjust over allocation of memory by correcting types in devm_kcalloc()
sizeof() arguments.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:49:57 -07:00
Jakub Kicinski 5efc9623cf Merge branch 'some-modifications-to-optimize-code-readability'
Li Zetao says:

====================
Some modifications to optimize code readability

This patchset is mainly optimized for readability in contexts where size
needs to be determined. By using min() or max(), or even directly
removing redundant judgments (such as the 5th patch), the code is more
consistent with the context.
====================

Link: https://patch.msgid.link/20240822133908.1042240-1-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:48:54 -07:00
Li Zetao a18308623c tipc: use min() to simplify the code
When calculating size of own domain based on number of peers, the result
should be less than MAX_MON_DOMAIN, so using min() here is very semantic.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822133908.1042240-8-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:48:53 -07:00
Li Zetao 26549dab8a ipv6: mcast: use min() to simplify the code
When coping sockaddr in ip6_mc_msfget(), the time of copies
depends on the minimum value between sl_count and gf_numsrc.
Using min() here is very semantic.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822133908.1042240-7-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:48:53 -07:00
Li Zetao b4985aa8e3 net: caif: use max() to simplify the code
When processing the tail append of sk buffer, the final length needs
to be determined based on expectlen and addlen. Using max() here can
increase the readability of the code.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822133908.1042240-4-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:48:53 -07:00
Sergey Temerkhanov b1703d5f79 ice: Report NVM version numbers on mismatch during load
Report NVM version numbers (both detected and expected) when a mismatch b/w
driver and firmware is detected. This provides more useful information
about which NVM version the driver expects, rather than requiring manual
code inspection.

Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:47:13 -07:00
Jacob Keller 448711c1da ice: remove unnecessary control queue cmd_buf arrays
The driver allocates a cmd_buf array in addition to the desc_buf array.
This array stores an ice_sq_cd command details structure for each entry in
the control queue ring.

The contents of the structure are copied from the value passed in via
ice_sq_send_cmd, and include only a pointer to storage for the write back
descriptor contents.

Originally this array was intended to support asynchronous completion
including features such as a callback function. This support was never
implemented. All that exists today is needless copying and resetting of a
cmd_buf array that is otherwise functionally unused.

Since we do not plan to implement asynchronous completions, drop this
unnecessary memory and logic. This saves memory for each control queue, and
avoids the pointless copying and memset.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:46:14 -07:00
Jacob Keller 1d95d9256c ice: reword comments referring to control queues
Many comments in ice_controlq.c use the term "Admin queue" despite the code
being intended for arbitrary control queues, not just the Admin queue.
Reword the comments to make it clear that this code is the generic control
queue logic that is shared by all of the control queues, and is not
specific to the Admin queue.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:41:43 -07:00
Przemek Kitszel 74ce564a30 ice: stop intermixing AQ commands/responses debug dumps
The ice_debug_cq() function is called to generate a debug log of control
queue messages both sent and received. It currently does this over a
potential total of 6 different printk invocations.

The main logic prints over 4 calls to ice_debug():

 1. The metadata including opcode, flags, datalength and return value.
 2. The cookie in the descriptor.
 3. The parameter values.
 4. The address for the databuffer.

In addition, if the descriptor has a data buffer, it can be logged with two
additional prints:

 5. A message indicating the start of the data buffer.
 6. The actual data buffer, printed using print_hex_dump_debug.

This can lead to trouble in the event that two different PFs are logging
messages. The messages become intermixed and it may not be possible to
determine which part of the output belongs to which control queue message.

To fix this, it needs to be possible to unambiguously determine which
messages belong together. This is trivial for the messages that comprise
the main printing. Combine them together into a single invocation of
ice_debug().

The message containing a hex-dump of the data buffer is a bit more
complicated. This is printed separately as part of print_hex_dump_debug.
This function takes a prefix, which is currently always set to
KBUILD_MODNAME. Extend this prefix to include the buffer address for the
databuffer, which is printed as part of the main print, and which is
guaranteed to be unique for each buffer.

Refactor the ice_debug_array(), introducing an ice_debug_array_w_prefix().
Build the prefix by combining KBUILD_MODNAME with the databuffer address
using snprintf().

These changes make it possible to unambiguously determine what data belongs
to what control queue message.

Reported-by: Jacek Wierzbicki <jacek.wierzbicki@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:40:44 -07:00
Bruce Allan 6bd7cb522b ice: do not clutter debug logs with unused data
Currently, debug logs are unnecessarily cluttered with the contents of
command data buffers even if the receiver of that command (i.e. FW or MBX)
are not told to read the buffer.  Change to only log command data buffers
when the RD flag (indicates receiver needs to read the buffer) is set.
Continue to log response data buffer when the returned datalen is non-zero.

Also, rename a local variable to reflect what is in the hardware
specification and how it is used elsewhere in the code, use local variables
instead of duplicating endian conversions unnecessarily and remove an
unnecessary assignment.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:39:34 -07:00
Jacob Keller caf4daae87 ice: improve debug print for control queue messages
The ice_debug_cq function is called to print debug data for a control queue
descriptor in multiple places. This includes both before we send a message
on a transmit queue, after the writeback completion of a message on the
transmit queue, and when we receive a message on a receive queue.

This function does not include data about *which* control queue the message
is on, nor whether it was what we sent to the queue or what we received
from the queue.

Modify ice_debug_cq to take two extra parameters, a pointer to the control
queue and a boolean indicating if this was a response or a command. Improve
the debug messages by replacing "CQ CMD" with a string indicating which
specific control queue (based on cq->qtype) and whether this was a command
sent by the PF or a response from the queue.

This helps make the log output easier to understand and consume when
debugging.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26 09:38:36 -07:00
Jakub Kicinski 77f0caecf4 Merge branch 'net-header-and-core-spelling-corrections'
Simon Horman says:

====================
net: header and core spelling corrections

This patchset addresses a number of spelling errors in comments in
Networking files under include/, and files in net/core/. Spelling
problems are as flagged by codespell.

It aims to provide patches that can be accepted directly into net-next.
And splits patches up based on maintainer boundaries: many things
feed directly into net-next. This is a complex process and I apologise
for any errors.

I also plan to address, via separate patches, spelling errors in other
files in the same directories, for files whose changes typically go
through trees other than net-next (which feed into net-next).
====================

Link: https://patch.msgid.link/20240822-net-spell-v1-0-3a98971ce2d2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:37:26 -07:00
Simon Horman a8c924e987 net: Correct spelling in net/core
Correct spelling in net/core.
As reported by codespell.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822-net-spell-v1-13-3a98971ce2d2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:37:23 -07:00
Simon Horman 70d0bb45fa net: Correct spelling in headers
Correct spelling in Networking headers.
As reported by codespell.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822-net-spell-v1-12-3a98971ce2d2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:37:23 -07:00
Simon Horman 01d86846a5 x25: Correct spelling in x25.h
Correct spelling in x25.h
As reported by codespell.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Martin Schiller <ms@dev.tdt.de>
Link: https://patch.msgid.link/20240822-net-spell-v1-11-3a98971ce2d2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26 09:37:23 -07:00