Commit graph

1235041 commits

Author SHA1 Message Date
Filipe Manana 3128b548c7 btrfs: split assert into two different asserts when removing block group
When starting a transaction to remove a block group we have one ASSERT
that checks we found an extent map and that the extent map's start offset
matches the desired chunk offset. In case one of the conditions fails, we
get a stack trace that point to the respective line of code, however we
can't tell which condition failed: either there's no extent map or we got
one with an unexpected start offset. To make such an issue easier to debug
and analyse, split the assertion into two, one for each condition. This
was actually triggered during development of another upcoming change.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
Filipe Manana 5031660a1b btrfs: mark sanity checks when getting chunk map as unlikely
When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity
checks to verify that we found an extent map and that it includes the
requested logical address. These are never expected to fail, so mark
them as unlikely to make it more clear as well as to allow a compiler
to generate more efficient code.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba 46524fab69 btrfs: remove unused btrfs_root::type
Looks like the struct member was added in 2007 in 2.6.29 in commit
87ee04eb0f ("Btrfs: Add simple stripe size parameter") but hasn't been
used at all since. So let's remove it. This was found by tool
https://github.com/jirislaby/clang-struct, then build tested after
removing the struct member.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba 49542050b1 btrfs: remove unused definition of tree_entry in extent-io-tree.c
The declaration was temporarily moved in a4055213bf ("btrfs: unexport
all the temporary exports for extent-io-tree.c") and then should have
been removed in 6.0 in 071d19f513 ("btrfs: remove struct tree_entry in
extent-io-tree.c") but was not.  This was found by tool
https://github.com/jirislaby/clang-struct .

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba a0df0a2680 btrfs: raid56: remove unused btrfs_plug_cb::work
The raid56 changes in 6.2 reworked the IO path to RMW, commit
93723095b5 ("btrfs: raid56: switch write path to rmw_rbio()") in
particular removed the last use of the work member so it can be removed
as well. This was found by tool https://github.com/jirislaby/clang-struct .

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba 3d72941664 btrfs: remove unused btrfs_ordered_extent::outstanding_isize
The whole isize code was deleted in 5.6 3f1c64ce04 ("btrfs: delete the
ordered isize update code"), except the struct member.  This was found
by tool https://github.com/jirislaby/clang-struct .

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba a5e182d85f btrfs: scrub: remove unused scrub_ctx::sectors_per_bio
The recent scrub rewrite forgot to remove the sectors_per_bio in
6.3 in 13a62fd997 ("btrfs: scrub: remove scrub_bio structure").
This was found by tool https://github.com/jirislaby/clang-struct .

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
Qu Wenruo cfbf07e278 btrfs: migrate to use folio private instead of page private
As a cleanup and preparation for future folio migration, this patch
would replace all page->private to folio version.  This includes:

- PagePrivate()
  -> folio_test_private()

- page->private
  -> folio_get_private()

- attach_page_private()
  -> folio_attach_private()

- detach_page_private()
  -> folio_detach_private()

Since we're here, also remove the forced cast on page->private, since
it's (void *) already, we don't really need to do the cast.

For now even if we missed some call sites, it won't cause any problem
yet, as we're only using order 0 folio (single page), thus all those
folio/page flags should be synced.

But for the future conversion to utilize higher order folio, the page
<-> folio flag sync is no longer guaranteed, thus we have to migrate to
utilize folio flags.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba 4cea422a77 btrfs: use shrinker for compression page pool
The pages are now allocated and freed centrally, so we can extend the
logic to manage the lifetime. The main idea is to keep a few recently
used pages and hand them to all writers. Ideally we won't have to go to
allocator at all (a slight performance gain) and also raise chance that
we'll have the pages available (slightly increased reliability).

In order to avoid gathering too many pages, the shrinker is attached to
the cache so we can free them on when MM demands that. The first
implementation will drain the whole cache. Further this can be refined
to keep some minimal number of pages for emergency purposes.  The
ultimate goal to avoid memory allocation failures on the write out path
from the compression.

The pool threshold is set to cover full BTRFS_MAX_COMPRESSED / PAGE_SIZE
for minimal thread pool, which is 8 (btrfs_init_fs_info()). This is 128K
/ 4K * 8 = 256 pages at maximum, which is 1MiB.

This is for all filesystems currently mounted, with heavy use of
compression IO the allocator is still needed. The cache helps for short
burst IO.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
David Sterba 9ba965dca3 btrfs: use page alloc/free wrappers for compression pages
This is a preparation for managing compression pages in a cache-like
manner, instead of asking the allocator each time. The common allocation
and free wrappers are introduced and are functionally equivalent to the
current code.

The freeing helpers need to be carefully placed where the last reference
is dropped.  This is either after directly allocating (error handling)
or when there are no other users of the pages (after copying the contents).

It's safe to not use the helper and use put_page() that will handle the
reference count. Not using the helper means there's lower number of
pages that could be reused without passing them back to allocator.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:01 +01:00
Qu Wenruo 9ba7c686fe btrfs: do not utilize goto to implement delayed inode ref deletion
[PROBLEM]
The function __btrfs_update_delayed_inode() is doing something not
meeting the code standard of today:

	path->slots[0]++
	if (path->slots[0] >= btrfs_header_nritems(leaf))
		goto search;
again:
	if (!is_the_target_inode_ref())
		goto out;
	ret = btrfs_delete_item();
	/* Some cleanup. */
	return ret;

search:
	ret = search_for_the_last_inode_ref();
	goto again;

With the tag named "again", it's pretty common to think it's a loop, but
the truth is, we only need to do the search once, to locate the last
(also the first, since there should only be one INODE_REF or
INODE_EXTREF now) ref of the inode.

[FIX]
Instead of the weird jumps, just do them in a stream-lined fashion.
This removes those weird labels, and add extra comments on why we can do
the different searches.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:00 +01:00
Filipe Manana 80d197fe04 btrfs: make the logic from btrfs_block_can_be_shared() easier to read
The logic in btrfs_block_can_be_shared() is hard to follow as we have a
lot of conditions in a single if statement including a subexpression with
a logical or and two nested if statements inside the main if statement.

Make this easier to read by using separate if statements that return
immediately when we find a condition that determines if a block can be
or can not be shared.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:00 +01:00
Filipe Manana 6e5de50fc5 btrfs: use bool for return type of btrfs_block_can_be_shared()
Currently btrfs_block_can_be_shared() returns an int that is used as a
boolean. Since it all it needs is to return true or false, and it can't
return errors for example, change the return type from int to bool to
make it a bit more readable and obvious.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:00 +01:00
Filipe Manana 6000d9313f btrfs: remove log_extents_lock and logged_list from struct btrfs_root
The logged_list[2] and log_extents_lock[2] members of struct btrfs_root
are no longer used, their last use was removed in commit 5636cf7d6d
("btrfs: remove the logged extents infrastructure"). So remove these
fields. This reduces the size of struct btrfs_root, on a release kernel,
from 1392 bytes down to 1352 bytes.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:00 +01:00
Filipe Manana b1dd019de6 btrfs: remove duplicate btrfs_clear_buffer_dirty() prototype from disk-io.h
The prototype for btrfs_clear_buffer_dirty() is declared in both disk-io.h
and extent_io.h, but the function is defined at extent_io.c. So remove the
prototype declaration from disk-io.h.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 20:27:00 +01:00
Linus Torvalds 3f7168591e four import client fixes addressing potential overflows, all marked for stable as well
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmV7q7QACgkQiiy9cAdy
 T1G9EQv/fpdrMMDcivh3h8vzZTxR9kIDa971C/wEPgQb4CNtRp2LTfybg/OOeyPD
 qtdRVXyUs3fA/1/tCxfdo2Jan1E4iEFOkzGXv+EmolCpQ5Ye3tEsAwF6s5eP9pUc
 wR5/swzNFdVfW5BwoES7/RonMezc43OXWZY0Y/9NiaPZKV7i8NTz2ZlfDMjPkplL
 Pxlmiht62L11O3Ui4h8udVGaLagfbmbPt4MLfpuMupDFg071XA8Sz8AF0Wfqh2zu
 WxkTCGHD6Oj8GPp1gJcVUkLgugvSzeSmarTOgygZVF5/fIeFJKB8VrfqCxDZcxhe
 e4E4QEv6tfetutwuCFJejTHeNgrzvMOoR+tuw5/oci/W8msq0l91varSXf0TwUBc
 7ZSnFIw92Oa4pG0zYV9SbTAxEwuoMbrUAXDvraT9AccBYFBZm66TVooR2rnTwRwc
 art398CiTdRcllP9g4ZI4ogxzkHHsVJnQ5w0h/R6/7Y1qLEqRcps84LwmSMYaK4y
 5jad3mh9
 =i6Gk
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Address OOBs and NULL dereference found by Dr. Morris's recent
  analysis and fuzzing.

  All marked for stable as well"

* tag '6.7-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix OOB in smb2_query_reparse_point()
  smb: client: fix NULL deref in asn1_ber_decoder()
  smb: client: fix potential OOBs in smb2_parse_contexts()
  smb: client: fix OOB in receive_encrypted_standard()
2023-12-14 19:57:42 -08:00
Linus Torvalds 976600c6da platform-drivers-x86 for v6.7-4
Changes:
 - tablet-mode-switch events fix
 - kernel-doc warning fixes
 
 The following is an automated shortlog grouped by driver:
 
 intel_ips:
  -  fix kernel-doc formatting
 
 intel-vbtn:
  -  Fix missing tablet-mode-switch events
 
 thinkpad_acpi:
  -  fix kernel-doc warnings
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCZXsmeAAKCRBZrE9hU+XO
 MTloAQD4LVq4SjFt6ioTe/rViBNEp2SzMIVg7TCGVmK03l/wVQEAthR0Wm0m+PhK
 CNKXB0PhFkf1rPLznaO1nRh6MIYZUQE=
 =Uqjo
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

 - tablet-mode-switch events fix

 - kernel-doc warning fixes

* tag 'platform-drivers-x86-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: intel_ips: fix kernel-doc formatting
  platform/x86: thinkpad_acpi: fix kernel-doc warnings
  platform/x86: intel-vbtn: Fix missing tablet-mode-switch events
2023-12-14 17:15:33 -08:00
Linus Torvalds c7402612e2 Current release - regressions:
- tcp: fix tcp_disordered_ack() vs usec TS resolution
 
 Current release - new code bugs:
 
   - dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
 
   - eth: octeon_ep: initialise control mbox tasks before using APIs
 
 Previous releases - regressions:
 
   - io_uring/af_unix: disable sending io_uring over sockets
 
   - eth: mlx5e:
     - TC, don't offload post action rule if not supported
     - fix possible deadlock on mlx5e_tx_timeout_work
 
   - eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close
 
   - eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()
 
   - eth: ena: fix DMA syncing in XDP path when SWIOTLB is on
 
   - eth: team: fix use-after-free when an option instance allocation fails
 
 Previous releases - always broken:
 
   - neighbour: don't let neigh_forced_gc() disable preemption for long
 
   - net: prevent mss overflow in skb_segment()
 
   - ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX
 
   - tcp: remove acked SYN flag from packet in the transmit queue correctly
 
   - eth: octeontx2-af:
     - fix a use-after-free in rvu_nix_register_reporters
     - fix promisc mcam entry action
 
   - eth: dwmac-loongson: make sure MDIO is initialized before use
 
   - eth: atlantic: fix double free in ring reinit logic
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmV6/E4SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkas8P/if7c+MUxkegwRbO0vOObG/B/QXJ+dR8
 UcqPYnroF0u7s2KhDqbj/h9msbNhAmWzrhzk4c086hpIkq34piiS+W319K/tia6u
 H1fRbVfBAo/mcQ8eG7EPiDYrNKDhuiGL6Gsd/Fdl9om1CMjW4fAFWY1F79OoL7F5
 mDTiVdnHik06CGgic6zRdp4xy6zHZ5oBanS60VNjLa4sb69g1Z1fjLQoJt4qXYbJ
 jWZ9QkJ1t/98MOca6mFIZNJY+f3doYMRv5dP1oUSJmbFGfCYjbMcdpa3BQlTiDdu
 96xWF01p5uJ2UBib0nKiGSZmg1Xz1xal9V+ahApmTe8BpZAn6PJeXYbtMQO2SXYf
 VW3V7rSkCB482UPN3siubhtZnOE5oYixM/5OL/UGZv113ShF8HNjj4AAZOeXtJPc
 75QeQOSRy+vhopEexCZ+21Zou+Ao3MjEFlVMCfTJ7couvjFg9LNkazHTXfAkwe0J
 QaLYpbbaXwS3lOspwWFK2rV/G+3fpJZBrW2WRwlLBMMg3lXLuo2OdqrewV9GoI36
 ksqv2c5mMtLwomdM2QfK0zeUc6kDeqlpEcjMzfapn/92A+pcAmcBpT2FfFDR4QUz
 nhoULC2XvTdlri7nxxp/9AYbQK0DFXqChPPV3NdcN/HPI7fYFHTv387ZkLU5zDlN
 nwnXj8rbA0d5
 =84lK
 -----END PGP SIGNATURE-----

Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Current release - regressions:

   - tcp: fix tcp_disordered_ack() vs usec TS resolution

  Current release - new code bugs:

   - dpll: sanitize possible null pointer dereference in
     dpll_pin_parent_pin_set()

   - eth: octeon_ep: initialise control mbox tasks before using APIs

  Previous releases - regressions:

   - io_uring/af_unix: disable sending io_uring over sockets

   - eth: mlx5e:
       - TC, don't offload post action rule if not supported
       - fix possible deadlock on mlx5e_tx_timeout_work

   - eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close

   - eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()

   - eth: ena: fix DMA syncing in XDP path when SWIOTLB is on

   - eth: team: fix use-after-free when an option instance allocation
     fails

  Previous releases - always broken:

   - neighbour: don't let neigh_forced_gc() disable preemption for long

   - net: prevent mss overflow in skb_segment()

   - ipv6: support reporting otherwise unknown prefix flags in
     RTM_NEWPREFIX

   - tcp: remove acked SYN flag from packet in the transmit queue
     correctly

   - eth: octeontx2-af:
       - fix a use-after-free in rvu_nix_register_reporters
       - fix promisc mcam entry action

   - eth: dwmac-loongson: make sure MDIO is initialized before use

   - eth: atlantic: fix double free in ring reinit logic"

* tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  net: atlantic: fix double free in ring reinit logic
  appletalk: Fix Use-After-Free in atalk_ioctl
  net: stmmac: Handle disabled MDIO busses from devicetree
  net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
  dpaa2-switch: do not ask for MDB, VLAN and FDB replay
  dpaa2-switch: fix size of the dma_unmap
  net: prevent mss overflow in skb_segment()
  vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
  Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
  MIPS: dts: loongson: drop incorrect dwmac fallback compatible
  stmmac: dwmac-loongson: drop useless check for compatible fallback
  stmmac: dwmac-loongson: Make sure MDIO is initialized before use
  tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
  dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
  net: ena: Fix XDP redirection error
  net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
  net: ena: Fix xdp drops handling due to multibuf packets
  net: ena: Destroy correct number of xdp queues upon failure
  net: Remove acked SYN flag from packet in the transmit queue correctly
  qed: Fix a potential use-after-free in qed_cxt_tables_alloc
  ...
2023-12-14 13:11:49 -08:00
Linus Torvalds bdb2701f0b for-6.7-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmV5rTIACgkQxWXV+ddt
 WDuLUg/+Ix/CeA+JY6VZMA2kBHMzmRexSjYONWfQwIL7LPBy4sOuSEaTZt+QQMs+
 AEKau1YfTgo7e9S2DlbZhIWp6P87VFui7Q1E99uJEmKelakvf94DbMrufPTTKjaD
 JG2KB6LsD59yWwfbGHEAVVNGSMRk2LDXzcUWMK6/uzu/7Bcr4ataOymWd86/blUV
 cw5g87uAHpBn+R1ARTf1CkqyYiI9UldNUJmW1q7dwxOyYG+weUtJImosw2Uda76y
 wQXAFQAH3vsFzTC+qjC9Vz7cnyAX9qAw48ODRH7rIT1BQ3yAFQbfXE20jJ/fSE+C
 lz3p05tA9373KAOtLUHmANBwe3NafCnlut6ZYRfpTcEzUslAO5PnajPaHh5Al7uC
 Iwdpy49byoyVFeNf0yECBsuDP8s86HlUALF8mdJabPI1Kl66MUea6KgS1oyO3pCB
 hfqLbpofV4JTywtIRLGQTQvzSwkjPHTbSwtZ9nftTw520a5f7memDu5vi4XzFd+B
 NrJxmz2DrMRlwrLgWg9OXXgx1riWPvHnIoqzjG5W6A9N74Ud1/oz7t3VzjGSQ5S2
 UikRB6iofPE0deD8IF6H6DvFfvQxU9d9BJ6IS9V2zRt5vdgJ2w08FlqbLZewSY4x
 iaQ+L7UYKDjC9hdosXVNu/6fAspyBVdSp2NbKk14fraZtNAoPNs=
 =uF/Q
 -----END PGP SIGNATURE-----

Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
  "Some fixes to quota accounting code, mostly around error handling and
   correctness:

   - free reserves on various error paths, after IO errors or
     transaction abort

   - don't clear reserved range at the folio release time, it'll be
     properly cleared after final write

   - fix integer overflow due to int used when passing around size of
     freed reservations

   - fix a regression in squota accounting that missed some cases with
     delayed refs"

* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ensure releasing squota reserve on head refs
  btrfs: don't clear qgroup reserved bit in release_folio
  btrfs: free qgroup pertrans reserve on transaction abort
  btrfs: fix qgroup_free_reserved_data int overflow
  btrfs: free qgroup reserve when ORDERED_IOERR is set
2023-12-14 11:53:00 -08:00
Igor Russkikh 7bb26ea74a net: atlantic: fix double free in ring reinit logic
Driver has a logic leak in ring data allocation/free,
where double free may happen in aq_ring_free if system is under
stress and driver init/deinit is happening.

The probability is higher to get this during suspend/resume cycle.

Verification was done simulating same conditions with

    stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000
    while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done

Fixed by explicitly clearing pointers to NULL on deallocation

Fixes: 018423e90b ("net: ethernet: aquantia: Add ring support code")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/netdev/CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com/
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20231213094044.22988-1-irusskikh@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-14 12:10:59 +01:00
Hyunwoo Kim 189ff16722 appletalk: Fix Use-After-Free in atalk_ioctl
Because atalk_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with atalk_recvmsg().
A use-after-free for skb occurs with the following flow.
```
atalk_ioctl() -> skb_peek()
atalk_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
```
Add sk->sk_receive_queue.lock to atalk_ioctl() to fix this issue.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Link: https://lore.kernel.org/r/20231213041056.GA519680@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-14 12:02:45 +01:00
Andrew Halaney e23c0d21ce net: stmmac: Handle disabled MDIO busses from devicetree
Many hardware configurations have the MDIO bus disabled, and are instead
using some other MDIO bus to talk to the MAC's phy.

of_mdiobus_register() returns -ENODEV in this case. Let's handle it
gracefully instead of failing to probe the MAC.

Fixes: 47dd7a540b ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20231212-b4-stmmac-handle-mdio-enodev-v2-1-600171acf79f@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-14 11:59:21 +01:00
Sneh Shah 981d947bcd net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
In 10M SGMII mode all the packets are being dropped due to wrong Rx clock.
SGMII 10MBPS mode needs RX clock divider programmed to avoid drops in Rx.
Update configure SGMII function with Rx clk divider programming.

Fixes: 463120c31c ("net: stmmac: dwmac-qcom-ethqos: add support for SGMII")
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Sneh Shah <quic_snehshah@quicinc.com>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20231212092208.22393-1-quic_snehshah@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-14 10:51:08 +01:00
Jakub Kicinski 89e0c6467e Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-12-12 (iavf)

This series contains updates to iavf driver only.

Piotr reworks Flow Director states to deal with issues in restoring
filters.

Slawomir fixes shutdown processing as it was missing needed calls.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
  iavf: Handle ntuple on/off based on new state machines for flow director
  iavf: Introduce new state machines for flow director
====================

Link: https://lore.kernel.org/r/20231212203613.513423-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:03:01 -08:00
Jakub Kicinski dc84bb19b5 Merge branch 'dpaa2-switch-various-fixes'
Ioana Ciornei says:

====================
dpaa2-switch: various fixes

The first patch fixes the size passed to two dma_unmap_single() calls
which was wrongly put as the size of the pointer.

The second patch is new to this series and reverts the behavior of the
dpaa2-switch driver to not ask for object replay upon offloading so that
we avoid the errors encountered when a VLAN is installed multiple times
on the same port.
====================

Link: https://lore.kernel.org/r/20231212164326.2753457-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 18:38:57 -08:00
Ioana Ciornei f24a49a375 dpaa2-switch: do not ask for MDB, VLAN and FDB replay
Starting with commit 4e51bf44a0 ("net: bridge: move the switchdev
object replay helpers to "push" mode") the switchdev_bridge_port_offload()
helper was extended with the intention to provide switchdev drivers easy
access to object addition and deletion replays. This works by calling
the replay helpers with non-NULL notifier blocks.

In the same commit, the dpaa2-switch driver was updated so that it
passes valid notifier blocks to the helper. At that moment, no
regression was identified through testing.

In the meantime, the blamed commit changed the behavior in terms of
which ports get hit by the replay. Before this commit, only the initial
port which identified itself as offloaded through
switchdev_bridge_port_offload() got a replay of all port objects and
FDBs. After this, the newly joining port will trigger a replay of
objects on all bridge ports and on the bridge itself.

This behavior leads to errors in dpaa2_switch_port_vlans_add() when a
VLAN gets installed on the same interface multiple times.

The intended mechanism to address this is to pass a non-NULL ctx to the
switchdev_bridge_port_offload() helper and then check it against the
port's private structure. But since the driver does not have any use for
the replayed port objects and FDBs until it gains support for LAG
offload, it's better to fix the issue by reverting the dpaa2-switch
driver to not ask for replay. The pointers will be added back when we
are prepared to ignore replays on unrelated ports.

Fixes: b28d580e29 ("net: bridge: switchdev: replay all VLAN groups")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20231212164326.2753457-3-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 18:38:54 -08:00
Ioana Ciornei 2aad7d4189 dpaa2-switch: fix size of the dma_unmap
The size of the DMA unmap was wrongly put as a sizeof of a pointer.
Change the value of the DMA unmap to be the actual macro used for the
allocation and the DMA map.

Fixes: 1110318d83 ("dpaa2-switch: add tc flower hardware offload on ingress traffic")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20231212164326.2753457-2-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 18:38:53 -08:00
Eric Dumazet 23d05d563b net: prevent mss overflow in skb_segment()
Once again syzbot is able to crash the kernel in skb_segment() [1]

GSO_BY_FRAGS is a forbidden value, but unfortunately the following
computation in skb_segment() can reach it quite easily :

	mss = mss * partial_segs;

65535 = 3 * 5 * 17 * 257, so many initial values of mss can lead to
a bad final result.

Make sure to limit segmentation so that the new mss value is smaller
than GSO_BY_FRAGS.

[1]

general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 1 PID: 5079 Comm: syz-executor993 Not tainted 6.7.0-rc4-syzkaller-00141-g1ae4cd3cbdd0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:skb_segment+0x181d/0x3f30 net/core/skbuff.c:4551
Code: 83 e3 02 e9 fb ed ff ff e8 90 68 1c f9 48 8b 84 24 f8 00 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 8a 21 00 00 48 8b 84 24 f8 00
RSP: 0018:ffffc900043473d0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000010046 RCX: ffffffff886b1597
RDX: 000000000000000e RSI: ffffffff886b2520 RDI: 0000000000000070
RBP: ffffc90004347578 R08: 0000000000000005 R09: 000000000000ffff
R10: 000000000000ffff R11: 0000000000000002 R12: ffff888063202ac0
R13: 0000000000010000 R14: 000000000000ffff R15: 0000000000000046
FS: 0000555556e7e380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 0000000027ee2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
udp6_ufo_fragment+0xa0e/0xd00 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x534/0x17e0 net/ipv6/ip6_offload.c:120
skb_mac_gso_segment+0x290/0x610 net/core/gso.c:53
__skb_gso_segment+0x339/0x710 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x36c/0xeb0 net/core/dev.c:3626
__dev_queue_xmit+0x6f3/0x3d60 net/core/dev.c:4338
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
packet_xmit+0x257/0x380 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x24c6/0x5220 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f8692032aa9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 d1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff8d685418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8692032aa9
RDX: 0000000000010048 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000000f4240 R08: 0000000020000540 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8d685480
R13: 0000000000000001 R14: 00007fff8d685480 R15: 0000000000000003
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:skb_segment+0x181d/0x3f30 net/core/skbuff.c:4551
Code: 83 e3 02 e9 fb ed ff ff e8 90 68 1c f9 48 8b 84 24 f8 00 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 8a 21 00 00 48 8b 84 24 f8 00
RSP: 0018:ffffc900043473d0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000010046 RCX: ffffffff886b1597
RDX: 000000000000000e RSI: ffffffff886b2520 RDI: 0000000000000070
RBP: ffffc90004347578 R08: 0000000000000005 R09: 000000000000ffff
R10: 000000000000ffff R11: 0000000000000002 R12: ffff888063202ac0
R13: 0000000000010000 R14: 000000000000ffff R15: 0000000000000046
FS: 0000555556e7e380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 0000000027ee2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 3953c46c3a ("sk_buff: allow segmenting based on frag sizes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231212164621.4131800-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 18:26:25 -08:00
Nikolay Kuratov 60316d7f10 vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
We need to do signed arithmetic if we expect condition
`if (bytes < 0)` to be possible

Found by Linux Verification Center (linuxtesting.org) with SVACE

Fixes: 06a8fc7836 ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20231211162317.4116625-1-kniv@yandex-team.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 17:59:08 -08:00
Yusong Gao 829649443e sign-file: Fix incorrect return values check
There are some wrong return values check in sign-file when call OpenSSL
API. The ERR() check cond is wrong because of the program only check the
return value is < 0 which ignored the return val is 0. For example:
1. CMS_final() return 1 for success or 0 for failure.
2. i2d_CMS_bio_stream() returns 1 for success or 0 for failure.
3. i2d_TYPEbio() return 1 for success and 0 for failure.
4. BIO_free() return 1 for success and 0 for failure.

Link: https://www.openssl.org/docs/manmaster/man3/
Fixes: e5a2e3c847 ("scripts/sign-file.c: Add support for signing with a raw signature")
Signed-off-by: Yusong Gao <a869920004@gmail.com>
Reviewed-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20231213024405.624692-1-a869920004@gmail.com/ # v5
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-13 12:55:11 -08:00
Linus Torvalds 5bd7ef53ff ufs got broken this merge window on folio conversion - calling
conventions for filemap_lock_folio() are not the same as for
 find_lock_page()
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZXnZHAAKCRBZ7Krx/gZQ
 6y3QAQCazzMsqWYmqfkbR5yGjolKBPS6ILFWBHWoFySs9/WptAEA3c/960nhFuh1
 aQE9Qp5zUlbWmSZ5zjz3Q2lX8N/jugU=
 =kyMm
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ufs fix from Al Viro:
 "ufs got broken this merge window on folio conversion - calling
  conventions for filemap_lock_folio() are not the same as for
  find_lock_page()"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix ufs_get_locked_folio() breakage
2023-12-13 11:09:58 -08:00
Jakub Kicinski 9702817384 Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
This reverts commit f3f32a356c.

Paolo reports that the change disables autocorking even after
the userspace sets TCP_CORK.

Fixes: f3f32a356c ("tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set")
Link: https://lore.kernel.org/r/0d30d5a41d3ac990573016308aaeacb40a9dc79f.camel@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 10:58:54 -08:00
Linus Torvalds af2a9c6a83 EFI fixes for v6.7 #2
- Deal with a regression in the recently refactored x86 EFI stub code on
   older Dell systems by disabling randomization of the physical load
   address
 - Use the correct load address for relocatable Loongarch kernels
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZXgvLAAKCRAwbglWLn0t
 XLgKAP9oKLP7v0TD2BJOPGqr4kEtMfZYayV2EUN387VbPYfT0wEAoeDeZmaGUYce
 BuovToERSgjj2FylAWNlZATEh2d35ww=
 =kv9E
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Deal with a regression in the recently refactored x86 EFI stub code
   on older Dell systems by disabling randomization of the physical load
   address

 - Use the correct load address for relocatable Loongarch kernels

* tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/x86: Avoid physical KASLR on older Dell systems
  efi/loongarch: Use load address to calculate kernel entry address
2023-12-13 10:54:50 -08:00
Al Viro 485053bb81 fix ufs_get_locked_folio() breakage
filemap_lock_folio() returns ERR_PTR(-ENOENT) if the thing is not
in cache - not NULL like find_lock_page() used to.

Fixes: 5fb7bd50b3 "ufs: add ufs_get_locked_folio and ufs_put_locked_folio"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-13 11:14:09 -05:00
David S. Miller 2513974cc3 Merge branch 'stmmac-bug-fixes'
Yanteng Si says:

====================
stmmac: Some bug fixes

* Put Krzysztof's patch into my thread, pick Conor's Reviewed-by
  tag and Jiaxun's Acked-by tag.(prev version is RFC patch)

* I fixed an Oops related to mdio, mainly to ensure that
  mdio is initialized before use, because it will be used
  in a series of patches I am working on.

see <https://lore.kernel.org/loongarch/cover.1699533745.git.siyanteng@loongson.cn/T/#t>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13 10:57:01 +00:00
Krzysztof Kozlowski 4907a3f54b MIPS: dts: loongson: drop incorrect dwmac fallback compatible
Device binds to proper PCI ID (LOONGSON, 0x7a03), already listed in DTS,
so checking for some other compatible does not make sense.  It cannot be
bound to unsupported platform.

Drop useless, incorrect (space in between) and undocumented compatible.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13 10:57:01 +00:00
Krzysztof Kozlowski 31fea092c6 stmmac: dwmac-loongson: drop useless check for compatible fallback
Device binds to proper PCI ID (LOONGSON, 0x7a03), already listed in DTS,
so checking for some other compatible does not make sense.  It cannot be
bound to unsupported platform.

Drop useless, incorrect (space in between) and undocumented compatible.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13 10:57:00 +00:00
Yanteng Si e87d3a1370 stmmac: dwmac-loongson: Make sure MDIO is initialized before use
Generic code will use mdio. If it is not initialized before use,
the kernel will Oops.

Fixes: 30bba69d7d ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13 10:57:00 +00:00
Salvatore Dipietro f3f32a356c tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
Based on the tcp man page, if TCP_NODELAY is set, it disables Nagle's algorithm
and packets are sent as soon as possible. However in the `tcp_push` function
where autocorking is evaluated the `nonagle` value set by TCP_NODELAY is not
considered which can trigger unexpected corking of packets and induce delays.

For example, if two packets are generated as part of a server's reply, if the
first one is not transmitted on the wire quickly enough, the second packet can
trigger the autocorking in `tcp_push` and be delayed instead of sent as soon as
possible. It will either wait for additional packets to be coalesced or an ACK
from the client before transmitting the corked packet. This can interact badly
if the receiver has tcp delayed acks enabled, introducing 40ms extra delay in
completion times. It is not always possible to control who has delayed acks
set, but it is possible to adjust when and how autocorking is triggered.
Patch prevents autocorking if the TCP_NODELAY flag is set on the socket.

Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and
Apache Tomcat 9.0.83 running the basic servlet below:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
        String s = "a".repeat(3096);
        osw.write(s,0,s.length());
        osw.flush();
    }
}

Load was applied using  wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance.  With the current auto-corking behavior and TCP_NODELAY
set an additional 40ms latency from P99.99+ values are observed.  With the
patch applied we see no occurrences of 40ms latencies. The patch has also been
tested with iperf and uperf benchmarks and no regression was observed.

# No patch with tcp_autocorking=1 and TCP_NODELAY set on all sockets
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.49.177:8080/hello/hello'
  ...
 50.000%    0.91ms
 75.000%    1.12ms
 90.000%    1.46ms
 99.000%    1.73ms
 99.900%    1.96ms
 99.990%   43.62ms   <<< 40+ ms extra latency
 99.999%   48.32ms
100.000%   49.34ms

# With patch
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.49.177:8080/hello/hello'
  ...
 50.000%    0.89ms
 75.000%    1.13ms
 90.000%    1.44ms
 99.000%    1.67ms
 99.900%    1.78ms
 99.990%    2.27ms   <<< no 40+ ms extra latency
 99.999%    3.71ms
100.000%    4.57ms

Fixes: f54b311142 ("tcp: auto corking")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13 10:37:05 +00:00
Linus Torvalds 88035e5694 hid-for-linus-2023121201
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIVAwUAZXji4KZi849r7WBJAQKOXhAApTeEWwONuprA4bT5mwi2CAhYmiqmJI2H
 VbFBe8HX0f/Mrupp/E39/m2kCSNKq633iXC7XQT4XnvyhQQPEmPYISCd1ULogx0c
 Jb9XRqM3OjZzmEbk4AaL7cZKRS1LOLERoJ5BSytQ3i8+KbjjCVieO4r/Hk3qfoGd
 bJgtP5IxCwGE7ByOpz2OaRFOstrj3knrpnqEM8UgRkqfymzPVOHisBm06URcFA3B
 ARJ+Mc/rl1niuWwAWWWAQNbPvHS1nxz4KxUtRGbgKfoSx3ungMd5EC5/1K16eopk
 p+26Lv5dq7x/kmy2cMZUBr0DI+QeR2LHsiIMVQOtVu5FZ7b8Z45fKqFOZFbWHq/l
 YDGHJu9fDPMbfSZrSMT5KNEqW1Epqe0wHvDLr94lICkvhxiDNAWQnxAda3u/2vIV
 o42q7yXrxH3JM/RcBIeFKfb236ZAGfSdEmCGqcRNllWoNJMWH85TG/i1WcWho9tT
 GuP64fODIbMafqn4MoO9zNnmVGrXC8yLyYo2RvASt7wv/JM7vu6zUKQd4tsBL3P2
 WhnL1YRHBtwADtwN77rGGo4zhq2uh57iDRFHr/PefEBSW2yCPf3Ba9C5iBMsIDr7
 mRbuP/88NxwUNBCy+a4QyKROExSUwLTPyQOGIofIPxl1BqjaEnEaMTkQdfzjtuiT
 OUg/yR3G8sU=
 =CC6N
 -----END PGP SIGNATURE-----

Merge tag 'hid-for-linus-2023121201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - Lenovo ThinkPad TrackPoint Keyboard II firmware-specific regression
   fix (Mikhail Khvainitski)

 - device-specific fixes (various authors)

* tag 'hid-for-linus-2023121201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboards
  HID: lenovo: Restrict detection of patched firmware only to USB cptkbd
  HID: Add quirk for Labtec/ODDOR/aikeec handbrake
  HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[]
  mailmap: add address mapping for Jiri Kosina
2023-12-12 17:02:56 -08:00
Jiri Pirko 65c95f7891 dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
User may not pass DPLL_A_PIN_STATE attribute in the pin set operation
message. Sanitize that by checking if the attr pointer is not null
and process the passed state attribute value only in that case.

Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: 9d71b54b65 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20231211083758.1082853-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:20:54 -08:00
Jakub Kicinski 154bb2fa48 Merge branch 'ena-driver-xdp-bug-fixes'
David Arinzon says:

====================
ENA driver XDP bug fixes

This patchset contains multiple XDP-related bug fixes
in the ENA driver.
====================

Link: https://lore.kernel.org/r/20231211062801.27891-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:33 -08:00
David Arinzon 4ab138ca0a net: ena: Fix XDP redirection error
When sending TX packets, the meta descriptor can be all zeroes
as no meta information is required (as in XDP).

This patch removes the validity check, as when
`disable_meta_caching` is enabled, such TX packets will be
dropped otherwise.

Fixes: 0e3a3f6dac ("net: ena: support new LLQ acceleration mode")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
David Arinzon d760117060 net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
This patch fixes two issues:

Issue 1
-------
Description
```````````
Current code does not call dma_sync_single_for_cpu() to sync data from
the device side memory to the CPU side memory before the XDP code path
uses the CPU side data.
This causes the XDP code path to read the unset garbage data in the CPU
side memory, resulting in incorrect handling of the packet by XDP.

Solution
````````
1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to
   use the data in the CPU side memory.
2. The XDP code verdict can be XDP_PASS, in which case there is a
   fallback to the non-XDP code, which also calls
   dma_sync_single_for_cpu().
   To avoid calling dma_sync_single_for_cpu() twice:
2.1. Put the dma_sync_single_for_cpu() in the code in such a place where
     it happens before XDP and non-XDP code.
2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code
     for the first buffer only (rx_copybreak and non-rx_copybreak
     cases), since the new call that was added covers these cases.
     The call to dma_sync_single_for_cpu() for the second buffer and on
     stays because only the first buffer is handled by the newly added
     dma_sync_single_for_cpu(). And there is no need for special
     handling of the second buffer and on for the XDP path since
     currently the driver supports only single buffer packets.

Issue 2
-------
Description
```````````
In case the XDP code forwarded the packet (ENA_XDP_FORWARDED),
ena_unmap_rx_buff_attrs() is called with attrs set to 0.
This means that before unmapping the buffer, the internal function
dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on
the whole buffer (not only on the data part of it).
This sync is both wasteful (since a sync was already explicitly
called before) and also causes a bug, which will be explained
using the below diagram.

The following diagram shows the flow of events causing the bug.
The order of events is (1)-(4) as shown in the diagram.

CPU side memory area

     (3)convert_to_xdp_frame() initializes the
        headroom with xdpf metadata
                      ||
                      \/
          ___________________________________
         |                                   |
 0       |                                   V                       4K
 ---------------------------------------------------------------------
 | xdpf->data      | other xdpf       |   < data >   | tailroom ||...|
 |                 | fields           |              | GARBAGE  ||   |
 ---------------------------------------------------------------------

                   /\                        /\
                   ||                        ||
   (4)ena_unmap_rx_buff_attrs() calls     (2)dma_sync_single_for_cpu()
      dma_sync_single_for_cpu() on the       copies data from device
      whole buffer page, overwriting         side to CPU side memory
      the xdpf->data with GARBAGE.           ||
 0                                                                   4K
 ---------------------------------------------------------------------
 | headroom                           |   < data >   | tailroom ||...|
 | GARBAGE                            |              | GARBAGE  ||   |
 ---------------------------------------------------------------------

Device side memory area                      /\
                                             ||
                               (1) device writes RX packet data

After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data
becomes corrupted, and so when it is later accessed in
ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault,
crashing the kernel.

Solution
````````
Explicitly tell ena_unmap_rx_buff_attrs() not to call
dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC
flag.

Fixes: f7d625adeb ("net: ena: Add dynamic recycling mechanism for rx buffers")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
David Arinzon 505b1a88d3 net: ena: Fix xdp drops handling due to multibuf packets
Current xdp code drops packets larger than ENA_XDP_MAX_MTU.
This is an incorrect condition since the problem is not the
size of the packet, rather the number of buffers it contains.

This commit:

1. Identifies and drops XDP multi-buffer packets at the
   beginning of the function.
2. Increases the xdp drop statistic when this drop occurs.
3. Adds a one-time print that such drops are happening to
   give better indication to the user.

Fixes: 838c93dc54 ("net: ena: implement XDP drop support")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
David Arinzon 41db6f99b5 net: ena: Destroy correct number of xdp queues upon failure
The ena_setup_and_create_all_xdp_queues() function freed all the
resources upon failure, after creating only xdp_num_queues queues,
instead of freeing just the created ones.

In this patch, the only resources that are freed, are the ones
allocated right before the failure occurs.

Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
Dong Chenchen f99cd56230 net: Remove acked SYN flag from packet in the transmit queue correctly
syzkaller report:

 kernel BUG at net/core/skbuff.c:3452!
 invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135
 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452)
 Call Trace:
 icmp_glue_bits (net/ipv4/icmp.c:357)
 __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165)
 ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341)
 icmp_push_reply (net/ipv4/icmp.c:370)
 __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772)
 ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577)
 __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295)
 ip_output (net/ipv4/ip_output.c:427)
 __ip_queue_xmit (net/ipv4/ip_output.c:535)
 __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
 __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387)
 tcp_retransmit_skb (net/ipv4/tcp_output.c:3404)
 tcp_retransmit_timer (net/ipv4/tcp_timer.c:604)
 tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)

The panic issue was trigered by tcp simultaneous initiation.
The initiation process is as follows:

      TCP A                                            TCP B

  1.  CLOSED                                           CLOSED

  2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...

  3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT

  4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED

  5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...

  // TCP B: not send challenge ack for ack limit or packet loss
  // TCP A: close
	tcp_close
	   tcp_send_fin
              if (!tskb && tcp_under_memory_pressure(sk))
                  tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
           TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;  // set FIN flag

  6.  FIN_WAIT_1  --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...

  // TCP B: send challenge ack to SYN_FIN_ACK

  7.               ... <SEQ=301><ACK=101><CTL=ACK>   <-- SYN-RECEIVED //challenge ack

  // TCP A:  <SND.UNA=101>

  8.  FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic

	__tcp_retransmit_skb  //skb->len=0
	    tcp_trim_head
		len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
		    __pskb_trim_head
			skb->data_len -= len // skb->len=-1, wrap around
	    ... ...
	    ip_fragment
		icmp_glue_bits //BUG_ON

If we use tcp_trim_head() to remove acked SYN from packet that contains data
or other flags, skb->len will be incorrectly decremented. We can remove SYN
flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
can fix the problem mentioned above.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 15:56:02 -08:00
Dinghao Liu b65d52ac9c qed: Fix a potential use-after-free in qed_cxt_tables_alloc
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
qed_cxt_tables_alloc() accesses the freed pointer on failure
of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
which may lead to use-after-free. Fix this issue by setting
p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().

Fixes: fe56b9e6a8 ("qed: Add module with basic common support")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 13:33:51 -08:00
Linus Torvalds cf52eed70e Fix various bugs / regressions for ext4, including a soft lockup, a
WARN_ON, and a BUG.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmV4tMwACgkQ8vlZVpUN
 gaNZRAf/ejQZne9iZck8SSV62mkR9E7EwN9J2+gkWFrlsyurErZlVsBA5yRB+i9A
 V1v6DRGDnYFwKFNHJhR/RW9NhEwpYkX9Vo3miksSCq8rsAB1kjSs3xVrTBIYi/8c
 ztw4ncyxW7RRFRmruzFfUEKriiyJzxJYx+EqbNsQHcl5ET6Y2/5zM0bChV9MwuN3
 iS1Rm98RbHVrylzKbGG562MaGdJyUYvQ+mnRCgma1mTu6K9SWLJg211icLTsDhHg
 XEB/QGWji2O7xOudcry8wLIpoR6rYPAhWfbkLekW1K9hjV3iXuJoVjj7eB9LctMf
 FAXr8u0FKJI0iIQyrQrEEqIuh+jKBA==
 =4zQL
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix various bugs / regressions for ext4, including a soft lockup, a
  WARN_ON, and a BUG"

* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix soft lockup in journal_finish_inode_data_buffers()
  ext4: fix warning in ext4_dio_write_end_io()
  jbd2: increase the journal IO's priority
  jbd2: correct the printing of write_flags in jbd2_write_superblock()
  ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12 11:37:04 -08:00
Slawomir Laba 7ae42ef308 iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
Make the flow for pci shutdown be the same to the pci remove.

iavf_shutdown was implementing an incomplete version
of iavf_remove. It misses several calls to the kernel like
iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap
that might break the system on reboot or hibernation.

Implement the call of iavf_remove directly in iavf_shutdown to
close this gap.

Fixes below error messages (dmesg) during shutdown stress tests -
[685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for
 VF 0
[685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for
VF 0

Reproduction:

1. Create one VF interface:
echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs

2. Run live dmesg on the host:
dmesg -wH

3. On SUT, script below steps into vf_namespace_assignment.sh

<#!/bin/sh> // Remove <>. Git removes # line
if=<VF name> (edit this per VF name)
loop=0

while true; do

echo test round $loop
let loop++

ip netns add ns$loop
ip link set dev $if up
ip link set dev $if netns ns$loop
ip netns exec ns$loop ip link set dev $if up
ip netns exec ns$loop ip link set dev $if netns 1
ip netns delete ns$loop

done

4. Run the script for at least 1000 iterations on SUT:
./vf_namespace_assignment.sh

Expected result:
No errors in dmesg.

Fixes: 129cf89e58 ("iavf: rename functions and structs to new name")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 11:21:30 -08:00