linux/fs/nfs
Dave Wysochanski fd5860ab63 NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt
The loop inside nfs_netfs_issue_read() currently does not disable
interrupts while iterating through pages in the xarray to submit
for NFS read.  This is not safe though since after taking xa_lock,
another page in the mapping could be processed for writeback inside
an interrupt, and deadlock can occur.  The fix is simple and clean
if we use xa_for_each_range(), which handles the iteration with RCU
while reducing code complexity.

The problem is easily reproduced with the following test:
 mount -o vers=3,fsc 127.0.0.1:/export /mnt/nfs
 dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1
 echo 3 > /proc/sys/vm/drop_caches
 dd if=/mnt/nfs/file1.bin of=/dev/null
 umount /mnt/nfs

On the console with a lockdep-enabled kernel a message similar to
the following will be seen:

 ================================
 WARNING: inconsistent lock state
 6.7.0-lockdbg+ #10 Not tainted
 --------------------------------
 inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
 test5/1708 [HC0[0]:SC0[0]:HE1:SE1] takes:
 ffff888127baa598 (&xa->xa_lock#4){+.?.}-{3:3}, at:
nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
 {IN-SOFTIRQ-W} state was registered at:
   lock_acquire+0x144/0x380
   _raw_spin_lock_irqsave+0x4e/0xa0
   __folio_end_writeback+0x17e/0x5c0
   folio_end_writeback+0x93/0x1b0
   iomap_finish_ioend+0xeb/0x6a0
   blk_update_request+0x204/0x7f0
   blk_mq_end_request+0x30/0x1c0
   blk_complete_reqs+0x7e/0xa0
   __do_softirq+0x113/0x544
   __irq_exit_rcu+0xfe/0x120
   irq_exit_rcu+0xe/0x20
   sysvec_call_function_single+0x6f/0x90
   asm_sysvec_call_function_single+0x1a/0x20
   pv_native_safe_halt+0xf/0x20
   default_idle+0x9/0x20
   default_idle_call+0x67/0xa0
   do_idle+0x2b5/0x300
   cpu_startup_entry+0x34/0x40
   start_secondary+0x19d/0x1c0
   secondary_startup_64_no_verify+0x18f/0x19b
 irq event stamp: 176891
 hardirqs last  enabled at (176891): [<ffffffffa67a0be4>]
_raw_spin_unlock_irqrestore+0x44/0x60
 hardirqs last disabled at (176890): [<ffffffffa67a0899>]
_raw_spin_lock_irqsave+0x79/0xa0
 softirqs last  enabled at (176646): [<ffffffffa515d91e>]
__irq_exit_rcu+0xfe/0x120
 softirqs last disabled at (176633): [<ffffffffa515d91e>]
__irq_exit_rcu+0xfe/0x120

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&xa->xa_lock#4);
   <Interrupt>
     lock(&xa->xa_lock#4);

  *** DEADLOCK ***

 2 locks held by test5/1708:
  #0: ffff888127baa498 (&sb->s_type->i_mutex_key#22){++++}-{4:4}, at:
      nfs_start_io_read+0x28/0x90 [nfs]
  #1: ffff888127baa650 (mapping.invalidate_lock#3){.+.+}-{4:4}, at:
      page_cache_ra_unbounded+0xa4/0x280

 stack backtrace:
 CPU: 6 PID: 1708 Comm: test5 Kdump: loaded Not tainted 6.7.0-lockdbg+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
 Call Trace:
  dump_stack_lvl+0x5b/0x90
  mark_lock+0xb3f/0xd20
  __lock_acquire+0x77b/0x3360
  _raw_spin_lock+0x34/0x80
  nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
  netfs_begin_read+0x77f/0x980 [netfs]
  nfs_netfs_readahead+0x45/0x60 [nfs]
  nfs_readahead+0x323/0x5a0 [nfs]
  read_pages+0xf3/0x5c0
  page_cache_ra_unbounded+0x1c8/0x280
  filemap_get_pages+0x38c/0xae0
  filemap_read+0x206/0x5e0
  nfs_file_read+0xb7/0x140 [nfs]
  vfs_read+0x2a9/0x460
  ksys_read+0xb7/0x140

Fixes: 000dbe0bec ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-03-09 09:14:38 -05:00
..
blocklayout rpc_pipefs: Replace one label in bl_resolve_deviceid() 2024-01-04 10:47:56 -05:00
filelayout pnfs/filelayout: add tracepoint to getdeviceinfo 2024-02-28 16:18:19 -05:00
flexfilelayout hardening updates for v6.7-rc1 2023-10-30 19:09:55 -10:00
cache_lib.c
cache_lib.h
callback.c SUNRPC: discard sv_refcnt, and svc_get/svc_put 2024-01-07 17:54:33 -05:00
callback.h NFS Client Updates for Linux 6.8 2024-01-10 16:13:57 -08:00
callback_proc.c NFSv4.1: if referring calls are complete, trust the stateid argument 2024-01-04 10:47:56 -05:00
callback_xdr.c NFSv4.1: Use the nfs_client's rpc timeouts for backchannel 2024-01-04 17:01:01 -05:00
client.c nfs: fix UAF on pathwalk running into umount 2024-02-25 02:10:32 -05:00
delegation.c NFSv4: fairly test all delegations on a SEQ4_ revocation 2023-11-01 15:15:52 -04:00
delegation.h NFSv4: fairly test all delegations on a SEQ4_ revocation 2023-11-01 15:15:52 -04:00
dir.c nfs: make nfs_set_verifier() safe for use in RCU pathwalk 2024-02-25 02:10:31 -05:00
direct.c NFS: drop unused nfs_direct_req bytes_left 2024-01-04 10:47:56 -05:00
dns_resolve.c NFS: Move common includes outside ifdef 2023-08-24 13:24:15 -04:00
dns_resolve.h
export.c nfsd: allow reaping files still under writeback 2023-04-26 09:04:59 -04:00
file.c NFS Client Updates for Linux 6.8 2024-01-10 16:13:57 -08:00
fs_context.c NFS: Add an "xprtsec=" NFS mount option 2023-06-19 12:30:17 -04:00
fscache.c NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt 2024-03-09 09:14:38 -05:00
fscache.h netfs: Optimise away reads above the point at which there can be no data 2023-12-28 09:45:27 +00:00
getroot.c
inode.c nfs: convert to new timestamp accessors 2023-10-18 14:08:23 +02:00
internal.h NFS: drop unused nfs_direct_req bytes_left 2024-01-04 10:47:56 -05:00
io.c
iostat.h NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API 2023-04-11 13:08:26 -04:00
Kconfig netfs, fscache: Combine fscache with netfs 2023-12-24 15:08:46 +00:00
Makefile
mount_clnt.c
namespace.c fs: pass the request_mask to generic_fillattr 2023-08-09 08:56:36 +02:00
netns.h
nfs.h nfs: move nfs4_xattr_handlers to .rodata 2023-10-09 16:24:20 +02:00
nfs2super.c
nfs2xdr.c NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN 2023-08-30 11:08:27 -04:00
nfs3_fs.h
nfs3acl.c Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
nfs3client.c NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver 2023-08-24 13:24:15 -04:00
nfs3proc.c nfs: Convert nfs_symlink() to use a folio 2023-11-01 15:40:44 -04:00
nfs3super.c
nfs3xdr.c NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN 2023-08-30 11:08:27 -04:00
nfs4_fs.h NFS client updates for Linux 6.7 2023-11-08 13:39:16 -08:00
nfs4client.c NFSv4.1: fix pnfs MDS=DS session trunking 2023-09-13 11:51:11 -04:00
nfs4file.c fs: use splice_copy_file_range() inline helper 2023-12-12 16:20:02 +01:00
nfs4getroot.c
nfs4idmap.c
nfs4idmap.h
nfs4namespace.c
nfs4proc.c NFSv4.2: fix nfs4_listxattr kernel BUG at mm/usercopy.c:102 2024-02-28 16:18:18 -05:00
nfs4renewd.c
nfs4session.c
nfs4session.h
nfs4state.c NFSv4: Fix a state manager thread deadlock regression 2023-09-27 15:16:40 -04:00
nfs4super.c nfs: fix regression in handling of fsc= option in NFSv4 2024-02-28 16:18:19 -05:00
nfs4sysctl.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
nfs4trace.c pnfs/filelayout: add tracepoint to getdeviceinfo 2024-02-28 16:18:19 -05:00
nfs4trace.h pnfs/filelayout: add tracepoint to getdeviceinfo 2024-02-28 16:18:19 -05:00
nfs4xdr.c NFSv4: Always ask for type with READDIR 2024-01-04 10:47:56 -05:00
nfs42.h NFSv4.2: fix listxattr maximum XDR buffer size 2024-02-28 16:18:18 -05:00
nfs42proc.c nfs42: client needs to strip file mode's suid/sgid bit after ALLOCATE op 2023-10-11 09:37:48 -04:00
nfs42xattr.c list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
nfs42xdr.c NFSv4.2: Rework scratch handling for READ_PLUS (again) 2023-08-23 15:58:47 -04:00
nfsroot.c NFS: Prefer strscpy over strlcpy calls 2023-05-22 12:34:41 -07:00
nfstrace.c
nfstrace.h NFS: drop unused nfs_direct_req bytes_left 2024-01-04 10:47:56 -05:00
pagelist.c NFS: Convert buffered read paths to use netfs when fscache is enabled 2023-04-11 13:08:26 -04:00
pnfs.c pNFS: Fix the pnfs block driver's calculation of layoutget size 2024-01-04 10:47:56 -05:00
pnfs.h NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts 2023-10-22 19:47:56 -04:00
pnfs_dev.c NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info 2023-08-24 13:24:15 -04:00
pnfs_nfs.c pNFS: Fix assignment of xprtdata.cred 2023-08-30 14:31:31 -04:00
proc.c nfs: Convert nfs_symlink() to use a folio 2023-11-01 15:40:44 -04:00
read.c NFSv4.2: Rework scratch handling for READ_PLUS (again) 2023-08-23 15:58:47 -04:00
super.c NFS: Display the "fsc=" mount option if it is set 2024-02-28 16:18:18 -05:00
symlink.c
sysctl.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
sysfs.c NFS: Fix sysfs server name memory leak 2023-08-19 10:26:29 -04:00
sysfs.h NFS: Add sysfs links to sunrpc clients for nfs_clients 2023-06-19 15:04:13 -04:00
unlink.c nfs: rename the nfs_async_rename_done tracepoint 2024-01-04 10:47:56 -05:00
write.c NFS Client Updates for Linux 6.8 2024-01-10 16:13:57 -08:00