Commit graph

1240296 commits

Author SHA1 Message Date
Chuck Lever 57666bbb4e svcrdma: Remove struct svc_rdma_read_info
The remaining fields of struct svc_rdma_read_info are no longer
referenced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever efd02cb0dd svcrdma: Update the synopsis of svc_rdma_read_special()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_special() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever 23bab3b22d svcrdma: Update the synopsis of svc_rdma_read_call_chunk()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_call_chunk() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever 740a3c895d svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever 6518204d23 svcrdma: Update synopsis of svc_rdma_copy_inline_range()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_copy_inline_range() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever 6e4b9b8643 svcrdma: Update the synopsis of svc_rdma_read_data_item()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_data_item() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever c7eb4feb1b svcrdma: Update synopsis of svc_rdma_read_chunk_range()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever 02e8fe1eca svcrdma: Update synopsis of svc_rdma_build_read_chunk()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk() can use that recv_ctxt to derive that
information rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever fc20f19b4d svcrdma: Update synopsis of svc_rdma_build_read_segment()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_segment() can use the recv_ctxt to derive that
information rather than the other way around. This removes one usage
of the ri_readctxt field, enabling its removal in a subsequent
patch.

At the same time, the use of ri_rqst can similarly be replaced with
a passed-in function parameter.

Start with build_read_segment() because it is a common utility
function at the bottom of the Read chunk path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever 919f6e790a svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt
Further clean up: move the starting byte offset field into
svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever 8e12258268 svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt
Further clean up: move the page index field into svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever b1818412d0 svcrdma: Start moving fields out of struct svc_rdma_read_info
Since the request's svc_rdma_recv_ctxt will stay around for the
duration of the RDMA Read operation, the contents of struct
svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt
rather than being allocated separately. This will eventually save a
call to kmalloc() in a hot path.

Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever 6a04a43493 svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h
Prepare for nestling these into the send and recv ctxts so they
no longer have to be allocated dynamically.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever 2cc0f23b53 svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field
In every instance, the pointer address in that field is now
available by other means.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever bc8fd4e915 svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever 83fe6dd6a8 svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever 4a68edd93f svcrdma: Explicitly pass the transport into Read chunk I/O paths
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever c3899b7107 svcrdma: Explicitly pass the transport into Write chunk I/O paths
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever c4fd9f4525 svcrdma: Acquire the svcxprt_rdma pointer from the CQ context
Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever 5ef6c66676 svcrdma: Reduce size of struct svc_rdma_rw_ctxt
SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first
SGL array more than 4200 bytes in length, pushing the memory
allocation well into order 1.

Even so, the RDMA rw core doesn't seem to use more than max_send_sge
entries in that array (typically 32 or less), so that is all wasted
space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever 2dd6e29a3e svcrdma: Update some svcrdma DMA-related tracepoints
A send/recv_ctxt already records transport-related information
in the cq.id, thus there is no need to record the IP addresses of
the transport endpoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever 848760a9e7 svcrdma: DMA error tracepoints should report completion IDs
Update the DMA error flow tracepoints to report the completion ID of
the failing context. This ties the wait/failure to a particular
operation or request, which is more useful than knowing only the
failing transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever ad3656bd84 svcrdma: SQ error tracepoints should report completion IDs
Update the Send Queue's error flow tracepoints to report the
completion ID of the waiting or failing context. This ties the
wait/failure to a particular operation or request, which is a little
more useful than knowing only the transport that is about to close.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever be2acb1048 rpcrdma: Introduce a simple cid tracepoint class
De-duplicate some code, making it easier to add new tracepoints that
report only a completion ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever 907e34a7d0 svcrdma: Add lockdep class keys for transport locks
Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever bfb81535c2 svcrdma: Clean up locking
There's no need to protect llist_entry() with a spin lock.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever f09c36c8df svcrdma: Add an async version of svc_rdma_write_info_free()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing write_info
structs to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Write completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever ae225fe27b svcrdma: Add an async version of svc_rdma_send_ctxt_put()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing send_ctxts
to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Send completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever 9c7e1a0658 svcrdma: Add a utility workqueue to svcrdma
To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever 877118c667 svcrdma: Pre-allocate svc_rdma_recv_ctxt objects
The original reason for allocating svc_rdma_recv_ctxt objects during
Receive completion was to ensure the objects were allocated on the
NUMA node closest to the underlying IB device.

Since commit c5d68d25bd ("svcrdma: Clean up allocation of
svc_rdma_recv_ctxt"), however, the device's favored node is
explicitly passed to the memory allocator.

To enable switching Receive completion to soft IRQ context, move
memory allocation out of completion handling, since it can be
costly, and it can sleep.

A limited number of objects is now allocated at "accept" time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever b541dd554b svcrdma: Eliminate allocation of recv_ctxt objects in backchannel
The svc_rdma_recv_ctxt free list uses a lockless list to avoid the
need for a spin lock in the fast path. llist_del_first(), which is
used by svc_rdma_recv_ctxt_get(), requires serialization, however,
when there are multiple list producers that are unserialized.

I mistakenly thought there was only one caller of
svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit
serialization would not be necessary. But there is another caller:
svc_rdma_bc_sendto(), and these two are not serialized against each
other. I haven't seen ill effects that I could directly ascribe to
a lack of serialization. It's just an observation based on code
audit.

When DMA-mapping before sending a Reply, the passed-in struct
svc_rdma_recv_ctxt is used only for its write and reply PCLs. These
are currently always empty in the backchannel case. So, instead of
passing a full svc_rdma_recv_ctxt object to
svc_rdma_map_reply_msg(), let's pass in just the Write and Reply
PCLs.

This change makes it unnecessary for the backchannel to acquire a
dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need
for svc_rdma_recv_ctxt free list serialization is now completely
avoided.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
ChenXiaoSong 52e8910075 NFSv4, NFSD: move enum nfs_cb_opnum4 to include/linux/nfs4.h
Callback operations enum is defined in client and server, move it to
common header file.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Dan Carpenter 3c86e615d1 nfsd: remove unnecessary NULL check
We check "state" for NULL on the previous line so it can't be NULL here.
No need to check again.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202312031425.LffZTarR-lkp@intel.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever 3587b5c753 SUNRPC: Remove RQ_SPLICE_OK
This flag is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever a2c91753a4 NFSD: Modify NFSv4 to use nfsd_read_splice_ok()
Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever c21fd7a8e8 NFSD: Replace RQ_SPLICE_OK in nfsd_read()
RQ_SPLICE_OK is a bit of a layering violation. Also, a subsequent
patch is going to provide a mechanism for always disabling splice
reads.

Splicing is an issue only for NFS READs, so refactor nfsd_read() to
check the auth type directly instead of relying on an rq_flag
setting.

The new helper will be added into the NFSv4 read path in a
subsequent patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever deb704281f SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor
NFSD will use this new API to determine whether nfsd_splice_read is
safe to use. This avoids the need to add a dependency to NFSD for
CONFIG_SUNRPC_GSS.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever a853ed5525 NFSD: Document lack of f_pos_lock in nfsd_readdir()
Al Viro notes that normal system calls hold f_pos_lock when calling
->iterate_shared and ->llseek; however nfsd_readdir() does not take
that mutex when calling these methods.

It should be safe however because the struct file acquired by
nfsd_readdir() is not visible to other threads.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever d0ab8b649b NFSD: Remove nfsd_drc_gc() tracepoint
This trace point was for debugging the DRC's garbage collection. In
the field it's just noise.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever ce7df05508 NFSD: Make the file_delayed_close workqueue UNBOUND
workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8
	times, consider switching to WQ_UNBOUND

There's no harm in closing a cached file descriptor on another core.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Oleg Nesterov f3734cc407 NFSD: use read_seqbegin() rather than read_seqbegin_or_lock()
The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect,
this code can never take ->writeverf_lock for writing.

I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
and nfsd_reset_write_verifier() is supposed to be very rare operation
so we do not need the adaptive locking in this case.

Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
without changing the behaviour.

[ cel: Note also that it eliminates this Sparse warning:

fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' -
	different lock contexts for basic block

]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:24 -05:00
Jeff Layton 74fd48739d nfsd: new Kconfig option for legacy client tracking
We've had a number of attempts at different NFSv4 client tracking
methods over the years, but now nfsdcld has emerged as the clear winner
since the others (recoverydir and the usermodehelper upcall) are
problematic.

As a case in point, the recoverydir backend uses MD5 hashes to encode
long form clientid strings, which means that nfsd repeatedly gets dinged
on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not
cryptographically significant, so there is no danger there, but allowing
us to compile that out allows us to sidestep the issue entirely.

As a prelude to eventually removing support for these client tracking
methods, add a new Kconfig option that enables them. Mark it deprecated
and make it default to N.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:24 -05:00
Christophe JAILLET 3ee29a4474 ipvlan: Remove usage of the deprecated ida_simple_xx() API
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

This is less verbose.

Note that the upper bound of ida_alloc_range() is inclusive while the one
of ida_simple_get() was exclusive. So calls have been updated accordingly.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 22:32:48 +00:00
Christophe JAILLET e900274f27 ipvlan: Fix a typo in a comment
s/diffentiate/differentiate/

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 22:32:48 +00:00
Paulo Alcantara 6d039984c1 smb: client: stop revalidating reparse points unnecessarily
Query dir responses don't provide enough information on reparse points
such as major/minor numbers and symlink targets other than reparse
tags, however we don't need to unconditionally revalidate them only
because they are reparse points.  Instead, revalidate them only when
their ctime or reparse tag has changed.

For instance, Windows Server updates ctime of reparse points when
their data have changed.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-07 15:46:06 -06:00
David Howells 6ebfede8d5 cifs: Pass unbyteswapped eof value into SMB2_set_eof()
Change SMB2_set_eof() to take eof as CPU order rather than __le64 and pass
it directly rather than by pointer.  This moves the conversion down into
SMB_set_eof() rather than all of its callers and means we don't need to
undo it for the traceline.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-07 15:46:06 -06:00
Markus Elfring 96d566b6c9 smb3: Improve exception handling in allocate_mr_list()
The kfree() function was called in one case by
the allocate_mr_list() function during error handling
even if the passed variable contained a null pointer.
This issue was detected by using the Coccinelle software.

Thus use another label.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-07 15:46:06 -06:00
Shyam Prasad N 516eea97f9 cifs: fix in logging in cifs_chan_update_iface
Recently, cifs_chan_update_iface was modified to not
remove an iface if a suitable replacement was not found.
With that, there were two conditionals that were exactly
the same. This change removes that extra condition check.

Also, fixed a logging in the same function to indicate
the correct message.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-07 15:46:06 -06:00
Paulo Alcantara 9c38568a75 smb: client: handle special files and symlinks in SMB3 POSIX
Parse reparse points in SMB3 posix query info as they will be
supported and required by the new specification.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-07 15:46:06 -06:00
Paulo Alcantara 3ded18a9e9 smb: client: cleanup smb2_query_reparse_point()
Use smb2_compound_op() with SMB2_OP_GET_REPARSE to get reparse point.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-07 15:46:06 -06:00