Commit graph

1216740 commits

Author SHA1 Message Date
Chuck Lever 782448e1ec NFSD: Add nfsd4_encode_fattr4_rdattr_error()
Refactor the encoder for FATTR4_RDATTR_ERROR into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:15 -04:00
Chuck Lever 1252b283aa NFSD: Add nfsd4_encode_fattr4_lease_time()
Refactor the encoder for FATTR4_LEASE_TIME into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:15 -04:00
Chuck Lever b6b6259590 NFSD: Add nfsd4_encode_fattr4_fsid()
Refactor the encoder for FATTR4_FSID into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:14 -04:00
Chuck Lever d0b28aadfd NFSD: Add nfsd4_encode_fattr4_size()
Refactor the encoder for FATTR4_SIZE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:14 -04:00
Chuck Lever 263453d9bb NFSD: Add nfsd4_encode_fattr4_change()
Refactor the encoder for FATTR4_CHANGE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

The code is restructured a bit to use the modern xdr_stream flow,
and the encoded cinfo value is made const so that callers of the
encoders can be passed a const cinfo.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:14 -04:00
Chuck Lever 36ed7e6494 NFSD: Add nfsd4_encode_fattr4_fh_expire_type()
Refactor the encoder for FATTR4_FH_EXPIRE_TYPE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:13 -04:00
Chuck Lever b06cf37545 NFSD: Add nfsd4_encode_fattr4_type()
Refactor the encoder for FATTR4_TYPE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

In addition, restructure the code so that byte-swapping is done on
constant values rather than at run time. Run-time swapping can be
costly on some platforms, and "type" is a frequently-requested
attribute.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:13 -04:00
Chuck Lever c9090e2733 NFSD: Add nfsd4_encode_fattr4_supported_attrs()
Refactor the encoder for FATTR4_SUPPORTED_ATTRS into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:13 -04:00
Chuck Lever 8c4422881f NFSD: Add nfsd4_encode_fattr4__false()
Add an encoding helper that encodes a single boolean "false" value.
Attributes that always return "false" can use this helper.

In a subsequent patch, this helper will be called from a bitmask
loop, so it is given a standardized synopsis.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:12 -04:00
Chuck Lever c88cb4727a NFSD: Add nfsd4_encode_fattr4__true()
Add an encoding helper that encodes a single boolean "true" value.
Attributes that always return "true" can use this helper.

In a subsequent patch, this helper will be called from a bitmask
loop, so it is given a standardized synopsis.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:12 -04:00
Chuck Lever 83ab8678ad NFSD: Add struct nfsd4_fattr_args
I'm about to split nfsd4_encode_fattr() into a number of smaller
functions. Instead of passing a large number of arguments to each of
the smaller functions, create a struct that can gather the common
argument variables into something with a convenient handle on it.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:12 -04:00
Chuck Lever c3dcb45bcd NFSD: Clean up nfsd4_encode_setattr()
De-duplicate the encoding of bitmap4 results in
nfsd4_encode_setattr().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:11 -04:00
Chuck Lever e64301f51b NFSD: Rename nfsd4_encode_bitmap()
For alignment with the specification, the name of NFSD's encoder
function should match the name of the XDR type.

I've also replaced a few "naked integers" with symbolic constants
that better reflect the usage of these values.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:11 -04:00
Chuck Lever 6cc5829140 NFSD: Add simple u32, u64, and bool encoders
The generic XDR encoders return a length or a negative errno. NFSv4
encoders want to know simply whether the encode ran out of stream
buffer space. The return values for server-side encoding are either
nfs_ok or nfserr_resource.

So far I've found it adds a lot of duplicate code to try to use the
generic XDR encoder utilities when encoding the simple data types in
the NFSv4 operation encoders.

Add a set of NFSv4-specific utilities that handle the basic XDR data
types. These are added in xdr4.h so they might eventually be used by
the callback server and pNFS driver encoders too.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:11 -04:00
Chuck Lever 789ce196a3 SUNRPC: Remove BUG_ON call sites
There is no need to take down the whole system for these assertions.

I'd rather not attempt a heroic save here, as some bug has occurred
that has left the transport data structures in an unknown state.
Just warn and then leak the left-over resources.

Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:10 -04:00
Kinglong Mee 2929ba9b46 nfs: fix the typo of rfc number about xattr in NFSv4
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:10 -04:00
Chuck Lever f14122b2c2 tools: ynl: Add source files for nfsd netlink protocol
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:10 -04:00
Lorenzo Bianconi bd9d6a3efa NFSD: add rpc_status netlink support
Introduce rpc_status netlink support for NFSD in order to dump pending
RPC requests debugging information from userspace.

Closes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=366
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:09 -04:00
Lorenzo Bianconi 13727f85b4 NFSD: introduce netlink stubs
Generate stubs and uAPI for nfsd netlink protocol. For the moment,
the new protocol has one operation: rpc_status.

The generated header and source files are created by running:

  tools/net/ynl/ynl-regen.sh

Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:09 -04:00
Dai Ngo 6c41d9a9bd NFSD: handle GETATTR conflict with write delegation
If the GETATTR request on a file that has write delegation in effect
and the request attributes include the change info and size attribute
then the request is handled as below:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

    . update time_modify and time_metadata into file's metadata
      with current time

    . encode GETATTR as normal except the file size is encoded with
      the value returned from CB_GETATTR

    . mark the file as modified

If the CB_GETATTR fails for any reasons, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:09 -04:00
Dai Ngo 738401a9bd NFSD: add support for CB_GETATTR callback
Includes:
   . CB_GETATTR proc for nfs4_cb_procedures[]
   . XDR encoding and decoding function for CB_GETATTR request/reply
   . add nfs4_cb_fattr to nfs4_delegation for sending CB_GETATTR
     and store file attributes from client's reply.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:08 -04:00
NeilBrown 15d39883ee SUNRPC: change the back-channel queue to lwq
This removes the need to store and update back-links in the list.
It also remove the need for the _bh version of spin_lock().

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:08 -04:00
NeilBrown 580a25756a SUNRPC: discard sp_lock
sp_lock is now only used to protect sp_all_threads.  This isn't needed
as sp_all_threads is only manipulated through svc_set_num_threads(),
which is already serialized.  Read-acccess only requires rcu_read_lock().
So no more locking is needed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:08 -04:00
NeilBrown 2e8fc923fe SUNRPC: change sp_nrthreads to atomic_t
Using an atomic_t avoids the need to take a spinlock (which can soon be
removed).

Choosing a thread to kill needs to be careful as we cannot set the "die
now" bit atomically with the test on the count.  Instead we temporarily
increase the count.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:07 -04:00
NeilBrown 9a0e6accc0 SUNRPC: use lwq for sp_sockets - renamed to sp_xprts
lwq avoids using back pointers in lists, and uses less locking.
This introduces a new spinlock, but the other one will be removed in a
future patch.

For svc_clean_up_xprts(), we now dequeue the entire queue, walk it to
remove and process the xprts that need cleaning up, then re-enqueue the
remaining queue.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:07 -04:00
NeilBrown 5b80147e0c SUNRPC: only have one thread waking up at a time
Currently if several items of work become available in quick succession,
that number of threads (if available) will be woken.  By the time some
of them wake up another thread that was already cache-warm might have
come along and completed the work.  Anecdotal evidence suggests as many
as 15% of wakes find nothing to do once they get to the point of
looking.

This patch changes svc_pool_wake_idle_thread() to wake the first thread
on the queue but NOT remove it.  Subsequent calls will wake the same
thread.  Once that thread starts it will dequeue itself and after
dequeueing some work to do, it will wake the next thread if there is more
work ready.  This results in a more orderly increase in the number of
busy threads.

As a bonus, this allows us to reduce locking around the idle queue.
svc_pool_wake_idle_thread() no longer needs to take a lock (beyond
rcu_read_lock()) as it doesn't manipulate the queue, it just looks at
the first item.

The thread itself can avoid locking by using the new
llist_del_first_this() interface.  This will safely remove the thread
itself if it is the head.  If it isn't the head, it will do nothing.
If multiple threads call this concurrently only one will succeed.  The
others will do nothing, so no corruption can result.

If a thread wakes up and finds that it cannot dequeue itself that means
either
- that it wasn't woken because it was the head of the queue.  Maybe the
  freezer woke it.  In that case it can go back to sleep (after trying
  to freeze of course).
- some other thread found there was nothing to do very recently, and
  placed itself on the head of the queue in front of this thread.
  It must check again after placing itself there, so it can be deemed to
  be responsible for any pending work, and this thread can go back to
  sleep until woken.

No code ever tests for busy threads any more.  Only each thread itself
cares if it is busy.  So svc_thread_busy() is no longer needed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:07 -04:00
NeilBrown d7926ee8b7 SUNRPC: rename some functions from rqst_ to svc_thread_
Functions which directly manipulate a 'struct rqst', such as
svc_rqst_alloc() or svc_rqst_release_pages(), can reasonably
have "rqst" in there name.
However functions that act on the running thread, such as
XX_should_sleep() or XX_wait_for_work() should seem more
natural with a "svc_thread_" prefix.

So make those changes.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:06 -04:00
NeilBrown de9e82c355 lib: add light-weight queuing mechanism.
lwq is a FIFO single-linked queue that only requires a spinlock
for dequeueing, which happens in process context.  Enqueueing is atomic
with no spinlock and can happen in any context.

This is particularly useful when work items are queued from BH or IRQ
context, and when they are handled one at a time by dedicated threads.

Avoiding any locking when enqueueing means there is no need to disable
BH or interrupts, which is generally best avoided (particularly when
there are any RT tasks on the machine).

This solution is superior to using "list_head" links because we need
half as many pointers in the data structures, and because list_head
lists would need locking to add items to the queue.

This solution is superior to a bespoke solution as all locking and
container_of casting is integrated, so the interface is simple.

Despite the similar name, this solution meets a distinctly different
need to kfifo.  kfifo provides a fixed sized circular buffer to which
data can be added at one end and removed at the other, and does not
provide any locking.  lwq does not have any size limit and works with
data structures (objects?) rather than data (bytes).

A unit test for basic functionality, which runs at boot time, is included.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Gow <davidgow@google.com>
Cc: linux-kernel@vger.kernel.org
Message-Id: <20230911111333.4d1a872330e924a00acb905b@linux-foundation.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:06 -04:00
NeilBrown 8a3e5975ed llist: add llist_del_first_this()
llist_del_first_this() deletes a specific entry from an llist, providing
it is at the head of the list.  Multiple threads can call this
concurrently providing they each offer a different entry.

This can be uses for a set of worker threads which are on the llist when
they are idle.  The head can always be woken, and when it is woken it
can remove itself, and possibly wake the next if there is an excess of
work to do.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:06 -04:00
NeilBrown 9bd4161c59 SUNRPC: change service idle list to be an llist
With an llist we don't need to take a lock to add a thread to the list,
though we still need a lock to remove it.  That will go in the next
patch.

Unlike double-linked lists, a thread cannot reliably remove itself from
the list.  Only the first thread can be removed, and that can change
asynchronously.  So some care is needed.

We already check if there is pending work to do, so we are unlikely to
add ourselves to the idle list and then want to remove ourselves again.

If we DO find something needs to be done after adding ourselves to the
list, we simply wake up the first thread on the list.  If that was us,
we successfully removed ourselves and can continue.  If it was some
other thread, they will do the work that needs to be done.  We can
safely sleep until woken.

We also remove the test on freezing() from rqst_should_sleep().  Instead
we set TASK_FREEZABLE before scheduling.  This makes is safe to
schedule() when a freeze is pending.  As we now loop waiting to be
removed from the idle queue, this is a cleaner way to handle freezing.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:05 -04:00
NeilBrown d6b3358a28 llist: add interface to check if a node is on a list.
With list.h lists, it is easy to test if a node is on a list, providing
it was initialised and that it is removed with list_del_init().

This patch provides similar functionality for llist.h lists.

 init_llist_node()
marks a node as being not-on-any-list be setting the ->next pointer to
the node itself.
 llist_on_list()
tests if the node is on any list.
 llist_del_first_init()
remove the first element from a llist, and marks it as being off-list.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:05 -04:00
NeilBrown 2b65a22684 SUNRPC: discard SP_CONGESTED
We can tell if a pool is congested by checking if the idle list is
empty.  We don't need a separate flag.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:05 -04:00
NeilBrown 5ff817b235 SUNRPC: add list of idle threads
Rather than searching a list of threads to find an idle one, having a
list of idle threads allows an idle thread to be found immediately.

This adds some spin_lock calls which is not ideal, but as the hold-time
is tiny it is still faster than searching a list.  A future patch will
remove them using llist.h.  This involves some subtlety and so is left
to a separate patch.

This removes the need for the RQ_BUSY flag.  The rqst is "busy"
precisely when it is not on the "idle" list.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:04 -04:00
NeilBrown fa341560ca SUNRPC: change how svc threads are asked to exit.
svc threads are currently stopped using kthread_stop().  This requires
identifying a specific thread.  However we don't care which thread
stops, just as long as one does.

So instead, set a flag in the svc_pool to say that a thread needs to
die, and have each thread check this flag instead of calling
kthread_should_stop().  The first thread to find and clear this flag
then moves towards exiting.

This removes an explicit dependency on sp_all_threads which will make a
future patch simpler.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:04 -04:00
NeilBrown f4578ba11c lockd: hold a reference to nlmsvc_serv while stopping the thread.
Both nfsd and nfsv4-callback take a temporary reference to the svc_serv
while calling svc_set_num_threads() to stop the last thread.  lockd does
not.

This extra reference prevents the scv_serv from being freed when the
last thread drops its reference count.  This is not currently needed
for lockd as the svc_serv is not accessed after the last thread is told
to exit.

However a future patch will require svc_exit_thread() to access the
svc_serv after the svc_put() so it will need the code that calls
svc_set_num_threads() to keep a reference and keep the svc_serv active.

So copy the pattern from nfsd and nfsv4-cb to lockd, and take a
reference around svc_set_num_threads(.., 0)

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:04 -04:00
NeilBrown 063ab935a4 SUNRPC: integrate back-channel processing with svc_recv()
Using svc_recv() for (NFSv4.1) back-channel handling means we have just
one mechanism for waking threads.

Also change kthread_freezable_should_stop() in nfs4_callback_svc() to
kthread_should_stop() as used elsewhere.
kthread_freezable_should_stop() effectively adds a try_to_freeze() call,
and svc_recv() already contains that at an appropriate place.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:03 -04:00
Chuck Lever 6ed8cdf967 SUNRPC: Clean up bc_svc_process()
The test robot complained that, in some build configurations, the
@error variable in bc_svc_process's only caller is set but never
used. This happens because dprintk() is the only consumer of that
value.

 - Remove the dprintk() call sites in favor of the svc_process
   tracepoint
 - The @error variable and the return value of bc_svc_process() are
   now unused, so get rid of them.
 - The @serv parameter is set to rqstp->rq_serv by the only caller,
   and bc_svc_process() then uses it only to set rqstp->rq_serv. It
   can be removed.
 - Rename bc_svc_process() according to the convention that
   globally-visible RPC server functions have names that begin with
   "svc_"; and because it is globally-visible, give it a proper
   kdoc comment.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308121314.HA8Rq2XG-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:03 -04:00
NeilBrown 7b31f4daeb SUNRPC: rename and refactor svc_get_next_xprt()
svc_get_next_xprt() does a lot more than just get an xprt.  It also
decides if it needs to sleep, depending not only on the availability of
xprts but also on the need to exit or handle external work.

So rename it to svc_rqst_wait_for_work() and only do the testing and
waiting.  Move all the waiting-related code out of svc_recv() into the
new svc_rqst_wait_for_work().

Move the dequeueing code out of svc_get_next_xprt() into svc_recv().

Previously svc_xprt_dequeue() would be called twice, once before waiting
and possibly once after.  Now instead rqst_should_sleep() is called
twice.  Once to decide if waiting is needed, and once to check against
after setting the task state do see if we might have missed a wakeup.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:03 -04:00
NeilBrown e3274026e2 SUNRPC: move all of xprt handling into svc_xprt_handle()
svc_xprt_handle() does lots of things itself, but leaves some to the
caller - svc_recv().  This isn't elegant.

Move that code out of svc_recv() into svc_xprt_handle()

Move the calls to svc_xprt_release() from svc_send() and svc_drop()
(the two possible final steps in svc_process()) and from svc_recv() (in
the case where svc_process() wasn't called) into svc_xprt_handle().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:02 -04:00
Alexander Aring e70da17633 lockd: add doc to enable EXPORT_OP_ASYNC_LOCK
This patch adds a note to enable EXPORT_OP_ASYNC_LOCK for
asynchronous lock request handling.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:02 -04:00
Alexander Aring afb13302aa lockd: fix race in async lock request handling
This patch fixes a race in async lock request handling between adding
the relevant struct nlm_block to nlm_blocked list after the request was
sent by vfs_lock_file() and nlmsvc_grant_deferred() does a lookup of the
nlm_block in the nlm_blocked list. It could be that the async request is
completed before the nlm_block was added to the list. This would end
in a -ENOENT and a kernel log message of "lockd: grant for unknown
block".

To solve this issue we add the nlm_block before the vfs_lock_file() call
to be sure it has been added when a possible nlmsvc_grant_deferred() is
called. If the vfs_lock_file() results in an case when it wouldn't be
added to nlm_blocked list, the nlm_block struct will be removed from
this list again.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:02 -04:00
Alexander Aring b743612c0a lockd: don't call vfs_lock_file() for pending requests
This patch returns nlm_lck_blocked in nlmsvc_lock() when an asynchronous
lock request is pending. During testing I ran into the case with the
side-effects that lockd is waiting for only one lm_grant() callback
because it's already part of the nlm_blocked list. If another
asynchronous for the same nlm_block is triggered two lm_grant()
callbacks will occur but lockd was only waiting for one.

To avoid any change of existing users this handling will only being made
when export_op_support_safe_async_lock() returns true.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:01 -04:00
Alexander Aring 2dd10de8e6 lockd: introduce safe async lock op
This patch reverts mostly commit 40595cdc93 ("nfs: block notification
on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
export flag to signal that the "own ->lock" implementation supports
async lock requests. The only main user is DLM that is used by GFS2 and
OCFS2 filesystem. Those implement their own lock() implementation and
return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93
("nfs: block notification on fs with its own ->lock") the DLM
implementation were never updated. This patch should prepare for DLM
to set the EXPORT_OP_ASYNC_LOCK export flag and update the DLM
plock implementation regarding to it.

Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:01 -04:00
Trond Myklebust 1b2021bdee nfsd: Don't reset the write verifier on a commit EAGAIN
If fsync() is returning EAGAIN, then we can assume that the filesystem
being exported is something like NFS with the 'softerr' mount option
enabled, and that it is just asking us to replay the fsync() operation
at a later date.

If we see an ESTALE, then ditto: the file is gone, so there is no danger
of losing the error.

For those cases, do not reset the write verifier. A write verifier
change has a global effect, causing retransmission by all clients of
all uncommitted unstable writes for all files, so it is worth
mitigating where possible.

Link: https://lore.kernel.org/linux-nfs/20230911184357.11739-1-trond.myklebust@hammerspace.com/
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:01 -04:00
Trond Myklebust d59b3515ab nfsd: Handle EOPENSTALE correctly in the filecache
The nfsd_open code handles EOPENSTALE correctly, by retrying the call to
fh_verify() and __nfsd_open(). However the filecache just drops the
error on the floor, and immediately returns nfserr_stale to the caller.

This patch ensures that we propagate the EOPENSTALE code back to
nfsd_file_do_acquire, and that we handle it correctly.

Fixes: 65294c1f2c ("nfsd: add a new struct file caching facility to nfsd")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230911183027.11372-1-trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:00 -04:00
Dai Ngo 5896a87054 NFSD: add trace points to track server copy progress
Add trace points on destination server to track inter and intra
server copy operations.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Tested-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:00 -04:00
Dai Ngo 15d1975b72 NFSD: initialize copy->cp_clp early in nfsd4_copy for use by trace point
Prepare for adding server copy trace points.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Tested-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:00 -04:00
Linus Torvalds 58720809f5 Linux 6.6-rc6 2023-10-15 13:34:39 -07:00
Linus Torvalds fbe1bf1e5f Revert "x86/smp: Put CPUs into INIT on shutdown if possible"
This reverts commit 45e34c8af5, and the
two subsequent fixes to it:

  3f874c9b2a ("x86/smp: Don't send INIT to non-present and non-booted CPUs")
  b1472a60a5 ("x86/smp: Don't send INIT to boot CPU")

because it seems to result in hung machines at shutdown.  Particularly
some Dell machines, but Thomas says

 "The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs -
  at least that's what I could figure out from the various bug reports.

  I don't know which CPUs the DELL machines have, so I can't say it's a
  pattern.

  I agree with the revert for now"

Ashok Raj chimes in:

 "There was a report (probably this same one), and it turns out it was a
  bug in the BIOS SMI handler.

  The client BIOS's were waiting for the lowest APICID to be the SMI
  rendevous master. If this is MeteorLake, the BSP wasn't the one with
  the lowest APIC and it triped here.

  The BIOS change is also being pushed to others for assimilation :)

  Server BIOS's had this correctly for a while now"

and it does look likely to be some bad interaction between SMI and the
non-BSP cores having put into INIT (and thus unresponsive until reset).

Link: https://bbs.archlinux.org/viewtopic.php?pid=2124429
Link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
Link: https://forum.artixlinux.org/index.php/topic,5997.0.html
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-10-15 12:02:02 -07:00
Xuan Zhuo 5720c43d52 virtio_net: fix the missing of the dma cpu sync
Commit 295525e29a ("virtio_net: merge dma operations when filling
mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we
do not want to do the sync for the entire page_frag. But that misses the
sync for the current area.

This patch does cpu sync regardless of whether the ref is zero or not.

Fixes: 295525e29a ("virtio_net: merge dma operations when filling mergeable buffers")
Reported-by: Michael Roth <michael.roth@amd.com>
Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-10-15 11:49:57 -07:00