linux

mirror of https://github.com/torvalds/linux synced 2024-10-03 18:00:50 +00:00

Author	SHA1	Message	Date
Linus Torvalds	f35d170615	NFSD 6.6 Release Notes I'm thrilled to announce that the Linux in-kernel NFS server now offers NFSv4 write delegations. A write delegation enables a client to cache data and metadata for a single file more aggressively, reducing network round trips and server workload. Many thanks to Dai Ngo for contributing this facility, and to Jeff Layton and Neil Brown for reviewing and testing it. This release also sees the removal of all support for DES- and triple-DES-based Kerberos encryption types in the kernel's SunRPC implementation. These encryption types have been deprecated by the Internet community for years and are considered insecure. This change affects both the in-kernel NFS client and server. The server's UDP and TCP socket transports have now fully adopted David Howells' new bio_vec iterator so that no more than one sendmsg() call is needed to transmit each RPC message. In particular, this helps kTLS optimize record boundaries when sending RPC-with-TLS replies, and it takes the server a baby step closer to handling file I/O via folios. We've begun work on overhauling the SunRPC thread scheduler to remove a costly linked-list walk when looking for an idle RPC service thread to wake. The pre-requisites are included in this release. Thanks to Neil Brown for his ongoing work on this improvement. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTwoa0ACgkQM2qzM29m f5cZvw/8CmFVNC27aMrJEhRRhwwrXbLzUkWh9GCYkG98PHYiLxLTvZ6qELXAax/a UjSgIDSRcWl4z8M/tyBQtgsw7NADr+7XWqEoXR7HZ5pEEC/KNGM0oQWQ92ojjKYy JmHdB02uaDJfcd9ioFTU13cw7q2BQfoe2xLI8yqis2vcVSu92AM7aIw+cvJIpwQB inA3TIIsYTV/gPByXSfEtvmYACadoFiMvfvYwaWhjFS9MdSzFmcVG0Dp3EFIig29 odmWEofcz6uIvUWvUswWEGdoSu7uOKIztSuAI4PlTwaofUaSKG6e5kmtpr3cLERD Uhg2lm5JgqkXBd7QHObNimJ4DtQzFwHmhA08qo8rd/zba75mn/Hr5IF0q3Rxs99J SRYHcAeP8afKn5Ge0yzoTgCNcqhfz8KLRfoCQX49mljr+muNxld4nMklD2KdUwJi XEB512/q3E4nUgopXZiSJYQYAq1CfdR5WpGipZ9X0XK9HZBDF/qhXGtk1YQuNWyj ZxJS3bfBza4oVIvP5/ehjCIQwOvqkcrC5zZGDIgDvw9Q6L3L1wqmVntsdCLCLRcJ jB4IOsj+DECfJ6w2vP2SZ3GeMtFnyuTQjhUTkjPuAKTBBiKo4Tj0o/agpfDYbWZy 1l3a2yH5jqJgkm4MaVh3YHRJGc0ub0ccpIrs3QQ4jvjMLQ/3Gcs= =XGHs -----END PGP SIGNATURE----- Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "I'm thrilled to announce that the Linux in-kernel NFS server now offers NFSv4 write delegations. A write delegation enables a client to cache data and metadata for a single file more aggressively, reducing network round trips and server workload. Many thanks to Dai Ngo for contributing this facility, and to Jeff Layton and Neil Brown for reviewing and testing it. This release also sees the removal of all support for DES- and triple-DES-based Kerberos encryption types in the kernel's SunRPC implementation. These encryption types have been deprecated by the Internet community for years and are considered insecure. This change affects both the in-kernel NFS client and server. The server's UDP and TCP socket transports have now fully adopted David Howells' new bio_vec iterator so that no more than one sendmsg() call is needed to transmit each RPC message. In particular, this helps kTLS optimize record boundaries when sending RPC-with-TLS replies, and it takes the server a baby step closer to handling file I/O via folios. We've begun work on overhauling the SunRPC thread scheduler to remove a costly linked-list walk when looking for an idle RPC service thread to wake. The pre-requisites are included in this release. Thanks to Neil Brown for his ongoing work on this improvement" * tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits) Documentation: Add missing documentation for EXPORT_OP flags SUNRPC: Remove unused declaration rpc_modcount() SUNRPC: Remove unused declarations NFSD: da_addr_body field missing in some GETDEVICEINFO replies SUNRPC: Remove return value of svc_pool_wake_idle_thread() SUNRPC: make rqst_should_sleep() idempotent() SUNRPC: Clean up svc_set_num_threads SUNRPC: Count ingress RPC messages per svc_pool SUNRPC: Deduplicate thread wake-up code SUNRPC: Move trace_svc_xprt_enqueue SUNRPC: Add enum svc_auth_status SUNRPC: change svc_xprt::xpt_flags bits to enum SUNRPC: change svc_rqst::rq_flags bits to enum SUNRPC: change svc_pool::sp_flags bits to enum SUNRPC: change cache_head.flags bits to enum SUNRPC: remove timeout arg from svc_recv() SUNRPC: change svc_recv() to return void. SUNRPC: call svc_process() from svc_recv(). nfsd: separate nfsd_last_thread() from nfsd_put() nfsd: Simplify code around svc_exit_thread() call in nfsd() ...	2023-08-31 15:32:18 -07:00
Chuck Lever	6372e2ee62	NFSD: da_addr_body field missing in some GETDEVICEINFO replies The XDR specification in RFC 8881 looks like this: struct device_addr4 { layouttype4 da_layout_type; opaque da_addr_body<>; }; struct GETDEVICEINFO4resok { device_addr4 gdir_device_addr; bitmap4 gdir_notification; }; union GETDEVICEINFO4res switch (nfsstat4 gdir_status) { case NFS4_OK: GETDEVICEINFO4resok gdir_resok4; case NFS4ERR_TOOSMALL: count4 gdir_mincount; default: void; }; Looking at nfsd4_encode_getdeviceinfo() .... When the client provides a zero gd_maxcount, then the Linux NFS server implementation encodes the da_layout_type field and then skips the da_addr_body field completely, proceeding directly to encode gdir_notification field. There does not appear to be an option in the specification to skip encoding da_addr_body. Moreover, Section 18.40.3 says: > If the client wants to just update or turn off notifications, it > MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero. > In that event, if the device ID is valid, the reply's da_addr_body > field of the gdir_device_addr field will be of zero length. Since the layout drivers are responsible for encoding the da_addr_body field, put this fix inside the ->encode_getdeviceinfo methods. Fixes: `9cf514ccfa` ("nfsd: implement pNFS operations") Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Tom Haynes <loghyr@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
NeilBrown	c743b4259c	SUNRPC: remove timeout arg from svc_recv() Most svc threads have no interest in a timeout. nfsd sets it to 1 hour, but this is a wart of no significance. lockd uses the timeout so that it can call nlmsvc_retry_blocked(). It also sometimes calls svc_wake_up() to ensure this is called. So change lockd to be consistent and always use svc_wake_up() to trigger nlmsvc_retry_blocked() - using a timer instead of a timeout to svc_recv(). And change svc_recv() to not take a timeout arg. This makes the sp_threads_timedout counter always zero. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
NeilBrown	7b719e2bf3	SUNRPC: change svc_recv() to return void. svc_recv() currently returns a 0 on success or one of two errors: - -EAGAIN means no message was successfully received - -EINTR means the thread has been told to stop Previously nfsd would stop as the result of a signal as well as following kthread_stop(). In that case the difference was useful: EINTR means stop unconditionally. EAGAIN means stop if kthread_should_stop(), continue otherwise. Now threads only exit when kthread_should_stop() so we don't need the distinction. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
NeilBrown	f78116d3bf	SUNRPC: call svc_process() from svc_recv(). All callers of svc_recv() go on to call svc_process() on success. Simplify callers by having svc_recv() do that for them. This loses one call to validate_process_creds() in nfsd. That was debugging code added 14 years ago. I don't think we need to keep it. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
NeilBrown	9f28a971ee	nfsd: separate nfsd_last_thread() from nfsd_put() Now that the last nfsd thread is stopped by an explicit act of calling svc_set_num_threads() with a count of zero, we only have a limited number of places that can happen, and don't need to call nfsd_last_thread() in nfsd_put() So separate that out and call it at the two places where the number of threads is set to zero. Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all() into nfsd_last_thread(), as they are really part of the same action. nfsd_put() is now a thin wrapper around svc_put(), so make it a static inline. nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of places we have to use svc_put() instead. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
NeilBrown	18e4cf9155	nfsd: Simplify code around svc_exit_thread() call in nfsd() Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed. Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
NeilBrown	3903902401	nfsd: don't allow nfsd threads to be signalled. The original implementation of nfsd used signals to stop threads during shutdown. In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads internally it if was asked to run "0" threads. After this user-space transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to threads was no longer an important part of the API. In commit `3ebdbe5203` ("SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the use of signals for stopping threads, using kthread_stop() instead. This patch makes the "obvious" next step and removes the ability to signal nfsd threads - or any svc threads. nfsd stops allowing signals and we don't check for their delivery any more. This will allow for some simplification in later patches. A change worth noting is in nfsd4_ssc_setup_dul(). There was previously a signal_pending() check which would only succeed when the thread was being shut down. It should really have tested kthread_should_stop() as well. Now it just does the latter, not the former. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Jeff Layton	d424797032	nfsd: inherit required unset default acls from effective set A well-formed NFSv4 ACL will always contain OWNER@/GROUP@/EVERYONE@ ACEs, but there is no requirement for inheritable entries for those entities. POSIX ACLs must always have owner/group/other entries, even for a default ACL. nfsd builds the default ACL from inheritable ACEs, but the current code just leaves any unspecified ACEs zeroed out. The result is that adding a default user or group ACE to an inode can leave it with unwanted deny entries. For instance, a newly created directory with no acl will look something like this: # NFSv4 translation by server A::OWNER@:rwaDxtTcCy A::GROUP@:rxtcy A::EVERYONE@:rxtcy # POSIX ACL of underlying file user::rwx group::r-x other::r-x ...if I then add new v4 ACE: nfs4_setfacl -a A:fd:1000:rwx /mnt/local/test ...I end up with a result like this today: user::rwx user:1000:rwx group::r-x mask::rwx other::r-x default:user::--- default:user:1000:rwx default:group::--- default😷:rwx default:other::--- A::OWNER@:rwaDxtTcCy A::1000:rwaDxtcy A::GROUP@:rxtcy A::EVERYONE@:rxtcy D:fdi:OWNER@:rwaDx A:fdi:OWNER@:tTcCy A:fdi:1000:rwaDxtcy A:fdi:GROUP@:tcy A:fdi:EVERYONE@:tcy ...which is not at all expected. Adding a single inheritable allow ACE should not result in everyone else losing access. The setfacl command solves a silimar issue by copying owner/group/other entries from the effective ACL when none of them are set: "If a Default ACL entry is created, and the Default ACL contains no owner, owning group, or others entry, a copy of the ACL owner, owning group, or others entry is added to the Default ACL. Having nfsd do the same provides a more sane result (with no deny ACEs in the resulting set): user::rwx user:1000:rwx group::r-x mask::rwx other::r-x default:user::rwx default:user:1000:rwx default:group::r-x default😷:rwx default:other::r-x A::OWNER@:rwaDxtTcCy A::1000:rwaDxtcy A::GROUP@:rxtcy A::EVERYONE@:rxtcy A:fdi:OWNER@:rwaDxtTcCy A:fdi:1000:rwaDxtcy A:fdi:GROUP@:rxtcy A:fdi:EVERYONE@:rxtcy Reported-by: Ondrej Valousek <ondrej.valousek@diasemi.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2136452 Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Jeff Layton	f2b7019d2e	nfsd: set missing after_change as before_change + 1 In the event that we can't fetch post_op_attr attributes, we still need to set a value for the after_change. The operation has already happened, so we're not able to return an error at that point, but we do want to ensure that the client knows that its cache should be invalidated. If we weren't able to fetch post-op attrs, then just set the after_change to before_change + 1. The atomic flag should already be clear in this case. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Jeff Layton	976626073a	nfsd: remove unsafe BUG_ON from set_change_info At one time, nfsd would scrape inode information directly out of struct inode in order to populate the change_info4. At that time, the BUG_ON in set_change_info made some sense, since having it unset meant a coding error. More recently, it calls vfs_getattr to get this information, which can fail. If that fails, fh_pre_saved can end up not being set. While this situation is unfortunate, we don't need to crash the box. Move set_change_info to nfs4proc.c since all of the callers are there. Revise the condition for setting "atomic" to also check for fh_pre_saved. Drop the BUG_ON and just have it zero out both change_attr4s when this occurs. Reported-by: Boyang Xue <bxue@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2223560 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Jeff Layton	a332018a91	nfsd: handle failure to collect pre/post-op attrs more sanely Collecting pre_op_attrs can fail, in which case it's probably best to fail the whole operation. Change fh_fill_pre_attrs and fh_fill_both_attrs to return __be32, and have the callers check the return code and abort the operation if it's not nfs_ok. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Jeff Layton	5865bafa19	nfsd: add a MODULE_DESCRIPTION I got this today from modpost: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfsd/nfsd.o Add a module description. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	e7421ce714	NFSD: Rename struct svc_cacherep The svc_ prefix is identified with the SunRPC layer. Although the duplicate reply cache caches RPC replies, it is only for the NFS protocol. Rename the struct to better reflect its purpose. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	cb18eca4b8	NFSD: Remove svc_rqst::rq_cacherep Over time I'd like to see NFS-specific fields moved out of struct svc_rqst, which is an RPC layer object. These fields are layering violations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	c135e1269f	NFSD: Refactor the duplicate reply cache shrinker Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to reduce the shrinker's impact on the cache's effectiveness. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	a9507f6af1	NFSD: Replace nfsd_prune_bucket() Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache. A few percpu operations are moved outside the lock since they temporarily disable local IRQs which is expensive and does not need to be done while the lock is held. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	ff0d169329	NFSD: Rename nfsd_reply_cache_alloc() For readability, rename to match the other helpers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	35308e7f0f	NFSD: Refactor nfsd_reply_cache_free_locked() To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held. Start by refactoring nfsd_reply_cache_free_locked() into a helper that removes an entry from the bucket (and must therefore run under the lock) and a second helper that frees the entry (which does not need to hold the lock). For readability, rename the helpers nfsd_cacherep_<verb>. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Dai Ngo	1d3dd1d56c	NFSD: Enable write delegation support This patch grants write delegations for OPEN with NFS4_SHARE_ACCESS_WRITE if there is no conflict with other OPENs. Write delegation conflicts with another OPEN, REMOVE, RENAME and SETATTR are handled the same as read delegation using notify_change, try_break_deleg. The NFSv4.0 protocol does not enable a server to determine that a conflicting GETATTR originated from the client holding the delegation versus coming from some other client. With NFSv4.1 and later, the SEQUENCE operation that begins each COMPOUND contains a client ID, so delegation recall can be safely squelched in this case. With NFSv4.0, however, the server must recall or send a CB_GETATTR (per RFC 7530 Section 16.7.5) even when the GETATTR originates from the client holding that delegation. An NFSv4.0 client can trigger a pathological situation if it always sends a DELEGRETURN preceded by a conflicting GETATTR in the same COMPOUND. COMPOUND execution will always stop at the GETATTR and the DELEGRETURN will never get executed. The server eventually revokes the delegation, which can result in loss of open or lock state. Tracepoint added to track whether read or write delegation is granted. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Chuck Lever	50bce06f0e	NFSD: Report zero space limit for write delegations Replace the -1 (no limit) with a zero (no reserved space). This prevents certain non-determinant client behavior, such as silly-renaming a file when the only open reference is a write delegation. Such a rename can leave unexpected .nfs files in a directory that is otherwise supposed to be empty. Note that other server implementations that support write delegation also set this field to zero. Suggested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Dai Ngo	fd19ca36fd	NFSD: handle GETATTR conflict with write delegation If the GETATTR request on a file that has write delegation in effect and the request attributes include the change info and size attribute then the write delegation is recalled. If the delegation is returned within 30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY error is returned for the GETATTR. Add counter for write delegation recall due to conflict GETATTR. This is used to evaluate the need to implement CB_GETATTR to adoid recalling the delegation with conflit GETATTR. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-29 17:45:22 -04:00
Linus Torvalds	615e95831e	v6.6-vfs.ctime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2 XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0= =eJz5 -----END PGP SIGNATURE----- Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...	2023-08-28 09:31:32 -07:00
Linus Torvalds	f8d6ff4490	nfsd-6.5 fixes: - Close race window when handling FREE_STATEID operations - Fix regression in /proc/fs/nfsd/v4_end_grace introduced in v6.5-rc -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTntHEACgkQM2qzM29m f5cGgQ//ZvUW1Vvp+s86puw+EyEtwu15Ms19kTYNR+AKebpPz/c9K9iEF3nmZXEL bRn25fELtzXYi8rqavrv8fMj7dQhmkT3DE0WaJcTtCLD5N5bGDO3mQeoQ1fKGR1r rHITp0jC25Viur7kXhXU6qIcvu0VthK+feW/DMlkKsmSlQE5V4utxUGYZp8gfZGU 7cbYRpCqF2J1bJSPxH/lKpg5ZHztpZW6aPXG7frHcg04qsfqrMRS0HqG8KYaAKXh BObBqSYDo8agOa3u361pBZoVZHF2/7gFXlZKIZdp+6F5/B1IjoN+7eWnI7hFxiH7 zf5jLa9xlWrXr2vQTuPEJa9dCr756Ixzq7IJ7ZzIMOpVypixZ04jBLfnuhcnayu9 8k/0CFqQwmfvIcXgJEpTJ+OKm0kDqI3n7WE9gkeYBkRewEvJQXaFZ/vqTYi7bp9H eWlwQ4bHE5touERBMp0HmDdct/ZdUn8dS6MDcdGFXrVf5m+Jt6hZCXTnpU3Ah+zF d0uK4IEwJ2yC9FhBqOYZ6+XBr1JA+40vdnHOBvKdpAzQnIdwnNa4rzR0Eab6+m4i fmhI63s9slPBcBMroRC0mhftcdkd7LjBhhWbsDu8nemKmmHcOKzwTda78EayQYnm /zJUVr8BqzkgaJG1PUn9y0g4IOfgTiokDmBdLu6bTAanRtekhVY= =BF07 -----END PGP SIGNATURE----- Merge tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Two last-minute one-liners for v6.5-rc. One got lost in the shuffle, and the other was reported just this morning" - Close race window when handling FREE_STATEID operations - Fix regression in /proc/fs/nfsd/v4_end_grace introduced in v6.5-rc" * tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix a thinko introduced by recent trace point changes nfsd: Fix race to FREE_STATEID and cl_revoked	2023-08-24 14:30:47 -07:00
Chuck Lever	8073a98e95	NFSD: Fix a thinko introduced by recent trace point changes The fixed commit erroneously removed a call to nfsd_end_grace(), which makes calls to write_v4_end_grace() a no-op. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202308241229.68396422-oliver.sang@intel.com Fixes: `39d432fc76` ("NFSD: trace nfsctl operations") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-24 10:56:28 -04:00
Benjamin Coddington	3b816601e2	nfsd: Fix race to FREE_STATEID and cl_revoked We have some reports of linux NFS clients that cannot satisfy a linux knfsd server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though those clients repeatedly walk all their known state using TEST_STATEID and receive NFS4_OK for all. Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then nfsd4_free_stateid() finds the delegation and returns NFS4_OK to FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to cl_revoked. This would produce the observed client/server effect. Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID and move to cl_revoked happens within the same cl_lock. This will allow nfsd4_free_stateid() to properly remove the delegation from cl_revoked. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575 Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v4.17+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-08-04 11:38:33 -04:00
Linus Torvalds	7bafbd4027	nfsd-6.5 fixes: - Fix tmpfs splice read support -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTLxS4ACgkQM2qzM29m f5cJOg//T/CR1IUGTxd6A4MYjN3J3nIgQwMIbNP1npaLZQp6rF1UlE4seCyMXWtG VxxR0TnCrxxCl7YbHz6jzQsel2lWpVpMh5+aWQ3jPeJqnrdfJX3VwLzKYEeVj8Uj use2i0g5UA3HTt3bULEgf17j7zCMqYqmldU5jLHvmkXyLN+E+WcHrhx6RmXH4lYN fR2bC8a6MmpyGJJ6P+mYw1GdOUIqWU/HQaeZC+L+T5xG156RqvZAQWsUR40kA3Ta 6C4NQZr8CtbZcIzVIhxYDKYEg+R9iLRJeHoYVi/O0F779akzYcIuGHYMjG/Pi2z5 IjxWJoGOi1L1bFYe4Nm06/0DLveNPfUjRybV4iCRY92TilyEhUlTjoHoGx5ajlsC 02kvjyaJNJQ6oCvQmYH+PjHm5OW5Bj/BEl5O21MQUMXRklSNyW7HpGpWCUjpyVAJ by2YatGNCGoS8mLp3HebW9XjiuDEQ+Y/S4PgXaEaHDW/L5wmmoT1yR/iLl+1b/hV 9jGhsyuowLQHLqIk3ZP7o2X8ApXh0ZD+Xmvvxobh47W/fYTCOe+SXySfpTX03EDV SBEel7d4aGESndB7DgWXMjP1D7xLwkMIpk4tJVw+7idVLgKlrJGfidY5ROIFJiIM 95v+FKksLM3NTjt2Zb8MTfF65YKsRH3QqfgP34r5FzwAIx3t2VU= =Lq7u -----END PGP SIGNATURE----- Merge tag 'nfsd-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix tmpfs splice read support * tag 'nfsd-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: Fix reading via splice	2023-08-03 09:26:34 -07:00
David Howells	101df45e7e	nfsd: Fix reading via splice nfsd_splice_actor() has a clause in its loop that chops up a compound page into individual pages such that if the same page is seen twice in a row, it is discarded the second time. This is a problem with the advent of shmem_splice_read() as that inserts zero_pages into the pipe in lieu of pages that aren't present in the pagecache. Fix this by assuming that the last page is being extended only if the currently stored length + starting offset is not currently on a page boundary. This can be tested by NFS-exporting a tmpfs filesystem on the test machine and truncating it to more than a page in size (eg. truncate -s 8192) and then reading it by NFS. The first page will be all zeros, but thereafter garbage will be read. Note: I wonder if we can ever get a situation now where we get a splice that gives us contiguous parts of a page in separate actor calls. As NFSD can only be splicing from a file (I think), there are only three sources of the page: copy_splice_read(), shmem_splice_read() and file_splice_read(). The first allocates pages for the data it reads, so the problem cannot occur; the second should never see a partial page; and the third waits for each page to become available before we're allowed to read from it. Fixes: `bd194b1871` ("shmem: Implement splice-read") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> cc: Hugh Dickins <hughd@google.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-07-30 18:07:12 -04:00
Linus Torvalds	0b4a9fdc93	nfsd-6.5 fixes: - Fix TEST_STATEID response -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTALGMACgkQM2qzM29m f5eTLw//cyAYWF2VBP1byO5x3ssju9LUDn0fqXHPstdlqQIN/sG2A3pS2f59JPSx Vsa7wlsxRwHj9/tcDups7LBAelA/GopNTK5XuIDMLdhhKAgnJMqa5NCKR2cCC3It Xd+nkgigkeKMlQivBMZYBYtmFLNqAS0LkYBaEpWpyuBUtdayKUk+iMFaFwcfNn+E /VKaAjyWv1Ef2JpuwgCA3gpGHUnqel2MnTtbtSB09W2bFoPCQEusdpfq4XMzi+1S Rmv3tzt/B4/oQFlJnn8lOzSWYpT6Vol5kQcb2YM7d6asiqWZSYA78Iv0HekbhUZA dDEkowvix27v7wStwgfZCJofmc7u6mMaTRTLhMnF08zSLsqjmW1N+9o5WwXGngVm Y7x7eh5uhT56yN8exQLxiq2XItASrF/Z1XI0xqwrdn3+WtuOThw2DkhT2+ElErUk cuioHRHZ2LVLwCg5uJn3OrxtBYYxDLiCi3BPhB3I3J17Uyb6M/Tls0QV1FjLzwF3 GyeeoJ7622ZNSOv/j4EuRk2tXWtewN65k4pcnpJbL+iQ2ejXFzls+Fe2m+tDLM+y jfTKMqO5Yb8jK7zn5Fmssfb32nwtJ7b7+SpDUHvyyttv8we/EKzbWA4/l8ECd5iT nClU7Pos9+WPaXjt1uawevgUNdm0Bhu54zyqgyeNWCYITq7DX1E= =z8O/ -----END PGP SIGNATURE----- Merge tag 'nfsd-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix TEST_STATEID response * tag 'nfsd-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: Remove incorrect check in nfsd4_validate_stateid	2023-07-25 13:54:04 -07:00
Jeff Layton	38d721b13f	nfsd: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-56-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-07-24 10:30:01 +02:00
Trond Myklebust	f75546f58a	nfsd: Remove incorrect check in nfsd4_validate_stateid If the client is calling TEST_STATEID, then it is because some event occurred that requires it to check all the stateids for validity and call FREE_STATEID on the ones that have been revoked. In this case, either the stateid exists in the list of stateids associated with that nfs4_client, in which case it should be tested, or it does not. There are no additional conditions to be considered. Reported-by: "Frank Ch. Eigler" <fche@redhat.com> Fixes: `7df302f75e` ("NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-07-18 11:34:09 -04:00
Linus Torvalds	ee152be17a	nfsd-6.5 fixes: - Fix ordering of attributes in NFSv4 GETATTR replies -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmSfFEwACgkQM2qzM29m f5eHCw/8C3EAIsYGSe+aREUZgH7/m3JlElkrGA6IcKP6kKN9ZD7x0RIDxnkqcnbv 73fmbgXrbu5NjjtvT9WHwS4yJPNKvU3xiVEMKJKelEdkboG4Qb76r951wzTkXK25 ev/68BzYfbuX36Wa/21Gp/PmisqlspQjbk3X7zckCk8KM6PWvAO4HrmA7+VOkHMP tTVc3Bd482DqgCNmG7fz8UHeW97itkr8w4oFSgifuzQLC3h6f28yyxI9pQ0bFSn/ fdcAOxx5W9crQ+lwIaHCzPZa8N+oGt0amqKXeqnyvY5FmH7Q3SjZ/9h07QRt7q4m sUb0peHpraOWB+rGMy6pcq4XxDEVhPsiJiM/hiHCtobZS18PEPn7Dd7ybC5GkUVC 9H6hKSwvkubgsBzKZOmrO5trKCW4g3tSSnQ4olJbIA39neijtDsshi9UkZrbqm44 54cg6DsTY/CzIYVzGqf9TRL18uf7yzYtadGwmM3fn03i+BDD8p7iS0VNinfekvJA PIL0DWpQLfU49fFKgTphe968fXbGA9aDgP2Lu379qYf7UdcpQ8N2t5BszsS2Sdkv U3wwNbhMNS3WZrlLXj+zgR+zJCEIo4YwPG23XSujFyrd0W46DbYvCpMSzOsKDM8B ZV6yiX4/G1Byr8GSffmKrmezXTGJS5BguiBi1D95o2pFMLw0eFo= =F59G -----END PGP SIGNATURE----- Merge tag 'nfsd-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix ordering of attributes in NFSv4 GETATTR replies * tag 'nfsd-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: Fix creation time serialization order	2023-06-30 21:48:44 -07:00
Linus Torvalds	18c9901d74	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmScTY8ACgkQnJ2qBz9k QNnfwwgAhZtow1klDLH6qCnWMufB6AwT7VfAHyA3fvyTYMjUKb0sGHkpuh8hqVOb Lzb4YB+jSWV8XnMFn/4gFJQU/nAv8bMPavghMGpr5VNjQi7WkxYF/GB6O1I5NOHK EnJjDExgdxXDJZORaaXLVJWrtzJuDFgdiSeIwJECFa0MdTHNgPy3XOl+PPxnYQ/V xyHyP5ImGgd5O4iy3PFDQBGgOXIMrBX8IMce+qLQNYIvjSIUgmdnIkoUCvsQiisp LyKI2LxqAqnpA4h4Ow6hOZDw2VlPT0vDwFVUfFIZMIqs5YgaSbWa1Z6cs37MigAn fgUyRVx2y8A2Lwla7rwLaUEToRVADw== =ZdcG -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - Support for fanotify events returning file handles for filesystems not exportable via NFS - Improved error handling exportfs functions - Add missing FS_OPEN events when unusual open helpers are used * tag 'fsnotify_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: move fsnotify_open() hook into do_dentry_open() exportfs: check for error return value from exportfs_encode_*() fanotify: support reporting non-decodeable file handles exportfs: allow exporting non-decodeable file handles to userspace exportfs: add explicit flag to request non-decodeable file handles exportfs: change connectable argument to bit flags	2023-06-29 13:31:44 -07:00
Linus Torvalds	3a8a670eee	Networking changes for 6.5. Core ---- - Rework the sendpage & splice implementations. Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is. Remove the MSG_SENDPAGE_NOTLAST flag completely. - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid. - Enable socket busy polling with CONFIG_RT. - Improve reliability and efficiency of reporting for ref_tracker. - Auto-generate a user space C library for various Netlink families. Protocols --------- - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2]. - Use per-VMA locking for "page-flipping" TCP receive zerocopy. - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags. - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative. - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO). - Avoid waking up applications using TLS sockets until we have a full record. - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring. - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address. - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch. - PPPoE: make number of hash bits configurable. - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig). - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge). - Support matching on Connectivity Fault Management (CFM) packets. - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug. - HSR: don't enable promiscuous mode if device offloads the proto. - Support active scanning in IEEE 802.15.4. - Continue work on Multi-Link Operation for WiFi 7. BPF --- - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators. - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer should be, without writing anything. - Accept dynptr memory as memory arguments passed to helpers. - Add routing table ID to bpf_fib_lookup BPF helper. - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands. - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only). - Show target_{obj,btf}_id in tracing link fdinfo. - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter --------- - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value. - Increase ip_vs_conn_tab_bits range for 64BIT builds. - Allow updating size of a set. - Improve NAT tuple selection when connection is closing. Driver API ---------- - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out). - Support configuring rate selection pins of SFP modules. - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines. - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer. - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio). - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message. - Split devlink instance and devlink port operations. New hardware / drivers ---------------------- - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers ------- - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/ fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/ JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4= =9i4I -----END PGP SIGNATURE----- Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...	2023-06-28 16:43:10 -07:00
Tavian Barnes	d7dbed457c	nfsd: Fix creation time serialization order In nfsd4_encode_fattr(), TIME_CREATE was being written out after all other times. However, they should be written out in an order that matches the bit flags in bmval1, which in this case are #define FATTR4_WORD1_TIME_ACCESS (1UL << 15) #define FATTR4_WORD1_TIME_CREATE (1UL << 18) #define FATTR4_WORD1_TIME_DELTA (1UL << 19) #define FATTR4_WORD1_TIME_METADATA (1UL << 20) #define FATTR4_WORD1_TIME_MODIFY (1UL << 21) so TIME_CREATE should come second. I noticed this on a FreeBSD NFSv4.2 client, which supports creation times. On this client, file times were weirdly permuted. With this patch applied on the server, times looked normal on the client. Fixes: `e377a3e698` ("nfsd: Add support for the birth time attribute") Link: https://unix.stackexchange.com/q/749605/56202 Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-27 12:10:47 -04:00
David Howells	dc97391e66	sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:13 -07:00
Colin Ian King	75bfb70457	nfsd: remove redundant assignments to variable len There are a few assignments to variable len where the value is not being read and so the assignments are redundant and can be removed. In one case, the variable len can be removed completely. Cleans up 4 clang scan warnings of the form: fs/nfsd/export.c💯7: warning: Although the value stored to 'len' is used in the enclosing expression, the value is never actually read from 'len' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-21 15:05:32 -04:00
Chuck Lever	5e092be741	NFSD: Distinguish per-net namespace initialization I find the naming of nfsd_init_net() and nfsd_startup_net() to be confusingly similar. Rename the namespace initialization and tear- down ops and add comments to distinguish their separate purposes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-18 12:02:52 -04:00
Jeff Layton	ed9ab7346e	nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net Commit `f5f9d4a314` ("nfsd: move reply cache initialization into nfsd startup") moved the initialization of the reply cache into nfsd startup, but didn't account for the stats counters, which can be accessed before nfsd is ever started. The result can be a NULL pointer dereference when someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still shut down. This is a regression and a user-triggerable oops in the right situation: - non-x86_64 arch - /proc/fs/nfsd is mounted in the namespace - nfsd is not started in the namespace - unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats" Although this is easy to trigger on some arches (like aarch64), on x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the fixed_percpu_data. That struct looks just enough like a newly initialized percpu var to allow nfsd_reply_cache_stats_show to access it without Oopsing. Move the initialization of the per-net+per-cpu reply-cache counters back into nfsd_init_net, while leaving the rest of the reply cache allocations to be done at nfsd startup time. Kudos to Eirik who did most of the legwork to track this down. Cc: stable@vger.kernel.org # v6.3+ Fixes: `f5f9d4a314` ("nfsd: move reply cache initialization into nfsd startup") Reported-and-tested-by: Eirik Fuller <efuller@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-18 12:02:40 -04:00
Chuck Lever	262176798b	NFSD: Add an nfsd4_encode_nfstime4() helper Clean up: de-duplicate some common code. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-17 13:18:06 -04:00
Dai Ngo	58f5d89400	NFSD: add encoding of op_recall flag for write delegation Modified nfsd4_encode_open to encode the op_recall flag properly for OPEN result with write delegation granted. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org	2023-06-12 12:16:35 -04:00
Jeff Layton	518f375c15	nfsd: don't provide pre/post-op attrs if fh_getattr fails nfsd calls fh_getattr to get the latest inode attrs for pre/post-op info. In the event that fh_getattr fails, it resorts to scraping cached values out of the inode directly. Since these attributes are optional, we can just skip providing them altogether when this happens. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Neil Brown <neilb@suse.de>	2023-06-11 16:37:46 -04:00
Chuck Lever	df56b384de	NFSD: Remove nfsd_readv() nfsd_readv()'s consumers now use nfsd_iter_read(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-11 16:37:46 -04:00
Chuck Lever	703d752155	NFSD: Hoist rq_vec preparation into nfsd_read() [step two] Now that the preparation of an rq_vec has been removed from the generic read path, nfsd_splice_read() no longer needs to reset rq_next_page. nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I can ascertain, resetting rq_next_page for NFSv4 splice reads is unnecessary because rq_next_page is already set correctly. Moreover, resetting it might even be incorrect if previous operations in the COMPOUND have already consumed at least a page of the send buffer. I would expect that the result would be encoding the READ payload over previously-encoded results. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-11 16:37:46 -04:00
Chuck Lever	507df40ebf	NFSD: Hoist rq_vec preparation into nfsd_read() Accrue the following benefits: a) Deduplicate this common bit of code. b) Don't prepare rq_vec for NFSv2 and NFSv3 spliced reads, which don't use rq_vec. This is already the case for nfsd4_encode_read(). c) Eventually, converting NFSD's read path to use a bvec iterator will be simpler. In the next patch, nfsd_iter_read() will replace nfsd_readv() for all NFS versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-11 16:37:45 -04:00
Chuck Lever	ed4a567a17	NFSD: Update rq_next_page between COMPOUND operations A GETATTR with a large result can advance xdr->page_ptr without updating rq_next_page. If a splice READ follows that GETATTR in the COMPOUND, nfsd_splice_actor can start splicing at the wrong page. I've also seen READLINK and READDIR leave rq_next_page in an unmodified state. There are potentially a myriad of combinations like this, so play it safe: move the rq_next_page update to nfsd4_encode_operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-11 16:37:45 -04:00
Chuck Lever	ba21e20b30	NFSD: Use svcxdr_encode_opaque_pages() in nfsd4_encode_splice_read() Commit `15b23ef5d3` ("nfsd4: fix corruption of NFSv4 read data") encountered exactly the same issue: after a splice read, a filesystem-owned page is left in rq_pages[]; the symptoms are the same as described there. If the computed number of pages in nfsd4_encode_splice_read() is not exactly the same as the actual number of pages that were consumed by nfsd_splice_actor() (say, because of a bug) then hilarity ensues. Instead of recomputing the page offset based on the size of the payload, use rq_next_page, which is already properly updated by nfsd_splice_actor(), to cause svc_rqst_release_pages() to operate correctly in every instance. This is a defensive change since we believe that after commit `27c934dd88` ("nfsd: don't replace page in rq_pages if it's a continuation of last page") has been applied, there are no known opportunities for nfsd_splice_actor() to screw up. So I'm not marking it for stable backport. Reported-by: Andy Zlotek <andy.zlotek@oracle.com> Suggested-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-11 16:37:40 -04:00
Chuck Lever	82078b9895	NFSD: Ensure that xdr_write_pages updates rq_next_page All other NFSv[23] procedures manage to keep page_ptr and rq_next_page in lock step. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-05 09:01:44 -04:00
Chuck Lever	66a21db7db	NFSD: Replace encode_cinfo() De-duplicate "reserve_space; encode_cinfo". Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-05 09:01:44 -04:00
Chuck Lever	adaa7a50d0	NFSD: Add encoders for NFSv4 clientids and verifiers Deduplicate some common code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-06-05 09:01:44 -04:00

1 2 3 4 5 ...

3735 commits