system/qemu - HydraGit

mirror of https://gitlab.com/qemu-project/qemu synced 2024-11-02 22:41:07 +00:00

Author	SHA1	Message	Date
Paolo Bonzini	3f99cf5710	tools/virtiofsd: convert to Meson Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:09 -04:00
Dr. David Alan Gilbert	3005c099ef	virtiofsd: Allow addition or removal of capabilities Allow capabilities to be added or removed from the allowed set for the daemon; e.g. default: CapPrm: 00000000880000df CapEff: 00000000880000df -o modcaps=+sys_admin CapPrm: 00000000882000df CapEff: 00000000882000df -o modcaps=+sys_admin:-chown CapPrm: 00000000882000de CapEff: 00000000882000de Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200629115420.98443-4-dgilbert@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-07-03 16:23:05 +01:00
Dr. David Alan Gilbert	55b22a60cc	virtiofsd: Check capability calls Check the capability calls worked. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200629115420.98443-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-07-03 16:23:05 +01:00
Dr. David Alan Gilbert	b1288dfafb	virtiofsd: Terminate capability list capng_updatev is a varargs function that needs a -1 to terminate it, but it was missing. In practice what seems to have been happening is that it's added the capabilities we asked for, then runs into junk on the stack, so if we're unlucky it might be adding some more, but in reality it's failing - but after adding the capabilities we asked for. Fixes: `a59feb483b` ("virtiofsd: only retain file system capabilities") Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200629115420.98443-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-07-03 16:23:05 +01:00
Max Reitz	63659fe74e	virtiofsd: Whitelist fchmod lo_setattr() invokes fchmod() in a rarely used code path, so it should be whitelisted or virtiofsd will crash with EBADSYS. Said code path can be triggered for example as follows: On the host, in the shared directory, create a file with the sticky bit set and a security.capability xattr: (1) # touch foo (2) # chmod u+s foo (3) # setcap '' foo Then in the guest let some process truncate that file after it has dropped all of its capabilities (at least CAP_FSETID): int main(int argc, char *argv[]) { capng_setpid(getpid()); capng_clear(CAPNG_SELECT_BOTH); capng_updatev(CAPNG_ADD, CAPNG_PERMITTED \| CAPNG_EFFECTIVE, 0); capng_apply(CAPNG_SELECT_BOTH); ftruncate(open(argv[1], O_RDWR), 0); } This will cause the guest kernel to drop the sticky bit (i.e. perform a mode change) as part of the truncate (where FATTR_FH is set), and that will cause virtiofsd to invoke fchmod() instead of fchmodat(). (A similar configuration exists further below with futimens() vs. utimensat(), but the former is not a syscall but just a wrapper for the latter, so no further whitelisting is required.) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1842667 Reported-by: Qian Cai <caiqian@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608093111.14942-1-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-06-17 17:48:38 +01:00
Miklos Szeredi	93bb3d8d4c	virtiofsd: remove symlink fallbacks Path lookup in the kernel has special rules for looking up magic symlinks under /proc. If a filesystem operation is instructed to follow symlinks (e.g. via AT_SYMLINK_FOLLOW or lack of AT_SYMLINK_NOFOLLOW), and the final component is such a proc symlink, then the target of the magic symlink is used for the operation, even if the target itself is a symlink. I.e. path lookup is always terminated after following a final magic symlink. I was erronously assuming that in the above case the target symlink would also be followed, and so workarounds were added for a couple of operations to handle the symlink case. Since the symlink can be handled simply by following the proc symlink, these workardouds are not needed. Also remove the "norace" option, which disabled the workarounds. Commit `bdfd667883` ("virtiofsd: Fix xattr operations") already dealt with the same issue for xattr operations. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Message-Id: <20200514140736.20561-1-mszeredi@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-06-01 18:44:27 +01:00
Stefan Hajnoczi	66502bbca3	virtiofsd: drop all capabilities in the wait parent process All this process does is wait for its child. No capabilities are needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-01 20:05:37 +01:00
Stefan Hajnoczi	a59feb483b	virtiofsd: only retain file system capabilities virtiofsd runs as root but only needs a subset of root's Linux capabilities(7). As a file server its purpose is to create and access files on behalf of a client. It needs to be able to access files with arbitrary uid/gid owners. It also needs to be create device nodes. Introduce a Linux capabilities(7) whitelist and drop all capabilities that we don't need, making the virtiofsd process less powerful than a regular uid root process. # cat /proc/PID/status ... Before After CapInh: 0000000000000000 0000000000000000 CapPrm: 0000003fffffffff 00000000880000df CapEff: 0000003fffffffff 00000000880000df CapBnd: 0000003fffffffff 0000000000000000 CapAmb: 0000000000000000 0000000000000000 Note that file capabilities cannot be used to achieve the same effect on the virtiofsd executable because mount is used during sandbox setup. Therefore we drop capabilities programmatically at the right point during startup. This patch only affects the sandboxed child process. The parent process that sits in waitpid(2) still has full root capabilities and will be addressed in the next patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200416164907.244868-2-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-01 18:57:31 +01:00
Max Reitz	ace0829c0d	virtiofsd: Show submounts Currently, setup_mounts() bind-mounts the shared directory without MS_REC. This makes all submounts disappear. Pass MS_REC so that the guest can see submounts again. Fixes: `5baa3b8e95` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424133516.73077-1-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Changed Fixes to point to the commit with the problem rather than the commit that turned it on	2020-05-01 18:52:17 +01:00
Miklos Szeredi	397ae982f4	virtiofsd: jail lo->proc_self_fd While it's not possible to escape the proc filesystem through lo->proc_self_fd, it is possible to escape to the root of the proc filesystem itself through "../..". Use a temporary mount for opening lo->proc_self_fd, that has it's root at /proc/self/fd/, preventing access to the ancestor directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Message-Id: <20200429124733.22488-1-mszeredi@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-01 18:46:54 +01:00
Stefan Hajnoczi	8c1d353d10	virtiofsd: stay below fs.file-max sysctl value (CVE-2020-10717) The system-wide fs.file-max sysctl value determines how many files can be open. It defaults to a value calculated based on the machine's RAM size. Previously virtiofsd would try to set RLIMIT_NOFILE to 1,000,000 and this allowed the FUSE client to exhaust the number of open files system-wide on Linux hosts with less than 10 GB of RAM! Take fs.file-max into account when choosing the default RLIMIT_NOFILE value. Fixes: CVE-2020-10717 Reported-by: Yuval Avrahami <yavrahami@paloaltonetworks.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200501140644.220940-3-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-01 18:41:56 +01:00
Stefan Hajnoczi	6dbb716877	virtiofsd: add --rlimit-nofile=NUM option Make it possible to specify the RLIMIT_NOFILE on the command-line. Users running multiple virtiofsd processes should allocate a certain number to each process so that the system-wide limit can never be exhausted. When this option is set to 0 the rlimit is left at its current value. This is useful when a management tool wants to configure the rlimit itself. The default behavior remains unchanged: try to set the limit to 1,000,000 file descriptors if the current rlimit is lower. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200501140644.220940-2-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-01 18:41:55 +01:00
Philippe Mathieu-Daudé	e1cd92d95c	tools/virtiofsd/passthrough_ll: Fix double close() On success, the fdopendir() call closes fd. Later on the error path we try to close an already-closed fd. This can lead to use-after-free. Fix by only closing the fd if the fdopendir() call failed. Cc: qemu-stable@nongnu.org Fixes: `b39bce121b` (add dirp_map to hide lo_dirp pointers) Reported-by: Coverity (CID 1421933 USE_AFTER_FREE) Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200321120654.7985-1-philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-25 12:31:38 +00:00
Misono Tomohiro	bdfd667883	virtiofsd: Fix xattr operations Current virtiofsd has problems about xattr operations and they does not work properly for directory/symlink/special file. The fundamental cause is that virtiofsd uses openat() + f...xattr() systemcalls for xattr operation but we should not open symlink/special file in the daemon. Therefore the function is restricted. Fix this problem by: 1. during setup of each thread, call unshare(CLONE_FS) 2. in xattr operations (i.e. lo_getxattr), if inode is not a regular file or directory, use fchdir(proc_loot_fd) + ...xattr() + fchdir(root.fd) instead of openat() + f...xattr() (Note: for a regular file/directory openat() + f...xattr() is still used for performance reason) With this patch, xfstests generic/062 passes on virtiofs. This fix is suggested by Miklos Szeredi and Stefan Hajnoczi. The original discussion can be found here: https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-03 15:13:24 +00:00
Misono Tomohiro	16e15a7308	virtiofsd: passthrough_ll: cleanup getxattr/listxattr This is a cleanup patch to simplify the following xattr fix and there is no functional changes. - Move memory allocation to head of the function - Unify fgetxattr/flistxattr call for both size == 0 and size != 0 case - Remove redundant lo_inode_put call in error path (Note: second call is ignored now since @inode is already NULL) Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20200227055927.24566-2-misono.tomohiro@jp.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-03 15:13:24 +00:00
Xiao Yang	285eb7a704	virtiofsd: Remove fuse.h and struct fuse_module All code in fuse.h and struct fuse_module are not used by virtiofsd so removing them is safe. Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-21 12:53:17 +00:00
Philippe Mathieu-Daudé	09c086b2a1	tools/virtiofsd/fuse_lowlevel: Fix fuse_out_header::error value Fix warning reported by Clang static code analyzer: CC tools/virtiofsd/fuse_lowlevel.o tools/virtiofsd/fuse_lowlevel.c:195:9: warning: Value stored to 'error' is never read error = -ERANGE; ^ ~~~~~~~ Fixes: `3db2876` Reported-by: Clang Static Analyzer Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-21 12:53:17 +00:00
Philippe Mathieu-Daudé	4e1fb9e7bc	tools/virtiofsd/passthrough_ll: Remove unneeded variable assignment Fix warning reported by Clang static code analyzer: CC tools/virtiofsd/passthrough_ll.o tools/virtiofsd/passthrough_ll.c:925:9: warning: Value stored to 'newfd' is never read newfd = -1; ^ ~~ tools/virtiofsd/passthrough_ll.c:942:9: warning: Value stored to 'newfd' is never read newfd = -1; ^ ~~ Fixes: `7c6b66027` Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-21 12:53:17 +00:00
Philippe Mathieu-Daudé	d4db6f545d	tools/virtiofsd/passthrough_ll: Remove unneeded variable assignment Fix warning reported by Clang static code analyzer: CC tools/virtiofsd/passthrough_ll.o tools/virtiofsd/passthrough_ll.c:1083:5: warning: Value stored to 'saverr' is never read saverr = ENOMEM; ^ ~~~~~~ Fixes: `7c6b66027` Reported-by: Clang Static Analyzer Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-21 12:53:17 +00:00
Dr. David Alan Gilbert	82c1474e68	virtiofsd: Help message fix for 'seconds' second should be seconds. Reported-by: Christophe de Dinechin <dinechin@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-21 12:53:17 +00:00
Dr. David Alan Gilbert	99ce9a7e60	virtiofsd: do_read missing NULL check Missing a NULL check if the argument fetch fails. Fixes: Coverity CID 1413119 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-10 17:24:43 +00:00
Dr. David Alan Gilbert	686391112f	virtiofsd: load_capng missing unlock Missing unlock in error path. Fixes: Covertiy CID 1413123 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-10 17:24:43 +00:00
Dr. David Alan Gilbert	6fa249027f	virtiofsd: fv_create_listen_socket error path socket leak If we fail when bringing up the socket we can leak the listen_fd; in practice the daemon will exit so it's not really a problem. Fixes: Coverity CID 1413121 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-10 17:24:43 +00:00
Dr. David Alan Gilbert	988717b46b	virtiofsd: Remove fuse_req_getgroups Remove fuse_req_getgroups that's unused in virtiofsd; it came in from libfuse but we don't actually use it. It was called from fuse_getgroups which we previously removed (but had left it's header in). Coverity had complained about null termination in it, but removing it is the easiest answer. Fixes: Coverity CID: 1413117 (String not null terminated) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-10 17:24:43 +00:00
Masayoshi Mizuma	1d59b1b210	virtiofsd: add some options to the help message Add following options to the help message: - cache - flock\|no_flock - norace - posix_lock\|no_posix_lock - readdirplus\|no_readdirplus - timeout - writeback\|no_writeback - xattr\|no_xattr Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> dgilbert: Split cache, norace, posix_lock, readdirplus off into our own earlier patches that added the options Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Eryu Guan	9883df8cca	virtiofsd: stop all queue threads on exit in virtio_loop() On guest graceful shutdown, virtiofsd receives VHOST_USER_GET_VRING_BASE request from VMM and shuts down virtqueues by calling fv_set_started(), which joins fv_queue_thread() threads. So when virtio_loop() returns, there should be no thread is still accessing data in fuse session and/or virtio dev. But on abnormal exit, e.g. guest got killed for whatever reason, vhost-user socket is closed and virtio_loop() breaks out the main loop and returns to main(). But it's possible fv_queue_worker()s are still working and accessing fuse session and virtio dev, which results in crash or use-after-free. Fix it by stopping fv_queue_thread()s before virtio_loop() returns, to make sure there's no-one could access fuse session and virtio dev. Reported-by: Qingming Su <qingming.su@linux.alibaba.com> Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Xiao Yang	a931b6861e	virtiofsd/passthrough_ll: Pass errno to fuse_reply_err() lo_copy_file_range() passes -errno to fuse_reply_err() and then fuse_reply_err() changes it to errno again, so that subsequent fuse_send_reply_iov_nofree() catches the wrong errno.(i.e. reports "fuse: bad error value: ..."). Make fuse_send_reply_iov_nofree() accept the correct -errno by passing errno directly in lo_copy_file_range(). Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Eryu Guan <eguan@linux.alibaba.com> dgilbert: Sent upstream and now Merged as aa1185e153f774f1df65 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert	fe4c15798a	virtiofsd: Convert lo_destroy to take the lo->mutex lock itself lo_destroy was relying on some implicit knowledge of the locking; we can avoid this if we create an unref_inode that doesn't take the lock and then grab it for the whole of the lo_destroy. Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	951b3120db	virtiofsd: add --thread-pool-size=NUM option Add an option to control the size of the thread pool. Requests are now processed in parallel by default. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	28f7a3b026	virtiofsd: fix lo_destroy() resource leaks Now that lo_destroy() is serialized we can call unref_inode() so that all inode resources are freed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	cdc497c692	virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races When running with multiple threads it can be tricky to handle FUSE_INIT/FUSE_DESTROY in parallel with other request types or in parallel with themselves. Serialize FUSE_INIT and FUSE_DESTROY so that malicious clients cannot trigger race conditions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	a3d756c5ae	virtiofsd: process requests in a thread pool Introduce a thread pool so that fv_queue_thread() just pops VuVirtqElements and hands them to the thread pool. For the time being only one worker thread is allowed since passthrough_ll.c is not thread-safe yet. Future patches will lift this restriction so that multiple FUSE requests can be processed in parallel. The main new concept is struct FVRequest, which contains both VuVirtqElement and struct fuse_chan. We now have fv_VuDev for a device, fv_QueueInfo for a virtqueue, and FVRequest for a request. Some of fv_QueueInfo's fields are moved into FVRequest because they are per-request. The name FVRequest conforms to QEMU coding style and I expect the struct fv_* types will be renamed in a future refactoring. This patch series is not optimal. fbuf reuse is dropped so each request does malloc(se->bufsize), but there is no clean and cheap way to keep this with a thread pool. The vq_lock mutex is held for longer than necessary, especially during the eventfd_write() syscall. Performance can be improved in the future. prctl(2) had to be added to the seccomp whitelist because glib invokes it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
piaojun	c465bba2c9	virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance fuse_buf_writev() only handles the normal write in which src is buffer and dest is fd. Specially if src buffer represents guest physical address that can't be mapped by the daemon process, IO must be bounced back to the VMM to do it by fuse_buf_copy(). Signed-off-by: Jun Piao <piaojun@huawei.com> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
piaojun	9ceaaa15cf	virtiofsd: add definition of fuse_buf_writev() Define fuse_buf_writev() which use pwritev and writev to improve io bandwidth. Especially, the src bufs with 0 size should be skipped as their mems are not block_size aligned which will cause writev failed in direct io mode. Signed-off-by: Jun Piao <piaojun@huawei.com> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Misono Tomohiro	9b610b09b4	virtiofsd: passthrough_ll: Use cache_readdir for directory open Since keep_cache(FOPEN_KEEP_CACHE) has no effect for directory as described in fuse_common.h, use cache_readdir(FOPNE_CACHE_DIR) for diretory open when cache=always mode. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Misono Tomohiro	8e4e41e39e	virtiofsd: Fix data corruption with O_APPEND write in writeback mode When writeback mode is enabled (-o writeback), O_APPEND handling is done in kernel. Therefore virtiofsd clears O_APPEND flag when open. Otherwise O_APPEND flag takes precedence over pwrite() and write data may corrupt. Currently clearing O_APPEND flag is done in lo_open(), but we also need the same operation in lo_create(). So, factor out the flag update operation in lo_open() to update_open_flags() and call it in both lo_open() and lo_create(). This fixes the failure of xfstest generic/069 in writeback mode (which tests O_APPEND write data integrity). Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Vivek Goyal	65da453980	virtiofsd: Reset O_DIRECT flag during file open If an application wants to do direct IO and opens a file with O_DIRECT in guest, that does not necessarily mean that we need to bypass page cache on host as well. So reset this flag on host. If somebody needs to bypass page cache on host as well (and it is safe to do so), we can add a knob in daemon later to control this behavior. I check virtio-9p and they do reset O_DIRECT flag. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Eryu Guan	fc1aed0bf9	virtiofsd: convert more fprintf and perror to use fuse log infra Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Peng Tao	e468d4af5f	virtiofsd: do not always set FUSE_FLOCK_LOCKS Right now we always enable it regardless of given commandlines. Fix it by setting the flag relying on the lo->flock bit. Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	c241aa9457	virtiofsd: introduce inode refcount to prevent use-after-free If thread A is using an inode it must not be deleted by thread B when processing a FUSE_FORGET request. The FUSE protocol itself already has a counter called nlookup that is used in FUSE_FORGET messages. We cannot trust this counter since the untrusted client can manipulate it via FUSE_FORGET messages. Introduce a new refcount to keep inodes alive for the required lifespan. lo_inode_put() must be called to release a reference. FUSE's nlookup counter holds exactly one reference so that the inode stays alive as long as the client still wants to remember it. Note that the lo_inode->is_symlink field is moved to avoid creating a hole in the struct due to struct field alignment. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Miklos Szeredi	9257e514d8	virtiofsd: passthrough_ll: fix refcounting on remove/rename Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	1222f01555	virtiofsd: rename inode->refcount to inode->nlookup This reference counter plays a specific role in the FUSE protocol. It's not a generic object reference counter and the FUSE kernel code calls it "nlookup". Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	acefdde73b	virtiofsd: prevent races with lo_dirp_put() Introduce lo_dirp_put() so that FUSE_RELEASEDIR does not cause use-after-free races with other threads that are accessing lo_dirp. Also make lo_releasedir() atomic to prevent FUSE_RELEASEDIR racing with itself. This prevents double-frees. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	baed65c060	virtiofsd: make lo_release() atomic Hold the lock across both lo_map_get() and lo_map_remove() to prevent races between two FUSE_RELEASE requests. In this case I don't see a serious bug but it's safer to do things atomically. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	e7b337326d	virtiofsd: prevent fv_queue_thread() vs virtio_loop() races We call into libvhost-user from the virtqueue handler thread and the vhost-user message processing thread without a lock. There is nothing protecting the virtqueue handler thread if the vhost-user message processing thread changes the virtqueue or memory table while it is running. This patch introduces a read-write lock. Virtqueue handler threads are readers. The vhost-user message processing thread is a writer. This will allow concurrency for multiqueue in the future while protecting against fv_queue_thread() vs virtio_loop() races. Note that the critical sections could be made smaller but it would be more invasive and require libvhost-user changes. Let's start simple and improve performance later, if necessary. Another option would be an RCU-style approach with lighter-weight primitives. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Stefan Hajnoczi	620e9d8d9c	virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy() vu_socket_path is NULL when --fd=FDNUM was used. Use fuse_lowlevel_is_virtio() instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Vivek Goyal	0e81414c54	virtiofsd: Support remote posix locks Doing posix locks with-in guest kernel are not sufficient if a file/dir is being shared by multiple guests. So we need the notion of daemon doing the locks which are visible to rest of the guests. Given posix locks are per process, one can not call posix lock API on host, otherwise bunch of basic posix locks properties are broken. For example, If two processes (A and B) in guest open the file and take locks on different sections of file, if one of the processes closes the fd, it will close fd on virtiofsd and all posix locks on file will go away. This means if process A closes the fd, then locks of process B will go away too. Similar other problems exist too. This patch set tries to emulate posix locks while using open file description locks provided on Linux. Daemon provides two options (-o posix_lock, -o no_posix_lock) to enable or disable posix locking in daemon. By default it is enabled. There are few issues though. - GETLK() returns pid of process holding lock. As we are emulating locks using OFD, and these locks are not per process and don't return pid of process, so GETLK() in guest does not reuturn process pid. - As of now only F_SETLK is supported and not F_SETLKW. We can't block the thread in virtiofsd for arbitrary long duration as there is only one thread serving the queue. That means unlock request will not make it to daemon and F_SETLKW will block infinitely and bring virtio-fs to a halt. This is a solvable problem though and will require significant changes in virtiofsd and kernel. Left as a TODO item for now. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Liu Bo	740b0b700a	Virtiofsd: fix memory leak on fuse queueinfo For fuse's queueinfo, both queueinfo array and queueinfos are allocated in fv_queue_set_started() but not cleaned up when the daemon process quits. This fixes the leak in proper places. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Eric Ren <renzhen@linux.alibaba.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Eric Ren	fc3f0041b4	virtiofsd: fix incorrect error handling in lo_do_lookup Signed-off-by: Eric Ren <renzhen@linux.alibaba.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00
Liu Bo	b7ed733a38	virtiofsd: enable PARALLEL_DIROPS during INIT lookup is a RO operations, PARALLEL_DIROPS can be enabled. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-23 16:41:37 +00:00

1 2 3

127 commits