linux

mirror of https://github.com/torvalds/linux synced 2024-10-17 00:39:37 +00:00

Author	SHA1	Message	Date
Jaegeuk Kim	9ac00e7cef	f2fs: do not issue small discard commands during checkpoint If there're huge # of small discards, this will increase checkpoint latency insanely. Let's issue small discards only by trim. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:21:41 -07:00
Daeho Jeong	c9667b19e2	f2fs: check zone write pointer points to the end of zone We don't need to report an issue, when the zone write pointer already points to the end of the zone, since the zone mismatch is already taken care. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:10 -07:00
Sheng Yong	ac1ee161de	f2fs: add f2fs_ioc_get_compress_blocks This patch adds f2fs_ioc_get_compress_blocks() to provide a common f2fs_get_compress_blocks(). Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:10 -07:00
Sheng Yong	dde38c03b3	f2fs: cleanup MIN_INLINE_XATTR_SIZE Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:10 -07:00
Sheng Yong	c571fbb5b5	f2fs: add helper to check compression level This patch adds a helper function to check if compression level is valid. Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:09 -07:00
Christoph Hellwig	94c8431fb4	f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method Since commit `a2ad63daa8` ("VFS: add FMODE_CAN_ODIRECT file flag") file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Do that for f2fs so that noop_direct_IO can eventually be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:09 -07:00
Chao Yu	f240d3aaf5	f2fs: do more sanity check on inode There are several issues in sanity_check_inode(): - The code looks not clean, it checks extra_attr related condition dispersively. - It missed to check i_extra_isize w/ lower boundary - It missed to check feature dependency: prjquota, inode_chksum, inode_crtime, compression features rely on extra_attr feature. - It's not necessary to check i_extra_isize due to it will only be assigned to non-zero value if f2fs_has_extra_attr() is true in do_read_inode(). Fix them all in this patch. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:09 -07:00
Chao Yu	64ee9163fe	f2fs: compress: fix to check validity of i_compress_flag field The last valid compress related field is i_compress_flag, check its validity instead of i_log_cluster_size. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:09 -07:00
Yangtao Li	698a5c8c8e	f2fs: add sanity compress level check for compressed file Commit `3fde13f817` ("f2fs: compress: support compress level") forgot to do basic compress level check, let's add it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:09 -07:00
Jaegeuk Kim	00e120b5e4	f2fs: assign default compression level Let's avoid any confusion from assigning compress_level=0 for LZ4HC and ZSTD. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:09 -07:00
Chao Yu	ccf3ff2b30	f2fs: introduce F2FS_QUOTA_DEFAULT_FL for cleanup This patch adds F2FS_QUOTA_DEFAULT_FL to include two default flags: F2FS_NOATIME_FL and F2FS_IMMUTABLE_FL, and use it to clean up codes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:08 -07:00
Chao Yu	8bec7dd1b3	f2fs: check return value of freeze_super() freeze_super() can fail, it needs to check its return value and do error handling in f2fs_resize_fs(). Fixes: `04f0b2eaa3` ("f2fs: ioctl for removing a range from F2FS") Fixes: `b4b10061ef` ("f2fs: refactor resize_fs to avoid meta updates in progress") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-26 06:07:08 -07:00
Chao Yu	5079e1c0c8	f2fs: avoid dead loop in f2fs_issue_checkpoint() generic/082 reports a bug as below: __schedule+0x332/0xf60 schedule+0x6f/0xf0 schedule_timeout+0x23b/0x2a0 wait_for_completion+0x8f/0x140 f2fs_issue_checkpoint+0xfe/0x1b0 f2fs_sync_fs+0x9d/0xb0 sync_filesystem+0x87/0xb0 dquot_load_quota_sb+0x41b/0x460 dquot_load_quota_inode+0xa5/0x130 dquot_quota_on+0x4b/0x60 f2fs_quota_on+0xe3/0x1b0 do_quotactl+0x483/0x700 __x64_sys_quotactl+0x15c/0x310 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The root casue is race case as below: Thread A Kworker IRQ - write() : write data to quota.user file - writepages - f2fs_submit_page_write - __is_cp_guaranteed return false - inc_page_count(F2FS_WB_DATA) - submit_bio - quotactl(Q_QUOTAON) - f2fs_quota_on - dquot_quota_on - dquot_load_quota_inode - vfs_setup_quota_inode : inode->i_flags \|= S_NOQUOTA - f2fs_write_end_io - __is_cp_guaranteed return true - dec_page_count(F2FS_WB_CP_DATA) - dquot_load_quota_sb - f2fs_sync_fs - f2fs_issue_checkpoint - do_checkpoint - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA) : loop due to F2FS_WB_CP_DATA count is negative Calling filemap_fdatawrite() and filemap_fdatawait() to keep all data clean before quota file setup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:37:36 -07:00
Wu Bo	cadfc2f9f8	f2fs: fix args passed to trace_f2fs_lookup_end The NULL return of 'd_splice_alias' dosen't mean error. Thus the successful case will also return NULL, which makes the tracepoint always print 'err=-ENOENT'. And the different cases of 'new' & 'err' are list as following: 1) dentry exists: err(0) with new(NULL) --> dentry, err=0 2) dentry exists: err(0) with new(VALID) --> new, err=0 3) dentry exists: err(0) with new(ERR) --> dentry, err=ERR 4) no dentry exists: err(-ENOENT) with new(NULL) --> dentry, err=-ENOENT 5) no dentry exists: err(-ENOENT) with new(VALID) --> new, err=-ENOENT 6) no dentry exists: err(-ENOENT) with new(ERR) --> dentry, err=ERR Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:09 -07:00
Yangtao Li	38b57833de	f2fs: flag as supporting buffered async reads The f2fs uses generic_file_buffered_read(), which supports buffered async reads since commit `1a0a7853b9` ("mm: support async buffered reads in generic_file_buffered_read()"). Let's enable it to match other file-systems. The read performance has been greatly improved under io_uring: 167M/s -> 234M/s, Increase ratio by 40% Test w/: ./fio --name=onessd --filename=/data/test/local/io_uring_test --size=256M --rw=randread --bs=4k --direct=0 --overwrite=0 --numjobs=1 --iodepth=1 --time_based=0 --runtime=10 --ioengine=io_uring --registerfiles --fixedbufs --gtod_reduce=1 --group_reporting --sqthread_poll=1 Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:09 -07:00
Chao Yu	20872584b8	f2fs: fix to drop all dirty meta/node pages during umount() For cp error case, there will be dirty meta/node pages remained after f2fs_write_checkpoint() in f2fs_put_super(), drop them explicitly, and do sanity check on reference count of dirty pages and inflight IOs. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:09 -07:00
Chunhai Guo	38a4a330c8	f2fs: Detect looped node chain efficiently find_fsync_dnodes() detect the looped node chain by comparing the loop counter with free blocks. While it may take tens of seconds to quit when the free blocks are large enough. We can use Floyd's cycle detection algorithm to make the detection more efficient. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:09 -07:00
Daejun Park	25f9080576	f2fs: add async reset zone command support This patch enables submit reset zone command asynchornously. It helps decrease average latency of write IOs in high utilization scenario by faster checkpointing. Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:09 -07:00
Chao Yu	901c12d144	f2fs: flush error flags in workqueue In IRQ context, it wakes up workqueue to record errors into on-disk superblock fields rather than in-memory fields. Fixes: `1aa161e431` ("f2fs: fix scheduling while atomic in decompression path") Fixes: `95fa90c9e5` ("f2fs: support recording errors into superblock") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:08 -07:00
Chao Yu	458c15dfbc	f2fs: don't reset unchangable mount option in f2fs_remount() syzbot reports a bug as below: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942 Call Trace: lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691 __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline] _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300 __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100 f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116 f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664 f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838 vfs_fallocate+0x54b/0x6b0 fs/open.c:324 ksys_fallocate fs/open.c:347 [inline] __do_sys_fallocate fs/open.c:355 [inline] __se_sys_fallocate fs/open.c:353 [inline] __x64_sys_fallocate+0xbd/0x100 fs/open.c:353 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is race condition as below: - since it tries to remount rw filesystem, so that do_remount won't call sb_prepare_remount_readonly to block fallocate, there may be race condition in between remount and fallocate. - in f2fs_remount(), default_options() will reset mount option to default one, and then update it based on result of parse_options(), so there is a hole which race condition can happen. Thread A Thread B - f2fs_fill_super - parse_options - clear_opt(READ_EXTENT_CACHE) - f2fs_remount - default_options - set_opt(READ_EXTENT_CACHE) - f2fs_fallocate - f2fs_insert_range - f2fs_drop_extent_tree - __drop_extent_tree - __may_extent_tree - test_opt(READ_EXTENT_CACHE) return true - write_lock(&et->lock) access NULL pointer - parse_options - clear_opt(READ_EXTENT_CACHE) Cc: <stable@vger.kernel.org> Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:08 -07:00
Chao Yu	d8189834d4	f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io() butt3rflyh4ck reports a bug as below: When a thread always calls F2FS_IOC_RESIZE_FS to resize fs, if resize fs is failed, f2fs kernel thread would invoke callback function to update f2fs io info, it would call f2fs_write_end_io and may trigger null-ptr-deref in NODE_MAPPING. general protection fault, probably for non-canonical address KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] RIP: 0010:NODE_MAPPING fs/f2fs/f2fs.h:1972 [inline] RIP: 0010:f2fs_write_end_io+0x727/0x1050 fs/f2fs/data.c:370 <TASK> bio_endio+0x5af/0x6c0 block/bio.c:1608 req_bio_endio block/blk-mq.c:761 [inline] blk_update_request+0x5cc/0x1690 block/blk-mq.c:906 blk_mq_end_request+0x59/0x4c0 block/blk-mq.c:1023 lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1101 __do_softirq+0x1d4/0x8ef kernel/softirq.c:571 run_ksoftirqd kernel/softirq.c:939 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:931 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 The root cause is below race case can cause leaving dirty metadata in f2fs after filesystem is remount as ro: Thread A Thread B - f2fs_ioc_resize_fs - f2fs_readonly --- return false - f2fs_resize_fs - f2fs_remount - write_checkpoint - set f2fs as ro - free_segment_range - update meta_inode's data Then, if f2fs_put_super() fails to write_checkpoint due to readonly status, and meta_inode's dirty data will be writebacked after node_inode is put, finally, f2fs_write_end_io will access NULL pointer on sbi->node_inode. Thread A IRQ context - f2fs_put_super - write_checkpoint fails - iput(node_inode) - node_inode = NULL - iput(meta_inode) - write_inode_now - f2fs_write_meta_page - f2fs_write_end_io - NODE_MAPPING(sbi) : access NULL pointer on node_inode Fixes: `b4b10061ef` ("f2fs: refactor resize_fs to avoid meta updates in progress") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Closes: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:08 -07:00
Chao Yu	bfd4766239	f2fs: clean up w/ sbi->log_sectors_per_block Use sbi->log_sectors_per_block to clean up below calculated one: unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:08 -07:00
Chao Yu	90b7c4b748	f2fs: fix to set noatime and immutable flag for quota file We should set noatime bit for quota files, since no one cares about atime of quota file, and we should set immutalbe bit as well, due to nobody should write to the file through exported interfaces. Meanwhile this patch use inode_lock to avoid race condition during inode->i_flags, f2fs_inode->i_flags update. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:08 -07:00
Chao Yu	77e820ea73	f2fs: renew value of F2FS_FEATURE_* Define F2FS_FEATURE_* macro w/ 32-bits value rather than 16-bits value. No logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:07 -07:00
Chao Yu	478d7100f4	f2fs: renew value of F2FS_MOUNT_* Then we can just define newly introduced mount option w/ lasted free number rather than random free one. Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:07 -07:00
Chao Yu	f082c6b205	f2fs: fix potential deadlock due to unpaired node_write lock use If S_NOQUOTA is cleared from inode during data page writeback of quota file, it may miss to unlock node_write lock, result in potential deadlock, fix to use the lock in paired. Kworker Thread - writepage if (IS_NOQUOTA()) f2fs_down_read(&sbi->node_write); - vfs_cleanup_quota_inode - inode->i_flags &= ~S_NOQUOTA; if (IS_NOQUOTA()) f2fs_up_read(&sbi->node_write); Fixes: `79963d967b` ("f2fs: shrink node_write lock coverage") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:07 -07:00
Yonggil Song	36ded4c106	f2fs: Fix over-estimating free section during FG GC There was a bug that finishing FG GC unconditionally because free sections are over-estimated after checkpoint in FG GC. This patch initializes sec_freed by every checkpoint in FG GC. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:07 -07:00
Daeho Jeong	04abeb699d	f2fs: close unused open zones while mounting Zoned UFS allows only 6 open zones at the same time, so we need to take care of the count of open zones while mounting. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-06-12 13:04:07 -07:00
Christoph Hellwig	05bdb99653	block: replace fmode_t with a block-specific type for block open flags The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-06-12 08:04:05 -06:00
Christoph Hellwig	81b1fb7d17	fs: remove sb->s_mode There is no real need to store the open mode in the super_block now. It is only used by f2fs, which can easily recalculate it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-06-12 08:04:04 -06:00
Christoph Hellwig	2736e8eeb0	block: use the holder as indication for exclusive opens The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-06-12 08:04:04 -06:00
Christoph Hellwig	182c25e9c1	filemap: update ki_pos in generic_perform_write All callers of generic_perform_write need to updated ki_pos, move it into common code. Link: https://lkml.kernel.org/r/20230601145904.1385409-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-09 16:25:52 -07:00
Christoph Hellwig	0d625446d0	backing_dev: remove current->backing_dev_info Patch series "cleanup the filemap / direct I/O interaction", v4. This series cleans up some of the generic write helper calling conventions and the page cache writeback / invalidation for direct I/O. This is a spinoff from the no-bufferhead kernel project, for which we'll want to an use iomap based buffered write path in the block layer. This patch (of 12): The last user of current->backing_dev_info disappeared in commit `b9b1335e64` ("remove bdi_congested() and wb_congested() and related functions"). Remove the field and all assignments to it. Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-09 16:25:51 -07:00
Christoph Hellwig	0718afd47f	block: introduce holder ops Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-06-05 10:53:04 -06:00
Jan Kara	cde3c9d7e2	Revert "f2fs: fix potential corruption when moving a directory" This reverts commit `d94772154e`. The locking is going to be provided by VFS. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-3-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-06-02 14:55:32 +02:00
David Howells	ceb11d0e2d	f2fs: Provide a splice-read wrapper Provide a splice_read wrapper for f2fs. This does some checks and tracing before calling filemap_splice_read() and will update the iostats afterwards. Direct I/O is handled by the caller. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jaegeuk Kim <jaegeuk@kernel.org> cc: Chao Yu <chao@kernel.org> cc: linux-f2fs-devel@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-20-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-24 08:42:16 -06:00
Jaegeuk Kim	633c8b9409	f2fs: fix the wrong condition to determine atomic context Should use !in_task for irq context. Cc: stable@vger.kernel.org Fixes: `1aa161e431` ("f2fs: fix scheduling while atomic in decompression path") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-23 18:37:42 -07:00
Daeho Jeong	e067dc3c6b	f2fs: maintain six open zones for zoned devices To keep six open zone constraints, make them not to be open over six open zones. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-23 18:37:38 -07:00
Christophe JAILLET	08c3eab525	f2fs: remove some dead code 'ret' is known to be 0 at the point. So these lines of code should just be removed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-08 11:18:04 -07:00
Li Zetao	1223e432d9	f2fs: remove redundant goto statement in f2fs_read_single_page() After the commit "0a4ee518185", this "goto" statement was redundant, remote it for clean code. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-08 11:18:04 -07:00
Yangtao Li	7cd2e5f75b	f2fs: do not allow to defragment files have FI_COMPRESS_RELEASED If a file has FI_COMPRESS_RELEASED, all writes for it should not be allowed. Fixes: `5fdb322ff2` ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Signed-off-by: Qi Han <hanqi@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-08 11:18:04 -07:00
Yangtao Li	888ca6edac	f2fs: add sanity check for proc_mkdir Return -ENOMEM when proc_mkdir failed. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-08 11:18:04 -07:00
Chao Yu	b62e71be21	f2fs: support errors=remount-ro\|continue\|panic mountoption This patch supports errors=remount-ro\|continue\|panic mount option for f2fs. f2fs behaves as below in three different modes: mode continue remount-ro panic access ops normal noraml N/A syscall errors -EIO -EROFS N/A mount option rw ro N/A pending dir write keep keep N/A pending non-dir write drop keep N/A pending node write drop keep N/A pending meta write keep keep N/A By default it uses "continue" mode. [Yangtao helps to clean up function's name] Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-05-08 11:18:04 -07:00
Linus Torvalds	5c7ecada25	f2fs update for 6.4-rc1 In this round, we've mainly modified to support non-power-of-two zone size, which is not required for f2fs by design. In order to avoid arch dependency, we refactored the messy rb_entry structure shared across different extent_cache. In addition to the improvement, we've also fixed several subtle bugs and error cases. Enhancement: - support non-power-of-two zone size for zoned device - remove sharing the rb_entry structure in extent cache - refactor f2fs_gc to call checkpoint in urgent condition - support iopoll Bug fix: - fix potential corruption when moving a directory - fix to avoid use-after-free for cached IPU bio - fix the folio private usage - avoid kernel warnings or panics in the cp_error case - fix to recover quota data correctly - fix some bugs in atomic operations - fix system crash due to lack of free space in LFS - fix null pointer panic in tracepoint in __replace_atomic_write_block - fix iostat lock protection - fix scheduling while atomic in decompression path - preserve direct write semantics when buffering is forced - fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmRIHMMACgkQQBSofoJI UNI3Mw//eQvxUXaWtCjTJQtXPotaah6ZcvnMMtfl6Cf0Z8Sq4L9q4yQMA16MXbLU zz3cexKXIHTzqWfqFLunaj6cmH/THAY3L3fTkFhE+dx1H2IaFprGLW3H8hW/58tr j9365RPVY2d/3agB1KikTj6FQ5OTGibkZagjsC28VmQ30VLIm+4jnHdIoX92UP+k 87JQ/fbG2XAiHX/ifcVuMXY3++db9jaZahsmhdJ1LNTZzztO241RzrNoBsLcSwSZ DkPgJXARQzFNDRfveRXSbV3ygR9C62pNITtSGC86ZRLyoAmko9se+nMEFH7YEkUy Rhf0Qzq2Gy6ThiVo8ZjuLvNycF0oj3OefX1PQLT6vzkv3Sv4Yij48bN1HqPdYsKH 3hPZd2V7A3o2LCJPPPNjZ/6nuKhrX+kU33FjUrxiYqz7Lt74j70vVEHQ7vSCGkrQ YpQYVXFr1hdejdemCpwgdvcEegNlV0GfqCG5KL1f7jJiGHfvxZnOEJ3x9dCQFTIE xVoWTzw9pbmBkTudrFNVRlX2RSQYSvgLFwUhQ3WE0qNu0mUMP+4E+50iKHYraJ7R W1TajZ+ttUJAnZ076vGGEOxabefEdtReOtdstohcJlDaGm5sI9I9CXQRvY4ZSymW l7ZHY/b+/IzP+/fLEX7DgTnWip37H14FImvjYRGpSEzc6sXiOUU= =qHTl -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this round, we've mainly modified to support non-power-of-two zone size, which is not required for f2fs by design. In order to avoid arch dependency, we refactored the messy rb_entry structure shared across different extent_cache. In addition to the improvement, we've also fixed several subtle bugs and error cases. Enhancements: - support non-power-of-two zone size for zoned device - remove sharing the rb_entry structure in extent cache - refactor f2fs_gc to call checkpoint in urgent condition - support iopoll Bug fixes: - fix potential corruption when moving a directory - fix to avoid use-after-free for cached IPU bio - fix the folio private usage - avoid kernel warnings or panics in the cp_error case - fix to recover quota data correctly - fix some bugs in atomic operations - fix system crash due to lack of free space in LFS - fix null pointer panic in tracepoint in __replace_atomic_write_block - fix iostat lock protection - fix scheduling while atomic in decompression path - preserve direct write semantics when buffering is forced - fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()" * tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits) f2fs: remove unnessary comment in __may_age_extent_tree f2fs: allocate node blocks for atomic write block replacement f2fs: use cow inode data when updating atomic write f2fs: remove power-of-two limitation of zoned device f2fs: allocate trace path buffer from names_cache f2fs: add has_enough_free_secs() f2fs: relax sanity check if checkpoint is corrupted f2fs: refactor f2fs_gc to call checkpoint in urgent condition f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del() f2fs: support iopoll method f2fs: remove batched_trim_sections node description f2fs: fix to check return value of inc_valid_block_count() f2fs: fix to check return value of f2fs_do_truncate_blocks() f2fs: fix passing relative address when discard zones f2fs: fix potential corruption when moving a directory f2fs: add radix_tree_preload_end in error case f2fs: fix to recover quota data correctly f2fs: fix to check readonly condition correctly docs: f2fs: Correct instruction to disable checkpoint ...	2023-04-26 09:42:10 -07:00
Qi Han	8375be2b64	f2fs: remove unnessary comment in __may_age_extent_tree This comment make no sense and is in the wrong place, so let's remove it. Signed-off-by: Qi Han <hanqi@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-24 11:03:10 -07:00
Daeho Jeong	994b442b66	f2fs: allocate node blocks for atomic write block replacement When a node block is missing for atomic write block replacement, we need to allocate it in advance of the replacement. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-24 11:03:10 -07:00
Daeho Jeong	591fc34e1f	f2fs: use cow inode data when updating atomic write Need to use cow inode data content instead of the one in the original inode, when we try to write the already updated atomic write files. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-24 11:03:10 -07:00
Jaegeuk Kim	2e2c6e9b72	f2fs: remove power-of-two limitation of zoned device In f2fs, there's no reason to force po2. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-24 11:03:10 -07:00
Wu Bo	5584785080	f2fs: allocate trace path buffer from names_cache It would be better to use the dedicated slab to store path. Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-20 09:38:12 -07:00
Yangtao Li	c1660d88a0	f2fs: add has_enough_free_secs() Replace !has_not_enough_free_secs w/ has_enough_free_secs. BTW avoid nested 'if' statements in f2fs_balance_fs(). Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-18 09:05:54 -07:00
Jaegeuk Kim	bd90c5cd33	f2fs: relax sanity check if checkpoint is corrupted 1. extent_cache - let's drop the largest extent_cache 2. invalidate_block - don't show the warnings Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-18 09:05:54 -07:00
Jaegeuk Kim	2d3f197bad	f2fs: refactor f2fs_gc to call checkpoint in urgent condition The major change is to call checkpoint, if there's not enough space while having some prefree segments in FG_GC case. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-18 09:05:43 -07:00
Chao Yu	635a52da86	f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio We have maintain PagePrivate and page_private and page reference w/ {set,clear}_page_private_*, it doesn't need to call folio_detach_private() in the end of .invalidate_folio and .release_folio, remove it and use f2fs_bug_on instead. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-17 14:49:40 -07:00
Yangtao Li	33560f8020	f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del() Convert to use remove_proc_subtree() and kill kobject_del() directly. kobject_put() actually covers kobject removal automatically, which is single stage removal. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-17 14:49:30 -07:00
Wu Bo	50aa6f44e1	f2fs: support iopoll method Wire up the iopoll method to the common implementation. As f2fs use common dio infrastructure: commit `a1e09b03e6` ("f2fs: use iomap for direct I/O") Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-13 16:38:18 -07:00
Chao Yu	935fc6fa64	f2fs: fix to check return value of inc_valid_block_count() In __replace_atomic_write_block(), we missed to check return value of inc_valid_block_count(), for extreme testcase that f2fs image is run out of space, it may cause inconsistent status in between SIT table and total valid block count. Cc: Daeho Jeong <daehojeong@google.com> Fixes: `3db1de0e58` ("f2fs: change the current atomic write way") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-13 16:37:57 -07:00
Chao Yu	b851ee6ba3	f2fs: fix to check return value of f2fs_do_truncate_blocks() Otherwise, if truncation on cow_inode failed, remained data may pollute current transaction of atomic write. Cc: Daeho Jeong <daehojeong@google.com> Fixes: `a46bebd502` ("f2fs: synchronize atomic write aborts") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-13 16:37:57 -07:00
Daeho Jeong	1ac3d037be	f2fs: fix passing relative address when discard zones We should not pass relative address in a zone to __f2fs_issue_discard_zone(). Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-13 16:37:57 -07:00
Jaegeuk Kim	d94772154e	f2fs: fix potential corruption when moving a directory F2FS has the same issue in ext4_rename causing crash revealed by xfstests/generic/707. See also commit `0813299c58` ("ext4: Fix possible corruption when moving a directory") CC: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-12 20:00:37 -07:00
Yohan Joung	d09bd85300	f2fs: add radix_tree_preload_end in error case To prevent excessive increase in preemption count add radix_tree_preload_end in retry Signed-off-by: Yohan Joung <yohan.joung@sk.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-12 20:00:37 -07:00
Chao Yu	e1bb7d3d9c	f2fs: fix to recover quota data correctly With -O quota mkfs option, xfstests generic/417 fails due to fsck detects data corruption on quota inodes. [ASSERT] (fsck_chk_quota_files:2051) --> Quota file is missing or invalid quota file content found. The root cause is there is a hole f2fs doesn't hold quota inodes, so all recovered quota data will be dropped due to SBI_POR_DOING flag was set. - f2fs_fill_super - f2fs_recover_orphan_inodes - f2fs_enable_quota_files - f2fs_quota_off_umount <--- quota inodes were dropped ---> - f2fs_recover_fsync_data - f2fs_enable_quota_files - f2fs_quota_off_umount This patch tries to eliminate the hole by holding quota inodes during entire recovery flow as below: - f2fs_fill_super - f2fs_recover_quota_begin - f2fs_recover_orphan_inodes - f2fs_recover_fsync_data - f2fs_recover_quota_end Then, recovered quota data can be persisted after SBI_POR_DOING is cleared. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-12 20:00:36 -07:00
Chao Yu	d78dfefcde	f2fs: fix to check readonly condition correctly With below case, it can mount multi-device image w/ rw option, however one of secondary device is set as ro, later update will cause panic, so let's introduce f2fs_dev_is_readonly(), and check multi-devices rw status in f2fs_remount() w/ it in order to avoid such inconsistent mount status. mkfs.f2fs -c /dev/zram1 /dev/zram0 -f blockdev --setro /dev/zram1 mount -t f2fs dev/zram0 /mnt/f2fs mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only. mount -t f2fs -o remount,rw mnt/f2fs dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=8192 kernel BUG at fs/f2fs/inline.c:258! RIP: 0010:f2fs_write_inline_data+0x23e/0x2d0 [f2fs] Call Trace: f2fs_write_single_data_page+0x26b/0x9f0 [f2fs] f2fs_write_cache_pages+0x389/0xa60 [f2fs] __f2fs_write_data_pages+0x26b/0x2d0 [f2fs] f2fs_write_data_pages+0x2e/0x40 [f2fs] do_writepages+0xd3/0x1b0 __writeback_single_inode+0x5b/0x420 writeback_sb_inodes+0x236/0x5a0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x2a3/0x490 wb_do_writeback+0x2b2/0x330 wb_workfn+0x6a/0x260 process_one_work+0x270/0x5e0 worker_thread+0x52/0x3e0 kthread+0xf4/0x120 ret_from_fork+0x29/0x50 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-12 20:00:36 -07:00
Chao Yu	6fd257cb35	f2fs: fix to keep consistent i_gc_rwsem lock order i_gc_rwsem[WRITE] and i_gc_rwsem[READ] lock order is reversed in gc_data_segment() and f2fs_dio_write_iter(), fix to keep consistent lock order as below: 1. lock i_gc_rwsem[WRITE] 2. lock i_gc_rwsem[READ] Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-12 20:00:34 -07:00
Chao Yu	c9b3649a93	f2fs: fix to drop all dirty pages during umount() if cp_error is set xfstest generic/361 reports a bug as below: f2fs_bug_on(sbi, sbi->fsync_node_num); kernel BUG at fs/f2fs/super.c:1627! RIP: 0010:f2fs_put_super+0x3a8/0x3b0 Call Trace: generic_shutdown_super+0x8c/0x1b0 kill_block_super+0x2b/0x60 kill_f2fs_super+0x87/0x110 deactivate_locked_super+0x39/0x80 deactivate_super+0x46/0x50 cleanup_mnt+0x109/0x170 __cleanup_mnt+0x16/0x20 task_work_run+0x65/0xa0 exit_to_user_mode_prepare+0x175/0x190 syscall_exit_to_user_mode+0x25/0x50 do_syscall_64+0x4c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc During umount(), if cp_error is set, f2fs_wait_on_all_pages() should not stop waiting all F2FS_WB_CP_DATA pages to be writebacked, otherwise, fsync_node_num can be non-zero after f2fs_wait_on_all_pages() causing this bug. In this case, to avoid deadloop in f2fs_wait_on_all_pages(), it needs to drop all dirty pages rather than redirtying them. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 11:03:02 -07:00
Chao Yu	5cdb422c83	f2fs: fix to avoid use-after-free for cached IPU bio xfstest generic/019 reports a bug: kernel BUG at mm/filemap.c:1619! RIP: 0010:folio_end_writeback+0x8a/0x90 Call Trace: end_page_writeback+0x1c/0x60 f2fs_write_end_io+0x199/0x420 bio_endio+0x104/0x180 submit_bio_noacct+0xa5/0x510 submit_bio+0x48/0x80 f2fs_submit_write_bio+0x35/0x300 f2fs_submit_merged_ipu_write+0x2a0/0x2b0 f2fs_write_single_data_page+0x838/0x8b0 f2fs_write_cache_pages+0x379/0xa30 f2fs_write_data_pages+0x30c/0x340 do_writepages+0xd8/0x1b0 __writeback_single_inode+0x44/0x370 writeback_sb_inodes+0x233/0x4d0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x1dd/0x2d0 wb_workfn+0x367/0x4a0 process_one_work+0x21d/0x430 worker_thread+0x4e/0x3c0 kthread+0x103/0x130 ret_from_fork+0x2c/0x50 The root cause is: after cp_error is set, f2fs_submit_merged_ipu_write() in f2fs_write_single_data_page() tries to flush IPU bio in cache, however f2fs_submit_merged_ipu_write() missed to check validity of @bio parameter, result in submitting random cached bio which belong to other IO context, then it will cause use-after-free issue, fix it by adding additional validity check. Fixes: `0b20fcec86` ("f2fs: cache global IPU bio") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 11:01:49 -07:00
Chao Yu	c277991d7c	f2fs: remove unneeded in-memory i_crtime copy i_crtime will never change after inode creation, so we don't need to copy it into f2fs_inode_info.i_disk_time[3], and monitor its change to decide whether updating inode page, remove related stuff. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 11:00:58 -07:00
Chao Yu	68f0453dab	f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only() f2fs has supported multi-device feature, to check devices' rw status, it should use f2fs_hw_is_readonly() rather than bdev_read_only(), fix it. Meanwhile, it removes f2fs_hw_is_readonly() check condition in: - f2fs_write_checkpoint() - f2fs_convert_inline_inode() As it has checked f2fs_readonly() condition, and if f2fs' devices were readonly, f2fs_readonly() must be true. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 11:00:31 -07:00
Weizhao Ouyang	0c9f452195	f2fs: use common implementation of file type Use common implementation of file type conversion helpers. Signed-off-by: Weizhao Ouyang <o451686892@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 10:58:45 -07:00
Yangtao Li	3094e5579b	f2fs: merge lz4hc_compress_pages() to lz4_compress_pages() Remove unnecessary lz4hc_compress_pages(). Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 10:58:45 -07:00
Yangtao Li	084e15ea14	f2fs: convert to use sysfs_emit Let's use sysfs_emit. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 10:58:45 -07:00
Yangtao Li	c2c14ca5b1	f2fs: set default compress option only when sb_has_compression If the compress feature is not enabled, there is no need to set compress-related parameters. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 10:58:45 -07:00
Yonggil Song	d11cef14f8	f2fs: Fix system crash due to lack of free space in LFS When f2fs tries to checkpoint during foreground gc in LFS mode, system crash occurs due to lack of free space if the amount of dirty node and dentry pages generated by data migration exceeds free space. The reproduction sequence is as follows. - 20GiB capacity block device (null_blk) - format and mount with LFS mode - create a file and write 20,000MiB - 4k random write on full range of the file RIP: 0010:new_curseg+0x48a/0x510 [f2fs] Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff RSP: 0018:ffff977bc397b218 EFLAGS: 00010246 RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0 RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8 RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40 R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000 R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000 FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> allocate_segment_by_default+0x9c/0x110 [f2fs] f2fs_allocate_data_block+0x243/0xa30 [f2fs] ? __mod_lruvec_page_state+0xa0/0x150 do_write_page+0x80/0x160 [f2fs] f2fs_do_write_node_page+0x32/0x50 [f2fs] __write_node_page+0x339/0x730 [f2fs] f2fs_sync_node_pages+0x5a6/0x780 [f2fs] block_operations+0x257/0x340 [f2fs] f2fs_write_checkpoint+0x102/0x1050 [f2fs] f2fs_gc+0x27c/0x630 [f2fs] ? folio_mark_dirty+0x36/0x70 f2fs_balance_fs+0x16f/0x180 [f2fs] This patch adds checking whether free sections are enough before checkpoint during gc. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> [Jaegeuk Kim: code clean-up] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 10:58:45 -07:00
Yangtao Li	19e0e21a51	f2fs: remove struct victim_selection default_v_ops There is only single instance of these ops, and Jaegeuk point out that: Originally this was intended to give a chance to provide other allocation option. Anyway, it seems quit hard to do it anymore. So remove the indirection and call f2fs_get_victim() directly. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-10 10:58:44 -07:00
Jaegeuk Kim	da6ea0b050	f2fs: fix null pointer panic in tracepoint in __replace_atomic_write_block We got a kernel panic if old_addr is NULL. https://bugzilla.kernel.org/show_bug.cgi?id=217266 BUG: kernel NULL pointer dereference, address: 0000000000000000 Call Trace: <TASK> f2fs_commit_atomic_write+0x619/0x990 [f2fs a1b985b80f5babd6f3ea778384908880812bfa43] __f2fs_ioctl+0xd8e/0x4080 [f2fs a1b985b80f5babd6f3ea778384908880812bfa43] ? vfs_write+0x2ae/0x3f0 ? vfs_write+0x2ae/0x3f0 __x64_sys_ioctl+0x91/0xd0 do_syscall_64+0x5c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f69095fe53f Fixes: `2f3a9ae990` ("f2fs: introduce trace_f2fs_replace_atomic_write_block") Cc: <stable@vger.kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-04 13:57:30 -07:00
Qilin Tan	144f1cd40b	f2fs: fix iostat lock protection Made iostat lock irq safe to avoid potentinal deadlock. Deadlock scenario: f2fs_attr_store -> f2fs_sbi_store -> _sbi_store -> spin_lock(sbi->iostat_lock) <interrupt request> -> scsi_end_request -> bio_endio -> f2fs_dio_read_end_io -> f2fs_update_iostat -> spin_lock_irqsave(sbi->iostat_lock) ===> Dead lock here Fixes: `61803e9843` ("f2fs: fix iostat related lock protection") Fixes: `a1e09b03e6` ("f2fs: use iomap for direct I/O") Signed-off-by: Qilin Tan <qilin.tan@mediatek.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-04 13:57:30 -07:00
Yohan Joung	f26aaee60a	f2fs: fix align check for npo2 Fix alignment check to be correct in npo2 as well Signed-off-by: Yohan Joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-04 13:57:30 -07:00
Yangtao Li	d4998b7895	f2fs: add compression feature check for all compress mount opt Opt_compress_chksum, Opt_compress_mode and Opt_compress_cache lack the necessary check to see if the image supports compression, let's add it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-04 13:57:29 -07:00
Yangtao Li	c0abbdf2b5	f2fs: convert is_extension_exist() to return bool type is_extension_exist() only return two values, 0 or 1. So there is no need to use int type. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-04-04 13:57:29 -07:00
Jaegeuk Kim	1aa161e431	f2fs: fix scheduling while atomic in decompression path [ 16.945668][ C0] Call trace: [ 16.945678][ C0] dump_backtrace+0x110/0x204 [ 16.945706][ C0] dump_stack_lvl+0x84/0xbc [ 16.945735][ C0] __schedule_bug+0xb8/0x1ac [ 16.945756][ C0] __schedule+0x724/0xbdc [ 16.945778][ C0] schedule+0x154/0x258 [ 16.945793][ C0] bit_wait_io+0x48/0xa4 [ 16.945808][ C0] out_of_line_wait_on_bit+0x114/0x198 [ 16.945824][ C0] __sync_dirty_buffer+0x1f8/0x2e8 [ 16.945853][ C0] __f2fs_commit_super+0x140/0x1f4 [ 16.945881][ C0] f2fs_commit_super+0x110/0x28c [ 16.945898][ C0] f2fs_handle_error+0x1f4/0x2f4 [ 16.945917][ C0] f2fs_decompress_cluster+0xc4/0x450 [ 16.945942][ C0] f2fs_end_read_compressed_page+0xc0/0xfc [ 16.945959][ C0] f2fs_handle_step_decompress+0x118/0x1cc [ 16.945978][ C0] f2fs_read_end_io+0x168/0x2b0 [ 16.945993][ C0] bio_endio+0x25c/0x2c8 [ 16.946015][ C0] dm_io_dec_pending+0x3e8/0x57c [ 16.946052][ C0] clone_endio+0x134/0x254 [ 16.946069][ C0] bio_endio+0x25c/0x2c8 [ 16.946084][ C0] blk_update_request+0x1d4/0x478 [ 16.946103][ C0] scsi_end_request+0x38/0x4cc [ 16.946129][ C0] scsi_io_completion+0x94/0x184 [ 16.946147][ C0] scsi_finish_command+0xe8/0x154 [ 16.946164][ C0] scsi_complete+0x90/0x1d8 [ 16.946181][ C0] blk_done_softirq+0xa4/0x11c [ 16.946198][ C0] _stext+0x184/0x614 [ 16.946214][ C0] __irq_exit_rcu+0x78/0x144 [ 16.946234][ C0] handle_domain_irq+0xd4/0x154 [ 16.946260][ C0] gic_handle_irq.33881+0x5c/0x27c [ 16.946281][ C0] call_on_irq_stack+0x40/0x70 [ 16.946298][ C0] do_interrupt_handler+0x48/0xa4 [ 16.946313][ C0] el1_interrupt+0x38/0x68 [ 16.946346][ C0] el1h_64_irq_handler+0x20/0x30 [ 16.946362][ C0] el1h_64_irq+0x78/0x7c [ 16.946377][ C0] finish_task_switch+0xc8/0x3d8 [ 16.946394][ C0] __schedule+0x600/0xbdc [ 16.946408][ C0] preempt_schedule_common+0x34/0x5c [ 16.946423][ C0] preempt_schedule+0x44/0x48 [ 16.946438][ C0] process_one_work+0x30c/0x550 [ 16.946456][ C0] worker_thread+0x414/0x8bc [ 16.946472][ C0] kthread+0x16c/0x1e0 [ 16.946486][ C0] ret_from_fork+0x10/0x20 Fixes: `bff139b49d` ("f2fs: handle decompress only post processing in softirq") Fixes: `95fa90c9e5` ("f2fs: support recording errors into superblock") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:40 -07:00
Hans Holmberg	92318f20d7	f2fs: preserve direct write semantics when buffering is forced In some cases, e.g. for zoned block devices, direct writes are forced into buffered writes that will populate the page cache and be written out just like buffered io. Direct reads, on the other hand, is supported for the zoned block device case. This has the effect that applications built for direct io will fill up the page cache with data that will never be read, and that is a waste of resources. If we agree that this is a problem, how do we fix it? A) Supporting proper direct writes for zoned block devices would be the best, but it is currently not supported (probably for a good but non-obvious reason). Would it be feasible to implement proper direct IO? B) Avoid the cost of keeping unwanted data by syncing and throwing out the cached pages for buffered O_DIRECT writes before completion. This patch implements B) by reusing the code for how partial block writes are flushed out on the "normal" direct write path. Note that this changes the performance characteristics of f2fs quite a bit. Direct IO performance for zoned block devices is lower for small writes after this patch, but this should be expected with direct IO and in line with how f2fs behaves on top of conventional block devices. Another open question is if the flushing should be done for all cases where buffered writes are forced. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Yonggil Song <yonggil.song@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:39 -07:00
Yangtao Li	babedcbac1	f2fs: compress: fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages() BUG_ON() will be triggered when writing files concurrently, because the same page is writtenback multiple times. 1597 void folio_end_writeback(struct folio *folio) 1598 { ...... 1618 if (!__folio_end_writeback(folio)) 1619 BUG(); ...... 1625 } kernel BUG at mm/filemap.c:1619! Call Trace: <TASK> f2fs_write_end_io+0x1a0/0x370 blk_update_request+0x6c/0x410 blk_mq_end_request+0x15/0x130 blk_complete_reqs+0x3c/0x50 __do_softirq+0xb8/0x29b ? sort_range+0x20/0x20 run_ksoftirqd+0x19/0x20 smpboot_thread_fn+0x10b/0x1d0 kthread+0xde/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Below is the concurrency scenario: [Process A] [Process B] [Process C] f2fs_write_raw_pages() - redirty_page_for_writepage() - unlock page() f2fs_do_write_data_page() - lock_page() - clear_page_dirty_for_io() - set_page_writeback() [1st writeback] ..... - unlock page() generic_perform_write() - f2fs_write_begin() - wait_for_stable_page() - f2fs_write_end() - set_page_dirty() - lock_page() - f2fs_do_write_data_page() - set_page_writeback() [2st writeback] This problem was introduced by the previous commit `7377e85396` ("f2fs: compress: fix potential deadlock of compress file"). All pagelocks were released in f2fs_write_raw_pages(), but whether the page was in the writeback state was ignored in the subsequent writing process. Let's fix it by waiting for the page to writeback before writing. Cc: Christoph Hellwig <hch@lst.de> Fixes: `4c8ff7095b` ("f2fs: support data compression") Fixes: `7377e85396` ("f2fs: compress: fix potential deadlock of compress file") Signed-off-by: Qi Han <hanqi@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:39 -07:00
Yangtao Li	c948be797d	f2fs: remove else in f2fs_write_cache_pages() As Christoph Hellwig point out: Please avoid the else by doing the goto in the branch. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:39 -07:00
Jaegeuk Kim	0b37ed21e3	f2fs: apply zone capacity to all zone type If we manage the zone capacity per zone type, it'll break the GC assumption. And, the current logic complains valid block count mismatch. Let's apply zone capacity to all zone type, if specified. Fixes: `de881df977` ("f2fs: support zone capacity less than zone size") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:39 -07:00
Yangtao Li	b822dc9149	f2fs: fix to handle filemap_fdatawrite() error in f2fs_ioc_decompress_file/f2fs_ioc_compress_file It seems inappropriate that the current logic does not handle filemap_fdatawrite() errors, so let's fix it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:39 -07:00
Yangtao Li	5bb9c111cd	f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show() BIW reduce the s_flag array size and make s_flag constant. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:39 -07:00
Yonggil Song	6797ebc4ac	f2fs: Fix discard bug on zoned block devices with 2MiB zone size When using f2fs on a zoned block device with 2MiB zone size, IO errors occurs because f2fs tries to write data to a zone that has not been reset. The cause is that f2fs tries to discard multiple zones at once. This is caused by a condition in f2fs_clear_prefree_segments that does not check for zoned block devices when setting the discard range. This leads to invalid reset commands and write pointer mismatches. This patch fixes the zoned block device with 2MiB zone size to reset one zone at a time. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:38 -07:00
Jaegeuk Kim	bf21acf995	f2fs: remove entire rb_entry sharing This is a last part to remove the memory sharing for rb_tree in extent_cache. This should also fix arm32 memory alignment issue. [struct extent_node] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { union { struct { struct { [16] unsigned int fofs; [12] unsigned int ofs; unsigned int len; unsigned int len; }; unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: `13054c548a` ("f2fs: introduce infra macro and data structure of rb-tree extent cache") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:38 -07:00
Jaegeuk Kim	f69475dd48	f2fs: factor out discard_cmd usage from general rb_tree use This is a second part to remove the mixed use of rb_tree in discard_cmd from extent_cache. This should also fix arm32 memory alignment issue caused by shared rb_entry. [struct discard_cmd] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { union { struct { struct { [16] block_t lstart; [12] unsigned int ofs; block_t len; unsigned int len; }; unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: `004b686218` ("f2fs: use rb-tree to track pending discard commands") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:38 -07:00
Jaegeuk Kim	043d2d00b4	f2fs: factor out victim_entry usage from general rb_tree use Let's reduce the complexity of mixed use of rb_tree in victim_entry from extent_cache and discard_cmd. This should fix arm32 memory alignment issue caused by shared rb_entry. [struct victim_entry] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { struct { unsigned int ofs; unsigned int len; }; [16] unsigned long long mtime; [12] unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: `093749e296` ("f2fs: support age threshold based garbage collection") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:38 -07:00
Yonggil Song	c17caf0ba3	f2fs: fix uninitialized skipped_gc_rwsem When f2fs skipped a gc round during victim migration, there was a bug which would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem was not initialized. It fixes the bug by correctly initializing the skipped_gc_rwsem inside the gc loop. Fixes: `6f8d445506` ("f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc") Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:38 -07:00
Yangtao Li	8051692f5f	f2fs: handle dqget error in f2fs_transfer_project_quota() We should set the error code when dqget() failed. Fixes: `2c1d030569` ("f2fs: support F2FS_IOC_FS{GET,SET}XATTR") Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:37 -07:00
Yangtao Li	447286ebad	f2fs: convert to use bitmap API Let's use BIT() and GENMASK() instead of open it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:37 -07:00
Yangtao Li	960fa2c828	f2fs: export compress_percent and compress_watermark entries This patch export below sysfs entries for better control cached compress page count. /sys/fs/f2fs/<disk>/compress_watermark /sys/fs/f2fs/<disk>/compress_percent Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:37 -07:00
Li Zetao	6063037506	f2fs: make f2fs_sync_inode_meta() static After commit `26b5a07919` ("f2fs: cleanup dirty pages if recover failed"), f2fs_sync_inode_meta() is only used in checkpoint.c, so f2fs_sync_inode_meta() should only be visible inside. Delete the declaration in the header file and change f2fs_sync_inode_meta() to static. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2023-03-29 15:17:37 -07:00
Christian Brauner	d549b74174	fs: rename generic posix acl handlers Reflect in their naming and document that they are kept around for legacy reasons and shouldn't be used anymore by new code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2023-03-06 09:57:13 +01:00
Christian Brauner	a5488f2983	fs: simplify ->listxattr() implementation The ext{2,4}, erofs, f2fs, and jffs2 filesystems use the same logic to check whether a given xattr can be listed. Simplify them and avoid open-coding the same check by calling the helper we introduced earlier. Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-erofs@lists.ozlabs.org Cc: linux-ext4@vger.kernel.org Cc: linux-mtd@lists.infradead.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2023-03-06 09:57:12 +01:00
Christian Brauner	0c95c025a0	fs: drop unused posix acl handlers Remove struct posix_acl_{access,default}_handler for all filesystems that don't depend on the xattr handler in their inode->i_op->listxattr() method in any way. There's nothing more to do than to simply remove the handler. It's been effectively unused ever since we introduced the new posix acl api. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2023-03-06 09:57:12 +01:00
Linus Torvalds	103830683c	f2fs-for-6.3-rc1 In this round, we've got a huge number of patches that improve code readability along with minor bug fixes, while we've mainly fixed some critical issues in recently-added per-block age-based extent_cache, atomic write support, and some folio cases. Enhancement: - add sysfs nodes to set last_age_weight and manage discard_io_aware_gran - show ipu policy in debugfs - reduce stack memory cost by using bitfield in struct f2fs_io_info - introduce trace_f2fs_replace_atomic_write_block - enhance iostat support and adds flush commands Bug fix: - revert "f2fs: truncate blocks in batch in __complete_revoke_list()" - fix kernel crash on the atomic write abort flow - call clear_page_private_reference in .{release,invalid}_folio - support .migrate_folio for compressed inode - fix cgroup writeback accounting with fs-layer encryption - retry to update the inode page given data corruption - fix kernel crash due to null io->bio - fix some bugs in per-block age-based extent_cache: a. wrong calculation of block age b. update age extent in f2fs_do_zero_range() c. update age extent correctly during truncation -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmP9M/cACgkQQBSofoJI UNIX1Q//Yp+nDeY91H3IO6aMSHPqRoBBnVTr8ERtLUF0fuQbBkzcQE+t8cMSoYDM 88sxoC+F7UnovNr84VeuKlHN4hYyAXuxj8OZetXI3XX+yiO+auEPtljGA0BaGwqL 93lIg8nIQ2ing/oZ/4+h4dvpYCPrhKOQS6h1sHhIWlql6Wxwxq01uA47i0Ni6y/o D23JPFaDQfumN8qy1bm5xfLRhTQmaE35n5NhcBJpUD/rGK92NPXv7RLKPgZc3tKN tJmL+NXa3NNEx5e5TSP7JX+rhD7KlL5XlB/m8LbpPIx338I0pt0uzAc7nTIOlzM3 DI0q3HXe9U3+JBHi+rKxkIniiRDmvhPx3NzdgcsYg75EnwdrNazsRHulBUEAXB4v ghHbx53OxA2uSnUVbkY4HXNYf7cYjrx5vbX0oqEx48btBCC8KGFIcHtI72tIBee3 xdCxoM3e2AWijkFBOCkThXNuNNbdifQzn2e7xR7W+o0L9hwdR5t7tHhHT+cqG9Ox 6UKWoZZUjYUAV/YQT5Qh6570GsGncM8gHAUMz7DVTIOB9wYkHtb0Q9tUmoQccwf5 46r84c4/jxUQHt8SkXBBl1aiqHmR7EF17YvM5AXIdG3DqqOy70NWkM48hMOyIy2G eY/wTVJwpd6oDveQPtxaHn7Wo3jSPNUlidOfAYQ7Itm34VXY/zc= =yXLl -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've got a huge number of patches that improve code readability along with minor bug fixes, while we've mainly fixed some critical issues in recently-added per-block age-based extent_cache, atomic write support, and some folio cases. Enhancements: - add sysfs nodes to set last_age_weight and manage discard_io_aware_gran - show ipu policy in debugfs - reduce stack memory cost by using bitfield in struct f2fs_io_info - introduce trace_f2fs_replace_atomic_write_block - enhance iostat support and adds flush commands Bug fixes: - revert "f2fs: truncate blocks in batch in __complete_revoke_list()" - fix kernel crash on the atomic write abort flow - call clear_page_private_reference in .{release,invalid}_folio - support .migrate_folio for compressed inode - fix cgroup writeback accounting with fs-layer encryption - retry to update the inode page given data corruption - fix kernel crash due to NULL io->bio - fix some bugs in per-block age-based extent_cache: - wrong calculation of block age - update age extent in f2fs_do_zero_range() - update age extent correctly during truncation" * tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits) f2fs: drop unnecessary arg for f2fs_ioc_*() f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()" f2fs: synchronize atomic write aborts f2fs: fix wrong segment count f2fs: replace si->sbi w/ sbi in stat_show() f2fs: export ipu policy in debugfs f2fs: make kobj_type structures constant f2fs: fix to do sanity check on extent cache correctly f2fs: add missing description for ipu_policy node f2fs: fix to set ipu policy f2fs: fix typos in comments f2fs: fix kernel crash due to null io->bio f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx() f2fs: add sysfs nodes to set last_age_weight f2fs: fix f2fs_show_options to show nogc_merge mount option f2fs: fix cgroup writeback accounting with fs-layer encryption f2fs: fix wrong calculation of block age f2fs: fix to update age extent in f2fs_do_zero_range() f2fs: fix to update age extent correctly during truncation f2fs: fix to avoid potential memory corruption in __update_iostat_latency() ...	2023-02-27 16:18:51 -08:00
Linus Torvalds	3822a7c409	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K DmxHkn0LAitGgJRS/W9w81yrgig9tAQ= =MlGs -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...	2023-02-23 17:09:35 -08:00
Linus Torvalds	6639c3ce7f	fsverity updates for 6.3 Fix the longstanding implementation limitation that fsverity was only supported when the Merkle tree block size, filesystem block size, and PAGE_SIZE were all equal. Specifically, add support for Merkle tree block sizes less than PAGE_SIZE, and make ext4 support fsverity on filesystems where the filesystem block size is less than PAGE_SIZE. Effectively, this means that fsverity can now be used on systems with non-4K pages, at least on ext4. These changes have been tested using the verity group of xfstests, newly updated to cover the new code paths. Also update fs/verity/ to support verifying data from large folios. There's also a similar patch for fs/crypto/, to support decrypting data from large folios, which I'm including in this pull request to avoid a merge conflict between the fscrypt and fsverity branches. There will be a merge conflict in fs/buffer.c with some of the foliation work in the mm tree. Please use the merge resolution from linux-next. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCY/KJtRQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK/A/AP0RUlCClBRuHwXPRG0we8R1L153ga4s Vl+xRpCr+SswXwEAiOEpYN5cXoVKzNgxbEXo2pQzxi5lrpjZgUI6CL3DuQs= =ZRFX -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull fsverity updates from Eric Biggers: "Fix the longstanding implementation limitation that fsverity was only supported when the Merkle tree block size, filesystem block size, and PAGE_SIZE were all equal. Specifically, add support for Merkle tree block sizes less than PAGE_SIZE, and make ext4 support fsverity on filesystems where the filesystem block size is less than PAGE_SIZE. Effectively, this means that fsverity can now be used on systems with non-4K pages, at least on ext4. These changes have been tested using the verity group of xfstests, newly updated to cover the new code paths. Also update fs/verity/ to support verifying data from large folios. There's also a similar patch for fs/crypto/, to support decrypting data from large folios, which I'm including in here to avoid a merge conflict between the fscrypt and fsverity branches" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fscrypt: support decrypting data from large folios fsverity: support verifying data from large folios fsverity.rst: update git repo URL for fsverity-utils ext4: allow verity with fs block size < PAGE_SIZE fs/buffer.c: support fsverity in block_read_full_folio() f2fs: simplify f2fs_readpage_limit() ext4: simplify ext4_readpage_limit() fsverity: support enabling with tree block size < PAGE_SIZE fsverity: support verification with tree block size < PAGE_SIZE fsverity: replace fsverity_hash_page() with fsverity_hash_block() fsverity: use EFBIG for file too large to enable verity fsverity: store log2(digest_size) precomputed fsverity: simplify Merkle tree readahead size calculation fsverity: use unsigned long for level_start fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG fsverity: pass pos and size to ->write_merkle_tree_block fsverity: optimize fsverity_cleanup_inode() on non-verity files fsverity: optimize fsverity_prepare_setattr() on non-verity files fsverity: optimize fsverity_file_open() on non-verity files	2023-02-20 12:33:41 -08:00

1 2 3 4 5 ...

3874 commits