linux/fs/btrfs
Filipe Manana f6df27dd27 btrfs: do not commit delayed inode when logging a file in full sync mode
When logging a regular file in full sync mode, we are currently committing
its delayed inode item. This is to ensure that we never miss copying the
inode item, with its most up to date data, into the log tree.

However that is not necessary since commit e4545de5b0 ("Btrfs: fix fsync
data loss after append write"), because even if we don't find the leaf
with the inode item when looking for leaves that changed in the current
transaction, we end up logging the inode item later using the in-memory
content. In case we find the leaf containing the inode item, we already
end up using the in-memory inode for filling the inode item in the log
tree, and not the inode item that is in the fs/subvolume tree, as it
might be not up to date (copy_items() -> fill_inode_item()).

So don't commit the delayed inode item, which brings a couple of benefits:

1) Avoid writing the inode item to the fs/subvolume btree, saving time and
   reducing lock contention on the btree;

2) In case no other item for the inode was changed, added or deleted in
   the same leaf where the inode item is located, we ended up copying
   all the items in that leaf to the log tree - it's harmless from a
   functional point of view, but it wastes time and log tree space.

This patch is part of a patch set comprised of the following patches:

  btrfs: check if a log tree exists at inode_logged()
  btrfs: remove no longer needed checks for NULL log context
  btrfs: do not log new dentries when logging that a new name exists
  btrfs: always update the logged transaction when logging new names
  btrfs: avoid expensive search when dropping inode items from log
  btrfs: add helper to truncate inode items when logging inode
  btrfs: avoid expensive search when truncating inode items from the log
  btrfs: avoid search for logged i_size when logging inode if possible
  btrfs: avoid attempt to drop extents when logging inode for the first time
  btrfs: do not commit delayed inode when logging a file in full sync mode

This is patch 10/10 and the following test results compare a branch with
the whole patch set applied versus a branch without any of the patches
applied.

The following script was used to test dbench with 8 and 16 jobs on a
machine with 12 cores, 64G of RAM, a NVME device and using a non-debug
kernel config (Debian's default):

  $ cat test.sh
  #!/bin/bash

  if [ $# -ne 1 ]; then
      echo "Use $0 NUM_JOBS"
      exit 1
  fi

  NUM_JOBS=$1

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-m single -d single"

  echo "performance" | \
      tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  dbench -D $MNT -t 120 $NUM_JOBS

  umount $MNT

The results were the following:

8 jobs, before patchset:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    4113896     0.009   238.665
 Close        3021699     0.001     0.590
 Rename        174215     0.082   238.733
 Unlink        830977     0.049   238.642
 Deltree           96     2.232     8.022
 Mkdir             48     0.003     0.005
 Qpathinfo    3729013     0.005     2.672
 Qfileinfo     653206     0.001     0.152
 Qfsinfo       683866     0.002     0.526
 Sfileinfo     335055     0.004     1.571
 Find         1441800     0.016     4.288
 WriteX       2049644     0.010     3.982
 ReadX        6449786     0.003     0.969
 LockX          13400     0.002     0.043
 UnlockX        13400     0.001     0.075
 Flush         288349     2.521   245.516

Throughput 1075.73 MB/sec  8 clients  8 procs  max_latency=245.520 ms

8 jobs, after patchset:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    4154282     0.009   156.675
 Close        3051450     0.001     0.843
 Rename        175912     0.072     4.444
 Unlink        839067     0.048    66.050
 Deltree           96     2.131     5.979
 Mkdir             48     0.002     0.004
 Qpathinfo    3765575     0.005     3.079
 Qfileinfo     659582     0.001     0.099
 Qfsinfo       690474     0.002     0.155
 Sfileinfo     338366     0.004     1.419
 Find         1455816     0.016     3.423
 WriteX       2069538     0.010     4.328
 ReadX        6512429     0.003     0.840
 LockX          13530     0.002     0.078
 UnlockX        13530     0.001     0.051
 Flush         291158     2.500   163.468

Throughput 1105.45 MB/sec  8 clients  8 procs  max_latency=163.474 ms

+2.7% throughput, -40.1% max latency

16 jobs, before patchset:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    5457602     0.033   337.098
 Close        4008979     0.002     2.018
 Rename        231051     0.323   254.054
 Unlink       1102209     0.202   337.243
 Deltree          160     6.521    31.720
 Mkdir             80     0.003     0.007
 Qpathinfo    4946147     0.014     6.988
 Qfileinfo     867440     0.001     1.642
 Qfsinfo       907081     0.003     1.821
 Sfileinfo     444433     0.005     2.053
 Find         1912506     0.067     7.854
 WriteX       2724852     0.018     7.428
 ReadX        8553883     0.003     2.059
 LockX          17770     0.003     0.350
 UnlockX        17770     0.002     0.627
 Flush         382533     2.810   353.691

Throughput 1413.09 MB/sec  16 clients  16 procs  max_latency=353.696 ms

16 jobs, after patchset:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    5393156     0.034   303.181
 Close        3961986     0.002     1.502
 Rename        228359     0.320   253.379
 Unlink       1088920     0.206   303.409
 Deltree          160     6.419    30.088
 Mkdir             80     0.003     0.004
 Qpathinfo    4887967     0.015     7.722
 Qfileinfo     857408     0.001     1.651
 Qfsinfo       896343     0.002     2.147
 Sfileinfo     439317     0.005     4.298
 Find         1890018     0.073     8.347
 WriteX       2693356     0.018     6.373
 ReadX        8453485     0.003     3.836
 LockX          17562     0.003     0.486
 UnlockX        17562     0.002     0.635
 Flush         378023     2.802   315.904

Throughput 1454.46 MB/sec  16 clients  16 procs  max_latency=315.910 ms

+2.9% throughput, -11.3% max latency

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26 19:08:01 +02:00
..
tests btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
acl.c overlayfs update for 5.15 2021-09-02 09:21:27 -07:00
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
backref.h btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
block-group.c btrfs: zoned: activate new block group 2021-10-26 19:07:59 +02:00
block-group.h btrfs: zoned: implement active zone tracking 2021-10-26 19:07:59 +02:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: initial fsverity support 2021-08-23 13:19:09 +02:00
check-integrity.c btrfs: check-integrity: drop kmap/kunmap for block pages 2021-08-23 13:19:00 +02:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: rework btrfs_decompress_buf2page() 2021-08-23 13:19:04 +02:00
compression.h btrfs: rework btrfs_decompress_buf2page() 2021-08-23 13:19:04 +02:00
ctree.c btrfs: introduce btrfs_search_backwards function 2021-08-23 13:19:09 +02:00
ctree.h btrfs: zoned: implement active zone tracking 2021-10-26 19:07:59 +02:00
delalloc-space.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: add ro compat flags to inodes 2021-08-23 13:19:09 +02:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
delayed-ref.h btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
dev-replace.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: unify lookup return value when dir entry is missing 2021-10-07 22:06:32 +02:00
discard.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: convert latest_bdev type to btrfs_device and rename 2021-10-26 19:08:00 +02:00
disk-io.h btrfs: split alloc_log_tree() 2021-02-09 02:46:07 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: zoned: avoid chunk allocation if active block group has enough space 2021-10-26 19:07:59 +02:00
extent_io.c btrfs: convert latest_bdev type to btrfs_device and rename 2021-10-26 19:08:00 +02:00
extent_io.h btrfs: zoned: finish fully written block group 2021-10-26 19:08:00 +02:00
extent_map.c btrfs: fix parameter description of btrfs_add_extent_mapping 2021-02-08 22:58:53 +01:00
extent_map.h btrfs: remove extent_map::bdev 2019-11-18 23:43:44 +01:00
file-item.c btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling 2021-09-17 19:29:38 +02:00
file.c btrfs: fix abort logic in btrfs_replace_file_extents 2021-10-07 22:08:06 +02:00
free-space-cache.c btrfs: zoned: implement active zone tracking 2021-10-26 19:07:59 +02:00
free-space-cache.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-01-25 18:44:37 +01:00
free-space-tree.h btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: convert latest_bdev type to btrfs_device and rename 2021-10-26 19:08:00 +02:00
ioctl.c btrfs: defrag: enable defrag for subpage case 2021-10-26 19:07:58 +02:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-06-22 14:11:57 +02:00
locking.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: rework lzo_decompress_bio() to make it subpage compatible 2021-08-23 13:19:04 +02:00
Makefile btrfs: initial fsverity support 2021-08-23 13:19:09 +02:00
misc.h btrfs: use correct header for div_u64 in misc.h 2021-09-07 14:29:50 +02:00
ordered-data.c btrfs: zoned: fix double counting of split ordered extent 2021-09-07 14:30:41 +02:00
ordered-data.h btrfs: remove uptodate parameter from btrfs_dec_test_first_ordered_pending 2021-08-23 13:19:02 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: props: change how empty value is interpreted 2021-06-22 14:11:58 +02:00
props.h btrfs: delete unused function btrfs_set_prop_trans 2019-04-29 19:02:54 +02:00
qgroup.c btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
qgroup.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
raid56.c btrfs: constify and cleanup variables in comparators 2021-08-23 13:19:03 +02:00
raid56.h btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes 2019-07-01 13:34:58 +02:00
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: subpage: make readahead work properly 2021-03-16 11:06:21 +01:00
ref-verify.c btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool 2021-08-23 13:19:00 +02:00
ref-verify.h
reflink.c btrfs: reflink: initialize return value to 0 in btrfs_extent_same() 2021-10-26 19:03:57 +02:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: zoned: finish relocating block group 2021-10-26 19:08:00 +02:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
send.c btrfs: send: simplify send_create_inode_if_needed 2021-10-25 21:17:16 +02:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: prevent __btrfs_dump_space_info() to underflow its free space 2021-09-17 19:29:54 +02:00
space-info.h btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
struct-funcs.c btrfs: add special case to setget helpers for 64k pages 2021-08-23 13:18:58 +02:00
subpage.c btrfs: subpage: pack all subpage bitmaps into a larger bitmap 2021-10-26 19:03:55 +02:00
subpage.h btrfs: subpage: pack all subpage bitmaps into a larger bitmap 2021-10-26 19:03:55 +02:00
super.c btrfs: use latest_dev in btrfs_show_devname 2021-10-26 19:08:00 +02:00
sysfs.c btrfs: sysfs: document structures and their associated files 2021-08-23 13:19:12 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
transaction.h btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
tree-checker.c btrfs: add ro compat flags to inodes 2021-08-23 13:19:09 +02:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: do not commit delayed inode when logging a file in full sync mode 2021-10-26 19:08:01 +02:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
tree-mod-log.c btrfs: fix race when picking most recent mod log operation for an old root 2021-04-20 19:27:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
verity.c btrfs: fix transaction handle leak after verity rollback failure 2021-09-17 19:29:41 +02:00
volumes.c btrfs: remove stale comment about the btrfs_show_devname 2021-10-26 19:08:00 +02:00
volumes.h btrfs: convert latest_bdev type to btrfs_device and rename 2021-10-26 19:08:00 +02:00
xattr.c for-5.12-rc1-tag 2021-03-05 12:21:14 -08:00
xattr.h
zlib.c btrfs: rework btrfs_decompress_buf2page() 2021-08-23 13:19:04 +02:00
zoned.c btrfs: zoned: finish fully written block group 2021-10-26 19:08:00 +02:00
zoned.h btrfs: zoned: finish fully written block group 2021-10-26 19:08:00 +02:00
zstd.c btrfs: rework btrfs_decompress_buf2page() 2021-08-23 13:19:04 +02:00