linux/fs
Dave Chinner 055388a318 xfs: dynamic speculative EOF preallocation
Currently the size of the speculative preallocation during delayed
allocation is fixed by either the allocsize mount option of a
default size. We are seeing a lot of cases where we need to
recommend using the allocsize mount option to prevent fragmentation
when buffered writes land in the same AG.

Rather than using a fixed preallocation size by default (up to 64k),
make it dynamic by basing it on the current inode size. That way the
EOF preallocation will increase as the file size increases.  Hence
for streaming writes we are much more likely to get large
preallocations exactly when we need it to reduce fragementation.

For default settings, the size of the initial extents is determined
by the number of parallel writers and the amount of memory in the
machine. For 4GB RAM and 4 concurrent 32GB file writes:

EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
   1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
   2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
   3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
   4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
   5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
   6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
   7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088

and for 16 concurrent 16GB file writes:

 EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
   1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
   2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
   3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
   4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
   5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
   6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
   7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208

Because it is hard to take back specualtive preallocation, cases
where there are large slow growing log files on a nearly full
filesystem may cause premature ENOSPC. Hence as the filesystem nears
full, the maximum dynamic prealloc size іs reduced according to this
table (based on 4k block size):

freespace       max prealloc size
  >5%             full extent (8GB)
  4-5%             2GB (8GB >> 2)
  3-4%             1GB (8GB >> 3)
  2-3%           512MB (8GB >> 4)
  1-2%           256MB (8GB >> 5)
  <1%            128MB (8GB >> 6)

This should reduce the amount of space held in speculative
preallocation for such cases.

The allocsize mount option turns off the dynamic behaviour and fixes
the prealloc size to whatever the mount option specifies. i.e. the
behaviour is unchanged.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
2011-01-04 11:35:03 +11:00
..
9p convert v9fs 2010-10-29 04:16:38 -04:00
adfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
affs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
afs convert afs 2010-10-29 04:17:13 -04:00
autofs4 convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
befs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
bfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
btrfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2010-11-29 14:11:08 -08:00
cachefiles llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2010-11-19 15:32:22 -08:00
cifs cifs: fix a memleak in cifs_setattr_nounix() 2010-11-09 15:17:53 +00:00
coda convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
configfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
cramfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
debugfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
devpts convert get_sb_single() users 2010-10-29 04:16:28 -04:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2010-10-22 17:33:16 -07:00
ecryptfs BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
efs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
exofs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
exportfs exportfs: use dget_parent 2010-10-25 21:26:13 -04:00
ext2 new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ext3 BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
ext4 ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard 2010-11-19 21:47:07 -05:00
fat new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
freevxfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
fscache
fuse fuse: fix attributes after open(O_TRUNC) 2010-11-25 06:50:41 +09:00
gfs2 GFS2: Userland expects quota limit/warn/usage in 512b blocks 2010-11-19 11:20:29 +00:00
hfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
hfsplus new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
hostfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
hpfs Merge branches 'irq-core-for-linus' and 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-31 20:40:24 -04:00
hppfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
hugetlbfs hugetlbfs: lessen the impact of a deprecation warning 2010-11-12 07:55:32 -08:00
isofs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
jbd Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-10-27 20:13:18 -07:00
jbd2 jbd2: fix /proc/fs/jbd2/<dev> when using an external journal 2010-11-17 21:46:26 -05:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2010-10-30 08:31:35 -07:00
jfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
lockd BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
logfs fs: logfs: Fix up MTD=y build. 2010-11-01 16:34:56 -04:00
minix new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ncpfs BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
nfs NFS: Ensure we return the dirent->d_type when it is known 2010-11-22 13:24:48 -05:00
nfs_common
nfsd BKL: remove references to lock_kernel from comments 2010-11-17 08:59:32 -08:00
nilfs2 nilfs2: fix typo in comment of nilfs_dat_move function 2010-11-24 12:51:48 +09:00
nls
notify make fanotify_read() restartable across signals 2010-10-30 14:07:35 -04:00
ntfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ocfs2 BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
omfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
openpromfs sparc: fix openpromfs compile 2010-11-08 14:29:39 -08:00
partitions Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2010-10-25 07:45:10 -07:00
proc pagemap: set pagemap walk limit to PMD boundary 2010-11-25 06:50:46 +09:00
qnx4 new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
quota quota: Fix possible oops in __dquot_initialize() 2010-10-28 01:30:06 +02:00
ramfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
reiserfs reiserfs: fix inode mutex - reiserfs lock misordering 2010-11-25 06:50:48 +09:00
romfs convert get_sb_mtd() users to ->mount() 2010-10-29 04:16:26 -04:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2010-10-29 08:48:58 -07:00
sysfs convert sysfs 2010-10-29 04:17:08 -04:00
sysv new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ubifs convert ubifs 2010-10-29 04:16:36 -04:00
udf new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ufs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
xfs xfs: dynamic speculative EOF preallocation 2011-01-04 11:35:03 +11:00
aio.c new helper: ihold() 2010-10-25 21:26:11 -04:00
anon_inodes.c convert get_sb_pseudo() users 2010-10-29 04:16:33 -04:00
attr.c
bad_inode.c
binfmt_aout.c Don't dump task struct in a.out core-dumps 2010-10-14 10:57:40 -07:00
binfmt_elf.c ARM: 6342/1: fix ASLR of PIE executables 2010-10-08 10:02:53 +01:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c convert get_sb_single() users 2010-10-29 04:16:28 -04:00
binfmt_script.c
binfmt_som.c
bio-integrity.c
bio.c bio: take care not overflow page count when mapping/copying user data 2010-11-10 14:40:43 +01:00
block_dev.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
buffer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
char_dev.c Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl 2010-10-22 10:52:56 -07:00
compat.c fs/compat.c: fix build on MIPS/s390 2010-10-30 08:19:35 -07:00
compat_binfmt_elf.c
compat_ioctl.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
dcache.c fs: use RCU read side protection in d_validate 2010-10-25 21:26:13 -04:00
dcookies.c
direct-io.c fs/direct-io.c: fix truncation error in dio_complete() return 2010-10-26 16:52:13 -07:00
drop_caches.c
eventfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
eventpoll.c epoll: make epoll_wait() use the hrtimer range feature 2010-10-27 18:03:18 -07:00
exec.c exec: don't turn PF_KTHREAD off when a target command was not found 2010-10-27 18:03:13 -07:00
fcntl.c fasync: Fix placement of FASYNC flag comment 2010-10-27 18:17:02 -07:00
fifo.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
file.c
file_table.c fs: allow for more than 2^31 files 2010-10-26 16:52:15 -07:00
filesystems.c
fs-writeback.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2010-10-30 09:05:48 -07:00
fs_struct.c
generic_acl.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
internal.h braino in internal.h 2010-10-29 05:49:13 -04:00
ioctl.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2010-11-19 19:46:45 -08:00
ioprio.c ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() 2010-11-15 10:23:31 +01:00
Kconfig Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
Kconfig.binfmt coredump: default CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y 2010-10-27 18:03:12 -07:00
libfs.c convert get_sb_pseudo() users 2010-10-29 04:16:33 -04:00
locks.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
Makefile Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
mbcache.c
mpage.c
namei.c fix open/umount race 2010-10-29 04:14:56 -04:00
namespace.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
nfsctl.c
no-block.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
open.c fix open/umount race 2010-10-29 04:14:56 -04:00
pipe.c Un-inline get_pipe_info() helper function 2010-11-28 16:27:19 -08:00
pnode.c
pnode.h
posix_acl.c
read_write.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
read_write.h
readdir.c
select.c epoll: make epoll_wait() use the hrtimer range feature 2010-10-27 18:03:18 -07:00
seq_file.c fs: take dcache_lock inside __d_path 2010-10-25 21:26:12 -04:00
signalfd.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2010-10-26 10:13:10 -07:00
splice.c Export 'get_pipe_info()' to other users 2010-11-28 14:09:57 -08:00
stack.c
stat.c
statfs.c
super.c switch get_sb_ns() users 2010-10-29 04:17:03 -04:00
sync.c
timerfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
utimes.c
xattr.c
xattr_acl.c