linux/fs
Linus Torvalds 9c64daff9d ext3: avoid unnecessary spinlock in critical POSIX ACL path
If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
to do POSIX ACL checks on any files not owned by the caller, and it does
this for every single pathname component that it looks up.

That obviously can be pretty expensive if the filesystem isn't careful
about it, especially with locking. That's doubly sad, since the common
case tends to be that there are no ACL's associated with the files in
question.

ext3 already caches the ACL data so that it doesn't have to look it up
over and over again, but it does so by taking the inode->i_lock spinlock
on every lookup. Which is a noticeable overhead even if it's a private
lock, especially on CPU's where the serialization is expensive (eg Intel
Netburst aka 'P4').

For the special case of not actually having any ACL's, all that locking is
unnecessary. Even if somebody else were to be changing the ACL's on
another CPU, we simply don't care - if we've seen a NULL ACL, we might as
well use it.

So just load the ACL speculatively without any locking, and if it was
NULL, just use it. If it's non-NULL (either because we had a cached
entry, or because the cache hasn't been filled in at all), it means that
we'll need to get the lock and re-load it properly.

This is noticeable even on Nehalem, which does locking quite well (much
better than P4). From lmbench:

	Processor, Processes - times in microseconds - smaller is better
	--------------------------------------------------------------------
	Host                 OS  Mhz null null      open slct fork exec sh
	                             call  I/O stat clos TCP  proc proc proc
	--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
 - before:
	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141
	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140
	nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141
 - after:
	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094
	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123
	nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148

where you can see what appears to be a roughly 3% improvement in stat
and open/close latencies from just the removal of the locking overhead.

Of course, this only matters for files you don't own (the owner never
needs to do the ACL checks), but that's the common case for libraries,
header files, and executables. As well as for the base components of any
absolute pathname, even if you are the owner of the final file.

[ At some point we probably want to move this ACL caching logic entirely
  into the VFS layer (and only call down to the filesystem when
  uncached), but in the meantime this improves ext3 a bit.

  A similar fix to btrfs makes a much bigger difference (15x improvement
  in lmbench) due to broken caching. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-17 00:36:35 -04:00
..
9p Fix a leak in failure exit in 9p ->get_sb() 2009-05-09 10:49:40 -04:00
adfs Fix adfs GET_FRAG_ID() on big-endian 2009-06-11 21:36:14 -04:00
affs affs: add ->sync_fs 2009-06-11 21:36:14 -04:00
afs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
autofs switch follow_down() 2009-06-11 21:36:01 -04:00
autofs4 switch follow_down() 2009-06-11 21:36:01 -04:00
befs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
bfs bfs: add ->sync_fs 2009-06-11 21:36:14 -04:00
btrfs enforce ->sync_fs is only called for rw superblock 2009-06-11 21:36:06 -04:00
cachefiles enforce ->sync_fs is only called for rw superblock 2009-06-11 21:36:06 -04:00
cifs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
coda splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
configfs configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-30 10:48:26 -07:00
cramfs fs/cramfs: return f_fsid for statfs(2) 2009-04-02 19:05:08 -07:00
debugfs debugfs: function to know if debugfs is initialized 2009-03-23 16:25:46 +01:00
devpts devpts: unregister the file system on error 2009-06-11 08:51:06 -07:00
dlm dlm: use more NOFS allocation 2009-05-15 11:24:59 -05:00
ecryptfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
efs fs/efs: return f_fsid for statfs(2) 2009-04-02 19:05:09 -07:00
exofs [SCSI] Merge branch 'linus' 2009-06-12 10:02:03 -05:00
exportfs
ext2 trivial: ext2: fix a typo in comment in ext2.h 2009-06-12 18:01:44 +02:00
ext3 ext3: avoid unnecessary spinlock in critical POSIX ACL path 2009-06-17 00:36:35 -04:00
ext4 Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
fat fat: add ->sync_fs 2009-06-11 21:36:15 -04:00
freevxfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
fscache FS-Cache: Fixup renamed filenames in comments in internal.h 2009-05-27 10:20:13 -07:00
fuse Merge branch 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2009-06-12 09:31:20 -07:00
gfs2 GFS2: Remove lock_kernel from gfs2_put_super() 2009-06-12 13:40:47 +01:00
hfs hfs: add ->sync_fs 2009-06-11 21:36:15 -04:00
hfsplus hfsplus: add ->sync_fs 2009-06-11 21:36:16 -04:00
hostfs constify dentry_operations: misc filesystems 2009-03-27 14:44:00 -04:00
hpfs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
hppfs hppfs: hppfs_read_file() may return -ERROR 2009-04-02 19:04:53 -07:00
hugetlbfs Merge branch 'master' into next 2009-05-22 18:40:59 +10:00
isofs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
jbd jbd: fix race in buffer processing in commit code 2009-06-09 16:59:03 -07:00
jbd2 jbd2: Fix minor typos in comments in fs/jbd2/journal.c 2009-06-09 00:06:20 -04:00
jffs2 jffs2: call jffs2_write_super from jffs2_sync_fs 2009-06-11 21:36:16 -04:00
jfs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
lockd lockd: fix list corruption on lockd restart 2009-05-06 17:19:36 -04:00
minix switch minix to simple_fsync() 2009-06-11 21:36:12 -04:00
ncpfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
nfs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
nfs_common
nfsd switch follow_down() 2009-06-11 21:36:01 -04:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 2009-06-15 09:13:49 -07:00
nls
notify fsnotify: allow groups to set freeing_mark to null 2009-06-11 14:57:55 -04:00
ntfs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
ocfs2 Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
omfs switch omfs to simple_fsync() 2009-06-11 21:36:13 -04:00
openpromfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
partitions Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2009-06-12 09:29:42 -07:00
proc Move junk from proc_fs.h to fs/proc/internal.h 2009-06-11 21:36:01 -04:00
qnx4 fs/qnx4: sanitize includes 2009-06-11 21:36:12 -04:00
quota quota: cleanup dquota sync functions (version 4) 2009-06-11 21:36:04 -04:00
ramfs ramfs: ignore unknown mount options 2009-06-14 17:58:25 -07:00
reiserfs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
romfs ROMFS: romfs_dev_read() error ignored 2009-05-09 10:49:41 -04:00
smbfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
squashfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
sysfs sysfs: file.c: use create_singlethread_workqueue() 2009-05-28 14:24:07 -07:00
sysv sysv: add ->sync_fs 2009-06-11 21:36:16 -04:00
ubifs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
udf switch udf to simple_fsync() 2009-06-11 21:36:13 -04:00
ufs ufs: add ->sync_fs 2009-06-11 21:36:16 -04:00
xfs Merge branch 'master' of git://oss.sgi.com/xfs/xfs into for-linus 2009-06-12 21:28:59 -05:00
aio.c aio: lookup_ioctx can return the wrong value when looking up a bogus context 2009-03-19 15:57:18 -07:00
anon_inodes.c constify dentry_operations: rest 2009-03-27 14:44:03 -04:00
attr.c vfs: Use lowercase names of quota functions 2009-03-26 02:18:35 +01:00
bad_inode.c kill ->dir_notify() 2008-12-31 18:07:43 -05:00
binfmt_aout.c sanitize ifdefs in binfmt_aout 2009-01-03 11:45:54 -08:00
binfmt_elf.c Trim includes in binfmt_elf 2009-03-31 23:00:27 -04:00
binfmt_elf_fdpic.c ptrace: s/parent/real_parent/ in binfmt_elf_fdpic.c 2009-05-02 15:36:10 -07:00
binfmt_em86.c
binfmt_flat.c flat: fix data sections alignment 2009-05-29 08:40:02 -07:00
binfmt_misc.c fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/status 2009-01-06 15:59:19 -08:00
binfmt_script.c
binfmt_som.c Don't crap into descriptor table in binfmt_som 2009-03-31 23:00:28 -04:00
bio-integrity.c block: add private bio_set for bio integrity allocations 2009-03-24 12:35:17 +01:00
bio.c trivial: fix typo in bio_alloc kernel doc 2009-06-12 18:01:47 +02:00
block_dev.c vfs: Rename fsync_super() to sync_filesystem() (version 4) 2009-06-11 21:36:04 -04:00
buffer.c Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block 2009-06-11 11:10:35 -07:00
char_dev.c fs: Remove i_cindex from struct inode 2009-06-11 21:36:09 -04:00
compat.c trivial: fix comment typo in fs/compat.c 2009-06-12 18:01:44 +02:00
compat_binfmt_elf.c
compat_ioctl.c trivial: fix typo compatiable/compatiability has extra 'a'. 2009-06-12 18:01:46 +02:00
dcache.c dcache: extrace and use d_unlinked() 2009-06-11 21:36:06 -04:00
dcookies.c [CVE-2009-0029] System call wrapper special cases 2009-01-14 14:15:18 +01:00
direct-io.c block: Do away with the notion of hardsect_size 2009-05-22 23:22:54 +02:00
drop_caches.c vfs: skip I_CLEAR state inodes 2009-04-02 19:04:48 -07:00
eventfd.c eventfd: export eventfd_signal and eventfd_fget for lguest 2009-06-12 22:27:09 +09:30
eventpoll.c epoll: fix size check in epoll_create() 2009-05-12 14:11:35 -07:00
exec.c Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-06-11 14:01:07 -07:00
fcntl.c dup2: Fix return value with oldfd == newfd and invalid fd 2009-05-11 12:18:06 -07:00
fifo.c
file.c
file_table.c fs: move mark_files_ro into file_table.c 2009-06-11 21:36:02 -04:00
filesystems.c fs: Mark get_filesystem_list() as __init function. 2009-04-20 23:02:52 -04:00
fs-writeback.c fs: block_dump missing dentry locking 2009-06-11 21:36:10 -04:00
fs_struct.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
generic_acl.c New helper - current_umask() 2009-03-31 23:00:26 -04:00
inode.c trivial: fs/inode: Fix typo in file_update_time nanodoc 2009-06-12 18:01:45 +02:00
internal.h Trim a bit of crap from fs.h 2009-06-11 21:36:07 -04:00
ioctl.c fiemap: fix problem with setting FIEMAP_EXTENT_LAST 2009-05-06 16:36:09 -07:00
ioprio.c [CVE-2009-0029] System call wrappers part 28 2009-01-14 14:15:30 +01:00
Kconfig CUSE: implement CUSE - Character device in Userspace 2009-06-09 11:24:11 +02:00
Kconfig.binfmt CORE_DUMP_DEFAULT_ELF_HEADERS depends on ELF_CORE 2009-01-09 16:54:41 -08:00
libfs.c New helper - simple_fsync() 2009-06-11 21:36:11 -04:00
locks.c [CVE-2009-0029] System call wrappers part 16 2009-01-14 14:15:25 +01:00
Makefile nilfs2: update makefile and Kconfig 2009-04-07 08:31:16 -07:00
mbcache.c
mpage.c ext4: Properly initialize the buffer_head state 2009-05-13 15:13:42 -04:00
namei.c switch lookup_mnt() 2009-06-11 21:36:01 -04:00
namespace.c Push BKL down into do_remount_sb() 2009-06-11 21:36:08 -04:00
nfsctl.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
no-block.c
open.c fs: introduce mnt_clone_write 2009-06-11 21:36:02 -04:00
pipe.c splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
pnode.c
pnode.h
posix_acl.c
read_write.c splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
read_write.h
readdir.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
select.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
seq_file.c cpumask: fix seq_bitmap_*() functions. 2009-03-30 22:05:11 +10:30
signalfd.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
splice.c splice: fix kmaps in default_file_splice_write() 2009-05-19 11:37:46 +02:00
stack.c
stat.c kill vfs_stat_fd / vfs_lstat_fd 2009-04-20 23:02:52 -04:00
super.c Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
sync.c remove the call to ->write_super in __sync_filesystem 2009-06-11 21:36:17 -04:00
timerfd.c timerfd: add flags check 2009-02-18 15:37:53 -08:00
utimes.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
xattr.c fs: introduce mnt_clone_write 2009-06-11 21:36:02 -04:00
xattr_acl.c