Commit graph

1123 commits

Author SHA1 Message Date
Linus Torvalds 6c1b8d94bc Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits)
  GFS2: Move all locking inside the inode creation function
  GFS2: Clean up symlink creation
  GFS2: Clean up mkdir
  GFS2: Use UUID field in generic superblock
  GFS2: Rename ops_inode.c to inode.c
  GFS2: Inode.c is empty now, remove it
  GFS2: Move final part of inode.c into super.c
  GFS2: Move most of the remaining inode.c into ops_inode.c
  GFS2: Move gfs2_refresh_inode() and friends into glops.c
  GFS2: Remove gfs2_dinode_print() function
  GFS2: When adding a new dir entry, inc link count if it is a subdir
  GFS2: Make gfs2_dir_del update link count when required
  GFS2: Don't use gfs2_change_nlink in link syscall
  GFS2: Don't use a try lock when promoting to a higher mode
  GFS2: Double check link count under glock
  GFS2: Improve bug trap code in ->releasepage()
  GFS2: Fix ail list traversal
  GFS2: make sure fallocate bytes is a multiple of blksize
  GFS2: Add an AIL writeback tracepoint
  GFS2: Make writeback more responsive to system conditions
  ...
2011-05-20 13:28:45 -07:00
Steven Whitehouse f2741d9898 GFS2: Move all locking inside the inode creation function
Now that there are no longer any exceptions to the normal inode
creation code path, we can move the parts of the locking code
which were duplicated in mkdir/mknod/create/symlink into the
inode create function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 12:11:17 +01:00
Steven Whitehouse 160b4026dc GFS2: Clean up symlink creation
This moves the symlink specific parts of inode creation
into the function where we initialise the rest of the
dinode. As a result we have one less place where we need
to look up the inode's buffer.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 10:34:59 +01:00
Steven Whitehouse e2d0a13bba GFS2: Clean up mkdir
This moves the initialisation of the directory into the inode
creation functions to avoid having to duplicate the lookup
of the inode's buffer.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 09:55:55 +01:00
Steven Whitehouse 32e471ef10 GFS2: Use UUID field in generic superblock
The VFS superblock structure now has a UUID field, so we can use that
in preference to the UUID field in the GFS2 superblock now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 15:01:59 +01:00
Steven Whitehouse 2ab9cd1c63 GFS2: Rename ops_inode.c to inode.c
This is the final part of the ops_inode.c/inode.c reordering. We
are left with a single file called inode.c which now contains
all the inode operations, as expected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 13:12:49 +01:00
Steven Whitehouse 64ea540258 GFS2: Inode.c is empty now, remove it
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 13:09:53 +01:00
Steven Whitehouse 9eed04cd99 GFS2: Move final part of inode.c into super.c
Now inode.c is empty.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:38 +01:00
Steven Whitehouse 194c011fc4 GFS2: Move most of the remaining inode.c into ops_inode.c
This is in preparation to remove inode.c and rename ops_inode.c
to inode.c. Also most of the functions which were left in inode.c
relate to the creation and lookup of inodes. I'm intending to work
on consolidating some of that code, and its easier when its all in
one place.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:14 +01:00
Steven Whitehouse d4b2cf1b05 GFS2: Move gfs2_refresh_inode() and friends into glops.c
Eventually there will only be a single caller of this code, so lets
move it where it can be made static at some future date.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:49 +01:00
Steven Whitehouse 94fb763b1a GFS2: Remove gfs2_dinode_print() function
This function was intended for debugging purposes, but it is not very
useful. If we want to know what is on disk then all we need is a
block number and gfs2_edit can give us much better information about
what is there. Otherwise, if we are interested in what is stored in
the in-core inode, it doesn't help us out there either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:29 +01:00
Steven Whitehouse 3d6ecb7d16 GFS2: When adding a new dir entry, inc link count if it is a subdir
This adds an increment of the link count when we add a new directory
entry, if that entry is itself a directory. This means that we no
longer need separate code to perform this operation.

Now that both adding and removing directory entries automatically
update the parent directory's link count if required, that makes
the code shorter and simpler than before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:43:53 +01:00
Steven Whitehouse 855d23ce26 GFS2: Make gfs2_dir_del update link count when required
When we remove an entry from a directory, we can save ourselves
some trouble if we know the type of the entry in question, since
if it is itself a directory, we can update the link count of the
parent at the same time as removing the directory entry.

In addition this patch also merges the rmdir and unlink code which
was almost identical anyway. This eliminates the calls to remove
the . and .. directory entries on each rmdir (not needed since the
directory will be deallocated, anyway) which was the only thing preventing
passing the dentry to gfs2_dir_del(). The passing of the dentry
rather than just the name allows us to figure out the type of the entry
which is being removed, and thus adjust the link count when required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:42:37 +01:00
Steven Whitehouse 2baee03fb9 GFS2: Don't use gfs2_change_nlink in link syscall
There are three users of gfs2_change_nlink which add to the link
count. Two of these are about to be removed in later patches, so
this means that there will no callers, when that happens allowing
removal of that function, also in a later patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:35:25 +01:00
Steven Whitehouse 588da3b3be GFS2: Don't use a try lock when promoting to a higher mode
Previously we marked all locks being promoted to a higher mode
with the try flag to avoid any potential deadlocks issues. The
DLM is able to detect these and report them in way that GFS2 can
deal with them correctly. So we can just request the required mode
and wait for a response without needing to perform this check.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05 12:36:38 +01:00
Steven Whitehouse d192a8e5c6 GFS2: Double check link count under glock
To avoid any possible races relating to the link count, we need to
recheck it under the inode's glock in all cases where it matters.
Also to ensure we never get any nasty surprises, this patch also
ensures that once the link count has hit zero it can never be
elevated by rereading in data from disk.

The only place we cannot provide a proper solution is in rename
in the case where we are removing a target inode and we discover
that the target inode has been already unlinked on another node.
The race window is very small, and we return EAGAIN in this case
to indicate what has happened. The proper solution would be to move
the lookup parts of rename from the vfs into library calls which
the fs could call directly, but that is potentially a very big job
and this fix should cover most cases for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05 12:35:40 +01:00
Steven Whitehouse 8f065d3650 GFS2: Improve bug trap code in ->releasepage()
If the buffer is dirty or pinned, then as well as printing a
warning, we should also refuse to release the page in
question.

Currently this can occur if there is a race between mmap()ed
writers and O_DIRECT on the same file. With the addition of
->launder_page() in the future, we should be able to close
this gap.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03 11:49:19 +01:00
Steven Whitehouse 4f1de01821 GFS2: Fix ail list traversal
In the recent patches to update the AIL list code, I managed to
forget that the ail list lock got dropped, even though I
added a comment specifically to remind myself :(

Reported-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03 11:48:07 +01:00
Benjamin Marzinski 6905d9e4dd GFS2: make sure fallocate bytes is a multiple of blksize
The GFS2 fallocate code chooses a target size to for allocating chunks of
space.  Whenever it can't find any resource groups with enough space free, it
halves its target. Since this target is in bytes, eventually it will no longer
be a multiple of blksize.  As long as there is more space available in the
resource group than the target, this isn't a problem, since gfs2 will use the
actual space available, which is always a multiple of blksize.  However,
when gfs couldn't fallocate a bigger chunk than the target, it was using the
non-blksize aligned number. This caused a BUG in later code that required
blksize aligned offsets.  GFS2 now ensures that bytes is always a multiple of
blksize

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03 11:47:42 +01:00
Christoph Hellwig 1879fd6a26 add hlist_bl_lock/unlock helpers
Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets.  Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-25 18:14:10 -07:00
Steven Whitehouse c83ae9cad8 GFS2: Add an AIL writeback tracepoint
Add a tracepoint for monitoring writeback of the AIL.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:58 +01:00
Steven Whitehouse 4667a0ec32 GFS2: Make writeback more responsive to system conditions
This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:37 +01:00
Steven Whitehouse f42ab08529 GFS2: Optimise glock lru and end of life inodes
The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:17 +01:00
Steven Whitehouse 627c10b7e4 GFS2: Improve tracing support (adds two flags)
This adds support for two new flags. One keeps track of whether
the glock is on the LRU list or not. The other isn't really a
flag as such, but an indication of whether the glock has an
attached object or not. This indication is reported without
any locking, which is ok since we do not dereference the object
pointer but merely report whether it is NULL or not.

Also, this fixes one place where a tracepoint was missing, which
was at the point we remove deallocated blocks from the journal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:00:59 +01:00
Steven Whitehouse dba898b02d GFS2: Clean up fsync()
This patch is designed to clean up GFS2's fsync
implementation and ensure that it really does get everything on
disk. Since ->write_inode() has been updated, we can call that
via the vfs library function sync_inode_metadata() and the only
remaining thing that has to be done is to ensure that we get
any revoke records in the log after the inode has been written back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:00:41 +01:00
Steven Whitehouse efc1a9c2a7 GFS2: Remove unused macro
The buffer_in_io() macro has been unused for some time,
so remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:00:24 +01:00
Steven Whitehouse 29687a2ac8 GFS2: Alter point of entry to glock lru list for glocks with an address_space
Rather than allowing the glocks to be scheduled for possible
reclaim as soon as they have exited the journal, this patch
delays their entry to the list until the glocks in question
are no longer in use.

This means that we will rely on the vm for writeback of all
dirty data and metadata from now on. When glocks are added
to the lru list they should be freeable much faster since all
the I/O required to free them should have already been completed.

This should lead to much better I/O patterns under low memory
conditions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:59:48 +01:00
Steven Whitehouse 5ac048bb7e GFS2: Use filemap_fdatawrite() to write back the AIL
In order to ensure that the mapping stats (and thus the bdi) are correctly
updated, this patch changes the AIL writeback to use the filemap_datawrite
function. This helps prevent stalls in balance_dirty_pages() due to
large amounts of dirty metadata when there is little or no dirty data
around.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:59:25 +01:00
Steven Whitehouse 1027efaa23 GFS2: Make ->write_inode() really write
The GFS2 ->write_inode function should be more aggressive at writing
back to the filesystem. This adopts the XFS system of returning
-EAGAIN when the writeback has not been completely done. Also, we
now kick off in-place writeback when called with WB_SYNC_NONE,
but we only wait for it and flush the log when WB_SYNC_ALL is
requested.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:55:07 +01:00
Bob Peterson 556bb17998 GFS2: move function foreach_leaf to gfs2_dir_exhash_dealloc
The previous patches made function gfs2_dir_exhash_dealloc do nothing
but call function foreach_leaf.  This patch simplifies the code by
moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:54:44 +01:00
Bob Peterson ec038c826b GFS2: pass leaf_bh into leaf_dealloc
Function foreach_leaf used to look up the leaf block address and get
a buffer_head.  Then it would call leaf_dealloc which did the same
lookup.  This patch combines the two operations by making foreach_leaf
pass the leaf bh to leaf_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:54:26 +01:00
Bob Peterson d24a7a439a GFS2: Combine transaction from gfs2_dir_exhash_dealloc
At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
type to "file" to prevent directory corruption in case of a crash.
It was doing so in its own journal transaction.  This patch makes the
change occur when the last call is make to leaf_dealloc, since it needs
to rewrite the directory dinode at that time anyway.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:56 +01:00
Bob Peterson 0d95326d9b GFS2: remove *leaf_call_t and simplify leaf_dealloc
Since foreach_leaf is only called with leaf_dealloc as its only possible
call function, we can simplify the code by making it call leaf_dealloc
directly.  This simplifies the code and eliminates the need for
leaf_call_t, the generic call method.  This is a first small step in
simplifying the directory leaf deallocation code.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:35 +01:00
Bob Peterson 95c8e17f2f GFS2: Dump better debug info if a bitmap inconsistency is detected
On rare occasions we encounter gfs2 problems where an
invalid bitmap state transition is attempted.  For example,
trying to "unlink" a free block.  In these cases, there
is really no useful information logged to debug the problem.
This patch adds more debug details that should allow us to
more closely examine the problem and possibly solve it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:12 +01:00
Bob Peterson 44ad37d69b GFS2: filesystem hang caused by incorrect lock order
This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING.  The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first.  The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:23:50 +01:00
Steven Whitehouse 001e8e8df4 GFS2: Don't try to deallocate unlinked inodes when mounted ro
This adds a couple of missing tests to avoid read-only nodes
from attempting to deallocate unlinked inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Michel Andre de la Porte <madelaporte@ubi.com>
2011-04-18 15:23:12 +01:00
Benjamin Marzinski 0ee532062f GFS2: directly write blocks past i_size
GFS2 was relying on the writepage code to write out the zeroed data for
fallocate.  However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size.
If it is, it will be ignored.  To work around this, gfs2 now calls
write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE
is set, and it's writing past i_size.

This version is just a cleanup of my last version

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:22:52 +01:00
Bob Peterson deab72d379 GFS2: write_end error path fails to unlock transaction lock
I did an audit of gfs2's transaction glock for bugzilla bug
658619 and ran across this:

In function gfs2_write_end, in the unlikely event that
gfs2_meta_inode_buffer returns an error, the code may forget
to unlock the transaction lock because the "failed" label
appears after the call to function gfs2_trans_end.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:22:35 +01:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds 6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Serge E. Hallyn 2e14967075 userns: rename is_owner_or_cap to inode_owner_or_capable
And give it a kernel-doc comment.

[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:47:13 -07:00
Linus Torvalds a44f99c7ef Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits)
  video: change to new flag variable
  scsi: change to new flag variable
  rtc: change to new flag variable
  rapidio: change to new flag variable
  pps: change to new flag variable
  net: change to new flag variable
  misc: change to new flag variable
  message: change to new flag variable
  memstick: change to new flag variable
  isdn: change to new flag variable
  ieee802154: change to new flag variable
  ide: change to new flag variable
  hwmon: change to new flag variable
  dma: change to new flag variable
  char: change to new flag variable
  fs: change to new flag variable
  xtensa: change to new flag variable
  um: change to new flag variables
  s390: change to new flag variable
  mips: change to new flag variable
  ...

Fix up trivial conflict in drivers/hwmon/Makefile
2011-03-20 18:14:55 -07:00
matt mooney 0ccd234ca0 fs: change to new flag variable
Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y
for cleaner conditional inclusion.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-03-17 14:02:57 +01:00
Linus Torvalds 0f6e0e8448 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
  AppArmor: kill unused macros in lsm.c
  AppArmor: cleanup generated files correctly
  KEYS: Add an iovec version of KEYCTL_INSTANTIATE
  KEYS: Add a new keyctl op to reject a key with a specified error code
  KEYS: Add a key type op to permit the key description to be vetted
  KEYS: Add an RCU payload dereference macro
  AppArmor: Cleanup make file to remove cruft and make it easier to read
  SELinux: implement the new sb_remount LSM hook
  LSM: Pass -o remount options to the LSM
  SELinux: Compute SID for the newly created socket
  SELinux: Socket retains creator role and MLS attribute
  SELinux: Auto-generate security_is_socket_class
  TOMOYO: Fix memory leak upon file open.
  Revert "selinux: simplify ioctl checking"
  selinux: drop unused packet flow permissions
  selinux: Fix packet forwarding checks on postrouting
  selinux: Fix wrong checks for selinux_policycap_netpeer
  selinux: Fix check for xfrm selinux context algorithm
  ima: remove unnecessary call to ima_must_measure
  IMA: remove IMA imbalance checking
  ...
2011-03-16 09:15:43 -07:00
Linus Torvalds 3ae2a1ce2e Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Don't use _raw version of RCU dereference
  GFS2: Adding missing unlock_page()
  GFS2: Update to AIL list locking
  GFS2: introduce AIL lock
  GFS2: fix block allocation check for fallocate
  GFS2: Optimize glock multiple-dequeue code
  GFS2: Remove potential race in flock code
  GFS2: Fix glock deallocation race
  GFS2: quota allows exceeding hard limit
  GFS2: deallocation performance patch
  GFS2: panics on quotacheck update
  GFS2: Improve cluster mmap scalability
  GFS2: Fix glock queue trace point
  GFS2: Post-VFS scale update for RCU path walk
  GFS2: Use RCU for glock hash table
2011-03-16 08:58:43 -07:00
James Morris a002951c97 Merge branch 'next' into for-linus 2011-03-16 09:41:17 +11:00
Steven Whitehouse 7e32d02613 GFS2: Don't use _raw version of RCU dereference
As per RCU glock patch review comments, don't use the _raw
version of this function here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-03-15 08:58:17 +00:00
Maxim 6c474f7bc1 GFS2: Adding missing unlock_page()
gfs2_write_begin() calls grab_cache_page_write_begin() that returns *locked*
page. Correspondent error-handling path lacks for unlock_page() call:

> out:
> 	if (error == 0)
> 		return 0;
>
> 	page_cache_release(page);

The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin()
failed for some reason.

Reported-by: Maxim <maxim.patlasov@gmail.com>
Signed-off-by: Maxim <maxim.patlasov@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-14 13:19:21 +00:00
Aneesh Kumar K.V 5fe0c23788 exportfs: Return the minimum required handle size
The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Steven Whitehouse c618e87a5f GFS2: Update to AIL list locking
The previous patch missed a couple of places where the AIL list
needed locking, so this fixes up those places, plus a comment
is corrected too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
2011-03-14 12:40:29 +00:00
Dave Chinner d6a079e82e GFS2: introduce AIL lock
The log lock is currently used to protect the AIL lists and
the movements of buffers into and out of them. The lists
are self contained and no log specific items outside the
lists are accessed when starting or emptying the AIL lists.

Hence the operation of the AIL does not require the protection
of the log lock so split them out into a new AIL specific lock
to reduce the amount of traffic on the log lock. This will
also reduce the amount of serialisation that occurs when
the gfs2_logd pushes on the AIL to move it forward.

This reduces the impact of log pushing on sequential write
throughput.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11 11:52:25 +00:00
Benjamin Marzinski e4a7b7b0c9 GFS2: fix block allocation check for fallocate
GFS2 fallocate wasn't properly checking if a blocks were already allocated.
In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2
was always treating it as if there were no blocks allocated for that page.
GFS2 now calls gfs2_block_map() to check if the blocks are allocated before
writing them out.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11 09:26:48 +00:00
Bob Peterson fa1bbdea30 GFS2: Optimize glock multiple-dequeue code
This is a small patch that optimizes multiple glock dequeue
operations.  It changes the unlock order to be more efficient
and makes it easier for lock debugging tools to unravel.  It
also eliminates the need for the temp variable x, although
that would likely be optimized out.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11 09:24:54 +00:00
Al Viro 53fe924161 gfs2: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:44:48 -05:00
Jens Axboe 4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe 721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe 7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Steven Whitehouse 0a33443b38 GFS2: Remove potential race in flock code
This patch ensures that we always wait for glock demotion when
dropping flocks on a file in order to prevent any race
conditions associated with further flock calls or closing
the file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09 11:14:32 +00:00
Steven Whitehouse fc0e38dae6 GFS2: Fix glock deallocation race
This patch fixes a race in deallocating glocks which was introduced
in the RCU glock patch. We need to ensure that the glock count is
kept correct even in the case that there is a race to add a new
glock into the hash table. Also, to avoid having to wait for an
RCU grace period, the glock counter can be decremented before
call_rcu() is called.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09 10:58:04 +00:00
Abhijith Das 662e3a551b GFS2: quota allows exceeding hard limit
Immediately after being synced to disk, cached quotas are zeroed out and a
subsequent access of the cached quotas results in incorrect zero values. This
meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage
values it found in the cached quotas and comparison against warn/limits never
triggered a quota violation.

This patch adds a new flag QDF_REFRESH that is set after a sync so that the
cached quotas are forcefully refreshed from disk on a subsequent access on
seeing this flag set.

Resolves: rhbz#675944
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09 09:32:44 +00:00
James Morris fe3fa43039 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next 2011-03-08 11:38:10 +11:00
Bob Peterson 4c16c36ad6 GFS2: deallocation performance patch
This patch is a performance improvement to GFS2's dealloc code.
Rather than update the quota file and statfs file for every
single block that's stripped off in unlink function do_strip,
this patch keeps track and updates them once for every layer
that's stripped.  This is done entirely inside the existing
transaction, so there should be no risk of corruption.
The other functions that deallocate blocks will be unaffected
because they are using wrapper functions that do the same
thing that they do today.

I tested this code on my roth cluster by creating 200
files in a directory, each of which is 100MB, then on
four nodes, I simultaneously deleted the files, thus competing
for GFS2 resources (but different files).  The commands
I used were:

[root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done

The performance increase was significant:

             roth-01     roth-02     roth-03     roth-05
             ---------   ---------   ---------   ---------
old: real    0m34.027    0m25.021s   0m23.906s   0m35.646s
new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s

Total time spent deleting:
old: 118.6s
new:  89.4

For this particular case, this showed a 25% performance increase for
GFS2 unlinks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-24 12:13:48 +00:00
Miklos Szeredi 2aa15890f3 mm: prevent concurrent unmap_mapping_range() on the same inode
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-23 19:52:52 -08:00
Tejun Heo 58a69cb47e workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'.  The former is the more prominent one.  The latter is
mostly used by workqueue and in a few other odd places.  Unify the
spelling to 'freezable'.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
2011-02-16 17:48:59 +01:00
Abhijith Das e79a46a030 GFS2: panics on quotacheck update
Handle block allocation for forceful unstuffing of quota dinode during quota
update using quotactl(). Also fix block reservation for special cases when
quotas cross over block boundaries and update 2 blocks instead of 1.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-07 20:00:44 +00:00
Steven Whitehouse b9c93bb7de GFS2: Improve cluster mmap scalability
The mmap system call grabs a glock when an update to atime maybe
required. It does this in order to ensure that the flags on the
inode are uptodate, but since it will only mark atime for a future
update, an exclusive lock is not required here (one will be taken
later when the actual update is performed).

Also, the lock can be skipped when the mount is marked noatime in
addition to the original check which only looked at the noatime
flag for the inode itself.

This should increase the scalability of the mmap call when multiple
nodes are all mmaping the same file.

Reported-by: Scooter Morris <scooter@cgl.ucsf.edu>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-02 14:48:10 +00:00
Eric Paris 2a7dba391e fs/vfs/security: pass last path component to LSM on inode creation
SELinux would like to implement a new labeling behavior of newly created
inodes.  We currently label new inodes based on the parent and the creating
process.  This new behavior would also take into account the name of the
new object when deciding the new label.  This is not the (supposed) full path,
just the last component of the path.

This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations.  We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly.  This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook.  If no such name
exists it is fine to pass NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>
2011-02-01 11:12:29 -05:00
Steven Whitehouse edae38a643 GFS2: Fix glock queue trace point
Somehow this tracepoint landed up in the wrong place. This moves it
to where it should be.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-31 09:38:12 +00:00
Steven Whitehouse 75d5cfbe4b GFS2: Post-VFS scale update for RCU path walk
We can allow a few more cases to use RCU path walking than
originally allowed. It should be possible to also enable
RCU path walking when the glock is already cached. Thats
a bit more complicated though, so left for a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@gmail.com>
2011-01-21 09:39:24 +00:00
Steven Whitehouse bc015cb841 GFS2: Use RCU for glock hash table
This has a number of advantages:

 - Reduces contention on the hash table lock
 - Makes the code smaller and simpler
 - Should speed up glock dumps when under load
 - Removes ref count changing in examine_bucket
 - No longer need hash chain lock in glock_put() in common case

There are some further changes which this enables and which
we may do in the future. One is to look at using SLAB_RCU,
and another is to look at using a per-cpu counter for the
per-sb glock counter, since that is touched twice in the
lifetime of each glock (but only used at umount time).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-01-21 09:39:08 +00:00
Steven Whitehouse 24d9765fc1 GFS2: Fix error path in gfs2_lookup_by_inum()
In the (impossible, except if there is fs corruption) error path
in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
fails, it was leaving the function by calling iput() rather
than iget_failed(). This would cause future lookups of the same
inode to block forever.

This patch fixes the problem by moving the call to gfs2_inode_refresh()
into gfs2_inode_lookup() where iget_failed() is part of the error path
already. Also this cleans up some unreachable code and makes
gfs2_set_iop() static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-18 14:49:08 +00:00
Benjamin Marzinski 23c3010808 GFS2: remove iopen glocks from cache on failed deletes
When a file gets deleted on GFS2, if a node can't get an exclusive lock on the
file's iopen glock, it punts on actually freeing up the space, because another
node is using the file.  When it does this, it needs to drop the iopen glock
from its cache so that the other node can get an exclusive lock on it. Now,
gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the
iopen glock in preparation for grabbing it in the exclusive state.  Since the
node needs the glock in the exclusive state, dropping the shared lock from the
cache doesn't slow down the case where no other nodes are using the file.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-18 14:28:29 +00:00
Christoph Hellwig 2fe17c1075 fallocate should be a file operation
Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes.  On the
other hand always commiting the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions.   Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and remove the already unnedded S_ISDIR checks given that we only wire
up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 02:25:31 -05:00
Christoph Hellwig 64c23e8687 make the feature checks in ->fallocate future proof
Instead of various home grown checks that might need updates for new
flags just check for any bit outside the mask of the features supported
by the filesystem.  This makes the check future proof for any newly
added flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 02:25:30 -05:00
Linus Torvalds 275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Josef Bacik 9ecf639a96 Gfs2: fail if we try to use hole punch
Gfs2 doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate.  This support can
be added later.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:44 -05:00
Al Viro 41ced6dcf3 switch gfs2, close races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:46 -05:00
Alexey Dobriyan 57cc7215b7 headers: kobject.h redux
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-10 08:51:44 -08:00
Linus Torvalds b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Nick Piggin b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin 34286d6662 fs: rcu-walk aware d_revalidate method
Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin b1e6a015a5 fs: change d_hash for rcu-walk
Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:20 +11:00
Nick Piggin fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Steven Whitehouse 846f404552 GFS2: Don't flush delete workqueue when releasing the transaction lock
There is no requirement to flush the delete workqueue before a
gfs2 filesystem is suspended. The workqueue's work will just
be suspended along with the rest of the tasks on the filesystem.

The resolves a deadlock situation where the transaction lock's
demotion code was trying to flush the delete workqueue while at
the same time, the workqueue was waiting for the transaction
lock.

The delete workqueue is flushed by gfs2_make_fs_ro() already, so
that umount/remount are correctly protected anyway.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-12-16 15:18:48 +00:00
Bob Peterson bcd7278d8a GFS2: fsck.gfs2 reported statfs error after gfs2_grow
When you do gfs2_grow it failed to take the very last
rgrp into account when adding up the new free space due
to an off-by-one error.  It was not reading the last
rgrp from the rindex because of a check for "<=" that
should have been "<".  Therefore, fsck.gfs2 was finding
(and fixing) an error with the system statfs file.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2010-12-07 18:55:07 +00:00
Steven Whitehouse 47a25380e3 GFS2: Merge glock state fields into a bitfield
We can only merge the fields into a bitfield if the locking
rules for them are the same. In this case gl_spin covers all
of the fields (write side) but a couple of them are used
with GLF_LOCK as the read side lock, which should be ok
since we know that the field in question won't be changing
at the time.

The gl_req setting has to be done earlier (in glock.c) in order
to place it under gl_spin. The gl_reply setting also has to be
brought under gl_spin in order to comply with the new rules.

This saves 4*sizeof(unsigned int) per glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2010-11-30 15:49:31 +00:00
Steven Whitehouse e06dfc4928 GFS2: Fix uninitialised error value in previous patch
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 15:46:02 +00:00
Benjamin Marzinski 086d8334cf GFS2: fix recursive locking during rindex truncates
When you truncate the rindex file, you need to avoid calling gfs2_rindex_hold,
since you already hold it.  However, if you haven't already read in the
resource groups, you need to do that.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 15:41:54 +00:00
Benjamin Marzinski 0489b3f5eb GFS2: reread rindex when necessary to grow rindex
When GFS2 grew the filesystem, it was never rereading the rindex file during
the grow. This is necessary for large grows when the filesystem is almost full,
and GFS2 needs to use some of the space allocated earlier in the grow to
complete it.  Now, if GFS2 fails to reserve the necessary space and the rindex
file is not uptodate, it rereads it.  Also, the only difference between
gfs2_ri_update() and gfs2_ri_update_special() was that gfs2_ri_update_special()
didn't clear out the existing resource groups, since you knew that it was only
called when there were no resource groups.  Attempting to clear out the
resource groups when there are none takes almost no time, and rarely happens,
so I simply removed gfs2_ri_update_special().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 15:34:18 +00:00
Steven Whitehouse 0b1246e677 GFS2: Remove duplicate #defines from glock.h
There are a number of duplicated #defines in glock.h
plus one which is unused. This removes the extra
definitions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 15:33:04 +00:00
Steven Whitehouse 921169ca2f GFS2: Clean up of gdlm_lock function
The DLM never returns -EAGAIN in response to dlm_lock(), and even
if it did, the test in gdlm_lock() was wrong anyway. Once that
test is removed, it is possible to greatly simplify this code
by simply using a "normal" error return code (0 for success).

We then no longer need the LM_OUT_ASYNC return code which can
be removed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:31:48 +00:00
Abhijith Das 802ec9b668 GFS2: Allow gfs2 to update quota usage values through the quotactl interface
With this patch the gfs2_set_dqblk() function will be able to update the
quota usage block count (FS_DQ_BCOUNT) in addition to the already supported
FS_DQ_BHARD (limit) and FS_DQ_BSOFT (warn) fields of the dquot structure.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:31:27 +00:00
Joe Perches edc221d00b GFS2: fs/gfs2/glock.h: Add __attribute__((format(printf,2,3)) to gfs2_print_dbg
Functions that use printf formatting, especially
those that use %pV, should have their uses of
printf format and arguments checked by the compiler.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:31:05 +00:00
Joe Perches 5e69069c1a GFS2: fs/gfs2/glock.c: Use printf extension %pV
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:30:41 +00:00
Steven Whitehouse 2ae51ed7b5 GFS2: Clean up duplicated setattr code
While preparing the last patch I noticed that the gfs2_setattr_simple
code had been duplicated into two other places. This patch updates
those to call gfs2_setattr_simple rather than open coding it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:30:19 +00:00
Steven Whitehouse 9e55cd5372 GFS2: Remove unreachable calls to vmtruncate
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:22:48 +00:00
Joe Perches cc18152eb7 GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pS
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:22:19 +00:00
Steven Whitehouse d2115778c7 GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIM
The WQ_RESCUER flag should only be used internally to the
workqueue implementation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
2010-11-30 10:21:55 +00:00