Commit graph

4688 commits

Author SHA1 Message Date
Nick Piggin
87df7241bd [PATCH] Fix try_to_free_buffer() locking
Fix commit ecdfc9787f

Not to put too fine a point on it, but in a nutshell...

	__set_page_dirty_buffers() | try_to_free_buffers()
	---------------------------+---------------------------
	                           | spin_lock(private_lock);
	                           | drop_bufers()
	                           | spin_unlock(private_lock);
	spin_lock(private_lock)    |
	!page_has_buffers()        |
	spin_unlock(private_lock)  |
	SetPageDirty()             |
	                           | cancel_dirty_page()

                          oops!

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-29 20:20:42 -08:00
Mark Fasheh
a8a75a20e9 [PATCH] ocfs2: fix thinko in ocfs2_backup_super_blkno()
Fix a bug which was introduced when I synced up ocfs2_fs.h with ocfs2-tools.
We can't do u64/u32 in kernel.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 14:53:27 -08:00
Alexey Dobriyan
1fb8449618 [PATCH] core-dumping unreadable binaries via PT_INTERP
Proposed patch to fix #5 in
http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt
aka
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073

To reproduce, do
* grab poc at the end of advisory.
* add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;"
  where first "4096" is something equal to or greater than 4096.
* ./poc /usr/bin/sudo && ls -l

Here I get with 2.6.20-rc5:

 -rw------- 1 ad   ad   102400 2007-01-15 19:17 core
 ---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudo

Check for MAY_READ like binfmt_misc.c does.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00
NeilBrown
a0ad13ef64 [PATCH] knfsd: Fix type mismatch with filldir_t used by nfsd
nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'.  It
then casts encode_dent_fn points to 'filldir_t' as needed.  This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.

So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now.  Less internal (to nfsd) checking
offset by more external checking, which is more important.

Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.

Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00
Eric Van Hensbergen
e540eb45a5 [PATCH] 9p: null terminate error strings for debug print
We weren't properly NULL terminating protocol error strings for our debug
printk resulting in garbage being included in the output when debug was
enabled.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00
Eric Van Hensbergen
da977b2c7e [PATCH] 9p: fix segfault caused by race condition in meta-data operations
Running dbench multithreaded exposed a race condition where fid structures
were removed while in use.  This patch adds semaphores to meta-data operations
to protect the fid structure.  Some cleanup of error-case handling in the
inode operations is also included.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00
Eric Van Hensbergen
621997cd39 [PATCH] 9p: fix rename return code
9p doesn't handle renames between directories -- however, we were returning
EPERM instead of EXDEV when we detected this case.

Signed-off-by: Eric Van Hensbergren <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
Eric Van Hensbergen
f94b347059 [PATCH] 9p: fix bogus return code checks during initialization
There is a simple logic error in init_v9fs - the return code checks are
reversed.  This patch fixes the return code and adds some messages to prevent
module initialization from failing silently.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
Peter Staubach
c397852c3d [PATCH] knfsd: Don't mess with the 'mode' when storing a exclusive-create cookie
NFS V3 (and V4) support exclusive create by passing a 'cookie' which can get
stored with the file.  If the file exists but has exactly the right cookie
stored, then we assume this is a retransmit and the exclusive create was
successful.

The cookie is 64bits and is traditionally stored in the mtime and atime
fields.  This causes a problem with Solaris7 as negative mtime or atime
confuse it.  So we moved two bits into the mode word instead.

But inherited ACLs sometimes overwrite the mode word on create, so this is a
problem.

So we give up and just store 62 of the 64 bits and assume that is close
enough.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
NeilBrown
250f391518 [PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads
NFSd assumes that largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request.  The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.

However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply.  This can overflow and array and cause an Oops.

This patch increases size of the array for holding pages by one and makes sure
that entry is NULL when it is not in use.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
NeilBrown
1a8eff6d97 [PATCH] knfsd: fix setting of ACL server versions
Due to silly typos, if the nfs versions are explicitly set, no NFSACL versions
get enabled.

Also improve an error message that would have made this bug a little easier to
find.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
Alexey Dobriyan
863c47028e [PATCH] Fix NULL ->nsproxy dereference in /proc/*/mounts
/proc/*/mounstats was fixed, all right, but...

To reproduce:

	while true; do
		find /proc -type f 2>/dev/null | xargs cat 1>/dev/null 2>/dev/null;
	done

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
c01754df
*pde = 00000000
Oops: 0000 [#28]
Modules linked in: af_packet ohci_hcd e1000 ehci_hcd uhci_hcd usbcore xfs
CPU:    0
EIP:    0060:[<c01754df>]    Not tainted VLI
EFLAGS: 00010286   (2.6.20-rc5 #1)
EIP is at mounts_open+0x1c/0xac
eax: 00000000   ebx: d5898ac0   ecx: d1d27b18   edx: d1d27a50
esi: e6083e10   edi: d3c87f38   ebp: d5898ac0   esp: d3c87ef0
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 18071, ti=d3c86000 task=f7d5f070 task.ti=d3c86000)
Stack: d5898ac0 e6083e10 d3c87f38 c01754c3 c0147c91 c18c52c0 d343f314 d5898ac0
       00008000 d3c87f38 ffffff9c c0147e09 d5898ac0 00000000 00000000 c0147e4b
       00000000 d3c87f38 d343f314 c18c52c0 c015e53e 00001000 08051000 00000101
Call Trace:
 [<c01754c3>] mounts_open+0x0/0xac
 [<c0147c91>] __dentry_open+0xa1/0x18c
 [<c0147e09>] nameidata_to_filp+0x31/0x3a
 [<c0147e4b>] do_filp_open+0x39/0x40
 [<c015e53e>] seq_read+0x128/0x2aa
 [<c0147e8c>] do_sys_open+0x3a/0x6d
 [<c0147efa>] sys_open+0x1c/0x20
 [<c0102b76>] sysenter_past_esp+0x5f/0x85
 [<c02a0033>] unix_stream_recvmsg+0x3bf/0x4bf
 =======================
Code: 5d c3 89 d8 e8 06 e0 f9 ff eb bd 0f 0b eb fe 55 57 56 53 89 d5 8b 40 f0 31 d2 e8 02 c1 fa ff 89 c2 85 c0 74 5c 8b 80 48 04 00 00 <8b> 58 0c 85 db 74 02 ff 03 ff 4a 08 0f 94 c0 84 c0 75 74 85 db
EIP: [<c01754df>] mounts_open+0x1c/0xac SS:ESP 0068:d3c87ef0

A race with do_exit()'s call to exit_namespaces().

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
Roland McGrath
f47aef55d9 [PATCH] i386 vDSO: use VM_ALWAYSDUMP
This patch fixes core dumps to include the vDSO vma, which is left out now.
It removes the special-case core writing macros, which were not doing the
right thing for the vDSO vma anyway.  Instead, it uses VM_ALWAYSDUMP in the
vma; there is no need for the fixmap page to be installed.  It handles the
CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
get_gate_vma after real vmas in the same way the /proc/PID/maps code does.

This changes core dumps so they no longer include the non-PT_LOAD phdrs from
the vDSO.  I made the change to add them in the first place, but in turned out
that nothing ever wanted them there since the advent of NT_AUXV.  It's cleaner
to leave them out, and just let the phdrs inside the vDSO image speak for
themselves.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
Roland McGrath
e5b97dde51 [PATCH] Add VM_ALWAYSDUMP
This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct.  This
provides a clean explicit way to have a vma always included in core dumps, as
is needed for vDSO's.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
Linus Torvalds
4b89eed93e Write back inode data pages even when the inode itself is locked
In __writeback_single_inode(), when we find a locked inode and we're not
doing a data-integrity sync, we used to just skip writing entirely,
since we didn't want to wait for the inode to unlock.

However, there's really no reason to skip writing the data pages, which
are likely to be the the bulk of the dirty state anyway (and the main
reason why writeback was started for the non-data-integrity case, of
course!)

Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andrew Morton <akpm@osdl.org>,
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 12:53:20 -08:00
Linus Torvalds
ecdfc9787f Resurrect 'try_to_free_buffers()' VM hackery
It's not pretty, but it appears that ext3 with data=journal will clean
pages without ever actually telling the VM that they are clean.  This,
in turn, will result in the VM (and balance_dirty_pages() in particular)
to never realize that the pages got cleaned, and wait forever for an
event that already happened.

Technically, this seems to be a problem with ext3 itself, but it used to
be hidden by 'try_to_free_buffers()' noticing this situation on its own,
and just working around the filesystem problem.

This commit re-instates that hack, in order to avoid a regression for
the 2.6.20 release. This fixes bugzilla 7844:

	http://bugzilla.kernel.org/show_bug.cgi?id=7844

Peter Zijlstra points out that we should probably retain the debugging
code that this removes from cancel_dirty_page(), and I agree, but for
the imminent release we might as well just silence the warning too
(since it's not a new bug: anything that triggers that warning has been
around forever).

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 12:47:06 -08:00
Trond Myklebust
717d44e849 [PATCH] NFS: Fix races in nfs_revalidate_mapping()
Prevent the call to invalidate_inode_pages2() from racing with file writes
by taking the inode->i_mutex across the page cache flush and invalidate.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-24 12:31:06 -08:00
Linus Torvalds
5394cd2187 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Fix oops when Windows server sent bad domain name null terminator
  [CIFS]  cifs sprintf fix
  [CIFS] Remove 2 unneeded kzalloc casts
  [CIFS] Update CIFS version number
2007-01-24 09:46:54 -08:00
Vladimir Saveliev
de14569f94 [PATCH] resierfs: avoid tail packing if an inode was ever mmapped
This patch fixes a confusion reiserfs has for a long time.

On release file operation reiserfs used to try to pack file data stored in
last incomplete page of some files into metadata blocks.  After packing the
page got cleared with clear_page_dirty.  It did not take into account that
the page may be mmaped into other process's address space.  Recent
replacement for clear_page_dirty cancel_dirty_page found the confusion with
sanity check that page has to be not mapped.

The patch fixes the confusion by making reiserfs avoid tail packing if an
inode was ever mmapped.  reiserfs_mmap and reiserfs_file_release are
serialized with mutex in reiserfs specific inode.  reiserfs_mmap locks the
mutex and sets a bit in reiserfs specific inode flags.
reiserfs_file_release checks the bit having the mutex locked.  If bit is
set - tail packing is avoided.  This eliminates a possibility that mmapped
page gets cancel_page_dirty-ed.

Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:06 -08:00
Chen, Kenneth W
cda9205da2 [PATCH] fix blk_direct_IO bio preparation
For large size DIO that needs multiple bio, one full page worth of data was
lost at the boundary of bio's maximum sector or segment limits.  After a
bio is full and got submitted.  The outer while (nbytes) { ...  } loop will
allocate a new bio and just march on to index into next page.  It just
forgets about the page that bio_add_page() rejected when previous bio is
full.  Fix it by put the rejected page back to pvec so we pick it up again
for the next bio.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:06 -08:00
Andrew Morton
790816dd54 [PATCH] blockdev direct_io: fix signedness bug
size_t is unsigned.  IO errors aren't getting through.

Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:05 -08:00
Linus Torvalds
ebcccd14b7 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (84 commits)
  [JFFS2] debug.h: include <linux/sched.h> for current->pid
  [MTD] OneNAND: Handle DDP chip boundary during read-while-load
  [MTD] OneNAND: return ecc error code only when 2-bit ecc occurs
  [MTD] OneNAND: Implement read-while-load
  [MTD] OneNAND: fix onenand_wait bug in read ecc error
  [MTD] OneNAND: release CPU in cycles
  [MTD] OneNAND: add subpage write support
  [MTD] OneNAND: fix onenand_wait bug
  [JFFS2] use the ref_offset macro
  [JFFS2] Reschedule in loops
  [JFFS2] Fix error-path leak in summary scan
  [JFFS2] add cond_resched() when garbage collecting deletion dirent
  [MTD] Nuke IVR leftovers
  [MTD] OneNAND: fix oob handling in recent oob patch
  [MTD] Fix ssfdc blksize typo
  [JFFS2] replace kmalloc+memset with kzalloc
  [MTD] Fix SSFDC build for variable blocksize.
  [MTD] ESB2ROM uses PCI
  [MTD] of_device-based physmap driver
  [MTD] Support combined RedBoot FIS directory and configuration area
  ...
2007-01-22 19:32:13 -08:00
Linus Torvalds
2596627c5c Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Add backup superblock info to ocfs2_fs.h
  ocfs2: cleanup ocfs2_iget() errors
  ocfs2: Directory c/mtime update fixes
  ocfs2: Don't print errors when following symlinks
2007-01-22 11:33:40 -08:00
Steve French
8e6f195af0 [CIFS] Fix oops when Windows server sent bad domain name null terminator
Fixes RedHat bug 211672

Windows sends one byte (instead of two) of null to terminate final Unicode
string (domain name) in session setup response in some cases - this caused
cifs to misalign some informational strings (making it hard to convert
from UCS16 to UTF8).

Thanks to Shaggy for his help and Akemi Yagi for debugging/testing

Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-01-22 01:19:30 +00:00
Mark Fasheh
50af94b14c ocfs2: Add backup superblock info to ocfs2_fs.h
This synchronizes us with recent ocfs2-tools changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:20:10 -08:00
Mark Fasheh
6a1bd4a578 ocfs2: cleanup ocfs2_iget() errors
Get rid of some error prints in the ocfs2_iget() path from
ocfs2_get_dentry(). NFSD can easily cause us to read stale inodes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:19:12 -08:00
Mark Fasheh
592282cf2e ocfs2: Directory c/mtime update fixes
ocfs2 wasn't updating c/mtime on directories during dirent
creation/deletion. Fix ocfs2_unlink(), ocfs2_rename() and
__ocfs2_add_entry() by adding the proper code to update the struct inode and
push the change out to disk.

This helps rename/unlink on nfs exported file systems in particular as those
clients compare directory time values to avoid a full re-reading a directory
which hasn't changed.

ocfs2_rename() loses some superfluous error handling as a result of this
patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:18:49 -08:00
Mark Fasheh
72bce5078d ocfs2: Don't print errors when following symlinks
We shouldn't print errors returned from vfs_follow_link(). This was causing
spurious errors to show up in the logs.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:18:14 -08:00
Steve French
bd2abf177b [CIFS] cifs sprintf fix
Cc: <alert7@xfocus.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-01-21 23:19:01 +00:00
Steve French
76849e3e97 [CIFS] Remove 2 unneeded kzalloc casts
Signed-off-by: Ahmed Darwish <darwish.07@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-01-21 22:56:22 +00:00
Anton Altaparmakov
bd62b23cbc NTFS: Forgot to bump version number in makefile to 2.1.28...
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2007-01-18 10:28:18 +00:00
Anton Altaparmakov
8331191e56 NTFS: 2.1.28 - Fix deadlock reported by Sergey Vlasov due to ntfs_put_inode().
- Fix deadlock in fs/ntfs/inode.c::ntfs_put_inode().  Thanks to Sergey
  Vlasov for the report and detailed analysis of the deadlock.  The fix
  involved getting rid of ntfs_put_inode() altogether and hence NTFS no
  longer has a ->put_inode super operation.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2007-01-18 09:42:48 +00:00
David Woodhouse
9cdf083f98 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-01-18 10:34:51 +11:00
David Woodhouse
e499e01d23 [JFFS2] debug.h: include <linux/sched.h> for current->pid
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-01-13 08:19:03 +08:00
David Chinner
f73ca1b76c [PATCH] Revert bd_mount_mutex back to a semaphore
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

(XFS unlocks the semaphore from a different task, by design.  The mutex
code warns about this)

Signed-off-by: Dave Chinner <dgc@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:21 -08:00
Trond Myklebust
e3db7691e9 [PATCH] NFS: Fix race in nfs_release_page()
NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    Fix is to add a new address_space_operation, launder_page(), which will
    attempt to write out a dirty page without releasing the page lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

    Also, the bare SetPageDirty() can skew all sort of accounting leading to
    other nasties.

[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:21 -08:00
Roman Zippel
3eb3c740f5 [PATCH] fix linux banner format string
Revert previous attempts at messing with the linux banner string and
simply use a separate format string for proc.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-10 09:33:59 -08:00
Kyungmin Park
abb536e7ac [JFFS2] use the ref_offset macro
Don't use ref->flash_offset directly in debugging code, use the ref_offset macro instead.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
2007-01-10 14:03:20 +02:00
Artem Bityutskiy
a2166b933e [JFFS2] Reschedule in loops
Make JFFS2 nicer and teach it to call cond_resched() in loops
which may be quite large.

Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
2007-01-10 14:01:00 +02:00
Linus Torvalds
90cb28e8f7 Revert "[PATCH] binfmt_elf: randomize PIE binaries (2nd try)"
This reverts commit 59287c0913.

Hugh Dickins reports that it causes random failures on x86 with SuSE
10.2, and points out

  "Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE,
   sure to place the ET_DYN from time to time just where the comment
   says it's trying to avoid? I assume that somehow results in the error
   reported."

(where the comment in question is the existing comment in the source
code about mmap/brk clashes).

Suggested-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Marcus Meissner <meissner@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 13:28:21 -08:00
Evgeniy Dushistov
d63b70902b [PATCH] fix garbage instead of zeroes in UFS
Looks like this is the problem, which point Al Viro some time ago:

ufs's get_block callback allocates 16k of disk at a time, and links that
entire 16k into the file's metadata.  But because get_block is called for only
a single buffer_head (a 2k buffer_head in this case?) we are only able to tell
the VFS that this 2k is buffer_new().

So when ufs_getfrag_block() is later called to map some more data in the file,
and when that data resides within the remaining 14k of this fragment,
ufs_getfrag_block() will incorrectly return a !buffer_new() buffer_head.

I don't see _right_ way to do nullification of whole block, if use inode
page cache, some pages may be outside of inode limits (inode size), and
will be lost; if use blockdev page cache it is possible to zero real data,
if later inode page cache will be used.

The simpliest way, as can I see usage of block device page cache, but not only
mark dirty, but also sync it during "nullification".  I use my simple tests
collection, which I used for check that create,open,write,read,close works on
ufs, and I see that this patch makes ufs code 18% slower then before.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:29 -08:00
Eric Sandeen
be6aab0e9f [PATCH] fix memory corruption from misinterpreted bad_inode_ops return values
CVE-2006-5753 is for a case where an inode can be marked bad, switching
the ops to bad_inode_ops, which are all connected as:

static int return_EIO(void)
{
        return -EIO;
}

#define EIO_ERROR ((void *) (return_EIO))

static struct inode_operations bad_inode_ops =
{
        .create         = bad_inode_create
...etc...

The problem here is that the void cast causes return types to not be
promoted, and for ops such as listxattr which expect more than 32 bits of
return value, the 32-bit -EIO is interpreted as a large positive 64-bit
number, i.e. 0x00000000fffffffa instead of 0xfffffffa.

This goes particularly badly when the return value is taken as a number of
bytes to copy into, say, a user's buffer for example...

I originally had coded up the fix by creating a return_EIO_<TYPE> macro
for each return type, like this:

static int return_EIO_int(void)
{
	return -EIO;
}
#define EIO_ERROR_INT ((void *) (return_EIO_int))

static struct inode_operations bad_inode_ops =
{
	.create		= EIO_ERROR_INT,
...etc...

but Al felt that it was probably better to create an EIO-returner for each
actual op signature.  Since so few ops share a signature, I just went ahead
& created an EIO function for each individual file & inode op that returns
a value.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:23 -08:00
James Bursa
3223ea8cca [PATCH] adfs: fix filename handling
Fix filenames on adfs discs being terminated at the first character greater
than 128 (adfs filenames are Latin 1).  I saw this problem when using a
loopback adfs image on a 2.6.17-rc5 x86_64 machine, and the patch fixed it
there.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:22 -08:00
Amit Choudhary
85de3d9bc7 [JFFS2] Fix error-path leak in summary scan
Signed-off-by: Amit Choudhary <amit2030@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-01-02 21:16:10 +00:00
Linus Torvalds
bfff6e92a3 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: export heartbeat thread pid via configfs
  ocfs2: always unmap in ocfs2_data_convert_worker()
  ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()
  ocfs2: Allow direct I/O read past end of file
  ocfs2: don't print error in ocfs2_permission()
2006-12-30 12:02:53 -08:00
Dimitri Gorokhovik
131612dfe7 [PATCH] ramfs breaks without CONFIG_BLOCK
ramfs doesn't provide the .set_dirty_page a_op, and when the BLOCK layer is
not configured in, 'set_page_dirty' makes a call via a NULL pointer.

Signed-off-by: Dimitri Gorokhovik <dimitri.gorokhovik@free.fr>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:42 -08:00
Zach Brown
1ebb1101c5 [PATCH] Fix lock inversion aio_kick_handler()
lockdep found a AB BC CA lock inversion in retry-based AIO:

1) The task struct's alloc_lock (A) is acquired in process context with
   interrupts enabled.  An interrupt might arrive and call wake_up() which
   grabs the wait queue's q->lock (B).

2) When performing retry-based AIO the AIO core registers
   aio_wake_function() as the wake funtion for iocb->ki_wait.  It is called
   with the wait queue's q->lock (B) held and then tries to add the iocb to
   the run list after acquiring the ctx_lock (C).

3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
   alloc_lock (A) via lock_task() and unuse_mm().  Lockdep emits a warning
   saying that we're trying to connect the irq-safe q->lock to the
   irq-unsafe alloc_lock via ctx_lock.

This fixes the inversion by calling unuse_mm() in the AIO kick handing path
after we've released the ctx_lock.  As Ben LaHaise pointed out __put_ioctx
could set ctx->mm to NULL, so we must only access ctx->mm while we have the
lock.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:55:54 -08:00
Zhen Wei
92efc15241 ocfs2: export heartbeat thread pid via configfs
The patch allows the ocfs2 heartbeat thread to prioritize I/O which may
help cut down on spurious fencing. Most of this will be in the tools -
we can have a pid configfs attribute and let userspace (ocfs2_hb_ctl)
calls the ioprio_set syscall after starting heartbeat, but only cfq
scheduler supports I/O priorities now.

Signed-off-by: Zhen Wei <zwei@novell.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:40:32 -08:00
Mark Fasheh
7f4a2a97e3 ocfs2: always unmap in ocfs2_data_convert_worker()
Mmap-heavy clustered workloads were sometimes finding stale data on mmap
reads. The solution is to call unmap_mapping_range() on any down convert of
a data lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:59 -08:00
Mark Fasheh
6c2aad0567 ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()
This can come from NFSD.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:32 -08:00