Commit graph

964 commits

Author SHA1 Message Date
Rick Macklem f92bbff248 Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.

Reviewed by:	jh
MFC after:	2 weeks
2010-07-24 22:11:11 +00:00
John Baldwin 3c497facfb Retire the NFS access cache timestamp structure. It was used in VOP_OPEN()
to avoid sending multiple ACCESS/GETATTR RPCs during a single open()
between VOP_LOOKUP() and VOP_OPEN().  Now we always send the RPC in
VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be
sent.

MFC after:	2 weeks
2010-07-15 19:40:48 +00:00
John Baldwin 72ccfc240b A previous change moved the GETATTR RPC for open() calls that hit in the
name cache up into nfs_lookup() instead of nfs_open().  Continue this
trend by flushing the attribute cache for leaf nodes in nfs_lookup() during
an open() if we do a LOOKUP RPC.  For NFSv3 this should generally be a NOP
as the attributes are flushed before fetching the post-op attributes from
the LOOKUP RPC which most (all?) NFSv3 servers provide, so the post-op
attributes should populate the cache.

Now all NFS open() calls will always clear the cached attributes during the
nfs_lookup() prior to nfs_open() in the !NMODIFIED case to provide CTOC.
As a result, we can remove the conditional flushing of the attribute
cache from nfs_open().

Reviewed by:	rmacklem, bde
MFC after:	2 weeks
2010-07-12 14:27:49 +00:00
John Baldwin 985fda3dfe - Add missing locking around flushing of an NFS node's attribute cache
in the NMODIFIED case of nfs_open().
- Cosmetic tweak to simplify an expression in nfs_lookup().

Reviewed by:	rmacklem, bde
MFC after:	1 week
2010-07-12 14:19:23 +00:00
Konstantin Belousov b38f7723eb In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by:	Mikolaj Golub <to.my.trociny gmail com>
MFC after:	2 weeks
2010-06-13 05:24:27 +00:00
Xin LI 76af820152 Fix build: newnp represents newvp so KDTRACE_NFS_ATTRCACHE_FLUSH_DONE()
on newvp instead of vp here.
2010-05-27 22:59:37 +00:00
John Baldwin b367632ec2 More gracefully handle stale file handles and attributes when opening a
file via NFS.  Specifically, to satisfy close-to-open-consistency, the NFS
client always performs at least one RPC on a file during an open(2) to see
if the file has changed.  Normally this RPC is an ACCESS or GETATTR RPC
that is forced by flushing a file's attribute cache during nfs_open() and
then requesting new attributes.  However, if the file is noticed to be
stale during nfs_open(), the only recourse is to fail the open(2) call
with ESTALE.  On the other hand, if the ACCESS or GETATTR RPC is sent
during nfs_lookup(), then the NFS client can fall back to a LOOKUP RPC to
obtain the new file handle in the case that a file has been replaced.

This change causes the NFS client to flush the attribute cache during
nfs_lookup() when validating a name cache hit if the attributes fetched
during nfs_lookup() can be reused in nfs_open().  This allows the client
to open a replaced file via the new file handle the first time that it
notices a replaced file rather than failing with ESTALE in some cases.

Reviewed by:	rmacklem, bde
Reviewed by:	mohans (older version)
MFC after:	1 week
2010-05-27 18:07:20 +00:00
Colin Percival 8fd6c56d29 Change the current working directory to be inside the jail created by
the jail(8) command. [10:04]

Fix a one-NUL-byte buffer overflow in libopie. [10:05]

Correctly sanity-check a buffer length in nfs mount. [10:06]

Approved by:	so (cperciva)
Approved by:	re (kensmith)
Security:	FreeBSD-SA-10:04.jail
Security:	FreeBSD-SA-10:05.opie
Security:	FreeBSD-SA-10:06.nfsclient
2010-05-27 03:15:04 +00:00
Alan Cox 03679e2334 Push down the page queues lock into vm_page_activate(). 2010-05-07 15:49:43 +00:00
Alan Cox eb00b276ab Eliminate page queues locking around most calls to vm_page_free(). 2010-05-06 18:58:32 +00:00
Alan Cox 5ac59343be Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
Edward Tomasz Napierala b5f770bd86 Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().
Reviewed by:	kib
2010-05-05 16:44:25 +00:00
Konstantin Belousov fc0c3802f0 Lock the page around vm_page_activate() and vm_page_deactivate() calls
where it was missed. The wrapped fragments now protect wire_count with
page lock.

Reviewed by:	alc
2010-05-03 20:31:13 +00:00
Pawel Jakub Dawidek 6619e90994 Simplify code a bit. 2010-02-18 22:10:55 +00:00
Marius Strobl b06b8fe3a7 Factor out the code shared between NFS client and server into its own
module. With r203732 it became apparent that creating the sysctl nodes
twice causes at least a warning, however the whole code shouldn't be
present twice in the first place.

Discussed with:	rmacklem
2010-02-16 20:00:21 +00:00
Marius Strobl 9ea01fedc0 - Move nfs_realign() from the NFS client to the shared NFS code and
remove the NFS server version in order to reduce code duplication.
  The shared version now uses a second parameter how, which is passed
  on to m_get(9) and m_getcl(9) as the server used M_WAIT while the
  client requires M_DONTWAIT, and replaces the the previously unused
  parameter hsiz.
- Change nfs_realign() to use nfsm_aligned() so as with other NFS code
  the alignment check isn't actually performed on platforms without
  strict alignment requirements for performance reasons because as the
  comment suggests unaligned data only occasionally occurs with TCP.
- Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather
  than M_WAIT because it's called with the RPC sp_lock held.

Reviewed by:	jhb, rmacklem
MFC after:	1 week
2010-02-09 23:45:14 +00:00
Marius Strobl be03f0b907 Some style(9) fixes 2010-02-09 23:40:07 +00:00
Rick Macklem 651ff543f8 Fix a race that can occur when nfs nfsiod threads are being created.
Without this patch it was possible for a different thread that calls
nfs_asyncio() to snitch a newly created nfsiod thread that was
intended for another caller of nfs_asyncio(), because the nfs_iod_mtx
mutex was unlocked while the new nfsiod thread was created. This patch
labels the newly created nfsiod, so that it is not taken by another
caller of nfs_asyncio(). This is believed to fix the problem reported
on the freebsd-stable email list under the subject:
FreeBSD NFS client/Linux NFS server issue.

Tested by:	to DOT my DOT trociny AT gmail DOT com
Reviewed by:	jhb
MFC after:	2 weeks
2010-01-27 15:22:20 +00:00
Rick Macklem ff589d0da1 Fix a typo in a comment introduced by r202767.
MFC after:	2 weeks
2010-01-21 21:59:10 +00:00
Rick Macklem f957b30da2 Add a timeout for the negative name cache entries in the NFS client.
This avoids a bogus negative name cache entry from persisting forever
when another client creates an entry with the same name within the
same NFS server time of day clock tick. The mount option negnametimeo
can be used to override the default timeout interval on a
per-mount-point basis. Setting negnametimeo to 0 disables negative
name caching for the mount point.
I also fixed one obvious typo where args.timeo should be
args.maxgrouplist.

Submitted by:	jhb (earlier version)
Reviewed by:	jhb
MFC after:	2 weeks
2010-01-21 20:57:25 +00:00
Marko Zec 5d005b51e5 Reduce recursions on curvnet and thus spamming the console with warning
messages for kernels built with options VIMAGE and VNET_DEBUG enabled.

Reviewed by:	bz
MFC after:	3 days
2010-01-09 14:56:38 +00:00
Martin Blapp c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Bjoern A. Zeeb 2254f022a0 Add missing include to make LINT-VIMAGE build as well.
Found by:	test building an MFC candidate
X-MFC with:	r200471
2009-12-27 10:10:38 +00:00
Bjoern A. Zeeb e65a4ba18b Add a few more V_hacks to nfsclient to allow machines with a VIMAGE
kernel to boot from NFS. [1]

Note: this is not a full virtualization of nfsclient. It is only does
what advertised above and nothing more.

Requested by:	public demand [1]
Tested by:	kris, ..
MFC after:	5 days
2009-12-13 11:06:39 +00:00
John Baldwin 12ac99dc46 Close a race with caching of -ve name lookups in the NFS client.
Specifically, clients only trust -ve cache entries while the directory
remains unchanged and discard any -ve cache entries for a directory when
they notice that the modification time of a directory entry changes.  The
race involves two concurrent lookups as follows:
- Thread A does a lookup for file 'foo' which sends a lookup RPC to the
  server.  The lookup fails and the server replies.
- The 'foo' file is created (either by the same client or a different
  client) updating the modification time on the parent directory of 'foo'.
- Thread B does a lookup for a different file 'bar' which updates the
  cached attributes of the parent directory of 'foo' to reflect the new
  modification time after 'foo' was created.
- Thread A finally resumes execution to parse the reply from the NFS
  server.  It adds a -ve cache entry and sets the cached value of the
  directory's modification time that is used for invalidating -ve cached
  lookups to the new modification time set by thread B.

At this point, future lookups of 'foo' will honor the -ve cached entry
until the cached entry is pushed out of the name cache's LRU or the
modification time of the parent directory is changed again by some other
change.  The fix is to read the directory's modification time before
sending the lookup RPC and use that cached modification time when setting
the directory's cached modification time.  Also, we do not add a -ve cache
entry if another thread has added -ve cache entry that set the directory's
cached modification time to a newer value than the value we read before
sending the lookup RPC.

Reviewed by:	rmacklem
MFC after:	1 week
2009-10-16 19:30:48 +00:00
Robert Watson 4347e9fd66 Add a MODULE_DEPEND() on the NFS client from dtnfsclient so that dtnfsclient
can access NFS client symbols.

MFC after:	3 days
Discussed with:	kib
Reported by:	markm
2009-10-12 18:58:42 +00:00
Qing Li 812777783d Reverting the previous change for now. Some users reports the patch
fixes their issues but one reports a failure in NFS ROOT. Revert
the change for now pending further investigation.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 22:09:42 +00:00
Qing Li 3b208f7ca0 Simply remove the code instead of using "#if 0".
Pointed out by sam
2009-09-15 02:22:57 +00:00
Qing Li 96ed1732bb The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

Reviewed by:    kmacy
2009-09-15 01:01:03 +00:00
Rick Macklem 8f63187ec1 Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs
vnodes, since these nodes are not linked into the mount queue and,
as such, the vn_lock() cannot cause a deadlock so LORs are harmless.

Suggested by:	kib
Approved by:	kib (mentor)
MFC after:	3 days
2009-09-09 20:37:49 +00:00
Marko Zec 0348c661d1 Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by:	rmacklem (in part)
Reviewed by:	rmacklem, rwatson
Discussed with:	dfr, bz
Approved by:	re (rwatson), julian (mentor)
MFC after:	3 days
2009-08-24 10:09:30 +00:00
Robert Watson 77dfcdc445 Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
Konstantin Belousov 48bd6d4a49 In nfs_upgrade_vnlock(), assert that the vnode is locked. It is for all
pathes, as far as I see and testing seems to confirm it. Comparision of
old_lock with LK_SHARED make sense only if vnode is locked by current
thread.

When downgrading, pass LK_RETRY to the vn_lock(), since otherwise
vn_lock() unlocks the doomed vnode, causing extra unlock.

Reported and tested by:	pho
Approved by:	re (rwatson)
MFC after:	3 weeks
2009-08-14 10:59:17 +00:00
Robert Watson 530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Rick Macklem 874bb76647 Patch the regular nfs client in a manner analagous to
r195704 for the experimental client. The patch avoids calling vn_lock()
for the case where nfs_nget() has acquired the same vnode as dvp,
since nfs_nget() has already locked the vnode.

Reviewed by:	kib, jhb
Approved by:	re (kensmith), kib (mentor)
2009-07-17 19:38:07 +00:00
Konstantin Belousov b35687df13 Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:54:29 +00:00
Robert Watson eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Konstantin Belousov f1eccd05ec In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point
around the sequence that drop vnode lock and then busies the mount point.
Not having vlocked node or direct reference to the mp allows for the
forced unmount to proceed, making mp unmounted or reused.

Tested by:	pho
Reviewed by:	jeff
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-02 18:02:55 +00:00
Doug Rabson 98c497255b Adjust the internal NFS KPI to avoid the last traces of NFS_LEGACYRPC.
Approved by: re
2009-06-30 19:10:17 +00:00
Doug Rabson b49a2b39fd Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
2009-06-30 19:03:27 +00:00
John Baldwin 5ed6940d13 Fix build with NFS_LEGACYRPC enabled after the socket upcall locking
changes.

Approved by:	re (kensmith)
2009-06-30 03:18:51 +00:00
Robert Watson 2d9cfabad4 Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
Bjoern A. Zeeb 5736e6fb9d After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
Alan Cox 57a7e73261 Fix some of the style errors in *getpages(). 2009-06-18 05:56:24 +00:00
Konstantin Belousov b3c5643a25 For dotdot lookup in nfs_lookup, inline the vn_vget_ino() to prevent
operating on the unmounted mount point and freed mount data in case of
forced unmount performed while dvp is unlocked to nget the target vnode.

Add missed calls to m_freem(mrep) there on error exits [1].

Submitted by:	rmacklem [1]
Tested by:	pho
MFC after:	2 weeks
2009-06-17 12:47:27 +00:00
Jamie Gritton c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Rick Macklem 5081c8c757 Add a test for VI_DOOMED just after nfs_upgrade_vnlock() in
nfs_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in nfs_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in nfs_vinvalbuf() down to after nfs_upgrade_vnlock() and replace the
out of date comment for it.

Submitted by:	jhb
Tested by:	pho
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-10 21:03:57 +00:00
Bjoern A. Zeeb 8d8bc0182e After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
John Baldwin 74fb0ba732 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
Bjoern A. Zeeb c2c2a7c11e Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by:	julian, rwatson, zec
X-MFC:		not possible
2009-06-01 15:49:42 +00:00