system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-10-17 22:04:40 +00:00

Author	SHA1	Message	Date
Lawrence Stewart	0963c8e431	When a previous call to sbsndptr() leaves sb->sb_sndptroff at the start of an mbuf that was fully consumed by the previous call, the mbuf ptr returned by the current call ends up being the previous mbuf in the sb chain to the one that contains the data we want. This does not cause any observable issues because the mbuf copy routines happily walk the mbuf chain to get to the data at the moff offset, which in this case means they effectively skip over the mbuf returned by sbsndptr(). We can't adjust sb->sb_sndptr during the previous call for this case because the next mbuf in the chain may not exist yet. We therefore need to detect the condition and make the adjustment during the current call. Fix by detecting the special case of moff being at the start of the next mbuf in the chain and adjust the required accounting variables accordingly. Reviewed by: andre MFC after: 2 weeks	2013-06-19 03:08:01 +00:00
Lawrence Stewart	ec41a9a1bd	The fix committed in r250951 replaced the reported panic with a deadlock... gold star for me. EVENTHANDLER_DEREGISTER() attempts to acquire the lock which is held by the event handler framework while executing event handler functions, leading to deadlock. Move EVENTHANDLER_DEREGISTER() to alq_load_handler() and thus deregister the ALQ shutdown_pre_sync handler at module unload time, which takes care of the originally reported panic and fixes the deadlock introduced in r250951. Reported by: Luiz Otavio O Souza MFC after: 3 days X-MFC with: 250951	2013-06-17 09:49:07 +00:00
Ed Schouten	2381f6ef8c	Change callout use counter to use C11 atomics. In order to get some coverage of C11 atomics in kernelspace, switch at least one piece of code in kernelspace to use C11 atomics instead of <machine/atomic.h>. While there, slightly improve the code by adding an assertion to prevent the use count from going negative.	2013-06-16 09:30:35 +00:00
Lawrence Stewart	38f080cb04	Move hhook's per-vnet initialisation to an earlier SYSINIT SI_SUB stage to ensure all per-vnet related hhook initialisation is completed prior to any virtualised hhook points attempting registration. vnet_register_sysinit() requires that a stage later than SI_SUB_VNET be chosen. There are no per-vnet initialisors in the source tree at this time which run earlier than SI_SUB_INIT_IF. A quick audit of non-virtualised SYSINITs indicates there are no subsystems pre SI_SUB_MBUF that would likely be interested in registering a virtualised hhook point. Settle on SI_SUB_MBUF as hhook's per-vnet initialisation stage as it's the first overtly network-related initilisation stage to run after SI_SUB_VNET. If a subsystem that initialises earlier than SI_SUB_MBUF ends up wanting to register virtualised hhook points in future, hhook's use of SI_SUB_MBUF will need to be revisited and would probably warrant creating a dedicated SI_SUB_HHOOK which runs immediately after SI_SUB_VNET. MFC after: 1 week	2013-06-15 10:08:34 +00:00
Lawrence Stewart	933a8bff73	Cleanup and simplification in khelp_{register\|deregister}_helper(). No functional changes. MFC after: 1 week	2013-06-15 06:45:17 +00:00
Lawrence Stewart	58261d30e1	Add a private KPI between hhook and khelp that allows khelp modules to insert hook functions into hhook points which register after the modules were loaded - potentially useful during boot or if hhook points are dynamically registered. MFC after: 1 week	2013-06-15 05:57:29 +00:00
Lawrence Stewart	b1f53277ec	Internalise handling of virtualised hook points inside hhook_{add\|remove}_hook_lookup() so that khelp (and other potential API consumers) do not have to care when they attempt to (un)hook a particular hook point identified by id and type. Reviewed by: scottl MFC after: 1 week	2013-06-15 04:03:40 +00:00
Lawrence Stewart	bfe72a58e2	Fix a major oversight in r251732 which causes non-VIMAGE kernels to trigger a KASSERT during TCP hhook registration at boot. Virtualised hook points only require extra housekeeping and sanity checking when "options VIMAGE" is present. Reported by: bdrewery,jh,dhw Tested by: dhw MFC after: 1 week X-MFC with: 251732	2013-06-14 18:11:21 +00:00
Lawrence Stewart	601d4c7543	Add support for non-virtualised hhook points, which are uniquely identified by type and id, as compared to virtualised hook points which are now uniquely identified by type, id and a vid (which for vimage is the pointer to the vnet that the hhook resides in). All hhook_head structs for both virtualised and non-virtualised hook points coexist in hhook_head_list, and a separate list is maintained for hhook points within each vnet to simplify some vimage-related housekeeping. Reviewed by: scottl MFC after: 1 week	2013-06-14 04:10:34 +00:00
Lawrence Stewart	86241d89a9	Fix a potential NULL-pointer dereference that would trigger if the hhook registration site did not provide storage for a copy of the hhook_head struct. MFC after: 3 days	2013-06-14 02:25:40 +00:00
Jeff Roberson	17a2737732	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-13 20:46:03 +00:00
Konstantin Belousov	1d7466bca4	Fix two issues with the spin loops in the umtx(2) implementation. - When looping, check for the pending suspension. Otherwise, other usermode thread which races with the looping one, could try to prevent the process from stopping or exiting. - Add missed checks for the faults from casuword*(). The code is structured in a way which makes the loops exit if the specified address is invalid, since both fuword() and casuword() return -1 on the fault. But if the address is mapped readonly, the typical value read by fuword() is different from -1, while casuword() returns -1. Absent the checks for casuword() faults, this is interpreted as the race with other thread and causes non-interruptible spinning in the kernel. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-06-13 09:33:22 +00:00
Marcel Moolenaar	4612275fdb	Revert r251590. It unexpectedly broke the build and there were some questions on locking. As part of commit-bit grooming, I'd like Steve to handle this, but can't leave things broken in the mean time.	2013-06-10 15:22:27 +00:00
Marcel Moolenaar	8c7ca16f63	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. Submitted by: stevek@juniper.net Obtained from: Juniper Networks, Inc	2013-06-09 23:51:26 +00:00
Gleb Smirnoff	8d1aa3c6b4	aio_mlock() added: - Regen for r251526. - Bump __FreeBSD_version.	2013-06-08 13:30:13 +00:00
Gleb Smirnoff	6160e12c10	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
Gleb Smirnoff	f95c13db04	Separate LIO_SYNC processing into a separate function aio_process_sync(), and rename aio_process() into aio_process_rw(). Reviewed by: kib Sponsored by: Nginx, Inc.	2013-06-08 13:02:43 +00:00
John Baldwin	c9813d0a37	Do not compare the existing mask of a cpuset with a new mask when changing the mask of a cpuset. Also, change the cpuset's mask before updating the masks of all children. Previously changing a cpuset's mask first required setting the mask to a super-set of both the old and new masks and then changing it a second time to the new mask.	2013-06-06 14:43:19 +00:00
Alan Cox	27a18d6a23	Don't busy the page unless we are likely to release the object lock. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2013-06-06 06:17:20 +00:00
Jeff Roberson	ba39d89bc9	- Consolidate duplicate code into support functions. - Split the bqlock into bqclean and bqdirty locks. - Only acquire the wakeup synchronization locks when we cross a threshold requiring them. - Restructure the way flushbufqueues() targets work so they are more smp friendly and sane. Reviewed by: kib Discussed with: mckusick, attilio Sponsored by: EMC / Isilon Storage Division M vfs_bio.c	2013-06-05 23:53:00 +00:00
Gleb Smirnoff	82e825c4c9	Improve r250890, so that we stop processing of a message with zero descriptors as early as possible, and assert that number of descriptors is positive in unp_freerights(). Reviewed by: mjg, pjd, jilles	2013-06-04 11:19:08 +00:00
John Baldwin	24150d37d3	- Fix a couple of inverted panic messages for shared/exclusive mismatches of a lock within a single thread. - Fix handling of interlocks in WITNESS by properly requiring the interlock to be held exactly once if it is specified.	2013-06-03 17:41:11 +00:00
John Baldwin	95d28652af	- Handle the recursed/not recursed flags with RA_RLOCKED in rw_assert(). - Tweak a panic message.	2013-06-03 17:38:57 +00:00
Konstantin Belousov	d39116f5d5	Be more generous when donating the current thread time to the owner of the vnode lock while iterating over the free vnode list. Instead of yielding, pause for 1 tick. The change is reported to help in some virtualized environments. Submitted by: Roger Pau Monn? <roger.pau@citrix.com> Discussed with: jilles Tested by: pho MFC after: 2 weeks	2013-06-03 17:36:43 +00:00
Konstantin Belousov	1e65d73c74	Do not map the shared page COW. If the process wired its address space, fork(2) would cause shadowing of the physical object and copying of the shared page into private copy, effectively preventing updates for the exported timehands structure and stopping the clock. Specify the maximum allowed permissions for the page to be read and execute, preventing write from the user mode. Reported and tested by: <huanghwh@yahoo.com> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-06-03 04:32:53 +00:00
Konstantin Belousov	92fab43f7f	When auto-sizing the buffer cache, limit the amount of physical memory used as the estimation of size, to 32GB. This provides around 100K of buffer headers and corresponding KVA for buffer map at the peak. Sizing the cache larger is not useful, also resulting in the wasting and exhausting of KVA for large machines. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation	2013-06-03 04:16:48 +00:00
Alan Cox	39a4cd0cec	Reduce the scope of the VM object locking in brelse(). In my tests, this change reduced the total number of VM object lock acquisitions by brelse() by 74%. Sponsored by: EMC / Isilon Storage Division	2013-06-02 16:18:03 +00:00
Marius Strobl	0ad17e4b32	Move an assertion to the right spot; only bus_dmamap_load_mbuf(9) requires a pkthdr being present but that's not the case for either _bus_dmamap_load_mbuf_sg() or bus_dmamap_load_mbuf_sg(9). Reported by: sbruno MFC after: 1 week	2013-06-01 11:42:47 +00:00
John Baldwin	3d4c503cf0	Style fixes to vn_ioctl(). Suggested by: bde	2013-05-31 16:15:22 +00:00
Jeff Roberson	22a722605d	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf	2013-05-31 00:43:41 +00:00
Julian Elischer	4591f0d339	Initialising the new fibnum field to a known value turns out to be a GOOD IDEA (TM). Apparently MOST users set this (e.g. tcp and friends) but there are a few users that just assume that it is a sensible value but then go on to read it. These include SCTP, pf and the FLOWTABLE option (and maybe others).	2013-05-24 02:18:37 +00:00
Lawrence Stewart	7639c9be45	Ensure alq's shutdown_pre_sync event handler is deregistered on module unload to avoid a dangling pointer and eventual panic on system shutdown. Reported by: Ali <comnetboy at gmail.com> Tested by: Ali <comnetboy at gmail.com> MFC after: 1 week	2013-05-24 00:49:12 +00:00
Pawel Jakub Dawidek	92981fdf9e	Use proper malloc type for ioctls white-list. Reported by: pho Tested by: pho	2013-05-23 21:07:26 +00:00
Luigi Rizzo	4b62214f4a	Increase the (arbitrary) limit for the number of packets per tick from 1k to 20k The previous value was good 10 years ago, but not anymore now. More importantly, lots of good surprises: polling is incredibly effective under virtualization, and not only prevents livelock but also saves most of the VM exit overhead in receive mode. Using polling, a FreeBSD instance under qemu-kvm remains perfectly responsive even when bombed with 10 Mpps over an emulated e1000, and happily processes 1.7 Mpps through ipfw. Note that some incompatibilities still remain: e.g. polling is not (yet) compatible with netmap, and seems to freeze the guest when kern.polling.idle_poll=1 MFC after: 3 days	2013-05-22 16:32:18 +00:00
Mateusz Guzik	ecbb2a1819	passing fd over unix socket: fix a corner case where caller wants to pass no descriptors. Previously the kernel would leak memory and try to free a potentially arbitrary pointer. Reviewed by: pjd	2013-05-21 21:58:00 +00:00
Attilio Rao	bed927ee17	vm_object locking is not needed there as pages are already wired. Sponsored by: EMC / Isilon storage division Submitted by: alc	2013-05-21 20:54:03 +00:00
Konstantin Belousov	f85769eb75	Regenerate.	2013-05-21 11:41:08 +00:00
Konstantin Belousov	48947eccee	Fix the wait6(2) on 32bit architectures and for the compat32, by using the right type for the argument in syscalls.master. Also fix the posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the architectures which require padding of the 64bit argument. Noted and reviewed by: jhb Pointy hat to: kib MFC after: 1 week	2013-05-21 11:40:16 +00:00
Pawel Jakub Dawidek	9b9ff7d390	Style nits.	2013-05-19 23:30:24 +00:00
Pawel Jakub Dawidek	9b1040a574	Use SDT_PROBE1() instead of SDT_PROBE().	2013-05-19 23:29:22 +00:00
Jamie Gritton	761d2bb5b9	Refine the "nojail" rc keyword, adding "nojailvnet" for files that don't apply to most jails but do apply to vnet jails. This includes adding a new sysctl "security.jail.vnet" to identify vnet jails. PR: conf/149050 Submitted by: mdodd MFC after: 3 days	2013-05-19 04:10:34 +00:00
Attilio Rao	e3ed7ff03f	Use readlocking now that assertions on vm_page_lookup() are relaxed. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: flo, pho	2013-05-17 20:03:55 +00:00
Jaakko Heinonen	c532f8c4d6	A library function shall not set errno to 0. Reviewed by: mdf	2013-05-16 18:13:10 +00:00
Jeff Roberson	f2cc1285c2	- Add a new general purpose path-compressed radix trie which can be used with any structure containing a uint64_t index. The tree code auto-generates type safe wrappers. - Eliminate the buf splay and replace it with pctrie. This is not only significantly faster with large files but also allows for the possibility of shared locking. Reviewed by: alc, attilio Sponsored by: EMC / Isilon Storage Division	2013-05-12 04:05:01 +00:00
Konstantin Belousov	0fc6daa72d	- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The null_hashget() obtains the reference on the nullfs vnode, which must be dropped. - Fix a wart which existed from the introduction of the nullfs caching, do not unlock lower vnode in the nullfs_reclaim_lowervp(). It should be innocent, but now it is also formally safe. Inform the nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on nullfs inode. - Add a callback to the upper filesystems for the lower vnode unlinking. When inactivating a nullfs vnode, check if the lower vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC on the lower vnode, and reclaim upper vnode if so. This allows nullfs to purge cached vnodes for the unlinked lower vnode, avoiding excessive caching. Reported by: G??ran L??wkrantz <goran.lowkrantz@ismobile.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-05-11 11:17:44 +00:00
Eitan Adler	7a2b450ff8	Fxi a bunch of typos. PR: misc/174625 Submitted by: Jeremy Chadwick <jdc@koitsu.org>	2013-05-10 16:41:26 +00:00
Marcel Moolenaar	e63091ea6c	Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE locks. To support this, VNODE locks are created with the LK_IS_VNODE flag. This flag is propagated down using the LO_IS_VNODE flag. Note that WITNESS still records the LOR. Only the printing and the optional entering into the kernel debugger is bypassed with the WITNESS_NO_VNODE option.	2013-05-09 16:28:18 +00:00
Konstantin Belousov	3328431dec	Item 1 in r248830 causes earlier exits from the sendfile(2), before all requested data was sent. The reason is that xfsize <= 0 condition must not be tested at all if space == loopbytes. Otherwise, the done is set to 1, and sendfile(2) is aborted too early. Instead of moving the condition to exiting the inner loop after the xfersize check, directly check for the completed transfer before the testing of the available space in the socket buffer, and revert item 1 of r248830. It is arguably another bug to sleep waiting for socket buffer space (or return EAGAIN for non-blocking socket) if all bytes are already transferred. Reported by: pho Discussed with: scottl, gibbs Tested by: scottl (stable/9 backport), pho	2013-05-09 16:05:51 +00:00
Andre Oppermann	6753da1356	When the accept queue is full print the number of already pending new connections instead of by how many we're over the limit, which is always 1. Noticed by: jmallet MFC after: 1 week	2013-05-08 14:13:14 +00:00
Scott Long	ab8f55b9fd	Add a sysctl vfs.read_min to complement the exiting vfs.read_max. It defaults to 1, meaning that it's off. When read-ahead is enabled on a file, the vfs cluster code deliberately breaks a read into 2 I/O transactions; one to satisfy the actual read, and one to perform read-ahead. This makes sense in low-latency circumstances, but often produces unbalanced i/o transactions that penalize disks. By setting vfs.read_min, we can tell the algorithm to fetch a larger transaction that what we asked for, achieving the same effect as the read-ahead but without the doubled, unbalanced transaction and the slightly lower latency. This significantly helps our workloads with video streaming. Submitted by: emax Reviewed by: kib Obtained from: Netflix	2013-05-07 08:16:21 +00:00

1 2 3 4 5 ...

13261 commits