Commit graph

1027 commits

Author SHA1 Message Date
Alan Somers 420a5a2445 zfs-program.8: fix orphan .Xr
Reported by:	phk
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24488
2020-04-18 20:55:43 +00:00
Mariusz Zaborski 23ee238fe5 zfs: Add option for forcible unmounting dataset while receiving snapshot.
Currently when the dataset is in use we can't receive snapshots.

zfs send test/1@asd | zfs recv -FM test/2
cannot unmount '/test/2': Device busy

This commits add option 'M' which attempts to forcibly unmount the
dataset.  Thanks to this we can enforce receiving snapshots in a
single step.

Note that this functionality is not supported on Linux because the
VFS will prevent active mounted filesystems from being unmounted,
even with the force option.  This is the intended VFS behavior.

Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Differential Revision:	https://reviews.freebsd.org/D22306

openzfs/zfs@a57d3d45d6
2020-04-11 17:54:35 +00:00
Kyle Evans cddf6d61d1 zfs: fix -fno-common issues
A similar (or identical?) fix has already landed in OpenZFS.

-fno-common will become the default in GCC10/LLVM11.

MFC after:	3 days
2020-03-28 17:00:38 +00:00
Ryan Moeller 69534635ff MFOpenZFS: ZVOLs should not be allowed to have children
zfs create, receive and rename can bypass this hierarchy rule. Update
both userland and kernel module to prevent this issue and use pyzfs
unit tests to exercise the ioctls directly.

Note: this commit slightly changes zfs_ioc_create() ABI. This allow to
differentiate a generic error (EINVAL) from the specific case where we
tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

Approved by:	mav (mentor)
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
openzfs/zfs@d8d418ff0c
2020-03-25 15:56:18 +00:00
Alexander Motin d3c6ba3214 MFOpenZFS: make zil max block size tunable
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations.  The large allocations are for ZIL blocks.  If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more.  In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8865
openzfs/zfs@b8738257c2

MFC after:	2 weeks
2020-03-19 01:05:54 +00:00
Mark Johnston d8f743dcaa Do not load dtraceall.ko if dtrace.ko is already loaded.
This was the intent of the existing code, but instead it would
unconditionally load dtraceall.ko because of a stale errno value.

Reported by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-02-28 17:05:27 +00:00
Brandon Bergren 7283901ae9 [PowerPC] [DTrace] Add ELFv2 support in libdtrace
PPC64 ELFv2 acts like a "normal" platform in that it no longer needs
function descriptors. So, ensure we are only enabling them on ELFv1.

Additionally, ELFv2 requires that the ELF header have a nonzero e_flags,
so ensure that the synthesized ELF header in dt_link.c is setting it.

Reviewed by:	jhibbits, markj
Approved by:	gnn
Differential Revision:	https://reviews.freebsd.org/D22403
2020-02-05 19:39:48 +00:00
Alan Somers 03e7a2be06 Speed up "zpool import" in the presence of many zvols
By default, zpools may not be backed by zvols (that can be changed with the
"vfs.zfs.vol.recursive" sysctl). When that sysctl is set to 0, the kernel
does not attempt to read zvols when looking for vdevs. But the zpool command
still does. This change brings the zpool command into line with the kernel's
behavior. It speeds "zpool import" when an already imported pool has many
zvols, or a zvol with many snapshots.

PR:		241083
Reported by:	Martin Birgmeier <d8zNeCFG@aon.at>
Reviewed by:	mav, Ryan Moeller <ryan@freqlabs.com>
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D22077
2020-01-28 23:07:31 +00:00
Kyle Evans 149d640cfa libzfs: add zfs_mount_at
This will be used in libbe in place of the internal zmount(); libbe only
wants to be able to mount a dataset at an arbitrary mountpoint without
altering dataset/pool properties. The natural way to do this in a portable
way is by creating a zfs_mount_at() interface that's effectively zfs_mount()
+ a mountpoint parameter. zfs_mount() is now a light wrapper around the new
method.

The interface and implementation have already been accepted into ZFS On
Linux, and the next commit to switch libbe() over to this new interface will
solve the last compatibility issue with ZoL.  The next sysutils/openzfs
rebase against ZoL should be able to build libbe/bectl with only minor
adjustments to build glue.

Reviewed by:	Ryan Moeller <ryan freqlabs com>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D23132
2020-01-19 02:45:02 +00:00
Mark Johnston 938acb0869 Use a deterministic hash for USDT symbol names.
Previously libdtrace used ftok(3), which hashes the inode number of the
input object file.  To increase reproducibility of builds that embed
USDT probes, include a hash of the object file path in the symbol name
instead.

Reported and tested by:	bdrewery
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2020-01-07 21:56:20 +00:00
Mateusz Guzik 04d88d154d vfs: add a file missed in r356337 2020-01-03 22:47:31 +00:00
Mark Johnston a2ea78495a Add libdtrace support for arm64 USDT probes.
arm64 is still lacking a fasttrap implementation, which is required to
actually enable userland probes, but this at least allows USDT probes to
be linked into userland applications.

Submitted by:	Klaus Küchemann <maciphone2@googlemail.com> (original)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22360
2019-12-29 21:46:50 +00:00
Ryan Libby fa19b250bd dtrace: avoid gcc9 Walloca-larger-than
gcc9 grew a new warning for unbounded allocas, such as the one in
dt_options_load.  Remove both uses of alloca in dt_options.c.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22880
2019-12-21 02:44:13 +00:00
Andriy Gapon 8491540808 MFV r354383: 10592 misc. metaslab and vdev related ZoL bug fixes
illumos/illumos-gate@555d674d5d
555d674d5d

https://www.illumos.org/issues/10592
  This is a collection of recent fixes from ZoL:
  8eef997679 Error path in metaslab_load_impl() forgets to drop ms_sync_lock
  928e8ad47d Introduce auxiliary metaslab histograms
  425d3237ee Get rid of space_map_update() for ms_synced_length
  6c926f426a Simplify log vdev removal code
  21e7cf5da8 zdb -L should skip leak detection altogether
  df72b8bebe Rename range_tree_verify to range_tree_verify_not_present
  75058f3303 Remove unused vdev_t fields

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
MFC after:	4 weeks
2019-11-21 13:35:43 +00:00
Andriy Gapon 489912da7b MFV r354382,r354385: 10601 10757 Pool allocation classes
illumos/illumos-gate@663207adb1
663207adb1

10601 Pool allocation classes
https://www.illumos.org/issues/10601
  illumos port of ZoL Pool allocation classes. Includes at least these two
  commits:
  441709695 Pool allocation classes misplacing small file blocks
  cc99f275a Pool allocation classes

10757 Add -gLp to zpool subcommands for alt vdev names
https://www.illumos.org/issues/10757
  Port from ZoL of
  d2f3e292d Add -gLp to zpool subcommands for alt vdev names
  Note that a subsequent ZoL commit changed -p to -P
  a77f29f93 Change full path subcommand flag from -p to -P

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: Håkan Johansson <f96hajo@chalmers.se>
Portions contributed by: Richard Yao <ryao@gentoo.org>
Portions contributed by: Chunwei Chen <david.chen@nutanix.com>
Portions contributed by: loli10K <ezomori.nozomu@gmail.com>
Author: Don Brady <don.brady@delphix.com>

11541 allocation_classes feature must be enabled to add log device

illumos/illumos-gate@c1064fd7ce
c1064fd7ce

https://www.illumos.org/issues/11541
  After the allocation_classes feature was integrated, one can no longer add a
  log device to a pool unless that feature is enabled. There is an explicit check
  for this, but it is unnecessary in the case of log devices, so we should handle
  this better instead of forcing the feature to be enabled.

Author: Jerry Jelinek <jerry.jelinek@joyent.com>

FreeBSD notes.
I faithfully added the new -g, -L, -P flags, but only -g does something:
vdev GUIDs are displayed instead of device names.  -L, resolve symlinks,
and -P, display full disk paths, do nothing at the moment.
The use of special vdevs is backward compatible for read-only access, so
root pools should be bootable, but exercise caution.

MFC after:	4 weeks
2019-11-21 08:20:05 +00:00
Andriy Gapon a4619d8461 zpool.8: remove a paragraph about quorum disks
FreeBSD has no such thing.
illumos and ZoL manuals do not talk about quorum disks either.
Only Oracle ZFS mentions them.

MFC after:	1 week
2019-11-20 08:56:01 +00:00
Andriy Gapon 54d1762459 fix up r354804, resolve merge conflicts in zpool.8
Somehow I managed to commit the manual page with unresolved conflicts in
it.

While here, I also replaced .sp with .Pp.

MFC after:	3 weeks
X-MFC with:	r354804
2019-11-20 08:49:13 +00:00
Mark Johnston 066d9631cb Fix inconsistencies in anonymous DOF files.
The DOF file output by dtrace -A contains only the loadable sections.
However, as it was created by a call to dtrace_dof_create() without
flags, the original DOF was created with the loadable sections.  The
result is that the DOF includes the section headers for the unloadable
sections (COMMENTS and UTSNAME) without these sections actually being
present.  This is inconsistent.

A simple change to anon_prog() ensures that the missing sections are
present in the outputted DOF.  Alternatively, the call to
dtrace_dof_create() could pass the DTRACE_D_STRIP flag stripping out the
loadable sections.  As the unloadable sections contain info useful for
debugging purposes they haven't been stripped.

Submitted by:	Graeme Jenkinson <graeme.jenkinson@cl.cam.ac.uk>
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D21875
2019-11-18 18:34:23 +00:00
Andriy Gapon a8c08e008a MFV r354378,r354379,r354386: 10499 Multi-modifier protection (MMP)
10499 Multi-modifier protection (MMP)
illumos/illumos-gate@e0f1c0afa4
e0f1c0afa4
https://www.illumos.org/issues/10499
  Port the following ZFS commits from ZoL to illumos.
  379ca9cf2 Multi-modifier protection (MMP)
  bbffb59ef Fix multihost stale cache file import
  0d398b256 Do not initiate MMP writes while pool is suspended

10701 Correct lock ASSERTs in vdev_label_read/write
illumos/illumos-gate@58447f688d
58447f688d
https://www.illumos.org/issues/10701
  Port of ZoL commit:
  0091d66f4e Correct lock ASSERTs in vdev_label_read/write
  At a minimum, this fixes a blown assert during an MMP test run when running on
  a DEBUG build.

11770 additional mmp fixes
illumos/illumos-gate@4348eb9012
4348eb9012
https://www.illumos.org/issues/11770
  Port a few additional MMP fixes from ZoL that came in after our
  initial MMP port.
  4ca457b065 ZTS: Fix mmp_interval failure
  ca95f70dff zpool import progress kstat
  (only minimal changes from above can be pulled in right now)
  060f0226e6 MMP interval and fail_intervals in uberblock

Note from the committer (me).
I do not have any use for this feature and I have not tested it.  I only
did smoke testing with multihost=off.
Please be aware.
I merged the code only to make future merges easier.

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: Tim Chase <tim@chase2k.com>
Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com>
Portions contributed by: John L. Hammond <john.hammond@intel.com>
Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov>
Portions contributed by: Prakash Surya <surya1@llnl.gov>
Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Olaf Faaland <faaland1@llnl.gov>

MFC after:	4 weeks
2019-11-18 09:38:35 +00:00
Andriy Gapon ee4cf489ac fix zpool list property names
This change is based on
r354380 8899 zpool list property documentation doesn't match actual behaviour

There is no "used" pool property, "alloc" is actually spelled
"allocated".

MFC after:	5 days
2019-11-07 11:50:53 +00:00
Andriy Gapon 930db3e338 MFV r354377: 10554 Implemented zpool sync command
illumos/illumos-gate@9c2acf00e2
9c2acf00e2

https://www.illumos.org/issues/10554
  During the port of MMP (illumos bug 10499) from ZoL, I found this
  earlier ZoL project is a prerequisite. Here is the original
  description.  This addition will enable us to sync an open TXG to the
  main pool on demand. The functionality is similar to 'sync(2)' but
  'zpool sync' will return when data has hit the main storage instead of
  potentially just the ZIL as is the case with the 'sync(2)' cmd.

Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
MFC after:	3 weeks
Relnotes:	possibly
2019-11-07 11:18:28 +00:00
Alan Somers 1af3a11218 MFZoL: Avoid retrieving unused snapshot props
This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #8077
zfsonlinux/zfs@4c0883fb4a

MFC after:	2 weeks
2019-10-26 17:11:02 +00:00
Andriy Gapon 8c017db747 zfs.8: fix a typo in the date
Reported by:	Trond Endrestøl <trond.endrestol@ximalas.info>
MFC after:	3 days
2019-10-25 09:19:15 +00:00
Andriy Gapon 2981bc73c7 fix section number in zfs-program.8
MFC after:	3 days
2019-10-16 15:21:05 +00:00
Andriy Gapon 786c532a8f MFV r348596: 9689 zfs range lock code should not be zpl-specific
illumos/illumos-gate@7931524763

FreeBSD note: some tweaking was needed to avoid a conflict with
sys/rangelock.h.

Author:	Matthew Ahrens <mahrens@delphix.com>
Obtained from:	illumos
MFC after:	3 weeks
2019-10-16 09:04:53 +00:00
Andriy Gapon b830d43356 fix wording / typos in r353625
Reported by:	kib
MFC after:	4 weeks
X-MFC with:	r353625, r353618
2019-10-16 07:53:47 +00:00
Andriy Gapon ba2bbaffcf zfs: add a lame emulation of cv_wait_sig(9) in userland to fix r353618
Not sure if we need anything better.
Maybe we should try to port illumos libfakekernel or provide something
similar natively.

MFC after:	4 weeks
X-MFC with:	r353618
2019-10-16 07:41:33 +00:00
Andriy Gapon c67a6b9c7f MFV r353623: 10473 zfs(1M) missing cross-reference to zfs-program(1M)
illumos/illumos-gate@736e670039
736e670039

https://www.illumos.org/issues/10473

Author: Jason King <jason.king@joyent.com>
Obtained from:	illumos
MFC after:	6 days
2019-10-16 07:20:59 +00:00
Andriy Gapon 179e6dab09 MFV r353615: 9485 Optimize possible split block search space
illumos/illumos-gate@a21fe34979
a21fe34979

https://www.illumos.org/issues/9485
  Port this commit from ZoL:
  4589f3ae4c

Author: Brian Behlendorf <behlendorf1@llnl.gov>
Obtained from:	illumos, ZoL
MFC after:	3 weeks
2019-10-16 06:43:22 +00:00
Andriy Gapon 6cb9ab2bad MFC r353611: 10330 merge recent ZoL vdev and metaslab changes
illumos/illumos-gate@a0b03b161c
a0b03b161c

https://www.illumos.org/issues/10330
  3 recent ZoL changes in the vdev and metaslab code which we can pull over:
  PR 8324 c853f382db 8324 Change target size of metaslabs from 256GB to 16GB
  PR 8290 b194fab0fb 8290 Factor metaslab_load_wait() in metaslab_load()
  PR 8286 419ba59145 8286 Update vdev_is_spacemap_addressable() for new spacemap
  encoding

Author: Serapheim Dimitropoulos <serapheimd@gmail.com>
Obtained from:	illumos, ZoL
MFC after:	2 weeks
2019-10-16 06:26:51 +00:00
Andriy Gapon fba547735c MFV r353606: 10067 Miscellaneous man page typos
https://www.illumos.org/issues/10067
  fileystem - man1m/zfs.1m man1m/boot.1m

Author: Peter Tribble <peter.tribble@gmail.com>
Obtained from:	illumos
MFC after:	1 week
2019-10-16 06:05:18 +00:00
Andriy Gapon f8e8686410 zfs: remove gratuitous divergence from other openzfs flavours
The divergence is a result of a local change in r344601 and a followup
fix in r352580 that reverted portions of the earlier change.

MFC after:	1 week
2019-10-09 11:57:45 +00:00
Andriy Gapon 3f4e3bccdc zfs: remove incorrect warning about boot support for large_dnode
Fixes r353341

Reported by:	tsoome
MFC after:	4 days
X-MFC with:	r353341
2019-10-09 11:46:36 +00:00
Andriy Gapon 329012f513 zfs: document large_dnode feature
The text is copied from illumos.
The conversion to mdoc is mine.
The FreeBSD boot warning is copied from large_block description.

MFC after:	4 days
2019-10-09 11:34:16 +00:00
Andriy Gapon 862c20fd89 MFV r350898, r351075: 8423 8199 7432 Implement large_dnode pool feature
8423 8199 7432 Implement large_dnode pool feature

7432 Large dnode pool feature
8199 multi-threaded dmu_object_alloc()
8423 Implement large_dnode pool feature
10406 large_dnode changes broke zfs recv of legacy stream

llumos/illumos-gate@54811da5ac
54811da5ac
https://www.illumos.org/issues/8423
https://www.illumos.org/issues/8199
https://www.illumos.org/issues/7432

illumos/illumos-gate@811964cd9f
811964cd9f
https://www.illumos.org/issues/10406

  ZoL issues:
  Improved dnode allocation #6564
  Clean up large dnode code #6262
  Fix dnode_hold() freeing dnode behavior #8172
  Fix dnode allocation race #6414, #6439
  Partial: Raw sends must be able to decrease nlevels #6821, #6864
  Remove unnecessary txg syncs from receive_object() Closes #7197

This updates FreeBSD large_dnode code (that was imported from ZoL) to a
version that was committed to illumos.  It has some cleanups,
improvements and fixes comparing to what we have in FreeBSD now.
I think that the most significant update is 8199 multi-threaded
dmu_object_alloc().

This commit reverts r351077 that was a revert of r351074 and r351076 and
restores those changes.  Required atomic operations should be available
now on all platforms where we build ZFS.

Obtained from:	illumos
MFC after:	3 weeks
2019-10-07 08:14:45 +00:00
Andriy Gapon 912c3fe715 ZFS: add bookmark renaming
The feature is implemented as an extension of the existing
ZFS_IOC_RENAME ioctl.  Both the userland and the DSL interfaces support
renaming only a single bookmark at a time.  As of now, there is no ZCP
interface to the new functionality.  I am going to add it once the DSL
interface passes a test of time.

This change picks up support for zfs_ioc_namecheck_t::ENTITY_NAME that
was added to ZoL as part of Redacted Send/Receive feature by Paul
Dagnelie <pcd@delphix.com>.  This is needed to allow a bookmark name in
zc_name.

Discussed with:	mahrens
Reviewed by:	bcr (man page)
Sponsored by:	CyberSecure
Differential Revision: https://reviews.freebsd.org/D21795
2019-10-03 11:08:45 +00:00
Andriy Gapon 38a1def12f MFZoL: Retire send space estimation via ZFS_IOC_SEND
Add a small wrapper around libzfs_core's lzc_send_space() to libzfs so
that every legacy ZFS_IOC_SEND consumer, along with their userland
counterpart estimate_ioctl(), can leverage ZFS_IOC_SEND_SPACE to
request send space estimation.

The legacy functionality in zfs_ioc_send() is left untouched for
compatibility purposes.

Obtained from:	ZoL
Obtained from:	zfsonlinux/zfs@cf7684bc8d
Author:		loli10K <ezomori.nozomu@gmail.com>
MFC after:	2 weeks
2019-09-22 08:44:41 +00:00
Andriy Gapon 62dd1037a9 print summary line for space estimate of zfs send from bookmark
Although there is always a single stream and the total size in the
summary is always equal to the size reported for the stream, it's nice
to follow the usual output format.

MFC after:	3 days
2019-09-22 08:34:23 +00:00
Sean Eric Fagan 0c3d878d1c Fix a regression introduced in r344601, and work properly with the
-v and -n options.

PR:		240640
Reported by:	Andriy Gapon <avg@FreeBSD.org>
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D21709
2019-09-21 17:54:42 +00:00
Andriy Gapon dfb2b9a361 update zfs send usage help with r352447
MFC after:	3 days
2019-09-19 09:48:01 +00:00
Andriy Gapon 496ba62c36 MFZoL: Add -vnP support to 'zfs send' for bookmarks
zfsonlinux/zfs@835db58592

We have long supported estimating a size of an incremental stream from a
snapshot.  We should do the same for bookmarks as well.

Obtained from:	ZoL
Author:		loli10K <ezomori.nozomu@gmail.com>
MFC after:	3 days
2019-09-17 13:58:15 +00:00
Andriy Gapon b539c9bfbd ZFS: Always refuse receving non-resume stream when resume state exists
This fixes a hole in the situation where the resume state is left from
receiving a new dataset and, so, the state is set on the dataset itself
(as opposed to %recv child).

Additionally, distinguish incremental and resume streams in error
messages.

This was also committed to ZoL:
zfsonlinux/zfs@ebeb6f23bf

MFC after:	2 weeks
Sponsored by:	CyberSecure
2019-09-04 07:33:22 +00:00
Li-Wen Hsu c4bf2f169a Fix dtrace test case after r351423 due to ping6(8) options changed
Failure test case:
    cddl.usr.sbin.dtrace.common.ip.t_dtrace_contrib.tst_ipv6localicmp_ksh

Sponsored by:	The FreeBSD Foundation
2019-08-31 15:10:27 +00:00
Li-Wen Hsu 8dc0ad14ad Fix tests use /etc/motd after r350184 by using an always existing file
Sponsored by:	The FreeBSD Foundation
2019-08-31 14:41:58 +00:00
Alexander Motin 4d08d25153 MFV/ZoL: Fix wrong assertion in libzfs diff error handling
In compare(), all error cases set the error code to EPIPE, so when an
error is set, the correct assertion to make is that the error is EPIPE,
not EINVAL.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@freqlabs.com>
Closes #8743
zfsonlinux/zfs@9dc41a769d

Submitted by:	Ryan Moeller <ryan@freqlabs.com>
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20118
2019-08-28 17:39:46 +00:00
Mark Johnston e1a29d8c67 Add hold events for lockmgr probes, missed in r351361.
MFC with:	r351361
2019-08-21 23:47:01 +00:00
Mark Johnston 5b699f1614 Add lockmgr(9) probes to the lockstat DTrace provider.
They follow the conventions set by rw and sx lock probes.  There is
an additional lockstat:::lockmgr-disown probe.

Update lockstat(1) to report on contention and hold events for
lockmgr locks.  Document the new probes in dtrace_lockstat.4, and
deduplicate some of the existing probe descriptions.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21355
2019-08-21 23:43:58 +00:00
Mark Johnston 6ad06a5e50 Fix inverted predicates for sx lock hold events in lockstat(1).
This caused shared sx holds to be reported as exclusive, and vice
versa.

Reviewed by:	mjg
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-08-21 23:13:00 +00:00
Mateusz Piotrowski 2379f91cf9 zpool-features.7: Fix a typo
Reported by:	pstef
Reviewed by:	pstef
Approved by:	src (pstef)
Differential Revision:	https://reviews.freebsd.org/D21290
2019-08-16 10:43:23 +00:00
Andriy Gapon 5a75f51a1f Revert r351076 and r351074 because of atomic_swap_64 on 32-bit platforms
Trying to sort it out.
2019-08-15 15:27:58 +00:00
Andriy Gapon 93132b76cd MFV r350898: 8423 8199 7432 Implement large_dnode pool feature
8423 8199 7432 Implement large_dnode pool feature

8423 Implement large_dnode pool feature
8199 multi-threaded dmu_object_alloc()
7432 Large dnode pool feature

llumos/illumos-gate@54811da5ac
54811da5ac
https://www.illumos.org/issues/8423
https://www.illumos.org/issues/8199
https://www.illumos.org/issues/7432

  ZoL issues:
  Improved dnode allocation #6564
  Clean up large dnode code #6262
  Fix dnode_hold() freeing dnode behavior #8172
  Fix dnode allocation race #6414, #6439
  Partial: Raw sends must be able to decrease nlevels #6821, #6864
  Remove unnecessary txg syncs from receive_object() Closes #7197

This updates FreeBSD large_dnode code (that was imported from ZoL) to a version
that was committed to illumos.  It has some cleanups, improvements and fixes
comparing to what we have in FreeBSD now.  I think that the most significant
update is 8199 multi-threaded dmu_object_alloc().

Obtained from:	illumos
MFC after:	3 weeks
2019-08-15 14:57:27 +00:00
Andriy Gapon 4139761bb5 MFV r350896: 6585 sha512, skein, and edonr have an unenforced dependency on extensible dataset
illumos/illumos-gate@892586e8a1
892586e8a1

https://www.illumos.org/issues/6585
  In any pool without the extensible dataset feature flag already enabled,
  creating a dataset with dedup set to use one of the new checksums would result
  in the following panic as soon as any data was added:
  panic[cpu0]/thread=ffffff0006761c40: feature_get_refcount(spa, feature,
  &refcount) != 48 (0x30 != 0x30), file: ../../common/fs/zfs/zfeature.c line 390

  ffffff0006761830 fffffffffba8fbdd ()
  ffffff0006761890 zfs:feature_do_action+11a ()
  ffffff00067618c0 zfs:spa_feature_incr+1e ()
  ffffff0006761920 zfs:dmu_object_zapify+b7 ()
  ffffff00067619b0 zfs:dsl_dataset_activate_feature+97 ()
  ffffff0006761a20 zfs:dsl_dataset_sync+ba ()
  ffffff0006761ab0 zfs:dsl_pool_sync+153 ()
  ffffff0006761b70 zfs:spa_sync+26e ()
  ffffff0006761c20 zfs:txg_sync_thread+227 ()
  ffffff0006761c30 unix:thread_start+8 ()
  Inspection showed that feature->fi_feature was 7, which is the value of
  SPA_FEATURE_EXTENSIBLE_DATASET in the spa_feature enum.
  Testing shows that the panic can be prevented by explicitly setting extensible
  dataset as a dependency for the sha512, edonr, and skein feature flags.
  Alternatively, the new checksums code could possibly be changed to obviate the
  need for the dependency.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: ilovezfs <ilovezfs@icloud.com>

Note that FreeBSD does not support ednor yet.

MFC after:	2 weeks
2019-08-12 11:42:16 +00:00
Andriy Gapon f62615062e Allow ZVOL bookmarks to be listed recursively
Many thanks to cryx-freebsd@h3q.com for reporting the problem and
submitting a fix.  I have chosen to take an equivalent but textually
different patch from ZoL just to avoid increasing divergence between
OpenZFS flavours.

ZoL commit:	zfsonlinux/zfse33da554c5daf0103b093f44ab5b90ad6c064c3f
Author:		loli10K <ezomori.nozomu@gmail.com>
Date:		Wed Sep 7 19:34:20 2016 +0200
PR:		197821
Submitted by:	cryx-freebsd@h3q.com (alternative version)
Reported by:	cryx-freebsd@h3q.com
Obtained from:	ZoL
MFC after:	1 week
2019-08-12 10:00:32 +00:00
Toomas Soome b1b9326846 loader: support com.delphix:removing
We should support removing vdev from boot pool. Update loader zfs reader
to support com.delphix:removing.

Reviewed by:	allanjude
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18901
2019-08-08 18:08:13 +00:00
Wolfram Schneider 2b299dcf0e add forgotten opening bracket "("
PR:		237514
Reviewed by:	allanjude
MFC after:	soon for 11.3 and 12 series
Differential Revision:	https://reviews.freebsd.org/D21009
2019-07-31 21:21:34 +00:00
Baptiste Daroussin f34d9e5d6b Fix a bug introduced with parallel mounting of zfs
Incorporate a fix from zol:
ab5036df1c

commit log from upstream:
 Fix race in parallel mount's thread dispatching algorithm

Strategy of parallel mount is as follows.

1) Initial thread dispatching is to select sets of mount points that
 don't have dependencies on other sets, hence threads can/should run
 lock-less and shouldn't race with other threads for other sets. Each
 thread dispatched corresponds to top level directory which may or may
 not have datasets to be mounted on sub directories.

2) Subsequent recursive thread dispatching for each thread from 1)
 is to mount datasets for each set of mount points. The mount points
 within each set have dependencies (i.e. child directories), so child
 directories are processed only after parent directory completes.

The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL for ZoL having different
timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a` which expects proper mount order can't unmount if the
mounts were reordered by the race condition.

There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

1) #8833 case where input is `/a /a /a/b` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list with two same top level directories.
 There is a race between two POSIX threads A and B,
  * ThreadA for "/a" for test1 and "/a/b"
  * ThreadB for "/a" for test0/a
 and in case of #8833, ThreadA won the race. Two threads were created
 because "/a" wasn't considered as `"/a" contains "/a"`.

2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list containing "/".
 There is a race between two POSIX threads A and B,
  * ThreadA for "/" and "/var/data/test"
  * ThreadB for "/var/data"
 and in case of #8450, ThreadA won the race. Two threads were created
 because "/var/data" wasn't considered as `"/" contains "/var/data"`.
 In other words, if there is (at least one) "/" in the input list,
 the initial thread dispatching must be single-threaded since every
 directory is a child of "/", meaning they all directly or indirectly
 depend on "/".

In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider above two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

PR:		237517, 237397, 239243
Submitted by:	Matthew D. Fuller <fullermd@over-yonder.net> (by email)
MFC after:	3 days
2019-07-26 13:12:33 +00:00
Mark Johnston d6eb98610f Reference stdint.h types in ctf.5.
MFC after:	1 week
2019-07-17 16:31:50 +00:00
Allan Jude 92e0d7f840 zpool.8: the comment property is not read-only
The comment property was listed in the man page twice, once under the list
of read-only properties, and again (correctly), under the list of user
editable properties.

PR:		238355
Reported by:	Michael Zuo <muh.muhten@gmail.com>
Sponsored by:	Klara Systems
2019-06-06 01:32:00 +00:00
Mariusz Zaborski 75ed05ef7d DTrace: create an amd64 test suit
Create two tests checking if we can read urgs registers and if the
rax register returns a correct number.

Reviewed by:	markj
Discussed with:	lwhsu
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20364
2019-06-05 22:32:26 +00:00
Alexander Motin 9b048dd219 MFV r348583: 9847 leaking dd_clones (DMU_OT_DSL_CLONES) objects
illumos/illumos-gate@17fb938fd6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 20:49:20 +00:00
Alexander Motin eb106113c2 MFV r348580: 9559 zfs diff handles files on delete queue in fromsnap poorly
illumos/illumos-gate@20633e304b

Reviewed by: Joshua M. Clulow <josh@sysmgr.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     Paul Dagnelie <pcd@delphix.com>
2019-06-03 20:40:32 +00:00
Alexander Motin 07a5c938c9 MFV r348578: 9962 zil_commit should omit cache thrash
illumos/illumos-gate@cab3a55e15

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author:     Prakash Surya <prakash.surya@delphix.com>
2019-06-03 20:24:40 +00:00
Alexander Motin ce88141b27 MFV r348568: 9466 add JSON output support to channel programs
illumos/illumos-gate@5267591016

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Alek Pinchuk <apinchuk@datto.com>
2019-06-03 19:15:06 +00:00
Alexander Motin 677ef2563d MFV r348553: 9681 ztest failure in spa_history_log_internal due to spa_rename()
illumos/illumos-gate@6aee0ad769

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Matthew Ahrens <mahrens@delphix.com>
2019-06-03 18:32:56 +00:00
Alexander Motin 74f7070445 MFV r348552: 9682 page fault in dsl_async_clone_destroy() while opening pool
illumos/illumos-gate@ade2c82828

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Serapheim Dimitropoulos <serapheim@delphix.com>
2019-06-03 17:56:44 +00:00
Alexander Motin c4af53b8f1 MFV r348534: 9616 Bogus error when attempting to set property on read-only pool
illumos/illumos-gate@f62db44dbc

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
2019-06-03 17:19:05 +00:00
Leandro Lupori ff7449d6f5 [PowerPC64] stand: fix build using clang 8 as compiler
This change fixes "stand" build issues when using clang 8
as compiler.

Submitted by:   alfredo.junior_eldorado.org.br
Reviewed by:    jhibbits
Differential Revision: https://reviews.freebsd.org/D20026
2019-05-20 19:21:35 +00:00
Allan Jude 19a9d4fa28 MFV/ZoL: zfs userspace ignored all unresolved UIDs after the first
zfsonlinux/zfs@88cfff1824

zfs_main: fix `zfs userspace` squashing unresolved entries

The `zfs userspace` squashes all entries with unresolved numeric
values into a single output entry due to the comparsion always
made by the string name which is empty in case of unresolved IDs.

Fix this by falling to a numerical comparison when either one
of string values is not found. This then compares any numerical
values after all with a name resolved.

Signed-off-by: Pavel Boldin <boldin.pavel@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Reported by:	clusteradm
Obtained from:	ZFS-on-Linux
MFC after:	3 days
2019-05-18 12:27:22 +00:00
Alexander Motin d044b69950 Fix dataset name comparison in zfs_compare().
The code never returned match comparing two datasets (not snapshots).
As result, uu_avl_find(), called from zfs_callback(), never succeeded,
allowing to add same dataset into the list multiple times, for example:

	# zfs get name pers pers pers@z pers@z
	NAME    PROPERTY  VALUE   SOURCE
	pers    name      pers    -
	pers    name      pers    -
	pers@z  name      pers@z  -

With the patch:

	# zfs get name pers pers pers@z pers@z
	NAME    PROPERTY  VALUE   SOURCE
	pers    name      pers    -
	pers@z  name      pers@z  -

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-05-08 01:35:43 +00:00
Li-Wen Hsu 132af14fde Add a trailing empty line to match the test code output
This is added for letting these long failing test case pass, and for
consistency.  The test code should be fixed later to not output this extra
empty line.

Sponsored by:	The FreeBSD Foundation
2019-04-29 03:50:21 +00:00
Michael Tuexen 6eb0062dd0 Some test scripts use ncat --sctp --listen port to run an SCTP discard
server in the background. However, when running in the background,
stdin is closed and ncat initiates a graceful shutdown of the SCTP
association. This is not expected by the client. Therefore, the
ncat-based discard server is replaced by a perl-based one.

In addition, to remove the dependency from ncat, which needs to be
installed via the nmap port, also the code testing for a free SCTP port
is changed to use the perl-based client.

Finally, remove some debug output from the report generated.

Reviewed by:		lwhsu@
Differential Revision:	https://reviews.freebsd.org/D20086
2019-04-28 19:07:31 +00:00
Mark Johnston 5ee81b26a8 Ensure that we use a 64-bit value for the last mmap() argument.
When using __syscall(2), the offset argument is passed on the stack on
amd64.  Previously only 32 bits were written, so the upper 32 bits were
garbage and could cause the test to fail.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-03-20 23:35:15 +00:00
Baptiste Daroussin e7be6f4ed4 Fix a regression introduced in r344569
Import a fix from illumos (thanks Toomas Soomas for pointing at it)

See https://www.illumos.org/issues/10205 for more details
Illumos commit: 247b7da039

Submitted by:	jack@gandi.net
Reported by:	cy
Reviewed by:	tsoome, cy, bapt
Obtained from:	Illumos
2019-02-27 13:49:41 +00:00
Baptiste Daroussin c7851c5b7c Fix regression introduced in r344569
Reported by:	cy
Tested by:	cy
Submitted by:	Fatih Acar <fatih@gandi.net>
2019-02-27 07:55:53 +00:00
Sean Eric Fagan 50792eb553 Set process title during zfs send.
This adds a '-V' option to 'zfs send', which sets the process title once a
second to the progress information.

This code has been in FreeNAS for a long time now; this is just upstreaming
it here.  It was originially written by delphij.

Reviewed by:	mav
Obtained from:	iXsystems, Inc
Sponsored by:	iXsystems, Inc
Differential Revision:	https://reviews.freebsd.org/D19184
2019-02-26 19:23:22 +00:00
Baptiste Daroussin 0b858c82d8 Implement parallel mounting for ZFS filesystem
It was first implemented on Illumos and then ported to ZoL.
This patch is a port to FreeBSD of the ZoL version.
This patch also includes a fix for a race condition that was amended

With such patch Delphix has seen a huge decrease in latency of the mount phase
(https://github.com/openzfs/openzfs/commit/a3f0e2b569 for details).
With that current change Gandi has measured improvments that are on par with
those reported by Delphix.

Zol commits incorporated:
a10d50f999
e63ac16d25

Reviewed by:	avg, sef
Approved by:	avg, sef
Obtained from:	ZoL
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Gandi.net
Differential Revision:	https://reviews.freebsd.org/D19098
2019-02-26 08:18:34 +00:00
Leandro Lupori 559af1ec16 Increase ctfconvert buffer size
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D19353
2019-02-25 18:52:47 +00:00
Mark Johnston 9747bd8e02 MFV r344364:
9058 postmortem DTrace frequently broken under vmware

illumos/illumos-gate@793bd7e361

MFC after:	1 week
2019-02-20 17:10:30 +00:00
Andriy Gapon 885b0f9e91 zpool.8: sort zpool status flags in the same order as in illumos manual
Just in case, while I was here.

MFC after:	1 week
2019-02-20 13:37:27 +00:00
Andriy Gapon d97ff345cf zpool.8: document -D flag for zpool status
The description is taken from the illumos manual.

Reported by:	stilezy@gmail.com
MFC after:	1 week
2019-02-20 13:34:16 +00:00
Andriy Gapon 30d6475b3f fix userland illumos taskq code to pass relative timeout to cv_timedwait
Unlike illumos, FreeBSD cv_timedwait requires a relative timeout.  That
applies both to the kernel illumos compatibility code and to the
userland "fake kernel" code.

MFC after:	2 weeks
Sponsored by:	Panzura
2019-02-20 13:19:08 +00:00
Andriy Gapon 4c325393f3 MFV r342532: 5882 Temporary pool names
Note that this commit brings only formatting changes that were done
during the final review of the illumos change, because FreeBSD got the
main changes before illumos.

illumos/illumos-gate@04e5635652
04e5635652

https://www.illumos.org/issues/5882
  This is an import of the temporary pool names functionality from ZoL:
  e2282ef57e
  26b42f3f9d
  2f3ec90061
  00d2a8c92f
  83e9986f6e
  023bbe6f01
  It is intended to assist the creation and management of virtual machines
  that have their rootfs on ZFS on hosts that also have their rootfs on
  ZFS. These situations cause SPA namespace collisions when the standard
  name rpool is used in both cases. The solution is either to give each
  guest pool a name unique to the host, which is not always desireable, or
  boot a VM environment containing an ISO image to install it, which is
  cumbersome.

MFC after:	1 week
Sponsored by:	Panzura
2018-12-26 11:03:14 +00:00
Andriy Gapon f050611e7f MFV r342469: 9630 add lzc_rename and lzc_destroy to libzfs_core
illumos/illumos-gate@049ba636fa
049ba636fa

https://www.illumos.org/issues/9630
  Rename and destroy are very useful operations that deserve to be in
  libzfs_core.  And they are not hard to implement too.

MFC after:	2 weeks
Relnotes:	maybe
2018-12-26 10:37:41 +00:00
Yuri Pankov 65c9ed85e4 dtrace(1): remove reference to dtruss that was removed from base
system in r300226.

PR:		211618
Reviewed by:	gnn, markj, 0mp
Approved by:	kib (mentor, implicit)
Differential Revision:	https://reviews.freebsd.org/D17762
2018-10-31 15:29:26 +00:00
Michael Tuexen 1e88cc8b59 Add support for send, receive and state-change DTrace providers for
SCTP. They are based on what is specified in the Solaris DTrace manual
for Solaris 11.4.

Reviewed by:		0mp, dteske, markj
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16839
2018-08-22 21:23:32 +00:00
Mark Johnston f0af0b312f Add partial documentation for dtrace(1)'s -x configuration options.
Some options are still missing descriptions, but they can be filled in
over time.

Submitted by:	raichoo <raichoo@googlemail.com>
Reviewed by:	0mp (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16671
2018-08-16 19:28:44 +00:00
Matt Macy cc0fbbb92e MFV/ZoL: Implement large_dnode pool feature
commit 50c957f702
Author: Ned Bass <bass6@llnl.gov>
Date:   Wed Mar 16 18:25:34 2016 -0700

    Implement large_dnode pool feature

    Justification
    -------------

    This feature adds support for variable length dnodes. Our motivation is
    to eliminate the overhead associated with using spill blocks.  Spill
    blocks are used to store system attribute data (i.e. file metadata) that
    does not fit in the dnode's bonus buffer. By allowing a larger bonus
    buffer area the use of a spill block can be avoided.  Spill blocks
    potentially incur an additional read I/O for every dnode in a dnode
    block. As a worst case example, reading 32 dnodes from a 16k dnode block
    and all of the spill blocks could issue 33 separate reads. Now suppose
    those dnodes have size 1024 and therefore don't need spill blocks.  Then
    the worst case number of blocks read is reduced to from 33 to two--one
    per dnode block. In practice spill blocks may tend to be co-located on
    disk with the dnode blocks so the reduction in I/O would not be this
    drastic. In a badly fragmented pool, however, the improvement could be
    significant.

    ZFS-on-Linux systems that make heavy use of extended attributes would
    benefit from this feature. In particular, ZFS-on-Linux supports the
    xattr=sa dataset property which allows file extended attribute data
    to be stored in the dnode bonus buffer as an alternative to the
    traditional directory-based format. Workloads such as SELinux and the
    Lustre distributed filesystem often store enough xattr data to force
    spill bocks when xattr=sa is in effect. Large dnodes may therefore
    provide a performance benefit to such systems.

    Other use cases that may benefit from this feature include files with
    large ACLs and symbolic links with long target names. Furthermore,
    this feature may be desirable on other platforms in case future
    applications or features are developed that could make use of a
    larger bonus buffer area.

    Implementation
    --------------

    The size of a dnode may be a multiple of 512 bytes up to the size of
    a dnode block (currently 16384 bytes). A dn_extra_slots field was
    added to the current on-disk dnode_phys_t structure to describe the
    size of the physical dnode on disk. The 8 bits for this field were
    taken from the zero filled dn_pad2 field. The field represents how
    many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
    This convention results in a value of 0 for 512 byte dnodes which
    preserves on-disk format compatibility with older software.

    Similarly, the in-memory dnode_t structure has a new dn_num_slots field
    to represent the total number of dnode_phys_t slots consumed on disk.
    Thus dn->dn_num_slots is 1 greater than the corresponding
    dnp->dn_extra_slots. This difference in convention was adopted
    because, unlike on-disk structures, backward compatibility is not a
    concern for in-memory objects, so we used a more natural way to
    represent size for a dnode_t.

    The default size for newly created dnodes is determined by the value of
    a new "dnodesize" dataset property. By default the property is set to
    "legacy" which is compatible with older software. Setting the property
    to "auto" will allow the filesystem to choose the most suitable dnode
    size. Currently this just sets the default dnode size to 1k, but future
    code improvements could dynamically choose a size based on observed
    workload patterns. Dnodes of varying sizes can coexist within the same
    dataset and even within the same dnode block. For example, to enable
    automatically-sized dnodes, run

     # zfs set dnodesize=auto tank/fish

    The user can also specify literal values for the dnodesize property.
    These are currently limited to powers of two from 1k to 16k. The
    power-of-2 limitation is only for simplicity of the user interface.
    Internally the implementation can handle any multiple of 512 up to 16k,
    and consumers of the DMU API can specify any legal dnode value.

    The size of a new dnode is determined at object allocation time and
    stored as a new field in the znode in-memory structure. New DMU
    interfaces are added to allow the consumer to specify the dnode size
    that a newly allocated object should use. Existing interfaces are
    unchanged to avoid having to update every call site and to preserve
    compatibility with external consumers such as Lustre. The new
    interfaces names are given below. The versions of these functions that
    don't take a dnodesize parameter now just call the _dnsize() versions
    with a dnodesize of 0, which means use the legacy dnode size.

    New DMU interfaces:
      dmu_object_alloc_dnsize()
      dmu_object_claim_dnsize()
      dmu_object_reclaim_dnsize()

    New ZAP interfaces:
      zap_create_dnsize()
      zap_create_norm_dnsize()
      zap_create_flags_dnsize()
      zap_create_claim_norm_dnsize()
      zap_create_link_dnsize()

    The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
    spa_maxdnodesize() function should be used to determine the maximum
    bonus length for a pool.

    These are a few noteworthy changes to key functions:

    * The prototype for dnode_hold_impl() now takes a "slots" parameter.
      When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
      ensure the hole at the specified object offset is large enough to
      hold the dnode being created. The slots parameter is also used
      to ensure a dnode does not span multiple dnode blocks. In both of
      these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
      these failure cases are only possible when using DNODE_MUST_BE_FREE.

      If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
      dnode_hold_impl() will check if the requested dnode is already
      consumed as an extra dnode slot by an large dnode, in which case
      it returns ENOENT.

    * The function dmu_object_alloc() advances to the next dnode block
      if dnode_hold_impl() returns an error for a requested object.
      This is because the beginning of the next dnode block is the only
      location it can safely assume to either be a hole or a valid
      starting point for a dnode.

    * dnode_next_offset_level() and other functions that iterate
      through dnode blocks may no longer use a simple array indexing
      scheme. These now use the current dnode's dn_num_slots field to
      advance to the next dnode in the block. This is to ensure we
      properly skip the current dnode's bonus area and don't interpret it
      as a valid dnode.

    zdb
    ---
    The zdb command was updated to display a dnode's size under the
    "dnsize" column when the object is dumped.

    For ZIL create log records, zdb will now display the slot count for
    the object.

    ztest
    -----
    Ztest chooses a random dnodesize for every newly created object. The
    random distribution is more heavily weighted toward small dnodes to
    better simulate real-world datasets.

    Unused bonus buffer space is filled with non-zero values computed from
    the object number, dataset id, offset, and generation number.  This
    helps ensure that the dnode traversal code properly skips the interior
    regions of large dnodes, and that these interior regions are not
    overwritten by data belonging to other dnodes. A new test visits each
    object in a dataset. It verifies that the actual dnode size matches what
    was stored in the ztest block tag when it was created. It also verifies
    that the unused bonus buffer space is filled with the expected data
    patterns.

    ZFS Test Suite
    --------------
    Added six new large dnode-specific tests, and integrated the dnodesize
    property into existing tests for zfs allow and send/recv.

    Send/Receive
    ------------
    ZFS send streams for datasets containing large dnodes cannot be received
    on pools that don't support the large_dnode feature. A send stream with
    large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
    unrecognized by an incompatible receiving pool so that the zfs receive
    will fail gracefully.

    While not implemented here, it may be possible to generate a
    backward-compatible send stream from a dataset containing large
    dnodes. The implementation may be tricky, however, because the send
    object record for a large dnode would need to be resized to a 512
    byte dnode, possibly kicking in a spill block in the process. This
    means we would need to construct a new SA layout and possibly
    register it in the SA layout object. The SA layout is normally just
    sent as an ordinary object record. But if we are constructing new
    layouts while generating the send stream we'd have to build the SA
    layout object dynamically and send it at the end of the stream.

    For sending and receiving between pools that do support large dnodes,
    the drr_object send record type is extended with a new field to store
    the dnode slot count. This field was repurposed from unused padding
    in the structure.

    ZIL Replay
    ----------
    The dnode slot count is stored in the uppermost 8 bits of the lr_foid
    field. The bits were unused as the object id is currently capped at
    48 bits.

    Resizing Dnodes
    ---------------
    It should be possible to resize a dnode when it is dirtied if the
    current dnodesize dataset property differs from the dnode's size, but
    this functionality is not currently implemented. Clearly a dnode can
    only grow if there are sufficient contiguous unused slots in the
    dnode block, but it should always be possible to shrink a dnode.
    Growing dnodes may be useful to reduce fragmentation in a pool with
    many spill blocks in use. Shrinking dnodes may be useful to allow
    sending a dataset to a pool that doesn't support the large_dnode
    feature.

    Feature Reference Counting
    --------------------------
    The reference count for the large_dnode pool feature tracks the
    number of datasets that have ever contained a dnode of size larger
    than 512 bytes. The first time a large dnode is created in a dataset
    the dataset is converted to an extensible dataset. This is a one-way
    operation and the only way to decrement the feature count is to
    destroy the dataset, even if the dataset no longer contains any large
    dnodes. The complexity of reference counting on a per-dnode basis was
    too high, so we chose to track it on a per-dataset basis similarly to
    the large_block feature.

    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #3542
2018-08-12 00:45:53 +00:00
Alexander Leidinger a079a34fd5 Extend the info about the limitations of datasets in jails.
Reviewed by:	allanjude
Sponsored by:	Essen Hackathon
2018-08-11 20:49:19 +00:00
Mark Johnston 0b56e7a8e9 Disable the D subroutines msgsize() and msgdsize().
They are specific to illumos and the corresponding DIF subroutines are
already disabled on FreeBSD.

Reported by:	gnn
2018-08-10 19:23:20 +00:00
Matt Macy 648cfe57fd Performance optimization of AVL tree comparator functions
MFV:
commit ee36c709c3
Author: Gvozden Neskovic <neskovic@gmail.com>
Date:   Sat Aug 27 20:12:53 2016 +0200

    perf: 2.75x faster ddt_entry_compare()
        First 256bits of ddt_key_t is a block checksum, which are expected
    to be close to random data. Hence, on average, comparison only needs to
    look at first few bytes of the keys. To reduce number of conditional
    jump instructions, the result is computed as: sign(memcmp(k1, k2)).

    Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} ,
    which is computed efficiently.  Synthetic performance evaluation of
    original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R)
    CPU E5-2660 v3:

    old     6.85789 s
    new     2.49089 s

    perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
        Compute the result directly instead of using conditionals

    perf: zfs_range_compare()
        Speedup between 1.1x - 2.5x, depending on compiler version and
    optimization level.

    perf: spa_error_entry_compare()
        `bcmp()` is not suitable for comparator use. Use `memcmp()` instead.

    perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
    perf: 2.8x faster zil_bp_compare()
    perf: 2.8x faster mze_compare()
    perf: faster dbuf_compare()
    perf: faster compares in spa_misc
    perf: 2.8x faster layout_hash_compare()
    perf: 2.8x faster space_reftree_compare()
    perf: libzfs: faster avl tree comparators
    perf: guid_compare()
    perf: dsl_deadlist_compare()
    perf: perm_set_compare()
    perf: 2x faster range_tree_seg_compare()
    perf: faster unique_compare()
    perf: faster vdev_cache _compare()
    perf: faster vdev_uberblock_compare()
    perf: faster fuid _compare()
    perf: faster zfs_znode_hold_compare()

    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Richard Elling <richard.elling@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #5033
2018-08-10 06:42:08 +00:00
Alexander Motin 07ddc55096 MFV r337223:
9580 Add a hash-table on top of nvlist to speed-up operations

illumos/illumos-gate@2ec7644aab

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-08-03 01:52:25 +00:00
Alexander Motin 0285589b38 MFV 337214:
9621 Make createtxg and guid properties public

illumos/illumos-gate@e8d4a73c86

Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     Josh Paetzel <josh@tcbug.org>
2018-08-03 00:24:27 +00:00
Alexander Motin d49e9be14f MFV r337184: 9457 libzfs_import.c:add_config() has a memory leak
A memory leak occurs on lines 209 and 213 because the config is not freed
in the error case.  The interface to add_config() seems less than ideal -
it would be better if it copied any data necessary from the config and the
caller freed it.

illumos/illumos-gate@ddfe901b12

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:     sara hartse <sara.hartse@delphix.com>
2018-08-02 21:25:32 +00:00
Alexander Motin 2bce9a5316 MFV r337182: 9330 stack overflow when creating a deeply nested dataset
Datasets that are deeply nested (~100 levels) are impractical. We just put
a limit of 50 levels to newly created datasets. Existing datasets should
work without a problem.

illumos/illumos-gate@5ac95da7d6

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author:     Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-08-02 21:19:35 +00:00
Alexander Motin ac879e61ad 9523 Large alloc in zdb can cause trouble
16MB alloc in zdb_embedded_block() can cause cores in certain situations
(clang, gcc55).

OsX commit: ced236a5da
FreeBSD commit: https://svnweb.freebsd.org/base?view=revision&revision=326150
illumos/illumos-gate@03a4c2f4bf

Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author:     Jorgen Lundman <lundman@lundman.net>

This is an update for r326150 (by avg), where this change comes from.
2018-08-02 20:44:07 +00:00
Alexander Motin c423a5e6b8 MFV r337161: 9512 zfs remap poolname@snapname coredumps
Only filesystems and volumes are valid "zfs remap" parameters: when passed
a snapshot name zfs_remap_indirects() does not handle the EINVAL returned
from libzfs_core, which results in failing an assertion and consequently
crashing.

illumos/illumos-gate@0b2e825398

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     loli10K <ezomori.nozomu@gmail.com>
2018-08-02 19:13:45 +00:00
Alexander Motin 7fca1b93c4 Do not blindly include illumos kernel headers instead of user-space.
It is not needed now, and I doubt it much helped at all, creating more
confusions then good.
2018-08-02 18:55:55 +00:00
Alexander Motin b59e9cd1c0 MFV r316926:
7955 libshare needs to initialize only those datasets being modified by the consumer

illumos/illumos-gate@8a981c3356
8a981c3356

https://www.illumos.org/issues/7955
  Libshare currently initializes all available filesystems when doing any
  libshare operation. This requires iterating through all the filesystem
  multiple times, which is a huge performance problem for sharing and
  unsharing operations.

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Daniel Hoffman <dj.hoffman@delphix.com>

For FreeBSD this is practically a NOP, just a diff reduction.
2018-08-01 21:51:49 +00:00
Michael Tuexen 7bda966394 Add a dtrace provider for UDP-Lite.
The dtrace provider for UDP-Lite is modeled after the UDP provider.
This fixes the bug that UDP-Lite packets were triggering the UDP
provider.
Thanks to dteske@ for providing the dwatch module.

Reviewed by:		dteske@, markj@, rrs@
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16377
2018-07-31 22:56:03 +00:00
Alexander Motin 200c27a75d MFV r337014:
9421 zdb should detect and print out the number of "leaked" objects
9422 zfs diff and zdb should explicitly mark objects that are on the deleted queue

illumos/illumos-gate@20b5dafb42

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author:     Paul Dagnelie <pcd@delphix.com>
2018-07-31 22:50:50 +00:00
Alexander Motin 0021e1c10c MFV r336991, r337001:
9102 zfs should be able to initialize storage devices

The first access to a disk block can incur a performance penalty on some
platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is recommended that
volumes be "thick provisioned", where supported by the platform (VMware).
Thick provisioning is time consuming and often is ignored. If the thick
provision step is omitted, customers will see suboptimal performance until
we have written to all parts of the LUN. ZFS should be able to initialize
any unused storage to remove any first-write penalty that exists.

illumos/illumos-gate@094e47e980

Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author:     George Wilson <george.wilson@delphix.com>
2018-07-31 21:06:04 +00:00
Alexander Motin d1cf4052d0 MFV r336955: 9236 nuke spa_dbgmsg
We should use zfs_dbgmsg instead of spa_dbgmsg.  Or at least,
metaslab_condense() should call zfs_dbgmsg because it's important and rare
enough to always log. It's possible that the message in zio_dva_allocate()
would be too high-frequency for zfs_dbgmsg.

illumos/illumos-gate@21f7c81cc1

Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:47:27 +00:00
Alexander Motin 194000fa21 MFV r336950: 9290 device removal reduces redundancy of mirrors
Mirrors are supposed to provide redundancy in the face of whole-disk failure
and silent damage (e.g. some data on disk is not right, but ZFS hasn't
detected the whole device as being broken). However, the current device
removal implementation bypasses some of the mirror's redundancy.

illumos/illumos-gate@3a4b1be953

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2018-07-31 00:25:39 +00:00
Alexander Motin 6413a6d31f MFV r336946: 9238 ZFS Spacemap Encoding V2
The current space map encoding has the following disadvantages:
[1] Assuming 512 sector size each entry can represent at most 16MB for a segment.
This makes the encoding very inefficient for large regions of space.
[2] As vdev-wide space maps have started to be used by new features (i.e.
device removal, zpool checkpoint) we've started imposing limits in the
vdevs that can be used with them based on the maximum addressable offset
(currently 64PB for a top-level vdev).

The new remains backwards compatible with the old one. The introduced
two-word entry format, besides extending the limits imposed by the single-entry
layout, also includes a vdev field and some extra padding after its prefix.

The extra padding after the prefix should is reserved for future usage (e.g.
new prefixes for future encodings or new fields for flags). The new vdev field
not only makes the space maps more self-descriptive, but also opens the doors
for pool-wide space maps.

One final important note is that the number of bits used for vdevs is reduced
to 24 bits for blkptrs. That was decided as we don't know of any setups that
use more than 16M vdevs for the time being and
we wanted to fit the vdev field in the space map. In addition that gives us
some extra bits in dva_t.

illumos/illumos-gate@17f11284b4

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-07-30 23:47:38 +00:00
Alexander Motin 1960706625 MFV r336944: 9286 want refreservation=auto
When a ZFS volume is created with zfs create -V (but without -s), the
refreservation property is set to a value that is volsize plus the maximum
size of metadata. If refreservation is ever set to another value, it is
impossible to set it back to the automatically determined value. There are
other cases where refreservation may be wrong. These include receiving a
volume that was sent without properties and zfs clone.

We need:

zfs set refreservation=auto <volume>
zfs clone -o refreservation=auto <volume>

Each one would use the same function used by zfs create -V to determine the
proper value for refreservation.

illumos/illumos-gate@1c10ae76c0

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Mike Gerdts <mike.gerdts@joyent.com>
2018-07-30 22:39:30 +00:00
Michael Tuexen 53e0911116 Improve TCP related tests for dtrace.
Ensure that the TCP connections are terminated gracefully as expected
by the test. Use appropriate numbers for sent/received packets.
In addition, enable tst.localtcpstate.ksh, which should pass, but
doesn't until https://reviews.freebsd.org/D16369 is committed.

Reviewed by:		markj@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16288
2018-07-22 10:50:59 +00:00
Michael Tuexen be029a4979 Test that the dtrace UDP receive probe fires.
This test ensures that the fix committed in
https://svnweb.freebsd.org/changeset/base/336551
actually works.

Reviewed by:		dteske@, markj@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16046
2018-07-20 15:37:29 +00:00
Michael Tuexen 10b803a40d Adjust comment to reality since r286171.
Sponsored by:		Netflix, Inc.
2018-07-15 20:42:47 +00:00
Michael Tuexen e0f9b8233f Don't require a local sshd for the local TCP state dtrace test
This change is similar to the one done in r286171 for
tst.ipv4localtcp.ksh. This not only reduces the requirements on the
system used for testing but results also in a graceful teardown of
the TCP connection.

Reviewed by:		gnn@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16276
2018-07-15 20:41:16 +00:00
Michael Tuexen dc9f20b3f3 Fix the UDP tests for dtrace.
The code imported from opensolaris was depending on ping supporting
UDP for sending probes. Since this is not supported by ping on FreeBSD
use a perl script instead.
The remote test requires the usage of ksh93, so state that in the
sheband.
Enable the local test, but keep the remote test disabled, since it
requires a remote machine on the LAN.

Reviewed by:		markj@, gnn@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16268
2018-07-15 20:34:22 +00:00
Michael Tuexen 60e4fb3a3f Return the intended return code.
This bug was spotted by markj@ in D16268 because I copied this code part
and used it there. So fix it.

Sponsored by:		Netflix, Inc.
2018-07-14 19:53:41 +00:00
Michael Tuexen 49d7124b18 Fix shebangs and execute bit of test scripts.
Since we don't have /usr/bin/ksh, use a generic way of specifying
ksh. Some of the tests only run with ksh93, so use this shell
for these tests. Two of the tests don't have the execute bit set,
so fix this, too.

Reviewed by:		markj@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16270
2018-07-14 19:49:14 +00:00
Eric van Gyzen 79d4ee83de Fix markup in zfs(8); no content change
Sponsored by:	Dell EMC
2018-06-15 15:28:31 +00:00
Mark Johnston ecbde90073 Process CUs with a language attribute of DW_LANG_Mips_Assembler.
At the moment ctfconvert(1) does not do much with such CUs, but
that may not be true in the future, and we run ctfconvert on several
assembly files during the build.

X-MFC with:	r334883
2018-06-11 16:33:36 +00:00
Mark Johnston c5fda9bac0 Don't process DWARF generated from non-C/C++ code.
ctfconvert(1) is not designed to handle DWARF generated from such code,
and will generally fail in non-obvious ways.  Use an explicit check to
help catch such potential failures.

Reported by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	2 weeks
2018-06-09 15:10:49 +00:00
Sean Eric Fagan 69724399c4 This originated from ZFS On Linux, as
d4a72f2386

During scans (scrubs or resilvers), it sorts the blocks in each transaction
group by block offset; the result can be a significant improvement. (On my
test system just now, which I put some effort to introduce fragmentation into
the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the
changes.) I've seen similar rations on production systems.

Approved by:	Alexander Motin
Obtained from:	ZFS On Linux
Relnotes:	Yes (improved scrub performance, with tunables)
Differential Revision:	https://reviews.freebsd.org/D15562
2018-06-08 17:38:28 +00:00
Matt Macy 1f52c1db90 ctf dwarf: don't report "no dwarf entry" as if it were an error 2018-05-19 18:50:58 +00:00
Matt Macy d0ba1baed3 ctfconvert: silence useless enum has too many values warning 2018-05-19 06:31:17 +00:00
Sean Bruno 86784a0a22 Cleanup sundry clang warnings for code that is not upstream in illumos.
https://github.com/illumos/illumos-gate/edit/master/usr/src/lib/libzfs/common/libzfs_sendrecv.c

Patch our version of it to quiesce warnings until someone decides to sync
up our code:

libzfs_sendrecv.c:2555:30: warning: format specifies type 'unsigned long'
  but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                                sprintf(guidname, "%lu", thisguid);
                                                   ~~~   ^~~~~~~~
                                                   %llu
libzfs_sendrecv.c:2612:29: warning: format specifies type 'unsigned long'
  but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                        sprintf(guidname, "%lu", parent_fromsnap_guid);
                                           ~~~   ^~~~~~~~~~~~~~~~~~~~
                                           %llu
libzfs_sendrecv.c:2645:29: warning: format specifies type 'unsigned long'
  but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                        sprintf(guidname, "%lu", parent_fromsnap_guid);
                                           ~~~   ^~~~~~~~~~~~~~~~~~~~
                                           %llu

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D15325
2018-05-06 16:22:02 +00:00
Eitan Adler 4822188974 zpool(8): correct list of default properties in 'list'.
The default provides output in the following form:
```
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP
HEALTH  ALTROOT
```

this corrects the man page.

Also submitted upstream as
https://github.com/openzfs/openzfs/pull/632/files (with slightly
different changes needed)
2018-04-28 01:14:16 +00:00
Alexander Motin e4d6c7fc17 MFV man pages update from r329502: 7614 zfs device evacuation/removal.
MFC after:	3 days
2018-04-17 02:33:54 +00:00
Andriy Gapon 81f187e576 allow ZFS pool to have temporary name for duration of current import
The change adds -t <name> option to zpool create and -t option to zpool
import in its form with an old name and a new name.  This allows to
import (or create) a pool under a name that's different from its real,
permanent name without affecting that name.  This is useful when working
with VM images or images of other physical systems if they happen to
have a ZFS pool with the same name as the host system.

The changes come from ZoL with some small tweaks.
The porting has been done by julian.

The change is being submitted to OpenZFS:
https://github.com/openzfs/openzfs/pull/600

Submitted by:	julian
Reviewed by:	smh
MFC after:	2 weeks
Sponsored by:	Panzura (porting)
Differential Revision: https://reviews.freebsd.org/D14972
2018-04-12 10:37:26 +00:00
Alexander Motin 849a7ce2d5 MFV r331712:
9280 Assertion failure while running removal_with_ganging test with 4K devices

illumos/illumos-gate@243952c7ee

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
2018-03-28 23:17:29 +00:00
Alexander Motin 5c4561f332 MFV r331706:
9235 rename zpool_rewind_policy_t to zpool_load_policy_t

illumos/illumos-gate@5dafeea3eb

We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicate a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.

For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-03-28 22:29:06 +00:00
Alexander Motin 0b0c76bc58 MFV r331695, 331700: 9166 zfs storage pool checkpoint
illumos/illumos-gate@8671400134

The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that.  It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data).  It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it.  Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure.  She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state.  Otherwise, she discards it.  With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
2018-03-28 22:01:27 +00:00
Alexander Motin 4bada7a0a2 Partial MFV r329753:
8809 libzpool should leverage work done in libfakekernel

illumos/illumos-gate@f06dce2c1f

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>

We do not have libfakekernel, but need to reduce code divergence.
2018-03-28 20:41:15 +00:00
Conrad Meyer 2f68cdb944 ctfconvert: Fix minor memory leaks in STABS parser
In an error case, free leaked objects.  Does anything use STABS anymore?
Probably not.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-27 22:49:06 +00:00
Conrad Meyer 52f72944b8 ctfconvert/ctfmerge: Fix a memory leak enumerating DWARF files
Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-26 23:20:37 +00:00
Conrad Meyer f5147e312f libctf: Don't construct pointers to out of bounds array offsets
Just attempting to do the pointer arithmetic is undefined behavior.

No functional change intended.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-26 22:02:36 +00:00
Conrad Meyer e796cc77c5 libctf: Appease Coverity overrun warnings
Rather than zeroing and reading into the a smaller union member the full
union size, just zero and read directly into the union.

No functional change intended.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-26 21:57:44 +00:00
Alexander Motin e76e77a972 MFV r331407: 9213 zfs: sytem typo
illumos/illumos-gate@edc8ef7d92

Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
2018-03-23 02:30:29 +00:00
Alexander Motin b8436536c9 MFV r331400: 8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
2018-03-23 02:15:05 +00:00
Mark Johnston df7165b88e Given hidden visibility to symbols referenced by the DOF section.
MFC after:	1 week
2018-03-19 19:32:05 +00:00
Mark Johnston 4ed6321f41 Use __syscall(2) rather than syscall(2) in syscall/tst.args.c.
Some of mmap(2)'s arguments are 64 bits wide.

MFC after:	3 days
2018-03-18 17:03:26 +00:00
Conrad Meyer a8c03de86d libdtrace: Fix another uninitialized dtt_flags UB
Like r331073, eliminate a UB by fully initializing the struct with a designated
initializer.  Note that the similar src_dtt is not fully used, so a similar
treatment was not absolutely required.  I chose to leave it alone.  It
wouldn't hurt to do the same thing, though.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-16 21:10:36 +00:00
Conrad Meyer 1ad2da031e libdtrace: Eliminate a minor UB by fully initializing parameter struct
The dtt_flags value is dereferenced by dt_type_pointer() and must be
initialized first.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-16 20:43:40 +00:00
Alan Somers 50b779bdc9 ZFS: fix adding vdevs to very large pools
r323791 changed the return value of zpool_read_label.  Error paths that
previously returned 0 began to return -1 instead.  However, not all error
paths initialized errno.  When adding vdevs to a very large pool, errno could
be prepopulated with ENOMEM, causing the operation to fail.  Fix the bug by
setting errno=ENOENT in the case that no ZFS label is found.

PR:		226096
Submitted by:	Nikita Kozlov
Reviewed by:	avg
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13088
2018-03-02 21:26:48 +00:00
Alan Somers 7d3761dc72 Don't declare __assfail as static
It gets called by dmu_buf_init_user, which is inline but not static.  So it
needs global linkage itself.

Reported by:	GCC-6
MFC after:	17 days
X-MFC-With:	329722
2018-02-25 14:29:43 +00:00
Alexander Motin 03d54eb339 MFV r329807:
8940 Sending an intra-pool resumable send stream may result in EXDEV

illumos/illumos-gate@544132fce3

"zfs send -t <token>" for an incremental send should be able to resume
successfully when sending to the same pool: a subtle issue in
zfs_iter_children() doesn't currently allow this.

Because resuming from a token requires "guid" -> "dataset" mapping
(guid_to_name()), we have to walk the whole hierarchy to find the right
snapshots to send.
When resuming an incremental send both source and destination live in the
same pool and have the same guid: this is where zfs_iter_children() gets
confused and picks up the wrong snapshot, so we end up trying to send an
incremental "destination@snap1 -> source@snap2" stream instead of
"source@snap1 -> source@snap2": this fails with an "Invalid cross-device
link" (EXDEV) error.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 04:01:55 +00:00
Alexander Motin 1ea10a60f9 MFV r329793, r329795:
9075 Improve ZFS pool import/load process and corrupted pool recovery

illumos/illumos-gate@6f7938128a

Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

https://www.illumos.org/issues/7638: Refactor spa_load_impl into several functions
https://www.illumos.org/issues/8961: SPA load/import should tell us why it failed
https://www.illumos.org/issues/7277: zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
Of data, this feature is for now restricted to a read-only import.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 03:15:35 +00:00
Alexander Motin 613b0d87da 8942 zfs promote .../%recv should be an error
illumos/illumos-gate@add927f8c8

Reported on the ZFSonLinux https://github.com/zfsonlinux/zfs/issues/4843,
fixed by https://github.com/zfsonlinux/zfs/pull/6339:

If we are in the middle of an incremental zfs receive, the child .../%recv
will exist. If you concurrently run zfs promote .../%recv, it will "work",
but then zfs gets confused. For example, there's no obvious way to destroy
the containing filesystem (because it is now a clone of its invisible child).

Attempting to do this promote should be an error. We could fix this by
having zfs_ioc_promote() check if zc_name contains a %, similar to
zfs_ioc_rename().

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: loli10K <ezomori.nozomu@gmail.com>
2018-02-22 01:42:13 +00:00
Alexander Motin 502d18a8f1 MFV r329766: 8962 zdb should work on non-idle pools
illumos/illumos-gate@e144c4e6c9

Currently `zdb` consistently fails to examine non-idle pools as it fails
during the `spa_load()` process. The main problem seems to be that
`spa_load_verify()` fails as can be seen below:

$ sudo zdb -d -G dcenter
    zdb: can't open 'dcenter': I/O error

ZFS_DBGMSG(zdb):
    spa_open_common: opening dcenter
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
    spa_load(dcenter): using uberblock with txg=40824950
    spa_load(dcenter): UNLOADING
    spa_load(dcenter): RELOADING
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
    spa_load(dcenter): using uberblock with txg=40824952
    spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
    spa_load(dcenter): UNLOADING

This change makes `spa_load_verify()` a dryrun when ran from `zdb`. This is
done by creating a global flag in zfs and then setting it in `zdb`.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
2018-02-22 00:42:12 +00:00
Alexander Motin b17bfcde3d 9018 Replace kmem_cache_reap_now() with kmem_cache_reap_soon()
illumos/illumos-gate@36a64e6284

To prevent kmem_cache reaping from blocking other system resources, turn
kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers
to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which
exploits #9017's new taskq_empty().

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Author: Tim Kordas <tim.kordas@joyent.com>

FreeBSD does not use taskqueue for kmem caches reaping, so this change
is less dramatic then it is on Illumos, just limiting reaping to 1 time
per second.  It may possibly be improved later, if needed.
2018-02-21 23:15:06 +00:00
Alexander Motin 24433f00ea MFV r329502: 7614 zfs device evacuation/removal
illumos/illumos-gate@5cabbc6b49

https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.

Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
2018-02-21 16:51:02 +00:00
Andriy Gapon cfb675a138 MFV r329718: 8520 7198 lzc_rollback_to should support rolling back to origin
illumos/illumos-gate@95643f75d2
95643f75d2

https://www.illumos.org/issues/8520
  lzc_rollback_to() should support rolling back to a clone's origin.
  The current checks in zfs_ioc_rollback() would not allow that because the
  origin snapshot belongs to a different filesystem.
  The overly restrictive check was introduced in 7600, but it was not a
  regression as none of the existing tools provided a way to rollback to the
  origin.

https://www.illumos.org/issues/7198
  EINVAL is returned when a dataset does not have any snapshots, so there is
  nothing to roll back to.
  Although the code in zfs_do_rollback checks for that condition in advance, it's
  still possible that the snapshot(s) gets removed after the check and before the
  rollback sync task is executed.
  At the moment zfs command would crash when that happens.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	2 weeks
2018-02-21 15:12:14 +00:00
Alexander Motin 73c9b6d523 MFV r322231:
8430 dir_is_empty_readdir() doesn't properly handle error from fdopendir()

illumos/illumos-gate@ba6e7e6505
ba6e7e6505

https://www.illumos.org/issues/8430
  we should close dirfd if fdopendir() fails.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Sowrabha Gopal <sowrabha.gopal@delphix.com>
2018-02-21 02:21:22 +00:00
Alexander Motin e5a4a83784 MFV r318941: 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for boot pool, we also will need
  whole disk support with UEFI boot and for this, zpool create should create efi-
  system partition.
  I have borrowed the idea from oracle solaris, and introducing zpool create -
  B switch to provide an way to specify that boot partition should be created.
  However, there is still an question, how big should the system partition be.
  For time being, I have set default size 256MB (thats minimum size for FAT32
  with 4k blocks). To support custom size, the set on creation "bootsize"
  property is created and so the custom size can be set as: zpool create B -
  o bootsize=34MB rpool c0t0d0
  After pool is created, the "bootsize" property is read only. When -B switch is
  not used, the bootsize defaults to 0 and is shown in zpool get output with
  value ''. Older zfs/zpool implementations are ignoring this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>

This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.
2018-02-21 00:18:57 +00:00
Alexander Motin 502b48823a MFV r316918: 7990 libzfs: snapspec_cb() does not need to call zfs_strdup()
illumos/illumos-gate@d8584ba6fb
d8584ba6fb

https://www.illumos.org/issues/7990
  The snapspec_cb() callback function in libzfs does not need to call zfs_strdup().

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
2018-02-20 20:46:27 +00:00
Alexander Motin 67aa4cb2b9 MFV r316902: 7745 print error if lzc_* is called before libzfs_core_init
illumos/illumos-gate@7c13517fff
7c13517fff

https://www.illumos.org/issues/7745
  The problem is that consumers of `libZFS_Core` that forget to call
  `libzfs_core_init()` before calling any other function of the library
  are having a hard time realizing their mistake. The library's internal
  file descriptor is declared as global static, which is ok, but it is not
  initialized explicitly; therefore, it defaults to 0, which is a valid
  file descriptor. If `libzfs_core_init()`, which explicitly initializes
  the correct fd, is skipped, the ioctl functions return errors that do
  not have anything to do with `libZFS_Core`, where the problem is
  actually located.
  Even though assertions for that existed within `libZFS_Core` for debug
  builds, they were never enabled because the `-DDEBUG` flag was missing
  from the compiler flags.
  This patch applies the following changes:
  1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`,
         to enable their assertions on debug builds.
  2. It corrects an assertion within `libzfs`, where a function had
         been spelled incorrectly (`zpool_prop_unsupported()`) and nobody
         knew because the `-DDEBUG` flag was missing, and the preprocessor
         was taking that part of the code away.
  3. The library's internal fd is initialized to `-1` and `VERIFY`
         assertions have been placed to check that the fd is not equal to
         `-1` before issuing any ioctl. It is important here to note, that
         the `VERIFY` assertions exist in both debug and non-debug builds.
  4. In `libzfs_core_fini` we make sure to never increment the
         refcount of our fd below 0, and also reset the fd to `-1` when no
         one refers to it. The reason for this, is for the rare case that
         the consumer closes all references but then calls one of the
         library's functions without using `libzfs_core_init()` first, and
         in the mean time, a previous call to `open()` decided to reuse
         our previous fd. This scenario would have passed our assertion in

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-02-20 20:40:55 +00:00
Alexander Motin 499a9a917a MFV r316901:
7730 libzfs`add_config() leaks config nvl when reading spare/l2cache devices

illumos/illumos-gate@105686550e
105686550e

https://www.illumos.org/issues/7730
  antares:root:~# mdb /usr/sbin/zpool
  > ::sysbp _exit
  > ::run import
     pool: data
       id: 2093977168778024605
    state: ONLINE
   action: The pool can be imported using its name or numeric identifier.
   config:

          data        ONLINE
            c6t0d0    ONLINE
            c6t1d0    ONLINE
          cache
            c6t2d0
  mdb: stop on entry to _exit
  mdb: target stopped at:
  0xfee556ba:     nop
  mdb: You've got symbols!
  Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1
  libnvpair.so.1 ]
  > ::findleaks -d
  BYTES             LEAKED VMEM_SEG CALLER
  4096                  10 fda7b000 MMAP
  8192                   1 fea8d000 MMAP
  8192                   1 fe76d000 MMAP
  8192                   1 fe66e000 MMAP
  4096                   1 fe570000 MMAP
  8192                   1 fe470000 MMAP
  4096                   1 fe372000 MMAP
  4096                   1 fe273000 MMAP

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-02-20 20:37:01 +00:00