When a storage device reports that it does not support cache flush, the
GEOM disk layer by default returns ENOTSUPP in response to a BIO_FLUSH
command.
On AWS, local volumes do not advertise themselves as having write-cache
enabled. When they are selected for L3 on all HDD nodes, the L3
subsystem may inadvertently kick these L3 devices if a BIO_FLUSH command
fails with an ENOTSUPP return code. The fix is to make GEOM disk return
success (0) when this condition occurs and add a sysctl to make this
error handling config-driven.
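
As a rough user-space illustration of the behavior described above (not the
actual GEOM code), the decision could look like the sketch below. The names
sketch_disk, sketch_disk_flush and sketch_flush_notsup_succeed are
hypothetical, the boolean stands in for the new sysctl whose name is not
given here, and EOPNOTSUPP is assumed as the "not supported" error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the new sysctl; the real knob name is not given above. */
static bool sketch_flush_notsup_succeed = true;

/* Hypothetical, simplified view of a disk: does it advertise cache flush? */
struct sketch_disk {
	bool can_flush;
	int (*flush)(void);	/* driver flush routine, returns 0 or an errno */
};

/* Return the completion status for a flush request on this disk. */
static int
sketch_disk_flush(const struct sketch_disk *dp)
{
	assert(dp != NULL);
	if (!dp->can_flush) {
		/* Tunable on: report success; off: keep the old error. */
		return (sketch_flush_notsup_succeed ? 0 : EOPNOTSUPP);
	}
	return (dp->flush());
}
```

With the tunable on, a consumer that treats any flush error as a device
failure no longer sees an error from a device that simply has no cache to
flush.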
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/710
Add some extra files for building the driver as part of the kernel.
Change some #defines to match those used when building as a module.
PR: 268354
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/779
I do not know why this is here, but it blocks compilation.
Removing it makes the builtin option the same as the module build.
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/779
internal_ram_wr() only takes 3 args when ECORE_CONFIG_DIRECT_HWFN
is defined.
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/779
SRIOV is enabled in ecore.h, but by then the qlnx_os.h header
has already been processed without including the relevant
headers.
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/779
Notable upstream pull request merges:
#14654 Pack our DDT ZAPs a bit denser
#14979 Again fix race between zil_commit() and zil_suspend()
#14985 Some ZIO micro-optimizations
#15000 Fix remount when setting multiple properties
#15004 ddt_addref: remove unnecessary phys fill when refcount is 0
#15007 Do not report bytes skipped by scan as issued
#15023 Enable tuning of ZVOL open timeout value
Obtained from: OpenZFS
OpenZFS commit: 009d3288de
OpenZFS tag: zfs-2.2.0-rc1
Normally, modern unwinders use DWARF information to unwind the stack;
however, when the code is not annotated with DWARF instructions,
unwinders fall back to a frame-pointer based algorithm.
This allows libunwind to unwind the stack from global constructors and
destructors. It also makes gdb happy, as it printed a nonexistent frame
before.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40795
The right unwinding stop indicator should be CFI-undefined PC.
https://dwarfstd.org/doc/Dwarf3.pdf - page 118:
If a Return Address register is defined in the virtual unwind table,
and its rule is undefined (for example, by DW_CFA_undefined), then
there is no return address and no call address, and the virtual
unwind of stack activations is complete.
This allows gdb and libunwind to stop successfully when unwinding the
stack from global constructors and destructors.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40794
This man page documents what is currently implemented in siftr.d.
It does not currently work in head, only in stable/13. Follow-up
commits will fix it for head.
Reviewed by: cc, pauamma_gundo.com
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D40809
The default timeout for ZVOL opens may not be sufficient for all cases,
so we should enable the value to be more easily tuned to account for
systems where the default value is insufficient.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #15023
The declaration's array bounds did not match those of the later
definition, raising a -Warray-parameters warning from GCC. However,
the function is also defined before it is used, so the declaration
isn't strictly needed.
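
A minimal sketch of the pattern, using hypothetical names (sketch_sum,
sketch_caller, SKETCH_WORDS) rather than the function touched by this
change: the mismatched declaration is shown only in a comment, and the code
relies on the definition appearing before its first use.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_WORDS	4

/*
 * The problematic pattern looks like this (kept in a comment on purpose):
 *
 *     static unsigned sketch_sum(const unsigned v[2]);
 *     static unsigned sketch_sum(const unsigned v[SKETCH_WORDS]) { ... }
 *
 * GCC's -Warray-parameters flags the differing bounds. Defining the
 * function before its first use makes the separate declaration unnecessary.
 */
static unsigned
sketch_sum(const unsigned v[SKETCH_WORDS])
{
	unsigned total = 0;

	assert(v != NULL);
	for (size_t i = 0; i < SKETCH_WORDS; i++)
		total += v[i];
	return (total);
}

/* The caller comes after the definition, so no prototype is needed. */
static unsigned
sketch_caller(void)
{
	const unsigned v[SKETCH_WORDS] = { 1, 2, 3, 4 };

	return (sketch_sum(v));
}
```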
The DDT is really inefficient on 4k and up vdevs, because it always
allocates 4k blocks, and while compression could save us somewhat
at ashift 9, that stops being true at larger ashift values.
So let's change the default to 32 KiB, which seems like a reasonable
compromise between improved space savings and inflated write sizes
for DDT updates.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14654
The previous comment wondered if this case could happen; it turns out
that it really can't.
This block can only be entered if dde_type and dde_class are "real";
that only happens when a ddt entry has been previously synced to a ddt
store, that is, it was created on a previous txg. Since it's gone through
that sync, its dde_refcount must be >0.
ddt_addref() is called from brt_pending_apply(), which is called at the
beginning of spa_sync(), before pending DMU writes/frees are issued.
Freeing a dedup block is the only thing that can decrement dde_refcount,
so there's no way for it to drop to zero before applying the clone bumps
it.
Further, even if it _could_ go to zero, it wouldn't be necessary to fill
the entry from the block. The phys content is not cleared until the free
is issued, which happens when the refcount goes to zero, when the last
real free comes through. The cloned block should be identical to what's
in the phys already, so the fill should be a no-op anyway.
I've replaced this with an assertion because this is all very dependent
on the ordering in which BRT and DDT changes are applied, and that might
change in the future.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: Klara, Inc.
Closes #15004
With the zl_suspend read in zil_commit() not protected by any locks, it
is possible for new ZIL writes to be in progress while zil_destroy(),
called by zil_suspend(), is freeing them. This patch closes the race
by taking zl_issuer_lock in zil_suspend() and adding a second
zl_suspend check to zil_get_commit_list(), protected by the lock.
It allows all already queued transactions to be logged normally,
while blocking any new ones and calling txg_wait_synced() for the TXGs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14979
- Pack struct zio_prop by 4 bytes from 84 to 80.
- Skip new child ZIO locking while linking to parent. The newly
allocated ZIO is not externally visible yet, so nobody should care.
- Skip io_bp_copy writes when not used (write && non-debug).
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14985
The scan process may skip blocks based on their birth time, DVA, etc.
Traditionally those blocks were accounted as issued, which caused
hugely over-inflated numbers to be reported, having nothing to do
with actual disk I/O. This change uses a previously unused field in
struct dsl_scan_phys to account for such skipped bytes, making it
possible to report how much data was actually scrubbed/resilvered and
what the actual I/O speed is. While formally it is an on-disk format
change, it should be compatible both ways, so it should not need a
feature flag.
This should partially address the same issue as c85ac731a0, but
from a different perspective, complementing it.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15007
This patch changes the "size" passed to snprintf from a
hard-coded (open-ended) value to sizeof(errbuf). This
brings the call in line with the rest of the code
wherever 'errbuf' is used.
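
A short sketch of the idiom, assuming a local array named errbuf; the
function name, buffer size and message text are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a message into a local buffer and return the resulting length. */
static size_t
sketch_format_error(const char *name)
{
	char errbuf[32];

	assert(name != NULL);
	/* sizeof (errbuf) tracks the array; a literal could drift out of sync. */
	(void) snprintf(errbuf, sizeof (errbuf), "cannot open '%s'", name);
	return (strlen(errbuf));
}
```

Note that sizeof only yields the buffer size while errbuf is an array in
scope; if the buffer is passed to a helper as a pointer, the size has to be
passed along with it.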
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #15003
The previous code checked zfs_is_namespace_prop() only for the
last property on the list. If that one was not "namespace", then remount
wasn't called. To fix that, move zfs_is_namespace_prop() inside the
loop and remount if at least one of the properties was "namespace".
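
The shape of the fix can be sketched as follows. sketch_is_namespace_prop()
is a hypothetical stand-in for zfs_is_namespace_prop() that works on
property names, and its property list is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for zfs_is_namespace_prop(); illustrative list. */
static bool
sketch_is_namespace_prop(const char *name)
{
	static const char *const ns[] = { "atime", "exec", "readonly" };

	for (size_t i = 0; i < sizeof (ns) / sizeof (ns[0]); i++) {
		if (strcmp(name, ns[i]) == 0)
			return (true);
	}
	return (false);
}

/* Check every property inside the loop, not just the last one. */
static bool
sketch_needs_remount(const char *const props[], size_t nprops)
{
	bool remount = false;

	assert(props != NULL || nprops == 0);
	for (size_t i = 0; i < nprops; i++) {
		if (sketch_is_namespace_prop(props[i]))
			remount = true;
	}
	return (remount);
}
```

With a list such as "readonly" followed by "compression", a check on only
the last entry would miss the remount, while the loop above catches it.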
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15000
When generated files depend on tools that need to be built for host,
we need to carefully separate them for the DIRDEPS_BUILD so we
only build them once.
Reviewed by: stevek
Sponsored by: Juniper Networks, Inc.
Further testing (sadly, after committing) shows that I missed the fact
that IN_BASE is used as the userland/kernel delimiter (and not just for
FreeBSD-specific code, unlike IN_FREEBSD_BASE). Revert until I have
a full (and proper) fix.
This reverts commit d2a45e9e81.
Consistently use IN_BASE to allow libzfs to get the same default
autotrim value as kernel does.
Note that this does not change the default value itself; rather, it
fixes the source of the value and the value shown in e.g. zpool get
output if it was not set explicitly. (And as a reminder, the default
value of autotrim on FreeBSD is 'on', despite what zpoolprops(7)
says currently.)
PR: 264234
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D40075
The issue that this is designed to work around is only applicable to
glibc, since it's caused by glibc's pthread_cancel() implementation
using dlopen on libgcc_s.so.1 (and therefore not triggering dracut to
include it in the initramfs). This commit adds an extra condition to the
workaround that tests for glibc via "ldconfig -p | grep -qF 'libc.so.6'"
(which should only be present on glibc systems).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Violet Purcell <vimproved@inventati.org>
Closes #14992
Consistently get the proper default value for autotrim.
Currently, only the kernel module is built with IN_FREEBSD_BASE,
and libzfs gets the wrong default value, leading to confusion and
incorrect output when the autotrim value was not set explicitly.
Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15016
This is almost certainly not the meaning of PCB used here.
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40786
The setting of VM_NFREEORDER and the comment describing it were copied
from sparc64 where both the page size and the number of page table
entries that fit in a cache line are different from arm64.
Reviewed by: andrew, kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40782
Device tree overlays are installed in /boot/dtb/overlays by default.
Adjust the comment to mention fdt_overlays and loader.conf, but do not
repeat what is said in the parent directory's description.
PR: 261349
Reviewed by: grahamperrin, kevans
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40785
The right unwinding stop indicator should be CFI-undefined PC.
https://dwarfstd.org/doc/Dwarf3.pdf - page 118:
If a Return Address register is defined in the virtual unwind table,
and its rule is undefined (for example, by DW_CFA_undefined), then
there is no return address and no call address, and the virtual
unwind of stack activations is complete.
This requires the crt code to be built with unwind tables; to that end,
remove -fno-asynchronous-unwind-tables to enable unwind table generation.
PR: 241562, 246322, 246537
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40780