Prefer UNMAPPEDIO and ROTATING from flags sysctl. See
1. aeab0812e6 (Add flags sysctl to ada)
2. cf3ff63e55 (Convert unmappedio over to a flag)
3. 96eb32bf0f (Convert rotating to a flag bit)
Reviewed by: imp, ken, #cam
MFC after: immediately (we want this in 14.0)
Differential Revision: https://reviews.freebsd.org/D42402
These sys/cdefs.h are not needed. Purge them. They are mostly left-over
from the $FreeBSD$ removal. A few in libc are still required for macros
that cdefs.h defines. Keep those.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42385
In 922337e8d3 I added MOVED_LIBS into list-old-files, so that
delete-old-files would remove the old /usr/lib/libc++.so.1 as soon as
possible (after the library moved to /lib).
I left it in list-old-libs in case a user updated their src tree between
delete-old-files and delete-old-libs. Now that some time has passed,
tremove the redundant MOVED_LIBS entry.
PR: 272642
Sponsored by: The FreeBSD Foundation
Don't fill the fields of the UDP/IP header not used for the
checksum computation before performing the checksum computation.
Reviewed by: glebius
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42275
Notable upstream pull request merges:
#15366c3773de1 ZIL: Cleanup sync and commit handling
#15409dbe839a9 zvol: Cleanup set property
#1540960387fac zvol: Implement zvol threading as a Property
#154099ccdb8be zvol: fix delayed update to block device ro entry
#1544805a7348a RAIDZ: Use cache blocking during parity math
#15452514d661c Tune zio buffer caches and their alignments
#15456799e09f7 Unify arc_prune_async() code
#15465763ca47f Fix block cloning between unencrypted and encrypted
datasets
To make the module version better comparable, the module version number
now includes the commit count since last tag.
Obtained from: OpenZFS
OpenZFS commit: 41e55b476b
There is no "free list" for a long time now.
While here slightly tidy up affected comments in other ways.
Note that the "free vnode" term is a misnomer at best and will also need
to get sorted out.
Bump FreeBSD_version to 1500003 to mark the removal of the forward
compat code for the inode64 conversion. This removal should be a nop.
Sponsored by: Netflix
Forward compatibility code was added for running newer ino64 binaries on
older kernels as a transition aide. Now that ino64 has been in the tree
6 years, this code is no longer useful and should have been removed long
ago. Remove it now. Should be no user-visible changes at this point as
all the 'upgrade' scenarios it was intended for are long since past.
Also need to remove this stuff from rtld since the _foo versions
no longer exist.
Sponsored by: Netflix
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D42382
Calling nfs_reset_shares on Linux prints a warning:
`failed to lock /etc/exports.d/zfs.exports.lock: No such file or
directory`
when /etc/exports.d does not exist. The directory gets created, when a
filesystem is actually exported through nfs_toggle_share and
nfs_init_share. The truncation of /etc/exports.d/zfs.exports happens
unconditionally when calling `zfs mount -a` (via zfs_do_mount and
share_mount in `cmd/zfs/zfs_main.c`).
Fixing the issue only in the Linux part, since the exports file on
freebsd is in `/etc/zfs/`, which seems present on 2 FreeBSD systems I
have access to (through `/etc/zfs/compatibility.d/`), while a Debian
box does not have the directory even if `/usr/sbin/exportfs` is
present through the `nfs-kernel-server` package.
The code for exports_available is copied from nfs_available above.
Fixes: ede037cda7
("Make zfs-share service resilient to stale exports")
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes#15369Closes#15468
Implement PS2 Keyboard command 0xf6, which is "SET DEFAULTS". This is
the same as 0xf5 (DISABLE KEYBOARD), but without disabling the keyboard
(since that resets all the defaults as a side effect). Normally, we
clear the fifo when we re-enable the keyboard. However, since this
leaves the keyboard enabled, clear the fifo as part of this command and
send an ack.
Linux's keyboard driver sends this command on reboot. Other commands
enable / reset the kebyoard, so it doesn't matter too much this isn't
implemented for booting, eg ubuntu.
Sponsored by: Netflix
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D42384
Block cloning from an encrypted dataset into an unencrypted dataset
and vice versa is not possible. The current code did allow cloning
unencrypted files into an encrypted dataset causing a panic when
these were accessed. Block cloning between encrypted and encrypted
is currently supported on the same filesystem only.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes#15464Closes#15465
These were obtained from a drive, but they agree with the IBM
documentation.
The bpi/bpmm values are the same as TS1160, but the number of
tracks is much larger (18944 tracks vs 8704 for TS1160). The tapes
are also longer, 1337m total. (According to the MAM on a sample JF
tape. I don't have a JE tape handy to compare.) The end result
is a 50TB raw capacity (150TB compressed) for TS1170 with a JF
cartridge vs 20TB raw capacity (60TB compressed) for TS1160 with
a JE cartridge.
lib/libmt/mtlib.c:
Add the TS1170 density codes to the denstiy table in libmt.
usr.bin/mt/mt.1:
Add the TS1170 density codes and specs to the density table
in the mt(1) man page. As usual for TS drives, there is an
encrypted and non-encrypted density code (0x79 and 0x59
respectively).
MFC after: 3 days
Sponsored by: Spectra Logic
GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#15459
The change in the zvol readonly property does not update the block
device readonly entry until the first IO to the ZVOL. This patch
addresses the issue by updating the block device readonly property
from the set property IOCTL call.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#15409
Currently, zvol threading can be switched through the zvol_request_sync
module parameter system-wide. By making it a zvol property, zvol
threading can be switched per zvol.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#15409
zvol_set_volmode() and zvol_set_snapdev() share a common code path.
Merge this shared code path into zvol_set_common().
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#15409
Allow SCTP state timeouts to be configured independently from TCP state
timeouts.
Reviewed by: tuexen
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42393
Allow the kernel to supply more array elements than expected, but cut
off when we hit what we think the maximum is. This will improve forward
compatibility (i.e. old userspace with newer kernel).
Reviewed by: zlei
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42392
The initial multihome implementation was a little simplistic, and failed
to create all of the required states. Given a client with IP 1 and 2 and
a server with IP 3 and 4 we end up creating states for 1 - 3 and 2 - 3,
as well as 3 - 1 and 4 - 1, but not for 2 - 4.
Check for this.
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42362
The existing code to create extra states when SCTP endpoints supplied
extra addresses missed a case. As a result we failed to generate all of
the required states.
Briefly, if host A has address 1 and 2 and host B has addres 3 and 4 we
generated 1 - 3 and 2 - 3, as well as 1 - 4, but not 2 - 4.
Store the list of endpoints supplied by each host and use those to
generate all of the connection permutations.
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42361
There is no sense to have separate implementations for FreeBSD and
Linux. Make Linux code shared as more functional and just register
FreeBSD-specific prune callback with arc_add_prune_callback() API.
Aside of code cleanup this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15456
Many sysadmins prefer to configure their systems to UTC and it's a
reasonable default when installing, making it easier to get a usable
system by just hitting <return> repeatidly.
Renumber UTC to 0 to preserve the finger memory of those selecting a
region by shortcut.
Reviewed by: jrtc27, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42383
We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise. Doing that caches for 5, 6, 7,
10 and 14KB rounded up to 8, 12 and 16KB respectively make no sense.
Instead specify as alignment the biggest power-of-2 divisor. This
way 2KB and 6KB caches are both aligned to 2KB, while 4KB and 8KB
are aligned to 4KB.
Reduce number of caches to half-power of 2 instead of quarter-power
of 2. This removes caches difficult for underlying allocators to
fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and small DBUF cache it does not worth being too aggressive.
Due to the above alignment issue some of those caches were not
working properly any way. 6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, that makes sense.
Remove explicit alignment in Linux user-space case. I don't think
it should be needed any more with the above fixes.
As result on FreeBSD instead of such numbers of pages per slab:
vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_14336.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_10240.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_7168.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 2 <= Broken
vm.uma.zio_buf_comb_5120.keg.ppera: 2
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3584.keg.ppera: 7 <= Hard to free
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2560.keg.ppera: 2
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1
I am now getting such:
vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 3 <= Fixed, 2 in 3 pages
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15452
RAIDZ parity is calculated by adding data one column at a time. It
works OK for small blocks, but for large blocks results of previous
addition may already be evicted from CPU caches to main memory, and
in addition to extra memory write require extra read to get it back.
This patch splits large parity operations into 64KB chunks, that
should in most cases fit into CPU L2 caches from the last decade.
I haven't touched more complicated cases of data reconstruction to
not over complicate the code. Those should be relatively rare.
My tests on Xeon Gold 6242R CPU with 1MB of L2 cache per core show
up to 10/20% memory traffic reduction when writing to 4-wide RAIDZ/
RAIDZ2 blocks of ~4MB and up. Older CPUs with 256KB of L2 cache
should see the effect even on smaller blocks. Wider vdevs may need
bigger blocks to be affected.
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15448
ZVOL:
- Mark all ZVOL ZIL transactions as sync. Since ZVOLs have only
one object, it makes no sense to maintain async queue and on each
commit merge it into sync. Single sync queue is just cheaper, while
it changes nothing until actual commit request arrives.
- Remove zsd_sync_cnt and the zil_async_to_sync() calls since we
are no longer switching between sync and async queues.
ZFS:
- Mark write transactions as sync based only on number of sync
opens (z_sync_cnt). We can not randomly jump between sync and
async unless we want data corruptions due to writes reordering.
- When file first opened with O_SYNC (z_sync_cnt incremented to 1)
call zil_async_to_sync() for it to preserve correct ordering between
past and future writes.
- Drop zfs_fsyncer_key logic. Looks like it was an optimization
for workloads heavily intermixing async writes with tons of fsyncs.
But first it was broken 8 years ago due to Linux tsd implementation
not allowing data storage between syscalls, and second, I doubt it
is safe to switch from async to sync so often and without calling
zil_async_to_sync().
- Rename sync argument of *_log_write() into commit, now only
signalling caller's intent to call zil_commit() soon after. It
allows WR_COPIED optimizations without extra other meanings.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#15366
The .mcount function needs a BTI branch target. As we can't rely on
asm.h being included use the hint version of a "bti c" instruction.
This is a nop when BTI is not supported or not used.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42230
Add the Branch Target Identification (BTI) note to libc assembly
sources. As all obect files need the note for rtld to have it we need
to insert it in all asm files.
Reviewed by: markj, emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42228
Add the Branch Target Identification (BTI) note to libc assembly
sources and Pointer Authentication Code (PAC) instructions to _init and
_fini.
_init and _fini may be called indirectly so need a BTI landing pad. As
they are non-leaf functions use the appropriate PAC instruction that
also guards against changing the link register.
As all object files need the note for any binary using these object files
we need to insert it in all asm files.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42227
The new STATIC_TLS_EXTRA variable provides a means for applications
to increases the size of the extra static TLS space allocated by
rtld beyond the default of '128'. This extra static TLS space is used
for objects loaded with dlopen.
The value specified in the variable must be no less than the default
value and no greater than the maximum allowed value for size_t type.
If an invalid value is specified, rtld will ignore it and just use
the default value.
The rtld(1) man page is updated to document this new option.
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D42025
This reverts commit 9eff639071.
The libpfctl port has been fixed (to avoid using DIOCGETSTATESV2), so we
can now safely revert this.
Sponsored by: Rubicon Communications, LLC ("Netgate")
This enum also exists as enum ibv_raw_packet_caps in libibverbs/verbs.h.
[khng: cherry-picked from Linux
ebaaee253ad3a3c573ab7d3d77e849056bdfa9ea]
Sponsored by: Juniper Networks, Inc.
MFC after: 7 days
Reviewed by: kib, zlei
Differential Revision: https://reviews.freebsd.org/D42177
The specification for the Toeplitz function doesn't require to set the key
explicitly to be symmetric. In case a symmetric functionality is required
a symmetric key can be simply used.
Wrongly forcing the algorithm to symmetric causes the wrong packet
distribution and a performance degradation.
Link: https://lore.kernel.org/r/20190723065733.4899-7-leon@kernel.org
Fixes: 28d6137008b2 ("IB/mlx5: Add RSS QP support")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Alex Vainman <alexv@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
[khng: cherry-picked from Linux
b7165bd0d6cbb93732559be6ea8774653b204480]
Sponsored by: Juniper Networks, Inc.
MFC after: 7 days
Reviewed by: kib, zlei
Differential Revision: https://reviews.freebsd.org/D42178
Normalize on hard tabs.
I didn't catch this before pushing the previous commit.
No functional changes intended.
MFC after: 2 weeks
MFC with: 8ef8da882f