Previous code tried to keep prefetch streams while moving dnode. But
it was at least not updating per-stream zs_fetchback pointers, causing
use-after-free on next access. Instead of that I see much easier and
cleaner to just drop old prefetch state and start new from scratch.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#11936Closes#11998
While OpenZFS does permit breaking changes to the libzfs API, we should
avoid these changes when reasonably possible, and take steps to mitigate
the impact to consumers when changes are necessary.
Commit e4288a8397 made a libzfs API change that is especially
difficult for consumers because there is no change to the function
signatures, only to their behavior. Therefore, consumers can't notice
that there was a change at compile time. Also, the API change was
incompletely and incorrectly documented.
The commit message mentions `zfs_get_prop()` [sic], but all callers of
`get_numeric_property()` are impacted: `zfs_prop_get()`,
`zfs_prop_get_numeric()`, and `zfs_prop_get_int()`.
`zfs_prop_get_int()` always calls `get_numeric_property(src=NULL)`, so
it assumes that the filesystem is not mounted. This means that e.g.
`zfs_prop_get_int(ZFS_PROP_MOUNTED)` always returns 0.
The documentation says that to preserve the previous behavior, callers
should initialize `*src=ZPROP_SRC_NONE`, and some callers were changed
to do that. However, the existing behavior is actually preserved by
initializing `*src=ZPROP_SRC_ALL`, not `NONE`.
The code comment above `zfs_prop_get()` says, "src: ... NULL will be
treated as ZPROP_SRC_ALL.". However, the code actually treats NULL as
ZPROP_SRC_NONE. i.e. `zfs_prop_get(src=NULL)` assumes that the
filesystem is not mounted.
There are several existing calls which use `src=NULL` which are impacted
by the API change, most noticeably those used by `zfs list`, which now
assumes that filesystems are not mounted. For example,
`zfs list -o name,mounted` previously indicated whether a filesystem was
mounted or not, but now it always (incorrectly) indicates that the
filesystem is not mounted (`MOUNTED: no`). Similarly, properties that
are set at mount time are ignored. E.g. `zfs list -o name,atime` may
display an incorrect value if it was set at mount time.
To address these problems, this commit reverts commit e4288a8397bb1f:
"zfs get: don't lookup mount options when using "-s local""
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11999
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes#12003
Changed the default specified for zfs_dbgmsg_enable, added
clarification of interaction with zfs_flags.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#11984Closes#11986
Remove some extra whitespace.
Use pointer-typed asserts in Linux's znode cache destructor for more
info when debugging.
Simplify a couple of conversions from inode to znode when we already
have the znode.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11974
Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(),
ASSERT3P(), and likewise for VERIFY(). In some cases it ended up
making more sense to change the code, such as VERIFY on nvlist
operations that I have converted to use fnvlist instead. In one
place I changed an internal struct member from int to boolean_t to
match its use. Some asserts that combined multiple checks with &&
in a single assert have been split to separate asserts, to make it
apparent which check fails.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11971
Also minor clean-up with folding state_to_val() into a case,
unrolling the lesser-available seq into numbers,
ignoring vdev states we don't care about,
and documentation comments
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11934Closes#11935
For example, this would happily return "/dev/(null)" for /dev/sda1
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11935
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11970
IS_XATTRDIR is never used.
v_count is only used in two places, one immediately followed by the
use of the real name, v_usecount.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes#11973
We only recognize some history records, instead, use
same logic as in print_history_records() in zpool_main.c.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11940
Looking up mount options can be very expensive on servers with many
mounted file systems. When doing "zfs get" with any "-s" option that
does not include "temporary", the mount list will never be used. This
commit optimizes for that case.
This is a breaking commit for libzfs! Callers of zfs_get_prop are now
required to initialize src. To preserve existing behavior, they should
initialize it to ZPROP_SRC_NONE.
Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes#11955
Under function map_slot() variable passed as args
were not getting properly substituted or expanded.
This patch fixes the substitution issue.
Reviewed-by: Niklas Edmundsson <nikke@acc.umu.se>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes#11951Closes#11959
This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't
needlessly create DTL entries for offline devices that are already
up-to-date.
Consider a 3-way mirror, with disk A always online (and so always with
an empty DTL) and B and C only occasionally online. When A & B resilver
with C offline, B's DTL will effectively be appended to C's due to these
spurious ZIOs even as the resilver empties B's DTL:
* These ZIOs land in vdev_mirror_scrub_done() and flag an error
* That flagged error causes vdev_mirror_io_done() to see
unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which
inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes
that flag in ZIO_VDEV_CHILD_FLAGS.
* That ZIO fails, too, and eventually zio_done() gets its hands on it
and calls vdev_stat_update().
* vdev_stat_update() sees the error and this zio...
* is not speculative,
* is not due to EIO (but rather ENXIO, since the device is closed)
* has an ->io_vd != NULL (specifically, the offline leaf device)
* is a write
* is for a txg != 0 (but rather the read block's physical birth txg)
* has ZIO_FLAG_SCAN_THREAD asserted
* So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev.
Then, when A & C resilver with B offline, that story gets replayed and
C's DTL will be appended to B's.
In fact, one does not need this permanently-broken-mirror scenario to
induce badness: breaking a mirror with no DTLs and then scrubbing will
create DTLs for all offline devices. These DTLs will persist until the
entire mirror is reassembled for the duration of the *resilver*, which,
incidentally, will not consider the devices with good data to be sources
of good data in the case of a read failure.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Closes#11930
In Fedora 28 the packaging guidelines were changed such that ldconfig
should no longer be called in either the %post or %postun scriptlets.
Instead the new compatibility macros %ldconfig_post, %ldconfig_postun,
and %ldocnfig_scriptlets should be used.
Since we only currently support Fedora 31 and newer, we could drop
%post or %postun scriptlets entirely according to the guidelines.
However, since we also use the same spec file for CentOS / RHEL
it's convenient to call the macros which are available starting
with CentOS / RHEL 8. For CentOS / RHEL 7 we must still call
ldconfig in the traditional way.
https://fedoraproject.org/wiki/Changes/Removing_ldconfig_scriptlets
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11931
If zdb is not built with DEBUG mode, the ASSERT macros will be
eliminated.
This will leave vim defined, but not used (gcc warning) and
checkpoint spacemap validation loop will do nothing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11932
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail. Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11949
This obeys the change in freebsd/freebsd-src@bce7ee9d4
External-issue: https://reviews.freebsd.org/D26980
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes#11947
Receiving datasets while blanket inheriting properties like zfs
receive -x mountpoint can generally be desirable, e.g. to avoid
unexpected mounts on backup hosts.
Currently this will fail to receive zvols due to the mountpoint
property being applicable to filesystems only. This limitation
currently requires operators to special-case their minds and tools
for zvols.
This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x),
errors for overriding (-o).
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#11416Closes#11840Closes#11864
Introduce a specific valid function for avx512f+avx512bw (instead
of checking only for avx512f).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes#11937Closes#11938
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts
By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added
This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11924Closes#11928
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes#11918
If $FSLIST exists but is empty, the generator fails with
sort: cannot read: '/etc/zfs/zfs-list.cache/*':
No such file or directory
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11915
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11886
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11886
Also open the temp file cloexec
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11886
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11886
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11886
This replaces the generic libspl atomic.c atomics implementation
with one based on builtin gcc atomics. This functionality was added
as an experimental feature in gcc 4.4. Today even CentOS 7 ships
with gcc 4.8 as the default compiler we can make this the default.
Furthermore, the builtin atomics are as good or better than our
hand-rolled implementation so it's reasonable to drop that custom code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11904
- Add additional logging to provide more information about why the
test failed. This including logging more of the individual commands
and the contents and differences of the record files on failure.
- Updated get_vdevs() to properly exclude all top-level vdevs
including raidz3 and draid[1-3].
- Replaced gnudd with dd. This is the only remaining place in the
test suite gnudd is used and it shouldn't be needed.
- The refill_test_env function expects the pool as the first argument
but never sets the pool variable.
- Only fill the test pools to 50% of capacity instead of 75% to help
speed up the tests.
- Fix replace_missing_devs() calculation, MINDEVSIZE should be
MINVDEVSIZE.
- Fix damage_devs() so it overwrites almost all of the device so
we're guaranteed to damage filesystem blocks.
- redundancy_stripe.ksh should not use log_mustnot to check if the
pool is healthy since the return value may be misinterpreted.
Just perform a normal conditional check and log the failure.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11906
Objtool requires the use of a DRAP register while aligning the
stack. Since a DRAP register is a gcc concept and we are
notoriously low on registers in the crypto code, it's not worth
the effort to mimic gcc generated stack realignment.
We simply silence the warning by adding the offending object files
to OBJECT_FILES_NON_STANDARD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#6950Closes#11914
This deduplicates 2 sets of caches which use the same allocation size.
Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11877
This partly mirrors what the i-t script does (though that mounts all
children, recursively) ‒ /etc, /usr, /lib*, and /bin are all essential,
if present, to successfully invoke the real init, which will then mount
everything else it might need in the right order
The following extreme-case set-up boots w/o issues now:
/ zoot zfs rw,relatime,xattr,noacl
├─/etc zoot/etc zfs rw,relatime,xattr,noacl
├─/usr zoot/usr zfs rw,relatime,xattr,noacl
│ └─/usr/local zoot/usr/local zfs rw,relatime,xattr,noacl
├─/var zoot/var zfs rw,relatime,xattr,noacl
│ ├─/var/lib zoot/var/lib zfs rw,relatime,xattr,noacl
│ ├─/var/log zoot/var/log zfs rw,relatime,xattr,posixacl
│ ├─/var/cache zoot/var/cache zfs rw,relatime,xattr,noacl
│ └─/var/tmp zoot/var/tmp zfs rw,relatime,xattr,noacl
├─/home zoot/home zfs rw,relatime,xattr,noacl
│ └─/home/nab zoot/home/nab zfs rw,relatime,xattr,noacl
├─/boot zoot/boot zfs rw,relatime,xattr,noacl
├─/root zoot/home/root zfs rw,relatime,xattr,noacl
├─/opt zoot/opt zfs rw,relatime,xattr,noacl
└─/srv zoot/srv zfs rw,relatime,xattr,noacl
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11898
"debug" is also used by systemd itself, and there's really no reason for
the generator to write this much garbage by default
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11898
Since the assembly routines calculating SHA checksums don't use
a standard stack layout, CFI directives are needed to unroll the
stack.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11733
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#11872Closes#11896
Before #11710 the flags in zfs-send(8) were sorted.
Restore order and bump the date.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11905
Fixes get_system_hostid() if it was set via the aforementioned sysctl
and simplifies the code a bit. The kernel and user-space must agree,
after all.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11879
This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit
32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers
to ->proc_handler") ‒ 5.7-rc1 and up
The access_ok() check in copy_to_user() in proc_copyout_string() would
always fail, so all userspace reads and writes would fail with EINVAL
proc_dostring() strips only the final new-line,
but simple_strtoul() doesn't actually need a back-trimmed string ‒
writing "012345678 \n" is still allowed, as is "012345678zupsko", &c.
This alters what happens when an invalid value is written ‒
previously it'd get set to what-ever simple_strtoul() returned
(probably 0, thereby resetting it to default), now it does nothing
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11878Closes#11879