Compare commits

307 Commits

Author SHA1 Message Date
Tony Hutter baa5031456 Tag zfs-2.2.6
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-08-27 14:53:03 -07:00
shodanshok cd42e992b5 Enable L2 cache of all (MRU+MFU) metadata but MFU data only
`l2arc_mfuonly` was added to avoid wasting L2 ARC on read-once MRU
data and metadata. However it can be useful to cache as much
metadata as possible while, at the same time, restricting data
cache to MFU buffers only.

This patch allows for such behavior by setting `l2arc_mfuonly` to 2
(or higher). The list of possible values is the following:
0: cache both MRU and MFU for both data and metadata;
1: cache only MFU for both data and metadata;
2: cache both MRU and MFU for metadata, but only MFU for data.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16343
Closes #16402
2024-08-27 14:53:03 -07:00
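The three `l2arc_mfuonly` settings listed in cd42e992b5 amount to a small decision table. The sketch below is only a model of that table: `l2_cache_eligible` and its parameters are hypothetical names, not identifiers from the ZFS source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the l2arc_mfuonly policy described above.
 * Returns true if a buffer may be written to the L2 cache.
 */
bool
l2_cache_eligible(int mfuonly, bool is_mfu, bool is_metadata)
{
    assert(mfuonly >= 0);

    if (mfuonly == 0)
        return (true);      /* 0: MRU and MFU, data and metadata */
    if (mfuonly == 1)
        return (is_mfu);    /* 1: MFU only, data and metadata */

    /* 2 (or higher): all metadata, but only MFU data */
    return (is_metadata || is_mfu);
}
```

With a setting of 2, an MRU metadata buffer is eligible while an MRU data buffer is not.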
Ameer Hamza c60df6a801 linux/zvol_os: fix zvol queue limits initialization
zvol queue limits initialization depends on `zv_volblocksize`, but it is
initialized later, leading to several limits being initialized with
incorrect values, including `max_discard_*` limits. This also causes
`blkdiscard` command to consistently fail, as `blk_ioctl_discard` reads
`bdev_max_discard_sectors()` limits as 0, leading to failure. The fix is
straightforward: initialize `zv->zv_volblocksize` early, before setting
the queue limits. This PR should fix `zvol/zvol_misc/zvol_misc_trim`
failure on recent PRs, as the test case issues `blkdiscard` for a zvol.
Additionally, `zvol_misc_trim` was recently enabled in `6c7d41a`,
which is why the issue wasn't identified earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16454
2024-08-26 15:10:16 -07:00
Rob Norris d8fa32a79d linux/zvol_os: tidy and document queue limit/config setup
It gets hairier again in Linux 6.11, so I want some actual theory of
operation laid out for next time.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-26 15:10:16 -07:00
Tino Reichardt 88a5ee0706 ZTS: fix zfs_copies_006_pos test on Ubuntu 20.04 (#16409)
This test was failing before:
- FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS)

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-26 15:10:16 -07:00
Tino Reichardt 0465fbecd7 ZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)
The timezone "US/Mountain" isn't supported on newer Linux versions.
Using the correct timezone "America/Denver", as is done on FreeBSD,
will fix this. Older Linux distros should also behave okay with this.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-26 15:10:16 -07:00
Shengqi Chen a99a37991e contrib: link zpool to zfs in bash-completion (#16376)
Currently, users won't get completion for the `zpool` command until they
trigger completion of `zfs` first. This patch adds a link to `zfs`,
so users can use either command to initialize the completion.

Fixes: #16320

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-26 15:10:16 -07:00
Shengqi Chen 2ca1515374 module/icp/asm-arm/sha2: enable non-SIMD asm kernels on armv5/6
My merged pull request #15557 fixes compilation of sha2 kernels on arm
v5/6. However, the compiler guards only allow sha256/512_armv7_impl to
be used when __ARM_ARCH > 6. This patch enables these ASM kernels on all
arm architectures. Some compiler guards are adjusted accordingly to
avoid the unnecessary compilation of SIMD (e.g., neon, armv8ce) kernels
on old architectures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15623
2024-08-26 15:10:16 -07:00
Shengqi Chen bce36e21ca module/icp/asm-arm/sha2: auto detect __ARM_ARCH
This patch uses __ARM_ARCH set by compiler (both
GCC and Clang have this) whenever possible instead
of hardcoding it to 7. This change allows code to
compile on earlier ARM architectures such as armv5te.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557
2024-08-26 15:10:16 -07:00
Tony Hutter 86492e3c96 Linux 6.10 compat: META
Update the META file to reflect compatibility with the 6.10 kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16466
2024-08-22 15:43:46 -07:00
Ameer Hamza 07f0465742 linux/zvol_os.c: cleanup limits for non-blk mq case
Rob Norris suggested that we could clean up redundant limits for the
non-blk mq case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462
2024-08-22 15:43:34 -07:00
Ameer Hamza 0f9457d1dd linux/zvol_os.c: Fix max_discard_sectors limit for 6.8+ kernel
In kernels 6.8 and later, the zvol block device is allocated with
qlimits passed during initialization. However, the zvol driver does not
set `max_hw_discard_sectors`, which is necessary to properly
initialize `max_discard_sectors`. This causes the `zvol_misc_trim` test
to fail on 6.8+ kernels when invoking the `blkdiscard` command. Setting
`max_hw_discard_sectors` in the `HAVE_BLK_ALLOC_DISK_2ARG` case resolves
the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462
2024-08-22 15:43:34 -07:00
Justin Gottula 859f906a4b Fix null ptr deref when renaming a zvol with snaps and snapdev=visible (#16316)
If a zvol is renamed, and it has one or more snapshots, and
snapdev=visible is true for the zvol, then the rename causes a kernel
null pointer dereference error. This has the effect (on Linux, anyway)
of killing the z_zvol taskq kthread, with locks still held; which in
turn causes a variety of zvol-related operations afterward to hang
indefinitely (such as udev workers, among other things).

The problem occurs because of an oversight in #15486
(e36ff84c33). As documented in
dataset_kstats_create, some datasets may not actually have kstats
allocated for them; and at least at the present time, this is true for
snapshots. In practical terms, this means that for snapshots,
dk->dk_kstats will be NULL. The dataset_kstats_rename function
introduced in the patch above does not first check whether dk->dk_kstats
is NULL before proceeding, unlike e.g. the nearby
dataset_kstats_update_* functions.

In the very particular circumstance in which a zvol is renamed, AND that
zvol has one or more snapshots, AND that zvol also has snapdev=visible,
zvol_rename_minors_impl will loop over not just the zvol dataset itself,
but each of the zvol's snapshots as well, so that their device nodes
will be renamed as well. This results in dataset_kstats_rename being
called for snapshots, where, as we've established, dk->dk_kstats is
NULL.

Fix this by simply adding a NULL check before doing anything in
dataset_kstats_rename.

This still allows the dataset_name kstat value for the zvol to be
updated (as was the intent of the original patch), and merely blocks
attempts by the code to act upon the zvol's non-kstat-having snapshots.
If at some future time, kstats are added for snapshots, then things
should work as intended in that case as well.

Signed-off-by: Justin Gottula <justin@jgottula.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-22 15:42:49 -07:00
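The fix in 859f906a4b is an early return when there are no kstats to act on. A minimal sketch of that guard follows, assuming simplified stand-in types; `demo_kstats_t`, `demo_dataset_kstats_t` and `demo_kstats_rename` are hypothetical and much smaller than the real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kstat structures. */
typedef struct demo_kstats {
    char name[64];
} demo_kstats_t;

typedef struct demo_dataset_kstats {
    demo_kstats_t *dk_kstats;   /* NULL for datasets without kstats */
} demo_dataset_kstats_t;

/* Returns 1 if the name was updated, 0 if there was nothing to rename. */
int
demo_kstats_rename(demo_dataset_kstats_t *dk, const char *name)
{
    assert(dk != NULL && name != NULL);

    if (dk->dk_kstats == NULL)
        return (0);     /* e.g. a snapshot: no kstats allocated */

    strncpy(dk->dk_kstats->name, name, sizeof (dk->dk_kstats->name) - 1);
    dk->dk_kstats->name[sizeof (dk->dk_kstats->name) - 1] = '\0';
    return (1);
}
```

Without the NULL check, the `strncpy()` call would dereference a NULL pointer for every snapshot visited by the rename loop.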
Tony Hutter 84a9861536 Linux 6.10 compat: Fix zvol NULL pointer dereference
zvol_alloc_non_blk_mq()->blk_queue_set_write_cache() needs the disk
queue setup to prevent a NULL pointer dereference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16453
2024-08-22 15:42:49 -07:00
Tony Hutter 9d64d1bfad Linux 6.10 compat: fix rpm-kmod and builtin
The 6.10 kernel broke our rpm-kmod builds.  The 6.10 kernel really
wants the source files in the same directory as the object files.
This workaround makes rpm-kmod work again.  It also updates
the builtin kernel codepath to work correctly with 6.10.

See kernel commits:

b1992c3772e6 kbuild: use $(src) instead of $(srctree)/$(src) for source
                     directory
9a0ebe5011f4 kbuild: use $(obj)/ instead of $(src)/ for common pattern
                     rules

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16439
Closes #16450
2024-08-22 15:42:49 -07:00
Tony Hutter ce22dc2589 ZTS: Use /dev/urandom instead of /dev/random
Use /dev/urandom so we never have to wait on entropy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16442
2024-08-22 15:42:14 -07:00
Rob Norris 8479a45abe Linux 6.11: avoid passing "end" sentinel to register_sysctl()
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 8156099cf2 Linux 6.11: add compat macro for page_mapping()
Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 11ad6124c3 Linux 6.11: add more queue_limit fields with removed setters
These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 11de432c8b Linux 6.11: IO stats is now a queue feature flag
Apply them with the rest of the settings.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 464747ffd3 Linux 6.11: first arg to proc_handler is now const
Detect it, and use a macro to make sure we always match the prototype.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 92a8af0f8b Linux 6.11: get backing_dev_info through queue gendisk
It's no longer available directly on the request queue, but it's easy to
get from the attached disk.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 4fa84563b8 Linux 6.11: enable queue flush through queue limits
In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are always set together, it's now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Mark Johnston 6961d4fb57 ZTS: Add a test to verify that copy_file_range obeys RLIMIT_FSIZE
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-22 15:31:56 -07:00
Mark Johnston 3a36797ad6 FreeBSD: Fix RLIMIT_FSIZE handling for block cloning
ZFS implements copy_file_range(2) using block cloning when possible.
This implementation must respect the RLIMIT_FSIZE limit.

zfs_clone_range() already checks the limit, so it is safe to remove this
check in zfs_freebsd_copy_file_range().  Moreover, the removed check
produces false positives: the length passed to copy_file_range(2) may be
larger than the input file size; as the man page notes, "for best
performance, call copy_file_range() with the largest len value
possible."  In particular, some existing code passes SSIZE_MAX there.

The check in zfs_clone_range() clamps the length to the input file's
size before checking, but the removed check uses the caller supplied
length, so something like

$ echo a > /tmp/foo
$ limits -f 1024 cat /tmp/foo > /tmp/bar

fails because FreeBSD's cat(1) uses copy_file_range(2) in the manner
described above.

Reported-by: Philip Paeps <philip@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-22 15:17:21 -07:00
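The false positive described in 3a36797ad6 comes down to whether the length is clamped to the input file's size before it is compared against the limit. The sketch below models both checks in one function; `demo_exceeds_limit`, its parameters and the numeric values are illustrative assumptions, not the FreeBSD or ZFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: would copying `len` bytes to `out_off` exceed
 * `limit`? With `clamp` set, the length is first reduced to what the
 * input file can actually supply from `in_off`.
 */
bool
demo_exceeds_limit(uint64_t in_size, uint64_t in_off, uint64_t out_off,
    uint64_t len, uint64_t limit, bool clamp)
{
    if (clamp) {
        uint64_t avail = (in_off < in_size) ? in_size - in_off : 0;
        if (len > avail)
            len = avail;
    }
    /* Written so that out_off + len cannot overflow. */
    return (out_off > limit || len > limit - out_off);
}

/* Tiny input file, very large requested length (illustrative values). */
void
demo_limit_example(void)
{
    /* Unclamped: the caller-supplied length alone trips the limit. */
    assert(demo_exceeds_limit(2, 0, 0, INT64_MAX, 1024, false));
    /* Clamped: only 2 bytes can be copied, so the limit is respected. */
    assert(!demo_exceeds_limit(2, 0, 0, INT64_MAX, 1024, true));
}
```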
c1ick ac6500389b zfs: add bounds checking to zil_parse (#16308)
Make sure log records don't stray beyond the valid memory region.

There is a lack of verification of the space occupied by fixed members
of lr_t in the zil_parse.

We can create a crafted image to trigger an out of bounds read by
following these steps:
    1) Do some file operations and reboot to simulate abnormal exit
       without umount
    2) zil_chain.zc_nused: 0x1000
    3) First lr_t
       lr_t.lrc_txtype: 0x0
       lr_t.lrc_reclen: 0x1000-0xb8-0x1
       lr_t.lrc_txg: 0x0
       lr_t.lrc_seq: 0x1
    4) Update checksum in zil_chain.zc_eck

Fix:
Add some checks to make sure the remaining bytes are large enough to
hold a log record.

Signed-off-by: XDTG <click1799@163.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-08-22 15:12:54 -07:00
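The kind of check ac6500389b describes can be sketched as a record walk that refuses to read a header, or to trust a declared length, unless it fits in the bytes that remain. `demo_rec_t` and `demo_parse` are hypothetical; the real `lr_t` layout and `zil_parse` logic are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size record header; not the real lr_t layout. */
typedef struct demo_rec {
    uint64_t rec_type;
    uint64_t rec_len;   /* total record length, header included */
} demo_rec_t;

/*
 * Count the well-formed records in buf[0..used). Returns -1 as soon as
 * a record header or body would extend past the end of the block.
 */
int
demo_parse(const uint8_t *buf, size_t used)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);
    while (off < used) {
        demo_rec_t hdr;
        size_t remaining = used - off;

        /* The fixed header must fit before it is read. */
        if (remaining < sizeof (hdr))
            return (-1);
        memcpy(&hdr, buf + off, sizeof (hdr));

        /* The declared length must cover the header and fit. */
        if (hdr.rec_len < sizeof (hdr) || hdr.rec_len > remaining)
            return (-1);

        off += (size_t)hdr.rec_len;
        count++;
    }
    return (count);
}
```

A first record whose length leaves only a few trailing bytes, as in the crafted image above, is caught by the first check on the next iteration.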
Rob Norris 1f055436f3 linux/zvol_os: fix SET_ERROR with negative return codes
SET_ERROR is our facility for tracking errors internally. The negation
is to match what the kernel expects from us. Thus, the negation
should happen outside of the SET_ERROR.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364
2024-08-22 15:11:44 -07:00
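The ordering that 1f055436f3 describes can be sketched with a stand-in tracker. `demo_set_error`, `DEMO_SET_ERROR` and `demo_open` are hypothetical, and the assumption that tracked codes are positive errno values belongs to this sketch.

```c
#include <assert.h>
#include <errno.h>

int demo_last_error;    /* hypothetical record of the last tracked error */

/* Stand-in tracker: records a positive errno value and returns it. */
int
demo_set_error(int err)
{
    assert(err > 0);    /* assumption: tracked codes are positive */
    demo_last_error = err;
    return (err);
}
#define DEMO_SET_ERROR(err) demo_set_error(err)

/* Kernel-style entry point: callers expect 0 or a negative errno. */
int
demo_open(int fail)
{
    if (fail)
        return (-DEMO_SET_ERROR(ENXIO));    /* negate outside */
    return (0);
}
```

Writing `DEMO_SET_ERROR(-ENXIO)` instead would hand the tracker a negative code, which this sketch rejects.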
Tino Reichardt 0172ee525b ZTS: fix io_uring test on RHEL 9 variants (#16411)
Simplify the test, by using the variable "$PLATFORM_ID" in favor
of "$REDHAT_SUPPORT_PRODUCT_VERSION".

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-22 15:06:40 -07:00
Tony Hutter 33174af151 Tag zfs-2.2.5
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-08-02 18:03:09 -07:00
Tony Hutter 6f27c4cadd [2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)
db65272ae was added to zfs-2.2.4 to stub in the
VDEV_PROP_RAIDZ_EXPANDING enum without adding the RAIDz expansion
feature.  This was needed to provide the right enum count for when the
VDEV_PROP_SLOW_IO properties got added.  This had the unfortunate side
effect of breaking module removal though.

Specifically, with the VDEV_PROP_RAIDZ_EXPANDING stub added,
the module would correctly omit making kobjects for the RAIDz expansion
vdev property, but then would try to blindly remove its non-existent
kobjects during module unload.

This commit fixes the issue by checking for an uninitialized kobject.

Fixes: #16249

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-02 18:03:09 -07:00
Alexander Motin dd5de55eba ZTS: Make do_vol_test() more deterministic (#16379)
- Explicitly disable compression since mkfile uses a zero buffer.
- Explicitly sync file systems instead of waiting for timeout.

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-30 11:36:52 -07:00
Tony Hutter b5835ed137 Linux 6.9: Fix UBSAN errors in sa.c (#16380)
This is a follow-on to 156a64161b
that ignores UBSAN errors in sa.c.

Thank you @thwalker3 for the fix.

Original-patch-by: @thwalker3
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-23 17:13:57 -07:00
Chunwei Chen ef08cb26da Fix long_free_dirty accounting for small files (#16264)
For files smaller than recordsize, it's most likely that they don't have
L1 blocks. However, the current calculation always returns at least 1 L1
block.

In this change, we check dnode level to figure out if it has L1 blocks
or not, and return 0 if it doesn't. This will reduce the chance of
unnecessary throttling when deleting a large number of small files.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-23 12:02:10 -07:00
Rob Norris 9ad205ecde AUTHORS: refresh with recent new contributors (#16362)
Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-07-23 11:58:49 -07:00
Mark Johnston 14cce09a65 FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)
This way we can avoid making assumptions about the SDT probe
implementation.  No functional change intended.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-23 11:58:49 -07:00
Rob Norris 9835255f5d zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly (#16334)
These are used for DDT and BRT stores. There's limited information
available to produce meaningful output, but at least we can put
something on screen rather than crashing.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-17 14:54:47 -07:00
Rob Norris 4d2f7f9839 vdev_open: clear async fault flag after reopen
After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.

In a single-disk pool, the probe failure will degrade the last disk, and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool returns, the transaction finishes and the async task runs and
faults the vdev, which suspends the pool again.

The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 14:54:47 -07:00
Rob Norris 25c4271d2f zts: test single-disk pool resumes properly after disk pull
A single disk pool should suspend when its disk fails and hold the IO.
When the disk is returned, the pool should return and the IO be
reissued, leaving everything in good shape.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 14:54:47 -07:00
Martin Wagner c950c5d369 disable automatic dependency tracking for dkms builds
Previously the dkms build left some unwanted files
in `/usr/lib/modules` which could cause package
managers to not properly clean up old kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Wagner <martin.wagner.dev@gmail.com>
Closes #16221
Closes #16241
2024-07-17 14:54:47 -07:00
Alexander Motin 13ccbbb47a Some improvements to metaslabs eviction
- Add old eviction for special and dedup metaslab classes. Those
vdevs may be potentially big and fragmented with large metaslabs,
while their asynchronous write pattern is not really different
from normal class. It seems an omission to not evict old metaslabs
from them.
- If we have metaslab preload enabled, which means we are not too
low on memory, do not evict active metaslabs even if they have not
been used for some time.  Evicting active metaslabs means we won't
be able to write anything until we load them, which may take some
time and is the direct opposite of metaslab preload's goals.  For
small systems the memory saving should be less important after the
recent reduction in the number of allocators and so open metaslabs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16214
2024-07-17 14:54:47 -07:00
Alexander Motin ba3c7692cd Destroy ARC buffer in case of fill error
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
Closes #15802
Closes #16216
2024-07-17 14:54:47 -07:00
Rob N 27cc6df760 Use memset to zero stack allocations containing unions
C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.

GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.

When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.

Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.

As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16135
Closes #16206
2024-07-17 14:54:47 -07:00
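The replacement pattern from 27cc6df760 can be sketched as follows: zero the whole object with `memset()` instead of relying on a `{0}` initialiser to reach every union member. `demo_ctx_t` is a hypothetical type whose first union member is deliberately the smallest, and it is far simpler than `aes_ctx_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context type: the first union member is the smallest. */
typedef struct demo_ctx {
    union {
        uint8_t  small;     /* first member, 1 byte */
        uint64_t large[4];  /* later member, 32 bytes */
    } u;
} demo_ctx_t;

/* Zero every byte of the context, whichever member is used later. */
void
demo_ctx_init(demo_ctx_t *ctx)
{
    assert(ctx != NULL);
    memset(ctx, 0, sizeof (*ctx));
}
```

After `demo_ctx_init()`, reading `u.large` yields zeros regardless of compiler or stack-initialisation policy, because every byte of the object was written explicitly.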
Rob Norris d06c8de748 zdb: bring crash handling over from ztest
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-07-17 14:54:47 -07:00
Rob N 2a2e358475 libspl_assert: always link -lpthread on FreeBSD
The pthread_* functions are in -lpthread on FreeBSD. Some of them are
implicitly linked through libc, but on FreeBSD 13 at least
pthread_getname_np() is not. Just be explicit, since -lpthread is the
documented interface anyway.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16168
2024-07-17 14:54:46 -07:00
Martin Matuška bc42d96d66 Unbreak FreeBSD cross-build on MacOS broken in 051460b8b
MacOS uses FreeBSD-compatible getprogname() and pthread_getname_np().
But pthread_getthreadid_np() does not exist on MacOS. This implements
libspl_gettid() using pthread_threadid_np() to get the thread id
of the current thread.

Tested with FreeBSD GitHub actions
freebsd-src/.github/workflows/cross-bootstrap-tools.yml

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16167
2024-07-17 14:54:46 -07:00
Rob Norris 88686213c3 libspl/assert: use libunwind for backtrace when available
libunwind seems to do a better job of resolving symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob Norris 21f66db674 libspl/assert: dump backtrace in assert
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob Norris 3ca305f873 libspl/assert: add lock around assertion output
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob Norris 96cad4ca4c libspl/assert: show process/task details in assert output
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Brooks Davis 5668411713 Only provide execvpe(3) when needed
Check for the existence of execvpe(3) and only provide the FreeBSD
compat version if required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #15609
2024-07-17 14:54:46 -07:00
Rob Norris 32cd2da551 find_system_library: fix var cleanup when library not found
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explicitly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob N fa2480f5b3 abd_iter_page: rework to handle multipage scatterlists
Previously, abd_iter_page() would assume that every scatterlist would
contain a single page (compound or no), because that's all we ever
create in abd_alloc_chunks(). However, scatterlists can contain multiple
pages of arbitrary provenance, and if we get one of those, we'd get all
the math wrong.

This reworks things to handle multiple pages in a scatterlist, by
properly finding the right page within it for the given offset, and
understanding better where the end of the page is and not crossing it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16108
2024-07-17 14:54:46 -07:00
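The offset arithmetic that fa2480f5b3 describes, finding the right page for an offset and not crossing its end, might look roughly like the sketch below. It assumes a segment that starts on a page boundary and a fixed 4096-byte page. `demo_find_page` and `demo_page_span_t` are hypothetical, and the real `abd_iter_page()` works on kernel scatterlists and compound pages.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE  4096u   /* assumed page size for the sketch */

typedef struct demo_page_span {
    size_t page_index;  /* which page within the segment */
    size_t page_offset; /* byte offset inside that page */
    size_t length;      /* bytes usable without leaving the page */
} demo_page_span_t;

/*
 * Locate `offset` within a segment of `seg_len` bytes that may cover
 * several pages, and clamp the span to the end of that page.
 */
demo_page_span_t
demo_find_page(size_t seg_len, size_t offset)
{
    demo_page_span_t span;
    size_t to_page_end, to_seg_end;

    assert(offset < seg_len);
    span.page_index = offset / DEMO_PAGE_SIZE;
    span.page_offset = offset % DEMO_PAGE_SIZE;
    to_page_end = DEMO_PAGE_SIZE - span.page_offset;
    to_seg_end = seg_len - offset;
    span.length = (to_page_end < to_seg_end) ? to_page_end : to_seg_end;
    return (span);
}
```

Assuming a single page per segment corresponds to always using page index 0, which is where the math goes wrong once a segment spans more than one page.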
Rob N ad8c8c1e31 zts: add a debug option to get full test output
The test runner accumulates output from individual tests, then writes it
to the log at the end. If a test hangs or crashes the system half way
through, we get no insight into how it got to where it did.

This adds a -D option for "debug". When set, all test output is written
to stdout.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16096
2024-07-17 14:54:46 -07:00
Rob N f14a62ebbe zts: allow running a single test by name only
Specifying a single test is kind of a hassle, because the full relative
path under the test suite dir has to be included, but it's not always
clear what that path even is.

This change allows `-t` to take the name of a single test instead of a
full path. If the value has no `/` characters, we search for a file of
that name under the test root, and if found, use that as the full test
path instead.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16088
2024-07-17 14:54:46 -07:00
Daniel Berlin dfdac38afb Fix missing semicolon in trace_dbuf.h (#16281)
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop.  Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine.  It also compiles fine on 6.8.11
(the previous kernel).  I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.

All other instances in the source tree are already terminated with
semicolons.
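
To illustrate the failure mode, here is a minimal sketch with a
hypothetical macro, SKETCH_ASSIGN_STR, standing in for the kernel's
helper (whose real definition is not reproduced here). Because the
expansion ends in "while (0)" with no semicolon of its own, the caller
has to supply one.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a helper that expands to a
 * do { ... } while (0) statement. The expansion deliberately has no
 * trailing semicolon, so every use site must provide it.
 */
#define	SKETCH_ASSIGN_STR(dst, src)				\
	do {							\
		strncpy((dst), (src), sizeof (dst) - 1);	\
		(dst)[sizeof (dst) - 1] = '\0';			\
	} while (0)

struct sketch_entry {
	char name[16];
	int level;
};

/* Fills in an entry; the macro use is terminated with ';'. */
void
sketch_fill(struct sketch_entry *e, const char *name, int level)
{
	SKETCH_ASSIGN_STR(e->name, name);	/* dropping ';' breaks parsing */
	e->level = level;
}
```

If the helper instead expands to a plain expression, the extra
semicolon is harmless, which matches the "at worst" case above.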

Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-16 16:34:07 -07:00
a1ea321 08da054005 one-word manpage correction: snapshot->rollback (#16294)
This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.

Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-16 16:34:07 -07:00
Tony Hutter bb401c02fc Linux 6.9 compat: META (#16358)
Update the META file to reflect compatibility with the 6.9
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 16:29:26 -07:00
Rob Norris da9da6aea6 ZTS: handle FreeBSD version numbers correctly (#16340)
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-16 15:47:10 -07:00
Tony Hutter 97f1eb8052 ZTS: Fix redacted_send failures on FreeBSD
We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16224
2024-07-16 15:46:30 -07:00
Rob Norris 7d8e2a7f73 Linux 5.16: use bdev_nr_bytes() to get device capacity
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 15:40:29 -07:00
Rob Norris 3ea3649755 Linux 6.10: work harder to avoid kmem_cache_alloc reuse
Linux 6.10 changed kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.

This inverts the model used before. Rather than always defining the
kmem_cache_* macros, then undefining them inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and do not define those macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.

For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 15:33:46 -07:00
Rob Norris 0342c4a6b2 Linux 6.10: rework queue limits setup
Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device is opened, or some time afterwards. As of 6.10 this
transition appears only partly completed.

This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.
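
The two paths can be sketched roughly as follows. Everything here is a
stand-in: the struct names, fields and setter are hypothetical and only
mimic the shape of the approach, not the real OpenZFS or Linux
definitions.

```c
/* Project-side limits, built up once (hypothetical fields). */
struct sketch_limits {
	unsigned int max_segments;
	unsigned int max_segment_size;
};

/* Stand-in for the kernel's limits structure. */
struct sketch_kernel_limits {
	unsigned int max_segments;
	unsigned int max_segment_size;
};

/* Stand-in for a request queue that owns a set of limits. */
struct sketch_queue {
	struct sketch_kernel_limits lim;
};

/*
 * Newer kernels: convert the whole structure so it can be handed over
 * in one step when the disk is allocated.
 */
void
sketch_limits_convert(const struct sketch_limits *in,
    struct sketch_kernel_limits *out)
{
	out->max_segments = in->max_segments;
	out->max_segment_size = in->max_segment_size;
}

/* Stand-ins for the older per-limit modification functions. */
static void
sketch_set_max_segments(struct sketch_queue *q, unsigned int v)
{
	q->lim.max_segments = v;
}

static void
sketch_set_max_segment_size(struct sketch_queue *q, unsigned int v)
{
	q->lim.max_segment_size = v;
}

/* Older kernels: apply each limit through its own setter. */
void
sketch_limits_apply(const struct sketch_limits *in, struct sketch_queue *q)
{
	sketch_set_max_segments(q, in->max_segments);
	sketch_set_max_segment_size(q, in->max_segment_size);
}
```

Either way the limits are described in one place, and only the last
step differs per kernel version.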

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 15:33:37 -07:00
Tony Hutter d7bf0e5259 Linux 6.9: Fix UBSAN errors in zap_micro.c
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks.  We previously excluded
zap_micro.o with:

UBSAN_SANITIZE_zap_micro.o := n

For some reason that didn't work for the 6.9 kernel, which wants us
to use:

UBSAN_SANITIZE_zfs/zap_micro.o := n

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
2024-07-16 15:33:31 -07:00
Tony Hutter c24a039042 Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)
The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: #16089

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-07-16 15:33:23 -07:00
Rob N f4e2aed42a Linux 6.7 compat: detect if kernel defines intptr_t
Since Linux 6.7 the kernel has defined intptr_t. Clang has
-Wtypedef-redefinition by default, which causes the build to fail
because we also have a typedef for intptr_t.

Since it's better to use the kernel's if it exists, detect it and skip
our own.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16201
2024-07-16 15:33:17 -07:00
George Amanakis 54ef0fdf60 head_errlog: fix use-after-free
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16272
Closes #16273
2024-07-15 09:07:33 -07:00
George Amanakis 2eab4f7b39 Fix assertion in Persistent L2ARC
At the end of l2arc_evict() fix an assertion in the case that l2ad_hand
+ distance == l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16202
Closes #16207
2024-05-29 13:35:14 -07:00
Alexander Motin 4c0fbd8d6d FreeBSD: Add zfs_link_create() error handling
Originally Solaris didn't expect errors there, but they may happen
if we fail to add entry into ZAP.  Linux fixed it in #7421, but it
was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13215
Closes #16138
2024-05-29 08:54:19 -07:00
Alexander Motin fa4b1a404e ZAP: Fix leaf references on zap_expand_leaf() errors
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers check
the returned leaf pointer, and release the leaf if it is not NULL.
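
The contract can be sketched as below. This is an illustration with
made-up names (sketch_expand_leaf, sketch_put_leaf, a counter standing
in for the leaf hold) and an arbitrary error value, not the ZAP code
itself.

```c
#include <stddef.h>

struct sketch_leaf {
	int holds;	/* stand-in for the reference held on the leaf */
};

static void
sketch_put_leaf(struct sketch_leaf *l)
{
	l->holds--;
}

/*
 * Returns 0 on success or a nonzero error. On return *lp is either a
 * leaf the caller still holds, or NULL if the hold was already dropped.
 * fail_mode: 0 = success, 1 = error with leaf held, 2 = error, no leaf.
 */
int
sketch_expand_leaf(struct sketch_leaf **lp, int fail_mode)
{
	switch (fail_mode) {
	case 0:
		return (0);
	case 1:
		return (5);
	default:
		sketch_put_leaf(*lp);
		*lp = NULL;	/* tell the caller there is nothing to release */
		return (5);
	}
}

/* Caller: release the leaf exactly once, whatever the outcome. */
int
sketch_caller(struct sketch_leaf *l, int fail_mode)
{
	int err = sketch_expand_leaf(&l, fail_mode);

	if (l != NULL)
		sketch_put_leaf(l);
	return (err);
}
```

With the NULL convention the caller needs no knowledge of which kind
of error occurred in order to avoid a leak or a double release.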

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #12366 
Closes #16159
2024-05-29 08:54:19 -07:00
Alexander Motin 4c484d66b7 Fix ZIL clone records for legacy holes
Previous code overengineered cloned range calculation by using
BP_GET_LSIZE(). The problem is that legacy holes don't have the
logical size, so result will be wrong.  But we also don't need
to look at every block size, since they all must be identical.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16165
2024-05-29 08:54:19 -07:00
Alexander Motin 41f2a9c81f Fix scn_queue races on very old pools
Code for pools before version 11 uses dmu_objset_find_dp() to scan
for children datasets/clones.  It calls enqueue_clones_cb() and
enqueue_cb() callbacks in parallel from multiple taskq threads.
It ends up bad for scan_ds_queue_insert(), corrupting scn_queue
AVL-tree.  Fix it by introducing a mutex to protect those two
scan_ds_queue_insert() calls.  All other calls are done from the
sync thread and so serialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16162
2024-05-29 08:54:19 -07:00
Alexander Motin 6724746596 Slightly improve dnode hash
As I understand it, the dnode hash includes 8 bits of the objset
pointer, starting at bit 6, just to be less predictable.  But since
objset_t is more than 1KB in size, its allocations are likely aligned
to 2KB, which means the 11 lower bits provide no entropy. Just take the 8 bits
starting from 11.
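
A small sketch of the effect, assuming 2KB-aligned addresses as argued
above: with 8 bits taken from bit 6, five of them (bits 6 to 10) are
always zero, so only 8 distinct values are possible. The helper names
are hypothetical and this is not the dnode hash function itself.

```c
#include <stdint.h>

/* Extract 8 bits of a pointer value starting at bit `shift`. */
static unsigned int
sketch_ptr_bits(uintptr_t p, unsigned int shift)
{
	return ((unsigned int)((p >> shift) & 0xff));
}

/*
 * Count how many distinct 8-bit values appear across `nptrs`
 * consecutive 2KB-aligned addresses for a given starting bit.
 */
unsigned int
sketch_distinct_values(unsigned int shift, unsigned int nptrs)
{
	unsigned char seen[256] = { 0 };
	unsigned int count = 0;

	for (unsigned int i = 0; i < nptrs; i++) {
		uintptr_t p = (uintptr_t)i * 2048;	/* 2KB alignment */
		unsigned int v = sketch_ptr_bits(p, shift);

		if (!seen[v]) {
			seen[v] = 1;
			count++;
		}
	}
	return (count);
}
```

For 256 such addresses, starting at bit 6 yields 8 distinct values
while starting at bit 11 yields all 256.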

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16131
2024-05-29 08:54:19 -07:00
Alexander Motin 938d1588eb Make more taskq parameters writable
There is no reason for these module parameters to be read-only.
When modified they just apply on next pool import/creation, which
is useful for testing different values.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16118
2024-05-29 08:54:19 -07:00
Alexander Motin 0f1e8ba2f8 L2ARC: Cleanup buffer re-compression
When compressed ARC is disabled, we may have to re-compress when
writing into L2ARC.  If doing so we can't fit it into the original
physical size, we should just fail immediately, since even if it
may still fit into allocation size, its checksum will never match.

While there, refactor the code similar to other compression places
without using abd_return_buf_copy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16038
2024-05-29 08:54:19 -07:00
Alexander Motin b474dfad0d Refactor dbuf_read() for safer decryption
In dbuf_read_verify_dnode_crypt():
 - We don't need original dbuf locked there. Instead take a lock
on a dnode dbuf, that is actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being
written.  ARC hash lock does not protect anonymous buffers, so
arc_untransform() is unsafe when used on buffers being written,
which may happen in case of encrypted dnode buffers, since they
are not copied by dbuf_dirty()/dbuf_hold_copy().

In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption
status after it is cached, since it may need arc_untransform().

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16104
2024-05-29 08:54:19 -07:00
chenqiuhao1997 9edf6af4ae Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.
In P2ALIGN, the result would be incorrect when align is an unsigned
integer and x is larger than the max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.
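
The truncation can be reproduced in a few lines. The macro bodies below
are assumed to mirror the usual definitions (mask x with the negated
alignment); the wrapper functions are hypothetical and exist only to
pin down the operand types, and a 32-bit unsigned int is assumed.

```c
#include <stdint.h>

/* Untyped form: the mask is computed in the type of `align`. */
#define	P2ALIGN(x, align)		((x) & -(align))

/* Typed form: both operands are cast to `type` before masking. */
#define	P2ALIGN_TYPED(x, align, type)	((type)(x) & -(type)(align))

/* 64-bit value, 32-bit unsigned alignment: the problematic mix. */
uint64_t
sketch_align_untyped(uint64_t x, uint32_t align)
{
	/* -(align) is 0xFFFFFF00 for align 0x100; widening adds zeros. */
	return (P2ALIGN(x, align));
}

uint64_t
sketch_align_typed(uint64_t x, uint32_t align)
{
	/* The cast happens first, so the mask has all high bits set. */
	return (P2ALIGN_TYPED(x, align, uint64_t));
}
```

For x = 0x100000123 and align = 0x100, the untyped form loses bit 32
and returns 0x100, while the typed form returns 0x100000100.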

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Qiuhao Chen <chenqiuhao1997@gmail.com>
Closes #15940
2024-05-13 10:27:38 -05:00
Tony Hutter 2566592045 Tag zfs-2.2.4
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-04-30 10:01:15 -07:00
Alan Somers 3d4d61988a Fix updating the zvol_htable when renaming a zvol
When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Closes #16127
Closes #16128
2024-04-30 10:01:15 -07:00
Brian Behlendorf 61f3638a34 Add prefetch property
ZFS prefetch is currently governed by the zfs_prefetch_disable
tunable. However, this is a module-wide setting - if a specific
dataset benefits from prefetch, while others have issues with it,
an optimal solution does not exist.

This commit introduces the "prefetch" tri-state property, which enables
granular control (at dataset/volume level) for prefetching.

This patch does not remove the zfs_prefetch_disable, which remains
a system-wide switch to enable/disable prefetch. However, to avoid
duplication, it would be preferable to deprecate and then remove
the module tunable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15237 
Closes #15436
2024-04-30 10:01:15 -07:00
Don Brady 706307445e vdev probe to slow disk can stall mmp write checker
Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15839
2024-04-30 10:01:15 -07:00
Don Brady ea3f7c12a9 Extend import_progress kstat with a notes field
Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages too, as they provide
insight into what is happening.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Closes #15539
2024-04-29 17:45:53 -07:00
George Wilson 6f323353d2 Add ashift validation when adding devices to a pool
Currently, zpool add allows users to add top-level vdevs that have
different ashifts but doing so prevents users from being able to
perform a top-level vdev removal. Oftentimes consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags
are:

--allow-in-use
--allow-ashift-mismatch
--allow-replication-mismatch

The force flag will disable all of these checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509
2024-04-29 13:50:05 -07:00
Ameer Hamza b3b37b84e8 Fix arcstats for FreeBSD after zfetch support
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16141
2024-04-29 13:50:05 -07:00
Ameer Hamza 4d17e200dd Add zfetch stats in arcstats
arc_summary also reports zfetch stats but it's inconvenient to monitor
continuously incrementing numbers. Adding them in arcstats allows us to
observe streams more conveniently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16094
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav 5972bb856c Use ASSERT0P() to check that a pointer is NULL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-29 13:50:05 -07:00
Tony Hutter ef3fea63eb GCC: Fixes for gcc 14 on Fedora 40
- Workaround dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16124
Closes #16125
2024-04-29 13:50:05 -07:00
Brian Behlendorf 71216b91d2 Python 3.12 deprecated python3-distutils
As of python-3.12 the distutils package has been deprecated.
The latest ax_python_devel.m4 macro from the autoconf archive
has been updated accordingly so let's pull in the new version.

We can also drop the changes made to our customized version
to continue if the development version is not installed since
this functionality has been included upstream.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16126
Closes #16129
2024-04-29 13:50:05 -07:00
Todd 284489893b zfs-kmod: fix empty rpm requires/conflicts
Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to
include the correct RPM requires/conflicts relationships.  With this
change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms
packages to be installed together.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Closes #16121
2024-04-29 13:50:05 -07:00
Seth Troisi 6581b17842 ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16114
2024-04-29 13:50:05 -07:00
Seth Troisi 51d3c23150 Add newline to two zpool messages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16113
2024-04-29 13:50:05 -07:00
Tino Reichardt 16c223eec9 Do not use .cfi_negate_ra_state within the assembly on Arm64
Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <andrew.turner4@arm.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14965
Closes #15784
2024-04-29 13:50:05 -07:00
Andrew Turner 7aaf6ce9d8 Add the BTI elf note to the AArch64 SHA2 assembly
On ELF platforms there is a note to specify when an application or
library supports BTI. When linking one of these the linker needs
all input object files to have the note. If not it will not include
it in the output file.

Normally the compiler would generate it, but for assembly files we
need to do it ourselves.

Add the note to the aarch64 sha256 and sha512 assembly files.

Tested by building with BTI enabled and using the -zbti-report=error
flag to lld that makes it an error if the note is missing.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #16086
2024-04-29 13:50:05 -07:00
Rob N 3f817debb4 AUTHORS: refresh with recent new contributors
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16079
2024-04-29 13:50:05 -07:00
Jason Lee 97889c037a return NULL at end of send_progress_thread
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Closes #16074
2024-04-29 13:50:05 -07:00
Maxim Filimonov 86b39b41a0 Fix locale-specific time
In `zpool status -t`, scrub date/time is reported using the C locale,
while trim time is reported using the current one. This is inconsistent.
This patch fixes that.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maxim Filimonov <che@bein.link>
Closes #15878
Closes #15879
2024-04-29 13:50:05 -07:00
Pavel Snajdr 531572b590 Fix panics when truncating/deleting files
There's a union in dbuf_dirty_record_t; dr_brtwrite could evaluate
to B_TRUE if the dirty record is of another type than dl. Adding
more explicit dr type check before trying to access dr_brtwrite.
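
The hazard is the generic one for unions: a member only means something
if the record is of the matching kind. A minimal sketch, with
hypothetical type and field names rather than the real
dbuf_dirty_record_t layout:

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_kind { SKETCH_LEAF, SKETCH_INDIRECT };

struct sketch_dirty_record {
	enum sketch_kind kind;
	union {
		struct {
			uint64_t pad;
			bool brtwrite;	/* only meaningful for leaf records */
		} leaf;
		struct {
			uint64_t child_count;
			uint64_t child_list;	/* overlaps leaf.brtwrite */
		} indirect;
	} u;
};

/*
 * Check the record kind before touching the leaf-only member, so the
 * bytes of an indirect record are never misread as a flag.
 */
bool
sketch_is_brtwrite(const struct sketch_dirty_record *dr)
{
	return (dr->kind == SKETCH_LEAF && dr->u.leaf.brtwrite);
}
```

Without the kind check, whatever nonzero bytes the other variant stored
at that offset could be read as a true flag.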

Fixes two similar panics:

[ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
[ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1373.814979]  dump_stack_lvl+0x71/0x90
[ 1373.815799]  spl_panic+0xd3/0x100 [spl]
[ 1373.827709]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1373.829204]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1373.831010]  dnode_free_range+0x532/0x1220 [zfs]
[ 1373.833922]  dmu_free_long_range+0x4e0/0x930 [zfs]
[ 1373.835277]  zfs_trunc+0x75/0x1e0 [zfs]
[ 1373.837958]  zfs_freesp+0x9b/0x470 [zfs]
[ 1373.847236]  zfs_setattr+0x161a/0x3500 [zfs]
[ 1373.855267]  zpl_setattr+0x125/0x320 [zfs]
[ 1373.856725]  notify_change+0x1ee/0x4a0
[ 1373.859207]  do_truncate+0x7f/0xd0
[ 1373.859968]  do_sys_ftruncate+0x28e/0x2e0
[ 1373.860962]  do_syscall_64+0x38/0x90
[ 1373.861751]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
[ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1822.389232]  dump_stack_lvl+0x71/0x90
[ 1822.389920]  spl_panic+0xd3/0x100 [spl]
[ 1822.399567]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1822.400583]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1822.401752]  dnode_free_range+0x532/0x1220 [zfs]
[ 1822.402841]  dmu_object_free+0x74/0x120 [zfs]
[ 1822.403869]  zfs_znode_delete+0x75/0x120 [zfs]
[ 1822.404906]  zfs_rmnode+0x3f6/0x7f0 [zfs]
[ 1822.405870]  zfs_inactive+0xa3/0x610 [zfs]
[ 1822.407803]  zpl_evict_inode+0x3e/0x90 [zfs]
[ 1822.408831]  evict+0xc1/0x1c0
[ 1822.409387]  do_unlinkat+0x147/0x300
[ 1822.410060]  __x64_sys_unlinkat+0x33/0x60
[ 1822.410802]  do_syscall_64+0x38/0x90
[ 1822.411458]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15983
2024-04-29 13:50:05 -07:00
Alek P 74101f7e2a vdev props comment and manpage should include zfsd and FreeBSD mentions
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #15968
2024-04-29 13:50:05 -07:00
Don Brady c1c26a77ff Add slow disk diagnosis to ZED
Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
like is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
2024-04-29 13:50:05 -07:00
Tony Hutter db65272aef [2.2.4-only] Stub RAIDZ enums to prevent conflicts
Stub in the RAIDZ expansion enums for now so that the slow IO
commit merges cleanly.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-04-29 13:50:05 -07:00
Rob N da88fc4ac9 zap_leaf: make l_hash[] variable length to silence UBSAN
When UBSAN is active and OpenZFS is a debug build, the l_hash assert at
the bottom of zap_open_leaf() causes UBSAN to complain.

This follows the example in 786641dcf to shut it up.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15964
2024-04-29 13:50:05 -07:00
Paul Dagnelie 889152ce4a Give a better message from 'zpool get' with invalid pool name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15942
2024-04-29 13:50:05 -07:00
Rob N 5d859a2e22 xdr: header cleanup
#16047 notes that include/os/freebsd/spl/rpc/xdr.h carried an
(apparently) incompatible license. While looking into it, it seems that
this file is actually unnecessary these days - FreeBSD's kernel XDR has
XDR_CONTROL, xdrmem_control and XDR_GET_BYTES_AVAIL, while userspace has
XDR_CONTROL and xdrmem_control, and our implementation of
XDR_GET_BYTES_AVAIL for libspl works nicely with it. So this removes
that file outright.

To keep the includes in nvpair.c tidy, I've made a few small adjustments
to the Linux headers. By definition, rpc/types.h provides bool_t and is
included before rpc/xdr.h, so I've created rpc/types.h for Linux. This
isn't necessary for userspace; both FreeBSD native and tirpc on Linux
already have these headers set up correctly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16047 
Closes #16051
2024-04-29 13:50:05 -07:00
Robert Evans e0cfa1592d Fix buffer underflow if sysfs file is empty
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Jason Lee <jasonlee@lanl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16028
Closes #16035
2024-04-29 13:50:05 -07:00
Robert Evans d088fb7d24 ZTS: fix flakiness in cp_files_002_pos
Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16029
2024-04-29 13:50:05 -07:00
Cameron Harr 67995229a8 Fix option string, adding -e and fixing order
The recently added '-e' option (PR #15769) missed adding the
new option in the online `zpool status` help command. This
adds the options and reorders a couple of the other options
that were not listed alphabetically.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16008
2024-04-29 13:50:05 -07:00
Rob N 2ff09e8fed freebsd: fix missing headers in distribution tarball
arc_os.h and freebsd_event.h aren't included in release tarballs, so the
build fails on FreeBSD. This fixes it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15963
2024-04-29 13:50:05 -07:00
Brian Behlendorf 9f1d3db730 Check for minimum partition size
On Linux block devices used for vdevs will be partitioned.  The block
device must be large enough for a 64M partition starting at offset
of 2048 sectors (part1), and a second 64M reserved partition at the
end of the device (part9).

This commit adds a capacity check when creating the GPT label to
immediately detect a device which is too small.  With the existing
code this would be caught slightly later when attempting to use
the partition.  Catching it sooner lets us print a more useful error.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15898
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav 5dda8c0910 Add VERIFY0P() and ASSERT0P() macros.
These macros are similar to VERIFY0() and ASSERT0() but are intended
for pointers, and therefore use uintptr_t instead of int64_t.
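
A userspace sketch of what such a pointer check might look like. The
macro name SKETCH_VERIFY0P, the message format and the use of abort()
are assumptions for illustration; the real macros live in the project's
debug headers and panic through its own machinery.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical pointer variant of a "verify zero" check: the value is
 * compared as uintptr_t, so no narrowing or sign conversion is applied
 * to the pointer before the comparison.
 */
#define	SKETCH_VERIFY0P(p)						\
	do {								\
		const uintptr_t _v = (uintptr_t)(p);			\
		if (_v != 0) {						\
			fprintf(stderr, "VERIFY0P(%s) failed "		\
			    "(NULL == %p)\n", #p, (void *)_v);		\
			abort();					\
		}							\
	} while (0)

/* Returns 1 if the pointer is NULL; aborts otherwise. */
int
sketch_expect_null(void *p)
{
	SKETCH_VERIFY0P(p);
	return (1);
}
```

A call such as sketch_expect_null(NULL) returns normally, while any
non-NULL argument prints the message and aborts.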

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav d6da6cbd74 Clean up existing VERIFY*() macros.
Chiefly:

- Remove unnecessary parentheses around variable names.
- Remove spaces between the type and variable in casts.
- Make the panic message for VERIFY0() reflect how the macro is used.
- Use %p to format pointers, except in Linux kernel code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-22 13:32:33 -07:00
Benda Xu 6732e223bf etc/init.d: decide which variant to use at build time.
Let Debian use the sysv-rc variant of the script, even when OpenRC is
installed. Unlike on Gentoo, OpenRC on Debian consumes both the
sysv-rc scripts and OpenRC ones. ZFS initscripts on Debian should be
the sysv-rc version to provide most compatibility and to integrate
with the rest of initscripts for dependency tracking.

Restrict the substitution in the Makefile to the dedicated list.

This construct is inspired by Mo Zhou's detection of the execution
shell and follows the strategy of Peter in 6ef28c526b.

As of 2024, the initscripts are mostly relevant on Debian, Gentoo and
their derivatives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Issue #8063
Issue #8204
Issue #8359
Closes #15977
2024-04-22 09:28:06 -07:00
Benda Xu baaac31655 config/Substfiles.am: restrict to the dedicated list.
We recover the scope of $(SUBSTFILES) to explicitly control what files
are being generated from the corresponding .in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Closes #15980
2024-04-22 09:28:06 -07:00
Shengqi Chen b0b0d07b13 man: move zfs_prepare_disk.8 to nodist_man_MANS
The commit b53077a added zfs_prepare_disk.8 to the wrong list
dist_man_MANS, in which @zfsexecdir@ will not be properly substituted.
This leads to wrong path in the manpage in generated release tarballs.

Reported-by: Benda Xu <orv@debian.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15979
2024-04-22 09:28:06 -07:00
Umer Saleem 8a56047135 Add support for zfs mount -R <filesystem>
This commit adds support for mounting a dataset along with all of
its children with '-R' flag for zfs mount. There can be scenarios
where we want to mount all datasets under one hierarchy instead of
mounting all datasets present on system with '-a' flag.

'-R' flag should work on all root and non-root datasets. Usage
information and man page have been updated for zfs mount. A test
for verifying the behavior for '-R' flag is also added.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16015
2024-04-22 09:28:06 -07:00
Rob Norris 9a7ef02f4d Linux 6.9 compat: blk_alloc_disk() now takes two args
There's an extra nullable arg for queue limits. Detect it, and set it to
NULL. Similar change for blk_mq_alloc_disk(), now three args, same
treatment.

Error return now has error encoded in the return, so detect with
IS_ERR() and explicitly NULL our own return.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-22 09:23:23 -07:00
Rob Norris 3bd7cd06b7 Linux 6.9 compat: bdev handles are now struct file
bdev_open_by_path() is replaced by bdev_file_open_by_path(), which
returns a plain old struct file*. Release function is gone entirely; the
regular file release function fput() will take care of the bdev
specifics.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-22 09:23:23 -07:00
Rob N b9c3040b10 vdev_disk: clean up spa/bdev mode conversion
43e8f6e37 introduced a subtle API misuse, in that it passed the output
from vdev_bdev_mode() back into itself. Fortunately, the
SPA_MODE_(READ|WRITE) bit values exactly map to the FMODE_(READ|WRITE) &
BLK_OPEN_(READ|WRITE) bit values, so it didn't result in a bug, but it
was hard to read and understand, so I cleaned it up.

In doing so, I noticed that the only call to vdev_bdev_mode() without
the "exclusive" flag set was in that misuse, and actually, we never do a
non-exclusive blkdev_get_by_path(). So I've just made exclusive be
always-on.
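
A small sketch of why feeding the conversion's output back into itself
was harmless. The bit values below are assumptions written from memory
(read = 0x1, write = 0x2 on both sides), and spa_to_bdev_mode() is a
hypothetical stand-in for vdev_bdev_mode(), not its real signature.

```python
# Assumed bit values, for illustration only.
SPA_MODE_READ, SPA_MODE_WRITE = 0x1, 0x2
FMODE_READ, FMODE_WRITE = 0x1, 0x2


def spa_to_bdev_mode(spa_mode):
    """Translate SPA mode bits into block device open mode bits."""
    mode = 0
    if spa_mode & SPA_MODE_READ:
        mode |= FMODE_READ
    if spa_mode & SPA_MODE_WRITE:
        mode |= FMODE_WRITE
    return mode
```

Because the two sets of bits coincide, applying the conversion twice
gives the same result as applying it once, which is why the misuse did
not show up as a bug.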


Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15995
2024-04-22 09:23:23 -07:00
Robert Evans 5dbed50429 Linux 5.18+ compat: Detect filemap_range_has_page
In v5.18 `filemap_range_has_page` moved to `pagemap.h`

`pagemap.h` has been around since 3.10 so just include both

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16034
2024-04-22 09:23:23 -07:00
Fabian-Gruenbichler 3fb0942cc5 udev: correctly handle partition #16 and later
If a zvol has more than 15 partitions, the minor device number exhausts
the slot count reserved for partitions next to the zvol itself. As a
result, the minor number cannot be used to determine the partition
number for the higher partition, and doing so results in wrongly named
symlinks being generated by udev.

Since the partition number is encoded in the block device name anyway,
let's just extract it from there instead.
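
To illustrate the two approaches: the sketch below assumes device names
of the form zd<N> for a zvol and zd<N>p<M> for its partitions, and a
block of 16 minors per zvol. Both helper names are hypothetical; this is
not the udev helper itself.

```python
import re

ZVOL_MINORS_PER_DEV = 16  # 1 for the zvol itself + 15 partitions


def partition_from_minor(minor):
    """Old approach: only meaningful for partitions 1 to 15."""
    return minor % ZVOL_MINORS_PER_DEV


def partition_from_name(devname):
    """New approach: read the partition number from the device name.

    Returns 0 for the zvol itself. The name format is an assumption.
    """
    m = re.fullmatch(r"zd(\d+)(?:p(\d+))?", devname)
    if m is None:
        raise ValueError("not a zvol device name: %r" % devname)
    return int(m.group(2)) if m.group(2) else 0
```

Parsing the name keeps working for partition 16 and later, where the
minor number no longer falls inside the zvol's own block of 16.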

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #15904
Closes #15970
2024-04-22 09:23:23 -07:00
Fabian-Gruenbichler fa2cbd4007 zvols: prevent overflow of minor device numbers
currently, the linux kernel allows 2^20 minor devices per major device
number.  ZFS reserves blocks of 2^4 minors per zvol: 1 for the zvol
itself, the other 15 for the first partitions of that zvol. as a result,
only 2^16 such blocks are available for use.

there are no checks in place to avoid overflowing into the major device
number when more than 2^16 zvols are allocated (with volmode=dev or
default). instead of ignoring this limit, which comes with all sorts of
weird knock-on effects, detect this situation and simply fail allocating
the zvol block device early on.

without this safeguard, the kernel will reject the attempt to create an
already existing block device, but ZFS doesn't handle this error and
gets confused about which zvol occupies which minor slot, potentially
resulting in kernel NULL derefs and other issues later on.
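
The limit follows directly from the numbers above. A minimal sketch of
the arithmetic and of an early check, with hypothetical names (the
patch itself is kernel C and may be structured differently):

```python
MINORBITS = 20        # 2^20 minors per major, per the text
ZVOL_MINOR_BITS = 4   # 2^4 minors reserved per zvol, per the text
MAX_ZVOLS = 1 << (MINORBITS - ZVOL_MINOR_BITS)


def zvol_first_minor(index):
    """First minor of the index-th zvol, or fail early if out of range."""
    if not 0 <= index < MAX_ZVOLS:
        raise OverflowError("no minor numbers left for another zvol")
    return index << ZVOL_MINOR_BITS
```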

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #16006
2024-04-22 09:23:23 -07:00
Tony Hutter bb9542a2a0 Linux 6.8 compat: META (#16099)
Update the META file to reflect compatibility with the 6.8 kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-04-22 09:23:23 -07:00
Rob N 72e4996a54 bdev_discard_supported: understand discard_granularity=0
Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See #16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux@3c407dc723), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when
discard_granularity=0. The kernel will never let the request go in
anyway; better that we just disable it so we can report it properly to
the user.

1. https://patchwork.kernel.org/project/linux-block/patch/20240312144826.1045212-2-hch@lst.de/
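
The resulting rule is small enough to state as a sketch. The parameter
names are hypothetical; the real function inspects the block device's
queue rather than taking two plain values.

```python
def bdev_discard_supported(has_discard_flag, discard_granularity):
    """Treat discard as available only if the device advertises it
    AND reports a non-zero granularity (illustrative model only)."""
    return bool(has_discard_flag) and discard_granularity > 0
```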

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit b181b2e604)
2024-04-19 10:19:53 -07:00
Alexander Motin 575872cc37 L2ARC: Relax locking during write
Previous code held the ARC state sublist lock throughout the whole
L2ARC write process, which included a number of allocations and even
ZIO issues.  Being blocked in any of those places, the code could also
block ARC eviction, which could cause OOM activation or even deadlock
if the system is low on memory or memory is too fragmented.

Fix it by dropping the lock as soon as we see a block eligible
for L2ARC writing and pick it up later using earlier inserted
marker.  While there, also reduce scope of hash lock, moving
ZIO allocation and other operations not requiring header access
out of it.  All operations requiring header access move under
hash lock, since L2_WRITING flag does not prevent header eviction,
only transition to arc_l2c_only state with L1 header.

To be able to manipulate sublist lock and marker as needed add few
more multilist functions and modify one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
2024-04-19 10:13:38 -07:00
Alexander Motin f4ce02ae42 Small fix to prefetch ranges aggregation
After #16022, when adding a new range aggregates more than two
existing ranges (which should be very rare, only if several streams
overlap), we may need to zero not the last range, but an earlier one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16072
2024-04-19 10:13:38 -07:00
Alexander Motin 97d7228f42 Remove db_state DB_NOFILL checks from syncing context
Syncing context should not depend on current state of dbuf, which
could already change several times in later transaction groups,
but rely solely on dirty record for the transaction group being
synced. Some of the checks seem already impossible, while instead
of others I think we should better check for absence of data in
the specific dirty record rather than DB_NOFILL.

Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16057
2024-04-19 10:13:38 -07:00
Alexander Motin 026fe79646 Speculative prefetch for reordered requests
Before this change speculative prefetcher was able to detect a stream
only if all of its accesses are perfectly sequential.  It was easy to
implement and is perfectly fine for single-threaded applications.
Unfortunately multi-threaded network servers, such as iSCSI, SMB or
NFS usually have plenty of threads and may often reorder requests,
preventing successful speculation and prefetch.

This change allows speculative prefetcher to detect streams even if
requests are reordered by introducing a list of 9 non-contiguous
ranges up to 16MB ahead of current stream position and filling the
gaps as more requests arrive.  It also allows stream to proceed
even with holes up to a certain configurable threshold (25%).
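
A toy model of the idea, not the prefetcher's actual data structures:
the stream keeps a short list of ranges seen ahead of its position and
advances once the gap in front of them is filled. The class and its
fields are hypothetical, and the hole threshold is not modelled.

```python
MAX_RANGES = 9                  # per the text
MAX_AHEAD = 16 * 1024 * 1024    # 16MB, per the text


class Stream:
    def __init__(self, pos=0):
        self.pos = pos      # end of the contiguous part of the stream
        self.ranges = []    # sorted (start, end) ranges ahead of pos

    def access(self, start, end):
        """Record an access; return True if it fits this stream."""
        if start < self.pos or end - self.pos > MAX_AHEAD:
            return False
        # Merge the new range with any it overlaps or touches.
        spans = sorted(self.ranges + [(start, end)])
        merged = [spans[0]]
        for s, e in spans[1:]:
            last_s, last_e = merged[-1]
            if s <= last_e:
                merged[-1] = (last_s, max(last_e, e))
            else:
                merged.append((s, e))
        if len(merged) > MAX_RANGES:
            return False
        # Advance the stream over ranges that are now contiguous.
        while merged and merged[0][0] <= self.pos:
            self.pos = max(self.pos, merged.pop(0)[1])
        self.ranges = merged
        return True
```

For example, accesses arriving as [0,10), [20,30), [10,20) leave the
stream at position 30 with no pending ranges, as if they had arrived in
order.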

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16022
2024-04-19 10:13:38 -07:00
Alexander Motin 602b5dca7b Fix read errors race after block cloning
Investigating read errors triggering panic fixed in #16042 I've
found that we have a race in a sync process between the moment
dirty record for cloned block is removed and the moment dbuf is
destroyed.  If dmu_buf_hold_array_by_dnode() take a hold on a
cloned dbuf before it is synced/destroyed, then dbuf_read_impl()
may see it still in DB_NOFILL state, but without the dirty record.
Such case is not an error, but equivalent to DB_UNCACHED, since
the dbuf block pointer is already updated by dbuf_write_ready().
Unfortunately it is impossible to safely change the dbuf state
to DB_UNCACHED there, since there may already be another cloning
in progress, that dropped dbuf lock before creating a new dirty
record, protected only by the range lock.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Robert Evans <evansr@google.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16052
2024-04-19 10:13:38 -07:00
Alexander Motin d5fb6abd36 Improve dbuf_read() error reporting
Previous code reported non-ZIO errors only via return value, but
not via parent ZIO.  It could cause NULL-dereference panics due
to dmu_buf_hold_array_by_dnode() ignoring the return value,
relying solely on parent ZIO status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reported by:	Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16042
2024-04-19 10:13:38 -07:00
Alexander Motin 39993c3dfe BRT: Check pool clone stats in more tests
This should help catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin e3c1c9153f BRT: Fix tests to work on non-empty pools
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin 2ea370a4e3 BRT: Fix holes cloning.
- When reading L0 block pointers, handle buffers without them and
without dirty records as holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there are actual writes.
- When cloning blocks, set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin 3e91a9c525 BRT: Skip getting length in brt_entry_lookup()
Unlike DDT, where ZAP values may have different lengths due to
compression, all BRT entries are identical 8-byte counters.  It
does not make sense to first fetch the length only to assert it.
zap_lookup_uint64() is specifically designed to work with counters
of different size and should return error if something odd is found.
Calling it directly saves some measurable CPU time.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15950
2024-04-19 10:13:38 -07:00
Alexander Motin c94f730078 BRT: Make BRT block sizes configurable
Similar to DDT make BRT data and indirect block sizes configurable
via module parameters.  I am not sure what would be the best yet,
but, similar to DDT, 4KB blocks kill all chances of compression on
vdevs with ashift=12 or more, which in my tests reaches 3x.

While here, fix documentation for respective DDT parameters.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15967
2024-04-19 10:13:38 -07:00
Alexander Motin 457e62d7ca BRT: Relax brt_pending_apply() locking
Since brt_pending_apply() is running in syncing context, no other
brt_pending_tree accesses are possible for the TXG.  We don't need
to acquire brt_pending_lock here.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15955
2024-04-19 10:13:38 -07:00
Alexander Motin 19bf54b764 ZAP: Massively switch to _by_dnode() interfaces
Before this change ZAP called dnode_hold() for almost every block
access, that was clearly visible in profiler under heavy load, such
as BRT.  This patch makes it always hold the dnode reference between
zap_lockdir() and zap_unlockdir().  It allows to avoid most of dnode
operations between those.  It also adds several new _by_dnode() APIs
to ZAP and uses them in BRT code.  Also adds dmu_prefetch_by_dnode()
variant and uses it in the ZAP code.

After this there remains only one call to dmu_buf_dnode_enter(),
which seems to be unneeded.  So remove the call and the functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15951
2024-04-19 10:13:38 -07:00
Alexander Motin fdd8c0aea1 BRT: Skip duplicate BRT prefetches
If there is a pending entry for this block, then we've already
issued BRT prefetch for it within this TXG, so don't do it again.
BRT vdev lookup and following zap_prefetch_uint64() call can be
pretty expensive and should be avoided when not necessary.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15941
2024-04-19 10:13:38 -07:00
Alexander Motin dced953b62 ZAP: Some cleanups/micro-optimizations
- Remove custom zap_memset(), use regular memset().
- Use PANIC() instead of opaque cmn_err(CE_PANIC).
- Provide entry parameter to zap_leaf_rehash_entry().
- Reduce branching in zap_leaf_array_create() inner loop.
- Remove signedness where it should not be.

Should be no function changes.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15976
2024-04-19 10:13:38 -07:00
Alexander Motin f7c1db6366 BRT: Change brt_pending_tree sorting order
It does not look important how exactly brt_pending_tree is sorted.
When cloning large file, it is quite likely that all of its blocks
have identical physical birth times, so comparing them first does
not provide useful entropy, while accessing an additional cache line.
In most cases combination of vdev and offset provides unique result
and physical birth time comparison is not even needed.  Meanwhile,
when traversing the tree inside brt_pending_apply(), it can be
beneficial for dbuf cache and CPU cache hits to group processing
by vdev and so by the per-VDEV BRT ZAPs.
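
The effect of the ordering can be shown with plain sort keys. This is a
sketch only: the entry type is hypothetical, and the exact old and new
field order beyond "birth time first" versus "vdev and offset first" is
an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a pending BRT entry.
PendingEntry = namedtuple("PendingEntry", "vdev offset birth")


def old_key(e):
    # Birth time first: nearly constant when cloning one large file.
    return (e.birth, e.vdev, e.offset)


def new_key(e):
    # vdev and offset first: usually unique, and groups work by vdev.
    return (e.vdev, e.offset, e.birth)
```

Sorting with new_key() places all entries of one vdev next to each
other, which is the grouping the traversal in brt_pending_apply() can
benefit from.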

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15954
2024-04-19 10:13:38 -07:00
Alexander Motin fa5de0c5cd Update resume token at object receive.
Before this change resume token was updated only on data receive.
Usually it is enough to resume replication without much overlap.
But we've got a report of a curious case, where replication source
was traversed with recursive grep, which through enabled atime
modified every object without modifying any data.  It produced
several gigabytes of replication traffic without a single data
write and so without a single resume point.

While the resume token was not designed to resume from an object,
I've found that the send implementation always sends object before
any data. So by requesting resume from offset 0 we are effectively
resuming from the object, followed (or not) by the data at offset
0, just as we need it.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15927
2024-04-19 10:13:38 -07:00
Alexander Motin 793a2cff2a Linux: Cleanup taskq threads spawn/exit
This changes taskq_thread_should_stop() to limit maximum exit rate
for idle threads to one per 5 seconds.  I believe the previous one
was broken, not allowing any thread exits for tasks arriving more
than one at a time and so completing while others are running.

Also while there:
 - Remove taskq_thread_spawn() calls on task allocation errors.
 - Remove extra taskq_thread_should_stop() call.
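
The exit rate limit can be pictured as a simple time-based gate. This
is a sketch with hypothetical names; the real function also considers
other conditions that are left out here.

```python
class ExitLimiter:
    """Allow at most one idle-thread exit per min_interval seconds."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self.last_exit = None   # time of the most recent allowed exit

    def should_stop(self, now, idle):
        if not idle:
            return False
        if (self.last_exit is not None and
                now - self.last_exit < self.min_interval):
            return False        # another thread exited too recently
        self.last_exit = now
        return True
```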

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15873
2024-04-19 10:13:38 -07:00
Alexander Motin fdd97e0093 Refactor dmu_prefetch().
- Split dmu_prefetch_dnode() from dmu_prefetch() into a separate
function.  It is quite inconvenient to read the code where len = 0
means dnode prefetch instead of indirect/data prefetch.  One function
doing both has no benefits, since the code paths are independent.
 - Improve dmu_prefetch() handling of long block ranges.  Instead
of limiting the L0 data length to prefetch to dmu_prefetch_max,
make dmu_prefetch_max limit the actual amount of prefetch at the
specified level, and, if there is more, prefetch all the rest at
higher indirection level.  It should improve random access times
within the prefetched range of any length, reducing importance of
specific dmu_prefetch_max value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15076
2024-04-19 10:13:38 -07:00
Alexander Motin 3b8817db96 ZIL: Update Linux tracing after #15635
While picking parts from #14909 I missed the Linux tracing specific
ones. That went unnoticed in default configurations, but breaks the
build in some.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15730
2024-04-19 10:13:38 -07:00
Alexander Motin 25ea8ce94b ZIL: Improve next log block size prediction
Track history in context of bursts, not individual log blocks. This
avoids blowing away all the history with a single large burst of
many blocks, and at the same time allows optimizations covering
multiple blocks in a burst and even the predicted following burst.
For each burst, account its optimal block size and minimal first block
size.  Use those statistics from the last 8 bursts to predict the first
block size of the next burst.

Remove predefined set of block sizes. Allocate any size we see fit,
multiple of 4KB, as required by ZIL now.  With compression enabled
by default, ZFS already writes pretty random block sizes, so this
should not surprise space allocator any more.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15635
2024-04-19 10:13:38 -07:00
Alexander Motin 8b1a132de7 ZIO: Optimize zio_flush()
- Generalize vdev_nowritecache handling by traversing through the
VDEV tree and skipping children ZIOs where not supported.
 - Remove intermediate zio_null() in case of several VDEV children.
 - Remove children handling from zio_ioctl().  There are no other
use cases for this code besides DKIOCFLUSHWRITECACHE, and were there
any, I doubt they would apply so straightforwardly to all VDEV children.

Compared to the removed previous optimization, this should improve
cases of redundant ZILs/SLOGs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15515
2024-04-19 10:13:38 -07:00
Alexander Motin 7ea8331009 ZIL: Detect single-threaded workloads
... by checking that previous block is fully written and flushed.
It allows to skip commit delays since we can give up on aggregation
in that case.  This removes zil_min_commit_timeout parameter, since
for single-threaded workloads it is not needed at all, while on very
fast devices even some multi-threaded workloads may get detected as
single-threaded and still bypass the wait.  To give multi-threaded
workloads more aggregation chances increase zfs_commit_timeout_pct
from 5 to 10%, as they should suffer less from additional latency.

Also, detecting single-threaded workloads should in the future allow
better prediction of the next block size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15381
2024-04-19 10:13:38 -07:00
Rob N 3c5f354a8c zvol_os: fix compile with blk-mq on Linux 4.x
99741bde5 accesses a cached blk-mq hardware context through the mq_hctx
field of struct request. However, this field did not exist until 5.0.
Before that, the private function blk_mq_map_queue() was used to dig it
out of broader queue context. This commit detects this situation, and
handles it with a poor-man's simulation of that function.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16069
2024-04-17 10:10:24 -07:00
Rob N 5c0fe099ec zvol_os: fix build on Linux <3.13
99741bde5 introduced zvol_num_taskqs, but put it behind the HAVE_BLK_MQ
define, preventing builds on versions of Linux that don't have it
(<3.13, incl EL7).

Nothing about it seems dependent on blk-mq, so this just moves it out
from behind that define and so fixes the build.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16062
2024-04-17 10:10:24 -07:00
Ameer Hamza 5fc134ff2f zvol: use multiple taskq
Currently, zvol uses a single taskq, resulting in throughput bottleneck
under heavy load due to lock contention on the single taskq. This patch
addresses the performance bottleneck under heavy load conditions by
utilizing multiple taskqs, thus mitigating lock contention. The number
of taskqs scales dynamically based on the available CPUs in the system,
as illustrated below:

                taskq   total
cpus    taskqs  threads threads
------- ------- ------- -------
1       1       32       32
2       1       32       32
4       1       32       32
8       2       16       32
16      3       11       33
32      5       7        35
64      8       8        64
128     11      12       132
256     16      16       256
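
The table can be reproduced with a short calculation. The formula below
is one that matches every row above; treating it as the patch's exact
logic is an assumption, and the function name is hypothetical.

```python
def zvol_taskq_layout(cpus, min_threads=32):
    """Return (taskqs, threads per taskq, total threads) for cpus.

    Illustrative formula only: it was chosen because it reproduces
    the table in this message.
    """
    threads = max(cpus, min_threads)
    taskqs = 1 + cpus // 6
    # Keep at least as many threads per taskq as there are taskqs.
    while taskqs * taskqs > threads:
        taskqs -= 1
    per_taskq = -(-threads // taskqs)   # ceiling division
    return taskqs, per_taskq, taskqs * per_taskq
```

Rounding the per-taskq thread count up is what makes the totals for 16,
32 and 128 CPUs slightly exceed the CPU-derived thread count.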

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15992
2024-04-17 10:10:24 -07:00
Rob Norris 7ad2616d37 vdev_disk: fix alignment check when buffer has non-zero starting offset
If a linear buffer spans multiple pages, and the first page has a
non-zero starting offset, the checker would not include the offset, and
so would think there was an alignment gap at the end of the first page,
rather than at the start.

That is, for a 16K buffer spread across five pages with an initial 512B
offset:

    [.XXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXX.]

It would be interpreted as:

    [XXXXXXX.][XXXXXXXX]...

And be rejected as misaligned.

Since it's already a linear ABD, the "linearising" copy would just reuse
the buffer as-is, and the second check would fail, tripping the
VERIFY in vdev_disk_io_rw().

This commit fixes all this by including the offset in the check for
end-of-page alignment.
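
A simplified model of the check, assuming 4K pages and hypothetical
helper names. It only looks at page boundaries, which is enough to show
how ignoring the starting offset produces a false gap.

```python
PAGE_SIZE = 4096  # assumed page size


def page_segments(offset, length, page_size=PAGE_SIZE):
    """Split a linear buffer into (offset in page, length) per page."""
    segs = []
    while length > 0:
        seg = min(length, page_size - offset)
        segs.append((offset, seg))
        length -= seg
        offset = 0
    return segs


def has_gap(segs, page_size=PAGE_SIZE):
    """Fixed check: every segment but the last must reach the end of
    its page, and every segment but the first must start at 0."""
    last = len(segs) - 1
    for i, (off, seg) in enumerate(segs):
        if i > 0 and off != 0:
            return True
        if i < last and off + seg != page_size:
            return True
    return False


def has_gap_ignoring_offset(segs, page_size=PAGE_SIZE):
    """Old behaviour in this model: the offset is left out, so a short
    first segment looks like a gap at the end of the first page."""
    last = len(segs) - 1
    return any(i < last and seg != page_size
               for i, (off, seg) in enumerate(segs))
```

For the 16K buffer with a 512B offset, page_segments(512, 16384)
yields five segments; the fixed check accepts them while the
offset-blind check reports a gap.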

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 1bf649cb0a)
2024-04-12 08:53:48 -07:00
Rob N d0d9dccc61 vdev_disk: ensure trim errors are returned immediately
After 08fd5ccc3, the discard issuing code was organised such that if
requesting an async discard or secure erase failed before the IO was
issued (that is, calling __blkdev_issue_discard() returned an error),
the failed zio would never be executed, resulting in txg_sync hanging
forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error.
To handle the successful synchronous op case, we fake an async op by,
when not using an asynchronous submission method, queuing the successful
result zio as part of the discard handler.

Since it was hard to understand the differences between discard and
secure erase, and sync and async, across different kernel versions, I've
commented and reorganised the code a bit to try and make everything more
contained and linear.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit ba9f587a77)
2024-04-11 12:25:40 -07:00
Rob Norris 28520cad25 vdev_disk: don't touch vbio after its handed off to the kernel
After IO is unplugged, it may complete immediately and vbio_completion
be called on interrupt context. That may interrupt or deschedule our
task. If it's the last bio, the vbio will be freed. Then, we get
rescheduled, and try to write to freed memory through vbio->.

This patch just removes the cleanup, and the corresponding assert.
These were leftovers from a previous iteration of vbio_submit() and were
always "belt and suspenders" ops anyway, never strictly required.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc
Reported-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 917ff75e95)
2024-04-08 10:13:55 -07:00
Robert Evans deb7a84231 Fix corruption caused by mmap flushing problems
1) Make mmap flushes synchronous. Linux may skip flushing dirty pages
   already in writeback unless data-integrity sync is requested.

2) Change zfs_putpage to use TXG_WAIT. Otherwise dirty pages may be
   skipped due to DMU pushing back on TX assign.

3) Add missing mmap flush when doing block cloning.

4) While here, pass errors from putpage to writepage/writepages.

This change fixes corruption edge cases, but unfortunately adds
synchronous ZIL flushes for dirty mmap pages to llseek and bclone
operations. It may be possible to avoid these sync writes later
but would need more tricky refactoring of the writeback code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #15933 
Closes #16019
2024-03-29 17:10:04 -07:00
Rob Norris eebf00bee9 vdev_disk: default to classic submission for 2.2.x
We don't want to change to brand-new code in the middle of a stable
series, but we want it available to test for people running into page
splitting issues.

This commit makes zfs_vdev_disk_classic=1 the default, and updates the
documentation to better explain what's going on.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
2024-03-28 13:29:46 -07:00
Rob Norris d0b3be763f abd_iter_page: don't use compound heads on Linux <4.5
Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
in a compound page were refcounted separately. This means that using the
head page without taking a reference to it could see it cleaned up later
before we're finished with it. Specifically, bio_add_page() would take a
reference, and drop its reference after the bio completion callback
returns.

If the zio is executed immediately from the completion callback, this is
usually ok, as any data is referenced through the tail page referenced
by the ABD, and so becomes "live" that way. If there's a delay in zio
execution (high load, error injection), then the head page can be freed,
along with any dirty flags or other indicators that the underlying
memory is used. Later, when the zio completes and that memory is
accessed, it's either unmapped and an unhandled fault takes down the
entire system, or it is mapped and we end up messing around in someone
else's memory. Both of these are very bad.

The solution on these older kernels is to take a reference to the head
page when we use it, and release it when we're done. There's not really
a sensible way under our current structure to do this; the "best" would
be to keep a list of head page references in the ABD, and release them
when the ABD is freed.

Since this additional overhead is totally unnecessary on 4.5+, where
head and tail pages share refcounts, I've opted to simply not use the
compound head in ABD page iteration there. This is theoretically less
efficient (though cleaning up head page references would add overhead),
but it's safe, and we still get the other benefits of not mapping pages
before adding them to a bio and not mis-splitting pages.

There doesn't appear to be an obvious symbol name or config option we
can match on to discover this behaviour in configure (and the mm/page
APIs have changed a lot since then anyway), so I've gone with a simple
version check.
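
As a rough illustration of the version gate described above, here is a
minimal userspace sketch. `SKETCH_KERNEL_VERSION` is modelled on the
kernel's `KERNEL_VERSION()` encoding, and
`sketch_can_use_compound_head()` is a hypothetical helper, not a
function from this change:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding, modelled on the kernel's KERNEL_VERSION() macro. */
#define	SKETCH_KERNEL_VERSION(a, b, c) \
	(((unsigned int)(a) << 16) + ((unsigned int)(b) << 8) + \
	(unsigned int)(c))

/*
 * Hypothetical helper: head and tail pages share a refcount from 4.5
 * onwards, so only then is it safe to hand the compound head to a BIO
 * without taking an extra reference.
 */
static bool
sketch_can_use_compound_head(unsigned int running_version)
{
	return (running_version >= SKETCH_KERNEL_VERSION(4, 5, 0));
}
```

In the module itself this decision would be made at compile time by the
preprocessor rather than at run time.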

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c6be6ce175)
2024-03-28 13:29:46 -07:00
Rob Norris cb599d27ed vdev_disk: use bio_chain() to submit multiple BIOs
Simplifies our code a lot, so we don't have to wait for each and
reassemble them.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 72fd834c47)
2024-03-28 13:29:46 -07:00
Rob Norris af3a5bb40d vdev_disk: add module parameter to select BIO submission method
This makes the submission method selectable at module load time via the
`zfs_vdev_disk_classic` parameter, allowing this change to be backported
to 2.2 safely, and disabled in favour of the "classic" submission method
if new problems come up.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit df2169d141)
2024-03-28 13:29:46 -07:00
Rob Norris 51c2bd0def vdev_disk: rewrite BIO filling machinery to avoid split pages
This commit tackles a number of issues in the way BIOs (`struct bio`)
are constructed for submission to the Linux block layer.

The kernel has a hard upper limit on the number of pages/segments that
can be added to a BIO, as well as a separate limit for each device
(related to its queue depth and other scheduling characteristics).

ZFS counts the number of memory pages in the request ABD
(`abd_nr_pages_off()`), and then uses that as the number of segments to
put into the BIO, up to the hard upper limit. If it requires more than
the limit, it will create multiple BIOs.

Leaving aside the fact that the page count method is wrong (see below),
not limiting to the device segment max means that the device driver will
need to split the BIO in half. This alone is not necessarily a
problem, but it interacts with another issue to cause a much larger
problem.

The kernel function to add a segment to a BIO (`bio_add_page()`) takes a
`struct page` pointer, and offset+len within it. `struct page` can
represent a run of contiguous memory pages (known as a "compound page").
It can be of arbitrary length.

The ZFS functions that count ABD pages and load them into the BIO
(`abd_nr_pages_off()`, `bio_map()` and `abd_bio_map_off()`) will never
consider a page to be more than `PAGE_SIZE` (4K), even if the `struct
page` is for multiple pages. In this case, it will load the same `struct
page` into the BIO multiple times, with the offset adjusted each time.

With a sufficiently large ABD, this can easily lead to the BIO being
entirely filled much earlier than it could have been. This also
further contributes to the problem caused by the incorrect segment limit
calculation, as it's much easier to go past the device limit, and so
require a split.

Again, this is not a problem on its own.

The logic for "never submit more than `PAGE_SIZE`" is actually a little
more subtle. It will actually never submit a buffer that crosses a 4K
page boundary.

In practice, this is fine, as most ABDs are scattered, that is a list of
complete 4K pages, and so are loaded in as such.

Linear ABDs are typically allocated from slabs, and for small sizes they
are frequently not aligned to page boundaries. For example, a 12K
allocation can span four pages, eg:

     -- 4K -- -- 4K -- -- 4K -- -- 4K --
    |        |        |        |        |
          :## ######## ######## ######:    [1K, 4K, 4K, 3K]

Such an allocation would be loaded into a BIO as you see:

    [1K, 4K, 4K, 3K]
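
The splitting in the example above can be reproduced with a small
userspace sketch. It assumes 4K pages, and `sketch_split_at_pages()` is
a hypothetical helper for illustration only, not one of the functions
named in this commit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumed 4K pages, as in the text */

/*
 * Split the byte range [addr, addr + len) at page boundaries. Returns
 * the number of segments the range needs; the lengths of the first
 * max_segs segments are stored in segs[].
 */
static size_t
sketch_split_at_pages(uintptr_t addr, size_t len, size_t *segs,
    size_t max_segs)
{
	size_t n = 0;

	assert(segs != NULL || max_segs == 0);
	while (len > 0) {
		/* Bytes left before the next page boundary. */
		size_t room = SKETCH_PAGE_SIZE - (addr % SKETCH_PAGE_SIZE);
		size_t seg = (len < room) ? len : room;

		if (n < max_segs)
			segs[n] = seg;
		n++;
		addr += seg;
		len -= seg;
	}
	return (n);
}
```

Calling it with a start address 3K into a page and a length of 12K
yields four segments of 1K, 4K, 4K and 3K.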

This tends not to be a problem in practice, because even if the BIO were
filled and needed to be split, each half would still have either a start
or end aligned to the logical block size of the device (assuming 4K at
least).

---

In ideal circumstances, these shortcomings don't cause any particular
problems. It's when they start to interact with other ZFS features that
things get interesting.

Aggregation will create a "gang" ABD, which is simply a list of other
ABDs. Iterating over a gang ABD is just iterating over each ABD within
it in turn.

Because the segments are simply loaded in order, we can end up with
uneven segments either side of the "gap" between the two ABDs. For
example, two 12K ABDs might be aggregated and then loaded as:

    [1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]

Should a split occur, each individual BIO can end up having either a
start or end offset that is not aligned to the logical block size, which
some drivers (eg SCSI) will reject. However, this tends not to happen
because the default aggregation limit usually keeps the BIO small enough
to not require more than one split, and most pages are actually full 4K
pages, so hitting an uneven gap is very rare anyway.

If the pool is under particular memory pressure, then an IO can be
broken down into a "gang block", a 512-byte block composed of a header
and up to three block pointers. Each points to a fragment of the
original write, or in turn, another gang block, breaking the original
data up over and over until space can be found in the pool for each of
them.

Each gang header is a separate 512-byte memory allocation from a slab,
that needs to be written down to disk. When the gang header is added to
the BIO, it's a single 512-byte segment.

Pulling all this together, consider a large aggregated write of gang
blocks. This results in a BIO containing lots of 512-byte segments. Given
our tendency to overfill the BIO, a split is likely, and most possible
split points will yield a pair of BIOs that are misaligned. Drivers that
care, like the SCSI driver, will reject them.

---

This commit is a substantial refactor and rewrite of much of `vdev_disk`
to sort all this out.

`vdev_bio_max_segs()` now returns the ideal maximum size for the device,
if available. There's also a tuneable `zfs_vdev_disk_max_segs` to
override this, to assist with testing.

We scan the ABD up front to count the number of pages within it, and to
confirm that if we submitted all those pages to one or more BIOs, it
could be split at any point without creating a misaligned BIO.  If the
pages in the BIO are not usable (as in any of the above situations), the
ABD is linearised, and then checked again. This is the same technique
used in `vdev_geom` on FreeBSD, adjusted for Linux's variable page size
and allocator quirks.
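
To make the "split at any point" condition concrete, here is a
simplified userspace model. It is a sketch only and not the check this
commit implements: it assumes the first segment starts on a block
boundary on the device, and `sketch_segs_splittable()` is a
hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if a list of segment lengths could be cut between any
 * two adjacent segments with both halves starting and ending on a
 * multiple of block_size.
 */
static bool
sketch_segs_splittable(const size_t *segs, size_t nsegs, size_t block_size)
{
	size_t off = 0;

	assert(block_size > 0);
	for (size_t i = 0; i + 1 < nsegs; i++) {
		off += segs[i];
		/* A cut after segment i lands at byte offset 'off'. */
		if (off % block_size != 0)
			return (false);
	}
	return (true);
}
```

Under this model the aggregated `[1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]`
example passes for a 512-byte logical block size but fails for 4K, and
that failing case is where the ABD would be linearised.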

`vbio_t` is a cleanup and enhancement of the old `dio_request_t`. The
idea is simply that it can hold all the state needed to create, submit
and return multiple BIOs, including all the refcounts, the ABD copy if
it was needed, and so on. Apart from what I hope is a clearer interface,
the major difference is that because we know how many BIOs we'll need up
front, we don't need the old overflow logic that would grow the BIO
array, throw away all the old work and restart. We can get it right from
the start.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 06a196020e)
2024-03-28 13:29:46 -07:00
Rob Norris 03ff875e09 vdev_disk: make read/write IO function configurable
This is just setting up for the next couple of commits, which will add a
new IO function and a parameter to select it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c4a13ba483)
2024-03-28 13:29:46 -07:00
Rob Norris 13b5348848 vdev_disk: reorganise vdev_disk_io_start
Light reshuffle to make it a bit more linear to read and get rid of a
bunch of args that aren't needed in all cases.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 867178ae1d)
2024-03-28 13:29:46 -07:00
Rob Norris 4820185031 vdev_disk: rename existing functions to vdev_classic_*
This is just renaming the existing functions we're about to replace and
grouping them together to make the next commits easier to follow.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit f3b85d706b)
2024-03-28 13:29:46 -07:00
Rob Norris 52a2af6fd1 abd: add page iterator
The regular ABD iterators yield data buffers, so they have to map and
unmap pages into kernel memory. If the caller only wants to count
chunks, or can use page pointers directly, then the map/unmap is just
unnecessary overhead.

This adds abd_iterate_page_func, which yields unmapped struct page
instead.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 390b448726)
2024-03-28 13:29:46 -07:00
Rob Norris 220bb7341e linux 5.4 compat: page_size()
Before 5.4 we have to do a little math.
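
The "little math" is presumably the usual shift of the base page size
by the compound order. A userspace sketch of that arithmetic, assuming
4K base pages (`sketch_page_size()` is a hypothetical stand-in, not the
compat macro itself):

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_PAGE_SIZE	((size_t)4096)	/* assumed base page size */

/* Size in bytes of a compound page made of 2^order base pages. */
static size_t
sketch_page_size(unsigned int order)
{
	/* Keep the shift well inside the width of size_t. */
	assert(order < sizeof (size_t) * 8 - 12);
	return (SKETCH_PAGE_SIZE << order);
}
```

An order-0 page is 4K and an order-2 compound page is 16K.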

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit df04efe321)
2024-03-28 13:29:46 -07:00
Rob N 58211157bf Linux 6.8 compat: use splice_copy_file_range() for fallback
Linux 6.8 removes generic_copy_file_range(), which had been reduced to a
simple wrapper around splice_copy_file_range(). Detect that function
directly and use it if generic_ is not available.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15930
Closes #15931
(cherry picked from commit ef08a4d406)
2024-03-21 09:35:17 -07:00
Tony Hutter c883088df8 Tag zfs-2.2.3
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-02-21 09:26:51 -08:00
Alexander Motin c0c4866f8a dmu: Allow buffer fills to fail
When ZFS overwrites a whole block, it does not bother to read the
old content from disk. It is a good optimization, but if the buffer
fill fails due to page fault or something else, the buffer ends up
corrupted, neither keeping old content, nor getting the new one.

On FreeBSD this is additionally complicated by page faults being
blocked by VFS layer, always returning EFAULT on attempt to write
from mmap()'ed but not yet cached address range.  Normally it is
not a big problem, since after original failure VFS will retry the
write after reading the required data.  The problem becomes worse
in specific case when somebody tries to write into a file its own
mmap()'ed content from the same location.  In that situation the
only copy of the data is getting corrupted on the page fault and
the following retries only fixate the status quo.  Block cloning
makes this issue easier to reproduce, since it does not read the
old data, unlike traditional file copy, that may work by chance.

This patch provides the fill status to dmu_buf_fill_done(), that
in case of error can destroy the corrupted buffer as if no write
happened.  One more complication in case of block cloning is that
if error is possible during fill, dmu_buf_will_fill() must read
the data via fall-back to dmu_buf_will_dirty().  It is required
to allow in case of error restoring the buffer to a state after
the cloning, not before it, that would happen if we just call
dbuf_undirty().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
2024-02-20 15:53:02 -08:00
Tony Hutter b62fd2cef9 ZTS: Skip cross-fs bclone tests if FreeBSD < 14.0
Skip cross filesystem block cloning tests on FreeBSD if running
less than version 14.0.  Cross filesystem copy_file_range() was
added in FreeBSD 14.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15901
2024-02-16 09:33:26 -08:00
Tony Hutter d92fbe2150 [zfs-2.2.3] ZTS: Use correct bclone module param name on FreeBSD
The bclone module names are not prefixed with 'zfs' on FreeBSD.
This was causing test failures.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-02-16 09:33:05 -08:00
Bi11 a4978d2605 zdb: Fix false leak report for BRT objects
Fix a misreport in 'zdb -d' where it falsely marked
BRT objects as leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15882
2024-02-12 17:03:17 -08:00
Dex Wood a6f6c881ff Add Ntfy notification support to ZED
This commit adds the zed_notify_ntfy() function and hooks it
into zed_notify(). This will allow ZED to send notifications
to ntfy.sh or a self-hosted Ntfy service, which can be received
on a desktop or mobile device. It is configured with ZED_NTFY_TOPIC,
ZED_NTFY_URL, and ZED_NTFY_ACCESS_TOKEN variables in zed.rc.

Reviewed-by: @classabbyamp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dex Wood <slash2314@gmail.com>
Closes #15584
2024-02-12 14:32:11 -08:00
Bi11 fc3d34bd08 BRT: Fix slop space calculation with block cloning
Similar to deduplication, the size of data duplicated by block cloning
should not be included in the slop space calculation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15874
2024-02-12 14:04:27 -08:00
Rob N 36116b4612 zfs list: add '-t fs' and '-t vol' options (#15883)
Because "filesystem" and "volume" are just too long!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15864
(cherry picked from commit a5a725440b)
2024-02-12 14:04:27 -08:00
Tony Hutter b699dacb4a [zfs-2.2.3] Enable zfs_bclone_enabled on cp_files tests
cp_files_002_pos uses BRT, so enable block cloning in setup/cleanup.
This is only something we need to do in zfs-2.2.3, since 2.2.x ships
with block cloning disabled by default.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-02-12 14:04:21 -08:00
the-Chain-Warden-thresh d22bf6a9bd LUA: Backport CVE-2020-24370's patch
CVE-2020-24370 is a security vulnerability in lua. Although the CVE
description in CVE-2020-24370 said that this CVE only affected lua
5.4.0, according to lua this CVE actually existed since lua 5.2. The
root cause of this CVE is the negation overflow that occurs when you
try to take the negative of 0x80000000. Thus, this CVE also exists in
openzfs. Try to backport the fix to the lua in openzfs since the
original fix is for 5.4 and several functions have been changed.
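
For readers unfamiliar with the root cause, the following standalone
sketch shows the general hazard and one common way around it. It is
not the backported Lua patch, and `sketch_abs_unsigned()` is a
hypothetical helper:

```c
#include <assert.h>
#include <limits.h>

/*
 * Magnitude of n as an unsigned value. Negating INT_MIN as a signed
 * int overflows, which is undefined behaviour in C; doing the
 * subtraction in unsigned arithmetic wraps around instead.
 */
static unsigned int
sketch_abs_unsigned(int n)
{
	if (n >= 0)
		return ((unsigned int)n);
	return (0u - (unsigned int)n);
}
```

With a 32-bit int, passing `INT_MIN` returns 0x80000000 without ever
evaluating `-n` on a signed value.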

https://github.com/advisories/GHSA-gfr4-c37g-mm3v
https://nvd.nist.gov/vuln/detail/CVE-2020-24370
https://www.lua.org/bugs.html#5.4.0-11
https://github.com/lua/lua/commit/a585eae6e7ada1ca9271607a4f48dfb1786

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChenHao Lu <18302010006@fudan.edu.cn>
Closes #15847
2024-02-08 15:22:16 -08:00
Cameron Harr 40e20d808c Add 'zpool status -e' flag to see unhealthy vdevs
When very large pools are present, it can be laborious to find
reasons for why a pool is degraded and/or where an unhealthy vdev
is. This option filters out vdevs that are ONLINE and with no errors
to make it easier to see where the issues are. Root and parents of
unhealthy vdevs will always be printed.
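
The filtering rule can be sketched as a walk over a vdev tree. The
structure and function names below are hypothetical and much simpler
than what `zpool status` actually works with; the sketch only models
the "unhealthy vdev or ancestor of one" rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified vdev node; not the libzfs representation. */
typedef struct sketch_vdev {
	const char *name;
	const char *state;	/* "ONLINE", "DEGRADED", "FAULTED", ... */
	unsigned long read_errs, write_errs, cksum_errs;
	struct sketch_vdev **children;
	size_t nchildren;
	bool show;		/* set by sketch_mark_unhealthy() */
} sketch_vdev_t;

static bool
sketch_vdev_unhealthy(const sketch_vdev_t *vd)
{
	return (strcmp(vd->state, "ONLINE") != 0 ||
	    vd->read_errs != 0 || vd->write_errs != 0 ||
	    vd->cksum_errs != 0);
}

/*
 * Mark every vdev that is unhealthy or has an unhealthy descendant.
 * Returns true if vd itself ended up marked. The caller is expected
 * to print the root regardless of the result.
 */
static bool
sketch_mark_unhealthy(sketch_vdev_t *vd)
{
	bool show = sketch_vdev_unhealthy(vd);

	for (size_t i = 0; i < vd->nchildren; i++) {
		/* Visit every child; do not stop at the first hit. */
		if (sketch_mark_unhealthy(vd->children[i]))
			show = true;
	}
	vd->show = show;
	return (show);
}
```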

Testing:
ZFS errors and drive failures for multiple vdevs were simulated with
zinject.

Sample vdev listings with '-e' option
- All vdevs healthy
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0

- ZFS errors
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0

- Vdev faulted
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev faults and data errors
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-1  DEGRADED     0     0     0
        L2      FAULTED      0     0     0  too many errors
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev missing
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     UNAVAIL      3     1     0

- Slow devices when -s provided with -e
    NAME        STATE     READ WRITE CKSUM  SLOW
    iron5       DEGRADED     0     0     0     -
      raidz2-5  DEGRADED     0     0     0     -
        L10     FAULTED      0     0     0     0  external device fault
        L51     ONLINE       0     0     0    14

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #15769
2024-02-08 15:22:16 -08:00
Mauricio Faria de Oliveira 9bb8d26bd5 zed: fix typo in variable ZED_POWER_OFF_ENCLO*US*RE_SLOT_ON_FAULT
Replace ENCLO_US_RE with ENCLO_SU_RE in the name of the variable.

Note this changes the user-visible string in zed.rc, thus might
break current users with the wrong string, but it's ~2 months
since zfs-2.2.0 tag is out, thus should not be widespread yet.

Mechanical change:

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    cmd/zed/zed.d/zed.rc
    cmd/zed/zed.d/statechange-slot_off.sh

    $ sed -i 's/ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT/<linebreak>
                ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT/g' \
      cmd/zed/zed.d/zed.rc \
      cmd/zed/zed.d/statechange-slot_off.sh

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    $

Fixes 11fbcacf37
("zed: Add zedlet to power off slot when drive is faulted")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #15651
2024-02-08 15:22:16 -08:00
Umer Saleem 08fd5ccc38 Improve performance for zpool trim on linux
On Linux, ZFS uses blkdev_issue_discard in vdev_disk_io_trim to issue
trim command which is synchronous.

This commit updates vdev_disk_io_trim to use __blkdev_issue_discard,
which is asynchronous. Unfortunately there isn't any asynchronous
version for blkdev_issue_secure_erase, so performance of secure trim
will still suffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15843
2024-02-06 12:58:55 -08:00
Tony Hutter 00d85a98ea BRT: Fix FICLONE/FICLONERANGE shortened copy
On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls
are expected to either fully clone the specified range or return an
error.  The range may be for an entire file.  While internally ZFS
supports cloning partial ranges there's no way to return the length
cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag
has been added.  When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range()
will return a shortened range when encountering pending dirty records.
When it's clear zfs_clone_range() will block and wait for the records
to be written out allowing the blocks to be cloned.

Furthermore, the file range lock is held over the region being cloned
to prevent it from being modified while cloning.  This doesn't quite
provide atomic semantics since if an error is encountered only a
portion of the range may be cloned.  This will be converted to an
error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the
caller.  However, the destination file range is left in an undefined
state.
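
The all-or-nothing rule can be summarised in a few lines. This is a
sketch of the decision only: `sketch_clone_result()` and
`SKETCH_CLONE_FAILED` are hypothetical, and the real code returns a
proper errno rather than a placeholder:

```c
#include <assert.h>
#include <stdbool.h>

#define	SKETCH_CLONE_FAILED	(-1LL)	/* placeholder for a real errno */

/*
 * Decide what to report after 'cloned' of 'requested' bytes were
 * cloned. A short result is only acceptable when the caller allowed
 * shortening (REMAP_FILE_CAN_SHORTEN in the text above).
 */
static long long
sketch_clone_result(unsigned long long requested,
    unsigned long long cloned, bool can_shorten)
{
	assert(cloned <= requested);
	if (cloned == requested || can_shorten)
		return ((long long)cloned);
	return (SKETCH_CLONE_FAILED);
}
```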

A test case has been added which exercises this functionality by
verifying that `cp --reflink=never|auto|always` works correctly.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842
2024-02-06 10:01:15 -08:00
Mark Johnston 9ef15845f5 Fix the FreeBSD userspace build (#15716)
- Mark some parameters to zpool_power*() as unused.
- Add a stub zpool_disk_wait().

Fixes: a9520e6e5 ("zpool: Add slot power control, print power status")

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-01-30 13:33:36 -08:00
Tony Hutter 69142125d7 zpool: Add slot power control, print power status
Add `zpool` flags to control the slot power to drives.  This assumes
your SAS or NVMe enclosure supports slot power control via sysfs.

The new `--power` flag is added to `zpool offline|online|clear`:

    zpool offline --power <pool> <device>    Turn off device slot power
    zpool online --power <pool> <device>     Turn on device slot power
    zpool clear --power <pool> [device]      Turn on device slot power

If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power'
option is automatically implied for `zpool online` and `zpool clear`
and does not need to be passed.
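
A minimal sketch of how the flag and the environment variable might
combine. The names are hypothetical, and it assumes that any value of
the variable, even an empty one, counts as "set", which may not match
how `zpool` parses it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
	SKETCH_OFFLINE,
	SKETCH_ONLINE,
	SKETCH_CLEAR
} sketch_cmd_t;

/*
 * env_value is what getenv("ZPOOL_AUTO_POWER_ON_SLOT") returned, or
 * NULL if the variable is not in the environment.
 */
static bool
sketch_power_requested(sketch_cmd_t cmd, bool power_flag_given,
    const char *env_value)
{
	if (power_flag_given)
		return (true);
	/* The variable only implies --power for online and clear. */
	return (cmd != SKETCH_OFFLINE && env_value != NULL);
}
```

Taking the environment value as a parameter keeps the decision easy to
test without touching the process environment.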

zpool status also gets a --power option to print the slot power status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mart Frauenlob <AllKind@fastest.cc>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15662
2024-01-29 15:12:06 -08:00
Tony Hutter 59112ca27d zed: misc vdev_enc_sysfs_path fixes
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale.  To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible.  Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED.  That
is to say, we can't just blindly use the dynamic path in every case.

Also:
	- Add enclosure sysfs entry when running 'zpool add'
	- Fix 'slot' and 'enc' zpool.d scripts for nvme

Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15462
2024-01-29 15:11:56 -08:00
Tony Hutter 992d8871eb ZTS: Add dirty dnode stress test
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
https://github.com/openzfs/zfs/issues/15526

The bug was fixed in https://github.com/openzfs/zfs/pull/15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608
2024-01-29 15:06:14 -08:00
Rob Norris e6ca28c970 Linux 6.8 compat: handle mnt_idmap user_namespace change
struct mnt_idmap no longer has a struct user_namespace within it. Work
around this by creating a temporary with the copy of the map we need
taken from the idmap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris cbd51c5f24 Linux 6.8 compat: fix inode permission tests
The name inode_permission is now defined in the kernel. Rename ours to
test_permission, in line with most of our other tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris 09e6724e1e Linux 6.8 compat: replace MAX_ORDER define
MAX_ORDER has been renamed to MAX_PAGE_ORDER. Rather than just
redefining it, instead define our own name and set it consistently from
the start.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris 7466e09a49 Linux 6.8 compat: implement strlcpy fallback
Linux has removed strlcpy in favour of strscpy. This implements a
fallback implementation of strlcpy for this case.
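
For reference, the semantics a strlcpy fallback has to provide can be
sketched in userspace as follows. This is an illustration under the
usual strlcpy contract, not the code added by this commit, and it is
named `sketch_strlcpy()` to avoid clashing with a libc that already
ships strlcpy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing at most size bytes including the
 * terminating NUL. Returns strlen(src), so a return value >= size
 * tells the caller the copy was truncated.
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size != 0) {
		size_t n = (srclen >= size) ? size - 1 : srclen;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}
```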

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris ce782d0804 Linux 6.8 compat: update for new bdev access functions
blkdev_get_by_path() and blkdev_put() have been replaced by
bdev_open_by_path() and bdev_release(), which return a "handle" object
with the bdev object itself inside.

This adds detection for the new functions, and macros to handle the old
and new forms consistently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris 64afc4e66e Linux 6.8 compat: make test functions static
The kernel is now being compiled with -Wmissing-prototypes. Most of our
test stub functions had no prototype, and failed to compile. Since they
don't need to be visible anywhere else, just make them all static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Brian Behlendorf 621dfaff5c Linux 6.7 compat: META
Update the META file to reflect compatibility with the 6.7 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15833
2024-01-29 14:53:29 -08:00
Paul Dagnelie ab653603f8 Don't assert mg_initialized due to device addition race
During device removal stress tests, we noticed that we were tripping 
the assertion that mg_initialized was true. After investigation, it was 
determined that the mg in question was the embedded log metaslab 
group for a newly added vdev; the normal mg had been initialized (by 
metaslab_sync_reassess, via vdev_sync_done). However, because the spa 
config alloc lock is not held as writer across both calls to 
metaslab_sync_reassess, it is possible for an allocation to happen 
between the two metaslab_groups being initialized. Because the metaslab 
code doesn't check the group in question, just the vdev's main mg, it 
is possible to get past the initial check in vdev_allocatable and 
later fail due to the assertion.

We simply remove the assertions. We could also consider locking the 
ALLOC lock around the reassess calls in vdev_sync_done, but that risks 
deadlocks. We could check the actual target mg in vdev_allocatable, 
but that risks racing with a passivation that comes in after that 
check but before the assertion. We still won't be able to actually 
allocate from the metaslab group if no metaslabs are ready, so this 
change shouldn't break anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15818
2024-01-29 14:53:29 -08:00
Chris Davidson acc7cd8e99 Update man pages to time(1) from time(2)
zpool-iostat.8: Updated time(2) -> time(1) to align to manual page
zpool-list.8: Updated time(2) -> time(1) to align to manual page
zpool-status.8: Updated time(2) -> time(1) to align to manual page
zpool-wait.8: Update time(2) -> time(1) to align to manual page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christopher Davidson <christopher.davidson@gmail.com>
Closes #15823
2024-01-29 14:53:29 -08:00
Brian Behlendorf dd0874cf7e ZTS: Allow longer run time for zdb_args_pos
The zdb_args_pos test may take slightly longer than 600 seconds to run
on some of the CI builders.  To prevent this from causing failures allow
up to 1200 seconds for tests in this group.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15826
2024-01-29 14:53:29 -08:00
Andrew Innes 7cd666d54b Move nodes into correct subgraphs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #15828
2024-01-29 14:53:29 -08:00
Rob N 0606ce2055 zpool wait: print timestamp before the header
list, status and iostat all display the -T timestamp before the header,
but wait showed it after. Make it be like the others.

Reported-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15825
2024-01-29 14:53:29 -08:00
Ameer Hamza dd3a0a2715 Update vdev devid and physpath if changed between imports
If devid or physpath for a vdev changes between imports, ensure it is
updated to the new value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15816
2024-01-29 14:53:29 -08:00
Tino Reichardt 9ad150446f ZTS: Update deprecated Github Action version numbers
GitHub Actions is transitioning from Node 16 to Node 20.

So we need to update these:
- actions/checkout@v3 -> v4
- actions/download-artifact@v3 -> v4
- actions/upload-artifact@v3 -> v4 and some minor changes

Also update the documentation of the testing workflow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15820
2024-01-29 14:53:29 -08:00
Richard Yao 9da745f5de Switch to CodeQL to detect prohibited function use
The LLVM/Clang developers pointed out that using the CPP to detect use
of functions that our QA policies prohibit risks invoking undefined
behavior. To resolve this, we configure CodeQL to detect forbidden
function usage.

Note that cpp in the context of CodeQL refers to C/C++, rather than the
C PreProcessor, which C++ also uses. It really should have been written
cxx, but that ship sailed a long time ago. This misuse of the term cpp
is retained in the CodeQL configuration for consistency with upstream
CodeQL.

As a side benefit, verbose make no longer is a wall of text showing a
bunch of CPP macros, which can make debugging slightly easier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #15819 
Closes #14134
2024-01-29 14:53:29 -08:00
Tino Reichardt cfa29b9945 ZTS: Apply small changes for speeding up the tests
The Github Action Runner got some new hardware metrics.  We should use
the provided and empty disk which is pre-mounted at /mnt now.

Disk1: 89GiB -> rootfs + bootfs with ~80MB/s -> don't care
Disk2: 64GiB -> /mnt with 420MB/s -> new testing ssd

This commit will mount the new disk to /var/tmp and hopefully provide
some speedups in our testing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15811
2024-01-29 14:53:29 -08:00
Val Packett 09a7961364 FreeBSD: Fix bootstrapping tools under Linux/musl
musl libc has deprecated LFS64 aliases, so bootstrapping FreeBSD tools
under musl distros has been failing with stat64 errors.

Apply the aliases under non-glibc Linux to fix this problem.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Val Packett <val@packett.cool>
Closes #15780
2024-01-29 14:53:29 -08:00
Tino Reichardt 276be5357c linux spl: fix typo in top comment of spl-condvar.c
Credential Implementation -> Condition Variables Implementation

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15782
2024-01-29 14:53:29 -08:00
Lalufu 424d06a298 Make sure all necessary RPM path macros are defined
When building (s)rpm files through the Makefile, a directory structure
is created in /tmp to hold the various files.

In case the user running the command has overridden some of the RPM path
settings through their user profile (for example in `~/.rpmmacros`),
these paths do not line up with the configuration, and the build fails.

Make sure all paths used are properly defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ralf Ertzinger <ralf@skytale.net>
Closes #15756
2024-01-29 14:53:29 -08:00
youzhongyang 6b64acc157 Make spl_kmem_cache size check consistent
On Linux x86_64, kmem cache can have size up to 4M,
however increasing spl_kmem_cache_slab_limit can lead
to crash due to the size check inconsistency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #15757
2024-01-29 14:53:29 -08:00
Ameer Hamza a2e71db664 Add path handling for aux vdevs in `label_path`
If an AUX vdev is added using its UUID, importing the pool falls back to
opening the AUX vdev by disk name instead of UUID, due to the absence of
path information for AUX vdevs. Since the AUX label now has path
information, this PR adds path handling for it in `label_path`.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-29 14:53:29 -08:00
Ameer Hamza eb4a36bcef Extend aux label to add path information
Pool import logic uses vdev paths, so it makes sense to add path
information on AUX vdev as well.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-29 14:53:29 -08:00
Ameer Hamza 52cee9a3eb fix: Uber block label not always found for aux vdevs
When spare or l2cache (aux) vdev is added during pool creation,
spa->spa_uberblock is not dumped until that point. Subsequently,
the aux label is never synchronized after its initial creation,
resulting in the uberblock label remaining undumped. The uberblock
is crucial for lib_blkid in identifying the ZFS partition type. To
address this issue, we now ensure sync of the uberblock label once
if it's not dumped initially.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-29 14:53:29 -08:00
Brian Behlendorf 2006ac1f4a Fix "out of memory" error
Drop the no_memory() call from zpool_in_use() when reading the
label fails and instead return the error to the caller.  This
prevents a misleading "internal error: out of memory" error
when the label can't be read.  This will result in is_spare()
returning B_FALSE instead of aborting, which is already safely
handled.

Furthermore, on Linux it's possible for EREMOTEIO to be returned
by an NVMe device if the device has been low-level formatted
and not rescanned.  In this case we want to fall back to the
legacy scanning method and read any of the labels we can.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #13538
Closes #15747
2024-01-29 14:53:29 -08:00
Benjamin Sherman 509526ad21 fix: preserve linux kmod signature in zfs-kmod rpm spec
This change provides rpm spec macros to sign the zfs and spl kmods as
the final step after the %install scriptlet. This is needed since the
find-debuginfo.sh script strips out debug symbols plus signatures.

Kernel module signing only occurs when the required files are present
as typically required in the Linux source tree:
- certs/signing_key.pem
- certs/signing_key.x509

The method for overriding the default __spec_install_post macro is
inspired by (and largely copied from) the Fedora kernel.spec.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Benjamin Sherman <benjamin@holyarmy.org>
Closes #15744
2024-01-29 14:53:29 -08:00
Stefan Lendl 4db88c37cc fix(mount): do not truncate shares not zfs mount
When running zfs share -a, resetting exports.d/zfs.exports makes
sense to get a clean state.
Truncating was also called with zfs mount, which would not populate the
file again.
Add a test to verify shares persist after mount -a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stefan Lendl <s.lendl@proxmox.com>
Closes #15607 
Closes #15660
2024-01-29 14:53:29 -08:00
Mark Johnston 8b1c6db3d2 Fix a potential use-after-free in zfs_setsecattr()
In general, VOPs must not load the "z_log" field until having called
zfs_enter_verify_zp().

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752
2024-01-29 14:53:29 -08:00
Mark Johnston 22e4f08c30 Linux: Defer loading the object set in zfs_setattr()
We need to wait until after having done a zfs_enter() to load some
fields from the zfsvfs structure.  Otherwise a use-after-free is
possible in the face of a concurrent rollback.

Other functions in this file are careful to avoid this bug; I believe
this is the only instance.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752
2024-01-29 14:53:29 -08:00
Rich Ercolani 7bccf98a73 Make zdb -R scale less poorly
zdb -R with :d tries to use gzip decompression 9 times per size.
There's absolutely no reason for that, they're all the same
decompressor.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15726
2024-01-29 14:53:29 -08:00
Rich Ercolani 4d4972ed98 Stop wasting time on malloc in snprintf_zstd_header
Profiling zdb -vvvvv on datasets with a lot of zstd blocks, we find
ourselves spending quite a lot of time on malloc/free, because we
allocate a 16M abd each call, and never free it, so we're leaking
16M per call as well.

This seems sub-optimal. So let's just keep the buffer around and
reuse it.
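
The reuse pattern described above can be sketched as follows. This is only an illustration under stated assumptions: `get_scratch()` and `SCRATCH_SIZE` are hypothetical names, and a plain heap buffer stands in for the 16M abd used by the real code.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for the large scratch buffer.  The real code
 * works with a 16M abd; a plain heap allocation is assumed here.
 */
#define	SCRATCH_SIZE	(16UL * 1024 * 1024)

void *
get_scratch(void)
{
	static void *scratch;	/* kept across calls */

	if (scratch == NULL)
		scratch = malloc(SCRATCH_SIZE);	/* allocated on first use only */
	return (scratch);	/* NULL only if that allocation failed */
}
```

Every call after the first returns the same pointer, so the per-call malloc cost and the per-call leak both go away. The sketch is not thread-safe.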

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15721
2024-01-29 14:53:29 -08:00
Pawel Jakub Dawidek 3425484eb9 Fix file descriptor leak on pool import.
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15630
2024-01-26 13:38:25 -08:00
Brian Behlendorf 9e0304c363 ZTS: Apply zfs_bclone_enabled to bclone tests
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796
2024-01-22 16:15:03 -08:00
Tino Reichardt c1161e2851 fix: variable type with zfs-tests/cmd/clonefile.c
Compiling on arm64 freebsd-13.2 and arm64 almalinux-8 currently
produces this error:

```
  CC       tests/zfs-tests/cmd/clonefile.o
tests/zfs-tests/cmd/clonefile.c:166:43: error: result of comparison of \
constant -1 with expression of type 'char' is always true \
[-Werror,-Wtautological-constant-out-of-range-compare]
        while ((c = getopt(argc, argv, "crfdq")) != -1) {
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~
1 error generated.
gmake[2]: *** [Makefile:8675: tests/zfs-tests/cmd/clonefile.o] Error 1
```

Fix: use correct variable type `int`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15783
2024-01-19 12:28:02 -08:00
Pawel Jakub Dawidek ef527958c6 Fix cloning into mmaped and cached file.
If the destination file is mmaped and the mmaped region was already
read, so it is cached, we need to update mmaped pages after successful
clone using update_pages().

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Pointed out by: Ka Ho Ng <khng@freebsd.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15772
2024-01-19 12:28:02 -08:00
Umer Saleem d2f7b2e557 ZTS: Test for clone, mmap and write for block cloning
For block cloning, if we mmap the cloned file and write from the
map into the file, it triggers a panic in dbuf_redirty() on Linux.

The same scenario causes data corruption on FreeBSD. Both these
issues are fixed under PR#15656 and PR#15665.

It would be good to add a test for this scenario in ZTS. The test
program and issue were produced by @robn.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15717
2024-01-19 12:28:02 -08:00
Brian Behlendorf 83c0ccc7cf Enable block_cloning tests on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15749
2024-01-19 12:28:02 -08:00
Pawel Jakub Dawidek c16d103422 Block cloning tests.
The tests mostly focus on testing various corner cases.
The tests take a long time to run, so for the common.run runfile
we randomly select a hundred tests.
To run all the bclone tests, bclone.run runfile should be used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15631
2024-01-19 12:28:02 -08:00
Umer Saleem f94a77951d Test LWB buffer overflow for block cloning
PR#15634 removes 128K into 2x68K LWB split optimization, since it
was found to cause LWB buffer overflow while trying to write 128KB
TX_CLONE_RANGE record with 1022 block pointers into 68KB buffer,
with multiple VDEVs ZIL.

This commit adds a test for this particular scenario by writing a
maximum-sized TX_CLONE_RANGE record with 1022 block pointers into a
68KB buffer, with two SLOG devices.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15672
2024-01-19 12:28:02 -08:00
Ameer Hamza d8b0b6032b ZTS: Add test cases for block cloning replay
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2024-01-19 12:28:02 -08:00
Ameer Hamza 387f003be3 ZTS: block_cloning: Use numeric sort for get_same_blocks
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2024-01-19 12:28:02 -08:00
Kevin Jin 07cf973fe9 Autotrim High Load Average Fix
Switch from cv_wait() to cv_wait_idle() in vdev_autotrim_wait_kick(),
which should mitigate the high load average while waiting.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #15781
2024-01-18 11:33:29 -08:00
Rob N 2ecc2dfe42 Linux 6.7 compat: zfs_setattr fix atime update
In db4fc559c I messed up and changed this bit of code to set the inode
atime to an uninitialised value, when actually it was just supposed to
be loading the atime from the inode to be stored in the SA. This changes
it to what it should have been.

Ensure times change by the right amount. Previously, we only checked
if the times changed at all, which missed a bug where the atime was
being set to an undefined value.

Now ensure the times change by two seconds (or thereabouts), ensuring
we catch cases where we set the time to something bonkers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15762
Closes #15773
2024-01-17 08:59:28 -08:00
Shengqi Chen 9ecd112dc1 compact: workaround for GPL-only symbols on riscv from Linux 6.2
Since Linux 6.2, the implementation of flush_dcache_page on riscv
references GPL-only symbol `PageHuge`, breaking the build of zfs.

This patch uses existing mechanism to override flush_dcache_page,
removing the call to `PageHuge`. According to comments in kernel,
it is only used to do some check against HugeTLB pages, which only
exist in userspace. ZFS uses flush_dcache_page only on kernel pages,
thus this patch will not introduce any behaviour change.

See also: torvalds/linux@d33deda, openzfs/zfs@589f59b

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #14974 
Closes #15627
2024-01-16 13:27:29 -08:00
Mark Johnston a00231a3fc spa: Let spa_taskq_param_get()'s addition of a newline be optional
For FreeBSD sysctls, we don't want the extra newline, since the
sysctl(8) utility will format strings appropriately.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719
2024-01-16 11:32:19 -08:00
Mark Johnston 9181e94f0b spa: Fix FreeBSD sysctl handlers
sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl().  In particular, this code triggers an assertion
failure in sbuf_clear().

Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.

Fixes: 6930ecbb7 ("spa: make read/write queues configurable")
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719
2024-01-16 11:32:19 -08:00
Rob Norris 3bd23fd78d freebsd: fix compile for spa_taskq_read/spa_taskq_write params
Missed in #15695, backporting #15675.

Signed-off-by: Rob Norris <robn@despairlabs.com>
2024-01-16 11:32:19 -08:00
Alexander Motin ac592318b8 Fix livelist assertions for dedup and cloning
Two block pointers in livelist pointing to the same location may
be caused not only by dedup, but also by block cloning. We should
not assert D bit set in them.

Two block pointers in livelist pointing to the same location may
have different logical birth time in case of dedup or cloning. We
should assert identical physical birth time instead.

Assert identical physical block size between pointers in addition
to checksum, since that is what checksums are calculated on.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15732
2024-01-12 12:53:00 -08:00
Alexander Motin 152a775eac Improve block sizes checks during cloning
- Fail if source block is smaller than destination.  We can only
grow blocks, not shrink them.
- Fail if we do not have full znode range lock.  In that case grow
is not even called.  We should improve zfs_rangelock_cb() somehow
to know when cloning needs to grow the block size unlike write.
- Fail if we tried to resize, but failed.  There are many reasons
for it to fail that we can not predict at this level, so be ready
for them.  Unlike write, which may proceed after growth failure,
block cloning can't and must return error.

This fixes assertion inside dmu_brt_clone() when it sees different
number of blocks held in destination than it got block pointers.
Builds without ZFS_DEBUG returned EXDEV, so are not affected much.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15724 
Closes #15735
2024-01-12 12:53:00 -08:00
Shengqi Chen 976bf9b6a6 Linux 6.2 compat: add check for kernel_neon_* availability
This patch adds a check for `kernel_neon_*` symbols on arm and arm64
platforms to address the following issues:

1. Linux 6.2+ on arm64 has exported them with `EXPORT_SYMBOL_GPL`, so
   license compatibility must be checked before use.
2. On both arm and arm64, the definitions of these symbols are guarded
   by `CONFIG_KERNEL_MODE_NEON`, but their declarations are still
   present. Checking in configuration phase only leads to MODPOST
   errors (undefined references).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15711 
Closes #14555 
Closes: #15401
2024-01-12 12:38:27 -08:00
chrisperedun f71c16a661 Don't panic on unencrypted block in encrypted dataset
While 763ca47 closes the situation of block cloning creating
unencrypted records in encrypted datasets, existing data still causes
panic on read. Setting zfs_recover bypasses this but at the cost of
potentially ignoring more serious issues.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Peredun <chris.peredun@ixsystems.com>
Closes #15677
2024-01-08 16:11:39 -08:00
Alexander Motin 9c40ae0219 dbuf: Set dr_data when unoverriding after clone
Block cloning normally creates dirty record without dr_data.  But if
the block is read after cloning, it is moved into DB_CACHED state and
receives the data buffer.  If after that we call dbuf_unoverride()
to convert the dirty record into normal write, we should give it the
data buffer from dbuf and release one.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15654
Closes #15656
2024-01-08 16:11:39 -08:00
Alexander Motin a701548eb4 dbuf: Handle arcbuf assignment after block cloning
In some cases dbuf_assign_arcbuf() may be called on a block that
was recently cloned.  If it happened in current TXG we must undo
the block cloning first, since the single dirty record per TXG
can't and shouldn't mean both cloning and overwrite at the same time.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15653
2024-01-08 16:11:39 -08:00
Alexander Motin b13c91bb29 DMU: Fix lock leak on dbuf_hold() error
dmu_assign_arcbuf_by_dnode() should drop dn_struct_rwlock lock in
case dbuf_hold() failed.  I don't have reproduction for this, but
it looks inconsistent with dmu_buf_hold_noread_by_dnode() and co.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15644
2024-01-08 16:11:39 -08:00
Alexander Motin e09356fa05 BRT: Limit brt_vdev_dump() to only one vdev
Without this patch on pool of 60 vdevs with ZFS_DEBUG enabled clone
takes much more time than copy, while heavily trashing dbgmsg for
no good reason, repeatedly dumping all vdevs BRTs again and again,
even unmodified ones.

I am generally not sure this dumping is not excessive, but decided
to keep it for now, just restricting its scope to something more reasonable.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15625
2024-01-08 16:11:39 -08:00
Alexander Motin 1e1d748cae ZIL: Remove 128K into 2x68K LWB split optimization
To improve 128KB block write performance in case of multiple VDEVs
ZIL used to split those writes into two 64KB ones.  Unfortunately it
was found to cause LWB buffer overflow, trying to write a maximum-sized
128KB TX_CLONE_RANGE record with 1022 block pointers into a
68KB buffer, since unlike TX_WRITE ZIL code can't split it.

This is a minimally-invasive temporary block cloning fix until the
following more invasive prediction code refactoring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15634
2024-01-08 16:11:39 -08:00
Alexander Motin dea2d3c6cd zdb: Dump encrypted write and clone ZIL records
Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE
records, so we can dump them, that may be useful for debugging.

Related to #15543.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15629
2024-01-08 16:11:39 -08:00
oromenahar 121924575e Allow block cloning across encrypted datasets
When two datasets share the same master encryption key, it is safe
to clone encrypted blocks. Currently only snapshots and clones
of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with
  non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15544
2024-01-08 16:11:39 -08:00
Alexander Motin e11b3eb1c6 ZIL: Do not clone blocks from the future
ZIL claim can not handle block pointers cloned from the future,
since they are not yet allocated at that point.  It may happen
either if the block was just written when it was cloned, or if
the pool was frozen or somehow else rewound on import.

Handle it from two sides: prevent cloning of blocks with physical
birth time from not yet synced or frozen TXG, and abort ZIL claim
if we still detect such blocks due to rewind or something else.

While there, assert that any cloned blocks we claim are really
allocated by calling metaslab_check_free().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15617
2024-01-08 16:11:39 -08:00
Alexander Motin 3b8f227362 ZIL: Remove TX_CLONE_RANGE replay for ZVOLs.
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means we do not
need to do anything (drop the references) for the not-yet-implemented
TX_CLONE_RANGE replay for ZVOLs.

This is a logical follow up to #15603.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15612
2024-01-08 16:11:39 -08:00
Alexander Motin e48195c816 ZIO: Add overflow checks for linear buffers
Since we use a limited set of kmem caches, quite often we have unused
memory after the end of the buffer.  Put there up to a 512-byte canary
when built with debug to detect buffer overflows at the free time.
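
The idea can be sketched in userspace as below. This is an assumption-laden illustration, not the ZIO code: `canary_alloc()`, `canary_free()`, `canary_len()` and the fill byte are hypothetical, `slot` stands for the size of the cache the buffer was really allocated from, and only the 512-byte upper bound comes from the message above.

```c
#include <stdlib.h>
#include <string.h>

#define	CANARY_MAX	512	/* upper bound taken from the message above */
#define	CANARY_BYTE	0xA5	/* hypothetical fill pattern */

/* Length of the canary: the unused tail of the slot, capped at CANARY_MAX. */
static size_t
canary_len(size_t size, size_t slot)
{
	size_t slack = slot - size;

	return (slack < CANARY_MAX ? slack : CANARY_MAX);
}

/* Allocate a slot-sized buffer of which only `size` bytes are in use. */
unsigned char *
canary_alloc(size_t size, size_t slot)
{
	unsigned char *buf;

	if (size > slot || (buf = malloc(slot)) == NULL)
		return (NULL);
	memset(buf + size, CANARY_BYTE, canary_len(size, slot));
	return (buf);
}

/* Free the buffer; return 0 if the canary is intact, -1 if overwritten. */
int
canary_free(unsigned char *buf, size_t size, size_t slot)
{
	size_t i, len = canary_len(size, slot);
	int rc = 0;

	for (i = 0; i < len; i++) {
		if (buf[size + i] != CANARY_BYTE) {
			rc = -1;	/* something wrote past `size` */
			break;
		}
	}
	free(buf);
	return (rc);
}
```

Checking only at free time keeps the cost off the I/O path, at the price of reporting the overflow later than it happened.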

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15553
2024-01-08 16:11:39 -08:00
Alexander Motin ad47eca195 ZIL: Assert record sizes in different places
This should make sure we have log written without overflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15517
2024-01-08 16:11:39 -08:00
Alexander Motin 2e259c6f00 L2ARC: Restrict write size to 1/4 of the device
PR #15457 exposed weird logic in L2ARC write sizing. If it appeared
bigger than device size, instead of limiting the write it reset all the
system-wide tunables to their defaults.  Aside from being excessive,
it did not actually help with the problem, still allowing an infinite
loop to happen.

This patch removes the tunables reverting logic, but instead limits
L2ARC writes (or at least eviction/trim) to 1/4 of the capacity.
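
As a rough illustration of the limit, assuming a hypothetical helper `limit_write_size()` that is not the actual L2ARC code (which may account for further overheads):

```c
#include <stdint.h>

/*
 * Hypothetical helper: cap a requested write size at one quarter of
 * the device capacity.
 */
uint64_t
limit_write_size(uint64_t wanted, uint64_t dev_size)
{
	uint64_t cap = dev_size / 4;

	return (wanted > cap ? cap : wanted);
}
```

With a 1000-byte device, a 600-byte request would be cut to 250 bytes, while a 100-byte request passes through unchanged.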

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15519
2024-01-08 16:11:39 -08:00
Alexander Motin a8c29a79df Linux: Reclaim unused spl_kmem_cache_reclaim
It has been unused for 3 years, since #10576.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15507
2024-01-08 16:11:39 -08:00
Alexander Motin f13593619b FreeBSD: Optimize large kstat outputs
- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15495
2024-01-08 16:11:39 -08:00
Alan Somers c34fe8dcbc Update the kstat dataset_name when renaming a zvol
Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15482
Closes #15486
2024-01-08 16:11:39 -08:00
Alexander Motin 2a59b6bfa9 ABD: Be more assertive in iterators
Once we verified the ABDs and asserted the sizes we should never
see premature ABD ends.  Assert that and remove extra branches
from production builds.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15428
2024-01-08 16:11:39 -08:00
Rob Norris db2db50e37 spa: make read/write queues configurable
We are finding that as customers get larger and faster machines
(hundreds of cores, large NVMe-backed pools) they keep hitting
relatively low performance ceilings. Our profiling work almost always
finds that they're running into bottlenecks on the SPA IO taskqs.
Unfortunately there's often little we can advise at that point, because
there are very few ways to change behaviour without patching.

This commit adds two load-time parameters `zio_taskq_read` and
`zio_taskq_write` that can configure the READ and WRITE IO taskqs
directly.

This achieves two goals: it gives operators (and those that support
them) a way to tune things without requiring a custom build of OpenZFS,
which is often not possible, and it lets us easily try different config
variations in a variety of environments to inform the development of
better defaults for these kinds of systems.

Because tuning the IO taskqs really requires a fairly deep understanding
of how IO in ZFS works, and generally isn't needed without a pretty
serious workload and an ability to identify bottlenecks, only minimal
documentation is provided. It's expected that anyone using this is going
to have the source code there as well.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
2023-12-22 13:25:07 -08:00
Brian Behlendorf d530d5d8a5 Linux 6.5 compat: check BLK_OPEN_EXCL is defined
On some systems we already have blkdev_get_by_path() with 4 args
but still the old FMODE_EXCL and not BLK_OPEN_EXCL defined.
The vdev_bdev_mode() function was added to handle this case
but there was no generic way to specify exclusive access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15692
2023-12-21 16:19:48 -08:00
Brian Behlendorf 3c502e376b ZTS: Disable io_uring test on CentOS 9
The io_uring test fails on CentOS 9 with the following fio error.
Disable the test for the benefit of the CI until this can be fully
investigated.  This basic test passes as expected on newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15636
2023-12-21 15:44:43 -08:00
Rob Norris 03b84099d9 linux 6.7 compat: rework shrinker setup for heap allocations
6.7 changes the shrinker API such that shrinkers must be allocated
dynamically by the kernel. To accommodate this, this commit reworks
spl_register_shrinker() to do something similar against earlier kernels.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Rob Norris 18a9185165 linux 6.7 compat: handle superblock shrinker member change
In 6.7 the superblock shrinker member s_shrink has changed from being an
embedded struct to a pointer. Detect this, and don't take a reference if
it already is one.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Rob Norris 3c13601a12 linux 6.7 compat: use inode atime/mtime accessors
6.6 made i_ctime inaccessible; 6.7 has done the same for i_atime and
i_mtime. This extends the method used for ctime in b37f29341 to atime
and mtime as well.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Rob Norris b3626f0a35 linux 6.7 compat: simplify current_time() check
6.7 changed the names of the time members in struct inode, so we can't
assign back to it because we don't know its name. In practice this
doesn't matter though - if we're missing current_time(), then we must be
on <4.9, and we know our fallback will need to return timespec.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Tony Hutter 494aaaed89 Tag zfs-2.2.2
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-29 14:08:46 -08:00
rmacklem 522414da3b FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible
Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount.  If this is not
done, the snapshots under .zfs/snapshot will not be accessible
over NFS.

This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #15563
2023-11-29 14:08:46 -08:00
Alexander Motin a8c256046b ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means on actual
replay we must take additional references, which would stay in BRT.

Without this, blocks could be freed prematurely when either the original
file or its clone is destroyed.  I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened.  With the patch I see expected stats.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15603
2023-11-29 13:08:25 -08:00
Martin Matuška eb34de04d7 zdb: fix printf() length for uint64_t devid
Bug introduced in 213d682967.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606
2023-11-29 13:08:25 -08:00
Jaron Kent-Dobias d813aa8530 Linux 6.6 compat: fix configure error with clang (#15558)
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-11-28 15:19:07 -08:00
AllKind 3b267e72de zfs-dkms: fix shell-init error message
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs no longer exists.
Resolve this by leaving the directory earlier on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15576
2023-11-28 15:19:07 -08:00
Alan Somers 349fb77f11 FreeBSD: Fix the build on FreeBSD 12
It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0.  So OpenZFS should be using
  VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1.  We can drop
  support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0.  OpenZFS change 9d0887402b
  assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b03791
  assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #15551
2023-11-28 15:19:07 -08:00
Rob N 2a953e0ac9 dmu_buf_will_clone: fix race in transition back to NOFILL
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
it's too late, because the lock was already dropped with that invalid
state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15566
Closes #15526
2023-11-28 12:59:00 -08:00
Akash B e4985bf5a1 zdb: Fix zdb '-O|-r' options with -e/exported zpool
zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.

Below errors are seen:

~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory

We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15532
2023-11-28 12:56:43 -08:00
Rob Norris e96675a7b1 zdb: show BRT statistics and dump its contents
Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-28 12:56:43 -08:00
Rob Norris d702f86eaf brt: lift internal definitions into _impl header
So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-28 12:56:43 -08:00
Tony Hutter 41c4599cba ZTS: Fix zfs_load-key failures on F39
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.
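
As a minimal sketch of that migration (not the actual test code), the
deprecated module-level `ssl.wrap_socket()` call can be replaced by building
an `ssl.SSLContext` and calling its `wrap_socket()` method. The helper names
and the optional certificate paths below are illustrative assumptions.

```python
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a server-side TLS context to use instead of ssl.wrap_socket()."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile is not None:
        # Hypothetical paths: a real server supplies its own certificate.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def wrap_listener(listener, context):
    """Wrap a listening socket so that accepted connections speak TLS."""
    return context.wrap_socket(listener, server_side=True)
```

A server would call `make_server_context()` with its certificate and key,
then pass its listening socket through `wrap_listener()`.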

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550
2023-11-28 12:56:09 -08:00
Alexander Motin 56a2a0981e ZIL: Do not encrypt block pointers in lr_clone_range_t
In case of a crash, cloned blocks need to be claimed on pool import.
This is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to the block pointer in
lr_write_t.  A few other fields can be and are still encrypted.

This should fix the panic on ZIL claim after a crash when block cloning
is actively used.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
2023-11-28 11:17:52 -08:00
Rob N 9b9b09f452 dnode_is_dirty: check dnode and its data for dirtiness
Over its history, the dirty dnode test has been changed between
checking for a dnode being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.

  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce372 Revert "Report holes when there are only metadata changes"
  ec4f9b8f3 Report holes when there are only metadata changes
  454365bba Fix dirty check in dmu_offset_next()
  66aca2473 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage. It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.
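
The window described above can be illustrated with a toy model. This is a
sketch only: the class and function names are hypothetical, and the two
fields merely stand in for the dirty link and the dirty records of a real
dnode.

```python
from dataclasses import dataclass, field


@dataclass
class DnodeModel:
    """Toy stand-in for a dnode; field names are illustrative only."""
    on_dirty_list: bool = False                        # models dn_dirty_link
    dirty_records: list = field(default_factory=list)  # models dirty records


def is_dirty_link_only(dn):
    """Older style of check: looks only at the dirty-list membership."""
    return dn.on_dirty_list


def is_dirty(dn):
    """Combined check: dirty if either indicator is set."""
    return dn.on_dirty_list or bool(dn.dirty_records)


# The racy window: the dnode itself was undirtied, but records remain.
window = DnodeModel(on_dirty_list=False, dirty_records=["pending write"])
print(is_dirty_link_only(window), is_dirty(window))
```

In the window state the single-indicator check reports a clean dnode, while
the combined check still reports it as dirty.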

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15571
Closes #15526
2023-11-28 09:15:48 -08:00
Brian Behlendorf 89fcb8c6f9 Revert "Tune zio buffer caches and their alignments"
This reverts commit bd7a02c251 which
can trigger an unlikely existing bio alignment issue on Linux.
This change is good, but the underlying issue it exposes needs to
be resolved before this can be re-applied.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #15533
2023-11-28 09:03:58 -08:00
Tony Hutter 55dd24c4cc Tag zfs-2.2.1
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-20 13:20:56 -08:00
Tony Hutter 78287023ce ZTS: Fix 'could not unmount datasets' on Alma 9
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup.  This was due to $PWD being set to '.' instead of the
expected full path.  This patch sets $PWD to the full path.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-20 13:20:56 -08:00
Tony Hutter 479dca51c6 zfs-2.2.1: Disable block cloning by default
Disable block cloning by default to mitigate possible data corruption
(see #15529 and #15526).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-16 14:23:03 -08:00
Rich Ercolani 87e9e82865 Add a tunable to disable BRT support.
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15529
2023-11-16 14:23:03 -08:00
Umer Saleem 0733fe2aa5 Packaging: Auto-generate changelog during configure (#15528)
Auto-generate the changelog based on @VERSION@ during configure,
so that it does not need to be updated for new releases / version
updates.

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-11-16 14:23:03 -08:00
Tony Hutter fd836dfe24 Linux 6.6 compat: META
Update the META file to reflect compatibility with the 6.6 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15520
2023-11-16 14:23:03 -08:00
Tony Hutter e92a680c70 Workaround UBSAN errors for variable arrays
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from #15460.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #15145
Closes #15510
2023-11-16 14:23:03 -08:00
Umer Saleem f1659cc782 ZTS: Test for all known zpool feature sets
zpool_create_features_007_pos only tested for compat-2020 feature
set. It would be useful to test for all known feature sets. If
any additional feature is found enabled that is not present in
compatibility list or feature set, it should be caught and
reported earlier.

This commit also removes encryption from openzfsonosx-1.8.1
compatibility list. Encryption enables bookmark_v2, since it is
a dependency of encryption, but is not listed in openzfsonosx-1.8.1
compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-16 14:23:03 -08:00
Umer Saleem f863ac3d0f Update zpool-features.7 for grub2 compatibility list updates
This commit updates zpool-features.7 man page to add newly added
zpool features to grub2 compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-16 14:23:03 -08:00
AllKind f6d2e5c075 Workaround to allow openzfs-zfs-dkms install on Ubuntu
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15503
2023-11-16 14:23:03 -08:00
Low-power f2fe4d51a8 Linux: reject read/write mapping to immutable file only on VM_SHARED
Private read/write mappings can't be used to modify the mapped files, so
the files remain immutable. Private read/write mappings are usually
used to load the data segment of executable files, so rejecting them
causes immutable executable files to stop working.
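
A minimal sketch of the resulting rule, assuming illustrative flag values
and a hypothetical helper name (this is not the kernel code itself): a
mapping of an immutable file is refused only when it is both writable and
shared.

```python
VM_WRITE = 0x2   # illustrative flag values for this sketch
VM_SHARED = 0x8


def reject_mmap(vm_flags, file_is_immutable):
    """Return True if the requested mapping should be refused."""
    wanted = VM_WRITE | VM_SHARED
    shared_writable = (vm_flags & wanted) == wanted
    return file_is_immutable and shared_writable


# A private writable mapping (e.g. an executable's data segment) is allowed.
print(reject_mmap(VM_WRITE, file_is_immutable=True))
# A shared writable mapping could modify the file, so it is refused.
print(reject_mmap(VM_WRITE | VM_SHARED, file_is_immutable=True))
```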

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #15344
2023-11-16 14:23:03 -08:00
MigeljanImeri 76663fe372 Fix accounting error for pending sync IO ops in zpool iostat
Currently vdev_queue_class_length is responsible for checking how long
the queue length is, however, it doesn't check the length when a list
is used, rather it just returns whether it is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.
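
The accounting problem can be modelled in a few lines. This is a sketch
under assumptions: the class and method names are hypothetical, and a Python
list stands in for the kernel list.

```python
class SyncQueueModel:
    """Toy model of a list-backed queue class with an explicit counter."""

    def __init__(self):
        self._items = []
        self._count = 0

    def add(self, io):
        self._items.append(io)
        self._count += 1

    def remove(self, io):
        self._items.remove(io)
        self._count -= 1

    def length_old(self):
        # Old behaviour: only reports whether the list is non-empty.
        return 1 if self._items else 0

    def length(self):
        # Fixed behaviour: reports the tracked number of queued ops.
        return self._count


q = SyncQueueModel()
for io in ("a", "b", "c"):
    q.add(io)
print(q.length_old(), q.length())
```

With three queued operations the old check still reports a length of one,
which is the under-count that showed up in `zpool iostat`.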

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #15478
2023-11-16 14:23:03 -08:00
Umer Saleem 44c8ff9b0c Linux 6.6 compat: fix implicit conversion error with debug build
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15489
2023-11-16 14:23:03 -08:00
Umer Saleem f0ffcc3adc Remove obsolete_counts from grub2 compatibility list
PR#15459 added all read-only compatible zpool features to grub2
compatibility list. 'obsolete_counts' is a read-only feature that
depends on 'device_removal' feature which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables 'device_removal' feature as well, which is
not desired.

This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15499
2023-11-16 14:23:03 -08:00
AllKind e534ba5ce7 Fix dkms installation of deb packages created with Alien.
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
 %pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
well as possible and warn the user about it.
Also add more verbose messages about what we are doing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15415
2023-11-16 14:23:03 -08:00
Umer Saleem 1c7048357d Add all read-only compatible zpool features to grub2 compatibility
GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15459
2023-11-16 14:23:03 -08:00
Alexander Motin 3ec4ea68d4 Unify arc_prune_async() code
There is no sense in having separate implementations for FreeBSD and
Linux.  Make Linux code shared as more functional and just register
FreeBSD-specific prune callback with arc_add_prune_callback() API.

Aside from code cleanup, this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15456
2023-11-08 12:15:41 -08:00
Alexander Motin bd7a02c251 Tune zio buffer caches and their alignments
We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise.  With that scheme, caches for 5, 6, 7,
10 and 14KB, rounded up to 8, 12 and 16KB respectively, make no sense.
Instead specify as alignment the biggest power-of-2 divisor.  This
way 2KB and 6KB caches are both aligned to 2KB, while 4KB and 8KB
are aligned to 4KB.
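
The alignment rule can be worked through with a little bit arithmetic. This
is a sketch, not the allocator code: the function names are hypothetical,
the page size is assumed to be 4KB, and the cap at the page size is an
assumption made to match the 4KB and 8KB examples above.

```python
PAGESIZE = 4096  # assumed page size; the real value is platform-dependent


def biggest_pow2_divisor(size):
    """Return the largest power of two that divides size (lowest set bit)."""
    return size & -size


def cache_alignment(size, pagesize=PAGESIZE):
    """Alignment for a cache of the given size, capped at the page size."""
    return min(biggest_pow2_divisor(size), pagesize)


for size in (2048, 6144, 4096, 8192):
    print(size, cache_alignment(size))
```

The loop prints the alignment chosen for each of the four cache sizes named
in the paragraph above.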

Reduce number of caches to half-power of 2 instead of quarter-power
of 2.  This removes caches difficult for underlying allocators to
fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and the small DBUF cache, it is not worth being too aggressive.
Due to the above alignment issue, some of those caches were not
working properly anyway.  The 6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, which makes sense.

Remove explicit alignment in Linux user-space case.  I don't think
it should be needed any more with the above fixes.

As a result, on FreeBSD, instead of these numbers of pages per slab:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_14336.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_10240.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_7168.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 2   <= Broken
vm.uma.zio_buf_comb_5120.keg.ppera: 2
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3584.keg.ppera: 7   <= Hard to free
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2560.keg.ppera: 2
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

I am now getting these:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 3   <= Fixed, 2 in 3 pages
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15452
2023-11-08 12:15:41 -08:00
Alexander Motin e82e68400a DMU: Do not pre-read holes during write
dmu_tx_check_ioerr() pre-reads blocks that are going to be dirtied
as part of transaction to both prefetch them and check for errors.
But it makes no sense to do it for holes, since there are no disk
reads to prefetch and there can be no errors.  On the other side
those blocks are anonymous, and they are freed immediately by the
dbuf_rele() without even being put into dbuf cache, so we just
burn CPU time on decompression and overheads and get absolutely
no result at the end.

Using dbuf_hold_impl() with the fail_sparse parameter allows skipping
the extra work, and in my tests sequential 8KB writes to an empty
ZVOL with 32KB blocks show a throughput increase from 1.7 to 2GB/s.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15371
2023-11-08 12:15:41 -08:00
Coleman Kane 3f67e012e4 Linux 6.6 compat: fsync_bdev() has been removed in favor of sync_blockdev()
In Linux commit 560e20e4bf6484a0c12f9f3c7a1aa55056948e1e, the
fsync_bdev() function was removed in favor of sync_blockdev() to do
(roughly) the same thing, given the same input. This change
conditionally attempts to call sync_blockdev() if fsync_bdev() isn't
discovered during configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
2023-11-08 12:15:41 -08:00
Coleman Kane 21875dd090 Linux 6.6 compat: generic_fillattr has a new u32 request_mask added at arg2
In commit 0d72b92883c651a11059d93335f33d65c6eb653b, a new u32 argument
for the request_mask was added to generic_fillattr. This is the same
request_mask for statx that's present in the most recent API implemented
by zpl_getattr_impl. This commit conditionally adds it to the
zpl_generic_fillattr(...) macro, as well as the zfs_getattr_fast(...)
implementation, when configure determines it's present in the kernel's
generic_fillattr(...).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
2023-11-08 12:15:41 -08:00
Coleman Kane fe9d409e90 Linux 6.6 compat: use inode_get/set_ctime*(...)
In Linux commit 13bc24457850583a2e7203ded05b7209ab4bc5ef, direct access
to the i_ctime member of struct inode was removed. The new approach is
to use accessor methods that exclusively handle passing the timestamp
around by value. This change adds new tests for each of these functions
and introduces zpl_* equivalents in include/os/linux/zfs/sys/zpl.h. On
kernels where the inode_get/set_ctime*() functions exist, these zpl_*
calls will be mapped to the new functions. On older kernels, these
macros just wrap
direct-access calls. The code that operated on an address of ip->i_ctime
to call ZFS_TIME_DECODE() now will take a local copy using
zpl_inode_get_ctime(), and then pass the address of the local copy when
performing the ZFS_TIME_DECODE() call, in all cases, rather than
directly accessing the member.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
Closes #15257
2023-11-08 12:15:41 -08:00
shodanshok 7aef672b77 Read prefetched buffers from L2ARC
Prefetched buffers are currently read from L2ARC if, and only if,
l2arc_noprefetch is set to non-default value of 0. This means that
a streaming read which can be served from L2ARC will instead engage
the main pool.

For example, consider what happens when a file is sequentially read:
- application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling
application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains
the required buffers;
- main pool has to suffer another sequential read load, which (due to
most NCQ-enabled HDDs preferring sequential loads) can dramatically
increase latency for uncached random reads.

In other words, current behavior is to write data to L2ARC (wearing it)
without using the very same cache when reading back the same data. This
was probably useful many years ago to preserve L2ARC read bandwidth but,
with current SSD speed/size/price, it is vastly sub-optimal.

Setting l2arc_noprefetch=0, while enabling L2ARC to serve these reads,
means that even prefetched but unused buffers will be copied into L2ARC,
further increasing wear and load for potentially not-useful data.

This patch enables prefetched buffers to be read from L2ARC even when
l2arc_noprefetch=1 (default), increasing sequential read speed and
reducing load on the main pool without polluting L2ARC with not-useful
(i.e. unused) prefetched data. Moreover, it clears up user confusion about
L2ARC size increasing but not serving any IO when doing sequential
reads.
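
The before and after behaviour can be summarised as a small decision
function. This is a simplified model built only from the description above:
the function name and the `patched` flag are hypothetical, and the real ARC
code considers more state than this.

```python
def l2arc_policy(noprefetch, patched):
    """Return (read_prefetch_from_l2arc, write_unused_prefetch_to_l2arc)."""
    # Unused prefetched buffers are only written when noprefetch is 0.
    write_unused = (noprefetch == 0)
    # Before the patch, reads were gated by the same tunable.
    read_from_l2arc = True if patched else (noprefetch == 0)
    return read_from_l2arc, write_unused


for noprefetch in (1, 0):
    for patched in (False, True):
        print(noprefetch, patched, l2arc_policy(noprefetch, patched))
```

With the default `l2arc_noprefetch=1`, only the read side changes: L2ARC can
now serve prefetched reads while still not being fed unused prefetched data.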

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15451
2023-11-06 16:47:51 -08:00
Thomas Bertschinger f9a9aea126 Add mutex_enter_interruptible() for interruptible sleeping IOCTLs
Many long-running ZFS ioctls lock the spa_namespace_lock, forcing
concurrent ioctls to sleep for the mutex. Previously, the only
option was to call mutex_enter(), which sleeps uninterruptibly. This
is a usability issue for sysadmins: for example, if the admin runs
`zpool status` while a slow `zpool import` is ongoing, the admin's
shell will be locked in uninterruptible sleep for a long time.

This patch resolves this admin usability issue by introducing
mutex_enter_interruptible() which sleeps interruptibly while waiting
to acquire a lock. It is implemented for both Linux and FreeBSD.

The ZFS_IOC_POOL_CONFIGS ioctl, used by `zpool status`, is changed to
use this new macro so that the command can be interrupted if it is
issued during a concurrent `zpool import` (or other long-running
operation).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Closes #15360
2023-11-06 16:47:41 -08:00
Tony Hutter 8ba748d414 Revert "zvol: Temporally disable blk-mq"
This reverts commit aefb6a2bd6.

aefb6a2bd temporally disabled blk-mq until we could fix a fix for

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15439
2023-11-06 16:47:32 -08:00
Tony Hutter e860cb0200 zvol: Remove broken blk-mq optimization
This fix removes a dubious optimization in zfs_uiomove_bvec_rq()
that saved the iterator contents of a rq_for_each_segment().  This
optimization allowed restoring the "saved state" from a previous
rq_for_each_segment() call on the same uio so that you wouldn't
need to iterate through each bvec on every zfs_uiomove_bvec_rq() call.
However, if the kernel is manipulating the requests/bios/bvecs under
the covers between zfs_uiomove_bvec_rq() calls, then it could result
in corruption from using the "saved state".  This optimization
results in an unbootable system after installing an OS on a zvol
with blk-mq enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15351
2023-11-06 16:47:24 -08:00
ofthesun9 86c3ed40e1 "ARC prefetch metadata accesses:" appears twice in the output.
The first occurrence should be "ARC prefetch data accesses:"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #15427
2023-11-06 16:47:14 -08:00
Alexander Motin 6e41aca519 Trust ARC_BUF_SHARED() more
In my understanding ARC_BUF_SHARED() and arc_buf_is_shared() should
return identical results, except the second also asserts it deeper.
The first is much cheaper though, saving a few pointer dereferences.
Replace production arc_buf_is_shared() calls with ARC_BUF_SHARED(),
and call arc_buf_is_shared() in random assertions, while making it
even more strict.

In my tests this halves arc_buf_destroy_impl() time, which
noticeably reduces hash_lock congestion under heavy dbuf eviction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15397
2023-11-06 16:47:05 -08:00
Alexander Motin 79f7de5752 Remove lock from dsl_pool_need_dirty_delay()
Torn reads/writes of dp_dirty_total are unlikely: on 64-bit systems
due to register size, while on 32-bit due to memory constraints.
And even if we hit some race, the code implementing the delay takes
the lock anyway.

Removal of the pool-wide lock acquisition saves ~1% of CPU time on
8-thread 8KB write workload.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15390
2023-11-06 16:46:55 -08:00
VaibhavB 0ef1964c79 run-zts test procfs/pool_state failed with uncorrectable I/O failure
Once we trigger the zpool scrub, all zpool/zfs commands get stuck for
180 seconds. After 180 seconds the zpool/zfs commands start executing,
but it takes a few more seconds (about 10) to update the status. Hence
sleep for 200 seconds so that we get the correct status.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: vaibhav.bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15364
2023-11-06 16:46:49 -08:00
Alexander Motin eaa62d9951 Properly pad struct tx_cpu to cache line
We already use ____cacheline_aligned in many places, so add one more
instead of the seemingly arbitrary char tc_pad[8].
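
To illustrate why an alignment attribute is more robust than a hand-sized
pad array, here is a small sketch in standard C11. It uses alignas from
<stdalign.h> as a portable stand-in for the kernel's alignment macro; the
struct layouts and the 64-byte line size are assumptions for illustration,
not the real struct tx_cpu.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define	CACHE_LINE_SIZE	64	/* assumed line size for this sketch */

/* Manual padding: correct only while someone keeps the arithmetic right. */
typedef struct cpu_state_manual {
	uint64_t cs_lock;
	uint64_t cs_count[4];
	char cs_pad[8];		/* 8 + 32 + 8 = 48 bytes, not a full line */
} cpu_state_manual_t;

/* Compiler-enforced alignment: size is rounded up to the alignment. */
typedef struct cpu_state_aligned {
	alignas(CACHE_LINE_SIZE) uint64_t cs_lock;
	uint64_t cs_count[4];
} cpu_state_aligned_t;

/* Adjacent array elements can never share a cache line. */
static_assert(sizeof (cpu_state_aligned_t) % CACHE_LINE_SIZE == 0,
    "aligned struct must occupy whole cache lines");
static_assert(alignof (cpu_state_aligned_t) == CACHE_LINE_SIZE,
    "aligned struct must start on a cache line boundary");
```

With the aligned variant, adding or removing a field changes the padding
automatically, whereas the manual pad has to be recomputed by hand.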

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15402
2023-11-06 16:46:44 -08:00
dennisfriedrichsen 8ca95d78c5 Fix typo in tests/zfs-tests/tests/functional/cli_user/misc/misc.cfg
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Closes #15417
2023-11-06 16:46:37 -08:00
Olivier Certner edebca5dfc FreeBSD: taskq: Remove unused declaration
Variable 'uma_align_cache' has not been used since commit "FreeBSD: Use
a hash table for taskqid lookups" (3933305ea).  Moreover, it is soon
going to become private to FreeBSD's UMA in 15.0-CURRENT (main),
14.0-STABLE (stable/14) and 13.2-STABLE (stable/13).  Should accessing
this information become necessary again, one will have to use the new
accessors for recent versions.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olivier Certner <olce.freebsd@certner.fr>
Closes #15416
2023-11-06 16:46:32 -08:00
Colin Percival 1cc1bf4fa7 Set spa_ccw_fail_time=0 when expanding a vdev.
When a vdev is to be expanded -- either via `zpool online -e` or via
the autoexpand option -- a SPA_ASYNC_CONFIG_UPDATE request is queued
to be handled via an asynchronous worker thread (spa_async_thread).
This normally happens almost immediately; but will be delayed up to
zfs_ccw_retry_interval seconds (default 5 minutes) if an attempt to
write the zpool configuration cache failed.

When FreeBSD boots ZFS-root VM images generated using `makefs -t zfs`,
the zpoolupgrade rc.d script runs `zpool upgrade`, which modifies the
pool configuration and triggers an attempt to write to the cache file.
This attempted write fails because the filesystem is still mounted
read-only at this point in the boot process, triggering a 5-minute
cooldown before SPA_ASYNC_CONFIG_UPDATE requests will be handled by
the asynchronous worker thread.

When expanding a vdev, reset the "when did a configuration cache
write last fail" value so that the SPA_ASYNC_CONFIG_UPDATE request
will be handled promptly.  A cleaner but more intrusive option would
be to use separate SPA_ASYNC_ flags for "configuration changed" and
"try writing the configuration cache again", but with FreeBSD 14.0
coming very soon I'd prefer to leave such refactoring for a later
date.
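
The cooldown logic and the fix can be sketched as follows. This is a
simplified model under stated assumptions: pool_cfg_t, config_update_allowed,
request_expand and the use of whole seconds are hypothetical and only mirror
the behaviour described in this message, not the actual SPA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	CCW_RETRY_INTERVAL	300	/* seconds; the 5-minute default */

/* Hypothetical pool configuration state. */
typedef struct pool_cfg {
	uint64_t pc_ccw_fail_time;	/* last failed cache write, 0 = none */
	bool pc_update_pending;		/* a config update request is queued */
} pool_cfg_t;

/* The worker only handles config updates outside the cooldown window. */
static bool
config_update_allowed(const pool_cfg_t *pc, uint64_t now)
{
	return (pc->pc_ccw_fail_time == 0 ||
	    now - pc->pc_ccw_fail_time >= CCW_RETRY_INTERVAL);
}

/*
 * Expanding a vdev: clear the failure timestamp so that the queued
 * update is not held back by an unrelated, earlier cache-write failure.
 */
static void
request_expand(pool_cfg_t *pc)
{
	pc->pc_ccw_fail_time = 0;
	pc->pc_update_pending = true;
}
```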

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@FreeBSD.org>
Closes #15405
2023-11-06 16:46:25 -08:00
Don Brady 0bcd1151f0 Fix ZED auto-replace for VDEVs using by-id paths
The change is simple -- restore the original code so that the VDEV 
path is updated when using by-id paths.  The more challenging part 
was to devise a second ZTS test, that would test auto-replace for 
'by-id' and help prevent a future regression.

With that new test, we can now do an A|B test with, and without,
the fix to confirm that auto-replace for by-id paths works. The 
existing auto-replace test, functional/fault/auto_replace_001_pos, 
will confirm that we didn't break auto-replace for 'by-vdev' paths.

In the original functional/fault/auto_replace_001_pos test, the disk 
wipe (using dd) was not effective in removing the partitioning since 
the kernel was never informed of the wipe.

Added a call to wipefs(8) so that the kernel is informed and ZED will 
re-partition the device.
    
Added a validation step that the re-partitioning occurred by
confirming that the GPT partition UUID changes.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15363
2023-11-06 16:45:14 -08:00
Tony Hutter 78fd79eacd Add zfs_prepare_disk script for disk firmware install
Have libzfs call a special `zfs_prepare_disk` script before a disk is
included into the pool.  The user can edit this script to add things
like a disk firmware update or a disk health check.  Use of the script
is totally optional. See the zfs_prepare_disk manpage for full details.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15243
2023-11-06 16:45:07 -08:00
John Wren Kennedy 6d693e20a2 Large sync writes perform worse with slog
For synchronous write workloads with large IO sizes, a pool configured
with a slog performs worse than one with an embedded zil:

sequential_writes 1m sync ios, 16 threads
  Write IOPS:              1292          438   -66.10%
  Write Bandwidth:      1323570       448910   -66.08%
  Write Latency:       12128400     36330970      3.0x

sequential_writes 1m sync ios, 32 threads
  Write IOPS:              1293          430   -66.74%
  Write Bandwidth:      1324184       441188   -66.68%
  Write Latency:       24486278     74028536      3.0x

The reason is the `zil_slog_bulk` variable. In `zil_lwb_write_open`,
if a zil block is greater than 768K, the priority of the write is
downgraded from sync to async. Increasing the value allows greater
throughput. To select a value for this PR, I ran an fio workload with
the following values for `zil_slog_bulk`:

    zil_slog_bulk    KiB/s
    1048576         422132
    2097152         478935
    4194304         533645
    8388608         623031
    12582912        827158
    16777216       1038359
    25165824       1142210
    33554432       1211472
    50331648       1292847
    67108864       1308506
    100663296      1306821
    134217728      1304998

At 64M, the results with a slog are now improved to parity with an
embedded zil:

sequential_writes 1m sync ios, 16 threads
  Write IOPS:               438         1288      2.9x
  Write Bandwidth:       448910      1319062      2.9x
  Write Latency:       36330970     12163408   -66.52%

sequential_writes 1m sync ios, 32 threads
  Write IOPS:               430         1290      3.0x
  Write Bandwidth:       441188      1321693      3.0x
  Write Latency:       74028536     24519698   -66.88%

None of the other tests in the performance suite (run with a zil or
slog) had a significant change, including the random_write_zil tests,
which use multiple datasets.
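
The threshold behaviour described above can be modelled roughly as follows.
This is a sketch, not the code in `zil_lwb_write_open`: the function name,
the enum, and the exact comparison are assumptions; only the 768K old
threshold and the 64M value come from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical priority classes for this sketch. */
typedef enum {
	PRIO_SYNC_WRITE,
	PRIO_ASYNC_WRITE
} prio_t;

#define	SLOG_BULK_OLD	(768ULL * 1024)		/* previous threshold */
#define	SLOG_BULK_NEW	(64ULL * 1024 * 1024)	/* 64M, as evaluated above */

/*
 * Writes to a slog larger than the bulk threshold are downgraded to
 * async priority; everything else keeps sync priority.
 */
static prio_t
slog_write_priority(bool on_slog, uint64_t bytes, uint64_t slog_bulk)
{
	if (!on_slog || bytes <= slog_bulk)
		return (PRIO_SYNC_WRITE);
	return (PRIO_ASYNC_WRITE);
}
```

Under this model a 1M sync write to a slog is downgraded with the 768K
threshold and keeps sync priority with the 64M threshold, which matches the
direction of the measurements above.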

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14378
2023-11-06 16:33:23 -08:00
Alexander Motin b76724ae47 FreeBSD: Improve taskq wrapper
 - Group tqent_task and tqent_timeout_task into a union.  They are
never used at the same time. This shrinks taskq_ent_t from 192 to 160 bytes.
 - Remove tqent_registered.  Use tqent_id != 0 instead.
 - Remove tqent_cancelled.  Use taskqueue pending counter instead.
 - Change tqent_type into uint_t.  We don't need to pack it any more.
 - Change tqent_rc into uint_t, matching refcount(9).
 - Take shared locks in taskq_lookup().
 - Call proper taskqueue_drain_timeout() for TIMEOUT_TASK in
taskq_cancel_id() and taskq_wait_id().
 - Switch from CK_LIST to regular LIST.
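
The first two items can be illustrated with a small sketch. The structures
below are hypothetical and much smaller than the real taskq_ent_t; they only
show how overlaying two mutually exclusive members saves space and how a
non-zero id can replace a separate "registered" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task types standing in for the taskqueue(9) structures. */
typedef struct task {
	void *t_link[2];
	void (*t_func)(void *);
	void *t_arg;
} task_t;

typedef struct timeout_task {
	task_t tt_task;
	uint64_t tt_deadline;
} timeout_task_t;

/* Before: both members are stored even though only one is ever used. */
typedef struct ent_separate {
	task_t e_task;
	timeout_task_t e_timeout_task;
	uint32_t e_id;
	bool e_registered;
} ent_separate_t;

/* After: the members share storage, and e_id != 0 means "registered". */
typedef struct ent_union {
	union {
		task_t e_task;
		timeout_task_t e_timeout_task;
	};
	uint32_t e_id;
} ent_union_t;

static bool
ent_registered(const ent_union_t *e)
{
	return (e->e_id != 0);
}

static_assert(sizeof (ent_union_t) < sizeof (ent_separate_t),
    "the union layout must be smaller");
```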

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15356
2023-11-06 16:33:18 -08:00
Martin Matuška 459c99ff23 Fix block cloning between unencrypted and encrypted datasets
Block cloning from an encrypted dataset into an unencrypted dataset
and vice versa is not possible. The current code did allow cloning
unencrypted files into an encrypted dataset causing a panic when
these were accessed. Block cloning between encrypted and encrypted
is currently supported on the same filesystem only.
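
The resulting policy can be summarised as a small predicate. This is a
sketch of the rules stated above, not the actual check in the cloning code;
the function name and parameters are hypothetical, and the
unencrypted-to-unencrypted case is assumed to be allowed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns whether cloning blocks from a source dataset into a
 * destination dataset is permitted under the rules described above.
 */
static bool
clone_allowed(bool src_encrypted, bool dst_encrypted, bool same_filesystem)
{
	/* Mixing encrypted and unencrypted datasets is never possible. */
	if (src_encrypted != dst_encrypted)
		return (false);

	/* Encrypted to encrypted: only within the same filesystem. */
	if (src_encrypted)
		return (same_filesystem);

	/* Unencrypted to unencrypted: assumed to be allowed. */
	return (true);
}
```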

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15464
Closes #15465
2023-11-06 10:40:50 -08:00
439 changed files with 14864 additions and 3606 deletions

4
.github/codeql-cpp.yml vendored Normal file
View File

@ -0,0 +1,4 @@
name: "Custom CodeQL Analysis"
queries:
- uses: ./.github/codeql/custom-queries/cpp/deprecatedFunctionUsage.ql

4
.github/codeql-python.yml vendored Normal file
View File

@ -0,0 +1,4 @@
name: "Custom CodeQL Analysis"
paths-ignore:
- tests

View File

@ -0,0 +1,59 @@
/**
* @name Deprecated function usage detection
* @description Detects functions whose usage is banned from the OpenZFS
* codebase due to QA concerns.
* @kind problem
* @severity error
* @id cpp/deprecated-function-usage
*/
import cpp
predicate isDeprecatedFunction(Function f) {
f.getName() = "strtok" or
f.getName() = "__xpg_basename" or
f.getName() = "basename" or
f.getName() = "dirname" or
f.getName() = "bcopy" or
f.getName() = "bcmp" or
f.getName() = "bzero" or
f.getName() = "asctime" or
f.getName() = "asctime_r" or
f.getName() = "gmtime" or
f.getName() = "localtime" or
f.getName() = "strncpy"
}
string getReplacementMessage(Function f) {
if f.getName() = "strtok" then
result = "Use strtok_r(3) instead!"
else if f.getName() = "__xpg_basename" then
result = "basename(3) is underspecified. Use zfs_basename() instead!"
else if f.getName() = "basename" then
result = "basename(3) is underspecified. Use zfs_basename() instead!"
else if f.getName() = "dirname" then
result = "dirname(3) is underspecified. Use zfs_dirnamelen() instead!"
else if f.getName() = "bcopy" then
result = "bcopy(3) is deprecated. Use memcpy(3)/memmove(3) instead!"
else if f.getName() = "bcmp" then
result = "bcmp(3) is deprecated. Use memcmp(3) instead!"
else if f.getName() = "bzero" then
result = "bzero(3) is deprecated. Use memset(3) instead!"
else if f.getName() = "asctime" then
result = "Use strftime(3) instead!"
else if f.getName() = "asctime_r" then
result = "Use strftime(3) instead!"
else if f.getName() = "gmtime" then
result = "gmtime(3) isn't thread-safe. Use gmtime_r(3) instead!"
else if f.getName() = "localtime" then
result = "localtime(3) isn't thread-safe. Use localtime_r(3) instead!"
else
result = "strncpy(3) is deprecated. Use strlcpy(3) instead!"
}
from FunctionCall fc, Function f
where
fc.getTarget() = f and
isDeprecatedFunction(f)
select fc, getReplacementMessage(f)

View File

@ -0,0 +1,4 @@
name: openzfs-cpp-queries
version: 0.0.0
libraryPathDependencies: codeql-cpp
suites: openzfs-cpp-suite

View File

@ -4,44 +4,54 @@
```mermaid
flowchart TB
subgraph CleanUp and Summary
Part1-20.04-->CleanUp+nice+Summary
Part2-20.04-->CleanUp+nice+Summary
PartN-20.04-->CleanUp+nice+Summary
Part1-22.04-->CleanUp+nice+Summary
Part2-22.04-->CleanUp+nice+Summary
PartN-22.04-->CleanUp+nice+Summary
CleanUp+Summary
end
subgraph Functional Testings
sanity-checks-20.04
zloop-checks-20.04
functional-testing-20.04-->Part1-20.04
functional-testing-20.04-->Part2-20.04
functional-testing-20.04-->PartN-20.04
functional-testing-20.04-->Part3-20.04
functional-testing-20.04-->Part4-20.04
functional-testing-22.04-->Part1-22.04
functional-testing-22.04-->Part2-22.04
functional-testing-22.04-->PartN-22.04
end
subgraph Sanity and zloop Testings
sanity-checks-20.04-->functional-testing-20.04
sanity-checks-22.04-->functional-testing-22.04
zloop-checks-20.04-->functional
zloop-checks-22.04-->functional
functional-testing-22.04-->Part3-22.04
functional-testing-22.04-->Part4-22.04
sanity-checks-22.04
zloop-checks-22.04
end
subgraph Code Checking + Building
Build-Ubuntu-20.04
codeql.yml
checkstyle.yml
Build-Ubuntu-20.04-->sanity-checks-20.04
Build-Ubuntu-22.04-->sanity-checks-22.04
Build-Ubuntu-20.04-->zloop-checks-20.04
Build-Ubuntu-22.04-->zloop-checks-22.04
Build-Ubuntu-22.04
end
Build-Ubuntu-20.04-->sanity-checks-20.04
Build-Ubuntu-20.04-->zloop-checks-20.04
Build-Ubuntu-20.04-->functional-testing-20.04
Build-Ubuntu-22.04-->sanity-checks-22.04
Build-Ubuntu-22.04-->zloop-checks-22.04
Build-Ubuntu-22.04-->functional-testing-22.04
sanity-checks-20.04-->CleanUp+Summary
Part1-20.04-->CleanUp+Summary
Part2-20.04-->CleanUp+Summary
Part3-20.04-->CleanUp+Summary
Part4-20.04-->CleanUp+Summary
Part1-22.04-->CleanUp+Summary
Part2-22.04-->CleanUp+Summary
Part3-22.04-->CleanUp+Summary
Part4-22.04-->CleanUp+Summary
sanity-checks-22.04-->CleanUp+Summary
```
1) build zfs modules for Ubuntu 20.04 and 22.04 (~15m)
2) 2x zloop test (~10m) + 2x sanity test (~25m)
3) functional testings in parts 1..5 (each ~1h)
3) 4x functional testings in parts 1..4 (each ~1h)
4) cleanup and create summary
- content of summary depends on the results of the steps

View File

@ -8,7 +8,7 @@ jobs:
checkstyle:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
@ -52,7 +52,7 @@ jobs:
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
find -name *.abi | tar -cf abi_files.tar -T -
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
if: failure() && steps.CheckABI.outcome == 'failure'
with:
name: New ABI files (use only if you're sure about interface changes)

View File

@ -24,11 +24,12 @@ jobs:
echo "MAKEFLAGS=-j$(nproc)" >> $GITHUB_ENV
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
config-file: .github/codeql-${{ matrix.language }}.yml
languages: ${{ matrix.language }}
- name: Autobuild

View File

@ -87,7 +87,7 @@ function summarize_f() {
output "\n## $headline\n"
rm -rf testfiles
for i in $(seq 1 $FUNCTIONAL_PARTS); do
tarfile="$2/part$i.tar"
tarfile="$2-part$i/part$i.tar"
check_tarfile "$tarfile"
check_logfile "testfiles/log"
done

View File

@ -55,29 +55,24 @@ function mod_install() {
cat /proc/spl/kstat/zfs/chksum_bench
echo "::endgroup::"
echo "::group::Reclaim and report disk space"
# remove 4GiB of images
sudo systemd-run docker system prune --force --all --volumes
echo "::group::Optimize storage for ZFS testings"
# remove swap and umount fast storage
# 89GiB -> rootfs + bootfs with ~80MB/s -> don't care
# 64GiB -> /mnt with 420MB/s -> new testing ssd
sudo swapoff -a
# remove unused software
sudo systemd-run --wait rm -rf \
"$AGENT_TOOLSDIRECTORY" \
/opt/* \
/usr/local/* \
/usr/share/az* \
/usr/share/dotnet \
/usr/share/gradle* \
/usr/share/miniconda \
/usr/share/swift \
/var/lib/gems \
/var/lib/mysql \
/var/lib/snapd
# trim the cleaned space
sudo fstrim /
# this one is fast and mounted @ /mnt
# -> we reformat with ext4 + move it to /var/tmp
DEV="/dev/disk/azure/resource-part1"
sudo umount /mnt
sudo mkfs.ext4 -O ^has_journal -F $DEV
sudo mount -o noatime,barrier=0 $DEV /var/tmp
sudo chmod 1777 /var/tmp
# disk usage afterwards
df -h /
sudo df -h /
sudo df -h /var/tmp
sudo fstrim -a
echo "::endgroup::"
}

View File

@ -13,10 +13,10 @@ jobs:
zloop:
runs-on: ubuntu-${{ inputs.os }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v4
with:
name: modules-${{ inputs.os }}
- name: Install modules
@ -34,7 +34,7 @@ jobs:
if: failure()
run: |
sudo chmod +r -R /var/tmp/zloop/
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Zpool-logs-${{ inputs.os }}
@ -43,7 +43,7 @@ jobs:
!/var/tmp/zloop/*/vdev/
retention-days: 14
if-no-files-found: ignore
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Zpool-files-${{ inputs.os }}
@ -55,10 +55,10 @@ jobs:
sanity:
runs-on: ubuntu-${{ inputs.os }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v4
with:
name: modules-${{ inputs.os }}
- name: Install modules
@ -77,7 +77,7 @@ jobs:
RESPATH="/var/tmp/test_results"
mv -f $RESPATH/current $RESPATH/testfiles
tar cf $RESPATH/sanity.tar -h -C $RESPATH testfiles
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
if: success() || failure()
with:
name: Logs-${{ inputs.os }}-sanity
@ -91,10 +91,10 @@ jobs:
matrix:
tests: [ part1, part2, part3, part4 ]
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v4
with:
name: modules-${{ inputs.os }}
- name: Install modules
@ -116,9 +116,9 @@ jobs:
RESPATH="/var/tmp/test_results"
mv -f $RESPATH/current $RESPATH/testfiles
tar cf $RESPATH/${{ matrix.tests }}.tar -h -C $RESPATH testfiles
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
if: success() || failure()
with:
name: Logs-${{ inputs.os }}-functional
name: Logs-${{ inputs.os }}-functional-${{ matrix.tests }}
path: /var/tmp/test_results/${{ matrix.tests }}.tar
if-no-files-found: ignore

View File

@ -14,14 +14,14 @@ jobs:
os: [20.04, 22.04]
runs-on: ubuntu-${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build modules
run: .github/workflows/scripts/setup-dependencies.sh build
- name: Prepare modules upload
run: tar czf modules-${{ matrix.os }}.tgz *.deb .github tests/test-runner tests/ImageOS.txt
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
with:
name: modules-${{ matrix.os }}
path: modules-${{ matrix.os }}.tgz
@ -44,7 +44,7 @@ jobs:
runs-on: ubuntu-22.04
needs: testings
steps:
- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v4
- name: Generating summary
run: |
tar xzf modules-22.04/modules-22.04.tgz .github tests
@ -58,7 +58,7 @@ jobs:
run: .github/workflows/scripts/generate-summary.sh 3
- name: Summary for errors #4
run: .github/workflows/scripts/generate-summary.sh 4
- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
with:
name: Summary Files
path: Summary/

1
.gitignore vendored
View File

@ -83,6 +83,7 @@
modules.order
Makefile
Makefile.in
changelog
*.patch
*.orig
*.tmp

View File

@ -30,6 +30,7 @@ Andreas Dilger <adilger@dilger.ca>
Andrew Walker <awalker@ixsystems.com>
Benedikt Neuffer <github@itfriend.de>
Chengfei Zhu <chengfeix.zhu@intel.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chris Lindee <chris.lindee+github@gmail.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
@ -43,6 +44,7 @@ Glenn Washburn <development@efficientek.com>
Gordan Bobic <gordan.bobic@gmail.com>
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
hedong zhang <h_d_zhang@163.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
InsanePrawn <Insane.Prawny@gmail.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
@ -57,6 +59,7 @@ KernelOfTruth <kerneloftruth@gmail.com>
Liu Hua <liu.hua130@zte.com.cn>
Liu Qing <winglq@gmail.com>
loli10K <ezomori.nozomu@gmail.com>
Mart Frauenlob <allkind@fastest.cc>
Matthias Blankertz <matthias@blankertz.org>
Michael Gmelin <grembo@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
@ -73,6 +76,12 @@ WHR <msl0000023508@gmail.com>
Yanping Gao <yanping.gao@xtaotech.com>
Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>
# Commits from strange places, long ago
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c>
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@fedora-17-amd64.(none)>
@ -89,6 +98,7 @@ Alek Pinchuk <apinchuk@axcient.com> <alek-p@users.noreply.github.com>
Alexander Lobakin <alobakin@pm.me> <solbjorn@users.noreply.github.com>
Alexey Smirnoff <fling@member.fsf.org> <fling-@users.noreply.github.com>
Allen Holl <allen.m.holl@gmail.com> <65494904+allen-4@users.noreply.github.com>
Alphan Yılmaz <alphanyilmaz@gmail.com> <a1ea321@users.noreply.github.com>
Ameer Hamza <ahamza@ixsystems.com> <106930537+ixhamza@users.noreply.github.com>
Andrew J. Hesford <ajh@sideband.org> <48421688+ahesford@users.noreply.github.com>>
Andrew Sun <me@andrewsun.com> <as-com@users.noreply.github.com>
@ -96,18 +106,22 @@ Aron Xu <happyaron.xu@gmail.com> <happyaron@users.noreply.github.com>
Arun KV <arun.kv@datacore.com> <65647132+arun-kv@users.noreply.github.com>
Ben Wolsieffer <benwolsieffer@gmail.com> <lopsided98@users.noreply.github.com>
bernie1995 <bernie.pikes@gmail.com> <42413912+bernie1995@users.noreply.github.com>
Bojan Novković <bnovkov@FreeBSD.org> <72801811+bnovkov@users.noreply.github.com>
Boris Protopopov <boris.protopopov@actifio.com> <bprotopopov@users.noreply.github.com>
Brad Forschinger <github@bnjf.id.au> <bnjf@users.noreply.github.com>
Brandon Thetford <brandon@dodecatec.com> <dodexahedron@users.noreply.github.com>
buzzingwires <buzzingwires@outlook.com> <131118055+buzzingwires@users.noreply.github.com>
Cedric Maunoury <cedric.maunoury@gmail.com> <38213715+cedricmaunoury@users.noreply.github.com>
Charles Suh <charles.suh@gmail.com> <charlessuh@users.noreply.github.com>
Chris Peredun <chris.peredun@ixsystems.com> <126915832+chrisperedun@users.noreply.github.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com> <35844628+dacianstremtan@users.noreply.github.com>
Damian Szuberski <szuberskidamian@gmail.com> <30863496+szubersk@users.noreply.github.com>
Daniel Hiepler <d-git@coderdu.de> <32984777+heeplr@users.noreply.github.com>
Daniel Kobras <d.kobras@science-computing.de> <sckobras@users.noreply.github.com>
Daniel Reichelt <hacking@nachtgeist.net> <nachtgeist@users.noreply.github.com>
David Quigley <david.quigley@intel.com> <dpquigl@users.noreply.github.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com> <31087738+dennisfriedrichsen@users.noreply.github.com>
Dex Wood <slash2314@gmail.com> <slash2314@users.noreply.github.com>
DHE <git@dehacked.net> <DeHackEd@users.noreply.github.com>
Dmitri John Ledkov <dimitri.ledkov@canonical.com> <19779+xnox@users.noreply.github.com>
Dries Michiels <driesm.michiels@gmail.com> <32487486+driesmp@users.noreply.github.com>
@ -128,6 +142,7 @@ Harry Mallon <hjmallon@gmail.com> <1816667+hjmallon@users.noreply.github.com>
Hiếu Lê <leorize+oss@disroot.org> <alaviss@users.noreply.github.com>
Jake Howard <git@theorangeone.net> <RealOrangeOne@users.noreply.github.com>
James Cowgill <james.cowgill@mips.com> <jcowgill@users.noreply.github.com>
Jaron Kent-Dobias <jaron@kent-dobias.com> <kentdobias@users.noreply.github.com>
Jason King <jason.king@joyent.com> <jasonbking@users.noreply.github.com>
Jeff Dike <jdike@akamai.com> <52420226+jdike@users.noreply.github.com>
Jitendra Patidar <jitendra.patidar@nutanix.com> <53164267+jsai20@users.noreply.github.com>
@ -137,7 +152,9 @@ John L. Hammond <john.hammond@intel.com> <35266395+jhammond-intel@users.noreply.
John-Mark Gurney <jmg@funkthat.com> <jmgurney@users.noreply.github.com>
John Ramsden <johnramsden@riseup.net> <johnramsden@users.noreply.github.com>
Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com>
Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com>
Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com>
Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com>
Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com>
Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com>
Krzysztof Piecuch <piecuch@kpiecuch.pl> <3964215+pikrzysztof@users.noreply.github.com>
@ -148,9 +165,11 @@ Lorenz Hüdepohl <dev@stellardeath.org> <lhuedepohl@users.noreply.github.com>
Luís Henriques <henrix@camandro.org> <73643340+lumigch@users.noreply.github.com>
Marcin Skarbek <git@skarbek.name> <mskarbek@users.noreply.github.com>
Matt Fiddaman <github@m.fiddaman.uk> <81489167+matt-fidd@users.noreply.github.com>
Maxim Filimonov <che@bein.link> <part1zano@users.noreply.github.com>
Max Zettlmeißl <max@zettlmeissl.de> <6818198+maxz@users.noreply.github.com>
Michael Niewöhner <foss@mniewoehner.de> <c0d3z3r0@users.noreply.github.com>
Michael Zhivich <mzhivich@akamai.com> <33133421+mzhivich@users.noreply.github.com>
MigeljanImeri <ImeriMigel@gmail.com> <78048439+MigeljanImeri@users.noreply.github.com>
Mo Zhou <cdluminate@gmail.com> <5723047+cdluminate@users.noreply.github.com>
Nick Mattis <nickm970@gmail.com> <nmattis@users.noreply.github.com>
omni <omni+vagant@hack.org> <79493359+omnivagant@users.noreply.github.com>
@ -164,6 +183,7 @@ Ping Huang <huangping@smartx.com> <101400146+hpingfs@users.noreply.github.com>
Piotr P. Stefaniak <pstef@freebsd.org> <pstef@users.noreply.github.com>
Richard Allen <belperite@gmail.com> <33836503+belperite@users.noreply.github.com>
Rich Ercolani <rincebrain@gmail.com> <214141+rincebrain@users.noreply.github.com>
Rick Macklem <rmacklem@uoguelph.ca> <64620010+rmacklem@users.noreply.github.com>
Rob Wing <rob.wing@klarasystems.com> <98866084+rob-wing@users.noreply.github.com>
Roman Strashkin <roman.strashkin@nexenta.com> <Ramzec@users.noreply.github.com>
Ryan Hirasaki <ryanhirasaki@gmail.com> <4690732+RyanHir@users.noreply.github.com>
@ -174,13 +194,17 @@ Scott Colby <scott@scolby.com> <scolby33@users.noreply.github.com>
Sean Eric Fagan <kithrup@mac.com> <kithrup@users.noreply.github.com>
Spencer Kinny <spencerkinny1995@gmail.com> <30333052+Spencer-Kinny@users.noreply.github.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> <75025422+nssrikanth@users.noreply.github.com>
Stefan Lendl <s.lendl@proxmox.com> <1321542+stfl@users.noreply.github.com>
Thomas Bertschinger <bertschinger@lanl.gov> <101425190+bertschinger@users.noreply.github.com>
Thomas Geppert <geppi@digitx.de> <geppi@users.noreply.github.com>
Tim Crawford <tcrawford@datto.com> <crawfxrd@users.noreply.github.com>
Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Tom Matthews <tom@axiom-partners.com> <tomtastic@users.noreply.github.com>
Tony Perkins <tperkins@datto.com> <62951051+tony-zfs@users.noreply.github.com>
Torsten Wörtwein <twoertwein@gmail.com> <twoertwein@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com> <TulsiJain@users.noreply.github.com>
Václav Skála <skala@vshosting.cz> <33496485+vaclavskala@users.noreply.github.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com> <88050553+vaibhav-delphix@users.noreply.github.com>
Violet Purcell <vimproved@inventati.org> <66446404+vimproved@users.noreply.github.com>
Vipin Kumar Verma <vipin.verma@hpe.com> <75025470+vermavipinkumar@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com> <Blub@users.noreply.github.com>

48
AUTHORS
View File

@ -46,6 +46,7 @@ CONTRIBUTORS:
Alex Zhuravlev <alexey.zhuravlev@intel.com>
Allan Jude <allanjude@freebsd.org>
Allen Holl <allen.m.holl@gmail.com>
Alphan Yılmaz <alphanyilmaz@gmail.com>
alteriks <alteriks@gmail.com>
Alyssa Ross <hi@alyssa.is>
Ameer Hamza <ahamza@ixsystems.com>
@ -88,15 +89,18 @@ CONTRIBUTORS:
Bassu <bassu@phi9.com>
Ben Allen <bsallen@alcf.anl.gov>
Ben Cordero <bencord0@condi.me>
Benda Xu <orv@debian.org>
Benedikt Neuffer <github@itfriend.de>
Benjamin Albrecht <git@albrecht.io>
Benjamin Gentil <benjgentil.pro@gmail.com>
Benjamin Sherman <benjamin@holyarmy.org>
Ben McGough <bmcgough@fredhutch.org>
Ben Rubson <ben.rubson@gmail.com>
Ben Wolsieffer <benwolsieffer@gmail.com>
bernie1995 <bernie.pikes@gmail.com>
Bill McGonigle <bill-github.com-public1@bfccomputing.com>
Bill Pijewski <wdp@joyent.com>
Bojan Novković <bnovkov@FreeBSD.org>
Boris Protopopov <boris.protopopov@nexenta.com>
Brad Forschinger <github@bnjf.id.au>
Brad Lewis <brad.lewis@delphix.com>
@ -111,6 +115,7 @@ CONTRIBUTORS:
bzzz77 <bzzz.tomas@gmail.com>
cable2999 <cable2999@users.noreply.github.com>
Caleb James DeLisle <calebdelisle@lavabit.com>
Cameron Harr <harr1@llnl.gov>
Cao Xuewen <cao.xuewen@zte.com.cn>
Carlo Landmeter <clandmeter@gmail.com>
Carlos Alberto Lopez Perez <clopez@igalia.com>
@ -120,12 +125,15 @@ CONTRIBUTORS:
Chen Can <chen.can2@zte.com.cn>
Chengfei Zhu <chengfeix.zhu@intel.com>
Chen Haiquan <oc@yunify.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chip Parker <aparker@enthought.com>
Chris Burroughs <chris.burroughs@gmail.com>
Chris Davidson <christopher.davidson@gmail.com>
Chris Dunlap <cdunlap@llnl.gov>
Chris Dunlop <chris@onthe.net.au>
Chris Lindee <chris.lindee+github@gmail.com>
Chris McDonough <chrism@plope.com>
Chris Peredun <chris.peredun@ixsystems.com>
Chris Siden <chris.siden@delphix.com>
Chris Siebenmann <cks.github@cs.toronto.edu>
Christer Ekholm <che@chrekh.se>
@ -144,6 +152,7 @@ CONTRIBUTORS:
Clint Armstrong <clint@clintarmstrong.net>
Coleman Kane <ckane@colemankane.org>
Colin Ian King <colin.king@canonical.com>
Colin Percival <cperciva@tarsnap.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
Craig Loomis <cloomis@astro.princeton.edu>
@ -156,10 +165,12 @@ CONTRIBUTORS:
Damiano Albani <damiano.albani@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Damian Wojsław <damian@wojslaw.pl>
Daniel Berlin <dberlin@dberlin.org>
Daniel Hiepler <d-git@coderdu.de>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Kobras <d.kobras@science-computing.de>
Daniel Kolesa <daniel@octaforge.org>
Daniel Perry <dtperry@amazon.com>
Daniel Reichelt <hacking@nachtgeist.net>
Daniel Stevenson <bot@dstev.net>
Daniel Verite <daniel@verite.pro>
@ -176,8 +187,11 @@ CONTRIBUTORS:
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
D. Ebdrup <debdrup@freebsd.org>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Derek Schrock <dereks@lifeofadishwasher.com>
Dex Wood <slash2314@gmail.com>
DHE <git@dehacked.net>
Didier Roche <didrocks@ubuntu.com>
Dimitri John Ledkov <xnox@ubuntu.com>
@ -235,9 +249,12 @@ CONTRIBUTORS:
Gionatan Danti <g.danti@assyoma.it>
Giuseppe Di Natale <guss80@gmail.com>
Glenn Washburn <development@efficientek.com>
glibg10b <glibg10b@users.noreply.github.com>
gofaster <felix.gofaster@gmail.com>
Gordan Bobic <gordan@redsleeve.org>
Gordon Bergling <gbergling@googlemail.com>
Gordon Ross <gwr@nexenta.com>
Gordon Tetlow <gordon@freebsd.org>
Graham Christensen <graham@grahamc.com>
Graham Perrin <grahamperrin@gmail.com>
Gregor Kopka <gregor@kopka.net>
@ -265,6 +282,7 @@ CONTRIBUTORS:
Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com>
ilbsmart <wgqimut@gmail.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
illiliti <illiliti@protonmail.com>
ilovezfs <ilovezfs@icloud.com>
InsanePrawn <Insane.Prawny@gmail.com>
@ -280,9 +298,11 @@ CONTRIBUTORS:
Jan Engelhardt <jengelh@inai.de>
Jan Kryl <jan.kryl@nexenta.com>
Jan Sanislo <oystr@cs.washington.edu>
Jaron Kent-Dobias <jaron@kent-dobias.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
Jason King <jason.brian.king@gmail.com>
Jason Lee <jasonlee@lanl.gov>
Jason Zaman <jasonzaman@gmail.com>
Javen Wu <wu.javen@gmail.com>
Jean-Baptiste Lallement <jean-baptiste@ubuntu.com>
@ -313,6 +333,7 @@ CONTRIBUTORS:
Jonathon Fernyhough <jonathon@m2x.dev>
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Jose Luis Duran <jlduran@gmail.com>
Josh Soref <jsoref@users.noreply.github.com>
Joshua M. Clulow <josh@sysmgr.org>
José Luis Salvador Rufo <salvador.joseluis@gmail.com>
@ -336,8 +357,10 @@ CONTRIBUTORS:
Kash Pande <kash@tripleback.net>
Kay Pedersen <christianpe96@gmail.com>
Keith M Wesolowski <wesolows@foobazco.org>
Kent Ross <k@mad.cash>
KernelOfTruth <kerneloftruth@gmail.com>
Kevin Bowling <kevin.bowling@kev009.com>
Kevin Greene <kevin.greene@delphix.com>
Kevin Jin <lostking2008@hotmail.com>
Kevin P. Fleming <kevin@km6g.us>
Kevin Tanguy <kevin.tanguy@ovh.net>
@ -389,8 +412,10 @@ CONTRIBUTORS:
Mark Shellenbaum <Mark.Shellenbaum@Oracle.COM>
marku89 <mar42@kola.li>
Mark Wright <markwright@internode.on.net>
Mart Frauenlob <allkind@fastest.cc>
Martin Matuska <mm@FreeBSD.org>
Martin Rüegg <martin.rueegg@metaworx.ch>
Martin Wagner <martin.wagner.dev@gmail.com>
Massimo Maggi <me@massimo-maggi.eu>
Mateusz Guzik <mjguzik@gmail.com>
Mateusz Piotrowski <0mp@FreeBSD.org>
@ -405,6 +430,7 @@ CONTRIBUTORS:
Matus Kral <matuskral@me.com>
Mauricio Faria de Oliveira <mfo@canonical.com>
Max Grossman <max.grossman@delphix.com>
Maxim Filimonov <che@bein.link>
Maximilian Mehnert <maximilian.mehnert@gmx.de>
Max Zettlmeißl <max@zettlmeissl.de>
Md Islam <mdnahian@outlook.com>
@ -417,6 +443,7 @@ CONTRIBUTORS:
Michael Niewöhner <foss@mniewoehner.de>
Michael Zhivich <mzhivich@akamai.com>
Michal Vasilek <michal@vasilek.cz>
MigeljanImeri <ImeriMigel@gmail.com>
Mike Gerdts <mike.gerdts@joyent.com>
Mike Harsch <mike@harschsystems.com>
Mike Leddy <mike.leddy@gmail.com>
@ -448,6 +475,7 @@ CONTRIBUTORS:
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Olivier Certner <olce.freebsd@certner.fr>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
omni <omni+vagant@hack.org>
Orivej Desh <orivej@gmx.fr>
@ -466,6 +494,7 @@ CONTRIBUTORS:
Peng <peng.hse@xtaotech.com>
Peter Ashford <ashford@accs.com>
Peter Dave Hello <hsu@peterdavehello.org>
Peter Doherty <peterd@acranox.org>
Peter Levine <plevine457@gmail.com>
Peter Wirdemo <peter.wirdemo@gmail.com>
Petros Koutoupis <petros@petroskoutoupis.com>
@ -479,6 +508,8 @@ CONTRIBUTORS:
Prasad Joshi <prasadjoshi124@gmail.com>
privb0x23 <privb0x23@users.noreply.github.com>
P.SCH <p88@yahoo.com>
Qiuhao Chen <chenqiuhao1997@gmail.com>
Quartz <yyhran@163.com>
Quentin Zdanis <zdanisq@gmail.com>
Rafael Kitover <rkitover@gmail.com>
RageLtMan <sempervictus@users.noreply.github.com>
@ -491,11 +522,15 @@ CONTRIBUTORS:
Riccardo Schirone <rschirone91@gmail.com>
Richard Allen <belperite@gmail.com>
Richard Elling <Richard.Elling@RichardElling.com>
Richard Kojedzinszky <richard@kojedz.in>
Richard Laager <rlaager@wiktel.com>
Richard Lowe <richlowe@richlowe.net>
Richard Sharpe <rsharpe@samba.org>
Richard Yao <ryao@gentoo.org>
Rich Ercolani <rincebrain@gmail.com>
Rick Macklem <rmacklem@uoguelph.ca>
rilysh <nightquick@proton.me>
Robert Evans <evansr@google.com>
Robert Novak <sailnfool@gmail.com>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
@ -505,11 +540,14 @@ CONTRIBUTORS:
Roman Strashkin <roman.strashkin@nexenta.com>
Ross Williams <ross@ross-williams.net>
Ruben Kerkhof <ruben@rubenkerkhof.com>
Ryan <errornointernet@envs.net>
Ryan Hirasaki <ryanhirasaki@gmail.com>
Ryan Lahfa <masterancpp@gmail.com>
Ryan Libby <rlibby@FreeBSD.org>
Ryan Moeller <freqlabs@FreeBSD.org>
Sam Atkinson <samatk@amazon.com>
Sam Hathaway <github.com@munkynet.org>
Sam James <sam@gentoo.org>
Sam Lunt <samuel.j.lunt@gmail.com>
Samuel VERSCHELDE <stormi-github@ylix.fr>
Samuel Wycliffe <samuelwycliffe@gmail.com>
@ -527,9 +565,12 @@ CONTRIBUTORS:
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
Seth Forshee <seth.forshee@canonical.com>
Seth Troisi <sethtroisi@google.com>
Shaan Nobee <sniper111@gmail.com>
Shampavman <sham.pavman@nexenta.com>
Shaun Tancheff <shaun@aeonazure.com>
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
Shen Yan <shenyanxxxy@qq.com>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
@ -537,6 +578,7 @@ CONTRIBUTORS:
Spencer Kinny <spencerkinny1995@gmail.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Stanislav Seletskiy <s.seletskiy@gmail.com>
Stefan Lendl <s.lendl@proxmox.com>
Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Stephen Blinick <stephen.blinick@delphix.com>
sterlingjensen <sterlingjensen@users.noreply.github.com>
@ -557,6 +599,7 @@ CONTRIBUTORS:
Teodor Spæren <teodor_spaeren@riseup.net>
TerraTech <TerraTech@users.noreply.github.com>
Thijs Cramer <thijs.cramer@gmail.com>
Thomas Bertschinger <bertschinger@lanl.gov>
Thomas Geppert <geppi@digitx.de>
Thomas Lamprecht <guggentom@hotmail.de>
Till Maas <opensource@till.name>
@ -569,6 +612,7 @@ CONTRIBUTORS:
Tim Schumacher <timschumi@gmx.de>
Tino Reichardt <milky-zfs@mcmilk.de>
Tobin Harding <me@tobin.cc>
Todd Seidelmann <seidelma@users.noreply.github.com>
Tom Caputi <tcaputi@datto.com>
Tom Matthews <tom@axiom-partners.com>
Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
@ -586,6 +630,7 @@ CONTRIBUTORS:
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>
Umer Saleem <usaleem@ixsystems.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Valmiky Arquissandas <kayvlim@gmail.com>
Val Packett <val@packett.cool>
Vince van Oosten <techhazard@codeforyouand.me>
@ -614,10 +659,13 @@ CONTRIBUTORS:
yuina822 <ayuichi@club.kyutech.ac.jp>
YunQiang Su <syq@debian.org>
Yuri Pankov <yuri.pankov@gmail.com>
Yuxin Wang <yuxinwang9999@gmail.com>
Yuxuan Shui <yshuiv7@gmail.com>
Zachary Bedell <zac@thebedells.org>
Zach Dykstra <dykstra.zachary@gmail.com>
zgock <zgock@nuc.base.zgock-lab.net>
Zhao Yongming <zym@apache.org>
Zhenlei Huang <zlei@FreeBSD.org>
Zhu Chuang <chuang@melty.land>
Érico Nogueira <erico.erc@gmail.com>
Đoàn Trần Công Danh <congdanhqx@gmail.com>

META

@ -1,10 +1,10 @@
 Meta: 1
 Name: zfs
 Branch: 1.0
-Version: 2.2.0
+Version: 2.2.6
 Release: 1
 Release-Tags: relext
 License: CDDL
 Author: OpenZFS
-Linux-Maximum: 6.5
+Linux-Maximum: 6.10
 Linux-Minimum: 3.10


@ -32,4 +32,4 @@ For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
 # Supported Kernels
 * The `META` file contains the officially recognized supported Linux kernel versions.
-* Supported FreeBSD versions are any supported branches and releases starting from 12.2-RELEASE.
+* Supported FreeBSD versions are any supported branches and releases starting from 12.4-RELEASE.


@ -711,7 +711,7 @@ def section_archits(kstats_dict):
 pd_total = int(arc_stats['prefetch_data_hits']) +\
 int(arc_stats['prefetch_data_iohits']) +\
 int(arc_stats['prefetch_data_misses'])
-prt_2('ARC prefetch metadata accesses:', f_perc(pd_total, all_accesses),
+prt_2('ARC prefetch data accesses:', f_perc(pd_total, all_accesses),
 f_hits(pd_total))
 pd_todo = (('Prefetch data hits:', arc_stats['prefetch_data_hits']),
 ('Prefetch data I/O hits:', arc_stats['prefetch_data_iohits']),
@ -793,18 +793,27 @@ def section_dmu(kstats_dict):
zfetch_stats = isolate_section('zfetchstats', kstats_dict)
zfetch_access_total = int(zfetch_stats['hits'])+int(zfetch_stats['misses'])
zfetch_access_total = int(zfetch_stats['hits']) +\
int(zfetch_stats['future']) + int(zfetch_stats['stride']) +\
int(zfetch_stats['past']) + int(zfetch_stats['misses'])
prt_1('DMU predictive prefetcher calls:', f_hits(zfetch_access_total))
prt_i2('Stream hits:',
f_perc(zfetch_stats['hits'], zfetch_access_total),
f_hits(zfetch_stats['hits']))
future = int(zfetch_stats['future']) + int(zfetch_stats['stride'])
prt_i2('Hits ahead of stream:', f_perc(future, zfetch_access_total),
f_hits(future))
prt_i2('Hits behind stream:',
f_perc(zfetch_stats['past'], zfetch_access_total),
f_hits(zfetch_stats['past']))
prt_i2('Stream misses:',
f_perc(zfetch_stats['misses'], zfetch_access_total),
f_hits(zfetch_stats['misses']))
prt_i2('Streams limit reached:',
f_perc(zfetch_stats['max_streams'], zfetch_stats['misses']),
f_hits(zfetch_stats['max_streams']))
prt_i1('Stream strides:', f_hits(zfetch_stats['stride']))
prt_i1('Prefetches issued', f_hits(zfetch_stats['io_issued']))
print()
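
The revised `section_dmu` accounting in this hunk counts every prefetcher call as one of five outcomes (`hits`, `future`, `stride`, `past`, `misses`) and reports "ahead of stream" as `future + stride`. A minimal standalone sketch of that arithmetic follows. It assumes a plain dict in place of the kstat section; the helper name `zfetch_breakdown` and the sample counter values are hypothetical and not part of the patch.

```python
def zfetch_breakdown(stats):
    """Return (total, rows) where rows maps a label to (count, percent)."""
    hits = int(stats['hits'])
    # "Ahead of stream" combines the future and stride counters.
    ahead = int(stats['future']) + int(stats['stride'])
    past = int(stats['past'])
    misses = int(stats['misses'])
    total = hits + ahead + past + misses

    def perc(value):
        # Guard against an idle prefetcher (no calls recorded yet).
        return 100.0 * value / total if total else 0.0

    rows = {
        'Stream hits': (hits, perc(hits)),
        'Hits ahead of stream': (ahead, perc(ahead)),
        'Hits behind stream': (past, perc(past)),
        'Stream misses': (misses, perc(misses)),
    }
    return total, rows


# Hypothetical counters, given as strings the way kstat values are read.
sample = {'hits': '60', 'future': '10', 'stride': '5',
          'past': '5', 'misses': '20'}
total, rows = zfetch_breakdown(sample)
for label, (count, percent) in rows.items():
    print(f'{label}: {percent:.1f}% ({count})')
```

Because the four rows partition the total, their percentages add up to 100, which was not the case when the total was only `hits + misses`.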


@ -157,6 +157,16 @@ cols = {
"free": [5, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
"ztotal": [6, 1000, "zfetch total prefetcher calls per second"],
"zhits": [5, 1000, "zfetch stream hits per second"],
"zahead": [6, 1000, "zfetch hits ahead of streams per second"],
"zpast": [5, 1000, "zfetch hits behind streams per second"],
"zmisses": [7, 1000, "zfetch stream misses per second"],
"zmax": [4, 1000, "zfetch limit reached per second"],
"zfuture": [7, 1000, "zfetch stream future per second"],
"zstride": [7, 1000, "zfetch stream strides per second"],
"zissued": [7, 1000, "zfetch prefetches issued per second"],
"zactive": [7, 1000, "zfetch prefetches active per second"],
}
v = {}
@ -164,6 +174,8 @@ hdr = ["time", "read", "ddread", "ddh%", "dmread", "dmh%", "pread", "ph%",
"size", "c", "avail"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "unc", "eskip", "mtxmis",
"dread", "pread", "read"]
zhdr = ["time", "ztotal", "zhits", "zahead", "zpast", "zmisses", "zmax",
"zfuture", "zstride", "zissued", "zactive"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output
@ -188,6 +200,8 @@ if sys.platform.startswith('freebsd'):
k = [ctl for ctl in sysctl.filter('kstat.zfs.misc.arcstats')
if ctl.type != sysctl.CTLTYPE_NODE]
k += [ctl for ctl in sysctl.filter('kstat.zfs.misc.zfetchstats')
if ctl.type != sysctl.CTLTYPE_NODE]
if not k:
sys.exit(1)
@ -199,19 +213,28 @@ if sys.platform.startswith('freebsd'):
continue
name, value = s.name, s.value
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
if "arcstats" in name:
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
else:
kstat["zfetch_" + name[27:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
k1 = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
k2 = ["zfetch_" + line.strip() for line in
open('/proc/spl/kstat/zfs/zfetchstats')]
if k1 is None or k2 is None:
sys.exit(1)
del k[0:2]
del k1[0:2]
del k2[0:2]
k = k1 + k2
kstat = {}
for s in k:
@ -239,6 +262,7 @@ def usage():
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
sys.stderr.write("\t -z : Print zfetch stats\n")
sys.stderr.write("\t -f : Specify specific fields to print (see -v)\n")
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
@ -357,6 +381,7 @@ def init():
global count
global hdr
global xhdr
global zhdr
global opfile
global sep
global out
@ -368,15 +393,17 @@ def init():
xflag = False
hflag = False
vflag = False
zflag = False
i = 1
try:
opts, args = getopt.getopt(
sys.argv[1:],
"axo:hvs:f:p",
"axzo:hvs:f:p",
[
"all",
"extended",
"zfetch",
"outfile",
"help",
"verbose",
@ -410,13 +437,15 @@ def init():
i += 1
if opt in ('-p', '--parsable'):
pretty_print = False
if opt in ('-z', '--zfetch'):
zflag = True
i += 1
argv = sys.argv[i:]
sint = int(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1)
if hflag or (xflag and desired_cols):
if hflag or (xflag and zflag) or ((zflag or xflag) and desired_cols):
usage()
if vflag:
@ -425,6 +454,9 @@ def init():
if xflag:
hdr = xhdr
if zflag:
hdr = zhdr
update_hdr_intr()
# check if L2ARC exists
@ -569,6 +601,17 @@ def calculate():
v["el2mru"] = d["evict_l2_eligible_mru"] // sint
v["el2inel"] = d["evict_l2_ineligible"] // sint
v["mtxmis"] = d["mutex_miss"] // sint
v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
d["zfetch_past"] + d["zfetch_misses"]) // sint
v["zhits"] = d["zfetch_hits"] // sint
v["zahead"] = (d["zfetch_future"] + d["zfetch_stride"]) // sint
v["zpast"] = d["zfetch_past"] // sint
v["zmisses"] = d["zfetch_misses"] // sint
v["zmax"] = d["zfetch_max_streams"] // sint
v["zfuture"] = d["zfetch_future"] // sint
v["zstride"] = d["zfetch_stride"] // sint
v["zissued"] = d["zfetch_io_issued"] // sint
v["zactive"] = d["zfetch_io_active"] // sint
if l2exist:
v["l2hits"] = d["l2_hits"] // sint
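
The arcstat changes above merge two kstat sources into one dictionary, prefixing the zfetch counters with `zfetch_` so they cannot collide with ARC names, and then divide counter deltas by the sampling interval. The sketch below illustrates that idea under stated assumptions: the helpers `parse_kstat` and `per_second` are hypothetical, the sample lines are made up, and, unlike the patch, the two header lines are skipped before the prefix is applied.

```python
def parse_kstat(lines, prefix=""):
    """Parse 'name type value' kstat lines into {prefix + name: int}."""
    stats = {}
    for line in lines[2:]:  # the first two lines are kstat headers
        line = line.strip()
        if not line:
            continue
        name, _type, value = line.split()
        stats[prefix + name] = int(value)
    return stats


def per_second(prev, cur, interval):
    """Integer per-second rate for every counter present in cur."""
    return {name: (cur[name] - prev.get(name, 0)) // interval
            for name in cur}


# Made-up snapshots in the general shape of a kstat file.
first = ["19 1 0x01 12 3264 1 2", "name type data",
         "hits 4 120", "misses 4 30"]
second = ["19 1 0x01 12 3264 1 2", "name type data",
          "hits 4 180", "misses 4 40"]

prev = parse_kstat(first, prefix="zfetch_")
cur = parse_kstat(second, prefix="zfetch_")
rates = per_second(prev, cur, interval=2)
print(rates)
```

Keeping the prefix in the dictionary key lets a single `calculate()`-style function read ARC and zfetch counters from the same snapshot.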


@ -34,6 +34,7 @@
* Copyright (c) 2021 Allan Jude
* Copyright (c) 2021 Toomas Soome <tsoome@me.com>
* Copyright (c) 2023, Klara Inc.
* Copyright (c) 2023, Rob Norris <robn@despairlabs.com>
*/
#include <stdio.h>
@ -47,6 +48,7 @@
#include <sys/spa_impl.h>
#include <sys/dmu.h>
#include <sys/zap.h>
#include <sys/zap_impl.h>
#include <sys/fs/zfs.h>
#include <sys/zfs_znode.h>
#include <sys/zfs_sa.h>
@ -80,8 +82,12 @@
#include <sys/dsl_scan.h>
#include <sys/btree.h>
#include <sys/brt.h>
#include <sys/brt_impl.h>
#include <zfs_comutil.h>
#include <sys/zstd/zstd.h>
#if (__GLIBC__ && !__UCLIBC__)
#include <execinfo.h> /* for backtrace() */
#endif
#include <libnvpair.h>
#include <libzutil.h>
@ -899,6 +905,8 @@ usage(void)
"don't print label contents\n");
(void) fprintf(stderr, " -t --txg=INTEGER "
"highest txg to use when searching for uberblocks\n");
(void) fprintf(stderr, " -T --brt-stats "
"BRT statistics\n");
(void) fprintf(stderr, " -u --uberblock "
"uberblock\n");
(void) fprintf(stderr, " -U --cachefile=PATH "
@ -922,11 +930,41 @@ usage(void)
static void
dump_debug_buffer(void)
{
if (dump_opt['G']) {
(void) printf("\n");
(void) fflush(stdout);
zfs_dbgmsg_print("zdb");
}
ssize_t ret __attribute__((unused));
if (!dump_opt['G'])
return;
/*
* We use write() instead of printf() so that this function
* is safe to call from a signal handler.
*/
ret = write(STDOUT_FILENO, "\n", 1);
zfs_dbgmsg_print("zdb");
}
#define BACKTRACE_SZ 100
static void sig_handler(int signo)
{
struct sigaction action;
#if (__GLIBC__ && !__UCLIBC__) /* backtrace() is a GNU extension */
int nptrs;
void *buffer[BACKTRACE_SZ];
nptrs = backtrace(buffer, BACKTRACE_SZ);
backtrace_symbols_fd(buffer, nptrs, STDERR_FILENO);
#endif
dump_debug_buffer();
/*
* Restore default action and re-raise signal so SIGSEGV and
* SIGABRT can trigger a core dump.
*/
action.sa_handler = SIG_DFL;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
(void) sigaction(signo, &action, NULL);
raise(signo);
}
/*
@ -999,6 +1037,15 @@ zdb_nicenum(uint64_t num, char *buf, size_t buflen)
nicenum(num, buf, buflen);
}
static void
zdb_nicebytes(uint64_t bytes, char *buf, size_t buflen)
{
if (dump_opt['P'])
(void) snprintf(buf, buflen, "%llu", (longlong_t)bytes);
else
zfs_nicebytes(bytes, buf, buflen);
}
static const char histo_stars[] = "****************************************";
static const uint64_t histo_width = sizeof (histo_stars) - 1;
@ -1186,16 +1233,33 @@ dump_zap(objset_t *os, uint64_t object, void *data, size_t size)
for (zap_cursor_init(&zc, os, object);
zap_cursor_retrieve(&zc, &attr) == 0;
zap_cursor_advance(&zc)) {
(void) printf("\t\t%s = ", attr.za_name);
boolean_t key64 =
!!(zap_getflags(zc.zc_zap) & ZAP_FLAG_UINT64_KEY);
if (key64)
(void) printf("\t\t0x%010lx = ",
*(uint64_t *)attr.za_name);
else
(void) printf("\t\t%s = ", attr.za_name);
if (attr.za_num_integers == 0) {
(void) printf("\n");
continue;
}
prop = umem_zalloc(attr.za_num_integers *
attr.za_integer_length, UMEM_NOFAIL);
(void) zap_lookup(os, object, attr.za_name,
attr.za_integer_length, attr.za_num_integers, prop);
if (attr.za_integer_length == 1) {
if (key64)
(void) zap_lookup_uint64(os, object,
(const uint64_t *)attr.za_name, 1,
attr.za_integer_length, attr.za_num_integers,
prop);
else
(void) zap_lookup(os, object, attr.za_name,
attr.za_integer_length, attr.za_num_integers,
prop);
if (attr.za_integer_length == 1 && !key64) {
if (strcmp(attr.za_name,
DSL_CRYPTO_KEY_MASTER_KEY) == 0 ||
strcmp(attr.za_name,
@ -1214,6 +1278,10 @@ dump_zap(objset_t *os, uint64_t object, void *data, size_t size)
} else {
for (i = 0; i < attr.za_num_integers; i++) {
switch (attr.za_integer_length) {
case 1:
(void) printf("%u ",
((uint8_t *)prop)[i]);
break;
case 2:
(void) printf("%u ",
((uint16_t *)prop)[i]);
@ -2081,6 +2149,76 @@ dump_all_ddts(spa_t *spa)
dump_dedup_ratio(&dds_total);
}
static void
dump_brt(spa_t *spa)
{
if (!spa_feature_is_enabled(spa, SPA_FEATURE_BLOCK_CLONING)) {
printf("BRT: unsupported on this pool\n");
return;
}
if (!spa_feature_is_active(spa, SPA_FEATURE_BLOCK_CLONING)) {
printf("BRT: empty\n");
return;
}
brt_t *brt = spa->spa_brt;
VERIFY(brt);
char count[32], used[32], saved[32];
zdb_nicebytes(brt_get_used(spa), used, sizeof (used));
zdb_nicebytes(brt_get_saved(spa), saved, sizeof (saved));
uint64_t ratio = brt_get_ratio(spa);
printf("BRT: used %s; saved %s; ratio %llu.%02llux\n", used, saved,
(u_longlong_t)(ratio / 100), (u_longlong_t)(ratio % 100));
if (dump_opt['T'] < 2)
return;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL)
continue;
if (!brtvd->bv_initiated) {
printf("BRT: vdev %" PRIu64 ": empty\n", vdevid);
continue;
}
zdb_nicenum(brtvd->bv_totalcount, count, sizeof (count));
zdb_nicebytes(brtvd->bv_usedspace, used, sizeof (used));
zdb_nicebytes(brtvd->bv_savedspace, saved, sizeof (saved));
printf("BRT: vdev %" PRIu64 ": refcnt %s; used %s; saved %s\n",
vdevid, count, used, saved);
}
if (dump_opt['T'] < 3)
return;
char dva[64];
printf("\n%-16s %-10s\n", "DVA", "REFCNT");
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL || !brtvd->bv_initiated)
continue;
zap_cursor_t zc;
zap_attribute_t za;
for (zap_cursor_init(&zc, brt->brt_mos, brtvd->bv_mos_entries);
zap_cursor_retrieve(&zc, &za) == 0;
zap_cursor_advance(&zc)) {
uint64_t offset = *(uint64_t *)za.za_name;
uint64_t refcnt = za.za_first_integer;
snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx", vdevid,
(u_longlong_t)offset);
printf("%-16s %-10llu\n", dva, (u_longlong_t)refcnt);
}
zap_cursor_fini(&zc);
}
}
static void
dump_dtl_seg(void *arg, uint64_t start, uint64_t size)
{
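
The summary line printed by `dump_brt()` in this hunk treats the value of `brt_get_ratio()` as a fixed-point number with two decimal places: the integer part is `ratio / 100` and the fraction is `ratio % 100`, zero-padded to two digits. A small Python sketch of the same formatting, where the function name `format_brt_ratio` is hypothetical and only mirrors the `printf` format string:

```python
def format_brt_ratio(ratio):
    """Format a ratio scaled by 100 the way '%llu.%02llux' does."""
    return "%d.%02dx" % (ratio // 100, ratio % 100)


for scaled in (100, 150, 205):
    print(scaled, "->", format_brt_ratio(scaled))
```

The zero padding matters: without it, a scaled value of 205 would be rendered as `2.5x` instead of `2.05x`.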
@ -2277,7 +2415,7 @@ static void
snprintf_zstd_header(spa_t *spa, char *blkbuf, size_t buflen,
const blkptr_t *bp)
{
abd_t *pabd;
static abd_t *pabd = NULL;
void *buf;
zio_t *zio;
zfs_zstdhdr_t zstd_hdr;
@ -2308,7 +2446,8 @@ snprintf_zstd_header(spa_t *spa, char *blkbuf, size_t buflen,
return;
}
pabd = abd_alloc_for_io(SPA_MAXBLOCKSIZE, B_FALSE);
if (!pabd)
pabd = abd_alloc_for_io(SPA_MAXBLOCKSIZE, B_FALSE);
zio = zio_root(spa, NULL, NULL, 0);
/* Decrypt but don't decompress so we can read the compression header */
@ -5133,7 +5272,7 @@ dump_label(const char *dev)
sizeof (cksum_record_t), offsetof(cksum_record_t, link));
psize = statbuf.st_size;
psize = P2ALIGN(psize, (uint64_t)sizeof (vdev_label_t));
psize = P2ALIGN_TYPED(psize, sizeof (vdev_label_t), uint64_t);
ashift = SPA_MINBLOCKSHIFT;
/*
@ -7957,6 +8096,17 @@ dump_mos_leaks(spa_t *spa)
}
}
if (spa->spa_brt != NULL) {
brt_t *brt = spa->spa_brt;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd != NULL && brtvd->bv_initiated) {
mos_obj_refd(brtvd->bv_mos_brtvdev);
mos_obj_refd(brtvd->bv_mos_entries);
}
}
}
/*
* Visit all allocated objects and make sure they are referenced.
*/
@ -8093,6 +8243,9 @@ dump_zpool(spa_t *spa)
if (dump_opt['D'])
dump_all_ddts(spa);
if (dump_opt['T'])
dump_brt(spa);
if (dump_opt['d'] > 2 || dump_opt['m'])
dump_metaslabs(spa);
if (dump_opt['M'])
@ -8404,6 +8557,14 @@ zdb_decompress_block(abd_t *pabd, void *buf, void *lbuf, uint64_t lsize,
*cfuncp++ = ZIO_COMPRESS_LZ4;
*cfuncp++ = ZIO_COMPRESS_LZJB;
mask |= ZIO_COMPRESS_MASK(LZ4) | ZIO_COMPRESS_MASK(LZJB);
/*
* Every gzip level has the same decompressor, no need to
* run it 9 times per bruteforce attempt.
*/
mask |= ZIO_COMPRESS_MASK(GZIP_2) | ZIO_COMPRESS_MASK(GZIP_3);
mask |= ZIO_COMPRESS_MASK(GZIP_4) | ZIO_COMPRESS_MASK(GZIP_5);
mask |= ZIO_COMPRESS_MASK(GZIP_6) | ZIO_COMPRESS_MASK(GZIP_7);
mask |= ZIO_COMPRESS_MASK(GZIP_8) | ZIO_COMPRESS_MASK(GZIP_9);
for (int c = 0; c < ZIO_COMPRESS_FUNCTIONS; c++)
if (((1ULL << c) & mask) == 0)
*cfuncp++ = c;
@ -8828,9 +8989,27 @@ main(int argc, char **argv)
char *spa_config_path_env, *objset_str;
boolean_t target_is_spa = B_TRUE, dataset_lookup = B_FALSE;
nvlist_t *cfg = NULL;
struct sigaction action;
dprintf_setup(&argc, argv);
/*
* Set up signal handlers, so if we crash due to bad on-disk data we
* can get more info. Unlike ztest, we don't bail out if we can't set
* up signal handlers, because zdb is very useful without them.
*/
action.sa_handler = sig_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGSEGV, &action, NULL) < 0) {
(void) fprintf(stderr, "zdb: cannot catch SIGSEGV: %s\n",
strerror(errno));
}
if (sigaction(SIGABRT, &action, NULL) < 0) {
(void) fprintf(stderr, "zdb: cannot catch SIGABRT: %s\n",
strerror(errno));
}
/*
* If there is an environment variable SPA_CONFIG_PATH it overrides
* default spa_config_path setting. If -U flag is specified it will
@ -8879,6 +9058,7 @@ main(int argc, char **argv)
{"io-stats", no_argument, NULL, 's'},
{"simulate-dedup", no_argument, NULL, 'S'},
{"txg", required_argument, NULL, 't'},
{"brt-stats", no_argument, NULL, 'T'},
{"uberblock", no_argument, NULL, 'u'},
{"cachefile", required_argument, NULL, 'U'},
{"verbose", no_argument, NULL, 'v'},
@ -8892,7 +9072,7 @@ main(int argc, char **argv)
};
while ((c = getopt_long(argc, argv,
"AbBcCdDeEFGhiI:kK:lLmMNo:Op:PqrRsSt:uU:vVx:XYyZ",
"AbBcCdDeEFGhiI:kK:lLmMNo:Op:PqrRsSt:TuU:vVx:XYyZ",
long_options, NULL)) != -1) {
switch (c) {
case 'b':
@ -8914,6 +9094,7 @@ main(int argc, char **argv)
case 'R':
case 's':
case 'S':
case 'T':
case 'u':
case 'y':
case 'Z':
@ -9076,22 +9257,6 @@ main(int argc, char **argv)
if (dump_opt['l'])
return (dump_label(argv[0]));
if (dump_opt['O']) {
if (argc != 2)
usage();
dump_opt['v'] = verbose + 3;
return (dump_path(argv[0], argv[1], NULL));
}
if (dump_opt['r']) {
target_is_spa = B_FALSE;
if (argc != 3)
usage();
dump_opt['v'] = verbose;
error = dump_path(argv[0], argv[1], &object);
if (error != 0)
fatal("internal error: %s", strerror(error));
}
if (dump_opt['X'] || dump_opt['F'])
rewind = ZPOOL_DO_REWIND |
(dump_opt['X'] ? ZPOOL_EXTREME_REWIND : 0);
@ -9192,6 +9357,29 @@ main(int argc, char **argv)
searchdirs = NULL;
}
/*
* We need to make sure to process -O option or call
* dump_path after the -e option has been processed,
* which imports the pool to the namespace if it's
* not in the cachefile.
*/
if (dump_opt['O']) {
if (argc != 2)
usage();
dump_opt['v'] = verbose + 3;
return (dump_path(argv[0], argv[1], NULL));
}
if (dump_opt['r']) {
target_is_spa = B_FALSE;
if (argc != 3)
usage();
dump_opt['v'] = verbose;
error = dump_path(argv[0], argv[1], &object);
if (error != 0)
fatal("internal error: %s", strerror(error));
}
/*
* import_checkpointed_state makes the assumption that the
* target pool that we pass it is already part of the spa


@ -168,7 +168,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length);
if (txtype == TX_WRITE2 || verbose < 5)
if (txtype == TX_WRITE2 || verbose < 4)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
@ -178,6 +178,8 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
if (verbose < 5)
return;
if (BP_IS_HOLE(bp)) {
(void) printf("\t\t\tLSIZE 0x%llx\n",
(u_longlong_t)BP_GET_LSIZE(bp));
@ -202,6 +204,9 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
if (error)
goto out;
} else {
if (verbose < 5)
return;
/* data is stored after the end of the lr_write record */
data = abd_alloc(lr->lr_length, B_FALSE);
abd_copy_from_buf(data, lr + 1, lr->lr_length);
@ -217,6 +222,28 @@ out:
abd_free(data);
}
static void
zil_prt_rec_write_enc(zilog_t *zilog, int txtype, const void *arg)
{
(void) txtype;
const lr_write_t *lr = arg;
const blkptr_t *bp = &lr->lr_blkptr;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%s(encrypted)\n", tab_prefix);
if (verbose < 4)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) &&
bp->blk_birth >= spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
}
}
static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, const void *arg)
{
@ -312,11 +339,34 @@ zil_prt_rec_clone_range(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_clone_range_t *lr = arg;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%sfoid %llu, offset %llx, length %llx, blksize %llx\n",
tab_prefix, (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length, (u_longlong_t)lr->lr_blksz);
if (verbose < 4)
return;
for (unsigned int i = 0; i < lr->lr_nbps; i++) {
(void) printf("%s[%u/%llu] ", tab_prefix, i + 1,
(u_longlong_t)lr->lr_nbps);
print_log_bp(&lr->lr_bps[i], "");
}
}
static void
zil_prt_rec_clone_range_enc(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_clone_range_t *lr = arg;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%s(encrypted)\n", tab_prefix);
if (verbose < 4)
return;
for (unsigned int i = 0; i < lr->lr_nbps; i++) {
(void) printf("%s[%u/%llu] ", tab_prefix, i + 1,
(u_longlong_t)lr->lr_nbps);
@ -327,6 +377,7 @@ zil_prt_rec_clone_range(zilog_t *zilog, int txtype, const void *arg)
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, const void *);
typedef struct zil_rec_info {
zil_prt_rec_func_t zri_print;
zil_prt_rec_func_t zri_print_enc;
const char *zri_name;
uint64_t zri_count;
} zil_rec_info_t;
@ -341,7 +392,9 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = zil_prt_rec_remove, .zri_name = "TX_RMDIR "},
{.zri_print = zil_prt_rec_link, .zri_name = "TX_LINK "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME "},
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE "},
{.zri_print = zil_prt_rec_write,
.zri_print_enc = zil_prt_rec_write_enc,
.zri_name = "TX_WRITE "},
{.zri_print = zil_prt_rec_truncate, .zri_name = "TX_TRUNCATE "},
{.zri_print = zil_prt_rec_setattr, .zri_name = "TX_SETATTR "},
{.zri_print = zil_prt_rec_acl, .zri_name = "TX_ACL_V0 "},
@ -358,6 +411,7 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME_EXCHANGE "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME_WHITEOUT "},
{.zri_print = zil_prt_rec_clone_range,
.zri_print_enc = zil_prt_rec_clone_range_enc,
.zri_name = "TX_CLONE_RANGE "},
};
@ -384,6 +438,8 @@ print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg)
if (txtype && verbose >= 3) {
if (!zilog->zl_os->os_encrypted) {
zil_rec_info[txtype].zri_print(zilog, txtype, lr);
} else if (zil_rec_info[txtype].zri_print_enc) {
zil_rec_info[txtype].zri_print_enc(zilog, txtype, lr);
} else {
(void) printf("%s(encrypted)\n", tab_prefix);
}
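
The `print_log_record()` change above adds an optional second callback per record type: on an encrypted dataset the type-specific `zri_print_enc` printer is used when one exists, and the generic "(encrypted)" line remains the fallback. The dispatch can be sketched as follows. This is an illustration only; the table contents, printer functions, and the `render` helper are hypothetical stand-ins for `zil_rec_info[]`.

```python
def print_write(rec):
    return "foid %d, offset %x" % (rec["foid"], rec["offset"])


def print_write_enc(rec):
    # Only fields that stay readable without the key may be shown here.
    return "(encrypted) has blkptr"


def print_truncate(rec):
    return "foid %d" % rec["foid"]


# Each entry has a plain printer and, optionally, an encrypted-aware one.
REC_INFO = {
    "TX_WRITE": {"print": print_write, "print_enc": print_write_enc},
    "TX_TRUNCATE": {"print": print_truncate, "print_enc": None},
}


def render(txtype, rec, encrypted):
    info = REC_INFO[txtype]
    if not encrypted:
        return info["print"](rec)
    if info["print_enc"] is not None:
        return info["print_enc"](rec)
    return "(encrypted)"  # generic fallback for all other record types


rec = {"foid": 7, "offset": 0x1000}
print(render("TX_WRITE", rec, encrypted=False))
print(render("TX_WRITE", rec, encrypted=True))
print(render("TX_TRUNCATE", rec, encrypted=True))
```

Making the second callback optional keeps the existing table entries valid; only record types with something useful to show for encrypted datasets need to supply one.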


@ -22,6 +22,7 @@
* Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
/*
@ -231,28 +232,6 @@ fmd_prop_get_int32(fmd_hdl_t *hdl, const char *name)
if (strcmp(name, "spare_on_remove") == 0)
return (1);
if (strcmp(name, "io_N") == 0 || strcmp(name, "checksum_N") == 0)
return (10); /* N = 10 events */
return (0);
}
int64_t
fmd_prop_get_int64(fmd_hdl_t *hdl, const char *name)
{
(void) hdl;
/*
* These can be looked up in mp->modinfo->fmdi_props
* For now we just hard code for phase 2. In the
* future, there can be a ZED based override.
*/
if (strcmp(name, "remove_timeout") == 0)
return (15ULL * 1000ULL * 1000ULL * 1000ULL); /* 15 sec */
if (strcmp(name, "io_T") == 0 || strcmp(name, "checksum_T") == 0)
return (1000ULL * 1000ULL * 1000ULL * 600ULL); /* 10 min */
return (0);
}
@ -535,6 +514,19 @@ fmd_serd_exists(fmd_hdl_t *hdl, const char *name)
return (fmd_serd_eng_lookup(&mp->mod_serds, name) != NULL);
}
int
fmd_serd_active(fmd_hdl_t *hdl, const char *name)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return (0);
}
return (fmd_serd_eng_fired(sgp) || !fmd_serd_eng_empty(sgp));
}
void
fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
{
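
`fmd_serd_active()` in this hunk reports an engine as active when it has already fired or still holds at least one recorded event. The toy model below shows why both conditions are needed. It is a simplified sketch, not the fmd implementation: the `SerdEngine` class, its "N events within T" firing rule, and its garbage collection are assumptions made for illustration.

```python
class SerdEngine:
    """Toy SERD engine: fires once n events fall within a window of t."""

    def __init__(self, n, t):
        self.n = n
        self.t = t
        self.events = []
        self.fired = False

    def _expire(self, now):
        self.events = [e for e in self.events if now - e <= self.t]

    def record(self, now):
        """Record an event; return True only when the engine fires."""
        self.events.append(now)
        self._expire(now)
        if not self.fired and len(self.events) >= self.n:
            self.fired = True
            return True
        return False

    def gc(self, now):
        """Drop events that have aged out of the window."""
        self._expire(now)

    def active(self):
        # Mirrors: fmd_serd_eng_fired(sgp) || !fmd_serd_eng_empty(sgp)
        return self.fired or bool(self.events)


quiet = SerdEngine(n=3, t=10)
quiet.record(0)
print(quiet.active())   # one pending event keeps the engine active
quiet.gc(100)
print(quiet.active())   # the event aged out and the engine never fired

noisy = SerdEngine(n=3, t=10)
results = [noisy.record(ts) for ts in (0, 1, 2)]
noisy.gc(1000)
print(results, noisy.active())
```

In this model an engine that has fired stays active after its events expire, while one that only saw a stray event becomes inactive once garbage collection runs.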
@ -543,12 +535,10 @@ fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return;
} else {
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
int
@ -556,16 +546,21 @@ fmd_serd_record(fmd_hdl_t *hdl, const char *name, fmd_event_t *ep)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
int err;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "failed to add record to SERD engine '%s'",
name);
return (0);
}
err = fmd_serd_eng_record(sgp, ep->ev_hrt);
return (fmd_serd_eng_record(sgp, ep->ev_hrt));
}
return (err);
void
fmd_serd_gc(fmd_hdl_t *hdl)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_hash_apply(&mp->mod_serds, fmd_serd_eng_gc, NULL);
}
/* FMD Timers */
@ -579,7 +574,7 @@ _timer_notify(union sigval sv)
const fmd_hdl_ops_t *ops = mp->mod_info->fmdi_ops;
struct itimerspec its;
fmd_hdl_debug(hdl, "timer fired (%p)", ftp->ft_tid);
fmd_hdl_debug(hdl, "%s timer fired (%p)", mp->mod_name, ftp->ft_tid);
/* disarm the timer */
memset(&its, 0, sizeof (struct itimerspec));


@ -151,7 +151,6 @@ extern void fmd_hdl_vdebug(fmd_hdl_t *, const char *, va_list);
 extern void fmd_hdl_debug(fmd_hdl_t *, const char *, ...);
 extern int32_t fmd_prop_get_int32(fmd_hdl_t *, const char *);
-extern int64_t fmd_prop_get_int64(fmd_hdl_t *, const char *);
 #define FMD_STAT_NOALLOC 0x0 /* fmd should use caller's memory */
 #define FMD_STAT_ALLOC 0x1 /* fmd should allocate stats memory */
@ -195,10 +194,12 @@ extern size_t fmd_buf_size(fmd_hdl_t *, fmd_case_t *, const char *);
 extern void fmd_serd_create(fmd_hdl_t *, const char *, uint_t, hrtime_t);
 extern void fmd_serd_destroy(fmd_hdl_t *, const char *);
 extern int fmd_serd_exists(fmd_hdl_t *, const char *);
+extern int fmd_serd_active(fmd_hdl_t *, const char *);
 extern void fmd_serd_reset(fmd_hdl_t *, const char *);
 extern int fmd_serd_record(fmd_hdl_t *, const char *, fmd_event_t *);
 extern int fmd_serd_fired(fmd_hdl_t *, const char *);
 extern int fmd_serd_empty(fmd_hdl_t *, const char *);
+extern void fmd_serd_gc(fmd_hdl_t *);
 extern id_t fmd_timer_install(fmd_hdl_t *, void *, fmd_event_t *, hrtime_t);
 extern void fmd_timer_remove(fmd_hdl_t *, id_t);


@ -310,8 +310,9 @@ fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
 }
 void
-fmd_serd_eng_gc(fmd_serd_eng_t *sgp)
+fmd_serd_eng_gc(fmd_serd_eng_t *sgp, void *arg)
 {
+(void) arg;
 fmd_serd_elem_t *sep, *nep;
 hrtime_t hrt;


@ -77,7 +77,7 @@ extern int fmd_serd_eng_fired(fmd_serd_eng_t *);
extern int fmd_serd_eng_empty(fmd_serd_eng_t *);
extern void fmd_serd_eng_reset(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *, void *);
#ifdef __cplusplus
}


@ -23,6 +23,7 @@
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
#include <stddef.h>
@ -47,11 +48,16 @@
#define DEFAULT_CHECKSUM_T 600 /* seconds */
#define DEFAULT_IO_N 10 /* events */
#define DEFAULT_IO_T 600 /* seconds */
#define DEFAULT_SLOW_IO_N 10 /* events */
#define DEFAULT_SLOW_IO_T 30 /* seconds */
#define CASE_GC_TIMEOUT_SECS 43200 /* 12 hours */
/*
* Our serd engines are named 'zfs_<pool_guid>_<vdev_guid>_{checksum,io}'. This
* #define reserves enough space for two 64-bit hex values plus the length of
* the longest string.
* Our serd engines are named in the following format:
* 'zfs_<pool_guid>_<vdev_guid>_{checksum,io,slow_io}'
* This #define reserves enough space for two 64-bit hex values plus the
* length of the longest string.
*/
#define MAX_SERDLEN (16 * 2 + sizeof ("zfs___checksum"))
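Review note: the new `slow_io` suffix is shorter than `checksum`, so `MAX_SERDLEN` still covers the longest engine name. The sketch below checks that arithmetic. `serd_name()` is a hypothetical stand-in for `zfs_serd_name()`, and the `"zfs_%llx_%llx_%s"` format string is an assumption inferred from the comment above rather than copied from the module.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same sizing rule as the diff: two 64-bit hex GUIDs plus the longest suffix. */
#define MAX_SERDLEN (16 * 2 + sizeof ("zfs___checksum"))

/*
 * Hypothetical stand-in for zfs_serd_name(); the format string is assumed.
 * Returns the length the full name needs, excluding the terminating NUL.
 */
static int
serd_name(char *buf, size_t len, uint64_t pool_guid, uint64_t vdev_guid,
    const char *type)
{
	assert(buf != NULL && type != NULL);
	return (snprintf(buf, len, "zfs_%llx_%llx_%s",
	    (unsigned long long)pool_guid,
	    (unsigned long long)vdev_guid, type));
}
```

With both GUIDs at their maximum width and the `checksum` suffix, the name needs 46 characters plus the NUL, which is exactly the 47 bytes that `MAX_SERDLEN` reserves.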
@ -68,6 +74,7 @@ typedef struct zfs_case_data {
int zc_pool_state;
char zc_serd_checksum[MAX_SERDLEN];
char zc_serd_io[MAX_SERDLEN];
char zc_serd_slow_io[MAX_SERDLEN];
int zc_has_remove_timer;
} zfs_case_data_t;
@ -114,7 +121,8 @@ zfs_de_stats_t zfs_stats = {
{ "resource_drops", FMD_TYPE_UINT64, "resource related ereports" }
};
static hrtime_t zfs_remove_timeout;
/* wait 15 seconds after a removal */
static hrtime_t zfs_remove_timeout = SEC2NSEC(15);
uu_list_pool_t *zfs_case_pool;
uu_list_t *zfs_cases;
@ -124,6 +132,8 @@ uu_list_t *zfs_cases;
#define ZFS_MAKE_EREPORT(type) \
FM_EREPORT_CLASS "." ZFS_ERROR_CLASS "." type
static void zfs_purge_cases(fmd_hdl_t *hdl);
/*
* Write out the persistent representation of an active case.
*/
@ -170,6 +180,42 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
return (zcp);
}
/*
* Count other unique slow-io cases in a pool.
*/
static uint_t
zfs_other_slow_cases(fmd_hdl_t *hdl, const zfs_case_data_t *zfs_case)
{
zfs_case_t *zcp;
uint_t cases = 0;
static hrtime_t next_check = 0;
/*
* Note that plumbing in some external GC would require adding locking,
* since most of this module code is not thread safe and assumes there
* is only one thread running against the module. So we perform GC here
* inline periodically so that future delay induced faults will be
* possible once the issue causing multiple vdev delays is resolved.
*/
if (gethrestime_sec() > next_check) {
/* Periodically purge old SERD entries and stale cases */
fmd_serd_gc(hdl);
zfs_purge_cases(hdl);
next_check = gethrestime_sec() + CASE_GC_TIMEOUT_SECS;
}
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == zfs_case->zc_pool_guid &&
zcp->zc_data.zc_vdev_guid != zfs_case->zc_vdev_guid &&
zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
fmd_serd_active(hdl, zcp->zc_data.zc_serd_slow_io)) {
cases++;
}
}
return (cases);
}
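Review note: `zfs_other_slow_cases()` combines two ideas, a time gate that runs garbage collection at most once per interval and a scan for sibling vdevs in the same pool with an active slow-io engine. The sketch below separates the two so each can be reasoned about alone. The `slow_case_t` type and both function names are hypothetical; only the 12-hour interval and the comparison logic come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GC_INTERVAL_SECS 43200	/* 12 hours, like CASE_GC_TIMEOUT_SECS */

/* Hypothetical, reduced view of a diagnosis case. */
typedef struct slow_case {
	uint64_t pool_guid;
	uint64_t vdev_guid;
	int slow_active;	/* nonzero if its slow-io engine holds events */
} slow_case_t;

/*
 * Return 1 when periodic maintenance is due and schedule the next run;
 * return 0 otherwise. The comparison is strict, as in the diff.
 */
static int
gc_due(int64_t now_sec, int64_t *next_check)
{
	assert(next_check != NULL);
	if (now_sec > *next_check) {
		*next_check = now_sec + GC_INTERVAL_SECS;
		return (1);
	}
	return (0);
}

/* Count cases in the same pool, on a different vdev, that look slow. */
static unsigned
other_slow_cases(const slow_case_t *cases, size_t n, const slow_case_t *self)
{
	unsigned found = 0;

	assert(cases != NULL && self != NULL);
	for (size_t i = 0; i < n; i++) {
		if (cases[i].pool_guid == self->pool_guid &&
		    cases[i].vdev_guid != self->vdev_guid &&
		    cases[i].slow_active)
			found++;
	}
	return (found);
}
```

Because the gate only advances when it fires, a long quiet period followed by a burst of delay events triggers a single collection pass rather than one per event.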
/*
* Iterate over any active cases. If any cases are associated with a pool or
* vdev which is no longer present on the system, close the associated case.
@ -376,6 +422,14 @@ zfs_serd_name(char *buf, uint64_t pool_guid, uint64_t vdev_guid,
(long long unsigned int)vdev_guid, type);
}
static void
zfs_case_retire(fmd_hdl_t *hdl, zfs_case_t *zcp)
{
fmd_hdl_debug(hdl, "retiring case");
fmd_case_close(hdl, zcp->zc_case);
}
/*
* Solve a given ZFS case. This first checks to make sure the diagnosis is
* still valid, as well as cleaning up any pending timer associated with the
@ -632,9 +686,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DATA)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) == 0) {
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0) {
zfs_stats.resource_drops.fmds_value.ui64++;
return;
}
@ -702,6 +754,9 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (zcp->zc_data.zc_serd_checksum[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_slow_io);
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_RSRC(FM_RESOURCE_STATECHANGE))) {
uint64_t state = 0;
@ -730,7 +785,11 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (fmd_case_solved(hdl, zcp->zc_case))
return;
fmd_hdl_debug(hdl, "error event '%s'", class);
if (vdev_guid)
fmd_hdl_debug(hdl, "error event '%s', vdev %llu", class,
vdev_guid);
else
fmd_hdl_debug(hdl, "error event '%s'", class);
/*
* Determine if we should solve the case and generate a fault. We solve
@ -779,6 +838,8 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO_FAILURE)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
const char *failmode = NULL;
boolean_t checkremove = B_FALSE;
@ -814,6 +875,51 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
}
if (fmd_serd_record(hdl, zcp->zc_data.zc_serd_io, ep))
checkremove = B_TRUE;
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY))) {
uint64_t slow_io_n, slow_io_t;
/*
* Create a slow io SERD engine when the VDEV has the
* 'vdev_slow_io_n' and 'vdev_slow_io_t' properties.
*/
if (zcp->zc_data.zc_serd_slow_io[0] == '\0' &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_N,
&slow_io_n) == 0 &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_T,
&slow_io_t) == 0) {
zfs_serd_name(zcp->zc_data.zc_serd_slow_io,
pool_guid, vdev_guid, "slow_io");
fmd_serd_create(hdl,
zcp->zc_data.zc_serd_slow_io,
slow_io_n,
SEC2NSEC(slow_io_t));
zfs_case_serialize(zcp);
}
/* Pass event to SERD engine and see if this triggers */
if (zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
fmd_serd_record(hdl, zcp->zc_data.zc_serd_slow_io,
ep)) {
/*
* Ignore a slow io diagnosis when other
* VDEVs in the pool show signs of being slow.
*/
if (zfs_other_slow_cases(hdl, &zcp->zc_data)) {
zfs_case_retire(hdl, zcp);
fmd_hdl_debug(hdl, "pool %llu has "
"multiple slow io cases -- skip "
"degrading vdev %llu",
(u_longlong_t)
zcp->zc_data.zc_pool_guid,
(u_longlong_t)
zcp->zc_data.zc_vdev_guid);
} else {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.vdev.slow_io");
}
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CHECKSUM))) {
/*
@ -924,6 +1030,8 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_io);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_slow_io);
if (zcp->zc_data.zc_has_remove_timer)
fmd_timer_remove(hdl, zcp->zc_remove_timer);
@ -932,30 +1040,15 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
/*
* We use the fmd gc entry point to look for old cases that no longer apply.
* This allows us to keep our set of case data small in a long running system.
*/
static void
zfs_fm_gc(fmd_hdl_t *hdl)
{
zfs_purge_cases(hdl);
}
static const fmd_hdl_ops_t fmd_ops = {
zfs_fm_recv, /* fmdo_recv */
zfs_fm_timeout, /* fmdo_timeout */
zfs_fm_close, /* fmdo_close */
NULL, /* fmdo_stats */
zfs_fm_gc, /* fmdo_gc */
NULL, /* fmdo_gc */
};
static const fmd_prop_t fmd_props[] = {
{ "checksum_N", FMD_TYPE_UINT32, "10" },
{ "checksum_T", FMD_TYPE_TIME, "10min" },
{ "io_N", FMD_TYPE_UINT32, "10" },
{ "io_T", FMD_TYPE_TIME, "10min" },
{ "remove_timeout", FMD_TYPE_TIME, "15sec" },
{ NULL, 0, NULL }
};
@ -996,8 +1089,6 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
(void) fmd_stat_create(hdl, FMD_STAT_NOALLOC, sizeof (zfs_stats) /
sizeof (fmd_stat_t), (fmd_stat_t *)&zfs_stats);
zfs_remove_timeout = fmd_prop_get_int64(hdl, "remove_timeout");
}
void


@ -24,6 +24,7 @@
* Copyright 2014 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, 2017, Intel Corporation.
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
* Copyright (c) 2023, Klara Inc.
*/
/*
@ -146,6 +147,17 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
return (0);
}
/*
* Write an array of strings to the zed log
*/
static void lines_to_zed_log_msg(char **lines, int lines_cnt)
{
int i;
for (i = 0; i < lines_cnt; i++) {
zed_log_msg(LOG_INFO, "%s", lines[i]);
}
}
/*
* Two stage replace on Linux
* since we get disk notifications
@ -193,14 +205,21 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
uint64_t is_spare = 0;
const char *physpath = NULL, *new_devid = NULL, *enc_sysfs_path = NULL;
char rawpath[PATH_MAX], fullpath[PATH_MAX];
char devpath[PATH_MAX];
char pathbuf[PATH_MAX];
int ret;
int online_flag = ZFS_ONLINE_CHECKREMOVE | ZFS_ONLINE_UNSPARE;
boolean_t is_sd = B_FALSE;
boolean_t is_mpath_wholedisk = B_FALSE;
uint_t c;
vdev_stat_t *vs;
char **lines = NULL;
int lines_cnt = 0;
/*
* Get the persistent path, typically under the '/dev/disk/by-id' or
* '/dev/disk/by-vdev' directories. Note that this path can change
* when a vdev is replaced with a new disk.
*/
if (nvlist_lookup_string(vdev, ZPOOL_CONFIG_PATH, &path) != 0)
return;
@ -214,8 +233,12 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
}
(void) nvlist_lookup_string(vdev, ZPOOL_CONFIG_PHYS_PATH, &physpath);
update_vdev_config_dev_sysfs_path(vdev, path,
ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH);
(void) nvlist_lookup_string(vdev, ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH,
&enc_sysfs_path);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK, &wholedisk);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_OFFLINE, &offline);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_FAULTED, &faulted);
@ -357,15 +380,17 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
(void) snprintf(rawpath, sizeof (rawpath), "%s%s",
is_sd ? DEV_BYVDEV_PATH : DEV_BYPATH_PATH, physpath);
if (realpath(rawpath, devpath) == NULL && !is_mpath_wholedisk) {
if (realpath(rawpath, pathbuf) == NULL && !is_mpath_wholedisk) {
zed_log_msg(LOG_INFO, " realpath: %s failed (%s)",
rawpath, strerror(errno));
(void) zpool_vdev_online(zhp, fullpath, ZFS_ONLINE_FORCEFAULT,
&newstate);
int err = zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_FORCEFAULT, &newstate);
zed_log_msg(LOG_INFO, " zpool_vdev_online: %s FORCEFAULT (%s)",
fullpath, libzfs_error_description(g_zfshdl));
zed_log_msg(LOG_INFO, " zpool_vdev_online: %s FORCEFAULT (%s) "
"err %d, new state %d",
fullpath, libzfs_error_description(g_zfshdl), err,
err ? (int)newstate : 0);
return;
}
@ -383,6 +408,22 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
if (is_mpath_wholedisk) {
/* Don't label device mapper or multipath disks. */
zed_log_msg(LOG_INFO,
" it's a multipath wholedisk, don't label");
if (zpool_prepare_disk(zhp, vdev, "autoreplace", &lines,
&lines_cnt) != 0) {
zed_log_msg(LOG_INFO,
" zpool_prepare_disk: could not "
"prepare '%s' (%s)", fullpath,
libzfs_error_description(g_zfshdl));
if (lines_cnt > 0) {
zed_log_msg(LOG_INFO,
" zfs_prepare_disk output:");
lines_to_zed_log_msg(lines, lines_cnt);
}
libzfs_free_str_array(lines, lines_cnt);
return;
}
} else if (!labeled) {
/*
* we're auto-replacing a raw disk, so label it first
@ -399,16 +440,24 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* to trigger a ZFS fault for the device (and any hot spare
* replacement).
*/
leafname = strrchr(devpath, '/') + 1;
leafname = strrchr(pathbuf, '/') + 1;
/*
* If this is a request to label a whole disk, then attempt to
* write out the label.
*/
if (zpool_label_disk(g_zfshdl, zhp, leafname) != 0) {
zed_log_msg(LOG_INFO, " zpool_label_disk: could not "
if (zpool_prepare_and_label_disk(g_zfshdl, zhp, leafname,
vdev, "autoreplace", &lines, &lines_cnt) != 0) {
zed_log_msg(LOG_WARNING,
" zpool_prepare_and_label_disk: could not "
"label '%s' (%s)", leafname,
libzfs_error_description(g_zfshdl));
if (lines_cnt > 0) {
zed_log_msg(LOG_INFO,
" zfs_prepare_disk output:");
lines_to_zed_log_msg(lines, lines_cnt);
}
libzfs_free_str_array(lines, lines_cnt);
(void) zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_FORCEFAULT, &newstate);
@ -431,7 +480,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
sizeof (device->pd_physpath));
list_insert_tail(&g_device_list, device);
zed_log_msg(LOG_INFO, " zpool_label_disk: async '%s' (%llu)",
zed_log_msg(LOG_NOTICE, " zpool_label_disk: async '%s' (%llu)",
leafname, (u_longlong_t)guid);
return; /* resumes at EC_DEV_ADD.ESC_DISK for partition */
@ -454,8 +503,8 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
}
if (!found) {
/* unexpected partition slice encountered */
zed_log_msg(LOG_INFO, "labeled disk %s unexpected here",
fullpath);
zed_log_msg(LOG_WARNING, "labeled disk %s was "
"unexpected here", fullpath);
(void) zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_FORCEFAULT, &newstate);
return;
@ -464,10 +513,21 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
zed_log_msg(LOG_INFO, " zpool_label_disk: resume '%s' (%llu)",
physpath, (u_longlong_t)guid);
(void) snprintf(devpath, sizeof (devpath), "%s%s",
DEV_BYID_PATH, new_devid);
/*
* Paths that begin with '/dev/disk/by-id/' will change and so
* they must be updated before calling zpool_vdev_attach().
*/
if (strncmp(path, DEV_BYID_PATH, strlen(DEV_BYID_PATH)) == 0) {
(void) snprintf(pathbuf, sizeof (pathbuf), "%s%s",
DEV_BYID_PATH, new_devid);
zed_log_msg(LOG_INFO, " zpool_label_disk: path '%s' "
"replaced by '%s'", path, pathbuf);
path = pathbuf;
}
}
libzfs_free_str_array(lines, lines_cnt);
/*
* Construct the root vdev to pass to zpool_vdev_attach(). While adding
* the entire vdev structure is harmless, we construct a reduced set of
@ -506,9 +566,11 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* Wait for udev to verify the links exist, then auto-replace
* the leaf disk at same physical location.
*/
if (zpool_label_disk_wait(path, 3000) != 0) {
zed_log_msg(LOG_WARNING, "zfs_mod: expected replacement "
"disk %s is missing", path);
if (zpool_label_disk_wait(path, DISK_LABEL_WAIT) != 0) {
zed_log_msg(LOG_WARNING, "zfs_mod: pool '%s', after labeling "
"replacement disk, the expected disk partition link '%s' "
"is missing after waiting %u ms",
zpool_get_name(zhp), path, DISK_LABEL_WAIT);
nvlist_free(nvroot);
return;
}
@ -523,7 +585,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
B_TRUE, B_FALSE);
}
zed_log_msg(LOG_INFO, " zpool_vdev_replace: %s with %s (%s)",
zed_log_msg(LOG_WARNING, " zpool_vdev_replace: %s with %s (%s)",
fullpath, path, (ret == 0) ? "no errors" :
libzfs_error_description(g_zfshdl));
@ -621,7 +683,7 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
dp->dd_prop, path);
dp->dd_found = B_TRUE;
/* pass the new devid for use by replacing code */
/* pass the new devid for use by auto-replacing code */
if (dp->dd_new_devid != NULL) {
(void) nvlist_add_string(nvl, "new_devid",
dp->dd_new_devid);
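Review note: the auto-replace change in this file only rewrites the vdev path when it lives under `/dev/disk/by-id/`, since other persistent paths (such as by-vdev aliases) survive a disk swap. A minimal sketch of that decision follows. `replacement_path()` is a hypothetical helper, and the NULL check on `new_devid` is an added assumption that the diff does not show.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEV_BYID_PATH "/dev/disk/by-id/"

/*
 * Return the path to hand to the attach step. by-id paths embed the
 * device identity, so they are rebuilt from the new devid; any other
 * path is returned unchanged.
 */
static const char *
replacement_path(const char *path, const char *new_devid, char *buf,
    size_t len)
{
	assert(path != NULL && buf != NULL);
	if (new_devid != NULL &&
	    strncmp(path, DEV_BYID_PATH, strlen(DEV_BYID_PATH)) == 0) {
		(void) snprintf(buf, len, "%s%s", DEV_BYID_PATH, new_devid);
		return (buf);
	}
	return (path);
}
```

Returning the original pointer for untouched paths mirrors the diff, where `path` is reassigned to `pathbuf` only inside the by-id branch.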


@ -523,6 +523,9 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.checksum")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.slow_io")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.device")) {
fault_device = B_FALSE;


@ -5,7 +5,7 @@
#
# Bad SCSI disks can often "disappear and reappear" causing all sorts of chaos
# as they flip between FAULTED and ONLINE. If
# ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is set in zed.rc, and the disk gets
# ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is set in zed.rc, and the disk gets
# FAULTED, then power down the slot via sysfs:
#
# /sys/class/enclosure/<enclosure>/<slot>/power_status
@ -19,7 +19,7 @@
# Exit codes:
# 0: slot successfully powered off
# 1: enclosure not available
# 2: ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT disabled
# 2: ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT disabled
# 3: vdev was not FAULTED
# 4: The enclosure sysfs path passed from ZFS does not exist
# 5: Enclosure slot didn't actually turn off after we told it to
@ -32,7 +32,7 @@ if [ ! -d /sys/class/enclosure ] ; then
exit 1
fi
if [ "${ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT}" != "1" ] ; then
if [ "${ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT}" != "1" ] ; then
exit 2
fi


@ -205,6 +205,10 @@ zed_notify()
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_ntfy "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
[ "${num_success}" -gt 0 ] && return 0
[ "${num_failure}" -gt 0 ] && return 1
return 2
@ -527,6 +531,100 @@ zed_notify_pushover()
}
# zed_notify_ntfy (subject, pathname)
#
# Send a notification via Ntfy.sh <https://ntfy.sh/>.
# The ntfy topic (ZED_NTFY_TOPIC) identifies the topic that the notification
# will be published to. The ntfy url (ZED_NTFY_URL) defines the location of
# the self-hosted or public ntfy service. The ntfy access token
# <https://docs.ntfy.sh/publish/#access-tokens> (ZED_NTFY_ACCESS_TOKEN) is an
# access token that can be used if a topic is read/write protected. If a
# topic can be written to publicly, a ZED_NTFY_ACCESS_TOKEN is not required.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://docs.ntfy.sh
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_NTFY_TOPIC
# ZED_NTFY_ACCESS_TOKEN (OPTIONAL)
# ZED_NTFY_URL
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_ntfy()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
[ -n "${ZED_NTFY_TOPIC}" ] || return 2
local url="${ZED_NTFY_URL:-"https://ntfy.sh"}/${ZED_NTFY_TOPIC}"
if [ ! -r "${pathname}" ]; then
zed_log_err "ntfy cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
if [ -n "${ZED_NTFY_ACCESS_TOKEN}" ]; then
msg_out="$( \
curl \
-u ":${ZED_NTFY_ACCESS_TOKEN}" \
-H "Title: ${subject}" \
-d "${msg_body}" \
-H "Priority: high" \
"${url}" \
2>/dev/null \
)"; rv=$?
else
msg_out="$( \
curl \
-H "Title: ${subject}" \
-d "${msg_body}" \
-H "Priority: high" \
"${url}" \
2>/dev/null \
)"; rv=$?
fi
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "ntfy \"${msg_err}\""
return 1
fi
return 0
}
# zed_rate_limit (tag, [interval])
#
# Check whether an event of a given type [tag] has already occurred within the


@ -146,4 +146,26 @@ ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"
# Power off the drive's slot in the enclosure if it becomes FAULTED. This can
# help silence misbehaving drives. This assumes your drive enclosure fully
# supports slot power control via sysfs.
#ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT=1
#ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT=1
##
# Ntfy topic
# This defines which topic will receive the ntfy notification.
# <https://docs.ntfy.sh/publish/>
# Disabled by default; uncomment to enable.
#ZED_NTFY_TOPIC=""
##
# Ntfy access token (optional for public topics)
# This defines an access token used to authenticate
# when publishing to protected topics.
# <https://docs.ntfy.sh/publish/#access-tokens>
# Disabled by default; uncomment to enable.
#ZED_NTFY_ACCESS_TOKEN=""
##
# Ntfy Service URL
# This defines which service the ntfy call will be directed toward
# <https://docs.ntfy.sh/install/>
# https://ntfy.sh by default; uncomment to enable an alternative service url.
#ZED_NTFY_URL="https://ntfy.sh"


@ -35,6 +35,7 @@
#include "zed_strings.h"
#include "agents/zfs_agents.h"
#include <libzutil.h>
#define MAXBUF 4096
@ -922,6 +923,25 @@ _zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
}
}
static void
_zed_event_update_enc_sysfs_path(nvlist_t *nvl)
{
const char *vdev_path;
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_PATH,
&vdev_path) != 0) {
return; /* some other kind of event, ignore it */
}
if (vdev_path == NULL) {
return;
}
update_vdev_config_dev_sysfs_path(nvl, vdev_path,
FM_EREPORT_PAYLOAD_ZFS_VDEV_ENC_SYSFS_PATH);
}
/*
* Service the next zevent, blocking until one is available.
*/
@ -969,6 +989,17 @@ zed_event_service(struct zed_conf *zcp)
zed_log_msg(LOG_WARNING,
"Failed to lookup zevent class (eid=%llu)", eid);
} else {
/*
* Special case: If we can dynamically detect an enclosure sysfs
* path, then use that value rather than the one stored in the
* vd->vdev_enc_sysfs_path. There have been rare cases where
* vd->vdev_enc_sysfs_path becomes outdated. However, there
* will be other times when we cannot dynamically detect the
* sysfs path (like if a disk disappears) and have to rely on
* the old value for things like turning on the fault LED.
*/
_zed_event_update_enc_sysfs_path(nvl);
/* let internal modules see this event first */
zfs_agent_post_event(class, NULL, nvl);


@ -309,7 +309,8 @@ get_usage(zfs_help_t idx)
"[filesystem|volume|snapshot] ...\n"));
case HELP_MOUNT:
return (gettext("\tmount\n"
"\tmount [-flvO] [-o opts] <-a | filesystem>\n"));
"\tmount [-flvO] [-o opts] <-a|-R filesystem|"
"filesystem>\n"));
case HELP_PROMOTE:
return (gettext("\tpromote <clone-filesystem>\n"));
case HELP_RECEIVE:
@ -3672,15 +3673,25 @@ zfs_do_list(int argc, char **argv)
for (char *tok; (tok = strsep(&optarg, ",")); ) {
static const char *const type_subopts[] = {
"filesystem", "volume",
"snapshot", "snap",
"filesystem",
"fs",
"volume",
"vol",
"snapshot",
"snap",
"bookmark",
"all" };
"all"
};
static const int type_types[] = {
ZFS_TYPE_FILESYSTEM, ZFS_TYPE_VOLUME,
ZFS_TYPE_SNAPSHOT, ZFS_TYPE_SNAPSHOT,
ZFS_TYPE_FILESYSTEM,
ZFS_TYPE_FILESYSTEM,
ZFS_TYPE_VOLUME,
ZFS_TYPE_VOLUME,
ZFS_TYPE_SNAPSHOT,
ZFS_TYPE_SNAPSHOT,
ZFS_TYPE_BOOKMARK,
ZFS_TYPE_DATASET | ZFS_TYPE_BOOKMARK };
ZFS_TYPE_DATASET | ZFS_TYPE_BOOKMARK
};
for (c = 0; c < ARRAY_SIZE(type_subopts); ++c)
if (strcmp(tok, type_subopts[c]) == 0) {
@ -6740,6 +6751,8 @@ zfs_do_holds(int argc, char **argv)
#define MOUNT_TIME 1 /* seconds */
typedef struct get_all_state {
char **ga_datasets;
int ga_count;
boolean_t ga_verbose;
get_all_cb_t *ga_cbp;
} get_all_state_t;
@ -6786,19 +6799,35 @@ get_one_dataset(zfs_handle_t *zhp, void *data)
return (0);
}
static void
get_all_datasets(get_all_cb_t *cbp, boolean_t verbose)
static int
get_recursive_datasets(zfs_handle_t *zhp, void *data)
{
get_all_state_t state = {
.ga_verbose = verbose,
.ga_cbp = cbp
};
get_all_state_t *state = data;
int len = strlen(zfs_get_name(zhp));
for (int i = 0; i < state->ga_count; ++i) {
if (strcmp(state->ga_datasets[i], zfs_get_name(zhp)) == 0)
return (get_one_dataset(zhp, data));
else if ((strncmp(state->ga_datasets[i], zfs_get_name(zhp),
len) == 0) && state->ga_datasets[i][len] == '/') {
(void) zfs_iter_filesystems_v2(zhp, 0,
get_recursive_datasets, data);
}
}
zfs_close(zhp);
return (0);
}
if (verbose)
static void
get_all_datasets(get_all_state_t *state)
{
if (state->ga_verbose)
set_progress_header(gettext("Reading ZFS config"));
(void) zfs_iter_root(g_zfs, get_one_dataset, &state);
if (state->ga_datasets == NULL)
(void) zfs_iter_root(g_zfs, get_one_dataset, state);
else
(void) zfs_iter_root(g_zfs, get_recursive_datasets, state);
if (verbose)
if (state->ga_verbose)
finish_progress(gettext("done."));
}
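Review note: `get_recursive_datasets()` makes a three-way decision for every dataset it visits, relative to each name given with `-R`: collect it on an exact match, descend into it if it is a proper ancestor of the target, or skip it. The sketch below isolates that string test. The `walk_action` enum and `classify()` are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <string.h>

enum walk_action { WALK_SKIP = 0, WALK_DESCEND, WALK_COLLECT };

/*
 * Decide what to do with dataset `current` while looking for `target`.
 * An ancestor must match up to a '/' boundary, so "tank/a" is an
 * ancestor of "tank/a/b" but "tank/ab" is not.
 */
static enum walk_action
classify(const char *current, const char *target)
{
	assert(current != NULL && target != NULL);
	size_t len = strlen(current);

	if (strcmp(target, current) == 0)
		return (WALK_COLLECT);
	if (strncmp(target, current, len) == 0 && target[len] == '/')
		return (WALK_DESCEND);
	return (WALK_SKIP);
}
```

Reading `target[len]` is safe here because it is only evaluated after `strncmp()` has confirmed that `target` contains at least `len` matching characters, so the index is at worst the terminating NUL.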
@ -7144,18 +7173,22 @@ static int
share_mount(int op, int argc, char **argv)
{
int do_all = 0;
int recursive = 0;
boolean_t verbose = B_FALSE;
int c, ret = 0;
char *options = NULL;
int flags = 0;
/* check options */
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":alvo:Of" : "al"))
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":aRlvo:Of" : "al"))
!= -1) {
switch (c) {
case 'a':
do_all = 1;
break;
case 'R':
recursive = 1;
break;
case 'v':
verbose = B_TRUE;
break;
@ -7197,7 +7230,7 @@ share_mount(int op, int argc, char **argv)
argv += optind;
/* check number of arguments */
if (do_all) {
if (do_all || recursive) {
enum sa_protocol protocol = SA_NO_PROTOCOL;
if (op == OP_SHARE && argc > 0) {
@ -7206,14 +7239,38 @@ share_mount(int op, int argc, char **argv)
argv++;
}
if (argc != 0) {
if (argc != 0 && do_all) {
(void) fprintf(stderr, gettext("too many arguments\n"));
usage(B_FALSE);
}
if (argc == 0 && recursive) {
(void) fprintf(stderr,
gettext("no dataset provided\n"));
usage(B_FALSE);
}
start_progress_timer();
get_all_cb_t cb = { 0 };
get_all_datasets(&cb, verbose);
get_all_state_t state = { 0 };
if (argc == 0) {
state.ga_datasets = NULL;
state.ga_count = -1;
} else {
zfs_handle_t *zhp;
for (int i = 0; i < argc; i++) {
zhp = zfs_open(g_zfs, argv[i],
ZFS_TYPE_FILESYSTEM);
if (zhp == NULL)
usage(B_FALSE);
zfs_close(zhp);
}
state.ga_datasets = argv;
state.ga_count = argc;
}
state.ga_verbose = verbose;
state.ga_cbp = &cb;
get_all_datasets(&state);
if (cb.cb_used == 0) {
free(options);
@ -7230,7 +7287,8 @@ share_mount(int op, int argc, char **argv)
pthread_mutex_init(&share_mount_state.sm_lock, NULL);
/* For a 'zfs share -a' operation start with a clean slate. */
zfs_truncate_shares(NULL);
if (op == OP_SHARE)
zfs_truncate_shares(NULL);
/*
* libshare isn't mt-safe, so only do the operation in parallel


@ -1083,6 +1083,22 @@ main(int argc, char **argv)
libzfs_fini(g_zfs);
return (1);
}
if (record.zi_nlanes) {
switch (io_type) {
case ZIO_TYPE_READ:
case ZIO_TYPE_WRITE:
case ZIO_TYPES:
break;
default:
(void) fprintf(stderr, "I/O type for a delay "
"must be 'read' or 'write'\n");
usage();
libzfs_fini(g_zfs);
return (1);
}
}
if (!error)
error = ENXIO;


@ -1,6 +1,9 @@
# Features which are supported by GRUB2
allocation_classes
async_destroy
block_cloning
bookmarks
device_rebuild
embedded_data
empty_bpobj
enabled_txg
@ -9,6 +12,12 @@ filesystem_limits
hole_birth
large_blocks
livelist
log_spacemap
lz4_compress
project_quota
resilver_defer
spacemap_histogram
spacemap_v2
userobj_accounting
zilsaxattr
zpool_checkpoint


@ -6,7 +6,6 @@ edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth


@ -124,3 +124,24 @@ check_file(const char *file, boolean_t force, boolean_t isspare)
{
return (check_file_generic(file, force, isspare));
}
int
zpool_power_current_state(zpool_handle_t *zhp, char *vdev)
{
(void) zhp;
(void) vdev;
/* Enclosure slot power not supported on FreeBSD yet */
return (-1);
}
int
zpool_power(zpool_handle_t *zhp, char *vdev, boolean_t turn_on)
{
(void) zhp;
(void) vdev;
(void) turn_on;
/* Enclosure slot power not supported on FreeBSD yet */
return (ENOTSUP);
}


@ -416,3 +416,258 @@ check_file(const char *file, boolean_t force, boolean_t isspare)
{
return (check_file_generic(file, force, isspare));
}
/*
* Read from a sysfs file and return an allocated string. Removes
* the newline from the end of the string if there is one.
*
* Returns a string on success (which must be freed), or NULL on error.
*/
static char *zpool_sysfs_gets(char *path)
{
int fd;
struct stat statbuf;
char *buf = NULL;
ssize_t count = 0;
fd = open(path, O_RDONLY);
if (fd < 0)
return (NULL);
if (fstat(fd, &statbuf) != 0) {
close(fd);
return (NULL);
}
buf = calloc(statbuf.st_size + 1, sizeof (*buf));
if (buf == NULL) {
close(fd);
return (NULL);
}
/*
* Note, we can read fewer bytes than st_size, and that's ok. Sysfs
* files will report their size as 4k even if they only return a small
* string.
*/
count = read(fd, buf, statbuf.st_size);
if (count < 0) {
/* Error doing read() or we overran the buffer */
close(fd);
free(buf);
return (NULL);
}
/* Remove trailing newline */
if (count > 0 && buf[count - 1] == '\n')
buf[count - 1] = 0;
close(fd);
return (buf);
}
/*
* Write a string to a sysfs file.
*
* Returns 0 on success, non-zero otherwise.
*/
static int zpool_sysfs_puts(char *path, char *str)
{
FILE *file;
file = fopen(path, "w");
if (!file) {
return (-1);
}
if (fputs(str, file) < 0) {
fclose(file);
return (-2);
}
fclose(file);
return (0);
}
/* Given a vdev nvlist_t, rescan its enclosure sysfs path */
static void
rescan_vdev_config_dev_sysfs_path(nvlist_t *vdev_nv)
{
update_vdev_config_dev_sysfs_path(vdev_nv,
fnvlist_lookup_string(vdev_nv, ZPOOL_CONFIG_PATH),
ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH);
}
/*
* Given a power string: "on", "off", "1", or "0", return 0 if it's an
* off value, 1 if it's an on value, and -1 if the value is unrecognized.
*/
static int zpool_power_parse_value(char *str)
{
if ((strcmp(str, "off") == 0) || (strcmp(str, "0") == 0))
return (0);
if ((strcmp(str, "on") == 0) || (strcmp(str, "1") == 0))
return (1);
return (-1);
}
/*
* Given a vdev string return an allocated string containing the sysfs path to
* its power control file. Also do a check if the power control file really
* exists and has correct permissions.
*
* Example returned strings:
*
* /sys/class/enclosure/0:0:122:0/10/power_status
* /sys/bus/pci/slots/10/power
*
* Returns allocated string on success (which must be freed), NULL on failure.
*/
static char *
zpool_power_sysfs_path(zpool_handle_t *zhp, char *vdev)
{
const char *enc_sysfs_dir = NULL;
char *path = NULL;
nvlist_t *vdev_nv = zpool_find_vdev(zhp, vdev, NULL, NULL, NULL);
if (vdev_nv == NULL) {
return (NULL);
}
/* Make sure we're getting the updated enclosure sysfs path */
rescan_vdev_config_dev_sysfs_path(vdev_nv);
if (nvlist_lookup_string(vdev_nv, ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH,
&enc_sysfs_dir) != 0) {
return (NULL);
}
if (asprintf(&path, "%s/power_status", enc_sysfs_dir) == -1)
return (NULL);
if (access(path, W_OK) != 0) {
free(path);
path = NULL;
/* No HDD 'power_status' file, maybe it's NVMe? */
if (asprintf(&path, "%s/power", enc_sysfs_dir) == -1) {
return (NULL);
}
if (access(path, R_OK | W_OK) != 0) {
/* Not NVMe either */
free(path);
return (NULL);
}
}
return (path);
}
/*
* Given a path to a sysfs power control file, return B_TRUE if you should use
* "on/off" words to control it, or B_FALSE otherwise ("0/1" to control).
*/
static boolean_t
zpool_power_use_word(char *sysfs_path)
{
if (strcmp(&sysfs_path[strlen(sysfs_path) - strlen("power_status")],
"power_status") == 0) {
return (B_TRUE);
}
return (B_FALSE);
}
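Review note: `zpool_power_use_word()` indexes the path at `strlen(sysfs_path) - strlen("power_status")`, which assumes the path is at least as long as the suffix. The paths built by `zpool_power_sysfs_path()` are normally long enough, but a general-purpose suffix test would guard the subtraction. A minimal sketch, with `has_suffix()` as a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `s` ends with `suffix`, 0 otherwise. */
static int
has_suffix(const char *s, const char *suffix)
{
	assert(s != NULL && suffix != NULL);
	size_t slen = strlen(s);
	size_t xlen = strlen(suffix);

	/* Guard first: slen - xlen would wrap around for short strings. */
	if (slen < xlen)
		return (0);
	return (strcmp(s + (slen - xlen), suffix) == 0);
}
```

With this helper the word-versus-digit choice becomes `has_suffix(path, "power_status")`: enclosure files take "on"/"off", while the PCI slot `power` file takes "1"/"0".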
/*
* Check the sysfs power control value for a vdev.
*
* Returns:
* 0 - Power is off
* 1 - Power is on
* -1 - Error or unsupported
*/
int
zpool_power_current_state(zpool_handle_t *zhp, char *vdev)
{
char *val;
int rc;
char *path = zpool_power_sysfs_path(zhp, vdev);
if (path == NULL)
return (-1);
val = zpool_sysfs_gets(path);
if (val == NULL) {
free(path);
return (-1);
}
rc = zpool_power_parse_value(val);
free(val);
free(path);
return (rc);
}
/*
* Turn on or off the slot to a device
*
* Device path is the full path to the device (like /dev/sda or /dev/sda1).
*
* Return code:
* 0: Success
* ENOTSUP: Power control not supported for OS
* EBADSLT: Couldn't read current power state
* ENOENT: No sysfs path to power control
* EIO: Couldn't write sysfs power value
* EBADE: Sysfs power value didn't change
*/
int
zpool_power(zpool_handle_t *zhp, char *vdev, boolean_t turn_on)
{
char *sysfs_path;
const char *val;
int rc;
int timeout_ms;
rc = zpool_power_current_state(zhp, vdev);
if (rc == -1) {
return (EBADSLT);
}
/* Already correct value? */
if (rc == (int)turn_on)
return (0);
sysfs_path = zpool_power_sysfs_path(zhp, vdev);
if (sysfs_path == NULL)
return (ENOENT);
if (zpool_power_use_word(sysfs_path)) {
val = turn_on ? "on" : "off";
} else {
val = turn_on ? "1" : "0";
}
rc = zpool_sysfs_puts(sysfs_path, (char *)val);
free(sysfs_path);
if (rc != 0) {
return (EIO);
}
/*
* Wait up to 30 seconds for sysfs power value to change after
* writing it.
*/
timeout_ms = zpool_getenv_int("ZPOOL_POWER_ON_SLOT_TIMEOUT_MS", 30000);
for (int i = 0; i < MAX(1, timeout_ms / 200); i++) {
rc = zpool_power_current_state(zhp, vdev);
if (rc == (int)turn_on)
return (0); /* success */
fsleep(0.200); /* 200ms */
}
/* sysfs value never changed */
return (EBADE);
}
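Review note: the wait loop at the end of `zpool_power()` re-reads the sysfs value every 200 ms, makes at least one attempt, and gives up with `EBADE` once the timeout is used up. The sketch below models just the attempt count and the loop shape. `poll_attempts()`, `wait_for_state()` and the `third_read_probe()` stand-in are hypothetical, and the 200 ms sleep between reads is left out so the example stays portable.

```c
#include <assert.h>
#include <stddef.h>

#define POLL_INTERVAL_MS 200

/* Number of reads to make for a given timeout; never fewer than one. */
static int
poll_attempts(int timeout_ms)
{
	int n = timeout_ms / POLL_INTERVAL_MS;

	return (n > 1 ? n : 1);
}

typedef int (*state_fn)(void *arg);

/*
 * Call read_state() until it returns `want` or the attempts run out.
 * Returns 0 on success, -1 if the value never changed. The real loop
 * sleeps POLL_INTERVAL_MS between reads.
 */
static int
wait_for_state(state_fn read_state, void *arg, int want, int timeout_ms)
{
	assert(read_state != NULL);
	int attempts = poll_attempts(timeout_ms);

	for (int i = 0; i < attempts; i++) {
		if (read_state(arg) == want)
			return (0);
	}
	return (-1);
}

/* Stand-in probe: reports "on" (1) starting with the third read. */
static int
third_read_probe(void *arg)
{
	int *reads = arg;

	return (++(*reads) >= 3);
}
```

The default of 30000 ms yields 150 reads, and any timeout below 400 ms collapses to a single read.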


@ -33,10 +33,18 @@ for i in $scripts ; do
val=""
case $i in
enc)
val=$(ls "$VDEV_ENC_SYSFS_PATH/../../" 2>/dev/null)
if echo "$VDEV_ENC_SYSFS_PATH" | grep -q '/sys/bus/pci/slots' ; then
val="$VDEV_ENC_SYSFS_PATH"
else
val="$(ls """$VDEV_ENC_SYSFS_PATH/../../""" 2>/dev/null)"
fi
;;
slot)
val=$(cat "$VDEV_ENC_SYSFS_PATH/slot" 2>/dev/null)
if echo "$VDEV_ENC_SYSFS_PATH" | grep -q '/sys/bus/pci/slots' ; then
val="$(basename """$VDEV_ENC_SYSFS_PATH""")"
else
val="$(cat """$VDEV_ENC_SYSFS_PATH/slot""" 2>/dev/null)"
fi
;;
encdev)
val=$(ls "$VDEV_ENC_SYSFS_PATH/../device/scsi_generic" 2>/dev/null)

@ -443,37 +443,22 @@ vdev_run_cmd(vdev_cmd_data_t *data, char *cmd)
{
int rc;
char *argv[2] = {cmd};
char *env[5] = {(char *)"PATH=/bin:/sbin:/usr/bin:/usr/sbin"};
char **env;
char **lines = NULL;
int lines_cnt = 0;
int i;
/* Setup our custom environment variables */
rc = asprintf(&env[1], "VDEV_PATH=%s",
data->path ? data->path : "");
if (rc == -1) {
env[1] = NULL;
env = zpool_vdev_script_alloc_env(data->pool, data->path, data->upath,
data->vdev_enc_sysfs_path, NULL, NULL);
if (env == NULL)
goto out;
}
rc = asprintf(&env[2], "VDEV_UPATH=%s",
data->upath ? data->upath : "");
if (rc == -1) {
env[2] = NULL;
goto out;
}
rc = asprintf(&env[3], "VDEV_ENC_SYSFS_PATH=%s",
data->vdev_enc_sysfs_path ?
data->vdev_enc_sysfs_path : "");
if (rc == -1) {
env[3] = NULL;
goto out;
}
/* Run the command */
rc = libzfs_run_process_get_stdout_nopath(cmd, argv, env, &lines,
&lines_cnt);
zpool_vdev_script_free_env(env);
if (rc != 0)
goto out;
@ -485,10 +470,6 @@ vdev_run_cmd(vdev_cmd_data_t *data, char *cmd)
out:
if (lines != NULL)
libzfs_free_str_array(lines, lines_cnt);
/* Start with i = 1 since env[0] was statically allocated */
for (i = 1; i < ARRAY_SIZE(env); i++)
free(env[i]);
}
/*
@ -573,6 +554,10 @@ for_each_vdev_run_cb(void *zhp_data, nvlist_t *nv, void *cb_vcdl)
if (nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) != 0)
return (1);
/* Make sure we're getting the updated enclosure sysfs path */
update_vdev_config_dev_sysfs_path(nv, path,
ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH);
nvlist_lookup_string(nv, ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH,
&vdev_enc_sysfs_path);

@ -22,7 +22,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright (c) 2012 by Frederik Wessels. All rights reserved.
* Copyright (c) 2012 by Cyril Plisko. All rights reserved.
* Copyright (c) 2013 by Prasad Joshi (sTec). All rights reserved.
@ -131,6 +131,13 @@ static int zpool_do_help(int argc, char **argv);
static zpool_compat_status_t zpool_do_load_compat(
const char *, boolean_t *);
enum zpool_options {
ZPOOL_OPTION_POWER = 1024,
ZPOOL_OPTION_ALLOW_INUSE,
ZPOOL_OPTION_ALLOW_REPLICATION_MISMATCH,
ZPOOL_OPTION_ALLOW_ASHIFT_MISMATCH
};
/*
* These libumem hooks provide a reasonable set of defaults for the allocator's
* debugging facilities.
@ -347,13 +354,13 @@ get_usage(zpool_help_t idx)
{
switch (idx) {
case HELP_ADD:
return (gettext("\tadd [-fgLnP] [-o property=value] "
return (gettext("\tadd [-afgLnP] [-o property=value] "
"<pool> <vdev> ...\n"));
case HELP_ATTACH:
return (gettext("\tattach [-fsw] [-o property=value] "
"<pool> <device> <new-device>\n"));
case HELP_CLEAR:
return (gettext("\tclear [-nF] <pool> [device]\n"));
return (gettext("\tclear [[--power]|[-nF]] <pool> [device]\n"));
case HELP_CREATE:
return (gettext("\tcreate [-fnd] [-o property=value] ... \n"
"\t [-O file-system-property=value] ... \n"
@ -389,9 +396,11 @@ get_usage(zpool_help_t idx)
"[-T d|u] [pool] ... \n"
"\t [interval [count]]\n"));
case HELP_OFFLINE:
return (gettext("\toffline [-f] [-t] <pool> <device> ...\n"));
return (gettext("\toffline [--power]|[[-f][-t]] <pool> "
"<device> ...\n"));
case HELP_ONLINE:
return (gettext("\tonline [-e] <pool> <device> ...\n"));
return (gettext("\tonline [--power][-e] <pool> <device> "
"...\n"));
case HELP_REPLACE:
return (gettext("\treplace [-fsw] [-o property=value] "
"<pool> <device> [new-device]\n"));
@ -410,8 +419,8 @@ get_usage(zpool_help_t idx)
return (gettext("\ttrim [-dw] [-r <rate>] [-c | -s] <pool> "
"[<device> ...]\n"));
case HELP_STATUS:
return (gettext("\tstatus [-c [script1,script2,...]] "
"[-igLpPstvxD] [-T d|u] [pool] ... \n"
return (gettext("\tstatus [--power] [-c [script1,script2,...]] "
"[-DegiLpPstvx] [-T d|u] [pool] ...\n"
"\t [interval [count]]\n"));
case HELP_UPGRADE:
return (gettext("\tupgrade\n"
@ -516,6 +525,77 @@ print_vdev_prop_cb(int prop, void *cb)
return (ZPROP_CONT);
}
/*
* Given a leaf vdev name like 'L5' return its VDEV_CONFIG_PATH like
* '/dev/disk/by-vdev/L5'.
*/
static const char *
vdev_name_to_path(zpool_handle_t *zhp, char *vdev)
{
nvlist_t *vdev_nv = zpool_find_vdev(zhp, vdev, NULL, NULL, NULL);
if (vdev_nv == NULL) {
return (NULL);
}
return (fnvlist_lookup_string(vdev_nv, ZPOOL_CONFIG_PATH));
}
static int
zpool_power_on(zpool_handle_t *zhp, char *vdev)
{
return (zpool_power(zhp, vdev, B_TRUE));
}
static int
zpool_power_on_and_disk_wait(zpool_handle_t *zhp, char *vdev)
{
int rc;
rc = zpool_power_on(zhp, vdev);
if (rc != 0)
return (rc);
zpool_disk_wait(vdev_name_to_path(zhp, vdev));
return (0);
}
static int
zpool_power_on_pool_and_wait_for_devices(zpool_handle_t *zhp)
{
nvlist_t *nv;
const char *path = NULL;
int rc;
/* Power up all the devices first */
FOR_EACH_REAL_LEAF_VDEV(zhp, nv) {
path = fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH);
if (path != NULL) {
rc = zpool_power_on(zhp, (char *)path);
if (rc != 0) {
return (rc);
}
}
}
/*
* Wait for their devices to show up. Since we powered them on
* at roughly the same time, they should all come online around
* the same time.
*/
FOR_EACH_REAL_LEAF_VDEV(zhp, nv) {
path = fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH);
zpool_disk_wait(path);
}
return (0);
}
static int
zpool_power_off(zpool_handle_t *zhp, char *vdev)
{
return (zpool_power(zhp, vdev, B_FALSE));
}
/*
* Display usage message. If we're inside a command, display only the usage for
* that command. Otherwise, iterate over the entire command table and display
@ -936,8 +1016,9 @@ add_prop_list_default(const char *propname, const char *propval,
}
/*
* zpool add [-fgLnP] [-o property=value] <pool> <vdev> ...
* zpool add [-afgLnP] [-o property=value] <pool> <vdev> ...
*
* -a Disable the ashift validation checks
* -f Force addition of devices, even if they appear in use
* -g Display guid for individual vdev name.
* -L Follow links when resolving vdev path name.
@ -953,8 +1034,11 @@ add_prop_list_default(const char *propname, const char *propval,
int
zpool_do_add(int argc, char **argv)
{
boolean_t force = B_FALSE;
boolean_t check_replication = B_TRUE;
boolean_t check_inuse = B_TRUE;
boolean_t dryrun = B_FALSE;
boolean_t check_ashift = B_TRUE;
boolean_t force = B_FALSE;
int name_flags = 0;
int c;
nvlist_t *nvroot;
@ -965,8 +1049,18 @@ zpool_do_add(int argc, char **argv)
nvlist_t *props = NULL;
char *propval;
struct option long_options[] = {
{"allow-in-use", no_argument, NULL, ZPOOL_OPTION_ALLOW_INUSE},
{"allow-replication-mismatch", no_argument, NULL,
ZPOOL_OPTION_ALLOW_REPLICATION_MISMATCH},
{"allow-ashift-mismatch", no_argument, NULL,
ZPOOL_OPTION_ALLOW_ASHIFT_MISMATCH},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "fgLno:P")) != -1) {
while ((c = getopt_long(argc, argv, "fgLno:P", long_options, NULL))
!= -1) {
switch (c) {
case 'f':
force = B_TRUE;
@ -996,6 +1090,15 @@ zpool_do_add(int argc, char **argv)
case 'P':
name_flags |= VDEV_NAME_PATH;
break;
case ZPOOL_OPTION_ALLOW_INUSE:
check_inuse = B_FALSE;
break;
case ZPOOL_OPTION_ALLOW_REPLICATION_MISMATCH:
check_replication = B_FALSE;
break;
case ZPOOL_OPTION_ALLOW_ASHIFT_MISMATCH:
check_ashift = B_FALSE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -1016,6 +1119,19 @@ zpool_do_add(int argc, char **argv)
usage(B_FALSE);
}
if (force) {
if (!check_inuse || !check_replication || !check_ashift) {
(void) fprintf(stderr, gettext("'-f' option is not "
"allowed with '--allow-replication-mismatch', "
"'--allow-ashift-mismatch', or "
"'--allow-in-use'\n"));
usage(B_FALSE);
}
check_inuse = B_FALSE;
check_replication = B_FALSE;
check_ashift = B_FALSE;
}
poolname = argv[0];
argc--;
@ -1046,8 +1162,8 @@ zpool_do_add(int argc, char **argv)
}
/* pass off to make_root_vdev for processing */
nvroot = make_root_vdev(zhp, props, force, !force, B_FALSE, dryrun,
argc, argv);
nvroot = make_root_vdev(zhp, props, !check_inuse,
check_replication, B_FALSE, dryrun, argc, argv);
if (nvroot == NULL) {
zpool_close(zhp);
return (1);
@ -1151,7 +1267,7 @@ zpool_do_add(int argc, char **argv)
ret = 0;
} else {
ret = (zpool_add(zhp, nvroot) != 0);
ret = (zpool_add(zhp, nvroot, check_ashift) != 0);
}
nvlist_free(props);
@ -2088,11 +2204,13 @@ typedef struct status_cbdata {
boolean_t cb_explain;
boolean_t cb_first;
boolean_t cb_dedup_stats;
boolean_t cb_print_unhealthy;
boolean_t cb_print_status;
boolean_t cb_print_slow_ios;
boolean_t cb_print_vdev_init;
boolean_t cb_print_vdev_trim;
vdev_cmd_data_list_t *vcdl;
boolean_t cb_print_power;
} status_cbdata_t;
/* Return 1 if string is NULL, empty, or whitespace; return 0 otherwise. */
@ -2171,7 +2289,6 @@ print_status_initialize(vdev_stat_t *vs, boolean_t verbose)
!vs->vs_scan_removing) {
char zbuf[1024];
char tbuf[256];
struct tm zaction_ts;
time_t t = vs->vs_initialize_action_time;
int initialize_pct = 100;
@ -2181,8 +2298,8 @@ print_status_initialize(vdev_stat_t *vs, boolean_t verbose)
100 / (vs->vs_initialize_bytes_est + 1));
}
(void) localtime_r(&t, &zaction_ts);
(void) strftime(tbuf, sizeof (tbuf), "%c", &zaction_ts);
(void) ctime_r(&t, tbuf);
tbuf[24] = 0;
switch (vs->vs_initialize_state) {
case VDEV_INITIALIZE_SUSPENDED:
@ -2222,7 +2339,6 @@ print_status_trim(vdev_stat_t *vs, boolean_t verbose)
!vs->vs_scan_removing) {
char zbuf[1024];
char tbuf[256];
struct tm zaction_ts;
time_t t = vs->vs_trim_action_time;
int trim_pct = 100;
@ -2231,8 +2347,8 @@ print_status_trim(vdev_stat_t *vs, boolean_t verbose)
100 / (vs->vs_trim_bytes_est + 1));
}
(void) localtime_r(&t, &zaction_ts);
(void) strftime(tbuf, sizeof (tbuf), "%c", &zaction_ts);
(void) ctime_r(&t, tbuf);
tbuf[24] = 0;
switch (vs->vs_trim_state) {
case VDEV_TRIM_SUSPENDED:
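
Note on the timestamp change in the two hunks above: `ctime_r()` produces a fixed-layout string of 24 characters followed by a newline and a terminating NUL, so writing a NUL at index 24 drops the trailing newline. A minimal sketch of that idiom follows; `format_action_time` is a hypothetical helper name, not a function from this change, and the sketch assumes a four-digit year.

```c
#define	_POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Format `t` as local time in the fixed "Thu Jan  1 00:00:00 1970" layout.
 * `buf` must hold at least 26 bytes. Returns buf, or NULL on failure.
 */
static char *
format_action_time(time_t t, char *buf, size_t buflen)
{
	if (buflen < 26 || ctime_r(&t, buf) == NULL)
		return (NULL);
	/* ctime_r() writes 24 characters, then '\n', then '\0'. */
	buf[24] = '\0';
	return (buf);
}
```

Unlike `strftime()` with `"%c"`, the `ctime_r()` layout does not depend on the locale, so the output width is predictable.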
@ -2283,6 +2399,35 @@ health_str_to_color(const char *health)
return (NULL);
}
/*
* Called for each leaf vdev. Returns 0 if the vdev is healthy.
* A vdev is unhealthy if any of the following are true:
* 1) there are read, write, or checksum errors,
* 2) its state is not ONLINE, or
* 3) slow IO reporting was requested (-s) and there are slow IOs.
*/
static int
vdev_health_check_cb(void *hdl_data, nvlist_t *nv, void *data)
{
status_cbdata_t *cb = data;
vdev_stat_t *vs;
uint_t vsc;
(void) hdl_data;
if (nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &vsc) != 0)
return (1);
if (vs->vs_checksum_errors || vs->vs_read_errors ||
vs->vs_write_errors || vs->vs_state != VDEV_STATE_HEALTHY)
return (1);
if (cb->cb_print_slow_ios && vs->vs_slow_ios)
return (1);
return (0);
}
/*
* Print out configuration state as requested by status_callback.
*/
@ -2301,7 +2446,8 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
const char *state;
const char *type;
const char *path = NULL;
const char *rcolor = NULL, *wcolor = NULL, *ccolor = NULL;
const char *rcolor = NULL, *wcolor = NULL, *ccolor = NULL,
*scolor = NULL;
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN,
&child, &children) != 0)
@ -2328,6 +2474,15 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
state = gettext("AVAIL");
}
/*
* If '-e' is specified then top-level vdevs and their children
* can be pruned if all of their leaves are healthy.
*/
if (cb->cb_print_unhealthy && depth > 0 &&
for_each_vdev_in_nvlist(nv, vdev_health_check_cb, cb) == 0) {
return;
}
printf_color(health_str_to_color(state),
"\t%*s%-*s %-8s", depth, "", cb->cb_namewidth - depth,
name, state);
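
Note on the `-e` pruning above: a vdev below the root is skipped when no leaf beneath it is unhealthy, where "unhealthy" means any read, write, or checksum error, a state other than healthy, or (only with `-s`) any slow I/Os. The sketch below restates that rule on a simplified tree. `struct demo_vdev` and the three functions are hypothetical stand-ins; the real code walks an nvlist with `for_each_vdev_in_nvlist()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vdev node; not the libzfs nvlist layout. */
struct demo_vdev {
	const char *name;
	bool online;
	unsigned long long read_errs, write_errs, cksum_errs, slow_ios;
	const struct demo_vdev *children;
	size_t nchildren;
};

/* A leaf is unhealthy on any error, a bad state, or counted slow I/Os. */
static bool
leaf_is_unhealthy(const struct demo_vdev *v, bool count_slow_ios)
{
	if (v->read_errs || v->write_errs || v->cksum_errs || !v->online)
		return (true);
	return (count_slow_ios && v->slow_ios != 0);
}

/* True if any leaf under (or equal to) v is unhealthy. */
static bool
subtree_is_unhealthy(const struct demo_vdev *v, bool count_slow_ios)
{
	if (v->nchildren == 0)
		return (leaf_is_unhealthy(v, count_slow_ios));
	for (size_t i = 0; i < v->nchildren; i++) {
		if (subtree_is_unhealthy(&v->children[i], count_slow_ios))
			return (true);
	}
	return (false);
}

/* Decide whether a node is printed; the root (depth 0) is never pruned. */
static bool
should_print(const struct demo_vdev *v, int depth, bool only_unhealthy,
    bool count_slow_ios)
{
	if (only_unhealthy && depth > 0 &&
	    !subtree_is_unhealthy(v, count_slow_ios))
		return (false);
	return (true);
}
```

With this rule a mirror whose only anomaly is slow I/O on one disk is hidden by `-e` alone but shown by `-e -s`.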
@ -2342,6 +2497,9 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
if (vs->vs_checksum_errors)
ccolor = ANSI_RED;
if (vs->vs_slow_ios)
scolor = ANSI_BLUE;
if (cb->cb_literal) {
fputc(' ', stdout);
printf_color(rcolor, "%5llu",
@ -2374,9 +2532,30 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
}
if (cb->cb_literal)
printf(" %5llu", (u_longlong_t)vs->vs_slow_ios);
printf_color(scolor, " %5llu",
(u_longlong_t)vs->vs_slow_ios);
else
printf(" %5s", rbuf);
printf_color(scolor, " %5s", rbuf);
}
if (cb->cb_print_power) {
if (children == 0) {
/* Only leaf vdevs have physical slots */
switch (zpool_power_current_state(zhp, (char *)
fnvlist_lookup_string(nv,
ZPOOL_CONFIG_PATH))) {
case 0:
printf_color(ANSI_RED, " %5s",
gettext("off"));
break;
case 1:
printf(" %5s", gettext("on"));
break;
default:
printf(" %5s", "-");
}
} else {
printf(" %5s", "-");
}
}
}
@ -2431,7 +2610,13 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
break;
case VDEV_AUX_ERR_EXCEEDED:
(void) printf(gettext("too many errors"));
if (vs->vs_read_errors + vs->vs_write_errors +
vs->vs_checksum_errors == 0 && children == 0 &&
vs->vs_slow_ios > 0) {
(void) printf(gettext("too many slow I/Os"));
} else {
(void) printf(gettext("too many errors"));
}
break;
case VDEV_AUX_IO_FAILURE:
@ -3258,10 +3443,10 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
ms_status = zpool_enable_datasets(zhp, mntopts, 0);
if (ms_status == EZFS_SHAREFAILED) {
(void) fprintf(stderr, gettext("Import was "
"successful, but unable to share some datasets"));
"successful, but unable to share some datasets\n"));
} else if (ms_status == EZFS_MOUNTFAILED) {
(void) fprintf(stderr, gettext("Import was "
"successful, but unable to mount some datasets"));
"successful, but unable to mount some datasets\n"));
}
}
@ -5428,19 +5613,6 @@ get_interval_count_filter_guids(int *argc, char **argv, float *interval,
interval, count);
}
/*
* Floating point sleep(). Allows you to pass in a floating point value for
* seconds.
*/
static void
fsleep(float sec)
{
struct timespec req;
req.tv_sec = floor(sec);
req.tv_nsec = (sec - (float)req.tv_sec) * NANOSEC;
nanosleep(&req, NULL);
}
/*
* Terminal height, in rows. Returns -1 if stdout is not connected to a TTY or
* if we were unable to determine its size.
@ -6940,9 +7112,10 @@ zpool_do_split(int argc, char **argv)
}
/*
* zpool online <pool> <device> ...
* zpool online [--power] <pool> <device> ...
*
* --power: Power on the enclosure slot to the drive (if possible)
*/
int
zpool_do_online(int argc, char **argv)
@ -6953,13 +7126,21 @@ zpool_do_online(int argc, char **argv)
int ret = 0;
vdev_state_t newstate;
int flags = 0;
boolean_t is_power_on = B_FALSE;
struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "e")) != -1) {
while ((c = getopt_long(argc, argv, "e", long_options, NULL)) != -1) {
switch (c) {
case 'e':
flags |= ZFS_ONLINE_EXPAND;
break;
case ZPOOL_OPTION_POWER:
is_power_on = B_TRUE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -6967,6 +7148,9 @@ zpool_do_online(int argc, char **argv)
}
}
if (libzfs_envvar_is_set("ZPOOL_AUTO_POWER_ON_SLOT"))
is_power_on = B_TRUE;
argc -= optind;
argv += optind;
@ -6988,6 +7172,18 @@ zpool_do_online(int argc, char **argv)
for (i = 1; i < argc; i++) {
vdev_state_t oldstate;
boolean_t avail_spare, l2cache;
int rc;
if (is_power_on) {
rc = zpool_power_on_and_disk_wait(zhp, argv[i]);
if (rc == ENOTSUP) {
(void) fprintf(stderr,
gettext("Power control not supported\n"));
}
if (rc != 0)
return (rc);
}
nvlist_t *tgt = zpool_find_vdev(zhp, argv[i], &avail_spare,
&l2cache, NULL);
if (tgt == NULL) {
@ -7033,12 +7229,15 @@ zpool_do_online(int argc, char **argv)
}
/*
* zpool offline [-ft] <pool> <device> ...
* zpool offline [-ft]|[--power] <pool> <device> ...
*
*
* -f Force the device into a faulted state.
*
* -t Only take the device off-line temporarily. The offline/faulted
* state will not be persistent across reboots.
*
* --power Power off the enclosure slot to the drive (if possible)
*/
int
zpool_do_offline(int argc, char **argv)
@ -7049,9 +7248,15 @@ zpool_do_offline(int argc, char **argv)
int ret = 0;
boolean_t istmp = B_FALSE;
boolean_t fault = B_FALSE;
boolean_t is_power_off = B_FALSE;
struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "ft")) != -1) {
while ((c = getopt_long(argc, argv, "ft", long_options, NULL)) != -1) {
switch (c) {
case 'f':
fault = B_TRUE;
@ -7059,6 +7264,9 @@ zpool_do_offline(int argc, char **argv)
case 't':
istmp = B_TRUE;
break;
case ZPOOL_OPTION_POWER:
is_power_off = B_TRUE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -7066,6 +7274,20 @@ zpool_do_offline(int argc, char **argv)
}
}
if (is_power_off && fault) {
(void) fprintf(stderr,
gettext("--power and -f cannot be used together\n"));
usage(B_FALSE);
return (1);
}
if (is_power_off && istmp) {
(void) fprintf(stderr,
gettext("--power and -t cannot be used together\n"));
usage(B_FALSE);
return (1);
}
argc -= optind;
argv += optind;
@ -7085,8 +7307,22 @@ zpool_do_offline(int argc, char **argv)
return (1);
for (i = 1; i < argc; i++) {
if (fault) {
uint64_t guid = zpool_vdev_path_to_guid(zhp, argv[i]);
uint64_t guid = zpool_vdev_path_to_guid(zhp, argv[i]);
if (is_power_off) {
/*
* Note: we have to power off first, then set REMOVED,
* or else zpool_vdev_set_removed_state() returns
* EAGAIN.
*/
ret = zpool_power_off(zhp, argv[i]);
if (ret != 0) {
(void) fprintf(stderr, "%s %s %d\n",
gettext("unable to power off slot for"),
argv[i], ret);
}
zpool_vdev_set_removed_state(zhp, guid, VDEV_AUX_NONE);
} else if (fault) {
vdev_aux_t aux;
if (istmp == B_FALSE) {
/* Force the fault to persist across imports */
@ -7109,7 +7345,7 @@ zpool_do_offline(int argc, char **argv)
}
/*
* zpool clear <pool> [device]
* zpool clear [-nF]|[--power] <pool> [device]
*
* Clear all errors associated with a pool or a particular device.
*/
@ -7121,13 +7357,20 @@ zpool_do_clear(int argc, char **argv)
boolean_t dryrun = B_FALSE;
boolean_t do_rewind = B_FALSE;
boolean_t xtreme_rewind = B_FALSE;
boolean_t is_power_on = B_FALSE;
uint32_t rewind_policy = ZPOOL_NO_REWIND;
nvlist_t *policy = NULL;
zpool_handle_t *zhp;
char *pool, *device;
struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "FnX")) != -1) {
while ((c = getopt_long(argc, argv, "FnX", long_options,
NULL)) != -1) {
switch (c) {
case 'F':
do_rewind = B_TRUE;
@ -7138,6 +7381,9 @@ zpool_do_clear(int argc, char **argv)
case 'X':
xtreme_rewind = B_TRUE;
break;
case ZPOOL_OPTION_POWER:
is_power_on = B_TRUE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -7145,6 +7391,9 @@ zpool_do_clear(int argc, char **argv)
}
}
if (libzfs_envvar_is_set("ZPOOL_AUTO_POWER_ON_SLOT"))
is_power_on = B_TRUE;
argc -= optind;
argv += optind;
@ -7185,6 +7434,14 @@ zpool_do_clear(int argc, char **argv)
return (1);
}
if (is_power_on) {
if (device == NULL) {
zpool_power_on_pool_and_wait_for_devices(zhp);
} else {
zpool_power_on_and_disk_wait(zhp, device);
}
}
if (zpool_clear(zhp, device, policy) != 0)
ret = 1;
@ -8653,7 +8910,7 @@ status_callback(zpool_handle_t *zhp, void *data)
printf_color(ANSI_BOLD, gettext("action: "));
printf_color(ANSI_YELLOW, gettext("Make sure the pool's devices"
" are connected, then reboot your system and\n\timport the "
"pool.\n"));
"pool or run 'zpool clear' to resume the pool.\n"));
break;
case ZPOOL_STATUS_IO_FAILURE_WAIT:
@ -8801,6 +9058,10 @@ status_callback(zpool_handle_t *zhp, void *data)
printf_color(ANSI_BOLD, " %5s", gettext("SLOW"));
}
if (cbp->cb_print_power) {
printf_color(ANSI_BOLD, " %5s", gettext("POWER"));
}
if (cbp->vcdl != NULL)
print_cmd_columns(cbp->vcdl, 0);
@ -8828,9 +9089,11 @@ status_callback(zpool_handle_t *zhp, void *data)
(void) printf(gettext(
"errors: No known data errors\n"));
} else if (!cbp->cb_verbose) {
color_start(ANSI_RED);
(void) printf(gettext("errors: %llu data "
"errors, use '-v' for a list\n"),
(u_longlong_t)nerr);
color_end();
} else {
print_error_log(zhp);
}
@ -8847,21 +9110,23 @@ status_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool status [-c [script1,script2,...]] [-igLpPstvx] [-T d|u] [pool] ...
* [interval [count]]
* zpool status [-c [script1,script2,...]] [-DegiLpPstvx] [--power] [-T d|u]
* [pool] ... [interval [count]]
*
* -c CMD For each vdev, run command CMD
* -i Display vdev initialization status.
* -D Display dedup status (undocumented)
* -e Display only unhealthy vdevs
* -g Display guid for individual vdev name.
* -i Display vdev initialization status.
* -L Follow links when resolving vdev path name.
* -p Display values in parsable (exact) format.
* -P Display full path for vdev name.
* -s Display slow IOs column.
* -v Display complete error logs
* -x Display only pools with potential problems
* -D Display dedup status (undocumented)
* -t Display vdev TRIM status.
* -T Display a timestamp in date(1) or Unix format
* -v Display complete error logs
* -x Display only pools with potential problems
* --power Display vdev enclosure slot power status
*
* Describes the health status of all pools or some subset.
*/
@ -8875,8 +9140,14 @@ zpool_do_status(int argc, char **argv)
status_cbdata_t cb = { 0 };
char *cmd = NULL;
struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "c:igLpPsvxDtT:")) != -1) {
while ((c = getopt_long(argc, argv, "c:DegiLpPstT:vx", long_options,
NULL)) != -1) {
switch (c) {
case 'c':
if (cmd != NULL) {
@ -8902,12 +9173,18 @@ zpool_do_status(int argc, char **argv)
}
cmd = optarg;
break;
case 'i':
cb.cb_print_vdev_init = B_TRUE;
case 'D':
cb.cb_dedup_stats = B_TRUE;
break;
case 'e':
cb.cb_print_unhealthy = B_TRUE;
break;
case 'g':
cb.cb_name_flags |= VDEV_NAME_GUID;
break;
case 'i':
cb.cb_print_vdev_init = B_TRUE;
break;
case 'L':
cb.cb_name_flags |= VDEV_NAME_FOLLOW_LINKS;
break;
@ -8920,20 +9197,20 @@ zpool_do_status(int argc, char **argv)
case 's':
cb.cb_print_slow_ios = B_TRUE;
break;
case 't':
cb.cb_print_vdev_trim = B_TRUE;
break;
case 'T':
get_timestamp_arg(*optarg);
break;
case 'v':
cb.cb_verbose = B_TRUE;
break;
case 'x':
cb.cb_explain = B_TRUE;
break;
case 'D':
cb.cb_dedup_stats = B_TRUE;
break;
case 't':
cb.cb_print_vdev_trim = B_TRUE;
break;
case 'T':
get_timestamp_arg(*optarg);
case ZPOOL_OPTION_POWER:
cb.cb_print_power = B_TRUE;
break;
case '?':
if (optopt == 'c') {
@ -8971,7 +9248,6 @@ zpool_do_status(int argc, char **argv)
if (cb.vcdl != NULL)
free_vdev_cmd_data_list(cb.vcdl);
if (argc == 0 && cb.cb_count == 0)
(void) fprintf(stderr, gettext("no pools available\n"));
else if (cb.cb_explain && cb.cb_first && cb.cb_allpools)
@ -10407,11 +10683,10 @@ found:
}
} else {
/*
* The first arg isn't a pool name,
* The first arg isn't the name of a valid pool.
*/
fprintf(stderr, gettext("missing pool name.\n"));
fprintf(stderr, "\n");
usage(B_FALSE);
fprintf(stderr, gettext("Cannot get properties of %s: "
"no such pool available.\n"), argv[0]);
return (1);
}
@ -10752,6 +11027,9 @@ print_wait_status_row(wait_data_t *wd, zpool_handle_t *zhp, int row)
col_widths[i] = MAX(strlen(headers[i]), 6) + 2;
}
if (timestamp_fmt != NODATE)
print_timestamp(timestamp_fmt);
/* Print header if appropriate */
int term_height = terminal_height();
boolean_t reprint_header = (!wd->wd_headers_once && term_height > 0 &&
@ -10819,9 +11097,6 @@ print_wait_status_row(wait_data_t *wd, zpool_handle_t *zhp, int row)
if (vdev_any_spare_replacing(nvroot))
bytes_rem[ZPOOL_WAIT_REPLACE] = bytes_rem[ZPOOL_WAIT_RESILVER];
if (timestamp_fmt != NODATE)
print_timestamp(timestamp_fmt);
for (i = 0; i < ZPOOL_WAIT_NUM_ACTIVITIES; i++) {
char buf[64];
if (!wd->wd_enabled[i])
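
Note on the option handling in this file: the long-only options (`--power`, `--allow-in-use`, and so on) are given codes starting at 1024 so they cannot collide with any short-option character returned by `getopt_long()`, and incompatible combinations are rejected after parsing. A minimal sketch of that pattern for an `offline`-style command is shown below. `DEMO_OPTION_POWER`, `struct offline_opts`, and `parse_offline_opts` are hypothetical names, not part of the zpool sources.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Long-only option codes start above any possible char value. */
enum demo_options {
	DEMO_OPTION_POWER = 1024
};

struct offline_opts {
	bool fault;		/* -f */
	bool temporary;		/* -t */
	bool power_off;		/* --power */
};

/*
 * Parse "[-ft] | [--power]". Returns 0 on success, -1 on an unknown
 * option or when --power is combined with -f or -t.
 */
static int
parse_offline_opts(int argc, char **argv, struct offline_opts *o)
{
	static const struct option long_options[] = {
		{"power", no_argument, NULL, DEMO_OPTION_POWER},
		{0, 0, 0, 0}
	};
	int c;

	o->fault = o->temporary = o->power_off = false;
	optind = 1;	/* restart scanning so the function can be reused */
	opterr = 0;	/* suppress getopt's own diagnostics */

	while ((c = getopt_long(argc, argv, "ft", long_options,
	    NULL)) != -1) {
		switch (c) {
		case 'f':
			o->fault = true;
			break;
		case 't':
			o->temporary = true;
			break;
		case DEMO_OPTION_POWER:
			o->power_off = true;
			break;
		default:
			return (-1);
		}
	}

	/* --power is mutually exclusive with -f and -t. */
	if (o->power_off && (o->fault || o->temporary))
		return (-1);
	return (0);
}
```

Checking the combination after the loop, rather than inside the `switch`, keeps the result independent of the order in which the options appear on the command line.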

@ -126,6 +126,10 @@ vdev_cmd_data_list_t *all_pools_for_each_vdev_run(int argc, char **argv,
void free_vdev_cmd_data_list(vdev_cmd_data_list_t *vcdl);
void free_vdev_cmd_data(vdev_cmd_data_t *data);
int vdev_run_cmd_simple(char *path, char *cmd);
int check_device(const char *path, boolean_t force,
boolean_t isspare, boolean_t iswholedisk);
boolean_t check_sector_size_database(char *path, int *sector_size);
@ -134,6 +138,9 @@ int check_file(const char *file, boolean_t force, boolean_t isspare);
void after_zpool_upgrade(zpool_handle_t *zhp);
int check_file_generic(const char *file, boolean_t force, boolean_t isspare);
int zpool_power(zpool_handle_t *zhp, char *vdev, boolean_t turn_on);
int zpool_power_current_state(zpool_handle_t *zhp, char *vdev);
#ifdef __cplusplus
}
#endif

@ -372,6 +372,10 @@ make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_PATH, path) == 0);
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_TYPE, type) == 0);
/* Lookup and add the enclosure sysfs path (if exists) */
update_vdev_config_dev_sysfs_path(vdev, path,
ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH);
if (strcmp(type, VDEV_TYPE_DISK) == 0)
verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK,
(uint64_t)wholedisk) == 0);
@ -936,6 +940,15 @@ zero_label(const char *path)
return (0);
}
static void
lines_to_stderr(char *lines[], int lines_cnt)
{
int i;
for (i = 0; i < lines_cnt; i++) {
fprintf(stderr, "%s\n", lines[i]);
}
}
/*
* Go through and find any whole disks in the vdev specification, labelling them
* as appropriate. When constructing the vdev spec, we were unable to open this
@ -947,7 +960,7 @@ zero_label(const char *path)
* need to get the devid after we label the disk.
*/
static int
make_disks(zpool_handle_t *zhp, nvlist_t *nv)
make_disks(zpool_handle_t *zhp, nvlist_t *nv, boolean_t replacing)
{
nvlist_t **child;
uint_t c, children;
@ -1032,6 +1045,8 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
*/
if (!is_exclusive && !is_spare(NULL, udevpath)) {
char *devnode = strrchr(devpath, '/') + 1;
char **lines = NULL;
int lines_cnt = 0;
ret = strncmp(udevpath, UDISK_ROOT, strlen(UDISK_ROOT));
if (ret == 0) {
@ -1043,9 +1058,27 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
/*
* When labeling a pool the raw device node name
* is provided as it appears under /dev/.
*
* Note that 'zhp' will be NULL when we're creating a
* pool.
*/
if (zpool_label_disk(g_zfs, zhp, devnode) == -1)
if (zpool_prepare_and_label_disk(g_zfs, zhp, devnode,
nv, zhp == NULL ? "create" :
replacing ? "replace" : "add", &lines,
&lines_cnt) != 0) {
(void) fprintf(stderr,
gettext(
"Error preparing/labeling disk.\n"));
if (lines_cnt > 0) {
(void) fprintf(stderr,
gettext("zfs_prepare_disk output:\n"));
lines_to_stderr(lines, lines_cnt);
}
libzfs_free_str_array(lines, lines_cnt);
return (-1);
}
libzfs_free_str_array(lines, lines_cnt);
/*
* Wait for udev to signal the device is available
@ -1082,19 +1115,19 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
}
for (c = 0; c < children; c++)
if ((ret = make_disks(zhp, child[c])) != 0)
if ((ret = make_disks(zhp, child[c], replacing)) != 0)
return (ret);
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_SPARES,
&child, &children) == 0)
for (c = 0; c < children; c++)
if ((ret = make_disks(zhp, child[c])) != 0)
if ((ret = make_disks(zhp, child[c], replacing)) != 0)
return (ret);
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_L2CACHE,
&child, &children) == 0)
for (c = 0; c < children; c++)
if ((ret = make_disks(zhp, child[c])) != 0)
if ((ret = make_disks(zhp, child[c], replacing)) != 0)
return (ret);
return (0);
@ -1752,7 +1785,7 @@ split_mirror_vdev(zpool_handle_t *zhp, char *newname, nvlist_t *props,
return (NULL);
}
if (!flags.dryrun && make_disks(zhp, newroot) != 0) {
if (!flags.dryrun && make_disks(zhp, newroot, B_FALSE) != 0) {
nvlist_free(newroot);
return (NULL);
}
@ -1873,7 +1906,7 @@ make_root_vdev(zpool_handle_t *zhp, nvlist_t *props, int force, int check_rep,
/*
* Run through the vdev specification and label any whole disks found.
*/
if (!dryrun && make_disks(zhp, newroot) != 0) {
if (!dryrun && make_disks(zhp, newroot, replacing) != 0) {
nvlist_free(newroot);
return (NULL);
}

@ -186,7 +186,7 @@ static void
zfs_redup_stream(int infd, int outfd, boolean_t verbose)
{
int bufsz = SPA_MAXBLOCKSIZE;
dmu_replay_record_t thedrr = { 0 };
dmu_replay_record_t thedrr;
dmu_replay_record_t *drr = &thedrr;
redup_table_t rdt;
zio_cksum_t stream_cksum;
@ -194,6 +194,8 @@ zfs_redup_stream(int infd, int outfd, boolean_t verbose)
uint64_t num_records = 0;
uint64_t num_write_byref_records = 0;
memset(&thedrr, 0, sizeof (dmu_replay_record_t));
#ifdef _ILP32
uint64_t max_rde_size = SMALLEST_POSSIBLE_MAX_RDT_MB << 20;
#else

View File

@ -20,7 +20,7 @@
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2018 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2013 Steven Hartland. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
@ -2448,7 +2448,7 @@ ztest_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
ASSERT3P(zio, !=, NULL);
size = doi.doi_data_block_size;
if (ISP2(size)) {
offset = P2ALIGN(offset, size);
offset = P2ALIGN_TYPED(offset, size, uint64_t);
} else {
ASSERT3U(offset, <, size);
offset = 0;
@ -3270,7 +3270,7 @@ ztest_vdev_add_remove(ztest_ds_t *zd, uint64_t id)
"log" : NULL, ztest_opts.zo_raid_children, zs->zs_mirrors,
1);
error = spa_vdev_add(spa, nvroot);
error = spa_vdev_add(spa, nvroot, B_FALSE);
fnvlist_free(nvroot);
switch (error) {
@ -3332,7 +3332,7 @@ ztest_vdev_class_add(ztest_ds_t *zd, uint64_t id)
nvroot = make_vdev_root(NULL, NULL, NULL, ztest_opts.zo_vdev_size, 0,
class, ztest_opts.zo_raid_children, zs->zs_mirrors, 1);
error = spa_vdev_add(spa, nvroot);
error = spa_vdev_add(spa, nvroot, B_FALSE);
fnvlist_free(nvroot);
if (error == ENOSPC)
@ -3439,7 +3439,7 @@ ztest_vdev_aux_add_remove(ztest_ds_t *zd, uint64_t id)
*/
nvlist_t *nvroot = make_vdev_root(NULL, aux, NULL,
(ztest_opts.zo_vdev_size * 5) / 4, 0, NULL, 0, 0, 1);
error = spa_vdev_add(spa, nvroot);
error = spa_vdev_add(spa, nvroot, B_FALSE);
switch (error) {
case 0:
@ -4668,7 +4668,8 @@ ztest_dmu_object_next_chunk(ztest_ds_t *zd, uint64_t id)
*/
mutex_enter(&os->os_obj_lock);
object = ztest_random(os->os_obj_next_chunk);
os->os_obj_next_chunk = P2ALIGN(object, dnodes_per_chunk);
os->os_obj_next_chunk = P2ALIGN_TYPED(object, dnodes_per_chunk,
uint64_t);
mutex_exit(&os->os_obj_lock);
}
@ -6284,7 +6285,8 @@ ztest_fault_inject(ztest_ds_t *zd, uint64_t id)
* the end of the disk (vdev_psize) is aligned to
* sizeof (vdev_label_t).
*/
uint64_t psize = P2ALIGN(fsize, sizeof (vdev_label_t));
uint64_t psize = P2ALIGN_TYPED(fsize, sizeof (vdev_label_t),
uint64_t);
if ((leaf & 1) == 1 &&
offset + sizeof (bad) > psize - VDEV_LABEL_END_SIZE)
continue;
@ -6600,8 +6602,8 @@ ztest_fletcher_incr(ztest_ds_t *zd, uint64_t id)
size_t inc = 64 * ztest_random(size / 67);
/* sometimes add few bytes to test non-simd */
if (ztest_random(100) < 10)
inc += P2ALIGN(ztest_random(64),
sizeof (uint32_t));
inc += P2ALIGN_TYPED(ztest_random(64),
sizeof (uint32_t), uint64_t);
if (inc > (size - pos))
inc = size - pos;
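
Note on the `P2ALIGN` to `P2ALIGN_TYPED` conversions above: the diff does not show either macro's definition, so the following is only a sketch under the assumption that the untyped form is `(x) & -(align)` and the typed form casts both operands first. Under that assumption, a 64-bit value combined with a 32-bit unsigned alignment loses its upper bits in the untyped form, because the negated mask is computed in 32 bits and then zero-extended. The `DEMO_*` macros and both functions are hypothetical stand-ins, and whether this was the motivation for the change is not stated in the source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shapes of the macros; local stand-ins, not the ZFS headers. */
#define	DEMO_P2ALIGN(x, align)			((x) & -(align))
#define	DEMO_P2ALIGN_TYPED(x, align, type)	((type)(x) & -(type)(align))

/* Untyped: -(blocksize) is a 32-bit mask that is zero-extended. */
static uint64_t
align_untyped(uint64_t offset, uint32_t blocksize)
{
	return (DEMO_P2ALIGN(offset, blocksize));
}

/* Typed: both operands are widened to 64 bits before negation. */
static uint64_t
align_typed(uint64_t offset, uint32_t blocksize)
{
	return (DEMO_P2ALIGN_TYPED(offset, blocksize, uint64_t));
}
```

For offsets that fit in 32 bits the two forms agree; they diverge only once the value being aligned is wider than the alignment's type.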

@ -33,6 +33,7 @@ AM_CPPFLAGS += -D_REENTRANT
AM_CPPFLAGS += -D_FILE_OFFSET_BITS=64
AM_CPPFLAGS += -D_LARGEFILE64_SOURCE
AM_CPPFLAGS += -DLIBEXECDIR=\"$(libexecdir)\"
AM_CPPFLAGS += -DZFSEXECDIR=\"$(zfsexecdir)\"
AM_CPPFLAGS += -DRUNSTATEDIR=\"$(runstatedir)\"
AM_CPPFLAGS += -DSBINDIR=\"$(sbindir)\"
AM_CPPFLAGS += -DSYSCONFDIR=\"$(sysconfdir)\"
@ -41,21 +42,6 @@ AM_CPPFLAGS += $(DEBUG_CPPFLAGS)
AM_CPPFLAGS += $(CODE_COVERAGE_CPPFLAGS)
AM_CPPFLAGS += -DTEXT_DOMAIN=\"zfs-@ac_system_l@-user\"
AM_CPPFLAGS_NOCHECK = -D"strtok(...)=strtok(__VA_ARGS__) __attribute__((deprecated(\"Use strtok_r(3) instead!\")))"
AM_CPPFLAGS_NOCHECK += -D"__xpg_basename(...)=__xpg_basename(__VA_ARGS__) __attribute__((deprecated(\"basename(3) is underspecified. Use zfs_basename() instead!\")))"
AM_CPPFLAGS_NOCHECK += -D"basename(...)=basename(__VA_ARGS__) __attribute__((deprecated(\"basename(3) is underspecified. Use zfs_basename() instead!\")))"
AM_CPPFLAGS_NOCHECK += -D"dirname(...)=dirname(__VA_ARGS__) __attribute__((deprecated(\"dirname(3) is underspecified. Use zfs_dirnamelen() instead!\")))"
AM_CPPFLAGS_NOCHECK += -D"bcopy(...)=__attribute__((deprecated(\"bcopy(3) is deprecated. Use memcpy(3)/memmove(3) instead!\"))) bcopy(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"bcmp(...)=__attribute__((deprecated(\"bcmp(3) is deprecated. Use memcmp(3) instead!\"))) bcmp(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"bzero(...)=__attribute__((deprecated(\"bzero(3) is deprecated. Use memset(3) instead!\"))) bzero(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"asctime(...)=__attribute__((deprecated(\"Use strftime(3) instead!\"))) asctime(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"asctime_r(...)=__attribute__((deprecated(\"Use strftime(3) instead!\"))) asctime_r(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"gmtime(...)=__attribute__((deprecated(\"gmtime(3) isn't thread-safe. Use gmtime_r(3) instead!\"))) gmtime(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"localtime(...)=__attribute__((deprecated(\"localtime(3) isn't thread-safe. Use localtime_r(3) instead!\"))) localtime(__VA_ARGS__)"
AM_CPPFLAGS_NOCHECK += -D"strncpy(...)=__attribute__((deprecated(\"strncpy(3) is deprecated. Use strlcpy(3) instead!\"))) strncpy(__VA_ARGS__)"
AM_CPPFLAGS += $(AM_CPPFLAGS_NOCHECK)
if ASAN_ENABLED
AM_CPPFLAGS += -DZFS_ASAN_ENABLED
endif

@ -18,6 +18,7 @@ subst_sed_cmd = \
-e 's|@ASAN_ENABLED[@]|$(ASAN_ENABLED)|g' \
-e 's|@DEFAULT_INIT_NFS_SERVER[@]|$(DEFAULT_INIT_NFS_SERVER)|g' \
-e 's|@DEFAULT_INIT_SHELL[@]|$(DEFAULT_INIT_SHELL)|g' \
-e 's|@IS_SYSV_RC[@]|$(IS_SYSV_RC)|g' \
-e 's|@LIBFETCH_DYNAMIC[@]|$(LIBFETCH_DYNAMIC)|g' \
-e 's|@LIBFETCH_SONAME[@]|$(LIBFETCH_SONAME)|g' \
-e 's|@PYTHON[@]|$(PYTHON)|g' \
@ -43,4 +44,4 @@ SUBSTFILES =
CLEANFILES += $(SUBSTFILES)
dist_noinst_DATA += $(SUBSTFILES:=.in)
$(call SUBST,%,)
$(SUBSTFILES): $(call SUBST,%,)

@ -80,10 +80,11 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_PYZFS], [
[AC_MSG_ERROR("Python $PYTHON_VERSION unknown")]
)
AX_PYTHON_DEVEL([$PYTHON_REQUIRED_VERSION], [
AS_IF([test "x$enable_pyzfs" = xyes], [
AC_MSG_ERROR("Python $PYTHON_REQUIRED_VERSION development library is not installed")
], [test "x$enable_pyzfs" != xno], [
AS_IF([test "x$enable_pyzfs" = xyes], [
AX_PYTHON_DEVEL([$PYTHON_REQUIRED_VERSION])
], [
AX_PYTHON_DEVEL([$PYTHON_REQUIRED_VERSION], [true])
AS_IF([test "x$ax_python_devel_found" = xno], [
enable_pyzfs=no
])
])

@ -4,18 +4,13 @@
#
# SYNOPSIS
#
# AX_PYTHON_DEVEL([version], [action-if-not-found])
# AX_PYTHON_DEVEL([version[,optional]])
#
# DESCRIPTION
#
# Note: Defines as a precious variable "PYTHON_VERSION". Don't override it
# in your configure.ac.
#
# Note: this is a slightly modified version of the original AX_PYTHON_DEVEL
# macro which accepts an additional [action-if-not-found] argument. This
# allow to detect if Python development is available without aborting the
# configure phase with an hard error in case it is not.
#
# This macro checks for Python and tries to get the include path to
# 'Python.h'. It provides the $(PYTHON_CPPFLAGS) and $(PYTHON_LIBS) output
# variables. It also exports $(PYTHON_EXTRA_LIBS) and
@ -28,6 +23,11 @@
# version number. Don't use "PYTHON_VERSION" for this: that environment
# variable is declared as precious and thus reserved for the end-user.
#
# By default this will fail if it does not detect a development version of
# python. If you want it to continue, set optional to true, like
# AX_PYTHON_DEVEL([], [true]). The ax_python_devel_found variable will be
# "no" if it fails.
#
# This macro should work for all versions of Python >= 2.1.0. As an end
# user, you can disable the check for the python version by setting the
# PYTHON_NOVERSIONCHECK environment variable to something else than the
@ -45,7 +45,6 @@
# Copyright (c) 2009 Matteo Settenvini <matteo@member.fsf.org>
# Copyright (c) 2009 Horst Knorr <hk_classes@knoda.org>
# Copyright (c) 2013 Daniel Mullner <muellner@math.stanford.edu>
# Copyright (c) 2018 loli10K <ezomori.nozomu@gmail.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
@ -73,10 +72,18 @@
# modified version of the Autoconf Macro, you may extend this special
# exception to the GPL to apply to your modified version as well.
#serial 21
#serial 36
AU_ALIAS([AC_PYTHON_DEVEL], [AX_PYTHON_DEVEL])
AC_DEFUN([AX_PYTHON_DEVEL],[
# Get whether it's optional
if test -z "$2"; then
ax_python_devel_optional=false
else
ax_python_devel_optional=$2
fi
ax_python_devel_found=yes
#
# Allow the use of a (user set) custom python version
#
@ -87,23 +94,26 @@ AC_DEFUN([AX_PYTHON_DEVEL],[
AC_PATH_PROG([PYTHON],[python[$PYTHON_VERSION]])
if test -z "$PYTHON"; then
m4_ifvaln([$2],[$2],[
AC_MSG_ERROR([Cannot find python$PYTHON_VERSION in your system path])
PYTHON_VERSION=""
])
AC_MSG_WARN([Cannot find python$PYTHON_VERSION in your system path])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up, python development not available])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
#
# Check for a version of Python >= 2.1.0
#
AC_MSG_CHECKING([for a version of Python >= '2.1.0'])
ac_supports_python_ver=`$PYTHON -c "import sys; \
if test $ax_python_devel_found = yes; then
#
# Check for a version of Python >= 2.1.0
#
AC_MSG_CHECKING([for a version of Python >= '2.1.0'])
ac_supports_python_ver=`$PYTHON -c "import sys; \
ver = sys.version.split ()[[0]]; \
print (ver >= '2.1.0')"`
if test "$ac_supports_python_ver" != "True"; then
if test "$ac_supports_python_ver" != "True"; then
if test -z "$PYTHON_NOVERSIONCHECK"; then
AC_MSG_RESULT([no])
AC_MSG_FAILURE([
AC_MSG_WARN([
This version of the AC@&t@_PYTHON_DEVEL macro
doesn't work properly with versions of Python before
2.1.0. You may need to re-run configure, setting the
@ -112,20 +122,27 @@ PYTHON_EXTRA_LIBS and PYTHON_EXTRA_LDFLAGS by hand.
Moreover, to disable this check, set PYTHON_NOVERSIONCHECK
to something else than an empty string.
])
if ! $ax_python_devel_optional; then
AC_MSG_FAILURE([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
else
AC_MSG_RESULT([skip at user request])
fi
else
else
AC_MSG_RESULT([yes])
fi
fi
#
# If the macro parameter ``version'' is set, honour it.
# A Python shim class, VPy, is used to implement correct version comparisons via
# string expressions, since e.g. a naive textual ">= 2.7.3" won't work for
# Python 2.7.10 (the ".1" being evaluated as less than ".3").
#
if test -n "$1"; then
if test $ax_python_devel_found = yes; then
#
# If the macro parameter ``version'' is set, honour it.
# A Python shim class, VPy, is used to implement correct version comparisons via
# string expressions, since e.g. a naive textual ">= 2.7.3" won't work for
# Python 2.7.10 (the ".1" being evaluated as less than ".3").
#
if test -n "$1"; then
AC_MSG_CHECKING([for a version of Python $1])
cat << EOF > ax_python_devel_vpy.py
class VPy:
@ -133,7 +150,7 @@ class VPy:
return tuple(map(int, s.strip().replace("rc", ".").split(".")))
def __init__(self):
import sys
self.vpy = tuple(sys.version_info)
self.vpy = tuple(sys.version_info)[[:3]]
def __eq__(self, s):
return self.vpy == self.vtup(s)
def __ne__(self, s):
@ -155,25 +172,69 @@ EOF
AC_MSG_RESULT([yes])
else
AC_MSG_RESULT([no])
AC_MSG_ERROR([this package requires Python $1.
AC_MSG_WARN([this package requires Python $1.
If you have it installed, but it isn't the default Python
interpreter in your system path, please pass the PYTHON_VERSION
variable to configure. See ``configure --help'' for reference.
])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
fi
#
# Check for Python include path
#
#
AC_MSG_CHECKING([for Python include path])
if test -z "$PYTHON_CPPFLAGS"; then
python_path=`$PYTHON -c "import sysconfig; \
print (sysconfig.get_path('include'));"`
plat_python_path=`$PYTHON -c "import sysconfig; \
print (sysconfig.get_path('platinclude'));"`
if test $ax_python_devel_found = yes; then
#
# Check if you have distutils, else fail
#
AC_MSG_CHECKING([for the sysconfig Python package])
ac_sysconfig_result=`$PYTHON -c "import sysconfig" 2>&1`
if test $? -eq 0; then
AC_MSG_RESULT([yes])
IMPORT_SYSCONFIG="import sysconfig"
else
AC_MSG_RESULT([no])
AC_MSG_CHECKING([for the distutils Python package])
ac_sysconfig_result=`$PYTHON -c "from distutils import sysconfig" 2>&1`
if test $? -eq 0; then
AC_MSG_RESULT([yes])
IMPORT_SYSCONFIG="from distutils import sysconfig"
else
AC_MSG_WARN([cannot import Python module "distutils".
Please check your Python installation. The error was:
$ac_sysconfig_result])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
fi
if test $ax_python_devel_found = yes; then
#
# Check for Python include path
#
AC_MSG_CHECKING([for Python include path])
if test -z "$PYTHON_CPPFLAGS"; then
if test "$IMPORT_SYSCONFIG" = "import sysconfig"; then
# sysconfig module has different functions
python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_path ('include'));"`
plat_python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_path ('platinclude'));"`
else
# old distutils way
python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_inc ());"`
plat_python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_inc (plat_specific=1));"`
fi
if test -n "${python_path}"; then
if test "${plat_python_path}" != "${python_path}"; then
python_path="-I$python_path -I$plat_python_path"
@ -182,15 +243,15 @@ variable to configure. See ``configure --help'' for reference.
fi
fi
PYTHON_CPPFLAGS=$python_path
fi
AC_MSG_RESULT([$PYTHON_CPPFLAGS])
AC_SUBST([PYTHON_CPPFLAGS])
fi
AC_MSG_RESULT([$PYTHON_CPPFLAGS])
AC_SUBST([PYTHON_CPPFLAGS])
#
# Check for Python library path
#
AC_MSG_CHECKING([for Python library path])
if test -z "$PYTHON_LIBS"; then
#
# Check for Python library path
#
AC_MSG_CHECKING([for Python library path])
if test -z "$PYTHON_LIBS"; then
# (makes two attempts to ensure we've got a version number
# from the interpreter)
ac_python_version=`cat<<EOD | $PYTHON -
@ -208,7 +269,7 @@ EOD`
ac_python_version=$PYTHON_VERSION
else
ac_python_version=`$PYTHON -c "import sys; \
print ('.'.join(sys.version.split('.')[[:2]]))"`
print ("%d.%d" % sys.version_info[[:2]])"`
fi
fi
@ -220,7 +281,7 @@ EOD`
ac_python_libdir=`cat<<EOD | $PYTHON -
# There should be only one
import sysconfig
$IMPORT_SYSCONFIG
e = sysconfig.get_config_var('LIBDIR')
if e is not None:
print (e)
@ -229,7 +290,7 @@ EOD`
# Now, for the library:
ac_python_library=`cat<<EOD | $PYTHON -
import sysconfig
$IMPORT_SYSCONFIG
c = sysconfig.get_config_vars()
if 'LDVERSION' in c:
print ('python'+c[['LDVERSION']])
@ -249,88 +310,140 @@ EOD`
else
# old way: use libpython from python_configdir
ac_python_libdir=`$PYTHON -c \
"import sysconfig; \
"from sysconfig import get_python_lib as f; \
import os; \
print (os.path.join(sysconfig.get_path('platstdlib'), 'config'));"`
print (os.path.join(f(plat_specific=1, standard_lib=1), 'config'));"`
PYTHON_LIBS="-L$ac_python_libdir -lpython$ac_python_version"
fi
if test -z "PYTHON_LIBS"; then
m4_ifvaln([$2],[$2],[
AC_MSG_ERROR([
AC_MSG_WARN([
Cannot determine location of your Python DSO. Please check it was installed with
dynamic libraries enabled, or try setting PYTHON_LIBS by hand.
])
])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
fi
AC_MSG_RESULT([$PYTHON_LIBS])
AC_SUBST([PYTHON_LIBS])
#
# Check for site packages
#
AC_MSG_CHECKING([for Python site-packages path])
if test -z "$PYTHON_SITE_PKG"; then
PYTHON_SITE_PKG=`$PYTHON -c "import distutils.sysconfig; \
print (distutils.sysconfig.get_python_lib(0,0));" 2>/dev/null || \
$PYTHON -c "import sysconfig; \
print (sysconfig.get_path('purelib'));"`
fi
AC_MSG_RESULT([$PYTHON_SITE_PKG])
AC_SUBST([PYTHON_SITE_PKG])
if test $ax_python_devel_found = yes; then
AC_MSG_RESULT([$PYTHON_LIBS])
AC_SUBST([PYTHON_LIBS])
#
# libraries which must be linked in when embedding
#
AC_MSG_CHECKING(python extra libraries)
if test -z "$PYTHON_EXTRA_LIBS"; then
PYTHON_EXTRA_LIBS=`$PYTHON -c "import sysconfig; \
#
# Check for site packages
#
AC_MSG_CHECKING([for Python site-packages path])
if test -z "$PYTHON_SITE_PKG"; then
if test "$IMPORT_SYSCONFIG" = "import sysconfig"; then
PYTHON_SITE_PKG=`$PYTHON -c "
$IMPORT_SYSCONFIG;
if hasattr(sysconfig, 'get_default_scheme'):
scheme = sysconfig.get_default_scheme()
else:
scheme = sysconfig._get_default_scheme()
if scheme == 'posix_local':
# Debian's default scheme installs to /usr/local/ but we want to find headers in /usr/
scheme = 'posix_prefix'
prefix = '$prefix'
if prefix == 'NONE':
prefix = '$ac_default_prefix'
sitedir = sysconfig.get_path('purelib', scheme, vars={'base': prefix})
print(sitedir)"`
else
# distutils.sysconfig way
PYTHON_SITE_PKG=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_lib(0,0));"`
fi
fi
AC_MSG_RESULT([$PYTHON_SITE_PKG])
AC_SUBST([PYTHON_SITE_PKG])
#
# Check for platform-specific site packages
#
AC_MSG_CHECKING([for Python platform specific site-packages path])
if test -z "$PYTHON_PLATFORM_SITE_PKG"; then
if test "$IMPORT_SYSCONFIG" = "import sysconfig"; then
PYTHON_PLATFORM_SITE_PKG=`$PYTHON -c "
$IMPORT_SYSCONFIG;
if hasattr(sysconfig, 'get_default_scheme'):
scheme = sysconfig.get_default_scheme()
else:
scheme = sysconfig._get_default_scheme()
if scheme == 'posix_local':
# Debian's default scheme installs to /usr/local/ but we want to find headers in /usr/
scheme = 'posix_prefix'
prefix = '$prefix'
if prefix == 'NONE':
prefix = '$ac_default_prefix'
sitedir = sysconfig.get_path('platlib', scheme, vars={'platbase': prefix})
print(sitedir)"`
else
# distutils.sysconfig way
PYTHON_PLATFORM_SITE_PKG=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_lib(1,0));"`
fi
fi
AC_MSG_RESULT([$PYTHON_PLATFORM_SITE_PKG])
AC_SUBST([PYTHON_PLATFORM_SITE_PKG])
#
# libraries which must be linked in when embedding
#
AC_MSG_CHECKING(python extra libraries)
if test -z "$PYTHON_EXTRA_LIBS"; then
PYTHON_EXTRA_LIBS=`$PYTHON -c "$IMPORT_SYSCONFIG; \
conf = sysconfig.get_config_var; \
print (conf('LIBS') + ' ' + conf('SYSLIBS'))"`
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LIBS])
AC_SUBST(PYTHON_EXTRA_LIBS)
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LIBS])
AC_SUBST(PYTHON_EXTRA_LIBS)
#
# linking flags needed when embedding
#
AC_MSG_CHECKING(python extra linking flags)
if test -z "$PYTHON_EXTRA_LDFLAGS"; then
PYTHON_EXTRA_LDFLAGS=`$PYTHON -c "import sysconfig; \
#
# linking flags needed when embedding
#
AC_MSG_CHECKING(python extra linking flags)
if test -z "$PYTHON_EXTRA_LDFLAGS"; then
PYTHON_EXTRA_LDFLAGS=`$PYTHON -c "$IMPORT_SYSCONFIG; \
conf = sysconfig.get_config_var; \
print (conf('LINKFORSHARED'))"`
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LDFLAGS])
AC_SUBST(PYTHON_EXTRA_LDFLAGS)
# Hack for macos, it sticks this in here.
PYTHON_EXTRA_LDFLAGS=`echo $PYTHON_EXTRA_LDFLAGS | sed 's/CoreFoundation.*$/CoreFoundation/'`
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LDFLAGS])
AC_SUBST(PYTHON_EXTRA_LDFLAGS)
#
# final check to see if everything compiles alright
#
AC_MSG_CHECKING([consistency of all components of python development environment])
# save current global flags
ac_save_LIBS="$LIBS"
ac_save_LDFLAGS="$LDFLAGS"
ac_save_CPPFLAGS="$CPPFLAGS"
LIBS="$ac_save_LIBS $PYTHON_LIBS $PYTHON_EXTRA_LIBS $PYTHON_EXTRA_LIBS"
LDFLAGS="$ac_save_LDFLAGS $PYTHON_EXTRA_LDFLAGS"
CPPFLAGS="$ac_save_CPPFLAGS $PYTHON_CPPFLAGS"
AC_LANG_PUSH([C])
AC_LINK_IFELSE([
#
# final check to see if everything compiles alright
#
AC_MSG_CHECKING([consistency of all components of python development environment])
# save current global flags
ac_save_LIBS="$LIBS"
ac_save_LDFLAGS="$LDFLAGS"
ac_save_CPPFLAGS="$CPPFLAGS"
LIBS="$ac_save_LIBS $PYTHON_LIBS $PYTHON_EXTRA_LIBS"
LDFLAGS="$ac_save_LDFLAGS $PYTHON_EXTRA_LDFLAGS"
CPPFLAGS="$ac_save_CPPFLAGS $PYTHON_CPPFLAGS"
AC_LANG_PUSH([C])
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[#include <Python.h>]],
[[Py_Initialize();]])
],[pythonexists=yes],[pythonexists=no])
AC_LANG_POP([C])
# turn back to default flags
CPPFLAGS="$ac_save_CPPFLAGS"
LIBS="$ac_save_LIBS"
LDFLAGS="$ac_save_LDFLAGS"
AC_LANG_POP([C])
# turn back to default flags
CPPFLAGS="$ac_save_CPPFLAGS"
LIBS="$ac_save_LIBS"
LDFLAGS="$ac_save_LDFLAGS"
AC_MSG_RESULT([$pythonexists])
AC_MSG_RESULT([$pythonexists])
if test ! "x$pythonexists" = "xyes"; then
m4_ifvaln([$2],[$2],[
AC_MSG_FAILURE([
if test ! "x$pythonexists" = "xyes"; then
AC_MSG_WARN([
Could not link test program to Python. Maybe the main Python library has been
installed in some non-standard library path. If so, pass it to configure,
via the LIBS environment variable.
@ -340,9 +453,13 @@ EOD`
You probably have to install the development version of the Python package
for your distribution. The exact name of this package varies among them.
============================================================================
])
PYTHON_VERSION=""
])
])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
#
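
The macro comments above note that a naive textual comparison such as ">= 2.7.3" misjudges Python 2.7.10, and the diff also trims `sys.version_info` to its first three fields. A minimal sketch of both points follows. The `vtup` helper mirrors the parsing line visible in the `VPy` class; the version strings and variable names are illustrative only.

```python
import sys


def vtup(s):
    # Same parsing idea as VPy.vtup in the macro: "2.7.10" -> (2, 7, 10).
    return tuple(map(int, s.strip().replace("rc", ".").split(".")))


installed = "2.7.10"
required = "2.7.3"

# String comparison goes character by character, so "1" < "3" decides it.
naive_ok = installed >= required

# Tuple comparison treats each component as an integer.
tuple_ok = vtup(installed) >= vtup(required)

# sys.version_info has five fields (major, minor, micro, releaselevel,
# serial), so the full tuple never equals a three-part version tuple.
full = tuple(sys.version_info)
short = tuple(sys.version_info)[:3]
current = "%d.%d.%d" % short

full_matches = full == vtup(current)
short_matches = short == vtup(current)

print(naive_ok, tuple_ok)           # False True
print(full_matches, short_matches)  # False True
```

Inside the macro the slice is written `[[:3]]` because m4 strips one level of square brackets; the Python that actually runs sees `[:3]`.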

@ -90,8 +90,8 @@ AC_DEFUN([ZFS_AC_FIND_SYSTEM_LIBRARY], [
AC_DEFINE([HAVE_][$1], [1], [Define if you have [$5]])
$7
],[dnl ELSE
AC_SUBST([$1]_CFLAGS, [])
AC_SUBST([$1]_LIBS, [])
AC_SUBST([$1]_CFLAGS, [""])
AC_SUBST([$1]_LIBS, [""])
AC_MSG_WARN([cannot find [$5] via pkg-config or in the standard locations])
$8
])

@ -172,7 +172,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_GET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_get_acl], [
#include <linux/fs.h>
struct posix_acl *get_acl_fn(struct inode *inode, int type)
static struct posix_acl *get_acl_fn(struct inode *inode, int type)
{ return NULL; }
static const struct inode_operations
@ -184,7 +184,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_GET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_get_acl_rcu], [
#include <linux/fs.h>
struct posix_acl *get_acl_fn(struct inode *inode, int type,
static struct posix_acl *get_acl_fn(struct inode *inode, int type,
bool rcu) { return NULL; }
static const struct inode_operations
@ -196,7 +196,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_GET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_get_inode_acl], [
#include <linux/fs.h>
struct posix_acl *get_inode_acl_fn(struct inode *inode, int type,
static struct posix_acl *get_inode_acl_fn(struct inode *inode, int type,
bool rcu) { return NULL; }
static const struct inode_operations
@ -243,7 +243,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_SET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_set_acl_mnt_idmap_dentry], [
#include <linux/fs.h>
int set_acl_fn(struct mnt_idmap *idmap,
static int set_acl_fn(struct mnt_idmap *idmap,
struct dentry *dent, struct posix_acl *acl,
int type) { return 0; }
@ -255,7 +255,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_SET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_set_acl_userns_dentry], [
#include <linux/fs.h>
int set_acl_fn(struct user_namespace *userns,
static int set_acl_fn(struct user_namespace *userns,
struct dentry *dent, struct posix_acl *acl,
int type) { return 0; }
@ -267,7 +267,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_SET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_set_acl_userns], [
#include <linux/fs.h>
int set_acl_fn(struct user_namespace *userns,
static int set_acl_fn(struct user_namespace *userns,
struct inode *inode, struct posix_acl *acl,
int type) { return 0; }
@ -279,7 +279,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_SET_ACL], [
ZFS_LINUX_TEST_SRC([inode_operations_set_acl], [
#include <linux/fs.h>
int set_acl_fn(struct inode *inode, struct posix_acl *acl,
static int set_acl_fn(struct inode *inode, struct posix_acl *acl,
int type) { return 0; }
static const struct inode_operations

@ -8,7 +8,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_AUTOMOUNT], [
ZFS_LINUX_TEST_SRC([dentry_operations_d_automount], [
#include <linux/dcache.h>
struct vfsmount *d_automount(struct path *p) { return NULL; }
static struct vfsmount *d_automount(struct path *p) { return NULL; }
struct dentry_operations dops __attribute__ ((unused)) = {
.d_automount = d_automount,
};

@ -247,7 +247,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BIO_END_IO_T_ARGS], [
ZFS_LINUX_TEST_SRC([bio_end_io_t_args], [
#include <linux/bio.h>
void wanted_end_io(struct bio *bio) { return; }
static void wanted_end_io(struct bio *bio) { return; }
bio_end_io_t *end_io __attribute__ ((unused)) = wanted_end_io;
], [])
])

@ -25,6 +25,8 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_PLUG], [
dnl #
dnl # 2.6.32 - 4.11: statically allocated bdi in request_queue
dnl # 4.12: dynamically allocated bdi in request_queue
dnl # 6.11: bdi no longer available through request_queue, so get it from
dnl # the gendisk attached to the queue
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE_BDI], [
ZFS_LINUX_TEST_SRC([blk_queue_bdi], [
@ -47,6 +49,30 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_BDI], [
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE_DISK_BDI], [
ZFS_LINUX_TEST_SRC([blk_queue_disk_bdi], [
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
], [
struct request_queue q;
struct gendisk disk;
struct backing_dev_info bdi __attribute__ ((unused));
q.disk = &disk;
q.disk->bdi = &bdi;
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_DISK_BDI], [
AC_MSG_CHECKING([whether backing_dev_info is available through queue gendisk])
ZFS_LINUX_TEST_RESULT([blk_queue_disk_bdi], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_DISK_BDI, 1,
[backing_dev_info is available through queue gendisk])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 5.9: added blk_queue_update_readahead(),
dnl # 5.15: renamed to disk_update_readahead()
@ -332,7 +358,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_MAX_HW_SECTORS], [
ZFS_LINUX_TEST_RESULT([blk_queue_max_hw_sectors], [
AC_MSG_RESULT(yes)
],[
ZFS_LINUX_TEST_ERROR([blk_queue_max_hw_sectors])
AC_MSG_RESULT(no)
])
])
@ -355,7 +381,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_MAX_SEGMENTS], [
ZFS_LINUX_TEST_RESULT([blk_queue_max_segments], [
AC_MSG_RESULT(yes)
], [
ZFS_LINUX_TEST_ERROR([blk_queue_max_segments])
AC_MSG_RESULT(no)
])
])
@ -377,6 +403,14 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_MQ], [
(void) blk_mq_alloc_tag_set(&tag_set);
return BLK_STS_OK;
], [])
ZFS_LINUX_TEST_SRC([blk_mq_rq_hctx], [
#include <linux/blk-mq.h>
#include <linux/blkdev.h>
], [
struct request rq = {0};
struct blk_mq_hw_ctx *hctx = NULL;
rq.mq_hctx = hctx;
], [])
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_MQ], [
@ -384,6 +418,13 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_MQ], [
ZFS_LINUX_TEST_RESULT([blk_mq], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_MQ, 1, [block multiqueue is available])
AC_MSG_CHECKING([whether block multiqueue hardware context is cached in struct request])
ZFS_LINUX_TEST_RESULT([blk_mq_rq_hctx], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_MQ_RQ_HCTX, 1, [block multiqueue hardware context is cached in struct request])
], [
AC_MSG_RESULT(no)
])
], [
AC_MSG_RESULT(no)
])
@ -392,6 +433,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_MQ], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE], [
ZFS_AC_KERNEL_SRC_BLK_QUEUE_PLUG
ZFS_AC_KERNEL_SRC_BLK_QUEUE_BDI
ZFS_AC_KERNEL_SRC_BLK_QUEUE_DISK_BDI
ZFS_AC_KERNEL_SRC_BLK_QUEUE_UPDATE_READAHEAD
ZFS_AC_KERNEL_SRC_BLK_QUEUE_DISCARD
ZFS_AC_KERNEL_SRC_BLK_QUEUE_SECURE_ERASE
@ -406,6 +448,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE], [
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE], [
ZFS_AC_KERNEL_BLK_QUEUE_PLUG
ZFS_AC_KERNEL_BLK_QUEUE_BDI
ZFS_AC_KERNEL_BLK_QUEUE_DISK_BDI
ZFS_AC_KERNEL_BLK_QUEUE_UPDATE_READAHEAD
ZFS_AC_KERNEL_BLK_QUEUE_DISCARD
ZFS_AC_KERNEL_BLK_QUEUE_SECURE_ERASE

@ -35,6 +35,45 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG], [
])
])
dnl #
dnl # 6.8.x API change
dnl # bdev_open_by_path() replaces blkdev_get_by_path()
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_OPEN_BY_PATH], [
ZFS_LINUX_TEST_SRC([bdev_open_by_path], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct bdev_handle *bdh __attribute__ ((unused)) = NULL;
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
bdh = bdev_open_by_path(path, mode, holder, &h);
])
])
dnl #
dnl # 6.9.x API change
dnl # bdev_file_open_by_path() replaced bdev_open_by_path(),
dnl # and returns struct file*
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BDEV_FILE_OPEN_BY_PATH], [
ZFS_LINUX_TEST_SRC([bdev_file_open_by_path], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct file *file __attribute__ ((unused)) = NULL;
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
file = bdev_file_open_by_path(path, mode, holder, &h);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_BY_PATH], [
AC_MSG_CHECKING([whether blkdev_get_by_path() exists and takes 3 args])
ZFS_LINUX_TEST_RESULT([blkdev_get_by_path], [
@ -47,7 +86,24 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_BY_PATH], [
[blkdev_get_by_path() exists and takes 4 args])
AC_MSG_RESULT(yes)
], [
ZFS_LINUX_TEST_ERROR([blkdev_get_by_path()])
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bdev_open_by_path() exists])
ZFS_LINUX_TEST_RESULT([bdev_open_by_path], [
AC_DEFINE(HAVE_BDEV_OPEN_BY_PATH, 1,
[bdev_open_by_path() exists])
AC_MSG_RESULT(yes)
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bdev_file_open_by_path() exists])
ZFS_LINUX_TEST_RESULT([bdev_file_open_by_path], [
AC_DEFINE(HAVE_BDEV_FILE_OPEN_BY_PATH, 1,
[bdev_file_open_by_path() exists])
AC_MSG_RESULT(yes)
], [
AC_MSG_RESULT(no)
ZFS_LINUX_TEST_ERROR([blkdev_get_by_path()])
])
])
])
])
])
@ -108,18 +164,50 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_PUT_HOLDER], [
])
])
dnl #
dnl # 6.8.x API change
dnl # bdev_release() replaces blkdev_put()
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_RELEASE], [
ZFS_LINUX_TEST_SRC([bdev_release], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct bdev_handle *bdh = NULL;
bdev_release(bdh);
])
])
dnl #
dnl # 6.9.x API change
dnl #
dnl # bdev_release() now private, but because bdev_file_open_by_path() returns
dnl # struct file*, we can just use fput(). So the blkdev_put test no longer
dnl # fails if not found.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_PUT], [
AC_MSG_CHECKING([whether blkdev_put() exists])
ZFS_LINUX_TEST_RESULT([blkdev_put], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_PUT, 1, [blkdev_put() exists])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether blkdev_put() accepts void* as arg 2])
ZFS_LINUX_TEST_RESULT([blkdev_put_holder], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_PUT_HOLDER, 1,
[blkdev_put() accepts void* as arg 2])
], [
ZFS_LINUX_TEST_ERROR([blkdev_put()])
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bdev_release() exists])
ZFS_LINUX_TEST_RESULT([bdev_release], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BDEV_RELEASE, 1,
[bdev_release() exists])
], [
AC_MSG_RESULT(no)
])
])
])
])
@ -446,6 +534,30 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BDEV_WHOLE], [
])
])
dnl #
dnl # 5.16 API change
dnl # Added bdev_nr_bytes() helper.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_NR_BYTES], [
ZFS_LINUX_TEST_SRC([bdev_nr_bytes], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
loff_t nr_bytes __attribute__ ((unused)) = 0;
nr_bytes = bdev_nr_bytes(bdev);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BDEV_NR_BYTES], [
AC_MSG_CHECKING([whether bdev_nr_bytes() is available])
ZFS_LINUX_TEST_RESULT([bdev_nr_bytes], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BDEV_NR_BYTES, 1, [bdev_nr_bytes() is available])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 5.20 API change,
dnl # Removed bdevname(), snprintf(.., %pg) should be used.
@ -473,11 +585,29 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BDEVNAME], [
])
dnl #
dnl # 5.19 API: blkdev_issue_secure_erase()
dnl # 3.10 API: blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # TRIM support: discard and secure erase. We make use of asynchronous
dnl # functions when available.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE], [
ZFS_LINUX_TEST_SRC([blkdev_issue_secure_erase], [
dnl # 3.10:
dnl # sync discard: blkdev_issue_discard(..., 0)
dnl # sync erase: blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # async discard: [not available]
dnl # async erase: [not available]
dnl #
dnl # 4.7:
dnl # sync discard: blkdev_issue_discard(..., 0)
dnl # sync erase: blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # async discard: __blkdev_issue_discard(..., 0)
dnl # async erase: __blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl #
dnl # 5.19:
dnl # sync discard: blkdev_issue_discard(...)
dnl # sync erase: blkdev_issue_secure_erase(...)
dnl # async discard: __blkdev_issue_discard(...)
dnl # async erase: [not available]
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_DISCARD], [
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_noflags], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
@ -485,10 +615,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE], [
sector_t nr_sects = 0;
int error __attribute__ ((unused));
error = blkdev_issue_secure_erase(bdev,
error = blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_flags], [
#include <linux/blkdev.h>
],[
@ -501,9 +630,77 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE], [
error = blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, flags);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_async_noflags], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
sector_t sector = 0;
sector_t nr_sects = 0;
struct bio *biop = NULL;
int error __attribute__ ((unused));
error = __blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, &biop);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_async_flags], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
sector_t sector = 0;
sector_t nr_sects = 0;
unsigned long flags = 0;
struct bio *biop = NULL;
int error __attribute__ ((unused));
error = __blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, flags, &biop);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_secure_erase], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
sector_t sector = 0;
sector_t nr_sects = 0;
int error __attribute__ ((unused));
error = blkdev_issue_secure_erase(bdev,
sector, nr_sects, GFP_KERNEL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE], [
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_ISSUE_DISCARD], [
AC_MSG_CHECKING([whether blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_noflags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_NOFLAGS, 1,
[blkdev_issue_discard() is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether blkdev_issue_discard(flags) is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_FLAGS, 1,
[blkdev_issue_discard(flags) is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether __blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_async_noflags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_ASYNC_NOFLAGS, 1,
[__blkdev_issue_discard() is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether __blkdev_issue_discard(flags) is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_async_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_ASYNC_FLAGS, 1,
[__blkdev_issue_discard(flags) is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether blkdev_issue_secure_erase() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_secure_erase], [
AC_MSG_RESULT(yes)
@ -511,15 +708,6 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE], [
[blkdev_issue_secure_erase() is available])
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD, 1,
[blkdev_issue_discard() is available])
],[
ZFS_LINUX_TEST_ERROR([blkdev_issue_discard()])
])
])
])
@ -570,8 +758,11 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BLK_STS_RESV_CONFLICT], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH
ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_OPEN_BY_PATH
ZFS_AC_KERNEL_SRC_BDEV_FILE_OPEN_BY_PATH
ZFS_AC_KERNEL_SRC_BLKDEV_PUT
ZFS_AC_KERNEL_SRC_BLKDEV_PUT_HOLDER
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_RELEASE
ZFS_AC_KERNEL_SRC_BLKDEV_REREAD_PART
ZFS_AC_KERNEL_SRC_BLKDEV_INVALIDATE_BDEV
ZFS_AC_KERNEL_SRC_BLKDEV_LOOKUP_BDEV
@ -580,8 +771,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
ZFS_AC_KERNEL_SRC_BLKDEV_CHECK_DISK_CHANGE
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_CHECK_MEDIA_CHANGE
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_WHOLE
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_NR_BYTES
ZFS_AC_KERNEL_SRC_BLKDEV_BDEVNAME
ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE
ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_DISCARD
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_KOBJ
ZFS_AC_KERNEL_SRC_BLKDEV_PART_TO_DEV
ZFS_AC_KERNEL_SRC_BLKDEV_DISK_CHECK_MEDIA_CHANGE
@ -600,9 +792,10 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [
ZFS_AC_KERNEL_BLKDEV_CHECK_DISK_CHANGE
ZFS_AC_KERNEL_BLKDEV_BDEV_CHECK_MEDIA_CHANGE
ZFS_AC_KERNEL_BLKDEV_BDEV_WHOLE
ZFS_AC_KERNEL_BLKDEV_BDEV_NR_BYTES
ZFS_AC_KERNEL_BLKDEV_BDEVNAME
ZFS_AC_KERNEL_BLKDEV_GET_ERESTARTSYS
ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE
ZFS_AC_KERNEL_BLKDEV_ISSUE_DISCARD
ZFS_AC_KERNEL_BLKDEV_BDEV_KOBJ
ZFS_AC_KERNEL_BLKDEV_PART_TO_DEV
ZFS_AC_KERNEL_BLKDEV_DISK_CHECK_MEDIA_CHANGE


@ -5,7 +5,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_CHECK_EVENTS], [
ZFS_LINUX_TEST_SRC([block_device_operations_check_events], [
#include <linux/blkdev.h>
unsigned int blk_check_events(struct gendisk *disk,
static unsigned int blk_check_events(struct gendisk *disk,
unsigned int clearing) {
(void) disk, (void) clearing;
return (0);
@ -34,7 +34,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID], [
ZFS_LINUX_TEST_SRC([block_device_operations_release_void], [
#include <linux/blkdev.h>
void blk_release(struct gendisk *g, fmode_t mode) {
static void blk_release(struct gendisk *g, fmode_t mode) {
(void) g, (void) mode;
return;
}
@ -56,7 +56,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG], [
ZFS_LINUX_TEST_SRC([block_device_operations_release_void_1arg], [
#include <linux/blkdev.h>
void blk_release(struct gendisk *g) {
static void blk_release(struct gendisk *g) {
(void) g;
return;
}
@ -96,7 +96,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK], [
ZFS_LINUX_TEST_SRC([block_device_operations_revalidate_disk], [
#include <linux/blkdev.h>
int blk_revalidate_disk(struct gendisk *disk) {
static int blk_revalidate_disk(struct gendisk *disk) {
(void) disk;
return(0);
}


@ -7,7 +7,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_COMMIT_METADATA], [
ZFS_LINUX_TEST_SRC([export_operations_commit_metadata], [
#include <linux/exportfs.h>
int commit_metadata(struct inode *inode) { return 0; }
static int commit_metadata(struct inode *inode) { return 0; }
static struct export_operations eops __attribute__ ((unused))={
.commit_metadata = commit_metadata,
};


@ -2,12 +2,15 @@ dnl #
dnl # 4.9, current_time() added
dnl # 4.18, return type changed from timespec to timespec64
dnl #
dnl # Note that we don't care about the return type in this check. If we have
dnl # to implement a fallback, we'll know we're <4.9, which was timespec.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_CURRENT_TIME], [
ZFS_LINUX_TEST_SRC([current_time], [
#include <linux/fs.h>
], [
struct inode ip __attribute__ ((unused));
ip.i_atime = current_time(&ip);
(void) current_time(&ip);
])
])


@ -98,7 +98,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_D_REVALIDATE_NAMEIDATA], [
#include <linux/dcache.h>
#include <linux/sched.h>
int revalidate (struct dentry *dentry,
static int revalidate (struct dentry *dentry,
struct nameidata *nidata) { return 0; }
static const struct dentry_operations


@ -8,7 +8,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_DIRTY_INODE], [
ZFS_LINUX_TEST_SRC([dirty_inode_with_flags], [
#include <linux/fs.h>
void dirty_inode(struct inode *a, int b) { return; }
static void dirty_inode(struct inode *a, int b) { return; }
static const struct super_operations
sops __attribute__ ((unused)) = {


@ -7,7 +7,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_ENCODE_FH_WITH_INODE], [
ZFS_LINUX_TEST_SRC([export_operations_encode_fh], [
#include <linux/exportfs.h>
int encode_fh(struct inode *inode, __u32 *fh, int *max_len,
static int encode_fh(struct inode *inode, __u32 *fh, int *max_len,
struct inode *parent) { return 0; }
static struct export_operations eops __attribute__ ((unused))={
.encode_fh = encode_fh,


@ -6,7 +6,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_EVICT_INODE], [
ZFS_LINUX_TEST_SRC([evict_inode], [
#include <linux/fs.h>
void evict_inode (struct inode * t) { return; }
static void evict_inode (struct inode * t) { return; }
static struct super_operations sops __attribute__ ((unused)) = {
.evict_inode = evict_inode,
};


@ -11,7 +11,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_FALLOCATE], [
ZFS_LINUX_TEST_SRC([file_fallocate], [
#include <linux/fs.h>
long test_fallocate(struct file *file, int mode,
static long test_fallocate(struct file *file, int mode,
loff_t offset, loff_t len) { return 0; }
static const struct file_operations


@ -4,6 +4,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_FILEMAP], [
ZFS_LINUX_TEST_SRC([filemap_range_has_page], [
#include <linux/fs.h>
#include <linux/pagemap.h>
],[
struct address_space *mapping = NULL;
loff_t lstart = 0;


@ -1,7 +1,8 @@
dnl #
dnl # Starting from Linux 5.13, flush_dcache_page() becomes an inline
dnl # function and may indirectly reference GPL-only cpu_feature_keys on
dnl # powerpc
dnl # function and may indirectly reference GPL-only symbols:
dnl # on powerpc: cpu_feature_keys
dnl # on riscv: PageHuge (added from 6.2)
dnl #
dnl #


@ -79,6 +79,12 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_FPU], [
__kernel_fpu_end();
], [], [ZFS_META_LICENSE])
ZFS_LINUX_TEST_SRC([kernel_neon], [
#include <asm/neon.h>
], [
kernel_neon_begin();
kernel_neon_end();
], [], [ZFS_META_LICENSE])
])
AC_DEFUN([ZFS_AC_KERNEL_FPU], [
@ -105,9 +111,20 @@ AC_DEFUN([ZFS_AC_KERNEL_FPU], [
AC_DEFINE(KERNEL_EXPORTS_X86_FPU, 1,
[kernel exports FPU functions])
],[
AC_MSG_RESULT(internal)
AC_DEFINE(HAVE_KERNEL_FPU_INTERNAL, 1,
[kernel fpu internal])
dnl #
dnl # ARM neon symbols (only on arm and arm64)
dnl # could be GPL-only on arm64 after Linux 6.2
dnl #
ZFS_LINUX_TEST_RESULT([kernel_neon_license],[
AC_MSG_RESULT(kernel_neon_*)
AC_DEFINE(HAVE_KERNEL_NEON, 1,
[kernel has kernel_neon_* functions])
],[
# catch-all
AC_MSG_RESULT(internal)
AC_DEFINE(HAVE_KERNEL_FPU_INTERNAL, 1,
[kernel fpu internal])
])
])
])
])


@ -0,0 +1,36 @@
dnl #
dnl # 6.6 API change,
dnl # fsync_bdev was removed in favor of sync_blockdev
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_SYNC_BDEV], [
ZFS_LINUX_TEST_SRC([fsync_bdev], [
#include <linux/blkdev.h>
],[
fsync_bdev(NULL);
])
ZFS_LINUX_TEST_SRC([sync_blockdev], [
#include <linux/blkdev.h>
],[
sync_blockdev(NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_SYNC_BDEV], [
AC_MSG_CHECKING([whether fsync_bdev() exists])
ZFS_LINUX_TEST_RESULT([fsync_bdev], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FSYNC_BDEV, 1,
[fsync_bdev() is declared in include/linux/blkdev.h])
],[
AC_MSG_CHECKING([whether sync_blockdev() exists])
ZFS_LINUX_TEST_RESULT([sync_blockdev], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SYNC_BLOCKDEV, 1,
[sync_blockdev() is declared in include/linux/blkdev.h])
],[
ZFS_LINUX_TEST_ERROR(
[neither fsync_bdev() nor sync_blockdev() exist])
])
])
])
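
A note on how the two results above might be consumed. The following is a minimal userspace sketch, not the project's actual compat code: `struct block_device` and `sync_blockdev()` are stand-ins so the snippet compiles outside a kernel tree, `HAVE_SYNC_BLOCKDEV` is defined by hand as if configure had found only the newer function, and the wrapper name `compat_sync_bdev()` is hypothetical.

```c
#include <stddef.h>

/* Stand-ins so the sketch builds outside a kernel tree (assumptions). */
struct block_device { int synced; };

/* Pretend configure found only sync_blockdev(), as on a 6.6+ kernel. */
#define	HAVE_SYNC_BLOCKDEV 1

/* Mock of the kernel function: records that a sync was requested. */
static int
sync_blockdev(struct block_device *bdev)
{
	if (bdev == NULL)
		return (0);
	bdev->synced = 1;
	return (0);
}

/*
 * Hypothetical wrapper: prefer fsync_bdev() when configure found it,
 * otherwise fall back to sync_blockdev(). This mirrors the order in
 * which the configure check probes the two functions.
 */
static int
compat_sync_bdev(struct block_device *bdev)
{
#if defined(HAVE_FSYNC_BDEV)
	return (fsync_bdev(bdev));
#elif defined(HAVE_SYNC_BLOCKDEV)
	return (sync_blockdev(bdev));
#else
#error "neither fsync_bdev() nor sync_blockdev() is available"
#endif
}
```

The `#error` branch corresponds to the `ZFS_LINUX_TEST_ERROR` case in the macro: if neither function is found, the build is expected to stop at configure time rather than reach this point.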


@ -5,7 +5,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_FSYNC], [
ZFS_LINUX_TEST_SRC([fsync_without_dentry], [
#include <linux/fs.h>
int test_fsync(struct file *f, int x) { return 0; }
static int test_fsync(struct file *f, int x) { return 0; }
static const struct file_operations
fops __attribute__ ((unused)) = {
@ -16,7 +16,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_FSYNC], [
ZFS_LINUX_TEST_SRC([fsync_range], [
#include <linux/fs.h>
int test_fsync(struct file *f, loff_t a, loff_t b, int c)
static int test_fsync(struct file *f, loff_t a, loff_t b, int c)
{ return 0; }
static const struct file_operations

View File

@ -7,6 +7,10 @@ dnl #
dnl # 6.3 API
dnl # generic_fillattr() now takes struct mnt_idmap* as the first argument
dnl #
dnl # 6.6 API
dnl # generic_fillattr() now takes u32 as second argument, representing a
dnl # request_mask for statx
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_GENERIC_FILLATTR], [
ZFS_LINUX_TEST_SRC([generic_fillattr_userns], [
#include <linux/fs.h>
@ -25,22 +29,39 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_GENERIC_FILLATTR], [
struct kstat *k = NULL;
generic_fillattr(idmap, in, k);
])
ZFS_LINUX_TEST_SRC([generic_fillattr_mnt_idmap_reqmask], [
#include <linux/fs.h>
],[
struct mnt_idmap *idmap = NULL;
struct inode *in = NULL;
struct kstat *k = NULL;
generic_fillattr(idmap, 0, in, k);
])
])
AC_DEFUN([ZFS_AC_KERNEL_GENERIC_FILLATTR], [
AC_MSG_CHECKING([whether generic_fillattr requires struct mnt_idmap*])
ZFS_LINUX_TEST_RESULT([generic_fillattr_mnt_idmap], [
AC_MSG_CHECKING(
[whether generic_fillattr requires struct mnt_idmap* and request_mask])
ZFS_LINUX_TEST_RESULT([generic_fillattr_mnt_idmap_reqmask], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_GENERIC_FILLATTR_IDMAP, 1,
[generic_fillattr requires struct mnt_idmap*])
AC_DEFINE(HAVE_GENERIC_FILLATTR_IDMAP_REQMASK, 1,
[generic_fillattr requires struct mnt_idmap* and u32 request_mask])
],[
AC_MSG_CHECKING([whether generic_fillattr requires struct user_namespace*])
ZFS_LINUX_TEST_RESULT([generic_fillattr_userns], [
AC_MSG_CHECKING([whether generic_fillattr requires struct mnt_idmap*])
ZFS_LINUX_TEST_RESULT([generic_fillattr_mnt_idmap], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_GENERIC_FILLATTR_USERNS, 1,
[generic_fillattr requires struct user_namespace*])
AC_DEFINE(HAVE_GENERIC_FILLATTR_IDMAP, 1,
[generic_fillattr requires struct mnt_idmap*])
],[
AC_MSG_RESULT([no])
AC_MSG_CHECKING([whether generic_fillattr requires struct user_namespace*])
ZFS_LINUX_TEST_RESULT([generic_fillattr_userns], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_GENERIC_FILLATTR_USERNS, 1,
[generic_fillattr requires struct user_namespace*])
],[
AC_MSG_RESULT([no])
])
])
])
])
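
The nested result checks above probe the newest signature first and fall through to older ones. A caller could mirror that order with a preprocessor ladder. The sketch below is an illustration under stated assumptions: the structs and the four-argument `generic_fillattr()` are userspace mocks, `HAVE_GENERIC_FILLATTR_IDMAP_REQMASK` is set by hand, and `compat_generic_fillattr()` is a hypothetical name.

```c
#include <stdint.h>

/* Userspace mocks of the kernel types (assumptions, not real layouts). */
struct mnt_idmap { int unused; };
struct inode { uint64_t i_ino; };
struct kstat { uint64_t ino; uint32_t result_mask; };

/* Pretend configure detected the 6.6 signature. */
#define	HAVE_GENERIC_FILLATTR_IDMAP_REQMASK 1

/*
 * Mock of the 6.6-style function. Copying request_mask into
 * result_mask is mock behaviour so the call can be observed.
 */
static void
generic_fillattr(struct mnt_idmap *idmap, uint32_t request_mask,
    struct inode *ip, struct kstat *sp)
{
	(void) idmap;
	sp->ino = ip->i_ino;
	sp->result_mask = request_mask;
}

/* Hypothetical wrapper that hides the signature differences. */
static void
compat_generic_fillattr(void *ns, uint32_t request_mask,
    struct inode *ip, struct kstat *sp)
{
#if defined(HAVE_GENERIC_FILLATTR_IDMAP_REQMASK)
	generic_fillattr((struct mnt_idmap *)ns, request_mask, ip, sp);
#elif defined(HAVE_GENERIC_FILLATTR_IDMAP)
	(void) request_mask;
	generic_fillattr((struct mnt_idmap *)ns, ip, sp);
#elif defined(HAVE_GENERIC_FILLATTR_USERNS)
	(void) request_mask;
	generic_fillattr((struct user_namespace *)ns, ip, sp);
#else
	(void) ns, (void) request_mask;
	generic_fillattr(ip, sp);
#endif
}
```

Only the first branch is compiled here; the others are shown to make the fall-through order visible.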


@ -5,7 +5,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_GET_LINK], [
ZFS_LINUX_TEST_SRC([inode_operations_get_link], [
#include <linux/fs.h>
const char *get_link(struct dentry *de, struct inode *ip,
static const char *get_link(struct dentry *de, struct inode *ip,
struct delayed_call *done) { return "symlink"; }
static struct inode_operations
iops __attribute__ ((unused)) = {
@ -15,7 +15,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_GET_LINK], [
ZFS_LINUX_TEST_SRC([inode_operations_get_link_cookie], [
#include <linux/fs.h>
const char *get_link(struct dentry *de, struct
static const char *get_link(struct dentry *de, struct
inode *ip, void **cookie) { return "symlink"; }
static struct inode_operations
iops __attribute__ ((unused)) = {
@ -25,7 +25,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_GET_LINK], [
ZFS_LINUX_TEST_SRC([inode_operations_follow_link], [
#include <linux/fs.h>
const char *follow_link(struct dentry *de,
static const char *follow_link(struct dentry *de,
void **cookie) { return "symlink"; }
static struct inode_operations
iops __attribute__ ((unused)) = {
@ -35,7 +35,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_GET_LINK], [
ZFS_LINUX_TEST_SRC([inode_operations_follow_link_nameidata], [
#include <linux/fs.h>
void *follow_link(struct dentry *de, struct
static void *follow_link(struct dentry *de, struct
nameidata *nd) { return (void *)NULL; }
static struct inode_operations
iops __attribute__ ((unused)) = {


@ -23,3 +23,28 @@ AC_DEFUN([ZFS_AC_KERNEL_IDMAP_MNT_API], [
])
])
dnl #
dnl # 6.8 decouples mnt_idmap from user_namespace. This is all internal
dnl # to mnt_idmap so we can't detect it directly, but we detect a related
dnl # change and use that as a signal.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_IDMAP_NO_USERNS], [
ZFS_LINUX_TEST_SRC([idmap_no_userns], [
#include <linux/uidgid.h>
], [
struct uid_gid_map *map = NULL;
map_id_down(map, 0);
])
])
AC_DEFUN([ZFS_AC_KERNEL_IDMAP_NO_USERNS], [
AC_MSG_CHECKING([whether idmapped mounts are decoupled from user namespaces])
ZFS_LINUX_TEST_RESULT([idmap_no_userns], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_IDMAP_NO_USERNS, 1,
[mnt_idmap does not have user_namespace])
], [
AC_MSG_RESULT([no])
])
])


@ -7,7 +7,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_CREATE], [
#include <linux/fs.h>
#include <linux/sched.h>
int inode_create(struct mnt_idmap *idmap,
static int inode_create(struct mnt_idmap *idmap,
struct inode *inode ,struct dentry *dentry,
umode_t umode, bool flag) { return 0; }
@ -25,7 +25,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_CREATE], [
#include <linux/fs.h>
#include <linux/sched.h>
int inode_create(struct user_namespace *userns,
static int inode_create(struct user_namespace *userns,
struct inode *inode ,struct dentry *dentry,
umode_t umode, bool flag) { return 0; }
@ -42,7 +42,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_CREATE], [
#include <linux/fs.h>
#include <linux/sched.h>
int inode_create(struct inode *inode ,struct dentry *dentry,
static int inode_create(struct inode *inode ,struct dentry *dentry,
umode_t umode, bool flag) { return 0; }
static const struct inode_operations


@ -7,7 +7,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_GETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_getattr_mnt_idmap], [
#include <linux/fs.h>
int test_getattr(
static int test_getattr(
struct mnt_idmap *idmap,
const struct path *p, struct kstat *k,
u32 request_mask, unsigned int query_flags)
@ -28,7 +28,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_GETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_getattr_userns], [
#include <linux/fs.h>
int test_getattr(
static int test_getattr(
struct user_namespace *userns,
const struct path *p, struct kstat *k,
u32 request_mask, unsigned int query_flags)
@ -47,7 +47,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_GETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_getattr_path], [
#include <linux/fs.h>
int test_getattr(
static int test_getattr(
const struct path *p, struct kstat *k,
u32 request_mask, unsigned int query_flags)
{ return 0; }
@ -61,7 +61,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_GETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_getattr_vfsmount], [
#include <linux/fs.h>
int test_getattr(
static int test_getattr(
struct vfsmount *mnt, struct dentry *d,
struct kstat *k)
{ return 0; }


@ -6,7 +6,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_LOOKUP_FLAGS], [
#include <linux/fs.h>
#include <linux/sched.h>
struct dentry *inode_lookup(struct inode *inode,
static struct dentry *inode_lookup(struct inode *inode,
struct dentry *dentry, unsigned int flags) { return NULL; }
static const struct inode_operations iops


@ -8,12 +8,12 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_PERMISSION], [
#include <linux/fs.h>
#include <linux/sched.h>
int inode_permission(struct mnt_idmap *idmap,
static int test_permission(struct mnt_idmap *idmap,
struct inode *inode, int mask) { return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.permission = inode_permission,
.permission = test_permission,
};
],[])
@ -25,12 +25,12 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_PERMISSION], [
#include <linux/fs.h>
#include <linux/sched.h>
int inode_permission(struct user_namespace *userns,
static int test_permission(struct user_namespace *userns,
struct inode *inode, int mask) { return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.permission = inode_permission,
.permission = test_permission,
};
],[])
])


@ -7,7 +7,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_SETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_setattr_mnt_idmap], [
#include <linux/fs.h>
int test_setattr(
static int test_setattr(
struct mnt_idmap *idmap,
struct dentry *de, struct iattr *ia)
{ return 0; }
@ -27,7 +27,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_SETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_setattr_userns], [
#include <linux/fs.h>
int test_setattr(
static int test_setattr(
struct user_namespace *userns,
struct dentry *de, struct iattr *ia)
{ return 0; }
@ -41,7 +41,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_SETATTR], [
ZFS_LINUX_TEST_SRC([inode_operations_setattr], [
#include <linux/fs.h>
int test_setattr(
static int test_setattr(
struct dentry *de, struct iattr *ia)
{ return 0; }


@ -27,6 +27,73 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_TIMES], [
memset(&ip, 0, sizeof(ip));
ts = ip.i_mtime;
])
dnl #
dnl # 6.6 API change
dnl # i_ctime no longer directly accessible, must use
dnl # inode_get_ctime(ip), inode_set_ctime*(ip) to
dnl # read/write.
dnl #
ZFS_LINUX_TEST_SRC([inode_get_ctime], [
#include <linux/fs.h>
],[
struct inode ip;
memset(&ip, 0, sizeof(ip));
inode_get_ctime(&ip);
])
ZFS_LINUX_TEST_SRC([inode_set_ctime_to_ts], [
#include <linux/fs.h>
],[
struct inode ip;
struct timespec64 ts = {0};
memset(&ip, 0, sizeof(ip));
inode_set_ctime_to_ts(&ip, ts);
])
dnl #
dnl # 6.7 API change
dnl # i_atime/i_mtime no longer directly accessible, must use
dnl # inode_get_atime(ip)/inode_get_mtime(ip) and
dnl # inode_set_atime*(ip)/inode_set_mtime*(ip) to read/write.
dnl #
ZFS_LINUX_TEST_SRC([inode_get_atime], [
#include <linux/fs.h>
],[
struct inode ip;
memset(&ip, 0, sizeof(ip));
inode_get_atime(&ip);
])
ZFS_LINUX_TEST_SRC([inode_get_mtime], [
#include <linux/fs.h>
],[
struct inode ip;
memset(&ip, 0, sizeof(ip));
inode_get_mtime(&ip);
])
ZFS_LINUX_TEST_SRC([inode_set_atime_to_ts], [
#include <linux/fs.h>
],[
struct inode ip;
struct timespec64 ts = {0};
memset(&ip, 0, sizeof(ip));
inode_set_atime_to_ts(&ip, ts);
])
ZFS_LINUX_TEST_SRC([inode_set_mtime_to_ts], [
#include <linux/fs.h>
],[
struct inode ip;
struct timespec64 ts = {0};
memset(&ip, 0, sizeof(ip));
inode_set_mtime_to_ts(&ip, ts);
])
])
AC_DEFUN([ZFS_AC_KERNEL_INODE_TIMES], [
@ -47,4 +114,58 @@ AC_DEFUN([ZFS_AC_KERNEL_INODE_TIMES], [
AC_DEFINE(HAVE_INODE_TIMESPEC64_TIMES, 1,
[inode->i_*time's are timespec64])
])
AC_MSG_CHECKING([whether inode_get_ctime() exists])
ZFS_LINUX_TEST_RESULT([inode_get_ctime], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_GET_CTIME, 1,
[inode_get_ctime() exists in linux/fs.h])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether inode_set_ctime_to_ts() exists])
ZFS_LINUX_TEST_RESULT([inode_set_ctime_to_ts], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_SET_CTIME_TO_TS, 1,
[inode_set_ctime_to_ts() exists in linux/fs.h])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether inode_get_atime() exists])
ZFS_LINUX_TEST_RESULT([inode_get_atime], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_GET_ATIME, 1,
[inode_get_atime() exists in linux/fs.h])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether inode_set_atime_to_ts() exists])
ZFS_LINUX_TEST_RESULT([inode_set_atime_to_ts], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_SET_ATIME_TO_TS, 1,
[inode_set_atime_to_ts() exists in linux/fs.h])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether inode_get_mtime() exists])
ZFS_LINUX_TEST_RESULT([inode_get_mtime], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_GET_MTIME, 1,
[inode_get_mtime() exists in linux/fs.h])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether inode_set_mtime_to_ts() exists])
ZFS_LINUX_TEST_RESULT([inode_set_mtime_to_ts], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_SET_MTIME_TO_TS, 1,
[inode_set_mtime_to_ts() exists in linux/fs.h])
],[
AC_MSG_RESULT(no)
])
])
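
The accessor checks above lend themselves to a shim: where the kernel does not provide the accessor, a fallback with the same name reads or writes the field directly. A minimal sketch for the ctime pair follows, assuming a pre-6.6 kernel (neither `HAVE_INODE_GET_CTIME` nor `HAVE_INODE_SET_CTIME_TO_TS` defined). `struct timespec64` and `struct inode` are userspace mocks, and the fallback bodies are an assumption about how such a shim could look, not a copy of any existing header.

```c
/* Userspace mocks of the kernel types (assumptions). */
struct timespec64 { long long tv_sec; long tv_nsec; };
struct inode { struct timespec64 i_ctime; };

/*
 * Neither HAVE_INODE_GET_CTIME nor HAVE_INODE_SET_CTIME_TO_TS is
 * defined here, which models a kernel older than 6.6 where i_ctime
 * is still a directly accessible field.
 */
#ifndef HAVE_INODE_GET_CTIME
static inline struct timespec64
inode_get_ctime(const struct inode *ip)
{
	return (ip->i_ctime);
}
#endif

#ifndef HAVE_INODE_SET_CTIME_TO_TS
static inline struct timespec64
inode_set_ctime_to_ts(struct inode *ip, struct timespec64 ts)
{
	ip->i_ctime = ts;
	return (ts);
}
#endif
```

With shims like these in place, callers can use the accessor form everywhere, and the same pattern would extend to the atime and mtime pairs gated by the 6.7 checks.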


@ -4,7 +4,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
ZFS_LINUX_TEST_SRC([make_request_fn_void], [
#include <linux/blkdev.h>
void make_request(struct request_queue *q,
static void make_request(struct request_queue *q,
struct bio *bio) { return; }
],[
blk_queue_make_request(NULL, &make_request);
@ -12,7 +12,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
ZFS_LINUX_TEST_SRC([make_request_fn_blk_qc_t], [
#include <linux/blkdev.h>
blk_qc_t make_request(struct request_queue *q,
static blk_qc_t make_request(struct request_queue *q,
struct bio *bio) { return (BLK_QC_T_NONE); }
],[
blk_queue_make_request(NULL, &make_request);
@ -20,7 +20,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
ZFS_LINUX_TEST_SRC([blk_alloc_queue_request_fn], [
#include <linux/blkdev.h>
blk_qc_t make_request(struct request_queue *q,
static blk_qc_t make_request(struct request_queue *q,
struct bio *bio) { return (BLK_QC_T_NONE); }
],[
struct request_queue *q __attribute__ ((unused));
@ -29,7 +29,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
ZFS_LINUX_TEST_SRC([blk_alloc_queue_request_fn_rh], [
#include <linux/blkdev.h>
blk_qc_t make_request(struct request_queue *q,
static blk_qc_t make_request(struct request_queue *q,
struct bio *bio) { return (BLK_QC_T_NONE); }
],[
struct request_queue *q __attribute__ ((unused));
@ -50,6 +50,21 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
disk = blk_alloc_disk(NUMA_NO_NODE);
])
ZFS_LINUX_TEST_SRC([blk_alloc_disk_2arg], [
#include <linux/blkdev.h>
],[
struct queue_limits *lim = NULL;
struct gendisk *disk __attribute__ ((unused));
disk = blk_alloc_disk(lim, NUMA_NO_NODE);
])
ZFS_LINUX_TEST_SRC([blkdev_queue_limits_features], [
#include <linux/blkdev.h>
],[
struct queue_limits *lim = NULL;
lim->features = 0;
])
ZFS_LINUX_TEST_SRC([blk_cleanup_disk], [
#include <linux/blkdev.h>
],[
@ -96,6 +111,45 @@ AC_DEFUN([ZFS_AC_KERNEL_MAKE_REQUEST_FN], [
], [
AC_MSG_RESULT(no)
])
dnl #
dnl # Linux 6.9 API Change:
dnl # blk_alloc_disk() takes a nullable queue_limits arg.
dnl #
AC_MSG_CHECKING([whether blk_alloc_disk() exists and takes 2 args])
ZFS_LINUX_TEST_RESULT([blk_alloc_disk_2arg], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLK_ALLOC_DISK_2ARG], 1, [blk_alloc_disk() exists and takes 2 args])
dnl #
dnl # Linux 6.11 API change:
dnl # struct queue_limits gains a 'features' field,
dnl # used to set flushing options
dnl #
AC_MSG_CHECKING([whether struct queue_limits has a features field])
ZFS_LINUX_TEST_RESULT([blkdev_queue_limits_features], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLKDEV_QUEUE_LIMITS_FEATURES], 1,
[struct queue_limits has a features field])
], [
AC_MSG_RESULT(no)
])
dnl #
dnl # 5.20 API change,
dnl # Removed blk_cleanup_disk(), put_disk() should be used.
dnl #
AC_MSG_CHECKING([whether blk_cleanup_disk() exists])
ZFS_LINUX_TEST_RESULT([blk_cleanup_disk], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLK_CLEANUP_DISK], 1,
[blk_cleanup_disk() exists])
], [
AC_MSG_RESULT(no)
])
], [
AC_MSG_RESULT(no)
])
],[
AC_MSG_RESULT(no)


@ -9,7 +9,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MKDIR], [
ZFS_LINUX_TEST_SRC([mkdir_mnt_idmap], [
#include <linux/fs.h>
int mkdir(struct mnt_idmap *idmap,
static int mkdir(struct mnt_idmap *idmap,
struct inode *inode, struct dentry *dentry,
umode_t umode) { return 0; }
static const struct inode_operations
@ -26,7 +26,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MKDIR], [
ZFS_LINUX_TEST_SRC([mkdir_user_namespace], [
#include <linux/fs.h>
int mkdir(struct user_namespace *userns,
static int mkdir(struct user_namespace *userns,
struct inode *inode, struct dentry *dentry,
umode_t umode) { return 0; }
@ -47,7 +47,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MKDIR], [
ZFS_LINUX_TEST_SRC([inode_operations_mkdir], [
#include <linux/fs.h>
int mkdir(struct inode *inode, struct dentry *dentry,
static int mkdir(struct inode *inode, struct dentry *dentry,
umode_t umode) { return 0; }
static const struct inode_operations


@ -7,7 +7,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MKNOD], [
#include <linux/fs.h>
#include <linux/sched.h>
int tmp_mknod(struct mnt_idmap *idmap,
static int tmp_mknod(struct mnt_idmap *idmap,
struct inode *inode ,struct dentry *dentry,
umode_t u, dev_t d) { return 0; }
@ -25,7 +25,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MKNOD], [
#include <linux/fs.h>
#include <linux/sched.h>
int tmp_mknod(struct user_namespace *userns,
static int tmp_mknod(struct user_namespace *userns,
struct inode *inode ,struct dentry *dentry,
umode_t u, dev_t d) { return 0; }


@ -0,0 +1,36 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE], [
ZFS_LINUX_TEST_SRC([page_size], [
#include <linux/mm.h>
],[
unsigned long s;
s = page_size(NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_SIZE], [
AC_MSG_CHECKING([whether page_size() is available])
ZFS_LINUX_TEST_RESULT([page_size], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MM_PAGE_SIZE, 1, [page_size() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_MAPPING], [
ZFS_LINUX_TEST_SRC([page_mapping], [
#include <linux/pagemap.h>
],[
struct page *p = NULL;
struct address_space *m __attribute__ ((unused)) = page_mapping(p);
])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_MAPPING], [
AC_MSG_CHECKING([whether page_mapping() is available])
ZFS_LINUX_TEST_RESULT([page_mapping], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MM_PAGE_MAPPING, 1, [page_mapping() is available])
],[
AC_MSG_RESULT(no)
])
])
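
When `HAVE_MM_PAGE_SIZE` is not defined, a fallback can be built from `PAGE_SIZE` and the compound order of the page, which is how the kernel itself computes `page_size()`. The sketch below is a userspace illustration only: `PAGE_SIZE`, `struct page`, and `compound_order()` are mocks with assumed values, and the fallback macro is one possible form rather than the project's definitive one.

```c
/* Userspace mocks (assumptions): 4 KiB base pages, order stored inline. */
#define	PAGE_SIZE 4096UL
struct page { unsigned int order; };

static unsigned int
compound_order(const struct page *p)
{
	return (p->order);
}

/*
 * HAVE_MM_PAGE_SIZE is not defined here, modelling a kernel without
 * page_size(). The fallback scales the base page size by the order.
 */
#ifndef HAVE_MM_PAGE_SIZE
#define	page_size(p) ((unsigned long)(PAGE_SIZE << compound_order(p)))
#endif
```

An order-0 page yields the base page size; each higher order doubles it.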


@ -7,14 +7,14 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_PROC_OPERATIONS], [
ZFS_LINUX_TEST_SRC([proc_ops_struct], [
#include <linux/proc_fs.h>
int test_open(struct inode *ip, struct file *fp) { return 0; }
ssize_t test_read(struct file *fp, char __user *ptr,
static int test_open(struct inode *ip, struct file *fp) { return 0; }
static ssize_t test_read(struct file *fp, char __user *ptr,
size_t size, loff_t *offp) { return 0; }
ssize_t test_write(struct file *fp, const char __user *ptr,
static ssize_t test_write(struct file *fp, const char __user *ptr,
size_t size, loff_t *offp) { return 0; }
loff_t test_lseek(struct file *fp, loff_t off, int flag)
static loff_t test_lseek(struct file *fp, loff_t off, int flag)
{ return 0; }
int test_release(struct inode *ip, struct file *fp)
static int test_release(struct inode *ip, struct file *fp)
{ return 0; }
const struct proc_ops test_ops __attribute__ ((unused)) = {


@ -4,7 +4,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_PUT_LINK], [
ZFS_LINUX_TEST_SRC([put_link_cookie], [
#include <linux/fs.h>
void put_link(struct inode *ip, void *cookie)
static void put_link(struct inode *ip, void *cookie)
{ return; }
static struct inode_operations
iops __attribute__ ((unused)) = {
@ -14,7 +14,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_PUT_LINK], [
ZFS_LINUX_TEST_SRC([put_link_nameidata], [
#include <linux/fs.h>
void put_link(struct dentry *de, struct
static void put_link(struct dentry *de, struct
nameidata *nd, void *ptr) { return; }
static struct inode_operations
iops __attribute__ ((unused)) = {


@ -25,3 +25,62 @@ AC_DEFUN([ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # Linux 6.11 register_sysctl() enforces that sysctl tables no longer
dnl # supply a sentinel end-of-table element. 6.6 introduces
dnl # register_sysctl_sz() to enable callers to choose, so we use it if
dnl # available for backward compatibility.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ], [
ZFS_LINUX_TEST_SRC([has_register_sysctl_sz], [
#include <linux/sysctl.h>
],[
struct ctl_table test_table[] __attribute__((unused)) = {0};
register_sysctl_sz("", test_table, 0);
])
])
AC_DEFUN([ZFS_AC_KERNEL_REGISTER_SYSCTL_SZ], [
AC_MSG_CHECKING([whether register_sysctl_sz exists])
ZFS_LINUX_TEST_RESULT([has_register_sysctl_sz], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_REGISTER_SYSCTL_SZ, 1,
[register_sysctl_sz exists])
],[
AC_MSG_RESULT([no])
])
])
dnl #
dnl # Linux 6.11 makes const the ctl_table arg of proc_handler
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_PROC_HANDLER_CTL_TABLE_CONST], [
ZFS_LINUX_TEST_SRC([has_proc_handler_ctl_table_const], [
#include <linux/sysctl.h>
static int test_handler(
const struct ctl_table *ctl __attribute((unused)),
int write __attribute((unused)),
void *buffer __attribute((unused)),
size_t *lenp __attribute((unused)),
loff_t *ppos __attribute((unused)))
{
return (0);
}
], [
proc_handler *ph __attribute((unused)) =
&test_handler;
])
])
AC_DEFUN([ZFS_AC_KERNEL_PROC_HANDLER_CTL_TABLE_CONST], [
AC_MSG_CHECKING([whether proc_handler ctl_table arg is const])
ZFS_LINUX_TEST_RESULT([has_proc_handler_ctl_table_const], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_PROC_HANDLER_CTL_TABLE_CONST, 1,
[proc_handler ctl_table arg is const])
], [
AC_MSG_RESULT([no])
])
])
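
One way to use `HAVE_PROC_HANDLER_CTL_TABLE_CONST` is to hide the qualifier behind a macro so a single handler definition matches both the old and new `proc_handler` prototypes. The sketch below is a userspace illustration: `struct ctl_table`, the `proc_handler` typedef, and `demo_loff_t` are simplified mocks, the macro is defined by hand as if configure had detected a 6.11 kernel, and `CONST_CTL_TABLE` and `demo_handler()` are names chosen for this example.

```c
#include <stddef.h>

/* Simplified userspace mocks (assumptions, not the kernel layouts). */
struct ctl_table { const char *procname; int *data; };
typedef long long demo_loff_t;

/* Pretend configure detected the const-qualified prototype. */
#define	HAVE_PROC_HANDLER_CTL_TABLE_CONST 1

#ifdef HAVE_PROC_HANDLER_CTL_TABLE_CONST
#define	CONST_CTL_TABLE	const struct ctl_table
#else
#define	CONST_CTL_TABLE	struct ctl_table
#endif

/* Mock of the kernel typedef, following the detected constness. */
typedef int proc_handler(CONST_CTL_TABLE *ctl, int write, void *buffer,
    size_t *lenp, demo_loff_t *ppos);

/* Read-only handler: copies the table's integer into the buffer. */
static int
demo_handler(CONST_CTL_TABLE *ctl, int write, void *buffer,
    size_t *lenp, demo_loff_t *ppos)
{
	(void) ppos;
	if (write)
		return (-1);	/* writes are rejected in this sketch */
	*(int *)buffer = *ctl->data;
	*lenp = sizeof (int);
	return (0);
}
```

Assigning `&demo_handler` to a `proc_handler *` is the same compile-time probe the configure test performs: it only type-checks when the constness of the first argument matches.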


@ -8,7 +8,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_RENAME], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_rename2], [
#include <linux/fs.h>
int rename2_fn(struct inode *sip, struct dentry *sdp,
static int rename2_fn(struct inode *sip, struct dentry *sdp,
struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }
@ -26,7 +26,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_RENAME], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_rename_flags], [
#include <linux/fs.h>
int rename_fn(struct inode *sip, struct dentry *sdp,
static int rename_fn(struct inode *sip, struct dentry *sdp,
struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }
@ -44,7 +44,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_RENAME], [
dnl #
ZFS_LINUX_TEST_SRC([dir_inode_operations_wrapper_rename2], [
#include <linux/fs.h>
int rename2_fn(struct inode *sip, struct dentry *sdp,
static int rename2_fn(struct inode *sip, struct dentry *sdp,
struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }
@ -62,7 +62,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_RENAME], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_rename_userns], [
#include <linux/fs.h>
int rename_fn(struct user_namespace *user_ns, struct inode *sip,
static int rename_fn(struct user_namespace *user_ns, struct inode *sip,
struct dentry *sdp, struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }
@ -77,7 +77,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_RENAME], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_rename_mnt_idmap], [
#include <linux/fs.h>
int rename_fn(struct mnt_idmap *idmap, struct inode *sip,
static int rename_fn(struct mnt_idmap *idmap, struct inode *sip,
struct dentry *sdp, struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }


@ -5,7 +5,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SHOW_OPTIONS], [
ZFS_LINUX_TEST_SRC([super_operations_show_options], [
#include <linux/fs.h>
int show_options(struct seq_file * x, struct dentry * y) {
static int show_options(struct seq_file * x, struct dentry * y) {
return 0;
};


@ -8,9 +8,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_SHRINK], [
ZFS_LINUX_TEST_SRC([super_block_s_shrink], [
#include <linux/fs.h>
int shrink(struct shrinker *s, struct shrink_control *sc)
{ return 0; }
static const struct super_block
sb __attribute__ ((unused)) = {
.s_shrink.seeks = DEFAULT_SEEKS,
@ -19,12 +16,44 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_SHRINK], [
],[])
])
dnl #
dnl # 6.7 API change
dnl # s_shrink is now a pointer.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_SHRINK_PTR], [
ZFS_LINUX_TEST_SRC([super_block_s_shrink_ptr], [
#include <linux/fs.h>
static unsigned long shrinker_cb(struct shrinker *shrink,
struct shrink_control *sc) { return 0; }
static struct shrinker shrinker = {
.count_objects = shrinker_cb,
.scan_objects = shrinker_cb,
.seeks = DEFAULT_SEEKS,
};
static const struct super_block
sb __attribute__ ((unused)) = {
.s_shrink = &shrinker,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_SUPER_BLOCK_S_SHRINK], [
AC_MSG_CHECKING([whether super_block has s_shrink])
ZFS_LINUX_TEST_RESULT([super_block_s_shrink], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SUPER_BLOCK_S_SHRINK, 1,
[have super_block s_shrink])
],[
ZFS_LINUX_TEST_ERROR([sb->s_shrink()])
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether super_block has s_shrink pointer])
ZFS_LINUX_TEST_RESULT([super_block_s_shrink_ptr], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SUPER_BLOCK_S_SHRINK_PTR, 1,
[have super_block s_shrink pointer])
],[
AC_MSG_RESULT(no)
ZFS_LINUX_TEST_ERROR([sb->s_shrink()])
])
])
])
@ -57,7 +86,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SHRINK_CONTROL_HAS_NID], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SHRINKER_VARARG], [
ZFS_LINUX_TEST_SRC([register_shrinker_vararg], [
#include <linux/mm.h>
unsigned long shrinker_cb(struct shrinker *shrink,
static unsigned long shrinker_cb(struct shrinker *shrink,
struct shrink_control *sc) { return 0; }
],[
struct shrinker cache_shrinker = {
@ -72,7 +101,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SHRINKER_VARARG], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_SHRINKER_CALLBACK], [
ZFS_LINUX_TEST_SRC([shrinker_cb_shrink_control], [
#include <linux/mm.h>
int shrinker_cb(struct shrinker *shrink,
static int shrinker_cb(struct shrinker *shrink,
struct shrink_control *sc) { return 0; }
],[
struct shrinker cache_shrinker = {
@ -84,7 +113,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SHRINKER_CALLBACK], [
ZFS_LINUX_TEST_SRC([shrinker_cb_shrink_control_split], [
#include <linux/mm.h>
unsigned long shrinker_cb(struct shrinker *shrink,
static unsigned long shrinker_cb(struct shrinker *shrink,
struct shrink_control *sc) { return 0; }
],[
struct shrinker cache_shrinker = {
@ -96,6 +125,25 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SHRINKER_CALLBACK], [
])
])
dnl #
dnl # 6.7 API change
dnl # register_shrinker has been replaced by shrinker_register.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_SHRINKER_REGISTER], [
ZFS_LINUX_TEST_SRC([shrinker_register], [
#include <linux/shrinker.h>
static unsigned long shrinker_cb(struct shrinker *shrink,
struct shrink_control *sc) { return 0; }
],[
struct shrinker cache_shrinker = {
.count_objects = shrinker_cb,
.scan_objects = shrinker_cb,
.seeks = DEFAULT_SEEKS,
};
shrinker_register(&cache_shrinker);
])
])
AC_DEFUN([ZFS_AC_KERNEL_SHRINKER_CALLBACK],[
dnl #
dnl # 6.0 API change
@ -133,14 +181,36 @@ AC_DEFUN([ZFS_AC_KERNEL_SHRINKER_CALLBACK],[
dnl # cs->shrink() is logically split in to
dnl # cs->count_objects() and cs->scan_objects()
dnl #
AC_MSG_CHECKING([if cs->count_objects callback exists])
AC_MSG_CHECKING(
[whether cs->count_objects callback exists])
ZFS_LINUX_TEST_RESULT(
[shrinker_cb_shrink_control_split],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SPLIT_SHRINKER_CALLBACK, 1,
[cs->count_objects exists])
[shrinker_cb_shrink_control_split],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SPLIT_SHRINKER_CALLBACK, 1,
[cs->count_objects exists])
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether shrinker_register exists])
ZFS_LINUX_TEST_RESULT([shrinker_register], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SHRINKER_REGISTER, 1,
[shrinker_register exists])
dnl # We assume that the split shrinker
dnl # callback exists if
dnl # shrinker_register() exists,
dnl # because the latter is a much more
dnl # recent addition, and the macro
dnl # test for shrinker_register() only
dnl # works if the callback is split
AC_DEFINE(HAVE_SPLIT_SHRINKER_CALLBACK,
1, [cs->count_objects exists])
],[
AC_MSG_RESULT(no)
ZFS_LINUX_TEST_ERROR([shrinker])
])
])
])
])
@ -174,10 +244,12 @@ AC_DEFUN([ZFS_AC_KERNEL_SHRINK_CONTROL_STRUCT], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_SHRINKER], [
ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_SHRINK
ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_SHRINK_PTR
ZFS_AC_KERNEL_SRC_SHRINK_CONTROL_HAS_NID
ZFS_AC_KERNEL_SRC_SHRINKER_CALLBACK
ZFS_AC_KERNEL_SRC_SHRINK_CONTROL_STRUCT
ZFS_AC_KERNEL_SRC_REGISTER_SHRINKER_VARARG
ZFS_AC_KERNEL_SRC_SHRINKER_REGISTER
])
AC_DEFUN([ZFS_AC_KERNEL_SHRINKER], [

config/kernel-strlcpy.m4 Normal file

@ -0,0 +1,47 @@
dnl #
dnl # 6.8.x replaced strlcpy with strscpy. Check for both so we can provide
dnl # appropriate fallbacks.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_STRLCPY], [
ZFS_LINUX_TEST_SRC([kernel_has_strlcpy], [
#include <linux/string.h>
], [
const char *src = "goodbye";
char dst[32];
size_t len;
len = strlcpy(dst, src, sizeof (dst));
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_STRSCPY], [
ZFS_LINUX_TEST_SRC([kernel_has_strscpy], [
#include <linux/string.h>
], [
const char *src = "goodbye";
char dst[32];
ssize_t len;
len = strscpy(dst, src, sizeof (dst));
])
])
AC_DEFUN([ZFS_AC_KERNEL_STRLCPY], [
AC_MSG_CHECKING([whether strlcpy() exists])
ZFS_LINUX_TEST_RESULT([kernel_has_strlcpy], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_KERNEL_STRLCPY, 1,
[strlcpy() exists])
], [
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_STRSCPY], [
AC_MSG_CHECKING([whether strscpy() exists])
ZFS_LINUX_TEST_RESULT([kernel_has_strscpy], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_KERNEL_STRSCPY, 1,
[strscpy() exists])
], [
AC_MSG_RESULT([no])
])
])
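The two checks above only define HAVE_KERNEL_STRLCPY and HAVE_KERNEL_STRSCPY; the fallback itself lives elsewhere in the tree and is not shown in this diff. As a rough userspace sketch of what such a fallback might look like, the following emulates strscpy() semantics (return the copied length, or -E2BIG on truncation) and layers a strlcpy()-style wrapper on top. The names sketch_strscpy and sketch_strlcpy are hypothetical and are not taken from the OpenZFS sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Userspace stand-in for the kernel's strscpy() (hypothetical name).
 * Copies at most size - 1 bytes, always NUL-terminates when size > 0,
 * and returns the number of bytes copied or -E2BIG on truncation.
 */
ssize_t
sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return (-E2BIG);

	len = strlen(src);
	if (len >= size) {
		/* Truncate, keeping room for the terminator. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return (-E2BIG);
	}

	memcpy(dst, src, len + 1);
	return ((ssize_t)len);
}

/*
 * strlcpy()-style wrapper built on the function above: callers that
 * still expect strlcpy() semantics get strlen(src) back, so they can
 * detect truncation by comparing the result against size.
 */
size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	(void) sketch_strscpy(dst, src, size);
	return (strlen(src));
}
```

Copying "goodbye" into a 4-byte buffer with sketch_strlcpy() leaves "goo" in the buffer and returns 7, while sketch_strscpy() reports -E2BIG for the same call. That difference in return values is why the commit probes for both functions rather than treating them as interchangeable.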


@ -6,7 +6,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SYMLINK], [
ZFS_LINUX_TEST_SRC([symlink_mnt_idmap], [
#include <linux/fs.h>
#include <linux/sched.h>
int tmp_symlink(struct mnt_idmap *idmap,
static int tmp_symlink(struct mnt_idmap *idmap,
struct inode *inode ,struct dentry *dentry,
const char *path) { return 0; }
@ -23,7 +23,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_SYMLINK], [
#include <linux/fs.h>
#include <linux/sched.h>
int tmp_symlink(struct user_namespace *userns,
static int tmp_symlink(struct user_namespace *userns,
struct inode *inode ,struct dentry *dentry,
const char *path) { return 0; }


@ -18,7 +18,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_TIMER_SETUP], [
int data;
};
void task_expire(struct timer_list *tl)
static void task_expire(struct timer_list *tl)
{
struct my_task_timer *task_timer =
from_timer(task_timer, tl, timer);
@ -31,7 +31,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_TIMER_SETUP], [
ZFS_LINUX_TEST_SRC([timer_list_function], [
#include <linux/timer.h>
void task_expire(struct timer_list *tl) {}
static void task_expire(struct timer_list *tl) {}
],[
struct timer_list tl;
tl.function = task_expire;


@ -9,7 +9,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_TMPFILE], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_tmpfile_mnt_idmap], [
#include <linux/fs.h>
int tmpfile(struct mnt_idmap *idmap,
static int tmpfile(struct mnt_idmap *idmap,
struct inode *inode, struct file *file,
umode_t mode) { return 0; }
static struct inode_operations
@ -22,7 +22,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_TMPFILE], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_tmpfile], [
#include <linux/fs.h>
int tmpfile(struct user_namespace *userns,
static int tmpfile(struct user_namespace *userns,
struct inode *inode, struct file *file,
umode_t mode) { return 0; }
static struct inode_operations
@ -36,7 +36,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_TMPFILE], [
dnl #
ZFS_LINUX_TEST_SRC([inode_operations_tmpfile_dentry_userns], [
#include <linux/fs.h>
int tmpfile(struct user_namespace *userns,
static int tmpfile(struct user_namespace *userns,
struct inode *inode, struct dentry *dentry,
umode_t mode) { return 0; }
static struct inode_operations
@ -46,7 +46,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_TMPFILE], [
],[])
ZFS_LINUX_TEST_SRC([inode_operations_tmpfile_dentry], [
#include <linux/fs.h>
int tmpfile(struct inode *inode, struct dentry *dentry,
static int tmpfile(struct inode *inode, struct dentry *dentry,
umode_t mode) { return 0; }
static struct inode_operations
iops __attribute__ ((unused)) = {

config/kernel-types.m4 Normal file

@ -0,0 +1,40 @@
dnl #
dnl # check if kernel provides definitions for given types
dnl #
dnl _ZFS_AC_KERNEL_SRC_TYPE(type)
AC_DEFUN([_ZFS_AC_KERNEL_SRC_TYPE], [
ZFS_LINUX_TEST_SRC([type_$1], [
#include <linux/types.h>
],[
const $1 __attribute__((unused)) x = ($1) 0;
])
])
dnl _ZFS_AC_KERNEL_TYPE(type)
AC_DEFUN([_ZFS_AC_KERNEL_TYPE], [
AC_MSG_CHECKING([whether kernel defines $1])
ZFS_LINUX_TEST_RESULT([type_$1], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_KERNEL_]m4_quote(m4_translit([$1], [a-z], [A-Z])),
1, [kernel defines $1])
], [
AC_MSG_RESULT([no])
])
])
dnl ZFS_AC_KERNEL_TYPES([types...])
AC_DEFUN([ZFS_AC_KERNEL_TYPES], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_TYPES], [
m4_foreach_w([type], [$1], [
_ZFS_AC_KERNEL_SRC_TYPE(type)
])
])
AC_DEFUN([ZFS_AC_KERNEL_TYPES], [
m4_foreach_w([type], [$1], [
_ZFS_AC_KERNEL_TYPE(type)
])
])
])
ZFS_AC_KERNEL_TYPES([intptr_t])
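With the m4_translit() call above, the check for intptr_t yields the macro HAVE_KERNEL_INTPTR_T. The consumer of that macro is not part of this diff; a minimal sketch of how a compatibility header could use it follows. The typedef name sketch_intptr_t and the helper are hypothetical, <stdint.h> stands in for <linux/types.h> so the block builds in userspace, and the fallback assumes that long is pointer-sized, as it is on the Linux ABIs I am aware of.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_KERNEL_INTPTR_T
/* The kernel already provides the type; just reuse it. */
#include <stdint.h>	/* userspace stand-in for <linux/types.h> */
typedef intptr_t sketch_intptr_t;
#else
/* Assumed fallback: long has the same width as a pointer. */
typedef long sketch_intptr_t;
#endif

/*
 * Round-trip a pointer through the integer type and report whether
 * it survived (1) or not (0). Illustrative helper only.
 */
int
sketch_ptr_roundtrip(void *p)
{
	sketch_intptr_t v;

	/* Documents the width assumption made by the fallback. */
	assert(sizeof (sketch_intptr_t) >= sizeof (void *));

	v = (sketch_intptr_t)p;
	return ((void *)v == p);
}
```

Guarding the typedef this way avoids a conflicting redefinition on kernels whose headers already declare the type.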


@ -5,7 +5,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO], [
ZFS_LINUX_TEST_SRC([direct_io_iter], [
#include <linux/fs.h>
ssize_t test_direct_IO(struct kiocb *kiocb,
static ssize_t test_direct_IO(struct kiocb *kiocb,
struct iov_iter *iter) { return 0; }
static const struct address_space_operations
@ -17,7 +17,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO], [
ZFS_LINUX_TEST_SRC([direct_io_iter_offset], [
#include <linux/fs.h>
ssize_t test_direct_IO(struct kiocb *kiocb,
static ssize_t test_direct_IO(struct kiocb *kiocb,
struct iov_iter *iter, loff_t offset) { return 0; }
static const struct address_space_operations
@ -29,7 +29,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO], [
ZFS_LINUX_TEST_SRC([direct_io_iter_rw_offset], [
#include <linux/fs.h>
ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
static ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
struct iov_iter *iter, loff_t offset) { return 0; }
static const struct address_space_operations
@ -41,7 +41,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO], [
ZFS_LINUX_TEST_SRC([direct_io_iovec], [
#include <linux/fs.h>
ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
static ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
const struct iovec *iov, loff_t offset,
unsigned long nr_segs) { return 0; }


@ -16,6 +16,9 @@ dnl #
dnl # 5.3: VFS copy_file_range() expected to do its own fallback,
dnl # generic_copy_file_range() added to support it
dnl #
dnl # 6.8: generic_copy_file_range() removed, replaced by
dnl # splice_copy_file_range()
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_copy_file_range], [
#include <linux/fs.h>
@ -72,6 +75,30 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE], [
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_SPLICE_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([splice_copy_file_range], [
#include <linux/splice.h>
], [
struct file *src_file __attribute__ ((unused)) = NULL;
loff_t src_off __attribute__ ((unused)) = 0;
struct file *dst_file __attribute__ ((unused)) = NULL;
loff_t dst_off __attribute__ ((unused)) = 0;
size_t len __attribute__ ((unused)) = 0;
splice_copy_file_range(src_file, src_off, dst_file, dst_off,
len);
])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_SPLICE_COPY_FILE_RANGE], [
AC_MSG_CHECKING([whether splice_copy_file_range() is available])
ZFS_LINUX_TEST_RESULT([splice_copy_file_range], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_SPLICE_COPY_FILE_RANGE, 1,
[splice_copy_file_range() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_clone_file_range], [
#include <linux/fs.h>


@ -1,7 +1,7 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_ITERATE], [
ZFS_LINUX_TEST_SRC([file_operations_iterate_shared], [
#include <linux/fs.h>
int iterate(struct file *filp, struct dir_context * context)
static int iterate(struct file *filp, struct dir_context * context)
{ return 0; }
static const struct file_operations fops
@ -12,7 +12,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_ITERATE], [
ZFS_LINUX_TEST_SRC([file_operations_iterate], [
#include <linux/fs.h>
int iterate(struct file *filp,
static int iterate(struct file *filp,
struct dir_context *context) { return 0; }
static const struct file_operations fops
@ -27,7 +27,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_ITERATE], [
ZFS_LINUX_TEST_SRC([file_operations_readdir], [
#include <linux/fs.h>
int readdir(struct file *filp, void *entry,
static int readdir(struct file *filp, void *entry,
filldir_t func) { return 0; }
static const struct file_operations fops


@ -5,9 +5,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_RW_ITERATE], [
ZFS_LINUX_TEST_SRC([file_operations_rw], [
#include <linux/fs.h>
ssize_t test_read(struct kiocb *kiocb, struct iov_iter *to)
static ssize_t test_read(struct kiocb *kiocb, struct iov_iter *to)
{ return 0; }
ssize_t test_write(struct kiocb *kiocb, struct iov_iter *from)
static ssize_t test_write(struct kiocb *kiocb, struct iov_iter *from)
{ return 0; }
static const struct file_operations


@ -6,7 +6,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_WRITEPAGE_T], [
dnl #
ZFS_LINUX_TEST_SRC([writepage_t_folio], [
#include <linux/writeback.h>
int putpage(struct folio *folio,
static int putpage(struct folio *folio,
struct writeback_control *wbc, void *data)
{ return 0; }
writepage_t func = putpage;


@ -68,7 +68,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_GET], [
ZFS_LINUX_TEST_SRC([xattr_handler_get_dentry_inode], [
#include <linux/xattr.h>
int get(const struct xattr_handler *handler,
static int get(const struct xattr_handler *handler,
struct dentry *dentry, struct inode *inode,
const char *name, void *buffer, size_t size) { return 0; }
static const struct xattr_handler
@ -80,7 +80,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_GET], [
ZFS_LINUX_TEST_SRC([xattr_handler_get_xattr_handler], [
#include <linux/xattr.h>
int get(const struct xattr_handler *handler,
static int get(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
void *buffer, size_t size) { return 0; }
static const struct xattr_handler
@ -92,7 +92,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_GET], [
ZFS_LINUX_TEST_SRC([xattr_handler_get_dentry], [
#include <linux/xattr.h>
int get(struct dentry *dentry, const char *name,
static int get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int handler_flags)
{ return 0; }
static const struct xattr_handler
@ -104,7 +104,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_GET], [
ZFS_LINUX_TEST_SRC([xattr_handler_get_dentry_inode_flags], [
#include <linux/xattr.h>
int get(const struct xattr_handler *handler,
static int get(const struct xattr_handler *handler,
struct dentry *dentry, struct inode *inode,
const char *name, void *buffer,
size_t size, int flags) { return 0; }
@ -182,7 +182,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_SET], [
ZFS_LINUX_TEST_SRC([xattr_handler_set_mnt_idmap], [
#include <linux/xattr.h>
int set(const struct xattr_handler *handler,
static int set(const struct xattr_handler *handler,
struct mnt_idmap *idmap,
struct dentry *dentry, struct inode *inode,
const char *name, const void *buffer,
@ -197,7 +197,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_SET], [
ZFS_LINUX_TEST_SRC([xattr_handler_set_userns], [
#include <linux/xattr.h>
int set(const struct xattr_handler *handler,
static int set(const struct xattr_handler *handler,
struct user_namespace *mnt_userns,
struct dentry *dentry, struct inode *inode,
const char *name, const void *buffer,
@ -212,7 +212,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_SET], [
ZFS_LINUX_TEST_SRC([xattr_handler_set_dentry_inode], [
#include <linux/xattr.h>
int set(const struct xattr_handler *handler,
static int set(const struct xattr_handler *handler,
struct dentry *dentry, struct inode *inode,
const char *name, const void *buffer,
size_t size, int flags)
@ -226,7 +226,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_SET], [
ZFS_LINUX_TEST_SRC([xattr_handler_set_xattr_handler], [
#include <linux/xattr.h>
int set(const struct xattr_handler *handler,
static int set(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags)
{ return 0; }
@ -239,7 +239,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_SET], [
ZFS_LINUX_TEST_SRC([xattr_handler_set_dentry], [
#include <linux/xattr.h>
int set(struct dentry *dentry, const char *name,
static int set(struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags,
int handler_flags) { return 0; }
static const struct xattr_handler
@ -325,7 +325,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_LIST], [
ZFS_LINUX_TEST_SRC([xattr_handler_list_simple], [
#include <linux/xattr.h>
bool list(struct dentry *dentry) { return 0; }
static bool list(struct dentry *dentry) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.list = list,
@ -335,7 +335,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_LIST], [
ZFS_LINUX_TEST_SRC([xattr_handler_list_xattr_handler], [
#include <linux/xattr.h>
size_t list(const struct xattr_handler *handler,
static size_t list(const struct xattr_handler *handler,
struct dentry *dentry, char *list, size_t list_size,
const char *name, size_t name_len) { return 0; }
static const struct xattr_handler
@ -347,7 +347,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_XATTR_HANDLER_LIST], [
ZFS_LINUX_TEST_SRC([xattr_handler_list_dentry], [
#include <linux/xattr.h>
size_t list(struct dentry *dentry,
static size_t list(struct dentry *dentry,
char *list, size_t list_size,
const char *name, size_t name_len,
int handler_flags) { return 0; }


@ -37,6 +37,7 @@ dnl # only once the compilation can be done in parallel significantly
dnl # speeding up the process.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_TYPES
ZFS_AC_KERNEL_SRC_OBJTOOL
ZFS_AC_KERNEL_SRC_GLOBAL_PAGE_STATE
ZFS_AC_KERNEL_SRC_ACCESS_OK_TYPE
@ -118,6 +119,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_SPLICE_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE
@ -149,6 +151,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_SYSFS
ZFS_AC_KERNEL_SRC_SET_SPECIAL_STATE
ZFS_AC_KERNEL_SRC_STANDALONE_LINUX_STDARG
ZFS_AC_KERNEL_SRC_STRLCPY
ZFS_AC_KERNEL_SRC_STRSCPY
ZFS_AC_KERNEL_SRC_PAGEMAP_FOLIO_WAIT_BIT
ZFS_AC_KERNEL_SRC_ADD_DISK
ZFS_AC_KERNEL_SRC_KTHREAD
@ -156,17 +160,26 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC___COPY_FROM_USER_INATOMIC
ZFS_AC_KERNEL_SRC_USER_NS_COMMON_INUM
ZFS_AC_KERNEL_SRC_IDMAP_MNT_API
ZFS_AC_KERNEL_SRC_IDMAP_NO_USERNS
ZFS_AC_KERNEL_SRC_IATTR_VFSID
ZFS_AC_KERNEL_SRC_FILEMAP
ZFS_AC_KERNEL_SRC_WRITEPAGE_T
ZFS_AC_KERNEL_SRC_RECLAIMED
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ
ZFS_AC_KERNEL_SRC_PROC_HANDLER_CTL_TABLE_CONST
ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ
ZFS_AC_KERNEL_SRC_SYNC_BDEV
ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE
ZFS_AC_KERNEL_SRC_MM_PAGE_MAPPING
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
ZFS_AC_KERNEL_SRC_FLUSH_DCACHE_PAGE
;;
riscv*)
ZFS_AC_KERNEL_SRC_FLUSH_DCACHE_PAGE
;;
esac
AC_MSG_CHECKING([for available kernel interfaces])
@ -178,6 +191,7 @@ dnl #
dnl # Check results of kernel interface tests.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_TYPES
ZFS_AC_KERNEL_ACCESS_OK_TYPE
ZFS_AC_KERNEL_GLOBAL_PAGE_STATE
ZFS_AC_KERNEL_OBJTOOL
@ -259,6 +273,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_VFS_IOV_ITER
ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_SPLICE_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE
@ -290,6 +305,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_SYSFS
ZFS_AC_KERNEL_SET_SPECIAL_STATE
ZFS_AC_KERNEL_STANDALONE_LINUX_STDARG
ZFS_AC_KERNEL_STRLCPY
ZFS_AC_KERNEL_STRSCPY
ZFS_AC_KERNEL_PAGEMAP_FOLIO_WAIT_BIT
ZFS_AC_KERNEL_ADD_DISK
ZFS_AC_KERNEL_KTHREAD
@ -297,17 +314,26 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL___COPY_FROM_USER_INATOMIC
ZFS_AC_KERNEL_USER_NS_COMMON_INUM
ZFS_AC_KERNEL_IDMAP_MNT_API
ZFS_AC_KERNEL_IDMAP_NO_USERNS
ZFS_AC_KERNEL_IATTR_VFSID
ZFS_AC_KERNEL_FILEMAP
ZFS_AC_KERNEL_WRITEPAGE_T
ZFS_AC_KERNEL_RECLAIMED
ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_REGISTER_SYSCTL_SZ
ZFS_AC_KERNEL_PROC_HANDLER_CTL_TABLE_CONST
ZFS_AC_KERNEL_COPY_SPLICE_READ
ZFS_AC_KERNEL_SYNC_BDEV
ZFS_AC_KERNEL_MM_PAGE_SIZE
ZFS_AC_KERNEL_MM_PAGE_MAPPING
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
ZFS_AC_KERNEL_FLUSH_DCACHE_PAGE
;;
riscv*)
ZFS_AC_KERNEL_FLUSH_DCACHE_PAGE
;;
esac
])


@ -83,6 +83,11 @@ srpm-common:
rpm-local || exit 1; \
LANG=C $(RPMBUILD) \
--define "_tmppath $$rpmbuild/TMP" \
--define "_builddir $$rpmbuild/BUILD" \
--define "_rpmdir $$rpmbuild/RPMS" \
--define "_srcrpmdir $$rpmbuild/SRPMS" \
--define "_specdir $$rpmbuild/SPECS" \
--define "_sourcedir $$rpmbuild/SOURCES" \
--define "_topdir $$rpmbuild" \
$(def) -bs $$rpmbuild/SPECS/$$rpmspec || exit 1; \
cp $$rpmbuild/SRPMS/$$rpmpkg . || exit 1; \
@ -99,6 +104,11 @@ rpm-common:
rpm-local || exit 1; \
LANG=C ${RPMBUILD} \
--define "_tmppath $$rpmbuild/TMP" \
--define "_builddir $$rpmbuild/BUILD" \
--define "_rpmdir $$rpmbuild/RPMS" \
--define "_srcrpmdir $$rpmbuild/SRPMS" \
--define "_specdir $$rpmbuild/SPECS" \
--define "_sourcedir $$rpmbuild/SOURCES" \
--define "_topdir $$rpmbuild" \
$(def) --rebuild $$rpmpkg || exit 1; \
cp $$rpmbuild/RPMS/*/* . || exit 1; \

config/user-backtrace.m4 Normal file

@ -0,0 +1,14 @@
dnl
dnl backtrace(), for userspace assertions. glibc has this directly in libc.
dnl FreeBSD and (sometimes) musl have it in a separate -lexecinfo. It's assumed
dnl that this will also get the companion function backtrace_symbols().
dnl
AC_DEFUN([ZFS_AC_CONFIG_USER_BACKTRACE], [
AX_SAVE_FLAGS
LIBS=""
AC_SEARCH_LIBS([backtrace], [execinfo], [
AC_DEFINE(HAVE_BACKTRACE, 1, [backtrace() is available])
AC_SUBST([BACKTRACE_LIBS], ["$LIBS"])
])
AX_RESTORE_FLAGS
])
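The macro above defines HAVE_BACKTRACE and exports BACKTRACE_LIBS; the userspace assertion code that consumes them is outside this diff. A minimal sketch of such a consumer is given below. The function name sketch_dump_backtrace and the frame limit are hypothetical, and the extra __GLIBC__ test is only an assumption made so that the block builds without a generated config header.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* __GLIBC__ becomes visible once a libc header has been included. */
#if defined(HAVE_BACKTRACE) || defined(__GLIBC__)
#include <execinfo.h>
#define	SKETCH_HAVE_BACKTRACE	1
#endif

#define	SKETCH_MAX_FRAMES	32

/*
 * Print the current call stack to 'out' and return the number of
 * frames captured, or 0 when backtrace() is not available.
 */
int
sketch_dump_backtrace(FILE *out)
{
	assert(out != NULL);
#ifdef SKETCH_HAVE_BACKTRACE
	void *frames[SKETCH_MAX_FRAMES];
	int n = backtrace(frames, SKETCH_MAX_FRAMES);
	char **syms = backtrace_symbols(frames, n);

	if (syms != NULL) {
		for (int i = 0; i < n; i++)
			fprintf(out, "%s\n", syms[i]);
		/* backtrace_symbols() returns one malloc()ed block. */
		free(syms);
	}
	return (n);
#else
	fprintf(out, "backtrace() is not available on this platform\n");
	return (0);
#endif
}
```

On platforms where AC_SEARCH_LIBS finds backtrace() in libexecinfo rather than libc, the program would also have to be linked with the flags recorded in BACKTRACE_LIBS.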
