Commit Graph

7311 Commits

Author SHA1 Message Date
Rob Norris 802c258fc1 compress: add "slack" compression options
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Allan Jude 066532da51 Add module parameter to block 0 byte writes
Some hardware has issues when issued a write of 0 bytes
Add a new module parameter, zio_suppress_zero_writes
That when enabled (default) will just complete these I/Os
without sending them to the hardware.

Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
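The short-circuit described in the commit above can be sketched as follows. This is a minimal illustration, not the actual ZFS code: `suppress_zero_writes`, `device_submit_write()` and `issue_write()` are hypothetical names standing in for the `zio_suppress_zero_writes` tunable and the real I/O path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the zio_suppress_zero_writes tunable (default on). */
bool suppress_zero_writes = true;

/* Counts writes that actually reach the simulated device. */
int device_writes = 0;

/* Hypothetical device submit path; a real driver would issue I/O here. */
int
device_submit_write(const void *buf, size_t len)
{
	(void) buf;
	(void) len;
	device_writes++;
	return (0);
}

/*
 * Complete zero-length writes immediately when suppression is enabled;
 * otherwise hand the request to the device.
 */
int
issue_write(const void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (len == 0 && suppress_zero_writes)
		return (0);
	return (device_submit_write(buf, len));
}
```

With suppression enabled, `issue_write(buf, 0)` returns success without touching the device; with it disabled, the zero-length request is passed through unchanged.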
Mateusz Piotrowski 91d6b61268 json: Define PRId64 and PRIu64 on FreeBSD
On FreeBSD, these types are long instead of long long.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 95d6d8d32f json: Drop problematic casts in nvlist_to_json()
The NVP_NAME() macro requires its argument to be castable to char *.
The compiler complains if const char * is provided instead.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski a7d67aed05 freebsd: Fix ZFS_ENTER_UNMOUNTOK and ZFS_ENTER on FreeBSD
There was a typo in zfs_znode_impl. The two macros were lowercase
instead of all caps, which caused compilation problems on FreeBSD.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 6ee35af1a4 zil: Drop an unnecessary if statement
We already check for error != 0 earlier and return if true. The compiler
error here is a false positive.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski d744cdb77c json: null_filter(): Use __maybe_unused
The function fails to compile with -Wself-assign.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 9c2c6124be zpool: Provide GUID to zpool-reguid(8) with -g
This commit extends the zpool-reguid(8) command with a -g flag, which
allows the user to specify the GUID to set.

Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara Inc.
2023-07-05 13:27:31 +00:00
Allan Jude 9c9eed9737 Make zpool clear reset the removed flag on vdevs
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Allan Jude 41b06f70c6 Make zpool clear reset the removed flag on vdevs
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
2023-07-05 13:27:31 +00:00
Fred Weigel b6a9054a0e Fix checkstyle for zil.c
Returns are to be parenthesized
2023-07-05 13:27:31 +00:00
Fred Weigel eb3607bcec Fixes for Wasabi json endpoint
Corrects status output.
2023-07-05 13:27:31 +00:00
Fred Weigel cf5a6fbc82 Change 5 char tag limit to 255
Changes 5 character maximum tag to 255 characters.
2023-07-05 13:27:31 +00:00
Fred Weigel 6ccb1a75af Klara update for json
Fix checkstyle indicated errors, source format fixes

Signed-off-by: Fred Weigel <fred.weigel@klarasystems.com>
2023-07-05 13:27:31 +00:00
Allan Jude 2284c4d200 Add module parameter to block 0 byte writes
Some hardware has issues when issued a write of 0 bytes
Add a new module parameter, zio_suppress_zero_writes
That when enabled (default) will just complete these I/Os
without sending them to the hardware.

Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Rob Norris f882884358 btree: fix double-free in zfs_btree_remove_idx
We applied 03c0ee94b to fix two use-after-free cases, backporting 13f2b8fb9
from upstream. Unfortunately that patch seems to have been misapplied,
introducing a double-free in one of them. This commit fixes that.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2023-07-05 13:27:31 +00:00
Fred Weigel 506f987972 Added jprint.h and json_status.h to allow dist build
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Rob Norris 88149e0873 zil_create: don't try to deallocate a block we never allocated
(cherry picked from commit 8a35cfdcdd62ffc47e7628616f0dcb2ef172cf4b)
2023-07-05 13:27:31 +00:00
Rob Norris 5a256eaed1 zil_close: don't try to deallocate on-disk blocks
If we're force-exporting or failed then there's no guarantee the IO will
get anywhere. If it's a clean shutdown then that's actually the lead
block and it'll be sorted out during replay or next txg.

(cherry picked from commit 01e04a4eef7811a31a6258c99d0cc51217732758)
2023-07-05 13:27:31 +00:00
Allan Jude 11d3cff47b Normalize the endpoint name
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
fredw 43b705c787 stats_version: 2, scan_stats added even if never done. pass_scrub_scrub_spent_paused is now pass_scrub_spent_paused. stats is stats.json
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 3828f754f1 json_stats.c: Rename the stats file to "status.json"
2023-07-05 13:27:31 +00:00
Rob Norris 2724bcb3d6 zil: allow the ZIL to fail and restart independently of the pool
zil_commit() has always returned void, and thus, cannot fail. Everything
inside it assumed that if anything ever went wrong, it could fall back
on txg_wait_synced() until the txg covering the operations being flushed
from the ZIL has fully committed. This meant that if the pool failed and
failmode=continue was set, syncing operations like fsync() would still
block.

Unblocking zil_commit() means largely the same approach. The difficulty
is that the ZIL carries the record of uncommitted VFS operations (vs the
changed data), and attached to those, callbacks and cvs that will
release userspace callers once the data is on disk. So if we can't write
the ZIL, we also can't release those records until the data is on disk.

This wasn't a problem before, because the zil_commit() would block. If
we change zil_commit() to return error, we still need to track those
entries until the data they represent hits the disk. We also need to
accept new records; just because the ZIL fails may not necessarily mean
the pool itself is unavailable.

This commit reorganises the ZIL to allow zil_commit() to return failure.
If ZIL writes or flushes fail, the ZIL is moved into a "failed" state,
and no further writes are done; all zil_commit() calls are serviced by
the regular txg mechanism. Outstanding records (itx_ts) are held until
the main pool writes their associated txg out. The records are then
released. Once all records are cleared, the ZIL is reset and reopened.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit af821006f6602261e690fe6635689cabdeefcadf)
2023-07-05 13:27:31 +00:00
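The lifecycle described in the commit above (fail, hold records until their txg syncs, then reset) can be modelled as a small state machine. This is only a sketch under simplifying assumptions: `log_t`, `log_fail()`, `log_commit()` and `log_txg_synced()` are hypothetical names, records are reduced to the txg they belong to, and a successful commit is assumed to release every record.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define	MAX_RECORDS	8

typedef enum { LOG_ACTIVE, LOG_FAILED } log_state_t;

typedef struct {
	log_state_t	state;
	uint64_t	pending_txg[MAX_RECORDS];	/* txg of each held record */
	int		npending;
} log_t;

/* Track a new record belonging to the given txg. */
void
log_add_record(log_t *l, uint64_t txg)
{
	assert(l->npending < MAX_RECORDS);
	l->pending_txg[l->npending++] = txg;
}

/* A log write or flush failed: stop writing to the log. */
void
log_fail(log_t *l)
{
	l->state = LOG_FAILED;
}

/*
 * Returns 0 if the log serviced the commit, or EIO if the caller
 * must fall back to waiting for the txg to sync.
 */
int
log_commit(log_t *l)
{
	if (l->state == LOG_FAILED)
		return (EIO);
	l->npending = 0;	/* records are in the log; release them */
	return (0);
}

/* The pool synced synced_txg: release covered records, reopen if drained. */
void
log_txg_synced(log_t *l, uint64_t synced_txg)
{
	int kept = 0;

	for (int i = 0; i < l->npending; i++) {
		if (l->pending_txg[i] > synced_txg)
			l->pending_txg[kept++] = l->pending_txg[i];
	}
	l->npending = kept;
	if (l->state == LOG_FAILED && l->npending == 0)
		l->state = LOG_ACTIVE;
}
```

The point of the model is that a failed log keeps accepting and tracking records, and only returns to the active state once the regular txg path has covered every record it was holding.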
Rob Norris cdaf041d39 zil: ensure flush errors are received
It's possible for a hardware failure to occur in a way that the ZIL block
writes appear to succeed, but the flush fails.

Because flush errors were being ignored, the lwb chain would finish with
a zero error code, which would result in zil_commit() returning and thus
fsync() returning success to the caller, even though the data was not
recorded in the ZIL.

If the ZIL is on the main pool (no SLOG device) it would typically
suspend around the same time. If that happened before the txg committed,
then those writes are now totally lost - not on the pool, not in the
ZIL.

zil_lwb_flush_vdevs_done() has the necessary code to deal with this
situation, but zio_flush() would never return failure, so it never saw
it. This just allows flushes to report failure, and now we never miss a
failed ZIL write.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit d9db5dccc56b551d0bf66bc9022b6c19a659b7e1)
2023-07-05 13:27:31 +00:00
Rob Norris 8ec175d7e1 zio_flush: require caller to decide if errors should propagate
Ignoring flush errors makes it possible for callers to never know that
their writes didn't succeed, and allows writes to be lost if the pool
fails.

This commit gives zio_flush() a flag argument, and updates the call
sites to pass ZIO_FLAG_DONT_PROPAGATE to it. Thus, this commit does not
change any behaviour, but opens the floor for further changes to allow
those callers to handle flush failures sensibly.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 6d0deb8a5a0c3d6bbc69d9625d55fc776bb98ea3)
2023-07-05 13:27:31 +00:00
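The effect of a caller-supplied propagation flag, as described in the commit above, can be sketched like this. The names `FLUSH_FLAG_DONT_PROPAGATE` and `flush_with_flags()` are hypothetical, and the device result is passed in as a parameter to keep the example self-contained; the real `zio_flush()` signature is not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag: the caller does not want flush errors reported. */
#define	FLUSH_FLAG_DONT_PROPAGATE	(1u << 0)

/*
 * device_error simulates the outcome of the hardware flush
 * (0 on success, an errno value on failure).
 */
int
flush_with_flags(int device_error, unsigned int flags)
{
	assert(device_error >= 0);
	if (device_error != 0 && (flags & FLUSH_FLAG_DONT_PROPAGATE))
		return (0);	/* error swallowed, matching the old behaviour */
	return (device_error);	/* error visible to the caller */
}
```

Passing the flag everywhere preserves the previous behaviour; dropping it at a given call site is what lets that caller see and handle a failed flush.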
Rob Norris 589cea17a9 dmu_tx_wait: handle pool suspension when failmode=continue
Let txg_wait_synced_tx fail, so the caller can retry.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit d560d64dbdf853d8fb9e18fc7570bd309091b2e4)
2023-07-05 13:27:30 +00:00
Rob Norris 7b7af8ba02 vnops: thread DMU_TX_ASSIGN_CONTINUE to a bunch of vnops
These are ones that I'm reasonably sure connect to a real syscall and
have a reasonable error response.

I've left stuff like `dirty_inode`, `zfs_inactive`, etc, which are
internal kernel housekeeping things, as well as anything that looks like
it belongs to zvols, ioctls, admin commands, etc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 39c2801c611e27b521d716fea8f771307820362e)
2023-07-05 13:27:30 +00:00
Rob Norris aea007e336 dmu: add DMU_TX_ASSIGN_CONTINUE flag
This is like DMU_TX_ASSIGN_NOSUSPEND, but only when failmode=continue,
and returning EIO if the pool is suspended. It's designed to be easy to
use from syscalls and similar without the ceremony of checking for
EAGAIN and failmode every time.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 6bed8644dd2afa0e39727e9e90642479c2416521)
2023-07-05 13:27:30 +00:00
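The decision the commit above describes can be sketched as a pure function. Everything here is illustrative: the flag and enum names are hypothetical, `WOULD_BLOCK` is a sentinel invented for the example, and returning EAGAIN for the NOSUSPEND case is an assumption inferred from the commit message rather than taken from the code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef enum { FAILMODE_WAIT, FAILMODE_CONTINUE, FAILMODE_PANIC } failmode_t;

#define	ASSIGN_NOSUSPEND	(1u << 0)	/* hypothetical flag values */
#define	ASSIGN_CONTINUE		(1u << 1)
#define	WOULD_BLOCK		(-1)		/* sentinel: caller would wait */

/* What a tx assignment attempt would do for a given pool state. */
int
assign_outcome(bool suspended, failmode_t failmode, unsigned int flags)
{
	assert((flags & ~(ASSIGN_NOSUSPEND | ASSIGN_CONTINUE)) == 0);
	if (!suspended)
		return (0);
	if (flags & ASSIGN_NOSUSPEND)
		return (EAGAIN);	/* assumption: caller checks and retries */
	if ((flags & ASSIGN_CONTINUE) && failmode == FAILMODE_CONTINUE)
		return (EIO);		/* ready to hand back to a syscall */
	return (WOULD_BLOCK);
}
```

The convenience is that a syscall path can pass one flag and return the result directly, instead of inspecting both the error and the pool's failmode itself.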
Rob Norris 48a48059c7 dmu: rename dmu_tx_assign flags
Their names clash with those for txg_wait_synced_tx, and they aren't
directly compatible, leading to confusion.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 1f0fb1dae7c1e84de3b39e669e09b8b3d5b80b87)
2023-07-05 13:27:30 +00:00
Rob Norris b0d75996ba zio: don't report suspend IOs if the pool is already suspended
This can happen if the pool suspended and then new IO is issued which
then fails too. This doesn't change behaviour, just silences the noise.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 3fa696404fb40205ed631538c62ec1a54d8ee6cd)
2023-07-05 13:27:30 +00:00
Rob Norris 3aea149bf8 linux: reject syncing ops if the filesystem is unmounting
The kernel can call these during unmount, so we have to handle them
directly to prevent any further IO being issued.

zfs_fsync reorganised slightly to not set up zfs_fsyncer_key until after
the teardown lock is acquired, just in case we don't get it.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 900c26570ddcdd1d3ca135e6aee5df6456f6bfd6)
2023-07-05 13:27:30 +00:00
Rob Norris 7e4a9cbaee zpool_disable_datasets: on Linux, detach mounts when forcing export
On Linux, MNT_FORCE makes the kernel inform the filesystem that it's
about to call its unmount method so it can begin to eject active IO,
making it more likely that the unmount will succeed. This however does
not arrange for the unmount method to always succeed; new IO between the
two filesystem calls can dirty the filesystem. This is very difficult to
lock out properly within ZFS, as not all operations that cause the
kernel to dirty the filesystem can easily be locked out (eg zfs_lookup).

So, we add MNT_DETACH as well. This causes the kernel to first remove
the mount from the user namespace, giving the appearance that it has
been unmounted (ie no longer appears in /proc/mounts), so that userspace
can't reference the filesystem anymore. The unmount then proceeds in the
background.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit d2e1634fc935288aa851b5915feaa670c791265c)
2023-07-05 13:27:30 +00:00
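On Linux the flag combination described above is passed to `umount2(2)`. The following is a minimal sketch of that call, not the libzfs implementation; `export_umount_flags()` and `unmount_for_export()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h>

/* Flags for unmounting during an export; forced exports also detach. */
int
export_umount_flags(bool force)
{
	return (force ? (MNT_FORCE | MNT_DETACH) : 0);
}

/*
 * Unmount a filesystem as part of an export. Returns 0 on success or
 * -1 with errno set, exactly as umount2(2) does.
 */
int
unmount_for_export(const char *mountpoint, bool force)
{
	assert(mountpoint != NULL);
	return (umount2(mountpoint, export_umount_flags(force)));
}
```

With `MNT_DETACH` set the call can return before the filesystem is fully torn down, so a caller should not assume the unmount has completed when it returns.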
Mariusz Zaborski 40a9efd0e8 zfs: support force exporting pools
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by:  Klara, Inc.
Sponsored-by:  Catalogics, Inc.
Sponsored-by:  Wasabi Technology, Inc.
Closes #3461
(cherry picked from commit 852e633772217d779a63e8c46fe3c5f81dd8960e)
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski f65b59c5e5 module/zfs/Makefile.in: Add jprint.o and json_stats.o
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski dcf745c378 Remove remaining bits of zpool addlog and ZFS_IOC_ADD_LOG
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski 676d1dcc8c json_stats.c: Do not print value of vs_noalloc
The vs_noalloc member of the vdev_stat structure was implemented in
2a673e76a9. It is not available in ZFS
2.1.5, so code using it needs to be disabled.
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski bcde0da8e4 json_stats.c: Move variable declarations out of a switch statement
This patch fixes the following compilation error:

```
../../module/zfs/json_stats.c: In function ‘nvlist_to_json’:
../../module/zfs/json_stats.c:92:4: error: a label can only be part of a statement and a declaration is not a statement
    uint64_t *u = (uint64_t *)p;
    ^~~~~~~~
../../module/zfs/json_stats.c:102:4: error: a label can only be part of a statement and a declaration is not a statement
    nvlist_t **a = (nvlist_t **)p;
    ^~~~~~~~
```
2023-07-05 13:27:30 +00:00
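The error above appears when a declaration directly follows a `case` label on compilers that require a label to be followed by a statement. A minimal sketch of the fix named in the commit title, using a hypothetical `read_value()` function rather than the real `nvlist_to_json()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VAL_U64, VAL_I32 } val_type_t;

uint64_t
read_value(val_type_t type, const void *p)
{
	/*
	 * Declared before the switch, so no case label is followed
	 * directly by a declaration.
	 */
	const uint64_t *u;
	const int32_t *i;

	assert(p != NULL);
	switch (type) {
	case VAL_U64:
		u = p;
		return (*u);
	case VAL_I32:
		i = p;
		return ((uint64_t)*i);
	}
	return (0);
}
```

Wrapping each case body in braces is another common way to avoid the same error; hoisting the declarations is simply the variant the commit title describes.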
Fred Weigel 747c7bbcf6 Add a JSON equivalent to zpool-status(8)
This is a squashed commit of the commits from
03a64568f318c696b9e4be19429e72b446c97462 to
1c64f0c8832b34bfa82645125351d6c62815ae21 developed by Fred Weigel.

Usage:

    cat /proc/spl/kstat/zfs/POOLNAME/stats

The following changes have been applied during the rebase of the patches
on top of the 2.1.5 branch:

- Drop ZFS_IOC_ADD_LOG. This ioctl was introduced to support introducing
  messages into the ZFS kernel log. It was used for debugging during
  development. The implementation of this debugging feature made `zpool
  addlog` output messages to /proc/spl/kstat/zfs/dbgmsg. The messages
  could later be retrieved with `zdbgmsg show`.
- Change the fmgw.c entry in lib/libzpool/Makefile.am to json_stats.c.
  The fmgw.c file has already been renamed to json_stats.c in other
  places.

Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>

(cherry picked from commit 75f3395d7fc0c93c02c8a8e792515f3e821aa05a)
2023-07-05 13:27:30 +00:00
Richard Yao 18ae26747c Fix use-after-free in btree code
Coverity static analysis found these.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #10989
Closes #13861
(cherry picked from commit 13f2b8fb92)
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski 2f327a2457 Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
The default_bs and default_ibs tunables control the default block size
and indirect block size.

So far, default_bs and default_ibs were tunable only on FreeBSD, e.g.,

    sysctl vfs.zfs.default_ibs

Remove the FreeBSD-specific sysctl code and expose default_bs and
default_ibs as tunables on both Linux and FreeBSD using
ZFS_MODULE_PARAM.

One of the use cases for changing the values of those tunables is to
lower the indirect block size, which may improve performance of large
directories (as discussed during the OpenZFS Leadership Meeting
on 2022-08-16).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14293
(cherry picked from commit 926715b9fc)
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski 3790dc2485 Add tunable to allow changing micro ZAP's max size
This change turns `MZAP_MAX_BLKSZ` into a `ZFS_MODULE_PARAM()` called
`zap_micro_max_size`. As a result, we can experiment with different
micro ZAP sizes to improve directory size scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Co-authored-by: Toomas Soome <toomas.soome@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14292
(cherry picked from commit a4b21eadec)
2023-07-05 13:27:30 +00:00
Tony Hutter 6c3c5fcfbe Tag zfs-2.1.5
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2022-06-21 17:00:34 -07:00
Matthew Thode 6e954130d4 Remove install of zfs-load-module.service for dracut
The zfs-load-module.service service is not currently provided by
the OpenZFS repository so we cannot safely assume it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #13574
2022-06-21 10:53:46 -07:00
Ryan Moeller 403d4bc66e FreeBSD: Silence clang unused-but-set-variable
Quick and dirty build fix for warnings being treated as errors.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
2022-06-15 11:27:28 -07:00
Alexander Motin 6ff89fe126 Improve sorted scan memory accounting
Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537
2022-06-15 11:23:49 -07:00
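One way to read the arithmetic in the commit above: two B-trees at roughly 75% memory efficiency cost about 2 / 0.75 ≈ 2.67 segment sizes per segment, which rounds up to 3. The sketch below reproduces that calculation with hypothetical helper names; the segment size is a parameter rather than the real `sizeof (range_seg_gap_t)`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ceil(ntrees / efficiency), with efficiency given in percent so the
 * calculation stays in integer arithmetic.
 */
size_t
per_segment_multiplier(size_t ntrees, size_t efficiency_pct)
{
	assert(efficiency_pct > 0 && efficiency_pct <= 100);
	return ((ntrees * 100 + efficiency_pct - 1) / efficiency_pct);
}

/* Estimated memory for nsegs segments tracked in two B-trees. */
size_t
scan_queue_mem_estimate(size_t nsegs, size_t seg_size)
{
	return (nsegs * seg_size * per_segment_multiplier(2, 75));
}
```

For example, 1000 segments of 32 bytes each are estimated at 96000 bytes, where counting each segment once would give 32000.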
Rich Ercolani cc565f557b Corrected edge case in uncompressed ARC->L2ARC handling
I genuinely don't know why this didn't come up before,
but adding the LZ4 early abort pointed out this flaw,
in which we're allocating a buffer of one size, and
then telling the compressor that we're handing it buffers
of a different size, which may be Very Different - say,
allocating 512b and then telling it the inputs are 128k.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13375
2022-06-14 18:10:21 -07:00
Alexander Motin 338188562b Remove wrong assertion in log spacemap
It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b884 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486 
Closes #13513
2022-06-06 16:57:56 -07:00
Ryan Moeller 1fdd768d7f libzfs: Fail making a dataset handle gracefully
When a dataset is in the process of being received it gets marked as
inconsistent and should not be used.  We should check for this when
opening a dataset handle in libzfs and return with an appropriate error
set, rather than hitting an abort because of the incomplete data.

zfs_open() passes errno to zfs_standard_error() after observing
make_dataset_handle() fail, which ends up aborting if errno is 0.
Set errno before returning where we know it has not been set already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13077
2022-06-06 16:57:51 -07:00
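The pattern in the commit above, setting errno on a failure path that would otherwise leave it at 0, can be sketched as follows. The types and `make_handle()` are hypothetical stand-ins for the libzfs internals, and `EBUSY` is chosen only for illustration; the commit message does not say which value is used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool inconsistent; } dataset_t;
typedef struct { const dataset_t *ds; } handle_t;

/* Single static handle keeps the example free of allocation. */
handle_t the_handle;

/*
 * Returns a handle, or NULL with errno set. Callers that pass errno to
 * an error-reporting routine can rely on it being non-zero on failure.
 */
handle_t *
make_handle(const dataset_t *ds)
{
	assert(ds != NULL);
	if (ds->inconsistent) {
		errno = EBUSY;	/* illustrative value, see note above */
		return (NULL);
	}
	the_handle.ds = ds;
	return (&the_handle);
}
```

A caller that sees NULL can now report the failure from errno instead of aborting on an unexpected zero.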
наб 56eed508d4 libzfs: mount: don't leak mnt_param_t if mnt_func fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-06-06 16:57:46 -07:00
Rich Ercolani 271241187b Reject zfs send -RI with nonexistent fromsnap
Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575
2022-06-06 16:57:41 -07:00