Commit Graph

3662 Commits

Author SHA1 Message Date
Rob N 6c36d72e71 vdev: expose zfs_vdev_max_ms_shift as a module parameter
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Seagate Technology LLC
Closes #14719

(cherry picked from commit ff73574cd8)

The type of sysctls had to be changed from uint_t to int to match other
sysctls in to OpenZFS 2.1.5.
2023-07-31 15:05:56 +00:00
Fred Weigel b037880efb JSON spares state readout
Spares can be in multiple pools, even if in use. This means that
status check AVAIL/INUSE is a bit tricky. spa_add_spares does not
need to be called, but we do need to do the equivalent. Which is
now done directly.
2023-07-05 13:27:31 +00:00
Fred Weigel f7574ccff3 Update to json output.
Missing proper state output for spare devices. Everything else
should be ok.
2023-07-05 13:27:31 +00:00
Rob Norris 802c258fc1 compress: add "slack" compression options
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Allan Jude 066532da51 Add module parameter to block 0 byte writes
Some hardware has issues when issues a write of 0 bytes
Add a new module paramter, zio_suppress_zero_writes
That when enabled (default) will just complete these I/Os
without sending them to the hardware.

Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 91d6b61268 json: Define PRId64 and PRIu64 on FreeBSD
On FreeBSD, these types are long instead of long long.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 95d6d8d32f json: Drop problematic casts in nvlist_to_json()
The NVP_NAME() macro requires its argument to be castable to char *.
The compiler complains if const char * is provided instead.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 6ee35af1a4 zil: Drop an unnecessary if statement
We already check for error != 0 earlier and return if true. The compiler
error here is a false positive.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski d744cdb77c json: null_filter(): Use __maybe_unused
The function fails to compile with -Wself-assign.
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 9c2c6124be zpool: Provide GUID to zpool-reguid(8) with -g
This commit extends the zpool-reguid(8) command with a -g flag, which
allows the user to specify the GUID to set.

Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara Inc.
2023-07-05 13:27:31 +00:00
Allan Jude 9c9eed9737 Make zpool clear reset the removed flag on vdevs
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Allan Jude 41b06f70c6 Make zpool clear reset the removed flag on vdevs
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
2023-07-05 13:27:31 +00:00
Fred Weigel b6a9054a0e Fix checkstyle for zil.c
Returns are to be parenthesized
2023-07-05 13:27:31 +00:00
Fred Weigel eb3607bcec Fixes for Wasabi json endpoint
Corrects status output.
2023-07-05 13:27:31 +00:00
Fred Weigel cf5a6fbc82 Change 5 char tag limit to 255
Changes 5 character maximum tag to 255 characters.
2023-07-05 13:27:31 +00:00
Fred Weigel 6ccb1a75af Klara update for json
Fix checkstyle indicated errors, source format fixes

Signed-off-by: Fred Weigel <fred.weigel@klarasystems.com>
2023-07-05 13:27:31 +00:00
Allan Jude 2284c4d200 Add module parameter to block 0 byte writes
Some hardware has issues when issues a write of 0 bytes
Add a new module paramter, zio_suppress_zero_writes
That when enabled (default) will just complete these I/Os
without sending them to the hardware.

Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Rob Norris f882884358 btree: fix double-free in zfs_btree_remove_idx
We applied 03c0ee94b to fix two use-after-free cases, backporting 13f2b8fb9
from upstream. Unfortunately that patch seems to have been misapplied,
introducing a double-free in one of them. This commit fixes that.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2023-07-05 13:27:31 +00:00
Rob Norris 88149e0873 zil_create: don't try to deallocate a block we never allocated
(cherry picked from commit 8a35cfdcdd62ffc47e7628616f0dcb2ef172cf4b)
2023-07-05 13:27:31 +00:00
Rob Norris 5a256eaed1 zil_close: don't try to deallocate on-disk blocks
If we're force-exporting or failed then there's no guarantee the IO will
get anywhere. If its a clean shutdown then that's actually the lead
block and it'll be sorted out during replay or next txg.

(cherry picked from commit 01e04a4eef7811a31a6258c99d0cc51217732758)
2023-07-05 13:27:31 +00:00
Allan Jude 11d3cff47b Normalize the endpoint name
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
fredw 43b705c787 stats_version: 2, scan_stats added even if never done. pass_scrub_scrub_spent_paused is now pass_scrub_spent_paused. stats is stats.json
Signed-off-by: Allan Jude <allan@klarasystems.com>
2023-07-05 13:27:31 +00:00
Mateusz Piotrowski 3828f754f1 json_stats.c: Rename the stats file to "status.json" 2023-07-05 13:27:31 +00:00
Rob Norris 2724bcb3d6 zil: allow the ZIL to fail and restart independently of the pool
zil_commit() has always returned void, and thus, cannot fail. Everything
inside it assumed that if anything ever went wrong, it could fall back
on txg_wait_synced() until the txg covering the operations being flushed
from the ZIL has fully committed. This meant that if the pool failed and
failmode=continue was set, syncing operations like fsync() would still
block.

Unblocking zil_commit() means largely the same approach. The difficulty
is that the ZIL carries the record of uncommitted VFS operations (vs the
changed data), and attached to those, callbacks and cvs that will
release userspace callers once the data is on disk. So if we can't write
the ZIL, we also can't release those records until the data is on disk.

This wasn't a problem before, because the zil_commit() would block. If
we change zil_commit() to return error, we still need to track those
entries until the data they represent hits the disk. We also need to
accept new records; just because the ZIL fails may not necessarily mean
the pool itself is unavailable.

This commit reorganises the ZIL to allow zil_commit() to return failure.
If ZIL writes or flushes fail, the ZIL is moved into a "failed" state,
and no further writes are done; all zil_commit() calls are serviced by
the regular txg mechanism. Outstanding records (itx_ts) are held until
the main pool writes their associated txg out. The records are then
released. Once all records are cleared, the ZIL is reset and reopened.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit af821006f6602261e690fe6635689cabdeefcadf)
2023-07-05 13:27:31 +00:00
Rob Norris cdaf041d39 zil: ensure flush errors are recieved
Its possible for a hardware failure to occur in a way that the ZIL block
writes appear to succeed, but the flush fails.

Because flush errors were being ignored, the lwb chain would finish with
a zero error code, which would result in zil_commit() returning and thus
fsync() returning success to the caller, even though the data was not
recorded in the ZIL.

If the ZIL is on the main pool (no SLOG device) it would typically
suspend around the same time. If that happened before the txg committed,
then those writes are now totally lost - not on the pool, not in the
ZIL.

zil_lwb_flush_vdevs_done() has the necessary code to deal with this
situation, but zio_flush() would never return failure, so it never saw
it. This just allows flushes to report failure, and now we never miss a
failed ZIL write.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit d9db5dccc56b551d0bf66bc9022b6c19a659b7e1)
2023-07-05 13:27:31 +00:00
Rob Norris 8ec175d7e1 zio_flush: require caller to decide if errors should propagate
Ignoring flush errors makes it possible for callers to never know that
their writes didn't succeed, and allows writes to be lost if the pool
fails.

This commit gives zio_flush() a flag argument, and updates the call
sites to pass ZIO_FLAG_DONT_PROPAGATE to it. Thus, this commit does not
change any behaviour, but opens the floor for further changes to allow
those callers to handle flush failures sensibly.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 6d0deb8a5a0c3d6bbc69d9625d55fc776bb98ea3)
2023-07-05 13:27:31 +00:00
Rob Norris 589cea17a9 dmu_tx_wait: handle pool suspension when failmode=continue
Let txg_wait_synced_tx fail, so the caller can retry.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit d560d64dbdf853d8fb9e18fc7570bd309091b2e4)
2023-07-05 13:27:30 +00:00
Rob Norris 7b7af8ba02 vnops: thread DMU_TX_ASSIGN_CONTINUE to a bunch of vnops
These are ones that I'm reasonably sure connect to a real syscall and
have a reasonable error response.

I've left stuff like `dirty_inode`, `zfs_inactive`, etc, which are
internal kernel housekeeping things, as well as anything that looks like
it belongs to zvols, ioctls, admin commands, etc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 39c2801c611e27b521d716fea8f771307820362e)
2023-07-05 13:27:30 +00:00
Rob Norris aea007e336 dmu: add DMU_TX_ASSIGN_CONTINUE flag
This is like DMU_TX_ASSIGN_NOSUSPEND, but only when failmode=continue,
and returning EIO if the pool is suspended. Its designed to be easy to
use from syscalls and similar without the ceremony of checking the for
EAGAIN and failmode every time.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 6bed8644dd2afa0e39727e9e90642479c2416521)
2023-07-05 13:27:30 +00:00
Rob Norris 48a48059c7 dmu: rename dmu_tx_assign flags
Their names clash with those for txg_wait_synced_tx, and they aren't
directly compatible, leading to confusion.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 1f0fb1dae7c1e84de3b39e669e09b8b3d5b80b87)
2023-07-05 13:27:30 +00:00
Rob Norris b0d75996ba zio: don't report suspend IOs if the pool is already suspended
This can happen if the pool suspended and then new IO is issued which
then fails too. This doesn't change behaviour, just silences the noise.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 3fa696404fb40205ed631538c62ec1a54d8ee6cd)
2023-07-05 13:27:30 +00:00
Rob Norris 3aea149bf8 linux: reject syncing ops if the filesystem is unmounting
The kernel can call these during unmount, so we have to handle them
directly to prevent any further IO being issued.

zfs_fsync reorganised slightly to not set up zfs_fsyncer_key until after
the teardown lock is acquired, just in case we don't get it.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 900c26570ddcdd1d3ca135e6aee5df6456f6bfd6)
2023-07-05 13:27:30 +00:00
Mariusz Zaborski 40a9efd0e8 zfs: support force exporting pools
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by:  Klara, Inc.
Sponsored-by:  Catalogics, Inc.
Sponsored-by:  Wasabi Technology, Inc.
Closes #3461
(cherry picked from commit 852e633772217d779a63e8c46fe3c5f81dd8960e)
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski f65b59c5e5 module/zfs/Makefile.in: Add jprint.o and json_stats.o 2023-07-05 13:27:30 +00:00
Mateusz Piotrowski dcf745c378 Remove remaining bits of zpool addlog and ZFS_IOC_ADD_LOG 2023-07-05 13:27:30 +00:00
Mateusz Piotrowski 676d1dcc8c json_stats.c: Do not print value of vs_noalloc
The vs_noalloc member of the vdev_stat structure was implemented in
2a673e76a9. It is not available in ZFS
2.1.5, so code using it needs to be disabled.
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski bcde0da8e4 json_stats.c: Move variable declarations out of a switch statement
This patch fixes the following compilation error:

```
../../module/zfs/json_stats.c: In function ‘nvlist_to_json’:
../../module/zfs/json_stats.c:92:4: error: a label can only be part of a statement and a declaration is not a statement
    uint64_t *u = (uint64_t *)p;
    ^~~~~~~~
../../module/zfs/json_stats.c:102:4: error: a label can only be part of a statement and a declaration is not a statement
    nvlist_t **a = (nvlist_t **)p;
    ^~~~~~~~
```
2023-07-05 13:27:30 +00:00
Fred Weigel 747c7bbcf6 Add a JSON equivalent to zpool-status(8)
This is a squashed commit of the commits from
03a64568f318c696b9e4be19429e72b446c97462 to
1c64f0c8832b34bfa82645125351d6c62815ae21 developed by Fred Weigel.

Usage:

    cat /proc/spl/kstat/zfs/POOLNAME/stats

The following changes has been applied during the rebase of the patches
on top of the 2.1.5 branch:

- Drop ZFS_IOC_ADD_LOG. This ioctl was introduced to support introducing
  messages into the ZFS kernel log. It was used for debugging during
  development. The implementation of this debugging feature made `zpool
  addlog` output messages to /proc/spl/kstat/zfs/dbgmsg. The messages
  could later be retrieved with `zdbgmsg show`.
- Change the fmgw.c entry in lib/libzpool/Makefile.am to json_stats.c.
  The fmgw.c file has already been renamed to json_stats.c in other
  places.

Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>

(cherry picked from commit 75f3395d7fc0c93c02c8a8e792515f3e821aa05a)
2023-07-05 13:27:30 +00:00
Richard Yao 18ae26747c Fix use-after-free in btree code
Coverty static analysis found these.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #10989
Closes #13861
(cherry picked from commit 13f2b8fb92)
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski 2f327a2457 Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
The default_bs and default_ibs tunables control the default block size
and indirect block size.

So far, default_bs and default_ibs were tunable only on FreeBSD, e.g.,

    sysctl vfs.zfs.default_ibs

Remove the FreeBSD-specific sysctl code and expose default_bs and
default_ibs as tunables on both Linux and FreeBSD using
ZFS_MODULE_PARAM.

One of the use cases for changing the values of those tunables is to
lower the indirect block size, which may improve performance of large
directories (as discussed during the OpenZFS Leadership Meeting
on 2022-08-16).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14293
(cherry picked from commit 926715b9fc)
2023-07-05 13:27:30 +00:00
Mateusz Piotrowski 3790dc2485 Add tunable to allow changing micro ZAP's max size
This change turns `MZAP_MAX_BLKSZ` into a `ZFS_MODULE_PARAM()` called
`zap_micro_max_size`. As a result, we can experiment with different
micro ZAP sizes to improve directory size scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Co-authored-by: Toomas Soome <toomas.soome@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14292
(cherry picked from commit a4b21eadec)
2023-07-05 13:27:30 +00:00
Ryan Moeller 403d4bc66e FreeBSD: Silence clang unused-but-set-variable
Quick and dirty build fix for warnings being treated as errors.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
2022-06-15 11:27:28 -07:00
Alexander Motin 6ff89fe126 Improve sorted scan memory accounting
Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537
2022-06-15 11:23:49 -07:00
Rich Ercolani cc565f557b Corrected edge case in uncompressed ARC->L2ARC handling
I genuinely don't know why this didn't come up before,
but adding the LZ4 early abort pointed out this flaw,
in which we're allocating a buffer of one size, and
then telling the compressor that we're handing it buffers
of a different size, which may be Very Different - say,
allocating 512b and then telling it the inputs are 128k.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13375
2022-06-14 18:10:21 -07:00
Alexander Motin 338188562b Remove wrong assertion in log spacemap
It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b884 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486 
Closes #13513
2022-06-06 16:57:56 -07:00
Brian Behlendorf fec407fb69 Linux 5.19 compat: aops->read_folio()
As of the Linux 5.19 kernel the readpage() address space operation
has been replaced by read_folio().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-06-01 14:24:49 -07:00
Brian Behlendorf 7ae5ea8864 Linux 5.19 compat: blkdev_issue_secure_erase()
Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure
erase functionality from the blkdev_issue_discard() function.
The blkdev_issue_secure_erase() must now be issued to issue
a secure erase.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-06-01 14:24:49 -07:00
Brian Behlendorf 048301b6dc Linux 5.19 compat: bdev_max_secure_erase_sectors()
Linux 5.19 commit torvalds/linux@44abff2c0 removed the
blk_queue_secure_erase() helper function.  The preferred
interface is to now use the bdev_max_secure_erase_sectors()
function to check for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-06-01 14:24:49 -07:00
Brian Behlendorf 9ce5eb18ef Linux 5.19 compat: bdev_max_discard_sectors()
Linux 5.19 commit torvalds/linux@70200574cc removed the
blk_queue_discard() helper function.  The preferred interface
is to now use the bdev_max_discard_sectors() function to check
for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-06-01 14:24:49 -07:00
Brian Behlendorf 5a639f0802 Linux 5.18 compat: bio_alloc()
As for the Linux 5.18 kernel bio_alloc() expects a block_device struct
as an argument.  This removes the need for the bio_set_dev() compatibility
code for 5.18 and newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-06-01 14:24:49 -07:00
hping b28c0c4bf8 abd_os: remove redundant refcount creation for abd_children
Refcount creation for abd_zero_scatter->abd_children is redundant in
abd_alloc_zero_scatter, as it has been done in abd_init_struct.

In addition, abd_children is undefined when ZFS_DEBUG is disabled, the
reference of abd_children in abd_alloc_zero_scatter breaks build of
libzpool when ZFS_DEBUG is disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13429
2022-05-20 10:33:24 -07:00
Aidan Harris eee389ba2e Fix functions without a prototype
clang-15 emits the following error message for functions without
a prototype:

fs/zfs/os/linux/spl/spl-kmem-cache.c:1423:27: error:
  a function declaration without a prototype is deprecated
  in all versions of C [-Werror,-Wstrict-prototypes]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aidan Harris <me@aidanharr.is>
Closes #13421
2022-05-20 10:33:24 -07:00
Mateusz Guzik 2c5c8bb0a6 FreeBSD: use zero_region instead of allocating a dedicated page
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13406
2022-05-20 10:33:24 -07:00
szubersk 756c3e085b autoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol
A followup to 849c14e048
Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13389
2022-05-20 10:33:24 -07:00
Damian Szuberski 13b1f336d3 PPC get_user workaround
Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd
array mmu_feature_keys[] and fails to build. Workaround this by
using copy_from_user() and throwing EFAULT for any calls to
__copy_from_user_inatomic(). This is a workaround until a fix
for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458
"powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11958
Closes #12590
Closes #13367
2022-05-20 10:33:24 -07:00
Brian Atkinson 60fc173251 Adding ZERO_PAGE detection
On some architectures ZERO_PAGE is unavailable because it references
a GPL exported symbol of empty_zero_page. Originally e08b993 removed
the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However,
a simple check can be done to avoid a kernel allocation and free for
the abd_zero_page if ZERO_PAGE is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13199
2022-05-20 10:33:24 -07:00
Ka Ho Ng 1f31889046 FreeBSD: Implement hole-punching support
This adds supports for hole-punching facilities in the FreeBSD kernel
starting from __FreeBSD_version 1400032.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458
2022-05-17 11:15:29 -07:00
наб 1467a1bb33 module: zstd: check we don't leak symbols; regenerate symbol map
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12988
Closes #13209
(cherry picked from commit 6ef00196db)
2022-05-16 15:48:21 -07:00
Brian Behlendorf bb29f1eb38 Reduce dbuf_find() lock contention
Holding a dbuf is a common operation which can become highly contended
in dbuf_find() when acquiring the dbuf hash mutex.  This is particularly
true on Linux when reading/writing volumes since by default up to 32
threads from the zvol_taskq may be taking a hold of the same dbuf.
This should also be observable on FreeBSD as long as there are enough
processes accessing the volume concurrently.

This is further aggregrated by the fact that only the block id will
be unique when calculating the dbuf hash for a single volume.  The
objset id, object id, and level will be the same for data blocks.
This has been observed to result in a somehwat less than uniform hash
distribution and a longer than expected max hash chain depth (~20)
on a large memory system (256 GB) using volumes.

This commit improves the siutation by switching the hash mutex to
an rwlock to allow concurrent lookups, and increasing DBUF_RWLOCKS
from 2048 to 8192 to further reduce the odds of a hash collision.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13405
2022-05-06 12:02:45 -07:00
Brian Behlendorf 4c9c96aba4 Silence unused-but-set-variable warnings
Clang 13.0.0 added support for `Wunused-but-set-parameter` and
`-Wunused-but-set-variable` which correctly detects two unused
variables in zstd resulting in a build failure.  This commit
annotates these instances accordingly.

  https://releases.llvm.org/13.0.1/tools/clang/docs/ReleaseNotes.html#id6

In FSE_createCTable(), malloc() is intentionally defined as NULL when
compiled in the kernel so the variable is unused.

  zstd/lib/compress/fse_compress.c:307:12: error: variable 'size'
  set but not used [-Werror,-Wunused-but-set-variable]

Additionally, in ZSTD_seqDecompressedSize() the assert is compiled
out similarly resulting in an unused variable.

  zstd/lib/compress/zstd_compress_superblock.c:412:12: error: variable
  'litLengthSum' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2022-05-02 15:42:58 -07:00
наб ecec151c14 module: zfs: freebsd: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2022-05-02 15:42:58 -07:00
наб a4f582f0b6 FreeBSD: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-05-02 15:42:58 -07:00
наб 9e68b734b3 zvol: remove unused variable
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-05-02 15:42:58 -07:00
наб a175fe82e6 fm: remove unused variables
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-05-02 15:42:58 -07:00
наб b8e1366ee6 zvol: remove unused variable
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-05-02 15:42:58 -07:00
наб 7536ad35ca module/zfs: vdev_removal: spa_vdev_remove_thread: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2022-05-02 15:42:58 -07:00
наб 986d64ccca module/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2022-05-02 15:42:58 -07:00
наб 18e9268087 module/zfs: dbuf: dbuf_read_impl: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2022-05-02 15:42:58 -07:00
наб 4149e19dfc module/zfs: arc: arc_hdr_realloc_crypt: remove unused variables
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2022-05-02 15:42:58 -07:00
Brian Behlendorf ce8d41ef75
Skip spacemaps reading in case of pool readonly import
The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9095
Closes #12687
2022-04-28 16:47:12 -07:00
Brian Behlendorf 49c1346c10
Linux 5.18 compat: replace __set_page_dirty_nobuffers
Replace __set_page_dirty_nobuffers with filemap_dirty_folio.

Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a
("Merge tag 'folio-5.18b' of
git://git.infradead.org/users/willy/pagecache ")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Satadru Pramanik <satadru@gmail.com>
Signed-off-by: Satadru Pramanik <satadru@gmail.com>
Closes #13325
Closes #13380
2022-04-28 15:17:38 -07:00
Brian Behlendorf 71cd3726c0
Fix O_APPEND for Linux 3.15 and older kernels
When using a Linux kernel which predates the iov_iter interface the
O_APPEND flag should be applied in zpl_aio_write() via the call to
generic_write_checks().  The updated pos variable  was incorrectly
ignored resulting in the current offset being used.

This issue should only realistically impact the RHEL/CentOS 7.x
kernels which are based on Linux 3.10.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13370 
Closes #13377
2022-04-28 15:15:28 -07:00
наб 642426095a Linux 5.18 compat: kobj_type.default_attrs replaced with default_groups
Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject:
 kobj_type: remove default_attrs")
Upstream-commit: 0cdda2edb3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13357
2022-04-25 10:00:09 -07:00
Alexander Motin 972637dc06 FreeBSD: Fix translation from ABD to physical pages.
In hypothetical case of non-linear ABD with single segment, multiple
to page size but not aligned to it, vdev_geom_fill_unmap_cb() could
fill one page less into bio_ma array.

I am not sure it is expoitable, but better to be safe than sorry.

Reported-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
(cherry picked from commit 5352f85cdd)
2022-04-21 16:59:09 -07:00
Rich Ercolani c220771a47 Corrected oversight in ZERO_RANGE behavior
It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do
have differing semantics in some ways - in particular,
one requires KEEP_SIZE, and the other does not.

Also added a zero-range test to catch this, corrected a flaw
that made the punch-hole test succeed vacuously, and a typo
in file_write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13329 
Closes #13338
2022-04-21 16:58:07 -07:00
Brian Behlendorf aa1c3c1d1d Linux 5.17 compat: GENHD_FL_EXT_DEVT / GENHD_FL_NO_PART_SCAN
As of the 5.17 kernel the GENHD_FL_EXT_DEVT flag has been removed
and the GENHD_FL_NO_PART_SCAN flag renamed GENHD_FL_NO_PART. Update
zvol_alloc() to set GENHD_FL_NO_PART for the newer kernels which
is sufficient.  The behavior for prior kernels remains unchanged.

1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT")
46e7eac6 ("block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13294
Closes #13297
2022-04-20 13:44:19 -07:00
Mark Johnston b7546f92ea FreeBSD: Return Mach error codes from VOP_(GET|PUT)PAGES
FreeBSD's memory management system uses its own error numbers and gets
confused when these VOPs return EIO.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13311
2022-04-19 10:42:54 -07:00
Mark Johnston e9cd90f6e5 FreeBSD: Parameterize ZFS_ENTER/ZFS_VERIFY_VP with an error code
For legacy reasons, a couple of VOPs have to return error numbers that
don't come from the usual errno namespace.  To handle the cases where
ZFS_ENTER or ZFS_VERIFY_ZP fail, we need to be able to override the
default error return value of EIO.  Extend the macros to permit this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13311
2022-04-19 10:42:54 -07:00
Riccardo Schirone 35ddd8ee2e Linux 5.18 compat: use address_space_operations->readahead
->readpages was removed and replaced by ->readahead. Define
zpl_readahead for kernels that don't have ->readpages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Riccardo Schirone <rschirone91@gmail.com>
Closes #13278
2022-04-06 13:15:27 -07:00
Riccardo Schirone 10a9f5fc47 Linux 5.18 compat: blkg_tryget is moved to private headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Riccardo Schirone <rschirone91@gmail.com>
Closes #13278
2022-04-06 13:15:27 -07:00
наб 9f7f704507 Linux 5.18 compat: replace genhd.h with blkdev.h includes
blkdev.h includes genhd.h since dawn of upstream git, so this is
globally safe

Upstream-commit: 322cbb50de711814c42fb088f6d31901502c711a ("block:
 remove genhd.h")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13251
2022-04-06 13:15:27 -07:00
наб 215a8255a9 Linux 5.18 compat: 4-argument bio_alloc()
bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs)

became

  bio_alloc(struct block_device *bdev, unsigned short nr_vecs,
            unsigned int opf, gfp_t gfp_mask)
passing NULL/0 continues previous behaviour

Upstream-commit: 07888c665b405b1cd3577ddebfeb74f4717a84c4 ("block:
 pass a block_device and opf to bio_alloc")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13251
2022-04-06 13:15:27 -07:00
Ryan Moeller a5a28723bd FreeBSD: Use NDFREE_PNBUF if available
NDF_ONLY_PNBUF has been removed from FreeBSD in favor of NDFREE_PNBUF.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13277
2022-04-06 10:29:53 -07:00
Brian Behlendorf 5a9994f5ae Export minimal zfs_refcount interfaces
Lustre makes light use of the zfs_refcount interfaces which
isn't a problem when using a non-debug build of OpenZFS. However,
when debugging is enabled the required symbols are not exported.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12613
2022-04-06 10:29:00 -07:00
Brian Behlendorf 9f6943504a Default to zfs_dmu_offset_next_sync=1
Strict hole reporting was previously disabled by default as a
performance optimization.  However, this has lead to confusion
over the expected behavior and a variety of workarounds being
adopted by consumers of ZFS.  Change the default behavior to
always report holes and force the TXG sync.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Upstream-commit: 05b3eb6d23
Ref: #13261
Closes #12746
2022-04-01 09:59:47 -07:00
Brian Behlendorf 847d03060f
Fix ACL checks for NFS kernel server
This PR changes ZFS ACL checks to evaluate
fsuid / fsgid rather than euid / egid to avoid
accidentally granting elevated permissions to
NFS clients.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Andrew Walker <awalker@ixsystems.com>
Co-authored-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13221
2022-03-20 21:21:18 -07:00
Kyle Evans 421750672b module: freebsd: avoid a taking a destroyed lock in zfs_zevent bits
At shutdown time, we drain all of the zevents and set the
ZEVENT_SHUTDOWN flag.  On FreeBSD, we may end up calling
zfs_zevent_destroy() after the zevent_lock has been destroyed while
the sysevent thread is winding down; we observe ESHUTDOWN, then back
out.

Events have already been drained, so just inline the kmem_free call in
sysevent_worker() to avoid the race, and document the assumption that
zfs_zevent_destroy doesn't do anything else useful at that point.

This fixes a panic that can occur at module unload time.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #13220
2022-03-18 17:11:43 -07:00
Mateusz Guzik 275c756730 FreeBSD: add missing replay check to an assert in zfs_xvattr_set
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13219
2022-03-18 17:11:43 -07:00
Ryan Moeller 7b215d93bc
Fix module build with -Werror
This is a direct commit to zfs-2.1-release to fix release builds that
error out on an unused variable.  The issue is avoided on master by a
huge series of commits that change how the ASSERT macros work, but that
is not feasible to backport.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13194 
Closes #13196
2022-03-17 10:18:23 -07:00
Brian Behlendorf 145af480d3 Fix ENOSPC when unlinking multiple files from full pool
When unlinking multiple files from a pool at 100% capacity, it was
possible for ENOSPC to be returned after the first unlink.  e.g.

    rm -f /mnt/fs/test1.0.0 /mnt/fs/test1.1.0 /mnt/fs/test1.2.0
    rm: cannot remove '/mnt/fs/test1.1.0': No space left on device
    rm: cannot remove '/mnt/fs/test1.2.0': No space left on device

After waiting for the pending deferred frees from the first unlink to
be processed the remaining files can then be unlinked.  This is caused
by the quota limit in dsl_dir_tempreserve_impl() being temporarily
decreased to the allocatable pool capacity less any deferred free
space.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13172
2022-03-08 11:46:03 -08:00
Mark Johnston b3427b18b1 zfs: Fix a deadlock between page busy and the teardown lock
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache.  To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object.  Each page is thus exclusively busied while the
dataset's teardown write lock is held.

When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them.  The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied.  This represents a lock order reversal which can
lead to deadlock.

To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid.  Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages.  Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.

PR:		258208
Tested by:	pho
Reviewed by:	avg, sef, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32931

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2022-03-04 15:37:41 -08:00
Alexander Motin 0e2bb1a3ee Really zero the zero page
While switching abd_zero_buf allocation KPI I've missed the fact
that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc()
does not.  Add explicit bzero() after it.

I don't think it should have caused real problems, but leaking one
memory page content all over the pool is not good.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12569
2022-03-04 15:37:33 -08:00
Paul Dagnelie 7bd292e59b Fix cpu hotplug atomic sleep issue
We move the spinlock unlock before the thread creation. This should be
safe because the thread creation code doesn't actually manipulate any
taskq data structures; that's done by the thread once it's created.

We also remove the assertion that the maxthreads is the current threads
plus one; that assertion could fail if multiple hotplug events come in
quick succession, and the first new taskq thread hasn't had a chance to
start processing yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
eviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12714
2022-02-15 21:52:45 -08:00
George Amanakis bcddb18bae Enable encrypted raw sending to pools with greater ashift
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results to failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one does not fail.

This happens because zio_compress_write() falsely checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT which is also set in
encrypted raw send streams. In this case it rounds up the psize and if
not equal to the zio->io_size it modifies the block by zeroing out
the extra bytes. Because this happens in a SA attr. registration object
(type=46), the decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.

Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13067 
Closes #13074
2022-02-23 16:47:37 -08:00
George Amanakis 6c6153e5b8 Avoid dirtying the final TXGs when exporting a pool
There are two codepaths than can dirty final TXGs:

1) If calling spa_export_common()->spa_unload()->
   spa_unload_log_sm_flush_all() after the spa_final_txg is set, then
   spa_sync()->spa_flush_metaslabs() may end up dirtying the final
   TXGs. Then we have the following panic:
   Call Trace:
    <TASK>
    dump_stack_lvl+0x46/0x62
    spl_panic+0xea/0x102 [spl]
    dbuf_dirty+0xcd6/0x11b0 [zfs]
    zap_lockdir_impl+0x321/0x590 [zfs]
    zap_lockdir+0xed/0x150 [zfs]
    zap_update+0x69/0x250 [zfs]
    feature_sync+0x5f/0x190 [zfs]
    space_map_alloc+0x83/0xc0 [zfs]
    spa_generate_syncing_log_sm+0x10b/0x2f0 [zfs]
    spa_flush_metaslabs+0xb2/0x350 [zfs]
    spa_sync_iterate_to_convergence+0x15a/0x320 [zfs]
    spa_sync+0x2e0/0x840 [zfs]
    txg_sync_thread+0x2b1/0x3f0 [zfs]
    thread_generic_wrapper+0x62/0xa0 [spl]
    kthread+0x127/0x150
    ret_from_fork+0x22/0x30
    </TASK>

2) Calling vdev_*_stop_all() for a second time in spa_unload() after
   spa_export_common() unnecessarily delays the final TXGs beyond what
   spa_final_txg is set at.

Fix this by performing the check and call for
spa_unload_log_sm_flush_all() before the spa_final_txg is set in
spa_export_common(). Also check if the spa_final_txg has already been
set in spa_unload() and skip those calls in this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
External-issue: https://www.illumos.org/issues/9081
Closes #13048 
Closes #13098
2022-02-23 16:47:33 -08:00
Tomohiro Kusumi 02309af096 Remove unneeded "extern inline" function declarations
All of these externs are already #included as static inline
functions via corresponding headers.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #13073
2022-02-16 17:58:56 -08:00
наб 94a4b7ec3d module: zfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2022-02-16 17:58:56 -08:00
drowfx bc99c809d5 Add dataset_kstats_update.. to mmap read/write paths
This allows reads/writes caused by accesses to mmap files to be
accounted correctly in the per-dataset kstats for both Linux and
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthias Blankertz <matthias@blankertz.org>
Closes #12994 
Closes #13044
2022-02-16 17:58:56 -08:00
Attila Fülöp 5c19af07d4 Receive checks should allow unencrypted child datasets
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13033 
Closes #13076
2022-02-16 17:58:55 -08:00
Peter Levine c7fcf00917 Add support for $KERNEL_{CC,LD,LLVM} variables
Currently, $(CC), $(LD), and $(LLVM) variables aren't passed to kbuild
while building modules.  This causes modules to build with the default
GNU GCC toolchain and prevents experimenting with other toolchains such
as CLANG/LLVM.  It can also lead to build failure if the CFLAGS/LDFLAGS
passed are incompatible with gcc/ld.

Pass $KERNEL_CC, $KERNEL_LD, and $KERNEL_LLVM as $(CC), $(LD), and
$(LLVM), respectively, to kbuild for each that is defined in the
environment.  This should take care of the majority of alternative
toolchain use cases.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #13046
2022-02-16 17:58:55 -08:00