Compare commits

130 Commits

Author SHA1 Message Date
Brian Behlendorf 2407f30bda Tag 2.2.0-rc5
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-10-07 09:14:37 -07:00
Alexander Motin 9be8ddfb3c ZIL: Reduce maximum size of WR_COPIED to 7.5K
Benchmarks show that at certain write sizes range lock/unlock take
not so much time as extra memory copy.  The exact threshold is not
obvious due to other overheads, but it is definitely lower than
~63KB used before.  Make it configurable, defaulting at 7.5KB,
that is 8KB of nearest malloc() size minus itx and lr structs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15353
2023-10-07 09:08:20 -07:00
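The size arithmetic in 9be8ddfb3c above can be sketched roughly as follows. This is an illustration only: `ALLOC_BUCKET`, `MAX_COPIED_DEFAULT`, `header_headroom()` and `use_wr_copied()` are hypothetical names, not taken from the OpenZFS source, and the 512-byte headroom is simply what 8KB minus 7.5KB leaves for the itx and lr structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the real tunable and helpers may differ. */
#define ALLOC_BUCKET        8192u   /* nearest malloc() size: 8KB */
#define MAX_COPIED_DEFAULT  7680u   /* 7.5KB default from the commit message */

/* Bytes left in the 8KB allocation for the itx and lr structs. */
static size_t
header_headroom(void)
{
    return (ALLOC_BUCKET - MAX_COPIED_DEFAULT);
}

/*
 * Decide whether a write is small enough to be copied into the log
 * record (WR_COPIED) instead of taking another path.
 */
static bool
use_wr_copied(size_t write_size, size_t max_copied)
{
    assert(max_copied <= ALLOC_BUCKET);
    return (write_size <= max_copied);
}
```

Under these assumptions a 4KB write would be copied, while a 64KB write would not.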
siv0 3755cde22a rpm: Fix `make rpm` on Debian/Ubuntu
The recent patch to change the bash completion install location based
on the Distribution, ignored that it should still be possible to
create RPMs on Debian derived systems. Additionally `make deb` itself
creates RPMs and converts them via `alien`.

This patch adds the bashcompletiondir variable to the rpm defines and
uses this for the location, where to get the bash completion file.

It still changes the location on Debian/Ubuntu systems in the final
packages from /etc/bash_completion.d to
/usr/share/bash-completion/completions

Fixes: e69ade32e1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15355
Closes #15365
2023-10-07 09:08:20 -07:00
Rob Norris 33d7c2d165 import: require force when cachefile hostid doesn't match on-disk
Previously, if a cachefile is passed to zpool import, the cached config
is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and
the results are taken as the canonical pool config and handed back to
ZFS_IOC_POOL_IMPORT.

In the course of its operation, spa_load() will inspect the pool and
build a new config from what it finds on disk. However, it then
regenerates a new config ready to import, and so rightly sets the hostid
and hostname for the local host in the config it returns.

Because of this, the "require force" checks always decide the pool is
exported and last touched by the local host, even if this is not true,
which is possible in a HA environment when MMP is not enabled. The pool
may be imported on another head, but the import checks still pass here,
so the pool ends up imported on both.

(This doesn't happen when a cachefile isn't used, because the pool
config is discovered in userspace in zpool_find_import(), and that does
find the on-disk hostid and hostname correctly).

Since the systemd zfs-import-cache.service unit uses cachefile imports,
this can lead to a system returning after a crash with a "valid"
cachefile on disk and automatically, quietly, importing a pool that has
already been taken up by a secondary head.

This commit causes the on-disk hostid and hostname to be included in the
ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the
"force" checks for zpool import to use them if present.

This method should give no change in behaviour for old userspace on new
kernels (they won't know to look for the new config items) and for new
userspace on old kernels (they won't find the new config items).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
2023-10-07 09:08:20 -07:00
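A minimal sketch of the idea behind 33d7c2d165 above, assuming a heavily simplified view of the pool config. `struct import_info` and `import_requires_force()` are hypothetical and do not mirror the real zpool code; the point is only that the on-disk hostid is preferred when the kernel supplies it, with a fallback to the old behaviour when it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields the check cares about. */
struct import_info {
    bool     exported;           /* pool was cleanly exported */
    uint64_t config_hostid;      /* hostid in the returned config */
    bool     has_ondisk_hostid;  /* new load-info item present? */
    uint64_t ondisk_hostid;      /* hostid actually found on disk */
};

static bool
import_requires_force(const struct import_info *ii, uint64_t local_hostid)
{
    assert(ii != NULL);

    /* Prefer the on-disk hostid when the kernel supplied it. */
    uint64_t hostid = ii->has_ondisk_hostid ?
        ii->ondisk_hostid : ii->config_hostid;

    if (ii->exported)
        return (false);
    /* A pool last touched by another host needs an explicit force. */
    return (hostid != 0 && hostid != local_hostid);
}
```

With an old kernel `has_ondisk_hostid` stays false, so the sketch behaves as before, which matches the compatibility note in the commit message.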
Rob Norris 2919784be2 tests: add tests for zpool import behaviour when hostid changes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
2023-10-07 09:08:20 -07:00
Rob N 8495536f7f zfsconcepts: add description of block cloning
Here I'm trying to succinctly introduce the concept, the basics of its
construction, how it's different to dedup, how to use it, and where its
limitations lie, in four paragraphs and with enough searchable terms to
help the reader find more information both within OpenZFS and elsewhere.

Phew.

Sponsored-By: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15362
2023-10-07 09:08:20 -07:00
Alexander Motin bcd010d3a5 Reduce number of metaslab preload taskq threads.
Before this change ZFS created threads for 50% of CPUs for each top-
level vdev.  Plus it created the same number of threads for embedded
log groups (that have only one metaslab and don't need any preload).
As a result, on a system with 80 CPUs and a pool of 60 vdevs this resulted
in 4800 metaslab preload threads, which is absolutely insane.

This patch changes the preload threads to 50% of CPUs in one taskq
per pool, so on the mentioned system it will be only 40 threads.

Among other things this fixes zdb on the mentioned system and pool
on FreeBSD, that failed to create so many threads in one process.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15319
2023-10-07 09:08:20 -07:00
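The thread counts quoted in bcd010d3a5 above follow from simple arithmetic, sketched here. The function names are hypothetical; only the 50% ratio, the doubling for embedded log groups, and the example numbers come from the commit message.

```c
#include <assert.h>

/*
 * Before: 50% of CPUs per top-level vdev, and the same number again
 * for each vdev's embedded log group.
 */
static unsigned
preload_threads_before(unsigned ncpus, unsigned nvdevs)
{
    assert(ncpus > 0);
    return ((ncpus / 2) * nvdevs * 2);
}

/* After: a single taskq per pool, sized at 50% of CPUs. */
static unsigned
preload_threads_after(unsigned ncpus)
{
    assert(ncpus > 0);
    return (ncpus / 2);
}
```

For the 80-CPU, 60-vdev system mentioned in the message this gives 4800 threads before and 40 after.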
Martin Matuška c27277daac CI: add FreeBSD build with Cirrus CI
As a first step for automatic FreeBSD testing add a build and install
for FreeBSD versions 12.4, 13.2 and 14-snapshot using Cirrus CI.

Reviewed-by: Jose Luis Duran 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15332
2023-10-07 09:08:20 -07:00
Rob N bf54da84fb tests/block_cloning: sync before write in fallback test
We're still seeing this test fail intermittently (that is, the clone
happens), which must mean the write and the clone can still be happening
on different txgs.

It might be that there's still activity after the pool is created. So
here we force a sync before starting the write.

Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15359
2023-10-07 09:08:20 -07:00
Alexander Motin 3158b5d718 ARC: Drop different size headers for crypto
To reduce memory usage ZFS crypto allocated bigger by 56 bytes ARC
headers only when specific block was encrypted on disk.  It was a
nice optimization, except in some cases the code reallocated them
on fly, that invalidated header pointers from the buffers.  Since
the buffers use different locking, it created number of races, that
were originally covered (at least partially) by b_evict_lock, used
also to protect evictions.  But it has gone as part of #14340.
As a result, as was found in #15293, arc_hdr_realloc_crypt() ended
up unprotected and causing use-after-free.

Instead of introducing some even more elaborate locking, this patch
just drops the difference between normal and protected headers. It
cost us additional 56 bytes per header, but with couple patches
saving 24 bytes, the net growth is only 32 bytes with total header
size of 232 bytes on FreeBSD, that IMHO is acceptable price for
simplicity.  Additional locking would also end up consuming space,
time or both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15293
Closes #15347
2023-10-07 09:08:20 -07:00
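The header-size figures in 3158b5d718 above line up with the numbers quoted in the two ARC commits listed after it (16 bytes saved by removing b_cv, 8 bytes by removing b_bufcnt/b_ebufcnt). A worked version of that arithmetic, with hypothetical constant and function names and FreeBSD sizes as quoted, assuming those are the "couple patches" the message refers to:

```c
#include <assert.h>

enum {
    L1_HDR_BEFORE = 200,  /* L1 header before the b_cv patch */
    SAVED_B_CV    = 16,   /* saved by removing b_cv */
    SAVED_BUFCNT  = 8,    /* saved by removing b_bufcnt/b_ebufcnt */
    CRYPT_EXTRA   = 56    /* extra bytes formerly only in crypto headers */
};

/* L1 header size once both savings are applied. */
static int
l1_hdr_after_savings(void)
{
    return (L1_HDR_BEFORE - SAVED_B_CV - SAVED_BUFCNT);
}

/* Size of the single header type once every header carries crypto fields. */
static int
unified_hdr_size(void)
{
    int size = l1_hdr_after_savings() + CRYPT_EXTRA;

    assert(size == 232);  /* total quoted in the commit message */
    return (size);
}

/* Net growth per header relative to the starting point. */
static int
net_growth(void)
{
    return (CRYPT_EXTRA - (SAVED_B_CV + SAVED_BUFCNT));
}
```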
Alexander Motin ba7797c8db ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
In most cases we do not care about exact number of buffers linked
to the header, we just need to know if it is zero, non-zero or one.
That can easily be checked just looking on b_buf pointer or in some
cases dereferencing it.

b_ebufcnt is read only once, and in that case we already traverse
the list as part of arc_buf_remove(), so second traverse should not
be expensive.

This reduces L1 ARC header size by 8 bytes and full crypto header by
16 bytes, down to 176 and 232 bytes on FreeBSD respectively.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15350
2023-10-07 09:08:20 -07:00
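The "zero, non-zero or one" checks described in ba7797c8db above can be illustrated with a plain singly linked list. The structures below are hypothetical stand-ins, not the real ARC header and buffer types; only the `b_buf` field name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the ARC structures. */
struct buf {
    struct buf *b_next;
};

struct hdr {
    struct buf *b_buf;  /* head of the linked buffer list */
};

/* Zero buffers: the list head is NULL. */
static bool
hdr_has_no_bufs(const struct hdr *h)
{
    assert(h != NULL);
    return (h->b_buf == NULL);
}

/* Exactly one buffer: a head exists and nothing follows it. */
static bool
hdr_has_one_buf(const struct hdr *h)
{
    assert(h != NULL);
    return (h->b_buf != NULL && h->b_buf->b_next == NULL);
}

/* The exact count is still available by walking the list when needed. */
static unsigned
hdr_count_bufs(const struct hdr *h)
{
    unsigned n = 0;

    assert(h != NULL);
    for (const struct buf *b = h->b_buf; b != NULL; b = b->b_next)
        n++;
    return (n);
}
```

The trade-off in this sketch is the one the message describes: the common checks need no counter at all, and the rare exact count costs one list traversal.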
Alexander Motin bc77a0c85e ARC: Remove b_cv from struct l1arc_buf_hdr
Earlier as part of #14123 I've removed one use of b_cv.  This patch
reuses the same approach to remove the other one from much more
rare code path.

This saves 16 bytes of L1 ARC header on FreeBSD (reducing it from
200 to 184 bytes) and seems even more on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15340
2023-10-07 09:08:20 -07:00
Andrew Turner 1611b8e56e Add BTI landing pads to the AArch64 SHA2 assembly
The Arm Branch Target Identification (BTI) extension guards against
branching to an unintended instruction.

To support BTI add the landing pad instructions to the SHA2 functions.
These are from the hint space so are a nop on hardware that lacks BTI
support or if BTI isn't enabled.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #14862
Closes #15339
2023-10-04 12:36:21 -07:00
Umer Saleem 8015e2ea66 Add '-u' - nomount flag for zfs set
This commit adds '-u' flag for zfs set operation. With this flag,
mountpoint, sharenfs and sharesmb properties can be updated
without actually mounting or sharing the dataset.

Previously, if dataset was unmounted, and mountpoint property was
updated, dataset was not mounted after the update. This behavior
is changed in #15240. We mount the dataset whenever mountpoint
property is updated, regardless if it's mounted or not.

To provide the user with option to keep the dataset unmounted and
still update the mountpoint without mounting the dataset, '-u'
flag can be used.

If any of mountpoint, sharenfs or sharesmb properties are updated
with '-u' flag, the property is set to desired value but the
operation to (re/un)mount and/or (re/un)share the dataset is not
performed and dataset remains as it was before.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15322
2023-10-03 15:41:46 -07:00
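The decision described in 8015e2ea66 above, together with the #15240 behaviour it refers to, might be sketched like this. Everything here is hypothetical (`struct set_result`, `set_mountpoint()`, the mount callback and the stub); it models only the rules stated in the messages: the property is always stored, a mount is attempted unless '-u' is given or canmount is not on, and a failed mount does not turn the set into a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct set_result {
    bool property_set;     /* the new value was stored */
    bool mount_attempted;  /* a (re)mount was tried afterwards */
    int  status;           /* 0 on success */
};

/* Demo stub standing in for a mount that cannot succeed. */
static int
always_fail_mount(void)
{
    return (-1);
}

static struct set_result
set_mountpoint(bool nomount, bool canmount_on, int (*mount_fn)(void))
{
    struct set_result r = { true, false, 0 };

    /* With '-u', or with canmount not on, leave the dataset as it was. */
    if (nomount || !canmount_on)
        return (r);

    assert(mount_fn != NULL);
    r.mount_attempted = true;
    /* A failed mount is not reported as a failed property update. */
    (void) mount_fn();
    return (r);
}
```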
Umer Saleem c53bc3837c Improve the handling of sharesmb,sharenfs properties
For sharesmb and sharenfs properties, the status of setting the
property is tied with whether we succeed to share the dataset or
not. In case sharing the dataset is not successful, this is
treated as overall failure of setting the property. In this case,
if we check the property after the failure, it is set to on.

This commit updates this behavior and the status of setting the
share properties is not returned as failure, when we fail to
share the dataset.

For sharenfs property, if access list is provided, the syntax
errors in access list/host addresses are not validated until after
setting the property during postfix phase while trying to
share the dataset. This is not correct, since the property has
already been set when we reach there.

Syntax errors in access list/host addresses are validated while
validating the property list, before setting the property and
failure is returned to user in this case when there are errors
in access list.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15240
2023-10-03 15:41:46 -07:00
Umer Saleem e9dc31c74e Update the behavior of mountpoint property
There are some inconsistencies in the handling of mountpoint
property. This commit updates the behavior and makes it
consistent.

If mountpoint property is set when dataset is unmounted, this
would update the mountpoint property. The mountpoint could be
valid or invalid in this case. Setting the mountpoint property
would result in success in this case. Dataset would still be
unmounted here.

On the other hand, if dataset is mounted and mountpoint
property is updated to something invalid where mount cannot be
successful, for example, setting the mountpoint inside a readonly
directory. This would unmount the dataset, set the mountpoint
property to requested value and tries to mount the dataset. The
mount operation returns error and this error is treated as
overall failure of setting the property while the property is
actually set.

To make the behavior consistent in case dataset is mounted or
unmounted, we should try to mount the dataset whenever mountpoint
property is updated. This would result in mounting the datasets
if canmount property is set to on, regardless if the dataset was
previously unmounted.

The failure in mount operation while setting the mountpoint
property should not be treated as failure, since the property is
actually set now to user requested value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15240
2023-10-03 15:41:46 -07:00
Stoiko Ivanov b04b13ae79 contrib: debian: drop bashcompletion mangling after install
tested by running:
```
./configure --with-config=user; cp -a contrib/debian .
dpkg-buildpackage -b -uc -us
```
on a Debian 12 based system.

and checking where the completion file got installed.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-03 09:06:07 -07:00
Stoiko Ivanov 7b1d421adf contrib: debian: switch to dh-sequence-dkms
Follows b191f9a13d3005621ead9a727b811892264505ef from Debian's
packaging team at:
https://salsa.debian.org/zfsonlinux-team/zfs/

The previous build-dependency is kept as option, to still be able to
build on older Debian based distros (e.g. Ubuntu 20.04).

Without this building on Debian 12/bookworm does not work, as `dkms`
is a virtual package.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-03 09:06:07 -07:00
Stoiko Ivanov db5c3b4c76 contrib: bash_completion.d: make install destination vendor dependent
Certain Linux distributions (Debian/Ubuntu at least) expect
bash-completion snippets to be installed in
/usr/share/bash-completion/completions instead of
/etc/bash_completion.d.

This patch sets the bashcompletiondir variable based on the vendor,
inspired by similar settings for initdir and initconfdir.

It seems that commit 612b8dff5b
caused the file to be installed in the first-place (thus the error
when building debian packages only became apparent when testing a
2.2.0-rc4 build)

The change only sets the variable in Makefile context - the
rpm/zfs.spec.in file has the path hardcoded as
%{_sysconfdir}/bash_completion.d/zfs, but since running
```
./configure --sysconfdir=/myetc  ; make rpm
```
also results in all relevant files to be installed in /etc instead of
/myetc I assume this can remain as is.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-03 09:06:07 -07:00
Chunwei Chen 0d870a1775 Fix invalid pointer access in trace_dbuf.h
In dnode_destroy, dn_objset is invalidated. However, it will later call
into dbuf_destroy, in which DTRACE_SET_STATE will try to access spa_name
via dn_objset causing illegal pointer access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15333
2023-10-03 09:06:07 -07:00
George Amanakis 608741d062 Report ashift of L2ARC devices in zdb
Commit 8af1104f does not actually store the ashift of cache devices in
their label. However, in order to facilitate reporting the ashift
through zdb, we enable this in the present commit. We also document
how the retrieval of the ashift is done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #15331
2023-10-03 09:06:07 -07:00
Alexander Motin 3079bf2e6c Restrict short block cloning requests
If we are copying only one block and it is smaller than recordsize
property, do not allow destination to grow beyond one block if it
is not there yet.  Otherwise the destination will get stuck with
that block size forever, that can be as small as 512 bytes, no
matter how big the destination grows later.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15321
2023-10-03 09:06:07 -07:00
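One plausible reading of the restriction in 3079bf2e6c above, written as a predicate. The function and parameter names are illustrative assumptions and the real check in OpenZFS may be structured differently; the sketch only encodes the three conditions in the message: a single source block smaller than recordsize, a destination that does not yet extend beyond one block, and a request that would make it do so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a short clone request should be refused so that the
 * destination does not get stuck with a small block size.
 */
static bool
short_clone_restricted(uint64_t len, uint64_t src_blksz, uint64_t recordsize,
    uint64_t dst_size, uint64_t dst_off)
{
    assert(src_blksz > 0);

    /* Copying only one block, and it is smaller than recordsize. */
    bool one_short_block = len <= src_blksz && src_blksz < recordsize;
    /* Destination has not grown beyond one such block yet. */
    bool dst_within_one_block = dst_size <= src_blksz;
    /* The request would push the destination past that block. */
    bool would_grow_past_block = dst_off + len > src_blksz;

    return (one_short_block && dst_within_one_block &&
        would_grow_past_block);
}
```

In this sketch, cloning a 512-byte block to offset 512 of an empty file is refused, while cloning it to offset 0 is allowed.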
Brian Behlendorf b34bf2d5f6 Tweak rebuild in-flight hard limit
Vendor testing shows we should be able to get a little more
performance if we further relax the hard limit which we're hitting.

Authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15324
2023-10-03 09:06:07 -07:00
Akash B 229ca7d738 Fix ENOSPC for extended quota
When unlinking multiple files from a pool at 100% capacity, it
was possible for ENOSPC to be returned after the first few unlinks.
This issue was fixed previously by PR #13172 but then this was
again introduced by PR #13839.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Also, updated the existing testcase which reliably reproduced the
issue without this patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15312
2023-09-28 14:28:21 -07:00
Paul Dagnelie 9e36c5769f Don't allocate from new metaslabs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15307
Closes #15308
2023-09-28 14:28:21 -07:00
Paul Dagnelie d38f4664a6 Reduce trim min size even lower for tests to reduce flakiness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15315
2023-09-28 14:28:21 -07:00
Paul Dagnelie 99dc1fc340 ZTS: Fix introduced test bug in block_cloning_copyfilerange
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15316
2023-09-28 14:28:21 -07:00
Brian Behlendorf ba4dbbdae7 ZTS: Add additional exceptions
"zfs_share_concurrent_shares" may fail on FreeBSD and some Linux
distributions (fedora).  Move it to the common list.

"zfs_allow_010_pos" has been observed to fail on FreeBSD 13.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15306
2023-09-28 14:28:21 -07:00
Paul Dagnelie 8526b12f3d Set timeout before creating pool in test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15309
2023-09-28 14:28:21 -07:00
Paul Dagnelie 0ce1b2ca19 Invoke zdb by guid to avoid import errors
The problem that was occurring is basically that a device was removed 
by ztest and replaced with another device. It was then reguided. The 
import then failed because there were two possible imports with the 
same name; one with the new guid, and one with the old. This can 
happen because the label writes from the device removal/replacement 
can be subject to ztest's error injection. 

The other ways to fix this would be to change the error injection to 
not trigger on removals (which may not be technically feasible), or 
to change the import code to not report configurations that are so 
short on devices (which would potentially have unpleasant end-user 
effects when trying to recover from data losses/device configuration 
issues).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15298
2023-09-28 14:28:21 -07:00
Alexander Motin 0aabd6b482 ZIL: Avoid dbuf_read() in ztest_get_data()
While working on similar patches for zfs and zvol in #15153 I
forgot about ztest.  Update it also so that we test the same code
paths as are used in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15301
2023-09-28 14:28:21 -07:00
Rob N 5f30698670 tests/block_cloning: try harder to stay on same txg in fallback test
We've observed this test failing intermittently. When it does, the
"same block" check shows that both files have the same content, that is,
the file was cloned.

The only way this could have happened is if the open txg moved between
the dd and clonefile calls. That's possible because although we set
zfs_txg_timeout to be large, that only affects the wait time in the sync
thread at the start of a new txg; it doesn't change anything if it's
currently waiting or working.

So here we just force the txgs to move immediately before, which should
get both operations onto the same txg as intended.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15303
2023-09-22 16:13:20 -07:00
Rob N a199cac6cd status: report pool suspension state under failmode=continue
When failmode=continue is set and the pool suspends, both 'zpool status'
and the 'zfs/pool/state' kstat ignore it and report the normal vdev tree
state. There's no clear indicator that the pool is suspended. This is
unlike suspend in failmode=wait, or suspend due to MMP check failure,
which both report "SUSPENDED" explicitly.

This commit changes it so SUSPENDED is reported for failmode=continue
the same as for other modes.

Rationale:

The historical behaviour of failmode=continue is roughly, "press on as
though all is well". To this end, the fact that the pool had suspended
was not shown, to maintain the façade that all is well.

It's unclear why hiding this information was considered appropriate. One
possibility is that it was expected that a true pool fault would always
be reported as DEGRADED or FAULTED, and that the pool could not suspend
without these happening.

That is not necessarily true, as vdev health and suspend state are only
loosely connected, such that a pool in (apparent) good health can be
suspended for good reasons, and of course a degraded pool does not lead
to suspension. Even if that expectation were true, there's still a
difference in urgency - a degraded pool may not need to be attended to
for hours, while a suspended pool is most often unusable until an
operator intervenes.

An operator that has set failmode=continue has presumably done so
because their workload is one that can continue to operate in a useful
way when the pool suspends. In this case the operator still needs a
clear indicator that there is a problem that needs attending to.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15297
2023-09-22 16:13:20 -07:00
Paul Dagnelie 729507d309 Fix occasional rsend test crashes
We have occasional crashes in the rsend tests. Debugging revealed 
that this is because the send_worker thread is getting EINTR from 
splice(). This happens when a non-fatal signal is received during 
the syscall. We should retry the syscall, rather than exiting with failure.
Tweak the loop to only break if the splice is finished or we receive 
a non-EINTR error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15273
2023-09-22 16:13:20 -07:00
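The loop shape described in 729507d309 above (break only when the transfer is finished or a non-EINTR error occurs) can be sketched generically. To keep the example portable it does not call splice() itself; `xfer_fn`, `transfer_all()` and the two stubs are hypothetical, with the callback assumed to follow the splice() return convention (bytes moved, 0 when finished, -1 with errno set on error).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical transfer step following the splice() return convention. */
typedef ssize_t (*xfer_fn)(void *ctx);

/* Returns 0 when the transfer finishes, or the errno of a real failure. */
static int
transfer_all(xfer_fn fn, void *ctx)
{
    assert(fn != NULL);
    for (;;) {
        ssize_t n = fn(ctx);

        if (n == 0)
            return (0);      /* finished */
        if (n < 0 && errno != EINTR)
            return (errno);  /* genuine error */
        /* n > 0: made progress; EINTR: interrupted, so try again. */
    }
}

/* Demo stub: interrupted twice, then moves data once, then finishes. */
static ssize_t
flaky_xfer(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    if (*calls <= 2) {
        errno = EINTR;
        return (-1);
    }
    return (*calls == 3 ? 10 : 0);
}

/* Demo stub: fails immediately with a non-retryable error. */
static ssize_t
broken_xfer(void *ctx)
{
    (void) ctx;
    errno = EIO;
    return (-1);
}
```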
Rob N 3af63683fe cmd: add 'help' subcommand to zpool and zfs
'program help subcommand' is a reasonably common pattern for
multifunction command-line programs. This commit adds support for that
style to the zpool and zfs commands.

When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the
'man' program on the PATH with the most likely manpage name for the
requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or
"zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics.
If no topic is supplied, uses the top "zpool" or "zfs" pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15288
2023-09-22 16:13:20 -07:00
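The manpage naming rule in 3af63683fe above is easy to show in isolation. `help_manpage()` is a hypothetical helper, not the function used in the zpool or zfs commands; it only builds the page name and leaves running 'man' to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the most likely manpage name for "<prog> help [<topic>]".
 * Returns the snprintf() result, i.e. the length of the full name.
 */
static int
help_manpage(const char *prog, const char *topic, char *out, size_t outlen)
{
    assert(prog != NULL && out != NULL);

    /* No topic: the top-level page, e.g. "zpool". */
    if (topic == NULL)
        return (snprintf(out, outlen, "%s", prog));

    /* "concepts" and "props" are joined directly, e.g. "zfsprops". */
    if (strcmp(topic, "concepts") == 0 || strcmp(topic, "props") == 0)
        return (snprintf(out, outlen, "%s%s", prog, topic));

    /* Everything else is a subcommand page, e.g. "zpool-status". */
    return (snprintf(out, outlen, "%s-%s", prog, topic));
}
```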
Paul Dagnelie 9aa1a2878e Fix incorrect expected error in ztest
There is an occasional ztest failure that looks like ztest: attach 
(/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) 
returned 22, expected 95. This is because the value that we return 
is EINVAL, but expected_error is set incorrectly.

Change the expected_error value to match both the comment and the 
actual error value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15295
2023-09-22 16:13:20 -07:00
Paul Dagnelie cc75c816c5 Fix l2arc_apply_transforms ztest crash
In #13375 we modified the allocation size of the buffer that we use 
to apply l2arc transforms to be the size of the arc hdr we're using, 
rather than the allocation size that will be in place on the disk, 
because sometimes the hdr size is larger. Unfortunately, sometimes 
the allocation size is larger, which means that we overflow the buffer 
in that case. This change modifies the allocation to be the max of 
the two values.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15177
Closes #15248
2023-09-22 16:13:20 -07:00
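The sizing fix in cc75c816c5 above amounts to taking the larger of two sizes so the buffer fits either case. A minimal sketch, with a hypothetical function name and parameters:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size the transform buffer so it is large enough both for the ARC
 * header's size and for the on-disk allocation size.
 */
static size_t
transform_buf_size(size_t hdr_size, size_t alloc_size)
{
    assert(hdr_size > 0 && alloc_size > 0);
    return (hdr_size > alloc_size ? hdr_size : alloc_size);
}
```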
Rob N 1c2aee7a52 tests: install missing PAM tests
'pam_change_unmounted' and 'pam_recursive' both exist and are referenced
by the test run config, but weren't being installed and so are excluded.
This gets them installed so they will run as expected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15291
2023-09-22 16:13:20 -07:00
Alexander Motin 62677576a7 ZIL: Fix potential race on flush deferring.
zil_lwb_set_zio_dependency() can not set write ZIO dependency on
previous LWB's write ZIO if one is already in done handler and set
state to LWB_STATE_WRITE_DONE.  So theoretically done handler of
next LWB's write ZIO may run before done handler of previous LWB
write ZIO completes.  In such case we can not defer flushes, since
the flush issue process is not locked.

This may fix some reported assertions of lwb_vdev_tree not being
empty inside zil_free_lwb().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15278
2023-09-20 16:41:23 -07:00
Mateusz Guzik f7a07d76ee Retire z_nr_znodes
Added in ab26409db7 ("Linux 3.1 compat, super_block->s_shrink"), with
the only consumer which needed the count getting retired in 066e825221
("Linux compat: Minimum kernel version 3.10").

The counter gets in the way of not maintaining the list to begin with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15274
2023-09-19 08:52:06 -07:00
Tony Hutter 54c6fbd378 zed: Allow autoreplace and fault LEDs for removed vdevs
Allow zed to autoreplace vdevs marked as REMOVED.  Also update
statechange-led zedlet to toggle fault LEDs for REMOVED vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15281
2023-09-19 08:52:06 -07:00
наб 0ce7a068e9 check-zstd-symbols: also ignore __pfx_ symbols
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b341b20d648bb7e9a3307c33163e7399f0913e66

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15282 
Closes #15284
2023-09-19 08:52:06 -07:00
Laura Hild 228b064d1b Remove implication that child `disk`s aren't vdevs in zpoolconcepts(7)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #15247
2023-09-19 08:52:06 -07:00
ednadolski-ix b9b9cdcdb1 update max_variance limit in zdb_block_size_histogram test for CI
Commit 2d7843401a had previously
updated this hardcoded limit to allow for CI testing. As there
is no deterministic pass/fail value, the need has arisen for
one more small increase.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15252
2023-09-19 08:52:06 -07:00
George Amanakis 11943656f9 Update the MOS directory on spa_upgrade_errlog()
spa_upgrade_errlog() does not update the MOS directory when the
head_errlog feature is enabled. In this case if spa_errlog_sync() is not
called, the MOS dir references the old errlog_last and errlog_sync
objects. Thus when doing a scrub a panic will occur:

Call Trace:
 dump_stack+0x6d/0x8b
 panic+0x101/0x2e3
 spl_panic+0xcf/0x102 [spl]
 delete_errlog+0x124/0x130 [zfs]
 spa_errlog_sync+0x256/0x260 [zfs]
 spa_sync_iterate_to_convergence+0xe5/0x250 [zfs]
 spa_sync+0x2f7/0x670 [zfs]
 txg_sync_thread+0x22d/0x2d0 [zfs]
 thread_generic_wrapper+0x83/0xa0 [spl]
 kthread+0x104/0x140
 ret_from_fork+0x1f/0x40

Fix this by updating the related MOS directory objects in
spa_upgrade_errlog().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #15279 
Closes #15277
2023-09-19 08:51:00 -07:00
Tony Hutter c011ef8c91 Linux 6.5 compat: META (#15265)
Update the META file to reflect compatibility with the 6.5
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-09-19 08:50:01 -07:00
Andrea Righi cacc599aa2 Linux 6.5 compat: spl: properly unregister sysctl entries
When register_sysctl_table() is unavailable we fail to properly
unregister sysctl entries under "kernel/spl".

This leads to errors like the following when spl is unloaded/reloaded,
making it impossible to properly reload the spl module:

[  746.995704] sysctl duplicate entry: /kernel/spl/kmem/slab_kvmem_total

Fix by cleaning up all the sub-entries inside "kernel/spl" when the
spl module is unloaded.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes #15239
2023-09-19 08:50:01 -07:00
Andrea Righi c7ee59a160 Linux 6.5 compat: safe cleanup in spl_proc_fini()
If we fail to create a proc entry in spl_proc_init() we may end up
calling unregister_sysctl_table() twice: once in the failure path of
spl_proc_init() and again during spl_proc_fini().

Avoid the double call to unregister_sysctl_table() and while at it
refactor the code a bit to reduce code duplication.

This was accidentally introduced when the spl code was
updated for Linux 6.5 compatibility.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes #15234 
Closes #15235
2023-09-19 08:50:01 -07:00
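A common way to avoid the kind of double cleanup described in c7ee59a160 above is to clear the handle after releasing it, so a second pass becomes a no-op. The sketch below shows that general pattern in userspace with hypothetical types; it is not the spl code and does not call the kernel's unregister_sysctl_table().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered table handle. */
struct table_handle {
    int unregister_calls;  /* counts how often it was released */
};

/* Stand-in for the real unregister call. */
static void
fake_unregister(struct table_handle *h)
{
    h->unregister_calls++;
}

/*
 * Release the table at most once: clear the caller's pointer so that
 * both the failure path and the final teardown can call this safely.
 */
static void
cleanup_table(struct table_handle **hp)
{
    assert(hp != NULL);
    if (*hp == NULL)
        return;
    fake_unregister(*hp);
    *hp = NULL;
}
```

Sharing one helper between the init failure path and the fini path also removes the duplicated cleanup code, in the spirit of the refactoring the message mentions.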
Coleman Kane 58a707375f Linux 6.5 compat: Use copy_splice_read instead of filemap_splice_read
Using the filemap_splice_read function for the splice_read handler was
leading to occasional data corruption under certain circumstances. Favor
using copy_splice_read instead, which does not demonstrate the same
erroneous behavior under the tested failure cases.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15164
2023-09-19 08:50:01 -07:00
Coleman Kane 5a22de144a Linux 6.5 compat: replace generic_file_splice_read with filemap_splice_read
The generic_file_splice_read function was removed in Linux 6.5 in favor
of filemap_splice_read. Add an autoconf test for filemap_splice_read and
use it if it is found as the handler for .splice_read in the
file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This
change removes the ITER_* macros that OpenZFS doesn't use from being
tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was
causing the test to fail, which also affected the code responsible for
setting the .splice_read handler, above. That behavior caused run-time
panics on Linux 6.5.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15155
2023-09-19 08:50:01 -07:00
Coleman Kane 31a4673c05 Linux 6.5 compat: register_sysctl_table removed
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15138
2023-09-19 08:50:01 -07:00
Brian Atkinson 3a68f3c50f Revert "Linux 6.5 compat: register_sysctl_table removed"
This reverts commit b35374fd64 as there
are error messages when loading the SPL module. Errors seemed to be tied
to a duplicate entry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #15134
2023-09-19 08:50:01 -07:00
Coleman Kane 8be6308e85 Linux 4.20 compat: wrapper function for iov_iter type access
An iov_iter_type() function to access the "type" member of the struct
iov_iter was added at one point. Move the conditional logic to decide
which method to use for accessing it into a macro and simplify the
zpl_uio_init code.
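
As a rough illustration of hiding the access method behind one macro, the
sketch below uses a stand-in struct instead of the kernel's struct
iov_iter. HAVE_IOV_ITER_TYPE, zfs_iov_iter_type_sketch and the mock_*
names are assumptions made for this example, not the names used by the
patch:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct iov_iter; illustration only. */
struct mock_iov_iter {
	unsigned int type;
};

/* Hypothetical accessor of the kind a newer kernel would provide. */
static inline unsigned int
mock_iov_iter_type(const struct mock_iov_iter *i)
{
	assert(i != NULL);
	return (i->type & ~1u);	/* mask off a direction bit */
}

/*
 * Callers use one name; the configure-time result picks the method.
 * HAVE_IOV_ITER_TYPE is an assumed feature-test macro.
 */
#if defined(HAVE_IOV_ITER_TYPE)
#define	zfs_iov_iter_type_sketch(i)	mock_iov_iter_type(i)
#else
#define	zfs_iov_iter_type_sketch(i)	((i)->type)
#endif
```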

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15100
2023-09-19 08:50:01 -07:00
Coleman Kane 0bf2c5365e Linux 6.4 compat: iter_iov() function now used to get old iov member
The iov_iter->iov member is now iov_iter->__iov and must be accessed via
the accessor function iter_iov(). Create a wrapper that is conditionally
compiled to use the access method appropriate for the target kernel
version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15100
2023-09-19 08:50:01 -07:00
Coleman Kane d76de9fb17 Linux 6.5 compat: blkdev changes
Multiple changes to the blkdev API were introduced in Linux 6.5. This
includes passing (void* holder) to blkdev_put, adding a new
blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type
that replaces uses of fmode_t, and removing an argument from the release
handler on block_device_operations that we weren't using. The open
function definition has also changed to take gendisk* and blk_mode_t, so
update it accordingly, too.

Implement local wrappers for blkdev_get_by_path() and
vdev_blkdev_put() so that the in-line calls are cleaner, and place the
conditionally-compiled implementation details inside of both of these
local wrappers. Both calls are exclusively used within vdev_disk.c, at
this time.

Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE
The wrapper function is now used for testing using the appropriate
method for the kernel, whether the open mode is writable or not.

Emphasize fmode_t arg in zvol_release is not used

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15099
2023-09-19 08:50:01 -07:00
Coleman Kane c0f075c06b Linux 6.5 compat: use disk_check_media_change when it exists
When disk_check_media_change() exists, then define
zfs_check_media_change() to simply call disk_check_media_change() on
the bd_disk member of its argument. Since disk_check_media_change()
is newer than when revalidate_disk was present in bops, we should
be able to safely do this via a macro, instead of recreating a new
implementation of the inline function that forces revalidation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15101
2023-09-19 08:50:01 -07:00
Coleman Kane 6c2fc56916 Linux 6.5 compat: register_sysctl_table removed
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15098
2023-09-19 08:50:01 -07:00
Alexander Motin e96fbdba34 Add more constraints for block cloning.
- We cannot clone into files with smaller block size if there is
more than one block, since we can not grow the block size.
- Block size must be power-of-2 if destination offset != 0, since
there can be no multiple blocks of non-power-of-2 size.

The first should handle the case when destination file has several
blocks but still is not bigger than one block of the source file.
The second fixes panic in dmu_buf_hold_array_by_dnode() on attempt
to concatenate files with equal but non-power-of-2 block sizes.

While there, assert that error is reported if we made no progress.
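
The two constraints can be pictured as a small predicate. This is only a
sketch under stated assumptions: clone_allowed_sketch and its parameters
are hypothetical, and the actual checks in the clone path may be
structured differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
is_power_of_2(uint64_t x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}

/*
 * Hypothetical predicate mirroring the two constraints listed above.
 * src_blksz:   block size of the source file
 * dst_blksz:   current block size of the destination file
 * dst_nblocks: number of blocks the destination already has
 * dst_off:     offset in the destination where the clone starts
 */
static bool
clone_allowed_sketch(uint64_t src_blksz, uint64_t dst_blksz,
    uint64_t dst_nblocks, uint64_t dst_off)
{
	assert(src_blksz != 0 && dst_blksz != 0);

	/* A destination with several blocks cannot grow its block size. */
	if (dst_blksz < src_blksz && dst_nblocks > 1)
		return (false);

	/* A non-zero offset implies multiple blocks: power-of-2 only. */
	if (dst_off != 0 && !is_power_of_2(src_blksz))
		return (false);

	return (true);
}
```

For example, appending at offset 3000 with a 3000-byte block size is
rejected by the second check, while cloning the same file at offset 0 is
allowed.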

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
2023-09-10 14:02:52 -07:00
Brian Behlendorf 739db06ce7 Tag 2.2.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-09-07 16:11:40 -07:00
Volker Mauel 4da8c7d11e Intel QAT 1.7 compatibility
Based on the intel QAT samples which are bundled in the 1.x drivers, 
this is the preferred approach since api version 1.6.  See:

https://www.intel.de/content/www/de/de/download/19734/intel-quickassist-technology-driver-for-linux-hw-version-1-x.html?

Reviewed-by: Weigang Li <weigang.li@intel.com>
Signed-off-by: Volker Mauel <volkermauel@gmail.com>
Closes #15190
2023-09-07 16:10:52 -07:00
Umer Saleem 32949f2560 Relax error reporting in zpool import and zpool split
For zpool import and zpool split, zpool_enable_datasets is called
to mount and share all datasets in a pool. If there is an error
while mounting or sharing any dataset in the pool, the status of
import or split is reported as failure. However, the changes do
show up in zpool list.

This commit updates the error reporting in zpool import and zpool
split path. More descriptive messages are shown to user in case
there is an error during mount or share. Errors in mount or share
do not affect the overall status of zpool import and zpool split.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15216
2023-09-02 10:30:38 -07:00
Alexander Motin 79ac1b29d5 ZIL: Change ZIOs issue order.
In zil_lwb_write_issue(), after issuing lwb_root_zio/lwb_write_zio,
we have no right to access lwb->lwb_child_zio. If it was not there,
the first two ZIOs may have already completed and freed the lwb.
ZIOs issue in opposite order from children to parent should keep
the lwb valid till the end, since the lwb can be freed only after
lwb_root_zio completion callback.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15233
2023-09-02 10:30:38 -07:00
Alexander Motin 7dc2baaa1f ZIL: Revert zl_lock scope reduction.
While I have no reports of it, I suspect possible use-after-free
scenario when zil_commit_waiter() tries to dereference zcw_lwb
for lwb already freed by zil_sync(), while zcw_done is not set.
Extension of zl_lock scope as it was originally should block
zil_sync() from freeing the lwb, closing this race.

This reverts #14959 and couple chunks of #14841.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15228
2023-09-02 10:30:38 -07:00
Alexander Motin 5a7cb0b065 ZIL: Tune some assertions.
In zil_free_lwb() we should first assert lwb_state or the rest of
assertions can be misleading if it is false.

Add lwb_state assertions in zil_lwb_add_block() to make sure we are
not trying to add elements to lwb_vdev_tree after it was processed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15227
2023-09-02 10:30:38 -07:00
Dimitry Andric 400f56e3f8 dmu_buf_will_clone: change assertion to fix 32-bit compiler warning
Building module/zfs/dbuf.c for 32-bit targets can result in a warning:

In file included from
/usr/src/sys/contrib/openzfs/include/sys/zfs_context.h:97,
                 from /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:32:
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c: In function
'dmu_buf_will_clone':
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:116:33: error:
cast from pointer to integer of different size
[-Werror=pointer-to-int-cast]
  116 |         const uint64_t __left = (uint64_t)(LEFT);
  \
      |                                 ^
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:148:25: note:
in expansion of macro 'VERIFY0'
  148 | #define ASSERT0         VERIFY0
      |                         ^~~~~~~
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2704:9: note: in
expansion of macro 'ASSERT0'
 2704 |         ASSERT0(dbuf_find_dirty_eq(db, tx->tx_txg));
      |         ^~~~~~~

This is because dbuf_find_dirty_eq() returns a pointer, which if
pointers are 32-bit results in a warning about the cast to uint64_t.

Instead, use the ASSERT3P() macro, with == and NULL as second and third
arguments, which should work regardless of the target's bitness.
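
To show why a pointer-sized comparison avoids the warning, here is a
minimal sketch of a pointer assertion that casts through uintptr_t
instead of uint64_t. SKETCH_ASSERT3P and lookup_sketch are hypothetical
names for this example and are not the libspl definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two pointers after converting them to uintptr_t, which has
 * the width of a pointer on both 32-bit and 64-bit targets.
 */
#define	SKETCH_ASSERT3P(l, op, r) do {				\
	const uintptr_t l_ = (uintptr_t)(l);			\
	const uintptr_t r_ = (uintptr_t)(r);			\
	assert(l_ op r_);					\
} while (0)

static int dummy_record;

/* Hypothetical lookup: returns a record pointer, or NULL if none. */
static void *
lookup_sketch(int found)
{
	return (found ? (void *)&dummy_record : NULL);
}
```

A call such as SKETCH_ASSERT3P(lookup_sketch(0), ==, NULL) then compiles
without a pointer-to-int-cast warning regardless of pointer width.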

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #15224
2023-09-01 09:33:33 -07:00
Serapheim Dimitropoulos 63159e5bda checkstyle: fix action failures
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15220
2023-09-01 09:33:33 -07:00
Paul Dagnelie 7eabb0af37 Try to clarify wording to reduce zpool add incidents
Try to clarify wording to reduce zpool add incidents.
Add an attach example.

Reviewed-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15179
2023-08-27 08:25:42 -07:00
Rich Ercolani c65aaa8387 Avoid save/restoring AMX registers to avoid a SPR erratum
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14989
Closes #15168
2023-08-27 08:25:42 -07:00
Brian Behlendorf e99e684b33 zed: update zed.d/statechange-slot_off.sh
The statechange-slot_off.sh zedlet which was added in #15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15210
2023-08-27 08:25:42 -07:00
наб 1b696429c1 Make zoned/jailed zfsprops(7) make more sense.
- Distribute zfs-[un]jail.8 on FreeBSD and zfs-[un]zone.8 on Linux
- zfsprops.7: mirror zoned/jailed, only available on respective platforms

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15161
2023-08-27 08:25:42 -07:00
Rob N 084ff4abd2 tests/block_cloning: rename and document get_same_blocks helper
`get_same_blocks` is a helper to compare two files and return a list of
the blocks that are clones of each other. It's very necessary for block
cloning tests.

Previously it was incorrectly called `unique_blocks`, which is the
_inverse_ of what it does (an early version did list unique blocks; it
was changed but the name was not). So if nothing else, it should be
called `duplicate_blocks`.

But, keeping the details of a clone operation in your head is actually
quite difficult, without the additional overhead of wondering how the
tools work. So I've renamed it to better describe what it does, added a
usage note, and changed it to return block indexes from 0 instead of 1,
to match how L0 blocks are normally counted.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by:  Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15181
2023-08-26 11:18:11 -07:00
Serapheim Dimitropoulos ab999406fe Update outdated assertion from zio_write_compress
As part of some internal gang block testing within Delphix
we hit the assertion removed by this patch. The assertion
was triggered by a ZIO that had two copies and was a gang
block making the following expression equal to 3:
```
MIN(zp->zp_copies + BP_IS_GANG(bp), spa_max_replication(spa))
```
and failing when we expected the above to be equal to
`BP_GET_NDVAS(bp)`.

The assertion is no longer valid since the following commit:
```
commit 14872aaa4f
Author: Matthew Ahrens <matthew.ahrens@delphix.com>
Date:   Mon Feb 6 09:37:06 2023 -0800

  EIO caused by encryption + recursive gang
```

The above commit changed gang block headers so they can't
have more than 2 copies but the assertion in question from
this PR was never updated.
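
The arithmetic behind the failure can be worked through with a small
sketch. The function name and plain int parameters are hypothetical, and
a maximum replication of 3 is an assumption made for the example:

```c
#include <assert.h>

#define	MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* Value the old assertion expected the number of DVAs to equal. */
static int
expected_ndvas_old_sketch(int zp_copies, int is_gang, int max_replication)
{
	assert(zp_copies >= 1 && max_replication >= 1);
	return (MIN_SKETCH(zp_copies + is_gang, max_replication));
}
```

With two copies and a gang block, the old expression evaluates to
MIN(2 + 1, 3) = 3, which cannot match a gang block header that is
limited to 2 copies.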

Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15180
2023-08-26 11:18:11 -07:00
Tony Hutter d19304ffee zed: Add zedlet to power off slot when drive is faulted
If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15200
2023-08-25 13:33:40 -07:00
Rob N 92f095a903 copy_file_range: fix fallback when source create on same txg
In 019dea0a5 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15170
Closes #15172
2023-08-25 13:33:40 -07:00
Umer Saleem 645a7e4d95 Move zinject from openzfs-zfs-test to openzfs-zfsutils
For native Debian packaging, the zinject binary and man page are
packaged in the ZFS test package. zinject is not directly related
to ZTS and should be packaged with other utilities, as it is
in the zfs_<ver>.rpm/deb packages.

This commit moves zinject binary and man page from openzfs-zfs-test
to openzfs-zfsutils package.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15160
2023-08-25 13:33:40 -07:00
Rafael Kitover 95649854ba dracut: support mountpoint=legacy for root dataset
Support mountpoint=legacy for the root dataset in the dracut zfs support
scripts.

mountpoint=/ or mountpoint=/sysroot also works.

Change zfs-env-bootfs.service to add zfsutil to BOOTFSFLAGS only for
root datasets with mountpoint != legacy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rafael Kitover <rkitover@gmail.com>
Closes #15149
2023-08-25 13:33:40 -07:00
oromenahar 895cb689d3 zfs_clone_range should return descriptive error codes
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15148
2023-08-25 13:33:40 -07:00
наб 6bdc7259d1 libzfs: sendrecv: send_progress_thread: handle SIGINFO/SIGUSR1
POSIX timers target the process, not the thread (as does SIGINFO),
so we need to block it in the main thread which will die if interrupted.

Ref: https://101010.pl/@ed1conf@bsd.network/110731819189629373
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15113
2023-08-25 13:33:40 -07:00
Ryan Lahfa 1e488eec60 linux/spl/kmem_cache: undefine `kmem_cache_alloc` before defining it
When compiling a kernel with bcachefs and zfs,
the two macros will collide, making it impossible
to have both filesystems.

It is sufficient to just undefine the macro before defining it.

On why this should be in ZFS rather than bcachefs: currently,
bcachefs is not an in-tree filesystem, but
it has a reasonably high chance of getting included soon.

This avoids the breakage in ZFS early,
this patch may be distributed downstream in NixOS
and is already used there.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Lahfa <ryan@lahfa.xyz>
Closes #15144
2023-08-25 13:33:40 -07:00
Mateusz Piotrowski c418edf1d3 Fix some typos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #15141
2023-08-25 13:33:40 -07:00
Alexander Motin df8c9f351d ZIL: Second attempt to reduce scope of zl_issuer_lock.
The previous patch #14841 appeared to have a significant flaw, causing
deadlocks if zl_get_data callback got blocked waiting for TXG sync.  I
already handled some of such cases in the original patch, but issue
 #14982 showed cases that were impossible to solve in that design.

This patch fixes the problem by postponing log blocks allocation till
the very end, just before the zios issue, leaving nothing blocking after
that point to cause deadlocks.  Before that point though any sleeps are
now allowed, not causing sync thread blockage.  This requires a slightly
more complicated lwb state machine to allocate blocks and issue zios
in proper order.  But with removal of special early issue workarounds
the new code is much cleaner now, and should even be more efficient.

Since this patch uses null zios between writes, I've found that null
zios do not wait for logical children ready status in zio_ready(),
which makes the parent write proceed prematurely, producing incorrect
log blocks.  Adding ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children()
fixes it.

Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15122
2023-08-25 11:58:44 -07:00
Alexander Motin bb31ded68b ZIL: Replay blocks without next block pointer.
If we get next block allocation error during log write, we trigger
transaction commit.  But the block we have just completed is still
written and transactions it covers will be acknowledged normally.
If after that we ignore the block during replay just because it is
the last in the chain, we may not replay some transactions that we
have acknowledged as synced, that is not right.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15132
2023-08-25 11:58:44 -07:00
Alexander Motin c1801cbe59 ZIL: Avoid dbuf_read() before dmu_sync().
In most cases dmu_sync() works with dirty records directly and does
not need actual data. The only exception is dmu_sync_late_arrival().
To save some CPU time use dmu_buf_hold_noread*() in z*_get_data()
and explicitly call dbuf_read() in dmu_sync_late_arrival(). There
is also a chance that by that time TXG will already be synced and
we won't have to do it at all.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15153
2023-08-25 11:58:44 -07:00
Alexander Motin ffaedf0a44 Remove fastwrite mechanism.
Fastwrite was introduced many years ago to improve ZIL writes spread
between multiple top-level vdevs by tracking number of allocated but
not written blocks and choosing vdev with smaller count.  It was supposed
to reduce ZIL knowledge about allocation, but actually made ZIL
report its allocations to the allocation code even more actively,
complicating both ZIL and metaslabs code.

On top of that, it seems ZIO_FLAG_FASTWRITE setting in dmu_sync()
was lost many years ago, that was one of the declared benefits. Plus
introduction of embedded log metaslab class solved another problem
with allocation rotor accounting both normal and log allocations,
since in most cases those are now in different metaslab classes.

After all that, I'd prefer to simplify already too complicated ZIL,
ZIO and metaslab code if the benefit of complexity is not obvious.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15107
2023-08-25 11:58:44 -07:00
Alexander Motin 02ce9030e6 Avoid waiting in dmu_sync_late_arrival().
The transaction there does not produce any dirty data or log blocks,
so it should not be throttled. All other cases wait for TXG sync, by
which time the log block we are writing will be obsolete, so we can
skip waiting and just return error here instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15096
2023-08-25 11:58:44 -07:00
Serapheim Dimitropoulos 0ae7bfc0a4 zpool_vdev_remove() should handle EALREADY error return
When the vdev properties feature was merged an extra check
was added in `spa_vdev_remove_top_check()` which checked
whether the vdev that we want to remove is already being
removed and if so return an EALREADY error.

```
static int
spa_vdev_remove_top_check(vdev_t *vd)
{
	... <snip> ...
	/*
	 * This device is already being removed
	 */
	if (vd->vdev_removing)
		return (SET_ERROR(EALREADY));
```

Before that change we'd still fail with an error but it
was a more generic one - here is the check that failed
later in the same function:
```
	/*
	 * There can not be a removal in progress.
	 */
	if (spa->spa_removing_phys.sr_state == DSS_SCANNING)
		return (SET_ERROR(EBUSY));
```

Changing the error code returned from that function changed
the behavior of the removal's library interface exposed to
the userland - `spa_vdev_remove()` now returns `EZFS_UNKNOWN`
instead of the `EZFS_EBUSY` that it was returning before.

This patch adds logic to make `spa_vdev_remove()` mindful
of the new EALREADY code and propagate `EZFS_EBUSY`,
reverting to the previously established semantics of that
function.
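
The restored mapping can be sketched as a plain errno translation. The
enum and function below are hypothetical stand-ins with illustrative
values, not the libzfs definitions:

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for library error codes; names and values are illustrative. */
typedef enum {
	SKETCH_EZFS_EBUSY = 1,
	SKETCH_EZFS_UNKNOWN
} sketch_ezfs_t;

/* Translate the errno reported for a failed removal request. */
static sketch_ezfs_t
remove_errno_to_ezfs_sketch(int err)
{
	assert(err > 0);
	switch (err) {
	case EALREADY:	/* this vdev is already being removed */
	case EBUSY:	/* a removal is already in progress */
		return (SKETCH_EZFS_EBUSY);
	default:
		return (SKETCH_EZFS_UNKNOWN);
	}
}
```

Treating both errno values the same way keeps callers that only know
about the busy error working as before.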

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15013
Closes #15129
2023-08-02 08:54:09 -07:00
наб bd1eab16eb linux: zfs: ctldir: set [amc]time to snapshot's creation property
If looking up a snapdir inode failed, hold pool config – hold the 
snapshot – get its creation property – release it – release it, 
then use that as the [amc]time in the allocated inode. If that 
fails then fall back to current time. No performance impact since 
this is only done when allocating a new snapdir inode.
                                                       
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15110
Closes #15117
2023-08-02 08:53:45 -07:00
Zach Dykstra b3c1807d77 readmmap.c: fix building with MUSL libc
glibc includes sys/types.h from stdlib.h. This is not the case for MUSL,
so explicitly include it. Fixes usage of uint_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Zach Dykstra <dykstra.zachary@gmail.com>
Closes #15130
2023-08-02 08:53:06 -07:00
oromenahar b5e2456333 Check the return value in clonefile test
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15128
2023-08-02 08:52:40 -07:00
Rob N c47f0f4417 linux/copy_file_range: properly request a fallback copy on Linux <5.3
Before Linux 5.3, the filesystem's copy_file_range handler had to signal
back to the kernel that we can't fulfill the request and it should
fallback to a content copy. This is done by returning -EOPNOTSUPP.

This commit converts the EXDEV return from zfs_clone_range to
EOPNOTSUPP, to force the kernel to fallback for all the valid reasons it
might be unable to clone. Without it the copy_file_range() syscall will
return EXDEV to userspace, breaking its semantics.

Add test for copy_file_range fallbacks.  copy_file_range should always
fallback to a content copy whenever ZFS can't service the request with
cloning.
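
A minimal sketch of the conversion described above, assuming the clone
routine reports a positive errno and the handler returns a negative one.
clone_error_for_old_kernel_sketch is a hypothetical name for this
example:

```c
#include <assert.h>
#include <errno.h>

/*
 * Translate a clone error into the value a pre-5.3 copy_file_range
 * handler would return to the kernel.
 */
static int
clone_error_for_old_kernel_sketch(int error)
{
	assert(error >= 0);
	if (error == EXDEV)
		return (-EOPNOTSUPP);	/* ask the kernel for a content copy */
	return (-error);		/* success and other errors pass through */
}
```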

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15131
2023-08-02 08:52:40 -07:00
Rob N 12f2b1f65e zdb: include cloned blocks in block statistics
This gives `zdb -b` support for clone blocks.

Previously, it didn't know what clones were, so would count their space
allocation multiple times and then report leaked space (or, in debug,
would assert trying to claim blocks a second time).

This commit fixes those bugs, and reports the number of clones and the
space "used" (saved) by them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15123
2023-08-02 08:52:40 -07:00
Brian Behlendorf 4a104ac047 Tag 2.2.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-27 16:15:44 -07:00
oromenahar c24a480631 BRT should return EOPNOTSUPP
Return the more descriptive EOPNOTSUPP instead of EXDEV when the
storage pool doesn't support block cloning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15097
2023-07-27 16:11:54 -07:00
Rob Norris 36d1a3ef4e zts: block cloning tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
Closes #405
Closes #13349
2023-07-26 08:46:58 -07:00
Rob Norris 2768dc04cc linux: implement filesystem-side copy/clone functions for EL7
Redhat have backported copy_file_range and clone_file_range to the EL7
kernel using an "extended file operations" wrapper structure. This
connects all that up to let cloning work there too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 3366ceaf3a linux: implement filesystem-side clone ioctls
Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and
were implemented as regular filesystem-specific ioctls. This implements
those ioctls directly in OpenZFS, allowing cloning to work on older
kernels.

There's no need to gate these behind version checks; on later kernels
Linux will simply never deliver these ioctls, instead calling the
appropriate VFS op.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 5d12545da8 linux: implement filesystem-side copy/clone functions
This implements the Linux VFS ops required to service the file
copy/clone APIs:

  .copy_file_range    (4.5+)
  .clone_file_range   (4.5-4.19)
  .dedupe_file_range  (4.5-4.19)
  .remap_file_range   (4.20+)

Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are
hooked up here, but are not implemented yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris a3ea8c8ee6 dbuf_sync_leaf: check DB_READ in state assertions
Block cloning introduced a new state transition from DB_NOFILL to
DB_READ. This occurs when a block is cloned and then read on the
current txg.

In this case, the clone will move the dbuf to DB_NOFILL, and then the
read will be issued for the overridden block pointer. If that read is
still outstanding when it comes time to write, the dbuf will be in
DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus
tripping the assertions.

This updates those checks to allow DB_READ as a valid state iff the
dirty record is for a BRT write and there is an override block pointer.
This is a safe situation because the block already exists, so there's
nothing that could change from underneath the read.
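
The relaxed check can be pictured as a predicate over the dbuf state.
This sketch uses stand-in names (sk_db_state_t, sync_state_ok_sketch)
and assumes the pre-existing checks accept the cached and no-fill
states; the real assertions in dbuf_sync_leaf may be arranged
differently:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the dbuf states discussed above. */
typedef enum {
	SK_DB_CACHED,
	SK_DB_NOFILL,
	SK_DB_READ
} sk_db_state_t;

/* Is this state acceptable when the leaf is about to be written? */
static bool
sync_state_ok_sketch(sk_db_state_t state, bool brt_write, bool has_override)
{
	assert(state >= SK_DB_CACHED && state <= SK_DB_READ);

	if (state == SK_DB_CACHED || state == SK_DB_NOFILL)
		return (true);

	/* A pending read is fine only for a clone with an override BP. */
	return (state == SK_DB_READ && brt_write && has_override);
}
```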

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 0426e13271 dmu_buf_will_clone: only check that current txg is clean
dbuf_undirty() will (correctly) only remove dirty records for the given
(open) txg. If there is a dirty record for an earlier closed txg that
has not been synced out yet, then db_dirty_records will still have
entries on it, tripping the assertion.

Instead, change the assertion to only consider the current txg. To some
extent this is redundant, as it's really just saying "did dbuf_undirty()
work?", but it doesn't hurt and accurately expresses our
expectations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 8aa4f0f0fc brt_vdev_realloc: use vmem_alloc for large allocation
bv_entcount can be a relatively large allocation (see comment for
BRT_RANGESIZE), so get it from the big allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 7698503dca zfs_clone_range: use vmem_malloc for large allocation
Just silencing the warning about large allocations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Brian Behlendorf b9aa32ff39 zed: Reduce log noise for large JBODs
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful we remove it.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15086
Closes #15094
2023-07-26 08:46:58 -07:00
Brian Behlendorf 571762b290 Linux 6.4 compat: META
Update the META file to reflect compatibility with the 6.4 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15095
2023-07-26 08:46:58 -07:00
Alexander Motin 991834f5dc Remove zl_issuer_lock from zil_suspend().
This locking was recently added as part of #14979, but it appears to be
illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes a deadlock with the sync thread in
spa_sync_upgrades().  On second thought, we should not need this
locking, since zil_commit_impl(), which we call below, takes
zl_issuer_lock; that should sufficiently protect zl_suspend reads,
combined with other logic from #14979.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15103
2023-07-25 13:54:02 -07:00
Alexander Motin 41a0f66279 ZIL: Fix config lock deadlock.
When we have some LWBs closed and their ZIOs ready to be issued, we
cannot afford to sleep on the config lock if somebody else tries to
lock it as writer, or it will cause a deadlock.

To solve it, move spa_config_enter() from zil_lwb_write_issue() to
zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering
with other threads.  Now if we can't immediately lock config, issue
all previously closed LWBs so that they could drop their config
locks after completion, and only then allow sleeping on our lock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15078
Closes #15080
2023-07-25 13:54:02 -07:00
Umer Saleem c79d1bae75 Update changelog for OpenZFS 2.2.0 release
This commit updates changelog for native Debian packages for
OpenZFS 2.2.0 release.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15104
2023-07-25 09:01:27 -07:00
Brian Behlendorf 70232483b4 Tag 2.2.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-21 16:36:34 -07:00
Rob N c5273e0c31 shellcheck: disable "unreachable command" check [SC2317]
This new check in 0.9.0 appears to have some issues with various forms
of "early return", like trap, exit and return. This is tripping up (at
least):

  cmd/zed/zed.d/history_event-zfs-list-cacher.sh
  /etc/zfs/zfs-functions

It's not obvious what it's complaining about or what the remedy is, so it
seems sensible to disable this check for now.

See also:

  https://www.shellcheck.net/wiki/SC2317
  https://github.com/koalaman/shellcheck/issues/2542
  https://github.com/koalaman/shellcheck/issues/2613

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15089
2023-07-21 16:35:12 -07:00
Rob N 685ae4429f metaslab: tuneable to better control force ganging
metaslab_force_ganging isn't enough to actually force ganging, because
it still only forces 3% of the time. This adds
metaslab_force_ganging_pct so we can configure how often to force
ganging.
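
The decision this tunable controls can be sketched as below. This is an
illustration under stated assumptions, not the OpenZFS implementation: the
sk_ name is hypothetical, and the random draw is passed in as `roll` so the
behaviour is deterministic and easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether an allocation of `psize` bytes should be forced to gang.
 * `force_ganging` is the size threshold, `force_ganging_pct` the
 * percentage of qualifying allocations to force, and `roll` stands in for
 * a uniform random draw in [0, 100).
 */
static bool
sk_should_force_gang(uint64_t psize, uint64_t force_ganging,
    uint32_t force_ganging_pct, uint32_t roll)
{
	assert(roll < 100);

	/* Clamp the percentage so values above 100 mean "always". */
	uint32_t pct = force_ganging_pct > 100 ? 100 : force_ganging_pct;

	return (psize >= force_ganging && roll < pct);
}
```

With the percentage at 100 every allocation at or above the threshold is
forced to gang, and with 0 none is.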

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15088
2023-07-21 16:35:12 -07:00
Alexander Motin 81be809a25 Adjust prefetch parameters.
 - Reduce maximum prefetch distance for 32bit platforms to 8MB as it
was previously.  Those systems probably didn't grow much, so better
stay conservative there.
 - Retire the array_rd_sz tunable, which blocked prefetch for large
requests.  We should not penalize applications trying to be more
efficient.  The speculative prefetcher by itself has reasonable
distance limits, and 1MB is not much at all these days.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15072
2023-07-21 16:35:12 -07:00
Alexander Motin 8a6fde8213 Add explicit prefetches to bpobj_iterate().
To simplify error handling bpobj_iterate_blkptrs() iterates through
the list of block pointers backwards.  Unfortunately the speculative
prefetcher is currently unable to detect such patterns, which makes
each block read there synchronous and very slow on HDD pools.

According to my tests, the added explicit prefetch reduces the time
needed to asynchronously delete 8 snapshots of 4 million blocks each
from 20 seconds to less than one, which should free the sync thread
for other useful work, such as async writes, scrub, etc.

While there, plug one memory leak in case of bpobj_open() error and
harmonize some variable names.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15071
2023-07-21 16:35:12 -07:00
Alan Somers b6f618f8ff Don't emit cksum_{actual_expected} in ereport.fs.zfs.checksum events
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes #14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #14717
Closes #15052
2023-07-21 16:35:12 -07:00
Alan Somers 51a2b59767 Don't emit checksum histograms in ereport.fs.zfs.checksum events
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue #14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15052
2023-07-21 16:35:12 -07:00
Tony Hutter 8c81c0b05d zed: Fix zed ASSERT on slot power cycle
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15084
2023-07-21 16:35:12 -07:00
Chunwei Chen b221f43943 Fix zpl_test_super race with zfs_umount
We cannot call zpl_enter in zpl_test_super, because zpl_test_super is
under spinlock so we can't sleep, and also because zpl_test_super is
called without sb->s_umount taken, so it's possible we would race with
zfs_umount and call zpl_enter on freed zfsvfs.

Here's a stack trace when this happens:
[ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast()
[ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.115012] Call Trace:
[ 2379.115019]  dump_stack+0x74/0x96
[ 2379.115024]  panic+0x114/0x2f6
[ 2379.115035]  spl_panic+0xcf/0xfc [spl]
[ 2379.115477]  __cv_broadcast+0x68/0xa0 [spl]
[ 2379.115585]  rrw_exit+0xb8/0x310 [zfs]
[ 2379.115696]  rrm_exit+0x4a/0x80 [zfs]
[ 2379.115808]  zpl_test_super+0xa9/0xd0 [zfs]
[ 2379.115920]  sget+0xd1/0x230
[ 2379.116033]  zpl_mount+0xdc/0x230 [zfs]
[ 2379.116037]  legacy_get_tree+0x28/0x50
[ 2379.116039]  vfs_get_tree+0x27/0xc0
[ 2379.116045]  path_mount+0x2fe/0xa70
[ 2379.116048]  do_mount+0x80/0xa0
[ 2379.116050]  __x64_sys_mount+0x8b/0xe0
[ 2379.116052]  do_syscall_64+0x35/0x50
[ 2379.116054]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 2379.116057] RIP: 0033:0x7f9912e8b26a

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15077
2023-07-21 16:35:12 -07:00
Ameer Hamza e037327bfe spa_min_alloc should be GCD, not min
Since spa_min_alloc may not be a power of 2, unlike ashifts, in the
case of DRAID, we should not select the minimal value among several
vdevs. Rounding to a multiple of it is unlikely to work for other
vdevs. Instead, using the greatest common divisor produces smaller
yet more reasonable results.
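
The difference between taking the minimum and taking the GCD can be seen
in a small sketch. The sk_* names and the example sizes are hypothetical
and chosen only for illustration; they are not taken from the OpenZFS
sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Euclid's algorithm; sk_gcd(0, x) == x, so 0 acts as the identity. */
static uint64_t
sk_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return (a);
}

/* Fold per-vdev minimum allocation sizes into one pool-wide value. */
static uint64_t
sk_pool_min_alloc(const uint64_t *vdev_min_alloc, size_t n)
{
	assert(vdev_min_alloc != NULL || n == 0);

	uint64_t g = 0;
	for (size_t i = 0; i < n; i++)
		g = sk_gcd(g, vdev_min_alloc[i]);
	return (g);
}
```

For two vdevs with minimum allocations of 12288 and 20480 bytes, the
minimum would be 12288, which does not divide 20480. The GCD is 4096,
which divides both.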

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15067
2023-07-21 16:35:12 -07:00
Yuri Pankov 1a2e486d25 Don't panic if setting vdev properties is unsupported for this vdev type
Check that vdev has valid zap and bail out early.

While here, move objid selection out of the loop, it's not going to
change.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15063
2023-07-21 16:35:12 -07:00
Ameer Hamza d8011707cc Ignore pool ashift property during vdev attachment
Ashift can be set for a vdev only during its creation, and the
top-level vdev does not change when a vdev is attached or replaced.
The ashift property should not be used during attachment, as it
does not allow attaching/replacing a vdev if the pool's ashift
property is increased after the existing vdev was created. Instead,
we should be able to attach the vdev if the attached vdev can
satisfy the ashift requirement with its parent.
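
The rule described above can be sketched as a single comparison. This is
an illustration with hypothetical names; the real check lives in the vdev
attach path and works on vdev structures rather than bare integers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if a new device with alignment shift `new_ashift` may be
 * attached under a top-level vdev created with `top_ashift`. The pool's
 * current ashift property is accepted only to show that it plays no part
 * in the decision.
 */
static bool
sk_can_attach(unsigned new_ashift, unsigned top_ashift,
    unsigned pool_ashift_prop)
{
	/* Shifts describe sizes that fit in 64 bits. */
	assert(new_ashift < 64 && top_ashift < 64);

	(void) pool_ashift_prop;	/* deliberately ignored on attach */
	return (new_ashift <= top_ashift);
}
```

So a 512-byte-sector disk (ashift 9) can join an ashift=12 vdev even after
the pool property has been raised to 13.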

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15061
2023-07-21 16:35:12 -07:00
Wojciech Małota-Wójcik f5f5a2db95 Rollback before zfs root is mounted
On my machines I observe random failures caused by rollback happening 
after zfs root is mounted. I've observed two types of failures:

- zfs-rollback-bootfs.service fails saying that rollback must be
  done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for a couple of days,
none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the 
`After` directive. Maybe someone else is able to address this?

Reviewed-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Wojciech Małota-Wójcik <59281144+outofforest@users.noreply.github.com>
Closes #15025
2023-07-21 16:35:12 -07:00
Alexander Motin 83b0967c1f Do not request data L1 buffers on scan prefetch.
Set ARC_FLAG_NO_BUF when prefetching data L1 buffers for scan.  We
do not prefetch data L0 buffers, so we do not need the L1 buffers,
only want them to be ready in ARC. This saves some CPU time on the
buffers decompression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15029
2023-07-21 16:35:12 -07:00
Coleman Kane 73ba5df31a Linux 6.5 compat: disk_check_media_change() was added
The disk_check_media_change() function was added which replaces
bdev_check_media_change.  This change was introduced in 6.5rc1
444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a
gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk
is now used to pass the expected data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15060
2023-07-21 16:35:12 -07:00
Coleman Kane 1bc244ae93 Linux 6.5 compat: BLK_STS_NEXUS renamed to BLK_STS_RESV_CONFLICT
This change was introduced in Linux commit
7ba150834b840f6f5cdd07ca69a4ccf39df59a66

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15059
2023-07-21 16:35:12 -07:00
Coleman Kane 931dc70550 Linux 6.5 compat: intptr_t definition is canonically signed
Make the version here match that elsewhere in the kernel and system
headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15058
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-21 16:35:12 -07:00
Yuri Pankov 5299f4f289 set autotrim default to 'off' everywhere
As it turns out, having autotrim default to 'on' on FreeBSD never really
worked due to a mess with defines where userland and kernel module were
getting different default values (userland was defaulting to 'off',
module was thinking it's 'on').

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15079
2023-07-21 16:35:12 -07:00
Alan Somers f917cf1c03 Fix the ZFS checksum error histograms with larger record sizes
My analysis in PR #14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.
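
A saturating per-bit histogram of this kind can be sketched as follows.
The function name and signature are hypothetical and differ from the
OpenZFS code; the sketch only shows how clamping at 255 prevents an 8-bit
bucket from wrapping around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SK_BITS	64

/*
 * For every 64-bit word, count which bit positions differ between the
 * expected (`good`) and actual (`bad`) data. Each bucket saturates at
 * UINT8_MAX instead of overflowing.
 */
static void
sk_update_histogram(uint8_t hist[SK_BITS], const uint64_t *good,
    const uint64_t *bad, size_t nwords)
{
	assert(hist != NULL);

	for (size_t w = 0; w < nwords; w++) {
		uint64_t diff = good[w] ^ bad[w];

		for (int bit = 0; bit < SK_BITS; bit++) {
			if (((diff >> bit) & 1) && hist[bit] < UINT8_MAX)
				hist[bit]++;
		}
	}
}
```

A 2K record holds 256 such words, so without the clamp a bit that is wrong
in every word would wrap an 8-bit bucket back to 0.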

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15049
2023-07-21 16:35:12 -07:00
Alexander Motin 56ed389a57 Fix raw receive with different indirect block size.
Unlike regular receive, raw receive requires the destination to have
the same block structure as the source.  In case of dnode reclaim this
triggers two special cases, requiring special handling:
 - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz()
should not dirty the data buffer if block size does not change, or
during receive dbuf_dirty_lightweight() will trigger assertion.
 - If dn_nlevels > 1, we just can't change the ibs, dnode_set_blksz()
would fail and receive_object would trigger assertion, so we should
destroy and recreate the dnode from scratch.
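
The two cases can be summarised as a small decision function. This is a
deliberately simplified sketch with hypothetical names; it models only the
choice between adjusting the indirect block size in place and recreating
the dnode, not the data-buffer dirtying detail of the first case.

```c
#include <assert.h>

typedef enum {
	SK_KEEP,	/* indirect block size already matches */
	SK_SET_IBS,	/* single level: change ibs in place */
	SK_RECREATE	/* multiple levels: destroy and recreate */
} sk_action_t;

/* Pick the handling for a reclaimed dnode during raw receive. */
static sk_action_t
sk_raw_reclaim_action(int dn_nlevels, int cur_ibs, int new_ibs)
{
	assert(dn_nlevels >= 1);

	if (cur_ibs == new_ibs)
		return (SK_KEEP);
	if (dn_nlevels == 1)
		return (SK_SET_IBS);
	return (SK_RECREATE);
}
```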

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15039

(cherry picked from commit c4e8742149)
2023-07-20 08:58:29 -07:00
Alexander Motin e613e4bbe3 Avoid extra snprintf() in dsl_deadlist_merge().
Since we are already iterating the ZAP, we have exact string key to
remove, we do not need to call zap_remove_int() with the int key we
just converted, we can call zap_remove() for the original string.

This should make no functional change, only a micro-optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15056

(cherry picked from commit fdba8cbb79)
2023-07-20 08:58:29 -07:00
Alexander Motin b4e630b00c Add missed DMU_PROJECTUSED_OBJECT prefetch.
It seems 9c5167d19f "Project Quota on ZFS" missed adding a prefetch
for DMU_PROJECTUSED_OBJECT during scan (scrub/resilver).  It should
not cause visible problems, but may affect scrub/resilver performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15024
2023-07-20 08:58:29 -07:00
Mateusz Guzik bf6cd30796 FreeBSD: catch up to __FreeBSD_version 1400093
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15036
2023-07-20 08:58:29 -07:00
Alexander Motin 1266cebf87 FreeBSD: Fix build on stable/13 after 1302506.
Starting approximately from version 1302506, vn_lock_pair() grew two
additional arguments, following head.  There is a one week hole, but
that is the closest reference point we have.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15047
2023-07-20 08:58:29 -07:00
174 changed files with 4395 additions and 1424 deletions

21
.cirrus.yml Normal file
View File

@ -0,0 +1,21 @@
env:
  CIRRUS_CLONE_DEPTH: 1
  ARCH: amd64
build_task:
  matrix:
    freebsd_instance:
      image_family: freebsd-12-4
    freebsd_instance:
      image_family: freebsd-13-2
    freebsd_instance:
      image_family: freebsd-14-0-snap
  prepare_script:
    - pkg install -y autoconf automake libtool gettext-runtime gmake ksh93 py39-packaging py39-cffi py39-sysctl
  configure_script:
    - env MAKE=gmake ./autogen.sh
    - env MAKE=gmake ./configure --with-config="user" --with-python=3.9
  build_script:
    - gmake -j `sysctl -n kern.smp.cpus`
  install_script:
    - gmake install

2
.gitignore vendored
View File

@ -42,6 +42,7 @@
!udev/** !udev/**
!.editorconfig !.editorconfig
!.cirrus.yml
!.gitignore !.gitignore
!.gitmodules !.gitmodules
!AUTHORS !AUTHORS
@ -60,7 +61,6 @@
!TEST !TEST
!zfs.release.in !zfs.release.in
# #
# Normal rules # Normal rules
# #

4
META
View File

@ -2,9 +2,9 @@ Meta: 1
Name: zfs Name: zfs
Branch: 1.0 Branch: 1.0
Version: 2.2.0 Version: 2.2.0
Release: rc1 Release: rc5
Release-Tags: relext Release-Tags: relext
License: CDDL License: CDDL
Author: OpenZFS Author: OpenZFS
Linux-Maximum: 6.3 Linux-Maximum: 6.5
Linux-Minimum: 3.10 Linux-Minimum: 3.10

View File

@ -79,6 +79,7 @@
#include <sys/dsl_crypt.h> #include <sys/dsl_crypt.h>
#include <sys/dsl_scan.h> #include <sys/dsl_scan.h>
#include <sys/btree.h> #include <sys/btree.h>
#include <sys/brt.h>
#include <zfs_comutil.h> #include <zfs_comutil.h>
#include <sys/zstd/zstd.h> #include <sys/zstd/zstd.h>
@ -5178,7 +5179,7 @@ dump_label(const char *dev)
if (nvlist_size(config, &size, NV_ENCODE_XDR) != 0) if (nvlist_size(config, &size, NV_ENCODE_XDR) != 0)
size = buflen; size = buflen;
/* If the device is a cache device clear the header. */ /* If the device is a cache device read the header. */
if (!read_l2arc_header) { if (!read_l2arc_header) {
if (nvlist_lookup_uint64(config, if (nvlist_lookup_uint64(config,
ZPOOL_CONFIG_POOL_STATE, &l2cache) == 0 && ZPOOL_CONFIG_POOL_STATE, &l2cache) == 0 &&
@ -5342,12 +5343,20 @@ static const char *zdb_ot_extname[] = {
#define ZB_TOTAL DN_MAX_LEVELS #define ZB_TOTAL DN_MAX_LEVELS
#define SPA_MAX_FOR_16M (SPA_MAXBLOCKSHIFT+1) #define SPA_MAX_FOR_16M (SPA_MAXBLOCKSHIFT+1)
typedef struct zdb_brt_entry {
dva_t zbre_dva;
uint64_t zbre_refcount;
avl_node_t zbre_node;
} zdb_brt_entry_t;
typedef struct zdb_cb { typedef struct zdb_cb {
zdb_blkstats_t zcb_type[ZB_TOTAL + 1][ZDB_OT_TOTAL + 1]; zdb_blkstats_t zcb_type[ZB_TOTAL + 1][ZDB_OT_TOTAL + 1];
uint64_t zcb_removing_size; uint64_t zcb_removing_size;
uint64_t zcb_checkpoint_size; uint64_t zcb_checkpoint_size;
uint64_t zcb_dedup_asize; uint64_t zcb_dedup_asize;
uint64_t zcb_dedup_blocks; uint64_t zcb_dedup_blocks;
uint64_t zcb_clone_asize;
uint64_t zcb_clone_blocks;
uint64_t zcb_psize_count[SPA_MAX_FOR_16M]; uint64_t zcb_psize_count[SPA_MAX_FOR_16M];
uint64_t zcb_lsize_count[SPA_MAX_FOR_16M]; uint64_t zcb_lsize_count[SPA_MAX_FOR_16M];
uint64_t zcb_asize_count[SPA_MAX_FOR_16M]; uint64_t zcb_asize_count[SPA_MAX_FOR_16M];
@ -5368,6 +5377,8 @@ typedef struct zdb_cb {
int zcb_haderrors; int zcb_haderrors;
spa_t *zcb_spa; spa_t *zcb_spa;
uint32_t **zcb_vd_obsolete_counts; uint32_t **zcb_vd_obsolete_counts;
avl_tree_t zcb_brt;
boolean_t zcb_brt_is_active;
} zdb_cb_t; } zdb_cb_t;
/* test if two DVA offsets from same vdev are within the same metaslab */ /* test if two DVA offsets from same vdev are within the same metaslab */
@ -5662,6 +5673,45 @@ zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const blkptr_t *bp,
zcb->zcb_asize_len[bin] += BP_GET_ASIZE(bp); zcb->zcb_asize_len[bin] += BP_GET_ASIZE(bp);
zcb->zcb_asize_total += BP_GET_ASIZE(bp); zcb->zcb_asize_total += BP_GET_ASIZE(bp);
if (zcb->zcb_brt_is_active && brt_maybe_exists(zcb->zcb_spa, bp)) {
/*
* Cloned blocks are special. We need to count them, so we can
* later uncount them when reporting leaked space, and we must
* only claim them once.
*
* To do this, we keep our own in-memory BRT. For each block
* we haven't seen before, we look it up in the real BRT and
* if it's there, we note it and its refcount then proceed as
* normal. If we see the block again, we count it as a clone
* and then give it no further consideration.
*/
zdb_brt_entry_t zbre_search, *zbre;
avl_index_t where;
zbre_search.zbre_dva = bp->blk_dva[0];
zbre = avl_find(&zcb->zcb_brt, &zbre_search, &where);
if (zbre != NULL) {
zcb->zcb_clone_asize += BP_GET_ASIZE(bp);
zcb->zcb_clone_blocks++;
zbre->zbre_refcount--;
if (zbre->zbre_refcount == 0) {
avl_remove(&zcb->zcb_brt, zbre);
umem_free(zbre, sizeof (zdb_brt_entry_t));
}
return;
}
uint64_t crefcnt = brt_entry_get_refcount(zcb->zcb_spa, bp);
if (crefcnt > 0) {
zbre = umem_zalloc(sizeof (zdb_brt_entry_t),
UMEM_NOFAIL);
zbre->zbre_dva = bp->blk_dva[0];
zbre->zbre_refcount = crefcnt;
avl_insert(&zcb->zcb_brt, zbre, where);
}
}
if (dump_opt['L']) if (dump_opt['L'])
return; return;
@ -6664,6 +6714,20 @@ deleted_livelists_dump_mos(spa_t *spa)
iterate_deleted_livelists(spa, dump_livelist_cb, NULL); iterate_deleted_livelists(spa, dump_livelist_cb, NULL);
} }
static int
zdb_brt_entry_compare(const void *zcn1, const void *zcn2)
{
const dva_t *dva1 = &((const zdb_brt_entry_t *)zcn1)->zbre_dva;
const dva_t *dva2 = &((const zdb_brt_entry_t *)zcn2)->zbre_dva;
int cmp;
cmp = TREE_CMP(DVA_GET_VDEV(dva1), DVA_GET_VDEV(dva2));
if (cmp == 0)
cmp = TREE_CMP(DVA_GET_OFFSET(dva1), DVA_GET_OFFSET(dva2));
return (cmp);
}
static int static int
dump_block_stats(spa_t *spa) dump_block_stats(spa_t *spa)
{ {
@ -6678,6 +6742,13 @@ dump_block_stats(spa_t *spa)
zcb = umem_zalloc(sizeof (zdb_cb_t), UMEM_NOFAIL); zcb = umem_zalloc(sizeof (zdb_cb_t), UMEM_NOFAIL);
if (spa_feature_is_active(spa, SPA_FEATURE_BLOCK_CLONING)) {
avl_create(&zcb->zcb_brt, zdb_brt_entry_compare,
sizeof (zdb_brt_entry_t),
offsetof(zdb_brt_entry_t, zbre_node));
zcb->zcb_brt_is_active = B_TRUE;
}
(void) printf("\nTraversing all blocks %s%s%s%s%s...\n\n", (void) printf("\nTraversing all blocks %s%s%s%s%s...\n\n",
(dump_opt['c'] || !dump_opt['L']) ? "to verify " : "", (dump_opt['c'] || !dump_opt['L']) ? "to verify " : "",
(dump_opt['c'] == 1) ? "metadata " : "", (dump_opt['c'] == 1) ? "metadata " : "",
@ -6779,7 +6850,8 @@ dump_block_stats(spa_t *spa)
metaslab_class_get_alloc(spa_special_class(spa)) + metaslab_class_get_alloc(spa_special_class(spa)) +
metaslab_class_get_alloc(spa_dedup_class(spa)) + metaslab_class_get_alloc(spa_dedup_class(spa)) +
get_unflushed_alloc_space(spa); get_unflushed_alloc_space(spa);
total_found = tzb->zb_asize - zcb->zcb_dedup_asize + total_found =
tzb->zb_asize - zcb->zcb_dedup_asize - zcb->zcb_clone_asize +
zcb->zcb_removing_size + zcb->zcb_checkpoint_size; zcb->zcb_removing_size + zcb->zcb_checkpoint_size;
if (total_found == total_alloc && !dump_opt['L']) { if (total_found == total_alloc && !dump_opt['L']) {
@ -6820,6 +6892,9 @@ dump_block_stats(spa_t *spa)
"bp deduped:", (u_longlong_t)zcb->zcb_dedup_asize, "bp deduped:", (u_longlong_t)zcb->zcb_dedup_asize,
(u_longlong_t)zcb->zcb_dedup_blocks, (u_longlong_t)zcb->zcb_dedup_blocks,
(double)zcb->zcb_dedup_asize / tzb->zb_asize + 1.0); (double)zcb->zcb_dedup_asize / tzb->zb_asize + 1.0);
(void) printf("\t%-16s %14llu count: %6llu\n",
"bp cloned:", (u_longlong_t)zcb->zcb_clone_asize,
(u_longlong_t)zcb->zcb_clone_blocks);
(void) printf("\t%-16s %14llu used: %5.2f%%\n", "Normal class:", (void) printf("\t%-16s %14llu used: %5.2f%%\n", "Normal class:",
(u_longlong_t)norm_alloc, 100.0 * norm_alloc / norm_space); (u_longlong_t)norm_alloc, 100.0 * norm_alloc / norm_space);

View File

@ -372,6 +372,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
/* Only autoreplace bad disks */ /* Only autoreplace bad disks */
if ((vs->vs_state != VDEV_STATE_DEGRADED) && if ((vs->vs_state != VDEV_STATE_DEGRADED) &&
(vs->vs_state != VDEV_STATE_FAULTED) && (vs->vs_state != VDEV_STATE_FAULTED) &&
(vs->vs_state != VDEV_STATE_REMOVED) &&
(vs->vs_state != VDEV_STATE_CANT_OPEN)) { (vs->vs_state != VDEV_STATE_CANT_OPEN)) {
zed_log_msg(LOG_INFO, " not autoreplacing since disk isn't in " zed_log_msg(LOG_INFO, " not autoreplacing since disk isn't in "
"a bad state (currently %llu)", vs->vs_state); "a bad state (currently %llu)", vs->vs_state);
@ -607,8 +608,6 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
*/ */
if (nvlist_lookup_string(nvl, dp->dd_prop, &path) != 0 || if (nvlist_lookup_string(nvl, dp->dd_prop, &path) != 0 ||
strcmp(dp->dd_compare, path) != 0) { strcmp(dp->dd_compare, path) != 0) {
zed_log_msg(LOG_INFO, " %s: no match (%s != vdev %s)",
__func__, dp->dd_compare, path);
return; return;
} }
if (dp->dd_new_vdev_guid != 0 && dp->dd_new_vdev_guid != guid) { if (dp->dd_new_vdev_guid != 0 && dp->dd_new_vdev_guid != guid) {

View File

@ -416,6 +416,11 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0) FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0)
return; return;
if (vdev_guid == 0) {
fmd_hdl_debug(hdl, "Got a zero GUID");
return;
}
if (spare) { if (spare) {
int nspares = find_and_remove_spares(zhdl, vdev_guid); int nspares = find_and_remove_spares(zhdl, vdev_guid);
fmd_hdl_debug(hdl, "%d spares removed", nspares); fmd_hdl_debug(hdl, "%d spares removed", nspares);

View File

@ -16,6 +16,7 @@ dist_zedexec_SCRIPTS = \
%D%/scrub_finish-notify.sh \ %D%/scrub_finish-notify.sh \
%D%/statechange-led.sh \ %D%/statechange-led.sh \
%D%/statechange-notify.sh \ %D%/statechange-notify.sh \
%D%/statechange-slot_off.sh \
%D%/trim_finish-notify.sh \ %D%/trim_finish-notify.sh \
%D%/vdev_attach-led.sh \ %D%/vdev_attach-led.sh \
%D%/vdev_clear-led.sh %D%/vdev_clear-led.sh
@ -35,6 +36,7 @@ zedconfdefaults = \
scrub_finish-notify.sh \ scrub_finish-notify.sh \
statechange-led.sh \ statechange-led.sh \
statechange-notify.sh \ statechange-notify.sh \
statechange-slot_off.sh \
vdev_attach-led.sh \ vdev_attach-led.sh \
vdev_clear-led.sh vdev_clear-led.sh

View File

@ -121,7 +121,7 @@ state_to_val()
{ {
state="$1" state="$1"
case "$state" in case "$state" in
FAULTED|DEGRADED|UNAVAIL) FAULTED|DEGRADED|UNAVAIL|REMOVED)
echo 1 echo 1
;; ;;
ONLINE) ONLINE)

View File

@ -0,0 +1,64 @@
#!/bin/sh
# shellcheck disable=SC3014,SC2154,SC2086,SC2034
#
# Turn off disk's enclosure slot if it becomes FAULTED.
#
# Bad SCSI disks can often "disappear and reappear" causing all sorts of chaos
# as they flip between FAULTED and ONLINE. If
# ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is set in zed.rc, and the disk gets
# FAULTED, then power down the slot via sysfs:
#
# /sys/class/enclosure/<enclosure>/<slot>/power_status
#
# We assume the user will be responsible for turning the slot back on again.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
# 0: slot successfully powered off
# 1: enclosure not available
# 2: ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT disabled
# 3: vdev was not FAULTED
# 4: The enclosure sysfs path passed from ZFS does not exist
# 5: Enclosure slot didn't actually turn off after we told it to
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
	# No JBOD enclosure or NVMe slots
	exit 1
fi
if [ "${ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT}" != "1" ] ; then
	exit 2
fi
if [ "$ZEVENT_VDEV_STATE_STR" != "FAULTED" ] ; then
	exit 3
fi
if [ ! -f "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status" ] ; then
	exit 4
fi
# Turn off the slot and wait for sysfs to report that the slot is off.
# It can take ~400ms on some enclosures and multiple retries may be needed.
for i in $(seq 1 20) ; do
	echo "off" | tee "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status"
	for j in $(seq 1 5) ; do
		if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
			break 2
		fi
		sleep 0.1
	done
done
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" != "off" ] ; then
	exit 5
fi
zed_log_msg "powered down slot $ZEVENT_VDEV_ENC_SYSFS_PATH for $ZEVENT_VDEV_PATH"

View File

@ -142,3 +142,8 @@ ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"
# Disabled by default, 1 to enable and 0 to disable. # Disabled by default, 1 to enable and 0 to disable.
#ZED_SYSLOG_DISPLAY_GUIDS=1 #ZED_SYSLOG_DISPLAY_GUIDS=1
##
# Power off the drive's slot in the enclosure if it becomes FAULTED. This can
# help silence misbehaving drives. This assumes your drive enclosure fully
# supports slot power control via sysfs.
#ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT=1

View File

@ -132,6 +132,8 @@ static int zfs_do_zone(int argc, char **argv);
static int zfs_do_unzone(int argc, char **argv); static int zfs_do_unzone(int argc, char **argv);
#endif #endif
static int zfs_do_help(int argc, char **argv);
/* /*
* Enable a reasonable set of defaults for libumem debugging on DEBUG builds. * Enable a reasonable set of defaults for libumem debugging on DEBUG builds.
*/ */
@ -337,7 +339,7 @@ get_usage(zfs_help_t idx)
"\tsend [-nVvPe] -t <receive_resume_token>\n" "\tsend [-nVvPe] -t <receive_resume_token>\n"
"\tsend [-PnVv] --saved filesystem\n")); "\tsend [-PnVv] --saved filesystem\n"));
case HELP_SET: case HELP_SET:
return (gettext("\tset <property=value> ... " return (gettext("\tset [-u] <property=value> ... "
"<filesystem|volume|snapshot> ...\n")); "<filesystem|volume|snapshot> ...\n"));
case HELP_SHARE: case HELP_SHARE:
return (gettext("\tshare [-l] <-a [nfs|smb] | filesystem>\n")); return (gettext("\tshare [-l] <-a [nfs|smb] | filesystem>\n"));
@ -606,6 +608,9 @@ usage(boolean_t requested)
(void) fprintf(fp, (void) fprintf(fp,
gettext("\nFor the delegated permission list, run: %s\n"), gettext("\nFor the delegated permission list, run: %s\n"),
"zfs allow|unallow"); "zfs allow|unallow");
(void) fprintf(fp,
gettext("\nFor further help on a command or topic, "
"run: %s\n"), "zfs help [<topic>]");
} }
/* /*
@ -4197,9 +4202,10 @@ out:
static int static int
set_callback(zfs_handle_t *zhp, void *data) set_callback(zfs_handle_t *zhp, void *data)
{ {
nvlist_t *props = data; zprop_set_cbdata_t *cb = data;
int ret = zfs_prop_set_list_flags(zhp, cb->cb_proplist, cb->cb_flags);
if (zfs_prop_set_list(zhp, props) != 0) { if (ret != 0 || libzfs_errno(g_zfs) != EZFS_SUCCESS) {
switch (libzfs_errno(g_zfs)) { switch (libzfs_errno(g_zfs)) {
case EZFS_MOUNTFAILED: case EZFS_MOUNTFAILED:
(void) fprintf(stderr, gettext("property may be set " (void) fprintf(stderr, gettext("property may be set "
@ -4210,33 +4216,42 @@ set_callback(zfs_handle_t *zhp, void *data)
"but unable to reshare filesystem\n")); "but unable to reshare filesystem\n"));
break; break;
} }
return (1);
} }
return (0); return (ret);
} }
static int static int
zfs_do_set(int argc, char **argv) zfs_do_set(int argc, char **argv)
{ {
nvlist_t *props = NULL; zprop_set_cbdata_t cb = { 0 };
int ds_start = -1; /* argv idx of first dataset arg */ int ds_start = -1; /* argv idx of first dataset arg */
int ret = 0; int ret = 0;
int i; int i, c;
/* check for options */ /* check options */
if (argc > 1 && argv[1][0] == '-') { while ((c = getopt(argc, argv, "u")) != -1) {
switch (c) {
case 'u':
cb.cb_flags |= ZFS_SET_NOMOUNT;
break;
case '?':
default:
(void) fprintf(stderr, gettext("invalid option '%c'\n"), (void) fprintf(stderr, gettext("invalid option '%c'\n"),
argv[1][1]); optopt);
usage(B_FALSE); usage(B_FALSE);
} }
}
argc -= optind;
argv += optind;
/* check number of arguments */ /* check number of arguments */
if (argc < 2) { if (argc < 1) {
(void) fprintf(stderr, gettext("missing arguments\n")); (void) fprintf(stderr, gettext("missing arguments\n"));
usage(B_FALSE); usage(B_FALSE);
} }
if (argc < 3) { if (argc < 2) {
if (strchr(argv[1], '=') == NULL) { if (strchr(argv[0], '=') == NULL) {
(void) fprintf(stderr, gettext("missing property=value " (void) fprintf(stderr, gettext("missing property=value "
"argument(s)\n")); "argument(s)\n"));
} else { } else {
@ -4247,7 +4262,7 @@ zfs_do_set(int argc, char **argv)
} }
/* validate argument order: prop=val args followed by dataset args */ /* validate argument order: prop=val args followed by dataset args */
for (i = 1; i < argc; i++) { for (i = 0; i < argc; i++) {
if (strchr(argv[i], '=') != NULL) { if (strchr(argv[i], '=') != NULL) {
if (ds_start > 0) { if (ds_start > 0) {
/* out-of-order prop=val argument */ /* out-of-order prop=val argument */
@ -4265,20 +4280,20 @@ zfs_do_set(int argc, char **argv)
} }
/* Populate a list of property settings */ /* Populate a list of property settings */
if (nvlist_alloc(&props, NV_UNIQUE_NAME, 0) != 0) if (nvlist_alloc(&cb.cb_proplist, NV_UNIQUE_NAME, 0) != 0)
nomem(); nomem();
for (i = 1; i < ds_start; i++) { for (i = 0; i < ds_start; i++) {
if (!parseprop(props, argv[i])) { if (!parseprop(cb.cb_proplist, argv[i])) {
ret = -1; ret = -1;
goto error; goto error;
} }
} }
ret = zfs_for_each(argc - ds_start, argv + ds_start, 0, ret = zfs_for_each(argc - ds_start, argv + ds_start, 0,
ZFS_TYPE_DATASET, NULL, NULL, 0, set_callback, props); ZFS_TYPE_DATASET, NULL, NULL, 0, set_callback, &cb);
error: error:
nvlist_free(props); nvlist_free(cb.cb_proplist);
return (ret); return (ret);
} }
@ -8726,6 +8741,25 @@ zfs_do_version(int argc, char **argv)
return (zfs_version_print() != 0); return (zfs_version_print() != 0);
} }
/* Display documentation */
static int
zfs_do_help(int argc, char **argv)
{
char page[MAXNAMELEN];
if (argc < 3 || strcmp(argv[2], "zfs") == 0)
strcpy(page, "zfs");
else if (strcmp(argv[2], "concepts") == 0 ||
strcmp(argv[2], "props") == 0)
snprintf(page, sizeof (page), "zfs%s", argv[2]);
else
snprintf(page, sizeof (page), "zfs-%s", argv[2]);
execlp("man", "man", page, NULL);
fprintf(stderr, "couldn't run man program: %s", strerror(errno));
return (-1);
}
int int
main(int argc, char **argv) main(int argc, char **argv)
{ {
@ -8781,6 +8815,12 @@ main(int argc, char **argv)
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0)) if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zfs_do_version(argc, argv)); return (zfs_do_version(argc, argv));
/*
* Special case 'help'
*/
if (strcmp(cmdname, "help") == 0)
return (zfs_do_help(argc, argv));
if ((g_zfs = libzfs_init()) == NULL) { if ((g_zfs = libzfs_init()) == NULL) {
(void) fprintf(stderr, "%s\n", libzfs_error_init(errno)); (void) fprintf(stderr, "%s\n", libzfs_error_init(errno));
return (1); return (1);
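The topic-to-manual-page mapping used by zfs_do_help() can be exercised on its own. The sketch below mirrors the three branches shown in the diff but stops before the execlp() call; sketch_help_page is a hypothetical helper that does not appear in the commit, and the command name is passed in so the same routine covers the "zfs" and "zpool" prefixes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the manual page name for "<cmd> help [topic]".
 * cmd is the command prefix ("zfs" or "zpool"); topic may be NULL when
 * no topic was given on the command line.
 */
static void
sketch_help_page(const char *cmd, const char *topic, char *page, size_t len)
{
	if (topic == NULL || strcmp(topic, cmd) == 0) {
		/* No topic, or the command itself: the top-level page. */
		(void) snprintf(page, len, "%s", cmd);
	} else if (strcmp(topic, "concepts") == 0 ||
	    strcmp(topic, "props") == 0) {
		/* Overview pages have no dash: zfsconcepts, zfsprops. */
		(void) snprintf(page, len, "%s%s", cmd, topic);
	} else {
		/* Subcommand pages use a dash: zfs-send, zpool-import. */
		(void) snprintf(page, len, "%s-%s", cmd, topic);
	}
}
```

With this mapping, `zfs help props` would resolve to the page name `zfsprops`, and `zpool help import` to `zpool-import`.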


@ -126,6 +126,8 @@ static int zpool_do_version(int, char **);
static int zpool_do_wait(int, char **); static int zpool_do_wait(int, char **);
static int zpool_do_help(int argc, char **argv);
static zpool_compat_status_t zpool_do_load_compat( static zpool_compat_status_t zpool_do_load_compat(
const char *, boolean_t *); const char *, boolean_t *);
@ -538,6 +540,10 @@ usage(boolean_t requested)
(void) fprintf(fp, "%s", (void) fprintf(fp, "%s",
get_usage(command_table[i].usage)); get_usage(command_table[i].usage));
} }
(void) fprintf(fp,
gettext("\nFor further help on a command or topic, "
"run: %s\n"), "zpool help [<topic>]");
} else { } else {
(void) fprintf(fp, gettext("usage:\n")); (void) fprintf(fp, gettext("usage:\n"));
(void) fprintf(fp, "%s", get_usage(current_command->usage)); (void) fprintf(fp, "%s", get_usage(current_command->usage));
@ -3116,12 +3122,21 @@ zfs_force_import_required(nvlist_t *config)
nvlist_t *nvinfo; nvlist_t *nvinfo;
state = fnvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_STATE); state = fnvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_STATE);
(void) nvlist_lookup_uint64(config, ZPOOL_CONFIG_HOSTID, &hostid); nvinfo = fnvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO);
/*
* The hostid on LOAD_INFO comes from the MOS label via
* spa_tryimport(). If it's not there then we're likely talking to an
* older kernel, so use the top one, which will be from the label
* discovered in zpool_find_import(), or if a cachefile is in use, the
* local hostid.
*/
if (nvlist_lookup_uint64(nvinfo, ZPOOL_CONFIG_HOSTID, &hostid) != 0)
nvlist_lookup_uint64(config, ZPOOL_CONFIG_HOSTID, &hostid);
if (state != POOL_STATE_EXPORTED && hostid != get_system_hostid()) if (state != POOL_STATE_EXPORTED && hostid != get_system_hostid())
return (B_TRUE); return (B_TRUE);
nvinfo = fnvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO);
if (nvlist_exists(nvinfo, ZPOOL_CONFIG_MMP_STATE)) { if (nvlist_exists(nvinfo, ZPOOL_CONFIG_MMP_STATE)) {
mmp_state_t mmp_state = fnvlist_lookup_uint64(nvinfo, mmp_state_t mmp_state = fnvlist_lookup_uint64(nvinfo,
ZPOOL_CONFIG_MMP_STATE); ZPOOL_CONFIG_MMP_STATE);
@ -3143,6 +3158,7 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
nvlist_t *props, int flags) nvlist_t *props, int flags)
{ {
int ret = 0; int ret = 0;
int ms_status = 0;
zpool_handle_t *zhp; zpool_handle_t *zhp;
const char *name; const char *name;
uint64_t version; uint64_t version;
@ -3191,7 +3207,10 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
time_t timestamp = 0; time_t timestamp = 0;
uint64_t hostid = 0; uint64_t hostid = 0;
if (nvlist_exists(config, ZPOOL_CONFIG_HOSTNAME)) if (nvlist_exists(nvinfo, ZPOOL_CONFIG_HOSTNAME))
hostname = fnvlist_lookup_string(nvinfo,
ZPOOL_CONFIG_HOSTNAME);
else if (nvlist_exists(config, ZPOOL_CONFIG_HOSTNAME))
hostname = fnvlist_lookup_string(config, hostname = fnvlist_lookup_string(config,
ZPOOL_CONFIG_HOSTNAME); ZPOOL_CONFIG_HOSTNAME);
@ -3199,7 +3218,10 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
timestamp = fnvlist_lookup_uint64(config, timestamp = fnvlist_lookup_uint64(config,
ZPOOL_CONFIG_TIMESTAMP); ZPOOL_CONFIG_TIMESTAMP);
if (nvlist_exists(config, ZPOOL_CONFIG_HOSTID)) if (nvlist_exists(nvinfo, ZPOOL_CONFIG_HOSTID))
hostid = fnvlist_lookup_uint64(nvinfo,
ZPOOL_CONFIG_HOSTID);
else if (nvlist_exists(config, ZPOOL_CONFIG_HOSTID))
hostid = fnvlist_lookup_uint64(config, hostid = fnvlist_lookup_uint64(config,
ZPOOL_CONFIG_HOSTID); ZPOOL_CONFIG_HOSTID);
@ -3232,10 +3254,15 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
ret = 1; ret = 1;
if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL && if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL &&
!(flags & ZFS_IMPORT_ONLY) && !(flags & ZFS_IMPORT_ONLY)) {
zpool_enable_datasets(zhp, mntopts, 0) != 0) { ms_status = zpool_enable_datasets(zhp, mntopts, 0);
zpool_close(zhp); if (ms_status == EZFS_SHAREFAILED) {
return (1); (void) fprintf(stderr, gettext("Import was "
"successful, but unable to share some datasets"));
} else if (ms_status == EZFS_MOUNTFAILED) {
(void) fprintf(stderr, gettext("Import was "
"successful, but unable to mount some datasets"));
}
} }
zpool_close(zhp); zpool_close(zhp);
@ -6755,6 +6782,7 @@ zpool_do_split(int argc, char **argv)
char *mntopts = NULL; char *mntopts = NULL;
splitflags_t flags; splitflags_t flags;
int c, ret = 0; int c, ret = 0;
int ms_status = 0;
boolean_t loadkeys = B_FALSE; boolean_t loadkeys = B_FALSE;
zpool_handle_t *zhp; zpool_handle_t *zhp;
nvlist_t *config, *props = NULL; nvlist_t *config, *props = NULL;
@ -6891,14 +6919,19 @@ zpool_do_split(int argc, char **argv)
ret = 1; ret = 1;
} }
if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL && if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL) {
zpool_enable_datasets(zhp, mntopts, 0) != 0) { ms_status = zpool_enable_datasets(zhp, mntopts, 0);
ret = 1; if (ms_status == EZFS_SHAREFAILED) {
(void) fprintf(stderr, gettext("Split was successful, but " (void) fprintf(stderr, gettext("Split was successful, "
"the datasets could not all be mounted\n")); "datasets are mounted but sharing of some datasets "
"has failed\n"));
} else if (ms_status == EZFS_MOUNTFAILED) {
(void) fprintf(stderr, gettext("Split was successful"
", but some datasets could not be mounted\n"));
(void) fprintf(stderr, gettext("Try doing '%s' with a " (void) fprintf(stderr, gettext("Try doing '%s' with a "
"different altroot\n"), "zpool import"); "different altroot\n"), "zpool import");
} }
}
zpool_close(zhp); zpool_close(zhp);
nvlist_free(config); nvlist_free(config);
nvlist_free(props); nvlist_free(props);
@ -11039,6 +11072,25 @@ zpool_do_version(int argc, char **argv)
return (zfs_version_print() != 0); return (zfs_version_print() != 0);
} }
/* Display documentation */
static int
zpool_do_help(int argc, char **argv)
{
char page[MAXNAMELEN];
if (argc < 3 || strcmp(argv[2], "zpool") == 0)
strcpy(page, "zpool");
else if (strcmp(argv[2], "concepts") == 0 ||
strcmp(argv[2], "props") == 0)
snprintf(page, sizeof (page), "zpool%s", argv[2]);
else
snprintf(page, sizeof (page), "zpool-%s", argv[2]);
execlp("man", "man", page, NULL);
fprintf(stderr, "couldn't run man program: %s", strerror(errno));
return (-1);
}
/* /*
* Do zpool_load_compat() and print error message on failure * Do zpool_load_compat() and print error message on failure
*/ */
@ -11106,6 +11158,12 @@ main(int argc, char **argv)
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0)) if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zpool_do_version(argc, argv)); return (zpool_do_version(argc, argv));
/*
* Special case 'help'
*/
if (strcmp(cmdname, "help") == 0)
return (zpool_do_help(argc, argv));
if ((g_zfs = libzfs_init()) == NULL) { if ((g_zfs = libzfs_init()) == NULL) {
(void) fprintf(stderr, "%s\n", libzfs_error_init(errno)); (void) fprintf(stderr, "%s\n", libzfs_error_init(errno));
return (1); return (1);
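The hostid lookup order introduced in zfs_force_import_required() and do_import() prefers the value under LOAD_INFO and falls back to the top-level config. The sketch below restates only that precedence using a plain struct in place of nvlists, so it builds without libnvpair. Every name in it (sketch_cfg_t and the two functions) is hypothetical, the default of 0 for a missing hostid is an assumption, and the MMP state check from the real function is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the two places a hostid may appear. */
typedef struct sketch_cfg {
	bool		has_load_info_hostid;	/* set by newer kernels */
	uint64_t	load_info_hostid;
	bool		has_top_hostid;		/* label or cachefile value */
	uint64_t	top_hostid;
	bool		exported;		/* pool state is EXPORTED */
} sketch_cfg_t;

/* Prefer the LOAD_INFO hostid; fall back to the top-level one. */
static uint64_t
sketch_effective_hostid(const sketch_cfg_t *c)
{
	if (c->has_load_info_hostid)
		return (c->load_info_hostid);
	if (c->has_top_hostid)
		return (c->top_hostid);
	return (0);	/* assumed default when neither is present */
}

/* Force is needed when the pool was not exported by this host. */
static bool
sketch_force_required(const sketch_cfg_t *c, uint64_t system_hostid)
{
	return (!c->exported &&
	    sketch_effective_hostid(c) != system_hostid);
}
```

In this model, a config whose top-level hostid matches the local system but whose LOAD_INFO hostid does not would still require a forced import, which is the cachefile case the code comment describes.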


@ -2412,7 +2412,6 @@ ztest_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
int error; int error;
ASSERT3P(lwb, !=, NULL); ASSERT3P(lwb, !=, NULL);
ASSERT3P(zio, !=, NULL);
ASSERT3U(size, !=, 0); ASSERT3U(size, !=, 0);
ztest_object_lock(zd, object, RL_READER); ztest_object_lock(zd, object, RL_READER);
@ -2446,6 +2445,7 @@ ztest_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
DMU_READ_NO_PREFETCH); DMU_READ_NO_PREFETCH);
ASSERT0(error); ASSERT0(error);
} else { } else {
ASSERT3P(zio, !=, NULL);
size = doi.doi_data_block_size; size = doi.doi_data_block_size;
if (ISP2(size)) { if (ISP2(size)) {
offset = P2ALIGN(offset, size); offset = P2ALIGN(offset, size);
@ -2457,8 +2457,7 @@ ztest_get_data(void *arg, uint64_t arg2, lr_write_t *lr, char *buf,
zgd->zgd_lr = (struct zfs_locked_range *)ztest_range_lock(zd, zgd->zgd_lr = (struct zfs_locked_range *)ztest_range_lock(zd,
object, offset, size, RL_READER); object, offset, size, RL_READER);
error = dmu_buf_hold(os, object, offset, zgd, &db, error = dmu_buf_hold_noread(os, object, offset, zgd, &db);
DMU_READ_NO_PREFETCH);
if (error == 0) { if (error == 0) {
blkptr_t *bp = &lr->lr_blkptr; blkptr_t *bp = &lr->lr_blkptr;
@ -3767,7 +3766,7 @@ ztest_vdev_attach_detach(ztest_ds_t *zd, uint64_t id)
else if (ashift > oldvd->vdev_top->vdev_ashift) else if (ashift > oldvd->vdev_top->vdev_ashift)
expected_error = EDOM; expected_error = EDOM;
else if (newvd_is_dspare && pvd != vdev_draid_spare_get_parent(newvd)) else if (newvd_is_dspare && pvd != vdev_draid_spare_get_parent(newvd))
expected_error = ENOTSUP; expected_error = EINVAL;
else else
expected_error = 0; expected_error = 0;
@ -6379,6 +6378,7 @@ ztest_reguid(ztest_ds_t *zd, uint64_t id)
spa_t *spa = ztest_spa; spa_t *spa = ztest_spa;
uint64_t orig, load; uint64_t orig, load;
int error; int error;
ztest_shared_t *zs = ztest_shared;
if (ztest_opts.zo_mmp_test) if (ztest_opts.zo_mmp_test)
return; return;
@ -6388,6 +6388,7 @@ ztest_reguid(ztest_ds_t *zd, uint64_t id)
(void) pthread_rwlock_wrlock(&ztest_name_lock); (void) pthread_rwlock_wrlock(&ztest_name_lock);
error = spa_change_guid(spa); error = spa_change_guid(spa);
zs->zs_guid = spa_guid(spa);
(void) pthread_rwlock_unlock(&ztest_name_lock); (void) pthread_rwlock_unlock(&ztest_name_lock);
if (error != 0) if (error != 0)
@ -6917,7 +6918,7 @@ ztest_trim(ztest_ds_t *zd, uint64_t id)
* Verify pool integrity by running zdb. * Verify pool integrity by running zdb.
*/ */
static void static void
ztest_run_zdb(const char *pool) ztest_run_zdb(uint64_t guid)
{ {
int status; int status;
char *bin; char *bin;
@ -6941,13 +6942,13 @@ ztest_run_zdb(const char *pool)
free(set_gvars_args); free(set_gvars_args);
size_t would = snprintf(zdb, len, size_t would = snprintf(zdb, len,
"%s -bcc%s%s -G -d -Y -e -y %s -p %s %s", "%s -bcc%s%s -G -d -Y -e -y %s -p %s %"PRIu64,
bin, bin,
ztest_opts.zo_verbose >= 3 ? "s" : "", ztest_opts.zo_verbose >= 3 ? "s" : "",
ztest_opts.zo_verbose >= 4 ? "v" : "", ztest_opts.zo_verbose >= 4 ? "v" : "",
set_gvars_args_joined, set_gvars_args_joined,
ztest_opts.zo_dir, ztest_opts.zo_dir,
pool); guid);
ASSERT3U(would, <, len); ASSERT3U(would, <, len);
umem_free(set_gvars_args_joined, strlen(set_gvars_args_joined) + 1); umem_free(set_gvars_args_joined, strlen(set_gvars_args_joined) + 1);
@ -7525,14 +7526,15 @@ ztest_import(ztest_shared_t *zs)
VERIFY0(spa_open(ztest_opts.zo_pool, &spa, FTAG)); VERIFY0(spa_open(ztest_opts.zo_pool, &spa, FTAG));
zs->zs_metaslab_sz = zs->zs_metaslab_sz =
1ULL << spa->spa_root_vdev->vdev_child[0]->vdev_ms_shift; 1ULL << spa->spa_root_vdev->vdev_child[0]->vdev_ms_shift;
zs->zs_guid = spa_guid(spa);
spa_close(spa, FTAG); spa_close(spa, FTAG);
kernel_fini(); kernel_fini();
if (!ztest_opts.zo_mmp_test) { if (!ztest_opts.zo_mmp_test) {
ztest_run_zdb(ztest_opts.zo_pool); ztest_run_zdb(zs->zs_guid);
ztest_freeze(); ztest_freeze();
ztest_run_zdb(ztest_opts.zo_pool); ztest_run_zdb(zs->zs_guid);
} }
(void) pthread_rwlock_destroy(&ztest_name_lock); (void) pthread_rwlock_destroy(&ztest_name_lock);
@ -7603,7 +7605,6 @@ ztest_run(ztest_shared_t *zs)
dsl_pool_config_enter(dmu_objset_pool(os), FTAG); dsl_pool_config_enter(dmu_objset_pool(os), FTAG);
dmu_objset_fast_stat(os, &dds); dmu_objset_fast_stat(os, &dds);
dsl_pool_config_exit(dmu_objset_pool(os), FTAG); dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
zs->zs_guid = dds.dds_guid;
dmu_objset_disown(os, B_TRUE, FTAG); dmu_objset_disown(os, B_TRUE, FTAG);
/* /*
@ -7874,14 +7875,15 @@ ztest_init(ztest_shared_t *zs)
VERIFY0(spa_open(ztest_opts.zo_pool, &spa, FTAG)); VERIFY0(spa_open(ztest_opts.zo_pool, &spa, FTAG));
zs->zs_metaslab_sz = zs->zs_metaslab_sz =
1ULL << spa->spa_root_vdev->vdev_child[0]->vdev_ms_shift; 1ULL << spa->spa_root_vdev->vdev_child[0]->vdev_ms_shift;
zs->zs_guid = spa_guid(spa);
spa_close(spa, FTAG); spa_close(spa, FTAG);
kernel_fini(); kernel_fini();
if (!ztest_opts.zo_mmp_test) { if (!ztest_opts.zo_mmp_test) {
ztest_run_zdb(ztest_opts.zo_pool); ztest_run_zdb(zs->zs_guid);
ztest_freeze(); ztest_freeze();
ztest_run_zdb(ztest_opts.zo_pool); ztest_run_zdb(zs->zs_guid);
} }
(void) pthread_rwlock_destroy(&ztest_name_lock); (void) pthread_rwlock_destroy(&ztest_name_lock);
@ -8304,7 +8306,7 @@ main(int argc, char **argv)
} }
if (!ztest_opts.zo_mmp_test) if (!ztest_opts.zo_mmp_test)
ztest_run_zdb(ztest_opts.zo_pool); ztest_run_zdb(zs->zs_guid);
} }
if (ztest_opts.zo_verbose >= 1) { if (ztest_opts.zo_verbose >= 1) {
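ztest_run_zdb() now formats the pool GUID with PRIu64 and asserts that snprintf() did not truncate the command line. A reduced sketch of that pattern is shown here. It is not the ztest code: sketch_format_zdb is a hypothetical helper, the option list is shortened to the -e and -p flags that appear in the diff, and a return value replaces the ASSERT3U() check.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "<bin> -e -p <dir> <guid>" into buf.
 * Returns the command length on success, or -1 if it would not fit
 * (snprintf reports the length it wanted to write, not what it wrote).
 */
static int
sketch_format_zdb(char *buf, size_t len, const char *bin, const char *dir,
    uint64_t guid)
{
	int would = snprintf(buf, len, "%s -e -p %s %" PRIu64,
	    bin, dir, guid);

	if (would < 0 || (size_t)would >= len)
		return (-1);	/* encoding error or truncated output */
	return (would);
}
```

Comparing the return value against the buffer size is what catches truncation, since snprintf() always NUL-terminates within len and otherwise gives no sign that output was cut short.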


@ -4,6 +4,7 @@
# Not following: a was not specified as input (see shellcheck -x). [SC1091] # Not following: a was not specified as input (see shellcheck -x). [SC1091]
# Prefer putting braces around variable references even when not strictly required. [SC2250] # Prefer putting braces around variable references even when not strictly required. [SC2250]
# Consider invoking this command separately to avoid masking its return value (or use '|| true' to ignore). [SC2312] # Consider invoking this command separately to avoid masking its return value (or use '|| true' to ignore). [SC2312]
# Command appears to be unreachable. Check usage (or ignore if invoked indirectly). [SC2317]
# In POSIX sh, 'local' is undefined. [SC2039] # older ShellCheck versions # In POSIX sh, 'local' is undefined. [SC2039] # older ShellCheck versions
# In POSIX sh, 'local' is undefined. [SC3043] # newer ShellCheck versions # In POSIX sh, 'local' is undefined. [SC3043] # newer ShellCheck versions
@ -18,7 +19,7 @@ PHONY += shellcheck
_STGT = $(subst ^,/,$(subst shellcheck-here-,,$@)) _STGT = $(subst ^,/,$(subst shellcheck-here-,,$@))
shellcheck-here-%: shellcheck-here-%:
if HAVE_SHELLCHECK if HAVE_SHELLCHECK
shellcheck --format=gcc --enable=all --exclude=SC1090,SC1091,SC2039,SC2250,SC2312,SC3043 $$([ -n "$(SHELLCHECK_SHELL)" ] && echo "--shell=$(SHELLCHECK_SHELL)") "$$([ -e "$(_STGT)" ] || echo "$(srcdir)/")$(_STGT)" shellcheck --format=gcc --enable=all --exclude=SC1090,SC1091,SC2039,SC2250,SC2312,SC2317,SC3043 $$([ -n "$(SHELLCHECK_SHELL)" ] && echo "--shell=$(SHELLCHECK_SHELL)") "$$([ -e "$(_STGT)" ] || echo "$(srcdir)/")$(_STGT)"
else else
@echo "skipping shellcheck of" $(_STGT) "because shellcheck is not installed" @echo "skipping shellcheck of" $(_STGT) "because shellcheck is not installed"
endif endif


@ -16,13 +16,64 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH], [
]) ])
]) ])
dnl #
dnl # 6.5.x API change,
dnl # blkdev_get_by_path() takes 4 args
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG], [
ZFS_LINUX_TEST_SRC([blkdev_get_by_path_4arg], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct block_device *bdev __attribute__ ((unused)) = NULL;
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
bdev = blkdev_get_by_path(path, mode, holder, &h);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_BY_PATH], [ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_BY_PATH], [
AC_MSG_CHECKING([whether blkdev_get_by_path() exists]) AC_MSG_CHECKING([whether blkdev_get_by_path() exists and takes 3 args])
ZFS_LINUX_TEST_RESULT([blkdev_get_by_path], [ ZFS_LINUX_TEST_RESULT([blkdev_get_by_path], [
AC_MSG_RESULT(yes) AC_MSG_RESULT(yes)
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether blkdev_get_by_path() exists and takes 4 args])
ZFS_LINUX_TEST_RESULT([blkdev_get_by_path_4arg], [
AC_DEFINE(HAVE_BLKDEV_GET_BY_PATH_4ARG, 1,
[blkdev_get_by_path() exists and takes 4 args])
AC_MSG_RESULT(yes)
], [ ], [
ZFS_LINUX_TEST_ERROR([blkdev_get_by_path()]) ZFS_LINUX_TEST_ERROR([blkdev_get_by_path()])
]) ])
])
])
dnl #
dnl # 6.5.x API change
dnl # blk_mode_t was added as a type to supersede some places where fmode_t
dnl # is used
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BLK_MODE_T], [
ZFS_LINUX_TEST_SRC([blk_mode_t], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
blk_mode_t m __attribute((unused)) = (blk_mode_t)0;
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BLK_MODE_T], [
AC_MSG_CHECKING([whether blk_mode_t is defined])
ZFS_LINUX_TEST_RESULT([blk_mode_t], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_MODE_T, 1, [blk_mode_t is defined])
], [
AC_MSG_RESULT(no)
])
]) ])
dnl # dnl #
@ -41,13 +92,36 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_PUT], [
]) ])
]) ])
dnl #
dnl # 6.5.x API change.
dnl # blkdev_put() takes (void* holder) as arg 2
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_PUT_HOLDER], [
ZFS_LINUX_TEST_SRC([blkdev_put_holder], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct block_device *bdev = NULL;
void *holder = NULL;
blkdev_put(bdev, holder);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_PUT], [ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_PUT], [
AC_MSG_CHECKING([whether blkdev_put() exists]) AC_MSG_CHECKING([whether blkdev_put() exists])
ZFS_LINUX_TEST_RESULT([blkdev_put], [ ZFS_LINUX_TEST_RESULT([blkdev_put], [
AC_MSG_RESULT(yes) AC_MSG_RESULT(yes)
], [
AC_MSG_CHECKING([whether blkdev_put() accepts void* as arg 2])
ZFS_LINUX_TEST_RESULT([blkdev_put_holder], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_PUT_HOLDER, 1,
[blkdev_put() accepts void* as arg 2])
], [ ], [
ZFS_LINUX_TEST_ERROR([blkdev_put()]) ZFS_LINUX_TEST_ERROR([blkdev_put()])
]) ])
])
]) ])
dnl # dnl #
@ -103,6 +177,33 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_CHECK_DISK_CHANGE], [
]) ])
]) ])
dnl #
dnl # 6.5.x API change
dnl # disk_check_media_change() was added
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_DISK_CHECK_MEDIA_CHANGE], [
ZFS_LINUX_TEST_SRC([disk_check_media_change], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct block_device *bdev = NULL;
bool error;
error = disk_check_media_change(bdev->bd_disk);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_DISK_CHECK_MEDIA_CHANGE], [
AC_MSG_CHECKING([whether disk_check_media_change() exists])
ZFS_LINUX_TEST_RESULT([disk_check_media_change], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DISK_CHECK_MEDIA_CHANGE, 1,
[disk_check_media_change() exists])
], [
AC_MSG_RESULT(no)
])
])
dnl # dnl #
dnl # bdev_kobj() is introduced from 5.12 dnl # bdev_kobj() is introduced from 5.12
dnl # dnl #
@ -443,9 +544,34 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_ERESTARTSYS], [
]) ])
]) ])
dnl #
dnl # 6.5.x API change
dnl # BLK_STS_NEXUS replaced with BLK_STS_RESV_CONFLICT
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BLK_STS_RESV_CONFLICT], [
ZFS_LINUX_TEST_SRC([blk_sts_resv_conflict], [
#include <linux/blkdev.h>
],[
blk_status_t s __attribute__ ((unused)) = BLK_STS_RESV_CONFLICT;
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BLK_STS_RESV_CONFLICT], [
AC_MSG_CHECKING([whether BLK_STS_RESV_CONFLICT is defined])
ZFS_LINUX_TEST_RESULT([blk_sts_resv_conflict], [
AC_DEFINE(HAVE_BLK_STS_RESV_CONFLICT, 1, [BLK_STS_RESV_CONFLICT is defined])
AC_MSG_RESULT(yes)
], [
AC_MSG_RESULT(no)
])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH
ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG
ZFS_AC_KERNEL_SRC_BLKDEV_PUT ZFS_AC_KERNEL_SRC_BLKDEV_PUT
ZFS_AC_KERNEL_SRC_BLKDEV_PUT_HOLDER
ZFS_AC_KERNEL_SRC_BLKDEV_REREAD_PART ZFS_AC_KERNEL_SRC_BLKDEV_REREAD_PART
ZFS_AC_KERNEL_SRC_BLKDEV_INVALIDATE_BDEV ZFS_AC_KERNEL_SRC_BLKDEV_INVALIDATE_BDEV
ZFS_AC_KERNEL_SRC_BLKDEV_LOOKUP_BDEV ZFS_AC_KERNEL_SRC_BLKDEV_LOOKUP_BDEV
@ -458,6 +584,9 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_KOBJ ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_KOBJ
ZFS_AC_KERNEL_SRC_BLKDEV_PART_TO_DEV ZFS_AC_KERNEL_SRC_BLKDEV_PART_TO_DEV
ZFS_AC_KERNEL_SRC_BLKDEV_DISK_CHECK_MEDIA_CHANGE
ZFS_AC_KERNEL_SRC_BLKDEV_BLK_STS_RESV_CONFLICT
ZFS_AC_KERNEL_SRC_BLKDEV_BLK_MODE_T
]) ])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [
@ -476,4 +605,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [
ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE
ZFS_AC_KERNEL_BLKDEV_BDEV_KOBJ ZFS_AC_KERNEL_BLKDEV_BDEV_KOBJ
ZFS_AC_KERNEL_BLKDEV_PART_TO_DEV ZFS_AC_KERNEL_BLKDEV_PART_TO_DEV
ZFS_AC_KERNEL_BLKDEV_DISK_CHECK_MEDIA_CHANGE
ZFS_AC_KERNEL_BLKDEV_BLK_STS_RESV_CONFLICT
ZFS_AC_KERNEL_BLKDEV_BLK_MODE_T
]) ])


@ -49,13 +49,43 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID], [
], [], []) ], [], [])
]) ])
dnl #
dnl # 5.9.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG], [
ZFS_LINUX_TEST_SRC([block_device_operations_release_void_1arg], [
#include <linux/blkdev.h>
void blk_release(struct gendisk *g) {
(void) g;
return;
}
static const struct block_device_operations
bops __attribute__ ((unused)) = {
.open = NULL,
.release = blk_release,
.ioctl = NULL,
.compat_ioctl = NULL,
};
], [], [])
])
AC_DEFUN([ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID], [ AC_DEFUN([ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID], [
AC_MSG_CHECKING([whether bops->release() is void]) AC_MSG_CHECKING([whether bops->release() is void and takes 2 args])
ZFS_LINUX_TEST_RESULT([block_device_operations_release_void], [ ZFS_LINUX_TEST_RESULT([block_device_operations_release_void], [
AC_MSG_RESULT(yes) AC_MSG_RESULT(yes)
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bops->release() is void and takes 1 arg])
ZFS_LINUX_TEST_RESULT([block_device_operations_release_void_1arg], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG], [1],
[Define if release() in block_device_operations takes 1 arg])
],[ ],[
ZFS_LINUX_TEST_ERROR([bops->release()]) ZFS_LINUX_TEST_ERROR([bops->release()])
]) ])
])
]) ])
dnl # dnl #
@ -92,6 +122,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS], [ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS], [
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_CHECK_EVENTS ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_CHECK_EVENTS
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK
]) ])


@ -0,0 +1,25 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ], [
dnl #
dnl # Kernel 6.5 - generic_file_splice_read was removed in favor
dnl # of copy_splice_read for the .splice_read member of the
dnl # file_operations struct.
dnl #
ZFS_LINUX_TEST_SRC([has_copy_splice_read], [
#include <linux/fs.h>
struct file_operations fops __attribute__((unused)) = {
.splice_read = copy_splice_read,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_COPY_SPLICE_READ], [
AC_MSG_CHECKING([whether copy_splice_read() exists])
ZFS_LINUX_TEST_RESULT([has_copy_splice_read], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_COPY_SPLICE_READ, 1,
[copy_splice_read exists])
],[
AC_MSG_RESULT(no)
])
])


@ -0,0 +1,27 @@
dnl #
dnl # Linux 6.5 removes register_sysctl_table
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE], [
ZFS_LINUX_TEST_SRC([has_register_sysctl_table], [
#include <linux/sysctl.h>
static struct ctl_table dummy_table[] = {
{}
};
],[
struct ctl_table_header *h
__attribute((unused)) = register_sysctl_table(dummy_table);
])
])
AC_DEFUN([ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE], [
AC_MSG_CHECKING([whether register_sysctl_table exists])
ZFS_LINUX_TEST_RESULT([has_register_sysctl_table], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_REGISTER_SYSCTL_TABLE, 1,
[register_sysctl_table exists])
],[
AC_MSG_RESULT([no])
])
])


@ -0,0 +1,50 @@
dnl #
dnl # EL7 have backported copy_file_range and clone_file_range and
dnl # added them to an "extended" file_operations struct.
dnl #
dnl # We're testing for both functions in one here, because they will only
dnl # ever appear together and we don't want to match a similar method in
dnl # some future vendor kernel.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_FILE_OPERATIONS_EXTEND], [
ZFS_LINUX_TEST_SRC([vfs_file_operations_extend], [
#include <linux/fs.h>
static ssize_t test_copy_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
size_t len, unsigned int flags) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len; (void) flags;
return (0);
}
static int test_clone_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
u64 len) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len;
return (0);
}
static const struct file_operations_extend
fops __attribute__ ((unused)) = {
.kabi_fops = {},
.copy_file_range = test_copy_file_range,
.clone_file_range = test_clone_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_FILE_OPERATIONS_EXTEND], [
AC_MSG_CHECKING([whether file_operations_extend takes \
.copy_file_range() and .clone_file_range()])
ZFS_LINUX_TEST_RESULT([vfs_file_operations_extend], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_FILE_OPERATIONS_EXTEND, 1,
[file_operations_extend takes .copy_file_range()
and .clone_file_range()])
],[
AC_MSG_RESULT([no])
])
])


@ -0,0 +1,164 @@
dnl #
dnl # The *_file_range APIs have a long history:
dnl #
dnl # 2.6.29: BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctl introduced
dnl # 3.12: BTRFS_IOC_FILE_EXTENT_SAME ioctl introduced
dnl #
dnl # 4.5: copy_file_range() syscall introduced, added to VFS
dnl # 4.5: BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE renamed to FICLONE and
dnl # FICLONERANGE, added to VFS as clone_file_range()
dnl # 4.5: BTRFS_IOC_FILE_EXTENT_SAME renamed to FIDEDUPERANGE, added to VFS
dnl # as dedupe_file_range()
dnl #
dnl # 4.20: VFS clone_file_range() and dedupe_file_range() replaced by
dnl # remap_file_range()
dnl #
dnl # 5.3: VFS copy_file_range() expected to do its own fallback,
dnl # generic_copy_file_range() added to support it
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_copy_file_range], [
#include <linux/fs.h>
static ssize_t test_copy_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
size_t len, unsigned int flags) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len; (void) flags;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.copy_file_range = test_copy_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->copy_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_copy_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_COPY_FILE_RANGE, 1,
[fops->copy_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([generic_copy_file_range], [
#include <linux/fs.h>
], [
struct file *src_file __attribute__ ((unused)) = NULL;
loff_t src_off __attribute__ ((unused)) = 0;
struct file *dst_file __attribute__ ((unused)) = NULL;
loff_t dst_off __attribute__ ((unused)) = 0;
size_t len __attribute__ ((unused)) = 0;
unsigned int flags __attribute__ ((unused)) = 0;
generic_copy_file_range(src_file, src_off, dst_file, dst_off,
len, flags);
])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE], [
AC_MSG_CHECKING([whether generic_copy_file_range() is available])
ZFS_LINUX_TEST_RESULT_SYMBOL([generic_copy_file_range],
[generic_copy_file_range], [fs/read_write.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_GENERIC_COPY_FILE_RANGE, 1,
[generic_copy_file_range() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_clone_file_range], [
#include <linux/fs.h>
static int test_clone_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
u64 len) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.clone_file_range = test_clone_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->clone_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_clone_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_CLONE_FILE_RANGE, 1,
[fops->clone_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_dedupe_file_range], [
#include <linux/fs.h>
static int test_dedupe_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
u64 len) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.dedupe_file_range = test_dedupe_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->dedupe_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_dedupe_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_DEDUPE_FILE_RANGE, 1,
[fops->dedupe_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_remap_file_range], [
#include <linux/fs.h>
static loff_t test_remap_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
loff_t len, unsigned int flags) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len; (void) flags;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.remap_file_range = test_remap_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->remap_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_remap_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_REMAP_FILE_RANGE, 1,
[fops->remap_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
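
For context, a minimal user-space sketch of how the `HAVE_VFS_*` macros defined by the checks above might be consumed. It is an illustration, not the OpenZFS code: every `demo_` name is hypothetical, the macros are hard-coded where a real build would take them from the configure-generated header, and the preference order is an assumption drawn from the version history in the comment at the top of this file.

```c
#include <assert.h>

/*
 * Assumption: pretend configure ran against a 4.20+ kernel, so the
 * copy_file_range() and remap_file_range() checks both succeeded.
 */
#define	HAVE_VFS_COPY_FILE_RANGE	1
#define	HAVE_VFS_REMAP_FILE_RANGE	1

/* Hypothetical labels for the clone hook a build could wire up. */
enum demo_clone_hook {
	DEMO_HOOK_NONE,		/* no VFS clone hook detected */
	DEMO_HOOK_CLONE,	/* fops->clone_file_range(), 4.5 to 4.19 */
	DEMO_HOOK_REMAP		/* fops->remap_file_range(), 4.20 and later */
};

/* Pick the newest interface that the configure checks reported. */
static enum demo_clone_hook
demo_pick_clone_hook(void)
{
#if defined(HAVE_VFS_REMAP_FILE_RANGE)
	return (DEMO_HOOK_REMAP);
#elif defined(HAVE_VFS_CLONE_FILE_RANGE)
	return (DEMO_HOOK_CLONE);
#else
	return (DEMO_HOOK_NONE);
#endif
}

/* Report whether fops->copy_file_range() was detected at all. */
static int
demo_has_copy_hook(void)
{
#if defined(HAVE_VFS_COPY_FILE_RANGE)
	return (1);
#else
	return (0);
#endif
}
```

Each check yields its own macro, so code can test them independently; only the clone and remap hooks are treated as mutually exclusive in this sketch.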


@ -6,8 +6,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_IOV_ITER], [
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/uio.h> #include <linux/uio.h>
],[ ],[
int type __attribute__ ((unused)) = int type __attribute__ ((unused)) = ITER_KVEC;
ITER_IOVEC | ITER_KVEC | ITER_BVEC | ITER_PIPE;
]) ])
ZFS_LINUX_TEST_SRC([iov_iter_advance], [ ZFS_LINUX_TEST_SRC([iov_iter_advance], [
@ -93,6 +92,14 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_IOV_ITER], [
struct iov_iter iter = { 0 }; struct iov_iter iter = { 0 };
__attribute__((unused)) enum iter_type i = iov_iter_type(&iter); __attribute__((unused)) enum iter_type i = iov_iter_type(&iter);
]) ])
ZFS_LINUX_TEST_SRC([iter_iov], [
#include <linux/fs.h>
#include <linux/uio.h>
],[
struct iov_iter iter = { 0 };
__attribute__((unused)) const struct iovec *iov = iter_iov(&iter);
])
]) ])
AC_DEFUN([ZFS_AC_KERNEL_VFS_IOV_ITER], [ AC_DEFUN([ZFS_AC_KERNEL_VFS_IOV_ITER], [
@ -201,4 +208,19 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_IOV_ITER], [
AC_DEFINE(HAVE_VFS_IOV_ITER, 1, AC_DEFINE(HAVE_VFS_IOV_ITER, 1,
[All required iov_iter interfaces are available]) [All required iov_iter interfaces are available])
]) ])
dnl #
dnl # Kernel 6.5 introduces the iter_iov() function that returns the
dnl # __iov member of an iov_iter*. The iov member was renamed to this
dnl # __iov member, and is intended to be accessed via the helper
dnl # function now.
dnl #
AC_MSG_CHECKING([whether iter_iov() is available])
ZFS_LINUX_TEST_RESULT([iter_iov], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_ITER_IOV, 1,
[iter_iov() is available])
],[
AC_MSG_RESULT(no)
])
]) ])


@ -116,6 +116,12 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_VFS_RW_ITERATE ZFS_AC_KERNEL_SRC_VFS_RW_ITERATE
ZFS_AC_KERNEL_SRC_VFS_GENERIC_WRITE_CHECKS ZFS_AC_KERNEL_SRC_VFS_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_SRC_VFS_IOV_ITER ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_FILE_OPERATIONS_EXTEND
ZFS_AC_KERNEL_SRC_KMAP_ATOMIC_ARGS ZFS_AC_KERNEL_SRC_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_SRC_FOLLOW_DOWN_ONE ZFS_AC_KERNEL_SRC_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN
@ -154,6 +160,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_FILEMAP ZFS_AC_KERNEL_SRC_FILEMAP
ZFS_AC_KERNEL_SRC_WRITEPAGE_T ZFS_AC_KERNEL_SRC_WRITEPAGE_T
ZFS_AC_KERNEL_SRC_RECLAIMED ZFS_AC_KERNEL_SRC_RECLAIMED
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ
case "$host_cpu" in case "$host_cpu" in
powerpc*) powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@ -249,6 +257,12 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_VFS_RW_ITERATE ZFS_AC_KERNEL_VFS_RW_ITERATE
ZFS_AC_KERNEL_VFS_GENERIC_WRITE_CHECKS ZFS_AC_KERNEL_VFS_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_VFS_IOV_ITER ZFS_AC_KERNEL_VFS_IOV_ITER
ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE
ZFS_AC_KERNEL_VFS_FILE_OPERATIONS_EXTEND
ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_FOLLOW_DOWN_ONE ZFS_AC_KERNEL_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_MAKE_REQUEST_FN ZFS_AC_KERNEL_MAKE_REQUEST_FN
@ -287,6 +301,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_FILEMAP ZFS_AC_KERNEL_FILEMAP
ZFS_AC_KERNEL_WRITEPAGE_T ZFS_AC_KERNEL_WRITEPAGE_T
ZFS_AC_KERNEL_RECLAIMED ZFS_AC_KERNEL_RECLAIMED
ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_COPY_SPLICE_READ
case "$host_cpu" in case "$host_cpu" in
powerpc*) powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE ZFS_AC_KERNEL_CPU_HAS_FEATURE


@ -358,6 +358,9 @@ AC_DEFUN([ZFS_AC_RPM], [
AS_IF([test -n "$udevruledir" ], [ AS_IF([test -n "$udevruledir" ], [
RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' --define "_udevruledir $(udevruledir)"' RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' --define "_udevruledir $(udevruledir)"'
]) ])
AS_IF([test -n "$bashcompletiondir" ], [
RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' --define "_bashcompletiondir $(bashcompletiondir)"'
])
RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' $(DEFINE_SYSTEMD)' RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' $(DEFINE_SYSTEMD)'
RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' $(DEFINE_PYZFS)' RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' $(DEFINE_PYZFS)'
RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' $(DEFINE_PAM)' RPM_DEFINE_UTIL=${RPM_DEFINE_UTIL}' $(DEFINE_PAM)'
@ -617,6 +620,17 @@ AC_DEFUN([ZFS_AC_DEFAULT_PACKAGE], [
AC_MSG_RESULT([no]) AC_MSG_RESULT([no])
fi fi
AC_SUBST(RPM_DEFINE_INITRAMFS) AC_SUBST(RPM_DEFINE_INITRAMFS)
AC_MSG_CHECKING([default bash completion directory])
case "$VENDOR" in
ubuntu) bashcompletiondir=/usr/share/bash-completion/completions ;;
debian) bashcompletiondir=/usr/share/bash-completion/completions ;;
freebsd) bashcompletiondir=$sysconfdir/bash_completion.d;;
*) bashcompletiondir=/etc/bash_completion.d ;;
esac
AC_MSG_RESULT([$bashcompletiondir])
AC_SUBST(bashcompletiondir)
]) ])
dnl # dnl #


@ -1,5 +1,3 @@
bashcompletiondir = $(sysconfdir)/bash_completion.d
nodist_bashcompletion_DATA = %D%/zfs nodist_bashcompletion_DATA = %D%/zfs
SUBSTFILES += $(nodist_bashcompletion_DATA) SUBSTFILES += $(nodist_bashcompletion_DATA)


@ -1,3 +1,9 @@
openzfs-linux (2.2.0-0) unstable; urgency=low
* OpenZFS 2.2.0 is tagged.
-- Umer Saleem <usaleem@ixsystems.com> Tue, 25 Jul 2023 15:00:00 +0500
openzfs-linux (2.1.99-1) unstable; urgency=low openzfs-linux (2.1.99-1) unstable; urgency=low
* Integrate minimally modified Debian packaging from ZFS on Linux * Integrate minimally modified Debian packaging from ZFS on Linux


@ -4,7 +4,7 @@ Priority: optional
Maintainer: ZFS on Linux specific mailing list <zfs-discuss@list.zfsonlinux.org> Maintainer: ZFS on Linux specific mailing list <zfs-discuss@list.zfsonlinux.org>
Build-Depends: debhelper-compat (= 12), Build-Depends: debhelper-compat (= 12),
dh-python, dh-python,
dkms (>> 2.1.1.2-5), dh-sequence-dkms | dkms (>> 2.1.1.2-5),
libaio-dev, libaio-dev,
libblkid-dev, libblkid-dev,
libcurl4-openssl-dev, libcurl4-openssl-dev,


@ -1,10 +1,8 @@
sbin/zinject
sbin/ztest sbin/ztest
usr/bin/raidz_test usr/bin/raidz_test
usr/share/man/man1/raidz_test.1 usr/share/man/man1/raidz_test.1
usr/share/man/man1/test-runner.1 usr/share/man/man1/test-runner.1
usr/share/man/man1/ztest.1 usr/share/man/man1/ztest.1
usr/share/man/man8/zinject.8
usr/share/zfs/common.sh usr/share/zfs/common.sh
usr/share/zfs/runfiles/ usr/share/zfs/runfiles/
usr/share/zfs/test-runner usr/share/zfs/test-runner


@ -1,7 +1,6 @@
etc/default/zfs etc/default/zfs
etc/zfs/zfs-functions etc/zfs/zfs-functions
etc/zfs/zpool.d/ etc/zfs/zpool.d/
etc/bash_completion.d/zfs
lib/systemd/system-generators/ lib/systemd/system-generators/
lib/systemd/system-preset/ lib/systemd/system-preset/
lib/systemd/system/zfs-import-cache.service lib/systemd/system/zfs-import-cache.service
@ -27,6 +26,7 @@ sbin/zfs
sbin/zfs_ids_to_path sbin/zfs_ids_to_path
sbin/zgenhostid sbin/zgenhostid
sbin/zhack sbin/zhack
sbin/zinject
sbin/zpool sbin/zpool
sbin/zstream sbin/zstream
sbin/zstreamdump sbin/zstreamdump
@ -59,7 +59,6 @@ usr/share/man/man8/zfs-get.8
usr/share/man/man8/zfs-groupspace.8 usr/share/man/man8/zfs-groupspace.8
usr/share/man/man8/zfs-hold.8 usr/share/man/man8/zfs-hold.8
usr/share/man/man8/zfs-inherit.8 usr/share/man/man8/zfs-inherit.8
usr/share/man/man8/zfs-jail.8
usr/share/man/man8/zfs-list.8 usr/share/man/man8/zfs-list.8
usr/share/man/man8/zfs-load-key.8 usr/share/man/man8/zfs-load-key.8
usr/share/man/man8/zfs-mount-generator.8 usr/share/man/man8/zfs-mount-generator.8
@ -79,7 +78,6 @@ usr/share/man/man8/zfs-set.8
usr/share/man/man8/zfs-share.8 usr/share/man/man8/zfs-share.8
usr/share/man/man8/zfs-snapshot.8 usr/share/man/man8/zfs-snapshot.8
usr/share/man/man8/zfs-unallow.8 usr/share/man/man8/zfs-unallow.8
usr/share/man/man8/zfs-unjail.8
usr/share/man/man8/zfs-unload-key.8 usr/share/man/man8/zfs-unload-key.8
usr/share/man/man8/zfs-unmount.8 usr/share/man/man8/zfs-unmount.8
usr/share/man/man8/zfs-unzone.8 usr/share/man/man8/zfs-unzone.8
@ -92,6 +90,7 @@ usr/share/man/man8/zfs_ids_to_path.8
usr/share/man/man7/zfsconcepts.7 usr/share/man/man7/zfsconcepts.7
usr/share/man/man7/zfsprops.7 usr/share/man/man7/zfsprops.7
usr/share/man/man8/zgenhostid.8 usr/share/man/man8/zgenhostid.8
usr/share/man/man8/zinject.8
usr/share/man/man8/zpool-add.8 usr/share/man/man8/zpool-add.8
usr/share/man/man8/zpool-attach.8 usr/share/man/man8/zpool-attach.8
usr/share/man/man8/zpool-checkpoint.8 usr/share/man/man8/zpool-checkpoint.8


@ -71,10 +71,6 @@ override_dh_auto_install:
@# Install the utilities. @# Install the utilities.
$(MAKE) install DESTDIR='$(CURDIR)/debian/tmp' $(MAKE) install DESTDIR='$(CURDIR)/debian/tmp'
# Use upstream's bash completion
install -D -t '$(CURDIR)/debian/tmp/usr/share/bash-completion/completions/' \
'$(CURDIR)/contrib/bash_completion.d/zfs'
# Move from bin_dir to /usr/sbin # Move from bin_dir to /usr/sbin
# Remove suffix (.py) as per policy 10.4 - Scripts # Remove suffix (.py) as per policy 10.4 - Scripts
# https://www.debian.org/doc/debian-policy/ch-files.html#s-scripts # https://www.debian.org/doc/debian-policy/ch-files.html#s-scripts
@ -136,7 +132,6 @@ override_dh_auto_install:
chmod a-x '$(CURDIR)/debian/tmp/etc/zfs/zfs-functions' chmod a-x '$(CURDIR)/debian/tmp/etc/zfs/zfs-functions'
chmod a-x '$(CURDIR)/debian/tmp/etc/default/zfs' chmod a-x '$(CURDIR)/debian/tmp/etc/default/zfs'
chmod a-x '$(CURDIR)/debian/tmp/usr/share/bash-completion/completions/zfs'
override_dh_python3: override_dh_python3:
dh_python3 -p openzfs-python3-pyzfs dh_python3 -p openzfs-python3-pyzfs


@ -12,6 +12,7 @@ ExecStart=/bin/sh -c '
decode_root_args || exit 0; \ decode_root_args || exit 0; \
[ "$root" = "zfs:AUTO" ] && root="$(@sbindir@/zpool list -H -o bootfs | grep -m1 -vFx -)"; \ [ "$root" = "zfs:AUTO" ] && root="$(@sbindir@/zpool list -H -o bootfs | grep -m1 -vFx -)"; \
rootflags="$(getarg rootflags=)"; \ rootflags="$(getarg rootflags=)"; \
[ "$(@sbindir@/zfs get -H -o value mountpoint "$root")" = legacy ] || \
case ",$rootflags," in \ case ",$rootflags," in \
*,zfsutil,*) ;; \ *,zfsutil,*) ;; \
,,) rootflags=zfsutil ;; \ ,,) rootflags=zfsutil ;; \


@ -2,7 +2,7 @@
Description=Rollback bootfs just before it is mounted Description=Rollback bootfs just before it is mounted
Requisite=zfs-import.target Requisite=zfs-import.target
After=zfs-import.target dracut-pre-mount.service zfs-snapshot-bootfs.service After=zfs-import.target dracut-pre-mount.service zfs-snapshot-bootfs.service
Before=dracut-mount.service Before=dracut-mount.service sysroot.mount
DefaultDependencies=no DefaultDependencies=no
ConditionKernelCommandLine=bootfs.rollback ConditionKernelCommandLine=bootfs.rollback
ConditionEnvironment=BOOTFS ConditionEnvironment=BOOTFS


@ -156,6 +156,7 @@ typedef enum zfs_error {
EZFS_NOT_USER_NAMESPACE, /* a file is not a user namespace */ EZFS_NOT_USER_NAMESPACE, /* a file is not a user namespace */
EZFS_CKSUM, /* insufficient replicas */ EZFS_CKSUM, /* insufficient replicas */
EZFS_RESUME_EXISTS, /* Resume on existing dataset without force */ EZFS_RESUME_EXISTS, /* Resume on existing dataset without force */
EZFS_SHAREFAILED, /* filesystem share failed */
EZFS_UNKNOWN EZFS_UNKNOWN
} zfs_error_t; } zfs_error_t;
@ -522,6 +523,7 @@ _LIBZFS_H nvlist_t *zfs_valid_proplist(libzfs_handle_t *, zfs_type_t,
_LIBZFS_H const char *zfs_prop_to_name(zfs_prop_t); _LIBZFS_H const char *zfs_prop_to_name(zfs_prop_t);
_LIBZFS_H int zfs_prop_set(zfs_handle_t *, const char *, const char *); _LIBZFS_H int zfs_prop_set(zfs_handle_t *, const char *, const char *);
_LIBZFS_H int zfs_prop_set_list(zfs_handle_t *, nvlist_t *); _LIBZFS_H int zfs_prop_set_list(zfs_handle_t *, nvlist_t *);
_LIBZFS_H int zfs_prop_set_list_flags(zfs_handle_t *, nvlist_t *, int);
_LIBZFS_H int zfs_prop_get(zfs_handle_t *, zfs_prop_t, char *, size_t, _LIBZFS_H int zfs_prop_get(zfs_handle_t *, zfs_prop_t, char *, size_t,
zprop_source_t *, char *, size_t, boolean_t); zprop_source_t *, char *, size_t, boolean_t);
_LIBZFS_H int zfs_prop_get_recvd(zfs_handle_t *, const char *, char *, size_t, _LIBZFS_H int zfs_prop_get_recvd(zfs_handle_t *, const char *, char *, size_t,
@ -644,6 +646,13 @@ typedef struct zprop_get_cbdata {
vdev_cbdata_t cb_vdevs; vdev_cbdata_t cb_vdevs;
} zprop_get_cbdata_t; } zprop_get_cbdata_t;
#define ZFS_SET_NOMOUNT 1
typedef struct zprop_set_cbdata {
int cb_flags;
nvlist_t *cb_proplist;
} zprop_set_cbdata_t;
_LIBZFS_H void zprop_print_one_property(const char *, zprop_get_cbdata_t *, _LIBZFS_H void zprop_print_one_property(const char *, zprop_get_cbdata_t *,
const char *, const char *, zprop_source_t, const char *, const char *, const char *, zprop_source_t, const char *,
const char *); const char *);


@ -36,7 +36,11 @@ struct xucred;
typedef struct flock flock64_t; typedef struct flock flock64_t;
typedef struct vnode vnode_t; typedef struct vnode vnode_t;
typedef struct vattr vattr_t; typedef struct vattr vattr_t;
#if __FreeBSD_version < 1400093
typedef enum vtype vtype_t; typedef enum vtype vtype_t;
#else
#define vtype_t __enum_uint8(vtype)
#endif
#include <sys/types.h> #include <sys/types.h>
#include <sys/queue.h> #include <sys/queue.h>


@ -93,7 +93,6 @@ struct zfsvfs {
zfs_teardown_lock_t z_teardown_lock; zfs_teardown_lock_t z_teardown_lock;
zfs_teardown_inactive_lock_t z_teardown_inactive_lock; zfs_teardown_inactive_lock_t z_teardown_inactive_lock;
list_t z_all_znodes; /* all vnodes in the fs */ list_t z_all_znodes; /* all vnodes in the fs */
uint64_t z_nr_znodes; /* number of znodes in the fs */
kmutex_t z_znodes_lock; /* lock for z_all_znodes */ kmutex_t z_znodes_lock; /* lock for z_all_znodes */
struct zfsctl_root *z_ctldir; /* .zfs directory pointer */ struct zfsctl_root *z_ctldir; /* .zfs directory pointer */
boolean_t z_show_ctldir; /* expose .zfs in the root dir */ boolean_t z_show_ctldir; /* expose .zfs in the root dir */


@ -181,7 +181,11 @@ bi_status_to_errno(blk_status_t status)
return (ENOLINK); return (ENOLINK);
case BLK_STS_TARGET: case BLK_STS_TARGET:
return (EREMOTEIO); return (EREMOTEIO);
#ifdef HAVE_BLK_STS_RESV_CONFLICT
case BLK_STS_RESV_CONFLICT:
#else
case BLK_STS_NEXUS: case BLK_STS_NEXUS:
#endif
return (EBADE); return (EBADE);
case BLK_STS_MEDIUM: case BLK_STS_MEDIUM:
return (ENODATA); return (ENODATA);
@ -215,7 +219,11 @@ errno_to_bi_status(int error)
case EREMOTEIO: case EREMOTEIO:
return (BLK_STS_TARGET); return (BLK_STS_TARGET);
case EBADE: case EBADE:
#ifdef HAVE_BLK_STS_RESV_CONFLICT
return (BLK_STS_RESV_CONFLICT);
#else
return (BLK_STS_NEXUS); return (BLK_STS_NEXUS);
#endif
case ENODATA: case ENODATA:
return (BLK_STS_MEDIUM); return (BLK_STS_MEDIUM);
case EILSEQ: case EILSEQ:
@ -337,6 +345,9 @@ zfs_check_media_change(struct block_device *bdev)
return (0); return (0);
} }
#define vdev_bdev_reread_part(bdev) zfs_check_media_change(bdev) #define vdev_bdev_reread_part(bdev) zfs_check_media_change(bdev)
#elif defined(HAVE_DISK_CHECK_MEDIA_CHANGE)
#define vdev_bdev_reread_part(bdev) disk_check_media_change(bdev->bd_disk)
#define zfs_check_media_change(bdev) disk_check_media_change(bdev->bd_disk)
#else #else
/* /*
* This is encountered if check_disk_change() and bdev_check_media_change() * This is encountered if check_disk_change() and bdev_check_media_change()
@ -387,6 +398,12 @@ vdev_lookup_bdev(const char *path, dev_t *dev)
#endif #endif
} }
#if defined(HAVE_BLK_MODE_T)
#define blk_mode_is_open_write(flag) ((flag) & BLK_OPEN_WRITE)
#else
#define blk_mode_is_open_write(flag) ((flag) & FMODE_WRITE)
#endif
/* /*
* Kernels without bio_set_op_attrs use bi_rw for the bio flags. * Kernels without bio_set_op_attrs use bi_rw for the bio flags.
*/ */


@ -147,6 +147,15 @@
#error "Toolchain needs to support the XSAVE assembler instruction" #error "Toolchain needs to support the XSAVE assembler instruction"
#endif #endif
#ifndef XFEATURE_MASK_XTILE
/*
* For kernels where this doesn't exist yet, we still don't want to break
* by save/restoring this broken nonsense.
* See issue #14989 or Intel errata SPR4 for why
*/
#define XFEATURE_MASK_XTILE 0x60000
#endif
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/slab.h> #include <linux/slab.h>
@ -315,18 +324,18 @@ kfpu_begin(void)
uint8_t *state = zfs_kfpu_fpregs[smp_processor_id()]; uint8_t *state = zfs_kfpu_fpregs[smp_processor_id()];
#if defined(HAVE_XSAVES) #if defined(HAVE_XSAVES)
if (static_cpu_has(X86_FEATURE_XSAVES)) { if (static_cpu_has(X86_FEATURE_XSAVES)) {
kfpu_do_xsave("xsaves", state, ~0); kfpu_do_xsave("xsaves", state, ~XFEATURE_MASK_XTILE);
return; return;
} }
#endif #endif
#if defined(HAVE_XSAVEOPT) #if defined(HAVE_XSAVEOPT)
if (static_cpu_has(X86_FEATURE_XSAVEOPT)) { if (static_cpu_has(X86_FEATURE_XSAVEOPT)) {
kfpu_do_xsave("xsaveopt", state, ~0); kfpu_do_xsave("xsaveopt", state, ~XFEATURE_MASK_XTILE);
return; return;
} }
#endif #endif
if (static_cpu_has(X86_FEATURE_XSAVE)) { if (static_cpu_has(X86_FEATURE_XSAVE)) {
kfpu_do_xsave("xsave", state, ~0); kfpu_do_xsave("xsave", state, ~XFEATURE_MASK_XTILE);
} else if (static_cpu_has(X86_FEATURE_FXSR)) { } else if (static_cpu_has(X86_FEATURE_FXSR)) {
kfpu_save_fxsr(state); kfpu_save_fxsr(state);
} else { } else {
@ -376,12 +385,12 @@ kfpu_end(void)
uint8_t *state = zfs_kfpu_fpregs[smp_processor_id()]; uint8_t *state = zfs_kfpu_fpregs[smp_processor_id()];
#if defined(HAVE_XSAVES) #if defined(HAVE_XSAVES)
if (static_cpu_has(X86_FEATURE_XSAVES)) { if (static_cpu_has(X86_FEATURE_XSAVES)) {
kfpu_do_xrstor("xrstors", state, ~0); kfpu_do_xrstor("xrstors", state, ~XFEATURE_MASK_XTILE);
goto out; goto out;
} }
#endif #endif
if (static_cpu_has(X86_FEATURE_XSAVE)) { if (static_cpu_has(X86_FEATURE_XSAVE)) {
kfpu_do_xrstor("xrstor", state, ~0); kfpu_do_xrstor("xrstor", state, ~XFEATURE_MASK_XTILE);
} else if (static_cpu_has(X86_FEATURE_FXSR)) { } else if (static_cpu_has(X86_FEATURE_FXSR)) {
kfpu_restore_fxsr(state); kfpu_restore_fxsr(state);
} else { } else {
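
The mask change in this hunk can be checked with plain integer arithmetic. The sketch below is user-space only and uses hypothetical `demo_` names: it builds the "everything except XTILE" mask and splits it into the two 32-bit halves that the XSAVE family takes in EDX:EAX. The reading of `0x60000` as state components 17 and 18 (the AMX tile configuration and tile data) follows the kernel's xfeature numbering and should be treated as background, not as something this diff states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the fallback define in the hunk above. */
#define	DEMO_XFEATURE_MASK_XTILE	0x60000

/* Request every state component except the two XTILE bits. */
static uint64_t
demo_kfpu_mask(void)
{
	return (~(uint64_t)DEMO_XFEATURE_MASK_XTILE);
}

/* Split a 64-bit component mask into the low (EAX) and high (EDX) halves. */
static void
demo_split_mask(uint64_t mask, uint32_t *lo, uint32_t *hi)
{
	assert(lo != NULL && hi != NULL);
	*lo = (uint32_t)mask;
	*hi = (uint32_t)(mask >> 32);
}
```

With the old `~0` argument both halves were all ones; with the new mask only bits 17 and 18 of the low half are cleared, so every other component is still saved and restored.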


@ -198,6 +198,14 @@ extern uint64_t spl_kmem_cache_entry_size(kmem_cache_t *cache);
spl_kmem_cache_create(name, size, align, ctor, dtor, rclm, priv, vmp, fl) spl_kmem_cache_create(name, size, align, ctor, dtor, rclm, priv, vmp, fl)
#define kmem_cache_set_move(skc, move) spl_kmem_cache_set_move(skc, move) #define kmem_cache_set_move(skc, move) spl_kmem_cache_set_move(skc, move)
#define kmem_cache_destroy(skc) spl_kmem_cache_destroy(skc) #define kmem_cache_destroy(skc) spl_kmem_cache_destroy(skc)
/*
* This is necessary to be compatible with other kernel modules
* or in-tree filesystem that may define kmem_cache_alloc,
* like bcachefs does it now.
*/
#ifdef kmem_cache_alloc
#undef kmem_cache_alloc
#endif
#define kmem_cache_alloc(skc, flags) spl_kmem_cache_alloc(skc, flags) #define kmem_cache_alloc(skc, flags) spl_kmem_cache_alloc(skc, flags)
#define kmem_cache_free(skc, obj) spl_kmem_cache_free(skc, obj) #define kmem_cache_free(skc, obj) spl_kmem_cache_free(skc, obj)
#define kmem_cache_reap_now(skc) spl_kmem_cache_reap_now(skc) #define kmem_cache_reap_now(skc) spl_kmem_cache_reap_now(skc)
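
The undefine-then-redefine pattern added here can be tried in isolation. This is a minimal sketch with hypothetical `demo_` names standing in for the competing definitions; it does not use the SPL or kernel headers.

```c
#include <assert.h>

/* Hypothetical allocators standing in for the two competing definitions. */
static int
demo_other_alloc(int flags)
{
	return (flags + 100);
}

static int
demo_spl_alloc(int flags)
{
	return (flags + 200);
}

/* Pretend an unrelated header already claimed the name as a macro. */
#define	demo_cache_alloc(flags)	demo_other_alloc(flags)

/* Same pattern as the hunk: drop any earlier definition, then install ours. */
#ifdef demo_cache_alloc
#undef demo_cache_alloc
#endif
#define	demo_cache_alloc(flags)	demo_spl_alloc(flags)

/* From here on the name expands to the second definition. */
static int
demo_caller(void)
{
	return (demo_cache_alloc(1));
}
```

Without the `#undef`, redefining a macro with a different body typically draws a redefinition diagnostic from the compiler. The `#ifdef` guard is a readability choice: C permits `#undef` on a name that is not currently defined.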


@ -38,7 +38,7 @@ typedef unsigned long ulong_t;
typedef unsigned long long u_longlong_t; typedef unsigned long long u_longlong_t;
typedef long long longlong_t; typedef long long longlong_t;
typedef unsigned long intptr_t; typedef long intptr_t;
typedef unsigned long long rlim64_t; typedef unsigned long long rlim64_t;
typedef struct task_struct kthread_t; typedef struct task_struct kthread_t;
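
A small sketch of why the signedness of `intptr_t` matters; the C standard specifies `intptr_t` as a signed type, which the new typedef matches. The `demo_` typedefs and helpers are hypothetical and only model the two definitions shown in this hunk.

```c
#include <assert.h>

typedef long demo_intptr_t;			/* definition after this change */
typedef unsigned long demo_old_intptr_t;	/* definition before this change */

/* A negative value compares as expected when the type is signed. */
static int
demo_lt_one_signed(demo_intptr_t v)
{
	return (v < 1);
}

/* The same bit pattern is a very large number when the type is unsigned. */
static int
demo_lt_one_unsigned(demo_old_intptr_t v)
{
	return (v < 1);
}
```

Passing `-4` to the signed helper returns 1, while the same value converted to the old unsigned type returns 0.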


@ -173,4 +173,16 @@ zfs_uio_iov_iter_init(zfs_uio_t *uio, struct iov_iter *iter, offset_t offset,
} }
#endif #endif
#if defined(HAVE_ITER_IOV)
#define zfs_uio_iter_iov(iter) iter_iov((iter))
#else
#define zfs_uio_iter_iov(iter) (iter)->iov
#endif
#if defined(HAVE_IOV_ITER_TYPE)
#define zfs_uio_iov_iter_type(iter) iov_iter_type((iter))
#else
#define zfs_uio_iov_iter_type(iter) (iter)->type
#endif
#endif /* SPL_UIO_H */ #endif /* SPL_UIO_H */
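
To illustrate the accessor-macro pattern behind `zfs_uio_iter_iov()`, here is a user-space sketch. The struct, the `DEMO_HAVE_ITER_IOV` switch and all `demo_` names are hypothetical stand-ins for the kernel's `struct iov_iter`, `HAVE_ITER_IOV` and `iter_iov()`; nothing here is kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Assumption: remove this define to model a kernel older than 6.5. */
#define	DEMO_HAVE_ITER_IOV	1

/* Hypothetical stand-in for the kernel's struct iov_iter. */
struct demo_iov_iter {
#if defined(DEMO_HAVE_ITER_IOV)
	const struct iovec *hidden_iov;	/* models the renamed __iov member */
#else
	const struct iovec *iov;	/* models the old public member */
#endif
	size_t nr_segs;
};

#if defined(DEMO_HAVE_ITER_IOV)
/* Models the helper function that newer kernels provide. */
static inline const struct iovec *
demo_iter_iov(const struct demo_iov_iter *iter)
{
	return (iter->hidden_iov);
}
#define	demo_uio_iter_iov(iter)	demo_iter_iov((iter))
#else
#define	demo_uio_iter_iov(iter)	(iter)->iov
#endif

/* Callers use only the wrapper, so they compile against either layout. */
static size_t
demo_total_len(const struct demo_iov_iter *iter)
{
	size_t total = 0;

	for (size_t i = 0; i < iter->nr_segs; i++)
		total += demo_uio_iter_iov(iter)[i].iov_len;
	return (total);
}
```

Keeping the version check inside one macro means the rest of the code never names the struct member directly, which is the point of the wrappers added in this hunk.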


@ -51,7 +51,6 @@ DECLARE_EVENT_CLASS(zfs_arc_buf_hdr_class,
__array(uint64_t, hdr_dva_word, 2) __array(uint64_t, hdr_dva_word, 2)
__field(uint64_t, hdr_birth) __field(uint64_t, hdr_birth)
__field(uint32_t, hdr_flags) __field(uint32_t, hdr_flags)
__field(uint32_t, hdr_bufcnt)
__field(arc_buf_contents_t, hdr_type) __field(arc_buf_contents_t, hdr_type)
__field(uint16_t, hdr_psize) __field(uint16_t, hdr_psize)
__field(uint16_t, hdr_lsize) __field(uint16_t, hdr_lsize)
@ -70,7 +69,6 @@ DECLARE_EVENT_CLASS(zfs_arc_buf_hdr_class,
__entry->hdr_dva_word[1] = ab->b_dva.dva_word[1]; __entry->hdr_dva_word[1] = ab->b_dva.dva_word[1];
__entry->hdr_birth = ab->b_birth; __entry->hdr_birth = ab->b_birth;
__entry->hdr_flags = ab->b_flags; __entry->hdr_flags = ab->b_flags;
__entry->hdr_bufcnt = ab->b_l1hdr.b_bufcnt;
__entry->hdr_psize = ab->b_psize; __entry->hdr_psize = ab->b_psize;
__entry->hdr_lsize = ab->b_lsize; __entry->hdr_lsize = ab->b_lsize;
__entry->hdr_spa = ab->b_spa; __entry->hdr_spa = ab->b_spa;
@ -84,12 +82,12 @@ DECLARE_EVENT_CLASS(zfs_arc_buf_hdr_class,
__entry->hdr_refcount = ab->b_l1hdr.b_refcnt.rc_count; __entry->hdr_refcount = ab->b_l1hdr.b_refcnt.rc_count;
), ),
TP_printk("hdr { dva 0x%llx:0x%llx birth %llu " TP_printk("hdr { dva 0x%llx:0x%llx birth %llu "
"flags 0x%x bufcnt %u type %u psize %u lsize %u spa %llu " "flags 0x%x type %u psize %u lsize %u spa %llu "
"state_type %u access %lu mru_hits %u mru_ghost_hits %u " "state_type %u access %lu mru_hits %u mru_ghost_hits %u "
"mfu_hits %u mfu_ghost_hits %u l2_hits %u refcount %lli }", "mfu_hits %u mfu_ghost_hits %u l2_hits %u refcount %lli }",
__entry->hdr_dva_word[0], __entry->hdr_dva_word[1], __entry->hdr_dva_word[0], __entry->hdr_dva_word[1],
__entry->hdr_birth, __entry->hdr_flags, __entry->hdr_birth, __entry->hdr_flags,
__entry->hdr_bufcnt, __entry->hdr_type, __entry->hdr_psize, __entry->hdr_type, __entry->hdr_psize,
__entry->hdr_lsize, __entry->hdr_spa, __entry->hdr_state_type, __entry->hdr_lsize, __entry->hdr_spa, __entry->hdr_state_type,
__entry->hdr_access, __entry->hdr_mru_hits, __entry->hdr_access, __entry->hdr_mru_hits,
__entry->hdr_mru_ghost_hits, __entry->hdr_mfu_hits, __entry->hdr_mru_ghost_hits, __entry->hdr_mfu_hits,
@ -192,7 +190,6 @@ DECLARE_EVENT_CLASS(zfs_arc_miss_class,
__array(uint64_t, hdr_dva_word, 2) __array(uint64_t, hdr_dva_word, 2)
__field(uint64_t, hdr_birth) __field(uint64_t, hdr_birth)
__field(uint32_t, hdr_flags) __field(uint32_t, hdr_flags)
__field(uint32_t, hdr_bufcnt)
__field(arc_buf_contents_t, hdr_type) __field(arc_buf_contents_t, hdr_type)
__field(uint16_t, hdr_psize) __field(uint16_t, hdr_psize)
__field(uint16_t, hdr_lsize) __field(uint16_t, hdr_lsize)
@ -223,7 +220,6 @@ DECLARE_EVENT_CLASS(zfs_arc_miss_class,
__entry->hdr_dva_word[1] = hdr->b_dva.dva_word[1]; __entry->hdr_dva_word[1] = hdr->b_dva.dva_word[1];
__entry->hdr_birth = hdr->b_birth; __entry->hdr_birth = hdr->b_birth;
__entry->hdr_flags = hdr->b_flags; __entry->hdr_flags = hdr->b_flags;
__entry->hdr_bufcnt = hdr->b_l1hdr.b_bufcnt;
__entry->hdr_psize = hdr->b_psize; __entry->hdr_psize = hdr->b_psize;
__entry->hdr_lsize = hdr->b_lsize; __entry->hdr_lsize = hdr->b_lsize;
__entry->hdr_spa = hdr->b_spa; __entry->hdr_spa = hdr->b_spa;
@ -255,7 +251,7 @@ DECLARE_EVENT_CLASS(zfs_arc_miss_class,
__entry->zb_blkid = zb->zb_blkid; __entry->zb_blkid = zb->zb_blkid;
), ),
TP_printk("hdr { dva 0x%llx:0x%llx birth %llu " TP_printk("hdr { dva 0x%llx:0x%llx birth %llu "
"flags 0x%x bufcnt %u psize %u lsize %u spa %llu state_type %u " "flags 0x%x psize %u lsize %u spa %llu state_type %u "
"access %lu mru_hits %u mru_ghost_hits %u mfu_hits %u " "access %lu mru_hits %u mru_ghost_hits %u mfu_hits %u "
"mfu_ghost_hits %u l2_hits %u refcount %lli } " "mfu_ghost_hits %u l2_hits %u refcount %lli } "
"bp { dva0 0x%llx:0x%llx dva1 0x%llx:0x%llx dva2 " "bp { dva0 0x%llx:0x%llx dva1 0x%llx:0x%llx dva2 "
@ -264,7 +260,7 @@ DECLARE_EVENT_CLASS(zfs_arc_miss_class,
"blkid %llu }", "blkid %llu }",
__entry->hdr_dva_word[0], __entry->hdr_dva_word[1], __entry->hdr_dva_word[0], __entry->hdr_dva_word[1],
__entry->hdr_birth, __entry->hdr_flags, __entry->hdr_birth, __entry->hdr_flags,
__entry->hdr_bufcnt, __entry->hdr_psize, __entry->hdr_lsize, __entry->hdr_psize, __entry->hdr_lsize,
__entry->hdr_spa, __entry->hdr_state_type, __entry->hdr_access, __entry->hdr_spa, __entry->hdr_state_type, __entry->hdr_access,
__entry->hdr_mru_hits, __entry->hdr_mru_ghost_hits, __entry->hdr_mru_hits, __entry->hdr_mru_ghost_hits,
__entry->hdr_mfu_hits, __entry->hdr_mfu_ghost_hits, __entry->hdr_mfu_hits, __entry->hdr_mfu_ghost_hits,


@ -60,8 +60,12 @@
#define DBUF_TP_FAST_ASSIGN \ #define DBUF_TP_FAST_ASSIGN \
if (db != NULL) { \ if (db != NULL) { \
if (POINTER_IS_VALID(DB_DNODE(db)->dn_objset)) { \
__assign_str(os_spa, \ __assign_str(os_spa, \
spa_name(DB_DNODE(db)->dn_objset->os_spa)); \ spa_name(DB_DNODE(db)->dn_objset->os_spa)); \
} else { \
__assign_str(os_spa, "NULL"); \
} \
\ \
__entry->ds_object = db->db_objset->os_dsl_dataset ? \ __entry->ds_object = db->db_objset->os_dsl_dataset ? \
db->db_objset->os_dsl_dataset->ds_object : 0; \ db->db_objset->os_dsl_dataset->ds_object : 0; \


@ -105,7 +105,6 @@ struct zfsvfs {
rrmlock_t z_teardown_lock; rrmlock_t z_teardown_lock;
krwlock_t z_teardown_inactive_lock; krwlock_t z_teardown_inactive_lock;
list_t z_all_znodes; /* all znodes in the fs */ list_t z_all_znodes; /* all znodes in the fs */
uint64_t z_nr_znodes; /* number of znodes in the fs */
unsigned long z_rollback_time; /* last online rollback time */ unsigned long z_rollback_time; /* last online rollback time */
unsigned long z_snap_defer_time; /* last snapshot unmount deferral */ unsigned long z_snap_defer_time; /* last snapshot unmount deferral */
kmutex_t z_znodes_lock; /* lock for z_all_znodes */ kmutex_t z_znodes_lock; /* lock for z_all_znodes */


@ -52,7 +52,11 @@ extern const struct inode_operations zpl_special_inode_operations;
/* zpl_file.c */ /* zpl_file.c */
extern const struct address_space_operations zpl_address_space_operations; extern const struct address_space_operations zpl_address_space_operations;
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
extern const struct file_operations_extend zpl_file_operations;
#else
extern const struct file_operations zpl_file_operations; extern const struct file_operations zpl_file_operations;
#endif
extern const struct file_operations zpl_dir_file_operations; extern const struct file_operations zpl_dir_file_operations;
/* zpl_super.c */ /* zpl_super.c */
@ -180,6 +184,55 @@ zpl_dir_emit_dots(struct file *file, zpl_dir_context_t *ctx)
} }
#endif /* HAVE_VFS_ITERATE */ #endif /* HAVE_VFS_ITERATE */
/* zpl_file_range.c */
/* handlers for file_operations of the same name */
extern ssize_t zpl_copy_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, size_t len, unsigned int flags);
extern loff_t zpl_remap_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, loff_t len, unsigned int flags);
extern int zpl_clone_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len);
extern int zpl_dedupe_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len);
/* compat for FICLONE/FICLONERANGE/FIDEDUPERANGE ioctls */
typedef struct {
int64_t fcr_src_fd;
uint64_t fcr_src_offset;
uint64_t fcr_src_length;
uint64_t fcr_dest_offset;
} zfs_ioc_compat_file_clone_range_t;
typedef struct {
int64_t fdri_dest_fd;
uint64_t fdri_dest_offset;
uint64_t fdri_bytes_deduped;
int32_t fdri_status;
uint32_t fdri_reserved;
} zfs_ioc_compat_dedupe_range_info_t;
typedef struct {
uint64_t fdr_src_offset;
uint64_t fdr_src_length;
uint16_t fdr_dest_count;
uint16_t fdr_reserved1;
uint32_t fdr_reserved2;
zfs_ioc_compat_dedupe_range_info_t fdr_info[];
} zfs_ioc_compat_dedupe_range_t;
#define ZFS_IOC_COMPAT_FICLONE _IOW(0x94, 9, int)
#define ZFS_IOC_COMPAT_FICLONERANGE \
_IOW(0x94, 13, zfs_ioc_compat_file_clone_range_t)
#define ZFS_IOC_COMPAT_FIDEDUPERANGE \
_IOWR(0x94, 54, zfs_ioc_compat_dedupe_range_t)
extern long zpl_ioctl_ficlone(struct file *filp, void *arg);
extern long zpl_ioctl_ficlonerange(struct file *filp, void *arg);
extern long zpl_ioctl_fideduperange(struct file *filp, void *arg);
#if defined(HAVE_INODE_TIMESTAMP_TRUNCATE) #if defined(HAVE_INODE_TIMESTAMP_TRUNCATE)
#define zpl_inode_timestamp_truncate(ts, ip) timestamp_truncate(ts, ip) #define zpl_inode_timestamp_truncate(ts, ip) timestamp_truncate(ts, ip)
#elif defined(HAVE_INODE_TIMESPEC64_TIMES) #elif defined(HAVE_INODE_TIMESPEC64_TIMES)
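
The compat ioctl definitions in this hunk depend on exact structure layout, because on Linux the `_IOW()` and `_IOWR()` macros fold the size of the argument type into the request number. The sketch below mirrors the three compat structs under hypothetical `demo_` names and computes their sizes; it is a layout check only and issues no ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the clone-range argument: four 64-bit fields. */
typedef struct {
	int64_t		src_fd;
	uint64_t	src_offset;
	uint64_t	src_length;
	uint64_t	dest_offset;
} demo_clone_range_t;

/* Mirrors one per-destination dedupe record. */
typedef struct {
	int64_t		dest_fd;
	uint64_t	dest_offset;
	uint64_t	bytes_deduped;
	int32_t		status;
	uint32_t	reserved;
} demo_dedupe_info_t;

/* Mirrors the dedupe header, followed by a flexible array of records. */
typedef struct {
	uint64_t	src_offset;
	uint64_t	src_length;
	uint16_t	dest_count;
	uint16_t	reserved1;
	uint32_t	reserved2;
	demo_dedupe_info_t info[];
} demo_dedupe_range_t;

/* Bytes a caller would allocate for a request with n destinations. */
static size_t
demo_dedupe_request_size(uint16_t n)
{
	return (sizeof (demo_dedupe_range_t) +
	    (size_t)n * sizeof (demo_dedupe_info_t));
}
```

Fixed-width fields with explicit reserved padding give the same layout on common 32-bit and 64-bit ABIs, which is presumably why the compat definitions avoid plain `int` and `long`.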


@ -159,10 +159,6 @@ struct arc_write_callback {
* these two allocation states. * these two allocation states.
*/ */
typedef struct l1arc_buf_hdr { typedef struct l1arc_buf_hdr {
/* for waiting on reads to complete */
kcondvar_t b_cv;
uint8_t b_byteswap;
/* protected by arc state mutex */ /* protected by arc state mutex */
arc_state_t *b_state; arc_state_t *b_state;
multilist_node_t b_arc_node; multilist_node_t b_arc_node;
@ -173,7 +169,7 @@ typedef struct l1arc_buf_hdr {
uint32_t b_mru_ghost_hits; uint32_t b_mru_ghost_hits;
uint32_t b_mfu_hits; uint32_t b_mfu_hits;
uint32_t b_mfu_ghost_hits; uint32_t b_mfu_ghost_hits;
uint32_t b_bufcnt; uint8_t b_byteswap;
arc_buf_t *b_buf; arc_buf_t *b_buf;
/* self protecting */ /* self protecting */
@ -436,12 +432,12 @@ typedef struct l2arc_dev {
*/ */
typedef struct arc_buf_hdr_crypt { typedef struct arc_buf_hdr_crypt {
abd_t *b_rabd; /* raw encrypted data */ abd_t *b_rabd; /* raw encrypted data */
dmu_object_type_t b_ot; /* object type */
uint32_t b_ebufcnt; /* count of encrypted buffers */
/* dsobj for looking up encryption key for l2arc encryption */ /* dsobj for looking up encryption key for l2arc encryption */
uint64_t b_dsobj; uint64_t b_dsobj;
dmu_object_type_t b_ot; /* object type */
/* encryption parameters */ /* encryption parameters */
uint8_t b_salt[ZIO_DATA_SALT_LEN]; uint8_t b_salt[ZIO_DATA_SALT_LEN];
uint8_t b_iv[ZIO_DATA_IV_LEN]; uint8_t b_iv[ZIO_DATA_IV_LEN];

@ -60,7 +60,7 @@ typedef struct bpobj {
kmutex_t bpo_lock; kmutex_t bpo_lock;
objset_t *bpo_os; objset_t *bpo_os;
uint64_t bpo_object; uint64_t bpo_object;
int bpo_epb; uint32_t bpo_epb;
uint8_t bpo_havecomp; uint8_t bpo_havecomp;
uint8_t bpo_havesubobj; uint8_t bpo_havesubobj;
uint8_t bpo_havefreed; uint8_t bpo_havefreed;

@ -36,6 +36,7 @@ extern "C" {
#endif #endif
extern boolean_t brt_entry_decref(spa_t *spa, const blkptr_t *bp); extern boolean_t brt_entry_decref(spa_t *spa, const blkptr_t *bp);
extern uint64_t brt_entry_get_refcount(spa_t *spa, const blkptr_t *bp);
extern uint64_t brt_get_dspace(spa_t *spa); extern uint64_t brt_get_dspace(spa_t *spa);
extern uint64_t brt_get_used(spa_t *spa); extern uint64_t brt_get_used(spa_t *spa);

@ -572,11 +572,15 @@ int dmu_buf_hold(objset_t *os, uint64_t object, uint64_t offset,
int dmu_buf_hold_array(objset_t *os, uint64_t object, uint64_t offset, int dmu_buf_hold_array(objset_t *os, uint64_t object, uint64_t offset,
uint64_t length, int read, const void *tag, int *numbufsp, uint64_t length, int read, const void *tag, int *numbufsp,
dmu_buf_t ***dbpp); dmu_buf_t ***dbpp);
int dmu_buf_hold_noread(objset_t *os, uint64_t object, uint64_t offset,
const void *tag, dmu_buf_t **dbp);
int dmu_buf_hold_by_dnode(dnode_t *dn, uint64_t offset, int dmu_buf_hold_by_dnode(dnode_t *dn, uint64_t offset,
const void *tag, dmu_buf_t **dbp, int flags); const void *tag, dmu_buf_t **dbp, int flags);
int dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, int dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset,
uint64_t length, boolean_t read, const void *tag, int *numbufsp, uint64_t length, boolean_t read, const void *tag, int *numbufsp,
dmu_buf_t ***dbpp, uint32_t flags); dmu_buf_t ***dbpp, uint32_t flags);
int dmu_buf_hold_noread_by_dnode(dnode_t *dn, uint64_t offset, const void *tag,
dmu_buf_t **dbp);
/* /*
* Add a reference to a dmu buffer that has already been held via * Add a reference to a dmu buffer that has already been held via
* dmu_buf_hold() in the current context. * dmu_buf_hold() in the current context.

@ -247,8 +247,6 @@ typedef struct dmu_sendstatus {
void dmu_object_zapify(objset_t *, uint64_t, dmu_object_type_t, dmu_tx_t *); void dmu_object_zapify(objset_t *, uint64_t, dmu_object_type_t, dmu_tx_t *);
void dmu_object_free_zapified(objset_t *, uint64_t, dmu_tx_t *); void dmu_object_free_zapified(objset_t *, uint64_t, dmu_tx_t *);
int dmu_buf_hold_noread(objset_t *, uint64_t, uint64_t,
const void *, dmu_buf_t **);
#ifdef __cplusplus #ifdef __cplusplus
} }

@ -36,8 +36,6 @@
extern "C" { extern "C" {
#endif #endif
extern uint64_t zfetch_array_rd_sz;
struct dnode; /* so we can reference dnode */ struct dnode; /* so we can reference dnode */
typedef struct zfetch { typedef struct zfetch {

@ -102,8 +102,6 @@ extern "C" {
#define FM_EREPORT_PAYLOAD_ZFS_ZIO_TIMESTAMP "zio_timestamp" #define FM_EREPORT_PAYLOAD_ZFS_ZIO_TIMESTAMP "zio_timestamp"
#define FM_EREPORT_PAYLOAD_ZFS_ZIO_DELTA "zio_delta" #define FM_EREPORT_PAYLOAD_ZFS_ZIO_DELTA "zio_delta"
#define FM_EREPORT_PAYLOAD_ZFS_PREV_STATE "prev_state" #define FM_EREPORT_PAYLOAD_ZFS_PREV_STATE "prev_state"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_EXPECTED "cksum_expected"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_ACTUAL "cksum_actual"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_ALGO "cksum_algorithm" #define FM_EREPORT_PAYLOAD_ZFS_CKSUM_ALGO "cksum_algorithm"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_BYTESWAP "cksum_byteswap" #define FM_EREPORT_PAYLOAD_ZFS_CKSUM_BYTESWAP "cksum_byteswap"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_OFFSET_RANGES "bad_ranges" #define FM_EREPORT_PAYLOAD_ZFS_BAD_OFFSET_RANGES "bad_ranges"
@ -112,8 +110,6 @@ extern "C" {
#define FM_EREPORT_PAYLOAD_ZFS_BAD_RANGE_CLEARS "bad_range_clears" #define FM_EREPORT_PAYLOAD_ZFS_BAD_RANGE_CLEARS "bad_range_clears"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_SET_BITS "bad_set_bits" #define FM_EREPORT_PAYLOAD_ZFS_BAD_SET_BITS "bad_set_bits"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_BITS "bad_cleared_bits" #define FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_BITS "bad_cleared_bits"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM "bad_set_histogram"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM "bad_cleared_histogram"
#define FM_EREPORT_PAYLOAD_ZFS_SNAPSHOT_NAME "snapshot_name" #define FM_EREPORT_PAYLOAD_ZFS_SNAPSHOT_NAME "snapshot_name"
#define FM_EREPORT_PAYLOAD_ZFS_DEVICE_NAME "device_name" #define FM_EREPORT_PAYLOAD_ZFS_DEVICE_NAME "device_name"
#define FM_EREPORT_PAYLOAD_ZFS_RAW_DEVICE_NAME "raw_name" #define FM_EREPORT_PAYLOAD_ZFS_RAW_DEVICE_NAME "raw_name"

@ -80,7 +80,6 @@ uint64_t metaslab_largest_allocatable(metaslab_t *);
#define METASLAB_ASYNC_ALLOC 0x8 #define METASLAB_ASYNC_ALLOC 0x8
#define METASLAB_DONT_THROTTLE 0x10 #define METASLAB_DONT_THROTTLE 0x10
#define METASLAB_MUST_RESERVE 0x20 #define METASLAB_MUST_RESERVE 0x20
#define METASLAB_FASTWRITE 0x40
#define METASLAB_ZIL 0x80 #define METASLAB_ZIL 0x80
int metaslab_alloc(spa_t *, metaslab_class_t *, uint64_t, int metaslab_alloc(spa_t *, metaslab_class_t *, uint64_t,
@ -96,8 +95,6 @@ void metaslab_unalloc_dva(spa_t *, const dva_t *, uint64_t);
int metaslab_claim(spa_t *, const blkptr_t *, uint64_t); int metaslab_claim(spa_t *, const blkptr_t *, uint64_t);
int metaslab_claim_impl(vdev_t *, uint64_t, uint64_t, uint64_t); int metaslab_claim_impl(vdev_t *, uint64_t, uint64_t, uint64_t);
void metaslab_check_free(spa_t *, const blkptr_t *); void metaslab_check_free(spa_t *, const blkptr_t *);
void metaslab_fastwrite_mark(spa_t *, const blkptr_t *);
void metaslab_fastwrite_unmark(spa_t *, const blkptr_t *);
void metaslab_stat_init(void); void metaslab_stat_init(void);
void metaslab_stat_fini(void); void metaslab_stat_fini(void);

@ -250,7 +250,6 @@ struct metaslab_group {
int64_t mg_activation_count; int64_t mg_activation_count;
metaslab_class_t *mg_class; metaslab_class_t *mg_class;
vdev_t *mg_vd; vdev_t *mg_vd;
taskq_t *mg_taskq;
metaslab_group_t *mg_prev; metaslab_group_t *mg_prev;
metaslab_group_t *mg_next; metaslab_group_t *mg_next;
@ -313,7 +312,7 @@ struct metaslab_group {
* Each metaslab maintains a set of in-core trees to track metaslab * Each metaslab maintains a set of in-core trees to track metaslab
* operations. The in-core free tree (ms_allocatable) contains the list of * operations. The in-core free tree (ms_allocatable) contains the list of
* free segments which are eligible for allocation. As blocks are * free segments which are eligible for allocation. As blocks are
* allocated, the allocated segment are removed from the ms_allocatable and * allocated, the allocated segments are removed from the ms_allocatable and
* added to a per txg allocation tree (ms_allocating). As blocks are * added to a per txg allocation tree (ms_allocating). As blocks are
* freed, they are added to the free tree (ms_freeing). These trees * freed, they are added to the free tree (ms_freeing). These trees
* allow us to process all allocations and frees in syncing context * allow us to process all allocations and frees in syncing context
@ -366,9 +365,9 @@ struct metaslab_group {
struct metaslab { struct metaslab {
/* /*
* This is the main lock of the metaslab and its purpose is to * This is the main lock of the metaslab and its purpose is to
* coordinate our allocations and frees [e.g metaslab_block_alloc(), * coordinate our allocations and frees [e.g., metaslab_block_alloc(),
* metaslab_free_concrete(), ..etc] with our various syncing * metaslab_free_concrete(), ..etc] with our various syncing
* procedures [e.g. metaslab_sync(), metaslab_sync_done(), ..etc]. * procedures [e.g., metaslab_sync(), metaslab_sync_done(), ..etc].
* *
* The lock is also used during some miscellaneous operations like * The lock is also used during some miscellaneous operations like
* using the metaslab's histogram for the metaslab group's histogram * using the metaslab's histogram for the metaslab group's histogram

@ -723,16 +723,10 @@ typedef enum spa_mode {
* Send TRIM commands in-line during normal pool operation while deleting. * Send TRIM commands in-line during normal pool operation while deleting.
* OFF: no * OFF: no
* ON: yes * ON: yes
* NB: IN_FREEBSD_BASE is defined within the FreeBSD sources.
*/ */
typedef enum { typedef enum {
SPA_AUTOTRIM_OFF = 0, /* default */ SPA_AUTOTRIM_OFF = 0, /* default */
SPA_AUTOTRIM_ON, SPA_AUTOTRIM_ON,
#ifdef IN_FREEBSD_BASE
SPA_AUTOTRIM_DEFAULT = SPA_AUTOTRIM_ON,
#else
SPA_AUTOTRIM_DEFAULT = SPA_AUTOTRIM_OFF,
#endif
} spa_autotrim_t; } spa_autotrim_t;
/* /*

@ -250,6 +250,7 @@ struct spa {
uint64_t spa_min_ashift; /* of vdevs in normal class */ uint64_t spa_min_ashift; /* of vdevs in normal class */
uint64_t spa_max_ashift; /* of vdevs in normal class */ uint64_t spa_max_ashift; /* of vdevs in normal class */
uint64_t spa_min_alloc; /* of vdevs in normal class */ uint64_t spa_min_alloc; /* of vdevs in normal class */
uint64_t spa_gcd_alloc; /* of vdevs in normal class */
uint64_t spa_config_guid; /* config pool guid */ uint64_t spa_config_guid; /* config pool guid */
uint64_t spa_load_guid; /* spa_load initialized guid */ uint64_t spa_load_guid; /* spa_load initialized guid */
uint64_t spa_last_synced_guid; /* last synced guid */ uint64_t spa_last_synced_guid; /* last synced guid */
@ -422,7 +423,9 @@ struct spa {
hrtime_t spa_ccw_fail_time; /* Conf cache write fail time */ hrtime_t spa_ccw_fail_time; /* Conf cache write fail time */
taskq_t *spa_zvol_taskq; /* Taskq for minor management */ taskq_t *spa_zvol_taskq; /* Taskq for minor management */
taskq_t *spa_metaslab_taskq; /* Taskq for metaslab preload */
taskq_t *spa_prefetch_taskq; /* Taskq for prefetch threads */ taskq_t *spa_prefetch_taskq; /* Taskq for prefetch threads */
taskq_t *spa_upgrade_taskq; /* Taskq for upgrade jobs */
uint64_t spa_multihost; /* multihost aware (mmp) */ uint64_t spa_multihost; /* multihost aware (mmp) */
mmp_thread_t spa_mmp; /* multihost mmp thread */ mmp_thread_t spa_mmp; /* multihost mmp thread */
list_t spa_leaf_list; /* list of leaf vdevs */ list_t spa_leaf_list; /* list of leaf vdevs */
@ -446,8 +449,6 @@ struct spa {
*/ */
spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */ spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */
zfs_refcount_t spa_refcount; /* number of opens */ zfs_refcount_t spa_refcount; /* number of opens */
taskq_t *spa_upgrade_taskq; /* taskq for upgrade jobs */
}; };
extern char *spa_config_path; extern char *spa_config_path;

@ -266,7 +266,6 @@ struct vdev {
metaslab_group_t *vdev_mg; /* metaslab group */ metaslab_group_t *vdev_mg; /* metaslab group */
metaslab_group_t *vdev_log_mg; /* embedded slog metaslab group */ metaslab_group_t *vdev_log_mg; /* embedded slog metaslab group */
metaslab_t **vdev_ms; /* metaslab array */ metaslab_t **vdev_ms; /* metaslab array */
uint64_t vdev_pending_fastwrite; /* allocated fastwrites */
txg_list_t vdev_ms_list; /* per-txg dirty metaslab lists */ txg_list_t vdev_ms_list; /* per-txg dirty metaslab lists */
txg_list_t vdev_dtl_list; /* per-txg dirty DTL lists */ txg_list_t vdev_dtl_list; /* per-txg dirty DTL lists */
txg_node_t vdev_txg_node; /* per-txg dirty vdev linkage */ txg_node_t vdev_txg_node; /* per-txg dirty vdev linkage */
@ -420,6 +419,7 @@ struct vdev {
boolean_t vdev_copy_uberblocks; /* post expand copy uberblocks */ boolean_t vdev_copy_uberblocks; /* post expand copy uberblocks */
boolean_t vdev_resilver_deferred; /* resilver deferred */ boolean_t vdev_resilver_deferred; /* resilver deferred */
boolean_t vdev_kobj_flag; /* kobj event record */ boolean_t vdev_kobj_flag; /* kobj event record */
boolean_t vdev_attaching; /* vdev attach ashift handling */
vdev_queue_t vdev_queue; /* I/O deadline schedule queue */ vdev_queue_t vdev_queue; /* I/O deadline schedule queue */
spa_aux_vdev_t *vdev_aux; /* for l2cache and spares vdevs */ spa_aux_vdev_t *vdev_aux; /* for l2cache and spares vdevs */
zio_t *vdev_probe_zio; /* root of current probe */ zio_t *vdev_probe_zio; /* root of current probe */

@ -38,14 +38,22 @@ extern "C" {
/* /*
* Possible states for a given lwb structure. * Possible states for a given lwb structure.
* *
* An lwb will start out in the "closed" state, and then transition to * An lwb will start out in the "new" state, and transition to the "opened"
* the "opened" state via a call to zil_lwb_write_open(). When * state via a call to zil_lwb_write_open() on first itx assignment. When
* transitioning from "closed" to "opened" the zilog's "zl_issuer_lock" * transitioning from "new" to "opened" the zilog's "zl_issuer_lock" must be
* must be held. * held.
* *
* After the lwb is "opened", it can transition into the "issued" state * After the lwb is "opened", it can be assigned number of itxs and transition
* via zil_lwb_write_close(). Again, the zilog's "zl_issuer_lock" must * into the "closed" state via zil_lwb_write_close() when full or on timeout.
* be held when making this transition. * When transitioning from "opened" to "closed" the zilog's "zl_issuer_lock"
* must be held. New lwb allocation also takes "zl_lock" to protect the list.
*
* After the lwb is "closed", it can transition into the "ready" state via
* zil_lwb_write_issue(). "zl_lock" must be held when making this transition.
* Since it is done by the same thread, "zl_issuer_lock" is not needed.
*
* When lwb in "ready" state receives its block pointer, it can transition to
* "issued". "zl_lock" must be held when making this transition.
* *
* After the lwb's write zio completes, it transitions into the "write * After the lwb's write zio completes, it transitions into the "write
* done" state via zil_lwb_write_done(); and then into the "flush done" * done" state via zil_lwb_write_done(); and then into the "flush done"
@ -62,17 +70,20 @@ extern "C" {
* *
* Additionally, correctness when reading an lwb's state is often * Additionally, correctness when reading an lwb's state is often
* achieved by exploiting the fact that these state transitions occur in * achieved by exploiting the fact that these state transitions occur in
* this specific order; i.e. "closed" to "opened" to "issued" to "done". * this specific order; i.e. "new" to "opened" to "closed" to "ready" to
* "issued" to "write_done" and finally "flush_done".
* *
* Thus, if an lwb is in the "closed" or "opened" state, holding the * Thus, if an lwb is in the "new" or "opened" state, holding the
* "zl_issuer_lock" will prevent a concurrent thread from transitioning * "zl_issuer_lock" will prevent a concurrent thread from transitioning
* that lwb to the "issued" state. Likewise, if an lwb is already in the * that lwb to the "closed" state. Likewise, if an lwb is already in the
* "issued" state, holding the "zl_lock" will prevent a concurrent * "ready" state, holding the "zl_lock" will prevent a concurrent thread
* thread from transitioning that lwb to the "write done" state. * from transitioning that lwb to the "issued" state.
*/ */
typedef enum { typedef enum {
LWB_STATE_CLOSED, LWB_STATE_NEW,
LWB_STATE_OPENED, LWB_STATE_OPENED,
LWB_STATE_CLOSED,
LWB_STATE_READY,
LWB_STATE_ISSUED, LWB_STATE_ISSUED,
LWB_STATE_WRITE_DONE, LWB_STATE_WRITE_DONE,
LWB_STATE_FLUSH_DONE, LWB_STATE_FLUSH_DONE,
@ -91,18 +102,21 @@ typedef enum {
typedef struct lwb { typedef struct lwb {
zilog_t *lwb_zilog; /* back pointer to log struct */ zilog_t *lwb_zilog; /* back pointer to log struct */
blkptr_t lwb_blk; /* on disk address of this log blk */ blkptr_t lwb_blk; /* on disk address of this log blk */
boolean_t lwb_fastwrite; /* is blk marked for fastwrite? */ boolean_t lwb_slim; /* log block has slim format */
boolean_t lwb_slog; /* lwb_blk is on SLOG device */ boolean_t lwb_slog; /* lwb_blk is on SLOG device */
boolean_t lwb_indirect; /* do not postpone zil_lwb_commit() */ int lwb_error; /* log block allocation error */
int lwb_nmax; /* max bytes in the buffer */
int lwb_nused; /* # used bytes in buffer */ int lwb_nused; /* # used bytes in buffer */
int lwb_nfilled; /* # filled bytes in buffer */ int lwb_nfilled; /* # filled bytes in buffer */
int lwb_sz; /* size of block and buffer */ int lwb_sz; /* size of block and buffer */
lwb_state_t lwb_state; /* the state of this lwb */ lwb_state_t lwb_state; /* the state of this lwb */
char *lwb_buf; /* log write buffer */ char *lwb_buf; /* log write buffer */
zio_t *lwb_child_zio; /* parent zio for children */
zio_t *lwb_write_zio; /* zio for the lwb buffer */ zio_t *lwb_write_zio; /* zio for the lwb buffer */
zio_t *lwb_root_zio; /* root zio for lwb write and flushes */ zio_t *lwb_root_zio; /* root zio for lwb write and flushes */
hrtime_t lwb_issued_timestamp; /* when was the lwb issued? */ hrtime_t lwb_issued_timestamp; /* when was the lwb issued? */
uint64_t lwb_issued_txg; /* the txg when the write is issued */ uint64_t lwb_issued_txg; /* the txg when the write is issued */
uint64_t lwb_alloc_txg; /* the txg when lwb_blk is allocated */
uint64_t lwb_max_txg; /* highest txg in this lwb */ uint64_t lwb_max_txg; /* highest txg in this lwb */
list_node_t lwb_node; /* zilog->zl_lwb_list linkage */ list_node_t lwb_node; /* zilog->zl_lwb_list linkage */
list_node_t lwb_issue_node; /* linkage of lwbs ready for issue */ list_node_t lwb_issue_node; /* linkage of lwbs ready for issue */
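
Reviewer note on the revised lwb state machine: the header comment describes a strictly ordered sequence of seven states and names the lock held for the first four transitions. The sketch below restates that as code. The names `lwb_state_sketch_t`, `lwb_step_is_valid()` and `lwb_lock_for_step()` are hypothetical helpers for illustration, not functions from the patch, and the locks for the later transitions are left unstated because the visible hunks do not show them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same order as the new lwb_state_t shown in the hunk above. */
typedef enum {
	LWB_STATE_NEW,
	LWB_STATE_OPENED,
	LWB_STATE_CLOSED,
	LWB_STATE_READY,
	LWB_STATE_ISSUED,
	LWB_STATE_WRITE_DONE,
	LWB_STATE_FLUSH_DONE
} lwb_state_sketch_t;

/* A transition is valid only if it moves to the immediately next state. */
bool
lwb_step_is_valid(lwb_state_sketch_t from, lwb_state_sketch_t to)
{
	return (from != LWB_STATE_FLUSH_DONE && (int)to == (int)from + 1);
}

/*
 * Lock that the header comment says must be held when leaving `from`.
 * Returns NULL where the visible part of the comment does not say.
 */
const char *
lwb_lock_for_step(lwb_state_sketch_t from)
{
	switch (from) {
	case LWB_STATE_NEW:	/* "new" -> "opened" */
	case LWB_STATE_OPENED:	/* "opened" -> "closed" */
		return ("zl_issuer_lock");
	case LWB_STATE_CLOSED:	/* "closed" -> "ready" */
	case LWB_STATE_READY:	/* "ready" -> "issued" */
		return ("zl_lock");
	default:
		return (NULL);
	}
}
```

Relying on the enum order for validity matches the comment's point that readers of `lwb_state` exploit the fixed ordering of transitions.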

@ -222,7 +222,6 @@ typedef uint64_t zio_flag_t;
#define ZIO_FLAG_NOPWRITE (1ULL << 28) #define ZIO_FLAG_NOPWRITE (1ULL << 28)
#define ZIO_FLAG_REEXECUTED (1ULL << 29) #define ZIO_FLAG_REEXECUTED (1ULL << 29)
#define ZIO_FLAG_DELEGATED (1ULL << 30) #define ZIO_FLAG_DELEGATED (1ULL << 30)
#define ZIO_FLAG_FASTWRITE (1ULL << 31)
#define ZIO_FLAG_MUSTSUCCEED 0 #define ZIO_FLAG_MUSTSUCCEED 0
#define ZIO_FLAG_RAW (ZIO_FLAG_RAW_COMPRESS | ZIO_FLAG_RAW_ENCRYPT) #define ZIO_FLAG_RAW (ZIO_FLAG_RAW_COMPRESS | ZIO_FLAG_RAW_ENCRYPT)

@ -94,8 +94,6 @@ typedef const struct zio_checksum_info {
} zio_checksum_info_t; } zio_checksum_info_t;
typedef struct zio_bad_cksum { typedef struct zio_bad_cksum {
zio_cksum_t zbc_expected;
zio_cksum_t zbc_actual;
const char *zbc_checksum_name; const char *zbc_checksum_name;
uint8_t zbc_byteswapped; uint8_t zbc_byteswapped;
uint8_t zbc_injected; uint8_t zbc_injected;

@ -161,7 +161,8 @@ nfs_is_shared(sa_share_impl_t impl_share)
static int static int
nfs_validate_shareopts(const char *shareopts) nfs_validate_shareopts(const char *shareopts)
{ {
(void) shareopts; if (strlen(shareopts) == 0)
return (SA_SYNTAX_ERR);
return (SA_OK); return (SA_OK);
} }

@ -319,12 +319,49 @@ get_linux_shareopts_cb(const char *key, const char *value, void *cookie)
"wdelay" }; "wdelay" };
char **plinux_opts = (char **)cookie; char **plinux_opts = (char **)cookie;
char *host, *val_dup, *literal, *next;
/* host-specific options, these are taken care of elsewhere */ if (strcmp(key, "sec") == 0)
if (strcmp(key, "ro") == 0 || strcmp(key, "rw") == 0 ||
strcmp(key, "sec") == 0)
return (SA_OK); return (SA_OK);
if (strcmp(key, "ro") == 0 || strcmp(key, "rw") == 0) {
if (value == NULL || strlen(value) == 0)
return (SA_OK);
val_dup = strdup(value);
host = val_dup;
if (host == NULL)
return (SA_NO_MEMORY);
do {
if (*host == '[') {
host++;
literal = strchr(host, ']');
if (literal == NULL) {
free(val_dup);
return (SA_SYNTAX_ERR);
}
if (literal[1] == '\0')
next = NULL;
else if (literal[1] == '/') {
next = strchr(literal + 2, ':');
if (next != NULL)
++next;
} else if (literal[1] == ':')
next = literal + 2;
else {
free(val_dup);
return (SA_SYNTAX_ERR);
}
} else {
next = strchr(host, ':');
if (next != NULL)
++next;
}
host = next;
} while (host != NULL);
free(val_dup);
return (SA_OK);
}
if (strcmp(key, "anon") == 0) if (strcmp(key, "anon") == 0)
key = "anonuid"; key = "anonuid";
@ -472,6 +509,10 @@ static int
nfs_validate_shareopts(const char *shareopts) nfs_validate_shareopts(const char *shareopts)
{ {
char *linux_opts = NULL; char *linux_opts = NULL;
if (strlen(shareopts) == 0)
return (SA_SYNTAX_ERR);
int error = get_linux_shareopts(shareopts, &linux_opts); int error = get_linux_shareopts(shareopts, &linux_opts);
if (error != SA_OK) if (error != SA_OK)
return (error); return (error);
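
Reviewer note on the new `ro`/`rw` host-list check: the loop walks a colon-separated host list and treats a bracketed entry as a literal that may be followed by `/prefix`. The sketch below is a standalone restatement of that walk for experimenting with inputs. `sketch_host_list_ok()` is a hypothetical name, it returns 0 or -1 rather than the `SA_*` codes, and it skips the `strdup()` because it never modifies the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 if the host list is well formed, -1 on a syntax error
 * (unterminated '[' or unexpected character after ']').
 */
int
sketch_host_list_ok(const char *value)
{
	const char *host = value, *literal, *next;

	/* An absent or empty value is accepted, as in the patch. */
	if (value == NULL || *value == '\0')
		return (0);

	do {
		if (*host == '[') {
			/* Bracketed literal: find the closing bracket. */
			host++;
			literal = strchr(host, ']');
			if (literal == NULL)
				return (-1);
			if (literal[1] == '\0') {
				next = NULL;		/* end of list */
			} else if (literal[1] == '/') {
				/* Skip "/prefix" up to the next separator. */
				next = strchr(literal + 2, ':');
				if (next != NULL)
					++next;
			} else if (literal[1] == ':') {
				next = literal + 2;	/* next host */
			} else {
				return (-1);
			}
		} else {
			/* Plain host: advance past the next ':' if any. */
			next = strchr(host, ':');
			if (next != NULL)
				++next;
		}
		host = next;
	} while (host != NULL);

	return (0);
}
```

For example, `sketch_host_list_ok("[fe80::1]/64:10.0.0.1")` is accepted, while `"[fe80::1"` and `"[::1]x"` are rejected. Plain (unbracketed) entries are not inspected beyond locating the separator.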

@ -57,7 +57,7 @@ libzfs_la_LIBADD = \
libzutil.la \ libzutil.la \
libuutil.la libuutil.la
libzfs_la_LIBADD += -lm $(LIBCRYPTO_LIBS) $(ZLIB_LIBS) $(LIBFETCH_LIBS) $(LTLIBINTL) libzfs_la_LIBADD += -lrt -lm $(LIBCRYPTO_LIBS) $(ZLIB_LIBS) $(LIBFETCH_LIBS) $(LTLIBINTL)
libzfs_la_LDFLAGS = -pthread libzfs_la_LDFLAGS = -pthread

@ -396,6 +396,7 @@
<elf-symbol name='zfs_prop_readonly' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> <elf-symbol name='zfs_prop_readonly' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_prop_set' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> <elf-symbol name='zfs_prop_set' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_prop_set_list' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> <elf-symbol name='zfs_prop_set_list' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_prop_set_list_flags' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_prop_setonce' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> <elf-symbol name='zfs_prop_setonce' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_prop_string_to_index' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> <elf-symbol name='zfs_prop_string_to_index' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_prop_to_name' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> <elf-symbol name='zfs_prop_to_name' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
@ -4424,6 +4425,12 @@
<parameter type-id='5ce45b60' name='props'/> <parameter type-id='5ce45b60' name='props'/>
<return type-id='95e97e5e'/> <return type-id='95e97e5e'/>
</function-decl> </function-decl>
<function-decl name='zfs_prop_set_list_flags' mangled-name='zfs_prop_set_list_flags' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zfs_prop_set_list_flags'>
<parameter type-id='9200a744' name='zhp'/>
<parameter type-id='5ce45b60' name='props'/>
<parameter type-id='95e97e5e' name='flags'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='zfs_prop_inherit' mangled-name='zfs_prop_inherit' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zfs_prop_inherit'> <function-decl name='zfs_prop_inherit' mangled-name='zfs_prop_inherit' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zfs_prop_inherit'>
<parameter type-id='9200a744' name='zhp'/> <parameter type-id='9200a744' name='zhp'/>
<parameter type-id='80f4b756' name='propname'/> <parameter type-id='80f4b756' name='propname'/>

@ -105,6 +105,15 @@ changelist_prefix(prop_changelist_t *clp)
clp->cl_prop != ZFS_PROP_SHARESMB) clp->cl_prop != ZFS_PROP_SHARESMB)
return (0); return (0);
/*
* If CL_GATHER_DONT_UNMOUNT is set, don't want to unmount/unshare and
* later (re)mount/(re)share the filesystem in postfix phase, so we
* return from here. If filesystem is mounted or unmounted, leave it
* as it is.
*/
if (clp->cl_gflags & CL_GATHER_DONT_UNMOUNT)
return (0);
if ((walk = uu_avl_walk_start(clp->cl_tree, UU_WALK_ROBUST)) == NULL) if ((walk = uu_avl_walk_start(clp->cl_tree, UU_WALK_ROBUST)) == NULL)
return (-1); return (-1);
@ -129,8 +138,6 @@ changelist_prefix(prop_changelist_t *clp)
*/ */
switch (clp->cl_prop) { switch (clp->cl_prop) {
case ZFS_PROP_MOUNTPOINT: case ZFS_PROP_MOUNTPOINT:
if (clp->cl_gflags & CL_GATHER_DONT_UNMOUNT)
break;
if (zfs_unmount(cn->cn_handle, NULL, if (zfs_unmount(cn->cn_handle, NULL,
clp->cl_mflags) != 0) { clp->cl_mflags) != 0) {
ret = -1; ret = -1;
@ -164,9 +171,8 @@ changelist_prefix(prop_changelist_t *clp)
* reshare the filesystems as necessary. In changelist_gather() we recorded * reshare the filesystems as necessary. In changelist_gather() we recorded
* whether the filesystem was previously shared or mounted. The action we take * whether the filesystem was previously shared or mounted. The action we take
* depends on the previous state, and whether the value was previously 'legacy'. * depends on the previous state, and whether the value was previously 'legacy'.
* For non-legacy properties, we only remount/reshare the filesystem if it was * For non-legacy properties, we always remount/reshare the filesystem,
* previously mounted/shared. Otherwise, we always remount/reshare the * if CL_GATHER_DONT_UNMOUNT is not set.
* filesystem.
*/ */
int int
changelist_postfix(prop_changelist_t *clp) changelist_postfix(prop_changelist_t *clp)
@ -174,10 +180,17 @@ changelist_postfix(prop_changelist_t *clp)
prop_changenode_t *cn; prop_changenode_t *cn;
uu_avl_walk_t *walk; uu_avl_walk_t *walk;
char shareopts[ZFS_MAXPROPLEN]; char shareopts[ZFS_MAXPROPLEN];
int errors = 0;
boolean_t commit_smb_shares = B_FALSE; boolean_t commit_smb_shares = B_FALSE;
boolean_t commit_nfs_shares = B_FALSE; boolean_t commit_nfs_shares = B_FALSE;
/*
* If CL_GATHER_DONT_UNMOUNT is set, it means we don't want to (un)mount
* or (re/un)share the filesystem, so we return from here. If filesystem
* is mounted or unmounted, leave it as it is.
*/
if (clp->cl_gflags & CL_GATHER_DONT_UNMOUNT)
return (0);
/* /*
* If we're changing the mountpoint, attempt to destroy the underlying * If we're changing the mountpoint, attempt to destroy the underlying
* mountpoint. All other datasets will have inherited from this dataset * mountpoint. All other datasets will have inherited from this dataset
@ -240,17 +253,16 @@ changelist_postfix(prop_changelist_t *clp)
needs_key = (zfs_prop_get_int(cn->cn_handle, needs_key = (zfs_prop_get_int(cn->cn_handle,
ZFS_PROP_KEYSTATUS) == ZFS_KEYSTATUS_UNAVAILABLE); ZFS_PROP_KEYSTATUS) == ZFS_KEYSTATUS_UNAVAILABLE);
mounted = (clp->cl_gflags & CL_GATHER_DONT_UNMOUNT) || mounted = zfs_is_mounted(cn->cn_handle, NULL);
zfs_is_mounted(cn->cn_handle, NULL);
if (!mounted && !needs_key && (cn->cn_mounted || if (!mounted && !needs_key && (cn->cn_mounted ||
((sharenfs || sharesmb || clp->cl_waslegacy) && (((clp->cl_prop == ZFS_PROP_MOUNTPOINT &&
clp->cl_prop == clp->cl_realprop) ||
sharenfs || sharesmb || clp->cl_waslegacy) &&
(zfs_prop_get_int(cn->cn_handle, (zfs_prop_get_int(cn->cn_handle,
ZFS_PROP_CANMOUNT) == ZFS_CANMOUNT_ON)))) { ZFS_PROP_CANMOUNT) == ZFS_CANMOUNT_ON)))) {
if (zfs_mount(cn->cn_handle, NULL, 0) != 0) if (zfs_mount(cn->cn_handle, NULL, 0) == 0)
errors++;
else
mounted = TRUE; mounted = TRUE;
} }
@ -262,19 +274,19 @@ changelist_postfix(prop_changelist_t *clp)
const enum sa_protocol nfs[] = const enum sa_protocol nfs[] =
{SA_PROTOCOL_NFS, SA_NO_PROTOCOL}; {SA_PROTOCOL_NFS, SA_NO_PROTOCOL};
if (sharenfs && mounted) { if (sharenfs && mounted) {
errors += zfs_share(cn->cn_handle, nfs); zfs_share(cn->cn_handle, nfs);
commit_nfs_shares = B_TRUE; commit_nfs_shares = B_TRUE;
} else if (cn->cn_shared || clp->cl_waslegacy) { } else if (cn->cn_shared || clp->cl_waslegacy) {
errors += zfs_unshare(cn->cn_handle, NULL, nfs); zfs_unshare(cn->cn_handle, NULL, nfs);
commit_nfs_shares = B_TRUE; commit_nfs_shares = B_TRUE;
} }
const enum sa_protocol smb[] = const enum sa_protocol smb[] =
{SA_PROTOCOL_SMB, SA_NO_PROTOCOL}; {SA_PROTOCOL_SMB, SA_NO_PROTOCOL};
if (sharesmb && mounted) { if (sharesmb && mounted) {
errors += zfs_share(cn->cn_handle, smb); zfs_share(cn->cn_handle, smb);
commit_smb_shares = B_TRUE; commit_smb_shares = B_TRUE;
} else if (cn->cn_shared || clp->cl_waslegacy) { } else if (cn->cn_shared || clp->cl_waslegacy) {
errors += zfs_unshare(cn->cn_handle, NULL, smb); zfs_unshare(cn->cn_handle, NULL, smb);
commit_smb_shares = B_TRUE; commit_smb_shares = B_TRUE;
} }
} }
@ -288,7 +300,7 @@ changelist_postfix(prop_changelist_t *clp)
zfs_commit_shares(proto); zfs_commit_shares(proto);
uu_avl_walk_end(walk); uu_avl_walk_end(walk);
return (errors ? -1 : 0); return (0);
} }
/* /*

@ -1771,14 +1771,24 @@ error:
return (ret); return (ret);
} }
/* /*
* Given an nvlist of property names and values, set the properties for the * Given an nvlist of property names and values, set the properties for the
* given dataset. * given dataset.
*/ */
int int
zfs_prop_set_list(zfs_handle_t *zhp, nvlist_t *props) zfs_prop_set_list(zfs_handle_t *zhp, nvlist_t *props)
{
return (zfs_prop_set_list_flags(zhp, props, 0));
}
/*
* Given an nvlist of property names, values and flags, set the properties
* for the given dataset. If ZFS_SET_NOMOUNT is set, it allows to update
* mountpoint, sharenfs and sharesmb properties without (un/re)mounting
* and (un/re)sharing the dataset.
*/
int
zfs_prop_set_list_flags(zfs_handle_t *zhp, nvlist_t *props, int flags)
{ {
zfs_cmd_t zc = {"\0"}; zfs_cmd_t zc = {"\0"};
int ret = -1; int ret = -1;
@ -1848,7 +1858,9 @@ zfs_prop_set_list(zfs_handle_t *zhp, nvlist_t *props)
if (prop != ZFS_PROP_CANMOUNT || if (prop != ZFS_PROP_CANMOUNT ||
(fnvpair_value_uint64(elem) == ZFS_CANMOUNT_OFF && (fnvpair_value_uint64(elem) == ZFS_CANMOUNT_OFF &&
zfs_is_mounted(zhp, NULL))) { zfs_is_mounted(zhp, NULL))) {
cls[cl_idx] = changelist_gather(zhp, prop, 0, 0); cls[cl_idx] = changelist_gather(zhp, prop,
((flags & ZFS_SET_NOMOUNT) ?
CL_GATHER_DONT_UNMOUNT : 0), 0);
if (cls[cl_idx] == NULL) if (cls[cl_idx] == NULL)
goto error; goto error;
} }
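
Reviewer note on the flag plumbing: `zfs_prop_set_list()` becomes a thin wrapper that passes `flags = 0`, and `zfs_prop_set_list_flags()` translates `ZFS_SET_NOMOUNT` into `CL_GATHER_DONT_UNMOUNT` when gathering the changelist. The sketch below isolates that translation. The numeric values are placeholders chosen for illustration, since the real definitions are not shown in this diff, and `sketch_gather_flags()` is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder values: the real definitions are not part of this diff. */
#define	SKETCH_ZFS_SET_NOMOUNT		0x1
#define	SKETCH_CL_GATHER_DONT_UNMOUNT	0x4

/* Map property-set flags to the gather flags passed to the changelist. */
int
sketch_gather_flags(int set_flags)
{
	return ((set_flags & SKETCH_ZFS_SET_NOMOUNT) ?
	    SKETCH_CL_GATHER_DONT_UNMOUNT : 0);
}
```

With `flags == 0` the gather flags stay 0, so the wrapper passes the same gather flags that the old `changelist_gather(zhp, prop, 0, 0)` call did.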

@ -1300,7 +1300,7 @@ zpool_enable_datasets(zpool_handle_t *zhp, const char *mntopts, int flags)
zfs_foreach_mountpoint(zhp->zpool_hdl, cb.cb_handles, cb.cb_used, zfs_foreach_mountpoint(zhp->zpool_hdl, cb.cb_handles, cb.cb_used,
zfs_mount_one, &ms, B_TRUE); zfs_mount_one, &ms, B_TRUE);
if (ms.ms_mntstatus != 0) if (ms.ms_mntstatus != 0)
ret = ms.ms_mntstatus; ret = EZFS_MOUNTFAILED;
/* /*
* Share all filesystems that need to be shared. This needs to be * Share all filesystems that need to be shared. This needs to be
@ -1311,7 +1311,7 @@ zpool_enable_datasets(zpool_handle_t *zhp, const char *mntopts, int flags)
zfs_foreach_mountpoint(zhp->zpool_hdl, cb.cb_handles, cb.cb_used, zfs_foreach_mountpoint(zhp->zpool_hdl, cb.cb_handles, cb.cb_used,
zfs_share_one, &ms, B_FALSE); zfs_share_one, &ms, B_FALSE);
if (ms.ms_mntstatus != 0) if (ms.ms_mntstatus != 0)
ret = ms.ms_mntstatus; ret = EZFS_SHAREFAILED;
else else
zfs_commit_shares(NULL); zfs_commit_shares(NULL);

@ -29,7 +29,7 @@
* Copyright (c) 2017, Intel Corporation. * Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com> * Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021, Colm Buckley <colm@tuatha.org> * Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
* Copyright (c) 2021, Klara Inc. * Copyright (c) 2021, 2023, Klara Inc.
*/ */
#include <errno.h> #include <errno.h>
@ -255,6 +255,7 @@ zpool_get_state_str(zpool_handle_t *zhp)
if (zpool_get_state(zhp) == POOL_STATE_UNAVAIL) { if (zpool_get_state(zhp) == POOL_STATE_UNAVAIL) {
str = gettext("FAULTED"); str = gettext("FAULTED");
} else if (status == ZPOOL_STATUS_IO_FAILURE_WAIT || } else if (status == ZPOOL_STATUS_IO_FAILURE_WAIT ||
status == ZPOOL_STATUS_IO_FAILURE_CONTINUE ||
status == ZPOOL_STATUS_IO_FAILURE_MMP) { status == ZPOOL_STATUS_IO_FAILURE_MMP) {
str = gettext("SUSPENDED"); str = gettext("SUSPENDED");
} else { } else {
@ -3926,6 +3927,12 @@ zpool_vdev_remove(zpool_handle_t *zhp, const char *path)
switch (errno) { switch (errno) {
case EALREADY:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"removal for this vdev is already in progress."));
(void) zfs_error(hdl, EZFS_BUSY, errbuf);
break;
case EINVAL: case EINVAL:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"invalid config; all top-level vdevs must " "invalid config; all top-level vdevs must "


@ -928,6 +928,39 @@ zfs_send_progress(zfs_handle_t *zhp, int fd, uint64_t *bytes_written,
return (0); return (0);
} }
static volatile boolean_t send_progress_thread_signal_duetotimer;
static void
send_progress_thread_act(int sig, siginfo_t *info, void *ucontext)
{
(void) sig, (void) ucontext;
send_progress_thread_signal_duetotimer = info->si_code == SI_TIMER;
}
struct timer_desirability {
timer_t timer;
boolean_t desired;
};
static void
timer_delete_cleanup(void *timer)
{
struct timer_desirability *td = timer;
if (td->desired)
timer_delete(td->timer);
}
#ifdef SIGINFO
#define SEND_PROGRESS_THREAD_PARENT_BLOCK_SIGINFO sigaddset(&new, SIGINFO)
#else
#define SEND_PROGRESS_THREAD_PARENT_BLOCK_SIGINFO
#endif
#define SEND_PROGRESS_THREAD_PARENT_BLOCK(old) { \
sigset_t new; \
sigemptyset(&new); \
sigaddset(&new, SIGUSR1); \
SEND_PROGRESS_THREAD_PARENT_BLOCK_SIGINFO; \
pthread_sigmask(SIG_BLOCK, &new, old); \
}
static void * static void *
send_progress_thread(void *arg) send_progress_thread(void *arg)
{ {
@ -941,6 +974,26 @@ send_progress_thread(void *arg)
struct tm tm; struct tm tm;
int err; int err;
const struct sigaction signal_action =
{.sa_sigaction = send_progress_thread_act, .sa_flags = SA_SIGINFO};
struct sigevent timer_cfg =
{.sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGUSR1};
const struct itimerspec timer_time =
{.it_value = {.tv_sec = 1}, .it_interval = {.tv_sec = 1}};
struct timer_desirability timer = {};
sigaction(SIGUSR1, &signal_action, NULL);
#ifdef SIGINFO
sigaction(SIGINFO, &signal_action, NULL);
#endif
if ((timer.desired = pa->pa_progress || pa->pa_astitle)) {
if (timer_create(CLOCK_MONOTONIC, &timer_cfg, &timer.timer))
return ((void *)(uintptr_t)errno);
(void) timer_settime(timer.timer, 0, &timer_time, NULL);
}
pthread_cleanup_push(timer_delete_cleanup, &timer);
if (!pa->pa_parsable && pa->pa_progress) { if (!pa->pa_parsable && pa->pa_progress) {
(void) fprintf(stderr, (void) fprintf(stderr,
"TIME %s %sSNAPSHOT %s\n", "TIME %s %sSNAPSHOT %s\n",
@ -953,12 +1006,12 @@ send_progress_thread(void *arg)
* Print the progress from ZFS_IOC_SEND_PROGRESS every second. * Print the progress from ZFS_IOC_SEND_PROGRESS every second.
*/ */
for (;;) { for (;;) {
(void) sleep(1); pause();
if ((err = zfs_send_progress(zhp, pa->pa_fd, &bytes, if ((err = zfs_send_progress(zhp, pa->pa_fd, &bytes,
&blocks)) != 0) { &blocks)) != 0) {
if (err == EINTR || err == ENOENT) if (err == EINTR || err == ENOENT)
return ((void *)0); err = 0;
return ((void *)(uintptr_t)err); pthread_exit(((void *)(uintptr_t)err));
} }
(void) time(&t); (void) time(&t);
@ -991,21 +1044,25 @@ send_progress_thread(void *arg)
(void) fprintf(stderr, "%02d:%02d:%02d\t%llu\t%s\n", (void) fprintf(stderr, "%02d:%02d:%02d\t%llu\t%s\n",
tm.tm_hour, tm.tm_min, tm.tm_sec, tm.tm_hour, tm.tm_min, tm.tm_sec,
(u_longlong_t)bytes, zhp->zfs_name); (u_longlong_t)bytes, zhp->zfs_name);
} else if (pa->pa_progress) { } else if (pa->pa_progress ||
!send_progress_thread_signal_duetotimer) {
zfs_nicebytes(bytes, buf, sizeof (buf)); zfs_nicebytes(bytes, buf, sizeof (buf));
(void) fprintf(stderr, "%02d:%02d:%02d %5s %s\n", (void) fprintf(stderr, "%02d:%02d:%02d %5s %s\n",
tm.tm_hour, tm.tm_min, tm.tm_sec, tm.tm_hour, tm.tm_min, tm.tm_sec,
buf, zhp->zfs_name); buf, zhp->zfs_name);
} }
} }
pthread_cleanup_pop(B_TRUE);
} }
static boolean_t static boolean_t
send_progress_thread_exit(libzfs_handle_t *hdl, pthread_t ptid) send_progress_thread_exit(
libzfs_handle_t *hdl, pthread_t ptid, sigset_t *oldmask)
{ {
void *status = NULL; void *status = NULL;
(void) pthread_cancel(ptid); (void) pthread_cancel(ptid);
(void) pthread_join(ptid, &status); (void) pthread_join(ptid, &status);
pthread_sigmask(SIG_SETMASK, oldmask, NULL);
int error = (int)(uintptr_t)status; int error = (int)(uintptr_t)status;
if (error != 0 && status != PTHREAD_CANCELED) if (error != 0 && status != PTHREAD_CANCELED)
return (zfs_standard_error(hdl, error, return (zfs_standard_error(hdl, error,
@ -1199,7 +1256,8 @@ dump_snapshot(zfs_handle_t *zhp, void *arg)
* If progress reporting is requested, spawn a new thread to * If progress reporting is requested, spawn a new thread to
* poll ZFS_IOC_SEND_PROGRESS at a regular interval. * poll ZFS_IOC_SEND_PROGRESS at a regular interval.
*/ */
if (sdd->progress || sdd->progressastitle) { sigset_t oldmask;
{
pa.pa_zhp = zhp; pa.pa_zhp = zhp;
pa.pa_fd = sdd->outfd; pa.pa_fd = sdd->outfd;
pa.pa_parsable = sdd->parsable; pa.pa_parsable = sdd->parsable;
@ -1214,13 +1272,13 @@ dump_snapshot(zfs_handle_t *zhp, void *arg)
zfs_close(zhp); zfs_close(zhp);
return (err); return (err);
} }
SEND_PROGRESS_THREAD_PARENT_BLOCK(&oldmask);
} }
err = dump_ioctl(zhp, sdd->prevsnap, sdd->prevsnap_obj, err = dump_ioctl(zhp, sdd->prevsnap, sdd->prevsnap_obj,
fromorigin, sdd->outfd, flags, sdd->debugnv); fromorigin, sdd->outfd, flags, sdd->debugnv);
if ((sdd->progress || sdd->progressastitle) && if (send_progress_thread_exit(zhp->zfs_hdl, tid, &oldmask))
send_progress_thread_exit(zhp->zfs_hdl, tid))
return (-1); return (-1);
} }
@ -1562,8 +1620,9 @@ estimate_size(zfs_handle_t *zhp, const char *from, int fd, sendflags_t *flags,
progress_arg_t pa = { 0 }; progress_arg_t pa = { 0 };
int err = 0; int err = 0;
pthread_t ptid; pthread_t ptid;
sigset_t oldmask;
if (flags->progress || flags->progressastitle) { {
pa.pa_zhp = zhp; pa.pa_zhp = zhp;
pa.pa_fd = fd; pa.pa_fd = fd;
pa.pa_parsable = flags->parsable; pa.pa_parsable = flags->parsable;
@ -1577,6 +1636,7 @@ estimate_size(zfs_handle_t *zhp, const char *from, int fd, sendflags_t *flags,
return (zfs_error(zhp->zfs_hdl, return (zfs_error(zhp->zfs_hdl,
EZFS_THREADCREATEFAILED, errbuf)); EZFS_THREADCREATEFAILED, errbuf));
} }
SEND_PROGRESS_THREAD_PARENT_BLOCK(&oldmask);
} }
err = lzc_send_space_resume_redacted(zhp->zfs_name, from, err = lzc_send_space_resume_redacted(zhp->zfs_name, from,
@ -1584,8 +1644,7 @@ estimate_size(zfs_handle_t *zhp, const char *from, int fd, sendflags_t *flags,
redactbook, fd, &size); redactbook, fd, &size);
*sizep = size; *sizep = size;
if ((flags->progress || flags->progressastitle) && if (send_progress_thread_exit(zhp->zfs_hdl, ptid, &oldmask))
send_progress_thread_exit(zhp->zfs_hdl, ptid))
return (-1); return (-1);
if (!flags->progress && !flags->parsable) if (!flags->progress && !flags->parsable)
@ -1876,11 +1935,12 @@ zfs_send_resume_impl_cb_impl(libzfs_handle_t *hdl, sendflags_t *flags,
if (!flags->dryrun) { if (!flags->dryrun) {
progress_arg_t pa = { 0 }; progress_arg_t pa = { 0 };
pthread_t tid; pthread_t tid;
sigset_t oldmask;
/* /*
* If progress reporting is requested, spawn a new thread to * If progress reporting is requested, spawn a new thread to
* poll ZFS_IOC_SEND_PROGRESS at a regular interval. * poll ZFS_IOC_SEND_PROGRESS at a regular interval.
*/ */
if (flags->progress || flags->progressastitle) { {
pa.pa_zhp = zhp; pa.pa_zhp = zhp;
pa.pa_fd = outfd; pa.pa_fd = outfd;
pa.pa_parsable = flags->parsable; pa.pa_parsable = flags->parsable;
@ -1898,6 +1958,7 @@ zfs_send_resume_impl_cb_impl(libzfs_handle_t *hdl, sendflags_t *flags,
zfs_close(zhp); zfs_close(zhp);
return (error); return (error);
} }
SEND_PROGRESS_THREAD_PARENT_BLOCK(&oldmask);
} }
error = lzc_send_resume_redacted(zhp->zfs_name, fromname, outfd, error = lzc_send_resume_redacted(zhp->zfs_name, fromname, outfd,
@ -1905,8 +1966,7 @@ zfs_send_resume_impl_cb_impl(libzfs_handle_t *hdl, sendflags_t *flags,
if (redact_book != NULL) if (redact_book != NULL)
free(redact_book); free(redact_book);
if ((flags->progressastitle || flags->progress) && if (send_progress_thread_exit(hdl, tid, &oldmask)) {
send_progress_thread_exit(hdl, tid)) {
zfs_close(zhp); zfs_close(zhp);
return (-1); return (-1);
} }
@ -2691,7 +2751,8 @@ zfs_send_one_cb_impl(zfs_handle_t *zhp, const char *from, int fd,
* If progress reporting is requested, spawn a new thread to poll * If progress reporting is requested, spawn a new thread to poll
* ZFS_IOC_SEND_PROGRESS at a regular interval. * ZFS_IOC_SEND_PROGRESS at a regular interval.
*/ */
if (flags->progress || flags->progressastitle) { sigset_t oldmask;
{
pa.pa_zhp = zhp; pa.pa_zhp = zhp;
pa.pa_fd = fd; pa.pa_fd = fd;
pa.pa_parsable = flags->parsable; pa.pa_parsable = flags->parsable;
@ -2708,13 +2769,13 @@ zfs_send_one_cb_impl(zfs_handle_t *zhp, const char *from, int fd,
return (zfs_error(zhp->zfs_hdl, return (zfs_error(zhp->zfs_hdl,
EZFS_THREADCREATEFAILED, errbuf)); EZFS_THREADCREATEFAILED, errbuf));
} }
SEND_PROGRESS_THREAD_PARENT_BLOCK(&oldmask);
} }
err = lzc_send_redacted(name, from, fd, err = lzc_send_redacted(name, from, fd,
lzc_flags_from_sendflags(flags), redactbook); lzc_flags_from_sendflags(flags), redactbook);
if ((flags->progress || flags->progressastitle) && if (send_progress_thread_exit(hdl, ptid, &oldmask))
send_progress_thread_exit(hdl, ptid))
return (-1); return (-1);
if (err == 0 && (flags->props || flags->holds || flags->backup)) { if (err == 0 && (flags->props || flags->holds || flags->backup)) {


@ -650,10 +650,12 @@ send_worker(void *arg)
unsigned int bufsiz = max_pipe_buffer(ctx->from); unsigned int bufsiz = max_pipe_buffer(ctx->from);
ssize_t rd; ssize_t rd;
while ((rd = splice(ctx->from, NULL, ctx->to, NULL, bufsiz, for (;;) {
SPLICE_F_MOVE | SPLICE_F_MORE)) > 0) rd = splice(ctx->from, NULL, ctx->to, NULL, bufsiz,
; SPLICE_F_MOVE | SPLICE_F_MORE);
if ((rd == -1 && errno != EINTR) || rd == 0)
break;
}
int err = (rd == -1) ? errno : 0; int err = (rd == -1) ? errno : 0;
close(ctx->from); close(ctx->from);
return ((void *)(uintptr_t)err); return ((void *)(uintptr_t)err);
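The `send_worker` change above keeps calling `splice()` when it is interrupted by a signal instead of treating `EINTR` as a fatal error. The same retry pattern, pulled out into a standalone helper, might look like the sketch below; `splice_all` is a hypothetical name, and the code assumes Linux, where `splice(2)` requires at least one of the descriptors to be a pipe.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Move data from `from` to `to` until end of input, retrying whenever a
 * signal interrupts the call. Returns 0 on end of input, or an errno value
 * on any failure other than EINTR.
 */
static int
splice_all(int from, int to, size_t bufsiz)
{
	ssize_t rd;

	for (;;) {
		rd = splice(from, NULL, to, NULL, bufsiz,
		    SPLICE_F_MOVE | SPLICE_F_MORE);
		if (rd == 0)
			return (0);		/* end of input */
		if (rd == -1 && errno != EINTR)
			return (errno);		/* real error */
		/* rd > 0, or EINTR: keep going */
	}
}
```

The retry matters here because the same series makes `SIGUSR1` and `SIGINFO` routine events during a send, so an interrupted system call is no longer exceptional.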


@ -38,7 +38,6 @@ dist_man_MANS = \
%D%/man8/zfs-groupspace.8 \ %D%/man8/zfs-groupspace.8 \
%D%/man8/zfs-hold.8 \ %D%/man8/zfs-hold.8 \
%D%/man8/zfs-inherit.8 \ %D%/man8/zfs-inherit.8 \
%D%/man8/zfs-jail.8 \
%D%/man8/zfs-list.8 \ %D%/man8/zfs-list.8 \
%D%/man8/zfs-load-key.8 \ %D%/man8/zfs-load-key.8 \
%D%/man8/zfs-mount.8 \ %D%/man8/zfs-mount.8 \
@ -57,14 +56,11 @@ dist_man_MANS = \
%D%/man8/zfs-share.8 \ %D%/man8/zfs-share.8 \
%D%/man8/zfs-snapshot.8 \ %D%/man8/zfs-snapshot.8 \
%D%/man8/zfs-unallow.8 \ %D%/man8/zfs-unallow.8 \
%D%/man8/zfs-unjail.8 \
%D%/man8/zfs-unload-key.8 \ %D%/man8/zfs-unload-key.8 \
%D%/man8/zfs-unmount.8 \ %D%/man8/zfs-unmount.8 \
%D%/man8/zfs-unzone.8 \
%D%/man8/zfs-upgrade.8 \ %D%/man8/zfs-upgrade.8 \
%D%/man8/zfs-userspace.8 \ %D%/man8/zfs-userspace.8 \
%D%/man8/zfs-wait.8 \ %D%/man8/zfs-wait.8 \
%D%/man8/zfs-zone.8 \
%D%/man8/zfs_ids_to_path.8 \ %D%/man8/zfs_ids_to_path.8 \
%D%/man8/zgenhostid.8 \ %D%/man8/zgenhostid.8 \
%D%/man8/zinject.8 \ %D%/man8/zinject.8 \
@ -104,6 +100,18 @@ dist_man_MANS = \
%D%/man8/zstreamdump.8 \ %D%/man8/zstreamdump.8 \
%D%/man8/zpool_influxdb.8 %D%/man8/zpool_influxdb.8
if BUILD_FREEBSD
dist_man_MANS += \
%D%/man8/zfs-jail.8 \
%D%/man8/zfs-unjail.8
endif
if BUILD_LINUX
dist_man_MANS += \
%D%/man8/zfs-unzone.8 \
%D%/man8/zfs-zone.8
endif
nodist_man_MANS = \ nodist_man_MANS = \
%D%/man8/zed.8 \ %D%/man8/zed.8 \
%D%/man8/zfs-mount-generator.8 %D%/man8/zfs-mount-generator.8


@ -15,7 +15,7 @@
.\" own identifying information: .\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner] .\" Portions Copyright [yyyy] [name of copyright owner]
.\" .\"
.Dd January 10, 2023 .Dd July 21, 2023
.Dt ZFS 4 .Dt ZFS 4
.Os .Os
. .
@ -239,6 +239,11 @@ relative to the pool.
Make some blocks above a certain size be gang blocks. Make some blocks above a certain size be gang blocks.
This option is used by the test suite to facilitate testing. This option is used by the test suite to facilitate testing.
. .
.It Sy metaslab_force_ganging_pct Ns = Ns Sy 3 Ns % Pq uint
For blocks that could be forced to be a gang block (due to
.Sy metaslab_force_ganging ) ,
force this many of them to be gang blocks.
.
.It Sy zfs_ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int .It Sy zfs_ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
Default DDT ZAP data block size as a power of 2. Note that changing this after Default DDT ZAP data block size as a power of 2. Note that changing this after
creating a DDT on the pool will not affect existing DDTs, only newly created creating a DDT on the pool will not affect existing DDTs, only newly created
@ -397,6 +402,12 @@ Practical upper limit of total metaslabs per top-level vdev.
.It Sy metaslab_preload_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int .It Sy metaslab_preload_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab group preloading. Enable metaslab group preloading.
. .
.It Sy metaslab_preload_limit Ns = Ns Sy 10 Pq uint
Maximum number of metaslabs per group to preload.
.
.It Sy metaslab_preload_pct Ns = Ns Sy 50 Pq uint
Percentage of CPUs to run a metaslab preload taskq.
.
.It Sy metaslab_lba_weighting_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int .It Sy metaslab_lba_weighting_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Give more weight to metaslabs with lower LBAs, Give more weight to metaslabs with lower LBAs,
assuming they have greater bandwidth, assuming they have greater bandwidth,
@ -519,9 +530,6 @@ However, this is limited by
Maximum micro ZAP size. Maximum micro ZAP size.
A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size. A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size.
. .
.It Sy zfetch_array_rd_sz Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
If prefetching is enabled, disable prefetching for reads larger than this size.
.
.It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint .It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint
Min bytes to prefetch per stream. Min bytes to prefetch per stream.
Prefetch distance starts from the demand access size and quickly grows to Prefetch distance starts from the demand access size and quickly grows to
@ -2142,6 +2150,11 @@ On very fragmented pools, lowering this
.Pq typically to Sy 36 KiB .Pq typically to Sy 36 KiB
can improve performance. can improve performance.
. .
.It Sy zil_maxcopied Ns = Ns Sy 7680 Ns B Po 7.5 KiB Pc Pq uint
This sets the maximum number of write bytes logged via WR_COPIED.
It tunes a tradeoff between additional memory copy and possibly worse log
space efficiency vs additional range lock/unlock.
.
.It Sy zil_min_commit_timeout Ns = Ns Sy 5000 Pq u64 .It Sy zil_min_commit_timeout Ns = Ns Sy 5000 Pq u64
This sets the minimum delay in nanoseconds ZIL care to delay block commit, This sets the minimum delay in nanoseconds ZIL care to delay block commit,
waiting for more records. waiting for more records.


@ -28,8 +28,9 @@
.\" Copyright 2019 Richard Laager. All rights reserved. .\" Copyright 2019 Richard Laager. All rights reserved.
.\" Copyright 2018 Nexenta Systems, Inc. .\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc. .\" Copyright 2019 Joyent, Inc.
.\" Copyright 2023 Klara, Inc.
.\" .\"
.Dd June 30, 2019 .Dd October 6, 2023
.Dt ZFSCONCEPTS 7 .Dt ZFSCONCEPTS 7
.Os .Os
. .
@ -205,3 +206,40 @@ practices, such as regular backups.
Consider using the Consider using the
.Sy compression .Sy compression
property as a less resource-intensive alternative. property as a less resource-intensive alternative.
.Ss Block cloning
Block cloning is a facility that allows a file (or parts of a file) to be
.Qq cloned ,
that is, a shallow copy made where the existing data blocks are referenced
rather than copied.
Later modifications to the data will cause a copy of the data block to be taken
and that copy modified.
This facility is used to implement
.Qq reflinks
or
.Qq file-level copy-on-write .
.Pp
Cloned blocks are tracked in a special on-disk structure called the Block
Reference Table
.Po BRT
.Pc .
Unlike deduplication, this table has minimal overhead, so it can be enabled at
all times.
.Pp
Also unlike deduplication, cloning must be requested by a user program.
Many common file copying programs, including newer versions of
.Nm /bin/cp ,
will try to create clones automatically.
Look for
.Qq clone ,
.Qq dedupe
or
.Qq reflink
in the documentation for more information.
.Pp
There are some limitations to block cloning.
Only whole blocks can be cloned, and blocks cannot be cloned if they are not
yet written to disk, or if they are encrypted, or the source and destination
.Sy recordsize
properties differ.
The OS may add additional restrictions;
for example, most versions of Linux will not allow clones across datasets.
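As the text above notes, cloning has to be requested by a user program. On Linux, one way a program may make that request is `copy_file_range(2)`, which lets the filesystem share blocks when it is able to and copy them otherwise. The sketch below is an illustration under that assumption, not the method any particular tool uses; `clone_or_copy` is a hypothetical name, and whether blocks end up shared depends on the filesystem, the OS and the limitations listed above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/*
 * Copy `len` bytes from in_fd to out_fd at their current file offsets.
 * copy_file_range(2) gives the filesystem the chance to reference existing
 * blocks; if the call is unavailable or unsupported for this pair of files,
 * fall back to an ordinary read/write copy.
 * Returns 0 on success or an errno value on failure.
 */
static int
clone_or_copy(int in_fd, int out_fd, size_t len)
{
	char buf[65536];

	while (len > 0) {
		ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, len, 0);

		if (n == -1 && (errno == ENOSYS || errno == EXDEV ||
		    errno == EINVAL || errno == EOPNOTSUPP)) {
			/* Plain copy of the next chunk. */
			size_t want = len < sizeof (buf) ? len : sizeof (buf);

			n = read(in_fd, buf, want);
			for (ssize_t off = 0; n > 0 && off < n; ) {
				ssize_t w = write(out_fd, buf + off, n - off);
				if (w == -1)
					return (errno);
				off += w;
			}
		}
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return (errno);
		}
		if (n == 0)
			break;	/* source ended before len bytes */
		len -= (size_t)n;
	}
	return (0);
}
```

The caller cannot tell from the return value whether blocks were shared or copied; the result is the same file contents either way.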


@ -38,7 +38,7 @@
.\" Copyright (c) 2019, Kjeld Schouten-Lebbing .\" Copyright (c) 2019, Kjeld Schouten-Lebbing
.\" Copyright (c) 2022 Hewlett Packard Enterprise Development LP. .\" Copyright (c) 2022 Hewlett Packard Enterprise Development LP.
.\" .\"
.Dd April 18, 2023 .Dd August 8, 2023
.Dt ZFSPROPS 7 .Dt ZFSPROPS 7
.Os .Os
. .
@ -1248,10 +1248,18 @@ Otherwise, they are automatically remounted in the new location if the property
was previously was previously
.Sy legacy .Sy legacy
or or
.Sy none , .Sy none .
or if they were mounted before the property was changed.
In addition, any shared file systems are unshared and shared in the new In addition, any shared file systems are unshared and shared in the new
location. location.
.Pp
When the
.Sy mountpoint
property is set with
.Nm zfs Cm set Fl u ,
the
.Sy mountpoint
property is updated, but the dataset is not mounted or unmounted and remains
as it was before.
.It Sy nbmand Ns = Ns Sy on Ns | Ns Sy off .It Sy nbmand Ns = Ns Sy on Ns | Ns Sy off
Controls whether the file system should be mounted with Controls whether the file system should be mounted with
.Sy nbmand .Sy nbmand
@ -1656,6 +1664,13 @@ by default.
This means that any additional access control This means that any additional access control
(disallow specific user specific access etc) must be done on the underlying file (disallow specific user specific access etc) must be done on the underlying file
system. system.
.Pp
When the
.Sy sharesmb
property is updated with
.Nm zfs Cm set Fl u ,
the property is set to the desired value, but the operation to share, reshare
or unshare the dataset is not performed.
.It Sy sharenfs Ns = Ns Sy on Ns | Ns Sy off Ns | Ns Ar opts .It Sy sharenfs Ns = Ns Sy on Ns | Ns Sy off Ns | Ns Ar opts
Controls whether the file system is shared via NFS, and what options are to be Controls whether the file system is shared via NFS, and what options are to be
used. used.
@ -1699,6 +1714,13 @@ or if they were shared before the property was changed.
If the new property is If the new property is
.Sy off , .Sy off ,
the file systems are unshared. the file systems are unshared.
.Pp
When the
.Sy sharenfs
property is updated with
.Nm zfs Cm set Fl u ,
the property is set to the desired value, but the operation to share, reshare
or unshare the dataset is not performed.
.It Sy logbias Ns = Ns Sy latency Ns | Ns Sy throughput .It Sy logbias Ns = Ns Sy latency Ns | Ns Sy throughput
Provide a hint to ZFS about handling of synchronous requests in this dataset. Provide a hint to ZFS about handling of synchronous requests in this dataset.
If If
@ -1916,13 +1938,15 @@ See
for more information. for more information.
Jails are a Jails are a
.Fx .Fx
feature and are not relevant on other platforms. feature and this property is not available on other platforms.
The default value is .It Sy zoned Ns = Ns Sy off Ns | Ns Sy on
.Sy off .
.It Sy zoned Ns = Ns Sy on Ns | Ns Sy off
Controls whether the dataset is managed from a non-global zone or namespace. Controls whether the dataset is managed from a non-global zone or namespace.
The default value is See
.Sy off . .Xr zfs-zone 8
for more information.
Zoning is a
Linux
feature and this property is not available on other platforms.
.El .El
.Pp .Pp
The following three properties cannot be changed after the file system is The following three properties cannot be changed after the file system is


@ -203,11 +203,9 @@ For more information, see the
section. section.
.El .El
.Pp .Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only Virtual devices cannot be nested arbitrarily.
contain files or disks. A mirror, raidz or draid virtual device can only be created with files or disks.
Mirrors of mirrors Mirrors of mirrors or other such combinations are not allowed.
.Pq or other combinations
are not allowed.
.Pp .Pp
A pool can have any number of virtual devices at the top of the configuration A pool can have any number of virtual devices at the top of the configuration
.Po known as .Po known as


@ -29,7 +29,7 @@
.\" Copyright 2018 Nexenta Systems, Inc. .\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc. .\" Copyright 2019 Joyent, Inc.
.\" .\"
.Dd January 12, 2023 .Dd July 27, 2023
.Dt ZFS-SEND 8 .Dt ZFS-SEND 8
.Os .Os
. .
@ -297,6 +297,12 @@ This flag can only be used in conjunction with
.It Fl v , -verbose .It Fl v , -verbose
Print verbose information about the stream package generated. Print verbose information about the stream package generated.
This information includes a per-second report of how much data has been sent. This information includes a per-second report of how much data has been sent.
The same report can be requested by sending
.Dv SIGINFO
or
.Dv SIGUSR1 ,
regardless of
.Fl v .
.Pp .Pp
The format of the stream is committed. The format of the stream is committed.
You will be able to receive your streams on future versions of ZFS. You will be able to receive your streams on future versions of ZFS.
@ -433,6 +439,12 @@ and the verbose output goes to standard error
.It Fl v , -verbose .It Fl v , -verbose
Print verbose information about the stream package generated. Print verbose information about the stream package generated.
This information includes a per-second report of how much data has been sent. This information includes a per-second report of how much data has been sent.
The same report can be requested by sending
.Dv SIGINFO
or
.Dv SIGUSR1 ,
regardless of
.Fl v .
.El .El
.It Xo .It Xo
.Nm zfs .Nm zfs
@ -669,6 +681,10 @@ ones on the source, and are ready to be used, while the parent snapshot on the
target contains none of the username and password data present on the source, target contains none of the username and password data present on the source,
because it was removed by the redacted send operation. because it was removed by the redacted send operation.
. .
.Sh SIGNALS
See
.Fl v .
.
.Sh EXAMPLES .Sh EXAMPLES
.\" These are, respectively, examples 12, 13 from zfs.8 .\" These are, respectively, examples 12, 13 from zfs.8
.\" Make sure to update them bidirectionally .\" Make sure to update them bidirectionally


@ -39,6 +39,7 @@
.Sh SYNOPSIS .Sh SYNOPSIS
.Nm zfs .Nm zfs
.Cm set .Cm set
.Op Fl u
.Ar property Ns = Ns Ar value Oo Ar property Ns = Ns Ar value Oc Ns .Ar property Ns = Ns Ar value Oo Ar property Ns = Ns Ar value Oc Ns
.Ar filesystem Ns | Ns Ar volume Ns | Ns Ar snapshot Ns .Ar filesystem Ns | Ns Ar volume Ns | Ns Ar snapshot Ns
.Nm zfs .Nm zfs
@ -60,6 +61,7 @@
.It Xo .It Xo
.Nm zfs .Nm zfs
.Cm set .Cm set
.Op Fl u
.Ar property Ns = Ns Ar value Oo Ar property Ns = Ns Ar value Oc Ns .Ar property Ns = Ns Ar value Oo Ar property Ns = Ns Ar value Oc Ns
.Ar filesystem Ns | Ns Ar volume Ns | Ns Ar snapshot Ns .Ar filesystem Ns | Ns Ar volume Ns | Ns Ar snapshot Ns
.Xc .Xc
@ -79,6 +81,11 @@ For more information, see the
.Em User Properties .Em User Properties
section of section of
.Xr zfsprops 7 . .Xr zfsprops 7 .
.Bl -tag -width "-u"
.It Fl u
Update the mountpoint, sharenfs, or sharesmb property, but do not mount or
share the dataset.
.El
.It Xo .It Xo
.Nm zfs .Nm zfs
.Cm get .Cm get


@ -26,7 +26,7 @@
.\" Copyright 2017 Nexenta Systems, Inc. .\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved. .\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\" .\"
.Dd May 27, 2021 .Dd July 11, 2023
.Dt ZPOOL-EVENTS 8 .Dt ZPOOL-EVENTS 8
.Os .Os
. .
@ -305,10 +305,6 @@ The time when a given I/O request was submitted.
The time required to service a given I/O request. The time required to service a given I/O request.
.It Sy prev_state .It Sy prev_state
The previous state of the vdev. The previous state of the vdev.
.It Sy cksum_expected
The expected checksum value for the block.
.It Sy cksum_actual
The actual checksum value for an errant block.
.It Sy cksum_algorithm .It Sy cksum_algorithm
Checksum algorithm used. Checksum algorithm used.
See See
@ -362,23 +358,6 @@ Like
but contains but contains
.Pq Ar good data No & ~( Ns Ar bad data ) ; .Pq Ar good data No & ~( Ns Ar bad data ) ;
that is, the bits set in the good data which are cleared in the bad data. that is, the bits set in the good data which are cleared in the bad data.
.It Sy bad_set_histogram
If this field exists, it is an array of counters.
Each entry counts bits set in a particular bit of a big-endian uint64 type.
The first entry counts bits
set in the high-order bit of the first byte, the 9th byte, etc, and the last
entry counts bits set of the low-order bit of the 8th byte, the 16th byte, etc.
This information is useful for observing a stuck bit in a parallel data path,
such as IDE or parallel SCSI.
.It Sy bad_cleared_histogram
If this field exists, it is an array of counters.
Each entry counts bit clears in a particular bit of a big-endian uint64 type.
The first entry counts bits
clears of the high-order bit of the first byte, the 9th byte, etc, and the
last entry counts clears of the low-order bit of the 8th byte, the 16th byte,
etc.
This information is useful for observing a stuck bit in a parallel data
path, such as IDE or parallel SCSI.
.El .El
. .
.Sh I/O STAGES .Sh I/O STAGES


@ -110,9 +110,10 @@ Removes ZFS label information from the specified
.It Xo .It Xo
.Xr zpool-attach 8 Ns / Ns Xr zpool-detach 8 .Xr zpool-attach 8 Ns / Ns Xr zpool-detach 8
.Xc .Xc
Increases or decreases redundancy by Converts a non-redundant disk into a mirror, or increases
.Cm attach Ns ing or the redundancy level of an existing mirror
.Cm detach Ns ing a device on an existing vdev (virtual device). .Cm ( attach Ns ), or performs the inverse operation (
.Cm detach Ns ).
.It Xo .It Xo
.Xr zpool-add 8 Ns / Ns Xr zpool-remove 8 .Xr zpool-add 8 Ns / Ns Xr zpool-remove 8
.Xc .Xc
@ -233,16 +234,16 @@ Invalid command line options were specified.
.El .El
. .
.Sh EXAMPLES .Sh EXAMPLES
.\" Examples 1, 2, 3, 4, 11, 12 are shared with zpool-create.8. .\" Examples 1, 2, 3, 4, 12, 13 are shared with zpool-create.8.
.\" Examples 5, 13 are shared with zpool-add.8. .\" Examples 6, 14 are shared with zpool-add.8.
.\" Examples 6, 15 are shared with zpool-list.8. .\" Examples 7, 16 are shared with zpool-list.8.
.\" Examples 7 are shared with zpool-destroy.8. .\" Examples 8 are shared with zpool-destroy.8.
.\" Examples 8 are shared with zpool-export.8. .\" Examples 9 are shared with zpool-export.8.
.\" Examples 9 are shared with zpool-import.8. .\" Examples 10 are shared with zpool-import.8.
.\" Examples 10 are shared with zpool-upgrade.8. .\" Examples 11 are shared with zpool-upgrade.8.
.\" Examples 14 are shared with zpool-remove.8. .\" Examples 15 are shared with zpool-remove.8.
.\" Examples 16 are shared with zpool-status.8. .\" Examples 17 are shared with zpool-status.8.
.\" Examples 13, 16 are also shared with zpool-iostat.8. .\" Examples 14, 17 are also shared with zpool-iostat.8.
.\" Make sure to update them omnidirectionally .\" Make sure to update them omnidirectionally
.Ss Example 1 : No Creating a RAID-Z Storage Pool .Ss Example 1 : No Creating a RAID-Z Storage Pool
The following command creates a pool with a single raidz root vdev that The following command creates a pool with a single raidz root vdev that
@ -264,14 +265,21 @@ While not recommended, a pool based on files can be useful for experimental
purposes. purposes.
.Dl # Nm zpool Cm create Ar tank Pa /path/to/file/a /path/to/file/b .Dl # Nm zpool Cm create Ar tank Pa /path/to/file/a /path/to/file/b
. .
.Ss Example 5 : No Adding a Mirror to a ZFS Storage Pool .Ss Example 5 : No Making a non-mirrored ZFS Storage Pool mirrored
The following command converts an existing single device
.Ar sda
into a mirror by attaching a second device to it,
.Ar sdb .
.Dl # Nm zpool Cm attach Ar tank Pa sda sdb
.
.Ss Example 6 : No Adding a Mirror to a ZFS Storage Pool
The following command adds two mirrored disks to the pool The following command adds two mirrored disks to the pool
.Ar tank , .Ar tank ,
assuming the pool is already made up of two-way mirrors. assuming the pool is already made up of two-way mirrors.
The additional space is immediately available to any datasets within the pool. The additional space is immediately available to any datasets within the pool.
.Dl # Nm zpool Cm add Ar tank Sy mirror Pa sda sdb .Dl # Nm zpool Cm add Ar tank Sy mirror Pa sda sdb
. .
.Ss Example 6 : No Listing Available ZFS Storage Pools .Ss Example 7 : No Listing Available ZFS Storage Pools
The following command lists all available pools on the system. The following command lists all available pools on the system.
In this case, the pool In this case, the pool
.Ar zion .Ar zion
@ -285,19 +293,19 @@ tank 61.5G 20.0G 41.5G - 48% 32% 1.00x ONLINE -
zion - - - - - - - FAULTED - zion - - - - - - - FAULTED -
.Ed .Ed
. .
.Ss Example 7 : No Destroying a ZFS Storage Pool .Ss Example 8 : No Destroying a ZFS Storage Pool
The following command destroys the pool The following command destroys the pool
.Ar tank .Ar tank
and any datasets contained within: and any datasets contained within:
.Dl # Nm zpool Cm destroy Fl f Ar tank .Dl # Nm zpool Cm destroy Fl f Ar tank
. .
.Ss Example 8 : No Exporting a ZFS Storage Pool .Ss Example 9 : No Exporting a ZFS Storage Pool
The following command exports the devices in pool The following command exports the devices in pool
.Ar tank .Ar tank
so that they can be relocated or later imported: so that they can be relocated or later imported:
.Dl # Nm zpool Cm export Ar tank .Dl # Nm zpool Cm export Ar tank
. .
.Ss Example 9 : No Importing a ZFS Storage Pool .Ss Example 10 : No Importing a ZFS Storage Pool
The following command displays available pools, and then imports the pool The following command displays available pools, and then imports the pool
.Ar tank .Ar tank
for use on the system. for use on the system.
@ -318,7 +326,7 @@ config:
.No # Nm zpool Cm import Ar tank .No # Nm zpool Cm import Ar tank
.Ed .Ed
. .
.Ss Example 10 : No Upgrading All ZFS Storage Pools to the Current Version .Ss Example 11 : No Upgrading All ZFS Storage Pools to the Current Version
The following command upgrades all ZFS Storage pools to the current version of The following command upgrades all ZFS Storage pools to the current version of
the software: the software:
.Bd -literal -compact -offset Ds .Bd -literal -compact -offset Ds
@ -326,7 +334,7 @@ the software:
This system is currently running ZFS version 2. This system is currently running ZFS version 2.
.Ed .Ed
. .
.Ss Example 11 : No Managing Hot Spares .Ss Example 12 : No Managing Hot Spares
The following command creates a new pool with an available hot spare: The following command creates a new pool with an available hot spare:
.Dl # Nm zpool Cm create Ar tank Sy mirror Pa sda sdb Sy spare Pa sdc .Dl # Nm zpool Cm create Ar tank Sy mirror Pa sda sdb Sy spare Pa sdc
.Pp .Pp
@ -341,12 +349,12 @@ The hot spare can be permanently removed from the pool using the following
command: command:
.Dl # Nm zpool Cm remove Ar tank Pa sdc .Dl # Nm zpool Cm remove Ar tank Pa sdc
. .
.Ss Example 12 : No Creating a ZFS Pool with Mirrored Separate Intent Logs .Ss Example 13 : No Creating a ZFS Pool with Mirrored Separate Intent Logs
The following command creates a ZFS storage pool consisting of two, two-way The following command creates a ZFS storage pool consisting of two, two-way
mirrors and mirrored log devices: mirrors and mirrored log devices:
.Dl # Nm zpool Cm create Ar pool Sy mirror Pa sda sdb Sy mirror Pa sdc sdd Sy log mirror Pa sde sdf .Dl # Nm zpool Cm create Ar pool Sy mirror Pa sda sdb Sy mirror Pa sdc sdd Sy log mirror Pa sde sdf
. .
.Ss Example 13 : No Adding Cache Devices to a ZFS Pool .Ss Example 14 : No Adding Cache Devices to a ZFS Pool
The following command adds two disks for use as cache devices to a ZFS storage The following command adds two disks for use as cache devices to a ZFS storage
pool: pool:
.Dl # Nm zpool Cm add Ar pool Sy cache Pa sdc sdd .Dl # Nm zpool Cm add Ar pool Sy cache Pa sdc sdd
@ -359,7 +367,7 @@ Capacity and reads can be monitored using the
subcommand as follows: subcommand as follows:
.Dl # Nm zpool Cm iostat Fl v Ar pool 5 .Dl # Nm zpool Cm iostat Fl v Ar pool 5
. .
.Ss Example 14 : No Removing a Mirrored top-level (Log or Data) Device .Ss Example 15 : No Removing a Mirrored top-level (Log or Data) Device
The following commands remove the mirrored log device The following commands remove the mirrored log device
.Sy mirror-2 .Sy mirror-2
and mirrored top-level data device and mirrored top-level data device
@ -394,7 +402,7 @@ The command to remove the mirrored data
.Ar mirror-1 No is : .Ar mirror-1 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-1 .Dl # Nm zpool Cm remove Ar tank mirror-1
. .
.Ss Example 15 : No Displaying expanded space on a device .Ss Example 16 : No Displaying expanded space on a device
The following command displays the detailed information for the pool The following command displays the detailed information for the pool
.Ar data . .Ar data .
This pool is comprised of a single raidz vdev where one of its devices This pool is comprised of a single raidz vdev where one of its devices
@ -411,7 +419,7 @@ data 23.9G 14.6G 9.30G - 48% 61% 1.00x ONLINE -
sdc - - - - - sdc - - - - -
.Ed .Ed
. .
.Ss Example 16 : No Adding output columns .Ss Example 17 : No Adding output columns
Additional columns can be added to the Additional columns can be added to the
.Nm zpool Cm status No and Nm zpool Cm iostat No output with Fl c . .Nm zpool Cm status No and Nm zpool Cm iostat No output with Fl c .
.Bd -literal -compact -offset Ds .Bd -literal -compact -offset Ds


@ -461,6 +461,7 @@ ZFS_OBJS_OS := \
zpl_ctldir.o \ zpl_ctldir.o \
zpl_export.o \ zpl_export.o \
zpl_file.o \ zpl_file.o \
zpl_file_range.o \
zpl_inode.o \ zpl_inode.o \
zpl_super.o \ zpl_super.o \
zpl_xattr.o \ zpl_xattr.o \


@ -168,4 +168,4 @@ gen-zstd-symbols:
for obj in $(addprefix zstd/,$(ZSTD_UPSTREAM_OBJS)); do echo; echo "/* $${obj#zstd/}: */"; @OBJDUMP@ -t $$obj | awk '$$2 == "g" && !/ zfs_/ {print "#define\t" $$6 " zfs_" $$6}' | sort; done >> zstd/include/zstd_compat_wrapper.h for obj in $(addprefix zstd/,$(ZSTD_UPSTREAM_OBJS)); do echo; echo "/* $${obj#zstd/}: */"; @OBJDUMP@ -t $$obj | awk '$$2 == "g" && !/ zfs_/ {print "#define\t" $$6 " zfs_" $$6}' | sort; done >> zstd/include/zstd_compat_wrapper.h
check-zstd-symbols: check-zstd-symbols:
@OBJDUMP@ -t $(addprefix zstd/,$(ZSTD_UPSTREAM_OBJS)) | awk '/file format/ {print} $$2 == "g" && !/ zfs_/ {++ret; print} END {exit ret}' @OBJDUMP@ -t $(addprefix zstd/,$(ZSTD_UPSTREAM_OBJS)) | awk '/file format/ {print} $$2 == "g" && (!/ zfs_/ && !/ __pfx_zfs_/) {++ret; print} END {exit ret}'


@ -49,6 +49,7 @@
.type zfs_sha256_block_armv7,%function .type zfs_sha256_block_armv7,%function
.align 6 .align 6
zfs_sha256_block_armv7: zfs_sha256_block_armv7:
hint #34 // bti c
stp x29,x30,[sp,#-128]! stp x29,x30,[sp,#-128]!
add x29,sp,#0 add x29,sp,#0
@ -1015,6 +1016,7 @@ zfs_sha256_block_armv7:
.type zfs_sha256_block_armv8,%function .type zfs_sha256_block_armv8,%function
.align 6 .align 6
zfs_sha256_block_armv8: zfs_sha256_block_armv8:
hint #34 // bti c
.Lv8_entry: .Lv8_entry:
stp x29,x30,[sp,#-16]! stp x29,x30,[sp,#-16]!
add x29,sp,#0 add x29,sp,#0
@ -1155,6 +1157,7 @@ zfs_sha256_block_armv8:
.type zfs_sha256_block_neon,%function .type zfs_sha256_block_neon,%function
.align 4 .align 4
zfs_sha256_block_neon: zfs_sha256_block_neon:
hint #34 // bti c
.Lneon_entry: .Lneon_entry:
stp x29, x30, [sp, #-16]! stp x29, x30, [sp, #-16]!
mov x29, sp mov x29, sp


@ -73,6 +73,7 @@
.type zfs_sha512_block_armv7,%function .type zfs_sha512_block_armv7,%function
.align 6 .align 6
zfs_sha512_block_armv7: zfs_sha512_block_armv7:
hint #34 // bti c
stp x29,x30,[sp,#-128]! stp x29,x30,[sp,#-128]!
add x29,sp,#0 add x29,sp,#0
@ -1040,6 +1041,7 @@ zfs_sha512_block_armv7:
.type zfs_sha512_block_armv8,%function .type zfs_sha512_block_armv8,%function
.align 6 .align 6
zfs_sha512_block_armv8: zfs_sha512_block_armv8:
hint #34 // bti c
.Lv8_entry: .Lv8_entry:
// Armv8.3-A PAuth: even though x30 is pushed to stack it is not popped later // Armv8.3-A PAuth: even though x30 is pushed to stack it is not popped later
stp x29,x30,[sp,#-16]! stp x29,x30,[sp,#-16]!


@ -596,28 +596,6 @@ SYSCTL_UINT(_vfs_zfs_metaslab, OID_AUTO, df_free_pct,
" space map to continue allocations in a first-fit fashion"); " space map to continue allocations in a first-fit fashion");
/* END CSTYLED */ /* END CSTYLED */
/*
* Percentage of all cpus that can be used by the metaslab taskq.
*/
extern int metaslab_load_pct;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs_metaslab, OID_AUTO, load_pct,
CTLFLAG_RWTUN, &metaslab_load_pct, 0,
"Percentage of cpus that can be used by the metaslab taskq");
/* END CSTYLED */
/*
* Max number of metaslabs per group to preload.
*/
extern uint_t metaslab_preload_limit;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs_metaslab, OID_AUTO, preload_limit,
CTLFLAG_RWTUN, &metaslab_preload_limit, 0,
"Max number of metaslabs per group to preload");
/* END CSTYLED */
/* mmp.c */ /* mmp.c */
int int


@ -1154,7 +1154,6 @@ zfsvfs_free(zfsvfs_t *zfsvfs)
mutex_destroy(&zfsvfs->z_znodes_lock); mutex_destroy(&zfsvfs->z_znodes_lock);
mutex_destroy(&zfsvfs->z_lock); mutex_destroy(&zfsvfs->z_lock);
ASSERT3U(zfsvfs->z_nr_znodes, ==, 0);
list_destroy(&zfsvfs->z_all_znodes); list_destroy(&zfsvfs->z_all_znodes);
ZFS_TEARDOWN_DESTROY(zfsvfs); ZFS_TEARDOWN_DESTROY(zfsvfs);
ZFS_TEARDOWN_INACTIVE_DESTROY(zfsvfs); ZFS_TEARDOWN_INACTIVE_DESTROY(zfsvfs);
@ -1558,12 +1557,11 @@ zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting)
* may add the parents of dir-based xattrs to the taskq * may add the parents of dir-based xattrs to the taskq
* so we want to wait for these. * so we want to wait for these.
* *
* We can safely read z_nr_znodes without locking because the * We can safely check z_all_znodes for being empty because the
* VFS has already blocked operations which add to the * VFS has already blocked operations which add to it.
* z_all_znodes list and thus increment z_nr_znodes.
*/ */
int round = 0; int round = 0;
while (zfsvfs->z_nr_znodes > 0) { while (!list_is_empty(&zfsvfs->z_all_znodes)) {
taskq_wait_outstanding(dsl_pool_zrele_taskq( taskq_wait_outstanding(dsl_pool_zrele_taskq(
dmu_objset_pool(zfsvfs->z_os)), 0); dmu_objset_pool(zfsvfs->z_os)), 0);
if (++round > 1 && !unmounting) if (++round > 1 && !unmounting)
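The hunk above replaces a separately maintained counter with an emptiness check on the list itself. As a rough illustration of why that works, here is a minimal userspace sketch of a circular list with a sentinel head. All names (`sketch_list_*`, `struct sketch_node`) are hypothetical and this is not the ZFS `list_t` implementation; it only shows that emptiness can be read from the list structure without a parallel count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type; the real code links znodes through list_t. */
struct sketch_node {
	struct sketch_node *prev;
	struct sketch_node *next;
};

/* A sentinel head that points at itself represents the empty list. */
static void
sketch_list_init(struct sketch_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Link n in just before the sentinel, i.e. at the tail. */
static void
sketch_list_insert_tail(struct sketch_node *head, struct sketch_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n; the node must currently be on a list. */
static void
sketch_list_remove(struct sketch_node *n)
{
	assert(n->prev != NULL && n->next != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = NULL;
	n->next = NULL;
}

/* O(1) emptiness test that needs no separate element counter. */
static bool
sketch_list_is_empty(const struct sketch_node *head)
{
	return (head->next == head);
}
```

Because every insert and remove already updates the links, there is no second piece of state that could drift out of sync with the list, which is presumably the motivation for dropping the counter.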


@ -6263,7 +6263,8 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
goto bad_write_fallback; goto bad_write_fallback;
} }
} else { } else {
#if __FreeBSD_version >= 1400086 #if (__FreeBSD_version >= 1302506 && __FreeBSD_version < 1400000) || \
__FreeBSD_version >= 1400086
vn_lock_pair(invp, false, LK_EXCLUSIVE, outvp, false, vn_lock_pair(invp, false, LK_EXCLUSIVE, outvp, false,
LK_EXCLUSIVE); LK_EXCLUSIVE);
#else #else
@ -6289,7 +6290,8 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp), error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
ap->a_outoffp, &len, ap->a_outcred); ap->a_outoffp, &len, ap->a_outcred);
if (error == EXDEV) if (error == EXDEV || error == EAGAIN || error == EINVAL ||
error == EOPNOTSUPP)
goto bad_locked_fallback; goto bad_locked_fallback;
*ap->a_lenp = (size_t)len; *ap->a_lenp = (size_t)len;
out_locked: out_locked:


@ -537,7 +537,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
mutex_enter(&zfsvfs->z_znodes_lock); mutex_enter(&zfsvfs->z_znodes_lock);
list_insert_tail(&zfsvfs->z_all_znodes, zp); list_insert_tail(&zfsvfs->z_all_znodes, zp);
zfsvfs->z_nr_znodes++;
zp->z_zfsvfs = zfsvfs; zp->z_zfsvfs = zfsvfs;
mutex_exit(&zfsvfs->z_znodes_lock); mutex_exit(&zfsvfs->z_znodes_lock);
@ -1286,7 +1285,6 @@ zfs_znode_free(znode_t *zp)
mutex_enter(&zfsvfs->z_znodes_lock); mutex_enter(&zfsvfs->z_znodes_lock);
POINTER_INVALIDATE(&zp->z_zfsvfs); POINTER_INVALIDATE(&zp->z_zfsvfs);
list_remove(&zfsvfs->z_all_znodes, zp); list_remove(&zfsvfs->z_all_znodes, zp);
zfsvfs->z_nr_znodes--;
mutex_exit(&zfsvfs->z_znodes_lock); mutex_exit(&zfsvfs->z_znodes_lock);
#if __FreeBSD_version >= 1300139 #if __FreeBSD_version >= 1300139


@ -47,6 +47,10 @@ static unsigned long table_min = 0;
static unsigned long table_max = ~0; static unsigned long table_max = ~0;
static struct ctl_table_header *spl_header = NULL; static struct ctl_table_header *spl_header = NULL;
#ifndef HAVE_REGISTER_SYSCTL_TABLE
static struct ctl_table_header *spl_kmem = NULL;
static struct ctl_table_header *spl_kstat = NULL;
#endif
static struct proc_dir_entry *proc_spl = NULL; static struct proc_dir_entry *proc_spl = NULL;
static struct proc_dir_entry *proc_spl_kmem = NULL; static struct proc_dir_entry *proc_spl_kmem = NULL;
static struct proc_dir_entry *proc_spl_kmem_slab = NULL; static struct proc_dir_entry *proc_spl_kmem_slab = NULL;
@ -624,6 +628,7 @@ static struct ctl_table spl_table[] = {
.mode = 0644, .mode = 0644,
.proc_handler = &proc_dohostid, .proc_handler = &proc_dohostid,
}, },
#ifdef HAVE_REGISTER_SYSCTL_TABLE
{ {
.procname = "kmem", .procname = "kmem",
.mode = 0555, .mode = 0555,
@ -634,9 +639,11 @@ static struct ctl_table spl_table[] = {
.mode = 0555, .mode = 0555,
.child = spl_kstat_table, .child = spl_kstat_table,
}, },
#endif
{}, {},
}; };
#ifdef HAVE_REGISTER_SYSCTL_TABLE
static struct ctl_table spl_dir[] = { static struct ctl_table spl_dir[] = {
{ {
.procname = "spl", .procname = "spl",
@ -654,15 +661,58 @@ static struct ctl_table spl_root[] = {
}, },
{} {}
}; };
#endif
static void spl_proc_cleanup(void)
{
remove_proc_entry("kstat", proc_spl);
remove_proc_entry("slab", proc_spl_kmem);
remove_proc_entry("kmem", proc_spl);
remove_proc_entry("taskq-all", proc_spl);
remove_proc_entry("taskq", proc_spl);
remove_proc_entry("spl", NULL);
#ifndef HAVE_REGISTER_SYSCTL_TABLE
if (spl_kstat) {
unregister_sysctl_table(spl_kstat);
spl_kstat = NULL;
}
if (spl_kmem) {
unregister_sysctl_table(spl_kmem);
spl_kmem = NULL;
}
#endif
if (spl_header) {
unregister_sysctl_table(spl_header);
spl_header = NULL;
}
}
int int
spl_proc_init(void) spl_proc_init(void)
{ {
int rc = 0; int rc = 0;
#ifdef HAVE_REGISTER_SYSCTL_TABLE
spl_header = register_sysctl_table(spl_root); spl_header = register_sysctl_table(spl_root);
if (spl_header == NULL) if (spl_header == NULL)
return (-EUNATCH); return (-EUNATCH);
#else
spl_header = register_sysctl("kernel/spl", spl_table);
if (spl_header == NULL)
return (-EUNATCH);
spl_kmem = register_sysctl("kernel/spl/kmem", spl_kmem_table);
if (spl_kmem == NULL) {
rc = -EUNATCH;
goto out;
}
spl_kstat = register_sysctl("kernel/spl/kstat", spl_kstat_table);
if (spl_kstat == NULL) {
rc = -EUNATCH;
goto out;
}
#endif
proc_spl = proc_mkdir("spl", NULL); proc_spl = proc_mkdir("spl", NULL);
if (proc_spl == NULL) { if (proc_spl == NULL) {
@ -703,15 +753,8 @@ spl_proc_init(void)
goto out; goto out;
} }
out: out:
if (rc) { if (rc)
remove_proc_entry("kstat", proc_spl); spl_proc_cleanup();
remove_proc_entry("slab", proc_spl_kmem);
remove_proc_entry("kmem", proc_spl);
remove_proc_entry("taskq-all", proc_spl);
remove_proc_entry("taskq", proc_spl);
remove_proc_entry("spl", NULL);
unregister_sysctl_table(spl_header);
}
return (rc); return (rc);
} }
@ -719,13 +762,5 @@ out:
void void
spl_proc_fini(void) spl_proc_fini(void)
{ {
remove_proc_entry("kstat", proc_spl); spl_proc_cleanup();
remove_proc_entry("slab", proc_spl_kmem);
remove_proc_entry("kmem", proc_spl);
remove_proc_entry("taskq-all", proc_spl);
remove_proc_entry("taskq", proc_spl);
remove_proc_entry("spl", NULL);
ASSERT(spl_header != NULL);
unregister_sysctl_table(spl_header);
} }
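The change above folds the error path of `spl_proc_init()` and the body of `spl_proc_fini()` into one `spl_proc_cleanup()` that checks each handle and clears it after release. A minimal userspace sketch of that pattern follows. The names (`sketch_state`, `sketch_init`, `sketch_cleanup`) and the use of `malloc()`/`free()` in place of sysctl registration are assumptions for illustration only, and unlike the diff the sketch routes every failure through the `out:` label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three sysctl table headers. */
struct sketch_state {
	void *header;
	void *kmem;
	void *kstat;
	int releases;	/* counts real releases, for demonstration only */
};

/* Release one handle if set, then clear it so a repeat call is a no-op. */
static void
sketch_release(struct sketch_state *st, void **handle)
{
	if (*handle != NULL) {
		free(*handle);
		*handle = NULL;
		st->releases++;
	}
}

/* Shared by the init error path and by teardown; safe to call twice. */
static void
sketch_cleanup(struct sketch_state *st)
{
	assert(st != NULL);
	sketch_release(st, &st->kstat);
	sketch_release(st, &st->kmem);
	sketch_release(st, &st->header);
}

/* fail_at: 0 means succeed; 1, 2 or 3 simulates a failure at that step. */
static int
sketch_init(struct sketch_state *st, int fail_at)
{
	int rc = 0;

	st->header = st->kmem = st->kstat = NULL;
	st->releases = 0;

	if (fail_at == 1 || (st->header = malloc(1)) == NULL) {
		rc = -1;
		goto out;
	}
	if (fail_at == 2 || (st->kmem = malloc(1)) == NULL) {
		rc = -1;
		goto out;
	}
	if (fail_at == 3 || (st->kstat = malloc(1)) == NULL) {
		rc = -1;
		goto out;
	}
out:
	if (rc)
		sketch_cleanup(st);
	return (rc);
}
```

Clearing each handle after release is what lets a single cleanup routine serve both a partially completed init and a normal teardown.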


@ -193,7 +193,9 @@ qat_dc_init(void)
sd.huffType = CPA_DC_HT_FULL_DYNAMIC; sd.huffType = CPA_DC_HT_FULL_DYNAMIC;
sd.sessDirection = CPA_DC_DIR_COMBINED; sd.sessDirection = CPA_DC_DIR_COMBINED;
sd.sessState = CPA_DC_STATELESS; sd.sessState = CPA_DC_STATELESS;
#if (CPA_DC_API_VERSION_NUM_MAJOR == 1 && CPA_DC_API_VERSION_NUM_MINOR < 6)
sd.deflateWindowSize = 7; sd.deflateWindowSize = 7;
#endif
sd.checksum = CPA_DC_ADLER32; sd.checksum = CPA_DC_ADLER32;
status = cpaDcGetSessionSize(dc_inst_handles[i], status = cpaDcGetSessionSize(dc_inst_handles[i],
&sd, &sess_size, &ctx_size); &sd, &sess_size, &ctx_size);


@ -80,9 +80,22 @@ typedef struct dio_request {
static unsigned int zfs_vdev_failfast_mask = 1; static unsigned int zfs_vdev_failfast_mask = 1;
#ifdef HAVE_BLK_MODE_T
static blk_mode_t
#else
static fmode_t static fmode_t
#endif
vdev_bdev_mode(spa_mode_t spa_mode) vdev_bdev_mode(spa_mode_t spa_mode)
{ {
#ifdef HAVE_BLK_MODE_T
blk_mode_t mode = 0;
if (spa_mode & SPA_MODE_READ)
mode |= BLK_OPEN_READ;
if (spa_mode & SPA_MODE_WRITE)
mode |= BLK_OPEN_WRITE;
#else
fmode_t mode = 0; fmode_t mode = 0;
if (spa_mode & SPA_MODE_READ) if (spa_mode & SPA_MODE_READ)
@ -90,6 +103,7 @@ vdev_bdev_mode(spa_mode_t spa_mode)
if (spa_mode & SPA_MODE_WRITE) if (spa_mode & SPA_MODE_WRITE)
mode |= FMODE_WRITE; mode |= FMODE_WRITE;
#endif
return (mode); return (mode);
} }
@ -197,12 +211,47 @@ vdev_disk_kobj_evt_post(vdev_t *v)
} }
} }
#if !defined(HAVE_BLKDEV_GET_BY_PATH_4ARG)
/*
* Define a dummy struct blk_holder_ops for kernel versions
* prior to 6.5.
*/
struct blk_holder_ops {};
#endif
static struct block_device *
vdev_blkdev_get_by_path(const char *path, spa_mode_t mode, void *holder,
const struct blk_holder_ops *hops)
{
#ifdef HAVE_BLKDEV_GET_BY_PATH_4ARG
return (blkdev_get_by_path(path,
vdev_bdev_mode(mode) | BLK_OPEN_EXCL, holder, hops));
#else
return (blkdev_get_by_path(path,
vdev_bdev_mode(mode) | FMODE_EXCL, holder));
#endif
}
static void
vdev_blkdev_put(struct block_device *bdev, spa_mode_t mode, void *holder)
{
#ifdef HAVE_BLKDEV_PUT_HOLDER
return (blkdev_put(bdev, holder));
#else
return (blkdev_put(bdev, vdev_bdev_mode(mode) | FMODE_EXCL));
#endif
}
static int static int
vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize, vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
uint64_t *logical_ashift, uint64_t *physical_ashift) uint64_t *logical_ashift, uint64_t *physical_ashift)
{ {
struct block_device *bdev; struct block_device *bdev;
#ifdef HAVE_BLK_MODE_T
blk_mode_t mode = vdev_bdev_mode(spa_mode(v->vdev_spa));
#else
fmode_t mode = vdev_bdev_mode(spa_mode(v->vdev_spa)); fmode_t mode = vdev_bdev_mode(spa_mode(v->vdev_spa));
#endif
hrtime_t timeout = MSEC2NSEC(zfs_vdev_open_timeout_ms); hrtime_t timeout = MSEC2NSEC(zfs_vdev_open_timeout_ms);
vdev_disk_t *vd; vdev_disk_t *vd;
@ -252,15 +301,15 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
reread_part = B_TRUE; reread_part = B_TRUE;
} }
blkdev_put(bdev, mode | FMODE_EXCL); vdev_blkdev_put(bdev, mode, zfs_vdev_holder);
} }
if (reread_part) { if (reread_part) {
bdev = blkdev_get_by_path(disk_name, mode | FMODE_EXCL, bdev = vdev_blkdev_get_by_path(disk_name, mode,
zfs_vdev_holder); zfs_vdev_holder, NULL);
if (!IS_ERR(bdev)) { if (!IS_ERR(bdev)) {
int error = vdev_bdev_reread_part(bdev); int error = vdev_bdev_reread_part(bdev);
blkdev_put(bdev, mode | FMODE_EXCL); vdev_blkdev_put(bdev, mode, zfs_vdev_holder);
if (error == 0) { if (error == 0) {
timeout = MSEC2NSEC( timeout = MSEC2NSEC(
zfs_vdev_open_timeout_ms * 2); zfs_vdev_open_timeout_ms * 2);
@ -305,8 +354,8 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
hrtime_t start = gethrtime(); hrtime_t start = gethrtime();
bdev = ERR_PTR(-ENXIO); bdev = ERR_PTR(-ENXIO);
while (IS_ERR(bdev) && ((gethrtime() - start) < timeout)) { while (IS_ERR(bdev) && ((gethrtime() - start) < timeout)) {
bdev = blkdev_get_by_path(v->vdev_path, mode | FMODE_EXCL, bdev = vdev_blkdev_get_by_path(v->vdev_path, mode,
zfs_vdev_holder); zfs_vdev_holder, NULL);
if (unlikely(PTR_ERR(bdev) == -ENOENT)) { if (unlikely(PTR_ERR(bdev) == -ENOENT)) {
/* /*
* There is no point of waiting since device is removed * There is no point of waiting since device is removed
@ -382,8 +431,8 @@ vdev_disk_close(vdev_t *v)
return; return;
if (vd->vd_bdev != NULL) { if (vd->vd_bdev != NULL) {
blkdev_put(vd->vd_bdev, vdev_blkdev_put(vd->vd_bdev, spa_mode(v->vdev_spa),
vdev_bdev_mode(spa_mode(v->vdev_spa)) | FMODE_EXCL); zfs_vdev_holder);
} }
rw_destroy(&vd->vd_lock); rw_destroy(&vd->vd_lock);


@ -478,16 +478,18 @@ zfsctl_is_snapdir(struct inode *ip)
*/ */
static struct inode * static struct inode *
zfsctl_inode_alloc(zfsvfs_t *zfsvfs, uint64_t id, zfsctl_inode_alloc(zfsvfs_t *zfsvfs, uint64_t id,
const struct file_operations *fops, const struct inode_operations *ops) const struct file_operations *fops, const struct inode_operations *ops,
uint64_t creation)
{ {
inode_timespec_t now;
struct inode *ip; struct inode *ip;
znode_t *zp; znode_t *zp;
inode_timespec_t now = {.tv_sec = creation};
ip = new_inode(zfsvfs->z_sb); ip = new_inode(zfsvfs->z_sb);
if (ip == NULL) if (ip == NULL)
return (NULL); return (NULL);
if (!creation)
now = current_time(ip); now = current_time(ip);
zp = ITOZ(ip); zp = ITOZ(ip);
ASSERT3P(zp->z_dirlocks, ==, NULL); ASSERT3P(zp->z_dirlocks, ==, NULL);
@ -535,7 +537,6 @@ zfsctl_inode_alloc(zfsvfs_t *zfsvfs, uint64_t id,
mutex_enter(&zfsvfs->z_znodes_lock); mutex_enter(&zfsvfs->z_znodes_lock);
list_insert_tail(&zfsvfs->z_all_znodes, zp); list_insert_tail(&zfsvfs->z_all_znodes, zp);
zfsvfs->z_nr_znodes++;
membar_producer(); membar_producer();
mutex_exit(&zfsvfs->z_znodes_lock); mutex_exit(&zfsvfs->z_znodes_lock);
@ -552,14 +553,28 @@ zfsctl_inode_lookup(zfsvfs_t *zfsvfs, uint64_t id,
const struct file_operations *fops, const struct inode_operations *ops) const struct file_operations *fops, const struct inode_operations *ops)
{ {
struct inode *ip = NULL; struct inode *ip = NULL;
uint64_t creation = 0;
dsl_dataset_t *snap_ds;
dsl_pool_t *pool;
while (ip == NULL) { while (ip == NULL) {
ip = ilookup(zfsvfs->z_sb, (unsigned long)id); ip = ilookup(zfsvfs->z_sb, (unsigned long)id);
if (ip) if (ip)
break; break;
if (id <= ZFSCTL_INO_SNAPDIRS && !creation) {
pool = dmu_objset_pool(zfsvfs->z_os);
dsl_pool_config_enter(pool, FTAG);
if (!dsl_dataset_hold_obj(pool,
ZFSCTL_INO_SNAPDIRS - id, FTAG, &snap_ds)) {
creation = dsl_get_creation(snap_ds);
dsl_dataset_rele(snap_ds, FTAG);
}
dsl_pool_config_exit(pool, FTAG);
}
/* May fail due to concurrent zfsctl_inode_alloc() */ /* May fail due to concurrent zfsctl_inode_alloc() */
ip = zfsctl_inode_alloc(zfsvfs, id, fops, ops); ip = zfsctl_inode_alloc(zfsvfs, id, fops, ops, creation);
} }
return (ip); return (ip);
@ -581,7 +596,7 @@ zfsctl_create(zfsvfs_t *zfsvfs)
ASSERT(zfsvfs->z_ctldir == NULL); ASSERT(zfsvfs->z_ctldir == NULL);
zfsvfs->z_ctldir = zfsctl_inode_alloc(zfsvfs, ZFSCTL_INO_ROOT, zfsvfs->z_ctldir = zfsctl_inode_alloc(zfsvfs, ZFSCTL_INO_ROOT,
&zpl_fops_root, &zpl_ops_root); &zpl_fops_root, &zpl_ops_root, 0);
if (zfsvfs->z_ctldir == NULL) if (zfsvfs->z_ctldir == NULL)
return (SET_ERROR(ENOENT)); return (SET_ERROR(ENOENT));


@ -1330,12 +1330,11 @@ zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting)
* may add the parents of dir-based xattrs to the taskq * may add the parents of dir-based xattrs to the taskq
* so we want to wait for these. * so we want to wait for these.
* *
* We can safely read z_nr_znodes without locking because the * We can safely check z_all_znodes for being empty because the
* VFS has already blocked operations which add to the * VFS has already blocked operations which add to it.
* z_all_znodes list and thus increment z_nr_znodes.
*/ */
int round = 0; int round = 0;
while (zfsvfs->z_nr_znodes > 0) { while (!list_is_empty(&zfsvfs->z_all_znodes)) {
taskq_wait_outstanding(dsl_pool_zrele_taskq( taskq_wait_outstanding(dsl_pool_zrele_taskq(
dmu_objset_pool(zfsvfs->z_os)), 0); dmu_objset_pool(zfsvfs->z_os)), 0);
if (++round > 1 && !unmounting) if (++round > 1 && !unmounting)
@ -1662,6 +1661,7 @@ zfs_umount(struct super_block *sb)
} }
zfsvfs_free(zfsvfs); zfsvfs_free(zfsvfs);
sb->s_fs_info = NULL;
return (0); return (0);
} }
@ -2091,6 +2091,9 @@ zfs_init(void)
zfs_znode_init(); zfs_znode_init();
dmu_objset_register_type(DMU_OST_ZFS, zpl_get_file_info); dmu_objset_register_type(DMU_OST_ZFS, zpl_get_file_info);
register_filesystem(&zpl_fs_type); register_filesystem(&zpl_fs_type);
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
register_fo_extend(&zpl_file_operations);
#endif
} }
void void
@ -2101,6 +2104,9 @@ zfs_fini(void)
*/ */
taskq_wait(system_delay_taskq); taskq_wait(system_delay_taskq);
taskq_wait(system_taskq); taskq_wait(system_taskq);
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
unregister_fo_extend(&zpl_file_operations);
#endif
unregister_filesystem(&zpl_fs_type); unregister_filesystem(&zpl_fs_type);
zfs_znode_fini(); zfs_znode_fini();
zfsctl_fini(); zfsctl_fini();


@ -186,7 +186,7 @@ zfs_open(struct inode *ip, int mode, int flag, cred_t *cr)
return (error); return (error);
/* Honor ZFS_APPENDONLY file attribute */ /* Honor ZFS_APPENDONLY file attribute */
if ((mode & FMODE_WRITE) && (zp->z_pflags & ZFS_APPENDONLY) && if (blk_mode_is_open_write(mode) && (zp->z_pflags & ZFS_APPENDONLY) &&
((flag & O_APPEND) == 0)) { ((flag & O_APPEND) == 0)) {
zfs_exit(zfsvfs, FTAG); zfs_exit(zfsvfs, FTAG);
return (SET_ERROR(EPERM)); return (SET_ERROR(EPERM));


@ -390,7 +390,6 @@ zfs_inode_destroy(struct inode *ip)
mutex_enter(&zfsvfs->z_znodes_lock); mutex_enter(&zfsvfs->z_znodes_lock);
if (list_link_active(&zp->z_link_node)) { if (list_link_active(&zp->z_link_node)) {
list_remove(&zfsvfs->z_all_znodes, zp); list_remove(&zfsvfs->z_all_znodes, zp);
zfsvfs->z_nr_znodes--;
} }
mutex_exit(&zfsvfs->z_znodes_lock); mutex_exit(&zfsvfs->z_znodes_lock);
@ -415,7 +414,11 @@ zfs_inode_set_ops(zfsvfs_t *zfsvfs, struct inode *ip)
switch (ip->i_mode & S_IFMT) { switch (ip->i_mode & S_IFMT) {
case S_IFREG: case S_IFREG:
ip->i_op = &zpl_inode_operations; ip->i_op = &zpl_inode_operations;
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
ip->i_fop = &zpl_file_operations.kabi_fops;
#else
ip->i_fop = &zpl_file_operations; ip->i_fop = &zpl_file_operations;
#endif
ip->i_mapping->a_ops = &zpl_address_space_operations; ip->i_mapping->a_ops = &zpl_address_space_operations;
break; break;
@ -455,7 +458,11 @@ zfs_inode_set_ops(zfsvfs_t *zfsvfs, struct inode *ip)
/* Assume the inode is a file and attempt to continue */ /* Assume the inode is a file and attempt to continue */
ip->i_mode = S_IFREG | 0644; ip->i_mode = S_IFREG | 0644;
ip->i_op = &zpl_inode_operations; ip->i_op = &zpl_inode_operations;
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
ip->i_fop = &zpl_file_operations.kabi_fops;
#else
ip->i_fop = &zpl_file_operations; ip->i_fop = &zpl_file_operations;
#endif
ip->i_mapping->a_ops = &zpl_address_space_operations; ip->i_mapping->a_ops = &zpl_address_space_operations;
break; break;
} }
@ -633,7 +640,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
mutex_enter(&zfsvfs->z_znodes_lock); mutex_enter(&zfsvfs->z_znodes_lock);
list_insert_tail(&zfsvfs->z_all_znodes, zp); list_insert_tail(&zfsvfs->z_all_znodes, zp);
zfsvfs->z_nr_znodes++;
mutex_exit(&zfsvfs->z_znodes_lock); mutex_exit(&zfsvfs->z_znodes_lock);
if (links > 0) if (links > 0)


@ -42,7 +42,7 @@
static int static int
zpl_common_open(struct inode *ip, struct file *filp) zpl_common_open(struct inode *ip, struct file *filp)
{ {
if (filp->f_mode & FMODE_WRITE) if (blk_mode_is_open_write(filp->f_mode))
return (-EACCES); return (-EACCES);
return (generic_file_open(ip, filp)); return (generic_file_open(ip, filp));

View File

@ -301,15 +301,10 @@ zpl_uio_init(zfs_uio_t *uio, struct kiocb *kiocb, struct iov_iter *to,
#if defined(HAVE_VFS_IOV_ITER) #if defined(HAVE_VFS_IOV_ITER)
zfs_uio_iov_iter_init(uio, to, pos, count, skip); zfs_uio_iov_iter_init(uio, to, pos, count, skip);
#else #else
#ifdef HAVE_IOV_ITER_TYPE zfs_uio_iovec_init(uio, zfs_uio_iter_iov(to), to->nr_segs, pos,
zfs_uio_iovec_init(uio, to->iov, to->nr_segs, pos, zfs_uio_iov_iter_type(to) & ITER_KVEC ?
iov_iter_type(to) & ITER_KVEC ? UIO_SYSSPACE : UIO_USERSPACE, UIO_SYSSPACE : UIO_USERSPACE,
count, skip); count, skip);
#else
zfs_uio_iovec_init(uio, to->iov, to->nr_segs, pos,
to->type & ITER_KVEC ? UIO_SYSSPACE : UIO_USERSPACE,
count, skip);
#endif
#endif #endif
} }
@ -1257,6 +1252,12 @@ zpl_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return (zpl_ioctl_getdosflags(filp, (void *)arg)); return (zpl_ioctl_getdosflags(filp, (void *)arg));
case ZFS_IOC_SETDOSFLAGS: case ZFS_IOC_SETDOSFLAGS:
return (zpl_ioctl_setdosflags(filp, (void *)arg)); return (zpl_ioctl_setdosflags(filp, (void *)arg));
case ZFS_IOC_COMPAT_FICLONE:
return (zpl_ioctl_ficlone(filp, (void *)arg));
case ZFS_IOC_COMPAT_FICLONERANGE:
return (zpl_ioctl_ficlonerange(filp, (void *)arg));
case ZFS_IOC_COMPAT_FIDEDUPERANGE:
return (zpl_ioctl_fideduperange(filp, (void *)arg));
default: default:
return (-ENOTTY); return (-ENOTTY);
} }
@ -1283,7 +1284,6 @@ zpl_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
} }
#endif /* CONFIG_COMPAT */ #endif /* CONFIG_COMPAT */
const struct address_space_operations zpl_address_space_operations = { const struct address_space_operations zpl_address_space_operations = {
#ifdef HAVE_VFS_READPAGES #ifdef HAVE_VFS_READPAGES
.readpages = zpl_readpages, .readpages = zpl_readpages,
@ -1306,7 +1306,12 @@ const struct address_space_operations zpl_address_space_operations = {
#endif #endif
}; };
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
const struct file_operations_extend zpl_file_operations = {
.kabi_fops = {
#else
const struct file_operations zpl_file_operations = { const struct file_operations zpl_file_operations = {
#endif
.open = zpl_open, .open = zpl_open,
.release = zpl_release, .release = zpl_release,
.llseek = zpl_llseek, .llseek = zpl_llseek,
@ -1318,7 +1323,11 @@ const struct file_operations zpl_file_operations = {
.read_iter = zpl_iter_read, .read_iter = zpl_iter_read,
.write_iter = zpl_iter_write, .write_iter = zpl_iter_write,
#ifdef HAVE_VFS_IOV_ITER #ifdef HAVE_VFS_IOV_ITER
#ifdef HAVE_COPY_SPLICE_READ
.splice_read = copy_splice_read,
#else
.splice_read = generic_file_splice_read, .splice_read = generic_file_splice_read,
#endif
.splice_write = iter_file_splice_write, .splice_write = iter_file_splice_write,
#endif #endif
#else #else
@ -1333,6 +1342,18 @@ const struct file_operations zpl_file_operations = {
.aio_fsync = zpl_aio_fsync, .aio_fsync = zpl_aio_fsync,
#endif #endif
.fallocate = zpl_fallocate, .fallocate = zpl_fallocate,
#ifdef HAVE_VFS_COPY_FILE_RANGE
.copy_file_range = zpl_copy_file_range,
#endif
#ifdef HAVE_VFS_CLONE_FILE_RANGE
.clone_file_range = zpl_clone_file_range,
#endif
#ifdef HAVE_VFS_REMAP_FILE_RANGE
.remap_file_range = zpl_remap_file_range,
#endif
#ifdef HAVE_VFS_DEDUPE_FILE_RANGE
.dedupe_file_range = zpl_dedupe_file_range,
#endif
#ifdef HAVE_FILE_FADVISE #ifdef HAVE_FILE_FADVISE
.fadvise = zpl_fadvise, .fadvise = zpl_fadvise,
#endif #endif
@ -1340,6 +1361,11 @@ const struct file_operations zpl_file_operations = {
#ifdef CONFIG_COMPAT #ifdef CONFIG_COMPAT
.compat_ioctl = zpl_compat_ioctl, .compat_ioctl = zpl_compat_ioctl,
#endif #endif
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
}, /* kabi_fops */
.copy_file_range = zpl_copy_file_range,
.clone_file_range = zpl_clone_file_range,
#endif
}; };
const struct file_operations zpl_dir_file_operations = { const struct file_operations zpl_dir_file_operations = {


@ -0,0 +1,272 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2023, Klara Inc.
*/
#ifdef CONFIG_COMPAT
#include <linux/compat.h>
#endif
#include <linux/fs.h>
#include <sys/file.h>
#include <sys/zfs_znode.h>
#include <sys/zfs_vnops.h>
#include <sys/zfeature.h>
/*
* Clone part of a file via block cloning.
*
* Note that we are not required to update file offsets; the kernel will take
* care of that depending on how it was called.
*/
static ssize_t
__zpl_clone_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, size_t len)
{
struct inode *src_i = file_inode(src_file);
struct inode *dst_i = file_inode(dst_file);
uint64_t src_off_o = (uint64_t)src_off;
uint64_t dst_off_o = (uint64_t)dst_off;
uint64_t len_o = (uint64_t)len;
cred_t *cr = CRED();
fstrans_cookie_t cookie;
int err;
if (!spa_feature_is_enabled(
dmu_objset_spa(ITOZSB(dst_i)->z_os), SPA_FEATURE_BLOCK_CLONING))
return (-EOPNOTSUPP);
if (src_i != dst_i)
spl_inode_lock_shared(src_i);
spl_inode_lock(dst_i);
crhold(cr);
cookie = spl_fstrans_mark();
err = -zfs_clone_range(ITOZ(src_i), &src_off_o, ITOZ(dst_i),
&dst_off_o, &len_o, cr);
spl_fstrans_unmark(cookie);
crfree(cr);
spl_inode_unlock(dst_i);
if (src_i != dst_i)
spl_inode_unlock_shared(src_i);
if (err < 0)
return (err);
return ((ssize_t)len_o);
}
#if defined(HAVE_VFS_COPY_FILE_RANGE) || \
defined(HAVE_VFS_FILE_OPERATIONS_EXTEND)
/*
* Entry point for copy_file_range(). Copy len bytes from src_off in src_file
* to dst_off in dst_file. We are permitted to do this however we like, so we
* try to just clone the blocks, and if we can't support it, fall back to the
* kernel's generic byte copy function.
*/
ssize_t
zpl_copy_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, size_t len, unsigned int flags)
{
ssize_t ret;
if (flags != 0)
return (-EINVAL);
/* Try to do it via zfs_clone_range() */
ret = __zpl_clone_file_range(src_file, src_off,
dst_file, dst_off, len);
#ifdef HAVE_VFS_GENERIC_COPY_FILE_RANGE
/*
* Since Linux 5.3 the filesystem driver is responsible for executing
* an appropriate fallback, and a generic fallback function is provided.
*/
if (ret == -EOPNOTSUPP || ret == -EINVAL || ret == -EXDEV ||
ret == -EAGAIN)
ret = generic_copy_file_range(src_file, src_off, dst_file,
dst_off, len, flags);
#else
/*
* Before Linux 5.3 the filesystem has to return -EOPNOTSUPP to signal
* to the kernel that it should fall back to a content copy.
*/
if (ret == -EINVAL || ret == -EXDEV || ret == -EAGAIN)
ret = -EOPNOTSUPP;
#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE */
return (ret);
}
#endif /* HAVE_VFS_COPY_FILE_RANGE || HAVE_VFS_FILE_OPERATIONS_EXTEND */
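Reviewer note: the fallback rule in zpl_copy_file_range() above can be read as two small decisions. The following is only a userspace sketch of that rule, not code from this change; the helper names `clone_error_wants_copy` and `legacy_copy_result` are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/*
 * Returns 1 when a failed clone attempt should be retried as a plain
 * byte copy. The errno set mirrors the one tested in
 * zpl_copy_file_range() on kernels that provide a generic fallback.
 */
static int
clone_error_wants_copy(ssize_t ret)
{
	return (ret == -EOPNOTSUPP || ret == -EINVAL || ret == -EXDEV ||
	    ret == -EAGAIN);
}

/*
 * Older-kernel variant: the filesystem does not copy by itself, so the
 * "please copy instead" errors are collapsed into -EOPNOTSUPP and the
 * kernel performs the content copy. Other values pass through unchanged.
 */
static ssize_t
legacy_copy_result(ssize_t ret)
{
	if (ret == -EINVAL || ret == -EXDEV || ret == -EAGAIN)
		return (-EOPNOTSUPP);
	return (ret);
}
```

A successful clone (a non-negative byte count) and unrelated errors such as -EIO are left alone in both variants, which matches how the entry point returns `ret` unmodified in those cases.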
#ifdef HAVE_VFS_REMAP_FILE_RANGE
/*
* Entry point for FICLONE/FICLONERANGE/FIDEDUPERANGE.
*
* FICLONE and FICLONERANGE are basically the same as copy_file_range(), except
* that they must clone - they cannot fall back to copying. FICLONE is exactly
* FICLONERANGE, for the entire file. We don't need to try to tell them apart;
* the kernel will sort that out for us.
*
* FIDEDUPERANGE is for turning a non-clone into a clone, that is, compare the
* range in both files and if they're the same, arrange for them to be backed
* by the same storage.
*/
loff_t
zpl_remap_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, loff_t len, unsigned int flags)
{
if (flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_CAN_SHORTEN))
return (-EINVAL);
/*
* REMAP_FILE_CAN_SHORTEN lets us know we can clone less than the given
* range if we want. It's designed for filesystems that make data past
* EOF available, and don't want it to be visible in both files. ZFS
* doesn't do that, so we just turn the flag off.
*/
flags &= ~REMAP_FILE_CAN_SHORTEN;
if (flags & REMAP_FILE_DEDUP)
/* No support for dedup yet */
return (-EOPNOTSUPP);
/* Zero length means to clone everything to the end of the file */
if (len == 0)
len = i_size_read(file_inode(src_file)) - src_off;
return (__zpl_clone_file_range(src_file, src_off,
dst_file, dst_off, len));
}
#endif /* HAVE_VFS_REMAP_FILE_RANGE */
#if defined(HAVE_VFS_CLONE_FILE_RANGE) || \
defined(HAVE_VFS_FILE_OPERATIONS_EXTEND)
/*
* Entry point for FICLONE and FICLONERANGE, before Linux 4.20.
*/
int
zpl_clone_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len)
{
/* Zero length means to clone everything to the end of the file */
if (len == 0)
len = i_size_read(file_inode(src_file)) - src_off;
return (__zpl_clone_file_range(src_file, src_off,
dst_file, dst_off, len));
}
#endif /* HAVE_VFS_CLONE_FILE_RANGE || HAVE_VFS_FILE_OPERATIONS_EXTEND */
#ifdef HAVE_VFS_DEDUPE_FILE_RANGE
/*
* Entry point for FIDEDUPERANGE, before Linux 4.20.
*/
int
zpl_dedupe_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len)
{
/* No support for dedup yet */
return (-EOPNOTSUPP);
}
#endif /* HAVE_VFS_DEDUPE_FILE_RANGE */
/* Entry point for FICLONE, before Linux 4.5. */
long
zpl_ioctl_ficlone(struct file *dst_file, void *arg)
{
unsigned long sfd = (unsigned long)arg;
struct file *src_file = fget(sfd);
if (src_file == NULL)
return (-EBADF);
if (dst_file->f_op != src_file->f_op) {
/* Drop the reference taken by fget() before bailing out. */
fput(src_file);
return (-EXDEV);
}
size_t len = i_size_read(file_inode(src_file));
ssize_t ret =
__zpl_clone_file_range(src_file, 0, dst_file, 0, len);
fput(src_file);
if (ret < 0) {
if (ret == -EOPNOTSUPP)
return (-ENOTTY);
return (ret);
}
if (ret != len)
return (-EINVAL);
return (0);
}
/* Entry point for FICLONERANGE, before Linux 4.5. */
long
zpl_ioctl_ficlonerange(struct file *dst_file, void __user *arg)
{
zfs_ioc_compat_file_clone_range_t fcr;
if (copy_from_user(&fcr, arg, sizeof (fcr)))
return (-EFAULT);
struct file *src_file = fget(fcr.fcr_src_fd);
if (src_file == NULL)
return (-EBADF);
if (dst_file->f_op != src_file->f_op) {
/* Drop the reference taken by fget() before bailing out. */
fput(src_file);
return (-EXDEV);
}
size_t len = fcr.fcr_src_length;
if (len == 0)
len = i_size_read(file_inode(src_file)) - fcr.fcr_src_offset;
ssize_t ret = __zpl_clone_file_range(src_file, fcr.fcr_src_offset,
dst_file, fcr.fcr_dest_offset, len);
fput(src_file);
if (ret < 0) {
if (ret == -EOPNOTSUPP)
return (-ENOTTY);
return (ret);
}
if (ret != len)
return (-EINVAL);
return (0);
}
/* Entry point for FIDEDUPERANGE, before Linux 4.5. */
long
zpl_ioctl_fideduperange(struct file *filp, void *arg)
{
(void) arg;
/* No support for dedup yet */
return (-ENOTTY);
}


@ -277,8 +277,6 @@ zpl_test_super(struct super_block *s, void *data)
{ {
zfsvfs_t *zfsvfs = s->s_fs_info; zfsvfs_t *zfsvfs = s->s_fs_info;
objset_t *os = data; objset_t *os = data;
int match;
/* /*
* If the os doesn't match the z_os in the super_block, assume it is * If the os doesn't match the z_os in the super_block, assume it is
* not a match. Matching would imply a multimount of a dataset. It is * not a match. Matching would imply a multimount of a dataset. It is
@ -286,19 +284,7 @@ zpl_test_super(struct super_block *s, void *data)
* that changes the z_os, e.g., rollback, where the match will be * that changes the z_os, e.g., rollback, where the match will be
* missed, but in that case the user will get an EBUSY. * missed, but in that case the user will get an EBUSY.
*/ */
if (zfsvfs == NULL || os != zfsvfs->z_os) return (zfsvfs != NULL && os == zfsvfs->z_os);
return (0);
/*
* If they do match, recheck with the lock held to prevent mounting the
* wrong dataset since z_os can be stale when the teardown lock is held.
*/
if (zpl_enter(zfsvfs, FTAG) != 0)
return (0);
match = (os == zfsvfs->z_os);
zpl_exit(zfsvfs, FTAG);
return (match);
} }
static struct super_block * static struct super_block *
@ -324,12 +310,35 @@ zpl_mount_impl(struct file_system_type *fs_type, int flags, zfs_mnt_t *zm)
s = sget(fs_type, zpl_test_super, set_anon_super, flags, os); s = sget(fs_type, zpl_test_super, set_anon_super, flags, os);
/*
* Recheck with the lock held to prevent mounting the wrong dataset
* since z_os can be stale when the teardown lock is held.
*
* We can't do this in zpl_test_super since it's under spinlock and
* also s_umount lock is not held there so it would race with
* zfs_umount and zfsvfs can be freed.
*/
if (!IS_ERR(s) && s->s_fs_info != NULL) {
zfsvfs_t *zfsvfs = s->s_fs_info;
if (zpl_enter(zfsvfs, FTAG) == 0) {
if (os != zfsvfs->z_os)
err = -SET_ERROR(EBUSY);
zpl_exit(zfsvfs, FTAG);
} else {
err = -SET_ERROR(EBUSY);
}
}
dsl_dataset_long_rele(dmu_objset_ds(os), FTAG); dsl_dataset_long_rele(dmu_objset_ds(os), FTAG);
dsl_dataset_rele(dmu_objset_ds(os), FTAG); dsl_dataset_rele(dmu_objset_ds(os), FTAG);
if (IS_ERR(s)) if (IS_ERR(s))
return (ERR_CAST(s)); return (ERR_CAST(s));
if (err) {
deactivate_locked_super(s);
return (ERR_PTR(err));
}
if (s->s_root == NULL) { if (s->s_root == NULL) {
err = zpl_fill_super(s, zm, flags & SB_SILENT ? 1 : 0); err = zpl_fill_super(s, zm, flags & SB_SILENT ? 1 : 0);
if (err) { if (err) {


@ -671,7 +671,11 @@ zvol_request(struct request_queue *q, struct bio *bio)
} }
static int static int
#ifdef HAVE_BLK_MODE_T
zvol_open(struct gendisk *disk, blk_mode_t flag)
#else
zvol_open(struct block_device *bdev, fmode_t flag) zvol_open(struct block_device *bdev, fmode_t flag)
#endif
{ {
zvol_state_t *zv; zvol_state_t *zv;
int error = 0; int error = 0;
@ -686,10 +690,14 @@ retry:
/* /*
* Obtain a copy of private_data under the zvol_state_lock to make * Obtain a copy of private_data under the zvol_state_lock to make
* sure that either the result of zvol free code path setting * sure that either the result of zvol free code path setting
* bdev->bd_disk->private_data to NULL is observed, or zvol_os_free() * disk->private_data to NULL is observed, or zvol_os_free()
* is not called on this zv because of the positive zv_open_count. * is not called on this zv because of the positive zv_open_count.
*/ */
#ifdef HAVE_BLK_MODE_T
zv = disk->private_data;
#else
zv = bdev->bd_disk->private_data; zv = bdev->bd_disk->private_data;
#endif
if (zv == NULL) { if (zv == NULL) {
rw_exit(&zvol_state_lock); rw_exit(&zvol_state_lock);
return (SET_ERROR(-ENXIO)); return (SET_ERROR(-ENXIO));
@ -769,14 +777,15 @@ retry:
} }
} }
error = -zvol_first_open(zv, !(flag & FMODE_WRITE)); error = -zvol_first_open(zv, !(blk_mode_is_open_write(flag)));
if (drop_namespace) if (drop_namespace)
mutex_exit(&spa_namespace_lock); mutex_exit(&spa_namespace_lock);
} }
if (error == 0) { if (error == 0) {
if ((flag & FMODE_WRITE) && (zv->zv_flags & ZVOL_RDONLY)) { if ((blk_mode_is_open_write(flag)) &&
(zv->zv_flags & ZVOL_RDONLY)) {
if (zv->zv_open_count == 0) if (zv->zv_open_count == 0)
zvol_last_close(zv); zvol_last_close(zv);
@ -791,14 +800,25 @@ retry:
rw_exit(&zv->zv_suspend_lock); rw_exit(&zv->zv_suspend_lock);
if (error == 0) if (error == 0)
#ifdef HAVE_BLK_MODE_T
disk_check_media_change(disk);
#else
zfs_check_media_change(bdev); zfs_check_media_change(bdev);
#endif
return (error); return (error);
} }
static void static void
zvol_release(struct gendisk *disk, fmode_t mode) #ifdef HAVE_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG
zvol_release(struct gendisk *disk)
#else
zvol_release(struct gendisk *disk, fmode_t unused)
#endif
{ {
#if !defined(HAVE_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG)
(void) unused;
#endif
zvol_state_t *zv; zvol_state_t *zv;
boolean_t drop_suspend = B_TRUE; boolean_t drop_suspend = B_TRUE;
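Reviewer note: the zvol hunks above keep call sites unchanged by hiding the kernel API difference behind `blk_mode_is_open_write()` and `#ifdef HAVE_BLK_MODE_T`. The sketch below shows that pattern in isolation. Everything prefixed `sketch_` or `SKETCH_` is hypothetical, and the flag values are placeholders rather than the kernel's real constants.

```c
#include <assert.h>

/* Placeholder flag bits for illustration only, not the kernel's values. */
#define	SKETCH_FMODE_WRITE	0x2u
#define	SKETCH_BLK_OPEN_WRITE	0x4u

typedef unsigned int sketch_open_mode_t;

/*
 * One macro name, two definitions. Define SKETCH_HAVE_BLK_MODE_T at
 * build time to select the newer flag namespace.
 */
#ifdef SKETCH_HAVE_BLK_MODE_T
#define	sketch_mode_is_open_write(flag) \
	(((flag) & SKETCH_BLK_OPEN_WRITE) != 0)
#else
#define	sketch_mode_is_open_write(flag) \
	(((flag) & SKETCH_FMODE_WRITE) != 0)
#endif

/*
 * The caller is written once and never mentions either flag namespace.
 * Returns 0 when a write open targets a read-only volume, 1 otherwise.
 */
static int
sketch_open_allowed(sketch_open_mode_t flag, int volume_is_readonly)
{
	if (sketch_mode_is_open_write(flag) && volume_is_readonly)
		return (0);
	return (1);
}
```

The benefit of this shape is that only the macro and the function signature need a configure-time check; the body of the open path stays free of further conditionals.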


@ -160,7 +160,7 @@ zpool_prop_init(void)
"wait | continue | panic", "FAILMODE", failuremode_table, "wait | continue | panic", "FAILMODE", failuremode_table,
sfeatures); sfeatures);
zprop_register_index(ZPOOL_PROP_AUTOTRIM, "autotrim", zprop_register_index(ZPOOL_PROP_AUTOTRIM, "autotrim",
SPA_AUTOTRIM_DEFAULT, PROP_DEFAULT, ZFS_TYPE_POOL, SPA_AUTOTRIM_OFF, PROP_DEFAULT, ZFS_TYPE_POOL,
"on | off", "AUTOTRIM", boolean_table, sfeatures); "on | off", "AUTOTRIM", boolean_table, sfeatures);
/* hidden properties */ /* hidden properties */


@ -748,8 +748,7 @@ taskq_t *arc_prune_taskq;
* Other sizes * Other sizes
*/ */
#define HDR_FULL_CRYPT_SIZE ((int64_t)sizeof (arc_buf_hdr_t)) #define HDR_FULL_SIZE ((int64_t)sizeof (arc_buf_hdr_t))
#define HDR_FULL_SIZE ((int64_t)offsetof(arc_buf_hdr_t, b_crypt_hdr))
#define HDR_L2ONLY_SIZE ((int64_t)offsetof(arc_buf_hdr_t, b_l1hdr)) #define HDR_L2ONLY_SIZE ((int64_t)offsetof(arc_buf_hdr_t, b_l1hdr))
/* /*
@ -1113,7 +1112,6 @@ buf_hash_remove(arc_buf_hdr_t *hdr)
*/ */
static kmem_cache_t *hdr_full_cache; static kmem_cache_t *hdr_full_cache;
static kmem_cache_t *hdr_full_crypt_cache;
static kmem_cache_t *hdr_l2only_cache; static kmem_cache_t *hdr_l2only_cache;
static kmem_cache_t *buf_cache; static kmem_cache_t *buf_cache;
@ -1134,7 +1132,6 @@ buf_fini(void)
for (int i = 0; i < BUF_LOCKS; i++) for (int i = 0; i < BUF_LOCKS; i++)
mutex_destroy(BUF_HASH_LOCK(i)); mutex_destroy(BUF_HASH_LOCK(i));
kmem_cache_destroy(hdr_full_cache); kmem_cache_destroy(hdr_full_cache);
kmem_cache_destroy(hdr_full_crypt_cache);
kmem_cache_destroy(hdr_l2only_cache); kmem_cache_destroy(hdr_l2only_cache);
kmem_cache_destroy(buf_cache); kmem_cache_destroy(buf_cache);
} }
@ -1151,7 +1148,6 @@ hdr_full_cons(void *vbuf, void *unused, int kmflag)
memset(hdr, 0, HDR_FULL_SIZE); memset(hdr, 0, HDR_FULL_SIZE);
hdr->b_l1hdr.b_byteswap = DMU_BSWAP_NUMFUNCS; hdr->b_l1hdr.b_byteswap = DMU_BSWAP_NUMFUNCS;
cv_init(&hdr->b_l1hdr.b_cv, NULL, CV_DEFAULT, NULL);
zfs_refcount_create(&hdr->b_l1hdr.b_refcnt); zfs_refcount_create(&hdr->b_l1hdr.b_refcnt);
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
mutex_init(&hdr->b_l1hdr.b_freeze_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&hdr->b_l1hdr.b_freeze_lock, NULL, MUTEX_DEFAULT, NULL);
@ -1163,19 +1159,6 @@ hdr_full_cons(void *vbuf, void *unused, int kmflag)
return (0); return (0);
} }
static int
hdr_full_crypt_cons(void *vbuf, void *unused, int kmflag)
{
(void) unused;
arc_buf_hdr_t *hdr = vbuf;
hdr_full_cons(vbuf, unused, kmflag);
memset(&hdr->b_crypt_hdr, 0, sizeof (hdr->b_crypt_hdr));
arc_space_consume(sizeof (hdr->b_crypt_hdr), ARC_SPACE_HDRS);
return (0);
}
static int static int
hdr_l2only_cons(void *vbuf, void *unused, int kmflag) hdr_l2only_cons(void *vbuf, void *unused, int kmflag)
{ {
@ -1211,7 +1194,6 @@ hdr_full_dest(void *vbuf, void *unused)
arc_buf_hdr_t *hdr = vbuf; arc_buf_hdr_t *hdr = vbuf;
ASSERT(HDR_EMPTY(hdr)); ASSERT(HDR_EMPTY(hdr));
cv_destroy(&hdr->b_l1hdr.b_cv);
zfs_refcount_destroy(&hdr->b_l1hdr.b_refcnt); zfs_refcount_destroy(&hdr->b_l1hdr.b_refcnt);
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
mutex_destroy(&hdr->b_l1hdr.b_freeze_lock); mutex_destroy(&hdr->b_l1hdr.b_freeze_lock);
@ -1220,16 +1202,6 @@ hdr_full_dest(void *vbuf, void *unused)
arc_space_return(HDR_FULL_SIZE, ARC_SPACE_HDRS); arc_space_return(HDR_FULL_SIZE, ARC_SPACE_HDRS);
} }
static void
hdr_full_crypt_dest(void *vbuf, void *unused)
{
(void) vbuf, (void) unused;
hdr_full_dest(vbuf, unused);
arc_space_return(sizeof (((arc_buf_hdr_t *)NULL)->b_crypt_hdr),
ARC_SPACE_HDRS);
}
static void static void
hdr_l2only_dest(void *vbuf, void *unused) hdr_l2only_dest(void *vbuf, void *unused)
{ {
@ -1285,9 +1257,6 @@ retry:
hdr_full_cache = kmem_cache_create("arc_buf_hdr_t_full", HDR_FULL_SIZE, hdr_full_cache = kmem_cache_create("arc_buf_hdr_t_full", HDR_FULL_SIZE,
0, hdr_full_cons, hdr_full_dest, NULL, NULL, NULL, 0); 0, hdr_full_cons, hdr_full_dest, NULL, NULL, NULL, 0);
hdr_full_crypt_cache = kmem_cache_create("arc_buf_hdr_t_full_crypt",
HDR_FULL_CRYPT_SIZE, 0, hdr_full_crypt_cons, hdr_full_crypt_dest,
NULL, NULL, NULL, 0);
hdr_l2only_cache = kmem_cache_create("arc_buf_hdr_t_l2only", hdr_l2only_cache = kmem_cache_create("arc_buf_hdr_t_l2only",
HDR_L2ONLY_SIZE, 0, hdr_l2only_cons, hdr_l2only_dest, NULL, HDR_L2ONLY_SIZE, 0, hdr_l2only_cons, hdr_l2only_dest, NULL,
NULL, NULL, 0); NULL, NULL, 0);
@ -1995,7 +1964,6 @@ arc_buf_untransform_in_place(arc_buf_t *buf)
arc_buf_size(buf)); arc_buf_size(buf));
buf->b_flags &= ~ARC_BUF_FLAG_ENCRYPTED; buf->b_flags &= ~ARC_BUF_FLAG_ENCRYPTED;
buf->b_flags &= ~ARC_BUF_FLAG_COMPRESSED; buf->b_flags &= ~ARC_BUF_FLAG_COMPRESSED;
hdr->b_crypt_hdr.b_ebufcnt -= 1;
} }
/* /*
@ -2230,7 +2198,6 @@ arc_evictable_space_increment(arc_buf_hdr_t *hdr, arc_state_t *state)
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
if (GHOST_STATE(state)) { if (GHOST_STATE(state)) {
ASSERT0(hdr->b_l1hdr.b_bufcnt);
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
ASSERT(!HDR_HAS_RABD(hdr)); ASSERT(!HDR_HAS_RABD(hdr));
@ -2270,7 +2237,6 @@ arc_evictable_space_decrement(arc_buf_hdr_t *hdr, arc_state_t *state)
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
if (GHOST_STATE(state)) { if (GHOST_STATE(state)) {
ASSERT0(hdr->b_l1hdr.b_bufcnt);
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
ASSERT(!HDR_HAS_RABD(hdr)); ASSERT(!HDR_HAS_RABD(hdr));
@ -2386,7 +2352,9 @@ arc_buf_info(arc_buf_t *ab, arc_buf_info_t *abi, int state_index)
l2hdr = &hdr->b_l2hdr; l2hdr = &hdr->b_l2hdr;
if (l1hdr) { if (l1hdr) {
abi->abi_bufcnt = l1hdr->b_bufcnt; abi->abi_bufcnt = 0;
for (arc_buf_t *buf = l1hdr->b_buf; buf; buf = buf->b_next)
abi->abi_bufcnt++;
abi->abi_access = l1hdr->b_arc_access; abi->abi_access = l1hdr->b_arc_access;
abi->abi_mru_hits = l1hdr->b_mru_hits; abi->abi_mru_hits = l1hdr->b_mru_hits;
abi->abi_mru_ghost_hits = l1hdr->b_mru_ghost_hits; abi->abi_mru_ghost_hits = l1hdr->b_mru_ghost_hits;
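Reviewer note: with `b_bufcnt` removed, the count is derived by walking `b_buf`, and "exactly one buffer" checks elsewhere in this diff become a head-pointer comparison plus `ARC_BUF_LAST()`. The sketch below models both on a simplified list. It assumes `ARC_BUF_LAST(buf)` means "buf has no successor"; the `sketch_` types and helpers are hypothetical stand-ins, not ARC code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for arc_buf_t: only the list link is modeled. */
typedef struct sketch_buf {
	struct sketch_buf *b_next;
} sketch_buf_t;

/* Count buffers by walking the list instead of reading a stored counter. */
static unsigned int
sketch_buf_count(const sketch_buf_t *head)
{
	unsigned int n = 0;

	for (const sketch_buf_t *b = head; b != NULL; b = b->b_next)
		n++;
	return (n);
}

/*
 * "The header has exactly one buffer and it is buf": the list head is
 * buf and buf has no successor. This is O(1), unlike a full count.
 */
static int
sketch_is_only_buf(const sketch_buf_t *head, const sketch_buf_t *buf)
{
	return (head == buf && buf->b_next == NULL);
}
```

The walk is only needed where an actual number is reported; the common "zero, one, or more than one" questions can all be answered from the head pointer and one link.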
@ -2414,7 +2382,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
{ {
arc_state_t *old_state; arc_state_t *old_state;
int64_t refcnt; int64_t refcnt;
uint32_t bufcnt;
boolean_t update_old, update_new; boolean_t update_old, update_new;
arc_buf_contents_t type = arc_buf_type(hdr); arc_buf_contents_t type = arc_buf_type(hdr);
@ -2428,19 +2395,16 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
if (HDR_HAS_L1HDR(hdr)) { if (HDR_HAS_L1HDR(hdr)) {
old_state = hdr->b_l1hdr.b_state; old_state = hdr->b_l1hdr.b_state;
refcnt = zfs_refcount_count(&hdr->b_l1hdr.b_refcnt); refcnt = zfs_refcount_count(&hdr->b_l1hdr.b_refcnt);
bufcnt = hdr->b_l1hdr.b_bufcnt; update_old = (hdr->b_l1hdr.b_buf != NULL ||
update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pabd != NULL || hdr->b_l1hdr.b_pabd != NULL || HDR_HAS_RABD(hdr));
HDR_HAS_RABD(hdr));
IMPLY(GHOST_STATE(old_state), bufcnt == 0);
IMPLY(GHOST_STATE(new_state), bufcnt == 0);
IMPLY(GHOST_STATE(old_state), hdr->b_l1hdr.b_buf == NULL); IMPLY(GHOST_STATE(old_state), hdr->b_l1hdr.b_buf == NULL);
IMPLY(GHOST_STATE(new_state), hdr->b_l1hdr.b_buf == NULL); IMPLY(GHOST_STATE(new_state), hdr->b_l1hdr.b_buf == NULL);
IMPLY(old_state == arc_anon, bufcnt <= 1); IMPLY(old_state == arc_anon, hdr->b_l1hdr.b_buf == NULL ||
ARC_BUF_LAST(hdr->b_l1hdr.b_buf));
} else { } else {
old_state = arc_l2c_only; old_state = arc_l2c_only;
refcnt = 0; refcnt = 0;
bufcnt = 0;
update_old = B_FALSE; update_old = B_FALSE;
} }
update_new = update_old; update_new = update_old;
@ -2488,14 +2452,12 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
if (update_new && new_state != arc_l2c_only) { if (update_new && new_state != arc_l2c_only) {
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
if (GHOST_STATE(new_state)) { if (GHOST_STATE(new_state)) {
ASSERT0(bufcnt);
/* /*
* When moving a header to a ghost state, we first * When moving a header to a ghost state, we first
* remove all arc buffers. Thus, we'll have a * remove all arc buffers. Thus, we'll have no arc
* bufcnt of zero, and no arc buffer to use for * buffer to use for the reference. As a result, we
* the reference. As a result, we use the arc * use the arc header pointer for the reference.
* header pointer for the reference.
*/ */
(void) zfs_refcount_add_many( (void) zfs_refcount_add_many(
&new_state->arcs_size[type], &new_state->arcs_size[type],
@ -2503,7 +2465,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
ASSERT(!HDR_HAS_RABD(hdr)); ASSERT(!HDR_HAS_RABD(hdr));
} else { } else {
uint32_t buffers = 0;
/* /*
* Each individual buffer holds a unique reference, * Each individual buffer holds a unique reference,
@ -2512,8 +2473,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
*/ */
for (arc_buf_t *buf = hdr->b_l1hdr.b_buf; buf != NULL; for (arc_buf_t *buf = hdr->b_l1hdr.b_buf; buf != NULL;
buf = buf->b_next) { buf = buf->b_next) {
ASSERT3U(bufcnt, !=, 0);
buffers++;
/* /*
* When the arc_buf_t is sharing the data * When the arc_buf_t is sharing the data
@ -2529,7 +2488,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
&new_state->arcs_size[type], &new_state->arcs_size[type],
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
ASSERT3U(bufcnt, ==, buffers);
if (hdr->b_l1hdr.b_pabd != NULL) { if (hdr->b_l1hdr.b_pabd != NULL) {
(void) zfs_refcount_add_many( (void) zfs_refcount_add_many(
@ -2548,7 +2506,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
if (update_old && old_state != arc_l2c_only) { if (update_old && old_state != arc_l2c_only) {
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
if (GHOST_STATE(old_state)) { if (GHOST_STATE(old_state)) {
ASSERT0(bufcnt);
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
ASSERT(!HDR_HAS_RABD(hdr)); ASSERT(!HDR_HAS_RABD(hdr));
@ -2564,7 +2521,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
&old_state->arcs_size[type], &old_state->arcs_size[type],
HDR_GET_LSIZE(hdr), hdr); HDR_GET_LSIZE(hdr), hdr);
} else { } else {
uint32_t buffers = 0;
/* /*
* Each individual buffer holds a unique reference, * Each individual buffer holds a unique reference,
@ -2573,8 +2529,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
*/ */
for (arc_buf_t *buf = hdr->b_l1hdr.b_buf; buf != NULL; for (arc_buf_t *buf = hdr->b_l1hdr.b_buf; buf != NULL;
buf = buf->b_next) { buf = buf->b_next) {
ASSERT3U(bufcnt, !=, 0);
buffers++;
/* /*
* When the arc_buf_t is sharing the data * When the arc_buf_t is sharing the data
@ -2590,7 +2544,6 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr)
&old_state->arcs_size[type], &old_state->arcs_size[type],
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
ASSERT3U(bufcnt, ==, buffers);
ASSERT(hdr->b_l1hdr.b_pabd != NULL || ASSERT(hdr->b_l1hdr.b_pabd != NULL ||
HDR_HAS_RABD(hdr)); HDR_HAS_RABD(hdr));
@ -2838,9 +2791,6 @@ arc_buf_alloc_impl(arc_buf_hdr_t *hdr, spa_t *spa, const zbookmark_phys_t *zb,
VERIFY3P(buf->b_data, !=, NULL); VERIFY3P(buf->b_data, !=, NULL);
hdr->b_l1hdr.b_buf = buf; hdr->b_l1hdr.b_buf = buf;
hdr->b_l1hdr.b_bufcnt += 1;
if (encrypted)
hdr->b_crypt_hdr.b_ebufcnt += 1;
/* /*
* If the user wants the data from the hdr, we need to either copy or * If the user wants the data from the hdr, we need to either copy or
@ -3082,8 +3032,6 @@ arc_buf_remove(arc_buf_hdr_t *hdr, arc_buf_t *buf)
} }
buf->b_next = NULL; buf->b_next = NULL;
ASSERT3P(lastbuf, !=, buf); ASSERT3P(lastbuf, !=, buf);
IMPLY(hdr->b_l1hdr.b_bufcnt > 0, lastbuf != NULL);
IMPLY(hdr->b_l1hdr.b_bufcnt > 0, hdr->b_l1hdr.b_buf != NULL);
IMPLY(lastbuf != NULL, ARC_BUF_LAST(lastbuf)); IMPLY(lastbuf != NULL, ARC_BUF_LAST(lastbuf));
return (lastbuf); return (lastbuf);
@ -3122,22 +3070,20 @@ arc_buf_destroy_impl(arc_buf_t *buf)
} }
buf->b_data = NULL; buf->b_data = NULL;
ASSERT(hdr->b_l1hdr.b_bufcnt > 0);
hdr->b_l1hdr.b_bufcnt -= 1;
if (ARC_BUF_ENCRYPTED(buf)) {
hdr->b_crypt_hdr.b_ebufcnt -= 1;
/* /*
* If we have no more encrypted buffers and we've * If we have no more encrypted buffers and we've already
* already gotten a copy of the decrypted data we can * gotten a copy of the decrypted data we can free b_rabd
* free b_rabd to save some space. * to save some space.
*/ */
if (hdr->b_crypt_hdr.b_ebufcnt == 0 && if (ARC_BUF_ENCRYPTED(buf) && HDR_HAS_RABD(hdr) &&
HDR_HAS_RABD(hdr) && hdr->b_l1hdr.b_pabd != NULL && hdr->b_l1hdr.b_pabd != NULL && !HDR_IO_IN_PROGRESS(hdr)) {
!HDR_IO_IN_PROGRESS(hdr)) { arc_buf_t *b;
arc_hdr_free_abd(hdr, B_TRUE); for (b = hdr->b_l1hdr.b_buf; b; b = b->b_next) {
if (b != buf && ARC_BUF_ENCRYPTED(b))
break;
} }
if (b == NULL)
arc_hdr_free_abd(hdr, B_TRUE);
} }
} }
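Reviewer note: the new arc_buf_destroy_impl() logic above replaces the `b_ebufcnt == 0` test with a scan for any other encrypted buffer on the header. A minimal sketch of that scan follows, assuming a simplified buffer type; `sketch_ebuf_t`, its `b_encrypted` field, and `sketch_other_encrypted` are hypothetical names used only to illustrate the loop shape.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified buffer: a list link plus a flag standing in for */
/* ARC_BUF_ENCRYPTED(buf). */
typedef struct sketch_ebuf {
	struct sketch_ebuf *b_next;
	int b_encrypted;
} sketch_ebuf_t;

/*
 * Returns 1 if some buffer other than 'self' on the list is encrypted.
 * The loop stops at the first such buffer, so the cursor is NULL after
 * the loop only when none was found.
 */
static int
sketch_other_encrypted(const sketch_ebuf_t *head, const sketch_ebuf_t *self)
{
	const sketch_ebuf_t *b;

	for (b = head; b != NULL; b = b->b_next) {
		if (b != self && b->b_encrypted)
			break;
	}
	return (b != NULL);
}
```

In the diff, the raw data is freed only when this kind of scan finds nothing, which corresponds to a return value of 0 here.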
@ -3298,11 +3244,7 @@ arc_hdr_alloc(uint64_t spa, int32_t psize, int32_t lsize,
arc_buf_hdr_t *hdr; arc_buf_hdr_t *hdr;
VERIFY(type == ARC_BUFC_DATA || type == ARC_BUFC_METADATA); VERIFY(type == ARC_BUFC_DATA || type == ARC_BUFC_METADATA);
if (protected) {
hdr = kmem_cache_alloc(hdr_full_crypt_cache, KM_PUSHPAGE);
} else {
hdr = kmem_cache_alloc(hdr_full_cache, KM_PUSHPAGE); hdr = kmem_cache_alloc(hdr_full_cache, KM_PUSHPAGE);
}
ASSERT(HDR_EMPTY(hdr)); ASSERT(HDR_EMPTY(hdr));
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
@ -3325,7 +3267,6 @@ arc_hdr_alloc(uint64_t spa, int32_t psize, int32_t lsize,
hdr->b_l1hdr.b_mru_ghost_hits = 0; hdr->b_l1hdr.b_mru_ghost_hits = 0;
hdr->b_l1hdr.b_mfu_hits = 0; hdr->b_l1hdr.b_mfu_hits = 0;
hdr->b_l1hdr.b_mfu_ghost_hits = 0; hdr->b_l1hdr.b_mfu_ghost_hits = 0;
hdr->b_l1hdr.b_bufcnt = 0;
hdr->b_l1hdr.b_buf = NULL; hdr->b_l1hdr.b_buf = NULL;
ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
@ -3351,16 +3292,6 @@ arc_hdr_realloc(arc_buf_hdr_t *hdr, kmem_cache_t *old, kmem_cache_t *new)
ASSERT((old == hdr_full_cache && new == hdr_l2only_cache) || ASSERT((old == hdr_full_cache && new == hdr_l2only_cache) ||
(old == hdr_l2only_cache && new == hdr_full_cache)); (old == hdr_l2only_cache && new == hdr_full_cache));
/*
* if the caller wanted a new full header and the header is to be
* encrypted we will actually allocate the header from the full crypt
* cache instead. The same applies to freeing from the old cache.
*/
if (HDR_PROTECTED(hdr) && new == hdr_full_cache)
new = hdr_full_crypt_cache;
if (HDR_PROTECTED(hdr) && old == hdr_full_cache)
old = hdr_full_crypt_cache;
nhdr = kmem_cache_alloc(new, KM_PUSHPAGE); nhdr = kmem_cache_alloc(new, KM_PUSHPAGE);
ASSERT(MUTEX_HELD(HDR_LOCK(hdr))); ASSERT(MUTEX_HELD(HDR_LOCK(hdr)));
@ -3368,7 +3299,7 @@ arc_hdr_realloc(arc_buf_hdr_t *hdr, kmem_cache_t *old, kmem_cache_t *new)
memcpy(nhdr, hdr, HDR_L2ONLY_SIZE); memcpy(nhdr, hdr, HDR_L2ONLY_SIZE);
if (new == hdr_full_cache || new == hdr_full_crypt_cache) { if (new == hdr_full_cache) {
arc_hdr_set_flags(nhdr, ARC_FLAG_HAS_L1HDR); arc_hdr_set_flags(nhdr, ARC_FLAG_HAS_L1HDR);
/* /*
* arc_access and arc_change_state need to be aware that a * arc_access and arc_change_state need to be aware that a
@ -3382,7 +3313,6 @@ arc_hdr_realloc(arc_buf_hdr_t *hdr, kmem_cache_t *old, kmem_cache_t *new)
ASSERT(!HDR_HAS_RABD(hdr)); ASSERT(!HDR_HAS_RABD(hdr));
} else { } else {
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT0(hdr->b_l1hdr.b_bufcnt);
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
ASSERT3P(hdr->b_l1hdr.b_freeze_cksum, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_freeze_cksum, ==, NULL);
#endif #endif
@ -3448,126 +3378,6 @@ arc_hdr_realloc(arc_buf_hdr_t *hdr, kmem_cache_t *old, kmem_cache_t *new)
return (nhdr); return (nhdr);
} }
/*
* This function allows an L1 header to be reallocated as a crypt
* header and vice versa. If we are going to a crypt header, the
* new fields will be zeroed out.
*/
static arc_buf_hdr_t *
arc_hdr_realloc_crypt(arc_buf_hdr_t *hdr, boolean_t need_crypt)
{
arc_buf_hdr_t *nhdr;
arc_buf_t *buf;
kmem_cache_t *ncache, *ocache;
/*
* This function requires that hdr is in the arc_anon state.
* Therefore it won't have any L2ARC data for us to worry
* about copying.
*/
ASSERT(HDR_HAS_L1HDR(hdr));
ASSERT(!HDR_HAS_L2HDR(hdr));
ASSERT3U(!!HDR_PROTECTED(hdr), !=, need_crypt);
ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node));
ASSERT(!list_link_active(&hdr->b_l2hdr.b_l2node));
ASSERT3P(hdr->b_hash_next, ==, NULL);
if (need_crypt) {
ncache = hdr_full_crypt_cache;
ocache = hdr_full_cache;
} else {
ncache = hdr_full_cache;
ocache = hdr_full_crypt_cache;
}
nhdr = kmem_cache_alloc(ncache, KM_PUSHPAGE);
/*
* Copy all members that aren't locks or condvars to the new header.
* No lists are pointing to us (as we asserted above), so we don't
* need to worry about the list nodes.
*/
nhdr->b_dva = hdr->b_dva;
nhdr->b_birth = hdr->b_birth;
nhdr->b_type = hdr->b_type;
nhdr->b_flags = hdr->b_flags;
nhdr->b_psize = hdr->b_psize;
nhdr->b_lsize = hdr->b_lsize;
nhdr->b_spa = hdr->b_spa;
#ifdef ZFS_DEBUG
nhdr->b_l1hdr.b_freeze_cksum = hdr->b_l1hdr.b_freeze_cksum;
#endif
nhdr->b_l1hdr.b_bufcnt = hdr->b_l1hdr.b_bufcnt;
nhdr->b_l1hdr.b_byteswap = hdr->b_l1hdr.b_byteswap;
nhdr->b_l1hdr.b_state = hdr->b_l1hdr.b_state;
nhdr->b_l1hdr.b_arc_access = hdr->b_l1hdr.b_arc_access;
nhdr->b_l1hdr.b_mru_hits = hdr->b_l1hdr.b_mru_hits;
nhdr->b_l1hdr.b_mru_ghost_hits = hdr->b_l1hdr.b_mru_ghost_hits;
nhdr->b_l1hdr.b_mfu_hits = hdr->b_l1hdr.b_mfu_hits;
nhdr->b_l1hdr.b_mfu_ghost_hits = hdr->b_l1hdr.b_mfu_ghost_hits;
nhdr->b_l1hdr.b_acb = hdr->b_l1hdr.b_acb;
nhdr->b_l1hdr.b_pabd = hdr->b_l1hdr.b_pabd;
/*
* This zfs_refcount_add() exists only to ensure that the individual
* arc buffers always point to a header that is referenced, avoiding
* a small race condition that could trigger ASSERTs.
*/
(void) zfs_refcount_add(&nhdr->b_l1hdr.b_refcnt, FTAG);
nhdr->b_l1hdr.b_buf = hdr->b_l1hdr.b_buf;
for (buf = nhdr->b_l1hdr.b_buf; buf != NULL; buf = buf->b_next)
buf->b_hdr = nhdr;
zfs_refcount_transfer(&nhdr->b_l1hdr.b_refcnt, &hdr->b_l1hdr.b_refcnt);
(void) zfs_refcount_remove(&nhdr->b_l1hdr.b_refcnt, FTAG);
ASSERT0(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt));
if (need_crypt) {
arc_hdr_set_flags(nhdr, ARC_FLAG_PROTECTED);
} else {
arc_hdr_clear_flags(nhdr, ARC_FLAG_PROTECTED);
}
/* unset all members of the original hdr */
memset(&hdr->b_dva, 0, sizeof (dva_t));
hdr->b_birth = 0;
hdr->b_type = 0;
hdr->b_flags = 0;
hdr->b_psize = 0;
hdr->b_lsize = 0;
hdr->b_spa = 0;
#ifdef ZFS_DEBUG
hdr->b_l1hdr.b_freeze_cksum = NULL;
#endif
hdr->b_l1hdr.b_buf = NULL;
hdr->b_l1hdr.b_bufcnt = 0;
hdr->b_l1hdr.b_byteswap = 0;
hdr->b_l1hdr.b_state = NULL;
hdr->b_l1hdr.b_arc_access = 0;
hdr->b_l1hdr.b_mru_hits = 0;
hdr->b_l1hdr.b_mru_ghost_hits = 0;
hdr->b_l1hdr.b_mfu_hits = 0;
hdr->b_l1hdr.b_mfu_ghost_hits = 0;
hdr->b_l1hdr.b_acb = NULL;
hdr->b_l1hdr.b_pabd = NULL;
if (ocache == hdr_full_crypt_cache) {
ASSERT(!HDR_HAS_RABD(hdr));
hdr->b_crypt_hdr.b_ot = DMU_OT_NONE;
hdr->b_crypt_hdr.b_ebufcnt = 0;
hdr->b_crypt_hdr.b_dsobj = 0;
memset(hdr->b_crypt_hdr.b_salt, 0, ZIO_DATA_SALT_LEN);
memset(hdr->b_crypt_hdr.b_iv, 0, ZIO_DATA_IV_LEN);
memset(hdr->b_crypt_hdr.b_mac, 0, ZIO_DATA_MAC_LEN);
}
buf_discard_identity(hdr);
kmem_cache_free(ocache, hdr);
return (nhdr);
}
/* /*
* This function is used by the send / receive code to convert a newly * This function is used by the send / receive code to convert a newly
* allocated arc_buf_t to one that is suitable for a raw encrypted write. It * allocated arc_buf_t to one that is suitable for a raw encrypted write. It
@ -3587,8 +3397,7 @@ arc_convert_to_raw(arc_buf_t *buf, uint64_t dsobj, boolean_t byteorder,
ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon); ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
buf->b_flags |= (ARC_BUF_FLAG_COMPRESSED | ARC_BUF_FLAG_ENCRYPTED); buf->b_flags |= (ARC_BUF_FLAG_COMPRESSED | ARC_BUF_FLAG_ENCRYPTED);
if (!HDR_PROTECTED(hdr)) arc_hdr_set_flags(hdr, ARC_FLAG_PROTECTED);
hdr = arc_hdr_realloc_crypt(hdr, B_TRUE);
hdr->b_crypt_hdr.b_dsobj = dsobj; hdr->b_crypt_hdr.b_dsobj = dsobj;
hdr->b_crypt_hdr.b_ot = ot; hdr->b_crypt_hdr.b_ot = ot;
hdr->b_l1hdr.b_byteswap = (byteorder == ZFS_HOST_BYTEORDER) ? hdr->b_l1hdr.b_byteswap = (byteorder == ZFS_HOST_BYTEORDER) ?
@ -3789,8 +3598,6 @@ static void
arc_hdr_destroy(arc_buf_hdr_t *hdr) arc_hdr_destroy(arc_buf_hdr_t *hdr)
{ {
if (HDR_HAS_L1HDR(hdr)) { if (HDR_HAS_L1HDR(hdr)) {
ASSERT(hdr->b_l1hdr.b_buf == NULL ||
hdr->b_l1hdr.b_bufcnt > 0);
ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon); ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
} }
@ -3854,12 +3661,7 @@ arc_hdr_destroy(arc_buf_hdr_t *hdr)
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
ASSERT3P(hdr->b_l1hdr.b_freeze_cksum, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_freeze_cksum, ==, NULL);
#endif #endif
if (!HDR_PROTECTED(hdr)) {
kmem_cache_free(hdr_full_cache, hdr); kmem_cache_free(hdr_full_cache, hdr);
} else {
kmem_cache_free(hdr_full_crypt_cache, hdr);
}
} else { } else {
kmem_cache_free(hdr_l2only_cache, hdr); kmem_cache_free(hdr_l2only_cache, hdr);
} }
@ -3871,7 +3673,8 @@ arc_buf_destroy(arc_buf_t *buf, const void *tag)
arc_buf_hdr_t *hdr = buf->b_hdr; arc_buf_hdr_t *hdr = buf->b_hdr;
if (hdr->b_l1hdr.b_state == arc_anon) { if (hdr->b_l1hdr.b_state == arc_anon) {
ASSERT3U(hdr->b_l1hdr.b_bufcnt, ==, 1); ASSERT3P(hdr->b_l1hdr.b_buf, ==, buf);
ASSERT(ARC_BUF_LAST(buf));
ASSERT(!HDR_IO_IN_PROGRESS(hdr)); ASSERT(!HDR_IO_IN_PROGRESS(hdr));
VERIFY0(remove_reference(hdr, tag)); VERIFY0(remove_reference(hdr, tag));
return; return;
@ -3881,7 +3684,7 @@ arc_buf_destroy(arc_buf_t *buf, const void *tag)
mutex_enter(hash_lock); mutex_enter(hash_lock);
ASSERT3P(hdr, ==, buf->b_hdr); ASSERT3P(hdr, ==, buf->b_hdr);
ASSERT(hdr->b_l1hdr.b_bufcnt > 0); ASSERT3P(hdr->b_l1hdr.b_buf, !=, NULL);
ASSERT3P(hash_lock, ==, HDR_LOCK(hdr)); ASSERT3P(hash_lock, ==, HDR_LOCK(hdr));
ASSERT3P(hdr->b_l1hdr.b_state, !=, arc_anon); ASSERT3P(hdr->b_l1hdr.b_state, !=, arc_anon);
ASSERT3P(buf->b_data, !=, NULL); ASSERT3P(buf->b_data, !=, NULL);
@ -3924,7 +3727,6 @@ arc_evict_hdr(arc_buf_hdr_t *hdr, uint64_t *real_evicted)
ASSERT(MUTEX_HELD(HDR_LOCK(hdr))); ASSERT(MUTEX_HELD(HDR_LOCK(hdr)));
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
ASSERT(!HDR_IO_IN_PROGRESS(hdr)); ASSERT(!HDR_IO_IN_PROGRESS(hdr));
ASSERT0(hdr->b_l1hdr.b_bufcnt);
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT0(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt)); ASSERT0(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt));
@ -5586,13 +5388,6 @@ arc_read_done(zio_t *zio)
buf_hash_remove(hdr); buf_hash_remove(hdr);
} }
/*
* Broadcast before we drop the hash_lock to avoid the possibility
* that the hdr (and hence the cv) might be freed before we get to
* the cv_broadcast().
*/
cv_broadcast(&hdr->b_l1hdr.b_cv);
arc_hdr_clear_flags(hdr, ARC_FLAG_IO_IN_PROGRESS); arc_hdr_clear_flags(hdr, ARC_FLAG_IO_IN_PROGRESS);
(void) remove_reference(hdr, hdr); (void) remove_reference(hdr, hdr);
@ -5787,7 +5582,6 @@ top:
} }
acb->acb_zio_head = head_zio; acb->acb_zio_head = head_zio;
acb->acb_next = hdr->b_l1hdr.b_acb; acb->acb_next = hdr->b_l1hdr.b_acb;
if (hdr->b_l1hdr.b_acb)
hdr->b_l1hdr.b_acb->acb_prev = acb; hdr->b_l1hdr.b_acb->acb_prev = acb;
hdr->b_l1hdr.b_acb = acb; hdr->b_l1hdr.b_acb = acb;
} }
@ -5928,8 +5722,28 @@ top:
* and so the performance impact shouldn't * and so the performance impact shouldn't
* matter. * matter.
*/ */
cv_wait(&hdr->b_l1hdr.b_cv, hash_lock); arc_callback_t *acb = kmem_zalloc(
sizeof (arc_callback_t), KM_SLEEP);
acb->acb_wait = B_TRUE;
mutex_init(&acb->acb_wait_lock, NULL,
MUTEX_DEFAULT, NULL);
cv_init(&acb->acb_wait_cv, NULL, CV_DEFAULT,
NULL);
acb->acb_zio_head =
hdr->b_l1hdr.b_acb->acb_zio_head;
acb->acb_next = hdr->b_l1hdr.b_acb;
hdr->b_l1hdr.b_acb->acb_prev = acb;
hdr->b_l1hdr.b_acb = acb;
mutex_exit(hash_lock); mutex_exit(hash_lock);
mutex_enter(&acb->acb_wait_lock);
while (acb->acb_wait) {
cv_wait(&acb->acb_wait_cv,
&acb->acb_wait_lock);
}
mutex_exit(&acb->acb_wait_lock);
mutex_destroy(&acb->acb_wait_lock);
cv_destroy(&acb->acb_wait_cv);
kmem_free(acb, sizeof (arc_callback_t));
goto top; goto top;
} }
} }
@ -6310,7 +6124,8 @@ arc_release(arc_buf_t *buf, const void *tag)
ASSERT(!HDR_IN_HASH_TABLE(hdr)); ASSERT(!HDR_IN_HASH_TABLE(hdr));
ASSERT(!HDR_HAS_L2HDR(hdr)); ASSERT(!HDR_HAS_L2HDR(hdr));
ASSERT3U(hdr->b_l1hdr.b_bufcnt, ==, 1); ASSERT3P(hdr->b_l1hdr.b_buf, ==, buf);
ASSERT(ARC_BUF_LAST(buf));
ASSERT3S(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt), ==, 1); ASSERT3S(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt), ==, 1);
ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node)); ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node));
@ -6361,7 +6176,7 @@ arc_release(arc_buf_t *buf, const void *tag)
/* /*
* Do we have more than one buf? * Do we have more than one buf?
*/ */
if (hdr->b_l1hdr.b_bufcnt > 1) { if (hdr->b_l1hdr.b_buf != buf || !ARC_BUF_LAST(buf)) {
arc_buf_hdr_t *nhdr; arc_buf_hdr_t *nhdr;
uint64_t spa = hdr->b_spa; uint64_t spa = hdr->b_spa;
uint64_t psize = HDR_GET_PSIZE(hdr); uint64_t psize = HDR_GET_PSIZE(hdr);
@ -6442,10 +6257,6 @@ arc_release(arc_buf_t *buf, const void *tag)
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
hdr->b_l1hdr.b_bufcnt -= 1;
if (ARC_BUF_ENCRYPTED(buf))
hdr->b_crypt_hdr.b_ebufcnt -= 1;
arc_cksum_verify(buf); arc_cksum_verify(buf);
arc_buf_unwatch(buf); arc_buf_unwatch(buf);
@ -6458,15 +6269,11 @@ arc_release(arc_buf_t *buf, const void *tag)
nhdr = arc_hdr_alloc(spa, psize, lsize, protected, nhdr = arc_hdr_alloc(spa, psize, lsize, protected,
compress, hdr->b_complevel, type); compress, hdr->b_complevel, type);
ASSERT3P(nhdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(nhdr->b_l1hdr.b_buf, ==, NULL);
ASSERT0(nhdr->b_l1hdr.b_bufcnt);
ASSERT0(zfs_refcount_count(&nhdr->b_l1hdr.b_refcnt)); ASSERT0(zfs_refcount_count(&nhdr->b_l1hdr.b_refcnt));
VERIFY3U(nhdr->b_type, ==, type); VERIFY3U(nhdr->b_type, ==, type);
ASSERT(!HDR_SHARED_DATA(nhdr)); ASSERT(!HDR_SHARED_DATA(nhdr));
nhdr->b_l1hdr.b_buf = buf; nhdr->b_l1hdr.b_buf = buf;
nhdr->b_l1hdr.b_bufcnt = 1;
if (ARC_BUF_ENCRYPTED(buf))
nhdr->b_crypt_hdr.b_ebufcnt = 1;
(void) zfs_refcount_add(&nhdr->b_l1hdr.b_refcnt, tag); (void) zfs_refcount_add(&nhdr->b_l1hdr.b_refcnt, tag);
buf->b_hdr = nhdr; buf->b_hdr = nhdr;
@ -6517,7 +6324,7 @@ arc_write_ready(zio_t *zio)
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
ASSERT(!zfs_refcount_is_zero(&buf->b_hdr->b_l1hdr.b_refcnt)); ASSERT(!zfs_refcount_is_zero(&buf->b_hdr->b_l1hdr.b_refcnt));
ASSERT(hdr->b_l1hdr.b_bufcnt > 0); ASSERT3P(hdr->b_l1hdr.b_buf, !=, NULL);
/* /*
* If we're reexecuting this zio because the pool suspended, then * If we're reexecuting this zio because the pool suspended, then
@ -6552,13 +6359,9 @@ arc_write_ready(zio_t *zio)
add_reference(hdr, hdr); /* For IO_IN_PROGRESS. */ add_reference(hdr, hdr); /* For IO_IN_PROGRESS. */
} }
if (BP_IS_PROTECTED(bp) != !!HDR_PROTECTED(hdr))
hdr = arc_hdr_realloc_crypt(hdr, BP_IS_PROTECTED(bp));
if (BP_IS_PROTECTED(bp)) { if (BP_IS_PROTECTED(bp)) {
/* ZIL blocks are written through zio_rewrite */ /* ZIL blocks are written through zio_rewrite */
ASSERT3U(BP_GET_TYPE(bp), !=, DMU_OT_INTENT_LOG); ASSERT3U(BP_GET_TYPE(bp), !=, DMU_OT_INTENT_LOG);
ASSERT(HDR_PROTECTED(hdr));
if (BP_SHOULD_BYTESWAP(bp)) { if (BP_SHOULD_BYTESWAP(bp)) {
if (BP_GET_LEVEL(bp) > 0) { if (BP_GET_LEVEL(bp) > 0) {
@ -6571,11 +6374,14 @@ arc_write_ready(zio_t *zio)
hdr->b_l1hdr.b_byteswap = DMU_BSWAP_NUMFUNCS; hdr->b_l1hdr.b_byteswap = DMU_BSWAP_NUMFUNCS;
} }
arc_hdr_set_flags(hdr, ARC_FLAG_PROTECTED);
hdr->b_crypt_hdr.b_ot = BP_GET_TYPE(bp); hdr->b_crypt_hdr.b_ot = BP_GET_TYPE(bp);
hdr->b_crypt_hdr.b_dsobj = zio->io_bookmark.zb_objset; hdr->b_crypt_hdr.b_dsobj = zio->io_bookmark.zb_objset;
zio_crypt_decode_params_bp(bp, hdr->b_crypt_hdr.b_salt, zio_crypt_decode_params_bp(bp, hdr->b_crypt_hdr.b_salt,
hdr->b_crypt_hdr.b_iv); hdr->b_crypt_hdr.b_iv);
zio_crypt_decode_mac_bp(bp, hdr->b_crypt_hdr.b_mac); zio_crypt_decode_mac_bp(bp, hdr->b_crypt_hdr.b_mac);
} else {
arc_hdr_clear_flags(hdr, ARC_FLAG_PROTECTED);
} }
/* /*
@ -6656,7 +6462,8 @@ arc_write_ready(zio_t *zio)
} else { } else {
ASSERT3P(buf->b_data, ==, abd_to_buf(zio->io_orig_abd)); ASSERT3P(buf->b_data, ==, abd_to_buf(zio->io_orig_abd));
ASSERT3U(zio->io_orig_size, ==, arc_buf_size(buf)); ASSERT3U(zio->io_orig_size, ==, arc_buf_size(buf));
ASSERT3U(hdr->b_l1hdr.b_bufcnt, ==, 1); ASSERT3P(hdr->b_l1hdr.b_buf, ==, buf);
ASSERT(ARC_BUF_LAST(buf));
arc_share_buf(hdr, buf); arc_share_buf(hdr, buf);
} }
@ -6737,7 +6544,8 @@ arc_write_done(zio_t *zio)
(void *)hdr, (void *)exists); (void *)hdr, (void *)exists);
} else { } else {
/* Dedup */ /* Dedup */
ASSERT(hdr->b_l1hdr.b_bufcnt == 1); ASSERT3P(hdr->b_l1hdr.b_buf, !=, NULL);
ASSERT(ARC_BUF_LAST(hdr->b_l1hdr.b_buf));
ASSERT(hdr->b_l1hdr.b_state == arc_anon); ASSERT(hdr->b_l1hdr.b_state == arc_anon);
ASSERT(BP_GET_DEDUP(zio->io_bp)); ASSERT(BP_GET_DEDUP(zio->io_bp));
ASSERT(BP_GET_LEVEL(zio->io_bp) == 0); ASSERT(BP_GET_LEVEL(zio->io_bp) == 0);
@ -6778,7 +6586,7 @@ arc_write(zio_t *pio, spa_t *spa, uint64_t txg,
ASSERT(!HDR_IO_ERROR(hdr)); ASSERT(!HDR_IO_ERROR(hdr));
ASSERT(!HDR_IO_IN_PROGRESS(hdr)); ASSERT(!HDR_IO_IN_PROGRESS(hdr));
ASSERT3P(hdr->b_l1hdr.b_acb, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_acb, ==, NULL);
ASSERT3U(hdr->b_l1hdr.b_bufcnt, >, 0); ASSERT3P(hdr->b_l1hdr.b_buf, !=, NULL);
if (uncached) if (uncached)
arc_hdr_set_flags(hdr, ARC_FLAG_UNCACHED); arc_hdr_set_flags(hdr, ARC_FLAG_UNCACHED);
else if (l2arc) else if (l2arc)
@ -9092,15 +8900,16 @@ l2arc_apply_transforms(spa_t *spa, arc_buf_hdr_t *hdr, uint64_t asize,
* write things before deciding to fail compression in nearly * write things before deciding to fail compression in nearly
* every case.) * every case.)
*/ */
cabd = abd_alloc_for_io(size, ismd); uint64_t bufsize = MAX(size, asize);
tmp = abd_borrow_buf(cabd, size); cabd = abd_alloc_for_io(bufsize, ismd);
tmp = abd_borrow_buf(cabd, bufsize);
psize = zio_compress_data(compress, to_write, &tmp, size, psize = zio_compress_data(compress, to_write, &tmp, size,
hdr->b_complevel); hdr->b_complevel);
if (psize >= asize) { if (psize >= asize) {
psize = HDR_GET_PSIZE(hdr); psize = HDR_GET_PSIZE(hdr);
abd_return_buf_copy(cabd, tmp, size); abd_return_buf_copy(cabd, tmp, bufsize);
HDR_SET_COMPRESS(hdr, ZIO_COMPRESS_OFF); HDR_SET_COMPRESS(hdr, ZIO_COMPRESS_OFF);
to_write = cabd; to_write = cabd;
abd_copy(to_write, hdr->b_l1hdr.b_pabd, psize); abd_copy(to_write, hdr->b_l1hdr.b_pabd, psize);
@ -9110,9 +8919,9 @@ l2arc_apply_transforms(spa_t *spa, arc_buf_hdr_t *hdr, uint64_t asize,
} }
ASSERT3U(psize, <=, HDR_GET_PSIZE(hdr)); ASSERT3U(psize, <=, HDR_GET_PSIZE(hdr));
if (psize < asize) if (psize < asize)
memset((char *)tmp + psize, 0, asize - psize); memset((char *)tmp + psize, 0, bufsize - psize);
psize = HDR_GET_PSIZE(hdr); psize = HDR_GET_PSIZE(hdr);
abd_return_buf_copy(cabd, tmp, size); abd_return_buf_copy(cabd, tmp, bufsize);
to_write = cabd; to_write = cabd;
} }
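
Several hunks above drop the `b_bufcnt` counter and instead assert on the buffer list itself (`hdr->b_l1hdr.b_buf == buf` together with `ARC_BUF_LAST(buf)`). The following is a minimal sketch of why the two forms express the same condition. It uses simplified stand-in types: `buf_t`, `hdr_t`, `buf_is_last` and `hdr_has_only` are illustrative names, not the real ARC structures, and it assumes `ARC_BUF_LAST(buf)` means the buffer has no successor in the header's list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for arc_buf_t / arc_buf_hdr_t (illustrative only). */
typedef struct buf {
	struct buf *b_next;	/* next buffer attached to the same header */
} buf_t;

typedef struct hdr {
	buf_t *b_buf;		/* head of the singly linked buffer list */
} hdr_t;

/* Assumption: mirrors ARC_BUF_LAST(), true when buf ends the list. */
static bool
buf_is_last(const buf_t *buf)
{
	return (buf->b_next == NULL);
}

/*
 * Counter-free form of "the header has exactly one buffer and it is buf":
 * buf must be both the head and the tail of the list.
 */
static bool
hdr_has_only(const hdr_t *hdr, const buf_t *buf)
{
	return (hdr->b_buf == buf && buf_is_last(buf));
}
```

Under these assumptions the negated form, `hdr->b_buf != buf || !buf_is_last(buf)`, corresponds to the old `b_bufcnt > 1` test for a buffer known to be attached to the header.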

@ -284,7 +284,17 @@ bpobj_iterate_blkptrs(bpobj_info_t *bpi, bpobj_itor_t func, void *arg,
dmu_buf_t *dbuf = NULL; dmu_buf_t *dbuf = NULL;
bpobj_t *bpo = bpi->bpi_bpo; bpobj_t *bpo = bpi->bpi_bpo;
for (int64_t i = bpo->bpo_phys->bpo_num_blkptrs - 1; i >= start; i--) { int64_t i = bpo->bpo_phys->bpo_num_blkptrs - 1;
uint64_t pe = P2ALIGN_TYPED(i, bpo->bpo_epb, uint64_t) *
sizeof (blkptr_t);
uint64_t ps = start * sizeof (blkptr_t);
uint64_t pb = MAX((pe > dmu_prefetch_max) ? pe - dmu_prefetch_max : 0,
ps);
if (pe > pb) {
dmu_prefetch(bpo->bpo_os, bpo->bpo_object, 0, pb, pe - pb,
ZIO_PRIORITY_ASYNC_READ);
}
for (; i >= start; i--) {
uint64_t offset = i * sizeof (blkptr_t); uint64_t offset = i * sizeof (blkptr_t);
uint64_t blkoff = P2PHASE(i, bpo->bpo_epb); uint64_t blkoff = P2PHASE(i, bpo->bpo_epb);
@ -292,9 +302,16 @@ bpobj_iterate_blkptrs(bpobj_info_t *bpi, bpobj_itor_t func, void *arg,
if (dbuf) if (dbuf)
dmu_buf_rele(dbuf, FTAG); dmu_buf_rele(dbuf, FTAG);
err = dmu_buf_hold(bpo->bpo_os, bpo->bpo_object, err = dmu_buf_hold(bpo->bpo_os, bpo->bpo_object,
offset, FTAG, &dbuf, 0); offset, FTAG, &dbuf, DMU_READ_NO_PREFETCH);
if (err) if (err)
break; break;
pe = pb;
pb = MAX((dbuf->db_offset > dmu_prefetch_max) ?
dbuf->db_offset - dmu_prefetch_max : 0, ps);
if (pe > pb) {
dmu_prefetch(bpo->bpo_os, bpo->bpo_object, 0,
pb, pe - pb, ZIO_PRIORITY_ASYNC_READ);
}
} }
ASSERT3U(offset, >=, dbuf->db_offset); ASSERT3U(offset, >=, dbuf->db_offset);
@ -466,22 +483,30 @@ bpobj_iterate_impl(bpobj_t *initial_bpo, bpobj_itor_t func, void *arg,
int64_t i = bpi->bpi_unprocessed_subobjs - 1; int64_t i = bpi->bpi_unprocessed_subobjs - 1;
uint64_t offset = i * sizeof (uint64_t); uint64_t offset = i * sizeof (uint64_t);
uint64_t obj_from_sublist; uint64_t subobj;
err = dmu_read(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs, err = dmu_read(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs,
offset, sizeof (uint64_t), &obj_from_sublist, offset, sizeof (uint64_t), &subobj,
DMU_READ_PREFETCH); DMU_READ_NO_PREFETCH);
if (err) if (err)
break; break;
bpobj_t *sublist = kmem_alloc(sizeof (bpobj_t),
bpobj_t *subbpo = kmem_alloc(sizeof (bpobj_t),
KM_SLEEP); KM_SLEEP);
err = bpobj_open(subbpo, bpo->bpo_os, subobj);
err = bpobj_open(sublist, bpo->bpo_os, if (err) {
obj_from_sublist); kmem_free(subbpo, sizeof (bpobj_t));
if (err)
break; break;
}
list_insert_head(&stack, bpi_alloc(sublist, bpi, i)); if (subbpo->bpo_havesubobj &&
mutex_enter(&sublist->bpo_lock); subbpo->bpo_phys->bpo_subobjs != 0) {
dmu_prefetch(subbpo->bpo_os,
subbpo->bpo_phys->bpo_subobjs, 0, 0, 0,
ZIO_PRIORITY_ASYNC_READ);
}
list_insert_head(&stack, bpi_alloc(subbpo, bpi, i));
mutex_enter(&subbpo->bpo_lock);
bpi->bpi_unprocessed_subobjs--; bpi->bpi_unprocessed_subobjs--;
} }
} }
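
The `bpobj_iterate_blkptrs` hunk above walks the object backwards and issues a bounded prefetch ahead of the walk: each window ends where the previous one began, reaches back at most `dmu_prefetch_max` bytes from the current position, and never extends below the byte offset of `start`. The clamping arithmetic can be sketched in isolation as below. `window_begin` and its parameter names are hypothetical helpers for illustration, not part of the ZFS sources.

```c
#include <stdint.h>

/*
 * Lower bound of the next prefetch window when walking backwards.
 * anchor:    byte offset the look-back is measured from
 * lower:     byte offset below which nothing should be prefetched
 * max_bytes: cap on the look-back distance (dmu_prefetch_max in the diff)
 * The subtraction is guarded so the unsigned value cannot wrap below zero.
 */
static uint64_t
window_begin(uint64_t anchor, uint64_t lower, uint64_t max_bytes)
{
	uint64_t begin = (anchor > max_bytes) ? anchor - max_bytes : 0;
	return (begin > lower ? begin : lower);
}
```

In the diff's terms, `pb = window_begin(pe, ps, dmu_prefetch_max)` gives the first window `[pb, pe)`. After each new dbuf is held, the walk sets `pe = pb`, recomputes `pb` from `dbuf->db_offset`, and only prefetches when `pe > pb`, so no byte range is requested twice.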

Some files were not shown because too many files have changed in this diff