Compare commits

...

39 Commits

Author SHA1 Message Date
Brian Behlendorf 4a104ac047 Tag 2.2.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-27 16:15:44 -07:00
oromenahar c24a480631 BRT should return EOPNOTSUPP
Return the more descriptive EOPNOTSUPP instead of EXDEV when the
storage pool doesn't support block cloning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15097
2023-07-27 16:11:54 -07:00
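A caller that attempts a clone and falls back to an ordinary copy might use the two error codes from the commit above as follows. This is a minimal sketch, not code from OpenZFS or any particular tool; the helper name `clone_errno_allows_fallback` and the fallback policy are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper: given the positive errno from a failed
 * clone attempt, report whether a plain data copy is a sensible fallback.
 */
static bool
clone_errno_allows_fallback(int err)
{
	assert(err > 0);
	switch (err) {
	case EOPNOTSUPP:	/* cloning is not supported, e.g. by the pool */
	case EXDEV:		/* source and destination cannot share blocks */
		return (true);
	default:		/* real I/O or permission errors: do not mask */
		return (false);
	}
}
```

Both codes lead to the same fallback in this sketch, but only EOPNOTSUPP tells the user that the pool itself lacks block cloning support.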
Rob Norris 36d1a3ef4e zts: block cloning tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
Closes #405
Closes #13349
2023-07-26 08:46:58 -07:00
Rob Norris 2768dc04cc linux: implement filesystem-side copy/clone functions for EL7
Red Hat has backported copy_file_range and clone_file_range to the EL7
kernel using an "extended file operations" wrapper structure. This
connects all that up to let cloning work there too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 3366ceaf3a linux: implement filesystem-side clone ioctls
Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and
were implemented as regular filesystem-specific ioctls. This implements
those ioctls directly in OpenZFS, allowing cloning to work on older
kernels.

There's no need to gate these behind version checks; on later kernels
Linux will simply never deliver these ioctls, instead calling the
appropriate VFS op.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 5d12545da8 linux: implement filesystem-side copy/clone functions
This implements the Linux VFS ops required to service the file
copy/clone APIs:

  .copy_file_range    (4.5+)
  .clone_file_range   (4.5-4.19)
  .dedupe_file_range  (4.5-4.19)
  .remap_file_range   (4.20+)

Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are
hooked up here, but are not implemented yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
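The version ranges listed in the commit above can be read as a dispatch rule for clone requests. The sketch below encodes them in a hypothetical helper, `clone_entry_point`, keyed on the upstream kernel version. This is an illustration only: as the configure checks in this change set suggest, the real build probes for each VFS op rather than comparing version numbers, which matters for vendor kernels with backports such as EL7.

```c
#include <assert.h>

/*
 * Hypothetical helper: name the entry point through which an upstream
 * kernel of the given version would deliver a clone request.
 */
static const char *
clone_entry_point(int major, int minor)
{
	assert(major >= 0 && minor >= 0);
	if (major > 4 || (major == 4 && minor >= 20))
		return ("remap_file_range");	/* 4.20+ */
	if (major == 4 && minor >= 5)
		return ("clone_file_range");	/* 4.5-4.19 */
	return ("ioctl");	/* before 4.5: filesystem-specific ioctl */
}
```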
Rob Norris a3ea8c8ee6 dbuf_sync_leaf: check DB_READ in state assertions
Block cloning introduced a new state transition from DB_NOFILL to
DB_READ. This occurs when a block is cloned and then read on the
current txg.

In this case, the clone will move the dbuf to DB_NOFILL, and then the
read will be issued for the overridden block pointer. If that read is
still outstanding when it comes time to write, the dbuf will be in
DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus
tripping the assertions.

This updates those checks to allow DB_READ as a valid state iff the
dirty record is for a BRT write and there is an override block pointer.
This is a safe situation because the block already exists, so there's
nothing that could change from underneath the read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 0426e13271 dmu_buf_will_clone: only check that current txg is clean
dbuf_undirty() will (correctly) only remove dirty records for the given
(open) txg. If there is a dirty record for an earlier closed txg that
has not been synced out yet, then db_dirty_records will still have
entries on it, tripping the assertion.

Instead, change the assertion to only consider the current txg. To some
extent this is redundant, as it's really just saying "did dbuf_undirty()
work?", but it doesn't hurt and accurately expresses our
expectations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 8aa4f0f0fc brt_vdev_realloc: use vmem_alloc for large allocation
bv_entcount can be a relatively large allocation (see comment for
BRT_RANGESIZE), so get it from the big allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 7698503dca zfs_clone_range: use vmem_malloc for large allocation
Just silencing the warning about large allocations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Brian Behlendorf b9aa32ff39 zed: Reduce log noise for large JBODs
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful we remove it.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15086
Closes #15094
2023-07-26 08:46:58 -07:00
Brian Behlendorf 571762b290 Linux 6.4 compat: META
Update the META file to reflect compatibility with the 6.4 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15095
2023-07-26 08:46:58 -07:00
Alexander Motin 991834f5dc Remove zl_issuer_lock from zil_suspend().
This locking was recently added as part of #14979. But it appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes a deadlock with the sync thread in
spa_sync_upgrades().  On second thought, we should not
need this locking, since zil_commit_impl(), which we call below, takes
zl_issuer_lock, which should sufficiently protect zl_suspend reads,
combined with other logic from #14979.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15103
2023-07-25 13:54:02 -07:00
Alexander Motin 41a0f66279 ZIL: Fix config lock deadlock.
When we have some LWBs closed and their ZIOs ready to be issued, we
cannot afford to sleep on the config lock if somebody else tries to lock
it as writer, or it will cause a deadlock.

To solve it, move spa_config_enter() from zil_lwb_write_issue() to
zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering
with other threads.  Now if we can't immediately lock config, issue
all previously closed LWBs so that they could drop their config
locks after completion, and only then allow sleeping on our lock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15078
Closes #15080
2023-07-25 13:54:02 -07:00
Umer Saleem c79d1bae75 Update changelog for OpenZFS 2.2.0 release
This commit updates the changelog for native Debian packages for the
OpenZFS 2.2.0 release.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15104
2023-07-25 09:01:27 -07:00
Brian Behlendorf 70232483b4 Tag 2.2.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-21 16:36:34 -07:00
Rob N c5273e0c31 shellcheck: disable "unreachable command" check [SC2317]
This new check in 0.9.0 appears to have some issues with various forms
of "early return", like trap, exit and return. This is tripping up (at
least):

  cmd/zed/zed.d/history_event-zfs-list-cacher.sh
  /etc/zfs/zfs-functions

It's not obvious what it's complaining about or what the remedy is, so it
seems sensible to disable this check for now.

See also:

  https://www.shellcheck.net/wiki/SC2317
  https://github.com/koalaman/shellcheck/issues/2542
  https://github.com/koalaman/shellcheck/issues/2613

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15089
2023-07-21 16:35:12 -07:00
Rob N 685ae4429f metaslab: tuneable to better control force ganging
metaslab_force_ganging isn't enough to actually force ganging, because
it still only forces 3% of the time. This adds
metaslab_force_ganging_pct so we can configure how often to force
ganging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15088
2023-07-21 16:35:12 -07:00
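The interaction between the size threshold and the new percentage tunable can be sketched as a single predicate. This is a minimal illustration and not the OpenZFS implementation: the function name `should_force_gang`, the `>=` comparison against the threshold, and the injected `roll` parameter (standing in for a uniform random draw in [0, 100)) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: force ganging only for allocations at or above
 * the size threshold, and only for the configured share of those.
 * `roll` stands in for a uniform random draw in [0, 100).
 */
static bool
should_force_gang(uint64_t psize, uint64_t force_ganging,
    uint32_t force_ganging_pct, uint32_t roll)
{
	assert(roll < 100);
	/* Clamp so that values above 100 behave like "always". */
	uint32_t pct = force_ganging_pct > 100 ? 100 : force_ganging_pct;
	return (psize >= force_ganging && roll < pct);
}
```

With the percentage at 3, the sketch reproduces the old behaviour described in the commit message; at 100, every allocation at or above the threshold is forced to gang.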
Alexander Motin 81be809a25 Adjust prefetch parameters.
- Reduce maximum prefetch distance for 32bit platforms to 8MB as it
was previously.  Those systems probably didn't grow much, so better
stay conservative there.
- Retire the array_rd_sz tunable, which blocked prefetch for large requests.
We should not penalize applications trying to be more efficient. The
speculative prefetcher by itself has reasonable distance limits, and
1MB is not much at all these days.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15072
2023-07-21 16:35:12 -07:00
Alexander Motin 8a6fde8213 Add explicit prefetches to bpobj_iterate().
To simplify error handling, bpobj_iterate_blkptrs() iterates through
the list of block pointers backwards.  Unfortunately, the speculative
prefetcher is currently unable to detect such patterns, which makes
each block read there synchronous and very slow on HDD pools.

According to my tests, the added explicit prefetch reduces the time needed
to asynchronously delete 8 snapshots of 4 million blocks each from
20 seconds to less than one, which should free the sync thread for other
useful work, such as async writes, scrub, etc.

While there, plug one memory leak in case of bpobj_open() error and
harmonize some variable names.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15071
2023-07-21 16:35:12 -07:00
Alan Somers b6f618f8ff Don't emit cksum_{actual_expected} in ereport.fs.zfs.checksum events
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes #14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #14717
Closes #15052
2023-07-21 16:35:12 -07:00
Alan Somers 51a2b59767 Don't emit checksum histograms in ereport.fs.zfs.checksum events
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue #14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15052
2023-07-21 16:35:12 -07:00
Tony Hutter 8c81c0b05d zed: Fix zed ASSERT on slot power cycle
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15084
2023-07-21 16:35:12 -07:00
Chunwei Chen b221f43943 Fix zpl_test_super race with zfs_umount
We cannot call zpl_enter in zpl_test_super, because zpl_test_super is
under spinlock so we can't sleep, and also because zpl_test_super is
called without sb->s_umount taken, so it's possible we would race with
zfs_umount and call zpl_enter on freed zfsvfs.

Here's a stack trace when this happens:
[ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast()
[ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.115012] Call Trace:
[ 2379.115019]  dump_stack+0x74/0x96
[ 2379.115024]  panic+0x114/0x2f6
[ 2379.115035]  spl_panic+0xcf/0xfc [spl]
[ 2379.115477]  __cv_broadcast+0x68/0xa0 [spl]
[ 2379.115585]  rrw_exit+0xb8/0x310 [zfs]
[ 2379.115696]  rrm_exit+0x4a/0x80 [zfs]
[ 2379.115808]  zpl_test_super+0xa9/0xd0 [zfs]
[ 2379.115920]  sget+0xd1/0x230
[ 2379.116033]  zpl_mount+0xdc/0x230 [zfs]
[ 2379.116037]  legacy_get_tree+0x28/0x50
[ 2379.116039]  vfs_get_tree+0x27/0xc0
[ 2379.116045]  path_mount+0x2fe/0xa70
[ 2379.116048]  do_mount+0x80/0xa0
[ 2379.116050]  __x64_sys_mount+0x8b/0xe0
[ 2379.116052]  do_syscall_64+0x35/0x50
[ 2379.116054]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 2379.116057] RIP: 0033:0x7f9912e8b26a

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15077
2023-07-21 16:35:12 -07:00
Ameer Hamza e037327bfe spa_min_alloc should be GCD, not min
Since spa_min_alloc may not be a power of 2, unlike ashifts, in the
case of DRAID, we should not select the minimal value among several
vdevs. Rounding to a multiple of it is unlikely to work for other
vdevs. Instead, using the greatest common divisor produces smaller
yet more reasonable results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15067
2023-07-21 16:35:12 -07:00
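To illustrate why the minimum is the wrong reduction when sizes are not powers of two, here is a minimal sketch. It is not the OpenZFS code: the helper names `gcd_u64` and `combine_min_alloc` and the sample sizes are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm; hypothetical helper, not an OpenZFS function. */
static uint64_t
gcd_u64(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return (a);
}

/*
 * Fold per-vdev minimum allocation sizes into one pool-wide value
 * that evenly divides every input.
 */
static uint64_t
combine_min_alloc(const uint64_t *sizes, int n)
{
	assert(n > 0);
	uint64_t g = sizes[0];
	for (int i = 1; i < n; i++)
		g = gcd_u64(g, sizes[i]);
	return (g);
}
```

For sample sizes of 12288 and 16384 bytes, the minimum is 12288, which does not divide 16384, so rounding to a multiple of it would not suit the second vdev. The greatest common divisor, 4096, divides both.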
Yuri Pankov 1a2e486d25 Don't panic if setting vdev properties is unsupported for this vdev type
Check that vdev has valid zap and bail out early.

While here, move objid selection out of the loop, it's not going to
change.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15063
2023-07-21 16:35:12 -07:00
Ameer Hamza d8011707cc Ignore pool ashift property during vdev attachment
Ashift can be set for a vdev only during its creation, and the
top-level vdev does not change when a vdev is attached or replaced.
The ashift property should not be used during attachment, as it
does not allow attaching/replacing a vdev if the pool's ashift
property is increased after the existing vdev was created. Instead,
we should be able to attach the vdev if the attached vdev can
satisfy the ashift requirement with its parent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15061
2023-07-21 16:35:12 -07:00
Wojciech Małota-Wójcik f5f5a2db95 Rollback before zfs root is mounted
On my machines I observe random failures caused by rollback happening 
after zfs root is mounted. I've observed two types of failures:

- zfs-rollback-bootfs.service fails saying that rollback must be
  done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for a couple of days,
none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the 
`After` directive. Maybe someone else is able to address this?

Reviewed-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Wojciech Małota-Wójcik <59281144+outofforest@users.noreply.github.com>
Closes #15025
2023-07-21 16:35:12 -07:00
Alexander Motin 83b0967c1f Do not request data L1 buffers on scan prefetch.
Set ARC_FLAG_NO_BUF when prefetching data L1 buffers for scan.  We
do not prefetch data L0 buffers, so we do not need the L1 buffers,
only want them to be ready in ARC. This saves some CPU time on the
buffers decompression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15029
2023-07-21 16:35:12 -07:00
Coleman Kane 73ba5df31a Linux 6.5 compat: disk_check_media_change() was added
The disk_check_media_change() function was added which replaces
bdev_check_media_change.  This change was introduced in 6.5rc1
444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a
gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk
is now used to pass the expected data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15060
2023-07-21 16:35:12 -07:00
Coleman Kane 1bc244ae93 Linux 6.5 compat: BLK_STS_NEXUS renamed to BLK_STS_RESV_CONFLICT
This change was introduced in Linux commit
7ba150834b840f6f5cdd07ca69a4ccf39df59a66

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15059
2023-07-21 16:35:12 -07:00
Coleman Kane 931dc70550 Linux 6.5 compat: intptr_t definition is canonically signed
Make the version here match that elsewhere in the kernel and system
headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15058
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-21 16:35:12 -07:00
Yuri Pankov 5299f4f289 set autotrim default to 'off' everywhere
As it turns out, having autotrim default to 'on' on FreeBSD never really
worked due to a mess with defines, where userland and the kernel module were
getting different default values (userland was defaulting to 'off',
while the module thought it was 'on').

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15079
2023-07-21 16:35:12 -07:00
Alan Somers f917cf1c03 Fix the ZFS checksum error histograms with larger record sizes
My analysis in PR #14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15049
2023-07-21 16:35:12 -07:00
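The saturating histogram described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the OpenZFS code: the function name `bit_error_histogram`, the `HIST_BUCKETS` constant, and the calling convention (the caller zeroes `hist` first) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	HIST_BUCKETS	64	/* one bucket per bit position in a 64-bit word */

/*
 * Count mismatched bits by position across a record, saturating each
 * 8-bit bucket at 255 instead of letting it wrap around.
 * The caller is expected to zero `hist` beforehand.
 */
static void
bit_error_histogram(const uint64_t *good, const uint64_t *bad,
    size_t nwords, uint8_t hist[HIST_BUCKETS])
{
	assert(good != NULL && bad != NULL);
	for (size_t w = 0; w < nwords; w++) {
		uint64_t diff = good[w] ^ bad[w];
		for (int bit = 0; bit < HIST_BUCKETS; bit++) {
			if (((diff >> bit) & 1) && hist[bit] < UINT8_MAX)
				hist[bit]++;
		}
	}
}
```

A 2 KiB record holds 256 64-bit words, so a bit that is wrong in every word would count to 256 and wrap an unsaturated 8-bit bucket back to zero.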
Alexander Motin 56ed389a57 Fix raw receive with different indirect block size.
Unlike regular receive, raw receive requires the destination to have the
same block structure as the source.  In case of dnode reclaim this
triggers two special cases, requiring special handling:
 - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz()
should not dirty the data buffer if block size does not change, or
during receive dbuf_dirty_lightweight() will trigger an assertion.
 - If dn_nlevels > 1, we just can't change the ibs, dnode_set_blksz()
would fail and receive_object would trigger an assertion, so we should
destroy and recreate the dnode from scratch.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15039

(cherry picked from commit c4e8742149)
2023-07-20 08:58:29 -07:00
Alexander Motin e613e4bbe3 Avoid extra snprintf() in dsl_deadlist_merge().
Since we are already iterating the ZAP, we have exact string key to
remove, we do not need to call zap_remove_int() with the int key we
just converted, we can call zap_remove() for the original string.

This should make no functional change, only a micro-optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15056

(cherry picked from commit fdba8cbb79)
2023-07-20 08:58:29 -07:00
Alexander Motin b4e630b00c Add missed DMU_PROJECTUSED_OBJECT prefetch.
It seems 9c5167d19f "Project Quota on ZFS" failed to add a prefetch
for DMU_PROJECTUSED_OBJECT during scan (scrub/resilver).  It should
not cause visible problems, but may affect scrub/resilver performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15024
2023-07-20 08:58:29 -07:00
Mateusz Guzik bf6cd30796 FreeBSD: catch up to __FreeBSD_version 1400093
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15036
2023-07-20 08:58:29 -07:00
Alexander Motin 1266cebf87 FreeBSD: Fix build on stable/13 after 1302506.
Starting approximately from version 1302506, vn_lock_pair() grew two
additional arguments, following head.  There is a one-week hole, but
that is the closest reference point we have.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15047
2023-07-20 08:58:29 -07:00
72 changed files with 1996 additions and 266 deletions

4
META
View File

@@ -2,9 +2,9 @@ Meta: 1
 Name: zfs
 Branch: 1.0
 Version: 2.2.0
-Release: rc1
+Release: rc3
 Release-Tags: relext
 License: CDDL
 Author: OpenZFS
-Linux-Maximum: 6.3
+Linux-Maximum: 6.4
 Linux-Minimum: 3.10

View File

@@ -607,8 +607,6 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
 */
 if (nvlist_lookup_string(nvl, dp->dd_prop, &path) != 0 ||
 strcmp(dp->dd_compare, path) != 0) {
-zed_log_msg(LOG_INFO, " %s: no match (%s != vdev %s)",
-__func__, dp->dd_compare, path);
 return;
 }
 if (dp->dd_new_vdev_guid != 0 && dp->dd_new_vdev_guid != guid) {

View File

@@ -416,6 +416,11 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
 FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0)
 return;
+if (vdev_guid == 0) {
+fmd_hdl_debug(hdl, "Got a zero GUID");
+return;
+}
 if (spare) {
 int nspares = find_and_remove_spares(zhdl, vdev_guid);
 fmd_hdl_debug(hdl, "%d spares removed", nspares);

View File

@@ -4,6 +4,7 @@
 # Not following: a was not specified as input (see shellcheck -x). [SC1091]
 # Prefer putting braces around variable references even when not strictly required. [SC2250]
 # Consider invoking this command separately to avoid masking its return value (or use '|| true' to ignore). [SC2312]
+# Command appears to be unreachable. Check usage (or ignore if invoked indirectly). [SC2317]
 # In POSIX sh, 'local' is undefined. [SC2039] # older ShellCheck versions
 # In POSIX sh, 'local' is undefined. [SC3043] # newer ShellCheck versions
@@ -18,7 +19,7 @@ PHONY += shellcheck
 _STGT = $(subst ^,/,$(subst shellcheck-here-,,$@))
 shellcheck-here-%:
 if HAVE_SHELLCHECK
-shellcheck --format=gcc --enable=all --exclude=SC1090,SC1091,SC2039,SC2250,SC2312,SC3043 $$([ -n "$(SHELLCHECK_SHELL)" ] && echo "--shell=$(SHELLCHECK_SHELL)") "$$([ -e "$(_STGT)" ] || echo "$(srcdir)/")$(_STGT)"
+shellcheck --format=gcc --enable=all --exclude=SC1090,SC1091,SC2039,SC2250,SC2312,SC2317,SC3043 $$([ -n "$(SHELLCHECK_SHELL)" ] && echo "--shell=$(SHELLCHECK_SHELL)") "$$([ -e "$(_STGT)" ] || echo "$(srcdir)/")$(_STGT)"
 else
 @echo "skipping shellcheck of" $(_STGT) "because shellcheck is not installed"
 endif

View File

@@ -103,6 +103,33 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_CHECK_DISK_CHANGE], [
 ])
 ])
+dnl #
+dnl # 6.5.x API change
+dnl # disk_check_media_change() was added
+dnl #
+AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_DISK_CHECK_MEDIA_CHANGE], [
+ZFS_LINUX_TEST_SRC([disk_check_media_change], [
+#include <linux/fs.h>
+#include <linux/blkdev.h>
+], [
+struct block_device *bdev = NULL;
+bool error;
+error = disk_check_media_change(bdev->bd_disk);
+])
+])
+AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_DISK_CHECK_MEDIA_CHANGE], [
+AC_MSG_CHECKING([whether disk_check_media_change() exists])
+ZFS_LINUX_TEST_RESULT([disk_check_media_change], [
+AC_MSG_RESULT(yes)
+AC_DEFINE(HAVE_DISK_CHECK_MEDIA_CHANGE, 1,
+[disk_check_media_change() exists])
+], [
+AC_MSG_RESULT(no)
+])
+])
 dnl #
 dnl # bdev_kobj() is introduced from 5.12
 dnl #
@@ -443,6 +470,29 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_ERESTARTSYS], [
 ])
 ])
+dnl #
+dnl # 6.5.x API change
+dnl # BLK_STS_NEXUS replaced with BLK_STS_RESV_CONFLICT
+dnl #
+AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BLK_STS_RESV_CONFLICT], [
+ZFS_LINUX_TEST_SRC([blk_sts_resv_conflict], [
+#include <linux/blkdev.h>
+],[
+blk_status_t s __attribute__ ((unused)) = BLK_STS_RESV_CONFLICT;
+])
+])
+AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BLK_STS_RESV_CONFLICT], [
+AC_MSG_CHECKING([whether BLK_STS_RESV_CONFLICT is defined])
+ZFS_LINUX_TEST_RESULT([blk_sts_resv_conflict], [
+AC_DEFINE(HAVE_BLK_STS_RESV_CONFLICT, 1, [BLK_STS_RESV_CONFLICT is defined])
+AC_MSG_RESULT(yes)
+], [
+AC_MSG_RESULT(no)
+])
+])
+])
 AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
 ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH
 ZFS_AC_KERNEL_SRC_BLKDEV_PUT
@@ -458,6 +508,8 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
 ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE
 ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_KOBJ
 ZFS_AC_KERNEL_SRC_BLKDEV_PART_TO_DEV
+ZFS_AC_KERNEL_SRC_BLKDEV_DISK_CHECK_MEDIA_CHANGE
+ZFS_AC_KERNEL_SRC_BLKDEV_BLK_STS_RESV_CONFLICT
 ])
 AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [
@@ -476,4 +528,6 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [
 ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE
 ZFS_AC_KERNEL_BLKDEV_BDEV_KOBJ
 ZFS_AC_KERNEL_BLKDEV_PART_TO_DEV
+ZFS_AC_KERNEL_BLKDEV_DISK_CHECK_MEDIA_CHANGE
+ZFS_AC_KERNEL_BLKDEV_BLK_STS_RESV_CONFLICT
 ])

View File

@@ -0,0 +1,50 @@
dnl #
dnl # EL7 have backported copy_file_range and clone_file_range and
dnl # added them to an "extended" file_operations struct.
dnl #
dnl # We're testing for both functions in one here, because they will only
dnl # ever appear together and we don't want to match a similar method in
dnl # some future vendor kernel.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_FILE_OPERATIONS_EXTEND], [
ZFS_LINUX_TEST_SRC([vfs_file_operations_extend], [
#include <linux/fs.h>
static ssize_t test_copy_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
size_t len, unsigned int flags) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len; (void) flags;
return (0);
}
static int test_clone_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
u64 len) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len;
return (0);
}
static const struct file_operations_extend
fops __attribute__ ((unused)) = {
.kabi_fops = {},
.copy_file_range = test_copy_file_range,
.clone_file_range = test_clone_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_FILE_OPERATIONS_EXTEND], [
AC_MSG_CHECKING([whether file_operations_extend takes \
.copy_file_range() and .clone_file_range()])
ZFS_LINUX_TEST_RESULT([vfs_file_operations_extend], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_FILE_OPERATIONS_EXTEND, 1,
[file_operations_extend takes .copy_file_range()
and .clone_file_range()])
],[
AC_MSG_RESULT([no])
])
])

View File

@@ -0,0 +1,164 @@
dnl #
dnl # The *_file_range APIs have a long history:
dnl #
dnl # 2.6.29: BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctl introduced
dnl # 3.12: BTRFS_IOC_FILE_EXTENT_SAME ioctl introduced
dnl #
dnl # 4.5: copy_file_range() syscall introduced, added to VFS
dnl # 4.5: BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE renamed to FICLONE ands
dnl # FICLONERANGE, added to VFS as clone_file_range()
dnl # 4.5: BTRFS_IOC_FILE_EXTENT_SAME renamed to FIDEDUPERANGE, added to VFS
dnl # as dedupe_file_range()
dnl #
dnl # 4.20: VFS clone_file_range() and dedupe_file_range() replaced by
dnl # remap_file_range()
dnl #
dnl # 5.3: VFS copy_file_range() expected to do its own fallback,
dnl # generic_copy_file_range() added to support it
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_copy_file_range], [
#include <linux/fs.h>
static ssize_t test_copy_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
size_t len, unsigned int flags) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len; (void) flags;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.copy_file_range = test_copy_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->copy_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_copy_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_COPY_FILE_RANGE, 1,
[fops->copy_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([generic_copy_file_range], [
#include <linux/fs.h>
], [
struct file *src_file __attribute__ ((unused)) = NULL;
loff_t src_off __attribute__ ((unused)) = 0;
struct file *dst_file __attribute__ ((unused)) = NULL;
loff_t dst_off __attribute__ ((unused)) = 0;
size_t len __attribute__ ((unused)) = 0;
unsigned int flags __attribute__ ((unused)) = 0;
generic_copy_file_range(src_file, src_off, dst_file, dst_off,
len, flags);
])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE], [
AC_MSG_CHECKING([whether generic_copy_file_range() is available])
ZFS_LINUX_TEST_RESULT_SYMBOL([generic_copy_file_range],
[generic_copy_file_range], [fs/read_write.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_GENERIC_COPY_FILE_RANGE, 1,
[generic_copy_file_range() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_clone_file_range], [
#include <linux/fs.h>
static int test_clone_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
u64 len) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.clone_file_range = test_clone_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->clone_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_clone_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_CLONE_FILE_RANGE, 1,
[fops->clone_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_dedupe_file_range], [
#include <linux/fs.h>
static int test_dedupe_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
u64 len) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.dedupe_file_range = test_dedupe_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->dedupe_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_dedupe_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_DEDUPE_FILE_RANGE, 1,
[fops->dedupe_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_remap_file_range], [
#include <linux/fs.h>
static loff_t test_remap_file_range(struct file *src_file,
loff_t src_off, struct file *dst_file, loff_t dst_off,
loff_t len, unsigned int flags) {
(void) src_file; (void) src_off;
(void) dst_file; (void) dst_off;
(void) len; (void) flags;
return (0);
}
static const struct file_operations
fops __attribute__ ((unused)) = {
.remap_file_range = test_remap_file_range,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE], [
AC_MSG_CHECKING([whether fops->remap_file_range() is available])
ZFS_LINUX_TEST_RESULT([vfs_remap_file_range], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_REMAP_FILE_RANGE, 1,
[fops->remap_file_range() is available])
],[
AC_MSG_RESULT([no])
])
])


@ -116,6 +116,12 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_VFS_RW_ITERATE ZFS_AC_KERNEL_SRC_VFS_RW_ITERATE
ZFS_AC_KERNEL_SRC_VFS_GENERIC_WRITE_CHECKS ZFS_AC_KERNEL_SRC_VFS_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_SRC_VFS_IOV_ITER ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_FILE_OPERATIONS_EXTEND
ZFS_AC_KERNEL_SRC_KMAP_ATOMIC_ARGS ZFS_AC_KERNEL_SRC_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_SRC_FOLLOW_DOWN_ONE ZFS_AC_KERNEL_SRC_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN
@ -249,6 +255,12 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_VFS_RW_ITERATE ZFS_AC_KERNEL_VFS_RW_ITERATE
ZFS_AC_KERNEL_VFS_GENERIC_WRITE_CHECKS ZFS_AC_KERNEL_VFS_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_VFS_IOV_ITER ZFS_AC_KERNEL_VFS_IOV_ITER
ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE
ZFS_AC_KERNEL_VFS_FILE_OPERATIONS_EXTEND
ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_FOLLOW_DOWN_ONE ZFS_AC_KERNEL_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_MAKE_REQUEST_FN ZFS_AC_KERNEL_MAKE_REQUEST_FN


@ -1,3 +1,9 @@
openzfs-linux (2.2.0-0) unstable; urgency=low
* OpenZFS 2.2.0 is tagged.
-- Umer Saleem <usaleem@ixsystems.com> Tue, 25 Jul 2023 15:00:00 +0500
openzfs-linux (2.1.99-1) unstable; urgency=low openzfs-linux (2.1.99-1) unstable; urgency=low
* Integrate minimally modified Debian packaging from ZFS on Linux * Integrate minimally modified Debian packaging from ZFS on Linux


@ -2,7 +2,7 @@
 Description=Rollback bootfs just before it is mounted
 Requisite=zfs-import.target
 After=zfs-import.target dracut-pre-mount.service zfs-snapshot-bootfs.service
-Before=dracut-mount.service
+Before=dracut-mount.service sysroot.mount
 DefaultDependencies=no
 ConditionKernelCommandLine=bootfs.rollback
 ConditionEnvironment=BOOTFS


@ -36,7 +36,11 @@ struct xucred;
typedef struct flock flock64_t; typedef struct flock flock64_t;
typedef struct vnode vnode_t; typedef struct vnode vnode_t;
typedef struct vattr vattr_t; typedef struct vattr vattr_t;
#if __FreeBSD_version < 1400093
typedef enum vtype vtype_t; typedef enum vtype vtype_t;
#else
#define vtype_t __enum_uint8(vtype)
#endif
#include <sys/types.h> #include <sys/types.h>
#include <sys/queue.h> #include <sys/queue.h>


@ -181,7 +181,11 @@ bi_status_to_errno(blk_status_t status)
return (ENOLINK); return (ENOLINK);
case BLK_STS_TARGET: case BLK_STS_TARGET:
return (EREMOTEIO); return (EREMOTEIO);
#ifdef HAVE_BLK_STS_RESV_CONFLICT
case BLK_STS_RESV_CONFLICT:
#else
case BLK_STS_NEXUS: case BLK_STS_NEXUS:
#endif
return (EBADE); return (EBADE);
case BLK_STS_MEDIUM: case BLK_STS_MEDIUM:
return (ENODATA); return (ENODATA);
@ -215,7 +219,11 @@ errno_to_bi_status(int error)
case EREMOTEIO: case EREMOTEIO:
return (BLK_STS_TARGET); return (BLK_STS_TARGET);
case EBADE: case EBADE:
#ifdef HAVE_BLK_STS_RESV_CONFLICT
return (BLK_STS_RESV_CONFLICT);
#else
return (BLK_STS_NEXUS); return (BLK_STS_NEXUS);
#endif
case ENODATA: case ENODATA:
return (BLK_STS_MEDIUM); return (BLK_STS_MEDIUM);
case EILSEQ: case EILSEQ:
@ -337,6 +345,8 @@ zfs_check_media_change(struct block_device *bdev)
return (0); return (0);
} }
#define vdev_bdev_reread_part(bdev) zfs_check_media_change(bdev) #define vdev_bdev_reread_part(bdev) zfs_check_media_change(bdev)
#elif defined(HAVE_DISK_CHECK_MEDIA_CHANGE)
#define vdev_bdev_reread_part(bdev) disk_check_media_change(bdev->bd_disk)
#else #else
/* /*
* This is encountered if check_disk_change() and bdev_check_media_change() * This is encountered if check_disk_change() and bdev_check_media_change()


@ -38,7 +38,7 @@ typedef unsigned long ulong_t;
 typedef unsigned long long u_longlong_t;
 typedef long long longlong_t;
-typedef unsigned long intptr_t;
+typedef long intptr_t;
 typedef unsigned long long rlim64_t;
 typedef struct task_struct kthread_t;


@ -52,7 +52,11 @@ extern const struct inode_operations zpl_special_inode_operations;
/* zpl_file.c */ /* zpl_file.c */
extern const struct address_space_operations zpl_address_space_operations; extern const struct address_space_operations zpl_address_space_operations;
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
extern const struct file_operations_extend zpl_file_operations;
#else
extern const struct file_operations zpl_file_operations; extern const struct file_operations zpl_file_operations;
#endif
extern const struct file_operations zpl_dir_file_operations; extern const struct file_operations zpl_dir_file_operations;
/* zpl_super.c */ /* zpl_super.c */
@ -180,6 +184,55 @@ zpl_dir_emit_dots(struct file *file, zpl_dir_context_t *ctx)
} }
#endif /* HAVE_VFS_ITERATE */ #endif /* HAVE_VFS_ITERATE */
/* zpl_file_range.c */
/* handlers for file_operations of the same name */
extern ssize_t zpl_copy_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, size_t len, unsigned int flags);
extern loff_t zpl_remap_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, loff_t len, unsigned int flags);
extern int zpl_clone_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len);
extern int zpl_dedupe_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len);
/* compat for FICLONE/FICLONERANGE/FIDEDUPERANGE ioctls */
typedef struct {
int64_t fcr_src_fd;
uint64_t fcr_src_offset;
uint64_t fcr_src_length;
uint64_t fcr_dest_offset;
} zfs_ioc_compat_file_clone_range_t;
typedef struct {
int64_t fdri_dest_fd;
uint64_t fdri_dest_offset;
uint64_t fdri_bytes_deduped;
int32_t fdri_status;
uint32_t fdri_reserved;
} zfs_ioc_compat_dedupe_range_info_t;
typedef struct {
uint64_t fdr_src_offset;
uint64_t fdr_src_length;
uint16_t fdr_dest_count;
uint16_t fdr_reserved1;
uint32_t fdr_reserved2;
zfs_ioc_compat_dedupe_range_info_t fdr_info[];
} zfs_ioc_compat_dedupe_range_t;
#define ZFS_IOC_COMPAT_FICLONE _IOW(0x94, 9, int)
#define ZFS_IOC_COMPAT_FICLONERANGE \
_IOW(0x94, 13, zfs_ioc_compat_file_clone_range_t)
#define ZFS_IOC_COMPAT_FIDEDUPERANGE \
_IOWR(0x94, 54, zfs_ioc_compat_dedupe_range_t)
extern long zpl_ioctl_ficlone(struct file *filp, void *arg);
extern long zpl_ioctl_ficlonerange(struct file *filp, void *arg);
extern long zpl_ioctl_fideduperange(struct file *filp, void *arg);
#if defined(HAVE_INODE_TIMESTAMP_TRUNCATE) #if defined(HAVE_INODE_TIMESTAMP_TRUNCATE)
#define zpl_inode_timestamp_truncate(ts, ip) timestamp_truncate(ts, ip) #define zpl_inode_timestamp_truncate(ts, ip) timestamp_truncate(ts, ip)
#elif defined(HAVE_INODE_TIMESPEC64_TIMES) #elif defined(HAVE_INODE_TIMESPEC64_TIMES)


@ -60,7 +60,7 @@ typedef struct bpobj {
 kmutex_t bpo_lock;
 objset_t *bpo_os;
 uint64_t bpo_object;
-int bpo_epb;
+uint32_t bpo_epb;
 uint8_t bpo_havecomp;
 uint8_t bpo_havesubobj;
 uint8_t bpo_havefreed;


@ -36,8 +36,6 @@
extern "C" { extern "C" {
#endif #endif
extern uint64_t zfetch_array_rd_sz;
struct dnode; /* so we can reference dnode */ struct dnode; /* so we can reference dnode */
typedef struct zfetch { typedef struct zfetch {


@ -102,8 +102,6 @@ extern "C" {
#define FM_EREPORT_PAYLOAD_ZFS_ZIO_TIMESTAMP "zio_timestamp" #define FM_EREPORT_PAYLOAD_ZFS_ZIO_TIMESTAMP "zio_timestamp"
#define FM_EREPORT_PAYLOAD_ZFS_ZIO_DELTA "zio_delta" #define FM_EREPORT_PAYLOAD_ZFS_ZIO_DELTA "zio_delta"
#define FM_EREPORT_PAYLOAD_ZFS_PREV_STATE "prev_state" #define FM_EREPORT_PAYLOAD_ZFS_PREV_STATE "prev_state"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_EXPECTED "cksum_expected"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_ACTUAL "cksum_actual"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_ALGO "cksum_algorithm" #define FM_EREPORT_PAYLOAD_ZFS_CKSUM_ALGO "cksum_algorithm"
#define FM_EREPORT_PAYLOAD_ZFS_CKSUM_BYTESWAP "cksum_byteswap" #define FM_EREPORT_PAYLOAD_ZFS_CKSUM_BYTESWAP "cksum_byteswap"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_OFFSET_RANGES "bad_ranges" #define FM_EREPORT_PAYLOAD_ZFS_BAD_OFFSET_RANGES "bad_ranges"
@ -112,8 +110,6 @@ extern "C" {
#define FM_EREPORT_PAYLOAD_ZFS_BAD_RANGE_CLEARS "bad_range_clears" #define FM_EREPORT_PAYLOAD_ZFS_BAD_RANGE_CLEARS "bad_range_clears"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_SET_BITS "bad_set_bits" #define FM_EREPORT_PAYLOAD_ZFS_BAD_SET_BITS "bad_set_bits"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_BITS "bad_cleared_bits" #define FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_BITS "bad_cleared_bits"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM "bad_set_histogram"
#define FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM "bad_cleared_histogram"
#define FM_EREPORT_PAYLOAD_ZFS_SNAPSHOT_NAME "snapshot_name" #define FM_EREPORT_PAYLOAD_ZFS_SNAPSHOT_NAME "snapshot_name"
#define FM_EREPORT_PAYLOAD_ZFS_DEVICE_NAME "device_name" #define FM_EREPORT_PAYLOAD_ZFS_DEVICE_NAME "device_name"
#define FM_EREPORT_PAYLOAD_ZFS_RAW_DEVICE_NAME "raw_name" #define FM_EREPORT_PAYLOAD_ZFS_RAW_DEVICE_NAME "raw_name"


@ -723,16 +723,10 @@ typedef enum spa_mode {
* Send TRIM commands in-line during normal pool operation while deleting. * Send TRIM commands in-line during normal pool operation while deleting.
* OFF: no * OFF: no
* ON: yes * ON: yes
* NB: IN_FREEBSD_BASE is defined within the FreeBSD sources.
*/ */
typedef enum { typedef enum {
SPA_AUTOTRIM_OFF = 0, /* default */ SPA_AUTOTRIM_OFF = 0, /* default */
SPA_AUTOTRIM_ON, SPA_AUTOTRIM_ON,
#ifdef IN_FREEBSD_BASE
SPA_AUTOTRIM_DEFAULT = SPA_AUTOTRIM_ON,
#else
SPA_AUTOTRIM_DEFAULT = SPA_AUTOTRIM_OFF,
#endif
} spa_autotrim_t; } spa_autotrim_t;
/* /*


@ -250,6 +250,7 @@ struct spa {
uint64_t spa_min_ashift; /* of vdevs in normal class */ uint64_t spa_min_ashift; /* of vdevs in normal class */
uint64_t spa_max_ashift; /* of vdevs in normal class */ uint64_t spa_max_ashift; /* of vdevs in normal class */
uint64_t spa_min_alloc; /* of vdevs in normal class */ uint64_t spa_min_alloc; /* of vdevs in normal class */
uint64_t spa_gcd_alloc; /* of vdevs in normal class */
uint64_t spa_config_guid; /* config pool guid */ uint64_t spa_config_guid; /* config pool guid */
uint64_t spa_load_guid; /* spa_load initialized guid */ uint64_t spa_load_guid; /* spa_load initialized guid */
uint64_t spa_last_synced_guid; /* last synced guid */ uint64_t spa_last_synced_guid; /* last synced guid */


@ -420,6 +420,7 @@ struct vdev {
boolean_t vdev_copy_uberblocks; /* post expand copy uberblocks */ boolean_t vdev_copy_uberblocks; /* post expand copy uberblocks */
boolean_t vdev_resilver_deferred; /* resilver deferred */ boolean_t vdev_resilver_deferred; /* resilver deferred */
boolean_t vdev_kobj_flag; /* kobj event record */ boolean_t vdev_kobj_flag; /* kobj event record */
boolean_t vdev_attaching; /* vdev attach ashift handling */
vdev_queue_t vdev_queue; /* I/O deadline schedule queue */ vdev_queue_t vdev_queue; /* I/O deadline schedule queue */
spa_aux_vdev_t *vdev_aux; /* for l2cache and spares vdevs */ spa_aux_vdev_t *vdev_aux; /* for l2cache and spares vdevs */
zio_t *vdev_probe_zio; /* root of current probe */ zio_t *vdev_probe_zio; /* root of current probe */


@ -94,8 +94,6 @@ typedef const struct zio_checksum_info {
} zio_checksum_info_t; } zio_checksum_info_t;
typedef struct zio_bad_cksum { typedef struct zio_bad_cksum {
zio_cksum_t zbc_expected;
zio_cksum_t zbc_actual;
const char *zbc_checksum_name; const char *zbc_checksum_name;
uint8_t zbc_byteswapped; uint8_t zbc_byteswapped;
uint8_t zbc_injected; uint8_t zbc_injected;


@ -15,7 +15,7 @@
.\" own identifying information: .\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner] .\" Portions Copyright [yyyy] [name of copyright owner]
.\" .\"
.Dd January 10, 2023 .Dd July 21, 2023
.Dt ZFS 4 .Dt ZFS 4
.Os .Os
. .
@ -239,6 +239,11 @@ relative to the pool.
Make some blocks above a certain size be gang blocks. Make some blocks above a certain size be gang blocks.
This option is used by the test suite to facilitate testing. This option is used by the test suite to facilitate testing.
. .
.It Sy metaslab_force_ganging_pct Ns = Ns Sy 3 Ns % Pq uint
For blocks that could be forced to be a gang block (due to
.Sy metaslab_force_ganging ) ,
force this many of them to be gang blocks.
.
.It Sy zfs_ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int .It Sy zfs_ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
Default DDT ZAP data block size as a power of 2. Note that changing this after Default DDT ZAP data block size as a power of 2. Note that changing this after
creating a DDT on the pool will not affect existing DDTs, only newly created creating a DDT on the pool will not affect existing DDTs, only newly created
@ -519,9 +524,6 @@ However, this is limited by
Maximum micro ZAP size. Maximum micro ZAP size.
A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size. A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size.
. .
.It Sy zfetch_array_rd_sz Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
If prefetching is enabled, disable prefetching for reads larger than this size.
.
.It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint .It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint
Min bytes to prefetch per stream. Min bytes to prefetch per stream.
Prefetch distance starts from the demand access size and quickly grows to Prefetch distance starts from the demand access size and quickly grows to


@ -26,7 +26,7 @@
.\" Copyright 2017 Nexenta Systems, Inc. .\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved. .\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\" .\"
.Dd May 27, 2021 .Dd July 11, 2023
.Dt ZPOOL-EVENTS 8 .Dt ZPOOL-EVENTS 8
.Os .Os
. .
@ -305,10 +305,6 @@ The time when a given I/O request was submitted.
The time required to service a given I/O request. The time required to service a given I/O request.
.It Sy prev_state .It Sy prev_state
The previous state of the vdev. The previous state of the vdev.
.It Sy cksum_expected
The expected checksum value for the block.
.It Sy cksum_actual
The actual checksum value for an errant block.
.It Sy cksum_algorithm .It Sy cksum_algorithm
Checksum algorithm used. Checksum algorithm used.
See See
@ -362,23 +358,6 @@ Like
but contains but contains
.Pq Ar good data No & ~( Ns Ar bad data ) ; .Pq Ar good data No & ~( Ns Ar bad data ) ;
that is, the bits set in the good data which are cleared in the bad data. that is, the bits set in the good data which are cleared in the bad data.
.It Sy bad_set_histogram
If this field exists, it is an array of counters.
Each entry counts bits set in a particular bit of a big-endian uint64 type.
The first entry counts bits
set in the high-order bit of the first byte, the 9th byte, etc, and the last
entry counts bits set of the low-order bit of the 8th byte, the 16th byte, etc.
This information is useful for observing a stuck bit in a parallel data path,
such as IDE or parallel SCSI.
.It Sy bad_cleared_histogram
If this field exists, it is an array of counters.
Each entry counts bit clears in a particular bit of a big-endian uint64 type.
The first entry counts bits
clears of the high-order bit of the first byte, the 9th byte, etc, and the
last entry counts clears of the low-order bit of the 8th byte, the 16th byte,
etc.
This information is useful for observing a stuck bit in a parallel data
path, such as IDE or parallel SCSI.
.El .El
. .
.Sh I/O STAGES .Sh I/O STAGES

View File

@ -461,6 +461,7 @@ ZFS_OBJS_OS := \
zpl_ctldir.o \ zpl_ctldir.o \
zpl_export.o \ zpl_export.o \
zpl_file.o \ zpl_file.o \
zpl_file_range.o \
zpl_inode.o \ zpl_inode.o \
zpl_super.o \ zpl_super.o \
zpl_xattr.o \ zpl_xattr.o \


@ -6263,7 +6263,8 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 goto bad_write_fallback;
 }
 } else {
-#if __FreeBSD_version >= 1400086
+#if (__FreeBSD_version >= 1302506 && __FreeBSD_version < 1400000) || \
+__FreeBSD_version >= 1400086
 vn_lock_pair(invp, false, LK_EXCLUSIVE, outvp, false,
 LK_EXCLUSIVE);
 #else
@ -6289,7 +6290,7 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
 ap->a_outoffp, &len, ap->a_outcred);
-if (error == EXDEV)
+if (error == EXDEV || error == EOPNOTSUPP)
 goto bad_locked_fallback;
 *ap->a_lenp = (size_t)len;
 out_locked:


@ -1662,6 +1662,7 @@ zfs_umount(struct super_block *sb)
} }
zfsvfs_free(zfsvfs); zfsvfs_free(zfsvfs);
sb->s_fs_info = NULL;
return (0); return (0);
} }
@ -2091,6 +2092,9 @@ zfs_init(void)
zfs_znode_init(); zfs_znode_init();
dmu_objset_register_type(DMU_OST_ZFS, zpl_get_file_info); dmu_objset_register_type(DMU_OST_ZFS, zpl_get_file_info);
register_filesystem(&zpl_fs_type); register_filesystem(&zpl_fs_type);
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
register_fo_extend(&zpl_file_operations);
#endif
} }
void void
@ -2101,6 +2105,9 @@ zfs_fini(void)
*/ */
taskq_wait(system_delay_taskq); taskq_wait(system_delay_taskq);
taskq_wait(system_taskq); taskq_wait(system_taskq);
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
unregister_fo_extend(&zpl_file_operations);
#endif
unregister_filesystem(&zpl_fs_type); unregister_filesystem(&zpl_fs_type);
zfs_znode_fini(); zfs_znode_fini();
zfsctl_fini(); zfsctl_fini();


@ -415,7 +415,11 @@ zfs_inode_set_ops(zfsvfs_t *zfsvfs, struct inode *ip)
switch (ip->i_mode & S_IFMT) { switch (ip->i_mode & S_IFMT) {
case S_IFREG: case S_IFREG:
ip->i_op = &zpl_inode_operations; ip->i_op = &zpl_inode_operations;
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
ip->i_fop = &zpl_file_operations.kabi_fops;
#else
ip->i_fop = &zpl_file_operations; ip->i_fop = &zpl_file_operations;
#endif
ip->i_mapping->a_ops = &zpl_address_space_operations; ip->i_mapping->a_ops = &zpl_address_space_operations;
break; break;
@ -455,7 +459,11 @@ zfs_inode_set_ops(zfsvfs_t *zfsvfs, struct inode *ip)
/* Assume the inode is a file and attempt to continue */ /* Assume the inode is a file and attempt to continue */
ip->i_mode = S_IFREG | 0644; ip->i_mode = S_IFREG | 0644;
ip->i_op = &zpl_inode_operations; ip->i_op = &zpl_inode_operations;
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
ip->i_fop = &zpl_file_operations.kabi_fops;
#else
ip->i_fop = &zpl_file_operations; ip->i_fop = &zpl_file_operations;
#endif
ip->i_mapping->a_ops = &zpl_address_space_operations; ip->i_mapping->a_ops = &zpl_address_space_operations;
break; break;
} }


@ -1257,6 +1257,12 @@ zpl_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return (zpl_ioctl_getdosflags(filp, (void *)arg)); return (zpl_ioctl_getdosflags(filp, (void *)arg));
case ZFS_IOC_SETDOSFLAGS: case ZFS_IOC_SETDOSFLAGS:
return (zpl_ioctl_setdosflags(filp, (void *)arg)); return (zpl_ioctl_setdosflags(filp, (void *)arg));
case ZFS_IOC_COMPAT_FICLONE:
return (zpl_ioctl_ficlone(filp, (void *)arg));
case ZFS_IOC_COMPAT_FICLONERANGE:
return (zpl_ioctl_ficlonerange(filp, (void *)arg));
case ZFS_IOC_COMPAT_FIDEDUPERANGE:
return (zpl_ioctl_fideduperange(filp, (void *)arg));
default: default:
return (-ENOTTY); return (-ENOTTY);
} }
@ -1283,7 +1289,6 @@ zpl_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
} }
#endif /* CONFIG_COMPAT */ #endif /* CONFIG_COMPAT */
const struct address_space_operations zpl_address_space_operations = { const struct address_space_operations zpl_address_space_operations = {
#ifdef HAVE_VFS_READPAGES #ifdef HAVE_VFS_READPAGES
.readpages = zpl_readpages, .readpages = zpl_readpages,
@ -1306,7 +1311,12 @@ const struct address_space_operations zpl_address_space_operations = {
#endif #endif
}; };
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
const struct file_operations_extend zpl_file_operations = {
.kabi_fops = {
#else
const struct file_operations zpl_file_operations = { const struct file_operations zpl_file_operations = {
#endif
.open = zpl_open, .open = zpl_open,
.release = zpl_release, .release = zpl_release,
.llseek = zpl_llseek, .llseek = zpl_llseek,
@ -1333,6 +1343,18 @@ const struct file_operations zpl_file_operations = {
.aio_fsync = zpl_aio_fsync, .aio_fsync = zpl_aio_fsync,
#endif #endif
.fallocate = zpl_fallocate, .fallocate = zpl_fallocate,
#ifdef HAVE_VFS_COPY_FILE_RANGE
.copy_file_range = zpl_copy_file_range,
#endif
#ifdef HAVE_VFS_CLONE_FILE_RANGE
.clone_file_range = zpl_clone_file_range,
#endif
#ifdef HAVE_VFS_REMAP_FILE_RANGE
.remap_file_range = zpl_remap_file_range,
#endif
#ifdef HAVE_VFS_DEDUPE_FILE_RANGE
.dedupe_file_range = zpl_dedupe_file_range,
#endif
#ifdef HAVE_FILE_FADVISE #ifdef HAVE_FILE_FADVISE
.fadvise = zpl_fadvise, .fadvise = zpl_fadvise,
#endif #endif
@ -1340,6 +1362,11 @@ const struct file_operations zpl_file_operations = {
#ifdef CONFIG_COMPAT #ifdef CONFIG_COMPAT
.compat_ioctl = zpl_compat_ioctl, .compat_ioctl = zpl_compat_ioctl,
#endif #endif
#ifdef HAVE_VFS_FILE_OPERATIONS_EXTEND
}, /* kabi_fops */
.copy_file_range = zpl_copy_file_range,
.clone_file_range = zpl_clone_file_range,
#endif
}; };
const struct file_operations zpl_dir_file_operations = { const struct file_operations zpl_dir_file_operations = {


@ -0,0 +1,264 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2023, Klara Inc.
*/
#ifdef CONFIG_COMPAT
#include <linux/compat.h>
#endif
#include <linux/fs.h>
#include <sys/file.h>
#include <sys/zfs_znode.h>
#include <sys/zfs_vnops.h>
#include <sys/zfeature.h>
/*
* Clone part of a file via block cloning.
*
* Note that we are not required to update file offsets; the kernel will take
* care of that depending on how it was called.
*/
static ssize_t
__zpl_clone_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, size_t len)
{
struct inode *src_i = file_inode(src_file);
struct inode *dst_i = file_inode(dst_file);
uint64_t src_off_o = (uint64_t)src_off;
uint64_t dst_off_o = (uint64_t)dst_off;
uint64_t len_o = (uint64_t)len;
cred_t *cr = CRED();
fstrans_cookie_t cookie;
int err;
if (!spa_feature_is_enabled(
dmu_objset_spa(ITOZSB(dst_i)->z_os), SPA_FEATURE_BLOCK_CLONING))
return (-EOPNOTSUPP);
if (src_i != dst_i)
spl_inode_lock_shared(src_i);
spl_inode_lock(dst_i);
crhold(cr);
cookie = spl_fstrans_mark();
err = -zfs_clone_range(ITOZ(src_i), &src_off_o, ITOZ(dst_i),
&dst_off_o, &len_o, cr);
spl_fstrans_unmark(cookie);
crfree(cr);
spl_inode_unlock(dst_i);
if (src_i != dst_i)
spl_inode_unlock_shared(src_i);
if (err < 0)
return (err);
return ((ssize_t)len_o);
}
#if defined(HAVE_VFS_COPY_FILE_RANGE) || \
defined(HAVE_VFS_FILE_OPERATIONS_EXTEND)
/*
* Entry point for copy_file_range(). Copy len bytes from src_off in src_file
* to dst_off in dst_file. We are permitted to do this however we like, so we
* try to just clone the blocks, and if we can't support it, fall back to the
* kernel's generic byte copy function.
*/
ssize_t
zpl_copy_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, size_t len, unsigned int flags)
{
ssize_t ret;
if (flags != 0)
return (-EINVAL);
/* Try to do it via zfs_clone_range() */
ret = __zpl_clone_file_range(src_file, src_off,
dst_file, dst_off, len);
#ifdef HAVE_VFS_GENERIC_COPY_FILE_RANGE
/*
* Since Linux 5.3 the filesystem driver is responsible for executing
* an appropriate fallback, and a generic fallback function is provided.
*/
if (ret == -EOPNOTSUPP || ret == -EXDEV)
ret = generic_copy_file_range(src_file, src_off, dst_file,
dst_off, len, flags);
#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE */
return (ret);
}
#endif /* HAVE_VFS_COPY_FILE_RANGE || HAVE_VFS_FILE_OPERATIONS_EXTEND */
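The clone-then-fall-back decision described in the comment above can be sketched in plain userspace C. This is only an illustration of the control flow in the `HAVE_VFS_GENERIC_COPY_FILE_RANGE` branch, not the kernel code: every `sketch_*` name is hypothetical, and the callbacks stand in for `__zpl_clone_file_range()` and `generic_copy_file_range()`.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback type standing in for a clone or byte-copy routine. */
typedef ssize_t (*sketch_copy_fn)(size_t len);

/* Counts how often the byte-copy stand-in ran (illustration only). */
int sketch_fallback_calls = 0;

/*
 * Try the clone path first. Only "not supported" and "cross-device" hand
 * the request to the fallback; any other error, and any success, is
 * returned unchanged.
 */
ssize_t
sketch_copy_with_fallback(sketch_copy_fn clone, sketch_copy_fn fallback,
    size_t len)
{
	ssize_t ret = clone(len);

	if (ret == -EOPNOTSUPP || ret == -EXDEV)
		ret = fallback(len);
	return (ret);
}

/* Stand-ins used only to exercise the sketch. */
ssize_t
sketch_clone_ok(size_t len)
{
	return ((ssize_t)len);
}

ssize_t
sketch_clone_unsupported(size_t len)
{
	(void) len;
	return (-EOPNOTSUPP);
}

ssize_t
sketch_clone_io_error(size_t len)
{
	(void) len;
	return (-EIO);
}

ssize_t
sketch_byte_copy(size_t len)
{
	sketch_fallback_calls++;
	return ((ssize_t)len);
}
```

Calling `sketch_copy_with_fallback(sketch_clone_unsupported, sketch_byte_copy, 4096)` runs the byte copy, while an `-EIO` from the clone stand-in is passed straight back to the caller. On kernels without `generic_copy_file_range()` the function in the diff returns the clone error as is.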
#ifdef HAVE_VFS_REMAP_FILE_RANGE
/*
* Entry point for FICLONE/FICLONERANGE/FIDEDUPERANGE.
*
* FICLONE and FICLONERANGE are basically the same as copy_file_range(), except
* that they must clone - they cannot fall back to copying. FICLONE is exactly
* FICLONERANGE, for the entire file. We don't need to try to tell them apart;
* the kernel will sort that out for us.
*
* FIDEDUPERANGE is for turning a non-clone into a clone, that is, compare the
* range in both files and if they're the same, arrange for them to be backed
* by the same storage.
*/
loff_t
zpl_remap_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, loff_t len, unsigned int flags)
{
if (flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_CAN_SHORTEN))
return (-EINVAL);
/*
* REMAP_FILE_CAN_SHORTEN lets us know we can clone less than the given
* range if we want. It's designed for filesystems that make data past
* EOF available, and don't want it to be visible in both files. ZFS
* doesn't do that, so we just turn the flag off.
*/
flags &= ~REMAP_FILE_CAN_SHORTEN;
if (flags & REMAP_FILE_DEDUP)
/* No support for dedup yet */
return (-EOPNOTSUPP);
/* Zero length means to clone everything to the end of the file */
if (len == 0)
len = i_size_read(file_inode(src_file)) - src_off;
return (__zpl_clone_file_range(src_file, src_off,
dst_file, dst_off, len));
}
#endif /* HAVE_VFS_REMAP_FILE_RANGE */
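The flag and length handling in `zpl_remap_file_range()` can be restated as a small pure function, which makes the order of the checks easier to follow. This is a minimal sketch under stated assumptions: the `SKETCH_REMAP_*` values are local stand-ins assumed to mirror the kernel's `REMAP_FILE_*` bits, and `sketch_remap_len()` is a hypothetical helper that takes the source file size as a parameter instead of reading it from an inode.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the kernel's REMAP_FILE_* bits (values assumed). */
#define	SKETCH_REMAP_DEDUP		(1U << 0)
#define	SKETCH_REMAP_CAN_SHORTEN	(1U << 1)

/*
 * Returns the number of bytes that would be handed to the clone routine,
 * or a negative errno. src_size is the source file size and src_off the
 * starting offset within it.
 */
int64_t
sketch_remap_len(int64_t src_size, int64_t src_off, int64_t len,
    unsigned int flags)
{
	/* Reject any flag other than the two that are understood. */
	if (flags & ~(SKETCH_REMAP_DEDUP | SKETCH_REMAP_CAN_SHORTEN))
		return (-EINVAL);

	/* CAN_SHORTEN is accepted but has no effect, so drop it. */
	flags &= ~SKETCH_REMAP_CAN_SHORTEN;

	/* Dedup is not supported. */
	if (flags & SKETCH_REMAP_DEDUP)
		return (-EOPNOTSUPP);

	/* Zero length means "from src_off to the end of the file". */
	if (len == 0)
		len = src_size - src_off;
	return (len);
}
```

For a 1000-byte source, `sketch_remap_len(1000, 200, 0, 0)` yields 800, and passing `SKETCH_REMAP_CAN_SHORTEN` with an explicit length leaves that length untouched.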
#if defined(HAVE_VFS_CLONE_FILE_RANGE) || \
defined(HAVE_VFS_FILE_OPERATIONS_EXTEND)
/*
* Entry point for FICLONE and FICLONERANGE, before Linux 4.20.
*/
int
zpl_clone_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len)
{
/* Zero length means to clone everything to the end of the file */
if (len == 0)
len = i_size_read(file_inode(src_file)) - src_off;
return (__zpl_clone_file_range(src_file, src_off,
dst_file, dst_off, len));
}
#endif /* HAVE_VFS_CLONE_FILE_RANGE || HAVE_VFS_FILE_OPERATIONS_EXTEND */
#ifdef HAVE_VFS_DEDUPE_FILE_RANGE
/*
* Entry point for FIDEDUPERANGE, before Linux 4.20.
*/
int
zpl_dedupe_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len)
{
/* No support for dedup yet */
return (-EOPNOTSUPP);
}
#endif /* HAVE_VFS_DEDUPE_FILE_RANGE */
/* Entry point for FICLONE, before Linux 4.5. */
long
zpl_ioctl_ficlone(struct file *dst_file, void *arg)
{
unsigned long sfd = (unsigned long)arg;
struct file *src_file = fget(sfd);
if (src_file == NULL)
return (-EBADF);
if (dst_file->f_op != src_file->f_op) {
/* Drop the reference taken by fget() before bailing out */
fput(src_file);
return (-EXDEV);
}
size_t len = i_size_read(file_inode(src_file));
ssize_t ret =
__zpl_clone_file_range(src_file, 0, dst_file, 0, len);
fput(src_file);
if (ret < 0) {
if (ret == -EOPNOTSUPP)
return (-ENOTTY);
return (ret);
}
if (ret != len)
return (-EINVAL);
return (0);
}
/* Entry point for FICLONERANGE, before Linux 4.5. */
long
zpl_ioctl_ficlonerange(struct file *dst_file, void __user *arg)
{
zfs_ioc_compat_file_clone_range_t fcr;
if (copy_from_user(&fcr, arg, sizeof (fcr)))
return (-EFAULT);
struct file *src_file = fget(fcr.fcr_src_fd);
if (src_file == NULL)
return (-EBADF);
if (dst_file->f_op != src_file->f_op) {
/* Drop the reference taken by fget() before bailing out */
fput(src_file);
return (-EXDEV);
}
size_t len = fcr.fcr_src_length;
if (len == 0)
len = i_size_read(file_inode(src_file)) - fcr.fcr_src_offset;
ssize_t ret = __zpl_clone_file_range(src_file, fcr.fcr_src_offset,
dst_file, fcr.fcr_dest_offset, len);
fput(src_file);
if (ret < 0) {
if (ret == -EOPNOTSUPP)
return (-ENOTTY);
return (ret);
}
if (ret != len)
return (-EINVAL);
return (0);
}
/* Entry point for FIDEDUPERANGE, before Linux 4.5. */
long
zpl_ioctl_fideduperange(struct file *filp, void *arg)
{
(void) arg;
/* No support for dedup yet */
return (-ENOTTY);
}
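Both compat ioctl handlers above finish with the same result translation: a clone result becomes an ioctl return value. The following sketch isolates that step as a pure function. `sketch_compat_ioctl_result()` is a hypothetical name used only for illustration; the diff itself repeats the logic inline in each handler.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Translate the result of a clone attempt into an ioctl return value:
 *  - -EOPNOTSUPP becomes -ENOTTY, so the caller sees "no such ioctl";
 *  - any other negative errno is passed through;
 *  - a short clone (fewer bytes than requested) is reported as -EINVAL;
 *  - a full clone returns 0.
 */
long
sketch_compat_ioctl_result(ssize_t ret, size_t len)
{
	if (ret < 0) {
		if (ret == -EOPNOTSUPP)
			return (-ENOTTY);
		return ((long)ret);
	}
	if ((size_t)ret != len)
		return (-EINVAL);
	return (0);
}
```

A request for 4096 bytes that clones only 1024 therefore comes back as `-EINVAL`, because these ioctls must clone the whole range or fail.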


@ -277,8 +277,6 @@ zpl_test_super(struct super_block *s, void *data)
{ {
zfsvfs_t *zfsvfs = s->s_fs_info; zfsvfs_t *zfsvfs = s->s_fs_info;
objset_t *os = data; objset_t *os = data;
int match;
/* /*
* If the os doesn't match the z_os in the super_block, assume it is * If the os doesn't match the z_os in the super_block, assume it is
* not a match. Matching would imply a multimount of a dataset. It is * not a match. Matching would imply a multimount of a dataset. It is
@ -286,19 +284,7 @@ zpl_test_super(struct super_block *s, void *data)
* that changes the z_os, e.g., rollback, where the match will be * that changes the z_os, e.g., rollback, where the match will be
* missed, but in that case the user will get an EBUSY. * missed, but in that case the user will get an EBUSY.
*/ */
if (zfsvfs == NULL || os != zfsvfs->z_os) return (zfsvfs != NULL && os == zfsvfs->z_os);
return (0);
/*
* If they do match, recheck with the lock held to prevent mounting the
* wrong dataset since z_os can be stale when the teardown lock is held.
*/
if (zpl_enter(zfsvfs, FTAG) != 0)
return (0);
match = (os == zfsvfs->z_os);
zpl_exit(zfsvfs, FTAG);
return (match);
} }
static struct super_block * static struct super_block *
@ -324,12 +310,35 @@ zpl_mount_impl(struct file_system_type *fs_type, int flags, zfs_mnt_t *zm)
s = sget(fs_type, zpl_test_super, set_anon_super, flags, os); s = sget(fs_type, zpl_test_super, set_anon_super, flags, os);
/*
* Recheck with the lock held to prevent mounting the wrong dataset
* since z_os can be stale when the teardown lock is held.
*
* We can't do this in zpl_test_super since it runs under a spinlock,
* and the s_umount lock is not held there, so it would race with
* zfs_umount and zfsvfs could be freed.
*/
if (!IS_ERR(s) && s->s_fs_info != NULL) {
zfsvfs_t *zfsvfs = s->s_fs_info;
if (zpl_enter(zfsvfs, FTAG) == 0) {
if (os != zfsvfs->z_os)
err = -SET_ERROR(EBUSY);
zpl_exit(zfsvfs, FTAG);
} else {
err = -SET_ERROR(EBUSY);
}
}
dsl_dataset_long_rele(dmu_objset_ds(os), FTAG); dsl_dataset_long_rele(dmu_objset_ds(os), FTAG);
dsl_dataset_rele(dmu_objset_ds(os), FTAG); dsl_dataset_rele(dmu_objset_ds(os), FTAG);
if (IS_ERR(s)) if (IS_ERR(s))
return (ERR_CAST(s)); return (ERR_CAST(s));
if (err) {
deactivate_locked_super(s);
return (ERR_PTR(err));
}
if (s->s_root == NULL) { if (s->s_root == NULL) {
err = zpl_fill_super(s, zm, flags & SB_SILENT ? 1 : 0); err = zpl_fill_super(s, zm, flags & SB_SILENT ? 1 : 0);
if (err) { if (err) {


@ -160,7 +160,7 @@ zpool_prop_init(void)
"wait | continue | panic", "FAILMODE", failuremode_table, "wait | continue | panic", "FAILMODE", failuremode_table,
sfeatures); sfeatures);
zprop_register_index(ZPOOL_PROP_AUTOTRIM, "autotrim", zprop_register_index(ZPOOL_PROP_AUTOTRIM, "autotrim",
SPA_AUTOTRIM_DEFAULT, PROP_DEFAULT, ZFS_TYPE_POOL, SPA_AUTOTRIM_OFF, PROP_DEFAULT, ZFS_TYPE_POOL,
"on | off", "AUTOTRIM", boolean_table, sfeatures); "on | off", "AUTOTRIM", boolean_table, sfeatures);
/* hidden properties */ /* hidden properties */


@ -284,7 +284,17 @@ bpobj_iterate_blkptrs(bpobj_info_t *bpi, bpobj_itor_t func, void *arg,
dmu_buf_t *dbuf = NULL; dmu_buf_t *dbuf = NULL;
bpobj_t *bpo = bpi->bpi_bpo; bpobj_t *bpo = bpi->bpi_bpo;
for (int64_t i = bpo->bpo_phys->bpo_num_blkptrs - 1; i >= start; i--) { int64_t i = bpo->bpo_phys->bpo_num_blkptrs - 1;
uint64_t pe = P2ALIGN_TYPED(i, bpo->bpo_epb, uint64_t) *
sizeof (blkptr_t);
uint64_t ps = start * sizeof (blkptr_t);
uint64_t pb = MAX((pe > dmu_prefetch_max) ? pe - dmu_prefetch_max : 0,
ps);
if (pe > pb) {
dmu_prefetch(bpo->bpo_os, bpo->bpo_object, 0, pb, pe - pb,
ZIO_PRIORITY_ASYNC_READ);
}
for (; i >= start; i--) {
uint64_t offset = i * sizeof (blkptr_t); uint64_t offset = i * sizeof (blkptr_t);
uint64_t blkoff = P2PHASE(i, bpo->bpo_epb); uint64_t blkoff = P2PHASE(i, bpo->bpo_epb);
@ -292,9 +302,16 @@ bpobj_iterate_blkptrs(bpobj_info_t *bpi, bpobj_itor_t func, void *arg,
if (dbuf) if (dbuf)
dmu_buf_rele(dbuf, FTAG); dmu_buf_rele(dbuf, FTAG);
err = dmu_buf_hold(bpo->bpo_os, bpo->bpo_object, err = dmu_buf_hold(bpo->bpo_os, bpo->bpo_object,
offset, FTAG, &dbuf, 0); offset, FTAG, &dbuf, DMU_READ_NO_PREFETCH);
if (err) if (err)
break; break;
pe = pb;
pb = MAX((dbuf->db_offset > dmu_prefetch_max) ?
dbuf->db_offset - dmu_prefetch_max : 0, ps);
if (pe > pb) {
dmu_prefetch(bpo->bpo_os, bpo->bpo_object, 0,
pb, pe - pb, ZIO_PRIORITY_ASYNC_READ);
}
} }
ASSERT3U(offset, >=, dbuf->db_offset); ASSERT3U(offset, >=, dbuf->db_offset);
@ -466,22 +483,30 @@ bpobj_iterate_impl(bpobj_t *initial_bpo, bpobj_itor_t func, void *arg,
int64_t i = bpi->bpi_unprocessed_subobjs - 1; int64_t i = bpi->bpi_unprocessed_subobjs - 1;
uint64_t offset = i * sizeof (uint64_t); uint64_t offset = i * sizeof (uint64_t);
uint64_t obj_from_sublist; uint64_t subobj;
err = dmu_read(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs, err = dmu_read(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs,
offset, sizeof (uint64_t), &obj_from_sublist, offset, sizeof (uint64_t), &subobj,
DMU_READ_PREFETCH); DMU_READ_NO_PREFETCH);
if (err) if (err)
break; break;
bpobj_t *sublist = kmem_alloc(sizeof (bpobj_t),
bpobj_t *subbpo = kmem_alloc(sizeof (bpobj_t),
KM_SLEEP); KM_SLEEP);
err = bpobj_open(subbpo, bpo->bpo_os, subobj);
err = bpobj_open(sublist, bpo->bpo_os, if (err) {
obj_from_sublist); kmem_free(subbpo, sizeof (bpobj_t));
if (err)
break; break;
}
list_insert_head(&stack, bpi_alloc(sublist, bpi, i)); if (subbpo->bpo_havesubobj &&
mutex_enter(&sublist->bpo_lock); subbpo->bpo_phys->bpo_subobjs != 0) {
dmu_prefetch(subbpo->bpo_os,
subbpo->bpo_phys->bpo_subobjs, 0, 0, 0,
ZIO_PRIORITY_ASYNC_READ);
}
list_insert_head(&stack, bpi_alloc(subbpo, bpi, i));
mutex_enter(&subbpo->bpo_lock);
bpi->bpi_unprocessed_subobjs--; bpi->bpi_unprocessed_subobjs--;
} }
} }
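
The prefetch changes in bpobj_iterate_blkptrs() walk the object backwards and, on each step, compute a window `[pb, pe)` that is at most `dmu_prefetch_max` bytes wide and never reaches below the starting offset `ps`. A minimal sketch of that lower-bound computation follows. The function names are hypothetical, and the cap is passed as a parameter here instead of being read from the tunable.

```c
#include <assert.h>
#include <stdint.h>

/* Local helper, named to avoid clashing with any system MAX macro. */
static uint64_t
max_u64(uint64_t a, uint64_t b)
{
	return (a > b ? a : b);
}

/*
 * Hypothetical helper (not part of the commit): compute the lower bound
 * of the next backward prefetch window.
 *
 * pe:        exclusive upper byte offset of the window
 * ps:        lowest byte offset the iteration will ever read
 * max_bytes: cap on the window size (dmu_prefetch_max in the diff)
 */
static uint64_t
prefetch_window_begin(uint64_t pe, uint64_t ps, uint64_t max_bytes)
{
	/* Step back by at most max_bytes without underflowing. */
	uint64_t lo = (pe > max_bytes) ? pe - max_bytes : 0;

	/* Never prefetch below the start of the range being iterated. */
	return (max_u64(lo, ps));
}
```

As in the diff, a caller would issue the prefetch only when `pe > pb`; when the computed bound is at or above `pe`, there is nothing left to prefetch.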


@ -680,7 +680,7 @@ brt_vdev_realloc(brt_t *brt, brt_vdev_t *brtvd)
size = (vdev_get_min_asize(vd) - 1) / brt->brt_rangesize + 1; size = (vdev_get_min_asize(vd) - 1) / brt->brt_rangesize + 1;
spa_config_exit(brt->brt_spa, SCL_VDEV, FTAG); spa_config_exit(brt->brt_spa, SCL_VDEV, FTAG);
entcount = kmem_zalloc(sizeof (entcount[0]) * size, KM_SLEEP); entcount = vmem_zalloc(sizeof (entcount[0]) * size, KM_SLEEP);
nblocks = BRT_RANGESIZE_TO_NBLOCKS(size); nblocks = BRT_RANGESIZE_TO_NBLOCKS(size);
bitmap = kmem_zalloc(BT_SIZEOFMAP(nblocks), KM_SLEEP); bitmap = kmem_zalloc(BT_SIZEOFMAP(nblocks), KM_SLEEP);
@ -709,7 +709,7 @@ brt_vdev_realloc(brt_t *brt, brt_vdev_t *brtvd)
sizeof (entcount[0]) * MIN(size, brtvd->bv_size)); sizeof (entcount[0]) * MIN(size, brtvd->bv_size));
memcpy(bitmap, brtvd->bv_bitmap, MIN(BT_SIZEOFMAP(nblocks), memcpy(bitmap, brtvd->bv_bitmap, MIN(BT_SIZEOFMAP(nblocks),
BT_SIZEOFMAP(brtvd->bv_nblocks))); BT_SIZEOFMAP(brtvd->bv_nblocks)));
kmem_free(brtvd->bv_entcount, vmem_free(brtvd->bv_entcount,
sizeof (entcount[0]) * brtvd->bv_size); sizeof (entcount[0]) * brtvd->bv_size);
kmem_free(brtvd->bv_bitmap, BT_SIZEOFMAP(brtvd->bv_nblocks)); kmem_free(brtvd->bv_bitmap, BT_SIZEOFMAP(brtvd->bv_nblocks));
} }
@ -792,7 +792,7 @@ brt_vdev_dealloc(brt_t *brt, brt_vdev_t *brtvd)
ASSERT(RW_WRITE_HELD(&brt->brt_lock)); ASSERT(RW_WRITE_HELD(&brt->brt_lock));
ASSERT(brtvd->bv_initiated); ASSERT(brtvd->bv_initiated);
kmem_free(brtvd->bv_entcount, sizeof (uint16_t) * brtvd->bv_size); vmem_free(brtvd->bv_entcount, sizeof (uint16_t) * brtvd->bv_size);
brtvd->bv_entcount = NULL; brtvd->bv_entcount = NULL;
kmem_free(brtvd->bv_bitmap, BT_SIZEOFMAP(brtvd->bv_nblocks)); kmem_free(brtvd->bv_bitmap, BT_SIZEOFMAP(brtvd->bv_nblocks));
brtvd->bv_bitmap = NULL; brtvd->bv_bitmap = NULL;


@ -2701,7 +2701,7 @@ dmu_buf_will_clone(dmu_buf_t *db_fake, dmu_tx_t *tx)
*/ */
mutex_enter(&db->db_mtx); mutex_enter(&db->db_mtx);
VERIFY(!dbuf_undirty(db, tx)); VERIFY(!dbuf_undirty(db, tx));
ASSERT(list_head(&db->db_dirty_records) == NULL); ASSERT0(dbuf_find_dirty_eq(db, tx->tx_txg));
if (db->db_buf != NULL) { if (db->db_buf != NULL) {
arc_buf_destroy(db->db_buf, db); arc_buf_destroy(db->db_buf, db);
db->db_buf = NULL; db->db_buf = NULL;
@ -4457,6 +4457,15 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx)
} else if (db->db_state == DB_FILL) { } else if (db->db_state == DB_FILL) {
/* This buffer was freed and is now being re-filled */ /* This buffer was freed and is now being re-filled */
ASSERT(db->db.db_data != dr->dt.dl.dr_data); ASSERT(db->db.db_data != dr->dt.dl.dr_data);
} else if (db->db_state == DB_READ) {
/*
* This buffer has a clone we need to write, and an in-flight
* read on the BP we're about to clone. Its safe to issue the
* write here because the read has already been issued and the
* contents won't change.
*/
ASSERT(dr->dt.dl.dr_brtwrite &&
dr->dt.dl.dr_override_state == DR_OVERRIDDEN);
} else { } else {
ASSERT(db->db_state == DB_CACHED || db->db_state == DB_NOFILL); ASSERT(db->db_state == DB_CACHED || db->db_state == DB_NOFILL);
} }


@ -89,7 +89,11 @@ static int zfs_dmu_offset_next_sync = 1;
* helps to limit the amount of memory that can be used by prefetching. * helps to limit the amount of memory that can be used by prefetching.
* Larger objects should be prefetched a bit at a time. * Larger objects should be prefetched a bit at a time.
*/ */
#ifdef _ILP32
uint_t dmu_prefetch_max = 8 * 1024 * 1024;
#else
uint_t dmu_prefetch_max = 8 * SPA_MAXBLOCKSIZE; uint_t dmu_prefetch_max = 8 * SPA_MAXBLOCKSIZE;
#endif
const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = { const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = {
{DMU_BSWAP_UINT8, TRUE, FALSE, FALSE, "unallocated" }, {DMU_BSWAP_UINT8, TRUE, FALSE, FALSE, "unallocated" },
@ -552,8 +556,7 @@ dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, uint64_t length,
zio = zio_root(dn->dn_objset->os_spa, NULL, NULL, zio = zio_root(dn->dn_objset->os_spa, NULL, NULL,
ZIO_FLAG_CANFAIL); ZIO_FLAG_CANFAIL);
blkid = dbuf_whichblock(dn, 0, offset); blkid = dbuf_whichblock(dn, 0, offset);
if ((flags & DMU_READ_NO_PREFETCH) == 0 && if ((flags & DMU_READ_NO_PREFETCH) == 0) {
length <= zfetch_array_rd_sz) {
/* /*
* Prepare the zfetch before initiating the demand reads, so * Prepare the zfetch before initiating the demand reads, so
* that if multiple threads block on same indirect block, we * that if multiple threads block on same indirect block, we


@ -1795,17 +1795,19 @@ receive_handle_existing_object(const struct receive_writer_arg *rwa,
} }
/* /*
* The dmu does not currently support decreasing nlevels * The dmu does not currently support decreasing nlevels or changing
* or changing the number of dnode slots on an object. For * indirect block size if there is already one, same as changing the
* non-raw sends, this does not matter and the new object * number of dnode slots on an object. For non-raw sends this
* can just use the previous one's nlevels. For raw sends, * does not matter and the new object can just use the previous one's
* however, the structure of the received dnode (including * parameters. For raw sends, however, the structure of the received
* nlevels and dnode slots) must match that of the send * dnode (including indirects and dnode slots) must match that of the
* side. Therefore, instead of using dmu_object_reclaim(), * send side. Therefore, instead of using dmu_object_reclaim(), we
* we must free the object completely and call * must free the object completely and call dmu_object_claim_dnsize()
* dmu_object_claim_dnsize() instead. * instead.
*/ */
if ((rwa->raw && drro->drr_nlevels < doi->doi_indirection) || if ((rwa->raw && ((doi->doi_indirection > 1 &&
indblksz != doi->doi_metadata_block_size) ||
drro->drr_nlevels < doi->doi_indirection)) ||
dn_slots != doi->doi_dnodesize >> DNODE_SHIFT) { dn_slots != doi->doi_dnodesize >> DNODE_SHIFT) {
err = dmu_free_long_object(rwa->os, drro->drr_object); err = dmu_free_long_object(rwa->os, drro->drr_object);
if (err != 0) if (err != 0)


@ -52,14 +52,19 @@ static unsigned int zfetch_max_streams = 8;
static unsigned int zfetch_min_sec_reap = 1; static unsigned int zfetch_min_sec_reap = 1;
/* max time before stream delete */ /* max time before stream delete */
static unsigned int zfetch_max_sec_reap = 2; static unsigned int zfetch_max_sec_reap = 2;
#ifdef _ILP32
/* min bytes to prefetch per stream (default 2MB) */
static unsigned int zfetch_min_distance = 2 * 1024 * 1024;
/* max bytes to prefetch per stream (default 8MB) */
unsigned int zfetch_max_distance = 8 * 1024 * 1024;
#else
/* min bytes to prefetch per stream (default 4MB) */ /* min bytes to prefetch per stream (default 4MB) */
static unsigned int zfetch_min_distance = 4 * 1024 * 1024; static unsigned int zfetch_min_distance = 4 * 1024 * 1024;
/* max bytes to prefetch per stream (default 64MB) */ /* max bytes to prefetch per stream (default 64MB) */
unsigned int zfetch_max_distance = 64 * 1024 * 1024; unsigned int zfetch_max_distance = 64 * 1024 * 1024;
#endif
/* max bytes to prefetch indirects for per stream (default 64MB) */ /* max bytes to prefetch indirects for per stream (default 64MB) */
unsigned int zfetch_max_idistance = 64 * 1024 * 1024; unsigned int zfetch_max_idistance = 64 * 1024 * 1024;
/* max number of bytes in an array_read in which we allow prefetching (1MB) */
uint64_t zfetch_array_rd_sz = 1024 * 1024;
typedef struct zfetch_stats { typedef struct zfetch_stats {
kstat_named_t zfetchstat_hits; kstat_named_t zfetchstat_hits;
@ -580,6 +585,3 @@ ZFS_MODULE_PARAM(zfs_prefetch, zfetch_, max_distance, UINT, ZMOD_RW,
ZFS_MODULE_PARAM(zfs_prefetch, zfetch_, max_idistance, UINT, ZMOD_RW, ZFS_MODULE_PARAM(zfs_prefetch, zfetch_, max_idistance, UINT, ZMOD_RW,
"Max bytes to prefetch indirects for per stream"); "Max bytes to prefetch indirects for per stream");
ZFS_MODULE_PARAM(zfs_prefetch, zfetch_, array_rd_sz, U64, ZMOD_RW,
"Number of bytes in a array_read");


@ -1882,7 +1882,7 @@ dnode_set_blksz(dnode_t *dn, uint64_t size, int ibs, dmu_tx_t *tx)
if (ibs == dn->dn_indblkshift) if (ibs == dn->dn_indblkshift)
ibs = 0; ibs = 0;
if (size >> SPA_MINBLOCKSHIFT == dn->dn_datablkszsec && ibs == 0) if (size == dn->dn_datablksz && ibs == 0)
return (0); return (0);
rw_enter(&dn->dn_struct_rwlock, RW_WRITER); rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
@ -1905,6 +1905,8 @@ dnode_set_blksz(dnode_t *dn, uint64_t size, int ibs, dmu_tx_t *tx)
if (ibs && dn->dn_nlevels != 1) if (ibs && dn->dn_nlevels != 1)
goto fail; goto fail;
dnode_setdirty(dn, tx);
if (size != dn->dn_datablksz) {
/* resize the old block */ /* resize the old block */
err = dbuf_hold_impl(dn, 0, 0, TRUE, FALSE, FTAG, &db); err = dbuf_hold_impl(dn, 0, 0, TRUE, FALSE, FTAG, &db);
if (err == 0) { if (err == 0) {
@ -1914,15 +1916,14 @@ dnode_set_blksz(dnode_t *dn, uint64_t size, int ibs, dmu_tx_t *tx)
} }
dnode_setdblksz(dn, size); dnode_setdblksz(dn, size);
dnode_setdirty(dn, tx);
dn->dn_next_blksz[tx->tx_txg & TXG_MASK] = size; dn->dn_next_blksz[tx->tx_txg & TXG_MASK] = size;
if (db)
dbuf_rele(db, FTAG);
}
if (ibs) { if (ibs) {
dn->dn_indblkshift = ibs; dn->dn_indblkshift = ibs;
dn->dn_next_indblkshift[tx->tx_txg & TXG_MASK] = ibs; dn->dn_next_indblkshift[tx->tx_txg & TXG_MASK] = ibs;
} }
/* release after we have fixed the blocksize in the dnode */
if (db)
dbuf_rele(db, FTAG);
rw_exit(&dn->dn_struct_rwlock); rw_exit(&dn->dn_struct_rwlock);
return (0); return (0);


@ -892,9 +892,9 @@ dsl_deadlist_merge(dsl_deadlist_t *dl, uint64_t obj, dmu_tx_t *tx)
for (zap_cursor_init(&zc, dl->dl_os, obj); for (zap_cursor_init(&zc, dl->dl_os, obj);
(error = zap_cursor_retrieve(&zc, za)) == 0; (error = zap_cursor_retrieve(&zc, za)) == 0;
zap_cursor_advance(&zc)) { zap_cursor_advance(&zc)) {
uint64_t mintxg = zfs_strtonum(za->za_name, NULL); dsl_deadlist_insert_bpobj(dl, za->za_first_integer,
dsl_deadlist_insert_bpobj(dl, za->za_first_integer, mintxg, tx); zfs_strtonum(za->za_name, NULL), tx);
VERIFY0(zap_remove_int(dl->dl_os, obj, mintxg, tx)); VERIFY0(zap_remove(dl->dl_os, obj, za->za_name, tx));
if (perror == 0) { if (perror == 0) {
dsl_deadlist_prefetch_bpobj(dl, pza->za_first_integer, dsl_deadlist_prefetch_bpobj(dl, pza->za_first_integer,
zfs_strtonum(pza->za_name, NULL)); zfs_strtonum(pza->za_name, NULL));


@ -2015,6 +2015,11 @@ dsl_scan_prefetch_cb(zio_t *zio, const zbookmark_phys_t *zb, const blkptr_t *bp,
zb->zb_objset, DMU_META_DNODE_OBJECT); zb->zb_objset, DMU_META_DNODE_OBJECT);
if (OBJSET_BUF_HAS_USERUSED(buf)) { if (OBJSET_BUF_HAS_USERUSED(buf)) {
if (OBJSET_BUF_HAS_PROJECTUSED(buf)) {
dsl_scan_prefetch_dnode(scn,
&osp->os_projectused_dnode, zb->zb_objset,
DMU_PROJECTUSED_OBJECT);
}
dsl_scan_prefetch_dnode(scn, dsl_scan_prefetch_dnode(scn,
&osp->os_groupused_dnode, zb->zb_objset, &osp->os_groupused_dnode, zb->zb_objset,
DMU_GROUPUSED_OBJECT); DMU_GROUPUSED_OBJECT);
@ -2075,10 +2080,16 @@ dsl_scan_prefetch_thread(void *arg)
zio_flags |= ZIO_FLAG_RAW; zio_flags |= ZIO_FLAG_RAW;
} }
/* We don't need data L1 buffer since we do not prefetch L0. */
blkptr_t *bp = &spic->spic_bp;
if (BP_GET_LEVEL(bp) == 1 && BP_GET_TYPE(bp) != DMU_OT_DNODE &&
BP_GET_TYPE(bp) != DMU_OT_OBJSET)
flags |= ARC_FLAG_NO_BUF;
/* issue the prefetch asynchronously */ /* issue the prefetch asynchronously */
(void) arc_read(scn->scn_zio_root, scn->scn_dp->dp_spa, (void) arc_read(scn->scn_zio_root, spa, bp,
&spic->spic_bp, dsl_scan_prefetch_cb, spic->spic_spc, dsl_scan_prefetch_cb, spic->spic_spc, ZIO_PRIORITY_SCRUB,
ZIO_PRIORITY_SCRUB, zio_flags, &flags, &spic->spic_zb); zio_flags, &flags, &spic->spic_zb);
kmem_free(spic, sizeof (scan_prefetch_issue_ctx_t)); kmem_free(spic, sizeof (scan_prefetch_issue_ctx_t));
} }


@ -58,6 +58,11 @@ static uint64_t metaslab_aliquot = 1024 * 1024;
*/ */
uint64_t metaslab_force_ganging = SPA_MAXBLOCKSIZE + 1; uint64_t metaslab_force_ganging = SPA_MAXBLOCKSIZE + 1;
/*
* Of blocks of size >= metaslab_force_ganging, actually gang them this often.
*/
uint_t metaslab_force_ganging_pct = 3;
/* /*
* In pools where the log space map feature is not enabled we touch * In pools where the log space map feature is not enabled we touch
* multiple metaslabs (and their respective space maps) with each * multiple metaslabs (and their respective space maps) with each
@ -5109,7 +5114,9 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
* damage can result in extremely long reconstruction times. This * damage can result in extremely long reconstruction times. This
* will also test spilling from special to normal. * will also test spilling from special to normal.
*/ */
if (psize >= metaslab_force_ganging && (random_in_range(100) < 3)) { if (psize >= metaslab_force_ganging &&
metaslab_force_ganging_pct > 0 &&
(random_in_range(100) < MIN(metaslab_force_ganging_pct, 100))) {
metaslab_trace_add(zal, NULL, NULL, psize, d, TRACE_FORCE_GANG, metaslab_trace_add(zal, NULL, NULL, psize, d, TRACE_FORCE_GANG,
allocator); allocator);
return (SET_ERROR(ENOSPC)); return (SET_ERROR(ENOSPC));
@ -6266,7 +6273,10 @@ ZFS_MODULE_PARAM(zfs_metaslab, zfs_metaslab_, switch_threshold, INT, ZMOD_RW,
"Segment-based metaslab selection maximum buckets before switching"); "Segment-based metaslab selection maximum buckets before switching");
ZFS_MODULE_PARAM(zfs_metaslab, metaslab_, force_ganging, U64, ZMOD_RW, ZFS_MODULE_PARAM(zfs_metaslab, metaslab_, force_ganging, U64, ZMOD_RW,
"Blocks larger than this size are forced to be gang blocks"); "Blocks larger than this size are sometimes forced to be gang blocks");
ZFS_MODULE_PARAM(zfs_metaslab, metaslab_, force_ganging_pct, UINT, ZMOD_RW,
"Percentage of large blocks that will be forced to be gang blocks");
ZFS_MODULE_PARAM(zfs_metaslab, metaslab_, df_max_search, UINT, ZMOD_RW, ZFS_MODULE_PARAM(zfs_metaslab, metaslab_, df_max_search, UINT, ZMOD_RW,
"Max distance (bytes) to search forward before using size tree"); "Max distance (bytes) to search forward before using size tree");
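
The new `metaslab_force_ganging_pct` tunable replaces the hard-coded 3% in the forced-ganging test. The condition can be restated as a pure predicate, sketched below. The function name is hypothetical, and the random roll is passed in as a parameter (assumed to lie in 0..99, matching `random_in_range(100)`) so the logic can be checked deterministically.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical predicate (not part of the commit) restating the
 * forced-ganging condition from metaslab_alloc_dva().
 *
 * psize:         physical size of the block being allocated
 * force_ganging: size threshold (metaslab_force_ganging)
 * pct:           percentage tunable (metaslab_force_ganging_pct)
 * roll:          random value, assumed to be in the range 0..99
 */
static int
should_force_gang(uint64_t psize, uint64_t force_ganging,
    unsigned int pct, unsigned int roll)
{
	/* Values above 100 behave like 100, i.e. always gang. */
	unsigned int capped = (pct < 100) ? pct : 100;

	/* A percentage of 0 disables forced ganging entirely. */
	return (psize >= force_ganging && pct > 0 && roll < capped);
}
```

With the default of 3, rolls 0, 1 and 2 force a gang block, which reproduces the previous `random_in_range(100) < 3` behavior.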


@ -772,6 +772,7 @@ spa_add(const char *name, nvlist_t *config, const char *altroot)
spa->spa_min_ashift = INT_MAX; spa->spa_min_ashift = INT_MAX;
spa->spa_max_ashift = 0; spa->spa_max_ashift = 0;
spa->spa_min_alloc = INT_MAX; spa->spa_min_alloc = INT_MAX;
spa->spa_gcd_alloc = INT_MAX;
/* Reset cached value */ /* Reset cached value */
spa->spa_dedup_dspace = ~0ULL; spa->spa_dedup_dspace = ~0ULL;


@ -889,9 +889,15 @@ vdev_alloc(spa_t *spa, vdev_t **vdp, nvlist_t *nv, vdev_t *parent, uint_t id,
&vd->vdev_not_present); &vd->vdev_not_present);
/* /*
* Get the alignment requirement. * Get the alignment requirement. Ignore pool ashift for vdev
* attach case.
*/ */
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_ASHIFT, &vd->vdev_ashift); if (alloctype != VDEV_ALLOC_ATTACH) {
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_ASHIFT,
&vd->vdev_ashift);
} else {
vd->vdev_attaching = B_TRUE;
}
/* /*
* Retrieve the vdev creation time. * Retrieve the vdev creation time.
@ -1393,6 +1399,36 @@ vdev_remove_parent(vdev_t *cvd)
vdev_free(mvd); vdev_free(mvd);
} }
/*
* Choose GCD for spa_gcd_alloc.
*/
static uint64_t
vdev_gcd(uint64_t a, uint64_t b)
{
while (b != 0) {
uint64_t t = b;
b = a % b;
a = t;
}
return (a);
}
/*
* Set spa_min_alloc and spa_gcd_alloc.
*/
static void
vdev_spa_set_alloc(spa_t *spa, uint64_t min_alloc)
{
if (min_alloc < spa->spa_min_alloc)
spa->spa_min_alloc = min_alloc;
if (spa->spa_gcd_alloc == INT_MAX) {
spa->spa_gcd_alloc = min_alloc;
} else {
spa->spa_gcd_alloc = vdev_gcd(min_alloc,
spa->spa_gcd_alloc);
}
}
void void
vdev_metaslab_group_create(vdev_t *vd) vdev_metaslab_group_create(vdev_t *vd)
{ {
@ -1445,8 +1481,7 @@ vdev_metaslab_group_create(vdev_t *vd)
spa->spa_min_ashift = vd->vdev_ashift; spa->spa_min_ashift = vd->vdev_ashift;
uint64_t min_alloc = vdev_get_min_alloc(vd); uint64_t min_alloc = vdev_get_min_alloc(vd);
if (min_alloc < spa->spa_min_alloc) vdev_spa_set_alloc(spa, min_alloc);
spa->spa_min_alloc = min_alloc;
} }
} }
} }
@ -2144,9 +2179,9 @@ vdev_open(vdev_t *vd)
return (SET_ERROR(EDOM)); return (SET_ERROR(EDOM));
} }
if (vd->vdev_top == vd) { if (vd->vdev_top == vd && vd->vdev_attaching == B_FALSE)
vdev_ashift_optimize(vd); vdev_ashift_optimize(vd);
} vd->vdev_attaching = B_FALSE;
} }
if (vd->vdev_ashift != 0 && (vd->vdev_ashift < ASHIFT_MIN || if (vd->vdev_ashift != 0 && (vd->vdev_ashift < ASHIFT_MIN ||
vd->vdev_ashift > ASHIFT_MAX)) { vd->vdev_ashift > ASHIFT_MAX)) {
@ -2207,8 +2242,7 @@ vdev_open(vdev_t *vd)
if (vd->vdev_top == vd && vd->vdev_ashift != 0 && if (vd->vdev_top == vd && vd->vdev_ashift != 0 &&
vd->vdev_islog == 0 && vd->vdev_aux == NULL) { vd->vdev_islog == 0 && vd->vdev_aux == NULL) {
uint64_t min_alloc = vdev_get_min_alloc(vd); uint64_t min_alloc = vdev_get_min_alloc(vd);
if (min_alloc < spa->spa_min_alloc) vdev_spa_set_alloc(spa, min_alloc);
spa->spa_min_alloc = min_alloc;
} }
/* /*
@ -5688,6 +5722,7 @@ vdev_props_set_sync(void *arg, dmu_tx_t *tx)
objset_t *mos = spa->spa_meta_objset; objset_t *mos = spa->spa_meta_objset;
nvpair_t *elem = NULL; nvpair_t *elem = NULL;
uint64_t vdev_guid; uint64_t vdev_guid;
uint64_t objid;
nvlist_t *nvprops; nvlist_t *nvprops;
vdev_guid = fnvlist_lookup_uint64(nvp, ZPOOL_VDEV_PROPS_SET_VDEV); vdev_guid = fnvlist_lookup_uint64(nvp, ZPOOL_VDEV_PROPS_SET_VDEV);
@ -5698,15 +5733,6 @@ vdev_props_set_sync(void *arg, dmu_tx_t *tx)
if (vd == NULL) if (vd == NULL)
return; return;
mutex_enter(&spa->spa_props_lock);
while ((elem = nvlist_next_nvpair(nvprops, elem)) != NULL) {
uint64_t intval, objid = 0;
const char *strval;
vdev_prop_t prop;
const char *propname = nvpair_name(elem);
zprop_type_t proptype;
/* /*
* Set vdev property values in the vdev props mos object. * Set vdev property values in the vdev props mos object.
*/ */
@ -5717,12 +5743,18 @@ vdev_props_set_sync(void *arg, dmu_tx_t *tx)
} else if (vd->vdev_leaf_zap != 0) { } else if (vd->vdev_leaf_zap != 0) {
objid = vd->vdev_leaf_zap; objid = vd->vdev_leaf_zap;
} else { } else {
/* panic("unexpected vdev type");
* XXX: implement vdev_props_set_check()
*/
panic("vdev not root/top/leaf");
} }
mutex_enter(&spa->spa_props_lock);
while ((elem = nvlist_next_nvpair(nvprops, elem)) != NULL) {
uint64_t intval;
const char *strval;
vdev_prop_t prop;
const char *propname = nvpair_name(elem);
zprop_type_t proptype;
switch (prop = vdev_name_to_prop(propname)) { switch (prop = vdev_name_to_prop(propname)) {
case VDEV_PROP_USERPROP: case VDEV_PROP_USERPROP:
if (vdev_prop_user(propname)) { if (vdev_prop_user(propname)) {
@ -5791,6 +5823,12 @@ vdev_prop_set(vdev_t *vd, nvlist_t *innvl, nvlist_t *outnvl)
ASSERT(vd != NULL); ASSERT(vd != NULL);
/* Check that vdev has a zap we can use */
if (vd->vdev_root_zap == 0 &&
vd->vdev_top_zap == 0 &&
vd->vdev_leaf_zap == 0)
return (SET_ERROR(EINVAL));
if (nvlist_lookup_uint64(innvl, ZPOOL_VDEV_PROPS_SET_VDEV, if (nvlist_lookup_uint64(innvl, ZPOOL_VDEV_PROPS_SET_VDEV,
&vdev_guid) != 0) &vdev_guid) != 0)
return (SET_ERROR(EINVAL)); return (SET_ERROR(EINVAL));
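
The vdev_gcd() and vdev_spa_set_alloc() helpers added in this file keep two running values per pool: the smallest minimum allocation size seen so far, and the greatest common divisor of all of them, with INT_MAX serving as the "not yet set" sentinel. A minimal standalone sketch of that accumulation follows. The struct and function names are hypothetical stand-ins for the `spa_min_alloc` and `spa_gcd_alloc` fields.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the two spa_t fields touched by the diff. */
typedef struct alloc_summary {
	uint64_t min_alloc;	/* smallest min_alloc seen so far */
	uint64_t gcd_alloc;	/* GCD of every min_alloc seen so far */
} alloc_summary_t;

/* Euclid's algorithm, same shape as vdev_gcd() in the diff. */
static uint64_t
gcd_u64(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = b;
		b = a % b;
		a = t;
	}
	return (a);
}

/* Both fields start at the INT_MAX sentinel, as in spa_add(). */
static void
alloc_summary_init(alloc_summary_t *s)
{
	s->min_alloc = INT_MAX;
	s->gcd_alloc = INT_MAX;
}

/* Fold one vdev's minimum allocation size into the summary. */
static void
alloc_summary_add(alloc_summary_t *s, uint64_t min_alloc)
{
	if (min_alloc < s->min_alloc)
		s->min_alloc = min_alloc;
	if (s->gcd_alloc == INT_MAX)
		s->gcd_alloc = min_alloc;	/* first value seen */
	else
		s->gcd_alloc = gcd_u64(min_alloc, s->gcd_alloc);
}
```

For example, folding in 8192 and then 12288 leaves a minimum of 8192 and a GCD of 4096, so the two values can differ when the vdev allocation sizes are not multiples of each other.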


@ -1398,7 +1398,7 @@ vdev_indirect_checksum_error(zio_t *zio,
vd->vdev_stat.vs_checksum_errors++; vd->vdev_stat.vs_checksum_errors++;
mutex_exit(&vd->vdev_stat_lock); mutex_exit(&vd->vdev_stat_lock);
zio_bad_cksum_t zbc = {{{ 0 }}}; zio_bad_cksum_t zbc = { 0 };
abd_t *bad_abd = ic->ic_data; abd_t *bad_abd = ic->ic_data;
abd_t *good_abd = is->is_good_child->ic_data; abd_t *good_abd = is->is_good_child->ic_data;
(void) zfs_ereport_post_checksum(zio->io_spa, vd, NULL, zio, (void) zfs_ereport_post_checksum(zio->io_spa, vd, NULL, zio,


@ -1785,7 +1785,7 @@ vdev_raidz_checksum_error(zio_t *zio, raidz_col_t *rc, abd_t *bad_data)
static int static int
raidz_checksum_verify(zio_t *zio) raidz_checksum_verify(zio_t *zio)
{ {
zio_bad_cksum_t zbc = {{{0}}}; zio_bad_cksum_t zbc = {0};
raidz_map_t *rm = zio->io_vsd; raidz_map_t *rm = zio->io_vsd;
int ret = zio_checksum_error(zio, &zbc); int ret = zio_checksum_error(zio, &zbc);


@ -754,10 +754,6 @@ zfs_ereport_start(nvlist_t **ereport_out, nvlist_t **detector_out,
#define MAX_RANGES 16 #define MAX_RANGES 16
typedef struct zfs_ecksum_info { typedef struct zfs_ecksum_info {
/* histograms of set and cleared bits by bit number in a 64-bit word */
uint8_t zei_histogram_set[sizeof (uint64_t) * NBBY];
uint8_t zei_histogram_cleared[sizeof (uint64_t) * NBBY];
/* inline arrays of bits set and cleared. */ /* inline arrays of bits set and cleared. */
uint64_t zei_bits_set[ZFM_MAX_INLINE]; uint64_t zei_bits_set[ZFM_MAX_INLINE];
uint64_t zei_bits_cleared[ZFM_MAX_INLINE]; uint64_t zei_bits_cleared[ZFM_MAX_INLINE];
@ -781,7 +777,7 @@ typedef struct zfs_ecksum_info {
} zfs_ecksum_info_t; } zfs_ecksum_info_t;
static void static void
update_histogram(uint64_t value_arg, uint8_t *hist, uint32_t *count) update_bad_bits(uint64_t value_arg, uint32_t *count)
{ {
size_t i; size_t i;
size_t bits = 0; size_t bits = 0;
@ -789,11 +785,9 @@ update_histogram(uint64_t value_arg, uint8_t *hist, uint32_t *count)
/* We store the bits in big-endian (largest-first) order */ /* We store the bits in big-endian (largest-first) order */
for (i = 0; i < 64; i++) { for (i = 0; i < 64; i++) {
if (value & (1ull << i)) { if (value & (1ull << i))
hist[63 - i]++;
++bits; ++bits;
} }
}
/* update the count of bits changed */ /* update the count of bits changed */
*count += bits; *count += bits;
} }
@ -920,14 +914,6 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
if (info != NULL && info->zbc_has_cksum) { if (info != NULL && info->zbc_has_cksum) {
fm_payload_set(ereport, fm_payload_set(ereport,
FM_EREPORT_PAYLOAD_ZFS_CKSUM_EXPECTED,
DATA_TYPE_UINT64_ARRAY,
sizeof (info->zbc_expected) / sizeof (uint64_t),
(uint64_t *)&info->zbc_expected,
FM_EREPORT_PAYLOAD_ZFS_CKSUM_ACTUAL,
DATA_TYPE_UINT64_ARRAY,
sizeof (info->zbc_actual) / sizeof (uint64_t),
(uint64_t *)&info->zbc_actual,
FM_EREPORT_PAYLOAD_ZFS_CKSUM_ALGO, FM_EREPORT_PAYLOAD_ZFS_CKSUM_ALGO,
DATA_TYPE_STRING, DATA_TYPE_STRING,
info->zbc_checksum_name, info->zbc_checksum_name,
@ -1010,10 +996,8 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
offset++; offset++;
} }
update_histogram(set, eip->zei_histogram_set, update_bad_bits(set, &eip->zei_range_sets[range]);
&eip->zei_range_sets[range]); update_bad_bits(cleared, &eip->zei_range_clears[range]);
update_histogram(cleared, eip->zei_histogram_cleared,
&eip->zei_range_clears[range]);
} }
/* convert to byte offsets */ /* convert to byte offsets */
@ -1049,15 +1033,6 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
DATA_TYPE_UINT8_ARRAY, DATA_TYPE_UINT8_ARRAY,
inline_size, (uint8_t *)eip->zei_bits_cleared, inline_size, (uint8_t *)eip->zei_bits_cleared,
NULL); NULL);
} else {
fm_payload_set(ereport,
FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM,
DATA_TYPE_UINT8_ARRAY,
NBBY * sizeof (uint64_t), eip->zei_histogram_set,
FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM,
DATA_TYPE_UINT8_ARRAY,
NBBY * sizeof (uint64_t), eip->zei_histogram_cleared,
NULL);
} }
return (eip); return (eip);
} }
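
With the per-bit histograms removed, update_bad_bits() only has to count how many bits are set in a 64-bit difference word and add that to a running total. A minimal sketch of that remaining behavior follows. The function name `count_changed_bits` is hypothetical, chosen to avoid implying it is the function from the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical restatement (not part of the commit) of what
 * update_bad_bits() does after the histogram code is removed:
 * count the set bits in value and add them to *count.
 */
static void
count_changed_bits(uint64_t value, uint32_t *count)
{
	uint32_t bits = 0;

	/* Test each of the 64 bit positions in turn. */
	for (unsigned int i = 0; i < 64; i++) {
		if (value & (1ULL << i))
			bits++;
	}

	/* Accumulate; the caller keeps one counter per error range. */
	*count += bits;
}
```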


@ -1078,6 +1078,16 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
return (SET_ERROR(EXDEV)); return (SET_ERROR(EXDEV));
} }
/*
* outos and inos belong to the same storage pool (checked a few
* lines above), so testing one of them is enough.
*/
if (!spa_feature_is_enabled(dmu_objset_spa(outos),
SPA_FEATURE_BLOCK_CLONING)) {
zfs_exit_two(inzfsvfs, outzfsvfs, FTAG);
return (SET_ERROR(EOPNOTSUPP));
}
ASSERT(!outzfsvfs->z_replay); ASSERT(!outzfsvfs->z_replay);
error = zfs_verify_zp(inzp); error = zfs_verify_zp(inzp);
@ -1088,12 +1098,6 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
return (error); return (error);
} }
if (!spa_feature_is_enabled(dmu_objset_spa(outos),
SPA_FEATURE_BLOCK_CLONING)) {
zfs_exit_two(inzfsvfs, outzfsvfs, FTAG);
return (SET_ERROR(EXDEV));
}
/* /*
* We don't copy source file's flags that's why we don't allow to clone * We don't copy source file's flags that's why we don't allow to clone
* files that are in quarantine. * files that are in quarantine.
@ -1212,7 +1216,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
gid = KGID_TO_SGID(ZTOGID(outzp)); gid = KGID_TO_SGID(ZTOGID(outzp));
projid = outzp->z_projid; projid = outzp->z_projid;
bps = kmem_alloc(sizeof (bps[0]) * maxblocks, KM_SLEEP); bps = vmem_alloc(sizeof (bps[0]) * maxblocks, KM_SLEEP);
/* /*
* Clone the file in reasonable size chunks. Each chunk is cloned * Clone the file in reasonable size chunks. Each chunk is cloned
@ -1330,7 +1334,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
done += size; done += size;
} }
kmem_free(bps, sizeof (bps[0]) * maxblocks); vmem_free(bps, sizeof (bps[0]) * maxblocks);
zfs_znode_update_vfs(outzp); zfs_znode_update_vfs(outzp);
unlock: unlock:


@ -151,6 +151,7 @@ static kmem_cache_t *zil_lwb_cache;
static kmem_cache_t *zil_zcw_cache; static kmem_cache_t *zil_zcw_cache;
static void zil_lwb_commit(zilog_t *zilog, lwb_t *lwb, itx_t *itx); static void zil_lwb_commit(zilog_t *zilog, lwb_t *lwb, itx_t *itx);
static void zil_lwb_write_issue(zilog_t *zilog, lwb_t *lwb);
static itx_t *zil_itx_clone(itx_t *oitx); static itx_t *zil_itx_clone(itx_t *oitx);
static int static int
@ -1768,7 +1769,7 @@ static uint_t zil_maxblocksize = SPA_OLD_MAXBLOCKSIZE;
* Has to be called under zl_issuer_lock to chain more lwbs. * Has to be called under zl_issuer_lock to chain more lwbs.
*/ */
static lwb_t * static lwb_t *
zil_lwb_write_close(zilog_t *zilog, lwb_t *lwb) zil_lwb_write_close(zilog_t *zilog, lwb_t *lwb, list_t *ilwbs)
{ {
lwb_t *nlwb = NULL; lwb_t *nlwb = NULL;
zil_chain_t *zilc; zil_chain_t *zilc;
@ -1870,6 +1871,27 @@ zil_lwb_write_close(zilog_t *zilog, lwb_t *lwb)
dmu_tx_commit(tx); dmu_tx_commit(tx);
/*
* We need to acquire the config lock for the lwb to issue it later.
* However, if we have a queue of closed parent lwbs already
* holding the config lock (but not yet issued), we can't block here
* waiting on the lock or we will deadlock. In that case we must
* first issue the parent IOs before waiting on the lock.
*/
if (ilwbs && !list_is_empty(ilwbs)) {
if (!spa_config_tryenter(spa, SCL_STATE, lwb, RW_READER)) {
lwb_t *tlwb;
while ((tlwb = list_remove_head(ilwbs)) != NULL)
zil_lwb_write_issue(zilog, tlwb);
spa_config_enter(spa, SCL_STATE, lwb, RW_READER);
}
} else {
spa_config_enter(spa, SCL_STATE, lwb, RW_READER);
}
if (ilwbs)
list_insert_tail(ilwbs, lwb);
/* /*
* If there was an allocation failure then nlwb will be null which * If there was an allocation failure then nlwb will be null which
* forces a txg_wait_synced(). * forces a txg_wait_synced().
@ -1933,7 +1955,7 @@ zil_lwb_write_issue(zilog_t *zilog, lwb_t *lwb)
ZIL_STAT_INCR(zilog, zil_itx_metaslab_normal_alloc, ZIL_STAT_INCR(zilog, zil_itx_metaslab_normal_alloc,
BP_GET_LSIZE(&lwb->lwb_blk)); BP_GET_LSIZE(&lwb->lwb_blk));
} }
spa_config_enter(zilog->zl_spa, SCL_STATE, lwb, RW_READER); ASSERT(spa_config_held(zilog->zl_spa, SCL_STATE, RW_READER));
zil_lwb_add_block(lwb, &lwb->lwb_blk); zil_lwb_add_block(lwb, &lwb->lwb_blk);
lwb->lwb_issued_timestamp = gethrtime(); lwb->lwb_issued_timestamp = gethrtime();
zio_nowait(lwb->lwb_root_zio); zio_nowait(lwb->lwb_root_zio);
@ -2037,8 +2059,7 @@ cont:
lwb_sp < zil_max_waste_space(zilog) && lwb_sp < zil_max_waste_space(zilog) &&
(dlen % max_log_data == 0 || (dlen % max_log_data == 0 ||
lwb_sp < reclen + dlen % max_log_data))) { lwb_sp < reclen + dlen % max_log_data))) {
list_insert_tail(ilwbs, lwb); lwb = zil_lwb_write_close(zilog, lwb, ilwbs);
lwb = zil_lwb_write_close(zilog, lwb);
if (lwb == NULL) if (lwb == NULL)
return (NULL); return (NULL);
zil_lwb_write_open(zilog, lwb); zil_lwb_write_open(zilog, lwb);
@ -2937,8 +2958,7 @@ zil_process_commit_list(zilog_t *zilog, zil_commit_waiter_t *zcw, list_t *ilwbs)
zfs_commit_timeout_pct / 100; zfs_commit_timeout_pct / 100;
if (sleep < zil_min_commit_timeout || if (sleep < zil_min_commit_timeout ||
lwb->lwb_sz - lwb->lwb_nused < lwb->lwb_sz / 8) { lwb->lwb_sz - lwb->lwb_nused < lwb->lwb_sz / 8) {
list_insert_tail(ilwbs, lwb); lwb = zil_lwb_write_close(zilog, lwb, ilwbs);
lwb = zil_lwb_write_close(zilog, lwb);
zilog->zl_cur_used = 0; zilog->zl_cur_used = 0;
if (lwb == NULL) { if (lwb == NULL) {
while ((lwb = list_remove_head(ilwbs)) while ((lwb = list_remove_head(ilwbs))
@ -3096,7 +3116,7 @@ zil_commit_waiter_timeout(zilog_t *zilog, zil_commit_waiter_t *zcw)
* since we've reached the commit waiter's timeout and it still * since we've reached the commit waiter's timeout and it still
* hasn't been issued. * hasn't been issued.
*/ */
lwb_t *nlwb = zil_lwb_write_close(zilog, lwb); lwb_t *nlwb = zil_lwb_write_close(zilog, lwb, NULL);
ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED); ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED);
@ -3921,13 +3941,11 @@ zil_suspend(const char *osname, void **cookiep)
return (error); return (error);
zilog = dmu_objset_zil(os); zilog = dmu_objset_zil(os);
mutex_enter(&zilog->zl_issuer_lock);
mutex_enter(&zilog->zl_lock); mutex_enter(&zilog->zl_lock);
zh = zilog->zl_header; zh = zilog->zl_header;
if (zh->zh_flags & ZIL_REPLAY_NEEDED) { /* unplayed log */ if (zh->zh_flags & ZIL_REPLAY_NEEDED) { /* unplayed log */
mutex_exit(&zilog->zl_lock); mutex_exit(&zilog->zl_lock);
mutex_exit(&zilog->zl_issuer_lock);
dmu_objset_rele(os, suspend_tag); dmu_objset_rele(os, suspend_tag);
return (SET_ERROR(EBUSY)); return (SET_ERROR(EBUSY));
} }
@ -3941,7 +3959,6 @@ zil_suspend(const char *osname, void **cookiep)
if (cookiep == NULL && !zilog->zl_suspending && if (cookiep == NULL && !zilog->zl_suspending &&
(zilog->zl_suspend > 0 || BP_IS_HOLE(&zh->zh_log))) { (zilog->zl_suspend > 0 || BP_IS_HOLE(&zh->zh_log))) {
mutex_exit(&zilog->zl_lock); mutex_exit(&zilog->zl_lock);
mutex_exit(&zilog->zl_issuer_lock);
dmu_objset_rele(os, suspend_tag); dmu_objset_rele(os, suspend_tag);
return (0); return (0);
} }
@ -3950,7 +3967,6 @@ zil_suspend(const char *osname, void **cookiep)
dsl_pool_rele(dmu_objset_pool(os), suspend_tag); dsl_pool_rele(dmu_objset_pool(os), suspend_tag);
zilog->zl_suspend++; zilog->zl_suspend++;
mutex_exit(&zilog->zl_issuer_lock);
if (zilog->zl_suspend > 1) { if (zilog->zl_suspend > 1) {
/* /*
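The lock handling added to zil_lwb_write_close() follows a try-then-drain pattern: attempt the lock without blocking, and if that fails, release the holds that a waiter may depend on before blocking. The following is a minimal single-threaded sketch of that idea, not the ZFS implementation. The names toy_lock_t, toy_tryenter, toy_exit and toy_close are hypothetical, and the toy collapses "issue" and "hold released" into one step, whereas this diff does not show where the real hold is dropped. Unlike the patch, the toy always tries the lock first, even when nothing is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock: a waiting writer turns away new readers. */
typedef struct {
	int readers;		/* reader holds currently outstanding */
	bool writer_waiting;	/* true while a writer waits for readers */
} toy_lock_t;

static bool
toy_tryenter(toy_lock_t *l)
{
	if (l->writer_waiting)
		return (false);
	l->readers++;
	return (true);
}

/* Dropping the last reader hold lets the waiting writer run and finish. */
static void
toy_exit(toy_lock_t *l)
{
	assert(l->readers > 0);
	if (--l->readers == 0)
		l->writer_waiting = false;
}

/*
 * Take one more reader hold while `queued` earlier holds are outstanding.
 * Returns how many queued holds had to be released first.
 */
static int
toy_close(toy_lock_t *l, int *queued)
{
	int issued = 0;

	if (!toy_tryenter(l)) {
		/* Blocking here would deadlock: the writer waits on our holds. */
		while (*queued > 0) {
			toy_exit(l);	/* "issuing" drops the hold in this toy */
			(*queued)--;
			issued++;
		}
		/* Stand-in for the blocking enter; must succeed in this toy. */
		bool ok = toy_tryenter(l);
		assert(ok);
		(void) ok;
	}
	(*queued)++;
	return (issued);
}
```

With two queued holds and a writer waiting, toy_close() drains both before taking its own hold. With an uncontended lock it takes the hold immediately and drains nothing.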

@ -1596,6 +1596,19 @@ zio_shrink(zio_t *zio, uint64_t size)
} }
} }
/*
* Round provided allocation size up to a value that can be allocated
* by at least some vdev(s) in the pool with minimum or no additional
* padding and without extra space usage on others
*/
static uint64_t
zio_roundup_alloc_size(spa_t *spa, uint64_t size)
{
if (size > spa->spa_min_alloc)
return (roundup(size, spa->spa_gcd_alloc));
return (spa->spa_min_alloc);
}
/* /*
* ========================================================================== * ==========================================================================
* Prepare to read and write logical blocks * Prepare to read and write logical blocks
@ -1802,9 +1815,8 @@ zio_write_compress(zio_t *zio)
* in that we charge for the padding used to fill out * in that we charge for the padding used to fill out
* the last sector. * the last sector.
*/ */
ASSERT3U(spa->spa_min_alloc, >=, SPA_MINBLOCKSHIFT); size_t rounded = (size_t)zio_roundup_alloc_size(spa,
size_t rounded = (size_t)roundup(psize, psize);
spa->spa_min_alloc);
if (rounded >= lsize) { if (rounded >= lsize) {
compress = ZIO_COMPRESS_OFF; compress = ZIO_COMPRESS_OFF;
zio_buf_free(cbuf, lsize); zio_buf_free(cbuf, lsize);
@ -1847,8 +1859,8 @@ zio_write_compress(zio_t *zio)
* take this codepath because it will change the on-disk block * take this codepath because it will change the on-disk block
* and decryption will fail. * and decryption will fail.
*/ */
size_t rounded = MIN((size_t)roundup(psize, size_t rounded = MIN((size_t)zio_roundup_alloc_size(spa, psize),
spa->spa_min_alloc), lsize); lsize);
if (rounded != psize) { if (rounded != psize) {
abd_t *cdata = abd_alloc_linear(rounded, B_TRUE); abd_t *cdata = abd_alloc_linear(rounded, B_TRUE);
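To make the rounding rule in zio_roundup_alloc_size() concrete, here is a standalone sketch of the same arithmetic. The struct alloc_params_t and the function roundup_alloc_size are hypothetical stand-ins for the spa_t fields and the patched function. The reading of spa_gcd_alloc as a common divisor of the vdev allocation sizes is an assumption based on its name, since this diff does not show where it is computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two spa_t fields the patch reads. */
typedef struct {
	uint64_t min_alloc;	/* smallest allocation size (assumed meaning) */
	uint64_t gcd_alloc;	/* common divisor of sizes (assumed meaning) */
} alloc_params_t;

/*
 * Same shape as the patched helper: sizes at or below min_alloc become
 * min_alloc, larger sizes round up to the next multiple of gcd_alloc.
 */
static uint64_t
roundup_alloc_size(const alloc_params_t *p, uint64_t size)
{
	assert(p->gcd_alloc != 0);
	if (size > p->min_alloc) {
		return (((size + p->gcd_alloc - 1) / p->gcd_alloc) *
		    p->gcd_alloc);
	}
	return (p->min_alloc);
}
```

With illustrative values min_alloc = 8192 and gcd_alloc = 4096, a 100-byte request becomes 8192 and a 9000-byte request becomes 12288.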

@ -515,8 +515,6 @@ zio_checksum_error_impl(spa_t *spa, const blkptr_t *bp,
} }
if (info != NULL) { if (info != NULL) {
info->zbc_expected = expected_cksum;
info->zbc_actual = actual_cksum;
info->zbc_checksum_name = ci->ci_name; info->zbc_checksum_name = ci->ci_name;
info->zbc_byteswapped = byteswap; info->zbc_byteswapped = byteswap;
info->zbc_injected = 0; info->zbc_injected = 0;

@ -34,6 +34,15 @@ tags = ['functional', 'acl', 'posix-sa']
tests = ['atime_003_pos', 'root_relatime_on'] tests = ['atime_003_pos', 'root_relatime_on']
tags = ['functional', 'atime'] tags = ['functional', 'atime']
[tests/functional/block_cloning:Linux]
tests = ['block_cloning_copyfilerange', 'block_cloning_copyfilerange_partial',
'block_cloning_ficlone', 'block_cloning_ficlonerange',
'block_cloning_ficlonerange_partial',
'block_cloning_disabled_copyfilerange', 'block_cloning_disabled_ficlone',
'block_cloning_disabled_ficlonerange',
'block_cloning_copyfilerange_cross_dataset']
tags = ['functional', 'block_cloning']
[tests/functional/chattr:Linux] [tests/functional/chattr:Linux]
tests = ['chattr_001_pos', 'chattr_002_neg'] tests = ['chattr_001_pos', 'chattr_002_neg']
tags = ['functional', 'chattr'] tags = ['functional', 'chattr']

@ -134,6 +134,12 @@ ci_reason = 'CI runner doesn\'t have all requirements'
# #
idmap_reason = 'Idmapped mount needs kernel 5.12+' idmap_reason = 'Idmapped mount needs kernel 5.12+'
#
# copy_file_range() is not supported by all kernels
#
cfr_reason = 'Kernel copy_file_range support required'
cfr_cross_reason = 'copy_file_range(2) cross-filesystem needs kernel 5.3+'
# #
# These tests are known to fail, thus we use this list to prevent these # These tests are known to fail, thus we use this list to prevent these
# failures from failing the job as a whole; only unexpected failures # failures from failing the job as a whole; only unexpected failures
@ -288,6 +294,14 @@ elif sys.platform.startswith('linux'):
'idmap_mount/idmap_mount_003': ['SKIP', idmap_reason], 'idmap_mount/idmap_mount_003': ['SKIP', idmap_reason],
'idmap_mount/idmap_mount_004': ['SKIP', idmap_reason], 'idmap_mount/idmap_mount_004': ['SKIP', idmap_reason],
'idmap_mount/idmap_mount_005': ['SKIP', idmap_reason], 'idmap_mount/idmap_mount_005': ['SKIP', idmap_reason],
'block_cloning/block_cloning_disabled_copyfilerange':
['SKIP', cfr_reason],
'block_cloning/block_cloning_copyfilerange':
['SKIP', cfr_reason],
'block_cloning/block_cloning_copyfilerange_partial':
['SKIP', cfr_reason],
'block_cloning/block_cloning_copyfilerange_cross_dataset':
['SKIP', cfr_cross_reason],
}) })

@ -1,6 +1,7 @@
/badsend /badsend
/btree_test /btree_test
/chg_usr_exec /chg_usr_exec
/clonefile
/devname2devid /devname2devid
/dir_rd_update /dir_rd_update
/draid /draid

@ -119,6 +119,7 @@ scripts_zfs_tests_bin_PROGRAMS += %D%/renameat2
scripts_zfs_tests_bin_PROGRAMS += %D%/xattrtest scripts_zfs_tests_bin_PROGRAMS += %D%/xattrtest
scripts_zfs_tests_bin_PROGRAMS += %D%/zed_fd_spill-zedlet scripts_zfs_tests_bin_PROGRAMS += %D%/zed_fd_spill-zedlet
scripts_zfs_tests_bin_PROGRAMS += %D%/idmap_util scripts_zfs_tests_bin_PROGRAMS += %D%/idmap_util
scripts_zfs_tests_bin_PROGRAMS += %D%/clonefile
%C%_idmap_util_LDADD = libspl.la %C%_idmap_util_LDADD = libspl.la

@ -0,0 +1,333 @@
/*
* SPDX-License-Identifier: MIT
*
* Copyright (c) 2023, Rob Norris <robn@despairlabs.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
/*
* This program is to test the availability and behaviour of copy_file_range,
* FICLONE, FICLONERANGE and FIDEDUPERANGE in the Linux kernel. It should
* compile and run even if these features aren't exposed through the libc.
*/
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <stdlib.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#ifndef __NR_copy_file_range
#if defined(__x86_64__)
#define __NR_copy_file_range (326)
#elif defined(__i386__)
#define __NR_copy_file_range (377)
#elif defined(__s390__)
#define __NR_copy_file_range (375)
#elif defined(__arm__)
#define __NR_copy_file_range (391)
#elif defined(__aarch64__)
#define __NR_copy_file_range (285)
#elif defined(__powerpc__)
#define __NR_copy_file_range (379)
#else
#error "no definition of __NR_copy_file_range for this platform"
#endif
#endif /* __NR_copy_file_range */
ssize_t
copy_file_range(int, loff_t *, int, loff_t *, size_t, unsigned int)
__attribute__((weak));
static inline ssize_t
cf_copy_file_range(int sfd, loff_t *soff, int dfd, loff_t *doff,
size_t len, unsigned int flags)
{
if (copy_file_range)
return (copy_file_range(sfd, soff, dfd, doff, len, flags));
return (
syscall(__NR_copy_file_range, sfd, soff, dfd, doff, len, flags));
}
/* Define missing FICLONE */
#ifdef FICLONE
#define CF_FICLONE FICLONE
#else
#define CF_FICLONE _IOW(0x94, 9, int)
#endif
/* Define missing FICLONERANGE and support structs */
#ifdef FICLONERANGE
#define CF_FICLONERANGE FICLONERANGE
typedef struct file_clone_range cf_file_clone_range_t;
#else
typedef struct {
int64_t src_fd;
uint64_t src_offset;
uint64_t src_length;
uint64_t dest_offset;
} cf_file_clone_range_t;
#define CF_FICLONERANGE _IOW(0x94, 13, cf_file_clone_range_t)
#endif
/* Define missing FIDEDUPERANGE and support structs */
#ifdef FIDEDUPERANGE
#define CF_FIDEDUPERANGE FIDEDUPERANGE
#define CF_FILE_DEDUPE_RANGE_SAME FILE_DEDUPE_RANGE_SAME
#define CF_FILE_DEDUPE_RANGE_DIFFERS FILE_DEDUPE_RANGE_DIFFERS
typedef struct file_dedupe_range_info cf_file_dedupe_range_info_t;
typedef struct file_dedupe_range cf_file_dedupe_range_t;
#else
typedef struct {
int64_t dest_fd;
uint64_t dest_offset;
uint64_t bytes_deduped;
int32_t status;
uint32_t reserved;
} cf_file_dedupe_range_info_t;
typedef struct {
uint64_t src_offset;
uint64_t src_length;
uint16_t dest_count;
uint16_t reserved1;
uint32_t reserved2;
cf_file_dedupe_range_info_t info[0];
} cf_file_dedupe_range_t;
#define CF_FIDEDUPERANGE _IOWR(0x94, 54, cf_file_dedupe_range_t)
#define CF_FILE_DEDUPE_RANGE_SAME (0)
#define CF_FILE_DEDUPE_RANGE_DIFFERS (1)
#endif
typedef enum {
CF_MODE_NONE,
CF_MODE_CLONE,
CF_MODE_CLONERANGE,
CF_MODE_COPYFILERANGE,
CF_MODE_DEDUPERANGE,
} cf_mode_t;
static int
usage(void)
{
printf(
"usage:\n"
" FICLONE:\n"
" clonefile -c <src> <dst>\n"
" FICLONERANGE:\n"
" clonefile -r <src> <dst> <soff> <doff> <len>\n"
" copy_file_range:\n"
" clonefile -f <src> <dst> <soff> <doff> <len>\n"
" FIDEDUPERANGE:\n"
" clonefile -d <src> <dst> <soff> <doff> <len>\n");
return (1);
}
int do_clone(int sfd, int dfd);
int do_clonerange(int sfd, int dfd, loff_t soff, loff_t doff, size_t len);
int do_copyfilerange(int sfd, int dfd, loff_t soff, loff_t doff, size_t len);
int do_deduperange(int sfd, int dfd, loff_t soff, loff_t doff, size_t len);
int quiet = 0;
int
main(int argc, char **argv)
{
cf_mode_t mode = CF_MODE_NONE;
int c;	/* getopt() returns int; plain char may be unsigned */
while ((c = getopt(argc, argv, "crfdq")) != -1) {
switch (c) {
case 'c':
mode = CF_MODE_CLONE;
break;
case 'r':
mode = CF_MODE_CLONERANGE;
break;
case 'f':
mode = CF_MODE_COPYFILERANGE;
break;
case 'd':
mode = CF_MODE_DEDUPERANGE;
break;
case 'q':
quiet = 1;
break;
}
}
if (mode == CF_MODE_NONE || (argc-optind) < 2 ||
(mode != CF_MODE_CLONE && (argc-optind) < 5))
return (usage());
loff_t soff = 0, doff = 0;
size_t len = 0;
if (mode != CF_MODE_CLONE) {
soff = strtoull(argv[optind+2], NULL, 10);
if (soff == ULLONG_MAX) {
fprintf(stderr, "invalid source offset\n");
return (1);
}
doff = strtoull(argv[optind+3], NULL, 10);
if (doff == ULLONG_MAX) {
fprintf(stderr, "invalid dest offset\n");
return (1);
}
len = strtoull(argv[optind+4], NULL, 10);
if (len == ULLONG_MAX) {
fprintf(stderr, "invalid length\n");
return (1);
}
}
int sfd = open(argv[optind], O_RDONLY);
if (sfd < 0) {
fprintf(stderr, "open: %s: %s\n",
argv[optind], strerror(errno));
return (1);
}
int dfd = open(argv[optind+1], O_WRONLY|O_CREAT,
S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);
if (dfd < 0) {
fprintf(stderr, "open: %s: %s\n",
argv[optind+1], strerror(errno));
close(sfd);
return (1);
}
int err;
switch (mode) {
case CF_MODE_CLONE:
err = do_clone(sfd, dfd);
break;
case CF_MODE_CLONERANGE:
err = do_clonerange(sfd, dfd, soff, doff, len);
break;
case CF_MODE_COPYFILERANGE:
err = do_copyfilerange(sfd, dfd, soff, doff, len);
break;
case CF_MODE_DEDUPERANGE:
err = do_deduperange(sfd, dfd, soff, doff, len);
break;
default:
abort();
}
off_t spos = lseek(sfd, 0, SEEK_CUR);
off_t slen = lseek(sfd, 0, SEEK_END);
off_t dpos = lseek(dfd, 0, SEEK_CUR);
off_t dlen = lseek(dfd, 0, SEEK_END);
fprintf(stderr, "file offsets: src=%lld/%lld; dst=%lld/%lld\n",
(long long)spos, (long long)slen, (long long)dpos, (long long)dlen);
close(dfd);
close(sfd);
return (err == 0 ? 0 : 1);
}
int
do_clone(int sfd, int dfd)
{
fprintf(stderr, "using FICLONE\n");
int err = ioctl(dfd, CF_FICLONE, sfd);
if (err < 0) {
fprintf(stderr, "ioctl(FICLONE): %s\n", strerror(errno));
return (err);
}
return (0);
}
int
do_clonerange(int sfd, int dfd, loff_t soff, loff_t doff, size_t len)
{
fprintf(stderr, "using FICLONERANGE\n");
cf_file_clone_range_t fcr = {
.src_fd = sfd,
.src_offset = soff,
.src_length = len,
.dest_offset = doff,
};
int err = ioctl(dfd, CF_FICLONERANGE, &fcr);
if (err < 0) {
fprintf(stderr, "ioctl(FICLONERANGE): %s\n", strerror(errno));
return (err);
}
return (0);
}
int
do_copyfilerange(int sfd, int dfd, loff_t soff, loff_t doff, size_t len)
{
fprintf(stderr, "using copy_file_range\n");
ssize_t copied = cf_copy_file_range(sfd, &soff, dfd, &doff, len, 0);
if (copied < 0) {
fprintf(stderr, "copy_file_range: %s\n", strerror(errno));
return (1);
}
if (copied != len) {
fprintf(stderr, "copy_file_range: copied less than requested: "
"requested=%zu; copied=%zd\n", len, copied);
return (1);
}
return (0);
}
int
do_deduperange(int sfd, int dfd, loff_t soff, loff_t doff, size_t len)
{
fprintf(stderr, "using FIDEDUPERANGE\n");
char buf[sizeof (cf_file_dedupe_range_t)+
sizeof (cf_file_dedupe_range_info_t)] = {0};
cf_file_dedupe_range_t *fdr = (cf_file_dedupe_range_t *)&buf[0];
cf_file_dedupe_range_info_t *fdri =
(cf_file_dedupe_range_info_t *)
&buf[sizeof (cf_file_dedupe_range_t)];
fdr->src_offset = soff;
fdr->src_length = len;
fdr->dest_count = 1;
fdri->dest_fd = dfd;
fdri->dest_offset = doff;
int err = ioctl(sfd, CF_FIDEDUPERANGE, fdr);
if (err != 0)
fprintf(stderr, "ioctl(FIDEDUPERANGE): %s\n", strerror(errno));
if (fdri->status < 0) {
fprintf(stderr, "dedup failed: %s\n", strerror(-fdri->status));
err = -1;
} else if (fdri->status == CF_FILE_DEDUPE_RANGE_DIFFERS) {
fprintf(stderr, "dedup failed: range differs\n");
err = -1;
}
return (err);
}
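The offset and length parsing in main() rejects only ULLONG_MAX, so input such as "12x" or "-1" is accepted. A stricter parse could look like the sketch below. The helper parse_u64 is hypothetical and not part of clonefile.c; it uses only standard strtoull() and errno behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal string. Returns 0 on success. */
static int
parse_u64(const char *s, uint64_t *out)
{
	char *end = NULL;

	assert(out != NULL);

	/*
	 * Require a leading digit. This rejects empty strings, whitespace
	 * and signs, which strtoull() would otherwise accept.
	 */
	if (s == NULL || *s < '0' || *s > '9')
		return (-1);

	errno = 0;
	unsigned long long v = strtoull(s, &end, 10);

	/* ERANGE signals overflow; a non-NUL *end means trailing junk. */
	if (errno != 0 || end == s || *end != '\0')
		return (-1);

	*out = (uint64_t)v;
	return (0);
}
```

Checking errno instead of the return value lets the maximum 64-bit value itself parse successfully, while a true overflow is still reported.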

@ -182,6 +182,7 @@ export ZFS_FILES='zdb
export ZFSTEST_FILES='badsend export ZFSTEST_FILES='badsend
btree_test btree_test
chg_usr_exec chg_usr_exec
clonefile
devname2devid devname2devid
dir_rd_update dir_rd_update
draid draid

@ -90,6 +90,7 @@ nobase_dist_datadir_zfs_tests_tests_DATA += \
functional/alloc_class/alloc_class.kshlib \ functional/alloc_class/alloc_class.kshlib \
functional/atime/atime.cfg \ functional/atime/atime.cfg \
functional/atime/atime_common.kshlib \ functional/atime/atime_common.kshlib \
functional/block_cloning/block_cloning.kshlib \
functional/cache/cache.cfg \ functional/cache/cache.cfg \
functional/cache/cache.kshlib \ functional/cache/cache.kshlib \
functional/cachefile/cachefile.cfg \ functional/cachefile/cachefile.cfg \
@ -437,6 +438,17 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/atime/root_atime_on.ksh \ functional/atime/root_atime_on.ksh \
functional/atime/root_relatime_on.ksh \ functional/atime/root_relatime_on.ksh \
functional/atime/setup.ksh \ functional/atime/setup.ksh \
functional/block_cloning/cleanup.ksh \
functional/block_cloning/setup.ksh \
functional/block_cloning/block_cloning_copyfilerange_cross_dataset.ksh \
functional/block_cloning/block_cloning_copyfilerange.ksh \
functional/block_cloning/block_cloning_copyfilerange_partial.ksh \
functional/block_cloning/block_cloning_disabled_copyfilerange.ksh \
functional/block_cloning/block_cloning_disabled_ficlone.ksh \
functional/block_cloning/block_cloning_disabled_ficlonerange.ksh \
functional/block_cloning/block_cloning_ficlone.ksh \
functional/block_cloning/block_cloning_ficlonerange.ksh \
functional/block_cloning/block_cloning_ficlonerange_partial.ksh \
functional/bootfs/bootfs_001_pos.ksh \ functional/bootfs/bootfs_001_pos.ksh \
functional/bootfs/bootfs_002_neg.ksh \ functional/bootfs/bootfs_002_neg.ksh \
functional/bootfs/bootfs_003_pos.ksh \ functional/bootfs/bootfs_003_pos.ksh \

@ -0,0 +1,46 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
function have_same_content
{
typeset hash1=$(cat $1 | md5sum)
typeset hash2=$(cat $2 | md5sum)
log_must [ "$hash1" = "$hash2" ]
}
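#
# Despite the name, this prints the block numbers that the two files share:
# each L0 block is emitted as "<index> <zdb field 3> <zdb field 7>", and
# `sort | uniq -d` keeps only the lines present in both listings.
# Usage: unique_blocks <dataset1> <file1> <dataset2> <file2>
#
function unique_blocks
{
typeset zdbout=${TMPDIR:-$TEST_BASE_DIR}/zdbout.$$
zdb -vvvvv $1 -O $2 | \
awk '/ L0 / { print ++l " " $3 " " $7 }' > $zdbout.a
zdb -vvvvv $3 -O $4 | \
awk '/ L0 / { print ++l " " $3 " " $7 }' > $zdbout.b
echo $(sort $zdbout.a $zdbout.b | uniq -d | cut -f1 -d' ')
}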

@ -0,0 +1,60 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
if [[ $(linux_version) -lt $(linux_version "4.5") ]]; then
log_unsupported "copy_file_range not available before Linux 4.5"
fi
claim="The copy_file_range syscall can clone whole files."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must clonefile -f /$TESTPOOL/file1 /$TESTPOOL/file2 0 0 524288
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "1 2 3 4" ]
log_pass $claim

@ -0,0 +1,65 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
if [[ $(linux_version) -lt $(linux_version "5.3") ]]; then
log_unsupported "copy_file_range can't copy cross-filesystem before Linux 5.3"
fi
claim="The copy_file_range syscall can clone across datasets."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $DISKS
log_must zfs create $TESTPOOL/$TESTFS1
log_must zfs create $TESTPOOL/$TESTFS2
log_must dd if=/dev/urandom of=/$TESTPOOL/$TESTFS1/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must \
clonefile -f /$TESTPOOL/$TESTFS1/file1 /$TESTPOOL/$TESTFS2/file2 0 0 524288
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/$TESTFS1/file1 /$TESTPOOL/$TESTFS2/file2
typeset blocks=$(unique_blocks \
$TESTPOOL/$TESTFS1 file1 $TESTPOOL/$TESTFS2 file2)
log_must [ "$blocks" = "1 2 3 4" ]
log_pass $claim

@ -0,0 +1,68 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
if [[ $(linux_version) -lt $(linux_version "4.5") ]]; then
log_unsupported "copy_file_range not available before Linux 4.5"
fi
claim="The copy_file_range syscall can clone parts of a file."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must dd if=/$TESTPOOL/file1 of=/$TESTPOOL/file2 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "" ]
log_must clonefile -f /$TESTPOOL/file1 /$TESTPOOL/file2 131072 131072 262144
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "2 3" ]
log_pass $claim
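The expected block lists in these tests ("1 2 3 4" for a whole-file clone, "2 3" for the partial one) follow from simple offset arithmetic. This sketch assumes 128 KiB blocks, which matches the dd block size but is not set explicitly by the tests, and 1-based numbering as produced by the awk counter in unique_blocks. The helper block_span is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: 1-based indices of the first and last block touched
 * by the byte range [off, off + len), for a fixed block size bs.
 */
static void
block_span(uint64_t off, uint64_t len, uint64_t bs,
    uint64_t *first, uint64_t *last)
{
	assert(bs > 0 && len > 0);
	*first = off / bs + 1;
	*last = (off + len - 1) / bs + 1;
}
```

For the partial test, offset 131072 and length 262144 with 131072-byte blocks give first = 2 and last = 3. For the whole-file tests, offset 0 and length 524288 give 1 through 4.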

@ -0,0 +1,60 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
if [[ $(linux_version) -lt $(linux_version "4.5") ]]; then
log_unsupported "copy_file_range not available before Linux 4.5"
fi
claim="The copy_file_range syscall copies files when block cloning is disabled."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=disabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must clonefile -f /$TESTPOOL/file1 /$TESTPOOL/file2 0 0 524288
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "" ]
log_pass $claim

@ -0,0 +1,50 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
claim="The FICLONE ioctl fails when block cloning is disabled."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=disabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_mustnot clonefile -c /$TESTPOOL/file1 /$TESTPOOL/file2
log_pass $claim

@ -0,0 +1,50 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
claim="The FICLONERANGE ioctl fails when block cloning is disabled."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=disabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_mustnot clonefile -r /$TESTPOOL/file1 /$TESTPOOL/file2 0 0 524288
log_pass $claim

@ -0,0 +1,56 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
claim="The FICLONE ioctl can clone files."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must clonefile -c /$TESTPOOL/file1 /$TESTPOOL/file2
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "1 2 3 4" ]
log_pass $claim

@ -0,0 +1,56 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
claim="The FICLONERANGE ioctl can clone whole files."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must clonefile -r /$TESTPOOL/file1 /$TESTPOOL/file2 0 0 524288
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "1 2 3 4" ]
log_pass $claim


@ -0,0 +1,64 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
claim="The FICLONERANGE ioctl can clone parts of a file."
log_assert $claim
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
}
log_onexit cleanup
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $DISKS
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must dd if=/$TESTPOOL/file1 of=/$TESTPOOL/file2 bs=128K count=4
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "" ]
log_must clonefile -r /$TESTPOOL/file1 /$TESTPOOL/file2 131072 131072 262144
log_must sync_pool $TESTPOOL
log_must have_same_content /$TESTPOOL/file1 /$TESTPOOL/file2
typeset blocks=$(unique_blocks $TESTPOOL file1 $TESTPOOL file2)
log_must [ "$blocks" = "2 3" ]
log_pass $claim
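
Note on the expected values above: the strings "1 2 3 4" and "2 3" follow from mapping the cloned byte range onto fixed-size records. The sketch below reproduces that arithmetic. It assumes 128K (131072-byte) records, matching the `bs=128K` writes, and 1-based block numbering, which is inferred from the expected strings rather than from the `unique_blocks` source. `blocks_in_range` is a hypothetical helper and not part of block_cloning.kshlib.

```shell
#!/bin/sh
# Hypothetical helper (not part of the test suite): print the 1-based
# record numbers touched by a byte range, assuming a fixed record size.
blocks_in_range() {
    offset=$1
    length=$2
    recsize=$3
    # First and last record touched by [offset, offset + length).
    first=$(( offset / recsize + 1 ))
    last=$(( (offset + length - 1) / recsize + 1 ))
    out=""
    i=$first
    while [ "$i" -le "$last" ]; do
        out="${out:+$out }$i"
        i=$(( i + 1 ))
    done
    echo "$out"
}

# Partial clone: offset 131072, length 262144.
blocks_in_range 131072 262144 131072
# Whole-file clone: offset 0, length 524288.
blocks_in_range 0 524288 131072
```

Under those assumptions the two calls print "2 3" and "1 2 3 4", the same strings the partial-range and whole-file tests compare against.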


@ -0,0 +1,34 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
verify_runnable "global"
default_cleanup_noexit
log_pass


@ -0,0 +1,36 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023, Klara Inc.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
if ! command -v clonefile > /dev/null ; then
log_unsupported "clonefile program required to test block cloning"
fi
verify_runnable "global"
log_pass


@ -35,7 +35,7 @@
 #
 # STRATEGY:
 # 1. Create various pools with different ashift values.
-# 2. Verify 'attach -o ashift=<n>' works only with allowed values.
+# 2. Verify 'attach' works.
 #
 verify_runnable "global"
@ -65,28 +65,16 @@ log_must set_tunable32 VDEV_FILE_PHYSICAL_ASHIFT 16
 typeset ashifts=("9" "10" "11" "12" "13" "14" "15" "16")
 for ashift in ${ashifts[@]}
-do
-for cmdval in ${ashifts[@]}
 do
 log_must zpool create -o ashift=$ashift $TESTPOOL1 $disk1
 log_must verify_ashift $disk1 $ashift
+log_must zpool attach $TESTPOOL1 $disk1 $disk2
-# ashift_of(attached_disk) <= ashift_of(existing_vdev)
-if [[ $cmdval -le $ashift ]]
-then
-log_must zpool attach -o ashift=$cmdval $TESTPOOL1 \
-$disk1 $disk2
 log_must verify_ashift $disk2 $ashift
-else
-log_mustnot zpool attach -o ashift=$cmdval $TESTPOOL1 \
-$disk1 $disk2
-fi
 # clean things for the next run
 log_must zpool destroy $TESTPOOL1
 log_must zpool labelclear $disk1
 log_must zpool labelclear $disk2
 done
-done
 typeset badvals=("off" "on" "1" "8" "17" "1b" "ff" "-")
 for badval in ${badvals[@]}
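
Note on the value lists above: ashift is the base-2 logarithm of the sector size, so the accepted values 9 through 16 correspond to 512-byte through 64K sectors. That is why "1", "8" and "17" appear in `badvals`. A minimal sketch of the mapping follows. `ashift_to_bytes` is a hypothetical helper written for illustration and not part of the test suite, and the 9..16 bounds are taken from the lists in this test.

```shell
#!/bin/sh
# Hypothetical helper (not part of the test suite): convert an ashift
# value to a sector size in bytes, rejecting values outside 9..16.
ashift_to_bytes() {
    case "$1" in
        9|10|11|12|13|14|15|16)
            echo $(( 1 << $1 ))
            ;;
        *)
            echo "invalid ashift: $1" >&2
            return 1
            ;;
    esac
}

for ashift in 9 10 11 12 13 14 15 16; do
    echo "ashift=$ashift -> $(ashift_to_bytes "$ashift") bytes"
done
```

Every entry in `badvals` falls through to the error branch of this helper, including the non-numeric ones such as "off" and "1b".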


@ -35,7 +35,7 @@
 #
 # STRATEGY:
 # 1. Create various pools with different ashift values.
-# 2. Verify 'replace -o ashift=<n>' works only with allowed values.
+# 2. Verify 'replace' works.
 #
 verify_runnable "global"
@ -65,28 +65,18 @@ log_must set_tunable32 VDEV_FILE_PHYSICAL_ASHIFT 16
 typeset ashifts=("9" "10" "11" "12" "13" "14" "15" "16")
 for ashift in ${ashifts[@]}
-do
-for cmdval in ${ashifts[@]}
 do
 log_must zpool create -o ashift=$ashift $TESTPOOL1 $disk1
 log_must verify_ashift $disk1 $ashift
 # ashift_of(replacing_disk) <= ashift_of(existing_vdev)
-if [[ $cmdval -le $ashift ]]
-then
-log_must zpool replace -o ashift=$cmdval $TESTPOOL1 \
-$disk1 $disk2
+log_must zpool replace $TESTPOOL1 $disk1 $disk2
 log_must verify_ashift $disk2 $ashift
 wait_replacing $TESTPOOL1
-else
-log_mustnot zpool replace -o ashift=$cmdval $TESTPOOL1 \
-$disk1 $disk2
-fi
 # clean things for the next run
 log_must zpool destroy $TESTPOOL1
 log_must zpool labelclear $disk1
 log_must zpool labelclear $disk2
 done
-done
 typeset badvals=("off" "on" "1" "8" "17" "1b" "ff" "-")
 for badval in ${badvals[@]}


@ -34,10 +34,8 @@
 #
 # STRATEGY:
 # 1. Create a pool with default values.
-# 2. Verify 'zpool replace' uses the ashift pool property value when
-# replacing an existing device.
-# 3. Verify the default ashift value can still be overridden by manually
-# specifying '-o ashift=<n>' from the command line.
+# 2. Override the pool ashift property.
+# 3. Verify 'zpool replace' works.
 #
 verify_runnable "global"
@ -72,21 +70,9 @@ do
 do
 log_must zpool create -o ashift=$ashift $TESTPOOL1 $disk1
 log_must zpool set ashift=$pprop $TESTPOOL1
-# ashift_of(replacing_disk) <= ashift_of(existing_vdev)
-if [[ $pprop -le $ashift ]]
-then
 log_must zpool replace $TESTPOOL1 $disk1 $disk2
 wait_replacing $TESTPOOL1
 log_must verify_ashift $disk2 $ashift
-else
-# cannot replace if pool prop ashift > vdev ashift
-log_mustnot zpool replace $TESTPOOL1 $disk1 $disk2
-# verify we can override the pool prop value manually
-log_must zpool replace -o ashift=$ashift $TESTPOOL1 \
-$disk1 $disk2
-wait_replacing $TESTPOOL1
-log_must verify_ashift $disk2 $ashift
-fi
 # clean things for the next run
 log_must zpool destroy $TESTPOOL1
 log_must zpool labelclear $disk1