zfs/man/man8/zpool.8

2806 lines
83 KiB
Groff
Raw Normal View History

.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
Detect IO errors during device removal * Detect IO errors during device removal While device removal cannot verify the checksums of individual blocks during device removal, it can reasonably detect hard IO errors from the leaf vdevs. Failure to perform this error checking can result in device removal completing successfully, but moving no data which will permanently corrupt the pool. Situation 1: faulted/degraded vdevs In the configuration shown below, the removal of mirror-0 will permanently corrupt the pool. Device removal will preferentially copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'. Which in this case will result in nothing being copied since one vdev in each of those groups in unavailable. However, device removal will complete successfully since all IO errors are ignored. tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 /var/tmp/vdev1 FAULTED 0 0 0 external fault /var/tmp/vdev2 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 /var/tmp/vdev3 ONLINE 0 0 0 /var/tmp/vdev4 FAULTED 0 0 0 external fault This issue is resolved by updating the source child selection logic to exclude unreadable leaf vdevs. Additionally, unwritable destination child vdevs which can never succeed are skipped to prevent generating a large number of write IO errors. Situation 2: individual hard IO errors During removal if an unexpected hard IO error is encountered when either reading or writing the child vdev the entire removal operation is cancelled. While it may be possible to reconstruct the data after removal that cannot be guaranteed. The only strictly safe thing to do is to cancel the removal. As a future improvement we may want to instead suspend the removal process and allow the damaged region to be retried. But that work is left for another time, hard IO errors during the removal process are expected to be exceptionally rare. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #6900 Closes #8161
2018-12-04 17:37:37 +00:00
.Dd November 29, 2018
.Dt ZPOOL 8 SMM
.Os Linux
.Sh NAME
.Nm zpool
.Nd configure ZFS storage pools
.Sh SYNOPSIS
.Nm
.Fl ?
.Nm
.Cm add
.Op Fl fgLnP
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool vdev Ns ...
.Nm
.Cm attach
.Op Fl f
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool device new_device
.Nm
OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
2016-12-16 22:11:29 +00:00
.Cm checkpoint
.Op Fl d, -discard
.Ar pool
.Nm
.Cm clear
.Ar pool
.Op Ar device
.Nm
.Cm create
.Op Fl dfn
.Op Fl m Ar mountpoint
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Oo Fl o Ar feature@feature Ns = Ns Ar value Oc
.Oo Fl O Ar file-system-property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Ar pool vdev Ns ...
.Nm
.Cm destroy
.Op Fl f
.Ar pool
.Nm
.Cm detach
.Ar pool device
.Nm
.Cm events
.Op Fl vHf Oo Ar pool Oc | Fl c
.Nm
.Cm export
.Op Fl a
.Op Fl f
.Ar pool Ns ...
.Nm
.Cm get
.Op Fl Hp
.Op Fl o Ar field Ns Oo , Ns Ar field Oc Ns ...
.Sy all Ns | Ns Ar property Ns Oo , Ns Ar property Oc Ns ...
.Oo Ar pool Oc Ns ...
.Nm
.Cm history
.Op Fl il
.Oo Ar pool Oc Ns ...
.Nm
.Cm import
.Op Fl D
.Op Fl d Ar dir Ns | Ns device
.Nm
.Cm import
.Fl a
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.Op Fl DflmN
.Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
2016-12-16 22:11:29 +00:00
.Op Fl -rewind-to-checkpoint
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
.Op Fl o Ar mntopts
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Nm
.Cm import
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.Op Fl Dflm
.Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
2016-12-16 22:11:29 +00:00
.Op Fl -rewind-to-checkpoint
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
.Op Fl o Ar mntopts
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Op Fl s
.Ar pool Ns | Ns Ar id
.Op Ar newpool Oo Fl t Oc
.Nm
OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230
2018-12-19 14:54:59 +00:00
.Cm initialize
.Op Fl c | Fl s
OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230
2018-12-19 14:54:59 +00:00
.Ar pool
.Op Ar device Ns ...
.Nm
.Cm iostat
.Op Oo Oo Fl c Ar SCRIPT Oc Oo Fl lq Oc Oc Ns | Ns Fl rw
.Op Fl T Sy u Ns | Ns Sy d
.Op Fl ghHLnpPvy
.Oo Oo Ar pool Ns ... Oc Ns | Ns Oo Ar pool vdev Ns ... Oc Ns | Ns Oo Ar vdev Ns ... Oc Oc
.Op Ar interval Op Ar count
.Nm
.Cm labelclear
.Op Fl f
.Ar device
.Nm
.Cm list
.Op Fl HgLpPv
.Op Fl o Ar property Ns Oo , Ns Ar property Oc Ns ...
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Nm
.Cm offline
.Op Fl f
.Op Fl t
.Ar pool Ar device Ns ...
.Nm
.Cm online
.Op Fl e
.Ar pool Ar device Ns ...
.Nm
.Cm reguid
.Ar pool
.Nm
.Cm reopen
.Op Fl n
.Ar pool
.Nm
.Cm remove
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.Op Fl np
.Ar pool Ar device Ns ...
.Nm
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.Cm remove
.Fl s
.Ar pool
.Nm
.Cm replace
.Op Fl f
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool Ar device Op Ar new_device
.Nm
.Cm resilver
.Ar pool Ns ...
.Nm
.Cm scrub
.Op Fl s | Fl p
.Ar pool Ns ...
.Nm
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.Cm trim
.Op Fl d
.Op Fl r Ar rate
.Op Fl c | Fl s
.Ar pool
.Op Ar device Ns ...
.Nm
.Cm set
.Ar property Ns = Ns Ar value
.Ar pool
.Nm
.Cm split
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.Op Fl gLlnP
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Ar pool newpool
.Oo Ar device Oc Ns ...
.Nm
.Cm status
.Oo Fl c Ar SCRIPT Oc
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.Op Fl DigLpPstvx
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Nm
.Cm sync
.Oo Ar pool Oc Ns ...
.Nm
.Cm upgrade
.Nm
.Cm upgrade
.Fl v
.Nm
.Cm upgrade
.Op Fl V Ar version
.Fl a Ns | Ns Ar pool Ns ...
.Sh DESCRIPTION
The
.Nm
command configures ZFS storage pools.
A storage pool is a collection of devices that provides physical storage and
data replication for ZFS datasets.
All datasets within a storage pool share the same space.
See
.Xr zfs 8
for information on managing datasets.
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width Ds
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system of which it is a part.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with N disks of size X can hold X bytes and can withstand (N-1) devices
failing before data integrity is compromised.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity is striped across all disks within a raidz group.
.Pp
A raidz group can have single-, double-, or triple-parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with N disks of size X with P parity disks can hold approximately
(N-P)*X bytes and can withstand P device(s) failing before data integrity is
compromised.
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for dedup data.
The redundancy of this device should match the redundancy of the other normal
devices in the pool. If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file data.
The redundancy of this device should match the redundancy of the other normal
devices in the pool. If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line, separated by
whitespace.
The keywords
.Sy mirror
and
.Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates two root vdevs, each a mirror of two disks:
.Bd -literal
# zpool create mypool mirror sda sdb mirror sdc sdd
.Ed
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states: online, degraded,
or faulted.
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as mirror or raidz device, is
potentially impacted by the state of its associated vdevs, or component
devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path since the path was never
correct in the first place.
.El
.Pp
If a device is removed and later re-attached to the system, ZFS attempts
to put the device online automatically.
Device attach detection is hardware-dependent and might not be supported on all
platforms.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Bd -literal
# zpool create pool mirror sda sdb spare sdc sdd
.Ed
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool can not be
exported since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk. If the pools are imported on different hosts, and
both pools suffer a device failure at the same time, both could attempt to use
the spare at the same time. This may not be detected, resulting in data
corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
.Pp
Spares cannot replace log devices.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Bd -literal
# zpool create pool sda sdb log sdc
.Ed
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
.Pp
Log devices can be added, replaced, attached, detached and removed. In
addition, log devices are imported and exported as part of the pool
that contains them.
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
Mirrored devices can be removed by specifying the top-level mirror vdev.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allow much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read-workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Bd -literal
# zpool create pool sda sdb cache sdc sdd
.Ed
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is considered volatile, as is the case with
other system caches.
OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
2016-12-16 22:11:29 +00:00
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions (e.g
.Nm zfs Cm destroy
), an administrator can checkpoint the pool's state and in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, while a pool has a checkpoint certain operations are not allowed.
Specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's guid.
Adding a new vdev is supported but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Bd -literal
# zpool checkpoint pool
.Ed
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Bd -literal
# zpool export pool
# zpool import --rewind-to-checkpoint pool
.Ed
.Pp
To discard the checkpoint from a pool:
.Bd -literal
# zpool checkpoint -d pool
.Ed
.Pp
Dataset reservations (controlled by the
.Nm reservation
or
.Nm refreservation
zfs properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.Ss Special Allocation Class
The allocations in the special class are dedicated to specific block types.
By default this includes all metadata, the indirect blocks of user data, and
any dedup data. The class can also be provisioned to accept a limited
percentage of small file data blocks.
.Pp
A pool must always have at least one general (non-specified) vdev before
other devices can be assigned to the special class. If the special class
becomes full, then allocations intended for it will spill back into the
normal class.
.Pp
Dedup data can be excluded from the special class by setting the
.Sy zfs_ddt_data_is_special
zfs module parameter to false (0).
.Pp
Inclusion of small file blocks in the special class is opt-in. Each dataset
can control the size of small file blocks allowed in the special class by
setting the
.Sy special_small_blocks
dataset property. It defaults to zero, so you must opt-in by setting it to a
non-zero value. See
.Xr zfs 8
for more info on setting this property.
.Ss Properties
Each pool has several properties associated with it.
Some properties are read-only statistics while others are configurable and
change the behavior of the pool.
.Pp
The following are read-only properties:
.Bl -tag -width Ds
.It Cm allocated
Amount of storage used within the pool.
See
.Sy fragmentation
and
.Sy free
for more information.
.It Sy capacity
Percentage of pool space used.
This property can also be referred to by its shortened column name,
.Sy cap .
.It Sy expandsize
Amount of uninitialized space within the pool or device that can be used to
increase the total capacity of the pool.
Uninitialized space consists of any space on an EFI labeled vdev which has not
been brought online
.Po e.g, using
.Nm zpool Cm online Fl e
.Pc .
This space occurs when a LUN is dynamically expanded.
.It Sy fragmentation
The amount of fragmentation in the pool. As the amount of space
.Sy allocated
increases, it becomes more difficult to locate
.Sy free
space. This may result in lower write performance compared to pools with more
unfragmented free space.
.It Sy free
The amount of free space available in the pool.
By contrast, the
.Xr zfs 8
.Sy available
property describes how much new data can be written to ZFS filesystems/volumes.
The zpool
.Sy free
property is not generally useful for this purpose, and can be substantially more than the zfs
.Sy available
space. This discrepancy is due to several factors, including raidz party; zfs
reservation, quota, refreservation, and refquota properties; and space set aside by
.Sy spa_slop_shift
(see
.Xr zfs-module-parameters 5
for more information).
.It Sy freeing
After a file system or snapshot is destroyed, the space it was using is
returned to the pool asynchronously.
.Sy freeing
is the amount of space remaining to be reclaimed.
Over time
.Sy freeing
will decrease while
.Sy free
increases.
.It Sy health
The current health of the pool.
Health can be one of
.Sy ONLINE , DEGRADED , FAULTED , OFFLINE, REMOVED , UNAVAIL .
.It Sy guid
A unique identifier for the pool.
.It Sy load_guid
A unique identifier for the pool.
Unlike the
.Sy guid
property, this identifier is generated every time we load the pool (e.g. does
not persist across imports/exports) and never changes while the pool is loaded
(even if a
.Sy reguid
operation takes place).
.It Sy size
Total size of the storage pool.
.It Sy unsupported@ Ns Em feature_guid
Information about unsupported features that are enabled on the pool.
See
.Xr zpool-features 5
for details.
.El
.Pp
The space usage properties report actual physical space available to the
storage pool.
The physical space can be different from the total amount of space that any
contained datasets can actually use.
The amount of space used in a raidz configuration depends on the characteristics
of the data being written.
In addition, ZFS reserves some space for internal accounting that the
.Xr zfs 8
command takes into account, but the
.Nm
command does not.
For non-full pools of a reasonable size, these effects should be invisible.
For small pools, or pools that are close to being completely full, these
discrepancies may become more noticeable.
.Pp
The following property can be set at creation time and import time:
.Bl -tag -width Ds
.It Sy altroot
Alternate root directory.
If set, this directory is prepended to any mount points within the pool.
This can be used when examining an unknown pool where the mount points cannot be
trusted, or in an alternate boot environment, where the typical paths are not
valid.
.Sy altroot
is not a persistent property.
It is valid only while the system is up.
Setting
.Sy altroot
defaults to using
.Sy cachefile Ns = Ns Sy none ,
though this may be overridden using an explicit setting.
.El
.Pp
The following property can be set only at import time:
.Bl -tag -width Ds
.It Sy readonly Ns = Ns Sy on Ns | Ns Sy off
If set to
.Sy on ,
the pool will be imported in read-only mode.
This property can also be referred to by its shortened column name,
.Sy rdonly .
.El
.Pp
The following properties can be set at creation time and import time, and later
changed with the
.Nm zpool Cm set
command:
.Bl -tag -width Ds
.It Sy ashift Ns = Ns Sy ashift
Pool sector size exponent, to the power of
.Sy 2
(internally referred to as
.Sy ashift
). Values from 9 to 16, inclusive, are valid; also, the
value 0 (the default) means to auto-detect using the kernel's block
layer and a ZFS internal exception list. I/O operations will be aligned
to the specified size boundaries. Additionally, the minimum (disk)
write size will be set to the specified size, so this represents a
space vs. performance trade-off. For optimal performance, the pool
sector size should be greater than or equal to the sector size of the
underlying disks. The typical case for setting this property is when
performance is important and the underlying disks use 4KiB sectors but
report 512B sectors to the OS (for compatibility reasons); in that
case, set
.Sy ashift=12
(which is 1<<12 = 4096). When set, this property is
used as the default hint value in subsequent vdev operations (add,
attach and replace). Changing this value will not modify any existing
vdev, not even on disk replacement; however it can be used, for
instance, to replace a dying 512B sectors disk with a newer 4KiB
sectors device: this will probably result in bad performance but at the
same time could prevent loss of data.
.It Sy autoexpand Ns = Ns Sy on Ns | Ns Sy off
Controls automatic pool expansion when the underlying LUN is grown.
If set to
.Sy on ,
the pool will be resized according to the size of the expanded device.
If the device is part of a mirror or raidz then all devices within that
mirror/raidz group must be expanded before the new space is made available to
the pool.
The default behavior is
.Sy off .
This property can also be referred to by its shortened column name,
.Sy expand .
.It Sy autoreplace Ns = Ns Sy on Ns | Ns Sy off
Controls automatic device replacement.
If set to
.Sy off ,
device replacement must be initiated by the administrator by using the
.Nm zpool Cm replace
command.
If set to
.Sy on ,
any new device, found in the same physical location as a device that previously
belonged to the pool, is automatically formatted and replaced.
The default behavior is
.Sy off .
This property can also be referred to by its shortened column name,
.Sy replace .
Autoreplace can also be used with virtual disks (like device
mapper) provided that you use the /dev/disk/by-vdev paths setup by
vdev_id.conf. See the
.Xr vdev_id 8
man page for more details.
Autoreplace and autoonline require the ZFS Event Daemon be configured and
running. See the
.Xr zed 8
man page for more details.
.It Sy bootfs Ns = Ns Sy (unset) Ns | Ns Ar pool Ns / Ns Ar dataset
Identifies the default bootable dataset for the root pool. This property is
expected to be set mainly by the installation and upgrade programs.
Not all Linux distribution boot processes use the bootfs property.
.It Sy cachefile Ns = Ns Ar path Ns | Ns Sy none
Controls the location of where the pool configuration is cached.
Discovering all pools on system startup requires a cached copy of the
configuration data that is stored on the root file system.
All pools in this cache are automatically imported when the system boots.
Some environments, such as install and clustering, need to cache this
information in a different location so that pools are not automatically
imported.
Setting this property caches the pool configuration in a different location that
can later be imported with
.Nm zpool Cm import Fl c .
Setting it to the value
.Sy none
creates a temporary pool that is never cached, and the
.Qq
.Pq empty string
uses the default location.
.Pp
Multiple pools can share the same cache file.
Because the kernel destroys and recreates this file when pools are added and
removed, care should be taken when attempting to access this file.
When the last pool using a
.Sy cachefile
is exported or destroyed, the file will be empty.
.It Sy comment Ns = Ns Ar text
A text string consisting of printable ASCII characters that will be stored
such that it is available even if the pool becomes faulted.
An administrator can provide additional information about a pool using this
property.
.It Sy dedupditto Ns = Ns Ar number
Threshold for the number of block ditto copies.
If the reference count for a deduplicated block increases above this number, a
new ditto copy of this block is automatically stored.
The default setting is
.Sy 0
which causes no ditto copies to be created for deduplicated blocks.
The minimum legal nonzero setting is
.Sy 100 .
.It Sy delegation Ns = Ns Sy on Ns | Ns Sy off
Controls whether a non-privileged user is granted access based on the dataset
permissions defined on the dataset.
See
.Xr zfs 8
for more information on ZFS delegated administration.
.It Sy failmode Ns = Ns Sy wait Ns | Ns Sy continue Ns | Ns Sy panic
Controls the system behavior in the event of catastrophic pool failure.
This condition is typically a result of a loss of connectivity to the underlying
storage device(s) or a failure of all devices within the pool.
The behavior of such an event is determined as follows:
.Bl -tag -width "continue"
.It Sy wait
Blocks all I/O access until the device connectivity is recovered and the errors
are cleared.
This is the default behavior.
.It Sy continue
Returns
.Er EIO
to any new write I/O requests but allows reads to any of the remaining healthy
devices.
Any write requests that have yet to be committed to disk would be blocked.
.It Sy panic
Prints out a message to the console and generates a system crash dump.
.El
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.It Sy autotrim Ns = Ns Sy on Ns | Ns Sy off
When set to
.Sy on
space which has been recently freed, and is no longer allocated by the pool,
will be periodically trimmed. This allows block device vdevs which support
BLKDISCARD, such as SSDs, or file vdevs on which the underlying file system
supports hole-punching, to reclaim unused blocks. The default setting for
this property is
.Sy off .
.Pp
Automatic TRIM does not immediately reclaim blocks after a free. Instead,
it will optimistically delay allowing smaller ranges to be aggregated in to
a few larger ones. These can then be issued more efficiently to the storage.
.Pp
Be aware that automatic trimming of recently freed data blocks can put
significant stress on the underlying storage devices. This will vary
depending of how well the specific device handles these commands. For
lower end devices it is often possible to achieve most of the benefits
of automatic trimming by running an on-demand (manual) TRIM periodically
using the
.Nm zpool Cm trim
command.
.It Sy feature@ Ns Ar feature_name Ns = Ns Sy enabled
The value of this property is the current state of
.Ar feature_name .
The only valid value when setting this property is
.Sy enabled
which moves
.Ar feature_name
to the enabled state.
See
.Xr zpool-features 5
for details on feature states.
.It Sy listsnapshots Ns = Ns Sy on Ns | Ns Sy off
Controls whether information about snapshots associated with this pool is
output when
.Nm zfs Cm list
is run without the
.Fl t
option.
The default value is
.Sy off .
This property can also be referred to by its shortened name,
.Sy listsnaps .
Multi-modifier protection (MMP) Add multihost=on|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #745 Closes #6279
2017-07-08 03:20:35 +00:00
.It Sy multihost Ns = Ns Sy on Ns | Ns Sy off
Controls whether a pool activity check should be performed during
.Nm zpool Cm import .
When a pool is determined to be active it cannot be imported, even with the
.Fl f
option. This property is intended to be used in failover configurations
where multiple hosts have access to a pool on shared storage.
.Pp
Multihost provides protection on import only. It does not protect against an
individual device being used in multiple pools, regardless of the type of vdev.
See the discussion under
.Sy zpool create.
.Pp
When this property is on, periodic writes to storage occur to show the pool is
in use. See
Multi-modifier protection (MMP) Add multihost=on|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #745 Closes #6279
2017-07-08 03:20:35 +00:00
.Sy zfs_multihost_interval
in the
.Xr zfs-module-parameters 5
man page. In order to enable this property each host must set a unique hostid.
See
.Xr genhostid 1
.Xr zgenhostid 8
.Xr spl-module-parameters 5
Multi-modifier protection (MMP) Add multihost=on|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #745 Closes #6279
2017-07-08 03:20:35 +00:00
for additional details. The default value is
.Sy off .
.It Sy version Ns = Ns Ar version
The current on-disk version of the pool.
This can be increased, but never decreased.
The preferred method of updating pools is with the
.Nm zpool Cm upgrade
command, though this property can be used when a specific version is needed for
backwards compatibility.
Once feature flags are enabled on a pool this property will no longer have a
value.
.El
.Ss Subcommands
All subcommands that modify state are logged persistently to the pool in their
original form.
.Pp
The
.Nm
command provides subcommands to create and destroy storage pools, add capacity
to storage pools, and provide information about the storage pools.
The following subcommands are supported:
.Bl -tag -width Ds
.It Xo
.Nm
.Fl ?
.Xc
Displays a help message.
.It Xo
.Nm
.Cm add
.Op Fl fgLnP
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool vdev Ns ...
.Xc
Adds the specified virtual devices to the given pool.
The
.Ar vdev
specification is described in the
.Sx Virtual Devices
section.
The behavior of the
.Fl f
option, and the device checks performed are described in the
.Nm zpool Cm create
subcommand.
.Bl -tag -width Ds
.It Fl f
Forces use of
.Ar vdev Ns s ,
even if they appear in use or specify a conflicting replication level.
Not all devices can be overridden in this manner.
.It Fl g
Display
.Ar vdev ,
GUIDs instead of the normal device names. These GUIDs can be used in place of
device names for the zpool detach/offline/remove/replace commands.
.It Fl L
Display real paths for
.Ar vdev Ns s
resolving all symbolic links. This can be used to look up the current block
device name regardless of the /dev/disk/ path used to open it.
.It Fl n
Displays the configuration that would be used without actually adding the
.Ar vdev Ns s .
The actual pool creation can still fail due to insufficient privileges or
device sharing.
.It Fl P
Display real paths for
.Ar vdev Ns s
instead of only the last component of the path. This can be used in
conjunction with the
.Fl L
flag.
.It Fl o Ar property Ns = Ns Ar value
Sets the given pool properties. See the
.Sx Properties
section for a list of valid properties that can be set. The only property
supported at the moment is ashift.
.El
.It Xo
.Nm
.Cm attach
.Op Fl f
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool device new_device
.Xc
Attaches
.Ar new_device
to the existing
.Ar device .
The existing device cannot be part of a raidz configuration.
If
.Ar device
is not currently part of a mirrored configuration,
.Ar device
automatically transforms into a two-way mirror of
.Ar device
and
.Ar new_device .
If
.Ar device
is part of a two-way mirror, attaching
.Ar new_device
creates a three-way mirror, and so on.
In either case,
.Ar new_device
begins to resilver immediately.
.Bl -tag -width Ds
.It Fl f
Forces use of
.Ar new_device ,
even if it appears to be in use.
Not all devices can be overridden in this manner.
.It Fl o Ar property Ns = Ns Ar value
Sets the given pool properties. See the
.Sx Properties
section for a list of valid properties that can be set. The only property
supported at the moment is ashift.
.El
.It Xo
.Nm
OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
2016-12-16 22:11:29 +00:00
.Cm checkpoint
.Op Fl d, -discard
.Ar pool
.Xc
Checkpoints the current state of
.Ar pool
, which can be later restored by
.Nm zpool Cm import --rewind-to-checkpoint .
The existence of a checkpoint in a pool prohibits the following
.Nm zpool
commands:
.Cm remove ,
.Cm attach ,
.Cm detach ,
.Cm split ,
and
.Cm reguid .
In addition, it may break reservation boundaries if the pool lacks free
space.
The
.Nm zpool Cm status
command indicates the existence of a checkpoint or the progress of discarding a
checkpoint from a pool.
The
.Nm zpool Cm list
command reports how much space the checkpoint takes from the pool.
.Bl -tag -width Ds
.It Fl d, -discard
Discards an existing checkpoint from
.Ar pool .
.El
.It Xo
.Nm
.Cm clear
.Ar pool
.Op Ar device
.Xc
Clears device errors in a pool.
If no arguments are specified, all device errors within the pool are cleared.
If one or more devices is specified, only those errors associated with the
specified device or devices are cleared.
If multihost is enabled, and the pool has been suspended, this will not
resume I/O. While the pool was suspended, it may have been imported on
another host, and resuming I/O could result in pool damage.
.It Xo
.Nm
.Cm create
.Op Fl dfn
.Op Fl m Ar mountpoint
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Oo Fl o Ar feature@feature Ns = Ns Ar value Oc Ns ...
.Oo Fl O Ar file-system-property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Op Fl t Ar tname
.Ar pool vdev Ns ...
.Xc
Creates a new storage pool containing the virtual devices specified on the
command line.
The pool name must begin with a letter, and can only contain
alphanumeric characters as well as underscore
.Pq Qq Sy _ ,
dash
.Pq Qq Sy \&- ,
colon
.Pq Qq Sy \&: ,
space
.Pq Qq Sy \&\ ,
and period
.Pq Qq Sy \&. .
The pool names
.Sy mirror ,
.Sy raidz ,
.Sy spare
and
.Sy log
are reserved, as are names beginning with
.Sy mirror ,
.Sy raidz ,
.Sy spare ,
and the pattern
.Sy c[0-9] .
The
.Ar vdev
specification is described in the
.Sx Virtual Devices
section.
.Pp
The command attempts to verify that each device specified is accessible and not
currently in use by another subsystem. However this check is not robust enough
to detect simultaneous attempts to use a new device in different pools, even if
.Sy multihost
is
.Sy enabled.
The
administrator must ensure that simultaneous invocations of any combination of
.Sy zpool replace ,
.Sy zpool create ,
.Sy zpool add ,
or
.Sy zpool labelclear ,
do not refer to the same device. Using the same device in two pools will
result in pool corruption.
.Pp
There are some uses, such as being currently mounted, or specified as the
dedicated dump device, that prevents a device from ever being used by ZFS.
Other uses, such as having a preexisting UFS file system, can be overridden with
the
.Fl f
option.
.Pp
The command also checks that the replication strategy for the pool is
consistent.
An attempt to combine redundant and non-redundant storage in a single pool, or
to mix disks and files, results in an error unless
.Fl f
is specified.
The use of differently sized devices within a single raidz or mirror group is
also flagged as an error unless
.Fl f
is specified.
.Pp
Unless the
.Fl R
option is specified, the default mount point is
.Pa / Ns Ar pool .
The mount point must not exist or must be empty, or else the root dataset
cannot be mounted.
This can be overridden with the
.Fl m
option.
.Pp
By default all supported features are enabled on the new pool unless the
.Fl d
option is specified.
.Bl -tag -width Ds
.It Fl d
Do not enable any features on the new pool.
Individual features can be enabled by setting their corresponding properties to
.Sy enabled
with the
.Fl o
option.
See
.Xr zpool-features 5
for details about feature properties.
.It Fl f
Forces use of
.Ar vdev Ns s ,
even if they appear in use or specify a conflicting replication level.
Not all devices can be overridden in this manner.
.It Fl m Ar mountpoint
Sets the mount point for the root dataset.
The default mount point is
.Pa /pool
or
.Pa altroot/pool
if
.Ar altroot
is specified.
The mount point must be an absolute path,
.Sy legacy ,
or
.Sy none .
For more information on dataset mount points, see
.Xr zfs 8 .
.It Fl n
Displays the configuration that would be used without actually creating the
pool.
The actual pool creation can still fail due to insufficient privileges or
device sharing.
.It Fl o Ar property Ns = Ns Ar value
Sets the given pool properties.
See the
.Sx Properties
section for a list of valid properties that can be set.
.It Fl o Ar feature@feature Ns = Ns Ar value
Sets the given pool feature. See the
.Xr zpool-features 5
section for a list of valid features that can be set.
Value can be either disabled or enabled.
.It Fl O Ar file-system-property Ns = Ns Ar value
Sets the given file system properties in the root file system of the pool.
See the
.Sx Properties
section of
.Xr zfs 8
for a list of valid properties that can be set.
.It Fl R Ar root
Equivalent to
.Fl o Sy cachefile Ns = Ns Sy none Fl o Sy altroot Ns = Ns Ar root
.It Fl t Ar tname
Sets the in-core pool name to
.Sy tname
while the on-disk name will be the name specified as the pool name
.Sy pool .
This will set the default cachefile property to none. This is intended
to handle name space collisions when creating pools for other systems,
such as virtual machines or physical machines whose pools live on network
block devices.
.El
.It Xo
.Nm
.Cm destroy
.Op Fl f
.Ar pool
.Xc
Destroys the given pool, freeing up any devices for other use.
This command tries to unmount any active datasets before destroying the pool.
.Bl -tag -width Ds
.It Fl f
Forces any active datasets contained within the pool to be unmounted.
.El
.It Xo
.Nm
.Cm detach
.Ar pool device
.Xc
Detaches
.Ar device
from a mirror.
The operation is refused if there are no other valid replicas of the data.
If device may be re-added to the pool later on then consider the
.Sy zpool offline
command instead.
.It Xo
.Nm
.Cm events
.Op Fl vHf Oo Ar pool Oc | Fl c
.Xc
Lists all recent events generated by the ZFS kernel modules. These events
are consumed by the
.Xr zed 8
and used to automate administrative tasks such as replacing a failed device
with a hot spare. For more information about the subclasses and event payloads
that can be generated see the
.Xr zfs-events 5
man page.
.Bl -tag -width Ds
.It Fl c
Clear all previous events.
.It Fl f
Follow mode.
.It Fl H
Scripted mode. Do not display headers, and separate fields by a
single tab instead of arbitrary space.
.It Fl v
Print the entire payload for each event.
.El
.It Xo
.Nm
.Cm export
.Op Fl a
.Op Fl f
.Ar pool Ns ...
.Xc
Exports the given pools from the system.
All devices are marked as exported, but are still considered in use by other
subsystems.
The devices can be moved between systems
.Pq even those of different endianness
and imported as long as a sufficient number of devices are present.
.Pp
Before exporting the pool, all datasets within the pool are unmounted.
A pool can not be exported if it has a shared spare that is currently being
used.
.Pp
For pools to be portable, you must give the
.Nm
command whole disks, not just partitions, so that ZFS can label the disks with
portable EFI labels.
Otherwise, disk drivers on platforms of different endianness will not recognize
the disks.
.Bl -tag -width Ds
.It Fl a
Exports all pools imported on the system.
.It Fl f
Forcefully unmount all datasets, using the
.Nm unmount Fl f
command.
.Pp
This command will forcefully export the pool even if it has a shared spare that
is currently being used.
This may lead to potential data corruption.
.El
.It Xo
.Nm
.Cm get
.Op Fl Hp
.Op Fl o Ar field Ns Oo , Ns Ar field Oc Ns ...
.Sy all Ns | Ns Ar property Ns Oo , Ns Ar property Oc Ns ...
.Oo Ar pool Oc Ns ...
.Xc
Retrieves the given list of properties
.Po
or all properties if
.Sy all
is used
.Pc
for the specified storage pool(s).
These properties are displayed with the following fields:
.Bd -literal
name Name of storage pool
property Property name
value Property value
source Property source, either 'default' or 'local'.
.Ed
.Pp
See the
.Sx Properties
section for more information on the available pool properties.
.Bl -tag -width Ds
.It Fl H
Scripted mode.
Do not display headers, and separate fields by a single tab instead of arbitrary
space.
.It Fl o Ar field
A comma-separated list of columns to display.
.Sy name Ns \&, Ns Sy property Ns \&, Ns Sy value Ns \&, Ns Sy source
is the default value.
.It Fl p
Display numbers in parsable (exact) values.
.El
.It Xo
.Nm
.Cm history
.Op Fl il
.Oo Ar pool Oc Ns ...
.Xc
Displays the command history of the specified pool(s) or all pools if no pool is
specified.
.Bl -tag -width Ds
.It Fl i
Displays internally logged ZFS events in addition to user initiated events.
.It Fl l
Displays log records in long format, which in addition to standard format
includes, the user name, the hostname, and the zone in which the operation was
performed.
.El
.It Xo
.Nm
.Cm import
.Op Fl D
.Op Fl d Ar dir Ns | Ns device
.Xc
Lists pools available to import.
If the
.Fl d
option is not specified, this command searches for devices in
.Pa /dev .
The
.Fl d
option can be specified multiple times, and all directories are searched.
If the device appears to be part of an exported pool, this command displays a
summary of the pool with the name of the pool, a numeric identifier, as well as
the vdev layout and current health of the device for each device or file.
Destroyed pools, pools that were previously destroyed with the
.Nm zpool Cm destroy
command, are not listed unless the
.Fl D
option is specified.
.Pp
The numeric identifier is unique, and can be used instead of the pool name when
multiple exported pools of the same name are available.
.Bl -tag -width Ds
.It Fl c Ar cachefile
Reads configuration from the given
.Ar cachefile
that was created with the
.Sy cachefile
pool property.
This
.Ar cachefile
is used instead of searching for devices.
.It Fl d Ar dir Ns | Ns Ar device
Uses
.Ar device
or searches for devices or files in
.Ar dir .
The
.Fl d
option can be specified multiple times.
.It Fl D
Lists destroyed pools only.
.El
.It Xo
.Nm
.Cm import
.Fl a
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.Op Fl DflmN
.Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
.Op Fl o Ar mntopts
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Op Fl s
.Xc
Imports all pools found in the search directories.
Identical to the previous command, except that all pools with a sufficient
number of devices available are imported.
Destroyed pools, pools that were previously destroyed with the
.Nm zpool Cm destroy
command, will not be imported unless the
.Fl D
option is specified.
.Bl -tag -width Ds
.It Fl a
Searches for and imports all pools found.
.It Fl c Ar cachefile
Reads configuration from the given
.Ar cachefile
that was created with the
.Sy cachefile
pool property.
This
.Ar cachefile
is used instead of searching for devices.
.It Fl d Ar dir Ns | Ns Ar device
Uses
.Ar device
or searches for devices or files in
.Ar dir .
The
.Fl d
option can be specified multiple times.
This option is incompatible with the
.Fl c
option.
.It Fl D
Imports destroyed pools only.
The
.Fl f
option is also required.
.It Fl f
Forces import, even if the pool appears to be potentially active.
.It Fl F
Recovery mode for a non-importable pool.
Attempt to return the pool to an importable state by discarding the last few
transactions.
Not all damaged pools can be recovered by using this option.
If successful, the data from the discarded transactions is irretrievably lost.
This option is ignored if the pool is importable or already imported.
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.It Fl l
Indicates that this command will request encryption keys for all encrypted
datasets it attempts to mount as it is bringing the pool online. Note that if
any datasets have a
.Sy keylocation
of
.Sy prompt
this command will block waiting for the keys to be entered. Without this flag
encrypted datasets will be left unavailable until the keys are loaded.
.It Fl m
Allows a pool to import when there is a missing log device.
Recent transactions can be lost because the log device will be discarded.
.It Fl n
Used with the
.Fl F
recovery option.
Determines whether a non-importable pool can be made importable again, but does
not actually perform the pool recovery.
For more details about pool recovery mode, see the
.Fl F
option, above.
.It Fl N
Import the pool without mounting any file systems.
.It Fl o Ar mntopts
Comma-separated list of mount options to use when mounting datasets within the
pool.
See
.Xr zfs 8
for a description of dataset properties and mount options.
.It Fl o Ar property Ns = Ns Ar value
Sets the specified property on the imported pool.
See the
.Sx Properties
section for more information on the available pool properties.
.It Fl R Ar root
Sets the
.Sy cachefile
property to
.Sy none
and the
.Sy altroot
property to
.Ar root .
OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
2016-12-16 22:11:29 +00:00
.It Fl -rewind-to-checkpoint
Rewinds pool to the checkpointed state.
Once the pool is imported with this flag there is no way to undo the rewind.
All changes and data that were written after the checkpoint are lost!
The only exception is when the
.Sy readonly
mounting option is enabled.
In this case, the checkpointed state of the pool is opened and an
administrator can see how the pool would look like if they were
to fully rewind.
.It Fl s
Scan using the default search path, the libblkid cache will not be
consulted. A custom search path may be specified by setting the
ZPOOL_IMPORT_PATH environment variable.
.It Fl X
Used with the
.Fl F
recovery option. Determines whether extreme
measures to find a valid txg should take place. This allows the pool to
be rolled back to a txg which is no longer guaranteed to be consistent.
Pools imported at an inconsistent txg may contain uncorrectable
checksum errors. For more details about pool recovery mode, see the
.Fl F
option, above. WARNING: This option can be extremely hazardous to the
health of your pool and should only be used as a last resort.
.It Fl T
Specify the txg to use for rollback. Implies
.Fl FX .
For more details
about pool recovery mode, see the
.Fl X
option, above. WARNING: This option can be extremely hazardous to the
health of your pool and should only be used as a last resort.
.El
.It Xo
.Nm
.Cm import
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.Op Fl Dflm
.Op Fl F Oo Fl n Oc Oo Fl t Oc Oo Fl T Oc Oo Fl X Oc
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
.Op Fl o Ar mntopts
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Op Fl s
.Ar pool Ns | Ns Ar id
.Op Ar newpool
.Xc
Imports a specific pool.
A pool can be identified by its name or the numeric identifier.
If
.Ar newpool
is specified, the pool is imported using the name
.Ar newpool .
Otherwise, it is imported with the same name as its exported name.
.Pp
If a device is removed from a system without running
.Nm zpool Cm export
first, the device appears as potentially active.
It cannot be determined if this was a failed export, or whether the device is
really in use from another host.
To import a pool in this state, the
.Fl f
option is required.
.Bl -tag -width Ds
.It Fl c Ar cachefile
Reads configuration from the given
.Ar cachefile
that was created with the
.Sy cachefile
pool property.
This
.Ar cachefile
is used instead of searching for devices.
.It Fl d Ar dir Ns | Ns Ar device
Uses
.Ar device
or searches for devices or files in
.Ar dir .
The
.Fl d
option can be specified multiple times.
This option is incompatible with the
.Fl c
option.
.It Fl D
Imports destroyed pool.
The
.Fl f
option is also required.
.It Fl f
Forces import, even if the pool appears to be potentially active.
.It Fl F
Recovery mode for a non-importable pool.
Attempt to return the pool to an importable state by discarding the last few
transactions.
Not all damaged pools can be recovered by using this option.
If successful, the data from the discarded transactions is irretrievably lost.
This option is ignored if the pool is importable or already imported.
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.It Fl l
Indicates that this command will request encryption keys for all encrypted
datasets it attempts to mount as it is bringing the pool online. Note that if
any datasets have a
.Sy keylocation
of
.Sy prompt
this command will block waiting for the keys to be entered. Without this flag
encrypted datasets will be left unavailable until the keys are loaded.
.It Fl m
Allows a pool to import when there is a missing log device.
Recent transactions can be lost because the log device will be discarded.
.It Fl n
Used with the
.Fl F
recovery option.
Determines whether a non-importable pool can be made importable again, but does
not actually perform the pool recovery.
For more details about pool recovery mode, see the
.Fl F
option, above.
.It Fl o Ar mntopts
Comma-separated list of mount options to use when mounting datasets within the
pool.
See
.Xr zfs 8
for a description of dataset properties and mount options.
.It Fl o Ar property Ns = Ns Ar value
Sets the specified property on the imported pool.
See the
.Sx Properties
section for more information on the available pool properties.
.It Fl R Ar root
Sets the
.Sy cachefile
property to
.Sy none
and the
.Sy altroot
property to
.Ar root .
.It Fl s
Scan using the default search path, the libblkid cache will not be
consulted. A custom search path may be specified by setting the
ZPOOL_IMPORT_PATH environment variable.
.It Fl X
Used with the
.Fl F
recovery option. Determines whether extreme
measures to find a valid txg should take place. This allows the pool to
be rolled back to a txg which is no longer guaranteed to be consistent.
Pools imported at an inconsistent txg may contain uncorrectable
checksum errors. For more details about pool recovery mode, see the
.Fl F
option, above. WARNING: This option can be extremely hazardous to the
health of your pool and should only be used as a last resort.
.It Fl T
Specify the txg to use for rollback. Implies
.Fl FX .
For more details
about pool recovery mode, see the
.Fl X
option, above. WARNING: This option can be extremely hazardous to the
health of your pool and should only be used as a last resort.
.It Fl t
Used with
.Sy newpool .
Specifies that
.Sy newpool
is temporary. Temporary pool names last until export. Ensures that
the original pool name will be used in all label updates and therefore
is retained upon export.
Will also set -o cachefile=none when not explicitly specified.
.El
.It Xo
.Nm
OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230
2018-12-19 14:54:59 +00:00
.Cm initialize
.Op Fl c | Fl s
OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230
2018-12-19 14:54:59 +00:00
.Ar pool
.Op Ar device Ns ...
.Xc
Begins initializing by writing to all unallocated regions on the specified
devices, or all eligible devices in the pool if no individual devices are
specified.
Only leaf data or log devices may be initialized.
.Bl -tag -width Ds
.It Fl c, -cancel
Cancel initializing on the specified devices, or all eligible devices if none
are specified.
If one or more target devices are invalid or are not currently being
initialized, the command will fail and no cancellation will occur on any device.
.It Fl s -suspend
Suspend initializing on the specified devices, or all eligible devices if none
are specified.
If one or more target devices are invalid or are not currently being
initialized, the command will fail and no suspension will occur on any device.
Initializing can then be resumed by running
.Nm zpool Cm initialize
with no flags on the relevant target devices.
.El
.It Xo
.Nm
.Cm iostat
.Op Oo Oo Fl c Ar SCRIPT Oc Oo Fl lq Oc Oc Ns | Ns Fl rw
.Op Fl T Sy u Ns | Ns Sy d
.Op Fl ghHLnpPvy
.Oo Oo Ar pool Ns ... Oc Ns | Ns Oo Ar pool vdev Ns ... Oc Ns | Ns Oo Ar vdev Ns ... Oc Oc
.Op Ar interval Op Ar count
.Xc
Displays logical I/O statistics for the given pools/vdevs. Physical I/Os may
be observed via
.Xr iostat 1 .
If writes are located nearby, they may be merged into a single
larger operation. Additional I/O may be generated depending on the level of
vdev redundancy.
To filter output, you may pass in a list of pools, a pool and list of vdevs
in that pool, or a list of any vdevs from any pool. If no items are specified,
statistics for every pool in the system are shown.
When given an
.Ar interval ,
the statistics are printed every
.Ar interval
seconds until ^C is pressed. If
.Fl n
flag is specified the headers are displayed only once, otherwise they are
displayed periodically. If count is specified, the command exits
after count reports are printed. The first report printed is always
the statistics since boot regardless of whether
.Ar interval
and
.Ar count
are passed. However, this behavior can be suppressed with the
.Fl y
flag. Also note that the units of
.Sy K ,
.Sy M ,
.Sy G ...
that are printed in the report are in base 1024. To get the raw
values, use the
.Fl p
flag.
.Bl -tag -width Ds
.It Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ...
Run a script (or scripts) on each vdev and include the output as a new column
in the
.Nm zpool Cm iostat
output. Users can run any script found in their
.Pa ~/.zpool.d
directory or from the system
.Pa /etc/zfs/zpool.d
directory. Script names containing the slash (/) character are not allowed.
The default search path can be overridden by setting the
ZPOOL_SCRIPTS_PATH environment variable. A privileged user can run
.Fl c
if they have the ZPOOL_SCRIPTS_AS_ROOT
environment variable set. If a script requires the use of a privileged
command, like
.Xr smartctl 8 ,
then it's recommended you allow the user access to it in
.Pa /etc/sudoers
or add the user to the
.Pa /etc/sudoers.d/zfs
file.
.Pp
If
.Fl c
is passed without a script name, it prints a list of all scripts.
.Fl c
also sets verbose mode
.No \&( Ns Fl v Ns No \&).
.Pp
Script output should be in the form of "name=value". The column name is
set to "name" and the value is set to "value". Multiple lines can be
used to output multiple columns. The first line of output not in the
"name=value" format is displayed without a column title, and no more
output after that is displayed. This can be useful for printing error
messages. Blank or NULL values are printed as a '-' to make output
awk-able.
.Pp
2017-04-21 16:27:04 +00:00
The following environment variables are set before running each script:
.Bl -tag -width "VDEV_PATH"
.It Sy VDEV_PATH
Full path to the vdev
.El
.Bl -tag -width "VDEV_UPATH"
.It Sy VDEV_UPATH
Underlying path to the vdev (/dev/sd*). For use with device mapper,
multipath, or partitioned vdevs.
.El
.Bl -tag -width "VDEV_ENC_SYSFS_PATH"
.It Sy VDEV_ENC_SYSFS_PATH
The sysfs path to the enclosure for the vdev (if any).
.El
.It Fl T Sy u Ns | Ns Sy d
Display a time stamp.
Specify
.Sy u
for a printed representation of the internal representation of time.
See
.Xr time 2 .
Specify
.Sy d
for standard date format.
See
.Xr date 1 .
.It Fl g
Display vdev GUIDs instead of the normal device names. These GUIDs
can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl H
Scripted mode. Do not display headers, and separate fields by a
single tab instead of arbitrary space.
.It Fl L
Display real paths for vdevs resolving all symbolic links. This can
be used to look up the current block device name regardless of the
.Pa /dev/disk/
path used to open it.
.It Fl n
Print headers only once when passed
.It Fl p
Display numbers in parsable (exact) values. Time values are in
nanoseconds.
.It Fl P
Display full paths for vdevs instead of only the last component of
the path. This can be used in conjunction with the
.Fl L
flag.
.It Fl r
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
Print request size histograms for the leaf vdev's IO. This includes
histograms of individual IOs (ind) and aggregate IOs (agg). These stats
can be useful for observing how well IO aggregation is working. Note
that TRIM IOs may exceed 16M, but will be counted as 16M.
.It Fl v
Verbose statistics Reports usage statistics for individual vdevs within the
pool, in addition to the pool-wide statistics.
.It Fl y
Omit statistics since boot.
Normally the first line of output reports the statistics since boot.
This option suppresses that first line of output.
.Ar interval
.It Fl w
Display latency histograms:
.Pp
.Ar total_wait :
Total IO time (queuing + disk IO time).
.Ar disk_wait :
Disk IO time (time reading/writing the disk).
.Ar syncq_wait :
Amount of time IO spent in synchronous priority queues. Does not include
disk time.
.Ar asyncq_wait :
Amount of time IO spent in asynchronous priority queues. Does not include
disk time.
.Ar scrub :
Amount of time IO spent in scrub queue. Does not include disk time.
.It Fl l
Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues Update the zfs module to collect statistics on average latencies, queue sizes, and keep an internal histogram of all IO latencies. Along with this, update "zpool iostat" with some new options to print out the stats: -l: Include average IO latencies stats: total_wait disk_wait syncq_wait asyncq_wait scrub read write read write read write read write wait ----- ----- ----- ----- ----- ----- ----- ----- ----- - 41ms - 2ms - 46ms - 4ms - - 5ms - 1ms - 1us - 4ms - - 5ms - 1ms - 1us - 4ms - - - - - - - - - - - 49ms - 2ms - 47ms - - - - - - - - - - - - - 2ms - 1ms - - - 1ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- 1ms 1ms 1ms 413us 16us 25us - 5ms - 1ms 1ms 1ms 413us 16us 25us - 5ms - 2ms 1ms 2ms 412us 26us 25us - 5ms - - 1ms - 413us - 25us - 5ms - - 1ms - 460us - 29us - 5ms - 196us 1ms 196us 370us 7us 23us - 5ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- -w: Print out latency histograms: sdb total disk sync_queue async_queue latency read write read write read write read write scrub ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ 1ns 0 0 0 0 0 0 0 0 0 ... 33us 0 0 0 0 0 0 0 0 0 66us 0 0 107 2486 2 788 12 12 0 131us 2 797 359 4499 10 558 184 184 6 262us 22 801 264 1563 10 286 287 287 24 524us 87 575 71 52086 15 1063 136 136 92 1ms 152 1190 5 41292 4 1693 252 252 141 2ms 245 2018 0 50007 0 2322 371 371 220 4ms 189 7455 22 162957 0 3912 6726 6726 199 8ms 108 9461 0 102320 0 5775 2526 2526 86 17ms 23 11287 0 37142 0 8043 1813 1813 19 34ms 0 14725 0 24015 0 11732 3071 3071 0 67ms 0 23597 0 7914 0 18113 5025 5025 0 134ms 0 33798 0 254 0 25755 7326 7326 0 268ms 0 51780 0 12 0 41593 10002 10002 0 537ms 0 77808 0 0 0 64255 13120 13120 0 1s 0 105281 0 0 0 83805 20841 20841 0 2s 0 88248 0 0 0 73772 14006 14006 0 4s 0 47266 0 0 0 29783 17176 17176 0 9s 0 10460 0 0 0 4130 6295 6295 0 17s 0 0 0 0 0 0 0 0 0 34s 0 0 0 0 0 0 0 0 0 69s 0 0 0 0 0 0 0 0 0 137s 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------- -h: Help -H: Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. -q: Include current number of entries in sync & async read/write queues, and scrub queue: syncq_read syncq_write asyncq_read asyncq_write scrubq_read pend activ pend activ pend activ pend activ pend activ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 0 0 78 29 0 0 0 0 0 0 0 0 78 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 227 394 0 19 0 0 0 0 0 0 227 394 0 19 0 0 0 0 0 0 108 98 0 19 0 0 0 0 0 0 19 98 0 0 0 0 0 0 0 0 78 98 0 0 0 0 0 0 0 0 19 88 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -p: Display numbers in parseable (exact) values. Also, update iostat syntax to allow the user to specify specific vdevs to show statistics for. The three options for choosing pools/vdevs are: Display a list of pools: zpool iostat ... [pool ...] Display a list of vdevs from a specific pool: zpool iostat ... [pool vdev ...] Display a list of vdevs from any pools: zpool iostat ... [vdev ...] Lastly, allow zpool command "interval" value to be floating point: zpool iostat -v 0.5 Signed-off-by: Tony Hutter <hutter2@llnl.gov Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4433
2016-02-29 18:05:23 +00:00
Include average latency statistics:
.Pp
.Ar total_wait :
Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues Update the zfs module to collect statistics on average latencies, queue sizes, and keep an internal histogram of all IO latencies. Along with this, update "zpool iostat" with some new options to print out the stats: -l: Include average IO latencies stats: total_wait disk_wait syncq_wait asyncq_wait scrub read write read write read write read write wait ----- ----- ----- ----- ----- ----- ----- ----- ----- - 41ms - 2ms - 46ms - 4ms - - 5ms - 1ms - 1us - 4ms - - 5ms - 1ms - 1us - 4ms - - - - - - - - - - - 49ms - 2ms - 47ms - - - - - - - - - - - - - 2ms - 1ms - - - 1ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- 1ms 1ms 1ms 413us 16us 25us - 5ms - 1ms 1ms 1ms 413us 16us 25us - 5ms - 2ms 1ms 2ms 412us 26us 25us - 5ms - - 1ms - 413us - 25us - 5ms - - 1ms - 460us - 29us - 5ms - 196us 1ms 196us 370us 7us 23us - 5ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- -w: Print out latency histograms: sdb total disk sync_queue async_queue latency read write read write read write read write scrub ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ 1ns 0 0 0 0 0 0 0 0 0 ... 33us 0 0 0 0 0 0 0 0 0 66us 0 0 107 2486 2 788 12 12 0 131us 2 797 359 4499 10 558 184 184 6 262us 22 801 264 1563 10 286 287 287 24 524us 87 575 71 52086 15 1063 136 136 92 1ms 152 1190 5 41292 4 1693 252 252 141 2ms 245 2018 0 50007 0 2322 371 371 220 4ms 189 7455 22 162957 0 3912 6726 6726 199 8ms 108 9461 0 102320 0 5775 2526 2526 86 17ms 23 11287 0 37142 0 8043 1813 1813 19 34ms 0 14725 0 24015 0 11732 3071 3071 0 67ms 0 23597 0 7914 0 18113 5025 5025 0 134ms 0 33798 0 254 0 25755 7326 7326 0 268ms 0 51780 0 12 0 41593 10002 10002 0 537ms 0 77808 0 0 0 64255 13120 13120 0 1s 0 105281 0 0 0 83805 20841 20841 0 2s 0 88248 0 0 0 73772 14006 14006 0 4s 0 47266 0 0 0 29783 17176 17176 0 9s 0 10460 0 0 0 4130 6295 6295 0 17s 0 0 0 0 0 0 0 0 0 34s 0 0 0 0 0 0 0 0 0 69s 0 0 0 0 0 0 0 0 0 137s 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------- -h: Help -H: Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. -q: Include current number of entries in sync & async read/write queues, and scrub queue: syncq_read syncq_write asyncq_read asyncq_write scrubq_read pend activ pend activ pend activ pend activ pend activ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 0 0 78 29 0 0 0 0 0 0 0 0 78 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 227 394 0 19 0 0 0 0 0 0 227 394 0 19 0 0 0 0 0 0 108 98 0 19 0 0 0 0 0 0 19 98 0 0 0 0 0 0 0 0 78 98 0 0 0 0 0 0 0 0 19 88 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -p: Display numbers in parseable (exact) values. Also, update iostat syntax to allow the user to specify specific vdevs to show statistics for. The three options for choosing pools/vdevs are: Display a list of pools: zpool iostat ... [pool ...] Display a list of vdevs from a specific pool: zpool iostat ... [pool vdev ...] Display a list of vdevs from any pools: zpool iostat ... [vdev ...] Lastly, allow zpool command "interval" value to be floating point: zpool iostat -v 0.5 Signed-off-by: Tony Hutter <hutter2@llnl.gov Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4433
2016-02-29 18:05:23 +00:00
Average total IO time (queuing + disk IO time).
.Ar disk_wait :
Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues Update the zfs module to collect statistics on average latencies, queue sizes, and keep an internal histogram of all IO latencies. Along with this, update "zpool iostat" with some new options to print out the stats: -l: Include average IO latencies stats: total_wait disk_wait syncq_wait asyncq_wait scrub read write read write read write read write wait ----- ----- ----- ----- ----- ----- ----- ----- ----- - 41ms - 2ms - 46ms - 4ms - - 5ms - 1ms - 1us - 4ms - - 5ms - 1ms - 1us - 4ms - - - - - - - - - - - 49ms - 2ms - 47ms - - - - - - - - - - - - - 2ms - 1ms - - - 1ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- 1ms 1ms 1ms 413us 16us 25us - 5ms - 1ms 1ms 1ms 413us 16us 25us - 5ms - 2ms 1ms 2ms 412us 26us 25us - 5ms - - 1ms - 413us - 25us - 5ms - - 1ms - 460us - 29us - 5ms - 196us 1ms 196us 370us 7us 23us - 5ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- -w: Print out latency histograms: sdb total disk sync_queue async_queue latency read write read write read write read write scrub ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ 1ns 0 0 0 0 0 0 0 0 0 ... 33us 0 0 0 0 0 0 0 0 0 66us 0 0 107 2486 2 788 12 12 0 131us 2 797 359 4499 10 558 184 184 6 262us 22 801 264 1563 10 286 287 287 24 524us 87 575 71 52086 15 1063 136 136 92 1ms 152 1190 5 41292 4 1693 252 252 141 2ms 245 2018 0 50007 0 2322 371 371 220 4ms 189 7455 22 162957 0 3912 6726 6726 199 8ms 108 9461 0 102320 0 5775 2526 2526 86 17ms 23 11287 0 37142 0 8043 1813 1813 19 34ms 0 14725 0 24015 0 11732 3071 3071 0 67ms 0 23597 0 7914 0 18113 5025 5025 0 134ms 0 33798 0 254 0 25755 7326 7326 0 268ms 0 51780 0 12 0 41593 10002 10002 0 537ms 0 77808 0 0 0 64255 13120 13120 0 1s 0 105281 0 0 0 83805 20841 20841 0 2s 0 88248 0 0 0 73772 14006 14006 0 4s 0 47266 0 0 0 29783 17176 17176 0 9s 0 10460 0 0 0 4130 6295 6295 0 17s 0 0 0 0 0 0 0 0 0 34s 0 0 0 0 0 0 0 0 0 69s 0 0 0 0 0 0 0 0 0 137s 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------- -h: Help -H: Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. -q: Include current number of entries in sync & async read/write queues, and scrub queue: syncq_read syncq_write asyncq_read asyncq_write scrubq_read pend activ pend activ pend activ pend activ pend activ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 0 0 78 29 0 0 0 0 0 0 0 0 78 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 227 394 0 19 0 0 0 0 0 0 227 394 0 19 0 0 0 0 0 0 108 98 0 19 0 0 0 0 0 0 19 98 0 0 0 0 0 0 0 0 78 98 0 0 0 0 0 0 0 0 19 88 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -p: Display numbers in parseable (exact) values. Also, update iostat syntax to allow the user to specify specific vdevs to show statistics for. The three options for choosing pools/vdevs are: Display a list of pools: zpool iostat ... [pool ...] Display a list of vdevs from a specific pool: zpool iostat ... [pool vdev ...] Display a list of vdevs from any pools: zpool iostat ... [vdev ...] Lastly, allow zpool command "interval" value to be floating point: zpool iostat -v 0.5 Signed-off-by: Tony Hutter <hutter2@llnl.gov Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4433
2016-02-29 18:05:23 +00:00
Average disk IO time (time reading/writing the disk).
.Ar syncq_wait :
Average amount of time IO spent in synchronous priority queues. Does
not include disk time.
.Ar asyncq_wait :
Average amount of time IO spent in asynchronous priority queues.
Does not include disk time.
.Ar scrub :
Average queuing time in scrub queue. Does not include disk time.
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.Ar trim :
Average queuing time in trim queue. Does not include disk time.
.It Fl q
Include active queue statistics. Each priority queue has both
pending (
.Ar pend )
and active (
.Ar activ )
IOs. Pending IOs are waiting to
be issued to the disk, and active IOs have been issued to disk and are
waiting for completion. These stats are broken out by priority queue:
.Pp
.Ar syncq_read/write :
Current number of entries in synchronous priority
queues.
.Ar asyncq_read/write :
Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues Update the zfs module to collect statistics on average latencies, queue sizes, and keep an internal histogram of all IO latencies. Along with this, update "zpool iostat" with some new options to print out the stats: -l: Include average IO latencies stats: total_wait disk_wait syncq_wait asyncq_wait scrub read write read write read write read write wait ----- ----- ----- ----- ----- ----- ----- ----- ----- - 41ms - 2ms - 46ms - 4ms - - 5ms - 1ms - 1us - 4ms - - 5ms - 1ms - 1us - 4ms - - - - - - - - - - - 49ms - 2ms - 47ms - - - - - - - - - - - - - 2ms - 1ms - - - 1ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- 1ms 1ms 1ms 413us 16us 25us - 5ms - 1ms 1ms 1ms 413us 16us 25us - 5ms - 2ms 1ms 2ms 412us 26us 25us - 5ms - - 1ms - 413us - 25us - 5ms - - 1ms - 460us - 29us - 5ms - 196us 1ms 196us 370us 7us 23us - 5ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- -w: Print out latency histograms: sdb total disk sync_queue async_queue latency read write read write read write read write scrub ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ 1ns 0 0 0 0 0 0 0 0 0 ... 33us 0 0 0 0 0 0 0 0 0 66us 0 0 107 2486 2 788 12 12 0 131us 2 797 359 4499 10 558 184 184 6 262us 22 801 264 1563 10 286 287 287 24 524us 87 575 71 52086 15 1063 136 136 92 1ms 152 1190 5 41292 4 1693 252 252 141 2ms 245 2018 0 50007 0 2322 371 371 220 4ms 189 7455 22 162957 0 3912 6726 6726 199 8ms 108 9461 0 102320 0 5775 2526 2526 86 17ms 23 11287 0 37142 0 8043 1813 1813 19 34ms 0 14725 0 24015 0 11732 3071 3071 0 67ms 0 23597 0 7914 0 18113 5025 5025 0 134ms 0 33798 0 254 0 25755 7326 7326 0 268ms 0 51780 0 12 0 41593 10002 10002 0 537ms 0 77808 0 0 0 64255 13120 13120 0 1s 0 105281 0 0 0 83805 20841 20841 0 2s 0 88248 0 0 0 73772 14006 14006 0 4s 0 47266 0 0 0 29783 17176 17176 0 9s 0 10460 0 0 0 4130 6295 6295 0 17s 0 0 0 0 0 0 0 0 0 34s 0 0 0 0 0 0 0 0 0 69s 0 0 0 0 0 0 0 0 0 137s 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------- -h: Help -H: Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. -q: Include current number of entries in sync & async read/write queues, and scrub queue: syncq_read syncq_write asyncq_read asyncq_write scrubq_read pend activ pend activ pend activ pend activ pend activ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 0 0 78 29 0 0 0 0 0 0 0 0 78 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 227 394 0 19 0 0 0 0 0 0 227 394 0 19 0 0 0 0 0 0 108 98 0 19 0 0 0 0 0 0 19 98 0 0 0 0 0 0 0 0 78 98 0 0 0 0 0 0 0 0 19 88 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -p: Display numbers in parseable (exact) values. Also, update iostat syntax to allow the user to specify specific vdevs to show statistics for. The three options for choosing pools/vdevs are: Display a list of pools: zpool iostat ... [pool ...] Display a list of vdevs from a specific pool: zpool iostat ... [pool vdev ...] Display a list of vdevs from any pools: zpool iostat ... [vdev ...] Lastly, allow zpool command "interval" value to be floating point: zpool iostat -v 0.5 Signed-off-by: Tony Hutter <hutter2@llnl.gov Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4433
2016-02-29 18:05:23 +00:00
Current number of entries in asynchronous priority queues.
.Ar scrubq_read :
Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues Update the zfs module to collect statistics on average latencies, queue sizes, and keep an internal histogram of all IO latencies. Along with this, update "zpool iostat" with some new options to print out the stats: -l: Include average IO latencies stats: total_wait disk_wait syncq_wait asyncq_wait scrub read write read write read write read write wait ----- ----- ----- ----- ----- ----- ----- ----- ----- - 41ms - 2ms - 46ms - 4ms - - 5ms - 1ms - 1us - 4ms - - 5ms - 1ms - 1us - 4ms - - - - - - - - - - - 49ms - 2ms - 47ms - - - - - - - - - - - - - 2ms - 1ms - - - 1ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- 1ms 1ms 1ms 413us 16us 25us - 5ms - 1ms 1ms 1ms 413us 16us 25us - 5ms - 2ms 1ms 2ms 412us 26us 25us - 5ms - - 1ms - 413us - 25us - 5ms - - 1ms - 460us - 29us - 5ms - 196us 1ms 196us 370us 7us 23us - 5ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- -w: Print out latency histograms: sdb total disk sync_queue async_queue latency read write read write read write read write scrub ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ 1ns 0 0 0 0 0 0 0 0 0 ... 33us 0 0 0 0 0 0 0 0 0 66us 0 0 107 2486 2 788 12 12 0 131us 2 797 359 4499 10 558 184 184 6 262us 22 801 264 1563 10 286 287 287 24 524us 87 575 71 52086 15 1063 136 136 92 1ms 152 1190 5 41292 4 1693 252 252 141 2ms 245 2018 0 50007 0 2322 371 371 220 4ms 189 7455 22 162957 0 3912 6726 6726 199 8ms 108 9461 0 102320 0 5775 2526 2526 86 17ms 23 11287 0 37142 0 8043 1813 1813 19 34ms 0 14725 0 24015 0 11732 3071 3071 0 67ms 0 23597 0 7914 0 18113 5025 5025 0 134ms 0 33798 0 254 0 25755 7326 7326 0 268ms 0 51780 0 12 0 41593 10002 10002 0 537ms 0 77808 0 0 0 64255 13120 13120 0 1s 0 105281 0 0 0 83805 20841 20841 0 2s 0 88248 0 0 0 73772 14006 14006 0 4s 0 47266 0 0 0 29783 17176 17176 0 9s 0 10460 0 0 0 4130 6295 6295 0 17s 0 0 0 0 0 0 0 0 0 34s 0 0 0 0 0 0 0 0 0 69s 0 0 0 0 0 0 0 0 0 137s 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------- -h: Help -H: Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. -q: Include current number of entries in sync & async read/write queues, and scrub queue: syncq_read syncq_write asyncq_read asyncq_write scrubq_read pend activ pend activ pend activ pend activ pend activ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 0 0 78 29 0 0 0 0 0 0 0 0 78 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 227 394 0 19 0 0 0 0 0 0 227 394 0 19 0 0 0 0 0 0 108 98 0 19 0 0 0 0 0 0 19 98 0 0 0 0 0 0 0 0 78 98 0 0 0 0 0 0 0 0 19 88 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -p: Display numbers in parseable (exact) values. Also, update iostat syntax to allow the user to specify specific vdevs to show statistics for. The three options for choosing pools/vdevs are: Display a list of pools: zpool iostat ... [pool ...] Display a list of vdevs from a specific pool: zpool iostat ... [pool vdev ...] Display a list of vdevs from any pools: zpool iostat ... [vdev ...] Lastly, allow zpool command "interval" value to be floating point: zpool iostat -v 0.5 Signed-off-by: Tony Hutter <hutter2@llnl.gov Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4433
2016-02-29 18:05:23 +00:00
Current number of entries in scrub queue.
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.Ar trimq_write :
Current number of entries in trim queue.
.Pp
All queue statistics are instantaneous measurements of the number of
entries in the queues. If you specify an interval, the measurements
will be sampled from the end of the interval.
.El
.It Xo
.Nm
.Cm labelclear
.Op Fl f
.Ar device
.Xc
Removes ZFS label information from the specified
.Ar device .
The
.Ar device
must not be part of an active pool configuration.
.Bl -tag -width Ds
.It Fl f
Treat exported or foreign devices as inactive.
.El
.It Xo
.Nm
.Cm list
.Op Fl HgLpPv
.Op Fl o Ar property Ns Oo , Ns Ar property Oc Ns ...
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Xc
Lists the given pools along with a health status and space usage.
If no
.Ar pool Ns s
are specified, all pools in the system are listed.
When given an
.Ar interval ,
the information is printed every
.Ar interval
seconds until ^C is pressed.
If
.Ar count
is specified, the command exits after
.Ar count
reports are printed.
.Bl -tag -width Ds
.It Fl g
Display vdev GUIDs instead of the normal device names. These GUIDs
can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl H
Scripted mode.
Do not display headers, and separate fields by a single tab instead of arbitrary
space.
.It Fl o Ar property
Comma-separated list of properties to display.
See the
.Sx Properties
section for a list of valid properties.
The default list is
.Cm name , size , allocated , free , checkpoint, expandsize , fragmentation ,
.Cm capacity , dedupratio , health , altroot .
.It Fl L
Display real paths for vdevs resolving all symbolic links. This can
be used to look up the current block device name regardless of the
/dev/disk/ path used to open it.
.It Fl p
Display numbers in parsable
.Pq exact
values.
.It Fl P
Display full paths for vdevs instead of only the last component of
the path. This can be used in conjunction with the
.Fl L
flag.
.It Fl T Sy u Ns | Ns Sy d
Display a time stamp.
Specify
.Sy u
for a printed representation of the internal representation of time.
See
.Xr time 2 .
Specify
.Sy d
for standard date format.
See
.Xr date 1 .
.It Fl v
Verbose statistics.
Reports usage statistics for individual vdevs within the pool, in addition to
the pool-wise statistics.
.El
.It Xo
.Nm
.Cm offline
.Op Fl f
.Op Fl t
.Ar pool Ar device Ns ...
.Xc
Takes the specified physical device offline.
While the
.Ar device
is offline, no attempt is made to read or write to the device.
This command is not applicable to spares.
.Bl -tag -width Ds
.It Fl f
Force fault. Instead of offlining the disk, put it into a faulted
state. The fault will persist across imports unless the
.Fl t
flag was specified.
.It Fl t
Temporary.
Upon reboot, the specified physical device reverts to its previous state.
.El
.It Xo
.Nm
.Cm online
.Op Fl e
.Ar pool Ar device Ns ...
.Xc
Brings the specified physical device online.
This command is not applicable to spares.
.Bl -tag -width Ds
.It Fl e
Expand the device to use all available space.
If the device is part of a mirror or raidz then all devices must be expanded
before the new space will become available to the pool.
.El
.It Xo
.Nm
.Cm reguid
.Ar pool
.Xc
Generates a new unique identifier for the pool.
You must ensure that all devices in this pool are online and healthy before
performing this action.
.It Xo
.Nm
.Cm reopen
.Op Fl n
.Ar pool
.Xc
Illumos #3306, #3321 3306 zdb should be able to issue reads in parallel 3321 'zpool reopen' command should be documented in the man page and help Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@31d7e8fa33fae995f558673adb22641b5aa8b6e1 https://www.illumos.org/issues/3306 https://www.illumos.org/issues/3321 The vdev_file.c implementation in this patch diverges significantly from the upstream version. For consistenty with the vdev_disk.c code the upstream version leverages the Illumos bio interfaces. This makes sense for Illumos but not for ZoL for two reasons. 1) The vdev_disk.c code in ZoL has been rewritten to use the Linux block device interfaces which differ significantly from those in Illumos. Therefore, updating the vdev_file.c to use the Illumos interfaces doesn't get you consistency with vdev_disk.c. 2) Using the upstream patch as is would requiring implementing compatibility code for those Solaris block device interfaces in user and kernel space. That additional complexity could lead to confusion and doesn't buy us anything. For these reasons I've opted to simply move the existing vn_rdwr() as is in to the taskq function. This has the advantage of being low risk and easy to understand. Moving the vn_rdwr() function in to its own taskq thread also neatly avoids the possibility of a stack overflow. Finally, because of the additional work which is being handled by the free taskq the number of threads has been increased. The thread count under Illumos defaults to 100 but was decreased to 2 in commit 08d08e due to contention. We increase it to 8 until the contention can be address by porting Illumos #3581. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1354
2013-05-02 23:36:32 +00:00
Reopen all the vdevs associated with the pool.
.Bl -tag -width Ds
.It Fl n
Do not restart an in-progress scrub operation. This is not recommended and can
result in partially resilvered devices unless a second scrub is performed.
.El
.It Xo
.Nm
.Cm remove
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.Op Fl np
.Ar pool Ar device Ns ...
.Xc
Removes the specified device from the pool.
This command supports removing hot spare, cache, log, and both mirrored and
non-redundant primary top-level vdevs, including dedup and special vdevs.
When the primary pool storage includes a top-level raidz vdev only hot spare,
cache, and log devices can be removed.
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.sp
Removing a top-level vdev reduces the total amount of space in the storage pool.
The specified device will be evacuated by copying all allocated space from it to
the other devices in the pool.
In this case, the
.Nm zpool Cm remove
command initiates the removal and returns, while the evacuation continues in
the background.
The removal progress can be monitored with
Detect IO errors during device removal * Detect IO errors during device removal While device removal cannot verify the checksums of individual blocks during device removal, it can reasonably detect hard IO errors from the leaf vdevs. Failure to perform this error checking can result in device removal completing successfully, but moving no data which will permanently corrupt the pool. Situation 1: faulted/degraded vdevs In the configuration shown below, the removal of mirror-0 will permanently corrupt the pool. Device removal will preferentially copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'. Which in this case will result in nothing being copied since one vdev in each of those groups in unavailable. However, device removal will complete successfully since all IO errors are ignored. tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 /var/tmp/vdev1 FAULTED 0 0 0 external fault /var/tmp/vdev2 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 /var/tmp/vdev3 ONLINE 0 0 0 /var/tmp/vdev4 FAULTED 0 0 0 external fault This issue is resolved by updating the source child selection logic to exclude unreadable leaf vdevs. Additionally, unwritable destination child vdevs which can never succeed are skipped to prevent generating a large number of write IO errors. Situation 2: individual hard IO errors During removal if an unexpected hard IO error is encountered when either reading or writing the child vdev the entire removal operation is cancelled. While it may be possible to reconstruct the data after removal that cannot be guaranteed. The only strictly safe thing to do is to cancel the removal. As a future improvement we may want to instead suspend the removal process and allow the damaged region to be retried. But that work is left for another time, hard IO errors during the removal process are expected to be exceptionally rare. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #6900 Closes #8161
2018-12-04 17:37:37 +00:00
.Nm zpool Cm status .
If an IO error is encountered during the removal process it will be
cancelled. The
.Sy device_removal
feature flag must be enabled to remove a top-level vdev, see
.Xr zpool-features 5 .
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.Pp
A mirrored top-level device (log or data) can be removed by specifying the top-level mirror for the
same.
Non-log devices or data devices that are part of a mirrored configuration can be removed using
the
.Nm zpool Cm detach
command.
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.Bl -tag -width Ds
.It Fl n
Do not actually perform the removal ("no-op").
Instead, print the estimated amount of memory that will be used by the
mapping table after the removal completes.
This is nonzero only for top-level vdevs.
.El
.Bl -tag -width Ds
.It Fl p
Used in conjunction with the
.Fl n
flag, displays numbers as parsable (exact) values.
.El
.It Xo
.Nm
.Cm remove
.Fl s
.Ar pool
.Xc
Stops and cancels an in-progress removal of a top-level vdev.
.It Xo
.Nm
.Cm replace
.Op Fl f
.Op Fl o Ar property Ns = Ns Ar value
.Ar pool Ar device Op Ar new_device
.Xc
Replaces
.Ar old_device
with
.Ar new_device .
This is equivalent to attaching
.Ar new_device ,
waiting for it to resilver, and then detaching
.Ar old_device .
.Pp
The size of
.Ar new_device
must be greater than or equal to the minimum size of all the devices in a mirror
or raidz configuration.
.Pp
.Ar new_device
is required if the pool is not redundant.
If
.Ar new_device
is not specified, it defaults to
.Ar old_device .
This form of replacement is useful after an existing disk has failed and has
been physically replaced.
In this case, the new disk may have the same
.Pa /dev
path as the old device, even though it is actually a different disk.
ZFS recognizes this.
.Bl -tag -width Ds
.It Fl f
Forces use of
.Ar new_device ,
even if it appears to be in use.
Not all devices can be overridden in this manner.
.It Fl o Ar property Ns = Ns Ar value
Sets the given pool properties. See the
.Sx Properties
section for a list of valid properties that can be set.
The only property supported at the moment is
.Sy ashift .
.El
.It Xo
.Nm
.Cm scrub
.Op Fl s | Fl p
.Ar pool Ns ...
.Xc
Begins a scrub or resumes a paused scrub.
The scrub examines all data in the specified pools to verify that it checksums
correctly.
For replicated
.Pq mirror or raidz
devices, ZFS automatically repairs any damage discovered during the scrub.
The
.Nm zpool Cm status
command reports the progress of the scrub and summarizes the results of the
scrub upon completion.
.Pp
Scrubbing and resilvering are very similar operations.
The difference is that resilvering only examines data that ZFS knows to be out
of date
.Po
for example, when attaching a new device to a mirror or replacing an existing
device
.Pc ,
whereas scrubbing examines all data to discover silent errors due to hardware
faults or disk failure.
.Pp
Because scrubbing and resilvering are I/O-intensive operations, ZFS only allows
one at a time.
If a scrub is paused, the
.Nm zpool Cm scrub
resumes it.
If a resilver is in progress, ZFS does not allow a scrub to be started until the
resilver completes.
.Bl -tag -width Ds
.It Fl s
Stop scrubbing.
.El
.Bl -tag -width Ds
.It Fl p
Pause scrubbing.
Scrub pause state and progress are periodically synced to disk.
If the system is restarted or pool is exported during a paused scrub,
even after import, scrub will remain paused until it is resumed.
Once resumed the scrub will pick up from the place where it was last
checkpointed to disk.
To resume a paused scrub issue
.Nm zpool Cm scrub
again.
.El
.It Xo
.Nm
.Cm resilver
.Ar pool Ns ...
.Xc
Starts a resilver. If an existing resilver is already running it will be
restarted from the beginning. Any drives that were scheduled for a deferred
resilver will be added to the new one.
.It Xo
.Nm
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.Cm trim
.Op Fl d
.Op Fl c | Fl s
.Ar pool
.Op Ar device Ns ...
.Xc
Initiates an immediate on-demand TRIM operation for all of the free space in
a pool. This operation informs the underlying storage devices of all blocks
in the pool which are no longer allocated and allows thinly provisioned
devices to reclaim the space.
.Pp
A manual on-demand TRIM operation can be initiated irrespective of the
.Sy autotrim
pool property setting. See the documentation for the
.Sy autotrim
property above for the types of vdev devices which can be trimmed.
.Bl -tag -width Ds
.It Fl d -secure
Causes a secure TRIM to be initiated. When performing a secure TRIM, the
device guarantees that data stored on the trimmed blocks has been erased.
This requires support from the device and is not supported by all SSDs.
.It Fl r -rate Ar rate
Controls the rate at which the TRIM operation progresses. Without this
option TRIM is executed as quickly as possible. The rate, expressed in bytes
per second, is applied on a per-vdev basis and may be set differently for
each leaf vdev.
.It Fl c, -cancel
Cancel trimming on the specified devices, or all eligible devices if none
are specified.
If one or more target devices are invalid or are not currently being
trimmed, the command will fail and no cancellation will occur on any device.
.It Fl s -suspend
Suspend trimming on the specified devices, or all eligible devices if none
are specified.
If one or more target devices are invalid or are not currently being
trimmed, the command will fail and no suspension will occur on any device.
Trimming can then be resumed by running
.Nm zpool Cm trim
with no flags on the relevant target devices.
.El
.It Xo
.Nm
.Cm set
.Ar property Ns = Ns Ar value
.Ar pool
.Xc
Sets the given property on the specified pool.
See the
.Sx Properties
section for more information on what properties can be set and acceptable
values.
.It Xo
.Nm
.Cm split
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.Op Fl gLlnP
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Ar pool newpool
.Op Ar device ...
.Xc
Splits devices off
.Ar pool
creating
.Ar newpool .
All vdevs in
.Ar pool
must be mirrors and the pool must not be in the process of resilvering.
At the time of the split,
.Ar newpool
will be a replica of
.Ar pool .
By default, the
last device in each mirror is split from
.Ar pool
to create
.Ar newpool .
.Pp
The optional device specification causes the specified device(s) to be
included in the new
.Ar pool
and, should any devices remain unspecified,
the last device in each mirror is used as would be by default.
.Bl -tag -width Ds
.It Fl g
Display vdev GUIDs instead of the normal device names. These GUIDs
can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl L
Display real paths for vdevs resolving all symbolic links. This can
be used to look up the current block device name regardless of the
.Pa /dev/disk/
path used to open it.
Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769
2017-08-14 17:36:48 +00:00
.It Fl l
Indicates that this command will request encryption keys for all encrypted
datasets it attempts to mount as it is bringing the new pool online. Note that
if any datasets have a
.Sy keylocation
of
.Sy prompt
this command will block waiting for the keys to be entered. Without this flag
encrypted datasets will be left unavailable until the keys are loaded.
.It Fl n
Do dry run, do not actually perform the split.
Print out the expected configuration of
.Ar newpool .
.It Fl P
Display full paths for vdevs instead of only the last component of
the path. This can be used in conjunction with the
.Fl L
flag.
.It Fl o Ar property Ns = Ns Ar value
Sets the specified property for
.Ar newpool .
See the
.Sx Properties
section for more information on the available pool properties.
.It Fl R Ar root
Set
.Sy altroot
for
.Ar newpool
to
.Ar root
and automatically import it.
.El
.It Xo
.Nm
.Cm status
.Op Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ...
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.Op Fl DigLpPstvx
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Xc
Displays the detailed health status for the given pools.
If no
.Ar pool
is specified, then the status of each pool in the system is displayed.
For more information on pool and device health, see the
.Sx Device Failure and Recovery
section.
.Pp
If a scrub or resilver is in progress, this command reports the percentage done
and the estimated time to completion.
Both of these are only approximate, because the amount of data in the pool and
the other workloads on the system can change.
.Bl -tag -width Ds
.It Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ...
Run a script (or scripts) on each vdev and include the output as a new column
in the
.Nm zpool Cm status
output. See the
.Fl c
option of
.Nm zpool Cm iostat
for complete details.
.It Fl i
Display vdev initialization status.
.It Fl g
Display vdev GUIDs instead of the normal device names. These GUIDs
can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl L
Display real paths for vdevs resolving all symbolic links. This can
be used to look up the current block device name regardless of the
.Pa /dev/disk/
path used to open it.
.It Fl p
Display numbers in parsable (exact) values.
.It Fl P
Display full paths for vdevs instead of only the last component of
the path. This can be used in conjunction with the
.Fl L
flag.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk
and referenced
.Pq logically referenced in the pool
block counts and sizes by reference count.
.It Fl s
Display the number of leaf VDEV slow IOs. This is the number of IOs that
didn't complete in \fBzio_slow_io_ms\fR milliseconds (default 30 seconds).
This does not necessarily mean the IOs failed to complete, just took an
unreasonably long amount of time. This may indicate a problem with the
underlying storage.
Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598
2019-03-29 16:13:20 +00:00
.It Fl t
Display vdev TRIM status.
.It Fl T Sy u Ns | Ns Sy d
Display a time stamp.
Specify
.Sy u
for a printed representation of the internal representation of time.
See
.Xr time 2 .
Specify
.Sy d
for standard date format.
See
.Xr date 1 .
.It Fl v
Displays verbose data error information, printing out a complete list of all
data errors since the last complete pool scrub.
.It Fl x
Only display status for pools that are exhibiting errors or are otherwise
unavailable.
Warnings about pools not using the latest on-disk format will not be included.
.El
.It Xo
.Nm
.Cm sync
.Op Ar pool ...
.Xc
This command forces all in-core dirty data to be written to the primary
pool storage and not the ZIL. It will also update administrative
information including quota reporting. Without arguments,
.Sy zpool sync
will sync all pools on the system. Otherwise, it will sync only the
specified pool(s).
.It Xo
.Nm
.Cm upgrade
.Xc
Displays pools which do not have all supported features enabled and pools
formatted using a legacy ZFS version number.
These pools can continue to be used, but some features may not be available.
Use
.Nm zpool Cm upgrade Fl a
to enable all features on all pools.
.It Xo
.Nm
.Cm upgrade
.Fl v
.Xc
Displays legacy ZFS versions supported by the current software.
See
.Xr zpool-features 5
for a description of feature flags features supported by the current software.
.It Xo
.Nm
.Cm upgrade
.Op Fl V Ar version
.Fl a Ns | Ns Ar pool Ns ...
.Xc
Enables all supported features on the given pool.
Once this is done, the pool will no longer be accessible on systems that do not
support feature flags.
See
.Xr zpool-features 5
for details on compatibility with systems that support feature flags, but do not
support all features enabled on the pool.
.Bl -tag -width Ds
.It Fl a
Enables all supported features on all pools.
.It Fl V Ar version
Upgrade to the specified legacy version.
If the
.Fl V
flag is specified, no features will be enabled on the pool.
This option can only be used to increase the version number up to the last
supported legacy version number.
.El
.El
.Sh EXIT STATUS
The following exit values are returned:
.Bl -tag -width Ds
.It Sy 0
Successful completion.
.It Sy 1
An error occurred.
.It Sy 2
Invalid command line options were specified.
.El
.Sh EXAMPLES
.Bl -tag -width Ds
.It Sy Example 1 No Creating a RAID-Z Storage Pool
The following command creates a pool with a single raidz root vdev that
consists of six disks.
.Bd -literal
# zpool create tank raidz sda sdb sdc sdd sde sdf
.Ed
.It Sy Example 2 No Creating a Mirrored Storage Pool
The following command creates a pool with two mirrors, where each mirror
contains two disks.
.Bd -literal
# zpool create tank mirror sda sdb mirror sdc sdd
.Ed
.It Sy Example 3 No Creating a ZFS Storage Pool by Using Partitions
The following command creates an unmirrored pool using two disk partitions.
.Bd -literal
# zpool create tank sda1 sdb2
.Ed
.It Sy Example 4 No Creating a ZFS Storage Pool by Using Files
The following command creates an unmirrored pool using files.
While not recommended, a pool based on files can be useful for experimental
purposes.
.Bd -literal
# zpool create tank /path/to/file/a /path/to/file/b
.Ed
.It Sy Example 5 No Adding a Mirror to a ZFS Storage Pool
The following command adds two mirrored disks to the pool
.Em tank ,
assuming the pool is already made up of two-way mirrors.
The additional space is immediately available to any datasets within the pool.
.Bd -literal
# zpool add tank mirror sda sdb
.Ed
.It Sy Example 6 No Listing Available ZFS Storage Pools
The following command lists all available pools on the system.
In this case, the pool
.Em zion
is faulted due to a missing device.
The results from this command are similar to the following:
.Bd -literal
# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 19.9G 8.43G 11.4G - 33% 42% 1.00x ONLINE -
tank 61.5G 20.0G 41.5G - 48% 32% 1.00x ONLINE -
zion - - - - - - - FAULTED -
.Ed
.It Sy Example 7 No Destroying a ZFS Storage Pool
The following command destroys the pool
.Em tank
and any datasets contained within.
.Bd -literal
# zpool destroy -f tank
.Ed
.It Sy Example 8 No Exporting a ZFS Storage Pool
The following command exports the devices in pool
.Em tank
so that they can be relocated or later imported.
.Bd -literal
# zpool export tank
.Ed
.It Sy Example 9 No Importing a ZFS Storage Pool
The following command displays available pools, and then imports the pool
.Em tank
for use on the system.
The results from this command are similar to the following:
.Bd -literal
# zpool import
pool: tank
id: 15451357997522795478
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
tank ONLINE
mirror ONLINE
sda ONLINE
sdb ONLINE
# zpool import tank
.Ed
.It Sy Example 10 No Upgrading All ZFS Storage Pools to the Current Version
The following command upgrades all ZFS Storage pools to the current version of
the software.
.Bd -literal
# zpool upgrade -a
This system is currently running ZFS version 2.
.Ed
.It Sy Example 11 No Managing Hot Spares
The following command creates a new pool with an available hot spare:
.Bd -literal
# zpool create tank mirror sda sdb spare sdc
.Ed
.Pp
If one of the disks were to fail, the pool would be reduced to the degraded
state.
The failed device can be replaced using the following command:
.Bd -literal
# zpool replace tank sda sdd
.Ed
.Pp
Once the data has been resilvered, the spare is automatically removed and is
made available for use should another device fail.
The hot spare can be permanently removed from the pool using the following
command:
.Bd -literal
# zpool remove tank sdc
.Ed
.It Sy Example 12 No Creating a ZFS Pool with Mirrored Separate Intent Logs
The following command creates a ZFS storage pool consisting of two, two-way
mirrors and mirrored log devices:
.Bd -literal
# zpool create pool mirror sda sdb mirror sdc sdd log mirror \\
sde sdf
.Ed
.It Sy Example 13 No Adding Cache Devices to a ZFS Pool
The following command adds two disks for use as cache devices to a ZFS storage
pool:
.Bd -literal
# zpool add pool cache sdc sdd
.Ed
.Pp
Once added, the cache devices gradually fill with content from main memory.
Depending on the size of your cache devices, it could take over an hour for
them to fill.
Capacity and reads can be monitored using the
.Cm iostat
option as follows:
.Bd -literal
# zpool iostat -v pool 5
.Ed
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.It Sy Example 14 No Removing a Mirrored top-level (Log or Data) Device
The following commands remove the mirrored log device
.Sy mirror-2
and mirrored top-level data device
.Sy mirror-1 .
.Pp
Given this configuration:
.Bd -literal
pool: tank
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
logs
mirror-2 ONLINE 0 0 0
sde ONLINE 0 0 0
sdf ONLINE 0 0 0
.Ed
.Pp
The command to remove the mirrored log
.Sy mirror-2
is:
.Bd -literal
# zpool remove tank mirror-2
.Ed
OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900
2016-09-22 16:30:13 +00:00
.Pp
The command to remove the mirrored data
.Sy mirror-1
is:
.Bd -literal
# zpool remove tank mirror-1
.Ed
.It Sy Example 15 No Displaying expanded space on a device
The following command displays the detailed information for the pool
.Em data .
This pool is comprised of a single raidz vdev where one of its devices
increased its capacity by 10GB.
In this example, the pool will not be able to utilize this extra capacity until
all the devices under the raidz vdev have been expanded.
.Bd -literal
# zpool list -v data
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data 23.9G 14.6G 9.30G - 48% 61% 1.00x ONLINE -
raidz1 23.9G 14.6G 9.30G - 48%
sda - - - - -
sdb - - - 10G -
sdc - - - - -
.Ed
.It Sy Example 16 No Adding output columns
Additional columns can be added to the
.Nm zpool Cm status
and
.Nm zpool Cm iostat
output with
.Fl c
option.
.Bd -literal
# zpool status -c vendor,model,size
NAME STATE READ WRITE CKSUM vendor model size
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
U1 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
U10 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
U11 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
U12 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
U13 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
U14 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
# zpool iostat -vc slaves
capacity operations bandwidth
pool alloc free read write read write slaves
---------- ----- ----- ----- ----- ----- ----- ---------
tank 20.4G 7.23T 26 152 20.7M 21.6M
mirror 20.4G 7.23T 26 152 20.7M 21.6M
U1 - - 0 31 1.46K 20.6M sdb sdff
U10 - - 0 1 3.77K 13.3K sdas sdgw
U11 - - 0 1 288K 13.3K sdat sdgx
U12 - - 0 1 78.4K 13.3K sdau sdgy
U13 - - 0 1 128K 13.3K sdav sdgz
U14 - - 0 1 63.2K 13.3K sdfk sdg
.Ed
.El
.Sh ENVIRONMENT VARIABLES
.Bl -tag -width "ZFS_ABORT"
.It Ev ZFS_ABORT
Cause
.Nm zpool
to dump core on exit for the purposes of running
.Sy ::findleaks .
.El
.Bl -tag -width "ZPOOL_IMPORT_PATH"
.It Ev ZPOOL_IMPORT_PATH
The search path for devices or files to use with the pool. This is a colon-separated list of directories in which
.Nm zpool
looks for device nodes and files.
Similar to the
.Fl d
option in
.Nm zpool import .
.El
.Bl -tag -width "ZPOOL_VDEV_NAME_GUID"
.It Ev ZPOOL_VDEV_NAME_GUID
Cause
.Nm zpool subcommands to output vdev guids by default. This behavior
is identical to the
.Nm zpool status -g
command line option.
.El
.Bl -tag -width "ZPOOL_VDEV_NAME_FOLLOW_LINKS"
.It Ev ZPOOL_VDEV_NAME_FOLLOW_LINKS
Cause
.Nm zpool
subcommands to follow links for vdev names by default. This behavior is identical to the
.Nm zpool status -L
command line option.
.El
.Bl -tag -width "ZPOOL_VDEV_NAME_PATH"
.It Ev ZPOOL_VDEV_NAME_PATH
Cause
.Nm zpool
subcommands to output full vdev path names by default. This
behavior is identical to the
.Nm zpool status -p
command line option.
.El
.Bl -tag -width "ZFS_VDEV_DEVID_OPT_OUT"
.It Ev ZFS_VDEV_DEVID_OPT_OUT
Older ZFS on Linux implementations had issues when attempting to display pool
config VDEV names if a
.Sy devid
NVP value is present in the pool's config.
.Pp
For example, a pool that originated on illumos platform would have a devid
value in the config and
.Nm zpool status
would fail when listing the config.
This would also be true for future Linux based pools.
.Pp
A pool can be stripped of any
.Sy devid
values on import or prevented from adding
them on
.Nm zpool create
or
.Nm zpool add
by setting
.Sy ZFS_VDEV_DEVID_OPT_OUT .
.El
.Bl -tag -width "ZPOOL_SCRIPTS_AS_ROOT"
.It Ev ZPOOL_SCRIPTS_AS_ROOT
Allow a privileged user to run the
.Nm zpool status/iostat
with the
.Fl c
option. Normally, only unprivileged users are allowed to run
.Fl c .
.El
.Bl -tag -width "ZPOOL_SCRIPTS_PATH"
.It Ev ZPOOL_SCRIPTS_PATH
The search path for scripts when running
.Nm zpool status/iostat
with the
.Fl c
option. This is a colon-separated list of directories and overrides the default
.Pa ~/.zpool.d
and
.Pa /etc/zfs/zpool.d
search paths.
.El
.Bl -tag -width "ZPOOL_SCRIPTS_ENABLED"
.It Ev ZPOOL_SCRIPTS_ENABLED
Allow a user to run
.Nm zpool status/iostat
with the
.Fl c
option. If
.Sy ZPOOL_SCRIPTS_ENABLED
is not set, it is assumed that the user is allowed to run
.Nm zpool status/iostat -c .
.El
.Sh INTERFACE STABILITY
.Sy Evolving
.Sh SEE ALSO
.Xr zfs-events 5 ,
.Xr zfs-module-parameters 5 ,
.Xr zpool-features 5 ,
.Xr zed 8 ,
.Xr zfs 8