.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd November 29, 2018
.Dt ZPOOL 8 SMM
.Os Linux
.Sh NAME
.Nm zpool
.Nd configure ZFS storage pools
.Sh SYNOPSIS
.Nm
.Fl ?
.Nm
.Cm add
.Op Fl fgLnP
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool vdev Ns ...
.Nm
.Cm attach
.Op Fl f
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool device new_device
.Nm
.Cm checkpoint
.Op Fl d, -discard
.Ar pool
.Nm
.Cm clear
.Ar pool
.Op Ar device
.Nm
.Cm create
.Op Fl dfn
.Op Fl m Ar mountpoint
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Oo Fl o Ar feature@feature Ns = Ns Ar value Oc
.Oo Fl O Ar file-system-property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Ar pool vdev Ns ...
.Nm
.Cm destroy
.Op Fl f
.Ar pool
.Nm
.Cm detach
.Ar pool device
.Nm
.Cm events
.Op Fl vHf Oo Ar pool Oc | Fl c
.Nm
.Cm export
.Op Fl a
.Op Fl f
.Ar pool Ns ...
.Nm
.Cm get
.Op Fl Hp
.Op Fl o Ar field Ns Oo , Ns Ar field Oc Ns ...
.Sy all Ns | Ns Ar property Ns Oo , Ns Ar property Oc Ns ...
.Oo Ar pool Oc Ns ...
.Nm
.Cm history
.Op Fl il
.Oo Ar pool Oc Ns ...
.Nm
.Cm import
.Op Fl D
.Op Fl d Ar dir Ns | Ns device
.Nm
.Cm import
.Fl a
.Op Fl DflmN
.Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
.Op Fl -rewind-to-checkpoint
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
.Op Fl o Ar mntopts
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Nm
.Cm import
.Op Fl Dflm
.Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
.Op Fl -rewind-to-checkpoint
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns device
.Op Fl o Ar mntopts
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Op Fl s
.Ar pool Ns | Ns Ar id
.Op Ar newpool Oo Fl t Oc
.Nm
.Cm initialize
.Op Fl c | Fl s
.Ar pool
.Op Ar device Ns ...
.Nm
.Cm iostat
.Op Oo Oo Fl c Ar SCRIPT Oc Oo Fl lq Oc Oc Ns | Ns Fl rw
.Op Fl T Sy u Ns | Ns Sy d
.Op Fl ghHLpPvy
.Oo Oo Ar pool Ns ... Oc Ns | Ns Oo Ar pool vdev Ns ... Oc Ns | Ns Oo Ar vdev Ns ... Oc Oc
.Op Ar interval Op Ar count
.Nm
.Cm labelclear
.Op Fl f
.Ar device
.Nm
.Cm list
.Op Fl HgLpPv
.Op Fl o Ar property Ns Oo , Ns Ar property Oc Ns ...
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Nm
.Cm offline
.Op Fl f
.Op Fl t
.Ar pool Ar device Ns ...
.Nm
.Cm online
.Op Fl e
.Ar pool Ar device Ns ...
.Nm
.Cm reguid
.Ar pool
.Nm
.Cm reopen
.Op Fl n
.Ar pool
.Nm
.Cm remove
.Op Fl np
.Ar pool Ar device Ns ...
.Nm
.Cm remove
.Fl s
.Ar pool
.Nm
.Cm replace
.Op Fl f
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool Ar device Op Ar new_device
.Nm
.Cm resilver
.Ar pool Ns ...
.Nm
.Cm scrub
.Op Fl s | Fl p
.Ar pool Ns ...
.Nm
.Cm set
.Ar property Ns = Ns Ar value
.Ar pool
.Nm
.Cm split
.Op Fl gLlnP
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Ar pool newpool
.Oo Ar device Oc Ns ...
.Nm
.Cm status
.Oo Fl c Ar SCRIPT Oc
.Op Fl DigLpPsvx
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Nm
.Cm sync
.Oo Ar pool Oc Ns ...
.Nm
.Cm upgrade
.Nm
.Cm upgrade
.Fl v
.Nm
.Cm upgrade
.Op Fl V Ar version
.Fl a Ns | Ns Ar pool Ns ...
.Sh DESCRIPTION
The
.Nm
command configures ZFS storage pools.
A storage pool is a collection of devices that provides physical storage and
data replication for ZFS datasets.
All datasets within a storage pool share the same space.
See
.Xr zfs 8
for information on managing datasets.
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width Ds
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system of which it is a part.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with N disks of size X can hold X bytes and can withstand (N-1) devices
failing before data integrity is compromised.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group.
.Pp
A raidz group can have single-, double-, or triple-parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with N disks of size X with P parity disks can hold approximately
(N-P)*X bytes and can withstand P device(s) failing before data integrity is
compromised.
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
.It Sy spare
A special pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for allocating dedup data.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file data.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Sy cache
|
|
|
|
A device used to cache storage pool data.
|
|
|
|
A cache device cannot be configured as a mirror or raidz group.
|
|
|
|
For more information, see the
|
|
|
|
.Sx Cache Devices
|
|
|
|
section.
|
|
|
|
.El
|
|
|
|
.Pp
|
|
|
|
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
|
|
|
|
contain files or disks.
|
|
|
|
Mirrors of mirrors
|
|
|
|
.Pq or other combinations
|
|
|
|
are not allowed.
|
|
|
|
.Pp
|
|
|
|
A pool can have any number of virtual devices at the top of the configuration
|
|
|
|
.Po known as
|
|
|
|
.Qq root vdevs
|
|
|
|
.Pc .
|
|
|
|
Data is dynamically distributed across all top-level devices to balance data
|
|
|
|
among devices.
|
|
|
|
As new virtual devices are added, ZFS automatically places data on the newly
|
|
|
|
available devices.
|
|
|
|
.Pp
|
|
|
|
Virtual devices are specified one at a time on the command line, separated by
|
|
|
|
whitespace.
|
|
|
|
The keywords
|
|
|
|
.Sy mirror
|
|
|
|
and
|
|
|
|
.Sy raidz
|
|
|
|
are used to distinguish where a group ends and another begins.
|
|
|
|
For example, the following creates two root vdevs, each a mirror of two disks:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool create mypool mirror sda sdb mirror sdc sdd
|
|
|
|
.Ed
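.Pp
As a further sketch, a single double-parity raidz group is specified with the
.Sy raidz2
keyword followed by its member disks.
The pool name and the six disk names below are hypothetical:
.\" mypool and sda through sdf are illustrative names only.
.Bd -literal
# zpool create mypool raidz2 sda sdb sdc sdd sde sdf
.Ed
.Pp
By the raidz capacity approximation (N-P)*X, with N=6 and P=2 such a group
would hold roughly 4*X bytes and could withstand two device failures.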
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data are checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states: online, degraded,
or faulted.
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of a top-level vdev, such as a mirror or raidz device, is
potentially impacted by the state of its associated vdevs, or component
devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path since the path was never
correct in the first place.
.El
.Pp
If a device is removed and later re-attached to the system, ZFS attempts
to put the device online automatically.
Device attach detection is hardware-dependent and might not be supported on all
platforms.
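.Pp
The device states described in this section can be inspected, and the
.Sy OFFLINE
state entered and left, from the command line.
The following is a minimal sketch, assuming a redundant pool named
.Em pool
that contains a device
.Em sda ;
both names are placeholders:
.\" "pool" and "sda" are placeholder names, not taken from a real system.
.Bd -literal
# zpool status -x
# zpool offline pool sda
# zpool status pool
# zpool online pool sda
.Ed
.Pp
Here
.Nm zpool Cm status Fl x
limits the report to pools that have errors or are otherwise unavailable.
A device taken offline this way should be reported as
.Sy OFFLINE
until it is brought back online.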
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Bd -literal
# zpool create pool mirror sda sdb spare sdc sdd
.Ed
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
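.Pp
Adding and removing a spare on an existing pool can be sketched as follows.
The pool name
.Em pool
and the disk name
.Em sde
are placeholders, and the removal is expected to succeed only while the spare
is not in use:
.\" "pool" and "sde" are placeholder names.
.Bd -literal
# zpool add pool spare sde
# zpool remove pool sde
.Ed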
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
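.Pp
Both outcomes are reached with
.Nm zpool Cm detach .
As a sketch, assume the hot spare
.Em sdc
has taken over for the faulted device
.Em sda
in
.Em pool ;
all three names are placeholders.
The two commands below are alternatives, not a sequence: the first cancels the
replacement, the second makes it permanent.
.\" "pool", "sda" and "sdc" are placeholder names; run only one of the two commands.
.Bd -literal
# zpool detach pool sdc
# zpool detach pool sda
.Ed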
.Pp
Spares cannot replace log devices.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Bd -literal
# zpool create pool sda sdb log sdc
.Ed
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
.Pp
Log devices can be added, replaced, attached, detached and removed. In
addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
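.Pp
Adding and later removing a mirrored log can be sketched as follows, assuming
a pool named
.Em pool ,
unused disks
.Em sdc
and
.Em sdd ,
and a resulting top-level vdev named
.Em mirror-1 .
All of these names are placeholders; the actual vdev name is the one reported
by
.Nm zpool Cm status .
.\" "pool", "sdc", "sdd" and "mirror-1" are placeholder names.
.Bd -literal
# zpool add pool log mirror sdc sdd
# zpool status pool
# zpool remove pool mirror-1
.Ed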
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Bd -literal
# zpool create pool sda sdb cache sdc sdd
.Ed
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is considered volatile, as is the case with
other system caches.
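.Pp
Cache devices can also be added to and removed from an existing pool.
A minimal sketch, in which the pool name
.Em pool
and the disk name
.Em sde
are placeholders:
.\" "pool" and "sde" are placeholder names.
.Bd -literal
# zpool add pool cache sde
# zpool iostat -v pool 5
# zpool remove pool sde
.Ed
.Pp
.Nm zpool Cm iostat Fl v
lists each device separately, so the cache device's capacity and reads can be
watched as it fills; the trailing 5 is the reporting interval in seconds.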
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Po e.g.,
.Nm zfs Cm destroy
.Pc ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, while a pool has a checkpoint, certain operations are not allowed:
specifically, vdev removal, attach, and detach; mirror splitting; and
changing the pool's guid.
Adding a new vdev is supported but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Bd -literal
# zpool checkpoint pool
.Ed
.Pp
To later rewind the pool to its checkpointed state, first export it and
then rewind it during import:
.Bd -literal
# zpool export pool
# zpool import --rewind-to-checkpoint pool
.Ed
.Pp
To discard the checkpoint from a pool:
.Bd -literal
# zpool checkpoint -d pool
.Ed
.Pp
Dataset reservations (controlled by the
.Sy reservation
or
.Sy refreservation
zfs properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.Ss Special Allocation Class
The allocations in the special class are dedicated to specific block types.
By default this includes all metadata, the indirect blocks of user data, and
any dedup data. The class can also be provisioned to accept a limited
percentage of small file data blocks.
.Pp
A pool must always have at least one general (non-specified) vdev before
other devices can be assigned to the special class. If the special class
becomes full, then allocations intended for it will spill back into the
normal class.
.Pp
Dedup data can be excluded from the special class by setting the
.Sy zfs_ddt_data_is_special
zfs module parameter to false (0).
.Pp
Inclusion of small file blocks in the special class is opt-in. Each dataset
can control the size of small file blocks allowed in the special class by
setting the
.Sy special_small_blocks
dataset property. It defaults to zero, so you must opt in by setting it to a
non-zero value. See
.Xr zfs 8
for more info on setting this property.
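.Pp
As a minimal sketch, a mirrored special vdev could be added to an existing
pool and small blocks routed to it for one dataset.
The pool, disk, and dataset names are placeholders, and the 32K threshold is
an arbitrary illustrative value rather than a recommendation:
.\" "pool", "sde", "sdf" and "pool/dataset" are placeholder names; 32K is illustrative.
.Bd -literal
# zpool add pool special mirror sde sdf
# zfs set special_small_blocks=32K pool/dataset
# zfs get special_small_blocks pool/dataset
.Ed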
.Ss Properties
Each pool has several properties associated with it.
Some properties are read-only statistics while others are configurable and
change the behavior of the pool.
.Pp
The following are read-only properties:
.Bl -tag -width Ds
.It Sy allocated
Amount of storage used within the pool.
.It Sy capacity
Percentage of pool space used.
This property can also be referred to by its shortened column name,
.Sy cap .
.It Sy expandsize
Amount of uninitialized space within the pool or device that can be used to
increase the total capacity of the pool.
Uninitialized space consists of any space on an EFI labeled vdev which has not
been brought online
.Po e.g., using
.Nm zpool Cm online Fl e
.Pc .
This space occurs when a LUN is dynamically expanded.
.It Sy fragmentation
The amount of fragmentation in the pool.
.It Sy free
The amount of free space available in the pool.
.It Sy freeing
After a file system or snapshot is destroyed, the space it was using is
returned to the pool asynchronously.
.Sy freeing
is the amount of space remaining to be reclaimed.
Over time
.Sy freeing
will decrease while
.Sy free
increases.
.It Sy health
The current health of the pool.
Health can be one of
.Sy ONLINE , DEGRADED , FAULTED , OFFLINE , REMOVED , UNAVAIL .
.It Sy guid
A unique identifier for the pool.
.It Sy load_guid
A unique identifier for the pool.
Unlike the
.Sy guid
property, this identifier is generated every time we load the pool (i.e. it does
not persist across imports/exports) and never changes while the pool is loaded
(even if a
.Sy reguid
operation takes place).
.It Sy size
Total size of the storage pool.
.It Sy unsupported@ Ns Em feature_guid
Information about unsupported features that are enabled on the pool.
See
.Xr zpool-features 5
for details.
.El
.Pp
The space usage properties report actual physical space available to the
storage pool.
The physical space can be different from the total amount of space that any
contained datasets can actually use.
The amount of space used in a raidz configuration depends on the characteristics
of the data being written.
In addition, ZFS reserves some space for internal accounting that the
.Xr zfs 8
command takes into account, but the
.Nm
command does not.
For non-full pools of a reasonable size, these effects should be invisible.
For small pools, or pools that are close to being completely full, these
discrepancies may become more noticeable.
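.Pp
The read-only properties can be queried with
.Nm zpool Cm get
or shown as columns with
.Nm zpool Cm list .
A brief sketch, in which
.Em pool
is a placeholder name:
.\" "pool" is a placeholder pool name.
.Bd -literal
# zpool get size,allocated,free,capacity,health pool
# zpool list -o name,size,allocated,free,fragmentation,health pool
.Ed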
.Pp
The following property can be set at creation time and import time:
.Bl -tag -width Ds
.It Sy altroot
Alternate root directory.
If set, this directory is prepended to any mount points within the pool.
This can be used when examining an unknown pool where the mount points cannot be
trusted, or in an alternate boot environment, where the typical paths are not
valid.
.Sy altroot
is not a persistent property.
It is valid only while the system is up.
Setting
.Sy altroot
defaults to using
.Sy cachefile Ns = Ns Sy none ,
though this may be overridden using an explicit setting.
.El
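.Pp
As a sketch, a pool can be imported under a temporary root and the resulting
settings checked.
The pool name
.Em pool
and the directory
.Pa /mnt
are placeholders:
.\" "pool" and /mnt are placeholder names.
.Bd -literal
# zpool import -o altroot=/mnt pool
# zpool get altroot,cachefile pool
.Ed
.Pp
The
.Fl R
option of
.Nm zpool Cm import
is a shorthand that sets
.Sy altroot
to the given directory and
.Sy cachefile
to
.Sy none .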
.Pp
The following property can be set only at import time:
.Bl -tag -width Ds
.It Sy readonly Ns = Ns Sy on Ns | Ns Sy off
If set to
.Sy on ,
the pool will be imported in read-only mode.
This property can also be referred to by its shortened column name,
.Sy rdonly .
.El
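.Pp
A minimal sketch of a read-only import, assuming an exported pool with the
placeholder name
.Em pool :
.\" "pool" is a placeholder pool name.
.Bd -literal
# zpool import -o readonly=on pool
# zpool get readonly pool
.Ed
.Pp
Because the property can be set only at import time, returning the pool to
read-write mode means exporting it and importing it again without this option.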
.Pp
The following properties can be set at creation time and import time, and later
changed with the
.Nm zpool Cm set
command:
.Bl -tag -width Ds
.It Sy ashift Ns = Ns Sy ashift
Pool sector size exponent, to the power of
.Sy 2
.Po internally referred to as
.Sy ashift
.Pc .
Values from 9 to 16, inclusive, are valid; also, the special
value 0 (the default) means to auto-detect using the kernel's block
layer and a ZFS internal exception list. I/O operations will be aligned
to the specified size boundaries. Additionally, the minimum (disk)
write size will be set to the specified size, so this represents a
space vs. performance trade-off. For optimal performance, the pool
sector size should be greater than or equal to the sector size of the
underlying disks. The typical case for setting this property is when
performance is important and the underlying disks use 4KiB sectors but
report 512B sectors to the OS (for compatibility reasons); in that
case, set
.Sy ashift=12
(which is 1<<12 = 4096). When set, this property is
used as the default hint value in subsequent vdev operations (add,
attach and replace). Changing this value will not modify any existing
vdev, not even on disk replacement; however it can be used, for
instance, to replace a dying 512B-sector disk with a newer 4KiB-sector
device: this will probably result in bad performance but at the
same time could prevent loss of data.
.It Sy autoexpand Ns = Ns Sy on Ns | Ns Sy off
Controls automatic pool expansion when the underlying LUN is grown.
If set to
.Sy on ,
the pool will be resized according to the size of the expanded device.
If the device is part of a mirror or raidz then all devices within that
mirror/raidz group must be expanded before the new space is made available to
the pool.
The default behavior is
.Sy off .
This property can also be referred to by its shortened column name,
.Sy expand .
.It Sy autoreplace Ns = Ns Sy on Ns | Ns Sy off
Controls automatic device replacement.
If set to
.Sy off ,
device replacement must be initiated by the administrator by using the
.Nm zpool Cm replace
command.
If set to
.Sy on ,
any new device, found in the same physical location as a device that previously
belonged to the pool, is automatically formatted and replaced.
The default behavior is
.Sy off .
This property can also be referred to by its shortened column name,
.Sy replace .
Autoreplace can also be used with virtual disks (like device
mapper) provided that you use the /dev/disk/by-vdev paths set up by
vdev_id.conf. See the
.Xr vdev_id 8
man page for more details.
Autoreplace and autoonline require the ZFS Event Daemon be configured and
running. See the
.Xr zed 8
man page for more details.
.It Sy bootfs Ns = Ns Sy (unset) Ns | Ns Ar pool Ns / Ns Ar dataset
Identifies the default bootable dataset for the root pool. This property is
expected to be set mainly by the installation and upgrade programs.
Not all Linux distribution boot processes use the bootfs property.
.It Sy cachefile Ns = Ns Ar path Ns | Ns Sy none
Controls where the pool configuration is cached.
Discovering all pools on system startup requires a cached copy of the
configuration data that is stored on the root file system.
All pools in this cache are automatically imported when the system boots.
Some environments, such as install and clustering, need to cache this
information in a different location so that pools are not automatically
imported.
Setting this property caches the pool configuration in a different location that
can later be imported with
.Nm zpool Cm import Fl c .
Setting it to the special value
.Sy none
creates a temporary pool that is never cached, and the special value
.Qq
.Pq empty string
uses the default location.
.Pp
Multiple pools can share the same cache file.
Because the kernel destroys and recreates this file when pools are added and
removed, care should be taken when attempting to access this file.
When the last pool using a
.Sy cachefile
is exported or destroyed, the file will be empty.
.It Sy comment Ns = Ns Ar text
A text string consisting of printable ASCII characters that will be stored
such that it is available even if the pool becomes faulted.
An administrator can provide additional information about a pool using this
property.
.It Sy dedupditto Ns = Ns Ar number
Threshold for the number of block ditto copies.
If the reference count for a deduplicated block increases above this number, a
new ditto copy of this block is automatically stored.
The default setting is
.Sy 0
which causes no ditto copies to be created for deduplicated blocks.
The minimum legal nonzero setting is
.Sy 100 .
.It Sy delegation Ns = Ns Sy on Ns | Ns Sy off
Controls whether a non-privileged user is granted access based on the dataset
permissions defined on the dataset.
See
.Xr zfs 8
for more information on ZFS delegated administration.
.It Sy failmode Ns = Ns Sy wait Ns | Ns Sy continue Ns | Ns Sy panic
Controls the system behavior in the event of catastrophic pool failure.
This condition is typically a result of a loss of connectivity to the underlying
storage device(s) or a failure of all devices within the pool.
The behavior of such an event is determined as follows:
.Bl -tag -width "continue"
.It Sy wait
Blocks all I/O access until the device connectivity is recovered and the errors
are cleared.
This is the default behavior.
.It Sy continue
Returns
.Er EIO
to any new write I/O requests but allows reads to any of the remaining healthy
devices.
Any write requests that have yet to be committed to disk would be blocked.
.It Sy panic
Prints out a message to the console and generates a system crash dump.
.El
.It Sy feature@ Ns Ar feature_name Ns = Ns Sy enabled
The value of this property is the current state of
.Ar feature_name .
The only valid value when setting this property is
.Sy enabled
which moves
.Ar feature_name
to the enabled state.
See
.Xr zpool-features 5
for details on feature states.
.It Sy listsnapshots Ns = Ns Sy on Ns | Ns Sy off
Controls whether information about snapshots associated with this pool is
output when
.Nm zfs Cm list
is run without the
.Fl t
option.
The default value is
.Sy off .
This property can also be referred to by its shortened name,
.Sy listsnaps .
.It Sy multihost Ns = Ns Sy on Ns | Ns Sy off
|
|
|
|
Controls whether a pool activity check should be performed during
|
|
|
|
.Nm zpool Cm import .
|
|
|
|
When a pool is determined to be active it cannot be imported, even with the
|
|
|
|
.Fl f
|
|
|
|
option. This property is intended to be used in failover configurations
|
|
|
|
where multiple hosts have access to a pool on shared storage. When this
|
|
|
|
property is on, periodic writes to storage occur to show the pool is in use.
|
|
|
|
See
|
|
|
|
.Sy zfs_multihost_interval
|
|
|
|
in the
|
|
|
|
.Xr zfs-module-parameters 5
|
|
|
|
man page. In order to enable this property each host must set a unique hostid.
|
|
|
|
See
|
|
|
|
.Xr genhostid 1 ,
.Xr zgenhostid 8 ,
and
.Xr spl-module-parameters 5
|
|
|
|
for additional details. The default value is
|
|
|
|
.Sy off .
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Sy version Ns = Ns Ar version
|
|
|
|
The current on-disk version of the pool.
|
|
|
|
This can be increased, but never decreased.
|
|
|
|
The preferred method of updating pools is with the
|
|
|
|
.Nm zpool Cm upgrade
|
|
|
|
command, though this property can be used when a specific version is needed for
|
|
|
|
backwards compatibility.
|
|
|
|
Once feature flags are enabled on a pool this property will no longer have a
|
|
|
|
value.
|
|
|
|
.El
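.Pp
As a minimal sketch of enabling the
.Sy multihost
property described above, assuming a hypothetical pool named
.Em tank
and a host that does not yet have a hostid configured:
.Bd -literal
# zgenhostid
# zpool set multihost=on tank
# zpool get multihost tank
.Ed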
|
|
|
|
.Ss Subcommands
|
|
|
|
All subcommands that modify state are logged persistently to the pool in their
|
|
|
|
original form.
|
|
|
|
.Pp
|
|
|
|
The
|
|
|
|
.Nm
|
|
|
|
command provides subcommands to create and destroy storage pools, add capacity
|
|
|
|
to storage pools, and provide information about the storage pools.
|
|
|
|
The following subcommands are supported:
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Fl ?
|
|
|
|
.Xc
|
2009-12-12 00:15:33 +00:00
|
|
|
Displays a help message.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm add
|
|
|
|
.Op Fl fgLnP
|
|
|
|
.Oo Fl o Ar property Ns = Ns Ar value Oc
|
|
|
|
.Ar pool vdev Ns ...
|
|
|
|
.Xc
|
|
|
|
Adds the specified virtual devices to the given pool.
|
|
|
|
The
|
|
|
|
.Ar vdev
|
|
|
|
specification is described in the
|
|
|
|
.Sx Virtual Devices
|
|
|
|
section.
|
|
|
|
The behavior of the
|
|
|
|
.Fl f
|
|
|
|
option, and the device checks performed are described in the
|
|
|
|
.Nm zpool Cm create
|
|
|
|
subcommand.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl f
|
|
|
|
Forces use of
|
|
|
|
.Ar vdev Ns s ,
|
|
|
|
even if they appear in use or specify a conflicting replication level.
|
|
|
|
Not all devices can be overridden in this manner.
|
|
|
|
.It Fl g
|
|
|
|
Display
|
|
|
|
.Ar vdev
|
|
|
|
GUIDs instead of the normal device names. These GUIDs can be used in place of
|
|
|
|
device names for the zpool detach/offline/remove/replace commands.
|
|
|
|
.It Fl L
|
|
|
|
Display real paths for
|
|
|
|
.Ar vdev Ns s
|
|
|
|
resolving all symbolic links. This can be used to look up the current block
|
|
|
|
device name regardless of the /dev/disk/ path used to open it.
|
|
|
|
.It Fl n
|
|
|
|
Displays the configuration that would be used without actually adding the
|
|
|
|
.Ar vdev Ns s .
|
|
|
|
The actual device addition can still fail due to insufficient privileges or
|
|
|
|
device sharing.
|
|
|
|
.It Fl P
|
|
|
|
Display real paths for
|
|
|
|
.Ar vdev Ns s
|
|
|
|
instead of only the last component of the path. This can be used in
|
2018-06-04 16:06:16 +00:00
|
|
|
conjunction with the
|
|
|
|
.Fl L
|
|
|
|
flag.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl o Ar property Ns = Ns Ar value
|
|
|
|
Sets the given pool properties. See the
|
|
|
|
.Sx Properties
|
|
|
|
section for a list of valid properties that can be set. The only property
|
|
|
|
supported at the moment is ashift.
|
|
|
|
.El
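.Pp
For illustration, assuming a hypothetical pool
.Em tank
and two unused disks
.Pa sdc
and
.Pa sdd ,
a dry run followed by the actual addition might look like:
.Bd -literal
# zpool add -n tank mirror sdc sdd
# zpool add tank mirror sdc sdd
.Ed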
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm attach
|
|
|
|
.Op Fl f
|
|
|
|
.Oo Fl o Ar property Ns = Ns Ar value Oc
|
|
|
|
.Ar pool device new_device
|
|
|
|
.Xc
|
|
|
|
Attaches
|
|
|
|
.Ar new_device
|
|
|
|
to the existing
|
|
|
|
.Ar device .
|
|
|
|
The existing device cannot be part of a raidz configuration.
|
|
|
|
If
|
|
|
|
.Ar device
|
|
|
|
is not currently part of a mirrored configuration,
|
|
|
|
.Ar device
|
|
|
|
automatically transforms into a two-way mirror of
|
|
|
|
.Ar device
|
|
|
|
and
|
|
|
|
.Ar new_device .
|
|
|
|
If
|
|
|
|
.Ar device
|
|
|
|
is part of a two-way mirror, attaching
|
|
|
|
.Ar new_device
|
|
|
|
creates a three-way mirror, and so on.
|
|
|
|
In either case,
|
|
|
|
.Ar new_device
|
|
|
|
begins to resilver immediately.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl f
|
|
|
|
Forces use of
|
|
|
|
.Ar new_device ,
|
|
|
|
even if it appears to be in use.
|
|
|
|
Not all devices can be overridden in this manner.
|
|
|
|
.It Fl o Ar property Ns = Ns Ar value
|
|
|
|
Sets the given pool properties. See the
|
|
|
|
.Sx Properties
|
|
|
|
section for a list of valid properties that can be set. The only property
|
|
|
|
supported at the moment is ashift.
|
|
|
|
.El
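.Pp
A sketch of converting a single-disk vdev into a two-way mirror, assuming a
hypothetical pool
.Em tank
that contains
.Pa sda
and an unused disk
.Pa sdb :
.Bd -literal
# zpool attach tank sda sdb
# zpool status tank
.Ed
.Pp
The second command can be used to watch the resilver progress.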
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
2016-12-16 22:11:29 +00:00
|
|
|
.Cm checkpoint
|
|
|
|
.Op Fl d, -discard
|
|
|
|
.Ar pool
|
|
|
|
.Xc
|
|
|
|
Checkpoints the current state of
|
|
|
|
.Ar pool ,
which can later be restored by
|
|
|
|
.Nm zpool Cm import --rewind-to-checkpoint .
|
|
|
|
The existence of a checkpoint in a pool prohibits the following
|
|
|
|
.Nm zpool
|
|
|
|
commands:
|
|
|
|
.Cm remove ,
|
|
|
|
.Cm attach ,
|
|
|
|
.Cm detach ,
|
|
|
|
.Cm split ,
|
|
|
|
and
|
|
|
|
.Cm reguid .
|
|
|
|
In addition, it may break reservation boundaries if the pool lacks free
|
|
|
|
space.
|
|
|
|
The
|
|
|
|
.Nm zpool Cm status
|
|
|
|
command indicates the existence of a checkpoint or the progress of discarding a
|
|
|
|
checkpoint from a pool.
|
|
|
|
The
|
|
|
|
.Nm zpool Cm list
|
|
|
|
command reports how much space the checkpoint takes from the pool.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl d, -discard
|
|
|
|
Discards an existing checkpoint from
|
|
|
|
.Ar pool .
|
|
|
|
.El
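.Pp
The checkpoint workflow could be sketched as follows, assuming a hypothetical
pool
.Em tank .
To take a checkpoint and later discard it once it is no longer needed:
.Bd -literal
# zpool checkpoint tank
# zpool checkpoint -d tank
.Ed
.Pp
Alternatively, to rewind to the checkpoint instead of discarding it, the pool
is exported and then re-imported:
.Bd -literal
# zpool export tank
# zpool import --rewind-to-checkpoint tank
.Ed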
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
2017-06-18 18:27:06 +00:00
|
|
|
.Cm clear
|
|
|
|
.Ar pool
|
|
|
|
.Op Ar device
|
|
|
|
.Xc
|
|
|
|
Clears device errors in a pool.
|
|
|
|
If no devices are specified, all device errors within the pool are cleared.
|
|
|
|
If one or more devices are specified, only those errors associated with the
|
|
|
|
specified device or devices are cleared.
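.Pp
For example, assuming a hypothetical pool
.Em tank
with a member disk
.Pa sda ,
the first command below clears errors on every device and the second only on
.Pa sda :
.Bd -literal
# zpool clear tank
# zpool clear tank sda
.Ed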
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm create
|
|
|
|
.Op Fl dfn
|
|
|
|
.Op Fl m Ar mountpoint
|
|
|
|
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
|
|
|
|
.Oo Fl o Ar feature@feature Ns = Ns Ar value Oc Ns ...
|
|
|
|
.Oo Fl O Ar file-system-property Ns = Ns Ar value Oc Ns ...
|
|
|
|
.Op Fl R Ar root
|
|
|
|
.Op Fl t Ar tname
|
|
|
|
.Ar pool vdev Ns ...
|
|
|
|
.Xc
|
|
|
|
Creates a new storage pool containing the virtual devices specified on the
|
|
|
|
command line.
|
|
|
|
The pool name must begin with a letter, and can only contain
|
|
|
|
alphanumeric characters as well as underscore
|
|
|
|
.Pq Qq Sy _ ,
|
|
|
|
dash
|
2017-09-16 17:51:24 +00:00
|
|
|
.Pq Qq Sy \&- ,
|
2017-06-18 18:27:06 +00:00
|
|
|
colon
|
|
|
|
.Pq Qq Sy \&: ,
|
|
|
|
space
|
2017-09-16 17:51:24 +00:00
|
|
|
.Pq Qq Sy \&\ ,
|
2017-06-18 18:27:06 +00:00
|
|
|
and period
|
|
|
|
.Pq Qq Sy \&. .
|
|
|
|
The pool names
|
|
|
|
.Sy mirror ,
|
|
|
|
.Sy raidz ,
|
|
|
|
.Sy spare
|
|
|
|
and
|
|
|
|
.Sy log
|
2018-02-22 17:31:34 +00:00
|
|
|
are reserved, as are names beginning with
|
|
|
|
.Sy mirror ,
|
|
|
|
.Sy raidz ,
|
|
|
|
.Sy spare ,
|
|
|
|
and the pattern
|
2017-06-18 18:27:06 +00:00
|
|
|
.Sy c[0-9] .
|
|
|
|
The
|
|
|
|
.Ar vdev
|
|
|
|
specification is described in the
|
|
|
|
.Sx Virtual Devices
|
|
|
|
section.
|
|
|
|
.Pp
|
|
|
|
The command verifies that each device specified is accessible and not currently
|
|
|
|
in use by another subsystem.
|
|
|
|
There are some uses, such as being currently mounted, or specified as the
|
|
|
|
dedicated dump device, that prevent a device from ever being used by ZFS.
|
|
|
|
Other uses, such as having a preexisting UFS file system, can be overridden with
|
|
|
|
the
|
|
|
|
.Fl f
|
|
|
|
option.
|
|
|
|
.Pp
|
|
|
|
The command also checks that the replication strategy for the pool is
|
|
|
|
consistent.
|
|
|
|
An attempt to combine redundant and non-redundant storage in a single pool, or
|
|
|
|
to mix disks and files, results in an error unless
|
|
|
|
.Fl f
|
|
|
|
is specified.
|
|
|
|
The use of differently sized devices within a single raidz or mirror group is
|
|
|
|
also flagged as an error unless
|
|
|
|
.Fl f
|
|
|
|
is specified.
|
|
|
|
.Pp
|
|
|
|
Unless the
|
|
|
|
.Fl R
|
|
|
|
option is specified, the default mount point is
|
|
|
|
.Pa / Ns Ar pool .
|
|
|
|
The mount point must not exist or must be empty, or else the root dataset
|
|
|
|
cannot be mounted.
|
|
|
|
This can be overridden with the
|
|
|
|
.Fl m
|
|
|
|
option.
|
|
|
|
.Pp
|
|
|
|
By default all supported features are enabled on the new pool unless the
|
|
|
|
.Fl d
|
|
|
|
option is specified.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl d
|
|
|
|
Do not enable any features on the new pool.
|
|
|
|
Individual features can be enabled by setting their corresponding properties to
|
|
|
|
.Sy enabled
|
|
|
|
with the
|
|
|
|
.Fl o
|
|
|
|
option.
|
|
|
|
See
|
|
|
|
.Xr zpool-features 5
|
|
|
|
for details about feature properties.
|
|
|
|
.It Fl f
|
|
|
|
Forces use of
|
|
|
|
.Ar vdev Ns s ,
|
|
|
|
even if they appear in use or specify a conflicting replication level.
|
|
|
|
Not all devices can be overridden in this manner.
|
|
|
|
.It Fl m Ar mountpoint
|
|
|
|
Sets the mount point for the root dataset.
|
|
|
|
The default mount point is
|
|
|
|
.Pa /pool
|
|
|
|
or
|
|
|
|
.Pa altroot/pool
|
|
|
|
if
|
|
|
|
.Ar altroot
|
|
|
|
is specified.
|
|
|
|
The mount point must be an absolute path,
|
|
|
|
.Sy legacy ,
|
|
|
|
or
|
|
|
|
.Sy none .
|
|
|
|
For more information on dataset mount points, see
|
|
|
|
.Xr zfs 8 .
|
|
|
|
.It Fl n
|
|
|
|
Displays the configuration that would be used without actually creating the
|
|
|
|
pool.
|
|
|
|
The actual pool creation can still fail due to insufficient privileges or
|
|
|
|
device sharing.
|
|
|
|
.It Fl o Ar property Ns = Ns Ar value
|
|
|
|
Sets the given pool properties.
|
|
|
|
See the
|
|
|
|
.Sx Properties
|
|
|
|
section for a list of valid properties that can be set.
|
|
|
|
.It Fl o Ar feature@feature Ns = Ns Ar value
|
|
|
|
Sets the given pool feature. See the
|
|
|
|
.Xr zpool-features 5
|
|
|
|
man page for a list of valid features that can be set.
|
|
|
|
Value can be either disabled or enabled.
|
|
|
|
.It Fl O Ar file-system-property Ns = Ns Ar value
|
|
|
|
Sets the given file system properties in the root file system of the pool.
|
|
|
|
See the
|
|
|
|
.Sx Properties
|
|
|
|
section of
|
|
|
|
.Xr zfs 8
|
|
|
|
for a list of valid properties that can be set.
|
|
|
|
.It Fl R Ar root
|
|
|
|
Equivalent to
|
|
|
|
.Fl o Sy cachefile Ns = Ns Sy none Fl o Sy altroot Ns = Ns Ar root
|
|
|
|
.It Fl t Ar tname
|
|
|
|
Sets the in-core pool name to
|
|
|
|
.Sy tname
|
|
|
|
while the on-disk name will be the name specified as the pool name
|
|
|
|
.Sy pool .
|
|
|
|
This will set the default cachefile property to none. This is intended
|
|
|
|
to handle name space collisions when creating pools for other systems,
|
|
|
|
such as virtual machines or physical machines whose pools live on network
|
|
|
|
block devices.
|
|
|
|
.El
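.Pp
A sketch combining several of these options, assuming hypothetical disks
.Pa sda
and
.Pa sdb
and a pool named
.Em tank .
The values
.Sy ashift Ns = Ns Sy 12
and
.Sy compression Ns = Ns Sy lz4
are illustrative choices, not recommendations.
The first command only previews the layout:
.Bd -literal
# zpool create -n tank mirror sda sdb
# zpool create -o ashift=12 -O compression=lz4 tank mirror sda sdb
.Ed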
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm destroy
|
|
|
|
.Op Fl f
|
|
|
|
.Ar pool
|
|
|
|
.Xc
|
|
|
|
Destroys the given pool, freeing up any devices for other use.
|
|
|
|
This command tries to unmount any active datasets before destroying the pool.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl f
|
2009-12-12 00:15:33 +00:00
|
|
|
Forces any active datasets contained within the pool to be unmounted.
|
2017-06-18 18:27:06 +00:00
|
|
|
.El
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm detach
|
|
|
|
.Ar pool device
|
|
|
|
.Xc
|
|
|
|
Detaches
|
|
|
|
.Ar device
|
|
|
|
from a mirror.
|
|
|
|
The operation is refused if there are no other valid replicas of the data.
|
|
|
|
If
.Ar device
may be re-added to the pool later, consider the
|
|
|
|
.Nm zpool Cm offline
|
|
|
|
command instead.
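.Pp
For instance, assuming a hypothetical mirrored pool
.Em tank
containing
.Pa sdb ,
the first command removes the disk from the mirror, while the second only
takes it offline so that it can be brought back later:
.Bd -literal
# zpool detach tank sdb
# zpool offline tank sdb
.Ed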
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm events
|
2017-10-26 23:49:33 +00:00
|
|
|
.Op Fl vHf Oo Ar pool Oc | Fl c
|
2017-06-18 18:27:06 +00:00
|
|
|
.Xc
|
|
|
|
Lists all recent events generated by the ZFS kernel modules. These events
|
|
|
|
are consumed by the
|
|
|
|
.Xr zed 8
|
|
|
|
and used to automate administrative tasks such as replacing a failed device
|
|
|
|
with a hot spare. For more information about the subclasses and event payloads
|
|
|
|
that can be generated see the
|
|
|
|
.Xr zfs-events 5
|
|
|
|
man page.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl c
|
2015-06-08 13:48:30 +00:00
|
|
|
Clear all previous events.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl f
|
|
|
|
Follow mode.
|
|
|
|
.It Fl H
|
|
|
|
Scripted mode. Do not display headers, and separate fields by a
|
|
|
|
single tab instead of arbitrary space.
|
|
|
|
.It Fl v
|
|
|
|
Print the entire payload for each event.
|
|
|
|
.El
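.Pp
A few illustrative invocations, assuming a hypothetical pool
.Em tank :
print full payloads for events from that pool, follow new events as they
arrive, and clear all previous events.
.Bd -literal
# zpool events -v tank
# zpool events -f
# zpool events -c
.Ed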
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm export
|
|
|
|
.Op Fl a
|
|
|
|
.Op Fl f
|
|
|
|
.Ar pool Ns ...
|
|
|
|
.Xc
|
|
|
|
Exports the given pools from the system.
|
|
|
|
All devices are marked as exported, but are still considered in use by other
|
|
|
|
subsystems.
|
|
|
|
The devices can be moved between systems
|
|
|
|
.Pq even those of different endianness
|
|
|
|
and imported as long as a sufficient number of devices are present.
|
|
|
|
.Pp
|
|
|
|
Before exporting the pool, all datasets within the pool are unmounted.
|
|
|
|
A pool cannot be exported if it has a shared spare that is currently being
|
|
|
|
used.
|
|
|
|
.Pp
|
|
|
|
For pools to be portable, you must give the
|
|
|
|
.Nm
|
|
|
|
command whole disks, not just partitions, so that ZFS can label the disks with
|
|
|
|
portable EFI labels.
|
|
|
|
Otherwise, disk drivers on platforms of different endianness will not recognize
|
|
|
|
the disks.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl a
|
2015-03-20 22:29:14 +00:00
|
|
|
Exports all pools imported on the system.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl f
|
|
|
|
Forcefully unmount all datasets, using the
|
|
|
|
.Nm unmount Fl f
|
|
|
|
command.
|
|
|
|
.Pp
|
|
|
|
This command will forcefully export the pool even if it has a shared spare that
|
|
|
|
is currently being used.
|
|
|
|
This may lead to potential data corruption.
|
|
|
|
.El
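.Pp
For example, assuming a hypothetical pool
.Em tank ,
the following exports that pool alone and then every imported pool:
.Bd -literal
# zpool export tank
# zpool export -a
.Ed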
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm get
|
|
|
|
.Op Fl Hp
|
|
|
|
.Op Fl o Ar field Ns Oo , Ns Ar field Oc Ns ...
|
|
|
|
.Sy all Ns | Ns Ar property Ns Oo , Ns Ar property Oc Ns ...
|
2018-09-18 15:55:33 +00:00
|
|
|
.Oo Ar pool Oc Ns ...
|
2017-06-18 18:27:06 +00:00
|
|
|
.Xc
|
|
|
|
Retrieves the given list of properties
|
|
|
|
.Po
|
|
|
|
or all properties if
|
|
|
|
.Sy all
|
|
|
|
is used
|
|
|
|
.Pc
|
|
|
|
for the specified storage pool(s).
|
|
|
|
These properties are displayed with the following fields:
|
|
|
|
.Bd -literal
|
2016-05-09 21:03:18 +00:00
|
|
|
        name          Name of storage pool
        property      Property name
        value         Property value
        source        Property source, either 'default' or 'local'.
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ed
|
|
|
|
.Pp
|
|
|
|
See the
|
|
|
|
.Sx Properties
|
|
|
|
section for more information on the available pool properties.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl H
|
|
|
|
Scripted mode.
|
|
|
|
Do not display headers, and separate fields by a single tab instead of arbitrary
|
|
|
|
space.
|
|
|
|
.It Fl o Ar field
|
|
|
|
A comma-separated list of columns to display.
|
2017-08-24 17:30:42 +00:00
|
|
|
.Sy name Ns \&, Ns Sy property Ns \&, Ns Sy value Ns \&, Ns Sy source
|
2016-05-09 21:03:18 +00:00
|
|
|
is the default value.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl p
|
|
|
|
Display numbers in parsable (exact) values.
|
|
|
|
.El
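.Pp
As a sketch of script-friendly output, assuming a hypothetical pool
.Em tank
and the pool properties
.Sy size
and
.Sy health :
.Bd -literal
# zpool get -Hp -o name,property,value size,health tank
.Ed
.Pp
Each output line then holds the three requested fields separated by single
tabs.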
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm history
|
|
|
|
.Op Fl il
|
|
|
|
.Oo Ar pool Oc Ns ...
|
|
|
|
.Xc
|
|
|
|
Displays the command history of the specified pool(s) or all pools if no pool is
|
|
|
|
specified.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl i
|
|
|
|
Displays internally logged ZFS events in addition to user initiated events.
|
|
|
|
.It Fl l
|
|
|
|
Displays log records in long format, which in addition to the standard format
includes the user name, the hostname, and the zone in which the operation was
|
|
|
|
performed.
|
|
|
|
.El
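.Pp
For example, assuming a hypothetical pool
.Em tank ,
the following shows both user-initiated and internal events in long format:
.Bd -literal
# zpool history -il tank
.Ed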
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm import
|
|
|
|
.Op Fl D
|
2018-01-26 18:49:46 +00:00
|
|
|
.Op Fl d Ar dir Ns | Ns Ar device
|
2017-06-18 18:27:06 +00:00
|
|
|
.Xc
|
|
|
|
Lists pools available to import.
|
|
|
|
If the
|
|
|
|
.Fl d
|
|
|
|
option is not specified, this command searches for devices in
|
|
|
|
.Pa /dev .
|
|
|
|
The
|
|
|
|
.Fl d
|
|
|
|
option can be specified multiple times, and all directories are searched.
|
|
|
|
If the device appears to be part of an exported pool, this command displays a
|
|
|
|
summary of the pool with the name of the pool, a numeric identifier, as well as
|
|
|
|
the vdev layout and current health of the device for each device or file.
|
|
|
|
Destroyed pools, pools that were previously destroyed with the
|
|
|
|
.Nm zpool Cm destroy
|
|
|
|
command, are not listed unless the
|
|
|
|
.Fl D
|
|
|
|
option is specified.
|
|
|
|
.Pp
|
|
|
|
The numeric identifier is unique, and can be used instead of the pool name when
|
|
|
|
multiple exported pools of the same name are available.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl c Ar cachefile
|
|
|
|
Reads configuration from the given
|
|
|
|
.Ar cachefile
|
|
|
|
that was created with the
|
|
|
|
.Sy cachefile
|
|
|
|
pool property.
|
|
|
|
This
|
|
|
|
.Ar cachefile
|
|
|
|
is used instead of searching for devices.
|
2018-01-26 18:49:46 +00:00
|
|
|
.It Fl d Ar dir Ns | Ns Ar device
|
|
|
|
Uses
|
|
|
|
.Ar device
|
|
|
|
or searches for devices or files in
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar dir .
|
|
|
|
The
|
|
|
|
.Fl d
|
|
|
|
option can be specified multiple times.
|
|
|
|
.It Fl D
|
2009-12-12 00:15:33 +00:00
|
|
|
Lists destroyed pools only.
|
2017-06-18 18:27:06 +00:00
|
|
|
.El
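.Pp
Illustrative invocations, where the directory
.Pa /dev/disk/by-id
is an assumed search location: list importable pools using the default search,
list them from a specific directory, and list destroyed pools only.
.Bd -literal
# zpool import
# zpool import -d /dev/disk/by-id
# zpool import -D
.Ed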
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm import
|
|
|
|
.Fl a
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 17:36:48 +00:00
|
|
|
.Op Fl DflmN
|
2017-06-18 18:27:06 +00:00
|
|
|
.Op Fl F Oo Fl n Oc Oo Fl T Oc Oo Fl X Oc
|
2018-01-26 18:49:46 +00:00
|
|
|
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns Ar device
|
2017-06-18 18:27:06 +00:00
|
|
|
.Op Fl o Ar mntopts
|
|
|
|
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
|
|
|
|
.Op Fl R Ar root
|
|
|
|
.Op Fl s
|
|
|
|
.Xc
|
|
|
|
Imports all pools found in the search directories.
|
|
|
|
Identical to the previous command, except that all pools with a sufficient
|
|
|
|
number of devices available are imported.
|
|
|
|
Destroyed pools, pools that were previously destroyed with the
|
|
|
|
.Nm zpool Cm destroy
|
|
|
|
command, will not be imported unless the
|
|
|
|
.Fl D
|
|
|
|
option is specified.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl a
|
2015-12-17 01:45:15 +00:00
|
|
|
Searches for and imports all pools found.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl c Ar cachefile
|
|
|
|
Reads configuration from the given
|
|
|
|
.Ar cachefile
|
|
|
|
that was created with the
|
|
|
|
.Sy cachefile
|
|
|
|
pool property.
|
|
|
|
This
|
|
|
|
.Ar cachefile
|
|
|
|
is used instead of searching for devices.
|
2018-01-26 18:49:46 +00:00
|
|
|
.It Fl d Ar dir Ns | Ns Ar device
|
|
|
|
Uses
|
|
|
|
.Ar device
|
|
|
|
or searches for devices or files in
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar dir .
|
|
|
|
The
|
|
|
|
.Fl d
|
|
|
|
option can be specified multiple times.
|
|
|
|
This option is incompatible with the
|
|
|
|
.Fl c
|
|
|
|
option.
|
|
|
|
.It Fl D
|
|
|
|
Imports destroyed pools only.
|
|
|
|
The
|
|
|
|
.Fl f
|
|
|
|
option is also required.
|
|
|
|
.It Fl f
|
|
|
|
Forces import, even if the pool appears to be potentially active.
|
|
|
|
.It Fl F
|
|
|
|
Recovery mode for a non-importable pool.
|
|
|
|
Attempt to return the pool to an importable state by discarding the last few
|
|
|
|
transactions.
|
|
|
|
Not all damaged pools can be recovered by using this option.
|
|
|
|
If successful, the data from the discarded transactions is irretrievably lost.
|
|
|
|
This option is ignored if the pool is importable or already imported.
|
|
|
|
.It Fl l
|
|
|
|
Indicates that this command will request encryption keys for all encrypted
|
|
|
|
datasets it attempts to mount as it is bringing the pool online. Note that if
|
|
|
|
any datasets have a
|
|
|
|
.Sy keylocation
|
|
|
|
of
|
|
|
|
.Sy prompt
|
|
|
|
this command will block waiting for the keys to be entered. Without this flag
|
|
|
|
encrypted datasets will be left unavailable until the keys are loaded.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl m
|
2011-07-19 18:22:29 +00:00
|
|
|
Allows a pool to import when there is a missing log device.
|
2017-06-18 18:27:06 +00:00
|
|
|
Recent transactions can be lost because the log device will be discarded.
|
|
|
|
.It Fl n
|
|
|
|
Used with the
|
|
|
|
.Fl F
|
|
|
|
recovery option.
|
|
|
|
Determines whether a non-importable pool can be made importable again, but does
|
|
|
|
not actually perform the pool recovery.
|
|
|
|
For more details about pool recovery mode, see the
|
|
|
|
.Fl F
|
|
|
|
option, above.
|
|
|
|
.It Fl N
|
2011-07-19 18:22:29 +00:00
|
|
|
Import the pool without mounting any file systems.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl o Ar mntopts
|
|
|
|
Comma-separated list of mount options to use when mounting datasets within the
|
|
|
|
pool.
|
|
|
|
See
|
|
|
|
.Xr zfs 8
|
|
|
|
for a description of dataset properties and mount options.
|
|
|
|
.It Fl o Ar property Ns = Ns Ar value
|
|
|
|
Sets the specified property on the imported pool.
|
|
|
|
See the
|
|
|
|
.Sx Properties
|
|
|
|
section for more information on the available pool properties.
|
|
|
|
.It Fl R Ar root
|
|
|
|
Sets the
|
|
|
|
.Sy cachefile
|
|
|
|
property to
|
|
|
|
.Sy none
|
|
|
|
and the
|
|
|
|
.Sy altroot
|
|
|
|
property to
|
|
|
|
.Ar root .
|
2016-12-16 22:11:29 +00:00
|
|
|
.It Fl -rewind-to-checkpoint
|
|
|
|
Rewinds pool to the checkpointed state.
|
|
|
|
Once the pool is imported with this flag there is no way to undo the rewind.
|
|
|
|
All changes and data that were written after the checkpoint are lost!
|
|
|
|
The only exception is when the
|
|
|
|
.Sy readonly
|
|
|
|
mounting option is enabled.
|
|
|
|
In this case, the checkpointed state of the pool is opened and an
|
|
|
|
administrator can see what the pool would look like if they were
|
|
|
|
to fully rewind.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl s
|
|
|
|
Scan using the default search path; the libblkid cache will not be
|
|
|
|
consulted. A custom search path may be specified by setting the
|
|
|
|
ZPOOL_IMPORT_PATH environment variable.
|
|
|
|
.It Fl X
|
|
|
|
Used with the
|
|
|
|
.Fl F
|
|
|
|
recovery option. Determines whether extreme
|
|
|
|
measures to find a valid txg should take place. This allows the pool to
|
|
|
|
be rolled back to a txg which is no longer guaranteed to be consistent.
|
|
|
|
Pools imported at an inconsistent txg may contain uncorrectable
|
|
|
|
checksum errors. For more details about pool recovery mode, see the
|
|
|
|
.Fl F
|
|
|
|
option, above. WARNING: This option can be extremely hazardous to the
|
|
|
|
health of your pool and should only be used as a last resort.
|
|
|
|
.It Fl T
|
|
|
|
Specify the txg to use for rollback. Implies
|
|
|
|
.Fl FX .
|
|
|
|
For more details
|
|
|
|
about pool recovery mode, see the
|
|
|
|
.Fl X
|
|
|
|
option, above. WARNING: This option can be extremely hazardous to the
|
|
|
|
health of your pool and should only be used as a last resort.
|
|
|
|
.El
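.Pp
A sketch of two possible uses, where
.Pa /dev/disk/by-id
is an assumed search directory.
The first imports every pool found there without mounting file systems; the
second only reports whether recovery mode could make damaged pools importable:
.Bd -literal
# zpool import -a -N -d /dev/disk/by-id
# zpool import -a -F -n
.Ed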
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm import
|
|
|
|
.Op Fl Dflm
|
2017-06-18 18:27:06 +00:00
|
|
|
.Op Fl F Oo Fl n Oc Oo Fl t Oc Oo Fl T Oc Oo Fl X Oc
|
2018-01-26 18:49:46 +00:00
|
|
|
.Op Fl c Ar cachefile Ns | Ns Fl d Ar dir Ns | Ns Ar device
|
2017-06-18 18:27:06 +00:00
|
|
|
.Op Fl o Ar mntopts
|
|
|
|
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
|
|
|
|
.Op Fl R Ar root
|
|
|
|
.Op Fl s
|
|
|
|
.Ar pool Ns | Ns Ar id
|
|
|
|
.Op Ar newpool
|
|
|
|
.Xc
|
|
|
|
Imports a specific pool.
|
|
|
|
A pool can be identified by its name or the numeric identifier.
|
|
|
|
If
|
|
|
|
.Ar newpool
|
|
|
|
is specified, the pool is imported using the name
|
|
|
|
.Ar newpool .
|
|
|
|
Otherwise, it is imported with the same name as its exported name.
|
|
|
|
.Pp
|
|
|
|
If a device is removed from a system without running
|
|
|
|
.Nm zpool Cm export
|
|
|
|
first, the device appears as potentially active.
|
|
|
|
It cannot be determined if this was a failed export, or whether the device is
|
|
|
|
really in use from another host.
|
|
|
|
To import a pool in this state, the
|
|
|
|
.Fl f
|
|
|
|
option is required.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl c Ar cachefile
|
|
|
|
Reads configuration from the given
|
|
|
|
.Ar cachefile
|
|
|
|
that was created with the
|
|
|
|
.Sy cachefile
|
|
|
|
pool property.
|
|
|
|
This
|
|
|
|
.Ar cachefile
|
|
|
|
is used instead of searching for devices.
|
2018-01-26 18:49:46 +00:00
|
|
|
.It Fl d Ar dir Ns | Ns Ar device
|
|
|
|
Uses
|
|
|
|
.Ar device
|
|
|
|
or searches for devices or files in
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar dir .
|
|
|
|
The
|
|
|
|
.Fl d
|
|
|
|
option can be specified multiple times.
|
|
|
|
This option is incompatible with the
|
|
|
|
.Fl c
|
|
|
|
option.
|
|
|
|
.It Fl D
|
|
|
|
Imports a destroyed pool.
|
|
|
|
The
|
|
|
|
.Fl f
|
|
|
|
option is also required.
|
|
|
|
.It Fl f
|
2009-12-12 00:15:33 +00:00
|
|
|
Forces import, even if the pool appears to be potentially active.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl F
|
|
|
|
Recovery mode for a non-importable pool.
|
|
|
|
Attempt to return the pool to an importable state by discarding the last few
|
|
|
|
transactions.
|
|
|
|
Not all damaged pools can be recovered by using this option.
|
|
|
|
If successful, the data from the discarded transactions is irretrievably lost.
|
|
|
|
This option is ignored if the pool is importable or already imported.
|
|
|
|
.It Fl l
|
|
|
|
Indicates that this command will request encryption keys for all encrypted
|
|
|
|
datasets it attempts to mount as it is bringing the pool online. Note that if
|
|
|
|
any datasets have a
|
|
|
|
.Sy keylocation
|
|
|
|
of
|
|
|
|
.Sy prompt
|
|
|
|
this command will block waiting for the keys to be entered. Without this flag,
|
|
|
|
encrypted datasets will be left unavailable until the keys are loaded.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl m
|
2011-07-19 18:22:29 +00:00
|
|
|
Allows a pool to import when there is a missing log device.
|
2017-06-18 18:27:06 +00:00
|
|
|
Recent transactions can be lost because the log device will be discarded.
|
|
|
|
.It Fl n
|
|
|
|
Used with the
|
|
|
|
.Fl F
|
|
|
|
recovery option.
|
|
|
|
Determines whether a non-importable pool can be made importable again, but does
|
|
|
|
not actually perform the pool recovery.
|
|
|
|
For more details about pool recovery mode, see the
|
|
|
|
.Fl F
|
|
|
|
option, above.
|
|
|
|
.It Fl o Ar mntopts
|
|
|
|
Comma-separated list of mount options to use when mounting datasets within the
|
|
|
|
pool.
|
|
|
|
See
|
|
|
|
.Xr zfs 8
|
|
|
|
for a description of dataset properties and mount options.
|
|
|
|
.It Fl o Ar property Ns = Ns Ar value
|
|
|
|
Sets the specified property on the imported pool.
|
|
|
|
See the
|
|
|
|
.Sx Properties
|
|
|
|
section for more information on the available pool properties.
|
|
|
|
.It Fl R Ar root
|
|
|
|
Sets the
|
|
|
|
.Sy cachefile
|
|
|
|
property to
|
|
|
|
.Sy none
|
|
|
|
and the
|
|
|
|
.Sy altroot
|
|
|
|
property to
|
|
|
|
.Ar root .
|
|
|
|
.It Fl s
|
|
|
|
Scan using the default search path.
The libblkid cache will not be consulted.
A custom search path may be specified by setting the
ZPOOL_IMPORT_PATH environment variable.
|
|
|
|
.It Fl X
|
|
|
|
Used with the
|
|
|
|
.Fl F
|
|
|
|
recovery option. Determines whether extreme
|
|
|
|
measures to find a valid txg should take place. This allows the pool to
|
|
|
|
be rolled back to a txg which is no longer guaranteed to be consistent.
|
|
|
|
Pools imported at an inconsistent txg may contain uncorrectable
|
|
|
|
checksum errors. For more details about pool recovery mode, see the
|
|
|
|
.Fl F
|
|
|
|
option, above. WARNING: This option can be extremely hazardous to the
|
|
|
|
health of your pool and should only be used as a last resort.
|
|
|
|
.It Fl T
|
|
|
|
Specify the txg to use for rollback. Implies
|
|
|
|
.Fl FX .
|
|
|
|
For more details
|
|
|
|
about pool recovery mode, see the
|
|
|
|
.Fl X
|
|
|
|
option, above. WARNING: This option can be extremely hazardous to the
|
|
|
|
health of your pool and should only be used as a last resort.
|
2017-11-28 17:10:52 +00:00
|
|
|
.It Fl t
|
2017-06-18 18:27:06 +00:00
|
|
|
Used with
|
|
|
|
.Ar newpool .
|
|
|
|
Specifies that
|
|
|
|
.Ar newpool
|
|
|
|
is temporary. Temporary pool names last until export. Ensures that
|
|
|
|
the original pool name will be used in all label updates and therefore
|
|
|
|
is retained upon export.
|
|
|
|
Also sets
.Fl o Sy cachefile Ns = Ns Sy none
when not explicitly specified.
|
|
|
|
.El
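.Pp
As a minimal sketch of the recovery options above, the following shell function previews a
.Fl F
recovery with
.Fl n
before attempting it.
The function name, the ZPOOL override variable, and the reliance on the exit status of the dry run are assumptions made for illustration, not documented behavior.

```shell
# ZPOOL is a hypothetical override so the commands can be previewed
# (for example ZPOOL=echo); it defaults to the real zpool binary.
ZPOOL=${ZPOOL:-zpool}

# try_recovery POOL
# Assumption: "zpool import -F -n" exits non-zero when recovery is not
# possible. Check this on your system before relying on it.
try_recovery() {
    pool=$1
    # -n only reports whether -F could make the pool importable again.
    if "$ZPOOL" import -F -n "$pool"; then
        # The last few transactions are discarded for good at this point.
        "$ZPOOL" import -F "$pool"
    else
        echo "recovery of $pool is not possible with -F alone" >&2
        return 1
    fi
}
```

Running the function with ZPOOL=echo prints the two commands instead of executing them.
The sketch deliberately stops short of
.Fl X
and
.Fl T ,
which should only be used as a last resort.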
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm initialize
|
2018-12-27 21:12:53 +00:00
|
|
|
.Op Fl c | Fl s
|
|
|
|
.Ar pool
|
|
|
|
.Op Ar device Ns ...
|
|
|
|
.Xc
|
|
|
|
Begins initializing by writing to all unallocated regions on the specified
|
|
|
|
devices, or all eligible devices in the pool if no individual devices are
|
|
|
|
specified.
|
|
|
|
Only leaf data or log devices may be initialized.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl c , Fl -cancel
|
|
|
|
Cancel initializing on the specified devices, or all eligible devices if none
|
|
|
|
are specified.
|
|
|
|
If one or more target devices are invalid or are not currently being
|
|
|
|
initialized, the command will fail and no cancellation will occur on any device.
|
|
|
|
.It Fl s , Fl -suspend
|
|
|
|
Suspend initializing on the specified devices, or all eligible devices if none
|
|
|
|
are specified.
|
|
|
|
If one or more target devices are invalid or are not currently being
|
|
|
|
initialized, the command will fail and no suspension will occur on any device.
|
|
|
|
Initializing can then be resumed by running
|
|
|
|
.Nm zpool Cm initialize
|
|
|
|
with no flags on the relevant target devices.
|
|
|
|
.El
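.Pp
The start, suspend, resume, and cancel cycle described above can be sketched as a small shell helper that prints the matching command line.
The helper name and its action keywords are hypothetical; only the
.Fl c
and
.Fl s
flags come from this page.

```shell
# init_cmd ACTION POOL [DEVICE...]
# Prints (does not run) the zpool initialize command for ACTION.
# ACTION keywords are illustrative: start, suspend, resume, cancel.
init_cmd() {
    action=$1
    shift
    case "$action" in
        # Resuming is the same invocation as starting: no flags.
        start|resume) flag="" ;;
        suspend)      flag="-s" ;;
        cancel)       flag="-c" ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    # $flag is intentionally unquoted so an empty flag disappears.
    echo zpool initialize $flag "$@"
}
```

For example, "init_cmd suspend tank sdb" prints "zpool initialize -s tank sdb", where tank and sdb are placeholder names.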
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
2017-06-18 18:27:06 +00:00
|
|
|
.Cm iostat
|
|
|
|
.Op Oo Oo Fl c Ar SCRIPT Oc Oo Fl lq Oc Oc Ns | Ns Fl rw
|
|
|
|
.Op Fl T Sy u Ns | Ns Sy d
|
|
|
|
.Op Fl ghHLpPvy
|
|
|
|
.Oo Oo Ar pool Ns ... Oc Ns | Ns Oo Ar pool vdev Ns ... Oc Ns | Ns Oo Ar vdev Ns ... Oc Oc
|
|
|
|
.Op Ar interval Op Ar count
|
|
|
|
.Xc
|
|
|
|
Displays I/O statistics for the given pools/vdevs. You can pass in a
|
|
|
|
list of pools, a pool and list of vdevs in that pool, or a list of any
|
|
|
|
vdevs from any pool. If no items are specified, statistics for every
|
|
|
|
pool in the system are shown.
|
|
|
|
When given an
|
|
|
|
.Ar interval ,
|
|
|
|
the statistics are printed every
|
|
|
|
.Ar interval
|
|
|
|
seconds until ^C is pressed.
If
.Ar count
is specified, the command exits after
.Ar count
reports are printed.
The first report printed is always
|
|
|
|
the statistics since boot regardless of whether
|
|
|
|
.Ar interval
|
|
|
|
and
|
|
|
|
.Ar count
|
|
|
|
are passed. However, this behavior can be suppressed with the
|
|
|
|
.Fl y
|
|
|
|
flag. Also note that the units of
|
|
|
|
.Sy K ,
|
|
|
|
.Sy M ,
|
|
|
|
.Sy G ...
|
|
|
|
that are printed in the report are in base 1024. To get the raw
|
|
|
|
values, use the
|
|
|
|
.Fl p
|
|
|
|
flag.
|
|
|
|
.Bl -tag -width Ds
|
2017-07-21 00:04:35 +00:00
|
|
|
.It Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ...
|
2017-06-18 18:27:06 +00:00
|
|
|
Run a script (or scripts) on each vdev and include the output as a new column
|
|
|
|
in the
|
|
|
|
.Nm zpool Cm iostat
|
|
|
|
output. Users can run any script found in their
|
|
|
|
.Pa ~/.zpool.d
|
|
|
|
directory or from the system
|
|
|
|
.Pa /etc/zfs/zpool.d
|
2017-07-24 18:53:59 +00:00
|
|
|
directory. Script names containing the slash (/) character are not allowed.
|
|
|
|
The default search path can be overridden by setting the
|
2017-06-18 18:27:06 +00:00
|
|
|
ZPOOL_SCRIPTS_PATH environment variable. A privileged user can run
|
|
|
|
.Fl c
|
|
|
|
if they have the ZPOOL_SCRIPTS_AS_ROOT
|
|
|
|
environment variable set. If a script requires the use of a privileged
|
|
|
|
command, like
|
2017-07-21 00:04:35 +00:00
|
|
|
.Xr smartctl 8 ,
|
|
|
|
then it's recommended you allow the user access to it in
|
2017-06-18 18:27:06 +00:00
|
|
|
.Pa /etc/sudoers
|
|
|
|
or add the user to the
|
|
|
|
.Pa /etc/sudoers.d/zfs
|
|
|
|
file.
|
|
|
|
.Pp
|
|
|
|
If
|
|
|
|
.Fl c
|
|
|
|
is passed without a script name, it prints a list of all scripts.
|
|
|
|
.Fl c
|
2017-07-21 00:04:35 +00:00
|
|
|
also sets verbose mode
|
2017-09-16 17:51:24 +00:00
|
|
|
.No \&( Ns Fl v Ns No \&).
|
2017-06-18 18:27:06 +00:00
|
|
|
.Pp
|
|
|
|
Script output should be in the form of "name=value". The column name is
|
|
|
|
set to "name" and the value is set to "value". Multiple lines can be
|
|
|
|
used to output multiple columns. The first line of output not in the
|
|
|
|
"name=value" format is displayed without a column title, and no more
|
|
|
|
output after that is displayed. This can be useful for printing error
|
|
|
|
messages. Blank or NULL values are printed as a '-' to make output
|
|
|
|
awk-able.
|
|
|
|
.Pp
|
2017-04-21 16:27:04 +00:00
|
|
|
The following environment variables are set before running each script:
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bl -tag -width "VDEV_PATH"
|
|
|
|
.It Sy VDEV_PATH
|
|
|
|
Full path to the vdev
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "VDEV_UPATH"
|
|
|
|
.It Sy VDEV_UPATH
|
|
|
|
Underlying path to the vdev (/dev/sd*). For use with device mapper,
|
|
|
|
multipath, or partitioned vdevs.
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "VDEV_ENC_SYSFS_PATH"
|
|
|
|
.It Sy VDEV_ENC_SYSFS_PATH
|
|
|
|
The sysfs path to the enclosure for the vdev (if any).
|
|
|
|
.El
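.Pp
A minimal sketch of a script for
.Fl c ,
assuming it is saved under a hypothetical name such as upath_base in the user's
.Pa ~/.zpool.d
directory.
It relies only on the "name=value" output format and the VDEV_UPATH variable described above; the column name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical zpool.d script: adds one column holding the basename of
# the underlying device path for each vdev.
upath_column() {
    if [ -z "${VDEV_UPATH:-}" ]; then
        # Not in "name=value" form, so it is shown without a column
        # title and nothing after it would be displayed.
        echo "VDEV_UPATH not set"
        return 0
    fi
    # Column title "upath_base", value such as "sda1".
    echo "upath_base=$(basename "$VDEV_UPATH")"
}

upath_column
```

Emitting several "name=value" lines from the same script would produce several columns.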
|
|
|
|
.It Fl T Sy u Ns | Ns Sy d
|
2009-12-12 00:15:33 +00:00
|
|
|
Display a time stamp.
|
2017-06-18 18:27:06 +00:00
|
|
|
Specify
|
|
|
|
.Sy u
|
|
|
|
for a printed representation of the internal representation of time.
|
|
|
|
See
|
|
|
|
.Xr time 2 .
|
|
|
|
Specify
|
|
|
|
.Sy d
|
|
|
|
for standard date format.
|
|
|
|
See
|
|
|
|
.Xr date 1 .
|
|
|
|
.It Fl g
|
|
|
|
Display vdev GUIDs instead of the normal device names. These GUIDs
|
|
|
|
can be used in place of device names for the zpool
|
|
|
|
detach/offline/remove/replace commands.
|
|
|
|
.It Fl H
|
|
|
|
Scripted mode. Do not display headers, and separate fields by a
|
|
|
|
single tab instead of arbitrary space.
|
|
|
|
.It Fl L
|
|
|
|
Display real paths for vdevs resolving all symbolic links. This can
|
|
|
|
be used to look up the current block device name regardless of the
|
|
|
|
.Pa /dev/disk/
|
|
|
|
path used to open it.
|
|
|
|
.It Fl p
|
|
|
|
Display numbers in parsable (exact) values. Time values are in
|
|
|
|
nanoseconds.
|
|
|
|
.It Fl P
|
|
|
|
Display full paths for vdevs instead of only the last component of
|
|
|
|
the path. This can be used in conjunction with the
|
|
|
|
.Fl L
|
|
|
|
flag.
|
|
|
|
.It Fl r
|
|
|
|
Print request size histograms for the leaf ZIOs. This includes
|
|
|
|
histograms of individual ZIOs (
|
|
|
|
.Ar ind )
|
|
|
|
and aggregate ZIOs (
|
|
|
|
.Ar agg ).
|
|
|
|
These stats can be useful for seeing how well the ZFS IO aggregator is
|
|
|
|
working. Do not confuse these request size stats with the block layer
|
|
|
|
requests; it's possible ZIOs can be broken up before being sent to the
|
|
|
|
block device.
|
|
|
|
.It Fl v
|
|
|
|
Verbose statistics.
Reports usage statistics for individual vdevs within the
|
|
|
|
pool, in addition to the pool-wide statistics.
|
|
|
|
.It Fl y
|
2018-04-30 18:42:58 +00:00
|
|
|
Omit statistics since boot.
|
|
|
|
Normally the first line of output reports the statistics since boot.
|
|
|
|
This option suppresses that first line of output.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl w
|
2018-04-30 18:42:58 +00:00
|
|
|
Display latency histograms:
|
|
|
|
.Pp
|
|
|
|
.Ar total_wait :
|
|
|
|
Total IO time (queuing + disk IO time).
|
|
|
|
.Ar disk_wait :
|
|
|
|
Disk IO time (time reading/writing the disk).
|
|
|
|
.Ar syncq_wait :
|
|
|
|
Amount of time IO spent in synchronous priority queues. Does not include
|
|
|
|
disk time.
|
|
|
|
.Ar asyncq_wait :
|
|
|
|
Amount of time IO spent in asynchronous priority queues. Does not include
|
|
|
|
disk time.
|
|
|
|
.Ar scrub :
|
|
|
|
Amount of time IO spent in scrub queue. Does not include disk time.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl l
|
2016-02-29 18:05:23 +00:00
|
|
|
Include average latency statistics:
|
2017-06-18 18:27:06 +00:00
|
|
|
.Pp
|
|
|
|
.Ar total_wait :
|
2016-02-29 18:05:23 +00:00
|
|
|
Average total IO time (queuing + disk IO time).
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar disk_wait :
|
2016-02-29 18:05:23 +00:00
|
|
|
Average disk IO time (time reading/writing the disk).
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar syncq_wait :
|
|
|
|
Average amount of time IO spent in synchronous priority queues. Does
|
|
|
|
not include disk time.
|
|
|
|
.Ar asyncq_wait :
|
|
|
|
Average amount of time IO spent in asynchronous priority queues.
|
|
|
|
Does not include disk time.
|
|
|
|
.Ar scrub :
|
|
|
|
Average queuing time in scrub queue. Does not include disk time.
|
|
|
|
.It Fl q
|
|
|
|
Include active queue statistics. Each priority queue has both
|
|
|
|
pending (
|
|
|
|
.Ar pend )
|
|
|
|
and active (
|
|
|
|
.Ar activ )
|
|
|
|
IOs. Pending IOs are waiting to
|
|
|
|
be issued to the disk, and active IOs have been issued to disk and are
|
|
|
|
waiting for completion. These stats are broken out by priority queue:
|
|
|
|
.Pp
|
|
|
|
.Ar syncq_read/write :
|
|
|
|
Current number of entries in synchronous priority
|
|
|
|
queues.
|
|
|
|
.Ar asyncq_read/write :
|
2016-02-29 18:05:23 +00:00
|
|
|
Current number of entries in asynchronous priority queues.
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar scrubq_read :
|
2016-02-29 18:05:23 +00:00
|
|
|
Current number of entries in scrub queue.
|
2017-06-18 18:27:06 +00:00
|
|
|
.Pp
|
|
|
|
All queue statistics are instantaneous measurements of the number of
|
|
|
|
entries in the queues. If you specify an interval, the measurements
|
|
|
|
will be sampled from the end of the interval.
|
|
|
|
.El
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm labelclear
|
|
|
|
.Op Fl f
|
|
|
|
.Ar device
|
|
|
|
.Xc
|
|
|
|
Removes ZFS label information from the specified
|
|
|
|
.Ar device .
|
|
|
|
The
|
|
|
|
.Ar device
|
|
|
|
must not be part of an active pool configuration.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl f
|
2013-07-05 11:01:44 +00:00
|
|
|
Treat exported or foreign devices as inactive.
|
2017-06-18 18:27:06 +00:00
|
|
|
.El
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm list
|
|
|
|
.Op Fl HgLpPv
|
|
|
|
.Op Fl o Ar property Ns Oo , Ns Ar property Oc Ns ...
|
|
|
|
.Op Fl T Sy u Ns | Ns Sy d
|
|
|
|
.Oo Ar pool Oc Ns ...
|
|
|
|
.Op Ar interval Op Ar count
|
|
|
|
.Xc
|
|
|
|
Lists the given pools along with a health status and space usage.
|
|
|
|
If no
|
|
|
|
.Ar pool Ns s
|
|
|
|
are specified, all pools in the system are listed.
|
|
|
|
When given an
|
|
|
|
.Ar interval ,
|
|
|
|
the information is printed every
|
|
|
|
.Ar interval
|
|
|
|
seconds until ^C is pressed.
|
|
|
|
If
|
|
|
|
.Ar count
|
|
|
|
is specified, the command exits after
|
|
|
|
.Ar count
|
|
|
|
reports are printed.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl g
|
|
|
|
Display vdev GUIDs instead of the normal device names. These GUIDs
|
|
|
|
can be used in place of device names for the zpool
|
|
|
|
detach/offline/remove/replace commands.
|
|
|
|
.It Fl H
|
|
|
|
Scripted mode.
|
|
|
|
Do not display headers, and separate fields by a single tab instead of arbitrary
|
|
|
|
space.
|
|
|
|
.It Fl o Ar property
|
|
|
|
Comma-separated list of properties to display.
|
|
|
|
See the
|
|
|
|
.Sx Properties
|
|
|
|
section for a list of valid properties.
|
|
|
|
The default list is
|
2018-04-27 23:59:49 +00:00
|
|
|
.Cm name , size , allocated , free , checkpoint , expandsize , fragmentation ,
|
|
|
|
.Cm capacity , dedupratio , health , altroot .
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl L
|
|
|
|
Display real paths for vdevs resolving all symbolic links. This can
|
|
|
|
be used to look up the current block device name regardless of the
|
|
|
|
.Pa /dev/disk/
path used to open it.
|
|
|
|
.It Fl p
|
|
|
|
Display numbers in parsable
|
|
|
|
.Pq exact
|
|
|
|
values.
|
|
|
|
.It Fl P
|
|
|
|
Display full paths for vdevs instead of only the last component of
|
|
|
|
the path. This can be used in conjunction with the
|
2018-06-04 16:06:16 +00:00
|
|
|
.Fl L
|
|
|
|
flag.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl T Sy u Ns | Ns Sy d
|
2013-07-18 22:19:32 +00:00
|
|
|
Display a time stamp.
|
2017-06-18 18:27:06 +00:00
|
|
|
Specify
|
|
|
|
.Sy u
|
|
|
|
for a printed representation of the internal representation of time.
|
|
|
|
See
|
|
|
|
.Xr time 2 .
|
|
|
|
Specify
|
|
|
|
.Sy d
|
|
|
|
for standard date format.
|
|
|
|
See
|
|
|
|
.Xr date 1 .
|
|
|
|
.It Fl v
|
|
|
|
Verbose statistics.
|
|
|
|
Reports usage statistics for individual vdevs within the pool, in addition to
|
|
|
|
the pool-wide statistics.
|
|
|
|
.El
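.Pp
Because
.Fl H
separates fields with a single tab and
.Fl p
prints exact values, the output is easy to post-process.
The following sketch, with a hypothetical function name, reads such output on standard input and prints the allocated percentage per pool.

```shell
# pool_usage
# Expects tab-separated "name size allocated" lines on stdin, as
# produced by: zpool list -H -p -o name,size,allocated
# Prints "name percent" with the percentage truncated to an integer.
pool_usage() {
    awk -F '\t' '$2 > 0 { printf "%s %d\n", $1, ($3 * 100) / $2 }'
}

# Usage (requires an imported pool):
#   zpool list -H -p -o name,size,allocated | pool_usage
```

Pools reporting a size of zero are skipped to avoid a division by zero.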
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm offline
|
|
|
|
.Op Fl f
|
|
|
|
.Op Fl t
|
|
|
|
.Ar pool Ar device Ns ...
|
|
|
|
.Xc
|
|
|
|
Takes the specified physical device offline.
|
|
|
|
While the
|
|
|
|
.Ar device
|
|
|
|
is offline, no attempt is made to read or write to the device.
|
|
|
|
This command is not applicable to spares.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl f
|
|
|
|
Force fault. Instead of offlining the disk, put it into a faulted
|
|
|
|
state. The fault will persist across imports unless the
|
|
|
|
.Fl t
|
|
|
|
flag was specified.
|
|
|
|
.It Fl t
|
|
|
|
Temporary.
|
|
|
|
Upon reboot, the specified physical device reverts to its previous state.
|
|
|
|
.El
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm online
|
|
|
|
.Op Fl e
|
|
|
|
.Ar pool Ar device Ns ...
|
|
|
|
.Xc
|
2009-12-12 00:15:33 +00:00
|
|
|
Brings the specified physical device online.
|
2017-09-15 20:13:52 +00:00
|
|
|
This command is not applicable to spares.
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl e
|
|
|
|
Expand the device to use all available space.
|
|
|
|
If the device is part of a mirror or raidz then all devices must be expanded
|
|
|
|
before the new space will become available to the pool.
|
|
|
|
.El
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm reguid
|
|
|
|
.Ar pool
|
|
|
|
.Xc
|
|
|
|
Generates a new unique identifier for the pool.
|
|
|
|
You must ensure that all devices in this pool are online and healthy before
|
|
|
|
performing this action.
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm reopen
|
2017-10-26 19:26:09 +00:00
|
|
|
.Op Fl n
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ar pool
|
|
|
|
.Xc
|
2013-05-02 23:36:32 +00:00
|
|
|
Reopen all the vdevs associated with the pool.
|
2017-10-26 19:26:09 +00:00
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl n
|
|
|
|
Do not restart an in-progress scrub operation. This is not recommended and can
|
|
|
|
result in partially resilvered devices unless a second scrub is performed.
|
2017-10-27 16:52:18 +00:00
|
|
|
.El
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm remove
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refactoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
.Op Fl np
.Ar pool Ar device Ns ...
.Xc
Removes the specified device from the pool.
This command supports removing hot spare, cache, log, and both mirrored and
non-redundant primary top-level vdevs, including dedup and special vdevs.
When the primary pool storage includes a top-level raidz vdev, only hot spare,
cache, and log devices can be removed.
.Pp
Removing a top-level vdev reduces the total amount of space in the storage pool.
The specified device will be evacuated by copying all allocated space from it to
the other devices in the pool.
In this case, the
.Nm zpool Cm remove
command initiates the removal and returns, while the evacuation continues in
the background.
The removal progress can be monitored with
.Nm zpool Cm status .
If an I/O error is encountered during the removal process, the removal is
cancelled.
The
.Sy device_removal
feature flag must be enabled to remove a top-level vdev; see
.Xr zpool-features 5 .
.Pp
A mirrored top-level device (log or data) can be removed by specifying the
top-level mirror vdev itself.
Non-log devices or data devices that are part of a mirrored configuration can
be removed using the
.Nm zpool Cm detach
command.
.Bl -tag -width Ds
.It Fl n
Do not actually perform the removal ("no-op").
Instead, print the estimated amount of memory that will be used by the
mapping table after the removal completes.
This is nonzero only for top-level vdevs.
.El
.Bl -tag -width Ds
.It Fl p
Used in conjunction with the
.Fl n
flag, displays numbers as parsable (exact) values.
.El
.It Xo
.Nm
.Cm remove
.Fl s
.Ar pool
.Xc
Stops and cancels an in-progress removal of a top-level vdev.
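.Pp
The removal workflow described above can be sketched as a small shell function.
This is only an illustration under stated assumptions: the function name
remove_vdev and the example names tank and mirror-1 are hypothetical, and only
the flags documented above are used.

```shell
# Hypothetical helper: preview, then start, the removal of a top-level vdev.
# $1 is the pool name and $2 the vdev name; both are placeholders.
remove_vdev() {
    pool=$1
    vdev=$2
    # Dry run: print the estimated mapping-table memory as an exact number.
    zpool remove -np "$pool" "$vdev" || return 1
    # Start the evacuation; the command returns while copying continues.
    zpool remove "$pool" "$vdev" || return 1
    # Report the progress of the background evacuation.
    zpool status "$pool"
}
```

A call such as remove_vdev tank mirror-1 would run the three steps in order;
an evacuation that is still in progress could then be cancelled with
.Nm zpool Cm remove Fl s Ar pool .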
.It Xo
.Nm
.Cm replace
.Op Fl f
.Op Fl o Ar property Ns = Ns Ar value
.Ar pool Ar device Op Ar new_device
.Xc
Replaces
.Ar device
with
.Ar new_device .
This is equivalent to attaching
.Ar new_device ,
waiting for it to resilver, and then detaching
.Ar device .
.Pp
The size of
.Ar new_device
must be greater than or equal to the minimum size of all the devices in a mirror
or raidz configuration.
.Pp
.Ar new_device
is required if the pool is not redundant.
If
.Ar new_device
is not specified, it defaults to
.Ar device .
This form of replacement is useful after an existing disk has failed and has
been physically replaced.
In this case, the new disk may have the same
.Pa /dev
path as the old device, even though it is actually a different disk.
ZFS recognizes this.
.Bl -tag -width Ds
.It Fl f
Forces use of
.Ar new_device ,
even if it appears to be in use.
Not all devices can be overridden in this manner.
.It Fl o Ar property Ns = Ns Ar value
Sets the given pool properties.
See the
.Sx Properties
section for a list of valid properties that can be set.
The only property supported at the moment is
.Sy ashift .
.El
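.Pp
The two forms of replacement described above, with and without a new device,
can be sketched as follows.
This is a minimal illustration, not part of zpool: the function name
replace_device and the device names in the usage note are hypothetical.

```shell
# Hypothetical helper: replace a device in a pool, optionally with a new one.
# $1 is the pool, $2 the device to replace, $3 (optional) the new device.
replace_device() {
    pool=$1
    old=$2
    new=${3:-}
    if [ -n "$new" ]; then
        # Attach the new device, resilver, then detach the old one.
        zpool replace "$pool" "$old" "$new"
    else
        # No new device given: the disk was swapped in place at the same path.
        zpool replace "$pool" "$old"
    fi
}
```

For instance, replace_device tank sda sdb would replace sda with sdb, while
replace_device tank sda covers a disk that was physically swapped and kept its
path.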
.It Xo
.Nm
.Cm scrub
.Op Fl s | Fl p
.Ar pool Ns ...
.Xc
Begins a scrub or resumes a paused scrub.
The scrub examines all data in the specified pools to verify that it checksums
correctly.
For replicated
.Pq mirror or raidz
devices, ZFS automatically repairs any damage discovered during the scrub.
The
.Nm zpool Cm status
command reports the progress of the scrub and summarizes the results of the
scrub upon completion.
.Pp
Scrubbing and resilvering are very similar operations.
The difference is that resilvering only examines data that ZFS knows to be out
of date
.Po
for example, when attaching a new device to a mirror or replacing an existing
device
.Pc ,
whereas scrubbing examines all data to discover silent errors due to hardware
faults or disk failure.
.Pp
Because scrubbing and resilvering are I/O-intensive operations, ZFS only allows
one at a time.
If a scrub is paused, the
.Nm zpool Cm scrub
command resumes it.
If a resilver is in progress, ZFS does not allow a scrub to be started until the
resilver completes.
.Bl -tag -width Ds
.It Fl s
Stop scrubbing.
.El
.Bl -tag -width Ds
.It Fl p
Pause scrubbing.
Scrub pause state and progress are periodically synced to disk.
If the system is restarted or the pool is exported during a paused scrub,
the scrub remains paused, even after import, until it is resumed.
Once resumed, the scrub picks up from the place where it was last
checkpointed to disk.
To resume a paused scrub, issue
.Nm zpool Cm scrub
again.
.El
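.Pp
The start, pause, resume and stop behaviour described above can be summarised
in a small wrapper.
This is a sketch under stated assumptions: the function name scrub_ctl and its
action words are hypothetical, and it only maps them onto the documented flags.

```shell
# Hypothetical wrapper mapping an action word to the documented scrub flags.
# $1 is the action (start, resume, pause, stop); $2 is the pool name.
scrub_ctl() {
    action=$1
    pool=$2
    case $action in
        start|resume) zpool scrub "$pool" ;;     # begins or resumes a scrub
        pause)        zpool scrub -p "$pool" ;;  # pause; state is kept on disk
        stop)         zpool scrub -s "$pool" ;;  # stop scrubbing
        *)            echo "usage: scrub_ctl start|resume|pause|stop pool" >&2
                      return 2 ;;
    esac
}
```

Note that start and resume issue the same command, which matches the text
above: a plain scrub request resumes a paused scrub.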
.It Xo
.Nm
.Cm resilver
.Ar pool Ns ...
.Xc
Starts a resilver.
If an existing resilver is already running it will be restarted from the
beginning.
Any drives that were scheduled for a deferred resilver will be added to the
new one.
.It Xo
.Nm
.Cm set
.Ar property Ns = Ns Ar value
.Ar pool
.Xc
Sets the given property on the specified pool.
See the
.Sx Properties
section for more information on what properties can be set and acceptable
values.
.It Xo
.Nm
.Cm split
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
.Op Fl gLlnP
.Oo Fl o Ar property Ns = Ns Ar value Oc Ns ...
.Op Fl R Ar root
.Ar pool newpool
.Op Ar device ...
.Xc
Splits devices off
.Ar pool
creating
.Ar newpool .
All vdevs in
.Ar pool
must be mirrors and the pool must not be in the process of resilvering.
At the time of the split,
.Ar newpool
will be a replica of
.Ar pool .
By default, the
last device in each mirror is split from
.Ar pool
to create
.Ar newpool .
.Pp
The optional device specification causes the specified device(s) to be
included in the new
.Ar pool
and, should any devices remain unspecified,
the last device in each mirror is used as would be by default.
.Bl -tag -width Ds
.It Fl g
Display vdev GUIDs instead of the normal device names.
These GUIDs can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl L
Display real paths for vdevs resolving all symbolic links.
This can be used to look up the current block device name regardless of the
.Pa /dev/disk/
path used to open it.
.It Fl l
Indicates that this command will request encryption keys for all encrypted
datasets it attempts to mount as it is bringing the new pool online.
Note that if any datasets have a
.Sy keylocation
of
.Sy prompt
this command will block waiting for the keys to be entered.
Without this flag, encrypted datasets will be left unavailable until the keys
are loaded.
.It Fl n
Do a dry run; do not actually perform the split.
Print out the expected configuration of
.Ar newpool .
.It Fl P
Display full paths for vdevs instead of only the last component of
the path.
This can be used in conjunction with the
.Fl L
flag.
.It Fl o Ar property Ns = Ns Ar value
Sets the specified property for
.Ar newpool .
See the
.Sx Properties
section for more information on the available pool properties.
.It Fl R Ar root
Set
.Sy altroot
for
.Ar newpool
to
.Ar root
and automatically import it.
.El
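.Pp
A cautious split can preview the result before committing to it.
The following is a minimal sketch, not part of zpool: the function name
split_pool and the pool names and path in the usage note are hypothetical, and
only the documented flags are used.

```shell
# Hypothetical helper: preview a split, then perform it under an alternate root.
# $1 is the existing mirrored pool, $2 the new pool name, $3 the altroot.
split_pool() {
    pool=$1
    newpool=$2
    root=$3
    # Dry run: print the expected configuration of the new pool.
    zpool split -n "$pool" "$newpool" || return 1
    # Split off the last device of each mirror and import the new pool
    # with altroot set to $root.
    zpool split -R "$root" "$pool" "$newpool"
}
```

For example, split_pool tank tank2 /mnt would show the planned layout of tank2
and then create and import it under /mnt.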
.It Xo
.Nm
.Cm status
.Op Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ...
.Op Fl DigLpPsvx
.Op Fl T Sy u Ns | Ns Sy d
.Oo Ar pool Oc Ns ...
.Op Ar interval Op Ar count
.Xc
Displays the detailed health status for the given pools.
If no
.Ar pool
is specified, then the status of each pool in the system is displayed.
For more information on pool and device health, see the
.Sx Device Failure and Recovery
section.
.Pp
If a scrub or resilver is in progress, this command reports the percentage done
and the estimated time to completion.
Both of these are only approximate, because the amount of data in the pool and
the other workloads on the system can change.
.Bl -tag -width Ds
.It Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns ...
Run a script (or scripts) on each vdev and include the output as a new column
in the
.Nm zpool Cm status
output.
See the
.Fl c
option of
.Nm zpool Cm iostat
for complete details.
.It Fl i
Display vdev initialization status.
.It Fl g
Display vdev GUIDs instead of the normal device names.
These GUIDs can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl L
Display real paths for vdevs resolving all symbolic links.
This can be used to look up the current block device name regardless of the
.Pa /dev/disk/
path used to open it.
.It Fl p
Display numbers in parsable (exact) values.
.It Fl P
Display full paths for vdevs instead of only the last component of
the path.
This can be used in conjunction with the
.Fl L
flag.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk
and referenced
.Pq logically referenced in the pool
block counts and sizes by reference count.
.It Fl s
Display the number of leaf VDEV slow IOs.
This is the number of IOs that did not complete in
.Sy zio_slow_io_ms
milliseconds (default 30 seconds).
This does not necessarily mean the IOs failed to complete, only that they took
an unreasonably long amount of time.
This may indicate a problem with the underlying storage.
.It Fl T Sy u Ns | Ns Sy d
Display a time stamp.
Specify
.Sy u
for a printed representation of the internal representation of time.
See
.Xr time 2 .
Specify
.Sy d
for standard date format.
See
.Xr date 1 .
.It Fl v
Displays verbose data error information, printing out a complete list of all
data errors since the last complete pool scrub.
.It Fl x
Only display status for pools that are exhibiting errors or are otherwise
unavailable.
Warnings about pools not using the latest on-disk format will not be included.
.El
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm sync
|
|
|
|
.Op Ar pool ...
|
|
|
|
.Xc
|
|
|
|
This command forces all in-core dirty data to be written to the primary
|
|
|
|
pool storage and not the ZIL. It will also update administrative
|
|
|
|
information including quota reporting. Without arguments,
|
|
|
|
.Sy zpool sync
|
|
|
|
will sync all pools on the system. Otherwise, it will sync only the
|
|
|
|
specified pool(s).
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm upgrade
|
|
|
|
.Xc
|
|
|
|
Displays pools which do not have all supported features enabled and pools
|
|
|
|
formatted using a legacy ZFS version number.
|
|
|
|
These pools can continue to be used, but some features may not be available.
|
|
|
|
Use
|
|
|
|
.Nm zpool Cm upgrade Fl a
|
|
|
|
to enable all features on all pools.
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm upgrade
|
|
|
|
.Fl v
|
|
|
|
.Xc
|
|
|
|
Displays legacy ZFS versions supported by the current software.
|
|
|
|
See
|
|
|
|
.Xr zpool-features 5
|
|
|
|
for a description of the feature flags supported by the current software.
|
|
|
|
.It Xo
|
|
|
|
.Nm
|
|
|
|
.Cm upgrade
|
|
|
|
.Op Fl V Ar version
|
|
|
|
.Fl a Ns | Ns Ar pool Ns ...
|
|
|
|
.Xc
|
|
|
|
Enables all supported features on the given pool.
|
|
|
|
Once this is done, the pool will no longer be accessible on systems that do not
|
|
|
|
support feature flags.
|
|
|
|
See
|
2018-09-21 16:41:08 +00:00
|
|
|
.Xr zpool-features 5
|
2017-06-18 18:27:06 +00:00
|
|
|
for details on compatibility with systems that support feature flags, but do not
|
|
|
|
support all features enabled on the pool.
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Fl a
|
2012-12-14 23:00:45 +00:00
|
|
|
Enables all supported features on all pools.
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Fl V Ar version
|
|
|
|
Upgrade to the specified legacy version.
|
|
|
|
If the
|
|
|
|
.Fl V
|
|
|
|
flag is specified, no features will be enabled on the pool.
|
|
|
|
This option can only be used to increase the version number up to the last
|
|
|
|
supported legacy version number.
|
|
|
|
.El
|
|
|
|
.El
|
|
|
|
.Sh EXIT STATUS
|
|
|
|
The following exit values are returned:
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Sy 0
|
|
|
|
Successful completion.
|
|
|
|
.It Sy 1
|
|
|
|
An error occurred.
|
|
|
|
.It Sy 2
|
|
|
|
Invalid command line options were specified.
|
|
|
|
.El
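.Pp
As a minimal sketch, a calling script might branch on these values as shown
below.
The function name
.Sy zpool_exit_reason
is hypothetical and not part of
.Nm ;
it only maps the three documented exit values to short messages.
.Bd -literal
#!/bin/sh
# Hypothetical helper: translate a zpool exit value into a message.
zpool_exit_reason() {
    case "$1" in
        0) echo "success" ;;
        1) echo "error" ;;
        2) echo "invalid command line options" ;;
        *) echo "undocumented exit value" ;;
    esac
}

# Usage sketch: run any zpool subcommand, then inspect $?.
# zpool status -x; zpool_exit_reason $?
.Ed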
|
|
|
|
.Sh EXAMPLES
|
|
|
|
.Bl -tag -width Ds
|
|
|
|
.It Sy Example 1 No Creating a RAID-Z Storage Pool
|
|
|
|
The following command creates a pool with a single raidz root vdev that
|
|
|
|
consists of six disks.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool create tank raidz sda sdb sdc sdd sde sdf
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 2 No Creating a Mirrored Storage Pool
|
|
|
|
The following command creates a pool with two mirrors, where each mirror
|
|
|
|
contains two disks.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool create tank mirror sda sdb mirror sdc sdd
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 3 No Creating a ZFS Storage Pool by Using Partitions
|
2011-04-09 03:27:25 +00:00
|
|
|
The following command creates an unmirrored pool using two disk partitions.
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bd -literal
|
|
|
|
# zpool create tank sda1 sdb2
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 4 No Creating a ZFS Storage Pool by Using Files
|
|
|
|
The following command creates an unmirrored pool using files.
|
|
|
|
While not recommended, a pool based on files can be useful for experimental
|
|
|
|
purposes.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool create tank /path/to/file/a /path/to/file/b
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 5 No Adding a Mirror to a ZFS Storage Pool
|
|
|
|
The following command adds two mirrored disks to the pool
|
|
|
|
.Em tank ,
|
|
|
|
assuming the pool is already made up of two-way mirrors.
|
|
|
|
The additional space is immediately available to any datasets within the pool.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool add tank mirror sda sdb
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 6 No Listing Available ZFS Storage Pools
|
|
|
|
The following command lists all available pools on the system.
|
|
|
|
In this case, the pool
|
|
|
|
.Em zion
|
|
|
|
is faulted due to a missing device.
|
2009-12-12 00:15:33 +00:00
|
|
|
The results from this command are similar to the following:
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bd -literal
|
|
|
|
# zpool list
|
2018-02-28 16:54:53 +00:00
|
|
|
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
|
|
|
|
rpool 19.9G 8.43G 11.4G - 33% 42% 1.00x ONLINE -
|
|
|
|
tank 61.5G 20.0G 41.5G - 48% 32% 1.00x ONLINE -
|
|
|
|
zion - - - - - - - FAULTED -
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ed
|
|
|
|
.It Sy Example 7 No Destroying a ZFS Storage Pool
|
|
|
|
The following command destroys the pool
|
|
|
|
.Em tank
|
|
|
|
and any datasets contained within.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool destroy -f tank
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 8 No Exporting a ZFS Storage Pool
|
|
|
|
The following command exports the devices in pool
|
|
|
|
.Em tank
|
|
|
|
so that they can be relocated or later imported.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool export tank
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 9 No Importing a ZFS Storage Pool
|
|
|
|
The following command displays available pools, and then imports the pool
|
|
|
|
.Em tank
|
|
|
|
for use on the system.
|
2009-12-12 00:15:33 +00:00
|
|
|
The results from this command are similar to the following:
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bd -literal
|
|
|
|
# zpool import
|
2009-12-12 00:15:33 +00:00
|
|
|
pool: tank
|
|
|
|
id: 15451357997522795478
|
|
|
|
state: ONLINE
|
|
|
|
action: The pool can be imported using its name or numeric identifier.
|
|
|
|
config:
|
|
|
|
|
|
|
|
tank ONLINE
|
|
|
|
mirror ONLINE
|
2011-04-09 03:27:25 +00:00
|
|
|
sda ONLINE
|
|
|
|
sdb ONLINE
|
2009-12-12 00:15:33 +00:00
|
|
|
|
2017-06-18 18:27:06 +00:00
|
|
|
# zpool import tank
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 10 No Upgrading All ZFS Storage Pools to the Current Version
|
|
|
|
The following command upgrades all ZFS storage pools to the current version of
|
|
|
|
the software.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool upgrade -a
|
|
|
|
This system is currently running ZFS version 2.
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 11 No Managing Hot Spares
|
2009-12-12 00:15:33 +00:00
|
|
|
The following command creates a new pool with an available hot spare:
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bd -literal
|
|
|
|
# zpool create tank mirror sda sdb spare sdc
|
|
|
|
.Ed
|
|
|
|
.Pp
|
|
|
|
If one of the disks were to fail, the pool would be reduced to the degraded
|
|
|
|
state.
|
|
|
|
The failed device can be replaced using the following command:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool replace tank sda sdd
|
|
|
|
.Ed
|
|
|
|
.Pp
|
|
|
|
Once the data has been resilvered, the spare is automatically removed and is
|
2017-09-15 20:13:52 +00:00
|
|
|
made available for use should another device fail.
|
2017-06-18 18:27:06 +00:00
|
|
|
The hot spare can be permanently removed from the pool using the following
|
|
|
|
command:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool remove tank sdc
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 12 No Creating a ZFS Pool with Mirrored Separate Intent Logs
|
|
|
|
The following command creates a ZFS storage pool consisting of two two-way
|
|
|
|
mirrors and mirrored log devices:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool create pool mirror sda sdb mirror sdc sdd log mirror \\
|
|
|
|
sde sdf
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 13 No Adding Cache Devices to a ZFS Pool
|
|
|
|
The following command adds two disks for use as cache devices to a ZFS storage
|
|
|
|
pool:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool add pool cache sdc sdd
|
|
|
|
.Ed
|
|
|
|
.Pp
|
|
|
|
Once added, the cache devices gradually fill with content from main memory.
|
|
|
|
Depending on the size of your cache devices, it could take over an hour for
|
|
|
|
them to fill.
|
|
|
|
Capacity and reads can be monitored using the
|
|
|
|
.Cm iostat
|
|
|
|
option as follows:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool iostat -v pool 5
|
|
|
|
.Ed
|
|
|
|
.It Sy Example 14 No Removing a Mirrored top-level (Log or Data) Device
|
|
|
|
The following commands remove the mirrored log device
|
|
|
|
.Sy mirror-2
|
|
|
|
and mirrored top-level data device
|
|
|
|
.Sy mirror-1 .
|
|
|
|
.Pp
|
2009-12-12 00:15:33 +00:00
|
|
|
Given this configuration:
|
2017-06-18 18:27:06 +00:00
|
|
|
.Bd -literal
|
|
|
|
pool: tank
|
|
|
|
state: ONLINE
|
|
|
|
scrub: none requested
|
2009-12-12 00:15:33 +00:00
|
|
|
config:
|
|
|
|
|
|
|
|
NAME STATE READ WRITE CKSUM
|
|
|
|
tank ONLINE 0 0 0
|
|
|
|
mirror-0 ONLINE 0 0 0
|
2011-04-09 03:27:25 +00:00
|
|
|
sda ONLINE 0 0 0
|
|
|
|
sdb ONLINE 0 0 0
|
2009-12-12 00:15:33 +00:00
|
|
|
mirror-1 ONLINE 0 0 0
|
2011-04-09 03:27:25 +00:00
|
|
|
sdc ONLINE 0 0 0
|
|
|
|
sdd ONLINE 0 0 0
|
2009-12-12 00:15:33 +00:00
|
|
|
logs
|
|
|
|
mirror-2 ONLINE 0 0 0
|
2011-04-09 03:27:25 +00:00
|
|
|
sde ONLINE 0 0 0
|
|
|
|
sdf ONLINE 0 0 0
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ed
|
|
|
|
.Pp
|
|
|
|
The command to remove the mirrored log
|
|
|
|
.Sy mirror-2
|
|
|
|
is:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool remove tank mirror-2
|
|
|
|
.Ed
|
|
|
|
.Pp
|
|
|
|
The command to remove the mirrored data
|
|
|
|
.Sy mirror-1
|
|
|
|
is:
|
|
|
|
.Bd -literal
|
|
|
|
# zpool remove tank mirror-1
|
|
|
|
.Ed
|
2017-06-18 18:27:06 +00:00
|
|
|
.It Sy Example 15 No Displaying expanded space on a device
|
|
|
|
The following command displays the detailed information for the pool
|
|
|
|
.Em data .
|
|
|
|
This pool consists of a single raidz vdev where one of its devices
|
|
|
|
increased its capacity by 10GB.
|
|
|
|
In this example, the pool will not be able to utilize this extra capacity until
|
|
|
|
all the devices under the raidz vdev have been expanded.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool list -v data
|
2018-02-28 16:54:53 +00:00
|
|
|
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
|
|
|
|
data 23.9G 14.6G 9.30G - 48% 61% 1.00x ONLINE -
|
|
|
|
raidz1 23.9G 14.6G 9.30G - 48%
|
|
|
|
sda - - - - -
|
|
|
|
sdb - - - 10G -
|
|
|
|
sdc - - - - -
|
2017-06-18 18:27:06 +00:00
|
|
|
.Ed
|
|
|
|
.It Sy Example 16 No Adding output columns
|
|
|
|
Additional columns can be added to the
|
|
|
|
.Nm zpool Cm status
|
|
|
|
and
|
|
|
|
.Nm zpool Cm iostat
|
|
|
|
output with the
|
|
|
|
.Fl c
|
|
|
|
option.
|
|
|
|
.Bd -literal
|
|
|
|
# zpool status -c vendor,model,size
|
|
|
|
NAME STATE READ WRITE CKSUM vendor model size
|
|
|
|
tank ONLINE 0 0 0
|
|
|
|
mirror-0 ONLINE 0 0 0
|
|
|
|
U1 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
|
|
|
|
U10 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
|
|
|
|
U11 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
|
|
|
|
U12 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
|
|
|
|
U13 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
|
|
|
|
U14 ONLINE 0 0 0 SEAGATE ST8000NM0075 7.3T
|
|
|
|
|
|
|
|
# zpool iostat -vc slaves
|
|
|
|
capacity operations bandwidth
|
|
|
|
pool alloc free read write read write slaves
|
|
|
|
---------- ----- ----- ----- ----- ----- ----- ---------
|
|
|
|
tank 20.4G 7.23T 26 152 20.7M 21.6M
|
|
|
|
mirror 20.4G 7.23T 26 152 20.7M 21.6M
|
|
|
|
U1 - - 0 31 1.46K 20.6M sdb sdff
|
|
|
|
U10 - - 0 1 3.77K 13.3K sdas sdgw
|
|
|
|
U11 - - 0 1 288K 13.3K sdat sdgx
|
|
|
|
U12 - - 0 1 78.4K 13.3K sdau sdgy
|
|
|
|
U13 - - 0 1 128K 13.3K sdav sdgz
|
|
|
|
U14 - - 0 1 63.2K 13.3K sdfk sdg
|
|
|
|
.Ed
|
|
|
|
.El
|
|
|
|
.Sh ENVIRONMENT VARIABLES
|
|
|
|
.Bl -tag -width "ZFS_ABORT"
|
|
|
|
.It Ev ZFS_ABORT
|
|
|
|
Cause
|
|
|
|
.Nm zpool
|
|
|
|
to dump core on exit for the purposes of running
|
2017-09-16 17:51:24 +00:00
|
|
|
.Sy ::findleaks .
|
2017-06-18 18:27:06 +00:00
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_IMPORT_PATH"
|
|
|
|
.It Ev ZPOOL_IMPORT_PATH
|
|
|
|
The search path for devices or files to use with the pool. This is a colon-separated list of directories in which
|
|
|
|
.Nm zpool
|
|
|
|
looks for device nodes and files.
|
|
|
|
Similar to the
|
|
|
|
.Fl d
|
|
|
|
option in
|
|
|
|
.Nm zpool import .
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_VDEV_NAME_GUID"
|
|
|
|
.It Ev ZPOOL_VDEV_NAME_GUID
|
|
|
|
Cause
|
|
|
|
.Nm zpool
subcommands to output vdev GUIDs by default. This behavior
|
|
|
|
is identical to the
|
|
|
|
.Nm zpool status -g
|
|
|
|
command line option.
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_VDEV_NAME_FOLLOW_LINKS"
|
|
|
|
.It Ev ZPOOL_VDEV_NAME_FOLLOW_LINKS
|
|
|
|
Cause
|
|
|
|
.Nm zpool
|
|
|
|
subcommands to follow links for vdev names by default. This behavior is identical to the
|
|
|
|
.Nm zpool status -L
|
|
|
|
command line option.
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_VDEV_NAME_PATH"
|
|
|
|
.It Ev ZPOOL_VDEV_NAME_PATH
|
|
|
|
Cause
|
|
|
|
.Nm zpool
|
|
|
|
subcommands to output full vdev path names by default. This
|
|
|
|
behavior is identical to the
|
|
|
|
.Nm zpool status -P
|
|
|
|
command line option.
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZFS_VDEV_DEVID_OPT_OUT"
|
|
|
|
.It Ev ZFS_VDEV_DEVID_OPT_OUT
|
2016-03-14 16:04:21 +00:00
|
|
|
Older ZFS on Linux implementations had issues when attempting to display pool
|
2017-06-18 18:27:06 +00:00
|
|
|
config VDEV names if a
|
|
|
|
.Sy devid
|
|
|
|
NVP value is present in the pool's config.
|
|
|
|
.Pp
|
2016-03-14 16:04:21 +00:00
|
|
|
For example, a pool that originated on an illumos platform would have a devid
|
2017-06-18 18:27:06 +00:00
|
|
|
value in the config and
|
|
|
|
.Nm zpool status
|
|
|
|
would fail when listing the config.
|
2016-03-14 16:04:21 +00:00
|
|
|
This would also be true for future Linux-based pools.
|
2017-06-18 18:27:06 +00:00
|
|
|
.Pp
|
|
|
|
A pool can be stripped of any
|
|
|
|
.Sy devid
|
|
|
|
values on import or prevented from adding
|
|
|
|
them on
|
|
|
|
.Nm zpool create
|
|
|
|
or
|
|
|
|
.Nm zpool add
|
|
|
|
by setting
|
|
|
|
.Sy ZFS_VDEV_DEVID_OPT_OUT .
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_SCRIPTS_AS_ROOT"
|
|
|
|
.It Ev ZPOOL_SCRIPTS_AS_ROOT
|
2017-07-21 00:04:35 +00:00
|
|
|
Allow a privileged user to run
|
2017-06-18 18:27:06 +00:00
|
|
|
.Nm zpool status/iostat
|
|
|
|
with the
|
|
|
|
.Fl c
|
2017-07-21 00:04:35 +00:00
|
|
|
option. Normally, only unprivileged users are allowed to run
|
2017-06-18 18:27:06 +00:00
|
|
|
.Fl c .
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_SCRIPTS_PATH"
|
|
|
|
.It Ev ZPOOL_SCRIPTS_PATH
|
|
|
|
The search path for scripts when running
|
|
|
|
.Nm zpool status/iostat
|
|
|
|
with the
|
|
|
|
.Fl c
|
2017-06-05 17:52:15 +00:00
|
|
|
option. This is a colon-separated list of directories and overrides the default
|
2017-06-18 18:27:06 +00:00
|
|
|
.Pa ~/.zpool.d
|
|
|
|
and
|
|
|
|
.Pa /etc/zfs/zpool.d
|
|
|
|
search paths.
|
|
|
|
.El
|
|
|
|
.Bl -tag -width "ZPOOL_SCRIPTS_ENABLED"
|
|
|
|
.It Ev ZPOOL_SCRIPTS_ENABLED
|
|
|
|
Allow a user to run
|
|
|
|
.Nm zpool status/iostat
|
|
|
|
with the
|
|
|
|
.Fl c
|
|
|
|
option. If
|
|
|
|
.Sy ZPOOL_SCRIPTS_ENABLED
|
|
|
|
is not set, it is assumed that the user is allowed to run
|
|
|
|
.Nm zpool status/iostat -c .
|
2017-09-16 17:51:24 +00:00
|
|
|
.El
|
2017-06-18 18:27:06 +00:00
|
|
|
.Sh INTERFACE STABILITY
|
|
|
|
.Sy Evolving
|
|
|
|
.Sh SEE ALSO
|
|
|
|
.Xr zfs-events 5 ,
|
|
|
|
.Xr zfs-module-parameters 5 ,
|
2017-09-16 17:51:24 +00:00
|
|
|
.Xr zpool-features 5 ,
|
|
|
|
.Xr zed 8 ,
|
|
|
|
.Xr zfs 8
|