zfs/man/man7/zpool-features.7

923 lines
29 KiB
Groff
Raw Normal View History

.\"
Implement Redacted Send/Receive Redacted send/receive allows users to send subsets of their data to a target system. One possible use case for this feature is to not transmit sensitive information to a data warehousing, test/dev, or analytics environment. Another is to save space by not replicating unimportant data within a given dataset, for example in backup tools like zrepl. Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots). Second, the new zfs redact command is used to create a redaction bookmark. The redaction bookmark stores the list of blocks in a snapshot that were modified by the redaction snapshot(s). Finally, the redaction bookmark is passed as a parameter to zfs send. When sending to the snapshot that was redacted, the redaction bookmark is used to filter out blocks that contain sensitive or unwanted information, and those blocks are not included in the send stream. When sending from the redaction bookmark, the blocks it contains are considered as candidate blocks in addition to those blocks in the destination snapshot that were modified since the creation_txg of the redaction bookmark. This step is necessary to allow the target to rehydrate data in the case where some blocks are accidentally or unnecessarily modified in the redaction snapshot. The changes to bookmarks to enable fast space estimation involve adding deadlists to bookmarks. There is also logic to manage the life cycles of these deadlists. The new size estimation process operates in cases where previously an accurate estimate could not be provided. In those cases, a send is performed where no data blocks are read, reducing the runtime significantly and providing a byte-accurate size estimate. Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Prashanth Sreenivasa <pks@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Chris Williamson <chris.williamson@delphix.com> Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com> Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #7958
2019-06-19 16:48:13 +00:00
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
.\" Copyright (c) 2014, Joyent, Inc. All rights reserved.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or https://opensource.org/licenses/CDDL-1.0.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
Add zstd support to zfs This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #6247 Closes #9024 Closes #10277 Closes #10278
2020-08-18 17:10:17 +00:00
.\" Copyright (c) 2019, Klara Inc.
.\" Copyright (c) 2019, Allan Jude
Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off|legacy|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468
2021-02-18 05:30:45 +00:00
.\" Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
.\"
.Dd June 23, 2022
.Dt ZPOOL-FEATURES 7
.Os
.
.Sh NAME
.Nm zpool-features
.Nd description of ZFS pool features
.
.Sh DESCRIPTION
ZFS pool on-disk format versions are specified via
.Dq features
which replace the old on-disk format numbers
.Pq the last supported on-disk format number is 28 .
To enable a feature on a pool use the
.Nm zpool Cm upgrade ,
or set the
.Sy feature Ns @ Ns Ar feature-name
property to
.Sy enabled .
Please also see the
.Sx Compatibility feature sets
Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off|legacy|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468
2021-02-18 05:30:45 +00:00
section for information on how sets of features may be enabled together.
.Pp
The pool format does not affect file system version compatibility or the ability
to send file systems between pools.
.Pp
Since most features can be enabled independently of each other, the on-disk
format of the pool is specified by the set of all features marked as
.Sy active
on the pool.
If the pool was created by another software version
this set may include unsupported features.
.
.Ss Identifying features
Every feature has a GUID of the form
.Ar com.example : Ns Ar feature-name .
The reversed DNS name ensures that the feature's GUID is unique across all ZFS
implementations.
When unsupported features are encountered on a pool they will
be identified by their GUIDs.
Refer to the documentation for the ZFS
implementation that created the pool for information about those features.
.Pp
Each supported feature also has a short name.
By convention a feature's short name is the portion of its GUID which follows
the
.Sq \&:
.Po
i.e.
.Ar com.example : Ns Ar feature-name
would have the short name
.Ar feature-name
.Pc ,
however a feature's short name may differ across ZFS implementations if
following the convention would result in name conflicts.
.
.Ss Feature states
Features can be in one of three states:
.Bl -tag -width "disabled"
.It Sy active
This feature's on-disk format changes are in effect on the pool.
Support for this feature is required to import the pool in read-write mode.
If this feature is not read-only compatible,
support is also required to import the pool in read-only mode
.Pq see Sx Read-only compatibility .
.It Sy enabled
An administrator has marked this feature as enabled on the pool, but the
feature's on-disk format changes have not been made yet.
The pool can still be imported by software that does not support this feature,
but changes may be made to the on-disk format at any time
which will move the feature to the
.Sy active
state.
Some features may support returning to the
.Sy enabled
state after becoming
.Sy active .
See feature-specific documentation for details.
.It Sy disabled
This feature's on-disk format changes have not been made and will not be made
unless an administrator moves the feature to the
.Sy enabled
state.
Features cannot be disabled once they have been enabled.
.El
.Pp
The state of supported features is exposed through pool properties of the form
.Sy feature Ns @ Ns Ar short-name .
.
.Ss Read-only compatibility
Some features may make on-disk format changes that do not interfere with other
software's ability to read from the pool.
These features are referred to as
.Dq read-only compatible .
If all unsupported features on a pool are read-only compatible,
the pool can be imported in read-only mode by setting the
.Sy readonly
property during import
.Po see
.Xr zpool-import 8
for details on importing pools
.Pc .
.
.Ss Unsupported features
For each unsupported feature enabled on an imported pool, a pool property
named
.Sy unsupported Ns @ Ns Ar feature-name
will indicate why the import was allowed despite the unsupported feature.
Possible values for this property are:
.Bl -tag -width "readonly"
.It Sy inactive
The feature is in the
.Sy enabled
state and therefore the pool's on-disk
format is still compatible with software that does not support this feature.
.It Sy readonly
The feature is read-only compatible and the pool has been imported in
read-only mode.
.El
.
.Ss Feature dependencies
Some features depend on other features being enabled in order to function.
Enabling a feature will automatically enable any features it depends on.
.
.Ss Compatibility feature sets
Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off|legacy|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468
2021-02-18 05:30:45 +00:00
It is sometimes necessary for a pool to maintain compatibility with a
specific on-disk format, by enabling and disabling particular features.
The
.Sy compatibility
feature facilitates this by allowing feature sets to be read from text files.
When set to
.Sy off
.Pq the default ,
compatibility feature sets are disabled
.Pq i.e. all features are enabled ;
when set to
.Sy legacy ,
no features are enabled.
When set to a comma-separated list of filenames
.Po
each filename may either be an absolute path, or relative to
.Pa /etc/zfs/compatibility.d
or
.Pa /usr/share/zfs/compatibility.d
.Pc ,
the lists of requested features are read from those files,
separated by whitespace and/or commas.
Only features present in all files are enabled.
.Pp
Simple sanity checks are applied to the files:
they must be between 1 B and 16 KiB in size, and must end with a newline
character.
.Pp
Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off|legacy|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468
2021-02-18 05:30:45 +00:00
The requested features are applied when a pool is created using
.Nm zpool Cm create Fl o Sy compatibility Ns = Ns Ar
and controls which features are enabled when using
.Nm zpool Cm upgrade .
.Nm zpool Cm status
Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off|legacy|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468
2021-02-18 05:30:45 +00:00
will not show a warning about disabled features which are not part
of the requested feature set.
.Pp
The special value
.Sy legacy
prevents any features from being enabled, either via
.Nm zpool Cm upgrade
or
.Nm zpool Cm set Sy feature Ns @ Ns Ar feature-name Ns = Ns Sy enabled .
This setting also prevents pools from being upgraded to newer on-disk versions.
This is a safety measure to prevent new features from being
accidentally enabled, breaking compatibility.
.Pp
By convention, compatibility files in
.Pa /usr/share/zfs/compatibility.d
are provided by the distribution, and include feature sets
supported by important versions of popular distributions, and feature
sets commonly supported at the start of each year.
Compatibility files in
.Pa /etc/zfs/compatibility.d ,
if present, will take precedence over files with the same name in
.Pa /usr/share/zfs/compatibility.d .
.Pp
If an unrecognized feature is found in these files, an error message will
be shown.
If the unrecognized feature is in a file in
.Pa /etc/zfs/compatibility.d ,
this is treated as an error and processing will stop.
If the unrecognized feature is under
.Pa /usr/share/zfs/compatibility.d ,
this is treated as a warning and processing will continue.
This difference is to allow distributions to include features
which might not be recognized by the currently-installed binaries.
.Pp
Compatibility files may include comments:
any text from
.Sq #
to the end of the line is ignored.
.Pp
.Sy Example :
.Bd -literal -compact -offset 4n
.No example# Nm cat Pa /usr/share/zfs/compatibility.d/grub2
Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off|legacy|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468
2021-02-18 05:30:45 +00:00
# Features which are supported by GRUB2
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
spacemap_histogram
.No example# Nm zpool Cm create Fl o Sy compatibility Ns = Ns Ar grub2 Ar bootpool Ar vdev
.Ed
.Pp
See
.Xr zpool-create 8
and
.Xr zpool-upgrade 8
for more information on how these commands are affected by feature sets.
.
.de feature
.It Sy \\$2
.Bl -tag -compact -width "READ-ONLY COMPATIBLE"
.It GUID
.Sy \\$1:\\$2
.if !"\\$4"" \{\
.It DEPENDENCIES
\fB\\$4\fP\c
.if !"\\$5"" , \fB\\$5\fP\c
.if !"\\$6"" , \fB\\$6\fP\c
.if !"\\$7"" , \fB\\$7\fP\c
.if !"\\$8"" , \fB\\$8\fP\c
.if !"\\$9"" , \fB\\$9\fP\c
.\}
.It READ-ONLY COMPATIBLE
\\$3
.El
.Pp
..
.
.ds instant-never \
.No This feature becomes Sy active No as soon as it is enabled \
and will never return to being Sy enabled .
.
.ds remount-upgrade \
.No Each filesystem will be upgraded automatically when remounted, \
or when a new file is created under that filesystem. \
The upgrade can also be triggered on filesystems via \
Nm zfs Cm set Sy version Ns = Ns Sy current Ar fs . \
No The upgrade process runs in the background and may take a while to complete \
for filesystems containing large amounts of files .
.
.de checksum-spiel
When the
.Sy \\$1
feature is set to
.Sy enabled ,
the administrator can turn on the
.Sy \\$1
checksum on any dataset using
.Nm zfs Cm set Sy checksum Ns = Ns Sy \\$1 Ar dset
.Po see Xr zfs-set 8 Pc .
This feature becomes
.Sy active
once a
.Sy checksum
property has been set to
.Sy \\$1 ,
and will return to being
.Sy enabled
once all filesystems that have ever had their checksum set to
.Sy \\$1
are destroyed.
..
.
.Sh FEATURES
The following features are supported on this system:
.Bl -tag -width Ds
.feature org.zfsonlinux allocation_classes yes
This feature enables support for separate allocation classes.
.Pp
This feature becomes
.Sy active
when a dedicated allocation class vdev
.Pq dedup or special
is created with the
.Nm zpool Cm create No or Nm zpool Cm add No commands .
With device removal, it can be returned to the
.Sy enabled
state if all the dedicated allocation class vdevs are removed.
.
.feature com.delphix async_destroy yes
Destroying a file system requires traversing all of its data in order to
return its used space to the pool.
Without
.Sy async_destroy ,
the file system is not fully removed until all space has been reclaimed.
If the destroy operation is interrupted by a reboot or power outage,
the next attempt to open the pool will need to complete the destroy
operation synchronously.
.Pp
When
.Sy async_destroy
is enabled, the file system's data will be reclaimed by a background process,
allowing the destroy operation to complete
without traversing the entire file system.
The background process is able to resume
interrupted destroys after the pool has been opened, eliminating the need
to finish interrupted destroys as part of the open operation.
The amount of space remaining to be reclaimed by the background process
is available through the
.Sy freeing
property.
.Pp
This feature is only
.Sy active
while
.Sy freeing
is non-zero.
.
Introduce BLAKE3 checksums as an OpenZFS feature This commit adds BLAKE3 checksums to OpenZFS, it has similar performance to Edon-R, but without the caveats around the latter. Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3 Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3 Short description of Wikipedia: BLAKE3 is a cryptographic hash function based on Bao and BLAKE2, created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real World Crypto. BLAKE3 is a single algorithm with many desirable features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. BLAKE3 has a binary tree structure, so it supports a practically unlimited degree of parallelism (both SIMD and multithreading) given enough input. The official Rust and C implementations are dual-licensed as public domain (CC0) and the Apache License. Along with adding the BLAKE3 hash into the OpenZFS infrastructure a new benchmarking file called chksum_bench was introduced. When read it reports the speed of the available checksum functions. On Linux: cat /proc/spl/kstat/zfs/chksum_bench On FreeBSD: sysctl kstat.zfs.misc.chksum_bench This is an example output of an i3-1005G1 test system with Debian 11: implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1196 1602 1761 1749 1762 1759 1751 skein-generic 546 591 608 615 619 612 616 sha256-generic 240 300 316 314 304 285 276 sha512-generic 353 441 467 476 472 467 426 blake3-generic 308 313 313 313 312 313 312 blake3-sse2 402 1289 1423 1446 1432 1458 1413 blake3-sse41 427 1470 1625 1704 1679 1607 1629 blake3-avx2 428 1920 3095 3343 3356 3318 3204 blake3-avx512 473 2687 4905 5836 5844 5643 5374 Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1840 2458 2665 2719 2711 2723 2693 skein-generic 870 966 996 992 1003 1005 1009 sha256-generic 415 442 453 455 457 457 457 sha512-generic 608 690 711 718 719 720 721 blake3-generic 301 313 311 309 309 310 310 blake3-sse2 343 1865 2124 2188 2180 2181 2186 blake3-sse41 364 2091 2396 2509 2463 2482 2488 blake3-avx2 365 2590 4399 4971 4915 4802 4764 Output on Debian 5.10.0-9-powerpc64le system: (POWER 9) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1213 1703 1889 1918 1957 1902 1907 skein-generic 434 492 520 522 511 525 525 sha256-generic 167 183 187 188 188 187 188 sha512-generic 186 216 222 221 225 224 224 blake3-generic 153 152 154 153 151 153 153 blake3-sse2 391 1170 1366 1406 1428 1426 1414 blake3-sse41 352 1049 1212 1174 1262 1258 1259 Output on Debian 5.10.0-11-arm64 system: (Pi400) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 487 603 629 639 643 641 641 skein-generic 271 299 303 308 309 309 307 sha256-generic 117 127 128 130 130 129 130 sha512-generic 145 165 170 172 173 174 175 blake3-generic 81 29 71 89 89 89 89 blake3-sse2 112 323 368 379 380 371 374 blake3-sse41 101 315 357 368 369 364 360 Structurally, the new code is mainly split into these parts: - 1x cross platform generic c variant: blake3_generic.c - 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512) - 2x assembly for ARMv8 (NEON converted from SSE2) - 2x assembly for PPC64-LE (POWER8 converted from SSE2) - one file for switching between the implementations Note the PPC64 assembly requires the VSX instruction set and the kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly. Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Co-authored-by: Rich Ercolani <rincebrain@gmail.com> Closes #10058 Closes #12918
2022-06-08 22:55:57 +00:00
.feature org.openzfs blake3 no extensible_dataset
This feature enables the use of the BLAKE3 hash algorithm for checksum and
dedup.
Introduce BLAKE3 checksums as an OpenZFS feature This commit adds BLAKE3 checksums to OpenZFS, it has similar performance to Edon-R, but without the caveats around the latter. Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3 Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3 Short description of Wikipedia: BLAKE3 is a cryptographic hash function based on Bao and BLAKE2, created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real World Crypto. BLAKE3 is a single algorithm with many desirable features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. BLAKE3 has a binary tree structure, so it supports a practically unlimited degree of parallelism (both SIMD and multithreading) given enough input. The official Rust and C implementations are dual-licensed as public domain (CC0) and the Apache License. Along with adding the BLAKE3 hash into the OpenZFS infrastructure a new benchmarking file called chksum_bench was introduced. When read it reports the speed of the available checksum functions. On Linux: cat /proc/spl/kstat/zfs/chksum_bench On FreeBSD: sysctl kstat.zfs.misc.chksum_bench This is an example output of an i3-1005G1 test system with Debian 11: implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1196 1602 1761 1749 1762 1759 1751 skein-generic 546 591 608 615 619 612 616 sha256-generic 240 300 316 314 304 285 276 sha512-generic 353 441 467 476 472 467 426 blake3-generic 308 313 313 313 312 313 312 blake3-sse2 402 1289 1423 1446 1432 1458 1413 blake3-sse41 427 1470 1625 1704 1679 1607 1629 blake3-avx2 428 1920 3095 3343 3356 3318 3204 blake3-avx512 473 2687 4905 5836 5844 5643 5374 Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1840 2458 2665 2719 2711 2723 2693 skein-generic 870 966 996 992 1003 1005 1009 sha256-generic 415 442 453 455 457 457 457 sha512-generic 608 690 711 718 719 720 721 blake3-generic 301 313 311 309 309 310 310 blake3-sse2 343 1865 2124 2188 2180 2181 2186 blake3-sse41 364 2091 2396 2509 2463 2482 2488 blake3-avx2 365 2590 4399 4971 4915 4802 4764 Output on Debian 5.10.0-9-powerpc64le system: (POWER 9) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1213 1703 1889 1918 1957 1902 1907 skein-generic 434 492 520 522 511 525 525 sha256-generic 167 183 187 188 188 187 188 sha512-generic 186 216 222 221 225 224 224 blake3-generic 153 152 154 153 151 153 153 blake3-sse2 391 1170 1366 1406 1428 1426 1414 blake3-sse41 352 1049 1212 1174 1262 1258 1259 Output on Debian 5.10.0-11-arm64 system: (Pi400) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 487 603 629 639 643 641 641 skein-generic 271 299 303 308 309 309 307 sha256-generic 117 127 128 130 130 129 130 sha512-generic 145 165 170 172 173 174 175 blake3-generic 81 29 71 89 89 89 89 blake3-sse2 112 323 368 379 380 371 374 blake3-sse41 101 315 357 368 369 364 360 Structurally, the new code is mainly split into these parts: - 1x cross platform generic c variant: blake3_generic.c - 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512) - 2x assembly for ARMv8 (NEON converted from SSE2) - 2x assembly for PPC64-LE (POWER8 converted from SSE2) - one file for switching between the implementations Note the PPC64 assembly requires the VSX instruction set and the kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly. Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Co-authored-by: Rich Ercolani <rincebrain@gmail.com> Closes #10058 Closes #12918
2022-06-08 22:55:57 +00:00
BLAKE3 is a secure hash algorithm focused on high performance.
.Pp
.checksum-spiel blake3
.
.feature com.delphix bookmarks yes extensible_dataset
This feature enables use of the
.Nm zfs Cm bookmark
command.
.Pp
This feature is
.Sy active
while any bookmarks exist in the pool.
All bookmarks in the pool can be listed by running
.Nm zfs Cm list Fl t Sy bookmark Fl r Ar poolname .
.
.feature com.datto bookmark_v2 no bookmark extensible_dataset
This feature enables the creation and management of larger bookmarks which are
needed for other features in ZFS.
.Pp
This feature becomes
.Sy active
when a v2 bookmark is created and will be returned to the
.Sy enabled
state when all v2 bookmarks are destroyed.
.
.feature com.delphix bookmark_written no bookmark extensible_dataset bookmark_v2
Implement Redacted Send/Receive Redacted send/receive allows users to send subsets of their data to a target system. One possible use case for this feature is to not transmit sensitive information to a data warehousing, test/dev, or analytics environment. Another is to save space by not replicating unimportant data within a given dataset, for example in backup tools like zrepl. Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots). Second, the new zfs redact command is used to create a redaction bookmark. The redaction bookmark stores the list of blocks in a snapshot that were modified by the redaction snapshot(s). Finally, the redaction bookmark is passed as a parameter to zfs send. When sending to the snapshot that was redacted, the redaction bookmark is used to filter out blocks that contain sensitive or unwanted information, and those blocks are not included in the send stream. When sending from the redaction bookmark, the blocks it contains are considered as candidate blocks in addition to those blocks in the destination snapshot that were modified since the creation_txg of the redaction bookmark. This step is necessary to allow the target to rehydrate data in the case where some blocks are accidentally or unnecessarily modified in the redaction snapshot. The changes to bookmarks to enable fast space estimation involve adding deadlists to bookmarks. There is also logic to manage the life cycles of these deadlists. The new size estimation process operates in cases where previously an accurate estimate could not be provided. In those cases, a send is performed where no data blocks are read, reducing the runtime significantly and providing a byte-accurate size estimate. Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Prashanth Sreenivasa <pks@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Chris Williamson <chris.williamson@delphix.com> Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com> Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #7958
2019-06-19 16:48:13 +00:00
This feature enables additional bookmark accounting fields, enabling the
.Sy written Ns # Ns Ar bookmark
property
.Pq space written since a bookmark
and estimates of send stream sizes for incrementals from bookmarks.
.Pp
This feature becomes
.Sy active
when a bookmark is created and will be
returned to the
.Sy enabled
state when all bookmarks with these fields are destroyed.
.
.feature org.openzfs device_rebuild yes
This feature enables the ability for the
.Nm zpool Cm attach
and
.Nm zpool Cm replace
commands to perform sequential reconstruction
.Pq instead of healing reconstruction
when resilvering.
.Pp
Add device rebuild feature The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <he.huang@intel.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: John Poduska <jpoduska@datto.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10349
2020-07-03 18:05:50 +00:00
Sequential reconstruction resilvers a device in LBA order without immediately
verifying the checksums.
Once complete, a scrub is started, which then verifies the checksums.
This approach allows full redundancy to be restored to the pool
in the minimum amount of time.
This two-phase approach will take longer than a healing resilver
when the time to verify the checksums is included.
However, unless there is additional pool damage,
no checksum errors should be reported by the scrub.
This feature is incompatible with raidz configurations.
.
This feature becomes
.Sy active
while a sequential resilver is in progress, and returns to
.Sy enabled
when the resilver completes.
.
.feature com.delphix device_removal no
This feature enables the
.Nm zpool Cm remove
command to remove top-level vdevs,
evacuating them to reduce the total size of the pool.
.Pp
This feature becomes
.Sy active
when the
.Nm zpool Cm remove
command is used
on a top-level vdev, and will never return to being
.Sy enabled .
.
.feature org.openzfs draid no
This feature enables use of the
.Sy draid
vdev type.
dRAID is a variant of RAID-Z which provides integrated distributed
hot spares that allow faster resilvering while retaining the benefits of RAID-Z.
Data, parity, and spare space are organized in redundancy groups
and distributed evenly over all of the devices.
.Pp
This feature becomes
.Sy active
when creating a pool which uses the
.Sy draid
vdev type, or when adding a new
.Sy draid
vdev to an existing pool.
.
.feature org.illumos edonr no extensible_dataset
This feature enables the use of the Edon-R hash algorithm for checksum,
including for nopwrite
.Po if compression is also enabled, an overwrite of
a block whose checksum matches the data being written will be ignored
.Pc .
In an abundance of caution, Edon-R requires verification when used with
dedup:
.Nm zfs Cm set Sy dedup Ns = Ns Sy edonr , Ns Sy verify
.Po see Xr zfs-set 8 Pc .
.Pp
Edon-R is a very high-performance hash algorithm that was part
of the NIST SHA-3 competition.
It provides extremely high hash performance
.Pq over 350% faster than SHA-256 ,
but was not selected because of its unsuitability
as a general purpose secure hash algorithm.
This implementation utilizes the new salted checksumming functionality
in ZFS, which means that the checksum is pre-seeded with a secret
256-bit random key
.Pq stored on the pool
before being fed the data block to be checksummed.
Thus the produced checksums are unique to a given pool,
preventing hash collision attacks on systems with dedup.
Introduce BLAKE3 checksums as an OpenZFS feature This commit adds BLAKE3 checksums to OpenZFS, it has similar performance to Edon-R, but without the caveats around the latter. Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3 Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3 Short description of Wikipedia: BLAKE3 is a cryptographic hash function based on Bao and BLAKE2, created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real World Crypto. BLAKE3 is a single algorithm with many desirable features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. BLAKE3 has a binary tree structure, so it supports a practically unlimited degree of parallelism (both SIMD and multithreading) given enough input. The official Rust and C implementations are dual-licensed as public domain (CC0) and the Apache License. Along with adding the BLAKE3 hash into the OpenZFS infrastructure a new benchmarking file called chksum_bench was introduced. When read it reports the speed of the available checksum functions. On Linux: cat /proc/spl/kstat/zfs/chksum_bench On FreeBSD: sysctl kstat.zfs.misc.chksum_bench This is an example output of an i3-1005G1 test system with Debian 11: implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1196 1602 1761 1749 1762 1759 1751 skein-generic 546 591 608 615 619 612 616 sha256-generic 240 300 316 314 304 285 276 sha512-generic 353 441 467 476 472 467 426 blake3-generic 308 313 313 313 312 313 312 blake3-sse2 402 1289 1423 1446 1432 1458 1413 blake3-sse41 427 1470 1625 1704 1679 1607 1629 blake3-avx2 428 1920 3095 3343 3356 3318 3204 blake3-avx512 473 2687 4905 5836 5844 5643 5374 Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1840 2458 2665 2719 2711 2723 2693 skein-generic 870 966 996 992 1003 1005 1009 sha256-generic 415 442 453 455 457 457 457 sha512-generic 608 690 711 718 719 720 721 blake3-generic 301 313 311 309 309 310 310 blake3-sse2 343 1865 2124 2188 2180 2181 2186 blake3-sse41 364 2091 2396 2509 2463 2482 2488 blake3-avx2 365 2590 4399 4971 4915 4802 4764 Output on Debian 5.10.0-9-powerpc64le system: (POWER 9) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 1213 1703 1889 1918 1957 1902 1907 skein-generic 434 492 520 522 511 525 525 sha256-generic 167 183 187 188 188 187 188 sha512-generic 186 216 222 221 225 224 224 blake3-generic 153 152 154 153 151 153 153 blake3-sse2 391 1170 1366 1406 1428 1426 1414 blake3-sse41 352 1049 1212 1174 1262 1258 1259 Output on Debian 5.10.0-11-arm64 system: (Pi400) implementation 1k 4k 16k 64k 256k 1m 4m edonr-generic 487 603 629 639 643 641 641 skein-generic 271 299 303 308 309 309 307 sha256-generic 117 127 128 130 130 129 130 sha512-generic 145 165 170 172 173 174 175 blake3-generic 81 29 71 89 89 89 89 blake3-sse2 112 323 368 379 380 371 374 blake3-sse41 101 315 357 368 369 364 360 Structurally, the new code is mainly split into these parts: - 1x cross platform generic c variant: blake3_generic.c - 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512) - 2x assembly for ARMv8 (NEON converted from SSE2) - 2x assembly for PPC64-LE (POWER8 converted from SSE2) - one file for switching between the implementations Note the PPC64 assembly requires the VSX instruction set and the kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly. Reviewed-by: Felix Dörre <felix@dogcraft.de> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Co-authored-by: Rich Ercolani <rincebrain@gmail.com> Closes #10058 Closes #12918
2022-06-08 22:55:57 +00:00
.Pp
.checksum-spiel edonr
.
.feature com.delphix embedded_data no
This feature improves the performance and compression ratio of
highly-compressible blocks.
Blocks whose contents can compress to 112 bytes
or smaller can take advantage of this feature.
.Pp
When this feature is enabled, the contents of highly-compressible blocks are
stored in the block
.Dq pointer
itself
.Po a misnomer in this case, as it contains
the compressed data, rather than a pointer to its location on disk
.Pc .
Thus the space of the block
.Pq one sector, typically 512 B or 4 KiB
is saved, and no additional I/O is needed to read and write the data block.
.
\*[instant-never]
.
.feature com.delphix empty_bpobj yes
This feature increases the performance of creating and using a large
number of snapshots of a single filesystem or volume, and also reduces
the disk space required.
.Pp
When there are many snapshots, each snapshot uses many Block Pointer
Objects
.Pq bpobjs
to track blocks associated with that snapshot.
However, in common use cases, most of these bpobjs are empty.
This feature allows us to create each bpobj on-demand,
thus eliminating the empty bpobjs.
.Pp
This feature is
.Sy active
while there are any filesystems, volumes,
or snapshots which were created after enabling this feature.
.
.feature com.delphix enabled_txg yes
Once this feature is enabled, ZFS records the transaction group number
in which new features are enabled.
This has no user-visible impact, but other features may depend on this feature.
.Pp
This feature becomes
.Sy active
as soon as it is enabled and will never return to being
.Sy enabled .
.
.feature com.datto encryption no bookmark_v2 extensible_dataset
This feature enables the creation and management of natively encrypted datasets.
.Pp
This feature becomes
.Sy active
when an encrypted dataset is created and will be returned to the
.Sy enabled
state when all datasets that use this feature are destroyed.
.
.feature com.delphix extensible_dataset no
This feature allows more flexible use of internal ZFS data structures,
and exists for other features to depend on.
.Pp
This feature will be
.Sy active
when the first dependent feature uses it, and will be returned to the
.Sy enabled
state when all datasets that use this feature are destroyed.
.
.feature com.joyent filesystem_limits yes extensible_dataset
This feature enables filesystem and snapshot limits.
These limits can be used to control how many filesystems and/or snapshots
can be created at the point in the tree on which the limits are set.
.Pp
This feature is
.Sy active
once either of the limit properties has been set on a dataset
and will never return to being
.Sy enabled .
.
Improve zpool status output, list all affected datasets Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of #9175. Rebase required fixing the tests, updating the ABI of libzfs, updating the man pages, fixing bugs, fixing the error returns, and updating the old on-disk error logs to the new format when activating the feature. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Co-authored-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9175 Closes #12812
2022-04-26 00:25:42 +00:00
.feature com.delphix head_errlog no
This feature enables the upgraded version of errlog, which required an on-disk
error log format change.
Now the error log of each head dataset is stored separately in the zap object
and keyed by the head id.
In case of encrypted filesystems with unloaded keys or unmounted encrypted
filesystems we are unable to check their snapshots or clones for errors and
these will not be reported.
Improve zpool status output, list all affected datasets Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of #9175. Rebase required fixing the tests, updating the ABI of libzfs, updating the man pages, fixing bugs, fixing the error returns, and updating the old on-disk error logs to the new format when activating the feature. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Co-authored-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9175 Closes #12812
2022-04-26 00:25:42 +00:00
With this feature enabled, every dataset affected by an error block is listed
in the output of
.Nm zpool Cm status .
.Pp
\*[instant-never]
.
.feature com.delphix hole_birth no enabled_txg
This feature has/had bugs, the result of which is that, if you do a
.Nm zfs Cm send Fl i
.Pq or Fl R , No since it uses Fl i
from an affected dataset, the receiving party will not see any checksum
or other errors, but the resulting destination snapshot
will not match the source.
Its use by
.Nm zfs Cm send Fl i
has been disabled by default
.Po
see
.Sy send_holes_without_birth_time
in
.Xr zfs 4
.Pc .
.Pp
This feature improves performance of incremental sends
.Pq Nm zfs Cm send Fl i
and receives for objects with many holes.
The most common case of hole-filled objects is zvols.
.Pp
An incremental send stream from snapshot
.Sy A No to snapshot Sy B
contains information about every block that changed between
.Sy A No and Sy B .
Blocks which did not change between those snapshots can be
identified and omitted from the stream using a piece of metadata called
the
.Dq block birth time ,
but birth times are not recorded for holes
.Pq blocks filled only with zeroes .
Since holes created after
.Sy A No cannot be distinguished from holes created before Sy A ,
information about every hole in the entire filesystem or zvol
is included in the send stream.
.Pp
For workloads where holes are rare this is not a problem.
However, when incrementally replicating filesystems or zvols with many holes
.Pq for example a zvol formatted with another filesystem
a lot of time will be spent sending and receiving unnecessary information
about holes that already exist on the receiving side.
.Pp
Once the
.Sy hole_birth
feature has been enabled the block birth times
of all new holes will be recorded.
Incremental sends between snapshots created after this feature is enabled
will use this new metadata to avoid sending information about holes that
already exist on the receiving side.
.Pp
\*[instant-never]
.
.feature org.open-zfs large_blocks no extensible_dataset
This feature allows the record size on a dataset to be set larger than 128 KiB.
.Pp
This feature becomes
.Sy active
once a dataset contains a file with a block size larger than 128 KiB,
and will return to being
.Sy enabled
once all filesystems that have ever had their recordsize larger than 128 KiB
are destroyed.
.
.feature org.zfsonlinux large_dnode no extensible_dataset
This feature allows the size of dnodes in a dataset to be set larger than 512 B.
.
This feature becomes
.Sy active
once a dataset contains an object with a dnode larger than 512 B,
which occurs as a result of setting the
.Sy dnodesize
dataset property to a value other than
.Sy legacy .
The feature will return to being
.Sy enabled
once all filesystems that have ever contained a dnode larger than 512 B
are destroyed.
Large dnodes allow more data to be stored in the bonus buffer,
thus potentially improving performance by avoiding the use of spill blocks.
.
.feature com.delphix livelist yes
This feature allows clones to be deleted faster than the traditional method
when a large number of random/sparse writes have been made to the clone.
All blocks allocated and freed after a clone is created are tracked by the
the clone's livelist which is referenced during the deletion of the clone.
The feature is activated when a clone is created and remains
.Sy active
until all clones have been destroyed.
.
.feature com.delphix log_spacemap yes com.delphix:spacemap_v2
This feature improves performance for heavily-fragmented pools,
especially when workloads are heavy in random-writes.
It does so by logging all the metaslab changes on a single spacemap every TXG
instead of scattering multiple writes to all the metaslab spacemaps.
.Pp
\*[instant-never]
.
.feature org.illumos lz4_compress no
.Sy lz4
is a high-performance real-time compression algorithm that
features significantly faster compression and decompression as well as a
higher compression ratio than the older
.Sy lzjb
compression.
Typically,
.Sy lz4
compression is approximately 50% faster on compressible data and 200% faster
on incompressible data than
.Sy lzjb .
It is also approximately 80% faster on decompression,
while giving approximately a 10% better compression ratio.
.Pp
When the
.Sy lz4_compress
feature is set to
.Sy enabled ,
the administrator can turn on
.Sy lz4
compression on any dataset on the pool using the
.Xr zfs-set 8
command.
All newly written metadata will be compressed with the
.Sy lz4
algorithm.
.Pp
\*[instant-never]
.
.feature com.joyent multi_vdev_crash_dump no
This feature allows a dump device to be configured with a pool comprised
of multiple vdevs.
Those vdevs may be arranged in any mirrored or raidz configuration.
.Pp
When the
.Sy multi_vdev_crash_dump
feature is set to
.Sy enabled ,
the administrator can use
.Xr dumpadm 8
to configure a dump device on a pool comprised of multiple vdevs.
.Pp
Under
.Fx
and Linux this feature is unused, but registered for compatibility.
New pools created on these systems will have the feature
.Sy enabled
but will never transition to
.Sy active ,
as this functionality is not required for crash dump support.
Existing pools where this feature is
.Sy active
can be imported.
.
.feature com.delphix obsolete_counts yes device_removal
This feature is an enhancement of
.Sy device_removal ,
which will over time reduce the memory used to track removed devices.
When indirect blocks are freed or remapped,
we note that their part of the indirect mapping is
.Dq obsolete
no longer needed.
.Pp
This feature becomes
.Sy active
when the
.Nm zpool Cm remove
command is used on a top-level vdev, and will never return to being
.Sy enabled .
.
.feature org.zfsonlinux project_quota yes extensible_dataset
This feature allows administrators to account the spaces and objects usage
information against the project identifier
.Pq ID .
.Pp
The project ID is an object-based attribute.
When upgrading an existing filesystem,
objects without a project ID will be assigned a zero project ID.
When this feature is enabled, newly created objects inherit
their parent directories' project ID if the parent's inherit flag is set
.Pq via Nm chattr Sy [+-]P No or Nm zfs Cm project Fl s Ns | Ns Fl C .
Otherwise, the new object's project ID will be zero.
An object's project ID can be changed at any time by the owner
.Pq or privileged user
via
.Nm chattr Fl p Ar prjid
or
.Nm zfs Cm project Fl p Ar prjid .
.Pp
This feature will become
.Sy active
as soon as it is enabled and will never return to being
.Sy disabled .
\*[remount-upgrade]
.
.feature com.delphix redaction_bookmarks no bookmarks extensible_dataset
This feature enables the use of redacted
.Nm zfs Cm send Ns s ,
which create redaction bookmarks storing the list of blocks
redacted by the send that created them.
For more information about redacted sends, see
.Xr zfs-send 8 .
.
.feature com.delphix redacted_datasets no extensible_dataset
This feature enables the receiving of redacted
.Nm zfs Cm send
streams, which create redacted datasets when received.
These datasets are missing some of their blocks,
and so cannot be safely mounted, and their contents cannot be safely read.
For more information about redacted receives, see
.Xr zfs-send 8 .
.
.feature com.datto resilver_defer yes
This feature allows ZFS to postpone new resilvers if an existing one is already
in progress.
Without this feature, any new resilvers will cause the currently
running one to be immediately restarted from the beginning.
.Pp
This feature becomes
.Sy active
once a resilver has been deferred, and returns to being
.Sy enabled
when the deferred resilver begins.
.
.feature org.illumos sha512 no extensible_dataset
OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Ported by: Tony Hutter <hutter2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Porting Notes: This code is ported on top of the Illumos Crypto Framework code: https://github.com/zfsonlinux/zfs/pull/4329/commits/b5e030c8dbb9cd393d313571dee4756fbba8c22d The list of porting changes includes: - Copied module/icp/include/sha2/sha2.h directly from illumos - Removed from module/icp/algs/sha2/sha2.c: #pragma inline(SHA256Init, SHA384Init, SHA512Init) - Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since it now takes in an extra parameter. - Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c - Added skein & edonr to libicp/Makefile.am - Added sha512.S. It was generated from sha512-x86_64.pl in Illumos. - Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument. - In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section to not #include the non-existant endian.h. - In skein_test.c, renane NULL to 0 in "no test vector" array entries to get around a compiler warning. - Fixup test files: - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>, - Remove <note.h> and define NOTE() as NOP. - Define u_longlong_t - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p" - Rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Remove "for isa in $($ISAINFO); do" stuff - Add/update Makefiles - Add some userspace headers like stdio.h/stdlib.h in places of sys/types.h. - EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules. - Update scripts/zfs2zol-patch.sed - include <sys/sha2.h> in sha2_impl.h - Add sha2.h to include/sys/Makefile.am - Add skein and edonr dirs to icp Makefile - Add new checksums to zpool_get.cfg - Move checksum switch block from zfs_secpolicy_setprop() to zfs_check_settable() - Fix -Wuninitialized error in edonr_byteorder.h on PPC - Fix stack frame size errors on ARM32 - Don't unroll loops in Skein on 32-bit to save stack space - Add memory barriers in sha2.c on 32-bit to save stack space - Add filetest_001_pos.ksh checksum sanity test - Add option to write psudorandom data in file_write utility
2016-06-15 22:47:05 +00:00
This feature enables the use of the SHA-512/256 truncated hash algorithm
.Pq FIPS 180-4
for checksum and dedup.
The native 64-bit arithmetic of SHA-512 provides an approximate 50%
performance boost over SHA-256 on 64-bit hardware
and is thus a good minimum-change replacement candidate
for systems where hash performance is important,
but these systems cannot for whatever reason utilize the faster
.Sy skein No and Sy edonr
algorithms.
.Pp
.checksum-spiel sha512
.
.feature org.illumos skein no extensible_dataset
This feature enables the use of the Skein hash algorithm for checksum and dedup.
Skein is a high-performance secure hash algorithm that was a
finalist in the NIST SHA-3 competition.
It provides a very high security margin and high performance on 64-bit hardware
.Pq 80% faster than SHA-256 .
This implementation also utilizes the new salted checksumming
OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Ported by: Tony Hutter <hutter2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Porting Notes: This code is ported on top of the Illumos Crypto Framework code: https://github.com/zfsonlinux/zfs/pull/4329/commits/b5e030c8dbb9cd393d313571dee4756fbba8c22d The list of porting changes includes: - Copied module/icp/include/sha2/sha2.h directly from illumos - Removed from module/icp/algs/sha2/sha2.c: #pragma inline(SHA256Init, SHA384Init, SHA512Init) - Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since it now takes in an extra parameter. - Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c - Added skein & edonr to libicp/Makefile.am - Added sha512.S. It was generated from sha512-x86_64.pl in Illumos. - Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument. - In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section to not #include the non-existant endian.h. - In skein_test.c, renane NULL to 0 in "no test vector" array entries to get around a compiler warning. - Fixup test files: - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>, - Remove <note.h> and define NOTE() as NOP. - Define u_longlong_t - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p" - Rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Remove "for isa in $($ISAINFO); do" stuff - Add/update Makefiles - Add some userspace headers like stdio.h/stdlib.h in places of sys/types.h. - EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules. - Update scripts/zfs2zol-patch.sed - include <sys/sha2.h> in sha2_impl.h - Add sha2.h to include/sys/Makefile.am - Add skein and edonr dirs to icp Makefile - Add new checksums to zpool_get.cfg - Move checksum switch block from zfs_secpolicy_setprop() to zfs_check_settable() - Fix -Wuninitialized error in edonr_byteorder.h on PPC - Fix stack frame size errors on ARM32 - Don't unroll loops in Skein on 32-bit to save stack space - Add memory barriers in sha2.c on 32-bit to save stack space - Add filetest_001_pos.ksh checksum sanity test - Add option to write psudorandom data in file_write utility
2016-06-15 22:47:05 +00:00
functionality in ZFS, which means that the checksum is pre-seeded with a
secret 256-bit random key
.Pq stored on the pool
before being fed the data block to be checksummed.
Thus the produced checksums are unique to a given pool,
preventing hash collision attacks on systems with dedup.
.Pp
.checksum-spiel skein
.
.feature com.delphix spacemap_histogram yes
This features allows ZFS to maintain more information about how free space
is organized within the pool.
If this feature is
.Sy enabled ,
it will be activated when a new space map object is created, or
an existing space map is upgraded to the new format,
and never returns back to being
.Sy enabled .
.
.feature com.delphix spacemap_v2 yes
This feature enables the use of the new space map encoding which
consists of two words
.Pq instead of one
whenever it is advantageous.
The new encoding allows space maps to represent large regions of
space more efficiently on-disk while also increasing their maximum
addressable offset.
.Pp
This feature becomes
.Sy active
once it is
.Sy enabled ,
and never returns back to being
.Sy enabled .
.
.feature org.zfsonlinux userobj_accounting yes extensible_dataset
This feature allows administrators to account the object usage information
by user and group.
.Pp
\*[instant-never]
\*[remount-upgrade]
.
log xattr=sa create/remove/update to ZIL As such, there are no specific synchronous semantics defined for the xattrs. But for xattr=on, it does log to ZIL and zil_commit() is done, if sync=always is set on dataset. This provides sync semantics for xattr=on with sync=always set on dataset. For the xattr=sa implementation, it doesn't log to ZIL, so, even with sync=always, xattrs are not guaranteed to be synced before xattr call returns to caller. So, xattr can be lost if system crash happens, before txg carrying xattr transaction is synced. This change adds xattr=sa logging to ZIL on xattr create/remove/update and xattrs are synced to ZIL (zil_commit() done) for sync=always. This makes xattr=sa behavior similar to xattr=on. Implementation notes: The actual logging is fairly straight-forward and does not warrant additional explanation. However, it has been 14 years since we last added new TX types to the ZIL [1], hence this is the first time we do it after the introduction of zpool features. Therefore, here is an overview of the feature activation and deactivation workflow: 1. The feature must be enabled. Otherwise, we don't log the new record type. This ensures compatibility with older software. 2. The feature is activated per-dataset, since the ZIL is per-dataset. 3. If the feature is enabled and dataset is not for zvol, any append to the ZIL chain will activate the feature for the dataset. Likewise for starting a new ZIL chain. 4. A dataset that doesn't have a ZIL chain has the feature deactivated. We ensure (3) by activating on the first zil_commit() after the feature was enabled. Since activating the features requires waiting for txg sync, the first zil_commit() after enabling the feature will be slower than usual. The downside is that this is really a conservative approximation: even if we never append a 'TX_SETSAXATTR' to the ZIL chain, we pay the penalty for feature activation. The upside is that the user is in control of when we pay the penalty, i.e., upon enabling the feature. We ensure (4) by hooking into zil_sync(), where ZIL destroy actually happens. One more piece on feature activation, since it's spread across multiple functions: zil_commit() zil_process_commit_list() if lwb == NULL // first zil_commit since zil_open zil_create() if no log block pointer in ZIL header: if feature enabled and not active: // CASE 1 enable, COALESCE txg wait with dmu_tx that allocated the log block else // log block was allocated earlier than this zil_open if feature enabled and not active: // CASE 2 enable, EXPLICIT txg wait else // already have an in-DRAM LWB if feature enabled and not active: // this happens when we enable the feature after zil_create // CASE 3 enable, EXPLICIT txg wait [1] https://github.com/illumos/illumos-gate/commit/da6c28aaf62fa55f0fdb8004aa40f88f23bf53f0 Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com> Closes #8768 Closes #9078
2022-02-22 21:06:43 +00:00
.feature org.openzfs zilsaxattr yes extensible_dataset
This feature enables
.Sy xattr Ns = Ns Sy sa
extended attribute logging in the ZIL.
If enabled, extended attribute changes
.Pq both Sy xattrdir Ns = Ns Sy dir No and Sy xattr Ns = Ns Sy sa
are guaranteed to be durable if either the dataset had
.Sy sync Ns = Ns Sy always
set at the time the changes were made, or
.Xr sync 2
is called on the dataset after the changes were made.
.Pp
This feature becomes
.Sy active
when a ZIL is created for at least one dataset and will be returned to the
.Sy enabled
state when it is destroyed for all datasets that use this feature.
.
.feature com.delphix zpool_checkpoint yes
This feature enables the
.Nm zpool Cm checkpoint
command that can checkpoint the state of the pool
at the time it was issued and later rewind back to it or discard it.
.Pp
This feature becomes
.Sy active
when the
.Nm zpool Cm checkpoint
command is used to checkpoint the pool.
The feature will only return back to being
.Sy enabled
when the pool is rewound or the checkpoint has been discarded.
.
.feature org.freebsd zstd_compress no extensible_dataset
.Sy zstd
is a high-performance compression algorithm that features a
combination of high compression ratios and high speed.
Compared to
.Sy gzip ,
.Sy zstd
offers slightly better compression at much higher speeds.
Compared to
.Sy lz4 ,
.Sy zstd
offers much better compression while being only modestly slower.
Typically,
.Sy zstd
compression speed ranges from 250 to 500 MB/s per thread
and decompression speed is over 1 GB/s per thread.
.Pp
When the
.Sy zstd
feature is set to
.Sy enabled ,
the administrator can turn on
.Sy zstd
compression of any dataset using
.Nm zfs Cm set Sy compress Ns = Ns Sy zstd Ar dset
.Po see Xr zfs-set 8 Pc .
This feature becomes
.Sy active
once a
.Sy compress
property has been set to
.Sy zstd ,
and will return to being
.Sy enabled
once all filesystems that have ever had their
.Sy compress
property set to
.Sy zstd
are destroyed.
.El
.
.Sh SEE ALSO
.Xr zfs 8 ,
.Xr zpool 8