Commit Graph

384 Commits

Author SHA1 Message Date
Richard Laager 9810410a53 Eliminate most mentions of "special"
Previously, the "spare" vdev type was described as "A special
pseudo-vdev which...".  I wanted to eliminate the word "special" from
that, now that the allocation_classes feature exists and there is such a
thing as a "special vdev".  I ended up eliminating almost all instances
of the word "special" that are not referencing the allocation_classes
feature.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #8626
2019-04-16 09:59:57 -07:00
Josh Soref af65079300 Hint about zpool free vs zfs available
Also describe free/allocated/fragmentation

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Closes #7565 
Closes #8483
2019-04-04 10:00:16 -07:00
Josh Soref f72ecb8d27 Fix man(1) warnings
The macOS man app strenuously objects to blank lines in man files.

mdoc warning: Empty input line #xyz

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Closes #8559
2019-04-02 11:05:09 -07:00
Tom Caputi dd29864b01 Update raw send documentation
This patch simply clarifies some of the limitations related to
raw sends in the man page. No functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jason Cohen <jwittlincohen@gmail.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8503
Closes #8544
2019-03-29 15:05:55 -07:00
Brian Behlendorf 1b939560be
Add TRIM support
UNMAP/TRIM support is a frequently-requested feature to help
prevent performance from degrading on SSDs and on various other
SAN-like storage back-ends.  By issuing UNMAP/TRIM commands for
sectors which are no longer allocated the underlying device can
often more efficiently manage itself.

This TRIM implementation is modeled on the `zpool initialize`
feature which writes a pattern to all unallocated space in the
pool.  The new `zpool trim` command uses the same vdev_xlate()
code to calculate what sectors are unallocated, the same per-
vdev TRIM thread model and locking, and the same basic CLI for
a consistent user experience.  The core difference is that
instead of writing a pattern it will issue UNMAP/TRIM commands
for those extents.

The zio pipeline was updated to accommodate this by adding a new
ZIO_TYPE_TRIM type and associated spa taskq.  This new type makes
is straight forward to add the platform specific TRIM/UNMAP calls
to vdev_disk.c and vdev_file.c.  These new ZIO_TYPE_TRIM zios are
handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs.
This makes it possible to largely avoid changing the pipieline,
one exception is that TRIM zio's may exceed the 16M block size
limit since they contain no data.

In addition to the manual `zpool trim` command, a background
automatic TRIM was added and is controlled by the 'autotrim'
property.  It relies on the exact same infrastructure as the
manual TRIM.  However, instead of relying on the extents in a
metaslab's ms_allocatable range tree, a ms_trim tree is kept
per metaslab.  When 'autotrim=on', ranges added back to the
ms_allocatable tree are also added to the ms_free tree.  The
ms_free tree is then periodically consumed by an autotrim
thread which systematically walks a top level vdev's metaslabs.

Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`.  This
may occur when the freed blocks are small and not enough time
was allowed to aggregate them.  An automatic TRIM and a manual
`zpool trim` may be run concurrently, in which case the automatic
TRIM will yield to the manual TRIM.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Contributions-by: Tim Chase <tim@chase2k.com>
Contributions-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8419 
Closes #598
2019-03-29 09:13:20 -07:00
Evan Allrich 74580a9411 Correct a very minor grammar issue
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Evan Allrich <evan@unguku.com>
Closes #8535
2019-03-26 12:27:29 -07:00
Tom Caputi ab7615d92c Multiple DVA Scrubbing Fix
Currently, there is an issue in the sequential scrub code which
prevents self healing from working in some cases. The scrub code
will split up all DVA copies of a bp and issue each of them
separately. The problem is that, since each of the DVAs is no
longer associated with the others, the self healing code doesn't
have the opportunity to repair problems that show up in one of the
DVAs with the data from the others.

This patch fixes this issue by ensuring that all IOs issued by the
sequential scrub code include all DVAs. Initially, only the first
DVA of each is attempted. If an issue arises, the IO is retried
with all available copies, giving the self healing code a chance
to correct the issue.

To test this change, this patch also adds the ability for zinject
to specify individual DVAs to inject read errors into. We then
add a new test case that utilizes this functionality to ensure
scrubs and self-healing reads can handle and transparently fix
issues with individual copies of blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8453
2019-03-15 14:14:31 -07:00
Tom Caputi f00ab3f22c Detect and prevent mixed raw and non-raw sends
Currently, there is an issue in the raw receive code where
raw receives are allowed to happen on top of previously
non-raw received datasets. This is a problem because the
source-side dataset doesn't know about how the blocks on
the destination were encrypted. As a result, any MAC in
the objset's checksum-of-MACs tree that is a parent of both
blocks encrypted on the source and blocks encrypted by the
destination will be incorrect. This will result in
authentication errors when we decrypt the dataset.

This patch fixes this issue by adding a new check to the
raw receive code. The code now maintains an "IVset guid",
which acts as an identifier for the set of IVs used to
encrypt a given snapshot. When a snapshot is raw received,
the destination snapshot will take this value from the
DRR_BEGIN payload. Non-raw receives and normal "zfs snap"
operations will cause ZFS to generate a new IVset guid.
When a raw incremental stream is received, ZFS will check
that the "from" IVset guid in the stream matches that of
the "from" destination snapshot. If they do not match, the
code will error out the receive, preventing the problem.

This patch requires an on-disk format change to add the
IVset guids to snapshots and bookmarks. As a result, this
patch has errata handling and a tunable to help affected
users resolve the issue with as little interruption as
possible.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8308
2019-03-13 11:00:43 -07:00
Olaf Faaland 8133679ff0 Do not resume a pool if multihost is enabled
When multihost is enabled, and a pool is suspended, return
EINVAL in response to "zpool clear <pool>".  The pool
may have been imported on another host while I/O was suspended.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6933 
Closes #8460
2019-02-28 17:56:19 -08:00
Olaf Faaland 4f3218aed8 Warn user about accidentally sharing devices
Improve the man page text to warn the user about the risk of adding
the same device to multiple pools via simultaneous "zpool create",
"zpool add", "zpool replace", etc.

State that MMP/multihost does not protect against these scenarios.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6473 
Closes #8457
2019-02-28 17:54:36 -08:00
Matthew Ahrens c568ab8d99 zfs.8 has wrong description of "zfs program -t"
The "-t" argument to "zfs program" specifies a limit on the number of
LUA instructions that can be executed.  The zfs.8 manpage has the wrong
description.  It should be updated to match what's in zfs-program.8

Also fix the formatting of the zfs help message.

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8410
2019-02-26 11:15:28 -08:00
DeHackEd ba7b05cb25 zfs(8): improve document of compression behaviours
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: DHE <git@dehacked.net>
Closes #4660 
Closes #8423
2019-02-25 11:10:16 -08:00
kpande f8bb2a7e0c Clarify zpool iostat statistics reporting
Document expected behavior for  zpool iostat statistics reporting.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Kash Pande <kash@tripleback.net>
Closes #2888 
Closes #8417
2019-02-21 14:00:48 -08:00
Anatoly Borodin f23b0242b6 Fix '-T u|d' descriptions in zpool(8)
In

	-T u|d  Display a time stamp.  Specify -u for a printed
		representation of the internal representation of time.
		See time(2).  Specify -d for standard date format.
		See date(1).

'Specify u' and 'Specify d' should be used instead. `zpool list -T -u`
does not work.

Bring the descriptions in `zpool list` and `zpool status` in sync with
`zpool iostat`.

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Anatoly Borodin <anatoly.borodin@gmail.com>
Closes #8438
2019-02-21 11:22:06 -08:00
kpande 11f6127aba zfs mount man page should document legacy behaviour
Document legacy mount behavior.

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Kash Pande <kash@tripleback.net>
Closes #2900
Closes #8414
2019-02-19 11:10:57 -08:00
Paul Zuchowski 9c5e88b1de zfs should optionally send holds
Add -h switch to zfs send command to send dataset holds. If
holds are present in the stream, zfs receive will create them
on the target dataset, unless the zfs receive -h option is used
to skip receive of holds.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #7513
2019-02-15 12:41:38 -08:00
loli10K 4417096956 Pool allocation classes misplacing small file blocks
Due to an off-by-one condition in spa_preferred_class() we are picking
the "normal" allocation class instead of the "special" one for file
blocks with size equal to the special_small_blocks property value.

This change fix the small code issue, update the ZFS Test Suite and the
zfs(8) man page.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8351
Closes #8361
2019-02-08 12:32:12 -08:00
loli10K 1a745ef62e zstreamdump: -d option is not documented in manpage
This change simply documents the missing -d (dump contents) option in
zstreamdump(8).

Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8369
2019-02-04 09:13:00 -08:00
Serapheim Dimitropoulos 21e7cf5da8 zdb -L should skip leak detection altogether
Currently the point of -L option in zdb is to  disable leak
tracing and the loading of space maps because they are expensive,
yet still do leak detection in terms of space. Unfortunately,
there is a scenario where this is a lie. If we are using zdb -L
on a pool where a vdev is being removed, zdb_claim_removing()
will open the metaslab space maps of that device.

This patch makes it so zdb -L skips leak detection altogether
and ensures that no space maps are loaded.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8335
2019-01-30 09:54:27 -08:00
Damian Wojsław 8fccfa8e17 zpool iostat should print headers when terminal fills
When `zpool iostat` fills the terminal the headers should be
printed again.  `zpool iostat -n` can be used to suppress this.

If the command is not attached to a tty, headers will not be
printed so as to not break existing scripts.

Reviewed-by: Joshua M. Clulow <josh@sysmgr.org>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Damian Wojsław <damian@wojslaw.pl>
Closes #8235
Closes #8262
2019-01-23 13:29:49 -08:00
Brian Behlendorf 64bdf63f5c
ztest: split block reconstruction
Increase the default allowed number of reconstruction attempts.
There's not an exact right number for this setting.  It needs
to be set large enough to cover any realistic failure scenarios
and small enough to avoid stalling the IO pipeline and invoking
the dead man detection.

The current value of 256 was empirically determined to be too
low based on multi-day runs of ztest.  The fault injection code
would inject more damage than could be reconstructed given the
relatively small number of attempts.  However, in all observed
cases the block could be reconstructed using a slightly higher
limit.

Based on local testing increasing the default value to 4096 was
determined to strike the best balance.  Checking all combinations
takes less than 10s in the worst case, and has so far eliminated
the vast majority of false positives detected by ztest.  This
delay is roughly on par with how long retries may be performed
to a misbehaving HDD and was deemed to be reasonable.  Better to
err on the side of a brief delay rather than fail to reconstruct
the data.

Lastly, the -Y flag has been added to zdb to make it easy to try all
possible combinations when performing split block reconstruction.
For badly damaged blocks with 18 splits, they can be fully enumerated
within a few minutes.  This has been done to ensure permanent errors
are never incorrectly reported when ztest verifies the pool with zdb.

Reviewed by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8271
2019-01-16 14:10:02 -08:00
Brian Behlendorf 6e91a72fe3
Disable 'zfs remap' command
The implementation of 'zfs remap' has proven to be problematic since
it modifies the objset (but not its logical contents) by dirtying
metadata without owning it.  The consequence of which is that
dmu_objset_remap_indirects() is vulnerable to certain races.

For example, if we are in the middle of receiving into the filesystem
while it is being remapped.  Then it is possible we could evict the
objset when the receive completes (see dsl_dataset_clone_swap_sync_impl,
or dmu_recv_end_sync), but dmu_objset_remap_indirects() may be still
using the objset.  The result of which would be a panic.

Extended runs of ztest(8) have exposed other possible races which
can occur when using 'zfs remap'.  Several of these have been fixed
but there may be others which have not yet been encountered and
diagnosed.

Furthermore, the ability to manually remap a filesystem is no longer
particularly useful now that the removal code can map large chunks.
Coupled with the fact that explaining what this command does and why
it may be useful requires a detailed understanding of the internals
of device removal.  These are details users should not be bothered
with.

Therefore, the 'zfs remap' command is being disabled but not entirely
removed.  It may be removed in the future or potentially reworked
to address the issues described above.  Since 'zfs remap' has never
been part of a tagged release its removal is expected to have
minimal impact.

The ZTS tests have been updated to continue to exercise the command
to prevent atrophy, but it has been removed entirely from ztest(8).

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8238
2019-01-15 15:46:58 -08:00
Benjamin Gentil 22448f0894 zfs.8 uses wrong snapshot names in Example 15
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: Benjamin Gentil <benjamin@gentil.io>
Closes #8241
2019-01-07 11:08:54 -08:00
Brian Behlendorf a769fb53a1 Add 'zpool status -i' option
Only display the full details of the vdev initialization state
in 'zpool status' output when requested with the -i option.
By default display '(initializing)' after vdevs when they are
being actively initialized.  This is consistent with the
established precident of appending '(resilvering), etc' and
fits within the default 80 column terminal width making it
easy to read.

Additionally, updated the 'zpool initialize' documentation to
make it clear the options are mutually exclusive, but allow
duplicate options like all other zfs/zpool commands.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8230
2019-01-07 11:03:18 -08:00
George Wilson 619f097693 OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========

The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting a new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.

SOLUTION
=========

This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.

When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
        - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...]
                - start, suspend, or cancel initialization
        - Creates new open-context thread for each vdev
        - Thread iterates through all metaslabs in this vdev
        - Each metaslab:
                - select a metaslab
                - load the metaslab
                - mark the metaslab as being zeroed
                - walk all free ranges within that metaslab and translate
                  them to ranges on the leaf vdev
                - issue a "zeroing" I/O on the leaf vdev that corresponds to
                  a free range on the metaslab we're working on
                - continue until all free ranges for this metaslab have been
                  "zeroed"
                - reset/unmark the metaslab being zeroed
                - if more metaslabs exist, then repeat above tasks.
                - if no more metaslabs, then we're done.

        - progress for the initialization is stored on-disk in the vdev’s
          leaf zap object. The following information is stored:
                - the last offset that has been initialized
                - the state of the initialization process (i.e. active,
                  suspended, or canceled)
                - the start time for the initialization

        - progress is reported via the zpool status command and shows
          information for each of the vdevs that are initializing

Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
  written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
2019-01-07 10:37:26 -08:00
Richard Laager 06f3fc2a4b Minor tweaks to zfs.8 man page for POSIX ACLs
* Capitalize POSIX in POSIX ACLs.  This change makes the POSIX 
  in POSIX ACLs all caps, which is both correct and consistent with
  the rest of the man page.

* Slightly reword part of zfs.8.  I tweaked a sentence to add a 
  missing comma, and as long as I was editing, removed a couple
  unnecessary words.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #8220
2018-12-26 13:50:14 -08:00
Brian Behlendorf 7c9a42921e
Detect IO errors during device removal
* Detect IO errors during device removal

While device removal cannot verify the checksums of individual
blocks during device removal, it can reasonably detect hard IO
errors from the leaf vdevs.  Failure to perform this error
checking can result in device removal completing successfully,
but moving no data which will permanently corrupt the pool.

Situation 1: faulted/degraded vdevs

In the configuration shown below, the removal of mirror-0 will
permanently corrupt the pool.  Device removal will preferentially
copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'.  Which
in this case will result in nothing being copied since one vdev
in each of those groups in unavailable.  However, device removal
will complete successfully since all IO errors are ignored.

  tank                DEGRADED     0     0     0
    mirror-0          DEGRADED     0     0     0
      /var/tmp/vdev1  FAULTED      0     0     0  external fault
      /var/tmp/vdev2  ONLINE       0     0     0
    mirror-1          DEGRADED     0     0     0
      /var/tmp/vdev3  ONLINE       0     0     0
      /var/tmp/vdev4  FAULTED      0     0     0  external fault

This issue is resolved by updating the source child selection
logic to exclude unreadable leaf vdevs.  Additionally, unwritable
destination child vdevs which can never succeed are skipped to
prevent generating a large number of write IO errors.

Situation 2: individual hard IO errors

During removal if an unexpected hard IO error is encountered when
either reading or writing the child vdev the entire removal
operation is cancelled.  While it may be possible to reconstruct
the data after removal that cannot be guaranteed.  The only
strictly safe thing to do is to cancel the removal.

As a future improvement we may want to instead suspend the removal
process and allow the damaged region to be retried.  But that work
is left for another time, hard IO errors during the removal process
are expected to be exceptionally rare.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #6900
Closes #8161
2018-12-04 09:37:37 -08:00
Christian Schwarz bd9c195805 man/zfs.8: document 'received' property source
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #8134
2018-11-20 09:59:43 -08:00
Tony Hutter ad796b8a3b Add zpool status -s (slow I/Os) and -p (parseable)
This patch adds a new slow I/Os (-s) column to zpool status to show the
number of VDEV slow I/Os. This is the number of I/Os that didn't
complete in zio_slow_io_ms milliseconds. It also adds a new parsable
(-p) flag to display exact values.

 	NAME         STATE     READ WRITE CKSUM  SLOW
 	testpool     ONLINE       0     0     0     -
	  mirror-0   ONLINE       0     0     0     -
 	    loop0    ONLINE       0     0     0    20
 	    loop1    ONLINE       0     0     0     0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7756
Closes #6885
2018-11-08 16:47:24 -08:00
Tom Caputi 80a91e7469 Defer new resilvers until the current one ends
Currently, if a resilver is triggered for any reason while an
existing one is running, zfs will immediately restart the existing
resilver from the beginning to include the new drive. This causes
problems for system administrators when a drive fails while another
is already resilvering. In this case, the optimal thing to do to
reduce risk of data loss is to wait for the current resilver to end
before immediately replacing the second failed drive, which allows
the system to operate with two incomplete drives for the minimum
amount of time.

This patch introduces the resilver_defer feature that essentially
does this for the admin without forcing them to wait and monitor
the resilver manually. The change requires an on-disk feature
since we must mark drives that are part of a deferred resilver in
the vdev config to ensure that we do not assume they are done
resilvering when an existing resilver completes.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @mmaybee 
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7732
2018-10-18 21:06:18 -07:00
Yuri Pankov cb110f254e OpenZFS 9763 - zfs(1M): broken formatting in allow/unallow description
Porting notes:
* Two of the three changes from the upstream patch were already
  applied for Linux.  Only the last one is required.

Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Gordon Ross <gwr@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9763
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8a702e55
Closes #7972
2018-10-02 16:12:54 -07:00
DeHackEd 9d489ab3a8 Fix reference to zpool-features(5)
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: DHE <git@dehacked.net>
Closes #7938
2018-09-21 09:41:08 -07:00
Gregor Kopka 48b0b649fd Man page fixes - zpool/zfs optional parameters
The man pages for zpool and zfs (get command)
listed the pool/dataset parameter as required,
but these are optional. Fixed that.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7916
2018-09-18 08:55:33 -07:00
Brian Behlendorf 2ced3cf0b2
Clarify 'zpool remove' restrictions
Update zpool(8) to clarify what type of vdevs may be safely
removed and that the existence of any top-level raidz device
which is part of the primary pool will prevent device removal.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7880 
Closes #7893
2018-09-17 17:28:18 -07:00
LOLi 0238a9755b Fix 'zfs allow' for create time permissions
When no permission set is defined for a dataset the create time
permissions are incorrectly shown as if they were a permission set.
This change simply correct how allow permissions are displayed.

This commit also fixes a small manpage formatting issue and adds the
"zfs_allow_003_pos" test case to the ZFS Test Suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7519 
Closes #7860
2018-09-06 13:11:21 -07:00
Don Brady cc99f275a2 Pool allocation classes
Allocation Classes add the ability to have allocation classes in a
pool that are dedicated to serving specific block categories, such
as DDT data, metadata, and small file blocks. A pool can opt-in to
this feature by adding a 'special' or 'dedup' top-level VDEV.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Reviewed-by: Andreas Dilger <andreas.dilger@chamcloud.com>
Reviewed-by: DHE <git@dehacked.net>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Gregor Kopka <gregor@kopka.net>
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #5182
2018-09-05 18:33:36 -07:00
Tom Caputi c3bd3fb4ac OpenZFS 9403 - assertion failed in arc_buf_destroy()
Assertion failed in arc_buf_destroy() when concurrently reading
block with checksum error.

Porting notes:
* The ability to zinject decompression errors has been added, but
  this only works at the zio_decompress() level, where we have all
  of the info we need to match against the user's zinject options.
* The decompress_fault test has been added to test the new zinject
  functionality
* We attempted to set zio_decompress_fail_fraction to (1 << 18) in
  ztest for further test coverage. Although this did uncover a few
  low priority issues, this unfortuantely also causes ztest to
  ASSERT in many locations where the code is working correctly since
  it is designed to fail on IO errors. Developers can manually set
  this variable with the '-o' option to find and debug issues.

Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Matt Ahrens <mahrens@delphix.com>
Ported-by: Tom Caputi <tcaputi@datto.com>

OpenZFS-issue: https://illumos.org/issues/9403
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9
Closes #7822
2018-08-29 11:33:33 -07:00
Serapheim Dimitropoulos a448a2557e Introduce read/write kstats per dataset
The following patch introduces a few statistics on reads and writes
grouped by dataset. These statistics are implemented as kstats
(backed by aggregate sums for performance) and can be retrieved by
using the dataset objset ID number. The motivation for this change is
to provide some preliminary analytics on dataset usage/performance.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #7705
2018-08-20 09:52:37 -07:00
LOLi a9d6270acb 'zfs holds' scripted mode is not documented
This change simply documents the existing "scripted mode" option in
both command help and man page.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7798
2018-08-18 15:47:41 -07:00
Tom Caputi d9c460a0b6 Added encryption support for zfs recv -o / -x
One small integration that was absent from b52563 was
support for zfs recv -o / -x with regards to encryption
parameters. The main use cases of this are as follows:

* Receiving an unencrypted stream as encrypted without
  needing to create a "dummy" encrypted parent so that
  encryption can be inheritted.

* Allowing users to change their keylocation on receive,
  so long as the receiving dataset is an encryption root.

* Allowing users to explicitly exclude or override the
  encryption property from an unencrypted properties stream,
  allowing it to be received as encrypted.

* Receiving a recursive heirarchy of unencrypted datasets,
  encrypting the top-level one and forcing all children to
  inherit the encryption.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7650
2018-08-15 09:48:49 -07:00
Toomas Soome 5fadb7fb0c OpenZFS 8906 - uts: illumos rootfs should support salted cksum
Porting notes:
* As of grub-2.02 these checksums are not supported.  However, as
  pointed out in #6501 there are alternatives such as EFISTUB which
  work and have no such restriction.  A warning was added to the
  checksum property section of the zfs.8 man page.

Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/8906
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f
Closes #6501
Closes #7714
2018-07-27 08:35:28 -07:00
Serapheim Dimitropoulos a7ed98d8b5 OpenZFS 9330 - stack overflow when creating a deeply nested dataset
Datasets that are deeply nested (~100 levels) are impractical. We just
put a limit of 50 levels to newly created datasets. Existing datasets
should work without a problem.

The problem can be seen by attempting to create a dataset using the -p
option with many levels:

    panic[cpu0]/thread=ffffff01cd282c20: BAD TRAP: type=8 (#df Double fault) rp=ffffffff

    fffffffffbc3aa60 unix:die+100 ()
    fffffffffbc3ab70 unix:trap+157d ()
    ffffff00083d7020 unix:_patch_xrstorq_rbx+196 ()
    ffffff00083d7050 zfs:dbuf_rele+2e ()
    ...
    ffffff00083d7080 zfs:dsl_dir_close+32 ()
    ffffff00083d70b0 zfs:dsl_dir_evict+30 ()
    ffffff00083d70d0 zfs:dbuf_evict_user+4a ()
    ffffff00083d7100 zfs:dbuf_rele_and_unlock+87 ()
    ffffff00083d7130 zfs:dbuf_rele+2e ()
    ... The block above repeats once per directory in the ...
    ... create -p command, working towards the root ...
    ffffff00083db9f0 zfs:dsl_dataset_drop_ref+19 ()
    ffffff00083dba20 zfs:dsl_dataset_rele+42 ()
    ffffff00083dba70 zfs:dmu_objset_prefetch+e4 ()
    ffffff00083dbaa0 zfs:findfunc+23 ()
    ffffff00083dbb80 zfs:dmu_objset_find_spa+38c ()
    ffffff00083dbbc0 zfs:dmu_objset_find+40 ()
    ffffff00083dbc20 zfs:zfs_ioc_snapshot_list_next+4b ()
    ffffff00083dbcc0 zfs:zfsdev_ioctl+347 ()
    ffffff00083dbd00 genunix:cdev_ioctl+45 ()
    ffffff00083dbd40 specfs:spec_ioctl+5a ()
    ffffff00083dbdc0 genunix:fop_ioctl+7b ()
    ffffff00083dbec0 genunix:ioctl+18e ()
    ffffff00083dbf10 unix:brand_sys_sysenter+1c9 ()

Porting notes:
* Added zfs_max_dataset_nesting module option with documentation.
* Updated zfs_rename_014_neg.ksh for Linux.
* Increase the zfs.sh stack warning to 15K.  Enough time has passed
  that 16K can be reasonably assumed to be the default value.  It
  was increased in the 3.15 kernel released in June of 2014.

Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>

OpenZFS-issue: https://www.illumos.org/issues/9330
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/757a75a
Closes #7681
2018-07-09 13:02:50 -07:00
Eitan Adler fb8a10d5be OpenZFS 9521 - Add checkpoint field
Add checkpoint field in the default list of the zpool-list man page

Authored by: Eitan Adler <lists@eitanadler.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: kpande <github@tripleback.net>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9521
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c5a860f7b
Closes #7658
2018-06-27 09:33:37 -07:00
Serapheim Dimitropoulos d2734cce68 OpenZFS 9166 - zfs storage pool checkpoint
Details about the motivation of this feature and its usage can
be found in this blogpost:

    https://sdimitro.github.io/post/zpool-checkpoint/

A lightning talk of this feature can be found here:
https://www.youtube.com/watch?v=fPQA8K40jAM

Implementation details can be found in big block comment of
spa_checkpoint.c

Side-changes that are relevant to this commit but not explained
elsewhere:

* renames members of "struct metaslab trees to be shorter without
  losing meaning

* space_map_{alloc,truncate}() accept a block size as a
  parameter. The reason is that in the current state all space
  maps that we allocate through the DMU use a global tunable
  (space_map_blksz) which defauls to 4KB. This is ok for metaslab
  space maps in terms of bandwirdth since they are scattered all
  over the disk. But for other space maps this default is probably
  not what we want. Examples are device removal's vdev_obsolete_sm
  or vdev_chedkpoint_sm from this review. Both of these have a
  1:1 relationship with each vdev and could benefit from a bigger
  block size.

Porting notes:

* The part of dsl_scan_sync() which handles async destroys has
  been moved into the new dsl_process_async_destroys() function.

* Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write
  to block device backed pools.

* ZTS:
  * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg".

  * Don't use large dd block sizes on /dev/urandom under Linux in
    checkpoint_capacity.

  * Adopt Delphix-OS's setting of 4 (spa_asize_inflation =
    SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed
    its attempts to fill the pool

  * Create the base and nested pools with sync=disabled to speed up
    the "setup" phase.

  * Clear labels in test pool between checkpoint tests to avoid
    duplicate pool issues.

  * The import_rewind_device_replaced test has been marked as "known
    to fail" for the reasons listed in its DISCLAIMER.

  * New module parameters:

      zfs_spa_discard_memory_limit,
      zfs_remove_max_bytes_pause (not documented - debugging only)
      vdev_max_ms_count (formerly metaslabs_per_vdev)
      vdev_min_ms_count

Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9166
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8
Closes #7570
2018-06-26 10:07:42 -07:00
Antonio Russo 39042f9736 Tunable directory for zfs runtime scripts
zpool and zed place scripts in subdirectories of libexecdir. Some
distributions locate architecture independent scripts in other locations
(e.g. Debian). To avoid these paths getting out of sync, centralize the
definitions.

Build zfs-test's default.cfg by Makefile.  Use the new directory
logic building tests/zfs-tests/include/default.cfg.in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7597
2018-06-07 09:59:59 -07:00
Antonio Russo 7106b23640 Minor documentation, logging, and testing typos
This patch collects some minor inconsistencies and typos in the
documentation, logging and testing infrastructure.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7608
2018-06-07 09:38:39 -07:00
bunder2015 85912983a4 Fix typoes in zpool man page
Fixed some highlighting in the zpool man page

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7596
2018-06-04 09:06:16 -07:00
Brian Behlendorf b669ab83bb
Ignore *.o.ur-safe build artifacts
Generated when building on Ubuntu 18.04.  Also ignore the new
dynamically generated zfs-mount-generator.8 man page, and the
module/.cache.mk file.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7534
2018-05-13 18:59:02 -07:00
Antonio Russo 68fded8146 Add canonical mount options zfs-mount-generator
lib/libzfs/libzfs_mount.c:zfs_add_options provides the canonical
mount options used by a `zfs mount` command. Because we cannot call
`zfs mount` directly from a systemd.mount unit, we mirror that logic
in zfs-mount-generator.

The zed script is updated to cache these properties as well.

Include a mini-tutorial in the manual page, properly substitute
configuration paths in zfs-mount-generator.8.in, and standardize the
Makefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7453
2018-05-11 12:44:14 -07:00
Tom Caputi be9a5c355c Add support for decryption faults in zinject
This patch adds the ability for zinject to trigger decryption
and authentication faults in the ZIO and ARC layers. This
functionality is exposed via the new "decrypt" error type, which
may be provided for "data" object types.

This patch also refactors some of the core encryption / decryption
functions so that they have consistent prototypes, handle errors
consistently, and do not have unused arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7474
2018-05-02 15:36:20 -07:00
George Melikov eb201f50ac Add back iostat -y or -w descriptions
The iostat -y and -w descriptions were left in cda0317e,
get them back.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7479 
Closes #7483
2018-04-30 13:42:58 -05:00
Matthew Ahrens a1d477c24c OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete

This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk.  The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool.  An entry becomes obsolete when all the blocks that use
it are freed.  An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones).  Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible.  This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of
the data that is copied.  This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.

At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.

Porting Notes:

* Avoid zero-sized kmem_alloc() in vdev_compact_children().

    The device evacuation code adds a dependency that
    vdev_compact_children() be able to properly empty the vdev_child
    array by setting it to NULL and zeroing vdev_children.  Under Linux,
    kmem_alloc() and related functions return a sentinel pointer rather
    than NULL for zero-sized allocations.

* Remove comment regarding "mpt" driver where zfs_remove_max_segment
  is initialized to SPA_MAXBLOCKSIZE.

  Change zfs_condense_indirect_commit_entry_delay_ticks to
  zfs_condense_indirect_commit_entry_delay_ms for consistency with
  most other tunables in which delays are specified in ms.

* ZTS changes:

    Use set_tunable rather than mdb
    Use zpool sync as appropriate
    Use sync_pool instead of sync
    Kill jobs during test_removal_with_operation to allow unmount/export
    Don't add non-disk names such as "mirror" or "raidz" to $DISKS
    Use $TEST_BASE_DIR instead of /tmp
    Increase HZ from 100 to 1000 which is more common on Linux

    removal_multiple_indirection.ksh
        Reduce iterations in order to not time out on the code
        coverage builders.

    removal_resume_export:
        Functionally, the test case is correct but there exists a race
        where the kernel thread hasn't been fully started yet and is
        not visible.  Wait for up to 1 second for the removal thread
        to be started before giving up on it.  Also, increase the
        amount of data copied in order that the removal not finish
        before the export has a chance to fail.

* MMP compatibility, the concept of concrete versus non-concrete devices
  has slightly changed the semantics of vdev_writeable().  Update
  mmp_random_leaf_impl() accordingly.

* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
  feature which is not supported by OpenZFS.

* Added support for new vdev removal tracepoints.

* Test cases removal_with_zdb and removal_condense_export have been
  intentionally disabled.  When run manually they pass as intended,
  but when running in the automated test environment they produce
  unreliable results on the latest Fedora release.

  They may work better once the upstream pool import refectoring is
  merged into ZoL at which point they will be re-enabled.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2018-04-14 12:16:17 -07:00
Mike Gerdts d22f3a8244 OpenZFS 9286 - want refreservation=auto
Authored by: Mike Gerdts <mike.gerdts@joyent.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Don Brady <don.brady@delphix.com>

Porting Notes:
* Adopted destroy_dataset in ZTS test cleanup
* Use ksh shebang instead of bash for new tests

OpenZFS-issue: https://www.illumos.org/issues/9286
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/723d0c85
Closes #7387
2018-04-11 14:52:13 -07:00
DeHackEd dfb1ad027f zfs(8): fix dedup omission during mdoc overhaul
The property description has been updated with new algorithms as well.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes #7377
2018-04-10 17:19:01 -07:00
Antonio Russo 55d80e651a systemd mount generator and tracking ZEDLET
zfs-mount-generator implements the "systemd generator" protocol,
producing systemd.mount units from the cached outputs of zfs list,
during early boot, integrating with systemd.

Each pool has an indpendent cache of the command

  zfs list -H -oname,mountpoint,canmount -tfilesystem -r $pool

which is kept synchronized by the ZEDLET

  history_event-zfs-list-cacher.sh

Datasets not in the cache will be loaded later in the boot process by
zfs-mount.service, including pools without a cache.

Among other things, this allows for complex mount hierarchies.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7329
2018-04-06 14:11:09 -07:00
Peter Ashford 910f3ce739 Clarify zpool actions for an intent log device
Updated the "Intent Log" section of the "zpool" manual page to
properly reflect the actions that may be performed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Ashford <ashford@accs.com>
Closes #6938 
Closes #7318
2018-03-22 15:12:08 -07:00
Alek P 272b5d730f Add JSON output support to channel programs
The changes piggyback JSON output support on top of channel programs 
(#6558).  This way the JSON output support is targeted to scripting 
use cases and is easily maintainable since it really only touches 
one function (zfs_do_channel_program()).

This patch ports Joyent's JSON nvlist library from illumos to enable 
easy JSON printing of channel program output nvlist.  To keep the 
delta small I also took advantage of the fact that printing in
zfs_do_channel_program() was almost always done before exiting 
the program.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7281
2018-03-19 12:40:58 -07:00
Paul Zuchowski 83362e8e67 Destroy makes full snap list before destroying
Change zfs destroy logic so destroying begins before
the entire list of snapshots is built.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #7271
2018-03-12 15:24:08 -07:00
Tomohiro Kusumi 5ee220ba5c Document allowed pool names
PR #7208 was a patch to allow non-reserved pool names which begin with
mirror, raidz, spare (but do not equal), however we'd rather document
it in the man page for compatibility with other OpenZFS implementations,
to avoid pool names that may not work on non-Linux platforms.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #7216
2018-03-09 14:04:15 -08:00
John Eismeier d699aaef09 Fix some typos
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Eismeier <john.eismeier@gmail.com>
Closes #7237
2018-02-28 08:57:10 -08:00
Tomohiro Kusumi d72cd017dd Fix zpool(8) list example to match actual format
a05dfd00 (Illumos 5147) has swapped FRAG and EXPANDSZ,
so it's natural to modify these examples.

 # zpool list | head -1
 NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
                              ^^^^^^^^^^^^^^^

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #7244
2018-02-28 08:54:53 -08:00
Tony Hutter bf95a000c4 Add scrub after resilver zed script
* Add a zed script to kick off a scrub after a resilver.  The script is
disabled by default.

* Add a optional $PATH (-P) option to zed to allow it to use a custom
$PATH for its zedlets.  This is needed when you're running zed under
the ZTS in a local workspace.

* Update test scripts to not copy in all-debug.sh and all-syslog.sh by
default.  They can be optionally copied in as part of zed_setup().
These scripts slow down zed considerably under heavy events loads and
can cause events to be dropped or their delivery delayed. This was
causing some sporadic failures in the 'fault' tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4662 
Closes #7086
2018-02-23 11:38:05 -08:00
LOLi faa97c1619 Want 'zfs send -b'
This change implements 'zfs send -b' which can be used to send only
received property values whether or not they are overridden by local
settings.

This can be very useful during "restore" operations from a backup pool
because it allows to send only the property values originally sent
from the backup source, even though they were later modified on the
destination either by a 'zfs set' operation, explicit 'zfs inherit' or
overridden during the receive process via 'zfs receive -o|-x'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7156
2018-02-21 12:32:06 -08:00
Olaf Faaland ec7c1b914c Clarify zinject(8) explanation of -e
Error injection of EIO or ENXIO simply sets the zio's io_error value,
rather than preventing the read or write from occurring.  This is
important information as it affects how the probes must be used.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7172
2018-02-15 09:50:06 -08:00
Nasf-Fan 9c5167d19f Project Quota on ZFS
Project quota is a new ZFS system space/object usage accounting
and enforcement mechanism. Similar as user/group quota, project
quota is another dimension of system quota. It bases on the new
object attribute - project ID.

Project ID is a numerical value to indicate to which project an
object belongs. An object only can belong to one project though
you (the object owner or privileged user) can change the object
project ID via 'chattr -p' or 'zfs project [-s] -p' explicitly.
The object also can inherit the project ID from its parent when
created if the parent has the project inherit flag (that can be
set via 'chattr +P' or 'zfs project -s [-p]').

By accounting the spaces/objects belong to the same project, we
can know how many spaces/objects used by the project. And if we
set the upper limit then we can control the spaces/objects that
are consumed by such project. It is useful when multiple groups
and users cooperate for the same project, or a user/group needs
to participate in multiple projects.

Support the following commands and functionalities:

zfs set projectquota@project
zfs set projectobjquota@project

zfs get projectquota@project
zfs get projectobjquota@project
zfs get projectused@project
zfs get projectobjused@project

zfs projectspace

zfs allow projectquota
zfs allow projectobjquota
zfs allow projectused
zfs allow projectobjused

zfs unallow projectquota
zfs unallow projectobjquota
zfs unallow projectused
zfs unallow projectobjused

chattr +/-P
chattr -p project_id
lsattr -p

This patch also supports tree quota based on the project quota via
"zfs project" commands set as following:
zfs project [-d|-r] <file|directory ...>
zfs project -C [-k] [-r] <file|directory ...>
zfs project -c [-0] [-d|-r] [-p id] <file|directory ...>
zfs project [-p id] [-r] [-s] <file|directory ...>

For "df [-i] $DIR" command, if we set INHERIT (project ID) flag on
the $DIR, then the proejct [obj]quota and [obj]used values for the
$DIR's project ID will be shown as the total/free (avail) resource.
Keep the same behavior as EXT4/XFS does.

Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by  Ned Bass <bass6@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fan Yong <fan.yong@intel.com>
TEST_ZIMPORT_POOLS="zol-0.6.1 zol-0.6.2 master"
Change-Id: Ib4f0544602e03fb61fd46a849d7ba51a6005693c
Closes #6290
2018-02-13 14:54:54 -08:00
Chunwei Chen 950e17c215 Fix zdb -R decompression
There are some issues in the zdb -R decompression implementation.

The first is that ZLE can easily decompress non-ZLE streams. So we add
ZDB_NO_ZLE env to make zdb skip ZLE.

The second is the random bytes appended to pabd, pbuf2 stuff. This serve
no purpose at all, those bytes shouldn't be read during decompression
anyway. Instead, we randomize lbuf2, so that we can make sure
decompression fill exactly to lsize by bcmp lbuf and lbuf2.

The last one is the condition to detect fail is wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
Closes #4984
2018-02-09 10:11:02 -08:00
Serapheim Dimitropoulos 5b72a38d68 OpenZFS 8677 - Open-Context Channel Programs
Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Don Brady <don.brady@delphix.com>

We want to be able to run channel programs outside of synching
context. This would greatly improve performance for channel programs
that just gather information, as they won't have to wait for synching
context anymore.

=== What is implemented?

This feature introduces the following:
- A new command line flag in "zfs program" to specify our intention
  to run in open context. (The -n option)
- A new flag/option within the channel program ioctl which selects
  the context.
- Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
- Documentation for the new feature in the manual pages.

=== How do we handle zfs.sync functions in open context?

When such a function is found by the interpreter and we are running
in open context we abort the script and we spit out a descriptive
runtime error. For example, given the script below ...

arg = ...
fs = arg["argv"][1]
err = zfs.sync.destroy(fs)
msg = "destroying " .. fs .. " err=" .. err
return msg

if we run it in open context, we will get back the following error:

Channel program execution failed:
[string "channel program"]:3: running functions from the zfs.sync
submodule requires passing sync=TRUE to lzc_channel_program()
(i.e. do not specify the "-n" command line argument)
stack traceback:
            [C]: in function 'destroy'
            [string "channel program"]:3: in main chunk

=== What about testing?

We've introduced new wrappers for all channel program tests that
run each channel program as both (startard & open-context) and
expect the appropriate behavior depending on the program using
the zfs.sync module.

OpenZFS-issue: https://www.illumos.org/issues/8677
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/17a49e15
Closes #6558
2018-02-08 16:05:57 -08:00
Chris Williamson 234c91c508 OpenZFS 8600 - ZFS channel programs - snapshot
Authored by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Don Brady <don.brady@delphix.com>

ZFS channel programs should be able to create snapshots.
In addition to the base snapshot functionality, this entails extra
logic to handle edge cases which were formerly not possible, such as
creating then destroying a snapshot in the same transaction sync.

OpenZFS-issue: https://www.illumos.org/issues/8600
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68089b8b
2018-02-08 15:29:24 -08:00
Brad Lewis af07368986 OpenZFS 8592 - ZFS channel programs - rollback
Authored by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Don Brady <don.brady@delphix.com>

ZFS channel programs should be able to perform a rollback.

OpenZFS-issue: https://www.illumos.org/issues/8592
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d46b5ed6
2018-02-08 15:29:14 -08:00
Chris Williamson 475eca4908 OpenZFS 8605 - zfs channel programs fix zfs.exists
Authored by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Don Brady <don.brady@delphix.com>

zfs.exists() in channel programs doesn't return any result, and should
have a man page entry. This patch corrects zfs.exists so that it
returns a value indicating if the dataset exists or not. It also adds
documentation about it in the man page.

OpenZFS-issue: https://www.illumos.org/issues/8605
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1e85e111
2018-02-08 15:28:52 -08:00
Chris Williamson d99a015343 OpenZFS 7431 - ZFS Channel Programs
Authored by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Don Brady <don.brady@delphix.com>
Ported-by: John Kennedy <john.kennedy@delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/7431
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dfc11533

Porting Notes:
* The CLI long option arguments for '-t' and '-m' don't parse on linux
* Switched from kmem_alloc to vmem_alloc in zcp_lua_alloc
* Lua implementation is built as its own module (zlua.ko)
* Lua headers consumed directly by zfs code moved to 'include/sys/lua/'
* There is no native setjmp/longjump available in stock Linux kernel.
  Brought over implementations from illumos and FreeBSD
* The get_temporary_prop() was adapted due to VFS platform differences
* Use of inline functions in lua parser to reduce stack usage per C call
* Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow
2018-02-08 15:28:18 -08:00
LOLi 63f88c12b4 Fix style issues in man pages and commands help
* Remove 'zfs snap' from zfs help message (OpenZFS sync)
* Update zfs(8) to suggest 'snap' can be used as an alias for 'snapshot'
* Enforce 80 columns limit in help messages
* Remove zfs_disable_dup_eviction from zfs-module-parameters(5)
* Expose zfs_scan_max_ext_gap as a kernel module parameter.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7087
2018-01-29 15:05:03 -08:00
Chunwei Chen 522db29275 zpool import -d to specify device path
When we know which devices have the pool we are looking for, sometime
it's better if we can directly pass those device paths to zpool import
instead of letting it to search through all unrelated stuff, which might
take a lot of time if you have hundreds of disks.

This patch allows option -d <dev_path> to zpool import. You can have
multiple pairs of -d <dev_path>, and zpool import will only search
through those devices. For example:

    zpool import -d /dev/sda -d /dev/sdb

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7077
2018-01-26 10:49:46 -08:00
Yuri Pankov 6df9f8ebd7 OpenZFS 8899 - zpool list property documentation doesn't match actual behaviour
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Alexander Pyhalov <alp@rsu.ru>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8899
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b0e142e57d
Closes #7032
2018-01-11 13:54:34 -08:00
Yuri Pankov bcb1a8a25e OpenZFS 8898 - creating fs with checksum=skein on the boot pools fails ungracefully
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8898
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9fa2266d9a
Closes #7031
2018-01-11 13:53:04 -08:00
DeHackEd 1c68856bca zpool(8): Fix "zpool import -t"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes #6894
2017-11-28 11:10:52 -06:00
Antonio Russo 5c2552c564 systemd zfs-import.target and documentation
zfs-import-{cache,scan}.service must complete before any mounting of
filesystems can occur. To simplify this dependency, create a target
that is reached After (in the systemd sense) the pool is imported.

Additionally, recommend that legacy zfs mounts use the option

x-systemd.requires=zfs-import.target

to codify this requirement.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #6764
2017-10-30 13:18:26 -07:00
Brian Behlendorf f4ae39a19d
Fix status command options in zpool(8)
The 'zpool status' command supports the -P option for printing full
path names.  It does not support the -p parsable option for printing
exact values.
    
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6792 
Closes #6794
2017-10-27 15:52:03 -07:00
Giuseppe Di Natale a94d38c0f3 Correct make mancheck recipe
The current make recipe for mancheck silently ignores errors. Correct
the recipe so errors cause the mancheck recipe fail.

The zpool reopen command in the zpool.8 manpage had a bullet list
without an .El.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6790
2017-10-27 09:52:18 -07:00
LOLi 88f9c9396b Allow 'zpool events' filtering by pool name
Additionally add four new tests:

 * zpool_events_clear: verify 'zpool events -c' functionality
 * zpool_events_cliargs: verify command line options and arguments
 * zpool_events_follow: verify 'zpool events -f'
 * zpool_events_poolname: verify events filtering by pool name

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3285 
Closes #6762
2017-10-26 16:49:33 -07:00
Arkadiusz Bubała d3f2cd7e3b Added no_scrub_restart flag to zpool reopen
Added -n flag to zpool reopen that allows a running scrub
operation to continue if there is a device with Dirty Time Log.

By default if a component device has a DTL and zpool reopen
is executed all running scan operations will be restarted.

Added functional tests for `zpool reopen`

Tests covers following scenarios:
* `zpool reopen` without arguments,
* `zpool reopen` with pool name as argument,
* `zpool reopen` while scrubbing,
* `zpool reopen -n` while scrubbing,
* `zpool reopen -n` while resilvering,
* `zpool reopen` with bad arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes #6076 
Closes #6746
2017-10-26 12:26:09 -07:00
Brian Behlendorf bbf1ad67cd Remove vn_rename and vn_remove dependency
The only place vn_rename and vn_remove are used is when writing
out an updated pool configuration file.  By truncating the file
instead of renaming and removing it we can avoid having to implement
these interfaces entirely.  Functionally an empty cache file is
treated the same as a missing cache file.  This is particularly
advantageous because the Linux kernel has never provided a way
to reliably implement vn_rename and vn_remove.

The cachefile_004_pos.ksh test case was updated to understand
that an empty cache file is the same as a missing one.

The zfs-import-* systemd service files were not updated to use
ConditionFileNotEmpty in place of ConditionPathExists.  This
means that after exporting all pools and rebooting new pools
will not the scanned for on the next boot.  This small change
should not impact normal usage since pools are not exported
as part of a normal shutdown.

Documentation was updated accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#648 
Closes #6753
2017-10-19 10:06:55 -07:00
Tom Caputi 4807c0badb Encryption patch follow-up
* PBKDF2 implementation changed to OpenSSL implementation.

* HKDF implementation moved to its own file and tests
  added to ensure correctness.

* Removed libzfs's now unnecessary dependency on libzpool
  and libicp.

* Ztest can now create and test encrypted datasets. This is
  currently disabled until issue #6526 is resolved, but
  otherwise functions as advertised.

* Several small bug fixes discovered after enabling ztest
  to run on encrypted datasets.

* Fixed coverity defects added by the encryption patch.

* Updated man pages for encrypted send / receive behavior.

* Fixed a bug where encrypted datasets could receive
  DRR_WRITE_EMBEDDED records.

* Minor code cleanups / consolidation.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
2017-10-11 16:54:48 -04:00
LOLi 90cdf2833d Add mdoc style checker
Add a new make 'mancheck' target which uses mandoc -Tlint to verify
manpage files: currently only zfs(8), zpool(8) zdb(8) and zgenhostid(8)
are supported.

Additionally fix some outstanding manpage formatting issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6646
2017-09-16 10:51:24 -07:00
George Melikov 7c9abcf887 OpenZFS 8435 - zpool.1m and zfs.1m: minor cleanup
3796 listsnapshots not documented in zpool man page

Authored by: George Melikov <mail@gmelikov.ru>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Ported-by: George Melikov mail@gmelikov.ru

OpenZFS-issue: https://www.illumos.org/issues/8435
OpenZFS-commit: openzfs/openzfs@a058d1c

Porting notes: OpenZFS review applied,
some ZoL changes were reverted.
See https://github.com/openzfs/openzfs/pull/415
2017-09-15 13:13:52 -07:00
LOLi 835db58592 Add -vnP support to 'zfs send' for bookmarks
This leverages the functionality introduced in cf7684b to expose
verbose, dry-run and parsable 'zfs send' options for bookmarks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3666 
Closes #6601
2017-09-08 15:24:31 -07:00
Mike Swanson 57858fb5ca Recommend compression=on in zfs(8) dedup section
compression=lz4 depends on the lz4 feature being enabled, while
compression=on will let ZFS use either lzjb or lz4 where appropriate.
It also allows the documentation to not go out of date if/when ZFS
picks a new default in the future.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #6614
2017-09-08 15:21:58 -07:00
bunder2015 2917956841 zfs(8) manpage corrections
Corrected indent of the note located at the bottom of the options for
zfs send as well as remove an extra whitespace

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6590
2017-09-05 13:45:18 -07:00
George Melikov 076e9b946e Remove copyright duplicate in zpool man page
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #6553
2017-08-24 10:36:17 -07:00
Giuseppe Di Natale d7323e79a6 OpenZFS 8547 - update mandoc to 1.14.3
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8547
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c66b804
Closes #6549
2017-08-24 10:30:42 -07:00
Alek P e4b6b2db12 OpenZFS 8414 - Implemented zpool scrub pause/resume
Authored by: Alek Pinchuk <apinchuk@datto.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Alek Pinchuk <apinchuk@datto.com>

OpenZFS-issue: https://www.illumos.org/issues/8414
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c29616076
Closes #6538
2017-08-24 10:27:20 -07:00
Tom Caputi 9b8407638d Send / Recv Fixes following b52563
This patch fixes several issues discovered after
the encryption patch was merged:

* Fixed a bug where encrypted datasets could attempt
  to receive embedded data records.

* Fixed a bug where dirty records created by the recv
  code wasn't properly setting the dr_raw flag.

* Fixed a typo where a dmu_tx_commit() was changed to
  dmu_tx_abort()

* Fixed a few error handling bugs unrelated to the
  encryption patch in dmu_recv_stream()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6512 
Closes #6524 
Closes #6545
2017-08-23 16:54:24 -07:00
Don Brady d977122da9 Add corruption failure option to zinject(8)
Added a 'corrupt' error option that will flip a bit in the data
after a read operation.  This is useful for generating checksum
errors at the device layer (in a mirror config for example). It
is also used to validate the diagnosis of checksum errors from
the zfs diagnosis engine.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #6345
2017-08-14 15:17:15 -07:00
Tom Caputi b525630342 Native Encryption for ZFS on Linux
This change incorporates three major pieces:

The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.

The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.

The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494 
Closes #5769
2017-08-14 10:36:48 -07:00
bunder2015 0f69f42b43 Correct man page generation
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6409
Closes #6410
2017-07-27 19:06:34 -07:00
Olaf Faaland b9373170e3 Add zgenhostid utility script
Turning the multihost property on requires that a hostid be set to allow
ZFS to determine when a foreign system is attemping to import a pool.
The error message instructing the user to set a hostid refers to
genhostid(1).

Genhostid(1) is not available on SUSE Linux.  This commit adds a script
modeled after genhostid(1) for those users.

Zgenhostid checks for an /etc/hostid file; if it does not exist, it
creates one and stores a value.  If the user has provided a hostid as an
argument, that value is used.  Otherwise, a random hostid is generated
and stored.

This differs from the CENTOS 6/7 versions of genhostid, which overwrite
the /etc/hostid file even though their manpages state otherwise.

A man page for zgenhostid is added. The one for genhostid is in (1), but
I put zgenhostid in (8) because I believe it's more appropriate.

The mmp tests are modified to use zgenhostid to set the hostid instead
of using the spl_hostid module parameter.  zgenhostid will not replace
an existing /etc/hostid file, so new mmp_clear_hostid calls are
required.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6358
Closes #6379
2017-07-25 13:22:03 -04:00
Giuseppe Di Natale d6bcf7ff5e Restrict zpool iostat/status -c to search path
zpool iostat/status -c is supposed to be restricted
by its search path, but currently isn't. To prevent
arbitrary scripts from being executed, disallow '/'
from commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6353 
Closes #6359
2017-07-24 11:53:59 -07:00
Ned Bass 7a8ed6b8b7 Minor fixes in zpool iostat -c documentation (#6370)
- Use nested [] notation to denote optional script list elements
- Fix space before comma after smarctl(8)
- Fix typo and formatting error in reference to -v option
- Fix spelling errors

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6370
2017-07-20 17:04:35 -07:00
Olaf Faaland 379ca9cf2b Multi-modifier protection (MMP)
Add multihost=on|off pool property to control MMP.  When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp.  Property defaults to off.

During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock.  Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".

Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval.  The period is specified in milliseconds.  The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.

Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path.  Abbreviated
output below.

$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg   timestamp  mmp_delay   vdev_guid   vdev_label vdev_path
20468    261337  250274925   68396651780       3    /dev/sda
20468    261339  252023374   6267402363293     1    /dev/sdc
20468    261340  252000858   6698080955233     1    /dev/sdx
20468    261341  251980635   783892869810      2    /dev/sdy
20468    261342  253385953   8923255792467     3    /dev/sdd
20468    261344  253336622   042125143176      0    /dev/sdab
20468    261345  253310522   1200778101278     2    /dev/sde
20468    261346  253286429   0950576198362     2    /dev/sdt
20468    261347  253261545   96209817917       3    /dev/sds
20468    261349  253238188   8555725937673     3    /dev/sdb

Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.

When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test.  For example, the
pool is exported to run zdb and then imported again.  Add a new ztest
function, "-M", to alter ztest behavior to prevent this.

Add new tests to verify the new functionality.  Tests provided by
Giuseppe Di Natale.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279
2017-07-13 13:54:00 -04:00
LOLi cf8738d853 Add port of FreeBSD 'volmode' property
The volmode property may be set to control the visibility of ZVOL
block devices.

This allow switching ZVOL between three modes:
   full - existing fully functional behaviour (default)
   dev  - hide partitions on ZVOL block devices
   none - not exposing volumes outside ZFS

Additionally the new zvol_volmode module parameter can be used to
control the default behaviour.

This functionality can be used, for instance, on "backup" pools to
avoid cluttering /dev with unneeded zd* devices.

Original-patch-by: mav <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

FreeBSD-commit: https://github.com/freebsd/freebsd/commit/dd28e6bb
Closes #1796 
Closes #3438 
Closes #6233
2017-07-12 13:05:37 -07:00