Commit Graph

382 Commits

Author SHA1 Message Date
Ahmed Gahnem 4e82b4be78 OpenZFS 9184 - Add ZFS performance test for fixed blocksize random read/write IO
This change introduces a new performance test which does random reads
and writes, but instead of using `bssplit` to determine the block size,
it uses a fixed blocksize. Additionally, some new IO sizes are added to
other tests and timestamp data is recorded with the performance data.

Authored by: Ahmed Gahnem <ahmedg@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Ported-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Requires-builders: perf

OpenZFS-issue: https://www.illumos.org/issues/9184
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/659
External-issue: DLPX-46724
Closes #7660
2018-07-02 13:46:06 -07:00
Brian Behlendorf e03a41a604
ZTS: Improve enospc tests
The enospc_002_pos test case would frequently fail due a command
succeeding when it was expected to fail due to lack of space.
In order to make this far less likely, files are created across
multiple transaction groups in order to consume as many unused
blocks as possible.

The dependency that the tests run on a partitioned block device
has been removed.  It's simpler to use sparse files.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7663
2018-06-29 09:40:32 -07:00
Serapheim Dimitropoulos d2734cce68 OpenZFS 9166 - zfs storage pool checkpoint
Details about the motivation of this feature and its usage can
be found in this blogpost:

    https://sdimitro.github.io/post/zpool-checkpoint/

A lightning talk of this feature can be found here:
https://www.youtube.com/watch?v=fPQA8K40jAM

Implementation details can be found in big block comment of
spa_checkpoint.c

Side-changes that are relevant to this commit but not explained
elsewhere:

* renames members of "struct metaslab trees to be shorter without
  losing meaning

* space_map_{alloc,truncate}() accept a block size as a
  parameter. The reason is that in the current state all space
  maps that we allocate through the DMU use a global tunable
  (space_map_blksz) which defauls to 4KB. This is ok for metaslab
  space maps in terms of bandwirdth since they are scattered all
  over the disk. But for other space maps this default is probably
  not what we want. Examples are device removal's vdev_obsolete_sm
  or vdev_chedkpoint_sm from this review. Both of these have a
  1:1 relationship with each vdev and could benefit from a bigger
  block size.

Porting notes:

* The part of dsl_scan_sync() which handles async destroys has
  been moved into the new dsl_process_async_destroys() function.

* Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write
  to block device backed pools.

* ZTS:
  * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg".

  * Don't use large dd block sizes on /dev/urandom under Linux in
    checkpoint_capacity.

  * Adopt Delphix-OS's setting of 4 (spa_asize_inflation =
    SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed
    its attempts to fill the pool

  * Create the base and nested pools with sync=disabled to speed up
    the "setup" phase.

  * Clear labels in test pool between checkpoint tests to avoid
    duplicate pool issues.

  * The import_rewind_device_replaced test has been marked as "known
    to fail" for the reasons listed in its DISCLAIMER.

  * New module parameters:

      zfs_spa_discard_memory_limit,
      zfs_remove_max_bytes_pause (not documented - debugging only)
      vdev_max_ms_count (formerly metaslabs_per_vdev)
      vdev_min_ms_count

Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9166
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8
Closes #7570
2018-06-26 10:07:42 -07:00
Tim Chase 88eaf610d9 Use "eval" in history_002_pos for log_must
Otherwise the output is consumed by the output redirection.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #7570
2018-06-26 09:58:05 -07:00
Serapheim Dimitropoulos 7637ef8d23 OpenZFS 9591 - ms_shift can be incorrectly changed
ms_shift can be incorrectly changed changed in MOS config for
indirect vdevs that have been historically expanded

According to spa_config_update() we expect new vdevs to have
vdev_ms_array equal to 0 and then we go ahead and set their metaslab
size. The problem is that indirect vdevs also have vdev_ms_array == 0
because their metaslabs are destroyed once their removal is done.

As a result, if a vdev was expanded and then removed may have its
ms_shift changed if another vdev was added after its removal.
Fortunately this behavior does not cause any type of crash or bad
behavior in the kernel but it can confuse zdb and anyone doing any kind
of analysis of the history of the pools.

Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>

OpenZFS-commit: https://github.com/openzfs/openzfs/pull/651
OpenZFS-issue: https://illumos.org/issues/9591a
External-issue: DLPX-58879
Closes #7644
2018-06-21 09:35:26 -07:00
Brian Behlendorf e4a3297a04
ZTS: Adopt OpenZFS test analysis script
Adopt and extend the OpenZFS ZTS results analysis script for use
with ZFS on Linux.  This allows for automatic analysis of tests
which may be skipped for a variety or reasons or which are not
entirely reliable.

In addition to the list of 'known' failures, which have been updated
for ZFS on Linux, there in a new 'maybe' section.  This mapping
include tests which might be correctly skipped depending on the
test environment.  This may be because of a missing dependency or
lack of required kernel support.  This list also includes tests
which normally pass but might on occasion fail for a harmless
reason.

The script was also extended include a reason for why a given test
might be skipped or may fail.  The reason will be included after
the test in the "results other than PASS that are expected" section.
For failures it is preferable to set the reason to the GitHub issue
number and for skipped tests several generic reasons are available.
You may also specify a custom reason if needed.

All tests were added back in to the linux.run file even if they are
expected to failed.  There is value in running tests which may not
pass, the expected results for these tests has been encoded in
the new analysis script.

All tests which were disabled because they ran more slowly on a
32-bit system have been re-enabled.  Developers working on 32-bit
systems should assess what it reasonable for their environment.

The unnecessary dependency on physical block devices was removed for
the checksum, grow_pool, and grow_replicas test groups so they are
no longer skipped.  Updated the filetest_001_pos test case to run
properly now that it is enabled and moved the grow tests in to a
single directory.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7638
2018-06-20 14:03:13 -07:00
John Gallagher 917f475fba Add tunables for channel programs
This patch adds tunables for modifying the maximum memory limit and
maximum instruction limit that can be specified when running a channel
program.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
External-issue: LX-1085
Closes #7618
2018-06-15 15:10:42 -07:00
John Gallagher ab24877bd3 ZTS: deletes home directories in /export/home
In the cleanup for the privilege tests, an empty variable, empty because
the corresponding setup is skipped on Linux, results in /export/home
being deleted. This patch adds an assertion that the variable is not
empty, and causes the cleanup to be skipped on Linux as well.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
External-issue: LX-1099
Closes #7615
2018-06-12 10:42:26 -07:00
bunder2015 5277571260 ZTS: cleanup user_run
user_run leaves two files in /tmp, moving them to $TEST_BASE_DIR and
adding them to the default cleanup routine.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7614
2018-06-12 10:37:12 -07:00
Antonio Russo 39042f9736 Tunable directory for zfs runtime scripts
zpool and zed place scripts in subdirectories of libexecdir. Some
distributions locate architecture independent scripts in other locations
(e.g. Debian). To avoid these paths getting out of sync, centralize the
definitions.

Build zfs-test's default.cfg by Makefile.  Use the new directory
logic building tests/zfs-tests/include/default.cfg.in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7597
2018-06-07 09:59:59 -07:00
Antonio Russo 7106b23640 Minor documentation, logging, and testing typos
This patch collects some minor inconsistencies and typos in the
documentation, logging and testing infrastructure.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7608
2018-06-07 09:38:39 -07:00
Alek P 6969afcefd Always continue recursive destroy after error
Currently, during a recursive zfs destroy the first error that is
encountered will stop the destruction of the datasets. Errors may
happen for a variety of reasons including competing deletions
and busy datasets.
This patch switches recursive destroy to always do a best-effort
recursive dataset destroy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7574
2018-06-06 10:14:52 -07:00
bunder2015 62841115bc ZTS: history path cleanup
History tests were hard coded to use /tmp and didn't clean up
properly after testing.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Issue #7507 
Closes #7600
2018-06-06 10:00:22 -07:00
Tony Hutter f0ed6c7448 Add pool state /proc entry, "SUSPENDED" pools
1. Add a proc entry to display the pool's state:

$ cat /proc/spl/kstat/zfs/tank/state
ONLINE

This is done without using the spa config locks, so it will
never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7331 
Closes #7563
2018-06-06 09:33:54 -07:00
John Wren Kennedy ab44e511e2 OpenZFS 9245 - zfs-test failures: slog_013_pos and slog_014_pos
Test 13 would fail because of attempts to zpool destroy -f a pool that
was still busy. Changed those calls to destroy_pool which does a retry
loop, and the problem is no longer reproducible. Also removed some non
functional code in the test which is why it was originally commented out
by placing it after the call to log_pass.

Test 14 would fail because sometimes the check for a degraded pool would
complete before the pool had changed state. Changed the logic to check
in a loop with a timeout and the problem is no longer reproducible.

Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Porting Notes:
* Re-enabled slog_013_pos.ksh

OpenZFS-issue: https://illumos.org/issues/9245
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8f323b5
Closes #7585
2018-06-04 14:55:02 -07:00
Sara Hartse 74d42600d8 zpool reopen should detect expanded devices
Update bdev_capacity to have wholedisk vdevs query the
size of the underlying block device (correcting for the size
of the efi parition and partition alignment) and therefore detect
expanded space.

Correct vdev_get_stats_ex so that the expandsize is aligned
to metaslab size and new space is only reported if it is large
enough for a new metaslab.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
External-issue: LX-165
Closes #7546 
Issue #7582
2018-05-31 10:36:37 -07:00
John Wren Kennedy 93491c4bb9 OpenZFS 9082 - Add ZFS performance test targeting ZIL latency
This adds a new test to measure ZIL performance.

- Adds the ability to induce IO delays with zinject
- Adds a new variable (PERF_NTHREADS_PER_FS) to allow fio threads to
  be distributed to individual file systems as opposed to all IO going
  to one, as happens elsewhere.
- Refactoring of do_fio_run

Authored by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: John Wren Kennedy <jwk404@gmail.com>

OpenZFS-issue: https://www.illumos.org/issues/9082
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/634
External-issue: DLPX-48625
Closes #7491
2018-05-30 11:59:04 -07:00
Brian Behlendorf 93ce2b4ca5 Update build system and packaging
Minimal changes required to integrate the SPL sources in to the
ZFS repository build infrastructure and packaging.

Build system and packaging:
  * Renamed SPL_* autoconf m4 macros to ZFS_*.
  * Removed redundant SPL_* autoconf m4 macros.
  * Updated the RPM spec files to remove SPL package dependency.
  * The zfs package obsoletes the spl package, and the zfs-kmod
    package obsoletes the spl-kmod package.
  * The zfs-kmod-devel* packages were updated to add compatibility
    symlinks under /usr/src/spl-x.y.z until all dependent packages
    can be updated.  They will be removed in a future release.
  * Updated copy-builtin script for in-kernel builds.
  * Updated DKMS package to include the spl.ko.
  * Updated stale AUTHORS file to include all contributors.
  * Updated stale COPYRIGHT and included the SPL as an exception.
  * Renamed README.markdown to README.md
  * Renamed OPENSOLARIS.LICENSE to LICENSE.
  * Renamed DISCLAIMER to NOTICE.

Required code changes:
  * Removed redundant HAVE_SPL macro.
  * Removed _BOOT from nvpairs since it doesn't apply for Linux.
  * Initial header cleanup (removal of empty headers, refactoring).
  * Remove SPL repository clone/build from zimport.sh.
  * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due
    to build issues when forcing C99 compilation.
  * Replaced legacy ACCESS_ONCE with READ_ONCE.
  * Include needed headers for `current` and `EXPORT_SYMBOL`.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes #7556
2018-05-29 16:00:33 -07:00
Tony Nguyen ba863d0be4 Profiling for perf tests
Stack profiling is quite useful and Linux ZFS test suite does not
current collect that data.

Linux perf is a common tool for this purpose though the perf record
data file can be quite large. With this change, Linux ZFS perf tests
capture perf record data if perf is installed on the system and
PERF_DO_PROFILING environment variable is set.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
External-issue: LX-971
Closes #7549
2018-05-22 10:51:46 -07:00
George Melikov f46209dd6b Update `tests/README.md` and fix markdown
- there are more options now
- command examples are more readable in code style

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7538
2018-05-15 09:01:28 -07:00
Brian Behlendorf ab6a2b5cd7
ZTS: Improve zpool_scrub_004_pos reliability
It's possible for the `zpool attach` portion of this test case
to complete before the `zpool scrub` can be issued.  Update the
test case to force the resilvering phase to take longer.

Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5444 
Closes #7541
2018-05-15 08:58:46 -07:00
Brian Behlendorf 8c64fe0442
ZTS: Update O_TMPFILE support check
In CentOS 7.5 the kernel provided a compatibility wrapper to support
O_TMPFILE.  This results in the test setup script correctly detecting
kernel support.  But the ZFS module was built without O_TMPFILE
support due to the non-standard CentOS kernel interface.

Handle this case by updating the setup check to fail either when
the kernel or the ZFS module fail to provide support.  The reason
will be clearly logged in the test results.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7528
2018-05-14 20:36:30 -07:00
Pavel Zakharov 189bd0b670 OpenZFS 9190 - Fix cleanup routine in import_cachefile_device_replaced.ksh
Must clear slow-disk zinject injections in test cleanup routine.
Otherwise, when this test fails, it causes most subsequent tests
to fail.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9190
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/762c6b4
Closes #7530
2018-05-14 14:30:28 -04:00
bunder2015 29badadd4e Fix shebangs on import tests
Incorrect shebangs were used when porting.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7523  
Closes #7524
2018-05-11 12:37:44 -07:00
bunder2015 670d74b9ce ZTS: enospc_002 path cleanup
Removing hard-coded path used in enospc_002

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7515
2018-05-08 21:42:58 -07:00
Tim Chase f3d28f0a59 Streamline the zpool_import tests
Don't create an ext4 file system atop $DEV_DISKDIR/$DISK2.
There's likely to not be sufficient space for it to succeed.
Instead, simply create the vdev files in the directory where it
would have been mounted.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7459
2018-05-08 21:40:13 -07:00
Pavel Zakharov 6cb8e5306d OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

	7638 Refactor spa_load_impl into several functions
	8961 SPA load/import should tell us why it failed
	7277 zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
of data, this feature is for now restricted to a read-only import.

Porting notes (ZTS):
* Fix 'make dist' target in zpool_import

* The maximum path length allowed by tar is 99 characters.  Several
  of the new test cases exceeded this limit resulting in them not
  being included in the tarball.  Shorten the names slightly.

* Set/get tunables using accessor functions.

* Get last synced txg via the "zfs_txg_history" mechanism.

* Clear zinject handlers in cleanup for import_cache_device_replaced
  and import_rewind_device_replaced in order that the zpool can be
  exported if there is an error.

* Increase FILESIZE to 8G in zfs-test.sh to allow for a larger
  ext4 file system to be created on ZFS_DISK2.  Also, there's
  no need to partition ZFS_DISK2 at all.  The partitioning had
  already been disabled for multipath devices.  Among other things,
  the partitioning steals some space from the ext4 file system,
  makes it difficult to accurately calculate the paramters to
  parted and can make some of the tests fail.

* Increase FS_SIZE and FILE_SIZE in the zpool_import test
  configuration now that FILESIZE is larger.

* Write more data in order that device evacuation take lonnger in
  a couple tests.

* Use mkdir -p to avoid errors when the directory already exists.

* Remove use of sudo in import_rewind_config_changed.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9075
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/619c0123
Closes #7459
2018-05-08 21:35:27 -07:00
Paul Dagnelie ca0845d59e OpenZFS 9256 - zfs send space estimation off by > 10% on some datasets
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

Porting Notes:
* Added tuning to man page.
* Test case changes dropped, default behavior unchanged.

OpenZFS-issue: https://www.illumos.org/issues/9256
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/32356b3c56
Closes #7470
2018-05-08 08:59:24 -07:00
LOLi 4ceb8dd6fd Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom
dataset properties: this is because we mistakenly call
zfs_set_prop_nvlist() on the "real" pool name which, as expected,
cannot be found because the SPA is present in the namespace with the
temporary name.

Fix this by specifying the correct pool name when setting the dataset
properties.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7502 
Closes #7509
2018-05-07 21:11:58 -07:00
Brian Behlendorf c02c1becce
ZTS: Re-enable MMP tests
Commit 7fab6361 inadvertently disabled the MMP test cases by creating
and not removing an /etc/hostid file in the new zpool_split_props test
case.  When the file exists the ZTS skips the entire MMP test group
rather than modify what may be a system which is already configured.
Update the test case to remove the file.

Additionally, because the MMP tests were disabled a regression slipped
in as part of commit 9eb7b46ed0.  Fix it.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7514
2018-05-07 21:08:33 -07:00
bunder2015 a82a4a15be ZTS: remove dead cleanup code from snapshot tests
Caught during path cleanups, the files referenced do not appear to be
created or used anywhere.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7508
2018-05-06 21:02:10 -07:00
LOLi fb0da71fd9 ZTS: Fix zfs_diff_timestamp
When using mawk instead of gawk zfs_diff_timestamp fails consistently:
this is due to a subtle difference in how mawk handles substr().

From awk(1):
---
Finally, here is how mawk handles exceptional cases not discussed in
the AWK book or the Posix draft.  It is unsafe to assume consistency
across awks and safe to skip to the next section.
substr(s,  i,  n) returns the characters of s in the intersection of
the closed interval [1, length(s)] and the half-open interval [i, i+n).
When this intersection is empty, the empty string is returned; so
substr("ABC", 1, 0) = "" and substr("ABC", -4, 6) = "A".
---

To support running zfs_diff_timestamp with both gawk and mawk change
the second parameter passed to substr().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7503 
Closes #7510
2018-05-06 20:42:29 -07:00
Tom Caputi be9a5c355c Add support for decryption faults in zinject
This patch adds the ability for zinject to trigger decryption
and authentication faults in the ZIO and ARC layers. This
functionality is exposed via the new "decrypt" error type, which
may be provided for "data" object types.

This patch also refactors some of the core encryption / decryption
functions so that they have consistent prototypes, handle errors
consistently, and do not have unused arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7474
2018-05-02 15:36:20 -07:00
Tom Caputi 2c24b5b148 Fix issues found with zfs diff
Two deadlocks / ASSERT failures were introduced in a2c2ed1b which
would occur whenever arc_buf_fill() failed to decrypt a block of
data. This occurred because the call to arc_buf_destroy() which
was responsible for cleaning up the newly created buffer would
attempt to take out the hdr lock that it was already holding. This
was resolved by calling the underlying functions directly without
retaking the lock.

In addition, the dmu_diff() code did not properly ensure that keys
were loaded and mapped before begining dataset traversal. It turns
out that this code does not need to look at any encrypted values,
so the code was altered to perform raw IO only.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7354 
Closes #7456
2018-05-01 11:24:20 -07:00
loli10K 85ce3f4fd1 Adopt pyzfs from ClusterHQ
This commit introduces several changes:

 * Update LICENSE and project information

 * Give a good PEP8 talk to existing Python source code

 * Add RPM/DEB packaging for pyzfs

 * Fix some outstanding issues with the existing pyzfs code caused by
   changes in the ABI since the last time the code was updated

 * Integrate pyzfs Python unittest with the ZFS Test Suite

 * Add missing libzfs_core functions: lzc_change_key,
   lzc_channel_program, lzc_channel_program_nosync, lzc_load_key,
   lzc_receive_one, lzc_receive_resumable, lzc_receive_with_cmdprops,
   lzc_receive_with_header, lzc_reopen, lzc_send_resume, lzc_sync,
   lzc_unload_key, lzc_remap

Note: this commit slightly changes zfs_ioc_unload_key() ABI. This allow
to differentiate the case where we tried to unload a key on a
non-existing dataset (ENOENT) from the situation where a dataset has
no key loaded: this is consistent with the "change" case where trying
to zfs_ioc_change_key() from a dataset with no key results in EACCES.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7230
2018-05-01 10:33:35 -07:00
LOLi 3cbe89b12a Fix zfs incremental send remove '-o' properties
When receiving an incremental send stream with intermediary snapshots
zfs_receive_one() does not correctly identify the top-level dataset:
consequently we restore said snapshots as if they were children
datasets in the hierarchy, forcing inheritance of any property received
with 'zfs send -o' and effectively removing any locally set value.

The test case did not correctly verify this situation because it uses
adjacent snapshots, basically testing 'zfs send -i' instead of
'zfs send -I': this commit adds an additional intermediary snapshot to
the test script.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7478
2018-04-30 20:58:29 -07:00
Antonio Russo c83ccb3e72 Add test with two kinds of file creation orders
Data loss was identified in #7401 when many small files were copied.
This adds a reproducer for this bug and other similar ones: randomly
generate N files. Then, listing M of them by `ls -U` order, produce
those same files in a directory of the same name.

This triggers the bug consistently, provided N and M are large enough.
Here, N=2^16 and M=2^13.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7411
2018-04-30 12:45:47 -05:00
LOLi b4555c777a Fix 'zfs remap <poolname@snapname>'
Only filesystems and volumes are valid 'zfs remap' parameters: when
passed a snapshot name zfs_remap_indirects() does not handle the
EINVAL returned from libzfs_core, which results in failing an assertion
and consequently crashing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7454
2018-04-19 09:45:17 -07:00
Chunwei Chen 599b864813 Fix ENOSPC in "Handle zap_add() failures in ..."
Commit cc63068 caused ENOSPC error when copy a large amount of files
between two directories. The reason is that the patch limits zap leaf
expansion to 2 retries, and return ENOSPC when failed.

The intent for limiting retries is to prevent pointlessly growing table
to max size when adding a block full of entries with same name in
different case in mixed mode. However, it turns out we cannot use any
limit on the retry. When we copy files from one directory in readdir
order, we are copying in hash order, one leaf block at a time. Which
means that if the leaf block in source directory has expanded 6 times,
and you copy those entries in that block, by the time you need to expand
the leaf in destination directory, you need to expand it 6 times in one
go. So any limit on the retry will result in error where it shouldn't.

Note that while we do use different salt for different directories, it
seems that the salt/hash function doesn't provide enough randomization
to the hash distance to prevent this from happening.

Since cc63068 has already been reverted. This patch adds it back and
removes the retry limit.

Also, as it turn out, failing on zap_add() has a serious side effect for
mzap_upgrade(). When upgrading from micro zap to fat zap, it will
call zap_add() to transfer entries one at a time. If it hit any error
halfway through, the remaining entries will be lost, causing those files
to become orphan. This patch add a VERIFY to catch it.

Reviewed-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Albert Lee <trisk@forkgnu.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7401 
Closes #7421
2018-04-18 14:19:50 -07:00
Tom Caputi b0ee5946aa Fix issues with raw sends of spill blocks
This patch fixes 2 issues in how spill blocks are processed during
raw sends. The first problem is that compressed spill blocks were
using the logical length rather than the physical length to
determine how much data to dump into the send stream. The second
issue is a typo that caused the spill record's object number to be
used where the objset's ID number was required. Both issues have
been corrected, and the payload_size is now printed in zstreamdump
for future debugging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7378 
Closes #7432
2018-04-17 11:19:03 -07:00
Tom Caputi e14a32b1c8 Fix object reclaim when using large dnodes
Currently, when the receive_object() code wants to reclaim an
object, it always assumes that the dnode is the legacy 512 bytes,
even when the incoming bonus buffer exceeds this length. This
causes a buffer overflow if --enable-debug is not provided and
triggers an ASSERT if it is. This patch resolves this issue and
adds an ASSERT to ensure this can't happen again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7097
Closes #7433
2018-04-17 11:13:57 -07:00
bunder2015 b40d45bc6c ZTS: fix reservation_013_pos integer overflow
When using large disks the integers for calculating sizes can
overflow past 2**31.  Changing to long integers with typeset
should correct this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #4444 
Closes #7451
2018-04-17 10:52:53 -07:00
Matt Ahrens d830d4795a OpenZFS 9280 - Assertion failure while running removal_with_ganging test with 4K devices
Authored by: Matt Ahrens <Matt.Ahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9280
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/243952c
Closes #7445
2018-04-17 10:44:50 -07:00
bunder2015 3eba666332 ZTS: zpool_create_002 clean up leftover filedisk
zpool_create_002_pos did not clean up filedisk files left over from
running the test.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7435
Closes #7439
2018-04-15 15:17:44 -07:00
Toomas Soome 5e567da987 OpenZFS 9213 - zfs: sytem typo
Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Porting Notes:
* The additional instances of this typo addressed in the OpenZFS
  patch were already resolved.

OpenZFS-issue: https://illumos.org/issues/9213
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/edc8ef7d92
Closes #7436
2018-04-15 10:59:13 -07:00
Matthew Ahrens a1d477c24c OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete

This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk.  The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool.  An entry becomes obsolete when all the blocks that use
it are freed.  An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones).  Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible.  This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of
the data that is copied.  This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.

At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.

Porting Notes:

* Avoid zero-sized kmem_alloc() in vdev_compact_children().

    The device evacuation code adds a dependency that
    vdev_compact_children() be able to properly empty the vdev_child
    array by setting it to NULL and zeroing vdev_children.  Under Linux,
    kmem_alloc() and related functions return a sentinel pointer rather
    than NULL for zero-sized allocations.

* Remove comment regarding "mpt" driver where zfs_remove_max_segment
  is initialized to SPA_MAXBLOCKSIZE.

  Change zfs_condense_indirect_commit_entry_delay_ticks to
  zfs_condense_indirect_commit_entry_delay_ms for consistency with
  most other tunables in which delays are specified in ms.

* ZTS changes:

    Use set_tunable rather than mdb
    Use zpool sync as appropriate
    Use sync_pool instead of sync
    Kill jobs during test_removal_with_operation to allow unmount/export
    Don't add non-disk names such as "mirror" or "raidz" to $DISKS
    Use $TEST_BASE_DIR instead of /tmp
    Increase HZ from 100 to 1000 which is more common on Linux

    removal_multiple_indirection.ksh
        Reduce iterations in order to not time out on the code
        coverage builders.

    removal_resume_export:
        Functionally, the test case is correct but there exists a race
        where the kernel thread hasn't been fully started yet and is
        not visible.  Wait for up to 1 second for the removal thread
        to be started before giving up on it.  Also, increase the
        amount of data copied in order that the removal not finish
        before the export has a chance to fail.

* MMP compatibility, the concept of concrete versus non-concrete devices
  has slightly changed the semantics of vdev_writeable().  Update
  mmp_random_leaf_impl() accordingly.

* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
  feature which is not supported by OpenZFS.

* Added support for new vdev removal tracepoints.

* Test cases removal_with_zdb and removal_condense_export have been
  intentionally disabled.  When run manually they pass as intended,
  but when running in the automated test environment they produce
  unreliable results on the latest Fedora release.

  They may work better once the upstream pool import refectoring is
  merged into ZoL at which point they will be re-enabled.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2018-04-14 12:16:17 -07:00
Tim Chase 4b0f5b2d7b Wait for resilver after online
This test performs a rapid offline/online cycle of each of several
mirror vdevs.  It can run so quickly that there isn't sufficient pool
redundancy to perform an offline.  The solution is to wait until the
pool is resilvered following the online operation.

Also, add a pool sync before the offline operation to help reduce
spurious errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #6900
2018-04-13 18:04:10 -07:00
Seth Forshee 93b43af10d Allow mounting datasets more than once
Currently mounting an already mounted zfs dataset results in an
error, whereas it is typically allowed with other filesystems.
This causes some bad interactions with mount namespaces. Take
this sequence for example:

- Create a dataset
- Create a snapshot of the dataset
- Create a clone of the snapshot
- Create a new mount namespace
- Rename the original dataset

The rename results in unmounting and remounting the clone in the
original mount namespace, however the remount fails because the
dataset is still mounted in the new mount namespace. (Note that
this means the mount in the new mount namespace is never being
unmounted, so perhaps the unmount/remount of the clone isn't
actually necessary.)

The problem here is a result of the way mounting is implemented
in the kernel module. Since it is not mounting block devices it
uses mount_nodev() instead of the usual mount_bdev(). However,
mount_nodev() is written for filesystems for which each mount is
a new instance (i.e. a new super block), and zfs should be able
to detect when a mount request can be satisfied using an existing
super block.

Change zpl_mount() to call sget() directly with it's own test
callback. Passing the objset_t object as the fs data allows
checking if a superblock already exists for the dataset, and in
that case we just need to return a new reference for the sb's
root dentry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Closes #5796
Closes #7207
2018-04-13 10:44:05 -07:00
bunder2015 1e37dee03f ZTS: clean up leftover ibackup_trunc files
zfs_receive_raw_incremental did not clean up ibackup_trunc.* files
left over from running the test.

Also changing the path of the ibackup files so they can be placed
in the correct directories when /var/tmp is not the temporary
directory.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7430
2018-04-13 10:35:55 -07:00
LOLi 7fab636188 Add 'zpool split' coverage to the ZFS Test Suite
This change adds five new tests to the ZTS:

 * zpool_split_cliargs: verify command line options and arguments
 * zpool_split_devices: verify zpool split accepts a device list
 * zpool_split_encryption: verify zpool can split encrypted pools
 * zpool_split_props: verify zpool split can set property values
 * zpool_split_vdevs: verify vdev layout when splitting the pool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7409
2018-04-12 10:57:24 -07:00