Commit Graph

4743 Commits

Author SHA1 Message Date
Paul Zuchowski 2ffd89fcb9 Wrong error message when removing log device
In the case where the pool is loaded without the crypto
keys necessary to playback the intent log, and log device
removal is attempted, a generic busy message is received.
Change the message to inform the user that the datasets
must be mounted.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #7518
2018-06-07 18:07:29 -07:00
Brian Behlendorf 174bcd581d
Fix preemptible warning in aggsum_add()
In the new aggsum counters the CPU_SEQID macro should be surrounded by
kpreempt_disable)() and kpreempt_enable() calls to prevent a Linux
kernel BUG warning.  The addsum_add() function use the cpuid to
minimize lock contention when selecting a bucket, after selection
the bucket is protected by a mutex and it is safe to reschedule the
process to a different processor at any time.

Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7609
Closes #7610
2018-06-07 15:55:11 -07:00
Antonio Russo 39042f9736 Tunable directory for zfs runtime scripts
zpool and zed place scripts in subdirectories of libexecdir. Some
distributions locate architecture independent scripts in other locations
(e.g. Debian). To avoid these paths getting out of sync, centralize the
definitions.

Build zfs-test's default.cfg by Makefile.  Use the new directory
logic building tests/zfs-tests/include/default.cfg.in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7597
2018-06-07 09:59:59 -07:00
Nathaniel Clark fba33c3819 Don't panic on bad SA_MAGIC in sa_build_index
If sa_build_index() encounters a corrupt buffer, don't panic.
Add info to zfs ring buffer and return EIO.  This allows for a cleaner
error recovery path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Issue #6500 
Closes #7487
2018-06-07 09:51:56 -07:00
Antonio Russo 7106b23640 Minor documentation, logging, and testing typos
This patch collects some minor inconsistencies and typos in the
documentation, logging and testing infrastructure.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7608
2018-06-07 09:38:39 -07:00
Tom Caputi b405837a6c Update the correct abd in l2arc_read_done()
This patch fixes an issue where l2arc_read_done() would always
write data to b_pabd, even if raw encrypted data was requested.
This only occured in cases where the L2ARC device had a different
ashift than the main pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7586 
Closes #7593
2018-06-06 10:17:50 -07:00
Tom Caputi e7504d7a18 Raw receive functions must not decrypt data
This patch fixes a small bug found where receive_spill() sometimes
attempted to decrypt spill blocks when doing a raw receive. In
addition, this patch fixes another small issue in arc_buf_fill()'s
error handling where a decryption failure (which could be caused by
the first bug) would attempt to set the arc header's IO_ERROR flag
without holding the header's lock.

Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7564 
Closes #7584 
Closes #7592
2018-06-06 10:16:41 -07:00
Alek P 6969afcefd Always continue recursive destroy after error
Currently, during a recursive zfs destroy the first error that is
encountered will stop the destruction of the datasets. Errors may
happen for a variety of reasons including competing deletions
and busy datasets.
This patch switches recursive destroy to always do a best-effort
recursive dataset destroy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes #7574
2018-06-06 10:14:52 -07:00
bunder2015 62841115bc ZTS: history path cleanup
History tests were hard coded to use /tmp and didn't clean up
properly after testing.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Issue #7507 
Closes #7600
2018-06-06 10:00:22 -07:00
Paul Dagnelie 37fb3e4318 OpenZFS 8484 - Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Paul Dagnelie <pcd@delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/8484
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7028a8b92b7
Issue #3752
Closes #7462
2018-06-06 09:35:59 -07:00
Tony Hutter f0ed6c7448 Add pool state /proc entry, "SUSPENDED" pools
1. Add a proc entry to display the pool's state:

$ cat /proc/spl/kstat/zfs/tank/state
ONLINE

This is done without using the spa config locks, so it will
never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7331 
Closes #7563
2018-06-06 09:33:54 -07:00
Brian Behlendorf 2d9142c9d4
Remove rwlock wrappers
The only remaining consumer of the rwlock compatibility wrappers
is ztest.  Remove the wrappers and convert the few remaining
calls to the underlying pthread functions.

    rwlock_init()    -> pthread_rwlock_init()
    rwlock_destroy() -> pthread_rwlock_destroy()
    rw_rdlock()      -> pthread_rwlock_rdlock()
    rw_wrlock()      -> pthread_rwlock_wrlock()
    rw_unlock()      -> pthread_rwlock_unlock()

Note pthread_rwlock_init() defaults to PTHREAD_PROCESS_PRIVATE
which is equivilant to the USYNC_THREAD behavior.  There is no
functional change.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7591
2018-06-04 16:52:10 -07:00
Serapheim Dimitropoulos e48afbc4eb OpenZFS 9464 - txg_kick() fails to see that we are quiescing
txg_kick() fails to see that we are quiescing, forcing transactions to
their next stages without leaving them accumulate changes

Creating a fragmented pool in a DCenter VM and continuously writing to it with
multiple instances of randwritecomp, we get the following output from txg.d:

    0ms   311MB in  4114ms (95% p1)  75MB/s  544MB (76%)  336us   153ms     0ms
    0ms     8MB in    51ms ( 0% p1) 163MB/s  474MB (66%)  129us    34ms     0ms
    0ms   366MB in  4454ms (93% p1)  82MB/s  572MB (79%)  498us    20ms     0ms
    0ms   406MB in  5212ms (95% p1)  77MB/s  591MB (82%)  661us    37ms     0ms
    0ms   340MB in  5110ms (94% p1)  66MB/s  622MB (86%) 1048us    41ms     1ms
    0ms     3MB in    61ms ( 0% p1)  51MB/s  419MB (58%)   33us     0ms     0ms
    0ms   361MB in  3555ms (88% p1) 101MB/s  542MB (75%)  335us    40ms     0ms
    0ms   356MB in  4592ms (92% p1)  77MB/s  561MB (78%)  430us    89ms     1ms
    0ms    11MB in   129ms (13% p1)  90MB/s  507MB (70%)  222us    15ms     0ms
    0ms   281MB in  2520ms (89% p1) 111MB/s  542MB (75%)  334us    42ms     0ms
    0ms   383MB in  3666ms (91% p1) 104MB/s  557MB (77%)  411us   133ms     0ms
    0ms   404MB in  5757ms (94% p1)  70MB/s  635MB (88%) 1274us   123ms     2ms
    4ms   367MB in  4172ms (89% p1)  88MB/s  556MB (77%)  401us    51ms     0ms
    0ms    42MB in   470ms (44% p1)  90MB/s  557MB (77%)  412us    43ms     0ms
    0ms   261MB in  2273ms (88% p1) 114MB/s  556MB (77%)  407us    27ms     0ms
    0ms   394MB in  3646ms (85% p1) 108MB/s  552MB (77%)  393us   304ms     0ms
    0ms   275MB in  2416ms (89% p1) 113MB/s  510MB (71%)  200us    53ms     0ms
    0ms     9MB in    53ms ( 0% p1) 169MB/s  483MB (67%)  140us   100ms     1ms

The TXGs that are getting synced and don't have lots of changes are pushed by
txg_kick() which basically forces the current open txg to get to the quiesced
state:

        if (tx->tx_syncing_txg == 0 &&
        tx->tx_quiesce_txg_waiting <= tx->tx_open_txg &&
        tx->tx_sync_txg_waiting <= tx->tx_synced_txg &&
        tx->tx_quiesced_txg <= tx->tx_synced_txg) {
        tx->tx_quiesce_txg_waiting = tx->tx_open_txg + 1;
        cv_broadcast(&tx->tx_quiesce_more_cv);
    }

The problem is that the above code doesn't check if we are currently quiescing
anything (only if a quiesce or a sync has been requested, ..etc) so the
following scenario can happen:

1] We have an open txg A that had enough dirty data (more than
   zfs_dirty_data_sync) and it was pushed to the quiesced state, and opened
   a new txg B. No txg is currently being synced.
2] Immediately after the opening of B, txg_kick() was run by some other write
   (and because of A's dirty data) and saw that we are not currently syncing
   any txg and no one has requested quiescing so it requests one by bumping
   tx_quiesce_txg_waiting and broadcasts the quiesce thread.
3] The quiesce thread just passed txg A to be synced and sees that a quiescing
   request has been sent to it so it immediately grabs B without letting it
   gather enough data, putting it in a quiesced state and opening a new txg C.

In this scenario txg B, is an example of how the entries of interest show up in
the txg.d output.

Ideally we would like txg_kick() to get triggered only when we are sure that
we are not syncing AND not quiescing any txg. This way we can kick an open TXG
to the quiescing state when we are sure that there is nothing going on and we
would benefit from the different states running concurrently.

Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9464
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1cd7635b
Closes #7587
2018-06-04 14:56:06 -07:00
John Wren Kennedy ab44e511e2 OpenZFS 9245 - zfs-test failures: slog_013_pos and slog_014_pos
Test 13 would fail because of attempts to zpool destroy -f a pool that
was still busy. Changed those calls to destroy_pool which does a retry
loop, and the problem is no longer reproducible. Also removed some non
functional code in the test which is why it was originally commented out
by placing it after the call to log_pass.

Test 14 would fail because sometimes the check for a degraded pool would
complete before the pool had changed state. Changed the logic to check
in a loop with a timeout and the problem is no longer reproducible.

Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Porting Notes:
* Re-enabled slog_013_pos.ksh

OpenZFS-issue: https://illumos.org/issues/9245
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8f323b5
Closes #7585
2018-06-04 14:55:02 -07:00
Pavel Zakharov 8a393be353 OpenZFS 9235 - rename zpool_rewind_policy_t to zpool_load_policy_t
We want to be able to pass various settings during import/open of a
pool, which are not only related to rewind. Instead of adding a new
policy and duplicate a bunch of code, we should just rename
rewind_policy to a more generic term like load_policy.

For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we
want those flags not only for the import case, but also for the open
case. One such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb)
which would allow zfs to open a pool when logs are missing.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9235
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d2b1e44
Closes #7532
2018-06-04 14:54:20 -07:00
bunder2015 85912983a4 Fix typoes in zpool man page
Fixed some highlighting in the zpool man page

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7596
2018-06-04 09:06:16 -07:00
Matthew Ahrens 1a5b96b8ee OpenZFS 9329 - panic in zap_leaf_lookup() due to concurrent zapification
For the null pointer issue shown below, the solution is to initialize the
contents of the object before changing its type, so that concurrent accessors
will see it as non-zapified until it is ready for access via the ZAP.

    BAD TRAP: type=e (#pf Page fault) rp=ffffff00ff520440 addr=20 occurred
    in module "zfs" due to a NULL pointer dereference

    ffffff00ff520320 unix:die+df ()
    ffffff00ff520430 unix:trap+dc0 ()
    ffffff00ff520440 unix:cmntrap+e6 ()
    ffffff00ff520590 zfs:zap_leaf_lookup+46 ()
    ffffff00ff520640 zfs:fzap_lookup+a9 ()
    ffffff00ff5206e0 zfs:zap_lookup_norm+111 ()
    ffffff00ff520730 zfs:zap_contains+42 ()
    ffffff00ff520760 zfs:dsl_dataset_has_resume_receive_state+47 ()
    ffffff00ff520900 zfs:get_receive_resume_stats+3e ()
    ffffff00ff520a90 zfs:dsl_dataset_stats+262 ()
    ffffff00ff520ac0 zfs:dmu_objset_stats+2b ()
    ffffff00ff520b10 zfs:zfs_ioc_objset_stats_impl+64 ()
    ffffff00ff520b60 zfs:zfs_ioc_objset_stats+33 ()
    ffffff00ff520bd0 zfs:zfs_ioc_dataset_list_next+140 ()
    ffffff00ff520c80 zfs:zfsdev_ioctl+4d7 ()
    ffffff00ff520cc0 genunix:cdev_ioctl+39 ()
    ffffff00ff520d10 specfs:spec_ioctl+60 ()
    ffffff00ff520da0 genunix:fop_ioctl+55 ()
    ffffff00ff520ec0 genunix:ioctl+9b ()
    ffffff00ff520f10 unix:brand_sys_sysenter+1c9 ()

Porting Notes:
* DMU_OT_BYTESWAP conditional in zap_lockdir_impl() kept.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9329
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e8e0f97
Closes #7578
2018-05-31 10:53:49 -07:00
Matthew Ahrens d2a12f9e2a OpenZFS 9328 - zap code can take advantage of c99
The ZAP code was written before we allowed c99 in the Solaris kernel. We
should change it to take advantage of being able to declare variables where
they are first used. This reduces variable scope and means less scrolling
to find the type of variables.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9328
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/76ead05
Closes #7578
2018-05-31 10:53:11 -07:00
Sara Hartse 74d42600d8 zpool reopen should detect expanded devices
Update bdev_capacity to have wholedisk vdevs query the
size of the underlying block device (correcting for the size
of the efi parition and partition alignment) and therefore detect
expanded space.

Correct vdev_get_stats_ex so that the expandsize is aligned
to metaslab size and new space is only reported if it is large
enough for a new metaslab.

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
External-issue: LX-165
Closes #7546 
Issue #7582
2018-05-31 10:36:37 -07:00
Matthew Ahrens d1f06ec5bc make install only works once
`make install` shouldn't fail if a directory it created still exists.
In this case we can blow away the spl src directory before recreating
it.  This also gracefully handles the migration from pre-spl-merge to
post-spl-merge.

Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7580
2018-05-31 09:19:59 -07:00
Antonio Russo 928046b744 Explicitly state supported Linux versions
Add META tags Linux-Maximum and Linux-Minimum.

One pain point for package maintainers is ensuring the compatibility of
the packaged version of ZFS with the Linux kernel. By providing an
authoritative compatibility guide in the source tree, maintainers can
automate compatibility checks.

Additionally, increase META string extraction specificity.
configure.ac finds Name and Version by a very simple `grep`, which might
conceivably find other fields. Require the string be at the beginning of
a line, and be followed by a colon to avoid such confusions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7571
2018-05-30 20:11:19 -07:00
John Wren Kennedy 93491c4bb9 OpenZFS 9082 - Add ZFS performance test targeting ZIL latency
This adds a new test to measure ZIL performance.

- Adds the ability to induce IO delays with zinject
- Adds a new variable (PERF_NTHREADS_PER_FS) to allow fio threads to
  be distributed to individual file systems as opposed to all IO going
  to one, as happens elsewhere.
- Refactoring of do_fio_run

Authored by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: John Wren Kennedy <jwk404@gmail.com>

OpenZFS-issue: https://www.illumos.org/issues/9082
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/634
External-issue: DLPX-48625
Closes #7491
2018-05-30 11:59:04 -07:00
Tony Hutter c26cf0966d Fix zio->io_priority failed (7 < 6) assert
This fixes an assert in vdev_queue_change_io_priority():

  VERIFY3(zio->io_priority < ZIO_PRIORITY_NUM_QUEUEABLE) failed (7 < 6)
  PANIC at vdev_queue.c:832:vdev_queue_change_io_priority()

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7566 
Closes #7542
2018-05-29 18:13:48 -07:00
Steffen Müthing 3c28c63642 Install basename utility into dracut initramfs
vdev_id requires the program `basename` when handling short aliases
defined in `vdev_id.conf` (those defined without a leading path), but
`basename` is not always available in the dracut environment. This
causes the pool device names to change when using `by-vdev/` devices
or (in extreme cases) can make the pool import fail in dracut.

This commit fixes the problem by explicitly installing `basename`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Closes #7562
2018-05-29 17:32:05 -07:00
Brian Behlendorf 93ce2b4ca5 Update build system and packaging
Minimal changes required to integrate the SPL sources in to the
ZFS repository build infrastructure and packaging.

Build system and packaging:
  * Renamed SPL_* autoconf m4 macros to ZFS_*.
  * Removed redundant SPL_* autoconf m4 macros.
  * Updated the RPM spec files to remove SPL package dependency.
  * The zfs package obsoletes the spl package, and the zfs-kmod
    package obsoletes the spl-kmod package.
  * The zfs-kmod-devel* packages were updated to add compatibility
    symlinks under /usr/src/spl-x.y.z until all dependent packages
    can be updated.  They will be removed in a future release.
  * Updated copy-builtin script for in-kernel builds.
  * Updated DKMS package to include the spl.ko.
  * Updated stale AUTHORS file to include all contributors.
  * Updated stale COPYRIGHT and included the SPL as an exception.
  * Renamed README.markdown to README.md
  * Renamed OPENSOLARIS.LICENSE to LICENSE.
  * Renamed DISCLAIMER to NOTICE.

Required code changes:
  * Removed redundant HAVE_SPL macro.
  * Removed _BOOT from nvpairs since it doesn't apply for Linux.
  * Initial header cleanup (removal of empty headers, refactoring).
  * Remove SPL repository clone/build from zimport.sh.
  * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due
    to build issues when forcing C99 compilation.
  * Replaced legacy ACCESS_ONCE with READ_ONCE.
  * Include needed headers for `current` and `EXPORT_SYMBOL`.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes #7556
2018-05-29 16:00:33 -07:00
Brian Behlendorf 1272941f49 Merge branch 'zfsonlinux/merge-spl'
Merge a minimal version of the zfsonlinux/spl repository in to the
zfsonlinux/zfs repository.  Care was taken to prevent file conflicts
when merging and to preserve the spl repository history.  The spl
kernel module remains under the GPLv2 license as documented by the
additional THIRDPARTYLICENSE.gplv2 file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2018-05-29 14:57:55 -07:00
Brian Behlendorf a91258913f Prepare SPL repo to merge with ZFS repo
This commit removes everything from the repository except the core
SPL implementation for Linux.  Those files which remain have been
moved to non-conflicting locations to facilitate the merge.
The README.md and associated files have been updated accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2018-05-29 14:51:39 -07:00
Antonio Russo 3e5300e0ed Support Debian DKMS builds
scripts/dkms.mkconf calls configure with
`--with-linux=${kernel_source_dir}`, but Debian puts it kernel source at
`/lib/modules/<version>/source`. This patch adds the same logic to the
DKMS file produced by `scripts/dkms.mkconf` that Debian has shipped in
its official ZFS packaging: at DKMS build time, it checks if the system
is a Debian system, and adjusts the path accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7358 
Closes #7540 
Closes #7554
2018-05-26 10:56:24 -07:00
Jorgen Lundman 561ba8d1b1 OpenZFS 9523 - Large alloc in zdb can cause trouble
16MB alloc in zdb_embedded_block() can cause cores in certain
situations (clang, gcc55).

Authored by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Porting Notes:
* Replaces an equivalent fix previously made for Linux.

OpenZFS-issue: https://illumos.org/issues/9523
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2c1964a
Closes #7561
2018-05-25 17:24:57 -07:00
Matthew Ahrens 0dc2f70c5c OpenZFS 9486 - reduce memory used by device removal on fragmented pools
Device removal allocates a new location for each allocated segment on
the disk that's being removed.  Each allocation results in one entry in
the mapping table, which maps from old location + length to new
location.  When a fragmented disk is removed, this can result in a large
number of mapping entries, and thus a large amount of memory consumed by
the mapping table.  In the worst real-world cases, we've seen around 1GB
of RAM per 1TB of storage removed.

We can improve on this situation by allocating larger segments, which
span across both allocated and free regions of the device being removed.
By including free regions in the allocation (and thus mapping), we
reduce the number of mapping entries.  For example, if we have a 4K
allocation followed by 1K free and then 4K allocated, we would allocate
4+1+4 = 9KB, and then move the entire region (including allocated and
free parts).  In this case we used one mapping where previously we would
have used two, but often the ratio is much higher (up to 20:1 in
real-world use).  We then need to mark the regions that were free on the
removing device as free in the new locations, and also obsolete in the
mapping entry.

This method preserves the fragmentation of the removing device, rather
than consolidating its allocated space into a small number of chunks
where possible.  But it results in drastic reduction of memory used by
the mapping table - around 20x in the most-fragmented cases.

In the most fragmented real-world cases, this reduces memory used by the
mapping from ~1GB to ~50MB of RAM per 1TB of storage removed.  Less
fragmented cases will typically also see around 50-100MB of RAM per 1TB
of storage.

Porting notes:

* Add the following as module parameters:
    * zfs_condense_indirect_vdevs_enable
    * zfs_condense_max_obsolete_bytes

* Document the following module parameters:
   * zfs_condense_indirect_vdevs_enable
   * zfs_condense_max_obsolete_bytes
   * zfs_condense_min_mapping_bytes

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9486
OpenZFS-commit: https://github.com/ahrens/illumos/commit/07152e142e44c
External-issue: DLPX-57962
Closes #7536
2018-05-24 10:18:07 -07:00
Tony Nguyen ba863d0be4 Profiling for perf tests
Stack profiling is quite useful and Linux ZFS test suite does not
current collect that data.

Linux perf is a common tool for this purpose though the perf record
data file can be quite large. With this change, Linux ZFS perf tests
capture perf record data if perf is installed on the system and
PERF_DO_PROFILING environment variable is set.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
External-issue: LX-971
Closes #7549
2018-05-22 10:51:46 -07:00
Matthew Ahrens a430cef9cd Create "bin" directory so that zloop.sh works
Before running zloop.sh, we need to run `scripts/zfs-tests.sh -c` to
create and populate the `bin` directory with symlinks to our utilities.
Rather than making developers remember to do this, `make` should do it
for them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7525 
Closes #7547
2018-05-21 10:36:59 -07:00
George Melikov 43eb39d6cc Small cleanup of PR and issue templates
- Add links to PULL_REQUEST_TEMPLATE.md
- Clean `System information` table

It's easier to find needes documentation about
PR process with links.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7539
2018-05-15 09:02:57 -07:00
George Melikov f46209dd6b Update `tests/README.md` and fix markdown
- there are more options now
- command examples are more readable in code style

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7538
2018-05-15 09:01:28 -07:00
Brian Behlendorf ab6a2b5cd7
ZTS: Improve zpool_scrub_004_pos reliability
It's possible for the `zpool attach` portion of this test case
to complete before the `zpool scrub` can be issued.  Update the
test case to force the resilvering phase to take longer.

Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5444 
Closes #7541
2018-05-15 08:58:46 -07:00
Brian Behlendorf 8c64fe0442
ZTS: Update O_TMPFILE support check
In CentOS 7.5 the kernel provided a compatibility wrapper to support
O_TMPFILE.  This results in the test setup script correctly detecting
kernel support.  But the ZFS module was built without O_TMPFILE
support due to the non-standard CentOS kernel interface.

Handle this case by updating the setup check to fail either when
the kernel or the ZFS module fail to provide support.  The reason
will be clearly logged in the test results.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7528
2018-05-14 20:36:30 -07:00
Pavel Zakharov 38a19edd34 OpenZFS 9189 - Add debug to vdev_label_read_config when txg check fails
These changes were added to help debug issue #9187.

Essentially, in the original bug, vdev_validate() seems to fails in
vdev_label_read_config() and prints "failed reading config". This could
happen because either:
1. The labels are actually corrupt and zio_wait() fails for all of them
2. The labels were discarded because they didn't pass the txg check.

Beyond 9187, having debug info when case 2 happens could be useful in
other scenarios, such as zpool import.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Approved by: Matt Ahrens <mahrens@delphix.com>
Ported-by:  Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9189
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f6af1b7
Closes #7533
2018-05-14 14:32:49 -04:00
Pavel Zakharov db7d07e14b OpenZFS 9191 - dump vdev tree to zfs_dbgmsg when spa load fails due to missing log devices
Add vdev_print_tree() in spa_check_for_missing_logs() when some log
devices are missing to ease debugging

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9191
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c5c02e5
Closes #7531
2018-05-14 14:30:52 -04:00
Pavel Zakharov 189bd0b670 OpenZFS 9190 - Fix cleanup routine in import_cachefile_device_replaced.ksh
Must clear slow-disk zinject injections in test cleanup routine.
Otherwise, when this test fails, it causes most subsequent tests
to fail.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9190
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/762c6b4
Closes #7530
2018-05-14 14:30:28 -04:00
Pavel Zakharov a11c7aaec9 OpenZFS 9187 - racing condition between vdev label and spa_last_synced_txg in vdev_validate
ztest failed with uncorrectable IO error despite having the fix for
7163.  Both sides of the mirror have CANT_OPEN_BAD_LABEL, which also
distinguishes it from that issue.

Definitely seems like a racing condition between the vdev_validate
and spa_sync:
1. Thread A (spa_sync): vdev label is updated to latest txg
2. Thread B (vdev_validate): vdev label's txg is compared to
   spa_last_synced_txg and is ahead.
3. Thread A (spa_sync): spa_last_synced_txg is updated to latest txg.

Solution: do not check txg in vdev_validate unless config lock is held.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9187
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/805fda72
Closes #7529
2018-05-14 14:28:09 -04:00
Brian Behlendorf b669ab83bb
Ignore *.o.ur-safe build artifacts
Generated when building on Ubuntu 18.04.  Also ignore the new
dynamically generated zfs-mount-generator.8 man page, and the
module/.cache.mk file.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7534
2018-05-13 18:59:02 -07:00
Olaf Faaland bc5f51c5de module param callbacks check for initialized spa
Callbacks provided for module parameters are executed both
after the module is loaded, when a user alters it via sysfs, e.g
	echo bar > /sys/modules/zfs/parameters/foo

as well as when the module is loaded with an argument, e.g.
	modprobe zfs foo=bar

In the latter case, the init functions likely have not run yet,
including spa_init() which initializes the namespace lock so it is safe
to use.

Instead of immediately taking the namespace lock and attemping to
iterate over initialized spa structures, check whether spa_mode_global
is nonzero.  This is set by spa_init() after it has initialized the
namespace lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7496 
Closes #7521
2018-05-11 12:46:07 -07:00
Antonio Russo 68fded8146 Add canonical mount options zfs-mount-generator
lib/libzfs/libzfs_mount.c:zfs_add_options provides the canonical
mount options used by a `zfs mount` command. Because we cannot call
`zfs mount` directly from a systemd.mount unit, we mirror that logic
in zfs-mount-generator.

The zed script is updated to cache these properties as well.

Include a mini-tutorial in the manual page, properly substitute
configuration paths in zfs-mount-generator.8.in, and standardize the
Makefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7453
2018-05-11 12:44:14 -07:00
bunder2015 29badadd4e Fix shebangs on import tests
Incorrect shebangs were used when porting.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7523  
Closes #7524
2018-05-11 12:37:44 -07:00
Tim Chase d1043e2f6d Unify behavior of deadman parameters
The zfs_deadman_failmode, zfs_deadman_ziotime_ms and
zfs_deadman_synctime_ms paramaters are stored per-pool.  However,
only the zfs_deadman_failmode updates the per-pool state when it's
change.  This patch gives adds the same behavior to the other two
for consistency.

Also, in all 3 three cases, only update the per-pool parameters
if spa_init() has actually been called in order to avoid panicking
when trying to take a lock on the spa_namespace_lock mutex.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #7499
2018-05-08 21:45:47 -07:00
bunder2015 670d74b9ce ZTS: enospc_002 path cleanup
Removing hard-coded path used in enospc_002

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7515
2018-05-08 21:42:58 -07:00
Tim Chase f3d28f0a59 Streamline the zpool_import tests
Don't create an ext4 file system atop $DEV_DISKDIR/$DISK2.
There's likely to not be sufficient space for it to succeed.
Instead, simply create the vdev files in the directory where it
would have been mounted.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7459
2018-05-08 21:40:13 -07:00
Tim Chase a0ad7ca54e Clear vdev_faulted
Clear vdev_faulted if ZPOOL_CONFIG_AUX_STATE is not set to "external"

ZoL supports "zpool export -f" (force fault), which can be combined
with "-t" (temporary fault; don't persist across export/import) and
causes a MOS configuration to be set with ZPOOL_CONFIG_FAULTED=1
and without ZFS_CONFIG_AUX_STATE set at all.  In this case, the
previously-offlined vdev should be imported in an on-line state and.
Clearing the "vdev_faulted" flag causes the import to treat the
device as on-line.  Typically, resilver will catch it up based on
its DTL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #7459
2018-05-08 21:39:50 -07:00
Pavel Zakharov 6cb8e5306d OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

	7638 Refactor spa_load_impl into several functions
	8961 SPA load/import should tell us why it failed
	7277 zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
of data, this feature is for now restricted to a read-only import.

Porting notes (ZTS):
* Fix 'make dist' target in zpool_import

* The maximum path length allowed by tar is 99 characters.  Several
  of the new test cases exceeded this limit resulting in them not
  being included in the tarball.  Shorten the names slightly.

* Set/get tunables using accessor functions.

* Get last synced txg via the "zfs_txg_history" mechanism.

* Clear zinject handlers in cleanup for import_cache_device_replaced
  and import_rewind_device_replaced in order that the zpool can be
  exported if there is an error.

* Increase FILESIZE to 8G in zfs-test.sh to allow for a larger
  ext4 file system to be created on ZFS_DISK2.  Also, there's
  no need to partition ZFS_DISK2 at all.  The partitioning had
  already been disabled for multipath devices.  Among other things,
  the partitioning steals some space from the ext4 file system,
  makes it difficult to accurately calculate the paramters to
  parted and can make some of the tests fail.

* Increase FS_SIZE and FILE_SIZE in the zpool_import test
  configuration now that FILESIZE is larger.

* Write more data in order that device evacuation take lonnger in
  a couple tests.

* Use mkdir -p to avoid errors when the directory already exists.

* Remove use of sudo in import_rewind_config_changed.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9075
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/619c0123
Closes #7459
2018-05-08 21:35:27 -07:00
Pavel Zakharov afd2f7b711 OpenZFS 8962 - zdb should work on non-idle pools
Currently `zdb` consistently fails to examine non-idle pools as it
fails during the `spa_load()` process. The main problem seems to be
that `spa_load_verify()` fails as can be seen below:

    $ sudo zdb -d -G dcenter
    zdb: can't open 'dcenter': I/O error

    ZFS_DBGMSG(zdb):
    spa_open_common: opening dcenter
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c4t11d0s0': best uberblock found for spa dcenter. txg 40824950
    spa_load(dcenter): using uberblock with txg=40824950
    spa_load(dcenter): UNLOADING
    spa_load(dcenter): RELOADING
    spa_load(dcenter): LOADING
    disk vdev '/dev/dsk/c3t10d0s0': best uberblock found for spa dcenter. txg 40824952
    spa_load(dcenter): using uberblock with txg=40824952
    spa_load(dcenter): FAILED: spa_load_verify failed [error=5]
    spa_load(dcenter): UNLOADING

This change makes `spa_load_verify()` a dryrun when ran from
`zdb`. This is done by creating a global flag in zfs and then setting
it in `zdb`.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/8962
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/180ad792
Closes #7459
2018-05-08 21:32:57 -07:00