Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Brian Behlendorf	a769fb53a1	Add 'zpool status -i' option Only display the full details of the vdev initialization state in 'zpool status' output when requested with the -i option. By default display '(initializing)' after vdevs when they are being actively initialized. This is consistent with the established precident of appending '(resilvering), etc' and fits within the default 80 column terminal width making it easy to read. Additionally, updated the 'zpool initialize' documentation to make it clear the options are mutually exclusive, but allow duplicate options like all other zfs/zpool commands. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8230	2019-01-07 11:03:18 -08:00
George Wilson	c10d37dd9f	zfs initialize performance enhancements PROBLEM ======== When invoking "zpool initialize" on a pool the command will create a thread to initialize each disk. Unfortunately, it does this serially across many transaction groups which can result in commands taking a long time to return to the user and may appear hung. The same thing is true when trying to suspend/cancel the operation. SOLUTION ========= This change refactors the way we invoke the initialize interface to ensure we can start or stop the intialization in just a few transaction groups. When stopping or cancelling a vdev initialization perform it in two phases. First signal each vdev initialization thread that it should exit, then after all threads have been signaled wait for them to exit. On a pool with 40 leaf vdevs this reduces the vdev initialize stop/cancel time from ~10 minutes to under a second. The reason for this is spa_vdev_initialize() no longer needs to wait on multiple full TXGs per leaf vdev being stopped. This commit additionally adds some missing checks for the passed "initialize_vdevs" input nvlist. The contents of the user provided input "initialize_vdevs" nvlist must be validated to ensure all values are uint64s. This is done in zfs_ioc_pool_initialize() in order to keep all of these checks in a single location. Updated the innvl and outnvl comments to match the formatting used for all other new sytle ioctls. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Wilson <george.wilson@delphix.com> Closes #8230	2019-01-07 11:03:08 -08:00
George Wilson	619f097693	OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230	2019-01-07 10:37:26 -08:00
Brian Behlendorf	6e72a5b9b6	pyzfs: python3 support (build system) Almost all of the Python code in the respository has been updated to be compatibile with Python 2.6, Python 3.4, or newer. The only exceptions are arc_summery3.py which requires Python 3, and pyzfs which requires at least Python 2.7. This allows us to maintain a single version of the code and support most default versions of python. This change does the following: * Sets the default shebang for all Python scripts to python3. If only Python 2 is available, then at install time scripts which are compatible with Python 2 will have their shebangs replaced with /usr/bin/python. This is done for compatibility until Python 2 goes end of life. Since only the installed versions are changed this means Python 3 must be installed on the system for test-runner when testing in-tree. * Added --with-python=<2\|3\|3.4,etc> configure option which sets the PYTHON environment variable to target a specific python version. By default the newest installed version of Python will be used or the preferred distribution version when creating pacakges. * Fixed --enable-pyzfs configure checks so they are run when --enable-pyzfs=check and --enable-pyzfs=yes. * Enabled pyzfs for Python 3.4 and newer, which is now supported. * Renamed pyzfs package to python<VERSION>-pyzfs and updated to install in the appropriate site location. For example, when building with --with-python=3.4 a python34-pyzfs will be created which installs in /usr/lib/python3.4/site-packages/. * Renamed the following python scripts according to the Fedora guidance for packaging utilities in /bin - dbufstat.py -> dbufstat - arcstat.py -> arcstat - arc_summary.py -> arc_summary - arc_summary3.py -> arc_summary3 * Updated python-cffi package name. On CentOS 6, CentOS 7, and Amazon Linux it's called python-cffi, not python2-cffi. For Python3 it's called python3-cffi or python3x-cffi. * Install one version of arc_summary. Depending on the version of Python available install either arc_summary2 or arc_summary3 as arc_summary. The user output is only slightly different. Reviewed-by: John Ramsden <johnramsden@riseup.net> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8096	2019-01-06 10:39:41 -08:00
Brian Behlendorf	65ca2c1eb9	Fix 'zpool remap' freeing race The dmu_objset_remap_indirects_impl() logic depends on dnode_hold() returning ENOENT for dnodes which will be freed and should be skipped. This behavior can only be relied upon when taking a new hold and while the caller has an open transaction. This ensures that the open txg cannot advance and that a concurrent free will end up in the same txg (which is critical). Relying on an existing hold will not prevent dnode_free() from succeeding. The solution is to take an additional dnode_hold() after assigning the transaction. This ensures the remap will never dirty the dnode if it was freed while we were waiting in dmu_tx_assign(, TXG_WAIT). Randomly set zfs_object_remap_one_indirect_delay_ms in ztest. This increases the likelihood of an operation racing with the remap. Converted from ticks to milliseconds. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Tom Caputi <tcaputi@datto.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8215	2019-01-02 11:46:04 -08:00
Tony Hutter	c66401fae0	Add enclosure_symlinks option to vdev_id Add an 'enclosure_symlinks' option to vdev_id.conf. This creates consistently named symlinks to the enclosure devices (/dev/sg*) based off the configuration in vdev_id.conf. The enclosure symlinks show up in /dev/by-enclosure/<prefix>-<channel><num>. The links make it make it easy to run sg_ses on a particular enclosure device. The enclosure links are created in addition to the normal /dev/disk/by-vdev links. 'enclosure_symlinks' is only valid in sas_direct configurations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Simon Guest <simon.guest@tesujimath.org> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #8194	2018-12-14 17:27:49 -08:00
Brian Behlendorf	0dd6b6bfcb	ztest: ENOSPC in ztest_objset_destroy_cb() While unlikely it is possible for dsl_destroy_head() to return ENOSPC in the ztest_objset_destroy_cb(). This can occur even when ZFS_SPACE_CHECK_DESTROY is used with the dsl_sync_task(). Both the existence of a checkpoint and pending deferred frees can cause this. Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8206	2018-12-14 10:03:05 -08:00
Prakash Surya	900d09b285	OpenZFS 9962 - zil_commit should omit cache thrash As a result of the changes made in 8585, it's possible for an excessive amount of vdev flush commands to be issued under some workloads. Specifically, when the workload consists of mostly async write activity, interspersed with some sync write and/or fsync activity, we can end up issuing more flush commands to the underlying storage than is actually necessary. As a result of these flush commands, the write latency and overall throughput of the pool can be poorly impacted (latency increases, throughput decreases). Currently, any time an lwb completes, the vdev(s) written to as a result of that lwb will be issued a flush command. The intenion is so the data written to that vdev is on stable storage, prior to communicating to any waiting threads that their data is safe on disk. The problem with this scheme, is that sometimes an lwb will not have any threads waiting for it to complete. This can occur when there's async activity that gets "converted" to sync requests, as a result of calling the zil_async_to_sync() function via zil_commit_impl(). When this occurs, the current code may issue many lwbs that don't have waiters associated with them, resulting in many flush commands, potentially to the same vdev(s). For example, given a pool with a single vdev, and a single fsync() call that results in 10 lwbs being written out (e.g. due to other async writes), that will result in 10 flush commands to that single vdev (a flush issued after each lwb write completes). Ideally, we'd only issue a single flush command to that vdev, after all 10 lwb writes completed. Further, and most important as it pertains to this change, since the flush commands are often very impactful to the performance of the pool's underlying storage, unnecessarily issuing these flush commands can poorly impact the performance of the lwb writes themselves. Thus, we need to avoid issuing flush commands when possible, in order to acheive the best possible performance out of the pool's underlying storage. This change attempts to address this problem by changing the ZIL's logic to only issue a vdev flush command when it detects an lwb that has a thread waiting for it to complete. When an lwb does not have threads waiting for it, the responsibility of issuing the flush command to the vdevs involved with that lwb's write is passed on to the "next" lwb. It's only once a write for an lwb with waiters completes, do we issue the vdev flush command(s). As a result, now when we issue the flush(s), we will issue them to the vdevs involved with that specific lwb's write, but potentially also to vdevs involved with "previous" lwb writes (i.e. if the previous lwbs did not have waiters associated with them). Thus, in our prior example with 10 lwbs, it's only once the last lwb completes (which will be the lwb containing the waiter for the thread that called fsync) will we issue the vdev flush command; all of the other lwbs will find they have no waiters, so they'll pass the responsibility of the flush to the "next" lwb (until reaching the last lwb that has the waiter). Porting Notes: * Reconciled conflicts with the fastwrite feature. Authored by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Approved by: Joshua M. Clulow <josh@sysmgr.org> Ported-by: Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9962 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/545190c6 Closes #8188	2018-12-07 11:09:42 -08:00
Tom Caputi	e3c85c0938	Move assert in dump_dir() in zdb This one line patch moves an assert in the function dump_dir() below an error check that ensures it ran correctly. This ensures zdb dumps the error that actually caused the problem, as opposed to one of its symptoms. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8171	2018-12-05 09:30:28 -08:00
Brian Behlendorf	c5eea0ab9c	Fix 'zpool list -v' alignment The verbose output of 'zpool list' was not correctly aligned due to differences in the vdev name lengths. Minimally update the code the correct the alignment using the same strategy employed by 'zpool status'. Missing dashes were added for the empty defaults columns, and the vdev state is now printed for all vdevs. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7308 Closes #8147	2018-12-04 10:17:54 -08:00
Tom Caputi	0b606cb33f	Fix ztest deadlock in ztest_zil_remount() This patch fixes a small race condition in ztest_zil_remount() that could result in a deadlock. ztest_device_removal() calls spa_vdev_remove() which may eventually call spa_reset_logs(). If ztest_zil_remount() attempts to call zil_close() while this is happening, it may fail when it asserts !zilog_is_dirty(zilog). This patch simply adds locking to correct the issue. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8154	2018-12-04 09:43:31 -08:00
LOLi	bdbd5477bc	Fix ASSERT in zfs_receive_one() This commit fixes the following ASSERT in zfs_receive_one() when receiving a send stream from a root dataset with the "-e" option: $ sudo zfs snap source@snap $ sudo zfs send source@snap \| sudo zfs recv -e destination/recv chopprefix > drrb->drr_toname ASSERT at libzfs_sendrecv.c:3804:zfs_receive_one() Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8121	2018-12-04 09:38:55 -08:00
Tom Caputi	c40a1124e1	Fix consistency of ztest_device_removal_active ztest currently uses the boolean flag ztest_device_removal_active to protect some tests that may not run successfully if they occur at the same time as ztest_device_removal(). Unfortunately, in the event that ztest is in the middle of a device removal when it decides to issue a SIGKILL, the device removal will be automatically restarted (without setting the flag) when the pool is re-imported on the next run. This patch corrects this by ensuring that any in-progress removals are completed before running further tests after the re-import. This patch also makes a few small changes to prevent race conditions involving the creation and destruction of spa->spa_vdev_removal, since this field is not protected by any locks. Some checks that may run concurrently with setting / unsetting this field have been updated to check spa->spa_removing_phys.sr_state instead. The most significant change here is that spa_removal_get_stats() no longer accounts for in-flight work done, since that could result in a NULL pointer dereference. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8105	2018-11-28 20:47:09 -08:00
LOLi	0cd5c941d0	zpool: allow split with whole-disk devices This change allows 'zpool split' to work with whole-disk devices and updates the ZFS Test Suite with a new script to exercise this functionality. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6643 Closes #8133	2018-11-20 10:22:53 -08:00
Sebastien Roy	a10d50f999	OpenZFS 8115 - parallel zfs mount Porting Notes: * Use thread pools (tpool) API instead of introducing taskq interfaces to libzfs. * Use pthread_mutext for locks as mutex_t isn't available. * Ignore alternative libshare initialization since OpenZFS-7955 is not present on zfsonlinux. Authored by: Sebastien Roy <seb@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Authored by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Matt Ahrens <mahrens@delphix.com> Ported-by: Don Brady <don.brady@delphix.com> OpenZFS-issue: https://www.illumos.org/issues/8115 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3f0e2b569 Closes #8092	2018-11-15 11:33:58 -08:00
loli10K	d48091de81	zed: detect and offline physically removed devices This commit adds a new test case to the ZFS Test Suite to verify ZED can detect when a device is physically removed from a running system: the device will be offlined if a spare is not available in the pool. We implement this by using the existing libudev functionality and without relying solely on the FM kernel module capabilities which have been observed to be unreliable with some kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #1537 Closes #7926	2018-11-09 11:17:24 -08:00
Tony Hutter	ad796b8a3b	Add zpool status -s (slow I/Os) and -p (parseable) This patch adds a new slow I/Os (-s) column to zpool status to show the number of VDEV slow I/Os. This is the number of I/Os that didn't complete in zio_slow_io_ms milliseconds. It also adds a new parsable (-p) flag to display exact values. NAME STATE READ WRITE CKSUM SLOW testpool ONLINE 0 0 0 - mirror-0 ONLINE 0 0 0 - loop0 ONLINE 0 0 0 20 loop1 ONLINE 0 0 0 0 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7756 Closes #6885	2018-11-08 16:47:24 -08:00
Tom Caputi	a2d88f778a	Fix !zilog_is_dirty() assert during ztest ztest occasionally hits an assert that !zilog_is_dirty() during zil_close(). This is caused by an interaction between 2 threads. First, ztest_run() waits for each test thread to complete and closes the associated dataset as soon as the thread joins. At the same time, the ztest_vdev_add_remove() test may attempt to remove the slog, which will open, dirty, and reset the logs on every dataset in the pool (including those of other threads). This patch simply ensures that we always join all of the test threads before closing any datasets. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8094	2018-11-07 15:46:50 -08:00
Tom Caputi	f44ad9297d	Replay logs before starting ztest workers This patch ensures that logs are replayed on all datasets prior to starting ztest workers. This ensures that the call to vdev_offline() a log device in ztest_fault_inject() will not fail due to the log device being required for replay. This patch also fixes a small issue found during testing where spa_keystore_load_wkey() does not check that the dataset specified is an encryption root. This check was present in libzfs, however. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8084	2018-11-07 15:40:24 -08:00
Brian Behlendorf	09b85f2ded	ztest: reduce gangblock creation In order to validate the gang block code ztest is configured to artificially force a fraction of large blocks to be written as gang blocks. The default setting chosen for this was to write 25% of all blocks 32k or larger using gang blocks. The confluence of an unrealistically large number of gang blocks, the aggressive fault injection done by ztest, and the split segment reconstruction logic introduced by device removal has resulted in the following type of failure: zdb -bccsv -G -d ... exit code 3 Specifically, zdb was unable to open the pool because it was unable to reconstruct a damaged block. Manual investigation of multiple failures clearly showed that the block could be reconstructed. However, due to the large number of damaged segments (>35) it could not be done in the allotted time. Furthermore, the large number of gang blocks was determined to be the reason for the unrealistically large number of damaged segments. In order to make this situation less likely, this change both increases the forced gang block size to 64k and reduces the frequency to 3% of blocks. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8080	2018-11-05 11:53:49 -08:00
Don Brady	e89f1295d4	Add libzutil for libzfs or libzpool consumers Adds a libzutil for utility functions that are common to libzfs and libzpool consumers (most of what was in libzfs_import.c). This removes the need for utilities to link against both libzpool and libzfs. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #8050	2018-11-05 11:22:33 -08:00
Serapheim Dimitropoulos	0a544c174d	zdb -k does not work on Linux when used with -e This minor bug was introduced with the port of the feature from OpenZFS to ZoL. This patch fixes the issue that was caused by a minor re-ordering from the original code. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #8001	2018-10-30 11:46:18 -05:00
Gregor Kopka	63a77ae3cf	Added column definitions to arcstat.py grow: ARC Grow enabled (!arc_no_grow) free: ARC Free memory (arc_sys_free) need: ARC Reclaim need (arc_need_free) Fixed alignment issues (mread had wrong width). Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregor Kopka <gregor@kopka.net> Closes #8058	2018-10-29 18:18:20 -05:00
Brian Behlendorf	bea7578356	ZTS: Fix auto_replace_001_pos test The root cause of these failures is that udev can notify the ZED of newly created partition before its links are created. Handle this by allowing an auto-replace to briefly wait until udev confirms the links exist. Distill this test case down to its essentials so it can be run reliably. What we need to check is that: 1) A new disk, in the same physical location, is automatically brought online when added to the system, 2) It completes the replacement process, and 3) The pool is now ONLINE and healthy. There is no need to remove the scsi_debug module. After exporting the pool the disk can be zeroed, removed, and then re-added to the system as a new disk. Reviewed by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8051	2018-10-29 15:05:14 -05:00
Brian Behlendorf	b74f48fe1b	Fix flake8 "invalid escape sequence 'x'" warning From, https://lintlyci.github.io/Flake8Rules/rules/W605.html As of Python 3.6, a backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases. Note 'float_pobj' was simply removed from arcstat.py since it was entirely unused. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8056	2018-10-24 23:26:08 -07:00
Tom Caputi	7d658d29cf	Fix waiting in ztest_device_removal() spa->spa_vdev_removal is created in a sync task that is initiated via dsl_sync_task_nowait(). Since the task may not run before spa_vdev_remove() returns, we must wait at least 1 txg to ensure that the removal struct has been created. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010	2018-10-24 14:37:35 -07:00
Tom Caputi	4a7eb69a5a	Fix ztest deadman panic with indirect vdev damage This patch fixes an issue where ztest's deadman thread would trigger a panic because reconstructing artifically damaged blocks would take too long to reconstruct. This patch simply limits how often ztest inflicts split-block damage and how many segments it can damage when it does. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010	2018-10-24 14:37:31 -07:00
Tom Caputi	ab4c009e3d	Fix dbgmsg printing in ztest and zdb This patch resolves a problem where the -G option in both zdb and ztest would cause the code to call __dprintf() to print zfs_dbgmsg output. This function was not properly wired to add messages to the dbgmsg log as it is in userspace and so the messages were simply dropped. This patch also tries to add some degree of distinction to dprintf() (which now prints directly to stdout) and zfs_dbgmsg() (which adds messages to an internal list that can be dumped with zfs_dbgmsg_print()). In addition, this patch corrects an issue where ztest used a global variable to decide whether to dump the dbgmsg buffer on a crash. This did not work because ztest spins up more instances of itself using execv(), which did not copy the global variable to the new process. The option has been moved to the ztest_shared_opts_t which already exists for interprocess communication. This patch also changes zfs_dbgmsg_print() to use write() calls instead of printf() so that it will not fail when used in a signal handler. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010	2018-10-24 14:36:50 -07:00
Tom Caputi	9410257800	Fix random ztest_deadman_thread failures The zloop test has been failing in buildbot for the last few weeks with various failures in ztest_deadman_thread(). This is due to the fact that this thread is not stopped when performing pool import / export tests as it should be. This patch simply corrects this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8010	2018-10-24 14:36:21 -07:00
Serapheim Dimitropoulos	9b2266e3d8	OpenZFS 9682 - page fault in dsl_async_clone_destroy() while opening pool Authored by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/9682 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ade2c82828 Closes #8037	2018-10-19 12:06:21 -07:00
Matthew Ahrens	d637db98e1	OpenZFS 9681 - ztest failure in spa_history_log_internal due to spa_rename() Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed by: Tom Caputi <tcaputi@datto.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9681 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/6aee0ad7 Closes #8041	2018-10-19 12:02:28 -07:00
Tom Caputi	80a91e7469	Defer new resilvers until the current one ends Currently, if a resilver is triggered for any reason while an existing one is running, zfs will immediately restart the existing resilver from the beginning to include the new drive. This causes problems for system administrators when a drive fails while another is already resilvering. In this case, the optimal thing to do to reduce risk of data loss is to wait for the current resilver to end before immediately replacing the second failed drive, which allows the system to operate with two incomplete drives for the minimum amount of time. This patch introduces the resilver_defer feature that essentially does this for the admin without forcing them to wait and monitor the resilver manually. The change requires an on-disk feature since we must mark drives that are part of a deferred resilver in the vdev config to ensure that we do not assume they are done resilvering when an existing resilver completes. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: @mmaybee Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7732	2018-10-18 21:06:18 -07:00
LOLi	2e55034471	zpool: allow sharing of spare device among pools ZFS allows, by default, sharing of spare devices among different pools; this commit simply restores this functionality for disk devices and adds an additional tests case to the ZFS Test Suite to prevent future regression. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7999	2018-10-17 11:21:07 -07:00
Paul Dagnelie	d52d80b700	Add types to featureflags in zfs The boolean featureflags in use thus far in ZFS are extremely useful, but because they take advantage of the zap layer, more interesting data than just a true/false value can be stored in a featureflag. In redacted send/receive, this is used to store the list of redaction snapshots for a redacted dataset. This change adds the ability for ZFS to store types other than a boolean in a featureflag. The only other implemented type is a uint64_t array. It also modifies the interfaces around dataset features to accomodate the new capabilities, and adds a few new functions to increase encapsulation. This functionality will be used by the Redacted Send/Receive feature. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #7981	2018-10-16 11:15:04 -07:00
Matthew Ahrens	0aa5916a30	OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979 ) OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects We're leaking the dd_clones objects in dsl_dir_destroy_sync. This bug appears to have been around forever. Thankfully the amount of space typically involved is tiny. In addition this adds a mechanism in ZDB to find objects in the MOS which are leaked (not referenced anywhere). Porting notes: * Added dd_crypto_obj to ZDB MOS object leak tracking Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Matthew Ahrens <mahrens@delphix.com> OpenZFS-issue: https://illumos.org/issues/9847 Closes #7979	2018-10-12 11:28:26 -07:00
Brian Behlendorf	27f80e85c2	Improved error handling for extreme rewinds The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attemping to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool in import a far less likely, therefore the behavior for call paths was not modified. Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7809 Closes #7921	2018-10-12 11:24:04 -07:00
Matt Ahrens	5d43cc9a59	OpenZFS 9689 - zfs range lock code should not be zpl-specific The ZFS range locking code in zfs_rlock.c/h depends on ZPL-specific data structures, specifically znode_t. However, it's also used by the ZVOL code, which uses a "dummy" znode_t to pass to the range locking code. We should clean this up so that the range locking code is generic and can be used equally by ZPL and ZVOL, and also can be used by future consumers that may need to run in userland (libzpool) as well as the kernel. Porting notes: * Added missing sys/avl.h include to sys/zfs_rlock.h. * Removed 'dbuf is within the locked range' ASSERTs from dmu_sync(). This was needed because ztest does not yet use a locked_range_t. * Removed "Approved by:" tag requirement from OpenZFS commit check to prevent needless warnings when integrating changes which has not been merged to illumos. * Reverted free_list range lock changes which were originally needed to defer the cv_destroy() which was called immediately after cv_broadcast(). With `d2733258` this should be safe but if not we may need to reintroduce this logic. * Reverts: The following two commits were reverted and squashed in to this change in order to make it easier to apply OpenZFS 9689. - `d88895a0`, which removed the dummy znode from zvol_state - `e3a07cd0`, which updated ztest to use range locks * Preserved optimized rangelock comparison function. Preserved the rangelock free list. The cv_destroy() function will block waiting for all processes in cv_wait() to be scheduled and drop their reference. This is done to ensure it's safe to free the condition variable. However, blocking while holding the rl->rl_lock mutex can result in a deadlock on Linux. A free list is introduced to defer the cv_destroy() and kmem_free() until after the mutex is released. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://illumos.org/issues/9689 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/680 External-issue: DLPX-58662 Closes #7980	2018-10-11 10:19:33 -07:00
Tony Hutter	2ef0f8c329	Print "(repairing)" in zpool status again Historically, zpool status prints "(repairing)" for any drives that have errors during a scrub: NAME STATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/file1 ONLINE 13 0 0 (repairing) /tmp/file2 ONLINE 0 0 0 /tmp/file3 ONLINE 0 0 0 This was accidentally broken in "OpenZFS 9166 - zfs storage pool checkpoint" (`d2734cc`). This patch adds it back in. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7779 Closes #7978	2018-10-09 20:30:32 -07:00
Tom Caputi	52ce99dd61	Refcounted DSL Crypto Key Mappings Since native ZFS encryption was merged, we have been fighting against a series of bugs that come down to the same problem: Key mappings (which must be present during all I/O operations) are created and destroyed based on dataset ownership, but I/Os can have traditionally been allowed to "leak" into the next txg after the dataset is disowned. In the past we have attempted to solve this problem by trying to ensure that datasets are disowned ater all I/O is finished by calling txg_wait_synced(), but we have repeatedly found edge cases that need to be squashed and code paths that might incur a high number of txg syncs. This patch attempts to resolve this issue differently, by adding a reference to the key mapping for each txg it is dirtied in. By doing so, we can remove many of the unnecessary calls to txg_wait_synced() we have added in the past and ensure we don't need to deal with this problem in the future. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7949	2018-10-03 09:47:11 -07:00
Tim Schumacher	424fd7c3e0	Prefix all refcount functions with zfs_ Recent changes in the Linux kernel made it necessary to prefix the refcount_add() function with zfs_ due to a name collision. To bring the other functions in line with that and to avoid future collisions, prefix the other refcount functions as well. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Schumacher <timschumi@gmx.de> Closes #7963	2018-10-01 10:42:05 -07:00
Brian Behlendorf	1258bd778e	Refine split block reconstruction Due to a flaw in `4589f3ae` the number of unique combinations could be calculated incorrectly. This could result in the random combinations reconstruction being used when it would have been possible to check all combinations. This change fixes the unique combinations calculation and simplifies the reconstruction logic by maintaining a per- segment list of unique copies. The vdev_indirect_splits_damage() function was introduced to validate both the enumeration and random reconstruction logic with ztest. It is implemented such it will never make a known recoverable block unrecoverable. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #6900 Closes #7934	2018-10-01 10:36:34 -07:00
Tim Schumacher	c13060e478	Linux 4.19-rc3+ compat: Remove refcount_t compat torvalds/linux@59b57717f ("blkcg: delay blkg destruction until after writeback has finished") added a refcount_t to the blkcg structure. Due to the refcount_t compatibility code, zfs_refcount_t was used by mistake. Resolve this by removing the compatibility code and replacing the occurrences of refcount_t with zfs_refcount_t. Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Schumacher <timschumi@gmx.de> Closes #7885 Closes #7932	2018-09-26 10:29:26 -07:00
Gregor Kopka	b954e36e51	Zpool iostat: remove latency/queue scaling Bandwidth and iops are average per second while _wait are averages per request for latency or, for queue depths, an instantaneous measurement at the end of an interval (according to man zpool). When calculating the first two it makes sense to do x/interval_duration (x being the increase in total bytes or number of requests over the duration of the interval, interval_duration in seconds) to 'scale' from amount/interval_duration to amount/second. But applying the same math for the latter (_wait latencies/queue) is wrong as there is no interval_duration component in the values (these are time/requests to get to average_time/request or already an absulute number). This bug leads to the only correct continuous *_wait figures for both latencies and queue depths from 'zpool iostat -l/q' being with duration=1 as then the wrong math cancels itself (x/1 is a nop). This removes temporal scaling from latency and queue depth figures. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregor Kopka <gregor@kopka.net> Closes #7945 Closes #7694	2018-09-25 16:29:16 -07:00
LOLi	e0b7ff46c9	zstreamdump dumps core printing truncated nvlist This change prevents zstreamdump from crashing when trying to print invalid nvlist data (DRR_BEGIN record) from a truncated send stream. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7917	2018-09-18 09:43:09 -07:00
LOLi	5140a58f3b	zpool should detect invalid fs property on create This change improve the handling of invalid filesystem properties when specified at pool creation: this is useful when 'zpool create -n' (dry run) is executed to detect invalid fs-level options (-O) before the actual command is run. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7620 Closes #7878	2018-09-13 13:37:42 -07:00
LOLi	0238a9755b	Fix 'zfs allow' for create time permissions When no permission set is defined for a dataset the create time permissions are incorrectly shown as if they were a permission set. This change simply correct how allow permissions are displayed. This commit also fixes a small manpage formatting issue and adds the "zfs_allow_003_pos" test case to the ZFS Test Suite. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7519 Closes #7860	2018-09-06 13:11:21 -07:00
Don Brady	cc99f275a2	Pool allocation classes Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Håkan Johansson <f96hajo@chalmers.se> Reviewed-by: Andreas Dilger <andreas.dilger@chamcloud.com> Reviewed-by: DHE <git@dehacked.net> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Gregor Kopka <gregor@kopka.net> Reviewed-by: Kash Pande <kash@tripleback.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #5182	2018-09-05 18:33:36 -07:00
Don Brady	b83a0e2dc1	Add basic zfs ioc input nvpair validation We want newer versions of libzfs_core to run against an existing zfs kernel module (i.e. a deferred reboot or module reload after an update). Programmatically document, via a zfs_ioc_key_t, the valid arguments for the ioc commands that rely on nvpair input arguments (i.e. non legacy commands from libzfs_core). Automatically verify the expected pairs before dispatching a command. This initial phase focuses on the non-legacy ioctls. A follow-on change can address the legacy ioctl input from the zfs_cmd_t. The zfs_ioc_key_t for zfs_keys_channel_program looks like: static const zfs_ioc_key_t zfs_keys_channel_program[] = { {"program", DATA_TYPE_STRING, 0}, {"arg", DATA_TYPE_UNKNOWN, 0}, {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL}, {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, }; Introduce four input errors to identify specific input failures (in addition to generic argument value errors like EINVAL, ERANGE, EBADF, and E2BIG). ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #7780	2018-09-02 12:14:01 -07:00
bernie1995	0fe7c953b3	ZTS: path cleanup Removing hardcoded paths in many scripts. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bernie1995 <bernie.pikes@gmail.com> Issue #7507 Closes #7843	2018-08-30 13:46:55 -07:00
Tom Caputi	c3bd3fb4ac	OpenZFS 9403 - assertion failed in arc_buf_destroy() Assertion failed in arc_buf_destroy() when concurrently reading block with checksum error. Porting notes: * The ability to zinject decompression errors has been added, but this only works at the zio_decompress() level, where we have all of the info we need to match against the user's zinject options. * The decompress_fault test has been added to test the new zinject functionality * We attempted to set zio_decompress_fail_fraction to (1 << 18) in ztest for further test coverage. Although this did uncover a few low priority issues, this unfortuantely also causes ztest to ASSERT in many locations where the code is working correctly since it is designed to fail on IO errors. Developers can manually set this variable with the '-o' option to find and debug issues. Authored by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Matt Ahrens <mahrens@delphix.com> Ported-by: Tom Caputi <tcaputi@datto.com> OpenZFS-issue: https://illumos.org/issues/9403 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9 Closes #7822	2018-08-29 11:33:33 -07:00

1 2 3 4 5 ...

789 Commits