Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
loli10K	715c996d3b	Fix pool creation with feature@allocation_classes disabled When "feature@allocation_classes" is not enabled on the pool no vdev with "special" or "dedup" allocation type should be allowed to exist in the vdev tree. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9427 Closes #9429	2019-10-10 16:39:41 -07:00
Igor K	d0a84ba92b	ZTS: Fix mmp_hostid test Correctly use the `mntpnt_fs` variable, and include additional logic to ensure the /etc/hostid is correct set up and cleaned up. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #9349	2019-10-08 13:40:17 -07:00
Brian Behlendorf	94bcf6f5e3	ZTS: Fix trim/trim_config and trim/autotrim_config There have been occasional CI failures which occur when the trimmed vdev size exactly matches the target size. Resolve this by slightly relaxing the conditional and checking for -ge rather than -gt. In all of the cases observer, the values match exactly. For example: Failure /mnt/trim-vdev1 is 768 MB which is not -gt than 768 MB Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9399	2019-10-04 12:38:07 -07:00
Brian Behlendorf	e1c216fb0c	ZTS: Fix upgrade_readonly_pool Update cleanup_upgrade to use destroy_dataset and destroy_pool when performing cleanup. These wrappers retry if the pool is busy preventing occasional failures like those observed when running tests upgrade_readonly_pool. For example: SUCCESS: test enabled == enabled User accounting upgrade is not executed on readonly pool NOTE: Performing local cleanup via log_onexit (cleanup_upgrade) cannot destroy 'testpool': pool is busy ERROR: zpool destroy testpool exited 1 Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9400	2019-10-03 09:39:13 -07:00
Matthew Macy	d31277abb1	OpenZFS restructuring - libspl Factor Linux specific pieces out of libspl. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9336	2019-10-02 10:39:48 -07:00
Matthew Macy	7bb0c29468	OpenZFS restructuring - zfs_ioctl Refactor the zfs ioctls in to platform dependent and independent bits. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Signed-off-by: Matthew Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9301	2019-09-27 10:46:28 -07:00
Tom Caputi	bb61cc3185	Fix encryption hierarchy issues with zfs recv -d Currently, the recv_fix_encryption_hierarchy() function accepts 'destsnap' as one of its parameters. Originally, this was intended to be the top-level dataset of a receive (whether or not the receive was recursive). Unfortunately, this parameter actually is simply the input that is passed in from the command line. When the user specifies 'zfs recv -d', this string is actually only the name of the receiving pool since the rest of the name is derived from the send stream. This causes the function to fail, leaving some datasets with an invalid encryption hierarchy. This patch resolves this problem by passing in the top_zfs variable instead. In order to make this work, this patch also includes some changes that ensure the value is always present when we need it. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #9273 Closes #9309	2019-09-25 17:02:32 -07:00
Ryan Moeller	479d7d3ca6	ZTS: Fix typos in zfs_destroy tests cleanup lot_must -> log_must Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9362	2019-09-25 11:42:04 -07:00
Brian Behlendorf	5e5cefbaee	ZTS: harden xattr/cleanup.ksh When the xattr/cleanup.ksh script is unable to remove the test group due to an active process then it will not call default_cleanup. This will result in a zvol_ENOSPC/setup failure when attempting to create the /mnt/testdir directory which will already exist. Resolve the issue by performing the default_cleanup before removing the test user and group to ensure this step always happens. Also allow one more retry to further minimize the likelihood of the cleanup failing. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9358	2019-09-25 09:24:45 -07:00
loli10K	d359e99c38	diff_cb() does not handle large dnodes Trying to 'zfs diff' a snapshot with large dnodes will incorrectly try to access its interior slots when dnodesize > sizeof(dnode_phys_t). This is normally not an issue because the interior slots are zero-filled, which report_dnode() handles calling report_free_dnode_range(). However this is not the case for encrypted large dnodes or filesystem using many SA based xattrs where the extra data past the legacy dnode size boundary is interpreted as a dnode_phys_t. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7678 Closes #8931 Closes #9343	2019-09-24 12:01:37 -07:00
loli10K	fcd37b622b	Device removal of indirect vdev panics the kernel This commit fixes a NULL pointer dereference triggered in spa_vdev_remove_top_check() by trying to "zpool remove" an indirect vdev. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9327	2019-09-16 10:46:59 -07:00
loli10K	7e15647ce9	ZTS: Fix /usr/bin/env: 'python2': No such file or directory Since `4f342e45` env(1) must be able to find a "python2" executable in the "constrained path" on systems configured with --with-python=2.x otherwise the ZFS Test Suite won't be able to use Python scripts. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9325	2019-09-16 10:44:51 -07:00
Tom Caputi	637f0c6019	Fix clone handling with encryption roots Currently, spa_keystore_change_key_sync_impl() does not recurse into clones when updating encryption roots for either a call to 'zfs promote' or 'zfs change-key'. This can cause children of these clones to end up in a state where they point to the wrong dataset as the encryption root. It can also trigger ASSERTs in some cases where the code checks reference counts on wrapping keys. This patch fixes this issue by ensuring that this function properly recurses into clones during processing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #9267 Closes #9294	2019-09-16 10:07:33 -07:00
John Gallagher	e60e158eff	Add subcommand to wait for background zfs activity to complete Currently the best way to wait for the completion of a long-running operation in a pool, like a scrub or device removal, is to poll 'zpool status' and parse its output, which is neither efficient nor convenient. This change adds a 'wait' subcommand to the zpool command. When invoked, 'zpool wait' will block until a specified type of background activity completes. Currently, this subcommand can wait for any of the following: - Scrubs or resilvers to complete - Devices to initialized - Devices to be replaced - Devices to be removed - Checkpoints to be discarded - Background freeing to complete For example, a scrub that is in progress could be waited for by running zpool wait -t scrub <pool> This also adds a -w flag to the attach, checkpoint, initialize, replace, remove, and scrub subcommands. When used, this flag makes the operations kicked off by these subcommands synchronous instead of asynchronous. This functionality is implemented using a new ioctl. The type of activity to wait for is provided as input to the ioctl, and the ioctl blocks until all activity of that type has completed. An ioctl was used over other methods of kernel-userspace communiction primarily for the sake of portability. Porting Notes: This is ported from Delphix OS change DLPX-44432. The following changes were made while porting: - Added ZoL-style ioctl input declaration. - Reorganized error handling in zpool_initialize in libzfs to integrate better with changes made for TRIM support. - Fixed check for whether a checkpoint discard is in progress. Previously it also waited if the pool had a checkpoint, instead of just if a checkpoint was being discarded. - Exposed zfs_initialize_chunk_size as a ZoL-style tunable. - Updated more existing tests to make use of new 'zpool wait' functionality, tests that don't exist in Delphix OS. - Used existing ZoL tunable zfs_scan_suspend_progress, together with zinject, in place of a new tunable zfs_scan_max_blks_per_txg. - Added support for a non-integral interval argument to zpool wait. Future work: ZoL has support for trimming devices, which Delphix OS does not. In the future, 'zpool wait' could be extended to add the ability to wait for trim operations to complete. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Gallagher <john.gallagher@delphix.com> Closes #9162	2019-09-13 18:09:06 -07:00
Brian Behlendorf	25f06d677a	Fix /etc/hostid on root pool deadlock Accidentally introduced by `dc04a8c` which now takes the SCL_VDEV lock as a reader in zfs_blkptr_verify(). A deadlock can occur if the /etc/hostid file resides on a dataset in the same pool. This is because reading the /etc/hostid file may occur while the caller is holding the SCL_VDEV lock as a writer. For example, to perform a `zpool attach` as shown in the abbreviated stack below. To resolve the issue we cache the system's hostid when initializing the spa_t, or when modifying the multihost property. The cached value is then relied upon for subsequent accesses. Call Trace: spa_config_enter+0x1e8/0x350 [zfs] zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock zio_read+0x6c/0x140 [zfs] ... vfs_read+0xfc/0x1e0 kernel_read+0x50/0x90 ... spa_get_hostid+0x1c/0x38 [zfs] spa_config_generate+0x1a0/0x610 [zfs] vdev_label_init+0xa0/0xc80 [zfs] vdev_create+0x98/0xe0 [zfs] spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9256 Closes #9285	2019-09-10 13:42:30 -07:00
John Wren Kennedy	b63e2d881f	ZTS: Introduce targeted corruption in file blocks filetest_001_pos verifies that various checksum algorithms detect corruption by overwriting the underlying vdev on which a file resides. It is possible for the overwrite to miss the blocks of a file, causing a spurious failure. This change introduces a function to corrupt the individual blocks of a file as determined by zdb. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #9288	2019-09-09 16:11:07 -07:00
Ryan Moeller	43f4495bde	Clean up do_vol_test in zfs_copies tests Get rid of the `get_used_prop` function. `get_prop used` works fine. Fix the comment describing the function parameters. The type does not have a default, and mntp is also used for ext2. Rename the variable for the number of copies from `copy` to `copies`. Use a `case` statement to match the type parameter, order the cases alphabetically, and add a little sanity checking for good measure. Use eval to make sure the output of commands is silenced rather than the log messages when redirecting output to /dev/null. Simplify cases where zfs requires special behavior. Don't allow the test to loop forever in the event space usage does not change. Bail out of the loop and fail after an arbitrary number of iterations. Add more information to the log message when the test fails, to help debugging. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9286	2019-09-09 16:04:05 -07:00
Tom Caputi	870e7a52c1	Fix noop receive of raw send stream Currently, the noop receive code fails to work with raw send streams and resuming send streams. This happens because zfs_receive_impl() reads the DRR_BEGIN payload without reading the payload itself. Normally, the kernel expects to read this itself, but in this case the recv_skip() code runs instead and it is not prepared to handle the stream being left at any place other than the beginning of a record. This patch resolves this issue by manually reading the DRR_BEGIN payload in the dry-run case. This patch also includes a number of small fixups in this code path. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #9221 Closes #9173	2019-09-05 16:22:05 -07:00
Ryan Moeller	8e2c502cf3	Clean up zfs_clone_010_pos Remove a lot of unnecessary setting and incrementing of `i`. Remove unused variable `j`. Instead of calling out to Python in a loop to generate the same string repeatedly, generate the string once using shell constructs before entering the loop. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9284	2019-09-05 16:20:09 -07:00
Ryan Moeller	240c015ac6	Refactor checksum operations in tests md5sum in particular but also sha256sum to a lesser extent is used in several areas of the test suite for computing checksums. The vast majority of invocations are followed by `\| awk '{ print $1 }'`. Introduce functions to wrap up `md5sum $file \| awk '{ print $1 }'` and likewise for sha256sum. These also serve as a convenient interface for alternative implementations on other platforms. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9280	2019-09-05 09:51:59 -07:00
Ryan Moeller	a6a2877774	Use the right booleans TRUE and FALSE happen to be defined, but we should use B_TRUE and B_FALSE for the sake of consistency. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9264	2019-09-03 12:44:08 -07:00
Igor K	1a504d27df	ZTS: Fix removal_cancel.ksh Create a larger file to extend the time required to perform the removal. Occasional failures were observed due to the removal completing before the cancel could be requested. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #9259	2019-09-03 10:46:40 -07:00
Andrea Gelmini	c6e457dffb	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9250	2019-09-02 18:14:53 -07:00
Andrea Gelmini	cb14aa4ca9	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9249	2019-09-02 18:13:19 -07:00
Andrea Gelmini	4001f09055	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9247	2019-09-02 18:12:01 -07:00
Andrea Gelmini	220dd4ae84	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9246	2019-09-02 18:10:31 -07:00
Andrea Gelmini	24739cd5b0	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9244	2019-09-02 18:08:56 -07:00
Andrea Gelmini	37e42197a4	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9243	2019-09-02 18:07:35 -07:00
Andrea Gelmini	ade306a9d4	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9242	2019-09-02 17:58:26 -07:00
Andrea Gelmini	c953960048	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9248	2019-08-30 16:53:48 -07:00
Andrea Gelmini	b520706f29	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9245	2019-08-30 16:52:00 -07:00
Igor K	d39c71d365	Fix refquota_007_neg.ksh Must use 'zfs' instead of '$ZFS' which is undefined. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #9257	2019-08-30 09:32:25 -07:00
Ryan Moeller	815a645692	Simplify deleting partitions in libtest Eliminate unnecessary code duplication. We can use a for-loop instead of a while-loop. There is no need to echo $DISKSARRAY in a subshell or return 0. Declare all variables with typeset. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9224	2019-08-29 13:11:29 -07:00
Ryan Moeller	f66ad580cc	Use compatible arg order in tests BSD getopt() and getopt_long() want options before arguments. Reorder arguments to zfs/zpool in tests to put all the options first. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9228	2019-08-29 11:03:09 -07:00
Chunwei Chen	035e96118b	Fix zil replay panic when TX_REMOVE followed by TX_CREATE If TX_REMOVE is followed by TX_CREATE on the same object id, we need to make sure the object removal is completely finished before creation. The current implementation relies on dnode_hold_impl with DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work fine before, in current version it does not guarantee the object removal is completed. We fix this by checking if DNODE_MUST_BE_FREE returns successful instead. Also add test and remove dead code in dnode_hold_impl. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7151 Closes #8910 Closes #9123 Closes #9145	2019-08-28 10:42:02 -07:00
Ryan Moeller	9c9dcd6e04	Prefer `for (;;)` to `while (TRUE)` Defining a special constant to make an infinite loop is excessive, especially when the name clashes with symbols commonly defined on some platforms (ie FreeBSD). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9219	2019-08-28 10:38:40 -07:00
Ryan Moeller	142f84dd19	Restore :: in Makefile.am The double-colon looked like a typo, but it's actually an obscure feature. Rules with :: may appear multiple times and are run independently of one another in the order they appear. The use of :: for distclean-local was conventional, not accidental. Add comments to indicate the intentional use of double-colon rules. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9210	2019-08-26 11:48:31 -07:00
Paul Dagnelie	95f0144675	Add regression test for "zpool list -p" Other than this test, zpool list -p is not well tested by any of the automated tests. Add a test for zpool list -p. Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #9134	2019-08-25 18:33:03 -07:00
Brian Behlendorf	4302698be1	ZTS: Fix in-tree dbufstats test case Commit `a887d653` updated the dbufstats such that escalated privileges are required. Since all tests under cli_user are run with normal privileges move this test case to a location where it will be run required privileges. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9118 Closes #9196	2019-08-22 17:37:48 -07:00
Ryan Moeller	97c54ea818	Make slog test setup more robust The slog tests fail when attempting to create pools using file vdevs that already exist from previous test runs. Remove these files in the setup for the test. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9194	2019-08-22 17:26:50 -07:00
Brian Behlendorf	31b548ffb9	ZTS: Use decimal values when setting tunables The mdb_set_uint32 function requires that the values passed in be decimal. This was overlooked initially because the matching Linux function accepts both decimal and hexadecimal values. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #9125 Closes #9195	2019-08-22 10:36:57 -07:00
Ryan Moeller	0154a1e539	Dedup IOC enum values in libzfs_input_check Reuse enum value ZFS_IOC_BASE for `('Z' << 8)`. This is helpful on FreeBSD where ZFS_IOC_BASE has a different value and `('Z' << 8)` is wrong. Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9188	2019-08-22 09:46:09 -07:00
Ryan Moeller	f591a581d6	Enhance ioctl number checks When checking ZFS_IOC_* numbers, print which numbers are wrong rather than silently failing. Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9187	2019-08-22 09:44:11 -07:00
Brian Behlendorf	20f7b917aa	ZTS: Fix vdev_zaps_005_pos on CentOS 6 The ancient version of blkid (v2.17.2) used in CentOS 6 will not detect the newly created pool unless it has been written to. Force a pool sync so `zpool import` will detect the newly created pool. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9199	2019-08-22 08:53:44 -07:00
Paul Dagnelie	1a26cb6160	Add more refquota tests It used to be possible for zfs receive (and other operations related to clone swap) to bypass refquotas. This can cause a number of issues, and there should be an automated test for it. Added tests for rollback and receive not overriding refquota. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #9139	2019-08-19 15:06:53 -07:00
Chunwei Chen	8e556c5ebc	Fix out-of-order ZIL txtype lost on hardlinked files We should only call zil_remove_async when an object is removed. However, in current implementation, it is called whenever TX_REMOVE is called. In the case of hardlinked file, every unlink will generate TX_REMOVE and causing operations to be dropped even when the object is not removed. We fix this by only calling zil_remove_async when the file is fully unlinked. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #8769 Closes #9061	2019-08-13 21:21:27 -06:00
Serapheim Dimitropoulos	3b9edd7b17	Introduce getting holds and listing bookmarks through ZCP Consumers of ZFS Channel Programs can now list bookmarks, and get holds from datasets. A minor-refactoring was also applied to distinguish between user and system properties in ZCP. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Ported-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Dan Kimmel <dan.kimmel@delphix.com> OpenZFS-issue: https://illumos.org/issues/8862 Closes #7902	2019-08-12 10:02:34 -07:00
Serapheim Dimitropoulos	8098465558	Test cancelling a removal in ZTS This patch adds a new test that sanity checks cancelling a removal. Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #9101	2019-08-05 10:50:20 -07:00
Brian Behlendorf	1e620c9872	Revert "Develop tests for issues #5866 and #8858 " This reverts commit `693c1fc478`. This change resulted in a kmem leak being observed in existing code which needs to be identified and addressed. Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #8978 Closes #9090	2019-07-29 12:46:56 -07:00
Paul Zuchowski	693c1fc478	Develop tests for issues #5866 and #8858 Provide zfstest coverage for these two issues which were a panic accessing extended attributes and a problem comparing 64 bit and 32 bit generation numbers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Issue #5866 Issue #8858 Closes #8978	2019-07-26 17:52:13 -07:00
Tomohiro Kusumi	9fb6abe5ad	Implement secpolicy_vnode_setid_retain() Don't unconditionally return 0 (i.e. retain SUID/SGID). Test CAP_FSETID capability. https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t which expects SUID/SGID to be dropped on write(2) by non-owner fails without this. Most filesystems make this decision within VFS by using a generic file write for fops. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #9035 Closes #9043	2019-07-26 13:52:30 -07:00
Sara Hartse	37f03da8ba	Fast Clone Deletion Deleting a clone requires finding blocks are clone-only, not shared with the snapshot. This was done by traversing the entire block tree which results in a large performance penalty for sparsely written clones. This is new method keeps track of clone blocks when they are modified in a "Livelist" so that, when it’s time to delete, the clone-specific blocks are already at hand. We see performance improvements because now deletion work is proportional to the number of clone-modified blocks, not the size of the original dataset. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Sara Hartse <sara.hartse@delphix.com> Closes #8416	2019-07-26 10:54:14 -07:00
Tony Hutter	d84c5a120e	Move some tests to cli_user/zpool_status The tests in tests/functional/cli_root/zpool_status should all require root. However, linux.run has "user =" specified for those tests, which means they run as a normal user. When I removed that line to run them as root, the following tests did not pass: zpool_status_003_pos zpool_status_-c_disable zpool_status_-c_homedir zpool_status_-c_searchpath These tests need to be run as a normal user. To fix this, move these tests to a new tests/functional/cli_user/zpool_status directory. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #9057	2019-07-19 11:21:54 -07:00
Mike Gerdts	d45d7f08fa	Add zfs create dryrun Adds the ability to sanity check zfs create arguments and to see the value of any additional properties that will local to the dataset. For example, automation that may need to adjust quota on a parent filesystem before creating a volume may call `zfs create -nP -V <size> <volume>` to obtain the value of refreservation. This adds the following options to zfs create: - -n dry-run (no-op) - -v verbose - -P parseable (implies verbose) Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Jerry Jelinek <jerry.jelinek@joyent.com> Signed-off-by: Mike Gerdts <mike.gerdts@joyent.com> Closes #8974	2019-07-16 11:19:24 -07:00
Serapheim Dimitropoulos	93e28d661e	Log Spacemap Project = Motivation At Delphix we've seen a lot of customer systems where fragmentation is over 75% and random writes take a performance hit because a lot of time is spend on I/Os that update on-disk space accounting metadata. Specifically, we seen cases where 20% to 40% of sync time is spend after sync pass 1 and ~30% of the I/Os on the system is spent updating spacemaps. The problem is that these pools have existed long enough that we've touched almost every metaslab at least once, and random writes scatter frees across all metaslabs every TXG, thus appending to their spacemaps and resulting in many I/Os. To give an example, assuming that every VDEV has 200 metaslabs and our writes fit within a single spacemap block (generally 4K) we have 200 I/Os. Then if we assume 2 levels of indirection, we need 400 additional I/Os and since we are talking about metadata for which we keep 2 extra copies for redundancy we need to triple that number, leading to a total of 1800 I/Os per VDEV every TXG. We could try and decrease the number of metaslabs so we have less I/Os per TXG but then each metaslab would cover a wider range on disk and thus would take more time to be loaded in memory from disk. In addition, after it's loaded, it's range tree would consume more memory. Another idea would be to just increase the spacemap block size which would allow us to fit more entries within an I/O block resulting in fewer I/Os per metaslab and a speedup in loading time. The problem is still that we don't deal with the number of I/Os going up as the number of metaslabs is increasing and the fact is that we generally write a lot to a few metaslabs and a little to the rest of them. Thus, just increasing the block size would actually waste bandwidth because we won't be utilizing our bigger block size. = About this patch This patch introduces the Log Spacemap project which provides the solution to the above problem while taking into account all the aforementioned tradeoffs. The details on how it achieves that can be found in the references sections below and in the code (see Big Theory Statement in spa_log_spacemap.c). Even though the change is fairly constraint within the metaslab and lower-level SPA codepaths, there is a side-change that is user-facing. The change is that VDEV IDs from VDEV holes will no longer be reused. To give some background and reasoning for this, when a log device is removed and its VDEV structure was replaced with a hole (or was compacted; if at the end of the vdev array), its vdev_id could be reused by devices added after that. Now with the pool-wide space maps recording the vdev ID, this behavior can cause problems (e.g. is this entry referring to a segment in the new vdev or the removed log?). Thus, to simplify things the ID reuse behavior is gone and now vdev IDs for top-level vdevs are truly unique within a pool. = Testing The illumos implementation of this feature has been used internally for a year and has been in production for ~6 months. For this patch specifically there don't seem to be any regressions introduced to ZTS and I have been running zloop for a week without any related problems. = Performance Analysis (Linux Specific) All performance results and analysis for illumos can be found in the links of the references. Redoing the same experiments in Linux gave similar results. Below are the specifics of the Linux run. After the pool reached stable state the percentage of the time spent in pass 1 per TXG was 64% on average for the stock bits while the log spacemap bits stayed at 95% during the experiment (graph: sdimitro.github.io/img/linux-lsm/PercOfSyncInPassOne.png). Sync times per TXG were 37.6 seconds on average for the stock bits and 22.7 seconds for the log spacemap bits (related graph: sdimitro.github.io/img/linux-lsm/SyncTimePerTXG.png). As a result the log spacemap bits were able to push more TXGs, which is also the reason why all graphs quantified per TXG have more entries for the log spacemap bits. Another interesting aspect in terms of txg syncs is that the stock bits had 22% of their TXGs reach sync pass 7, 55% reach sync pass 8, and 20% reach 9. The log space map bits reached sync pass 4 in 79% of their TXGs, sync pass 7 in 19%, and sync pass 8 at 1%. This emphasizes the fact that not only we spend less time on metadata but we also iterate less times to convergence in spa_sync() dirtying objects. [related graphs: stock- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGStock.png lsm- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGLSM.png] Finally, the improvement in IOPs that the userland gains from the change is approximately 40%. There is a consistent win in IOPS as you can see from the graphs below but the absolute amount of improvement that the log spacemap gives varies within each minute interval. sdimitro.github.io/img/linux-lsm/StockVsLog3Days.png sdimitro.github.io/img/linux-lsm/StockVsLog10Hours.png = Porting to Other Platforms For people that want to port this commit to other platforms below is a list of ZoL commits that this patch depends on: Make zdb results for checkpoint tests consistent `db587941c5` Update vdev_is_spacemap_addressable() for new spacemap encoding `419ba59145` Simplify spa_sync by breaking it up to smaller functions `8dc2197b7b` Factor metaslab_load_wait() in metaslab_load() `b194fab0fb` Rename range_tree_verify to range_tree_verify_not_present `df72b8bebe` Change target size of metaslabs from 256GB to 16GB `c853f382db` zdb -L should skip leak detection altogether `21e7cf5da8` vs_alloc can underflow in L2ARC vdevs `7558997d2f` Simplify log vdev removal code `6c926f426a` Get rid of space_map_update() for ms_synced_length `425d3237ee` Introduce auxiliary metaslab histograms `928e8ad47d` Error path in metaslab_load_impl() forgets to drop ms_sync_lock `8eef997679` = References Background, Motivation, and Internals of the Feature - OpenZFS 2017 Presentation: youtu.be/jj2IxRkl5bQ - Slides: slideshare.net/SerapheimNikolaosDim/zfs-log-spacemaps-project Flushing Algorithm Internals & Performance Results (Illumos Specific) - Blogpost: sdimitro.github.io/post/zfs-lsm-flushing/ - OpenZFS 2018 Presentation: youtu.be/x6D2dHRjkxw - Slides: slideshare.net/SerapheimNikolaosDim/zfs-log-spacemap-flushing-algorithm Upstream Delphix Issues: DLPX-51539, DLPX-59659, DLPX-57783, DLPX-61438, DLPX-41227, DLPX-59320 DLPX-63385 Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #8442	2019-07-16 10:11:49 -07:00
Tomohiro Kusumi	ab5036df1c	Fix race in parallel mount's thread dispatching algorithm Strategy of parallel mount is as follows. 1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition. 1) #8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of #8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`. 2) #8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of #8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded. Fix a conditional in libzfs_path_contains() to consider above two. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8450 Closes #8833 Closes #8878	2019-07-09 09:31:46 -07:00
Mike Gerdts	341166c843	OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz skip blocks When a volume is created in a pool with raidz vdevs and volblocksize != 128k, the volume can reference more space than is reserved with the automatically calculated refreservation. There are two deficiencies in vol_volsize_to_reservation that contribute to this: 1) Skip blocks may be added to keep each allocation a multiple of parity + 1. This is the dominating factor when volblocksize is close to 2^ashift. 2) raidz deflation for 128 KB blocks is different for most other block sizes. See "The theory of raidz space accounting" comment in libzfs_dataset.c for a full explanation. Authored by: Mike Gerdts <mike.gerdts@joyent.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Kody Kantor <kody.kantor@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Mike Gerdts <mike.gerdts@joyent.com> Porting Notes: * ZTS: wait for zvols to exist before writing * ZTS: use log_must_busy with {zpool\|zfs} destroy OpenZFS-issue: https://www.illumos.org/issues/9318 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0 Closes #8973	2019-07-05 15:35:15 -07:00
George Wilson	681a85cb01	nopwrites on dmu_sync-ed blocks can result in a panic After device removal, performing nopwrites on a dmu_sync-ed block will result in a panic. This panic can show up in two ways: 1. an attempt to issue an IOCTL in vdev_indirect_io_start() 2. a failed comparison of zio->io_bp and zio->io_bp_orig in zio_done() To resolve both of these panics, nopwrites of blocks on indirect vdevs should be ignored and new allocations should be performed on concrete vdevs. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #8957	2019-06-28 12:40:23 -07:00
Tom Caputi	765d1f0644	Add 'zfs umount -u' for encrypted datasets This patch adds the ability for the user to unload keys for datasets as they are being unmounted. This is analogous to 'zfs mount -l'. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes: #8917 Closes: #8952	2019-06-28 12:38:37 -07:00
Igor K	13d454c6ca	-Y option for zdb is valid The -Y option was added for ztest to test split block reconstruction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #8926	2019-06-24 17:58:11 -07:00
Matthew Ahrens	59ec30a329	Remove code for zfs remap The "zfs remap" command was disabled by `6e91a72fe3`, because it has little utility and introduced some tricky bugs. This commit removes the code for it, the associated ZFS_IOC_REMAP ioctl, and tests. Note that the ioctl and property will remain, but have no functionality. This allows older software to fail gracefully if it attempts to use these, and avoids a backwards incompatibility that would be introduced if we renumbered the later ioctls/props. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #8944	2019-06-24 16:44:01 -07:00
Brian Behlendorf	8f12a4f8d2	Fix out-of-tree build failures Resolve the incorrect use of srcdir and builddir references for various files in the build system. These have crept in over time and went unnoticed because when building in the top level directory srcdir and builddir are identical. With this change it's again possible to build in a subdirectory. $ mkdir obj $ cd obj $ ../configure $ make Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8921 Closes #8943	2019-06-24 09:32:47 -07:00
Tomohiro Kusumi	cc9625c47c	Add missing .gitignore from "Implement Redacted Send/Receive" `30af21b025` needed to ignore get_diff executable. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8950	2019-06-23 11:51:29 -07:00
Don Brady	186898bbb5	OpenZFS 9425 - channel programs can be interrupted Problem Statement ================= ZFS Channel program scripts currently require a timeout, so that hung or long-running scripts return a timeout error instead of causing ZFS to get wedged. This limit can currently be set up to 100 million Lua instructions. Even with a limit in place, it would be desirable to have a sys admin (support engineer) be able to cancel a script that is taking a long time. Proposed Solution ================= Make it possible to abort a channel program by sending an interrupt signal.In the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to catch the signal. Once a signal is encountered, the dsl_sync_task function can install a Lua hook that will get called before the Lua interpreter executes a new line of code. The dsl_sync_task can resume with a standard txg_wait_sync call and wait for the txg to complete. Meanwhile, the hook will abort the script and indicate that the channel program was canceled. The kernel returns a EINTR to indicate that the channel program run was canceled. Porting notes: Added missing return value from cv_wait_sig() Authored by: Don Brady <don.brady@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> OpenZFS-issue: https://www.illumos.org/issues/9425 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/d0cb1fb926 Closes #8904	2019-06-22 16:51:46 -07:00
Allan Jude	fb6e6f1ffb	zstreamdump: add per-record-type counters and an overhead counter Count the bytes of payload for each replication record type Count the bytes of overhead (replication records themselves) Include these counters in the output summary at the end of the run. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Sponsored-By: Klara Systems and Catalogic Closes #8432	2019-06-22 16:33:44 -07:00
Tomohiro Kusumi	d5bf1cf179	Fix build break by "Implement Redacted Send/Receive" `30af21b025` broke build on Fedora. gcc can detect potential overflow on compile-time. Consider strlen of already copied string. Also change strn to strl variants per suggestion from @behlendorf and @ofaaland. -- libzfs_input_check.c: In function 'test_redact': libzfs_input_check.c:711:2: error: 'strncat' specified bound 288 equals destination size [-Werror=stringop-overflow=] strncat(bookmark, "#testbookmark", sizeof (bookmark)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8939	2019-06-22 16:30:59 -07:00
Tom Caputi	da68988708	Allow unencrypted children of encrypted datasets When encryption was first added to ZFS, we made a decision to prevent users from creating unencrypted children of encrypted datasets. The idea was to prevent users from inadvertently leaving some of their data unencrypted. However, since the release of 0.8.0, some legitimate reasons have been brought up for this behavior to be allowed. This patch simply removes this limitation from all code paths that had checks for it and updates the tests accordingly. Reviewed-by: Jason King <jason.king@joyent.com> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8737 Closes #8870	2019-06-20 12:29:51 -07:00
Matthew Ahrens	050d720c43	Remove dedupditto functionality If dedup is in use, the `dedupditto` property can be set, causing ZFS to keep an extra copy of data that is referenced many times (>100x). The idea was that this data is more important than other data and thus we want to be really sure that it is not lost if the disk experiences a small amount of random corruption. ZFS (and system administrators) rely on the pool-level redundancy to protect their data (e.g. mirroring or RAIDZ). Since the user/sysadmin doesn't have control over what data will be offered extra redundancy by dedupditto, this extra redundancy is not very useful. The bulk of the data is still vulnerable to loss based on the pool-level redundancy. For example, if particle strikes corrupt 0.1% of blocks, you will either be saved by mirror/raidz, or you will be sad. This is true even if dedupditto saved another 0.01% of blocks from being corrupted. Therefore, the dedupditto functionality is rarely enabled (i.e. the property is rarely set), and it fulfills its promise of increased redundancy even more rarely. Additionally, this feature does not work as advertised (on existing releases), because scrub/resilver did not repair the extra (dedupditto) copy (see https://github.com/zfsonlinux/zfs/pull/8270). In summary, this seldom-used feature doesn't work, and even if it did it wouldn't provide useful data protection. It has a non-trivial maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270). We should remove the dedupditto functionality. For backwards compatibility with the existing CLI, "zpool set dedupditto" will still "succeed" (exit code zero), but won't have any effect. For backwards compatibility with existing pools that had dedupditto enabled at some point, the code will still be able to understand dedupditto blocks and free them when appropriate. However, ZFS won't write any new dedupditto blocks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Issue #8270 Closes #8310	2019-06-19 14:54:02 -07:00
Brian Behlendorf	4ca457b065	ZTS: Fix mmp_interval failure The mmp_interval test case was failing on Fedora 30 due to the built-in 'echo' command terminating the script when it was unable to write to the sysfs module parameter. This change in behavior was observed with ksh-2020.0.0-alpha1. Resolve the issue by using the external cat command which fails gracefully as expected. Additionally, remove some incorrect quotes around the $? return values. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8906	2019-06-19 10:39:28 -07:00
Paul Dagnelie	30af21b025	Implement Redacted Send/Receive Redacted send/receive allows users to send subsets of their data to a target system. One possible use case for this feature is to not transmit sensitive information to a data warehousing, test/dev, or analytics environment. Another is to save space by not replicating unimportant data within a given dataset, for example in backup tools like zrepl. Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots). Second, the new zfs redact command is used to create a redaction bookmark. The redaction bookmark stores the list of blocks in a snapshot that were modified by the redaction snapshot(s). Finally, the redaction bookmark is passed as a parameter to zfs send. When sending to the snapshot that was redacted, the redaction bookmark is used to filter out blocks that contain sensitive or unwanted information, and those blocks are not included in the send stream. When sending from the redaction bookmark, the blocks it contains are considered as candidate blocks in addition to those blocks in the destination snapshot that were modified since the creation_txg of the redaction bookmark. This step is necessary to allow the target to rehydrate data in the case where some blocks are accidentally or unnecessarily modified in the redaction snapshot. The changes to bookmarks to enable fast space estimation involve adding deadlists to bookmarks. There is also logic to manage the life cycles of these deadlists. The new size estimation process operates in cases where previously an accurate estimate could not be provided. In those cases, a send is performed where no data blocks are read, reducing the runtime significantly and providing a byte-accurate size estimate. Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Prashanth Sreenivasa <pks@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Chris Williamson <chris.williamson@delphix.com> Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com> Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #7958	2019-06-19 09:48:12 -07:00
Tulsi Jain	9c7da9a95a	Restrict filesystem creation if name referred either '.' or '..' This change restricts filesystem creation if the given name contains either '.' or '..' Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: TulsiJain <tulsi.jain@delphix.com> Closes #8842 Closes #8564	2019-06-13 08:56:15 -07:00
Richard Elling	cfc16f8ba8	Improve ZTS block_device_wait debugging The udevadm settle timeout can be 120 or 180 seconds by default for some distributions. If a long delay is experienced, it could be due to some strangeness in a malfunctioning device that isn't related to the devices under test. To help debug this condition, a notice is given if settle takes too long. Arguments can now be passed to block_device_wait. The expected arguments are block device pathnames. Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:21:19 -07:00
Richard Elling	4cb1b541d4	Block_device_wait does not return an error code Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:21:08 -07:00
Richard Elling	bef70afaa6	Remove redundant redundant remove Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:20:42 -07:00
Richard Elling	2ff615b2bf	Fix logic error in setpartition function Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:19:32 -07:00
Don Brady	8e91c5ba6a	hkdf_test binary should only have one icp instance The build for test binary hkdf_test was linking both against libicp and libzpool. This results in two instances of libicp inside the binary but the call to icp_init() only initializes one of them! Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #8850	2019-06-05 14:21:25 -07:00
Tomohiro Kusumi	1608985a41	Add link count test for root inode Add tests for 97aa3ba44("Fix link count of root inode when snapdir is visible") as suggested in #8727. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com> Closes #8732	2019-05-29 16:26:46 -07:00
loli10K	ab1a9705f8	Double-free of encryption wrapping key due to invalid pool properties This commits fixes a double-free in zfs_ioc_pool_create() triggered by specifying an unsupported combination of properties when creating a pool with encryption enabled. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8791	2019-05-28 15:19:50 -07:00
Stoiko Ivanov	10e1a0112e	tests: fix cosmetic permission issues during `make install` files in dist__SCRIPTS get installed with 0755, those in dist__DATA with 0644. This commit moves all .kshlib, .shlib and .cfg files in the testsuite to dist_pkgdata_DATA, and removes the shebang from zpool_import.kshlib. This ensures that the files are installed with appropriate permissions and silences some warnings from lintian Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> Closes #8803	2019-05-28 15:07:24 -07:00
loli10K	fe609530f2	zfs-tests: fix warnings when packaging some .shlib files This change prevents the following warning when packaging some zfs-tests files: *** WARNING: ./usr/src/zfs-0.8.0/tests/zfs-tests/include/zpool_script.shlib is executable but has empty or no shebang, removing executable bit Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8787	2019-05-24 14:12:14 -07:00
loli10K	2696e86f2b	zfs-tests: verify zfs(8) and zpool(8) help message is under 80 columns This commit updates the ZFS Test Suite to detect incorrect wrapping of both zfs(8) and zpool(8) help message Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8785	2019-05-24 14:04:08 -07:00
siv0	0ff5cd0772	Fix ksh-path for random_readwrite_fixed.ksh The test in zfs-tests/tests/perf/regression/random_readwrite_fixed.ksh is the only file to use /usr/bin/ksh in the shebang. Change it to /bin/ksh for consistency. Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> Closes #8779	2019-05-24 13:17:50 -07:00
Igor K	2d0077da6d	Rename reservation tests from .sh to .ksh Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #8729	2019-05-23 15:42:03 -07:00
Brian Behlendorf	caf9dd209f	Fix send/recv lost spill block When receiving a DRR_OBJECT record the receive_object() function needs to determine how to handle a spill block associated with the object. It may need to be removed or kept depending on how the object was modified at the source. This determination is currently accomplished using a heuristic which takes in to account the DRR_OBJECT record and the existing object properties. This is a problem because there isn't quite enough information available to do the right thing under all circumstances. For example, when only the block size changes the spill block is removed when it should be kept. What's needed to resolve this is an additional flag in the DRR_OBJECT which indicates if the object being received references a spill block. The DRR_OBJECT_SPILL flag was added for this purpose. When set then the object references a spill block and it must be kept. Either it is update to date, or it will be replaced by a subsequent DRR_SPILL record. Conversely, if the object being received doesn't reference a spill block then any existing spill block should always be removed. Since previous versions of ZFS do not understand this new flag additional DRR_SPILL records will be inserted in to the stream. This has the advantage of being fully backward compatible. Existing ZFS systems receiving this stream will recreate the spill block if it was incorrectly removed. Updated ZFS versions will correctly ignore the additional spill blocks which can be identified by checking for the DRR_SPILL_UNMODIFIED flag. The small downside to this approach is that is may increase the size of the stream and of the received snapshot on previous versions of ZFS. Additionally, when receiving streams generated by previous unpatched versions of ZFS spill blocks may still be lost. OpenZFS-issue: https://www.illumos.org/issues/9952 FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233277 Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8668	2019-05-07 15:18:44 -07:00
Tomohiro Kusumi	9c53e51616	Fix `zfs set atime\|relatime=off\|on` behavior on inherited datasets `zfs set atime\|relatime=off\|on` doesn't disable or enable the property on read for datasets whose property was inherited from parent, until a dataset is once unmounted and mounted again. (The properties start to work properly if a dataset is once unmounted and mounted again. The difference comes from regular mount process, e.g. via zpool import, uses mount options based on properties read from ondisk layout for each dataset, whereas `zfs set atime\|relatime=off\|on` just remounts a specified dataset.) -- # zpool create p1 <device> # zfs create p1/f1 # zfs set atime=off p1 # echo test > /p1/f1/test # sync # zfs list NAME USED AVAIL REFER MOUNTPOINT p1 176K 18.9G 25.5K /p1 p1/f1 26K 18.9G 26K /p1/f1 # zfs get atime NAME PROPERTY VALUE SOURCE p1 atime off local p1/f1 atime off inherited from p1 # stat /p1/f1/test \| grep Access \| tail -1 Access: 2019-04-26 23:32:33.741205192 +0900 # cat /p1/f1/test test # stat /p1/f1/test \| grep Access \| tail -1 Access: 2019-04-26 23:32:50.173231861 +0900 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ changed by read(2) -- The problem is that zfsvfs::z_atime which was probably intended to keep incore atime state just gets updated by a callback function of "atime" property change, atime_changed_cb(), and never used for anything else. Since now that all file read and atime update use a common function zpl_iter_read_common() -> file_accessed(), and whether to update atime via ->dirty_inode() is determined by atime_needs_update(), atime_needs_update() needs to return false once atime is turned off. It currently continues to return true on `zfs set atime=off`. Fix atime_changed_cb() by setting or dropping SB_NOATIME in VFS super block depending on a new atime value, so that atime_needs_update() works as expected after property change. The same problem applies to "relatime" except that a self contained relatime test is needed. This is because relatime_need_update() is based on a mount option flag MNT_RELATIME, which doesn't exist in datasets with inherited "relatime" property via `zfs set relatime=...`, hence it needs its own relatime test zfs_relatime_need_update(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8674 Closes #8675	2019-05-07 10:06:30 -07:00
Tom Caputi	fa24166074	Add feature check for 'zpool resilver' command The 'zpool resilver' command requires that the resilver_defer feature is active on the pool. Unfortunately, the check for this was left out of the original patch. This commit simply corrects this so that the command properly returns an error in this case. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8700	2019-05-02 16:42:31 -07:00
Tomohiro Kusumi	5b1443c47e	Use sigaction(2) instead of sigset(3) for portability sigset(3) isn't portable. This code fails to compile on platforms without sigset(3). Use sigaction(2). -- largest_file.c: In function 'main': largest_file.c:75:9: error: implicit declaration of function 'sigset'; did you mean 'sigvec'? [-Werror=implicit-function-declaration] (void) sigset(SIGXFSZ, sigxfsz); ^~~~~~ sigvec Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8593	2019-04-30 20:45:17 -07:00
Tom Caputi	c2c6eadf29	Fix issues with truncated files in raw sends When receiving a raw send stream only reallocated objects whose contents were not freed by the standard indicators should call dmu_free_long_range(). Furthermore, if calling dmu_free_long_range() is required then the objects current block size must be used and not the new block size. Two additional test cases were added to provided realistic test coverage for processing reallocated objects which are part of a raw receive. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8528 Closes #8607	2019-04-15 15:28:48 -07:00
Richard Laager	83472fabe5	Fix hierarchy misspellings Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Richard Laager <rlaager@wiktel.com> Closes #8563 Closes #8622	2019-04-14 19:06:34 -07:00
Tomohiro Kusumi	703f791d35	Don't hard-code number of ioctls for portability Use (ZFS_IOC_LAST - ZFS_IOC_FIRST) instead of 256. It seems 256 is just a number large enough to hold ioctls at the moment. Using 256 also causes compile-time warning or error on platfoms whose enum zfs_ioc definition differs. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8598	2019-04-14 11:13:34 -07:00
Brian Behlendorf	b92f5d9f82	Fix issue in receive_object() during reallocation When receiving an object to a previously allocated interior slot the new object should be "allocated" by setting DMU_NEW_OBJECT, not "reallocated" with dnode_reallocate(). For resilience verify the slot is free as required in case the stream is malformed. Add a test case to generate more realistic incremental send streams that force reallocation to occur during the receive. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8067 Closes #8614	2019-04-12 14:28:04 -07:00
John Wren Kennedy	9e3485abfc	ZTS: Make fault cleanup function more robust The cleanup function of auto_online_001_pos does not account for the possibility that the test may fail while a disk is still removed. If the test run is using real disks, cleanup should involve restoring any that are missing. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #8579	2019-04-12 10:07:20 -07:00
Brian Behlendorf	d93d4b1acd	Revert "Fix issues with truncated files in raw sends" This partially reverts commit `5dbf8b4ed`. This change resolved the issues observed with truncated files in raw sends. However, the required changes to dnode_allocate() introduced a regression for non-raw streams which needs to be understood. The additional debugging improvements from the original patch were not reverted. Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7378 Issue #8528 Issue #8540 Issue #8565 Close #8584	2019-04-05 17:32:56 -07:00
Don Brady	b4ddec7af6	features.kernel layout should match features.pool The features.kernel layout should match features.pool. Reviewed-by: Sara Hartse <sara.hartse@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #8566	2019-04-04 19:00:55 -07:00
Sara Hartse	a887d653b3	Restrict kstats and print real pointers There are several places where we use zfs_dbgmsg and %p to print pointers. In the Linux kernel, these values obfuscated to prevent information leaks which means the pointers aren't very useful for debugging crash dumps. We decided to restrict the permissions of dbgmsg (and some other kstats while we were at it) and print pointers with %px in zfs_dbgmsg as well as spl_dumpstack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: sara hartse <sara.hartse@delphix.com> Closes #8467 Closes #8476	2019-04-04 18:57:06 -07:00
Tom Caputi	df583073eb	Do not iterate through filesystems unnecessarily Currently, when attempting to list snapshots ZFS may do a lot of extra work checking child datasets. This is because the code does not realize that it will not be able to reach any snapshots contained within snapshots that are at the depth limit since the snapshots of those datasets are counted as an additional layer deeper. This patch corrects this issue. In addition, this patch adds the ability to do perform the commands: $ zfs list -t snapshot <dataset> $ zfs get -t snapshot <prop> <dataset> as a convenient way to list out properties of all snapshots of a given dataset without having to use the depth limit. Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8539	2019-04-01 11:58:59 -07:00
Brian Behlendorf	1b939560be	Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598	2019-03-29 09:13:20 -07:00
Tom Caputi	5dbf8b4edd	Fix issues with truncated files in raw sends This patch fixes a few issues with raw receives involving truncated files: * dnode_reallocate() now calls dnode_set_blksz() instead of dnode_setdblksz(). This ensures that any remaining dbufs with blkid 0 are resized along with their containing dnode upon reallocation. * One of the calls to dmu_free_long_range() in receive_object() needs to check that the object it is about to free some contents or hasn't been completely removed already by a previous call to dmu_free_long_object() in the same function. * The same call to dmu_free_long_range() in the previous point needs to ensure it uses the object's current block size and not the new block size. This ensures the blocks of the object that are supposed to be freed are completely removed and not simply partially zeroed out. This patch also adds handling for DRR_OBJECT_RANGE records to dprintf_drr() for debugging purposes. Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7378 Closes #8528	2019-03-27 11:30:48 -07:00
Richard Elling	85a150ce1e	Update valid vdev types for get_disklist Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8532	2019-03-26 14:18:58 -07:00
Brian Behlendorf	db6be852da	ZTS: Detect e2fsprogs verity issue The projectid_001_pos and projecttree_001_pos test cases use the lsattr command to detect that the project quota bit is set correctly. Due to a bug in e2fsprogs-1.44.4 setting the Project 'P' bit also results in the Verity 'V' bit being reported as set. This will result in the test case failing. The issue has been resolved in e2fsprogs but in order to avoid testing failures these two test cases are skipped when e2fsprogs-1.44.4 is installed. https://github.com/tytso/e2fsprogs/commit/7e5a95e3d Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8534	2019-03-26 13:57:40 -07:00
Olaf Faaland	060f0226e6	MMP interval and fail_intervals in uberblock When Multihost is enabled, and a pool is imported, uberblock writes include ub_mmp_delay to allow an importing node to calculate the duration of an activity test. This value, is not enough information. If zfs_multihost_fail_intervals > 0 on the node with the pool imported, the safe minimum duration of the activity test is well defined, but does not depend on ub_mmp_delay: zfs_multihost_fail_intervals * zfs_multihost_interval and if zfs_multihost_fail_intervals == 0 on that node, there is no such well defined safe duration, but the importing host cannot tell whether mmp_delay is high due to I/O delays, or due to a very large zfs_multihost_interval setting on the host which last imported the pool. As a result, it may use a far longer period for the activity test than is necessary. This patch renames ub_mmp_sequence to ub_mmp_config and uses it to record the zfs_multihost_interval and zfs_multihost_fail_intervals values, as well as the mmp sequence. This allows a shorter activity test duration to be calculated by the importing host in most situations. These values are also added to the multihost_history kstat records. It calculates the activity test duration differently depending on whether the new fields are present or not; for importing pools with only ub_mmp_delay, it uses (zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals Which results in an activity test duration less sensitive to the leaf count. In addition, it makes a few other improvements: * It updates the "sequence" part of ub_mmp_config when MMP writes in between syncs occur. This allows an importing host to detect MMP on the remote host sooner, when the pool is idle, as it is not limited to the granularity of ub_timestamp (1 second). * It issues writes immediately when zfs_multihost_interval is changed so remote hosts see the updated value as soon as possible. * It fixes a bug where setting zfs_multihost_fail_intervals = 1 results in immediate pool suspension. * Update tests to verify activity check duration is based on recorded tunable values, not tunable values on importing host. * Update tests to verify the expected number of uberblocks have valid MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number), that sequence number is incrementing, and that uberblock values match tunable settings. Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7842	2019-03-21 12:47:57 -07:00
Brian Behlendorf	066da71e7f	Improve `zpool labelclear` 1) As implemented the `zpool labelclear` command overwrites the calculated offsets of all four vdev labels even when only a single valid label is found. If the device as been re-purposed but still contains a valid label this can result in space no longer owned by ZFS being zeroed. Prevent this by verifying every label removed is intact before it's overwritten. 2) Address a small bug in zpool_do_labelclear() which prevented labelclear from working on file vdevs. Only block devices support BLKFLSBUF, try the ioctl() but when it's reported as unsupported this should not be fatal. 3) Fix `zpool labelclear` so it can be run on vdevs which were removed from the pool with `zpool remove`. Additionally, allow intact but partial labels to be cleared as in the case of a failed `zpool attach` or `zpool replace`. 4) Remove LABELCLEAR and LABELREAD variables for test cases. Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8500 Closes #8373 Closes #6261	2019-03-21 10:13:01 -07:00
Tom Caputi	ab7615d92c	Multiple DVA Scrubbing Fix Currently, there is an issue in the sequential scrub code which prevents self healing from working in some cases. The scrub code will split up all DVA copies of a bp and issue each of them separately. The problem is that, since each of the DVAs is no longer associated with the others, the self healing code doesn't have the opportunity to repair problems that show up in one of the DVAs with the data from the others. This patch fixes this issue by ensuring that all IOs issued by the sequential scrub code include all DVAs. Initially, only the first DVA of each is attempted. If an issue arises, the IO is retried with all available copies, giving the self healing code a chance to correct the issue. To test this change, this patch also adds the ability for zinject to specify individual DVAs to inject read errors into. We then add a new test case that utilizes this functionality to ensure scrubs and self-healing reads can handle and transparently fix issues with individual copies of blocks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8453	2019-03-15 14:14:31 -07:00
Tony Hutter	2bbec1c910	Make zpool status counters match error events count The number of IO and checksum events should match the number of errors seen in zpool status. Previously there was a mismatch between the two counts because zpool status would only count unrecovered errors, while zpool events would get an event for all errors (recovered or not). This lead to situations where disks could be faulted for "too many errors", while at the same time showing zero errors in zpool status. This fixes the zpool status error counters to increment at the same times we post the error events. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #4851 Closes #7817	2019-03-14 18:21:53 -07:00
Tom Caputi	eaed840542	Better user experience for errata 4 This patch attempts to address some user concerns that have arisen since errata 4 was introduced. * The errata warning has been made less scary for users without any encrypted datasets. * The errata warning now clears itself without a pool reimport if the bookmark_v2 feature is enabled and no encrypted datasets exist. * It is no longer possible to create new encrypted datasets without enabling the bookmark_v2 feature, thus helping to ensure that the errata is resolved. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Issue ##8308 Closes #8504	2019-03-14 16:48:30 -07:00
Igor K	508c5527d0	Use 'printf %s' instead of 'echo -n' for compatibility The ksh 'echo -n' behavior on Illumos and Linux differs. For compatibility with others platforms switch to "printf '%s' ". Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #8501	2019-03-13 18:39:12 -07:00
Tom Caputi	5cc9ba5cf0	Make zstreamdump -v more greppable Currently, the verbose output of zstreamdump includes new line characters within some individual records. Presumably, this was originally done to keep the output from getting too wide to fit on a terminal. However, since new flags and struct members have been added, these rules have not been maintained consistently. In addition, these newlines can make it hard to grep the output in some scenarios. This patch simply removes these newlines, making the output easier to grep and removing the inconsistency. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8493	2019-03-13 11:19:23 -07:00
Tom Caputi	f00ab3f22c	Detect and prevent mixed raw and non-raw sends Currently, there is an issue in the raw receive code where raw receives are allowed to happen on top of previously non-raw received datasets. This is a problem because the source-side dataset doesn't know about how the blocks on the destination were encrypted. As a result, any MAC in the objset's checksum-of-MACs tree that is a parent of both blocks encrypted on the source and blocks encrypted by the destination will be incorrect. This will result in authentication errors when we decrypt the dataset. This patch fixes this issue by adding a new check to the raw receive code. The code now maintains an "IVset guid", which acts as an identifier for the set of IVs used to encrypt a given snapshot. When a snapshot is raw received, the destination snapshot will take this value from the DRR_BEGIN payload. Non-raw receives and normal "zfs snap" operations will cause ZFS to generate a new IVset guid. When a raw incremental stream is received, ZFS will check that the "from" IVset guid in the stream matches that of the "from" destination snapshot. If they do not match, the code will error out the receive, preventing the problem. This patch requires an on-disk format change to add the IVset guids to snapshots and bookmarks. As a result, this patch has errata handling and a tunable to help affected users resolve the issue with as little interruption as possible. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8308	2019-03-13 11:00:43 -07:00
Tom Caputi	369aa501d1	Fix handling of maxblkid for raw sends Currently, the receive code can create an unreadable dataset from a correct raw send stream. This is because it is currently impossible to set maxblkid to a lower value without freeing the associated object. This means truncating files on the send side to a non-0 size could result in corruption. This patch solves this issue by adding a new 'force' flag to dnode_new_blkid() which will allow the raw receive code to force the DMU to accept the provided maxblkid even if it is a lower value than the existing one. For testing purposes the send_encrypted_files.ksh test has been extended to include a variety of truncated files and multiple snapshots. It also now leverages the xattrtest command to help ensure raw receives correctly handle xattrs. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8168 Closes #8487	2019-03-13 10:52:01 -07:00
Olaf Faaland	3d31aad83e	MMP writes rotate over leaves Instead of choosing a leaf vdev quasi-randomly, by starting at the root vdev and randomly choosing children, rotate over leaves to issue MMP writes. This fixes an issue in a pool whose top-level vdevs have different numbers of leaves. The issue is that the frequency at which individual leaves are chosen for MMP writes is based not on the total number of leaves but based on how many siblings the leaves have. For example, in a pool like this: root-vdev +------+---------------+ vdev1 vdev2 \| \| \| +------+-----+-----+----+ disk1 disk2 disk3 disk4 disk5 disk6 vdev1 and vdev2 will each be chosen 50% of the time. Every time vdev1 is chosen, disk1 will be chosen. However, every time vdev2 is chosen, disk2 is chosen 20% of the time. As a result, disk1 will be sent 5x as many MMP writes as disk2. This may create wear issues in the case of SSDs. It also reduces the effectiveness of MMP as it depends on the writes being evenly distributed for the case where some devices fail or are partitioned. The new code maintains a list of leaf vdevs in the pool. MMP records the last leaf used for an MMP write in mmp->mmp_last_leaf. To choose the next leaf, MMP starts at mmp->mmp_last_leaf and traverses the list, continuing from the head if the tail is reached. It stops when a suitable leaf is found or all leaves have been examined. Added a test to verify MMP write distribution is even. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Kash Pande <kash@tripleback.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7953	2019-03-12 10:37:06 -07:00
Lorenz Brun	bf90948daf	Reorder ZFS ioctls to fix cross-version compatibility Reorder ZFS ioctls to fix cross-version compatibility. Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed by: Matt Ahrens <mahrens@delphix.com> Signed-off-by: Lorenz Brun <lorenz@dolansoft.org> Closes #8484	2019-03-09 13:39:31 -08:00
Matthew Ahrens	c568ab8d99	zfs.8 has wrong description of "zfs program -t" The "-t" argument to "zfs program" specifies a limit on the number of LUA instructions that can be executed. The zfs.8 manpage has the wrong description. It should be updated to match what's in zfs-program.8 Also fix the formatting of the zfs help message. Reviewed by: Allan Jude <allanjude@freebsd.org> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #8410	2019-02-26 11:15:28 -08:00
loli10K	c44a3ec059	zvol: allow rename of in use ZVOL dataset While ZFS allow renaming of in use ZVOLs at the DSL level without issues the ZVOL layer does not correctly update the renamed dataset if the device node is open (zv->zv_open_count > 0): trying to access the stale dataset name, for instance during a zfs receive, will cause the following failure: VERIFY3(zv->zv_objset->os_dsl_dataset->ds_owner == zv) failed ((null) == ffff8800dbb6fc00) PANIC at zvol.c:1255:zvol_resume() Showing stack for process 1390 CPU: 0 PID: 1390 Comm: zfs Tainted: P O 3.16.0-4-amd64 #1 Debian 3.16.51-3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 ffffffff8151ea00 ffffffffa0758a80 ffff88028aefba30 ffffffffa0417219 ffff880037179220 ffffffff00000030 ffff88028aefba40 ffff88028aefb9e0 2833594649524556 6f5f767a3e2d767a 6f3e2d7465736a62 Call Trace: [<0>] ? dump_stack+0x5d/0x78 [<0>] ? spl_panic+0xc9/0x110 [spl] [<0>] ? mutex_lock+0xe/0x2a [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs] [<0>] ? rrw_exit+0xc8/0x2e0 [zfs] [<0>] ? mutex_lock+0xe/0x2a [<0>] ? dmu_objset_from_ds+0x9a/0x250 [zfs] [<0>] ? dmu_objset_hold_flags+0x71/0xc0 [zfs] [<0>] ? zvol_resume+0x178/0x280 [zfs] [<0>] ? zfs_ioc_recv_impl+0x88b/0xf80 [zfs] [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs] [<0>] ? zfs_ioc_recv+0x1c2/0x2a0 [zfs] [<0>] ? dmu_buf_get_user+0x13/0x20 [zfs] [<0>] ? __alloc_pages_nodemask+0x166/0xb50 [<0>] ? zfsdev_ioctl+0x896/0x9c0 [zfs] [<0>] ? handle_mm_fault+0x464/0x1140 [<0>] ? do_vfs_ioctl+0x2cf/0x4b0 [<0>] ? __do_page_fault+0x177/0x410 [<0>] ? SyS_ioctl+0x81/0xa0 [<0>] ? async_page_fault+0x28/0x30 [<0>] ? system_call_fast_compare_end+0x10/0x15 Reviewed by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6263 Closes #8371	2019-02-22 15:38:42 -08:00
loli10K	bb1be77a35	Prevent user accounting on readonly pool Trying to mount a dataset from a readonly pool could inadvertently start the user accounting upgrade task, leading to the following failure: VERIFY3(tx->tx_threads == 2) failed (0 == 2) PANIC at txg.c:680:txg_wait_synced() Showing stack for process 2541 CPU: 2 PID: 2541 Comm: z_upgrade Tainted: P O 3.16.0-4-amd64 #1 Debian 3.16.51-3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: [<0>] ? dump_stack+0x5d/0x78 [<0>] ? spl_panic+0xc9/0x110 [spl] [<0>] ? dnode_next_offset+0x1d4/0x2c0 [zfs] [<0>] ? dmu_object_next+0x77/0x130 [zfs] [<0>] ? dnode_rele_and_unlock+0x4d/0x120 [zfs] [<0>] ? txg_wait_synced+0x91/0x220 [zfs] [<0>] ? dmu_objset_id_quota_upgrade_cb+0x10f/0x140 [zfs] [<0>] ? dmu_objset_upgrade_task_cb+0xe3/0x170 [zfs] [<0>] ? taskq_thread+0x2cc/0x5d0 [spl] [<0>] ? wake_up_state+0x10/0x10 [<0>] ? taskq_thread_should_stop.part.3+0x70/0x70 [spl] [<0>] ? kthread+0xbd/0xe0 [<0>] ? kthread_create_on_node+0x180/0x180 [<0>] ? ret_from_fork+0x58/0x90 [<0>] ? kthread_create_on_node+0x180/0x180 This patch updates both functions responsible for checking if we can perform user accounting to verify the pool is not readonly. Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8424	2019-02-19 18:41:18 -08:00
Ned Bass	75d6b7ddca	Add missing copyright notice to large_dnode tests Missing copyright notices were noticed during the Illumos RTI process. Add LLNS 2016 copyright based on original merge date. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #8435	2019-02-19 18:39:10 -08:00
John Wren Kennedy	435637d1ed	ZTS: user_property_002_pos fails to destroy volume During the cleanup function of this test, an attempt to destroy a volume can fail because the volume is busy. This leaves the system with unexpected datasets which in turn causes subsequent failures. Reviewed-by: bunder2015 <omfgbunder@gmail.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #8422	2019-02-19 11:12:47 -08:00
John Wren Kennedy	07237a7bc1	ZTS: clone_001_pos fails in cleanup on busy dataset The "cleanup_all" function in this test calls "zfs destroy" which fails approximately 30% of the time in our environment due to the dataset being busy. Since the failure happens during cleanup, the error is propagated to subsequent tests. Tested by running the snapshot test group in a loop without seeing any failures. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #8409	2019-02-15 12:45:46 -08:00
Paul Zuchowski	9c5e88b1de	zfs should optionally send holds Add -h switch to zfs send command to send dataset holds. If holds are present in the stream, zfs receive will create them on the target dataset, unless the zfs receive -h option is used to skip receive of holds. Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #7513	2019-02-15 12:41:38 -08:00
Alek P	dcec0a12c8	port async unlinked drain from illumos-nexenta This patch is an async implementation of the existing sync zfs_unlinked_drain() function. This function is called at mount time and is responsible for freeing znodes that we didn't get to freeing before. We don't have to hold mounting of the dataset until the unlinked list is fully drained as is done now. Since we can process the unlinked set asynchronously this results in a better user experience when mounting a dataset with entries in the unlinked set. Reviewed by: Jorgen Lundman <lundman@lundman.net> Reviewed by: Tom Caputi <tcaputi@datto.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #8142	2019-02-12 10:41:15 -08:00
loli10K	d8d418ff0c	ZVOLs should not be allowed to have children zfs create, receive and rename can bypass this hierarchy rule. Update both userland and kernel module to prevent this issue and use pyzfs unit tests to exercise the ioctls directly. Note: this commit slightly changes zfs_ioc_create() ABI. This allow to differentiate a generic error (EINVAL) from the specific case where we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT). Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com>	2019-02-08 15:44:15 -08:00
loli10K	4417096956	Pool allocation classes misplacing small file blocks Due to an off-by-one condition in spa_preferred_class() we are picking the "normal" allocation class instead of the "special" one for file blocks with size equal to the special_small_blocks property value. This change fix the small code issue, update the ZFS Test Suite and the zfs(8) man page. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8351 Closes #8361	2019-02-08 12:32:12 -08:00
Ahmed Ghanem	9634299657	OpenZFS 9185 - Enable testing over NFS in ZFS performance tests This change makes additions to the ZFS test suite that allows the performance tests to run over NFS. The test is run and performance data collected from the server side, while IO is generated on the NFS client. This has been tested with Linux and illumos NFS clients. Authored by: Ahmed Ghanem <ahmedg@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Kevin Greene <kevin.greene@delphix.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com> OpenZFS-issue: https://www.illumos.org/issues/9185 Closes #8367	2019-02-04 09:27:37 -08:00
Serapheim Dimitropoulos	c853f382db	Change target size of metaslabs from 256GB to 16GB = Old behavior For vdev sizes 100GB to 50TB we keep ~200 metaslabs per vdev and the metaslab size grows from 512MB to 256GB. For vdev's bigger than that we start increasing the number of metaslabs until we hit the 128K limit. = New Behavior For vdev sizes 100GB to 3TB we keep ~200 metaslabs per vdev and the metaslab size grows from 512MB to 16GB. For vdev's bigger than that we start increasing the number of metaslabs until we hit the 128K limit. = Reasoning The old behavior makes metaslabs grow in size when the vdev range is between 3TB (ms_size 16GB) and 32PB (ms_size 256GB). Even though keeping the number of metaslabs is good in terms of potential number of I/Os per TXG, these bigger metaslabs take longer to be loaded and after they are loaded they can take up a lot of memory because of their range trees. This change tries to put a boundary in memory and loading time for the specific range of vdev sizes. Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #8324	2019-01-25 16:38:27 -08:00
loli10K	60b0a963f5	Off-by-one in zap_leaf_array_create() Trying to set user properties with their length 1 byte shorter than the maximum size triggers an assertion failure in zap_leaf_array_create(): panic[cpu0]/thread=ffffff000a092c40: assertion failed: num_integers * integer_size < (8<<10) (0x2000 < 0x2000), file: ../../common/fs/zfs/zap_leaf.c, line: 233 ffffff000a092500 genunix:process_type+167c35 () ffffff000a0925a0 zfs:zap_leaf_array_create+1d2 () ffffff000a092650 zfs:zap_entry_create+1be () ffffff000a092720 zfs:fzap_update+ed () ffffff000a0927d0 zfs:zap_update+1a5 () ffffff000a0928d0 zfs:dsl_prop_set_sync_impl+5c6 () ffffff000a092970 zfs:dsl_props_set_sync_impl+fc () ffffff000a0929b0 zfs:dsl_props_set_sync+79 () ffffff000a0929f0 zfs:dsl_sync_task_sync+10a () ffffff000a092a80 zfs:dsl_pool_sync+3a3 () ffffff000a092b50 zfs:spa_sync+4e6 () ffffff000a092c20 zfs:txg_sync_thread+297 () ffffff000a092c30 unix:thread_start+8 () This patch simply corrects the assertion. Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8278	2019-01-18 09:58:46 -08:00
Serapheim Dimitropoulos	419ba59145	Update vdev_is_spacemap_addressable() for new spacemap encoding Since the new spacemap encoding was ported to ZoL that's no longer a limitation. This patch updates vdev_is_spacemap_addressable() that was performing that check. It also updates the appropriate test to ensure that the same functionality is tested. The test does so by creating pools that don't have the new spacemap encoding enabled - just the checkpoint feature. This patch also reorganizes that same tests in order to cut in half its memory consumption. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #8286	2019-01-16 15:06:20 -08:00
Serapheim Dimitropoulos	db587941c5	Make zdb results for checkpoint tests consistent This patch exports and re-imports the pool when these tests are analyzed with zdb to get consistent results. Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #8292	2019-01-16 10:41:47 -08:00
Brian Behlendorf	6e91a72fe3	Disable 'zfs remap' command The implementation of 'zfs remap' has proven to be problematic since it modifies the objset (but not its logical contents) by dirtying metadata without owning it. The consequence of which is that dmu_objset_remap_indirects() is vulnerable to certain races. For example, if we are in the middle of receiving into the filesystem while it is being remapped. Then it is possible we could evict the objset when the receive completes (see dsl_dataset_clone_swap_sync_impl, or dmu_recv_end_sync), but dmu_objset_remap_indirects() may be still using the objset. The result of which would be a panic. Extended runs of ztest(8) have exposed other possible races which can occur when using 'zfs remap'. Several of these have been fixed but there may be others which have not yet been encountered and diagnosed. Furthermore, the ability to manually remap a filesystem is no longer particularly useful now that the removal code can map large chunks. Coupled with the fact that explaining what this command does and why it may be useful requires a detailed understanding of the internals of device removal. These are details users should not be bothered with. Therefore, the 'zfs remap' command is being disabled but not entirely removed. It may be removed in the future or potentially reworked to address the issues described above. Since 'zfs remap' has never been part of a tagged release its removal is expected to have minimal impact. The ZTS tests have been updated to continue to exercise the command to prevent atrophy, but it has been removed entirely from ztest(8). Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8238	2019-01-15 15:46:58 -08:00
Paul Zuchowski	83c796c5e9	zfs filesystem skipped by df -h On full pool when pool root filesystem references very few bytes, the f_blocks returned to statvfs is 0 but should be at least 1. Reviewed by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #8253 Closes #8254	2019-01-13 10:06:13 -08:00
Brian Behlendorf	99b0b5bc3f	ZTS: zpool_resilver_restart Since the vdev initialize feature was integrated the ZTS zpool_resilver_restart test has been hitting its internal timeout more frequently. This happens most often on the coverage builder but not exclusively. Increasing the timeout for this test case prevents any false positives. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8273	2019-01-13 10:01:31 -08:00
loli10K	0f5f23869a	zfs receive and rollback can skew filesystem_count This commit fixes a small issue which causes both zfs receive and rollback operations to incorrectly increase the "filesystem_count" property value. This change also adds a new test group "limits" to the ZFS Test Suite to exercise both filesystem_count/limit and snapshot_count/limit functionality. Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8232	2019-01-08 10:17:46 -08:00
Brian Behlendorf	a769fb53a1	Add 'zpool status -i' option Only display the full details of the vdev initialization state in 'zpool status' output when requested with the -i option. By default display '(initializing)' after vdevs when they are being actively initialized. This is consistent with the established precident of appending '(resilvering), etc' and fits within the default 80 column terminal width making it easy to read. Additionally, updated the 'zpool initialize' documentation to make it clear the options are mutually exclusive, but allow duplicate options like all other zfs/zpool commands. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8230	2019-01-07 11:03:18 -08:00
George Wilson	619f097693	OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230	2019-01-07 10:37:26 -08:00
Brian Behlendorf	530248d1aa	arc_summary: consolidate test case Since we're only installing one version of arc_summary we only need one test case. Update the test to determine which version is available and then test its supported flags. Remove files for misc tests which should have been cleaned up. Reviewed-by: John Ramsden <johnramsden@riseup.net> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8096	2019-01-06 10:39:41 -08:00
Brian Behlendorf	6e72a5b9b6	pyzfs: python3 support (build system) Almost all of the Python code in the respository has been updated to be compatibile with Python 2.6, Python 3.4, or newer. The only exceptions are arc_summery3.py which requires Python 3, and pyzfs which requires at least Python 2.7. This allows us to maintain a single version of the code and support most default versions of python. This change does the following: * Sets the default shebang for all Python scripts to python3. If only Python 2 is available, then at install time scripts which are compatible with Python 2 will have their shebangs replaced with /usr/bin/python. This is done for compatibility until Python 2 goes end of life. Since only the installed versions are changed this means Python 3 must be installed on the system for test-runner when testing in-tree. * Added --with-python=<2\|3\|3.4,etc> configure option which sets the PYTHON environment variable to target a specific python version. By default the newest installed version of Python will be used or the preferred distribution version when creating pacakges. * Fixed --enable-pyzfs configure checks so they are run when --enable-pyzfs=check and --enable-pyzfs=yes. * Enabled pyzfs for Python 3.4 and newer, which is now supported. * Renamed pyzfs package to python<VERSION>-pyzfs and updated to install in the appropriate site location. For example, when building with --with-python=3.4 a python34-pyzfs will be created which installs in /usr/lib/python3.4/site-packages/. * Renamed the following python scripts according to the Fedora guidance for packaging utilities in /bin - dbufstat.py -> dbufstat - arcstat.py -> arcstat - arc_summary.py -> arc_summary - arc_summary3.py -> arc_summary3 * Updated python-cffi package name. On CentOS 6, CentOS 7, and Amazon Linux it's called python-cffi, not python2-cffi. For Python3 it's called python3-cffi or python3x-cffi. * Install one version of arc_summary. Depending on the version of Python available install either arc_summary2 or arc_summary3 as arc_summary. The user output is only slightly different. Reviewed-by: John Ramsden <johnramsden@riseup.net> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8096	2019-01-06 10:39:41 -08:00
bunder2015	5365b0747a	Add missing MMP status code to libzfs_status When MMP was merged the status codes in libzfs_status were not updated to add the status code for ZPOOL_STATUS_IO_FAILURE_MMP. This commit corrects this and adds comments to help keep track of which code is used for which status. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #8148 Closes #8222	2019-01-03 12:15:46 -08:00
Tom Caputi	7c46894081	ZTS: fix wait_scrubbed() Currently, wait_scrubbed() is the only function of its kind that accepts a timeout, which is 10s by default. This timeout is pretty short for a scrub and causes test failures if we run too long. This patch removes the timeout, instead leaning on the global test suite timeout to ensure the tests keep moving. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8210	2018-12-14 10:06:49 -08:00
Andriy Gapon	dc1c630b8a	OpenZFS 9630 - add lzc_rename and lzc_destroy to libzfs_core Porting Notes: * Additional changes to recv_rename_impl() were required due to encryption code not being merged in OpenZFS yet. * libzfs_core python bindings (pyzfs) were updated to fully support both lzc_rename() and lzc_destroy() Authored by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: loli10K <ezomori.nozomu@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/9630 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/049ba63 Closes #8207	2018-12-14 09:49:45 -08:00
Brian Behlendorf	4b70290163	Check for strlcat and strlcpy This partially reverts commit `8005ca4` by moving the strlcat() and strlcpy() compatibility implementations back to their original location. In addition, these two functions were added to the AC_CHECK_FUNCS macro. When these functions are available from the C library, HAVE_STRLCAT and HAVE_STRLCPY will be defined and library version used. Otherwise the compatibility version is built. Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8157 Closes #8202	2018-12-11 16:01:41 -08:00
LOLi	bdbd5477bc	Fix ASSERT in zfs_receive_one() This commit fixes the following ASSERT in zfs_receive_one() when receiving a send stream from a root dataset with the "-e" option: $ sudo zfs snap source@snap $ sudo zfs send source@snap \| sudo zfs recv -e destination/recv chopprefix > drrb->drr_toname ASSERT at libzfs_sendrecv.c:3804:zfs_receive_one() Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8121	2018-12-04 09:38:55 -08:00
Brian Behlendorf	7c9a42921e	Detect IO errors during device removal * Detect IO errors during device removal While device removal cannot verify the checksums of individual blocks during device removal, it can reasonably detect hard IO errors from the leaf vdevs. Failure to perform this error checking can result in device removal completing successfully, but moving no data which will permanently corrupt the pool. Situation 1: faulted/degraded vdevs In the configuration shown below, the removal of mirror-0 will permanently corrupt the pool. Device removal will preferentially copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'. Which in this case will result in nothing being copied since one vdev in each of those groups in unavailable. However, device removal will complete successfully since all IO errors are ignored. tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 /var/tmp/vdev1 FAULTED 0 0 0 external fault /var/tmp/vdev2 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 /var/tmp/vdev3 ONLINE 0 0 0 /var/tmp/vdev4 FAULTED 0 0 0 external fault This issue is resolved by updating the source child selection logic to exclude unreadable leaf vdevs. Additionally, unwritable destination child vdevs which can never succeed are skipped to prevent generating a large number of write IO errors. Situation 2: individual hard IO errors During removal if an unexpected hard IO error is encountered when either reading or writing the child vdev the entire removal operation is cancelled. While it may be possible to reconstruct the data after removal that cannot be guaranteed. The only strictly safe thing to do is to cancel the removal. As a future improvement we may want to instead suspend the removal process and allow the damaged region to be retried. But that work is left for another time, hard IO errors during the removal process are expected to be exceptionally rare. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #6900 Closes #8161	2018-12-04 09:37:37 -08:00
Tom Caputi	cef48f14da	Remove races from scrub / resilver tests Currently, several tests in the ZFS Test Suite that attempt to test scrub and resilver behavior occasionally fail. A big reason for this is that these tests use a combination of zinject and zfs_scan_vdev_limit to attempt to slow these operations enough to verify their test commands. This method works most of the time, but provides no guarantees and leads to flaky behavior. This patch adds a new tunable, zfs_scan_suspend_progress, that ensures that scans make no progress, guaranteeing that tests can be run without racing. This patch also changes zfs_remove_max_bytes_pause to match this new tunable. This provides some consistency between these two similar tunables and ensures that the tunable will not misbehave on 32-bit systems. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8111	2018-11-28 10:12:08 -08:00
LOLi	00369f3338	ZTS: fix "not found" errors This commit fixes several "not found" errors caused by calling undefined or incorrect shell functions in the following ZFS Test Suite groups: * alloc_class * channel_program/lua_core * channel_program/synctask_core * cli_root/zpool_import * cli_user/misc Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: bunder2015 <omfgbunder@gmail.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8152	2018-11-27 09:39:37 -08:00
Brian Behlendorf	8005ca4f74	Move strlcat, strlcpy, and strnlen Move strlcat() and strlcpy() from .c source files in to the libspl string.h header. By changing these compatibility functions to static inline functions they can included as needed without requiring linking with the libspl.so library. Remove strnlen() which is barely used in the source, and has been provided by glibc since v2.10. Finally, convert four instances of strncpy() to strlcpy() in libzfs_input_check.c which were causing build warnings when compiling with gcc 8.2.1. For example: libzfs_input_check.c: In function ‘zfs_destroy’: libzfs_input_check.c:651:9: error: ‘strncpy’ specified bound \ 4096 equals destination size [-Werror=stringop-truncation] (void) strncpy(zc.zc_name, dataset, sizeof (zc.zc_name)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8116	2018-11-20 10:37:49 -08:00
LOLi	0cd5c941d0	zpool: allow split with whole-disk devices This change allows 'zpool split' to work with whole-disk devices and updates the ZFS Test Suite with a new script to exercise this functionality. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6643 Closes #8133	2018-11-20 10:22:53 -08:00
John Wren Kennedy	70621ff20e	ZTS: Fix parsing of zpool status in checksum test filetest_001_pos consumes the output using read -r, assigning each field to a variable. The problem comes when a vdev is marked degraded, which appends extra fields to the line. This causes the trailing text to be treated as part of the `cksum` variable. Using awk instead of read -r allows us to extract the checksum error count from the output whether the vdev is degraded or not. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com> Closes #8136	2018-11-20 09:51:42 -08:00
LOLi	ebb8735901	ZTS: "checksum" test group needs "lscpu" This change adds "lscpu" to the list of commands used by the ZFS Test Suite: this is required by the "checksum" test group to read the CPU frequency which is used in EdonR, Skein and SHA2 performance tests. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8139	2018-11-20 09:47:58 -08:00
Sebastien Roy	a10d50f999	OpenZFS 8115 - parallel zfs mount Porting Notes: * Use thread pools (tpool) API instead of introducing taskq interfaces to libzfs. * Use pthread_mutext for locks as mutex_t isn't available. * Ignore alternative libshare initialization since OpenZFS-7955 is not present on zfsonlinux. Authored by: Sebastien Roy <seb@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Authored by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Matt Ahrens <mahrens@delphix.com> Ported-by: Don Brady <don.brady@delphix.com> OpenZFS-issue: https://www.illumos.org/issues/8115 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3f0e2b569 Closes #8092	2018-11-15 11:33:58 -08:00
loli10K	d48091de81	zed: detect and offline physically removed devices This commit adds a new test case to the ZFS Test Suite to verify ZED can detect when a device is physically removed from a running system: the device will be offlined if a spare is not available in the pool. We implement this by using the existing libudev functionality and without relying solely on the FM kernel module capabilities which have been observed to be unreliable with some kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #1537 Closes #7926	2018-11-09 11:17:24 -08:00
Tony Hutter	ad796b8a3b	Add zpool status -s (slow I/Os) and -p (parseable) This patch adds a new slow I/Os (-s) column to zpool status to show the number of VDEV slow I/Os. This is the number of I/Os that didn't complete in zio_slow_io_ms milliseconds. It also adds a new parsable (-p) flag to display exact values. NAME STATE READ WRITE CKSUM SLOW testpool ONLINE 0 0 0 - mirror-0 ONLINE 0 0 0 - loop0 ONLINE 0 0 0 20 loop1 ONLINE 0 0 0 0 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7756 Closes #6885	2018-11-08 16:47:24 -08:00
George Melikov	877d925a9e	Update zfs_admin_snapshot value (disabled) It's disabled by default, update code and tests to reflect the documentation. Minor cleanup in delegate_common.kshlib. Reviewed-by: Gregor Kopka <gregor@kopka.net> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #7835 Closes #8045	2018-11-08 16:17:12 -08:00
Tom Caputi	d8244d34bd	ZTS: Fix and reenable zfs_rename tests zfs_rename_006_pos has been flaky in the past because it was missing a call to block_device_wait to ensure the zvols it creates are present before running dd. Whenever this this happened, zfs_rename_009_neg would also fail because the first test would leak a zvol clone that it did not know how to clean up. This patch fixes the root cause and reenables the test. It also fixes some minor grammar errors. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #5647 Closes #5648 Closes #8088	2018-11-07 16:59:27 -08:00
Paul Zuchowski	c2bcfa71f4	ZTS: Fix test zfs_mount_006_pos For Linux, place a file in the mount point folder so it will be considered "busy". Fix the while loop so it doesn't rm in directories above the testdir. Add Linux-specific code to test overlay on\|off. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #4990 Closes #8081	2018-11-07 16:54:08 -08:00
Paul Zuchowski	04a88fc00c	ZTS: Fix posix ACL tests that should pass Make sure tests have proper include files. Make sure underlying "chmod" style permissions don't interfere with ACLs. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #8069	2018-10-31 18:58:43 -05:00
George Melikov	58aeb87a8f	ZTS: change `$(cat)` to `$(<)` for speedup It's better to use ksh/bash built in methods, rather than spawn new processes every time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #8071	2018-10-31 12:00:06 -05:00
Serapheim Dimitropoulos	0a544c174d	zdb -k does not work on Linux when used with -e This minor bug was introduced with the port of the feature from OpenZFS to ZoL. This patch fixes the issue that was caused by a minor re-ordering from the original code. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #8001	2018-10-30 11:46:18 -05:00
Brian Behlendorf	bea7578356	ZTS: Fix auto_replace_001_pos test The root cause of these failures is that udev can notify the ZED of newly created partition before its links are created. Handle this by allowing an auto-replace to briefly wait until udev confirms the links exist. Distill this test case down to its essentials so it can be run reliably. What we need to check is that: 1) A new disk, in the same physical location, is automatically brought online when added to the system, 2) It completes the replacement process, and 3) The pool is now ONLINE and healthy. There is no need to remove the scsi_debug module. After exporting the pool the disk can be zeroed, removed, and then re-added to the system as a new disk. Reviewed by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8051	2018-10-29 15:05:14 -05:00
Brian Behlendorf	3449243b6d	ZTS: Update project quota tests e2fsprogs v1.44.1, which provides lsattr, added a new attribute for ext3 called "verity". It is reported after the project quota flag as a 'V' character in the `lsattr` output. Update projectid_001_pos.ksh and projecttree_001_pos.ksh to use a pattern which will match the expected output in both cases. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8043	2018-10-23 19:53:14 -07:00
Tom Caputi	80a91e7469	Defer new resilvers until the current one ends Currently, if a resilver is triggered for any reason while an existing one is running, zfs will immediately restart the existing resilver from the beginning to include the new drive. This causes problems for system administrators when a drive fails while another is already resilvering. In this case, the optimal thing to do to reduce risk of data loss is to wait for the current resilver to end before immediately replacing the second failed drive, which allows the system to operate with two incomplete drives for the minimum amount of time. This patch introduces the resilver_defer feature that essentially does this for the admin without forcing them to wait and monitor the resilver manually. The change requires an on-disk feature since we must mark drives that are part of a deferred resilver in the vdev config to ensure that we do not assume they are done resilvering when an existing resilver completes. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: @mmaybee Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7732	2018-10-18 21:06:18 -07:00
LOLi	2e55034471	zpool: allow sharing of spare device among pools ZFS allows, by default, sharing of spare devices among different pools; this commit simply restores this functionality for disk devices and adds an additional tests case to the ZFS Test Suite to prevent future regression. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7999	2018-10-17 11:21:07 -07:00
ilbsmart	779a6c0bf6	deadlock between mm_sem and tx assign in zfs_write() and page fault The bug time sequence: 1. thread #1, `zfs_write` assign a txg "n". 2. In a same process, thread #2, mmap page fault (which means the `mm_sem` is hold) occurred, `zfs_dirty_inode` open a txg failed, and wait previous txg "n" completed. 3. thread #1 call `uiomove` to write, however page fault is occurred in `uiomove`, which means it need `mm_sem`, but `mm_sem` is hold by thread #2, so it stuck and can't complete, then txg "n" will not complete. So thread #1 and thread #2 are deadlocked. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Grady Wong <grady.w@xtaotech.com> Closes #7939	2018-10-16 11:11:24 -07:00
Alek P	50a343d85c	Fix changelist mounted-dataset iteration Commit `0c6d093` caused a regression in the inherit codepath. The fix is to restrict the changelist iteration on mountpoints and add proper handling for 'legacy' mountpoints Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7988 Closes #7991	2018-10-10 21:13:13 -07:00
Tony Hutter	2ef0f8c329	Print "(repairing)" in zpool status again Historically, zpool status prints "(repairing)" for any drives that have errors during a scrub: NAME STATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/file1 ONLINE 13 0 0 (repairing) /tmp/file2 ONLINE 0 0 0 /tmp/file3 ONLINE 0 0 0 This was accidentally broken in "OpenZFS 9166 - zfs storage pool checkpoint" (`d2734cc`). This patch adds it back in. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7779 Closes #7978	2018-10-09 20:30:32 -07:00
Prakash Surya	54eb2c410e	Verify 'zfs destroy' will unshare the dataset This change adds a new test case to the zfs-test suite to verify that when 'zfs destroy' is used on a shared dataset, the dataset will be unshared after the destroy operation completes. Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Closes #7941	2018-10-03 10:17:58 -07:00
Prakash Surya	1bf490ba93	Fix "zfs destroy" when "sharenfs=on" is used When using "zfs destroy" on a dataset that is using "sharenfs=on" and has been automatically exported (by libzfs), the dataset will not be automatically unexported as it should be. This workflow appears to have been broken by this commit: `3fd3e56cfd` In that change, the "zfs_unmount" function was modified to use the "mnt.mnt_special" field when determining the mount point that is being unmounted, rather than "mnt.mnt_mountp". As a result, when "mntpt" is passed into "zfs_unshare_proto", it's value is now the dataset name rather than the mountpoint. Thus, when this value is used with the "is_shared" function (via "zfs_unshare_proto") it will not find a match (since that function assumes it'll be passed the mountpoint) and incorrectly reports that the dataset is not shared. This can be easily reproduced with the following commands: $ sudo zpool create tank xvdb $ sudo zfs create -o sharenfs=on tank/fish $ sudo zfs destroy tank/fish $ sudo zfs list -r tank NAME USED AVAIL REFER MOUNTPOINT tank 97.5K 7.27G 24K /tank $ sudo exportfs /tank/fish <world> $ sudo cat /etc/dfs/sharetab /tank/fish - nfs rw,crossmnt At this point, the "tank/fish" filesystem doesn't exist, but it's still listed as exported when looking at "exportfs" and "/etc/dfs/sharetab". Also note, this change brings us back in-sync with the illumos code, as it pertains to this one line; on illumos, "mnt.mnt_mountp" is used. Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Issue #6143 Closes #7941	2018-10-03 10:17:58 -07:00
Alek P	0c6d09361d	changelist should be able to iter on mounts Modified changelist_gather()ing for the mountpoint property. Now instead of iterating on all dataset descendants, we read /proc/self/mounts and iterate on the mounted descendant datasets only. Switched changelist implementation from a uu_list_* to uu_avl_* in order to reduce changlist code-path's worst case time complexity. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7967	2018-10-02 12:30:58 -07:00
Brian Behlendorf	838bd5ff35	ZTS: Fix snapshot_009_pos, snapshot_010_pos Mitigate the likelihood of the newly created volumes being busy when the 'zfs destroy -r' is issued by waiting for udev to settle. Since this is not a iron clad fix I've added the test case to the known list of possible failures and referenced issue #7961. Finally, in the case this test does fail fix the cleanup logic so subsequent tests won't incorrectly fail. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7961 Closes #7962	2018-10-01 17:15:57 -07:00
John Gallagher	d12614521a	Fixes for procfs files backed by linked lists There are some issues with the way the seq_file interface is implemented for kstats backed by linked lists (zfs_dbgmsgs and certain per-pool debugging info): * We don't account for the fact that seq_file sometimes visits a node multiple times, which results in missing messages when read through procfs. * We don't keep separate state for each reader of a file, so concurrent readers will receive incorrect results. * We don't account for the fact that entries may have been removed from the list between read syscalls, so reading from these files in procfs can cause the system to crash. This change fixes these issues and adds procfs_list, a wrapper around a linked list which abstracts away the details of implementing the seq_file interface for a list and exposing the contents of the list through procfs. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Gallagher <john.gallagher@delphix.com> External-issue: LX-1211 Closes #7819	2018-09-26 11:08:12 -07:00
LOLi	36e369ecb8	ZTS: Fix removal_resume_export This change simplify the test case removing part of the logic which was introducing a race condition and thus causing spurious failures: we use attempt_during_removal() from removal.kshlib instead which has been observed to be more stable. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7894 Closes #7913	2018-09-24 12:58:16 -07:00
LOLi	5140a58f3b	zpool should detect invalid fs property on create This change improve the handling of invalid filesystem properties when specified at pool creation: this is useful when 'zpool create -n' (dry run) is executed to detect invalid fs-level options (-O) before the actual command is run. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7620 Closes #7878	2018-09-13 13:37:42 -07:00
Brian Behlendorf	92b432139d	Add removal_resume_export to zts-report.py Add the removal_resume_export test case to the possible failure section of the zts-report.py and reference the Github issue. In the CI environment this test has proven to be unreliable due to the way it detects the removal thread. This is a flaw in the test and not device removal so update the result summary accordingly. Additionally, increase the allowed timeout in an effort to reduce the observed rate of false positves. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7895 Issue #7894	2018-09-13 13:35:09 -07:00
Roman Strashkin	733b5722b4	zpool split can create a corrupted pool Added vdev_resilver_needed() check to verify VDEVs are fully synced, so that after split the new pool will not be corrupted. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Roman Strashkin <roman.strashkin@nexenta.com> Closes #7865 Closes #7881	2018-09-12 18:14:42 -07:00
Don Brady	cc99f275a2	Pool allocation classes Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Håkan Johansson <f96hajo@chalmers.se> Reviewed-by: Andreas Dilger <andreas.dilger@chamcloud.com> Reviewed-by: DHE <git@dehacked.net> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Gregor Kopka <gregor@kopka.net> Reviewed-by: Kash Pande <kash@tripleback.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #5182	2018-09-05 18:33:36 -07:00
Don Brady	b83a0e2dc1	Add basic zfs ioc input nvpair validation We want newer versions of libzfs_core to run against an existing zfs kernel module (i.e. a deferred reboot or module reload after an update). Programmatically document, via a zfs_ioc_key_t, the valid arguments for the ioc commands that rely on nvpair input arguments (i.e. non legacy commands from libzfs_core). Automatically verify the expected pairs before dispatching a command. This initial phase focuses on the non-legacy ioctls. A follow-on change can address the legacy ioctl input from the zfs_cmd_t. The zfs_ioc_key_t for zfs_keys_channel_program looks like: static const zfs_ioc_key_t zfs_keys_channel_program[] = { {"program", DATA_TYPE_STRING, 0}, {"arg", DATA_TYPE_UNKNOWN, 0}, {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL}, {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, }; Introduce four input errors to identify specific input failures (in addition to generic argument value errors like EINVAL, ERANGE, EBADF, and E2BIG). ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #7780	2018-09-02 12:14:01 -07:00
Don Brady	e8bcb693d6	Add zfs module feature and property info to sysfs This extends our sysfs '/sys/module/zfs' entry to include feature and property attributes. The primary consumer of this information is user processes, like the zfs CLI, that need to know what the current loaded ZFS module supports. The libzfs binary will consult this information when instantiating the zfs and zpool property tables and the pool features table. This introduces 4 kernel objects (dirs) into '/sys/module/zfs' with corresponding attributes (files): features.runtime features.pool properties.dataset properties.pool Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #7706	2018-09-02 12:09:53 -07:00
Brian Behlendorf	bb91178e60	ZTS: Fix EBUSY volume destroy failures It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup functions to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. This was done not only for volumes but also for file systems for consistency. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7854	2018-08-31 15:30:44 -07:00
bunder2015	9e7fb6c171	ZTS: pool_checkpoint path cleanup Removing hardcoded paths in pool_checkpoint.kshlib Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7840	2018-08-30 14:45:16 -07:00
Richard Elling	6fa1e1e73a	ZTS: Fix DEV_DSKDIR trim from disk Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #7848	2018-08-30 14:43:37 -07:00
bunder2015	de61daa597	ZTS: zvol_swap_003 path cleanup Removing hardcoded paths in zvol_swap_003 Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7839	2018-08-30 13:53:06 -07:00
bernie1995	0fe7c953b3	ZTS: path cleanup Removing hardcoded paths in many scripts. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bernie1995 <bernie.pikes@gmail.com> Issue #7507 Closes #7843	2018-08-30 13:46:55 -07:00
Brian Behlendorf	6c6949acae	ZTS: Fix zfs_create_013_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7847	2018-08-30 13:38:09 -07:00
Tom Caputi	c3bd3fb4ac	OpenZFS 9403 - assertion failed in arc_buf_destroy() Assertion failed in arc_buf_destroy() when concurrently reading block with checksum error. Porting notes: * The ability to zinject decompression errors has been added, but this only works at the zio_decompress() level, where we have all of the info we need to match against the user's zinject options. * The decompress_fault test has been added to test the new zinject functionality * We attempted to set zio_decompress_fail_fraction to (1 << 18) in ztest for further test coverage. Although this did uncover a few low priority issues, this unfortuantely also causes ztest to ASSERT in many locations where the code is working correctly since it is designed to fail on IO errors. Developers can manually set this variable with the '-o' option to find and debug issues. Authored by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Matt Ahrens <mahrens@delphix.com> Ported-by: Tom Caputi <tcaputi@datto.com> OpenZFS-issue: https://illumos.org/issues/9403 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9 Closes #7822	2018-08-29 11:33:33 -07:00
Tom Caputi	47ab01a18f	Always wait for txg sync when umounting dataset Currently, when unmounting a filesystem, ZFS will only wait for a txg sync if the dataset is dirty and not readonly. However, this can be problematic in cases where a dataset is remounted readonly immediately before being unmounted, which often happens when the system is being shut down. Since encrypted datasets require that all I/O is completed before the dataset is disowned, this issue causes problems when write I/Os leak into the txgs after the dataset is disowned, which can happen when sync=disabled. While looking into fixes for this issue, it was discovered that dsl_dataset_is_dirty() does not return B_TRUE when the dataset has been removed from the txg dirty datasets list, but has not actually been processed yet. Furthermore, the implementation is comletely different from dmu_objset_is_dirty(), adding to the confusion. Rather than relying on this function, this patch forces the umount code path (and the remount readonly code path) to always perform a txg sync on read-write datasets and removes the function altogether. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7753 Closes #7795	2018-08-27 10:16:28 -07:00
Brian Behlendorf	a584ef2605	Direct IO support Direct IO via the O_DIRECT flag was originally introduced in XFS by IRIX for database workloads. Its purpose was to allow the database to bypass the page and buffer caches to prevent unnecessary IO operations (e.g. readahead) while preventing contention for system memory between the database and kernel caches. On Illumos, there is a library function called directio(3C) that allows user space to provide a hint to the file system that Direct IO is useful, but the file system is free to ignore it. The semantics are also entirely a file system decision. Those that do not implement it return ENOTTY. Since the semantics were never defined in any standard, O_DIRECT is implemented such that it conforms to the behavior described in the Linux open(2) man page as follows. 1. Minimize cache effects of the I/O. By design the ARC is already scan-resistant which helps mitigate the need for special O_DIRECT handling. Data which is only accessed once will be the first to be evicted from the cache. This behavior is in consistent with Illumos and FreeBSD. Future performance work may wish to investigate the benefits of immediately evicting data from the cache which has been read or written with the O_DIRECT flag. Functionally this behavior is very similar to applying the 'primarycache=metadata' property per open file. 2. O_DIRECT _MAY_ impose restrictions on IO alignment and length. No additional alignment or length restrictions are imposed. 3. O_DIRECT _MAY_ perform unbuffered IO operations directly between user memory and block device. No unbuffered IO operations are currently supported. In order to support features such as transparent compression, encryption, and checksumming a copy must be made to transform the data. 4. O_DIRECT _MAY_ imply O_DSYNC (XFS). O_DIRECT does not imply O_DSYNC for ZFS. Callers must provide O_DSYNC to request synchronous semantics. 5. O_DIRECT _MAY_ disable file locking that serializes IO operations. Applications should avoid mixing O_DIRECT and normal IO or mmap(2) IO to the same file. This is particularly true for overlapping regions. All I/O in ZFS is locked for correctness and this locking is not disabled by O_DIRECT. However, concurrently mixing O_DIRECT, mmap(2), and normal I/O on the same file is not recommended. This change is implemented by layering the aops->direct_IO operations on the existing AIO operations. Code already existed in ZFS on Linux for bypassing the page cache when O_DIRECT is specified. References: * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html * https://blogs.oracle.com/roch/entry/zfs_and_directio * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics * https://illumos.org/man/3c/directio Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #224 Closes #7823	2018-08-27 10:04:21 -07:00
Joao Carlos Mendes Luis	5d6ad2442b	Fedora 28: Fix misc bounds check compiler warnings Fix a bunch of truncation compiler warnings that show up on Fedora 28 (GCC 8.0.1). Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7368 Closes #7826 Closes #7830	2018-08-26 12:55:44 -07:00
LOLi	c434d8806c	Stack overflow when destroying deeply nested clones Destroy operations on deeply nested chains of clones can overflow the stack: Depth Size Location (221 entries) ----- ---- -------- 0) 15664 48 mutex_lock+0x5/0x30 1) 15616 8 mutex_lock+0x5/0x30 ... 26) 13576 72 dsl_dataset_remove_clones_key.isra.4+0x124/0x1e0 [zfs] 27) 13504 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 28) 13432 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] ... 185) 2128 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 186) 2056 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 187) 1984 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 188) 1912 136 dsl_destroy_snapshot_sync_impl+0x4e0/0x1090 [zfs] 189) 1776 16 dsl_destroy_snapshot_check+0x0/0x90 [zfs] ... 218) 304 128 kthread+0xdf/0x100 219) 176 48 ret_from_fork+0x22/0x40 220) 128 128 kthread+0x0/0x100 Fix this issue by converting dsl_dataset_remove_clones_key() from recursive to iterative. Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7279 Closes #7810	2018-08-22 11:03:31 -07:00
Tom Caputi	149ce888bb	Fix issues with raw receive_write_byref() This patch fixes 2 issues with raw, deduplicated send streams. The first is that datasets who had been completely received earlier in the stream were not still marked as raw receives. This caused problems when newly received datasets attempted to fetch raw data from these datasets without this flag set. The second problem was that the arc freeze checksum code was not consistent about which locks needed to be held while performing its asserts. The proper locking needed to run these asserts is actually fairly nuanced, since the asserts touch the linked list of buffers (requiring the header lock), the arc_state (requiring the b_evict_lock), and the b_freeze_cksum (requiring the b_freeze_lock). This seems like a large performance sacrifice and a lot of unneeded complexity to verify that this relatively small debug feature is working as intended, so this patch simply removes these asserts instead. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7701	2018-08-20 11:03:56 -07:00
Olaf Faaland	34fe773e30	Skip import activity test in more zdb code paths Since zdb opens the pools read-only, it cannot damage the pool in the event the pool is already imported either on the same host or on another one. If the pool vdev structure is changing while zdb is importing the pool, it may cause zdb to crash. However this is unlikely, and in any case it's a user space process and can simply be run again. For this reason, zdb should disable the multihost activity test on import that is normally run. This commit fixes a few zdb code paths where that had been overlooked. It also adds tests to ensure that several common use cases handle this properly in the future. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Gu Zheng <guzheng2331314@163.com> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7797 Closes #7801	2018-08-20 10:05:23 -07:00
Serapheim Dimitropoulos	a448a2557e	Introduce read/write kstats per dataset The following patch introduces a few statistics on reads and writes grouped by dataset. These statistics are implemented as kstats (backed by aggregate sums for performance) and can be retrieved by using the dataset objset ID number. The motivation for this change is to provide some preliminary analytics on dataset usage/performance. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #7705	2018-08-20 09:52:37 -07:00
bunder2015	fa84714abb	ZTS: events path cleanup Removing hardcoded paths in events.cfg Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7805	2018-08-18 21:19:41 -07:00
bunder2015	80d45e089c	ZTS: largest_pool_001 path cleanup Removing hardcoded paths in largest_pool_001 Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7804	2018-08-18 21:18:31 -07:00
bunder2015	5468ee7a2f	ZTS: privilege group path cleanup Removing hardcoded paths in privilege group tests Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7803	2018-08-18 21:17:22 -07:00
Brian Behlendorf	089b16f48d	ZTS: Fix import_cache_device_replaced Allow the 'zpool replace' to run slowly without overwhelming the vdev queues by setting zfs_scan_vdev_limit=128k. This limits the number of concurrent slow IOs which need to be handled. The net effect is the test case runs approximately 3x faster putting it well under the 10 minute per-test time limit. Rename import_cache* test cases to imprt_cachefile*. Originally these were renamed due to a maximum tar name limit, this limit was removed by commit `1dfde3d9b`. Replaced instances of /var/tmp in zpool_import.cfg with $TEST_BASE_DIR. Reviewed-by: bunder2015 <omfgbunder@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7765 Closes #7802	2018-08-18 21:16:12 -07:00
Brian Behlendorf	802715b74a	ZTS: Fix reservation_001_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7796	2018-08-17 10:01:47 -07:00
Brian Behlendorf	1dfde3d9b2	Use posix format for dist tarballs Traditionally Automake has defaulted to the V7 tar format when creating tarballs for distributions. One of the many limitions of this format is a 99 character maximum path + file name limit. This can cause problems when adding new test cases to the ZTS due to the depth of the sub-tree and descriptive test names. This change switches the build system to the posix (aliased as pax) tar format which conforms to the POSIX.1-2001 specification. This format does not suffer from the V7 limitations, was designed to be compatible, and will become the default format in future versions of GNU tar. https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html As part of this change the blockfiles directories which were originally removed due to this limit have been readded. Reviewed by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7767	2018-08-15 09:52:28 -07:00
Tom Caputi	d9c460a0b6	Added encryption support for zfs recv -o / -x One small integration that was absent from b52563 was support for zfs recv -o / -x with regards to encryption parameters. The main use cases of this are as follows: * Receiving an unencrypted stream as encrypted without needing to create a "dummy" encrypted parent so that encryption can be inheritted. * Allowing users to change their keylocation on receive, so long as the receiving dataset is an encryption root. * Allowing users to explicitly exclude or override the encryption property from an unencrypted properties stream, allowing it to be received as encrypted. * Receiving a recursive heirarchy of unencrypted datasets, encrypting the top-level one and forcing all children to inherit the encryption. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7650	2018-08-15 09:48:49 -07:00
bunder2015	64e96969a8	ZTS: delegate group path cleanup Removing hardcoded paths in delegate group tests Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7778	2018-08-13 08:24:02 -07:00
bunder2015	d02fa81c78	ZTS: acl group path cleanup Removing hardcoded paths in acl group tests Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7777	2018-08-13 08:22:41 -07:00
bunder2015	604016054e	ZTS: inuse_004 path cleanup Removing hardcoded path in inuse_004 Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7775	2018-08-13 08:16:55 -07:00
bunder2015	5fba582887	ZTS: projectquota_002 path cleanup Removing hardcoded path in projectquota_002 Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7774	2018-08-13 08:15:53 -07:00
Brian Behlendorf	94b197a0a5	ZTS: Test case reliability * Both cli_root/zpool_import/import_cache_device_replaced, and redundancy/redundancy_004_neg have been observed to fail for spurious reasons ~1% of the time. Add them to the exception list and reference the open Github issue. * Speed up replacement/replacement_001_pos to prevent it from exceeding the 10 minute per test limit and getting KILLED. File vdev creation switched to truncate -s, redundant raidz1 testing pass dropped, fixed some minor formating issues. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7766	2018-08-12 09:38:53 -07:00
LOLi	c8c308362c	Allow inherited properties in zfs_check_settable() This change modifies how 'checksum' and 'dedup' properties are verified in zfs_check_settable() handling the case where they are explicitly inherited in the dataset hierarchy when receiving a recursive send stream. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7755 Closes #7576 Closes #7757	2018-08-03 14:56:25 -07:00
Brian Behlendorf	6da0998f59	ZTS: Fix zfs_create_007_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7763	2018-08-03 10:21:50 -07:00
Paul Dagnelie	492f64e941	OpenZFS 9112 - Improve allocation performance on high-end systems Overview ======== We parallelize the allocation process by creating the concept of "allocators". There are a certain number of allocators per metaslab group, defined by the value of a tunable at pool open time. Each allocator for a given metaslab group has up to 2 active metaslabs; one "primary", and one "secondary". The primary and secondary weight mean the same thing they did in in the pre-allocator world; primary metaslabs are used for most allocations, secondary metaslabs are used for ditto blocks being allocated in the same metaslab group. There is also the CLAIM weight, which has been separated out from the other weights, but that is less important to understanding the patch. The active metaslabs for each allocator are moved from their normal place in the metaslab tree for the group to the back of the tree. This way, they will not be selected for use by other allocators searching for new metaslabs unless all the passive metaslabs are unsuitable for allocations. If that does happen, the allocators will "steal" from each other to ensure that IOs don't fail until there is truly no space left to perform allocations. In addition, the alloc queue for each metaslab group has been broken into a separate queue for each allocator. We don't want to dramatically increase the number of inflight IOs on low-end systems, because it can significantly increase txg times. On the other hand, we want to ensure that there are enough IOs for each allocator to allow for good coalescing before sending the IOs to the disk. As a result, we take a compromise path; each allocator's alloc queue max depth starts at a certain value for every txg. Every time an IO completes, we increase the max depth. This should hopefully provide a good balance between the two failure modes, while not dramatically increasing complexity. We also parallelize the spa_alloc_tree and spa_alloc_lock, which cause very similar contention when selecting IOs to allocate. This parallelization uses the same allocator scheme as metaslab selection. Performance Results =================== Performance improvements from this change can vary significantly based on the number of CPUs in the system, whether or not the system has a NUMA architecture, the speed of the drives, the values for the various tunables, and the workload being performed. For an fio async sequential write workload on a 24 core NUMA system with 256 GB of RAM and 8 128 GB SSDs, there is a roughly 25% performance improvement. Future Work =========== Analysis of the performance of the system with this patch applied shows that a significant new bottleneck is the vdev disk queues, which also need to be parallelized. Prototyping of this change has occurred, and there was a performance improvement, but more work needs to be done before its stability has been verified and it is ready to be upstreamed. Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Alexander Motin <mav@FreeBSD.org> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Ported-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Porting Notes: * Fix reservation test failures by increasing tolerance. OpenZFS-issue: https://illumos.org/issues/9112 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3f3cc3c3 Closes #7682	2018-07-31 10:52:33 -07:00
Paul Dagnelie	21d48b5eac	OpenZFS 9438 - Holes can lose birth time info if a block has a mix of birth times As reported by https://github.com/zfsonlinux/zfs/issues/4996, there is yet another hole birth issue. In this one, if a block is entirely holes, but the birth times are not all the same, we lose that information by creating one hole with the current txg as its birth time. The ZoL PR's fix approach is incorrect. Ultimately, the problem here is that when you truncate and write a file in the same transaction group, the dbuf for the indirect block will be zeroed out to deal with the truncation, and then written for the write. During this process, we will lose hole birth time information for any holes in the range. In the case where a dnode is being freed, we need to determine whether the block should be converted to a higher-level hole in the zio pipeline, and if so do it when the dnode is being synced out. Porting Notes: * The DMU_OBJECT_END change in zfs_znode.c was already applied. * Added test cases from #5675 provided by @rincebrain for hole_birth issues. These test cases should be pushed upstream to OpenZFS. * Updated mk_files which is used by several rsend tests so the files created are a little more interesting and may contain holes. Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9438 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/738e2a3c External-issue: DLPX-46861 Closes #7746	2018-07-30 09:27:49 -07:00
Brian Behlendorf	b719768e35	ZTS: Fix reservation_017_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7750	2018-07-30 09:23:45 -07:00
Brian Behlendorf	d441e85dd7	Add support for autoexpand property While the autoexpand property may seem like a small feature it depends on a significant amount of system infrastructure. Enough of that infrastructure is now in place that with a few modifications for Linux it can be supported. Auto-expand works as follows; when a block device is modified (re-sized, closed after being open r/w, etc) a change uevent is generated for udev. The ZED, which is monitoring udev events, passes the change event along to zfs_deliver_dle() if the disk or partition contains a zfs_member as identified by blkid. From here the device is matched against all imported pool vdevs using the vdev_guid which was read from the label by blkid. If a match is found the ZED reopens the pool vdev. This re-opening is important because it allows the vdev to be briefly closed so the disk partition table can be re-read. Otherwise, it wouldn't be possible to report the maximum possible expansion size. Finally, if the property autoexpand=on a vdev expansion will be attempted. After performing some sanity checks on the disk to verify that it is safe to expand, the primary partition (-part1) will be expanded and the partition table updated. The partition is then re-opened (again) to detect the updated size which allows the new capacity to be used. In order to make all of the above possible the following changes were required: * Updated the zpool_expand_001_pos and zpool_expand_003_pos tests. These tests now create a pool which is layered on a loopback, scsi_debug, and file vdev. This allows for testing of non- partitioned block device (loopback), a partition block device (scsi_debug), and a file which does not receive udev change events. This provided for better test coverage, and by removing the layering on ZFS volumes there issues surrounding layering one pool on another are avoided. * zpool_find_vdev_by_physpath() updated to accept a vdev guid. This allows for matching by guid rather than path which is a more reliable way for the ZED to reference a vdev. * Fixed zfs_zevent_wait() signal handling which could result in the ZED spinning when a signal was not handled. * Removed vdev_disk_rrpart() functionality which can be abandoned in favor of kernel provided blkdev_reread_part() function. * Added a rwlock which is held as a writer while a disk is being reopened. This is important to prevent errors from occurring for any configuration related IOs which bypass the SCL_ZIO lock. The zpool_reopen_007_pos.ksh test case was added to verify IO error are never observed when reopening. This is not expected to impact IO performance. Additional fixes which aren't critical but were discovered and resolved in the course of developing this functionality. * Added PHYS_PATH="/dev/zvol/dataset" to the vdev configuration for ZFS volumes. This is as good as a unique physical path, while the volumes are not used in the test cases anymore for other reasons this improvement was included. Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #120 Closes #2437 Closes #5771 Closes #7366 Closes #7582 Closes #7629	2018-07-23 15:40:15 -07:00
Matthew Ahrens	2e5dc449c1	OpenZFS 9337 - zfs get all is slow due to uncached metadata This project's goal is to make read-heavy channel programs and zfs(1m) administrative commands faster by caching all the metadata that they will need in the dbuf layer. This will prevent the data from being evicted, so that any future call to i.e. zfs get all won't have to go to disk (very much). There are two parts: The dbuf_metadata_cache. We identify what to put into the cache based on the object type of each dbuf. Caching objset properties os {version,normalization,utf8only,casesensitivity} in the objset_t. The reason these needed to be cached is that although they are queried frequently, they aren't stored in a dbuf type which we can easily recognize and cache in the dbuf layer; instead, we have to explicitly store them. There's already existing infrastructure for maintaining cached properties in the objset setup code, so I simply used that. Performance Testing: - Disabled kmem_flags - Tuned dbuf_cache_max_bytes very low (128K) - Tuned zfs_arc_max very low (64M) Created test pool with 400 filesystems, and 100 snapshots per filesystem. Later on in testing, added 600 more filesystems (with no snapshots) to make sure scaling didn't look different between snapshots and filesystems. Results: \| Test \| Time (trunk / diff) \| I/Os (trunk / diff) \| +------------------------+---------------------+---------------------+ \| zpool import \| 0:05 / 0:06 \| 12.9k / 12.9k \| \| zfs get all (uncached) \| 1:36 / 0:53 \| 16.7k / 5.7k \| \| zfs get all (cached) \| 1:36 / 0:51 \| 16.0k / 6.0k \| Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> OpenZFS-issue: https://illumos.org/issues/9337 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f Closes #7668	2018-07-12 10:49:27 -07:00
Serapheim Dimitropoulos	a7ed98d8b5	OpenZFS 9330 - stack overflow when creating a deeply nested dataset Datasets that are deeply nested (~100 levels) are impractical. We just put a limit of 50 levels to newly created datasets. Existing datasets should work without a problem. The problem can be seen by attempting to create a dataset using the -p option with many levels: panic[cpu0]/thread=ffffff01cd282c20: BAD TRAP: type=8 (#df Double fault) rp=ffffffff fffffffffbc3aa60 unix:die+100 () fffffffffbc3ab70 unix:trap+157d () ffffff00083d7020 unix:_patch_xrstorq_rbx+196 () ffffff00083d7050 zfs:dbuf_rele+2e () ... ffffff00083d7080 zfs:dsl_dir_close+32 () ffffff00083d70b0 zfs:dsl_dir_evict+30 () ffffff00083d70d0 zfs:dbuf_evict_user+4a () ffffff00083d7100 zfs:dbuf_rele_and_unlock+87 () ffffff00083d7130 zfs:dbuf_rele+2e () ... The block above repeats once per directory in the ... ... create -p command, working towards the root ... ffffff00083db9f0 zfs:dsl_dataset_drop_ref+19 () ffffff00083dba20 zfs:dsl_dataset_rele+42 () ffffff00083dba70 zfs:dmu_objset_prefetch+e4 () ffffff00083dbaa0 zfs:findfunc+23 () ffffff00083dbb80 zfs:dmu_objset_find_spa+38c () ffffff00083dbbc0 zfs:dmu_objset_find+40 () ffffff00083dbc20 zfs:zfs_ioc_snapshot_list_next+4b () ffffff00083dbcc0 zfs:zfsdev_ioctl+347 () ffffff00083dbd00 genunix:cdev_ioctl+45 () ffffff00083dbd40 specfs:spec_ioctl+5a () ffffff00083dbdc0 genunix:fop_ioctl+7b () ffffff00083dbec0 genunix:ioctl+18e () ffffff00083dbf10 unix:brand_sys_sysenter+1c9 () Porting notes: * Added zfs_max_dataset_nesting module option with documentation. * Updated zfs_rename_014_neg.ksh for Linux. * Increase the zfs.sh stack warning to 15K. Enough time has passed that 16K can be reasonably assumed to be the default value. It was increased in the 3.15 kernel released in June of 2014. Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> OpenZFS-issue: https://www.illumos.org/issues/9330 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/757a75a Closes #7681	2018-07-09 13:02:50 -07:00
Brian Behlendorf	66df02497c	ZTS: clean_mirror and scrub_mirror cleanup Remove the dependency on partitionable devices for the clean_mirror and scrub_mirror test cases. This allows for the setup and cleanup of the test cases to be simplified by removing the need for complex partitioning. This change also resolves a issue where the clean_mirror devices were not being properly damaged since the device name was not a full path. The result being loopX files were being left in the top level test_results directory. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7434 Closes #7690	2018-07-09 12:46:14 -07:00
Serapheim Dimitropoulos	4d044c4c1d	OpenZFS 9238 - ZFS Spacemap Encoding V2 Motivation ========== The current space map encoding has the following disadvantages: [1] Assuming 512 sector size each entry can represent at most 16MB for a segment. This makes the encoding very inefficient for large regions of space. [2] As vdev-wide space maps have started to be used by new features (i.e. device removal, zpool checkpoint) we've started imposing limits in the vdevs that can be used with them based on the maximum addressable offset (currently 64PB for a top-level vdev). New encoding ============ The layout can be found at space_map.h and it remains backwards compatible with the old one. The introduced two-word entry format, besides extending the limits imposed by the single-entry layout, also includes a vdev field and some extra padding after its prefix. The extra padding after the prefix should is reserved for future usage (e.g. new prefixes for future encodings or new fields for flags). The new vdev field not only makes the space maps more self-descriptive, but also opens the doors for pool-wide space maps (expected to be used in the log spacemap project). One final important note is that the number of bits used for vdevs is reduced to 24 bits for blkptrs. That was decided as we don't know of any setups that use more than 16M vdevs for the time being and we wanted to fit the vdev field in the space map. In addition that gives us some extra bits in dva_t. Other references: ================= The new encoding is also discussed towards the end of the Log Space Map presentation from 2017's OpenZFS summit. Link: https://www.youtube.com/watch?v=jj2IxRkl5bQ Authored by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-commit: https://github.com/openzfs/openzfs/commit/90a56e6d OpenZFS-issue: https://www.illumos.org/issues/9238 Closes #7665	2018-07-05 12:02:34 -07:00
Ahmed Gahnem	4e82b4be78	OpenZFS 9184 - Add ZFS performance test for fixed blocksize random read/write IO This change introduces a new performance test which does random reads and writes, but instead of using `bssplit` to determine the block size, it uses a fixed blocksize. Additionally, some new IO sizes are added to other tests and timestamp data is recorded with the performance data. Authored by: Ahmed Gahnem <ahmedg@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Ported-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com> Requires-builders: perf OpenZFS-issue: https://www.illumos.org/issues/9184 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/659 External-issue: DLPX-46724 Closes #7660	2018-07-02 13:46:06 -07:00
Brian Behlendorf	e03a41a604	ZTS: Improve enospc tests The enospc_002_pos test case would frequently fail due a command succeeding when it was expected to fail due to lack of space. In order to make this far less likely, files are created across multiple transaction groups in order to consume as many unused blocks as possible. The dependency that the tests run on a partitioned block device has been removed. It's simpler to use sparse files. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7663	2018-06-29 09:40:32 -07:00
Serapheim Dimitropoulos	d2734cce68	OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570	2018-06-26 10:07:42 -07:00
Tim Chase	88eaf610d9	Use "eval" in history_002_pos for log_must Otherwise the output is consumed by the output redirection. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #7570	2018-06-26 09:58:05 -07:00
Serapheim Dimitropoulos	7637ef8d23	OpenZFS 9591 - ms_shift can be incorrectly changed ms_shift can be incorrectly changed changed in MOS config for indirect vdevs that have been historically expanded According to spa_config_update() we expect new vdevs to have vdev_ms_array equal to 0 and then we go ahead and set their metaslab size. The problem is that indirect vdevs also have vdev_ms_array == 0 because their metaslabs are destroyed once their removal is done. As a result, if a vdev was expanded and then removed may have its ms_shift changed if another vdev was added after its removal. Fortunately this behavior does not cause any type of crash or bad behavior in the kernel but it can confuse zdb and anyone doing any kind of analysis of the history of the pools. Authored by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-commit: https://github.com/openzfs/openzfs/pull/651 OpenZFS-issue: https://illumos.org/issues/9591a External-issue: DLPX-58879 Closes #7644	2018-06-21 09:35:26 -07:00
Brian Behlendorf	e4a3297a04	ZTS: Adopt OpenZFS test analysis script Adopt and extend the OpenZFS ZTS results analysis script for use with ZFS on Linux. This allows for automatic analysis of tests which may be skipped for a variety or reasons or which are not entirely reliable. In addition to the list of 'known' failures, which have been updated for ZFS on Linux, there in a new 'maybe' section. This mapping include tests which might be correctly skipped depending on the test environment. This may be because of a missing dependency or lack of required kernel support. This list also includes tests which normally pass but might on occasion fail for a harmless reason. The script was also extended include a reason for why a given test might be skipped or may fail. The reason will be included after the test in the "results other than PASS that are expected" section. For failures it is preferable to set the reason to the GitHub issue number and for skipped tests several generic reasons are available. You may also specify a custom reason if needed. All tests were added back in to the linux.run file even if they are expected to failed. There is value in running tests which may not pass, the expected results for these tests has been encoded in the new analysis script. All tests which were disabled because they ran more slowly on a 32-bit system have been re-enabled. Developers working on 32-bit systems should assess what it reasonable for their environment. The unnecessary dependency on physical block devices was removed for the checksum, grow_pool, and grow_replicas test groups so they are no longer skipped. Updated the filetest_001_pos test case to run properly now that it is enabled and moved the grow tests in to a single directory. Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7638	2018-06-20 14:03:13 -07:00
John Gallagher	917f475fba	Add tunables for channel programs This patch adds tunables for modifying the maximum memory limit and maximum instruction limit that can be specified when running a channel program. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov Reviewed-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: John Gallagher <john.gallagher@delphix.com> External-issue: LX-1085 Closes #7618	2018-06-15 15:10:42 -07:00
John Gallagher	ab24877bd3	ZTS: deletes home directories in /export/home In the cleanup for the privilege tests, an empty variable, empty because the corresponding setup is skipped on Linux, results in /export/home being deleted. This patch adds an assertion that the variable is not empty, and causes the cleanup to be skipped on Linux as well. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: John Gallagher <john.gallagher@delphix.com> External-issue: LX-1099 Closes #7615	2018-06-12 10:42:26 -07:00
bunder2015	5277571260	ZTS: cleanup user_run user_run leaves two files in /tmp, moving them to $TEST_BASE_DIR and adding them to the default cleanup routine. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7614	2018-06-12 10:37:12 -07:00
Antonio Russo	39042f9736	Tunable directory for zfs runtime scripts zpool and zed place scripts in subdirectories of libexecdir. Some distributions locate architecture independent scripts in other locations (e.g. Debian). To avoid these paths getting out of sync, centralize the definitions. Build zfs-test's default.cfg by Makefile. Use the new directory logic building tests/zfs-tests/include/default.cfg.in. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7597	2018-06-07 09:59:59 -07:00
Antonio Russo	7106b23640	Minor documentation, logging, and testing typos This patch collects some minor inconsistencies and typos in the documentation, logging and testing infrastructure. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7608	2018-06-07 09:38:39 -07:00
Alek P	6969afcefd	Always continue recursive destroy after error Currently, during a recursive zfs destroy the first error that is encountered will stop the destruction of the datasets. Errors may happen for a variety of reasons including competing deletions and busy datasets. This patch switches recursive destroy to always do a best-effort recursive dataset destroy. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7574	2018-06-06 10:14:52 -07:00
bunder2015	62841115bc	ZTS: history path cleanup History tests were hard coded to use /tmp and didn't clean up properly after testing. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Issue #7507 Closes #7600	2018-06-06 10:00:22 -07:00
Tony Hutter	f0ed6c7448	Add pool state /proc entry, "SUSPENDED" pools 1. Add a proc entry to display the pool's state: $ cat /proc/spl/kstat/zfs/tank/state ONLINE This is done without using the spa config locks, so it will never hang. 2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7331 Closes #7563	2018-06-06 09:33:54 -07:00
John Wren Kennedy	ab44e511e2	OpenZFS 9245 - zfs-test failures: slog_013_pos and slog_014_pos Test 13 would fail because of attempts to zpool destroy -f a pool that was still busy. Changed those calls to destroy_pool which does a retry loop, and the problem is no longer reproducible. Also removed some non functional code in the test which is why it was originally commented out by placing it after the call to log_pass. Test 14 would fail because sometimes the check for a degraded pool would complete before the pool had changed state. Changed the logic to check in a loop with a timeout and the problem is no longer reproducible. Authored by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Yuri Pankov <yuripv@yuripv.net> Reviewed-by: George Melikov <mail@gmelikov.ru> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: * Re-enabled slog_013_pos.ksh OpenZFS-issue: https://illumos.org/issues/9245 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8f323b5 Closes #7585	2018-06-04 14:55:02 -07:00
Sara Hartse	74d42600d8	zpool reopen should detect expanded devices Update bdev_capacity to have wholedisk vdevs query the size of the underlying block device (correcting for the size of the efi parition and partition alignment) and therefore detect expanded space. Correct vdev_get_stats_ex so that the expandsize is aligned to metaslab size and new space is only reported if it is large enough for a new metaslab. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Wren Kennedy <jwk404@gmail.com> Signed-off-by: sara hartse <sara.hartse@delphix.com> External-issue: LX-165 Closes #7546 Issue #7582	2018-05-31 10:36:37 -07:00
John Wren Kennedy	93491c4bb9	OpenZFS 9082 - Add ZFS performance test targeting ZIL latency This adds a new test to measure ZIL performance. - Adds the ability to induce IO delays with zinject - Adds a new variable (PERF_NTHREADS_PER_FS) to allow fio threads to be distributed to individual file systems as opposed to all IO going to one, as happens elsewhere. - Refactoring of do_fio_run Authored by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: John Wren Kennedy <jwk404@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/9082 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/634 External-issue: DLPX-48625 Closes #7491	2018-05-30 11:59:04 -07:00
Brian Behlendorf	93ce2b4ca5	Update build system and packaging Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_. Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> TEST_ZIMPORT_SKIP="yes" Closes #7556	2018-05-29 16:00:33 -07:00
Tony Nguyen	ba863d0be4	Profiling for perf tests Stack profiling is quite useful and Linux ZFS test suite does not current collect that data. Linux perf is a common tool for this purpose though the perf record data file can be quite large. With this change, Linux ZFS perf tests capture perf record data if perf is installed on the system and PERF_DO_PROFILING environment variable is set. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com> External-issue: LX-971 Closes #7549	2018-05-22 10:51:46 -07:00
Brian Behlendorf	ab6a2b5cd7	ZTS: Improve zpool_scrub_004_pos reliability It's possible for the `zpool attach` portion of this test case to complete before the `zpool scrub` can be issued. Update the test case to force the resilvering phase to take longer. Reviewed-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5444 Closes #7541	2018-05-15 08:58:46 -07:00
Brian Behlendorf	8c64fe0442	ZTS: Update O_TMPFILE support check In CentOS 7.5 the kernel provided a compatibility wrapper to support O_TMPFILE. This results in the test setup script correctly detecting kernel support. But the ZFS module was built without O_TMPFILE support due to the non-standard CentOS kernel interface. Handle this case by updating the setup check to fail either when the kernel or the ZFS module fail to provide support. The reason will be clearly logged in the test results. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7528	2018-05-14 20:36:30 -07:00
Pavel Zakharov	189bd0b670	OpenZFS 9190 - Fix cleanup routine in import_cachefile_device_replaced.ksh Must clear slow-disk zinject injections in test cleanup routine. Otherwise, when this test fails, it causes most subsequent tests to fail. Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://illumos.org/issues/9190 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/762c6b4 Closes #7530	2018-05-14 14:30:28 -04:00
bunder2015	29badadd4e	Fix shebangs on import tests Incorrect shebangs were used when porting. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7523 Closes #7524	2018-05-11 12:37:44 -07:00
bunder2015	670d74b9ce	ZTS: enospc_002 path cleanup Removing hard-coded path used in enospc_002 Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7515	2018-05-08 21:42:58 -07:00
Tim Chase	f3d28f0a59	Streamline the zpool_import tests Don't create an ext4 file system atop $DEV_DISKDIR/$DISK2. There's likely to not be sufficient space for it to succeed. Instead, simply create the vdev files in the directory where it would have been mounted. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7459	2018-05-08 21:40:13 -07:00
Pavel Zakharov	6cb8e5306d	OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery Some work has been done lately to improve the debugability of the ZFS pool load (and import) process. This includes: 7638 Refactor spa_load_impl into several functions 8961 SPA load/import should tell us why it failed 7277 zdb should be able to print zfs_dbgmsg's To iterate on top of that, there's a few changes that were made to make the import process more resilient and crash free. One of the first tasks during the pool load process is to parse a config provided from userland that describes what devices the pool is composed of. A vdev tree is generated from that config, and then all the vdevs are opened. The Meta Object Set (MOS) of the pool is accessed, and several metadata objects that are necessary to load the pool are read. The exact configuration of the pool is also stored inside the MOS. Since the configuration provided from userland is external and might not accurately describe the vdev tree of the pool at the txg that is being loaded, it cannot be relied upon to safely operate the pool. For that reason, the configuration in the MOS is read early on. In the past, the two configurations were compared together and if there was a mismatch then the load process was aborted and an error was returned. The latter was a good way to ensure a pool does not get corrupted, however it made the pool load process needlessly fragile in cases where the vdev configuration changed or the userland configuration was outdated. Since the MOS is stored in 3 copies, the configuration provided by userland doesn't have to be perfect in order to read its contents. Hence, a new approach has been adopted: The pool is first opened with the untrusted userland configuration just so that the real configuration can be read from the MOS. The trusted MOS configuration is then used to generate a new vdev tree and the pool is re-opened. When the pool is opened with an untrusted configuration, writes are disabled to avoid accidentally damaging it. During reads, some sanity checks are performed on block pointers to see if each DVA points to a known vdev; when the configuration is untrusted, instead of panicking the system if those checks fail we simply avoid issuing reads to the invalid DVAs. This new two-step pool load process now allows rewinding pools accross vdev tree changes such as device replacement, addition, etc. Loading a pool from an external config file in a clustering environment also becomes much safer now since the pool will import even if the config is outdated and didn't, for instance, register a recent device addition. With this code in place, it became relatively easy to implement a long-sought-after feature: the ability to import a pool with missing top level (i.e. non-redundant) devices. Note that since this almost guarantees some loss of data, this feature is for now restricted to a read-only import. Porting notes (ZTS): * Fix 'make dist' target in zpool_import * The maximum path length allowed by tar is 99 characters. Several of the new test cases exceeded this limit resulting in them not being included in the tarball. Shorten the names slightly. * Set/get tunables using accessor functions. * Get last synced txg via the "zfs_txg_history" mechanism. * Clear zinject handlers in cleanup for import_cache_device_replaced and import_rewind_device_replaced in order that the zpool can be exported if there is an error. * Increase FILESIZE to 8G in zfs-test.sh to allow for a larger ext4 file system to be created on ZFS_DISK2. Also, there's no need to partition ZFS_DISK2 at all. The partitioning had already been disabled for multipath devices. Among other things, the partitioning steals some space from the ext4 file system, makes it difficult to accurately calculate the paramters to parted and can make some of the tests fail. * Increase FS_SIZE and FILE_SIZE in the zpool_import test configuration now that FILESIZE is larger. * Write more data in order that device evacuation take lonnger in a couple tests. * Use mkdir -p to avoid errors when the directory already exists. * Remove use of sudo in import_rewind_config_changed. Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9075 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/619c0123 Closes #7459	2018-05-08 21:35:27 -07:00
Paul Dagnelie	ca0845d59e	OpenZFS 9256 - zfs send space estimation off by > 10% on some datasets Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> Porting Notes: * Added tuning to man page. * Test case changes dropped, default behavior unchanged. OpenZFS-issue: https://www.illumos.org/issues/9256 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/32356b3c56 Closes #7470	2018-05-08 08:59:24 -07:00
LOLi	4ceb8dd6fd	Fix 'zpool create -t <tempname>' Creating a pool with a temporary name fails when we also specify custom dataset properties: this is because we mistakenly call zfs_set_prop_nvlist() on the "real" pool name which, as expected, cannot be found because the SPA is present in the namespace with the temporary name. Fix this by specifying the correct pool name when setting the dataset properties. Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7502 Closes #7509	2018-05-07 21:11:58 -07:00
Brian Behlendorf	c02c1becce	ZTS: Re-enable MMP tests Commit `7fab6361` inadvertently disabled the MMP test cases by creating and not removing an /etc/hostid file in the new zpool_split_props test case. When the file exists the ZTS skips the entire MMP test group rather than modify what may be a system which is already configured. Update the test case to remove the file. Additionally, because the MMP tests were disabled a regression slipped in as part of commit `9eb7b46ed0`. Fix it. Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7514	2018-05-07 21:08:33 -07:00
bunder2015	a82a4a15be	ZTS: remove dead cleanup code from snapshot tests Caught during path cleanups, the files referenced do not appear to be created or used anywhere. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7508	2018-05-06 21:02:10 -07:00
LOLi	fb0da71fd9	ZTS: Fix zfs_diff_timestamp When using mawk instead of gawk zfs_diff_timestamp fails consistently: this is due to a subtle difference in how mawk handles substr(). From awk(1): --- Finally, here is how mawk handles exceptional cases not discussed in the AWK book or the Posix draft. It is unsafe to assume consistency across awks and safe to skip to the next section. substr(s, i, n) returns the characters of s in the intersection of the closed interval [1, length(s)] and the half-open interval [i, i+n). When this intersection is empty, the empty string is returned; so substr("ABC", 1, 0) = "" and substr("ABC", -4, 6) = "A". --- To support running zfs_diff_timestamp with both gawk and mawk change the second parameter passed to substr(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7503 Closes #7510	2018-05-06 20:42:29 -07:00
Tom Caputi	be9a5c355c	Add support for decryption faults in zinject This patch adds the ability for zinject to trigger decryption and authentication faults in the ZIO and ARC layers. This functionality is exposed via the new "decrypt" error type, which may be provided for "data" object types. This patch also refactors some of the core encryption / decryption functions so that they have consistent prototypes, handle errors consistently, and do not have unused arguments. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7474	2018-05-02 15:36:20 -07:00
Tom Caputi	2c24b5b148	Fix issues found with zfs diff Two deadlocks / ASSERT failures were introduced in `a2c2ed1b` which would occur whenever arc_buf_fill() failed to decrypt a block of data. This occurred because the call to arc_buf_destroy() which was responsible for cleaning up the newly created buffer would attempt to take out the hdr lock that it was already holding. This was resolved by calling the underlying functions directly without retaking the lock. In addition, the dmu_diff() code did not properly ensure that keys were loaded and mapped before begining dataset traversal. It turns out that this code does not need to look at any encrypted values, so the code was altered to perform raw IO only. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7354 Closes #7456	2018-05-01 11:24:20 -07:00
loli10K	85ce3f4fd1	Adopt pyzfs from ClusterHQ This commit introduces several changes: * Update LICENSE and project information * Give a good PEP8 talk to existing Python source code * Add RPM/DEB packaging for pyzfs * Fix some outstanding issues with the existing pyzfs code caused by changes in the ABI since the last time the code was updated * Integrate pyzfs Python unittest with the ZFS Test Suite * Add missing libzfs_core functions: lzc_change_key, lzc_channel_program, lzc_channel_program_nosync, lzc_load_key, lzc_receive_one, lzc_receive_resumable, lzc_receive_with_cmdprops, lzc_receive_with_header, lzc_reopen, lzc_send_resume, lzc_sync, lzc_unload_key, lzc_remap Note: this commit slightly changes zfs_ioc_unload_key() ABI. This allow to differentiate the case where we tried to unload a key on a non-existing dataset (ENOENT) from the situation where a dataset has no key loaded: this is consistent with the "change" case where trying to zfs_ioc_change_key() from a dataset with no key results in EACCES. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7230	2018-05-01 10:33:35 -07:00
LOLi	3cbe89b12a	Fix zfs incremental send remove '-o' properties When receiving an incremental send stream with intermediary snapshots zfs_receive_one() does not correctly identify the top-level dataset: consequently we restore said snapshots as if they were children datasets in the hierarchy, forcing inheritance of any property received with 'zfs send -o' and effectively removing any locally set value. The test case did not correctly verify this situation because it uses adjacent snapshots, basically testing 'zfs send -i' instead of 'zfs send -I': this commit adds an additional intermediary snapshot to the test script. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7478	2018-04-30 20:58:29 -07:00
Antonio Russo	c83ccb3e72	Add test with two kinds of file creation orders Data loss was identified in #7401 when many small files were copied. This adds a reproducer for this bug and other similar ones: randomly generate N files. Then, listing M of them by `ls -U` order, produce those same files in a directory of the same name. This triggers the bug consistently, provided N and M are large enough. Here, N=2^16 and M=2^13. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7411	2018-04-30 12:45:47 -05:00
LOLi	b4555c777a	Fix 'zfs remap <poolname@snapname>' Only filesystems and volumes are valid 'zfs remap' parameters: when passed a snapshot name zfs_remap_indirects() does not handle the EINVAL returned from libzfs_core, which results in failing an assertion and consequently crashing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7454	2018-04-19 09:45:17 -07:00
Chunwei Chen	599b864813	Fix ENOSPC in "Handle zap_add() failures in ..." Commit `cc63068` caused ENOSPC error when copy a large amount of files between two directories. The reason is that the patch limits zap leaf expansion to 2 retries, and return ENOSPC when failed. The intent for limiting retries is to prevent pointlessly growing table to max size when adding a block full of entries with same name in different case in mixed mode. However, it turns out we cannot use any limit on the retry. When we copy files from one directory in readdir order, we are copying in hash order, one leaf block at a time. Which means that if the leaf block in source directory has expanded 6 times, and you copy those entries in that block, by the time you need to expand the leaf in destination directory, you need to expand it 6 times in one go. So any limit on the retry will result in error where it shouldn't. Note that while we do use different salt for different directories, it seems that the salt/hash function doesn't provide enough randomization to the hash distance to prevent this from happening. Since `cc63068` has already been reverted. This patch adds it back and removes the retry limit. Also, as it turn out, failing on zap_add() has a serious side effect for mzap_upgrade(). When upgrading from micro zap to fat zap, it will call zap_add() to transfer entries one at a time. If it hit any error halfway through, the remaining entries will be lost, causing those files to become orphan. This patch add a VERIFY to catch it. Reviewed-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7401 Closes #7421	2018-04-18 14:19:50 -07:00
Tom Caputi	b0ee5946aa	Fix issues with raw sends of spill blocks This patch fixes 2 issues in how spill blocks are processed during raw sends. The first problem is that compressed spill blocks were using the logical length rather than the physical length to determine how much data to dump into the send stream. The second issue is a typo that caused the spill record's object number to be used where the objset's ID number was required. Both issues have been corrected, and the payload_size is now printed in zstreamdump for future debugging. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7378 Closes #7432	2018-04-17 11:19:03 -07:00
Tom Caputi	e14a32b1c8	Fix object reclaim when using large dnodes Currently, when the receive_object() code wants to reclaim an object, it always assumes that the dnode is the legacy 512 bytes, even when the incoming bonus buffer exceeds this length. This causes a buffer overflow if --enable-debug is not provided and triggers an ASSERT if it is. This patch resolves this issue and adds an ASSERT to ensure this can't happen again. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7097 Closes #7433	2018-04-17 11:13:57 -07:00

... 3 4 5 6 7 ...

758 Commits