Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Colm	658fb8020f	Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off\|legacy\|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in all files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468	2021-02-17 21:30:45 -08:00
Brian Behlendorf	b2255edcc0	Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid\|raidz\|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102	2020-11-13 13:51:51 -08:00
sterlingjensen	a4ae4998cb	Fix memleak in cmd/mount_zfs.c Convert dynamic allocation to static buffer, simplify parse_dataset function return path. Add tests specific to the mount helper. Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11098	2020-11-10 15:50:44 -08:00
Ryan Moeller	cce66dfa8e	ZTS: Output all block copies in list_file_blocks The second part of list_file_blocks transforms the object description output by zdb -ddddd $ds $objnum into a stream of lines of the form "level path offset length" for the indirect blocks in the given file. The current code only works for the first copy of L0 blocks. L1 and L2 indirect blocks have more than one copy on disk. Add one more -d to the zdb command so we get all block copies and rewrite the transformation to match more than L0 and output all DVAs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11141	2020-11-05 17:16:21 -08:00
Ryan Moeller	94deb47872	ZTS: Fix list_file_blocks for mirror vdevs, level > 0 The first part of list_file_blocks transforms the pool configuration output by zdb -C $pool into shell code to set up a shell variable, VDEV_MAP, that maps from vdev id to the underlying vdev path. This variable is a simple indexed array. However, the vdev id in a DVA is only the id of the top level vdev. When the pool is mirrored, the top level vdev is a mirror and its children are the mirrored devices. So, what we need is to map from the top level vdev id to a list of the underlying vdev paths. ist_file_blocks does not need to work for raidz vdevs, so we can disregard that case. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11141	2020-11-05 17:16:16 -08:00
Richard Elling	e9527d44e6	Add zpool_influxdb command A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB time-series database. Examples are given on how to integrate with the telegraf statistics aggregator, a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #10786	2020-10-09 09:29:21 -07:00
Ryan Moeller	73989f4b9e	Make dbufstat work on FreeBSD With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstats can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11008	2020-10-08 09:40:23 -07:00
George Amanakis	a76e4e6761	Make L2ARC tests more robust Instead of relying on arbitrary timers after pool export/import or cache device off/online rely on arcstats. This makes the L2ARC tests more robust. Also cleanup some functions related to persistent L2ARC. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10983	2020-10-05 15:29:05 -07:00
Ryan Moeller	c0bd2e0fe2	Drop references when skipping dmu_send due to EXDEV When an invalid incremental send is requested where the "to" ds is before the "from" ds, make sure to drop the reference to the pool and the dataset before returning the error. Add an assert on FreeBSD to make sure we don't hold any locks after returning from an ioctl. Add some test coverage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10919	2020-09-30 13:19:49 -07:00
George Wilson	c494aa7f57	vdev_ashift should only be set once == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This is caused because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logical ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ahsift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-71831 Closes #10932	2020-09-18 12:13:47 -07:00
George Amanakis	085321621e	Add L2ARC arcstats for MFU/MRU buffers and buffer content type Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their content type is unknown. Knowing this information may prove beneficial in adjusting the L2ARC caching policy. This commit adds L2ARC arcstats that display the aligned size (in bytes) of L2ARC buffers according to their content type (data/metadata) and according to their ARC state (MRU/MFU or prefetch). It also expands the existing evict_l2_eligible arcstat to differentiate between MFU and MRU buffers. L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a buffer, its ARC state (MRU/MFU) is stored in the L2 header (b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size (in bytes) of L2ARC buffers according to their ARC state (based on b_arcs_state). We also account for the case where an L2ARC and ARC cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize reflects the alinged size (in bytes) of L2ARC buffers that were cached while they had the prefetch flag set in ARC. This is dynamically updated as the prefetch flag of L2ARC buffers changes. When buffers are evicted from ARC, if they are determined to be L2ARC eligible then their logical size is recorded in evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon eviction. Persistent L2ARC: When committing an L2ARC buffer to a log block (L2ARC metadata) its b_arcs_state and prefetch flag is also stored. If the buffer changes its arcstate or prefetch flag this is reflected in the above arcstats. However, the L2ARC metadata cannot currently be updated to reflect this change. Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count this as an MRU buffer. The buffer transitions to MFU. The arcstats are updated to reflect this. Upon pool re-import or on/offlining the L2ARC device the arcstats are cleared and the buffer will now be counted as an MRU buffer, as the L2ARC metadata were not updated. Bug fix: - If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of an ARC buffer. However, prefetches may be issued in a way that arc_read_done() is bypassed. Instead, move the related code in l2arc_write_eligible() to account for those cases too. Also add a test and update manpages for l2arc_mfuonly module parameter, and update the manpages and code comments for l2arc_noprefetch. Move persist_l2arc tests to l2arc. Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10743	2020-09-14 10:10:44 -07:00
Don Brady	4f07282786	Avoid posting duplicate zpool events Duplicate io and checksum ereport events can misrepresent that things are worse than they seem. Ideally the zpool events and the corresponding vdev stat error counts in a zpool status should be for unique errors -- not the same error being counted over and over. This can be demonstrated in a simple example. With a single bad block in a datafile and just 5 reads of the file we end up with a degraded vdev, even though there is only one unique error in the pool. The proposed solution to the above issue, is to eliminate duplicates when posting events and when updating vdev error stats. We now save recent error events of interest when posting events so that we can easily check for duplicates when posting an error. Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10861	2020-09-04 10:34:28 -07:00
Ryan Moeller	964791acdc	Make spa_stats.c tunables visible on FreeBSD Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and add update tunables.cfg in tests for the newly supported ones. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10858	2020-09-01 16:19:19 -07:00
Ryan Moeller	b0e75805ee	ZTS: Improve block_device_wait on FreeBSD FreeBSD doesn't have an equivalent to udevadm settle, so we have been resorting to a three second sleep to wait for device changes to take effect. This is far from ideal. We are mainly waiting for volmode=geom zvols to appear in /dev, so as a hack, reading the geom config will have the desired effect of quiescing the geom state. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10768	2020-08-24 08:50:15 -07:00
Michael Niewöhner	10b3c7f5e4	Add zstd support to zfs This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #6247 Closes #9024 Closes #10277 Closes #10278	2020-08-20 10:30:06 -07:00
Ryan Moeller	d4e6e9597d	ZTS: Remove bashisms from zfs-tests.sh Bring zfs-tests.sh in to compliance with the other scripts by converting it /bin/sh for to avoid a dependency on bash. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10640	2020-08-07 14:10:48 -07:00
Ryan Moeller	b6737193ee	FreeBSD: Fix `zfs jail` and add a test zfs_jail was not using zfs_ioctl so failed to map the IOC number correctly. Use zfs_ioctl to perform the jail ioctl and add a test case for FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10658	2020-08-01 08:44:54 -07:00
Ryan Moeller	af524bf7ff	ZTS: FreeBSD does have a l2arc.trim_ahead tunable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10633	2020-07-31 21:18:32 -07:00
Matthew Ahrens	3eabed74c0	Fix lua stack overflow on recursive call to gsub() The `zfs program` subcommand invokes a LUA interpreter to run ZFS "channel programs". This interpreter runs in a constrained environment, with defined memory limits. The LUA stack (used for LUA functions that call each other) is allocated in the kernel's heap, and is limited by the `-m MEMORY-LIMIT` flag and the `zfs_lua_max_memlimit` module parameter. The C stack is used by certain LUA features that are implemented in C. The C stack is limited by `LUAI_MAXCCALLS=20`, which limits call depth. Some LUA C calls use more stack space than others, and `gsub()` uses an unusually large amount. With a programming trick, it can be invoked recursively using the C stack (rather than the LUA stack). This overflows the 16KB Linux kernel stack after about 11 iterations, less than the limit of 20. One solution would be to decrease `LUAI_MAXCCALLS`. This could be made to work, but it has a few drawbacks: 1. The existing test suite does not pass with `LUAI_MAXCCALLS=10`. 2. There may be other LUA functions that use a lot of stack space, and the stack space may change depending on compiler version and options. This commit addresses the problem by adding a new limit on the amount of free space (in bytes) remaining on the C stack while running the LUA interpreter: `LUAI_MINCSTACK=4096`. If there is less than this amount of stack space remaining, a LUA runtime error is generated. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10611 Closes #10613	2020-07-27 16:11:47 -07:00
Ryan Moeller	103bc5b957	ZTS: Fix nonportable use of stat in list_file_blocks FreeBSD stat uses -f to specify the format string rather than -c. list_file_blocks in blkdev.shlib uses stat -c %i to get a file's object ID for zdb. We already have a library function to do this portably. Use get_objnum to get the file's object ID. Take log_must off of the call to list_free_blocks in corrupt_blocks_at_level, which had masked the error. It was not good to pipe the output of log_must into the while-loop, anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10572	2020-07-15 21:26:39 -07:00
Arvind Sankar	38e2e9ce83	Centralize variable substitution A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10559	2020-07-14 17:33:44 -07:00
George Wilson	c15d36c674	Remove dependency on sharetab file and refactor sharing logic == Motivation and Context The current implementation of 'sharenfs' and 'sharesmb' relies on the use of the sharetab file. The use of this file is os-specific and not required by linux or freebsd. Currently the code must maintain updates to this file which adds complexity and presents a significant performance impact when sharing many datasets. In addition, concurrently running 'zfs sharenfs' command results in missing entries in the sharetab file leading to unexpected failures. == Description This change removes the sharetab logic from the linux and freebsd implementation of 'sharenfs' and 'sharesmb'. It still preserves an os-specific library which contains the logic required for sharing NFS or SMB. The following entry points exist in the vastly simplified libshare library: - sa_enable_share -- shares a dataset but may not commit the change - sa_disable_share -- unshares a dataset but may not commit the change - sa_is_shared -- determine if a dataset is shared - sa_commit_share -- notify NFS/SMB subsystem to commit the shares - sa_validate_shareopts -- determine if sharing options are valid The sa_commit_share entry point is provided as a performance enhancement and is not required. The sa_enable_share/sa_disable_share may commit the share as part of the implementation. Libshare provides a framework for both NFS and SMB but some operating systems may not fully support these protocols or all features of the protocol. NFS Operation: For linux, libshare updates /etc/exports.d/zfs.exports to add and remove shares and then commits the changes by invoking 'exportfs -r'. This file, is automatically read by the kernel NFS implementation which makes for better integration with the NFS systemd service. For FreeBSD, libshare updates /etc/zfs/exports to add and remove shares and then commits the changes by sending a SIGHUP to mountd. SMB Operation: For linux, libshare adds and removes files in /var/lib/samba/usershares by calling the 'net' command directly. There is no need to commit the changes. FreeBSD does not support SMB. == Performance Results To test sharing performance we created a pool with an increasing number of datasets and invoked various zfs actions that would enable and disable sharing. The performance testing was limited to NFS sharing. The following tests were performed on an 8 vCPU system with 128GB and a pool comprised of 4 50GB SSDs: Scale testing: - Share all filesystems in parallel -- zfs sharenfs=on <dataset> & - Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> & Functional testing: - share each filesystem serially -- zfs share -a - unshare each filesystem serially -- zfs unshare -a - reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool> For 'zfs sharenfs=on' scale testing we saw an average reduction in time of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time of 83.36%. Functional testing also shows a huge improvement: - zfs share -- 97.97% reduction in time - zfs unshare -- 96.47% reduction in time - zfs inhert -r sharenfs -- 99.01% reduction in time Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Bryant G. Ly <bryangly@gmail.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-68690 Closes #1603 Closes #7692 Closes #7943 Closes #10300	2020-07-13 09:19:18 -07:00
Ryan Moeller	a2ec738c75	ZTS: Make bc conditional use compatible with new BSD bc FreeBSD recently replaced the GNU bc and dc in the base system with BSD licensed versions. They are supposed to be compatible with all the features present in the GNU versions, but it turns out they are picky about `if` statements having a corresponding `else`. ZTS uses `echo "if ($x > $y) 1" \| bc` in a few places, which causes tests to fail unexpectedly with the new bc. Change the two expressions in ZTS to `if ($x > $y) 1 else 0` for compatibility with the new BSD bc. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10551	2020-07-09 17:49:02 -07:00
Brian Behlendorf	9a49d3f3d3	Add device rebuild feature The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <he.huang@intel.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: John Poduska <jpoduska@datto.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10349	2020-07-03 11:05:50 -07:00
Arvind Sankar	6b99fc0620	Fixes for make dist Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10501	2020-06-26 14:20:02 -07:00
felixdoerre	221e67040f	pam: implement a zfs_key pam module Implements a pam module for automatically loading zfs encryption keys for home datasets. The pam module: - loads a zfs key and mounts the dataset when a session opens. - unmounts the dataset and unloads the key when the session closes. - when the user is logged on and changes the password, the module changes the encryption key. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: @jengelh <jengelh@inai.de> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Dörre <felix@dogcraft.de> Closes #9886 Closes #9903	2020-06-24 18:45:44 -07:00
Ryan Moeller	9192f27c1d	Add zfs_multihost_interval tunable handler for FreeBSD This tunable required a handler to be implemented for ZFS_MODULE_PARAM_CALL. Add the handler so the tunable can be declared in common code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10490	2020-06-23 13:32:42 -07:00
Matthew Ahrens	f66434268c	Remove unnecessary references to slavery The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10435	2020-06-10 17:07:59 -07:00
George Amanakis	b7654bd794	Trim L2ARC The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9713 Closes #9789 Closes #10224	2020-06-09 10:15:08 -07:00
Paul Dagnelie	de4f06c275	Small program that converts a dataset id and an object id to a path Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10204	2020-05-20 10:05:33 -07:00
John Poduska	41035a0496	Resilver restarts unnecessarily when it encounters errors When a resilver finishes, vdev_dtl_reassess is called to hopefully excise DTL_MISSING (amongst other things). If there are errors during the resilver, they are tracked in DTL_SCRUB, as spelled out in the block comment in vdev.c. DTL_SCRUB is in-core only, so it can only be used if the pool was online for the whole resilver. This state is tracked with the spa_scrub_started flag, which only gets set when the scan is initialized. Unfortunately, this flag gets cleared right before vdev_dtl_reassess gets called, so if there are any errors during the scan, DTL_MISSING will never get excised and the resilver will just continually restart. This fix simply moves clearing that flag until after the call to vdev_dtl_reasses. In addition, if a pool is imported and already has scn_errors > 0, this change will restart the resilver immediately instead of doing the rest of the scan and then restarting it from the beginning. On the other hand, if scn_errors == 0 at import, then no errors have been encountered so far, so the spa_scrub_started flag can be safely set. A test has been added to verify that resilver does not restart when relevant DTL's are available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10291	2020-05-13 10:54:27 -07:00
Ryan Moeller	6ed4391da9	ZTS: Count CKSUM for all vdevs in verify_pool The verify_pool function should detect checksum errors on any vdev, but it was only checking at the root of the pool. Accumulate the errors for all vdevs to obtain the correct count. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10271	2020-04-30 17:50:16 -07:00
Ryan Moeller	a7929f3137	Update FreeBSD tunables Remove some obsolete legacy compat, rename some misnamed, and add some missing tunables for FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10203	2020-04-15 11:14:47 -07:00
Matthew Macy	9f0a21e641	Add FreeBSD support to OpenZFS Add the FreeBSD platform code to the OpenZFS repository. As of this commit the source can be compiled and tested on FreeBSD 11 and 12. Subsequent commits are now required to compile on FreeBSD and Linux. Additionally, they must pass the ZFS Test Suite on FreeBSD which is being run by the CI. As of this commit 1230 tests pass on FreeBSD and there are no unexpected failures. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #898 Closes #8987	2020-04-14 11:36:28 -07:00
Matthew Ahrens	c618f87cd2	Add `zstream redup` command to convert deduplicated send streams Deduplicated send and receive is deprecated. To ease migration to the new dedup-send-less world, the commit adds a `zstream redup` utility to convert deduplicated send streams to normal streams, so that they can continue to be received indefinitely. The new `zstream` command also replaces the functionality of `zstreamdump`, by way of the `zstream dump` subcommand. The `zstreamdump` command is replaced by a shell script which invokes `zstream dump`. The way that `zstream redup` works under the hood is that as we read the send stream, we build up a hash table which maps from `<GUID, object, offset> -> <file_offset>`. Whenever we see a WRITE record, we add a new entry to the hash table, which indicates where in the stream file to find the WRITE record for this block. (The key is `drr_toguid, drr_object, drr_offset`.) For entries other than WRITE_BYREF, we pass them through unchanged (except for the running checksum, which is recalculated). For WRITE_BYREF records, we change them to WRITE records. We find the referenced WRITE record by looking in the hash table (for the record with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading the record header and payload from the specified offset in the stream file. This is why the stream can not be a pipe. The found WRITE record replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`, and `drr_offset` fields changed to be the same as the WRITE_BYREF's (i.e. we are writing the same logical block, but with the data supplied by the previous WRITE record). This algorithm requires memory proportional to the number of WRITE records (same as `zfs send -D`), but the size per WRITE record is relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to "redup". Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10124 Closes #10156	2020-04-10 10:39:55 -07:00
George Amanakis	77f6826b83	Persistent L2ARC This commit makes the L2ARC persistent across reboots. We implement a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot. This significantly eases the impact a reboot has on read performance on systems with large caches. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Saso Kiselkov <skiselkov@gmail.com> Co-authored-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: George Amanakis <gamanakis@gmail.com> Ported-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #925 Closes #1823 Closes #2672 Closes #3744 Closes #9582	2020-04-10 10:33:35 -07:00
George Amanakis	37c22948e5	Reset l2ad_hand and l2ad_first in l2arc_evict Increasing l2arc_write_size or l2arc_write_boost can result in l2arc_write_buffers() not having enough space to perform its writes and panic zio_write_phys(). Instead of resetting l2ad_hand to l2ad_start at the end of l2arc_write_buffers() and not taking into account a possible user-mediated increase of l2arc_write_max, we do this in l2arc_evict(), right after l2arc_write_size() has run. If there is not enough space to evict (ie we will exceed l2ad_end) we evict to the end of the device, reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making sure in l2arc_write_size() that l2ad_start + size does not exceed l2ad_end. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10154	2020-03-31 10:46:48 -07:00
Ryan Moeller	9be70c3784	ZTS: Simplify some libtest functions Don't echo the results of arithmetic expressions, it's not necessary. Use hw.clockrate sysctl to get CPU freq instead of parsing dmesg.boot for a line that might not even be there anymore. Reduce bookkeeping in fill_fs, making it easier to follow. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10113	2020-03-10 10:44:14 -07:00
Ryan Moeller	2b95e91132	ZTS: Another round of changes for FreeBSD Highlights: * is_linux -> is_illumos swaps * make block_device_wait more clever when paths are given * slightly optimize default_cleanup_noexit * remove platform differences in user_run * temporarily expect non-libfetch behavior for keylocation=/foo/bar * fix sharenfs exceptions * don't test multihost property * fix misc broken platform checks * clear zinjected faults in removal_resume_export callback Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10092	2020-03-06 09:31:32 -08:00
Ryan Moeller	3d5ba1cf29	ZTS: Misc fixes for FreeBSD * Set geom debug flags in corrupt_blocks_at_level * Use the right time zone for history tests * Add missing commands.cfg entry for diskinfo * Rewrite get_last_txg_synced to use zdb * Don't check ulimits for sparse files * Suspend removal before removing a vdev, not after Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10054	2020-02-27 09:38:34 -08:00
Ryan Moeller	3a192f7d89	ZTS: Misc fixes for FreeBSD * Check for mountd in is_shared to avoid timeout when not running * Enhance robustness of some cleanup functions * Simplify atime lookup * Skip sharenfs validation for now * Don't add mountpoint property to inheritance validation on FreeBSD Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10047	2020-02-25 16:23:27 -08:00
Olaf Faaland	034313908a	ZTS: zed_start should not fail if zed is already running zed_start may be called in places where zed is not typically already running, but this is not a requirement of the tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #9974	2020-02-25 16:02:10 -08:00
Ryan Moeller	b7dbbf6aa7	ZTS: Refactor is_shared, fix impl on FreeBSD FreeBSD doesn't have a `share` command. It does have showmount. Split the separate platform impls out of is_shared_impl. Dispatch to the correct platform impl function from is_shared. Eliminate the use of is_shared_impl from tests. is_shared works. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10037	2020-02-21 15:59:20 -08:00
Ryan Moeller	43849fdf3f	ZTS: Move free to Linux commands list FreeBSD does not have the free command. This command is only used by Linux in a perf hostinfo function. Move free from the list of common commands to the list of Linux commands. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10011	2020-02-18 11:23:41 -08:00
Ryan Moeller	5f087dda78	Enable zpool events tunables and tests on FreeBSD We have have made the necessary changes in our module code to expose zevents through both devd and the zpool events ioctl. Now the tunables can be exposed and zpool events tests can be enabled on both platforms. A few minor tweaks to the tests were needed to accommodate the way wc formats output on FreeBSD. zed remains to be ported. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10008	2020-02-18 11:22:56 -08:00
Ryan Moeller	fb63fc0c03	ZTS: Move cksum to common system commands The cksum command is used by delegate tests. We have it on FreeBSD, so it should not have been moved to the Linux commands list. Move it back to the common commands list. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10007	2020-02-16 12:49:49 -08:00
Ryan Moeller	834f274fbf	ZTS: Interpret env vars in faketty on FreeBSD This was missed in review. On FreeBSD, script does not understand environment variables being passed as a command. Use env to make faketty handle env vars on FreeBSD. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9981	2020-02-12 13:04:51 -08:00
Graham Christensen	dda702fd16	bash scripts: use /usr/bin/env for bash shebangs Not all systems / distros have a `/bin/bash`, and these scripts are more difficult to run at development time. For example, my system is NixOS which doesn't have a /bin/bash. This is not a problem for NixOS building ZFS as a package: the build environment automatically replaces these shebangs with corrected paths. The problem is much more annoying at development time: either the scripts don't run, or I correct them for my local machine and deal with a perpetually dirty work tree. Before committing this patch I confirmed there are existing scripts which use `/usr/bin/env` to locate bash, so I am thinking this is a safe transformation. There are a handful of other shebangs in this repository which don't work on my system. This patch is useful on its own specifically for `commitcheck.sh`, otherwise I can't validate my commits before submission. Here are the remaining shebangs which NixOS systems won't have: 1274 #!/bin/ksh -p 91 #!/bin/ksh 89 #! /bin/ksh -p 2 #!/bin/sed -f 1 #!/usr/bin/perl -w 1 #!/usr/bin/ksh 1 #!/bin/nawk -f plus this which will create an invalid shebang in `tests/zfs-tests/tests/functional/mv_files/mv_files_common.kshlib`: echo "#!/bin/ksh" > $TEST_BASE_DIR/exitsZero.ksh I chose to leave those alone for now, and gauge the interest in this much smaller patch first. The fixes for these are easy enough by simply using `/usr/bin/env ksh`: 91 #!/bin/ksh 1 #!/usr/bin/ksh The fix for the other set is much trickier. Quoting the GNU coreutils manual: Most operating systems (e.g. GNU/Linux, BSDs) treat all text after the first space as a single argument. When using env in a script it is thus not possible to specify multiple arguments. and not all `env`'s support arguments. Mine (GNU Coreutils 8.31) does, though this feature is new since April 2018, GNU Coreutils 8.30: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=668306ed86c8c79b0af0db8b9c882654ebb66db2 and worse, requires the -S argument: -S, --split-string=S process and split S into separate arguments; used to pass multiple arguments on shebang lines Example: $ seq 1 2 \| $(nix-build '<nixpkgs>' -A coreutils)/bin/env "sort -nr" /nix/[...]-coreutils-8.31/bin/env: ‘sort -nr’: No such file or directory /nix/[...]-coreutils-8.31/bin/env: use -[v]S to pass options in shebang lines $ seq 1 2 \| $(nix-build '<nixpkgs>' -A coreutils)/bin/env "-S sort -nr" 2 1 GNU Coreutils says FreeBSD's `env` does, though I wonder if FreeBSD's would be unhappy with the `-S`: https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation BusyBox v1.30.1 does not, and does not have a `-S`-like option: $ seq 1 2 \| $(nix-build '<nixpkgs>' -A busybox)/bin/env "sort -nr" env: can't execute 'sort -nr': No such file or directory Toybox 0.8.1 also does not, and also does not have a `-S` option: $ seq 1 2 \| $(nix-build '<nixpkgs>' -A toybox)/bin/env "sort -nr" env: exec sort -nr: No such file or directory --- At any rate, if this patch merges and the remaining ~1,500 are updated, the much larger patch should probably include a checkstyle-like test asserting all new shebangs use `/usr/bin/env`. I also don't mind dealing with NixOS weirdness if the project would prefer that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Graham Christensen <graham@grahamc.com> Closes #9893	2020-02-10 13:13:46 -08:00
Igor K	818d4a87fd	ZTS: Add an is_dilos function for future ZTS updates Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #9960	2020-02-07 12:32:52 -08:00
Ryan Moeller	a3bddd49f8	ZTS: Fix a few defaults Linux was missing a default value for DEV_DSKDIR. Set it to /dev. Fix resulting fallout. SLICE_PREFIX seems like a good candidate for including in the defaults. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9898	2020-01-31 08:51:23 -08:00
Ryan Moeller	9d8ce2457d	ZTS: FreeBSD tunable for DISABLE_IVSET_GUID_CHECK Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9907	2020-01-29 11:25:15 -08:00
Ryan Moeller	395a6b19d3	ZTS: Simplify zero_partitions on FreeBSD We can avoid a great deal of `sleep 3` by simply destroying the whole partition table in one shot with gpart. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9908	2020-01-29 11:24:18 -08:00
Ryan Moeller	0ecd910923	ZTS: Don't use edonr on FreeBSD FreeBSD doesn't support feature@edonr. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9901	2020-01-28 08:38:02 -08:00
Ryan Moeller	7a298ae975	ZTS: Eliminate random and shuf, consolidate code Both GNU and FreeBSD sort have -R to randomize input. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9900	2020-01-28 08:36:33 -08:00
Ryan Moeller	1a69856034	ZTS: Add missing export for DEV_DSKDIR default Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9873	2020-01-23 09:38:09 -08:00
Ryan Moeller	31712a7ea8	ZTS: Fix incorrect is_physical_device usage This check isn't meant to be used for command substitution. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9844	2020-01-16 13:26:26 -08:00
Ryan Moeller	83c30248ef	ZTS: Fix is_physical_device on FreeBSD This should have been using egrep. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9840	2020-01-15 09:26:26 -08:00
Ryan Moeller	2476f10306	ZTS: Catalog tunable names for tests in tunables.cfg Update tests to use the variables for tunable names. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9831	2020-01-14 14:57:28 -08:00
Ryan Moeller	602f667285	ZTS: Clean up properties.shlib a bit Fixes the last property having an empty value on FreeBSD and makes the code a bit more readable. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9834	2020-01-13 15:21:42 -08:00
Ryan Moeller	6e1c594d64	ZTS: Create xattr helpers to hide platform Create xattr helpers to hide platform and update usage in tests. This does not generally aim to enable all xattr tests yet, but it is a necessary step in that direction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9826	2020-01-10 13:24:59 -08:00
Ryan Moeller	90ae48733c	ZTS: Provide an alternative to shuf for FreeBSD There was a shuf package but the upstream for the port has recently disappeared, so it is no longer available. Create a function to hide the usage of shuf. Implement using seq\|random on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9824	2020-01-09 09:31:17 -08:00
Ryan Moeller	f8d55b95a5	ZTS: Eliminate functions named 'random' The name overlaps with a command needed by FreeBSD. There is also no sense having two 'random' functions that do nearly the same thing, so consolidate to just the more general one and name it 'random_int_between'. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9820	2020-01-08 09:08:30 -08:00
Brian Behlendorf	581ca28169	ZTS: Cleanup partition tables The cleanup_devices function should remove any partitions created on the device and force the partition table to be reread. This is needed to ensure that blkid has an up to date version of what devices and partitions are used by zfs. The cleanup_devices call was removed from inuse_008_pos.ksh since it operated on partitions instead of devices and was not needed. Lastly ddidecode may be called by parted and was therefore added to the constrained path. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9806	2020-01-06 11:14:19 -08:00
Ryan Moeller	665684d721	ZTS: Add helper for disk device check Replace `test -b` and equivalents with `is_disk_device`, so that `-c` is used instead on FreeBSD which has no block cache layer for devices. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9795	2020-01-03 09:10:17 -08:00
Ryan Moeller	d7164b27be	ZTS: Move dumpdev tests to sunos.run Neither FreeBSD nor Linux support dumping to zvols. DilOS still uses these tests, so the files are kept and the tests have been relocated to sunos.run. An `is_illumos` function was added to libtest.shlib to eliminate some awkward platform checks. A few functions that are not expected to be used outside of illumos have been sanitized of extraneous FreeBSD adaptations. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9794	2020-01-03 09:08:23 -08:00
Tony Hutter	9fb2771aa5	Colorize zpool status output If the ZFS_COLOR env variable is set, then use ANSI color output in zpool status: - Column headers are bold - Degraded or offline pools/vdevs are yellow - Non-zero error counters and faulted vdevs/pools are red - The 'status:' and 'action:' sections are yellow if they're displaying a warning. This also includes a new 'faketty' function in libtest.shlib that is compatible with FreeBSD (code provided by @freqlabs). Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #9340	2019-12-19 16:26:07 -08:00
John Wren Kennedy	523fc80069	Tests for btree implementation used by range trees Additional test cases for the btree implementation, see #9181. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #9717	2019-12-19 11:53:54 -08:00
Matthew Macy	7839c4b5e1	Update ZTS to work on FreeBSD Update the common ZTS scripts and individual test cases as needed in order to allow them to be run on FreeBSD. The high level goal is to provide compatibility wrappers whenever possible to minimize changes to individual test cases. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9692	2019-12-18 12:29:43 -08:00
Kjeld Schouten	618b6adfbf	Refactor compression algorithm selection for tests - Moves compression algorithms for tests to properties.shlib - Removes all compression algorithms levels from general tests - Replaces on with lz4 for compression tests - Removes random algorithm selection, if not needed - Cleans copyright header formatting Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Closes #9645	2019-12-04 13:10:12 -08:00
Brian Behlendorf	5e5cefbaee	ZTS: harden xattr/cleanup.ksh When the xattr/cleanup.ksh script is unable to remove the test group due to an active process then it will not call default_cleanup. This will result in a zvol_ENOSPC/setup failure when attempting to create the /mnt/testdir directory which will already exist. Resolve the issue by performing the default_cleanup before removing the test user and group to ensure this step always happens. Also allow one more retry to further minimize the likelihood of the cleanup failing. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9358	2019-09-25 09:24:45 -07:00
loli10K	7e15647ce9	ZTS: Fix /usr/bin/env: 'python2': No such file or directory Since `4f342e45` env(1) must be able to find a "python2" executable in the "constrained path" on systems configured with --with-python=2.x otherwise the ZFS Test Suite won't be able to use Python scripts. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9325	2019-09-16 10:44:51 -07:00
John Gallagher	e60e158eff	Add subcommand to wait for background zfs activity to complete Currently the best way to wait for the completion of a long-running operation in a pool, like a scrub or device removal, is to poll 'zpool status' and parse its output, which is neither efficient nor convenient. This change adds a 'wait' subcommand to the zpool command. When invoked, 'zpool wait' will block until a specified type of background activity completes. Currently, this subcommand can wait for any of the following: - Scrubs or resilvers to complete - Devices to initialized - Devices to be replaced - Devices to be removed - Checkpoints to be discarded - Background freeing to complete For example, a scrub that is in progress could be waited for by running zpool wait -t scrub <pool> This also adds a -w flag to the attach, checkpoint, initialize, replace, remove, and scrub subcommands. When used, this flag makes the operations kicked off by these subcommands synchronous instead of asynchronous. This functionality is implemented using a new ioctl. The type of activity to wait for is provided as input to the ioctl, and the ioctl blocks until all activity of that type has completed. An ioctl was used over other methods of kernel-userspace communiction primarily for the sake of portability. Porting Notes: This is ported from Delphix OS change DLPX-44432. The following changes were made while porting: - Added ZoL-style ioctl input declaration. - Reorganized error handling in zpool_initialize in libzfs to integrate better with changes made for TRIM support. - Fixed check for whether a checkpoint discard is in progress. Previously it also waited if the pool had a checkpoint, instead of just if a checkpoint was being discarded. - Exposed zfs_initialize_chunk_size as a ZoL-style tunable. - Updated more existing tests to make use of new 'zpool wait' functionality, tests that don't exist in Delphix OS. - Used existing ZoL tunable zfs_scan_suspend_progress, together with zinject, in place of a new tunable zfs_scan_max_blks_per_txg. - Added support for a non-integral interval argument to zpool wait. Future work: ZoL has support for trimming devices, which Delphix OS does not. In the future, 'zpool wait' could be extended to add the ability to wait for trim operations to complete. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Gallagher <john.gallagher@delphix.com> Closes #9162	2019-09-13 18:09:06 -07:00
John Wren Kennedy	b63e2d881f	ZTS: Introduce targeted corruption in file blocks filetest_001_pos verifies that various checksum algorithms detect corruption by overwriting the underlying vdev on which a file resides. It is possible for the overwrite to miss the blocks of a file, causing a spurious failure. This change introduces a function to corrupt the individual blocks of a file as determined by zdb. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #9288	2019-09-09 16:11:07 -07:00
Ryan Moeller	240c015ac6	Refactor checksum operations in tests md5sum in particular but also sha256sum to a lesser extent is used in several areas of the test suite for computing checksums. The vast majority of invocations are followed by `\| awk '{ print $1 }'`. Introduce functions to wrap up `md5sum $file \| awk '{ print $1 }'` and likewise for sha256sum. These also serve as a convenient interface for alternative implementations on other platforms. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9280	2019-09-05 09:51:59 -07:00
Andrea Gelmini	c6e457dffb	Fix typos in tests/ Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #9250	2019-09-02 18:14:53 -07:00
Ryan Moeller	815a645692	Simplify deleting partitions in libtest Eliminate unnecessary code duplication. We can use a for-loop instead of a while-loop. There is no need to echo $DISKSARRAY in a subshell or return 0. Declare all variables with typeset. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9224	2019-08-29 13:11:29 -07:00
Ryan Moeller	142f84dd19	Restore :: in Makefile.am The double-colon looked like a typo, but it's actually an obscure feature. Rules with :: may appear multiple times and are run independently of one another in the order they appear. The use of :: for distclean-local was conventional, not accidental. Add comments to indicate the intentional use of double-colon rules. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9210	2019-08-26 11:48:31 -07:00
Serapheim Dimitropoulos	3b9edd7b17	Introduce getting holds and listing bookmarks through ZCP Consumers of ZFS Channel Programs can now list bookmarks, and get holds from datasets. A minor-refactoring was also applied to distinguish between user and system properties in ZCP. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Ported-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Dan Kimmel <dan.kimmel@delphix.com> OpenZFS-issue: https://illumos.org/issues/8862 Closes #7902	2019-08-12 10:02:34 -07:00
Brian Behlendorf	1e620c9872	Revert "Develop tests for issues #5866 and #8858 " This reverts commit `693c1fc478`. This change resulted in a kmem leak being observed in existing code which needs to be identified and addressed. Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #8978 Closes #9090	2019-07-29 12:46:56 -07:00
Paul Zuchowski	693c1fc478	Develop tests for issues #5866 and #8858 Provide zfstest coverage for these two issues which were a panic accessing extended attributes and a problem comparing 64 bit and 32 bit generation numbers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Issue #5866 Issue #8858 Closes #8978	2019-07-26 17:52:13 -07:00
Sara Hartse	37f03da8ba	Fast Clone Deletion Deleting a clone requires finding blocks are clone-only, not shared with the snapshot. This was done by traversing the entire block tree which results in a large performance penalty for sparsely written clones. This is new method keeps track of clone blocks when they are modified in a "Livelist" so that, when it’s time to delete, the clone-specific blocks are already at hand. We see performance improvements because now deletion work is proportional to the number of clone-modified blocks, not the size of the original dataset. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Sara Hartse <sara.hartse@delphix.com> Closes #8416	2019-07-26 10:54:14 -07:00
Brian Behlendorf	4ca457b065	ZTS: Fix mmp_interval failure The mmp_interval test case was failing on Fedora 30 due to the built-in 'echo' command terminating the script when it was unable to write to the sysfs module parameter. This change in behavior was observed with ksh-2020.0.0-alpha1. Resolve the issue by using the external cat command which fails gracefully as expected. Additionally, remove some incorrect quotes around the $? return values. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8906	2019-06-19 10:39:28 -07:00
Paul Dagnelie	30af21b025	Implement Redacted Send/Receive Redacted send/receive allows users to send subsets of their data to a target system. One possible use case for this feature is to not transmit sensitive information to a data warehousing, test/dev, or analytics environment. Another is to save space by not replicating unimportant data within a given dataset, for example in backup tools like zrepl. Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots). Second, the new zfs redact command is used to create a redaction bookmark. The redaction bookmark stores the list of blocks in a snapshot that were modified by the redaction snapshot(s). Finally, the redaction bookmark is passed as a parameter to zfs send. When sending to the snapshot that was redacted, the redaction bookmark is used to filter out blocks that contain sensitive or unwanted information, and those blocks are not included in the send stream. When sending from the redaction bookmark, the blocks it contains are considered as candidate blocks in addition to those blocks in the destination snapshot that were modified since the creation_txg of the redaction bookmark. This step is necessary to allow the target to rehydrate data in the case where some blocks are accidentally or unnecessarily modified in the redaction snapshot. The changes to bookmarks to enable fast space estimation involve adding deadlists to bookmarks. There is also logic to manage the life cycles of these deadlists. The new size estimation process operates in cases where previously an accurate estimate could not be provided. In those cases, a send is performed where no data blocks are read, reducing the runtime significantly and providing a byte-accurate size estimate. Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Prashanth Sreenivasa <pks@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Chris Williamson <chris.williamson@delphix.com> Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com> Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #7958	2019-06-19 09:48:12 -07:00
Richard Elling	cfc16f8ba8	Improve ZTS block_device_wait debugging The udevadm settle timeout can be 120 or 180 seconds by default for some distributions. If a long delay is experienced, it could be due to some strangeness in a malfunctioning device that isn't related to the devices under test. To help debug this condition, a notice is given if settle takes too long. Arguments can now be passed to block_device_wait. The expected arguments are block device pathnames. Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:21:19 -07:00
Richard Elling	4cb1b541d4	Block_device_wait does not return an error code Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:21:08 -07:00
Richard Elling	bef70afaa6	Remove redundant redundant remove Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:20:42 -07:00
Richard Elling	2ff615b2bf	Fix logic error in setpartition function Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8839	2019-06-10 09:19:32 -07:00
loli10K	fe609530f2	zfs-tests: fix warnings when packaging some .shlib files This change prevents the following warning when packaging some zfs-tests files: *** WARNING: ./usr/src/zfs-0.8.0/tests/zfs-tests/include/zpool_script.shlib is executable but has empty or no shebang, removing executable bit Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8787	2019-05-24 14:12:14 -07:00
Tomohiro Kusumi	9c53e51616	Fix `zfs set atime\|relatime=off\|on` behavior on inherited datasets `zfs set atime\|relatime=off\|on` doesn't disable or enable the property on read for datasets whose property was inherited from parent, until a dataset is once unmounted and mounted again. (The properties start to work properly if a dataset is once unmounted and mounted again. The difference comes from regular mount process, e.g. via zpool import, uses mount options based on properties read from ondisk layout for each dataset, whereas `zfs set atime\|relatime=off\|on` just remounts a specified dataset.) -- # zpool create p1 <device> # zfs create p1/f1 # zfs set atime=off p1 # echo test > /p1/f1/test # sync # zfs list NAME USED AVAIL REFER MOUNTPOINT p1 176K 18.9G 25.5K /p1 p1/f1 26K 18.9G 26K /p1/f1 # zfs get atime NAME PROPERTY VALUE SOURCE p1 atime off local p1/f1 atime off inherited from p1 # stat /p1/f1/test \| grep Access \| tail -1 Access: 2019-04-26 23:32:33.741205192 +0900 # cat /p1/f1/test test # stat /p1/f1/test \| grep Access \| tail -1 Access: 2019-04-26 23:32:50.173231861 +0900 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ changed by read(2) -- The problem is that zfsvfs::z_atime which was probably intended to keep incore atime state just gets updated by a callback function of "atime" property change, atime_changed_cb(), and never used for anything else. Since now that all file read and atime update use a common function zpl_iter_read_common() -> file_accessed(), and whether to update atime via ->dirty_inode() is determined by atime_needs_update(), atime_needs_update() needs to return false once atime is turned off. It currently continues to return true on `zfs set atime=off`. Fix atime_changed_cb() by setting or dropping SB_NOATIME in VFS super block depending on a new atime value, so that atime_needs_update() works as expected after property change. The same problem applies to "relatime" except that a self contained relatime test is needed. This is because relatime_need_update() is based on a mount option flag MNT_RELATIME, which doesn't exist in datasets with inherited "relatime" property via `zfs set relatime=...`, hence it needs its own relatime test zfs_relatime_need_update(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8674 Closes #8675	2019-05-07 10:06:30 -07:00
Tom Caputi	df583073eb	Do not iterate through filesystems unnecessarily Currently, when attempting to list snapshots ZFS may do a lot of extra work checking child datasets. This is because the code does not realize that it will not be able to reach any snapshots contained within snapshots that are at the depth limit since the snapshots of those datasets are counted as an additional layer deeper. This patch corrects this issue. In addition, this patch adds the ability to do perform the commands: $ zfs list -t snapshot <dataset> $ zfs get -t snapshot <prop> <dataset> as a convenient way to list out properties of all snapshots of a given dataset without having to use the depth limit. Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8539	2019-04-01 11:58:59 -07:00
Brian Behlendorf	1b939560be	Add TRIM support UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated the underlying device can often more efficiently manage itself. This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per- vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents. The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes is straight forward to add the platform specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipieline, one exception is that TRIM zio's may exceed the 16M block size limit since they contain no data. In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_free tree. The ms_free tree is then periodically consumed by an autotrim thread which systematically walks a top level vdev's metaslabs. Since the automatic TRIM will skip ranges it considers too small there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Contributions-by: Tim Chase <tim@chase2k.com> Contributions-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8419 Closes #598	2019-03-29 09:13:20 -07:00
Richard Elling	85a150ce1e	Update valid vdev types for get_disklist Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #8532	2019-03-26 14:18:58 -07:00
Ahmed Ghanem	9634299657	OpenZFS 9185 - Enable testing over NFS in ZFS performance tests This change makes additions to the ZFS test suite that allows the performance tests to run over NFS. The test is run and performance data collected from the server side, while IO is generated on the NFS client. This has been tested with Linux and illumos NFS clients. Authored by: Ahmed Ghanem <ahmedg@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Kevin Greene <kevin.greene@delphix.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com> OpenZFS-issue: https://www.illumos.org/issues/9185 Closes #8367	2019-02-04 09:27:37 -08:00
George Wilson	619f097693	OpenZFS 9102 - zfs should be able to initialize storage devices PROBLEM ======== The first access to a block incurs a performance penalty on some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are "thick provisioned", where supported by the platform (VMware). This can create a large delay in getting a new virtual machines up and running (or adding storage to an existing Engine). If the thick provision step is omitted, write performance will be suboptimal until all blocks on the LUN have been written. SOLUTION ========= This feature introduces a way to 'initialize' the disks at install or in the background to make sure we don't incur this first read penalty. When an entire LUN is added to ZFS, we make all space available immediately, and allow ZFS to find unallocated space and zero it out. This works with concurrent writes to arbitrary offsets, ensuring that we don't zero out something that has been (or is in the middle of being) written. This scheme can also be applied to existing pools (affecting only free regions on the vdev). Detailed design: - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...] - start, suspend, or cancel initialization - Creates new open-context thread for each vdev - Thread iterates through all metaslabs in this vdev - Each metaslab: - select a metaslab - load the metaslab - mark the metaslab as being zeroed - walk all free ranges within that metaslab and translate them to ranges on the leaf vdev - issue a "zeroing" I/O on the leaf vdev that corresponds to a free range on the metaslab we're working on - continue until all free ranges for this metaslab have been "zeroed" - reset/unmark the metaslab being zeroed - if more metaslabs exist, then repeat above tasks. - if no more metaslabs, then we're done. - progress for the initialization is stored on-disk in the vdev’s leaf zap object. The following information is stored: - the last offset that has been initialized - the state of the initialization process (i.e. active, suspended, or canceled) - the start time for the initialization - progress is reported via the zpool status command and shows information for each of the vdevs that are initializing Porting notes: - Added zfs_initialize_value module parameter to set the pattern written by "zpool initialize". - Added zfs_vdev_{initializing,removal}_{min,max}_active module options. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/9102 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb Closes #8230	2019-01-07 10:37:26 -08:00
Brian Behlendorf	6e72a5b9b6	pyzfs: python3 support (build system) Almost all of the Python code in the respository has been updated to be compatibile with Python 2.6, Python 3.4, or newer. The only exceptions are arc_summery3.py which requires Python 3, and pyzfs which requires at least Python 2.7. This allows us to maintain a single version of the code and support most default versions of python. This change does the following: * Sets the default shebang for all Python scripts to python3. If only Python 2 is available, then at install time scripts which are compatible with Python 2 will have their shebangs replaced with /usr/bin/python. This is done for compatibility until Python 2 goes end of life. Since only the installed versions are changed this means Python 3 must be installed on the system for test-runner when testing in-tree. * Added --with-python=<2\|3\|3.4,etc> configure option which sets the PYTHON environment variable to target a specific python version. By default the newest installed version of Python will be used or the preferred distribution version when creating pacakges. * Fixed --enable-pyzfs configure checks so they are run when --enable-pyzfs=check and --enable-pyzfs=yes. * Enabled pyzfs for Python 3.4 and newer, which is now supported. * Renamed pyzfs package to python<VERSION>-pyzfs and updated to install in the appropriate site location. For example, when building with --with-python=3.4 a python34-pyzfs will be created which installs in /usr/lib/python3.4/site-packages/. * Renamed the following python scripts according to the Fedora guidance for packaging utilities in /bin - dbufstat.py -> dbufstat - arcstat.py -> arcstat - arc_summary.py -> arc_summary - arc_summary3.py -> arc_summary3 * Updated python-cffi package name. On CentOS 6, CentOS 7, and Amazon Linux it's called python-cffi, not python2-cffi. For Python3 it's called python3-cffi or python3x-cffi. * Install one version of arc_summary. Depending on the version of Python available install either arc_summary2 or arc_summary3 as arc_summary. The user output is only slightly different. Reviewed-by: John Ramsden <johnramsden@riseup.net> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8096	2019-01-06 10:39:41 -08:00
Tom Caputi	7c46894081	ZTS: fix wait_scrubbed() Currently, wait_scrubbed() is the only function of its kind that accepts a timeout, which is 10s by default. This timeout is pretty short for a scrub and causes test failures if we run too long. This patch removes the timeout, instead leaning on the global test suite timeout to ensure the tests keep moving. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #8210	2018-12-14 10:06:49 -08:00
Brian Behlendorf	7c9a42921e	Detect IO errors during device removal * Detect IO errors during device removal While device removal cannot verify the checksums of individual blocks during device removal, it can reasonably detect hard IO errors from the leaf vdevs. Failure to perform this error checking can result in device removal completing successfully, but moving no data which will permanently corrupt the pool. Situation 1: faulted/degraded vdevs In the configuration shown below, the removal of mirror-0 will permanently corrupt the pool. Device removal will preferentially copy data from 'vdev1 -> vdev3' and from 'vdev2 -> vdev4'. Which in this case will result in nothing being copied since one vdev in each of those groups in unavailable. However, device removal will complete successfully since all IO errors are ignored. tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 /var/tmp/vdev1 FAULTED 0 0 0 external fault /var/tmp/vdev2 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 /var/tmp/vdev3 ONLINE 0 0 0 /var/tmp/vdev4 FAULTED 0 0 0 external fault This issue is resolved by updating the source child selection logic to exclude unreadable leaf vdevs. Additionally, unwritable destination child vdevs which can never succeed are skipped to prevent generating a large number of write IO errors. Situation 2: individual hard IO errors During removal if an unexpected hard IO error is encountered when either reading or writing the child vdev the entire removal operation is cancelled. While it may be possible to reconstruct the data after removal that cannot be guaranteed. The only strictly safe thing to do is to cancel the removal. As a future improvement we may want to instead suspend the removal process and allow the damaged region to be retried. But that work is left for another time, hard IO errors during the removal process are expected to be exceptionally rare. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #6900 Closes #8161	2018-12-04 09:37:37 -08:00
LOLi	ebb8735901	ZTS: "checksum" test group needs "lscpu" This change adds "lscpu" to the list of commands used by the ZFS Test Suite: this is required by the "checksum" test group to read the CPU frequency which is used in EdonR, Skein and SHA2 performance tests. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #8139	2018-11-20 09:47:58 -08:00
George Melikov	58aeb87a8f	ZTS: change `$(cat)` to `$(<)` for speedup It's better to use ksh/bash built in methods, rather than spawn new processes every time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #8071	2018-10-31 12:00:06 -05:00
LOLi	2e55034471	zpool: allow sharing of spare device among pools ZFS allows, by default, sharing of spare devices among different pools; this commit simply restores this functionality for disk devices and adds an additional tests case to the ZFS Test Suite to prevent future regression. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7999	2018-10-17 11:21:07 -07:00

1 2 3 4 5

238 Commits