Archive-Team/zfs

Commit Graph

Author	SHA1	Message	Date
Brian Behlendorf	b2ab468dde	Fix mmap / libaio deadlock Calling uiomove() in mappedread() under the page lock can result in a deadlock if the user space page needs to be faulted in. Resolve the issue by dropping the page lock before the uiomove(). The inode range lock protects against concurrent updates via zfs_read() and zfs_write(). Reviewed-by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7335 Closes #7339	2018-03-28 10:19:22 -07:00
Alek P	272b5d730f	Add JSON output support to channel programs The changes piggyback JSON output support on top of channel programs (#6558). This way the JSON output support is targeted to scripting use cases and is easily maintainable since it really only touches one function (zfs_do_channel_program()). This patch ports Joyent's JSON nvlist library from illumos to enable easy JSON printing of channel program output nvlist. To keep the delta small I also took advantage of the fact that printing in zfs_do_channel_program() was almost always done before exiting the program. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7281	2018-03-19 12:40:58 -07:00
Wolfgang Bumiller	0e85048f53	Take user namespaces into account in policy checks Change file related checks to use user namespaces and make sure involved uids/gids are mappable in the current namespace. Note that checks without file ownership information will still not take user namespaces into account, as some of these should be handled via 'zfs allow' (otherwise root in a user namespace could issue commands such as `zpool export`). This also adds an initial user namespace regression test for the setgid bit loss, with a user_ns_exec helper usable in further tests. Additionally, configure checks for the required user namespace related features are added for: * ns_capable * kuid/kgid_has_mapping() * user_ns in cred_t Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Closes #6800 Closes #7270	2018-03-07 15:40:42 -08:00
Tony Hutter	639b18944a	Allow to limit zed's syslog chattiness Some usage patterns like send/recv of replication streams can produce a large number of events. In such a case, the current all-syslog.sh zedlet will live up to its name, and flood the logs with mostly redundant information. To mitigate this situation, this changeset introduces two new variables ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE to zed.rc that give more control over which event classes end up in the syslog. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Daniel Kobras <d.kobras@science-computing.de> Closes #6886 Closes #7260	2018-03-06 15:41:52 -08:00
Olaf Faaland	d2160d0538	Record skipped MMP writes in multihost_history Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread loop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7212	2018-03-06 15:15:15 -08:00
Scot W. Stevenson	19528cf949	Add Python 3 rewrite of arc_summary.py Add new script arc_summary3.py as a complete rewrite of the arc_summary.py tool (see issue #6873) Add new options: -g/--graph - Display crude graphic representation of ARC status and quit -r/--raw - Print all available information as minimally formatted list (for grep) -s/--section - Print a single section. This replaces -p/--page, which is kept for backward compatibility but marked as deprecated Add new sections with information on ZIL and SPL. Notify user if sections L2ARC and VDEV are skipped instead of failing silently. Add warning that -p/--page option is deprecated. Developed for Python 3.5. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6873 Closes #6892	2018-02-28 08:52:34 -08:00
Tony Hutter	bf95a000c4	Add scrub after resilver zed script * Add a zed script to kick off a scrub after a resilver. The script is disabled by default. * Add an optional $PATH (-P) option to zed to allow it to use a custom $PATH for its zedlets. This is needed when you're running zed under the ZTS in a local workspace. * Update test scripts to not copy in all-debug.sh and all-syslog.sh by default. They can be optionally copied in as part of zed_setup(). These scripts slow down zed considerably under heavy event loads and can cause events to be dropped or their delivery delayed. This was causing some sporadic failures in the 'fault' tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #4662 Closes #7086	2018-02-23 11:38:05 -08:00
LOLi	faa97c1619	Want 'zfs send -b' This change implements 'zfs send -b' which can be used to send only received property values whether or not they are overridden by local settings. This can be very useful during "restore" operations from a backup pool because it allows to send only the property values originally sent from the backup source, even though they were later modified on the destination either by a 'zfs set' operation, explicit 'zfs inherit' or overridden during the receive process via 'zfs receive -o\|-x'. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7156	2018-02-21 12:32:06 -08:00
Nasf-Fan	9c5167d19f	Project Quota on ZFS Project quota is a new ZFS system space/object usage accounting and enforcement mechanism. Similar to user/group quota, project quota is another dimension of system quota. It is based on a new object attribute, the project ID. The project ID is a numerical value that indicates which project an object belongs to. An object can belong to only one project, though you (the object owner or a privileged user) can change the object's project ID explicitly via 'chattr -p' or 'zfs project [-s] -p'. The object can also inherit the project ID from its parent when created if the parent has the project inherit flag (which can be set via 'chattr +P' or 'zfs project -s [-p]'). By accounting for the space/objects belonging to the same project, we can know how much space and how many objects are used by the project. And if we set an upper limit, we can control the space/objects consumed by that project. It is useful when multiple groups and users cooperate on the same project, or when a user/group needs to participate in multiple projects. 
Support the following commands and functionalities: zfs set projectquota@project zfs set projectobjquota@project zfs get projectquota@project zfs get projectobjquota@project zfs get projectused@project zfs get projectobjused@project zfs projectspace zfs allow projectquota zfs allow projectobjquota zfs allow projectused zfs allow projectobjused zfs unallow projectquota zfs unallow projectobjquota zfs unallow projectused zfs unallow projectobjused chattr +/-P chattr -p project_id lsattr -p This patch also supports tree quota based on the project quota via "zfs project" commands set as follows: zfs project [-d\|-r] <file\|directory ...> zfs project -C [-k] [-r] <file\|directory ...> zfs project -c [-0] [-d\|-r] [-p id] <file\|directory ...> zfs project [-p id] [-r] [-s] <file\|directory ...> For "df [-i] $DIR" command, if we set INHERIT (project ID) flag on the $DIR, then the project [obj]quota and [obj]used values for the $DIR's project ID will be shown as the total/free (avail) resource. This keeps the same behavior as EXT4/XFS. Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by Ned Bass <bass6@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Fan Yong <fan.yong@intel.com> TEST_ZIMPORT_POOLS="zol-0.6.1 zol-0.6.2 master" Change-Id: Ib4f0544602e03fb61fd46a849d7ba51a6005693c Closes #6290	2018-02-13 14:54:54 -08:00
sanjeevbagewadi	cc63068e95	Handle zap_add() failures in mixed case mode With "casesensitivity=mixed", zap_add() could fail when the number of files/directories with the same name (varying in case) exceeds the capacity of the leaf node of a Fatzap. This results in an ASSERT() failure as zfs_link_create() does not expect zap_add() to fail. The fix is to handle these failures and roll back the transactions. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com> Closes #7011 Closes #7054	2018-02-09 10:15:53 -08:00
Chunwei Chen	eb9c4532dd	Fix zdb -ed on objset for exported pool zdb -ed on objset for exported pool would fail with: failed to own dataset 'qq/fs0': No such file or directory The reason is that zdb passes the objset name to spa_import, which uses that name to create a spa. Later, when dmu_objset_own tries to look up the spa using the real pool name, it can't find one. We fix this by making sure we pass the pool name rather than the objset name to spa_import. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7099 Closes #6464	2018-02-09 10:11:34 -08:00
Don Brady	fc5d4b6737	Increase code coverage for Lua libraries Add test coverage for lua libraries Remove dead code in Lua implementation Signed-off-by: Don Brady <don.brady@delphix.com>	2018-02-08 15:29:38 -08:00
Don Brady	ee00bfb2e6	Add basic functional tests for zcp user properties Signed-off-by: Don Brady <don.brady@delphix.com>	2018-02-08 15:29:32 -08:00
Chris Williamson	234c91c508	OpenZFS 8600 - ZFS channel programs - snapshot Authored by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> ZFS channel programs should be able to create snapshots. In addition to the base snapshot functionality, this entails extra logic to handle edge cases which were formerly not possible, such as creating then destroying a snapshot in the same transaction sync. OpenZFS-issue: https://www.illumos.org/issues/8600 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68089b8b	2018-02-08 15:29:24 -08:00
Brad Lewis	af07368986	OpenZFS 8592 - ZFS channel programs - rollback Authored by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> ZFS channel programs should be able to perform a rollback. OpenZFS-issue: https://www.illumos.org/issues/8592 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d46b5ed6	2018-02-08 15:29:14 -08:00
Chris Williamson	475eca4908	OpenZFS 8605 - zfs channel programs fix zfs.exists Authored by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> zfs.exists() in channel programs doesn't return any result, and should have a man page entry. This patch corrects zfs.exists so that it returns a value indicating if the dataset exists or not. It also adds documentation about it in the man page. OpenZFS-issue: https://www.illumos.org/issues/8605 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1e85e111	2018-02-08 15:28:52 -08:00
Chris Williamson	d99a015343	OpenZFS 7431 - ZFS Channel Programs Authored by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Don Brady <don.brady@delphix.com> Ported-by: John Kennedy <john.kennedy@delphix.com> OpenZFS-issue: https://www.illumos.org/issues/7431 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dfc11533 Porting Notes: * The CLI long option arguments for '-t' and '-m' don't parse on linux * Switched from kmem_alloc to vmem_alloc in zcp_lua_alloc * Lua implementation is built as its own module (zlua.ko) * Lua headers consumed directly by zfs code moved to 'include/sys/lua/' * There is no native setjmp/longjump available in stock Linux kernel. Brought over implementations from illumos and FreeBSD * The get_temporary_prop() was adapted due to VFS platform differences * Use of inline functions in lua parser to reduce stack usage per C call * Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow	2018-02-08 15:28:18 -08:00
Tom Caputi	047116ac76	Raw sends must be able to decrease nlevels Currently, when a raw zfs send file includes a DRR_OBJECT record that would decrease the number of levels of an existing object, the object is reallocated with dmu_object_reclaim() which creates the new dnode using the old object's nlevels. For non-raw sends this doesn't really matter, but raw sends require that nlevels on the receive side match that of the send side so that the checksum-of-MAC tree can be properly maintained. This patch corrects the issue by freeing the object completely before allocating it again in this case. This patch also corrects several issues with dnode_hold_impl() and related functions that prevented dnodes (particularly multi-slot dnodes) from being reallocated properly due to the fact that existing dnodes were not being fully cleaned up when they were freed. This patch adds a test to make sure that zfs recv functions properly with incremental streams containing dnodes of different sizes. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6821 Closes #6864	2018-02-02 11:43:11 -08:00
Tom Caputi	ae76f45cda	Encryption Stability and On-Disk Format Fixes The on-disk format for encrypted datasets protects not only the encrypted and authenticated blocks themselves, but also the order and interpretation of these blocks. In order to make this work while maintaining the ability to do raw sends, the indirect bps maintain a secure checksum of all the MACs in the block below it along with a few other fields that determine how the data is interpreted. Unfortunately, the current on-disk format erroneously includes some fields which are not portable and thus cannot support raw sends. It is not possible to easily work around this issue due to a separate and much smaller bug which causes indirect blocks for encrypted dnodes to not be compressed, which conflicts with the previous bug. In addition, the current code generates incompatible on-disk formats on big endian and little endian systems due to an issue with how block pointers are authenticated. Finally, raw send streams do not currently include dn_maxblkid when sending both the metadnode and normal dnodes which are needed in order to ensure that we are correctly maintaining the portable objset MAC. This patch zeroes out the offending fields when computing the bp MAC and ensures that these MACs are always calculated in little endian order (regardless of the host system's byte order). This patch also registers an errata for the old on-disk format, which we detect by adding a "version" field to newly created DSL Crypto Keys. We allow datasets without a version (version 0) to only be mounted for read so that they can easily be migrated. We also now include dn_maxblkid in raw send streams to ensure the MAC can be maintained correctly. This patch also contains minor bug fixes and cleanups. 
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #6845 Closes #6864 Closes #7052	2018-02-02 11:37:16 -08:00
Giuseppe Di Natale	5e021f56d3	Add dbuf hash and dbuf cache kstats Introduce kstats about the dbuf hash and dbuf cache to make it easier to inspect state. This should help with debugging and understanding of these portions of the codebase. Correct format of dbuf kstat file. Introduce a dbc column to dbufs kstat to indicate if a dbuf is in the dbuf cache. Introduce field filtering in the dbufstat python script. Introduce a no header option to the dbufstat python script. Introduce a test case to test basic mru->mfu list movement in the ARC. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #6906	2018-01-29 10:24:52 -08:00
Chunwei Chen	522db29275	zpool import -d to specify device path When we know which devices have the pool we are looking for, sometimes it's better if we can directly pass those device paths to zpool import instead of letting it search through all unrelated stuff, which might take a lot of time if you have hundreds of disks. This patch allows option -d <dev_path> to zpool import. You can have multiple pairs of -d <dev_path>, and zpool import will only search through those devices. For example: zpool import -d /dev/sda -d /dev/sdb Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7077	2018-01-26 10:49:46 -08:00
Brian Behlendorf	8fb1ede146	Extend deadman logic The intent of this patch is to extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include: * Added a new `zfs_deadman_failmode` module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independent from, the pool failmode property. It can be set to wait, continue, or panic. * wait - Wait for the "hung" I/O (default) * continue - Attempt to recover from a "hung" I/O * panic - Panic the system * Added a new `zfs_deadman_ziotime_ms` module option which is analogous to `zfs_deadman_synctime_ms` except instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio. * The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected. * The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit `fa603f82`. Update zloop.sh to unconditionally pass -G to obtain additional debugging. * The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detected a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event. * The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an IO has been submitted to the physical device but for some reason the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs. 
* The 'zfs_deadman_synctime_ms' default value was reduced from 1000s to 600s. * Depending on how ztest fails there may be no cache file to move. This should not be considered fatal, collect the logs which are available and carry on. * Add deadman test cases for spa_deadman() and zio_wait(). * Increase default zfs_deadman_checktime_ms to 60s. Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6999	2018-01-25 13:40:38 -08:00
LOLi	390d679acd	Fix 'zpool add' handling of nested interior VDEVs When replacing a faulted device which was previously handled by a spare multiple levels of nested interior VDEVs will be present in the pool configuration; the following example illustrates one of the possible situations: NAME STATE READ WRITE CKSUM testpool DEGRADED 0 0 0 raidz1-0 DEGRADED 0 0 0 spare-0 DEGRADED 0 0 0 replacing-0 DEGRADED 0 0 0 /var/tmp/fault-dev UNAVAIL 0 0 0 cannot open /var/tmp/replace-dev ONLINE 0 0 0 /var/tmp/spare-dev1 ONLINE 0 0 0 /var/tmp/safe-dev ONLINE 0 0 0 spares /var/tmp/spare-dev1 INUSE currently in use This is safe and allowed, but get_replication() needs to handle this situation gracefully to let zpool add new devices to the pool. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6678 Closes #6996	2017-12-28 10:15:32 -08:00
Prakash Surya	2fe61a7ecc	OpenZFS 8909 - 8585 can cause a use-after-free kernel panic Authored by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: John Kennedy <jwk404@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Prakash Surya <prakash.surya@delphix.com> PROBLEM ======= There's a race condition that exists if `zil_free_lwb` races with either `zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`. Here's an example panic due to this bug: > ::status debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40 operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc) image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513 panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in module "zfs" due to a NULL pointer dereference dump content: kernel pages only > $c zio_shrink+0x12() zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20) zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8) zil_commit+0x80(ffffff03dcd15cc0, 9a9) zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0) write+0x250(42, fffffd7ff4832000, 2000) sys_syscall+0x177() If there's an outstanding lwb that's in `zil_commit_waiter_timeout` waiting to timeout, waiting on its waiter's CV, we must be sure not to call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB may be freed and can result in a use-after-free situation where the stale lwb pointer stored in the `zil_commit_waiter_t` structure of the thread waiting on the waiter's CV is used. 
A similar situation can occur if an lwb is issued to disk, and thus in the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the disk is servicing that lwb. In this situation, the lwb will be freed by `zil_free_lwb`, which will result in a use-after-free situation when the lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called. This race condition is prevented in `zil_close` by calling `zil_commit` before `zil_free_lwb` is called, which will ensure all outstanding (i.e. all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states) reach the `LWB_STATE_DONE` state before the lwb's are freed (`zil_commit` will not return until all the lwb's are `LWB_STATE_DONE`). Further, this race condition is prevented in `zil_sync` by only calling `zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set. All lwb's not in the `LWB_STATE_DONE` state will have a non-null value for this pointer; the pointer is only cleared in `zil_lwb_flush_vdevs_done`, at which point the lwb's state will be changed to `LWB_STATE_DONE`. This race is present in `zil_suspend`, leading to this bug. At first glance, it would appear as though this would not be true because `zil_suspend` will call `zil_commit`, just like `zil_close`, but the problem is that `zil_suspend` will set the zilog's `zl_suspend` field prior to calling `zil_commit`. Further, in `zil_commit`, if `zl_suspend` is set, `zil_commit` will take a special branch of logic and use `txg_wait_synced` instead of performing the normal `zil_commit` logic. This call to `txg_wait_synced` might be good enough for the data to reach disk safely before it returns, but it does not ensure that all outstanding lwb's reach the `LWB_STATE_DONE` state before it returns. This is because, if there's an lwb "stuck" in `zil_commit_waiter_timeout`, waiting for its lwb to time out, it will maintain a non-null value for its `lwb_buf` field and thus `zil_sync` will not free that lwb. 
Thus, even though the lwb's data is already on disk, the lwb will be left lingering, waiting on the CV, and will eventually time out and be issued to disk even though the write is unnecessary. So, after `zil_commit` is called from `zil_suspend`, we incorrectly assume that there are no outstanding lwb's, and proceed to free all lwb's found on the zilog's lwb list. As a result, we free the lwb that will later be used by `zil_commit_waiter_timeout`. SOLUTION ======== The solution to this is to ensure all outstanding lwb's complete before calling `zil_free_lwb` via `zil_destroy` in `zil_suspend`. This patch accomplishes this goal by forcing the normal `zil_commit` logic when called from `zil_sync`. Now, `zil_suspend` will call `zil_commit_impl` which will always use the normal logic of waiting/issuing lwb's to disk before it returns. As a result, any lwb's outstanding when `zil_commit_impl` is called will be guaranteed to reach the `LWB_STATE_DONE` state by the time it returns. Further, no new lwb's will be created via `zil_commit` since the zilog's `zl_suspend` flag will be set. This will force all new callers of `zil_commit` to use `txg_wait_synced` instead of creating and issuing new lwb's. Thus, all lwb's left on the zilog's lwb list when `zil_destroy` is called will be in the `LWB_STATE_DONE` state, and we'll avoid this race condition. OpenZFS-issue: https://www.illumos.org/issues/8909 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ece62b6f8d Closes #6940	2017-12-28 10:18:04 -08:00
LOLi	c4ba46dead	Handle invalid options in arc_summary If an invalid option is provided to arc_summary.py we handle any error thrown from the getopt Python module and print the usage help message. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6983	2017-12-19 13:02:40 -08:00
LOLi	4e9b156960	Various ZED fixes * Teach ZED to handle spares using the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropriate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait for the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #2562 Closes #6858	2017-12-08 16:58:41 -08:00
LOLi	99834d1950	Fix truncate(2) mtime and ctime handling On Linux, ftruncate(2) always changes the file timestamps, even if the file size is not changed. However, in case of a successful truncate(2), the timestamps are updated only if the file size changes. This translates to the VFS calling the ZFS Posix Layer "setattr" function (zpl_setattr) with ATTR_MTIME and ATTR_CTIME unconditionally set on the iattr mask only when doing an ftruncate(2), while the truncate(2) is left to the filesystem implementation to be dealt with. This behaviour is consistent with POSIX:2004/SUSv3 specifications where there's no explicit requirement for file size changes to update the timestamps only for ftruncate(2): http://pubs.opengroup.org/onlinepubs/009695399/functions/truncate.html http://pubs.opengroup.org/onlinepubs/009695399/functions/ftruncate.html This has been later updated in POSIX:2008/SUSv4 where, for both truncate(2)/ftruncate(2), there's no mention of this size change requirement: http://austingroupbugs.net/view.php?id=489 http://pubs.opengroup.org/onlinepubs/9699919799/functions/truncate.html http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html Unfortunately the Linux VFS is still calling into the ZPL without ATTR_MTIME/ATTR_CTIME set in the truncate(2) case: we fix this by explicitly updating the timestamps when detecting the ATTR_SIZE bit, which is always set in do_truncate(), on the iattr mask. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6811 Closes #6819	2017-11-13 09:24:26 -08:00
George Melikov	b58b73ce74	Disable zpool_import_missing_003_pos Rarely observed failure of zpool_import_missing_003_pos during automated testing due to timeout. Disable the test case until it can be improved. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Issue #6839 Closes #6840	2017-11-07 10:32:04 -08:00
Giuseppe Di Natale	9a810efb02	Allow test-runner to filter test groups by tag Enable test-runner to accept a list of tags to identify which test groups the user wishes to run. Also allow test-runner to perform multiple iterations of a test run. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #6788	2017-11-03 09:53:32 -07:00
LOLi	ee45fbd894	ZFS send fails to dump objects larger than 128PiB When dumping objects larger than 128PiB it's possible for do_dump() to miscalculate the FREE_RECORD offset due to an integer overflow condition: this prevents the receiving end from correctly restoring the dumped object. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6760	2017-10-26 16:58:38 -07:00
LOLi	88f9c9396b	Allow 'zpool events' filtering by pool name Additionally add four new tests: * zpool_events_clear: verify 'zpool events -c' functionality * zpool_events_cliargs: verify command line options and arguments * zpool_events_follow: verify 'zpool events -f' * zpool_events_poolname: verify events filtering by pool name Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #3285 Closes #6762	2017-10-26 16:49:33 -07:00
Arkadiusz Bubała	d3f2cd7e3b	Added no_scrub_restart flag to zpool reopen Added -n flag to zpool reopen that allows a running scrub operation to continue if there is a device with a Dirty Time Log. By default if a component device has a DTL and zpool reopen is executed all running scan operations will be restarted. Added functional tests for `zpool reopen` Tests cover the following scenarios: * `zpool reopen` without arguments, * `zpool reopen` with pool name as argument, * `zpool reopen` while scrubbing, * `zpool reopen -n` while scrubbing, * `zpool reopen -n` while resilvering, * `zpool reopen` with bad arguments. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com> Closes #6076 Closes #6746	2017-10-26 12:26:09 -07:00
David Quigley	d9daa7abcf	ZTS: Add auto-spare tests The ZED is expected to automatically kick in a hot spare device when there's one available in the pool and a sufficient number of read errors have been encountered. Use zinject to simulate the failure condition and verify the hot spare is used. auto_spare_001_pos.ksh: read IO errors, the vdev is FAULTED auto_spare_002_pos.ksh: read CHECKSUM errors, the vdev is DEGRADED Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: David Quigley <david.quigley@intel.com> Closes #6280	2017-10-23 11:42:37 -07:00
Tom Caputi	4807c0badb	Encryption patch follow-up * PBKDF2 implementation changed to OpenSSL implementation. * HKDF implementation moved to its own file and tests added to ensure correctness. * Removed libzfs's now unnecessary dependency on libzpool and libicp. * Ztest can now create and test encrypted datasets. This is currently disabled until issue #6526 is resolved, but otherwise functions as advertised. * Several small bug fixes discovered after enabling ztest to run on encrypted datasets. * Fixed coverity defects added by the encryption patch. * Updated man pages for encrypted send / receive behavior. * Fixed a bug where encrypted datasets could receive DRR_WRITE_EMBEDDED records. * Minor code cleanups / consolidation. Signed-off-by: Tom Caputi <tcaputi@datto.com>	2017-10-11 16:54:48 -04:00
Ned Bass	39f56627ae	receive_freeobjects() skips freeing some objects When receiving a FREEOBJECTS record, receive_freeobjects() incorrectly skips a freed object in some cases. Specifically, this happens when the first object in the range to be freed doesn't exist, but the second object does. This leaves an object allocated on disk on the receiving side which is unallocated on the sending side, which may cause receiving subsequent incremental streams to fail. The bug was caused by an incorrect increment of the object index variable when current object being freed doesn't exist. The increment is incorrect because incrementing the object index is handled by a call to dmu_object_next() in the increment portion of the for loop statement. Add test case that exposes this bug. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #6694 Closes #6695	2017-10-02 15:36:04 -07:00
LOLi	b59b22972d	Add 'zfs diff' coverage to the ZFS Test Suite This change adds four new tests to the ZTS: * zfs_diff_changes: verify type of changes displayed (-, +, R and M) * zfs_diff_cliargs: verify command line options and arguments * zfs_diff_timestamp: verify 'zfs diff -t' * zfs_diff_types: verify type of objects (files, dirs, pipes...) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6686	2017-09-28 13:04:14 -07:00
LOLi	3fd3e56cfd	Fix some ZFS Test Suite issues * Add 'zfs bookmark' coverage (zfs_bookmark_cliargs) * Add OpenZFS 8166 coverage (zpool_scrub_offline_device) * Fix "busy" zfs_mount_remount failures * Fix bootfs_003_pos, bootfs_004_neg, zdb_005_pos local cleanup * Update usage of $KEEP variable, add get_all_pools() function * Enable history_008_pos and rsend_019_pos (non-32bit builders) * Enable zfs_copies_005_neg, update local cleanup * Fix zfs_send_007_pos (large_dnode + OpenZFS 8199) * Fix rollback_003_pos (use dataset name, not mountpoint, to unmount) * Update default_raidz_setup() to work properly with more than 3 disks * Use $TEST_BASE_DIR instead of hardcoded (/var)/tmp for file VDEVs * Update usage of /dev/random to /dev/urandom Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Issue #6086 Closes #5658 Closes #6143 Closes #6421 Closes #6627 Closes #6632	2017-09-25 10:32:34 -07:00
Brian Behlendorf	5c214ae318	Fix volume WR_INDIRECT log replay The portion of the zvol_replay_write() handler responsible for replaying indirect log records for some reason never existed. As a result indirect log records were not being correctly replayed. This went largely unnoticed since the majority of zvol log records were of the type WR_COPIED or WR_NEED_COPY prior to OpenZFS 7578. This patch updates zvol_replay_write() to correctly handle these log records and adds a new test case which verifies volume replay to prevent any regression. The existing test case which verified replay on filesystem was renamed slog_replay_fs.ksh for clarity. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6603 Closes #6615	2017-09-08 15:07:00 -07:00
Brian Behlendorf	e0dd0a32a8	Revert "Handle new dnode size in incremental..." This reverts commit `65dcb0f67a` until a comprehensive fix is finalized. The stricter interior dnode detection in `4c5b89f59e` and the new test case added by this patch revealed an issue with resizing dnodes when receiving an incremental backup stream. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #6576	2017-09-07 10:00:54 -07:00
Olaf Faaland	4c5b89f59e	Improved dnode allocation and dmu_hold_impl() Refactor dmu_object_alloc_dnsize() and dnode_hold_impl() to simplify the code, fix errors introduced by commit `dbeb879` (PR #6117) interacting badly with large dnodes, and improve performance. * When allocating a new dnode in dmu_object_alloc_dnsize(), update the percpu object ID for the core's metadnode chunk immediately. This eliminates most lock contention when taking the hold and creating the dnode. * Correct detection of the chunk boundary to work properly with large dnodes. * Separate the dmu_hold_impl() code for the FREE case from the code for the ALLOCATED case to make it easier to read. * Fully populate the dnode handle array immediately after reading a block of the metadnode from disk. Subsequently the dnode handle array provides enough information to determine which dnode slots are in use and which are free. * Add several kstats to allow the behavior of the code to be examined. * Verify dnode packing in large_dnode_008_pos.ksh. Since the test is purely creates, it should leave very few holes in the metadnode. * Add test large_dnode_009_pos.ksh, which performs concurrent creates and deletes, to complement existing test which does only creates. With the above fixes, there is very little contention in a test of about 200,000 racing dnode allocations produced by tests 'large_dnode_008_pos' and 'large_dnode_009_pos'. 
name type data dnode_hold_dbuf_hold 4 0 dnode_hold_dbuf_read 4 0 dnode_hold_alloc_hits 4 3804690 dnode_hold_alloc_misses 4 216 dnode_hold_alloc_interior 4 3 dnode_hold_alloc_lock_retry 4 0 dnode_hold_alloc_lock_misses 4 0 dnode_hold_alloc_type_none 4 0 dnode_hold_free_hits 4 203105 dnode_hold_free_misses 4 4 dnode_hold_free_lock_misses 4 0 dnode_hold_free_lock_retry 4 0 dnode_hold_free_overflow 4 0 dnode_hold_free_refcount 4 57 dnode_hold_free_txg 4 0 dnode_allocate 4 203154 dnode_reallocate 4 0 dnode_buf_evict 4 23918 dnode_alloc_next_chunk 4 4887 dnode_alloc_race 4 0 dnode_alloc_next_block 4 18 The performance is slightly improved for concurrent creates with 16+ threads, and unchanged for low thread counts. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5396 Closes #6522 Closes #6414 Closes #6564	2017-09-05 16:15:04 -07:00
Ned Bass	65dcb0f67a	Handle new dnode size in incremental backup stream When receiving an incremental backup stream, call dmu_object_reclaim_dnsize() if an object's dnode size differs between the incremental source and target. Otherwise it may appear that a dnode which has shrunk is still occupying slots which are in fact free. This will cause a failure to receive new objects that should occupy the now-free slots. Add a test case to verify that an incremental stream containing objects with changed dnode sizes can be received without error. This test case fails without this change. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #6366 Closes #6576	2017-09-05 16:09:15 -07:00
LOLi	f763c3d1df	Fix range locking in ZIL commit codepath Since OpenZFS 7578 (`1b7c1e5`) if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr offset and length to the offset and length of the BIO from zvol_write()->zvol_log_write(): these offset and length are later used to take a range lock in zilog->zl_get_data function: zvol_get_data(). Now suppose we have a ZVOL with blocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERTion we make in dbuf_unoverride() is no longer valid because now dmu_sync() is called from zilog's get_data functions holding a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6238 Closes #6315 Closes #6356 Closes #6477	2017-08-21 08:59:48 -07:00
LOLi	08de8c16f5	Fix remounting snapshots read-write It's not enough to preserve/restore MS_RDONLY on the superblock flags to avoid remounting a snapshot read-write: be explicit about our intentions to the VFS layer so the readonly bit is updated correctly in do_remount_sb(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6510 Closes #6515	2017-08-17 14:28:17 -07:00
Tom Caputi	b525630342	Native Encryption for ZFS on Linux This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decrypt, and authenticate protected datasets. Each object set maintains a Merkle tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769	2017-08-14 10:36:48 -07:00
Brian Behlendorf	9631681b75	Fix dnode allocation race When performing concurrent object allocations using the new multi-threaded allocator and large dnodes it's possible to allocate overlapping large dnodes. This case should have been handled by detecting an error returned by dnode_hold_impl(). But that logic only checked the returned dnp was not-NULL, and the dnp variable was not reset to NULL when retrying. Resolve this issue by properly checking the return value of dnode_hold_impl(). Additionally, it was possible that dnode_hold_impl() would misreport a dnode as free when it was in fact in use. This could occur for two reasons: * The per-slot zrl_lock must be held over the entire critical section which includes the alloc/free until the new dnode is assigned to children_dnodes. Additionally, all of the zrl_lock's in the range must be held to protect moving dnodes. * The dn->dn_ot_type cannot be solely relied upon to check the type. When allocating a new dnode its type will be DMU_OT_NONE after dnode_create(). Only later when dnode_allocate() is called will it transition to the new type. This means there's a window when allocating where it can be mistaken for a free dnode. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6414 Closes #6439	2017-08-08 08:38:53 -07:00
Olaf Faaland	0582e40322	Add callback for zfs_multihost_interval Add a callback to wake all running mmp threads when zfs_multihost_interval is changed. This is necessary when the interval is changed from a very large value to a significantly lower one, while pools are imported that have the multihost property enabled. Without this commit, the mmp thread does not wake up and detect the new interval until after it has waited the old multihost interval time. A user monitoring mmp writes via the provided kstat would be led to believe that the changed setting did not work. Added a test in the ZTS under mmp to verify the new functionality is working. Added a test to ztest which starts and stops mmp threads, and calls into the code to signal sleeping mmp threads, to test for deadlocks or similar locking issues. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #6387	2017-07-25 13:22:20 -04:00
Olaf Faaland	ffb195c256	Release SCL_STATE in map_write_done() The config lock must be held for the duration of the MMP write. Since the I/Os are executed via map_nowait(), the done function is the only place where we know the write has completed. Since SCL_STATE is taken as reader, overlapping I/Os do not create a deadlock. The refcount is simply increased when new I/Os are queued and decreased when I/Os complete. Test case added which exercises the probe IO call path to verify the fix and prevent a regression. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #6394	2017-07-25 12:25:05 -04:00
Olaf Faaland	379ca9cf2b	Multi-modifier protection (MMP) Add multihost=on\|off pool property to control MMP. When enabled a new thread writes uberblocks to the last slot in each label, at a set frequency, to indicate to other hosts the pool is actively imported. These uberblocks are the last synced uberblock with an updated timestamp. Property defaults to off. During tryimport, find the "best" uberblock (newest txg and timestamp) repeatedly, checking for change in the found uberblock. Include the results of the activity test in the config returned by tryimport. These results are reported to user in "zpool import". Allow the user to control the period between MMP writes, and the duration of the activity test on import, via a new module parameter zfs_multihost_interval. The period is specified in milliseconds. The activity test duration is calculated from this value, and from the mmp_delay in the "best" uberblock found initially. Add a kstat interface to export statistics about Multiple Modifier Protection (MMP) updates. Include the last synced txg number, the timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV label that received the last MMP update, and the VDEV path. Abbreviated output below. $ cat /proc/spl/kstat/zfs/mypool/multihost 31 0 0x01 10 880 105092382393521 105144180101111 txg timestamp mmp_delay vdev_guid vdev_label vdev_path 20468 261337 250274925 68396651780 3 /dev/sda 20468 261339 252023374 6267402363293 1 /dev/sdc 20468 261340 252000858 6698080955233 1 /dev/sdx 20468 261341 251980635 783892869810 2 /dev/sdy 20468 261342 253385953 8923255792467 3 /dev/sdd 20468 261344 253336622 042125143176 0 /dev/sdab 20468 261345 253310522 1200778101278 2 /dev/sde 20468 261346 253286429 0950576198362 2 /dev/sdt 20468 261347 253261545 96209817917 3 /dev/sds 20468 261349 253238188 8555725937673 3 /dev/sdb Add a new tunable zfs_multihost_history to specify the number of MMP updates to store history for. 
By default it is set to zero meaning that no MMP statistics are stored. When using ztest to generate activity, for automated tests of the MMP function, some test functions interfere with the test. For example, the pool is exported to run zdb and then imported again. Add a new ztest function, "-M", to alter ztest behavior to prevent this. Add new tests to verify the new functionality. Tests provided by Giuseppe Di Natale. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #745 Closes #6279	2017-07-13 13:54:00 -04:00
LOLi	cf8738d853	Add port of FreeBSD 'volmode' property The volmode property may be set to control the visibility of ZVOL block devices. This allows switching ZVOL between three modes: full - existing fully functional behaviour (default) dev - hide partitions on ZVOL block devices none - not exposing volumes outside ZFS Additionally the new zvol_volmode module parameter can be used to control the default behaviour. This functionality can be used, for instance, on "backup" pools to avoid cluttering /dev with unneeded zd* devices. Original-patch-by: mav <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> FreeBSD-commit: https://github.com/freebsd/freebsd/commit/dd28e6bb Closes #1796 Closes #3438 Closes #6233	2017-07-12 13:05:37 -07:00
LOLi	92e43c1718	Fix 'zpool clear' on readonly pools Illumos 4080 inadvertently allows 'zpool clear' on readonly pools: fix this by reintroducing a check (POOL_CHECK_READONLY) in zfs_ioc_clear registration code. Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6306	2017-07-07 10:39:53 -07:00
Boris Protopopov	03928896e1	Call cv_signal() with mutex held In bqueue_dequeue(), call cv_signal() with bq_lock held. Re-enable rsend_009_pos to test the fix. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com> Closes #5887	2017-06-26 14:36:49 -07:00
Håkan Johansson	6eb6073a04	Allow add of raidz and mirror with same redundancy Allow new members to be added to a pool mixing raidz and mirror vdevs without giving -f, as long as they have matching redundancy. This case was missed in #5915, which only handled zpool create. Add zfstest zpool_add_010_pos.ksh, with test of zpool create followed by zpool add of mixed raidz and mirror vdevs. Add some more mixed raidz and mirror cases to zpool_create_006_pos.ksh. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Haakan Johansson <f96hajo@chalmers.se> Issue #5915 Closes #6181	2017-06-05 13:53:09 -07:00
Giuseppe Di Natale	099700d9df	zpool iostat/status -c improvements Users can now provide their own scripts to be run with 'zpool iostat/status -c'. User scripts should be placed in ~/.zpool.d to be included in zpool's default search path. Provide a script which can be used with 'zpool iostat\|status -c' that will return the type of device (hdd, ssd, file). Provide a script to get various values from smartctl when using 'zpool iostat/status -c'. Allow users to define the ZPOOL_SCRIPTS_PATH environment variable which can be used to override the default 'zpool iostat/status -c' search path. Allow the ZPOOL_SCRIPTS_ENABLED environment variable to enable or disable 'zpool status/iostat -c' functionality. Use the new smart script to provide the serial command. Install /etc/sudoers.d/zfs file which contains the sudoer rule for smartctl as a sample. Allow 'zpool iostat/status -c' tests to run in tree. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #6121 Closes #6153	2017-06-05 10:52:15 -07:00
LOLi	92aceb2a7e	Fix "snapdev" property issues When inheriting the "snapdev" property we don't always call zfs_prop_set_special(): this prevents device nodes from being created in certain situations. Because "snapdev" is the only special property that is also inheritable we need to call zfs_prop_set_special() even when we're not reverting it to the received value ('zfs inherit -S'). Additionally, fix a NULL pointer dereference accidentally introduced in `5559ba0` that can be triggered when setting the "snapdev" property to the value "hidden" twice. Finally, add a new test case "zvol_misc_snapdev" to the ZFS Test Suite. Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6131 Closes #6175 Closes #6176	2017-06-02 07:17:00 -07:00
Brian Behlendorf	261c013fbf	Revert "Fix "snapdev" property inheritance behaviour" This reverts commit `959f56b993`. An issue was uncovered by the new zvol_misc_snapdev test case which needs to be investigated and resolved. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6174 Issue #6131	2017-05-26 11:40:44 -07:00
LOLi	959f56b993	Fix "snapdev" property inheritance behaviour When inheriting the "snapdev" property we don't always call zfs_prop_set_special(): this prevents device nodes from being created in certain situations. Because "snapdev" is the only special property that is also inheritable we need to call zfs_prop_set_special() even when we're not reverting it to the received value ('zfs inherit -S'). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6131	2017-05-25 16:43:46 -07:00
Brian Behlendorf	3f03fc8df3	Add zpool events tests * events_001_pos - Verify the expected events are generated when invoking the various zpool sub-commands. These events must appear in `zpool event` and be consumed by the ZED. * events_002_pos - Verify the ZED consumes events which were generated while it wasn't running when it is started. Additionally, verify that events are only processed once. As part of this change the default.cfg used by the test suite was changed to a default.cfg.in file. This was needed so the install location of all zed scripts, not only the enabled ones, could be reliably determined. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6128	2017-05-22 12:34:42 -04:00
Brian Behlendorf	5a6d6cf839	Enable xattr tests Updated the xattr_common.ksh helper functions to use the attr command on Linux to manipulate xattrs. Added an xattr.cfg file and reworked the user/group functionality to be consistent with the existing delegate test cases. The intent of each test case was preserved. * xattr_001_pos, xattr_002_neg - Updated to verify xattr=on and xattr=sa style xattrs. * xattr_003_neg - Use user_run helper instead of su. * xattr_004_pos - Updated to work with ext2 xattrs. * xattr_007_neg - Updated to use attr instead of runat. * xattr_008_pos, xattr_009_neg8_pos, xattr_010_neg - Test cases disabled since they aren't applicable to Linux. * xattr_011_pos - Updated to expected behavior from GNU versions of the tested utilities. * xattr_012_pos - Updated to use xattrtest to create many small xattrs instead of a single large one. * xattr_013_pos - Updated to use attr instead of runat. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6128	2017-05-22 12:34:42 -04:00
Brian Behlendorf	95401cb6f7	Enable remaining tests Enable most of the remaining test cases which were previously disabled. The required fixes are as follows: * cache_001_pos - No changes required. * cache_010_neg - Updated to use losetup under Linux. Loopback cache devices are allowed, ZVOLs as cache devices are not. Disabled until all the builders pass reliably. * cachefile_001_pos, cachefile_002_pos, cachefile_003_pos, cachefile_004_pos - Set set_device_dir path in cachefile.cfg, updated CPATH1 and CPATH2 to reference unique files. * zfs_clone_005_pos - Wait for udev to create volumes. * zfs_mount_007_pos - Updated mount options to expected Linux names. * zfs_mount_009_neg, zfs_mount_all_001_pos - No changes required. * zfs_unmount_005_pos, zfs_unmount_009_pos, zfs_unmount_all_001_pos - Updated to expect -f to not unmount busy mount points under Linux. * rsend_019_pos - Observed to occasionally take a long time on both 32-bit systems and the kmemleak builder. * zfs_written_property_001_pos - Switched sync(1) to sync_pool. * devices_001_pos, devices_002_neg - Updated create_dev_file() helper for Linux. * exec_002_neg.ksh - Fixed mmap_exec.c to preserve errno. Updated test case to expect EPERM from Linux as described by mmap(2). * grow_pool_001_pos - Adding missing setup.ksh and cleanup.ksh scripts from OpenZFS. * grow_replicas_001_pos.ksh - Added missing $SLICE_* variables. * history_004_pos, history_006_neg, history_008_pos - Fixed by previous commits and were not enabled. No changes required. * zfs_allow_010_pos - Added missing spaces after assorted zfs commands in delegate_common.kshlib. * inuse_* - Illumos dump device tests skipped. Remaining test cases updated to correctly create required partitions. * large_files_001_pos - Fixed largest_file.c to accept EINVAL as well as EFBIG as described in write(2). * link_count_001 - Added nproc to required commands. * umountall_001 - Updated to use umount -a. 
* online_offline_001_* - Pull in OpenZFS change to file_trunc.c to make the '-c 0' option run the test in a loop. Included online_offline.cfg file in all test cases. * rename_dirs_001_pos - Updated to use the rename_dir test binary, pkill restricted to exact matches and total runtime reduced. * slog_013_neg, write_dirs_002_pos - No changes required. * slog_013_pos.ksh - Updated to use losetup under Linux. * slog_014_pos.ksh - ZED will not be running, manually degrade the damaged vdev as expected. * nopwrite_varying_compression, nopwrite_volume - Forced pool sync with sync_pool to ensure up to date property values. * Fixed typos in ZED log messages. Refactored zed_* helper functions to resolve all-syslog exit=1 errors in zedlog. * zfs_copies_005_neg, zfs_get_004_pos, zpool_add_004_pos, zpool_destroy_001_pos, largest_pool_001_pos, clone_001_pos.ksh, clone_001_pos, - Skip until layering pools on zvols is solid. * largest_pool_001_pos - Limited to 7eb pool, maximum supported size is 8eb-1 on Linux. * zpool_expand_001_pos, zpool_expand_003_neg - Requires additional support from the ZED, updated skip reason. * zfs_rollback_001_pos, zfs_rollback_002_pos - Properly cleanup busy mount points under Linux between test loops. * privilege_001_pos, privilege_003_pos, rollback_003_pos, threadsappend_001_pos - Skip with log_unsupported. * snapshot_016_pos - No changes required. * snapshot_008_pos - Increased LIMIT from 512K to 2M and added sync_pool to avoid false positives. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6128	2017-05-22 12:34:32 -04:00
Alek P	bec1067d54	Implemented zpool sync command This addition will enable us to sync an open TXG to the main pool on demand. The functionality is similar to 'sync(2)' but 'zpool sync' will return when data has hit the main storage instead of potentially just the ZIL as is the case with the 'sync(2)' cmd. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #6122	2017-05-19 12:33:11 -07:00
Tony Hutter	4a283c7f77	Force fault a vdev with 'zpool offline -f' This patch adds a '-f' option to 'zpool offline' to fault a vdev instead of bringing it offline. Unlike the OFFLINE state, the FAULTED state will trigger the FMA code, allowing for things like autoreplace and triggering the slot fault LED. The -f faults persist across imports, unless they were set with the temporary (-t) flag. Both persistent and temporary faults can be cleared with zpool clear. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #6094	2017-05-19 12:30:16 -07:00
Brian Behlendorf	8c54ddd33a	Enable additional test cases Enable additional test cases, in most cases this required a few minor modifications to the test scripts. In a few cases a real bug was uncovered and fixed. And in a handful of cases where pools are layered on pools the test case will be skipped until this is supported. Details below for each test case. * zpool_add_004_pos - Skip test on Linux until adding zvols to pools is fully supported and deadlock free. * zpool_add_005_pos.ksh - Skip dumpadm portion of the test which isn't relevant for Linux. The find_vfstab_dev, find_mnttab_dev, and save_dump_dev functions were updated accordingly for Linux. Add O_EXCL to the in-use check to prevent the -f (force) option from working for mounted filesystems and improve the resulting error. * zpool_add_006_pos - Update test case such that it doesn't depend on nested pools. Switch to truncate from mkfile to reduce space requirements and speed up the test case. * zpool_clear_001_pos - Speed up test case by filling filesystem to 25% capacity. * zpool_create_002_pos, zpool_create_004_pos - Use sparse files for file vdevs in order to avoid increasing the partition size. * zpool_create_006_pos - `6ba1ce9` allows raidz+mirror configs with similar redundancy. Updating the valid_args and forced_args cases. * zpool_create_008_pos - Disable overlapping partition portion. * zpool_create_011_neg - Fix to correctly create the extra partition. Modified zpool_vdev.c to use fstat64_blk() wrapper which includes the st_size even for block devices. * zpool_create_012_neg - Updated to properly find swap devices. * zpool_create_014_neg, zpool_create_015_neg - Updated to use swap_setup() and swap_cleanup() wrappers which do the right thing on Linux and Illumos. Removed '-n' option which succeeds under Linux due to differences in the in-use checks. * zpool_create_016_pos.ksh - Skipped test case isn't useful. * zpool_create_020_pos - Added missing / to cleanup() function. 
Remove cache file prior to test to ensure a clean environment and avoid false positives. * zpool_destroy_001_pos - Removed test case which creates a pool on a zvol. This is more likely to deadlock under Linux and has never been completely supported on any platform. * zpool_destroy_002_pos - 'zpool destroy -f' is unsupported on Linux. Mount points must not be busy in order to unmount them. * zfs_destroy_001_pos - Handle EBUSY error which can occur with volumes when racing with udev. * zpool_expand_001_pos, zpool_expand_003_neg - Skip test on Linux until adding zvols to pools is fully supported and deadlock free. The test could be modified to use loop-back devices but it would be preferable to use the test case as is for improved coverage. * zpool_export_004_pos - Updated test case such that it doesn't depend on nested pools. Normal file vdevs under /var/tmp are fine. * zpool_import_all_001_pos - Updated to skip partition 1, which is known as slice 2, on Illumos. This prevents overwriting the default TESTPOOL which was causing the failure. * zpool_import_002_pos, zpool_import_012_pos - No changes needed. * zpool_remove_003_pos - No changes needed * zpool_upgrade_002_pos, zpool_upgrade_004_pos - Root cause addressed by upstream OpenZFS commit `3b7f360`. * zpool_upgrade_007_pos - Disabled in test case due to known failure. Opened issue https://github.com/zfsonlinux/zfs/issues/6112 * zvol_misc_002_pos - Updated to use ext2. * zvol_misc_001_neg, zvol_misc_003_neg, zvol_misc_004_pos, zvol_misc_005_neg, zvol_misc_006_pos - Moved to skip list, these test cases could be updated to use Linux's crash dump facility. * zvol_swap_* - Updated to use swap_setup/swap_cleanup helpers. File creation switched from /tmp to /var/tmp. Enabled minimal useful tests for Linux, skip test cases which aren't applicable. 
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3484 Issue #5634 Issue #2437 Issue #5202 Issue #4034 Closes #6095	2017-05-11 14:27:57 -07:00
LOLi	a3eeab2de6	Add property overriding (-o|-x) to 'zfs receive' This allows users to specify "-o property=value" to override and "-x property" to exclude properties when receiving a zfs send stream. Both native and user properties can be specified. This is useful when using zfs send/receive for periodic backup/replication because it lets users change properties such as canmount, mountpoint, or compression without modifying the source. References: https://www.illumos.org/issues/2745 https://www.illumos.org/issues/3753 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #1350 Closes #5349	2017-05-09 16:21:09 -07:00
Brian Behlendorf	35b7842f68	Enable all zfs_destroy test cases * zfs_destroy_001_pos - Unable to reproduce the failures locally. Re-enabled to determine observed buildbot failure rate. * zfs_destroy_005_neg - Updated for expected Linux behavior. Busy mount points, even snapshots, are expected to fail. * zfs_destroy_010_pos - Resolved transient EBUSY with retry. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #5635 Issue #5893 Closes #6091	2017-05-03 18:27:59 -07:00
LOLi	dddef7d600	More ashift improvements This commit allows higher ashift values (up to 16) in 'zpool create'. The ashift value was previously limited to 13 (8K block) in `b41c990` because the limited number of uberblocks we could fit in the statically sized (128K) vdev label ring buffer could prevent the ability to safely roll back a pool to recover it. Since `b02fe35` the largest uberblock size we support is 8K: this allows us to store a minimum number of 16 uberblocks in the vdev label, even with higher ashift values. Additionally change 'ashift' pool property behaviour: if set it will be used as the default hint value in subsequent vdev operations ('zpool add', 'attach' and 'replace'). A custom ashift value can still be specified from the command line, if desired. Finally, fix a bug in add-o_ashift.ksh caused by a missing variable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #2024 Closes #4205 Closes #4740 Closes #5763	2017-05-03 09:31:05 -07:00
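The uberblock arithmetic in the commit above can be checked with a small sketch. The 128K ring size and the 8K uberblock cap come from the commit message; the 1K minimum uberblock size and the function name `uberblock_count` are assumptions made for illustration and are not stated in the message.

```python
# Sketch only: how many uberblocks fit in the label ring for a given ashift.
UBERBLOCK_RING_BYTES = 128 * 1024  # statically sized ring buffer (from the message)
MAX_UBERBLOCK_SHIFT = 13           # 8K largest uberblock size (from the message)
MIN_UBERBLOCK_SHIFT = 10           # ASSUMPTION: 1K smallest uberblock size


def uberblock_count(ashift, cap_shift=MAX_UBERBLOCK_SHIFT):
    """Return the number of uberblock slots for a vdev with this ashift."""
    shift = min(max(ashift, MIN_UBERBLOCK_SHIFT), cap_shift)
    return UBERBLOCK_RING_BYTES >> shift


for ashift in (9, 12, 13, 16):
    print(ashift, uberblock_count(ashift), uberblock_count(ashift, cap_shift=16))
```

With the 8K cap the count never drops below 16, whereas an uncapped ashift of 16 would leave room for only 2 uberblocks, which seems to be the rollback concern the message describes.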
Olaf Faaland	9d3f7b8791	Write label 2,3 uberblocks when vdev expands When vdev_psize increases, the location of labels 2 and 3 changes because their location is relative to the end of the device. The configs for labels 2 and 3 are written during the next spa_sync() because the vdev is added to the dirty config list. However, the uberblock rings are not re-written in their new location, leaving the device vulnerable to the beginning of the device being overwritten or damaged. This patch copies the uberblock ring from label 0 to labels 2 and 3, in their new locations, at the next sync after vdev_psize increases. Also, add a test zpool_expand_004_pos.ksh to confirm the uberblocks are copied. Reviewed-by: BearBabyLiu <liu.huang@zte.com.cn> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5108	2017-05-02 13:55:24 -07:00
Dan Kimmel	a7004725d0	OpenZFS 7252 - compressed zfs send / receive OpenZFS 7252 - compressed zfs send / receive OpenZFS 7628 - create long versions of ZFS send / receive options Authored by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: David Quigley <dpquigl@davequigley.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Approved by: Dan McDonald <danmcd@omniti.com> Reviewed by: David Quigley <dpquigl@davequigley.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Ported-by: bunder2015 <omfgbunder@gmail.com> Ported-by: Don Brady <don.brady@intel.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - Most of 7252 was already picked up during ABD work. This commit represents the gap from the final commit to openzfs. - Fixed split_large_blocks check in do_dump() - An alternate version of the write_compressible() function was implemented for Linux which does not depend on fio. The behavior of fio differs significantly based on the exact version. - mkholes was replaced with truncate for Linux. OpenZFS-issue: https://www.illumos.org/issues/7252 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5602294 Closes #6067	2017-04-26 12:31:43 -07:00
Tony Hutter	d6418de057	Prebaked scripts for zpool status/iostat -c This patch updates the "zpool status/iostat -c" commands to only run "pre-baked" scripts from the /etc/zfs/zpool.d directory (or wherever you install to). The scripts can only be run from -c as an unprivileged user (unless the ZPOOL_SCRIPTS_AS_ROOT environment var is set by root). This was done to encourage scripts to be written in such a way that normal users can use them, and to be cautious. If your script needs to run a privileged command, consider adding the appropriate line in /etc/sudoers. See zpool(8) for an example of how to do this. The patch also allows the scripts to output custom column names. If the script outputs a line like: name=value then "name" is used for the column name, and "value" is its value. Multiple columns can be specified by outputting multiple lines. Column names and values can have spaces. If the value is empty, a dash (-) is printed instead. After all the "name=value" lines are read (if any), zpool will take the next line of output (if any) and print it without a column header. After that, no more lines will be processed. This can be useful for printing errors. Lastly, this patch also disables the -c option with the latency and request size histograms, since it produced awkward output and made the code harder to maintain. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #5852	2017-04-21 09:27:04 -07:00
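The script output rules described in the commit above can be sketched as a small parser. This is an illustration of the stated rules only, not the zpool implementation: the function name `parse_script_output` is hypothetical, and treating a line with an empty name (for example `=x`) as the headerless line is an assumption.

```python
# Sketch only: interpret the output of a "zpool status/iostat -c" script.
def parse_script_output(text):
    """Return (columns, extra).

    columns: list of (name, value) pairs from leading "name=value" lines,
             with an empty value shown as "-".
    extra:   the first line after those pairs, printed without a column
             header, or None. Everything after it is ignored.
    """
    columns = []
    extra = None
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            columns.append((name, value if value else "-"))
        else:
            extra = line
            break
    return columns, extra


print(parse_script_output("temp=35\nhealth status=\nsome error\nignored"))
```

Names and values are kept as-is, so spaces in either survive, matching the note in the message that both may contain spaces.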
Brian Behlendorf	dd49132a1d	OpenZFS 7535 - need test for resumed send of top most filesystem Authored by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - zfs_share_001_pos.ksh - Older versions of exportfs will match multiple exports that share a common prefix. Reorder the 'fs' list so unshares occur from most to least unique. - zfs_share_005_pos.ksh - Enabled and updated for Linux. OpenZFS-issue: https://www.illumos.org/issues/7535 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ac89d1e Closes #5979	2017-04-12 08:47:42 -07:00
Yuri Pankov	dbb38f6605	OpenZFS 6865 - want zfs-tests cases for zpool labelclear command Authored by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - Updated 'zpool labelclear' and 'zdb -l' such that they attempt to find a vdev given solely its short name. This behavior is consistent with the upstream OpenZFS code and the test cases depend on it. The actual implementation differs slightly due to device naming conventions on Linux. - auto_online_001_pos, auto_replace_001_pos and add-o_ashift test cases updated to expect failure when no label exists. - read_efi_label() and zpool_label_disk_check() are read-only operations and should use O_RDONLY at open time to enforce this. - zpool_label_disk() and zpool_relabel_disk() write the partition information using O_DIRECT, an fsync() and page cache invalidation to ensure a consistent view of the device. - dump_label() in zdb should invalidate the page cache in order to get the authoritative label from disk. OpenZFS-issue: https://www.illumos.org/issues/6865 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95076c Closes #5981	2017-04-11 09:54:39 -07:00
LOLi	047187c1bd	Fix size inflation in spa_get_worst_case_asize() When we try to assign a new transaction to a TXG we must know beforehand if there is sufficient free space on disk. This is to decide, in dmu_tx_assign(), if we should reject the TX with ENOSPC. We rely on spa_get_worst_case_asize() to inflate the size of our logical writes by a factor of spa_asize_inflation which is calculated as: (VDEV_RAIDZ_MAXPARITY + 1) * SPA_DVAS_PER_BP * 2 == 24 The problem with the current implementation is that we don't take into account what happens with very small writes on VDEVs with large physical block sizes. Consider the case of writes to a dataset with recordsize=512, copies=3 on a VDEV with ashift=13 (usually SSD with 8K block size): every logical IO will end up allocating 3 * 8K = 24K on disk, so 512 bytes multiplied by 48, which is double the size we account for. If we allow this kind of write to be assigned a TX it is possible, when the pool is almost full, to trigger an allocation failure (ENOSPC) in the ZIO pipeline, which will in turn result in the whole pool being suspended. The bug is fixed by using, in spa_get_worst_case_asize(), the MAX() value chosen between the logical io size from zfs_write() and the maximum physical block size used among our VDEVs. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5941	2017-04-10 15:28:21 -07:00
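The numbers in the commit above can be reproduced with a short worked example. It is a sketch of the arithmetic only: the function names are hypothetical, the values 3 for VDEV_RAIDZ_MAXPARITY and SPA_DVAS_PER_BP are assumed because they yield the stated product of 24, and the "new" estimate follows the MAX() description in the message rather than the actual kernel code.

```python
# Sketch only: worst-case space accounting before and after the fix.
VDEV_RAIDZ_MAXPARITY = 3  # ASSUMPTION: consistent with the stated result of 24
SPA_DVAS_PER_BP = 3       # ASSUMPTION: consistent with the stated result of 24
SPA_ASIZE_INFLATION = (VDEV_RAIDZ_MAXPARITY + 1) * SPA_DVAS_PER_BP * 2


def worst_case_old(lsize):
    """Old estimate: inflate the logical size only."""
    return lsize * SPA_ASIZE_INFLATION


def worst_case_new(lsize, max_ashift):
    """New estimate: inflate MAX(logical size, largest physical block size)."""
    return max(lsize, 1 << max_ashift) * SPA_ASIZE_INFLATION


def actual_allocation(lsize, ashift, copies):
    """Space consumed when each copy is rounded up to a whole physical block."""
    block = 1 << ashift
    return -(-lsize // block) * block * copies


lsize, ashift, copies = 512, 13, 3
print(worst_case_old(lsize))                     # 12288 bytes reserved
print(actual_allocation(lsize, ashift, copies))  # 24576 bytes really allocated
print(worst_case_new(lsize, ashift))             # 196608 bytes reserved
```

The old estimate (12K) is half of the 24K actually allocated, which matches the "double the size we account for" remark, while the MAX()-based estimate comfortably covers it.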
Toomas Soome	8aab121821	OpenZFS 7404 - rootpool_007_neg, bootfs_006_pos and bootfs_008_neg tests fail with the loader project bits Authored by: Toomas Soome <tsoome@me.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Marcel Telka <marcel@telka.sk> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed-by: George Melikov <mail@gmelikov.ru> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - Removed gzip and zle compression restriction on bootfs datasets. Grub added support for these long ago. Any version of grub which understands lz4 also supports this. - Enabled rootpool tests in runfile but skipped by default in setup on Linux since they modify the rootpool. - bootfs_006_pos.ksh, striped pools are allowed as bootfs. OpenZFS-issue: https://www.illumos.org/issues/7404 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/55a424c Closes #5982	2017-04-07 14:18:19 -07:00
Sydney Vanda	7a4500a101	Added auto-replace FMA test for the ZFS Test Suite Also included are updates to auto-online test Automated auto-replace test to go along with ZED FMA integration (PR 4673) auto-replace_001.pos works using a scsi_debug device (the only usable virtual device currently due to whole_disk var needing to be set) Functionality for automated FMA auto-replace test to work with scsi_debug devs: Some functionality/exceptions needed to be added for automation of auto-replace to work correctly. In the test an alias vdev_id rule is added for any scsi_debug device which sets the phys_path="scsidebug" after a udevadm trigger command. A symlink is created for the vdev_id.conf file (in /etc/zfs/ by default) to be used in-tree for the test suite (/var/tmp/zfs/vdev_id.conf). "./scripts/zfs-helpers.sh -i" needs to be run before fault tests in the ZTS (to use udev rules in-tree) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: David Quigley <david.quigley@intel.com> Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com> Closes #5944	2017-04-05 16:18:19 -07:00
LOLi	ff61d1a495	Check ashift validity in 'zpool add' `df83110` added the ability to specify a custom "ashift" value from the command line in 'zpool add' and 'zpool attach'. This commit adds additional checks to the provided ashift to prevent invalid values from being used, which could result in disastrous consequences for the whole pool. Additionally provide ASHIFT_MAX and ASHIFT_MIN definitions in spa.h. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5878	2017-03-28 17:21:11 -07:00
Brian Behlendorf	4ffeb12fa8	Disable rsend_009_pos Test rsend_009_pos has been observed to fail pretty frequently when testing using a kmemleak enabled kernel. For the moment disable this test case until the underlying issue is resolved. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #5887 Closes #5934	2017-03-28 09:58:23 -07:00
Olaf Faaland	3c9e0d673e	Dump unique configurations and Uberblocks in zdb -lu For zdb -l, detect when the configuration nvlist in some label l (l>0) is the same as a configuration already dumped. If so, do not dump it. Make a similar check when dumping Uberblocks for zdb -lu. Check whether a label already dumped contains an identical Uberblock. If so, do not dump the Uberblock. When dumping a configuration or Uberblock, state which labels it is found in (0-3), for example: labels = 1 2 3 Detecting redundant uberblocks or configurations is accomplished by calculating checksums of the uberblocks and the packed nvlists containing the configuration. If there is nothing unique to be dumped for a label (ie the configuration and uberblocks have checksums matching those already dumped) print nothing for that label. With additional l's or u's, increase verbosity as follows: -l Dump each unique configuration only once. Indicate which labels it appears in. -ll In addition, dump label space usage stats. -lll Dump every configuration, unique or not. -u Dump each unique, valid, uberblock only once. Indicate which labels it appears in. -uu In addition, state which slots are invalid. -uuu Dump every uberblock, unique or not. -uuuu Dump the uberblock blockpointer (used to be -uuu) Make exit values conform to the manual page. Failing to unpack a configuration nvlist is considered an error, as well as failing to open or read from the device. Add three tests, zdb_00{3,4,5}_pos to verify the above functionality. An example of the output: ------------------------------------ LABEL 0 ------------------------------------ version: 5000 name: 'pool' state: 1 txg: 880 < ... redacted ... 
> features_for_read: com.delphix:hole_birth com.delphix:embedded_data labels = 0 Uberblock[0] magic = 0000000000bab10c version = 5000 txg = 0 guid_sum = 3038694082047428541 timestamp = 1487715500 UTC = Tue Feb 21 14:18:20 2017 labels = 0 1 2 3 Uberblock[4] magic = 0000000000bab10c version = 5000 txg = 772 guid_sum = 9045970794941528051 timestamp = 1487727291 UTC = Tue Feb 21 17:34:51 2017 labels = 0 < ... redacted ... > ------------------------------------ LABEL 1 ------------------------------------ version: 5000 name: 'pool' state: 1 txg: 14 < ... redacted ... > com.delphix:embedded_data labels = 1 2 3 Uberblock[4] magic = 0000000000bab10c version = 5000 txg = 4 guid_sum = 7793930272573252584 timestamp = 1487727521 UTC = Tue Feb 21 17:38:41 2017 labels = 1 2 3 < ... redacted ... > Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5738	2017-03-06 16:01:45 -08:00
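The deduplication described in the commit above (print each unique configuration or uberblock once, listing the labels it appears in) can be sketched as grouping by checksum. This is an illustration only: `group_by_checksum` is a hypothetical name, and SHA-256 stands in for whatever checksum zdb actually uses, which the message does not specify.

```python
# Sketch only: group identical per-label blobs by checksum, as "zdb -l" is
# described as doing for packed configuration nvlists and uberblocks.
import hashlib


def group_by_checksum(blobs):
    """blobs[i] is the raw data read from label i.

    Returns a list of (blob, [label numbers]) in first-seen order, so each
    unique blob is reported once together with the labels that hold it.
    """
    groups = {}
    for label, blob in enumerate(blobs):
        digest = hashlib.sha256(blob).digest()  # stand-in checksum
        groups.setdefault(digest, (blob, []))[1].append(label)
    return list(groups.values())


for blob, labels in group_by_checksum([b"cfg-new", b"cfg-old", b"cfg-old", b"cfg-old"]):
    print(blob, "labels =", " ".join(map(str, labels)))
```

A label whose blobs have all been seen already contributes no new group, which corresponds to printing nothing for that label at the default verbosity.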
Sydney Vanda	ec0e24c232	Add auto-online test for ZED/FMA as part of the ZTS Automated auto-online test to go along with ZED FMA integration (PR 4673) auto_online_001.pos works with real devices (sd- and mpath) and with non-real block devices (loop) by adding a scsi_debug device to the pool Note: In order for test group to run, ZED must not currently be running. Kernel 3.16.37 or higher needed for scsi_debug to work properly If timeout occurs on test using a scsi_debug device (error noticed on Ubuntu system), a reboot might be needed in order for test to pass. (more investigation into this) Also suppressed output from is_real_device/is_loop_device/is_mpath_device - was making the log file very cluttered with useless error messages "ie /dev/mapper/sdc is not a block device" from previous patch Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: David Quigley <david.quigley@intel.com> Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com> Closes #5774	2017-02-28 16:25:39 -08:00
John Wren Kennedy	9060917189	OpenZFS 7248 - large block support breaks rsend_009_pos 7249 rsend_015_pos produces false failures due to race 7250 testrunner can miss options specific to individual tests in runfiles Authored by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7248 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f9a78bf Closes #5799	2017-02-15 17:28:36 -08:00
Olaf Faaland	a454868b0c	Use file-based pools for zpool_expand test 002 and enable it Use -pH flags in get_pool_prop so that numeric properties such as size can be compared. The zpool_expand test suite is currently the only one which uses get_pool_prop for a numeric property. Add TEMPFILE and TEMPFILE{0,1,2} to default.cfg for tests that must build pools on top of files, such as this one where expansion is necessary but the entries in DISKS may not point to entities that can be expanded. Base the pool used for testing on file-type VDEVs instead of using zvols within an underlying pool, to avoid issues that come up when pools are backed by other pools. Remove shell variables EX_1GB and EX_2GB used to recognize correct expansion, and instead calculate the appropriate values based on the variables used to control file or volume size, org_size and exp_size. This change is also made in test 001 although that test is not enabled because it depends on FMA. Finally, enable zpool_expand_002_pos. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5757	2017-02-13 15:30:22 -08:00
George Melikov	501558ee6e	Disable racy snapshot_008_pos Sometimes zfstests check freed space just after `zfs destroy snapshot` and get wrong output, because the space is being freed asynchronously in the background. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Issue #5740 Issue #5784 Closes #5785	2017-02-13 12:02:22 -08:00
Matthew Ahrens	a115cf35f8	OpenZFS 7162 - Intermittent failures from ro_props_001_pos Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/7162 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9ec0cbeb Closes #5511 Closes #5779	2017-02-13 11:26:45 -08:00
Simon Klinkert	449705dbef	OpenZFS 5704 - libzfs can only handle 255 file descriptors Authored by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/5704 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/bde3d61 Closes #5767	2017-02-10 10:54:30 -08:00
Matthew Ahrens	d7958b4cda	OpenZFS 7104 - increase indirect block size Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7104 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4b5c8e9 Closes #5679	2017-02-09 10:27:02 -08:00
Brian Behlendorf	b0eac56a4d	Move ziltest.sh to the ZTS framework The ziltest.sh script is a test case designed to verify the correct functioning of the ZIL. For historical reasons it was never added to the test suite and was always run independently. This change rectifies that. The existing ziltest.sh has been translated into `slog_015_pos.ksh` and added to the existing slog test cases. Reviewed-by: Don Brady <don.brady@intel.com> Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5758	2017-02-08 17:28:22 -08:00
LOLi	582cc01416	Fix ZFS Test Suite failures caused by ksh brace expansion feature Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5669 Closes #5743	2017-02-06 10:08:10 -08:00
George Melikov	2e0e443ac4	OpenZFS 7247 - zfs receive of deduplicated stream fails Authored by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7247 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2ad25b4 Closes #5689 Porting notes: - tests/zfs-tests/tests/functional/cli_root/zfs_receive/zfs_receive_013_pos.ksh renamed as zfs_receive_015_pos.ksh, zfs_receive_013_pos.ksh is now used for OpenZFS test. - libzfs_sendrecv.c: SMALLEST_POSSIBLE_MAX_DDT_MB is always used for all 32-bit builds.	2017-02-04 09:10:24 -08:00
Chunwei Chen	933ec99951	Retire .write/.read file operations The .write/.read file operations callbacks can be retired since support for .read_iter/.write_iter and .aio_read/.aio_write has been added. The vfs_write()/vfs_read() entry functions will select the correct interface for the kernel. This is desirable because all VFS write/read operations now rely on common code. This change also adds the generic write checks to make sure that ulimits are enforced correctly on write. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5587 Closes #5673	2017-01-27 10:43:39 -08:00
George Melikov	774ee3c7ce	OpenZFS 7336 - vfork and O_CLOEXEC causes zfs_mount EBUSY Porting notes: - statvfs64 is replaced by statfs64. - ZFS_SUPER_MAGIC definition moved in include/sys/fs/zfs.h to share it between user and kernel space. Authored by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/7336 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dd862f6d Closes #5651	2017-01-26 12:28:29 -08:00
Brian Behlendorf	f925de3a20	Refresh Linux test suite runfile Associate disabled test cases with existing open issues, update comments to be consistent, disable a few additional test cases. The goal is for all enabled tests to pass 100% reliably. The following test cases have been disabled due to infrequent failures during automated testing. Several of these test cases were previously disabled only for the kmemleak builder but have subsequently been observed on other automated builders. - zfs_destroy_001_pos - https://github.com/zfsonlinux/zfs/issues/5635 - zfs_rename_006_pos - https://github.com/zfsonlinux/zfs/issues/5647 - zfs_rename_009_neg - https://github.com/zfsonlinux/zfs/issues/5648 - zpool_clear_001_pos - https://github.com/zfsonlinux/zfs/issues/5634 - zfs_allow_010_pos - https://github.com/zfsonlinux/zfs/issues/5646 - reservation_018_pos - https://github.com/zfsonlinux/zfs/issues/5642 - snapused_004_pos - https://github.com/zfsonlinux/zfs/issues/5513 - rsend_022_pos - https://github.com/zfsonlinux/zfs/issues/5654 - rsend_024_pos - https://github.com/zfsonlinux/zfs/issues/5665 - history_008_pos - https://github.com/zfsonlinux/zfs/issues/5658 - history_006_neg - https://github.com/zfsonlinux/zfs/issues/5657 - zfs_inherit_003_pos - https://github.com/zfsonlinux/zfs/issues/5669 Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5649	2017-01-26 12:25:35 -08:00
George Melikov	a39ce90660	OpenZFS 6880 - zdb incorrectly reports feature count mismatch when feature is disabled Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: George Melikov <mail@gmelikov.ru> OpenZFS-issue: https://www.illumos.org/issues/6880 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c5d1600 Closes #5641	2017-01-24 08:59:08 -08:00
Brian Behlendorf	4faf8b6f6f	Disable racy test cases The following test cases may currently fail for benign reasons. Disable them until they can be updated to run reliably. - ro_props_001_pos - only recently enabled in `ce43e88` - nopwrite_volume Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5614	2017-01-19 10:24:27 -08:00
ka7	4e33ba4c38	Fix spelling Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se> Closes #5547 Closes #5543	2017-01-03 11:31:18 -06:00
LOLi	3500a14595	Don't persist temporary pool name on devices Fix a regression accidentally introduced by `e0ab3ab`. Additionally, add a new script zpool_import_014_pos.ksh to the ZFS test suite to exercise 'zpool import -t' functionality. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5466 Closes #5515	2016-12-22 10:39:00 -08:00
LOLi	5f1346c299	Fix dsl_props_set_sync_impl to work with nested nvlist When iterating over the input nvlist in dsl_props_set_sync_impl() we don't preserve the nvpair name before looking up ZPROP_VALUE, so when we later go to process it nvpair_name() is always "value" and not the actual property name. This fixes a couple of bugs in zfs_ioc_recv(): * Received properties were not restored correctly when failing to receive an incremental send stream * Received properties were not completely replaced by the new ones when successfully receiving an incremental send stream Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #5497	2016-12-20 18:46:59 -08:00
Brian Behlendorf	a3823f428d	Fix file attributes This branch contains the following fixes/improvements. * Fix setting i_flags * Fix wrong operator in xvattr.h * Fix fchange macro in zpl_ioctl_setflags() * Added configure check to use inode_set_flags() * Added a test case for chattr for better test coverage Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5486 Closes #5470 Closes #5469	2016-12-19 13:01:10 -08:00
Tony Hutter	1528bfdb14	Don't run 'zpool iostat -c CMD' command on all vdevs, if vdevs specified zpool iostat allows you to specify only certain vdevs to display. Currently, if you run 'zpool iostat -c CMD vdev1 vdev2 ...' on specific vdevs, it will actually run the command on all vdevs, and just display the results for the vdevs you specify. This patch corrects the behavior to only run the command on the specified vdevs, and also enables the zpool_iostat_005_pos.ksh tests. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #5443	2016-12-16 16:10:45 -08:00
Chunwei Chen	b4d8e2be03	Add test for chattr Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2016-12-16 16:07:41 -08:00
Chunwei Chen	a806cb6a89	Don't count '@' for dataset namelen if not a snapshot Don't count '@' for dataset namelen if not a snapshot. This fixes making a pool unimportable when the dataset namelen is 255. Add test file for zfs create name length 255. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #5432 Closes #5456	2016-12-09 11:52:08 -07:00
ChaoyuZhang	6c09d3e5a0	Enable mountpoint_003_pos Update the test case to correctly interpret how Linux reports the mount options. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn> Closes #5410	2016-12-02 11:20:57 -07:00
ChaoyuZhang	ce43e88dd6	Enable ro_props_001_pos This script was disabled as the avail/used space changed slightly. Add sync_pool() and a short delay after snapshots are created to ensure everything in flight has been written. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn> Closes #5201 Closes #5419	2016-11-30 11:27:04 -07:00
LOLi	2f71caf2d9	Allow zfs unshare <protocol> -a Allow `zfs unshare <protocol> -a` command to share or unshare all datasets of a given protocol, nfs or smb. Additionally, enable most of ZFS Test Suite zfs_share/zfs_unshare test cases. To work around some Illumos-specific functionality ($SHARE/$UNSHARE) some function wrappers were added around them. Finally, fix an issue in smb_is_share_active() that would leave SMB shares exported when invoking 'zfs unshare -a'. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #3238 Closes #5367	2016-11-29 12:22:38 -07:00
ChaoyuZhang	ce4197c1ca	Enable user_property_002_pos The user_property_002_pos passes as expected. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn> Closes #5406	2016-11-18 16:25:06 -08:00
Chunwei Chen	ace1eae84c	Add support for O_TMPFILE Linux 3.11 added O_TMPFILE to open(2), which allows creating an unlinked file on a supported filesystem. It is basically doing open(2) and unlink(2) atomically. Filesystem support is added through i_op->tmpfile. We basically copy the create operation, except that we drop the link and name related parts and add the new node to the unlinked set. We also add support for linkat(2) to link a tmpfile. However, since all previous file operations will have skipped the ZIL, we force a txg_wait_synced to make sure we are sync safe. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2016-11-04 10:46:40 -07:00
LOLi	e4010f2719	Allow for '-o feature@<feature>=disabled' on the command line Sometimes it is desirable to specifically disable one or several features directly on the 'zpool create' command line. $ zpool create -o feature@<feature>=disabled ... Original-patch-by: Turbo Fredriksson <turbo@bayour.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #3460 Closes #5142 Closes #5324	2016-10-25 16:17:47 -07:00
Brian Behlendorf	66392d81f5	Disable zpool_upgrade_002_pos test case This test case frequently triggers issue #4034. There exists a fix for this which is in the process of being upstreamed. Until that fix is available disable the test case. Reviewed by: George Wilson <george.wilson@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5329 Issue #4034	2016-10-24 16:39:47 -07:00
Brian Behlendorf	13d9a004fe	Fix taskq creation failure in vdev_open_children() When creating and destroying pools in a tight loop it's possible to exhaust the number of allowed threads on a system. This results in taskq_create() failing and a NULL dereference. Resolve the issue by falling back to opening the vdevs all synchronously. Reviewed-by: Denys Rtveliashvili <denys@rtveliashvili.name> Reviewed-by: Håkan Johansson <f96hajo@chalmers.se> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/spl#521 Closes #4637	2016-10-24 13:28:58 -07:00
Akash Ayare	3691598e26	OpenZFS 6877 - zfs_rename_006_pos fails due to missing zvol snapshot device file Authored by: Akash Ayare <aayare@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn> Reviewed-by: yuxiang <guo.yong33@zte.com.cn> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Bug was caused due to a change in functionality. At some point, ZFS snapshots no longer created associated device files which were being used in the test. To resolve this issue, a clone of the snapshot can be produced which will also create the expected device files; then, the test will behave as it did historically. OpenZFS-issue: https://www.illumos.org/issues/6877 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2200f27 Closes #5275 Porting Notes: - Hardcoded /dev/zvol/rdsk changed to $ZVOL_RDEVDIR for compatibility. - Enabled in linux runfile.	2016-10-14 10:11:00 -07:00
Brian Behlendorf	7305538de3	Enable zfs_rename_002_pos, zfs_rename_005_neg, zfs_rename_007_pos These tests all pass once updated to wait for udev to create the expected links under /dev/zvol/. Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn> Reviewed-by: yuxiang <guo.yong33@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5275	2016-10-14 10:11:00 -07:00
liaoyuxiangqin	21237e9167	Enable quota_002_pos, quota_004_pos and quota_005_pos In this test the 'ls -ls' command was used to print the testfile size in blocks. Because the environment variable BLOCK_SIZE was set, the 'ls -ls' command detected this and output its block count as the number of 8192-byte blocks. Rather than change the variable name, the -k option was added to force ls to return 1k blocks. This has the additional advantage of behaving consistently across platforms. For additional details on GNU 'ls' behavior regarding block size: https://www.gnu.org/software/coreutils/manual/html_node/Block-size.html Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: yuxiang <guo.yong33@zte.com.cn> Closes #5269	2016-10-14 09:33:51 -07:00
Brian Behlendorf	5f014a0cc4	Enable zfs_receive_011_pos The zfs_receive_011_pos test can be enabled now that OpenZFS 6562 has been merged. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5276	2016-10-14 09:17:56 -07:00
liaoyuxiangqin	e8d3dcdfb1	Enable refquota_002_pos and refquota_004_pos The refquota_002_pos and refquota_004_pos test cases can pass without modification. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: yuxiang <guo.yong33@zte.com.cn> Closes #5273	2016-10-13 14:21:15 -07:00
Brian Behlendorf	52f1fe3cfd	Enable zfs_snapshot_008_neg and zfs_snapshot_009_pos (#5260) The zfs_snapshot_008_neg test case does not use nested pools and can be safely enabled. The zfs_snapshot_009_pos test case is also passing without modification. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5260	2016-10-11 09:32:31 -07:00
liaoyuxiangqin	45c90a6348	Enable reservation_012_pos, reservation_015_pos and reservation_016_pos Enable reservation_012_pos, reservation_015_pos and reservation_016_pos test cases which are passing. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: yuxiang <guo.yong33@zte.com.cn> Closes #5254	2016-10-11 09:28:49 -07:00
ChaoyuZhang	502291b32c	Enable readonly_001_pos Enable readonly_001_pos this test is now passing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>	2016-10-09 17:50:16 -07:00
LOLi	48f783de79	Fix uninitialized variable snapprops_nvlist in zfs_receive_one The variable snapprops_nvlist was never initialized, so properties were not applied to the received snapshot. Additionally, add zfs_receive_013_pos.ksh script to ZFS test suite to exercise 'zfs receive' functionality for user properties. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #4338	2016-10-07 10:05:06 -07:00
Brian Behlendorf	910a571578	Add python style checking Introduce a make recipe for flake8 to enable python style checking. Ensure all python scripts pass flake8. Return an error code of 0 for arcstat.py -v and dbufstat.py -v. Add test cases for python scripts. Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ian Lee <IanLee1521@gmail.com> Closes #5230	2016-10-07 09:54:02 -07:00
Jinshan Xiong	1de321e626	Add support for user/group dnode accounting & quota This patch tracks dnode usage for each user/group in the DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode accounting have the key prefixed with "obj-" followed by the UID/GID in string format (as done for the block accounting). A new SPA feature has been added for dnode accounting as well as a new ZPL version. The SPA feature must be enabled in the pool before upgrading the zfs filesystem. During the zfs version upgrade, a "quotacheck" will be executed by marking all dnodes as dirty. ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500 Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com> Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>	2016-10-07 09:45:13 -07:00
Giuseppe Di Natale	70c7714dca	Introduce tests for python scripts Implement tests to ensure that python scripts that are distributed with ZFS continue to at minimum run without errors. This will help prevent accidental breaking of these scripts. Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>	2016-10-06 13:11:57 -07:00
Tony Hutter	3c67d83a8a	OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Ported by: Tony Hutter <hutter2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Porting Notes: This code is ported on top of the Illumos Crypto Framework code: `b5e030c8db` The list of porting changes includes: - Copied module/icp/include/sha2/sha2.h directly from illumos - Removed from module/icp/algs/sha2/sha2.c: #pragma inline(SHA256Init, SHA384Init, SHA512Init) - Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since it now takes in an extra parameter. - Added CTASSERT() to assert.h for module/zfs/edonr_zfs.c - Added skein & edonr to libicp/Makefile.am - Added sha512.S. It was generated from sha512-x86_64.pl in Illumos. - Updated ztest.c with new fletcher_4_() args; used NULL for new CTX argument. - In icp/algs/edonr/edonr_byteorder.h, removed the #if defined(__linux) section to not #include the non-existent endian.h. - In skein_test.c, rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Fixup test files: - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>, - Remove <note.h> and define NOTE() as NOP. - Define u_longlong_t - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p" - Rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Remove "for isa in $($ISAINFO); do" stuff - Add/update Makefiles - Add some userspace headers like stdio.h/stdlib.h in places of sys/types.h. - EXPORT_SYMBOL _Init/_Update/_Final... routines in ICP modules. 
- Update scripts/zfs2zol-patch.sed - include <sys/sha2.h> in sha2_impl.h - Add sha2.h to include/sys/Makefile.am - Add skein and edonr dirs to icp Makefile - Add new checksums to zpool_get.cfg - Move checksum switch block from zfs_secpolicy_setprop() to zfs_check_settable() - Fix -Wuninitialized error in edonr_byteorder.h on PPC - Fix stack frame size errors on ARM32 - Don't unroll loops in Skein on 32-bit to save stack space - Add memory barriers in sha2.c on 32-bit to save stack space - Add filetest_001_pos.ksh checksum sanity test - Add option to write pseudorandom data in file_write utility	2016-10-03 14:51:15 -07:00
Brian Behlendorf	f4ce6d464c	Disable zpool_import_002_pos and ro_props_001_pos These test cases fail some percentage of the time resulting in automated testing failures. Disable the offending tests until they can be made reliable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #5201 Issue #5202 Closes #5194	2016-09-30 12:12:53 -07:00
liaoyuxiangqin	8a1cf1a560	Fix zfs_clone_010_pos.ksh to verify zfs clones property displays right Because the macro ZFS_MAXPROPLEN used in function print_dataset differs between platforms set it appropriately and calculate the expected number of passes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: yuxiang <guo.yong33@zte.com.cn> Closes #5154	2016-09-29 13:08:44 -07:00
ChaoyuZhang	db6597c6ea	Enable ro_props_001_pos and onoffs_001_pos Enable ro_props_001_pos and onoffs_001_pos which pass reliably. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn> Closes #5183	2016-09-29 12:56:48 -07:00
liaoyuxiangqin	f25bc4938d	Fix zfs_clone_010_pos.ksh to verify the space used by multiple copies The default blocksize in Linux is 1024 due to a GNU-ism. Setting the expected blocksize resolves the issue. As mentioned in the PR an alternate solution would be to set POSIXLY_CORRECT=1. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: yuxiang <guo.yong33@zte.com.cn> Closes #5167	2016-09-29 12:46:13 -07:00
candychencan	df7c4059cb	Enable property_alias_001_pos.ksh Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: candychencan <chen.can2@zte.com.cn> Closes #5175	2016-09-27 11:49:45 -07:00
cao	3ec68a4414	Update zfs destroy test scripts Update and enable zfs_destroy_0[08-13]_*.ksh. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn> Closes #5068	2016-09-22 15:28:34 -07:00
candychencan	84347be098	Fix zfs_destroy_001_pos.ksh Due to how the Linux VFS was designed busy mount points cannot be destroyed even when given the force option. Update the zfs_destroy_001_pos test case to expect this behavior when running under Linux. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: candychencan <chen.can2@zte.com.cn> Closes #5132	2016-09-21 13:51:53 -07:00
Brian Behlendorf	f448f8cddd	Disable zpool_upgrade_004_pos test case This test case frequently triggers issue #4034. Disable this test case until the root cause of this issue has been addressed. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4034 Closes #5120	2016-09-16 13:25:46 -07:00
Brian Behlendorf	9d69e9b268	Fix zhack argument processing The argument processing in zhack makes the assumption that getopt() will not permute argv. This isn't true for the GNU implementation of getopt() unless the optstring is prefixed with a '+', which is equivalent to setting the POSIXLY_CORRECT environment variable. In addition, update the usage() and optstrings to reflect the existing supported options. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: liaoyuxiangqin <guo.yong33@zte.com.cn> Closes #5047	2016-08-31 14:32:46 -07:00
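The argv permutation described in the zhack fix can be reproduced outside of C. The following is a minimal sketch using Python's standard `getopt` module, whose `gnu_getopt()` documents the same '+' prefix and POSIXLY_CORRECT behavior as GNU getopt(); the option letters and arguments are hypothetical and are not zhack's real options.

```python
import getopt
import os


def parse_permuting(argv, optstring):
    """GNU-style parse: options found after non-option arguments are still consumed."""
    return getopt.gnu_getopt(argv, optstring)


def parse_in_order(argv, optstring):
    """Prefix the optstring with '+' so parsing stops at the first non-option argument."""
    return getopt.gnu_getopt(argv, "+" + optstring)


if __name__ == "__main__":
    # POSIXLY_CORRECT would force in-order parsing for both calls, so clear it first.
    os.environ.pop("POSIXLY_CORRECT", None)
    argv = ["-d", "subcommand", "-r", "pool"]  # hypothetical command line
    print(parse_permuting(argv, "dr"))
    print(parse_in_order(argv, "dr"))
```

With the '+' prefix, everything after the first non-option argument is left untouched, which is what a program with its own subcommand options needs.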
cao	8fe453b671	Update zfs_destroy_004.ksh script Issue: Under Linux, the `destroy $fs` step in zfs_destroy_004.ksh fails. The key issue here is that the illumos kernel treats this case differently than the Linux kernel. On illumos you can unmount and destroy a filesystem which is busy and all consumers of it get EIO. On Linux the expected behavior is to prevent the unmount and destroy. Cause analysis: The test creates the $fs file system, mounts it at $mntp, and changes directory into $mntp. While the working directory is inside the mount, Linux does not allow $fs to be destroyed, regardless of the options passed to destroy. Solution: Expect the failure with log_mustnot $ZFS destroy $fs, then cd $olddir and destroy $fs. Signed-off-by: caoxuewen cao.xuewen@zte.com.cn Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5012	2016-08-30 15:35:54 -07:00
ChaoyuZhang	43cb1c1212	Update zfs_create_003_pos.ksh and zfs_create_006_pos.ksh As the scripts zfs_create_003_pos.ksh and zfs_create_006_pos.ksh run successfully on Linux, add them to the <linux.run> file to increase test coverage. Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5002	2016-08-30 15:29:27 -07:00
liuhuang	2158b165ed	Update zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh Update zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh to reflect the expected Linux behavior. The is_linux wrapper is used so the test case may be used on Linux and non-Linux platforms. Signed-off-by: liuhuang <liu.huang@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5000	2016-08-30 15:24:18 -07:00
Brian Behlendorf	a0cacb760a	Enable history test cases Updated test case history_001_pos.ksh so it can run in tree. The original test case assumed /usr/sbin/zfs and /usr/sbin/zpool were the only valid locations for these utilities. The same modification has already been made to history_common.kshlib. The only other failing test case was history_010_pos and that was the result of the ":linux" suffix not being appended when checking the long output in the test case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4882	2016-07-27 13:38:46 -07:00
Brian Behlendorf	8d9e124515	Enable zpool_upgrade test cases Creating the pool in a striped rather than mirrored configuration provides enough space for all upgrade tests to run. Test case zpool_upgrade_007_pos still fails and must be investigated so it has been left disabled. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4852	2016-07-14 14:05:13 -07:00
Dan McDonald	8c62a0d0f3	OpenZFS 6562 - Refquota on receive doesn't account for overage Authored by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Gordon Ross <gwr@nexenta.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6562 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5f7a8e6	2016-06-28 13:47:03 -07:00
Paul Dagnelie	e6d3a843d6	OpenZFS 6393 - zfs receive a full send as a clone Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Dan McDonald <danmcd@omniti.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6394 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68ecb2e	2016-06-28 13:47:03 -07:00
Matthew Ahrens	47dfff3b86	OpenZFS 2605, 6980, 6902 2605 want to resume interrupted zfs send Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Dan McDonald <danmcd@omniti.com> Ported-by: kernelOfTruth <kerneloftruth@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/2605 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9c3fd12 6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6980 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ea4a67f Porting notes: - All rsend and snapshot tests enabled and updated for Linux. - Fix misuse of input argument in traverse_visitbp(). - Fix ISO C90 warnings and errors. - Fix gcc 'missing braces around initializer' in 'struct send_thread_arg to_arg =' warning. - Replace 4 argument fletcher_4_native() with 3 argument version, this change was made in OpenZFS 4185 which has not been ported. - Part of the sections for 'zfs receive' and 'zfs send' was rewritten and reordered to approximate upstream. - Fix mktree xattr creation, 'user.' prefix required. - Minor fixes to newly enabled test cases - Long holds for volumes allowed during receive for minor registration.	2016-06-28 13:47:02 -07:00
Ned Bass	50c957f702	Implement large_dnode pool feature Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill blocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). 
A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. 
New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interface names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by a large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. 
* dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. 
The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. 
The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3542	2016-06-24 13:13:21 -07:00
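The slot arithmetic in the large_dnode description can be checked with a short sketch. It only restates numbers given in the commit message (512-byte slots, 16384-byte dnode blocks, dn_extra_slots being one less than dn_num_slots); the function names are hypothetical helpers for illustration, not ZFS APIs.

```python
DNODE_SLOT_SIZE = 512      # size of one dnode_phys_t slot, per the commit message
DNODE_BLOCK_SIZE = 16384   # current dnode block size, per the commit message


def num_slots(dnode_size):
    """Total slots consumed, following the in-memory dn_num_slots convention."""
    if dnode_size % DNODE_SLOT_SIZE != 0:
        raise ValueError("dnode size must be a multiple of 512")
    if not DNODE_SLOT_SIZE <= dnode_size <= DNODE_BLOCK_SIZE:
        raise ValueError("dnode size must be between 512 and 16384 bytes")
    return dnode_size // DNODE_SLOT_SIZE


def extra_slots(dnode_size):
    """On-disk dn_extra_slots convention: 0 for a legacy 512-byte dnode."""
    return num_slots(dnode_size) - 1


def is_valid_property_literal(dnode_size):
    """The dnodesize property accepts only powers of two from 1k to 16k."""
    in_range = 1024 <= dnode_size <= DNODE_BLOCK_SIZE
    return in_range and dnode_size & (dnode_size - 1) == 0


def worst_case_reads(dnode_count, dnode_size, needs_spill):
    """Dnode blocks read, plus one spill block read per dnode if spills are needed."""
    per_block = DNODE_BLOCK_SIZE // dnode_size
    blocks = -(-dnode_count // per_block)  # ceiling division
    return blocks + (dnode_count if needs_spill else 0)


if __name__ == "__main__":
    print(worst_case_reads(32, 512, needs_spill=True))    # 1 dnode block + 32 spill blocks
    print(worst_case_reads(32, 1024, needs_spill=False))  # 2 dnode blocks, no spill blocks
```

A 1536-byte dnode illustrates the gap between the user interface and the implementation: it occupies three slots internally but is not a legal literal for the dnodesize property.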
Gvozden Neskovic	ab9f4b0b82	SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ levels. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. The original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instruction sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept the first 3 options, and the other implementations can be set once the module has finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328	2016-06-21 09:27:26 -07:00
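A small sketch of reading the zfs_vdev_raidz_impl parameter follows. The commit message only states that the file lists the supported values and encloses the selected one in `[]`; the whitespace-separated layout and the sample string below are assumptions made for illustration, and both function names are hypothetical.

```python
RAIDZ_IMPL_PATH = "/sys/module/zfs/parameters/zfs_vdev_raidz_impl"


def parse_raidz_impl(text):
    """Return (selected, supported) from the parameter text.

    Assumes values are separated by whitespace and the selected one is
    wrapped in square brackets, e.g. "[fastest] original scalar".
    """
    selected = None
    supported = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        supported.append(token)
    return selected, supported


def read_raidz_impl(path=RAIDZ_IMPL_PATH):
    """Read and parse the live module parameter (requires the zfs module to be loaded)."""
    with open(path) as handle:
        return parse_raidz_impl(handle.read())


if __name__ == "__main__":
    sample = "[fastest] original scalar sse avx2\n"  # hypothetical file contents
    print(parse_raidz_impl(sample))
```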
Brian Behlendorf	f74b821a66	Add `zfs allow` and `zfs unallow` support ZFS allows for specific permissions to be delegated to normal users with the `zfs allow` and `zfs unallow` commands. In addition, non-privileged users should be able to run all of the following commands: * zpool [list | iostat | status | get] * zfs [list | get] Historically this functionality was not available on Linux. In order to add it the secpolicy_* functions needed to be implemented and mapped to the equivalent Linux capability. Only then could the permissions on `/dev/zfs` be relaxed and the internal ZFS permission checks used. Even with this change some limitations remain. Under Linux only the root user is allowed to modify the namespace (unless it's a private namespace). This means the mount, mountpoint, canmount, unmount, and remount delegations cannot be supported with the existing code. It may be possible to add this functionality in the future. This functionality was validated with the cli_user and delegation test cases from the ZFS Test Suite. These tests exhaustively verify each of the supported permissions which can be delegated and ensure only an authorized user can perform it. Two minor bug fixes were required for test-running.py. First, the Timer() object cannot be safely created in a `try:` block when there is an unconditional `finally` block which references it. Second, when running as a normal user also check for scripts using both the .ksh and .sh suffixes. Finally, existing users who are simulating delegations by setting group permissions on the /dev/zfs device should revert that customization when updating to a version with this change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #362 Closes #434 Closes #4100 Closes #4394 Closes #4410 Closes #4487	2016-06-07 09:16:52 -07:00
Tony Hutter	193a37cb24	Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues Update the zfs module to collect statistics on average latencies, queue sizes, and keep an internal histogram of all IO latencies. Along with this, update "zpool iostat" with some new options to print out the stats: -l: Include average IO latencies stats: total_wait disk_wait syncq_wait asyncq_wait scrub read write read write read write read write wait ----- ----- ----- ----- ----- ----- ----- ----- ----- - 41ms - 2ms - 46ms - 4ms - - 5ms - 1ms - 1us - 4ms - - 5ms - 1ms - 1us - 4ms - - - - - - - - - - - 49ms - 2ms - 47ms - - - - - - - - - - - - - 2ms - 1ms - - - 1ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- 1ms 1ms 1ms 413us 16us 25us - 5ms - 1ms 1ms 1ms 413us 16us 25us - 5ms - 2ms 1ms 2ms 412us 26us 25us - 5ms - - 1ms - 413us - 25us - 5ms - - 1ms - 460us - 29us - 5ms - 196us 1ms 196us 370us 7us 23us - 5ms - ----- ----- ----- ----- ----- ----- ----- ----- ----- -w: Print out latency histograms: sdb total disk sync_queue async_queue latency read write read write read write read write scrub ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ 1ns 0 0 0 0 0 0 0 0 0 ... 
33us 0 0 0 0 0 0 0 0 0 66us 0 0 107 2486 2 788 12 12 0 131us 2 797 359 4499 10 558 184 184 6 262us 22 801 264 1563 10 286 287 287 24 524us 87 575 71 52086 15 1063 136 136 92 1ms 152 1190 5 41292 4 1693 252 252 141 2ms 245 2018 0 50007 0 2322 371 371 220 4ms 189 7455 22 162957 0 3912 6726 6726 199 8ms 108 9461 0 102320 0 5775 2526 2526 86 17ms 23 11287 0 37142 0 8043 1813 1813 19 34ms 0 14725 0 24015 0 11732 3071 3071 0 67ms 0 23597 0 7914 0 18113 5025 5025 0 134ms 0 33798 0 254 0 25755 7326 7326 0 268ms 0 51780 0 12 0 41593 10002 10002 0 537ms 0 77808 0 0 0 64255 13120 13120 0 1s 0 105281 0 0 0 83805 20841 20841 0 2s 0 88248 0 0 0 73772 14006 14006 0 4s 0 47266 0 0 0 29783 17176 17176 0 9s 0 10460 0 0 0 4130 6295 6295 0 17s 0 0 0 0 0 0 0 0 0 34s 0 0 0 0 0 0 0 0 0 69s 0 0 0 0 0 0 0 0 0 137s 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------- -h: Help -H: Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space. -q: Include current number of entries in sync & async read/write queues, and scrub queue: syncq_read syncq_write asyncq_read asyncq_write scrubq_read pend activ pend activ pend activ pend activ pend activ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 0 0 78 29 0 0 0 0 0 0 0 0 78 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0 0 227 394 0 19 0 0 0 0 0 0 227 394 0 19 0 0 0 0 0 0 108 98 0 19 0 0 0 0 0 0 19 98 0 0 0 0 0 0 0 0 78 98 0 0 0 0 0 0 0 0 19 88 0 0 0 0 0 0 ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -p: Display numbers in parseable (exact) values. Also, update iostat syntax to allow the user to specify specific vdevs to show statistics for. The three options for choosing pools/vdevs are: Display a list of pools: zpool iostat ... [pool ...] Display a list of vdevs from a specific pool: zpool iostat ... 
[pool vdev ...] Display a list of vdevs from any pools: zpool iostat ... [vdev ...] Lastly, allow zpool command "interval" value to be floating point: zpool iostat -v 0.5 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4433	2016-05-12 12:36:32 -07:00
Joe Stein	e0ab3ab553	OpenZFS 6736 - ZFS per-vdev ZAPs 6736 ZFS per-vdev ZAPs Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/6736 https://github.com/openzfs/openzfs/commit/215198a Ported-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4515	2016-05-02 14:27:45 -07:00
Ned Bass	98f03691a4	Fix ZPL miswrite of default POSIX ACL Commit `4967a3e` introduced a typo that caused the ZPL to store the intended default ACL as an access ACL. Due to caching this problem may not become visible until the filesystem is remounted or the inode is evicted from the cache. Fix the typo and add a regression test. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4520	2016-04-18 11:26:55 -07:00
Chunwei Chen	2b54cb1451	Add zfs-tests for relatime Add atime_003_pos to test relatime=on. It runs check_atime_updated twice: the first check should succeed and the second should fail. We also modify atime_001_pos to run check_atime_updated twice, and both checks should succeed. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4482	2016-04-05 18:56:06 -07:00
Brian Behlendorf	c35b188246	Fix zpool_scrub_* test cases The zpool_scrub_002, zpool_scrub_003, zpool_scrub_004 test cases fail reliably when running against small pools or fast storage. This occurs because the scrub/resilver operation completes before subsequent commands can be run. A one second delay has been added to 10% of zio's in order to ensure the scrub/resilver operation will run for at least several seconds. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4450	2016-03-30 09:30:34 -07:00
Brian Behlendorf	505d9655c9	Fix zdb -e and zhack thread_init() This issue was caused by calling `thread_init()` and `thread_fini()` multiple times resulting in `kthread_key` being invalid. To resolve the issue the explicit calls to `thread_init()` and `thread_fini()` required by the `zpool` command have been moved into the command. Consumers such as `zdb` and `zhack` perform the same initialization through `kernel_init()` and `kernel_fini()`. Resolving this issue allows multiple additional test cases to be enabled. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #4331	2016-03-21 10:20:02 -07:00
Brian Behlendorf	99d0a9c39e	Disable zpool_add_004_pos test case This test case adds a zvol as a vdev to an existing pool. This use case is currently known to be racy. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-03-17 09:56:18 -07:00
Brian Behlendorf	6bb24f4dc7	Add the ZFS Test Suite Add the ZFS Test Suite and test-runner framework from illumos. This is a continuation of the work done by Turbo Fredriksson to port the ZFS Test Suite to Linux. While this work was originally conceived as a stand alone project, integrating it directly with the ZoL source tree has several advantages: * Allows the ZFS Test Suite to be packaged in zfs-test package. * Facilitates easy integration with the CI testing. * Users can locally run the ZFS Test Suite to validate ZFS. This testing should ONLY be done on a dedicated test system because the ZFS Test Suite in its current form is destructive. * Allows the ZFS Test Suite to be run directly in the ZoL source tree, enabling developers to iterate quickly during development. * Developers can easily add/modify tests in the framework as features are added or functionality is changed. The tests will then always be in sync with the implementation. Full documentation for how to run the ZFS Test Suite is available in the tests/README.md file. Warning: This test suite is designed to be run on a dedicated test system. It will make modifications to the system including, but not limited to, the following. * Adding new users * Adding new groups * Modifying the following /proc files: * /proc/sys/kernel/core_pattern * /proc/sys/kernel/core_uses_pid * Creating directories under / Notes: * Not all of the test cases are expected to pass and by default these test cases are disabled. The failures are primarily due to assumptions made for illumos which are invalid under Linux. * When updating these test cases it should be done in as generic a way as possible so the patch can be submitted back upstream. Most existing library functions have been updated to be Linux aware, and the following functions and variables have been added. * Functions: * is_linux - Used to wrap a Linux specific section. * block_device_wait - Waits for block devices to be added to /dev/. 
* Variables: Linux Illumos * ZVOL_DEVDIR "/dev/zvol" "/dev/zvol/dsk" * ZVOL_RDEVDIR "/dev/zvol" "/dev/zvol/rdsk" * DEV_DSKDIR "/dev" "/dev/dsk" * DEV_RDSKDIR "/dev" "/dev/rdsk" * NEWFS_DEFAULT_FS "ext2" "ufs" * Many of the disabled test cases fail because 'zfs/zpool destroy' returns EBUSY. This is largely caused by the asynchronous nature of device handling on Linux and is expected; the impacted test cases will need to be updated to handle this. * There are several test cases which have been disabled because they can trigger a deadlock. A primary example of this is to recursively create zpools within zpools. These tests have been disabled until the root issue can be addressed. * Illumos specific utilities such as (mkfile) should be added to the tests/zfs-tests/cmd/ directory. Custom programs required by the test scripts can also be added here. * SELinux should be either in permissive mode or disabled when running the tests. The test cases should be updated to conform to a standard policy. * Redundant test functionality has been removed (zfault.sh). * Existing test scripts (zconfig.sh) should be migrated to use the framework for consistency and ease of testing. * The DISKS environment variable currently only supports loopback devices because of how the ZFS Test Suite expects partitions to be named (p1, p2, etc). Support must be added to generate the correct partition name based on the device location and name. * The ZFS Test Suite is part of the illumos code base at: https://github.com/illumos/illumos-gate/tree/master/usr/src/test Original-patch-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #6 Closes #1534	2016-03-16 13:46:16 -07:00

297 Commits