Commit Graph

298 Commits

Author SHA1 Message Date
ChaoyuZhang ce43e88dd6 Enable ro_props_001_pos
This script was disabled as the avail/used space changed slightly.
Add sync_pool() and a short delay after snapshots are created to
ensure everything in flight has been written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5201 
Closes #5419
2016-11-30 11:27:04 -07:00
LOLi 2f71caf2d9 Allow zfs unshare <protocol> -a
Allow `zfs unshare <protocol> -a` command to share or unshare all datasets
of a given protocol, nfs or smb.

Additionally, enable most of ZFS Test Suite zfs_share/zfs_unshare test cases.
To work around some Illumos-specific functionalities ($SHARE/$UNSHARE) some
function wrappers were added around them.

Finally, fix and issue in smb_is_share_active() that would leave SMB shares
exported when invoking 'zfs unshare -a'

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3238 
Closes #5367
2016-11-29 12:22:38 -07:00
ChaoyuZhang ce4197c1ca Enable user_property_002_pos
The user_property_002_pos passes as expected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5406
2016-11-18 16:25:06 -08:00
Chunwei Chen ace1eae84c Add support for O_TMPFILE
Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on
supported filesystem. It's basically doing open(2) and unlink(2) atomically.

The filesystem support is added through i_op->tmpfile. We basically copy the
create operation except we get rid of the link and name related stuff and add
the new node to unlinked set.

We also add support for linkat(2) to link tmpfile. However, since all previous
file operation will skip ZIL, we force a txg_wait_synced to make sure we are
sync safe.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-11-04 10:46:40 -07:00
LOLi e4010f2719 Allow for '-o feature@<feature>=disabled' on the command line
Sometimes it is desirable to specifically disable one or several
features directly on the 'zpool create' command line.

$ zpool create -o feature@<feature>=disabled ...

Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3460 
Closes #5142 
Closes #5324
2016-10-25 16:17:47 -07:00
Brian Behlendorf 66392d81f5 Disable zpool_upgrade_002_pos test case
This test case frequently triggers issue #4034.  There exists a
fix for this which is in the process of being upstreamed.  Until
that fix is available disable the test case.

Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5329 
Issue #4034
2016-10-24 16:39:47 -07:00
Brian Behlendorf 13d9a004fe Fix taskq creation failure in vdev_open_children()
When creating and destroying pools in tight loop it's possible to
exhaust the number of allowed threads on a system.  This results
in taskq_create() failling and a NULL dereference.

Resolve the issue by falling back to opening the vdevs all
synchronously.

Reviewed-by: Denys Rtveliashvili <denys@rtveliashvili.name>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#521
Closes #4637
2016-10-24 13:28:58 -07:00
Akash Ayare 3691598e26 OpenZFS 6877 - zfs_rename_006_pos fails due to missing zvol snapshot device file
Authored by: Akash Ayare <aayare@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Bug was caused due to a change in functionality. At some point, ZFS
snapshots no longer created associated device files which were being
used in the test. To resolve this issue, a clone of the snapshot can be
produced which will also create the expected device files; then, the
test will behave as it did historically.

OpenZFS-issue: https://www.illumos.org/issues/6877
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2200f27
Closes #5275

Porting Notes:
- Hardcoded /dev/zvol/rdsk changed to $ZVOL_RDEVDIR for compatibility.
- Enabled in linux runfile.
2016-10-14 10:11:00 -07:00
Brian Behlendorf 7305538de3 Enable zfs_rename_002_pos, zfs_rename_005_neg, zfs_rename_007_pos
These tests all pass once updated to wait for udev to create the
expected linked under /dev/zvol/.

Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5275
2016-10-14 10:11:00 -07:00
liaoyuxiangqin 21237e9167 Enable quota_002_pos, quota_004_pos and quota_005_pos
In this test the 'ls -ls' command was used to print testfile size in
blocks.  Because the environment variable BLOCK_SIZE was set
the 'ls -ls' command detected this and output its block count as the
number of 8192 blocks.  Rather than change the variable name
the -k was was added to force ls to return 1k blocks.  This has the
additional advantage of behaving consistently across platforms.

For additional details on GNU 'ls' behavior regarding block size:

https://www.gnu.org/software/coreutils/manual/html_node/Block-size.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5269
2016-10-14 09:33:51 -07:00
Brian Behlendorf 5f014a0cc4 Enable zfs_receive_011_pos
The zfs_receive_011_pos test can be enabled now that OpenZFS 6562
has been merged.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5276
2016-10-14 09:17:56 -07:00
liaoyuxiangqin e8d3dcdfb1 Enable refquota_002_pos and refquota_004_pos
The refquota_002_pos and refquota_004_pos test cases can pass
without modification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5273
2016-10-13 14:21:15 -07:00
Brian Behlendorf 52f1fe3cfd Enable zfs_snapshot_008_neg and zfs_snapshot_009_pos (#5260)
The zfs_snapshot_008_neg test case does not use nested pools and
can be safely enabled.  The zfs_snapshot_009_pos test case is
also passing without modification.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5260
2016-10-11 09:32:31 -07:00
liaoyuxiangqin 45c90a6348 Enable reservation_012_pos, reservation_015_pos and reservation_016_pos
Enable reservation_012_pos, reservation_015_pos and reservation_016_pos
test cases which are passing.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5254
2016-10-11 09:28:49 -07:00
ChaoyuZhang 502291b32c Enable readonly_001_pos
Enable readonly_001_pos this test is now passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
2016-10-09 17:50:16 -07:00
LOLi 48f783de79 Fix uninitialized variable snapprops_nvlist in zfs_receive_one
The variable snapprops_nvlist was never initialized, so properties
were not applied to the received snapshot.

Additionally, add zfs_receive_013_pos.ksh script to ZFS test suite to exercise
'zfs receive' functionality for user properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4338
2016-10-07 10:05:06 -07:00
Brian Behlendorf 910a571578 Add python style checking
Introduce a make recipe for flake8 to enable python
style checking. Ensure all python scripts pass flake8.
Return an error code of 0 for arcstat.py -v and
dbufstat.py -v.  Add test cases for python scripts.

Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ian Lee <IanLee1521@gmail.com>
Closes #5230
2016-10-07 09:54:02 -07:00
Jinshan Xiong 1de321e626 Add support for user/group dnode accounting & quota
This patch tracks dnode usage for each user/group in the
DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode
accounting have the key prefixed with "obj-" followed by the UID/GID
in string format (as done for the block accounting).
A new SPA feature has been added for dnode accounting as well as
a new ZPL version. The SPA feature must be enabled in the pool
before upgrading the zfs filesystem. During the zfs version upgrade,
a "quotacheck" will be executed by marking all dnode as dirty.

ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
2016-10-07 09:45:13 -07:00
Giuseppe Di Natale 70c7714dca Introduce tests for python scripts
Implement tests to ensure that python scripts
that are distributed with ZFS continue to at
minimum run without errors. This will help prevent
accidental breaking of these scripts.

Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
2016-10-06 13:11:57 -07:00
Tony Hutter 3c67d83a8a OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported by: Tony Hutter <hutter2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/4185
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee

Porting Notes:
This code is ported on top of the Illumos Crypto Framework code:

    b5e030c8db

The list of porting changes includes:

- Copied module/icp/include/sha2/sha2.h directly from illumos

- Removed from module/icp/algs/sha2/sha2.c:
	#pragma inline(SHA256Init, SHA384Init, SHA512Init)

- Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since
  it now takes in an extra parameter.

- Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c

- Added skein & edonr to libicp/Makefile.am

- Added sha512.S.  It was generated from sha512-x86_64.pl in Illumos.

- Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument.

- In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section
  to not #include the non-existant endian.h.

- In skein_test.c, renane NULL to 0 in "no test vector" array entries to get
  around a compiler warning.

- Fixup test files:
	- Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>,
	- Remove <note.h> and define NOTE() as NOP.
	- Define u_longlong_t
	- Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p"
	- Rename NULL to 0 in "no test vector" array entries to get around a
	  compiler warning.
	- Remove "for isa in $($ISAINFO); do" stuff
	- Add/update Makefiles
	- Add some userspace headers like stdio.h/stdlib.h in places of
	  sys/types.h.

- EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules.

- Update scripts/zfs2zol-patch.sed

- include <sys/sha2.h> in sha2_impl.h

- Add sha2.h to include/sys/Makefile.am

- Add skein and edonr dirs to icp Makefile

- Add new checksums to zpool_get.cfg

- Move checksum switch block from zfs_secpolicy_setprop() to
  zfs_check_settable()

- Fix -Wuninitialized error in edonr_byteorder.h on PPC

- Fix stack frame size errors on ARM32
  	- Don't unroll loops in Skein on 32-bit to save stack space
  	- Add memory barriers in sha2.c on 32-bit to save stack space

- Add filetest_001_pos.ksh checksum sanity test

- Add option to write psudorandom data in file_write utility
2016-10-03 14:51:15 -07:00
Brian Behlendorf f4ce6d464c Disable zpool_import_002_pos and ro_props_001_pos
These test cases fail some percentage of the time resulting
in automated testing failures.  Disable the offending tests
until they can be made reliable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5201 
Issue #5202 
Closes #5194
2016-09-30 12:12:53 -07:00
liaoyuxiangqin 8a1cf1a560 Fix zfs_clone_010_pos.ksh to verify zfs clones property displays right
Because the macro ZFS_MAXPROPLEN used in function print_dataset
differs between platforms set it appropriately and calculate the expected
number of passes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5154
2016-09-29 13:08:44 -07:00
ChaoyuZhang db6597c6ea Enable ro_props_001_pos and onoffs_001_pos
Enable ro_props_001_pos and onoffs_001_pos which pass reliably.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5183
2016-09-29 12:56:48 -07:00
liaoyuxiangqin f25bc4938d Fix zfs_clone_010_pos.ksh to verify the space used by multiple copies
The default blocksize in Linux is 1024 due to a GNU-ism.  Setting the
expected blocksize resolves the issue.  As mentioned in the PR an
alternate solution would be to set POSIXLY_CORRECT=1.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5167
2016-09-29 12:46:13 -07:00
candychencan df7c4059cb Enable property_alias_001_pos.ksh
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5175
2016-09-27 11:49:45 -07:00
cao 3ec68a4414 Update zfs destroy test scripts
Update and enable zfs_destroy_0[08-13]_*.ksh.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5068
2016-09-22 15:28:34 -07:00
candychencan 84347be098 Fix zfs_destroy_001_pos.ksh
Due to how the Linux VFS was designed busy mount points
cannot be destroyed even when given the force option.  Update
the zfs_destroy_001_pos test case to expect this behavior when
running under Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5132
2016-09-21 13:51:53 -07:00
Brian Behlendorf f448f8cddd Disable zpool_upgrade_004_pos test case
This test cause frequently triggers issue #4034.  Disable this
test case until the root cause of this issue has been addressed.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4034 
Closes #5120
2016-09-16 13:25:46 -07:00
Brian Behlendorf 9d69e9b268 Fix zhack argument processing
The argument processing is zhack makes the assumption that getopt()
will not permute argv.  This isn't true for the GNU implementation of
getopt() unless the optstring is prefixed with a '+'.  In which case
this is equivalent to setting the POSIXLY_CORRECT environment variable

In addition, update the usage() and optstrings to reflect the existing
supported options.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: liaoyuxiangqin <guo.yong33@zte.com.cn>
Closes #5047
2016-08-31 14:32:46 -07:00
cao 8fe453b671 Update zfs_destroy_004.ksh script
Issues:
Under Linux, when executing zfs_destroy_004.ksh destroy $fs is an
error.  The key issue here is that illumos kernel treats this case
differently than the Linux kernel. On illumos you can unmount and
destroy a filesystem which is busy and all consumers of it get EIO.
On Linux the expected behavior is to prevent the unmount and destroy.

Cause analysis:
When create $fs file system and mount file system to $mntp.
cd $mntp, linux isn't allow to destroy $fs in this mount contents.
No matter what destroy with parameters.

Solution:
So  log_mustnot $ZFS destroy $fs is ok.
cd $olddir and destroy $fs.

Signed-off-by: caoxuewen cao.xuewen@zte.com.cn
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5012
2016-08-30 15:35:54 -07:00
ChaoyuZhang 43cb1c1212 Update zfs_create_003_pos.ksh and zfs_create_006_pos.ksh
As the scripts zfs_create_003_pos.ksh and zfs_create_006_pos.ksh can
run successfully in the linux, add them to the <linux.run> file to
increase test scene.

Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5002
2016-08-30 15:29:27 -07:00
liuhuang 2158b165ed Update zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh
Update zfs_mount_005_pos.ksh and zfs_mount_010_neg.ksh to reflect
the expected Linux behavior.  The is_linux wrapper is used so the
test case may be used on Linux and non-Linux platforms.

Signed-off-by: liuhuang <liu.huang@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5000
2016-08-30 15:24:18 -07:00
Brian Behlendorf a0cacb760a Enable history test cases
Updated test case history_001_pos.ksh so it can run in tree.  The
original test case assumed /usr/sbin/zfs and /usr/sbin/zpool were
the only valid locations for these utilities.  The same modification
has already been made too history_common.kshlib.

The only other failing test case was history_010_pos and that was
the result of the ":linux" suffix not being appended when checking
the long output in the test case.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4882
2016-07-27 13:38:46 -07:00
Brian Behlendorf 8d9e124515 Enable zpool_upgrade test cases
Creating the pool in a striped rather than mirrored configuration
provides enough space for all upgrade tests to run.  Test case
zpool_upgrade_007_pos still fails and must be investigated so
it has been left disabled.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4852
2016-07-14 14:05:13 -07:00
Dan McDonald 8c62a0d0f3 OpenZFS 6562 - Refquota on receive doesn't account for overage
Authored by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/6562
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5f7a8e6
2016-06-28 13:47:03 -07:00
Paul Dagnelie e6d3a843d6 OpenZFS 6393 - zfs receive a full send as a clone
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/6394
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68ecb2e
2016-06-28 13:47:03 -07:00
Matthew Ahrens 47dfff3b86 OpenZFS 2605, 6980, 6902
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/2605
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9c3fd12

6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/6980
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ea4a67f

Porting notes:
- All rsend and snapshop tests enabled and updated for Linux.
- Fix misuse of input argument in traverse_visitbp().
- Fix ISO C90 warnings and errors.
- Fix gcc 'missing braces around initializer' in
  'struct send_thread_arg to_arg =' warning.
- Replace 4 argument fletcher_4_native() with 3 argument version,
  this change was made in OpenZFS 4185 which has not been ported.
- Part of the sections for 'zfs receive' and 'zfs send' was
  rewritten and reordered to approximate upstream.
- Fix mktree xattr creation, 'user.' prefix required.
- Minor fixes to newly enabled test cases
- Long holds for volumes allowed during receive for minor registration.
2016-06-28 13:47:02 -07:00
Ned Bass 50c957f702 Implement large_dnode pool feature
Justification
-------------

This feature adds support for variable length dnodes. Our motivation is
to eliminate the overhead associated with using spill blocks.  Spill
blocks are used to store system attribute data (i.e. file metadata) that
does not fit in the dnode's bonus buffer. By allowing a larger bonus
buffer area the use of a spill block can be avoided.  Spill blocks
potentially incur an additional read I/O for every dnode in a dnode
block. As a worst case example, reading 32 dnodes from a 16k dnode block
and all of the spill blocks could issue 33 separate reads. Now suppose
those dnodes have size 1024 and therefore don't need spill blocks.  Then
the worst case number of blocks read is reduced to from 33 to two--one
per dnode block. In practice spill blocks may tend to be co-located on
disk with the dnode blocks so the reduction in I/O would not be this
drastic. In a badly fragmented pool, however, the improvement could be
significant.

ZFS-on-Linux systems that make heavy use of extended attributes would
benefit from this feature. In particular, ZFS-on-Linux supports the
xattr=sa dataset property which allows file extended attribute data
to be stored in the dnode bonus buffer as an alternative to the
traditional directory-based format. Workloads such as SELinux and the
Lustre distributed filesystem often store enough xattr data to force
spill bocks when xattr=sa is in effect. Large dnodes may therefore
provide a performance benefit to such systems.

Other use cases that may benefit from this feature include files with
large ACLs and symbolic links with long target names. Furthermore,
this feature may be desirable on other platforms in case future
applications or features are developed that could make use of a
larger bonus buffer area.

Implementation
--------------

The size of a dnode may be a multiple of 512 bytes up to the size of
a dnode block (currently 16384 bytes). A dn_extra_slots field was
added to the current on-disk dnode_phys_t structure to describe the
size of the physical dnode on disk. The 8 bits for this field were
taken from the zero filled dn_pad2 field. The field represents how
many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
This convention results in a value of 0 for 512 byte dnodes which
preserves on-disk format compatibility with older software.

Similarly, the in-memory dnode_t structure has a new dn_num_slots field
to represent the total number of dnode_phys_t slots consumed on disk.
Thus dn->dn_num_slots is 1 greater than the corresponding
dnp->dn_extra_slots. This difference in convention was adopted
because, unlike on-disk structures, backward compatibility is not a
concern for in-memory objects, so we used a more natural way to
represent size for a dnode_t.

The default size for newly created dnodes is determined by the value of
a new "dnodesize" dataset property. By default the property is set to
"legacy" which is compatible with older software. Setting the property
to "auto" will allow the filesystem to choose the most suitable dnode
size. Currently this just sets the default dnode size to 1k, but future
code improvements could dynamically choose a size based on observed
workload patterns. Dnodes of varying sizes can coexist within the same
dataset and even within the same dnode block. For example, to enable
automatically-sized dnodes, run

 # zfs set dnodesize=auto tank/fish

The user can also specify literal values for the dnodesize property.
These are currently limited to powers of two from 1k to 16k. The
power-of-2 limitation is only for simplicity of the user interface.
Internally the implementation can handle any multiple of 512 up to 16k,
and consumers of the DMU API can specify any legal dnode value.

The size of a new dnode is determined at object allocation time and
stored as a new field in the znode in-memory structure. New DMU
interfaces are added to allow the consumer to specify the dnode size
that a newly allocated object should use. Existing interfaces are
unchanged to avoid having to update every call site and to preserve
compatibility with external consumers such as Lustre. The new
interfaces names are given below. The versions of these functions that
don't take a dnodesize parameter now just call the _dnsize() versions
with a dnodesize of 0, which means use the legacy dnode size.

New DMU interfaces:
  dmu_object_alloc_dnsize()
  dmu_object_claim_dnsize()
  dmu_object_reclaim_dnsize()

New ZAP interfaces:
  zap_create_dnsize()
  zap_create_norm_dnsize()
  zap_create_flags_dnsize()
  zap_create_claim_norm_dnsize()
  zap_create_link_dnsize()

The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
spa_maxdnodesize() function should be used to determine the maximum
bonus length for a pool.

These are a few noteworthy changes to key functions:

* The prototype for dnode_hold_impl() now takes a "slots" parameter.
  When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
  ensure the hole at the specified object offset is large enough to
  hold the dnode being created. The slots parameter is also used
  to ensure a dnode does not span multiple dnode blocks. In both of
  these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
  these failure cases are only possible when using DNODE_MUST_BE_FREE.

  If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
  dnode_hold_impl() will check if the requested dnode is already
  consumed as an extra dnode slot by an large dnode, in which case
  it returns ENOENT.

* The function dmu_object_alloc() advances to the next dnode block
  if dnode_hold_impl() returns an error for a requested object.
  This is because the beginning of the next dnode block is the only
  location it can safely assume to either be a hole or a valid
  starting point for a dnode.

* dnode_next_offset_level() and other functions that iterate
  through dnode blocks may no longer use a simple array indexing
  scheme. These now use the current dnode's dn_num_slots field to
  advance to the next dnode in the block. This is to ensure we
  properly skip the current dnode's bonus area and don't interpret it
  as a valid dnode.

zdb
---
The zdb command was updated to display a dnode's size under the
"dnsize" column when the object is dumped.

For ZIL create log records, zdb will now display the slot count for
the object.

ztest
-----
Ztest chooses a random dnodesize for every newly created object. The
random distribution is more heavily weighted toward small dnodes to
better simulate real-world datasets.

Unused bonus buffer space is filled with non-zero values computed from
the object number, dataset id, offset, and generation number.  This
helps ensure that the dnode traversal code properly skips the interior
regions of large dnodes, and that these interior regions are not
overwritten by data belonging to other dnodes. A new test visits each
object in a dataset. It verifies that the actual dnode size matches what
was stored in the ztest block tag when it was created. It also verifies
that the unused bonus buffer space is filled with the expected data
patterns.

ZFS Test Suite
--------------
Added six new large dnode-specific tests, and integrated the dnodesize
property into existing tests for zfs allow and send/recv.

Send/Receive
------------
ZFS send streams for datasets containing large dnodes cannot be received
on pools that don't support the large_dnode feature. A send stream with
large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
unrecognized by an incompatible receiving pool so that the zfs receive
will fail gracefully.

While not implemented here, it may be possible to generate a
backward-compatible send stream from a dataset containing large
dnodes. The implementation may be tricky, however, because the send
object record for a large dnode would need to be resized to a 512
byte dnode, possibly kicking in a spill block in the process. This
means we would need to construct a new SA layout and possibly
register it in the SA layout object. The SA layout is normally just
sent as an ordinary object record. But if we are constructing new
layouts while generating the send stream we'd have to build the SA
layout object dynamically and send it at the end of the stream.

For sending and receiving between pools that do support large dnodes,
the drr_object send record type is extended with a new field to store
the dnode slot count. This field was repurposed from unused padding
in the structure.

ZIL Replay
----------
The dnode slot count is stored in the uppermost 8 bits of the lr_foid
field. The bits were unused as the object id is currently capped at
48 bits.

Resizing Dnodes
---------------
It should be possible to resize a dnode when it is dirtied if the
current dnodesize dataset property differs from the dnode's size, but
this functionality is not currently implemented. Clearly a dnode can
only grow if there are sufficient contiguous unused slots in the
dnode block, but it should always be possible to shrink a dnode.
Growing dnodes may be useful to reduce fragmentation in a pool with
many spill blocks in use. Shrinking dnodes may be useful to allow
sending a dataset to a pool that doesn't support the large_dnode
feature.

Feature Reference Counting
--------------------------
The reference count for the large_dnode pool feature tracks the
number of datasets that have ever contained a dnode of size larger
than 512 bytes. The first time a large dnode is created in a dataset
the dataset is converted to an extensible dataset. This is a one-way
operation and the only way to decrement the feature count is to
destroy the dataset, even if the dataset no longer contains any large
dnodes. The complexity of reference counting on a per-dnode basis was
too high, so we chose to track it on a per-dataset basis similarly to
the large_block feature.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3542
2016-06-24 13:13:21 -07:00
Gvozden Neskovic ab9f4b0b82 SIMD implementation of vdev_raidz generate and reconstruct routines
This is a new implementation of RAIDZ1/2/3 routines using x86_64
scalar, SSE, and AVX2 instruction sets. Included are 3 parity
generation routines (P, PQ, and PQR) and 7 reconstruction routines,
for all RAIDZ level. On module load, a quick benchmark of supported
routines will select the fastest for each operation and they will
be used at runtime. Original implementation is still present and
can be selected via module parameter.

Patch contains:
- specialized gen/rec routines for all RAIDZ levels,
- new scalar raidz implementation (unrolled),
- two x86_64 SIMD implementations (SSE and AVX2 instructions sets),
- fastest routines selected on module load (benchmark).
- cmd/raidz_test - verify and benchmark all implementations
- added raidz_test to the ZFS Test Suite

New zfs module parameters:
- zfs_vdev_raidz_impl (str): selects the implementation to use. On
  module load, the parameter will only accept first 3 options, and
  the other implementations can be set once module is finished
  loading. Possible values for this option are:
    "fastest" - use the fastest math available
    "original" - use the original raidz code
    "scalar" - new scalar impl
    "sse" - new SSE impl if available
    "avx2" - new AVX2 impl if available

See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to
get the list of supported values. If an implementation is not supported
on the system, it will not be shown. Currently selected option is
enclosed in `[]`.

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4328
2016-06-21 09:27:26 -07:00
Brian Behlendorf f74b821a66 Add `zfs allow` and `zfs unallow` support
ZFS allows for specific permissions to be delegated to normal users
with the `zfs allow` and `zfs unallow` commands.  In addition, non-
privileged users should be able to run all of the following commands:

  * zpool [list | iostat | status | get]
  * zfs [list | get]

Historically this functionality was not available on Linux.  In order
to add it the secpolicy_* functions needed to be implemented and mapped
to the equivalent Linux capability.  Only then could the permissions on
the `/dev/zfs` be relaxed and the internal ZFS permission checks used.

Even with this change some limitations remain.  Under Linux only the
root user is allowed to modify the namespace (unless it's a private
namespace).  This means the mount, mountpoint, canmount, unmount,
and remount delegations cannot be supported with the existing code.  It
may be possible to add this functionality in the future.

This functionality was validated with the cli_user and delegation test
cases from the ZFS Test Suite.  These tests exhaustively verify each
of the supported permissions which can be delegated and ensures only
an authorized user can perform it.

Two minor bug fixes were required for test-running.py.  First, the
Timer() object cannot be safely created in a `try:` block when there
is an unconditional `finally` block which references it.  Second,
when running as a normal user also check for scripts using the
both the .ksh and .sh suffixes.

Finally, existing users who are simulating delegations by setting
group permissions on the /dev/zfs device should revert that
customization when updating to a version with this change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #362 
Closes #434 
Closes #4100
Closes #4394 
Closes #4410 
Closes #4487
2016-06-07 09:16:52 -07:00
Tony Hutter 193a37cb24 Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues
Update the zfs module to collect statistics on average latencies, queue sizes,
and keep an internal histogram of all IO latencies.  Along with this, update
"zpool iostat" with some new options to print out the stats:

-l: Include average IO latencies stats:

 total_wait     disk_wait    syncq_wait    asyncq_wait  scrub
 read  write   read  write   read  write   read  write   wait
-----  -----  -----  -----  -----  -----  -----  -----  -----
    -   41ms      -    2ms      -   46ms      -    4ms      -
    -    5ms      -    1ms      -    1us      -    4ms      -
    -    5ms      -    1ms      -    1us      -    4ms      -
    -      -      -      -      -      -      -      -      -
    -   49ms      -    2ms      -   47ms      -      -      -
    -      -      -      -      -      -      -      -      -
    -    2ms      -    1ms      -      -      -    1ms      -
-----  -----  -----  -----  -----  -----  -----  -----  -----
  1ms    1ms    1ms  413us   16us   25us      -    5ms      -
  1ms    1ms    1ms  413us   16us   25us      -    5ms      -
  2ms    1ms    2ms  412us   26us   25us      -    5ms      -
    -    1ms      -  413us      -   25us      -    5ms      -
    -    1ms      -  460us      -   29us      -    5ms      -
196us    1ms  196us  370us    7us   23us      -    5ms      -
-----  -----  -----  -----  -----  -----  -----  -----  -----

-w: Print out latency histograms:

sdb           total           disk         sync_queue      async_queue
latency    read   write    read   write    read   write    read   write   scrub
-------  ------  ------  ------  ------  ------  ------  ------  ------  ------
1ns           0       0       0       0       0       0       0       0       0
...
33us          0       0       0       0       0       0       0       0       0
66us          0       0     107    2486       2     788      12      12       0
131us         2     797     359    4499      10     558     184     184       6
262us        22     801     264    1563      10     286     287     287      24
524us        87     575      71   52086      15    1063     136     136      92
1ms         152    1190       5   41292       4    1693     252     252     141
2ms         245    2018       0   50007       0    2322     371     371     220
4ms         189    7455      22  162957       0    3912    6726    6726     199
8ms         108    9461       0  102320       0    5775    2526    2526      86
17ms         23   11287       0   37142       0    8043    1813    1813      19
34ms          0   14725       0   24015       0   11732    3071    3071       0
67ms          0   23597       0    7914       0   18113    5025    5025       0
134ms         0   33798       0     254       0   25755    7326    7326       0
268ms         0   51780       0      12       0   41593   10002   10002       0
537ms         0   77808       0       0       0   64255   13120   13120       0
1s            0  105281       0       0       0   83805   20841   20841       0
2s            0   88248       0       0       0   73772   14006   14006       0
4s            0   47266       0       0       0   29783   17176   17176       0
9s            0   10460       0       0       0    4130    6295    6295       0
17s           0       0       0       0       0       0       0       0       0
34s           0       0       0       0       0       0       0       0       0
69s           0       0       0       0       0       0       0       0       0
137s          0       0       0       0       0       0       0       0       0
-------------------------------------------------------------------------------

-h: Help

-H: Scripted mode. Do not display headers, and separate fields by a single
    tab instead of arbitrary space.

-q: Include current number of entries in sync & async read/write queues,
    and scrub queue:

 syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read
 pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
-----  -----  -----  -----  -----  -----  -----  -----  -----  -----
    0      0      0      0     78     29      0      0      0      0
    0      0      0      0     78     29      0      0      0      0
    0      0      0      0      0      0      0      0      0      0
    -      -      -      -      -      -      -      -      -      -
    0      0      0      0      0      0      0      0      0      0
    -      -      -      -      -      -      -      -      -      -
    0      0      0      0      0      0      0      0      0      0
-----  -----  -----  -----  -----  -----  -----  -----  -----  -----
    0      0    227    394      0     19      0      0      0      0
    0      0    227    394      0     19      0      0      0      0
    0      0    108     98      0     19      0      0      0      0
    0      0     19     98      0      0      0      0      0      0
    0      0     78     98      0      0      0      0      0      0
    0      0     19     88      0      0      0      0      0      0
-----  -----  -----  -----  -----  -----  -----  -----  -----  -----

-p: Display numbers in parseable (exact) values.

Also, update iostat syntax to allow the user to specify specific vdevs
to show statistics for.  The three options for choosing pools/vdevs are:

Display a list of pools:
    zpool iostat ... [pool ...]

Display a list of vdevs from a specific pool:
    zpool iostat ... [pool vdev ...]

Display a list of vdevs from any pools:
    zpool iostat ... [vdev ...]

Lastly, allow zpool command "interval" value to be floating point:
    zpool iostat -v 0.5

Signed-off-by: Tony Hutter <hutter2@llnl.gov
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4433
2016-05-12 12:36:32 -07:00
Joe Stein e0ab3ab553 OpenZFS 6736 - ZFS per-vdev ZAPs
6736 ZFS per-vdev ZAPs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/6736
  https://github.com/openzfs/openzfs/commit/215198a

Ported-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4515
2016-05-02 14:27:45 -07:00
Ned Bass 98f03691a4 Fix ZPL miswrite of default POSIX ACL
Commit 4967a3e introduced a typo that caused the ZPL to store the
intended default ACL as an access ACL. Due to caching this problem
may not become visible until the filesystem is remounted or the inode
is evicted from the cache. Fix the typo and add a regression test.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4520
2016-04-18 11:26:55 -07:00
Chunwei Chen 2b54cb1451 Add zfs-tests for relatime
Add atime_003_pos to test relatime=on, we do check_atime_updated twice, the
first time should success and the second time should fail. We also modify
atime_001_pos to do check_atime_updated twice and both times should succeed.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4482
2016-04-05 18:56:06 -07:00
Brian Behlendorf c35b188246 Fix zpool_scrub_* test cases
The zpool_scrub_002, zpool_scrub_003, zpool_scrub_004 test cases fail
reliably when running against small pools or fast storage.  This
occurs because the scrub/resilver operation completes before subsequent
commands can be run.

A one second delay has been added to 10% of zio's in order to ensure
the scrub/resilver operation will run for at least several seconds.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4450
2016-03-30 09:30:34 -07:00
Brian Behlendorf 505d9655c9 Fix zdb -e and zhack thread_init()
This issue was caused by calling `thread_init()` and `thread_fini()`
multiple times resulting in `kthread_key` being invalid.  To resolve
the issue the explicit calls to `thread_init()` and `thread_fini()`
required by the `zpool` command have been moved in to the command.
Consumers such as `zdb` and `zhack` perform the same initialized
through `kernel_init()` and `kernel_fini()`.

Resolving this issue allows multiple additional test cases to
be enabled.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #4331
2016-03-21 10:20:02 -07:00
Brian Behlendorf 99d0a9c39e Disable zpool_add_004_pos test case
This test case add a zvol to as a vdev to an existing pool.  This
use case is currently known to be racy.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-03-17 09:56:18 -07:00
Brian Behlendorf 6bb24f4dc7 Add the ZFS Test Suite
Add the ZFS Test Suite and test-runner framework from illumos.
This is a continuation of the work done by Turbo Fredriksson to
port the ZFS Test Suite to Linux.  While this work was originally
conceived as a stand alone project integrating it directly with
the ZoL source tree has several advantages:

  * Allows the ZFS Test Suite to be packaged in zfs-test package.
    * Facilitates easy integration with the CI testing.
    * Users can locally run the ZFS Test Suite to validate ZFS.
      This testing should ONLY be done on a dedicated test system
      because the ZFS Test Suite in its current form is destructive.
  * Allows the ZFS Test Suite to be run directly in the ZoL source
    tree enabled developers to iterate quickly during development.
  * Developers can easily add/modify tests in the framework as
    features are added or functionality is changed.  The tests
    will then always be in sync with the implementation.

Full documentation for how to run the ZFS Test Suite is available
in the tests/README.md file.

Warning: This test suite is designed to be run on a dedicated test
system.  It will make modifications to the system including, but
not limited to, the following.

  * Adding new users
  * Adding new groups
  * Modifying the following /proc files:
    * /proc/sys/kernel/core_pattern
    * /proc/sys/kernel/core_uses_pid
  * Creating directories under /

Notes:
  * Not all of the test cases are expected to pass and by default
    these test cases are disabled.  The failures are primarily due
    to assumption made for illumos which are invalid under Linux.
  * When updating these test cases it should be done in as generic
    a way as possible so the patch can be submitted back upstream.
    Most existing library functions have been updated to be Linux
    aware, and the following functions and variables have been added.
    * Functions:
      * is_linux          - Used to wrap a Linux specific section.
      * block_device_wait - Waits for block devices to be added to /dev/.
    * Variables:            Linux          Illumos
      * ZVOL_DEVDIR         "/dev/zvol"    "/dev/zvol/dsk"
      * ZVOL_RDEVDIR        "/dev/zvol"    "/dev/zvol/rdsk"
      * DEV_DSKDIR          "/dev"         "/dev/dsk"
      * DEV_RDSKDIR         "/dev"         "/dev/rdsk"
      * NEWFS_DEFAULT_FS    "ext2"         "ufs"
  * Many of the disabled test cases fail because 'zfs/zpool destroy'
    returns EBUSY.  This is largely causes by the asynchronous nature
    of device handling on Linux and is expected, the impacted test
    cases will need to be updated to handle this.
  * There are several test cases which have been disabled because
    they can trigger a deadlock.  A primary example of this is to
    recursively create zpools within zpools.  These tests have been
    disabled until the root issue can be addressed.
  * Illumos specific utilities such as (mkfile) should be added to
    the tests/zfs-tests/cmd/ directory.  Custom programs required by
    the test scripts can also be added here.
  * SELinux should be either is permissive mode or disabled when
    running the tests.  The test cases should be updated to conform
    to a standard policy.
  * Redundant test functionality has been removed (zfault.sh).
  * Existing test scripts (zconfig.sh) should be migrated to use
    the framework for consistency and ease of testing.
  * The DISKS environment variable currently only supports loopback
    devices because of how the ZFS Test Suite expects partitions to
    be named (p1, p2, etc).  Support must be added to generate the
    correct partition name based on the device location and name.
  * The ZFS Test Suite is part of the illumos code base at:
    https://github.com/illumos/illumos-gate/tree/master/usr/src/test

Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6
Closes #1534
2016-03-16 13:46:16 -07:00