Commit Graph

2876 Commits

Author SHA1 Message Date
Brian Behlendorf 76a87a902e Disable zio_dva_throttle_enabled by default
Until it can be determined definitively that a performance
regression wasn't introduced accidentally by 3dfb57a this
functionality is being disabled by default.  It can be re-
enabled by setting zio_dva_throttle_enabled=1.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5335 
Issue #5289
2016-10-26 09:13:43 -07:00
LOLi e4010f2719 Allow for '-o feature@<feature>=disabled' on the command line
Sometimes it is desirable to specifically disable one or several
features directly on the 'zpool create' command line.

$ zpool create -o feature@<feature>=disabled ...

Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #3460 
Closes #5142 
Closes #5324
2016-10-25 16:17:47 -07:00
jxiong 16fa68f07d Do not upgrade userobj accounting for snapshot dataset
'zfs recv' could disown a living objset without calling
dmu_objset_disown().  As a result, the objset could be released
while the upgrading thread is still running.

This patch avoids the problem by checking if a dataset is a snapshot
before calling dmu_objset_userobjspace_upgrade().  Snapshots
are immutable and therefore it doesn't make sense to update them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Closes #5295 
Closes #5328
2016-10-25 13:21:05 -07:00
Tony Hutter 6568379eea Fix statechange-led.sh & unnecessary libdevmapper warning
- Fix autoreplace behaviour on statechange-led.sh script.

ZED sends the following events on an auto-replace:

1. statechange: Disk goes UNAVAIL->ONLINE
2. statechange: Disk goes ONLINE->UNAVAIL
3. vdev_attach: Disk goes ONLINE

Events 1-2 happen when ZED first attempts to do an auto-online.  When that
fails, ZED then tries an auto-replace, generating the vdev_attach event in #3.

In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE
transition to turn off the LED.  It ignored the #2 ONLINE->UNAVAIL transition,
assuming it was just the "old" VDEV going offline.  This is problematic, as
a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want
to ignore that.

This new patch correctly turns on the fault LED every time a drive becomes
UNAVAIL.  It also monitors vdev_attach events to trigger turning off the LED
when an auto-replaced disk comes online.

- Remove unnecessary libdevmapper warning with --with-config=kernel

This fixes an unnecessary libdevmapper warning when building
--with-config=kernel.  Kernel code does not use libdevmapper, so the warning
is not needed.
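
For the first item above, the LED decision across the auto-replace event sequence can be sketched as a small model. This is only an illustration in Python: the function names are hypothetical and are not taken from statechange-led.sh, and it assumes that any disk coming ONLINE clears the LED.

```python
# Hypothetical model of the fault-LED decision; names are illustrative
# and do not come from statechange-led.sh.

def led_action(event, new_state):
    """Return 'on', 'off', or None for one ZED event on one vdev."""
    if event == "statechange" and new_state == "UNAVAIL":
        return "on"    # every transition to UNAVAIL lights the fault LED
    if event in ("statechange", "vdev_attach") and new_state == "ONLINE":
        return "off"   # assumption: a disk coming online clears the LED
    return None        # anything else leaves the LED untouched


def replay(events):
    """Apply (event, new_state) pairs in order and return the final LED state."""
    led = "off"
    for event, new_state in events:
        action = led_action(event, new_state)
        if action is not None:
            led = action
    return led


# The three events ZED sends on an auto-replace, as listed above.
auto_replace = [
    ("statechange", "ONLINE"),   # 1. UNAVAIL->ONLINE (auto-online attempt)
    ("statechange", "UNAVAIL"),  # 2. ONLINE->UNAVAIL (attempt failed)
    ("vdev_attach", "ONLINE"),   # 3. replacement disk attached
]
```

Replaying only the first two events leaves the LED on, which is the case the earlier code ignored.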

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2375 
Closes #5312 
Closes #5331
2016-10-25 11:05:30 -07:00
Jason Zaman 402c7c27b0 icp: mark asm files with noexec stack
Similar to commit a3600a106.  Asm files need an explicit note
that they do not require an executable stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Zaman <jason@perfinion.com>
Closes #5332
2016-10-25 10:44:09 -07:00
tuxoko 9fa4db44b7 Fix cred leak in zpl_fallocate_common
This is caught by kmemleak when running compress_004_pos

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5244 
Closes #5330
2016-10-24 16:41:56 -07:00
Brian Behlendorf 66392d81f5 Disable zpool_upgrade_002_pos test case
This test case frequently triggers issue #4034.  There exists a
fix for this which is in the process of being upstreamed.  Until
that fix is available disable the test case.

Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5329 
Issue #4034
2016-10-24 16:39:47 -07:00
cao aed0e9f3e4 Fix coverity defects: CID 147511, 147513
CID 147511: Type:Dereference before null check
CID 147513: Type:Dereference before null check

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5306
2016-10-24 13:37:38 -07:00
Brian Behlendorf 13d9a004fe Fix taskq creation failure in vdev_open_children()
When creating and destroying pools in a tight loop it's possible to
exhaust the number of allowed threads on a system.  This results
in taskq_create() failing and a NULL dereference.

Resolve the issue by falling back to opening the vdevs all
synchronously.
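
The fallback pattern can be sketched as follows. This is a Python model rather than the kernel code: `open_children`, `open_vdev`, and the `make_pool` factory are hypothetical names, and a pool factory that raises stands in for taskq_create() returning NULL.

```python
from concurrent.futures import ThreadPoolExecutor


def open_children(children, open_vdev, make_pool=ThreadPoolExecutor):
    """Open every child vdev; fall back to a serial loop if no pool is available."""
    try:
        pool = make_pool(max_workers=max(1, len(children)))
    except (RuntimeError, OSError):
        pool = None  # models taskq_create() failing

    if pool is None:
        # Synchronous fallback: slower, but never dereferences a missing pool.
        return [open_vdev(child) for child in children]

    with pool:
        # Normal path: open the children in parallel, results in input order.
        return list(pool.map(open_vdev, children))
```

Both paths return the same results in the same order, so callers do not need to know which one ran.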

Reviewed-by: Denys Rtveliashvili <denys@rtveliashvili.name>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#521
Closes #4637
2016-10-24 13:28:58 -07:00
Tony Hutter 1bbd877049 Turn on/off enclosure slot fault LED even when disk isn't present
Previously when a drive faulted, the statechange-led.sh script would lookup
the drive's LED sysfs entry in /sys/block/sd*/device/enclosure_device, and
turn it on.  During testing we noticed that if you pulled out a drive, or if
the drive was so badly broken that it no longer appeared to Linux, the
/sys/block/sd* path would be removed, and the script could not lookup the
LED entry.

To fix this, this patch looks up the disk's more persistent
"/sys/class/enclosure/X:X:X:X/Slot N" LED sysfs path at pool import.  It then
passes that path to the statechange-led script to use, rather than having the
script look it up on the fly.  This allows the script to turn on/off the slot
LEDs even when the drive is missing.

Closes #5309 
Closes #2375
2016-10-24 10:45:59 -07:00
Giuseppe Di Natale a85cefa35c Change location of current symlink created by test-runner
test-runner should be creating the current symlink in the
directory above the output directory. In a previous commit,
the current symlink was placed in the current working
directory, which could be inaccessible. It is more likely
that the output directory is always accessible.

This is needed because without this there's no deterministic
way to get the path to ZFS Test Suite results until after the
test suite has started. This makes it difficult for buildbot to
follow the log file.
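
A minimal sketch of placing the link one level above the output directory, assuming a POSIX system. The function name `link_current` is hypothetical and is not the code in test-runner.py.

```python
import os


def link_current(outputdir, name="current"):
    """Point <parent of outputdir>/current at outputdir, replacing an old link."""
    outputdir = os.path.abspath(outputdir)
    link = os.path.join(os.path.dirname(outputdir), name)
    if os.path.islink(link):
        os.unlink(link)  # drop the link left behind by an earlier run
    os.symlink(outputdir, link)
    return link
```

Because the link path depends only on the parent directory, a log follower can compute it before the run starts.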

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5314
2016-10-24 10:24:10 -07:00
Romain Dolbeau 24cdeaf12e Fletcher4 algorithm implemented in pure NEON for Aarch64 / ARMv8 64 bits
This is not useful on micro-architecture with a weak NEON
implementation (only 64 bits); the native version is slower &
the byteswap barely faster than scalar.  On A53 or A57, it's
a small improvement on scalar but OK for byteswap.

Results from an A53 system:
0 0 0x01 -1 0 1499068294333000 1499101101878000
implementation   native         byteswap       
scalar           1008227510     755880264      
aarch64_neon     1198098720     1044818671     
fastest          aarch64_neon   aarch64_neon 

Results from an A57 system:
0 0 0x01 -1 0 4407214734807033 4407233933777404
implementation   native         byteswap       
scalar           2302071241     1124873346     
aarch64_neon     2542214946     2245570352     
fastest          aarch64_neon   aarch64_neon 

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #5248
2016-10-21 10:55:49 -07:00
Brian Behlendorf e4ffa98dca Fix userquota_compare() function
The AVL tree compare function requires that either -1, 0, or 1 be
returned.  However the strcmp() function only guarantees that a
negative, zero, or positive value is returned.  Therefore, the
return value of strcmp() needs to be sanitized with AVL_ISIGN.

This was initially overlooked because the x86_64 implementation
of strcmp() happens to only return the allowed values.  This
was observed on an aarch64 platform which behaves correctly but
differently as described above.
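
The sanitizing step can be illustrated with a small Python model. `isign` plays the role the message assigns to AVL_ISIGN, and `raw_strcmp` is a hypothetical stand-in for a strcmp() that returns a byte difference instead of -1, 0, or 1.

```python
def isign(n):
    """Collapse any integer to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def raw_strcmp(a, b):
    """Stand-in for a strcmp() that returns an arbitrary signed difference."""
    for x, y in zip(a.encode(), b.encode()):
        if x != y:
            return x - y
    return len(a) - len(b)


def userquota_compare(a, b):
    """Comparator that is safe to hand to a tree expecting exactly -1, 0, or 1."""
    return isign(raw_strcmp(a, b))
```

Here `raw_strcmp("a", "z")` is a large negative number, while `userquota_compare("a", "z")` is exactly -1.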

Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5311 
Closes #5313
2016-10-21 08:23:27 -07:00
luozhengzheng 9523b15ac1 Fix coverity defects: CID 153459
CID 153459: Null pointer dereferences (FORWARD_NULL)
Accidentally introduced by #5159.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5310
2016-10-20 11:54:02 -07:00
cao 9d01680430 Fix coverity defects: CID 147551, 147552
CID 147551: Type:dereference null return value
CID 147552: Type:dereference null return value

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5279
2016-10-20 11:49:50 -07:00
cao 5a6765cf8c Fix coverity defects: CID 147472
CID 147472: Type: 'Constant' variable guards dead code

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5288
2016-10-20 11:24:01 -07:00
luozhengzheng 1f72394443 Fix coverity defects: CID 150919, 150923
CID 150919: Buffer not null terminated (BUFFER_SIZE_WARNING)
CID 150923: Buffer not null terminated (BUFFER_SIZE_WARNING)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5298
2016-10-20 11:09:39 -07:00
legend-hua 381823d6d2 Update migration_004_pos, migration_005_pos, migration_006_pos
Log function should be "log_fail", rather than "log_failED"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: legend-hua <liu.hua130@zte.com.cn>
Closes #5300
2016-10-20 11:04:30 -07:00
Brian Behlendorf 72ac461cbe Fix make distclean Makefile.am removal
The file tests/zfs-tests/tests/stress/Makefile.am gets mistakenly
removed by the distclean target because it's empty.  Adding a
`SUBDIRS =` line prevents the removal.

This directory is being preserved as the location to add assorted
stress tests.  These may include but are not limited to:

  http://kernel.ubuntu.com/~cking/stress-ng/
  https://github.com/zfsonlinux/zfsstress/

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5308
2016-10-20 09:55:03 -07:00
Brian Behlendorf 3b0ba3ba99 Linux 4.9 compat: inode_change_ok() renamed setattr_prepare()
In torvalds/linux@31051c8 the inode_change_ok() function was
renamed setattr_prepare() and updated to take a dentry rather
than an inode.  Update the code to call setattr_prepare()
and add a wrapper function which calls inode_change_ok() for
older kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Requires-spl: refs/pull/581/head
2016-10-20 09:39:09 -07:00
Chunwei Chen 0fedeedd30 Linux 4.9 compat: remove iops->{set,get,remove}xattr
In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and
generic_{set,get,remove}xattr are removed. xattr operations will directly
go through sb->s_xattr.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-10-20 09:39:09 -07:00
Chunwei Chen b8d9e26440 Linux 4.9 compat: iops->rename() wants flags
In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are
merged together into iops->rename(), which now takes flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-10-20 09:39:09 -07:00
Chunwei Chen 8ba3f2bf6a Remove dir inode operations from zpl_inode_operations
These operations are dir specific, there's no point putting them in
zpl_inode_operations which is for regular files.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2016-10-20 09:39:09 -07:00
Brian Behlendorf 9d70aec6fd Update .gitignore
Two additional files were recently introduced and should be
ignored by git.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5299
2016-10-19 14:29:33 -07:00
Tony Hutter 6078881aa1 Multipath autoreplace, control enclosure LEDs, event rate limiting
1. Enable multipath autoreplace support for FMA.

This extends FMA autoreplace to work with multipath disks.  This
requires libdevmapper to be installed at build time.

2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online

Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure
LED for a drive when a drive becomes FAULTED/DEGRADED.  Your enclosure must
be supported by the Linux SES driver for this to work.  The enclosure LED
scripts work for multipath devices as well.  The scripts will clear the LED
when the fault is cleared.

3. Rate limit ZIO delay and checksum events so as not to flood ZED

ZIO delay and checksum events are rate limited to 5/sec in the zfs module.
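
The message does not say how the limit is enforced. One simple way to cap events at 5 per second is a fixed-window counter, sketched below in Python; the class name and the window scheme are assumptions, not the zfs module's implementation.

```python
class RateLimit:
    """Fixed-window limiter: at most `burst` events per `interval` seconds."""

    def __init__(self, burst=5, interval=1.0):
        self.burst = burst
        self.interval = interval
        self.window_start = None  # time the current window opened
        self.count = 0            # events allowed in the current window

    def allow(self, now):
        """Return True if an event at time `now` may be posted."""
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now  # open a fresh window
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False  # over the limit: drop the event
```

Passing the clock in as `now` keeps the sketch deterministic and easy to test.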

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2449 
Closes #3017 
Closes #5159
2016-10-19 12:55:59 -07:00
luozhengzheng 7c502b0b1d Fix coverity defects: CID 150926
CID 150926: Unchecked return value (CHECKED_RETURN)
- This case cannot occur given the existing taskq implementation
  and flags passed to task_dispatch().

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5272
2016-10-18 11:32:59 -07:00
Brian Behlendorf 6d00b5e136 Fix unused variable
Accidentally introduced by 3dfb57a, when building with debugging
disabled several variables are unused.  Resolve this by wrapping
them in ASSERTV to remove them for non-debug builds.

Reviewed by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5284
2016-10-18 10:44:44 -07:00
GeLiXin 66826e2285 Fix coverity defects: CID 147643, 152204, 49339
CID 147643: Type: String not null terminated
- make sure that the string is null terminated before strlen
  and fprintf.

CID 152204: Type: Copy into fixed size buffer
- since strlcpy isn't available here, use strncpy and terminate
  the string manually.

CID 49339: Type: Buffer not null terminated
- since strlcpy isn't available here, terminate the string
  manually before fprintf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5283
2016-10-18 10:43:22 -07:00
cao 1b81ab46d0 Fix coverity defects: CID 49339, 153393
CID 49339: Type:Buffer not null terminated
CID 153393: Type:Buffer not null terminated

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5296
2016-10-18 10:31:57 -07:00
Giuseppe Di Natale df7492240a Create a symlink to current test-runner output
Generate a symlink in the current working directory to
test-runner.py output. This will make it easier for the
ZFS buildbot to collect logs.

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5293
2016-10-18 10:19:28 -07:00
luozhengzheng b60eac3d1a Fix coverity defects: CID 150924
CID 150924: Unchecked return value (CHECKED_RETURN)
- On taskq_dispatch failure the reference must be dropped and
  this entry can be safely skipped.  This case should be impossible
  in the existing implementation but should be handled regardless.
  
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5278
2016-10-17 12:03:52 -07:00
Rudd-O f8e87e205c Properly use the Dracut cleanup hook to order pool shutdown
When Dracut starts up, it needs to determine whether a pool will remain
"hanging open" before the system shuts off. In such a case, the
code to clean up the pool (using the previous export -F work) must
be invoked. Since Dracut has had a recent change that makes
mount-zfs.sh simply not run when the root dataset is already mounted,
we must use the cleanup hook to order Dracut to do shutdown cleanup.

Important note: this code will not accomplish its stated goal until this
bug is fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1385432

That bug impacts more than just ZFS. It impacts LUKS, dmraid, and
unmount during poweroff. It is a Fedora-wide bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Closes #5287
2016-10-17 11:51:15 -07:00
Håkan Johansson fea33e4e50 Pass status_cbdata_t to print_status_config() and friends
First rename spare_cbdata_t cb -> spare_cb in print_status_config(),
to free up cb.

Using the structure removes the explicit parameters namewidth
and name_flags from several functions.  Also use status_cbdata_t
for print_import_config().  This simplifies print_logs().

Remove the parameter 'verbose' for print_logs().  It does not really
mean verbose, it selected between the print_status_config and
print_import_config() paths.  This selection is now done by
cb_print_config of spare_cbdata_t.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Håkan Johansson <f96hajo@chalmers.se>
Closes #5259
2016-10-17 11:46:35 -07:00
Rudd-O 7e8a2d0b75 Use -F to export pools so as not to dirty up device labels
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Closes #5228 
Closes #5238
2016-10-15 20:30:53 -07:00
Brian Behlendorf dabb6f4fab Allow partition aliases in vdev_id.conf (#5266)
When pools are assembled from partitions, vdev_id.conf aliases
do not work.  The directory /dev/disk/by-vdev is not created because
the associated udev rule for parsing vdev_id.conf is never called.
Extend the logic to match "disk" and "partition".

Patch-proposed-by: @sparksh
Reviewed-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3859
Closes #5266
2016-10-14 16:11:16 -07:00
cao b6ca6193f7 Fix coverity defects: CID 147488, 147490
CID 147488, Type:explicit null dereferenced
CID 147490, Type:dereference null return value

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5237
2016-10-14 11:00:47 -07:00
Akash Ayare 3691598e26 OpenZFS 6877 - zfs_rename_006_pos fails due to missing zvol snapshot device file
Authored by: Akash Ayare <aayare@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

The bug was caused by a change in functionality. At some point, ZFS
snapshots no longer created associated device files which were being
used in the test. To resolve this issue, a clone of the snapshot can be
produced which will also create the expected device files; then, the
test will behave as it did historically.

OpenZFS-issue: https://www.illumos.org/issues/6877
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2200f27
Closes #5275

Porting Notes:
- Hardcoded /dev/zvol/rdsk changed to $ZVOL_RDEVDIR for compatibility.
- Enabled in linux runfile.
2016-10-14 10:11:00 -07:00
Brian Behlendorf 7305538de3 Enable zfs_rename_002_pos, zfs_rename_005_neg, zfs_rename_007_pos
These tests all pass once updated to wait for udev to create the
expected links under /dev/zvol/.

Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5275
2016-10-14 10:11:00 -07:00
luozhengzheng 9a875c6c5d Fix coverity defects: CID 150921, 150927
CID 150921: Unchecked return value (CHECKED_RETURN)
CID 150927: Unchecked return value (CHECKED_RETURN)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5267
2016-10-14 09:40:08 -07:00
liaoyuxiangqin 21237e9167 Enable quota_002_pos, quota_004_pos and quota_005_pos
In this test the 'ls -ls' command was used to print testfile size in
blocks.  Because the environment variable BLOCK_SIZE was set
the 'ls -ls' command detected this and output its block count as the
number of 8192 blocks.  Rather than change the variable name
the -k option was added to force ls to return 1k blocks.  This has the
additional advantage of behaving consistently across platforms.

For additional details on GNU 'ls' behavior regarding block size:

https://www.gnu.org/software/coreutils/manual/html_node/Block-size.html
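
The effect of the unit size on the reported count is simple arithmetic, shown here as a Python sketch. It assumes the count is the allocated size divided by the unit and rounded up; the helper name is hypothetical.

```python
def blocks(allocated_bytes, unit):
    """Number of `unit`-byte blocks needed to cover allocated_bytes (rounded up)."""
    return -(-allocated_bytes // unit)  # integer ceiling division


# A file with 16 KiB allocated, reported in two different units.
in_8k_units = blocks(16384, 8192)  # unit picked up from BLOCK_SIZE=8192
in_1k_units = blocks(16384, 1024)  # unit forced by -k
```

The same file yields 2 in the first case and 16 in the second, which is why a test comparing against 1k block counts fails unless the unit is pinned.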

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5269
2016-10-14 09:33:51 -07:00
Brian Behlendorf 5f014a0cc4 Enable zfs_receive_011_pos
The zfs_receive_011_pos test can be enabled now that OpenZFS 6562
has been merged.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5276
2016-10-14 09:17:56 -07:00
Don Brady 3dfb57a35e OpenZFS 7090 - zfs should throttle allocations

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

When write I/Os are issued, they are issued in block order but the ZIO
pipeline will drive them asynchronously through the allocation stage
which can result in blocks being allocated out-of-order. It would be
nice to preserve as much of the logical order as possible.

In addition, the allocations are equally scattered across all top-level
VDEVs but not all top-level VDEVs are created equally. The pipeline
should be able to detect devices that are more capable of handling
allocations and should allocate more blocks to those devices. This
allows for dynamic allocation distribution when devices are imbalanced
as fuller devices will tend to be slower than empty devices.

The change includes a new pool-wide allocation queue which would
throttle and order allocations in the ZIO pipeline. The queue would be
ordered by issued time and offset and would provide an initial amount of
allocation of work to each top-level vdev. The allocation logic utilizes
a reservation system to reserve allocations that will be performed by
the allocator. Once an allocation is successfully completed it's
scheduled on a given top-level vdev. Each top-level vdev maintains a
maximum number of allocations that it can handle (mg_alloc_queue_depth).
The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth)
are distributed across the top-level vdevs metaslab groups and round
robin across all eligible metaslab groups to distribute the work. As
top-levels complete their work, they receive additional work from the
pool-wide allocation queue until the allocation queue is emptied.
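
The queueing scheme described above can be modelled in a few lines. This is a toy Python sketch under simplifying assumptions (one metaslab group per top-level vdev, a single shared depth limit); `AllocThrottle` and its methods are hypothetical names, and only `mg_alloc_queue_depth` comes from the description.

```python
import heapq


class AllocThrottle:
    """Toy pool-wide allocation queue with a per-vdev in-flight limit."""

    def __init__(self, n_vdevs, queue_depth):
        self.queue = []                # heap ordered by (issue time, offset)
        self.inflight = [0] * n_vdevs  # reservations held per top-level vdev
        self.depth = queue_depth       # plays the role of mg_alloc_queue_depth
        self.rotor = 0                 # next vdev to try, round robin

    def submit(self, issued, offset):
        """Queue an allocation request."""
        heapq.heappush(self.queue, (issued, offset))

    def dispatch(self):
        """Hand out queued requests in order until every vdev is at its limit."""
        placed = []
        n = len(self.inflight)
        while self.queue:
            for step in range(n):
                vdev = (self.rotor + step) % n
                if self.inflight[vdev] < self.depth:
                    break              # found an eligible vdev
            else:
                return placed          # all vdevs full: throttle the rest
            self.inflight[vdev] += 1
            self.rotor = (vdev + 1) % n
            placed.append((vdev, heapq.heappop(self.queue)))
        return placed

    def complete(self, vdev):
        """One allocation on `vdev` finished; refill it from the queue."""
        self.inflight[vdev] -= 1
        return self.dispatch()
```

Requests leave the queue in (issue time, offset) order, and a vdev that finishes work sooner picks up more of it, which is the imbalance-aware behaviour the description aims for.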

OpenZFS-issue: https://www.illumos.org/issues/7090
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7
Closes #5258 

Porting Notes:
- Maintained minimal stack in zio_done
- Preserve linux-specific io sizes in zio_write_compress
- Added module params and documentation
- Updated to use optimized AVL cmp macros
2016-10-13 17:59:18 -07:00
cao a85a90557d Fix coverity defects: CID 147692, 147693, 147694
CID:147692, Type:Uninitialized scalar variable
CID:147693, Type:Uninitialized scalar variable
CID:147694, Type:Uninitialized scalar variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5252
2016-10-13 14:38:59 -07:00
cao 3f93077b02 Fix coverity defects: CID 150943, 150938
CID:150943, Type:Unintentional integer overflow
CID:150938, Type:Explicit null dereferenced

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5255
2016-10-13 14:30:50 -07:00
luozhengzheng 05852b3467 Fix coverity defects: CID 147571, 147574
CID 147571: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
CID 147574: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5268
2016-10-13 14:25:05 -07:00
liaoyuxiangqin e8d3dcdfb1 Enable refquota_002_pos and refquota_004_pos
The refquota_002_pos and refquota_004_pos test cases can pass
without modification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5273
2016-10-13 14:21:15 -07:00
GeLiXin 45cb520b9d Fix coverity defects: CID 147654, 147690
coverity scan CID:147654,type: Copy into fixed size buffer
- string operation may write past the end of the fixed-size
  destination buffer

coverity scan CID:147690,type: Uninitialized scalar variable
- call zfs_prop_get first in case we use sourcetype and
  share_sourcetype without initialization

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5253
2016-10-13 14:02:07 -07:00
luozhengzheng 1f51b525ff Fix coverity defects: CID 153394
coverity scan CID 153394, Type:String overflow

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes #5263
2016-10-12 13:24:03 -07:00
Tom Caputi ef78750d98 Fix ICP memleak introduced in #4760
The ICP requires destructors for each crypto module that is added.
These do not necessarily exist in Illumos because they assume that
these modules can never be unloaded from the kernel. Some of this
cleanup code was missed when #4760 was merged, resulting in leaks.
This patch simply fixes that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #4760 
Closes #5265
2016-10-12 12:52:30 -07:00
cao 06cf4d9890 Fix coverity defects: CID 147606, 147609
coverity scan CID:147606, Type:resource leak
coverity scan CID:147609, Type:resource leak

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes #5245
2016-10-12 11:16:47 -07:00