Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
luozhengzheng	fad5fb01ad	Fix memory leak in recv_skip When the exception branch exits, the buf is leaked. Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn> Closes #5262	2016-10-11 10:24:18 -07:00
Brian Behlendorf	7515f8f63d	Fix file permissions The following new test cases need to have execute permissions set: userquota/groupspace_003_pos.ksh userquota/userquota_013_pos.ksh userquota/userspace_003_pos.ksh upgrade/upgrade_userobj_001_pos.ksh upgrade/setup.ksh upgrade/cleanup.ksh The following source files accidentally were marked executable: lib/libzpool/kernel.c lib/libshare/nfs.c lib/libzfs/libzfs_dataset.c lib/libzfs/libzfs_util.c tests/zfs-tests/cmd/rm_lnkcnt_zero_file/rm_lnkcnt_zero_file.c tests/zfs-tests/cmd/dir_rd_update/dir_rd_update.c cmd/zed/zed_exec.c module/icp/core/kcf_sched.c module/zfs/dsl_pool.c module/zfs/arc.c module/nvpair/nvpair.c man/man5/zfs-module-parameters.5 Reviewed-by: GeLiXin <ge.lixin@zte.com.cn> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5241	2016-10-08 14:57:56 -07:00
cao	ccc92611b1	Fix coverity defects: CID 147565-147567 coverity scan CID:147567, Type:dereference null return value coverity scan CID:147566, Type:dereference null return value coverity scan CID:147565, Type:dereference null return value Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn> Closes #5166	2016-10-07 13:19:43 -07:00
Brian Behlendorf	482cd9ee69	Fletcher4: Incremental updates and ctx calculation Fixes ABI issues with fletcher4 code, adds support for incremental updates, and adds ztest method for testing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Closes #5164	2016-10-07 12:44:12 -07:00
LOLi	48f783de79	Fix uninitialized variable snapprops_nvlist in zfs_receive_one The variable snapprops_nvlist was never initialized, so properties were not applied to the received snapshot. Additionally, add zfs_receive_013_pos.ksh script to ZFS test suite to exercise 'zfs receive' functionality for user properties. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #4338	2016-10-07 10:05:06 -07:00
Jinshan Xiong	1de321e626	Add support for user/group dnode accounting & quota This patch tracks dnode usage for each user/group in the DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode accounting have the key prefixed with "obj-" followed by the UID/GID in string format (as done for the block accounting). A new SPA feature has been added for dnode accounting as well as a new ZPL version. The SPA feature must be enabled in the pool before upgrading the zfs filesystem. During the zfs version upgrade, a "quotacheck" will be executed by marking all dnode as dirty. ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500 Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com> Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>	2016-10-07 09:45:13 -07:00
luozhengzheng	e2c292bbfc	Fix coverity defects: CID 150953, 147603, 147610 coverity scan CID:150953,type: uninitialized scalar variable coverity scan CID:147603,type: Resource leak coverity scan CID:147610,type: Resource leak Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn> Closes #5209	2016-10-04 18:15:57 -07:00
Tony Hutter	3c67d83a8a	OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Ported by: Tony Hutter <hutter2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Porting Notes: This code is ported on top of the Illumos Crypto Framework code: `b5e030c8db` The list of porting changes includes: - Copied module/icp/include/sha2/sha2.h directly from illumos - Removed from module/icp/algs/sha2/sha2.c: #pragma inline(SHA256Init, SHA384Init, SHA512Init) - Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since it now takes in an extra parameter. - Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c - Added skein & edonr to libicp/Makefile.am - Added sha512.S. It was generated from sha512-x86_64.pl in Illumos. - Updated ztest.c with new fletcher_4_() args; used NULL for new CTX argument. - In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section to not #include the non-existant endian.h. - In skein_test.c, renane NULL to 0 in "no test vector" array entries to get around a compiler warning. - Fixup test files: - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>, - Remove <note.h> and define NOTE() as NOP. - Define u_longlong_t - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p" - Rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Remove "for isa in $($ISAINFO); do" stuff - Add/update Makefiles - Add some userspace headers like stdio.h/stdlib.h in places of sys/types.h. - EXPORT_SYMBOL _Init/_Update/_Final... routines in ICP modules. - Update scripts/zfs2zol-patch.sed - include <sys/sha2.h> in sha2_impl.h - Add sha2.h to include/sys/Makefile.am - Add skein and edonr dirs to icp Makefile - Add new checksums to zpool_get.cfg - Move checksum switch block from zfs_secpolicy_setprop() to zfs_check_settable() - Fix -Wuninitialized error in edonr_byteorder.h on PPC - Fix stack frame size errors on ARM32 - Don't unroll loops in Skein on 32-bit to save stack space - Add memory barriers in sha2.c on 32-bit to save stack space - Add filetest_001_pos.ksh checksum sanity test - Add option to write psudorandom data in file_write utility	2016-10-03 14:51:15 -07:00
Gvozden Neskovic	dc03fa3092	Fletcher4: Init in libzfs_init() All users of fletcher4 methods must call `fletcher_4_init()/_fini()` There's no benchmarking overhead when called from user-space. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>	2016-10-03 21:51:34 +02:00
Romain Dolbeau	62a65a654e	Add parity generation/rebuild using 128-bits NEON for Aarch64 This re-use the framework established for SSE2, SSSE3 and AVX2. However, GCC is using FP registers on Aarch64, so unlike SSE/AVX2 we can't rely on the registers being left alone between ASM statements. So instead, the NEON code uses C variables and GCC extended ASM syntax. Note that since the kernel explicitly disable vector registers, they have to be locally re-enabled explicitly. As we use the variable's number to define the symbolic name, and GCC won't allow duplicate symbolic names, numbers have to be unique. Even when the code is not going to be used (e.g. the case for 4 registers when using the macro with only 2). Only the actually used variables should be declared, otherwise the build will fails in debug mode. This requires the replacement of the XOR(X,X) syntax by a new ZERO(X) macro, which does the same thing but without repeating the argument. And perhaps someday there will be a machine where there is a more efficient way to zero a register than XOR with itself. This affects scalar, SSE2, SSSE3 and AVX2 as they need the new macro. It's possible to write faster implementations (different scheduling, different unrolling, interleaving NEON and scalar, ...) for various cores, but this one has the advantage of fitting in the current state of the code, and thus is likely easier to review/check/merge. The only difference between aarch64-neon and aarch64-neonx2 is that aarch64-neonx2 unroll some functions some more. Reviewed-by: Gvozden Neskovic <neskovic@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net> Closes #4801	2016-10-03 09:44:00 -07:00
Richard Laager	d1502e9ed0	Correct zpool_vdev_remove() error message The error message in zpool_vdev_remove() said top-level devices could be removed, but that has never been true. Reported-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Laager <rlaager@wiktel.com> Closes #4506 Closes #5213	2016-10-02 11:34:17 -07:00
luozhengzheng	aecdc70604	Fix coverity defects: CID 147448, 147449, 147450, 147453, 147454 coverity scan CID:147448,type: unchecked return value coverity scan CID:147449,type: unchecked return value coverity scan CID:147450,type: unchecked return value coverity scan CID:147453,type: unchecked return value coverity scan CID:147454,type: unchecked return value Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn> Closes #5206	2016-10-02 11:24:54 -07:00
cao	0a8f18f932	Fix coverity defects: CID 147563, 147560 coverity scan CID:147563, Type:dereference null return value coverity scan CID:147560, Type:dereference null return value Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn> Closes #5168	2016-09-30 15:56:17 -07:00
GeLiXin	470f12d631	Fix coverity defects: CID 147531 147532 147533 147535 coverity scan CID:147531,type: Argument cannot be negative - may copy data with negative size coverity scan CID:147532,type: resource leaks - may close a fd which is negative coverity scan CID:147533,type: resource leaks - may call pwrite64 with a negative size coverity scan CID:147535,type: resource leaks - may call fdopen with a negative fd Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: GeLiXin <ge.lixin@zte.com.cn> Closes #5176	2016-09-30 15:47:57 -07:00
cao	8047715ab4	Fix coverity defects: CID 147707 coverity scan CID:147707, Type:Double free. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn> Closes #5097	2016-09-30 10:49:16 -07:00
Gvozden Neskovic	031d7c2fe6	fix: Shift exponent too large Undefined operation is reported by running ztest (or zloop) compiled with GCC UndefinedBehaviorSanitizer. Error only happens on top level of dnode indirection with large enough offset values. Logically, left shift operation would work, but bit shift semantics in C, and limitation of uint64_t, do not produce desired result. Issue #5059, #4883 Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>	2016-09-29 15:55:41 -07:00
BearBabyLiu	0b78aeae92	Fix coverity defects: CID 147443, 147656, 147655, 147441, 147653 coverity scan CID:147443, Type: Buffer not null terminated coverity scan CID:147656, Type: Copy into fixed size buffer coverity scan CID:147655, Type: Copy into fixed size buffer coverity scan CID:147441, Type: Buffer not null terminated coverity scan CID:147653, Type: Copy into fixed size buffer Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: liuhuang <liu.huang@zte.com.cn> Closes #5165	2016-09-29 13:33:09 -07:00
cao	c9d61adbf8	Fix coverity defects: 147658, 147652, 147651 coverity scan CID:147658, Type:copy into fixed size buffer. coverity scan CID:147652, Type:copy into fixed size buffer. coverity scan CID:147651, Type:copy into fixed size buffer. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn> Closes #5160	2016-09-29 12:06:14 -07:00
kernelOfTruth aka. kOT, Gentoo user	6635624020	OpenZFS 6111 - zfs send should ignore datasets created after the ending snapshot Authored by: Alex Deiter <alex.deiter@gmail.com> Reviewed by: Alex Aizman alex.aizman@nexenta.com Reviewed by: Alek Pinchuk alek.pinchuk@nexenta.com Reviewed by: Roman Strashkin roman.strashkin@nexenta.com Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: kernelOfTruth <kerneloftruth@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/6111 OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/4a20c933 Closes #5110 Porting notes: There were changes from upstream due to the following commits: - zfs send -p send properties only for snapshots that are actually sent `057485504e` - Produce a full snapshot list for zfs send -p `e890dd85a7` - Implement zfs_ioc_recv_new() for OpenZFS 2605 `43e52eddb1` - OpenZFS 6314 - buffer overflow in dsl_dataset_name ZFS_MAXNAMELEN was changed to the now used ZFS_MAX_DATASET_NAME_LEN since `eca7b76001`	2016-09-22 17:22:37 -07:00
candychencan	d2a6f65cfb	Fix strncpy in taskq_create Assign the copy length to TASKQ_NAMELEN, so if the name length equals 'TASKQ_NAMELEN+1' , the final '\0' of tq->tq_name is preserved. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: candychencan <chen.can2@zte.com.cn> Closes #5136	2016-09-20 11:27:15 -07:00
slashdd	792517389f	Change /etc/mtab to /proc/self/mounts Fix misleading error message: "The /dev/zfs device is missing and must be created.", if /etc/mtab is missing. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com> Closes #4680 Closes #5029	2016-09-20 10:07:58 -07:00
Dan Kimmel	2aa34383b9	DLPX-40252 integrate EP-476 compressed zfs send/receive Authored by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Tom Caputi <tcaputi@datto.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Ported by: David Quigley <david.quigley@intel.com> Issue #5078	2016-09-13 09:58:58 -07:00
Don Brady	d02ca37979	Bring over illumos ZFS FMA logic -- phase 1 This first phase brings over the ZFS SLM module, zfs_mod.c, to handle auto operations in response to disk events. Disk event monitoring is provided from libudev and generates the expected payload schema for zfs_mod. This work leverages the recently added devid and phys_path strings in the vdev label. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@intel.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #4673	2016-09-01 11:39:45 -07:00
cao	2d96d7aa56	Fix zfs_unmount() and zfs_unshare_proto() leaks Always free mnpt memory on failure in the zfs_unmount() function. In the zfs_unshare_proto() function mountpoint is a const and should not be assigned. Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5054	2016-09-01 11:39:45 -07:00
Gvozden Neskovic	ee36c709c3	Performance optimization of AVL tree comparator functions perf: 2.75x faster ddt_entry_compare() First 256bits of ddt_key_t is a block checksum, which are expected to be close to random data. Hence, on average, comparison only needs to look at first few bytes of the keys. To reduce number of conditional jump instructions, the result is computed as: sign(memcmp(k1, k2)). Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} , which is computed efficiently. Synthetic performance evaluation of original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R) CPU E5-2660 v3: old 6.85789 s new 2.49089 s perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare() Compute the result directly instead of using conditionals perf: zfs_range_compare() Speedup between 1.1x - 2.5x, depending on compiler version and optimization level. perf: spa_error_entry_compare() `bcmp()` is not suitable for comparator use. Use `memcmp()` instead. perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare() perf: 2.8x faster zil_bp_compare() perf: 2.8x faster mze_compare() perf: faster dbuf_compare() perf: faster compares in spa_misc perf: 2.8x faster layout_hash_compare() perf: 2.8x faster space_reftree_compare() perf: libzfs: faster avl tree comparators perf: guid_compare() perf: dsl_deadlist_compare() perf: perm_set_compare() perf: 2x faster range_tree_seg_compare() perf: faster unique_compare() perf: faster vdev_cache _compare() perf: faster vdev_uberblock_compare() perf: faster fuid _compare() perf: faster zfs_znode_hold_compare() Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Richard Elling <richard.elling@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5033	2016-08-31 14:35:34 -07:00
GeLiXin	c40db193a5	Fix: Build warnings with different gcc optimization levels in debug mode This fix resolves warnings reported during compiling with different gcc optimization levels in debug mode, Test tools: gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) Linux version: 2.6.32-573.18.1.el6.x86_64, Red Hat Enterprise Linux Server release 6.1 (Santiago) List of warnings: CFLAGS=-O1 ./configure --enable-debug ;make ../../module/icp/core/kcf_sched.c: In function ‘kcf_aop_done’: ../../module/icp/core/kcf_sched.c:499: error: ‘fg’ may be used uninitialized in this function ../../module/icp/core/kcf_sched.c:499: note: ‘fg’ was declared here CFLAGS=-Os ./configure --enable-debug ; make libzfs_dataset.c: In function ‘zfs_prop_set_list’: libzfs_dataset.c:1575: error: ‘nvl_len’ may be used uninitialized in this function Signed-off-by: GeLiXin <ge.lixin@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5022	2016-08-29 12:46:18 -07:00
Brian Behlendorf	67925abb5e	Fix cv_timedwait_hires The user space implementation of cv_timedwait_hires() was always passing a relative time to pthread_cond_timedwait() when an absolute time is expected. This was accidentally introduced in commit `206971d2`. Replace two magic values with their corresponding preprocessor macro. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #5024	2016-08-29 10:54:14 -07:00
GeLiXin	23827a4ca1	Fix: Array bounds read in zprop_print_one_property() If the loop index i comes to (ZFS_GET_NCOLS - 1), the cbp->cb_columns[i + 1] actually read the data of cbp->cb_colwidths[0], which means the array subscript is above array bounds. Luckily the cbp->cb_colwidths[0] is always 0 and it seems we haven't looped enough times to exceed the array bounds so far, but it's really a secluded risk someday. Signed-off-by: GeLiXin <ge.lixin@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5003	2016-08-22 10:23:47 -07:00
Gvozden Neskovic	fc897b24b2	Rework of fletcher_4 module - Benchmark memory block is increased to 128kiB to reflect real block sizes more accurately. Measurements include all three stages needed for checksum generation, i.e. `init()/compute()/fini()`. The inner loop is repeated multiple times to offset overhead of time function. - Fastest implementation selects native and byteswap methods independently in benchmark. To support this new function pointers `init_byteswap()/fini_byteswap()` are introduced. - Implementation mutex lock is replaced by atomic variable. - To save time, benchmark is not executed in userspace. Instead, highest supported implementation is used for fastest. Default userspace selector is still 'cycle'. - `fletcher_4_native/byteswap()` methods use incremental methods to finish calculation if data size is not multiple of vector stride (currently 64B). - Added `fletcher_4_native_varsize()` special purpose method for use when buffer size is not known in advance. The method does not enforce 4B alignment on buffer size, and will ignore last (size % 4) bytes of the data buffer. - Benchmark `kstat` is changed to match the one of vdev_raidz. It now shows throughput for all supported implementations (in B/s), native and byteswap, as well as the code [fastest] is running. Example of `fletcher_4_bench` running on `Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`: implementation native byteswap scalar 4768120823 3426105750 sse2 7947841777 4318964249 ssse3 7951922722 6112191941 avx2 13269714358 11043200912 fastest avx2 avx2 Example of `fletcher_4_bench` running on `Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`: implementation native byteswap scalar 1291115967 1031555336 sse2 2539571138 1280970926 ssse3 2537778746 1080016762 avx2 4950749767 1078493449 avx512f 9581379998 4010029046 fastest avx512f avx512f Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4952	2016-08-16 14:11:55 -07:00
Gvozden Neskovic	70b258fc96	Fletcher4 implementation using avx512f instruction set Algorithm runs 8 parallel sums, consuming 8x uint32_t elements per loop iteration. Size alignment of main fletcher4 methods is adjusted accordingly. New implementation is called 'avx512f'. Note: byteswap method can be implemented more efficiently when avx512bw hardware becomes available. Currently, it is ~ 2x slower than native method. Table shows result of full (native) fletcher4 calculation for different buffer size: fletcher4 4KB 16KB 64KB 128KB 256KB 1MB 16MB -------------------------------------------------------------------- [scalar] 1213 1228 1231 1231 1225 1200 1160 [sse2] 2374 2442 2459 2456 2462 2250 2220 [avx2] 4288 4753 4871 4893 4900 4050 3882 [avx512f] 5975 8445 9196 9221 9262 6307 5620 Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4952	2016-08-16 14:11:14 -07:00
Hans Rosenfeld	fb390aafc8	OpenZFS 5997 - FRU field not set during pool creation and never updated Authored by: Hans Rosenfeld <hans.rosenfeld@nexenta.com> Reviewed by: Dan Fields <dan.fields@nexenta.com> Reviewed by: Josef Sipek <josef.sipek@nexenta.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Signed-off-by: Don Brady <don.brady@intel.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/5997 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1437283 Porting Notes: In addition to the OpenZFS changes this patch realigns the events with those found in OpenZFS. Events which would be logged as sysevents on illumos have been been mapped to the 'sysevent' class for Linux. In addition, several subclass names have been changed to match what is used in OpenZFS. In all cases this means a '.' was changed to an '_' in the subclass. The scripts provided by ZoL have been updated, however users which provide scripts for any of the following events will need to rename them based on the new subclass names. ereport.fs.zfs.config.sync sysevent.fs.zfs.config_sync ereport.fs.zfs.zpool.destroy sysevent.fs.zfs.pool_destroy ereport.fs.zfs.zpool.reguid sysevent.fs.zfs.pool_reguid ereport.fs.zfs.vdev.remove sysevent.fs.zfs.vdev_remove ereport.fs.zfs.vdev.clear sysevent.fs.zfs.vdev_clear ereport.fs.zfs.vdev.check sysevent.fs.zfs.vdev_check ereport.fs.zfs.vdev.spare sysevent.fs.zfs.vdev_spare ereport.fs.zfs.vdev.autoexpand sysevent.fs.zfs.vdev_autoexpand ereport.fs.zfs.resilver.start sysevent.fs.zfs.resilver_start ereport.fs.zfs.resilver.finish sysevent.fs.zfs.resilver_finish ereport.fs.zfs.scrub.start sysevent.fs.zfs.scrub_start ereport.fs.zfs.scrub.finish sysevent.fs.zfs.scrub_finish ereport.fs.zfs.bootfs.vdev.attach sysevent.fs.zfs.bootfs_vdev_attach	2016-08-12 13:06:48 -07:00
GeLiXin	d5884c3453	Fix indefinite article The indefinite article before nvlist should be "an", not "a". We have 27 "an nvlist" and 7 "a nvlist" in our comment, they should stay the same as we are such a strict filesystem. Signed-off-by: GeLiXin <ge.lixin@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4941	2016-08-11 11:23:49 -07:00
Gvozden Neskovic	689f093ebc	Build user-space with different gcc optimization levels This fix resolves warnings reported during compiling of user-space libraries with different gcc optimization levels. Tested with gcc versions: 4.9.2 (Debian), and 6.1.1 (Fedora). The patch enables use of following opt levels: O0, O1, O2, O3, Og, Os, Ofast. List of warnings: [GCC 4.9.2 -Os] libzfs_sendrecv.c:3726:26: error: 'clp' may be used uninitialized in this function [-Werror=maybe-uninitialized] [GCC 4.9.2 -Og] fs_fletcher.c:323:26: error: 'idx' may be used uninitialized in this function [-Werror=maybe-uninitialized] dsl_dataset.c:1290:12: error: 'atp' may be used uninitialized in this function [-Werror=maybe-uninitialized] [GCC 4.9.2 -Ofast] u8_textprep.c:1310:9: error: 'tc[3ul]' may be used uninitialized in this function [-Werror=maybe-uninitialized] u8_textprep.c:177:23: error: 'u8t[0ul]' may be used uninitialized in this function [-Werror=maybe-uninitialized] dsl_dataset.c:2089:37: error: ‘hds’ may be used uninitialized in this function [-Werror=maybe-uninitialized] dsl_dataset.c:3216:2: error: ‘ds’ may be used uninitialized in this function [-Werror=maybe-uninitialized] dsl_dataset.c:1591:2: error: ‘ds’ may be used uninitialized in this function [-Werror=maybe-uninitialized] dsl_dataset.c:3341:2: error: ‘ds’ may be used uninitialized in this function [-Werror=maybe-uninitialized] vdev_raidz.c:1153:8: error: 'dcount[2]' may be used uninitialized in this function [-Werror=maybe-uninitialized] vdev_raidz.c:1167:17: error: 'dst[2]' may be used uninitialized in this function [-Werror=maybe-uninitialized] kernel.c:1005:2: error: ‘resid’ may be used uninitialized in this function [-Werror=maybe-uninitialized] libzfs_dataset.c:2826:8: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized] libzfs_dataset.c:3056:35: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized] libzfs_dataset.c:1584:13: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized] libzfs_dataset.c:3056:35: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized] libzfs_dataset.c:1792:66: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized] libzfs_dataset.c:3986:35: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized] [GCC 6.1.1] Resolved in PR #4907 Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4937	2016-08-09 14:40:35 -07:00
GeLiXin	88c4c7a067	Fix call zfs_get_name() with invalid parameter zfs_get_name() expects a parameter of type zfs_handle_t zhp , but gets an invalid parameter type of zfs_handle_t *zhp actually in libzfs_dataset_cmp(), which may trigger a coredump if called. libzfs_dataset_cmp() working normally so far, just because all the callers only give datasets of type ZFS_TYPE_FILESYSTEM to it, we compared their mountpoint and return, luckily. Signed-off-by: GeLiXin <ge.lixin@zte.com.cn> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4919	2016-08-08 12:24:57 -07:00
liaoyuxiangqin	e24e62a948	Fix memory leak in function add_config() Config of a hot spare or l2cache device will leak memory in function add_config(). At the start of this function, when dealing with a config which belongs to a hot spare not currently in use or a l2cache device the config should be freed. Signed-off-by: liaoyuxiangqin <guo.yong33@zte.com.cn> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4910	2016-08-01 12:49:03 -07:00
Gvozden Neskovic	78867a0a0a	libzfs_import.c: Uninitialized pointer read In zpool_find_import_scan: Reads an uninitialized pointer or its target Coverity #150966 Found by static analysis with CoverityScan 0.8.5 Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4897	2016-07-29 15:34:12 -07:00
Colin Ian King	b264d9b3e5	libzfs: Fix missing va_end call on ENOSPC and EDQUOT cases The switch statement in function zfs_standard_error_fmt for the ENOSPC and EDQUOT cases returns immediately and unlike all other cases in the switch this does not perform the va_end call. Perform a break which ends up calling va_end rather than returning immediately. Found by static analysis with CoverityScan 0.8.5 Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4900	2016-07-29 15:34:12 -07:00
Brian Behlendorf	8a39abaafa	Multi-thread 'zpool import' for blkid Commit `519129f` added support to multi-thread 'zpool import' for the case where block devices are scanned for under /dev/. This commit generalizes that logic and applies it to the case where device names are acquired from libblkid. The zpool_find_import_scan() and zpool_find_import_blkid() functions create an AVL tree containing each device name. Each entry in this tree is dispatched to a taskq where the function zpool_open_func() validates the device by opening it and reading the label. This may result in additional entries being added to the tree and those device paths being verified. This is largely how the upstream OpenZFS code behaves but due to significant differences the non-Linux code has been dropped for readability. Additionally, this code makes use of taskqs and kmutexs which are normally not available to the command line tools. Special care has been taken to allow their use in the import functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #4794	2016-07-27 13:38:46 -07:00
Gvozden Neskovic	a64f903b06	Fixes for issues found with cppcheck tool The patch fixes small number of errors/false positives reported by `cppcheck`, static analysis tool for C/C++. cppcheck 1.72 $ cppcheck . --force --quiet [cmd/zfs/zfs_main.c:4444]: (error) Possible null pointer dereference: who_perm [cmd/zfs/zfs_main.c:4445]: (error) Possible null pointer dereference: who_perm [cmd/zfs/zfs_main.c:4446]: (error) Possible null pointer dereference: who_perm [cmd/zpool/zpool_iter.c:317]: (error) Uninitialized variable: nvroot [cmd/zpool/zpool_vdev.c:1526]: (error) Memory leak: child [lib/libefi/rdwr_efi.c:1118]: (error) Memory leak: efi_label [lib/libuutil/uu_misc.c:207]: (error) va_list 'args' was opened but not closed by va_end(). [lib/libzfs/libzfs_import.c:1554]: (error) Dangerous usage of 'diskname' (strncpy doesn't always null-terminate it). [lib/libzfs/libzfs_sendrecv.c:3279]: (error) Dereferencing 'cp' after it is deallocated / released [tests/zfs-tests/cmd/file_write/file_write.c:154]: (error) Possible null pointer dereference: operation [tests/zfs-tests/cmd/randfree_file/randfree_file.c:90]: (error) Memory leak: buf [cmd/zinject/zinject.c:1068]: (error) Uninitialized variable: dataset [module/icp/io/sha2_mod.c:698]: (error) Uninitialized variable: blocks_per_int64 Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1392	2016-07-27 13:31:22 -07:00
Tom Caputi	0b04990a5d	Illumos Crypto Port module added to enable native encryption in zfs A port of the Illumos Crypto Framework to a Linux kernel module (found in module/icp). This is needed to do the actual encryption work. We cannot use the Linux kernel's built in crypto api because it is only exported to GPL-licensed modules. Having the ICP also means the crypto code can run on any of the other kernels under OpenZFS. I ended up porting over most of the internals of the framework, which means that porting over other API calls (if we need them) should be fairly easy. Specifically, I have ported over the API functions related to encryption, digests, macs, and crypto templates. The ICP is able to use assembly-accelerated encryption on amd64 machines and AES-NI instructions on Intel chips that support it. There are place-holder directories for similar assembly optimizations for other architectures (although they have not been written). Signed-off-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4329	2016-07-20 10:43:30 -07:00
Tyler J. Stachecki	35a76a0366	Implementation of SSE optimized Fletcher-4 Builds off of `1eeb4562` (Implementation of AVX2 optimized Fletcher-4) This commit adds another implementation of the Fletcher-4 algorithm. It is automatically selected at module load if it benchmarks higher than all other available implementations. The module benchmark was also amended to analyze the performance of the byteswap-ed version of Fletcher-4, as well as the non-byteswaped version. The average performance of the two is used to select the the fastest implementation available on the host system. Adds a pair of fields to an existing zcommon module parameter: - zfs_fletcher_4_impl (str) "sse2" - new SSE2 implementation if available "ssse3" - new SSSE3 implementation if available Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com> Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4789	2016-07-15 10:42:35 -07:00
Gvozden Neskovic	ae25d22235	Add RAID-Z routines for SSE2 instruction set, in x86_64 mode. The patch covers low-end and older x86 CPUs. Parity generation is equivalent to SSSE3 implementation, but reconstruction is somewhat slower. Previous 'sse' implementation is renamed to 'ssse3' to indicate highest instruction set used. Benchmark results: scalar_rec_p 4 720476442 scalar_rec_q 4 187462804 scalar_rec_r 4 138996096 scalar_rec_pq 4 140834951 scalar_rec_pr 4 129332035 scalar_rec_qr 4 81619194 scalar_rec_pqr 4 53376668 sse2_rec_p 4 2427757064 sse2_rec_q 4 747120861 sse2_rec_r 4 499871637 sse2_rec_pq 4 522403710 sse2_rec_pr 4 464632780 sse2_rec_qr 4 319124434 sse2_rec_pqr 4 205794190 ssse3_rec_p 4 2519939444 ssse3_rec_q 4 1003019289 ssse3_rec_r 4 616428767 ssse3_rec_pq 4 706326396 ssse3_rec_pr 4 570493618 ssse3_rec_qr 4 400185250 ssse3_rec_pqr 4 377541245 original_rec_p 4 691658568 original_rec_q 4 195510948 original_rec_r 4 26075538 original_rec_pq 4 103087368 original_rec_pr 4 15767058 original_rec_qr 4 15513175 original_rec_pqr 4 10746357 Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4783	2016-07-13 10:24:55 -07:00
Paul Dagnelie	d1d19c7854	OpenZFS 6876 - Stack corruption after importing a pool with a too-long name Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking for trouble. We should check every dataset on import, using a 1024 byte buffer and checking each time to see if the dataset's new name is longer than 256 bytes. OpenZFS-issue: https://www.illumos.org/issues/6876 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ca8674e	2016-06-28 13:47:04 -07:00
Igor Kozhukhov	eca7b76001	OpenZFS 6314 - buffer overflow in dsl_dataset_name Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6314 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d6160ee	2016-06-28 13:47:03 -07:00
Brian Behlendorf	43e52eddb1	Implement zfs_ioc_recv_new() for OpenZFS 2605 Adds ZFS_IOC_RECV_NEW for resumable streams and preserves the legacy ZFS_IOC_RECV user/kernel interface. The new interface supports all stream options but is currently only used for resumable streams. This way updated user space utilities will interoperate with older kernel modules. ZFS_IOC_RECV_NEW is modeled after the existing ZFS_IOC_SEND_NEW handler. Non-Linux OpenZFS platforms have opted to change the legacy interface in an incompatible fashion instead of adding a new ioctl. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-06-28 13:47:03 -07:00
Dan McDonald	671c93546c	OpenZFS 4986 - receiving replication stream fails if any snapshot exceeds refquota Authored by: Dan McDonald <danmcd@omniti.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/4986 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5878fad	2016-06-28 13:47:03 -07:00
Brian Behlendorf	fd41e93563	OpenZFS 6051 - lzc_receive: allow the caller to read the begin record Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6051 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/620f322	2016-06-28 13:47:02 -07:00
Matthew Ahrens	47dfff3b86	OpenZFS 2605, 6980, 6902 2605 want to resume interrupted zfs send Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Dan McDonald <danmcd@omniti.com> Ported-by: kernelOfTruth <kerneloftruth@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/2605 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9c3fd12 6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6980 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ea4a67f Porting notes: - All rsend and snapshop tests enabled and updated for Linux. - Fix misuse of input argument in traverse_visitbp(). - Fix ISO C90 warnings and errors. - Fix gcc 'missing braces around initializer' in 'struct send_thread_arg to_arg =' warning. - Replace 4 argument fletcher_4_native() with 3 argument version, this change was made in OpenZFS 4185 which has not been ported. - Part of the sections for 'zfs receive' and 'zfs send' was rewritten and reordered to approximate upstream. - Fix mktree xattr creation, 'user.' prefix required. - Minor fixes to newly enabled test cases - Long holds for volumes allowed during receive for minor registration.	2016-06-28 13:47:02 -07:00
Ned Bass	50c957f702	Implement large_dnode pool feature Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced to from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill bocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interfaces names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by an large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. * dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3542	2016-06-24 13:13:21 -07:00
Gvozden Neskovic	ab9f4b0b82	SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328	2016-06-21 09:27:26 -07:00

1 2 3 4 5 ...

449 Commits