Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Brian Behlendorf	928c58dd0f	Fix vn_rdwr() compiler warning kernel.c: In function 'vn_rdwr': kernel.c:736:8: warning: unused variable 'status' [-Wunused-variable] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 14:10:30 -08:00
Matthew Ahrens	9867e8be2a	Illumos 4891 - want zdb option to dump all metadata 4891 want zdb option to dump all metadata Reviewed by: Sonu Pillai <sonu.pillai@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> We'd like a way for zdb to dump metadata in a machine-readable format, so that we can bring that back from a customer site for in-house diagnosis. Think of it as a crash dump for zpools, which can be used for post-mortem analysis of a malfunctioning pool References: https://www.illumos.org/issues/4891 https://github.com/illumos/illumos-gate/commit/df15e41 Porting notes: - [cmd/zdb/zdb.c] - `a5778ea` zdb: Introduce -V for verbatim import - In main() getopt 'opt' variable removed and the code was brought back in line with illumos. - [lib/libzpool/kernel.c] - `1e33ac1` Fix Solaris thread dependency by using pthreads - `f0e324f` Update utsname support - `4d58b69` Fix vn_open/vn_rdwr error handling - In vn_open() allocate 'dumppath' on heap instead of stack - Properly handle 'dump_fd == -1' error path - Free 'realpath' after added vn_dumpdir_code block Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 11:36:54 -08:00
Paul Dagnelie	fcff0f35bd	Illumos 5960, 5925 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin= Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> References: https://www.illumos.org/issues/5960 https://www.illumos.org/issues/5925 https://github.com/illumos/illumos-gate/commit/a2cdcdd Porting notes: - [lib/libzfs/libzfs_sendrecv.c] - `b8864a2` Fix gcc cast warnings - `325f023` Add linux kernel device support - `5c3f61e` Increase Linux pipe buffer size on 'zfs receive' - [module/zfs/zfs_vnops.c] - `3558fd7` Prototype/structure update for Linux - `c12e3a5` Restructure zfs_readdir() to fix regressions - [module/zfs/zvol.c] - Function @zvol_map_block() isn't needed in ZoL - `9965059` Prefetch start and end of volumes - [module/zfs/dmu.c] - Fixed ISO C90 - mixed declarations and code - Function dmu_prefetch() 'int i' is initialized before the following code block (c90 vs. c99) - [module/zfs/dbuf.c] - `fc5bb51` Fix stack dbuf_hold_impl() - `9b67f60` Illumos 4757, 4913 - `34229a2` Reduce stack usage for recursive traverse_visitbp() - [module/zfs/dmu_send.c] - Fixed ISO C90 - mixed declarations and code - `b58986e` Use large stacks when available - `241b541` Illumos 5959 - clean up per-dataset feature count code - `77aef6f` Use vmem_alloc() for nvlists - `00b4602` Add linux kernel memory support Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-08 15:08:19 -08:00
Matthew Ahrens	37f8a8835a	Illumos 5746 - more checksumming in zfs send 5746 more checksumming in zfs send Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@omniti.com> References: https://www.illumos.org/issues/5746 https://github.com/illumos/illumos-gate/commit/98110f0 https://github.com/zfsonlinux/zfs/issues/905 Porting notes: - Minor conflicts due to: - https://github.com/zfsonlinux/zfs/commit/2024041 - https://github.com/zfsonlinux/zfs/commit/044baf0 - https://github.com/zfsonlinux/zfs/commit/88904bb - Fix ISO C90 warnings (-Werror=declaration-after-statement) - arc_buf_t abuf; - dmu_buf_t bonus; - zio_cksum_t cksum_orig; - zio_cksum_t *cksump; - Fix format '%llx' format specifier warning - Align message in zstreamdump safe_malloc() with upstream Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3611	2015-12-30 14:24:14 -08:00
Chris Williamson	23de906c72	Illumos 5745 - zfs set allows only one dataset property to be set at a time 5745 zfs set allows only one dataset property to be set at a time Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Reviewed by: Richard PALO <richard@NetBSD.org> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Rich Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/5745 https://github.com/illumos/illumos-gate/commit/3092556 Porting notes: - Fix the missing braces around initializer, zfs_cmd_t zc = {"\0"}; - Remove extra format argument in zfs_do_set() - Declare at the top: - zfs_prop_t prop; - nvpair_t elem; - nvpair_t next; - int i; - Additionally initialize: - int added_resv = 0; - zfs_prop_t prop = 0; - Assign 0 install of NULL for uint64_t types. - zc->zc_nvlist_conf = '\0'; - zc->zc_nvlist_src = '\0'; - zc->zc_nvlist_dst = '\0'; Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3574	2015-12-29 16:59:26 -08:00
Brian Behlendorf	a22502c9e6	Either _ILP32 or _LP64 must be defined For some arm, powerpc, and sparc platforms it was possible that neither _ILP32 of _LP64 would be defined. Update the isa_defs.h header to explicitly set these macros and generate a compile error in the case neither are defined. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4048	2015-12-10 11:55:38 -08:00
ilovezfs	25df831b81	Fix cstyle issue from `7a02327` Continuations should be indented four spaces. Signed-off-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4062	2015-12-04 14:37:41 -08:00
ilovezfs	917b8c5cec	Ext4's typical GPT partition type not recognized Adding additional entries to the efi conversion array will help prevent the overwriting of the GPTs of disks with in-use file systems in more cases. Most notably, this adds partition type 8300 "Linux filesystem" (0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and btrfs, among others. This commit itself does nothing to address the underlying problematic behavior that check_slice() isn't called on partitions of an unrecognized type, even when they contain a currently mounted file system. The additional entries were derived from these two resources: https://en.wikipedia.org/wiki/GUID_Partition_Table http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc Signed-off-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4016	2015-12-04 09:27:00 -08:00
Yuri Pankov	fc80384923	Illumos 934 - FreeBSD's GPT not recognized Reviewed by: Alexander Eremin <alexander.r.eremin@gmail.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/934 https://github.com/illumos/illumos-gate/commit/e21ea67 Ported-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4016	2015-12-04 09:19:40 -08:00
Jason Zaman	5c790678f1	sysmacros: Make P2ROUNDUP not trigger int overflow The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow. Axioms: A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement. B. ~(x & y) === ((~(x)) \| (~(y))) under De Morgan's law. C. ~(~x) === x under the law of excluded middle. Proof: 0. (-(-(x) & -(align))) original 1. (~(-(x) & -(align)) + 1) by A 2. (((~(-(x))) \| (~(-(align)))) + 1) by B 3. (((~(~((x) - 1))) \| (~(~((align) - 1)))) + 1) by A 4. (((((x) - 1)) \| (((align) - 1))) + 1) by C Q.E.D. Signed-off-by: Jason Zaman <jason@perfinion.com> Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3949	2015-11-16 16:10:07 -08:00
Brian Behlendorf	98401d2361	Fix maybe uninitialized As of gcc 5.1.1 20150618 (Red Hat 5.1.1-4) the -Werror=maybe-uninitialized check detects that 'snapname' in recv_incremental_replication() may not be initialized. Explicitly initialize the variable to resolved the warning. libzfs_sendrecv.c: In function ‘recv_incremental_replication’: libzfs_sendrecv.c:2019:2: error: ‘snapname’ may be used uninitialized in (void) snprintf(buf, sizeof (buf), "%s@%s", fsname, snapname); Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-11-09 12:15:19 -08:00
DHE	385f9691c4	libzfs: handle EDOM errors EDOM may occur if a user tries to set `recordsize` too large without use "zfs set". This can be demonstrated with: > zpool create testpool -O recordsize=32M /dev/... Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3911	2015-10-13 09:54:04 -07:00
Ned Bass	3af56fd95f	Honor xattr=sa dataset property ZFS incorrectly uses directory-based extended attributes even when xattr=sa is specified as a dataset property or mount option. Support to honor temporary mount options including "xattr" was added in commit `0282c4137e`. There are two issues with the mount option handling: * Libzfs has historically included "xattr" in its list of default mount options. This overrides the dataset property, so the dataset is always configured to use directory-based xattrs even when the xattr dataset property is set to off or sa. Address this by removing "xattr" from the set of default mount options in libzfs. * There was no way to enable system attribute-based extended attributes using temporary mount options. Add the mount options "saxattr" and "dirxattr" which enable the xattr behavior their names suggest. This approach has the advantages of mirroring the valid xattr dataset property values and following existing conventions for mount option names. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3787	2015-09-19 14:04:14 -07:00
Brian Behlendorf	0282c4137e	Add temporary mount options Add the required kernel side infrastructure to parse arbitrary mount options. This enables us to support temporary mount options in largely the same way it is handled on other platforms. See the 'Temporary Mount Point Properties' section of zfs(8) for complete details. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #985 Closes #3351	2015-09-03 14:14:55 -07:00
Brian Behlendorf	4cb7b9c5d4	Check large block feature flag on volumes Since ZoL allows large blocks to be used by volumes, unlike upstream illumos, the feature flag must be checked prior to volume creation. This is critical because unlike filesystems, volumes will create a object which uses large blocks as part of the create. Therefore, it cannot be safely checked in zfs_check_settable() after the dataset can been created. In addition this patch updates the relevant error messages to use zfs_nicenum() to print the maximum blocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3591	2015-08-28 09:25:03 -07:00
Richard Yao	9f5ba90f9f	Fix zvol detection The zpool create subcomand should not return an error on debug builds of the userland tools when given zvols. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3595	2015-08-19 13:32:57 -07:00
Brian Behlendorf	1229323d5f	Align thread priority with Linux defaults Under Linux filesystem threads responsible for handling I/O are normally created with the maximum priority. Non-I/O filesystem processes run with the default priority. ZFS should adopt the same priority scheme under Linux to maintain good performance and so that it will complete fairly when other Linux filesystems are active. The priorities have been updated to the following: $ ps -eLo rtprio,cls,pid,pri,nice,cmd \| egrep 'z_\|spl_\|zvol\|arc\|dbu\|meta' - TS 10743 19 -20 [spl_kmem_cache] - TS 10744 19 -20 [spl_system_task] - TS 10745 19 -20 [spl_dynamic_tas] - TS 10764 19 0 [dbu_evict] - TS 10765 19 0 [arc_prune] - TS 10766 19 0 [arc_reclaim] - TS 10767 19 0 [arc_user_evicts] - TS 10768 19 0 [l2arc_feed] - TS 10769 39 0 [z_unmount] - TS 10770 39 -20 [zvol] - TS 11011 39 -20 [z_null_iss] - TS 11012 39 -20 [z_null_int] - TS 11013 39 -20 [z_rd_iss] - TS 11014 39 -20 [z_rd_int_0] - TS 11022 38 -19 [z_wr_iss] - TS 11023 39 -20 [z_wr_iss_h] - TS 11024 39 -20 [z_wr_int_0] - TS 11032 39 -20 [z_wr_int_h] - TS 11033 39 -20 [z_fr_iss_0] - TS 11041 39 -20 [z_fr_int] - TS 11042 39 -20 [z_cl_iss] - TS 11043 39 -20 [z_cl_int] - TS 11044 39 -20 [z_ioctl_iss] - TS 11045 39 -20 [z_ioctl_int] - TS 11046 39 -20 [metaslab_group_] - TS 11050 19 0 [z_iput] - TS 11121 38 -19 [z_wr_iss] Note that under Linux the meaning of a processes priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high value on illumos indicate a _high_ priority. In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way when changes are merged from upstream illumos we won't need to remember to invert the macro. It could also lead to confusion. This patch depends on https://github.com/zfsonlinux/spl/pull/466. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #3607	2015-07-28 13:36:47 -07:00
Matthew Ahrens	ca67b33aba	Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_grow 5376 arc_kmem_reap_now() should not result in clearing arc_no_grow Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5376 https://github.com/illumos/illumos-gate/commit/2ec99e3 Porting Notes: The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solution weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible. * The arc_available_memory() function was removed previous in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to determine a low memory. This provides the same basic functionality as the illumos version allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged so this change isn't a significant as it might first appear. * This patch includes the long standing fix for arc_shrink() which was originally proposed in #2167. Since there were related changes to this function it made sense to include that work. * The arc_init() function has been re-factored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3616 Closes #2167	2015-07-23 09:41:28 -07:00
Turbo Fredriksson	47a4a6fd5f	Support parallel build trees (VPATH builds) Build products from an out of tree build should be written relative to the build directory. Sources should be referred to by their locations in the source directory. This is accomplished by adding the 'src' and 'obj' variables for the module Makefile.am, using relative paths to reference source files, and by setting VPATH when source files are not co-located with the Makefile. This enables the following: $ mkdir build $ cd build $ ../configure \ --with-spl=$HOME/src/git/spl/ \ --with-spl-obj=$HOME/src/git/spl/build $ make -s This change also has the advantage of resolving the following warning which is generated by modern versions of automake. Makefile.am:00: warning: source file 'xxx' is in a subdirectory, Makefile.am:00: but option 'subdir-objects' is disabled Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1082	2015-07-17 13:42:51 -07:00
Manoj Joseph	93f6d7e2e5	Illumos 5764 - "zfs send -nv" directs output to stderr 5764 "zfs send -nv" directs output to stderr Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Basil Crow <basil.crow@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://github.com/illumos/illumos-gate/commit/dc5f28a https://www.illumos.org/issues/5764 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3585	2015-07-14 10:28:32 -07:00
Richard Yao	541da9935d	Fix Xen Virtual Block Device detection We fail to make partitions on xvd (Xen Virtual Block) devices. This also causes debug builds of zpool create to return an error when given xen virtual block devices. These devices should be given the same treatment as vd (KVM Virtual Block) devices, so we adjust the relevant code paths. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3576	2015-07-10 12:21:14 -07:00
Will Andrews	98cb3a7655	Illumos 5813 - zfs_setprop_error(): Handle errno value E2BIG. 5813 zfs_setprop_error(): Handle errno value E2BIG. Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://github.com/illumos/illumos-gate/commit/6fdcb3d https://www.illumos.org/issues/5813 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3572	2015-07-10 12:13:19 -07:00
Jan Kryl	15cfbb38fd	Illumos 5427 - memory leak in libzfs when doing rollback 5427 memory leak in libzfs when doing rollback Reviewed by: Michael Tsymbalyuk <mtzaurus@gmail.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Dan McDonald <danmcd@omniti.com> References https://github.com/illumos/illumos-gate/commit/b7070b7 https://www.illumos.org/issues/5427 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3569	2015-07-10 12:09:32 -07:00
Josef 'Jeff' Sipek	02f8fe4260	Illumos 4626 - libzfs memleak in zpool_in_use() 4626 libzfs memleak in zpool_in_use() Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://github.com/illumos/illumos-gate/commit/fb13f48 https://www.illumos.org/issues/4626 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3563	2015-07-10 11:57:38 -07:00
George Wilson	669dedb33f	Illumos 5163 - arc should reap range_seg_cache 5163 arc should reap range_seg_cache Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5163 https://github.com/illumos/illumos-gate/commit/83803b5 Porting Notes: Added umem_cache_reap_now() wrapped to suppress unused variable warning for user space build in arc_kmem_reap_now(). Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-25 08:58:16 -07:00
Brian Behlendorf	aa9af22cdf	Update all default taskq settings Over the years the default values for the taskqs used on Linux have differed slightly from illumos. In the vast majority of cases this was done to avoid creating an obnoxious number of idle threads which would pollute the process listing. With the addition of support for dynamic taskqs all multi-threaded queues should be created as dynamic taskqs. This allows us to get the best of both worlds. * The illumos default values for the I/O pipeline can be restored. These values are known to work well for most workloads. The only exception is the zio write interrupt taskq which is changed to ZTI_P(12, 8). At least under Linux more threads has been shown to improve performance, see commit `7e55f4e`. * Reduces the number of idle threads on the system when it's not under heavy load. The maximum number of threads will only be created when they are required. * Remove the vdev_file_taskq and rely on the system_taskq instead which is now dynamic and may have up to 64-threads. Again this brings us back inline with upstream. * Tasks dispatched with taskq_dispatch_ent() are allowed to use dynamic taskqs. The Linux taskq implementation supports this. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #3507	2015-06-25 08:58:16 -07:00
Brian Behlendorf	fa720217b9	Add IMPLY() and EQUIV() macros Added for upstream compatibility, they are of the form: * IMPLY(a, b) - if (a) then (b) * EQUIV(a, b) - if (a) then (b) AND if (b) then (a) Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-24 15:11:48 -07:00
Brian Behlendorf	4f34bd9792	Add taskq_wait_outstanding() function SPL commit behlendorf/spl@9cef1b5 adds the taskq_wait_outstanding() interface. See the commit log for the full justification for this addition. This patch adds the required user space counterpart. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com>	2015-06-11 10:27:25 -07:00
Prakash Surya	ca0bf58d65	Illumos 5497 - lock contention on arcs_mtx Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Porting notes and other significant code changes: The illumos 5368 patch (ARC should cache more metadata), which was never picked up by ZoL, is mostly reverted by this patch. Since ZoL relies on the kernel asynchronously calling the shrinker to actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every time it runs. The arc_adapt_thread() function no longer calls arc_do_user_evicts() since the newly-added arc_user_evicts_thread() calls it periodically. Notable conflicting ZoL commits which conflicted with this patch or whose effects are either duplicated or un-done by this patch: `302f753` - Integrate ARC more tightly with Linux `39e055c` - Adjust arc_p based on "bytes" in arc_shrink `f521ce1` - Allow "arc_p" to drop to zero or grow to "arc_c" `77765b5` - Remove "arc_meta_used" from arc_adjust calculation `94520ca` - Prune metadata from ghost lists in arc_adjust_meta Trace support for multilist_insert() and multilist_remove() has been added and produces the following output: fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 } fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 } The following arcstats have been removed: recycle_miss - Used by arcstat.py and arc_summary.py, both of which have been updated appropriately. l2_writes_hdr_miss The following arcstats have been added: evict_not_enough - Number of times arc_evict_state() was unable to evict enough buffers to reach its target amount. evict_l2_skip - Number of times arc_evict_hdr() skipped eviction because it was being written to the l2arc. l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times l2arc_write_done() failed to acquire hash_lock (and re-tries). arc_meta_min - Shows the value of the zfs_arc_meta_min module parameter (see below). The "index" column of the "dbuf" kstat has been removed since it doesn't have a direct analog in the new multilist scheme. Additional multilist- related stats could be added in the future but would likely require extensions to the mulilist API. The following module parameters have been added: zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list before moving on to the next sub-list. zfs_arc_meta_min - Enforce a floor on the amount of metadata in the ARC. zfs_arc_num_sublists_per_state - Number of multilist sub-lists per ARC state. zfs_arc_overflow_shift - Controls amount by which the ARC must exceed the target size to be considered "overflowing". Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov	2015-06-11 10:27:25 -07:00
Brian Behlendorf	65037d9b25	Add libzfs_error_init() function All fprintf() error messages are moved out of the libzfs_init() library function where they never belonged in the first place. A libzfs_error_init() function is added to provide useful error messages for the most common causes of failure. Additionally, in libzfs_run_process() the 'rc' variable was renamed to 'error' for consistency with the rest of the code base. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org>	2015-05-22 13:34:58 -07:00
Brian Behlendorf	87abfcba22	Wait in libzfs_init() for the /dev/zfs device While module loading itself is synchronous the creation of the /dev/zfs device is not. This is because /dev/zfs is typically created by a udev rule after the module is registered and presented to user space through sysfs. This small window between module loading and device creation can result in spurious failures of libzfs_init(). This patch closes that race by extending libzfs_init() so it can detect that the modules are loaded and only if required wait for the /dev/zfs device to be created. This allows scripts to reliably use the following shell construct without the need for additional error handling. $ /sbin/modprobe zfs && /sbin/zpool import -a To minimize the potential time waiting in libzfs_init() a strategy similar to adaptive mutexes is employed. The function will busy-wait for up to 10ms based on the expectation that the modules were just loaded and therefore the /dev/zfs will be created imminently. If it takes longer than this it will fall back to polling for up to 10 seconds. This behavior can be customized to some degree by setting the following new environment variables. This functionality is provided for backwards compatibility with existing scripts which depend on the module auto-load behavior. By default module auto-loading is now disabled. * ZFS_MODULE_LOADING="YES\|yes\|ON\|on" - Attempt to load modules. * ZFS_MODULE_TIMEOUT="<seconds>" - Seconds to wait for /dev/zfs The zfs-import-* systemd service files have been updated to call '/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #2556	2015-05-22 13:31:58 -07:00
Hajo Möller	141b6381d3	Change 3-digit octal escapes to 4-digit ones Prefixing an octal value with a leading zero is the standard way to disambiguate it. This change only impacts the `zfs diff` output and is therefore very limited in scope. Signed-off-by: Hajo M<C3><B6>ller <dasjoe@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3417	2015-05-18 17:14:07 -07:00
Alexander Eremin	7fec46b9d8	Illumos 5847 - libzfs_diff should check zfs_prop_get() return 5847 libzfs_diff should check zfs_prop_get() return Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Albert Lee <trisk@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5847 https://github.com/illumos/illumos-gate/commit/8430278 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3412	2015-05-15 11:17:14 -07:00
Max Grossman	5dc8b7365f	Illumos 5765 - add support for estimating send stream size with lzc_send_space when source is a bookmark 5765 add support for estimating send stream size with lzc_send_space when source is a bookmark Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@nexenta.com> References: https://www.illumos.org/issues/5765 https://github.com/illumos/illumos-gate/commit/643da460 Porting notes: * Unused variable 'recordsize' in dmu_send_estimate() dropped Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3397	2015-05-13 09:03:59 -07:00
Matthew Ahrens	f1512ee61e	Illumos 5027 - zfs large block support 5027 zfs large block support Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5027 https://github.com/illumos/illumos-gate/commit/b515258 Porting Notes: * Included in this patch is a tiny ISP2() cleanup in zio_init() from Illumos 5255. * Unlike the upstream Illumos commit this patch does not impose an arbitrary 128K block size limit on volumes. Volumes, like filesystems, are limited by the zfs_max_recordsize=1M module option. * By default the maximum record size is limited to 1M by the module option zfs_max_recordsize. This value may be safely increased up to 16M which is the largest block size supported by the on-disk format. At the moment, 1M blocks clearly offer a significant performance improvement but the benefits of going beyond this for the majority of workloads are less clear. * The illumos version of this patch increased DMU_MAX_ACCESS to 32M. This was determined not to be large enough when using 16M blocks because the zfs_make_xattrdir() function will fail (EFBIG) when assigning a TX. This was immediately observed under Linux because all newly created files must have a security xattr created and that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M. * On 32-bit platforms a hard limit of 1M is set for blocks due to the limited virtual address space. We should be able to relax this one the ABD patches are merged. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #354	2015-05-11 12:23:16 -07:00
Brian Behlendorf	3df293404a	Fix type mismatch on 32-bit systems The umem_alloc_aligned() function should not assume that a 'void *' type is 64-bit. It will not be on 32-bit platforms. Rather than complicating the ASSERT to handle this it is simply removed. Additionally, the '%lu' format specifier should not be assumed to imply a 64-bit value. Fix this by using the 'llu' format specifier which will always be atleast 64-bit and explicitly casing the variable to an u_longlong_t. This issue is handled the same way in many other parts of the code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-05-11 12:15:41 -07:00
George Wilson	98b254188a	Illumos #5244 - zio pipeline callers should explicitly invoke next stage 5244 zio pipeline callers should explicitly invoke next stage Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/5244 https://github.com/illumos/illumos-gate/commit/738f37b Porting Notes: 1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c. 2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c. Ported-by: Richard Yao <richard.yao@clusterhq.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2828	2015-04-30 15:07:47 -07:00
Jerry Jelinek	788eb90c4c	Illumos 3897 - zfs filesystem and snapshot limits 3897 zfs filesystem and snapshot limits Author: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3897 https://github.com/illumos/illumos-gate/commit/a2afb61 Porting Notes: dsl_dataset_snapshot_check(): reduce stack usage using kmem_alloc(). Ported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-04-28 16:22:51 -07:00
Matthew Ahrens	308a451f7f	Illumos 5134 - if ZFS_DEBUG or debug= is set, libzpool should enable debug prints 5134 if ZFS_DEBUG or debug= is set, libzpool should enable debug prints Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/projects/illumos-gate/issues/5134 https://github.com/illumos/illumos-gate/commit/7fa49ea Porting notes: Added dprintf_setup() to main in zfs_main.c and zpool_main.c. Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2669	2015-04-28 10:13:40 -07:00
Tim Chase	59199d9083	Support the "version" property on volumes via the zfs_prop_get_int() API As of this commit, volumes do not possess the version property in any existing OpenZFS implementation. The zpool upgrade code, however, uses zfs_prop_get_int() to fetch the version property of all children in a pool. The semantics of the function, however, demand that it only be used for known valid properties so it returns a garbage value for volumes. This patch causes the version of a volume to appear to callers using the zfs_prop_get_int() API to be that of the default ZPL version for the implementation. In the future, should volumes gain the property, its actual value will be used. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #3313	2015-04-24 11:08:37 -07:00
Ned Bass	9540be9b23	zpool import should honor overlay property Make the 'zpool import' command honor the overlay property to allow filesystems to be mounted on a non-empty directory. As it stands now this property is only checked by the 'zfs mount' command. Move the check into 'zfs_mount()` in libzpool so the property is honored for all callers. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3227	2015-03-27 14:46:58 -07:00
Brian Behlendorf	7d90f569b3	Check all vdev labels in 'zpool import' When using 'zpool import' to scan for available pools prefer vdev names which reference vdevs with more valid labels. There should be two labels at the start of the device and two labels at the end of the device. If labels are missing then the device has been damaged or is in some other way incomplete. Preferring names with fully intact labels helps weed out bad paths and improves the likelihood of being able to import the pool. This behavior only applies when scanning /dev/ for valid pools. If a cache file exists the pools described by the cache file will be used. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Closes #3145 Closes #2844 Closes #3107	2015-03-25 14:52:52 -07:00
Richard Yao	5c3f61eb49	Increase Linux pipe buffer size on 'zfs receive' I noticed when reviewing documentation that it is possible for user space to use fctnl(fd, F_SETPIPE_SZ, (unsigned long) size) to change the kernel pipe buffer size on Linux to increase the pipe size up to the value specified in /proc/sys/fs/pipe-max-size. There are users using mbuffer to improve zfs recv performance when piping over the network, so it seems advantageous to integrate such functionality directly into the zfs recv tool. This avoids the addition of two buffers and two copies (one for the buffer mbuffer adds and another for the additional pipe), so it should be more efficient. This could have been made configurable and/or this could have changed the value back to the original after we were done with the file descriptor, but I do not see a strong case for doing either, so I went with a simple implementation. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1161	2015-03-20 10:03:34 -07:00
Christer Ekholm	8bdcfb5396	Fix possible future overflow in zfs_nicenum The function zfs_nicenum that converts number to human-readable output uses a index to a string of letters. This patch limits the index to the length of the string. Signed-off-by: Christer Ekholm <che@chrekh.se> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3122	2015-02-24 11:39:10 -08:00
Brian Behlendorf	340dfbe193	Change VERIFY to ASSERT in mutex_destroy() There have been multiple reports of 'zdb' tripping the VERIFY in mutex_destroy() because pthread_mutex_destroy() returns EBUSY. Exactly how this can happen still needs to be explained, but this doesn't strictly need to be fatal for non-debug builds. Therefore, this patch converts the VERIFY to an ASSERT until the root cause is determined and resolved. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2027	2015-02-11 13:58:40 -08:00
Tim Chase	e890dd85a7	Produce a full snapshot list for zfs send -p In order to accelerate zfs receive operations in the face of many property-containing snapshots, commit `0574855` changed the header nvlist ("fss") of a send stream to exclude snapshots which aren't part of the stream. This, however, would cause zfs receive -F to erroneously remove snapshots; it would remove any snapshot which wasn't listed in the header nvlist. This patch restores the full list of snapshots in fss[<id>[snaps]] but still suppresses the properties of non-sent snapshots and also removes a consistency check in which an error is raised if a listed snapshot does not have any properties in fss[<id>[snapprops]]. The `0574855` commit also introduced a bug in which zfs send -p of a complete stream (zfs send -p pool/fs@snap) would exclude the snapshot properties in fss[<id>[snapprops]]. This patch detects the last snapshot in a series when no "from" snapshot has been specified and includes its properties. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2907	2015-02-09 16:43:17 -08:00
Chunwei Chen	53698a453d	Read spl_hostid module parameter before gethostid() If spl_hostid is set via module parameter, it's likely different from gethostid(). Therefore, the userspace tool should read it first before falling back to gethostid(). Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3034	2015-02-04 16:44:53 -08:00
Richard Yao	f9f431cd28	Use (void) memcpy(), not (void *) memcpy() This was caught by Clang. Clearly the intent of this code was to explicitly ignore the return value. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3054	2015-01-30 09:43:04 -08:00
Brian Behlendorf	6466b61db6	Make `zpool import -d\|-c` behave consistently When importing pools with zpool import -aN there is inconsistent behavior between '-d /dev/disk/by-id' (or another path) and '-c /etc/zfs/zpool.cache'. The difference in behavior is caused by zpool_find_import_cached() returning an empty nvlist_t when there are no pools to import but zpool_find_import_impl() returns NULL for the same situation. The behavior of zpool_find_import_cached() is arguably more correct because it allows returning NULL to be used for an error case and not an empty set. This change resolves the issue by updating get_configs() such that it returns an empty set instead of NULL when no config is found. The updated behavior will now always return 0 for this case. $ zpool import -aN; echo $? no pools available to import 0 $ zpool import -aN -d /var/tmp/; echo $? no pools available to import 0 $ zpool import -aN -c /etc/zfs/zpool.cache; echo $? no pools available to import 0 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2080	2015-01-28 11:12:31 -08:00
Brian Behlendorf	92119cc259	Mark IO pipeline with PF_FSTRANS In order to avoid deadlocking in the IO pipeline it is critical that pageout be avoided during direct memory reclaim. This ensures that the pipeline threads can always make forward progress and never end up blocking on a DMU transaction. For this very reason Linux now provides the PF_FSTRANS flag which may be set in the process context. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-01-16 14:28:05 -08:00
Prakash Surya	0b39b9f96f	Swap DTRACE_PROBE* with Linux tracepoints This patch leverages Linux tracepoints from within the ZFS on Linux code base. It also refactors the debug code to bring it back in sync with Illumos. The information exported via tracepoints can be used for a variety of reasons (e.g. debugging, tuning, general exploration/understanding, etc). It is advantageous to use Linux tracepoints as the mechanism to export this kind of information (as opposed to something else) for a number of reasons: * A number of external tools can make use of our tracepoints "automatically" (e.g. perf, systemtap) * Tracepoints are designed to be extremely cheap when disabled * It's one of the "accepted" ways to export this kind of information; many other kernel subsystems use tracepoints too. Unfortunately, though, there are a few caveats as well: * Linux tracepoints appear to only be available to GPL licensed modules due to the way certain kernel functions are exported. Thus, to actually make use of the tracepoints introduced by this patch, one might have to patch and re-compile the kernel; exporting the necessary functions to non-GPL modules. * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux tracepoints are not available for unsigned kernel modules (tracepoints will get disabled due to the module's 'F' taint). Thus, one either has to sign the zfs kernel module prior to loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or newer. Assuming the above two requirements are satisfied, lets look at an example of how this patch can be used and what information it exposes (all commands run as 'root'): # list all zfs tracepoints available $ ls /sys/kernel/debug/tracing/events/zfs enable filter zfs_arc__delete zfs_arc__evict zfs_arc__hit zfs_arc__miss zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone zfs_l2arc__miss zfs_l2arc__read zfs_l2arc__write zfs_new_state__mfu zfs_new_state__mru # enable all zfs tracepoints, clear the tracepoint ring buffer $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable $ echo 0 > /sys/kernel/debug/tracing/trace # import zpool called 'tank', inspect tracepoint data (each line was # truncated, they're too long for a commit message otherwise) $ zpool import tank $ cat /sys/kernel/debug/tracing/trace \| head -n35 # tracer: nop # # entries-in-buffer/entries-written: 1219/1219 #P:8 # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr... z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr... z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr... z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr... z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr... z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr... z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr... z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ... To highlight the kind of detailed information that is being exported using this infrastructure, I've taken the first tracepoint line from the output above and reformatted it such that it fits in 80 columns: lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr { dva 0x1:0x40082 birth 15491 cksum0 0x163edbff3a flags 0x640 datacnt 1 type 1 size 2048 spa 3133524293419867460 state_type 0 access 0 mru_hits 0 mru_ghost_hits 0 mfu_hits 0 mfu_ghost_hits 0 l2_hits 0 refcount 1 } bp { dva0 0x1:0x40082 dva1 0x1:0x3000e5 dva2 0x1:0x5a006e cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00 lsize 2048 } zb { objset 0 object 0 level -1 blkid 0 } For the specific tracepoint shown here, 'zfs_arc__miss', data is exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!). This kind of precise and detailed information can be extremely valuable when trying to answer certain kinds of questions. For anybody unfamiliar but looking to build on this, I found the XFS source code along with the following three web links to be extremely helpful: * http://lwn.net/Articles/379903/ * http://lwn.net/Articles/381064/ * http://lwn.net/Articles/383362/ I should also node the more "boring" aspects of this patch: * The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to support a sixth paramter. This parameter is used to populate the contents of the new conftest.h file. If no sixth parameter is provided, conftest.h will be empty. * The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced. This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro, except it has support for a fifth option that is then passed as the sixth parameter to ZFS_LINUX_COMPILE_IFELSE. These autoconf changes were needed to test the availability of the Linux tracepoint macros. Due to the odd nature of the Linux tracepoint macro API, a separate ".h" must be created (the path and filename is used internally by the kernel's define_trace.h file). * The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This is to determine if we can safely enable the Linux tracepoint functionality. We need to selectively disable the tracepoint code due to the kernel exporting certain functions as GPL only. Without this check, the build process will fail at link time. In addition, the SET_ERROR macro was modified into a tracepoint as well. To do this, the 'sdt.h' file was moved into the 'include/sys' directory and now contains a userspace portion and a kernel space portion. The dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as well. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-11-17 11:13:55 -08:00
Brian Behlendorf	f0e324f25d	Update utsname support Modify the code to use the utsname() kernel function rather than a global variable. This results is cleaner more portable code because utsname() is already provided by the kernel and can be easily emulated in user space via uname(2). This means that it will behave consistently in both contexts. This is also has the benefit that it allows the removal of a few _KERNEL pre-processor conditions. And it also is a pre-requisite for a proper FUSE port because we need to provide a valid utsname. Finally, it allows us to remove this functionality from the SPL and all the related compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2757	2014-10-17 14:58:57 -07:00
Andriy Gapon	057485504e	zfs send -p send properties only for snapshots that are actually sent ... as opposed to sending properties of all snapshots of the relevant filesystem. The previous behavior results in properties being set on all snapshots on the receiving side, which is quite slow. Behavior of zfs send -R is not changed. References: http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/346 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2729 Issue #2210	2014-10-02 16:52:02 -07:00
smh	7509a3d299	FreeBSD PR kern/172259: Fixes zfs receive errors FreeBSD PR kern/172259: Fixes zfs receive errors caused by snapshot replication being processed in a random order instead of creation order. Eliminates needless filesystem renames caused by removed parent snapshots which subsequently causes many more errors. PR: kern/172259 Submitted by: Steven Hartland Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks References: https://github.com/freebsd/freebsd/commit/4995789 Porting notes: Minor whitespace fixes were made to conform with style requirements: lib/libzfs/libzfs_sendrecv.c: 2269: indent by spaces instead of tabs lib/libzfs/libzfs_sendrecv.c: 2270: indent by spaces instead of tabs Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2729	2014-10-02 16:48:19 -07:00
Richard Yao	83e9986f6e	Implement -t option to zpool create for temporary pool names Creating virtual machines that have their rootfs on ZFS on hosts that have their rootfs on ZFS causes SPA namespace collisions when the standard name rpool is used. The solution is either to give each guest pool a name unique to the host, which is not always desireable, or boot a VM environment containing an ISO image to install it, which is cumbersome. `26b42f3f9d` introduced `zpool import -t ...` to simplify situations where a host must access a guest's pool when there is a SPA namespace conflict. We build upon that to introduce `zpool import -t tname ...`. That allows us to create a pool whose in-core name is tname, but whose on-disk name is the normal name specified. This simplifies the creation of machine images that use a rootfs on ZFS. That benefits not only real world deployments, but also ZFSOnLinux development by decreasing the time needed to perform rootfs on ZFS experiments. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2417	2014-09-30 10:46:59 -07:00
Brian Behlendorf	aa0ac7caa4	Make user stack limit configurable To aid in detecting and debugging stack overflow issues make the user space stack limit configurable via a new ZFS_STACK_SIZE environment variable. The value assigned to ZFS_STACK_SIZE will be used as the default stack size in bytes. Because this is mainly useful as a debugging aid in conjunction with ztest the stack limit is disabled by default. See the ztest(1) man page for additional details on using the ZFS_STACK_SIZE environment variable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #2743 Issue #2293	2014-09-30 10:46:55 -07:00
Matthew Ahrens	1f6f97f304	Illumos 5116 - zpool history -i goes into infinite loop 5116 zpool history -i goes into infinite loop Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Boris Protopopov <boris.protopopov@me.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5116 https://github.com/illumos/illumos-gate/commit/3339867 Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2715	2014-09-23 11:58:05 -07:00
Matthew Ahrens	ab2894e66f	Illumos 5135 - zpool_find_import_cached() can use fnvlist_* Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5135 https://github.com/illumos/illumos-gate/commit/b18d6b0 Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2693	2014-09-23 11:44:29 -07:00
Richard Yao	843b4aad50	lib/libzpool/kernel.c: Assert no owners in rw_destroy() This is intended to cause ztest to fail when rw_destroy() is called on a rwlock that has owners. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2330	2014-09-23 10:34:46 -07:00
Richard Yao	928ee9fe18	Properly NULL terminate string in zfs_strcmp_pathname The utility cppcheck caught this. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2330	2014-09-23 10:32:21 -07:00
George Wilson	a05dfd0028	Illumos 5147 - zpool list -v should show individual disk capacity The 'zpool list -v' command displays lots of info but excludes the capacity of each disk. This should be added. 5147 zpool list -v should show individual disk capacity Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Matthew Ahrens <matthew.ahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5147 https://github.com/illumos/illumos-gate/commit/7a09f97 Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2688	2014-09-23 10:10:42 -07:00
Ned Bass	cfd3549a53	Remove obsolete comment about guard pages Remove an obsolete comment that refers to code removed by commit `79c6e4c4`. The code and comment related to space consumed by guard pages in user-space stacks, which we no longer take into account. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2722	2014-09-22 12:22:16 -07:00
ilovezfs	1ca56e6033	Fragmentation should display as '-' if spacemap_histogram=disabled When com.delphix:spacemap_histogram is disabled, the value of fragmentation was printing as 18446744073709551615 (UINT64_MAX), when it should print as '-'. The issue was caused by a small mistake during the merge of "4980 metaslabs should have a fragmentation metric." upstream: https://github.com/illumos/illumos-gate/commit/2e4c998 ZoL: https://github.com/zfsonlinux/zfs/commit/f3a7f66 The problem is in zpool_get_prop_literal, where the handling of the pool property ZPOOL_PROP_FRAGMENTATION was added to wrong the section. In particular, ZPOOL_PROP_FRAGMENTATION should not be in the section where zpool_get_state(zhp) == POOL_STATE_UNAVAIL, but lower down after it's already been determined that the pool is in fact available, which is where upstream illumos correctly has had it. Thanks to lundman for helping to track down this bug. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2664	2014-09-05 09:19:01 -07:00
Turbo Fredriksson	c3f8dc2a48	Add a pkgconfig file Providing a pkg-config file makes is easy for 3rd party applications to link against the libzfs libraries. It also allows the libzfs developers to modify the list of required libraries and cflags without breaking existing applications. The following example illustrates how pkg-config can be used: cc `pkg-config --cflags --libs libzfs` -o myapp myapp.c /* * myapp.c / void main() { libzfs_handle_t hdl; hdl = libzfs_init(); if (hdl) libzfs_fini(hdl); } Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #585	2014-08-28 07:59:43 -07:00
Brian Behlendorf	9ad656b2d0	Retire HAVE_IOCTL_* configure checks The HAVE_IOCTL_* configure checks were originally added for compatibility with an ancient version of glibc. This support and additional complexity is no longer needed and is therefore being removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Closes #585	2014-08-28 07:45:54 -07:00
Andrew Hamilton	d09a99f96b	2493 change efi_rescan() to wait longer Change efi_rescan() to loop 10 times instead of 5 on EBUSY and to sleep at the end of each loop. This helps with some instances where the kernel does not reload the partition table fast enough for ZFS to detect. Signed-off-by: Andrew Hamilton <ahamilto@tjhsst.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2493	2014-08-26 16:12:37 -07:00
George Wilson	f3a7f6610f	Illumos 4976-4984 - metaslab improvements 4976 zfs should only avoid writing to a failing non-redundant top-level vdev 4978 ztest fails in get_metaslab_refcount() 4979 extend free space histogram to device and pool 4980 metaslabs should have a fragmentation metric 4981 remove fragmented ops vector from block allocator 4982 space_map object should proactively upgrade when feature is enabled 4983 need to collect metaslab information via mdb 4984 device selection should use fragmentation metric Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4976 https://www.illumos.org/issues/4978 https://www.illumos.org/issues/4979 https://www.illumos.org/issues/4980 https://www.illumos.org/issues/4981 https://www.illumos.org/issues/4982 https://www.illumos.org/issues/4983 https://www.illumos.org/issues/4984 https://github.com/illumos/illumos-gate/commit/2e4c998 Notes: The "zdb -M" option has been re-tasked to display the new metaslab fragmentation metric and the new "zdb -I" option is used to control the maximum number of in-flight I/Os. The new fragmentation metric is derived from the space map histogram which has been rolled up to the vdev and pool level and is presented to the user via "zpool list". Add a number of module parameters related to the new metaslab weighting logic. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2595	2014-08-18 08:40:49 -07:00
Alec Salazar	449e75d768	Avoid PAGESIZE redefinition Add #ifndef PAGESIZE to avoid redefinition warning on platforms where this value is already provided. Signed-off-by: Alec Salazar <alec.j.salazar@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2588	2014-08-13 10:36:09 -07:00
Alec Salazar	22a11a5b5a	Replace __va_list with va_list Most of the code base already uses va_list, which is specified by iso-c. gcc/glibc provides 'typedef __gnuc_va_list va_list'. and when not using gcc/glibc we can't expect to find __gnuc_va_list. Signed-off-by: Alec Salazar <alec.j.salazar@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2588	2014-08-13 10:35:00 -07:00
Matthew Ahrens	5dbd68a352	Illumos 4914 - zfs on-disk bookmark structure should be named _phys_t 4914 zfs on-disk bookmark structure should be named _phys_t Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> References: https://www.illumos.org/issues/4914 https://github.com/illumos/illumos-gate/commit/7802d7b Porting notes: There were a number of zfsonlinux-specific uses of zbookmark_t which needed to be updated. This should reduce the likelihood of further problems like issue #2094 from occurring. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2558	2014-08-06 14:48:41 -07:00
Matthew Ahrens	fbeddd60b7	Illumos 4390 - I/O errors can corrupt space map when deleting fs/vol 4390 i/o errors when deleting filesystem/zvol can lead to space map corruption Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4390 https://github.com/illumos/illumos-gate/commit/7fd05ac Porting notes: Previous stack-reduction efforts in traverse_visitb() caused a fair number of un-mergable pieces of code. This patch should reduce its stack footprint a bit more. The new local bptree_entry_phys_t in bptree_add() is dynamically-allocated using kmem_zalloc() for the purpose of stack reduction. The new global zfs_free_leak_on_eio has been defined as an integer rather than a boolean_t as was the case with the related zfs_recover global. Also, zfs_free_leak_on_eio's definition has been inserted into zfs_debug.c for consistency with the existing definition of zfs_recover. Illumos placed it in spa_misc.c. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2545	2014-08-04 11:50:52 -07:00
Matthew Ahrens	9b67f60560	Illumos 4757, 4913 4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4757 https://www.illumos.org/issues/4913 https://github.com/illumos/illumos-gate/commit/5d7b4d4 Porting notes: For compatibility with the fastpath code the zio_done() function needed to be updated. Because embedded-data block pointers do not require DVAs to be allocated the associated vdevs will not be marked and therefore should not be unmarked. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2544	2014-08-01 14:28:05 -07:00
Matthew Ahrens	9bd274ddd8	Illumos #4374 4374 dn_free_ranges should use range_tree_t Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4374 https://github.com/illumos/illumos-gate/commit/bf16b11 Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2531	2014-07-30 09:20:35 -07:00
Matthew Ahrens	da536844d5	Illumos 4368, 4369. 4369 implement zfs bookmarks 4368 zfs send filesystems from readonly pools Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4369 https://www.illumos.org/issues/4368 https://github.com/illumos/illumos-gate/commit/78f1710 Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2530	2014-07-29 10:55:29 -07:00
Matthew Ahrens	fa86b5dbb6	Illumos 4171, 4172 4171 clean up spa_feature_*() interfaces 4172 implement extensible_dataset feature for use by other zpool features Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Approved by: Garrett D'Amore <garrett@damore.org>a References: https://www.illumos.org/issues/4171 https://www.illumos.org/issues/4172 https://github.com/illumos/illumos-gate/commit/2acef22 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2528	2014-07-25 16:40:07 -07:00
George Wilson	93cf20764a	Illumos #4101 , #4102 , #4103 , #4105 , #4106 4101 metaslab_debug should allow for fine-grained control 4102 space_maps should store more information about themselves 4103 space map object blocksize should be increased 4105 removing a mirrored log device results in a leaked object 4106 asynchronously load metaslab Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Sebastien Roy <seb@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Prior to this patch, space_maps were preferred solely based on the amount of free space left in each. Unfortunately, this heuristic didn't contain any information about the make-up of that free space, which meant we could keep preferring and loading a highly fragmented space map that wouldn't actually have enough contiguous space to satisfy the allocation; then unloading that space_map and repeating the process. This change modifies the space_map's to store additional information about the contiguous space in the space_map, so that we can use this information to make a better decision about which space_map to load. This requires reallocating all space_map objects to increase their bonus buffer size sizes enough to fit the new metadata. The above feature can be enabled via a new feature flag introduced by this change: com.delphix:spacemap_histogram In addition to the above, this patch allows the space_map block size to be increase. Currently the block size is set to be 4K in size, which has certain implications including the following: * 4K sector devices will not see any compression benefit * large space_maps require more metadata on-disk * large space_maps require more time to load (typically random reads) Now the space_map block size can adjust as needed up to the maximum size set via the space_map_max_blksz variable. A bug was fixed which resulted in potentially leaking an object when removing a mirrored log device. The previous logic for vdev_remove() did not deal with removing top-level vdevs that are interior vdevs (i.e. mirror) correctly. The problem would occur when removing a mirrored log device, and result in the DTL space map object being leaked; because top-level vdevs don't have DTL space map objects associated with them. References: https://www.illumos.org/issues/4101 https://www.illumos.org/issues/4102 https://www.illumos.org/issues/4103 https://www.illumos.org/issues/4105 https://www.illumos.org/issues/4106 https://github.com/illumos/illumos-gate/commit/0713e23 Porting notes: A handful of kmem_alloc() calls were converted to kmem_zalloc(). Also, the KM_PUSHPAGE and TQ_PUSHPAGE flags were used as necessary. Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2488	2014-07-22 09:39:16 -07:00
Garrison Jensen	b8fce77b08	Fix comment spelling errors. Signed-off-by: Garrison Jensen <garrison.jensen@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2402	2014-07-01 14:20:23 -07:00
Tim Chase	09c0b8fe5e	Return default value on numeric properties failing the "head check. Updates `962d524212`. The referenced fix to get_numeric_property() caused numeric property lookups to consider the type of the parent (head) dataset when checking validity but there are some cases in the caller expects to see the property's default value even when the lookup is invalid. One case in which this is true is change_one() which is part of the renaming infrastructure. It may look up "zoned" on a snapshot of a volume which is not valid but it expects to see the default value of false. There may be other, yet unidentified cases in which zfs_prop_get_int() is used on technically invalid properties but which expect the property's default value to be returned. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Turbo Fredriksson <turbo@bayour.com> Closes #2320	2014-07-01 14:14:31 -07:00
Brian Behlendorf	d4aae2a054	Improve differing sector size error When adding or replacing a vdev with a different sector size the error message should be more useful. In addition to describing the problem provide a hint that the '-o ashift' option can be used to override the optimal default value. Since using a non-optimal value may incur a significant performance penalty we should issue this error. But there a numerous reasons why a administrator may wish to do this anyway. Signed-off-by: Niklas Edmundsson <ZNikke@github> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2421	2014-06-27 11:44:36 -07:00
Richard Yao	4def05f8a6	Fix memory leak in zpool_clear_label() Clang's static analyzer reported a memory leak in zpool_clear_label(). Upon review, it turns out to be right. This should be a very short lived leak because no daemons use this functionality, but that does not preclude the possibility of third party daemons that do use it. Lets fix it to be a good Samaritan. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2330	2014-05-30 17:00:37 -07:00
Marcel Huber	58bd7ad060	Omit compiler warning by sticking to RAII Resolve gcc 4.9.0 20140507 warnings about uninitialized 'ptr' when using -Wmaybe-uninitialized. The first two cases appears appear to be legitimate but not the second two. In general this is a good practice so they are all initialized. Signed-off-by: Marcel Huber <marcelhuberfoo@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2345	2014-05-22 09:44:15 -07:00
Tim Chase	962d524212	Check the dataset type more rigorously when fetching properties. When fetching property values of snapshots, a check against the head dataset type must be performed. Previously, this additional check was performed only when fetching "version", "normalize", "utf8only" or "case". This caused the ZPL properties "acltype", "exec", "devices", "nbmand", "setuid" and "xattr" to be erroneously displayed with meaningless values for snapshots of volumes. It also did not allow for the display of "volsize" of a snapshot of a volume. This patch adds the headcheck flag paramater to zfs_prop_valid_for_type() and zprop_valid_for_type() to indicate the check is being done against a head dataset's type in order that properties valid only for snapshots are handled correctly. This allows the the head check in get_numeric_property() to be performed when fetching a property for a snapshot. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2265	2014-05-06 10:41:46 -07:00
Richard Yao	3af3df905f	libspl: Implement LWP rwlock interface This implements a subset of the LWP rwlock interface by wrapping the equivalent POSIX thread interface. It is a superset of the features needed by ztest. The missing bits are {,_}rw_read_held() and {,_}rw_write_held(). Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1970	2014-05-01 15:53:52 -07:00
ilovezfs	78597769b4	Fill in mountpoint buffer before using it in errors zfs_is_mountable() fills in the mountpoint buffer, so, as in upstream, it needs to have been called before the mountpoint buffer can be used in error messages. In particular, return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED, dgettext(TEXT_DOMAIN, "cannot mount '%s'"), mountpoint)); should not come before the call to zfs_is_mountable(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ilovezfs <ilovezfs@icloud.com> Closes #2284	2014-04-30 15:52:01 -07:00
Jorgen Lundman	cdf37f0c59	Add support for aarch64 (ARMv8) Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2260	2014-04-25 15:35:30 -07:00
Tim Chase	b066274a77	Report atime and relatime as the property's actual value. Neither atime nor relatime should be considered to be "temporary mount point properties". Their semantics are enforced completely within ZFS and also they're (correctly) not documented as being temporary mount point properties. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2257	2014-04-16 11:57:17 -07:00
Chris Dunlap	7368eb621e	Set errno for mkdirp() called with NULL path ptr If mkdirp() is called with a NULL ptr for the path arg, it will return -1 with errno unchanged. This is unexpected since on error it should return -1 and set errno to one of the error values listed for mkdir(2). This commit sets errno = ENOENT for this NULL ptr case. This is in accordance with the errors specified by mkdir(2): ENOENT A component of the path prefix does not exist or is a null pathname. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2248	2014-04-09 13:32:22 -07:00
Richard Yao	787c455ed7	Improve partition detection on lesser used devices The format strings in efi_get_info() are intended to extract both the main device and partition number. However, this is only done correctly for hd, sd and vd devices. The format strings for ram, dm-, md and loop devices misparse the input. This causes the partition device to be incorrectly labelled as the main device with the partition being labelled 0. Reported-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2175	2014-04-08 14:45:12 -07:00
John M. Layman	cbca6076b3	Fix for re-reading /etc/mtab. This is a continuation of fb5c53ea65b75c67c23f90ebbbb1134a5bb6c140: When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file, freopen(3) will close and open the file. In this commit, a few more rewind(3) calls were replaced with freopen(3) to allow updated mtab entries to be picked up immediately. Signed-off-by: John M. Layman <jml@frijid.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2215 Issue #1611	2014-04-04 09:46:20 -07:00
Brian Behlendorf	1a5c611a22	Make command line guid parsing more tolerant Several of the zfs utilities allow you to pass a vdev's guid rather than the device name. However, the utilities are not consistent in how they parse that guid. For example, 'zinject' expects the guid to be passed as a hex value while 'zpool replace' wants it as a decimal. The user is forced to just know what format to use. This patch improve things by making the parsing more tolerant. When strtol(3) is called using 0 for the base, rather than say 10 or 16, it will then accept hex, decimal, or octal input based on the prefix. From the man page. If base is zero or 16, the string may then include a "0x" prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is '0', in which case it is taken as 8 (octal). NOTE: There may be additional conversions not caught be this patch. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-04-02 13:10:08 -07:00
Chris Dunlap	8c7aa0cfc4	Replace zpool_events_next() "block" parm w/ "flags" zpool_events_next() can be called in blocking mode by specifying a non-zero value for the "block" parameter. However, the design of the ZFS Event Daemon (zed) requires additional functionality from zpool_events_next(). Instead of adding additional arguments to the function, it makes more sense to use flags that can be bitwise-or'd together. This commit replaces the zpool_events_next() int "block" parameter with an unsigned bitwise "flags" parameter. It also defines ZEVENT_NONE to specify the default behavior. Since non-blocking mode can be specified with the existing ZEVENT_NONBLOCK flag, the default behavior becomes blocking mode. This, in effect, inverts the previous use of the "block" parameter. Existing callers of zpool_events_next() have been modified to check for the ZEVENT_NONBLOCK flag. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-03-31 16:11:21 -07:00
Brian Behlendorf	9b101a7320	Clarify zpool_events_next() comment Due to the very poorly chosen argument name 'cleanup_fd' it was completely unclear that this file descriptor is used to track the current cursor location. When the file descriptor is created by opening ZFS_DEV a private cursor is created in the kernel for the returned file descriptor. Subsequent calls to zpool_events_next() and zpool_events_seek() then require the file descriptor as an argument to reposition the cursor. When the file descriptor is closed the kernel state tracking the cursor is destroyed. This patch contains no functional change, it just changes a few variable names and clarifies the documentation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-03-31 16:11:08 -07:00
Brian Behlendorf	75e3ff58fe	Add zpool_events_seek() functionality The ZFS_IOC_EVENTS_SEEK ioctl was added to allow user space callers to seek around the zevent file descriptor by EID. When a specific EID is passed and it exists the cursor will be positioned there. If the EID is no longer cached by the kernel ENOENT is returned. The caller may also pass ZEVENT_SEEK_START or ZEVENT_SEEK_END to seek to those respective locations. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-03-31 16:10:57 -07:00
Gunnar Beutner	4d8c78c844	Remount datasets for "zfs inherit". Changing properties with "zfs inherit" should cause the datasets to be remounted. This ensures that the modified property values will be propagated in to the filesystem namespace where they can be enforced. This change is modeled after an identical fix made to zfs_prop_set(). Signed-off-by: Gunnar Beutner <gunnar@beutner.name> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2201	2014-03-24 11:11:15 -07:00
Richard Yao	8959b29e31	Assert alignment in umem_alloc_aligned Valgrind suggests that the address we are returning is not properly aligned, so lets add an assertion. ==87740== Address 0x1012a22a is 554 bytes inside a block of size 4,096 alloc'd ==87740== at 0x4C2BBA0: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==87740== by 0x4C2BCC7: posix_memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==87740== by 0x52FA845: zio_buf_alloc (umem.h:101) ==87740== by 0x52F6226: zil_alloc_lwb (zil.c:463) ==87740== by 0x52F8559: zil_commit (zil.c:566) ==87740== by 0x40611D: ztest_freeze (ztest.c:5909) ==87740== by 0x4066A7: ztest_init (ztest.c:6048) ==87740== by 0x407AF4: main (ztest.c:6226) Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2174	2014-03-20 12:01:37 -07:00
Brian Behlendorf	ffe9d38275	Add generic errata infrastructure From time to time it may be necessary to inform the pool administrator about an errata which impacts their pool. These errata will by shown to the administrator through the 'zpool status' and 'zpool import' output as appropriate. The errata must clearly describe the issue detected, how the pool is impacted, and what action should be taken to resolve the situation. Additional information for each errata will be provided at http://zfsonlinux.org/msg/ZFS-8000-ER. To accomplish the above this patch adds the required infrastructure to allow the kernel modules to notify the utilities that an errata has been detected. This is done through the ZPOOL_CONFIG_ERRATA uint64_t which has been added to the pool configuration nvlist. To add a new errata the following changes must be made: * A new errata identifier must be assigned by adding a new enum value to the zpool_errata_t type. New enums must be added to the end to preserve the existing ordering. * Code must be added to detect the issue. This does not strictly need to be done at pool import time but doing so will make the errata visible in 'zpool import' as well as 'zpool status'. Once detected the spa->spa_errata member should be set to the new enum. * If possible code should be added to clear the spa->spa_errata member once the errata has been resolved. * The show_import() and status_callback() functions must be updated to include an informational message describing the errata. This should include an action message describing what an administrator should do to address the errata. * The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be updated to describe the errata. This space can be used to provide as much additional information as needed to fully describe the errata. A link to this documentation will be automatically generated in the output of 'zpool import' and 'zpool status'. Original-idea-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Richard Yao <ryao@gentoo.or Issue #2094	2014-02-21 12:10:40 -08:00
Tim Chase	6d111134c0	Implement relatime. Add the "relatime" property. When set to "on", a file's atime will only be updated if the existing atime at least a day old or if the existing ctime or mtime has been updated since the last access. This behavior is compatible with the Linux "relatime" mount option. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2064 Closes #1917	2014-01-29 15:50:44 -08:00
Brian Behlendorf	741304503a	Prevent duplicate mnttab cache entries Under Linux its possible to mount the same filesystem multiple times in the namespace. This can be done either with bind mounts or simply with multiple mount points. Unfortunately, the mnttab cache code is implemented using an AVL tree which does not support duplicate entries. To avoid this issue this patch updates the code to check for a duplicate entry before adding a new one. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michael Martin <mgmartin.mgm@gmail.com> Closes #2041	2014-01-14 10:27:12 -08:00
Brian Behlendorf	d7ec8d4fd9	Define the needed ISA types for Sparc Add the minimum required ISA types to support the Sparc architecture. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: marku89 <mar42@kola.li> Issue #1700	2014-01-09 15:49:59 -08:00
Brian Behlendorf	4dad7d91e2	Remove unconditional sharetab update Removes the unconditional sharetab update when running any zfs command. This means the sharetab might become out of date if users are manually adding/removing shares with exportfs. But we shouldn't punish all callers to zfs in order to handle that unlikely case. In the unlikely event we observe issues because of this it can always be added back to just the share/unshare call paths where we need an up to date sharetab. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Issue #845	2014-01-07 09:48:09 -08:00
Matthew Thode	11b9ec23b9	Add full SELinux support Four new dataset properties have been added to support SELinux. They are 'context', 'fscontext', 'defcontext' and 'rootcontext' which map directly to the context options described in mount(8). When one of these properties is set to something other than 'none'. That string will be passed verbatim as a mount option for the given context when the filesystem is mounted. For example, if you wanted the rootcontext for a filesystem to be set to 'system_u:object_r:fs_t' you would set the property as follows: $ zfs set rootcontext="system_u:object_r:fs_t" storage-pool/media This will ensure the filesystem is automatically mounted with that rootcontext. It is equivalent to manually specifying the rootcontext with the -o option like this: $ zfs mount -o rootcontext=system_u:object_r:fs_t storage-pool/media By default all four contexts are set to 'none'. Further information on SELinux contexts is detailed in mount(8) and selinux(8) man pages. Signed-off-by: Matthew Thode <prometheanfire@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #1504	2013-12-19 10:37:31 -08:00
Michael Kjorling	d1d7e2689d	cstyle: Resolve C style issues The vast majority of these changes are in Linux specific code. They are the result of not having an automated style checker to validate the code when it was originally written. Others were caused when the common code was slightly adjusted for Linux. This patch contains no functional changes. It only refreshes the code to conform to style guide. Everyone submitting patches for inclusion upstream should now run 'make checkstyle' and resolve any warning prior to opening a pull request. The automated builders have been updated to fail a build if when 'make checkstyle' detects an issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1821	2013-12-18 16:46:35 -08:00
renelson	a5f3665168	Handle acl flags from util-linux mount command Add acl, noacl and posixacl to option_map, avoiding ENOENT error case when mount from util-linux-2.24 execs mount.zfs with any of those flags Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: renelson <bnelson@nelsonbe.com> Issue #1968	2013-12-18 16:46:20 -08:00
Brian Behlendorf	ba6a24026c	Remove ZFC_IOC__MINOR ioctl()s Early versions of ZFS coordinated the creation and destruction of device minors from userspace. This was inherently racy and in late 2009 these ioctl()s were removed leaving everything up to the kernel. This significantly simplified the code. However, we never picked up these changes in ZoL since we'd already significantly adjusted this code for Linux. This patch aims to rectify that by finally removing ZFC_IOC__MINOR ioctl()s and moving all the functionality down in to the kernel. Since this cleanup will change the kernel/user ABI it's being done in the same tag as the previous libzfs_core ABI changes. This will minimize, but not eliminate, the disruption to end users. Once merged ZoL, Illumos, and FreeBSD will basically be back in sync in regards to handling ZVOLs in the common code. While each platform must have its own custom zvol.c implemenation the interfaces provided are consistent. NOTES: 1) This patch introduces one subtle change in behavior which could not be easily avoided. Prior to this change callers of 'zfs create -V ...' were guaranteed that upon exit the /dev/zvol/ block device link would be created or an error returned. That's no longer the case. The utilities will no longer block waiting for the symlink to be created. Callers are now responsible for blocking, this is why a 'udev_wait' call was added to the 'label' function in scripts/common.sh. 2) The read-only behavior of a ZVOL now solely depends on if the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant policy setting in the gendisk structure was removed. This both simplifies the code and allows us to safely leverage set_disk_ro() to issue a KOBJ_CHANGE uevent. See the comment in the code for futher details on this. 3) Because __zvol_create_minor() and zvol_alloc() may now be called in a sync task they must use KM_PUSHPAGE. References: illumos/illumos-gate@681d9761e8 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #1969	2013-12-16 09:15:57 -08:00
Yuri Pankov	54d5378fae	Illumos #2583 2583 Add -p (parsable) option to zfs list References: https://www.illumos.org/issues/2583 illumos/illumos-gate@43d68d68c1 Ported-by: Gregor Kopka <gregor@kopka.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #937	2013-11-21 11:13:53 -08:00
Brian Behlendorf	64ad2b26e2	Remove the slog restriction on bootfs pools Under Linux this restriction does not apply because we have access to all the required devices. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1631	2013-11-14 14:28:35 -08:00
Tim Chase	fd4f76160c	Handle concurrent snapshot automounts failing due to EBUSY. In the current snapshot automount implementation, it is possible for multiple mounts to attempted concurrently. Only one of the mounts will succeed and the other will fail. The failed mounts will cause an EREMOTE to be propagated back to the application. This commit works around the problem by adding a new exit status, MOUNT_BUSY to the mount.zfs program which is used when the underlying mount(2) call returns EBUSY. The zfs code detects this condition and treats it as if the mount had succeeded. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1819	2013-11-08 10:45:14 -08:00
Marcel Telka	8ce0af07bb	Illumos #4061 4061 libzfs: memory leak in iter_dependents_cb() Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com> Reviewed by: Boris Protopopov <boris.protopopov@nexenta.com> Reviewed by: Andy Stormont <andyjstormont@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/4061 illumos/illumos-gate@2fbdf8dbf0 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:24:10 -08:00
Matthew Ahrens	46ba1e59d3	Illumos #3996 3996 want a libzfs_core API to rollback to latest snapshot Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andy Stormont <andyjstormont@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3996 illumos/illumos-gate@a7027df17f Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:23:11 -08:00
Steven Hartland	6389d42205	Illumos #3909 3909 "zfs send -D" does not work Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3909 illumos/illumos-gate@36f7455d36 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:15:11 -08:00
Keith M Wesolowski	96c2e96193	Illumos #3894 3894 zfs should not allow snapshot of inconsistent dataset Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/3894 illumos/illumos-gate@ca48f36f20 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Matthew Ahrens	1a077756e8	Illumos #3829 3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl Reviewed by: Matt Amdur <matt.amdur@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3829 illumos/illumos-gate@bb6e70758d Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Steven Hartland	34ffbed88c	Illumos #3818 3818 zpool status -x should report pools with removed l2arc devices Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3818 illumos/illumos-gate@7f2416ef64 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Steven Hartland	95fd54a1c5	Illumos #3740 3740 Poor ZFS send / receive performance due to snapshot hold / release processing Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3740 illumos/illumos-gate@a7a845e4bf Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775 Porting notes: 1. `13fe019870` introduced a merge conflict in dsl_dataset_user_release_tmp where some variables were moved outside of the preprocessor directive. 2. dea9dfefdd747534b3846845629d2200f0616dad made the previous merge conflict worse by switching KM_SLEEP to KM_PUSHPAGE. This is notable because this commit refactors the code, adding a new KM_SLEEP allocation. It is not clear to me whether this should be converted to KM_PUSHPAGE. 3. We had a merge conflict in libzfs_sendrecv.c because of copyright notices. 4. Several small C99 compatibility fixed were made.	2013-11-04 11:17:48 -08:00
Will Andrews	7bc7f25040	Illumos #3745 , #3811 3745 zpool create should treat -O mountpoint and -m the same 3811 zpool create -o altroot=/xyz -O mountpoint=/mnt ignores the mountpoint option Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3745 https://www.illumos.org/issues/3811 illumos/illumos-gate@8b71377531 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Will Andrews	e49f1e20a0	Illumos #3741 3741 zfs needs better comments Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3741 illumos/illumos-gate@3e30c24aee Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Martin Matuska	b1118acbb1	Illumos #3699 , #3739 3699 zfs hold or release of a non-existent snapshot does not output error 3739 cannot set zfs quota or reservation on pool version < 22 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Shrock <eric.schrock@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/3699 https://www.illumos.org/issues/3739 illumos/illumos-gate@013023d4ed Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Adam Leventhal	63fd3c6cfd	Illumos #3582 , #3584 3582 zfs_delay() should support a variable resolution 3584 DTrace sdt probes for ZFS txg states Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Richard Elling <richard.elling@dey-sys.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3582 illumos/illumos-gate@0689f76 Ported by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Matthew Ahrens	330847ff36	Illumos #3537 3537 want pool io kstats Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Sa?o Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Brendan Gregg <brendan.gregg@joyent.com> Approved by: Gordon Ross <gwr@nexenta.com> References: http://www.illumos.org/issues/3537 illumos/illumos-gate@c3a6601 Ported by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: 1. The patch was restructured to take advantage of the existing spa statistics infrastructure. To accomplish this the kstat was moved in to spa->io_stats and the init/destroy code moved to spa_stats.c. 2. The I/O kstat was simply named <pool> which conflicted with the pool directory we had already created. Therefore it was renamed to <pool>/io 3. An update handler was added to allow the kstat to be zeroed.	2013-10-31 09:16:03 -07:00
Ralf Ertzinger	8b921f667a	Introduce zpool_get_prop_literal interface This change introduces zpool_get_prop_literal. It's an expanded version of zpool_get_prop taking one additional boolean parameter. With this parameter set to B_FALSE it will behave identically to zpool_get_prop. Setting it to B_TRUE will return full precision numbers for the following properties: ZPOOL_PROP_SIZE ZPOOL_PROP_ALLOCATED ZPOOL_PROP_FREE ZPOOL_PROP_FREEING ZPOOL_PROP_EXPANDSZ ZPOOL_PROP_ASHIFT Also introduced is a wrapper function for zpool_get_prop making it use zpool_get_prop_literal in the background. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1813	2013-10-28 14:27:53 -07:00
Brian Behlendorf	e0b0ca983d	Add visibility in to cached dbufs Currently there is no mechanism to inspect which dbufs are being cached by the system. There are some coarse counters in arcstats by they only give a rough idea of what's being cached. This patch aims to improve the current situation by adding a new dbufs kstat. When read this new kstat will walk all cached dbufs linked in to the dbuf_hash. For each dbuf it will dump detailed information about the buffer. It will also dump additional information about the referenced arc buffer and its related dnode. This provides a more complete view in to exactly what is being cached. With this generic infrastructure in place utilities can be written to post-process the data to understand exactly how the caching is working. For example, the data could be processed to show a list of all cached dnodes and how much space they're consuming. Or a similar list could be generated based on dnode type. Many other ways to interpret the data exist based on what kinds of questions you're trying to answer. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov>	2013-10-25 13:59:40 -07:00
Prakash Surya	1421c89142	Add visibility in to arc_read This change is an attempt to add visibility into the arc_read calls occurring on a system, in real time. To do this, a list was added to the in memory SPA data structure for a pool, with each element on the list corresponding to a call to arc_read. These entries are then exported through the kstat interface, which can then be interpreted in userspace. For each arc_read call, the following information is exported: * A unique identifier (uint64_t) * The time the entry was added to the list (hrtime_t) (not wall clock time; relative to the other entries on the list) * The objset ID (uint64_t) * The object number (uint64_t) * The indirection level (uint64_t) * The block ID (uint64_t) * The name of the function originating the arc_read call (char[24]) * The arc_flags from the arc_read call (uint32_t) * The PID of the reading thread (pid_t) * The command or name of thread originating read (char[16]) From this exported information one can see, in real time, exactly what is being read, what function is generating the read, and whether or not the read was found to be already cached. There is still some work to be done, but this should serve as a good starting point. Specifically, dbuf_read's are not accounted for in the currently exported information. Thus, a follow up patch should probably be added to export these calls that never call into arc_read (they only hit the dbuf hash table). In addition, it might be nice to create a utility similar to "arcstat.py" to digest the exported information and display it in a more readable format. Or perhaps, log the information and allow for it to be "replayed" at a later time. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-10-25 13:57:25 -07:00
Brian Behlendorf	76463d4026	Revert "Add txgs-<pool> kstat file" This reverts commit `e95853a331`.	2013-10-25 13:57:25 -07:00
Brian Behlendorf	11cb9d773f	Increase default udev wait time When creating a new pool, or adding/replacing a disk in an existing pool, partition tables will be automatically created on the devices. Under normal circumstances it will take less than a second for udev to create the expected device files under /dev/. However, it has been observed that if the system is doing heavy IO concurrently udev may take far longer. If you also throw in some cheap dodgy hardware it may take even longer. To prevent zpool commands from failing due to this the default wait time for udev is being increased to 30 seconds. This will have no impact on normal usage, the increase timeout should only be noticed if your udev rules are incorrectly configured. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1646	2013-10-22 10:25:51 -07:00
Richard Yao	a6ce1eae54	Fix libzfs_core changes to follow GNU libtool guidelines The GNU libtool documentation states to start with a version of 0:0:0, rather than 1:1:0. Illumos uses the name libzfs_core.so.1, so to be consistent, we should go with 1:0:0. http://www.gnu.org/software/libtool/manual/libtool.html#Updating-version-info The GNU libtool documentation also provides guidence on how the version information should be incremented. Doing this does a SONAME bump of the libzfs and libzpool libraries. This is particularly important on Gentoo because a SONAME bump enables portage to retain the older libraries until any packages that link to them are rebuilt. The main example of this is GRUB2's grub2-mkconfig, which will break unless it is rebuilt against the new libraries. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Richard Yao	31fc19399e	Generate libraries with correct DT_NEEDED entries Libraries that depend on other libraries should list them in ELF's DT_NEEDED field so that programs linking to them do not need to specify those libraries unless they depend on them as well. This is not the case in the current code and the consequence is that anything that needs a library must know its dependencies. This is fragile and caused GRUB2's configure script to break when a dependency was added on libblkid in libzfs. This resolves that problem by using LIBADD/LDADD to specify libraries in Makefile.am instead of LDFLAGS. This ensures that proper DT_NEEDED entries are generated and prevents GRUB2's configure script from breaking in the presence of a libblkid dependency. This also removes unneeded dependencies from various files. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Richard Yao	1db7b9be75	Fix libblkid support libblkid support is dormant because the autotools check is broken and liblkid identifies ZFS vdevs as "zfs_member", not "zfs". We fix that with a few changes: First, we fix the libblkid autotools check to do a few things: 1. Make a 64MB file, which is the minimum size ZFS permits. 2. Make 4 fake uberblock entries to make libblkid's check succeed. 3. Return 0 upon success to make autotools use the success case. 4. Include stdlib.h to avoid implicit declration of free(). 5. Check for "zfs_member", not "zfs" 6. Make --with-blkid disable autotools check (avoids Gentoo sandbox violation) 7. Pass '-lblkid' correctly using LIBS not LDFLAGS. Second, we change the libblkid support to scan for "zfs_member", not "zfs". This makes --with-blkid work on Gentoo. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Matthew Ahrens	13fe019870	Illumos #3464 3464 zfs synctask code needs restructuring Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3464 illumos/illumos-gate@3b2aab1880 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1495	2013-09-04 16:01:24 -07:00
Matthew Ahrens	6f1ffb0665	Illumos #2882 , #2883 , #2900 2882 implement libzfs_core 2883 changing "canmount" property to "on" should not always remount dataset 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Siden <christopher.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2882 https://www.illumos.org/issues/2883 https://www.illumos.org/issues/2900 illumos/illumos-gate@4445fffbbb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1293 Porting notes: WARNING: This patch changes the user/kernel ABI. That means that the zfs/zpool utilities built from master are NOT compatible with the 0.6.2 kernel modules. Ensure you load the matching kernel modules from master after updating the utilities. Otherwise the zfs/zpool commands will be unable to interact with your pool and you will see errors similar to the following: $ zpool list failed to read pool configuration: bad address no pools available $ zfs list no datasets available Add zvol minor device creation to the new zfs_snapshot_nvl function. Remove the logging of the "release" operation in dsl_dataset_user_release_sync(). The logging caused a null dereference because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the logging functions try to get the ds name via the dsl_dataset_name() function. I've got no idea why this particular code would have worked in Illumos. This code has subsequently been completely reworked in Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring). Squash some "may be used uninitialized" warning/erorrs. Fix some printf format warnings for %lld and %llu. Apply a few spa_writeable() changes that were made to Illumos in illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and 3115 fixes. Add a missing call to fnvlist_free(nvl) in log_internal() that was added in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time (zfsonlinux/zfs@9e11c73) because it depended on future work.	2013-09-04 15:49:00 -07:00
Turbo Fredriksson	abbfdca483	No point in rewind() mtab in zfs_unshare_proto(). We're not really reading the file, but instead use libzfs_mnttab_find() which does the nessesary freopen() for us. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1498	2013-08-15 10:18:31 -07:00
John Layman	fb5c53ea65	Fix for re-reading /etc/mtab in zfs_is_mounted() When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file, freopen(3) will close and open the file. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1611	2013-08-14 11:37:06 -07:00
Yuri Pankov	105afebb15	Illumos #3098 zfs userspace/groupspace fail 3098 zfs userspace/groupspace fail without saying why when run as non-root Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3098 illumos/illumos-gate@70f56fa693 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1596	2013-08-14 09:28:34 -07:00
Brian Behlendorf	ff3510c1a5	Fix zpool_read_label() The zpool_read_label() function was subtly broken due to a difference of behavior in fstat64(2) on Solaris vs Linux. Under Solaris when a block device is stat'ed the st_size field will contain the size of the device in bytes. Under Linux this is only true for regular file and symlinks. A compatibility function called fstat64_blk(2) was added which can be used when the Solaris behavior is required. This flaw was never noticed because the only time we would need to use the device size is when the first two labels are damaged. I noticed this issue while adding the zpool_clear_label() function which is similar in design and does require us to write all the labels. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-07-09 16:02:04 -07:00
Dmitry Khasanov	131cc95ca7	Add FreeBSD 'zpool labelclear' command The FreeBSD implementation of zfs adds the 'zpool labelclear' command. Since this functionality is helpful and straight forward to add it is being included in ZoL. References: freebsd/freebsd@119a041dc9 Ported-by: Dmitry Khasanov <pik4ez@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1126	2013-07-09 15:58:05 -07:00
Dmitry Khasanov	51a3ae72d2	Readd zpool_clear_label() from OpenSolaris This patch restores the zpool_clear_label() function from OpenSolaris. This was removed by commit `d603ed6` because it wasn't clear we had a use for it in ZoL. However, this functionality is a prerequisite for adding the 'zpool labelclear' command from FreeBSD. As part of bringing this change in the zpool_clear_label() function was changed to use fstat64_blk(2) for compatibility with Linux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1126	2013-07-09 15:42:27 -07:00
Aaron Fineman	bbb75c1190	Add error message for missing /etc/mtab The zpool command should not silently fail when the /etc/mtab file does not exist. This can occur in an initramfs environment when the /etc/mtab file hasn't yet been generated. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1541	2013-06-27 14:43:37 -07:00
Madhav Suresh	c99c90015e	Illumos #3006 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero Reviewed by Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by George Wilson <george.wilson@delphix.com> Approved by Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@fb09f5aad4 https://illumos.org/issues/3006 Requires: zfsonlinux/spl@1c6d149feb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1509	2013-06-19 15:14:10 -07:00
Ned Bass	da29fe63f0	Don't leak mount flags into kernel When calling mount(), care must be taken to avoid passing in flags that are used only by the user space utilities. Otherwise we may stomp on flags that are reserved for other purposes in the kernel. In particular, openSUSE 12.3 kernels have added a new MS_RICHACL super-block flag whose value conflicts with our MS_COMMENT flag. This causes incorrect behavior such as the umask being ignored. The MS_COMMENT flag essentially serves as a placeholder in the option_map data structure of zfs_mount.c, but its value is never used. Therefore we can avoid the conflict by defining it to 0. The MS_USERS, MS_OWNER, and MS_GROUP flags also conflict with reserved flags in the kernel. While this is not known to have caused any problems, it is nevertheless incorrect. For the purposes of the mount.zfs helper, the "users", "owner", and "group" options just serve as hints to set additional implied options. Therefore we now define their associated mount flags in terms of the options that they imply rather than giving them unique values. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1457	2013-06-18 15:30:08 -07:00
Brian Behlendorf	044baf009a	Use taskq for dump_bytes() The vn_rdwr() function performs I/O by calling the vfs_write() or vfs_read() functions. These functions reside just below the system call layer and the expectation is they have almost the entire 8k of stack space to work with. In fact, certain layered configurations such as ext+lvm+md+multipath require the majority of this stack to avoid stack overflows. To avoid this posibility the vn_rdwr() call in dump_bytes() has been moved to the ZIO_TYPE_FREE, taskq. This ensures that all I/O will be performed with the majority of the stack space available. This ends up being very similiar to as if the I/O were issued via sys_write() or sys_read(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1399 Closes #1423	2013-05-06 14:05:42 -07:00
Brian Behlendorf	a4914d38a7	Silence 'old_umask' uninit variable warning Recent changes have caused older versions of gcc to mistakenly flag 'old_umask' in vn_open() as an unitialized variable. To silence the warning initialize it. kernel.c: In function 'vn_open': kernel.c:525:6: error: 'old_umask' may be used uninitialized in this function Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-01 17:05:58 -07:00
George.Wilson	cc92e9d0c3	3246 ZFS I/O deadman thread Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> NOTES: This patch has been reworked from the original in the following ways to accomidate Linux ZFS implementation ) Usage of the cyclic interface was replaced by the delayed taskq interface. This avoids the need to implement new compatibility code and allows us to rely on the existing taskq implementation. ) An extern for zfs_txg_synctime_ms was added to sys/dsl_pool.h because declaring externs in source files as was done in the original patch is just plain wrong. ) Instead of panicing the system when the deadman triggers a zevent describing the blocked vdev and the first pending I/O is posted. If the panic behavior is desired Linux provides other generic methods to panic the system when threads are observed to hang. ) For reference, to delay zios by 30 seconds for testing you can use zinject as follows: 'zinject -d <vdev> -D30 <pool>' References: illumos/illumos-gate@283b84606b https://www.illumos.org/issues/3246 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1396	2013-05-01 17:05:52 -07:00
George Wilson	295304bed6	Illumos #3422 , #3425 3422 zpool create/syseventd race yield non-importable pool 3425 first write to a new zvol can fail with EFBIG Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@bda8819455 https://www.illumos.org/issues/3422 https://www.illumos.org/issues/3425 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1390	2013-04-12 09:01:36 -07:00
Jan Engelhardt	4e95cc99b0	build: resolve orthographic and other grammatical errors Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-04-02 10:44:52 -07:00
Turbo Fredriksson	be8bc8c0d3	Add smb_available() sanity checks Do basic sanity checks in smb_available() to verify that the 'net' command and the usershare directory exists. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1124	2013-04-02 09:27:52 -07:00
Eric Dillmann	0b4d1b5853	Add snapdev=[hidden\|visible] dataset property The new snapdev dataset property may be set to control the visibility of zvol snapshot devices. By default this value is set to 'hidden' which will prevent zvol snapshots from appearing under /dev/zvol/ and /dev/<dataset>/. When set to 'visible' all zvol snapshots for the dataset will be visible. This functionality was largely added because when automatic snapshoting is enabled large numbers of read-only zvol snapshots will be created. When creating these devices the kernel will attempt to read their partition tables, and blkid will attempt to identify any filesystems on those partitions. This leads to a variety of issues: 1) The zvol partition tables will be read in the context of the `modprobe zfs` for automatically imported pools. This is undesirable and should be done asynchronously, but for now reducing the number of visible devices helps. 2) Udev expects to be able to complete its work for a new block devices fairly quickly. When many zvol devices are added at the same time this is no longer be true. It can lead to udev timeouts and missing /dev/zvol links. 3) Simply having lots of devices in /dev/ can be aukward from a management standpoint. Hidding the devices your unlikely to ever use helps with this. Any snapshot device which is needed can be made visible by changing the snapdev property. NOTE: This patch changes the default behavior for zvols which was effectively 'snapdev=visible'. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1235 Closes #945 Issue #956 Issue #756	2013-03-05 12:37:54 -08:00
Brian Behlendorf	02a946ce10	Remove unused machelf.h header The machelf.h header is never included by anything in the zfs build process. It is all effectively dead code which can be safely removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1265	2013-02-05 15:34:50 -08:00
Richard Yao	399f60c8b4	Fix function relocations in libzpool binutils 2.23.1 fails in situations that generate function relocations on PowerPC and possibly other architectures. This causes linking of libzpool to fail because it depends on libnvpair. We add a dependency on libnvpair to lib/libzpool/Makefile.am to correct that. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1267	2013-02-05 15:33:32 -08:00
Brian Behlendorf	dbf763b39b	Retire zpool_id infrastructure In the interest of maintaining only one udev helper to give vdevs user friendly names, the zpool_id and zpool_layout infrastructure is being retired. They are superseded by vdev_id which incorporates all the previous functionality. Documentation for the new vdev_id(8) helper and its configuration file, vdev_id.conf(5), can be found in their respective man pages. Several useful example files are installed under /etc/zfs/. /etc/zfs/vdev_id.conf.alias.example /etc/zfs/vdev_id.conf.multipath.example /etc/zfs/vdev_id.conf.sas_direct.example /etc/zfs/vdev_id.conf.sas_switch.example Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #981	2013-01-29 12:23:17 -08:00
Brian Behlendorf	79c6e4c445	Remove NPTL_GUARD_WITHIN_STACK Commit `4b2f65b253` increased the user space stack by 4x to resolve certain stack overflows. As such it no longer makes sense to worry about a single extra page which might or might not be part of the process stack. There is now ample headroom for normal usage. By eliminating this configure check we are also resolving the following segfault which intentionally occurs at configure time and may be logged in dmesg. conftest[22156]: segfault at 7fbf18a47e48 ip 00000000004007fe sp 00007fbf18a4be50 error 6 in conftest[400000+1000] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-29 10:58:20 -08:00
Eric Dillmann	9759c60f1a	Illumos #3035 LZ4 compression support in ZFS and GRUB 3035 LZ4 compression support in ZFS and GRUB Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Christopher Siden <csiden@delphix.com> References: illumos/illumos-gate@a6f561b4ae https://www.illumos.org/issues/3035 http://wiki.illumos.org/display/illumos/LZ4+Compression+In+ZFS This patch has been slightly modified from the upstream Illumos version to be compatible with Linux. Due to the very limited stack space in the kernel a lz4 workspace kmem cache is used. Since we are using gcc we are also able to take advantage of the gcc optimized __builtin_ctz functions. Support for GRUB has been dropped from this patch. That code is available but those changes will need to made to the upstream GRUB package. Lastly, several hunks of dead code were dropped for clarity. They include the functions real_LZ4_uncompress(), LZ4_compressBound() and the Visual Studio specific hunks wrapped in _MSC_VER. Ported-by: Eric Dillmann <eric@jave.fr> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1217	2013-01-29 09:28:20 -08:00
Brian Behlendorf	14ee71efbc	Use strerror() not strerror_r() The differ() function used strerror_r() instead of strerror() because it allowed the error message to be directly copied in to a buffer. This causes two issues under Linux. * There are two versions of strerror_r() available an XSI-compliant version which returns an 'int' error code. And a GNU-specific version which return a 'char ' to the resulting error string. int strerror_r(int errnum, char buf, size_t buflen); /* XSI / char strerror_r(int errnum, char buf, size_t buflen); / GNU / The most recent versions of strerror_r() are annotated with the warn_unused_result attribute. This causes the following warning since the upstream implementation casts the result to void. warning: ignoring return value of 'strerror_r', declared with attribute warn_unused_result [-Wunused-result] The cleanest way to resolve both of these problems is just to use strerror() and make a copy of the result in to the buffer. This resolves both issues and this is the only instance of strerror_r() in the code base. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1231	2013-01-28 10:02:38 -08:00
Darik Horn	38145d6129	Ensure that zfs diff prints unicode safely. In the stream_bytes() library function used by `zfs diff`, explicitly cast each byte in the input string to an unsigned character so that the Linux fprintf() correctly escapes to octal and does not mangle the output. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1172	2013-01-16 10:15:57 -08:00
Garrett D'Amore	844793c3cc	Illumos #1557 assertion failed in userland taskq_destroy() 1557 assertion failed in userland taskq_destroy() Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@aa846ad9bc illumos changeset: 13597:3eac1e8e0f4c https://www.illumos.org/issues/1557 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-11 09:17:06 -08:00
Christopher Siden	b9b24bb4ca	Illumos #2762 : zpool command should have better support for feature flags 2762 zpool command should have better support for feature flags Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@57221772c3 https://www.illumos.org/issues/2762 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
George Wilson	3bc7e0fb0f	Illumos #3090 and #3102 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@dfbb943217 illumos changeset: 13777:b1e53580146d https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #939	2013-01-08 10:35:42 -08:00
Christopher Siden	9ae529ec5d	Illumos #2619 and #2747 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@53089ab7c8 illumos/illumos-gate@ad135b5d64 illumos changeset: 13700:2889e2596bd6 https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 NOTE: The grub specific changes were not ported. This change must be made to the Linux grub packages. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:35 -08:00
Massimo Maggi	5e6320cd12	Fix get/set users/groups in quota props via numeric id Fix setting/getting users/groups in quota properties through numeric identifier. This support was accidentally disabled in the original port by applying the HAVE_IDMAP wrapper macro too broadly. Fix obtained by moving #ifdef HAVE_IDMAP to exclude only the part of code that really needs IDMAP. Now zfs (get\|set) (user\|group)quota@1000 works as expected. Signed-off-by: Massimo Maggi <massimo@mmmm.it> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1147	2012-12-17 09:52:58 -08:00
Jorgen Lundman	53c2ec1d1b	Fix 'zpool create' segfault due to bad syntax Incorrect syntax should never cause a segfault. In this case listing multiple comma delimited options after '-o' triggered the problem. For example: zpool create -o ashift=12,listsnaps=on This patch resolves the issue by wrapping the calls which use hdr with a NULL test. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1118	2012-12-04 11:15:25 -08:00
Turbo Fredriksson	645fb9cc21	Implemented sharing datasets via SMB using libshare Add the initial support for the 'smbshare' option using the existing libshare infrastructure. Because this implementation relies on usershares samba version 3.0.23 is required. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #493	2012-12-03 09:42:15 -08:00
Brian Behlendorf	c372b36e3e	Allow GPT+EFI vdevs for root pools Commit `57a4edd` allows the bootfs property to be set on any pool. However, many of the zpool commands still prevent you from using EFI labeled devices for the root pool. For example: # zpool attach rpool /dev/sda /dev/sdb cannot label 'sdb': EFI labeled devices are not supported on root pools. on root devices. For non-Solaris builds such as Linux disable this error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1077	2012-11-30 13:45:14 -08:00
Brian Behlendorf	0e20a31b4b	Recreate minors when renaming zvols When a zvol with snapshots is renamed the device files under /dev/zvol/ are not renamed. This patch resolves the problem by destroying and recreating the minors with the new name so the links can be recreated bu udev. Original-patch-by: Suman Chakravartula <schakrava@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #408	2012-11-19 16:59:44 -08:00
Brian Behlendorf	e95853a331	Add txgs-<pool> kstat file Create a kstat file which contains useful statistics about the last N txgs processed. This can be helpful when analyzing pool performance. The new KSTAT_TYPE_TXG type was added for this purpose and it tracks the following statistics per-txg. txg - Unique txg number state - State (O)pen/(Q)uiescing/(S)yncing/(C)ommitted birth; - Creation time nread - Bytes read nwritten; - Bytes written reads - IOPs read writes - IOPs write open_time; - Length in nanoseconds the txg was open quiesce_time - Length in nanoseconds the txg was quiescing sync_time; - Length in nanoseconds the txg was syncing Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-02 15:45:56 -07:00
Brian Behlendorf	30b937ee15	Update spare and cache device names on import During 'zpool import' all ZPOOL_CONFIG_PATH names are supposed to be updated by fix_paths(). This was not happening for spare and cache devices because the proper names were getting filtered out of the pool_list_t->names. Interestingly, the names were being filtered because the spare and cache devices do not contain the pool name in their vdev label. The fix is to exclude the device path from the list only if: 1) has a valid ZPOOL_CONFIG_POOL_NAME key in the label, and 2) that pool name does not match the specified pool name. Since the label is valid and because it does properly store the vdev guid it will be correctly assembled without the pool name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #725	2012-10-22 08:46:02 -07:00
Brian Behlendorf	eac4720465	Allow 'zpool replace' to use short device names The 'zpool replace' command would fail when given a short name because unlike on other platforms the short name cannot be deterministically expanded to a single path. Multiple path prefixes must be checked and in addition the partition suffix for whole disks is determined by the prefix. To handle this complexity a zfs_strcmp_pathname() function was added which takes either a short or fully qualified device name. Short names will be expanded using the prefixes in the default import search path, or the ZPOOL_IMPORT_PATH environment variable if it's defined. All posible expansions are then compared against the comparison path. Care is taken to strip redundant slashes to ensure legitimate matches are not missed. In the context of this work the existing zfs_resolve_shortname() function was extended to consider the ZPOOL_IMPORT_PATH when set. The zfs_append_partition() interface was also simplified to take only a single buffer. The vast majority of these changes rework existing Linux specific code which was originally written to accomidate udev. However, there is some minimal cleanup which removes Illumos specific code. This was done to improve readability but the basic flow and intent of the upstream code was maintained. These changes are the logical conclusion of the previos work to adjust the 'zpool import' search behavior, see commit 44867b6a. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #544 Closes #976	2012-10-22 08:45:58 -07:00
Etienne Dechamps	142e6dd100	Add atomic_sub_* functions to libspl. Both the SPL and the ZFS libspl export most of the atomic_* functions, except atomic_sub_* functions which are only exported by the SPL, not by libspl. This patch remedies that by implementing atomic_sub_* functions in libspl. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1013	2012-10-17 08:56:37 -07:00
Matthew Ahrens	04434775b7	Illumos #3100 : zvol rename fails with EBUSY when dirty. illumos/illumos-gate@2e2c135528 Illumos changeset: 13780:6da32a929222 3100 zvol rename fails with EBUSY when dirty Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Adam H. Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <eric.schrock@delphix.com> Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #995	2012-10-03 13:59:02 -07:00
Etienne Dechamps	0aebd4f9e3	Create threads in detached state in userspace. Currently, thread_create(), when called in userspace, creates a joinable (i.e. not detached thread). This is the pthread default. Unfortunately, this does not reproduce kthreads behavior (kthreads are always detached). In addition, this contradicts the original Solaris code which creates userspace threads in detached mode. These joinable threads are never joined, which leads to a leakage of pthread thread objects ("zombie threads"). This in turn results in excessive ressource consumption, and possible ressource exhaustion in extreme cases (e.g. long ztest runs). This patch fixes the issue by creating userspace threads in detached mode. The only exception is ztest worker threads which are meant to be joinable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #989	2012-10-03 13:32:48 -07:00
Bill Pijewski	37abac6d55	Illumos #2703 : add mechanism to report ZFS send progress Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2703 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-19 13:39:06 -07:00
Chris Siden	1bd201e70d	Illumos #1948 : zpool list should show more detailed pool info Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/1948 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #685	2012-09-19 13:39:05 -07:00
Brian Behlendorf	0a2f7b3662	Seg fault 'zpool import -d /dev/disk/by-id -a' Introduced by commit `44867b6d6e`. We should of course check to ensure best isn't NULL before attempting to dereference it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #974	2012-09-18 12:33:37 -07:00
Brian Behlendorf	44867b6d6e	Improve `zpool import` search behavior The goal of this change is to make 'zpool import' prefer to use the peristent /dev/mapper or /dev/disk/by-* paths. These are far preferable to the devices in /dev/ whos names are not persistent and are determined by the order in which a device is detected. This patch improves things by changing the default search path from just to the top level /dev/ directory to (in order): /dev/disk/by-vdev - Custom rules, use first if they exist /dev/disk/zpool - Custom rules, use first if they exist /dev/mapper - Use multipath devices before components /dev/disk/by-uuid - Single unique entry and persistent /dev/disk/by-id - May be multiple entries and persistent /dev/disk/by-path - Encodes physical location and persistent /dev/disk/by-label - Custom persistent labels /dev - UNSAFE device names will change The default search path can be overriden by setting the ZPOOL_IMPORT_PATH environment variable. This must be a colon delimited list of paths which are searched for vdevs. If the 'zpool import -d' option is specified only those listed paths will be searched. Finally, when multiple paths to the same device are found. If one of the paths is an exact match for the path used last time to import the pool it will be used. When there are no exact matches the prefered path will be determined by the provided search order. This means you can still import a pool and force specific names by providing the -d <path> option. And the prefered names will persist as long as those paths exist on your system. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #965	2012-09-17 13:49:07 -07:00
Cyril Plisko	27ccd4147b	Avoid running exportfs on each zfs/zpool command invocation Delay executing exportfs command until its results are actually required. Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Gunnar Beutner <gunnar@beutner.name> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-11 10:21:49 -07:00
Etienne Dechamps	4b2f65b253	Increase the stack space in userspace. In `1e33ac1e26`, the maximum stack size for userspace tools was set to 8k to mimic the available kernel stack size. Unfortunately, due to differences in how the stack is used in userspace vs kernel space, spurious stack overflows could occur in userspace tools due to the limited stack size. This is especially true in ztest when debugging is enabled. This patch multiplies the userspace stack size by 4, which fixes the stack overflow issues. This comes at the price of not being able to catch stack size issues in userspace, but the previous solution proved unreliable anyway. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #934.	2012-09-06 11:59:59 -07:00
Michael Martin	fc24f7c887	Fix missing vdev names in zpool status output Commit `858219c` makes more sense down below in the 'if (verbose)' section of the code. Initially, buf and path will never point to the same location. Once 'path = buf' is set on a raidz vdev, the code may drop into the verbose section depending on the verbose flag. In here, using a tmpbuf makes sense since now 'buf == path'. This issue does not occur in the upstream Solaris code because their implementations of snprintf() allow for buf and path to be the same address. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #57	2012-09-05 22:09:12 -07:00
Brian Behlendorf	ca8b5af89d	Remove autotools products Remove all of the generated autotools products from the repository and update the .gitignore files accordingly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #718	2012-08-27 11:47:44 -07:00
Garrett D'Amore	08b1b21d58	Illumos #2803 : zfs get guid pretty-prints the output Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/2803 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:40:14 -07:00
Christopher Siden	e956d65106	Illumos #1796 , #2871 , #2903 , #2957 1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a read-only pool 2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite 2903 zfs destroy -d does not work 2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/1796 https://www.illumos.org/issues/2871 https://www.illumos.org/issues/2903 https://www.illumos.org/issues/2957 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:40:02 -07:00
Eric Schrock	db49968e5c	Illumos #2635 : 'zfs rename -f' to perform force unmount Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <George.Wilson@delphix.com> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/2635 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #717	2012-08-23 10:39:43 -07:00
Martin Matuska	cf997d797b	Properly initialize and free destroydata This regression was accidentally introduced by commit `330d06f90d` due to ZoL specific code. The fix is to simply ensure the passed nvlist is initialized and freed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #876	2012-08-23 09:42:21 -07:00
Dan McDonald	d96eb2b153	Illumos #1693 : persistent 'comment' field for a zpool Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1693 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #678	2012-08-08 11:49:37 -07:00
Etienne Dechamps	ee5fd0bb80	Set zvol discard_granularity to the volblocksize. Currently, zvols have a discard granularity set to 0, which suggests to the upper layer that discard requests of arbirarily small size and alignment can be made efficiently. In practice however, ZFS does not handle unaligned discard requests efficiently: indeed, it is unable to free a part of a block. It will write zeros to the specified range instead, which is both useless and inefficient (see dnode_free_range). With this patch, zvol block devices expose volblocksize as their discard granularity, so the upper layer is aware that it's not supposed to send discard requests smaller than volblocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #862	2012-08-07 14:55:31 -07:00
Matthew Ahrens	330d06f90d	Illumos #1644 , #1645 , #1646 , #1647 , #1708 1644 add ZFS "clones" property 1645 add ZFS "written" and "written@..." properties 1646 "zfs send" should estimate size of stream 1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots 1708 adjust size of zpool history data References: https://www.illumos.org/issues/1644 https://www.illumos.org/issues/1645 https://www.illumos.org/issues/1646 https://www.illumos.org/issues/1647 https://www.illumos.org/issues/1708 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. This change has reordered all of the new ioctl()s to the end of the list. This should help minimize this issue in the future. Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Albert Lee <trisk@opensolaris.org> Approved by: Garrett D'Amore <garret@nexenta.com> Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #826 Closes #664	2012-07-31 09:25:30 -07:00
Etienne Dechamps	f09398cec6	Use /sys/module instead of /proc/modules. When libzfs checks if the module is loaded or not, it currently reads /proc/modules and searches for a line matching the module name. Unfortunately, if the module is included in the kernel itself (built-in module), then /proc/modules won't list it, so libzfs will wrongly conclude that the module is not loaded, thus making all ZFS userspace tools unusable. Fortunately, all loaded modules appear as directories in /sys/module, even built-in ones. Thus we can use /sys/module in lieu of /proc/modules to fix the issue. As a bonus, the code for checking becomes much simpler. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #851	2012-07-26 13:45:33 -07:00
Richard Yao	739a1a82e0	Linux 3.5 compat, end_writeback() changed to clear_inode() The end_writeback() function was changed by moving the call to inode_sync_wait() earlier in to evict(). This effecitvely changes the ordering of the sync but it does not impact the details of the zfs implementation. However, as part of this change end_writeback() was renamed to clear_inode() to reflect the new semantics. This change does impact us and clear_inode() now maps to end_writeback() for kernels prior to 3.5. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #784	2012-07-23 12:29:36 -07:00
Richard Yao	ea1fdf46e2	Linux 3.5 compat, iops->truncate_range() removed The vmtruncate_range() support has been removed from the kernel in favor of using the fallocate method in the file_operations table. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:32 -07:00
Richard Yao	756c3e5a9c	Linux 3.5 compat, eops->encode_fh() takes inodes The export_operations member ->encode_fh() has been updated to take both the child and parent inodes. This interface used to take the child dentry and a bool describing if the parent is needed. NOTE: While updating this code I noticed that we do not currently cleanly handle the case where we're passed a connectable parent. This code should be audited to make sure we're doing the right thing. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:23 -07:00
Etienne Dechamps	b5a28807cd	Move partition scanning from userspace to module. Currently, zpool online -e (dynamic vdev expansion) doesn't work on whole disks because we're invoking ioctl(BLKRRPART) from userspace while ZFS still has a partition open on the disk, which results in EBUSY. This patch moves the BLKRRPART invocation from the zpool utility to the module. Specifically, this is done just before opening the device in vdev_disk_open() which is called inside vdev_reopen(). This requires jumping through some hoops to get to the disk device from the partition device, and to make sure we can still open the partition after the BLKRRPART call. Note that this new code path is triggered on dynamic vdev expansion only; other actions, like creating a new pool, are unchanged and still call BLKRRPART from userspace. This change also depends on API changes which are available in 2.6.37 and latter kernels. The build system has been updated to detect this, but there is no compatibility mode for older kernels. This means that online expansion will NOT be available in older kernels. However, it will still be possible to expand the vdev offline. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #808	2012-07-17 09:17:31 -07:00
Brian Behlendorf	7535251dcf	Add PowerPC to supported VTOCs This code was was inherited from Solaris which was careful to define the expected VTOC for various supported architectures. While this check may have made sense there it's something we should be able to safely drop under Linux. However, I'm not quite ready to do that yet. So for the moment I'm just doing the very safe thing of adding PowerPC as a supported type. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-07-12 11:52:34 -07:00
Etienne Dechamps	cee43a7477	Fix efi_use_whole_disk() when efi_nparts == 128. Commit `e5dc681a` changed EFI_NUMPAR from 9 to 128. This means that the on-disk EFI label has efi_nparts = 128 instead of 9. The index of the reserved partition, however, is still 8. This breaks efi_use_whole_disk(), which uses efi_nparts-1 as the index of the reserved partition. This commit fixes efi_use_whole_disk() when the index of the reserved partition is not efi_nparts-1. It rewrites the algorithm and makes it more robust by using the order of the partitions instead of their numbering. It assumes that the last non-empty partition is the reserved partition, and that the non-empty partition before that is the data partition. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #808	2012-07-12 08:59:22 -07:00
Etienne Dechamps	7608bd0dd0	Use the right device path when relabeling. Currently, zpool_vdev_online() calls zpool_relabel_disk() with a short partition device name, which is obviously wrong because (1) zpool_relabel_disk() expects a full, absolute path to use with open() and (2) efi_write() must be called on an opened disk device, not a partition device. With this patch, zpool_relabel_disk() gets called with a full disk device path. The path is determined using the same algorithm as zpool_find_vdev(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #808	2012-07-12 08:59:16 -07:00
Etienne Dechamps	8adf486422	Fix error handling for "zpool online -e". The error handling code around zpool_relabel_disk() is either inexistent or wrong. The function call itself is not checked, and zpool_relabel_disk() is generating error messages from an unitialized buffer. Before: # zpool online -e homez sdb; echo $? `: cannot relabel 'sdb1': unable to open device: 2 0 After: # zpool online -e homez sdb; echo $? cannot expand sdb: cannot relabel 'sdb1': unable to open device: 2 1 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #808	2012-07-12 08:58:19 -07:00
George Wilson	c7f2d69de3	Illumos #1949 , #1953 1949 crash during reguid causes stale config 1953 allow and unallow missing from zpool history since removal of pyzfs Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Garrett D'Amore <garrett.damore@gmail.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Steve Gonczi <gonczi@comcast.net> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/1949 https://www.illumos.org/issues/1953 Ported by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:33:31 -07:00
Garrett D'Amore	3541dc6d02	Illumos #1748 : desire support for reguid in zfs Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com> Reviewed by: Alexander Stetsenko <ams@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1748 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. If only the user space component is updated both the 'zpool events' command and the 'zpool reguid' command will not work until the kernel modules are updated. Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:08:56 -07:00
Pawel Jakub Dawidek	0cee24064a	Speed up 'zfs list -t snapshot -o name -s name' FreeBSD #xxx: Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: # zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s NOTE: This patch only appears in FreeBSD. If/when Illumos picks up the change we may want to drop this patch and adopt their version. However, for now this addresses a real issue. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #450	2012-06-14 09:49:04 -07:00
Richard Yao	bc98d6c809	Make zvol_remove_link() print a more useful error message Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-13 16:27:19 -07:00
Daniel Verite	c6327b63e6	Retry removal of busy minors When failing to remove a zvol device link because it's busy, wait a bit and retry in a loop instead of giving up immediately. This technique is similar to the loop in zpool_label_disk_wait(), with the same goal: waiting for the asynchronous udev processes to finish their work. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #692	2012-06-11 10:50:20 -07:00
Richard Yao	6a0936babc	Linux 3.4 compat, d_make_root() replaces d_alloc_root() torvalds/linux@adc0e91ab1 introduced introduced d_make_root() as a replacement for d_alloc_root(). Further commits appear to have removed d_alloc_root() from the Linux source tree. This causes the following failure: error: implicit declaration of function 'd_alloc_root' [-Werror=implicit-function-declaration] To correct this we update the code to use the current d_make_root() interface for readability. Then we introduce an autotools check to determine if d_make_root() is available. If it isn't then we define some compatibility logic which used the older d_alloc_root() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #776	2012-06-11 10:04:49 -07:00
Brian Behlendorf	abe5b8fb66	Improve 'zpool import' EBUSY error message When a device is already open O_EXCL by another process the `zpool import` will correctly fail. However, the default failure message isn't very helpful. It may in fact be harmful if you take its advise and destroy your pool. cannot import 'tank': pool is busy Destroy and re-create the pool from a backup source. Improve the error message in the EBUSY case to simply print a message indicating that the devices are current in use. The user will need to manually identify which process has the device open exclusively and why. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-01 08:55:24 -07:00
Brian Behlendorf	b04c9fc009	Add /dev/mapper/ to search path When creating pools short device names may be used when those devices appear in certain well known locations under /dev/. This change adds /dev/mapper/ to that list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-01 08:55:24 -07:00
Ned A. Bass	821b683436	Add vdev_id for JBOD-friendly udev aliases vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path in a storage topology to a channel name. The channel name is combined with a disk enclosure slot number to create an alias that reflects the physical location of the drive. This is particularly helpful when it comes to tasks like replacing failed drives. Slot numbers may also be re-mapped in case the default numbering is unsatisfactory. The drive aliases will be created as symbolic links in /dev/disk/by-vdev. The only currently supported topologies are sas_direct and sas_switch: o sas_direct - a channel is uniquely identified by a PCI slot and a HBA port o sas_switch - a channel is uniquely identified by a SAS switch port A multipath mode is supported in which dm-mpath devices are handled by examining the first running component disk, as reported by 'multipath -l'. In multipath mode the configuration file should contain a channel definition with the same name for each path to a given enclosure. vdev_id can replace the existing zpool_id script on systems where the storage topology conforms to sas_direct or sas_switch. The script could be extended to support other topologies as well. The advantage of vdev_id is that it is driven by a single static input file that can be shared across multiple nodes having a common storage toplogy. zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per node and a separate slot-mapping file. However, zpool_id provides the flexibility of using any device names that show up in /dev/disk/by-path, so it may still be needed on some systems. vdev_id's functionality subsumes that of the sas_switch_id script, and it is unlikely that anyone is using it, so sas_switch_id is removed. Finally, /dev/disk/by-vdev is added to the list of directories that 'zpool import' will scan. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #713	2012-06-01 08:55:14 -07:00
Jorgen Lundman	c421831192	Define the needed ISA types for ARM Add the minimum required ISA types to support the ARM architecture. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-05-03 11:18:54 -07:00
Brian Behlendorf	b39d3b9f7b	Linux 3.3 compat, iops->create()/mkdir()/mknod() The mode argument of iops->create()/mkdir()/mknod() was changed from an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf check was added to detect the API change and then correctly set a zpl_umode_t typedef. There is no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #701	2012-04-30 12:52:38 -07:00
Richard Laager	109491a897	Improve error message consistency Signed-off-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-04-11 10:43:17 -07:00
Brian Behlendorf	1c5de20ae2	Add --enable-debug-dmu-tx configure option Allow rigorous (and expensive) tx validation to be enabled/disabled indepentantly from the standard zfs debugging. When enabled these checks ensure that all txs are constructed properly and that a dbuf is never dirtied without taking the correct tx hold. This checking is particularly helpful when adding new dmu consumers like Lustre. However, for established consumers such as the zpl with no known outstanding tx construction problems this is just overhead. --enable-debug-dmu-tx - Enable/disable validation of each tx as --disable-debug-dmu-tx it is constructed. By default validation is disabled due to performance concerns. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-23 12:25:17 -07:00
Brian Behlendorf	ebe7e575ea	Add .zfs control directory Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as posible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations. ) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches. ) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished. * mkdir - create a snapshot * rmdir - destroy a snapshot * mv - rename a snapshot The following issues are known defeciences, but we expect them to be addressed by future commits. ) Add automount support for kernels older the 2.6.37. This should be possible using follow_link() which is what Linux did before. ) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the ground work for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server. *) The .zfs/shares directory exists but no futher smb functionality has yet been implemented. Contributions-by: Rohan Puri <rohan.puri15@gmail.com> Contributiobs-by: Andrew Barnes <barnes333@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #173	2012-03-22 13:03:47 -07:00
Ned Bass	613d88eda8	Align parition end on 1 MiB boundary Some devices have exhibited sensitivity to the ending alignment of partitions. In particular, even if the first partition begins at 1 MiB, we have seen many sd driver task abort errors with certain SSDs if the first partition doesn't end on a 1 MiB boundary. This occurs when the vdev label is read during pool creation or importation and causes a delay of about 30 seconds per device. It can also be simulated with dd when the pool isn't imported: dd if=/dev/sda1 of=/dev/null bs=262144 count=1 For the record, this problem was observed with SMARTMOD SG9XCA2E200GE01 200GB SSDs. Unfortunately I don't have a good explanation for this behavior. It seems to have something to do with highly fragmented single-sector requests being issued to the device, which it may not support. With end-aligned partitions at least page-sized requests were queued and issued to the driver according to blktrace. In any case, aligning the partition end is a fairly innocuous work-around, wasting at most 1 MiB of space. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #574	2012-03-05 09:49:50 -08:00
Brian Behlendorf	4b787d75c8	Cleanly support debug packages Allow a source rpm to be rebuilt with debugging enabled. This avoids the need to have to manually modify the spec file. By default debugging is still largely disabled. To enable specific debugging features use the following options with rpmbuild. '--with debug' - Enables ASSERTs # For example: $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm Additionally, ZFS_CONFIG has been added to zfs_config.h for packages which build against these headers. This is critical to ensure both zfs and the dependant package are using the same prototype and structure definitions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 14:08:17 -08:00
Richard Yao	b41c9906dc	Support ashift=13 for 8KB SSD block sizes New SSDs are now available which use an internal 8k block size. To make sure ZFS can get the maximum performance out of these devices we're increasing the maximum ashift to 13 (8KB). This value is still small enough that we can fit 16 uberblocks in the vdev ring label. However, I don't want to increase this any futher or it will limit the ability the safely roll back a pool to recover it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #565	2012-02-13 12:25:27 -08:00
Turbo Fredriksson	d2e032ca9c	Add 'fsid' mount option to allowed options. Resolves nfs-utils-1.0.x compatibility issue which requires that the fsid be set in the export options. exportfs: Warning: /tank/dir requires fsid= for NFS export Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #570	2012-02-13 09:43:57 -08:00
Etienne Dechamps	30930fba21	Add support for DISCARD to ZVOLs. DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning. It allows ZVOL clients to discard (unmap, trim) block ranges from a ZVOL, thus optimizing disk space usage by allowing a ZVOL to shrink instead of just grow. We can't use zfs_space() or zfs_freesp() here, since these functions only work on regular files, not volumes. Fortunately we can use the low-level function dmu_free_long_range() which does exactly what we want. Currently the discard operation is not added to the log. That's not a big deal since losing discard requests cannot result in data corruption. It would however result in disk space usage higher than it should be. Thus adding log support to zvol_discard() is probably a good idea for a future improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 16:19:38 -08:00
Etienne Dechamps	cb2d19010d	Support the fallocate() file operation. Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is supported, since it's the only one that matches the behavior of zfs_space(). This makes it pretty much useless in its current form, but it's a start. To support other flag combinations we would need to modify zfs_space() to make it more flexible, or emulate the desired functionality in zpl_fallocate(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 16:19:32 -08:00
Etienne Dechamps	34037afe24	Improve ZVOL queue behavior. The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives, and are not optimized for ZVOLs. Proper configuration of these options would allow upper layers (I/O scheduler) to take better decisions about write merging and ordering. Detailed rationale: - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide. - max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide. - physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow. - The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Etienne Dechamps	b18019d2d8	Fix synchronicity for ZVOLs. zvol_write() assumes that the write request must be written to stable storage if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed, "sync" does not mean what we think it means in the context of the Linux block layer. This is well explained in linux/fs.h: WRITE: A normal async write. Device will be plugged. WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down the hint that someone will be waiting on this IO shortly. WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush. WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on non-volatile media on completion. In other words, SYNC does not mean that the write must be on stable storage on completion. It just means that someone is waiting on us to complete the write request. Thus triggering a ZIL commit for each SYNC write request on a ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL users have no way to express that they actually want data to be written to stable storage, which means the ZIL is broken for ZVOLs. The request for stable storage is expressed by the FUA flag, so we must commit the ZIL after the write if the FUA flag is set. In addition, we must commit the ZIL before the write if the FLUSH flag is set. Also, we must inform the block layer that we actually support FLUSH and FUA. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Darik Horn	e67329d8e0	Let libnvpair be linked independently of libzfs. Autoconf will fail to detect the ZoL libnvpair on systems that do not implicitly link library runtime dependencies, which is anything that has the GCC 4.5 DCO update. Build libuutil before libnvpair, and put it on the the LDADD line of the libnvpair automake template. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #560	2012-02-07 11:37:15 -08:00
Brian Behlendorf	47621f3d76	Linux 3.3 compat, sops->show_options() The second argument of sops->show_options() was changed from a 'struct vfsmount ' to a 'struct dentry '. Add an autoconf check to detect the API change and then conditionally define the expected interface. In either case we are only interested in the zfs_sb_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #549	2012-02-03 10:02:01 -08:00
Brian Behlendorf	d7e398ce1a	Cleanup ZFS debug infrastructure Historically the internal zfs debug infrastructure has been scattered throughout the code. Since we expect to start making more use of this code this patch performs some cleanup. * Consolidate the zfs debug infrastructure in the zfs_debug.[ch] files. This includes moving the zfs_flags and zfs_recover variables, plus moving the zfs_panic_recover() function. * Remove the existing unused functionality in zfs_debug.c and replace it with code which correctly utilized the spl logging infrastructure. * Remove the __dprintf() function from zfs_ioctl.c. This is dead code, the dprintf() functionality in the kernel relies on the spl log support. * Remove dprintf() from hdr_recl(). This wasn't particularly useful and was missing the required format specifier anyway. * Subsequent patches should unify the dprintf() and zfs_dbgmsg() functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:24:30 -08:00
Prakash Surya	ff998d804f	Ignore dataset if the dds_type is DMU_OST_OTHER Since the zpios and potentially other ZFS tests use the DMU_OST_OTHER type to label their datasets, the zpool and zfs commands should gracefully handle this type when it is encountered. This patch modifies the commands' behavior to ignore any datasets with a dds_type of DMU_OST_OTHER. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #536	2012-01-19 09:29:48 -08:00
Darik Horn	f783130a1f	Allow GPT+EFI vdev replacement in boot pools. Commit zfsonlinux/zfs@57a4eddc4d allows the bootfs property to be set on any pool, but does not accommodate subsequent vdev changes. For example: # zpool replace rpool /dev/sda /dev/sdb operation not supported on this type of pool property 'bootfs' is not supported on EFI labeled devices For non-Solaris builds, disable the check that emits this error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 11:05:24 -08:00
Darik Horn	750562833f	Combine libraries: spl, avl, efi, share, unicode. These libraries, which are an artifact of the ZoL development process, conflict with packages that are already in distribution: * libspl: SPL Programming Language * libavl: AVL for Linux * libefi: GRUB And these libraries are potential conflicts: * libshare: the Linux Mount Manager * libunicode: Perl and Python Recompose these five ZoL components into the four libraries that are conventionally provided by Solaris and FreeBSD systems: + libnvpair + libuutil + libzpool + libzfs This change resolves the name conflict, makes ZoL more compatible with existing software that uses autotools to detect ZFS, and allows pkg-zfs to better reflect the official Debian kFreeBSD packaging. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #430	2012-01-17 15:19:50 -08:00
Suman Chakravartula	e18be9a637	Add overlay(-O) mount option support Linux supports mounting over non-empty directories by default. In Solaris this is not the case and -O option is required for zfs mount to mount a zfs filesystem over a non-empty directory. For compatibility, I've added support for -O option to mount zfs filesystems over non-empty directories if the user wants to, just like in Solaris. I've defined MS_OVERLAY to record it in the flags variable if the -O option is supplied. The flags variable passes through a few functions and its checked before performing the empty directory check in zfs_mount function. If -O is given, the check is not performed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #473	2012-01-12 15:49:38 -08:00
Richard Laager	2932b6a800	Treat /dev/vd* as whole disks Correctly detect /dev/vd devices as whole disks and attempt to create an EFI partition table. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-11 16:44:54 -08:00
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Prakash Surya	6ba3b44614	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be build using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, it's MD5 hash signature will be continually influx. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2011-12-14 19:14:23 -08:00
Garrett D'Amore	a38718a63d	Illumos #734 : Use taskq_dispatch_ent() interface It has been observed that some of the hottest locks are those of the zio taskqs. Contention on these locks can limit the rate at which zios are dispatched which limits performance. This upstream change from Illumos uses new interface to the taskqs which allow them to utilize a prealloc'ed taskq_ent_t. This removes the need to perform an allocation at dispatch time while holding the contended lock. This has the effect of improving system performance. Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Alexey Zaytsev <alexey.zaytsev@nexenta.com> Reviewed by: Jason Brian King <jason.brian.king@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> References to Illumos issue: https://www.illumos.org/issues/734 Ported-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #482	2011-12-14 09:19:30 -08:00
Gunnar Beutner	590338f63e	Added comments for libshare's NFS functions. Some of the functions' purpose wasn't immediately obvious without additional explanations. This commit adds these missing comments. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-05 09:33:00 -08:00
Suman Chakravartula	ada8ec1ec5	Allow leading digits in userquota/groupquota names While setting/getting userquota and groupquota properties, the input was not treated as a possible username or groupname if it had a leading digit. While useradd in linux recommends the regexp [a-z_][a-z0-9_-]*[$]? , it is not enforced. This causes problem for usernames with leading digits in them. We need to be able to support getting and setting properties for this unconventional but possible input category I've updated the code to validate the username or groupname directly via the API. Also, note that I moved this validation to the beginning before the check for SID names with @. This also supports usernames with @ character in them which are valid. Only when input with @ is not a valid username, it is interpreted as a potential SID name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #428	2011-11-21 16:29:18 -08:00
Brian Behlendorf	ca5fd24984	Limit maximum ashift value to 12 While we initially allowed you to set your ashift as large as 17 (SPA_MAXBLOCKSIZE) that is actually unsafe. What wasn't considered at the time is that each uberblock written to the vdev label ring buffer will be of this size. Now the buffer is statically sized to 128k and we need to be able to fit several uberblocks in it. With a large ashift that becomes a problem. Therefore I'm reducing the maximum configurable ashift value to 12. This is large enough for the 4k sector drives and small enough that we can still keep the most recent 32 uberblock in the vdev label ring buffer. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #425	2011-11-11 14:50:48 -08:00
Brian Behlendorf	5547c2f1bf	Simplify BDI integration Update the code to use the bdi_setup_and_register() helper to simplify the bdi integration code. The updated code now just registers the bdi during mount and destroys it during unmount. The only complication is that for 2.6.32 - 2.6.33 kernels the helper wasn't available so in these cases the zfs code must provide it. Luckily the bdi_setup_and_register() function is trivial. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #367	2011-11-08 10:19:03 -08:00
Zachary Bedell	7a0232735d	Make libefi-created GPT compatible with gptfdisk GPT's created by libefi set the HeaderSize attribute in the GPT header to 512 -- size of the GPT header INCLUDING the 420 padding bytes at the end. Most other tools set the size to 92 -- size of the actual header itself excluding the padding. Most tools check the recorded HeaderSize when verifying CRC, but gptfdisk hardcodes 92 and thus reports CRC verification problems for full-disk vdevs created IE with `zpool create pool sdc`. This commit changes libefi's behavior for GPT creation and also fixes several edge cases where libefi's behavior was similar (though in an incompatible manner) to gptfdisk. Libefi assumed HeaderSize was always 512 even if the GPT recorded a different value. Sanity checks of the GPT headersize read from disk were added before applying checksum calculation -- this will prevent segfault in cases of bogus on-disk values. Zpools created with the resuling libefi are verified as correct both by parted and gptfdisk. Also pool have been tested to import correctly on ZFS on Linux as well as Solaris Express 11 livecd. Signed-off-by: Zachary Bedell <zac@thebedells.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #344	2011-09-26 09:44:43 -07:00
Brian Behlendorf	de0a1c099b	Autogen refresh for udev changes Run autogen.sh using the same autotools versions as upstream: * autoconf-2.63 * automake-1.11.1 * libtool-2.2.6b	2011-08-08 16:30:27 -07:00
Brian Behlendorf	76659dc110	Add backing_device_info per-filesystem For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174	2011-08-04 13:37:38 -07:00
Gunnar Beutner	ddd0fd9ef6	Use libzfs_run_process() in libshare. This should simplify the code a bit by re-using existing code to fork and exec a process. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #190	2011-08-01 13:52:35 -07:00
Gunnar Beutner	3132cb397a	Use /dev/null for stdout/stderr in libzfs_run_process(). Simply closing the stdout and/or stderr file descriptors for the child process can have bad side effects if for example the child writes to stdout/stderr after open()ing a file. The open() call might have returned the same file descriptor one would usually expect for stdout/stderr (1 and 2), thereby causing mis-directed writes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #190	2011-08-01 13:52:23 -07:00
James H	5333eb0b3b	Call exportfs -v once for NFS shares. At the moment we call exportfs -v every time we check whether an NFS share is active. This happens every time you run a zfs or zpool command, making them extremely slow when you have a lot of exports. The time taken is approx O(n2) of the number of shares. This commit stores the output from exportfs -v in a temporary file and use this to speed up subsequent accesses. This mechanism is still too slow - if you have tens of thousands of NFS shares it will still be painful running ANY zfs/zpool command. Signed-off-by: Gunnar Beutner <gunnar@beutner.name> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #341	2011-08-01 13:50:40 -07:00
Alexander Stetsenko	0b7936d5c2	Illumos #278 : get rid zfs of python and pyzfs dependencies Remove all python and pyzfs dependencies for consistency and to ensure full functionality even in a mimimalist environment. Reviewed by: gordon.w.ross@gmail.com Reviewed by: trisk@opensolaris.org Reviewed by: alexander.r.eremin@gmail.com Reviewed by: jerry.jelinek@joyent.com Approved by: garrett@nexenta.com References to Illumos issue and patch: - https://www.illumos.org/issues/278 - https://github.com/illumos/illumos-gate/commit/1af68beac3 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340 Issue #160 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-08-01 12:09:36 -07:00
Matt Ahrens	f5fc4acaa7	Illumos #1092 : zfs refratio property Add a "REFRATIO" property, which is the compression ratio based on data referenced. For snapshots, this is the same as COMPRESSRATIO, but for filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie, includes blocks in children, but not blocks shared with the origin). This is needed to figure out how much space a filesystem would use if it were not compressed (ignoring snapshots). Reviewed by: George Wilson <George.Wilson@delphix.com> Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Mark Musante <Mark.Musante@oracle.com> Reviewed by: Garrett D'Amore <garrett@nexenta.com> Approved by: Garrett D'Amore <garrett@nexenta.com> References to Illumos issue and patch: - https://www.illumos.org/issues/1092 - https://github.com/illumos/illumos-gate/commit/187d6ac08a Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340	2011-08-01 12:09:11 -07:00
Kyle Fuller	615ab66d18	Provide a rc.d script for archlinux Unlike most other Linux distributions archlinux installs its init scripts in /etc/rc.d insead of /etc/init.d. This commit provides an archlinux rc.d script for zfs and extends the build infrastructure to ensure it get's installed in the correct place. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #322	2011-07-11 14:12:23 -07:00
Brian Behlendorf	b1c932d318	Add proper library versioning The zfs libraries were never properly versioned. Since the API has remained static for quite some time this we never an issue. However, going forward they should be versioned. This commit versions all of the libraries to 1.0.0. From here on out this version must be updated to reflect changes to the library.	2011-07-06 09:20:28 -07:00
Gunnar Beutner	52e7c3a2e5	Link libshare directly to libzfs Drop usage of dlopen/dlsym for libshare. There is no need to do this because the zfs packages provide libshare. Unlike on Solaris we are guaranteed it will be available. This avoids possible problems with hardcoding the libshare path in the code (e.g. when users specify a different install path via configure options). It additionally simplifies the code which is good for maintainability. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-07-06 09:20:28 -07:00
Gunnar Beutner	46e18b3f0f	Implemented sharing datasets via NFS using libshare. The sharenfs and sharesmb properties depend on the libshare library to export datasets via NFS and SMB. This commit implements the base libshare functionality as well as support for managing NFS shares. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-07-06 09:20:28 -07:00
Brian Behlendorf	f2cfee80e3	Fix implicit declaration of 'mkdirp' The lib/libspl/include/libgen.h header file was being mistakenly left out of the 'make dist' tarball. It just happens this doesn't cause a build failure when creating packages because the system libgen/h is included instead. This simply results in the following warning due to the missing forward declaration of mkdirp(). ../../lib/libzfs/libzfs_mount.c:417:3: warning: implicit declaration of function 'mkdirp' [-Wimplicit-function-declaration]	2011-07-01 13:39:47 -07:00
Brian Behlendorf	2cf7f52bc4	Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231	2011-07-01 13:36:39 -07:00
Brian Behlendorf	5c03efc379	Linux compat 2.6.39: security_inode_init_security() The security_inode_init_security() function now takes an additional qstr argument which must be passed in from the dentry if available. Passing a NULL is safe when no qstr is available the relevant security checks will just be skipped. Closes #246 Closes #217 Closes #187	2011-07-01 12:40:08 -07:00
Prasad Joshi	b312979252	Tear down and flush the mmap region The inode eviction should unmap the pages associated with the inode. These pages should also be flushed to disk to avoid the data loss. Therefore, use truncate_setsize() in evict_inode() to release the pagecache. The API truncate_setsize() was added in 2.6.35 kernel. To ensure compatibility with the old kernel, the patch defines its own truncate_setsize function. Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com> Closes #255	2011-06-27 09:59:19 -07:00
Christian Kohlschütter	df30f56639	Add "ashift" property to zpool create Some disks with internal sectors larger than 512 bytes (e.g., 4k) can suffer from bad write performance when ashift is not configured correctly. This is caused by the disk not reporting its actual sector size, but a sector size of 512 bytes. The drive may behave this way for compatibility reasons. For example, the WDC WD20EARS disks are known to exhibit this behavior. When creating a zpool, ZFS takes that wrong sector size and sets the "ashift" property accordingly (to 9: 1<<9=512), whereas it should be set to 12 for 4k sectors (1<<12=4096). This patch allows an adminstrator to manual specify the known correct ashift size at 'zpool create' time. This can significantly improve performance in certain cases. However, it will have an impact on your total pool capacity. See the updated ashift property description in the zpool.8 man page for additional details. Valid values for the ashift property range from 9 to 17 (512B-128KB). Additionally, you may set the ashift to 0 if you wish to auto-detect the sector size based on what the disk reports, this is the default behavior. The most common ashift values are 9 and 12. Example: zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd Closes #280 Original-patch-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-06-17 16:35:49 -07:00
Brian Behlendorf	2e08aedba4	Always check -Wno-unused-but-set-variable gcc support The previous commit `8a7e1ceefa` wasn't quite right. This check applies to both the user and kernel space build and as such we must make sure it runs regardless of what the --with-config option is set too. For example, if --with-config=kernel then the autoconf test does not run and we generate build warnings when compiling the kernel packages.	2011-06-14 16:40:35 -07:00
Brian Behlendorf	8a7e1ceefa	Check for -Wno-unused-but-set-variable gcc support Gcc versions 4.3.2 and earlier do not support the compiler flag -Wno-unused-but-set-variable. This can lead to build failures on older Linux platforms such as Debian Lenny. Since this is an optional build argument this changes add a new autoconf check for the option. If it is supported by the installed version of gcc then it is used otherwise it is omited. See commit's `12c1acde76` and `79713039a2` for the reason the -Wno-unused-but-set-variable options was originally added.	2011-06-14 14:43:22 -07:00
Brian Behlendorf	1b9d8c340f	Fix 'zfs send -D' segfault Sending pools with dedup results in a segfault due to a Solaris portability issue. Under Solaris the pipe(2) library call creates a bidirectional data channel. Unfortunately, on Linux pipe(2) call creates unidirection data channel. The fix is to use the socketpair(2) function to create the expected bidirectional channel. Seth Heeren did the original leg work on this issue for zfs-fuse. We finally just rediscovered the same portability issue and dfurphy was able to point me at the original issue for the fix. Closes #268	2011-06-09 13:58:48 -07:00
Brian Behlendorf	df554c148e	Fix 'zfs set volsize=N pool/dataset' This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg. The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels. Online ext3 resize test case passes on 2.6.28+ kernels: $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023 $ zpool create tank /tmp/zvol $ zfs create -V 500M tank/zd0 $ mkfs.ext3 /dev/zd0 $ mkdir /mnt/zd0 $ mount /dev/zd0 /mnt/zd0 $ df -h /mnt/zd0 $ zfs set volsize=800M tank/zd0 $ resize2fs /dev/zd0 $ df -h /mnt/zd0 Original-patch-by: Fajar A. Nugraha <github@fajar.net> Closes #68 Closes #84	2011-05-02 08:54:40 -07:00

... 3 4 5 6 7 ...

546 Commits