Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Boris Protopopov	a0bd735adb	Add support for asynchronous zvol minor operations zfsonlinux issue #2217 - zvol minor operations: check snapdev property before traversing snapshots of a dataset zfsonlinux issue #3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue #3681. Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2217 Closes #3678 Closes #3681	2016-03-10 09:49:22 -08:00
Brian Behlendorf	27a19a0d5a	zimport.sh: Add configure/make option support Allow the following environment variables to control the build behavior of the zimport.sh script. This can be useful when you want a debug build or require specific build options. The default values are: CONFIG_OPTIONS="" MAKE_OPTIONS="-s -j$(nproc)" Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-11-16 16:10:07 -08:00
Brian Behlendorf	5970eb3d60	Use truncate instead of fallocate in ziltest.sh For the purposes of creating sparse files the truncate command is preferable to fallocate because generic sparse files are more widely supported by older platforms. Specifically Debian Wheezy which is based on a 2.6.32 kernel used ext3 by default which at the time did not support it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-07-13 11:02:59 -07:00
Brian Behlendorf	a254ecfc8b	Add ziltest.sh The ziltest.sh script is a test case designed to verify the correct functioning of the ZIL. It's being added to the scripts directory so it can be easily added to the automated regression testing. The general idea is to build up an intent log from a bunch of diverse user commands without actually committing them to the file system. Then copy the file system, replay the intent log and compare the file system and the copy. Ported-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3531	2015-06-26 14:21:41 -07:00
Dan Swartzendruber	1611bb7b4f	Set zfs_autoimport_disable default value to 1 When loading the ZFS kernel modules they should not populate the spa namespace using the cache file. This behavior isn't consistent with other Linux kernel modules and we need to move away from it. Removing this makes the whole startup process predictable with four basic steps which are driven by the init system. 1) modprobe 2) zpool import 3) zfs mount 4) zfs share This change also helps lay the groundwork for eventually removing the kobj_* compatibility code on the kernel side. It may need to be preserved in userspace because libzfs_init() depends on it. This is why the conditional must be wrapped with an #ifdef _KERNEL. Signed-off-by: Dan Swartzendruber <dswartz@druber.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2820	2015-02-17 16:09:41 -08:00
Brian Behlendorf	6442f3cfe3	Retire zio_bulk_flags Long ago the zio_bulk_flags module parameter was introduced to facilitate debugging and profiling the zio_buf_caches. Today this code works well and there's no compelling reason to keep this functionality. In fact it's preferable to revert this so the code is more consistent with other ZFS implementations. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Issue #3063	2015-02-10 16:08:49 -08:00
Prakash Surya	0b39b9f96f	Swap DTRACE_PROBE* with Linux tracepoints This patch leverages Linux tracepoints from within the ZFS on Linux code base. It also refactors the debug code to bring it back in sync with Illumos. The information exported via tracepoints can be used for a variety of reasons (e.g. debugging, tuning, general exploration/understanding, etc). It is advantageous to use Linux tracepoints as the mechanism to export this kind of information (as opposed to something else) for a number of reasons: * A number of external tools can make use of our tracepoints "automatically" (e.g. perf, systemtap) * Tracepoints are designed to be extremely cheap when disabled * It's one of the "accepted" ways to export this kind of information; many other kernel subsystems use tracepoints too. Unfortunately, though, there are a few caveats as well: * Linux tracepoints appear to only be available to GPL licensed modules due to the way certain kernel functions are exported. Thus, to actually make use of the tracepoints introduced by this patch, one might have to patch and re-compile the kernel; exporting the necessary functions to non-GPL modules. * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux tracepoints are not available for unsigned kernel modules (tracepoints will get disabled due to the module's 'F' taint). Thus, one either has to sign the zfs kernel module prior to loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or newer. Assuming the above two requirements are satisfied, lets look at an example of how this patch can be used and what information it exposes (all commands run as 'root'): # list all zfs tracepoints available $ ls /sys/kernel/debug/tracing/events/zfs enable filter zfs_arc__delete zfs_arc__evict zfs_arc__hit zfs_arc__miss zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone zfs_l2arc__miss zfs_l2arc__read zfs_l2arc__write zfs_new_state__mfu zfs_new_state__mru # enable all zfs tracepoints, clear the tracepoint ring buffer $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable $ echo 0 > /sys/kernel/debug/tracing/trace # import zpool called 'tank', inspect tracepoint data (each line was # truncated, they're too long for a commit message otherwise) $ zpool import tank $ cat /sys/kernel/debug/tracing/trace \| head -n35 # tracer: nop # # entries-in-buffer/entries-written: 1219/1219 #P:8 # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr... z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr... z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr... z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ... lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr... z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr... z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru... lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr... z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr... z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru... lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ... lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ... To highlight the kind of detailed information that is being exported using this infrastructure, I've taken the first tracepoint line from the output above and reformatted it such that it fits in 80 columns: lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr { dva 0x1:0x40082 birth 15491 cksum0 0x163edbff3a flags 0x640 datacnt 1 type 1 size 2048 spa 3133524293419867460 state_type 0 access 0 mru_hits 0 mru_ghost_hits 0 mfu_hits 0 mfu_ghost_hits 0 l2_hits 0 refcount 1 } bp { dva0 0x1:0x40082 dva1 0x1:0x3000e5 dva2 0x1:0x5a006e cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00 lsize 2048 } zb { objset 0 object 0 level -1 blkid 0 } For the specific tracepoint shown here, 'zfs_arc__miss', data is exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!). This kind of precise and detailed information can be extremely valuable when trying to answer certain kinds of questions. For anybody unfamiliar but looking to build on this, I found the XFS source code along with the following three web links to be extremely helpful: * http://lwn.net/Articles/379903/ * http://lwn.net/Articles/381064/ * http://lwn.net/Articles/383362/ I should also node the more "boring" aspects of this patch: * The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to support a sixth paramter. This parameter is used to populate the contents of the new conftest.h file. If no sixth parameter is provided, conftest.h will be empty. * The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced. This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro, except it has support for a fifth option that is then passed as the sixth parameter to ZFS_LINUX_COMPILE_IFELSE. These autoconf changes were needed to test the availability of the Linux tracepoint macros. Due to the odd nature of the Linux tracepoint macro API, a separate ".h" must be created (the path and filename is used internally by the kernel's define_trace.h file). * The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This is to determine if we can safely enable the Linux tracepoint functionality. We need to selectively disable the tracepoint code due to the kernel exporting certain functions as GPL only. Without this check, the build process will fail at link time. In addition, the SET_ERROR macro was modified into a tracepoint as well. To do this, the 'sdt.h' file was moved into the 'include/sys' directory and now contains a userspace portion and a kernel space portion. The dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as well. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-11-17 11:13:55 -08:00
Ned Bass	5024046763	cstyle: allow right paren on its own line Make the style checker script accept right parentheses on their own lines. This is motivated by the Linux tracepoints macro DECLARE_EVENT_CLASS. The code within TP_fast_assign() (a parameter of DECLARE_EVENT_CLASS) is normal C assignments terminated by semicolons. But the style checker forbids us from following a semicolon with a non-blank and from preceding a right parenthesis with white space. Therefore the closing parenthesis must go on the next line, yet the style checker foribs us from indenting it for readability. Relaxing the no-non-blank-after-semicolon rule would open the door to too many bad style practices. So instead we relax the no-white-space-before-right-paren rule if the parenthesis is on its own line. The relaxation is overriden with the -p option so we still have a way to catch misuse of this style. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-11-17 11:13:50 -08:00
Brian Behlendorf	bf2850de82	Fix source_tree variable in dkms build The source_tree variable in the previous commit had an extra $. Remove it so that source_tree is expanded properly. An identical fix has been applied in the original patch to the stable branch. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2776	2014-10-13 10:38:41 -07:00
Tom Prince	f178adea81	Point dkms build at installed source tree, rather than build directory. Signed-off-by: Tom Prince <tom.prince@clusterhq.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2776	2014-10-09 12:05:43 -07:00
Tom Prince	fee48fd22c	Install header during post-build rather than post-install. New versions of dkms clean up the build directory after installing. It appears that this was always intended, but had rm -rf "/path/to/build/*" (note the quotes), which prevented it from working. Also, the build step is already installing stuff into the directory where these files go, so installing our stuff there as part of build rather than install makes sense. Signed-off-by: Tom Prince <tom.prince@clusterhq.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2776	2014-10-09 12:03:50 -07:00
Brian Behlendorf	093219a6b3	zpool-create.sh: allow features to be disabled The zimport.sh script makes use of the zpool-create.sh script to construct test pools for importing with older versions of ZoL. It is desirable to have a way to disable all the features so new pools can be imported with older code. The simplest and most flexible way to achieve this was to merge the VERBOSE_FLAG and FORCE_FLAG in to a single ZPOOL_FLAGS variable. The contents of this variable will be used in the 'zpool create' allowing us to easily pass arbitrary flags. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #2524	2014-07-25 11:58:31 -07:00
Turbo Fredriksson	0f629346bb	Set LANG to a reasonable default (C) Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on the translated date string in the changelog. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: zfsonlinux/spl#306	2014-06-10 16:46:21 -07:00
Brian Behlendorf	e0b8f62902	Various zimport.sh fixes 1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR. These are vestiges of an earlier version of the script and were missed when it was updated. Additionally ensure the directory is created. 2) The 'fail' function should take an integer argument for the error code to return. Otherwise 0 (success) will be mistakenly returned and errors will we incorrectly suppressed. The error code should be meaningful enough to determine where the script failed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-17 09:30:55 -07:00
Brian Behlendorf	888f7141a3	Make zimport.sh bash dependency explicit Unfortunately, the zimport.sh test script really does depend on bash. Moving to /bin/sh should be possible once the shared infrastructure scripts it depends on is made portable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-10 16:07:59 -07:00
Brian Behlendorf	443c3f7332	Improve zfs.sh error messages Ensure an error message is logged when the 'zfs.sh' script fails to either load a module or if udev fails to create the /dev/zfs device. Error messages for missing KERNEL_MODULES are suppressed because that functionality may just be built-in to the kernel. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-10 14:27:00 -07:00
Brian Behlendorf	cc9ee13e1a	Dynamically create loop devices Several of the in-tree regression tests depend on the availability of loop devices. If for some reason no loop devices are available the tests will fail. Normally this isn't an issue because most Linux distributions create 8 loop devices by default. This is enough for our purposes. However, recent Fedora releases have only been creating a single loop device and this leads to failures. Alternately, if something else of the system is using the loop devices we may see failures. The fix for this is to update the support scripts to dynamically create loop devices as needed. The scripts need only create a node under /dev/ and the loop driver with create the minor. This behavior has been supported by the loop driver for ages. Additionally this patch updates cleanup_loop_devices() to cleanup loop devices which have already had their file store deleted. This helps prevent stale loop devices from accumulating on the system due to test failures. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #2249	2014-04-09 13:29:32 -07:00
Chris Dunlap	9e246ac3d8	Initial implementation of zed (ZFS Event Daemon) zed monitors ZFS events. When a zevent is posted, zed will run any scripts that have been enabled for the corresponding zevent class. Multiple scripts may be invoked for a given zevent. The zevent nvpairs are passed to the scripts as environment variables. Events are processed synchronously by the single thread, and there is no maximum timeout for script execution. Consequently, a misbehaving script can delay (or forever block) the processing of subsequent zevents. Plans are to address this in future commits. Initial scripts have been developed to log events to syslog and send email in response to checksum/data/io errors and resilver.finish/scrub.finish events. By default, email will only be sent if the ZED_EMAIL variable is configured in zed.rc (which is serving as a config file of sorts until a proper configuration file is implemented). Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-04-02 13:10:03 -07:00
Brian Behlendorf	a16bc6bdd9	Add zimport.sh compatibility test script Verify that an assortment of known good reference pools can be imported using different versions of the ZoL code. By default references pools for the major ZFS implementation will be checked against the most recent ZoL tags and the master development branch. Alternate tags or branches may be verified with the '-s <src-tag> option. Passing the keyword "installed" will instruct the script to test whatever version is installed. Preferentially a reference pool is used for all tests. However, if one does not exist and the pool-tag matches one of the src-tags then a new reference pool will be created using binaries from that source build. This is particularly useful when you need to test your changes before opening a pull request. New reference pools may be added by placing a bzip2 compressed tarball of the pool in the scripts/zpool-example directory and then passing the -p <pool-tag> option. To increase the test coverage reference pools should be collected for all the major ZFS implementations. Having these pools easily available is also helpful to the developers. Care should be taken to run these tests with a kernel supported by all the listed tags. Otherwise build failure will cause false positives. EXAMPLES: The following example will verify the zfs-0.6.2 tag, the master branch, and the installed zfs version can correctly import the listed pools. Note there is no reference pool available for master and installed but because binaries are available one is automatically constructed. The working directory is also preserved between runs (-k) preventing the need to rebuild from source for multiple runs. zimport.sh -k -f /var/tmp/zimport \ -s "zfs-0.6.1 zfs-0.6.2 master installed" \ -p "all master installed" --------------------- ZFS on Linux Source Versions -------------- zfs-0.6.1 zfs-0.6.2 master 0.6.2-180 ----------------------------------------------------------------- Clone SPL Skip Skip Skip Skip Clone ZFS Skip Skip Skip Skip Build SPL Skip Skip Skip Skip Build ZFS Skip Skip Skip Skip ----------------------------------------------------------------- zevo-1.1.1 Pass Pass Pass Pass zol-0.6.1 Pass Pass Pass Pass zol-0.6.2-173 Fail Fail Pass Pass zol-0.6.2 Pass Pass Pass Pass master Fail Fail Pass Pass installed Pass Pass Pass Pass Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #2094	2014-02-21 12:10:31 -08:00
Brian Behlendorf	99d3ece847	Add default FILEDIR path to zpool-config scripts Allow the caller of the zpool-create.sh script to override the default path where file vdevs are created. This allows for greated flexibilty when scripting. Additionally, update the default path from /tmp/ to /var/tmp/ because these days /tmp/ is likely a ramdisk. Even though these files are sparse they may grow large in which case they should be backed by a physical device. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2120	2014-02-12 09:37:46 -08:00
Brian Behlendorf	0820ec65c4	Fix zconfig.sh test 9 Commit `ba6a240` adjusted the behavior of 'zfs create -V'. The caller is no longer guaranteed that udev will have finished creating the /dev/ entries by the time to command exits. It is therefore required that we explicitly block waiting for udev to settle for this test to run reliably. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-02-10 15:48:32 -08:00
Brian Behlendorf	8ffef572ed	cstyle: Allow spaces in all comments Update the cstyle.pl script to allow pictures in all comments not just header comments. Recent changes from Illumos such as `d3cc8b1` have relocated various pictures in the standard block comments to make the code more readable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1821	2013-12-18 16:46:35 -08:00
Brian Behlendorf	c2d439dffd	Silence e2fsck warning in zconfig.sh When running zconfig.sh test 7 and 8 cause the following warning to be printed to the console. It's caused because we're snapshoting a mounted ext2 filesystem which is not in a 'clean' state. This is to be expected since we have no guarentees about the on-disk consistency of the filesystem. EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended To silence the warning and preserve the intent of these test cases they have been updated to unmount the filesystem prior to snapshoting them. This ensures the ext2 filesystem is in a consistent state when the snapshot is taken. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #1972	2013-12-16 09:46:10 -08:00
Brian Behlendorf	ba6a24026c	Remove ZFC_IOC__MINOR ioctl()s Early versions of ZFS coordinated the creation and destruction of device minors from userspace. This was inherently racy and in late 2009 these ioctl()s were removed leaving everything up to the kernel. This significantly simplified the code. However, we never picked up these changes in ZoL since we'd already significantly adjusted this code for Linux. This patch aims to rectify that by finally removing ZFC_IOC__MINOR ioctl()s and moving all the functionality down in to the kernel. Since this cleanup will change the kernel/user ABI it's being done in the same tag as the previous libzfs_core ABI changes. This will minimize, but not eliminate, the disruption to end users. Once merged ZoL, Illumos, and FreeBSD will basically be back in sync in regards to handling ZVOLs in the common code. While each platform must have its own custom zvol.c implemenation the interfaces provided are consistent. NOTES: 1) This patch introduces one subtle change in behavior which could not be easily avoided. Prior to this change callers of 'zfs create -V ...' were guaranteed that upon exit the /dev/zvol/ block device link would be created or an error returned. That's no longer the case. The utilities will no longer block waiting for the symlink to be created. Callers are now responsible for blocking, this is why a 'udev_wait' call was added to the 'label' function in scripts/common.sh. 2) The read-only behavior of a ZVOL now solely depends on if the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant policy setting in the gendisk structure was removed. This both simplifies the code and allows us to safely leverage set_disk_ro() to issue a KOBJ_CHANGE uevent. See the comment in the code for futher details on this. 3) Because __zvol_create_minor() and zvol_alloc() may now be called in a sync task they must use KM_PUSHPAGE. References: illumos/illumos-gate@681d9761e8 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #1969	2013-12-16 09:15:57 -08:00
Brian Behlendorf	a35beedfb3	Add cstyle.pl utility and cstyle.1 man page Cstyle is the C source style checker used by Illumos. Since the original ZFS source was written using these style guidelines they must also be followed by ZoL for consistency. The checker has been added to the scripts directory and may be run on a per file basis. New patches should be careful to avoid introducing new style warnings. Additionally, the 'checkstyle' target has been added to the top level Makefile and can be used to check the entire source tree. While Zol has historically attempted to follow the SunOS style guide the lack of a rigorous style checker has allowed various warning to be introduced. Currently there are 2211 reported style violations and we want to gradually eliminate these from the tree. Note the cstyle.1 man page is provided under man/man1/cstyle.1 but since it is a developer utility it is not installed along with the other man pages. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-10-30 11:36:30 -07:00
Prakash Surya	37fd6e00a6	Add script to fix file names in upstream patches Added a simple sed script to do a search and replace on the Illumos ZFS file names and replace them with the ZFS on Linux equivalent. Example usage: # Replace Illumos paths with Linux paths $ ./scripts/zfs2zol-patch.sed arc.c.patch > arc.c.patch.linux # Ensure the script worked as expected $ diff arc.c.patch arc.c.patch.linux # Apply the patch using Linux paths $ patch -p1 < arc.c.patch.linux Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1679	2013-10-29 10:30:43 -07:00
Brian Behlendorf	cb79a4e8bb	Add kmod repo integration When the kmod packaging infrastructure was originally added the dependency on the rpmfusion yum repositories was disabled. This was done at the time in favour of getting local builds working. Now the time has come to conditionally re-enable that functionality so we can properly provide binary kmod packages. ./configure --with-config=srpm make SRPM_DEFINE_KMOD='--define="repo rpmfusion"' srpm-kmod mock rebuild zfs-kmod-x.y.z-r.el6.src.rpm One nice benefit of finishing this work is that the generic and fedora spl-kmod spec files can be merged again. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-08-01 09:48:07 -07:00
Brian Behlendorf	91604b298c	Open pools asynchronously after module load One of the side effects of calling zvol_create_minors() in zvol_init() is that all pools listed in the cache file will be opened. Depending on the state and contents of your pool this operation can take a considerable length of time. Doing this at load time is undesirable because the kernel is holding a global module lock. This prevents other modules from loading and can serialize an otherwise parallel boot process. Doing this after module inititialization also reduces the chances of accidentally introducing a race during module init. To ensure that /dev/zvol/<pool>/<dataset> devices are still automatically created after the module load completes a udev rules has been added. When udev notices that the /dev/zfs device has been create the 'zpool list' command will be run. This then will cause all the pools listed in the zpool.cache file to be opened. Because this process in now driven asynchronously by udev there is the risk of problems in downstream distributions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #756 Issue #1020 Issue #1234	2013-07-03 09:24:38 -07:00
Nathaniel Clark	389cf730ce	Make spl directory setable when building rpms and add --buildroot This adds ability to set the location of spl via defines when building from the spec files. This is useful for build systems that build spl and zfs together without installing the actual rpms. Signed-off-by: Nathaniel Clark <Nathaniel.Clark@misrule.us> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1486	2013-06-21 16:00:45 -07:00
Jan Engelhardt	4e95cc99b0	build: resolve orthographic and other grammatical errors Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-04-02 10:44:52 -07:00
Brian Behlendorf	0df23ca9a1	Provide ${kmodname}-devel-kmod for yum-builddep In order to ensure that yum-builddep pulls in all the build requirements a generic ${kmodname}-devel-kmod provides line is added. This allows a version of the development headers to be included without requiring knowledge of the kernel version. This is important because unlike rpmbuild which does correctly expand the source rpm spec file, yum-builddep does not. Without this generic provides line mock which relies on yum-builddep is unable to automatically satisfy the dependency. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-25 13:28:19 -07:00
Brian Behlendorf	352b557bd6	Use requested kernel for dkms builds The --with-linux and --with-linux-obj options must be specified as part of the dkms build otherwise the package will be built against the running kernel. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-21 12:51:06 -07:00
Brian Behlendorf	ae0f0ba950	Use BUILD_DEPENDS option for dkms builds Support was added to dkms so build dependencies can be specified. This allows us to ensure that the spl package will always be built before the zfs package. Those patches have not yet been merged upstream but they are available in the zfsonlinux/dkms repository. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-21 12:51:06 -07:00
Brian Behlendorf	f3757573a6	Refresh RPM packaging Refresh the existing RPM packaging to conform to the 'Fedora Packaging Guidelines'. This includes adopting the kmods2 packaging standard which is used fod kmods distributed by rpmfusion for Fedora/RHEL. http://fedoraproject.org/wiki/Packaging:Guidelines http://rpmfusion.org/Packaging/KernelModules/Kmods2 While the spec files have been entirely rewritten from a user perspective the only major changes are: * The Fedora packages now have a build dependency on the rpmfusion repositories. The generic kmod packages also have a new dependency on kmodtool-1.22 but it is bundled with the source rpm so no additional packages are needed. * The kernel binary module packages have been renamed from zfs-modules-* to kmod-zfs-* as specificed by kmods2. * The is now a common kmod-zfs-devel-* package in addition to the per-kernel devel packages. The common package contains the development headers while the per-kernel package contains kernel specific build products. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1341	2013-03-18 15:33:17 -07:00
Brian Behlendorf	48c028f5a5	Replace libexecdir with datadir According to the FHS. Testing scripts and examples which are all architecture independent should be installed in a subdirectory under /usr/share. http://www.pathname.com/fhs/2.2/fhs-4.11.html Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-06 15:46:40 -08:00
Eric Dillmann	0b4d1b5853	Add snapdev=[hidden\|visible] dataset property The new snapdev dataset property may be set to control the visibility of zvol snapshot devices. By default this value is set to 'hidden' which will prevent zvol snapshots from appearing under /dev/zvol/ and /dev/<dataset>/. When set to 'visible' all zvol snapshots for the dataset will be visible. This functionality was largely added because when automatic snapshoting is enabled large numbers of read-only zvol snapshots will be created. When creating these devices the kernel will attempt to read their partition tables, and blkid will attempt to identify any filesystems on those partitions. This leads to a variety of issues: 1) The zvol partition tables will be read in the context of the `modprobe zfs` for automatically imported pools. This is undesirable and should be done asynchronously, but for now reducing the number of visible devices helps. 2) Udev expects to be able to complete its work for a new block devices fairly quickly. When many zvol devices are added at the same time this is no longer be true. It can lead to udev timeouts and missing /dev/zvol links. 3) Simply having lots of devices in /dev/ can be aukward from a management standpoint. Hidding the devices your unlikely to ever use helps with this. Any snapshot device which is needed can be made visible by changing the snapdev property. NOTE: This patch changes the default behavior for zvols which was effectively 'snapdev=visible'. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1235 Closes #945 Issue #956 Issue #756	2013-03-05 12:37:54 -08:00
Brian Behlendorf	dbf763b39b	Retire zpool_id infrastructure In the interest of maintaining only one udev helper to give vdevs user friendly names, the zpool_id and zpool_layout infrastructure is being retired. They are superseded by vdev_id which incorporates all the previous functionality. Documentation for the new vdev_id(8) helper and its configuration file, vdev_id.conf(5), can be found in their respective man pages. Several useful example files are installed under /etc/zfs/. /etc/zfs/vdev_id.conf.alias.example /etc/zfs/vdev_id.conf.multipath.example /etc/zfs/vdev_id.conf.sas_direct.example /etc/zfs/vdev_id.conf.sas_switch.example Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #981	2013-01-29 12:23:17 -08:00
Brian Behlendorf	ff5b1c8065	Quiet mkfs.ext2 output The -q option should quiet the mkfs.ext2 output but certain versions of e2fsprogs appear to ignore it. This can result in an extra 'done' message in the test output. To keep this noise from distracting just direct stdout to /dev/null. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-28 15:36:02 -08:00
Brian Behlendorf	930b6fec21	Stop using /bin/ as a source in zconfig.sh Test 5, 6, 7, and 7 in zconfig.sh use /bin/ as a source of random directories and files for their test. This has lead to unexpected tests failures because the total size of /bin/ on the test system isn't checked and it is entirely possible for it to be larger than the target filesystem. To resolve this issue we create a somewhat random collection of files and directories in /var/tmp to use. On average we expect about 5MB of data with the worst case being 20MB. This is large enough to be interesting and small enough to always fit in the default test datasets. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1113	2013-01-28 14:51:26 -08:00
Brian Behlendorf	e528c9b461	Fix zconfig.sh partitioning error Parted version 3.0 doesn't allow us to specify the start and end percentages as 50% and 100% respectively. This results in: Error: The location 100% is outside the device /dev/zd0 Therefore we change the syntax to 51% and -1 for end of device. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-24 13:57:16 -08:00
Brian Behlendorf	563103decd	Fix test script error codes The 'exit $?' command in the INT TERM EXIT trap was overwritting the expected error code with the error code from mv. Fix the issue by removing the 'exit $?'. It's important the we preserve the original error code so failures are easily noticed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-24 13:53:12 -08:00
Turbo Fredriksson	645fb9cc21	Implemented sharing datasets via SMB using libshare Add the initial support for the 'smbshare' option using the existing libshare infrastructure. Because this implementation relies on usershares samba version 3.0.23 is required. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #493	2012-12-03 09:42:15 -08:00
Andrew Reid	6cb7ab069d	Do not return /dev/loop-control in unused_loop_device The function unused_loop_device in /usr/libexec/zfs/common.sh returns /dev/loop-control on the first call. This device is NOT a loop device (https://github.com/torvalds/linux/commit/770fe30) it is a control device. This in turn causes the script zconfig.sh to fail with: zpool-create.sh: Error 1 creating /tmp/zpool-vdev0 -> /dev/loop-control loopback The patch makes the function return /dev/loop[0-9]* which are loop devices. Signed-off-by: Andrew Reid <ColdCanuck@nailedtotheperch.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #797	2012-10-15 10:02:42 -07:00
Etienne Dechamps	2f342404c1	Force 4K blocksize when testing ext2 on zvol. Currently, mkfs.ext2 on zconfig.sh zvols tries to use a 8K blocksize, probably because by default zvol exposes an optimal I/O size of 8K. Unfortunately, a ext2 blocksize of 8K is not supported by the kernel, so the resulting filesystem is unmountable. This patch fixes the issue by making sure the blocksize is 4K. We have to use -F to force it else mkfs.ext2 won't allow us to use a blocksize smaller than the optimal I/O size. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #979	2012-10-03 10:52:51 -07:00
Brian Behlendorf	ca8b5af89d	Remove autotools products Remove all of the generated autotools products from the repository and update the .gitignore files accordingly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #718	2012-08-27 11:47:44 -07:00
Etienne Dechamps	ee5fd0bb80	Set zvol discard_granularity to the volblocksize. Currently, zvols have a discard granularity set to 0, which suggests to the upper layer that discard requests of arbirarily small size and alignment can be made efficiently. In practice however, ZFS does not handle unaligned discard requests efficiently: indeed, it is unable to free a part of a block. It will write zeros to the specified range instead, which is both useless and inefficient (see dnode_free_range). With this patch, zvol block devices expose volblocksize as their discard granularity, so the upper layer is aware that it's not supposed to send discard requests smaller than volblocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #862	2012-08-07 14:55:31 -07:00
Richard Yao	739a1a82e0	Linux 3.5 compat, end_writeback() changed to clear_inode() The end_writeback() function was changed by moving the call to inode_sync_wait() earlier in to evict(). This effecitvely changes the ordering of the sync but it does not impact the details of the zfs implementation. However, as part of this change end_writeback() was renamed to clear_inode() to reflect the new semantics. This change does impact us and clear_inode() now maps to end_writeback() for kernels prior to 3.5. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #784	2012-07-23 12:29:36 -07:00
Richard Yao	ea1fdf46e2	Linux 3.5 compat, iops->truncate_range() removed The vmtruncate_range() support has been removed from the kernel in favor of using the fallocate method in the file_operations table. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:32 -07:00
Richard Yao	756c3e5a9c	Linux 3.5 compat, eops->encode_fh() takes inodes The export_operations member ->encode_fh() has been updated to take both the child and parent inodes. This interface used to take the child dentry and a bool describing if the parent is needed. NOTE: While updating this code I noticed that we do not currently cleanly handle the case where we're passed a connectable parent. This code should be audited to make sure we're doing the right thing. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:23 -07:00
Etienne Dechamps	b5a28807cd	Move partition scanning from userspace to module. Currently, zpool online -e (dynamic vdev expansion) doesn't work on whole disks because we're invoking ioctl(BLKRRPART) from userspace while ZFS still has a partition open on the disk, which results in EBUSY. This patch moves the BLKRRPART invocation from the zpool utility to the module. Specifically, this is done just before opening the device in vdev_disk_open() which is called inside vdev_reopen(). This requires jumping through some hoops to get to the disk device from the partition device, and to make sure we can still open the partition after the BLKRRPART call. Note that this new code path is triggered on dynamic vdev expansion only; other actions, like creating a new pool, are unchanged and still call BLKRRPART from userspace. This change also depends on API changes which are available in 2.6.37 and latter kernels. The build system has been updated to detect this, but there is no compatibility mode for older kernels. This means that online expansion will NOT be available in older kernels. However, it will still be possible to expand the vdev offline. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #808	2012-07-17 09:17:31 -07:00
Richard Yao	6a0936babc	Linux 3.4 compat, d_make_root() replaces d_alloc_root() torvalds/linux@adc0e91ab1 introduced introduced d_make_root() as a replacement for d_alloc_root(). Further commits appear to have removed d_alloc_root() from the Linux source tree. This causes the following failure: error: implicit declaration of function 'd_alloc_root' [-Werror=implicit-function-declaration] To correct this we update the code to use the current d_make_root() interface for readability. Then we introduce an autotools check to determine if d_make_root() is available. If it isn't then we define some compatibility logic which used the older d_alloc_root() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #776	2012-06-11 10:04:49 -07:00
Brian Behlendorf	b39d3b9f7b	Linux 3.3 compat, iops->create()/mkdir()/mknod() The mode argument of iops->create()/mkdir()/mknod() was changed from an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf check was added to detect the API change and then correctly set a zpl_umode_t typedef. There is no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #701	2012-04-30 12:52:38 -07:00
Brian Behlendorf	1c5de20ae2	Add --enable-debug-dmu-tx configure option Allow rigorous (and expensive) tx validation to be enabled/disabled indepentantly from the standard zfs debugging. When enabled these checks ensure that all txs are constructed properly and that a dbuf is never dirtied without taking the correct tx hold. This checking is particularly helpful when adding new dmu consumers like Lustre. However, for established consumers such as the zpl with no known outstanding tx construction problems this is just overhead. --enable-debug-dmu-tx - Enable/disable validation of each tx as --disable-debug-dmu-tx it is constructed. By default validation is disabled due to performance concerns. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-23 12:25:17 -07:00
Brian Behlendorf	ebe7e575ea	Add .zfs control directory Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as posible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations. ) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches. ) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished. * mkdir - create a snapshot * rmdir - destroy a snapshot * mv - rename a snapshot The following issues are known defeciences, but we expect them to be addressed by future commits. ) Add automount support for kernels older the 2.6.37. This should be possible using follow_link() which is what Linux did before. ) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the ground work for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server. *) The .zfs/shares directory exists but no futher smb functionality has yet been implemented. Contributions-by: Rohan Puri <rohan.puri15@gmail.com> Contributiobs-by: Andrew Barnes <barnes333@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #173	2012-03-22 13:03:47 -07:00
Brian Behlendorf	4b787d75c8	Cleanly support debug packages Allow a source rpm to be rebuilt with debugging enabled. This avoids the need to have to manually modify the spec file. By default debugging is still largely disabled. To enable specific debugging features use the following options with rpmbuild. '--with debug' - Enables ASSERTs # For example: $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm Additionally, ZFS_CONFIG has been added to zfs_config.h for packages which build against these headers. This is critical to ensure both zfs and the dependant package are using the same prototype and structure definitions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 14:08:17 -08:00
Etienne Dechamps	30930fba21	Add support for DISCARD to ZVOLs. DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning. It allows ZVOL clients to discard (unmap, trim) block ranges from a ZVOL, thus optimizing disk space usage by allowing a ZVOL to shrink instead of just grow. We can't use zfs_space() or zfs_freesp() here, since these functions only work on regular files, not volumes. Fortunately we can use the low-level function dmu_free_long_range() which does exactly what we want. Currently the discard operation is not added to the log. That's not a big deal since losing discard requests cannot result in data corruption. It would however result in disk space usage higher than it should be. Thus adding log support to zvol_discard() is probably a good idea for a future improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 16:19:38 -08:00
Etienne Dechamps	cb2d19010d	Support the fallocate() file operation. Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is supported, since it's the only one that matches the behavior of zfs_space(). This makes it pretty much useless in its current form, but it's a start. To support other flag combinations we would need to modify zfs_space() to make it more flexible, or emulate the desired functionality in zpl_fallocate(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 16:19:32 -08:00
Brian Behlendorf	93648f314c	Fix zconfig.sh non-optimal alignment The recent zvol improvements have changed default suggested alignment for zvols from 512b (default) to 8k (zvol blocksize). Because of this the zconfig.sh tests which create paritions are now generating a warning about non-optimal alignments. This change updates the need zconfig.sh tests such that a partition will be properly aligned. In the process, it shifts from using the sfdisk utility to the parted utility to create partitions. It also moves the creation of labels, partitions, and filesystems in to generic functions in common.sh.in. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 13:23:28 -08:00
Etienne Dechamps	34037afe24	Improve ZVOL queue behavior. The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives, and are not optimized for ZVOLs. Proper configuration of these options would allow upper layers (I/O scheduler) to take better decisions about write merging and ordering. Detailed rationale: - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide. - max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide. - physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow. - The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Etienne Dechamps	b18019d2d8	Fix synchronicity for ZVOLs. zvol_write() assumes that the write request must be written to stable storage if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed, "sync" does not mean what we think it means in the context of the Linux block layer. This is well explained in linux/fs.h: WRITE: A normal async write. Device will be plugged. WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down the hint that someone will be waiting on this IO shortly. WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush. WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on non-volatile media on completion. In other words, SYNC does not mean that the write must be on stable storage on completion. It just means that someone is waiting on us to complete the write request. Thus triggering a ZIL commit for each SYNC write request on a ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL users have no way to express that they actually want data to be written to stable storage, which means the ZIL is broken for ZVOLs. The request for stable storage is expressed by the FUA flag, so we must commit the ZIL after the write if the FUA flag is set. In addition, we must commit the ZIL before the write if the FLUSH flag is set. Also, we must inform the block layer that we actually support FLUSH and FUA. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Brian Behlendorf	47621f3d76	Linux 3.3 compat, sops->show_options() The second argument of sops->show_options() was changed from a 'struct vfsmount ' to a 'struct dentry '. Add an autoconf check to detect the API change and then conditionally define the expected interface. In either case we are only interested in the zfs_sb_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #549	2012-02-03 10:02:01 -08:00
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Prakash Surya	6ba3b44614	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be build using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, it's MD5 hash signature will be continually influx. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2011-12-14 19:14:23 -08:00
Brian Behlendorf	5547c2f1bf	Simplify BDI integration Update the code to use the bdi_setup_and_register() helper to simplify the bdi integration code. The updated code now just registers the bdi during mount and destroys it during unmount. The only complication is that for 2.6.32 - 2.6.33 kernels the helper wasn't available so in these cases the zfs code must provide it. Luckily the bdi_setup_and_register() function is trivial. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #367	2011-11-08 10:19:03 -08:00
Brian Behlendorf	5cbf6db937	Disable 90-zfs.rules for test suite When running the zconfig.sh, zpios-sanity.sh, and zfault.sh from the installed packages the 90-zfs.rules can cause failures. These will occur because the test suite assumes it has full control over loading/unloading the module stack. If the stack gets asynchronously loaded by the udev rule the test suite will treat it as a failure. Resolve the issue by disabling the offending rule during the tests and enabling it on exit. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-10-11 14:45:37 -07:00
Brian Behlendorf	de0a1c099b	Autogen refresh for udev changes Run autogen.sh using the same autotools versions as upstream: * autoconf-2.63 * automake-1.11.1 * libtool-2.2.6b	2011-08-08 16:30:27 -07:00
Brian Behlendorf	76659dc110	Add backing_device_info per-filesystem For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174	2011-08-04 13:37:38 -07:00
Kyle Fuller	615ab66d18	Provide a rc.d script for archlinux Unlike most other Linux distributions archlinux installs its init scripts in /etc/rc.d insead of /etc/init.d. This commit provides an archlinux rc.d script for zfs and extends the build infrastructure to ensure it get's installed in the correct place. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #322	2011-07-11 14:12:23 -07:00
Brian Behlendorf	2cf7f52bc4	Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231	2011-07-01 13:36:39 -07:00
Brian Behlendorf	5c03efc379	Linux compat 2.6.39: security_inode_init_security() The security_inode_init_security() function now takes an additional qstr argument which must be passed in from the dentry if available. Passing a NULL is safe when no qstr is available the relevant security checks will just be skipped. Closes #246 Closes #217 Closes #187	2011-07-01 12:40:08 -07:00
Prasad Joshi	b312979252	Tear down and flush the mmap region The inode eviction should unmap the pages associated with the inode. These pages should also be flushed to disk to avoid the data loss. Therefore, use truncate_setsize() in evict_inode() to release the pagecache. The API truncate_setsize() was added in 2.6.35 kernel. To ensure compatibility with the old kernel, the patch defines its own truncate_setsize function. Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com> Closes #255	2011-06-27 09:59:19 -07:00
Brian Behlendorf	2e08aedba4	Always check -Wno-unused-but-set-variable gcc support The previous commit `8a7e1ceefa` wasn't quite right. This check applies to both the user and kernel space build and as such we must make sure it runs regardless of what the --with-config option is set too. For example, if --with-config=kernel then the autoconf test does not run and we generate build warnings when compiling the kernel packages.	2011-06-14 16:40:35 -07:00
Brian Behlendorf	8a7e1ceefa	Check for -Wno-unused-but-set-variable gcc support Gcc versions 4.3.2 and earlier do not support the compiler flag -Wno-unused-but-set-variable. This can lead to build failures on older Linux platforms such as Debian Lenny. Since this is an optional build argument this changes add a new autoconf check for the option. If it is supported by the installed version of gcc then it is used otherwise it is omited. See commit's `12c1acde76` and `79713039a2` for the reason the -Wno-unused-but-set-variable options was originally added.	2011-06-14 14:43:22 -07:00
Brian Behlendorf	10715a0187	Add default stack checking When your kernel is built with kernel stack tracing enabled and you have the debugfs filesystem mounted. Then the zfs.sh script will clear the worst observed kernel stack depth on module load and check the worst case usage on module removal. If the stack depth ever exceeds 7000 bytes the full stack will be printed for debugging. This is dangerously close to overrunning the default 8k stack. This additional advisory debugging is particularly valuable when running the regression tests on a kernel built with 16k stacks. In this case, almost no matter how bad the stack overrun is you will see be able to get a clean stack trace for debugging. Since the worst case stack usage can be highly variable it's helpful to always check the worst case usage.	2011-06-13 13:50:21 -07:00
Brian Behlendorf	da88a7fbe8	Pass -f option for import If a pool was not cleanly exported passing the -f flag may be required at 'zpool import' time. Since this test is simply validating that the pool can be successfully imported in the absense of the cache file always pass the -f to ensure it succeeds. This failure was observed under RHEL6.1.	2011-06-10 11:21:31 -07:00
Brian Behlendorf	cbc6fab65c	Sanatize zpios-sanity.sh environment Just like zconfig.sh the zpios-sanity.sh tests should run in a sanatized environment. This ensures they never conflict with an installed /etc/zfs/zpool.cache file. This commit additionally improves the -c cleanup option. It now removes the modules stack if loaded and destroys relevant md devices. This behavior is now identical to zconfig.sh.	2011-06-03 15:08:49 -07:00
Brian Behlendorf	608860b6d0	Delay before destroying loopback devices Generally I don't approve of just adding an arbitrary delay to avoid a problem but in this case I'm going to let it slide. We may need to delay briefly after 'zpool destroy' returns to ensure the loopback devices are closed. If they aren't closed than losetup -d will not be able to destroy them. Unfortunately, there's no easy state the check so we'll have to make due with a simple delay.	2011-06-03 14:38:25 -07:00
Brian Behlendorf	36391312af	Always unload zpios.ko on exit We should always unload zpios.ko on exit. This ensures that subsequent calls to 'zfs.sh -u' from other utilities will be able to unload the module stack and properly cleanup. This is important for the the --cleanup option which can be passed to zconfig.sh and zfault.sh.	2011-06-02 10:25:35 -07:00
Brian Behlendorf	2ea9dc40f8	Fix zpios-sanity.sh return code The zpios-sanity.sh script should return failure when any of the individual zpios.sh tests fail. The previous code would always return success suppressing real failures.	2011-06-02 10:13:15 -07:00
Brian Behlendorf	df554c148e	Fix 'zfs set volsize=N pool/dataset' This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg. The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels. Online ext3 resize test case passes on 2.6.28+ kernels: $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023 $ zpool create tank /tmp/zvol $ zfs create -V 500M tank/zd0 $ mkfs.ext3 /dev/zd0 $ mkdir /mnt/zd0 $ mount /dev/zd0 /mnt/zd0 $ df -h /mnt/zd0 $ zfs set volsize=800M tank/zd0 $ resize2fs /dev/zd0 $ df -h /mnt/zd0 Original-patch-by: Fajar A. Nugraha <github@fajar.net> Closes #68 Closes #84	2011-05-02 08:54:40 -07:00
Gunnar Beutner	055656d4f4	Implemented NFS export_operations. Implemented the required NFS operations for exporting ZFS datasets using the in-kernel NFS daemon.	2011-04-29 12:36:13 -07:00
Brian Behlendorf	3fce1d0962	Update zconfig.sh to use new zvol names This change should have occured when we commited the new udev rules for zvols. Basically, the test script is just out of date. We need to update it to use the /dev/zvol/ device names, and to expect the more common -partN suffixes. I added a udev_trigger() call in zconfig_partition() and zconfig_zvol_device_stat() to ensure that all the udev rules have run before. This ensures the devices are available to subsequent commands and closes a small race. Finally, I was forced added a small 'sleep 1' to test 10. I was observing occassional failures in my VM due to the device still claiming to be busy. Delaying betwen the various methods of adding/removing a vdev avoids the issue. Closes #207	2011-04-19 16:33:41 -07:00
Ned Bass	fa417e57a6	Call udevadm trigger more safely Some udev hooks are not designed to be idempotent, so calling udevadm trigger outside of the distribution's initialization scripts can have unexpected (and potentially dangerous) side effects. For example, the system time may change or devices may appear multiple times. See Ubuntu launchpad bug 320200 and this mailing list post for more details: https://lists.ubuntu.com/archives/ubuntu-devel/2009-January/027260.html To avoid these problems we call udevadm trigger with --action=change --subsystem-match=block. The first argument tells udev just to refresh devices, and make sure everything's as it should be. The second argument limits the scope to block devices, so devices belonging to other subsystems cannot be affected. This doesn't fix the problem on older udev implementations that don't provide udevadm but instead have udevtrigger as a standalone program. In this case the above options aren't available so there's no way to call call udevtrigger safely. But we can live with that since this issue only exists in optional test and helper scripts, and most zfs-on-linux users are running newer systems anyways.	2011-04-05 13:00:51 -07:00
Brian Behlendorf	bdf4328b04	Linux 2.6.28 compat, insert_inode_locked() Added insert_inode_locked() helper function, prior to this most callers used insert_inode_hash(). The older method doesn't check for collisions in the inode_hashtable but it still acceptible for use. Fallback to using insert_inode_hash() when insert_inode_locked() is unavailable.	2011-03-22 12:15:54 -07:00
Brian Behlendorf	01c0e61da0	Add init scripts To support automatically mounting your zfs on filesystem on boot a basic init script is needed. Unfortunately, every distribution has their own idea of the _right_ way to do things. Rather than write one very complicated portable init script, which would be invariably replaced by the distributions own anyway. I have instead added support to provide multiple distribution specific init scripts. The correct init script for your distribution will be selected by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT. During 'make install' the correct script for your system will be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the usual /etc/init.d/zfs location. Currently, there is zfs.fedora and a more generic zfs.lsb init script. Hopefully, the distribution maintainers who know best how they want their init scripts to function will feedback their approved versions to be included in the project. This change does not consider upstart jobs but I'm not at all opposed to add that sort of thing.	2011-03-17 16:51:54 -07:00
Brian Behlendorf	45066d1f20	Linux 2.6.38 compat, blkdev_get_by_path() The open_bdev_exclusive() function has been replaced (again) by the more generic blkdev_get_by_path() function. Additionally, the counterpart function close_bdev_exclusive() has been replaced by blkdev_put(). Because these functions are more generic versions of the functions they replaced the compatibility macro must add the FMODE_EXCL mask to ensure they are exclusive. Closes #114	2011-02-23 12:29:38 -08:00
Brian Behlendorf	b9f6a49025	Update 'zfs.sh -u' to umount all zfs filesystems Before it is safe to unload the zfs module stack all mounted zfs filesystems must be unmounted. If they are not unmounted, there will be references held on the modules and the stack cannot be removed. To handle this have 'zfs.sh -u' which is used by all of the test scripts umount all zfs filesystem before attempting to unload the module stack.	2011-02-16 11:10:31 -08:00
Brian Behlendorf	2c395def27	Linux 2.6.36 compat, sops->evict_inode() The new prefered inteface for evicting an inode from the inode cache is the ->evict_inode() callback. It replaces both the ->delete_inode() and ->clear_inode() callbacks which were previously used for this.	2011-02-11 13:47:51 -08:00
Brian Behlendorf	7268e1bec8	Linux 2.6.35 compat, fops->fsync() The fsync() callback in the file_operations structure used to take 3 arguments. The callback now only takes 2 arguments because the dentry argument was determined to be unused by all consumers. To handle this a compatibility prototype was added to ensure the right prototype is used. Our implementation never used the dentry argument either so it's just a matter of using the right prototype.	2011-02-11 09:05:51 -08:00
Brian Behlendorf	777d4af891	Linux 2.6.35 compat, const struct xattr_handler The const keyword was added to the 'struct xattr_handler' in the generic Linux super_block structure. To handle this we define an appropriate xattr_handler_t typedef which can be used. This was the preferred solution because it keeps the code clean and readable.	2011-02-10 16:29:00 -08:00
Brian Behlendorf	c5d915f423	Minimal libshare infrastructure ZFS even under Solaris does not strictly require libshare to be available. The current implementation attempts to dlopen() the library to access the needed symbols. If this fails libshare support is simply disabled. This means that on Linux we only need the most minimal libshare implementation. In fact just enough to prevent the build from failing. Longer term we can decide if we want to implement a libshare library like Solaris. At best this would be an abstraction layer between ZFS and NFS/SMB. Alternately, we can drop libshare entirely and directly integrate ZFS with Linux's NFS/SMB. Finally the bare bones user-libshare.m4 test was dropped. If we do decide to implement libshare at some point it will surely be as part of this package so the check is not needed.	2011-02-04 16:14:29 -08:00
Brian Behlendorf	b3259b6a2b	Autoconf selinux support If libselinux is detected on your system at configure time link against it. This allows us to use a library call to detect if selinux is enabled and if it is to pass the mount option: "context=\"system_u:object_r:file_t:s0" For now this is required because none of the existing selinux policies are aware of the zfs filesystem type. Because of this they do not properly enable xattr based labeling even though zfs supports all of the required hooks. Until distro's add zfs as a known xattr friendly fs type we must use mntpoint labeling. Alternately, end users could modify their existing selinux policy with a little guidance.	2011-01-28 12:45:19 -08:00
Ned Bass	31165fd9aa	Remove partition from vdev name in zfault.sh As of the 0.5.2 tag, names of whole-disk vdevs must be specified to the command line tools without partition identifiers. This commit fixes a 'zpool online' command in zfault.sh that incorrectly includes he partition in the vdev name, causing test 9 to fail. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-29 10:53:53 -08:00
Brian Behlendorf	e0f3df67e5	Add '-ts' options to zconfig.sh/zfault.sh usage When adding this functionality originally the options to only run specific tests (-t), or conversely skip specific tests (-s) were omitted from the usage page. This commit adds the missing documentation.	2010-11-11 11:40:06 -08:00
Brian Behlendorf	7dc3830c0f	Remove spl/zfs modules as part of cleanup The idea behind the '-c' flag is to cleanup everything from a previous test run which might cause the test script to fail. This should also include removing the previously loaded module. This makes it a little easier to run 'zconfig.sh -c', however remember this is a test script and it will take all of your other zpools offline for the purposes of the test. This notion has also been extended to the default 'make check' behavior.	2010-11-11 11:40:06 -08:00
Brian Behlendorf	cf47fad67d	Unconditionally load core kernel modules Loading and unloading the zlib modules as part of the zfs.sh script has proven a little problematic for a few reasons. * First, your kernel may not need to load either zlib_inflate or zlib_deflate. This functionality may be built directly in to your kernel. It depends entirely on what your distribution decided was the right thing to do. * Second, even if you do manage to load the correct modules you may not be able to unload them. There may other consumers of the modules with a reference preventing the unload. To avoid both of these issues the test scripts have been updated to attempt to unconditionally load all modules listed in KERNEL_MODULES. If the module is successfully loaded you must have needed it. If the module can't be loaded that almost certainly means either it is built in to your kernel or is already being used by another consumer. In both cases this is not an issue and we can move on to the spl/zfs modules. Finally, by removing these kernel modules from the MODULES list we ensure they are never unloaded during 'zfs.sh -u'. This avoids the issue of the script failing because there is another consumer using the module we were not aware of. In other words the script restricts unloading modules to only the spl/zfs modules. Closes #78	2010-11-11 11:38:25 -08:00
Brian Behlendorf	8c3ab23f4b	Add lustre zpios-test workload The lustre zpios-test simulates a reasonable lustre workload. It will create 128 threads, the same as a Lustre OSS, and then 4096 individual objects. Each objects is 16MiB in size and will be written/read in 1MiB from a random thread. This is fundamentally how we expect Lustre to behave for large IO intensive workloads.	2010-11-08 14:03:36 -08:00
Brian Behlendorf	cb39a6c6aa	Replace custom zpool configs with generic configs To streamline testing I have in the past added several custom configs to the zpool-config directory. This change reverts those custom configs and replaces them with three generic config which can do the same thing. The generic config behavior can be set by setting various environment variables when calling either the zpool-create.sh or zpios.sh scripts. For example if you wanted to create and test a single 4-disk Raid-Z2 configuration using disks [A-D]1 with dedicated ZIL and L2ARC devices you could run the following. $ ZIL="log A2" L2ARC="cache B2" RANKS=1 CHANNELS=4 LEVEL=2 \ zpool-create.sh -c zpool-raidz $ zpool status tank pool: tank state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 A1 ONLINE 0 0 0 B1 ONLINE 0 0 0 C1 ONLINE 0 0 0 D1 ONLINE 0 0 0 logs A2 ONLINE 0 0 0 cache B2 ONLINE 0 0 0 errors: No known data errors	2010-11-08 14:03:36 -08:00
Ned Bass	d4055aac3c	Add zconfig test for adding and removing vdevs This test performs a sanity check of the zpool add and remove commands. It tests adding and removing both a cache disk and a log disk to and from a zpool. Usage of both a shorthand device path and a full path is covered. The test uses a scsi_debug device as the disk to be added and removed. This is done so that zpool will see it as a whole disk and partition it, which it does not currently done for loopback devices. We want to verify that the manipulation done to whole disks paths to hide the parition information does not break the add/remove interface. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:41:57 -07:00
Brian Behlendorf	0ee8118bd3	Add zfault zpool configurations and tests Eleven new zpool configurations were added to allow testing of various failure cases. The first 5 zpool configurations leverage the 'faulty' md device type which allow us to simuluate IO errors at the block layer. The last 6 zpool configurations leverage the scsi_debug module provided by modern kernels. This device allows you to create virtual scsi devices which are backed by a ram disk. With this setup we can verify the full IO stack by injecting faults at the lowest layer. Both methods of fault injection are important to verifying the IO stack. The zfs code itself also provides a mechanism for error injection via the zinject command line tool. While we should also take advantage of this appraoch to validate the code it does not address any of the Linux integration issues which are the most concerning. For the moment we're trusting that the upstream Solaris guys are running zinject and would have caught internal zfs logic errors. Currently, there are 6 r/w test cases layered on top of the 'faulty' md devices. They include 3 writes tests for soft/transient errors, hard/permenant errors, and all writes error to the device. There are 3 matching read tests for soft/transient errors, hard/permenant errors, and fixable read error with a write. Although for this last case zfs doesn't do anything special. The seventh test case verifies zfs detects and corrects checksum errors. In this case one of the drives is extensively damaged and by dd'ing over large sections of it. We then ensure zfs logs the issue and correctly rebuilds the damage. The next test cases use the scsi_debug configuration to injects error at the bottom of the scsi stack. This ensures we find any flaws in the scsi midlayer or our usage of it. Plus it stresses the device specific retry, timeout, and error handling outside of zfs's control. The eighth test case is to verify that the system correctly handles an intermittent device timeout. Here the scsi_debug device drops 1 in N requests resulting in a retry either at the block level. The ZFS code does specify the FAILFAST option but it turns out that for this case the Linux IO stack with still retry the command. The FAILFAST logic located in scsi_noretry_cmd() does no seem to apply to the simply timeout case. It appears to be more targeted to specific device or transport errors from the lower layers. The ninth test case handles a persistent failure in which the device is removed from the system by Linux. The test verifies that the failure is detected, the device is made unavailable, and then can be successfully re-add when brought back online. Additionally, it ensures that errors and events are logged to the correct places and the no data corruption has occured due to the failure.	2010-10-12 15:20:03 -07:00
Brian Behlendorf	2959d94a0a	Add FAILFAST support ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice is turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set it you may get retries in the low level drivers decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernels patchs there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before a ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	12f3012974	Add missing Makefile.in from zpool_layout commit The scripts/zpool-layout/Makefile.in file generated by autogen.sh was accidentally omitted from the previous commit. Add it.	2010-09-17 16:24:15 -07:00
Brian Behlendorf	a5b4d63582	Add [-m map] option to zpool_layout By default the zpool_layout command would always use the slot number assigned by Linux when generating the zdev.conf file. This is a reasonable default there are cases when it makes sense to remap the slot id assigned by Linux using your own custom mapping. This commit adds support to zpool_layout to provide a custom slot mapping file. The file contains in the first column the Linux slot it and in the second column the custom slot mapping. By passing this map file with '-m map' to zpool_config the mapping will be applied when generating zdev.conf. Additionally, two sample mapping have been added which reflect different ways to map the slots in the dragon drawers.	2010-09-17 11:02:19 -07:00
Brian Behlendorf	2c4834f87a	Wait up to timeout seconds for udev device Occasional failures were observed in zconfig.sh because udev could be delayed for a few seconds. To handle this the wait_udev function has been added to wait for timeout seconds for an expected device before returning an error. By default callers currently use a 30 seconds timeout which should be much longer than udev ever needs but not so long to worry the test suite is hung.	2010-09-11 20:54:41 -07:00
Brian Behlendorf	ac063c48ae	Reduce volume size in zconfig.sh Due to occasional ENOSPC failures on certain platforms I've reduced the size of the ZVOL from 400M to 300M for the zvol+ext2 clone tests.	2010-09-10 21:35:27 -07:00
Brian Behlendorf	6283f55ea1	Support custom build directories and move includes One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual incude/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.	2010-09-08 12:38:56 -07:00
Brian Behlendorf	9691eb9fee	Remove scripts/common.sh This script is now dynamically generated at configure time from scripts/common.sh.in. This change was made by commit `26e61dd074` but we accidentally kept the common.sh file around. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-09-01 13:29:50 -07:00
Brian Behlendorf	e70e591c51	Add initial autoconf products Add the initial products from autogen.sh. These products will be updated incrementally after this point as development occurs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:02 -07:00
Brian Behlendorf	302ef1517e	Add linux zpios support Linux kernel implementation of PIOS test app. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:01 -07:00
Brian Behlendorf	325f023544	Add linux kernel device support This branch contains the majority of the changes required to cleanly intergrate with Linux style special devices (/dev/zfs). Mainly this means dropping all the Solaris style callbacks and replacing them with the Linux equivilants. This patch also adds the onexit infrastructure needed to track some minimal state between ioctls. Under Linux it would be easy to do this simply using the file->private_data. But under Solaris they apparent need to pass the file descriptor as part of the ioctl data and then perform a lookup in the kernel. Once again to keep code change to a minimum I've implemented the Solaris solution. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:50 -07:00
Brian Behlendorf	c9c0d073da	Add build system Add autoconf style build infrastructure to the ZFS tree. This includes autogen.sh, configure.ac, m4 macros, some scripts/*, and makefiles for all the core ZFS components.	2010-08-31 13:41:27 -07:00
Brian Behlendorf	428870ff73	Update core ZFS code from build 121 to build 141.	2010-05-28 13:45:14 -07:00
Brian Behlendorf	3affbe6d7e	Update nvpair's to include nv_alloc_fixed support	2010-04-29 11:59:41 -07:00
Brian Behlendorf	fa42225a3d	Add Solaris FMA style support	2010-04-29 10:37:15 -07:00
Brian Behlendorf	414f1f975e	Rename update-zfs.sh -> zfs-update.sh for consistency	2010-03-11 09:53:59 -08:00
Brian Behlendorf	058ac9ba78	Pull in latest man pages as part of update-zfs.sh The script has been updated to download the latest documentations packages for Solaris and extract the needed ZFS man pages. These will still need a little markup to handle changes between the Solaris and Linux versions of ZFS. Howver, they should be pretty minor I've tried hard to keep the interface the same. In additional to the script update the zdb, zfs, and zpool man pages have been added to the repo.	2009-12-11 16:15:33 -08:00
Brian Behlendorf	0aa61e8427	Remove zvol.c when updating in update-zfs.sh Linux version available.	2009-11-15 16:20:01 -08:00
Brian Behlendorf	5c36312909	Script update-zfs.sh updated to include libefi library	2009-10-09 15:37:29 -07:00
Brian Behlendorf	42bcb36c89	Add unicode library	2009-01-05 12:03:23 -08:00
Brian Behlendorf	36b849fa51	Remove zdump, it's an unrelateds command which I added simply due to the z* command convention	2009-01-05 11:10:13 -08:00
Brian Behlendorf	810db7e0a2	Remove zcommon reference merged in to zpool	2008-12-12 13:41:20 -08:00
Brian Behlendorf	6b2c60acca	Moving lib/libspl to linux-libspl branch	2008-12-11 15:38:59 -08:00
Brian Behlendorf	a4076c7544	Script updates	2008-12-11 14:21:14 -08:00
Brian Behlendorf	c4911ece24	Move library files to lib	2008-12-11 14:16:55 -08:00
Brian Behlendorf	4b7ee081ce	Fix typo	2008-12-11 11:16:38 -08:00
Brian Behlendorf	77755a5771	Add a few missing files	2008-12-11 11:14:49 -08:00
Brian Behlendorf	172bb4bd5e	Move the world out of /zfs/ and seperate out module build tree	2008-12-11 11:08:09 -08:00
Brian Behlendorf	9e8b1e836c	Remove libumem, we will try and remove this dependency entirely. If we can't then the best move will simply be to use the official library, or build it as a convenience library	2008-12-10 12:43:20 -08:00
Brian Behlendorf	5e97ed8493	Move vmem* to libumem	2008-12-09 14:14:00 -08:00
Brian Behlendorf	48343be6a3	Temporarily move taskq+util to libzpool until that directory is broken in to lib+module	2008-12-09 13:32:01 -08:00
Brian Behlendorf	2f40ac4d9e	Minor tweak to update script	2008-12-08 16:38:46 -08:00
Brian Behlendorf	96072c88e2	Add userspace zfs_context file	2008-12-03 15:43:56 -08:00
Brian Behlendorf	b128c09fbe	Rebase to OpenSolaris b103, in the process we are removing any code which did not originate from the OpenSolaris source. These changes will be reintroduced in topic branches for easier tracking	2008-12-03 12:09:06 -08:00
Brian Behlendorf	7ebbc0c799	Finish removing all non-upstream bits from master	2008-12-01 16:15:29 -08:00
Brian Behlendorf	ef76e2f5ea	Removed build system from master branch, will relocate to linux-zfs-branch	2008-12-01 15:41:33 -08:00
Brian Behlendorf	100eb88b46	Update zpios for trivial workload	2008-11-26 15:48:14 -08:00
Brian Behlendorf	34dc7c2f25	Initial Linux ZFS GIT Repo	2008-11-20 12:01:55 -08:00

1 2 3 4 5

238 Commits