PaX/GrSecurity patched kernels implement a dialect of C that relies on a
GCC plugin for enforcement. A basic idea in this dialect is that
function pointers in structures should not change during runtime.
This causes code that modifies function pointers at runtime to fail to
compile in many instances. The autotools checks rely on whether or
not small test cases compile against a given kernel. Some
autotools checks assume some default case if other cases fail. When one
of these autotools checks tests a PaX/GrSecurity patched kernel by
modifying a function pointer at runtime, the default case will be used.
Early detection of such situations is possible by relying on compiler
warnings, which are compiler errors when --enable-debug is used.
Unfortunately, very few people build ZFS with --enable-debug. The more
common situation is that these issues manifest themselves as runtime
failures in the form of NULL pointer exceptions.
Previous patches that addressed such issues with PaX/GrSecurity
compatibility largely relied on rewriting autotools checks to avoid
runtime function pointer modification or the addition of PaX/GrSecurity
specific checks. This patch takes the previous work to its logical
conclusion by eliminating the use of runtime function pointer
modification. This permits the removal of PaX-specific autotools checks
in favor of ones that work across all supported kernels.
This should resolve issues that were reported to occur with
PaX/GrSecurity-patched Linux 3.7.5 kernels on Gentoo Linux.
https://bugs.gentoo.org/show_bug.cgi?id=457176
We should be able to prevent future regressions in PaX/GrSecurity
compatibility by ensuring that all changes to ZFSOnLinux avoid runtime
function pointer modification. At the same time, this does not solve the
issue of silent failures triggering default cases in the autotools
check, which is what permitted these regressions to become runtime
failures in the first place. This will need to be addressed in a future
patch.
Reported-by: Marcin Mirosław <bug@mejor.pl>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1300
To determine whether the kernel is capable of handling empty barrier
BIOs, we check for the presence of the bio_empty_barrier() macro,
which was introduced in 2.6.24. If this macro is defined, then we can
flush disk vdevs; if it isn't, then flushing is disabled.
Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37,
even though the kernel is still capable of handling empty barrier BIOs.
As a result, flushing is effectively disabled on kernels >= 2.6.37,
meaning that starting from this kernel version, zfs doesn't use
barriers to guarantee on-disk data consistency. This is quite bad and
can lead to potential data corruption on power failures.
This patch fixes the issue by removing the configure check for
bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore.
Thanks to Richard Kojedzinszky for catching this nasty bug.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1318
As a matter of fact, we're already using -Werror for most tests because
of a bug in kernel-bio-empty-barrier.m4 which sets -Werror without
reverting it afterwards. This meant that all tests which ran after this
one was using -Werror.
This patch simply makes it clear that we're using -Werror and makes
the code more readable and more predictable.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1317
Commit 4b2f65b253 increased the user
space stack by 4x to resolve certain stack overflows. As such it
no longer makes sense to worry about a single extra page which
might or might not be part of the process stack. There is now
ample headroom for normal usage.
By eliminating this configure check we are also resolving the
following segfault which intentionally occurs at configure time
and may be logged in dmesg.
conftest[22156]: segfault at 7fbf18a47e48 ip 00000000004007fe
sp 00007fbf18a4be50 error 6 in conftest[400000+1000]
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
It's doubtful many people were impacted by this but commit 6c28567
accidentally broke ZFS builds for 2.6.26 and earlier kernels. This
commit depends on the lookup_bdev() function which exists in 2.6.26
but wasn't exported until 2.6.27.
The availability of the function isn't critical so a wrapper is
introduced which returns ERR_PTR(-ENOTSUP) when the function isn't
defined. This will have the effect of causing zvol_is_zvol() to
always fail for 2.6.26 kernels. This in turn means vdevs will
always get opened concurrently which is good for normal usage.
This will only become an issue if your using a zvol as a vdev in
another pool. In which case you really should be using a newer
kernel anyway.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1205
As of Linux 2.6.37 the right way to register custom dentry
operations is to use the super block's ->s_d_op field.
For older kernels they should be registered as part of the
lookup operation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1223
Lookups in the snapshot control directory for an existing snapshot
fail with ENOENT if an earlier lookup failed before the snapshot was
created. This is because the earlier lookup causes a negative dentry
to be cached which is never invalidated.
The bug can be reproduced as follows (the second ls should succeed):
$ ls /tank/.zfs/snapshot/s
ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
$ zfs snap tank@s
$ ls /tank/.zfs/snapshot/s
ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
To remedy this, always invalidate cached dentries in the snapshot
control directory. Since these entries never exist on disk there is
no significant performance penalty for the extra lookups.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1192
Certain versions of gcc generate an 'unrecognized command
line option' error message when -Wunused-but-set-variable
is used unconditionally. This in turn can cause several
of the autoconf tests to misdetect an interface.
Now, the use of -Wunused-but-set-variable in the autoconf
tests was introduced by commit b9c59ec8 to address a gcc
4.6 compatibility problem. So we really only need to pass
this option for version of gcc which are known to support it.
Therefore, the tests have been updated to use the result of
the existing ZFS_AC_CONFIG_ALWAYS_NO_UNUSED_BUT_SET_VARIABLE
which determines if gcc supports this option.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1004
Revert the portion of commit d3aa3ea which always resulted in the
SAs being update when an mmap()'ed file was closed. That change
accidentally resulted in unexpected ctime updates which upset tools
like git. That was always a horrible hack and I'm happy it will
never make it in to a tagged release.
The right fix is something I initially resisted doing because I
was worried about the additional overhead. However, in hindsight
the overhead isn't as bad as I feared.
This patch implemented the sops->dirty_inode() callback which is
unsurprisingly called when an inode is dirtied. We leverage this
callback to keep the znode SAs strictly in sync with the inode.
However, for now we're going to go slowly to avoid introducing
any new unexpected issues by only updating the atime, mtime, and
ctime. This will cover the callpath of most concern to us.
->filemap_page_mkwrite->file_update_time->update_time->
mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#764Closes#1140
Previously this check was only performed when ./configure was
attempting to autodetect your kernel source directory. But we
should also handle the case where --with-linux was provided
and is obviously wrong. This way we catch the error before
invoking make and compiling the source with an incorrect
autoconf results.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closeszfsonlinux/spl#162
Use the bdev_physical_block_size() interface to determine the
minimize write size which can be issued without incurring a
read-modify-write operation. This is used to set the ashift
correctly to prevent a performance penalty when using AF hard
disks.
Unfortunately, this interface isn't entirely reliable because
it's not uncommon for disks to misreport this value. For this
reason you may still need to manually set your ashift with:
zpool create -o ashift=12 ...
The solution to this in the upstream Illumos source was to add
a white list of known offending drives. Maintaining such a list
will be a burden, but it still may be worth doing if we can
detect a large number of these drives. This should be considered
as future work.
Reported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#916
Use .mkdir instead of .create in 3.3 compatibility check. Linux 3.6
modifies inode_operations->create's function prototype. This causes
an autotools Linux 3.3. compatibility check for a function prototype
change in create, mkdir and mknode to fail. Since mkdir and mknode
are unchanged, we modify the check to examine it instead.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
As of Linux commit ebfc3b49a7ac25920cb5be5445f602e51d2ea559 the
struct nameidata is no longer passed to iops->create. Instead
only the result of (inamedata->flags & LOOKUP_EXCL) is passed.
ZFS like almost all Linux fileystems never made use of this so
only the prototype needs to be wrapped for compatibility.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
As of Linux commit 00cd8dd3bf95f2cc8435b4cac01d9995635c6d0b the
struct nameidata is no longer passed to iops->lookup. Instead
only the inamedata->flags are passed.
ZFS like almost all Linux fileystems never made use of this so
only the prototype needs to be wrapped for compatibility.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
As of Linux commit 9249e17fe094d853d1ef7475dd559a2cc7e23d42 the
mount flags are now passed to sget() so they can be used when
initializing a new superblock.
ZFS never uses sget() in this fashion so we can simply pass a
zero and add a zpl_sget() compatibility wrapper.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
As of Linux 2.6.36 an elevator_change() interface was added.
This commit updates vdev_elevator_switch() to use this interface
when available, otherwise it falls back to the usermodehelper
method.
Original-patch-by: foobarz <sysop@xeon.(none)>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#906
In order to implement synchronous NFS metadata semantics ZFS
needs to provide the .commit_metadata hook. All it takes there
is to make sure changes are committed to ZIL. Fortunately
zfs_fsync() does just that, so simply calling it from
zpl_commit_metadata() does the trick.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#969
This reverts commit 395350c85d which
accidentally introduced issue #955.
Pools using AF drives which were originally created with a sector
size of 512 bytes will now be correctly detected to have physical
sector size of 4096. This is desirable for a new pool, however for
an existing pool abruptly changing the sector size causes problems.
For this reason, this change is being reverted until the additional
logic can be added to detect the existing pool case. Existing
pools must use the ashift size stored in the label regardless of
what the disk reports. This is critical for compatibility.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #955
Use the bdev_physical_block_size() interface to determine the
minimize write size which can be issued without incurring a
read-modify-write operation. This is used to set the ashift
correctly to prevent a performance penalty when using AF hard
disks.
Unfortunately, this interface isn't entirely reliable because
it's not uncommon for disks to misreport this value. For this
reason you may still need to manually set your ashift with:
zpool create -o ashift=12 ...
The solution to this in the upstream Illumos source was to add
a while list of known offending drives. Maintaining such a list
will be a burden, but it still may be worth doing if we can
detect a large number of these drives. This should be considered
as future work.
Reported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#916
The autoconf macro which failed if CONFIG_PREEMPT was set in the kernel
config was removed. With the inclusion of a few previous patches
targeting support for preempt enabled kernels, it is now safe to run
with this kernel config option enabled.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#83
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#718
ZFS fails to build when SPL is built into the kernel on unless
--with-spl=/path/to/kernel/sources is specified. We fallback to the
kernel sources directory when SPL is not found elsewhere to resolve
that.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closed#896
./configure erroneously detects absence of dops->d_automount
when built against a GrSecurity patched kernel.
Summerized error message found in config.log:
checking whether dops->d_automount() exists
...
In function 'main': ... error: constified variable 'dops'
cannot be local
The "dops" variable cannot be a local variable, so it's
moved to the global scope.
This test also fails if the prototype of the dops->d_automount
function pointer is changed.
Signed-off-by: Massimo Maggi <massimo@mmmm.it>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Closes#884
This commit adds support for building a zfs-modules-dkms sub package
built around Dynamic Kernel Module Support. This is to allow building
packages using the DKMS infrastructure which is intended to ease the
burden of kernel version changes, upgrades, etc.
By default zfs-modules-dkms-* sub package will be built as part of
the 'make rpm' target. Alternately, you can build only the DKMS
module package using the 'make rpm-dkms' target.
Examples:
# To build packaged binaries as well as a dkms packages
$ ./configure && make rpm
# To build only the packaged binary utilities and dkms packages
$ ./configure && make rpm-utils rpm-dkms
Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are
supported for building the dkms sub package.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #535
When checking for the SPL Module.symvers file, a timeout can now be
passed in which will pause the configure step while it waits for this
file to be generated. By default, the configure behavior is unchanged as
a timeout of 0 is used. If a positive number of seconds is passed,
configure will wait that number of seconds for the Module.symvers file
before moving on.
The main motivation for this change was to support parallel execution of
'./configure && make' for the SPL and ZFS packages in preparation of
supporting DKMS based packages.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Currently, zvols have a discard granularity set to 0, which suggests to
the upper layer that discard requests of arbirarily small size and
alignment can be made efficiently.
In practice however, ZFS does not handle unaligned discard requests
efficiently: indeed, it is unable to free a part of a block. It will
write zeros to the specified range instead, which is both useless and
inefficient (see dnode_free_range).
With this patch, zvol block devices expose volblocksize as their discard
granularity, so the upper layer is aware that it's not supposed to send
discard requests smaller than volblocksize.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#862
This patch adds a new autoconf function: ZFS_LINUX_TRY_COMPILE_SYMBOL.
This new function does the following:
- Call LINUX_TRY_COMPILE with the specified parameters.
- If unsuccessful, return false.
- If successful and we're configuring with --enable-linux-builtin,
return true.
- Else, call CHECK_SYMBOL_EXPORT with the specified parameters and
return the result.
All calls to CHECK_SYMBOL_EXPORT are converted to
LINUX_TRY_COMPILE_SYMBOL so that the tests work even when configuring
for builtin on a kernel which doesn't have loadable module support, or
hasn't been built yet.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
Currently, when building a test case, we're compiling an entire Linux
module from beginning to end. This includes the MODPOST stage, which
generates a "conftest.mod.c" file with some boilerplate module
declaration code.
This poses a problem when configuring for built-in on kernels which have
loadable module support disabled. In this case conftest.mod.c is
referencing disabled code, resulting in a compilation failure, thus
breaking the tests.
This patch fixes the issue by faking the modpost stage when the
--enable-linux-builtin option is provided. It does so by forcing the
modpost command to be /bin/true, and using an empty conftest.mod.c file.
The test module still compiles fine, although the result isn't loadable,
but we don't really care at this point.
Note it is important to preserve the modpost stage when building out of
tree. The ZFS_AC_KERNEL_BLK_END_REQUEST, ZFS_AC_KERNEL_BLK_QUEUE_FLUSH,
and ZFS_AC_KERNEL_BLK_RQ_BYTES configure checks all depend on it to
identify GPL-only symbols.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
This patch adds a new option to configure: --enable-linux-builtin. When
this option is used, the following happens:
- Compilation of kernel modules is disabled.
- A failure to find UTS_RELEASE is followed by a suggestion to run
"make prepare" on the kernel source tree.
This patch also adds a new test which tries to compile an empty module
as a basic toolchain sanity test. If it fails and the option was
specified, the error is followed by a suggestion to run "make scripts"
on the kernel source tree.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
Currently, when configure --with-config is used, selective compilation
is only effective for the simple "make" case. Package builders (e.g.
make rpm) still build everything (utils and modules). This patch fixes
that.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
The end_writeback() function was changed by moving the call to
inode_sync_wait() earlier in to evict(). This effecitvely changes
the ordering of the sync but it does not impact the details of
the zfs implementation.
However, as part of this change end_writeback() was renamed to
clear_inode() to reflect the new semantics. This change does
impact us and clear_inode() now maps to end_writeback() for
kernels prior to 3.5.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#784
The vmtruncate_range() support has been removed from the kernel in
favor of using the fallocate method in the file_operations table.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
The export_operations member ->encode_fh() has been updated to
take both the child and parent inodes. This interface used to
take the child dentry and a bool describing if the parent is needed.
NOTE: While updating this code I noticed that we do not currently
cleanly handle the case where we're passed a connectable parent.
This code should be audited to make sure we're doing the right thing.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
Support for PaX/GRSecurity patched kernels was developed against Linux
3.2. Unfortunately, an autotools check introduced for a Linux 3.3 API
fails on PaX/GRSecurity patched kernels. This causes the module to be
built against the Linux 3.2 ABI, which results in a NULL pointer
dereference at runtime.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Closes#794Closes#809
Gentoo Hardened kernels include the PaX/GRSecurity patches. They use a
dialect of C that relies on a GCC plugin. In particular, struct
file_operations has been marked do_const in the PaX/GRSecurity dialect,
which causes GCC to consider all instances of it as const. This caused
failures in the autotools checks and the ZFS source code.
To address this, we modify the autotools checks to take into account
differences between the PaX C dialect and the regular C dialect. We also
modify struct zfs_acl's z_ops member to be a pointer to a function
pointer table. Lastly, we modify zpl_put_link() to address a PaX change
to the function prototype of nd_get_link(). This avoids compiler errors
in the PaX/GRSecurity dialect.
Note that the change in zpl_put_link() causes a warning that becomes a
build failure when debugging is enabled. Fixing that warning requires
ryao/spl@5ca50ef459.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#484
Currently, zpool online -e (dynamic vdev expansion) doesn't work on
whole disks because we're invoking ioctl(BLKRRPART) from userspace
while ZFS still has a partition open on the disk, which results in
EBUSY.
This patch moves the BLKRRPART invocation from the zpool utility to the
module. Specifically, this is done just before opening the device in
vdev_disk_open() which is called inside vdev_reopen(). This requires
jumping through some hoops to get to the disk device from the partition
device, and to make sure we can still open the partition after the
BLKRRPART call.
Note that this new code path is triggered on dynamic vdev expansion
only; other actions, like creating a new pool, are unchanged and still
call BLKRRPART from userspace.
This change also depends on API changes which are available in 2.6.37
and latter kernels. The build system has been updated to detect this,
but there is no compatibility mode for older kernels. This means that
online expansion will NOT be available in older kernels. However, it
will still be possible to expand the vdev offline.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#808
torvalds/linux@adc0e91ab1 introduced
introduced d_make_root() as a replacement for d_alloc_root(). Further
commits appear to have removed d_alloc_root() from the Linux source
tree. This causes the following failure:
error: implicit declaration of function 'd_alloc_root'
[-Werror=implicit-function-declaration]
To correct this we update the code to use the current d_make_root()
interface for readability. Then we introduce an autotools check
to determine if d_make_root() is available. If it isn't then we
define some compatibility logic which used the older d_alloc_root()
interface.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#776
The configure script error message for kernels built with
CONFIG_DEBUG_LOCK_ALLOC may give the impression that the issue is
strictly with license compliance. To avoid confusion add some words
indicating that the linking stage will fail if the build continues.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#773
The CONFIG_DEBUG_LOCK_ALLOC check at configure time was added to
detect when mutex_lock() is defined as a GPL-only symbol. However,
the check as written only inferred this from this configuration
setting, it never actually checked. This change introduces that
missing check to prevent false positives.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The mode argument of iops->create()/mkdir()/mknod() was changed from
an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf
check was added to detect the API change and then correctly set a
zpl_umode_t typedef. There is no functional change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#701
Caught by lint, this permission change was accidentally introduced
by commit 42cb3819f1. Restore the
correct permissions and while I'm at it add a missing whack-bang
to config/ltmain.sh.
lint: executable-not-elf-or-script: zpool_main.c zfs_main.c
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#620
Allow rigorous (and expensive) tx validation to be enabled/disabled
indepentantly from the standard zfs debugging. When enabled these
checks ensure that all txs are constructed properly and that a dbuf
is never dirtied without taking the correct tx hold.
This checking is particularly helpful when adding new dmu consumers
like Lustre. However, for established consumers such as the zpl
with no known outstanding tx construction problems this is just
overhead.
--enable-debug-dmu-tx - Enable/disable validation of each tx as
--disable-debug-dmu-tx it is constructed. By default validation
is disabled due to performance concerns.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add support for the .zfs control directory. This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required. The bulk of the core
functionality is now all there with the following limitations.
*) The .zfs/snapshot directory automount support requires a 2.6.37
or newer kernel. The exception is RHEL6.2 which has backported
the d_automount patches.
*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
in the .zfs/snapshot directory works as expected. However,
this functionality is only available to root until zfs
delegations are finished.
* mkdir - create a snapshot
* rmdir - destroy a snapshot
* mv - rename a snapshot
The following issues are known defeciences, but we expect them to
be addressed by future commits.
*) Add automount support for kernels older the 2.6.37. This should
be possible using follow_link() which is what Linux did before.
*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
The majority of the ground work for this is complete. However,
finishing this work will require resolving some lingering
integration issues with the Linux NFS kernel server.
*) The .zfs/shares directory exists but no futher smb functionality
has yet been implemented.
Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#173
Improve the distribution detection by moving the tests for
distribution specific files first. The Ubuntu and Debian
checks are left for last because they are the least likely
to be unique. This is particularly true in the case of Debian
since so many distributions are based on Debian.
Since this is currently only used to identify the correct
packaging method for this system the result in many instances
is simply cosmetic.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Allow a source rpm to be rebuilt with debugging enabled. This
avoids the need to have to manually modify the spec file. By
default debugging is still largely disabled. To enable specific
debugging features use the following options with rpmbuild.
'--with debug' - Enables ASSERTs
# For example:
$ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm
Additionally, ZFS_CONFIG has been added to zfs_config.h for
packages which build against these headers. This is critical
to ensure both zfs and the dependant package are using the same
prototype and structure definitions.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
It allows ZVOL clients to discard (unmap, trim) block ranges from
a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
shrink instead of just grow.
We can't use zfs_space() or zfs_freesp() here, since these functions
only work on regular files, not volumes. Fortunately we can use the
low-level function dmu_free_long_range() which does exactly what we
want.
Currently the discard operation is not added to the log. That's not
a big deal since losing discard requests cannot result in data
corruption. It would however result in disk space usage higher than
it should be. Thus adding log support to zvol_discard() is probably
a good idea for a future improvement.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is
supported, since it's the only one that matches the behavior of
zfs_space(). This makes it pretty much useless in its current
form, but it's a start.
To support other flag combinations we would need to modify
zfs_space() to make it more flexible, or emulate the desired
functionality in zpl_fallocate().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #334
The Linux block device queue subsystem exposes a number of configurable
settings described in Linux block/blk-settings.c. The defaults for these
settings are tuned for hard drives, and are not optimized for ZVOLs. Proper
configuration of these options would allow upper layers (I/O scheduler) to
take better decisions about write merging and ordering.
Detailed rationale:
- max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to
handle writes of any size, so there's no reason to impose a limit. Let the
upper layer decide.
- max_segments and max_segment_size are set to unlimited. zvol_write() will
copy the requests' contents into a dbuf anyway, so the number and size of
the segments are irrelevant. Let the upper layer decide.
- physical_block_size and io_opt are set to the ZVOL's block size. This
has the potential to somewhat alleviate issue #361 for ZVOLs, by warning
the upper layers that writes smaller than the volume's block size will be
slow.
- The NONROT flag is set to indicate this isn't a rotational device.
Although the backing zpool might be composed of rotational devices, the
resulting ZVOL often doesn't exhibit the same behavior due to the COW
mechanisms used by ZFS. Setting this flag will prevent upper layers from
making useless decisions (such as reordering writes) based on incorrect
assumptions about the behavior of the ZVOL.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
zvol_write() assumes that the write request must be written to stable storage
if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed,
"sync" does *not* mean what we think it means in the context of the Linux
block layer. This is well explained in linux/fs.h:
WRITE: A normal async write. Device will be plugged.
WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down
the hint that someone will be waiting on this IO
shortly.
WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush.
WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on
non-volatile media on completion.
In other words, SYNC does not *mean* that the write must be on stable storage
on completion. It just means that someone is waiting on us to complete the
write request. Thus triggering a ZIL commit for each SYNC write request on a
ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL
users have no way to express that they actually want data to be written to
stable storage, which means the ZIL is broken for ZVOLs.
The request for stable storage is expressed by the FUA flag, so we must
commit the ZIL after the write if the FUA flag is set. In addition, we must
commit the ZIL before the write if the FLUSH flag is set.
Also, we must inform the block layer that we actually support FLUSH and FUA.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The second argument of sops->show_options() was changed from a
'struct vfsmount *' to a 'struct dentry *'. Add an autoconf check
to detect the API change and then conditionally define the expected
interface. In either case we are only interested in the zfs_sb_t.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#549
When the original build system code was added the release
component was accidentally omited from the development header
install path. This patch adds the missing path component so
it's always clear exactly what release your compiling against.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Unfortunately, Arch's package manager `pacman` shares it's name with a
popular arcade video game. Thus, in order to refrain from executing the
video game when we mean to execute the package manager, ZFS_AC_PACMAN is
now only run when $VENDOR is determined to be "arch".
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#517
The security_inode_init_security() API has been changed to include
a filesystem specific callback to write security extended attributes.
This was done to support the initialization of multiple LSM xattrs
and the EVM xattr.
This change updates the code to use the new API when it's available.
Otherwise it falls back to the previous implementation.
In addition, the ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY
autoconf test has been made more rigerous by passing the expected
types. This is done to ensure we always properly the detect the
correct form for the security_inode_init_security() API.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#516
The Linux 3.1 kernel has introduced the concept of per-filesystem
shrinkers which are directly assoicated with a super block. Prior
to this change there was one shared global shrinker.
The zfs code relied on being able to call the global shrinker when
the arc_meta_limit was exceeded. This would cause the VFS to drop
references on a fraction of the dentries in the dcache. The ARC
could then safely reclaim the memory used by these entries and
honor the arc_meta_limit. Unfortunately, when per-filesystem
shrinkers were added the old interfaces were made unavailable.
This change adds support to use the new per-filesystem shrinker
interface so we can continue to honor the arc_meta_limit. The
major benefit of the new interface is that we can now target
only the zfs filesystem for dentry and inode pruning. Thus we
can minimize any impact on the caching of other filesystems.
In the context of making this change several other important
issues related to managing the ARC were addressed, they include:
* The dnlc_reduce_cache() function which was called by the ARC
to drop dentries for the Posix layer was replaced with a generic
zfs_prune_t callback. The ZPL layer now registers a callback to
drop these dentries removing a layering violation which dates
back to the Solaris code. This callback can also be used by
other ARC consumers such as Lustre.
arc_add_prune_callback()
arc_remove_prune_callback()
* The arc_reduce_dnlc_percent module option has been changed to
arc_meta_prune for clarity. The dnlc functions are specific to
Solaris's VFS and have already been largely eliminated already.
The replacement tunable now represents the number of bytes the
prune callback will request when invoked.
* Less aggressively invoke the prune callback. We used to call
this whenever we exceeded the arc_meta_limit however that's not
strictly correct since it results in over zeleous reclaim of
dentries and inodes. It is now only called once the arc_meta_limit
is exceeded and every effort has been made to evict other data from
the ARC cache.
* More promptly manage exceeding the arc_meta_limit. When reading
meta data in to the cache if a buffer was unable to be recycled
notify the arc_reclaim thread to invoke the required prune.
* Added arcstat_prune kstat which is incremented when the ARC
is forced to request that a consumer prune its cache. Remember
this will only occur when the ARC has no other choice. If it
can evict buffers safely without invoking the prune callback
it will.
* This change is also expected to resolve the unexpect collapses
of the ARC cache. This would occur because when exceeded just the
arc_meta_limit reclaim presure would be excerted on the arc_c
value via arc_shrink(). This effectively shrunk the entire cache
when really we just needed to reclaim meta data.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#466Closes#292
If the lsb-release package is installed on an Arch Linux distribution,
the configure step will incorrectly detect the running distribution as
Ubuntu. This is a result of both distributions providing an
/etc/lsb-release file, and the Ubuntu VENDOR check being performed
first.
Since the Arch Linux test check's for a file more specific to the Arch
Linux distribution, moving Arch Linux's VENDOR check above Unbuntu's
check provides a quick and easy solution.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit
SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170
Use the new set_nlink() kernel function instead.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #462
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:
$ ./configure
$ make pkg # Alternatively, one can run 'make arch' as well
on the Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the zfs userland utilities and
another for the zfs kernel modules. The new packages can then be
installed by running:
# pacman -U $package.pkg.tar.xz
In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be build using the 'sarch' make
rule.
NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, it's MD5 hash signature will be continually influx.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#491
As of GCC 4.6, specific kernel 2.6.32 header files do not compile
cleanly without warnings. One specific example of this is the
arch/x86/include/asm/percpu.h file. Thus, a few of the configure tests
were getting hung up on this and the '-Wno-unsued-but-set-variables'
compile option had to be introduced.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#459
This change updates the AC_LANG_PROGRAM autoconf macro invocations to be
wrapped in quotes. As of autoconf version 2.68, the quotes are necessary
to prevent warnings from appearing. Specifically, the autoconf v2.68
Forward Porting Notes specifies:
It is important to note that you need to ensure that the call to
AC_LANG_SOURCE is quoted and not expanded, otherwise that will
cause the warning to appear nonetheless.
Finally, because of the additional quoting we can drop the extra
quotas used by the ZFS_AC_CONFIG_USER_STACK_GUARD autoconf check.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#464
The Linux 3.1 kernel updated the fops->fsync() callback yet again.
They now pass the requested range and delegate the responsibility
for calling filemap_write_and_wait_range() to the callback. In
addition imutex is no longer held by the caller and the callback
is responsible for taking the lock if required.
This commit updates the code to provide a zpl_fsync() function
for the updated API. Implementations for the previous two APIs
are also maintained for compatibility.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#445
Only under Ubuntu Lucid the rpm packaging step mistakenly adds
the following files twice to the package because of the /lib
naming convention. This is harmless but results in a warning
which the buildot flags as a failure. Suppress this warning.
warning: File listed twice: /lib/udev/rules.d
warning: File listed twice: /lib/udev/rules.d/60-zpool.rules
warning: File listed twice: /lib/udev/rules.d/60-zvol.rules
warning: File listed twice: /lib/udev/rules.d/90-zfs.rules
warning: File listed twice: /lib/udev/sas_switch_id
warning: File listed twice: /lib/udev/zpool_id
warning: File listed twice: /lib/udev/zvol_id
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Update the code to use the bdi_setup_and_register() helper to
simplify the bdi integration code. The updated code now just
registers the bdi during mount and destroys it during unmount.
The only complication is that for 2.6.32 - 2.6.33 kernels the
helper wasn't available so in these cases the zfs code must
provide it. Luckily the bdi_setup_and_register() function
is trivial.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#367
The 'if' statements found in kernel.m4 were converted to use the
portable alternative provided by autoconf, the AS_IF macro.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
A few of the autoconf error messages were inconsistent with the rest of
the build system. To be specific, the inconsistencies addressed by this
commit are the following:
* The second line of the error message for the CONFIG_PREEMPT check
was missing it's third asterisk.
* A few of the error messages were prefixed by two tabs, whereas the
majority of error messages are only prefixed by a single tab.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The warnings listed in the suppression file will be suppressed
and not flagged during regular buildbot builds. These warnings
are expected, harmless, and can obscure real issues unless they
are suppressed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The hardened gentoo kernel defines all of the super block
operation callbacks as const. This prevents the autoconf test
from assigning the callback and results in a false negative.
By moving the assignment in to the declaration we can avoid
this issue and get a correct result for this patched kernel.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#296
This change moves the default install location for the zfs udev
rules from /etc/udev/ to /lib/udev/. The correct convention is
for rules provided by a package to be installed in /lib/udev/.
The /etc/udev/ directory is reserved for custom rules or local
overrides.
Additionally, this patch cleans up some abuse of the bindir install
location by adding a udevdir and udevruledir install directories.
This allows us to revert to the default bin install location. The
udev install directories can be set with the following new options.
--with-udevdir=DIR install udev helpers [EPREFIX/lib/udev]
--with-udevruledir=DIR install udev rules [UDEVDIR/rules.d]
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#356
For a long time now the kernel has been moving away from using the
pdflush daemon to write 'old' dirty pages to disk. The primary reason
for this is because the pdflush daemon is single threaded and can be
a limiting factor for performance. Since pdflush sequentially walks
the dirty inode list for each super block any delay in processing can
slow down dirty page writeback for all filesystems.
The replacement for pdflush is called bdi (backing device info). The
bdi system involves creating a per-filesystem control structure each
with its own private sets of queues to manage writeback. The advantage
is greater parallelism which improves performance and prevents a single
filesystem from slowing writeback to the others.
For a long time both systems co-existed in the kernel so it wasn't
strictly required to implement the bdi scheme. However, as of
Linux 2.6.36 kernels the pdflush functionality has been retired.
Since ZFS already bypasses the page cache for most I/O this is only
an issue for mmap(2) writes which must go through the page cache.
Even then adding this missing support for newer kernels was overlooked
because there are other mechanisms which can trigger writeback.
However, there is one critical case where not implementing the bdi
functionality can cause problems. If an application handles a page
fault it can enter the balance_dirty_pages() callpath. This will
result in the application hanging until the number of dirty pages in
the system drops below the dirty ratio.
Without a registered backing_device_info for the filesystem the
dirty pages will not get written out. Thus the application will hang.
As mentioned above this was less of an issue with older kernels because
pdflush would eventually write out the dirty pages.
This change adds a backing_device_info structure to the zfs_sb_t
which is already allocated per-super block. It is then registered
when the filesystem mounted and unregistered on unmount. It will
not be registered for mounted snapshots which are read-only. This
change will result in flush-<pool> thread being dynamically created
and destroyed per-mounted filesystem for writeback.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#174
The latest kernels no longer define AUTOCONF_INCLUDED which was
being used to detect the new style autoconf.h kernel configure
options. This results in the CONFIG_* checks always failing
incorrectly for newer kernels.
The fix for this is a simplification of the testing method.
Rather than attempting to explicitly include to renamed config
header. It is simpler to unconditionally include <linux/module.h>
which must pick up the correctly named header.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#320
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d. This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#322
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure. When using the new
interface the caller must now use the mount_nodev() helper.
Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers. This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use. It provides our only entry point
in to the namespace layer for manipulating certain mount options.
This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly. It also allowed me
to keep more of the original Solaris code unmodified. Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do. However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem. Thus keeping a back
reference from the filesystem to the namespace is complicated.
Rather than introduce some ugly hack to get the vfsmount and
continue as before. I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.
This commit updates the code to remove this vfsmount back
reference entirely. All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.
Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code. This
change which fairly involved has turned out nicely.
Closes#246Closes#217Closes#187Closes#248Closes#231
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.
Closes#246Closes#217Closes#187
RPM version 4.9.0 has been observed to generate extra debug
messages in certain cases. These debug messages prevent us
from cleanly acquiring the architecture. This is clearly
an upstream RPM bug which will get fixed. But until then
a safe solution is to pipe the result through 'tail -1'
to just grab the architecture bit we care about.
Example 'rpm -qp spl-0.6.0-rc4.src.rpm --qf %{arch}' output:
Freeing read locks for locker 0x166: 28031/47480843735008
Freeing read locks for locker 0x168: 28031/47480843735008
x86_64
The inode eviction should unmap the pages associated with the inode.
These pages should also be flushed to disk to avoid the data loss.
Therefore, use truncate_setsize() in evict_inode() to release the
pagecache.
The API truncate_setsize() was added in 2.6.35 kernel. To ensure
compatibility with the old kernel, the patch defines its own
truncate_setsize function.
Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com>
Closes#255
The previous commit 8a7e1ceefa wasn't
quite right. This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set too.
For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable. This can lead to build failures
on older Linux platforms such as Debian Lenny. Since this is
an optional build argument this changes add a new autoconf check
for the option. If it is supported by the installed version of
gcc then it is used otherwise it is omited.
See commit's 12c1acde76 and
79713039a2 for the reason the
-Wno-unused-but-set-variable options was originally added.
Every distribution has slightly different requirements for their
init scripts. Because of this the zfs package contains several
init scripts for various distributions. These scripts have been
contributed by, and are supported by, the larger zfs community.
Init scripts for Gentoo/Lunar/Redhat have been contributed by:
Gentoo - devsk <devsku@gmail.com>
Lunar - Jean-Michel Bruenn <jean.bruenn@ip-minds.de>
Redhat - Fajar A. Nugraha <list@fajar.net>
This change fixes a kernel panic which would occur when resizing
a dataset which was not open. The objset_t stored in the
zvol_state_t will be set to NULL when the block device is closed.
To avoid this issue we pass the correct objset_t as the third arg.
The code has also been updated to correctly notify the kernel
when the block device capacity changes. For 2.6.28 and newer
kernels the capacity change will be immediately detected. For
earlier kernels the capacity change will be detected when the
device is next opened. This is a known limitation of older
kernels.
Online ext3 resize test case passes on 2.6.28+ kernels:
$ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
$ zpool create tank /tmp/zvol
$ zfs create -V 500M tank/zd0
$ mkfs.ext3 /dev/zd0
$ mkdir /mnt/zd0
$ mount /dev/zd0 /mnt/zd0
$ df -h /mnt/zd0
$ zfs set volsize=800M tank/zd0
$ resize2fs /dev/zd0
$ df -h /mnt/zd0
Original-patch-by: Fajar A. Nugraha <github@fajar.net>
Closes#68Closes#84
As of gcc-4.6 the option -Wunused-but-set-variable is enabled by
default. While this is a useful warning there are numerous places
in the ZFS code when a variable is set and then only checked in an
ASSERT(). To avoid having to update every instance of this in the
code we now set -Wno-unused-but-set-variable to suppress the warning.
Additionally, when building with --enable-debug and -Werror set these
warning also become fatal. We can reevaluate the suppression of these
error at a later time if it becomes an issue. For now we are basically
just reverting to the previous gcc behavior.
Newer versions of gcc are getting smart enough to detect the sloppy
syntax used for the autoconf tests. It is now generating warnings
for unused/undeclared variables. Newer version of gcc even have
the -Wunused-but-set-variable option set by default. This isn't a
problem except when -Werror is set and they get promoted to an error.
In this case the autoconf test will return an incorrect result which
will result in a build failure latter on.
To handle this I'm tightening up many of the autoconf tests to
explicitly mark variables as unused to suppress the gcc warning.
Remember, all of the autoconf code can never actually be run we
just want to get a clean build error to detect which APIs are
available. Never using a variable is absolutely fine for this.
Closes#176
Added insert_inode_locked() helper function, prior to this most callers
used insert_inode_hash(). The older method doesn't check for collisions
in the inode_hashtable but it still acceptible for use. Fallback to
using insert_inode_hash() when insert_inode_locked() is unavailable.
To simplify the process of using zfs as your root filesystem a
zfs-drucat sub-package has been added. This sub-package adds a zfs
dracut module which allows your initramfs to be rebuilt with zfs
support. The process for doing this is still complicated but there
is clearly interest from the community about getting this working
well and documented. This should help lay some of the groundwork.
Longer term these changes should be pushed in the upstream dracut
package. Once that occurs this subpackage will no longer be
required for new systems, however we may want to conditionally
build this package in the future for systems running older
dracut versions.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
To support automatically mounting your zfs on filesystem on boot
a basic init script is needed. Unfortunately, every distribution
has their own idea of the _right_ way to do things. Rather than
write one very complicated portable init script, which would be
invariably replaced by the distributions own anyway. I have
instead added support to provide multiple distribution specific
init scripts.
The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.
Currently, there is zfs.fedora and a more generic zfs.lsb init
script. Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feedback their
approved versions to be included in the project.
This change does not consider upstart jobs but I'm not at all
opposed to add that sort of thing.
Detect early on in configure if the Modules.symvers file is missing.
Without this file there will be build failures later and it's best
to catch this early and provide a useful error. In this case the
most likely problem is the kernel-devel packages are not installed.
It may also be possible that they are using an unbuilt custom kernel
in which case they must build the kernel first.
Closes#127
Until support is added for preemptible kernels detect this at
configure time and make it fatal. Otherwise, it is possible to
have a successful build and kernel modules with flakey behavior.
The open_bdev_exclusive() function has been replaced (again) by the
more generic blkdev_get_by_path() function. Additionally, the
counterpart function close_bdev_exclusive() has been replaced by
blkdev_put(). Because these functions are more generic versions
of the functions they replaced the compatibility macro must add
the FMODE_EXCL mask to ensure they are exclusive.
Closes#114
The new prefered inteface for evicting an inode from the inode cache
is the ->evict_inode() callback. It replaces both the ->delete_inode()
and ->clear_inode() callbacks which were previously used for this.
The xattr handler prototypes were sanitized with the idea being that
the same handlers could be used for multiple methods. The result of
this was the inode type was changes to a dentry, and both the get()
and set() hooks had a handler_flags argument added. The list()
callback was similiarly effected but no autoconf check was added
because we do not use the list() callback.
The fsync() callback in the file_operations structure used to take
3 arguments. The callback now only takes 2 arguments because the
dentry argument was determined to be unused by all consumers. To
handle this a compatibility prototype was added to ensure the right
prototype is used. Our implementation never used the dentry argument
either so it's just a matter of using the right prototype.
The const keyword was added to the 'struct xattr_handler' in the
generic Linux super_block structure. To handle this we define an
appropriate xattr_handler_t typedef which can be used. This was
the preferred solution because it keeps the code clean and readable.
Preferentially use the /lib/modules/$(uname -r)/source and
/lib/modules/$(uname -r)/build links. Only if neither of these
links exist fallback to alternate methods for deducing which
kernel to build with. This resolves the need to manually
specify --with-linux= and --with-linux-obj= on Debian systems.
ZFS even under Solaris does not strictly require libshare to be
available. The current implementation attempts to dlopen() the
library to access the needed symbols. If this fails libshare
support is simply disabled.
This means that on Linux we only need the most minimal libshare
implementation. In fact just enough to prevent the build from
failing. Longer term we can decide if we want to implement a
libshare library like Solaris. At best this would be an abstraction
layer between ZFS and NFS/SMB. Alternately, we can drop libshare
entirely and directly integrate ZFS with Linux's NFS/SMB.
Finally the bare bones user-libshare.m4 test was dropped. If we
do decide to implement libshare at some point it will surely be
as part of this package so the check is not needed.
If libselinux is detected on your system at configure time link
against it. This allows us to use a library call to detect if
selinux is enabled and if it is to pass the mount option:
"context=\"system_u:object_r:file_t:s0"
For now this is required because none of the existing selinux
policies are aware of the zfs filesystem type. Because of this
they do not properly enable xattr based labeling even though
zfs supports all of the required hooks.
Until distro's add zfs as a known xattr friendly fs type we
must use mntpoint labeling. Alternately, end users could modify
their existing selinux policy with a little guidance.
Refresh the autogen.sh products based on the versions which are
installed by default in the GA RHEL6.0 release.
autoconf (GNU Autoconf) 2.63
automake (GNU automake) 1.11.1
ltmain.sh (GNU libtool) 2.2.6b
The name of the flag used to mark a bio as synchronous has changed
again in the 2.6.36 kernel due to the unification of the BIO_RW_*
and REQ_* flags. The new flag is called REQ_SYNC. To simplify
checking this flag I have introduced the vdev_disk_dio_is_sync()
helper function. Based on the results of several new autoconf
tests it uses the correct mask to check for a synchronous bio.
Preferred interface for flagging a synchronous bio:
2.6.12-2.6.29: BIO_RW_SYNC
2.6.30-2.6.35: BIO_RW_SYNCIO
2.6.36-2.6.xx: REQ_SYNC
As of linux-2.6.36 the BIO_RW_FAILFAST and REQ_FAILFAST flags
have been unified under the REQ_* names. These flags always had
to be kept in-sync so this is a nice step forward, unfortunately
it means we need to be careful to only use the new unified flags
when the BIO_RW_* flags are not defined. Additional autoconf
checks were added for this and if it is ever unclear which method
to use no flags are set. This is safe but may result in longer
delays before a disk is failed.
Perferred interface for setting FAILFAST on a bio:
2.6.12-2.6.27: BIO_RW_FAILFAST
2.6.28-2.6.35: BIO_RW_FAILFAST_{DEV|TRANSPORT|DRIVER}
2.6.36-2.6.xx: REQ_FAILFAST_{DEV|TRANSPORT|DRIVER}
ZFS works best when it is notified as soon as possible when a device
failure occurs. This allows it to immediately start any recovery
actions which may be needed. In theory Linux supports a flag which
can be set on bio's called FAILFAST which provides this quick
notification by disabling the retry logic in the lower scsi layers.
That's the theory at least. In practice is turns out that while the
flag exists you oddly have to set it with the BIO_RW_AHEAD flag.
And even when it's set it you may get retries in the low level
drivers decides that's the right behavior, or if you don't get the
right error codes reported to the scsi midlayer.
Unfortunately, without additional kernels patchs there's not much
which can be done to improve this. Basically, this just means that
it may take 2-3 minutes before a ZFS is notified properly that a
device has failed. This can be improved and I suspect I'll be
submitting patches upstream to handle this.
One of the neat tricks an autoconf style project is capable of
is allow configurion/building in a directory other than the
source directory. The major advantage to this is that you can
build the project various different ways while making changes
in a single source tree.
For example, this project is designed to work on various different
Linux distributions each of which work slightly differently. This
means that changes need to verified on each of those supported
distributions perferably before the change is committed to the
public git repo.
Using nfs and custom build directories makes this much easier.
I now have a single source tree in nfs mounted on several different
systems each running a supported distribution. When I make a
change to the source base I suspect may break things I can
concurrently build from the same source on all the systems each
in their own subdirectory.
wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz
tar -xzf zfs-x.y.z.tar.gz
cd zfs-x-y-z
------------------------- run concurrently ----------------------
<ubuntu system> <fedora system> <debian system> <rhel6 system>
mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6
cd ubuntu cd fedora cd debian cd rhel6
../configure ../configure ../configure ../configure
make make make make
make check make check make check make check
This change also moves many of the include headers from individual
incude/sys directories under the modules directory in to a single
top level include directory. This has the advantage of making
the build rules cleaner and logically it makes a bit more sense.