Commit Graph

638 Commits

Author SHA1 Message Date
Gvozden Neskovic fc0c72b167 Support for vectorized algorithms on x86
This is initial support for x86 vectorized implementations of ZFS parity
and checksum algorithms.

For the compilation phase, configure step checks if toolchain supports relevant
instruction sets. Each implementation must ensure that the code is not passed
to compiler if relevant instruction set is not supported. For this purpose,
following new defines are provided if instruction set is supported:
	- HAVE_SSE,
	- HAVE_SSE2,
	- HAVE_SSE3,
	- HAVE_SSSE3,
	- HAVE_SSE4_1,
	- HAVE_SSE4_2,
	- HAVE_AVX,
	- HAVE_AVX2.

For detecting if an instruction set can be used in runtime, following functions
are provided in (include/linux/simd_x86.h):
	- zfs_sse_available()
	- zfs_sse2_available()
	- zfs_sse3_available()
	- zfs_ssse3_available()
	- zfs_sse4_1_available()
	- zfs_sse4_2_available()
	- zfs_avx_available()
	- zfs_avx2_available()
	- zfs_bmi1_available()
	- zfs_bmi2_available()

These function should be called once, on module load, or initialization.
They are safe to use from user and kernel space.
If an implementation is using more than single instruction set, both compiler
and runtime support for all relevant instruction sets should be checked.

Kernel fpu methods:
	- kfpu_begin()
	- kfpu_end()

Use __get_cpuid_max and __cpuid_count from <cpuid.h>
Both gcc and clang have support for these. They also handle ebx register
in case it is used for PIC code.

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4381
2016-03-21 09:24:34 -07:00
Richard Yao e853ba3519 Cleanup linking
I noticed during code review of zfsonlinux/zfs#4385 that the author of a
commit had peppered the various Makefile.am files with `$(TIRPC_LIBS)`
when putting it into `lib/libspl/Makefile.am` should have sufficed. Upon
further examination, it seems that he had copied what we do with
`$(ZLIB)`. We also have a bit of that with `-ldl` too.  Unfortunately,
what we do is wrong, so lets fix it to set a good example for future
contributors.

In addition, we have multiple `-lz` and `-luuid` passed to the compiler
because each `AC_CHECK_LIB` adds it to `$LIBS`. That is somewhat
annoying to see, so we switch to `AC_SEARCH_LIBS` to avoid it.  This is
consistent with the recommendation to use `AC_SEARCH_LIBS` over
`AC_CHECK_LIB` by autotools upstream:

https://www.gnu.org/software/autoconf/manual/autoconf-2.66/html_node/Libraries.html

In an ideal world, this would translate into improvements in ELF's
`DT_NEEDED` entries, but that is not the case because of a couple of
bugs in libtool.

The first bug causes libtool to overlink by using static link
dependencies for dynamic linking:

https://wiki.mageia.org/en/Overlinking_issues_in_packaging#libtool_issues

The workaround for this should be to pass `-Wl,--as-needed` in
`LDFLAGS`. That leads us to the second bug, where libtool passes
`LDFLAGS` after the libraries are specified and `ld` will only honor
`--as-needed` on libraries specified before it:

https://sigquit.wordpress.com/2011/02/16/why-asneeded-doesnt-work-as-expected-for-your-libraries-on-your-autotools-project/

There are a few possible workarounds for the second bug. One is to
either patch the compiler spec file to specify `-Wl,--as-needed` or pass
`-Wl,--as-needed` via `CC` like `CC='gcc -Wl,--as-needed'` so that it is
specified early. Another is to patch ltmain.sh like Gentoo does:

https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/ELT-patches/as-needed

Without one of those workarounds, this cleanup provides no benefit in
terms of `DT_NEEDED` entry generation. It should still be an improvement
because it nicely simplifies the code while encouraging good habits when
patching autotools scripts.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4426
2016-03-18 13:31:11 -07:00
Brian Behlendorf 6bb24f4dc7 Add the ZFS Test Suite
Add the ZFS Test Suite and test-runner framework from illumos.
This is a continuation of the work done by Turbo Fredriksson to
port the ZFS Test Suite to Linux.  While this work was originally
conceived as a stand alone project integrating it directly with
the ZoL source tree has several advantages:

  * Allows the ZFS Test Suite to be packaged in zfs-test package.
    * Facilitates easy integration with the CI testing.
    * Users can locally run the ZFS Test Suite to validate ZFS.
      This testing should ONLY be done on a dedicated test system
      because the ZFS Test Suite in its current form is destructive.
  * Allows the ZFS Test Suite to be run directly in the ZoL source
    tree enabled developers to iterate quickly during development.
  * Developers can easily add/modify tests in the framework as
    features are added or functionality is changed.  The tests
    will then always be in sync with the implementation.

Full documentation for how to run the ZFS Test Suite is available
in the tests/README.md file.

Warning: This test suite is designed to be run on a dedicated test
system.  It will make modifications to the system including, but
not limited to, the following.

  * Adding new users
  * Adding new groups
  * Modifying the following /proc files:
    * /proc/sys/kernel/core_pattern
    * /proc/sys/kernel/core_uses_pid
  * Creating directories under /

Notes:
  * Not all of the test cases are expected to pass and by default
    these test cases are disabled.  The failures are primarily due
    to assumption made for illumos which are invalid under Linux.
  * When updating these test cases it should be done in as generic
    a way as possible so the patch can be submitted back upstream.
    Most existing library functions have been updated to be Linux
    aware, and the following functions and variables have been added.
    * Functions:
      * is_linux          - Used to wrap a Linux specific section.
      * block_device_wait - Waits for block devices to be added to /dev/.
    * Variables:            Linux          Illumos
      * ZVOL_DEVDIR         "/dev/zvol"    "/dev/zvol/dsk"
      * ZVOL_RDEVDIR        "/dev/zvol"    "/dev/zvol/rdsk"
      * DEV_DSKDIR          "/dev"         "/dev/dsk"
      * DEV_RDSKDIR         "/dev"         "/dev/rdsk"
      * NEWFS_DEFAULT_FS    "ext2"         "ufs"
  * Many of the disabled test cases fail because 'zfs/zpool destroy'
    returns EBUSY.  This is largely causes by the asynchronous nature
    of device handling on Linux and is expected, the impacted test
    cases will need to be updated to handle this.
  * There are several test cases which have been disabled because
    they can trigger a deadlock.  A primary example of this is to
    recursively create zpools within zpools.  These tests have been
    disabled until the root issue can be addressed.
  * Illumos specific utilities such as (mkfile) should be added to
    the tests/zfs-tests/cmd/ directory.  Custom programs required by
    the test scripts can also be added here.
  * SELinux should be either is permissive mode or disabled when
    running the tests.  The test cases should be updated to conform
    to a standard policy.
  * Redundant test functionality has been removed (zfault.sh).
  * Existing test scripts (zconfig.sh) should be migrated to use
    the framework for consistency and ease of testing.
  * The DISKS environment variable currently only supports loopback
    devices because of how the ZFS Test Suite expects partitions to
    be named (p1, p2, etc).  Support must be added to generate the
    correct partition name based on the device location and name.
  * The ZFS Test Suite is part of the illumos code base at:
    https://github.com/illumos/illumos-gate/tree/master/usr/src/test

Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6
Closes #1534
2016-03-16 13:46:16 -07:00
Brian Behlendorf 7d11e37e55 Require libblkid
Historically libblkid support was detected as part of configure
and optionally enabled.  This was done because at the time support
for detecting ZFS pool vdevs had just be added to libblkid and
those updated packages were not yet part of many distributions.
This is no longer the case and any reasonably current distribution
will ship a version of libblkid which can detect ZFS pool vdevs.

This patch makes libblkid mandatory at build time and libblkid
the preferred method of scanning for ZFS pools.  For distributions
which include a modern version of libblkid there is no change in
behavior.  Explicitly scanning the default search paths is still
supported and can be enabled with the '-s' command line option.

Additionally making libblkid mandatory means that the 'zpool create'
command can reliably detect if a specified device has an existing
non-ZFS filesystem (ext4, xfs) and print a warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2448
2016-03-09 10:39:22 -08:00
Carlo Landmeter c53fb0113c Add support for alpine linux
Both Alpine Linux and Gentoo use OpenRC so we share its logic

Signed-off-by: Carlo Landmeter <clandmeter@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4386
2016-03-08 13:19:53 -08:00
Olaf Faaland c7e7ec1997 Make configure error clearer when failing to find SPL
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4251
2016-02-17 10:50:29 -08:00
Brian Behlendorf 4967a3eb9d Linux 4.5 compat: xattr list handler
The registered xattr .list handler was simplified in the 4.5 kernel
to only perform a permission check.  Given a dentry for the file it
must return a boolean indicating if the name is visible.  This
differs slightly from the previous APIs which also required the
function to copy the name in to the provided list and return its
size.  That is now all the responsibility of the caller.

This should be straight forward change to make to ZoL since we've
always required the caller to make the copy.  However, this was
slightly complicated by the need to support 3 older APIs.  Yes,
between 2.6.32 and 4.5 there are 4 versions of this interface!

Therefore, while the functional change in this patch is small it
includes significant cleanup to make the code understandable and
maintainable.  These changes include:

- Improved configure checks for .list, .get, and .set interfaces.
  - Interfaces checked from newest to oldest.
  - Strict checking for each possible known interface.
  - Configure fails when no known interface is available.
  - HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency
    with similar iops and fops configure checks.

- POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to
  move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}.
  Compatibility wrapper were added for old kernels.

- ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing
  ZPL_XATTR_{GET|SET} WRAPPERs.  Only the inode is guaranteed to be
  a valid pointer, passing NULL for the 'list' and 'name' variables
  is allowed and must be checked for.  All .list functions were
  updated to use the wrapper to aid readability.

- zpl_xattr_filldir() updated to use the .list function for its
  permission check which is consistent with the updated Linux 4.5
  interface.  If a .list function is registered it should return 0
  to indicate a name should be skipped, if there is no registered
  function the name will be added.

- Additional documentation from xattr(7) describing the correct
  behavior for each namespace was added before the relevant handlers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
2016-01-20 11:36:56 -08:00
Brian Behlendorf beeed4596b Linux 4.5 compat: get_link() / put_link()
The follow_link() interface was retired in favor of get_link().
In the process of phasing in get_link() the Linux kernel went
through two different versions.  The first of which depended
on put_link() and the final version on a delayed done function.

- Improved configure checks for .follow_link, .get_link, .put_link.
  - Interfaces checked from newest to oldest.
  - Strict checking for each possible known interface.
  - Configure fails when no known interface is available.

- Both versions .get_link are detected and supported as well
  two previous versions of .follow_link.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
2016-01-20 11:36:00 -08:00
George Wilson 59d4c71cca Illumos 3557, 3558, 3559, 3560
3557 dumpvp_size is not updated correctly when a dump zvol's size is changed
3558 setting the volsize on a dump device does not return back ENOSPC
3559 setting a volsize larger than the space available sometimes succeeds
3560 dumpadm should be able to remove a dump device
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Albert Lee <trisk@nexenta.com>

References:
  https://www.illumos.org/issues/3559
  https://github.com/illumos/illumos-gate/commit/c61ea56

Porting notes:
- Internal zvol.c changes not applied due to implementation differences.
  The external interface and behavior was already consistent with the
  latest upstream code.
- Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check.  All
  supported kernels (2.6.32 and newer) provide this interface.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4217
2016-01-15 15:38:35 -08:00
Kamil Domański 76d5bf196c Skip GPL-only symbols test when cross-compiling
Signed-off-by: Kamil Domański <kamil@domanski.co>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4107
2015-12-18 13:46:23 -08:00
Kamil Domanski e0ed96fa43 Skip GPL-only symbols test when cross-compiling
Signed-off-by: Kamil Domański <kamil@domanski.co>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#507
Closes zfsonlinux/zfs#4075
2015-12-14 10:40:33 -08:00
Brian Behlendorf cb877e0ff2 Revert "Skip GPL-only symbols test when cross-compiling"
This reverts commit 61bbbd9a77 because
older versions of autoconf (2.63) do not support the cross-compile
argument to AC_RUN_IFELSE.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #507
2015-12-11 17:07:00 -08:00
Kamil Domański 61bbbd9a77 Skip GPL-only symbols test when cross-compiling
This test depends on being able to execute the resulting binary
which will be impossible when cross-compiling.  Instead make a
worst case assumption which allows the build to continue as
recommended by the autoconf manual.

https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Runtime.html

Signed-off-by: Kamil Domański <kamil@domanski.co>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: tuxoko <tuxoko@gmail.com>
Closes zfsonlinux/spl#507
Closes zfsonlinux/zfs#4075
2015-12-11 15:27:56 -08:00
Brian Behlendorf b58986eebf Use large stacks when available
While stack size will vary by architecture it has historically defaulted to
8K on x86_64 systems.  However, as of Linux 3.15 the default thread stack
size was increased to 16K.  These kernels are now the default in most non-
enterprise distributions which means we no longer need to assume 8K stacks.

This patch takes advantage of that fact by appropriately reverting stack
conservation changes which were made to ensure stability.  Changes which
may have had a negative impact on performance for certain workloads.  This
also has the side effect of bringing the code slightly more in line with
upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4059
2015-12-07 12:20:43 -08:00
Chunwei Chen 61d482f7cd Linux 4.4 compat: xattr operations takes xattr_handler
The xattr_hander->{list,get,set} were changed to take a xattr_handler,
and handler_flags argument was removed and should be accessed by
handler->flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4021
2015-12-01 16:48:25 -08:00
Chunwei Chen 1a09371678 Linux 4.4 compat: make_request_fn returns blk_qc_t
As part of block polling support in Linux 4.4, make_request_fn should
return a cookie value of type blk_qc_t. For now, we make zvol_request
always return BLK_QC_T_NONE until we assess whether and how we want
to support block polling.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4021
2015-12-01 16:48:08 -08:00
Brian Behlendorf 5592404784 Fix synchronous behavior in __vdev_disk_physio()
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio
based on the ZIO_PRIORITY_* flag passed in.  This had the unnoticed
side-effect of making the vdev_disk_io_start() synchronous for
certain I/Os.

This in turn resulted in vdev_disk_io_start() being able to
re-dispatch zio's which would result in a RCU stalls when a disk
was removed from the system.  Additionally, this could negatively
impact performance and explains the performance regressions reported
in both #3829 and #3780.

This patch resolves the issue by making the blocking behavior
dependent on a 'wait' flag being passed rather than overloading
the passed bio flags.

Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to
non-rotational devices where there is no benefit to queuing to
aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3652
Issue #3780
Issue #3785
Issue #3817
Issue #3821
Issue #3829
Issue #3832
Issue #3870
2015-09-25 12:47:31 -07:00
Lukas Wunner 784a7fe5d9 Linux 4.3 compat: bio_end_io_t / BIO_UPTODATE
Commit torvalds/linux@4246a0b63b
("block: add a bi_error field to struct bio") dropped the error
argument from bio_endio in favor of newly introduced bio->bi_error.
This also replaces bio->bi_flags value BIO_UPTODATE.

bio_endio was a 3 argument function until Linux 2.6.24, which made it
a 2 argument function, and now the prototype has changed yet again to
a 1 argument function. Support for pre 2.6.24 kernels was already
dropped with 37f9dac592 ("zvol processing should use struct bio")
which assumed the 2 argument version in zvol_request(). Remaining code
to support the 3 argument version is hereby removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Issue #3799
2015-09-25 12:44:54 -07:00
Richard Yao 8198d18ca7 Reintroduce IO accounting on zvols on Linux 3.19+
zfsonlinux/zfs@e20cd6f7a8 caused us to
lose IO accounting on zvols. When I originally wrote that last year, the
symbols we needed to maintain IO accounting were GPL exported, but
torvalds/linux@394ffa503b provided
suitable symbols for restoring this functionality 4 months later.  We
can call them to restore the IO accounting on Linux 3.19 and later as
well as any older kernels where that patch is backported.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3741
2015-09-09 09:29:24 -07:00
Richard Yao d60328645d Remove blk_queue_nonrot() autotools check
This autotools check was never needed because we can check for the
existence of QUEUE_FLAG_NONROT in the kernel headers.

Also, the comment in config/kernel-blk-queue-nonrot.m4 is incorrect.
This was a Linux 2.6.28 API change, not a Linux 2.6.27 API change.

Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:25 -04:00
Richard Yao d677203a9b Remove blk_queue_discard() autotools check
This autotools check was never needed because we can check for the
existence of QUEUE_FLAG_DISCARD in the kernel headers.

Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:25 -04:00
Richard Yao 7d6e2adb4e Remove blk_rq_bytes()/blk_rq_sectors autotools checks
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao f952eaa7ec Remove blk_rq_pos() autotools check
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao f8c56b405d Remove blk_fetch_request() autotools check
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao e8c6be131c Remove blk_requeue_request() autotools check
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao dd6f9fe61b Remove blk_end_request() autotools check.
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao 65f340e725 Remove rq_is_sync() autotools check
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao 9ddf9b8e15 Remove rq_for_each_segment() autotools check
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:37:24 -04:00
Richard Yao 37f9dac592 zvol processing should use struct bio
Internally, zvols are files exposed through the block device API. This
is intended to reduce overhead when things require block devices.
However, the ZoL zvol code emulates a traditional block device in that
it has a top half and a bottom half. This is an unnecessary source of
overhead that does not exist on any other OpenZFS platform does this.
This patch removes it. Early users of this patch reported double digit
performance gains in IOPS on zvols in the range of 50% to 80%.

Comments in the code suggest that the current implementation was done to
obtain IO merging from Linux's IO elevator. However, the DMU already
does write merging while arc_read() should implicitly merge read IOs
because only 1 thread is permitted to fetch the buffer into ARC. In
addition, commercial ZFSOnLinux distributions report that regular files
are more performant than zvols under the current implementation, and the
main consumers of zvols are VMs and iSCSI targets, which have their own
elevators to merge IOs.

Some minor refactoring allows us to register zfs_request() as our
->make_request() handler in place of the generic_make_request()
function. This eliminates the layer of code that broke IO requests on
zvols into a top half and a bottom half. This has several benefits:

1. No per zvol spinlocks.
2. No redundant IO elevator processing.
3. Interrupts are disabled only when actually necessary.
4. No redispatching of IOs when all taskq threads are busy.
5. Linux's page out routines will properly block.
6. Many autotools checks become obsolete.

An unfortunate consequence of eliminating the layer that
generic_make_request() is that we no longer calls the instrumentation
hooks for block IO accounting. Those hooks are GPL-exported, so we
cannot call them ourselves and consequently, we lose the ability to do
IO monitoring via iostat.  Since zvols are internally files mapped as
block devices, this should be okay. Anyone who is willing to accept the
performance penalty for the block IO layer's accounting could use the
loop device in between the zvol and its consumer. Alternatively, perf
and ftrace likely could be used. Also, tools like latencytop will still
work. Tools such as latencytop sometimes provide a better view of
performance bottlenecks than the traditional block IO accounting tools
do.

Lastly, if direct reclaim occurs during spacemap loading and swap is on
a zvol, this code will deadlock. That deadlock could already occur with
sync=always on zvols. Given that swap on zvols is not yet production
ready, this is not a blocker.

Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-09-04 15:30:24 -04:00
Richard Yao 97771edaca Remove blk_queue_io_opt() autotools check
This is needed for supporting kernels earlier than 2.6.30. Support for
those kernels was dropped, so we can safely remove this check.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-09-01 09:33:18 -07:00
Richard Yao 3c119330a6 Remove blk_queue_physical_block_size() autotools check
This is needed for supporting kernels earlier than 2.6.30. Support for
those kernels was dropped, so we can safely remove this check.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-09-01 09:33:18 -07:00
Brian Behlendorf 278bee9319 Linux 3.18 compat: Snapshot auto-mounting
Re-factor the .zfs/snapshot auto-mouting code to take in to account
changes made to the upstream kernels.  And to lay the groundwork for
enabling access to .zfs snapshots via NFS clients.  This patch makes
the following core improvements.

* All actively auto-mounted snapshots are now tracked in two global
trees which are indexed by snapshot name and objset id respectively.
This allows for fast lookups of any auto-mounted snapshot regardless
without needing access to the parent dataset.

* Snapshot entries are added to the tree in zfsctl_snapshot_mount().
However, they are now removed from the tree in the context of the
unmount process.  This eliminates the need complicated error logic
in zfsctl_snapshot_unmount() to handle unmount failures.

* References are now taken on the snapshot entries in the tree to
ensure they always remain valid while a task is outstanding.

* The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right
after the auto-mount succeeds.  This allows to kernel to unmount
idle auto-mounted snapshots if needed removing the need for the
zfsctl_unmount_snapshots() function.

* Snapshots in active use will not be automatically unmounted.  As
long as at least one dentry is revalidated every zfs_expire_snapshot/2
seconds the auto-unmount expiration timer will be extended.

* Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS
to be immediately unmounted when the dentry was revalidated.  This
was a consequence of ZFS invaliding all snapdir dentries to ensure that
negative dentries didn't mask new snapshots.  This patch modifies the
behavior such that only negative dentries are invalidated.  This solves
the issue and may result in a performance improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3589
Closes #3344
Closes #3295
Closes #3257
Closes #3243
Closes #3030
Closes #2841
2015-08-31 13:54:39 -07:00
Chunwei Chen 17888ae30d Add compatibility layer for {kmap,kunmap}_atomic
Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2.
We use zfs_{kmap,kunmap}_atomic as wrappers and always take 2 argument, but
ignore the 2nd for newer kernel.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-08-24 10:13:25 -07:00
Chris Dunlop 302f31ffc7 Linux 4.1 compat: configure bdi_setup_and_register()
Pull struct backing_dev_info off the stack: by linux-4.1 it's grown
past our 1024 byte stack frame warning limit resulting in an incorrect
configure result.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #3671
2015-08-18 16:43:04 -07:00
Turbo Fredriksson 48511ea645 Fix some minor issues with the SYSV init and initramfs scripts.
This is some minor fixes to commits 2cac7f5f11
and 2a34db1bdb.

* Make sure to alien'ate the new initramfs rpm package as well!
  The rpm package is build correctly, but alien isn't run on it to
  create the deb.
* Before copying file from COPY_FILE_LIST, make sure the DESTDIR/dir exists.
* Include /lib/udev/vdev_id file in the initrd.
* Because the initrd needs to use '/sbin/modprobe' instead of 'modprobe',
  we need to use this in load_module() as well.
  * Make sure that load_module() can be used more globaly, instead of
    calling '/sbin/modprobe' all over the place.
  * Make sure that check_module_loaded() have a parameter - module to
    check.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3626
2015-07-24 15:05:33 -07:00
Brian Behlendorf 9eb361aaa5 Default to --disable-debug-kmem
The default kmem debugging (--enable-debug-kmem) can severely impact
performance on large-scale NUMA systems due to the atomic operations
used in the memory accounting. A 32-thread fio test running on a
40-core 80-thread system and performing 100% cached reads with kmem
debugging is:

Enabled:
READ: io=177071MB, aggrb=2951.2MB/s, minb=2951.2MB/s, maxb=2951.2MB/s,

Disabled:
READ: io=271454MB, aggrb=4524.4MB/s, minb=4524.4MB/s, maxb=4524.4MB/s,

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issues #463
2015-07-21 11:47:10 -07:00
Turbo Fredriksson 47a4a6fd5f Support parallel build trees (VPATH builds)
Build products from an out of tree build should be written
relative to the build directory.  Sources should be referred
to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile.  This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure \
    --with-spl=$HOME/src/git/spl/ \
    --with-spl-obj=$HOME/src/git/spl/build
  $ make -s

This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1082
2015-07-17 13:42:51 -07:00
Turbo Fredriksson 37d7cd94f3 Support parallel build trees (VPATH builds)
Build products from an out of tree build should be written
relative to the build directory.  Sources should be referred
to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile.  This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure
  $ make -s

This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1082
2015-07-17 12:53:11 -07:00
Brian Behlendorf bd29109f1a Linux 4.2 compat: follow_link() / put_link()
As of Linux 4.2 the kernel has completely retired the nameidata
structure.  One of the few remaining consumers of this interface
were the follow_link() and put_link() callbacks.

This patch adds the required checks to configure to detect the
interface change and updates the functions accordingly.  Migrating
to the simple_follow_link() interface was considered but was decided
against ironically due to the increased complexity.

It also should be noted that the kernel follow_link() and put_link()
interfaces changes several times after 4.1 and but before 4.2.  This
means there is a narrow range of kernel commits which never appear
in an official tag of the Linux kernel which ZoL will not build.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3596
2015-07-17 09:18:16 -07:00
Brian Behlendorf c2d17fd891 Disable gcc bool-compare warning
As of gcc version 5.1.1 a new boolean comparison warning has been
introduced.  This warning is harmless but is triggered several places
in the ZFS code base.  Because warnings are promoted to errors when
building with debugging enabled it is necessary to disable the warning
when using versions of gcc which automatically enabling this check.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-13 12:55:26 -07:00
Turbo Fredriksson 2cac7f5f11 Initramfs scripts for ZoL.
* Supports booting of a ZFS snapshot.
  Do this by cloning the snapshot into a dataset. If this, the resulting
  dataset, already exists, destroy it. Then mount it on root.
  * If snapshot does not exist, use base dataset (the part before '@')
    as boot filesystem instead.
  * If no snapshot is specified on the 'root=' kernel command line, but there
    is an '@', then get a list of snapshots below that filesystem and ask the
    user which to use.
  * Clone with 'mountpoint=none' and 'canmount=noauto' - we mount manually
    and explicitly.
    * For sub-filesystems, that doesn't have a mountpoint property set, we use
      the 'org.zol:mountpoint' to keep track of it's mountpoint.
  * Allow rollback of snapshots instead of clone it and boot from the clone.
* Allow mounting a root- and subfs with mountpoint=legacy set
* Allow mounting a filesystem which is using nativ encryption.
* Support all currently used kernel command line arguments
  All the different distributions have their own standard on what to specify
  on the kernel command line to boot of a ZFS filesystem.
  * Extra options:
    * zfsdebug=(on,yes,1)	Show extra debugging information
    * zfsforce=(on,yes,1)	Force import the pool
    * rollback=(on,yes,1)	Rollback (instead of clone) the snapshot
* Only try to import pool if it haven't already been imported
  * This will negate the need to force import a pool that have not been exported cleanly.
  * Support exclusion of pools to import by setting ZFS_POOL_EXCEPTIONS in /etc/default/zfs.
* Support additional configuration variable ZFS_INITRD_ADDITIONAL_DATASETS
  to mount additional filesystems not located under your root dataset.
* Include /etc/modprobe.d/{zfs,spl}.conf in the initrd if it/they exist.
* Include the udev rule to use by-vdev for pool imports.
* Include the /etc/default/zfs file to the initrd.
* Only try /dev/disk/by-* in the initrd if USE_DISK_BY_ID is set.
  * Use /dev/disk/by-vdev before anything.
  * Add /dev as a last ditch attempt.
  * Fallback to using the cache file if that exist if nothing else worked.
* Use /sbin/modprobe instead of built-in (BusyBox) modprobe.
  This gets rid of the message "modprobe: can't load module zcommon".
  Thanx to pcoultha for finding this.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2116
Closes #2114
2015-07-08 18:14:34 -07:00
Brian Behlendorf 218b4e0a76 Add zfs_sb_prune_aliases() function
For kernels which do not implement a per-suberblock shrinker,
those older than Linux 3.1, the shrink_dcache_parent() function
was used to attempt to reclaim dentries.  This was found not be
entirely reliable and could lead to performance issues on older
kernels running meta-data heavy workloads.

To address this issue a zfs_sb_prune_aliases() function has been
added to implement this functionality.  It relies on traversing
the list of znodes for a filesystem and adding them to a private
list with a reference held.  The private list can then be safely
walked outside the z_znodes_lock to prune dentires and drop the
last reference so the inode can be freed.

This provides the same synchronous behavior as the per-filesystem
shrinker and has the advantage of depending on only long standing
interfaces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #3501
2015-06-22 10:22:49 -07:00
Matus Kral 57ae840077 Linux 4.1 compat: use read_iter() / write_iter()
Linux 3.15 commit torvalds/linux@293bc98 introduced two new methods.
The ->read_iter() and ->write_iter() methods were designed to replace
the ->aio_read() and ->aio_write() interfaces.  Both interfaces were
preserved for several kernel releases in order to migrate all existing
consumers to the new interfaces.  But as of Linux 4.1 the legacy
interface has been retired and the ZFS code must be updated to use
the new interfaces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3352
2015-06-18 12:06:59 -07:00
Tim Chase 90947b2357 3.12 compat, NUMA-aware per-superblock shrinker
Kernels >= 3.12 have a NUMA-aware superblock shrinker which is used in
ZoL by zfs_sb_prune().  This patch calls the shrinker for each on-line
NUMA node in order that memory be freed for each one.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3495
2015-06-17 10:43:13 -07:00
Brian Behlendorf 86c16c59fe Retire rwsem_is_locked() compat
Stock Linux 2.6.32 and earlier kernels contained a broken version of
rwsem_is_locked() which could return an incorrect value.  Because of
this compatibility code was added to detect the broken implementation
and replace it with our own if needed.

The fix for this issue was merged in to the mainline Linux kernel as
of 2.6.33 and the major enterprise distributions based on 2.6.32 have
all backported the fix.  Therefore there is no longer a need to carry
this code and it can be removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #454
2015-06-10 16:35:48 -07:00
Turbo Fredriksson 2a34db1bdb Base init scripts for SYSV systems
* Based on the init scripts included with Debian GNU/Linux, then take code
  from the already existing ones, trying to merge them into one set of
  scripts that will work for 'everyone' for better maintainability.
  * Add configurable variables to control the workings of the init scripts:
    * ZFS_INITRD_PRE_MOUNTROOT_SLEEP
      Set a sleep time before we load the module (used primarily by initrd
      scripts to allow for slower media (such as  USB devices etc) to be
      availible before we load the zfs module).
    * ZFS_INITRD_POST_MODPROBE_SLEEP
      Set a timed sleep in the initrd to after the load of the zfs module.
    * ZFS_INITRD_ADDITIONAL_DATASETS
      To allow for mounting additional datasets in the initrd. Primarily used
      in initrd scripts to allow for when filesystem needed to boot (such as
      /usr, /opt, /var etc) isn't directly under the root dataset.
    * ZFS_POOL_EXCEPTIONS
      Exclude pools from being imported (in the initrd and/or init scripts).
    * ZFS_DKMS_ENABLE_DEBUG, ZFS_DKMS_ENABLE_DEBUG_DMU_TX, ZFS_DKMS_DISABLE_STRIP
      Set to control how dkms should build the dkms packages.
    * ZPOOL_IMPORT_PATH
      Set path(s) where "zpool import" should import pools from.
      This was previously the job of "USE_DISK_BY_ID" (which is still used
      for backwards compatibility) but was renamed to allow for better
      control of import path(s).
      * If old USE_DISK_BY_ID is set, but not new ZPOOL_IMPORT_PATH, then we
        set ZPOOL_IMPORT_PATH to sane defaults just to be on the safe side.
    * ZED_ARGS
      To allow for local options to zed without having to change the init script.
  * The import function, do_import(), imports pools by name instead of '-a'
    for better control of pools to import and from where.
    * If USE_DISK_BY_ID is set (for backwards compatibility), but isn't 'yes'
      then ignore it.
    * If pool(s) isn't found with a simple "zpool import" (seen it happen),
      try looking for them in /dev/disk/by-id (if it exists). Any duplicates
      (pools found with both commands) is filtered out.
      * IF we have found extra pool(s) this way, we must force USE_DISK_BY_ID
        so that the first, simple "zpool import $pool" is able to find it.
    * Fallback on importing the pool using the cache file (if it exists) only
      if 'simple' import (either with ZPOOL_IMPORT_PATH or the 'built in'
      defaults) didn't work.
  * The export function, do_export(), will export all pools imported, EXCEPT
    the root pool (if there is one).
  * ZED script from the Debian GNU/Linux packages added.
    * Refreshed ZED init script from behlendorf@5e7a660 to be portable so it
      may be used on both LSB and Redhat style systems.
    * If there is no pool(s) imported and zed successfully shut down, we will
      unload the zfs modules.
  * The function library file for the ZoL init script is installed as
    /etc/init.d/zfs-functions.
  * The four init scripts, the /etc/{defaults,sysconfig,conf.d}/zfs config file
    as well as the common function library is tagged as '%config(noreplace)' in
    the rpm rules file to make sure they are not replaced automatically if locally
    modifed.
  * Pitfals and workarounds:
    * If we're running from init, remove stale /etc/dfs/sharetab before importing
      pools in the zfs-import init script.
    * On Debian GNU/Linux, there's a 'sendsigs' script that will kill basically
      everything quite early in the shutdown phase and zed is/should be stopped
      much later than that. We don't want zed to be among the ones killed, so add
      the zed pid to list of pids for 'sendsigs' to ignore.
    * CentOS uses echo_success() and echo_failure() to print out status of
      command. These in turn uses "echo -n \0xx[etc]" to move cursor and choose
      colour etc. This doesn't work with the modified IFS variable we need to
      use in zfs-import for some reason, so work around that when we define
      zfs_log_{end,failure}_msg() for RedHat and derivative distributions.
  * All scripts passes ShellCheck (with one false positive in do_mount()).

Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Chris Dunlap <cdunlap@llnl.gov>
Closes #2974
Closes #2107
2015-05-28 14:14:53 -07:00
Turbo Fredriksson 01fcbec52d The mount helper mount.zfs MUST be in /sbin (not '$sbindir').
Commit 60e9f69 added the --with-mounthelperdir option for Gentoo
and in the process accidentally modified the default installation
location.  For security reasons mount(8) expects it to only be
installed under /sbin.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3426
2015-05-18 16:54:36 -07:00
Tim Chase e48533383b Linux 2.6.36 compat, use REQ_FAILFAST_MASK and remove pre-2.6.36 support
Commit f4af6bb783 which added support
for REQ_FAILFAST_MASK but the new autoconf test didn't use the same
preprocessor macro name as the code did.

The effect is that FAILFAST mode has not been enabled for ZoL in any
post-2.6.35 kernel.

Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels.

Raise an error condition if the FAILFAST interface can't be detected.

Signed-off-by: Tim Chase <tim@onlight.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3386
2015-05-11 15:07:00 -07:00
Brian Behlendorf fade6b00b6 Add RHEL style kmod packages
Provide a Redhat specific spl-kmod.spec file which uses the old style
kmods (not kmods2) packaging.  By using the provided kmodtool script
packages can be built which support weak modules.  This allows for the
kernel to be updated without having to rebuild the SPL kernel modules.

Packages for RHEL/Centos/SL/TOSS which use this spec file can by built
as follows:

$ ./configure --with-spec=redhat
$ make rpms

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-03-27 14:42:04 -07:00
Brian Behlendorf ee2ca1db28 Add RHEL style kmod packages
Provide a Redhat specific zfs-kmod.spec file which uses the old style
kmods (not kmods2) packaging.  By using the provided kmodtool script
packages can be built which support weak modules.  This allows for the
kernel to be updated without having to rebuild the ZFS kernel modules.

Packages for RHEL/Centos/SL/TOSS which use this spec file can by built
as follows:

$ ./configure --with-spec=redhat
$ make rpms

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-03-27 14:41:48 -07:00
Brian Behlendorf d820d2e9cf Remove rpm/fedora directory
Originally it was thought that custom spec files might be required
for Fedora.  Happily that has turns out not to be the case.  Since
this directory just contains symlinks to the generic spec files it
can be removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-03-27 14:30:58 -07:00
Brian Behlendorf 72998c2c9d Remove rpm/fedora directory
Originally it was thought that custom spec files might be required
for Fedora.  Happily that has turns out not to be the case.  Since
this directory just contains symlinks to the generic spec files it
can be removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-03-27 14:22:38 -07:00
Tim Chase abb642b9a9 Set HAVE_FS_STRUCT_SPINLOCK correctly when CONFIG_FRAME_WARN==1024
If kernel lock debugging is enabled, the fs_struct structure exceeds the
typical 1024 byte limit of CONFIG_FRAME_WARN and isn't enabled when it
otherwise should be.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #440
2015-03-24 13:25:25 -07:00
Bill McGonigle e023409500 Linux 4.0 compat: bdi_setup_and_register() __must_check
Explicitly disable the unused by variable warnings by setting
__attribute__((unused)) for bdi_setup_and_register().  This is
required because the function is defined with the __must_check
attribute.

Signed-off-by: Bill McGonigle <bill-github.com-public1@bfccomputing.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3141
2015-03-16 10:56:26 -07:00
Brian Behlendorf 8c45def24a Linux 4.0 compat: bdi_setup_and_register()
The 'capabilities' argument which was passed to bdi_setup_and_register()
has been removed.  File systems should no longer pass BDI_CAP_MAP_COPY.
For our purposes this means there are now three different interfaces
which must be handled.  A zpl_bdi_setup_and_register() wrapper function
has been introduced to provide a single interface to the ZPL code.

* 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
* 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
* 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.

I've also taken this opportunity to remove HAVE_BDI because kernels
older then 2.6.32 are no longer supported.  All kernels newer than
this will have one of the above interfaces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #3128
2015-03-03 10:49:45 -08:00
Brian Behlendorf 5f920fbee1 Retire MUTEX_OWNER checks
To minimize the size of a kmutex_t a MUTEX_OWNER check was added.
It allowed the kmutex_t wrapper to leverage the mutex owner which was
already stored in the mutex for certain kernel configurations.

The upside to this was that it reduced the size of the kmutex_t wrapper
structure by the size of a task_struct pointer (4/8 bytes).  The
downside was that two mutex implementations needed to be maintained.
Depending on your exact kernel configuration the correct one would
be selected.

Over the years this solution worked but it could be fragile since it
depending heavily on assumed kernel mutex implementation details.  For
example the SPL_AC_MUTEX_OWNER_TASK_STRUCT configure check needed to
be added when the kernel changed how the owner was stored.  It also
made the code more complicated than it needed to be.

Therefore, in the name of simplicity and portability this optimization
is being retired.  It will slightly increase the memory requirements
for a kmutex_t but only very slightly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #435
2015-03-03 10:13:33 -08:00
Jörg Thalheim 534759fad3 Linux 3.19 compat: file_inode was added
struct access f->f_dentry->d_inode was replaced by accessor function
file_inode(f)

Signed-off-by: Joerg Thalheim <joerg@higgsboson.tk>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3084
2015-02-10 11:24:51 -08:00
Ned Bass 4e30e68caf Don't use AC_LANG_SOURCE for conftest.h source
Using AC_LANG_SOURCE with some versions of autoconf is problematic if
the given source is to be written to a header file. Such versions assume
the contents are to be written to conftest.c and generate shell code to
that effect. The contents of the test program to detect support for
Linux tracepoints were consequently malformed (containing the source for
conftest.h) so the build system incorrectly disabled tracepoints
support. Fix this in ZFS_LINUX_TRY_COMPILE_HEADER by passing the header
source directly to ZFS_LINUX_COMPILE_IFELSE.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2953
2015-01-06 16:53:30 -08:00
Brian Behlendorf 8d9a23e82c Retire legacy debugging infrastructure
When the SPL was originally written Linux tracepoints were still
in their infancy.  Therefore, an entire debugging subsystem was
added to facilite tracing which served us well for many years.

Now that Linux tracepoints have matured they provide all the
functionality of the previous tracing subsystem.  Rather than
maintain parallel functionality it makes sense to fully adopt
tracepoints.  Therefore, this patch retires the legacy debugging
infrastructure.

See zfsonlinux/zfs@bc9f413 for the tracepoint changes.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #408
2014-11-19 10:35:07 -08:00
Prakash Surya 0b39b9f96f Swap DTRACE_PROBE* with Linux tracepoints
This patch leverages Linux tracepoints from within the ZFS on Linux
code base. It also refactors the debug code to bring it back in sync
with Illumos.

The information exported via tracepoints can be used for a variety of
reasons (e.g. debugging, tuning, general exploration/understanding,
etc). It is advantageous to use Linux tracepoints as the mechanism to
export this kind of information (as opposed to something else) for a
number of reasons:

    * A number of external tools can make use of our tracepoints
      "automatically" (e.g. perf, systemtap)
    * Tracepoints are designed to be extremely cheap when disabled
    * It's one of the "accepted" ways to export this kind of
      information; many other kernel subsystems use tracepoints too.

Unfortunately, though, there are a few caveats as well:

    * Linux tracepoints appear to only be available to GPL licensed
      modules due to the way certain kernel functions are exported.
      Thus, to actually make use of the tracepoints introduced by this
      patch, one might have to patch and re-compile the kernel;
      exporting the necessary functions to non-GPL modules.

    * Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux
      tracepoints are not available for unsigned kernel modules
      (tracepoints will get disabled due to the module's 'F' taint).
      Thus, one either has to sign the zfs kernel module prior to
      loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or
      newer.

Assuming the above two requirements are satisfied, lets look at an
example of how this patch can be used and what information it exposes
(all commands run as 'root'):

    # list all zfs tracepoints available

    $ ls /sys/kernel/debug/tracing/events/zfs
    enable              filter              zfs_arc__delete
    zfs_arc__evict      zfs_arc__hit        zfs_arc__miss
    zfs_l2arc__evict    zfs_l2arc__hit      zfs_l2arc__iodone
    zfs_l2arc__miss     zfs_l2arc__read     zfs_l2arc__write
    zfs_new_state__mfu  zfs_new_state__mru

    # enable all zfs tracepoints, clear the tracepoint ring buffer

    $ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable
    $ echo 0 > /sys/kernel/debug/tracing/trace

    # import zpool called 'tank', inspect tracepoint data (each line was
    # truncated, they're too long for a commit message otherwise)

    $ zpool import tank
    $ cat /sys/kernel/debug/tracing/trace | head -n35
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 1219/1219   #P:8
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
            lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr...
          z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr...
          z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr...
          z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ...
            lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ...
            lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ...
            lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr...
          z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr...
          z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru...
            lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr...
          z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru...
            lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr...
          z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru...
            lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ...
            lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ...

To highlight the kind of detailed information that is being exported
using this infrastructure, I've taken the first tracepoint line from the
output above and reformatted it such that it fits in 80 columns:

    lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss:
        hdr {
            dva 0x1:0x40082
            birth 15491
            cksum0 0x163edbff3a
            flags 0x640
            datacnt 1
            type 1
            size 2048
            spa 3133524293419867460
            state_type 0
            access 0
            mru_hits 0
            mru_ghost_hits 0
            mfu_hits 0
            mfu_ghost_hits 0
            l2_hits 0
            refcount 1
        } bp {
            dva0 0x1:0x40082
            dva1 0x1:0x3000e5
            dva2 0x1:0x5a006e
            cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00
            lsize 2048
        } zb {
            objset 0
            object 0
            level -1
            blkid 0
        }

For the specific tracepoint shown here, 'zfs_arc__miss', data is
exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and
zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!).
This kind of precise and detailed information can be extremely valuable
when trying to answer certain kinds of questions.

For anybody unfamiliar but looking to build on this, I found the XFS
source code along with the following three web links to be extremely
helpful:

    * http://lwn.net/Articles/379903/
    * http://lwn.net/Articles/381064/
    * http://lwn.net/Articles/383362/

I should also node the more "boring" aspects of this patch:

    * The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to
       support a sixth paramter. This parameter is used to populate the
       contents of the new conftest.h file. If no sixth parameter is
       provided, conftest.h will be empty.

    * The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced.
      This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro,
      except it has support for a fifth option that is then passed as
      the sixth parameter to ZFS_LINUX_COMPILE_IFELSE.

These autoconf changes were needed to test the availability of the Linux
tracepoint macros. Due to the odd nature of the Linux tracepoint macro
API, a separate ".h" must be created (the path and filename is used
internally by the kernel's define_trace.h file).

    * The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This
      is to determine if we can safely enable the Linux tracepoint
      functionality. We need to selectively disable the tracepoint code
      due to the kernel exporting certain functions as GPL only. Without
      this check, the build process will fail at link time.

In addition, the SET_ERROR macro was modified into a tracepoint as well.
To do this, the 'sdt.h' file was moved into the 'include/sys' directory
and now contains a userspace portion and a kernel space portion. The
dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as
well.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-11-17 11:13:55 -08:00
Marcel Wysocki 7f118e836e Add config/compile to config/.gitignore
This file may be added by automake and therefore should be added
to config/.gitignore.  For the full list of possible auxiliary
programs see the full automake documentation.

http://www.gnu.org/software/automake/manual/automake.html#Auxiliary-Programs

Signed-off-by: Marcel Wysocki <maci.stgn@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-31 16:26:44 -07:00
Marcel Wysocki 11662bf969 Add config/compile to config/.gitignore
This file may be added by automake and therefore should be added
to config/.gitignore.  For the full list of possible auxiliary
programs see the full automake documentation.

http://www.gnu.org/software/automake/manual/automake.html#Auxiliary-Programs

Signed-off-by: Marcel Wysocki <maci.stgn@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2848
2014-10-31 16:25:34 -07:00
Richard Yao b76707027c Make systemd-modules-load.service file directory configurable
Installing outside of the prefix is not permissible under Gentoo Prefix.
The package manager will cause the installation process to fail if/when
it sees this. We could handle this by disabling systemd support on
prefix because systemd does not check these paths, but the Gentoo
Council decided that small files such as these should be installed.
That means disabling systemd support on prefix is not an acceptable
workaround. As a consequence, we need some way of control the directory
into which these files are installed.

Making this configurable increases our compliance with the
freedesktop.org specification, which allows these files to be installed
into /etc/modules-load.d:

http://www.freedesktop.org/software/systemd/man/modules-load.d.html

Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2641
2014-10-28 09:41:14 -07:00
Richard Yao 60e9f69c97 Make directory into which mount.zfs is installed configurable
Installing outside of the prefix is not permissible under Gentoo Prefix.
The package manager will cause the installation process to fail if/when
it sees this. I could script a workaround inside the ebuild, but it
seemed to make more sense to make this more configurable.

Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2641
2014-10-28 09:40:59 -07:00
Richard Yao d8d7826721 Search /usr/local/src for SPL Object Directory
Since we changed the default location for the kernel headers to respect
--prefix in the SPL, we must search that location to prevent user builds
from breaking.

Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2641
2014-10-28 09:37:23 -07:00
Brian Behlendorf dcf91382b9 Remove vfs_fsync() wrapper
The vfs_fsync() function has been available since Linux 2.6.29.
There is no longer a need to maintain this compatibility code.
However, the HAVE_2ARGS_VFS_FSYNC check was left in place
since that change occured after 2.6.32.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf 599662c538 Remove kern_path() wrapper
The kern_path() function has been available since Linux 2.6.28.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf 3d5392cefa Remove kvasprintf() wrapper
The kvasprintf() function has been available since Linux 2.6.22.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf 0fac9c9e6d Remove proc_handler() wrapper
As of Linux 2.6.32 the proc handlers where updated to expect only
five arguments.  Therefore there is no longer a need to maintain
this compatibility code and this infrastructure can be simplified.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:52 -07:00
Brian Behlendorf e03119e86f Update put_task_struct() comments
Update the comments to correctly reflect when this interface was
added.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 68a829b29d Remove credential configure checks.
The groups_search() function was never exported by a mainline kernel
therefore we drop this compatibility code and always provide our own
implementation.

Additionally, the cred_t structure has been available since 2.6.29
so there is no longer a need to maintain compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf e39174ed56 Add vfs_unlink() and vfs_rename() comments
Just for consistency with the other autoconf checks a small comment
block was added before these checks.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 137af025f6 Remove set_fs_pwd() configure check
This function has never been exported by any mainline and was only
briefly available under RHEL5.  Therefore this check is being removed
and the code update to always use the wrapper function.

The next step will be to eliminate all this code.  If ZFS were updated
not to assume that it's pwd was / there would be no need for this.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 3c49a16989 Remove user_path_dir() wrapper
The user_path_dir() function has been available since Linux 2.6.27.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 44778f4110 Remove kallsyms_lookup_name() wrapper
After the removable of get_vmalloc_info(), the unused global memory
variables, and the optional dcache/icache shrinkers there is no
longer a need for the kallsyms compatibility code.  This allows
us to eliminate another brittle area of the code by removing the
kernel upcall this functionality depended on for older kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 89a461e70c Remove shrink_{i,d}node_cache() wrappers
This is optional functionality which may or may not be useful to
ZFS when using older kernels.  It is never a hard requirement.
Therefore this functionality is being removed from the SPL and
a simpler slimmed down version will be added to ZFS.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 8bbbe46f86 Remove global memory variables
Platforms such as Illumos and FreeBSD have historically provided
global variables which summerize the memory state of a system.
Linux on the otherhand doesn't expose any of this information
to kernel modules and uses entirely different mechanisms for
memory management.

In order to simplify the original ZFS port to Linux these global
variables were emulated by the SPL for the benefit of ZFS.  As ZoL
has matured over the years it has moved steadily away from these
interfaces and now no longer depends on them at all.

Therefore, this patch completely removes the global variables
availrmem, minfree, desfree, lotsfree, needfree, swapfs_minfree,
and swapfs_reserve.  This greatly simplifies the memory management
code and eliminates a common area of confusion.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf e1310afae3 Remove get_vmalloc_info() wrapper
The get_vmalloc_info() function was used to back the vmem_size()
function.  This was always problematic and resulted in brittle
code because the kernel never provided a clean interface for
modules.

However, it turns out that the only caller of this function in
ZFS uses it to determine the total virtual address space size.
This can be determined easily without get_vmalloc_info() so
vmem_size() has been updated to take this approach which allows
us to shed the get_vmalloc_info() dependency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 50e41ab1e1 Remove on_each_cpu() wrapper
The on_each_cpu() function has been available since Linux 2.6.27.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf b652d169b0 Remove mutex_lock_nested() wrapper
The mutex_lock_nested() function has been available since Linux 2.6.18.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 2bc5666f53 Remove i_mutex() configure check
The inode structure has used i_mutex as its internal locking
primitive since 2.6.16.  The compatibility code to check for
the previous semaphore primitive has been removed.  However,
the wrapper function itself is being kept because it's entirely
possible this primitive will change again to allow finer grained
locking.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf 9f36cace41 Remove kmalloc_node() compatibility code
The kmalloc_node() function has been available since Linux 2.6.12.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf d227e114ed Remove linux/uaccess.h header check
The uaccess header has been available in the same location since
Linux 2.6.18.  There is no longer a need to maintain this
compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:51 -07:00
Brian Behlendorf e5b65e3179 Remove uintptr_t typedef
The uintptr_t typedef has been available since Linux 2.6.24.
There is no longer a need to maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf ff0582cb39 Remove atomic64_xchg() wrappers
The atomic64_xchg() and atomic64_cmpxchg() functions have been
available since Linux 2.6.24.  There is no longer a need to
maintain this compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf 82f2f1a3af Simplify the time compatibility wrappers
Many of the time functions had grown overly complex in order to
handle kernel compatibility issues.  However, as of Linux 2.6.26
all the required functionality is available.  This allows us to
retire numerous configure checks and greatly simplify the time
compatibility wrappers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf 87f8055a91 Map highbit64() to fls64()
The fls64() function has been available since Linux 2.6.16 and
it should be used to implemented highbit64().  This allows us
to provide an optimized implementation and simplify the code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf 9c91800d19 Remove CTL_UNNUMBERED sysctl interface
Support for the CTL_UNNUMBERED sysctl interface was removed in
Linux 2.6.19.  There is no longer any reason to maintain this
compatibility code.  There also issue any reason to keep around
the CTL_NAME macro and helpers so they have been retired.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf b38bf6a4e3 Remove register_sysctl() compatibility code
The register_sysctl() interface has been stable since Linux 2.6.21.
There is no longer a need to maintain compatibility code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:50 -07:00
Brian Behlendorf bb4dee3df2 Remove utsname() wrapper
There is no longer a need to wrap this because utsname() is provided
by the kernel and can be called directly.  This will require a small
change in the ZFS code because utsname is expected to be a global
structure and not a function.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:11:41 -07:00
Brian Behlendorf a80d69caf0 Remove adaptive mutex implementation
Since the Linux 2.6.29 kernel all mutexes have been adaptive mutexs.
There is no longer any point in keeping this code so it is being
removed to simplify the code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf 3a92530563 Update code to use misc_register()/misc_deregister()
When the SPL was originally written it was designed to use the
device_create() and device_destroy() functions.  Unfortunately,
these functions changed considerably over the years making them
difficult to rely on.

As it turns out a better choice would have been to use the
misc_register()/misc_deregister() functions.  This interface
for registering character devices has remained stable, is simple,
and provides everything we need.

Therefore the code has been reworked to use this interface.  The
higher level ZFS code has always depended on these same interfaces
so this is also as a step towards minimizing our kernel dependencies.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf 6203295438 Make license compatibility checks consistent
Apply the license specified in the META file to ensure the
compatibility checks are all performed consistently.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-10-17 15:07:28 -07:00
Brian Behlendorf e33045ee98 Make license compatibility checks consistent
Apply the license specified in the META file to ensure the
compatibility checks are all performed consistently.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2757
2014-10-17 14:58:38 -07:00
Brian Behlendorf 9ad656b2d0 Retire HAVE_IOCTL_* configure checks
The HAVE_IOCTL_* configure checks were originally added for
compatibility with an ancient version of glibc.  This support
and additional complexity is no longer needed and is therefore
being removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #585
2014-08-28 07:45:54 -07:00
Richard Yao ec18fe3ce8 Cleanup vn_rename() and vn_remove()
zfsonlinux/spl#bcb15891ab394e11615eee08bba1fd85ac32e158 implemented
Linux 3.6+ support by adding duplicate vn_rename and vn_remove
functions. The new ones were cleaner, but the duplicate functions made
the codebase less maintainable. This adds some compatibility shims that
allow us to retire the older vn_rename and vn_remove in favor of the new
ones on old kernels. The result is a net 143 line reduction in lines of
code and a cleaner codebase.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #370
2014-08-13 16:25:44 -07:00
Ned Bass 2fc44f66ec Linux 3.17 compat: remove wait_on_bit action function
Linux kernel 3.17 removes the action function argument from
wait_on_bit().  Add autoconf test and compatibility macro to support
the new interface.

The former "wait_on_bit" interface required an 'action' function to
be provided which does the actual waiting. There were over 20 such
functions in the kernel, many of them identical, though most cases
can be satisfied by one of just two functions: one which uses
io_schedule() and one which just uses schedule().  This API change
was made to consolidate all of those redundant wait functions.

References: torvalds/linux@7431620

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #378
2014-08-11 14:17:00 -07:00
Brian Behlendorf 1139491da7 Revert "Disable GCCs aggressive loop optimization"
This reverts commit 0f62f3f9ab.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2010
2014-07-22 09:56:55 -07:00
Turbo Fredriksson 2ee4e7da90 Accept udev and dracut paths specified by ./configure
There are two common locations where udev and dracut components are
commonly installed.  When building packages using the 'make rpm|deb'
targets check those common locations and pass them to rpmbuild.  For
non-standard configurations these values can be provided by the
the following configure options:

  --with-udevdir=DIR      install udev helpers [default=check]
  --with-udevruledir=DIR  install udev rules [[UDEVDIR/rules.d]]
  --with-dracutdir=DIR    install dracut helpers [default=check]

When rebuilding using the source packages the per-distribution
default values specified in the spec file will be used.  This is
the preferred way to build packages for a distribution but the
ability to override the defaults is provided as a convenience.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2310
Closes #1680
2014-06-11 16:32:57 -07:00
Turbo Fredriksson 0f629346bb Set LANG to a reasonable default (C)
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: zfsonlinux/spl#306
2014-06-10 16:46:21 -07:00
Turbo Fredriksson 1e929b97ac Set LANG to a reasonable default (C)
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #306
2014-06-10 14:50:11 -07:00
Turbo Fredriksson 69c7bdb6e7 Accept kernel source dir(s) specified by ./configure
This adds ability to set the location of the kernel via defines
when building from the spec files.  This is useful when building
against a kernel installed in a non-standard location.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1874
2014-06-05 13:46:49 -07:00
Turbo Fredriksson c9b5cc8c00 Move the libraries into separate packages
From day one the various ZFS libraries should have been placed in their
own sub-packages.  Primarily this allows for multiple major versions of
the libraries to be concurrently installed.  It also facilitates a
smaller build environment by minimizing the required dependencies.

The specific changes required to split the libraries from the utilities
are as follows:

* libzpool2, libnvpair1, libuutil1, and libzfs2 packages were added
  and contain the versioned shared libraries.  The Fedora packaging
  guidelines discourage providing static libraries so they are not
  included in the packages.

  http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries

* The zfs-devel package was renamed libzfs2-devel and the new package
  obsoletes the old zfs-devel package.   This package includes all
  the required headers for the libzpool2, libnvpair1, libuutil1, and
  libzfs2 libraries and their respective unversioned shared libraries.

  This package should eventually be split in to individual lib*-devel
  packages but it will still take some work to cleanly separate them.
  Therefore the libzfs2-devel package provides the expected lib*-devel
  packages so the all proper dependencies can still be created.

  http://fedoraproject.org/wiki/Packaging:Guidelines#Devel_Packages

* Moved '/sbin/ldconfig' execution from the zfs packge to each of the
  new library packages as described by the packaging guidelines.

  http://fedoraproject.org/wiki/Packaging:Guidelines#Shared_Libraries

* The /usr/share/doc/ files were moved in to the libzfs2-devel package.

* Updated config/deb.am to be aware of the packaging changes.  This
  ensures that 'deb-utils' make target converts all the resulting
  packages generated by the 'rpm-utils' target.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2329
Closes: #2341
Issue: #2145
2014-06-02 13:43:20 -07:00
Brian Behlendorf 79aada6105 Restrict release number to META version
When creating packages in a git repository the release number
can be automatically set by 'git describe'.  This normally works
well but if your repository has newer tags which match the form
NAME-VERSION* the release may be incorrectly calculated.  To
prevent this the match patten has been restricted to the contents
of the META file, NAME-VERSION.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-05-29 19:31:50 -07:00
Brian Behlendorf c4f38ddd80 Restrict release number to META version
When creating packages in a git repository the release number
can be automatically set by 'git describe'.  This normally works
well but if your repository has newer tags which match the form
NAME-VERSION* the release may be incorrectly calculated.  To
prevent this the match patten has been restricted to NAME-VERSION.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-05-29 19:08:03 -07:00
Brian Behlendorf a073aeb060 Add KMC_SLAB cache type
For small objects the Linux slab allocator has several advantages
over its counterpart in the SPL.  These include:

1) It is more memory-efficient and packs objects more tightly.
2) It is continually tuned to maximize performance.

Therefore it makes sense to layer the SPLs slab allocator on top
of the Linux slab allocator.  This allows us to leverage the
advantages above while preserving the Illumos semantics we depend
on.  However, there are some things we need to be careful of:

1) The Linux slab allocator was never designed to work well with
   large objects.  Because the SPL slab must still handle this use
   case a cut off limit was added to transition from Linux slab
   backed objects to kmem or vmem backed slabs.

   spl_kmem_cache_slab_limit - Objects less than or equal to this
   size in bytes will be backed by the Linux slab.  By default
   this value is zero which disables the Linux slab functionality.
   Reasonable values for this cut off limit are in the range of
   4096-16386 bytes.

   spl_kmem_cache_kmem_limit - Objects less than or equal to this
   size in bytes will be backed by a kmem slab.  Objects over this
   size will be vmem backed instead.  This value defaults to
   1/8 a page, or 512 bytes on an x86_64 architecture.

2) Be aware that using the Linux slab may inadvertently introduce
   new deadlocks.  Care has been taken previously to ensure that
   all allocations which occur in the write path use GFP_NOIO.
   However, there may be internal allocations performed in the
   Linux slab which do not honor these flags.  If this is the case
   a deadlock may occur.

The path forward is definitely to start relying on the Linux slab.
But for that to happen we need to start building confidence that
there aren't any unexpected surprises lurking for us.  And ideally
need to move completely away from using the SPLs slab for large
memory allocations.  This patch is a first step.

NOTES:
1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab
   backed caches but it is not supported for kmem/vmem backed caches.

2) Regardless of the spl_kmem_cache_*_limit settings a cache may
   be explicitly set to a given type by passed the KMC_KMEM,
   KMC_VMEM, or KMC_SLAB flags during cache creation.

3) The constructors, destructors, and reclaim callbacks are all
   functional and will be called regardless of the cache type.

4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to
   the issues involved in presenting correct object accounting.
   Instead they will appear in /proc/slabinfo under the same names.

5) Several kmem SPLAT tests needed to be fixed because they relied
   incorrectly on internal kmem slab accounting.  With the updated
   test cases all the SPLAT tests pass as expected.

6) An autoconf test was added to ensure that the __GFP_COMP flag
   was correctly added to the default flags used when allocating
   a slab.  This is required to ensure all pages in higher order
   slabs are properly refcounted, see ae16ed9.

7) When using the SLUB allocator there is no need to attempt to
   set the __GFP_COMP flag.  This has been the default behavior
   for the SLUB since Linux 2.6.25.

8) When using the SLUB it may be desirable to set the slub_nomerge
   kernel parameter to prevent caches from being merged.

Original-patch-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #356
2014-05-22 10:28:01 -07:00
Chunwei Chen ad3412efd7 Linux 3.15: vfs_rename() added a flags argument
Detect the updated vfs_rename() interface and call it with an
extra flags argument.

References:
  torvalds/linux@520c8b1

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #355
2014-05-07 13:38:17 -07:00
Richard Yao 3b4f425a5a Refactor inode_owner_or_capable() autotools check
We need inode_owner_or_capable() for ZFS file attributes in addition to
xattrs, so it should go into its own file. This moves it into its own
file and changes it to be more comprehensive. It will now fail if no
known good API is detected.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1691
2014-05-01 10:06:49 -07:00
Chunwei Chen b761912b34 Linux 3.14 compat: rq_for_each_segment in dmu_req_copy
rq_for_each_segment changed from taking bio_vec * to taking bio_vec.
We provide rq_for_each_segment4 which takes both.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2124
2014-04-10 14:28:51 -07:00
Chunwei Chen d4541210f3 Linux 3.14 compat: Immutable biovec changes in vdev_disk.c
bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter.
This patch creates BIO_BI_*(bio) macros to hide the differences.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2124
2014-04-10 14:28:38 -07:00
Chunwei Chen 408ec0d2e1 Linux 3.14 compat: posix_acl_{create,chmod}
posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod}

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2124
2014-04-10 14:27:03 -07:00
Chris Dunlap 518eba1492 Replace check for _POSIX_MEMLOCK w/ HAVE_MLOCKALL
zed supports a '-M' cmdline opt to lock all pages in memory via
mlockall().  The _POSIX_MEMLOCK define is checked to determine whether
this function is supported.  The current test assumes mlockall()
is supported if _POSIX_MEMLOCK is non-zero.  However, this test is
insufficient according to mlock(2) and sysconf(3).  If _POSIX_MEMLOCK
is -1, mlockall() is not supported; but if _POSIX_MEMLOCK is 0,
availability must be checked at runtime.

This commit adds an autoconf check for mlockall() to user.m4.  The zed
code block for mlockall() is now guarded with a test for HAVE_MLOCKALL.
If defined, mlockall() will be called and its runtime availability
checked via its return value.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
2014-04-02 13:10:08 -07:00
Chris Dunlap 07917db990 Add defs for makefile installation dir vars
Add macro definitions to AM_CPPFLAGS to propagate makefile installation
directory variables for libexecdir, runstatedir, sbindir, and
sysconfdir.

https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Installation-Directory-Variables.html

  A corollary is that you should not use these variables except
  in makefiles. For instance, instead of trying to evaluate
  datadir in configure and hard-coding it in makefiles using e.g.,
  'AC_DEFINE_UNQUOTED([DATADIR], ["$datadir"], [Data directory.])',
  you should add -DDATADIR='$(datadir)' to your makefile's definition
  of CPPFLAGS (AM_CPPFLAGS if you are also using Automake).

The runstatedir directory is for "installing data files which the
programs modify while they run, that pertain to one specific machine,
and which need not persist longer than the execution of the program".

https://www.gnu.org/prep/standards/html_node/Directory-Variables.html

It will be defined by autoconf 2.70 or later, and default to
"$(localstatedir)/run".

http://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=commit;h=a197431414088a417b407b9b20583b2e8f7363bd

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
2014-03-31 16:11:13 -07:00
Richard Yao 1de1488fdc Linux 3.13 compat: Handle __must_check bdi_setup_and_register
torvalds/linux@8077c0d983 added a
__must_check to the bdi_setup_and_register(), which caused our autotools
check to break. zfsonlinux/zfs@729210564a
was intended to correct that, but it depended on -Wno-unused-result,
which is unrecognized in older GCC versions. That commit has been
reverted in favor of a solution that does not require
-Wno-unused-result.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2102
Closes #2135
2014-03-24 11:10:06 -07:00
Richard Yao 6b6b8d1041 Revert "Properly ignore bdi_setup_and_register return value"
Older GCC versions do not obey -Wno-unused-result. This reverts commit
729210564a in favor of a solution that
does not require -Wno-unused-result.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1906
2014-03-24 11:08:55 -07:00
Chunwei Chen a15dac42df config: compile test rather than run test
When testing compiler flags, we only need to do compile test. Otherwise,
configure will fail with "configure: error: cannot run test program while
cross compiling" when cross compiling.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2191
2014-03-20 12:11:57 -07:00
Ralf Ertzinger 881f45c6a8 Add systemd unit files for ZFS startup
This adds systemd unit files replacing the functionality offered by
the SysV init script found in etc/init.d.

It has been developed and tested on Fedora 19, Fedora 20
and openSuSE 13.1.

Four unit files and one target are offered.

zfs-import-cache.service:
    Import pools from /etc/zfs/zpool.cache. This unit will wait for
    udev to settle.
zfs-import-scan.service:
    Import pools by scanning /dev/disk/by-id for zvols. This unit will
    only run if /etc/zfs/zpool.cache is not present. This unit will wait
    for udev to settle
zfs-mount.service:
    Mount ZFS native filesystems. It contains a dependency to be loaded
    before local-fs.target.
zfs-share.service:
    Share NFS/SMB filesystems. This unit contains a dependency that
    will cause it to be restarted whenever the smb or nfs-server unit
    is restarted, restoring the shares added.
zfs.target:
    This target pulls in the other units in order to start ZFS. It's
    the only unit that can be enabled/disabled, all other services
    are static and pulled in by dependencies. It will honour zfs=off
    and zfs=no options on the kernel command line.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2108
2014-02-05 12:25:30 -08:00
Brian Behlendorf 0f62f3f9ab Disable GCCs aggressive loop optimization
GCC >+ 4.8's aggressive loop optimization breaks some of the iterators
over the dn_blkptr[] pseudo-array in dnode_phys. Since dn_blkptr[] is
defined as a single-element array, GCC believes an iterator can only
access index 0 and will unroll the loop into a single iteration.

One way to resolve the issue would be to cast the array to a pointer
and fix all the iterators that might break.  The only loop where it
is known to cause a problem is this loop in dmu_objset_write_ready():

    for (i = 0; i < dnp->dn_nblkptr; i++)
            bp->blk_fill += dnp->dn_blkptr[i].blk_fill;

In the common case where dn_nblkptr is 3, the loop is only executed a
single time and "i" is equal to 1 following the loop.

The specific breakage caused by this problem is that the blk_fill of
root block pointers wouldn't be set properly when more than one blkptr
is in use (when no indrect blocks are needed).

The simple reproducing sequence is:

zpool create tank /tank.img
zdb -ddddd tank 0

Notice that "fill=31", however, there are two L0 indirect blocks with
"F=31" and "F=5". The fill count should be 36 rather than 31. This
problem causes an assert to be hit in a simple "zdb tank" when built
with --enable-debug.

However, this approach was not taken because we need to be absolutely
sure we catch all instances of this unwanted optimization.  Therefore,
the build system has been updated to detect if GCC supports the
aggressive loop optimization.  If it does the optimization will be
explicitly disabled using the -fno-aggressive-loop-optimization option.

Original-fix-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2010
Closes #2051
2014-01-14 13:55:58 -08:00
Matthew Thode 11b9ec23b9 Add full SELinux support
Four new dataset properties have been added to support SELinux.  They
are 'context', 'fscontext', 'defcontext' and 'rootcontext' which map
directly to the context options described in mount(8).  When one of
these properties is set to something other than 'none'.  That string
will be passed verbatim as a mount option for the given context when
the filesystem is mounted.

For example, if you wanted the rootcontext for a filesystem to be set
to 'system_u:object_r:fs_t' you would set the property as follows:

  $ zfs set rootcontext="system_u:object_r:fs_t" storage-pool/media

This will ensure the filesystem is automatically mounted with that
rootcontext.  It is equivalent to manually specifying the rootcontext
with the -o option like this:

  $ zfs mount -o rootcontext=system_u:object_r:fs_t storage-pool/media

By default all four contexts are set to 'none'.  Further information
on SELinux contexts is detailed in mount(8) and selinux(8) man pages.

Signed-off-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #1504
2013-12-19 10:37:31 -08:00
Richard Yao 729210564a Properly ignore bdi_setup_and_register return value
This broke compilation against Linux 3.13 and GCC 4.7.3.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1906
2013-12-04 14:53:45 -08:00
Richard Yao 50a0749eba Linux 3.13 compat: Pass NULL for new delegated inode argument
This check was originally added for SLES10, a093c6a, to check for
a 'struct vfsmount *' argument which they added.  However, since
SLES10 is based on a 2.6.16 kernel which is no longer supported
this functionality was dropped.  The checks were refactored to
support Linux 3.13 without concern for historical versions.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #312
2013-12-02 10:37:49 -08:00
DHE fd23663000 Fix typos in commit b83e3e48c9
There's a missing semicolon and equals sign in the first hunk of this
commit in config/kernel-bdi.m4. This results in the test always
failing. The effects were noticed when rrdtool, a tool which modifies
files by mmap() and msync(), would have data never get saved to disk
in spite of the files working while the mounted filesystem remains
mounted.

Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #1889
2013-11-20 15:24:39 -08:00
Richard Yao c3d9c0df3e Linux 3.12 compat: New shrinker API
torvalds/linux@24f7c6 introduced a new shrinker API while
torvalds/linux@a0b021 dropped support for the old shrinker API.
This patch adds support for the new shrinker API by wrapping
the old one with the new one.

This change also reorganizes the autotools checks on the shrinker
API such that the configure script will fail early if an unknown
API is encountered in the future.

Support for the set_shrinker() API which was used by Linux 2.6.22
and older has been dropped.  As a general rule compatibility is
only maintained back to Linux 2.6.26.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1732
Closes zfsonlinux/zfs#1822
Closes #293
Closes #307
2013-11-06 13:23:40 -08:00
Ned Bass 184c687387 Emulate illumos interface cv_timedwait_hires()
Needed for Illumos #3582. This interface is supposed to support
a variable-resolution timeout with nanosecond granularity.  This
implementation rounds up to microsecond resolution, as nanosecond-
precision timing is rarely needed for real-world performance
tuning and may incur unnecessary busy-waiting.  usleep_range() is
used if available, otherwise udelay() or msleep() are used
depending on the length of the delay interval.

Add flags from sys/callo.h as these are used to control the behavior of
cv_timedwait_hires().  Specifically,

CALLOUT_FLAG_ABSOLUTE
    Normally, the expiration passed to the timeout API functions is
    an expiration interval. If this flag is specified, then it is
    interpreted as the expiration time itself.

CALLOUT_FLAG_ROUNDUP
    Roundup the expiration time to the next resolution boundary. If this
    flag is not specified, the expiration time is rounded down.

References:
    https://www.illumos.org/issues/3582
    illumos/illumos-gate@0689f76

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #304
2013-11-04 09:49:24 -08:00
Massimo Maggi 023699cd62 Posix ACL Support
This change adds support for Posix ACLs by storing them as an xattr
which is common practice for many Linux file systems.  Since the
Posix ACL is stored as an xattr it will not overwrite any existing
ZFS/NFSv4 ACLs which may have been set.  The Posix ACL will also
be non-functional on other platforms although it may be visible
as an xattr if that platform understands SA based xattrs.

By default Posix ACLs are disabled but they may be enabled with
the new 'aclmode=noacl|posixacl' property.  Set the property to
'posixacl' to enable them.  If ZFS/NFSv4 ACL support is ever added
an appropriate acltype will be added.

This change passes the POSIX Test Suite cleanly with the exception
of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too).

  http://www.tuxera.com/community/posix-test-suite/

Signed-off-by: Massimo Maggi <me@massimo-maggi.eu>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #170
2013-10-29 14:54:26 -07:00
Richard Yao 1db7b9be75 Fix libblkid support
libblkid support is dormant because the autotools check is broken and
liblkid identifies ZFS vdevs as "zfs_member", not "zfs". We fix that
with a few changes:

First, we fix the libblkid autotools check to do a few things:

1. Make a 64MB file, which is the minimum size ZFS permits.
2. Make 4 fake uberblock entries to make libblkid's check succeed.
3. Return 0 upon success to make autotools use the success case.
4. Include stdlib.h to avoid implicit declration of free().
5. Check for "zfs_member", not "zfs"
6. Make --with-blkid disable autotools check (avoids Gentoo sandbox violation)
7. Pass '-lblkid' correctly using LIBS not LDFLAGS.

Second, we change the libblkid support to scan for "zfs_member", not
"zfs".

This makes --with-blkid work on Gentoo.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1751
2013-10-10 16:56:51 -07:00
Richard Yao b83e3e48c9 Stop runtime pointer modifications in autotools checks
c38367c73f was meant to eliminate runtime
function pointer modifications in autotools checks because they were
prone to false negatives on kernels hardened by the PaX project.
Unfortunately, I missed the xattr_handler and super_block->s_bdi
autotools checks. Recent changes to PaX constified
xattr_handler->get/set, which lead me to discover this oversight.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1433
2013-09-13 13:30:55 -07:00
Richard Yao 0f37d0c8be Linux 3.11 compat: fops->iterate()
Commit torvalds/linux@2233f31aad
replaced ->readdir() with ->iterate() in struct file_operations.
All filesystems must now use the new ->iterate method.

To handle this the code was reworked to use the new ->iterate
interface.  Care was taken to keep the majority of changes
confined to the ZPL layer which is already Linux specific.
However, minor changes were required to the common zfs_readdir()
function.

Compatibility with older kernels was accomplished by adding
versions of the trivial dir_emit* helper functions.  Also the
various *_readdir() functions were reworked in to wrappers
which create a dir_context structure to pass to the new
*_iterate() functions.

Unfortunately, the new dir_emit* functions prevent us from
passing a private pointer to the filldir function.  The xattr
directory code leveraged this ability through zfs_readdir()
to generate the list of xattr names.  Since we can no longer
use zfs_readdir() a simplified zpl_xattr_readdir() function
was added to perform the same task.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1653
Issue #1591
2013-08-15 16:19:07 -07:00
Richard Yao f7fd6ddd96 Linux 3.8 compat: Use kuid_t/kgid_t when required
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/git_t are
replaced by kuid_t/kgid_t, which are structures instead of integral
types. This causes any code that uses an integral type to fail to build.
The User Namespace functionality introduced in Linux 3.8 requires
CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any
kernel that supported it.

We resolve this by converting between the new kuid_t/kgid_t structures
and the original uid_t/gid_t types.

Original-patch-by: DHE
Rewrite-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #260
2013-08-09 10:09:29 -07:00
Brian Behlendorf dba1d70566 Fix arc_adapt() spinning in iterate_supers_type()
The iterate_supers_type() function which was introduced in the
3.0 kernel was supposed to provide a safe way to call an arbitrary
function on all super blocks of a specific type.  Unfortunately,
because a list_head was used a bug was introduced which made it
possible for iterate_supers_type() to get stuck spinning on a
super block which was just deactivated.

This can occur because when the list head is removed from the
fs_supers list it is reinitialized to point to itself.  If the
iterate_supers_type() function happened to be processing the
removed list_head it will get stuck spinning on that list_head.

The bug was fixed in the 3.3 kernel by converting the list_head
to an hlist_node.  However, to resolve the issue for existing
3.0 - 3.2 kernels we detect when a list_head is used.  Then to
prevent the spinning from occurring the .next pointer is set to
the fs_supers list_head which ensures the iterate_supers_type()
function will always terminate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1045
Closes #861
Closes #790
2013-07-17 09:28:06 -07:00
Chris Dunlop a1d9543a39 3.10 API change: block_device_operations->release() returns void
Linux kernel commit torvalds/linux@db2a144 changed the return type
of block_device_operations->release() to void.  Detect the expected
prototype and defined our callout accordingly.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1494
2013-07-08 15:41:57 -07:00
Yuxuan Shui 1ddf9722dc Linux 3.10 compat: replace PDE()->data with PDE_DATA()
Linux kernel commit torvalds/linux@d9dda78b renamed PDE() to
PDE_DATA().  To handle this detect the prefered interface
and define a PDE_DATA() wrapper for consistency.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257
2013-07-08 15:14:21 -07:00
Yuxuan Shui c02ab72fb9 Linux 3.10 compat: struct vmalloc_info moved
Linux kernel commmit torvalds/linux@db3808c1 moved the
vmalloc_info structure from a private to a public header.
Now that it's available for kernel modules use it.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257
2013-07-08 15:09:20 -07:00
Li Dongyang 802e7b5feb Add SEEK_DATA/SEEK_HOLE to lseek()/llseek()
The approach taken was the rework zfs_holey() as little as
possible and then just wrap the code as needed to ensure
correct locking and error handling.

Tested with xfstests 285 and 286.  All tests pass except for
7-9 of 285 which try to reserve blocks first via fallocate(2)
and fail because fallocate(2) is not yet supported.

Note that the filp->f_lock spinlock did not exist prior to
Linux 2.6.30, but we avoid the need for autotools check by
virtue of the fact that SEEK_DATA/SEEK_HOLE support was not
added until Linux 3.1.

An autoconf check was added for lseek_execute() which is
currently a private function but the expectation is that it
will be exported perhaps as early as Linux 3.11.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1384
2013-07-02 09:24:43 -07:00
Carlos Alberto Lopez Perez 5165473737 Ensure --with-spl-timeout waits for spl_config.h and symvers
The previous code was only waiting for the symver file. But the
postinst target of the DKMS script for SPL will not only create
the symvers file, but also the header spl_config.h.

If we are waiting in the configure script of ZFS for the SPL
symvers file, then we also need to wait for spl_config.h.
Otherwise the configure script will abort because the spl_config.h
is not yet available.

On top of that, the function ZFS_AC_SPL_MODULE_SYMVERS is moved
to the end of the function ZFS_AC_SPL to allow both checks share
the with-spl-timeout parameter.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1431
2013-05-02 15:40:44 -07:00
Brian Behlendorf e013670550 Set RPM_DEFINE_COMMON options
When the kmod packaging was introduced the ability to pass the
--enable-debug and --enable-dmu-tx options from configure all
the way through to `make rpm|deb` was accidenally lost.  Update
ZFS_AC_RPM to explicitlu set RPM_DEFINE_COMMON with these
rpmbuild defines.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
2013-04-24 16:18:55 -07:00
Turbo Fredriksson 1a33036df9 Add --bump=0 to alien
Preserve the release field when creating Debian packages.  The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue #1402
Issue #928
2013-04-24 16:18:53 -07:00
Turbo Fredriksson d012ba3832 Support .nogitrelease file
When building a custom release in a git tree provide the ability
to prevent the release field from being overwritten by the
`git describe` output.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
2013-04-24 16:18:49 -07:00
Turbo Fredriksson 16253cff43 Add --bump=0 to alien
Preserve the release field when creating Debian packages.  The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue zfsonlinux/zfs#1402
Issue zfsonlinux/zfs#928
2013-04-24 16:18:11 -07:00
Turbo Fredriksson 2c21370746 Support .nogitrelease file
When building a custom release in a git tree provide the ability
to prevent the release field from being overwritten by the
`git describe` output.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1402
2013-04-24 16:18:03 -07:00
Brian Behlendorf d17eeafbf0 Replace the ZFS_AC_META perl dependency with awk
The only remaining perl dependency is part of the ZFS_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1380
2013-04-02 16:05:45 -07:00
Brian Behlendorf 7fd629d430 Replace the SPL_AC_META perl dependency with awk
The only remaining perl dependency is part of the SPL_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1380
2013-04-02 16:04:19 -07:00
Jan Engelhardt 83918aebe5 build: do not call boilerplate ourself
Rationale see section 3.5 "Using `autoreconf' to Update `configure'
Scripts" of the autoconf manual.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 11:08:46 -07:00
Jan Engelhardt a9e86ac4fd gitignore: anchor entries at their respective directory
.ko is specific to module, .m4 to config, etc.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 11:07:52 -07:00
Jan Engelhardt 92c4ea38c9 build: use CPPFLAGS
-D and -I are preprocessor flags, so should preferably be in the
appropriate variable.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 11:07:11 -07:00
Jan Engelhardt 7a8a639390 build: resolve orthographic and other grammatical errors
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 11:06:38 -07:00
Jan Engelhardt 8c39262945 build: do not call boilerplate ourself
Rationale see section 3.5 "Using `autoreconf' to Update `configure'
Scripts" of the autoconf manual.

http://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/autoreconf-Invocation.html

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 10:55:20 -07:00
Jan Engelhardt ea0fcfc875 gitignore: anchor entries at their respective directory
.ko is specific to module, .m4 to config, etc.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 10:50:17 -07:00
Jan Engelhardt ecf76e3676 build: use CPPFLAGS
-D and -I are preprocessor flags, so should preferably be in the
appropriate variable.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 10:48:26 -07:00
Jan Engelhardt 4e95cc99b0 build: resolve orthographic and other grammatical errors
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-04-02 10:44:52 -07:00
Brian Behlendorf c14183adca Use 'git describe' for working builds
When building from an arbitrary commit in the git tree it's useful
for the resulting packages to be uniquely identifiable.  Therefore,
the build system has been updated to detect if your compiling in
git tree.

If you are building in a git tree, and there are commits after the
last annotated tag.  Then the <id>-<hash> component of 'git describe'
will be used to overwrite the 'Release:' field in the META file.

The only tricky part is that to ensure the 'make dist' tarball is
built using the correct release.  A dist-hook was added to the top
level make file to rewrite the META file using the correct release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #195
Issue #111
2013-03-22 15:00:55 -07:00
Brian Behlendorf f6fb7651a0 Use 'git describe' for working builds
When building from an arbitrary commit in the git tree it's useful
for the resulting packages to be uniquely identifiable.  Therefore,
the build system has been updated to detect if your compiling in
git tree.

If you are building in a git tree, and there are commits after the
last annotated tag.  Then the <id>-<hash> component of 'git describe'
will be used to overwrite the 'Release:' field in the META file.

The only tricky part is that to ensure the 'make dist' tarball is
built using the correct release.  A dist-hook was added to the top
level make file to rewrite the META file using the correct release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-22 14:52:01 -07:00
Brian Behlendorf f3757573a6 Refresh RPM packaging
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'.  This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.

  http://fedoraproject.org/wiki/Packaging:Guidelines
  http://rpmfusion.org/Packaging/KernelModules/Kmods2

While the spec files have been entirely rewritten from a
user perspective the only major changes are:

* The Fedora packages now have a build dependency on the
  rpmfusion repositories.  The generic kmod packages also
  have a new dependency on kmodtool-1.22 but it is bundled
  with the source rpm so no additional packages are needed.

* The kernel binary module packages have been renamed from
  zfs-modules-* to kmod-zfs-* as specificed by kmods2.

* The is now a common kmod-zfs-devel-* package in addition
  to the per-kernel devel packages.  The common package
  contains the development headers while the per-kernel
  package contains kernel specific build products.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1341
2013-03-18 15:33:17 -07:00
Brian Behlendorf 493972c896 Refresh RPM packaging
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'.  This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.

  http://fedoraproject.org/wiki/Packaging:Guidelines
  http://rpmfusion.org/Packaging/KernelModules/Kmods2

While the spec files have been entirely rewritten from a
user perspective the only major changes are:

* The Fedora packages now have a build dependency on the
  rpmfusion repositories.  The generic kmod packages also
  have a new dependency on kmodtool-1.22 but it is bundled
  with the source rpm so no additional packages are needed.

* The kernel binary module packages have been renamed from
  spl-modules-* to kmod-spl-* as specificed by kmods2.

* The is now a common kmod-spl-devel-* package in addition
  to the per-kernel devel packages.  The common package
  contains the development headers while the per-kernel
  package contains kernel specific build products.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #222
2013-03-18 15:31:54 -07:00
Richard Yao 8274ed5988 Drop support for 3 argument version of set_fs_pwd
This was a suggestion that Brian Behlendorf made when reviewing an early
pull request for Linux 3.9 support. This commit was made intentionally
easy to revert should we ever have a reason to reintroduce support for
older kernels.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-14 10:43:31 -07:00
Richard Yao a54718cfe0 Linux 3.9 compat: set_fs_root takes const struct path *
torvalds/linux@dcf787f391 enforces
const-correctness in passing struct path *.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-14 10:43:29 -07:00
Richard Yao 2a305c34c8 Linux 3.9 compat: vfs_getattr takes two arguments
The function prototype of vfs_getattr previoulsy took struct vfsmount *
and struct dentry * as arguments. These would always be defined together
in a struct path *.

torvalds/linux@3dadecce20 modified
vfs_getattr to take struct path * is taken as an argument instead.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-14 10:43:26 -07:00
Richard Yao 10087fe1fa Linux 3.9 compat: Include linux/sched/rt.h
Linux 3.9 reorganized sched.h, splitting it into numerous files.
torvalds/linux@8bd75c77b7 moved MAX_PRIO
and MAX_RT_PRIO to linux/sched/rt.h.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-14 10:43:19 -07:00
Brian Behlendorf 9b2af9a097 Configure --with-spl{-obj} auto-detect cleanup
Because the install location for the spl/zfs-devel headers was
changed we need to refresh the auto-detect code.  Note that
for packaging which already explicitly calls --with-spl{-obj}
nothing has changed.

The updated code is now structured like that in ZFS_AC_KERNEL
and should be cleaner and easier to maintain.  In addition,
it's stricter about detecting a valid source and object
directory.  It requires:

* The source directory contains the file 'spl.release'
* The object directory contains the file 'spl_config.h'
* The following paths will be checked.  Notice the /var/lib/
  and /usr/src paths require that the spl and zfs version be
  matched.  This is done to prevent accidentally mixing releases.

        dnl # 1) /var/lib/dkms/spl/<version>/build
        dnl # 2) /usr/src/spl-<version>/<kernel-version>
        dnl # 3) /usr/src/spl-<version>
        dnl # 4) ../spl
        dnl # 5) /usr/src/kernels/<kernel-version>

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-13 13:42:16 -07:00
Brian Behlendorf 0da31cd6ca Remove ARCH packaging
The kernel modules are now available in the Arch User Repository
(AUR) via zfs.  Since their packaging is maintained and superior
to ours it is being removed from the tree.

  https://wiki.archlinux.org/index.php/ZFS

Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-06 15:46:41 -08:00
Brian Behlendorf ffb21118ad Add --with-dracutdir configure option
The standard dracut directory has moved from /usr/share/dracut to
/usr/lib/dracut.  To ensure the dracut modules get installed in
the correct location provide a --with-dracutdir configure option
to set the path.

The default install location has been updated to /usr/lib/dracut
which is used by more current versions of Fedora.  However, this
default is overriden by the RPM packaging for consistency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-06 15:46:41 -08:00
Brian Behlendorf 5f0a4b0847 Remove ARCH packaging
The kernel modules are now available in the Arch User Repository
(AUR) via zfs.  Since their packaging is maintained and superior
to ours it is being removed from the tree.

  https://wiki.archlinux.org/index.php/ZFS

Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-04 19:09:34 -08:00
Richard Yao c38367c73f Eliminate runtime function pointer mods in autotools checks
PaX/GrSecurity patched kernels implement a dialect of C that relies on a
GCC plugin for enforcement. A basic idea in this dialect is that
function pointers in structures should not change during runtime.
This causes code that modifies function pointers at runtime to fail to
compile in many instances. The autotools checks rely on whether or
not small test cases compile against a given kernel. Some
autotools checks assume some default case if other cases fail. When one
of these autotools checks tests a PaX/GrSecurity patched kernel by
modifying a function pointer at runtime, the default case will be used.

Early detection of such situations is possible by relying on compiler
warnings, which are compiler errors when --enable-debug is used.
Unfortunately, very few people build ZFS with --enable-debug. The more
common situation is that these issues manifest themselves as runtime
failures in the form of NULL pointer exceptions.

Previous patches that addressed such issues with PaX/GrSecurity
compatibility largely relied on rewriting autotools checks to avoid
runtime function pointer modification or the addition of PaX/GrSecurity
specific checks. This patch takes the previous work to its logical
conclusion by eliminating the use of runtime function pointer
modification. This permits the removal of PaX-specific autotools checks
in favor of ones that work across all supported kernels.

This should resolve issues that were reported to occur with
PaX/GrSecurity-patched Linux 3.7.5 kernels on Gentoo Linux.

https://bugs.gentoo.org/show_bug.cgi?id=457176

We should be able to prevent future regressions in PaX/GrSecurity
compatibility by ensuring that all changes to ZFSOnLinux avoid runtime
function pointer modification. At the same time, this does not solve the
issue of silent failures triggering default cases in the autotools
check, which is what permitted these regressions to become runtime
failures in the first place. This will need to be addressed in a future
patch.

Reported-by: Marcin Mirosław <bug@mejor.pl>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1300
2013-03-04 08:49:17 -08:00
Etienne Dechamps d9b0ebbe82 Remove the bio_empty_barrier() check.
To determine whether the kernel is capable of handling empty barrier
BIOs, we check for the presence of the bio_empty_barrier() macro,
which was introduced in 2.6.24. If this macro is defined, then we can
flush disk vdevs; if it isn't, then flushing is disabled.

Unfortunately, the bio_empty_barrier() macro was removed in 2.6.37,
even though the kernel is still capable of handling empty barrier BIOs.

As a result, flushing is effectively disabled on kernels >= 2.6.37,
meaning that starting from this kernel version, zfs doesn't use
barriers to guarantee on-disk data consistency. This is quite bad and
can lead to potential data corruption on power failures.

This patch fixes the issue by removing the configure check for
bio_empty_barrier(), as we don't support kernels <= 2.6.24 anymore.

Thanks to Richard Kojedzinszky for catching this nasty bug.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1318
2013-02-24 10:22:34 -08:00
Etienne Dechamps d75af3c0eb Use -Werror for all kernel configure tests.
As a matter of fact, we're already using -Werror for most tests because
of a bug in kernel-bio-empty-barrier.m4 which sets -Werror without
reverting it afterwards. This meant that all tests which ran after this
one was using -Werror.

This patch simply makes it clear that we're using -Werror and makes
the code more readable and more predictable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1317
2013-02-24 10:20:28 -08:00
Richard Yao a0625691b3 Fix HAVE_MUTEX_OWNER_TASK_STRUCT autotools check on PPC64
The HAVE_MUTEX_OWNER_TASK_STRUCT fails on PPC64 with the following
error:

error: 'current' undeclared (first use in this function)

We include linux/sched.h to ensure that current is available.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-02-05 15:36:03 -08:00
Brian Behlendorf dd3678fc29 Fix atomic64_* autoconf checks
The SPL_AC_ATOMIC_SPINLOCK, SPL_AC_TYPE_ATOMIC64_CMPXCHG, and
SPL_AC_TYPE_ATOMIC64_XCHG were all directly including the
'asm/atomic.h' header.  As of Linux 3.4 this header was removed
which results in a build failure.

The right thing to do is include 'linux/atomic.h' however we
can't safely do this because it doesn't exist in 2.6.26 kernels.
Therefore, we include 'linux/fs.h' which in turn includes the
correct atomic header regardless of the kernel version.

When these incorrect APIs are used in ZFS the following build
failure results.

  arc.c:791:80: warning: '__ret' may be used uninitialized
  in this function [-Wuninitialized]
  arc.c:791:1875: error: call to '__cmpxchg_wrong_size'
  declared with attribute error: Bad argument size for cmpxchg

Since this is all Linux 2.6.24 compatibility code there's
an argument to be made that it should be removed because
kernels this old are not supported.  However, because we're
so close to a release I'm going to leave it in place for now.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#814
Closes zfsonlinux/zfs#1254
2013-02-05 10:05:46 -08:00
Brian Behlendorf de081a2ab4 Check for KALLSYMS
Check at ./configure time that the kernel was built with kallsyms
support.  If the kernel doesn't have CONFIG_KALLSYMS defined the
modules will still compile cleanly but will not be loadable.  So
we really want to catch this early during ./configure.  Note that
we do not require CONFIG_KALLSYMS_ALL but it may be safely defined.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6
2013-01-29 16:35:23 -08:00
Brian Behlendorf 79c6e4c445 Remove NPTL_GUARD_WITHIN_STACK
Commit 4b2f65b253 increased the user
space stack by 4x to resolve certain stack overflows.  As such it
no longer makes sense to worry about a single extra page which
might or might not be part of the process stack.  There is now
ample headroom for normal usage.

By eliminating this configure check we are also resolving the
following segfault which intentionally occurs at configure time
and may be logged in dmesg.

  conftest[22156]: segfault at 7fbf18a47e48 ip 00000000004007fe
  sp 00007fbf18a4be50 error 6 in conftest[400000+1000]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-29 10:58:20 -08:00
Brian Behlendorf 2b7ab9d4d9 Linux 2.6.26 compat, lookup_bdev()
It's doubtful many people were impacted by this but commit 6c28567
accidentally broke ZFS builds for 2.6.26 and earlier kernels.  This
commit depends on the lookup_bdev() function which exists in 2.6.26
but wasn't exported until 2.6.27.

The availability of the function isn't critical so a wrapper is
introduced which returns ERR_PTR(-ENOTSUP) when the function isn't
defined.  This will have the effect of causing zvol_is_zvol() to
always fail for 2.6.26 kernels.  This in turn means vdevs will
always get opened concurrently which is good for normal usage.
This will only become an issue if your using a zvol as a vdev in
another pool.  In which case you really should be using a newer
kernel anyway.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1205
2013-01-28 15:35:00 -08:00
Brian Behlendorf ee93035378 Use sb->s_d_op default dentry operations
As of Linux 2.6.37 the right way to register custom dentry
operations is to use the super block's ->s_d_op field.
For older kernels they should be registered as part of the
lookup operation.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1223
2013-01-18 15:04:23 -08:00
Brian Behlendorf 84dd1f4f15 Remove spl_invalidate_inodes()
This functionality is no longer required by ZFS, see commit
zfsonlinux/zfs@7b3e34ba5a.
Since there are no other consumers, and because it adds
additional autoconf complexity which must be maintained
the spl_invalidate_inodes() function has been removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#795
2013-01-17 11:40:47 -08:00
Ned Bass f1a05fa114 Fix false ENOENT on snapshot control dentries
Lookups in the snapshot control directory for an existing snapshot
fail with ENOENT if an earlier lookup failed before the snapshot was
created.  This is because the earlier lookup causes a negative dentry
to be cached which is never invalidated.

The bug can be reproduced as follows (the second ls should succeed):

 $ ls /tank/.zfs/snapshot/s
 ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
 $ zfs snap tank@s
 $ ls /tank/.zfs/snapshot/s
 ls: cannot access /tank/.zfs/snapshot/s: No such file or directory

To remedy this, always invalidate cached dentries in the snapshot
control directory.  Since these entries never exist on disk there is
no significant performance penalty for the extra lookups.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1192
2013-01-16 16:28:54 -08:00
Brian Behlendorf e191b54ecf Only use gcc -Wunused-but-set-variable when available
Certain versions of gcc generate an 'unrecognized command
line option' error message when -Wunused-but-set-variable
is used unconditionally.  This in turn can cause several
of the autoconf tests to misdetect an interface.

Now, the use of -Wunused-but-set-variable in the autoconf
tests was introduced by commit b9c59ec8 to address a gcc
4.6 compatibility problem.  So we really only need to pass
this option for version of gcc which are known to support it.

Therefore, the tests have been updated to use the result of
the existing ZFS_AC_CONFIG_ALWAYS_NO_UNUSED_BUT_SET_VARIABLE
which determines if gcc supports this option.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1004
2013-01-10 16:09:39 -08:00
Brian Behlendorf 42b3ce622f Check for ZLIB_INFLATE and ZLIB_DEFLATE
Check at ./configure time that the kernel was built with zlib
support enabled.  This support may either be configured as a
module or builtin to the kernel.  But if it's missing the build
will fail so it's best to catch this early.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#582
2013-01-09 16:40:25 -08:00
Brian Behlendorf 050cd84e62 Linux compat 3.7.1, on_each_cpu()
Some kernels require that we include the 'linux/irqflags.h'
header for the SPL_AC_3ARGS_ON_EACH_CPU check.  Otherwise,
the functions local_irq_enable()/local_irq_disable() will not
be defined and the prototype will be misdetected as the four
argument version.

This change actually include 'linux/interrupt.h' which in turn
includes 'linux/irqflags.h' to be as generic as possible.

Additionally, passing NULL as the function can result in a
gcc error because the on_each_cpu() macro executes it
unconditionally.  To make the test more robust we pass the
dummy function on_each_cpu_func().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #204
2013-01-09 10:28:28 -08:00
Brian Behlendorf 1c7b3eaf87 RHEL 6.4 compat, fallocate()
In the upstream kernel the FALLOC_FL_PUNCH_HOLE #define was
introduced after the fallocate() function was moved from the
inode_operations to the file_operations structure.  Therefore,
the SPL code assumed that if FALLOC_FL_PUNCH_HOLE was defined
it was safe to use f_ops->fallocate().

Unfortunately, the RHEL6.4 kernel has only backported the
FALLOC_FL_PUNCH_HOLE #define and not the fallocate() change.

To address this compatibility issue the spl_filp_fallocate()
helper function was added to properly detect which interface
is available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-08 09:53:13 -08:00
Brian Behlendorf 8780c53961 Update SAs when an inode is dirtied
Revert the portion of commit d3aa3ea which always resulted in the
SAs being update when an mmap()'ed file was closed.  That change
accidentally resulted in unexpected ctime updates which upset tools
like git.  That was always a horrible hack and I'm happy it will
never make it in to a tagged release.

The right fix is something I initially resisted doing because I
was worried about the additional overhead.  However, in hindsight
the overhead isn't as bad as I feared.

This patch implemented the sops->dirty_inode() callback which is
unsurprisingly called when an inode is dirtied.  We leverage this
callback to keep the znode SAs strictly in sync with the inode.

However, for now we're going to go slowly to avoid introducing
any new unexpected issues by only updating the atime, mtime, and
ctime.  This will cover the callpath of most concern to us.

  ->filemap_page_mkwrite->file_update_time->update_time->
      mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #764
Closes #1140
2012-12-14 12:18:54 -08:00
Brian Behlendorf eb0be2ed46 Removed SPL_AC_3ARGS_INIT_WORK check
All consumers of the kernel delayed work queues have been shifted
over to rely on the taskq implementation.  This compatibility code
can now be removed.  Any new callers which need this functionality
should use the taskq interfaces for delayed work items.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-12-12 09:57:10 -08:00
Brian Behlendorf 56a517ae3a Verify --with-linux source directory exists
Previously this check was only performed when ./configure was
attempting to autodetect your kernel source directory.  But we
should also handle the case where --with-linux was provided
and is obviously wrong.  This way we catch the error before
invoking make and compiling the source with an incorrect
autoconf results.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#162
2012-11-29 15:08:35 -08:00
Brian Behlendorf 251677e98f Verify --with-linux source directory exists
Previously this check was only performed when ./configure was
attempting to autodetect your kernel source directory.  But we
should also handle the case where --with-linux was provided
and is obviously wrong.  This way we catch the error before
invoking make and compiling the source with an incorrect
autoconf results.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #162
2012-11-29 15:05:54 -08:00
Brian Behlendorf 2404b01499 Improve AF hard disk detection
Use the bdev_physical_block_size() interface to determine the
minimize write size which can be issued without incurring a
read-modify-write operation.  This is used to set the ashift
correctly to prevent a performance penalty when using AF hard
disks.

Unfortunately, this interface isn't entirely reliable because
it's not uncommon for disks to misreport this value.  For this
reason you may still need to manually set your ashift with:

  zpool create -o ashift=12 ...

The solution to this in the upstream Illumos source was to add
a white list of known offending drives.  Maintaining such a list
will be a burden, but it still may be worth doing if we can
detect a large number of these drives.  This should be considered
as future work.

Reported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #916
2012-11-15 11:06:14 -08:00
Brian Behlendorf 1e0c2c2ccf Linux 3.7 compat, __clear_close_on_exec() removed
Commit torvalds/linux@b8318b0 moved the __clear_close_on_exec()
function out of include/linux/fdtable.h and in to fs/file.c
making it unavailable to the SPL.

Now as it turns out we only used this function to tear down
some test infrastructure for the vn_getf()/vn_releasef() SPLAT
regression tests.  Rather than implement even more autoconf
compatibilty code to handle this we just remove the test case.
This also allows us to drop three existing autoconf tests.

This does mean the SPLAT tests will no longer verify these
functions but historically they have never been a problem.
And if we feel we absolutely need this test coverage I'm
sure a more portable version of the test case could be added.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #183
2012-10-18 13:36:44 -07:00
Yuxuan Shui bcb15891ab Linux 3.6 compat, kern_path_locked() added
The kern_path_parent() function was removed from Linux 3.6 because
it was observed that all the callers just want the parent dentry.
The simpler kern_path_locked() function replaces kern_path_parent()
and does the lookup while holding the ->i_mutex lock.

This is good news for the vn implementation because it removes the
need for us to handle the locking.  However, it makes it harder to
implement a single readable vn_remove()/vn_rename() function which
is usually what we prefer.

Therefore, we implement a new version of vn_remove()/vn_rename()
for Linux 3.6 and newer kernels.  This allows us to leave the
existing working implementation untouched, and to add a simpler
version for newer kernels.

Long term I would very much like to see all of the vn code removed
since what this code enabled is generally frowned upon in the kernel.
But that can't happen util we either abondon the zpool.cache file
or implement alternate infrastructure to update is correctly in
user space.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #154
2012-10-14 16:26:21 -07:00
Richard Yao 95f5c63b47 Linux 3.6 compat, iops->mkdir()
Use .mkdir instead of .create in 3.3 compatibility check.  Linux 3.6
modifies inode_operations->create's function prototype. This causes
an autotools Linux 3.3. compatibility check for a function prototype
change in create, mkdir and mknode to fail. Since mkdir and mknode
are unchanged, we modify the check to examine it instead.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
2012-10-14 15:29:26 -07:00
Yuxuan Shui 558ef6d080 Linux 3.6 compat, iops->create()
As of Linux commit ebfc3b49a7ac25920cb5be5445f602e51d2ea559 the
struct nameidata is no longer passed to iops->create.  Instead
only the result of (inamedata->flags & LOOKUP_EXCL) is passed.

ZFS like almost all Linux fileystems never made use of this so
only the prototype needs to be wrapped for compatibility.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
2012-10-14 14:42:25 -07:00
Yuxuan Shui 8f195a908f Linux 3.6 compat, iops->lookup()
As of Linux commit 00cd8dd3bf95f2cc8435b4cac01d9995635c6d0b the
struct nameidata is no longer passed to iops->lookup.  Instead
only the inamedata->flags are passed.

ZFS like almost all Linux fileystems never made use of this so
only the prototype needs to be wrapped for compatibility.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
2012-10-14 13:06:54 -07:00
Yuxuan Shui 3c20361075 Linux 3.6 compat, sget()
As of Linux commit 9249e17fe094d853d1ef7475dd559a2cc7e23d42 the
mount flags are now passed to sget() so they can be used when
initializing a new superblock.

ZFS never uses sget() in this fashion so we can simply pass a
zero and add a zpl_sget() compatibility wrapper.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #873
2012-10-14 13:06:48 -07:00
Etienne Dechamps bbdc6ae495 Add interface for file hole punching.
This adds an interface to "punch holes" (deallocate space) in VFS
files. The interface is identical to the Solaris VOP_SPACE interface.
This interface is necessary for TRIM support on file vdevs.

This is implemented using Linux fallocate(FALLOC_FL_PUNCH_HOLE), which
was introduced in 2.6.38. For a brief time before 2.6.38 this was done
using the truncate_range inode operation, which was quickly deprecated.
This patch only supports FALLOC_FL_PUNCH_HOLE.

This adds support for the truncate_range() inode operation to
VOP_SPACE() for file hole punching. This API is deprecated and removed
in 3.5, so it's only useful for old kernels.

On tmpfs, the truncate_range() inode operation translates to
shmem_truncate_range(). Unfortunately, this function expects the end
offset to be inclusive and aligned to the end of a page. If it is not,
the kernel will stop with a BUG_ON().

This patch fixes the issue by adapting to the constraints set forth by
shmem_truncate_range().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #168
2012-10-04 16:22:07 -07:00
Brian Behlendorf 6d1d976b2c Modify vdev_elevator_switch() to use elevator_change()
As of Linux 2.6.36 an elevator_change() interface was added.
This commit updates vdev_elevator_switch() to use this interface
when available, otherwise it falls back to the usermodehelper
method.

Original-patch-by: foobarz <sysop@xeon.(none)>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #906
2012-10-03 13:31:44 -07:00
Cyril Plisko 393b44c711 Implement .commit_metadata hook for NFS export
In order to implement synchronous NFS metadata semantics ZFS
needs to provide the .commit_metadata hook.  All it takes there
is to make sure changes are committed to ZIL.  Fortunately
zfs_fsync() does just that, so simply calling it from
zpl_commit_metadata() does the trick.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #969
2012-10-03 10:49:45 -07:00
Brian Behlendorf cda4db408c Revert "Improve AF hard disk detection"
This reverts commit 395350c85d which
accidentally introduced issue #955.

Pools using AF drives which were originally created with a sector
size of 512 bytes will now be correctly detected to have physical
sector size of 4096.  This is desirable for a new pool, however for
an existing pool abruptly changing the sector size causes problems.

For this reason, this change is being reverted until the additional
logic can be added to detect the existing pool case.  Existing
pools must use the ashift size stored in the label regardless of
what the disk reports.  This is critical for compatibility.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #955
2012-09-11 16:33:49 -07:00
Brian Behlendorf 395350c85d Improve AF hard disk detection
Use the bdev_physical_block_size() interface to determine the
minimize write size which can be issued without incurring a
read-modify-write operation.  This is used to set the ashift
correctly to prevent a performance penalty when using AF hard
disks.

Unfortunately, this interface isn't entirely reliable because
it's not uncommon for disks to misreport this value.  For this
reason you may still need to manually set your ashift with:

  zpool create -o ashift=12 ...

The solution to this in the upstream Illumos source was to add
a while list of known offending drives.  Maintaining such a list
will be a burden, but it still may be worth doing if we can
detect a large number of these drives.  This should be considered
as future work.

Reported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #916
2012-09-04 15:35:32 -07:00
Brian Behlendorf bc03e07a7c Revert "Detect kernels that honor gfp flags passed to vmalloc()"
This reverts commit 36811b4430.
Which is no longer required because there is now SPL code in
place to safely handle the deadlocks the kernel patch was designed
to address.  Therefore we can unconditionally use vmalloc() and
drop all the PF_MEMALLOC code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-08-27 12:00:55 -07:00
Prakash Surya 587045a638 Remove SPL_LINUX_CONFIG autoconf macro
Since removing the check for CONFIG_PREEMPT, there are no consumers of
the SPL_LINUX_CONFIG macro. As such, there is no reason to keep it
around.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #164
2012-08-27 11:58:37 -07:00
Prakash Surya f86373f5b2 Remove autoconf check for CONFIG_PREEMPT
The autoconf macro which failed if CONFIG_PREEMPT was set in the kernel
config was removed. With the inclusion of a few previous patches
targeting support for preempt enabled kernels, it is now safe to run
with this kernel config option enabled.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #83
2012-08-27 11:54:41 -07:00
Prakash Surya e3a4360702 Revert "Make CONFIG_PREEMPT Fatal"
This reverts commit 7731d46b69.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-08-27 11:52:53 -07:00
Brian Behlendorf ca8b5af89d Remove autotools products
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #718
2012-08-27 11:47:44 -07:00
Brian Behlendorf c638e9ad04 Remove autotools products
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#718
2012-08-27 11:46:23 -07:00
Richard Yao 074e72953c Check kernel source directory for SPL
ZFS fails to build when SPL is built into the kernel on unless
--with-spl=/path/to/kernel/sources is specified. We fallback to the
kernel sources directory when SPL is not found elsewhere to resolve
that.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closed #896
2012-08-26 13:49:09 -07:00
Massimo Maggi 52cd92022e Fix snapshot automounting with GrSecurity constify plugin.
./configure erroneously detects absence of dops->d_automount
when built against a GrSecurity patched kernel.

Summerized error message found in config.log:

  checking whether dops->d_automount() exists
  ...
  In function 'main': ... error: constified variable 'dops'
  cannot be local

The "dops" variable cannot be a local variable, so it's
moved to the global scope.

This test also fails if the prototype of the dops->d_automount
function pointer is changed.

Signed-off-by: Massimo Maggi <massimo@mmmm.it>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Closes #884
2012-08-24 08:56:38 -07:00
Prakash Surya 26e08952e6 Support building a zfs-modules-dkms sub package
This commit adds support for building a zfs-modules-dkms sub package
built around Dynamic Kernel Module Support. This is to allow building
packages using the DKMS infrastructure which is intended to ease the
burden of kernel version changes, upgrades, etc.

By default zfs-modules-dkms-* sub package will be built as part of
the 'make rpm' target.  Alternately, you can build only the DKMS
module package using the 'make rpm-dkms' target.

Examples:

    # To build packaged binaries as well as a dkms packages
    $ ./configure && make rpm

    # To build only the packaged binary utilities and dkms packages
    $ ./configure && make rpm-utils rpm-dkms

Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are
      supported for building the dkms sub package.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #535
2012-08-08 15:21:01 -07:00
Prakash Surya 5085d55817 Add '--with-spl-timeout' option
When checking for the SPL Module.symvers file, a timeout can now be
passed in which will pause the configure step while it waits for this
file to be generated. By default, the configure behavior is unchanged as
a timeout of 0 is used. If a positive number of seconds is passed,
configure will wait that number of seconds for the Module.symvers file
before moving on.

The main motivation for this change was to support parallel execution of
'./configure && make' for the SPL and ZFS packages in preparation of
supporting DKMS based packages.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-08-08 15:20:55 -07:00
Prakash Surya d83d25c2f8 Support building a spl-modules-dkms sub package
This commit adds support for building a spl-modules-dkms sub package
built around Dynamic Kernel Module Support. This is to allow building
packages using the DKMS infrastructure which is intended to ease the
burden of kernel version changes, upgrades, etc.

By default spl-modules-dkms-* sub package will be built as part of
the 'make rpm' target.  Alternately, you can build only the DKMS
module package using the 'make rpm-dkms' target.

Examples:

    # To build packaged binaries as well as a dkms packages
    $ ./configure && make rpm

    # To build only the packaged binary utilities and dkms packages
    $ ./configure && make rpm-utils rpm-dkms

Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are
      supported for building the dkms sub package.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#535
2012-08-08 13:49:40 -07:00
Etienne Dechamps ee5fd0bb80 Set zvol discard_granularity to the volblocksize.
Currently, zvols have a discard granularity set to 0, which suggests to
the upper layer that discard requests of arbirarily small size and
alignment can be made efficiently.

In practice however, ZFS does not handle unaligned discard requests
efficiently: indeed, it is unable to free a part of a block. It will
write zeros to the specified range instead, which is both useless and
inefficient (see dnode_free_range).

With this patch, zvol block devices expose volblocksize as their discard
granularity, so the upper layer is aware that it's not supposed to send
discard requests smaller than volblocksize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #862
2012-08-07 14:55:31 -07:00
Etienne Dechamps 476ff5a4da Handle any invalidate_inodes_check prototype.
In the comments of commit 723aa3b0c2,
mmatuska reported that the test for invalidate_inodes_check() is broken
if invalidate_inodes() takes two arguments.

This patch fixes the issue by resorting to another approach for
detecting invalidate_inodes_check(): is simply checks if
invalidate_inodes is defined as a macro. If it is, then it concludes
that invalidate_inodes_check() is available. This will continue to work
even if the prototype of invalidate_inodes_check() changes over time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #148
2012-08-06 11:39:49 -07:00
Etienne Dechamps 723aa3b0c2 When checking for symbol exports, try compiling.
This patch adds a new autoconf function: SPL_LINUX_TRY_COMPILE_SYMBOL.
This new function does the following:

 - Call LINUX_TRY_COMPILE with the specified parameters.
 - If unsuccessful, return false.
 - If successful and we're configuring with --enable-linux-builtin,
   return true.
 - Else, call CHECK_SYMBOL_EXPORT with the specified parameters and
   return the result.

All calls to CHECK_SYMBOL_EXPORT are converted to
LINUX_TRY_COMPILE_SYMBOL so that the tests work even when configuring
for builtin on a kernel which doesn't have loadable module support, or
hasn't been built yet.

The only exception are:

 - AC_GET_VMALLOC_INFO, because we don't even have a public header to
include in the test case, but that's okay considering this symbol can
be ignored just fine.

- SPL_AC_DEVICE_CREATE, which is legacy API for 2.6.18 kernels.  Since
kernels this old are no longer supported it should arguably just be
removed entirely from the build system.

Note that we're also checking for the correct prototype with an actual
call, which was not the case with CHECK_SYMBOL_EXPORT. However, for
"complicated" test cases like with multiple symbol versions (e.g.
vfs_fsync), we stick with the original behavior and only check for the
function's existence.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#851
2012-07-26 15:12:35 -07:00
Etienne Dechamps df7cc5bc71 Fake modpost stage for LINUX_COMPILE.
Currently, when building a test case, we're compiling an entire Linux
module from beginning to end. This includes the MODPOST stage, which
generates a "conftest.mod.c" file with some boilerplate module
declaration code.

This poses a problem when configuring for built-in on kernels which have
loadable module support disabled. In this case conftest.mod.c is
referencing disabled code, resulting in a compilation failure, thus
breaking the tests.

This patch fixes the issue by faking the modpost stage when the
--enable-linux-builtin option is provided.  It does so by forcing the
modpost command to be /bin/true, and using an empty conftest.mod.c file.
The test module still compiles fine, although the result isn't loadable,
but we don't really care at this point.

Note it is important to preserve the modpost stage when building out of
tree.  This allows for the posibility of configure checks to leverage
this phase to identify GPL-only symbols.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#851
2012-07-26 15:12:10 -07:00
Etienne Dechamps 0408008b33 Make configure builtin-aware.
This patch adds a new option to configure: --enable-linux-builtin. When
this option is used, the following happens:

 - Compilation of kernel modules is disabled.

 - A failure to find UTS_RELEASE is followed by a suggestion to run
   "make prepare" on the kernel source tree.

This patch also adds a new test which tries to compile an empty module
as a basic toolchain sanity test. If it fails and the option was
specified, the error is followed by a suggestion to run "make scripts"
on the kernel source tree.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#851
2012-07-26 14:55:20 -07:00
Etienne Dechamps 016432fbeb Don't build packages that haven't been selected.
Currently, when configure --with-config is used, selective compilation
is only effective for the simple "make" case. Package builders (e.g.
make rpm) still build everything (utils and modules). This patch fixes
that.

This patch also drops the duplicate rpm-modules build target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Issue zfsonlinux/zfs#851
2012-07-26 14:54:32 -07:00
Etienne Dechamps 705741827a When checking for symbol exports, try compiling.
This patch adds a new autoconf function: ZFS_LINUX_TRY_COMPILE_SYMBOL.
This new function does the following:

 - Call LINUX_TRY_COMPILE with the specified parameters.
 - If unsuccessful, return false.
 - If successful and we're configuring with --enable-linux-builtin,
   return true.
 - Else, call CHECK_SYMBOL_EXPORT with the specified parameters and
   return the result.

All calls to CHECK_SYMBOL_EXPORT are converted to
LINUX_TRY_COMPILE_SYMBOL so that the tests work even when configuring
for builtin on a kernel which doesn't have loadable module support, or
hasn't been built yet.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
2012-07-26 13:42:57 -07:00
Etienne Dechamps fc88a6dda9 Fake modpost stage for LINUX_COMPILE.
Currently, when building a test case, we're compiling an entire Linux
module from beginning to end. This includes the MODPOST stage, which
generates a "conftest.mod.c" file with some boilerplate module
declaration code.

This poses a problem when configuring for built-in on kernels which have
loadable module support disabled. In this case conftest.mod.c is
referencing disabled code, resulting in a compilation failure, thus
breaking the tests.

This patch fixes the issue by faking the modpost stage when the
--enable-linux-builtin option is provided.  It does so by forcing the
modpost command to be /bin/true, and using an empty conftest.mod.c file.
The test module still compiles fine, although the result isn't loadable,
but we don't really care at this point.

Note it is important to preserve the modpost stage when building out of
tree.  The ZFS_AC_KERNEL_BLK_END_REQUEST, ZFS_AC_KERNEL_BLK_QUEUE_FLUSH,
and ZFS_AC_KERNEL_BLK_RQ_BYTES configure checks all depend on it to
identify GPL-only symbols.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
2012-07-26 13:41:02 -07:00
Etienne Dechamps 319a99a3d4 Make configure builtin-aware.
This patch adds a new option to configure: --enable-linux-builtin. When
this option is used, the following happens:

 - Compilation of kernel modules is disabled.

 - A failure to find UTS_RELEASE is followed by a suggestion to run
   "make prepare" on the kernel source tree.

This patch also adds a new test which tries to compile an empty module
as a basic toolchain sanity test. If it fails and the option was
specified, the error is followed by a suggestion to run "make scripts"
on the kernel source tree.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
2012-07-26 13:40:18 -07:00
Etienne Dechamps b2c5198b19 Don't build packages that haven't been selected.
Currently, when configure --with-config is used, selective compilation
is only effective for the simple "make" case. Package builders (e.g.
make rpm) still build everything (utils and modules). This patch fixes
that.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
2012-07-26 13:39:37 -07:00
Richard Yao 739a1a82e0 Linux 3.5 compat, end_writeback() changed to clear_inode()
The end_writeback() function was changed by moving the call to
inode_sync_wait() earlier in to evict().   This effecitvely changes
the ordering of the sync but it does not impact the details of
the zfs implementation.

However, as part of this change end_writeback() was renamed to
clear_inode() to reflect the new semantics.  This change does
impact us and clear_inode() now maps to end_writeback() for
kernels prior to 3.5.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #784
2012-07-23 12:29:36 -07:00
Richard Yao ea1fdf46e2 Linux 3.5 compat, iops->truncate_range() removed
The vmtruncate_range() support has been removed from the kernel in
favor of using the fallocate method in the file_operations table.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:32 -07:00
Richard Yao 756c3e5a9c Linux 3.5 compat, eops->encode_fh() takes inodes
The export_operations member ->encode_fh() has been updated to
take both the child and parent inodes.  This interface used to
take the child dentry and a bool describing if the parent is needed.

NOTE: While updating this code I noticed that we do not currently
cleanly handle the case where we're passed a connectable parent.
This code should be audited to make sure we're doing the right thing.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:23 -07:00
Richard Yao ed3fc80048 Fix NULL pointer dereference on PaX/GRSecurity patched Linux 3.3 and later kernels
Support for PaX/GRSecurity patched kernels was developed against Linux
3.2.  Unfortunately, an autotools check introduced for a Linux 3.3 API
fails on PaX/GRSecurity patched kernels. This causes the module to be
built against the Linux 3.2 ABI, which results in a NULL pointer
dereference at runtime.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Closes #794
Closes #809
2012-07-20 12:31:45 -07:00
Richard Yao 0a6b03d3b8 Fix build failures on PaX/GRSecurity patched kernels
Gentoo Hardened kernels include the PaX/GRSecurity patches. They use a
dialect of C that relies on a GCC plugin. In particular, struct
file_operations has been marked do_const in the PaX/GRSecurity dialect,
which causes GCC to consider all instances of it as const. This caused
failures in the autotools checks and the ZFS source code.

To address this, we modify the autotools checks to take into account
differences between the PaX C dialect and the regular C dialect. We also
modify struct zfs_acl's z_ops member to be a pointer to a function
pointer table. Lastly, we modify zpl_put_link() to address a PaX change
to the function prototype of nd_get_link().  This avoids compiler errors
in the PaX/GRSecurity dialect.

Note that the change in zpl_put_link() causes a warning that becomes a
build failure when debugging is enabled. Fixing that warning requires
ryao/spl@5ca50ef459.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #484
2012-07-17 09:22:43 -07:00
Etienne Dechamps b5a28807cd Move partition scanning from userspace to module.
Currently, zpool online -e (dynamic vdev expansion) doesn't work on
whole disks because we're invoking ioctl(BLKRRPART) from userspace
while ZFS still has a partition open on the disk, which results in
EBUSY.

This patch moves the BLKRRPART invocation from the zpool utility to the
module. Specifically, this is done just before opening the device in
vdev_disk_open() which is called inside vdev_reopen(). This requires
jumping through some hoops to get to the disk device from the partition
device, and to make sure we can still open the partition after the
BLKRRPART call.

Note that this new code path is triggered on dynamic vdev expansion
only; other actions, like creating a new pool, are unchanged and still
call BLKRRPART from userspace.

This change also depends on API changes which are available in 2.6.37
and latter kernels.  The build system has been updated to detect this,
but there is no compatibility mode for older kernels.  This means that
online expansion will NOT be available in older kernels.  However, it
will still be possible to expand the vdev offline.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #808
2012-07-17 09:17:31 -07:00
Richard Yao 36811b4430 Detect kernels that honor gfp flags passed to vmalloc()
zfsonlinux/spl@2092cf68d8 used
PF_MEMALLOC to workaround a bug in the Linux kernel where
allocations did not honor the gfp flags passed to vmalloc().
Unfortunately, PF_MEMALLOC has the side effect of permitting
allocations to allocate pages outside of ZONE_NORMAL. This
has been observed to result in the depletion of ZONE_DMA32.

A kernel patch is available in the Gentoo bug tracker for
this issue.

  https://bugs.gentoo.org/show_bug.cgi?id=416685

This negates any benefit PF_MEMALLOC provides, so we introduce
an autotools check to disable the use of PF_MEMALLOC on
systems with patched kernels.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #126
2012-07-11 11:44:27 -07:00
Richard Yao e0093fea58 Linux 3.4 compat, __clear_close_on_exec replaces FD_CLR
torvalds/linux@1dce27c5aa introduced
__clear_close_on_exec() as a replacement for FD_CLR. Further commits
appear to have removed FD_CLR from the Linux source tree.  This
causes the following failure:

  error: implicit declaration of function '__FD_CLR'
  [-Werror=implicit-function-declaration]

To correct this we update the code to use the current
__clear_close_on_exec() interface for readability.  Then we introduce
an autotools check to determine if __clear_close_on_exec() is available.
If it isn't then we define some compatibility logic which used the older
FD_CLR() interface.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #124
2012-06-13 16:18:51 -07:00
Richard Yao 6a0936babc Linux 3.4 compat, d_make_root() replaces d_alloc_root()
torvalds/linux@adc0e91ab1 introduced
introduced d_make_root() as a replacement for d_alloc_root(). Further
commits appear to have removed d_alloc_root() from the Linux source
tree. This causes the following failure:

  error: implicit declaration of function 'd_alloc_root'
  [-Werror=implicit-function-declaration]

To correct this we update the code to use the current d_make_root()
interface for readability.  Then we introduce an autotools check
to determine if d_make_root() is available.  If it isn't then we
define some compatibility logic which used the older d_alloc_root()
interface.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #776
2012-06-11 10:04:49 -07:00
Ned Bass cac1f230e0 Improve CONFIG_DEBUG_LOCK_ALLOC error message
The configure script error message for kernels built with
CONFIG_DEBUG_LOCK_ALLOC may give the impression that the issue is
strictly with license compliance.  To avoid confusion add some words
indicating that the linking stage will fail if the build continues.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #773
2012-06-11 09:28:04 -07:00
Brian Behlendorf e5b8562277 Extend CONFIG_DEBUG_LOCK_ALLOC check
The CONFIG_DEBUG_LOCK_ALLOC check at configure time was added to
detect when mutex_lock() is defined as a GPL-only symbol.  However,
the check as written only inferred this from this configuration
setting, it never actually checked.  This change introduces that
missing check to prevent false positives.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-06-01 08:51:56 -07:00
Brian Behlendorf b39d3b9f7b Linux 3.3 compat, iops->create()/mkdir()/mknod()
The mode argument of iops->create()/mkdir()/mknod() was changed from
an 'int' to a 'umode_t'.  To prevent a compiler warning an autoconf
check was added to detect the API change and then correctly set a
zpl_umode_t typedef.  There is no functional change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #701
2012-04-30 12:52:38 -07:00
Brian Behlendorf f47e1351db Fix executable permissions
Caught by lint, this permission change was accidentally introduced
by commit 42cb3819f1.  Restore the
correct permissions and while I'm at it add a missing whack-bang
to config/ltmain.sh.

  lint: executable-not-elf-or-script: zpool_main.c zfs_main.c

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #620
2012-03-26 11:52:44 -07:00
Brian Behlendorf 1c5de20ae2 Add --enable-debug-dmu-tx configure option
Allow rigorous (and expensive) tx validation to be enabled/disabled
indepentantly from the standard zfs debugging.  When enabled these
checks ensure that all txs are constructed properly and that a dbuf
is never dirtied without taking the correct tx hold.

This checking is particularly helpful when adding new dmu consumers
like Lustre.  However, for established consumers such as the zpl
with no known outstanding tx construction problems this is just
overhead.

--enable-debug-dmu-tx  - Enable/disable validation of each tx as
--disable-debug-dmu-tx   it is constructed.  By default validation
                         is disabled due to performance concerns.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-23 12:25:17 -07:00
Brian Behlendorf ebe7e575ea Add .zfs control directory
Add support for the .zfs control directory.  This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required.  The bulk of the core
functionality is now all there with the following limitations.

*) The .zfs/snapshot directory automount support requires a 2.6.37
   or newer kernel.  The exception is RHEL6.2 which has backported
   the d_automount patches.

*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
   in the .zfs/snapshot directory works as expected.  However,
   this functionality is only available to root until zfs
   delegations are finished.

      * mkdir - create a snapshot
      * rmdir - destroy a snapshot
      * mv    - rename a snapshot

The following issues are known defeciences, but we expect them to
be addressed by future commits.

*) Add automount support for kernels older the 2.6.37.  This should
   be possible using follow_link() which is what Linux did before.

*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
   The majority of the ground work for this is complete.  However,
   finishing this work will require resolving some lingering
   integration issues with the Linux NFS kernel server.

*) The .zfs/shares directory exists but no futher smb functionality
   has yet been implemented.

Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #173
2012-03-22 13:03:47 -07:00
Brian Behlendorf a3a69b74cd Fix distribution detection
Improve the distribution detection by moving the tests for
distribution specific files first.  The Ubuntu and Debian
checks are left for last because they are the least likely
to be unique.  This is particularly true in the case of Debian
since so many distributions are based on Debian.

Since this is currently only used to identify the correct
packaging method for this system the result in many instances
is simply cosmetic.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-05 10:38:38 -08:00
Richard Yao 76c2b24c61 Fix distribution detection
Improve the distribution detection by moving the tests for
distribution specific files first.  The Ubuntu and Debian
checks are left for last because they are the least likely
to be unique.  This is particularly true in the case of Debian
since so many distributions are based on Debian.

Since this is currently only used to identify the correct
packaging method for this system the result in many instances
is simply cosmetic.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-05 10:38:27 -08:00
Brian Behlendorf 3c208a5480 Cleanly support debug packages
Allow a source rpm to be rebuilt with debugging enabled.  This
avoids the need to have to manually modify the spec file.  By
default debugging is still largely disabled.  To enable specific
debugging features use the following options with rpmbuild.

  '--with debug'               - Enables ASSERTs
  '--with debug-log'           - Enables the internal debug log
  '--with debug-kmem'          - Enables basic memory accounting
  '--with debug-kmem-tracking' - Enables detailed memory tracking

  # For example:
  $ rpmbuild --rebuild --with debug spl-modules-0.6.0-rc6.src.rpm

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-27 14:24:22 -08:00
Brian Behlendorf 4b787d75c8 Cleanly support debug packages
Allow a source rpm to be rebuilt with debugging enabled.  This
avoids the need to have to manually modify the spec file.  By
default debugging is still largely disabled.  To enable specific
debugging features use the following options with rpmbuild.

  '--with debug'               - Enables ASSERTs

  # For example:
  $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm

Additionally, ZFS_CONFIG has been added to zfs_config.h for
packages which build against these headers.  This is critical
to ensure both zfs and the dependant package are using the same
prototype and structure definitions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-27 14:08:17 -08:00
Etienne Dechamps 30930fba21 Add support for DISCARD to ZVOLs.
DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
It allows ZVOL clients to discard (unmap, trim) block ranges from
a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
shrink instead of just grow.

We can't use zfs_space() or zfs_freesp() here, since these functions
only work on regular files, not volumes. Fortunately we can use the
low-level function dmu_free_long_range() which does exactly what we
want.

Currently the discard operation is not added to the log. That's not
a big deal since losing discard requests cannot result in data
corruption. It would however result in disk space usage higher than
it should be. Thus adding log support to zvol_discard() is probably
a good idea for a future improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-09 16:19:38 -08:00
Etienne Dechamps cb2d19010d Support the fallocate() file operation.
Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is
supported, since it's the only one that matches the behavior of
zfs_space(). This makes it pretty much useless in its current
form, but it's a start.

To support other flag combinations we would need to modify
zfs_space() to make it more flexible, or emulate the desired
functionality in zpl_fallocate().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #334
2012-02-09 16:19:32 -08:00
Etienne Dechamps 34037afe24 Improve ZVOL queue behavior.
The Linux block device queue subsystem exposes a number of configurable
settings described in Linux block/blk-settings.c. The defaults for these
settings are tuned for hard drives, and are not optimized for ZVOLs. Proper
configuration of these options would allow upper layers (I/O scheduler) to
take better decisions about write merging and ordering.

Detailed rationale:

 - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to
   handle writes of any size, so there's no reason to impose a limit. Let the
   upper layer decide.

 - max_segments and max_segment_size are set to unlimited. zvol_write() will
   copy the requests' contents into a dbuf anyway, so the number and size of
   the segments are irrelevant. Let the upper layer decide.

 - physical_block_size and io_opt are set to the ZVOL's block size. This
   has the potential to somewhat alleviate issue #361 for ZVOLs, by warning
   the upper layers that writes smaller than the volume's block size will be
   slow.

 - The NONROT flag is set to indicate this isn't a rotational device.
   Although the backing zpool might be composed of rotational devices, the
   resulting ZVOL often doesn't exhibit the same behavior due to the COW
   mechanisms used by ZFS. Setting this flag will prevent upper layers from
   making useless decisions (such as reordering writes) based on incorrect
   assumptions about the behavior of the ZVOL.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-07 16:23:06 -08:00
Etienne Dechamps b18019d2d8 Fix synchronicity for ZVOLs.
zvol_write() assumes that the write request must be written to stable storage
if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed,
"sync" does *not* mean what we think it means in the context of the Linux
block layer. This is well explained in linux/fs.h:

    WRITE:       A normal async write. Device will be plugged.
    WRITE_SYNC:  Synchronous write. Identical to WRITE, but passes down
                 the hint that someone will be waiting on this IO
                 shortly.
    WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush.
    WRITE_FUA:   Like WRITE_SYNC but data is guaranteed to be on
                 non-volatile media on completion.

In other words, SYNC does not *mean* that the write must be on stable storage
on completion. It just means that someone is waiting on us to complete the
write request. Thus triggering a ZIL commit for each SYNC write request on a
ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL
users have no way to express that they actually want data to be written to
stable storage, which means the ZIL is broken for ZVOLs.

The request for stable storage is expressed by the FUA flag, so we must
commit the ZIL after the write if the FUA flag is set. In addition, we must
commit the ZIL before the write if the FLUSH flag is set.

Also, we must inform the block layer that we actually support FLUSH and FUA.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-07 16:23:06 -08:00
Brian Behlendorf 47621f3d76 Linux 3.3 compat, sops->show_options()
The second argument of sops->show_options() was changed from a
'struct vfsmount *' to a 'struct dentry *'.  Add an autoconf check
to detect the API change and then conditionally define the expected
interface.  In either case we are only interested in the zfs_sb_t.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #549
2012-02-03 10:02:01 -08:00
Brian Behlendorf 4b2220f0b9 Add --enable-debug-log configure option
Until now the notion of an internal debug logging infrastructure
was conflated with enabling ASSERT()s.  This patch clarifies things
by cleanly breaking the two subsystem apart.  The result of this
is the following behavior.

--enable-debug      - Enable/disable code wrapped in ASSERT()s.
--disable-debug       ASSERT()s are used to check invariants and
                      are never required for correct operation.
                      They are disabled by default because they
                      may impact performance.

--enable-debug-log  - Enable/disable the debug log infrastructure.
--disable-debug-log   This infrastructure allows the spl code and
                      its consumer to log messages to an in-kernel
                      log.  The granularity of the logging can be
                      controlled by a debug mask.  By default the
                      mask disables most debug messages resulting
                      in a negligible performance impact.  Because
                      of this the debug log is enabled by default.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-02 11:27:54 -08:00
Brian Behlendorf b40a77aefc Add the release component to headers
When the original build system code was added the release
component was accidentally omited from the development header
install path.  This patch adds the missing path component so
it's always clear exactly what release your compiling against.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-01-18 12:19:47 -08:00
Brian Behlendorf 0b14b9f327 Run SPL_AC_PACMAN only if $VENDOR is "arch"
Unfortunately, Arch's package manager `pacman` shares it's name with a
popular arcade video game. Thus, in order to refrain from executing the
video game when we mean to execute the package manager, SPL_AC_PACMAN is
now only run when $VENDOR is determined to be "arch".

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#517
2012-01-13 09:08:12 -08:00
Prakash Surya 58d956b085 Run ZFS_AC_PACMAN only if $VENDOR is "arch"
Unfortunately, Arch's package manager `pacman` shares it's name with a
popular arcade video game. Thus, in order to refrain from executing the
video game when we mean to execute the package manager, ZFS_AC_PACMAN is
now only run when $VENDOR is determined to be "arch".

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #517
2012-01-13 09:03:11 -08:00
Brian Behlendorf 166dd49de0 Linux 3.2 compat, security_inode_init_security()
The security_inode_init_security() API has been changed to include
a filesystem specific callback to write security extended attributes.
This was done to support the initialization of multiple LSM xattrs
and the EVM xattr.

This change updates the code to use the new API when it's available.
Otherwise it falls back to the previous implementation.

In addition, the ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY
autoconf test has been made more rigerous by passing the expected
types.  This is done to ensure we always properly the detect the
correct form for the security_inode_init_security() API.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #516
2012-01-12 15:06:39 -08:00
Darik Horn 588d900433 Linux 3.2 compat: rw_semaphore.wait_lock is raw
The wait_lock member of the rw_semaphore struct became a raw_spinlock_t
in Linux 3.2 at torvalds/linux@ddb6c9b58a.

Wrap spin_lock_* function calls in a new spl_rwsem_* interface to
ensure type safety if raw_spinlock_t becomes architecture specific,
and to satisfy these compiler warnings:

  warning: passing argument 1 of ‘spinlock_check’
    from incompatible pointer type [enabled by default]
  note: expected ‘struct spinlock_t *’
    but argument is of type ‘struct raw_spinlock_t *’

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #76
Closes: zfsonlinux/zfs#463
2012-01-11 16:28:05 -08:00
Brian Behlendorf ab26409db7 Linux 3.1 compat, super_block->s_shrink
The Linux 3.1 kernel has introduced the concept of per-filesystem
shrinkers which are directly assoicated with a super block.  Prior
to this change there was one shared global shrinker.

The zfs code relied on being able to call the global shrinker when
the arc_meta_limit was exceeded.  This would cause the VFS to drop
references on a fraction of the dentries in the dcache.  The ARC
could then safely reclaim the memory used by these entries and
honor the arc_meta_limit.  Unfortunately, when per-filesystem
shrinkers were added the old interfaces were made unavailable.

This change adds support to use the new per-filesystem shrinker
interface so we can continue to honor the arc_meta_limit.  The
major benefit of the new interface is that we can now target
only the zfs filesystem for dentry and inode pruning.  Thus we
can minimize any impact on the caching of other filesystems.

In the context of making this change several other important
issues related to managing the ARC were addressed, they include:

* The dnlc_reduce_cache() function which was called by the ARC
to drop dentries for the Posix layer was replaced with a generic
zfs_prune_t callback.  The ZPL layer now registers a callback to
drop these dentries removing a layering violation which dates
back to the Solaris code.  This callback can also be used by
other ARC consumers such as Lustre.

  arc_add_prune_callback()
  arc_remove_prune_callback()

* The arc_reduce_dnlc_percent module option has been changed to
arc_meta_prune for clarity.  The dnlc functions are specific to
Solaris's VFS and have already been largely eliminated already.
The replacement tunable now represents the number of bytes the
prune callback will request when invoked.

* Less aggressively invoke the prune callback.  We used to call
this whenever we exceeded the arc_meta_limit however that's not
strictly correct since it results in over zeleous reclaim of
dentries and inodes.  It is now only called once the arc_meta_limit
is exceeded and every effort has been made to evict other data from
the ARC cache.

* More promptly manage exceeding the arc_meta_limit.  When reading
meta data in to the cache if a buffer was unable to be recycled
notify the arc_reclaim thread to invoke the required prune.

* Added arcstat_prune kstat which is incremented when the ARC
is forced to request that a consumer prune its cache.  Remember
this will only occur when the ARC has no other choice.  If it
can evict buffers safely without invoking the prune callback
it will.

* This change is also expected to resolve the unexpect collapses
of the ARC cache.  This would occur because when exceeded just the
arc_meta_limit reclaim presure would be excerted on the arc_c
value via arc_shrink().  This effectively shrunk the entire cache
when really we just needed to reclaim meta data.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #466
Closes #292
2012-01-11 11:46:02 -08:00
Brian Behlendorf 5f6c14b1ed Proxmox VE kernel compat, invalidate_inodes()
The Proxmox VE kernel contains a patch which renames the function
invalidate_inodes() to invalidate_inodes_check().  In the process
it adds a 'check' argument and a '#define invalidate_inodes(x)'
compatibility wrapper for legacy callers.  Therefore, if either
of these functions are exported invalidate_inodes() can be
safely used.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #58
2011-12-21 14:29:45 -08:00
Prakash Surya 8eaa020b46 Move Arch Linux's VENDOR check above Ubuntu's
If the lsb-release package is installed on an Arch Linux distribution,
the configure step will incorrectly detect the running distribution as
Ubuntu. This is a result of both distributions providing an
/etc/lsb-release file, and the Ubuntu VENDOR check being performed
first.

Since the Arch Linux test check's for a file more specific to the Arch
Linux distribution, moving Arch Linux's VENDOR check above Unbuntu's
check provides a quick and easy solution.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-12-19 12:05:10 -08:00
Prakash Surya cd2817f8a6 Move Arch Linux's VENDOR check above Ubuntu's
If the lsb-release package is installed on an Arch Linux distribution,
the configure step will incorrectly detect the running distribution as
Ubuntu. This is a result of both distributions providing an
/etc/lsb-release file, and the Ubuntu VENDOR check being performed
first.

Since the Arch Linux test check's for a file more specific to the Arch
Linux distribution, moving Arch Linux's VENDOR check above Unbuntu's
check provides a quick and easy solution.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #72
2011-12-19 12:03:40 -08:00
Darik Horn 28eb9213d8 Linux 3.2 compat: set_nlink()
Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit

  SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170

Use the new set_nlink() kernel function instead.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #462
2011-12-16 20:02:52 -08:00
Prakash Surya 6ba3b44614 Add make rule for building Arch Linux packages
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:

    $ ./configure
    $ make pkg     # Alternatively, one can run 'make arch' as well

on the Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the zfs userland utilities and
another for the zfs kernel modules. The new packages can then be
installed by running:

    # pacman -U $package.pkg.tar.xz

In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be build using the 'sarch' make
rule.

NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, it's MD5 hash signature will be continually influx.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #491
2011-12-14 19:14:23 -08:00
Prakash Surya c2dceb5cd5 Add make rule for building Arch Linux packages
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:

    $ ./configure
    $ make pkg     # Alternatively, one can run 'make arch' as well

on an Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the spl userland utilties and
another for the spl kernel modules. The new packages can then be
installed by running:

    # pacman -U $package.pkg.tar.xz

In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be built using the 'sarch' make
rule.

NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, it's MD5 hash signature will be continually influx.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #68
2011-12-14 16:44:10 -08:00
Prakash Surya b9c59ec83a Fix configure tests to play nice with GCC 4.6
As of GCC 4.6, specific kernel 2.6.32 header files do not compile
cleanly without warnings. One specific example of this is the
arch/x86/include/asm/percpu.h file. Thus, a few of the configure tests
were getting hung up on this and the '-Wno-unsued-but-set-variables'
compile option had to be introduced.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #459
2011-11-29 16:14:25 -08:00
Prakash Surya e89236fd28 In autoconf v2.68, AC_LANG_PROGRAM must be quoted
This change updates the AC_LANG_PROGRAM autoconf macro invocations to be
wrapped in quotes. As of autoconf version 2.68, the quotes are necessary
to prevent warnings from appearing. Specifically, the autoconf v2.68
Forward Porting Notes specifies:

    It is important to note that you need to ensure that the call to
    AC_LANG_SOURCE is quoted and not expanded, otherwise that will
    cause the warning to appear nonetheless.

Finally, because of the additional quoting we can drop the extra
quotas used by the ZFS_AC_CONFIG_USER_STACK_GUARD autoconf check.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #464
2011-11-28 11:16:33 -08:00
Brian Behlendorf adcd70bd1a Linux 3.1 compat, fops->fsync()
The Linux 3.1 kernel updated the fops->fsync() callback yet again.
They now pass the requested range and delegate the responsibility
for calling filemap_write_and_wait_range() to the callback.  In
addition imutex is no longer held by the caller and the callback
is responsible for taking the lock if required.

This commit updates the code to provide a zpl_fsync() function
for the updated API.  Implementations for the previous two APIs
are also maintained for compatibility.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #445
2011-11-10 10:03:08 -08:00
Brian Behlendorf 0d0b523728 Linux 3.1 compat, vfs_fsync()
Preferentially use the vfs_fsync() function.  This function was
initially introduced in 2.6.29 and took three arguments.  As
of 2.6.35 the dentry argument was dropped from the function.
For older kernels fall back to using file_fsync() which also
took three arguments including the dentry.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #52
2011-11-09 19:36:21 -08:00
Brian Behlendorf 12ff95ff57 Linux 3.1 compat, kern_path_parent()
Prior to Linux 3.1 the kern_path_parent symbol was exported for
use by kernel modules.  As of Linux 3.1 it is now longer easily
available.  To handle this case the spl will now dynamically
look up address of the missing symbol at module load time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #52
2011-11-09 16:51:25 -08:00
Brian Behlendorf 8c19f5b407 Suppress packaging warning
Only under Ubuntu Lucid the rpm packaging step mistakenly adds
the following files twice to the package because of the /lib
naming convention.  This is harmless but results in a warning
which the buildot flags as a failure.  Suppress this warning.

  warning: File listed twice: /lib/udev/rules.d
  warning: File listed twice: /lib/udev/rules.d/60-zpool.rules
  warning: File listed twice: /lib/udev/rules.d/60-zvol.rules
  warning: File listed twice: /lib/udev/rules.d/90-zfs.rules
  warning: File listed twice: /lib/udev/sas_switch_id
  warning: File listed twice: /lib/udev/zpool_id
  warning: File listed twice: /lib/udev/zvol_id

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-11-08 11:32:04 -08:00
Brian Behlendorf 5547c2f1bf Simplify BDI integration
Update the code to use the bdi_setup_and_register() helper to
simplify the bdi integration code.  The updated code now just
registers the bdi during mount and destroys it during unmount.

The only complication is that for 2.6.32 - 2.6.33 kernels the
helper wasn't available so in these cases the zfs code must
provide it.  Luckily the bdi_setup_and_register() function
is trivial.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #367
2011-11-08 10:19:03 -08:00
Brian Behlendorf 97fd6a07c2 Fix HAVE_FS_STRUCT_SPINLOCK check for gcc-4.1.2
Older versions of gcc (gcc-4.1.2) will treat an 'incompatible
pointer type' as a warning instead of an error.  This results
in HAVE_FS_STRUCT_SPINLOCK being defined incorrectly.  This
failure mode was observed when using a RHEL6 2.6.32 based kernel
under RHEL5.5 which contains the old version of gcc.  To resolve
the issue the warning is explicitly promoted to an error.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-09-19 13:45:08 -07:00
Prakash Surya 8366cd6a83 Convert 'if' statements to AS_IF in kernel.m4
The 'if' statements found in kernel.m4 were converted to use the
portable alternative provided by autoconf, the AS_IF macro.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-09-06 13:20:48 -07:00
Prakash Surya 2984e0bb0c Fix minor autoconf error message inconsistencies
A few of the autoconf error messages were inconsistent with the rest of
the build system. To be specific, the inconsistencies addressed by this
commit are the following:

 * The second line of the error message for the CONFIG_PREEMPT check
   was missing it's third asterisk.

 * A few of the error messages were prefixed by two tabs, whereas the
   majority of error messages are only prefixed by a single tab.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-09-06 13:20:04 -07:00
Brian Behlendorf 9c4f40b894 Buildbot suppression rules
The warnings listed in the suppression file will be suppressed
and not flagged during regular buildbot builds.  These warnings
are expected, harmless, and can obscure real issues unless they
are suppressed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-08-19 16:26:06 -07:00
Brian Behlendorf ddd052aa83 Improve HAVE_EVICT_INODE check
The hardened gentoo kernel defines all of the super block
operation callbacks as const.  This prevents the autoconf test
from assigning the callback and results in a false negative.
By moving the assignment in to the declaration we can avoid
this issue and get a correct result for this patched kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #296
2011-08-08 16:42:09 -07:00
Kyle Fuller 12d06bac9b Move udev rules from /etc/udev to /lib/udev
This change moves the default install location for the zfs udev
rules from /etc/udev/ to /lib/udev/.  The correct convention is
for rules provided by a package to be installed in /lib/udev/.
The /etc/udev/ directory is reserved for custom rules or local
overrides.

Additionally, this patch cleans up some abuse of the bindir install
location by adding a udevdir and udevruledir install directories.
This allows us to revert to the default bin install location.  The
udev install directories can be set with the following new options.

  --with-udevdir=DIR      install udev helpers [EPREFIX/lib/udev]
  --with-udevruledir=DIR  install udev rules [UDEVDIR/rules.d]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #356
2011-08-08 16:21:10 -07:00
Brian Behlendorf 76659dc110 Add backing_device_info per-filesystem
For a long time now the kernel has been moving away from using the
pdflush daemon to write 'old' dirty pages to disk.  The primary reason
for this is because the pdflush daemon is single threaded and can be
a limiting factor for performance.  Since pdflush sequentially walks
the dirty inode list for each super block any delay in processing can
slow down dirty page writeback for all filesystems.

The replacement for pdflush is called bdi (backing device info).  The
bdi system involves creating a per-filesystem control structure each
with its own private sets of queues to manage writeback.  The advantage
is greater parallelism which improves performance and prevents a single
filesystem from slowing writeback to the others.

For a long time both systems co-existed in the kernel so it wasn't
strictly required to implement the bdi scheme.  However, as of
Linux 2.6.36 kernels the pdflush functionality has been retired.

Since ZFS already bypasses the page cache for most I/O this is only
an issue for mmap(2) writes which must go through the page cache.
Even then adding this missing support for newer kernels was overlooked
because there are other mechanisms which can trigger writeback.

However, there is one critical case where not implementing the bdi
functionality can cause problems.  If an application handles a page
fault it can enter the balance_dirty_pages() callpath.  This will
result in the application hanging until the number of dirty pages in
the system drops below the dirty ratio.

Without a registered backing_device_info for the filesystem the
dirty pages will not get written out.  Thus the application will hang.
As mentioned above this was less of an issue with older kernels because
pdflush would eventually write out the dirty pages.

This change adds a backing_device_info structure to the zfs_sb_t
which is already allocated per-super block.  It is then registered
when the filesystem mounted and unregistered on unmount.  It will
not be registered for mounted snapshots which are read-only.  This
change will result in flush-<pool> thread being dynamically created
and destroyed per-mounted filesystem for writeback.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #174
2011-08-04 13:37:38 -07:00
Brian Behlendorf 0da7869690 Fix the configure CONFIG_* option detection
The latest kernels no longer define AUTOCONF_INCLUDED which was
being used to detect the new style autoconf.h kernel configure
options.  This results in the CONFIG_* checks always failing
incorrectly for newer kernels.

The fix for this is a simplification of the testing method.
Rather than attempting to explicitly include to renamed config
header.  It is simpler to unconditionally include <linux/module.h>
which must pick up the correctly named header.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #320
2011-07-22 15:07:16 -07:00
Brian Behlendorf c064bdee95 Fix the configure CONFIG_* option detection
The latest kernels no longer define AUTOCONF_INCLUDED which was
being used to detect the new style autoconf.h kernel configure
options.  This results in the CONFIG_* checks always failing
incorrectly for newer kernels.

The fix for this is a simplification of the testing method.
Rather than attempting to explicitly include to renamed config
header.  It is simpler to unconditionally include <linux/module.h>
which must pick up the correctly named header.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #320
2011-07-22 15:07:03 -07:00
Kyle Fuller 615ab66d18 Provide a rc.d script for archlinux
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d.  This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #322
2011-07-11 14:12:23 -07:00
Brian Behlendorf 2cf7f52bc4 Linux compat 2.6.39: mount_nodev()
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure.  When using the new
interface the caller must now use the mount_nodev() helper.

Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers.  This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use.  It provides our only entry point
in to the namespace layer for manipulating certain mount options.

This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly.  It also allowed me
to keep more of the original Solaris code unmodified.  Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do.  However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem.  Thus keeping a back
reference from the filesystem to the namespace is complicated.

Rather than introduce some ugly hack to get the vfsmount and
continue as before.  I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.

This commit updates the code to remove this vfsmount back
reference entirely.  All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.

Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code.  This
change which fairly involved has turned out nicely.

Closes #246
Closes #217
Closes #187
Closes #248
Closes #231
2011-07-01 13:36:39 -07:00
Brian Behlendorf 5c03efc379 Linux compat 2.6.39: security_inode_init_security()
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.

Closes #246
Closes #217
Closes #187
2011-07-01 12:40:08 -07:00
Brian Behlendorf bd2f5ac97f Avoid 'rpm -q' bug for 'make pkg'
RPM version 4.9.0 has been observed to generate extra debug
messages in certain cases.  These debug messages prevent us
from cleanly acquiring the architecture.  This is clearly
an upstream RPM bug which will get fixed.  But until then
a safe solution is to pipe the result through 'tail -1'
to just grab the architecture bit we care about.

Example 'rpm -qp spl-0.6.0-rc4.src.rpm --qf %{arch}' output:

Freeing read locks for locker 0x166: 28031/47480843735008
Freeing read locks for locker 0x168: 28031/47480843735008
x86_64
2011-07-01 12:39:25 -07:00
Prasad Joshi b312979252 Tear down and flush the mmap region
The inode eviction should unmap the pages associated with the inode.
These pages should also be flushed to disk to avoid the data loss.
Therefore, use truncate_setsize() in evict_inode() to release the
pagecache.

The API truncate_setsize() was added in 2.6.35 kernel. To ensure
compatibility with the old kernel, the patch defines its own
truncate_setsize function.

Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com>
Closes #255
2011-06-27 09:59:19 -07:00
Brian Behlendorf 86fd39f354 Linux 2.6.39 compat, mutex owner
Prior to Linux 2.6.39 when CONFIG_DEBUG_MUTEXES was defined
the kernel stored a thread_info pointer as the mutex owner.
From this you could get the pointer of the current task_struct
to compare with get_current().

As of Linux 2.6.39 this behavior has changed and now the mutex
stores a pointer to the task_struct.  This commit detects the
type of pointer stored in the mutex and adjusts the mutex_owner()
and mutex_owned() functions to perform the correct comparision.
2011-06-24 13:00:08 -07:00
Brian Behlendorf a55bcaad18 Linux 3.0: Shrinker compatibility
Update the the wrapper macros for the memory shrinker to handle
this 4th API change.  The callback function now takes a
shrink_control structure.  This is certainly a step in the
right direction but it's annoying to have to accomidate yet
another version of the API.
2011-06-21 14:02:39 -07:00
Brian Behlendorf a32661a6c9 Avoid 'rpm -q' bug for 'make pkg'
RPM version 4.9.0 has been observed to generate extra debug
messages in certain cases.  These debug messages prevent us
from cleanly acquiring the architecture.  This is clearly
an upstream RPM bug which will get fixed.  But until then
a safe solution is to pipe the result through 'tail -1'
to just grab the architecture bit we care about.

Example 'rpm -qp spl-0.6.0-rc4.src.rpm --qf %{arch}' output:

Freeing read locks for locker 0x166: 28031/47480843735008
Freeing read locks for locker 0x168: 28031/47480843735008
x86_64
2011-06-16 11:49:38 -07:00
Brian Behlendorf 2e08aedba4 Always check -Wno-unused-but-set-variable gcc support
The previous commit 8a7e1ceefa wasn't
quite right.  This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set too.

For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
2011-06-14 16:40:35 -07:00
Brian Behlendorf 8a7e1ceefa Check for -Wno-unused-but-set-variable gcc support
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable.  This can lead to build failures
on older Linux platforms such as Debian Lenny.  Since this is
an optional build argument this changes add a new autoconf check
for the option.  If it is supported by the installed version of
gcc then it is used otherwise it is omited.

See commit's 12c1acde76 and
79713039a2 for the reason the
-Wno-unused-but-set-variable options was originally added.
2011-06-14 14:43:22 -07:00
Alexey Shvetsov d9bfe0f57a Fix distribution detection for gentoo
Also this may fix other distros because some of them also provide
/etc/lsb-release not only ubuntu.

Closes #244
2011-05-14 08:54:48 -07:00
Brian Behlendorf 712f8bd87b Add Gentoo/Lunar/Redhat Init Scripts
Every distribution has slightly different requirements for their
init scripts.  Because of this the zfs package contains several
init scripts for various distributions.  These scripts have been
contributed by, and are supported by, the larger zfs community.
Init scripts for Gentoo/Lunar/Redhat have been contributed by:

  Gentoo - devsk <devsku@gmail.com>
  Lunar  - Jean-Michel Bruenn <jean.bruenn@ip-minds.de>
  Redhat - Fajar A. Nugraha <list@fajar.net>
2011-05-02 15:59:13 -07:00
Brian Behlendorf df554c148e Fix 'zfs set volsize=N pool/dataset'
This change fixes a kernel panic which would occur when resizing
a dataset which was not open.  The objset_t stored in the
zvol_state_t will be set to NULL when the block device is closed.
To avoid this issue we pass the correct objset_t as the third arg.

The code has also been updated to correctly notify the kernel
when the block device capacity changes.  For 2.6.28 and newer
kernels the capacity change will be immediately detected.  For
earlier kernels the capacity change will be detected when the
device is next opened.  This is a known limitation of older
kernels.

Online ext3 resize test case passes on 2.6.28+ kernels:
$ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
$ zpool create tank /tmp/zvol
$ zfs create -V 500M tank/zd0
$ mkfs.ext3 /dev/zd0
$ mkdir /mnt/zd0
$ mount /dev/zd0 /mnt/zd0
$ df -h /mnt/zd0
$ zfs set volsize=800M tank/zd0
$ resize2fs /dev/zd0
$ df -h /mnt/zd0

Original-patch-by: Fajar A. Nugraha <github@fajar.net>
Closes #68
Closes #84
2011-05-02 08:54:40 -07:00
Gunnar Beutner 055656d4f4 Implemented NFS export_operations.
Implemented the required NFS operations for exporting ZFS datasets
using the in-kernel NFS daemon.
2011-04-29 12:36:13 -07:00
Darik Horn ad35b6a6e9 Remove the gawk dependency.
This reverts commit 1814251453.

Demote the gawk call back to awk and ensure that stderr is attached.  GNU gawk
tolerates a missing stderr handle, but many utilities do not, which could be
why a regular awk call was unexplainably failing on some systems.

Use argv[0] instead of sh_path for consistency internally and with other Linux
drivers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-04-21 09:41:09 -07:00
Brian Behlendorf 3dfc591ac4 Linux 2.6.39 compat, zlib_deflate_workspacesize()
The function zlib_deflate_workspacesize() now take 2 arguments.
This was done to avoid always having to allocate the maximum size
workspace (268K).  The caller can now specific the windowBits and
memLevel compression parameters to get a smaller workspace.

For our purposes we introduce a spl_zlib_deflate_workspacesize()
wrapper which accepts both arguments.  When the two argument
version of zlib_deflate_workspacesize() is available the arguments
are passed through.  When it's not we assume the worst case and
a maximally sized workspace is used.
2011-04-20 14:39:15 -07:00
Brian Behlendorf b1cbc4610c Linux 2.6.39 compat, kern_path_parent()
The path_lookup() function has been renamed to kern_path_parent()
and the flags argument has been removed.  The only behavior now
offered is that of LOOKUP_PARENT.  The spl already always passed
this flag so dropping the flag does not impact us.
2011-04-20 12:30:17 -07:00
Brian Behlendorf 12c1acde76 Set -Wno-unused-but-set-variable globally
As of gcc-4.6 the option -Wunused-but-set-variable is enabled by
default.  While this is a useful warning there are numerous places
in the ZFS code when a variable is set and then only checked in an
ASSERT().  To avoid having to update every instance of this in the
code we now set -Wno-unused-but-set-variable to suppress the warning.

Additionally, when building with --enable-debug and -Werror set these
warning also become fatal.  We can reevaluate the suppression of these
error at a later time if it becomes an issue.  For now we are basically
just reverting to the previous gcc behavior.
2011-04-19 10:44:10 -07:00
Brian Behlendorf 79713039a2 Fix gcc configure warnings
Newer versions of gcc are getting smart enough to detect the sloppy
syntax used for the autoconf tests.  It is now generating warnings
for unused/undeclared variables.  Newer version of gcc even have
the -Wunused-but-set-variable option set by default.  This isn't a
problem except when -Werror is set and they get promoted to an error.
In this case the autoconf test will return an incorrect result which
will result in a build failure latter on.

To handle this I'm tightening up many of the autoconf tests to
explicitly mark variables as unused to suppress the gcc warning.
Remember, all of the autoconf code can never actually be run we
just want to get a clean build error to detect which APIs are
available.  Never using a variable is absolutely fine for this.

Closes #176
2011-04-19 10:10:47 -07:00
Brian Behlendorf 03318641af Fix gcc configure warnings
Newer versions of gcc are getting smart enough to detect the sloppy
syntax used for the autoconf tests.  It is now generating warnings
for unused/undeclared variables.  Newer version of gcc even have
the -Wunused-but-set-variable option set by default.  This isn't a
problem except when -Werror is set and they get promoted to an error.
In this case the autoconf test will return an incorrect result which
will result in a build failure latter on.

To handle this I'm tightening up many of the autoconf tests to
explicitly mark variables as unused to suppress the gcc warning.
Remember, all of the autoconf code can never actually be run we
just want to get a clean build error to detect which APIs are
available.  Never using a variable is absolutely fine for this.
2011-04-19 09:41:41 -07:00
Brian Behlendorf 9b0f9079d2 Linux 2.6.39 compat, invalidate_inodes()
To resolve a potiential filesystem corruption issue a second
argument was added to invalidate_inodes().  This argument controls
whether dirty inodes are dropped or treated as busy when invalidating
a super block.  When only the legacy API is available the second
argument will be dropped for compatibility.
2011-04-19 09:08:08 -07:00
Brian Behlendorf e76f4bf11d Add dnlc_reduce_cache() support
Provide the dnlc_reduce_cache() function which attempts to prune
cached entries from the dcache and icache.  After the entries are
pruned any slabs which they may have been using are reaped.

Note the API takes a reclaim percentage but we don't have easy
access to the total number of cache entries to calculate the
reclaim count.  However, in practice this doesn't need to be
exactly correct.  We simply need to reclaim some useful fraction
(but not all) of the cache.  The caller can determine if more
needs to be done.
2011-04-06 20:06:03 -07:00
Brian Behlendorf bdf4328b04 Linux 2.6.28 compat, insert_inode_locked()
Added insert_inode_locked() helper function, prior to this most callers
used insert_inode_hash().  The older method doesn't check for collisions
in the inode_hashtable but it still acceptible for use.  Fallback to
using insert_inode_hash() when insert_inode_locked() is unavailable.
2011-03-22 12:15:54 -07:00
Manuel Amador (Rudd-O) ae26d0465a Add dracut support
To simplify the process of using zfs as your root filesystem a
zfs-drucat sub-package has been added.  This sub-package adds a zfs
dracut module which allows your initramfs to be rebuilt with zfs
support.  The process for doing this is still complicated but there
is clearly interest from the community about getting this working
well and documented.  This should help lay some of the groundwork.

Longer term these changes should be pushed in the upstream dracut
package.  Once that occurs this subpackage will no longer be
required for new systems, however we may want to conditionally
build this package in the future for systems running older
dracut versions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-03-17 16:52:04 -07:00
Brian Behlendorf 01c0e61da0 Add init scripts
To support automatically mounting your zfs on filesystem on boot
a basic init script is needed.  Unfortunately, every distribution
has their own idea of the _right_ way to do things.  Rather than
write one very complicated portable init script, which would be
invariably replaced by the distributions own anyway.  I have
instead added support to provide multiple distribution specific
init scripts.

The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.

Currently, there is zfs.fedora and a more generic zfs.lsb init
script.  Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feedback their
approved versions to be included in the project.

This change does not consider upstart jobs but I'm not at all
opposed to add that sort of thing.
2011-03-17 16:51:54 -07:00
Brian Behlendorf a60b1c0a8e Make Missing Modules.symvers Fatal
Detect early on in configure if the Modules.symvers file is missing.
Without this file there will be build failures later and it's best
to catch this early and provide a useful error.  In this case the
most likely problem is the kernel-devel packages are not installed.
It may also be possible that they are using an unbuilt custom kernel
in which case they must build the kernel first.

Closes #127
2011-03-07 13:09:20 -08:00
Brian Behlendorf 912fd84d13 Make Missing Modules.symvers Fatal
Detect early on in configure if the Modules.symvers file is missing.
Without this file there will be build failures later and it's best
to catch this early and provide a useful error.  In this case the
most likely problem is the kernel-devel packages are not installed.
It may also be possible that they are using an unbuilt custom kernel
in which case they must build the kernel first.
2011-03-07 13:09:01 -08:00
Brian Behlendorf 15805c7711 Make CONFIG_PREEMPT Fatal
Until support is added for preemptible kernels detect this at
configure time and make it fatal.  Otherwise, it is possible to
have a successful build and kernel modules with flakey behavior.
2011-03-07 12:09:02 -08:00
Brian Behlendorf 7731d46b69 Make CONFIG_PREEMPT Fatal
Until support is added for preemptible kernels detect this at
configure time and make it fatal.  Otherwise, it is possible to
have a successful build and kernel modules with flakey behavior.
2011-03-07 10:58:07 -08:00
Brian Behlendorf 914b063133 Linux compat 2.6.37, invalidate_inodes()
In the 2.6.37 kernel the function invalidate_inodes() is no longer
exported for use by modules.  This memory management functionality
is needed to invalidate the inodes attached to a super block without
unmounting the filesystem.

Because this function still exists in the kernel and the prototype
is available is a common header all we strictly need is the symbol
address.  The address is obtained using spl_kallsyms_lookup_name()
and assigned to the variable invalidate_inodes_fn.  Then a #define
is used to replace all instances of invalidate_inodes() with a
call to the acquired address.  All the complexity is hidden behind
HAVE_INVALIDATE_INODES and invalidate_inodes() can be used as usual.

Long term we should try to get this, or another, interface made
available to modules again.
2011-02-23 12:44:32 -08:00
Brian Behlendorf 45066d1f20 Linux 2.6.38 compat, blkdev_get_by_path()
The open_bdev_exclusive() function has been replaced (again) by the
more generic blkdev_get_by_path() function.  Additionally, the
counterpart function close_bdev_exclusive() has been replaced by
blkdev_put().  Because these functions are more generic versions
of the functions they replaced the compatibility macro must add
the FMODE_EXCL mask to ensure they are exclusive.

Closes #114
2011-02-23 12:29:38 -08:00
Brian Behlendorf 2c395def27 Linux 2.6.36 compat, sops->evict_inode()
The new prefered inteface for evicting an inode from the inode cache
is the ->evict_inode() callback.  It replaces both the ->delete_inode()
and ->clear_inode() callbacks which were previously used for this.
2011-02-11 13:47:51 -08:00
Brian Behlendorf f9637c6c8b Linux 2.6.33 compat, get/set xattr callbacks
The xattr handler prototypes were sanitized with the idea being that
the same handlers could be used for multiple methods.  The result of
this was the inode type was changes to a dentry, and both the get()
and set() hooks had a handler_flags argument added.  The list()
callback was similiarly effected but no autoconf check was added
because we do not use the list() callback.
2011-02-11 10:41:00 -08:00