Compare commits

188 Commits

Author SHA1 Message Date
Tony Hutter 2bc71fa976 Prepare to release 0.6.5.11
META file and RPM release log updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-07-10 11:01:14 -07:00
Tony Hutter 5a20d4283c Linux 4.12 compat: super_setup_bdi_name() - add missing code
This includes code that was mistakenly left out of the 7dae2c8 merge into
0.6.5.10.  Its inclusion fixes a kernel warning on Kubuntu 17.04:

	WARN_ON(sb->s_bdi != &noop_backing_dev_info);

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6089
Closes #6324
(backported from zfs upstream commit 7dae2c81e7)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
2017-07-10 11:00:34 -07:00
alaviss bf04e4d442 Musl libc fixes
Musl libc's <stdio.h> doesn't include <stdarg.h>, which causes
`va_start` and `va_end` to end up as undefined symbols.
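
A minimal sketch of the portable pattern, assuming a hypothetical helper
format_message() that is not part of ZFS: a file that uses `va_start` and
`va_end` includes <stdarg.h> itself instead of relying on <stdio.h> to pull
it in.

```c
#include <stdarg.h>	/* va_list, va_start, va_end: included explicitly */
#include <stdio.h>

/* Hypothetical helper: format into buf, return what vsnprintf() returns. */
static int format_message(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}
```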

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Leorize <alaviss@users.noreply.github.com>
Closes #6310
2017-07-06 15:25:39 -07:00
DHE 5e6057b574 Increase zfs_vdev_async_write_min_active to 2
Resilver operations frequently cause only a small amount of dirty data
to be written to disk at a time, so the IO scheduler issues only
1 write at a time to the resilvering disk. When it is rotational
media the drive will often travel past the next sector to be written
before receiving a write command from ZFS, significantly delaying the
write of the next sector.

Raise zfs_vdev_async_write_min_active so that drives are kept fed
during resilvering.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Issue #4825
Closes #5926
2017-07-06 15:25:39 -07:00
loli10K 94d353a0bf Fix int overflow in zbookmark_is_before()
When the DSL scan code tries to resume the scrub from the saved
zbookmark, it calls dsl_scan_check_resume()->zbookmark_is_before() to
decide if the current dnode still needs to be visited.

A subtle int overflow condition in zbookmark_is_before(), exacerbated
by bumping the indirect block size to 128K (d7958b4), can lead to the
wrong assumption that the dnode does not need to be scanned.

This results in scrubs completing "successfully" in a matter of mere
minutes on pools with several TB of used space because every time we
try to resume the dnode traversal on a dataset zbookmark_is_before()
tells us the whole objset has already been scanned completely.

Fix this by forcing the right shift operator to be executed before
the multiplication, as done in zbookmark_compare() (fcff0f3).
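
The effect of operator ordering can be sketched in isolation. The function
names and operand values below are illustrative assumptions, not the actual
zbookmark_is_before() code; they only show how multiplying before shifting
can wrap a fixed-width value while shifting first does not.

```c
#include <stdint.h>

/* Multiply first: `*` binds tighter than `>>`, so the product can wrap. */
static uint64_t scale_multiply_first(uint64_t count, uint64_t size,
    unsigned shift)
{
	return (count * size >> shift);
}

/* Shift first: reduce the size before multiplying, avoiding the wrap. */
static uint64_t scale_shift_first(uint64_t count, uint64_t size,
    unsigned shift)
{
	return (count * (size >> shift));
}
```

With count = 2^50, size = 2^14 and shift = 9, the first form wraps to 0
while the second yields 2^55.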

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
2017-07-06 15:25:39 -07:00
Tony Hutter e9fc1bd5e6 Fix RHEL 7.4 bio_set_op_attrs build error
On RHEL 7.4, include/linux/bio.h now includes a macro for
bio_set_op_attrs that conflicts with the ifndef in ZFS
include/linux/blkdev_compat.h.  This patch fixes the build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6234
Closes #6271
2017-07-06 15:25:39 -07:00
Tony Hutter b88f4d7ba7 GCC 7.1 fixes
GCC 7.1 will warn when we're not checking the snprintf()
return code in cases where the buffer could be truncated. This
patch either checks the snprintf return code (where applicable),
or simply disables the warnings (ztest.c).
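
A minimal sketch of checking the snprintf() return code for truncation.
The helper build_path() is a hypothetical name used for illustration and
is not taken from the patch.

```c
#include <stdio.h>

/* Returns 0 on success, -1 if the output was truncated or snprintf failed. */
static int build_path(char *buf, size_t len, const char *dir,
    const char *name)
{
	int n = snprintf(buf, len, "%s/%s", dir, name);

	/* snprintf() returns the length it wanted to write, excluding NUL. */
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```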

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6253
2017-07-06 15:25:39 -07:00
Brian Behlendorf 3e297b90f5 Remove complicated libspl assert wrappers
Effectively provide our own version of assert()/verify() for use
in user space.  This minimizes our dependencies and aligns the
user space assertion handling with what's used in the kernel.

Signed-off-by: Carlo Landmeter <clandmeter@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4449
2017-07-06 15:25:39 -07:00
Justin Lecher 709f25e248 Compatibility with glibc-2.23
In glibc-2.23 <sys/sysmacros.h> isn't automatically included in
<sys/types.h> [1], so we need to explicitly include it.
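
A minimal sketch of the explicit include, assuming a glibc-based Linux
system. The helper dev_roundtrip_ok() is hypothetical and only exercises
the macros that <sys/sysmacros.h> provides.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>	/* makedev(), major(), minor() */

/* Pack a device number and check that it unpacks to the same values. */
static int dev_roundtrip_ok(unsigned int maj, unsigned int min)
{
	dev_t dev = makedev(maj, min);

	return (major(dev) == maj && minor(dev) == min);
}
```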

[1] https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Justin Lecher <jlec@gentoo.org>
Closes #6132
2017-07-06 15:25:39 -07:00
Olaf Faaland cd2209b75e glibc 2.5 compat: use correct header for makedev() et al.
In glibc 2.5, makedev(), major(), and minor() are defined in
sys/sysmacros.h.  They are also defined in types.h for backward
compatibility, but using these definitions triggers a compile warning.
This breaks the ZFS build, as it builds with -Werror.

autoconf email threads indicate these macros may be defined in
sys/mkdev.h in some cases.

This commit adds configure checks to detect where makedev() is defined:
  sys/sysmacros.h
  sys/mkdev.h

It assumes major() and minor() are defined in the same place.

The libspl types.h then includes
	sys/sysmacros.h (preferred) or
	sys/mkdev.h (2nd choice)
if one of those defines makedev().

This is done before including the system types.h.

An alternative would be to remove uses of major, minor, and makedev,
instead comparing the st_dev returned from stat64.  These configure
checks would then be unnecessary.

This change revealed that __NORETURN was being defined unnecessarily in
libspl/include/sys/sysmacros.h.  That definition is removed.

The files in which __NORETURN is used all include types.h, and so all
will get the definition provided by feature_tests.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5945
2017-07-06 15:25:39 -07:00
Tony Hutter a57fa2c532 Prepare to release 0.6.5.10
META file and RPM release log updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-06-12 15:31:33 -04:00
Brian Behlendorf 590509b75e Add MS_MANDLOCK mount failure message
Commit torvalds/linux@9e8925b6 allowed for kernels to be built
without support for mandatory locking (MS_MANDLOCK).  This will
result in 'zfs mount' failing when the nbmand=on property is set
if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING.

Unfortunately we can not reliably detect prior to the mount(2) system
call if the kernel was built with this support.  The best we can do
is check if the mount failed with EPERM and if we passed 'mand'
as a mount option and then print a more useful error message. e.g.

  filesystem 'tank/fs' has the 'nbmand=on' property set, this mount
  option may be disabled in your kernel.  Use 'zfs set nbmand=off'
  to disable this option and try to mount the filesystem again.

Additionally, switch the default error message case to use
strerror() to produce a more human readable message.
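
The check described above can be sketched as follows. The function names
and the exact option parsing are assumptions made for illustration; only
the EPERM + 'mand' condition, the message text and the strerror() fallback
come from this commit message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the comma-separated option list contains exactly `opt`. */
static int has_mount_option(const char *opts, const char *opt)
{
	size_t optlen = strlen(opt);
	const char *p = opts;

	while (*p != '\0') {
		size_t toklen = strcspn(p, ",");

		if (toklen == optlen && strncmp(p, opt, optlen) == 0)
			return (1);
		p += toklen;
		if (*p == ',')
			p++;
	}
	return (0);
}

/* Write a human readable explanation of a failed mount into msg. */
static void describe_mount_failure(int err, const char *opts,
    const char *fs, char *msg, size_t len)
{
	if (err == EPERM && has_mount_option(opts, "mand")) {
		(void) snprintf(msg, len, "filesystem '%s' has the "
		    "'nbmand=on' property set, this mount option may be "
		    "disabled in your kernel", fs);
	} else {
		/* Default case: fall back to the strerror() text. */
		(void) snprintf(msg, len, "cannot mount '%s': %s", fs,
		    strerror(err));
	}
}
```

Matching whole tokens matters here: a plain substring search for "mand"
would also match an option such as "nomand".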

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4729
Closes #6199
2017-06-09 14:05:15 -07:00
Matthew Ahrens d07a8deac8 OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations
Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Ported-by: Matt Ahrens <mahrens@delphix.com>

RAID-Z requires that space be allocated in multiples of P+1 sectors,
because this is the minimum size block that can have the required amount
of parity.  Thus blocks on RAIDZ1 must be allocated in a multiple of 2
sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4.  A sector
is a unit of 2^ashift bytes, typically 512B or 4KB.
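
As a worked illustration of the P+1 rule, here is a simplified sketch that
only performs the final rounding step on a sector count. The function names
are hypothetical and the real allocator also accounts for parity sectors,
which this sketch leaves out.

```c
#include <stdint.h>

/* Round an allocation, counted in sectors, up to a multiple of P+1. */
static uint64_t raidz_round_up_sectors(uint64_t sectors, unsigned nparity)
{
	uint64_t multiple = (uint64_t)nparity + 1;

	return ((sectors + multiple - 1) / multiple * multiple);
}

/* Number of sectors added by the rounding (at most P). */
static uint64_t raidz_pad_sectors(uint64_t sectors, unsigned nparity)
{
	return (raidz_round_up_sectors(sectors, nparity) - sectors);
}
```

For example, 7 sectors on RAIDZ1 round up to 8, and 9 sectors on RAIDZ3
round up to 12, adding 3 sectors.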

To satisfy this constraint, the allocation size is rounded up to the
proper multiple, resulting in up to 3 "pad sectors" at the end of some
blocks.  The contents of these pad sectors are not used, so we do not
need to read or write these sectors.  However, some storage hardware
performs much worse (around 1/2 as fast) on mostly-contiguous writes
when there are small gaps of non-overwritten data between the writes.
Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
include pad sectors.  If writing a pad sector will fill the gap between
two (required) writes, we will issue the optional zio, thus doubling
performance.  The gap-filling performance improvement was introduced in
July 2009.

Writing the optional zio is done by the io aggregation code in
vdev_queue.c.  The problem is that it is also subject to the limit on
the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
default 128KB.  For a given block, if the amount of data plus padding
written to a leaf device exceeds zfs_vdev_aggregation_limit, the
optional zio will not be written, resulting in a ~2x performance
degradation.

The problem occurs only for certain values of ashift, compressed block
size, and RAID-Z configuration (number of parity and data disks).  It
cannot occur with the default recordsize=128KB.  If compression is
enabled, all configurations with recordsize=1MB or larger will be
impacted to some degree.

The problem notably occurs with recordsize=1MB, compression=off, with 10
disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors).  Therefore
this problem has been known as "the 1MB 10-wide RAIDZ2 (or 3) problem".

The problem also occurs with the following configurations:

With recordsize=512KB or 256KB, compression=off, the problem occurs only
in rarely-used configurations:
* 4-wide RAIDZ1 with recordsize=512KB and ashift=12 (4KB sectors)
* 4-wide RAIDZ2 (either recordsize, either ashift)
* 5-wide RAIDZ2 with recordsize=512KB (either ashift)
* 6-wide RAIDZ2 with recordsize=512KB (either ashift)

With recordsize=1MB, compression=off, ashift=9 (512B sectors)
* RAIDZ1 with 4 or 8 disks
* RAIDZ2 with 4, 8, or 10 disks
* RAIDZ3 with 6, 8, 9, or 10 disks

With recordsize=1MB, compression=off, ashift=12 (4KB sectors)
* RAIDZ1 with 7 or 8 disks
* RAIDZ2 with 4, 5, or 10 disks
* RAIDZ3 with 6, 9, or 10 disks

With recordsize=2MB and larger (which can only be selected by changing
kernel tunables), many configurations are affected, including with
higher numbers of disks (up to 18 disks with recordsize=2MB).

Increase zfs_vdev_aggregation_limit to allow the optional zio to be
aggregated, thus eliminating the problem.  Setting it to 256KB fixes all
commonly-used configurations.

The solution is to aggregate optional zio's regardless of the
aggregation size limit.

Analysis sponsored by Intel Corp.

OpenZFS-issue: https://www.illumos.org/issues/8005
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/321
Closes #5931
2017-06-09 14:05:15 -07:00
Chunwei Chen 69494c6aff Fix import wrong spare/l2 device when path change
If, for example, your aux device was /dev/sdc, but the aux device has
since been removed and /dev/sdc now points to another device, zpool import
will still use that device and corrupt it.

The problem is that spa_validate_aux in spa_import, rather than
validating the on-disk label, actually writes a label to disk. We
remove these calls since spa_load_{spares,l2cache} seems to do everything
we need and actually validates the on-disk label.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6158
2017-06-09 14:05:15 -07:00
Chunwei Chen 412e3c26a9 Fix import finding spare/l2cache when path changes
When a spare or l2cache device path changes, zpool import will not fix up
its path as it does for a normal vdev. The issue is that when you supply a
pool name argument to zpool import, it uses the name to filter out devices
that don't have the pool name in their label. Since spare and l2cache
devices never have that in the label, they always get filtered out.

We fix this by making sure we never filter out a spare or l2cache
device.

Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6158
2017-06-09 14:05:15 -07:00
LOLi ed9cb8390b Linux 4.9 compat: fix zfs_ctldir xattr handling
Since torvalds/linux@d0a5b99 IOP_XATTR is used to indicate the inode
has xattr support: clear it for the ctldir inodes to avoid EIO errors.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6189
2017-06-09 14:05:15 -07:00
LOLi cb8210d125 Linux 4.12 compat: fix super_setup_bdi_name() call
Provide a format parameter to super_setup_bdi_name() so we don't
create duplicate names in '/devices/virtual/bdi' sysfs namespace which
would prevent us from mounting more than one ZFS filesystem at a time.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6147
2017-06-09 14:05:15 -07:00
Brian Behlendorf 21fd04ec40 Linux 4.12 compat: CURRENT_TIME removed
Linux 4.9 added current_time() as the preferred interface to get
the filesystem time.  CURRENT_TIME was retired in Linux 4.12.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6114
2017-06-09 14:05:15 -07:00
Brian Behlendorf e4cb6ee6a5 Linux 4.12 compat: super_setup_bdi_name()
All filesystems were converted to dynamically allocated BDIs.  The
destruction of backing_dev_info structures is handled as part of
super block destruction.  Refactor the code to abstract away the
details of creating and destroying a BDI.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6089
2017-06-09 14:05:15 -07:00
Brian Behlendorf a83a4f9d10 Limit zfs_dirty_data_max_max to 4G
Reinstate default 4G zfs_dirty_data_max_max limit.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6072
Closes #6081
2017-06-09 14:05:15 -07:00
Matthew Ahrens 1e5f75ecbe OpenZFS 8166 - zpool scrub thinks it repaired offline device
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>

If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged.  When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started.  The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.

The fix is to never clear the DTL of offline devices.  Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.

The problem can be worked around by running "zpool scrub" after
"zpool online".

OpenZFS-issue: https://www.illumos.org/issues/8166
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/372
Closes #5806
Closes #6103
2017-06-09 14:05:15 -07:00
Ned Bass 36ccb9db43 vdev_id: fix failure due to multipath -l bug
Udev may fail to create the expected symbolic links in
/dev/disk/by-vdev on systems with the
device-mapper-multipath-0.4.9-100.el6 package installed. This affects
RHEL 6.9 and possibly other downstream distributions.

That version of the multipath command may incorrectly list a drive
state as "unkown" instead of "running". The issue was introduced
in the patch for https://bugzilla.redhat.com/show_bug.cgi?id=1401769

The vdev_id udev helper uses the state reported by "multipath -l" to
detect an online component disk of a multipath device in order to
resolve its physical slot and enclosure. Changing the command
invocation to "multipath -ll" works around the above issue by causing
multipath to consult additional sources of information to determine
the drive state.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6039
2017-06-09 14:05:15 -07:00
jxiong a2c9518711 Guarantee PAGESIZE alignment for large zio buffers
In the current implementation, only zio buffers of 16KB and larger are
guaranteed PAGESIZE alignment. This breaks Lustre since it assumes
that 'arc_buf_t::b_data' must be page aligned when zio buffers are
greater than or equal to PAGESIZE.

This patch makes zio buffers PAGESIZE aligned when their size is
not less than PAGESIZE.
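
The alignment rule can be illustrated in user space. This is only a sketch
under stated assumptions: buf_alloc() and is_aligned() are hypothetical
names, and posix_memalign() stands in for the kernel allocator that the
patch actually changes.

```c
#ifndef _POSIX_C_SOURCE
#define	_POSIX_C_SOURCE	200112L	/* for posix_memalign() */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Return nonzero if ptr is aligned to `align` (a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
	return (((uintptr_t)ptr & (align - 1)) == 0);
}

/*
 * Allocate `size` bytes.  Buffers of at least one page are page aligned;
 * smaller buffers get malloc()'s default alignment.
 */
static void *buf_alloc(size_t size)
{
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	void *buf = NULL;

	if (size < pagesize)
		return (malloc(size));
	if (posix_memalign(&buf, pagesize, size) != 0)
		return (NULL);
	return (buf);
}
```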

This change may waste a little memory, but that should be
fine because after ABD is introduced, zio buffers are used to hold
data temporarily and live in memory for a short while.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Closes #6084
2017-06-09 14:05:15 -07:00
Tony Hutter cc519c4027 Fix harmless "BARRIER is deprecated" kernel warning on Centos 6.8
A one time warning after module load that "BARRIER is deprecated" was seen
on the heavily patched 2.6.32-642.13.1.el6.x86_64 Centos 6.8 kernel.  It seems
that kernel had both the old BARRIER and the newer FLUSH/FUA interfaces
defined.  This fixes the warning by preferring the newer FLUSH/FUA interface
if it's available.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5739
Closes #5828
2017-06-09 14:05:15 -07:00
Chunwei Chen dbb48937ce Add kmap_atomic in dmu_bio_copy
This is needed for 32 bit systems.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-06-09 14:05:15 -07:00
Tim Chase 34a3a7c660 zdb: segfault in dump_bpobj_subobjs()
Avoid buffer overrun on all-zero bpobj subobjects by using signed
array index.  Also fix the type cast on the printf() argument.
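
Why the index needs to be signed can be sketched with a backwards scan.
The function last_nonzero() is a hypothetical stand-in, not the code from
dump_bpobj_subobjs().

```c
#include <stdint.h>

/*
 * Return the index of the last nonzero entry, or -1 if every entry is
 * zero.  The index is signed so that `i >= 0` becomes false after index
 * 0; with an unsigned index the comparison is always true and the loop
 * would read before the start of the array.
 */
static int last_nonzero(const uint64_t *subobjs, int count)
{
	int i;

	for (i = count - 1; i >= 0; i--) {
		if (subobjs[i] != 0)
			break;
	}
	return (i);
}
```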

Signed-off-by: Tim Chase <tim@onlight.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3905
2017-06-09 14:05:15 -07:00
Brian Behlendorf 4a4c57d5ff Fix atomic_sub_64() i386 assembly implementation
The atomic_sub_64() should use sbbl instead of adcl.  In user
space these atomics are used for statistics tracking and aren't
critical, which explains how this was overlooked.  The kernel
space implementation of these atomics is layered on the
architecture specific implementations provided by the kernel.
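
On i386 a 64-bit subtraction is carried out as two 32-bit operations, and
the high half has to subtract the borrow produced by the low half (which is
what sbbl does) rather than add a carry (adcl). The plain C sketch below
shows that arithmetic only; it is not atomic, and the function name is an
illustrative assumption.

```c
#include <stdint.h>

/* Subtract b from a using two 32-bit halves and an explicit borrow. */
static uint64_t sub64_by_halves(uint64_t a, uint64_t b)
{
	uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
	uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
	uint32_t lo = a_lo - b_lo;		/* low half, like subl */
	uint32_t borrow = (a_lo < b_lo);	/* borrow out of the low half */
	uint32_t hi = a_hi - b_hi - borrow;	/* high half, like sbbl */

	return (((uint64_t)hi << 32) | lo);
}
```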

Reviewed by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5671
Closes #5717
2017-06-09 14:05:15 -07:00
Chunwei Chen 2094a93e87 Fix loop device becomes read-only
Commit 933ec99 removes read and write from f_op because the vfs layer will
select iter_write or aio_write automatically. However, for Linux <= 4.0,
loop_set_fd will actually check f_op->write and set the device read-only if
it does not exist. This patch adds them back and uses the generic
do_sync_{read,write} for
aio_{read,write} and new_sync_{read,write} for {read,write}_iter.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5776
Closes #5855
2017-06-09 14:05:15 -07:00
loli10K 03336d011c Allow ZVOL bookmarks to be listed recursively
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4503
Closes #5072
2017-06-09 14:05:15 -07:00
Brian Behlendorf f0a4bfbe4d Fix zfs-mount.service failure on boot
The mount(8) command will helpfully try to resolve any device name
which is passed in.  It does this by applying some simple heuristics
before passing it along to the registered mount helper.

Normally this is fine.  However, one of these heuristics is to prepend
the current working directory to the passed device name.  If that
resulting directory name exists mount(8) will perform the mount(2)
system call and never invoke the helper utility.

Since the cwd for systemd when running as the system instance is
the root directory the default mount points created by zfs(8) can
cause a mount failure.

This change avoids the issue by explicitly setting the cwd to
a different path when performing the mount.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5719
2017-06-09 14:05:15 -07:00
Brian Behlendorf ebef1f2fb6 Fix iput() calls within a tx
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5758
2017-06-09 14:05:15 -07:00
Chunwei Chen 00a1a11989 Fix off by one in zpl_lookup
Running the following command would return success, with zfs creating an
orphan object.

	touch $(for i in $(seq 256); do printf "n"; done)

The funny thing is that this will only work once for each directory, because
after the directory is upgraded to fzap, zfs_lookup fails properly since it
has an additional length check.
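
The commit message does not show the changed comparison, so the following
is only an assumption about the general shape of such an off-by-one. The
constant NAME_BUF_LEN and both function names are hypothetical.

```c
#include <errno.h>
#include <string.h>

#define	NAME_BUF_LEN	256	/* hypothetical buffer size, including NUL */

/* Off by one: lets a 256-character name through. */
static int check_name_buggy(const char *name)
{
	return (strlen(name) > NAME_BUF_LEN ? ENAMETOOLONG : 0);
}

/* Correct: the longest valid name is NAME_BUF_LEN - 1 characters. */
static int check_name_fixed(const char *name)
{
	return (strlen(name) >= NAME_BUF_LEN ? ENAMETOOLONG : 0);
}
```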

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5768
2017-06-09 14:05:15 -07:00
Olaf Faaland b4c181dc76 Linux 4.11 compat: iops.getattr and friends
In torvalds/linux@a528d35, there are changes to the getattr family of functions,
struct kstat, and the interface of inode_operations .getattr.

The inode_operations .getattr and simple_getattr() interface changed to:

int (*getattr) (const struct path *, struct dentry *, struct kstat *,
    u32 request_mask, unsigned int query_flags)

The request_mask argument indicates which field(s) the caller intends to use.
Fields the caller has not specified via request_mask may be set in the returned
struct anyway, but their values may be approximate.

The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.

Currently both fields are ignored.  It is possible that getattr-related
functions within zfs could be optimized based on the request_mask.

struct kstat includes new fields:
u32               result_mask;  /* What fields the user got */
u64               attributes;   /* See STATX_ATTR_* flags */
struct timespec   btime;        /* File creation time */

The attributes and btime fields are cleared; the result_mask reflects this.  These
appear to be optional based on simple_getattr() and vfs_getattr() within the
kernel, which take the same approach.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5875
2017-06-09 14:05:15 -07:00
Olaf Faaland 626ba3142b Linux 4.11 compat: avoid refcount_t name conflict
Linux 4.11 introduces a new type, refcount_t, which conflicts with the
type of the same name defined within ZFS.

Rename the ZFS type zfs_refcount_t.  Within the ZFS code, use a macro to
cause references to refcount_t to be changed to zfs_refcount_t at
compile time.  This reduces conflicts when later landing OpenZFS
patches.
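
The rename-by-macro approach can be sketched as below. The struct layout
and the helper refcount_add_one() are simplified assumptions for
illustration; only the two type names come from this commit message.

```c
#include <stdint.h>

/* The ZFS-private type carries a prefixed name... */
typedef struct zfs_refcount {
	uint64_t rc_count;	/* simplified field, for illustration only */
} zfs_refcount_t;

/* ...and code that says refcount_t is redirected at compile time. */
#define	refcount_t	zfs_refcount_t

/* Existing callers keep using the old spelling unchanged. */
static uint64_t refcount_add_one(refcount_t *rc)
{
	return (++rc->rc_count);
}
```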

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5823
Closes #5842
2017-06-09 14:05:15 -07:00
Brian Behlendorf 0bbd80c058 Prepare to release 0.6.5.9
META file and RPM release log updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2017-02-03 13:11:42 -08:00
Chunwei Chen 10fbf7c406 Make zfs mount according to relatime config in dataset
Also enable lazytime in mount.zfs

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2017-02-03 11:58:19 -08:00
Chunwei Chen 1ad7f89628 Enable lazytime semantic for atime
Linux 4.0 introduces lazytime. The idea is that when we update the atime, we
delay writing it to disk for as long as it is reasonably possible.

When lazytime is enabled, dirty_inode will be called with only the
I_DIRTY_TIME flag whenever i_atime is updated. Under that condition, we set
z_atime_dirty. We only write it to disk when the file is closed, the inode
is evicted, or setattr is called. Ideally, we should also write it whenever
the SA is going to be updated, but that is left as a future improvement.

There's one thing we need to take care of now that we allow i_atime to be
dirty. In the original implementation, whenever the SA is modified,
zfs_inode_update is called to overwrite everything in the inode. This causes
a dirty i_atime to be discarded. We fix this by not overwriting i_atime in
zfs_inode_update. We only overwrite i_atime when allocating a new inode or
doing zfs_rezget with zfs_inode_update_new.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2017-02-03 11:58:19 -08:00
Chunwei Chen 5137c95dec Fix atime handling and relatime
The problem for atime:

We have 3 places for atime: inode->i_atime, znode->z_atime and SA, and their
handling is a mess. A huge part of the mess regarding atime comes from
zfs_tstamp_update_setup, zfs_inode_update, and zfs_getattr, which behave
inconsistently with those three values.

zfs_tstamp_update_setup clears z_atime_dirty unconditionally as long as you
don't pass ATTR_ATIME. This means every write(2) operation which only updates
ctime and mtime will cause atime changes to not be written to disk.

Also, zfs_inode_update from write(2) will replace inode->i_atime with what's
inside SA (stale), but doesn't touch z_atime. So after read(2) and write(2),
you'll have i_atime(stale), z_atime(new), SA(stale) and z_atime_dirty=0.

Now, if you do stat(2), zfs_getattr will actually replace i_atime with what's
inside z_atime. So now you'll have i_atime(new), z_atime(new),
SA(stale) and z_atime_dirty=0. These will all be gone after umount, and
you'll be left with a stale atime.

The problem for relatime:

We do have a relatime config inside the ZFS dataset, but how it should
interact with the mount flag MS_RELATIME is not well defined. It seems the
intent was for the relatime mount option to override the dataset config by
showing it as temporary in `zfs get`. But at the same time,
`zfs set relatime=on|off` also seems to want to override the mount option.
Not to mention that the MS_RELATIME flag is actually never passed into ZFS,
so it never really worked.

How Linux handles atime:

The Linux kernel actually handles atime completely in VFS, except for writing
it to disk. So if we remove the atime handling in ZFS, things would just work,
no matter whether it's strictatime, relatime, noatime, or even O_NOATIME. And whenever
VFS updates the i_atime, it will notify the underlying filesystem via
sb->dirty_inode().

There's one more thing to note about atime flags like MS_RELATIME and
other flags like MS_NODEV, etc. They are mount point flags rather than
filesystem (sb) flags. Since a native Linux filesystem can be mounted at
multiple places at the same time, each mount can have different atime
settings. So these flags are never passed down to filesystem drivers.

What this patch tries to do:

We remove znode->z_atime, since we won't gain anything from it. We remove most
of the atime handling and leave it to VFS. The only thing we do with atime is
to write it when dirty_inode() or setattr() is called. We also add
file_accessed() in zpl_read() since it's not provided in vfs_read().

After this patch, only the MS_RELATIME flag will have an effect. The setting
in the dataset won't do anything. We will make zfsutil mount ZFS with
MS_RELATIME set according to the setting in the dataset in a future patch.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2017-02-03 11:58:19 -08:00
Chunwei Chen a0e099580a Fix write(2) returns zero bug from 933ec99
For generic_write_checks with 2 args, we can exit when it returns zero because
it means count is zero. However this is not the case for generic_write_checks
with 4 args, where zero means no error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5720
Closes #5726
2017-02-03 10:25:41 -08:00
Chunwei Chen 5070e5311c Retire .write/.read file operations
The .write/.read file operations callbacks can be retired since
support for .read_iter/.write_iter and .aio_read/.aio_write has
been added.  The vfs_write()/vfs_read() entry functions will
select the correct interface for the kernel.  This is desirable
because all VFS write/read operations now rely on common code.

This change also adds the generic write checks to make sure that
ulimits are enforced correctly on write.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5587
Closes #5673
2017-02-03 10:25:37 -08:00
Chunwei Chen 110470266d Fix zmo leak when zfs_sb_create fails
zfs_sb_create would normally take ownership of zmo, and it will be freed in
zfs_sb_free. However, when zfs_sb_create fails we need to explicitly free it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5490
Closes #5496
2017-02-03 10:25:33 -08:00
Chunwei Chen d425320ac8 Fix fchange in zpl_ioctl_setflags
The fchange in zpl_ioctl_setflags was for detecting flag change. However it
was incorrect and would always fail to detect a flag change from set to unset,
causing users without CAP_LINUX_IMMUTABLE to be able to unset flags.
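
Detecting a change in either direction can be sketched with an XOR. This is
an illustrative helper with a hypothetical name, not the fchange macro from
zpl_ioctl_setflags.

```c
/* Return nonzero if `flag` differs between the old and new flag words. */
static int flag_changed(unsigned int old_flags, unsigned int new_flags,
    unsigned int flag)
{
	return (((old_flags ^ new_flags) & flag) != 0);
}
```

Because XOR is symmetric, a transition from set to unset is reported the
same way as one from unset to set, so a privilege check keyed on this
result covers both directions.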

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:25:29 -08:00
Chunwei Chen 2a51899946 Fix wrong operator in xvattr.h
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:25:25 -08:00
Chunwei Chen f3da7a1b40 Don't count '@' for dataset namelen if not a snapshot
Don't count '@' for dataset namelen if not a snapshot.  This
fixes making a pool unimportable when the dataset namelen
is 255.

Add test file for zfs create name length 255.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5432
Closes #5456
2017-02-03 10:25:22 -08:00
Richard Yao 625ee0a5e0 zfs_inode_update should not call dmu_object_size_from_db under spinlock
We should never block when holding a spin lock, but zfs_inode_update can
block in the critical section of a spin lock in zfs_inode_update:

zfs_inode_update -> dmu_object_size_from_db -> zrl_add -> mutex_enter

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3858
2017-02-03 10:25:19 -08:00
Gvozden Neskovic 9dd467a271 Fix ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE check
Pass `ACL_TYPE_ACCESS` for type parameter of `set_cached_acl()` and
`forget_cached_acl()` to avoid removal of dead code after BUG() in
compile time. Tested on 3.2.0 kernel.

Introduced in 3779913

Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5378
2017-02-03 10:25:15 -08:00
Isaac Huang 6ebfe58117 Explicit block device plugging when submitting multiple BIOs
Without plugging, the default 'noop' scheduler will not merge
the BIOs which are part of a large ZIO.

Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5181
2017-02-03 10:25:12 -08:00
Tim Chase 39d65926c9 4.10 compat - BIO flag changes and others
[bio] The req_op enum was changed to req_opf.  Update the "Linux 4.8 API"
autotools checks to use an int to determine whether the various REQ_OP
values are defined.  This should work properly on kernels >= 4.8.

[bio] bio_set_op_attrs() is now an inline function and can't be detected
with #ifdef.  Add a configure check to determine whether bio_set_op_attrs()
is defined.  Move the local definition of it from vdev_disk.c to
blkdev_compat.h for consistency with other related compatibility shims.

[bio] The read/write flags and their modifiers, including WRITE_FLUSH,
WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h.  Add the new
bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA
and set the flags appropriately for each supported kernel version.

[vfs] The generic_readlink() function has been made static.  If .readlink
in inode_operations is NULL, generic_readlink() is used.

[zol typo] Completely unrelated to 4.10 compat, fix a typo in the check
for REQ_OP_SECURE_ERASE so that the proper macro is defined:

    s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5499
2017-02-03 10:25:07 -08:00
Brian Behlendorf a57228e51c Reorder HAVE_BIO_RW_* checks
The HAVE_BIO_RW_* #ifdef's must appear before REQ_* #ifdef's
in the bio_is_flush() and bio_is_discard() macros.  Linux 2.6.32
era kernels defined both sets of values and the HAVE_BIO_RW_* ones must
be used in this case.  This resulted in a panic in zconfig test 5.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4951
Closes #4959
2017-02-03 10:25:03 -08:00
Brian Behlendorf bea68ec5bf Remove custom root pool import code
Non-Linux OpenZFS implementations require additional support to be
used as a root pool.  This code should simply be removed to avoid
confusion and improve readability.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4951
2017-02-03 10:24:59 -08:00
Tim Chase 88fa992878 Fix sync behavior for disk vdevs
Prior to b39c22b, which was first generally available in the 0.6.5
release, ZoL never actually submitted synchronous read or write
requests to the Linux block layer.  This means the vdev_disk_dio_is_sync()
function had always returned false and, therefore, the completion in
dio_request_t.dr_comp was never actually used.

In b39c22b, synchronous ZIO operations were translated to synchronous
BIO requests in vdev_disk_io_start().  The follow-on commits 5592404 and
aa159af fixed several problems introduced by b39c22b.  In particular,
5592404 introduced the new flag parameter "wait" to __vdev_disk_physio()
but under ZoL, since vdev_disk_physio() is never actually used, the wait
flag was always zero so the new code had no effect other than to cause
a bug in the use of the dio_request_t.dr_comp which was fixed by aa159af.

The original rationale for introducing synchronous operations in b39c22b
was to hurry certain requests through the BIO layer which would have
otherwise been subject to its unplug timer which would increase the
latency.  This behavior of the unplug timer, however, went away during the
transition of the plug/unplug system between kernels 2.6.32 and 2.6.39.

To handle the unplug timer behavior on 2.6.32-2.6.35 kernels the
BIO_RW_UNPLUG flag is used as a hint to suppress the plugging behavior.

For kernels 2.6.36-2.6.38, the REQ_UNPLUG macro will be available and
is used for the same purpose.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4858
2017-02-03 10:24:54 -08:00
Chunwei Chen c09af45f7b Use set_cached_acl and forget_cached_acl when possible
Originally, these two functions were inline, so their usability was tied to
posix_acl_release. However, since Linux 3.14 they are EXPORT_SYMBOL, so we
can always use them. In this patch, we create an independent test for these
two functions so we can use them when possible.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:50 -08:00
Chunwei Chen 64c259c509 Batch free zpl_posix_acl_release
Currently every call to zpl_posix_acl_release will schedule a delayed task,
and each delayed task will add a timer. This used to be fine except for a
possibly bad performance impact.

However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In
this new implementation, the larger the delay, the less accurate the timer is.
So when we have a flood of timers from zpl_posix_acl_release, they will expire
at the same time. Couple this with the fact that task_expire will do a linear
search with the lock held, and the result is an extreme amount of contention
in interrupt context that can actually lock up the system.

We fix this by doing batch free to prevent a flood of delayed task. Every call
to zpl_posix_acl_release will put the posix_acl to be freed on a lockless
list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free
every posix_acl that passed the grace period on the list. This way, we only
have one delayed task every second.
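
The batching idea can be sketched in userspace C11 as follows. This is a
minimal illustration, not the zpl_posix_acl_release code: struct pending,
pending_add(), pending_drain() and the "freed" flag are hypothetical names,
and time is passed in explicitly as seconds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct pending {
	struct pending *next;
	long queued_at;		/* when the entry was queued, in seconds */
	int freed;		/* stand-in for actually releasing the object */
};

static _Atomic(struct pending *) pending_head;

/* Producer side: lock-free push onto the pending list. */
static void
pending_add(struct pending *p, long now)
{
	assert(p != NULL);
	p->queued_at = now;
	p->next = atomic_load(&pending_head);
	/* On failure p->next is refreshed with the current head; retry. */
	while (!atomic_compare_exchange_weak(&pending_head, &p->next, p))
		;
}

/*
 * Consumer side, run once per batch window: detach the whole list,
 * release entries older than the grace period, requeue the rest.
 * Returns the number of entries released.
 */
static int
pending_drain(long now, long grace)
{
	struct pending *p =
	    atomic_exchange(&pending_head, (struct pending *)NULL);
	int released = 0;

	while (p != NULL) {
		struct pending *next = p->next;

		if (now - p->queued_at >= grace) {
			p->freed = 1;
			released++;
		} else {
			pending_add(p, p->queued_at);
		}
		p = next;
	}
	return (released);
}
```

With this shape there is one periodic consumer regardless of how many
entries are queued, instead of one timer per entry.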

[1] https://lwn.net/Articles/646950/

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:45 -08:00
Neal Gompa (ニール・ゴンパ) 447040c31d Process all systemd services through the systemd scriptlets
This patch ensures that all systemd services are processed through the
systemd scriptlets, so that services are properly configured per the
preset file installed by the package.

Without this, zfs.target is set, but none of the services are enabled per
the preset file, meaning automounting filesystems and such won't work
out of the box.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa13@gmail.com>
Closes #5356
2017-02-03 10:24:41 -08:00
tuxoko 734e235f67 Fix cred leak in zpl_fallocate_common
This is caught by kmemleak when running compress_004_pos

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5244
Closes #5330
2017-02-03 10:24:38 -08:00
Hajo Möller ffcd0c5434 Fix lookup_bdev() on Ubuntu
Ubuntu added support for checking inode permissions to lookup_bdev() in kernel
commit 193fb6a2c94fab8eb8ce70a5da4d21c7d4023bee (merged in 4.4.0-6.21).
Upstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636517

This patch adds a test for Ubuntu's variant of lookup_bdev() to configure and
calls the function in the correct way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Closes #5336
2017-02-03 10:24:34 -08:00
LOLi d2beed9116 Fix uninitialized variable snapprops_nvlist in zfs_receive_one
The variable snapprops_nvlist was never initialized, so properties
were not applied to the received snapshot.

Additionally, add zfs_receive_013_pos.ksh script to ZFS test suite to exercise
'zfs receive' functionality for user properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4338
2017-02-03 10:24:30 -08:00
Tim Chase 4c83fa9b87 Write issue taskq shouldn't be dynamic
This is as much an upstream compatibility fix as it is a small performance
gain.

The illumos taskq implementation doesn't allow a TASKQ_THREADS_CPU_PCT type
to be dynamic and in fact enforces as much with an ASSERT.

As to performance, if this taskq is dynamic, it can cause excessive
contention on tq_lock as the threads are created and destroyed because it
can see bursts of many thousands of tasks in a short time, particularly
in heavy high-concurrency zvol write workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5236
2017-02-03 10:24:26 -08:00
Brian Behlendorf cbf8713874 Use large stacks when available
While stack size will vary by architecture it has historically defaulted to
8K on x86_64 systems.  However, as of Linux 3.15 the default thread stack
size was increased to 16K.  These kernels are now the default in most non-
enterprise distributions which means we no longer need to assume 8K stacks.

This patch takes advantage of that fact by appropriately reverting stack
conservation changes which were made to ensure stability, changes which
may have had a negative impact on performance for certain workloads.  This
also has the side effect of bringing the code slightly more in line with
upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4059
2017-02-03 10:24:22 -08:00
Stian Ellingsen dc3d6a6db1 Use env, not sh in zfsctl_snapshot_{,un}mount()
Call mount and umount via /usr/bin/env instead of /bin/sh in
zfsctl_snapshot_mount() and zfsctl_snapshot_unmount().

This change fixes a shell code injection flaw.  The call to /bin/sh
passed the mountpoint unescaped, only surrounded by single quotes.  A
mountpoint containing one or more single quotes would cause the command
to fail or potentially execute arbitrary shell code.
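
As a rough illustration of why an argument vector avoids the problem, the
sketch below fills an argv[] in which the mountpoint is a separate element,
so no shell ever parses it. build_mount_argv() and the exact argument
layout are assumptions made for the example, not the code in
zfsctl_snapshot_mount().

```c
#include <assert.h>
#include <stddef.h>

#define	MOUNT_ARGC	7

/*
 * Fill argv[] for an exec-style call.  The mountpoint is stored as its
 * own element, untouched, so quotes in it have no special meaning.
 * The argument layout here is assumed for illustration.
 */
static void
build_mount_argv(const char *dataset, const char *mountpoint,
    const char *argv[MOUNT_ARGC])
{
	assert(dataset != NULL && mountpoint != NULL);
	argv[0] = "/usr/bin/env";
	argv[1] = "mount";
	argv[2] = "-t";
	argv[3] = "zfs";
	argv[4] = dataset;
	argv[5] = mountpoint;
	argv[6] = NULL;		/* argv must be NULL-terminated */
}
```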

This change also provides compatibility with grsecurity patches.
Grsecurity only allows call_usermodehelper() to use helper binaries in
certain paths.  /usr/bin/* is allowed, /bin/* is not.
2017-02-03 10:24:17 -08:00
Stian Ellingsen d71db895a1 Fix use after free in zfsctl_snapshot_unmount() 2017-02-03 10:24:12 -08:00
tuxoko 42dae6d7a6 Linux 3.14 compat: assign inode->set_acl
Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come
from setxattr, which is handled by the acl xattr_handler, and we already
handle that well. However, nfsd will directly call inode->set_acl or
return an error if it doesn't exist.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5371
Closes #5375
2017-02-03 10:24:09 -08:00
Brian Behlendorf f85c85ea06 Linux 4.9 compat: inode_change_ok() renamed setattr_prepare()
In torvalds/linux@31051c8 the inode_change_ok() function was
renamed setattr_prepare() and updated to take a dentry rather
than an inode.  Update the code to call setattr_prepare()
and add a wrapper function which calls inode_change_ok() for
older kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:06 -08:00
Chunwei Chen 670508f080 Linux 4.9 compat: remove iops->{set,get,remove}xattr
In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and
generic_{set,get,remove}xattr are removed. xattr operations will directly
go through sb->s_xattr.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:00 -08:00
Chunwei Chen 28172e8aa7 Linux 4.9 compat: iops->rename() wants flags
In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are
merged together into iops->rename(), which now takes flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:23:57 -08:00
tuxoko c0716f13ef Linux 4.7 compat: Fix deadlock during lookup on case-insensitive
We must not use d_add_ci if the dentry already has the real name. Otherwise,
d_add_ci()->d_alloc_parallel() will find itself on the lookup hash and wait
on itself causing deadlock.

Tested-by: satmandu
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5124
Closes #5141
Closes #5147
Closes #5148
2017-02-03 10:23:53 -08:00
DeHackEd dbc95a682c Kernel 4.9 compat: file_operations->aio_fsync removal
Linux kernel commit 723c038475b78 removed this field.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #5393
2017-02-03 10:23:50 -08:00
Chunwei Chen 20a0763746 Remove dir inode operations from zpl_inode_operations
These operations are dir specific, so there's no point putting them in
zpl_inode_operations which is for regular files.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:23:47 -08:00
Brian Behlendorf e56852059f Fix uninitialized variable in avl_add()
Silence the following warning when compiling with gcc 5.4.0.
Specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609.

module/avl/avl.c: In function ‘avl_add’:
module/avl/avl.c:647:2: warning: ‘where’ may be used uninitialized
    in this function [-Wmaybe-uninitialized]
  avl_insert(tree, new_node, where);

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2017-02-03 10:23:42 -08:00
Ned Bass 1f734a62ac Prepare to release 0.6.5.8
META file and RPM release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-09-09 13:21:10 -07:00
Brian Behlendorf ffddb4dfab Fix gcc -Warray-bounds check for dump_object() in zdb
As of gcc 6.1.1 20160621 (Red Hat 6.1.1-3) an array bounds warning
is detected in the zdb dump_object() function.  The analysis is
correct but difficult to interpret because this is implemented as a
macro.  Rework the ZDB_OT_NAME into a function and remove the case
detected by gcc which is a side effect of the DMU_OT_IS_VALID() macro.

  zdb.c: In function ‘dump_object’:
  zdb.c:1931:288: error: array subscript is outside array bounds
      [-Werror=array-bounds]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #4907
2016-09-09 13:21:10 -07:00
Brian Behlendorf 8fe1fb14cb Handle block pointers with a corrupt logical size
Commit 5f6d0b6 was originally added to gracefully handle block
pointers with a damaged logical size.  However, it incorrectly
assumed that all passed arc_done_func_t could handle a NULL
arc_buf_t.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4069
Closes #4080
2016-09-09 13:21:10 -07:00
Brian Behlendorf bf8b4a9fd5 Linux 4.8 compat: Fix removal of bio->bi_rw member
All users of bio->bi_rw have been replaced with compatibility wrappers.
This allows the kernel specific logic to be abstracted away, and for
each of the supported cases to be documented with the wrapper.  The
updated interfaces are as follows:

* void blk_queue_set_write_cache(struct request_queue *, bool, bool)
* boolean_t bio_is_flush(struct bio *)
* boolean_t bio_is_fua(struct bio *)
* boolean_t bio_is_discard(struct bio *)
* boolean_t bio_is_secure_erase(struct bio *)
* VDEV_WRITE_FLUSH_FUA

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4951
2016-09-09 13:21:10 -07:00
Brian Behlendorf 39a78fe9d4 Linux 4.8 compat: posix_acl_valid()
The posix_acl_valid() function has been updated to require a
user namespace.  Filesystem callers should normally provide the
user_ns from the super block associated with the ACL; the
zpl_posix_acl_valid() wrapper has been added for this purpose.
See https://github.com/torvalds/linux/commit/0d4d717f for
complete details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4922
2016-09-09 13:21:10 -07:00
Chunwei Chen 6ae0dbdc8a Linux 4.8 compat: REQ_OP and bio_set_op_attrs()
New REQ_OP_* definitions have been introduced to separate the
WRITE, READ, and DISCARD operations from the flags.  This included
changing the encoding of bi_rw.  It places REQ_OP_* in high order
bits and other stuff in low order bits.  This encoding is done
through the new helper function bio_set_op_attrs.  For complete
details refer to:

https://github.com/torvalds/linux/commit/f215082
https://github.com/torvalds/linux/commit/4e1b2d5

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4892
Closes #4899
2016-09-09 13:21:10 -07:00
Brian Behlendorf a0591c4370 Linux 4.8 compat: REQ_PREFLUSH
The REQ_FLUSH flag was renamed REQ_PREFLUSH to avoid confusion with
REQ_OP_FLUSH.  See https://github.com/torvalds/linux/commit/28a8f0d3
for complete details.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4892
Issue #4899
2016-09-09 13:21:10 -07:00
Brian Behlendorf 68b8d22c6e Linux 4.8 compat: submit_bio()
The rw argument has been removed from submit_bio/submit_bio_wait.
Callers are now expected to set bio->bi_rw instead of passing it
in.  See https://github.com/torvalds/linux/commit/4e49ea4a for
complete details.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4892
Issue #4899
2016-09-09 13:21:09 -07:00
smh 3d824a8878 FreeBSD rS271776 - Persist vdev_resilver_txg changes
Persist vdev_resilver_txg changes to avoid panic caused by validation
vs a vdev_resilver_txg value from a previous resilver.

Authored-by: smh <smh@FreeBSD.org>
Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/5154
FreeBSD-issue: https://reviews.freebsd.org/rS271776
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/c3c60bf
Closes #4790
2016-09-09 13:21:09 -07:00
GeLiXin e5c02cbb03 Fix: Array bounds read in zprop_print_one_property()
If the loop index i reaches (ZFS_GET_NCOLS - 1), cbp->cb_columns[i + 1]
actually reads the data of cbp->cb_colwidths[0], which means the array
subscript is above array bounds.

Luckily cbp->cb_colwidths[0] is always 0 and it seems we haven't
looped enough times to exceed the array bounds so far, but it remains
a latent risk.
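
A minimal sketch of the guarded look-ahead, assuming 0 marks an unused
column slot. NCOLS and is_last_column() are hypothetical stand-ins chosen
for the example, not the code in zprop_print_one_property().

```c
#include <assert.h>
#include <stddef.h>

#define	NCOLS	4	/* stand-in for ZFS_GET_NCOLS; value assumed */

/* Returns 1 if column i is the last one in use (0 marks an unused slot). */
static int
is_last_column(const int cols[NCOLS], size_t i)
{
	assert(i < NCOLS);
	/* Check the index before reading cols[i + 1]. */
	return (i == NCOLS - 1 || cols[i + 1] == 0);
}
```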

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5003
2016-09-09 13:21:09 -07:00
GeLiXin e66b546cb7 Fix call zfs_get_name() with invalid parameter
zfs_get_name() expects a parameter of type zfs_handle_t *zhp, but
libzfs_dataset_cmp() actually passes it a zfs_handle_t **zhp, which
may trigger a coredump if called.

libzfs_dataset_cmp() has worked so far only because all the
callers give it datasets of type ZFS_TYPE_FILESYSTEM, so we
compare their mountpoints and return, luckily.
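
The pointer-level mismatch is easy to reproduce with any comparator that
is handed pointers to array elements, as qsort() does. The sketch below
assumes that calling convention; struct handle, handle_cmp() and
sort_handles() are hypothetical and only illustrate the required
dereference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct handle {			/* hypothetical stand-in for zfs_handle_t */
	const char *name;
};

/*
 * qsort() passes pointers to the array elements.  For an array of
 * struct handle *, each argument is really a struct handle **, so it
 * must be dereferenced once before the handle is used.
 */
static int
handle_cmp(const void *a, const void *b)
{
	const struct handle *ha = *(const struct handle * const *)a;
	const struct handle *hb = *(const struct handle * const *)b;

	return (strcmp(ha->name, hb->name));
}

/* Sort an array of handle pointers by name. */
static void
sort_handles(struct handle **v, size_t n)
{
	assert(v != NULL);
	qsort(v, n, sizeof (v[0]), handle_cmp);
}
```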

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4919
2016-09-09 13:21:09 -07:00
GeLiXin c23686524f Fix incorrect pool state after import
When importing a raidz pool which has a vdev with a bad label, zpool status
shows the right state of the dev, but the wrong state of the pool.
The pool state should be DEGRADED, not ONLINE.

We examine the label in vdev_validate while in spa_load_impl; the bad
label is detected but its state is not propagated to the parent.
There are other chances to propagate state in the following vdev_load
if we failed to load DTL, but our pool is raidz1 which can tolerate a
faulted disk.  So we lose the last chance to correct the pool state.

Propagate the leaf vdev's state to parent if its label was corrupted,
as is done elsewhere in vdev_validate.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #4948
2016-09-09 13:21:09 -07:00
GeLiXin 74acdfc682 Fix self-healing IO prior to dsl_pool_init() completion
Async writes triggered by a self-healing IO may be issued before the
pool finishes the process of initialization.  This results in a NULL
dereference of `spa->spa_dsl_pool` in vdev_queue_max_async_writes().

George Wilson recommended addressing this issue by initializing the
passed `dsl_pool_t **` prior to dmu_objset_open_impl().  Since the
caller is passing the `spa->spa_dsl_pool` this has the effect of
ensuring it's initialized.

However, since this depends on the caller knowing they must pass
the `spa->spa_dsl_pool` an additional NULL check was added to
vdev_queue_max_async_writes().  This guards against any future
restructuring of the code which might result in dsl_pool_init()
being called differently.

Signed-off-by: GeLiXin <47034221@qq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4652
2016-09-09 13:21:09 -07:00
Paul Dagnelie d9e1eec9a2 OpenZFS 6876 - Stack corruption after importing a pool with a too-long name
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking
for trouble. We should check every dataset on import, using a 1024 byte
buffer and checking each time to see if the dataset's new name is longer
than 256 bytes.

OpenZFS-issue: https://www.illumos.org/issues/6876
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ca8674e
2016-09-09 13:21:09 -07:00
Matthew Ahrens 1421562a0d OpenZFS 7263 - deeply nested nvlist can overflow stack
nvlist_pack() and nvlist_unpack() are implemented recursively, which can
cause the stack to overflow with a deeply nested nvlist; i.e. an nvlist
which contains an nvlist, which contains an nvlist, which...

Unprivileged users can pass an nvlist to the kernel via certain ioctls
on /dev/zfs, which the kernel will unpack without additional permission
checking or validation. Therefore, an unprivileged user can cause the
kernel's stack to overflow and panic.

Ideally, these functions would be implemented non-recursively. As a
quick fix, this patch limits the depth of the recursion and returns an
error when attempting to pack and unpack a deeply-nested nvlist.
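
The quick fix amounts to threading a depth counter through the recursion.
A minimal sketch under stated assumptions: struct nested, link_chain(),
walk_depth() and the MAX_DEPTH value are hypothetical and stand in for the
real nvlist code and its actual limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define	MAX_DEPTH	20	/* hypothetical limit, chosen for the example */

struct nested {
	struct nested *child;	/* NULL at the innermost level */
};

/* Link nodes[0] -> nodes[1] -> ... -> nodes[n - 1] for the example. */
static void
link_chain(struct nested *nodes, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i + 1 < n; i++)
		nodes[i].child = &nodes[i + 1];
	nodes[n - 1].child = NULL;
}

/* Recursive walk that refuses to descend past MAX_DEPTH levels. */
static int
walk_depth(const struct nested *n, int depth)
{
	if (depth >= MAX_DEPTH)
		return (EINVAL);
	if (n == NULL || n->child == NULL)
		return (0);
	return (walk_depth(n->child, depth + 1));
}
```

The recursion is kept, but its stack use is now bounded by a constant
rather than by caller-supplied input.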

Signed-off-by: Adam Leventhal <ahl@delphix.com>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Prakash Surya <prakash.surya@delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/7263
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0511d6d

2016-09-09 13:21:09 -07:00
Chunwei Chen 58000c3ec7 Fix dbuf_stats_hash_table_data race
Dropping DBUF_HASH_MUTEX when walking the hash list is unsafe. The dbuf
can be freed at any time.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4846
2016-09-09 13:21:09 -07:00
Tim Chase e871059bc4 Prevent null dereferences when accessing dbuf kstat
In arc_buf_info(), the arc_buf_t may have no header.  If not, don't try
to fetch the arc buffer stats and instead just zero them.

The null dereferences were observed while accessing the dbuf kstat with
awk on a system in which millions of small files were being created in
order to overflow the system's metadata limit.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4837
2016-09-09 13:21:09 -07:00
Chunwei Chen 91f81c42f0 fh_to_dentry should return ESTALE when generation mismatch
A generation mismatch usually means the file pointed to by the file handle
was deleted. We should return ESTALE to indicate this. We return ENOENT in
zfs_vget since zpl_fh_to_dentry will convert it to ESTALE.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4828
2016-09-09 13:21:09 -07:00
Chunwei Chen 2ab9247411 Don't allow accessing XATTR via export handle
Allowing access to XATTR through an export handle is a very bad idea. It
would allow users to write whatever they want in fields where they
otherwise could not.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4828
2016-09-09 13:21:09 -07:00
Chunwei Chen af4e50750b Fix out-of-bound access in zfs_fillpage
The original code will do an out-of-bound access on pl[] during the last
iteration.

 ==================================================================
 BUG: KASAN: stack-out-of-bounds in zfs_getpage+0x14c/0x2d0 [zfs]
 Read of size 8 by task tmpfile/7850
 page:ffffea00017c6dc0 count:0 mapcount:0 mapping:          (null) index:0x0
 flags: 0xffff8000000000()
 page dumped because: kasan: bad access detected
 CPU: 3 PID: 7850 Comm: tmpfile Tainted: G           OE   4.6.0+ #3
  ffff88005f1b7678 0000000006dbe035 ffff88005f1b7508 ffffffff81635618
  ffff88005f1b7678 ffff88005f1b75a0 ffff88005f1b7590 ffffffff81313ee8
  ffffea0001ae8dd0 ffff88005f1b7670 0000000000000246 0000000041b58ab3
 Call Trace:
  [<ffffffff81635618>] dump_stack+0x63/0x8b
  [<ffffffff81313ee8>] kasan_report_error+0x528/0x560
  [<ffffffff81278f20>] ? filemap_map_pages+0x5f0/0x5f0
  [<ffffffff813144b8>] kasan_report+0x58/0x60
  [<ffffffffc12250dc>] ? zfs_getpage+0x14c/0x2d0 [zfs]
  [<ffffffff81312e4e>] __asan_load8+0x5e/0x70
  [<ffffffffc12250dc>] zfs_getpage+0x14c/0x2d0 [zfs]
  [<ffffffffc1252131>] zpl_readpage+0xd1/0x180 [zfs]

  [<ffffffff81353c3a>] SyS_execve+0x3a/0x50
  [<ffffffff810058ef>] do_syscall_64+0xef/0x180
  [<ffffffff81d0ee25>] entry_SYSCALL64_slow_path+0x25/0x25
 Memory state around the buggy address:
  ffff88005f1b7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff88005f1b7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff88005f1b7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
                                                                 ^
  ffff88005f1b7680: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  ffff88005f1b7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ==================================================================

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4705
Issue #4708
2016-09-09 13:21:09 -07:00
Chunwei Chen 3602878ff7 Fix memleak in zpl_parse_options
strsep() will advance tmp_mntopts, and will change it to NULL on last
iteration.  This will cause strfree(tmp_mntopts) to not free anything.

unreferenced object 0xffff8800883976c0 (size 64):
  comm "mount.zfs", pid 3361, jiffies 4294931877 (age 1482.408s)
  hex dump (first 32 bytes):
    72 77 00 73 74 72 69 63 74 61 74 69 6d 65 00 7a  rw.strictatime.z
    66 73 75 74 69 6c 00 6d 6e 74 70 6f 69 6e 74 3d  fsutil.mntpoint=
  backtrace:
    [<ffffffff81810c4e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811f9cac>] __kmalloc+0x16c/0x250
    [<ffffffffc065ce9b>] strdup+0x3b/0x60 [spl]
    [<ffffffffc080fad6>] zpl_parse_options+0x56/0x300 [zfs]
    [<ffffffffc080fe46>] zpl_mount+0x36/0x80 [zfs]
    [<ffffffff81222dc8>] mount_fs+0x38/0x160
    [<ffffffff81240097>] vfs_kern_mount+0x67/0x110
    [<ffffffff812428e0>] do_mount+0x250/0xe20
    [<ffffffff812437d5>] SyS_mount+0x95/0xe0
    [<ffffffff8181aff6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
    [<ffffffffffffffff>] 0xffffffffffffffff
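
The pattern behind the leak can be shown in userspace C: keep the pointer
returned by the allocator and free that, not the cursor that strsep()
advances. count_options() is a hypothetical helper written for this
sketch, not the zpl_parse_options() code, and it assumes a libc that
provides strsep().

```c
#ifndef _DEFAULT_SOURCE
#define	_DEFAULT_SOURCE 1	/* for strsep() and strdup() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the non-empty comma-separated options in a copy of "opts".
 * Returns -1 if the copy cannot be allocated.
 */
static int
count_options(const char *opts)
{
	char *copy, *cursor, *tok;
	int n = 0;

	assert(opts != NULL);
	copy = strdup(opts);
	if (copy == NULL)
		return (-1);

	/* strsep() advances "cursor" and sets it to NULL after the last token. */
	cursor = copy;
	while ((tok = strsep(&cursor, ",")) != NULL) {
		if (*tok != '\0')
			n++;
	}

	/* Free the saved base pointer; "cursor" is NULL by now. */
	free(copy);
	return (n);
}
```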

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4706
Issue #4708
2016-09-09 13:21:09 -07:00
Chunwei Chen 9f5f758d77 Fix arc_prune_task use-after-free
arc_prune_task uses a refcount to protect arc_prune_t, but it doesn't prevent
the underlying zsb from disappearing if there's a concurrent umount. We fix
this by forcing the caller of arc_remove_prune_callback to wait for
arc_prune_taskq to finish.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4687
Closes #4690
2016-09-09 13:21:09 -07:00
Chunwei Chen d5b0e7fcf1 Fix get_zfs_sb race with concurrent umount
Certain ioctl operations will call get_zfs_sb, which will hold an active
count on sb without checking whether it's active or not. This will result
in use-after-free. We fix this by using atomic_inc_not_zero to make sure
we get an active sb.

P1                                          P2
---                                         ---
deactivate_locked_super(): s_active = 0
                                            zfs_sb_hold()
                                            ->get_zfs_sb(): s_active = 1
->zpl_kill_sb()
-->zpl_put_super()
--->zfs_umount()
---->zfs_sb_free(zsb)
                                            zfs_sb_rele(zsb)
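
The semantics relied on here can be sketched with C11 atomics: a reference
is taken only if the count has not already dropped to zero.
ref_get_not_zero() is a hypothetical userspace analogue of the kernel's
atomic_inc_not_zero(), not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Take a reference only if the count is still non-zero.  Returns false
 * when the object is already being torn down, in which case the count
 * is left untouched.
 */
static bool
ref_get_not_zero(atomic_int *count)
{
	int old;

	assert(count != NULL);
	old = atomic_load(count);
	while (old != 0) {
		/* On failure "old" is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return (true);
	}
	return (false);
}
```

In the race shown above, P2 would see the zero count and back off instead
of reviving a superblock that P1 is about to free.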

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-09-09 13:21:09 -07:00
Chunwei Chen ec9b8fae06 Kill zp->z_xattr_parent to prevent pinning
zp->z_xattr_parent will pin the parent. This causes a huge issue
when unlinking a file with xattrs. Because the unlinked file is pinned, it
will never get purged immediately. And because of that, the xattr
stuff will never be marked as unlinked. So the whole unlinked stuff
will stay there until shrink cache or umount.

This change partially reverts e89260a.  This is safe because only the
zp->z_xattr_parent optimization is removed, zpl_xattr_security_init()
is still called from the zpl outside the inode lock.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #4359
Issue #3508
Issue #4413
Issue #4827
2016-09-09 13:21:09 -07:00
Chunwei Chen f7923f4ada xattr dir doesn't get purged during iput
We need to set inode->i_nlink to zero so iput will purge it. Without this, it
will get purged during shrink cache or umount, which would likely result in
deadlock due to zfs_zget waiting forever on its children which are in the
dispose_list of the same thread.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #4359
Issue #3508
Issue #4413
Issue #4827
2016-09-09 13:21:09 -07:00
Ned Bass 5acbedbbe8 Add ZIO_CHECKSUM_IS_ZERO
The ZIO_CHECKSUM_IS_ZERO macro was added in master commit:

37f8a88 Illumos 5746 - more checksumming in zfs send

That whole patch is not suitable for the release branch
but some other backported patches depend on that macro.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-09-09 13:21:09 -07:00
Rich Ercolani 3a8e13688b Add tunable to ignore hole_birth (enabled by default)
Adds a module option which disables the hole_birth optimization
which has been responsible for several recent bugs, including
issue #4050.

Original-patch: https://gist.github.com/pcd1193182/2c0cd47211f3aee623958b4698836c48
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4833
2016-09-09 13:20:54 -07:00
Peng 4f96e68fad Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z
The following scenario can result in garbage in the dn_spill field.
The db->db_blkptr must be set to NULL when DNODE_FLAG_SPILL_BLKPTR
is clear to ensure the dn_spill field is cleared.

Current txg = A.
* A new spill buffer is created. Its dbuf is initialized with
  db_blkptr = NULL and it's dirtied.

Current txg = B.
* The spill buffer is modified. It's marked as dirty in this txg.
* Additional changes make the spill buffer unnecessary because the
  xattr fits into the bonus buffer, so it's removed. The dbuf is
  undirtied in this txg, but it's still referenced and cannot be
  destroyed.

Current txg = C.
* Starts syncing of txg A
* dbuf_sync_leaf() is called for the spill buffer. Since db_blkptr
  is NULL, dbuf_check_blkptr() is called.
* The dbuf starts being written and it reaches the ready state
  (not done yet).
* A new change makes the spill buffer necessary again.
  sa_build_layouts() ends up calling dbuf_find() to locate the
  dbuf.  It finds the old dbuf because it has not been destroyed yet
  (it will be destroyed when the previous write is done and there
  are no more references). The old dbuf has db_blkptr != NULL.
* txg A write is complete and the dbuf released. However it's still
  referenced, so it's not destroyed.

Current txg = D.
* Starts syncing of txg B
* dbuf_sync_leaf() is called for the bonus buffer. Its contents are
  directly copied into the dnode, overwriting the blkptr area because,
  in txg B, the bonus buffer was big enough to hold the entire xattr.
* At this point, the db_blkptr of the spill buffer used in txg C
  gets corrupted.

Signed-off-by: Peng <peng.hse@xtaotech.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3937
2016-09-05 16:07:09 -07:00
Chunwei Chen a77cea5f0f Fix Large kmem_alloc in vdev_metaslab_init
This allocation can go way over 1MB, so we should use vmem_alloc
instead of kmem_alloc.

  Large kmem_alloc(1430784, 0x1000), please file an issue...
  Call Trace:
   [<ffffffffa0324aff>] ? spl_kmem_zalloc+0xef/0x160 [spl]
   [<ffffffffa17d0c8d>] ? vdev_metaslab_init+0x9d/0x1f0 [zfs]
   [<ffffffffa17d46d0>] ? vdev_load+0xc0/0xd0 [zfs]
   [<ffffffffa17d4643>] ? vdev_load+0x33/0xd0 [zfs]
   [<ffffffffa17c0004>] ? spa_load+0xfc4/0x1b60 [zfs]
   [<ffffffffa17c1838>] ? spa_tryimport+0x98/0x430 [zfs]
   [<ffffffffa17f28b1>] ? zfs_ioc_pool_tryimport+0x41/0x80 [zfs]
   [<ffffffffa17f5669>] ? zfsdev_ioctl+0x4a9/0x4e0 [zfs]
   [<ffffffff811bacdf>] ? do_vfs_ioctl+0x2cf/0x4b0
   [<ffffffff811baf41>] ? SyS_ioctl+0x81/0xa0

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4752
2016-09-05 16:07:09 -07:00
Tim Chase db3f5edcf1 Linux 4.6 compat: Fall back to d_prune_aliases() if necessary
As of 4.6, the icache and dcache LRUs are memcg aware insofar as the
kernel's per-superblock shrinker is concerned.  The effect is that dcache
or icache entries added by a task in a non-root memcg won't be scanned
by the shrinker in the context of the root (or NULL) memcg.  This defeats
the attempts by zfs_sb_prune() to unpin buffers and can allow metadata to
grow uncontrollably.  This patch reverts to the d_prune_aliases() method
in case the kernel's per-superblock shrinker is not able to free anything.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes: #4726
2016-09-05 16:07:09 -07:00
Brian Behlendorf 6ae855d8a7 Systemd configuration fixes
* Disable zfs-import-scan.service by default.  This ensures that
pools will not be automatically imported unless they appear in
the cache file.  When this service is explicitly enabled pools
will be imported with the "cachefile=none" property set.  This
prevents the creation of, or update to, an existing cache file.

    $ systemctl list-unit-files | grep zfs
    zfs-import-cache.service                  enabled
    zfs-import-scan.service                   disabled
    zfs-mount.service                         enabled
    zfs-share.service                         enabled
    zfs-zed.service                           enabled
    zfs.target                                enabled

* Change services to dynamic from static by adding an [Install]
section and adding 'WantedBy' tags in favor of 'Requires' tags.
This allows for easier customization of the boot behavior.

* Start the zfs-import-cache.service after the root pivot so
the cache file is available in the standard location.

* Start the zfs-mount.service after the systemd-remount-fs.service
to ensure the root fs is writeable and the ZFS filesystems can
create their mount points.

* Change the default behavior to only load the ZFS kernel modules
in zfs-import-*.service or when blkid(8) detects a pool.  Users
who wish to unconditionally load the kernel modules must uncomment
the list of modules in /lib/modules-load.d/zfs.conf.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4325
Closes #4496
Closes #4658
Closes #4699
2016-09-05 16:07:09 -07:00
Grischa Zengel ff2a2b208d Add nfs-kernel-server for Debian
Debian based systems use nfs-kernel-server as the service name.
List both nfs-server.service and nfs-kernel-server.service so
this service will work on multiple distributions.

Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4350
2016-09-05 16:07:09 -07:00
Turbo Fredriksson 1db6030de2 Rename 'zed.service' to 'zfs-zed.service'
For consistency all systemd unit files and init scripts now share
the same names.  This prevents an issue where the zed is started
twice on systems where both the systemd and sysv infrastructure is
installed concurrently.

For backward compatibility a 'zed' alias has been added.  This
allows the user to interact with the service using either the
name 'zed' or 'zfs-zed'.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3837
2016-09-05 16:07:09 -07:00
Chunwei Chen f3f0c589c3 Skip ctldir znode in zfs_rezget to fix snapdir issues
Skip ctldir in zfs_rezget, otherwise they will always get invalidated. This
will cause funny behaviour for the mounted snapdirs. Especially for
Linux >= 3.18, d_invalidate will detach the mountpoint and prevent anyone
from automounting it again as long as someone is still using the detached mount.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4514
Closes #4661
Closes #4672
2016-09-05 16:07:08 -07:00
Chunwei Chen 26e2bfa770 Linux 4.7 compat: fix zpl_get_acl returns invalid acl pointer
Starting from Linux 4.7, get_acl will set acl cache pointer to temporary
sentinel value before calling i_op->get_acl. Therefore we can't compare
against ACL_NOT_CACHED and return.

Since Linux 3.14, get_acl already checks the cache for us, so we
disable this check in zpl_get_acl.

Linux 4.7 also does set_cached_acl for us so we disable it in zpl_get_acl.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4944
Closes #4946
2016-09-05 16:07:08 -07:00
Brian Behlendorf 97a1bbd4ea Retire HAVE_CURRENT_UMASK and HAVE_POSIX_ACL_CACHING
Remove ZFS_AC_KERNEL_CURRENT_UMASK and ZFS_AC_KERNEL_POSIX_ACL_CACHING
configure checks, all supported kernels provide this functionality.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4922
2016-09-05 16:07:08 -07:00
Brian Behlendorf 01d9981349 Linux 4.7 compat: handler->set() takes both dentry and inode
Counterpart to fd4c7b7, the same approach was taken to resolve
the compatibility issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4717
Issue #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 7043281906 Linux 4.7 compat: use iterate_shared for concurrent readdir
Register iterate_shared if it exists so the kernel will use a shared
lock and allow concurrent readdir.

Also, use shared lock when doing llseek with SEEK_DATA or SEEK_HOLE
to allow concurrent seeking.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4664
Closes #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 1aff4bb235 Linux 4.7 compat: replace blk_queue_flush with blk_queue_write_cache
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 55b8857346 Linux 4.7 compat: handler->get() takes both dentry and inode
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 703c9f5893 Remove dummy znode from zvol_state
struct zvol_state contains a dummy znode, which is around 1KB on x64,
only for zfs_range_lock. But in reality, other than z_range_lock and
z_range_avl, zfs_range_lock only needs the znode for regular files, which
means we add 1KB to a structure and gain nothing.

In this patch, we remove the dummy znode for zvol_state. In order to
do that, we also need to refactor zfs_range_lock a bit. We move
z_range_lock and z_range_avl pair out of znode_t to form zfs_rlock_t.
This new struct replaces znode_t as the main handle inside the range
lock functions.

We also add pointers to z_size, z_blksz, and z_max_blksz so range lock
code doesn't depend on znode_t.  This allows non-ZPL consumers like
Lustre to use the range locks with their equivalent znode_t structure.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4510
2016-09-05 16:07:08 -07:00
Matthew Ahrens 2ea36ad824 Illumos 4953, 4954, 4955
4953 zfs rename <snapshot> need not involve libshare
4954 "zfs create" need not involve libshare if we are not sharing
4955 libshare's get_zfs_dataset need not sort the datasets
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4953
  https://www.illumos.org/issues/4954
  https://www.illumos.org/issues/4955
  https://github.com/illumos/illumos-gate/commit/33cde0d

Porting notes:
- Dropped qsort libshare_zfs.c hunk, no equivalent ZoL code.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4219
2016-09-05 16:07:08 -07:00
Brian Behlendorf efde19487c Fix ztest truncated cache file
Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
truncate and overwrite rather than rename the cache file.  This is
the correct fix but it should have only been applied for the kernel
build.  In user space rename(2) is needed because ztest depends on
the cache file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4129
2016-09-05 16:07:08 -07:00
AndCycle 347cdb6e61 Obey arc_meta_limit default size when changing arc_max
When decreasing the maximum ARC size preserve the 3/4 default
ratio for the arc_meta_limit.  Otherwise, the arc_meta_limit
may be set the same as arc_max.
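
A minimal sketch of the clamping described above, assuming the new
limit is simply capped at 3/4 of the reduced maximum.  The helper name
clamp_meta_limit() and its parameters are hypothetical and are not
taken from arc.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the actual arc.c code): when the ARC
 * maximum shrinks, keep the metadata limit at or below the default
 * 3/4 ratio instead of letting it equal the new maximum.
 */
uint64_t
clamp_meta_limit(uint64_t meta_limit, uint64_t new_arc_max)
{
    uint64_t dflt;

    /* Guard the multiplication below against 64-bit overflow. */
    assert(new_arc_max <= UINT64_MAX / 3);

    dflt = (3 * new_arc_max) / 4;
    return (meta_limit < dflt ? meta_limit : dflt);
}
```

A limit that is already below the 3/4 mark is left untouched, so an
administrator's smaller explicit setting would survive the change.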

Signed-off-by: AndCycle <andcycle@andcycle.idv.tw>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4001
2016-09-05 16:07:08 -07:00
Brian Behlendorf 2aec0bf4c5 Add `make lint` target
Add a `make lint` target which maps to a cppcheck target.  As with
the shellcheck target it will only run when cppcheck is installed.
This allows a `make lint` build check to be incrementally added to
the automated testing for distributions which provide cppcheck.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4915
2016-09-05 16:07:08 -07:00
Marcel Huber 9a6fcfd47d Fixes bug in fix_paths()
Fixes bug introduced in commit 7d90f569a.  Hinted by gcc:

libzfs_import.c: In function ‘fix_paths’:
libzfs_import.c:602:28: warning: self-comparison always evaluates to true [-Wtautological-compare]
    if (best->ne_num_labels == best->ne_num_labels &&

Signed-off-by: Marcel Huber <marcelhuberfoo@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4632
2016-09-05 16:07:08 -07:00
Ned Bass d30abebc85 Prepare to tag zfs-0.6.5.7
META file and release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-05-12 19:35:49 -07:00
Tim Chase 52475b507a Enable PF_FSTRANS for ioctl secpolicy callbacks (#4571)
At the very least, the zfs_secpolicy_write_perms ioctl security policy
callback, which calls dsl_dataset_hold(), can require freeing memory and,
therefore, re-enter ZFS.  This patch enables PF_FSTRANS for all of the
security policy callbacks similarly to the manner in which it's enabled
for the actual ioctl callback.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4554
2016-05-06 18:22:34 -04:00
Brian Behlendorf 2cb77346cb Use udev for partition detection
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed
it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4523
Closes #3708
Closes #4077
Closes #4144
Closes #4214
Closes #4517
2016-05-06 18:22:34 -04:00
Brian Behlendorf c9ca152fd1 Fix 'zpool import' blkid device names
When importing a pool using the blkid cache only the device
node path was added to the list of known paths for a device.
This results in 'zpool import' always using the sdX names
in preference to the 'path' name stored in the label.

To fix the issue the blkid import path has been updated to
add all of the 'path', 'devid', and 'devname' names from the
label to the known paths.  A sanity check is done to ensure
these paths do refer to the same device identified by blkid.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4523
Closes #3043
2016-05-06 18:22:34 -04:00
Chunwei Chen 21ea9460fa Remove wrong ASSERT in annotate_ecksum
When using large blocks like 1M, there will be more than UINT16_MAX qwords in
one block, so this ASSERT would go off. Also, it is possible for the histogram
to overflow. We cap them to UINT16_MAX to prevent this.
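
As a rough illustration of the capping, a 1M block holds 131072
64-bit words, which does not fit in a uint16_t.  The helpers below are
hypothetical names for this sketch and are not the code in
annotate_ecksum().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a wide count to what a uint16_t holds. */
uint16_t
cap_u16(uint64_t value)
{
    return (value > UINT16_MAX ? UINT16_MAX : (uint16_t)value);
}

/* Hypothetical helper: bump a histogram bucket without wrapping. */
uint16_t
sat_inc_u16(uint16_t bucket)
{
    return (bucket == UINT16_MAX ? bucket : (uint16_t)(bucket + 1));
}
```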

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4257
2016-05-01 18:29:51 -04:00
Colin Ian King 354424de5a Add support 32 bit FS_IOC32_{GET|SET}FLAGS compat ioctls
We need 32 bit userspace FS_IOC32_GETFLAGS and FS_IOC32_SETFLAGS
compat ioctls for systems such as powerpc64.  We use the normal
compat ioctl idiom as used by a variety of file systems to provide
this support.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4477
2016-05-01 18:26:09 -04:00
Brian Behlendorf d746e2ea0e Linux 4.6 compat: PAGE_CACHE_SIZE removal
As described in torvalds/linux@4a2d057e the
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were originally introduced
to make it possible to add bigger chunks to the page cache.  This
never panned out and they have therefore been removed from the kernel.

ZFS has been updated to use the PAGE_{SIZE,SHIFT,MASK,ALIGN} macros
and calls to page_cache_release() have been replaced with put_page().

There was no need to introduce a configure check for this because
these interfaces have existed for a very long time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4489
2016-05-01 18:24:30 -04:00
Colin Ian King 60a4ea3f94 Fix inverted logic on none elevator comparison
Commit d1d7e2689d ("cstyle: Resolve C style issues") inverted
the logic on the none elevator comparison.  Fix this and make it
cstyle warning clean.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4507
2016-05-01 18:23:29 -04:00
Brian Behlendorf fbffa53a5c Create unique partition labels
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that might
result in confusion and it was decided against.
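
A sketch of how a label in the "zfs-1234567890abcdef" form could be
built, assuming the 64-bit value is printed as 16 lowercase hex
digits.  make_part_label() is a hypothetical name and the caller is
assumed to supply the random value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "zfs-" followed by a 64-bit id as 16
 * zero-padded hex digits.  Returns the snprintf() result, i.e. the
 * length the full label needs (20 when it fits).
 */
int
make_part_label(char *buf, size_t len, uint64_t id)
{
    assert(buf != NULL && len > 0);
    return (snprintf(buf, len, "zfs-%016" PRIx64, id));
}
```

A 21 byte buffer is enough for the label plus the terminating NUL.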

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4517
2016-05-01 18:20:32 -04:00
Ned Bass 6400ae85ee Fix ZPL miswrite of default POSIX ACL
Commit 4967a3e introduced a typo that caused the ZPL to store the
intended default ACL as an access ACL. Due to caching this problem
may not become visible until the filesystem is remounted or the inode
is evicted from the cache. Fix the typo.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4520
2016-05-01 18:19:54 -04:00
Chunwei Chen 31dbe4b404 Linux 4.5 compat: Use xattr_handler->name for acl
Linux 4.5 added the member "name" to xattr_handler. An xattr_handler which
matches the whole name rather than a prefix should use "name" instead of
"prefix". Otherwise, the kernel will return EINVAL when it tries to resolve
handlers.

Also, we remove the strcmp checks when xattr_handler has name, because
xattr_resolve_name will do the check for us.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4549
Closes #4537
2016-05-01 18:17:57 -04:00
Brian Behlendorf a64fb11bf3 Fix user namespaces uid/gid mapping
As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
not the current user_ns, should be passed to posix_acl_from_xattr()
and posix_acl_to_xattr().  Conveniently the init_user_ns is
available through the init credential (kcred).

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4177
2016-05-01 18:17:07 -04:00
Gordan Bobic 5e202e55ef Fix aarch64 compilation
sys/param.h depends on types defined in sys/types.h
(hrtime_t & timestruc_t).

Signed-off-by: Gordan Bobic <gordan@redsleeve.org>
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4420
2016-03-24 09:22:05 -07:00
Ned Bass 21f21fe859 Prepare to tag zfs-0.6.5.6
META file and release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-03-22 18:08:04 -07:00
Boris Protopopov d0337e80ca Fix lock order inversion with zvol_open()
zfsonlinux issue #3681 - lock order inversion between zvol_open() and
dsl_pool_sync()...zvol_rename_minors()

Remove trylock of spa_namespace_lock as it is no longer needed when
zvol minor operations are performed in a separate context with no
prior locking state; the spa_namespace_lock is no longer held
when bdev->bd_mutex or zfs_state_lock might be taken in the code
paths originating from the zvol minor operation callbacks.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3681
2016-03-22 18:08:04 -07:00
Boris Protopopov 9d02f9557f Add support for asynchronous zvol minor operations
zfsonlinux issue #2217 - zvol minor operations: check snapdev
property before traversing snapshots of a dataset

zfsonlinux issue #3681 - lock order inversion between zvol_open()
and dsl_pool_sync()...zvol_rename_minors()

Create a per-pool zvol taskq for asynchronous zvol tasks.
There are a few key design decisions to be aware of.

* Each taskq must be single threaded to ensure tasks are always
  processed in the order in which they were dispatched.

* There is a taskq per-pool in order to keep the pools independent.
  This way if one pool is suspended it will not impact another.

* The preferred location to dispatch a zvol minor task is a sync
  task.  In this context there is easy access to the spa_t and
  minimal error handling is required because the sync task must
  succeed.

Support for asynchronous zvol minor operations addresses issue #3681.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2217
Closes #3678
Closes #3681
2016-03-22 18:08:04 -07:00
Boris Protopopov 682be6e0c9 Make zvol minor functionality more robust
Close the race window in zvol_open() to prevent removal of
zvol_state in the 'first open' code path. Move the call to
check_disk_change() under zvol_state_lock to make sure the
zvol_media_changed() and zvol_revalidate_disk() called by
check_disk_change() are invoked with positive zv_open_count.

Skip opened zvols when removing minors and set private_data
to NULL for zvols that are not in use whose minors are being
removed, to indicate to zvol_open() that the state is gone.
Skip opened zvols when renaming minors to avoid modifying
zv_name that might be in use, e.g. in zvol_ioctl().

Drop zvol_state_lock before calling add_disk() when creating
minors to avoid deadlocks with zvol_open().

Wrap dmu_objset_find() with spl_fstran_mark()/unmark().

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4344
2016-03-22 17:54:07 -07:00
Richard Sharpe 82ff881071 Handling negative dentries in a CI file system.
For a Case Insensitive file system we must avoid creating negative
entries in the dentry cache. We must also pass the FIGNORECASE into
zfs_lookup so that special files are handled correctly.

We must also prevent negative dentries from being created when files are
unlinked.

Tested by running fsstress from LTP (10 loops, 10 processes, 10,000 ops.)

Also tested with printks (now removed) to ensure that lookups come to
zpl_lookup when negative dentries should not exist.

Tests:
1.   ls Some-file.txt; touch some-file.txt; ls Some-file.txt
  and ensure no errors.

2.   touch Some-file.txt; rm some-file.txt; ls Some-file.txt
  and ensure that the last ls shows log messages showing the lookup
  went all the way to zpl_lookup.

Thanks to tuxoko for helping me get this correct.

Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4243
2016-03-22 17:54:07 -07:00
Richard Sharpe a99c845fdc Fix casesensitivity=insensitive deadlock
When casesensitivity=insensitive is set for the
file system, we can deadlock in a rename if the user uses different case
for each path. For example rename("A/some-file.txt", "a/some-file.txt").

The simple test for this is:

1. mkdir some-dir in a ZFS file system
2. touch some-dir/some-file.txt
3. mv Some-dir/some-file.txt some-dir/some-other-file.txt

This last request deadlocks trying to relock the i_mutex on the inode for
the parent directory.

The solution is to use d_add_ci in zpl_lookup if we are on a file system
that has the casesensitivity=insensitive attribute set.

This patch checks if we are working on a case insensitive file system and if
so, allocates storage for the case insensitive name and passes it to
zfs_lookup and then calls d_add_ci instead of d_splice_alias.

The performance impact seems to be minimal even though we have introduced a
kmalloc and kfree in the lookup path.

The problem was found when running Microsoft's FSCT against Samba on top of
ZFS On Linux.

Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4136
2016-03-22 17:54:07 -07:00
Tim Chase d41e763c72 Correctly parse -R flag arguments
Currently, only the 'b' flag takes an argument which is an offset into
the block at which a blkptr should be decoded.  The index into the flag
string needed to be updated after parsing an argument.
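
The index fix can be sketched as follows.  This is not the zdb code:
the struct, the function name, and the 'd' and 'r' flag letters are
placeholders chosen for the example, and only 'b' is assumed to take a
numeric argument.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result structure for this sketch. */
struct read_flags {
    int decode_blkptr;       /* 'b' was given */
    uint64_t blkptr_offset;  /* argument that followed 'b' */
    int flag_d;              /* placeholder argument-less flag */
    int flag_r;              /* placeholder argument-less flag */
};

/* Returns 0 on success, -1 on an unknown flag letter. */
int
parse_flags(const char *flags, struct read_flags *out)
{
    size_t i = 0;

    assert(flags != NULL && out != NULL);
    out->decode_blkptr = out->flag_d = out->flag_r = 0;
    out->blkptr_offset = 0;

    while (flags[i] != '\0') {
        char *end;

        switch (flags[i]) {
        case 'b':
            out->decode_blkptr = 1;
            out->blkptr_offset = strtoull(&flags[i + 1], &end, 0);
            /* Resume after the digits that were consumed. */
            i = (size_t)(end - flags);
            continue;
        case 'd':
            out->flag_d = 1;
            break;
        case 'r':
            out->flag_r = 1;
            break;
        default:
            return (-1);
        }
        i++;
    }
    return (0);
}
```

Without the index update after strtoull(), the digits of the offset
would be revisited as if they were flag letters.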

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4304
2016-03-22 17:54:07 -07:00
Brian Behlendorf b145e23daf Prevent zpool_find_vdev() from truncating vdev path
When extracting tokens from the string strtok(3) is allowed to modify
the passed buffer.  Therefore the zfs_strcmp_pathname() function must
make a copy of the passed string before passing it to strtok(3).
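
A small sketch of the copy-before-tokenize pattern, using a
hypothetical count_components() helper rather than the real
zfs_strcmp_pathname().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the '/'-separated components of a path
 * without modifying the caller's buffer.  Returns -1 if the copy
 * cannot be allocated.
 */
int
count_components(const char *path)
{
    size_t len;
    char *copy, *tok;
    int count = 0;

    assert(path != NULL);
    len = strlen(path) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return (-1);
    memcpy(copy, path, len);

    /* strtok() writes NUL bytes into 'copy', never into 'path'. */
    for (tok = strtok(copy, "/"); tok != NULL; tok = strtok(NULL, "/"))
        count++;

    free(copy);
    return (count);
}
```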

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #4312
2016-03-22 17:54:07 -07:00
Brian Behlendorf 83a5c8541e Change full path subcommand flag from -p to -P
Commit d2f3e29 introduced the -p option which outputs full paths
for vdevs to multiple zpool subcommands.  When this was merged
there was no conflict for this flag letter.  However it's certain
there will be a conflict with the -p (parsable) flag used by other
subcommands.  Therefore, -p is being changed to -P to avoid this.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4368
2016-03-22 17:54:07 -07:00
Richard Yao eb02072279 Add -gLp to zpool subcommands for alt vdev names
The following options have been added to the zpool add, iostat,
list, status, and split subcommands.  The default behavior was
not modified, from zfs(8).

  -g    Display vdev GUIDs  instead  of  the  normal  short
        device  names.  These GUIDs can be used in-place of
        device   names   for    the    zpool    detach/off‐
        line/remove/replace commands.

  -L    Display real paths for vdevs resolving all symbolic
        links. This can be used to lookup the current block
        device  name regardless of the /dev/disk/ path used
        to open it.

  -p    Display  full  paths  for vdevs instead of only the
        last component of the path.  This can  be  used  in
        conjunction with the -L flag.

This behavior may also be enabled using the following environment
variables.

  ZPOOL_VDEV_NAME_GUID
  ZPOOL_VDEV_NAME_FOLLOW_LINKS
  ZPOOL_VDEV_NAME_PATH
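
A sketch of how a caller might turn these variables into display
flags.  The NAME_* constants and both function names are hypothetical,
and treating a variable as enabled only when it holds a non-zero
number is an assumption; the real parsing rules may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag bits for this sketch. */
#define NAME_GUID          0x1
#define NAME_FOLLOW_LINKS  0x2
#define NAME_PATH          0x4

/* Assumption: unset, empty, or "0" means disabled. */
int
env_value_enabled(const char *value)
{
    if (value == NULL)
        return (0);
    return (strtoul(value, NULL, 0) > 0);
}

int
name_flags_from_env(void)
{
    int flags = 0;

    if (env_value_enabled(getenv("ZPOOL_VDEV_NAME_GUID")))
        flags |= NAME_GUID;
    if (env_value_enabled(getenv("ZPOOL_VDEV_NAME_FOLLOW_LINKS")))
        flags |= NAME_FOLLOW_LINKS;
    if (env_value_enabled(getenv("ZPOOL_VDEV_NAME_PATH")))
        flags |= NAME_PATH;
    return (flags);
}
```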

This change is based on work originally started by Richard Yao
to add a -g option.  Then extended by @ilovezfs to add a -L option
for openzfsonosx.  Those changes have been merged, re-factored,
a -p option added and extended to all relevant zpool subcommands.

Original-patch-by: Richard Yao <ryao@gentoo.org>
Extended-by: ilovezfs <ilovezfs@icloud.com>
Extended-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ilovezfs <ilovezfs@icloud.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2011
Closes #4341
2016-03-22 17:54:06 -07:00
John Wren Kennedy fa567594b8 Illumos 5767 - fix several problems with zfs test suite
5767 fix several problems with zfs test suite
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References:
  https://www.illumos.org/issues/5767
  https://github.com/illumos/illumos-gate/commit/52244c0

Porting Notes:
- Only the updates to zpool_main.c were kept because the ZFS test
  suite is not currently part of the ZoL source tree.  The test
  suite itself should be updated to include the latest versions
  of the tests once we're running it for every commit
- Fixes `zpool list` output.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-03-22 17:54:06 -07:00
Brian Behlendorf 75233289fc Remove RPM package restriction
ZFS on Linux is regularly tested on arm, ppc, ppc64, i686 and x86_64
architectures.  Given this the artificial architecture restriction in
the packaging has been removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-03-22 17:54:06 -07:00
Dimitri John Ledkov 5f5bc92754 Add support for s390[x].
Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4425
2016-03-22 17:54:06 -07:00
Paul Dagnelie 63ce7b6fcf Illumos 6370 - ZFS send fails to transmit some holes
6370 ZFS send fails to transmit some holes
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Stefan Ring <stefanrin@gmail.com>
Reviewed by: Steven Burgess <sburgess@datto.com>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/6370
  https://github.com/illumos/illumos-gate/commit/286ef71

In certain circumstances, "zfs send -i" (incremental send) can produce
a stream which will result in incorrect sparse file contents on the
target.

The problem manifests as regions of the received file that should be
sparse (and read as zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).

Note: this can happen only with filesystems (not zvols, because they do
not free (and thus can not reuse) object IDs).

Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).

We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).

The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).
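
The short-term rule can be sketched as a predicate.  This is an
illustration under stated assumptions, not the dmu_send code:
should_send_hole() and its parameters are hypothetical, and a hole is
otherwise assumed to be sent only when it was born after the
incremental source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: decide whether an incremental send should
 * include a hole.  'objects_never_reused' stands for the zvol case or
 * a dataset known not to have reused object IDs.
 */
bool
should_send_hole(uint64_t hole_birth, uint64_t from_txg,
    bool objects_never_reused)
{
    /* Birth time 0 is ambiguous when object IDs may be reused. */
    if (hole_birth == 0 && !objects_never_reused)
        return (true);

    return (hole_birth > from_txg);
}
```

Sending the ambiguous holes costs some extra stream data but avoids
leaving stale contents on the receiving side.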

Ported-by: Steven Burgess <sburgess@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4369
Closes #4050
2016-03-15 10:01:48 -07:00
Ned Bass 504ff59709 Prepare to tag zfs-0.6.5.5
META file and release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-03-09 14:56:21 -08:00
Brian Behlendorf 84638a5d0c Fix maybe uninitialized
As of gcc 5.1.1 20150618 (Red Hat 5.1.1-4) the -Werror=maybe-uninitialized
check detects that 'snapname' in recv_incremental_replication() may not be
initialized.  Explicitly initialize the variable to resolve the warning.

  libzfs_sendrecv.c: In function ‘recv_incremental_replication’:
  libzfs_sendrecv.c:2019:2: error: ‘snapname’ may be used uninitialized in
    (void) snprintf(buf, sizeof (buf), "%s@%s", fsname, snapname);

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-29 14:11:11 -08:00
Brian Behlendorf 9986e9f544 Linux 4.5 compat: pfn_t typedef
The pfn_t typedef was inherited from Illumos but never directly
used by any libspl consumers.  This doesn't cause any issues in
user space but for consistency with the kernel build it has been
removed.  See torvalds/linux/commit/34c0fd54.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
2016-01-29 09:52:13 -08:00
Brian Behlendorf 9842008fc0 Linux 4.5 compat: xattr list handler
The registered xattr .list handler was simplified in the 4.5 kernel
to only perform a permission check.  Given a dentry for the file it
must return a boolean indicating if the name is visible.  This
differs slightly from the previous APIs which also required the
function to copy the name in to the provided list and return its
size.  That is now all the responsibility of the caller.

This should be a straightforward change to make to ZoL since we've
always required the caller to make the copy.  However, this was
slightly complicated by the need to support 3 older APIs.  Yes,
between 2.6.32 and 4.5 there are 4 versions of this interface!

Therefore, while the functional change in this patch is small it
includes significant cleanup to make the code understandable and
maintainable.  These changes include:

- Improved configure checks for .list, .get, and .set interfaces.
  - Interfaces checked from newest to oldest.
  - Strict checking for each possible known interface.
  - Configure fails when no known interface is available.
  - HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency
    with similar iops and fops configure checks.

- POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to
  move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}.
  Compatibility wrappers were added for old kernels.

- ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing
  ZPL_XATTR_{GET|SET} WRAPPERs.  Only the inode is guaranteed to be
  a valid pointer, passing NULL for the 'list' and 'name' variables
  is allowed and must be checked for.  All .list functions were
  updated to use the wrapper to aid readability.

- zpl_xattr_filldir() updated to use the .list function for its
  permission check which is consistent with the updated Linux 4.5
  interface.  If a .list function is registered it should return 0
  to indicate a name should be skipped, if there is no registered
  function the name will be added.

- Additional documentation from xattr(7) describing the correct
  behavior for each namespace was added before the relevant handlers.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
2016-01-29 09:52:13 -08:00
Brian Behlendorf b3c9e2caf5 Linux 4.5 compat: get_link() / put_link()
The follow_link() interface was retired in favor of get_link().
In the process of phasing in get_link() the Linux kernel went
through two different versions.  The first of which depended
on put_link() and the final version on a delayed done function.

- Improved configure checks for .follow_link, .get_link, .put_link.
  - Interfaces checked from newest to oldest.
  - Strict checking for each possible known interface.
  - Configure fails when no known interface is available.

- Both versions of .get_link are detected and supported, as well as
  two previous versions of .follow_link.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
2016-01-29 09:52:13 -08:00
Olaf Faaland f00a5734f6 Create zfs-kmod-debuginfo rpm with redhat spec file
Correct the redhat specfile so that working debuginfo rpms are created
for the kernel modules.  The generic specfile already does the right
thing.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4224
2016-01-29 09:52:12 -08:00
Hajo Möller ee24b44f5e Make arc_summary.py and dbufstat.py compatible with python3
To make arc_summary.py and dbufstat.py compatible with python3
some minor fixes were required, this was done automatically by
`2to3 -w arc_summary.py` and `2to3 -w dbufstat.py`.

Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
2016-01-29 09:52:12 -08:00
Christian Neukirchen a5ab85824d mount.zfs: use getopt_long instead of getopt to guarantee permutation of argv.
mount.zfs is called by convention (and util-linux) with arguments
last, i.e.

  % mount.zfs <dataset> <mountpoint> -o <options>

This is not a problem on glibc since GNU getopt(3) will reorder the
arguments.  However, alternative libc such as musl libc (or glibc with
$POSIXLY_CORRECT set) will not permute argv and fail to parse the -o
<options>.  Use getopt_long so musl will permute arguments.
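
The permutation difference can be illustrated with Python's standard getopt
module, used here only as an analogy for the C library behavior:
getopt.getopt() stops at the first non-option argument, while
getopt.gnu_getopt() permutes the arguments the way GNU getopt(3) does. The
dataset and mountpoint values are made up for the example.

```python
import getopt
import os

# Hypothetical mount.zfs-style argument vector: options come last.
argv = ["tank/fs", "/mnt/fs", "-o", "ro,noatime"]

# gnu_getopt() also honors POSIXLY_CORRECT, so clear it for a stable demo.
os.environ.pop("POSIXLY_CORRECT", None)

# POSIX-style scanning stops at the first non-option argument,
# so "-o ro,noatime" is never parsed as an option.
posix_opts, posix_rest = getopt.getopt(argv, "o:")

# GNU-style scanning permutes the arguments and still finds "-o".
gnu_opts, gnu_rest = getopt.gnu_getopt(argv, "o:")

print(posix_opts, posix_rest)
print(gnu_opts, gnu_rest)
```

With the POSIX-style call the option list comes back empty, which is the
failure mode described above for musl.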

Signed-off-by: Christian Neukirchen <chneukirchen@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4222
2016-01-29 09:52:12 -08:00
Tim Chase 6f7acfc9c9 Prevent arc_c collapse
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition, refactor arc_shrink() to a simpler structure and protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434ef
Closes: #3904
Closes: #4161
2016-01-29 09:52:12 -08:00
Chunwei Chen a08add3067 Prevent duplicated xattr between SA and dir
When replacing an xattr would overflow the SA, we fall back to the xattr
dir. However, the current implementation doesn't clear the copy in the SA,
so we end up with a duplicated xattr.

For example, running the following script on an xattr=sa filesystem
would cause duplicated "user.1".

-- dup_xattr.sh begin --
randbase64()
{
        dd if=/dev/urandom bs=1 count=$1 2>/dev/null | openssl enc -a -A
}

file=$1
touch $file
setfattr -h -n user.1 -v `randbase64 5000` $file
setfattr -h -n user.2 -v `randbase64 20000` $file
setfattr -h -n user.3 -v `randbase64 20000` $file
setfattr -h -n user.1 -v `randbase64 20000` $file
getfattr -m. -d $file
-- dup_xattr.sh end --

Also, when a filesystem is switched from xattr=sa to xattr=on, it will
never modify the xattrs stored in the SA. This causes strange behavior:
you cannot delete an xattr, or setxattr creates a duplicate and the
result does not match when you getxattr.

For example, the following shell sequence.

-- shell begin --
$ sudo zfs set xattr=sa pp/fs0
$ touch zzz
$ setfattr -n user.test -v asdf zzz
$ sudo zfs set xattr=on pp/fs0
$ setfattr -x user.test zzz
setfattr: zzz: No such attribute
$ getfattr -d zzz
user.test="asdf"
$ setfattr -n user.test -v zxcv zzz
$ getfattr -d zzz
user.test="asdf"
user.test="asdf"
-- shell end --

We fix this behavior by first finding where the xattr resides before
setxattr. Then, after we have successfully updated the xattr in one
location, we clear the other location. Note that, because the update and
clear are not in a single tx, we could still end up with a duplicated
xattr. Doing setxattr again fixes it.
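
The update-then-clear idea can be sketched with a toy model. This is not the
ZFS code: Node, SA_CAPACITY, and the helper names are hypothetical, and two
plain dicts stand in for the SA area and the xattr directory.

```python
SA_CAPACITY = 16  # hypothetical SA space budget in bytes, for illustration only


class Node:
    """Toy znode with two xattr stores: the SA area and the xattr directory."""

    def __init__(self):
        self.sa = {}
        self.dir = {}


def fits_in_sa(node, name, value):
    # Space used by every other SA xattr plus the new value must fit the budget.
    used = sum(len(v) for k, v in node.sa.items() if k != name)
    return used + len(value) <= SA_CAPACITY


def set_xattr(node, name, value):
    # Update one location, then clear the other so only one copy remains.
    if fits_in_sa(node, name, value):
        node.sa[name] = value
        node.dir.pop(name, None)
    else:
        node.dir[name] = value
        node.sa.pop(name, None)


def list_xattrs(node):
    # A duplicate would show up as the same name appearing twice.
    return sorted(list(node.sa) + list(node.dir))


node = Node()
set_xattr(node, "user.1", b"small")     # fits: stored in the SA
set_xattr(node, "user.1", b"x" * 100)   # too big: falls back to the dir
print(list_xattrs(node))
```

Without the pop() calls the second set_xattr() would leave a stale copy in
the SA, which is the duplication the script above reproduces.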

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #3472
Closes #4153
2016-01-29 09:52:12 -08:00
Brian Behlendorf 0a2f95748d Close possible zfs_znode_held() race
Check if the lock is held while holding the z_hold_locks() lock.
This prevents a possible use-after-free bug for callers which are
not holding the lock.  There currently are no such callers so this
can't cause a problem today but it has been fixed regardless.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4244
Issue #4124
2016-01-29 09:52:12 -08:00
Brian Behlendorf 3b9fd93d0b Fix zsb->z_hold_mtx deadlock
The zfs_znode_hold_enter() / zfs_znode_hold_exit() functions are used to
serialize access to a znode and its SA buffer while the object is being
created or destroyed.  This kind of locking would normally reside in the
znode itself but in this case that's impossible because the znode and SA
buffer may not yet exist.  Therefore the locking is handled externally
with an array of mutexes and AVL trees which contain per-object locks.

In zfs_znode_hold_enter() a per-object lock is created as needed, inserted
in to the correct AVL tree and finally the per-object lock is held.  In
zfs_znode_hold_exit() the process is reversed.  The per-object lock is
released, removed from the AVL tree and destroyed if there are no waiters.

This scheme has two important properties:

1) No memory allocations are performed while holding one of the z_hold_locks.
   This ensures evict(), which can be called from direct memory reclaim, will
   never block waiting on a z_hold_locks which just happens to have hashed
   to the same index.

2) All locks used to serialize access to an object are per-object and never
   shared.  This minimizes lock contention without creating a large number
   of dedicated locks.

On the downside it does require znode_lock_t structures to be frequently
allocated and freed.  However, because these are backed by a kmem cache
and very short lived this cost is minimal.
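
The scheme can be modeled in a few lines of Python. This is a sketch under
assumptions, not the kernel implementation: ObjectHoldTable and its methods
are hypothetical names, dicts stand in for the AVL trees, and Python cannot
truly guarantee that no allocation happens under the bucket lock.

```python
import threading


class ObjectHoldTable:
    """Toy model of per-object locks kept in hashed buckets."""

    def __init__(self, size=64):
        self._bucket_locks = [threading.Lock() for _ in range(size)]
        self._buckets = [{} for _ in range(size)]

    def hold_enter(self, obj):
        i = hash(obj) % len(self._buckets)
        # Create the candidate entry before taking the bucket lock,
        # mirroring "no allocations while holding a z_hold_locks".
        candidate = {"lock": threading.Lock(), "refs": 0}
        with self._bucket_locks[i]:
            entry = self._buckets[i].setdefault(obj, candidate)
            entry["refs"] += 1
        # Block on the per-object lock only after the bucket lock is dropped.
        entry["lock"].acquire()
        return entry

    def hold_exit(self, obj, entry):
        i = hash(obj) % len(self._buckets)
        entry["lock"].release()
        with self._bucket_locks[i]:
            entry["refs"] -= 1
            if entry["refs"] == 0:
                # No holder or waiter is left, so the entry can go away.
                del self._buckets[i][obj]

    def tracked(self, obj):
        i = hash(obj) % len(self._buckets)
        with self._bucket_locks[i]:
            return obj in self._buckets[i]


table = ObjectHoldTable()
hold = table.hold_enter(42)
print(table.tracked(42))
table.hold_exit(42, hold)
print(table.tracked(42))
```

The reference count covers both the holder and any waiters, so an entry is
only dropped once nobody is interested in the object any more.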

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4106
2016-01-29 09:52:12 -08:00
Brian Behlendorf 05c3401e3f Add zfs_object_mutex_size module option
Add a zfs_object_mutex_size module option to facilitate resizing the
per-dataset znode mutex array.  Increasing this value may help
make the deadlock described in #4106 less common, but this is not a
proper fix.  This patch is primarily to aid debugging and analysis.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #4106
2016-01-29 09:52:12 -08:00
Ned Bass a5dae61721 Prevent SA length overflow
The function sa_update() accepts a 32-bit length parameter and
assigns it to a 16-bit field in sa_bulk_attr_t, potentially
truncating the passed-in value. This could lead to corrupt system
attribute (SA) records getting written to the pool. Add a VERIFY to
sa_update() to detect cases where overflow would occur. The SA length
is limited to 16-bit values by the on-disk format defined by
sa_hdr_phys_t.
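
A minimal sketch of the truncation, assuming nothing beyond the 32-bit to
16-bit narrowing described above. The function names are hypothetical, and
the raised exception merely stands in for the VERIFY.

```python
UINT16_MAX = 0xFFFF  # the on-disk SA length field is 16 bits wide


def store_length_unchecked(length):
    # Models assigning a 32-bit value to a 16-bit field: high bits are lost.
    return length & UINT16_MAX


def store_length_checked(length):
    # Models the added check: refuse lengths that do not fit in 16 bits.
    if length > UINT16_MAX:
        raise OverflowError("SA length %d does not fit in 16 bits" % length)
    return length


print(store_length_unchecked(70000))
```

A length just over 64k silently becomes a small value in the unchecked
version, which is how a corrupt record could reach the pool.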

The function zfs_sa_set_xattr() is vulnerable to this bug if the
unpacked nvlist of xattrs is less than 64k in size but the packed
size is greater than 64k. Fix this by appropriately checking the
size of the packed nvlist before calling sa_update(). Add error
handling to zpl_xattr_set_sa() to keep the cached list of SA-based
xattrs consistent with the data on disk.

Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned
transaction if sa_update() returns an error, but the DMU only allows
unassigned transactions to be aborted. Wrap the sa_update() call in a
VERIFY0, remove the transaction abort, and call dmu_tx_commit()
unconditionally. This is consistent practice with other callers
of sa_update().

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4150
2016-01-29 09:41:14 -08:00
Ned Bass 1ffc4c150e Prepare to tag zfs-0.6.5.4
META file and release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-01-08 16:08:18 -08:00
Chunwei Chen d621aa5431 Make xattr dir truncate and remove in one tx
We need truncate and remove to be in the same tx when doing zfs_rmnode on an
xattr dir. Otherwise, if we truncate and crash, we'll end up with an
inconsistent zap object on the delete queue. We do this by skipping
dmu_free_long_range and letting zfs_znode_delete do the work.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4114
Issue #4052
Issue #4006
Issue #3018
Issue #2861
2015-12-30 16:13:37 -08:00
Chunwei Chen 19d991a99e Fix empty xattr dir causing lockup
During zfs_rmnode on an xattr dir, if the system crashes just after
dmu_free_long_range, we would get an empty xattr dir in the delete queue. This
would cause blkid=0 to be passed into zap_get_leaf_byblk when doing
zfs_purgedir during mount, which would try to do rw_enter on a wrong structure
and cause a system lockup.

We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4114
Closes #4052
Closes #4006
Closes #3018
Closes #2861
2015-12-30 16:13:30 -08:00
Chunwei Chen 0bf37725f8 Fix fail path in zfs_znode_alloc
When sa_bulk_lookup() fails, unlock_new_inode() will spit out a WARNING. It
will also recursively deadlock on ZFS_OBJ_HOLD_ENTER in zfs_zinactive().

Since we never call insert_inode_locked in the fail path, I_NEW is never set
and the inode is never hashed. So unlock_new_inode() can be safely removed.

We set z_sa_hdl to NULL in the fail path so that the iput path will stop at
zfs_inactive() without entering zfs_zinactive(). This way we can avoid the
deadlock and prevent a double sa_handle_destroy().

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3899
2015-12-23 17:37:20 -08:00
Brian Behlendorf 3445a340d5 Fix z_xattr_lock/z_teardown_lock inversion
There exists a lock inversion between the z_xattr_lock and the
z_teardown_lock.  Resolve this by taking the z_teardown_lock in
all registered xattr callbacks prior to taking the z_xattr_lock.
This ensures the locks are always taken in the same order thus
preventing a deadlock.  Note the z_teardown_lock is taken again
in zfs_lookup() and this is safe because the z_teardown lock is
a re-entrant reader/writer lock.

* process-1
zpl_xattr_get -> Takes zp->z_xattr_lock
  __zpl_xattr_get
    zfs_lookup -> Takes zsb->z_teardown_lock in ZFS_ENTER macro

* process-2
zfs_ioc_recv -> Takes zsb->z_teardown_lock in zfs_suspend_fs()
  zfs_resume_fs
    zfs_rezget -> Takes zp->z_xattr_lock

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #3943
Closes #3969
Closes #4121
2015-12-23 17:37:08 -08:00
Jason Zaman 44f547af98 sysmacros: Make P2ROUNDUP not trigger int overflow
The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which
triggers PaX's integer overflow detection for unsigned integers.
Replace the macros with an equivalent version that does not trigger
the overflow.

Axioms:
A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement.
B. ~(x & y) === ((~(x)) | (~(y))) under De Morgan's law.
C. ~(~x) === x under the law of excluded middle.

Proof:
0. (-(-(x) & -(align))) original
1. (~(-(x) & -(align)) + 1) by A
2. (((~(-(x))) | (~(-(align)))) + 1) by B
3. (((~(~((x) - 1))) | (~(~((align) - 1)))) + 1) by A
4. (((((x) - 1)) | (((align) - 1))) + 1) by C
Q.E.D.
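
The equivalence can also be checked numerically. The sketch below emulates
64-bit unsigned wraparound with a mask, since Python integers do not
overflow; the function names are made up for the example.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def p2roundup_old(x, align):
    # Original form: -(-(x) & -(align)), which negates unsigned values.
    return (-((-x & MASK64) & (-align & MASK64))) & MASK64


def p2roundup_new(x, align):
    # Replacement form: (((x) - 1) | ((align) - 1)) + 1
    return ((((x - 1) & MASK64) | ((align - 1) & MASK64)) + 1) & MASK64


# Compare both forms over a range of inputs for several power-of-two alignments.
for align in (1, 2, 8, 512, 4096):
    for x in range(0, 3 * align + 1):
        assert p2roundup_old(x, align) == p2roundup_new(x, align)

print(p2roundup_new(5, 8), p2roundup_new(8, 8), p2roundup_new(9, 8))
```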

Signed-off-by: Jason Zaman <jason@perfinion.com>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3949
2015-12-23 17:29:35 -08:00
Brian Behlendorf a0dba38cd4 Follow 0/-E convention for module load errors
Because errors during module load are so rare it went unnoticed that
it was possible that a positive errno was returned.  This would result
in the module being loaded, nothing being initialized, and a system
panic shortly thereafter.  This is what was causing the hard failures
in the automated testing.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-12-23 17:29:35 -08:00
Brian Behlendorf 2b50578e29 Add TEST configuration file for buildbot
The TEST file is provided as a hint to the automated test infra-
structure.  It controls which regression tests are run and how they
are run.  This file along with any lines in the commit messages
which start with TEST_*  are sourced by the test scripts and can
be used to override the default values.  For complete details see:

https://github.com/zfsonlinux/zfs-buildbot/

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-12-23 17:29:35 -08:00
tuxoko a3fcc7c48b Fix zfs_dirty_data_max overflow on 32-bit
On 32-bit, the calculation of zfs_dirty_data_max from physmem will overflow,
causing it to be smaller than zfs_dirty_data_sync. This delays txgs while
nothing is being written to disk. The end result is horrendous write speed.

On a 32-bit VM with 4G of RAM, a simple dd reached only ~7MB/s before this
patch. Now it can reach speeds on par with a 64-bit VM.
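
The wraparound can be reproduced with a small model. The formula
(physmem * PAGESIZE * percent / 100), the 10 percent setting, the 64 MiB
sync threshold, and the page count are all assumptions chosen for
illustration; the mask emulates a 32-bit unsigned long.

```python
MASK32 = (1 << 32) - 1  # width of unsigned long on a 32-bit kernel
PAGESIZE = 4096
DIRTY_DATA_SYNC = 64 * 1024 * 1024  # assumed zfs_dirty_data_sync (64 MiB)


def dirty_data_max(physmem_pages, percent, mask=None):
    """Compute physmem * PAGESIZE * percent / 100, optionally wrapping."""
    wrap = (lambda v: v & mask) if mask is not None else (lambda v: v)
    total = wrap(physmem_pages * PAGESIZE)
    total = wrap(total * percent)
    return total // 100


physmem = 1_000_000  # roughly 3.8 GiB of 4 KiB pages, an illustrative value

wrapped = dirty_data_max(physmem, 10, mask=MASK32)  # 32-bit arithmetic
correct = dirty_data_max(physmem, 10)               # wide arithmetic

print(wrapped, correct, DIRTY_DATA_SYNC)
```

Under these assumptions the wrapped result lands below the sync threshold
while the wide result is well above it.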

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3973
2015-12-23 17:29:35 -08:00
tuxoko becc31dda7 Fix null pointer in arc_kmem_reap_now on 32-bit
On 32-bit systems, zio_buf_cache is limited to 1M. Entries for sizes larger
than that are all NULL, so we need to avoid reaping them.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3973
2015-12-23 17:29:35 -08:00
Brian Behlendorf d8ac2b39c9 Fix --enable-linux-builtin
Adding VPATH support, commit 47a4a6f, required that a `src`
and `obj` line be added to the top of the Makefiles.  They
must be removed from the Makefiles when builtin.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/spl#481
Issue zfsonlinux/spl#498
2015-12-23 17:29:34 -08:00
tuxoko 13a9527913 Prevent rm modules.* when make install
This was originally in fe0ed8f910, but was somehow
changed and no longer works. It causes the following error:

modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin'

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4027
2015-12-23 17:29:34 -08:00
Brian Behlendorf 627b35a68d Add zap_prefetch() interface
Provide a generic interface to prefetch ZAP entries by name.  This
functionality is being added for external consumers such as Lustre.
It is based on the existing zap_prefetch_uint64() version which is
used by the deduplication code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4061
2015-12-23 17:29:34 -08:00
cable2999 5856071367 Update arcstat.py to remove deprecated rmis reference.
Running arcstat.py -x currently throws KeyError due to rmis being
absent; it was removed in commit ca0bf58.

Signed-off-by: cable2999 <cable2999@users.noreply.github.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3931
2015-12-23 17:29:34 -08:00
Olaf Faaland 637fb2fcfa Remove "index" column from dbufstat.py
Commit ca0bf58d to address arcs_mtx contention removed column "index"
from the output of kstats/dbuf.

dbufstat.py was not updated to reflect this, which causes it to crash
when run with -bx

This removes "index" from hardcoded lists of columns.

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4096
2015-12-23 17:29:34 -08:00
Brian Behlendorf 8bdac257b8 Fix zfsctl_lookup_objset() deadlock
The zfsctl_snapshot_unmount_delay() function must not be called
from zfsctl_lookup_objset() while it is currently holding the
zfs_snapshot_lock.  This will result in a deadlock.  It is safe
to call zfsctl_snapshot_unmount_delay_impl() directly because the
function already has a reference on the zfs_snapentry_t.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #3997
2015-12-23 17:29:34 -08:00
Brian Behlendorf a0b4635fb0 Either _ILP32 or _LP64 must be defined
For some arm, powerpc, and sparc platforms it was possible that
neither _ILP32 nor _LP64 would be defined.  Update the isa_defs.h
header to explicitly set these macros and generate a compile error
in the case neither are defined.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4048
2015-12-23 17:29:34 -08:00
Brian Behlendorf cb98d1ef27 Hold the zfs_snapentry_t before dispatch
While exceptionally unlikely to cause a problem the zfs_snapentry_t
hold should be taken before the dispatch to prevent any possibility
of the task being processed before the hold.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2015-12-23 17:29:34 -08:00
Chunwei Chen 813a4af70e Fix snapshot automount race cause EREMOTE
When a concurrent mount finishes just before the call to
zfsctl_snapshot_ismounted, if we return EISDIR, the VFS will return
with EREMOTE. We should instead just return 0, so the VFS may retry and
would likely notice the dentry is already mounted. This is in line
with the case where the usermode helper returns EBUSY.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-12-23 17:29:34 -08:00
Brian Behlendorf e16c04d643 Change zfs_snapshot_lock from mutex to rw lock
By changing the zfs_snapshot_lock from a mutex to a rw lock the
zfsctl_lookup_objset() function can be allowed to run concurrently.
This should reduce the latency of fh_to_dentry lookups in ZFS
snapshots which are being accessed over NFS.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2015-12-23 17:29:34 -08:00
Chunwei Chen 9b41d9c1b2 Use spa as key besides objsetid for snapentry
objsetid is not unique across pools, so using it as the sole key would cause
a panic when automounting two snapshots on different pools with the same
objsetid. We fix this by adding the spa pointer as an additional key.
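
The collision is easy to see with a dictionary standing in for the AVL tree.
The pool names, snapshot names, and objset id are hypothetical; note that a
dict silently overwrites on a duplicate key, whereas the duplicate avl_add
in the kernel panics.

```python
# Hypothetical snapshots: two pools that happen to reuse objset id 21.
snapshots = [
    ("tank", 21, "tank/fs@monday"),
    ("backup", 21, "backup/fs@monday"),
]

by_objsetid = {}
by_spa_and_objsetid = {}
for spa, objsetid, name in snapshots:
    # Keyed by objsetid alone, the second entry collides with the first.
    by_objsetid[objsetid] = name
    # Adding the pool to the key keeps both entries distinct.
    by_spa_and_objsetid[(spa, objsetid)] = name

print(len(by_objsetid), len(by_spa_and_objsetid))
```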

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3948
Issue #3786
Issue #3887
2015-12-23 17:29:34 -08:00
Chunwei Chen c5e30a0ff9 Fix snapshot automount behavior when concurrent or fail
When concurrent threads access the snapdir, one will succeed in the user
helper mount while the others will get EBUSY. However, the original code treats
those EBUSY threads as successes and goes on to do zfsctl_snapshot_add, which
causes a repeated avl_add and thus a panic.

Also, if the snapshot is already mounted somewhere else, a thread accessing
the snapdir will also get EBUSY from the user helper mount. This causes
strange behavior: follow_down_one will fail, and then follow_up will jump
up to the mountpoint of the filesystem and confuse the hell out of the VFS.

The patch fixes both behaviors by returning 0 immediately for the EBUSY
threads. Note, this has a side effect in the second case, where the VFS will
retry several times before returning ELOOP.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4018
2015-12-23 17:29:34 -08:00
Brian Behlendorf 279e27db23 Set 'zfs_expire_snapshot=0' to disable auto-unmount
There are cases where it's desirable that auto-mounted snapshots
not expire after a fixed duration.  They should be unmounted only
when the filesystem they are a snapshot of is unmounted.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2015-12-23 17:29:34 -08:00
Brian Behlendorf 5f4004efc0 Fix vdev_queue_aggregate() deadlock
This deadlock may manifest itself in slightly different ways but
at the core it is caused by a memory allocation blocking on file-
system reclaim in the zio pipeline.  This is normally impossible
because zio_execute() disables filesystem reclaim by setting
PF_FSTRANS on the thread.  However, kmem cache allocations may
still indirectly block on file system reclaim while holding the
critical vq->vq_lock as shown below.

To resolve this issue zio_buf_alloc_flags() is introduced which
allows allocation flags to be passed.  This can then be used in
vdev_queue_aggregate() with KM_NOSLEEP when allocating the
aggregate IO buffer.  Since aggregating the IO is purely a
performance optimization we want this to either succeed or fail
quickly.  Trying too hard to allocate this memory under the
vq->vq_lock can negatively impact performance and result in
this deadlock.

* z_wr_iss
zio_vdev_io_start
  vdev_queue_io -> Takes vq->vq_lock
    vdev_queue_io_to_issue
      vdev_queue_aggregate
        zio_buf_alloc -> Waiting on spl_kmem_cache process

* z_wr_int
zio_vdev_io_done
  vdev_queue_io_done
    mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss

* txg_sync
spa_sync
  dsl_pool_sync
    zio_wait -> Waiting on zio being handled by z_wr_int

* spl_kmem_cache
spl_cache_grow_work
  kv_alloc
    spl_vmalloc
      ...
      evict
        zpl_evict_inode
          zfs_inactive
            dmu_tx_wait
              txg_wait_open -> Waiting on txg_sync

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #3808
Closes #3867
2015-12-23 17:29:34 -08:00
Kamil Domański 65d65b7a8d Skip GPL-only symbols test when cross-compiling
Signed-off-by: Kamil Domański <kamil@domanski.co>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4107
2015-12-23 17:29:34 -08:00
DHE f7dfb8b07a Make zio_taskq_batch_pct user configurable
Adds zio_taskq_batch_pct as an exported module parameter,
allowing users to modify it at module load time.

Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4110
2015-12-23 17:29:34 -08:00
Chunwei Chen 15126e5d08 Linux 4.4 compat: xattr operations takes xattr_handler
The xattr_handler->{list,get,set} functions were changed to take a
xattr_handler, and the handler_flags argument was removed and should be
accessed via handler->flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4021
2015-12-04 14:59:10 -08:00
Chunwei Chen e909a45d22 Linux 4.4 compat: make_request_fn returns blk_qc_t
As part of block polling support in Linux 4.4, make_request_fn should
return a cookie value of type blk_qc_t. For now, we make zvol_request
always return BLK_QC_T_NONE until we assess whether and how we want
to support block polling.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4021
2015-12-04 14:58:32 -08:00
Ned Bass 9aaf60b66d Prepare to tag zfs-0.6.5.3
META file and release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2015-10-13 15:35:23 -07:00
Justin T. Gibbs f9f5394f74 Illumos 6267 - dn_bonus evicted too early
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/6267
  https://github.com/illumos/illumos-gate/commit/d205810

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Ned Bass <bass6@llnl.gov>
Issue #3865
Issue #3443
2015-10-13 15:32:16 -07:00
Chunwei Chen cd887ab869 Fix use-after-free in vdev_disk_physio_completion
Currently, vdev_disk_physio_completion will try to wake up a waiter without
first checking that one exists. This creates a race window in which complete is
called after dr is freed.

We add dr_wait in dio_request to indicate the existence of a waiter. Also,
remove dr_rw since no one is using it, and reorder dr_ref to make the struct
more compact on 64-bit.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2015-10-13 15:31:44 -07:00
James Lee 16a276f109 zfs-import: Perform verbatim import using cache file
This change modifies the import service to use the default cache file
to perform a verbatim import of pools at boot.  This fixes code that
searches all devices and imported all visible pools.

Using the cache file is in keeping with the way ZFS has always worked,
how Solaris, Illumos, FreeBSD, and systemd performs imports, and is how
it is written in the man page (zpool(1M,8)):

    All pools  in  this  cache  are  automatically imported when the
    system boots.

Importantly, the cache contains information needed for importing
multipath devices, and helps control which pools get imported in more
dynamic environments like SANs, which may have thousands of visible
and constantly changing pools, which the ZFS_POOL_EXCEPTIONS variable
is not equipped to handle.  Verbatim imports prevent rogue pools from
being automatically imported and mounted where they shouldn't be.

The change also stops the service from exporting pools at shutdown.
Exporting pools is only meant to be performed explicitly by the
administrator of the system.

The old behavior of searching and importing all visible pools is
preserved and can be switched on by heeding the warning and toggling
the ZPOOL_IMPORT_ALL_VISIBLE variable in /etc/default/zfs.

Signed-off-by: James Lee <jlee@thestaticvoid.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3777
Closes #3526
2015-10-13 14:33:49 -07:00
140 changed files with 5868 additions and 2331 deletions

META

@ -1,7 +1,7 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 0.6.5.2
Version: 0.6.5.11
Release: 1
Release-Tags: relext
License: CDDL

View File

@ -55,6 +55,13 @@ shellcheck:
done; \
fi
lint: cppcheck
cppcheck:
@if type cppcheck > /dev/null 2>&1; then \
cppcheck --quiet --force ${top_srcdir}; \
fi
ctags:
$(RM) tags
find $(top_srcdir) -name .git -prune -o -name '*.[hc]' | xargs ctags

TEST Normal file

@ -0,0 +1,91 @@
#!/bin/sh
### prepare
#TEST_PREPARE_WATCHDOG="no"
### SPLAT
#TEST_SPLAT_SKIP="yes"
#TEST_SPLAT_OPTIONS="-acvx"
### ztest
#TEST_ZTEST_SKIP="yes"
#TEST_ZTEST_TIMEOUT=1800
#TEST_ZTEST_DIR="/var/tmp/"
#TEST_ZTEST_OPTIONS="-V"
### ziltest
#TEST_ZILTEST_SKIP="yes"
#TEST_ZILTEST_OPTIONS=""
### zconfig
#TEST_ZCONFIG_SKIP="yes"
TEST_ZCONFIG_OPTIONS="-c -s10"
### zimport
#TEST_ZIMPORT_SKIP="yes"
#TEST_ZIMPORT_DIR="/var/tmp/zimport"
#TEST_ZIMPORT_VERSIONS="master installed"
#TEST_ZIMPORT_POOLS="zol-0.6.1 zol-0.6.2 master installed"
#TEST_ZIMPORT_OPTIONS="-c"
### xfstests
#TEST_XFSTESTS_SKIP="yes"
#TEST_XFSTESTS_URL="https://github.com/behlendorf/xfstests/archive/"
#TEST_XFSTESTS_VER="zfs.tar.gz"
#TEST_XFSTESTS_POOL="tank"
#TEST_XFSTESTS_FS="xfstests"
#TEST_XFSTESTS_VDEV="/var/tmp/vdev"
#TEST_XFSTESTS_OPTIONS=""
### filebench
#TEST_FILEBENCH_SKIP="yes"
#TEST_FILEBENCH_URL="http://build.zfsonlinux.org/"
#TEST_FILEBENCH_VER="filebench-1.4.9.1.tar.gz"
#TEST_FILEBENCH_RUNTIME=10
#TEST_FILEBENCH_POOL="tank"
#TEST_FILEBENCH_FS="filebench"
#TEST_FILEBENCH_VDEV="/var/tmp/vdev"
#TEST_FILEBENCH_DIR="/$TEST_FILEBENCH_POOL/$TEST_FILEBENCH_FS"
#TEST_FILEBENCH_OPTIONS=""
### zfsstress
#TEST_ZFSSTRESS_SKIP="yes"
#TEST_ZFSSTRESS_URL="https://github.com/nedbass/zfsstress/archive/"
#TEST_ZFSSTRESS_VER="master.tar.gz"
#TEST_ZFSSTRESS_RUNTIME=300
#TEST_ZFSSTRESS_POOL="tank"
#TEST_ZFSSTRESS_FS="fish"
#TEST_ZFSSTRESS_VDEV="/var/tmp/vdev"
#TEST_ZFSSTRESS_DIR="/$TEST_ZFSSTRESS_POOL/$TEST_ZFSSTRESS_FS"
#TEST_ZFSSTRESS_OPTIONS=""
### per-builder customization
#
# BB_NAME=builder-name <distribution-version-architecture-type>
# - distribution=Amazon,Debian,Fedora,RHEL,SUSE,Ubuntu
# - version=x.y
# - architecture=x86_64,i686,arm,aarch64
# - type=build,test
#
case "$BB_NAME" in
Amazon*)
;;
CentOS*)
# Sporadic segmentation faults
TEST_ZTEST_SKIP="yes"
# Sporadic VERIFY(!zilog_is_dirty(zilog)) failed
TEST_ZILTEST_SKIP="yes"
;;
Debian*)
;;
Fedora*)
;;
RHEL*)
;;
SUSE*)
;;
Ubuntu*)
;;
*)
;;
esac

View File

@ -94,7 +94,7 @@ def get_Kstat():
def div1():
sys.stdout.write("\n")
for i in xrange(18):
for i in range(18):
sys.stdout.write("%s" % "----")
sys.stdout.write("\n")
@ -1060,7 +1060,7 @@ def _tunable_summary(Kstat):
if alternate_tunable_layout:
format = "\t%s=%s\n"
if show_tunable_descriptions and descriptions.has_key(name):
if show_tunable_descriptions and name in descriptions:
sys.stdout.write("\t# %s\n" % descriptions[name])
sys.stdout.write(format % (name, values[name]))
@ -1132,7 +1132,7 @@ def main():
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError , e:
except IndexError as e:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
sys.exit()

View File

@ -97,8 +97,8 @@ cols = {
v = {}
hdr = ["time", "read", "miss", "miss%", "dmis", "dm%", "pmis", "pm%", "mmis",
"mm%", "arcsz", "c"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "rmis",
"dread", "pread", "read"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "dread",
"pread", "read"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output


@ -34,7 +34,7 @@ import errno
bhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize"]
bxhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize",
"meta", "state", "dbholds", "list", "atype", "index", "flags",
"meta", "state", "dbholds", "list", "atype", "flags",
"count", "asize", "access", "mru", "gmru", "mfu", "gmfu", "l2",
"l2_dattr", "l2_asize", "l2_comp", "aholds", "dtype", "btype",
"data_bs", "meta_bs", "bsize", "lvls", "dholds", "blocks", "dsize"]
@ -45,7 +45,7 @@ dxhdr = ["pool", "objset", "object", "dtype", "btype", "data_bs", "meta_bs",
"bsize", "lvls", "dholds", "blocks", "dsize", "cached", "direct",
"indirect", "bonus", "spill"]
dincompat = ["level", "blkid", "offset", "dbsize", "meta", "state", "dbholds",
"list", "atype", "index", "flags", "count", "asize", "access",
"list", "atype", "flags", "count", "asize", "access",
"mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr", "l2_asize",
"l2_comp", "aholds"]
@ -53,7 +53,7 @@ thdr = ["pool", "objset", "dtype", "cached"]
txhdr = ["pool", "objset", "dtype", "cached", "direct", "indirect",
"bonus", "spill"]
tincompat = ["object", "level", "blkid", "offset", "dbsize", "meta", "state",
"dbholds", "list", "atype", "index", "flags", "count", "asize",
"dbholds", "list", "atype", "flags", "count", "asize",
"access", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr",
"l2_asize", "l2_comp", "aholds", "btype", "data_bs", "meta_bs",
"bsize", "lvls", "dholds", "blocks", "dsize"]
@ -72,7 +72,6 @@ cols = {
"dbholds": [7, 1000, "number of holds on buffer"],
"list": [4, -1, "which ARC list contains this buffer"],
"atype": [7, -1, "ARC header type (data or metadata)"],
"index": [5, -1, "buffer's index into its ARC list"],
"flags": [8, -1, "ARC read flags"],
"count": [5, -1, "ARC data count"],
"asize": [7, 1024, "size of this ARC buffer"],
@ -387,9 +386,9 @@ def update_dict(d, k, line, labels):
def print_dict(d):
print_header()
for pool in d.keys():
for objset in d[pool].keys():
for v in d[pool][objset].values():
for pool in list(d.keys()):
for objset in list(d[pool].keys()):
for v in list(d[pool][objset].values()):
print_values(v)


@ -32,6 +32,7 @@
#include <sys/stat.h>
#include <libzfs.h>
#include <locale.h>
#include <getopt.h>
#define ZS_COMMENT 0x00000000 /* comment */
#define ZS_ZFSUTIL 0x00000001 /* caller is zfs(8) */
@ -76,7 +77,10 @@ static const option_map_t option_map[] = {
{ MNTOPT_RELATIME, MS_RELATIME, ZS_COMMENT },
#endif
#ifdef MS_STRICTATIME
{ MNTOPT_DFRATIME, MS_STRICTATIME, ZS_COMMENT },
{ MNTOPT_STRICTATIME, MS_STRICTATIME, ZS_COMMENT },
#endif
#ifdef MS_LAZYTIME
{ MNTOPT_LAZYTIME, MS_LAZYTIME, ZS_COMMENT },
#endif
{ MNTOPT_CONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_FSCONTEXT, MS_COMMENT, ZS_COMMENT },
@ -389,7 +393,7 @@ main(int argc, char **argv)
opterr = 0;
/* check options */
while ((c = getopt(argc, argv, "sfnvo:h?")) != -1) {
while ((c = getopt_long(argc, argv, "sfnvo:h?", 0, 0)) != -1) {
switch (c) {
case 's':
sloppy = 1;
@ -604,10 +608,23 @@ main(int argc, char **argv)
"failed for unknown reason.\n"), dataset);
}
return (MOUNT_SYSERR);
#ifdef MS_MANDLOCK
case EPERM:
if (mntflags & MS_MANDLOCK) {
(void) fprintf(stderr, gettext("filesystem "
"'%s' has the 'nbmand=on' property set, "
"this mount\noption may be disabled in "
"your kernel. Use 'zfs set nbmand=off'\n"
"to disable this option and try to "
"mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR);
}
/* fallthru */
#endif
default:
(void) fprintf(stderr, gettext("filesystem "
"'%s' can not be mounted due to error "
"%d\n"), dataset, errno);
"'%s' can not be mounted: %s\n"), dataset,
strerror(errno));
return (MOUNT_USAGE);
}
}
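
The error-handling hunk above replaces a bare errno number with strerror() text and adds a dedicated message for EPERM when mandatory locking was requested. A minimal sketch of that pattern follows. The function name sketch_mount_error() and the constant SKETCH_MANDLOCK are hypothetical stand-ins (the real code tests MS_MANDLOCK and prints to stderr through gettext()), so treat this as an illustration rather than the mount helper's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the MS_MANDLOCK mount flag. */
#define	SKETCH_MANDLOCK	0x40UL

/*
 * Format a human-readable reason for a failed mount into buf.
 * Returns whatever snprintf() returns for the chosen message.
 */
static int
sketch_mount_error(char *buf, size_t len, const char *dataset, int err,
    unsigned long mntflags)
{
	assert(buf != NULL && dataset != NULL);

	switch (err) {
	case EPERM:
		/* Only claim nbmand is the cause when it was requested. */
		if (mntflags & SKETCH_MANDLOCK) {
			return (snprintf(buf, len, "filesystem '%s' has "
			    "'nbmand=on' set, which this kernel may not "
			    "support", dataset));
		}
		/* FALLTHROUGH */
	default:
		/* Name the error instead of printing its number. */
		return (snprintf(buf, len, "filesystem '%s' can not be "
		    "mounted: %s", dataset, strerror(err)));
	}
}
```

Falling through from EPERM to the default case keeps a single generic message for every error that has no specific explanation.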


@ -184,9 +184,9 @@ sas_handler() {
return
fi
# Get the raw scsi device name from multipath -l. Strip off
# Get the raw scsi device name from multipath -ll. Strip off
# leading pipe symbols to make field numbering consistent.
DEV=`multipath -l $DM_NAME |
DEV=`multipath -ll $DM_NAME |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
if [ -z "$DEV" ] ; then
return


@ -67,13 +67,22 @@
zio_compress_table[(idx)].ci_name : "UNKNOWN")
#define ZDB_CHECKSUM_NAME(idx) ((idx) < ZIO_CHECKSUM_FUNCTIONS ? \
zio_checksum_table[(idx)].ci_name : "UNKNOWN")
#define ZDB_OT_NAME(idx) ((idx) < DMU_OT_NUMTYPES ? \
dmu_ot[(idx)].ot_name : DMU_OT_IS_VALID(idx) ? \
dmu_ot_byteswap[DMU_OT_BYTESWAP(idx)].ob_name : "UNKNOWN")
#define ZDB_OT_TYPE(idx) ((idx) < DMU_OT_NUMTYPES ? (idx) : \
(((idx) == DMU_OTN_ZAP_DATA || (idx) == DMU_OTN_ZAP_METADATA) ? \
DMU_OT_ZAP_OTHER : DMU_OT_NUMTYPES))
static char *
zdb_ot_name(dmu_object_type_t type)
{
if (type < DMU_OT_NUMTYPES)
return (dmu_ot[type].ot_name);
else if ((type & DMU_OT_NEWTYPE) &&
((type & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS))
return (dmu_ot_byteswap[type & DMU_OT_BYTESWAP_MASK].ob_name);
else
return ("UNKNOWN");
}
#ifndef lint
extern int zfs_recover;
extern uint64_t zfs_arc_max, zfs_arc_meta_limit;
@ -469,7 +478,7 @@ static void
dump_bpobj_subobjs(objset_t *os, uint64_t object, void *data, size_t size)
{
dmu_object_info_t doi;
uint64_t i;
int64_t i;
VERIFY0(dmu_object_info(os, object, &doi));
uint64_t *subobjs = kmem_alloc(doi.doi_max_offset, KM_SLEEP);
@ -488,7 +497,7 @@ dump_bpobj_subobjs(objset_t *os, uint64_t object, void *data, size_t size)
}
for (i = 0; i <= last_nonzero; i++) {
(void) printf("\t%llu\n", (longlong_t)subobjs[i]);
(void) printf("\t%llu\n", (u_longlong_t)subobjs[i]);
}
kmem_free(subobjs, doi.doi_max_offset);
}
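
The switch from uint64_t to int64_t for the loop index matters if last_nonzero is a signed value that can stay at -1 when no entry is set. The hunk does not show that declaration, so this is an assumption. In a comparison between an unsigned 64-bit index and a signed -1, C converts the -1 to a very large unsigned number, and the loop would run far past the array. The sketch below uses a hypothetical helper name to show the signed variant behaving as intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many entries a "print up to the last nonzero entry" loop
 * would visit. Hypothetical helper, not part of zdb.
 */
static size_t
sketch_visit_through_last_nonzero(const uint64_t *vals, size_t n)
{
	int64_t last_nonzero = -1;	/* -1 means "nothing found" */
	int64_t i;			/* signed, to compare safely with -1 */
	size_t visited = 0;

	assert(vals != NULL || n == 0);

	for (i = 0; (size_t)i < n; i++) {
		if (vals[i] != 0)
			last_nonzero = i;
	}

	/* With a signed index, 0 <= -1 is false and the loop is skipped. */
	for (i = 0; i <= last_nonzero; i++)
		visited++;

	return (visited);
}
```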
@ -1925,12 +1934,12 @@ dump_object(objset_t *os, uint64_t object, int verbosity, int *print_header)
(void) printf("%10lld %3u %5s %5s %5s %5s %6s %s%s\n",
(u_longlong_t)object, doi.doi_indirection, iblk, dblk,
asize, lsize, fill, ZDB_OT_NAME(doi.doi_type), aux);
asize, lsize, fill, zdb_ot_name(doi.doi_type), aux);
if (doi.doi_bonus_type != DMU_OT_NONE && verbosity > 3) {
(void) printf("%10s %3s %5s %5s %5s %5s %6s %s\n",
"", "", "", "", "", bonus_size, "bonus",
ZDB_OT_NAME(doi.doi_bonus_type));
zdb_ot_name(doi.doi_bonus_type));
}
if (verbosity >= 4) {
@ -3357,8 +3366,10 @@ zdb_read_block(char *thing, spa_t *spa)
continue;
p = &flagstr[i + 1];
if (bit == ZDB_FLAG_PRINT_BLKPTR)
if (bit == ZDB_FLAG_PRINT_BLKPTR) {
blkptr_offset = strtoull(p, &p, 16);
i = p - &flagstr[i + 1];
}
if (*p != ':' && *p != '\0') {
(void) printf("***Invalid flag arg: '%s'\n", s);
free(dup);
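
The zdb_read_block() hunk above makes the flag loop account for the hexadecimal digits that strtoull() consumes, so those digits are not later misread as flag letters. The following sketch shows the general technique on a simplified, hypothetical flag grammar in which only 'b' takes an argument. It advances the index by the number of consumed characters, which is one way to do it and is not a copy of the zdb code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Scan a flag string such as "db10r", where 'b' is followed by a hex
 * argument. Returns the number of flag letters seen and stores the
 * last 'b' argument in *offset. Hypothetical helper for illustration.
 */
static int
sketch_parse_flags(const char *flagstr, unsigned long long *offset)
{
	int nflags = 0;
	size_t i;

	assert(flagstr != NULL && offset != NULL);

	for (i = 0; flagstr[i] != '\0'; i++) {
		nflags++;
		if (flagstr[i] == 'b') {
			const char *arg = &flagstr[i + 1];
			char *end;

			*offset = strtoull(arg, &end, 16);
			/* Skip the characters strtoull() consumed. */
			i += (size_t)(end - arg);
		}
	}
	return (nflags);
}
```

Without the index adjustment, the digits "1" and "0" in "db10r" would each be counted as a separate flag.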


@ -444,13 +444,13 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
/*
* If we're recursive, then we always allow filesystems as
* arguments. If we also are interested in snapshots, then we
* can take volumes as well.
* arguments. If we also are interested in snapshots or
* bookmarks, then we can take volumes as well.
*/
argtype = types;
if (flags & ZFS_ITER_RECURSE) {
argtype |= ZFS_TYPE_FILESYSTEM;
if (types & ZFS_TYPE_SNAPSHOT)
if (types & (ZFS_TYPE_SNAPSHOT | ZFS_TYPE_BOOKMARK))
argtype |= ZFS_TYPE_VOLUME;
}


@ -207,7 +207,7 @@ static const char *
get_usage(zpool_help_t idx) {
switch (idx) {
case HELP_ADD:
return (gettext("\tadd [-fn] [-o property=value] "
return (gettext("\tadd [-fgLnP] [-o property=value] "
"<pool> <vdev> ...\n"));
case HELP_ATTACH:
return (gettext("\tattach [-f] [-o property=value] "
@ -237,12 +237,12 @@ get_usage(zpool_help_t idx) {
"[-R root] [-F [-n]]\n"
"\t <pool | id> [newpool]\n"));
case HELP_IOSTAT:
return (gettext("\tiostat [-v] [-T d|u] [-y] [pool] ... "
return (gettext("\tiostat [-gLPvy] [-T d|u] [pool] ... "
"[interval [count]]\n"));
case HELP_LABELCLEAR:
return (gettext("\tlabelclear [-f] <vdev>\n"));
case HELP_LIST:
return (gettext("\tlist [-Hv] [-o property[,...]] "
return (gettext("\tlist [-gHLPv] [-o property[,...]] "
"[-T d|u] [pool] ... [interval [count]]\n"));
case HELP_OFFLINE:
return (gettext("\toffline [-t] <pool> <device> ...\n"));
@ -258,8 +258,8 @@ get_usage(zpool_help_t idx) {
case HELP_SCRUB:
return (gettext("\tscrub [-s] <pool> ...\n"));
case HELP_STATUS:
return (gettext("\tstatus [-vxD] [-T d|u] [pool] ... [interval "
"[count]]\n"));
return (gettext("\tstatus [-gLPvxD] [-T d|u] [pool] ... "
"[interval [count]]\n"));
case HELP_UPGRADE:
return (gettext("\tupgrade\n"
"\tupgrade -v\n"
@ -272,7 +272,7 @@ get_usage(zpool_help_t idx) {
case HELP_SET:
return (gettext("\tset <property=value> <pool> \n"));
case HELP_SPLIT:
return (gettext("\tsplit [-n] [-R altroot] [-o mntopts]\n"
return (gettext("\tsplit [-gLnP] [-R altroot] [-o mntopts]\n"
"\t [-o property=value] <pool> <newpool> "
"[<device> ...]\n"));
case HELP_REGUID:
@ -371,7 +371,7 @@ usage(boolean_t requested)
void
print_vdev_tree(zpool_handle_t *zhp, const char *name, nvlist_t *nv, int indent,
boolean_t print_logs)
boolean_t print_logs, int name_flags)
{
nvlist_t **child;
uint_t c, children;
@ -392,9 +392,9 @@ print_vdev_tree(zpool_handle_t *zhp, const char *name, nvlist_t *nv, int indent,
if ((is_log && !print_logs) || (!is_log && print_logs))
continue;
vname = zpool_vdev_name(g_zfs, zhp, child[c], B_FALSE);
vname = zpool_vdev_name(g_zfs, zhp, child[c], name_flags);
print_vdev_tree(zhp, vname, child[c], indent + 2,
B_FALSE);
B_FALSE, name_flags);
free(vname);
}
}
@ -502,12 +502,15 @@ add_prop_list_default(const char *propname, char *propval, nvlist_t **props,
}
/*
* zpool add [-fn] [-o property=value] <pool> <vdev> ...
* zpool add [-fgLnP] [-o property=value] <pool> <vdev> ...
*
* -f Force addition of devices, even if they appear in use
* -g Display guid for individual vdev name.
* -L Follow links when resolving vdev path name.
* -n Do not add the devices, but display the resulting layout if
* they were to be added.
* -o Set property=value.
* -P Display full path for vdev name.
*
* Adds the given vdevs to 'pool'. As with create, the bulk of this work is
* handled by get_vdev_spec(), which constructs the nvlist needed to pass to
@ -518,6 +521,7 @@ zpool_do_add(int argc, char **argv)
{
boolean_t force = B_FALSE;
boolean_t dryrun = B_FALSE;
int name_flags = 0;
int c;
nvlist_t *nvroot;
char *poolname;
@ -528,11 +532,17 @@ zpool_do_add(int argc, char **argv)
char *propval;
/* check options */
while ((c = getopt(argc, argv, "fno:")) != -1) {
while ((c = getopt(argc, argv, "fgLno:P")) != -1) {
switch (c) {
case 'f':
force = B_TRUE;
break;
case 'g':
name_flags |= VDEV_NAME_GUID;
break;
case 'L':
name_flags |= VDEV_NAME_FOLLOW_LINKS;
break;
case 'n':
dryrun = B_TRUE;
break;
@ -549,6 +559,9 @@ zpool_do_add(int argc, char **argv)
(add_prop_list(optarg, propval, &props, B_TRUE)))
usage(B_FALSE);
break;
case 'P':
name_flags |= VDEV_NAME_PATH;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -606,15 +619,19 @@ zpool_do_add(int argc, char **argv)
"configuration:\n"), zpool_get_name(zhp));
/* print original main pool and new tree */
print_vdev_tree(zhp, poolname, poolnvroot, 0, B_FALSE);
print_vdev_tree(zhp, NULL, nvroot, 0, B_FALSE);
print_vdev_tree(zhp, poolname, poolnvroot, 0, B_FALSE,
name_flags);
print_vdev_tree(zhp, NULL, nvroot, 0, B_FALSE, name_flags);
/* Do the same for the logs */
if (num_logs(poolnvroot) > 0) {
print_vdev_tree(zhp, "logs", poolnvroot, 0, B_TRUE);
print_vdev_tree(zhp, NULL, nvroot, 0, B_TRUE);
print_vdev_tree(zhp, "logs", poolnvroot, 0, B_TRUE,
name_flags);
print_vdev_tree(zhp, NULL, nvroot, 0, B_TRUE,
name_flags);
} else if (num_logs(nvroot) > 0) {
print_vdev_tree(zhp, "logs", nvroot, 0, B_TRUE);
print_vdev_tree(zhp, "logs", nvroot, 0, B_TRUE,
name_flags);
}
/* Do the same for the caches */
@ -624,7 +641,7 @@ zpool_do_add(int argc, char **argv)
(void) printf(gettext("\tcache\n"));
for (c = 0; c < l2children; c++) {
vname = zpool_vdev_name(g_zfs, NULL,
l2child[c], B_FALSE);
l2child[c], name_flags);
(void) printf("\t %s\n", vname);
free(vname);
}
@ -635,7 +652,7 @@ zpool_do_add(int argc, char **argv)
(void) printf(gettext("\tcache\n"));
for (c = 0; c < l2children; c++) {
vname = zpool_vdev_name(g_zfs, NULL,
l2child[c], B_FALSE);
l2child[c], name_flags);
(void) printf("\t %s\n", vname);
free(vname);
}
@ -1082,9 +1099,9 @@ zpool_do_create(int argc, char **argv)
(void) printf(gettext("would create '%s' with the "
"following layout:\n\n"), poolname);
print_vdev_tree(NULL, poolname, nvroot, 0, B_FALSE);
print_vdev_tree(NULL, poolname, nvroot, 0, B_FALSE, 0);
if (num_logs(nvroot) > 0)
print_vdev_tree(NULL, "logs", nvroot, 0, B_TRUE);
print_vdev_tree(NULL, "logs", nvroot, 0, B_TRUE, 0);
ret = 0;
} else {
@ -1311,13 +1328,15 @@ zpool_do_export(int argc, char **argv)
* name column.
*/
static int
max_width(zpool_handle_t *zhp, nvlist_t *nv, int depth, int max)
max_width(zpool_handle_t *zhp, nvlist_t *nv, int depth, int max,
int name_flags)
{
char *name = zpool_vdev_name(g_zfs, zhp, nv, B_TRUE);
char *name;
nvlist_t **child;
uint_t c, children;
int ret;
name = zpool_vdev_name(g_zfs, zhp, nv, name_flags | VDEV_NAME_TYPE_ID);
if (strlen(name) + depth > max)
max = strlen(name) + depth;
@ -1327,7 +1346,7 @@ max_width(zpool_handle_t *zhp, nvlist_t *nv, int depth, int max)
&child, &children) == 0) {
for (c = 0; c < children; c++)
if ((ret = max_width(zhp, child[c], depth + 2,
max)) > max)
max, name_flags)) > max)
max = ret;
}
@ -1335,7 +1354,7 @@ max_width(zpool_handle_t *zhp, nvlist_t *nv, int depth, int max)
&child, &children) == 0) {
for (c = 0; c < children; c++)
if ((ret = max_width(zhp, child[c], depth + 2,
max)) > max)
max, name_flags)) > max)
max = ret;
}
@ -1343,11 +1362,10 @@ max_width(zpool_handle_t *zhp, nvlist_t *nv, int depth, int max)
&child, &children) == 0) {
for (c = 0; c < children; c++)
if ((ret = max_width(zhp, child[c], depth + 2,
max)) > max)
max, name_flags)) > max)
max = ret;
}
return (max);
}
@ -1399,9 +1417,9 @@ find_spare(zpool_handle_t *zhp, void *data)
/*
* Print out configuration state as requested by status_callback.
*/
void
static void
print_status_config(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
int namewidth, int depth, boolean_t isspare)
int namewidth, int depth, boolean_t isspare, int name_flags)
{
nvlist_t **child;
uint_t c, children;
@ -1537,20 +1555,21 @@ print_status_config(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
&ishole);
if (islog || ishole)
continue;
vname = zpool_vdev_name(g_zfs, zhp, child[c], B_TRUE);
vname = zpool_vdev_name(g_zfs, zhp, child[c],
name_flags | VDEV_NAME_TYPE_ID);
print_status_config(zhp, vname, child[c],
namewidth, depth + 2, isspare);
namewidth, depth + 2, isspare, name_flags);
free(vname);
}
}
/*
* Print the configuration of an exported pool. Iterate over all vdevs in the
* pool, printing out the name and status for each one.
*/
void
print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth)
static void
print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth,
int name_flags)
{
nvlist_t **child;
uint_t c, children;
@ -1615,8 +1634,10 @@ print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth)
if (is_log)
continue;
vname = zpool_vdev_name(g_zfs, NULL, child[c], B_TRUE);
print_import_config(vname, child[c], namewidth, depth + 2);
vname = zpool_vdev_name(g_zfs, NULL, child[c],
name_flags | VDEV_NAME_TYPE_ID);
print_import_config(vname, child[c], namewidth, depth + 2,
name_flags);
free(vname);
}
@ -1624,7 +1645,8 @@ print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth)
&child, &children) == 0) {
(void) printf(gettext("\tcache\n"));
for (c = 0; c < children; c++) {
vname = zpool_vdev_name(g_zfs, NULL, child[c], B_FALSE);
vname = zpool_vdev_name(g_zfs, NULL, child[c],
name_flags);
(void) printf("\t %s\n", vname);
free(vname);
}
@ -1634,7 +1656,8 @@ print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth)
&child, &children) == 0) {
(void) printf(gettext("\tspares\n"));
for (c = 0; c < children; c++) {
vname = zpool_vdev_name(g_zfs, NULL, child[c], B_FALSE);
vname = zpool_vdev_name(g_zfs, NULL, child[c],
name_flags);
(void) printf("\t %s\n", vname);
free(vname);
}
@ -1650,7 +1673,8 @@ print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth)
* works because only the top level vdev is marked "is_log"
*/
static void
print_logs(zpool_handle_t *zhp, nvlist_t *nv, int namewidth, boolean_t verbose)
print_logs(zpool_handle_t *zhp, nvlist_t *nv, int namewidth, boolean_t verbose,
int name_flags)
{
uint_t c, children;
nvlist_t **child;
@ -1669,12 +1693,14 @@ print_logs(zpool_handle_t *zhp, nvlist_t *nv, int namewidth, boolean_t verbose)
&is_log);
if (!is_log)
continue;
name = zpool_vdev_name(g_zfs, zhp, child[c], B_TRUE);
name = zpool_vdev_name(g_zfs, zhp, child[c],
name_flags | VDEV_NAME_TYPE_ID);
if (verbose)
print_status_config(zhp, name, child[c], namewidth,
2, B_FALSE);
2, B_FALSE, name_flags);
else
print_import_config(name, child[c], namewidth, 2);
print_import_config(name, child[c], namewidth, 2,
name_flags);
free(name);
}
}
@ -1923,13 +1949,13 @@ show_import(nvlist_t *config)
(void) printf(gettext(" config:\n\n"));
namewidth = max_width(NULL, nvroot, 0, 0);
namewidth = max_width(NULL, nvroot, 0, 0, 0);
if (namewidth < 10)
namewidth = 10;
print_import_config(name, nvroot, namewidth, 0);
print_import_config(name, nvroot, namewidth, 0, 0);
if (num_logs(nvroot) > 0)
print_logs(NULL, nvroot, namewidth, B_FALSE);
print_logs(NULL, nvroot, namewidth, B_FALSE, 0);
if (reason == ZPOOL_STATUS_BAD_GUID_SUM) {
(void) printf(gettext("\n\tAdditional devices are known to "
@ -2438,6 +2464,7 @@ error:
typedef struct iostat_cbdata {
boolean_t cb_verbose;
int cb_name_flags;
int cb_namewidth;
int cb_iteration;
zpool_list_t *cb_list;
@ -2560,7 +2587,8 @@ print_vdev_stats(zpool_handle_t *zhp, const char *name, nvlist_t *oldnv,
if (ishole || islog)
continue;
vname = zpool_vdev_name(g_zfs, zhp, newchild[c], B_FALSE);
vname = zpool_vdev_name(g_zfs, zhp, newchild[c],
cb->cb_name_flags);
print_vdev_stats(zhp, vname, oldnv ? oldchild[c] : NULL,
newchild[c], cb, depth + 2);
free(vname);
@ -2581,7 +2609,7 @@ print_vdev_stats(zpool_handle_t *zhp, const char *name, nvlist_t *oldnv,
if (islog) {
vname = zpool_vdev_name(g_zfs, zhp, newchild[c],
B_FALSE);
cb->cb_name_flags);
print_vdev_stats(zhp, vname, oldnv ?
oldchild[c] : NULL, newchild[c],
cb, depth + 2);
@ -2607,7 +2635,7 @@ print_vdev_stats(zpool_handle_t *zhp, const char *name, nvlist_t *oldnv,
"-\n", cb->cb_namewidth, "cache");
for (c = 0; c < children; c++) {
vname = zpool_vdev_name(g_zfs, zhp, newchild[c],
B_FALSE);
cb->cb_name_flags);
print_vdev_stats(zhp, vname, oldnv ? oldchild[c] : NULL,
newchild[c], cb, depth + 2);
free(vname);
@ -2700,7 +2728,7 @@ get_namewidth(zpool_handle_t *zhp, void *data)
cb->cb_namewidth = strlen(zpool_get_name(zhp));
else
cb->cb_namewidth = max_width(zhp, nvroot, 0,
cb->cb_namewidth);
cb->cb_namewidth, cb->cb_name_flags);
}
/*
@ -2800,8 +2828,11 @@ get_timestamp_arg(char c)
}
/*
* zpool iostat [-v] [-T d|u] [pool] ... [interval [count]]
* zpool iostat [-gLPv] [-T d|u] [pool] ... [interval [count]]
*
* -g Display guid for individual vdev name.
* -L Follow links when resolving vdev path name.
* -P Display full path for vdev name.
* -v Display statistics for individual vdevs
* -T Display a timestamp in date(1) or Unix format
*
@ -2821,11 +2852,23 @@ zpool_do_iostat(int argc, char **argv)
zpool_list_t *list;
boolean_t verbose = B_FALSE;
boolean_t omit_since_boot = B_FALSE;
iostat_cbdata_t cb;
boolean_t guid = B_FALSE;
boolean_t follow_links = B_FALSE;
boolean_t full_name = B_FALSE;
iostat_cbdata_t cb = { 0 };
/* check options */
while ((c = getopt(argc, argv, "T:vy")) != -1) {
while ((c = getopt(argc, argv, "gLPT:vy")) != -1) {
switch (c) {
case 'g':
guid = B_TRUE;
break;
case 'L':
follow_links = B_TRUE;
break;
case 'P':
full_name = B_TRUE;
break;
case 'T':
get_timestamp_arg(*optarg);
break;
@ -2870,6 +2913,12 @@ zpool_do_iostat(int argc, char **argv)
*/
cb.cb_list = list;
cb.cb_verbose = verbose;
if (guid)
cb.cb_name_flags |= VDEV_NAME_GUID;
if (follow_links)
cb.cb_name_flags |= VDEV_NAME_FOLLOW_LINKS;
if (full_name)
cb.cb_name_flags |= VDEV_NAME_PATH;
cb.cb_iteration = 0;
cb.cb_namewidth = 0;
@ -2953,6 +3002,7 @@ zpool_do_iostat(int argc, char **argv)
typedef struct list_cbdata {
boolean_t cb_verbose;
int cb_name_flags;
int cb_namewidth;
boolean_t cb_scripted;
zprop_list_t *cb_proplist;
@ -3128,6 +3178,9 @@ print_list_stats(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
uint_t c, children;
char *vname;
boolean_t scripted = cb->cb_scripted;
uint64_t islog = B_FALSE;
boolean_t haslog = B_FALSE;
char *dashes = "%-*s - - - - - -\n";
verify(nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &c) == 0);
@ -3178,24 +3231,51 @@ print_list_stats(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
ZPOOL_CONFIG_IS_HOLE, &ishole) == 0 && ishole)
continue;
vname = zpool_vdev_name(g_zfs, zhp, child[c], B_FALSE);
if (nvlist_lookup_uint64(child[c],
ZPOOL_CONFIG_IS_LOG, &islog) == 0 && islog) {
haslog = B_TRUE;
continue;
}
vname = zpool_vdev_name(g_zfs, zhp, child[c],
cb->cb_name_flags);
print_list_stats(zhp, vname, child[c], cb, depth + 2);
free(vname);
}
/*
* Include level 2 ARC devices in iostat output
*/
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_L2CACHE,
&child, &children) != 0)
return;
if (haslog == B_TRUE) {
/* LINTED E_SEC_PRINTF_VAR_FMT */
(void) printf(dashes, cb->cb_namewidth, "log");
for (c = 0; c < children; c++) {
if (nvlist_lookup_uint64(child[c], ZPOOL_CONFIG_IS_LOG,
&islog) != 0 || !islog)
continue;
vname = zpool_vdev_name(g_zfs, zhp, child[c],
cb->cb_name_flags);
print_list_stats(zhp, vname, child[c], cb, depth + 2);
free(vname);
}
}
if (children > 0) {
(void) printf("%-*s - - - - - "
"-\n", cb->cb_namewidth, "cache");
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_L2CACHE,
&child, &children) == 0 && children > 0) {
/* LINTED E_SEC_PRINTF_VAR_FMT */
(void) printf(dashes, cb->cb_namewidth, "cache");
for (c = 0; c < children; c++) {
vname = zpool_vdev_name(g_zfs, zhp, child[c],
B_FALSE);
cb->cb_name_flags);
print_list_stats(zhp, vname, child[c], cb, depth + 2);
free(vname);
}
}
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_SPARES, &child,
&children) == 0 && children > 0) {
/* LINTED E_SEC_PRINTF_VAR_FMT */
(void) printf(dashes, cb->cb_namewidth, "spare");
for (c = 0; c < children; c++) {
vname = zpool_vdev_name(g_zfs, zhp, child[c],
cb->cb_name_flags);
print_list_stats(zhp, vname, child[c], cb, depth + 2);
free(vname);
}
@ -3227,13 +3307,16 @@ list_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool list [-H] [-o prop[,prop]*] [-T d|u] [pool] ... [interval [count]]
* zpool list [-gHLP] [-o prop[,prop]*] [-T d|u] [pool] ... [interval [count]]
*
* -g Display guid for individual vdev name.
* -H Scripted mode. Don't display headers, and separate properties
* by a single tab.
* -L Follow links when resolving vdev path name.
* -o List of properties to display. Defaults to
* "name,size,allocated,free,expandsize,fragmentation,capacity,"
* "dedupratio,health,altroot"
* -P Display full path for vdev name.
* -T Display a timestamp in date(1) or Unix format
*
* List all pools in the system, whether or not they're healthy. Output space
@ -3254,14 +3337,23 @@ zpool_do_list(int argc, char **argv)
boolean_t first = B_TRUE;
/* check options */
while ((c = getopt(argc, argv, ":Ho:T:v")) != -1) {
while ((c = getopt(argc, argv, ":gHLo:PT:v")) != -1) {
switch (c) {
case 'g':
cb.cb_name_flags |= VDEV_NAME_GUID;
break;
case 'H':
cb.cb_scripted = B_TRUE;
break;
case 'L':
cb.cb_name_flags |= VDEV_NAME_FOLLOW_LINKS;
break;
case 'o':
props = optarg;
break;
case 'P':
cb.cb_name_flags |= VDEV_NAME_PATH;
break;
case 'T':
get_timestamp_arg(*optarg);
break;
@ -3517,13 +3609,16 @@ zpool_do_detach(int argc, char **argv)
}
/*
* zpool split [-n] [-o prop=val] ...
* zpool split [-gLnP] [-o prop=val] ...
* [-o mntopt] ...
* [-R altroot] <pool> <newpool> [<device> ...]
*
* -g Display guid for individual vdev name.
* -L Follow links when resolving vdev path name.
* -n Do not split the pool, but display the resulting layout if
* it were to be split.
* -o Set property=value, or set mount options.
* -P Display full path for vdev name.
* -R Mount the split-off pool under an alternate root.
*
* Splits the named pool and gives it the new pool name. Devices to be split
@ -3547,10 +3642,17 @@ zpool_do_split(int argc, char **argv)
flags.dryrun = B_FALSE;
flags.import = B_FALSE;
flags.name_flags = 0;
/* check options */
while ((c = getopt(argc, argv, ":R:no:")) != -1) {
while ((c = getopt(argc, argv, ":gLR:no:P")) != -1) {
switch (c) {
case 'g':
flags.name_flags |= VDEV_NAME_GUID;
break;
case 'L':
flags.name_flags |= VDEV_NAME_FOLLOW_LINKS;
break;
case 'R':
flags.import = B_TRUE;
if (add_prop_list(
@ -3578,6 +3680,9 @@ zpool_do_split(int argc, char **argv)
mntopts = optarg;
}
break;
case 'P':
flags.name_flags |= VDEV_NAME_PATH;
break;
case ':':
(void) fprintf(stderr, gettext("missing argument for "
"'%c' option\n"), optopt);
@ -3625,7 +3730,8 @@ zpool_do_split(int argc, char **argv)
if (flags.dryrun) {
(void) printf(gettext("would create '%s' with the "
"following layout:\n\n"), newpool);
print_vdev_tree(NULL, newpool, config, 0, B_FALSE);
print_vdev_tree(NULL, newpool, config, 0, B_FALSE,
flags.name_flags);
}
nvlist_free(config);
}
@ -4031,6 +4137,7 @@ zpool_do_scrub(int argc, char **argv)
typedef struct status_cbdata {
int cb_count;
int cb_name_flags;
boolean_t cb_allpools;
boolean_t cb_verbose;
boolean_t cb_explain;
@ -4187,7 +4294,7 @@ print_error_log(zpool_handle_t *zhp)
static void
print_spares(zpool_handle_t *zhp, nvlist_t **spares, uint_t nspares,
int namewidth)
int namewidth, int name_flags)
{
uint_t i;
char *name;
@ -4198,16 +4305,16 @@ print_spares(zpool_handle_t *zhp, nvlist_t **spares, uint_t nspares,
(void) printf(gettext("\tspares\n"));
for (i = 0; i < nspares; i++) {
name = zpool_vdev_name(g_zfs, zhp, spares[i], B_FALSE);
name = zpool_vdev_name(g_zfs, zhp, spares[i], name_flags);
print_status_config(zhp, name, spares[i],
namewidth, 2, B_TRUE);
namewidth, 2, B_TRUE, name_flags);
free(name);
}
}
static void
print_l2cache(zpool_handle_t *zhp, nvlist_t **l2cache, uint_t nl2cache,
int namewidth)
int namewidth, int name_flags)
{
uint_t i;
char *name;
@ -4218,9 +4325,9 @@ print_l2cache(zpool_handle_t *zhp, nvlist_t **l2cache, uint_t nl2cache,
(void) printf(gettext("\tcache\n"));
for (i = 0; i < nl2cache; i++) {
name = zpool_vdev_name(g_zfs, zhp, l2cache[i], B_FALSE);
name = zpool_vdev_name(g_zfs, zhp, l2cache[i], name_flags);
print_status_config(zhp, name, l2cache[i],
namewidth, 2, B_FALSE);
namewidth, 2, B_FALSE, name_flags);
free(name);
}
}
@ -4562,7 +4669,7 @@ status_callback(zpool_handle_t *zhp, void *data)
ZPOOL_CONFIG_SCAN_STATS, (uint64_t **)&ps, &c);
print_scan_status(ps);
namewidth = max_width(zhp, nvroot, 0, 0);
namewidth = max_width(zhp, nvroot, 0, 0, cbp->cb_name_flags);
if (namewidth < 10)
namewidth = 10;
@ -4570,17 +4677,20 @@ status_callback(zpool_handle_t *zhp, void *data)
(void) printf(gettext("\t%-*s %-8s %5s %5s %5s\n"), namewidth,
"NAME", "STATE", "READ", "WRITE", "CKSUM");
print_status_config(zhp, zpool_get_name(zhp), nvroot,
namewidth, 0, B_FALSE);
namewidth, 0, B_FALSE, cbp->cb_name_flags);
if (num_logs(nvroot) > 0)
print_logs(zhp, nvroot, namewidth, B_TRUE);
print_logs(zhp, nvroot, namewidth, B_TRUE,
cbp->cb_name_flags);
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_L2CACHE,
&l2cache, &nl2cache) == 0)
print_l2cache(zhp, l2cache, nl2cache, namewidth);
print_l2cache(zhp, l2cache, nl2cache, namewidth,
cbp->cb_name_flags);
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_SPARES,
&spares, &nspares) == 0)
print_spares(zhp, spares, nspares, namewidth);
print_spares(zhp, spares, nspares, namewidth,
cbp->cb_name_flags);
if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_ERRCOUNT,
&nerr) == 0) {
@ -4628,8 +4738,11 @@ status_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool status [-vx] [-T d|u] [pool] ... [interval [count]]
* zpool status [-gLPvx] [-T d|u] [pool] ... [interval [count]]
*
* -g Display guid for individual vdev name.
* -L Follow links when resolving vdev path name.
* -P Display full path for vdev name.
* -v Display complete error logs
* -x Display only pools with potential problems
* -D Display dedup status (undocumented)
@ -4646,8 +4759,17 @@ zpool_do_status(int argc, char **argv)
status_cbdata_t cb = { 0 };
/* check options */
while ((c = getopt(argc, argv, "vxDT:")) != -1) {
while ((c = getopt(argc, argv, "gLPvxDT:")) != -1) {
switch (c) {
case 'g':
cb.cb_name_flags |= VDEV_NAME_GUID;
break;
case 'L':
cb.cb_name_flags |= VDEV_NAME_FOLLOW_LINKS;
break;
case 'P':
cb.cb_name_flags |= VDEV_NAME_PATH;
break;
case 'v':
cb.cb_verbose = B_TRUE;
break;
@ -5916,6 +6038,7 @@ main(int argc, char **argv)
(void) setlocale(LC_ALL, "");
(void) textdomain(TEXT_DOMAIN);
srand(time(NULL));
dprintf_setup(&argc, argv);
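
The -g, -L and -P options added throughout the hunks above all work the same way: each sets one bit in a name_flags word, and that word is passed down to every call that produces a vdev name. The sketch below illustrates the idea under stated assumptions. The SK_NAME_* values and both function names are hypothetical, and the precedence shown (GUID first, then full path, otherwise the last path component) is assumed for illustration. Link resolution for -L is recorded as a flag but not implemented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit values; the real VDEV_NAME_* constants are in libzfs. */
enum {
	SK_NAME_PATH		= 1 << 0,
	SK_NAME_GUID		= 1 << 1,
	SK_NAME_FOLLOW_LINKS	= 1 << 2
};

/* Translate option letters such as "gP" into a flag word; -1 on error. */
static int
sketch_name_flags(const char *letters)
{
	int name_flags = 0;

	assert(letters != NULL);

	for (; *letters != '\0'; letters++) {
		switch (*letters) {
		case 'g':
			name_flags |= SK_NAME_GUID;
			break;
		case 'L':
			name_flags |= SK_NAME_FOLLOW_LINKS;
			break;
		case 'P':
			name_flags |= SK_NAME_PATH;
			break;
		default:
			return (-1);
		}
	}
	return (name_flags);
}

/* Write a display name for one vdev into buf according to name_flags. */
static void
sketch_vdev_name(char *buf, size_t len, const char *path,
    unsigned long long guid, int name_flags)
{
	const char *base;

	assert(buf != NULL && path != NULL);

	if (name_flags & SK_NAME_GUID) {
		(void) snprintf(buf, len, "%llu", guid);
	} else if (name_flags & SK_NAME_PATH) {
		(void) snprintf(buf, len, "%s", path);
	} else {
		/* Default: keep only the last path component. */
		base = strrchr(path, '/');
		(void) snprintf(buf, len, "%s",
		    base != NULL ? base + 1 : path);
	}
}
```

Carrying a single integer instead of one boolean per option is what lets the diff add the extra parameter to print_vdev_tree(), max_width() and the other helpers without changing their signatures again when a further naming option is introduced.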


@ -1206,12 +1206,10 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
/*
* Remove any previously existing symlink from a udev path to
* the device before labeling the disk. This makes
* zpool_label_disk_wait() truly wait for the new link to show
* up instead of returning if it finds an old link still in
* place. Otherwise there is a window between when udev
* deletes and recreates the link during which access attempts
* will fail with ENOENT.
* the device before labeling the disk. This ensures that
* only newly created links are used. Otherwise there is a
* window between when udev deletes and recreates the link
* during which access attempts will fail with ENOENT.
*/
strncpy(udevpath, path, MAXPATHLEN);
(void) zfs_append_partition(udevpath, MAXPATHLEN);
@ -1235,6 +1233,8 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
* and then block until udev creates the new link.
*/
if (!is_exclusive || !is_spare(NULL, udevpath)) {
char *devnode = strrchr(devpath, '/') + 1;
ret = strncmp(udevpath, UDISK_ROOT, strlen(UDISK_ROOT));
if (ret == 0) {
ret = lstat64(udevpath, &statbuf);
@ -1242,18 +1242,29 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
(void) unlink(udevpath);
}
if (zpool_label_disk(g_zfs, zhp,
strrchr(devpath, '/') + 1) == -1)
/*
* When labeling a pool the raw device node name
* is provided as it appears under /dev/.
*/
if (zpool_label_disk(g_zfs, zhp, devnode) == -1)
return (-1);
/*
* Wait for udev to signal the device is available
* by the provided path.
*/
ret = zpool_label_disk_wait(udevpath, DISK_LABEL_WAIT);
if (ret) {
(void) fprintf(stderr, gettext("cannot "
"resolve path '%s': %d\n"), udevpath, ret);
return (-1);
(void) fprintf(stderr,
gettext("missing link: %s was "
"partitioned but %s is missing\n"),
devnode, udevpath);
return (ret);
}
(void) zero_label(udevpath);
ret = zero_label(udevpath);
if (ret)
return (ret);
}
/*


@ -1,6 +1,8 @@
include $(top_srcdir)/config/Rules.am
AM_CFLAGS += $(DEBUG_STACKFLAGS) $(FRAME_LARGER_THAN)
# -Wno-format-truncation to get rid of compiler warning for unchecked
# truncating snprintfs on gcc 7.1.1.
AM_CFLAGS += $(DEBUG_STACKFLAGS) $(FRAME_LARGER_THAN) $(NO_FORMAT_TRUNCATION)
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \


@ -7,7 +7,8 @@ AM_CFLAGS += ${NO_BOOL_COMPARE}
AM_CFLAGS += -fno-strict-aliasing
AM_CPPFLAGS = -D_GNU_SOURCE -D__EXTENSIONS__ -D_REENTRANT
AM_CPPFLAGS += -D_POSIX_PTHREAD_SEMANTICS -D_FILE_OFFSET_BITS=64
AM_CPPFLAGS += -D_LARGEFILE64_SOURCE -DTEXT_DOMAIN=\"zfs-linux-user\"
AM_CPPFLAGS += -D_LARGEFILE64_SOURCE -DHAVE_LARGE_STACKS=1
AM_CPPFLAGS += -DTEXT_DOMAIN=\"zfs-linux-user\"
AM_CPPFLAGS += -DLIBEXECDIR=\"$(libexecdir)\"
AM_CPPFLAGS += -DRUNSTATEDIR=\"$(runstatedir)\"
AM_CPPFLAGS += -DSBINDIR=\"$(sbindir)\"


@ -39,6 +39,35 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_RELEASE], [
])
])
dnl #
dnl # 3.14 API change,
dnl # set_cached_acl() and forget_cached_acl() changed from inline to
dnl # EXPORT_SYMBOL. In the former case, they may not be usable because of
dnl # posix_acl_release. In the latter case, we can always use them.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE], [
AC_MSG_CHECKING([whether set_cached_acl() is usable])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/cred.h>
#include <linux/fs.h>
#include <linux/posix_acl.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct inode *ip = NULL;
struct posix_acl *acl = posix_acl_alloc(1, 0);
set_cached_acl(ip, ACL_TYPE_ACCESS, acl);
forget_cached_acl(ip, ACL_TYPE_ACCESS);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SET_CACHED_ACL_USABLE, 1,
[set_cached_acl() is usable])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 3.1 API change,
dnl # posix_acl_chmod_masq() is not exported anymore and posix_acl_chmod()
@ -75,27 +104,6 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_CHMOD], [
])
])
dnl #
dnl # 2.6.30 API change,
dnl # caching of ACL into the inode was added in this version.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_CACHING], [
AC_MSG_CHECKING([whether inode has i_acl and i_default_acl])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
struct inode ino;
ino.i_acl = NULL;
ino.i_default_acl = NULL;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_POSIX_ACL_CACHING, 1,
[inode contains i_acl and i_default_acl])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 3.1 API change,
dnl # posix_acl_equiv_mode now wants an umode_t* instead of a mode_t*
@ -117,6 +125,30 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T], [
])
])
dnl #
dnl # 4.8 API change,
dnl # The function posix_acl_valid now must be passed a namespace.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_VALID_WITH_NS], [
AC_MSG_CHECKING([whether posix_acl_valid() wants user namespace])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
#include <linux/posix_acl.h>
],[
struct user_namespace *user_ns = NULL;
const struct posix_acl *acl = NULL;
int error;
error = posix_acl_valid(user_ns, acl);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_POSIX_ACL_VALID_WITH_NS, 1,
[posix_acl_valid() wants user namespace])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 2.6.27 API change,
dnl # Check if inode_operations contains the function permission
@ -247,18 +279,45 @@ AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_GET_ACL], [
])
dnl #
dnl # 2.6.30 API change,
dnl # current_umask exists only since this version.
dnl # 3.14 API change,
dnl # Check if inode_operations contains the function set_acl
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CURRENT_UMASK], [
AC_MSG_CHECKING([whether current_umask exists])
AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_SET_ACL], [
AC_MSG_CHECKING([whether iops->set_acl() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int set_acl_fn(struct inode *inode, struct posix_acl *acl, int type)
{ return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.set_acl = set_acl_fn,
};
],[
current_umask();
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_CURRENT_UMASK, 1, [current_umask() exists])
AC_DEFINE(HAVE_SET_ACL, 1, [iops->set_acl() exists])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 4.7 API change,
dnl # The kernel's get_acl() now checks the cache before calling i_op->get_acl()
dnl # and calls set_cached_acl() afterwards, so i_op->get_acl() no longer needs
dnl # to do that itself.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_GET_ACL_HANDLE_CACHE], [
AC_MSG_CHECKING([whether uncached_acl_sentinel() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
void *sentinel __attribute__ ((unused)) = uncached_acl_sentinel(NULL);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_KERNEL_GET_ACL_HANDLE_CACHE, 1, [uncached_acl_sentinel() exists])
],[
AC_MSG_RESULT(no)
])

@ -0,0 +1,21 @@
dnl #
dnl # Linux 4.9-rc5+ ABI, removal of the .aio_fsync field
dnl #
AC_DEFUN([ZFS_AC_KERNEL_AIO_FSYNC], [
AC_MSG_CHECKING([whether fops->aio_fsync() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
static const struct file_operations
fops __attribute__ ((unused)) = {
.aio_fsync = NULL,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FILE_AIO_FSYNC, 1, [fops->aio_fsync() exists])
],[
AC_MSG_RESULT(no)
])
])

@ -1,38 +0,0 @@
dnl #
dnl # 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
dnl # 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
dnl # 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BDI_SETUP_AND_REGISTER], [
AC_MSG_CHECKING([whether bdi_setup_and_register() wants 2 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_2ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 2 args])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bdi_setup_and_register() wants 3 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
unsigned int cap = BDI_CAP_MAP_COPY;
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name, cap);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_3ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 3 args])
], [
AC_MSG_RESULT(no)
])
])
])

config/kernel-bdi.m4

@ -0,0 +1,56 @@
dnl #
dnl # 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
dnl # 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
dnl # 4.0 - 4.11, bdi_setup_and_register() takes 2 arguments.
dnl # 4.12 - x.y, super_setup_bdi_name() new interface.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BDI], [
AC_MSG_CHECKING([whether super_setup_bdi_name() exists])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
struct super_block sb;
], [
char *name = "bdi";
int error __attribute__((unused)) =
super_setup_bdi_name(&sb, name);
], [super_setup_bdi_name], [fs/super.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SUPER_SETUP_BDI_NAME, 1,
[super_setup_bdi_name() exists])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether bdi_setup_and_register() wants 2 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_2ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 2 args])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether bdi_setup_and_register() wants 3 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
unsigned int cap = BDI_CAP_MAP_COPY;
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name, cap);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_3ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 3 args])
], [
AC_MSG_RESULT(no)
])
])
])
])
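The check above probes the newest interface first and only falls back to older ones when a probe fails, so at most one of the three HAVE_* symbols ends up defined. A consumer can mirror that order with a preprocessor cascade. The sketch below is a userspace illustration only: demo_bdi_interface() is a hypothetical name, and the #define at the top simulates a configure run on a 4.0 - 4.11 kernel rather than coming from a real zfs_config.h.

```c
#include <assert.h>
#include <string.h>

/* Simulated configure result: pretend the 2-argument variant was found. */
#define	HAVE_2ARGS_BDI_SETUP_AND_REGISTER	1

/*
 * Hypothetical helper: report which BDI setup call a compat layer would
 * pick, testing the HAVE_* symbols newest to oldest like the m4 check.
 */
static const char *
demo_bdi_interface(void)
{
#if defined(HAVE_SUPER_SETUP_BDI_NAME)
	return ("super_setup_bdi_name(sb, name)");
#elif defined(HAVE_2ARGS_BDI_SETUP_AND_REGISTER)
	return ("bdi_setup_and_register(bdi, name)");
#elif defined(HAVE_3ARGS_BDI_SETUP_AND_REGISTER)
	return ("bdi_setup_and_register(bdi, name, cap)");
#else
	return ("not exported");
#endif
}
```

In a real module each branch would call the kernel function instead of returning its name; the strings only make the selection observable outside the kernel.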

config/kernel-bio-op.m4

@ -0,0 +1,84 @@
dnl #
dnl # Linux 4.8 API,
dnl #
dnl # The bio_op() helper was introduced as a replacement for explicitly
dnl # checking the bio->bi_rw flags. The following checks are used to
dnl # detect if a specific operation is supported.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_REQ_OP_DISCARD], [
AC_MSG_CHECKING([whether REQ_OP_DISCARD is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blk_types.h>
],[
int op __attribute__ ((unused)) = REQ_OP_DISCARD;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_REQ_OP_DISCARD, 1,
[REQ_OP_DISCARD is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_REQ_OP_SECURE_ERASE], [
AC_MSG_CHECKING([whether REQ_OP_SECURE_ERASE is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blk_types.h>
],[
int op __attribute__ ((unused)) = REQ_OP_SECURE_ERASE;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_REQ_OP_SECURE_ERASE, 1,
[REQ_OP_SECURE_ERASE is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_REQ_OP_FLUSH], [
AC_MSG_CHECKING([whether REQ_OP_FLUSH is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blk_types.h>
],[
int op __attribute__ ((unused)) = REQ_OP_FLUSH;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_REQ_OP_FLUSH, 1,
[REQ_OP_FLUSH is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_BIO_BI_OPF], [
AC_MSG_CHECKING([whether bio->bi_opf is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
],[
struct bio bio __attribute__ ((unused));
bio.bi_opf = 0;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BIO_BI_OPF, 1, [bio->bi_opf is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_HAVE_BIO_SET_OP_ATTRS], [
AC_MSG_CHECKING([whether bio_set_op_attrs is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
],[
struct bio *bio __attribute__ ((unused)) = NULL;
bio_set_op_attrs(bio, 0, 0);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BIO_SET_OP_ATTRS, 1,
[bio_set_op_attrs is available])
],[
AC_MSG_RESULT(no)
])
])

@ -22,25 +22,64 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_FLUSH], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_FLUSH, 1,
[blk_queue_flush() is available])
AC_MSG_CHECKING([whether blk_queue_flush() is GPL-only])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/blkdev.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct request_queue *q = NULL;
(void) blk_queue_flush(q, REQ_FLUSH);
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY, 1,
[blk_queue_flush() is GPL-only])
])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether blk_queue_flush() is GPL-only])
dnl #
dnl # 4.7 API change
dnl # Replace blk_queue_flush with blk_queue_write_cache
dnl #
AC_MSG_CHECKING([whether blk_queue_write_cache() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/blkdev.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct request_queue *q = NULL;
(void) blk_queue_flush(q, REQ_FLUSH);
],[
AC_MSG_RESULT(no)
blk_queue_write_cache(q, true, true);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY, 1,
[blk_queue_flush() is GPL-only])
AC_DEFINE(HAVE_BLK_QUEUE_WRITE_CACHE, 1,
[blk_queue_write_cache() exists])
AC_MSG_CHECKING([whether blk_queue_write_cache() is GPL-only])
ZFS_LINUX_TRY_COMPILE([
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/blkdev.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct request_queue *q = NULL;
blk_queue_write_cache(q, true, true);
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY, 1,
[blk_queue_write_cache() is GPL-only])
])
],[
AC_MSG_RESULT(no)
])
EXTRA_KCFLAGS="$tmp_flags"
])

@ -0,0 +1,44 @@
dnl #
dnl # 2.6.32-2.6.35 API - The BIO_RW_UNPLUG enum can be used as a hint
dnl # to unplug the queue.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BIO_RW_UNPLUG], [
AC_MSG_CHECKING([whether the BIO_RW_UNPLUG enum is available])
tmp_flags="$EXTRA_KCFLAGS"
EXTRA_KCFLAGS="${NO_UNUSED_BUT_SET_VARIABLE}"
ZFS_LINUX_TRY_COMPILE([
#include <linux/blkdev.h>
],[
extern enum bio_rw_flags rw;
rw = BIO_RW_UNPLUG;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_HAVE_BIO_RW_UNPLUG, 1,
[BIO_RW_UNPLUG is available])
],[
AC_MSG_RESULT(no)
])
EXTRA_KCFLAGS="$tmp_flags"
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BLK_PLUG], [
AC_MSG_CHECKING([whether struct blk_plug is available])
tmp_flags="$EXTRA_KCFLAGS"
EXTRA_KCFLAGS="${NO_UNUSED_BUT_SET_VARIABLE}"
ZFS_LINUX_TRY_COMPILE([
#include <linux/blkdev.h>
],[
struct blk_plug plug;
blk_start_plug(&plug);
blk_finish_plug(&plug);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_HAVE_BLK_PLUG, 1,
[struct blk_plug is available])
],[
AC_MSG_RESULT(no)
])
EXTRA_KCFLAGS="$tmp_flags"
])

@ -0,0 +1,19 @@
dnl #
dnl # 4.9, current_time() added
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CURRENT_TIME],
[AC_MSG_CHECKING([whether current_time() exists])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
struct inode ip;
struct timespec now __attribute__ ((unused));
now = current_time(&ip);
], [current_time], [fs/inode.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_CURRENT_TIME, 1, [current_time() exists])
], [
AC_MSG_RESULT(no)
])
])

@ -1,24 +0,0 @@
dnl #
dnl # 4.2 API change
dnl # This kernel retired the nameidata structure which forced the
dnl # restructuring of the follow_link() prototype and how it is called.
dnl # We check for the new interface rather than detecting the old one.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_FOLLOW_LINK], [
AC_MSG_CHECKING([whether iops->follow_link() passes nameidata])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
const char *follow_link(struct dentry *de, void **cookie)
{ return "symlink"; }
static struct inode_operations iops __attribute__ ((unused)) = {
.follow_link = follow_link,
};
],[
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FOLLOW_LINK_NAMEIDATA, 1,
[iops->follow_link() nameidata])
])
])

@ -0,0 +1,22 @@
dnl #
dnl # 4.10 API
dnl #
dnl # NULL inode_operations.readlink implies generic_readlink(), which
dnl # has been made static.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_GENERIC_READLINK_GLOBAL], [
AC_MSG_CHECKING([whether generic_readlink is global])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
int i __attribute__ ((unused));
i = generic_readlink(NULL, NULL, 0);
],[
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_GENERIC_READLINK, 1,
[generic_readlink is global])
],[
AC_MSG_RESULT([no])
])
])

config/kernel-get-link.m4

@ -0,0 +1,100 @@
dnl #
dnl # Supported get_link() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_FOLLOW_LINK], [
dnl #
dnl # 4.2 API change
dnl # - This kernel retired the nameidata structure.
dnl #
AC_MSG_CHECKING([whether iops->follow_link() passes cookie])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
const char *follow_link(struct dentry *de,
void **cookie) { return "symlink"; }
static struct inode_operations
iops __attribute__ ((unused)) = {
.follow_link = follow_link,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FOLLOW_LINK_COOKIE, 1,
[iops->follow_link() cookie])
],[
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether iops->follow_link() passes nameidata])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
void *follow_link(struct dentry *de, struct
nameidata *nd) { return (void *)NULL; }
static struct inode_operations
iops __attribute__ ((unused)) = {
.follow_link = follow_link,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FOLLOW_LINK_NAMEIDATA, 1,
[iops->follow_link() nameidata])
],[
AC_MSG_ERROR(no; please file a bug report)
])
])
])
AC_DEFUN([ZFS_AC_KERNEL_GET_LINK], [
dnl #
dnl # 4.5 API change
dnl # The get_link interface has added a delayed done call and
dnl # used it to retire the put_link() interface.
dnl #
AC_MSG_CHECKING([whether iops->get_link() passes delayed])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
const char *get_link(struct dentry *de, struct inode *ip,
struct delayed_call *done) { return "symlink"; }
static struct inode_operations
iops __attribute__ ((unused)) = {
.get_link = get_link,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_GET_LINK_DELAYED, 1,
[iops->get_link() delayed])
],[
dnl #
dnl # 4.5 API change
dnl # The follow_link() interface has been replaced by
dnl # get_link() which behaves the same as before except:
dnl # - An inode is passed as a separate argument
dnl # - When called in RCU mode a NULL dentry is passed.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether iops->get_link() passes cookie])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
const char *get_link(struct dentry *de, struct
inode *ip, void **cookie) { return "symlink"; }
static struct inode_operations
iops __attribute__ ((unused)) = {
.get_link = get_link,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_GET_LINK_COOKIE, 1,
[iops->get_link() cookie])
],[
dnl #
dnl # Check for the follow_link APIs.
dnl #
AC_MSG_RESULT(no)
ZFS_AC_KERNEL_FOLLOW_LINK
])
])
])

@ -0,0 +1,67 @@
dnl #
dnl # Linux 4.11 API
dnl # See torvalds/linux@a528d35
dnl #
AC_DEFUN([ZFS_AC_PATH_KERNEL_IOPS_GETATTR], [
AC_MSG_CHECKING([whether iops->getattr() takes a path])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int test_getattr(
const struct path *p, struct kstat *k,
u32 request_mask, unsigned int query_flags)
{ return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.getattr = test_getattr,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_PATH_IOPS_GETATTR, 1,
[iops->getattr() takes a path])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 3.9 - 4.10 API
dnl #
AC_DEFUN([ZFS_AC_VFSMOUNT_KERNEL_IOPS_GETATTR], [
AC_MSG_CHECKING([whether iops->getattr() takes a vfsmount])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int test_getattr(
struct vfsmount *mnt, struct dentry *d,
struct kstat *k)
{ return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.getattr = test_getattr,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFSMOUNT_IOPS_GETATTR, 1,
[iops->getattr() takes a vfsmount])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # The interface of the getattr callback from the inode_operations
dnl # structure changed. Also, the interface of the simple_getattr()
dnl # function provided by the kernel changed.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_GETATTR], [
ZFS_AC_PATH_KERNEL_IOPS_GETATTR
ZFS_AC_VFSMOUNT_KERNEL_IOPS_GETATTR
])

@ -1,17 +1,29 @@
dnl #
dnl # 2.6.27 API change
dnl # lookup_bdev() was exported.
dnl # 2.6.27, lookup_bdev() was exported.
dnl # 4.4.0-6.21 - x.y on Ubuntu, lookup_bdev() takes 2 arguments.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_LOOKUP_BDEV],
[AC_MSG_CHECKING([whether lookup_bdev() is available])
[AC_MSG_CHECKING([whether lookup_bdev() wants 1 arg])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
lookup_bdev(NULL);
], [lookup_bdev], [fs/block_dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_LOOKUP_BDEV, 1, [lookup_bdev() is available])
AC_DEFINE(HAVE_1ARG_LOOKUP_BDEV, 1, [lookup_bdev() wants 1 arg])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether lookup_bdev() wants 2 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
lookup_bdev(NULL, FMODE_READ);
], [lookup_bdev], [fs/block_dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_2ARGS_LOOKUP_BDEV, 1,
[lookup_bdev() wants 2 args])
], [
AC_MSG_RESULT(no)
])
])
])
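Because the check defines either HAVE_1ARG_LOOKUP_BDEV or HAVE_2ARGS_LOOKUP_BDEV, callers can hide the difference behind a single wrapper macro. The following userspace sketch assumes the 2-argument (Ubuntu) variant was detected. The lookup_bdev() stub, struct block_device layout, FMODE_READ value and demo_lookup_bdev() name are all stand-ins for illustration, not kernel or ZFS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated configure result and stand-in kernel definitions. */
#define	HAVE_2ARGS_LOOKUP_BDEV	1
#define	FMODE_READ		0x1	/* illustrative value only */

struct block_device {
	const char *path;
	int mode;
};

static struct block_device demo_bdev;

/* Stub with the 2-argument signature; records what it was called with. */
static struct block_device *
lookup_bdev(const char *path, int mode)
{
	assert(path != NULL);
	demo_bdev.path = path;
	demo_bdev.mode = mode;
	return (&demo_bdev);
}

/* One call site for both signatures, chosen at compile time. */
#if defined(HAVE_1ARG_LOOKUP_BDEV)
#define	demo_lookup_bdev(path)	lookup_bdev(path)
#elif defined(HAVE_2ARGS_LOOKUP_BDEV)
#define	demo_lookup_bdev(path)	lookup_bdev(path, FMODE_READ)
#else
#error "lookup_bdev() was not detected"
#endif
```

Callers then write demo_lookup_bdev("/dev/sda") regardless of which kernel they build against.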

@ -2,6 +2,9 @@ dnl #
dnl # Linux 3.2 API Change
dnl # make_request_fn returns void instead of int.
dnl #
dnl # Linux 4.4 API Change
dnl # make_request_fn returns blk_qc_t.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_MAKE_REQUEST_FN], [
AC_MSG_CHECKING([whether make_request_fn() returns int])
ZFS_LINUX_TRY_COMPILE([
@ -36,8 +39,27 @@ AC_DEFUN([ZFS_AC_KERNEL_MAKE_REQUEST_FN], [
AC_DEFINE(MAKE_REQUEST_FN_RET, void,
[make_request_fn() returns void])
],[
AC_MSG_ERROR(no - Please file a bug report at
https://github.com/zfsonlinux/zfs/issues/new)
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether make_request_fn() returns blk_qc_t])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blkdev.h>
blk_qc_t make_request(struct request_queue *q, struct bio *bio)
{
return (BLK_QC_T_NONE);
}
],[
blk_queue_make_request(NULL, &make_request);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(MAKE_REQUEST_FN_RET, blk_qc_t,
[make_request_fn() returns blk_qc_t])
AC_DEFINE(HAVE_MAKE_REQUEST_FN_RET_QC, 1,
[Noting that make_request_fn() returns blk_qc_t])
],[
AC_MSG_ERROR(no - Please file a bug report at
https://github.com/zfsonlinux/zfs/issues/new)
])
])
])
])
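The check above defines MAKE_REQUEST_FN_RET to whichever return type the kernel expects, and additionally sets HAVE_MAKE_REQUEST_FN_RET_QC for the blk_qc_t case so the function body knows what to return. A minimal userspace sketch of that pattern follows. The blk_qc_t typedef, the BLK_QC_T_NONE value and the demo_* names are stand-ins, and the two #defines simulate a configure run on a kernel with the 4.4 interface.

```c
#include <assert.h>

/* Stand-ins for kernel definitions; the value is illustrative only. */
typedef unsigned int blk_qc_t;
#define	BLK_QC_T_NONE	((blk_qc_t)-1)

/* Simulated configure result for a kernel with the 4.4 interface. */
#define	MAKE_REQUEST_FN_RET		blk_qc_t
#define	HAVE_MAKE_REQUEST_FN_RET_QC	1

static int demo_requests_seen;

/*
 * Hypothetical request handler: the return type comes from configure,
 * the return statement is selected by the companion HAVE_* symbol.
 */
static MAKE_REQUEST_FN_RET
demo_make_request(void *queue, void *bio)
{
	(void) queue;
	(void) bio;
	demo_requests_seen++;
#if defined(HAVE_MAKE_REQUEST_FN_RET_QC)
	return (BLK_QC_T_NONE);
#else
#error "this sketch only covers the blk_qc_t variant"
#endif
}
```

The int and void variants would need their own return statements in the same conditional; they are left out here because their companion symbols are not visible in this hunk.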

@ -1,23 +0,0 @@
dnl #
dnl # 4.2 API change
dnl # This kernel retired the nameidata structure which forced the
dnl # restructuring of the put_link() prototype and how it is called.
dnl # We check for the new interface rather than detecting the old one.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_PUT_LINK], [
AC_MSG_CHECKING([whether iops->put_link() passes nameidata])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
void put_link(struct inode *ip, void *cookie) { return; }
static struct inode_operations iops __attribute__ ((unused)) = {
.put_link = put_link,
};
],[
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_PUT_LINK_NAMEIDATA, 1,
[iops->put_link() nameidata])
])
])

config/kernel-put-link.m4

@ -0,0 +1,60 @@
dnl #
dnl # Supported symlink APIs
dnl #
AC_DEFUN([ZFS_AC_KERNEL_PUT_LINK], [
dnl #
dnl # 4.5 API change
dnl # get_link() uses delayed done, there is no put_link() interface.
dnl #
ZFS_LINUX_TRY_COMPILE([
#if !defined(HAVE_GET_LINK_DELAYED)
#error "Expecting get_link() delayed done"
#endif
],[
],[
AC_DEFINE(HAVE_PUT_LINK_DELAYED, 1, [iops->put_link() delayed])
],[
dnl #
dnl # 4.2 API change
dnl # This kernel retired the nameidata structure.
dnl #
AC_MSG_CHECKING([whether iops->put_link() passes cookie])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
void put_link(struct inode *ip, void *cookie)
{ return; }
static struct inode_operations
iops __attribute__ ((unused)) = {
.put_link = put_link,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_PUT_LINK_COOKIE, 1,
[iops->put_link() cookie])
],[
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether iops->put_link() passes nameidata])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
void put_link(struct dentry *de, struct
nameidata *nd, void *ptr) { return; }
static struct inode_operations
iops __attribute__ ((unused)) = {
.put_link = put_link,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_PUT_LINK_NAMEIDATA, 1,
[iops->put_link() nameidata])
],[
AC_MSG_ERROR(no; please file a bug report)
])
])
])
])

config/kernel-rename.m4

@ -0,0 +1,25 @@
dnl #
dnl # 4.9 API change,
dnl # iops->rename2() merged into iops->rename(), and iops->rename() now wants
dnl # flags.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_RENAME_WANTS_FLAGS], [
AC_MSG_CHECKING([whether iops->rename() wants flags])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int rename_fn(struct inode *sip, struct dentry *sdp,
struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.rename = rename_fn,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_RENAME_WANTS_FLAGS, 1, [iops->rename() wants flags])
],[
AC_MSG_RESULT(no)
])
])

@ -0,0 +1,23 @@
dnl #
dnl # 4.9 API change
dnl # The inode_change_ok() function has been renamed setattr_prepare()
dnl # and updated to take a dentry rather than an inode.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SETATTR_PREPARE],
[AC_MSG_CHECKING([whether setattr_prepare() is available])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
struct dentry *dentry = NULL;
struct iattr *attr = NULL;
int error;
error = setattr_prepare(dentry, attr);
], [setattr_prepare], [fs/attr.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SETATTR_PREPARE, 1,
[setattr_prepare() is available])
], [
AC_MSG_RESULT(no)
])
])

@ -0,0 +1,20 @@
dnl #
dnl # 4.8 API change
dnl # The rw argument has been removed from submit_bio/submit_bio_wait.
dnl # Callers are now expected to set bio->bi_rw instead of passing it in.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SUBMIT_BIO], [
AC_MSG_CHECKING([whether submit_bio() wants 1 arg])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
],[
blk_qc_t blk_qc;
struct bio *bio = NULL;
blk_qc = submit_bio(bio);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_1ARG_SUBMIT_BIO, 1, [submit_bio() wants 1 arg])
],[
AC_MSG_RESULT(no)
])
])

@ -1,8 +1,8 @@
dnl #
dnl # 3.11 API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_ITERATE], [
AC_MSG_CHECKING([whether fops->iterate() is available])
dnl #
dnl # 4.7 API change
dnl #
AC_MSG_CHECKING([whether fops->iterate_shared() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int iterate(struct file *filp, struct dir_context * context)
@ -10,34 +10,55 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_ITERATE], [
static const struct file_operations fops
__attribute__ ((unused)) = {
.iterate = iterate,
.iterate_shared = iterate,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_ITERATE, 1,
[fops->iterate() is available])
AC_DEFINE(HAVE_VFS_ITERATE_SHARED, 1,
[fops->iterate_shared() is available])
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether fops->readdir() is available])
dnl #
dnl # 3.11 API change
dnl #
AC_MSG_CHECKING([whether fops->iterate() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int readdir(struct file *filp, void *entry, filldir_t func)
int iterate(struct file *filp, struct dir_context * context)
{ return 0; }
static const struct file_operations fops
__attribute__ ((unused)) = {
.readdir = readdir,
.iterate = iterate,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_READDIR, 1,
[fops->readdir() is available])
AC_DEFINE(HAVE_VFS_ITERATE, 1,
[fops->iterate() is available])
],[
AC_MSG_ERROR(no; file a bug report with ZFSOnLinux)
])
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether fops->readdir() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int readdir(struct file *filp, void *entry, filldir_t func)
{ return 0; }
static const struct file_operations fops
__attribute__ ((unused)) = {
.readdir = readdir,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_READDIR, 1,
[fops->readdir() is available])
],[
AC_MSG_ERROR(no; file a bug report with ZFSOnLinux)
])
])
])
])

@ -1,5 +1,5 @@
dnl #
dnl # Linux 4.1.x API
dnl # Linux 3.16 API
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_RW_ITERATE],
[AC_MSG_CHECKING([whether fops->read/write_iter() are available])
@ -21,6 +21,47 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_RW_ITERATE],
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_RW_ITERATE, 1,
[fops->read/write_iter() are available])
ZFS_AC_KERNEL_NEW_SYNC_READ
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 4.1 API
dnl #
AC_DEFUN([ZFS_AC_KERNEL_NEW_SYNC_READ],
[AC_MSG_CHECKING([whether new_sync_read() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
new_sync_read(NULL, NULL, 0, NULL);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_NEW_SYNC_READ, 1,
[new_sync_read() is available])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 4.1.x API
dnl #
AC_DEFUN([ZFS_AC_KERNEL_GENERIC_WRITE_CHECKS],
[AC_MSG_CHECKING([whether generic_write_checks() takes kiocb])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
struct kiocb *iocb = NULL;
struct iov_iter *iov = NULL;
generic_write_checks(iocb, iov);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_GENERIC_WRITE_CHECKS_KIOCB, 1,
[generic_write_checks() takes kiocb])
],[
AC_MSG_RESULT(no)
])

@ -3,8 +3,8 @@ dnl # 2.6.35 API change,
dnl # The 'struct xattr_handler' was constified in the generic
dnl # super_block structure.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CONST_XATTR_HANDLER],
[AC_MSG_CHECKING([whether super_block uses const struct xattr_hander])
AC_DEFUN([ZFS_AC_KERNEL_CONST_XATTR_HANDLER], [
AC_MSG_CHECKING([whether super_block uses const struct xattr_handler])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
#include <linux/xattr.h>
@ -26,24 +26,78 @@ AC_DEFUN([ZFS_AC_KERNEL_CONST_XATTR_HANDLER],
],[
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_CONST_XATTR_HANDLER, 1,
[super_block uses const struct xattr_hander])
[super_block uses const struct xattr_handler])
],[
AC_MSG_RESULT([no])
])
])
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_hander->get() callback was changed to take a dentry
dnl # instead of an inode, and a handler_flags argument was added.
dnl # 4.5 API change,
dnl # struct xattr_handler added new member "name".
dnl # An xattr_handler that matches the whole name rather than a prefix should
dnl # use "name" instead of "prefix", e.g. "system.posix_acl_access".
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
AC_MSG_CHECKING([whether xattr_handler->get() wants dentry])
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_NAME], [
AC_MSG_CHECKING([whether xattr_handler has name])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int handler_flags) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.name = XATTR_NAME_POSIX_ACL_ACCESS,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_HANDLER_NAME, 1,
[xattr_handler has name])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 4.9 API change,
dnl # iops->{set,get,remove}xattr and generic_{set,get,remove}xattr are
dnl # removed. xattr operations will directly go through sb->s_xattr.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_HAVE_GENERIC_SETXATTR], [
AC_MSG_CHECKING([whether generic_setxattr() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
#include <linux/xattr.h>
static const struct inode_operations
iops __attribute__ ((unused)) = {
.setxattr = generic_setxattr
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_GENERIC_SETXATTR, 1,
[generic_setxattr() exists])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Supported xattr handler get() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
dnl #
dnl # 4.7 API change,
dnl # The xattr_handler->get() callback was changed to take both
dnl # dentry and inode.
dnl #
AC_MSG_CHECKING([whether xattr_handler->get() wants both dentry and inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(const struct xattr_handler *handler,
struct dentry *dentry, struct inode *inode,
const char *name, void *buffer, size_t size) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
@ -51,26 +105,102 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DENTRY_XATTR_GET, 1,
[xattr_handler->get() wants dentry])
AC_DEFINE(HAVE_XATTR_GET_DENTRY_INODE, 1,
[xattr_handler->get() wants both dentry and inode])
],[
AC_MSG_RESULT(no)
dnl #
dnl # 4.4 API change,
dnl # The xattr_handler->get() callback was changed to take an
dnl # xattr_handler; the handler_flags argument was removed and
dnl # should be read from handler->flags.
dnl #
AC_MSG_CHECKING([whether xattr_handler->get() wants xattr_handler])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
void *buffer, size_t size) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_HANDLER, 1,
[xattr_handler->get() wants xattr_handler])
],[
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_handler->get() callback was changed to take
dnl # a dentry instead of an inode, and a handler_flags
dnl # argument was added.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether xattr_handler->get() wants dentry])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int handler_flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_DENTRY, 1,
[xattr_handler->get() wants dentry])
],[
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->get() wants inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(struct inode *ip, const char *name,
void *buffer, size_t size) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_INODE, 1,
[xattr_handler->get() wants inode])
],[
AC_MSG_ERROR([no; please file a bug report])
])
])
])
])
])
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_hander->set() callback was changed to take a dentry
dnl # instead of an inode, and a handler_flags argument was added.
dnl # Supported xattr handler set() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_SET], [
AC_MSG_CHECKING([whether xattr_handler->set() wants dentry])
dnl #
dnl # 4.7 API change,
dnl # The xattr_handler->set() callback was changed to take both
dnl # dentry and inode.
dnl #
AC_MSG_CHECKING([whether xattr_handler->set() wants both dentry and inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags,
int handler_flags) { return 0; }
int set(const struct xattr_handler *handler,
struct dentry *dentry, struct inode *inode,
const char *name, const void *buffer,
size_t size, int flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
@ -78,26 +208,98 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_SET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DENTRY_XATTR_SET, 1,
[xattr_handler->set() wants dentry])
AC_DEFINE(HAVE_XATTR_SET_DENTRY_INODE, 1,
[xattr_handler->set() wants both dentry and inode])
],[
AC_MSG_RESULT(no)
dnl #
dnl # 4.4 API change,
dnl # The xattr_handler->set() callback was changed to take an
dnl # xattr_handler; the handler_flags argument was removed and
dnl # should be read from handler->flags.
dnl #
AC_MSG_CHECKING([whether xattr_handler->set() wants xattr_handler])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_HANDLER, 1,
[xattr_handler->set() wants xattr_handler])
],[
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_handler->set() callback was changed to take a
dnl # dentry instead of an inode, and a handler_flags
dnl # argument was added.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether xattr_handler->set() wants dentry])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags,
int handler_flags) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_DENTRY, 1,
[xattr_handler->set() wants dentry])
],[
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->set() wants inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(struct inode *ip, const char *name,
const void *buffer, size_t size, int flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_INODE, 1,
[xattr_handler->set() wants inode])
],[
AC_MSG_ERROR([no; please file a bug report])
])
])
])
])
])
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_handler->list() callback was changed to take a dentry
dnl # instead of an inode, and a handler_flags argument was added.
dnl # Supported xattr handler list() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_LIST], [
AC_MSG_CHECKING([whether xattr_handler->list() wants dentry])
dnl # 4.5 API change,
dnl # The xattr_handler->list() callback was changed to take only a
dnl # dentry and it only needs to return whether it's accessible.
AC_MSG_CHECKING([whether xattr_handler->list() wants simple])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
size_t list(struct dentry *dentry, char *list, size_t list_size,
const char *name, size_t name_len, int handler_flags)
{ return 0; }
bool list(struct dentry *dentry) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.list = list,
@ -105,10 +307,87 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_LIST], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DENTRY_XATTR_LIST, 1,
[xattr_handler->list() wants dentry])
AC_DEFINE(HAVE_XATTR_LIST_SIMPLE, 1,
[xattr_handler->list() wants simple])
],[
dnl #
dnl # 4.4 API change,
dnl # The xattr_handler->list() callback was changed to take a
dnl # xattr_handler, and handler_flags argument was removed
dnl # and should be accessed by handler->flags.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->list() wants xattr_handler])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
size_t list(const struct xattr_handler *handler,
struct dentry *dentry, char *list, size_t list_size,
const char *name, size_t name_len) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.list = list,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_LIST_HANDLER, 1,
[xattr_handler->list() wants xattr_handler])
],[
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_handler->list() callback was changed
dnl # to take a dentry instead of an inode, and a
dnl # handler_flags argument was added.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->list() wants dentry])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
size_t list(struct dentry *dentry,
char *list, size_t list_size,
const char *name, size_t name_len,
int handler_flags) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.list = list,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_LIST_DENTRY, 1,
[xattr_handler->list() wants dentry])
],[
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->list() wants inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
size_t list(struct inode *ip, char *lst,
size_t list_size, const char *name,
size_t name_len) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.list = list,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_LIST_INODE, 1,
[xattr_handler->list() wants inode])
],[
AC_MSG_ERROR(
[no; please file a bug report])
])
])
])
])
])

View File

@ -8,6 +8,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_CONFIG
ZFS_AC_KERNEL_DECLARE_EVENT_CLASS
ZFS_AC_KERNEL_CURRENT_BIO_TAIL
ZFS_AC_KERNEL_SUBMIT_BIO
ZFS_AC_KERNEL_BDEV_BLOCK_DEVICE_OPERATIONS
ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID
ZFS_AC_KERNEL_TYPE_FMODE_T
@ -22,31 +23,43 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_BIO_BVEC_ITER
ZFS_AC_KERNEL_BIO_FAILFAST_DTD
ZFS_AC_KERNEL_REQ_FAILFAST_MASK
ZFS_AC_KERNEL_REQ_OP_DISCARD
ZFS_AC_KERNEL_REQ_OP_SECURE_ERASE
ZFS_AC_KERNEL_REQ_OP_FLUSH
ZFS_AC_KERNEL_BIO_BI_OPF
ZFS_AC_KERNEL_BIO_END_IO_T_ARGS
ZFS_AC_KERNEL_BIO_RW_BARRIER
ZFS_AC_KERNEL_BIO_RW_DISCARD
ZFS_AC_KERNEL_BLK_QUEUE_FLUSH
ZFS_AC_KERNEL_BLK_QUEUE_MAX_HW_SECTORS
ZFS_AC_KERNEL_BLK_QUEUE_MAX_SEGMENTS
ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BIO_RW_UNPLUG
ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BLK_PLUG
ZFS_AC_KERNEL_GET_DISK_RO
ZFS_AC_KERNEL_GET_GENDISK
ZFS_AC_KERNEL_HAVE_BIO_SET_OP_ATTRS
ZFS_AC_KERNEL_GENERIC_READLINK_GLOBAL
ZFS_AC_KERNEL_DISCARD_GRANULARITY
ZFS_AC_KERNEL_CONST_XATTR_HANDLER
ZFS_AC_KERNEL_XATTR_HANDLER_NAME
ZFS_AC_KERNEL_XATTR_HANDLER_GET
ZFS_AC_KERNEL_XATTR_HANDLER_SET
ZFS_AC_KERNEL_XATTR_HANDLER_LIST
ZFS_AC_KERNEL_INODE_OWNER_OR_CAPABLE
ZFS_AC_KERNEL_POSIX_ACL_FROM_XATTR_USERNS
ZFS_AC_KERNEL_POSIX_ACL_RELEASE
ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE
ZFS_AC_KERNEL_POSIX_ACL_CHMOD
ZFS_AC_KERNEL_POSIX_ACL_CACHING
ZFS_AC_KERNEL_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T
ZFS_AC_KERNEL_POSIX_ACL_VALID_WITH_NS
ZFS_AC_KERNEL_INODE_OPERATIONS_PERMISSION
ZFS_AC_KERNEL_INODE_OPERATIONS_PERMISSION_WITH_NAMEIDATA
ZFS_AC_KERNEL_INODE_OPERATIONS_CHECK_ACL
ZFS_AC_KERNEL_INODE_OPERATIONS_CHECK_ACL_WITH_FLAGS
ZFS_AC_KERNEL_INODE_OPERATIONS_GET_ACL
ZFS_AC_KERNEL_CURRENT_UMASK
ZFS_AC_KERNEL_INODE_OPERATIONS_SET_ACL
ZFS_AC_KERNEL_INODE_OPERATIONS_GETATTR
ZFS_AC_KERNEL_GET_ACL_HANDLE_CACHE
ZFS_AC_KERNEL_SHOW_OPTIONS
ZFS_AC_KERNEL_FILE_INODE
ZFS_AC_KERNEL_FSYNC
@ -55,16 +68,18 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_NR_CACHED_OBJECTS
ZFS_AC_KERNEL_FREE_CACHED_OBJECTS
ZFS_AC_KERNEL_FALLOCATE
ZFS_AC_KERNEL_AIO_FSYNC
ZFS_AC_KERNEL_MKDIR_UMODE_T
ZFS_AC_KERNEL_LOOKUP_NAMEIDATA
ZFS_AC_KERNEL_CREATE_NAMEIDATA
ZFS_AC_KERNEL_FOLLOW_LINK
ZFS_AC_KERNEL_GET_LINK
ZFS_AC_KERNEL_PUT_LINK
ZFS_AC_KERNEL_TRUNCATE_RANGE
ZFS_AC_KERNEL_AUTOMOUNT
ZFS_AC_KERNEL_ENCODE_FH_WITH_INODE
ZFS_AC_KERNEL_COMMIT_METADATA
ZFS_AC_KERNEL_CLEAR_INODE
ZFS_AC_KERNEL_SETATTR_PREPARE
ZFS_AC_KERNEL_INSERT_INODE_LOCKED
ZFS_AC_KERNEL_D_MAKE_ROOT
ZFS_AC_KERNEL_D_OBTAIN_ALIAS
@ -81,17 +96,21 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_SHRINK_CONTROL_HAS_NID
ZFS_AC_KERNEL_S_INSTANCES_LIST_HEAD
ZFS_AC_KERNEL_S_D_OP
ZFS_AC_KERNEL_BDI_SETUP_AND_REGISTER
ZFS_AC_KERNEL_BDI
ZFS_AC_KERNEL_SET_NLINK
ZFS_AC_KERNEL_ELEVATOR_CHANGE
ZFS_AC_KERNEL_5ARG_SGET
ZFS_AC_KERNEL_LSEEK_EXECUTE
ZFS_AC_KERNEL_VFS_ITERATE
ZFS_AC_KERNEL_VFS_RW_ITERATE
ZFS_AC_KERNEL_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_MAKE_REQUEST_FN
ZFS_AC_KERNEL_GENERIC_IO_ACCT
ZFS_AC_KERNEL_RENAME_WANTS_FLAGS
ZFS_AC_KERNEL_HAVE_GENERIC_SETXATTR
ZFS_AC_KERNEL_CURRENT_TIME
AS_IF([test "$LINUX_OBJ" != "$LINUX"], [
KERNELMAKE_PARAMS="$KERNELMAKE_PARAMS O=$LINUX_OBJ"
@ -448,21 +467,49 @@ dnl # detected at configure time and cause a build failure. Otherwise
dnl # modules may be successfully built that behave incorrectly.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CONFIG], [
AC_RUN_IFELSE([
AC_LANG_PROGRAM([
#include "$LINUX/include/linux/license.h"
AS_IF([test "x$cross_compiling" != xyes], [
AC_RUN_IFELSE([
AC_LANG_PROGRAM([
#include "$LINUX/include/linux/license.h"
], [
return !license_is_gpl_compatible("$ZFS_META_LICENSE");
])
], [
AC_DEFINE([ZFS_IS_GPL_COMPATIBLE], [1],
[Define to 1 if GPL-only symbols can be used])
], [
return !license_is_gpl_compatible("$ZFS_META_LICENSE");
])
], [
AC_DEFINE([ZFS_IS_GPL_COMPATIBLE], [1],
[Define to 1 if GPL-only symbols can be used])
], [
])
ZFS_AC_KERNEL_CONFIG_THREAD_SIZE
ZFS_AC_KERNEL_CONFIG_DEBUG_LOCK_ALLOC
])
dnl #
dnl # Check configured THREAD_SIZE
dnl #
dnl # The stack size will vary by architecture, but as of Linux 3.15 on x86_64
dnl # the default thread stack size was increased to 16K from 8K. Therefore,
dnl # on newer kernels and some architectures stack usage optimizations can be
dnl # conditionally applied to improve performance without negatively impacting
dnl # stability.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CONFIG_THREAD_SIZE], [
AC_MSG_CHECKING([whether kernel was built with 16K or larger stacks])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
],[
#if (THREAD_SIZE < 16384)
#error "THREAD_SIZE is less than 16K"
#endif
],[
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_LARGE_STACKS, 1, [kernel has large stacks])
],[
AC_MSG_RESULT([no])
])
])
dnl #
dnl # Check CONFIG_DEBUG_LOCK_ALLOC
dnl #
@ -572,7 +619,7 @@ dnl #
dnl # ZFS_LINUX_CONFIG
dnl #
AC_DEFUN([ZFS_LINUX_CONFIG],
[AC_MSG_CHECKING([whether Linux was built with CONFIG_$1])
[AC_MSG_CHECKING([whether kernel was built with CONFIG_$1])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
],[

39
config/user-makedev.m4 Normal file
View File

@ -0,0 +1,39 @@
dnl #
dnl # glibc 2.25
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_MAKEDEV_IN_SYSMACROS], [
AC_MSG_CHECKING([makedev() is declared in sys/sysmacros.h])
AC_TRY_COMPILE(
[
#include <sys/sysmacros.h>
],[
int k;
k = makedev(0,0);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MAKEDEV_IN_SYSMACROS, 1,
[makedev() is declared in sys/sysmacros.h])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # glibc X < Y < 2.25
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV], [
AC_MSG_CHECKING([makedev() is declared in sys/mkdev.h])
AC_TRY_COMPILE(
[
#include <sys/mkdev.h>
],[
int k;
k = makedev(0,0);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MAKEDEV_IN_MKDEV, 1,
[makedev() is declared in sys/mkdev.h])
],[
AC_MSG_RESULT(no)
])
])
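
The two probes above each define a `HAVE_MAKEDEV_IN_*` macro. A minimal sketch of how a consumer might pick the header from those macros follows. The manual `#define` stands in for the generated config header and assumes a glibc system where `makedev()` lives in `<sys/sysmacros.h>`; `build_dev()` is a hypothetical helper, not part of the change.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Assumption: configure found makedev() in <sys/sysmacros.h>. In a real
 * build this would come from the generated config header, not a manual
 * define.
 */
#define	HAVE_MAKEDEV_IN_SYSMACROS	1

#if defined(HAVE_MAKEDEV_IN_SYSMACROS)
#include <sys/sysmacros.h>
#elif defined(HAVE_MAKEDEV_IN_MKDEV)
#include <sys/mkdev.h>
#endif

/* Hypothetical helper: build a dev_t and sanity check the round trip. */
static dev_t
build_dev(unsigned int maj, unsigned int min)
{
	dev_t dev = makedev(maj, min);

	assert(major(dev) == maj && minor(dev) == min);
	return (dev);
}
```

Checking the `sysmacros.h` macro first mirrors the order in which the probes are defined; whether the actual sources use the same order is not shown in this diff.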

View File

@ -0,0 +1,22 @@
dnl #
dnl # Check if gcc supports -Wno-format-truncation option.
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_NO_FORMAT_TRUNCATION], [
AC_MSG_CHECKING([for -Wno-format-truncation support])
saved_flags="$CFLAGS"
CFLAGS="$CFLAGS -Wno-format-truncation"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])],
[
NO_FORMAT_TRUNCATION=-Wno-format-truncation
AC_MSG_RESULT([yes])
],
[
NO_FORMAT_TRUNCATION=
AC_MSG_RESULT([no])
])
CFLAGS="$saved_flags"
AC_SUBST([NO_FORMAT_TRUNCATION])
])

View File

@ -13,6 +13,9 @@ AC_DEFUN([ZFS_AC_CONFIG_USER], [
ZFS_AC_CONFIG_USER_LIBBLKID
ZFS_AC_CONFIG_USER_FRAME_LARGER_THAN
ZFS_AC_CONFIG_USER_RUNSTATEDIR
ZFS_AC_CONFIG_USER_MAKEDEV_IN_SYSMACROS
ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV
ZFS_AC_CONFIG_USER_NO_FORMAT_TRUNCATION
dnl #
dnl # Checks for library functions
AC_CHECK_FUNCS([mlockall])

View File

@ -55,6 +55,8 @@ adjust_obj_paths()
for MODULE in "${MODULES[@]}"
do
adjust_obj_paths "$KERNEL_DIR/fs/zfs/$MODULE/Makefile"
sed -i.bak '/obj =/d' "$KERNEL_DIR/fs/zfs/$MODULE/Makefile"
sed -i.bak '/src =/d' "$KERNEL_DIR/fs/zfs/$MODULE/Makefile"
done
cat > "$KERNEL_DIR/fs/zfs/Kconfig" <<"EOF"

View File

@ -1,11 +1,10 @@
#!@SHELL@
#
# zfs-import This script will import/export zfs pools.
# zfs-import This script will import ZFS pools
#
# chkconfig: 2345 01 99
# description: This script will import/export zfs pools during system
# boot/shutdown.
# It is also responsible for all userspace zfs services.
# description: This script will perform a verbatim import of ZFS pools
# during system boot.
# probe: true
#
### BEGIN INIT INFO
@ -17,7 +16,7 @@
# X-Start-Before: checkfs
# X-Stop-After: zfs-mount
# Short-Description: Import ZFS pools
# Description: Run the `zpool import` or `zpool export` commands.
# Description: Run the `zpool import` command.
### END INIT INFO
#
# NOTE: Not having '$local_fs' on Required-Start but only on Required-Stop
@ -43,6 +42,16 @@ do_depend()
keyword -lxc -openvz -prefix -vserver
}
# Use the zpool cache file to import pools
do_verbatim_import()
{
if [ -f "$ZPOOL_CACHE" ]
then
zfs_action "Importing ZFS pool(s)" \
"$ZPOOL" import -c "$ZPOOL_CACHE" -N -a
fi
}
# Support function to get a list of all pools, separated with ';'
find_pools()
{
@ -60,8 +69,8 @@ find_pools()
echo "${pools%%;}" # Return without the last ';'.
}
# Import all pools
do_import()
# Find and import all visible pools, even exported ones
do_import_all_visible()
{
local already_imported available_pools pool npools
local exception dir ZPOOL_IMPORT_PATH RET=0 r=1
@ -109,7 +118,7 @@ do_import()
fi
fi
# Filter out any exceptions...
# Filter out any exceptions...
if [ -n "$ZFS_POOL_EXCEPTIONS" ]
then
local found=""
@ -249,41 +258,15 @@ do_import()
return "$RET"
}
# Export all pools
do_export()
do_import()
{
local already_imported pool root_pool RET r
RET=0
root_pool=$(get_root_pool)
[ -n "$init" ] && zfs_log_begin_msg "Exporting ZFS pool(s)"
# Find list of already imported pools.
already_imported=$(find_pools "$ZPOOL" list -H -oname)
OLD_IFS="$IFS" ; IFS=";"
for pool in $already_imported; do
[ "$pool" = "$root_pool" ] && continue
if [ -z "$init" ]
then
# Interactive - one 'Exporting ...' line per pool
zfs_log_begin_msg "Exporting ZFS pool $pool"
else
# Not interactive - a dot for each pool.
zfs_log_progress_msg "."
fi
"$ZPOOL" export "$pool"
r="$?" ; RET=$((RET + r))
[ -z "$init" ] && zfs_log_end_msg "$r"
done
IFS="$OLD_IFS"
[ -n "$init" ] && zfs_log_end_msg "$RET"
return "$RET"
if check_boolean "$ZPOOL_IMPORT_ALL_VISIBLE"
then
do_import_all_visible
else
# This is the default option
do_verbatim_import
fi
}
# Output the status and list of pools
@ -323,14 +306,6 @@ do_start()
fi
}
do_stop()
{
# Check to see if the module is even loaded.
check_module_loaded "zfs" || exit 0
do_export
}
# ----------------------------------------------------
if [ ! -e /etc/gentoo-release ]
@ -340,7 +315,7 @@ then
do_start
;;
stop)
do_stop
# no-op
;;
status)
do_status
@ -350,7 +325,7 @@ then
;;
*)
[ -n "$1" ] && echo "Error: Unknown command $1."
echo "Usage: $0 {start|stop|status}"
echo "Usage: $0 {start|status}"
exit 3
;;
esac
@ -360,6 +335,5 @@ else
# Create wrapper functions since Gentoo doesn't use the case part.
depend() { do_depend; }
start() { do_start; }
stop() { do_stop; }
status() { do_status; }
fi

View File

@ -16,6 +16,24 @@ ZFS_SHARE='yes'
# Run `zfs unshare -a` during system stop?
ZFS_UNSHARE='yes'
# By default, a verbatim import of all pools is performed at boot based on the
# contents of the default zpool cache file. The contents of the cache are
# managed automatically by the 'zpool import' and 'zpool export' commands.
#
# By setting this to 'yes', the system will instead search all devices for
# pools and attempt to import them all at boot, even those that have been
# exported. Under this mode, the search path can be controlled by the
# ZPOOL_IMPORT_PATH variable and a list of pools that should not be imported
# can be listed in the ZFS_POOL_EXCEPTIONS variable.
#
# Note that importing all visible pools may include pools that you don't
# expect, such as those on removable devices and SANs, and those pools may
# proceed to mount themselves in places you do not want them to. The results
# can be unpredictable and possibly dangerous. Only enable this option if you
# understand this risk and have complete physical control over your system and
# SAN to prevent the insertion of malicious pools.
ZPOOL_IMPORT_ALL_VISIBLE='no'
# Specify specific path(s) to look for device nodes and/or links for the
# pool import(s). See zpool(8) for more information about this variable.
# It supersedes the old USE_DISK_BY_ID which indicated that it would only
@ -23,6 +41,18 @@ ZFS_UNSHARE='yes'
# The old variable will still work in the code, but is deprecated.
#ZPOOL_IMPORT_PATH="/dev/disk/by-vdev:/dev/disk/by-id"
# List of pools that should NOT be imported at boot
# when ZPOOL_IMPORT_ALL_VISIBLE is 'yes'.
# This is a space separated list.
#ZFS_POOL_EXCEPTIONS="test2"
# List of pools that SHOULD be imported at boot by the initramfs
# instead of trying to import all available pools. If this is set
# then ZFS_POOL_EXCEPTIONS is ignored.
# Only applicable for Debian GNU/Linux {dkms,initramfs}.
# This is a semi-colon separated list.
#ZFS_POOL_IMPORT="pool1;pool2"
# Should the datasets be mounted verbosely?
# A mount counter will be used when mounting if set to 'yes'.
VERBOSE_MOUNT='no'
@ -97,17 +127,6 @@ ZFS_INITRD_POST_MODPROBE_SLEEP='0'
# Example: If root FS is 'rpool/ROOT/rootfs', this would make sense.
#ZFS_INITRD_ADDITIONAL_DATASETS="rpool/ROOT/usr rpool/ROOT/var"
# List of pools that should NOT be imported at boot?
# This is a space separated list.
#ZFS_POOL_EXCEPTIONS="test2"
# List of pools to import?
# If this variable is set, there will be NO auto-import of ANY other
# pool. In essence, there will be no auto detection of available pools.
# This is a semi-colon separated list.
# Makes the variable ZFS_POOL_EXCEPTIONS above redundant (won't be checked).
#ZFS_POOL_IMPORT="pool1;pool2"
# Optional arguments for the ZFS Event Daemon (ZED).
# See zed(8) for more information on available options.
#ZED_ARGS="-M"

View File

@ -1 +1,3 @@
zfs
# Always load kernel modules at boot. The default behavior is to load the
# kernel modules in the zfs-import-*.service or when blkid(8) detects a pool.
#zfs

View File

@ -1,2 +1,7 @@
# ZFS is enabled by default
enable zfs.*
enable zfs-import-cache.service
disable zfs-import-scan.service
enable zfs-mount.service
enable zfs-share.service
enable zfs-zed.service
enable zfs.target

View File

@ -2,7 +2,7 @@ systemdpreset_DATA = \
50-zfs.preset
systemdunit_DATA = \
zed.service \
zfs-zed.service \
zfs-import-cache.service \
zfs-import-scan.service \
zfs-mount.service \
@ -10,7 +10,7 @@ systemdunit_DATA = \
zfs.target
EXTRA_DIST = \
$(top_srcdir)/etc/systemd/system/zed.service.in \
$(top_srcdir)/etc/systemd/system/zfs-zed.service.in \
$(top_srcdir)/etc/systemd/system/zfs-import-cache.service.in \
$(top_srcdir)/etc/systemd/system/zfs-import-scan.service.in \
$(top_srcdir)/etc/systemd/system/zfs-mount.service.in \

View File

@ -4,6 +4,7 @@ DefaultDependencies=no
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=cryptsetup.target
After=systemd-remount-fs.service
ConditionPathExists=@sysconfdir@/zfs/zpool.cache
[Service]
@ -11,3 +12,7 @@ Type=oneshot
RemainAfterExit=yes
ExecStartPre=/sbin/modprobe zfs
ExecStart=@sbindir@/zpool import -c @sysconfdir@/zfs/zpool.cache -aN
[Install]
WantedBy=zfs-mount.service
WantedBy=zfs.target

View File

@ -10,4 +10,8 @@ ConditionPathExists=!@sysconfdir@/zfs/zpool.cache
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/sbin/modprobe zfs
ExecStart=@sbindir@/zpool import -d /dev/disk/by-id -aN
ExecStart=@sbindir@/zpool import -aN -o cachefile=none
[Install]
WantedBy=zfs-mount.service
WantedBy=zfs.target

View File

@ -1,15 +1,18 @@
[Unit]
Description=Mount ZFS filesystems
DefaultDependencies=no
Wants=zfs-import-cache.service
Wants=zfs-import-scan.service
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=zfs-import-cache.service
After=zfs-import-scan.service
After=systemd-remount-fs.service
Before=local-fs.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=@sbindir@/zfs mount -a
WorkingDirectory=-/sbin/
[Install]
WantedBy=zfs-share.service
WantedBy=zfs.target

View File

@ -1,14 +1,16 @@
[Unit]
Description=ZFS file system shares
After=nfs-server.service
After=nfs-server.service nfs-kernel-server.service
After=smb.service
After=zfs-mount.service
Requires=zfs-mount.service
PartOf=nfs-server.service
PartOf=nfs-server.service nfs-kernel-server.service
PartOf=smb.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-@bindir@/rm /etc/dfs/sharetab
ExecStartPre=-@bindir@/rm -f /etc/dfs/sharetab
ExecStart=@sbindir@/zfs share -a
[Install]
WantedBy=zfs.target

View File

@ -7,3 +7,7 @@ After=zfs-import-scan.service
[Service]
ExecStart=@sbindir@/zed -F
Restart=on-abort
[Install]
Alias=zed.service
WantedBy=zfs.target

View File

@ -1,8 +1,5 @@
[Unit]
Description=ZFS startup target
Requires=zfs-mount.service
Requires=zfs-share.service
Wants=zed.service
[Install]
WantedBy=multi-user.target

View File

@ -248,6 +248,7 @@ typedef struct splitflags {
/* after splitting, import the pool */
int import : 1;
int name_flags;
} splitflags_t;
/*
@ -406,8 +407,15 @@ struct zfs_cmd;
extern const char *zfs_history_event_names[];
typedef enum {
VDEV_NAME_PATH = 1 << 0,
VDEV_NAME_GUID = 1 << 1,
VDEV_NAME_FOLLOW_LINKS = 1 << 2,
VDEV_NAME_TYPE_ID = 1 << 3,
} vdev_name_t;
extern char *zpool_vdev_name(libzfs_handle_t *, zpool_handle_t *, nvlist_t *,
boolean_t verbose);
int name_flags);
extern int zpool_upgrade(zpool_handle_t *, uint64_t);
extern int zpool_get_history(zpool_handle_t *, nvlist_t **);
extern int zpool_history_unpack(char *, uint64_t, uint64_t *,
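
The hunk above replaces the `boolean_t verbose` argument of `zpool_vdev_name()` with an `int name_flags` bitmask built from `vdev_name_t`. The sketch below shows, under stated assumptions, how such flags can be combined and tested. The enum is copied from the hunk so the example compiles on its own; `name_flags_has()` is a hypothetical helper and does not exist in libzfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the hunk above so the sketch is self-contained. */
typedef enum {
	VDEV_NAME_PATH		= 1 << 0,
	VDEV_NAME_GUID		= 1 << 1,
	VDEV_NAME_FOLLOW_LINKS	= 1 << 2,
	VDEV_NAME_TYPE_ID	= 1 << 3,
} vdev_name_t;

/* Hypothetical helper: true when every bit in 'wanted' is set. */
static bool
name_flags_has(int name_flags, int wanted)
{
	assert(wanted != 0);
	return ((name_flags & wanted) == wanted);
}
```

A caller would OR the values together, for example `VDEV_NAME_PATH | VDEV_NAME_FOLLOW_LINKS`, and pass the result where the boolean used to go.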

View File

@ -46,11 +46,6 @@
extern "C" {
#endif
#ifdef VERIFY
#undef VERIFY
#endif
#define VERIFY verify
typedef struct libzfs_fru {
char *zf_device;
char *zf_fru;

View File

@ -37,21 +37,48 @@ typedef unsigned __bitwise__ fmode_t;
#endif /* HAVE_FMODE_T */
/*
* 2.6.36 API change,
* 4.7 - 4.x API,
* The blk_queue_write_cache() interface has replaced blk_queue_flush()
* interface. However, the new interface is GPL-only thus we implement
* our own trivial wrapper when the GPL-only version is detected.
*
* 2.6.36 - 4.6 API,
* The blk_queue_flush() interface has replaced blk_queue_ordered()
* interface. However, while the old interface was available to all the
* new one is GPL-only. Thus if the GPL-only version is detected we
* implement our own trivial helper compatibility funcion. The hope is
* that long term this function will be opened up.
* implement our own trivial helper.
*
* 2.6.x - 2.6.35
* Legacy blk_queue_ordered() interface.
*/
#if defined(HAVE_BLK_QUEUE_FLUSH) && defined(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY)
#define blk_queue_flush __blk_queue_flush
static inline void
__blk_queue_flush(struct request_queue *q, unsigned int flags)
blk_queue_set_write_cache(struct request_queue *q, bool wc, bool fua)
{
q->flush_flags = flags & (REQ_FLUSH | REQ_FUA);
#if defined(HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY)
spin_lock_irq(q->queue_lock);
if (wc)
queue_flag_set(QUEUE_FLAG_WC, q);
else
queue_flag_clear(QUEUE_FLAG_WC, q);
if (fua)
queue_flag_set(QUEUE_FLAG_FUA, q);
else
queue_flag_clear(QUEUE_FLAG_FUA, q);
spin_unlock_irq(q->queue_lock);
#elif defined(HAVE_BLK_QUEUE_WRITE_CACHE)
blk_queue_write_cache(q, wc, fua);
#elif defined(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY)
if (wc)
q->flush_flags |= REQ_FLUSH;
if (fua)
q->flush_flags |= REQ_FUA;
#elif defined(HAVE_BLK_QUEUE_FLUSH)
blk_queue_flush(q, (wc ? REQ_FLUSH : 0) | (fua ? REQ_FUA : 0));
#else
blk_queue_ordered(q, QUEUE_ORDERED_DRAIN, NULL);
#endif
}
#endif /* HAVE_BLK_QUEUE_FLUSH && HAVE_BLK_QUEUE_FLUSH_GPL_ONLY */
/*
* Most of the blk_* macros were removed in 2.6.36. Ostensibly this was
* done to improve readability and allow easier grepping. However, from
@ -234,12 +261,21 @@ bio_set_flags_failfast(struct block_device *bdev, int *flags)
/*
* 2.6.27 API change
* The function was exported for use, prior to this it existed by the
* The function was exported for use, prior to this it existed but the
* symbol was not exported.
*
* 4.4.0-6.21 API change for Ubuntu
* lookup_bdev() gained a second argument, FMODE_*, to check inode permissions.
*/
#ifndef HAVE_LOOKUP_BDEV
#define lookup_bdev(path) ERR_PTR(-ENOTSUP)
#endif
#ifdef HAVE_1ARG_LOOKUP_BDEV
#define vdev_lookup_bdev(path) lookup_bdev(path)
#else
#ifdef HAVE_2ARGS_LOOKUP_BDEV
#define vdev_lookup_bdev(path) lookup_bdev(path, 0)
#else
#define vdev_lookup_bdev(path) ERR_PTR(-ENOTSUP)
#endif /* HAVE_2ARGS_LOOKUP_BDEV */
#endif /* HAVE_1ARG_LOOKUP_BDEV */
/*
* 2.6.30 API change
@ -265,48 +301,172 @@ bio_set_flags_failfast(struct block_device *bdev, int *flags)
#endif /* HAVE_BDEV_LOGICAL_BLOCK_SIZE */
#endif /* HAVE_BDEV_PHYSICAL_BLOCK_SIZE */
#ifndef HAVE_BIO_SET_OP_ATTRS
/*
* 2.6.37 API change
* The WRITE_FLUSH, WRITE_FUA, and WRITE_FLUSH_FUA flags have been
* introduced as a replacement for WRITE_BARRIER. This was done to
* allow richer semantics to be expressed to the block layer. It is
* the block layer's responsibility to choose the correct way to
* implement these semantics.
*
* The existence of these flags implies that REQ_FLUSH and REQ_FUA are
* defined. Thus we can safely define VDEV_REQ_FLUSH and VDEV_REQ_FUA
* compatibility macros.
* Kernels without bio_set_op_attrs use bi_rw for the bio flags.
*/
#ifdef WRITE_FLUSH_FUA
#define VDEV_WRITE_FLUSH_FUA WRITE_FLUSH_FUA
#define VDEV_REQ_FLUSH REQ_FLUSH
#define VDEV_REQ_FUA REQ_FUA
#else
#define VDEV_WRITE_FLUSH_FUA WRITE_BARRIER
#ifdef HAVE_BIO_RW_BARRIER
#define VDEV_REQ_FLUSH (1 << BIO_RW_BARRIER)
#define VDEV_REQ_FUA (1 << BIO_RW_BARRIER)
#else
#define VDEV_REQ_FLUSH REQ_HARDBARRIER
#define VDEV_REQ_FUA REQ_FUA
#endif
static inline void
bio_set_op_attrs(struct bio *bio, unsigned rw, unsigned flags)
{
bio->bi_rw |= rw | flags;
}
#endif
/*
* 2.6.32 API change
* Use the normal I/O path for discards.
* bio_set_flush - Set the appropriate flags in a bio to guarantee
* data are on non-volatile media on completion.
*
* 2.6.X - 2.6.36 API,
* WRITE_BARRIER - Tells the block layer to commit all previously submitted
* writes to stable storage before this one is started and that the current
* write is on stable storage upon completion. Also prevents reordering
* on both sides of the current operation.
*
* 2.6.37 - 4.8 API,
* Introduce WRITE_FLUSH, WRITE_FUA, and WRITE_FLUSH_FUA flags as a
* replacement for WRITE_BARRIER to allow expressing richer semantics
* to the block layer. It's up to the block layer to implement the
* semantics correctly. Use the WRITE_FLUSH_FUA flag combination.
*
* 4.8 - 4.9 API,
* REQ_FLUSH was renamed to REQ_PREFLUSH. For consistency with previous
* ZoL releases, prefer the WRITE_FLUSH_FUA flag set if it's available.
*
* 4.10 API,
* The read/write flags and their modifiers, including WRITE_FLUSH,
* WRITE_FUA and WRITE_FLUSH_FUA were removed from fs.h in
* torvalds/linux@70fd7614 and replaced by direct flag modification
* of the REQ_ flags in bio->bi_opf. Use REQ_PREFLUSH.
*/
#ifdef QUEUE_FLAG_DISCARD
#ifdef HAVE_BIO_RW_DISCARD
#define VDEV_REQ_DISCARD (1 << BIO_RW_DISCARD)
static inline void
bio_set_flush(struct bio *bio)
{
#if defined(REQ_PREFLUSH) /* >= 4.10 */
bio_set_op_attrs(bio, 0, REQ_PREFLUSH);
#elif defined(WRITE_FLUSH_FUA) /* >= 2.6.37 and <= 4.9 */
bio_set_op_attrs(bio, 0, WRITE_FLUSH_FUA);
#elif defined(WRITE_BARRIER) /* < 2.6.37 */
bio_set_op_attrs(bio, 0, WRITE_BARRIER);
#else
#define VDEV_REQ_DISCARD REQ_DISCARD
#error "Allowing the build will cause bio_set_flush requests to be ignored."
"Please file an issue report at: "
"https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_OP_FLUSH
*
* 4.8-rc0 - 4.8-rc1,
* REQ_PREFLUSH
*
* 2.6.36 - 4.7 API,
* REQ_FLUSH
*
* 2.6.x - 2.6.35 API,
* HAVE_BIO_RW_BARRIER
*
* Used to determine if a cache flush has been requested. This check has
* been left intentionally broad in order to cover both a legacy flush
* and the new preflush behavior introduced in Linux 4.8. This is correct
* in all cases but may have a performance impact for some kernels. It
* has the advantage of minimizing kernel specific changes in the zvol code.
*
*/
static inline boolean_t
bio_is_flush(struct bio *bio)
{
#if defined(HAVE_REQ_OP_FLUSH) && defined(HAVE_BIO_BI_OPF)
return ((bio_op(bio) == REQ_OP_FLUSH) || (bio->bi_opf & REQ_PREFLUSH));
#elif defined(REQ_PREFLUSH) && defined(HAVE_BIO_BI_OPF)
return (bio->bi_opf & REQ_PREFLUSH);
#elif defined(REQ_PREFLUSH) && !defined(HAVE_BIO_BI_OPF)
return (bio->bi_rw & REQ_PREFLUSH);
#elif defined(REQ_FLUSH)
return (bio->bi_rw & REQ_FLUSH);
#elif defined(HAVE_BIO_RW_BARRIER)
return (bio->bi_rw & (1 << BIO_RW_BARRIER));
#else
#error "Allowing the build will cause flush requests to be ignored. Please "
"file an issue report at: https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_FUA flag moved to bio->bi_opf
*
* 2.6.x - 4.7 API,
* REQ_FUA
*/
static inline boolean_t
bio_is_fua(struct bio *bio)
{
#if defined(HAVE_BIO_BI_OPF)
return (bio->bi_opf & REQ_FUA);
#elif defined(REQ_FUA)
return (bio->bi_rw & REQ_FUA);
#else
#error "Allowing the build will cause fua requests to be ignored. Please "
"file an issue report at: https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_OP_DISCARD
*
* 2.6.36 - 4.7 API,
* REQ_DISCARD
*
* 2.6.28 - 2.6.35 API,
* BIO_RW_DISCARD
*
* In all cases the normal I/O path is used for discards. The only
* difference is how the kernel tags individual I/Os as discards.
*
* Note that 2.6.32 era kernels provide both BIO_RW_DISCARD and REQ_DISCARD,
* where BIO_RW_DISCARD is the correct interface. Therefore, it is important
* that the HAVE_BIO_RW_DISCARD check occur before the REQ_DISCARD check.
*/
static inline boolean_t
bio_is_discard(struct bio *bio)
{
#if defined(HAVE_REQ_OP_DISCARD)
return (bio_op(bio) == REQ_OP_DISCARD);
#elif defined(HAVE_BIO_RW_DISCARD)
return (bio->bi_rw & (1 << BIO_RW_DISCARD));
#elif defined(REQ_DISCARD)
return (bio->bi_rw & REQ_DISCARD);
#else
#error "Allowing the build will cause discard requests to become writes "
"potentially triggering the DMU_MAX_ACCESS assertion. Please file a "
"potentially triggering the DMU_MAX_ACCESS assertion. Please file "
"an issue report at: https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_OP_SECURE_ERASE
*
* 2.6.36 - 4.7 API,
* REQ_SECURE
*
* 2.6.x - 2.6.35 API,
* Unsupported by kernel
*/
static inline boolean_t
bio_is_secure_erase(struct bio *bio)
{
#if defined(HAVE_REQ_OP_SECURE_ERASE)
return (bio_op(bio) == REQ_OP_SECURE_ERASE);
#elif defined(REQ_SECURE)
return (bio->bi_rw & REQ_SECURE);
#else
return (0);
#endif
}
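
The `HAVE_BLK_QUEUE_FLUSH` branch of `blk_queue_set_write_cache()` in this file turns the two booleans `wc` and `fua` into a single flag word. The userspace sketch below illustrates only that mapping. `MOCK_REQ_FLUSH` and `MOCK_REQ_FUA` are hypothetical stand-ins, since the real `REQ_FLUSH` and `REQ_FUA` values are defined by the kernel, and `flush_flags_from_cache_mode()` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's REQ_FLUSH and REQ_FUA bits. */
#define	MOCK_REQ_FLUSH	(1u << 0)
#define	MOCK_REQ_FUA	(1u << 1)

/* Map the (write cache, FUA) pair onto a flush flag word. */
static unsigned int
flush_flags_from_cache_mode(bool wc, bool fua)
{
	unsigned int flags =
	    (wc ? MOCK_REQ_FLUSH : 0u) | (fua ? MOCK_REQ_FUA : 0u);

	/* Only the two known bits may ever be set. */
	assert((flags & ~(MOCK_REQ_FLUSH | MOCK_REQ_FUA)) == 0);
	return (flags);
}
```

Keeping the mapping in one place is what lets callers use a single `(q, wc, fua)` signature regardless of which kernel interface is detected.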
/*
* 2.6.33 API change

View File

@ -69,45 +69,115 @@ truncate_setsize(struct inode *ip, loff_t new)
/*
* 2.6.32 - 2.6.33, bdi_setup_and_register() is not available.
* 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
* 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.
* 4.0 - 4.11, bdi_setup_and_register() takes 2 arguments.
* 4.12 - x.y, super_setup_bdi_name() new interface.
*/
#if defined(HAVE_2ARGS_BDI_SETUP_AND_REGISTER)
#if defined(HAVE_SUPER_SETUP_BDI_NAME)
extern atomic_long_t zfs_bdi_seq;
static inline int
zpl_bdi_setup_and_register(struct backing_dev_info *bdi, char *name)
zpl_bdi_setup(struct super_block *sb, char *name)
{
return (bdi_setup_and_register(bdi, name));
return super_setup_bdi_name(sb, "%.28s-%ld", name,
atomic_long_inc_return(&zfs_bdi_seq));
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
}
#elif defined(HAVE_2ARGS_BDI_SETUP_AND_REGISTER)
static inline int
zpl_bdi_setup(struct super_block *sb, char *name)
{
struct backing_dev_info *bdi;
int error;
bdi = kmem_zalloc(sizeof (struct backing_dev_info), KM_SLEEP);
error = bdi_setup_and_register(bdi, name);
if (error) {
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
sb->s_bdi = bdi;
return (0);
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
struct backing_dev_info *bdi = sb->s_bdi;
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
sb->s_bdi = NULL;
}
#elif defined(HAVE_3ARGS_BDI_SETUP_AND_REGISTER)
static inline int
zpl_bdi_setup_and_register(struct backing_dev_info *bdi, char *name)
zpl_bdi_setup(struct super_block *sb, char *name)
{
return (bdi_setup_and_register(bdi, name, BDI_CAP_MAP_COPY));
struct backing_dev_info *bdi;
int error;
bdi = kmem_zalloc(sizeof (struct backing_dev_info), KM_SLEEP);
error = bdi_setup_and_register(bdi, name, BDI_CAP_MAP_COPY);
if (error) {
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
sb->s_bdi = bdi;
return (0);
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
struct backing_dev_info *bdi = sb->s_bdi;
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
sb->s_bdi = NULL;
}
#else
extern atomic_long_t zfs_bdi_seq;
static inline int
zpl_bdi_setup_and_register(struct backing_dev_info *bdi, char *name)
zpl_bdi_setup(struct super_block *sb, char *name)
{
char tmp[32];
struct backing_dev_info *bdi;
int error;
bdi = kmem_zalloc(sizeof (struct backing_dev_info), KM_SLEEP);
bdi->name = name;
bdi->capabilities = BDI_CAP_MAP_COPY;
error = bdi_init(bdi);
if (error)
return (error);
sprintf(tmp, "%.28s%s", name, "-%d");
error = bdi_register(bdi, NULL, tmp,
atomic_long_inc_return(&zfs_bdi_seq));
if (error) {
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
return (error);
error = bdi_register(bdi, NULL, "%.28s-%ld", name,
atomic_long_inc_return(&zfs_bdi_seq));
if (error) {
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
sb->s_bdi = bdi;
return (0);
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
struct backing_dev_info *bdi = sb->s_bdi;
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
sb->s_bdi = NULL;
}
#endif
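The `"%.28s-%ld"` format passed to `super_setup_bdi_name()` and `bdi_register()` above keeps at most the first 28 characters of the caller-supplied name before the sequence number is appended. The following is a minimal userland sketch of that formatting only; `demo_bdi_name` is a hypothetical helper, and a plain `long` stands in for the value returned by `atomic_long_inc_return(&zfs_bdi_seq)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of ZFS): format "<name>-<seq>" into buf,
 * keeping at most the first 28 characters of name.  Returns the length
 * reported by snprintf().
 */
static int
demo_bdi_name(char *buf, size_t buflen, const char *name, long seq)
{
	assert(buf != NULL && name != NULL);
	return (snprintf(buf, buflen, "%.28s-%ld", name, seq));
}
```

For example, `demo_bdi_name(buf, sizeof (buf), "zfs", 7)` writes `zfs-7`, while a 36-character name is cut to its first 28 characters before `-<seq>` is appended.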
@ -202,22 +272,11 @@ lseek_execute(
* At 60 seconds the kernel will also begin issuing RCU stall warnings.
*/
#include <linux/posix_acl.h>
#ifndef HAVE_POSIX_ACL_CACHING
#define ACL_NOT_CACHED ((void *)(-1))
#endif /* HAVE_POSIX_ACL_CACHING */
#if defined(HAVE_POSIX_ACL_RELEASE) && !defined(HAVE_POSIX_ACL_RELEASE_GPL_ONLY)
#define zpl_posix_acl_release(arg) posix_acl_release(arg)
#define zpl_set_cached_acl(ip, ty, n) set_cached_acl(ip, ty, n)
#define zpl_forget_cached_acl(ip, ty) forget_cached_acl(ip, ty)
#else
static inline void
zpl_posix_acl_free(void *arg) {
kfree(arg);
}
void zpl_posix_acl_release_impl(struct posix_acl *);
static inline void
zpl_posix_acl_release(struct posix_acl *acl)
@ -225,15 +284,17 @@ zpl_posix_acl_release(struct posix_acl *acl)
if ((acl == NULL) || (acl == ACL_NOT_CACHED))
return;
if (atomic_dec_and_test(&acl->a_refcount)) {
taskq_dispatch_delay(system_taskq, zpl_posix_acl_free, acl,
TQ_SLEEP, ddi_get_lbolt() + 60*HZ);
}
if (atomic_dec_and_test(&acl->a_refcount))
zpl_posix_acl_release_impl(acl);
}
#endif /* HAVE_POSIX_ACL_RELEASE */
#ifdef HAVE_SET_CACHED_ACL_USABLE
#define zpl_set_cached_acl(ip, ty, n) set_cached_acl(ip, ty, n)
#define zpl_forget_cached_acl(ip, ty) forget_cached_acl(ip, ty)
#else
static inline void
zpl_set_cached_acl(struct inode *ip, int type, struct posix_acl *newer) {
#ifdef HAVE_POSIX_ACL_CACHING
struct posix_acl *older = NULL;
spin_lock(&ip->i_lock);
@ -255,14 +316,13 @@ zpl_set_cached_acl(struct inode *ip, int type, struct posix_acl *newer) {
spin_unlock(&ip->i_lock);
zpl_posix_acl_release(older);
#endif /* HAVE_POSIX_ACL_CACHING */
}
static inline void
zpl_forget_cached_acl(struct inode *ip, int type) {
zpl_set_cached_acl(ip, type, (struct posix_acl *)ACL_NOT_CACHED);
}
#endif /* HAVE_POSIX_ACL_RELEASE */
#endif /* HAVE_SET_CACHED_ACL_USABLE */
#ifndef HAVE___POSIX_ACL_CHMOD
#ifdef HAVE_POSIX_ACL_CHMOD
@ -320,15 +380,19 @@ typedef umode_t zpl_equivmode_t;
#else
typedef mode_t zpl_equivmode_t;
#endif /* HAVE_POSIX_ACL_EQUIV_MODE_UMODE_T */
#endif /* CONFIG_FS_POSIX_ACL */
#ifndef HAVE_CURRENT_UMASK
static inline int
current_umask(void)
{
return (current->fs->umask);
}
#endif /* HAVE_CURRENT_UMASK */
/*
* 4.8 API change,
* posix_acl_valid() must now be passed a namespace; the namespace from the
* super block associated with the given inode is used for this purpose.
*/
#ifdef HAVE_POSIX_ACL_VALID_WITH_NS
#define zpl_posix_acl_valid(ip, acl) posix_acl_valid(ip->i_sb->s_user_ns, acl)
#else
#define zpl_posix_acl_valid(ip, acl) posix_acl_valid(acl)
#endif
#endif /* CONFIG_FS_POSIX_ACL */
/*
* 2.6.38 API change,
@ -363,4 +427,69 @@ static inline struct inode *file_inode(const struct file *f)
#define zpl_follow_up(path) follow_up(path)
#endif
/*
* 4.9 API change
*/
#ifndef HAVE_SETATTR_PREPARE
static inline int
setattr_prepare(struct dentry *dentry, struct iattr *ia)
{
return (inode_change_ok(dentry->d_inode, ia));
}
#endif
/*
* 4.11 API change
* These macros are defined by kernel 4.11. We define them so that the same
* code builds under kernels < 4.11 and >= 4.11. The macros are set to 0 so
* that they create obvious failures if they are accidentally used when built
* against a kernel >= 4.11.
*/
#ifndef STATX_BASIC_STATS
#define STATX_BASIC_STATS 0
#endif
#ifndef AT_STATX_SYNC_AS_STAT
#define AT_STATX_SYNC_AS_STAT 0
#endif
/*
* 4.11 API change
* 4.11 takes struct path *, < 4.11 takes vfsmount *
*/
#ifdef HAVE_VFSMOUNT_IOPS_GETATTR
#define ZPL_GETATTR_WRAPPER(func) \
static int \
func(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) \
{ \
struct path path = { .mnt = mnt, .dentry = dentry }; \
return func##_impl(&path, stat, STATX_BASIC_STATS, \
AT_STATX_SYNC_AS_STAT); \
}
#elif defined(HAVE_PATH_IOPS_GETATTR)
#define ZPL_GETATTR_WRAPPER(func) \
static int \
func(const struct path *path, struct kstat *stat, u32 request_mask, \
unsigned int query_flags) \
{ \
return (func##_impl(path, stat, request_mask, query_flags)); \
}
#else
#error
#endif
/*
* 4.9 API change
* Preferred interface to get the current FS time.
*/
#if !defined(HAVE_CURRENT_TIME)
static inline struct timespec
current_time(struct inode *ip)
{
return (timespec_trunc(current_kernel_time(), ip->i_sb->s_time_gran));
}
#endif
#endif /* _ZFS_VFS_H */
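`ZPL_GETATTR_WRAPPER` above relies on token pasting (`func##_impl`) to generate an entry point with whichever signature the target kernel expects, forwarding to a single implementation. A userland sketch of the same pattern follows; every `demo_` name and `DEMO_` constant is hypothetical and stands in for kernel types that are not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's kstat and request mask. */
struct demo_stat {
	int size;
};
#define	DEMO_BASIC_STATS	0x7ffU

/*
 * Generate `func` with the older two-argument signature and forward to
 * func##_impl, supplying a default for the argument the old API lacks.
 */
#define	DEMO_GETATTR_WRAPPER(func)				\
static int							\
func(int obj, struct demo_stat *st)				\
{								\
	return (func##_impl(obj, st, DEMO_BASIC_STATS));	\
}

/* The single implementation, written against the newer signature. */
static int
demo_getattr_impl(int obj, struct demo_stat *st, unsigned int mask)
{
	assert(st != NULL);
	st->size = (mask != 0) ? obj * 2 : 0;
	return (0);
}

/* Expands to a definition of demo_getattr(int, struct demo_stat *). */
DEMO_GETATTR_WRAPPER(demo_getattr)
```

Callers use `demo_getattr(obj, &st)` and never see the extra argument, which is the property the real wrapper provides for the pre-4.11 `getattr` signature.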


@ -41,12 +41,99 @@ typedef const struct xattr_handler xattr_handler_t;
typedef struct xattr_handler xattr_handler_t;
#endif
/*
* 3.7 API change,
* Preferred XATTR_NAME_* definitions introduced, these are mapped to
* the previous definitions for older kernels.
*/
#ifndef XATTR_NAME_POSIX_ACL_DEFAULT
#define XATTR_NAME_POSIX_ACL_DEFAULT POSIX_ACL_XATTR_DEFAULT
#endif
#ifndef XATTR_NAME_POSIX_ACL_ACCESS
#define XATTR_NAME_POSIX_ACL_ACCESS POSIX_ACL_XATTR_ACCESS
#endif
/*
* 4.5 API change,
*/
#if defined(HAVE_XATTR_LIST_SIMPLE)
#define ZPL_XATTR_LIST_WRAPPER(fn) \
static bool \
fn(struct dentry *dentry) \
{ \
return (!!__ ## fn(dentry->d_inode, NULL, 0, NULL, 0)); \
}
/*
* 4.4 API change,
*/
#elif defined(HAVE_XATTR_LIST_DENTRY)
#define ZPL_XATTR_LIST_WRAPPER(fn) \
static size_t \
fn(struct dentry *dentry, char *list, size_t list_size, \
const char *name, size_t name_len, int type) \
{ \
return (__ ## fn(dentry->d_inode, \
list, list_size, name, name_len)); \
}
/*
* 2.6.33 API change,
* The xattr_handler->get() callback was changed to take a dentry
*/
#elif defined(HAVE_XATTR_LIST_HANDLER)
#define ZPL_XATTR_LIST_WRAPPER(fn) \
static size_t \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
char *list, size_t list_size, const char *name, size_t name_len) \
{ \
return (__ ## fn(dentry->d_inode, \
list, list_size, name, name_len)); \
}
/*
* 2.6.32 API
*/
#elif defined(HAVE_XATTR_LIST_INODE)
#define ZPL_XATTR_LIST_WRAPPER(fn) \
static size_t \
fn(struct inode *ip, char *list, size_t list_size, \
const char *name, size_t name_len) \
{ \
return (__ ## fn(ip, list, list_size, name, name_len)); \
}
#endif
/*
* 4.7 API change,
* The xattr_handler->get() callback was changed to take both a dentry and
* an inode, because the dentry might not be attached to an inode yet.
*/
#if defined(HAVE_XATTR_GET_DENTRY_INODE)
#define ZPL_XATTR_GET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
struct inode *inode, const char *name, void *buffer, size_t size) \
{ \
return (__ ## fn(inode, name, buffer, size)); \
}
/*
* 4.4 API change,
* The xattr_handler->get() callback was changed to take a xattr_handler,
* and the handler_flags argument was removed; it should now be accessed
* through handler->flags.
*/
#elif defined(HAVE_XATTR_GET_HANDLER)
#define ZPL_XATTR_GET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
const char *name, void *buffer, size_t size) \
{ \
return (__ ## fn(dentry->d_inode, name, buffer, size)); \
}
/*
* 2.6.33 API change,
* The xattr_handler->get() callback was changed to take a dentry
* instead of an inode, and a handler_flags argument was added.
*/
#ifdef HAVE_DENTRY_XATTR_GET
#elif defined(HAVE_XATTR_GET_DENTRY)
#define ZPL_XATTR_GET_WRAPPER(fn) \
static int \
fn(struct dentry *dentry, const char *name, void *buffer, size_t size, \
@ -54,21 +141,52 @@ fn(struct dentry *dentry, const char *name, void *buffer, size_t size, \
{ \
return (__ ## fn(dentry->d_inode, name, buffer, size)); \
}
#else
/*
* 2.6.32 API
*/
#elif defined(HAVE_XATTR_GET_INODE)
#define ZPL_XATTR_GET_WRAPPER(fn) \
static int \
fn(struct inode *ip, const char *name, void *buffer, size_t size) \
{ \
return (__ ## fn(ip, name, buffer, size)); \
}
#endif /* HAVE_DENTRY_XATTR_GET */
#endif
/*
* 4.7 API change,
* The xattr_handler->set() callback was changed to take both a dentry and
* an inode, because the dentry might not be attached to an inode yet.
*/
#if defined(HAVE_XATTR_SET_DENTRY_INODE)
#define ZPL_XATTR_SET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
struct inode *inode, const char *name, const void *buffer, \
size_t size, int flags) \
{ \
return (__ ## fn(inode, name, buffer, size, flags)); \
}
/*
* 4.4 API change,
* The xattr_handler->set() callback was changed to take a xattr_handler,
* and the handler_flags argument was removed; it should now be accessed
* through handler->flags.
*/
#elif defined(HAVE_XATTR_SET_HANDLER)
#define ZPL_XATTR_SET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
const char *name, const void *buffer, size_t size, int flags) \
{ \
return (__ ## fn(dentry->d_inode, name, buffer, size, flags)); \
}
/*
* 2.6.33 API change,
* The xattr_hander->set() callback was changed to take a dentry
* The xattr_handler->set() callback was changed to take a dentry
* instead of an inode, and a handler_flags argument was added.
*/
#ifdef HAVE_DENTRY_XATTR_SET
#elif defined(HAVE_XATTR_SET_DENTRY)
#define ZPL_XATTR_SET_WRAPPER(fn) \
static int \
fn(struct dentry *dentry, const char *name, const void *buffer, \
@ -76,7 +194,10 @@ fn(struct dentry *dentry, const char *name, const void *buffer, \
{ \
return (__ ## fn(dentry->d_inode, name, buffer, size, flags)); \
}
#else
/*
* 2.6.32 API
*/
#elif defined(HAVE_XATTR_SET_INODE)
#define ZPL_XATTR_SET_WRAPPER(fn) \
static int \
fn(struct inode *ip, const char *name, const void *buffer, \
@ -84,7 +205,7 @@ fn(struct inode *ip, const char *name, const void *buffer, \
{ \
return (__ ## fn(ip, name, buffer, size, flags)); \
}
#endif /* HAVE_DENTRY_XATTR_SET */
#endif
#ifdef HAVE_6ARGS_SECURITY_INODE_INIT_SECURITY
#define zpl_security_inode_init_security(ip, dip, qstr, nm, val, len) \
@ -96,20 +217,20 @@ fn(struct inode *ip, const char *name, const void *buffer, \
/*
* Linux 3.7 API change. posix_acl_{from,to}_xattr gained the user_ns
* parameter. For the HAVE_POSIX_ACL_FROM_XATTR_USERNS version the
* userns _may_ not be correct because it's used outside the RCU.
* parameter. All callers are expected to pass the &init_user_ns which
* is available through the init credential (kcred).
*/
#ifdef HAVE_POSIX_ACL_FROM_XATTR_USERNS
static inline struct posix_acl *
zpl_acl_from_xattr(const void *value, int size)
{
return (posix_acl_from_xattr(CRED()->user_ns, value, size));
return (posix_acl_from_xattr(kcred->user_ns, value, size));
}
static inline int
zpl_acl_to_xattr(struct posix_acl *acl, void *value, int size)
{
return (posix_acl_to_xattr(CRED()->user_ns, acl, value, size));
return (posix_acl_to_xattr(kcred->user_ns, acl, value, size));
}
#else


@ -230,9 +230,25 @@ typedef struct dmu_buf_impl {
/* User callback information. */
dmu_buf_user_t *db_user;
uint8_t db_immediate_evict;
/*
* Evict user data as soon as the dirty and reference
* counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* This block was freed while a read or write was
* active.
*/
uint8_t db_freed_in_flight;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to
* evict this dbuf, but couldn't due to outstanding
* references. Evict once the refcount drops to 0.
*/
uint8_t db_pending_evict;
uint8_t db_dirtycnt;
} dmu_buf_impl_t;


@ -93,7 +93,6 @@ struct objset {
uint8_t os_copies;
enum zio_checksum os_dedup_checksum;
boolean_t os_dedup_verify;
boolean_t os_evicting;
zfs_logbias_op_t os_logbias;
zfs_cache_type_t os_primary_cache;
zfs_cache_type_t os_secondary_cache;


@ -221,6 +221,7 @@ int dsl_dataset_own_obj(struct dsl_pool *dp, uint64_t dsobj,
void dsl_dataset_disown(dsl_dataset_t *ds, void *tag);
void dsl_dataset_name(dsl_dataset_t *ds, char *name);
boolean_t dsl_dataset_tryown(dsl_dataset_t *ds, void *tag);
int dsl_dataset_namelen(dsl_dataset_t *ds);
uint64_t dsl_dataset_create_sync(dsl_dir_t *pds, const char *lastname,
dsl_dataset_t *origin, uint64_t flags, cred_t *, dmu_tx_t *);
uint64_t dsl_dataset_create_sync_dd(dsl_dir_t *dd, dsl_dataset_t *origin,


@ -69,6 +69,7 @@ typedef enum dmu_objset_type {
#define ZAP_MAXNAMELEN 256
#define ZAP_MAXVALUELEN (1024 * 8)
#define ZAP_OLDMAXVALUELEN 1024
#define ZFS_MAX_DATASET_NAME_LEN 256
/*
* Dataset properties are identified by these constants and must be added to


@ -68,8 +68,9 @@
#define MNTOPT_NOFAIL "nofail" /* no failure */
#define MNTOPT_RELATIME "relatime" /* allow relative time updates */
#define MNTOPT_NORELATIME "norelatime" /* do not allow relative time updates */
#define MNTOPT_DFRATIME "strictatime" /* Deferred access time updates */
#define MNTOPT_NODFRATIME "nostrictatime" /* No Deferred access time updates */
#define MNTOPT_STRICTATIME "strictatime" /* strict access time updates */
#define MNTOPT_NOSTRICTATIME "nostrictatime" /* No strict access time updates */
#define MNTOPT_LAZYTIME "lazytime" /* Defer access time writing */
#define MNTOPT_SETUID "suid" /* Both setuid and devices allowed */
#define MNTOPT_NOSETUID "nosuid" /* Neither setuid nor devices allowed */
#define MNTOPT_OWNER "owner" /* allow owner mount */


@ -40,6 +40,17 @@ extern "C" {
*/
#define FTAG ((char *)__func__)
/*
* Starting with 4.11, torvalds/linux@f405df5, the Linux kernel defines a
* refcount_t type of its own. The macro below effectively changes references
* in the ZFS code from refcount_t to zfs_refcount_t at compile time, so that
* existing code need not be altered, reducing conflicts when landing OpenZFS
* patches.
*/
#define refcount_t zfs_refcount_t
#define refcount_add zfs_refcount_add
#ifdef ZFS_DEBUG
typedef struct reference {
list_node_t ref_link;
@ -55,7 +66,7 @@ typedef struct refcount {
list_t rc_removed;
int64_t rc_count;
int64_t rc_removed_count;
} refcount_t;
} zfs_refcount_t;
/* Note: refcount_t must be initialized with refcount_create[_untracked]() */
@ -65,7 +76,7 @@ void refcount_destroy(refcount_t *rc);
void refcount_destroy_many(refcount_t *rc, uint64_t number);
int refcount_is_zero(refcount_t *rc);
int64_t refcount_count(refcount_t *rc);
int64_t refcount_add(refcount_t *rc, void *holder_tag);
int64_t zfs_refcount_add(refcount_t *rc, void *holder_tag);
int64_t refcount_remove(refcount_t *rc, void *holder_tag);
int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
@ -86,7 +97,7 @@ typedef struct refcount {
#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
#define refcount_is_zero(rc) ((rc)->rc_count == 0)
#define refcount_count(rc) ((rc)->rc_count)
#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
#define zfs_refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1)
#define refcount_add_many(rc, number, holder) \
atomic_add_64_nv(&(rc)->rc_count, number)
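The `#define refcount_t zfs_refcount_t` approach above renames the ZFS type at preprocessing time, so it no longer collides with a type of the same name owned by another header. A small sketch of the effect, assuming a stand-in `refcount_t` typedef in place of the kernel's and a hypothetical `demo_hold` function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a refcount_t that some other header already defines. */
typedef struct {
	int refs;
} refcount_t;

/* From here on, every use of refcount_t is rewritten to zfs_refcount_t. */
#define	refcount_t	zfs_refcount_t

typedef struct refcount {
	int64_t rc_count;
} refcount_t;	/* after expansion this declares zfs_refcount_t */

/* Hypothetical function: take one hold and return the new count. */
static int64_t
demo_hold(refcount_t *rc)
{
	assert(rc != NULL);
	return (++rc->rc_count);
}
```

Existing code that spells the type `refcount_t` keeps compiling unchanged, which is the stated goal of the macro: fewer conflicts when porting patches.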


@ -82,6 +82,10 @@ typedef struct sa_bulk_attr {
uint16_t sa_size;
} sa_bulk_attr_t;
/*
* The on-disk format of sa_hdr_phys_t limits SA lengths to 16-bit values.
*/
#define SA_ATTR_MAX_LEN UINT16_MAX
/*
* special macro for adding entries for bulk attr support
@ -95,6 +99,7 @@ typedef struct sa_bulk_attr {
#define SA_ADD_BULK_ATTR(b, idx, attr, func, data, len) \
{ \
ASSERT3U(len, <=, SA_ATTR_MAX_LEN); \
b[idx].sa_attr = attr;\
b[idx].sa_data_func = func; \
b[idx].sa_data = data; \


@ -446,6 +446,10 @@ _NOTE(CONSTCOND) } while (0)
((zc1).zc_word[2] - (zc2).zc_word[2]) | \
((zc1).zc_word[3] - (zc2).zc_word[3])))
#define ZIO_CHECKSUM_IS_ZERO(zc) \
(0 == ((zc)->zc_word[0] | (zc)->zc_word[1] | \
(zc)->zc_word[2] | (zc)->zc_word[3]))
#define DVA_IS_VALID(dva) (DVA_GET_ASIZE(dva) != 0)
#define ZIO_SET_CHECKSUM(zcp, w0, w1, w2, w3) \
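`ZIO_CHECKSUM_IS_ZERO` above ORs the four 64-bit checksum words together, so the result is zero only when every word is zero. A self-contained sketch, assuming a simplified `demo_cksum_t` in place of `zio_cksum_t` and a hypothetical `demo_cksum_is_zero` wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for zio_cksum_t: four 64-bit words. */
typedef struct demo_cksum {
	uint64_t zc_word[4];
} demo_cksum_t;

/* Same shape as the macro in the hunk, under a demo name. */
#define	DEMO_CHECKSUM_IS_ZERO(zc) \
	(0 == ((zc)->zc_word[0] | (zc)->zc_word[1] | \
	(zc)->zc_word[2] | (zc)->zc_word[3]))

/* Returns 1 when all four words are zero, 0 otherwise. */
static int
demo_cksum_is_zero(const demo_cksum_t *zc)
{
	assert(zc != NULL);
	return (DEMO_CHECKSUM_IS_ZERO(zc));
}
```

A single OR chain avoids four separate comparisons and cannot miss a set bit in any word, including the top bit.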


@ -23,6 +23,7 @@
* Copyright (c) 2011, 2015 by Delphix. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright (c) 2016 Actifio, Inc. All rights reserved.
*/
#ifndef _SYS_SPA_IMPL_H
@ -252,6 +253,7 @@ struct spa {
uint64_t spa_deadman_synctime; /* deadman expiration timer */
uint64_t spa_errata; /* errata issues detected */
spa_stats_t spa_stats; /* assorted spa statistics */
taskq_t *spa_zvol_taskq; /* Taskq for minor management */
/*
* spa_refcount & spa_config_lock must be the last elements


@ -56,7 +56,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__field(uint64_t, z_mapcnt)
__field(uint64_t, z_gen)
__field(uint64_t, z_size)
__array(uint64_t, z_atime, 2)
__field(uint64_t, z_links)
__field(uint64_t, z_pflags)
__field(uint64_t, z_uid)
@ -64,7 +63,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__field(uint32_t, z_sync_cnt)
__field(mode_t, z_mode)
__field(boolean_t, z_is_sa)
__field(boolean_t, z_is_zvol)
__field(boolean_t, z_is_mapped)
__field(boolean_t, z_is_ctldir)
__field(boolean_t, z_is_stale)
@ -95,8 +93,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_mapcnt = zn->z_mapcnt;
__entry->z_gen = zn->z_gen;
__entry->z_size = zn->z_size;
__entry->z_atime[0] = zn->z_atime[0];
__entry->z_atime[1] = zn->z_atime[1];
__entry->z_links = zn->z_links;
__entry->z_pflags = zn->z_pflags;
__entry->z_uid = zn->z_uid;
@ -104,7 +100,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_sync_cnt = zn->z_sync_cnt;
__entry->z_mode = zn->z_mode;
__entry->z_is_sa = zn->z_is_sa;
__entry->z_is_zvol = zn->z_is_zvol;
__entry->z_is_mapped = zn->z_is_mapped;
__entry->z_is_ctldir = zn->z_is_ctldir;
__entry->z_is_stale = zn->z_is_stale;
@ -126,9 +121,9 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
),
TP_printk("zn { id %llu unlinked %u atime_dirty %u "
"zn_prefetch %u moved %u blksz %u seq %u "
"mapcnt %llu gen %llu size %llu atime 0x%llx:0x%llx "
"mapcnt %llu gen %llu size %llu "
"links %llu pflags %llu uid %llu gid %llu "
"sync_cnt %u mode 0x%x is_sa %d is_zvol %d "
"sync_cnt %u mode 0x%x is_sa %d "
"is_mapped %d is_ctldir %d is_stale %d inode { "
"ino %lu nlink %u version %llu size %lli blkbits %u "
"bytes %u mode 0x%x generation %x } } ace { type %u "
@ -136,10 +131,10 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_id, __entry->z_unlinked, __entry->z_atime_dirty,
__entry->z_zn_prefetch, __entry->z_moved, __entry->z_blksz,
__entry->z_seq, __entry->z_mapcnt, __entry->z_gen,
__entry->z_size, __entry->z_atime[0], __entry->z_atime[1],
__entry->z_size,
__entry->z_links, __entry->z_pflags, __entry->z_uid,
__entry->z_gid, __entry->z_sync_cnt, __entry->z_mode,
__entry->z_is_sa, __entry->z_is_zvol, __entry->z_is_mapped,
__entry->z_is_sa, __entry->z_is_mapped,
__entry->z_is_ctldir, __entry->z_is_stale, __entry->i_ino,
__entry->i_nlink, __entry->i_version, __entry->i_size,
__entry->i_blkbits, __entry->i_bytes, __entry->i_mode,


@ -37,9 +37,5 @@ typedef struct vdev_disk {
struct block_device *vd_bdev;
} vdev_disk_t;
extern int vdev_disk_physio(struct block_device *, caddr_t,
size_t, uint64_t, int);
extern int vdev_disk_read_rootlabel(char *, char *, nvlist_t **);
#endif /* _KERNEL */
#endif /* _SYS_VDEV_DISK_H */


@ -225,7 +225,7 @@ typedef struct xvattr {
* of requested attributes (xva_reqattrmap[]).
*/
#define XVA_SET_REQ(xvap, attr) \
ASSERT((xvap)->xva_vattr.va_mask | AT_XVATTR); \
ASSERT((xvap)->xva_vattr.va_mask & AT_XVATTR); \
ASSERT((xvap)->xva_magic == XVA_MAGIC); \
(xvap)->xva_reqattrmap[XVA_INDEX(attr)] |= XVA_ATTRBIT(attr)
/*
@ -233,7 +233,7 @@ typedef struct xvattr {
* of requested attributes (xva_reqattrmap[]).
*/
#define XVA_CLR_REQ(xvap, attr) \
ASSERT((xvap)->xva_vattr.va_mask | AT_XVATTR); \
ASSERT((xvap)->xva_vattr.va_mask & AT_XVATTR); \
ASSERT((xvap)->xva_magic == XVA_MAGIC); \
(xvap)->xva_reqattrmap[XVA_INDEX(attr)] &= ~XVA_ATTRBIT(attr)
@ -242,7 +242,7 @@ typedef struct xvattr {
* of returned attributes (xva_rtnattrmap[]).
*/
#define XVA_SET_RTN(xvap, attr) \
ASSERT((xvap)->xva_vattr.va_mask | AT_XVATTR); \
ASSERT((xvap)->xva_vattr.va_mask & AT_XVATTR); \
ASSERT((xvap)->xva_magic == XVA_MAGIC); \
(XVA_RTNATTRMAP(xvap))[XVA_INDEX(attr)] |= XVA_ATTRBIT(attr)
@ -251,7 +251,7 @@ typedef struct xvattr {
* to see if the corresponding attribute bit is set. If so, returns non-zero.
*/
#define XVA_ISSET_REQ(xvap, attr) \
((((xvap)->xva_vattr.va_mask | AT_XVATTR) && \
((((xvap)->xva_vattr.va_mask & AT_XVATTR) && \
((xvap)->xva_magic == XVA_MAGIC) && \
((xvap)->xva_mapsize > XVA_INDEX(attr))) ? \
((xvap)->xva_reqattrmap[XVA_INDEX(attr)] & XVA_ATTRBIT(attr)) : 0)
@ -261,7 +261,7 @@ typedef struct xvattr {
* to see if the corresponding attribute bit is set. If so, returns non-zero.
*/
#define XVA_ISSET_RTN(xvap, attr) \
((((xvap)->xva_vattr.va_mask | AT_XVATTR) && \
((((xvap)->xva_vattr.va_mask & AT_XVATTR) && \
((xvap)->xva_magic == XVA_MAGIC) && \
((xvap)->xva_mapsize > XVA_INDEX(attr))) ? \
((XVA_RTNATTRMAP(xvap))[XVA_INDEX(attr)] & XVA_ATTRBIT(attr)) : 0)
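The hunks above change `va_mask | AT_XVATTR` to `va_mask & AT_XVATTR`. OR-ing a mask with a nonzero constant is always nonzero, so the old assertions and `ISSET` guards could never fail. The sketch below contrasts the two forms; `DEMO_AT_XVATTR` and both functions are hypothetical, and the flag value is chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_AT_XVATTR	0x10000U	/* hypothetical flag value */

/* Old form: OR-ing in a nonzero constant can never yield zero. */
static int
demo_has_flag_or(uint32_t mask)
{
	return ((mask | DEMO_AT_XVATTR) != 0);
}

/* Fixed form: AND isolates the flag bit, so an unset flag is detected. */
static int
demo_has_flag_and(uint32_t mask)
{
	return ((mask & DEMO_AT_XVATTR) != 0);
}
```

With a mask of 0, `demo_has_flag_or` still reports the flag as present while `demo_has_flag_and` correctly reports it as absent.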


@ -213,6 +213,7 @@ int zap_lookup_norm(objset_t *ds, uint64_t zapobj, const char *name,
int zap_lookup_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints, uint64_t integer_size, uint64_t num_integers, void *buf);
int zap_contains(objset_t *ds, uint64_t zapobj, const char *name);
int zap_prefetch(objset_t *os, uint64_t zapobj, const char *name);
int zap_prefetch_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints);


@ -77,7 +77,8 @@ extern int zfsctl_snapdir_mkdir(struct inode *dip, char *dirname, vattr_t *vap,
extern void zfsctl_snapdir_inactive(struct inode *ip);
extern int zfsctl_snapshot_mount(struct path *path, int flags);
extern int zfsctl_snapshot_unmount(char *snapname, int flags);
extern int zfsctl_snapshot_unmount_delay(uint64_t objsetid, int delay);
extern int zfsctl_snapshot_unmount_delay(spa_t *spa, uint64_t objsetid,
int delay);
extern int zfsctl_lookup_objset(struct super_block *sb, uint64_t objsetid,
zfs_sb_t **zsb);


@ -32,7 +32,9 @@ extern "C" {
#ifdef _KERNEL
#include <sys/zfs_znode.h>
#include <sys/list.h>
#include <sys/avl.h>
#include <sys/condvar.h>
typedef enum {
RL_READER,
@ -40,8 +42,16 @@ typedef enum {
RL_APPEND
} rl_type_t;
typedef struct zfs_rlock {
kmutex_t zr_mutex; /* protects changes to zr_avl */
avl_tree_t zr_avl; /* avl tree of range locks */
uint64_t *zr_size; /* points to znode->z_size */
uint_t *zr_blksz; /* points to znode->z_blksz */
uint64_t *zr_max_blksz; /* points to zsb->z_max_blksz */
} zfs_rlock_t;
typedef struct rl {
znode_t *r_zp; /* znode this lock applies to */
zfs_rlock_t *r_zrl;
avl_node_t r_node; /* avl node link */
uint64_t r_off; /* file range offset */
uint64_t r_len; /* file range length */
@ -61,7 +71,8 @@ typedef struct rl {
* is converted to RL_WRITER and specifies a lock starting at the
* end of file. Returns the range lock structure.
*/
rl_t *zfs_range_lock(znode_t *zp, uint64_t off, uint64_t len, rl_type_t type);
rl_t *zfs_range_lock(zfs_rlock_t *zrl, uint64_t off, uint64_t len,
rl_type_t type);
/* Unlock range and destroy range lock structure. */
void zfs_range_unlock(rl_t *rl);
@ -78,6 +89,23 @@ void zfs_range_reduce(rl_t *rl, uint64_t off, uint64_t len);
*/
int zfs_range_compare(const void *arg1, const void *arg2);
static inline void
zfs_rlock_init(zfs_rlock_t *zrl)
{
mutex_init(&zrl->zr_mutex, NULL, MUTEX_DEFAULT, NULL);
avl_create(&zrl->zr_avl, zfs_range_compare,
sizeof (rl_t), offsetof(rl_t, r_node));
zrl->zr_size = NULL;
zrl->zr_blksz = NULL;
zrl->zr_max_blksz = NULL;
}
static inline void
zfs_rlock_destroy(zfs_rlock_t *zrl)
{
avl_destroy(&zrl->zr_avl);
mutex_destroy(&zrl->zr_mutex);
}
#endif /* _KERNEL */
#ifdef __cplusplus


@ -64,7 +64,6 @@ typedef struct zfs_mntopts {
typedef struct zfs_sb {
struct super_block *z_sb; /* generic super_block */
struct backing_dev_info z_bdi; /* generic backing dev info */
struct zfs_sb *z_parent; /* parent fs */
objset_t *z_os; /* objset reference */
zfs_mntopts_t *z_mntopts; /* passed mount options */
@ -112,8 +111,9 @@ typedef struct zfs_sb {
uint64_t z_groupquota_obj;
uint64_t z_replay_eof; /* New end of file - replay only */
sa_attr_type_t *z_attr_table; /* SA attr mapping->id */
#define ZFS_OBJ_MTX_SZ 256
kmutex_t *z_hold_mtx; /* znode hold locks */
uint64_t z_hold_size; /* znode hold array size */
avl_tree_t *z_hold_trees; /* znode hold trees */
kmutex_t *z_hold_locks; /* znode hold locks */
} zfs_sb_t;
#define ZFS_SUPER_MAGIC 0x2fc12fc1


@ -37,6 +37,7 @@
#include <sys/rrwlock.h>
#include <sys/zfs_sa.h>
#include <sys/zfs_stat.h>
#include <sys/zfs_rlock.h>
#endif
#include <sys/zfs_acl.h>
#include <sys/zil.h>
@ -187,8 +188,7 @@ typedef struct znode {
krwlock_t z_parent_lock; /* parent lock for directories */
krwlock_t z_name_lock; /* "master" lock for dirent locks */
zfs_dirlock_t *z_dirlocks; /* directory entry lock list */
kmutex_t z_range_lock; /* protects changes to z_range_avl */
avl_tree_t z_range_avl; /* avl tree of file range locks */
zfs_rlock_t z_range_lock; /* file range lock */
uint8_t z_unlinked; /* file has been unlinked */
uint8_t z_atime_dirty; /* atime needs to be synced */
uint8_t z_zn_prefetch; /* Prefetch znodes? */
@ -198,7 +198,6 @@ typedef struct znode {
uint64_t z_mapcnt; /* number of pages mapped to file */
uint64_t z_gen; /* generation (cached) */
uint64_t z_size; /* file size (cached) */
uint64_t z_atime[2]; /* atime (cached) */
uint64_t z_links; /* file links (cached) */
uint64_t z_pflags; /* pflags (cached) */
uint64_t z_uid; /* uid fuid (cached) */
@ -209,17 +208,21 @@ typedef struct znode {
zfs_acl_t *z_acl_cached; /* cached acl */
krwlock_t z_xattr_lock; /* xattr data lock */
nvlist_t *z_xattr_cached; /* cached xattrs */
struct znode *z_xattr_parent; /* xattr parent znode */
list_node_t z_link_node; /* all znodes in fs link */
sa_handle_t *z_sa_hdl; /* handle to sa data */
boolean_t z_is_sa; /* are we native sa? */
boolean_t z_is_zvol; /* are we used by the zvol */
boolean_t z_is_mapped; /* are we mmap'ed */
boolean_t z_is_ctldir; /* are we .zfs entry */
boolean_t z_is_stale; /* are we stale due to rollback? */
struct inode z_inode; /* generic vfs inode */
} znode_t;
typedef struct znode_hold {
uint64_t zh_obj; /* object id */
kmutex_t zh_lock; /* lock serializing object access */
avl_node_t zh_node; /* avl tree linkage */
refcount_t zh_refcount; /* active consumer reference count */
} znode_hold_t;
/*
* Range locking rules
@ -273,17 +276,11 @@ typedef struct znode {
/*
* Macros for dealing with dmu_buf_hold
*/
#define ZFS_OBJ_HASH(obj_num) ((obj_num) & (ZFS_OBJ_MTX_SZ - 1))
#define ZFS_OBJ_MUTEX(zsb, obj_num) \
(&(zsb)->z_hold_mtx[ZFS_OBJ_HASH(obj_num)])
#define ZFS_OBJ_HOLD_ENTER(zsb, obj_num) \
mutex_enter(ZFS_OBJ_MUTEX((zsb), (obj_num)))
#define ZFS_OBJ_HOLD_TRYENTER(zsb, obj_num) \
mutex_tryenter(ZFS_OBJ_MUTEX((zsb), (obj_num)))
#define ZFS_OBJ_HOLD_EXIT(zsb, obj_num) \
mutex_exit(ZFS_OBJ_MUTEX((zsb), (obj_num)))
#define ZFS_OBJ_HOLD_OWNED(zsb, obj_num) \
mutex_owned(ZFS_OBJ_MUTEX((zsb), (obj_num)))
#define ZFS_OBJ_MTX_SZ 64
#define ZFS_OBJ_MTX_MAX (1024 * 1024)
#define ZFS_OBJ_HASH(zsb, obj) ((obj) & ((zsb->z_hold_size) - 1))
extern unsigned int zfs_object_mutex_size;
/* Encode ZFS stored time values from a struct timespec */
#define ZFS_TIME_ENCODE(tp, stmp) \
@ -306,20 +303,17 @@ typedef struct znode {
#define STATE_CHANGED (ATTR_CTIME)
#define CONTENT_MODIFIED (ATTR_MTIME | ATTR_CTIME)
#define ZFS_ACCESSTIME_STAMP(zsb, zp) \
if ((zsb)->z_atime && !(zfs_is_readonly(zsb))) \
zfs_tstamp_update_setup(zp, ACCESSED, NULL, NULL, B_FALSE);
extern int zfs_init_fs(zfs_sb_t *, znode_t **);
extern void zfs_set_dataprop(objset_t *);
extern void zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *,
dmu_tx_t *tx);
extern void zfs_tstamp_update_setup(znode_t *, uint_t, uint64_t [2],
uint64_t [2], boolean_t);
uint64_t [2]);
extern void zfs_grow_blocksize(znode_t *, uint64_t, dmu_tx_t *);
extern int zfs_freesp(znode_t *, uint64_t, uint64_t, int, boolean_t);
extern void zfs_znode_init(void);
extern void zfs_znode_fini(void);
extern int zfs_znode_hold_compare(const void *, const void *);
extern int zfs_zget(zfs_sb_t *, uint64_t, znode_t **);
extern int zfs_rezget(znode_t *);
extern void zfs_zinactive(znode_t *);
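The new `ZFS_OBJ_HASH(zsb, obj)` above masks the object number with `z_hold_size - 1`. That mask is equivalent to `obj % size` only when the size is a power of two. A sketch of the indexing, with a hypothetical `demo_obj_hash` function that takes the array size directly instead of reading it from a `zfs_sb_t`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map an object number to a slot in an array of
 * hold_size entries using the same mask as ZFS_OBJ_HASH.
 */
static uint64_t
demo_obj_hash(uint64_t hold_size, uint64_t obj)
{
	/* The mask trick requires a nonzero power-of-two size. */
	assert(hold_size != 0 && (hold_size & (hold_size - 1)) == 0);
	return (obj & (hold_size - 1));
}
```

With the 64-entry size named by `ZFS_OBJ_MTX_SZ` in the hunk, object 130 lands in slot 2, the same result as `130 % 64`.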


@ -525,6 +525,7 @@ extern void *zio_buf_alloc(size_t size);
extern void zio_buf_free(void *buf, size_t size);
extern void *zio_data_buf_alloc(size_t size);
extern void zio_data_buf_free(void *buf, size_t size);
extern void *zio_buf_alloc_flags(size_t size, int flags);
extern void zio_resubmit_stage_async(void *);


@ -76,7 +76,7 @@ extern ssize_t zpl_xattr_list(struct dentry *dentry, char *buf, size_t size);
extern int zpl_xattr_security_init(struct inode *ip, struct inode *dip,
const struct qstr *qstr);
#if defined(CONFIG_FS_POSIX_ACL)
extern int zpl_set_acl(struct inode *ip, int type, struct posix_acl *acl);
extern int zpl_set_acl(struct inode *ip, struct posix_acl *acl, int type);
extern struct posix_acl *zpl_get_acl(struct inode *ip, int type);
#if !defined(HAVE_GET_ACL)
#if defined(HAVE_CHECK_ACL_WITH_FLAGS)
@ -123,7 +123,7 @@ extern const struct inode_operations zpl_ops_snapdirs;
extern const struct file_operations zpl_fops_shares;
extern const struct inode_operations zpl_ops_shares;
#ifdef HAVE_VFS_ITERATE
#if defined(HAVE_VFS_ITERATE) || defined(HAVE_VFS_ITERATE_SHARED)
#define DIR_CONTEXT_INIT(_dirent, _actor, _pos) { \
.actor = _actor, \


@ -21,6 +21,7 @@
/*
* Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2016 Actifio, Inc. All rights reserved.
*/
#ifndef _SYS_ZVOL_H
@ -31,24 +32,22 @@
#define ZVOL_OBJ 1ULL
#define ZVOL_ZAP_OBJ 2ULL
#ifdef _KERNEL
extern void zvol_create_minors(spa_t *spa, const char *name, boolean_t async);
extern void zvol_remove_minors(spa_t *spa, const char *name, boolean_t async);
extern void zvol_rename_minors(spa_t *spa, const char *oldname,
const char *newname, boolean_t async);
#ifdef _KERNEL
extern int zvol_check_volsize(uint64_t volsize, uint64_t blocksize);
extern int zvol_check_volblocksize(const char *name, uint64_t volblocksize);
extern int zvol_get_stats(objset_t *os, nvlist_t *nv);
extern boolean_t zvol_is_zvol(const char *);
extern void zvol_create_cb(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx);
extern int zvol_create_minor(const char *name);
extern int zvol_create_minors(const char *name);
extern int zvol_remove_minor(const char *name);
extern void zvol_remove_minors(const char *name);
extern void zvol_rename_minors(const char *oldname, const char *newname);
extern int zvol_set_volsize(const char *, uint64_t);
extern int zvol_set_volblocksize(const char *, uint64_t);
extern int zvol_set_snapdev(const char *, uint64_t);
extern int zvol_set_snapdev(const char *, zprop_source_t, uint64_t);
extern int zvol_init(void);
extern void zvol_fini(void);
#endif /* _KERNEL */
#endif /* _SYS_ZVOL_H */


@ -88,7 +88,7 @@ struct dk_map2 default_vtoc_map[NDKMAP] = {
#if defined(_SUNOS_VTOC_16)
#if defined(i386) || defined(__amd64) || defined(__arm) || \
defined(__powerpc) || defined(__sparc)
defined(__powerpc) || defined(__sparc) || defined(__s390__)
{ V_BOOT, V_UNMNT }, /* i - 8 */
{ V_ALTSCTR, 0 }, /* j - 9 */

View File

@ -507,7 +507,7 @@
movl 16(%esp), %ebx
movl 20(%esp), %ecx
subl %eax, %ebx
adcl %edx, %ecx
sbbl %edx, %ecx
lock
cmpxchg8b (%edi)
jne 1b

View File

@ -34,6 +34,7 @@
#include <sys/mnttab.h>
#include <sys/types.h>
#include <sys/sysmacros.h>
#include <sys/stat.h>
#include <unistd.h>

View File

@ -31,69 +31,66 @@
#include <stdio.h>
#include <stdlib.h>
#ifndef __assert_c99
static inline void
__assert_c99(const char *expr, const char *file, int line, const char *func)
{
fprintf(stderr, "%s:%i: %s: Assertion `%s` failed.\n",
file, line, func, expr);
abort();
}
#endif /* __assert_c99 */
#ifndef verify
#if defined(__STDC__)
#if __STDC_VERSION__ - 0 >= 199901L
#define verify(EX) (void)((EX) || \
(__assert_c99(#EX, __FILE__, __LINE__, __func__), 0))
#else
#define verify(EX) (void)((EX) || (__assert(#EX, __FILE__, __LINE__), 0))
#endif /* __STDC_VERSION__ - 0 >= 199901L */
#else
#define verify(EX) (void)((EX) || (_assert("EX", __FILE__, __LINE__), 0))
#endif /* __STDC__ */
#endif /* verify */
#undef VERIFY
#undef ASSERT
#define VERIFY verify
#define ASSERT assert
extern void __assert(const char *, const char *, int);
#include <stdarg.h>
static inline int
assfail(const char *buf, const char *file, int line)
libspl_assert(const char *buf, const char *file, const char *func, int line)
{
__assert(buf, file, line);
return (0);
fprintf(stderr, "%s\n", buf);
fprintf(stderr, "ASSERT at %s:%d:%s()", file, line, func);
abort();
}
/* BEGIN CSTYLED */
#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \
const TYPE __left = (TYPE)(LEFT); \
const TYPE __right = (TYPE)(RIGHT); \
if (!(__left OP __right)) { \
char *__buf = alloca(256); \
(void) snprintf(__buf, 256, "%s %s %s (0x%llx %s 0x%llx)", \
#LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
assfail(__buf, __FILE__, __LINE__); \
} \
/* printf version of libspl_assert */
static inline void
libspl_assertf(const char *file, const char *func, int line, char *format, ...)
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
fprintf(stderr, "\n");
fprintf(stderr, "ASSERT at %s:%d:%s()", file, line, func);
va_end(args);
abort();
}
#ifdef verify
#undef verify
#endif
#define VERIFY(cond) \
(void) ((!(cond)) && \
libspl_assert(#cond, __FILE__, __FUNCTION__, __LINE__))
#define verify(cond) \
(void) ((!(cond)) && \
libspl_assert(#cond, __FILE__, __FUNCTION__, __LINE__))
#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) \
do { \
const TYPE __left = (TYPE)(LEFT); \
const TYPE __right = (TYPE)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
} while (0)
/* END CSTYLED */
#define VERIFY3S(x, y, z) VERIFY3_IMPL(x, y, z, int64_t)
#define VERIFY3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t)
#define VERIFY3P(x, y, z) VERIFY3_IMPL(x, y, z, uintptr_t)
#define VERIFY0(x) VERIFY3_IMPL(x, ==, 0, uint64_t)
#ifdef assert
#undef assert
#endif
#ifdef NDEBUG
#define ASSERT3S(x, y, z) ((void)0)
#define ASSERT3U(x, y, z) ((void)0)
#define ASSERT3P(x, y, z) ((void)0)
#define ASSERT0(x) ((void)0)
#define ASSERT(x) ((void)0)
#define assert(x) ((void)0)
#define ASSERTV(x)
#define IMPLY(A, B) ((void)0)
#define EQUIV(A, B) ((void)0)
@ -102,13 +99,17 @@ assfail(const char *buf, const char *file, int line)
#define ASSERT3U(x, y, z) VERIFY3U(x, y, z)
#define ASSERT3P(x, y, z) VERIFY3P(x, y, z)
#define ASSERT0(x) VERIFY0(x)
#define ASSERT(x) VERIFY(x)
#define assert(x) VERIFY(x)
#define ASSERTV(x) x
#define IMPLY(A, B) \
((void)(((!(A)) || (B)) || \
assfail("(" #A ") implies (" #B ")", __FILE__, __LINE__)))
libspl_assert("(" #A ") implies (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define EQUIV(A, B) \
((void)((!!(A) == !!(B)) || \
assfail("(" #A ") is equivalent to (" #B ")", __FILE__, __LINE__)))
libspl_assert("(" #A ") is equivalent to (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#endif /* NDEBUG */
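A minimal sketch of a VERIFY3-style comparison macro feeding a printf-style reporter, written so that the format string and the argument list agree (three strings, a value, a string, a value). The names `demo_failf` and `DEMO_CHECK3` are hypothetical and are not part of libspl; unlike the real macros, this version writes the message into a caller-supplied buffer instead of calling abort(), purely so the result can be inspected.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical reporter: formats the failure message into buf.
 * The real assertion helpers print to stderr and abort instead.
 */
static int
demo_failf(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);
	va_end(args);
	return (n);
}

/*
 * Evaluate each operand once, compare, and report both the source text
 * and the numeric values. Every conversion in the format string has a
 * matching argument.
 */
#define	DEMO_CHECK3(buf, len, LEFT, OP, RIGHT, TYPE)			\
do {									\
	const TYPE left_ = (TYPE)(LEFT);				\
	const TYPE right_ = (TYPE)(RIGHT);				\
	(buf)[0] = '\0';						\
	if (!(left_ OP right_))						\
		(void) demo_failf((buf), (len),				\
		    "%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT,	\
		    (unsigned long long)left_, #OP,			\
		    (unsigned long long)right_);			\
} while (0)
```

With a 128-byte buffer, `DEMO_CHECK3(buf, sizeof (buf), 1 + 1, ==, 3, uint64_t)` leaves a message naming both the expression text and the evaluated operands, while a passing comparison leaves the buffer empty.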

View File

@ -78,7 +78,7 @@ extern "C" {
#define _SUNOS_VTOC_16
/* powerpc arch specific defines */
#elif defined(__powerpc) || defined(__powerpc__)
#elif defined(__powerpc) || defined(__powerpc__) || defined(__powerpc64__)
#if !defined(__powerpc)
#define __powerpc
@ -88,11 +88,13 @@ extern "C" {
#define __powerpc__
#endif
#if defined(__powerpc64__)
#if !defined(_LP64)
#ifdef __powerpc64__
#define _LP64
#endif
#else
#define _LP32
#if !defined(_ILP32)
#define _ILP32
#endif
#endif
@ -113,6 +115,16 @@ extern "C" {
#define __arm__
#endif
#if defined(__aarch64__)
#if !defined(_LP64)
#define _LP64
#endif
#else
#if !defined(_ILP32)
#define _ILP32
#endif
#endif
#if defined(__ARMEL__) || defined(__AARCH64EL__)
#define _LITTLE_ENDIAN
#else
@ -122,7 +134,7 @@ extern "C" {
#define _SUNOS_VTOC_16
/* sparc arch specific defines */
#elif defined(__sparc) || defined(__sparc__)
#elif defined(__sparc) || defined(__sparc__) || defined(__sparc64__)
#if !defined(__sparc)
#define __sparc
@ -135,21 +147,32 @@ extern "C" {
#define _BIG_ENDIAN
#define _SUNOS_VTOC_16
/* sparc64 arch specific defines */
#elif defined(__sparc64) || defined(__sparc64__)
#if !defined(__sparc64)
#define __sparc64
#if defined(__sparc64__)
#if !defined(_LP64)
#define _LP64
#endif
#else
#if !defined(_ILP32)
#define _ILP32
#endif
#endif
#if !defined(__sparc64__)
#define __sparc64__
/* s390 arch specific defines */
#elif defined(__s390__)
#if defined(__s390x__)
#if !defined(_LP64)
#define _LP64
#endif
#else
#if !defined(_ILP32)
#define _ILP32
#endif
#endif
#define _BIG_ENDIAN
#define _SUNOS_VTOC_16
#else /* Currently x86_64, i386, arm, powerpc, and sparc are supported */
#else /* Currently x86_64, i386, arm, powerpc, s390, and sparc are supported */
#error "Unsupported ISA type"
#endif
@ -157,6 +180,10 @@ extern "C" {
#error "Both _ILP32 and _LP64 are defined"
#endif
#if !defined(_ILP32) && !defined(_LP64)
#error "Neither _ILP32 nor _LP64 is defined"
#endif
#if defined(_LITTLE_ENDIAN) && defined(_BIG_ENDIAN)
#error "Both _LITTLE_ENDIAN and _BIG_ENDIAN are defined"
#endif

View File

@ -42,16 +42,13 @@
#define makedevice(maj, min) makedev(maj, min)
#define _sysconf(a) sysconf(a)
#define __NORETURN __attribute__((noreturn))
/*
* Compatibility macros/typedefs needed for Solaris -> Linux port
*/
#define P2ALIGN(x, align) ((x) & -(align))
#define P2CROSS(x, y, align) (((x) ^ (y)) > (align) - 1)
#define P2ROUNDUP(x, align) (-(-(x) & -(align)))
#define P2ROUNDUP_TYPED(x, align, type) \
(-(-(type)(x) & -(type)(align)))
#define P2ROUNDUP(x, align) ((((x) - 1) | ((align) - 1)) + 1)
#define P2BOUNDARY(off, len, align) \
(((off) ^ ((off) + (len) - 1)) > (align) - 1)
#define P2PHASE(x, align) ((x) & ((align) - 1))
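The two P2ROUNDUP forms in this hunk, the negate-and-mask expression and the subtract/or/add expression, can be compared directly. The sketch below assumes unsigned 64-bit operands and a power-of-two `align`; the function names are hypothetical and exist only for the comparison.

```c
#include <stdint.h>

/* Negate-and-mask form: -(-(x) & -(align)). */
static uint64_t
roundup_negate(uint64_t x, uint64_t align)
{
	return (-(-x & -align));
}

/* Subtract/or/add form: (((x) - 1) | ((align) - 1)) + 1. */
static uint64_t
roundup_or(uint64_t x, uint64_t align)
{
	return (((x - 1) | (align - 1)) + 1);
}
```

Under those assumptions both functions round 9 up to 16 for an alignment of 8 and leave an already aligned value unchanged; nothing here is claimed about signed operands or alignments that are not powers of two.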
@ -79,7 +76,7 @@
#define P2NPHASE_TYPED(x, align, type) \
(-(type)(x) & ((type)(align) - 1))
#define P2ROUNDUP_TYPED(x, align, type) \
(-(-(type)(x) & -(type)(align)))
((((type)(x) - 1) | ((type)(align) - 1)) + 1)
#define P2END_TYPED(x, align, type) \
(-(~(type)(x) & -(type)(align)))
#define P2PHASEUP_TYPED(x, align, phase, type) \

View File

@ -27,10 +27,15 @@
#ifndef _LIBSPL_SYS_TYPES_H
#define _LIBSPL_SYS_TYPES_H
#if defined(HAVE_MAKEDEV_IN_SYSMACROS)
#include <sys/sysmacros.h>
#elif defined(HAVE_MAKEDEV_IN_MKDEV)
#include <sys/mkdev.h>
#endif
#include <sys/isa_defs.h>
#include <sys/feature_tests.h>
#include_next <sys/types.h>
#include <sys/param.h> /* for NBBY */
#include <sys/types32.h>
#include <sys/va_list.h>
@ -53,7 +58,6 @@ typedef u_longlong_t u_offset_t;
typedef u_longlong_t len_t;
typedef longlong_t diskaddr_t;
typedef ulong_t pfn_t; /* page frame number */
typedef ulong_t pgcnt_t; /* number of pages */
typedef long spgcnt_t; /* signed number of pages */
@ -96,4 +100,6 @@ typedef union {
} lloff_t;
#endif
#include <sys/param.h> /* for NBBY */
#endif

View File

@ -3315,8 +3315,9 @@ zfs_check_snap_cb(zfs_handle_t *zhp, void *arg)
char name[ZFS_MAXNAMELEN];
int rv = 0;
(void) snprintf(name, sizeof (name),
"%s@%s", zhp->zfs_name, dd->snapname);
if (snprintf(name, sizeof (name), "%s@%s", zhp->zfs_name,
dd->snapname) >= sizeof (name))
return (EINVAL);
if (lzc_exists(name))
verify(nvlist_add_boolean(dd->nvl, name) == 0);
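The snprintf() checks added in these hunks rely on the return value being the length the full string would have had, so a result greater than or equal to the buffer size means the name was truncated. A minimal sketch of that pattern follows; `build_snap_name` is a hypothetical helper, not a libzfs function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<fs>@<snap>" in buf.
 * Returns 0 on success or EINVAL if the name would not fit.
 */
static int
build_snap_name(char *buf, size_t len, const char *fs, const char *snap)
{
	int n = snprintf(buf, len, "%s@%s", fs, snap);

	/* n excludes the terminating NUL, so n == len is already too long. */
	if (n < 0 || (size_t)n >= len)
		return (EINVAL);
	return (0);
}
```

Returning an error instead of silently using the truncated name avoids acting on a different, shorter dataset name than the caller intended.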
@ -3534,8 +3535,9 @@ zfs_snapshot_cb(zfs_handle_t *zhp, void *arg)
int rv = 0;
if (zfs_prop_get_int(zhp, ZFS_PROP_INCONSISTENT) == 0) {
(void) snprintf(name, sizeof (name),
"%s@%s", zfs_get_name(zhp), sd->sd_snapname);
if (snprintf(name, sizeof (name), "%s@%s", zfs_get_name(zhp),
sd->sd_snapname) >= sizeof (name))
return (EINVAL);
fnvlist_add_boolean(sd->sd_nvl, name);
@ -3889,7 +3891,6 @@ zfs_rename(zfs_handle_t *zhp, const char *target, boolean_t recursive,
}
if (recursive) {
parentname = zfs_strdup(zhp->zfs_hdl, zhp->zfs_name);
if (parentname == NULL) {
ret = -1;
@ -3902,8 +3903,7 @@ zfs_rename(zfs_handle_t *zhp, const char *target, boolean_t recursive,
ret = -1;
goto error;
}
} else {
} else if (zhp->zfs_type != ZFS_TYPE_SNAPSHOT) {
if ((cl = changelist_gather(zhp, ZFS_PROP_NAME, 0,
force_unmount ? MS_FORCE : 0)) == NULL)
return (-1);
@ -3952,23 +3952,23 @@ zfs_rename(zfs_handle_t *zhp, const char *target, boolean_t recursive,
* On failure, we still want to remount any filesystems that
* were previously mounted, so we don't alter the system state.
*/
if (!recursive)
if (cl != NULL)
(void) changelist_postfix(cl);
} else {
if (!recursive) {
if (cl != NULL) {
changelist_rename(cl, zfs_get_name(zhp), target);
ret = changelist_postfix(cl);
}
}
error:
if (parentname) {
if (parentname != NULL) {
free(parentname);
}
if (zhrp) {
if (zhrp != NULL) {
zfs_close(zhrp);
}
if (cl) {
if (cl != NULL) {
changelist_free(cl);
}
return (ret);
@ -4259,8 +4259,9 @@ zfs_hold_one(zfs_handle_t *zhp, void *arg)
char name[ZFS_MAXNAMELEN];
int rv = 0;
(void) snprintf(name, sizeof (name),
"%s@%s", zhp->zfs_name, ha->snapname);
if (snprintf(name, sizeof (name), "%s@%s", zhp->zfs_name,
ha->snapname) >= sizeof (name))
return (EINVAL);
if (lzc_exists(name))
fnvlist_add_string(ha->nvl, name, ha->tag);
@ -4379,8 +4380,11 @@ zfs_release_one(zfs_handle_t *zhp, void *arg)
int rv = 0;
nvlist_t *existing_holds;
(void) snprintf(name, sizeof (name),
"%s@%s", zhp->zfs_name, ha->snapname);
if (snprintf(name, sizeof (name), "%s@%s", zhp->zfs_name,
ha->snapname) >= sizeof (name)) {
ha->error = EINVAL;
rv = EINVAL;
}
if (lzc_get_holds(name, &existing_holds) != 0) {
ha->error = ENOENT;

View File

@ -97,6 +97,8 @@ typedef struct pool_list {
name_entry_t *names;
} pool_list_t;
#define DEV_BYID_PATH "/dev/disk/by-id/"
static char *
get_devid(const char *path)
{
@ -121,6 +123,40 @@ get_devid(const char *path)
return (ret);
}
/*
* Wait up to timeout_ms for udev to set up the device node. The device is
* considered ready when the provided path has been verified to exist and
* it has been allowed to settle. At this point the device can be accessed
* reliably. Depending on the complexity of the udev rules this process
* could take several seconds.
*/
int
zpool_label_disk_wait(char *path, int timeout_ms)
{
int settle_ms = 50;
long sleep_ms = 10;
hrtime_t start, settle;
struct stat64 statbuf;
start = gethrtime();
settle = 0;
do {
errno = 0;
if ((stat64(path, &statbuf) == 0) && (errno == 0)) {
if (settle == 0)
settle = gethrtime();
else if (NSEC2MSEC(gethrtime() - settle) >= settle_ms)
return (0);
} else if (errno != ENOENT) {
return (errno);
}
usleep(sleep_ms * MILLISEC);
} while (NSEC2MSEC(gethrtime() - start) < timeout_ms);
return (ENODEV);
}
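The wait-then-settle loop above can be sketched with plain POSIX calls. This is an illustration under assumptions, not the libzfs implementation: `now_ms` and `wait_for_path` are hypothetical names, the clock is CLOCK_MONOTONIC rather than gethrtime(), and the settle interval is passed in as a parameter.

```c
#define	_POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>

/* Monotonic time in milliseconds. */
static int64_t
now_ms(void)
{
	struct timespec ts;

	(void) clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

/*
 * Poll for path until it has been visible for settle_ms, giving up
 * after timeout_ms. Returns 0 when settled, ENODEV on timeout, or the
 * stat() errno for failures other than ENOENT.
 */
static int
wait_for_path(const char *path, int timeout_ms, int settle_ms)
{
	const struct timespec nap = { 0, 10 * 1000000L };	/* 10 ms */
	int64_t start = now_ms();
	int64_t seen = -1;	/* time the path was first observed */
	struct stat sb;

	do {
		if (stat(path, &sb) == 0) {
			if (seen < 0)
				seen = now_ms();
			else if (now_ms() - seen >= settle_ms)
				return (0);
		} else if (errno != ENOENT) {
			return (errno);
		}
		(void) nanosleep(&nap, NULL);
	} while (now_ms() - start < timeout_ms);

	return (ENODEV);
}
```

Requiring the path to stay visible for a short interval, rather than returning on the first successful stat(), gives udev time to finish creating the remaining symlinks for the device.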
/*
* Go through and fix up any path and/or devid information for the given vdev
@ -162,7 +198,6 @@ fix_paths(nvlist_t *nv, name_entry_t *names)
best = NULL;
for (ne = names; ne != NULL; ne = ne->ne_next) {
if (ne->ne_guid == guid) {
if (path == NULL) {
best = ne;
break;
@ -186,7 +221,7 @@ fix_paths(nvlist_t *nv, name_entry_t *names)
}
/* Prefer paths earlier in the search order. */
if (best->ne_num_labels == best->ne_num_labels &&
if (ne->ne_num_labels == best->ne_num_labels &&
ne->ne_order < best->ne_order) {
best = ne;
continue;
@ -352,6 +387,118 @@ add_config(libzfs_handle_t *hdl, pool_list_t *pl, const char *path,
return (0);
}
#ifdef HAVE_LIBBLKID
static int
add_path(libzfs_handle_t *hdl, pool_list_t *pools, uint64_t pool_guid,
uint64_t vdev_guid, const char *path, int order)
{
nvlist_t *label;
uint64_t guid;
int error, fd, num_labels;
fd = open64(path, O_RDONLY);
if (fd < 0)
return (errno);
error = zpool_read_label(fd, &label, &num_labels);
close(fd);
if (error || label == NULL)
return (ENOENT);
error = nvlist_lookup_uint64(label, ZPOOL_CONFIG_POOL_GUID, &guid);
if (error || guid != pool_guid) {
nvlist_free(label);
return (EINVAL);
}
error = nvlist_lookup_uint64(label, ZPOOL_CONFIG_GUID, &guid);
if (error || guid != vdev_guid) {
nvlist_free(label);
return (EINVAL);
}
error = add_config(hdl, pools, path, order, num_labels, label);
return (error);
}
static int
add_configs_from_label_impl(libzfs_handle_t *hdl, pool_list_t *pools,
nvlist_t *nvroot, uint64_t pool_guid, uint64_t vdev_guid)
{
char udevpath[MAXPATHLEN];
char *path;
nvlist_t **child;
uint_t c, children;
uint64_t guid;
int error;
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_CHILDREN,
&child, &children) == 0) {
for (c = 0; c < children; c++) {
error = add_configs_from_label_impl(hdl, pools,
child[c], pool_guid, vdev_guid);
if (error)
return (error);
}
return (0);
}
if (nvroot == NULL)
return (0);
error = nvlist_lookup_uint64(nvroot, ZPOOL_CONFIG_GUID, &guid);
if ((error != 0) || (guid != vdev_guid))
return (0);
error = nvlist_lookup_string(nvroot, ZPOOL_CONFIG_PATH, &path);
if (error == 0)
(void) add_path(hdl, pools, pool_guid, vdev_guid, path, 0);
error = nvlist_lookup_string(nvroot, ZPOOL_CONFIG_DEVID, &path);
if (error == 0) {
sprintf(udevpath, "%s%s", DEV_BYID_PATH, path);
(void) add_path(hdl, pools, pool_guid, vdev_guid, udevpath, 1);
}
return (0);
}
/*
* Given a disk label call add_config() for all known paths to the device
* as described by the label itself. The paths are added in the following
* priority order: 'path', 'devid', 'devnode'. As these alternate paths are
* added the labels are verified to make sure they refer to the same device.
*/
static int
add_configs_from_label(libzfs_handle_t *hdl, pool_list_t *pools,
char *devname, int num_labels, nvlist_t *label)
{
nvlist_t *nvroot;
uint64_t pool_guid;
uint64_t vdev_guid;
int error;
if (nvlist_lookup_nvlist(label, ZPOOL_CONFIG_VDEV_TREE, &nvroot) ||
nvlist_lookup_uint64(label, ZPOOL_CONFIG_POOL_GUID, &pool_guid) ||
nvlist_lookup_uint64(label, ZPOOL_CONFIG_GUID, &vdev_guid))
return (ENOENT);
/* Allow devlinks to stabilize so all paths are available. */
zpool_label_disk_wait(devname, DISK_LABEL_WAIT);
/* Add alternate paths as described by the label vdev_tree. */
(void) add_configs_from_label_impl(hdl, pools, nvroot,
pool_guid, vdev_guid);
/* Add the device node /dev/sdX path as a last resort. */
error = add_config(hdl, pools, devname, 100, num_labels, label);
return (error);
}
#endif /* HAVE_LIBBLKID */
/*
* Returns true if the named pool matches the given GUID.
*/
@ -975,9 +1122,7 @@ zpool_find_import_blkid(libzfs_handle_t *hdl, pool_list_t *pools)
blkid_cache cache;
blkid_dev_iterate iter;
blkid_dev dev;
const char *devname;
nvlist_t *config;
int fd, err, num_labels;
int err;
err = blkid_get_cache(&cache, NULL);
if (err != 0) {
@ -1008,25 +1153,23 @@ zpool_find_import_blkid(libzfs_handle_t *hdl, pool_list_t *pools)
}
while (blkid_dev_next(iter, &dev) == 0) {
devname = blkid_dev_devname(dev);
nvlist_t *label;
char *devname;
int fd, num_labels;
devname = (char *) blkid_dev_devname(dev);
if ((fd = open64(devname, O_RDONLY)) < 0)
continue;
err = zpool_read_label(fd, &config, &num_labels);
err = zpool_read_label(fd, &label, &num_labels);
(void) close(fd);
if (err != 0) {
(void) no_memory(hdl);
goto err_blkid3;
}
if (err || label == NULL)
continue;
if (config != NULL) {
err = add_config(hdl, pools, devname, 0,
num_labels, config);
if (err != 0)
goto err_blkid3;
}
add_configs_from_label(hdl, pools, devname, num_labels, label);
}
err = 0;
err_blkid3:
blkid_dev_iterate_end(iter);
@ -1194,16 +1337,33 @@ zpool_find_import_impl(libzfs_handle_t *hdl, importargs_t *iarg)
if (config != NULL) {
boolean_t matched = B_TRUE;
boolean_t aux = B_FALSE;
char *pname;
if ((iarg->poolname != NULL) &&
/*
* Check if it's a spare or l2cache device. If
* it is, we need to skip the name and guid
* check since they don't exist on aux device
* label.
*/
if (iarg->poolname != NULL ||
iarg->guid != 0) {
uint64_t state;
aux = nvlist_lookup_uint64(config,
ZPOOL_CONFIG_POOL_STATE,
&state) == 0 &&
(state == POOL_STATE_SPARE ||
state == POOL_STATE_L2CACHE);
}
if ((iarg->poolname != NULL) && !aux &&
(nvlist_lookup_string(config,
ZPOOL_CONFIG_POOL_NAME, &pname) == 0)) {
if (strcmp(iarg->poolname, pname))
matched = B_FALSE;
} else if (iarg->guid != 0) {
} else if (iarg->guid != 0 && !aux) {
uint64_t this_guid;
matched = nvlist_lookup_uint64(config,

View File

@ -204,8 +204,11 @@ zfs_iter_bookmarks(zfs_handle_t *zhp, zfs_iter_f func, void *data)
bmark_name = nvpair_name(pair);
bmark_props = fnvpair_value_nvlist(pair);
(void) snprintf(name, sizeof (name), "%s#%s", zhp->zfs_name,
bmark_name);
if (snprintf(name, sizeof (name), "%s#%s", zhp->zfs_name,
bmark_name) >= sizeof (name)) {
err = EINVAL;
goto out;
}
nzhp = make_bookmark_handle(zhp, name, bmark_props);
if (nzhp == NULL)

View File

@ -21,6 +21,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2014 by Delphix. All rights reserved.
*/
/*
@ -363,6 +364,14 @@ zfs_add_options(zfs_handle_t *zhp, char *options, int len)
error = zfs_add_option(zhp, options, len,
ZFS_PROP_ATIME, MNTOPT_ATIME, MNTOPT_NOATIME);
/*
* don't add relatime/strictatime when atime=off, otherwise strictatime
* will force atime=on
*/
if (strstr(options, MNTOPT_NOATIME) == NULL) {
error = zfs_add_option(zhp, options, len,
ZFS_PROP_RELATIME, MNTOPT_RELATIME, MNTOPT_STRICTATIME);
}
error = error ? error : zfs_add_option(zhp, options, len,
ZFS_PROP_DEVICES, MNTOPT_DEVICES, MNTOPT_NODEVICES);
error = error ? error : zfs_add_option(zhp, options, len,
@ -744,13 +753,6 @@ zfs_share_proto(zfs_handle_t *zhp, zfs_share_proto_t *proto)
if (!zfs_is_mountable(zhp, mountpoint, sizeof (mountpoint), NULL))
return (0);
if ((ret = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) {
(void) zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
dgettext(TEXT_DOMAIN, "cannot share '%s': %s"),
zfs_get_name(zhp), sa_errorstr(ret));
return (-1);
}
for (curr_proto = proto; *curr_proto != PROTO_END; curr_proto++) {
/*
* Return success if there are no share options.
@ -761,6 +763,14 @@ zfs_share_proto(zfs_handle_t *zhp, zfs_share_proto_t *proto)
strcmp(shareopts, "off") == 0)
continue;
ret = zfs_init_libshare(hdl, SA_INIT_SHARE_API);
if (ret != SA_OK) {
(void) zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
dgettext(TEXT_DOMAIN, "cannot share '%s': %s"),
zfs_get_name(zhp), sa_errorstr(ret));
return (-1);
}
/*
* If the 'zoned' property is set, then zfs_is_mountable()
* will have already bailed out if we are in the global zone.
@ -1072,7 +1082,7 @@ libzfs_dataset_cmp(const void *a, const void *b)
if (gotb)
return (1);
return (strcmp(zfs_get_name(a), zfs_get_name(b)));
return (strcmp(zfs_get_name(*za), zfs_get_name(*zb)));
}
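The one-line fix to libzfs_dataset_cmp() reflects how qsort() calls its comparator: each argument points at an array element, so when the array holds pointers the comparator receives a pointer to a pointer and must dereference once before use. A minimal sketch with a hypothetical `demo_handle_t` standing in for the real handle type:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dataset handle. */
typedef struct demo_handle {
	const char *name;
} demo_handle_t;

/* qsort() passes pointers to the elements, which are themselves pointers. */
static int
demo_handle_cmp(const void *a, const void *b)
{
	const demo_handle_t *const *ha = a;
	const demo_handle_t *const *hb = b;

	return (strcmp((*ha)->name, (*hb)->name));
}

/* Sort an array of handle pointers by name. */
static void
demo_sort_handles(demo_handle_t **handles, size_t count)
{
	qsort(handles, count, sizeof (handles[0]), demo_handle_cmp);
}
```

Passing `a` and `b` straight to a name accessor, as the replaced line did, would treat the address of the array slot as if it were the handle itself.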
/*

View File

@ -1378,8 +1378,7 @@ zpool_add(zpool_handle_t *zhp, nvlist_t *nvroot)
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"device '%s' contains an EFI label and "
"cannot be used on root pools."),
zpool_vdev_name(hdl, NULL, spares[s],
B_FALSE));
zpool_vdev_name(hdl, NULL, spares[s], 0));
return (zfs_error(hdl, EZFS_POOL_NOTSUP, msg));
}
}
@ -1700,7 +1699,7 @@ print_vdev_tree(libzfs_handle_t *hdl, const char *name, nvlist_t *nv,
return;
for (c = 0; c < children; c++) {
vname = zpool_vdev_name(hdl, NULL, child[c], B_TRUE);
vname = zpool_vdev_name(hdl, NULL, child[c], VDEV_NAME_TYPE_ID);
print_vdev_tree(hdl, vname, child[c], indent + 2);
free(vname);
}
@ -1892,7 +1891,12 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *config, const char *newname,
"one or more devices are already in use\n"));
(void) zfs_error(hdl, EZFS_BADDEV, desc);
break;
case ENAMETOOLONG:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"new name of at least one dataset is longer than "
"the maximum allowable length"));
(void) zfs_error(hdl, EZFS_NAMETOOLONG, desc);
break;
default:
(void) zpool_standard_error(hdl, error, desc);
zpool_explain_recover(hdl,
@ -2688,7 +2692,7 @@ zpool_vdev_attach(zpool_handle_t *zhp,
verify(nvlist_lookup_nvlist(zpool_get_config(zhp, NULL),
ZPOOL_CONFIG_VDEV_TREE, &config_root) == 0);
if ((newname = zpool_vdev_name(NULL, NULL, child[0], B_FALSE)) == NULL)
if ((newname = zpool_vdev_name(NULL, NULL, child[0], 0)) == NULL)
return (-1);
/*
@ -2879,11 +2883,11 @@ find_vdev_entry(zpool_handle_t *zhp, nvlist_t **mchild, uint_t mchildren,
for (mc = 0; mc < mchildren; mc++) {
uint_t sc;
char *mpath = zpool_vdev_name(zhp->zpool_hdl, zhp,
mchild[mc], B_FALSE);
mchild[mc], 0);
for (sc = 0; sc < schildren; sc++) {
char *spath = zpool_vdev_name(zhp->zpool_hdl, zhp,
schild[sc], B_FALSE);
schild[sc], 0);
boolean_t result = (strcmp(mpath, spath) == 0);
free(spath);
@ -3424,21 +3428,34 @@ strip_partition(libzfs_handle_t *hdl, char *path)
*/
char *
zpool_vdev_name(libzfs_handle_t *hdl, zpool_handle_t *zhp, nvlist_t *nv,
boolean_t verbose)
int name_flags)
{
char *path, *devid, *type;
char *path, *devid, *type, *env;
uint64_t value;
char buf[PATH_BUF_LEN];
char tmpbuf[PATH_BUF_LEN];
vdev_stat_t *vs;
uint_t vsc;
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
&value) == 0) {
verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID,
&value) == 0);
(void) snprintf(buf, sizeof (buf), "%llu",
(u_longlong_t)value);
env = getenv("ZPOOL_VDEV_NAME_PATH");
if (env && (strtoul(env, NULL, 0) > 0 ||
!strncasecmp(env, "YES", 3) || !strncasecmp(env, "ON", 2)))
name_flags |= VDEV_NAME_PATH;
env = getenv("ZPOOL_VDEV_NAME_GUID");
if (env && (strtoul(env, NULL, 0) > 0 ||
!strncasecmp(env, "YES", 3) || !strncasecmp(env, "ON", 2)))
name_flags |= VDEV_NAME_GUID;
env = getenv("ZPOOL_VDEV_NAME_FOLLOW_LINKS");
if (env && (strtoul(env, NULL, 0) > 0 ||
!strncasecmp(env, "YES", 3) || !strncasecmp(env, "ON", 2)))
name_flags |= VDEV_NAME_FOLLOW_LINKS;
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, &value) == 0 ||
name_flags & VDEV_NAME_GUID) {
nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID, &value);
(void) snprintf(buf, sizeof (buf), "%llu", (u_longlong_t)value);
path = buf;
} else if (nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) == 0) {
/*
@ -3479,11 +3496,21 @@ zpool_vdev_name(libzfs_handle_t *hdl, zpool_handle_t *zhp, nvlist_t *nv,
devid_str_free(newdevid);
}
if (name_flags & VDEV_NAME_FOLLOW_LINKS) {
char *rp = realpath(path, NULL);
if (rp) {
strlcpy(buf, rp, sizeof (buf));
path = buf;
free(rp);
}
}
/*
* For a block device only use the name.
*/
verify(nvlist_lookup_string(nv, ZPOOL_CONFIG_TYPE, &type) == 0);
if (strcmp(type, VDEV_TYPE_DISK) == 0) {
if ((strcmp(type, VDEV_TYPE_DISK) == 0) &&
!(name_flags & VDEV_NAME_PATH)) {
path = strrchr(path, '/');
path++;
}
@ -3491,8 +3518,8 @@ zpool_vdev_name(libzfs_handle_t *hdl, zpool_handle_t *zhp, nvlist_t *nv,
/*
* Remove the partition from the path if this is a whole disk.
*/
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_WHOLE_DISK,
&value) == 0 && value) {
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_WHOLE_DISK, &value)
== 0 && value && !(name_flags & VDEV_NAME_PATH)) {
return (strip_partition(hdl, path));
}
} else {
@ -3514,7 +3541,7 @@ zpool_vdev_name(libzfs_handle_t *hdl, zpool_handle_t *zhp, nvlist_t *nv,
* We identify each top-level vdev by using a <type-id>
* naming convention.
*/
if (verbose) {
if (name_flags & VDEV_NAME_TYPE_ID) {
uint64_t id;
verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_ID,
@ -4072,29 +4099,6 @@ find_start_block(nvlist_t *config)
return (MAXOFFSET_T);
}
int
zpool_label_disk_wait(char *path, int timeout)
{
struct stat64 statbuf;
int i;
/*
* Wait timeout milliseconds for a newly created device to be available
* from the given path. There is a small window when a /dev/ device
* will exist and the udev link will not, so we must wait for the
* symlink. Depending on the udev rules this may take a few seconds.
*/
for (i = 0; i < timeout; i++) {
usleep(1000);
errno = 0;
if ((stat64(path, &statbuf) == 0) && (errno == 0))
return (0);
}
return (ENOENT);
}
int
zpool_label_disk_check(char *path)
{
@ -4120,6 +4124,32 @@ zpool_label_disk_check(char *path)
return (0);
}
/*
* Generate a unique partition name for the ZFS member. Partitions must
* have unique names to ensure udev will be able to create symlinks under
* /dev/disk/by-partlabel/ for all pool members. The partition names are
* of the form <pool>-<unique-id>.
*/
static void
zpool_label_name(char *label_name, int label_size)
{
uint64_t id = 0;
int fd;
fd = open("/dev/urandom", O_RDONLY);
if (fd > 0) {
if (read(fd, &id, sizeof (id)) != sizeof (id))
id = 0;
close(fd);
}
if (id == 0)
id = (((uint64_t)rand()) << 32) | (uint64_t)rand();
snprintf(label_name, label_size, "zfs-%016llx", (u_longlong_t) id);
}
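A minimal sketch of the same idea, a `zfs-` prefix followed by a random 64-bit identifier printed as 16 hex digits, is shown below. `demo_label_name` is a hypothetical name; the sketch treats any non-negative descriptor as valid and otherwise follows the approach described in the comment above.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill label with "zfs-" plus a 16-digit hexadecimal identifier. */
static void
demo_label_name(char *label, size_t len)
{
	uint64_t id = 0;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd >= 0) {		/* 0 is a valid descriptor */
		if (read(fd, &id, sizeof (id)) != (ssize_t)sizeof (id))
			id = 0;
		(void) close(fd);
	}

	/* Fall back to rand() if the random device was unavailable. */
	if (id == 0)
		id = (((uint64_t)rand()) << 32) | (uint64_t)rand();

	(void) snprintf(label, len, "zfs-%016llx", (unsigned long long)id);
}
```

The resulting name is always 20 characters long, so it fits any buffer of at least 21 bytes.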
/*
* Label an individual disk. The name provided is the short name,
* stripped of any leading /dev path.
@ -4210,7 +4240,7 @@ zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, char *name)
* can get, in the absence of V_OTHER.
*/
vtoc->efi_parts[0].p_tag = V_USR;
(void) strcpy(vtoc->efi_parts[0].p_name, "zfs");
zpool_label_name(vtoc->efi_parts[0].p_name, EFI_PART_NAME_LEN);
vtoc->efi_parts[8].p_start = slice_size + start_block;
vtoc->efi_parts[8].p_size = resv;
@ -4234,12 +4264,11 @@ zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, char *name)
(void) close(fd);
efi_free(vtoc);
/* Wait for the first expected partition to appear. */
(void) snprintf(path, sizeof (path), "%s/%s", DISK_ROOT, name);
(void) zfs_append_partition(path, MAXPATHLEN);
rval = zpool_label_disk_wait(path, 3000);
/* Wait for udev to signal us that the device has settled. */
rval = zpool_label_disk_wait(path, DISK_LABEL_WAIT);
if (rval) {
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, "failed to "
"detect device partitions on '%s': %d"), path, rval);

View File

@ -1487,9 +1487,13 @@ zfs_send(zfs_handle_t *zhp, const char *fromsnap, const char *tosnap,
drr_versioninfo, DMU_COMPOUNDSTREAM);
DMU_SET_FEATUREFLAGS(drr.drr_u.drr_begin.
drr_versioninfo, featureflags);
(void) snprintf(drr.drr_u.drr_begin.drr_toname,
if (snprintf(drr.drr_u.drr_begin.drr_toname,
sizeof (drr.drr_u.drr_begin.drr_toname),
"%s@%s", zhp->zfs_name, tosnap);
"%s@%s", zhp->zfs_name, tosnap) >=
sizeof (drr.drr_u.drr_begin.drr_toname)) {
err = EINVAL;
goto stderr_out;
}
drr.drr_payloadlen = buflen;
err = cksum_and_write(&drr, sizeof (drr), &zc, outfd);
@ -2003,7 +2007,7 @@ created_before(libzfs_handle_t *hdl, avl_tree_t *avl,
uint64_t guid1, uint64_t guid2)
{
nvlist_t *nvfs;
char *fsname, *snapname;
char *fsname = NULL, *snapname = NULL;
char buf[ZFS_MAXNAMELEN];
int rv;
zfs_handle_t *guid1hdl, *guid2hdl;
@ -2689,7 +2693,8 @@ zfs_receive_one(libzfs_handle_t *hdl, int infd, const char *tosnap,
ENOENT);
if (stream_avl != NULL) {
char *snapname;
char *snapname = NULL;
nvlist_t *lookup = NULL;
nvlist_t *fs = fsavl_find(stream_avl, drrb->drr_toguid,
&snapname);
nvlist_t *props;
@ -2710,6 +2715,11 @@ zfs_receive_one(libzfs_handle_t *hdl, int infd, const char *tosnap,
nvlist_free(props);
if (ret != 0)
return (-1);
if (0 == nvlist_lookup_nvlist(fs, "snapprops", &lookup)) {
VERIFY(0 == nvlist_lookup_nvlist(lookup,
snapname, &snapprops_nvlist));
}
}
cp = NULL;

lib/libzfs/libzfs_util.c Normal file → Executable file
View File

@ -1024,16 +1024,18 @@ zfs_strcmp_pathname(char *name, char *cmp, int wholedisk)
int path_len, cmp_len;
char path_name[MAXPATHLEN];
char cmp_name[MAXPATHLEN];
char *dir;
char *dir, *dup;
/* Strip redundant slashes if one exists due to ZPOOL_IMPORT_PATH */
memset(cmp_name, 0, MAXPATHLEN);
dir = strtok(cmp, "/");
dup = strdup(cmp);
dir = strtok(dup, "/");
while (dir) {
strcat(cmp_name, "/");
strcat(cmp_name, dir);
dir = strtok(NULL, "/");
}
free(dup);
if (name[0] != '/')
return (zfs_strcmp_shortname(name, cmp_name, wholedisk));
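The strdup() added in this hunk matters because strtok() writes NUL bytes into the string it scans, so tokenizing `cmp` directly would modify the caller's buffer. The sketch below shows the same collapse-the-slashes idea on a private copy; `collapse_slashes` is a hypothetical helper, and it uses strtok_r() with explicit bounds checks rather than the strtok()/strcat() calls in the diff.

```c
#define	_POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'in' to 'out' with runs of '/' collapsed.
 * Tokenizes a duplicate so 'in' is left untouched.
 * Returns 0 on success, -1 on allocation failure or if 'out' is too small.
 */
static int
collapse_slashes(const char *in, char *out, size_t outlen)
{
	char *dup, *dir, *save = NULL;
	size_t used = 0;

	if (outlen == 0 || (dup = strdup(in)) == NULL)
		return (-1);

	out[0] = '\0';
	for (dir = strtok_r(dup, "/", &save); dir != NULL;
	    dir = strtok_r(NULL, "/", &save)) {
		size_t need = strlen(dir) + 1;	/* '/' plus the component */

		if (used + need >= outlen) {	/* keep room for the NUL */
			free(dup);
			return (-1);
		}
		out[used] = '/';
		memcpy(out + used + 1, dir, need - 1);
		used += need;
		out[used] = '\0';
	}

	free(dup);
	return (0);
}
```

Given `/dev//disk/by-id///ata-1` this produces `/dev/disk/by-id/ata-1` and leaves the input string exactly as it was.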
@ -1350,7 +1352,8 @@ zprop_print_one_property(const char *name, zprop_get_cbdata_t *cbp,
continue;
}
if (cbp->cb_columns[i + 1] == GET_COL_NONE)
if (i == (ZFS_GET_NCOLS - 1) ||
cbp->cb_columns[i + 1] == GET_COL_NONE)
(void) printf("%s", str);
else if (cbp->cb_scripted)
(void) printf("%s\t", str);

View File

@ -20,6 +20,7 @@
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2016 Actifio, Inc. All rights reserved.
*/
#include <assert.h>
@ -1322,3 +1323,24 @@ spl_fstrans_check(void)
{
return (0);
}
void
zvol_create_minors(spa_t *spa, const char *name, boolean_t async)
{
}
void
zvol_remove_minor(spa_t *spa, const char *name, boolean_t async)
{
}
void
zvol_remove_minors(spa_t *spa, const char *name, boolean_t async)
{
}
void
zvol_rename_minors(spa_t *spa, const char *oldname, const char *newname,
boolean_t async)
{
}

View File

@ -24,6 +24,19 @@ Description of the different parameters to the ZFS module.
.sp
.LP
.sp
.ne 2
.na
\fBignore_hole_birth\fR (int)
.ad
.RS 12n
When set, the hole_birth optimization will not be used, and all holes will
always be sent on zfs send. Useful if you suspect your datasets are affected
by a bug in hole_birth.
.sp
Use \fB1\fR (default) for on and \fB0\fR for off.
.RE
.sp
.ne 2
.na
@ -870,7 +883,12 @@ Default value: \fB10\fR.
Minimum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
Lower values are associated with better latency on rotational media but poorer
resilver performance. The default value of 2 was chosen as a compromise. A
value of 3 has been shown to improve resilver performance further at a cost of
further increasing latency.
.sp
Default value: \fB2\fR.
.RE
.sp
@ -1598,6 +1616,23 @@ Prioritize requeued I/O
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na
\fBzio_taskq_batch_pct\fR (uint)
.ad
.RS 12n
Percentage of online CPUs (or CPU cores, etc) which will run a worker thread
for IO. These workers are responsible for IO work such as compression and
checksum calculations. Fractional number of CPUs will be rounded down.
.sp
The default value of 75 was chosen to avoid using all CPUs which can result in
latency issues and inconsistent application performance, especially when high
compression is enabled.
.sp
Default value: \fB75\fR.
.RE
.sp
.ne 2
.na

View File

@ -26,7 +26,7 @@ zpool \- configures ZFS storage pools
.LP
.nf
\fBzpool add\fR [\fB-fn\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fIvdev\fR ...
\fBzpool add\fR [\fB-fgLnP\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fIvdev\fR ...
.fi
.LP
@ -94,7 +94,7 @@ zpool \- configures ZFS storage pools
.LP
.nf
\fBzpool iostat\fR [\fB-T\fR d | u ] [\fB-v\fR] [\fB-y\fR] [\fIpool\fR] ... [\fIinterval\fR[\fIcount\fR]]
\fBzpool iostat\fR [\fB-T\fR d | u ] [\fB-gLPvy\fR] [\fIpool\fR] ... [\fIinterval\fR[\fIcount\fR]]
.fi
.LP
@ -104,7 +104,7 @@ zpool \- configures ZFS storage pools
.LP
.nf
\fBzpool list\fR [\fB-T\fR d | u ] [\fB-Hv\fR] [\fB-o\fR \fIproperty\fR[,...]] [\fIpool\fR] ...
\fBzpool list\fR [\fB-T\fR d | u ] [\fB-HgLPv\fR] [\fB-o\fR \fIproperty\fR[,...]] [\fIpool\fR] ...
[\fIinterval\fR[\fIcount\fR]]
.fi
@ -150,12 +150,12 @@ zpool \- configures ZFS storage pools
.LP
.nf
\fBzpool split\fR [\fB-n\fR] [\fB-R\fR \fIaltroot\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fInewpool\fR [\fIdevice\fR ...]
\fBzpool split\fR [\fB-gLnP\fR] [\fB-R\fR \fIaltroot\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fInewpool\fR [\fIdevice\fR ...]
.fi
.LP
.nf
\fBzpool status\fR [\fB-xvD\fR] [\fB-T\fR d | u] [\fIpool\fR] ... [\fIinterval\fR [\fIcount\fR]]
\fBzpool status\fR [\fB-gLPvxD\fR] [\fB-T\fR d | u] [\fIpool\fR] ... [\fIinterval\fR [\fIcount\fR]]
.fi
.LP
@ -836,7 +836,7 @@ Displays a help message.
.ne 2
.mk
.na
\fB\fBzpool add\fR [\fB-fn\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fIvdev\fR ...\fR
\fB\fBzpool add\fR [\fB-fgLnP\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fIvdev\fR ...\fR
.ad
.sp .6
.RS 4n
@ -852,6 +852,28 @@ Adds the specified virtual devices to the given pool. The \fIvdev\fR specificati
Forces use of \fBvdev\fRs, even if they appear in use or specify a conflicting replication level. Not all devices can be overridden in this manner.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-g\fR\fR
.ad
.RS 6n
.rt
Display vdev GUIDs instead of the normal device names. These GUIDs can be used in place of device names for the zpool detach/offline/remove/replace commands.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-L\fR\fR
.ad
.RS 6n
.rt
Display real paths for vdevs resolving all symbolic links. This can be used to look up the current block device name regardless of the /dev/disk/ path used to open it.
.RE
.sp
.ne 2
.mk
@ -863,6 +885,17 @@ Forces use of \fBvdev\fRs, even if they appear in use or specify a conflicting r
Displays the configuration that would be used without actually adding the \fBvdev\fRs. The actual pool creation can still fail due to insufficient privileges or device sharing.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-P\fR\fR
.ad
.RS 6n
.rt
Display full paths for vdevs instead of only the last component of the path. This can be used in conjunction with the \fB-L\fR flag.
.RE
.sp
.ne 2
.mk
@ -1608,7 +1641,7 @@ Allows a pool to import when there is a missing log device.
.ne 2
.mk
.na
\fB\fBzpool iostat\fR [\fB-T\fR \fBd\fR | \fBu\fR] [\fB-v\fR] [\fB-y\fR] [\fIpool\fR] ... [\fIinterval\fR[\fIcount\fR]]\fR
\fB\fBzpool iostat\fR [\fB-T\fR \fBd\fR | \fBu\fR] [\fB-gLPvy\fR] [\fIpool\fR] ... [\fIinterval\fR[\fIcount\fR]]\fR
.ad
.sp .6
.RS 4n
@ -1626,6 +1659,39 @@ Display a time stamp.
Specify \fBu\fR for a printed representation of the internal representation of time. See \fBtime\fR(2). Specify \fBd\fR for standard date format. See \fBdate\fR(1).
.RE
.sp
.ne 2
.mk
.na
\fB\fB-g\fR\fR
.ad
.RS 12n
.rt
Display vdev GUIDs instead of the normal device names. These GUIDs can be used in place of device names for the zpool detach/offline/remove/replace commands.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-L\fR\fR
.ad
.RS 12n
.rt
Display real paths for vdevs resolving all symbolic links. This can be used to look up the current block device name regardless of the /dev/disk/ path used to open it.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-P\fR\fR
.ad
.RS 12n
.rt
Display full paths for vdevs instead of only the last component of the path. This can be used in conjunction with the \fB-L\fR flag.
.RE
.sp
.ne 2
.mk
@ -1676,7 +1742,7 @@ Treat exported or foreign devices as inactive.
.ne 2
.mk
.na
\fB\fBzpool list\fR [\fB-T\fR \fBd\fR | \fBu\fR] [\fB-Hv\fR] [\fB-o\fR \fIprops\fR[,...]] [\fIpool\fR] ... [\fIinterval\fR[\fIcount\fR]]\fR
\fB\fBzpool list\fR [\fB-T\fR \fBd\fR | \fBu\fR] [\fB-HgLPv\fR] [\fB-o\fR \fIprops\fR[,...]] [\fIpool\fR] ... [\fIinterval\fR[\fIcount\fR]]\fR
.ad
.sp .6
.RS 4n
@ -1692,6 +1758,39 @@ Lists the given pools along with a health status and space usage. If no \fIpools
Scripted mode. Do not display headers, and separate fields by a single tab instead of arbitrary space.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-g\fR\fR
.ad
.RS 12n
.rt
Display vdev GUIDs instead of the normal device names. These GUIDs can be used in place of device names for the zpool detach/offline/remove/replace commands.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-L\fR\fR
.ad
.RS 12n
.rt
Display real paths for vdevs resolving all symbolic links. This can be used to look up the current block device name regardless of the /dev/disk/ path used to open it.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-P\fR\fR
.ad
.RS 12n
.rt
Display full paths for vdevs instead of only the last component of the path. This can be used in conjunction with the \fB-L\fR flag.
.RE
.ne 2
.mk
.na
@ -1886,7 +1985,7 @@ Sets the given property on the specified pool. See the "Properties" section for
.ne 2
.mk
.na
\fBzpool split\fR [\fB-n\fR] [\fB-R\fR \fIaltroot\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fInewpool\fR [\fIdevice\fR ...]
\fBzpool split\fR [\fB-gLnP\fR] [\fB-R\fR \fIaltroot\fR] [\fB-o\fR \fIproperty=value\fR] \fIpool\fR \fInewpool\fR [\fIdevice\fR ...]
.ad
.sp .6
.RS 4n
@ -1894,6 +1993,28 @@ Split devices off \fIpool\fR creating \fInewpool\fR. All \fBvdev\fRs in \fIpool\
The optional \fIdevice\fR specification causes the specified device(s) to be included in the new pool and, should any devices remain unspecified, the last device in each mirror is used as would be by default.
.sp
.ne 2
.mk
.na
\fB\fB-g\fR\fR
.ad
.RS 6n
.rt
Display vdev GUIDs instead of the normal device names. These GUIDs can be used in place of device names for the zpool detach/offline/remove/replace commands.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-L\fR\fR
.ad
.RS 6n
.rt
Display real paths for vdevs resolving all symbolic links. This can be used to look up the current block device name regardless of the /dev/disk/ path used to open it.
.RE
.sp
.ne 2
.mk
@ -1905,6 +2026,17 @@ The optional \fIdevice\fR specification causes the specified device(s) to be inc
Do dry run, do not actually perform the split. Print out the expected configuration of \fInewpool\fR.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-P\fR\fR
.ad
.RS 6n
.rt
Display full paths for vdevs instead of only the last component of the path. This can be used in conjunction with the \fB-L\fR flag.
.RE
.sp
.ne 2
.mk
@ -1933,22 +2065,45 @@ Sets the specified property for \fInewpool\fR. See the “Properties” section
.ne 2
.mk
.na
\fBzpool status\fR [\fB-xvD\fR] [\fB-T\fR d | u] [\fIpool\fR] ... [\fIinterval\fR [\fIcount\fR]]
\fBzpool status\fR [\fB-gLPvxD\fR] [\fB-T\fR d | u] [\fIpool\fR] ... [\fIinterval\fR [\fIcount\fR]]
.ad
.sp .6
.RS 4n
Displays the detailed health status for the given pools. If no \fIpool\fR is specified, then the status of each pool in the system is displayed. For more information on pool and device health, see the "Device Failure and Recovery" section.
.sp
If a scrub or resilver is in progress, this command reports the percentage done and the estimated time to completion. Both of these are only approximate, because the amount of data in the pool and the other workloads on the system can change.
.sp
.ne 2
.mk
.na
\fB\fB-x\fR\fR
\fB\fB-g\fR\fR
.ad
.RS 12n
.rt
Only display status for pools that are exhibiting errors or are otherwise unavailable. Warnings about pools not using the latest on-disk format will not be included.
Display vdev GUIDs instead of the normal device names. These GUIDs can be used in place of device names for the zpool detach/offline/remove/replace commands.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-L\fR\fR
.ad
.RS 12n
.rt
Display real paths for vdevs resolving all symbolic links. This can be used to look up the current block device name regardless of the /dev/disk/ path used to open it.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-P\fR\fR
.ad
.RS 12n
.rt
Display full paths for vdevs instead of only the last component of the path. This can be used in conjunction with the \fB-L\fR flag.
.RE
.sp
@ -1962,6 +2117,17 @@ Only display status for pools that are exhibiting errors or are otherwise unavai
Displays verbose data error information, printing out a complete list of all data errors since the last complete pool scrub.
.RE
.sp
.ne 2
.mk
.na
\fB\fB-x\fR\fR
.ad
.RS 12n
.rt
Only display status for pools that are exhibiting errors or are otherwise unavailable. Warnings about pools not using the latest on-disk format will not be included.
.RE
.sp
.ne 2
.mk
@ -2401,6 +2567,17 @@ Cause \fBzpool\fR to dump core on exit for the purposes of running \fB::findleak
.B "ZPOOL_IMPORT_PATH"
The search path for devices or files to use with the pool. This is a colon-separated list of directories in which \fBzpool\fR looks for device nodes and files.
Similar to the \fB-d\fR option in \fIzpool import\fR.
.TP
.B "ZPOOL_VDEV_NAME_GUID"
Cause \fBzpool\fR subcommands to output vdev GUIDs by default. This behavior
is identical to the \fBzpool status -g\fR command line option.
.TP
.B "ZPOOL_VDEV_NAME_FOLLOW_LINKS"
Cause \fBzpool\fR subcommands to follow links for vdev names by default. This behavior is identical to the \fBzpool status -L\fR command line option.
.TP
.B "ZPOOL_VDEV_NAME_PATH"
Cause \fBzpool\fR subcommands to output full vdev path names by default. This
behavior is identical to the \fBzpool status -P\fR command line option.
.SH SEE ALSO
.sp

View File

@ -47,7 +47,7 @@ modules_install:
KERNELRELEASE=@LINUX_VERSION@
@# Remove extraneous build products when packaging
kmoddir=$(DESTDIR)$(INSTALL_MOD_PATH)/lib/modules/@LINUX_VERSION@; \
if [ -n $$kmoddir ]; then \
if [ -n "$(DESTDIR)" ]; then \
find $$kmoddir -name 'modules.*' | xargs $(RM); \
fi
sysmap=$(DESTDIR)$(INSTALL_MOD_PATH)/boot/System.map-@LINUX_VERSION@; \

View File

@ -630,7 +630,7 @@ avl_insert_here(
void
avl_add(avl_tree_t *tree, void *new_node)
{
avl_index_t where;
avl_index_t where = 0;
/*
* This is unfortunate. We want to call panic() here, even for

View File

@ -21,6 +21,7 @@
/*
* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2015, 2016 by Delphix. All rights reserved.
*/
#include <sys/stropts.h>
@ -138,6 +139,11 @@ static int nvlist_add_common(nvlist_t *nvl, const char *name, data_type_t type,
#define NVPAIR2I_NVP(nvp) \
((i_nvp_t *)((size_t)(nvp) - offsetof(i_nvp_t, nvi_nvp)))
#ifdef _KERNEL
int nvpair_max_recursion = 20;
#else
int nvpair_max_recursion = 100;
#endif
int
nv_alloc_init(nv_alloc_t *nva, const nv_alloc_ops_t *nvo, /* args */ ...)
@ -2017,6 +2023,7 @@ typedef struct {
const nvs_ops_t *nvs_ops;
void *nvs_private;
nvpriv_t *nvs_priv;
int nvs_recursion;
} nvstream_t;
/*
@ -2168,9 +2175,16 @@ static int
nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
{
switch (nvs->nvs_op) {
case NVS_OP_ENCODE:
return (nvs_operation(nvs, embedded, NULL));
case NVS_OP_ENCODE: {
int err;
if (nvs->nvs_recursion >= nvpair_max_recursion)
return (EINVAL);
nvs->nvs_recursion++;
err = nvs_operation(nvs, embedded, NULL);
nvs->nvs_recursion--;
return (err);
}
case NVS_OP_DECODE: {
nvpriv_t *priv;
int err;
@ -2183,8 +2197,12 @@ nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
nvlist_init(embedded, embedded->nvl_nvflag, priv);
if (nvs->nvs_recursion >= nvpair_max_recursion)
return (EINVAL);
nvs->nvs_recursion++;
if ((err = nvs_operation(nvs, embedded, NULL)) != 0)
nvlist_free(embedded);
nvs->nvs_recursion--;
return (err);
}
default:
@ -2272,6 +2290,7 @@ nvlist_common(nvlist_t *nvl, char *buf, size_t *buflen, int encoding,
return (EINVAL);
nvs.nvs_op = nvs_op;
nvs.nvs_recursion = 0;
/*
* For NVS_OP_ENCODE and NVS_OP_DECODE make sure an nvlist and
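
Review note on the hunks above: the recursion guard added to nvs_embedded() follows a common pattern, which is to check a depth counter before descending, increment it, recurse, and decrement it on the way out. The following is a minimal standalone sketch of that pattern, not the nvpair code itself. The names node_t, walker_t and walk() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical nested list: each node may embed one child list. */
typedef struct node {
	struct node *child;
} node_t;

/* Hypothetical walker state, playing the role of nvs_recursion. */
typedef struct {
	int depth;
	int max_depth;
} walker_t;

/*
 * Visit a list and its embedded children. Returns 0 on success, or
 * EINVAL once nesting would exceed max_depth. The counter is restored
 * on every path that incremented it, so the walker can be reused.
 */
static int
walk(walker_t *w, const node_t *n)
{
	int err;

	assert(w->depth >= 0);
	if (n == NULL)
		return (0);
	if (w->depth >= w->max_depth)
		return (EINVAL);
	w->depth++;
	err = walk(w, n->child);
	w->depth--;
	return (err);
}
```

With a limit of 2, a chain of three nested nodes is rejected with EINVAL, while a limit of 3 accepts it. In both cases the counter is back at 0 afterwards.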

View File

@ -1451,6 +1451,13 @@ arc_buf_info(arc_buf_t *ab, arc_buf_info_t *abi, int state_index)
l2arc_buf_hdr_t *l2hdr = NULL;
arc_state_t *state = NULL;
memset(abi, 0, sizeof (arc_buf_info_t));
if (hdr == NULL)
return;
abi->abi_flags = hdr->b_flags;
if (HDR_HAS_L1HDR(hdr)) {
l1hdr = &hdr->b_l1hdr;
state = l1hdr->b_state;
@ -1458,9 +1465,6 @@ arc_buf_info(arc_buf_t *ab, arc_buf_info_t *abi, int state_index)
if (HDR_HAS_L2HDR(hdr))
l2hdr = &hdr->b_l2hdr;
memset(abi, 0, sizeof (arc_buf_info_t));
abi->abi_flags = hdr->b_flags;
if (l1hdr) {
abi->abi_datacnt = l1hdr->b_datacnt;
abi->abi_access = l1hdr->b_arc_access;
@ -2697,12 +2701,7 @@ arc_prune_task(void *ptr)
if (func != NULL)
func(ap->p_adjust, ap->p_private);
/* Callback unregistered concurrently with execution */
if (refcount_remove(&ap->p_refcnt, func) == 0) {
ASSERT(!list_link_active(&ap->p_node));
refcount_destroy(&ap->p_refcnt);
kmem_free(ap, sizeof (*ap));
}
refcount_remove(&ap->p_refcnt, func);
}
/*
@ -3179,13 +3178,10 @@ arc_flush(spa_t *spa, boolean_t retry)
void
arc_shrink(int64_t to_free)
{
if (arc_c > arc_c_min) {
if (arc_c > arc_c_min + to_free)
atomic_add_64(&arc_c, -to_free);
else
arc_c = arc_c_min;
uint64_t c = arc_c;
if (c > to_free && c - to_free > arc_c_min) {
arc_c = c - to_free;
atomic_add_64(&arc_p, -(arc_p >> arc_shrink_shift));
if (arc_c > arc_size)
arc_c = MAX(arc_size, arc_c_min);
@ -3193,6 +3189,8 @@ arc_shrink(int64_t to_free)
arc_p = (arc_c >> 1);
ASSERT(arc_c >= arc_c_min);
ASSERT((int64_t)arc_p >= 0);
} else {
arc_c = arc_c_min;
}
if (arc_size > arc_c)
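
Review note on the arc_shrink() hunk above: the new condition reads arc_c once into a local and subtracts only after checking that the unsigned result cannot wrap and stays above the minimum. The following is a minimal sketch of that check in isolation. The helper name shrink_target() and its parameters are hypothetical, and the sketch leaves out the arc_p adjustment and the atomics used by the real function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the new target after trying to shrink cur by to_free bytes.
 * The subtraction runs only when cur > to_free, so it cannot wrap
 * around zero. Any result that would not stay above floor is
 * replaced by floor.
 */
static uint64_t
shrink_target(uint64_t cur, uint64_t to_free, uint64_t floor)
{
	uint64_t next;

	if (cur > to_free && cur - to_free > floor)
		next = cur - to_free;
	else
		next = floor;
	assert(next >= floor);
	return (next);
}
```

For example, shrinking 100 by 30 with a floor of 50 gives 70, while shrinking 100 by 200 gives the floor instead of a wrapped 64-bit value.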
@ -3373,6 +3371,11 @@ arc_kmem_reap_now(void)
}
for (i = 0; i < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT; i++) {
#ifdef _ILP32
/* reach upper limit of cache size on 32-bit */
if (zio_buf_cache[i] == NULL)
break;
#endif
if (zio_buf_cache[i] != prev_cache) {
prev_cache = zio_buf_cache[i];
kmem_cache_reap_now(zio_buf_cache[i]);
@ -3757,7 +3760,7 @@ arc_adapt(int bytes, arc_state_t *state)
* If we're within (2 * maxblocksize) bytes of the target
* cache size, increment the target cache size
*/
VERIFY3U(arc_c, >=, 2ULL << SPA_MAXBLOCKSHIFT);
ASSERT3U(arc_c, >=, 2ULL << SPA_MAXBLOCKSHIFT);
if (arc_size >= arc_c - (2ULL << SPA_MAXBLOCKSHIFT)) {
atomic_add_64(&arc_c, (int64_t)bytes);
if (arc_c > arc_c_max)
@ -4316,17 +4319,11 @@ top:
/*
* Gracefully handle a damaged logical block size as a
* checksum error by passing a dummy zio to the done callback.
* checksum error.
*/
if (size > spa_maxblocksize(spa)) {
if (done) {
rzio = zio_null(pio, spa, NULL,
NULL, NULL, zio_flags);
rzio->io_error = ECKSUM;
done(rzio, buf, private);
zio_nowait(rzio);
}
rc = ECKSUM;
ASSERT3P(buf, ==, NULL);
rc = SET_ERROR(ECKSUM);
goto out;
}
@ -4564,13 +4561,19 @@ arc_add_prune_callback(arc_prune_func_t *func, void *private)
void
arc_remove_prune_callback(arc_prune_t *p)
{
boolean_t wait = B_FALSE;
mutex_enter(&arc_prune_mtx);
list_remove(&arc_prune_list, p);
if (refcount_remove(&p->p_refcnt, &arc_prune_list) == 0) {
refcount_destroy(&p->p_refcnt);
kmem_free(p, sizeof (*p));
}
if (refcount_remove(&p->p_refcnt, &arc_prune_list) > 0)
wait = B_TRUE;
mutex_exit(&arc_prune_mtx);
/* wait for arc_prune_task to finish */
if (wait)
taskq_wait_outstanding(arc_prune_taskq, 0);
ASSERT0(refcount_count(&p->p_refcnt));
refcount_destroy(&p->p_refcnt);
kmem_free(p, sizeof (*p));
}
void
@ -5100,7 +5103,9 @@ arc_tempreserve_space(uint64_t reserve, uint64_t txg)
int error;
uint64_t anon_size;
if (reserve > arc_c/4 && !arc_no_grow)
if (!arc_no_grow &&
reserve > arc_c/4 &&
reserve * 4 > (2ULL << SPA_MAXBLOCKSHIFT))
arc_c = MIN(arc_c_max, reserve * 4);
/*
@ -5244,7 +5249,7 @@ arc_tuning_update(void)
arc_c_max = zfs_arc_max;
arc_c = arc_c_max;
arc_p = (arc_c >> 1);
arc_meta_limit = MIN(arc_meta_limit, arc_c_max);
arc_meta_limit = MIN(arc_meta_limit, (3 * arc_c_max) / 4);
}
/* Valid range: 32M - <arc_c_max> */
@ -5470,14 +5475,15 @@ arc_init(void)
* If it has been set by a module parameter, take that.
* Otherwise, use a percentage of physical memory defined by
* zfs_dirty_data_max_percent (default 10%) with a cap at
* zfs_dirty_data_max_max (default 25% of physical memory).
* zfs_dirty_data_max_max (default 4G or 25% of physical memory).
*/
if (zfs_dirty_data_max_max == 0)
zfs_dirty_data_max_max = physmem * PAGESIZE *
zfs_dirty_data_max_max_percent / 100;
zfs_dirty_data_max_max = MIN(4ULL * 1024 * 1024 * 1024,
(uint64_t)physmem * PAGESIZE *
zfs_dirty_data_max_max_percent / 100);
if (zfs_dirty_data_max == 0) {
zfs_dirty_data_max = physmem * PAGESIZE *
zfs_dirty_data_max = (uint64_t)physmem * PAGESIZE *
zfs_dirty_data_max_percent / 100;
zfs_dirty_data_max = MIN(zfs_dirty_data_max,
zfs_dirty_data_max_max);
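
Review note on the arc_init() hunk above: the (uint64_t) casts widen the first operand so that the whole product is computed in 64 bits, which matters where physmem and PAGESIZE are 32-bit quantities. The following is a minimal sketch of the same arithmetic with the 4 GiB cap. The function names and the DIRTY_CAP_BYTES macro are hypothetical, and the page counts in the usage note are illustrative values only.

```c
#include <assert.h>
#include <stdint.h>

/* Cap taken from the hunk above: 4 GiB. */
#define	DIRTY_CAP_BYTES	(4ULL * 1024 * 1024 * 1024)

/*
 * Compute percent% of physical memory in bytes. Casting the first
 * operand to uint64_t makes every later multiplication 64-bit, even
 * when unsigned long is only 32 bits wide.
 */
static uint64_t
percent_of_memory(unsigned long pages, unsigned long page_size,
    unsigned int percent)
{
	assert(percent <= 100);
	return ((uint64_t)pages * page_size * percent / 100);
}

/* Apply the 4 GiB ceiling to the percentage-based limit. */
static uint64_t
dirty_max_max(unsigned long pages, unsigned long page_size,
    unsigned int percent)
{
	uint64_t limit = percent_of_memory(pages, page_size, percent);

	return (limit < DIRTY_CAP_BYTES ? limit : DIRTY_CAP_BYTES);
}
```

With 4096-byte pages, 1048576 pages (4 GiB) at 25 percent yields 1 GiB, and 16777216 pages (64 GiB) at 25 percent is clamped to the 4 GiB cap.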

View File

@ -303,7 +303,7 @@ dbuf_verify_user(dmu_buf_impl_t *db, dbvu_verify_type_t verify_type)
*/
ASSERT3U(holds, >=, db->db_dirtycnt);
} else {
if (db->db_immediate_evict == TRUE)
if (db->db_user_immediate_evict == TRUE)
ASSERT3U(holds, >=, db->db_dirtycnt);
else
ASSERT3U(holds, >, 0);
@ -1880,8 +1880,9 @@ dbuf_create(dnode_t *dn, uint8_t level, uint64_t blkid,
db->db_blkptr = blkptr;
db->db_user = NULL;
db->db_immediate_evict = 0;
db->db_freed_in_flight = 0;
db->db_user_immediate_evict = FALSE;
db->db_freed_in_flight = FALSE;
db->db_pending_evict = FALSE;
if (blkid == DMU_BONUS_BLKID) {
ASSERT3P(parent, ==, dn->dn_dbuf);
@ -2318,12 +2319,13 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
arc_buf_freeze(db->db_buf);
if (holds == db->db_dirtycnt &&
db->db_level == 0 && db->db_immediate_evict)
db->db_level == 0 && db->db_user_immediate_evict)
dbuf_evict_user(db);
if (holds == 0) {
if (db->db_blkid == DMU_BONUS_BLKID) {
dnode_t *dn;
boolean_t evict_dbuf = db->db_pending_evict;
/*
* If the dnode moves here, we cannot cross this
@ -2338,7 +2340,7 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
* Decrementing the dbuf count means that the bonus
* buffer's dnode hold is no longer discounted in
* dnode_move(). The dnode cannot move until after
* the dnode_rele_and_unlock() below.
* the dnode_rele() below.
*/
DB_DNODE_EXIT(db);
@ -2348,35 +2350,10 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
*/
mutex_exit(&db->db_mtx);
/*
* If the dnode has been freed, evict the bonus
* buffer immediately. The data in the bonus
* buffer is no longer relevant and this prevents
* a stale bonus buffer from being associated
* with this dnode_t should the dnode_t be reused
* prior to being destroyed.
*/
mutex_enter(&dn->dn_mtx);
if (dn->dn_type == DMU_OT_NONE ||
dn->dn_free_txg != 0) {
/*
* Drop dn_mtx. It is a leaf lock and
* cannot be held when dnode_evict_bonus()
* acquires other locks in order to
* perform the eviction.
*
* Freed dnodes cannot be reused until the
* last hold is released. Since this bonus
* buffer has a hold, the dnode will remain
* in the free state, even without dn_mtx
* held, until the dnode_rele_and_unlock()
* below.
*/
mutex_exit(&dn->dn_mtx);
if (evict_dbuf)
dnode_evict_bonus(dn);
mutex_enter(&dn->dn_mtx);
}
dnode_rele_and_unlock(dn, db);
dnode_rele(dn, db);
} else if (db->db_buf == NULL) {
/*
* This is a special case: we never associated this
@ -2423,7 +2400,7 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
} else {
dbuf_clear(db);
}
} else if (db->db_objset->os_evicting ||
} else if (db->db_pending_evict ||
arc_buf_eviction_needed(db->db_buf)) {
dbuf_clear(db);
} else {
@ -2471,7 +2448,7 @@ dmu_buf_set_user_ie(dmu_buf_t *db_fake, dmu_buf_user_t *user)
{
dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;
db->db_immediate_evict = TRUE;
db->db_user_immediate_evict = TRUE;
return (dmu_buf_set_user(db_fake, user));
}
@ -2651,6 +2628,22 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx)
if (db->db_blkid == DMU_SPILL_BLKID) {
mutex_enter(&dn->dn_mtx);
if (!(dn->dn_phys->dn_flags & DNODE_FLAG_SPILL_BLKPTR)) {
/*
* In the previous transaction group, the bonus buffer
* was entirely used to store the attributes for the
* dnode which overrode the dn_spill field. However,
* when adding more attributes to the file a spill
* block was required to hold the extra attributes.
*
* Make sure to clear the garbage left in the dn_spill
* field from the previous attributes in the bonus
* buffer. Otherwise, after writing out the spill
* block to the new allocated dva, it will free
* the old block pointed to by the invalid dn_spill.
*/
db->db_blkptr = NULL;
}
dn->dn_phys->dn_flags |= DNODE_FLAG_SPILL_BLKPTR;
mutex_exit(&dn->dn_mtx);
}

View File

@ -148,7 +148,6 @@ dbuf_stats_hash_table_data(char *buf, size_t size, void *data)
}
mutex_enter(&db->db_mtx);
mutex_exit(DBUF_HASH_MUTEX(h, dsh->idx));
if (db->db_state != DB_EVICTING) {
length = __dbuf_stats_hash_table_data(buf, size, db);
@ -157,7 +156,6 @@ dbuf_stats_hash_table_data(char *buf, size_t size, void *data)
}
mutex_exit(&db->db_mtx);
mutex_enter(DBUF_HASH_MUTEX(h, dsh->idx));
}
mutex_exit(DBUF_HASH_MUTEX(h, dsh->idx));

Some files were not shown because too many files have changed in this diff