Compare commits

...

128 Commits

Author SHA1 Message Date
Tony Hutter 2bc71fa976 Prepare to release 0.6.5.11
META file and RPM release log updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-07-10 11:01:14 -07:00
Tony Hutter 5a20d4283c Linux 4.12 compat: super_setup_bdi_name() - add missing code
This includes code that was mistakenly left out of the 7dae2c8 merge into
0.6.5.10.  Its inclusion fixes a kernel warning on Kubuntu 17.04:

	WARN_ON(sb->s_bdi != &noop_backing_dev_info);

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6089
Closes #6324
(backported from zfs upstream commit 7dae2c81e7)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
2017-07-10 11:00:34 -07:00
alaviss bf04e4d442 Musl libc fixes
Musl libc's <stdio.h> doesn't include <stdarg.h>, which cause
`va_start` and `va_end` end up being undefined symbols.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Leorize <alaviss@users.noreply.github.com>
Closes #6310
2017-07-06 15:25:39 -07:00
DHE 5e6057b574 Increase zfs_vdev_async_write_min_active to 2
Resilver operations frequently cause only a small amount of dirty data
to be written to disk at a time, resulting in the IO scheduler to only
issue 1 write at a time to the resilvering disk. When it is rotational
media the drive will often travel past the next sector to be written
before receiving a write command from ZFS, significantly delaying the
write of the next sector.

Raise zfs_vdev_async_write_min_active so that drives are kept fed
during resilvering.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Issue #4825
Closes #5926
2017-07-06 15:25:39 -07:00
loli10K 94d353a0bf Fix int overflow in zbookmark_is_before()
When the DSL scan code tries to resume the scrub from the saved
zbookmark calls dsl_scan_check_resume()->zbookmark_is_before() to
decide if the current dnode still needs to be visited.

A subtle int overflow condition in zbookmark_is_before(), exacerbated
by bumping the indirect block size to 128K (d7958b4), can lead to the
wrong assuption that the dnode does not need to be scanned.

This results in scrubs completing "successfully" in matter of mere
minutes on pools with several TB of used space because every time we
try to resume the dnode traversal on a dataset zbookmark_is_before()
tells us the whole objset has already been scanned completely.

Fix this by forcing the right shift operator to be executed before
the multiplication, as done in zbookmark_compare() (fcff0f3).

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
2017-07-06 15:25:39 -07:00
Tony Hutter e9fc1bd5e6 Fix RHEL 7.4 bio_set_op_attrs build error
On RHEL 7.4, include/linux/bio.h now includes a macro for
bio_set_op_attrs that conflicts with the ifndef in ZFS
include/linux/blkdev_compat.h.  This patch fixes the build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6234
Closes #6271
2017-07-06 15:25:39 -07:00
Tony Hutter b88f4d7ba7 GCC 7.1 fixes
GCC 7.1 with will warn when we're not checking the snprintf()
return code in cases where the buffer could be truncated. This
patch either checks the snprintf return code (where applicable),
or simply disables the warnings (ztest.c).

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6253
2017-07-06 15:25:39 -07:00
Brian Behlendorf 3e297b90f5 Remove complicated libspl assert wrappers
Effectively provide our own version of assert()/verify() for use
in user space.  This minimizes our dependencies and aligns the
user space assertion handling with what's used in the kernel.

Signed-off-by: Carlo Landmeter <clandmeter@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4449
2017-07-06 15:25:39 -07:00
Justin Lecher 709f25e248 Compatibilty with glibc-2.23
In glibc-2.23 <sys/sysmacros.h> isn't automatically included in
<sys/types.h> [1], so we need ot explicitely include it.

https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Justin Lecher <jlec@gentoo.org>
Closes #6132
2017-07-06 15:25:39 -07:00
Olaf Faaland cd2209b75e glibc 2.5 compat: use correct header for makedev() et al.
In glibc 2.5, makedev(), major(), and minor() are defined in
sys/sysmacros.h.  They are also defined in types.h for backward
compatability, but using these definitions triggers a compile warning.
This breaks the ZFS build, as it builds with -Werror.

autoconf email threads indicate these macros may be defined in
sys/mkdev.h in some cases.

This commit adds configure checks to detect where makedev() is defined:
  sys/sysmacros.h
  sys/mkdev.h

It assumes major() and minor() are defined in the same place.

The libspl types.h then includes
	sys/sysmacros.h (preferred) or
	sys/mkdev.h (2nd choice)
if one of those defines makedev().

This is done before including the system types.h.

An alternative would be to remove uses of major, minor, and makedev,
instead comparing the st_dev returned from stat64.  These configure
checks would then be unnecessary.

This change revealed that __NORETURN was being defined unnecessarily in
libspl/include/sys/sysmacros.h.  That definition is removed.

The files in which __NORETURN are used all include types.h, and so all
will get the definition provided by feature_tests.h

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5945
2017-07-06 15:25:39 -07:00
Tony Hutter a57fa2c532 Prepare to release 0.6.5.10
META file and RPM release log updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-06-12 15:31:33 -04:00
Brian Behlendorf 590509b75e Add MS_MANDLOCK mount failure message
Commit torvalds/linux@9e8925b6 allowed for kernels to be built
without support for mandatory locking (MS_MANDLOCK).  This will
result in 'zfs mount' failing when the nbmand=on property is set
if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING.

Unfortunately we can not reliably detect prior to the mount(2) system
call if the kernel was built with this support.  The best we can do
is check if the mount failed with EPERM and if we passed 'mand'
as a mount option and then print a more useful error message. e.g.

  filesystem 'tank/fs' has the 'nbmand=on' property set, this mount
  option may be disabled in your kernel.  Use 'zfs set nbmand=off'
  to disable this option and try to mount the filesystem again.

Additionally, switch the default error message case to use
strerror() to produce a more human readable message.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4729
Closes #6199
2017-06-09 14:05:15 -07:00
Matthew Ahrens d07a8deac8 OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations
Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Ported-by: Matt Ahrens <mahrens@delphix.com>

RAID-Z requires that space be allocated in multiples of P+1 sectors,
because this is the minimum size block that can have the required amount
of parity.  Thus blocks on RAIDZ1 must be allocated in a multiple of 2
sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4.  A sector
is a unit of 2^ashift bytes, typically 512B or 4KB.

To satisfy this constraint, the allocation size is rounded up to the
proper multiple, resulting in up to 3 "pad sectors" at the end of some
blocks.  The contents of these pad sectors are not used, so we do not
need to read or write these sectors.  However, some storage hardware
performs much worse (around 1/2 as fast) on mostly-contiguous writes
when there are small gaps of non-overwritten data between the writes.
Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
include pad sectors.  If writing a pad sector will fill the gap between
two (required) writes, we will issue the optional zio, thus doubling
performance.  The gap-filling performance improvement was introduced in
July 2009.

Writing the optional zio is done by the io aggregation code in
vdev_queue.c.  The problem is that it is also subject to the limit on
the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
default 128KB.  For a given block, if the amount of data plus padding
written to a leaf device exceeds zfs_vdev_aggregation_limit, the
optional zio will not be written, resulting in a ~2x performance
degradation.

The problem occurs only for certain values of ashift, compressed block
size, and RAID-Z configuration (number of parity and data disks).  It
cannot occur with the default recordsize=128KB.  If compression is
enabled, all configurations with recordsize=1MB or larger will be
impacted to some degree.

The problem notably occurs with recordsize=1MB, compression=off, with 10
disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors).  Therefore
this problem has been known as "the 1MB 10-wide RAIDZ2 (or 3) problem".

The problem also occurs with the following configurations:

With recordsize=512KB or 256KB, compression=off, the problem occurs only
in rarely-used configurations:
* 4-wide RAIDZ1 with recordsize=512KB and ashift=12 (4KB sectors)
* 4-wide RAIDZ2 (either recordsize, either ashift)
* 5-wide RAIDZ2 with recordsize=512KB (either ashift)
* 6-wide RAIDZ2 with recordsize=512KB (either ashift)

With recordsize=1MB, compression=off, ashift=9 (512B sectors)
* RAIDZ1 with 4 or 8 disks
* RAIDZ2 with 4, 8, or 10 disks
* RAIDZ3 with 6, 8, 9, or 10 disks

With recordsize=1MB, compression=off, ashift=12 (4KB sectors)
* RAIDZ1 with 7 or 8 disks
* RAIDZ2 with 4, 5, or 10 disks
* RAIDZ3 with 6, 9, or 10 disks

With recordsize=2MB and larger (which can only be selected by changing
kernel tunables), many configurations are affected, including with
higher numbers of disks (up to 18 disks with recordsize=2MB).

Increase zfs_vdev_aggregation_limit to allow the optional zio to be
aggregated, thus eliminating the problem.  Setting it to 256KB fixes all
commonly-used configurations.

The solution is to aggregate optional zio's regardless of the
aggregation size limit.

Analysis sponsored by Intel Corp.

OpenZFS-issue: https://www.illumos.org/issues/8005
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/321
Closes #5931
2017-06-09 14:05:15 -07:00
Chunwei Chen 69494c6aff Fix import wrong spare/l2 device when path change
If, for example, your aux device was /dev/sdc, but now the aux device is
removed and /dev/sdc points to other device. zpool import will still
use that device and corrupt it.

The problem is that the spa_validate_aux in spa_import, rather than
validate the on-disk label, it would actually write label to disk. We
remove them since spa_load_{spares,l2cache} seems to do everything we
need and they would actually validate on-disk label.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6158
2017-06-09 14:05:15 -07:00
Chunwei Chen 412e3c26a9 Fix import finding spare/l2cache when path changes
When spare or l2cache device path changes, zpool import will not fix up
their paths like normal vdev. The issue is that when you supply a pool
name argument to zpool import, it will use it to filter out device which
doesn't have the pool name in the label. Since spare and l2cache device
never have that in the label, they'll always get filtered out.

We fix this by making sure we never filter out a spare or l2cache
device.

Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6158
2017-06-09 14:05:15 -07:00
LOLi ed9cb8390b Linux 4.9 compat: fix zfs_ctldir xattr handling
Since torvalds/linux@d0a5b99 IOP_XATTR is used to indicate the inode
has xattr support: clear it for the ctldir inodes to avoid EIO errors.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6189
2017-06-09 14:05:15 -07:00
LOLi cb8210d125 Linux 4.12 compat: fix super_setup_bdi_name() call
Provide a format parameter to super_setup_bdi_name() so we don't
create duplicate names in '/devices/virtual/bdi' sysfs namespace which
would prevent us from mounting more than one ZFS filesystem at a time.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6147
2017-06-09 14:05:15 -07:00
Brian Behlendorf 21fd04ec40 Linux 4.12 compat: CURRENT_TIME removed
Linux 4.9 added current_time() as the preferred interface to get
the filesystem time.  CURRENT_TIME was retired in Linux 4.12.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6114
2017-06-09 14:05:15 -07:00
Brian Behlendorf e4cb6ee6a5 Linux 4.12 compat: super_setup_bdi_name()
All filesystems were converted to dynamically allocated BDIs.  The
destruction of backing_dev_info structures is handled as part of
super block destruction.  Refactor the code to abstract away the
details of creating and destroying a BDI.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6089
2017-06-09 14:05:15 -07:00
Brian Behlendorf a83a4f9d10 Limit zfs_dirty_data_max_max to 4G
Reinstate default 4G zfs_dirty_data_max_max limit.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6072
Closes #6081
2017-06-09 14:05:15 -07:00
Matthew Ahrens 1e5f75ecbe OpenZFS 8166 - zpool scrub thinks it repaired offline device
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Matthew Ahrens <mahrens@delphix.com>

If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged.  When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started.  The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.

The fix is to never clear the DTL of offline devices.  Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.

The problem can be worked around by running "zpool scrub" after
"zpool online".

OpenZFS-issue: https://www.illumos.org/issues/8166
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/372
Closes #5806
Closes #6103
2017-06-09 14:05:15 -07:00
Ned Bass 36ccb9db43 vdev_id: fix failure due to multipath -l bug
Udev may fail to create the expected symbolic links in
/dev/disk/by-vdev on systems with the
device-mapper-multipath-0.4.9-100.el6 package installed. This affects
RHEL 6.9 and possibly other downstream distributions.

That version of the multipath command may incorrectly list a drive
state as "unkown" instead of "running". The issue was introduced
in the patch for https://bugzilla.redhat.com/show_bug.cgi?id=1401769

The vdev_id udev helper uses the state reported by "multipath -l" to
detect an online component disk of a multipath device in order to
resolve its physical slot and enclosure. Changing the command
invocation to "multipath -ll" works around the above issue by causing
multipath to consult additional sources of information to determine
the drive state.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6039
2017-06-09 14:05:15 -07:00
jxiong a2c9518711 Guarantee PAGESIZE alignment for large zio buffers
In current implementation, only zio buffers in 16KB and bigger are
guaranteed PAGESIZE alignment. This breaks Lustre since it assumes
that 'arc_buf_t::b_data' must be page aligned when zio buffers are
greater than or equal to PAGESIZE.

This patch will make the zio buffers to be PAGESIZE aligned when
the sizes are not less than PAGESIZE.

This change may cause a little bit memory waste but that should be
fine because after ABD is introduced, zio buffers are used to hold
data temporarily and live in memory for a short while.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Closes #6084
2017-06-09 14:05:15 -07:00
Tony Hutter cc519c4027 Fix harmless "BARRIER is deprecated" kernel warning on Centos 6.8
A one time warning after module load that "BARRIER is deprecated" was seen
on the heavily patched 2.6.32-642.13.1.el6.x86_64 Centos 6.8 kernel.  It seems
that kernel had both the old BARRIER and the newer FLUSH/FUA interfaces
defined.  This fixes the warning by prefering the newer FLUSH/FUA interface
if it's available.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5739
Closes #5828
2017-06-09 14:05:15 -07:00
Chunwei Chen dbb48937ce Add kmap_atomic in dmu_bio_copy
This is needed for 32 bit systems.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-06-09 14:05:15 -07:00
Tim Chase 34a3a7c660 zdb: segfault in dump_bpobj_subobjs()
Avoid buffer overrun on all-zero bpobj subobjects by using signed
array index.  Also fix the type cast on the printf() argument.

Signed-off-by: Tim Chase <tim@onlight.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3905
2017-06-09 14:05:15 -07:00
Brian Behlendorf 4a4c57d5ff Fix atomic_sub_64() i386 assembly implementation
The atomic_sub_64() should use sbbl instead of adcl.  In user
space these atomics are used for statistics tracking and aren't
critical which explain how this was overlooked.  The kernel
space implementation of these atomics are layered on the
architecture specific implementations provided by the kernel.

Reviewed by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5671
Closes #5717
2017-06-09 14:05:15 -07:00
Chunwei Chen 2094a93e87 Fix loop device becomes read-only
Commit 933ec99 removes read and write from f_op because the vfs layer will
select iter_write or aio_write automatically. However, for Linux <= 4.0,
loop_set_fd will actually check f_op->write and set read-only if not exists.
This patch add them back and use the generic do_sync_{read,write} for
aio_{read,write} and new_sync_{read,write} for {read,write}_iter.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5776
Closes #5855
2017-06-09 14:05:15 -07:00
loli10K 03336d011c Allow ZVOL bookmarks to be listed recursively
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4503
Closes #5072
2017-06-09 14:05:15 -07:00
Brian Behlendorf f0a4bfbe4d Fix zfs-mount.service failure on boot
The mount(8) command will helpfully try to resolve any device name
which is passed in.  It does this by applying some simple heuristics
before passing it along to the registered mount helper.

Normally this fine.  However, one of these heuristics is to prepend
the current working directory to the passed device name.  If that
resulting directory name exists mount(8) will perform the mount(2)
system call and never invoke the helper utility.

Since the cwd for systemd when running as the system instance is
the root directory the default mount points created by zfs(8) can
cause a mount failure.

This change avoids the issue by explicitly setting the cwd to
a different path when performing the mount.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5719
2017-06-09 14:05:15 -07:00
Brian Behlendorf ebef1f2fb6 Fix iput() calls within a tx
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5758
2017-06-09 14:05:15 -07:00
Chunwei Chen 00a1a11989 Fix off by one in zpl_lookup
Doing the following command would return success with zfs creating an orphan
object.

	touch $(for i in $(seq 256); do printf "n"; done)

The funny thing is that this will only work once for each directory, because
after upgraded to fzap, zfs_lookup would fail properly since it has additional
length check.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5768
2017-06-09 14:05:15 -07:00
Olaf Faaland b4c181dc76 Linux 4.11 compat: iops.getattr and friends
In torvalds/linux@a528d35, there are changes to the getattr family of functions,
struct kstat, and the interface of inode_operations .getattr.

The inode_operations .getattr and simple_getattr() interface changed to:

int (*getattr) (const struct path *, struct dentry *, struct kstat *,
    u32 request_mask, unsigned int query_flags)

The request_mask argument indicates which field(s) the caller intends to use.
Fields the caller has not specified via request_mask may be set in the returned
struct anyway, but their values may be approximate.

The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.

Currently both fields are ignored.  It is possible that getattr-related
functions within zfs could be optimized based on the request_mask.

struct kstat includes new fields:
u32               result_mask;  /* What fields the user got */
u64               attributes;   /* See STATX_ATTR_* flags */
struct timespec   btime;        /* File creation time */

Fields attribute and btime are cleared; the result_mask reflects this.  These
appear to be optional based on simple_getattr() and vfs_getattr() within the
kernel, which take the same approach.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5875
2017-06-09 14:05:15 -07:00
Olaf Faaland 626ba3142b Linux 4.11 compat: avoid refcount_t name conflict
Linux 4.11 introduces a new type, refcount_t, which conflicts with the
type of the same name defined within ZFS.

Rename the ZFS type zfs_refcount_t.  Within the ZFS code, use a macro to
cause references to refcount_t to be changed to zfs_refcount_t at
compile time.  This reduces conflicts when later landing OpenZFS
patches.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #5823
Closes #5842
2017-06-09 14:05:15 -07:00
Brian Behlendorf 0bbd80c058 Prepare to release 0.6.5.9
META file and RPM release log updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2017-02-03 13:11:42 -08:00
Chunwei Chen 10fbf7c406 Make zfs mount according to relatime config in dataset
Also enable lazytime in mount.zfs

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2017-02-03 11:58:19 -08:00
Chunwei Chen 1ad7f89628 Enable lazytime semantic for atime
Linux 4.0 introduces lazytime. The idea is that when we update the atime, we
delay writing it to disk for as long as it is reasonably possible.

When lazytime is enabled, dirty_inode will be called with only I_DIRTY_TIME
flag whenever i_atime is updated. So under such condition, we will set
z_atime_dirty. We will only write it to disk if file is closed, inode is
evicted or setattr is called. Ideally, we should also write it whenever SA
is going to be updated, but it is left for future improvement.

There's one thing that we should take care of now that we allow i_atime to be
dirty. In original implementation, whenever SA is modified, zfs_inode_update
will be called to overwrite every thing in inode. This will cause dirty
i_atime to be discarded. We fix this by don't overwrite i_atime in
zfs_inode_update. We only overwrite i_atime when allocating new inode or doing
zfs_rezget with zfs_inode_update_new.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2017-02-03 11:58:19 -08:00
Chunwei Chen 5137c95dec Fix atime handling and relatime
The problem for atime:

We have 3 places for atime: inode->i_atime, znode->z_atime and SA. And its
handling is a mess. A huge part of mess regarding atime comes from
zfs_tstamp_update_setup, zfs_inode_update, and zfs_getattr, which behave
inconsistently with those three values.

zfs_tstamp_update_setup clears z_atime_dirty unconditionally as long as you
don't pass ATTR_ATIME. Which means every write(2) operation which only updates
ctime and mtime will cause atime changes to not be written to disk.

Also zfs_inode_update from write(2) will replace inode->i_atime with what's
inside SA(stale). But doesn't touch z_atime. So after read(2) and write(2).
You'll have i_atime(stale), z_atime(new), SA(stale) and z_atime_dirty=0.

Now, if you do stat(2), zfs_getattr will actually replace i_atime with what's
inside, z_atime. So you will have now you'll have i_atime(new), z_atime(new),
SA(stale) and z_atime_dirty=0. These will all gone after umount. And you'll
leave with a stale atime.

The problem for relatime:

We do have a relatime config inside ZFS dataset, but how it should interact
with the mount flag MS_RELATIME is not well defined. It seems it wanted
relatime mount option to override the dataset config by showing it as
temporary in `zfs get`. But at the same time, `zfs set relatime=on|off` would
also seems to want to override the mount option. Not to mention that
MS_RELATIME flag is actually never passed into ZFS, so it never really worked.

How Linux handles atime:

The Linux kernel actually handles atime completely in VFS, except for writing
it to disk. So if we remove the atime handling in ZFS, things would just work,
no matter it's strictatime, relatime, noatime, or even O_NOATIME. And whenever
VFS updates the i_atime, it will notify the underlying filesystem via
sb->dirty_inode().

And also there's one thing to note about atime flags like MS_RELATIME and
other flags like MS_NODEV, etc. They are mount point flags rather than
filesystem(sb) flags. Since native linux filesystem can be mounted at multiple
places at the same time, they can all have different atime settings. So these
flags are never passed down to filesystem drivers.

What this patch tries to do:

We remove znode->z_atime, since we won't gain anything from it. We remove most
of the atime handling and leave it to VFS. The only thing we do with atime is
to write it when dirty_inode() or setattr() is called. We also add
file_accessed() in zpl_read() since it's not provided in vfs_read().

After this patch, only the MS_RELATIME flag will have effect. The setting in
dataset won't do anything. We will make zfstuil to mount ZFS with MS_RELATIME
set according to the setting in dataset in future patch.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2017-02-03 11:58:19 -08:00
Chunwei Chen a0e099580a Fix write(2) returns zero bug from 933ec99
For generic_write_checks with 2 args, we can exit when it returns zero because
it means count is zero. However this is not the case for generic_write_checks
with 4 args, where zero means no error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5720
Closes #5726
2017-02-03 10:25:41 -08:00
Chunwei Chen 5070e5311c Retire .write/.read file operations
The .write/.read file operations callbacks can be retired since
support for .read_iter/.write_iter and .aio_read/.aio_write has
been added.  The vfs_write()/vfs_read() entry functions will
select the correct interface for the kernel.  This is desirable
because all VFS write/read operations now rely on common code.

This change also add the generic write checks to make sure that
ulimits are enforced correctly on write.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5587
Closes #5673
2017-02-03 10:25:37 -08:00
Chunwei Chen 110470266d Fix zmo leak when zfs_sb_create fails
zfs_sb_create would normally takes ownership of zmo, and it will be freed in
zfs_sb_free. However, when zfs_sb_create fails we need to explicit free it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5490
Closes #5496
2017-02-03 10:25:33 -08:00
Chunwei Chen d425320ac8 Fix fchange in zpl_ioctl_setflags
The fchange in zpl_ioctl_setflags was for detecting flag change. However it
was incorrect and would always fail to detect a flag change from set to unset,
causing users without CAP_LINUX_IMMUTABLE to be able to unset flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:25:29 -08:00
Chunwei Chen 2a51899946 Fix wrong operator in xvattr.h
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:25:25 -08:00
Chunwei Chen f3da7a1b40 Don't count '@' for dataset namelen if not a snapshot
Don't count '@' for dataset namelen if not a snapshot.  This
fixes making a pool unimportable when the  dataset namelen
is 255.

Add test file for zfs create name length 255.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5432
Closes #5456
2017-02-03 10:25:22 -08:00
Richard Yao 625ee0a5e0 zfs_inode_update should not call dmu_object_size_from_db under spinlock
We should never block when holding a spin lock, but zfs_inode_update can
block in the critical section of a spin lock in zfs_inode_update:

zfs_inode_update -> dmu_object_size_from_db -> zrl_add -> mutex_enter

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3858
2017-02-03 10:25:19 -08:00
Gvozden Neskovic 9dd467a271 Fix ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE check
Pass `ACL_TYPE_ACCESS` for type parameter of `set_cached_acl()` and
`forget_cached_acl()` to avoid removal of dead code after BUG() in
compile time. Tested on 3.2.0 kernel.

Introduced in 3779913

Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5378
2017-02-03 10:25:15 -08:00
Isaac Huang 6ebfe58117 Explicit block device plugging when submitting multiple BIOs
Without plugging, the default 'noop' scheduler will not merge
the BIOs which are part of a large ZIO.

Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5181
2017-02-03 10:25:12 -08:00
Tim Chase 39d65926c9 4.10 compat - BIO flag changes and others
[bio] The req_op enum was changed to req_opf.  Update the "Linux 4.8 API"
autotools checks to use an int to determine whether the various REQ_OP
values are defined.  This should work properly on kernels >= 4.8.

[bio] bio_set_op_attrs() is now an inline function and can't be detected
with #ifdef.  Add a configure check to determine whether bio_set_op_attrs()
is defined.  Move the local definition of it from vdev_disk.c to
blkdev_compat.h for consistency with other related compability shims.

[bio] The read/write flags and their modifiers, including WRITE_FLUSH,
WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h.  Add the new
bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA
and set the flags appropriately for each supported kernel version.

[vfs] The generic_readlink() function has been made static.  If .readlink
in inode_operations is NULL, generic_readlink() is used.

[zol typo] Completely unrelated to 4.10 compat, fix a typo in the check
for REQ_OP_SECURE_ERASE so that the proper macro is defined:

    s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5499
2017-02-03 10:25:07 -08:00
Brian Behlendorf a57228e51c Reorder HAVE_BIO_RW_* checks
The HAVE_BIO_RW_* #ifdef's must appear before REQ_* #ifdef's
in the bio_is_flush() and bio_is_discard() macros.  Linux 2.6.32
era kernels defined both of values and the HAVE_BIO_RW_* must be
used in this case.  This resulted in a panic in zconfig test 5.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4951
Closes #4959
2017-02-03 10:25:03 -08:00
Brian Behlendorf bea68ec5bf Remove custom root pool import code
Non-Linux OpenZFS implementations require additional support to be
used a root pool.  This code should simply be removed to avoid
confusion and improve readability.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4951
2017-02-03 10:24:59 -08:00
Tim Chase 88fa992878 Fix sync behavior for disk vdevs
Prior to b39c22b, which was first generally available in the 0.6.5
release as b39c22b, ZoL never actually submitted synchronous read or write
requests to the Linux block layer.  This means the vdev_disk_dio_is_sync()
function had always returned false and, therefore, the completion in
dio_request_t.dr_comp was never actually used.

In b39c22b, synchronous ZIO operations were translated to synchronous
BIO requests in vdev_disk_io_start().  The follow-on commits 5592404 and
aa159af fixed several problems introduced by b39c22b.  In particular,
5592404 introduced the new flag parameter "wait" to __vdev_disk_physio()
but under ZoL, since vdev_disk_physio() is never actually used, the wait
flag was always zero so the new code had no effect other than to cause
a bug in the use of the dio_request_t.dr_comp which was fixed by aa159af.

The original rationale for introducing synchronous operations in b39c22b
was to hurry certains requests through the BIO layer which would have
otherwise been subject to its unplug timer which would increase the
latency.  This behavior of the unplug timer, however, went away during the
transition of the plug/unplug system between kernels 2.6.32 and 2.6.39.

To handle the unplug timer behavior on 2.6.32-2.6.35 kernels the
BIO_RW_UNPLUG flag is used as a hint to suppress the plugging behavior.

For kernels 2.6.36-2.6.38, the REQ_UNPLUG macro will be available and
ise used for the same purpose.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4858
2017-02-03 10:24:54 -08:00
Chunwei Chen c09af45f7b Use set_cached_acl and forget_cached_acl when possible
Originally, these two function are inline, so their usability is tied to
posix_acl_release. However, since Linux 3.14, they became EXPORT_SYMBOL, so we
can always use them. In this patch, we create an independent test for these
two functions so we can use them when possible.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:50 -08:00
Chunwei Chen 64c259c509 Batch free zpl_posix_acl_release
Currently every calls to zpl_posix_acl_release will schedule a delayed task,
and each delayed task will add a timer. This used to be fine except for
possibly bad performance impact.

However, in Linux 4.8, a new timer wheel implementation[1] is introduced. In
this new implementation, the larger the delay, the less accuracy the timer is.
So when we have a flood of timer from zpl_posix_acl_release, they will expire
at the same time. Couple with the fact that task_expire will do linear search
with lock held. This causes an extreme amount of contention inside interrupt
and would actually lockup the system.

We fix this by doing batch free to prevent a flood of delayed task. Every call
to zpl_posix_acl_release will put the posix_acl to be freed on a lockless
list. Every batch window, 1 sec, the zpl_posix_acl_free will fire up and free
every posix_acl that passed the grace period on the list. This way, we only
have one delayed task every second.

[1] https://lwn.net/Articles/646950/

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:45 -08:00
Neal Gompa (ニール・ゴンパ) 447040c31d Process all systemd services through the systemd scriptlets
This patch ensures that all systemd services are processed through the
systemd scriptlets, so that services are properly configured per the
preset file installed by the package.

Without this, zfs.target is set, but none of the services are enabled per
the preset file, meaning automounting filesystems and such won't work
out of the box.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa13@gmail.com>
Closes #5356
2017-02-03 10:24:41 -08:00
tuxoko 734e235f67 Fix cred leak in zpl_fallocate_common
This is caught by kmemleak when running compress_004_pos

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5244
Closes #5330
2017-02-03 10:24:38 -08:00
Hajo Möller ffcd0c5434 Fix lookup_bdev() on Ubuntu
Ubuntu added support for checking inode permissions to lookup_bdev() in kernel
commit 193fb6a2c94fab8eb8ce70a5da4d21c7d4023bee (merged in 4.4.0-6.21).
Upstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636517

This patch adds a test for Ubuntu's variant of lookup_bdev() to configure and
calls the function in the correct way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Closes #5336
2017-02-03 10:24:34 -08:00
LOLi d2beed9116 Fix uninitialized variable snapprops_nvlist in zfs_receive_one
The variable snapprops_nvlist was never initialized, so properties
were not applied to the received snapshot.

Additionally, add zfs_receive_013_pos.ksh script to ZFS test suite to exercise
'zfs receive' functionality for user properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4338
2017-02-03 10:24:30 -08:00
Tim Chase 4c83fa9b87 Write issue taskq shouldn't be dynamic
This is as much an upstream compatibility as it's a bit of a performance
gain.

The illumos taskq implemention doesn't allow a TASKQ_THREADS_CPU_PCT type
to be dynamic and in fact enforces as much with an ASSERT.

As to performance, if this taskq is dynamic, it can cause excessive
contention on tq_lock as the threads are created and destroyed because it
can see bursts of many thousands of tasks in a short time, particularly
in heavy high-concurrency zvol write workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5236
2017-02-03 10:24:26 -08:00
Brian Behlendorf cbf8713874 Use large stacks when available
While stack size will vary by architecture it has historically defaulted to
8K on x86_64 systems.  However, as of Linux 3.15 the default thread stack
size was increased to 16K.  These kernels are now the default in most non-
enterprise distributions which means we no longer need to assume 8K stacks.

This patch takes advantage of that fact by appropriately reverting stack
conservation changes which were made to ensure stability.  Changes which
may have had a negative impact on performance for certain workloads.  This
also has the side effect of bringing the code slightly more in line with
upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #4059
2017-02-03 10:24:22 -08:00
Stian Ellingsen dc3d6a6db1 Use env, not sh in zfsctl_snapshot_{,un}mount()
Call mount and umount via /usr/bin/env instead of /bin/sh in
zfsctl_snapshot_mount() and zfsctl_snapshot_unmount().

This change fixes a shell code injection flaw.  The call to /bin/sh
passed the mountpoint unescaped, only surrounded by single quotes.  A
mountpoint containing one or more single quotes would cause the command
to fail or potentially execute arbitrary shell code.

This change also provides compatibility with grsecurity patches.
Grsecurity only allows call_usermodehelper() to use helper binaries in
certain paths.  /usr/bin/* is allowed, /bin/* is not.
2017-02-03 10:24:17 -08:00
Stian Ellingsen d71db895a1 Fix use after free in zfsctl_snapshot_unmount() 2017-02-03 10:24:12 -08:00
tuxoko 42dae6d7a6 Linux 3.14 compat: assign inode->set_acl
Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come
from setxattr, which will handle by the acl xattr_handler, and we already
handles that well. However, nfsd will directly calls inode->set_acl or
return error if it doesn't exists.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5371
Closes #5375
2017-02-03 10:24:09 -08:00
Brian Behlendorf f85c85ea06 Linux 4.9 compat: inode_change_ok() renamed setattr_prepare()
In torvalds/linux@31051c8 the inode_change_ok() function was
renamed setattr_prepare() and updated to take a dentry ratheri
than an inode.  Update the code to call the setattr_prepare()
and add a wrapper function which call inode_change_ok() for
older kernels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:06 -08:00
Chunwei Chen 670508f080 Linux 4.9 compat: remove iops->{set,get,remove}xattr
In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and
generic_{set,get,remove}xattr are removed. xattr operations will directly
go through sb->s_xattr.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:24:00 -08:00
Chunwei Chen 28172e8aa7 Linux 4.9 compat: iops->rename() wants flags
In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are
merged together into iops->rename(), it now wants flags.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:23:57 -08:00
tuxoko c0716f13ef Linux 4.7 compat: Fix deadlock during lookup on case-insensitive
We must not use d_add_ci if the dentry already has the real name. Otherwise,
d_add_ci()->d_alloc_parallel() will find itself on the lookup hash and wait
on itself causing deadlock.

Tested-by: satmandu
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5124
Closes #5141
Closes #5147
Closes #5148
2017-02-03 10:23:53 -08:00
DeHackEd dbc95a682c Kernel 4.9 compat: file_operations->aio_fsync removal
Linux kernel commit 723c038475b78 removed this field.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #5393
2017-02-03 10:23:50 -08:00
Chunwei Chen 20a0763746 Remove dir inode operations from zpl_inode_operations
These operations are dir specific, there's no point putting them in
zpl_inode_operations which is for regular files.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
2017-02-03 10:23:47 -08:00
Brian Behlendorf e56852059f Fix uninitialized variable in avl_add()
Silence the following warning when compiling with gcc 5.4.0.
Specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609.

module/avl/avl.c: In function ‘avl_add’:
module/avl/avl.c:647:2: warning: ‘where’ may be used uninitialized
    in this function [-Wmaybe-uninitialized]
  avl_insert(tree, new_node, where);

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2017-02-03 10:23:42 -08:00
Ned Bass 1f734a62ac Prepare to release 0.6.5.8
META file and RPM release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-09-09 13:21:10 -07:00
Brian Behlendorf ffddb4dfab Fix gcc -Warray-bounds check for dump_object() in zdb
As of gcc 6.1.1 20160621 (Red Hat 6.1.1-3) an array bounds warnings
is detected in the zdb the dump_object() function.  The analysis is
correct but difficult to interpret because this is implemented as a
macro.  Rework the ZDB_OT_NAME in to a function and remove the case
detected by gcc which is a side effect of the DMU_OT_IS_VALID() macro.

  zdb.c: In function ‘dump_object’:
  zdb.c:1931:288: error: array subscript is outside array bounds
      [-Werror=array-bounds]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #4907
2016-09-09 13:21:10 -07:00
Brian Behlendorf 8fe1fb14cb Handle block pointers with a corrupt logical size
Commit 5f6d0b6 was originally added to gracefully handle block
pointers with a damaged logical size.  However, it incorrectly
assumed that all passed arc_done_func_t could handle a NULL
arc_buf_t.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4069
Closes #4080
2016-09-09 13:21:10 -07:00
Brian Behlendorf bf8b4a9fd5 Linux 4.8 compat: Fix removal of bio->bi_rw member
All users of bio->bi_rw have been replaced with compatibility wrappers.
This allows the kernel specific logic to be abstracted away, and for
each of the supported cases to be documented with the wrapper.  The
updated interfaces are as follows:

* void blk_queue_set_write_cache(struct request_queue *, bool, bool)
* boolean_t bio_is_flush(struct bio *)
* boolean_t bio_is_fua(struct bio *)
* boolean_t bio_is_discard(struct bio *)
* boolean_t bio_is_secure_erase(struct bio *)
* VDEV_WRITE_FLUSH_FUA

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4951
2016-09-09 13:21:10 -07:00
Brian Behlendorf 39a78fe9d4 Linux 4.8 compat: posix_acl_valid()
The posix_acl_valid() function has been updated to require a
user namespace.  Filesystem callers should normally provide the
user_ns from the super block associcated with the ACL; the
zpl_posix_acl_valid() wrapper has been added for this purpose.
See https://github.com/torvalds/linux/commit/0d4d717f for
complete details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4922
2016-09-09 13:21:10 -07:00
Chunwei Chen 6ae0dbdc8a Linux 4.8 compat: REQ_OP and bio_set_op_attrs()
New REQ_OP_* definitions have been introduced to separate the
WRITE, READ, and DISCARD operations from the flags.  This included
changing the encoding of bi_rw.  It places REQ_OP_* in high order
bits and other stuff in low order bits.  This encoding is done
through the new helper function bio_set_op_attrs.  For complete
details refer to:

https://github.com/torvalds/linux/commit/f215082
https://github.com/torvalds/linux/commit/4e1b2d5

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4892
Closes #4899
2016-09-09 13:21:10 -07:00
Brian Behlendorf a0591c4370 Linux 4.8 compat: REQ_PREFLUSH
The REQ_FLUSH flag was renamed REQ_PREFLUSH to avoid confusion with
REQ_OP_FLUSH.  See https://github.com/torvalds/linux/commit/28a8f0d3
for complete details.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4892
Issue #4899
2016-09-09 13:21:10 -07:00
Brian Behlendorf 68b8d22c6e Linux 4.8 compat: submit_bio()
The rw argument has been removed from submit_bio/submit_bio_wait.
Callers are now expected to set bio->bi_rw instead of passing it
in.  See https://github.com/torvalds/linux/commit/4e49ea4a for
complete details.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4892
Issue #4899
2016-09-09 13:21:09 -07:00
smh 3d824a8878 FreeBSD rS271776 - Persist vdev_resilver_txg changes
Persist vdev_resilver_txg changes to avoid panic caused by validation
vs a vdev_resilver_txg value from a previous resilver.

Authored-by: smh <smh@FreeBSD.org>
Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/5154
FreeBSD-issue: https://reviews.freebsd.org/rS271776
FreeBSD-commit: https://github.com/freebsd/freebsd/commit/c3c60bf
Closes #4790
2016-09-09 13:21:09 -07:00
GeLiXin e5c02cbb03 Fix: Array bounds read in zprop_print_one_property()
If the loop index i comes to (ZFS_GET_NCOLS - 1), the cbp->cb_columns[i + 1]
actually read the data of cbp->cb_colwidths[0], which means the array
subscript is above array bounds.

Luckily the cbp->cb_colwidths[0] is always 0 and it seems we haven't
looped enough times to exceed the array bounds so far, but it's really
a secluded risk someday.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5003
2016-09-09 13:21:09 -07:00
GeLiXin e66b546cb7 Fix call zfs_get_name() with invalid parameter
zfs_get_name() expects a parameter of type zfs_handle_t *zhp , but
gets an invalid parameter type of zfs_handle_t **zhp actually in
libzfs_dataset_cmp(), which may trigger a coredump if called.

libzfs_dataset_cmp() working normally so far, just because all the
callers only give datasets of type ZFS_TYPE_FILESYSTEM to it, we
compared their mountpoint and return, luckily.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4919
2016-09-09 13:21:09 -07:00
GeLiXin c23686524f Fix incorrect pool state after import
Import a raidz pool which has a vdev with a bad label, zpool status
shows the right state of the dev, but the wrong state of the pool.
The pool state should be DEGRADED, not ONLINE.

We examine the label in vdev_validate while in spa_load_impl, the bad
label can be detected but doesn't propagate its state to the parent.
There are other chances to propagate state in the following vdev_load
if we failed to load DTL, but our pool is raidz1 which can tolerate a
faulted disk.  So we lost the last chance to correct the pool state.

Propagate the leaf vdev's state to parent if its label was corrupted,
as is done elsewhere in vdev_validate.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #4948
2016-09-09 13:21:09 -07:00
GeLiXin 74acdfc682 Fix self-healing IO prior to dsl_pool_init() completion
Async writes triggered by a self-healing IO may be issued before the
pool finishes the process of initialization.  This results in a NULL
dereference of `spa->spa_dsl_pool` in vdev_queue_max_async_writes().

George Wilson recommended addressing this issue by initializing the
passed `dsl_pool_t **` prior to dmu_objset_open_impl().  Since the
caller is passing the `spa->spa_dsl_pool` this has the effect of
ensuring it's initialized.

However, since this depends on the caller knowing they must pass
the `spa->spa_dsl_pool` an additional NULL check was added to
vdev_queue_max_async_writes().  This guards against any future
restructuring of the code which might result in dsl_pool_init()
being called differently.

Signed-off-by: GeLiXin <47034221@qq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4652
2016-09-09 13:21:09 -07:00
Paul Dagnelie d9e1eec9a2 OpenZFS 6876 - Stack corruption after importing a pool with a too-long name
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking
for trouble. We should check every dataset on import, using a 1024 byte
buffer and checking each time to see if the dataset's new name is longer
than 256 bytes.

OpenZFS-issue: https://www.illumos.org/issues/6876
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ca8674e
2016-09-09 13:21:09 -07:00
Matthew Ahrens 1421562a0d OpenZFS 7263 - deeply nested nvlist can overflow stack
nvlist_pack() and nvlist_unpack are implemented recursively, which can
cause the stack to overflow with a deeply nested nvlist; i.e. an nvlist
which contains an nvlist, which contains an nvlist, which...

Unprivileged users can pass an nvlist to the kernel via certain ioctls
on /dev/zfs, which the kernel will unpack without additional permission
checking or validation. Therefore, an unprivileged user can cause the
kernel's stack to overflow and panic.

Ideally, these functions would be implemented non-recursively. As a
quick fix, this patch limits the depth of the recursion and returns an
error when attempting to pack and unpack a deeply-nested nvlist.

Signed-off-by: Adam Leventhal <ahl@delphix.com>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Prakash Surya <prakash.surya@delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/7263
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0511d6d

-
2016-09-09 13:21:09 -07:00
Chunwei Chen 58000c3ec7 Fix dbuf_stats_hash_table_data race
Dropping DBUF_HASH_MUTEX when walking the hash list is unsafe. The dbuf
can be freed at any time.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4846
2016-09-09 13:21:09 -07:00
Tim Chase e871059bc4 Prevent null dereferences when accessing dbuf kstat
In arc_buf_info(), the arc_buf_t may have no header.  If not, don't try
to fetch the arc buffer stats and instead just zero them.

The null dereferences were observed while accessing the dbuf kstat with
awk on a system in which millions of small files were being created in
order to overflow the system's metadata limit.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4837
2016-09-09 13:21:09 -07:00
Chunwei Chen 91f81c42f0 fh_to_dentry should return ESTALE when generation mismatch
When generation mismatch, it usually means the file pointed by the file handle
was deleted. We should return ESTALE to indicate this. We return ENOENT in
zfs_vget since zpl_fh_to_dentry will convert it to ESTALE.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4828
2016-09-09 13:21:09 -07:00
Chunwei Chen 2ab9247411 Don't allow accessing XATTR via export handle
Allow accessing XATTR through export handle is a very bad idea. It
would allow user to write whatever they want in fields where they
otherwise could not.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4828
2016-09-09 13:21:09 -07:00
Chunwei Chen af4e50750b Fix out-of-bound access in zfs_fillpage
The original code will do an out-of-bound access on pl[] during last
iteration.

 ==================================================================
 BUG: KASAN: stack-out-of-bounds in zfs_getpage+0x14c/0x2d0 [zfs]
 Read of size 8 by task tmpfile/7850
 page:ffffea00017c6dc0 count:0 mapcount:0 mapping:          (null) index:0x0
 flags: 0xffff8000000000()
 page dumped because: kasan: bad access detected
 CPU: 3 PID: 7850 Comm: tmpfile Tainted: G           OE   4.6.0+ #3
  ffff88005f1b7678 0000000006dbe035 ffff88005f1b7508 ffffffff81635618
  ffff88005f1b7678 ffff88005f1b75a0 ffff88005f1b7590 ffffffff81313ee8
  ffffea0001ae8dd0 ffff88005f1b7670 0000000000000246 0000000041b58ab3
 Call Trace:
  [<ffffffff81635618>] dump_stack+0x63/0x8b
  [<ffffffff81313ee8>] kasan_report_error+0x528/0x560
  [<ffffffff81278f20>] ? filemap_map_pages+0x5f0/0x5f0
  [<ffffffff813144b8>] kasan_report+0x58/0x60
  [<ffffffffc12250dc>] ? zfs_getpage+0x14c/0x2d0 [zfs]
  [<ffffffff81312e4e>] __asan_load8+0x5e/0x70
  [<ffffffffc12250dc>] zfs_getpage+0x14c/0x2d0 [zfs]
  [<ffffffffc1252131>] zpl_readpage+0xd1/0x180 [zfs]

  [<ffffffff81353c3a>] SyS_execve+0x3a/0x50
  [<ffffffff810058ef>] do_syscall_64+0xef/0x180
  [<ffffffff81d0ee25>] entry_SYSCALL64_slow_path+0x25/0x25
 Memory state around the buggy address:
  ffff88005f1b7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff88005f1b7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff88005f1b7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
                                                                 ^
  ffff88005f1b7680: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  ffff88005f1b7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ==================================================================

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4705
Issue #4708
2016-09-09 13:21:09 -07:00
Chunwei Chen 3602878ff7 Fix memleak in zpl_parse_options
strsep() will advance tmp_mntopts, and will change it to NULL on last
iteration.  This will cause strfree(tmp_mntopts) to not free anything.

unreferenced object 0xffff8800883976c0 (size 64):
  comm "mount.zfs", pid 3361, jiffies 4294931877 (age 1482.408s)
  hex dump (first 32 bytes):
    72 77 00 73 74 72 69 63 74 61 74 69 6d 65 00 7a  rw.strictatime.z
    66 73 75 74 69 6c 00 6d 6e 74 70 6f 69 6e 74 3d  fsutil.mntpoint=
  backtrace:
    [<ffffffff81810c4e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811f9cac>] __kmalloc+0x16c/0x250
    [<ffffffffc065ce9b>] strdup+0x3b/0x60 [spl]
    [<ffffffffc080fad6>] zpl_parse_options+0x56/0x300 [zfs]
    [<ffffffffc080fe46>] zpl_mount+0x36/0x80 [zfs]
    [<ffffffff81222dc8>] mount_fs+0x38/0x160
    [<ffffffff81240097>] vfs_kern_mount+0x67/0x110
    [<ffffffff812428e0>] do_mount+0x250/0xe20
    [<ffffffff812437d5>] SyS_mount+0x95/0xe0
    [<ffffffff8181aff6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4706
Issue #4708
2016-09-09 13:21:09 -07:00
Chunwei Chen 9f5f758d77 Fix arc_prune_task use-after-free
arc_prune_task uses a refcount to protect arc_prune_t, but it doesn't prevent
the underlying zsb from disappearing if there's a concurrent umount. We fix
this by force the caller of arc_remove_prune_callback to wait for
arc_prune_taskq to finish.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4687
Closes #4690
2016-09-09 13:21:09 -07:00
Chunwei Chen d5b0e7fcf1 Fix get_zfs_sb race with concurrent umount
Certain ioctl operations will call get_zfs_sb, which will holds an active
count on sb without checking whether it's active or not. This will result
in use-after-free. We fix this by using atomic_inc_not_zero to make sure
we got an active sb.

P1                                          P2
---                                         ---
deactivate_locked_super(): s_active = 0
                                            zfs_sb_hold()
                                            ->get_zfs_sb(): s_active = 1
->zpl_kill_sb()
-->zpl_put_super()
--->zfs_umount()
---->zfs_sb_free(zsb)
                                            zfs_sb_rele(zsb)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-09-09 13:21:09 -07:00
Chunwei Chen ec9b8fae06 Kill zp->z_xattr_parent to prevent pinning
zp->z_xattr_parent will pin the parent. This will cause huge issue
when unlink a file with xattr. Because the unlinked file is pinned, it
will never get purged immediately. And because of that, the xattr
stuff will never be marked as unlinked. So the whole unlinked stuff
will stay there until shrink cache or umount.

This change partially reverts e89260a.  This is safe because only the
zp->z_xattr_parent optimization is removed, zpl_xattr_security_init()
is still called from the zpl outside the inode lock.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #4359
Issue #3508
Issue #4413
Issue #4827
2016-09-09 13:21:09 -07:00
Chunwei Chen f7923f4ada xattr dir doesn't get purged during iput
We need to set inode->i_nlink to zero so iput will purge it. Without this, it
will get purged during shrink cache or umount, which would likely result in
deadlock due to zfs_zget waiting forever on its children which are in the
dispose_list of the same thread.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #4359
Issue #3508
Issue #4413
Issue #4827
2016-09-09 13:21:09 -07:00
Ned Bass 5acbedbbe8 Add ZIO_CHECKSUM_IS_ZERO
The ZIO_CHECKSUM_IS_ZERO macro was added in master commit:

37f8a88 Illumos 5746 - more checksumming in zfs send

That whole patch is not suitable for the release branch
but some other backported patches on that macro.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-09-09 13:21:09 -07:00
Rich Ercolani 3a8e13688b Add tunable to ignore hole_birth (enabled by default)
Adds a module option which disables the hole_birth optimization
which has been responsible for several recent bugs, including
issue #4050.

Original-patch: https://gist.github.com/pcd1193182/2c0cd47211f3aee623958b4698836c48
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4833
2016-09-09 13:20:54 -07:00
Peng 4f96e68fad Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z
The following scenario can result in garbage in the dn_spill field.
The db->db_blkptr must be set to NULL when DNODE_FLAG_SPILL_BLKPTR
is clear to ensure the dn_spill field is cleared.

Current txg = A.
* A new spill buffer is created. Its dbuf is initialized with
  db_blkptr = NULL and it's dirtied.

Current txg = B.
* The spill buffer is modified. It's marked as dirty in this txg.
* Additional changes make the spill buffer unnecessary because the
  xattr fits into the bonus buffer, so it's removed. The dbuf is
  undirtied in this txg, but it's still referenced and cannot be
  destroyed.

Current txg = C.
* Starts syncing of txg A
* dbuf_sync_leaf() is called for the spill buffer. Since db_blkptr
  is NULL, dbuf_check_blkptr() is called.
* The dbuf starts being written and it reaches the ready state
  (not done yet).
* A new change makes the spill buffer necessary again.
  sa_build_layouts() ends up calling dbuf_find() to locate the
  dbuf.  It finds the old dbuf because it has not been destroyed yet
  (it will be destroyed when the previous write is done and there
  are no more references). The old dbuf has db_blkptr != NULL.
* txg A write is complete and the dbuf released. However it's still
  referenced, so it's not destroyed.

Current txg = D.
* Starts syncing of txg B
* dbuf_sync_leaf() is called for the bonus buffer. Its contents are
  directly copied into the dnode, overwriting the blkptr area because,
  in txg B, the bonus buffer was big enough to hold the entire xattr.
* At this point, the db_blkptr of the spill buffer used in txg C
  gets corrupted.

Signed-off-by: Peng <peng.hse@xtaotech.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3937
2016-09-05 16:07:09 -07:00
Chunwei Chen a77cea5f0f Fix Large kmem_alloc in vdev_metaslab_init
This allocation can go way over 1MB, so we should use vmem_alloc
instead of kmem_alloc.

  Large kmem_alloc(1430784, 0x1000), please file an issue...
  Call Trace:
   [<ffffffffa0324aff>] ? spl_kmem_zalloc+0xef/0x160 [spl]
   [<ffffffffa17d0c8d>] ? vdev_metaslab_init+0x9d/0x1f0 [zfs]
   [<ffffffffa17d46d0>] ? vdev_load+0xc0/0xd0 [zfs]
   [<ffffffffa17d4643>] ? vdev_load+0x33/0xd0 [zfs]
   [<ffffffffa17c0004>] ? spa_load+0xfc4/0x1b60 [zfs]
   [<ffffffffa17c1838>] ? spa_tryimport+0x98/0x430 [zfs]
   [<ffffffffa17f28b1>] ? zfs_ioc_pool_tryimport+0x41/0x80 [zfs]
   [<ffffffffa17f5669>] ? zfsdev_ioctl+0x4a9/0x4e0 [zfs]
   [<ffffffff811bacdf>] ? do_vfs_ioctl+0x2cf/0x4b0
   [<ffffffff811baf41>] ? SyS_ioctl+0x81/0xa0

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4752
2016-09-05 16:07:09 -07:00
Tim Chase db3f5edcf1 Linux 4.6 compat: Fall back to d_prune_aliases() if necessary
As of 4.6, the icache and dcache LRUs are memcg aware insofar as the
kernel's per-superblock shrinker is concerned.  The effect is that dcache
or icache entries added by a task in a non-root memcg won't be scanned
by the shrinker in the context of the root (or NULL) memcg.  This defeats
the attempts by zfs_sb_prune() to unpin buffers and can allow metadata to
grow uncontrollably.  This patch reverts to the d_prune_aliaes() method
in case the kernel's per-superblock shrinker is not able to free anything.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes: #4726
2016-09-05 16:07:09 -07:00
Brian Behlendorf 6ae855d8a7 Systemd configuration fixes
* Disable zfs-import-scan.service by default.  This ensures that
pools will not be automatically imported unless they appear in
the cache file.  When this service is explicitly enabled pools
will be imported with the "cachefile=none" property set.  This
prevents the creation of, or update to, an existing cache file.

    $ systemctl list-unit-files | grep zfs
    zfs-import-cache.service                  enabled
    zfs-import-scan.service                   disabled
    zfs-mount.service                         enabled
    zfs-share.service                         enabled
    zfs-zed.service                           enabled
    zfs.target                                enabled

* Change services to dynamic from static by adding an [Install]
section and adding 'WantedBy' tags in favor of 'Requires' tags.
This allows for easier customization of the boot behavior.

* Start the zfs-import-cache.service after the root pivot so
the cache file is available in the standard location.

* Start the zfs-mount.service after the systemd-remount-fs.service
to ensure the root fs is writeable and the ZFS filesystems can
create their mount points.

* Change the default behavior to only load the ZFS kernel modules
in zfs-import-*.service or when blkid(8) detects a pool.  Users
who wish to unconditionally load the kernel modules must uncomment
the list of modules in /lib/modules-load.d/zfs.conf.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4325
Closes #4496
Closes #4658
Closes #4699
2016-09-05 16:07:09 -07:00
Grischa Zengel ff2a2b208d Add nfs-kernel-server for Debian
Debian based systems use nfs-kernel-server as the service name.
List both nfs-server.service and nfs-kernel-server.service so
this service will work on multiple distributions.

Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4350
2016-09-05 16:07:09 -07:00
Turbo Fredriksson 1db6030de2 Rename 'zed.service' to 'zfs-zed.service'
For consistency all systemd unit files and init scripts now share
the same names.  This prevents an issue where the zed is started
twice on systems where both the systemd and sysv infrastructure is
installed concurrently.

For backward compatibility a 'zed' alias has been added.  This
allows the user to interact with the service using either the
name 'zed' or 'zfs-zed'.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3837
2016-09-05 16:07:09 -07:00
Chunwei Chen f3f0c589c3 Skip ctldir znode in zfs_rezget to fix snapdir issues
Skip ctldir in zfs_rezget, otherwise they will always get invalidated. This
will cause funny behaviour for the mounted snapdirs. Especially for
Linux >= 3.18, d_invalidate will detach the mountpoint and prevent anyone
automount it again as long as someone is still using the detached mount.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4514
Closes #4661
Closes #4672
2016-09-05 16:07:08 -07:00
Chunwei Chen 26e2bfa770 Linux 4.7 compat: fix zpl_get_acl returns invalid acl pointer
Starting from Linux 4.7, get_acl will set acl cache pointer to temporary
sentinel value before calling i_op->get_acl. Therefore we can't compare
against ACL_NOT_CACHED and return.

Since from Linux 3.14, get_acl already check the cache for us, so we
disable this in zpl_get_acl.

Linux 4.7 also does set_cached_acl for us so we disable it in zpl_get_acl.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4944
Closes #4946
2016-09-05 16:07:08 -07:00
Brian Behlendorf 97a1bbd4ea Retire HAVE_CURRENT_UMASK and HAVE_POSIX_ACL_CACHING
Remove ZFS_AC_KERNEL_CURRENT_UMASK and ZFS_AC_KERNEL_POSIX_ACL_CACHING
configure checks, all supported kernel provide this functionality.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4922
2016-09-05 16:07:08 -07:00
Brian Behlendorf 01d9981349 Linux 4.7 compat: handler->set() takes both dentry and inode
Counterpart to fd4c7b7, the same approach was taken to resolve
the compatibility issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4717
Issue #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 7043281906 Linux 4.7 compat: use iterate_shared for concurrent readdir
Register iterate_shared if it exists so the kernel will used shared
lock and allowing concurrent readdir.

Also, use shared lock when doing llseek with SEEK_DATA or SEEK_HOLE
to allow concurrent seeking.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4664
Closes #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 1aff4bb235 Linux 4.7 compat: replace blk_queue_flush with blk_queue_write_cache
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 55b8857346 Linux 4.7 compat: handler->get() takes both dentry and inode
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4665
2016-09-05 16:07:08 -07:00
Chunwei Chen 703c9f5893 Remove dummy znode from zvol_state
struct zvol_state contains a dummy znode, which is around 1KB on x64,
only for zfs_range_lock. But in reality, other than z_range_lock and
z_range_avl, zfs_range_lock only need znode on regular file, which
means we add 1KB on a structure and gain nothing.

In this patch, we remove the dummy znode for zvol_state. In order to
do that, we also need to refactor zfs_range_lock a bit. We move
z_range_lock and z_range_avl pair out of znode_t to form zfs_rlock_t.
This new struct replaces znode_t as the main handle inside the range
lock functions.

We also add pointers to z_size, z_blksz, and z_max_blksz so range lock
code doesn't depend on znode_t.  This allows non-ZPL consumers like
Lustre to use the range locks with their equivalent znode_t structure.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4510
2016-09-05 16:07:08 -07:00
Matthew Ahrens 2ea36ad824 Illumos 4953, 4954, 4955
4953 zfs rename <snapshot> need not involve libshare
4954 "zfs create" need not involve libshare if we are not sharing
4955 libshare's get_zfs_dataset need not sort the datasets
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4953
  https://www.illumos.org/issues/4954
  https://www.illumos.org/issues/4955
  https://github.com/illumos/illumos-gate/commit/33cde0d

Porting notes:
- Dropped qsort libshare_zfs.c hunk, no equivalent ZoL code.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4219
2016-09-05 16:07:08 -07:00
Brian Behlendorf efde19487c Fix ztest truncated cache file
Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
truncate and overwrite rather than rename the cache file.  This is
the correct fix but it should have only been applied for the kernel
build.  In user space rename(2) is needed because ztest depends on
the cache file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4129
2016-09-05 16:07:08 -07:00
AndCycle 347cdb6e61 Obey arc_meta_limit default size when changing arc_max
When decreasing the maximum ARC size preserve the 3/4 default
ratio for the arc_meta_limit.  Otherwise, the arc_meta_limit
may be set the same as arc_max.

Signed-off-by: AndCycle <andcycle@andcycle.idv.tw>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4001
2016-09-05 16:07:08 -07:00
Brian Behlendorf 2aec0bf4c5 Add `make lint` target
Add a `make lint` target which maps to a cppcheck target.  As with
the shellcheck target it will only run when cppcheck is installed.
This allows a `make lint` build check to be incrementally added to
the automated testing for distribution which provide cppcheck.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4915
2016-09-05 16:07:08 -07:00
Marcel Huber 9a6fcfd47d Fixes bug in fix_paths()
Fixes bug introduced in commit 7d90f569a.  Hinted by gcc:

libzfs_import.c: In function ‘fix_paths’:
libzfs_import.c:602:28: warning: self-comparison always evaluates to true [-Wtautological-compare]
    if (best->ne_num_labels == best->ne_num_labels &&

Signed-off-by: Marcel Huber <marcelhuberfoo@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4632
2016-09-05 16:07:08 -07:00
Ned Bass d30abebc85 Prepare to tag zfs-0.6.5.7
META file and release log updated.

Signed-off-by: Ned Bass <bass6@llnl.gov>
2016-05-12 19:35:49 -07:00
Tim Chase 52475b507a Enable PF_FSTRANS for ioctl secpolicy callbacks (#4571)
At the very least, the zfs_secpolicy_write_perms ioctl security policy
callback, which calls dsl_dataset_hold(), can require freeing memory and,
therefore, re-enter ZFS.  This patch enables PF_FSTRANS for all of the
security policy callbacks similarly to the manner in which it's enabled
for the actual ioctl callback.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4554
2016-05-06 18:22:34 -04:00
Brian Behlendorf 2cb77346cb Use udev for partition detection
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate think link resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it in fully initialized the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, if the rare event it is needed
it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4523
Closes #3708
Closes #4077
Closes #4144
Closes #4214
Closes #4517
2016-05-06 18:22:34 -04:00
Brian Behlendorf c9ca152fd1 Fix 'zpool import' blkid device names
When importing a pool using the blkid cache only the device
node path was added to the list of known paths for a device.
This results in 'zpool import' always using the sdX names
in preference to the 'path' name stored in the label.

To fix the issue the blkid import path has been updated to
add both the 'path', 'devid', and 'devname' names from the
label to the known paths.  A sanity check is done to ensure
these paths do refer to the same device identified by blkid.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4523
Closes #3043
2016-05-06 18:22:34 -04:00
Chunwei Chen 21ea9460fa Remove wrong ASSERT in annotate_ecksum
When using large blocks like 1M, there will be more than UINT16_MAX qwords in
one block, so this ASSERT would go off. Also, it is possible for the histogram
to overflow. We cap them to UINT16_MAX to prevent this.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4257
2016-05-01 18:29:51 -04:00
Colin Ian King 354424de5a Add support 32 bit FS_IOC32_{GET|SET}FLAGS compat ioctls
We need 32 bit userspace FS_IOC32_GETFLAGS and FS_IOC32_SETFLAGS
compat ioctls for systems such as powerpc64.  We use the normal
compat ioctl idiom as used by a variety of file systems to provide
this support.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4477
2016-05-01 18:26:09 -04:00
Brian Behlendorf d746e2ea0e Linux 4.6 compat: PAGE_CACHE_SIZE removal
As described in torvalds/linux@4a2d057e the macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were originally introduced
to make it possible to add bigger chunks to the page cache.  This
never panned out and it has therefore been removed from the kernel.

ZFS has been updated to use the PAGE_{SIZE,SHIFT,MASK,ALIGN} macros
and calls to page_cache_release() have been replaced with put_page().

There was no need to introduce a configure check for this because
these interfaces have existed for a very long time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4489
2016-05-01 18:24:30 -04:00
Colin Ian King 60a4ea3f94 Fix inverted logic on none elevator comparison
Commit d1d7e2689d ("cstyle: Resolve C style issues") inverted
the logic on the none elevator comparison.  Fix this and make it
cstyle warning clean.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4507
2016-05-01 18:23:29 -04:00
Brian Behlendorf fbffa53a5c Create unique partition labels
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that might
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4517
2016-05-01 18:20:32 -04:00
Ned Bass 6400ae85ee Fix ZPL miswrite of default POSIX ACL
Commit 4967a3e introduced a typo that caused the ZPL to store the
intended default ACL as an access ACL. Due to caching this problem
may not become visible until the filesystem is remounted or the inode
is evicted from the cache. Fix the typo.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4520
2016-05-01 18:19:54 -04:00
Chunwei Chen 31dbe4b404 Linux 4.5 compat: Use xattr_handler->name for acl
Linux 4.5 added member "name" to xattr_handler. xattr_handler which matches to
whole name rather than prefix should use "name" instead of "prefix".
Otherwise, kernel will return with EINVAL when it tries to resolve handlers.

Also, we remove the strcmp checks when xattr_handler has name, because
xattr_resolve_name will do the check for us.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4549
Closes #4537
2016-05-01 18:17:57 -04:00
Brian Behlendorf a64fb11bf3 Fix user namespaces uid/gid mapping
As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
not the current user_ns, should be passed to posix_acl_from_xattr()
and posix_acl_to_xattr().  Conveniently the init_user_ns is
available through the init credential (kcred).

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4177
2016-05-01 18:17:07 -04:00
Gordan Bobic 5e202e55ef Fix aarch64 compilation
sys/param.h depends on types defined in sys/types.h
(hrtime_t & timestruc_t).

Signed-off-by: Gordan Bobic <gordan@redsleeve.org>
Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4420
2016-03-24 09:22:05 -07:00
104 changed files with 3084 additions and 1425 deletions

2
META
View File

@ -1,7 +1,7 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 0.6.5.6
Version: 0.6.5.11
Release: 1
Release-Tags: relext
License: CDDL

View File

@ -55,6 +55,13 @@ shellcheck:
done; \
fi
lint: cppcheck
cppcheck:
@if type cppcheck > /dev/null 2>&1; then \
cppcheck --quiet --force ${top_srcdir}; \
fi
ctags:
$(RM) tags
find $(top_srcdir) -name .git -prune -o -name '*.[hc]' | xargs ctags

View File

@ -77,7 +77,10 @@ static const option_map_t option_map[] = {
{ MNTOPT_RELATIME, MS_RELATIME, ZS_COMMENT },
#endif
#ifdef MS_STRICTATIME
{ MNTOPT_DFRATIME, MS_STRICTATIME, ZS_COMMENT },
{ MNTOPT_STRICTATIME, MS_STRICTATIME, ZS_COMMENT },
#endif
#ifdef MS_LAZYTIME
{ MNTOPT_LAZYTIME, MS_LAZYTIME, ZS_COMMENT },
#endif
{ MNTOPT_CONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_FSCONTEXT, MS_COMMENT, ZS_COMMENT },
@ -605,10 +608,23 @@ main(int argc, char **argv)
"failed for unknown reason.\n"), dataset);
}
return (MOUNT_SYSERR);
#ifdef MS_MANDLOCK
case EPERM:
if (mntflags & MS_MANDLOCK) {
(void) fprintf(stderr, gettext("filesystem "
"'%s' has the 'nbmand=on' property set, "
"this mount\noption may be disabled in "
"your kernel. Use 'zfs set nbmand=off'\n"
"to disable this option and try to "
"mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR);
}
/* fallthru */
#endif
default:
(void) fprintf(stderr, gettext("filesystem "
"'%s' can not be mounted due to error "
"%d\n"), dataset, errno);
"'%s' can not be mounted: %s\n"), dataset,
strerror(errno));
return (MOUNT_USAGE);
}
}

View File

@ -184,9 +184,9 @@ sas_handler() {
return
fi
# Get the raw scsi device name from multipath -l. Strip off
# Get the raw scsi device name from multipath -ll. Strip off
# leading pipe symbols to make field numbering consistent.
DEV=`multipath -l $DM_NAME |
DEV=`multipath -ll $DM_NAME |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
if [ -z "$DEV" ] ; then
return

View File

@ -67,13 +67,22 @@
zio_compress_table[(idx)].ci_name : "UNKNOWN")
#define ZDB_CHECKSUM_NAME(idx) ((idx) < ZIO_CHECKSUM_FUNCTIONS ? \
zio_checksum_table[(idx)].ci_name : "UNKNOWN")
#define ZDB_OT_NAME(idx) ((idx) < DMU_OT_NUMTYPES ? \
dmu_ot[(idx)].ot_name : DMU_OT_IS_VALID(idx) ? \
dmu_ot_byteswap[DMU_OT_BYTESWAP(idx)].ob_name : "UNKNOWN")
#define ZDB_OT_TYPE(idx) ((idx) < DMU_OT_NUMTYPES ? (idx) : \
(((idx) == DMU_OTN_ZAP_DATA || (idx) == DMU_OTN_ZAP_METADATA) ? \
DMU_OT_ZAP_OTHER : DMU_OT_NUMTYPES))
static char *
zdb_ot_name(dmu_object_type_t type)
{
if (type < DMU_OT_NUMTYPES)
return (dmu_ot[type].ot_name);
else if ((type & DMU_OT_NEWTYPE) &&
((type & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS))
return (dmu_ot_byteswap[type & DMU_OT_BYTESWAP_MASK].ob_name);
else
return ("UNKNOWN");
}
#ifndef lint
extern int zfs_recover;
extern uint64_t zfs_arc_max, zfs_arc_meta_limit;
@ -469,7 +478,7 @@ static void
dump_bpobj_subobjs(objset_t *os, uint64_t object, void *data, size_t size)
{
dmu_object_info_t doi;
uint64_t i;
int64_t i;
VERIFY0(dmu_object_info(os, object, &doi));
uint64_t *subobjs = kmem_alloc(doi.doi_max_offset, KM_SLEEP);
@ -488,7 +497,7 @@ dump_bpobj_subobjs(objset_t *os, uint64_t object, void *data, size_t size)
}
for (i = 0; i <= last_nonzero; i++) {
(void) printf("\t%llu\n", (longlong_t)subobjs[i]);
(void) printf("\t%llu\n", (u_longlong_t)subobjs[i]);
}
kmem_free(subobjs, doi.doi_max_offset);
}
@ -1925,12 +1934,12 @@ dump_object(objset_t *os, uint64_t object, int verbosity, int *print_header)
(void) printf("%10lld %3u %5s %5s %5s %5s %6s %s%s\n",
(u_longlong_t)object, doi.doi_indirection, iblk, dblk,
asize, lsize, fill, ZDB_OT_NAME(doi.doi_type), aux);
asize, lsize, fill, zdb_ot_name(doi.doi_type), aux);
if (doi.doi_bonus_type != DMU_OT_NONE && verbosity > 3) {
(void) printf("%10s %3s %5s %5s %5s %5s %6s %s\n",
"", "", "", "", "", bonus_size, "bonus",
ZDB_OT_NAME(doi.doi_bonus_type));
zdb_ot_name(doi.doi_bonus_type));
}
if (verbosity >= 4) {

View File

@ -444,13 +444,13 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
/*
* If we're recursive, then we always allow filesystems as
* arguments. If we also are interested in snapshots, then we
* can take volumes as well.
* arguments. If we also are interested in snapshots or
* bookmarks, then we can take volumes as well.
*/
argtype = types;
if (flags & ZFS_ITER_RECURSE) {
argtype |= ZFS_TYPE_FILESYSTEM;
if (types & ZFS_TYPE_SNAPSHOT)
if (types & (ZFS_TYPE_SNAPSHOT | ZFS_TYPE_BOOKMARK))
argtype |= ZFS_TYPE_VOLUME;
}

View File

@ -6038,6 +6038,7 @@ main(int argc, char **argv)
(void) setlocale(LC_ALL, "");
(void) textdomain(TEXT_DOMAIN);
srand(time(NULL));
dprintf_setup(&argc, argv);

View File

@ -1206,12 +1206,10 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
/*
* Remove any previously existing symlink from a udev path to
* the device before labeling the disk. This makes
* zpool_label_disk_wait() truly wait for the new link to show
* up instead of returning if it finds an old link still in
* place. Otherwise there is a window between when udev
* deletes and recreates the link during which access attempts
* will fail with ENOENT.
* the device before labeling the disk. This ensures that
* only newly created links are used. Otherwise there is a
* window between when udev deletes and recreates the link
* during which access attempts will fail with ENOENT.
*/
strncpy(udevpath, path, MAXPATHLEN);
(void) zfs_append_partition(udevpath, MAXPATHLEN);
@ -1235,6 +1233,8 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
* and then block until udev creates the new link.
*/
if (!is_exclusive || !is_spare(NULL, udevpath)) {
char *devnode = strrchr(devpath, '/') + 1;
ret = strncmp(udevpath, UDISK_ROOT, strlen(UDISK_ROOT));
if (ret == 0) {
ret = lstat64(udevpath, &statbuf);
@ -1242,18 +1242,29 @@ make_disks(zpool_handle_t *zhp, nvlist_t *nv)
(void) unlink(udevpath);
}
if (zpool_label_disk(g_zfs, zhp,
strrchr(devpath, '/') + 1) == -1)
/*
* When labeling a pool the raw device node name
* is provided as it appears under /dev/.
*/
if (zpool_label_disk(g_zfs, zhp, devnode) == -1)
return (-1);
/*
* Wait for udev to signal the device is available
* by the provided path.
*/
ret = zpool_label_disk_wait(udevpath, DISK_LABEL_WAIT);
if (ret) {
(void) fprintf(stderr, gettext("cannot "
"resolve path '%s': %d\n"), udevpath, ret);
return (-1);
(void) fprintf(stderr,
gettext("missing link: %s was "
"partitioned but %s is missing\n"),
devnode, udevpath);
return (ret);
}
(void) zero_label(udevpath);
ret = zero_label(udevpath);
if (ret)
return (ret);
}
/*

View File

@ -1,6 +1,8 @@
include $(top_srcdir)/config/Rules.am
AM_CFLAGS += $(DEBUG_STACKFLAGS) $(FRAME_LARGER_THAN)
# -Wnoformat-truncation to get rid of compiler warning for unchecked
# truncating snprintfs on gcc 7.1.1.
AM_CFLAGS += $(DEBUG_STACKFLAGS) $(FRAME_LARGER_THAN) $(NO_FORMAT_TRUNCATION)
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \

View File

@ -7,7 +7,8 @@ AM_CFLAGS += ${NO_BOOL_COMPARE}
AM_CFLAGS += -fno-strict-aliasing
AM_CPPFLAGS = -D_GNU_SOURCE -D__EXTENSIONS__ -D_REENTRANT
AM_CPPFLAGS += -D_POSIX_PTHREAD_SEMANTICS -D_FILE_OFFSET_BITS=64
AM_CPPFLAGS += -D_LARGEFILE64_SOURCE -DTEXT_DOMAIN=\"zfs-linux-user\"
AM_CPPFLAGS += -D_LARGEFILE64_SOURCE -DHAVE_LARGE_STACKS=1
AM_CPPFLAGS += -DTEXT_DOMAIN=\"zfs-linux-user\"
AM_CPPFLAGS += -DLIBEXECDIR=\"$(libexecdir)\"
AM_CPPFLAGS += -DRUNSTATEDIR=\"$(runstatedir)\"
AM_CPPFLAGS += -DSBINDIR=\"$(sbindir)\"

View File

@ -39,6 +39,35 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_RELEASE], [
])
])
dnl #
dnl # 3.14 API change,
dnl # set_cached_acl() and forget_cached_acl() changed from inline to
dnl # EXPORT_SYMBOL. In the former case, they may not be usable because of
dnl # posix_acl_release. In the latter case, we can always use them.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE], [
AC_MSG_CHECKING([whether set_cached_acl() is usable])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/cred.h>
#include <linux/fs.h>
#include <linux/posix_acl.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct inode *ip = NULL;
struct posix_acl *acl = posix_acl_alloc(1, 0);
set_cached_acl(ip, ACL_TYPE_ACCESS, acl);
forget_cached_acl(ip, ACL_TYPE_ACCESS);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SET_CACHED_ACL_USABLE, 1,
[posix_acl_release() is usable])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 3.1 API change,
dnl # posix_acl_chmod_masq() is not exported anymore and posix_acl_chmod()
@ -75,27 +104,6 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_CHMOD], [
])
])
dnl #
dnl # 2.6.30 API change,
dnl # caching of ACL into the inode was added in this version.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_CACHING], [
AC_MSG_CHECKING([whether inode has i_acl and i_default_acl])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
struct inode ino;
ino.i_acl = NULL;
ino.i_default_acl = NULL;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_POSIX_ACL_CACHING, 1,
[inode contains i_acl and i_default_acl])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 3.1 API change,
dnl # posix_acl_equiv_mode now wants an umode_t* instead of a mode_t*
@ -117,6 +125,30 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T], [
])
])
dnl #
dnl # 4.8 API change,
dnl # The function posix_acl_valid now must be passed a namespace.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_VALID_WITH_NS], [
AC_MSG_CHECKING([whether posix_acl_valid() wants user namespace])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
#include <linux/posix_acl.h>
],[
struct user_namespace *user_ns = NULL;
const struct posix_acl *acl = NULL;
int error;
error = posix_acl_valid(user_ns, acl);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_POSIX_ACL_VALID_WITH_NS, 1,
[posix_acl_valid() wants user namespace])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 2.6.27 API change,
dnl # Check if inode_operations contains the function permission
@ -247,18 +279,45 @@ AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_GET_ACL], [
])
dnl #
dnl # 2.6.30 API change,
dnl # current_umask exists only since this version.
dnl # 3.14 API change,
dnl # Check if inode_operations contains the function set_acl
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CURRENT_UMASK], [
AC_MSG_CHECKING([whether current_umask exists])
AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_SET_ACL], [
AC_MSG_CHECKING([whether iops->set_acl() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int set_acl_fn(struct inode *inode, struct posix_acl *acl, int type)
{ return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.set_acl = set_acl_fn,
};
],[
current_umask();
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_CURRENT_UMASK, 1, [current_umask() exists])
AC_DEFINE(HAVE_SET_ACL, 1, [iops->set_acl() exists])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 4.7 API change,
dnl # The kernel get_acl will now check cache before calling i_op->get_acl and
dnl # do set_cached_acl after that, so i_op->get_acl don't need to do that
dnl # anymore.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_GET_ACL_HANDLE_CACHE], [
AC_MSG_CHECKING([whether uncached_acl_sentinel() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
void *sentinel __attribute__ ((unused)) = uncached_acl_sentinel(NULL);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_KERNEL_GET_ACL_HANDLE_CACHE, 1, [uncached_acl_sentinel() exists])
],[
AC_MSG_RESULT(no)
])

View File

@ -0,0 +1,21 @@
dnl #
dnl # Linux 4.9-rc5+ ABI, removal of the .aio_fsync field
dnl #
AC_DEFUN([ZFS_AC_KERNEL_AIO_FSYNC], [
AC_MSG_CHECKING([whether fops->aio_fsync() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
static const struct file_operations
fops __attribute__ ((unused)) = {
.aio_fsync = NULL,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FILE_AIO_FSYNC, 1, [fops->aio_fsync() exists])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -1,38 +0,0 @@
dnl #
dnl # 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
dnl # 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
dnl # 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BDI_SETUP_AND_REGISTER], [
AC_MSG_CHECKING([whether bdi_setup_and_register() wants 2 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_2ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 2 args])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bdi_setup_and_register() wants 3 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
unsigned int cap = BDI_CAP_MAP_COPY;
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name, cap);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_3ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 3 args])
], [
AC_MSG_RESULT(no)
])
])
])

56
config/kernel-bdi.m4 Normal file
View File

@ -0,0 +1,56 @@
dnl #
dnl # 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
dnl # 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
dnl # 4.0 - 4.11, bdi_setup_and_register() takes 2 arguments.
dnl # 4.12 - x.y, super_setup_bdi_name() new interface.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BDI], [
AC_MSG_CHECKING([whether super_setup_bdi_name() exists])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
struct super_block sb;
], [
char *name = "bdi";
int error __attribute__((unused)) =
super_setup_bdi_name(&sb, name);
], [super_setup_bdi_name], [fs/super.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SUPER_SETUP_BDI_NAME, 1,
[super_setup_bdi_name() exits])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether bdi_setup_and_register() wants 2 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_2ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 2 args])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether bdi_setup_and_register() wants 3 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/backing-dev.h>
struct backing_dev_info bdi;
], [
char *name = "bdi";
unsigned int cap = BDI_CAP_MAP_COPY;
int error __attribute__((unused)) =
bdi_setup_and_register(&bdi, name, cap);
], [bdi_setup_and_register], [mm/backing-dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_3ARGS_BDI_SETUP_AND_REGISTER, 1,
[bdi_setup_and_register() wants 3 args])
], [
AC_MSG_RESULT(no)
])
])
])
])

84
config/kernel-bio-op.m4 Normal file
View File

@ -0,0 +1,84 @@
dnl #
dnl # Linux 4.8 API,
dnl #
dnl # The bio_op() helper was introduced as a replacement for explicitly
dnl # checking the bio->bi_rw flags. The following checks are used to
dnl # detect if a specific operation is supported.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_REQ_OP_DISCARD], [
AC_MSG_CHECKING([whether REQ_OP_DISCARD is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blk_types.h>
],[
int op __attribute__ ((unused)) = REQ_OP_DISCARD;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_REQ_OP_DISCARD, 1,
[REQ_OP_DISCARD is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_REQ_OP_SECURE_ERASE], [
AC_MSG_CHECKING([whether REQ_OP_SECURE_ERASE is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blk_types.h>
],[
int op __attribute__ ((unused)) = REQ_OP_SECURE_ERASE;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_REQ_OP_SECURE_ERASE, 1,
[REQ_OP_SECURE_ERASE is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_REQ_OP_FLUSH], [
AC_MSG_CHECKING([whether REQ_OP_FLUSH is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blk_types.h>
],[
int op __attribute__ ((unused)) = REQ_OP_FLUSH;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_REQ_OP_FLUSH, 1,
[REQ_OP_FLUSH is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_BIO_BI_OPF], [
AC_MSG_CHECKING([whether bio->bi_opf is defined])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
],[
struct bio bio __attribute__ ((unused));
bio.bi_opf = 0;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BIO_BI_OPF, 1, [bio->bi_opf is defined])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_HAVE_BIO_SET_OP_ATTRS], [
AC_MSG_CHECKING([whether bio_set_op_attrs is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
],[
struct bio *bio __attribute__ ((unused)) = NULL;
bio_set_op_attrs(bio, 0, 0);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BIO_SET_OP_ATTRS, 1,
[bio_set_op_attrs is available])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -22,25 +22,64 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_FLUSH], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_FLUSH, 1,
[blk_queue_flush() is available])
AC_MSG_CHECKING([whether blk_queue_flush() is GPL-only])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/blkdev.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct request_queue *q = NULL;
(void) blk_queue_flush(q, REQ_FLUSH);
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY, 1,
[blk_queue_flush() is GPL-only])
])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether blk_queue_flush() is GPL-only])
dnl #
dnl # 4.7 API change
dnl # Replace blk_queue_flush with blk_queue_write_cache
dnl #
AC_MSG_CHECKING([whether blk_queue_write_cache() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/blkdev.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct request_queue *q = NULL;
(void) blk_queue_flush(q, REQ_FLUSH);
],[
AC_MSG_RESULT(no)
blk_queue_write_cache(q, true, true);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY, 1,
[blk_queue_flush() is GPL-only])
AC_DEFINE(HAVE_BLK_QUEUE_WRITE_CACHE, 1,
[blk_queue_write_cache() exists])
AC_MSG_CHECKING([whether blk_queue_write_cache() is GPL-only])
ZFS_LINUX_TRY_COMPILE([
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/blkdev.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct request_queue *q = NULL;
blk_queue_write_cache(q, true, true);
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY, 1,
[blk_queue_write_cache() is GPL-only])
])
],[
AC_MSG_RESULT(no)
])
EXTRA_KCFLAGS="$tmp_flags"
])

View File

@ -0,0 +1,44 @@
dnl #
dnl # 2.6.32-2.6.35 API - The BIO_RW_UNPLUG enum can be used as a hint
dnl # to unplug the queue.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BIO_RW_UNPLUG], [
AC_MSG_CHECKING([whether the BIO_RW_UNPLUG enum is available])
tmp_flags="$EXTRA_KCFLAGS"
EXTRA_KCFLAGS="${NO_UNUSED_BUT_SET_VARIABLE}"
ZFS_LINUX_TRY_COMPILE([
#include <linux/blkdev.h>
],[
extern enum bio_rw_flags rw;
rw = BIO_RW_UNPLUG;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_HAVE_BIO_RW_UNPLUG, 1,
[BIO_RW_UNPLUG is available])
],[
AC_MSG_RESULT(no)
])
EXTRA_KCFLAGS="$tmp_flags"
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BLK_PLUG], [
AC_MSG_CHECKING([whether struct blk_plug is available])
tmp_flags="$EXTRA_KCFLAGS"
EXTRA_KCFLAGS="${NO_UNUSED_BUT_SET_VARIABLE}"
ZFS_LINUX_TRY_COMPILE([
#include <linux/blkdev.h>
],[
struct blk_plug plug;
blk_start_plug(&plug);
blk_finish_plug(&plug);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_HAVE_BLK_PLUG, 1,
[struct blk_plug is available])
],[
AC_MSG_RESULT(no)
])
EXTRA_KCFLAGS="$tmp_flags"
])

View File

@ -0,0 +1,19 @@
dnl #
dnl # 4.9, current_time() added
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CURRENT_TIME],
[AC_MSG_CHECKING([whether current_time() exists])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
struct inode ip;
struct timespec now __attribute__ ((unused));
now = current_time(&ip);
], [current_time], [fs/inode.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_CURRENT_TIME, 1, [current_time() exists])
], [
AC_MSG_RESULT(no)
])
])

View File

@ -0,0 +1,22 @@
dnl #
dnl # 4.10 API
dnl #
dnl # NULL inode_operations.readlink implies generic_readlink(), which
dnl # has been made static.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_GENERIC_READLINK_GLOBAL], [
AC_MSG_CHECKING([whether generic_readlink is global])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
int i __attribute__ ((unused));
i = generic_readlink(NULL, NULL, 0);
],[
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_GENERIC_READLINK, 1,
[generic_readlink is global])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -0,0 +1,67 @@
dnl #
dnl # Linux 4.11 API
dnl # See torvalds/linux@a528d35
dnl #
AC_DEFUN([ZFS_AC_PATH_KERNEL_IOPS_GETATTR], [
AC_MSG_CHECKING([whether iops->getattr() takes a path])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int test_getattr(
const struct path *p, struct kstat *k,
u32 request_mask, unsigned int query_flags)
{ return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.getattr = test_getattr,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_PATH_IOPS_GETATTR, 1,
[iops->getattr() takes a path])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 3.9 - 4.10 API
dnl #
AC_DEFUN([ZFS_AC_VFSMOUNT_KERNEL_IOPS_GETATTR], [
AC_MSG_CHECKING([whether iops->getattr() takes a vfsmount])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int test_getattr(
struct vfsmount *mnt, struct dentry *d,
struct kstat *k)
{ return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.getattr = test_getattr,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFSMOUNT_IOPS_GETATTR, 1,
[iops->getattr() takes a vfsmount])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # The interface of the getattr callback from the inode_operations
dnl # structure changed. Also, the interface of the simple_getattr()
dnl # function provided by the kernel changed.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_GETATTR], [
ZFS_AC_PATH_KERNEL_IOPS_GETATTR
ZFS_AC_VFSMOUNT_KERNEL_IOPS_GETATTR
])

View File

@ -1,17 +1,29 @@
dnl #
dnl # 2.6.27 API change
dnl # lookup_bdev() was exported.
dnl # 2.6.27, lookup_bdev() was exported.
dnl # 4.4.0-6.21 - x.y on Ubuntu, lookup_bdev() takes 2 arguments.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_LOOKUP_BDEV],
[AC_MSG_CHECKING([whether lookup_bdev() is available])
[AC_MSG_CHECKING([whether lookup_bdev() wants 1 arg])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
lookup_bdev(NULL);
], [lookup_bdev], [fs/block_dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_LOOKUP_BDEV, 1, [lookup_bdev() is available])
AC_DEFINE(HAVE_1ARG_LOOKUP_BDEV, 1, [lookup_bdev() wants 1 arg])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether lookup_bdev() wants 2 args])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
lookup_bdev(NULL, FMODE_READ);
], [lookup_bdev], [fs/block_dev.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_2ARGS_LOOKUP_BDEV, 1,
[lookup_bdev() wants 2 args])
], [
AC_MSG_RESULT(no)
])
])
])
])

25
config/kernel-rename.m4 Normal file
View File

@ -0,0 +1,25 @@
dnl #
dnl # 4.9 API change,
dnl # iops->rename2() merged into iops->rename(), and iops->rename() now wants
dnl # flags.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_RENAME_WANTS_FLAGS], [
AC_MSG_CHECKING([whether iops->rename() wants flags])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int rename_fn(struct inode *sip, struct dentry *sdp,
struct inode *tip, struct dentry *tdp,
unsigned int flags) { return 0; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.rename = rename_fn,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_RENAME_WANTS_FLAGS, 1, [iops->rename() wants flags])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -0,0 +1,23 @@
dnl #
dnl # 4.9 API change
dnl # The inode_change_ok() function has been renamed setattr_prepare()
dnl # and updated to take a dentry rather than an inode.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SETATTR_PREPARE],
[AC_MSG_CHECKING([whether setattr_prepare() is available])
ZFS_LINUX_TRY_COMPILE_SYMBOL([
#include <linux/fs.h>
], [
struct dentry *dentry = NULL;
struct iattr *attr = NULL;
int error;
error = setattr_prepare(dentry, attr);
], [setattr_prepare], [fs/attr.c], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SETATTR_PREPARE, 1,
[setattr_prepare() is available])
], [
AC_MSG_RESULT(no)
])
])

View File

@ -0,0 +1,20 @@
dnl #
dnl # 4.8 API change
dnl # The rw argument has been removed from submit_bio/submit_bio_wait.
dnl # Callers are now expected to set bio->bi_rw instead of passing it in.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SUBMIT_BIO], [
AC_MSG_CHECKING([whether submit_bio() wants 1 arg])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
],[
blk_qc_t blk_qc;
struct bio *bio = NULL;
blk_qc = submit_bio(bio);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_1ARG_SUBMIT_BIO, 1, [submit_bio() wants 1 arg])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -1,8 +1,8 @@
dnl #
dnl # 3.11 API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_ITERATE], [
AC_MSG_CHECKING([whether fops->iterate() is available])
dnl #
dnl # 4.7 API change
dnl #
AC_MSG_CHECKING([whether fops->iterate_shared() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int iterate(struct file *filp, struct dir_context * context)
@ -10,34 +10,55 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_ITERATE], [
static const struct file_operations fops
__attribute__ ((unused)) = {
.iterate = iterate,
.iterate_shared = iterate,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_ITERATE, 1,
[fops->iterate() is available])
AC_DEFINE(HAVE_VFS_ITERATE_SHARED, 1,
[fops->iterate_shared() is available])
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether fops->readdir() is available])
dnl #
dnl # 3.11 API change
dnl #
AC_MSG_CHECKING([whether fops->iterate() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int readdir(struct file *filp, void *entry, filldir_t func)
int iterate(struct file *filp, struct dir_context * context)
{ return 0; }
static const struct file_operations fops
__attribute__ ((unused)) = {
.readdir = readdir,
.iterate = iterate,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_READDIR, 1,
[fops->readdir() is available])
AC_DEFINE(HAVE_VFS_ITERATE, 1,
[fops->iterate() is available])
],[
AC_MSG_ERROR(no; file a bug report with ZFSOnLinux)
])
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether fops->readdir() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
int readdir(struct file *filp, void *entry, filldir_t func)
{ return 0; }
static const struct file_operations fops
__attribute__ ((unused)) = {
.readdir = readdir,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_READDIR, 1,
[fops->readdir() is available])
],[
AC_MSG_ERROR(no; file a bug report with ZFSOnLinux)
])
])
])
])

View File

@ -1,5 +1,5 @@
dnl #
dnl # Linux 4.1.x API
dnl # Linux 3.16 API
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_RW_ITERATE],
[AC_MSG_CHECKING([whether fops->read/write_iter() are available])
@ -21,6 +21,47 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_RW_ITERATE],
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_RW_ITERATE, 1,
[fops->read/write_iter() are available])
ZFS_AC_KERNEL_NEW_SYNC_READ
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 4.1 API
dnl #
AC_DEFUN([ZFS_AC_KERNEL_NEW_SYNC_READ],
[AC_MSG_CHECKING([whether new_sync_read() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
new_sync_read(NULL, NULL, 0, NULL);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_NEW_SYNC_READ, 1,
[new_sync_read() is available])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 4.1.x API
dnl #
AC_DEFUN([ZFS_AC_KERNEL_GENERIC_WRITE_CHECKS],
[AC_MSG_CHECKING([whether generic_write_checks() takes kiocb])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
],[
struct kiocb *iocb = NULL;
struct iov_iter *iov = NULL;
generic_write_checks(iocb, iov);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_GENERIC_WRITE_CHECKS_KIOCB, 1,
[generic_write_checks() takes kiocb])
],[
AC_MSG_RESULT(no)
])

View File

@ -32,23 +32,72 @@ AC_DEFUN([ZFS_AC_KERNEL_CONST_XATTR_HANDLER], [
])
])
dnl #
dnl # 4.5 API change,
dnl # struct xattr_handler added new member "name".
dnl # xattr_handler which matches to whole name rather than prefix should use
dnl # "name" instead of "prefix", e.g. "system.posix_acl_access"
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_NAME], [
AC_MSG_CHECKING([whether xattr_handler has name])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.name = XATTR_NAME_POSIX_ACL_ACCESS,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_HANDLER_NAME, 1,
[xattr_handler has name])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 4.9 API change,
dnl # iops->{set,get,remove}xattr and generic_{set,get,remove}xattr are
dnl # removed. xattr operations will directly go through sb->s_xattr.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_HAVE_GENERIC_SETXATTR], [
AC_MSG_CHECKING([whether generic_setxattr() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/fs.h>
#include <linux/xattr.h>
static const struct inode_operations
iops __attribute__ ((unused)) = {
.setxattr = generic_setxattr
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_GENERIC_SETXATTR, 1,
[generic_setxattr() exists])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Supported xattr handler get() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
dnl #
dnl # 4.4 API change,
dnl # The xattr_handler->get() callback was changed to take a
dnl # attr_handler, and handler_flags argument was removed and
dnl # should be accessed by handler->flags.
dnl # 4.7 API change,
dnl # The xattr_handler->get() callback was changed to take both
dnl # dentry and inode.
dnl #
AC_MSG_CHECKING([whether xattr_handler->get() wants xattr_handler])
AC_MSG_CHECKING([whether xattr_handler->get() wants both dentry and inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
void *buffer, size_t size) { return 0; }
struct dentry *dentry, struct inode *inode,
const char *name, void *buffer, size_t size) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
@ -56,23 +105,22 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_HANDLER, 1,
[xattr_handler->get() wants xattr_handler])
AC_DEFINE(HAVE_XATTR_GET_DENTRY_INODE, 1,
[xattr_handler->get() wants both dentry and inode])
],[
dnl #
dnl # 2.6.33 API change,
dnl # The xattr_handler->get() callback was changed to take
dnl # a dentry instead of an inode, and a handler_flags
dnl # argument was added.
dnl # 4.4 API change,
dnl # The xattr_handler->get() callback was changed to take a
dnl # attr_handler, and handler_flags argument was removed and
dnl # should be accessed by handler->flags.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether xattr_handler->get() wants dentry])
AC_MSG_CHECKING([whether xattr_handler->get() wants xattr_handler])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int handler_flags)
{ return 0; }
int get(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
void *buffer, size_t size) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
@ -80,20 +128,23 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_DENTRY, 1,
[xattr_handler->get() wants dentry])
AC_DEFINE(HAVE_XATTR_GET_HANDLER, 1,
[xattr_handler->get() wants xattr_handler])
],[
dnl #
dnl # 2.6.32 API
dnl # 2.6.33 API change,
dnl # The xattr_handler->get() callback was changed to take
dnl # a dentry instead of an inode, and a handler_flags
dnl # argument was added.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->get() wants inode])
AC_MSG_CHECKING([whether xattr_handler->get() wants dentry])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(struct inode *ip, const char *name,
void *buffer, size_t size) { return 0; }
int get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int handler_flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
@ -101,10 +152,32 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_GET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_INODE, 1,
[xattr_handler->get() wants inode])
AC_DEFINE(HAVE_XATTR_GET_DENTRY, 1,
[xattr_handler->get() wants dentry])
],[
AC_MSG_ERROR([no; please file a bug report])
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->get() wants inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int get(struct inode *ip, const char *name,
void *buffer, size_t size) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.get = get,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_GET_INODE, 1,
[xattr_handler->get() wants inode])
],[
AC_MSG_ERROR([no; please file a bug report])
])
])
])
])
@ -115,18 +188,18 @@ dnl # Supported xattr handler set() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_SET], [
dnl #
dnl # 4.4 API change,
dnl # The xattr_handler->set() callback was changed to take a
dnl # xattr_handler, and handler_flags argument was removed and
dnl # should be accessed by handler->flags.
dnl # 4.7 API change,
dnl # The xattr_handler->set() callback was changed to take both
dnl # dentry and inode.
dnl #
AC_MSG_CHECKING([whether xattr_handler->set() wants xattr_handler])
AC_MSG_CHECKING([whether xattr_handler->set() wants both dentry and inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags)
struct dentry *dentry, struct inode *inode,
const char *name, const void *buffer,
size_t size, int flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
@ -135,23 +208,23 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_SET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_HANDLER, 1,
[xattr_handler->set() wants xattr_handler])
AC_DEFINE(HAVE_XATTR_SET_DENTRY_INODE, 1,
[xattr_handler->set() wants both dentry and inode])
],[
dnl #
dnl # 2.6.33 API change,
dnl # 4.4 API change,
dnl # The xattr_handler->set() callback was changed to take a
dnl # dentry instead of an inode, and a handler_flags
dnl # argument was added.
dnl # xattr_handler, and handler_flags argument was removed and
dnl # should be accessed by handler->flags.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether xattr_handler->set() wants dentry])
AC_MSG_CHECKING([whether xattr_handler->set() wants xattr_handler])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags,
int handler_flags) { return 0; }
int set(const struct xattr_handler *handler,
struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
@ -159,21 +232,23 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_SET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_DENTRY, 1,
[xattr_handler->set() wants dentry])
AC_DEFINE(HAVE_XATTR_SET_HANDLER, 1,
[xattr_handler->set() wants xattr_handler])
],[
dnl #
dnl # 2.6.32 API
dnl # 2.6.33 API change,
dnl # The xattr_handler->set() callback was changed to take a
dnl # dentry instead of an inode, and a handler_flags
dnl # argument was added.
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->set() wants inode])
AC_MSG_CHECKING([whether xattr_handler->set() wants dentry])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(struct inode *ip, const char *name,
const void *buffer, size_t size, int flags)
{ return 0; }
int set(struct dentry *dentry, const char *name,
const void *buffer, size_t size, int flags,
int handler_flags) { return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
@ -181,10 +256,33 @@ AC_DEFUN([ZFS_AC_KERNEL_XATTR_HANDLER_SET], [
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_INODE, 1,
[xattr_handler->set() wants inode])
AC_DEFINE(HAVE_XATTR_SET_DENTRY, 1,
[xattr_handler->set() wants dentry])
],[
AC_MSG_ERROR([no; please file a bug report])
dnl #
dnl # 2.6.32 API
dnl #
AC_MSG_RESULT(no)
AC_MSG_CHECKING(
[whether xattr_handler->set() wants inode])
ZFS_LINUX_TRY_COMPILE([
#include <linux/xattr.h>
int set(struct inode *ip, const char *name,
const void *buffer, size_t size, int flags)
{ return 0; }
static const struct xattr_handler
xops __attribute__ ((unused)) = {
.set = set,
};
],[
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_XATTR_SET_INODE, 1,
[xattr_handler->set() wants inode])
],[
AC_MSG_ERROR([no; please file a bug report])
])
])
])
])

View File

@ -8,6 +8,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_CONFIG
ZFS_AC_KERNEL_DECLARE_EVENT_CLASS
ZFS_AC_KERNEL_CURRENT_BIO_TAIL
ZFS_AC_KERNEL_SUBMIT_BIO
ZFS_AC_KERNEL_BDEV_BLOCK_DEVICE_OPERATIONS
ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID
ZFS_AC_KERNEL_TYPE_FMODE_T
@ -22,31 +23,43 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_BIO_BVEC_ITER
ZFS_AC_KERNEL_BIO_FAILFAST_DTD
ZFS_AC_KERNEL_REQ_FAILFAST_MASK
ZFS_AC_KERNEL_REQ_OP_DISCARD
ZFS_AC_KERNEL_REQ_OP_SECURE_ERASE
ZFS_AC_KERNEL_REQ_OP_FLUSH
ZFS_AC_KERNEL_BIO_BI_OPF
ZFS_AC_KERNEL_BIO_END_IO_T_ARGS
ZFS_AC_KERNEL_BIO_RW_BARRIER
ZFS_AC_KERNEL_BIO_RW_DISCARD
ZFS_AC_KERNEL_BLK_QUEUE_FLUSH
ZFS_AC_KERNEL_BLK_QUEUE_MAX_HW_SECTORS
ZFS_AC_KERNEL_BLK_QUEUE_MAX_SEGMENTS
ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BIO_RW_UNPLUG
ZFS_AC_KERNEL_BLK_QUEUE_HAVE_BLK_PLUG
ZFS_AC_KERNEL_GET_DISK_RO
ZFS_AC_KERNEL_GET_GENDISK
ZFS_AC_KERNEL_HAVE_BIO_SET_OP_ATTRS
ZFS_AC_KERNEL_GENERIC_READLINK_GLOBAL
ZFS_AC_KERNEL_DISCARD_GRANULARITY
ZFS_AC_KERNEL_CONST_XATTR_HANDLER
ZFS_AC_KERNEL_XATTR_HANDLER_NAME
ZFS_AC_KERNEL_XATTR_HANDLER_GET
ZFS_AC_KERNEL_XATTR_HANDLER_SET
ZFS_AC_KERNEL_XATTR_HANDLER_LIST
ZFS_AC_KERNEL_INODE_OWNER_OR_CAPABLE
ZFS_AC_KERNEL_POSIX_ACL_FROM_XATTR_USERNS
ZFS_AC_KERNEL_POSIX_ACL_RELEASE
ZFS_AC_KERNEL_SET_CACHED_ACL_USABLE
ZFS_AC_KERNEL_POSIX_ACL_CHMOD
ZFS_AC_KERNEL_POSIX_ACL_CACHING
ZFS_AC_KERNEL_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T
ZFS_AC_KERNEL_POSIX_ACL_VALID_WITH_NS
ZFS_AC_KERNEL_INODE_OPERATIONS_PERMISSION
ZFS_AC_KERNEL_INODE_OPERATIONS_PERMISSION_WITH_NAMEIDATA
ZFS_AC_KERNEL_INODE_OPERATIONS_CHECK_ACL
ZFS_AC_KERNEL_INODE_OPERATIONS_CHECK_ACL_WITH_FLAGS
ZFS_AC_KERNEL_INODE_OPERATIONS_GET_ACL
ZFS_AC_KERNEL_CURRENT_UMASK
ZFS_AC_KERNEL_INODE_OPERATIONS_SET_ACL
ZFS_AC_KERNEL_INODE_OPERATIONS_GETATTR
ZFS_AC_KERNEL_GET_ACL_HANDLE_CACHE
ZFS_AC_KERNEL_SHOW_OPTIONS
ZFS_AC_KERNEL_FILE_INODE
ZFS_AC_KERNEL_FSYNC
@ -55,6 +68,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_NR_CACHED_OBJECTS
ZFS_AC_KERNEL_FREE_CACHED_OBJECTS
ZFS_AC_KERNEL_FALLOCATE
ZFS_AC_KERNEL_AIO_FSYNC
ZFS_AC_KERNEL_MKDIR_UMODE_T
ZFS_AC_KERNEL_LOOKUP_NAMEIDATA
ZFS_AC_KERNEL_CREATE_NAMEIDATA
@ -65,6 +79,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_ENCODE_FH_WITH_INODE
ZFS_AC_KERNEL_COMMIT_METADATA
ZFS_AC_KERNEL_CLEAR_INODE
ZFS_AC_KERNEL_SETATTR_PREPARE
ZFS_AC_KERNEL_INSERT_INODE_LOCKED
ZFS_AC_KERNEL_D_MAKE_ROOT
ZFS_AC_KERNEL_D_OBTAIN_ALIAS
@ -81,17 +96,21 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_SHRINK_CONTROL_HAS_NID
ZFS_AC_KERNEL_S_INSTANCES_LIST_HEAD
ZFS_AC_KERNEL_S_D_OP
ZFS_AC_KERNEL_BDI_SETUP_AND_REGISTER
ZFS_AC_KERNEL_BDI
ZFS_AC_KERNEL_SET_NLINK
ZFS_AC_KERNEL_ELEVATOR_CHANGE
ZFS_AC_KERNEL_5ARG_SGET
ZFS_AC_KERNEL_LSEEK_EXECUTE
ZFS_AC_KERNEL_VFS_ITERATE
ZFS_AC_KERNEL_VFS_RW_ITERATE
ZFS_AC_KERNEL_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_MAKE_REQUEST_FN
ZFS_AC_KERNEL_GENERIC_IO_ACCT
ZFS_AC_KERNEL_RENAME_WANTS_FLAGS
ZFS_AC_KERNEL_HAVE_GENERIC_SETXATTR
ZFS_AC_KERNEL_CURRENT_TIME
AS_IF([test "$LINUX_OBJ" != "$LINUX"], [
KERNELMAKE_PARAMS="$KERNELMAKE_PARAMS O=$LINUX_OBJ"
@ -462,9 +481,35 @@ AC_DEFUN([ZFS_AC_KERNEL_CONFIG], [
])
])
ZFS_AC_KERNEL_CONFIG_THREAD_SIZE
ZFS_AC_KERNEL_CONFIG_DEBUG_LOCK_ALLOC
])
dnl #
dnl # Check configured THREAD_SIZE
dnl #
dnl # The stack size will vary by architecture, but as of Linux 3.15 on x86_64
dnl # the default thread stack size was increased to 16K from 8K. Therefore,
dnl # on newer kernels and some architectures stack usage optimizations can be
dnl # conditionally applied to improve performance without negatively impacting
dnl # stability.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_CONFIG_THREAD_SIZE], [
AC_MSG_CHECKING([whether kernel was built with 16K or larger stacks])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
],[
#if (THREAD_SIZE < 16384)
#error "THREAD_SIZE is less than 16K"
#endif
],[
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_LARGE_STACKS, 1, [kernel has large stacks])
],[
AC_MSG_RESULT([no])
])
])
dnl #
dnl # Check CONFIG_DEBUG_LOCK_ALLOC
dnl #
@ -574,7 +619,7 @@ dnl #
dnl # ZFS_LINUX_CONFIG
dnl #
AC_DEFUN([ZFS_LINUX_CONFIG],
[AC_MSG_CHECKING([whether Linux was built with CONFIG_$1])
[AC_MSG_CHECKING([whether kernel was built with CONFIG_$1])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
],[

39
config/user-makedev.m4 Normal file
View File

@ -0,0 +1,39 @@
dnl #
dnl # glibc 2.25
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_MAKEDEV_IN_SYSMACROS], [
AC_MSG_CHECKING([makedev() is declared in sys/sysmacros.h])
AC_TRY_COMPILE(
[
#include <sys/sysmacros.h>
],[
int k;
k = makedev(0,0);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MAKEDEV_IN_SYSMACROS, 1,
[makedev() is declared in sys/sysmacros.h])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # glibc X < Y < 2.25
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV], [
AC_MSG_CHECKING([makedev() is declared in sys/mkdev.h])
AC_TRY_COMPILE(
[
#include <sys/mkdev.h>
],[
int k;
k = makedev(0,0);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MAKEDEV_IN_MKDEV, 1,
[makedev() is declared in sys/mkdev.h])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -0,0 +1,22 @@
dnl #
dnl # Check if gcc supports -Wno-format-truncation option.
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_NO_FORMAT_TRUNCATION], [
AC_MSG_CHECKING([for -Wno-format-truncation support])
saved_flags="$CFLAGS"
CFLAGS="$CFLAGS -Wno-format-truncation"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [])],
[
NO_FORMAT_TRUNCATION=-Wno-format-truncation
AC_MSG_RESULT([yes])
],
[
NO_FORMAT_TRUNCATION=
AC_MSG_RESULT([no])
])
CFLAGS="$saved_flags"
AC_SUBST([NO_FORMAT_TRUNCATION])
])

View File

@ -13,6 +13,9 @@ AC_DEFUN([ZFS_AC_CONFIG_USER], [
ZFS_AC_CONFIG_USER_LIBBLKID
ZFS_AC_CONFIG_USER_FRAME_LARGER_THAN
ZFS_AC_CONFIG_USER_RUNSTATEDIR
ZFS_AC_CONFIG_USER_MAKEDEV_IN_SYSMACROS
ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV
ZFS_AC_CONFIG_USER_NO_FORMAT_TRUNCATION
dnl #
dnl # Checks for library functions
AC_CHECK_FUNCS([mlockall])

View File

@ -1 +1,3 @@
zfs
# Always load kernel modules at boot. The default behavior is to load the
# kernel modules in the zfs-import-*.service or when blkid(8) detects a pool.
#zfs

View File

@ -1,2 +1,7 @@
# ZFS is enabled by default
enable zfs.*
enable zfs-import-cache.service
disable zfs-import-scan.service
enable zfs-mount.service
enable zfs-share.service
enable zfs-zed.service
enable zfs.target

View File

@ -2,7 +2,7 @@ systemdpreset_DATA = \
50-zfs.preset
systemdunit_DATA = \
zed.service \
zfs-zed.service \
zfs-import-cache.service \
zfs-import-scan.service \
zfs-mount.service \
@ -10,7 +10,7 @@ systemdunit_DATA = \
zfs.target
EXTRA_DIST = \
$(top_srcdir)/etc/systemd/system/zed.service.in \
$(top_srcdir)/etc/systemd/system/zfs-zed.service.in \
$(top_srcdir)/etc/systemd/system/zfs-import-cache.service.in \
$(top_srcdir)/etc/systemd/system/zfs-import-scan.service.in \
$(top_srcdir)/etc/systemd/system/zfs-mount.service.in \

View File

@ -4,6 +4,7 @@ DefaultDependencies=no
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=cryptsetup.target
After=systemd-remount-fs.service
ConditionPathExists=@sysconfdir@/zfs/zpool.cache
[Service]
@ -11,3 +12,7 @@ Type=oneshot
RemainAfterExit=yes
ExecStartPre=/sbin/modprobe zfs
ExecStart=@sbindir@/zpool import -c @sysconfdir@/zfs/zpool.cache -aN
[Install]
WantedBy=zfs-mount.service
WantedBy=zfs.target

View File

@ -10,4 +10,8 @@ ConditionPathExists=!@sysconfdir@/zfs/zpool.cache
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/sbin/modprobe zfs
ExecStart=@sbindir@/zpool import -d /dev/disk/by-id -aN
ExecStart=@sbindir@/zpool import -aN -o cachefile=none
[Install]
WantedBy=zfs-mount.service
WantedBy=zfs.target

View File

@ -1,15 +1,18 @@
[Unit]
Description=Mount ZFS filesystems
DefaultDependencies=no
Wants=zfs-import-cache.service
Wants=zfs-import-scan.service
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=zfs-import-cache.service
After=zfs-import-scan.service
After=systemd-remount-fs.service
Before=local-fs.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=@sbindir@/zfs mount -a
WorkingDirectory=-/sbin/
[Install]
WantedBy=zfs-share.service
WantedBy=zfs.target

View File

@ -1,14 +1,16 @@
[Unit]
Description=ZFS file system shares
After=nfs-server.service
After=nfs-server.service nfs-kernel-server.service
After=smb.service
After=zfs-mount.service
Requires=zfs-mount.service
PartOf=nfs-server.service
PartOf=nfs-server.service nfs-kernel-server.service
PartOf=smb.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-@bindir@/rm /etc/dfs/sharetab
ExecStartPre=-@bindir@/rm -f /etc/dfs/sharetab
ExecStart=@sbindir@/zfs share -a
[Install]
WantedBy=zfs.target

View File

@ -7,3 +7,7 @@ After=zfs-import-scan.service
[Service]
ExecStart=@sbindir@/zed -F
Restart=on-abort
[Install]
Alias=zed.service
WantedBy=zfs.target

View File

@ -1,8 +1,5 @@
[Unit]
Description=ZFS startup target
Requires=zfs-mount.service
Requires=zfs-share.service
Wants=zed.service
[Install]
WantedBy=multi-user.target

View File

@ -46,11 +46,6 @@
extern "C" {
#endif
#ifdef VERIFY
#undef VERIFY
#endif
#define VERIFY verify
typedef struct libzfs_fru {
char *zf_device;
char *zf_fru;

View File

@ -37,21 +37,48 @@ typedef unsigned __bitwise__ fmode_t;
#endif /* HAVE_FMODE_T */
/*
* 2.6.36 API change,
* 4.7 - 4.x API,
* The blk_queue_write_cache() interface has replaced blk_queue_flush()
* interface. However, the new interface is GPL-only thus we implement
* our own trivial wrapper when the GPL-only version is detected.
*
* 2.6.36 - 4.6 API,
* The blk_queue_flush() interface has replaced blk_queue_ordered()
* interface. However, while the old interface was available to all the
* new one is GPL-only. Thus if the GPL-only version is detected we
* implement our own trivial helper compatibility funcion. The hope is
* that long term this function will be opened up.
* implement our own trivial helper.
*
* 2.6.x - 2.6.35
* Legacy blk_queue_ordered() interface.
*/
#if defined(HAVE_BLK_QUEUE_FLUSH) && defined(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY)
#define blk_queue_flush __blk_queue_flush
static inline void
__blk_queue_flush(struct request_queue *q, unsigned int flags)
blk_queue_set_write_cache(struct request_queue *q, bool wc, bool fua)
{
q->flush_flags = flags & (REQ_FLUSH | REQ_FUA);
#if defined(HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY)
spin_lock_irq(q->queue_lock);
if (wc)
queue_flag_set(QUEUE_FLAG_WC, q);
else
queue_flag_clear(QUEUE_FLAG_WC, q);
if (fua)
queue_flag_set(QUEUE_FLAG_FUA, q);
else
queue_flag_clear(QUEUE_FLAG_FUA, q);
spin_unlock_irq(q->queue_lock);
#elif defined(HAVE_BLK_QUEUE_WRITE_CACHE)
blk_queue_write_cache(q, wc, fua);
#elif defined(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY)
if (wc)
q->flush_flags |= REQ_FLUSH;
if (fua)
q->flush_flags |= REQ_FUA;
#elif defined(HAVE_BLK_QUEUE_FLUSH)
blk_queue_flush(q, (wc ? REQ_FLUSH : 0) | (fua ? REQ_FUA : 0));
#else
blk_queue_ordered(q, QUEUE_ORDERED_DRAIN, NULL);
#endif
}
#endif /* HAVE_BLK_QUEUE_FLUSH && HAVE_BLK_QUEUE_FLUSH_GPL_ONLY */
/*
* Most of the blk_* macros were removed in 2.6.36. Ostensibly this was
* done to improve readability and allow easier grepping. However, from
@ -234,12 +261,21 @@ bio_set_flags_failfast(struct block_device *bdev, int *flags)
/*
* 2.6.27 API change
* The function was exported for use, prior to this it existed by the
* The function was exported for use, prior to this it existed but the
* symbol was not exported.
*
* 4.4.0-6.21 API change for Ubuntu
* lookup_bdev() gained a second argument, FMODE_*, to check inode permissions.
*/
#ifndef HAVE_LOOKUP_BDEV
#define lookup_bdev(path) ERR_PTR(-ENOTSUP)
#endif
#ifdef HAVE_1ARG_LOOKUP_BDEV
#define vdev_lookup_bdev(path) lookup_bdev(path)
#else
#ifdef HAVE_2ARGS_LOOKUP_BDEV
#define vdev_lookup_bdev(path) lookup_bdev(path, 0)
#else
#define vdev_lookup_bdev(path) ERR_PTR(-ENOTSUP)
#endif /* HAVE_2ARGS_LOOKUP_BDEV */
#endif /* HAVE_1ARG_LOOKUP_BDEV */
/*
* 2.6.30 API change
@ -265,48 +301,172 @@ bio_set_flags_failfast(struct block_device *bdev, int *flags)
#endif /* HAVE_BDEV_LOGICAL_BLOCK_SIZE */
#endif /* HAVE_BDEV_PHYSICAL_BLOCK_SIZE */
#ifndef HAVE_BIO_SET_OP_ATTRS
/*
* 2.6.37 API change
* The WRITE_FLUSH, WRITE_FUA, and WRITE_FLUSH_FUA flags have been
* introduced as a replacement for WRITE_BARRIER. This was done to
* allow richer semantics to be expressed to the block layer. It is
* the block layers responsibility to choose the correct way to
* implement these semantics.
*
* The existence of these flags implies that REQ_FLUSH an REQ_FUA are
* defined. Thus we can safely define VDEV_REQ_FLUSH and VDEV_REQ_FUA
* compatibility macros.
* Kernels without bio_set_op_attrs use bi_rw for the bio flags.
*/
#ifdef WRITE_FLUSH_FUA
#define VDEV_WRITE_FLUSH_FUA WRITE_FLUSH_FUA
#define VDEV_REQ_FLUSH REQ_FLUSH
#define VDEV_REQ_FUA REQ_FUA
#else
#define VDEV_WRITE_FLUSH_FUA WRITE_BARRIER
#ifdef HAVE_BIO_RW_BARRIER
#define VDEV_REQ_FLUSH (1 << BIO_RW_BARRIER)
#define VDEV_REQ_FUA (1 << BIO_RW_BARRIER)
#else
#define VDEV_REQ_FLUSH REQ_HARDBARRIER
#define VDEV_REQ_FUA REQ_FUA
#endif
static inline void
bio_set_op_attrs(struct bio *bio, unsigned rw, unsigned flags)
{
bio->bi_rw |= rw | flags;
}
#endif
/*
* 2.6.32 API change
* Use the normal I/O patch for discards.
* bio_set_flush - Set the appropriate flags in a bio to guarantee
* data are on non-volatile media on completion.
*
* 2.6.X - 2.6.36 API,
* WRITE_BARRIER - Tells the block layer to commit all previously submitted
* writes to stable storage before this one is started and that the current
* write is on stable storage upon completion. Also prevents reordering
* on both sides of the current operation.
*
* 2.6.37 - 4.8 API,
* Introduce WRITE_FLUSH, WRITE_FUA, and WRITE_FLUSH_FUA flags as a
* replacement for WRITE_BARRIER to allow expressing richer semantics
* to the block layer. It's up to the block layer to implement the
* semantics correctly. Use the WRITE_FLUSH_FUA flag combination.
*
* 4.8 - 4.9 API,
* REQ_FLUSH was renamed to REQ_PREFLUSH. For consistency with previous
* ZoL releases, prefer the WRITE_FLUSH_FUA flag set if it's available.
*
* 4.10 API,
* The read/write flags and their modifiers, including WRITE_FLUSH,
* WRITE_FUA and WRITE_FLUSH_FUA were removed from fs.h in
* torvalds/linux@70fd7614 and replaced by direct flag modification
* of the REQ_ flags in bio->bi_opf. Use REQ_PREFLUSH.
*/
#ifdef QUEUE_FLAG_DISCARD
#ifdef HAVE_BIO_RW_DISCARD
#define VDEV_REQ_DISCARD (1 << BIO_RW_DISCARD)
static inline void
bio_set_flush(struct bio *bio)
{
#if defined(REQ_PREFLUSH) /* >= 4.10 */
bio_set_op_attrs(bio, 0, REQ_PREFLUSH);
#elif defined(WRITE_FLUSH_FUA) /* >= 2.6.37 and <= 4.9 */
bio_set_op_attrs(bio, 0, WRITE_FLUSH_FUA);
#elif defined(WRITE_BARRIER) /* < 2.6.37 */
bio_set_op_attrs(bio, 0, WRITE_BARRIER);
#else
#define VDEV_REQ_DISCARD REQ_DISCARD
#error "Allowing the build will cause bio_set_flush requests to be ignored."
"Please file an issue report at: "
"https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_OP_FLUSH
*
* 4.8-rc0 - 4.8-rc1,
* REQ_PREFLUSH
*
* 2.6.36 - 4.7 API,
* REQ_FLUSH
*
* 2.6.x - 2.6.35 API,
* HAVE_BIO_RW_BARRIER
*
* Used to determine if a cache flush has been requested. This check has
* been left intentionally broad in order to cover both a legacy flush
* and the new preflush behavior introduced in Linux 4.8. This is correct
* in all cases but may have a performance impact for some kernels. It
* has the advantage of minimizing kernel specific changes in the zvol code.
*
*/
static inline boolean_t
bio_is_flush(struct bio *bio)
{
#if defined(HAVE_REQ_OP_FLUSH) && defined(HAVE_BIO_BI_OPF)
return ((bio_op(bio) == REQ_OP_FLUSH) || (bio->bi_opf & REQ_PREFLUSH));
#elif defined(REQ_PREFLUSH) && defined(HAVE_BIO_BI_OPF)
return (bio->bi_opf & REQ_PREFLUSH);
#elif defined(REQ_PREFLUSH) && !defined(HAVE_BIO_BI_OPF)
return (bio->bi_rw & REQ_PREFLUSH);
#elif defined(REQ_FLUSH)
return (bio->bi_rw & REQ_FLUSH);
#elif defined(HAVE_BIO_RW_BARRIER)
return (bio->bi_rw & (1 << BIO_RW_BARRIER));
#else
#error "Allowing the build will cause flush requests to be ignored. Please "
"file an issue report at: https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_FUA flag moved to bio->bi_opf
*
* 2.6.x - 4.7 API,
* REQ_FUA
*/
static inline boolean_t
bio_is_fua(struct bio *bio)
{
#if defined(HAVE_BIO_BI_OPF)
return (bio->bi_opf & REQ_FUA);
#elif defined(REQ_FUA)
return (bio->bi_rw & REQ_FUA);
#else
#error "Allowing the build will cause fua requests to be ignored. Please "
"file an issue report at: https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_OP_DISCARD
*
* 2.6.36 - 4.7 API,
* REQ_DISCARD
*
* 2.6.28 - 2.6.35 API,
* BIO_RW_DISCARD
*
* In all cases the normal I/O path is used for discards. The only
* difference is how the kernel tags individual I/Os as discards.
*
* Note that 2.6.32 era kernels provide both BIO_RW_DISCARD and REQ_DISCARD,
* where BIO_RW_DISCARD is the correct interface. Therefore, it is important
* that the HAVE_BIO_RW_DISCARD check occur before the REQ_DISCARD check.
*/
static inline boolean_t
bio_is_discard(struct bio *bio)
{
#if defined(HAVE_REQ_OP_DISCARD)
return (bio_op(bio) == REQ_OP_DISCARD);
#elif defined(HAVE_BIO_RW_DISCARD)
return (bio->bi_rw & (1 << BIO_RW_DISCARD));
#elif defined(REQ_DISCARD)
return (bio->bi_rw & REQ_DISCARD);
#else
#error "Allowing the build will cause discard requests to become writes "
"potentially triggering the DMU_MAX_ACCESS assertion. Please file a "
"potentially triggering the DMU_MAX_ACCESS assertion. Please file "
"an issue report at: https://github.com/zfsonlinux/zfs/issues/new"
#endif
}
/*
* 4.8 - 4.x API,
* REQ_OP_SECURE_ERASE
*
* 2.6.36 - 4.7 API,
* REQ_SECURE
*
* 2.6.x - 2.6.35 API,
* Unsupported by kernel
*/
static inline boolean_t
bio_is_secure_erase(struct bio *bio)
{
#if defined(HAVE_REQ_OP_SECURE_ERASE)
return (bio_op(bio) == REQ_OP_SECURE_ERASE);
#elif defined(REQ_SECURE)
return (bio->bi_rw & REQ_SECURE);
#else
return (0);
#endif
}
/*
* 2.6.33 API change

View File

@ -69,45 +69,115 @@ truncate_setsize(struct inode *ip, loff_t new)
/*
* 2.6.32 - 2.6.33, bdi_setup_and_register() is not available.
* 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
* 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.
* 4.0 - 4.11, bdi_setup_and_register() takes 2 arguments.
* 4.12 - x.y, super_setup_bdi_name() new interface.
*/
#if defined(HAVE_2ARGS_BDI_SETUP_AND_REGISTER)
#if defined(HAVE_SUPER_SETUP_BDI_NAME)
extern atomic_long_t zfs_bdi_seq;
static inline int
zpl_bdi_setup_and_register(struct backing_dev_info *bdi, char *name)
zpl_bdi_setup(struct super_block *sb, char *name)
{
return (bdi_setup_and_register(bdi, name));
return super_setup_bdi_name(sb, "%.28s-%ld", name,
atomic_long_inc_return(&zfs_bdi_seq));
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
}
#elif defined(HAVE_2ARGS_BDI_SETUP_AND_REGISTER)
static inline int
zpl_bdi_setup(struct super_block *sb, char *name)
{
struct backing_dev_info *bdi;
int error;
bdi = kmem_zalloc(sizeof (struct backing_dev_info), KM_SLEEP);
error = bdi_setup_and_register(bdi, name);
if (error) {
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
sb->s_bdi = bdi;
return (0);
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
struct backing_dev_info *bdi = sb->s_bdi;
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
sb->s_bdi = NULL;
}
#elif defined(HAVE_3ARGS_BDI_SETUP_AND_REGISTER)
static inline int
zpl_bdi_setup_and_register(struct backing_dev_info *bdi, char *name)
zpl_bdi_setup(struct super_block *sb, char *name)
{
return (bdi_setup_and_register(bdi, name, BDI_CAP_MAP_COPY));
struct backing_dev_info *bdi;
int error;
bdi = kmem_zalloc(sizeof (struct backing_dev_info), KM_SLEEP);
error = bdi_setup_and_register(bdi, name, BDI_CAP_MAP_COPY);
if (error) {
kmem_free(sb->s_bdi, sizeof (struct backing_dev_info));
return (error);
}
sb->s_bdi = bdi;
return (0);
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
struct backing_dev_info *bdi = sb->s_bdi;
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
sb->s_bdi = NULL;
}
#else
extern atomic_long_t zfs_bdi_seq;
static inline int
zpl_bdi_setup_and_register(struct backing_dev_info *bdi, char *name)
zpl_bdi_setup(struct super_block *sb, char *name)
{
char tmp[32];
struct backing_dev_info *bdi;
int error;
bdi = kmem_zalloc(sizeof (struct backing_dev_info), KM_SLEEP);
bdi->name = name;
bdi->capabilities = BDI_CAP_MAP_COPY;
error = bdi_init(bdi);
if (error)
return (error);
sprintf(tmp, "%.28s%s", name, "-%d");
error = bdi_register(bdi, NULL, tmp,
atomic_long_inc_return(&zfs_bdi_seq));
if (error) {
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
return (error);
error = bdi_register(bdi, NULL, "%.28s-%ld", name,
atomic_long_inc_return(&zfs_bdi_seq));
if (error) {
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
return (error);
}
sb->s_bdi = bdi;
return (0);
}
static inline void
zpl_bdi_destroy(struct super_block *sb)
{
struct backing_dev_info *bdi = sb->s_bdi;
bdi_destroy(bdi);
kmem_free(bdi, sizeof (struct backing_dev_info));
sb->s_bdi = NULL;
}
#endif
@ -202,22 +272,11 @@ lseek_execute(
* At 60 seconds the kernel will also begin issuing RCU stall warnings.
*/
#include <linux/posix_acl.h>
#ifndef HAVE_POSIX_ACL_CACHING
#define ACL_NOT_CACHED ((void *)(-1))
#endif /* HAVE_POSIX_ACL_CACHING */
#if defined(HAVE_POSIX_ACL_RELEASE) && !defined(HAVE_POSIX_ACL_RELEASE_GPL_ONLY)
#define zpl_posix_acl_release(arg) posix_acl_release(arg)
#define zpl_set_cached_acl(ip, ty, n) set_cached_acl(ip, ty, n)
#define zpl_forget_cached_acl(ip, ty) forget_cached_acl(ip, ty)
#else
static inline void
zpl_posix_acl_free(void *arg) {
kfree(arg);
}
void zpl_posix_acl_release_impl(struct posix_acl *);
static inline void
zpl_posix_acl_release(struct posix_acl *acl)
@ -225,15 +284,17 @@ zpl_posix_acl_release(struct posix_acl *acl)
if ((acl == NULL) || (acl == ACL_NOT_CACHED))
return;
if (atomic_dec_and_test(&acl->a_refcount)) {
taskq_dispatch_delay(system_taskq, zpl_posix_acl_free, acl,
TQ_SLEEP, ddi_get_lbolt() + 60*HZ);
}
if (atomic_dec_and_test(&acl->a_refcount))
zpl_posix_acl_release_impl(acl);
}
#endif /* HAVE_POSIX_ACL_RELEASE */
#ifdef HAVE_SET_CACHED_ACL_USABLE
#define zpl_set_cached_acl(ip, ty, n) set_cached_acl(ip, ty, n)
#define zpl_forget_cached_acl(ip, ty) forget_cached_acl(ip, ty)
#else
static inline void
zpl_set_cached_acl(struct inode *ip, int type, struct posix_acl *newer) {
#ifdef HAVE_POSIX_ACL_CACHING
struct posix_acl *older = NULL;
spin_lock(&ip->i_lock);
@ -255,14 +316,13 @@ zpl_set_cached_acl(struct inode *ip, int type, struct posix_acl *newer) {
spin_unlock(&ip->i_lock);
zpl_posix_acl_release(older);
#endif /* HAVE_POSIX_ACL_CACHING */
}
static inline void
zpl_forget_cached_acl(struct inode *ip, int type) {
zpl_set_cached_acl(ip, type, (struct posix_acl *)ACL_NOT_CACHED);
}
#endif /* HAVE_POSIX_ACL_RELEASE */
#endif /* HAVE_SET_CACHED_ACL_USABLE */
#ifndef HAVE___POSIX_ACL_CHMOD
#ifdef HAVE_POSIX_ACL_CHMOD
@ -320,15 +380,19 @@ typedef umode_t zpl_equivmode_t;
#else
typedef mode_t zpl_equivmode_t;
#endif /* HAVE_POSIX_ACL_EQUIV_MODE_UMODE_T */
#endif /* CONFIG_FS_POSIX_ACL */
#ifndef HAVE_CURRENT_UMASK
static inline int
current_umask(void)
{
return (current->fs->umask);
}
#endif /* HAVE_CURRENT_UMASK */
/*
* 4.8 API change,
* posix_acl_valid() now must be passed a namespace, the namespace from
* from super block associated with the given inode is used for this purpose.
*/
#ifdef HAVE_POSIX_ACL_VALID_WITH_NS
#define zpl_posix_acl_valid(ip, acl) posix_acl_valid(ip->i_sb->s_user_ns, acl)
#else
#define zpl_posix_acl_valid(ip, acl) posix_acl_valid(acl)
#endif
#endif /* CONFIG_FS_POSIX_ACL */
/*
* 2.6.38 API change,
@ -363,4 +427,69 @@ static inline struct inode *file_inode(const struct file *f)
#define zpl_follow_up(path) follow_up(path)
#endif
/*
* 4.9 API change
*/
#ifndef HAVE_SETATTR_PREPARE
static inline int
setattr_prepare(struct dentry *dentry, struct iattr *ia)
{
return (inode_change_ok(dentry->d_inode, ia));
}
#endif
/*
* 4.11 API change
* These macros are defined by kernel 4.11. We define them so that the same
* code builds under kernels < 4.11 and >= 4.11. The macros are set to 0 so
* that it will create obvious failures if they are accidentally used when built
* against a kernel >= 4.11.
*/
#ifndef STATX_BASIC_STATS
#define STATX_BASIC_STATS 0
#endif
#ifndef AT_STATX_SYNC_AS_STAT
#define AT_STATX_SYNC_AS_STAT 0
#endif
/*
* 4.11 API change
* 4.11 takes struct path *, < 4.11 takes vfsmount *
*/
#ifdef HAVE_VFSMOUNT_IOPS_GETATTR
#define ZPL_GETATTR_WRAPPER(func) \
static int \
func(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) \
{ \
struct path path = { .mnt = mnt, .dentry = dentry }; \
return func##_impl(&path, stat, STATX_BASIC_STATS, \
AT_STATX_SYNC_AS_STAT); \
}
#elif defined(HAVE_PATH_IOPS_GETATTR)
#define ZPL_GETATTR_WRAPPER(func) \
static int \
func(const struct path *path, struct kstat *stat, u32 request_mask, \
unsigned int query_flags) \
{ \
return (func##_impl(path, stat, request_mask, query_flags)); \
}
#else
#error
#endif
/*
* 4.9 API change
* Preferred interface to get the current FS time.
*/
#if !defined(HAVE_CURRENT_TIME)
static inline struct timespec
current_time(struct inode *ip)
{
return (timespec_trunc(current_kernel_time(), ip->i_sb->s_time_gran));
}
#endif
#endif /* _ZFS_VFS_H */

View File

@ -101,13 +101,26 @@ fn(struct inode *ip, char *list, size_t list_size, \
}
#endif
/*
* 4.7 API change,
* The xattr_handler->get() callback was changed to take a both dentry and
* inode, because the dentry might not be attached to an inode yet.
*/
#if defined(HAVE_XATTR_GET_DENTRY_INODE)
#define ZPL_XATTR_GET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
struct inode *inode, const char *name, void *buffer, size_t size) \
{ \
return (__ ## fn(inode, name, buffer, size)); \
}
/*
* 4.4 API change,
* The xattr_handler->get() callback was changed to take a xattr_handler,
* and handler_flags argument was removed and should be accessed by
* handler->flags.
*/
#if defined(HAVE_XATTR_GET_HANDLER)
#elif defined(HAVE_XATTR_GET_HANDLER)
#define ZPL_XATTR_GET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
@ -140,13 +153,27 @@ fn(struct inode *ip, const char *name, void *buffer, size_t size) \
}
#endif
/*
* 4.7 API change,
* The xattr_handler->set() callback was changed to take a both dentry and
* inode, because the dentry might not be attached to an inode yet.
*/
#if defined(HAVE_XATTR_SET_DENTRY_INODE)
#define ZPL_XATTR_SET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
struct inode *inode, const char *name, const void *buffer, \
size_t size, int flags) \
{ \
return (__ ## fn(inode, name, buffer, size, flags)); \
}
/*
* 4.4 API change,
* The xattr_handler->set() callback was changed to take a xattr_handler,
* and handler_flags argument was removed and should be accessed by
* handler->flags.
*/
#if defined(HAVE_XATTR_SET_HANDLER)
#elif defined(HAVE_XATTR_SET_HANDLER)
#define ZPL_XATTR_SET_WRAPPER(fn) \
static int \
fn(const struct xattr_handler *handler, struct dentry *dentry, \
@ -190,20 +217,20 @@ fn(struct inode *ip, const char *name, const void *buffer, \
/*
* Linux 3.7 API change. posix_acl_{from,to}_xattr gained the user_ns
* parameter. For the HAVE_POSIX_ACL_FROM_XATTR_USERNS version the
* userns _may_ not be correct because it's used outside the RCU.
* parameter. All callers are expected to pass the &init_user_ns which
* is available through the init credential (kcred).
*/
#ifdef HAVE_POSIX_ACL_FROM_XATTR_USERNS
static inline struct posix_acl *
zpl_acl_from_xattr(const void *value, int size)
{
return (posix_acl_from_xattr(CRED()->user_ns, value, size));
return (posix_acl_from_xattr(kcred->user_ns, value, size));
}
static inline int
zpl_acl_to_xattr(struct posix_acl *acl, void *value, int size)
{
return (posix_acl_to_xattr(CRED()->user_ns, acl, value, size));
return (posix_acl_to_xattr(kcred->user_ns, acl, value, size));
}
#else

View File

@ -221,6 +221,7 @@ int dsl_dataset_own_obj(struct dsl_pool *dp, uint64_t dsobj,
void dsl_dataset_disown(dsl_dataset_t *ds, void *tag);
void dsl_dataset_name(dsl_dataset_t *ds, char *name);
boolean_t dsl_dataset_tryown(dsl_dataset_t *ds, void *tag);
int dsl_dataset_namelen(dsl_dataset_t *ds);
uint64_t dsl_dataset_create_sync(dsl_dir_t *pds, const char *lastname,
dsl_dataset_t *origin, uint64_t flags, cred_t *, dmu_tx_t *);
uint64_t dsl_dataset_create_sync_dd(dsl_dir_t *dd, dsl_dataset_t *origin,

View File

@ -69,6 +69,7 @@ typedef enum dmu_objset_type {
#define ZAP_MAXNAMELEN 256
#define ZAP_MAXVALUELEN (1024 * 8)
#define ZAP_OLDMAXVALUELEN 1024
#define ZFS_MAX_DATASET_NAME_LEN 256
/*
* Dataset properties are identified by these constants and must be added to

View File

@ -68,8 +68,9 @@
#define MNTOPT_NOFAIL "nofail" /* no failure */
#define MNTOPT_RELATIME "relatime" /* allow relative time updates */
#define MNTOPT_NORELATIME "norelatime" /* do not allow relative time updates */
#define MNTOPT_DFRATIME "strictatime" /* Deferred access time updates */
#define MNTOPT_NODFRATIME "nostrictatime" /* No Deferred access time updates */
#define MNTOPT_STRICTATIME "strictatime" /* strict access time updates */
#define MNTOPT_NOSTRICTATIME "nostrictatime" /* No strict access time updates */
#define MNTOPT_LAZYTIME "lazytime" /* Defer access time writing */
#define MNTOPT_SETUID "suid" /* Both setuid and devices allowed */
#define MNTOPT_NOSETUID "nosuid" /* Neither setuid nor devices allowed */
#define MNTOPT_OWNER "owner" /* allow owner mount */

View File

@ -40,6 +40,17 @@ extern "C" {
*/
#define FTAG ((char *)__func__)
/*
* Starting with 4.11, torvalds/linux@f405df5, the linux kernel defines a
* refcount_t type of its own. The macro below effectively changes references
* in the ZFS code from refcount_t to zfs_refcount_t at compile time, so that
* existing code need not be altered, reducing conflicts when landing openZFS
* patches.
*/
#define refcount_t zfs_refcount_t
#define refcount_add zfs_refcount_add
#ifdef ZFS_DEBUG
typedef struct reference {
list_node_t ref_link;
@ -55,7 +66,7 @@ typedef struct refcount {
list_t rc_removed;
int64_t rc_count;
int64_t rc_removed_count;
} refcount_t;
} zfs_refcount_t;
/* Note: refcount_t must be initialized with refcount_create[_untracked]() */
@ -65,7 +76,7 @@ void refcount_destroy(refcount_t *rc);
void refcount_destroy_many(refcount_t *rc, uint64_t number);
int refcount_is_zero(refcount_t *rc);
int64_t refcount_count(refcount_t *rc);
int64_t refcount_add(refcount_t *rc, void *holder_tag);
int64_t zfs_refcount_add(refcount_t *rc, void *holder_tag);
int64_t refcount_remove(refcount_t *rc, void *holder_tag);
int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
@ -86,7 +97,7 @@ typedef struct refcount {
#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
#define refcount_is_zero(rc) ((rc)->rc_count == 0)
#define refcount_count(rc) ((rc)->rc_count)
#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
#define zfs_refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1)
#define refcount_add_many(rc, number, holder) \
atomic_add_64_nv(&(rc)->rc_count, number)

View File

@ -446,6 +446,10 @@ _NOTE(CONSTCOND) } while (0)
((zc1).zc_word[2] - (zc2).zc_word[2]) | \
((zc1).zc_word[3] - (zc2).zc_word[3])))
#define ZIO_CHECKSUM_IS_ZERO(zc) \
(0 == ((zc)->zc_word[0] | (zc)->zc_word[1] | \
(zc)->zc_word[2] | (zc)->zc_word[3]))
#define DVA_IS_VALID(dva) (DVA_GET_ASIZE(dva) != 0)
#define ZIO_SET_CHECKSUM(zcp, w0, w1, w2, w3) \

View File

@ -56,7 +56,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__field(uint64_t, z_mapcnt)
__field(uint64_t, z_gen)
__field(uint64_t, z_size)
__array(uint64_t, z_atime, 2)
__field(uint64_t, z_links)
__field(uint64_t, z_pflags)
__field(uint64_t, z_uid)
@ -64,7 +63,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__field(uint32_t, z_sync_cnt)
__field(mode_t, z_mode)
__field(boolean_t, z_is_sa)
__field(boolean_t, z_is_zvol)
__field(boolean_t, z_is_mapped)
__field(boolean_t, z_is_ctldir)
__field(boolean_t, z_is_stale)
@ -95,8 +93,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_mapcnt = zn->z_mapcnt;
__entry->z_gen = zn->z_gen;
__entry->z_size = zn->z_size;
__entry->z_atime[0] = zn->z_atime[0];
__entry->z_atime[1] = zn->z_atime[1];
__entry->z_links = zn->z_links;
__entry->z_pflags = zn->z_pflags;
__entry->z_uid = zn->z_uid;
@ -104,7 +100,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_sync_cnt = zn->z_sync_cnt;
__entry->z_mode = zn->z_mode;
__entry->z_is_sa = zn->z_is_sa;
__entry->z_is_zvol = zn->z_is_zvol;
__entry->z_is_mapped = zn->z_is_mapped;
__entry->z_is_ctldir = zn->z_is_ctldir;
__entry->z_is_stale = zn->z_is_stale;
@ -126,9 +121,9 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
),
TP_printk("zn { id %llu unlinked %u atime_dirty %u "
"zn_prefetch %u moved %u blksz %u seq %u "
"mapcnt %llu gen %llu size %llu atime 0x%llx:0x%llx "
"mapcnt %llu gen %llu size %llu "
"links %llu pflags %llu uid %llu gid %llu "
"sync_cnt %u mode 0x%x is_sa %d is_zvol %d "
"sync_cnt %u mode 0x%x is_sa %d "
"is_mapped %d is_ctldir %d is_stale %d inode { "
"ino %lu nlink %u version %llu size %lli blkbits %u "
"bytes %u mode 0x%x generation %x } } ace { type %u "
@ -136,10 +131,10 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_id, __entry->z_unlinked, __entry->z_atime_dirty,
__entry->z_zn_prefetch, __entry->z_moved, __entry->z_blksz,
__entry->z_seq, __entry->z_mapcnt, __entry->z_gen,
__entry->z_size, __entry->z_atime[0], __entry->z_atime[1],
__entry->z_size,
__entry->z_links, __entry->z_pflags, __entry->z_uid,
__entry->z_gid, __entry->z_sync_cnt, __entry->z_mode,
__entry->z_is_sa, __entry->z_is_zvol, __entry->z_is_mapped,
__entry->z_is_sa, __entry->z_is_mapped,
__entry->z_is_ctldir, __entry->z_is_stale, __entry->i_ino,
__entry->i_nlink, __entry->i_version, __entry->i_size,
__entry->i_blkbits, __entry->i_bytes, __entry->i_mode,

View File

@ -37,9 +37,5 @@ typedef struct vdev_disk {
struct block_device *vd_bdev;
} vdev_disk_t;
extern int vdev_disk_physio(struct block_device *, caddr_t,
size_t, uint64_t, int);
extern int vdev_disk_read_rootlabel(char *, char *, nvlist_t **);
#endif /* _KERNEL */
#endif /* _SYS_VDEV_DISK_H */

View File

@ -225,7 +225,7 @@ typedef struct xvattr {
* of requested attributes (xva_reqattrmap[]).
*/
#define XVA_SET_REQ(xvap, attr) \
ASSERT((xvap)->xva_vattr.va_mask | AT_XVATTR); \
ASSERT((xvap)->xva_vattr.va_mask & AT_XVATTR); \
ASSERT((xvap)->xva_magic == XVA_MAGIC); \
(xvap)->xva_reqattrmap[XVA_INDEX(attr)] |= XVA_ATTRBIT(attr)
/*
@ -233,7 +233,7 @@ typedef struct xvattr {
* of requested attributes (xva_reqattrmap[]).
*/
#define XVA_CLR_REQ(xvap, attr) \
ASSERT((xvap)->xva_vattr.va_mask | AT_XVATTR); \
ASSERT((xvap)->xva_vattr.va_mask & AT_XVATTR); \
ASSERT((xvap)->xva_magic == XVA_MAGIC); \
(xvap)->xva_reqattrmap[XVA_INDEX(attr)] &= ~XVA_ATTRBIT(attr)
@ -242,7 +242,7 @@ typedef struct xvattr {
* of returned attributes (xva_rtnattrmap[]).
*/
#define XVA_SET_RTN(xvap, attr) \
ASSERT((xvap)->xva_vattr.va_mask | AT_XVATTR); \
ASSERT((xvap)->xva_vattr.va_mask & AT_XVATTR); \
ASSERT((xvap)->xva_magic == XVA_MAGIC); \
(XVA_RTNATTRMAP(xvap))[XVA_INDEX(attr)] |= XVA_ATTRBIT(attr)
@ -251,7 +251,7 @@ typedef struct xvattr {
* to see of the corresponding attribute bit is set. If so, returns non-zero.
*/
#define XVA_ISSET_REQ(xvap, attr) \
((((xvap)->xva_vattr.va_mask | AT_XVATTR) && \
((((xvap)->xva_vattr.va_mask & AT_XVATTR) && \
((xvap)->xva_magic == XVA_MAGIC) && \
((xvap)->xva_mapsize > XVA_INDEX(attr))) ? \
((xvap)->xva_reqattrmap[XVA_INDEX(attr)] & XVA_ATTRBIT(attr)) : 0)
@ -261,7 +261,7 @@ typedef struct xvattr {
* to see of the corresponding attribute bit is set. If so, returns non-zero.
*/
#define XVA_ISSET_RTN(xvap, attr) \
((((xvap)->xva_vattr.va_mask | AT_XVATTR) && \
((((xvap)->xva_vattr.va_mask & AT_XVATTR) && \
((xvap)->xva_magic == XVA_MAGIC) && \
((xvap)->xva_mapsize > XVA_INDEX(attr))) ? \
((XVA_RTNATTRMAP(xvap))[XVA_INDEX(attr)] & XVA_ATTRBIT(attr)) : 0)

View File

@ -32,7 +32,9 @@ extern "C" {
#ifdef _KERNEL
#include <sys/zfs_znode.h>
#include <sys/list.h>
#include <sys/avl.h>
#include <sys/condvar.h>
typedef enum {
RL_READER,
@ -40,8 +42,16 @@ typedef enum {
RL_APPEND
} rl_type_t;
typedef struct zfs_rlock {
kmutex_t zr_mutex; /* protects changes to zr_avl */
avl_tree_t zr_avl; /* avl tree of range locks */
uint64_t *zr_size; /* points to znode->z_size */
uint_t *zr_blksz; /* points to znode->z_blksz */
uint64_t *zr_max_blksz; /* points to zsb->z_max_blksz */
} zfs_rlock_t;
typedef struct rl {
znode_t *r_zp; /* znode this lock applies to */
zfs_rlock_t *r_zrl;
avl_node_t r_node; /* avl node link */
uint64_t r_off; /* file range offset */
uint64_t r_len; /* file range length */
@ -61,7 +71,8 @@ typedef struct rl {
* is converted to RL_WRITER that specified to lock from the start of the
* end of file. Returns the range lock structure.
*/
rl_t *zfs_range_lock(znode_t *zp, uint64_t off, uint64_t len, rl_type_t type);
rl_t *zfs_range_lock(zfs_rlock_t *zrl, uint64_t off, uint64_t len,
rl_type_t type);
/* Unlock range and destroy range lock structure. */
void zfs_range_unlock(rl_t *rl);
@ -78,6 +89,23 @@ void zfs_range_reduce(rl_t *rl, uint64_t off, uint64_t len);
*/
int zfs_range_compare(const void *arg1, const void *arg2);
static inline void
zfs_rlock_init(zfs_rlock_t *zrl)
{
mutex_init(&zrl->zr_mutex, NULL, MUTEX_DEFAULT, NULL);
avl_create(&zrl->zr_avl, zfs_range_compare,
sizeof (rl_t), offsetof(rl_t, r_node));
zrl->zr_size = NULL;
zrl->zr_blksz = NULL;
zrl->zr_max_blksz = NULL;
}
static inline void
zfs_rlock_destroy(zfs_rlock_t *zrl)
{
avl_destroy(&zrl->zr_avl);
mutex_destroy(&zrl->zr_mutex);
}
#endif /* _KERNEL */
#ifdef __cplusplus

View File

@ -64,7 +64,6 @@ typedef struct zfs_mntopts {
typedef struct zfs_sb {
struct super_block *z_sb; /* generic super_block */
struct backing_dev_info z_bdi; /* generic backing dev info */
struct zfs_sb *z_parent; /* parent fs */
objset_t *z_os; /* objset reference */
zfs_mntopts_t *z_mntopts; /* passed mount options */

View File

@ -37,6 +37,7 @@
#include <sys/rrwlock.h>
#include <sys/zfs_sa.h>
#include <sys/zfs_stat.h>
#include <sys/zfs_rlock.h>
#endif
#include <sys/zfs_acl.h>
#include <sys/zil.h>
@ -187,8 +188,7 @@ typedef struct znode {
krwlock_t z_parent_lock; /* parent lock for directories */
krwlock_t z_name_lock; /* "master" lock for dirent locks */
zfs_dirlock_t *z_dirlocks; /* directory entry lock list */
kmutex_t z_range_lock; /* protects changes to z_range_avl */
avl_tree_t z_range_avl; /* avl tree of file range locks */
zfs_rlock_t z_range_lock; /* file range lock */
uint8_t z_unlinked; /* file has been unlinked */
uint8_t z_atime_dirty; /* atime needs to be synced */
uint8_t z_zn_prefetch; /* Prefetch znodes? */
@ -198,7 +198,6 @@ typedef struct znode {
uint64_t z_mapcnt; /* number of pages mapped to file */
uint64_t z_gen; /* generation (cached) */
uint64_t z_size; /* file size (cached) */
uint64_t z_atime[2]; /* atime (cached) */
uint64_t z_links; /* file links (cached) */
uint64_t z_pflags; /* pflags (cached) */
uint64_t z_uid; /* uid fuid (cached) */
@ -209,11 +208,9 @@ typedef struct znode {
zfs_acl_t *z_acl_cached; /* cached acl */
krwlock_t z_xattr_lock; /* xattr data lock */
nvlist_t *z_xattr_cached; /* cached xattrs */
struct znode *z_xattr_parent; /* xattr parent znode */
list_node_t z_link_node; /* all znodes in fs link */
sa_handle_t *z_sa_hdl; /* handle to sa data */
boolean_t z_is_sa; /* are we native sa? */
boolean_t z_is_zvol; /* are we used by the zvol */
boolean_t z_is_mapped; /* are we mmap'ed */
boolean_t z_is_ctldir; /* are we .zfs entry */
boolean_t z_is_stale; /* are we stale due to rollback? */
@ -306,16 +303,12 @@ extern unsigned int zfs_object_mutex_size;
#define STATE_CHANGED (ATTR_CTIME)
#define CONTENT_MODIFIED (ATTR_MTIME | ATTR_CTIME)
#define ZFS_ACCESSTIME_STAMP(zsb, zp) \
if ((zsb)->z_atime && !(zfs_is_readonly(zsb))) \
zfs_tstamp_update_setup(zp, ACCESSED, NULL, NULL, B_FALSE);
extern int zfs_init_fs(zfs_sb_t *, znode_t **);
extern void zfs_set_dataprop(objset_t *);
extern void zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *,
dmu_tx_t *tx);
extern void zfs_tstamp_update_setup(znode_t *, uint_t, uint64_t [2],
uint64_t [2], boolean_t);
uint64_t [2]);
extern void zfs_grow_blocksize(znode_t *, uint64_t, dmu_tx_t *);
extern int zfs_freesp(znode_t *, uint64_t, uint64_t, int, boolean_t);
extern void zfs_znode_init(void);

View File

@ -76,7 +76,7 @@ extern ssize_t zpl_xattr_list(struct dentry *dentry, char *buf, size_t size);
extern int zpl_xattr_security_init(struct inode *ip, struct inode *dip,
const struct qstr *qstr);
#if defined(CONFIG_FS_POSIX_ACL)
extern int zpl_set_acl(struct inode *ip, int type, struct posix_acl *acl);
extern int zpl_set_acl(struct inode *ip, struct posix_acl *acl, int type);
extern struct posix_acl *zpl_get_acl(struct inode *ip, int type);
#if !defined(HAVE_GET_ACL)
#if defined(HAVE_CHECK_ACL_WITH_FLAGS)
@ -123,7 +123,7 @@ extern const struct inode_operations zpl_ops_snapdirs;
extern const struct file_operations zpl_fops_shares;
extern const struct inode_operations zpl_ops_shares;
#ifdef HAVE_VFS_ITERATE
#if defined(HAVE_VFS_ITERATE) || defined(HAVE_VFS_ITERATE_SHARED)
#define DIR_CONTEXT_INIT(_dirent, _actor, _pos) { \
.actor = _actor, \

View File

@ -507,7 +507,7 @@
movl 16(%esp), %ebx
movl 20(%esp), %ecx
subl %eax, %ebx
adcl %edx, %ecx
sbbl %edx, %ecx
lock
cmpxchg8b (%edi)
jne 1b

View File

@ -34,6 +34,7 @@
#include <sys/mnttab.h>
#include <sys/types.h>
#include <sys/sysmacros.h>
#include <sys/stat.h>
#include <unistd.h>

View File

@ -31,69 +31,66 @@
#include <stdio.h>
#include <stdlib.h>
#ifndef __assert_c99
static inline void
__assert_c99(const char *expr, const char *file, int line, const char *func)
{
fprintf(stderr, "%s:%i: %s: Assertion `%s` failed.\n",
file, line, func, expr);
abort();
}
#endif /* __assert_c99 */
#ifndef verify
#if defined(__STDC__)
#if __STDC_VERSION__ - 0 >= 199901L
#define verify(EX) (void)((EX) || \
(__assert_c99(#EX, __FILE__, __LINE__, __func__), 0))
#else
#define verify(EX) (void)((EX) || (__assert(#EX, __FILE__, __LINE__), 0))
#endif /* __STDC_VERSION__ - 0 >= 199901L */
#else
#define verify(EX) (void)((EX) || (_assert("EX", __FILE__, __LINE__), 0))
#endif /* __STDC__ */
#endif /* verify */
#undef VERIFY
#undef ASSERT
#define VERIFY verify
#define ASSERT assert
extern void __assert(const char *, const char *, int);
#include <stdarg.h>
static inline int
assfail(const char *buf, const char *file, int line)
libspl_assert(const char *buf, const char *file, const char *func, int line)
{
__assert(buf, file, line);
return (0);
fprintf(stderr, "%s\n", buf);
fprintf(stderr, "ASSERT at %s:%d:%s()", file, line, func);
abort();
}
/* BEGIN CSTYLED */
#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \
const TYPE __left = (TYPE)(LEFT); \
const TYPE __right = (TYPE)(RIGHT); \
if (!(__left OP __right)) { \
char *__buf = alloca(256); \
(void) snprintf(__buf, 256, "%s %s %s (0x%llx %s 0x%llx)", \
#LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
assfail(__buf, __FILE__, __LINE__); \
} \
/* printf version of libspl_assert */
static inline void
libspl_assertf(const char *file, const char *func, int line, char *format, ...)
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
fprintf(stderr, "\n");
fprintf(stderr, "ASSERT at %s:%d:%s()", file, line, func);
va_end(args);
abort();
}
#ifdef verify
#undef verify
#endif
#define VERIFY(cond) \
(void) ((!(cond)) && \
libspl_assert(#cond, __FILE__, __FUNCTION__, __LINE__))
#define verify(cond) \
(void) ((!(cond)) && \
libspl_assert(#cond, __FILE__, __FUNCTION__, __LINE__))
#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) \
do { \
const TYPE __left = (TYPE)(LEFT); \
const TYPE __right = (TYPE)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT); \
} while (0)
/* END CSTYLED */
#define VERIFY3S(x, y, z) VERIFY3_IMPL(x, y, z, int64_t)
#define VERIFY3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t)
#define VERIFY3P(x, y, z) VERIFY3_IMPL(x, y, z, uintptr_t)
#define VERIFY0(x) VERIFY3_IMPL(x, ==, 0, uint64_t)
#ifdef assert
#undef assert
#endif
#ifdef NDEBUG
#define ASSERT3S(x, y, z) ((void)0)
#define ASSERT3U(x, y, z) ((void)0)
#define ASSERT3P(x, y, z) ((void)0)
#define ASSERT0(x) ((void)0)
#define ASSERT(x) ((void)0)
#define assert(x) ((void)0)
#define ASSERTV(x)
#define IMPLY(A, B) ((void)0)
#define EQUIV(A, B) ((void)0)
@ -102,13 +99,17 @@ assfail(const char *buf, const char *file, int line)
#define ASSERT3U(x, y, z) VERIFY3U(x, y, z)
#define ASSERT3P(x, y, z) VERIFY3P(x, y, z)
#define ASSERT0(x) VERIFY0(x)
#define ASSERT(x) VERIFY(x)
#define assert(x) VERIFY(x)
#define ASSERTV(x) x
#define IMPLY(A, B) \
((void)(((!(A)) || (B)) || \
assfail("(" #A ") implies (" #B ")", __FILE__, __LINE__)))
libspl_assert("(" #A ") implies (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define EQUIV(A, B) \
((void)((!!(A) == !!(B)) || \
assfail("(" #A ") is equivalent to (" #B ")", __FILE__, __LINE__)))
libspl_assert("(" #A ") is equivalent to (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#endif /* NDEBUG */

View File

@ -42,7 +42,6 @@
#define makedevice(maj, min) makedev(maj, min)
#define _sysconf(a) sysconf(a)
#define __NORETURN __attribute__((noreturn))
/*
* Compatibility macros/typedefs needed for Solaris -> Linux port

View File

@ -27,10 +27,15 @@
#ifndef _LIBSPL_SYS_TYPES_H
#define _LIBSPL_SYS_TYPES_H
#if defined(HAVE_MAKEDEV_IN_SYSMACROS)
#include <sys/sysmacros.h>
#elif defined(HAVE_MAKEDEV_IN_MKDEV)
#include <sys/mkdev.h>
#endif
#include <sys/isa_defs.h>
#include <sys/feature_tests.h>
#include_next <sys/types.h>
#include <sys/param.h> /* for NBBY */
#include <sys/types32.h>
#include <sys/va_list.h>
@ -95,4 +100,6 @@ typedef union {
} lloff_t;
#endif
#include <sys/param.h> /* for NBBY */
#endif

View File

@ -3315,8 +3315,9 @@ zfs_check_snap_cb(zfs_handle_t *zhp, void *arg)
char name[ZFS_MAXNAMELEN];
int rv = 0;
(void) snprintf(name, sizeof (name),
"%s@%s", zhp->zfs_name, dd->snapname);
if (snprintf(name, sizeof (name), "%s@%s", zhp->zfs_name,
dd->snapname) >= sizeof (name))
return (EINVAL);
if (lzc_exists(name))
verify(nvlist_add_boolean(dd->nvl, name) == 0);
@ -3534,8 +3535,9 @@ zfs_snapshot_cb(zfs_handle_t *zhp, void *arg)
int rv = 0;
if (zfs_prop_get_int(zhp, ZFS_PROP_INCONSISTENT) == 0) {
(void) snprintf(name, sizeof (name),
"%s@%s", zfs_get_name(zhp), sd->sd_snapname);
if (snprintf(name, sizeof (name), "%s@%s", zfs_get_name(zhp),
sd->sd_snapname) >= sizeof (name))
return (EINVAL);
fnvlist_add_boolean(sd->sd_nvl, name);
@ -3889,7 +3891,6 @@ zfs_rename(zfs_handle_t *zhp, const char *target, boolean_t recursive,
}
if (recursive) {
parentname = zfs_strdup(zhp->zfs_hdl, zhp->zfs_name);
if (parentname == NULL) {
ret = -1;
@ -3902,8 +3903,7 @@ zfs_rename(zfs_handle_t *zhp, const char *target, boolean_t recursive,
ret = -1;
goto error;
}
} else {
} else if (zhp->zfs_type != ZFS_TYPE_SNAPSHOT) {
if ((cl = changelist_gather(zhp, ZFS_PROP_NAME, 0,
force_unmount ? MS_FORCE : 0)) == NULL)
return (-1);
@ -3952,23 +3952,23 @@ zfs_rename(zfs_handle_t *zhp, const char *target, boolean_t recursive,
* On failure, we still want to remount any filesystems that
* were previously mounted, so we don't alter the system state.
*/
if (!recursive)
if (cl != NULL)
(void) changelist_postfix(cl);
} else {
if (!recursive) {
if (cl != NULL) {
changelist_rename(cl, zfs_get_name(zhp), target);
ret = changelist_postfix(cl);
}
}
error:
if (parentname) {
if (parentname != NULL) {
free(parentname);
}
if (zhrp) {
if (zhrp != NULL) {
zfs_close(zhrp);
}
if (cl) {
if (cl != NULL) {
changelist_free(cl);
}
return (ret);
@ -4259,8 +4259,9 @@ zfs_hold_one(zfs_handle_t *zhp, void *arg)
char name[ZFS_MAXNAMELEN];
int rv = 0;
(void) snprintf(name, sizeof (name),
"%s@%s", zhp->zfs_name, ha->snapname);
if (snprintf(name, sizeof (name), "%s@%s", zhp->zfs_name,
ha->snapname) >= sizeof (name))
return (EINVAL);
if (lzc_exists(name))
fnvlist_add_string(ha->nvl, name, ha->tag);
@ -4379,8 +4380,11 @@ zfs_release_one(zfs_handle_t *zhp, void *arg)
int rv = 0;
nvlist_t *existing_holds;
(void) snprintf(name, sizeof (name),
"%s@%s", zhp->zfs_name, ha->snapname);
if (snprintf(name, sizeof (name), "%s@%s", zhp->zfs_name,
ha->snapname) >= sizeof (name)) {
ha->error = EINVAL;
rv = EINVAL;
}
if (lzc_get_holds(name, &existing_holds) != 0) {
ha->error = ENOENT;

View File

@ -97,6 +97,8 @@ typedef struct pool_list {
name_entry_t *names;
} pool_list_t;
#define DEV_BYID_PATH "/dev/disk/by-id/"
static char *
get_devid(const char *path)
{
@ -121,6 +123,40 @@ get_devid(const char *path)
return (ret);
}
/*
* Wait up to timeout_ms for udev to set up the device node. The device is
* considered ready when the provided path have been verified to exist and
* it has been allowed to settle. At this point the device the device can
* be accessed reliably. Depending on the complexity of the udev rules thisi
* process could take several seconds.
*/
int
zpool_label_disk_wait(char *path, int timeout_ms)
{
int settle_ms = 50;
long sleep_ms = 10;
hrtime_t start, settle;
struct stat64 statbuf;
start = gethrtime();
settle = 0;
do {
errno = 0;
if ((stat64(path, &statbuf) == 0) && (errno == 0)) {
if (settle == 0)
settle = gethrtime();
else if (NSEC2MSEC(gethrtime() - settle) >= settle_ms)
return (0);
} else if (errno != ENOENT) {
return (errno);
}
usleep(sleep_ms * MILLISEC);
} while (NSEC2MSEC(gethrtime() - start) < timeout_ms);
return (ENODEV);
}
/*
* Go through and fix up any path and/or devid information for the given vdev
@ -162,7 +198,6 @@ fix_paths(nvlist_t *nv, name_entry_t *names)
best = NULL;
for (ne = names; ne != NULL; ne = ne->ne_next) {
if (ne->ne_guid == guid) {
if (path == NULL) {
best = ne;
break;
@ -186,7 +221,7 @@ fix_paths(nvlist_t *nv, name_entry_t *names)
}
/* Prefer paths earlier in the search order. */
if (best->ne_num_labels == best->ne_num_labels &&
if (ne->ne_num_labels == best->ne_num_labels &&
ne->ne_order < best->ne_order) {
best = ne;
continue;
@ -352,6 +387,118 @@ add_config(libzfs_handle_t *hdl, pool_list_t *pl, const char *path,
return (0);
}
#ifdef HAVE_LIBBLKID
static int
add_path(libzfs_handle_t *hdl, pool_list_t *pools, uint64_t pool_guid,
uint64_t vdev_guid, const char *path, int order)
{
nvlist_t *label;
uint64_t guid;
int error, fd, num_labels;
fd = open64(path, O_RDONLY);
if (fd < 0)
return (errno);
error = zpool_read_label(fd, &label, &num_labels);
close(fd);
if (error || label == NULL)
return (ENOENT);
error = nvlist_lookup_uint64(label, ZPOOL_CONFIG_POOL_GUID, &guid);
if (error || guid != pool_guid) {
nvlist_free(label);
return (EINVAL);
}
error = nvlist_lookup_uint64(label, ZPOOL_CONFIG_GUID, &guid);
if (error || guid != vdev_guid) {
nvlist_free(label);
return (EINVAL);
}
error = add_config(hdl, pools, path, order, num_labels, label);
return (error);
}
static int
add_configs_from_label_impl(libzfs_handle_t *hdl, pool_list_t *pools,
nvlist_t *nvroot, uint64_t pool_guid, uint64_t vdev_guid)
{
char udevpath[MAXPATHLEN];
char *path;
nvlist_t **child;
uint_t c, children;
uint64_t guid;
int error;
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_CHILDREN,
&child, &children) == 0) {
for (c = 0; c < children; c++) {
error = add_configs_from_label_impl(hdl, pools,
child[c], pool_guid, vdev_guid);
if (error)
return (error);
}
return (0);
}
if (nvroot == NULL)
return (0);
error = nvlist_lookup_uint64(nvroot, ZPOOL_CONFIG_GUID, &guid);
if ((error != 0) || (guid != vdev_guid))
return (0);
error = nvlist_lookup_string(nvroot, ZPOOL_CONFIG_PATH, &path);
if (error == 0)
(void) add_path(hdl, pools, pool_guid, vdev_guid, path, 0);
error = nvlist_lookup_string(nvroot, ZPOOL_CONFIG_DEVID, &path);
if (error == 0) {
sprintf(udevpath, "%s%s", DEV_BYID_PATH, path);
(void) add_path(hdl, pools, pool_guid, vdev_guid, udevpath, 1);
}
return (0);
}
/*
* Given a disk label call add_config() for all known paths to the device
* as described by the label itself. The paths are added in the following
* priority order: 'path', 'devid', 'devnode'. As these alternate paths are
* added the labels are verified to make sure they refer to the same device.
*/
static int
add_configs_from_label(libzfs_handle_t *hdl, pool_list_t *pools,
char *devname, int num_labels, nvlist_t *label)
{
nvlist_t *nvroot;
uint64_t pool_guid;
uint64_t vdev_guid;
int error;
if (nvlist_lookup_nvlist(label, ZPOOL_CONFIG_VDEV_TREE, &nvroot) ||
nvlist_lookup_uint64(label, ZPOOL_CONFIG_POOL_GUID, &pool_guid) ||
nvlist_lookup_uint64(label, ZPOOL_CONFIG_GUID, &vdev_guid))
return (ENOENT);
/* Allow devlinks to stabilize so all paths are available. */
zpool_label_disk_wait(devname, DISK_LABEL_WAIT);
/* Add alternate paths as described by the label vdev_tree. */
(void) add_configs_from_label_impl(hdl, pools, nvroot,
pool_guid, vdev_guid);
/* Add the device node /dev/sdX path as a last resort. */
error = add_config(hdl, pools, devname, 100, num_labels, label);
return (error);
}
#endif /* HAVE_LIBBLKID */
/*
* Returns true if the named pool matches the given GUID.
*/
@ -975,9 +1122,7 @@ zpool_find_import_blkid(libzfs_handle_t *hdl, pool_list_t *pools)
blkid_cache cache;
blkid_dev_iterate iter;
blkid_dev dev;
const char *devname;
nvlist_t *config;
int fd, err, num_labels;
int err;
err = blkid_get_cache(&cache, NULL);
if (err != 0) {
@ -1008,25 +1153,23 @@ zpool_find_import_blkid(libzfs_handle_t *hdl, pool_list_t *pools)
}
while (blkid_dev_next(iter, &dev) == 0) {
devname = blkid_dev_devname(dev);
nvlist_t *label;
char *devname;
int fd, num_labels;
devname = (char *) blkid_dev_devname(dev);
if ((fd = open64(devname, O_RDONLY)) < 0)
continue;
err = zpool_read_label(fd, &config, &num_labels);
err = zpool_read_label(fd, &label, &num_labels);
(void) close(fd);
if (err != 0) {
(void) no_memory(hdl);
goto err_blkid3;
}
if (err || label == NULL)
continue;
if (config != NULL) {
err = add_config(hdl, pools, devname, 0,
num_labels, config);
if (err != 0)
goto err_blkid3;
}
add_configs_from_label(hdl, pools, devname, num_labels, label);
}
err = 0;
err_blkid3:
blkid_dev_iterate_end(iter);
@ -1194,16 +1337,33 @@ zpool_find_import_impl(libzfs_handle_t *hdl, importargs_t *iarg)
if (config != NULL) {
boolean_t matched = B_TRUE;
boolean_t aux = B_FALSE;
char *pname;
if ((iarg->poolname != NULL) &&
/*
* Check if it's a spare or l2cache device. If
* it is, we need to skip the name and guid
* check since they don't exist on aux device
* label.
*/
if (iarg->poolname != NULL ||
iarg->guid != 0) {
uint64_t state;
aux = nvlist_lookup_uint64(config,
ZPOOL_CONFIG_POOL_STATE,
&state) == 0 &&
(state == POOL_STATE_SPARE ||
state == POOL_STATE_L2CACHE);
}
if ((iarg->poolname != NULL) && !aux &&
(nvlist_lookup_string(config,
ZPOOL_CONFIG_POOL_NAME, &pname) == 0)) {
if (strcmp(iarg->poolname, pname))
matched = B_FALSE;
} else if (iarg->guid != 0) {
} else if (iarg->guid != 0 && !aux) {
uint64_t this_guid;
matched = nvlist_lookup_uint64(config,

View File

@ -204,8 +204,11 @@ zfs_iter_bookmarks(zfs_handle_t *zhp, zfs_iter_f func, void *data)
bmark_name = nvpair_name(pair);
bmark_props = fnvpair_value_nvlist(pair);
(void) snprintf(name, sizeof (name), "%s#%s", zhp->zfs_name,
bmark_name);
if (snprintf(name, sizeof (name), "%s#%s", zhp->zfs_name,
bmark_name) >= sizeof (name)) {
err = EINVAL;
goto out;
}
nzhp = make_bookmark_handle(zhp, name, bmark_props);
if (nzhp == NULL)

View File

@ -21,6 +21,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2014 by Delphix. All rights reserved.
*/
/*
@ -363,6 +364,14 @@ zfs_add_options(zfs_handle_t *zhp, char *options, int len)
error = zfs_add_option(zhp, options, len,
ZFS_PROP_ATIME, MNTOPT_ATIME, MNTOPT_NOATIME);
/*
* don't add relatime/strictatime when atime=off, otherwise strictatime
* will force atime=on
*/
if (strstr(options, MNTOPT_NOATIME) == NULL) {
error = zfs_add_option(zhp, options, len,
ZFS_PROP_RELATIME, MNTOPT_RELATIME, MNTOPT_STRICTATIME);
}
error = error ? error : zfs_add_option(zhp, options, len,
ZFS_PROP_DEVICES, MNTOPT_DEVICES, MNTOPT_NODEVICES);
error = error ? error : zfs_add_option(zhp, options, len,
@ -744,13 +753,6 @@ zfs_share_proto(zfs_handle_t *zhp, zfs_share_proto_t *proto)
if (!zfs_is_mountable(zhp, mountpoint, sizeof (mountpoint), NULL))
return (0);
if ((ret = zfs_init_libshare(hdl, SA_INIT_SHARE_API)) != SA_OK) {
(void) zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
dgettext(TEXT_DOMAIN, "cannot share '%s': %s"),
zfs_get_name(zhp), sa_errorstr(ret));
return (-1);
}
for (curr_proto = proto; *curr_proto != PROTO_END; curr_proto++) {
/*
* Return success if there are no share options.
@ -761,6 +763,14 @@ zfs_share_proto(zfs_handle_t *zhp, zfs_share_proto_t *proto)
strcmp(shareopts, "off") == 0)
continue;
ret = zfs_init_libshare(hdl, SA_INIT_SHARE_API);
if (ret != SA_OK) {
(void) zfs_error_fmt(hdl, EZFS_SHARENFSFAILED,
dgettext(TEXT_DOMAIN, "cannot share '%s': %s"),
zfs_get_name(zhp), sa_errorstr(ret));
return (-1);
}
/*
* If the 'zoned' property is set, then zfs_is_mountable()
* will have already bailed out if we are in the global zone.
@ -1072,7 +1082,7 @@ libzfs_dataset_cmp(const void *a, const void *b)
if (gotb)
return (1);
return (strcmp(zfs_get_name(a), zfs_get_name(b)));
return (strcmp(zfs_get_name(*za), zfs_get_name(*zb)));
}
/*

View File

@ -1891,7 +1891,12 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *config, const char *newname,
"one or more devices are already in use\n"));
(void) zfs_error(hdl, EZFS_BADDEV, desc);
break;
case ENAMETOOLONG:
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
"new name of at least one dataset is longer than "
"the maximum allowable length"));
(void) zfs_error(hdl, EZFS_NAMETOOLONG, desc);
break;
default:
(void) zpool_standard_error(hdl, error, desc);
zpool_explain_recover(hdl,
@ -4094,29 +4099,6 @@ find_start_block(nvlist_t *config)
return (MAXOFFSET_T);
}
int
zpool_label_disk_wait(char *path, int timeout)
{
struct stat64 statbuf;
int i;
/*
* Wait timeout miliseconds for a newly created device to be available
* from the given path. There is a small window when a /dev/ device
* will exist and the udev link will not, so we must wait for the
* symlink. Depending on the udev rules this may take a few seconds.
*/
for (i = 0; i < timeout; i++) {
usleep(1000);
errno = 0;
if ((stat64(path, &statbuf) == 0) && (errno == 0))
return (0);
}
return (ENOENT);
}
int
zpool_label_disk_check(char *path)
{
@ -4142,6 +4124,32 @@ zpool_label_disk_check(char *path)
return (0);
}
/*
* Generate a unique partition name for the ZFS member. Partitions must
* have unique names to ensure udev will be able to create symlinks under
* /dev/disk/by-partlabel/ for all pool members. The partition names are
* of the form <pool>-<unique-id>.
*/
static void
zpool_label_name(char *label_name, int label_size)
{
uint64_t id = 0;
int fd;
fd = open("/dev/urandom", O_RDONLY);
if (fd > 0) {
if (read(fd, &id, sizeof (id)) != sizeof (id))
id = 0;
close(fd);
}
if (id == 0)
id = (((uint64_t)rand()) << 32) | (uint64_t)rand();
snprintf(label_name, label_size, "zfs-%016llx", (u_longlong_t) id);
}
/*
* Label an individual disk. The name provided is the short name,
* stripped of any leading /dev path.
@ -4232,7 +4240,7 @@ zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, char *name)
* can get, in the absence of V_OTHER.
*/
vtoc->efi_parts[0].p_tag = V_USR;
(void) strcpy(vtoc->efi_parts[0].p_name, "zfs");
zpool_label_name(vtoc->efi_parts[0].p_name, EFI_PART_NAME_LEN);
vtoc->efi_parts[8].p_start = slice_size + start_block;
vtoc->efi_parts[8].p_size = resv;
@ -4256,12 +4264,11 @@ zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, char *name)
(void) close(fd);
efi_free(vtoc);
/* Wait for the first expected partition to appear. */
(void) snprintf(path, sizeof (path), "%s/%s", DISK_ROOT, name);
(void) zfs_append_partition(path, MAXPATHLEN);
rval = zpool_label_disk_wait(path, 3000);
/* Wait to udev to signal use the device has settled. */
rval = zpool_label_disk_wait(path, DISK_LABEL_WAIT);
if (rval) {
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, "failed to "
"detect device partitions on '%s': %d"), path, rval);

View File

@ -1487,9 +1487,13 @@ zfs_send(zfs_handle_t *zhp, const char *fromsnap, const char *tosnap,
drr_versioninfo, DMU_COMPOUNDSTREAM);
DMU_SET_FEATUREFLAGS(drr.drr_u.drr_begin.
drr_versioninfo, featureflags);
(void) snprintf(drr.drr_u.drr_begin.drr_toname,
if (snprintf(drr.drr_u.drr_begin.drr_toname,
sizeof (drr.drr_u.drr_begin.drr_toname),
"%s@%s", zhp->zfs_name, tosnap);
"%s@%s", zhp->zfs_name, tosnap) >=
sizeof (drr.drr_u.drr_begin.drr_toname)) {
err = EINVAL;
goto stderr_out;
}
drr.drr_payloadlen = buflen;
err = cksum_and_write(&drr, sizeof (drr), &zc, outfd);
@ -2689,7 +2693,8 @@ zfs_receive_one(libzfs_handle_t *hdl, int infd, const char *tosnap,
ENOENT);
if (stream_avl != NULL) {
char *snapname;
char *snapname = NULL;
nvlist_t *lookup = NULL;
nvlist_t *fs = fsavl_find(stream_avl, drrb->drr_toguid,
&snapname);
nvlist_t *props;
@ -2710,6 +2715,11 @@ zfs_receive_one(libzfs_handle_t *hdl, int infd, const char *tosnap,
nvlist_free(props);
if (ret != 0)
return (-1);
if (0 == nvlist_lookup_nvlist(fs, "snapprops", &lookup)) {
VERIFY(0 == nvlist_lookup_nvlist(lookup,
snapname, &snapprops_nvlist));
}
}
cp = NULL;

3
lib/libzfs/libzfs_util.c Normal file → Executable file
View File

@ -1352,7 +1352,8 @@ zprop_print_one_property(const char *name, zprop_get_cbdata_t *cbp,
continue;
}
if (cbp->cb_columns[i + 1] == GET_COL_NONE)
if (i == (ZFS_GET_NCOLS - 1) ||
cbp->cb_columns[i + 1] == GET_COL_NONE)
(void) printf("%s", str);
else if (cbp->cb_scripted)
(void) printf("%s\t", str);

View File

@ -24,6 +24,19 @@ Description of the different parameters to the ZFS module.
.sp
.LP
.sp
.ne 2
.na
\fBignore_hole_birth\fR (int)
.ad
.RS 12n
When set, the hole_birth optimization will not be used, and all holes will
always be sent on zfs send. Useful if you suspect your datasets are affected
by a bug in hole_birth.
.sp
Use \fB1\fR (default) for on and \fB0\fR for off.
.RE
.sp
.ne 2
.na
@ -870,7 +883,12 @@ Default value: \fB10\fR.
Minimum asynchronous write I/Os active to each device.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1\fR.
Lower values are associated with better latency on rotational media but poorer
resilver performance. The default value of 2 was chosen as a compromise. A
value of 3 has been shown to improve resilver performance further at a cost of
further increasing latency.
.sp
Default value: \fB2\fR.
.RE
.sp

View File

@ -630,7 +630,7 @@ avl_insert_here(
void
avl_add(avl_tree_t *tree, void *new_node)
{
avl_index_t where;
avl_index_t where = 0;
/*
* This is unfortunate. We want to call panic() here, even for

View File

@ -21,6 +21,7 @@
/*
* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2015, 2016 by Delphix. All rights reserved.
*/
#include <sys/stropts.h>
@ -138,6 +139,11 @@ static int nvlist_add_common(nvlist_t *nvl, const char *name, data_type_t type,
#define NVPAIR2I_NVP(nvp) \
((i_nvp_t *)((size_t)(nvp) - offsetof(i_nvp_t, nvi_nvp)))
#ifdef _KERNEL
int nvpair_max_recursion = 20;
#else
int nvpair_max_recursion = 100;
#endif
int
nv_alloc_init(nv_alloc_t *nva, const nv_alloc_ops_t *nvo, /* args */ ...)
@ -2017,6 +2023,7 @@ typedef struct {
const nvs_ops_t *nvs_ops;
void *nvs_private;
nvpriv_t *nvs_priv;
int nvs_recursion;
} nvstream_t;
/*
@ -2168,9 +2175,16 @@ static int
nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
{
switch (nvs->nvs_op) {
case NVS_OP_ENCODE:
return (nvs_operation(nvs, embedded, NULL));
case NVS_OP_ENCODE: {
int err;
if (nvs->nvs_recursion >= nvpair_max_recursion)
return (EINVAL);
nvs->nvs_recursion++;
err = nvs_operation(nvs, embedded, NULL);
nvs->nvs_recursion--;
return (err);
}
case NVS_OP_DECODE: {
nvpriv_t *priv;
int err;
@ -2183,8 +2197,12 @@ nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
nvlist_init(embedded, embedded->nvl_nvflag, priv);
if (nvs->nvs_recursion >= nvpair_max_recursion)
return (EINVAL);
nvs->nvs_recursion++;
if ((err = nvs_operation(nvs, embedded, NULL)) != 0)
nvlist_free(embedded);
nvs->nvs_recursion--;
return (err);
}
default:
@ -2272,6 +2290,7 @@ nvlist_common(nvlist_t *nvl, char *buf, size_t *buflen, int encoding,
return (EINVAL);
nvs.nvs_op = nvs_op;
nvs.nvs_recursion = 0;
/*
* For NVS_OP_ENCODE and NVS_OP_DECODE make sure an nvlist and

View File

@ -1451,6 +1451,13 @@ arc_buf_info(arc_buf_t *ab, arc_buf_info_t *abi, int state_index)
l2arc_buf_hdr_t *l2hdr = NULL;
arc_state_t *state = NULL;
memset(abi, 0, sizeof (arc_buf_info_t));
if (hdr == NULL)
return;
abi->abi_flags = hdr->b_flags;
if (HDR_HAS_L1HDR(hdr)) {
l1hdr = &hdr->b_l1hdr;
state = l1hdr->b_state;
@ -1458,9 +1465,6 @@ arc_buf_info(arc_buf_t *ab, arc_buf_info_t *abi, int state_index)
if (HDR_HAS_L2HDR(hdr))
l2hdr = &hdr->b_l2hdr;
memset(abi, 0, sizeof (arc_buf_info_t));
abi->abi_flags = hdr->b_flags;
if (l1hdr) {
abi->abi_datacnt = l1hdr->b_datacnt;
abi->abi_access = l1hdr->b_arc_access;
@ -2697,12 +2701,7 @@ arc_prune_task(void *ptr)
if (func != NULL)
func(ap->p_adjust, ap->p_private);
/* Callback unregistered concurrently with execution */
if (refcount_remove(&ap->p_refcnt, func) == 0) {
ASSERT(!list_link_active(&ap->p_node));
refcount_destroy(&ap->p_refcnt);
kmem_free(ap, sizeof (*ap));
}
refcount_remove(&ap->p_refcnt, func);
}
/*
@ -4320,17 +4319,11 @@ top:
/*
* Gracefully handle a damaged logical block size as a
* checksum error by passing a dummy zio to the done callback.
* checksum error.
*/
if (size > spa_maxblocksize(spa)) {
if (done) {
rzio = zio_null(pio, spa, NULL,
NULL, NULL, zio_flags);
rzio->io_error = ECKSUM;
done(rzio, buf, private);
zio_nowait(rzio);
}
rc = ECKSUM;
ASSERT3P(buf, ==, NULL);
rc = SET_ERROR(ECKSUM);
goto out;
}
@ -4568,13 +4561,19 @@ arc_add_prune_callback(arc_prune_func_t *func, void *private)
void
arc_remove_prune_callback(arc_prune_t *p)
{
boolean_t wait = B_FALSE;
mutex_enter(&arc_prune_mtx);
list_remove(&arc_prune_list, p);
if (refcount_remove(&p->p_refcnt, &arc_prune_list) == 0) {
refcount_destroy(&p->p_refcnt);
kmem_free(p, sizeof (*p));
}
if (refcount_remove(&p->p_refcnt, &arc_prune_list) > 0)
wait = B_TRUE;
mutex_exit(&arc_prune_mtx);
/* wait for arc_prune_task to finish */
if (wait)
taskq_wait_outstanding(arc_prune_taskq, 0);
ASSERT0(refcount_count(&p->p_refcnt));
refcount_destroy(&p->p_refcnt);
kmem_free(p, sizeof (*p));
}
void
@ -5250,7 +5249,7 @@ arc_tuning_update(void)
arc_c_max = zfs_arc_max;
arc_c = arc_c_max;
arc_p = (arc_c >> 1);
arc_meta_limit = MIN(arc_meta_limit, arc_c_max);
arc_meta_limit = MIN(arc_meta_limit, (3 * arc_c_max) / 4);
}
/* Valid range: 32M - <arc_c_max> */
@ -5476,11 +5475,12 @@ arc_init(void)
* If it has been set by a module parameter, take that.
* Otherwise, use a percentage of physical memory defined by
* zfs_dirty_data_max_percent (default 10%) with a cap at
* zfs_dirty_data_max_max (default 25% of physical memory).
* zfs_dirty_data_max_max (default 4G or 25% of physical memory).
*/
if (zfs_dirty_data_max_max == 0)
zfs_dirty_data_max_max = (uint64_t)physmem * PAGESIZE *
zfs_dirty_data_max_max_percent / 100;
zfs_dirty_data_max_max = MIN(4ULL * 1024 * 1024 * 1024,
(uint64_t)physmem * PAGESIZE *
zfs_dirty_data_max_max_percent / 100);
if (zfs_dirty_data_max == 0) {
zfs_dirty_data_max = (uint64_t)physmem * PAGESIZE *

View File

@ -2628,6 +2628,22 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx)
if (db->db_blkid == DMU_SPILL_BLKID) {
mutex_enter(&dn->dn_mtx);
if (!(dn->dn_phys->dn_flags & DNODE_FLAG_SPILL_BLKPTR)) {
/*
* In the previous transaction group, the bonus buffer
* was entirely used to store the attributes for the
* dnode which overrode the dn_spill field. However,
* when adding more attributes to the file a spill
* block was required to hold the extra attributes.
*
* Make sure to clear the garbage left in the dn_spill
* field from the previous attributes in the bonus
* buffer. Otherwise, after writing out the spill
* block to the new allocated dva, it will free
* the old block pointed to by the invalid dn_spill.
*/
db->db_blkptr = NULL;
}
dn->dn_phys->dn_flags |= DNODE_FLAG_SPILL_BLKPTR;
mutex_exit(&dn->dn_mtx);
}

View File

@ -148,7 +148,6 @@ dbuf_stats_hash_table_data(char *buf, size_t size, void *data)
}
mutex_enter(&db->db_mtx);
mutex_exit(DBUF_HASH_MUTEX(h, dsh->idx));
if (db->db_state != DB_EVICTING) {
length = __dbuf_stats_hash_table_data(buf, size, db);
@ -157,7 +156,6 @@ dbuf_stats_hash_table_data(char *buf, size_t size, void *data)
}
mutex_exit(&db->db_mtx);
mutex_enter(DBUF_HASH_MUTEX(h, dsh->idx));
}
mutex_exit(DBUF_HASH_MUTEX(h, dsh->idx));

View File

@ -49,6 +49,7 @@
#ifdef _KERNEL
#include <sys/vmsystm.h>
#include <sys/zfs_znode.h>
#include <linux/kmap_compat.h>
#endif
/*
@ -1056,6 +1057,7 @@ dmu_bio_copy(void *arg_buf, int size, struct bio *bio, size_t bio_offset)
char *bv_buf;
int tocpy, bv_len, bv_offset;
int offset = 0;
void *paddr;
bio_for_each_segment4(bv, bvp, bio, iter) {
@ -1080,14 +1082,15 @@ dmu_bio_copy(void *arg_buf, int size, struct bio *bio, size_t bio_offset)
tocpy = MIN(bv_len, size - offset);
ASSERT3S(tocpy, >=, 0);
bv_buf = page_address(bvp->bv_page) + bv_offset;
ASSERT3P(bv_buf, !=, NULL);
paddr = zfs_kmap_atomic(bvp->bv_page, KM_USER0);
bv_buf = paddr + bv_offset;
ASSERT3P(paddr, !=, NULL);
if (bio_data_dir(bio) == WRITE)
memcpy(arg_buf + offset, bv_buf, tocpy);
else
memcpy(bv_buf, arg_buf + offset, tocpy);
zfs_kunmap_atomic(paddr, KM_USER0);
offset += tocpy;
}
out:

View File

@ -69,7 +69,7 @@ typedef struct dump_bytes_io {
} dump_bytes_io_t;
static void
dump_bytes_strategy(void *arg)
dump_bytes_cb(void *arg)
{
dump_bytes_io_t *dbi = (dump_bytes_io_t *)arg;
dmu_sendarg_t *dsp = dbi->dbi_dsp;
@ -96,6 +96,9 @@ dump_bytes(dmu_sendarg_t *dsp, void *buf, int len)
dbi.dbi_buf = buf;
dbi.dbi_len = len;
#if defined(HAVE_LARGE_STACKS)
dump_bytes_cb(&dbi);
#else
/*
* The vn_rdwr() call is performed in a taskq to ensure that there is
* always enough stack space to write safely to the target filesystem.
@ -103,7 +106,8 @@ dump_bytes(dmu_sendarg_t *dsp, void *buf, int len)
* them and they are used in vdev_file.c for a similar purpose.
*/
spa_taskq_dispatch_sync(dmu_objset_spa(dsp->dsa_os), ZIO_TYPE_FREE,
ZIO_TASKQ_ISSUE, dump_bytes_strategy, &dbi, TQ_SLEEP);
ZIO_TASKQ_ISSUE, dump_bytes_cb, &dbi, TQ_SLEEP);
#endif /* HAVE_LARGE_STACKS */
return (dsp->dsa_err);
}

View File

@ -39,6 +39,7 @@
#include <sys/zfeature.h>
int32_t zfs_pd_bytes_max = 50 * 1024 * 1024; /* 50MB */
int32_t ignore_hole_birth = 1;
typedef struct prefetch_data {
kmutex_t pd_mtx;
@ -250,7 +251,7 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp,
*
* Note that the meta-dnode cannot be reallocated.
*/
if ((!td->td_realloc_possible ||
if (!ignore_hole_birth && (!td->td_realloc_possible ||
zb->zb_object == DMU_META_DNODE_OBJECT) &&
td->td_hole_birth_enabled_txg <= td->td_min_txg)
return (0);
@ -692,4 +693,7 @@ EXPORT_SYMBOL(traverse_pool);
module_param(zfs_pd_bytes_max, int, 0644);
MODULE_PARM_DESC(zfs_pd_bytes_max, "Max number of bytes to prefetch");
module_param(ignore_hole_birth, int, 0644);
MODULE_PARM_DESC(ignore_hole_birth, "Ignore hole_birth txg for send");
#endif

View File

@ -665,6 +665,21 @@ dsl_dataset_name(dsl_dataset_t *ds, char *name)
}
}
int
dsl_dataset_namelen(dsl_dataset_t *ds)
{
int len;
VERIFY0(dsl_dataset_get_snapname(ds));
mutex_enter(&ds->ds_lock);
len = strlen(ds->ds_snapname);
/* add '@' if ds is a snap */
if (len > 0)
len++;
len += dsl_dir_namelen(ds->ds_dir);
mutex_exit(&ds->ds_lock);
return (len);
}
void
dsl_dataset_rele(dsl_dataset_t *ds, void *tag)
{

14
module/zfs/dsl_pool.c Normal file → Executable file
View File

@ -182,12 +182,20 @@ dsl_pool_init(spa_t *spa, uint64_t txg, dsl_pool_t **dpp)
int err;
dsl_pool_t *dp = dsl_pool_open_impl(spa, txg);
/*
* Initialize the caller's dsl_pool_t structure before we actually open
* the meta objset. This is done because a self-healing write zio may
* be issued as part of dmu_objset_open_impl() and the spa needs its
* dsl_pool_t initialized in order to handle the write.
*/
*dpp = dp;
err = dmu_objset_open_impl(spa, NULL, &dp->dp_meta_rootbp,
&dp->dp_meta_objset);
if (err != 0)
if (err != 0) {
dsl_pool_close(dp);
else
*dpp = dp;
*dpp = NULL;
}
return (err);
}

View File

@ -137,7 +137,7 @@ refcount_add_many(refcount_t *rc, uint64_t number, void *holder)
}
int64_t
refcount_add(refcount_t *rc, void *holder)
zfs_refcount_add(refcount_t *rc, void *holder)
{
return (refcount_add_many(rc, 1, holder));
}

View File

@ -845,7 +845,7 @@ spa_taskqs_init(spa_t *spa, zio_type_t t, zio_taskq_type_t q)
uint_t count = ztip->zti_count;
spa_taskqs_t *tqs = &spa->spa_zio_taskq[t][q];
char name[32];
uint_t i, flags = TASKQ_DYNAMIC;
uint_t i, flags = 0;
boolean_t batch = B_FALSE;
if (mode == ZTI_MODE_NULL) {
@ -863,6 +863,7 @@ spa_taskqs_init(spa_t *spa, zio_type_t t, zio_taskq_type_t q)
case ZTI_MODE_FIXED:
ASSERT3U(value, >=, 1);
value = MAX(value, 1);
flags |= TASKQ_DYNAMIC;
break;
case ZTI_MODE_BATCH:
@ -1974,6 +1975,16 @@ spa_load_verify_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
return (0);
}
/* ARGSUSED */
int
verify_dataset_name_len(dsl_pool_t *dp, dsl_dataset_t *ds, void *arg)
{
if (dsl_dataset_namelen(ds) >= ZFS_MAX_DATASET_NAME_LEN)
return (SET_ERROR(ENAMETOOLONG));
return (0);
}
static int
spa_load_verify(spa_t *spa)
{
@ -1988,6 +1999,14 @@ spa_load_verify(spa_t *spa)
if (policy.zrp_request & ZPOOL_NEVER_REWIND)
return (0);
dsl_pool_config_enter(spa->spa_dsl_pool, FTAG);
error = dmu_objset_find_dp(spa->spa_dsl_pool,
spa->spa_dsl_pool->dp_root_dir_obj, verify_dataset_name_len, NULL,
DS_FIND_CHILDREN);
dsl_pool_config_exit(spa->spa_dsl_pool, FTAG);
if (error != 0)
return (error);
rio = zio_root(spa, NULL, &sle,
ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE);
@ -3844,211 +3863,6 @@ spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props,
return (0);
}
#ifdef _KERNEL
/*
* Get the root pool information from the root disk, then import the root pool
* during the system boot up time.
*/
extern int vdev_disk_read_rootlabel(char *, char *, nvlist_t **);
static nvlist_t *
spa_generate_rootconf(char *devpath, char *devid, uint64_t *guid)
{
nvlist_t *config;
nvlist_t *nvtop, *nvroot;
uint64_t pgid;
if (vdev_disk_read_rootlabel(devpath, devid, &config) != 0)
return (NULL);
/*
* Add this top-level vdev to the child array.
*/
VERIFY(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvtop) == 0);
VERIFY(nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&pgid) == 0);
VERIFY(nvlist_lookup_uint64(config, ZPOOL_CONFIG_GUID, guid) == 0);
/*
* Put this pool's top-level vdevs into a root vdev.
*/
VERIFY(nvlist_alloc(&nvroot, NV_UNIQUE_NAME, KM_SLEEP) == 0);
VERIFY(nvlist_add_string(nvroot, ZPOOL_CONFIG_TYPE,
VDEV_TYPE_ROOT) == 0);
VERIFY(nvlist_add_uint64(nvroot, ZPOOL_CONFIG_ID, 0ULL) == 0);
VERIFY(nvlist_add_uint64(nvroot, ZPOOL_CONFIG_GUID, pgid) == 0);
VERIFY(nvlist_add_nvlist_array(nvroot, ZPOOL_CONFIG_CHILDREN,
&nvtop, 1) == 0);
/*
* Replace the existing vdev_tree with the new root vdev in
* this pool's configuration (remove the old, add the new).
*/
VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, nvroot) == 0);
nvlist_free(nvroot);
return (config);
}
/*
* Walk the vdev tree and see if we can find a device with "better"
* configuration. A configuration is "better" if the label on that
* device has a more recent txg.
*/
static void
spa_alt_rootvdev(vdev_t *vd, vdev_t **avd, uint64_t *txg)
{
int c;
for (c = 0; c < vd->vdev_children; c++)
spa_alt_rootvdev(vd->vdev_child[c], avd, txg);
if (vd->vdev_ops->vdev_op_leaf) {
nvlist_t *label;
uint64_t label_txg;
if (vdev_disk_read_rootlabel(vd->vdev_physpath, vd->vdev_devid,
&label) != 0)
return;
VERIFY(nvlist_lookup_uint64(label, ZPOOL_CONFIG_POOL_TXG,
&label_txg) == 0);
/*
* Do we have a better boot device?
*/
if (label_txg > *txg) {
*txg = label_txg;
*avd = vd;
}
nvlist_free(label);
}
}
/*
* Import a root pool.
*
* For x86. devpath_list will consist of devid and/or physpath name of
* the vdev (e.g. "id1,sd@SSEAGATE..." or "/pci@1f,0/ide@d/disk@0,0:a").
* The GRUB "findroot" command will return the vdev we should boot.
*
* For Sparc, devpath_list consists the physpath name of the booting device
* no matter the rootpool is a single device pool or a mirrored pool.
* e.g.
* "/pci@1f,0/ide@d/disk@0,0:a"
*/
int
spa_import_rootpool(char *devpath, char *devid)
{
spa_t *spa;
vdev_t *rvd, *bvd, *avd = NULL;
nvlist_t *config, *nvtop;
uint64_t guid, txg;
char *pname;
int error;
/*
* Read the label from the boot device and generate a configuration.
*/
config = spa_generate_rootconf(devpath, devid, &guid);
#if defined(_OBP) && defined(_KERNEL)
if (config == NULL) {
if (strstr(devpath, "/iscsi/ssd") != NULL) {
/* iscsi boot */
get_iscsi_bootpath_phy(devpath);
config = spa_generate_rootconf(devpath, devid, &guid);
}
}
#endif
if (config == NULL) {
cmn_err(CE_NOTE, "Cannot read the pool label from '%s'",
devpath);
return (SET_ERROR(EIO));
}
VERIFY(nvlist_lookup_string(config, ZPOOL_CONFIG_POOL_NAME,
&pname) == 0);
VERIFY(nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_TXG, &txg) == 0);
mutex_enter(&spa_namespace_lock);
if ((spa = spa_lookup(pname)) != NULL) {
/*
* Remove the existing root pool from the namespace so that we
* can replace it with the correct config we just read in.
*/
spa_remove(spa);
}
spa = spa_add(pname, config, NULL);
spa->spa_is_root = B_TRUE;
spa->spa_import_flags = ZFS_IMPORT_VERBATIM;
/*
* Build up a vdev tree based on the boot device's label config.
*/
VERIFY(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvtop) == 0);
spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);
error = spa_config_parse(spa, &rvd, nvtop, NULL, 0,
VDEV_ALLOC_ROOTPOOL);
spa_config_exit(spa, SCL_ALL, FTAG);
if (error) {
mutex_exit(&spa_namespace_lock);
nvlist_free(config);
cmn_err(CE_NOTE, "Can not parse the config for pool '%s'",
pname);
return (error);
}
/*
* Get the boot vdev.
*/
if ((bvd = vdev_lookup_by_guid(rvd, guid)) == NULL) {
cmn_err(CE_NOTE, "Can not find the boot vdev for guid %llu",
(u_longlong_t)guid);
error = SET_ERROR(ENOENT);
goto out;
}
/*
* Determine if there is a better boot device.
*/
avd = bvd;
spa_alt_rootvdev(rvd, &avd, &txg);
if (avd != bvd) {
cmn_err(CE_NOTE, "The boot device is 'degraded'. Please "
"try booting from '%s'", avd->vdev_path);
error = SET_ERROR(EINVAL);
goto out;
}
/*
* If the boot device is part of a spare vdev then ensure that
* we're booting off the active spare.
*/
if (bvd->vdev_parent->vdev_ops == &vdev_spare_ops &&
!bvd->vdev_isspare) {
cmn_err(CE_NOTE, "The boot device is currently spared. Please "
"try booting from '%s'",
bvd->vdev_parent->
vdev_child[bvd->vdev_parent->vdev_children - 1]->vdev_path);
error = SET_ERROR(EINVAL);
goto out;
}
error = 0;
out:
spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);
vdev_free(rvd);
spa_config_exit(spa, SCL_ALL, FTAG);
mutex_exit(&spa_namespace_lock);
nvlist_free(config);
return (error);
}
#endif
/*
* Import a non-root pool into the system.
*/
@ -4148,12 +3962,6 @@ spa_import(char *pool, nvlist_t *config, nvlist_t *props, uint64_t flags)
VERIFY(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvroot) == 0);
if (error == 0)
error = spa_validate_aux(spa, nvroot, -1ULL,
VDEV_ALLOC_SPARE);
if (error == 0)
error = spa_validate_aux(spa, nvroot, -1ULL,
VDEV_ALLOC_L2CACHE);
spa_config_exit(spa, SCL_ALL, FTAG);
if (props != NULL)
@ -6762,7 +6570,6 @@ EXPORT_SYMBOL(spa_open);
EXPORT_SYMBOL(spa_open_rewind);
EXPORT_SYMBOL(spa_get_stats);
EXPORT_SYMBOL(spa_create);
EXPORT_SYMBOL(spa_import_rootpool);
EXPORT_SYMBOL(spa_import);
EXPORT_SYMBOL(spa_tryimport);
EXPORT_SYMBOL(spa_destroy);

View File

@ -174,7 +174,7 @@ spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
VERIFY(nvlist_pack(nvl, &buf, &buflen, NV_ENCODE_XDR,
KM_SLEEP) == 0);
#ifdef __linux__
#if defined(__linux__) && defined(_KERNEL)
/*
* Write the configuration to disk. Due to the complexity involved
* in performing a rename from within the kernel the file is truncated
@ -201,7 +201,8 @@ spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
*/
(void) snprintf(temp, MAXPATHLEN, "%s.tmp", dp->scd_path);
if (vn_open(temp, UIO_SYSSPACE, oflags, 0644, &vp, CRCREAT, 0) == 0) {
error = vn_open(temp, UIO_SYSSPACE, oflags, 0644, &vp, CRCREAT, 0);
if (error == 0) {
if (vn_rdwr(UIO_WRITE, vp, buf, buflen, 0, UIO_SYSSPACE,
0, RLIM64_INFINITY, kcred, NULL) == 0 &&
VOP_FSYNC(vp, FSYNC, kcred, NULL) == 0) {

View File

@ -877,11 +877,11 @@ vdev_metaslab_init(vdev_t *vd, uint64_t txg)
ASSERT(oldc <= newc);
mspp = kmem_zalloc(newc * sizeof (*mspp), KM_SLEEP);
mspp = vmem_zalloc(newc * sizeof (*mspp), KM_SLEEP);
if (oldc != 0) {
bcopy(vd->vdev_ms, mspp, oldc * sizeof (*mspp));
kmem_free(vd->vdev_ms, oldc * sizeof (*mspp));
vmem_free(vd->vdev_ms, oldc * sizeof (*mspp));
}
vd->vdev_ms = mspp;
@ -935,7 +935,7 @@ vdev_metaslab_fini(vdev_t *vd)
if (msp != NULL)
metaslab_fini(msp);
}
kmem_free(vd->vdev_ms, count * sizeof (metaslab_t *));
vmem_free(vd->vdev_ms, count * sizeof (metaslab_t *));
vd->vdev_ms = NULL;
}
@ -1407,7 +1407,7 @@ vdev_validate(vdev_t *vd, boolean_t strict)
spa_last_synced_txg(spa) : -1ULL;
if ((label = vdev_label_read_config(vd, txg)) == NULL) {
vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN,
VDEV_AUX_BAD_LABEL);
return (0);
}
@ -1799,6 +1799,9 @@ vdev_dtl_should_excise(vdev_t *vd)
ASSERT0(scn->scn_phys.scn_errors);
ASSERT0(vd->vdev_children);
if (vd->vdev_state < VDEV_STATE_DEGRADED)
return (B_FALSE);
if (vd->vdev_resilver_txg == 0 ||
range_tree_space(vd->vdev_dtl[DTL_MISSING]) == 0)
return (B_TRUE);
@ -1894,12 +1897,15 @@ vdev_dtl_reassess(vdev_t *vd, uint64_t txg, uint64_t scrub_txg, int scrub_done)
/*
* If the vdev was resilvering and no longer has any
* DTLs then reset its resilvering flag.
* DTLs then reset its resilvering flag and dirty
* the top level so that we persist the change.
*/
if (vd->vdev_resilver_txg != 0 &&
range_tree_space(vd->vdev_dtl[DTL_MISSING]) == 0 &&
range_tree_space(vd->vdev_dtl[DTL_OUTAGE]) == 0)
range_tree_space(vd->vdev_dtl[DTL_OUTAGE]) == 0) {
vd->vdev_resilver_txg = 0;
vdev_config_dirty(vd->vdev_top);
}
mutex_exit(&vd->vdev_dtl_lock);

View File

@ -41,10 +41,8 @@ static void *zfs_vdev_holder = VDEV_HOLDER;
* Virtual device vector for disks.
*/
typedef struct dio_request {
struct completion dr_comp; /* Completion for sync IO */
zio_t *dr_zio; /* Parent ZIO */
atomic_t dr_ref; /* References */
int dr_wait; /* Wait for IO */
int dr_error; /* Bio error */
int dr_bio_count; /* Count of bio's */
struct bio *dr_bio[0]; /* Attached bio's */
@ -139,7 +137,7 @@ vdev_elevator_switch(vdev_t *v, char *elevator)
return (0);
/* Leave existing scheduler when set to "none" */
if (strncmp(elevator, "none", 4) && (strlen(elevator) == 4) == 0)
if ((strncmp(elevator, "none", 4) == 0) && (strlen(elevator) == 4))
return (0);
#ifdef HAVE_ELEVATOR_CHANGE
@ -244,12 +242,12 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
{
struct block_device *bdev = ERR_PTR(-ENXIO);
vdev_disk_t *vd;
int mode, block_size;
int count = 0, mode, block_size;
/* Must have a pathname and it must be absolute. */
if (v->vdev_path == NULL || v->vdev_path[0] != '/') {
v->vdev_stat.vs_aux = VDEV_AUX_BAD_LABEL;
return (EINVAL);
return (SET_ERROR(EINVAL));
}
/*
@ -264,7 +262,7 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
vd = kmem_zalloc(sizeof (vdev_disk_t), KM_SLEEP);
if (vd == NULL)
return (ENOMEM);
return (SET_ERROR(ENOMEM));
/*
* Devices are always opened by the path provided at configuration
@ -279,16 +277,35 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
* /dev/[hd]d devices which may be reordered due to probing order.
* Devices in the wrong locations will be detected by the higher
* level vdev validation.
*
* The specified paths may be briefly removed and recreated in
* response to udev events. This should be exceptionally unlikely
* because the zpool command makes every effort to verify these paths
* have already settled prior to reaching this point. Therefore,
* a ENOENT failure at this point is highly likely to be transient
* and it is reasonable to sleep and retry before giving up. In
* practice delays have been observed to be on the order of 100ms.
*/
mode = spa_mode(v->vdev_spa);
if (v->vdev_wholedisk && v->vdev_expanding)
bdev = vdev_disk_rrpart(v->vdev_path, mode, vd);
if (IS_ERR(bdev))
while (IS_ERR(bdev) && count < 50) {
bdev = vdev_bdev_open(v->vdev_path,
vdev_bdev_mode(mode), zfs_vdev_holder);
if (unlikely(PTR_ERR(bdev) == -ENOENT)) {
msleep(10);
count++;
} else if (IS_ERR(bdev)) {
break;
}
}
if (IS_ERR(bdev)) {
dprintf("failed open v->vdev_path=%s, error=%d count=%d\n",
v->vdev_path, -PTR_ERR(bdev), count);
kmem_free(vd, sizeof (vdev_disk_t));
return (-PTR_ERR(bdev));
return (SET_ERROR(-PTR_ERR(bdev)));
}
v->vdev_tsd = vd;
@ -344,7 +361,6 @@ vdev_disk_dio_alloc(int bio_count)
dr = kmem_zalloc(sizeof (dio_request_t) +
sizeof (struct bio *) * bio_count, KM_SLEEP);
if (dr) {
init_completion(&dr->dr_comp);
atomic_set(&dr->dr_ref, 0);
dr->dr_bio_count = bio_count;
dr->dr_error = 0;
@ -407,7 +423,6 @@ BIO_END_IO_PROTO(vdev_disk_physio_completion, bio, error)
{
dio_request_t *dr = bio->bi_private;
int rc;
int wait;
if (dr->dr_error == 0) {
#ifdef HAVE_1ARG_BIO_END_IO_T
@ -420,13 +435,8 @@ BIO_END_IO_PROTO(vdev_disk_physio_completion, bio, error)
#endif
}
wait = dr->dr_wait;
/* Drop reference aquired by __vdev_disk_physio */
rc = vdev_disk_dio_put(dr);
/* Wake up synchronous waiter this is the last outstanding bio */
if (wait && rc == 1)
complete(&dr->dr_comp);
}
static inline unsigned long
@ -476,30 +486,43 @@ bio_map(struct bio *bio, void *bio_ptr, unsigned int bio_size)
}
static inline void
vdev_submit_bio(int rw, struct bio *bio)
vdev_submit_bio_impl(struct bio *bio)
{
#ifdef HAVE_1ARG_SUBMIT_BIO
submit_bio(bio);
#else
submit_bio(0, bio);
#endif
}
static inline void
vdev_submit_bio(struct bio *bio)
{
#ifdef HAVE_CURRENT_BIO_TAIL
struct bio **bio_tail = current->bio_tail;
current->bio_tail = NULL;
submit_bio(rw, bio);
vdev_submit_bio_impl(bio);
current->bio_tail = bio_tail;
#else
struct bio_list *bio_list = current->bio_list;
current->bio_list = NULL;
submit_bio(rw, bio);
vdev_submit_bio_impl(bio);
current->bio_list = bio_list;
#endif
}
static int
__vdev_disk_physio(struct block_device *bdev, zio_t *zio, caddr_t kbuf_ptr,
size_t kbuf_size, uint64_t kbuf_offset, int flags, int wait)
size_t kbuf_size, uint64_t kbuf_offset, int rw, int flags)
{
dio_request_t *dr;
caddr_t bio_ptr;
uint64_t bio_offset;
int rw, bio_size, bio_count = 16;
int bio_size, bio_count = 16;
int i = 0, error = 0;
#if defined(HAVE_BLK_QUEUE_HAVE_BLK_PLUG)
struct blk_plug plug;
#endif
ASSERT3U(kbuf_offset + kbuf_size, <=, bdev->bd_inode->i_size);
@ -511,9 +534,7 @@ retry:
if (zio && !(zio->io_flags & (ZIO_FLAG_IO_RETRY | ZIO_FLAG_TRYHARD)))
bio_set_flags_failfast(bdev, &flags);
rw = flags;
dr->dr_zio = zio;
dr->dr_wait = wait;
/*
* When the IO size exceeds the maximum bio size for the request
@ -555,9 +576,9 @@ retry:
dr->dr_bio[i]->bi_bdev = bdev;
BIO_BI_SECTOR(dr->dr_bio[i]) = bio_offset >> 9;
dr->dr_bio[i]->bi_rw = rw;
dr->dr_bio[i]->bi_end_io = vdev_disk_physio_completion;
dr->dr_bio[i]->bi_private = dr;
bio_set_op_attrs(dr->dr_bio[i], rw, flags);
/* Remaining size is returned to become the new size */
bio_size = bio_map(dr->dr_bio[i], bio_ptr, bio_size);
@ -572,38 +593,26 @@ retry:
if (zio)
zio->io_delay = jiffies_64;
#if defined(HAVE_BLK_QUEUE_HAVE_BLK_PLUG)
if (dr->dr_bio_count > 1)
blk_start_plug(&plug);
#endif
/* Submit all bio's associated with this dio */
for (i = 0; i < dr->dr_bio_count; i++)
if (dr->dr_bio[i])
vdev_submit_bio(rw, dr->dr_bio[i]);
vdev_submit_bio(dr->dr_bio[i]);
/*
* On synchronous blocking requests we wait for all bio the completion
* callbacks to run. We will be woken when the last callback runs
* for this dio. We are responsible for putting the last dio_request
* reference will in turn put back the last bio references. The
* only synchronous consumer is vdev_disk_read_rootlabel() all other
* IO originating from vdev_disk_io_start() is asynchronous.
*/
if (wait) {
wait_for_completion(&dr->dr_comp);
error = dr->dr_error;
ASSERT3S(atomic_read(&dr->dr_ref), ==, 1);
}
#if defined(HAVE_BLK_QUEUE_HAVE_BLK_PLUG)
if (dr->dr_bio_count > 1)
blk_finish_plug(&plug);
#endif
(void) vdev_disk_dio_put(dr);
return (error);
}
int
vdev_disk_physio(struct block_device *bdev, caddr_t kbuf,
size_t size, uint64_t offset, int flags)
{
bio_set_flags_failfast(bdev, &flags);
return (__vdev_disk_physio(bdev, NULL, kbuf, size, offset, flags, 1));
}
BIO_END_IO_PROTO(vdev_disk_io_flush_completion, bio, rc)
{
zio_t *zio = bio->bi_private;
@ -642,7 +651,8 @@ vdev_disk_io_flush(struct block_device *bdev, zio_t *zio)
bio->bi_private = zio;
bio->bi_bdev = bdev;
zio->io_delay = jiffies_64;
vdev_submit_bio(VDEV_WRITE_FLUSH_FUA, bio);
bio_set_flush(bio);
vdev_submit_bio(bio);
invalidate_bdev(bdev);
return (0);
@ -653,8 +663,7 @@ vdev_disk_io_start(zio_t *zio)
{
vdev_t *v = zio->io_vd;
vdev_disk_t *vd = v->vdev_tsd;
zio_priority_t pri = zio->io_priority;
int flags, error;
int rw, flags, error;
switch (zio->io_type) {
case ZIO_TYPE_IOCTL:
@ -693,17 +702,25 @@ vdev_disk_io_start(zio_t *zio)
zio_execute(zio);
return;
case ZIO_TYPE_WRITE:
if ((pri == ZIO_PRIORITY_SYNC_WRITE) && (v->vdev_nonrot))
flags = WRITE_SYNC;
else
flags = WRITE;
rw = WRITE;
#if defined(HAVE_BLK_QUEUE_HAVE_BIO_RW_UNPLUG)
flags = (1 << BIO_RW_UNPLUG);
#elif defined(REQ_UNPLUG)
flags = REQ_UNPLUG;
#else
flags = 0;
#endif
break;
case ZIO_TYPE_READ:
if ((pri == ZIO_PRIORITY_SYNC_READ) && (v->vdev_nonrot))
flags = READ_SYNC;
else
flags = READ;
rw = READ;
#if defined(HAVE_BLK_QUEUE_HAVE_BIO_RW_UNPLUG)
flags = (1 << BIO_RW_UNPLUG);
#elif defined(REQ_UNPLUG)
flags = REQ_UNPLUG;
#else
flags = 0;
#endif
break;
default:
@ -713,7 +730,7 @@ vdev_disk_io_start(zio_t *zio)
}
error = __vdev_disk_physio(vd->vd_bdev, zio, zio->io_data,
zio->io_size, zio->io_offset, flags, 0);
zio->io_size, zio->io_offset, rw, flags);
if (error) {
zio->io_error = error;
zio_interrupt(zio);
@ -783,68 +800,5 @@ vdev_ops_t vdev_disk_ops = {
B_TRUE /* leaf vdev */
};
/*
* Given the root disk device devid or pathname, read the label from
* the device, and construct a configuration nvlist.
*/
int
vdev_disk_read_rootlabel(char *devpath, char *devid, nvlist_t **config)
{
struct block_device *bdev;
vdev_label_t *label;
uint64_t s, size;
int i;
bdev = vdev_bdev_open(devpath, vdev_bdev_mode(FREAD), zfs_vdev_holder);
if (IS_ERR(bdev))
return (-PTR_ERR(bdev));
s = bdev_capacity(bdev);
if (s == 0) {
vdev_bdev_close(bdev, vdev_bdev_mode(FREAD));
return (EIO);
}
size = P2ALIGN_TYPED(s, sizeof (vdev_label_t), uint64_t);
label = vmem_alloc(sizeof (vdev_label_t), KM_SLEEP);
for (i = 0; i < VDEV_LABELS; i++) {
uint64_t offset, state, txg = 0;
/* read vdev label */
offset = vdev_label_offset(size, i, 0);
if (vdev_disk_physio(bdev, (caddr_t)label,
VDEV_SKIP_SIZE + VDEV_PHYS_SIZE, offset, READ_SYNC) != 0)
continue;
if (nvlist_unpack(label->vl_vdev_phys.vp_nvlist,
sizeof (label->vl_vdev_phys.vp_nvlist), config, 0) != 0) {
*config = NULL;
continue;
}
if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
&state) != 0 || state >= POOL_STATE_DESTROYED) {
nvlist_free(*config);
*config = NULL;
continue;
}
if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
&txg) != 0 || txg == 0) {
nvlist_free(*config);
*config = NULL;
continue;
}
break;
}
vmem_free(label, sizeof (vdev_label_t));
vdev_bdev_close(bdev, vdev_bdev_mode(FREAD));
return (0);
}
module_param(zfs_vdev_scheduler, charp, 0644);
MODULE_PARM_DESC(zfs_vdev_scheduler, "I/O scheduler");

View File

@ -24,7 +24,7 @@
*/
/*
* Copyright (c) 2012, 2014 by Delphix. All rights reserved.
* Copyright (c) 2012, 2017 by Delphix. All rights reserved.
*/
#include <sys/zfs_context.h>
@ -146,7 +146,7 @@ uint32_t zfs_vdev_sync_write_min_active = 10;
uint32_t zfs_vdev_sync_write_max_active = 10;
uint32_t zfs_vdev_async_read_min_active = 1;
uint32_t zfs_vdev_async_read_max_active = 3;
uint32_t zfs_vdev_async_write_min_active = 1;
uint32_t zfs_vdev_async_write_min_active = 2;
uint32_t zfs_vdev_async_write_max_active = 10;
uint32_t zfs_vdev_scrub_min_active = 1;
uint32_t zfs_vdev_scrub_max_active = 2;
@ -249,20 +249,29 @@ static int
vdev_queue_max_async_writes(spa_t *spa)
{
int writes;
uint64_t dirty = spa->spa_dsl_pool->dp_dirty_total;
uint64_t dirty = 0;
dsl_pool_t *dp = spa_get_dsl(spa);
uint64_t min_bytes = zfs_dirty_data_max *
zfs_vdev_async_write_active_min_dirty_percent / 100;
uint64_t max_bytes = zfs_dirty_data_max *
zfs_vdev_async_write_active_max_dirty_percent / 100;
/*
* Async writes may occur before the assignment of the spa's
* dsl_pool_t if a self-healing zio is issued prior to the
* completion of dmu_objset_open_impl().
*/
if (dp == NULL)
return (zfs_vdev_async_write_max_active);
/*
* Sync tasks correspond to interactive user actions. To reduce the
* execution time of those actions we push data out as fast as possible.
*/
if (spa_has_pending_synctask(spa)) {
if (spa_has_pending_synctask(spa))
return (zfs_vdev_async_write_max_active);
}
dirty = dp->dp_dirty_total;
if (dirty < min_bytes)
return (zfs_vdev_async_write_min_active);
if (dirty > max_bytes)
@ -536,7 +545,7 @@ vdev_queue_aggregate(vdev_queue_t *vq, zio_t *zio)
/*
* Walk backwards through sufficiently contiguous I/Os
* recording the last non-option I/O.
* recording the last non-optional I/O.
*/
while ((dio = AVL_PREV(t, first)) != NULL &&
(dio->io_flags & ZIO_FLAG_AGG_INHERIT) == flags &&
@ -558,10 +567,14 @@ vdev_queue_aggregate(vdev_queue_t *vq, zio_t *zio)
/*
* Walk forward through sufficiently contiguous I/Os.
* The aggregation limit does not apply to optional i/os, so that
* we can issue contiguous writes even if they are larger than the
* aggregation limit.
*/
while ((dio = AVL_NEXT(t, last)) != NULL &&
(dio->io_flags & ZIO_FLAG_AGG_INHERIT) == flags &&
IO_SPAN(first, dio) <= zfs_vdev_aggregation_limit &&
(IO_SPAN(first, dio) <= zfs_vdev_aggregation_limit ||
(dio->io_flags & ZIO_FLAG_OPTIONAL)) &&
IO_GAP(last, dio) <= maxgap) {
last = dio;
if (!(last->io_flags & ZIO_FLAG_OPTIONAL))
@ -596,6 +609,7 @@ vdev_queue_aggregate(vdev_queue_t *vq, zio_t *zio)
dio = AVL_NEXT(t, last);
dio->io_flags &= ~ZIO_FLAG_OPTIONAL;
} else {
/* do not include the optional i/o */
while (last != mandatory && last != first) {
ASSERT(last->io_flags & ZIO_FLAG_OPTIONAL);
last = AVL_PREV(t, last);
@ -607,7 +621,6 @@ vdev_queue_aggregate(vdev_queue_t *vq, zio_t *zio)
return (NULL);
size = IO_SPAN(first, last);
ASSERT3U(size, <=, zfs_vdev_aggregation_limit);
buf = zio_buf_alloc_flags(size, KM_NOSLEEP);
if (buf == NULL)

View File

@ -1457,7 +1457,7 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
if (ace_trivial_common(aclp, 0, zfs_ace_walk) == 0)
zp->z_pflags |= ZFS_ACL_TRIVIAL;
zfs_tstamp_update_setup(zp, STATE_CHANGED, NULL, ctime, B_TRUE);
zfs_tstamp_update_setup(zp, STATE_CHANGED, NULL, ctime);
return (sa_bulk_update(zp->z_sa_hdl, bulk, count, tx));
}
@ -2473,53 +2473,33 @@ zfs_zaccess(znode_t *zp, int mode, int flags, boolean_t skipaclchk, cred_t *cr)
{
uint32_t working_mode;
int error;
boolean_t check_privs;
znode_t *check_zp = zp;
int is_attr;
boolean_t check_privs;
znode_t *xzp;
znode_t *check_zp = zp;
mode_t needed_bits;
uid_t owner;
is_attr = ((zp->z_pflags & ZFS_XATTR) && S_ISDIR(ZTOI(zp)->i_mode));
/*
* If attribute then validate against base file
*/
if ((zp->z_pflags & ZFS_XATTR) && S_ISDIR(ZTOI(zp)->i_mode)) {
if (is_attr) {
uint64_t parent;
rw_enter(&zp->z_xattr_lock, RW_READER);
if (zp->z_xattr_parent) {
check_zp = zp->z_xattr_parent;
rw_exit(&zp->z_xattr_lock);
if ((error = sa_lookup(zp->z_sa_hdl,
SA_ZPL_PARENT(ZTOZSB(zp)), &parent,
sizeof (parent))) != 0)
return (error);
/*
* Verify a lookup yields the same znode.
*/
ASSERT3S(sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(
ZTOZSB(zp)), &parent, sizeof (parent)), ==, 0);
ASSERT3U(check_zp->z_id, ==, parent);
} else {
rw_exit(&zp->z_xattr_lock);
error = sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(
ZTOZSB(zp)), &parent, sizeof (parent));
if (error)
return (error);
/*
* Cache the lookup on the parent file znode as
* zp->z_xattr_parent and hold a reference. This
* effectively pins the parent in memory until all
* child xattr znodes have been destroyed and
* release their references in zfs_inode_destroy().
*/
error = zfs_zget(ZTOZSB(zp), parent, &check_zp);
if (error)
return (error);
rw_enter(&zp->z_xattr_lock, RW_WRITER);
if (zp->z_xattr_parent == NULL)
zp->z_xattr_parent = check_zp;
rw_exit(&zp->z_xattr_lock);
if ((error = zfs_zget(ZTOZSB(zp),
parent, &xzp)) != 0) {
return (error);
}
check_zp = xzp;
/*
* fixup mode to map to xattr perms
*/
@ -2561,11 +2541,15 @@ zfs_zaccess(znode_t *zp, int mode, int flags, boolean_t skipaclchk, cred_t *cr)
if ((error = zfs_zaccess_common(check_zp, mode, &working_mode,
&check_privs, skipaclchk, cr)) == 0) {
if (is_attr)
iput(ZTOI(xzp));
return (secpolicy_vnode_access2(cr, ZTOI(zp), owner,
needed_bits, needed_bits));
}
if (error && !check_privs) {
if (is_attr)
iput(ZTOI(xzp));
return (error);
}
@ -2626,6 +2610,9 @@ zfs_zaccess(znode_t *zp, int mode, int flags, boolean_t skipaclchk, cred_t *cr)
needed_bits, needed_bits);
}
if (is_attr)
iput(ZTOI(xzp));
return (error);
}

View File

@ -455,7 +455,7 @@ static struct inode *
zfsctl_inode_alloc(zfs_sb_t *zsb, uint64_t id,
const struct file_operations *fops, const struct inode_operations *ops)
{
struct timespec now = current_fs_time(zsb->z_sb);
struct timespec now;
struct inode *ip;
znode_t *zp;
@ -463,6 +463,7 @@ zfsctl_inode_alloc(zfs_sb_t *zsb, uint64_t id,
if (ip == NULL)
return (NULL);
now = current_time(ip);
zp = ITOZ(ip);
ASSERT3P(zp->z_dirlocks, ==, NULL);
ASSERT3P(zp->z_acl_cached, ==, NULL);
@ -478,15 +479,12 @@ zfsctl_inode_alloc(zfs_sb_t *zsb, uint64_t id,
zp->z_mapcnt = 0;
zp->z_gen = 0;
zp->z_size = 0;
zp->z_atime[0] = 0;
zp->z_atime[1] = 0;
zp->z_links = 0;
zp->z_pflags = 0;
zp->z_uid = 0;
zp->z_gid = 0;
zp->z_mode = 0;
zp->z_sync_cnt = 0;
zp->z_is_zvol = B_FALSE;
zp->z_is_mapped = B_FALSE;
zp->z_is_ctldir = B_TRUE;
zp->z_is_sa = B_FALSE;
@ -501,6 +499,9 @@ zfsctl_inode_alloc(zfs_sb_t *zsb, uint64_t id,
ip->i_ctime = now;
ip->i_fop = fops;
ip->i_op = ops;
#if defined(IOP_XATTR)
ip->i_opflags &= ~IOP_XATTR;
#endif
if (insert_inode_locked(ip)) {
unlock_new_inode(ip);
@ -1010,16 +1011,11 @@ out:
* best effort. In the case where it does fail, perhaps because
* it's in use, the unmount will fail harmlessly.
*/
#define SET_UNMOUNT_CMD \
"exec 0</dev/null " \
" 1>/dev/null " \
" 2>/dev/null; " \
"umount -t zfs -n %s'%s'"
int
zfsctl_snapshot_unmount(char *snapname, int flags)
{
char *argv[] = { "/bin/sh", "-c", NULL, NULL };
char *argv[] = { "/usr/bin/env", "umount", "-t", "zfs", "-n", NULL,
NULL };
char *envp[] = { NULL };
zfs_snapentry_t *se;
int error;
@ -1031,12 +1027,12 @@ zfsctl_snapshot_unmount(char *snapname, int flags)
}
rw_exit(&zfs_snapshot_lock);
argv[2] = kmem_asprintf(SET_UNMOUNT_CMD,
flags & MNT_FORCE ? "-f " : "", se->se_path);
zfsctl_snapshot_rele(se);
if (flags & MNT_FORCE)
argv[4] = "-fn";
argv[5] = se->se_path;
dprintf("unmount; path=%s\n", se->se_path);
error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
strfree(argv[2]);
zfsctl_snapshot_rele(se);
/*
@ -1051,11 +1047,6 @@ zfsctl_snapshot_unmount(char *snapname, int flags)
}
#define MOUNT_BUSY 0x80 /* Mount failed due to EBUSY (from mntent.h) */
#define SET_MOUNT_CMD \
"exec 0</dev/null " \
" 1>/dev/null " \
" 2>/dev/null; " \
"mount -t zfs -n '%s' '%s'"
int
zfsctl_snapshot_mount(struct path *path, int flags)
@ -1066,7 +1057,8 @@ zfsctl_snapshot_mount(struct path *path, int flags)
zfs_sb_t *snap_zsb;
zfs_snapentry_t *se;
char *full_name, *full_path;
char *argv[] = { "/bin/sh", "-c", NULL, NULL };
char *argv[] = { "/usr/bin/env", "mount", "-t", "zfs", "-n", NULL, NULL,
NULL };
char *envp[] = { NULL };
int error;
struct path spath;
@ -1111,9 +1103,9 @@ zfsctl_snapshot_mount(struct path *path, int flags)
* value from call_usermodehelper() will be (exitcode << 8 + signal).
*/
dprintf("mount; name=%s path=%s\n", full_name, full_path);
argv[2] = kmem_asprintf(SET_MOUNT_CMD, full_name, full_path);
argv[5] = full_name;
argv[6] = full_path;
error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
strfree(argv[2]);
if (error) {
if (!(error & MOUNT_BUSY << 8)) {
cmn_err(CE_WARN, "Unable to automount %s/%s: %d",

View File

@ -593,7 +593,7 @@ zfs_purgedir(znode_t *dzp)
if (error)
skipped += 1;
dmu_tx_commit(tx);
set_nlink(ZTOI(xzp), xzp->z_links);
zfs_iput_async(ZTOI(xzp));
}
zap_cursor_fini(&zc);
@ -694,6 +694,7 @@ zfs_rmnode(znode_t *zp)
mutex_enter(&xzp->z_lock);
xzp->z_unlinked = B_TRUE; /* mark xzp for deletion */
xzp->z_links = 0; /* no more links to it */
set_nlink(ZTOI(xzp), 0); /* this will let iput purge us */
VERIFY(0 == sa_update(xzp->z_sa_hdl, SA_ZPL_LINKS(zsb),
&xzp->z_links, sizeof (xzp->z_links), tx));
mutex_exit(&xzp->z_lock);
@ -759,7 +760,7 @@ zfs_link_create(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag)
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zsb), NULL,
ctime, sizeof (ctime));
zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime,
ctime, B_TRUE);
ctime);
}
error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
ASSERT(error == 0);
@ -780,7 +781,7 @@ zfs_link_create(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag)
ctime, sizeof (ctime));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb), NULL,
&dzp->z_pflags, sizeof (dzp->z_pflags));
zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime, B_TRUE);
zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime);
error = sa_bulk_update(dzp->z_sa_hdl, bulk, count, tx);
ASSERT(error == 0);
mutex_exit(&dzp->z_lock);
@ -875,8 +876,8 @@ zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag,
NULL, &ctime, sizeof (ctime));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb),
NULL, &zp->z_pflags, sizeof (zp->z_pflags));
zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime,
B_TRUE);
zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime,
ctime);
}
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zsb),
NULL, &zp->z_links, sizeof (zp->z_links));
@ -903,7 +904,7 @@ zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag,
NULL, mtime, sizeof (mtime));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb),
NULL, &dzp->z_pflags, sizeof (dzp->z_pflags));
zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime, B_TRUE);
zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime);
error = sa_bulk_update(dzp->z_sa_hdl, bulk, count, tx);
ASSERT(error == 0);
mutex_exit(&dzp->z_lock);

View File

@ -457,7 +457,8 @@ update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count)
/* We store the bits in big-endian (largest-first) order */
for (i = 0; i < 64; i++) {
if (value & (1ull << i)) {
hist[63 - i]++;
if (hist[63 - i] < UINT16_MAX)
hist[63 - i]++;
++bits;
}
}
@ -615,7 +616,6 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
if (badbuf == NULL || goodbuf == NULL)
return (eip);
ASSERT3U(nui64s, <=, UINT16_MAX);
ASSERT3U(size, ==, nui64s * sizeof (uint64_t));
ASSERT3U(size, <=, SPA_MAXBLOCKSIZE);
ASSERT3U(size, <=, UINT32_MAX);

View File

@ -1379,9 +1379,9 @@ get_zfs_sb(const char *dsname, zfs_sb_t **zsbp)
mutex_enter(&os->os_user_ptr_lock);
*zsbp = dmu_objset_get_user(os);
if (*zsbp && (*zsbp)->z_sb) {
atomic_inc(&((*zsbp)->z_sb->s_active));
} else {
/* bump s_active only when non-zero to prevent umount race */
if (*zsbp == NULL || (*zsbp)->z_sb == NULL ||
!atomic_inc_not_zero(&((*zsbp)->z_sb->s_active))) {
error = SET_ERROR(ESRCH);
}
mutex_exit(&os->os_user_ptr_lock);
@ -5817,8 +5817,11 @@ zfsdev_ioctl(struct file *filp, unsigned cmd, unsigned long arg)
}
if (error == 0 && !(flag & FKIOCTL))
if (error == 0 && !(flag & FKIOCTL)) {
cookie = spl_fstrans_mark();
error = vec->zvec_secpolicy(zc, innvl, CRED());
spl_fstrans_unmark(cookie);
}
if (error != 0)
goto out;

View File

@ -101,9 +101,9 @@
* Check if a write lock can be grabbed, or wait and recheck until available.
*/
static void
zfs_range_lock_writer(znode_t *zp, rl_t *new)
zfs_range_lock_writer(zfs_rlock_t *zrl, rl_t *new)
{
avl_tree_t *tree = &zp->z_range_avl;
avl_tree_t *tree = &zrl->zr_avl;
rl_t *rl;
avl_index_t where;
uint64_t end_size;
@ -112,32 +112,32 @@ zfs_range_lock_writer(znode_t *zp, rl_t *new)
for (;;) {
/*
* Range locking is also used by zvol and uses a
* dummied up znode. However, for zvol, we don't need to
* append or grow blocksize, and besides we don't have
* a "sa" data or zfs_sb_t - so skip that processing.
* Range locking is also used by zvol. However, for zvol, we
* don't need to append or grow blocksize, so skip that
* processing.
*
* Yes, this is ugly, and would be solved by not handling
* grow or append in range lock code. If that was done then
* we could make the range locking code generically available
* to other non-zfs consumers.
*/
if (!zp->z_is_zvol) { /* caller is ZPL */
if (zrl->zr_size) { /* caller is ZPL */
/*
* If in append mode pick up the current end of file.
* This is done under z_range_lock to avoid races.
*/
if (new->r_type == RL_APPEND)
new->r_off = zp->z_size;
new->r_off = *zrl->zr_size;
/*
* If we need to grow the block size then grab the whole
* file range. This is also done under z_range_lock to
* avoid races.
*/
end_size = MAX(zp->z_size, new->r_off + len);
if (end_size > zp->z_blksz && (!ISP2(zp->z_blksz) ||
zp->z_blksz < ZTOZSB(zp)->z_max_blksz)) {
end_size = MAX(*zrl->zr_size, new->r_off + len);
if (end_size > *zrl->zr_blksz &&
(!ISP2(*zrl->zr_blksz) ||
*zrl->zr_blksz < *zrl->zr_max_blksz)) {
new->r_off = 0;
new->r_len = UINT64_MAX;
}
@ -175,7 +175,7 @@ wait:
cv_init(&rl->r_wr_cv, NULL, CV_DEFAULT, NULL);
rl->r_write_wanted = B_TRUE;
}
cv_wait(&rl->r_wr_cv, &zp->z_range_lock);
cv_wait(&rl->r_wr_cv, &zrl->zr_mutex);
/* reset to original */
new->r_off = off;
@ -353,9 +353,9 @@ zfs_range_add_reader(avl_tree_t *tree, rl_t *new, rl_t *prev, avl_index_t where)
* Check if a reader lock can be grabbed, or wait and recheck until available.
*/
static void
zfs_range_lock_reader(znode_t *zp, rl_t *new)
zfs_range_lock_reader(zfs_rlock_t *zrl, rl_t *new)
{
avl_tree_t *tree = &zp->z_range_avl;
avl_tree_t *tree = &zrl->zr_avl;
rl_t *prev, *next;
avl_index_t where;
uint64_t off = new->r_off;
@ -378,7 +378,7 @@ retry:
cv_init(&prev->r_rd_cv, NULL, CV_DEFAULT, NULL);
prev->r_read_wanted = B_TRUE;
}
cv_wait(&prev->r_rd_cv, &zp->z_range_lock);
cv_wait(&prev->r_rd_cv, &zrl->zr_mutex);
goto retry;
}
if (off + len < prev->r_off + prev->r_len)
@ -401,7 +401,7 @@ retry:
cv_init(&next->r_rd_cv, NULL, CV_DEFAULT, NULL);
next->r_read_wanted = B_TRUE;
}
cv_wait(&next->r_rd_cv, &zp->z_range_lock);
cv_wait(&next->r_rd_cv, &zrl->zr_mutex);
goto retry;
}
if (off + len <= next->r_off + next->r_len)
@ -423,14 +423,14 @@ got_lock:
* previously locked as RL_WRITER).
*/
rl_t *
zfs_range_lock(znode_t *zp, uint64_t off, uint64_t len, rl_type_t type)
zfs_range_lock(zfs_rlock_t *zrl, uint64_t off, uint64_t len, rl_type_t type)
{
rl_t *new;
ASSERT(type == RL_READER || type == RL_WRITER || type == RL_APPEND);
new = kmem_alloc(sizeof (rl_t), KM_SLEEP);
new->r_zp = zp;
new->r_zrl = zrl;
new->r_off = off;
if (len + off < off) /* overflow */
len = UINT64_MAX - off;
@ -441,18 +441,18 @@ zfs_range_lock(znode_t *zp, uint64_t off, uint64_t len, rl_type_t type)
new->r_write_wanted = B_FALSE;
new->r_read_wanted = B_FALSE;
mutex_enter(&zp->z_range_lock);
mutex_enter(&zrl->zr_mutex);
if (type == RL_READER) {
/*
* First check for the usual case of no locks
*/
if (avl_numnodes(&zp->z_range_avl) == 0)
avl_add(&zp->z_range_avl, new);
if (avl_numnodes(&zrl->zr_avl) == 0)
avl_add(&zrl->zr_avl, new);
else
zfs_range_lock_reader(zp, new);
} else
zfs_range_lock_writer(zp, new); /* RL_WRITER or RL_APPEND */
mutex_exit(&zp->z_range_lock);
zfs_range_lock_reader(zrl, new);
} else /* RL_WRITER or RL_APPEND */
zfs_range_lock_writer(zrl, new);
mutex_exit(&zrl->zr_mutex);
return (new);
}
@ -474,9 +474,9 @@ zfs_range_free(void *arg)
* Unlock a reader lock
*/
static void
zfs_range_unlock_reader(znode_t *zp, rl_t *remove, list_t *free_list)
zfs_range_unlock_reader(zfs_rlock_t *zrl, rl_t *remove, list_t *free_list)
{
avl_tree_t *tree = &zp->z_range_avl;
avl_tree_t *tree = &zrl->zr_avl;
rl_t *rl, *next = NULL;
uint64_t len;
@ -543,7 +543,7 @@ zfs_range_unlock_reader(znode_t *zp, rl_t *remove, list_t *free_list)
void
zfs_range_unlock(rl_t *rl)
{
znode_t *zp = rl->r_zp;
zfs_rlock_t *zrl = rl->r_zrl;
list_t free_list;
rl_t *free_rl;
@ -552,10 +552,10 @@ zfs_range_unlock(rl_t *rl)
ASSERT(!rl->r_proxy);
list_create(&free_list, sizeof (rl_t), offsetof(rl_t, rl_node));
mutex_enter(&zp->z_range_lock);
mutex_enter(&zrl->zr_mutex);
if (rl->r_type == RL_WRITER) {
/* writer locks can't be shared or split */
avl_remove(&zp->z_range_avl, rl);
avl_remove(&zrl->zr_avl, rl);
if (rl->r_write_wanted)
cv_broadcast(&rl->r_wr_cv);
@ -568,9 +568,9 @@ zfs_range_unlock(rl_t *rl)
* lock may be shared, let zfs_range_unlock_reader()
* release the zp->z_range_lock lock and free the rl_t
*/
zfs_range_unlock_reader(zp, rl, &free_list);
zfs_range_unlock_reader(zrl, rl, &free_list);
}
mutex_exit(&zp->z_range_lock);
mutex_exit(&zrl->zr_mutex);
while ((free_rl = list_head(&free_list)) != NULL) {
list_remove(&free_list, free_rl);
@ -588,17 +588,17 @@ zfs_range_unlock(rl_t *rl)
void
zfs_range_reduce(rl_t *rl, uint64_t off, uint64_t len)
{
znode_t *zp = rl->r_zp;
zfs_rlock_t *zrl = rl->r_zrl;
/* Ensure there are no other locks */
ASSERT(avl_numnodes(&zp->z_range_avl) == 1);
ASSERT(avl_numnodes(&zrl->zr_avl) == 1);
ASSERT(rl->r_off == 0);
ASSERT(rl->r_type == RL_WRITER);
ASSERT(!rl->r_proxy);
ASSERT3U(rl->r_len, ==, UINT64_MAX);
ASSERT3U(rl->r_cnt, ==, 1);
mutex_enter(&zp->z_range_lock);
mutex_enter(&zrl->zr_mutex);
rl->r_off = off;
rl->r_len = len;
@ -607,7 +607,7 @@ zfs_range_reduce(rl_t *rl, uint64_t off, uint64_t len)
if (rl->r_read_wanted)
cv_broadcast(&rl->r_rd_cv);
mutex_exit(&zp->z_range_lock);
mutex_exit(&zrl->zr_mutex);
}
/*
@ -626,3 +626,10 @@ zfs_range_compare(const void *arg1, const void *arg2)
return (-1);
return (0);
}
#ifdef _KERNEL
EXPORT_SYMBOL(zfs_range_lock);
EXPORT_SYMBOL(zfs_range_unlock);
EXPORT_SYMBOL(zfs_range_reduce);
EXPORT_SYMBOL(zfs_range_compare);
#endif

View File

@ -277,7 +277,7 @@ zfs_sa_upgrade(sa_handle_t *hdl, dmu_tx_t *tx)
sa_bulk_attr_t *bulk, *sa_attrs;
zfs_acl_locator_cb_t locate = { 0 };
uint64_t uid, gid, mode, rdev, xattr, parent;
uint64_t crtime[2], mtime[2], ctime[2];
uint64_t crtime[2], mtime[2], ctime[2], atime[2];
zfs_acl_phys_t znode_acl;
char scanstamp[AV_SCANSTAMP_SZ];
boolean_t drop_lock = B_FALSE;
@ -309,6 +309,7 @@ zfs_sa_upgrade(sa_handle_t *hdl, dmu_tx_t *tx)
/* First do a bulk query of the attributes that aren't cached */
bulk = kmem_alloc(sizeof (sa_bulk_attr_t) * 20, KM_SLEEP);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zsb), NULL, &atime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zsb), NULL, &mtime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zsb), NULL, &ctime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CRTIME(zsb), NULL, &crtime, 16);
@ -344,7 +345,7 @@ zfs_sa_upgrade(sa_handle_t *hdl, dmu_tx_t *tx)
SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_FLAGS(zsb), NULL,
&zp->z_pflags, 8);
SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_ATIME(zsb), NULL,
zp->z_atime, 16);
&atime, 16);
SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_MTIME(zsb), NULL,
&mtime, 16);
SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_CTIME(zsb), NULL,

View File

@ -699,20 +699,18 @@ zfs_sb_create(const char *osname, zfs_mntopts_t *zmo, zfs_sb_t **zsbp)
zsb = kmem_zalloc(sizeof (zfs_sb_t), KM_SLEEP);
/*
* Optional temporary mount options, free'd in zfs_sb_free().
*/
zsb->z_mntopts = (zmo ? zmo : zfs_mntopts_alloc());
/*
* We claim to always be readonly so we can open snapshots;
* other ZPL code will prevent us from writing to snapshots.
*/
error = dmu_objset_own(osname, DMU_OST_ZFS, B_TRUE, zsb, &os);
if (error) {
kmem_free(zsb, sizeof (zfs_sb_t));
return (error);
}
/*
* Optional temporary mount options, free'd in zfs_sb_free().
*/
zsb->z_mntopts = (zmo ? zmo : zfs_mntopts_alloc());
if (error)
goto out_zmo;
/*
* Initialize the zfs-specific filesystem structure.
@ -840,8 +838,9 @@ zfs_sb_create(const char *osname, zfs_mntopts_t *zmo, zfs_sb_t **zsbp)
out:
dmu_objset_disown(os, zsb);
out_zmo:
*zsbp = NULL;
zfs_mntopts_free(zsb->z_mntopts);
kmem_free(zsb, sizeof (zfs_sb_t));
return (error);
}
@ -1124,8 +1123,7 @@ zfs_root(zfs_sb_t *zsb, struct inode **ipp)
}
EXPORT_SYMBOL(zfs_root);
#if !defined(HAVE_SPLIT_SHRINKER_CALLBACK) && !defined(HAVE_SHRINK) && \
defined(HAVE_D_PRUNE_ALIASES)
#ifdef HAVE_D_PRUNE_ALIASES
/*
* Linux kernels older than 3.1 do not support a per-filesystem shrinker.
* To accommodate this we must improvise and manually walk the list of znodes
@ -1215,15 +1213,29 @@ zfs_sb_prune(struct super_block *sb, unsigned long nr_to_scan, int *objects)
} else {
*objects = (*shrinker->scan_objects)(shrinker, &sc);
}
#elif defined(HAVE_SPLIT_SHRINKER_CALLBACK)
*objects = (*shrinker->scan_objects)(shrinker, &sc);
#elif defined(HAVE_SHRINK)
*objects = (*shrinker->shrink)(shrinker, &sc);
#elif defined(HAVE_D_PRUNE_ALIASES)
#define D_PRUNE_ALIASES_IS_DEFAULT
*objects = zfs_sb_prune_aliases(zsb, nr_to_scan);
#else
#error "No available dentry and inode cache pruning mechanism."
#endif
#if defined(HAVE_D_PRUNE_ALIASES) && !defined(D_PRUNE_ALIASES_IS_DEFAULT)
#undef D_PRUNE_ALIASES_IS_DEFAULT
/*
* Fall back to zfs_sb_prune_aliases if the kernel's per-superblock
* shrinker couldn't free anything, possibly due to the inodes being
* allocated in a different memcg.
*/
if (*objects == 0)
*objects = zfs_sb_prune_aliases(zsb, nr_to_scan);
#endif
ZFS_EXIT(zsb);
dprintf_ds(zsb->z_os->os_dsl_dataset,
@ -1391,13 +1403,13 @@ zfs_domount(struct super_block *sb, zfs_mntopts_t *zmo, int silent)
sb->s_time_gran = 1;
sb->s_blocksize = recordsize;
sb->s_blocksize_bits = ilog2(recordsize);
zsb->z_bdi.ra_pages = 0;
sb->s_bdi = &zsb->z_bdi;
error = -zpl_bdi_setup_and_register(&zsb->z_bdi, "zfs");
error = -zpl_bdi_setup(sb, "zfs");
if (error)
goto out;
sb->s_bdi->ra_pages = 0;
/* Set callback operations for the file system. */
sb->s_op = &zpl_super_operations;
sb->s_xattr = zpl_xattr_handlers;
@ -1493,7 +1505,7 @@ zfs_umount(struct super_block *sb)
arc_remove_prune_callback(zsb->z_arc_prune);
VERIFY(zfs_sb_teardown(zsb, B_TRUE) == 0);
os = zsb->z_os;
bdi_destroy(sb->s_bdi);
zpl_bdi_destroy(sb);
/*
* z_os will be NULL if there was an error in
@ -1601,6 +1613,14 @@ zfs_vget(struct super_block *sb, struct inode **ipp, fid_t *fidp)
ZFS_EXIT(zsb);
return (err);
}
/* Don't export xattr stuff */
if (zp->z_pflags & ZFS_XATTR) {
iput(ZTOI(zp));
ZFS_EXIT(zsb);
return (SET_ERROR(ENOENT));
}
(void) sa_lookup(zp->z_sa_hdl, SA_ZPL_GEN(zsb), &zp_gen,
sizeof (uint64_t));
zp_gen = zp_gen & gen_mask;
@ -1613,7 +1633,7 @@ zfs_vget(struct super_block *sb, struct inode **ipp, fid_t *fidp)
fid_gen);
iput(ZTOI(zp));
ZFS_EXIT(zsb);
return (SET_ERROR(EINVAL));
return (SET_ERROR(ENOENT));
}
*ipp = ZTOI(zp);
@ -1858,7 +1878,10 @@ zfs_init(void)
void
zfs_fini(void)
{
taskq_wait_outstanding(system_taskq, 0);
/*
* we don't use outstanding because zpl_posix_acl_free might add more.
*/
taskq_wait(system_taskq);
unregister_filesystem(&zpl_fs_type);
zfs_znode_fini();
zfsctl_fini();

View File

@ -332,11 +332,11 @@ update_pages(struct inode *ip, int64_t start, int len,
int64_t off;
void *pb;
off = start & (PAGE_CACHE_SIZE-1);
for (start &= PAGE_CACHE_MASK; len > 0; start += PAGE_CACHE_SIZE) {
nbytes = MIN(PAGE_CACHE_SIZE - off, len);
off = start & (PAGE_SIZE-1);
for (start &= PAGE_MASK; len > 0; start += PAGE_SIZE) {
nbytes = MIN(PAGE_SIZE - off, len);
pp = find_lock_page(mp, start >> PAGE_CACHE_SHIFT);
pp = find_lock_page(mp, start >> PAGE_SHIFT);
if (pp) {
if (mapping_writably_mapped(mp))
flush_dcache_page(pp);
@ -353,7 +353,7 @@ update_pages(struct inode *ip, int64_t start, int len,
SetPageUptodate(pp);
ClearPageError(pp);
unlock_page(pp);
page_cache_release(pp);
put_page(pp);
}
len -= nbytes;
@ -384,11 +384,11 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
void *pb;
start = uio->uio_loffset;
off = start & (PAGE_CACHE_SIZE-1);
for (start &= PAGE_CACHE_MASK; len > 0; start += PAGE_CACHE_SIZE) {
bytes = MIN(PAGE_CACHE_SIZE - off, len);
off = start & (PAGE_SIZE-1);
for (start &= PAGE_MASK; len > 0; start += PAGE_SIZE) {
bytes = MIN(PAGE_SIZE - off, len);
pp = find_lock_page(mp, start >> PAGE_CACHE_SHIFT);
pp = find_lock_page(mp, start >> PAGE_SHIFT);
if (pp) {
ASSERT(PageUptodate(pp));
@ -401,7 +401,7 @@ mappedread(struct inode *ip, int nbytes, uio_t *uio)
mark_page_accessed(pp);
unlock_page(pp);
page_cache_release(pp);
put_page(pp);
} else {
error = dmu_read_uio_dbuf(sa_get_db(zp->z_sa_hdl),
uio, bytes);
@ -481,7 +481,8 @@ zfs_read(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
/*
* Lock the range against changes.
*/
rl = zfs_range_lock(zp, uio->uio_loffset, uio->uio_resid, RL_READER);
rl = zfs_range_lock(&zp->z_range_lock, uio->uio_loffset, uio->uio_resid,
RL_READER);
/*
* If we are reading past end-of-file we can skip
@ -549,7 +550,6 @@ zfs_read(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
out:
zfs_range_unlock(rl);
ZFS_ACCESSTIME_STAMP(zsb, zp);
ZFS_EXIT(zsb);
return (error);
}
@ -663,7 +663,7 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
* Obtain an appending range lock to guarantee file append
* semantics. We reset the write offset once we have the lock.
*/
rl = zfs_range_lock(zp, 0, n, RL_APPEND);
rl = zfs_range_lock(&zp->z_range_lock, 0, n, RL_APPEND);
woff = rl->r_off;
if (rl->r_len == UINT64_MAX) {
/*
@ -680,7 +680,7 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
* this write, then this range lock will lock the entire file
* so that we can re-write the block safely.
*/
rl = zfs_range_lock(zp, woff, n, RL_WRITER);
rl = zfs_range_lock(&zp->z_range_lock, woff, n, RL_WRITER);
}
if (woff >= limit) {
@ -864,8 +864,7 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
}
mutex_exit(&zp->z_acl_lock);
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime,
B_TRUE);
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime);
/*
* Update the file size (zp_size) if it has changed;
@ -1007,7 +1006,8 @@ zfs_get_data(void *arg, lr_write_t *lr, char *buf, zio_t *zio)
* we don't have to write the data twice.
*/
if (buf != NULL) { /* immediate write */
zgd->zgd_rl = zfs_range_lock(zp, offset, size, RL_READER);
zgd->zgd_rl = zfs_range_lock(&zp->z_range_lock, offset, size,
RL_READER);
/* test for truncation needs to be done while range locked */
if (offset >= zp->z_size) {
error = SET_ERROR(ENOENT);
@ -1028,8 +1028,8 @@ zfs_get_data(void *arg, lr_write_t *lr, char *buf, zio_t *zio)
size = zp->z_blksz;
blkoff = ISP2(size) ? P2PHASE(offset, size) : offset;
offset -= blkoff;
zgd->zgd_rl = zfs_range_lock(zp, offset, size,
RL_READER);
zgd->zgd_rl = zfs_range_lock(&zp->z_range_lock, offset,
size, RL_READER);
if (zp->z_blksz == size)
break;
offset += blkoff;
@ -1602,13 +1602,13 @@ top:
error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
if (error) {
zfs_dirent_unlock(dl);
iput(ip);
if (xzp)
iput(ZTOI(xzp));
if (error == ERESTART) {
waited = B_TRUE;
dmu_tx_wait(tx);
dmu_tx_abort(tx);
iput(ip);
if (xzp)
iput(ZTOI(xzp));
goto top;
}
#ifdef HAVE_PN_UTILS
@ -1616,6 +1616,9 @@ top:
pn_free(realnmp);
#endif /* HAVE_PN_UTILS */
dmu_tx_abort(tx);
iput(ip);
if (xzp)
iput(ZTOI(xzp));
ZFS_EXIT(zsb);
return (error);
}
@ -1944,14 +1947,15 @@ top:
rw_exit(&zp->z_parent_lock);
rw_exit(&zp->z_name_lock);
zfs_dirent_unlock(dl);
iput(ip);
if (error == ERESTART) {
waited = B_TRUE;
dmu_tx_wait(tx);
dmu_tx_abort(tx);
iput(ip);
goto top;
}
dmu_tx_abort(tx);
iput(ip);
ZFS_EXIT(zsb);
return (error);
}
@ -2138,9 +2142,6 @@ update:
zap_cursor_fini(&zc);
if (error == ENOENT)
error = 0;
ZFS_ACCESSTIME_STAMP(zsb, zp);
out:
ZFS_EXIT(zsb);
@ -2193,11 +2194,11 @@ zfs_getattr(struct inode *ip, vattr_t *vap, int flags, cred_t *cr)
zfs_sb_t *zsb = ITOZSB(ip);
int error = 0;
uint64_t links;
uint64_t mtime[2], ctime[2];
uint64_t atime[2], mtime[2], ctime[2];
xvattr_t *xvap = (xvattr_t *)vap; /* vap may be an xvattr_t * */
xoptattr_t *xoap = NULL;
boolean_t skipaclchk = (flags & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE;
sa_bulk_attr_t bulk[2];
sa_bulk_attr_t bulk[3];
int count = 0;
ZFS_ENTER(zsb);
@ -2205,6 +2206,7 @@ zfs_getattr(struct inode *ip, vattr_t *vap, int flags, cred_t *cr)
zfs_fuid_map_ids(zp, cr, &vap->va_uid, &vap->va_gid);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zsb), NULL, &atime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zsb), NULL, &mtime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zsb), NULL, &ctime, 16);
@ -2353,7 +2355,7 @@ zfs_getattr(struct inode *ip, vattr_t *vap, int flags, cred_t *cr)
}
}
ZFS_TIME_DECODE(&vap->va_atime, zp->z_atime);
ZFS_TIME_DECODE(&vap->va_atime, atime);
ZFS_TIME_DECODE(&vap->va_mtime, mtime);
ZFS_TIME_DECODE(&vap->va_ctime, ctime);
@ -2400,7 +2402,6 @@ zfs_getattr_fast(struct inode *ip, struct kstat *sp)
mutex_enter(&zp->z_lock);
generic_fillattr(ip, sp);
ZFS_TIME_DECODE(&sp->atime, zp->z_atime);
sa_object_size(zp->z_sa_hdl, &blksize, &nblocks);
sp->blksize = blksize;
@ -2464,7 +2465,7 @@ zfs_setattr(struct inode *ip, vattr_t *vap, int flags, cred_t *cr)
uint64_t new_mode;
uint64_t new_uid, new_gid;
uint64_t xattr_obj;
uint64_t mtime[2], ctime[2];
uint64_t mtime[2], ctime[2], atime[2];
znode_t *attrzp;
int need_policy = FALSE;
int err, err2;
@ -2937,10 +2938,11 @@ top:
}
if (mask & ATTR_ATIME) {
ZFS_TIME_ENCODE(&vap->va_atime, zp->z_atime);
if ((mask & ATTR_ATIME) || zp->z_atime_dirty) {
zp->z_atime_dirty = 0;
ZFS_TIME_ENCODE(&ip->i_atime, atime);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zsb), NULL,
&zp->z_atime, sizeof (zp->z_atime));
&atime, sizeof (atime));
}
if (mask & ATTR_MTIME) {
@ -2955,19 +2957,17 @@ top:
NULL, mtime, sizeof (mtime));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zsb), NULL,
&ctime, sizeof (ctime));
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime,
B_TRUE);
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime);
} else if (mask != 0) {
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zsb), NULL,
&ctime, sizeof (ctime));
zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime,
B_TRUE);
zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime);
if (attrzp) {
SA_ADD_BULK_ATTR(xattr_bulk, xattr_count,
SA_ZPL_CTIME(zsb), NULL,
&ctime, sizeof (ctime));
zfs_tstamp_update_setup(attrzp, STATE_CHANGED,
mtime, ctime, B_TRUE);
mtime, ctime);
}
}
/*
@ -3029,8 +3029,6 @@ out:
ASSERT(err2 == 0);
}
if (attrzp)
iput(ZTOI(attrzp));
if (aclp)
zfs_acl_free(aclp);
@ -3041,11 +3039,15 @@ out:
if (err) {
dmu_tx_abort(tx);
if (attrzp)
iput(ZTOI(attrzp));
if (err == ERESTART)
goto top;
} else {
err2 = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
dmu_tx_commit(tx);
if (attrzp)
iput(ZTOI(attrzp));
zfs_inode_update(zp);
}
@ -3078,7 +3080,7 @@ zfs_rename_unlock(zfs_zlock_t **zlpp)
while ((zl = *zlpp) != NULL) {
if (zl->zl_znode != NULL)
iput(ZTOI(zl->zl_znode));
zfs_iput_async(ZTOI(zl->zl_znode));
rw_exit(zl->zl_rwlock);
*zlpp = zl->zl_next;
kmem_free(zl, sizeof (*zl));
@ -3415,16 +3417,19 @@ top:
if (sdzp == tdzp)
rw_exit(&sdzp->z_name_lock);
iput(ZTOI(szp));
if (tzp)
iput(ZTOI(tzp));
if (error == ERESTART) {
waited = B_TRUE;
dmu_tx_wait(tx);
dmu_tx_abort(tx);
iput(ZTOI(szp));
if (tzp)
iput(ZTOI(tzp));
goto top;
}
dmu_tx_abort(tx);
iput(ZTOI(szp));
if (tzp)
iput(ZTOI(tzp));
ZFS_EXIT(zsb);
return (error);
}
@ -3688,7 +3693,6 @@ zfs_readlink(struct inode *ip, uio_t *uio, cred_t *cr)
error = zfs_sa_readlink(zp, uio);
mutex_exit(&zp->z_lock);
ZFS_ACCESSTIME_STAMP(zsb, zp);
ZFS_EXIT(zsb);
return (error);
}
@ -3894,8 +3898,8 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc)
pgoff = page_offset(pp); /* Page byte-offset in file */
offset = i_size_read(ip); /* File length in bytes */
pglen = MIN(PAGE_CACHE_SIZE, /* Page length in bytes */
P2ROUNDUP(offset, PAGE_CACHE_SIZE)-pgoff);
pglen = MIN(PAGE_SIZE, /* Page length in bytes */
P2ROUNDUP(offset, PAGE_SIZE)-pgoff);
/* Page is beyond end of file */
if (pgoff >= offset) {
@ -3947,7 +3951,7 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc)
redirty_page_for_writepage(wbc, pp);
unlock_page(pp);
rl = zfs_range_lock(zp, pgoff, pglen, RL_WRITER);
rl = zfs_range_lock(&zp->z_range_lock, pgoff, pglen, RL_WRITER);
lock_page(pp);
/* Page mapping changed or it was no longer dirty, we're done */
@ -4006,7 +4010,7 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc)
}
va = kmap(pp);
ASSERT3U(pglen, <=, PAGE_CACHE_SIZE);
ASSERT3U(pglen, <=, PAGE_SIZE);
dmu_write(zsb->z_os, zp->z_id, pgoff, pglen, va, tx);
kunmap(pp);
@ -4054,7 +4058,7 @@ zfs_dirty_inode(struct inode *ip, int flags)
dmu_tx_t *tx;
uint64_t mode, atime[2], mtime[2], ctime[2];
sa_bulk_attr_t bulk[4];
int error;
int error = 0;
int cnt = 0;
if (zfs_is_readonly(zsb) || dmu_objset_is_snapshot(zsb->z_os))
@ -4063,6 +4067,20 @@ zfs_dirty_inode(struct inode *ip, int flags)
ZFS_ENTER(zsb);
ZFS_VERIFY_ZP(zp);
#ifdef I_DIRTY_TIME
/*
* This is the lazytime semantic indroduced in Linux 4.0
* This flag will only be called from update_time when lazytime is set.
* (Note, I_DIRTY_SYNC will also set if not lazytime)
* Fortunately mtime and ctime are managed within ZFS itself, so we
* only need to dirty atime.
*/
if (flags == I_DIRTY_TIME) {
zp->z_atime_dirty = 1;
goto out;
}
#endif
tx = dmu_tx_create(zsb->z_os);
dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
@ -4075,6 +4093,8 @@ zfs_dirty_inode(struct inode *ip, int flags)
}
mutex_enter(&zp->z_lock);
zp->z_atime_dirty = 0;
SA_ADD_BULK_ATTR(bulk, cnt, SA_ZPL_MODE(zsb), NULL, &mode, 8);
SA_ADD_BULK_ATTR(bulk, cnt, SA_ZPL_ATIME(zsb), NULL, &atime, 16);
SA_ADD_BULK_ATTR(bulk, cnt, SA_ZPL_MTIME(zsb), NULL, &mtime, 16);
@ -4087,7 +4107,6 @@ zfs_dirty_inode(struct inode *ip, int flags)
mode = ip->i_mode;
zp->z_mode = mode;
zp->z_atime_dirty = 0;
error = sa_bulk_update(zp->z_sa_hdl, bulk, cnt, tx);
mutex_exit(&zp->z_lock);
@ -4105,6 +4124,7 @@ zfs_inactive(struct inode *ip)
{
znode_t *zp = ITOZ(ip);
zfs_sb_t *zsb = ITOZSB(ip);
uint64_t atime[2];
int error;
int need_unlock = 0;
@ -4128,9 +4148,10 @@ zfs_inactive(struct inode *ip)
if (error) {
dmu_tx_abort(tx);
} else {
ZFS_TIME_ENCODE(&ip->i_atime, atime);
mutex_enter(&zp->z_lock);
(void) sa_update(zp->z_sa_hdl, SA_ZPL_ATIME(zsb),
(void *)&zp->z_atime, sizeof (zp->z_atime), tx);
(void *)&atime, sizeof (atime), tx);
zp->z_atime_dirty = 0;
mutex_exit(&zp->z_lock);
dmu_tx_commit(tx);
@ -4181,7 +4202,7 @@ zfs_fillpage(struct inode *ip, struct page *pl[], int nr_pages)
int err;
os = zsb->z_os;
io_len = nr_pages << PAGE_CACHE_SHIFT;
io_len = nr_pages << PAGE_SHIFT;
i_size = i_size_read(ip);
io_off = page_offset(pl[0]);
@ -4192,10 +4213,10 @@ zfs_fillpage(struct inode *ip, struct page *pl[], int nr_pages)
* Iterate over list of pages and read each page individually.
*/
page_idx = 0;
cur_pp = pl[0];
for (total = io_off + io_len; io_off < total; io_off += PAGESIZE) {
caddr_t va;
cur_pp = pl[page_idx++];
va = kmap(cur_pp);
err = dmu_read(os, zp->z_id, io_off, PAGESIZE, va,
DMU_READ_PREFETCH);
@ -4206,7 +4227,6 @@ zfs_fillpage(struct inode *ip, struct page *pl[], int nr_pages)
err = SET_ERROR(EIO);
return (err);
}
cur_pp = pl[++page_idx];
}
return (0);
@ -4240,9 +4260,6 @@ zfs_getpage(struct inode *ip, struct page *pl[], int nr_pages)
err = zfs_fillpage(ip, pl, nr_pages);
if (!err)
ZFS_ACCESSTIME_STAMP(zsb, zp);
ZFS_EXIT(zsb);
return (err);
}

View File

@ -113,14 +113,11 @@ zfs_znode_cache_constructor(void *buf, void *arg, int kmflags)
mutex_init(&zp->z_acl_lock, NULL, MUTEX_DEFAULT, NULL);
rw_init(&zp->z_xattr_lock, NULL, RW_DEFAULT, NULL);
mutex_init(&zp->z_range_lock, NULL, MUTEX_DEFAULT, NULL);
avl_create(&zp->z_range_avl, zfs_range_compare,
sizeof (rl_t), offsetof(rl_t, r_node));
zfs_rlock_init(&zp->z_range_lock);
zp->z_dirlocks = NULL;
zp->z_acl_cached = NULL;
zp->z_xattr_cached = NULL;
zp->z_xattr_parent = NULL;
zp->z_moved = 0;
return (0);
}
@ -137,13 +134,11 @@ zfs_znode_cache_destructor(void *buf, void *arg)
rw_destroy(&zp->z_name_lock);
mutex_destroy(&zp->z_acl_lock);
rw_destroy(&zp->z_xattr_lock);
avl_destroy(&zp->z_range_avl);
mutex_destroy(&zp->z_range_lock);
zfs_rlock_destroy(&zp->z_range_lock);
ASSERT(zp->z_dirlocks == NULL);
ASSERT(zp->z_acl_cached == NULL);
ASSERT(zp->z_xattr_cached == NULL);
ASSERT(zp->z_xattr_parent == NULL);
}
static int
@ -435,11 +430,6 @@ zfs_inode_destroy(struct inode *ip)
zp->z_xattr_cached = NULL;
}
if (zp->z_xattr_parent) {
zfs_iput_async(ZTOI(zp->z_xattr_parent));
zp->z_xattr_parent = NULL;
}
kmem_cache_free(znode_cache, zp);
}
@ -492,110 +482,6 @@ zfs_inode_set_ops(zfs_sb_t *zsb, struct inode *ip)
}
}
/*
* Construct a znode+inode and initialize.
*
* This does not do a call to dmu_set_user() that is
* up to the caller to do, in case you don't want to
* return the znode
*/
static znode_t *
zfs_znode_alloc(zfs_sb_t *zsb, dmu_buf_t *db, int blksz,
dmu_object_type_t obj_type, uint64_t obj, sa_handle_t *hdl,
struct inode *dip)
{
znode_t *zp;
struct inode *ip;
uint64_t mode;
uint64_t parent;
sa_bulk_attr_t bulk[9];
int count = 0;
ASSERT(zsb != NULL);
ip = new_inode(zsb->z_sb);
if (ip == NULL)
return (NULL);
zp = ITOZ(ip);
ASSERT(zp->z_dirlocks == NULL);
ASSERT3P(zp->z_acl_cached, ==, NULL);
ASSERT3P(zp->z_xattr_cached, ==, NULL);
ASSERT3P(zp->z_xattr_parent, ==, NULL);
zp->z_moved = 0;
zp->z_sa_hdl = NULL;
zp->z_unlinked = 0;
zp->z_atime_dirty = 0;
zp->z_mapcnt = 0;
zp->z_id = db->db_object;
zp->z_blksz = blksz;
zp->z_seq = 0x7A4653;
zp->z_sync_cnt = 0;
zp->z_is_zvol = B_FALSE;
zp->z_is_mapped = B_FALSE;
zp->z_is_ctldir = B_FALSE;
zp->z_is_stale = B_FALSE;
zfs_znode_sa_init(zsb, zp, db, obj_type, hdl);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zsb), NULL, &mode, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GEN(zsb), NULL, &zp->z_gen, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zsb), NULL, &zp->z_size, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zsb), NULL, &zp->z_links, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb), NULL,
&zp->z_pflags, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zsb), NULL,
&parent, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zsb), NULL,
&zp->z_atime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zsb), NULL, &zp->z_uid, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zsb), NULL, &zp->z_gid, 8);
if (sa_bulk_lookup(zp->z_sa_hdl, bulk, count) != 0 || zp->z_gen == 0) {
if (hdl == NULL)
sa_handle_destroy(zp->z_sa_hdl);
zp->z_sa_hdl = NULL;
goto error;
}
zp->z_mode = mode;
/*
* xattr znodes hold a reference on their unique parent
*/
if (dip && zp->z_pflags & ZFS_XATTR) {
igrab(dip);
zp->z_xattr_parent = ITOZ(dip);
}
ip->i_ino = obj;
zfs_inode_update(zp);
zfs_inode_set_ops(zsb, ip);
/*
* The only way insert_inode_locked() can fail is if the ip->i_ino
* number is already hashed for this super block. This can never
* happen because the inode numbers map 1:1 with the object numbers.
*
* The one exception is rolling back a mounted file system, but in
* this case all the active inode are unhashed during the rollback.
*/
VERIFY3S(insert_inode_locked(ip), ==, 0);
mutex_enter(&zsb->z_znodes_lock);
list_insert_tail(&zsb->z_all_znodes, zp);
zsb->z_nr_znodes++;
membar_producer();
mutex_exit(&zsb->z_znodes_lock);
unlock_new_inode(ip);
return (zp);
error:
iput(ip);
return (NULL);
}
void
zfs_set_inode_flags(znode_t *zp, struct inode *ip)
{
@ -622,12 +508,13 @@ zfs_set_inode_flags(znode_t *zp, struct inode *ip)
* inode has the correct field it should be used, and the ZFS code
* updated to access the inode. This can be done incrementally.
*/
void
zfs_inode_update(znode_t *zp)
static void
zfs_inode_update_impl(znode_t *zp, boolean_t new)
{
zfs_sb_t *zsb;
struct inode *ip;
uint32_t blksize;
u_longlong_t i_blocks;
uint64_t atime[2], mtime[2], ctime[2];
ASSERT(zp != NULL);
@ -642,6 +529,8 @@ zfs_inode_update(znode_t *zp)
sa_lookup(zp->z_sa_hdl, SA_ZPL_MTIME(zsb), &mtime, 16);
sa_lookup(zp->z_sa_hdl, SA_ZPL_CTIME(zsb), &ctime, 16);
dmu_object_size_from_db(sa_get_db(zp->z_sa_hdl), &blksize, &i_blocks);
spin_lock(&ip->i_lock);
ip->i_generation = zp->z_gen;
ip->i_uid = SUID_TO_KUID(zp->z_uid);
@ -650,10 +539,14 @@ zfs_inode_update(znode_t *zp)
ip->i_mode = zp->z_mode;
zfs_set_inode_flags(zp, ip);
ip->i_blkbits = SPA_MINBLOCKSHIFT;
dmu_object_size_from_db(sa_get_db(zp->z_sa_hdl), &blksize,
(u_longlong_t *)&ip->i_blocks);
ip->i_blocks = i_blocks;
ZFS_TIME_DECODE(&ip->i_atime, atime);
/*
* Only read atime from SA if we are newly created inode (or rezget),
* otherwise i_atime might be dirty.
*/
if (new)
ZFS_TIME_DECODE(&ip->i_atime, atime);
ZFS_TIME_DECODE(&ip->i_mtime, mtime);
ZFS_TIME_DECODE(&ip->i_ctime, ctime);
@ -661,6 +554,112 @@ zfs_inode_update(znode_t *zp)
spin_unlock(&ip->i_lock);
}
static void
zfs_inode_update_new(znode_t *zp)
{
zfs_inode_update_impl(zp, B_TRUE);
}
void
zfs_inode_update(znode_t *zp)
{
zfs_inode_update_impl(zp, B_FALSE);
}
/*
* Construct a znode+inode and initialize.
*
* This does not do a call to dmu_set_user() that is
* up to the caller to do, in case you don't want to
* return the znode
*/
static znode_t *
zfs_znode_alloc(zfs_sb_t *zsb, dmu_buf_t *db, int blksz,
dmu_object_type_t obj_type, uint64_t obj, sa_handle_t *hdl)
{
znode_t *zp;
struct inode *ip;
uint64_t mode;
uint64_t parent;
sa_bulk_attr_t bulk[8];
int count = 0;
ASSERT(zsb != NULL);
ip = new_inode(zsb->z_sb);
if (ip == NULL)
return (NULL);
zp = ITOZ(ip);
ASSERT(zp->z_dirlocks == NULL);
ASSERT3P(zp->z_acl_cached, ==, NULL);
ASSERT3P(zp->z_xattr_cached, ==, NULL);
zp->z_moved = 0;
zp->z_sa_hdl = NULL;
zp->z_unlinked = 0;
zp->z_atime_dirty = 0;
zp->z_mapcnt = 0;
zp->z_id = db->db_object;
zp->z_blksz = blksz;
zp->z_seq = 0x7A4653;
zp->z_sync_cnt = 0;
zp->z_is_mapped = B_FALSE;
zp->z_is_ctldir = B_FALSE;
zp->z_is_stale = B_FALSE;
zp->z_range_lock.zr_size = &zp->z_size;
zp->z_range_lock.zr_blksz = &zp->z_blksz;
zp->z_range_lock.zr_max_blksz = &ZTOZSB(zp)->z_max_blksz;
zfs_znode_sa_init(zsb, zp, db, obj_type, hdl);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zsb), NULL, &mode, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GEN(zsb), NULL, &zp->z_gen, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zsb), NULL, &zp->z_size, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zsb), NULL, &zp->z_links, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb), NULL,
&zp->z_pflags, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zsb), NULL,
&parent, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zsb), NULL, &zp->z_uid, 8);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zsb), NULL, &zp->z_gid, 8);
if (sa_bulk_lookup(zp->z_sa_hdl, bulk, count) != 0 || zp->z_gen == 0) {
if (hdl == NULL)
sa_handle_destroy(zp->z_sa_hdl);
zp->z_sa_hdl = NULL;
goto error;
}
zp->z_mode = mode;
ip->i_ino = obj;
zfs_inode_update_new(zp);
zfs_inode_set_ops(zsb, ip);
/*
* The only way insert_inode_locked() can fail is if the ip->i_ino
* number is already hashed for this super block. This can never
* happen because the inode numbers map 1:1 with the object numbers.
*
* The one exception is rolling back a mounted file system, but in
* this case all the active inode are unhashed during the rollback.
*/
VERIFY3S(insert_inode_locked(ip), ==, 0);
mutex_enter(&zsb->z_znodes_lock);
list_insert_tail(&zsb->z_all_znodes, zp);
zsb->z_nr_znodes++;
membar_producer();
mutex_exit(&zsb->z_znodes_lock);
unlock_new_inode(ip);
return (zp);
error:
iput(ip);
return (NULL);
}
/*
* Safely mark an inode dirty. Inodes which are part of a read-only
* file system or snapshot may not be dirtied.
@ -913,8 +912,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr,
VERIFY(sa_replace_all_by_template(sa_hdl, sa_attrs, cnt, tx) == 0);
if (!(flag & IS_ROOT_NODE)) {
*zpp = zfs_znode_alloc(zsb, db, 0, obj_type, obj, sa_hdl,
ZTOI(dzp));
*zpp = zfs_znode_alloc(zsb, db, 0, obj_type, obj, sa_hdl);
VERIFY(*zpp != NULL);
VERIFY(dzp != NULL);
} else {
@ -1124,7 +1122,7 @@ again:
* bonus buffer.
*/
zp = zfs_znode_alloc(zsb, db, doi.doi_data_block_size,
doi.doi_bonus_type, obj_num, NULL, NULL);
doi.doi_bonus_type, obj_num, NULL);
if (zp == NULL) {
err = SET_ERROR(ENOENT);
} else {
@ -1142,12 +1140,22 @@ zfs_rezget(znode_t *zp)
dmu_buf_t *db;
uint64_t obj_num = zp->z_id;
uint64_t mode;
sa_bulk_attr_t bulk[8];
sa_bulk_attr_t bulk[7];
int err;
int count = 0;
uint64_t gen;
znode_hold_t *zh;
/*
* skip ctldir, otherwise they will always get invalidated. This will
* cause funny behaviour for the mounted snapdirs. Especially for
* Linux >= 3.18, d_invalidate will detach the mountpoint and prevent
* anyone automount it again as long as someone is still using the
* detached mount.
*/
if (zp->z_is_ctldir)
return (0);
zh = zfs_znode_hold_enter(zsb, obj_num);
mutex_enter(&zp->z_acl_lock);
@ -1162,11 +1170,6 @@ zfs_rezget(znode_t *zp)
nvlist_free(zp->z_xattr_cached);
zp->z_xattr_cached = NULL;
}
if (zp->z_xattr_parent) {
zfs_iput_async(ZTOI(zp->z_xattr_parent));
zp->z_xattr_parent = NULL;
}
rw_exit(&zp->z_xattr_lock);
ASSERT(zp->z_sa_hdl == NULL);
@ -1197,8 +1200,6 @@ zfs_rezget(znode_t *zp)
&zp->z_links, sizeof (zp->z_links));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb), NULL,
&zp->z_pflags, sizeof (zp->z_pflags));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zsb), NULL,
&zp->z_atime, sizeof (zp->z_atime));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zsb), NULL,
&zp->z_uid, sizeof (zp->z_uid));
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zsb), NULL,
@ -1222,7 +1223,8 @@ zfs_rezget(znode_t *zp)
zp->z_unlinked = (zp->z_links == 0);
zp->z_blksz = doi.doi_data_block_size;
zfs_inode_update(zp);
zp->z_atime_dirty = 0;
zfs_inode_update_new(zp);
zfs_znode_hold_exit(zsb, zh);
@ -1293,78 +1295,28 @@ zfs_compare_timespec(struct timespec *t1, struct timespec *t2)
return (t1->tv_nsec - t2->tv_nsec);
}
/*
* Determine whether the znode's atime must be updated. The logic mostly
* duplicates the Linux kernel's relatime_need_update() functionality.
* This function is only called if the underlying filesystem actually has
* atime updates enabled.
*/
static inline boolean_t
zfs_atime_need_update(znode_t *zp, timestruc_t *now)
{
if (!ZTOZSB(zp)->z_relatime)
return (B_TRUE);
/*
* In relatime mode, only update the atime if the previous atime
* is earlier than either the ctime or mtime or if at least a day
* has passed since the last update of atime.
*/
if (zfs_compare_timespec(&ZTOI(zp)->i_mtime, &ZTOI(zp)->i_atime) >= 0)
return (B_TRUE);
if (zfs_compare_timespec(&ZTOI(zp)->i_ctime, &ZTOI(zp)->i_atime) >= 0)
return (B_TRUE);
if ((long)now->tv_sec - ZTOI(zp)->i_atime.tv_sec >= 24*60*60)
return (B_TRUE);
return (B_FALSE);
}
/*
* Prepare to update znode time stamps.
*
* IN: zp - znode requiring timestamp update
* flag - ATTR_MTIME, ATTR_CTIME, ATTR_ATIME flags
* have_tx - true of caller is creating a new txg
* flag - ATTR_MTIME, ATTR_CTIME flags
*
* OUT: zp - new atime (via underlying inode's i_atime)
* OUT: zp - z_seq
* mtime - new mtime
* ctime - new ctime
*
* NOTE: The arguments are somewhat redundant. The following condition
* is always true:
*
* have_tx == !(flag & ATTR_ATIME)
* Note: We don't update atime here, because we rely on Linux VFS to do
* atime updating.
*/
void
zfs_tstamp_update_setup(znode_t *zp, uint_t flag, uint64_t mtime[2],
uint64_t ctime[2], boolean_t have_tx)
uint64_t ctime[2])
{
timestruc_t now;
ASSERT(have_tx == !(flag & ATTR_ATIME));
gethrestime(&now);
/*
* NOTE: The following test intentionally does not update z_atime_dirty
* in the case where an ATIME update has been requested but for which
* the update is omitted due to relatime logic. The rationale being
* that if the flag was set somewhere else, we should leave it alone
* here.
*/
if (flag & ATTR_ATIME) {
if (zfs_atime_need_update(zp, &now)) {
ZFS_TIME_ENCODE(&now, zp->z_atime);
ZTOI(zp)->i_atime.tv_sec = zp->z_atime[0];
ZTOI(zp)->i_atime.tv_nsec = zp->z_atime[1];
zp->z_atime_dirty = 1;
}
} else {
zp->z_atime_dirty = 0;
zp->z_seq++;
}
zp->z_seq++;
if (flag & ATTR_MTIME) {
ZFS_TIME_ENCODE(&now, mtime);
@ -1437,7 +1389,7 @@ zfs_extend(znode_t *zp, uint64_t end)
/*
* We will change zp_size, lock the whole file.
*/
rl = zfs_range_lock(zp, 0, UINT64_MAX, RL_WRITER);
rl = zfs_range_lock(&zp->z_range_lock, 0, UINT64_MAX, RL_WRITER);
/*
* Nothing to do if file already at desired length.
@ -1510,13 +1462,12 @@ zfs_zero_partial_page(znode_t *zp, uint64_t start, uint64_t len)
int64_t off;
void *pb;
ASSERT((start & PAGE_CACHE_MASK) ==
((start + len - 1) & PAGE_CACHE_MASK));
ASSERT((start & PAGE_MASK) == ((start + len - 1) & PAGE_MASK));
off = start & (PAGE_CACHE_SIZE - 1);
start &= PAGE_CACHE_MASK;
off = start & (PAGE_SIZE - 1);
start &= PAGE_MASK;
pp = find_lock_page(mp, start >> PAGE_CACHE_SHIFT);
pp = find_lock_page(mp, start >> PAGE_SHIFT);
if (pp) {
if (mapping_writably_mapped(mp))
flush_dcache_page(pp);
@ -1532,7 +1483,7 @@ zfs_zero_partial_page(znode_t *zp, uint64_t start, uint64_t len)
SetPageUptodate(pp);
ClearPageError(pp);
unlock_page(pp);
page_cache_release(pp);
put_page(pp);
}
}
@ -1555,7 +1506,7 @@ zfs_free_range(znode_t *zp, uint64_t off, uint64_t len)
/*
* Lock the range being freed.
*/
rl = zfs_range_lock(zp, off, len, RL_WRITER);
rl = zfs_range_lock(&zp->z_range_lock, off, len, RL_WRITER);
/*
* Nothing to do if file already at desired length.
@ -1579,14 +1530,14 @@ zfs_free_range(znode_t *zp, uint64_t off, uint64_t len)
loff_t first_page_offset, last_page_offset;
/* first possible full page in hole */
first_page = (off + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
first_page = (off + PAGE_SIZE - 1) >> PAGE_SHIFT;
/* last page of hole */
last_page = (off + len) >> PAGE_CACHE_SHIFT;
last_page = (off + len) >> PAGE_SHIFT;
/* offset of first_page */
first_page_offset = first_page << PAGE_CACHE_SHIFT;
first_page_offset = first_page << PAGE_SHIFT;
/* offset of last_page */
last_page_offset = last_page << PAGE_CACHE_SHIFT;
last_page_offset = last_page << PAGE_SHIFT;
/* truncate whole pages */
if (last_page_offset > first_page_offset) {
@ -1637,7 +1588,7 @@ zfs_trunc(znode_t *zp, uint64_t end)
/*
* We will change zp_size, lock the whole file.
*/
rl = zfs_range_lock(zp, 0, UINT64_MAX, RL_WRITER);
rl = zfs_range_lock(&zp->z_range_lock, 0, UINT64_MAX, RL_WRITER);
/*
* Nothing to do if file already at desired length.
@ -1737,7 +1688,7 @@ log:
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zsb), NULL, ctime, 16);
SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zsb),
NULL, &zp->z_pflags, 8);
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime, B_TRUE);
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime);
error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
ASSERT(error == 0);

View File

@ -139,10 +139,10 @@ zio_init(void)
if (arc_watch && !IS_P2ALIGNED(size, PAGESIZE))
continue;
#endif
if (size <= 4 * SPA_MINBLOCKSIZE) {
if (size < PAGESIZE) {
align = SPA_MINBLOCKSIZE;
} else if (IS_P2ALIGNED(size, p2 >> 2)) {
align = MIN(p2 >> 2, PAGESIZE);
align = PAGESIZE;
}
if (align != 0) {
@ -1415,6 +1415,31 @@ zio_execute(zio_t *zio)
spl_fstrans_unmark(cookie);
}
/*
* Used to determine if in the current context the stack is sized large
* enough to allow zio_execute() to be called recursively. A minimum
* stack size of 16K is required to avoid needing to re-dispatch the zio.
*/
boolean_t
zio_execute_stack_check(zio_t *zio)
{
#if !defined(HAVE_LARGE_STACKS)
dsl_pool_t *dp = spa_get_dsl(zio->io_spa);
/* Executing in txg_sync_thread() context. */
if (dp && curthread == dp->dp_tx.tx_sync_thread)
return (B_TRUE);
/* Pool initialization outside of zio_taskq context. */
if (dp && spa_is_initializing(dp->dp_spa) &&
!zio_taskq_member(zio, ZIO_TASKQ_ISSUE) &&
!zio_taskq_member(zio, ZIO_TASKQ_ISSUE_HIGH))
return (B_TRUE);
#endif /* HAVE_LARGE_STACKS */
return (B_FALSE);
}
__attribute__((always_inline))
static inline void
__zio_execute(zio_t *zio)
@ -1424,8 +1449,6 @@ __zio_execute(zio_t *zio)
while (zio->io_stage < ZIO_STAGE_DONE) {
enum zio_stage pipeline = zio->io_pipeline;
enum zio_stage stage = zio->io_stage;
dsl_pool_t *dp;
boolean_t cut;
int rv;
ASSERT(!MUTEX_HELD(&zio->io_lock));
@ -1438,10 +1461,6 @@ __zio_execute(zio_t *zio)
ASSERT(stage <= ZIO_STAGE_DONE);
dp = spa_get_dsl(zio->io_spa);
cut = (stage == ZIO_STAGE_VDEV_IO_START) ?
zio_requeue_io_start_cut_in_line : B_FALSE;
/*
* If we are in interrupt context and this pipeline stage
* will grab a config lock that is held across I/O,
@ -1453,21 +1472,19 @@ __zio_execute(zio_t *zio)
*/
if ((stage & ZIO_BLOCKING_STAGES) && zio->io_vd == NULL &&
zio_taskq_member(zio, ZIO_TASKQ_INTERRUPT)) {
boolean_t cut = (stage == ZIO_STAGE_VDEV_IO_START) ?
zio_requeue_io_start_cut_in_line : B_FALSE;
zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, cut);
return;
}
/*
* If we executing in the context of the tx_sync_thread,
* or we are performing pool initialization outside of a
* zio_taskq[ZIO_TASKQ_ISSUE|ZIO_TASKQ_ISSUE_HIGH] context.
* Then issue the zio asynchronously to minimize stack usage
* for these deep call paths.
* If the current context doesn't have large enough stacks
* the zio must be issued asynchronously to prevent overflow.
*/
if ((dp && curthread == dp->dp_tx.tx_sync_thread) ||
(dp && spa_is_initializing(dp->dp_spa) &&
!zio_taskq_member(zio, ZIO_TASKQ_ISSUE) &&
!zio_taskq_member(zio, ZIO_TASKQ_ISSUE_HIGH))) {
if (zio_execute_stack_check(zio)) {
boolean_t cut = (stage == ZIO_STAGE_VDEV_IO_START) ?
zio_requeue_io_start_cut_in_line : B_FALSE;
zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, cut);
return;
}
@ -3455,7 +3472,7 @@ zbookmark_is_before(const dnode_phys_t *dnp, const zbookmark_phys_t *zb1,
if (zb1->zb_object == DMU_META_DNODE_OBJECT) {
uint64_t nextobj = zb1nextL0 *
(dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT) >> DNODE_SHIFT;
(dnp->dn_datablkszsec << (SPA_MINBLOCKSHIFT - DNODE_SHIFT));
return (nextobj <= zb2thisobj);
}

View File

@ -81,7 +81,7 @@ out:
return (error);
}
#if !defined(HAVE_VFS_ITERATE)
#if !defined(HAVE_VFS_ITERATE) && !defined(HAVE_VFS_ITERATE_SHARED)
static int
zpl_root_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
@ -100,16 +100,17 @@ zpl_root_readdir(struct file *filp, void *dirent, filldir_t filldir)
*/
/* ARGSUSED */
static int
zpl_root_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat)
zpl_root_getattr_impl(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
int error;
struct inode *ip = path->dentry->d_inode;
error = simple_getattr(mnt, dentry, stat);
stat->atime = CURRENT_TIME;
generic_fillattr(ip, stat);
stat->atime = current_time(ip);
return (error);
return (0);
}
ZPL_GETATTR_WRAPPER(zpl_root_getattr);
static struct dentry *
#ifdef HAVE_LOOKUP_NAMEIDATA
@ -144,7 +145,9 @@ const struct file_operations zpl_fops_root = {
.open = zpl_common_open,
.llseek = generic_file_llseek,
.read = generic_read_dir,
#ifdef HAVE_VFS_ITERATE
#ifdef HAVE_VFS_ITERATE_SHARED
.iterate_shared = zpl_root_iterate,
#elif defined(HAVE_VFS_ITERATE)
.iterate = zpl_root_iterate,
#else
.readdir = zpl_root_readdir,
@ -285,7 +288,7 @@ out:
return (error);
}
#if !defined(HAVE_VFS_ITERATE)
#if !defined(HAVE_VFS_ITERATE) && !defined(HAVE_VFS_ITERATE_SHARED)
static int
zpl_snapdir_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
@ -299,13 +302,17 @@ zpl_snapdir_readdir(struct file *filp, void *dirent, filldir_t filldir)
}
#endif /* HAVE_VFS_ITERATE */
int
zpl_snapdir_rename(struct inode *sdip, struct dentry *sdentry,
struct inode *tdip, struct dentry *tdentry)
static int
zpl_snapdir_rename2(struct inode *sdip, struct dentry *sdentry,
struct inode *tdip, struct dentry *tdentry, unsigned int flags)
{
cred_t *cr = CRED();
int error;
/* We probably don't want to support renameat2(2) in ctldir */
if (flags)
return (-EINVAL);
crhold(cr);
error = -zfsctl_snapdir_rename(sdip, dname(sdentry),
tdip, dname(tdentry), cr, 0);
@ -315,6 +322,15 @@ zpl_snapdir_rename(struct inode *sdip, struct dentry *sdentry,
return (error);
}
#ifndef HAVE_RENAME_WANTS_FLAGS
static int
zpl_snapdir_rename(struct inode *sdip, struct dentry *sdentry,
struct inode *tdip, struct dentry *tdentry)
{
return (zpl_snapdir_rename2(sdip, sdentry, tdip, tdentry, 0));
}
#endif
static int
zpl_snapdir_rmdir(struct inode *dip, struct dentry *dentry)
{
@ -360,21 +376,22 @@ zpl_snapdir_mkdir(struct inode *dip, struct dentry *dentry, zpl_umode_t mode)
*/
/* ARGSUSED */
static int
zpl_snapdir_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat)
zpl_snapdir_getattr_impl(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
zfs_sb_t *zsb = ITOZSB(dentry->d_inode);
int error;
struct inode *ip = path->dentry->d_inode;
zfs_sb_t *zsb = ITOZSB(path->dentry->d_inode);
ZFS_ENTER(zsb);
error = simple_getattr(mnt, dentry, stat);
generic_fillattr(path->dentry->d_inode, stat);
stat->nlink = stat->size = 2;
stat->ctime = stat->mtime = dmu_objset_snap_cmtime(zsb->z_os);
stat->atime = CURRENT_TIME;
stat->atime = current_time(ip);
ZFS_EXIT(zsb);
return (error);
return (0);
}
ZPL_GETATTR_WRAPPER(zpl_snapdir_getattr);
/*
* The '.zfs/snapshot' directory file operations. These mainly control
@ -385,7 +402,9 @@ const struct file_operations zpl_fops_snapdir = {
.open = zpl_common_open,
.llseek = generic_file_llseek,
.read = generic_read_dir,
#ifdef HAVE_VFS_ITERATE
#ifdef HAVE_VFS_ITERATE_SHARED
.iterate_shared = zpl_snapdir_iterate,
#elif defined(HAVE_VFS_ITERATE)
.iterate = zpl_snapdir_iterate,
#else
.readdir = zpl_snapdir_readdir,
@ -401,7 +420,11 @@ const struct file_operations zpl_fops_snapdir = {
const struct inode_operations zpl_ops_snapdir = {
.lookup = zpl_snapdir_lookup,
.getattr = zpl_snapdir_getattr,
#ifdef HAVE_RENAME_WANTS_FLAGS
.rename = zpl_snapdir_rename2,
#else
.rename = zpl_snapdir_rename,
#endif
.rmdir = zpl_snapdir_rmdir,
.mkdir = zpl_snapdir_mkdir,
};
@ -472,7 +495,7 @@ out:
return (error);
}
#if !defined(HAVE_VFS_ITERATE)
#if !defined(HAVE_VFS_ITERATE) && !defined(HAVE_VFS_ITERATE_SHARED)
static int
zpl_shares_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
@ -488,10 +511,10 @@ zpl_shares_readdir(struct file *filp, void *dirent, filldir_t filldir)
/* ARGSUSED */
static int
zpl_shares_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat)
zpl_shares_getattr_impl(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
struct inode *ip = dentry->d_inode;
struct inode *ip = path->dentry->d_inode;
zfs_sb_t *zsb = ITOZSB(ip);
znode_t *dzp;
int error;
@ -499,11 +522,11 @@ zpl_shares_getattr(struct vfsmount *mnt, struct dentry *dentry,
ZFS_ENTER(zsb);
if (zsb->z_shares_dir == 0) {
error = simple_getattr(mnt, dentry, stat);
generic_fillattr(path->dentry->d_inode, stat);
stat->nlink = stat->size = 2;
stat->atime = CURRENT_TIME;
stat->atime = current_time(ip);
ZFS_EXIT(zsb);
return (error);
return (0);
}
error = -zfs_zget(zsb, zsb->z_shares_dir, &dzp);
@ -517,6 +540,7 @@ zpl_shares_getattr(struct vfsmount *mnt, struct dentry *dentry,
return (error);
}
ZPL_GETATTR_WRAPPER(zpl_shares_getattr);
/*
* The '.zfs/shares' directory file operations.
@ -525,7 +549,9 @@ const struct file_operations zpl_fops_shares = {
.open = zpl_common_open,
.llseek = generic_file_llseek,
.read = generic_read_dir,
#ifdef HAVE_VFS_ITERATE
#ifdef HAVE_VFS_ITERATE_SHARED
.iterate_shared = zpl_shares_iterate,
#elif defined(HAVE_VFS_ITERATE)
.iterate = zpl_shares_iterate,
#else
.readdir = zpl_shares_readdir,

View File

@ -24,6 +24,9 @@
*/
#ifdef CONFIG_COMPAT
#include <linux/compat.h>
#endif
#include <sys/dmu_objset.h>
#include <sys/zfs_vfsops.h>
#include <sys/zfs_vnops.h>
@ -90,7 +93,7 @@ zpl_iterate(struct file *filp, struct dir_context *ctx)
return (error);
}
#if !defined(HAVE_VFS_ITERATE)
#if !defined(HAVE_VFS_ITERATE) && !defined(HAVE_VFS_ITERATE_SHARED)
static int
zpl_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
@ -128,12 +131,15 @@ zpl_fsync(struct file *filp, struct dentry *dentry, int datasync)
return (error);
}
#ifdef HAVE_FILE_AIO_FSYNC
static int
zpl_aio_fsync(struct kiocb *kiocb, int datasync)
{
struct file *filp = kiocb->ki_filp;
return (zpl_fsync(filp, filp->f_path.dentry, datasync));
}
#endif
#elif defined(HAVE_FSYNC_WITHOUT_DENTRY)
/*
* Linux 2.6.35 - 3.0 API,
@ -159,11 +165,14 @@ zpl_fsync(struct file *filp, int datasync)
return (error);
}
#ifdef HAVE_FILE_AIO_FSYNC
static int
zpl_aio_fsync(struct kiocb *kiocb, int datasync)
{
return (zpl_fsync(kiocb->ki_filp, datasync));
}
#endif
#elif defined(HAVE_FSYNC_RANGE)
/*
* Linux 3.1 - 3.x API,
@ -194,11 +203,14 @@ zpl_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
return (error);
}
#ifdef HAVE_FILE_AIO_FSYNC
static int
zpl_aio_fsync(struct kiocb *kiocb, int datasync)
{
return (zpl_fsync(kiocb->ki_filp, kiocb->ki_pos, -1, datasync));
}
#endif
#else
#error "Unsupported fops->fsync() implementation"
#endif
@ -247,20 +259,6 @@ zpl_read_common(struct inode *ip, const char *buf, size_t len, loff_t *ppos,
flags, cr, 0));
}
static ssize_t
zpl_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
{
cred_t *cr = CRED();
ssize_t read;
crhold(cr);
read = zpl_read_common(filp->f_mapping->host, buf, len, ppos,
UIO_USERSPACE, filp->f_flags, cr);
crfree(cr);
return (read);
}
static ssize_t
zpl_iter_read_common(struct kiocb *kiocb, const struct iovec *iovp,
unsigned long nr_segs, size_t count, uio_seg_t seg, size_t skip)
@ -274,6 +272,7 @@ zpl_iter_read_common(struct kiocb *kiocb, const struct iovec *iovp,
nr_segs, &kiocb->ki_pos, seg, filp->f_flags, cr, skip);
crfree(cr);
file_accessed(filp);
return (read);
}
@ -298,7 +297,14 @@ static ssize_t
zpl_aio_read(struct kiocb *kiocb, const struct iovec *iovp,
unsigned long nr_segs, loff_t pos)
{
return (zpl_iter_read_common(kiocb, iovp, nr_segs, kiocb->ki_nbytes,
ssize_t ret;
size_t count;
ret = generic_segment_checks(iovp, &nr_segs, &count, VERIFY_WRITE);
if (ret)
return (ret);
return (zpl_iter_read_common(kiocb, iovp, nr_segs, count,
UIO_USERSPACE, 0));
}
#endif /* HAVE_VFS_RW_ITERATE */
@ -336,6 +342,7 @@ zpl_write_common_iovec(struct inode *ip, const struct iovec *iovp, size_t count,
return (wrote);
}
inline ssize_t
zpl_write_common(struct inode *ip, const char *buf, size_t len, loff_t *ppos,
uio_seg_t segment, int flags, cred_t *cr)
@ -349,20 +356,6 @@ zpl_write_common(struct inode *ip, const char *buf, size_t len, loff_t *ppos,
flags, cr, 0));
}
static ssize_t
zpl_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
cred_t *cr = CRED();
ssize_t wrote;
crhold(cr);
wrote = zpl_write_common(filp->f_mapping->host, buf, len, ppos,
UIO_USERSPACE, filp->f_flags, cr);
crfree(cr);
return (wrote);
}
static ssize_t
zpl_iter_write_common(struct kiocb *kiocb, const struct iovec *iovp,
unsigned long nr_segs, size_t count, uio_seg_t seg, size_t skip)
@ -383,16 +376,42 @@ zpl_iter_write_common(struct kiocb *kiocb, const struct iovec *iovp,
static ssize_t
zpl_iter_write(struct kiocb *kiocb, struct iov_iter *from)
{
size_t count;
ssize_t ret;
uio_seg_t seg = UIO_USERSPACE;
#ifndef HAVE_GENERIC_WRITE_CHECKS_KIOCB
struct file *file = kiocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *ip = mapping->host;
int isblk = S_ISBLK(ip->i_mode);
count = iov_iter_count(from);
ret = generic_write_checks(file, &kiocb->ki_pos, &count, isblk);
if (ret)
return (ret);
#else
/*
* XXX - ideally this check should be in the same lock region with
* write operations, so that there's no TOCTTOU race when doing
* append and someone else grow the file.
*/
ret = generic_write_checks(kiocb, from);
if (ret <= 0)
return (ret);
count = ret;
#endif
if (from->type & ITER_KVEC)
seg = UIO_SYSSPACE;
if (from->type & ITER_BVEC)
seg = UIO_BVEC;
ret = zpl_iter_write_common(kiocb, from->iov, from->nr_segs,
iov_iter_count(from), seg, from->iov_offset);
count, seg, from->iov_offset);
if (ret > 0)
iov_iter_advance(from, ret);
return (ret);
}
#else
@ -400,7 +419,22 @@ static ssize_t
zpl_aio_write(struct kiocb *kiocb, const struct iovec *iovp,
unsigned long nr_segs, loff_t pos)
{
return (zpl_iter_write_common(kiocb, iovp, nr_segs, kiocb->ki_nbytes,
struct file *file = kiocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *ip = mapping->host;
int isblk = S_ISBLK(ip->i_mode);
size_t count;
ssize_t ret;
ret = generic_segment_checks(iovp, &nr_segs, &count, VERIFY_READ);
if (ret)
return (ret);
ret = generic_write_checks(file, &pos, &count, isblk);
if (ret)
return (ret);
return (zpl_iter_write_common(kiocb, iovp, nr_segs, count,
UIO_USERSPACE, 0));
}
#endif /* HAVE_VFS_RW_ITERATE */
@ -416,13 +450,13 @@ zpl_llseek(struct file *filp, loff_t offset, int whence)
loff_t maxbytes = ip->i_sb->s_maxbytes;
loff_t error;
spl_inode_lock(ip);
spl_inode_lock_shared(ip);
cookie = spl_fstrans_mark();
error = -zfs_holey(ip, whence, &offset);
spl_fstrans_unmark(cookie);
if (error == 0)
error = lseek_execute(filp, ip, offset, maxbytes);
spl_inode_unlock(ip);
spl_inode_unlock_shared(ip);
return (error);
}
@ -646,8 +680,6 @@ zpl_fallocate_common(struct inode *ip, int mode, loff_t offset, loff_t len)
if (mode != (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
return (error);
crhold(cr);
if (offset < 0 || len <= 0)
return (-EINVAL);
@ -666,6 +698,7 @@ zpl_fallocate_common(struct inode *ip, int mode, loff_t offset, loff_t len)
bf.l_len = len;
bf.l_pid = 0;
crhold(cr);
cookie = spl_fstrans_mark();
error = -zfs_space(ip, F_FREESP, &bf, FWRITE, offset, cr);
spl_fstrans_unmark(cookie);
@ -725,8 +758,7 @@ zpl_ioctl_getflags(struct file *filp, void __user *arg)
* is outside of our jurisdiction.
*/
#define fchange(f0, f1, b0, b1) ((((f0) & (b0)) == (b0)) != \
(((b1) & (f1)) == (f1)))
#define fchange(f0, f1, b0, b1) (!((f0) & (b0)) != !((f1) & (b1)))
static int
zpl_ioctl_setflags(struct file *filp, void __user *arg)
@ -798,7 +830,17 @@ zpl_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
static long
zpl_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
return (zpl_ioctl(filp, cmd, arg));
switch (cmd) {
case FS_IOC32_GETFLAGS:
cmd = FS_IOC_GETFLAGS;
break;
case FS_IOC32_SETFLAGS:
cmd = FS_IOC_SETFLAGS;
break;
default:
return (-ENOTTY);
}
return (zpl_ioctl(filp, cmd, (unsigned long)compat_ptr(arg)));
}
#endif /* CONFIG_COMPAT */
@ -814,18 +856,24 @@ const struct file_operations zpl_file_operations = {
.open = zpl_open,
.release = zpl_release,
.llseek = zpl_llseek,
.read = zpl_read,
.write = zpl_write,
#ifdef HAVE_VFS_RW_ITERATE
#ifdef HAVE_NEW_SYNC_READ
.read = new_sync_read,
.write = new_sync_write,
#endif
.read_iter = zpl_iter_read,
.write_iter = zpl_iter_write,
#else
.read = do_sync_read,
.write = do_sync_write,
.aio_read = zpl_aio_read,
.aio_write = zpl_aio_write,
#endif
.mmap = zpl_mmap,
.fsync = zpl_fsync,
#ifdef HAVE_FILE_AIO_FSYNC
.aio_fsync = zpl_aio_fsync,
#endif
#ifdef HAVE_FILE_FALLOCATE
.fallocate = zpl_fallocate,
#endif /* HAVE_FILE_FALLOCATE */
@ -838,7 +886,9 @@ const struct file_operations zpl_file_operations = {
const struct file_operations zpl_dir_file_operations = {
.llseek = generic_file_llseek,
.read = generic_read_dir,
#ifdef HAVE_VFS_ITERATE
#ifdef HAVE_VFS_ITERATE_SHARED
.iterate_shared = zpl_iterate,
#elif defined(HAVE_VFS_ITERATE)
.iterate = zpl_iterate,
#else
.readdir = zpl_readdir,

View File

@ -50,7 +50,7 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
int zfs_flags = 0;
zfs_sb_t *zsb = dentry->d_sb->s_fs_info;
if (dlen(dentry) > ZFS_MAXNAMELEN)
if (dlen(dentry) >= ZAP_MAXNAMELEN)
return (ERR_PTR(-ENAMETOOLONG));
crhold(cr);
@ -102,9 +102,13 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
struct dentry *new_dentry;
struct qstr ci_name;
ci_name.name = pn.pn_buf;
ci_name.len = strlen(pn.pn_buf);
new_dentry = d_add_ci(dentry, ip, &ci_name);
if (strcmp(dname(dentry), pn.pn_buf) == 0) {
new_dentry = d_splice_alias(ip, dentry);
} else {
ci_name.name = pn.pn_buf;
ci_name.len = strlen(pn.pn_buf);
new_dentry = d_add_ci(dentry, ip, &ci_name);
}
kmem_free(pn.pn_buf, ZFS_MAXNAMELEN);
return (new_dentry);
} else {
@ -298,18 +302,25 @@ zpl_rmdir(struct inode * dir, struct dentry *dentry)
}
static int
zpl_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
zpl_getattr_impl(const struct path *path, struct kstat *stat, u32 request_mask,
unsigned int query_flags)
{
int error;
fstrans_cookie_t cookie;
cookie = spl_fstrans_mark();
error = -zfs_getattr_fast(dentry->d_inode, stat);
/*
* XXX request_mask and query_flags currently ignored.
*/
error = -zfs_getattr_fast(path->dentry->d_inode, stat);
spl_fstrans_unmark(cookie);
ASSERT3S(error, <=, 0);
return (error);
}
ZPL_GETATTR_WRAPPER(zpl_getattr);
static int
zpl_setattr(struct dentry *dentry, struct iattr *ia)
@ -320,7 +331,7 @@ zpl_setattr(struct dentry *dentry, struct iattr *ia)
int error;
fstrans_cookie_t cookie;
error = inode_change_ok(ip, ia);
error = setattr_prepare(dentry, ia);
if (error)
return (error);
@ -335,6 +346,9 @@ zpl_setattr(struct dentry *dentry, struct iattr *ia)
vap->va_mtime = ia->ia_mtime;
vap->va_ctime = ia->ia_ctime;
if (vap->va_mask & ATTR_ATIME)
ip->i_atime = ia->ia_atime;
cookie = spl_fstrans_mark();
error = -zfs_setattr(ip, vap, 0, cr);
if (!error && (ia->ia_valid & ATTR_MODE))
@ -349,13 +363,17 @@ zpl_setattr(struct dentry *dentry, struct iattr *ia)
}
static int
zpl_rename(struct inode *sdip, struct dentry *sdentry,
struct inode *tdip, struct dentry *tdentry)
zpl_rename2(struct inode *sdip, struct dentry *sdentry,
struct inode *tdip, struct dentry *tdentry, unsigned int flags)
{
cred_t *cr = CRED();
int error;
fstrans_cookie_t cookie;
/* We don't have renameat2(2) support */
if (flags)
return (-EINVAL);
crhold(cr);
cookie = spl_fstrans_mark();
error = -zfs_rename(sdip, dname(sdentry), tdip, dname(tdentry), cr, 0);
@ -366,6 +384,15 @@ zpl_rename(struct inode *sdip, struct dentry *sdentry,
return (error);
}
#ifndef HAVE_RENAME_WANTS_FLAGS
static int
zpl_rename(struct inode *sdip, struct dentry *sdentry,
struct inode *tdip, struct dentry *tdentry)
{
return (zpl_rename2(sdip, sdentry, tdip, tdentry, 0));
}
#endif
static int
zpl_symlink(struct inode *dir, struct dentry *dentry, const char *name)
{
@ -530,7 +557,7 @@ zpl_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
return (-EMLINK);
crhold(cr);
ip->i_ctime = CURRENT_TIME_SEC;
ip->i_ctime = current_time(ip);
igrab(ip); /* Use ihold() if available */
cookie = spl_fstrans_mark();
@ -642,19 +669,13 @@ zpl_revalidate(struct dentry *dentry, unsigned int flags)
}
const struct inode_operations zpl_inode_operations = {
.create = zpl_create,
.link = zpl_link,
.unlink = zpl_unlink,
.symlink = zpl_symlink,
.mkdir = zpl_mkdir,
.rmdir = zpl_rmdir,
.mknod = zpl_mknod,
.rename = zpl_rename,
.setattr = zpl_setattr,
.getattr = zpl_getattr,
#ifdef HAVE_GENERIC_SETXATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.removexattr = generic_removexattr,
#endif
.listxattr = zpl_xattr_list,
#ifdef HAVE_INODE_TRUNCATE_RANGE
.truncate_range = zpl_truncate_range,
@ -663,6 +684,9 @@ const struct inode_operations zpl_inode_operations = {
.fallocate = zpl_fallocate,
#endif /* HAVE_INODE_FALLOCATE */
#if defined(CONFIG_FS_POSIX_ACL)
#if defined(HAVE_SET_ACL)
.set_acl = zpl_set_acl,
#endif
#if defined(HAVE_GET_ACL)
.get_acl = zpl_get_acl,
#elif defined(HAVE_CHECK_ACL)
@ -682,14 +706,23 @@ const struct inode_operations zpl_dir_inode_operations = {
.mkdir = zpl_mkdir,
.rmdir = zpl_rmdir,
.mknod = zpl_mknod,
#ifdef HAVE_RENAME_WANTS_FLAGS
.rename = zpl_rename2,
#else
.rename = zpl_rename,
#endif
.setattr = zpl_setattr,
.getattr = zpl_getattr,
#ifdef HAVE_GENERIC_SETXATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.removexattr = generic_removexattr,
#endif
.listxattr = zpl_xattr_list,
#if defined(CONFIG_FS_POSIX_ACL)
#if defined(HAVE_SET_ACL)
.set_acl = zpl_set_acl,
#endif
#if defined(HAVE_GET_ACL)
.get_acl = zpl_get_acl,
#elif defined(HAVE_CHECK_ACL)
@ -701,7 +734,9 @@ const struct inode_operations zpl_dir_inode_operations = {
};
const struct inode_operations zpl_symlink_inode_operations = {
#ifdef HAVE_GENERIC_READLINK
.readlink = generic_readlink,
#endif
#if defined(HAVE_GET_LINK_DELAYED) || defined(HAVE_GET_LINK_COOKIE)
.get_link = zpl_get_link,
#elif defined(HAVE_FOLLOW_LINK_COOKIE) || defined(HAVE_FOLLOW_LINK_NAMEIDATA)
@ -712,20 +747,27 @@ const struct inode_operations zpl_symlink_inode_operations = {
#endif
.setattr = zpl_setattr,
.getattr = zpl_getattr,
#ifdef HAVE_GENERIC_SETXATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.removexattr = generic_removexattr,
#endif
.listxattr = zpl_xattr_list,
};
const struct inode_operations zpl_special_inode_operations = {
.setattr = zpl_setattr,
.getattr = zpl_getattr,
#ifdef HAVE_GENERIC_SETXATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.removexattr = generic_removexattr,
#endif
.listxattr = zpl_xattr_list,
#if defined(CONFIG_FS_POSIX_ACL)
#if defined(HAVE_SET_ACL)
.set_acl = zpl_set_acl,
#endif
#if defined(HAVE_GET_ACL)
.get_acl = zpl_get_acl,
#elif defined(HAVE_CHECK_ACL)

View File

@ -336,12 +336,12 @@ zpl_parse_options(char *osname, char *mntopts, zfs_mntopts_t *zmo,
if (mntopts) {
substring_t args[MAX_OPT_ARGS];
char *tmp_mntopts, *p;
char *tmp_mntopts, *p, *t;
int token;
tmp_mntopts = strdup(mntopts);
t = tmp_mntopts = strdup(mntopts);
while ((p = strsep(&tmp_mntopts, ",")) != NULL) {
while ((p = strsep(&t, ",")) != NULL) {
if (!*p)
continue;

Some files were not shown because too many files have changed in this diff Show More