Compare commits

...

39 Commits

Author SHA1 Message Date
Tony Hutter a8c2b7ebc6 Tag zfs-0.7.13
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2019-02-22 09:47:55 -08:00
John Wren Kennedy 2af898ee24 test-runner: python3 support
Updated to be compatible with Python 2.6, 2.7, 3.5 or newer.

Reviewed-by: John Ramsden <johnramsden@riseup.net>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #8096
2019-02-22 09:47:34 -08:00
Gregor Kopka c32c2f17d0 Fix flake8 style warnings
Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result and fixed:
- 'maxint' -> 'maxsize', which 2to3 missed.
- replaced the 'cmp=' parameter of 'sorted()' with a 'key=' version.
- wrapped the configparser import in try/except, as there are still
  Python 2.7 systems that lack a compatibility shim

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7925
Closes #7952
2019-02-22 09:47:34 -08:00
Tony Hutter 2254b2bbbe GCC 9.0: Fix ztest "directive argument is not a nul-terminated string"
GCC 9.0 is complaining because we're trying to print strings that
are defined like this:

.zo_pool = { 'z', 't', 'e', 's', 't', '\0' },

Fix them by making them actual strings.
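
The change amounts to this (visible in the ztest hunk further down):

-.zo_pool = { 'z', 't', 'e', 's', 't', '\0' },
+.zo_pool = "ztest",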

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8330
2019-02-22 09:47:34 -08:00
Brian Behlendorf 5c4ec382a7 Linux 5.0 compat: Fix bio_set_dev()
The Linux 5.0 kernel updated the bio_set_dev() macro so it calls the
GPL-only bio_associate_blkg() symbol, thus inadvertently converting
the entire macro to GPL-only.  Provide a minimal version which always
assigns the request queue's root_blkg to the bio.
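
A minimal sketch of the idea (illustrative only, assuming the 5.0
struct bio layout and CONFIG_BLK_CGROUP; the real compat code also
handles blkg reference counting):

#ifdef HAVE_BIO_SET_DEV_GPL_ONLY
/* assign the device and the queue's root_blkg without calling
 * the GPL-only bio_associate_blkg() */
#define bio_set_dev(bio, bdev) do { \
    (bio)->bi_disk = (bdev)->bd_disk; \
    (bio)->bi_partno = (bdev)->bd_partno; \
    (bio)->bi_blkg = bdev_get_queue(bdev)->root_blkg; \
} while (0)
#endif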

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8287
2019-02-22 09:47:34 -08:00
Tony Hutter e22bfd8149 Linux 5.0 compat: Disable vector instructions on 5.0+ kernels
The 5.0 kernel no longer exports the functions we need to do vector
(SSE/SSE2/SSE3/AVX...) instructions.  Disable vector-based checksum
algorithms when building against those kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8259
2019-02-22 09:47:34 -08:00
Tony Hutter f45ad7bff6 Linux 5.0 compat: Fix SUBDIRs
SUBDIRs has been deprecated for a long time, and was finally removed in
the 5.0 kernel.  Use "M=" instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8257
2019-02-22 09:47:34 -08:00
Tony Hutter 0a3a4d067a Linux 5.0 compat: Convert MS_* macros to SB_*
In the 5.0 kernel, only the mount namespace code should use the MS_*
macros. Filesystems should use the SB_* ones.

https://patchwork.kernel.org/patch/10552493/
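
A typical conversion looks like:

-if (sb->s_flags & MS_RDONLY)
+if (sb->s_flags & SB_RDONLY)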

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8264
2019-02-22 09:47:34 -08:00
Tony Hutter ba8024a284 Linux 5.0 compat: Use totalram_pages()
totalram_pages() was converted to an atomic variable in 5.0:

https://patchwork.kernel.org/patch/10652795/

Its value should now be read through the totalram_pages() helper
function.
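
A compat shim for this typically looks like (hedged sketch; the
HAVE_* symbol name is illustrative, not the actual configure result):

#ifdef HAVE_TOTALRAM_PAGES_FUNC
#define zfs_totalram_pages totalram_pages()
#else
#define zfs_totalram_pages totalram_pages
#endif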

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8263
2019-02-22 09:47:34 -08:00
Tony Hutter edc2675aed Linux 5.0 compat: access_ok() drops 'type' parameter
access_ok no longer needs a 'type' parameter in the 5.0 kernel.
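
Calls change along these lines (the zfs_access_ok() compat wrapper
in the kmap header hunk below keeps both styles working):

-error = access_ok(VERIFY_READ, addr, size);
+error = access_ok(addr, size);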

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8261
2019-02-22 09:47:34 -08:00
ilbsmart 98bb45e27a deadlock between mm_sem and tx assign in zfs_write() and page fault
The bug time sequence:
1. Thread #1 in `zfs_write` assigns a txg "n".
2. In the same process, thread #2 takes an mmap page fault (which
   means `mm_sem` is held); `zfs_dirty_inode` fails to open a txg
   and waits for the previous txg "n" to complete.
3. Thread #1 calls `uiomove` to write, but a page fault occurs in
   `uiomove`, which means it needs `mm_sem`; `mm_sem` is held by
   thread #2, so thread #1 is stuck and cannot complete, and txg
   "n" will never complete.

So thread #1 and thread #2 are deadlocked.
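
One standard way to break this deadlock (a hedged sketch of the
idea, not necessarily the literal patch) is to prefault the user
pages before assigning the txg, so the page fault cannot occur
while the tx is open:

/* sketch: touch the user pages up front so uiomove() cannot
 * page-fault (and block on mm_sem) once the tx is assigned */
uio_prefaultpages(nbytes, uio);
tx = dmu_tx_create(zfsvfs->z_os);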

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Grady Wong <grady.w@xtaotech.com>
Closes #7939
2019-02-22 09:47:34 -08:00
Neal Gompa (ニール・ゴンパ) 44f463824b dkms: Enable debuginfo option to be set with zfs sysconfig file
On some Linux distributions, the kernel module build will not
default to building with debuginfo symbols, which can make
debugging and testing difficult.

For this case, we provide a flag to override the build to force
debuginfo to be produced for the kernel module build.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Neal Gompa <ngompa@datto.com>
Co-authored-by: Simon Watson <swatson@datto.com>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Simon Watson <swatson@datto.com>
Closes #8304
2019-02-22 09:47:34 -08:00
Neal Gompa (ニール・ゴンパ) b0d579bc55 Bump commit subject length to 72 characters
There's no longer a compelling reason to keep the subject length so
short; the original limit existed to make summary renders of the git
log look nice. With 72 characters this still works out fine, so raise
the limit to allow slightly more descriptive change summaries.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes #8250
2019-02-22 09:47:34 -08:00
Benjamin Gentil 7e5def8ae0 zfs.8 uses wrong snapshot names in Example 15
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: Benjamin Gentil <benjamin@gentil.io>
Closes #8241
2019-02-22 09:47:34 -08:00
Tony Hutter 89019a846b Add enclosure_symlinks option to vdev_id
Add an 'enclosure_symlinks' option to vdev_id.conf.  This creates
consistently named symlinks to the enclosure devices (/dev/sg*) based
on the configuration in vdev_id.conf.  The enclosure symlinks show
up in /dev/by-enclosure/<prefix>-<channel><num>.  The links make it
easy to run sg_ses on a particular enclosure device.  The enclosure
links are created in addition to the normal /dev/disk/by-vdev links.

'enclosure_symlinks' is only valid in sas_direct configurations.
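
For example, with the sample configuration added in this change:

enclosure_symlinks yes
# PCI_ID HBA PORT CHANNEL NAME
channel 85:00.0 1 A

the first enclosure on channel A would show up as
/dev/by-enclosure/enc-A0 (using the default "enc" prefix).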

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Simon Guest <simon.guest@tesujimath.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8194
2019-02-22 09:47:34 -08:00
Simon Guest 41f7723e9c vdev_id: new slot type ses
This extends vdev_id to support a new slot type, ses, for SCSI Enclosure
Services.  With slot type ses, the disk slot numbers are determined by
using the device slot number reported by sg_ses for the device with
matching SAS address, found by querying all available enclosures.

This is primarily of use on systems with a deficient driver omitting
support for bay_identifier in /sys/devices.  In my testing, I found that
the existing slot types of port and id were not stable across disk
replacement, so an alternative was required.
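
A minimal vdev_id.conf using the new slot type might look like this
(illustrative):

topology sas_direct
slot ses
# PCI_ID HBA PORT CHANNEL NAME
channel 85:00.0 1 A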

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Simon Guest <simon.guest@tesujimath.org>
Closes #6956
2019-02-22 09:47:34 -08:00
Simon Guest 2b8c3cb0c8 vdev_id: extension for new scsi topology
On systems with SCSI rather than SAS disk topology, this change enables
the vdev_id script to match against the block device path, and therefore
create a vdev alias in /dev/disk/by-vdev.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Simon Guest <simon.guest@tesujimath.org>
Closes #6592
2019-02-22 09:47:34 -08:00
Olaf Faaland f325d76e96 Rename macro ZFS_MINOR due to Lustre conflict
Macro ZFS_MINOR, introduced in commit a6cc9756 to record the chosen
static minor number for /dev/zfs, conflicts with an existing macro
in Lustre.  The Lustre macro (along with _MAJOR, _PATCH, _FIX) is
used to record the zfsonlinux version Lustre is being built against.

Since the Lustre macro came first, and is used in past versions of
Lustre at least going back to 2.10, it makes sense to rename the
macro in ZFS instead of doing so in Lustre, which would require
backporting the patch.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #8195
2019-02-22 09:47:34 -08:00
Brian Behlendorf e3fb781c5f Add kernel module auto-loading
Historically a dynamic misc minor number was registered for the
/dev/zfs device in order to prevent minor number collisions.  This
was fine, but it prevented us from being able to use kernel module
auto-loading, which requires a known reserved value.

Resolve this issue by adding a configure test to find an available
misc minor number which can then be used in MODULE_ALIAS_MISCDEV at
build time.  By adding this alias the zfs kmod is added to the list
of known static-nodes and the systemd-tmpfiles-setup-dev service
will create a /dev/zfs character device at boot time.

This in turn allows us to update the 90-zfs.rules file to make it
aware this is a static node.  The upshot is that whenever a
process (zpool, zfs, zed) opens /dev/zfs, the kmods will be
automatically loaded.  This even works for unprivileged users, so
there is no longer a need to manually load the modules at boot time.
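
The mechanism boils down to the stock Linux misc-device API (a
sketch; the fops variable name is assumed):

#include <linux/miscdevice.h>

static struct miscdevice zfs_misc = {
    .minor = ZFS_DEVICE_MINOR, /* reserved value found by configure */
    .name = "zfs",
    .fops = &zfs_fops,         /* assumed name */
};
MODULE_ALIAS_MISCDEV(ZFS_DEVICE_MINOR); /* lets the kmod auto-load on open */

/* fall back to a dynamic minor if the reserved one is taken */
error = misc_register(&zfs_misc);
if (error == -EBUSY) {
    zfs_misc.minor = MISC_DYNAMIC_MINOR;
    error = misc_register(&zfs_misc);
}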

As an additional bonus, zed no longer needs to start after
zfs-import.service since it will trigger the module load.

In the unlikely event the minor number we selected conflicts with
another out-of-tree, unregistered minor number, the code falls back
to dynamically allocating it.  In this case the modules again
must be manually loaded.

Note that due to the change in the method of registering the minor
number the zimport.sh test case may incorrectly fail when the
static node for the installed packages is created instead of the
dynamic one.  This issue will only transiently impact zimport.sh
for this single commit when we transition and are mixing and
matching methods.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes #7287
2019-02-22 09:47:34 -08:00
Ben Wolsieffer 14a5e48fb9 Use autoconf variable for C preprocessor
This fixes the build when cross-compiling, where the preprocessor might
be prefixed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ben Wolsieffer <benwolsieffer@gmail.com>
Closes #8180
2019-02-22 09:47:34 -08:00
Matthew Ahrens 01937958ce OpenZFS 9577 - remove zfs_dbuf_evict_key tsd
The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary -
we can instead pass a flag down in a few places to prevent recursive
dbuf eviction. Making this change has 3 benefits:

1. The code semantics are easier to understand.
2. On Linux, performance is improved, because creating/removing
   TSD values (by setting to NULL vs non-NULL) is expensive, and
   we do it very often.
3. According to Nexenta, the current semantics can cause a
   deadlock when concurrently calling dmu_objset_evict_dbufs()
   (which is rare today, but they are working on a "parallel
   unmount" change that triggers this more easily):

Porting Notes:
* Minor conflict with OpenZFS 9337 which has not yet been ported.
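
The shape of the change is roughly (hedged sketch; an 'evicting'
flag is threaded through the dbuf release path instead of
consulting TSD):

-void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag);
+void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag, boolean_t evicting);
 /* callers on the eviction path pass B_TRUE so the dbuf code
  * knows not to recurse into eviction */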

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9577
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/645
External-issue: DLPX-58547
Closes #7602
2019-02-22 09:47:34 -08:00
LOLi edb504f9db Honor --with-mounthelperdir where applicable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6962
2019-02-22 09:47:34 -08:00
LOLi 2428fbbfcf contrib/initramfs: switch to automake
Use automake to build initramfs scripts and hooks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6761
2019-02-22 09:47:33 -08:00
Tony Hutter 16d298188f Tag zfs-0.7.12
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-11-08 14:38:37 -08:00
Tony Hutter f42f8702ce Add BuildRequires gcc, make, elfutils-libelf-devel
This adds a BuildRequires for gcc, make, and elfutils-libelf-devel
to our spec files.  gcc has been a packaging requirement for
a while now:

https://fedoraproject.org/wiki/Packaging:C_and_C%2B%2B

These additional BuildRequires allow us to mock build in
Fedora 29.
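
In the spec file this amounts to (sketch):

BuildRequires: gcc, make
BuildRequires: elfutils-libelf-devel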

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Tony Hutter <hutter2@llnl.gov>
Closes #8095
Closes #8102
2018-11-08 14:38:28 -08:00
Brian Behlendorf 9e58d5ef38 Fix flake8 "invalid escape sequence 'x'" warning
From https://lintlyci.github.io/Flake8Rules/rules/W605.html

As of Python 3.6, a backslash-character pair that is not a valid
escape sequence generates a DeprecationWarning (e.g. "\d" should be
written as the raw string r"\d"). Although this will eventually
become a SyntaxError, that will not be for several Python releases.

Note 'float_pobj' was simply removed from arcstat.py since it
was entirely unused.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8056
2018-11-08 14:38:28 -08:00
Brian Behlendorf 320f9de8ab ZTS: Update O_TMPFILE support check
In CentOS 7.5 the kernel provided a compatibility wrapper to support
O_TMPFILE.  This results in the test setup script correctly detecting
kernel support.  But the ZFS module was built without O_TMPFILE
support due to the non-standard CentOS kernel interface.

Handle this case by updating the setup check to fail when either
the kernel or the ZFS module fails to provide support.  The reason
will be clearly logged in the test results.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7528
2018-11-08 14:38:28 -08:00
George Melikov 262275ab26 Allow use of pool GUID as root pool
This is helpful when pools with the same name exist but you need
to use only one of them.  The main case is twin servers where some
software requires a particular pool name (e.g. Proxmox).
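
Usage sketch (kernel command line; the GUID value is illustrative):

root=zfs:9271763542439971650/ROOT/ubuntu-1

The initramfs resolves the GUID back to the pool name before
mounting; see the scripts/zfs hunk below.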

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Igor ‘guardian’ Lidin of Moscow, Russia
Closes #8052
2018-11-08 14:38:28 -08:00
Brian Behlendorf 55f39a01e6 Fix arc_release() refcount
Update arc_release to use arc_buf_size().  This hunk was accidentally
dropped when porting compressed send/recv, 2aa34383b.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8000
2018-11-08 14:38:28 -08:00
Tim Schumacher b884768e46 Prefix all refcount functions with zfs_
Recent changes in the Linux kernel made it necessary to prefix
the refcount_add() function with zfs_ due to a name collision.

To bring the other functions in line with that and to avoid future
collisions, prefix the other refcount functions as well.
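
For example (taken from the ztest hunk further down):

-refcount_add(&zp->z_refcnt, RL_TAG);
+zfs_refcount_add(&zp->z_refcnt, RL_TAG);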

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7963
2018-11-08 14:38:28 -08:00
Tim Schumacher f8f4e13776 Linux 4.19-rc3+ compat: Remove refcount_t compat
torvalds/linux@59b57717f ("blkcg: delay blkg destruction until
after writeback has finished") added a refcount_t to the blkcg
structure. Due to the refcount_t compatibility code, zfs_refcount_t
was used by mistake.

Resolve this by removing the compatibility code and replacing the
occurrences of refcount_t with zfs_refcount_t.

Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7885
Closes #7932
2018-11-08 14:38:28 -08:00
Gregor Kopka 5f07d51751 Zpool iostat: remove latency/queue scaling
Bandwidth and iops are averages per second, while *_wait values are
per-request averages for latency or, for queue depths, an
instantaneous measurement at the end of an interval (according to
man zpool).

When calculating the first two it makes sense to do
x/interval_duration (x being the increase in total bytes or number of
requests over the duration of the interval, interval_duration in
seconds) to 'scale' from amount/interval_duration to amount/second.

But applying the same math to the latter (*_wait latencies/queues) is
wrong, as there is no interval_duration component in those values
(they are time/requests, yielding average_time/request, or already an
absolute number).

Because of this bug, the only correct continuous *_wait figures for
both latencies and queue depths from 'zpool iostat -l/q' are those
with duration=1, where the wrong math cancels itself out (x/1 is a
no-op).
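
A worked example: over a 5 second interval with 1000 requests and
2 seconds of total wait, the correct average is 2 s / 1000 = 2 ms
per request; dividing that again by the 5 s interval would wrongly
report 0.4 ms. Only at duration=1 do the two computations agree.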

This removes temporal scaling from latency and queue depth figures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7945
Closes #7694
2018-11-08 14:38:28 -08:00
Brian Behlendorf b2f003c4f4 Fix statfs(2) for 32-bit user space
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size is 128KiB.  But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.
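
A hedged sketch of the scaling idea (illustrative, not the literal
patch):

/* halve the block counts and double the reported block size
 * until the counts fit in a 32-bit unsigned long */
while (statp->f_blocks > UINT32_MAX) {
    statp->f_bsize <<= 1;
    statp->f_blocks >>= 1;
    statp->f_bfree >>= 1;
    statp->f_bavail >>= 1;
}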

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7927
Closes #7122
Closes #7937
2018-11-08 14:38:28 -08:00
Olaf Faaland 9014da2b01 Skip import activity test in more zdb code paths
Since zdb opens the pools read-only, it cannot damage the pool in the
event the pool is already imported either on the same host or on
another one.

If the pool vdev structure is changing while zdb is importing the
pool, it may cause zdb to crash.  However this is unlikely, and in any
case it's a user space process and can simply be run again.

For this reason, zdb should disable the multihost activity test on
import that is normally run.

This commit fixes a few zdb code paths where that had been overlooked.
It also adds tests to ensure that several common use cases handle this
properly in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gu Zheng <guzheng2331314@163.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7797
Closes #7801
2018-11-08 14:38:28 -08:00
Matthew Ahrens 45579c9515 Reduce taskq and context-switch cost of zio pipe
When doing a read from disk, ZFS creates 3 ZIOs: a zio_null(), the
logical zio_read(), and then a physical zio. Currently, each of these
results in a separate taskq_dispatch(zio_execute).

On high-read-iops workloads, this causes a significant performance
impact. By processing all 3 ZIOs in a single taskq entry, we reduce the
overhead on taskq locking and context switching.  We accomplish this by
allowing zio_done() to return a "next zio to execute" to zio_execute().
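
Conceptually (a hedged sketch; the stage-function name is
illustrative):

/* instead of taskq_dispatch()ing each stage separately, keep
 * executing whatever zio_done() hands back in this taskq entry */
while (zio != NULL)
    zio = zio_execute_stage(zio);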

This results in a ~12% performance increase for random reads, from
96,000 iops to 108,000 iops (with recordsize=8k, on SSDs).

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59292
Closes #7736
2018-11-08 14:38:28 -08:00
Tom Caputi b32f1279d4 Fix race in dnode_check_slots_free()
Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the user accounting code
is run, during which the type is set to DMU_OT_NONE but the dnode
is not yet evictable, leading to crashes. This patch adds the
ability for dnodes to track which txg they were last dirtied in
and adds a check for this before performing the reclaim.
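
A sketch of the added check (field and helper names as assumed from
the description):

/* a dnode dirtied in a txg that has not fully synced is not yet
 * safe to reclaim, even though dn_type is already DMU_OT_NONE */
if (dn->dn_dirty_txg >= spa_syncing_txg(spa))
    return (B_FALSE); /* slot is not reclaimable yet */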

This patch also corrects several instances where dn_dirty_link was
treated as a list_node_t when it is technically a multilist_node_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7147
Closes #7388
2018-11-08 14:38:28 -08:00
Tony Hutter 1b0cd07131 Tag zfs-0.7.11
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-09-13 10:13:41 -07:00
Dr. András Korn 8c6867dae4 tx_waited -> tx_dirty_delayed in trace_dmu.h
This change was missed in 0735ecb334.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: András Korn <korn-github.com@elan.rulez.org>
Closes #7096
2018-09-13 10:12:22 -07:00
Tony Hutter 99310c0aa0 Revert "zpool reopen should detect expanded devices"
This reverts commit 2a16d4cfaf.

The commit was causing an "attempt to access beyond the end
of device" error:

list.zfsonlinux.org/pipermail/zfs-discuss/2018-September/032217.html
2018-09-13 10:11:42 -07:00
102 changed files with 1867 additions and 1045 deletions

View File

@ -161,7 +161,7 @@ coding convention.
### Commit Message Formats
#### New Changes
Commit messages for new changes must meet the following guidelines:
-* In 50 characters or less, provide a summary of the change as the
+* In 72 characters or less, provide a summary of the change as the
first line in the commit message.
* A body which provides a description of the change. If necessary,
please summarize important information such as why the proposed

META
View File

@ -1,7 +1,7 @@
Meta: 1
Name: zfs
Branch: 1.0
-Version: 0.7.10
+Version: 0.7.13
Release: 1
Release-Tags: relext
License: CDDL

View File

@ -112,7 +112,6 @@ cur = {}
d = {}
out = None
kstat = None
-float_pobj = re.compile("^[0-9]+(\.[0-9]+)?$")
def detailed_usage():

View File

@ -7,6 +7,8 @@ DEFAULT_INCLUDES += \
#
# Ignore the prefix for the mount helper. It must be installed in /sbin/
# because this path is hardcoded in the mount(8) for security reasons.
+# However, if needed, the configure option --with-mounthelperdir= can be used
+# to override the default install location.
#
sbindir=$(mounthelperdir)
sbin_PROGRAMS = mount.zfs

View File

@ -100,10 +100,11 @@ usage() {
cat << EOF
Usage: vdev_id [-h]
vdev_id <-d device> [-c config_file] [-p phys_per_port]
-[-g sas_direct|sas_switch] [-m]
+[-g sas_direct|sas_switch|scsi] [-m]
-c specify name of alernate config file [default=$CONFIG]
-d specify basename of device (i.e. sda)
+-e Create enclose device symlinks only (/dev/by-enclosure)
-g Storage network topology [default="$TOPOLOGY"]
-m Run in multipath mode
-p number of phy's per switch port [default=$PHYS_PER_PORT]
@ -135,7 +136,7 @@ map_channel() {
MAPPED_CHAN=`awk "\\$1 == \"channel\" && \\$2 == ${PORT} \
{ print \\$3; exit }" $CONFIG`
;;
-"sas_direct")
+"sas_direct"|"scsi")
MAPPED_CHAN=`awk "\\$1 == \"channel\" && \
\\$2 == \"${PCI_ID}\" && \\$3 == ${PORT} \
{ print \\$4; exit }" $CONFIG`
@ -276,6 +277,23 @@ sas_handler() {
d=$(eval echo \${$i})
SLOT=`echo $d | sed -e 's/^.*://'`
;;
"ses")
# look for this SAS path in all SCSI Enclosure Services
# (SES) enclosures
sas_address=`cat $end_device_dir/sas_address 2>/dev/null`
enclosures=`lsscsi -g | \
sed -n -e '/enclosu/s/^.* \([^ ][^ ]*\) *$/\1/p'`
for enclosure in $enclosures; do
set -- $(sg_ses -p aes $enclosure | \
awk "/device slot number:/{slot=\$12} \
/SAS address: $sas_address/\
{print slot}")
SLOT=$1
if [ -n "$SLOT" ] ; then
break
fi
done
;;
esac
if [ -z "$SLOT" ] ; then
return
@ -289,6 +307,156 @@ sas_handler() {
echo ${CHAN}${SLOT}${PART}
}
scsi_handler() {
if [ -z "$FIRST_BAY_NUMBER" ] ; then
FIRST_BAY_NUMBER=`awk "\\$1 == \"first_bay_number\" \
{print \\$2; exit}" $CONFIG`
fi
FIRST_BAY_NUMBER=${FIRST_BAY_NUMBER:-0}
if [ -z "$PHYS_PER_PORT" ] ; then
PHYS_PER_PORT=`awk "\\$1 == \"phys_per_port\" \
{print \\$2; exit}" $CONFIG`
fi
PHYS_PER_PORT=${PHYS_PER_PORT:-4}
if ! echo $PHYS_PER_PORT | grep -q -E '^[0-9]+$' ; then
echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric"
exit 1
fi
if [ -z "$MULTIPATH_MODE" ] ; then
MULTIPATH_MODE=`awk "\\$1 == \"multipath\" \
{print \\$2; exit}" $CONFIG`
fi
# Use first running component device if we're handling a dm-mpath device
if [ "$MULTIPATH_MODE" = "yes" ] ; then
# If udev didn't tell us the UUID via DM_NAME, check /dev/mapper
if [ -z "$DM_NAME" ] ; then
DM_NAME=`ls -l --full-time /dev/mapper |
awk "/\/$DEV$/{print \\$9}"`
fi
# For raw disks udev exports DEVTYPE=partition when
# handling partitions, and the rules can be written to
# take advantage of this to append a -part suffix. For
# dm devices we get DEVTYPE=disk even for partitions so
# we have to append the -part suffix directly in the
# helper.
if [ "$DEVTYPE" != "partition" ] ; then
PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
fi
# Strip off partition information.
DM_NAME=`echo $DM_NAME | sed 's/p[0-9][0-9]*$//'`
if [ -z "$DM_NAME" ] ; then
return
fi
# Get the raw scsi device name from multipath -ll. Strip off
# leading pipe symbols to make field numbering consistent.
DEV=`multipath -ll $DM_NAME |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
if [ -z "$DEV" ] ; then
return
fi
fi
if echo $DEV | grep -q ^/devices/ ; then
sys_path=$DEV
else
sys_path=`udevadm info -q path -p /sys/block/$DEV 2>/dev/null`
fi
# expect sys_path like this, for example:
# /devices/pci0000:00/0000:00:0b.0/0000:09:00.0/0000:0a:05.0/0000:0c:00.0/host3/target3:1:0/3:1:0:21/block/sdv
# Use positional parameters as an ad-hoc array
set -- $(echo "$sys_path" | tr / ' ')
num_dirs=$#
scsi_host_dir="/sys"
# Get path up to /sys/.../hostX
i=1
while [ $i -le $num_dirs ] ; do
d=$(eval echo \${$i})
scsi_host_dir="$scsi_host_dir/$d"
echo $d | grep -q -E '^host[0-9]+$' && break
i=$(($i + 1))
done
if [ $i = $num_dirs ] ; then
return
fi
PCI_ID=$(eval echo \${$(($i -1))} | awk -F: '{print $2":"$3}')
# In scsi mode, the directory two levels beneath
# /sys/.../hostX reveals the port and slot.
port_dir=$scsi_host_dir
j=$(($i + 2))
i=$(($i + 1))
while [ $i -le $j ] ; do
port_dir="$port_dir/$(eval echo \${$i})"
i=$(($i + 1))
done
set -- $(echo $port_dir | sed -e 's/^.*:\([^:]*\):\([^:]*\)$/\1 \2/')
PORT=$1
SLOT=$(($2 + $FIRST_BAY_NUMBER))
if [ -z "$SLOT" ] ; then
return
fi
CHAN=`map_channel $PCI_ID $PORT`
SLOT=`map_slot $SLOT $CHAN`
if [ -z "$CHAN" ] ; then
return
fi
echo ${CHAN}${SLOT}${PART}
}
# Figure out the name for the enclosure symlink
enclosure_handler () {
# We get all the info we need from udev's DEVPATH variable:
#
# DEVPATH=/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/subsystem/devices/0:0:0:0/scsi_generic/sg0
# Get the enclosure ID ("0:0:0:0")
ENC=$(basename $(readlink -m "/sys/$DEVPATH/../.."))
if [ ! -d /sys/class/enclosure/$ENC ] ; then
# Not an enclosure, bail out
return
fi
# Get the long sysfs device path to our enclosure. Looks like:
# /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/ ... /enclosure/0:0:0:0
ENC_DEVICE=$(readlink /sys/class/enclosure/$ENC)
# Grab the full path to the hosts port dir:
# /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0
PORT_DIR=$(echo $ENC_DEVICE | grep -Eo '.+host[0-9]+/port-[0-9]+:[0-9]+')
# Get the port number
PORT_ID=$(echo $PORT_DIR | grep -Eo "[0-9]+$")
# The PCI directory is two directories up from the port directory
# /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0
PCI_ID_LONG=$(basename $(readlink -m "/sys/$PORT_DIR/../.."))
# Strip down the PCI address from 0000:05:00.0 to 05:00.0
PCI_ID=$(echo "$PCI_ID_LONG" | sed -r 's/^[0-9]+://g')
# Name our device according to vdev_id.conf (like "L0" or "U1").
NAME=$(awk "/channel/{if (\$1 == \"channel\" && \$2 == \"$PCI_ID\" && \
\$3 == \"$PORT_ID\") {print \$4int(count[\$4])}; count[\$4]++}" $CONFIG)
echo "${NAME}"
}
alias_handler () {
# Special handling is needed to correctly append a -part suffix
# to partitions of device mapper devices. The DEVTYPE attribute
@ -344,7 +512,7 @@ alias_handler () {
done
}
-while getopts 'c:d:g:mp:h' OPTION; do
+while getopts 'c:d:eg:mp:h' OPTION; do
case ${OPTION} in
c)
CONFIG=${OPTARG}
@ -352,6 +520,16 @@ while getopts 'c:d:g:mp:h' OPTION; do
d)
DEV=${OPTARG}
;;
e)
# When udev sees a scsi_generic device, it calls this script with -e to
# create the enclosure device symlinks only. We also need
# "enclosure_symlinks yes" set in vdev_id.config to actually create the
# symlink.
ENCLOSURE_MODE=$(awk '{if ($1 == "enclosure_symlinks") print $2}' $CONFIG)
if [ "$ENCLOSURE_MODE" != "yes" ] ; then
exit 0
fi
;;
g)
TOPOLOGY=$OPTARG
;;
@ -371,7 +549,7 @@ if [ ! -r $CONFIG ] ; then
exit 0
fi
-if [ -z "$DEV" ] ; then
+if [ -z "$DEV" -a -z "$ENCLOSURE_MODE" ] ; then
echo "Error: missing required option -d"
exit 1
fi
@ -384,16 +562,37 @@ if [ -z "$BAY" ] ; then
BAY=`awk "\\$1 == \"slot\" {print \\$2; exit}" $CONFIG`
fi
+TOPOLOGY=${TOPOLOGY:-sas_direct}
# Should we create /dev/by-enclosure symlinks?
if [ "$ENCLOSURE_MODE" = "yes" -a "$TOPOLOGY" = "sas_direct" ] ; then
ID_ENCLOSURE=$(enclosure_handler)
if [ -z "$ID_ENCLOSURE" ] ; then
exit 0
fi
# Just create the symlinks to the enclosure devices and then exit.
ENCLOSURE_PREFIX=$(awk '/enclosure_symlinks_prefix/{print $2}' $CONFIG)
if [ -z "$ENCLOSURE_PREFIX" ] ; then
ENCLOSURE_PREFIX="enc"
fi
echo "ID_ENCLOSURE=$ID_ENCLOSURE"
echo "ID_ENCLOSURE_PATH=by-enclosure/$ENCLOSURE_PREFIX-$ID_ENCLOSURE"
exit 0
fi
# First check if an alias was defined for this device.
ID_VDEV=`alias_handler`
if [ -z "$ID_VDEV" ] ; then
BAY=${BAY:-bay}
-TOPOLOGY=${TOPOLOGY:-sas_direct}
case $TOPOLOGY in
sas_direct|sas_switch)
ID_VDEV=`sas_handler`
;;
scsi)
ID_VDEV=`scsi_handler`
;;
*)
echo "Error: unknown topology $TOPOLOGY"
exit 1

View File

@ -24,7 +24,7 @@
* Copyright (c) 2011, 2016 by Delphix. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
* Copyright 2016 Nexenta Systems, Inc.
-* Copyright (c) 2017 Lawrence Livermore National Security, LLC.
+* Copyright (c) 2017, 2018 Lawrence Livermore National Security, LLC.
* Copyright (c) 2015, 2017, Intel Corporation.
*/
@ -3659,6 +3659,22 @@ dump_simulated_ddt(spa_t *spa)
dump_dedup_ratio(&dds_total);
}
static void
zdb_set_skip_mmp(char *target)
{
spa_t *spa;
/*
* Disable the activity check to allow examination of
* active pools.
*/
mutex_enter(&spa_namespace_lock);
if ((spa = spa_lookup(target)) != NULL) {
spa->spa_import_flags |= ZFS_IMPORT_SKIP_MMP;
}
mutex_exit(&spa_namespace_lock);
}
static void
dump_zpool(spa_t *spa)
{
@ -4412,14 +4428,15 @@ main(int argc, char **argv)
target, strerror(ENOMEM));
}
-/*
- * Disable the activity check to allow examination of
- * active pools.
- */
if (dump_opt['C'] > 1) {
(void) printf("\nConfiguration for import:\n");
dump_nvlist(cfg, 8);
}
+/*
+ * Disable the activity check to allow examination of
+ * active pools.
+ */
error = spa_import(target_pool, cfg, NULL,
flags | ZFS_IMPORT_SKIP_MMP);
}
@ -4430,16 +4447,7 @@ main(int argc, char **argv)
if (error == 0) {
if (target_is_spa || dump_opt['R']) {
-/*
- * Disable the activity check to allow examination of
- * active pools.
- */
-mutex_enter(&spa_namespace_lock);
-if ((spa = spa_lookup(target)) != NULL) {
-spa->spa_import_flags |= ZFS_IMPORT_SKIP_MMP;
-}
-mutex_exit(&spa_namespace_lock);
+zdb_set_skip_mmp(target);
error = spa_open_rewind(target, &spa, FTAG, policy,
NULL);
if (error) {
@ -4462,6 +4470,7 @@ main(int argc, char **argv)
}
}
} else {
+zdb_set_skip_mmp(target);
error = open_objset(target, DMU_OST_ANY, FTAG, &os);
}
}

View File

@ -3493,7 +3493,7 @@ single_histo_average(uint64_t *histo, unsigned int buckets)
static void
print_iostat_queues(iostat_cbdata_t *cb, nvlist_t *oldnv,
-nvlist_t *newnv, double scale)
+nvlist_t *newnv)
{
int i;
uint64_t val;
@ -3523,7 +3523,7 @@ print_iostat_queues(iostat_cbdata_t *cb, nvlist_t *oldnv,
format = ZFS_NICENUM_1024;
for (i = 0; i < ARRAY_SIZE(names); i++) {
-val = nva[i].data[0] * scale;
+val = nva[i].data[0];
print_one_stat(val, format, column_width, cb->cb_scripted);
}
@ -3532,7 +3532,7 @@ print_iostat_queues(iostat_cbdata_t *cb, nvlist_t *oldnv,
static void
print_iostat_latency(iostat_cbdata_t *cb, nvlist_t *oldnv,
-nvlist_t *newnv, double scale)
+nvlist_t *newnv)
{
int i;
uint64_t val;
@ -3562,7 +3562,7 @@ print_iostat_latency(iostat_cbdata_t *cb, nvlist_t *oldnv,
/* Print our avg latencies on the line */
for (i = 0; i < ARRAY_SIZE(names); i++) {
/* Compute average latency for a latency histo */
-val = single_histo_average(nva[i].data, nva[i].count) * scale;
+val = single_histo_average(nva[i].data, nva[i].count);
print_one_stat(val, format, column_width, cb->cb_scripted);
}
free_calc_stats(nva, ARRAY_SIZE(names));
@ -3701,9 +3701,9 @@ print_vdev_stats(zpool_handle_t *zhp, const char *name, nvlist_t *oldnv,
print_iostat_default(calcvs, cb, scale);
}
if (cb->cb_flags & IOS_LATENCY_M)
-print_iostat_latency(cb, oldnv, newnv, scale);
+print_iostat_latency(cb, oldnv, newnv);
if (cb->cb_flags & IOS_QUEUES_M)
-print_iostat_queues(cb, oldnv, newnv, scale);
+print_iostat_queues(cb, oldnv, newnv);
if (cb->cb_flags & IOS_ANYHISTO_M) {
printf("\n");
print_iostat_histos(cb, oldnv, newnv, scale, name);

View File

@ -171,8 +171,8 @@ typedef struct ztest_shared_opts {
} ztest_shared_opts_t;
static const ztest_shared_opts_t ztest_opts_defaults = {
-.zo_pool = { 'z', 't', 'e', 's', 't', '\0' },
+.zo_pool = "ztest",
-.zo_dir = { '/', 't', 'm', 'p', '\0' },
+.zo_dir = "/tmp",
.zo_alt_ztest = { '\0' },
.zo_alt_libpath = { '\0' },
.zo_vdevs = 5,
@ -1189,7 +1189,7 @@ ztest_spa_prop_set_uint64(zpool_prop_t prop, uint64_t value)
*/
typedef struct {
list_node_t z_lnode;
-refcount_t z_refcnt;
+zfs_refcount_t z_refcnt;
uint64_t z_object;
zfs_rlock_t z_range_lock;
} ztest_znode_t;
@ -1205,7 +1205,7 @@ ztest_znode_init(uint64_t object)
ztest_znode_t *zp = umem_alloc(sizeof (*zp), UMEM_NOFAIL);
list_link_init(&zp->z_lnode);
-refcount_create(&zp->z_refcnt);
+zfs_refcount_create(&zp->z_refcnt);
zp->z_object = object;
zfs_rlock_init(&zp->z_range_lock);
@ -1215,10 +1215,10 @@ ztest_znode_init(uint64_t object)
static void
ztest_znode_fini(ztest_znode_t *zp)
{
-ASSERT(refcount_is_zero(&zp->z_refcnt));
+ASSERT(zfs_refcount_is_zero(&zp->z_refcnt));
zfs_rlock_destroy(&zp->z_range_lock);
zp->z_object = 0;
-refcount_destroy(&zp->z_refcnt);
+zfs_refcount_destroy(&zp->z_refcnt);
list_link_init(&zp->z_lnode);
umem_free(zp, sizeof (*zp));
}
@ -1248,13 +1248,13 @@ ztest_znode_get(ztest_ds_t *zd, uint64_t object)
for (zp = list_head(&zll->z_list); (zp);
zp = list_next(&zll->z_list, zp)) {
if (zp->z_object == object) {
-refcount_add(&zp->z_refcnt, RL_TAG);
+zfs_refcount_add(&zp->z_refcnt, RL_TAG);
break;
}
}
if (zp == NULL) {
zp = ztest_znode_init(object);
-refcount_add(&zp->z_refcnt, RL_TAG);
+zfs_refcount_add(&zp->z_refcnt, RL_TAG);
list_insert_head(&zll->z_list, zp);
}
mutex_exit(&zll->z_lock);
@ -1268,8 +1268,8 @@ ztest_znode_put(ztest_ds_t *zd, ztest_znode_t *zp)
ASSERT3U(zp->z_object, !=, 0);
zll = &zd->zd_range_lock[zp->z_object & (ZTEST_OBJECT_LOCKS - 1)];
mutex_enter(&zll->z_lock);
-refcount_remove(&zp->z_refcnt, RL_TAG);
-if (refcount_is_zero(&zp->z_refcnt)) {
+zfs_refcount_remove(&zp->z_refcnt, RL_TAG);
+if (zfs_refcount_is_zero(&zp->z_refcnt)) {
list_remove(&zll->z_list, zp);
ztest_znode_fini(zp);
}

View File

@ -0,0 +1,21 @@
dnl #
dnl # Linux 5.0: access_ok() drops 'type' parameter:
dnl #
dnl # - access_ok(type, addr, size)
dnl # + access_ok(addr, size)
dnl #
AC_DEFUN([ZFS_AC_KERNEL_ACCESS_OK_TYPE], [
AC_MSG_CHECKING([whether access_ok() has 'type' parameter])
ZFS_LINUX_TRY_COMPILE([
#include <linux/uaccess.h>
],[
const void __user __attribute__((unused)) *addr = (void *) 0xdeadbeef;
unsigned long __attribute__((unused)) size = 1;
int error __attribute__((unused)) = access_ok(0, addr, size);
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_ACCESS_OK_TYPE, 1, [kernel has access_ok with 'type' parameter])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -1,10 +1,10 @@
dnl #
dnl # Linux 4.14 API,
dnl #
-dnl # The bio_set_dev() helper was introduced as part of the transition
+dnl # The bio_set_dev() helper macro was introduced as part of the transition
dnl # to have struct gendisk in struct bio.
dnl #
-AC_DEFUN([ZFS_AC_KERNEL_BIO_SET_DEV], [
+AC_DEFUN([ZFS_AC_KERNEL_BIO_SET_DEV_MACRO], [
AC_MSG_CHECKING([whether bio_set_dev() exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/bio.h>
@ -20,3 +20,34 @@ AC_DEFUN([ZFS_AC_KERNEL_BIO_SET_DEV], [
AC_MSG_RESULT(no)
])
])
dnl #
dnl # Linux 5.0 API,
dnl #
dnl # The bio_set_dev() helper macro was updated to internally depend on
dnl # bio_associate_blkg() symbol which is exported GPL-only.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BIO_SET_DEV_GPL_ONLY], [
AC_MSG_CHECKING([whether bio_set_dev() is GPL-only])
ZFS_LINUX_TRY_COMPILE([
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/fs.h>
MODULE_LICENSE("$ZFS_META_LICENSE");
],[
struct block_device *bdev = NULL;
struct bio *bio = NULL;
bio_set_dev(bio, bdev);
],[
AC_MSG_RESULT(no)
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BIO_SET_DEV_GPL_ONLY, 1,
[bio_set_dev() GPL-only])
])
])
AC_DEFUN([ZFS_AC_KERNEL_BIO_SET_DEV], [
ZFS_AC_KERNEL_BIO_SET_DEV_MACRO
ZFS_AC_KERNEL_BIO_SET_DEV_GPL_ONLY
])

View File

@ -1,18 +1,41 @@
dnl #
-dnl # 4.2 API change
-dnl # asm/i387.h is replaced by asm/fpu/api.h
+dnl # Handle differences in kernel FPU code.
+dnl #
+dnl # Kernel
+dnl # 5.0: All kernel fpu functions are GPL only, so we can't use them.
+dnl # (nothing defined)
+dnl #
+dnl # 4.2: Use __kernel_fpu_{begin,end}()
+dnl # HAVE_UNDERSCORE_KERNEL_FPU & KERNEL_EXPORTS_X86_FPU
+dnl #
+dnl # Pre-4.2: Use kernel_fpu_{begin,end}()
+dnl # HAVE_KERNEL_FPU & KERNEL_EXPORTS_X86_FPU
dnl #
AC_DEFUN([ZFS_AC_KERNEL_FPU], [
-AC_MSG_CHECKING([whether asm/fpu/api.h exists])
+AC_MSG_CHECKING([which kernel_fpu function to use])
ZFS_LINUX_TRY_COMPILE([
-#include <linux/kernel.h>
-#include <asm/fpu/api.h>
+#include <asm/i387.h>
+#include <asm/xcr.h>
],[
-__kernel_fpu_begin();
+kernel_fpu_begin();
+kernel_fpu_end();
],[
-AC_MSG_RESULT(yes)
-AC_DEFINE(HAVE_FPU_API_H, 1, [kernel has <asm/fpu/api.h> interface])
+AC_MSG_RESULT(kernel_fpu_*)
+AC_DEFINE(HAVE_KERNEL_FPU, 1, [kernel has kernel_fpu_* functions])
+AC_DEFINE(KERNEL_EXPORTS_X86_FPU, 1, [kernel exports FPU functions])
],[
-AC_MSG_RESULT(no)
+ZFS_LINUX_TRY_COMPILE([
+#include <linux/kernel.h>
+#include <asm/fpu/api.h>
+],[
+__kernel_fpu_begin();
+__kernel_fpu_end();
+],[
+AC_MSG_RESULT(__kernel_fpu_*)
+AC_DEFINE(HAVE_UNDERSCORE_KERNEL_FPU, 1, [kernel has __kernel_fpu_* functions])
+AC_DEFINE(KERNEL_EXPORTS_X86_FPU, 1, [kernel exports FPU functions])
+],[
+AC_MSG_RESULT(not exported)
+])
])
])

View File

@ -0,0 +1,20 @@
dnl #
dnl # 4.5 API change
dnl # Added in_compat_syscall() which can be overridden on a per-
dnl # architecture basis. Prior to this is_compat_task() was the
dnl # provided interface.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_IN_COMPAT_SYSCALL], [
AC_MSG_CHECKING([whether in_compat_syscall() is available])
ZFS_LINUX_TRY_COMPILE([
#include <linux/compat.h>
],[
in_compat_syscall();
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_IN_COMPAT_SYSCALL, 1,
[in_compat_syscall() is available])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -0,0 +1,26 @@
dnl #
dnl # Determine an available miscellaneous minor number which can be used
dnl # for the /dev/zfs device. This is needed because kernel module
dnl # auto-loading depends on registering a reserved non-conflicting minor
dnl # number. Start with a large known available unreserved minor and work
dnl # our way down to lower value if a collision is detected.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_MISC_MINOR], [
AC_MSG_CHECKING([for available /dev/zfs minor])
for i in $(seq 249 -1 200); do
if ! grep -q "^#define\s\+.*_MINOR\s\+.*$i" \
${LINUX}/include/linux/miscdevice.h; then
ZFS_DEVICE_MINOR="$i"
AC_MSG_RESULT($ZFS_DEVICE_MINOR)
AC_DEFINE_UNQUOTED([ZFS_DEVICE_MINOR],
[$ZFS_DEVICE_MINOR], [/dev/zfs minor])
break
fi
done
AS_IF([ test -z "$ZFS_DEVICE_MINOR"], [
AC_MSG_ERROR([
*** No available misc minor numbers available for use.])
])
])

View File

@ -5,7 +5,9 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL
ZFS_AC_SPL
ZFS_AC_QAT
+ZFS_AC_KERNEL_ACCESS_OK_TYPE
ZFS_AC_TEST_MODULE
+ZFS_AC_KERNEL_MISC_MINOR
ZFS_AC_KERNEL_OBJTOOL
ZFS_AC_KERNEL_CONFIG
ZFS_AC_KERNEL_DECLARE_EVENT_CLASS
@ -129,6 +131,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_GLOBAL_PAGE_STATE
ZFS_AC_KERNEL_ACL_HAS_REFCOUNT
ZFS_AC_KERNEL_USERNS_CAPABILITIES
+ZFS_AC_KERNEL_IN_COMPAT_SYSCALL
AS_IF([test "$LINUX_OBJ" != "$LINUX"], [
KERNELMAKE_PARAMS="$KERNELMAKE_PARAMS O=$LINUX_OBJ"
@ -254,7 +257,7 @@ AC_DEFUN([ZFS_AC_KERNEL], [
AS_IF([test "$utsrelease"], [
kernsrcver=`(echo "#include <$utsrelease>";
echo "kernsrcver=UTS_RELEASE") |
-cpp -I $kernelbuild/include |
+${CPP} -I $kernelbuild/include - |
grep "^kernsrcver=" | cut -d \" -f 2`
AS_IF([test -z "$kernsrcver"], [

View File

@ -122,6 +122,9 @@ AC_CONFIG_FILES([
contrib/dracut/02zfsexpandknowledge/Makefile
contrib/dracut/90zfs/Makefile
contrib/initramfs/Makefile
+contrib/initramfs/hooks/Makefile
+contrib/initramfs/scripts/Makefile
+contrib/initramfs/scripts/local-top/Makefile
module/Makefile
module/avl/Makefile
module/nvpair/Makefile

View File

@ -24,6 +24,7 @@ $(pkgdracut_SCRIPTS):%:%.in
-e 's,@udevruledir\@,$(udevruledir),g' \
-e 's,@sysconfdir\@,$(sysconfdir),g' \
-e 's,@systemdunitdir\@,$(systemdunitdir),g' \
+-e 's,@mounthelperdir\@,$(mounthelperdir),g' \
$< >'$@'
distclean-local::

View File

@ -5,7 +5,7 @@ check() {
[ "${1}" = "-d" ] && return 0
# Verify the zfs tool chain
-for tool in "@sbindir@/zpool" "@sbindir@/zfs" "@sbindir@/mount.zfs" ; do
+for tool in "@sbindir@/zpool" "@sbindir@/zfs" "@mounthelperdir@/mount.zfs" ; do
test -x "$tool" || return 1
done
# Verify grep exists
@ -53,7 +53,7 @@ install() {
# Fallback: Guess the path and include all matches
dracut_install /usr/lib/gcc/*/*/libgcc_s.so*
fi
-dracut_install @sbindir@/mount.zfs
+dracut_install @mounthelperdir@/mount.zfs
dracut_install @udevdir@/vdev_id
dracut_install awk
dracut_install head

View File

@ -3,12 +3,11 @@ initrddir = $(datarootdir)/initramfs-tools
initrd_SCRIPTS = \
conf.d/zfs conf-hooks.d/zfs hooks/zfs scripts/zfs scripts/local-top/zfs
+SUBDIRS = hooks scripts
EXTRA_DIST = \
$(top_srcdir)/contrib/initramfs/conf.d/zfs \
$(top_srcdir)/contrib/initramfs/conf-hooks.d/zfs \
-$(top_srcdir)/contrib/initramfs/hooks/zfs \
-$(top_srcdir)/contrib/initramfs/scripts/zfs \
-$(top_srcdir)/contrib/initramfs/scripts/local-top/zfs \
$(top_srcdir)/contrib/initramfs/README.initramfs.markdown
install-initrdSCRIPTS: $(EXTRA_DIST)

contrib/initramfs/hooks/.gitignore
View File

@ -0,0 +1 @@
zfs

View File

@ -0,0 +1,21 @@
hooksdir = $(datarootdir)/initramfs-tools/hooks
hooks_SCRIPTS = \
zfs
EXTRA_DIST = \
$(top_srcdir)/contrib/initramfs/hooks/zfs.in
$(hooks_SCRIPTS):%:%.in
-$(SED) -e 's,@sbindir\@,$(sbindir),g' \
-e 's,@sysconfdir\@,$(sysconfdir),g' \
-e 's,@udevdir\@,$(udevdir),g' \
-e 's,@udevruledir\@,$(udevruledir),g' \
-e 's,@mounthelperdir\@,$(mounthelperdir),g' \
$< >'$@'
clean-local::
-$(RM) $(hooks_SCRIPTS)
distclean-local::
-$(RM) $(hooks_SCRIPTS)

View File

@ -8,11 +8,13 @@ PREREQ="zdev"
# These prerequisites are provided by the zfsutils package. The zdb utility is
# not strictly required, but it can be useful at the initramfs recovery prompt.
-COPY_EXEC_LIST="/sbin/zdb /sbin/zpool /sbin/zfs /sbin/mount.zfs"
+COPY_EXEC_LIST="@sbindir@/zdb @sbindir@/zpool @sbindir@/zfs"
-COPY_EXEC_LIST="$COPY_EXEC_LIST /usr/bin/dirname /lib/udev/vdev_id"
+COPY_EXEC_LIST="$COPY_EXEC_LIST @mounthelperdir@/mount.zfs @udevdir@/vdev_id"
-COPY_FILE_LIST="/etc/hostid /etc/zfs/zpool.cache /etc/default/zfs"
+COPY_FILE_LIST="/etc/hostid @sysconfdir@/zfs/zpool.cache"
-COPY_FILE_LIST="$COPY_FILE_LIST /etc/zfs/zfs-functions /etc/zfs/vdev_id.conf"
+COPY_FILE_LIST="$COPY_FILE_LIST @sysconfdir@/default/zfs"
-COPY_FILE_LIST="$COPY_FILE_LIST /lib/udev/rules.d/69-vdev.rules"
+COPY_FILE_LIST="$COPY_FILE_LIST @sysconfdir@/zfs/zfs-functions"
+COPY_FILE_LIST="$COPY_FILE_LIST @sysconfdir@/zfs/vdev_id.conf"
+COPY_FILE_LIST="$COPY_FILE_LIST @udevruledir@/69-vdev.rules"

# These prerequisites are provided by the base system.
COPY_EXEC_LIST="$COPY_EXEC_LIST /usr/bin/dirname /bin/hostname /sbin/blkid"

contrib/initramfs/scripts/.gitignore
View File

@ -0,0 +1 @@
zfs

View File

@ -0,0 +1,20 @@
scriptsdir = $(datarootdir)/initramfs-tools/scripts
scripts_SCRIPTS = \
zfs
SUBDIRS = local-top
EXTRA_DIST = \
$(top_srcdir)/contrib/initramfs/scripts/zfs.in
$(scripts_SCRIPTS):%:%.in
-$(SED) -e 's,@sbindir\@,$(sbindir),g' \
-e 's,@sysconfdir\@,$(sysconfdir),g' \
$< >'$@'
clean-local::
-$(RM) $(scripts_SCRIPTS)
distclean-local::
-$(RM) $(scripts_SCRIPTS)

View File

@ -0,0 +1,3 @@
localtopdir = $(datarootdir)/initramfs-tools/scripts/local-top
EXTRA_DIST = zfs

View File

@ -11,9 +11,9 @@
# Paths to what we need - in the initrd, these paths are hardcoded,
# so override the defines in zfs-functions.
-ZFS="/sbin/zfs"
+ZFS="@sbindir@/zfs"
-ZPOOL="/sbin/zpool"
+ZPOOL="@sbindir@/zpool"
-ZPOOL_CACHE="/etc/zfs/zpool.cache"
+ZPOOL_CACHE="@sysconfdir@/zfs/zpool.cache"
export ZFS ZPOOL ZPOOL_CACHE

# This runs any scripts that should run before we start importing
@ -193,7 +193,7 @@ import_pool()
# Verify that the pool isn't already imported
# Make as sure as we can to not require '-f' to import.
-"${ZPOOL}" status "$pool" > /dev/null 2>&1 && return 0
+"${ZPOOL}" get name,guid -o value -H 2>/dev/null | grep -Fxq "$pool" && return 0
# For backwards compatibility, make sure that ZPOOL_IMPORT_PATH is set
# to something we can use later with the real import(s). We want to
@ -772,6 +772,7 @@ mountroot()
# root=zfs:<pool>/<dataset> (uses this for rpool - first part, without 'zfs:')
#
# Option <dataset> could also be <snapshot>
+# Option <pool> could also be <guid>
# ------------
# Support force option
@ -889,6 +890,14 @@ mountroot()
/bin/sh
fi
# In case the pool was specified as guid, resolve guid to name
pool="$("${ZPOOL}" get name,guid -o name,value -H | \
awk -v pool="${ZFS_RPOOL}" '$2 == pool { print $1 }')"
if [ -n "$pool" ]; then
ZFS_BOOTFS="${pool}/${ZFS_BOOTFS#*/}"
ZFS_RPOOL="${pool}"
fi
# Set elevator=noop on the root pool's vdevs' disks. ZFS already
# does this for wholedisk vdevs (for all pools), so this is only
# important for partitions.

View File

@ -1,3 +1,3 @@
-# Always load kernel modules at boot. The default behavior is to load the
-# kernel modules in the zfs-import-*.service or when blkid(8) detects a pool.
+# The default behavior is to allow udev to load the kernel modules on demand.
+# Uncomment the following line to unconditionally load them at boot.
#zfs

View File

@ -12,7 +12,6 @@ ConditionPathExists=@sysconfdir@/zfs/zpool.cache
[Service]
Type=oneshot
RemainAfterExit=yes
-ExecStartPre=-/sbin/modprobe zfs
ExecStart=@sbindir@/zpool import -c @sysconfdir@/zfs/zpool.cache -aN
[Install]

View File

@ -11,7 +11,6 @@ ConditionPathExists=!@sysconfdir@/zfs/zpool.cache
[Service]
Type=oneshot
RemainAfterExit=yes
-ExecStartPre=-/sbin/modprobe zfs
ExecStart=@sbindir@/zpool import -aN -o cachefile=none
[Install]

View File

@ -4,6 +4,7 @@ pkgsysconf_DATA = \
vdev_id.conf.alias.example \
vdev_id.conf.sas_direct.example \
vdev_id.conf.sas_switch.example \
-vdev_id.conf.multipath.example
+vdev_id.conf.multipath.example \
+vdev_id.conf.scsi.example
EXTRA_DIST = $(pkgsysconf_DATA)

View File

@ -2,6 +2,9 @@ multipath no
topology sas_direct
phys_per_port 4
+# Additionally create /dev/by-enclousure/ symlinks for enclosure devices
+enclosure_symlinks yes
# PCI_ID HBA PORT CHANNEL NAME
channel 85:00.0 1 A
channel 85:00.0 0 B

View File

@ -0,0 +1,9 @@
multipath no
topology scsi
phys_per_port 1
# Usually scsi disks are numbered from 0, but this can be offset, to
# match the physical bay numbers, as follows:
first_bay_number 1
# PCI_ID HBA PORT CHANNEL NAME
channel 0c:00.0 0 Y

View File

@ -27,6 +27,7 @@
#define _ZFS_KMAP_H
#include <linux/highmem.h>
+#include <linux/uaccess.h>
#ifdef HAVE_1ARG_KMAP_ATOMIC
/* 2.6.37 API change */
@ -37,4 +38,11 @@
#define zfs_kunmap_atomic(addr, km_type) kunmap_atomic(addr, km_type) #define zfs_kunmap_atomic(addr, km_type) kunmap_atomic(addr, km_type)
#endif #endif
/* 5.0 API change - no more 'type' argument for access_ok() */
#ifdef HAVE_ACCESS_OK_TYPE
#define zfs_access_ok(type, addr, size) access_ok(type, addr, size)
#else
#define zfs_access_ok(type, addr, size) access_ok(addr, size)
#endif
#endif /* _ZFS_KMAP_H */ #endif /* _ZFS_KMAP_H */
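A minimal caller sketch (illustrative only, not part of the change): with the zfs_access_ok() shim above, callers keep passing the old 'type' argument, and the shim silently drops it when building against a 5.0+ kernel whose access_ok() no longer takes it.

/* Hypothetical helper, assuming the shim above is in scope. */
static int
example_check_user_buffer(const void __user *buf, size_t len)
{
	/* VERIFY_READ is ignored on access_ok(addr, size) kernels. */
	if (!zfs_access_ok(VERIFY_READ, buf, len))
		return (EFAULT);
	return (0);
}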


@ -81,7 +81,7 @@
#endif #endif
#if defined(_KERNEL) #if defined(_KERNEL)
-#if defined(HAVE_FPU_API_H)
+#if defined(HAVE_UNDERSCORE_KERNEL_FPU)
#include <asm/fpu/api.h> #include <asm/fpu/api.h>
#include <asm/fpu/internal.h> #include <asm/fpu/internal.h>
#define kfpu_begin() \ #define kfpu_begin() \
@ -94,12 +94,18 @@
__kernel_fpu_end(); \ __kernel_fpu_end(); \
preempt_enable(); \ preempt_enable(); \
} }
-#else
+#elif defined(HAVE_KERNEL_FPU)
#include <asm/i387.h> #include <asm/i387.h>
#include <asm/xcr.h> #include <asm/xcr.h>
#define kfpu_begin() kernel_fpu_begin() #define kfpu_begin() kernel_fpu_begin()
#define kfpu_end() kernel_fpu_end() #define kfpu_end() kernel_fpu_end()
-#endif /* defined(HAVE_FPU_API_H) */
+#else
+/* Kernel doesn't export any kernel_fpu_* functions */
+#include <asm/fpu/internal.h> /* For kernel xgetbv() */
+#define kfpu_begin() panic("This code should never run")
+#define kfpu_end() panic("This code should never run")
+#endif /* defined(HAVE_KERNEL_FPU) */
#else
/* /*
* fpu dummy methods for userspace * fpu dummy methods for userspace
@ -278,11 +284,13 @@ __simd_state_enabled(const uint64_t state)
boolean_t has_osxsave; boolean_t has_osxsave;
uint64_t xcr0; uint64_t xcr0;
-#if defined(_KERNEL) && defined(X86_FEATURE_OSXSAVE)
+#if defined(_KERNEL)
+#if defined(X86_FEATURE_OSXSAVE) && defined(KERNEL_EXPORTS_X86_FPU)
has_osxsave = !!boot_cpu_has(X86_FEATURE_OSXSAVE);
-#elif defined(_KERNEL) && !defined(X86_FEATURE_OSXSAVE)
-has_osxsave = B_FALSE;
#else
+has_osxsave = B_FALSE;
+#endif
+#elif !defined(_KERNEL)
has_osxsave = __cpuid_has_osxsave();
#endif #endif
@ -307,8 +315,12 @@ static inline boolean_t
zfs_sse_available(void) zfs_sse_available(void)
{ {
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_XMM)); return (!!boot_cpu_has(X86_FEATURE_XMM));
#else #else
return (B_FALSE);
#endif
#elif !defined(_KERNEL)
return (__cpuid_has_sse()); return (__cpuid_has_sse());
#endif #endif
} }
@ -320,8 +332,12 @@ static inline boolean_t
zfs_sse2_available(void) zfs_sse2_available(void)
{ {
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_XMM2)); return (!!boot_cpu_has(X86_FEATURE_XMM2));
#else #else
return (B_FALSE);
#endif
#elif !defined(_KERNEL)
return (__cpuid_has_sse2()); return (__cpuid_has_sse2());
#endif #endif
} }
@ -333,8 +349,12 @@ static inline boolean_t
zfs_sse3_available(void) zfs_sse3_available(void)
{ {
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_XMM3)); return (!!boot_cpu_has(X86_FEATURE_XMM3));
#else #else
return (B_FALSE);
#endif
#elif !defined(_KERNEL)
return (__cpuid_has_sse3()); return (__cpuid_has_sse3());
#endif #endif
} }
@ -346,8 +366,12 @@ static inline boolean_t
zfs_ssse3_available(void) zfs_ssse3_available(void)
{ {
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_SSSE3)); return (!!boot_cpu_has(X86_FEATURE_SSSE3));
#else #else
return (B_FALSE);
#endif
#elif !defined(_KERNEL)
return (__cpuid_has_ssse3()); return (__cpuid_has_ssse3());
#endif #endif
} }
@ -359,8 +383,12 @@ static inline boolean_t
zfs_sse4_1_available(void) zfs_sse4_1_available(void)
{ {
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_XMM4_1)); return (!!boot_cpu_has(X86_FEATURE_XMM4_1));
#else #else
return (B_FALSE);
#endif
#elif !defined(_KERNEL)
return (__cpuid_has_sse4_1()); return (__cpuid_has_sse4_1());
#endif #endif
} }
@ -372,8 +400,12 @@ static inline boolean_t
zfs_sse4_2_available(void) zfs_sse4_2_available(void)
{ {
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_XMM4_2)); return (!!boot_cpu_has(X86_FEATURE_XMM4_2));
#else #else
return (B_FALSE);
#endif
#elif !defined(_KERNEL)
return (__cpuid_has_sse4_2()); return (__cpuid_has_sse4_2());
#endif #endif
} }
@ -386,8 +418,12 @@ zfs_avx_available(void)
{ {
boolean_t has_avx; boolean_t has_avx;
#if defined(_KERNEL) #if defined(_KERNEL)
#if defined(KERNEL_EXPORTS_X86_FPU)
has_avx = !!boot_cpu_has(X86_FEATURE_AVX); has_avx = !!boot_cpu_has(X86_FEATURE_AVX);
#else #else
has_avx = B_FALSE;
#endif
#elif !defined(_KERNEL)
has_avx = __cpuid_has_avx(); has_avx = __cpuid_has_avx();
#endif #endif
@ -401,11 +437,13 @@ static inline boolean_t
zfs_avx2_available(void) zfs_avx2_available(void)
{ {
boolean_t has_avx2; boolean_t has_avx2;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX2)
+#if defined(_KERNEL)
+#if defined(X86_FEATURE_AVX2) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx2 = !!boot_cpu_has(X86_FEATURE_AVX2);
-#elif defined(_KERNEL) && !defined(X86_FEATURE_AVX2)
-has_avx2 = B_FALSE;
#else
+has_avx2 = B_FALSE;
+#endif
+#elif !defined(_KERNEL)
has_avx2 = __cpuid_has_avx2();
#endif #endif
@ -418,11 +456,13 @@ zfs_avx2_available(void)
static inline boolean_t static inline boolean_t
zfs_bmi1_available(void) zfs_bmi1_available(void)
{ {
-#if defined(_KERNEL) && defined(X86_FEATURE_BMI1)
+#if defined(_KERNEL)
+#if defined(X86_FEATURE_BMI1) && defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_BMI1));
-#elif defined(_KERNEL) && !defined(X86_FEATURE_BMI1)
-return (B_FALSE);
#else
+return (B_FALSE);
+#endif
+#elif !defined(_KERNEL)
return (__cpuid_has_bmi1());
#endif #endif
} }
@ -433,16 +473,17 @@ zfs_bmi1_available(void)
static inline boolean_t static inline boolean_t
zfs_bmi2_available(void) zfs_bmi2_available(void)
{ {
-#if defined(_KERNEL) && defined(X86_FEATURE_BMI2)
+#if defined(_KERNEL)
+#if defined(X86_FEATURE_BMI2) && defined(KERNEL_EXPORTS_X86_FPU)
return (!!boot_cpu_has(X86_FEATURE_BMI2));
-#elif defined(_KERNEL) && !defined(X86_FEATURE_BMI2)
-return (B_FALSE);
#else
+return (B_FALSE);
+#endif
+#elif !defined(_KERNEL)
return (__cpuid_has_bmi2());
#endif #endif
} }
/* /*
* AVX-512 family of instruction sets: * AVX-512 family of instruction sets:
* *
@ -466,8 +507,12 @@ zfs_avx512f_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512F)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512F) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = !!boot_cpu_has(X86_FEATURE_AVX512F); has_avx512 = !!boot_cpu_has(X86_FEATURE_AVX512F);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512f(); has_avx512 = __cpuid_has_avx512f();
#endif #endif
@ -481,9 +526,13 @@ zfs_avx512cd_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512CD)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512CD) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512CD); boot_cpu_has(X86_FEATURE_AVX512CD);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512cd(); has_avx512 = __cpuid_has_avx512cd();
#endif #endif
@ -497,9 +546,13 @@ zfs_avx512er_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512ER)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512ER) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512ER); boot_cpu_has(X86_FEATURE_AVX512ER);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512er(); has_avx512 = __cpuid_has_avx512er();
#endif #endif
@ -513,9 +566,13 @@ zfs_avx512pf_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512PF)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512PF) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512PF); boot_cpu_has(X86_FEATURE_AVX512PF);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512pf(); has_avx512 = __cpuid_has_avx512pf();
#endif #endif
@ -529,9 +586,13 @@ zfs_avx512bw_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512BW)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512BW) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512BW); boot_cpu_has(X86_FEATURE_AVX512BW);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512bw(); has_avx512 = __cpuid_has_avx512bw();
#endif #endif
@ -545,9 +606,13 @@ zfs_avx512dq_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512DQ)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512DQ) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512DQ); boot_cpu_has(X86_FEATURE_AVX512DQ);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512dq(); has_avx512 = __cpuid_has_avx512dq();
#endif #endif
@ -561,9 +626,13 @@ zfs_avx512vl_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512VL)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512VL) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512VL); boot_cpu_has(X86_FEATURE_AVX512VL);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512vl(); has_avx512 = __cpuid_has_avx512vl();
#endif #endif
@ -577,9 +646,13 @@ zfs_avx512ifma_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512IFMA)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512IFMA) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512IFMA); boot_cpu_has(X86_FEATURE_AVX512IFMA);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512ifma(); has_avx512 = __cpuid_has_avx512ifma();
#endif #endif
@ -593,9 +666,13 @@ zfs_avx512vbmi_available(void)
{ {
boolean_t has_avx512 = B_FALSE; boolean_t has_avx512 = B_FALSE;
-#if defined(_KERNEL) && defined(X86_FEATURE_AVX512VBMI)
+#if defined(_KERNEL)
#if defined(X86_FEATURE_AVX512VBMI) && defined(KERNEL_EXPORTS_X86_FPU)
has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) && has_avx512 = boot_cpu_has(X86_FEATURE_AVX512F) &&
boot_cpu_has(X86_FEATURE_AVX512VBMI); boot_cpu_has(X86_FEATURE_AVX512VBMI);
#else
has_avx512 = B_FALSE;
#endif
#elif !defined(_KERNEL) #elif !defined(_KERNEL)
has_avx512 = __cpuid_has_avx512f() && has_avx512 = __cpuid_has_avx512f() &&
__cpuid_has_avx512vbmi(); __cpuid_has_avx512vbmi();
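To make the effect of the new guards concrete, here is an illustrative (hypothetical) caller: in kernel context on a kernel that no longer exports the FPU symbols, KERNEL_EXPORTS_X86_FPU is left undefined, so each predicate above returns B_FALSE and the vector code paths are never selected.

static boolean_t
example_can_use_vector_checksums(void)
{
	/* Both reduce to B_FALSE when KERNEL_EXPORTS_X86_FPU is undefined. */
	return (zfs_sse2_available() || zfs_avx2_available());
}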


@ -30,6 +30,7 @@
#include <sys/taskq.h> #include <sys/taskq.h>
#include <sys/cred.h> #include <sys/cred.h>
#include <linux/backing-dev.h> #include <linux/backing-dev.h>
#include <linux/compat.h>
/* /*
* 2.6.28 API change, * 2.6.28 API change,
@ -296,9 +297,6 @@ lseek_execute(
* This is several orders of magnitude larger than expected grace period. * This is several orders of magnitude larger than expected grace period.
* At 60 seconds the kernel will also begin issuing RCU stall warnings. * At 60 seconds the kernel will also begin issuing RCU stall warnings.
*/ */
-#ifdef refcount_t
-#undef refcount_t
-#endif
#include <linux/posix_acl.h> #include <linux/posix_acl.h>
@ -429,8 +427,6 @@ typedef mode_t zpl_equivmode_t;
#define zpl_posix_acl_valid(ip, acl) posix_acl_valid(acl) #define zpl_posix_acl_valid(ip, acl) posix_acl_valid(acl)
#endif #endif
-#define refcount_t zfs_refcount_t
#endif /* CONFIG_FS_POSIX_ACL */ #endif /* CONFIG_FS_POSIX_ACL */
/* /*
@ -626,4 +622,21 @@ inode_set_iversion(struct inode *ip, u64 val)
} }
#endif #endif
/*
* Returns true when called in the context of a 32-bit system call.
*/
static inline int
zpl_is_32bit_api(void)
{
#ifdef CONFIG_COMPAT
#ifdef HAVE_IN_COMPAT_SYSCALL
return (in_compat_syscall());
#else
return (is_compat_task());
#endif
#else
return (BITS_PER_LONG == 32);
#endif
}
#endif /* _ZFS_VFS_H */ #endif /* _ZFS_VFS_H */
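A hedged usage sketch (not from the patch): the helper lets a 64-bit kernel detect that the current system call came from a 32-bit process, which typically matters when user-space structure layouts differ between the two ABIs.

static int
example_handle_ioctl_arg(void)
{
	if (zpl_is_32bit_api())
		return (EINVAL);	/* would translate the 32-bit layout here */
	return (0);
}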


@ -52,7 +52,7 @@ typedef struct abd {
abd_flags_t abd_flags; abd_flags_t abd_flags;
uint_t abd_size; /* excludes scattered abd_offset */ uint_t abd_size; /* excludes scattered abd_offset */
struct abd *abd_parent; struct abd *abd_parent;
-refcount_t abd_children;
+zfs_refcount_t abd_children;
union { union {
struct abd_scatter { struct abd_scatter {
uint_t abd_offset; uint_t abd_offset;


@ -76,7 +76,7 @@ struct arc_prune {
void *p_private; void *p_private;
uint64_t p_adjust; uint64_t p_adjust;
list_node_t p_node; list_node_t p_node;
-refcount_t p_refcnt;
+zfs_refcount_t p_refcnt;
}; };
typedef enum arc_strategy { typedef enum arc_strategy {


@ -74,12 +74,12 @@ typedef struct arc_state {
/* /*
* total amount of evictable data in this state * total amount of evictable data in this state
*/ */
-refcount_t arcs_esize[ARC_BUFC_NUMTYPES];
+zfs_refcount_t arcs_esize[ARC_BUFC_NUMTYPES];
/* /*
* total amount of data in this state; this includes: evictable, * total amount of data in this state; this includes: evictable,
* non-evictable, ARC_BUFC_DATA, and ARC_BUFC_METADATA. * non-evictable, ARC_BUFC_DATA, and ARC_BUFC_METADATA.
*/ */
-refcount_t arcs_size;
+zfs_refcount_t arcs_size;
/* /*
* supports the "dbufs" kstat * supports the "dbufs" kstat
*/ */
@ -163,7 +163,7 @@ typedef struct l1arc_buf_hdr {
uint32_t b_l2_hits; uint32_t b_l2_hits;
/* self protecting */ /* self protecting */
-refcount_t b_refcnt;
+zfs_refcount_t b_refcnt;
arc_callback_t *b_acb; arc_callback_t *b_acb;
abd_t *b_pabd; abd_t *b_pabd;
@ -180,7 +180,7 @@ typedef struct l2arc_dev {
kmutex_t l2ad_mtx; /* lock for buffer list */ kmutex_t l2ad_mtx; /* lock for buffer list */
list_t l2ad_buflist; /* buffer list */ list_t l2ad_buflist; /* buffer list */
list_node_t l2ad_node; /* device list node */ list_node_t l2ad_node; /* device list node */
-refcount_t l2ad_alloc; /* allocated bytes */
+zfs_refcount_t l2ad_alloc; /* allocated bytes */
} l2arc_dev_t; } l2arc_dev_t;
typedef struct l2arc_buf_hdr { typedef struct l2arc_buf_hdr {


@ -20,7 +20,7 @@
*/ */
/* /*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
-* Copyright (c) 2012, 2015 by Delphix. All rights reserved.
+* Copyright (c) 2012, 2018 by Delphix. All rights reserved.
* Copyright (c) 2013 by Saso Kiselkov. All rights reserved. * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved. * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
*/ */
@ -212,7 +212,7 @@ typedef struct dmu_buf_impl {
* If nonzero, the buffer can't be destroyed. * If nonzero, the buffer can't be destroyed.
* Protected by db_mtx. * Protected by db_mtx.
*/ */
-refcount_t db_holds;
+zfs_refcount_t db_holds;
/* buffer holding our data */ /* buffer holding our data */
arc_buf_t *db_buf; arc_buf_t *db_buf;
@ -294,7 +294,7 @@ boolean_t dbuf_try_add_ref(dmu_buf_t *db, objset_t *os, uint64_t obj,
uint64_t dbuf_refcount(dmu_buf_impl_t *db); uint64_t dbuf_refcount(dmu_buf_impl_t *db);
void dbuf_rele(dmu_buf_impl_t *db, void *tag); void dbuf_rele(dmu_buf_impl_t *db, void *tag);
-void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag);
+void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag, boolean_t evicting);
dmu_buf_impl_t *dbuf_find(struct objset *os, uint64_t object, uint8_t level, dmu_buf_impl_t *dbuf_find(struct objset *os, uint64_t object, uint8_t level,
uint64_t blkid); uint64_t blkid);
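For context, a call under the new signature might look as follows; this is an assumed call site rather than one taken from the patch, and a typical caller that is not tearing the buffer down would pass B_FALSE for the added flag:

dbuf_rele_and_unlock(db, FTAG, B_FALSE);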


@ -161,6 +161,7 @@ extern "C" {
* dn_allocated_txg * dn_allocated_txg
* dn_free_txg * dn_free_txg
* dn_assigned_txg * dn_assigned_txg
* dn_dirty_txg
* dd_assigned_tx * dd_assigned_tx
* dn_notxholds * dn_notxholds
* dn_dirtyctx * dn_dirtyctx


@ -97,8 +97,8 @@ typedef struct dmu_tx_hold {
dmu_tx_t *txh_tx; dmu_tx_t *txh_tx;
list_node_t txh_node; list_node_t txh_node;
struct dnode *txh_dnode; struct dnode *txh_dnode;
-refcount_t txh_space_towrite;
+zfs_refcount_t txh_space_towrite;
-refcount_t txh_memory_tohold;
+zfs_refcount_t txh_memory_tohold;
enum dmu_tx_hold_type txh_type; enum dmu_tx_hold_type txh_type;
uint64_t txh_arg1; uint64_t txh_arg1;
uint64_t txh_arg2; uint64_t txh_arg2;


@ -20,7 +20,7 @@
*/ */
/* /*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
-* Copyright (c) 2012, 2017 by Delphix. All rights reserved.
+* Copyright (c) 2012, 2018 by Delphix. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved. * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
*/ */
@ -260,13 +260,14 @@ struct dnode {
uint64_t dn_allocated_txg; uint64_t dn_allocated_txg;
uint64_t dn_free_txg; uint64_t dn_free_txg;
uint64_t dn_assigned_txg; uint64_t dn_assigned_txg;
uint64_t dn_dirty_txg; /* txg dnode was last dirtied */
kcondvar_t dn_notxholds; kcondvar_t dn_notxholds;
enum dnode_dirtycontext dn_dirtyctx; enum dnode_dirtycontext dn_dirtyctx;
uint8_t *dn_dirtyctx_firstset; /* dbg: contents meaningless */ uint8_t *dn_dirtyctx_firstset; /* dbg: contents meaningless */
/* protected by own devices */ /* protected by own devices */
-refcount_t dn_tx_holds;
+zfs_refcount_t dn_tx_holds;
-refcount_t dn_holds;
+zfs_refcount_t dn_holds;
kmutex_t dn_dbufs_mtx; kmutex_t dn_dbufs_mtx;
/* /*
@ -338,7 +339,7 @@ int dnode_hold_impl(struct objset *dd, uint64_t object, int flag, int dn_slots,
void *ref, dnode_t **dnp); void *ref, dnode_t **dnp);
boolean_t dnode_add_ref(dnode_t *dn, void *ref); boolean_t dnode_add_ref(dnode_t *dn, void *ref);
void dnode_rele(dnode_t *dn, void *ref); void dnode_rele(dnode_t *dn, void *ref);
-void dnode_rele_and_unlock(dnode_t *dn, void *tag);
+void dnode_rele_and_unlock(dnode_t *dn, void *tag, boolean_t evicting);
void dnode_setdirty(dnode_t *dn, dmu_tx_t *tx); void dnode_setdirty(dnode_t *dn, dmu_tx_t *tx);
void dnode_sync(dnode_t *dn, dmu_tx_t *tx); void dnode_sync(dnode_t *dn, dmu_tx_t *tx);
void dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs, void dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs,
@ -362,6 +363,9 @@ void dnode_evict_dbufs(dnode_t *dn);
void dnode_evict_bonus(dnode_t *dn); void dnode_evict_bonus(dnode_t *dn);
void dnode_free_interior_slots(dnode_t *dn); void dnode_free_interior_slots(dnode_t *dn);
#define DNODE_IS_DIRTY(_dn) \
((_dn)->dn_dirty_txg >= spa_syncing_txg((_dn)->dn_objset->os_spa))
#define DNODE_IS_CACHEABLE(_dn) \ #define DNODE_IS_CACHEABLE(_dn) \
((_dn)->dn_objset->os_primary_cache == ZFS_CACHE_ALL || \ ((_dn)->dn_objset->os_primary_cache == ZFS_CACHE_ALL || \
(DMU_OT_IS_METADATA((_dn)->dn_type) && \ (DMU_OT_IS_METADATA((_dn)->dn_type) && \
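In other words, the new DNODE_IS_DIRTY() macro treats a dnode as dirty whenever dn_dirty_txg is at or beyond the pool's currently syncing txg, i.e. its most recent dirtying has not yet been fully written out.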


@ -186,7 +186,7 @@ typedef struct dsl_dataset {
* Owning counts as a long hold. See the comments above * Owning counts as a long hold. See the comments above
* dsl_pool_hold() for details. * dsl_pool_hold() for details.
*/ */
-refcount_t ds_longholds;
+zfs_refcount_t ds_longholds;
/* no locking; only for making guesses */ /* no locking; only for making guesses */
uint64_t ds_trysnap_txg; uint64_t ds_trysnap_txg;


@ -179,8 +179,7 @@ struct metaslab_class {
* number of allocations allowed. * number of allocations allowed.
*/ */
uint64_t mc_alloc_max_slots; uint64_t mc_alloc_max_slots;
-refcount_t mc_alloc_slots;
+zfs_refcount_t mc_alloc_slots;
uint64_t mc_alloc_groups; /* # of allocatable groups */ uint64_t mc_alloc_groups; /* # of allocatable groups */
uint64_t mc_alloc; /* total allocated space */ uint64_t mc_alloc; /* total allocated space */
@ -230,7 +229,7 @@ struct metaslab_group {
* are unable to handle their share of allocations. * are unable to handle their share of allocations.
*/ */
uint64_t mg_max_alloc_queue_depth; uint64_t mg_max_alloc_queue_depth;
-refcount_t mg_alloc_queue_depth;
+zfs_refcount_t mg_alloc_queue_depth;
/* /*
* A metaslab group that can no longer allocate the minimum block * A metaslab group that can no longer allocate the minimum block


@ -41,17 +41,6 @@ extern "C" {
*/ */
#define FTAG ((char *)__func__) #define FTAG ((char *)__func__)
-/*
-* Starting with 4.11, torvalds/linux@f405df5, the linux kernel defines a
-* refcount_t type of its own. The macro below effectively changes references
-* in the ZFS code from refcount_t to zfs_refcount_t at compile time, so that
-* existing code need not be altered, reducing conflicts when landing openZFS
-* patches.
-*/
-#define refcount_t zfs_refcount_t
-#define refcount_add zfs_refcount_add
#ifdef ZFS_DEBUG #ifdef ZFS_DEBUG
typedef struct reference { typedef struct reference {
list_node_t ref_link; list_node_t ref_link;
@ -69,57 +58,60 @@ typedef struct refcount {
uint64_t rc_removed_count; uint64_t rc_removed_count;
} zfs_refcount_t; } zfs_refcount_t;
-/* Note: refcount_t must be initialized with refcount_create[_untracked]() */
+/*
+ * Note: zfs_refcount_t must be initialized with
+ * refcount_create[_untracked]()
+ */
-void refcount_create(refcount_t *rc);
-void refcount_create_untracked(refcount_t *rc);
-void refcount_create_tracked(refcount_t *rc);
-void refcount_destroy(refcount_t *rc);
-void refcount_destroy_many(refcount_t *rc, uint64_t number);
-int refcount_is_zero(refcount_t *rc);
-int64_t refcount_count(refcount_t *rc);
-int64_t zfs_refcount_add(refcount_t *rc, void *holder_tag);
-int64_t refcount_remove(refcount_t *rc, void *holder_tag);
-int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
-int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
-void refcount_transfer(refcount_t *dst, refcount_t *src);
-void refcount_transfer_ownership(refcount_t *, void *, void *);
-boolean_t refcount_held(refcount_t *, void *);
-boolean_t refcount_not_held(refcount_t *, void *);
-void refcount_init(void);
-void refcount_fini(void);
+void zfs_refcount_create(zfs_refcount_t *);
+void zfs_refcount_create_untracked(zfs_refcount_t *);
+void zfs_refcount_create_tracked(zfs_refcount_t *);
+void zfs_refcount_destroy(zfs_refcount_t *);
+void zfs_refcount_destroy_many(zfs_refcount_t *, uint64_t);
+int zfs_refcount_is_zero(zfs_refcount_t *);
+int64_t zfs_refcount_count(zfs_refcount_t *);
+int64_t zfs_refcount_add(zfs_refcount_t *, void *);
+int64_t zfs_refcount_remove(zfs_refcount_t *, void *);
+int64_t zfs_refcount_add_many(zfs_refcount_t *, uint64_t, void *);
+int64_t zfs_refcount_remove_many(zfs_refcount_t *, uint64_t, void *);
+void zfs_refcount_transfer(zfs_refcount_t *, zfs_refcount_t *);
+void zfs_refcount_transfer_ownership(zfs_refcount_t *, void *, void *);
+boolean_t zfs_refcount_held(zfs_refcount_t *, void *);
+boolean_t zfs_refcount_not_held(zfs_refcount_t *, void *);
+void zfs_refcount_init(void);
+void zfs_refcount_fini(void);
#else /* ZFS_DEBUG */ #else /* ZFS_DEBUG */
typedef struct refcount { typedef struct refcount {
uint64_t rc_count; uint64_t rc_count;
-} refcount_t;
+} zfs_refcount_t;
-#define refcount_create(rc) ((rc)->rc_count = 0)
+#define zfs_refcount_create(rc) ((rc)->rc_count = 0)
-#define refcount_create_untracked(rc) ((rc)->rc_count = 0)
+#define zfs_refcount_create_untracked(rc) ((rc)->rc_count = 0)
-#define refcount_create_tracked(rc) ((rc)->rc_count = 0)
+#define zfs_refcount_create_tracked(rc) ((rc)->rc_count = 0)
-#define refcount_destroy(rc) ((rc)->rc_count = 0)
+#define zfs_refcount_destroy(rc) ((rc)->rc_count = 0)
-#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
+#define zfs_refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
-#define refcount_is_zero(rc) ((rc)->rc_count == 0)
+#define zfs_refcount_is_zero(rc) ((rc)->rc_count == 0)
-#define refcount_count(rc) ((rc)->rc_count)
+#define zfs_refcount_count(rc) ((rc)->rc_count)
#define zfs_refcount_add(rc, holder) atomic_inc_64_nv(&(rc)->rc_count)
-#define refcount_remove(rc, holder) atomic_dec_64_nv(&(rc)->rc_count)
+#define zfs_refcount_remove(rc, holder) atomic_dec_64_nv(&(rc)->rc_count)
-#define refcount_add_many(rc, number, holder) \
+#define zfs_refcount_add_many(rc, number, holder) \
atomic_add_64_nv(&(rc)->rc_count, number)
-#define refcount_remove_many(rc, number, holder) \
+#define zfs_refcount_remove_many(rc, number, holder) \
atomic_add_64_nv(&(rc)->rc_count, -number)
-#define refcount_transfer(dst, src) { \
+#define zfs_refcount_transfer(dst, src) { \
uint64_t __tmp = (src)->rc_count; \
atomic_add_64(&(src)->rc_count, -__tmp); \
atomic_add_64(&(dst)->rc_count, __tmp); \
}
-#define refcount_transfer_ownership(rc, current_holder, new_holder) (void)0
+#define zfs_refcount_transfer_ownership(rc, current_holder, new_holder) (void)0
-#define refcount_held(rc, holder) ((rc)->rc_count > 0)
+#define zfs_refcount_held(rc, holder) ((rc)->rc_count > 0)
-#define refcount_not_held(rc, holder) (B_TRUE)
+#define zfs_refcount_not_held(rc, holder) (B_TRUE)
-#define refcount_init()
+#define zfs_refcount_init()
-#define refcount_fini()
+#define zfs_refcount_fini()
#endif /* ZFS_DEBUG */ #endif /* ZFS_DEBUG */
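A minimal lifecycle sketch under the renamed API (illustrative only; FTAG is the holder tag defined earlier in this header):

static void
example_refcount_lifecycle(void)
{
	zfs_refcount_t rc;

	zfs_refcount_create(&rc);
	(void) zfs_refcount_add(&rc, FTAG);
	ASSERT(!zfs_refcount_is_zero(&rc));
	(void) zfs_refcount_remove(&rc, FTAG);
	zfs_refcount_destroy(&rc);
}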


@ -57,8 +57,8 @@ typedef struct rrwlock {
kmutex_t rr_lock; kmutex_t rr_lock;
kcondvar_t rr_cv; kcondvar_t rr_cv;
kthread_t *rr_writer; kthread_t *rr_writer;
-refcount_t rr_anon_rcount;
+zfs_refcount_t rr_anon_rcount;
-refcount_t rr_linked_rcount;
+zfs_refcount_t rr_linked_rcount;
boolean_t rr_writer_wanted; boolean_t rr_writer_wanted;
boolean_t rr_track_all; boolean_t rr_track_all;
} rrwlock_t; } rrwlock_t;


@ -110,7 +110,7 @@ typedef struct sa_idx_tab {
list_node_t sa_next; list_node_t sa_next;
sa_lot_t *sa_layout; sa_lot_t *sa_layout;
uint16_t *sa_variable_lengths; uint16_t *sa_variable_lengths;
-refcount_t sa_refcount;
+zfs_refcount_t sa_refcount;
uint32_t *sa_idx_tab; /* array of offsets */ uint32_t *sa_idx_tab; /* array of offsets */
} sa_idx_tab_t; } sa_idx_tab_t;


@ -78,7 +78,7 @@ typedef struct spa_config_lock {
kthread_t *scl_writer; kthread_t *scl_writer;
int scl_write_wanted; int scl_write_wanted;
kcondvar_t scl_cv; kcondvar_t scl_cv;
-refcount_t scl_count;
+zfs_refcount_t scl_count;
} spa_config_lock_t; } spa_config_lock_t;
typedef struct spa_config_dirent { typedef struct spa_config_dirent {
@ -281,12 +281,12 @@ struct spa {
/* /*
* spa_refcount & spa_config_lock must be the last elements * spa_refcount & spa_config_lock must be the last elements
-* because refcount_t changes size based on compilation options.
+* because zfs_refcount_t changes size based on compilation options.
* In order for the MDB module to function correctly, the other * In order for the MDB module to function correctly, the other
* fields must remain in the same location. * fields must remain in the same location.
*/ */
spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */ spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */
-refcount_t spa_refcount; /* number of opens */
+zfs_refcount_t spa_refcount; /* number of opens */
taskq_t *spa_upgrade_taskq; /* taskq for upgrade jobs */ taskq_t *spa_upgrade_taskq; /* taskq for upgrade jobs */
}; };


@ -71,7 +71,7 @@
__entry->db_offset = db->db.db_offset; \ __entry->db_offset = db->db.db_offset; \
__entry->db_size = db->db.db_size; \ __entry->db_size = db->db.db_size; \
__entry->db_state = db->db_state; \ __entry->db_state = db->db_state; \
-__entry->db_holds = refcount_count(&db->db_holds); \
+__entry->db_holds = zfs_refcount_count(&db->db_holds); \
snprintf(__get_str(msg), TRACE_DBUF_MSG_MAX, \ snprintf(__get_str(msg), TRACE_DBUF_MSG_MAX, \
DBUF_TP_PRINTK_FMT, DBUF_TP_PRINTK_ARGS); \ DBUF_TP_PRINTK_FMT, DBUF_TP_PRINTK_ARGS); \
} else { \ } else { \


@ -50,7 +50,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
__field(uint64_t, tx_lastsnap_txg) __field(uint64_t, tx_lastsnap_txg)
__field(uint64_t, tx_lasttried_txg) __field(uint64_t, tx_lasttried_txg)
__field(boolean_t, tx_anyobj) __field(boolean_t, tx_anyobj)
-__field(boolean_t, tx_waited)
+__field(boolean_t, tx_dirty_delayed)
__field(hrtime_t, tx_start) __field(hrtime_t, tx_start)
__field(boolean_t, tx_wait_dirty) __field(boolean_t, tx_wait_dirty)
__field(int, tx_err) __field(int, tx_err)
@ -62,7 +62,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
__entry->tx_lastsnap_txg = tx->tx_lastsnap_txg; __entry->tx_lastsnap_txg = tx->tx_lastsnap_txg;
__entry->tx_lasttried_txg = tx->tx_lasttried_txg; __entry->tx_lasttried_txg = tx->tx_lasttried_txg;
__entry->tx_anyobj = tx->tx_anyobj; __entry->tx_anyobj = tx->tx_anyobj;
-__entry->tx_waited = tx->tx_waited;
+__entry->tx_dirty_delayed = tx->tx_dirty_delayed;
__entry->tx_start = tx->tx_start; __entry->tx_start = tx->tx_start;
__entry->tx_wait_dirty = tx->tx_wait_dirty; __entry->tx_wait_dirty = tx->tx_wait_dirty;
__entry->tx_err = tx->tx_err; __entry->tx_err = tx->tx_err;
@ -70,11 +70,12 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
__entry->min_tx_time = min_tx_time; __entry->min_tx_time = min_tx_time;
), ),
TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu " TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu "
"anyobj %d waited %d start %llu wait_dirty %d err %i " "anyobj %d dirty_delayed %d start %llu wait_dirty %d err %i "
"} dirty %llu min_tx_time %llu", "} dirty %llu min_tx_time %llu",
__entry->tx_txg, __entry->tx_lastsnap_txg, __entry->tx_txg, __entry->tx_lastsnap_txg,
__entry->tx_lasttried_txg, __entry->tx_anyobj, __entry->tx_waited, __entry->tx_lasttried_txg, __entry->tx_anyobj,
__entry->tx_start, __entry->tx_wait_dirty, __entry->tx_err, __entry->tx_dirty_delayed, __entry->tx_start,
__entry->tx_wait_dirty, __entry->tx_err,
__entry->dirty, __entry->min_tx_time) __entry->dirty, __entry->min_tx_time)
); );
/* END CSTYLED */ /* END CSTYLED */


@ -42,7 +42,7 @@
#include <sys/uio.h> #include <sys/uio.h>
extern int uiomove(void *, size_t, enum uio_rw, uio_t *); extern int uiomove(void *, size_t, enum uio_rw, uio_t *);
-extern void uio_prefaultpages(ssize_t, uio_t *);
+extern int uio_prefaultpages(ssize_t, uio_t *);
extern int uiocopy(void *, size_t, enum uio_rw, uio_t *, size_t *); extern int uiocopy(void *, size_t, enum uio_rw, uio_t *, size_t *);
extern void uioskip(uio_t *, size_t); extern void uioskip(uio_t *, size_t);


@ -23,23 +23,11 @@
* Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). * Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
* Written by Brian Behlendorf <behlendorf1@llnl.gov>. * Written by Brian Behlendorf <behlendorf1@llnl.gov>.
* LLNL-CODE-403049. * LLNL-CODE-403049.
* Copyright (c) 2018 by Delphix. All rights reserved.
*/ */
#ifndef _SYS_VDEV_DISK_H #ifndef _SYS_VDEV_DISK_H
#define _SYS_VDEV_DISK_H #define _SYS_VDEV_DISK_H
-/*
-* Don't start the slice at the default block of 34; many storage
-* devices will use a stripe width of 128k, other vendors prefer a 1m
-* alignment. It is best to play it safe and ensure a 1m alignment
-* given 512B blocks. When the block size is larger by a power of 2
-* we will still be 1m aligned. Some devices are sensitive to the
-* partition ending alignment as well.
-*/
-#define NEW_START_BLOCK 2048
-#define PARTITION_END_ALIGNMENT 2048
#ifdef _KERNEL #ifdef _KERNEL
#include <sys/vdev.h> #include <sys/vdev.h>


@ -226,7 +226,7 @@ int zap_lookup_norm_by_dnode(dnode_t *dn, const char *name,
boolean_t *ncp); boolean_t *ncp);
int zap_count_write_by_dnode(dnode_t *dn, const char *name, int zap_count_write_by_dnode(dnode_t *dn, const char *name,
-int add, refcount_t *towrite, refcount_t *tooverwrite);
+int add, zfs_refcount_t *towrite, zfs_refcount_t *tooverwrite);
/* /*
* Create an attribute with the given name and value. * Create an attribute with the given name and value.


@ -209,7 +209,7 @@ typedef struct znode_hold {
uint64_t zh_obj; /* object id */ uint64_t zh_obj; /* object id */
kmutex_t zh_lock; /* lock serializing object access */ kmutex_t zh_lock; /* lock serializing object access */
avl_node_t zh_node; /* avl tree linkage */ avl_node_t zh_node; /* avl tree linkage */
-refcount_t zh_refcount; /* active consumer reference count */
+zfs_refcount_t zh_refcount; /* active consumer reference count */
} znode_hold_t; } znode_hold_t;
/* /*


@ -237,7 +237,7 @@ enum zio_child {
#define ZIO_CHILD_DDT_BIT ZIO_CHILD_BIT(ZIO_CHILD_DDT) #define ZIO_CHILD_DDT_BIT ZIO_CHILD_BIT(ZIO_CHILD_DDT)
#define ZIO_CHILD_LOGICAL_BIT ZIO_CHILD_BIT(ZIO_CHILD_LOGICAL) #define ZIO_CHILD_LOGICAL_BIT ZIO_CHILD_BIT(ZIO_CHILD_LOGICAL)
#define ZIO_CHILD_ALL_BITS \ #define ZIO_CHILD_ALL_BITS \
(ZIO_CHILD_VDEV_BIT | ZIO_CHILD_GANG_BIT | \ (ZIO_CHILD_VDEV_BIT | ZIO_CHILD_GANG_BIT | \
ZIO_CHILD_DDT_BIT | ZIO_CHILD_LOGICAL_BIT) ZIO_CHILD_DDT_BIT | ZIO_CHILD_LOGICAL_BIT)
enum zio_wait_type { enum zio_wait_type {
@ -375,7 +375,7 @@ typedef struct zio_transform {
struct zio_transform *zt_next; struct zio_transform *zt_next;
} zio_transform_t; } zio_transform_t;
-typedef int zio_pipe_stage_t(zio_t *zio);
+typedef zio_t *zio_pipe_stage_t(zio_t *zio);
/* /*
* The io_reexecute flags are distinct from io_flags because the child must * The io_reexecute flags are distinct from io_flags because the child must


@ -22,7 +22,6 @@
/* /*
* Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2012 Nexenta Systems, Inc. All rights reserved. * Copyright 2012 Nexenta Systems, Inc. All rights reserved.
-* Copyright (c) 2018 by Delphix. All rights reserved.
*/ */
#include <stdio.h> #include <stdio.h>
@ -1154,7 +1153,7 @@ efi_use_whole_disk(int fd)
/* /*
* Find the last physically non-zero partition. * Find the last physically non-zero partition.
-* This should be the reserved partition.
+* This is the reserved partition.
*/ */
for (i = 0; i < efi_label->efi_nparts; i ++) { for (i = 0; i < efi_label->efi_nparts; i ++) {
if (resv_start < efi_label->efi_parts[i].p_start) { if (resv_start < efi_label->efi_parts[i].p_start) {
@ -1163,23 +1162,6 @@ efi_use_whole_disk(int fd)
} }
} }
-/*
-* Verify that we've found the reserved partition by checking
-* that it looks the way it did when we created it in zpool_label_disk.
-* If we've found the incorrect partition, then we know that this
-* device was reformatted and no longer is soley used by ZFS.
-*/
-if ((efi_label->efi_parts[resv_index].p_size != EFI_MIN_RESV_SIZE) ||
-(efi_label->efi_parts[resv_index].p_tag != V_RESERVED) ||
-(resv_index != 8)) {
-if (efi_debug) {
-(void) fprintf(stderr,
-"efi_use_whole_disk: wholedisk not available\n");
-}
-efi_free(efi_label);
-return (VT_ENOSPC);
-}
/* /*
* Find the last physically non-zero partition before that. * Find the last physically non-zero partition before that.
* This is the data partition. * This is the data partition.


@ -22,7 +22,7 @@
/* /*
* Copyright 2015 Nexenta Systems, Inc. All rights reserved. * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
-* Copyright (c) 2011, 2018 by Delphix. All rights reserved.
+* Copyright (c) 2011, 2014 by Delphix. All rights reserved.
* Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com> * Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com>
* Copyright (c) 2017 Datto Inc. * Copyright (c) 2017 Datto Inc.
*/ */
@ -42,7 +42,6 @@
#include <sys/efi_partition.h> #include <sys/efi_partition.h>
#include <sys/vtoc.h> #include <sys/vtoc.h>
#include <sys/zfs_ioctl.h> #include <sys/zfs_ioctl.h>
-#include <sys/vdev_disk.h>
#include <dlfcn.h> #include <dlfcn.h>
#include "zfs_namecheck.h" #include "zfs_namecheck.h"
@ -935,6 +934,17 @@ zpool_prop_get_feature(zpool_handle_t *zhp, const char *propname, char *buf,
return (0); return (0);
} }
/*
* Don't start the slice at the default block of 34; many storage
* devices will use a stripe width of 128k, other vendors prefer a 1m
* alignment. It is best to play it safe and ensure a 1m alignment
* given 512B blocks. When the block size is larger by a power of 2
* we will still be 1m aligned. Some devices are sensitive to the
* partition ending alignment as well.
*/
#define NEW_START_BLOCK 2048
#define PARTITION_END_ALIGNMENT 2048
/* /*
* Validate the given pool name, optionally putting an extended error message in * Validate the given pool name, optionally putting an extended error message in
* 'buf'. * 'buf'.
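As a quick check on the constants relocated above: with 512-byte sectors, 2048 × 512 B = 1 MiB, so a partition starting at NEW_START_BLOCK is 1 MiB aligned, and any larger power-of-two sector size only scales that offset by a further power of two, preserving the alignment.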


@ -963,13 +963,14 @@ libzfs_load_module(const char *module)
load = 0; load = 0;
} }
-if (load && libzfs_run_process("/sbin/modprobe", argv, 0))
-return (ENOEXEC);
-}
-
-/* Module loading is synchronous it must be available */
-if (!libzfs_module_loaded(module))
-return (ENXIO);
+if (load) {
+if (libzfs_run_process("/sbin/modprobe", argv, 0))
+return (ENOEXEC);
+
+if (!libzfs_module_loaded(module))
+return (ENXIO);
+}
+}
/* /*
* Device creation by udev is asynchronous and waiting may be * Device creation by udev is asynchronous and waiting may be


@ -38,6 +38,19 @@ defined by udev. This may be an absolute path or the base filename.
Maps a physical path to a channel name (typically representing a single Maps a physical path to a channel name (typically representing a single
disk enclosure). disk enclosure).
.TP
\fIenclosure_symlinks\fR <yes|no>
Additionally create /dev/by-enclosure symlinks to the disk enclosure
sg devices using the naming scheme from vdev_id.conf.
\fIenclosure_symlinks\fR is only allowed for sas_direct mode.
.TP
\fIenclosure_symlinks_prefix\fR <prefix>
Specify the prefix for the enclosure symlinks in the form of:
/dev/by-enclosure/<prefix>-<channel><num>
Defaults to "enc" if not specified.
.TP
\fIpci_slot\fR - specifies the PCI SLOT of the HBA \fIpci_slot\fR - specifies the PCI SLOT of the HBA
hosting the disk enclosure being mapped, as found in the output of hosting the disk enclosure being mapped, as found in the output of
.BR lspci (8). .BR lspci (8).
@ -90,7 +103,7 @@ internally uses this value to determine which HBA or switch port a
device is connected to. The default is 4. device is connected to. The default is 4.
.TP .TP
-\fIslot\fR <bay|phy|port|id|lun>
+\fIslot\fR <bay|phy|port|id|lun|ses>
Specifies from which element of a SAS identifier the slot number is Specifies from which element of a SAS identifier the slot number is
taken. The default is bay. taken. The default is bay.
@ -103,6 +116,12 @@ taken. The default is bay.
\fIid\fR - use the scsi id as the slot number. \fIid\fR - use the scsi id as the slot number.
\fIlun\fR - use the scsi lun as the slot number. \fIlun\fR - use the scsi lun as the slot number.
\fIses\fR - use the SCSI Enclosure Services (SES) enclosure device slot number,
as reported by
.BR sg_ses (8).
This is intended for use only on systems where \fIbay\fR is unsupported,
noting that \fIport\fR and \fIid\fR may be unstable across disk replacement.
.SH EXAMPLES .SH EXAMPLES
A non-multipath configuration with direct-attached SAS enclosures and an A non-multipath configuration with direct-attached SAS enclosures and an
arbitrary slot re-mapping. arbitrary slot re-mapping.
@ -163,6 +182,27 @@ definitions - one per physical path.
channel 86:00.0 0 B channel 86:00.0 0 B
.fi .fi
.P .P
A configuration with enclosure_symlinks enabled.
.P
.nf
multipath yes
enclosure_symlinks yes
# PCI_ID HBA PORT CHANNEL NAME
channel 05:00.0 1 U
channel 05:00.0 0 L
channel 06:00.0 1 U
channel 06:00.0 0 L
.fi
In addition to the disks symlinks, this configuration will create:
.P
.nf
/dev/by-enclosure/enc-L0
/dev/by-enclosure/enc-L1
/dev/by-enclosure/enc-U0
/dev/by-enclosure/enc-U1
.fi
.P
A configuration using device link aliases. A configuration using device link aliases.
.P .P
.nf .nf


@ -29,7 +29,7 @@
.\" Copyright 2016 Nexenta Systems, Inc. .\" Copyright 2016 Nexenta Systems, Inc.
.\" Copyright 2016 Richard Laager. All rights reserved. .\" Copyright 2016 Richard Laager. All rights reserved.
.\" .\"
-.Dd July 13, 2018
+.Dd Jan 05, 2019
.Dt ZFS 8 SMM .Dt ZFS 8 SMM
.Os Linux .Os Linux
.Sh NAME .Sh NAME
@ -3981,9 +3981,9 @@ renames the remaining snapshots, and then creates a new snapshot, as follows:
# zfs destroy -r pool/users@7daysago # zfs destroy -r pool/users@7daysago
# zfs rename -r pool/users@6daysago @7daysago # zfs rename -r pool/users@6daysago @7daysago
# zfs rename -r pool/users@5daysago @6daysago # zfs rename -r pool/users@5daysago @6daysago
-# zfs rename -r pool/users@yesterday @5daysago
-# zfs rename -r pool/users@yesterday @4daysago
-# zfs rename -r pool/users@yesterday @3daysago
+# zfs rename -r pool/users@4daysago @5daysago
+# zfs rename -r pool/users@3daysago @4daysago
+# zfs rename -r pool/users@2daysago @3daysago
# zfs rename -r pool/users@yesterday @2daysago # zfs rename -r pool/users@yesterday @2daysago
# zfs rename -r pool/users@today @yesterday # zfs rename -r pool/users@today @yesterday
# zfs snapshot -r pool/users@today # zfs snapshot -r pool/users@today
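The corrected rotation renames each existing snapshot up one slot exactly once; the previous example text renamed @yesterday repeatedly, which could only succeed for the first rename, since the source snapshot no longer exists afterwards.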


@ -36,12 +36,12 @@ modules:
list='$(SUBDIR_TARGETS)'; for targetdir in $$list; do \ list='$(SUBDIR_TARGETS)'; for targetdir in $$list; do \
$(MAKE) -C $$targetdir; \ $(MAKE) -C $$targetdir; \
done done
-$(MAKE) -C @LINUX_OBJ@ SUBDIRS=`pwd` @KERNELMAKE_PARAMS@ CONFIG_ZFS=m $@
+$(MAKE) -C @LINUX_OBJ@ M=`pwd` @KERNELMAKE_PARAMS@ CONFIG_ZFS=m $@
clean: clean:
@# Only cleanup the kernel build directories when CONFIG_KERNEL @# Only cleanup the kernel build directories when CONFIG_KERNEL
@# is defined. This indicates that kernel modules should be built. @# is defined. This indicates that kernel modules should be built.
-@CONFIG_KERNEL_TRUE@ $(MAKE) -C @LINUX_OBJ@ SUBDIRS=`pwd` @KERNELMAKE_PARAMS@ $@
+@CONFIG_KERNEL_TRUE@ $(MAKE) -C @LINUX_OBJ@ M=`pwd` @KERNELMAKE_PARAMS@ $@
if [ -f @SPL_SYMBOLS@ ]; then $(RM) @SPL_SYMBOLS@; fi if [ -f @SPL_SYMBOLS@ ]; then $(RM) @SPL_SYMBOLS@; fi
if [ -f @LINUX_SYMBOLS@ ]; then $(RM) @LINUX_SYMBOLS@; fi if [ -f @LINUX_SYMBOLS@ ]; then $(RM) @LINUX_SYMBOLS@; fi
@ -49,7 +49,7 @@ clean:
modules_install: modules_install:
@# Install the kernel modules @# Install the kernel modules
-$(MAKE) -C @LINUX_OBJ@ SUBDIRS=`pwd` $@ \
+$(MAKE) -C @LINUX_OBJ@ M=`pwd` $@ \
INSTALL_MOD_PATH=$(DESTDIR)$(INSTALL_MOD_PATH) \ INSTALL_MOD_PATH=$(DESTDIR)$(INSTALL_MOD_PATH) \
INSTALL_MOD_DIR=$(INSTALL_MOD_DIR) \ INSTALL_MOD_DIR=$(INSTALL_MOD_DIR) \
KERNELRELEASE=@LINUX_VERSION@ KERNELRELEASE=@LINUX_VERSION@
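For reference, kbuild's external-module interface treats M=<dir> as the path of an out-of-tree module source tree to build against the configured kernel in @LINUX_OBJ@, so 'make -C <kernel-obj> M=$PWD modules' is the canonical invocation pattern these rules now follow.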


@ -50,6 +50,7 @@
#include <sys/types.h> #include <sys/types.h>
#include <sys/uio_impl.h> #include <sys/uio_impl.h>
#include <linux/kmap_compat.h> #include <linux/kmap_compat.h>
#include <linux/uaccess.h>
/* /*
* Move "n" bytes at byte address "p"; "rw" indicates the direction * Move "n" bytes at byte address "p"; "rw" indicates the direction
@ -77,8 +78,23 @@ uiomove_iov(void *p, size_t n, enum uio_rw rw, struct uio *uio)
if (copy_to_user(iov->iov_base+skip, p, cnt)) if (copy_to_user(iov->iov_base+skip, p, cnt))
return (EFAULT); return (EFAULT);
} else { } else {
-if (copy_from_user(p, iov->iov_base+skip, cnt))
-return (EFAULT);
+if (uio->uio_fault_disable) {
+if (!zfs_access_ok(VERIFY_READ,
(iov->iov_base + skip), cnt)) {
return (EFAULT);
}
pagefault_disable();
if (__copy_from_user_inatomic(p,
(iov->iov_base + skip), cnt)) {
pagefault_enable();
return (EFAULT);
}
pagefault_enable();
} else {
if (copy_from_user(p,
(iov->iov_base + skip), cnt))
return (EFAULT);
}
} }
break; break;
case UIO_SYSSPACE: case UIO_SYSSPACE:
@ -156,7 +172,7 @@ EXPORT_SYMBOL(uiomove);
* error will terminate the process as this is only a best attempt to get * error will terminate the process as this is only a best attempt to get
* the pages resident. * the pages resident.
*/ */
-void
+int
uio_prefaultpages(ssize_t n, struct uio *uio) uio_prefaultpages(ssize_t n, struct uio *uio)
{ {
const struct iovec *iov; const struct iovec *iov;
@ -170,7 +186,7 @@ uio_prefaultpages(ssize_t n, struct uio *uio)
switch (uio->uio_segflg) { switch (uio->uio_segflg) {
case UIO_SYSSPACE: case UIO_SYSSPACE:
case UIO_BVEC: case UIO_BVEC:
-return;
+return (0);
case UIO_USERSPACE: case UIO_USERSPACE:
case UIO_USERISPACE: case UIO_USERISPACE:
break; break;
@ -194,7 +210,7 @@ uio_prefaultpages(ssize_t n, struct uio *uio)
p = iov->iov_base + skip; p = iov->iov_base + skip;
while (cnt) { while (cnt) {
if (fuword8((uint8_t *)p, &tmp)) if (fuword8((uint8_t *)p, &tmp))
-return;
+return (EFAULT);
incr = MIN(cnt, PAGESIZE); incr = MIN(cnt, PAGESIZE);
p += incr; p += incr;
cnt -= incr; cnt -= incr;
@ -204,8 +220,10 @@ uio_prefaultpages(ssize_t n, struct uio *uio)
*/ */
p--; p--;
if (fuword8((uint8_t *)p, &tmp)) if (fuword8((uint8_t *)p, &tmp))
-return;
+return (EFAULT);
} }
return (0);
} }
EXPORT_SYMBOL(uio_prefaultpages); EXPORT_SYMBOL(uio_prefaultpages);
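Hypothetical caller sketch (names assumed, not from the patch): with the int return type, a write path can abort early on an unmapped user page instead of failing later inside uiomove().

static int
example_prefault(ssize_t n, struct uio *uio)
{
	if (uio_prefaultpages(n, uio) != 0)
		return (EFAULT);
	return (0);
}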


@ -597,7 +597,7 @@ abd_alloc(size_t size, boolean_t is_metadata)
} }
abd->abd_size = size; abd->abd_size = size;
abd->abd_parent = NULL; abd->abd_parent = NULL;
-refcount_create(&abd->abd_children);
+zfs_refcount_create(&abd->abd_children);
abd->abd_u.abd_scatter.abd_offset = 0; abd->abd_u.abd_scatter.abd_offset = 0;
@ -614,7 +614,7 @@ abd_free_scatter(abd_t *abd)
{ {
abd_free_pages(abd); abd_free_pages(abd);
-refcount_destroy(&abd->abd_children);
+zfs_refcount_destroy(&abd->abd_children);
ABDSTAT_BUMPDOWN(abdstat_scatter_cnt); ABDSTAT_BUMPDOWN(abdstat_scatter_cnt);
ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size); ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size);
ABDSTAT_INCR(abdstat_scatter_chunk_waste, ABDSTAT_INCR(abdstat_scatter_chunk_waste,
@ -641,7 +641,7 @@ abd_alloc_linear(size_t size, boolean_t is_metadata)
} }
abd->abd_size = size; abd->abd_size = size;
abd->abd_parent = NULL; abd->abd_parent = NULL;
-refcount_create(&abd->abd_children);
+zfs_refcount_create(&abd->abd_children);
if (is_metadata) { if (is_metadata) {
abd->abd_u.abd_linear.abd_buf = zio_buf_alloc(size); abd->abd_u.abd_linear.abd_buf = zio_buf_alloc(size);
@ -664,7 +664,7 @@ abd_free_linear(abd_t *abd)
zio_data_buf_free(abd->abd_u.abd_linear.abd_buf, abd->abd_size); zio_data_buf_free(abd->abd_u.abd_linear.abd_buf, abd->abd_size);
} }
-refcount_destroy(&abd->abd_children);
+zfs_refcount_destroy(&abd->abd_children);
ABDSTAT_BUMPDOWN(abdstat_linear_cnt); ABDSTAT_BUMPDOWN(abdstat_linear_cnt);
ABDSTAT_INCR(abdstat_linear_data_size, -(int)abd->abd_size); ABDSTAT_INCR(abdstat_linear_data_size, -(int)abd->abd_size);
@ -775,8 +775,8 @@ abd_get_offset_impl(abd_t *sabd, size_t off, size_t size)
abd->abd_size = size; abd->abd_size = size;
abd->abd_parent = sabd; abd->abd_parent = sabd;
-refcount_create(&abd->abd_children);
+zfs_refcount_create(&abd->abd_children);
-(void) refcount_add_many(&sabd->abd_children, abd->abd_size, abd);
+(void) zfs_refcount_add_many(&sabd->abd_children, abd->abd_size, abd);
return (abd); return (abd);
} }
@ -818,7 +818,7 @@ abd_get_from_buf(void *buf, size_t size)
abd->abd_flags = ABD_FLAG_LINEAR; abd->abd_flags = ABD_FLAG_LINEAR;
abd->abd_size = size; abd->abd_size = size;
abd->abd_parent = NULL; abd->abd_parent = NULL;
-refcount_create(&abd->abd_children);
+zfs_refcount_create(&abd->abd_children);
abd->abd_u.abd_linear.abd_buf = buf; abd->abd_u.abd_linear.abd_buf = buf;
@ -836,11 +836,11 @@ abd_put(abd_t *abd)
ASSERT(!(abd->abd_flags & ABD_FLAG_OWNER)); ASSERT(!(abd->abd_flags & ABD_FLAG_OWNER));
if (abd->abd_parent != NULL) { if (abd->abd_parent != NULL) {
-(void) refcount_remove_many(&abd->abd_parent->abd_children,
+(void) zfs_refcount_remove_many(&abd->abd_parent->abd_children,
abd->abd_size, abd); abd->abd_size, abd);
} }
-refcount_destroy(&abd->abd_children);
+zfs_refcount_destroy(&abd->abd_children);
abd_free_struct(abd); abd_free_struct(abd);
} }
@ -872,7 +872,7 @@ abd_borrow_buf(abd_t *abd, size_t n)
} else { } else {
buf = zio_buf_alloc(n); buf = zio_buf_alloc(n);
} }
-(void) refcount_add_many(&abd->abd_children, n, buf);
+(void) zfs_refcount_add_many(&abd->abd_children, n, buf);
return (buf); return (buf);
} }
@ -904,7 +904,7 @@ abd_return_buf(abd_t *abd, void *buf, size_t n)
ASSERT0(abd_cmp_buf(abd, buf, n)); ASSERT0(abd_cmp_buf(abd, buf, n));
zio_buf_free(buf, n); zio_buf_free(buf, n);
} }
-(void) refcount_remove_many(&abd->abd_children, n, buf);
+(void) zfs_refcount_remove_many(&abd->abd_children, n, buf);
} }
void void
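A short usage sketch (assumed, not taken from the patch) of the borrow/return pair whose accounting the renamed calls above maintain: abd_borrow_buf() records the buffer in abd_children via zfs_refcount_add_many(), and abd_return_buf() drops that hold again.

void *buf = abd_borrow_buf(abd, len);
/* ... operate on the borrowed linear buffer ... */
abd_return_buf(abd, buf, len);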


@ -1181,7 +1181,7 @@ hdr_full_cons(void *vbuf, void *unused, int kmflag)
bzero(hdr, HDR_FULL_SIZE); bzero(hdr, HDR_FULL_SIZE);
cv_init(&hdr->b_l1hdr.b_cv, NULL, CV_DEFAULT, NULL); cv_init(&hdr->b_l1hdr.b_cv, NULL, CV_DEFAULT, NULL);
-refcount_create(&hdr->b_l1hdr.b_refcnt);
+zfs_refcount_create(&hdr->b_l1hdr.b_refcnt);
mutex_init(&hdr->b_l1hdr.b_freeze_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&hdr->b_l1hdr.b_freeze_lock, NULL, MUTEX_DEFAULT, NULL);
list_link_init(&hdr->b_l1hdr.b_arc_node); list_link_init(&hdr->b_l1hdr.b_arc_node);
list_link_init(&hdr->b_l2hdr.b_l2node); list_link_init(&hdr->b_l2hdr.b_l2node);
@ -1228,7 +1228,7 @@ hdr_full_dest(void *vbuf, void *unused)
ASSERT(HDR_EMPTY(hdr)); ASSERT(HDR_EMPTY(hdr));
cv_destroy(&hdr->b_l1hdr.b_cv); cv_destroy(&hdr->b_l1hdr.b_cv);
-refcount_destroy(&hdr->b_l1hdr.b_refcnt);
+zfs_refcount_destroy(&hdr->b_l1hdr.b_refcnt);
mutex_destroy(&hdr->b_l1hdr.b_freeze_lock); mutex_destroy(&hdr->b_l1hdr.b_freeze_lock);
ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node)); ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node));
arc_space_return(HDR_FULL_SIZE, ARC_SPACE_HDRS); arc_space_return(HDR_FULL_SIZE, ARC_SPACE_HDRS);
@ -1893,20 +1893,20 @@ arc_evictable_space_increment(arc_buf_hdr_t *hdr, arc_state_t *state)
ASSERT0(hdr->b_l1hdr.b_bufcnt); ASSERT0(hdr->b_l1hdr.b_bufcnt);
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
-(void) refcount_add_many(&state->arcs_esize[type],
+(void) zfs_refcount_add_many(&state->arcs_esize[type],
HDR_GET_LSIZE(hdr), hdr); HDR_GET_LSIZE(hdr), hdr);
return; return;
} }
ASSERT(!GHOST_STATE(state)); ASSERT(!GHOST_STATE(state));
if (hdr->b_l1hdr.b_pabd != NULL) { if (hdr->b_l1hdr.b_pabd != NULL) {
-(void) refcount_add_many(&state->arcs_esize[type],
+(void) zfs_refcount_add_many(&state->arcs_esize[type],
arc_hdr_size(hdr), hdr); arc_hdr_size(hdr), hdr);
} }
for (buf = hdr->b_l1hdr.b_buf; buf != NULL; buf = buf->b_next) { for (buf = hdr->b_l1hdr.b_buf; buf != NULL; buf = buf->b_next) {
if (arc_buf_is_shared(buf)) if (arc_buf_is_shared(buf))
continue; continue;
-(void) refcount_add_many(&state->arcs_esize[type],
+(void) zfs_refcount_add_many(&state->arcs_esize[type],
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
} }
@ -1928,20 +1928,20 @@ arc_evictable_space_decrement(arc_buf_hdr_t *hdr, arc_state_t *state)
ASSERT0(hdr->b_l1hdr.b_bufcnt); ASSERT0(hdr->b_l1hdr.b_bufcnt);
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
-(void) refcount_remove_many(&state->arcs_esize[type],
+(void) zfs_refcount_remove_many(&state->arcs_esize[type],
HDR_GET_LSIZE(hdr), hdr); HDR_GET_LSIZE(hdr), hdr);
return; return;
} }
ASSERT(!GHOST_STATE(state)); ASSERT(!GHOST_STATE(state));
if (hdr->b_l1hdr.b_pabd != NULL) { if (hdr->b_l1hdr.b_pabd != NULL) {
-(void) refcount_remove_many(&state->arcs_esize[type],
+(void) zfs_refcount_remove_many(&state->arcs_esize[type],
arc_hdr_size(hdr), hdr); arc_hdr_size(hdr), hdr);
} }
for (buf = hdr->b_l1hdr.b_buf; buf != NULL; buf = buf->b_next) { for (buf = hdr->b_l1hdr.b_buf; buf != NULL; buf = buf->b_next) {
if (arc_buf_is_shared(buf)) if (arc_buf_is_shared(buf))
continue; continue;
-(void) refcount_remove_many(&state->arcs_esize[type],
+(void) zfs_refcount_remove_many(&state->arcs_esize[type],
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
} }
@ -1960,13 +1960,13 @@ add_reference(arc_buf_hdr_t *hdr, void *tag)
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
if (!MUTEX_HELD(HDR_LOCK(hdr))) { if (!MUTEX_HELD(HDR_LOCK(hdr))) {
ASSERT(hdr->b_l1hdr.b_state == arc_anon); ASSERT(hdr->b_l1hdr.b_state == arc_anon);
-ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
+ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
} }
state = hdr->b_l1hdr.b_state; state = hdr->b_l1hdr.b_state;
-if ((refcount_add(&hdr->b_l1hdr.b_refcnt, tag) == 1) &&
+if ((zfs_refcount_add(&hdr->b_l1hdr.b_refcnt, tag) == 1) &&
(state != arc_anon)) { (state != arc_anon)) {
/* We don't use the L2-only state list. */ /* We don't use the L2-only state list. */
if (state != arc_l2c_only) { if (state != arc_l2c_only) {
@ -1998,7 +1998,7 @@ remove_reference(arc_buf_hdr_t *hdr, kmutex_t *hash_lock, void *tag)
* arc_l2c_only counts as a ghost state so we don't need to explicitly * arc_l2c_only counts as a ghost state so we don't need to explicitly
* check to prevent usage of the arc_l2c_only list. * check to prevent usage of the arc_l2c_only list.
*/ */
-if (((cnt = refcount_remove(&hdr->b_l1hdr.b_refcnt, tag)) == 0) &&
+if (((cnt = zfs_refcount_remove(&hdr->b_l1hdr.b_refcnt, tag)) == 0) &&
(state != arc_anon)) { (state != arc_anon)) {
multilist_insert(state->arcs_list[arc_buf_type(hdr)], hdr); multilist_insert(state->arcs_list[arc_buf_type(hdr)], hdr);
ASSERT3U(hdr->b_l1hdr.b_bufcnt, >, 0); ASSERT3U(hdr->b_l1hdr.b_bufcnt, >, 0);
@ -2043,7 +2043,7 @@ arc_buf_info(arc_buf_t *ab, arc_buf_info_t *abi, int state_index)
abi->abi_mru_ghost_hits = l1hdr->b_mru_ghost_hits; abi->abi_mru_ghost_hits = l1hdr->b_mru_ghost_hits;
abi->abi_mfu_hits = l1hdr->b_mfu_hits; abi->abi_mfu_hits = l1hdr->b_mfu_hits;
abi->abi_mfu_ghost_hits = l1hdr->b_mfu_ghost_hits; abi->abi_mfu_ghost_hits = l1hdr->b_mfu_ghost_hits;
-abi->abi_holds = refcount_count(&l1hdr->b_refcnt);
+abi->abi_holds = zfs_refcount_count(&l1hdr->b_refcnt);
} }
if (l2hdr) { if (l2hdr) {
@ -2079,7 +2079,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr,
*/ */
if (HDR_HAS_L1HDR(hdr)) { if (HDR_HAS_L1HDR(hdr)) {
old_state = hdr->b_l1hdr.b_state; old_state = hdr->b_l1hdr.b_state;
-refcnt = refcount_count(&hdr->b_l1hdr.b_refcnt);
+refcnt = zfs_refcount_count(&hdr->b_l1hdr.b_refcnt);
bufcnt = hdr->b_l1hdr.b_bufcnt; bufcnt = hdr->b_l1hdr.b_bufcnt;
update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pabd != NULL); update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pabd != NULL);
} else { } else {
@ -2148,7 +2148,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr,
* the reference. As a result, we use the arc * the reference. As a result, we use the arc
* header pointer for the reference. * header pointer for the reference.
*/ */
-(void) refcount_add_many(&new_state->arcs_size,
+(void) zfs_refcount_add_many(&new_state->arcs_size,
HDR_GET_LSIZE(hdr), hdr); HDR_GET_LSIZE(hdr), hdr);
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
} else { } else {
@ -2175,13 +2175,15 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr,
if (arc_buf_is_shared(buf)) if (arc_buf_is_shared(buf))
continue; continue;
-(void) refcount_add_many(&new_state->arcs_size,
+(void) zfs_refcount_add_many(
+&new_state->arcs_size,
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
ASSERT3U(bufcnt, ==, buffers); ASSERT3U(bufcnt, ==, buffers);
if (hdr->b_l1hdr.b_pabd != NULL) { if (hdr->b_l1hdr.b_pabd != NULL) {
(void) refcount_add_many(&new_state->arcs_size, (void) zfs_refcount_add_many(&new_state->arcs_size,
arc_hdr_size(hdr), hdr); arc_hdr_size(hdr), hdr);
} else { } else {
ASSERT(GHOST_STATE(old_state)); ASSERT(GHOST_STATE(old_state));
@ -2203,7 +2205,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr,
* header on the ghost state. * header on the ghost state.
*/ */
(void) refcount_remove_many(&old_state->arcs_size, (void) zfs_refcount_remove_many(&old_state->arcs_size,
HDR_GET_LSIZE(hdr), hdr); HDR_GET_LSIZE(hdr), hdr);
} else { } else {
arc_buf_t *buf; arc_buf_t *buf;
@ -2229,13 +2231,13 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t *hdr,
if (arc_buf_is_shared(buf)) if (arc_buf_is_shared(buf))
continue; continue;
(void) refcount_remove_many( (void) zfs_refcount_remove_many(
&old_state->arcs_size, arc_buf_size(buf), &old_state->arcs_size, arc_buf_size(buf),
buf); buf);
} }
ASSERT3U(bufcnt, ==, buffers); ASSERT3U(bufcnt, ==, buffers);
ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
(void) refcount_remove_many( (void) zfs_refcount_remove_many(
&old_state->arcs_size, arc_hdr_size(hdr), hdr); &old_state->arcs_size, arc_hdr_size(hdr), hdr);
} }
} }
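
arc_change_state() moves the byte accounting with the header: every resident buffer's bytes leave old_state->arcs_size and land in new_state->arcs_size, keeping each per-state total equal to what is actually resident there. A toy walk-through using the sketch above:

    #include <assert.h>

    void
    example_change_state(void)
    {
        zfs_refcount_t mru_size, mfu_size;
        char buf_tag;                   /* stand-in for an arc_buf_t * */

        zfs_refcount_create(&mru_size);
        zfs_refcount_create(&mfu_size);

        /* Buffer enters the MRU state: 128K accounted to arc_mru. */
        (void) zfs_refcount_add_many(&mru_size, 131072, &buf_tag);

        /* arc_change_state(MRU -> MFU): remove from old, add to new. */
        (void) zfs_refcount_remove_many(&mru_size, 131072, &buf_tag);
        (void) zfs_refcount_add_many(&mfu_size, 131072, &buf_tag);

        assert(zfs_refcount_is_zero(&mru_size));
        assert(zfs_refcount_count(&mfu_size) == 131072);
    }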
@ -2505,8 +2507,8 @@ arc_return_buf(arc_buf_t *buf, void *tag)
ASSERT3P(buf->b_data, !=, NULL); ASSERT3P(buf->b_data, !=, NULL);
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
(void) refcount_add(&hdr->b_l1hdr.b_refcnt, tag); (void) zfs_refcount_add(&hdr->b_l1hdr.b_refcnt, tag);
(void) refcount_remove(&hdr->b_l1hdr.b_refcnt, arc_onloan_tag); (void) zfs_refcount_remove(&hdr->b_l1hdr.b_refcnt, arc_onloan_tag);
arc_loaned_bytes_update(-arc_buf_size(buf)); arc_loaned_bytes_update(-arc_buf_size(buf));
} }
@ -2519,8 +2521,8 @@ arc_loan_inuse_buf(arc_buf_t *buf, void *tag)
ASSERT3P(buf->b_data, !=, NULL); ASSERT3P(buf->b_data, !=, NULL);
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
(void) refcount_add(&hdr->b_l1hdr.b_refcnt, arc_onloan_tag); (void) zfs_refcount_add(&hdr->b_l1hdr.b_refcnt, arc_onloan_tag);
(void) refcount_remove(&hdr->b_l1hdr.b_refcnt, tag); (void) zfs_refcount_remove(&hdr->b_l1hdr.b_refcnt, tag);
arc_loaned_bytes_update(arc_buf_size(buf)); arc_loaned_bytes_update(arc_buf_size(buf));
} }
@ -2547,13 +2549,13 @@ arc_hdr_free_on_write(arc_buf_hdr_t *hdr)
/* protected by hash lock, if in the hash table */ /* protected by hash lock, if in the hash table */
if (multilist_link_active(&hdr->b_l1hdr.b_arc_node)) { if (multilist_link_active(&hdr->b_l1hdr.b_arc_node)) {
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
ASSERT(state != arc_anon && state != arc_l2c_only); ASSERT(state != arc_anon && state != arc_l2c_only);
(void) refcount_remove_many(&state->arcs_esize[type], (void) zfs_refcount_remove_many(&state->arcs_esize[type],
size, hdr); size, hdr);
} }
(void) refcount_remove_many(&state->arcs_size, size, hdr); (void) zfs_refcount_remove_many(&state->arcs_size, size, hdr);
if (type == ARC_BUFC_METADATA) { if (type == ARC_BUFC_METADATA) {
arc_space_return(size, ARC_SPACE_META); arc_space_return(size, ARC_SPACE_META);
} else { } else {
@ -2581,7 +2583,8 @@ arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
* refcount ownership to the hdr since it always owns * refcount ownership to the hdr since it always owns
* the refcount whenever an arc_buf_t is shared. * the refcount whenever an arc_buf_t is shared.
*/ */
refcount_transfer_ownership(&hdr->b_l1hdr.b_state->arcs_size, buf, hdr); zfs_refcount_transfer_ownership(&hdr->b_l1hdr.b_state->arcs_size, buf, hdr);
hdr->b_l1hdr.b_pabd = abd_get_from_buf(buf->b_data, arc_buf_size(buf)); hdr->b_l1hdr.b_pabd = abd_get_from_buf(buf->b_data, arc_buf_size(buf));
abd_take_ownership_of_buf(hdr->b_l1hdr.b_pabd, abd_take_ownership_of_buf(hdr->b_l1hdr.b_pabd,
HDR_ISTYPE_METADATA(hdr)); HDR_ISTYPE_METADATA(hdr));
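
arc_share_buf() hands the arcs_size accounting from the buf to the hdr because a shared buffer's bytes belong to the header for as long as the sharing lasts; arc_unshare_buf() below performs the mirror-image transfer back. zfs_refcount_transfer_ownership() changes the recorded holder without touching the count; in a counter-only build it reduces to a sanity check, roughly:

    /*
     * Illustrative only: swap the remembered tag while leaving the
     * count untouched (the real function updates a tracked holder list).
     */
    static void
    zfs_refcount_transfer_ownership(zfs_refcount_t *rc,
        const void *current_holder, const void *new_holder)
    {
        (void) current_holder;
        (void) new_holder;
        assert(rc->rc_count > 0);       /* must hold something to transfer */
    }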
@ -2609,7 +2612,8 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
* We are no longer sharing this buffer so we need * We are no longer sharing this buffer so we need
* to transfer its ownership to the rightful owner. * to transfer its ownership to the rightful owner.
*/ */
refcount_transfer_ownership(&hdr->b_l1hdr.b_state->arcs_size, hdr, buf); zfs_refcount_transfer_ownership(&hdr->b_l1hdr.b_state->arcs_size, hdr, buf);
arc_hdr_clear_flags(hdr, ARC_FLAG_SHARED_DATA); arc_hdr_clear_flags(hdr, ARC_FLAG_SHARED_DATA);
abd_release_ownership_of_buf(hdr->b_l1hdr.b_pabd); abd_release_ownership_of_buf(hdr->b_l1hdr.b_pabd);
abd_put(hdr->b_l1hdr.b_pabd); abd_put(hdr->b_l1hdr.b_pabd);
@ -2833,7 +2837,7 @@ arc_hdr_alloc(uint64_t spa, int32_t psize, int32_t lsize,
* it references and compressed arc enablement. * it references and compressed arc enablement.
*/ */
arc_hdr_alloc_pabd(hdr); arc_hdr_alloc_pabd(hdr);
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
return (hdr); return (hdr);
} }
@ -2927,8 +2931,10 @@ arc_hdr_realloc(arc_buf_hdr_t *hdr, kmem_cache_t *old, kmem_cache_t *new)
* the wrong pointer address when calling arc_hdr_destroy() later. * the wrong pointer address when calling arc_hdr_destroy() later.
*/ */
(void) refcount_remove_many(&dev->l2ad_alloc, arc_hdr_size(hdr), hdr); (void) zfs_refcount_remove_many(&dev->l2ad_alloc, arc_hdr_size(hdr), hdr);
(void) refcount_add_many(&dev->l2ad_alloc, arc_hdr_size(nhdr), nhdr); (void) zfs_refcount_add_many(&dev->l2ad_alloc, arc_hdr_size(nhdr), nhdr);
buf_discard_identity(hdr); buf_discard_identity(hdr);
kmem_cache_free(old, hdr); kmem_cache_free(old, hdr);
@ -3008,7 +3014,7 @@ arc_hdr_l2hdr_destroy(arc_buf_hdr_t *hdr)
vdev_space_update(dev->l2ad_vdev, -psize, 0, 0); vdev_space_update(dev->l2ad_vdev, -psize, 0, 0);
(void) refcount_remove_many(&dev->l2ad_alloc, psize, hdr); (void) zfs_refcount_remove_many(&dev->l2ad_alloc, psize, hdr);
arc_hdr_clear_flags(hdr, ARC_FLAG_HAS_L2HDR); arc_hdr_clear_flags(hdr, ARC_FLAG_HAS_L2HDR);
} }
@ -3018,7 +3024,7 @@ arc_hdr_destroy(arc_buf_hdr_t *hdr)
if (HDR_HAS_L1HDR(hdr)) { if (HDR_HAS_L1HDR(hdr)) {
ASSERT(hdr->b_l1hdr.b_buf == NULL || ASSERT(hdr->b_l1hdr.b_buf == NULL ||
hdr->b_l1hdr.b_bufcnt > 0); hdr->b_l1hdr.b_bufcnt > 0);
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon); ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
} }
ASSERT(!HDR_IO_IN_PROGRESS(hdr)); ASSERT(!HDR_IO_IN_PROGRESS(hdr));
@ -3171,7 +3177,7 @@ arc_evict_hdr(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
return (bytes_evicted); return (bytes_evicted);
} }
ASSERT0(refcount_count(&hdr->b_l1hdr.b_refcnt)); ASSERT0(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt));
while (hdr->b_l1hdr.b_buf) { while (hdr->b_l1hdr.b_buf) {
arc_buf_t *buf = hdr->b_l1hdr.b_buf; arc_buf_t *buf = hdr->b_l1hdr.b_buf;
if (!mutex_tryenter(&buf->b_evict_lock)) { if (!mutex_tryenter(&buf->b_evict_lock)) {
@ -3484,7 +3490,7 @@ arc_flush_state(arc_state_t *state, uint64_t spa, arc_buf_contents_t type,
{ {
uint64_t evicted = 0; uint64_t evicted = 0;
while (refcount_count(&state->arcs_esize[type]) != 0) { while (zfs_refcount_count(&state->arcs_esize[type]) != 0) {
evicted += arc_evict_state(state, spa, ARC_EVICT_ALL, type); evicted += arc_evict_state(state, spa, ARC_EVICT_ALL, type);
if (!retry) if (!retry)
@ -3507,7 +3513,7 @@ arc_prune_task(void *ptr)
if (func != NULL) if (func != NULL)
func(ap->p_adjust, ap->p_private); func(ap->p_adjust, ap->p_private);
refcount_remove(&ap->p_refcnt, func); zfs_refcount_remove(&ap->p_refcnt, func);
} }
/* /*
@ -3530,14 +3536,14 @@ arc_prune_async(int64_t adjust)
for (ap = list_head(&arc_prune_list); ap != NULL; for (ap = list_head(&arc_prune_list); ap != NULL;
ap = list_next(&arc_prune_list, ap)) { ap = list_next(&arc_prune_list, ap)) {
if (refcount_count(&ap->p_refcnt) >= 2) if (zfs_refcount_count(&ap->p_refcnt) >= 2)
continue; continue;
refcount_add(&ap->p_refcnt, ap->p_pfunc); zfs_refcount_add(&ap->p_refcnt, ap->p_pfunc);
ap->p_adjust = adjust; ap->p_adjust = adjust;
if (taskq_dispatch(arc_prune_taskq, arc_prune_task, if (taskq_dispatch(arc_prune_taskq, arc_prune_task,
ap, TQ_SLEEP) == TASKQID_INVALID) { ap, TQ_SLEEP) == TASKQID_INVALID) {
refcount_remove(&ap->p_refcnt, ap->p_pfunc); zfs_refcount_remove(&ap->p_refcnt, ap->p_pfunc);
continue; continue;
} }
ARCSTAT_BUMP(arcstat_prune); ARCSTAT_BUMP(arcstat_prune);
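
The p_refcnt checks make arc_prune_async() idempotent per callback: a count of 2 or more means a prune task is already in flight (one hold from the list, one from the pending task), and a failed dispatch immediately drops the hold it just took. The bare pattern, using the sketch above (try_dispatch() is a hypothetical stand-in for taskq_dispatch()):

    /* Stand-in for taskq_dispatch(); returns 0 on dispatch failure. */
    extern int try_dispatch(void (*fn)(void *), void *arg);

    static void
    prune_async_one(zfs_refcount_t *p_refcnt, void (*pfunc)(void *), void *ap)
    {
        if (zfs_refcount_count(p_refcnt) >= 2)
            return;                     /* a prune task is already queued */
        (void) zfs_refcount_add(p_refcnt, ap);
        if (try_dispatch(pfunc, ap) == 0)
            (void) zfs_refcount_remove(p_refcnt, ap);   /* undo the hold */
    }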
@ -3559,8 +3565,9 @@ arc_adjust_impl(arc_state_t *state, uint64_t spa, int64_t bytes,
{ {
int64_t delta; int64_t delta;
if (bytes > 0 && refcount_count(&state->arcs_esize[type]) > 0) { if (bytes > 0 && zfs_refcount_count(&state->arcs_esize[type]) > 0) {
delta = MIN(refcount_count(&state->arcs_esize[type]), bytes); delta = MIN(zfs_refcount_count(&state->arcs_esize[type]), bytes);
return (arc_evict_state(state, spa, delta, type)); return (arc_evict_state(state, spa, delta, type));
} }
@ -3603,8 +3610,9 @@ restart:
*/ */
adjustmnt = arc_meta_used - arc_meta_limit; adjustmnt = arc_meta_used - arc_meta_limit;
if (adjustmnt > 0 && refcount_count(&arc_mru->arcs_esize[type]) > 0) { if (adjustmnt > 0 && zfs_refcount_count(&arc_mru->arcs_esize[type]) > 0) {
delta = MIN(refcount_count(&arc_mru->arcs_esize[type]), delta = MIN(zfs_refcount_count(&arc_mru->arcs_esize[type]),
adjustmnt); adjustmnt);
total_evicted += arc_adjust_impl(arc_mru, 0, delta, type); total_evicted += arc_adjust_impl(arc_mru, 0, delta, type);
adjustmnt -= delta; adjustmnt -= delta;
@ -3620,8 +3628,9 @@ restart:
* simply decrement the amount of data evicted from the MRU. * simply decrement the amount of data evicted from the MRU.
*/ */
if (adjustmnt > 0 && refcount_count(&arc_mfu->arcs_esize[type]) > 0) { if (adjustmnt > 0 && zfs_refcount_count(&arc_mfu->arcs_esize[type]) > 0) {
delta = MIN(refcount_count(&arc_mfu->arcs_esize[type]), delta = MIN(zfs_refcount_count(&arc_mfu->arcs_esize[type]),
adjustmnt); adjustmnt);
total_evicted += arc_adjust_impl(arc_mfu, 0, delta, type); total_evicted += arc_adjust_impl(arc_mfu, 0, delta, type);
} }
@ -3629,17 +3638,17 @@ restart:
adjustmnt = arc_meta_used - arc_meta_limit; adjustmnt = arc_meta_used - arc_meta_limit;
if (adjustmnt > 0 && if (adjustmnt > 0 &&
refcount_count(&arc_mru_ghost->arcs_esize[type]) > 0) { zfs_refcount_count(&arc_mru_ghost->arcs_esize[type]) > 0) {
delta = MIN(adjustmnt, delta = MIN(adjustmnt,
refcount_count(&arc_mru_ghost->arcs_esize[type])); zfs_refcount_count(&arc_mru_ghost->arcs_esize[type]));
total_evicted += arc_adjust_impl(arc_mru_ghost, 0, delta, type); total_evicted += arc_adjust_impl(arc_mru_ghost, 0, delta, type);
adjustmnt -= delta; adjustmnt -= delta;
} }
if (adjustmnt > 0 && if (adjustmnt > 0 &&
refcount_count(&arc_mfu_ghost->arcs_esize[type]) > 0) { zfs_refcount_count(&arc_mfu_ghost->arcs_esize[type]) > 0) {
delta = MIN(adjustmnt, delta = MIN(adjustmnt,
refcount_count(&arc_mfu_ghost->arcs_esize[type])); zfs_refcount_count(&arc_mfu_ghost->arcs_esize[type]));
total_evicted += arc_adjust_impl(arc_mfu_ghost, 0, delta, type); total_evicted += arc_adjust_impl(arc_mfu_ghost, 0, delta, type);
} }
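
Metadata eviction here is clamped subtraction: the overage is charged against each list in turn, never taking more than a list's evictable size, then recomputed before the ghost lists are trimmed. For instance:

    /*
     * Illustrative numbers for one pass of the logic above:
     *   arc_meta_used = 600 MB, arc_meta_limit = 500 MB -> adjustmnt = 100 MB
     *   MRU evictable metadata =  64 MB -> delta = MIN(64, 100) = 64 MB evicted
     *   adjustmnt             -=  64 MB -> 36 MB still over the limit
     *   MFU evictable metadata =  80 MB -> delta = MIN(80, 36)  = 36 MB evicted
     * The ghost lists are then trimmed against a freshly recomputed overage.
     */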
@ -3688,8 +3697,8 @@ arc_adjust_meta_only(void)
* evict some from the MRU here, and some from the MFU below. * evict some from the MRU here, and some from the MFU below.
*/ */
target = MIN((int64_t)(arc_meta_used - arc_meta_limit), target = MIN((int64_t)(arc_meta_used - arc_meta_limit),
(int64_t)(refcount_count(&arc_anon->arcs_size) + (int64_t)(zfs_refcount_count(&arc_anon->arcs_size) +
refcount_count(&arc_mru->arcs_size) - arc_p)); zfs_refcount_count(&arc_mru->arcs_size) - arc_p));
total_evicted += arc_adjust_impl(arc_mru, 0, target, ARC_BUFC_METADATA); total_evicted += arc_adjust_impl(arc_mru, 0, target, ARC_BUFC_METADATA);
@ -3699,7 +3708,8 @@ arc_adjust_meta_only(void)
* space allotted to the MFU (which is defined as arc_c - arc_p). * space allotted to the MFU (which is defined as arc_c - arc_p).
*/ */
target = MIN((int64_t)(arc_meta_used - arc_meta_limit), target = MIN((int64_t)(arc_meta_used - arc_meta_limit),
(int64_t)(refcount_count(&arc_mfu->arcs_size) - (arc_c - arc_p))); (int64_t)(zfs_refcount_count(&arc_mfu->arcs_size) - (arc_c - arc_p)));
total_evicted += arc_adjust_impl(arc_mfu, 0, target, ARC_BUFC_METADATA); total_evicted += arc_adjust_impl(arc_mfu, 0, target, ARC_BUFC_METADATA);
@ -3817,8 +3827,8 @@ arc_adjust(void)
* arc_p here, and then evict more from the MFU below. * arc_p here, and then evict more from the MFU below.
*/ */
target = MIN((int64_t)(arc_size - arc_c), target = MIN((int64_t)(arc_size - arc_c),
(int64_t)(refcount_count(&arc_anon->arcs_size) + (int64_t)(zfs_refcount_count(&arc_anon->arcs_size) +
refcount_count(&arc_mru->arcs_size) + arc_meta_used - arc_p)); zfs_refcount_count(&arc_mru->arcs_size) + arc_meta_used - arc_p));
/* /*
* If we're below arc_meta_min, always prefer to evict data. * If we're below arc_meta_min, always prefer to evict data.
@ -3902,8 +3912,8 @@ arc_adjust(void)
* cache. The following logic enforces these limits on the ghost * cache. The following logic enforces these limits on the ghost
* caches, and evicts from them as needed. * caches, and evicts from them as needed.
*/ */
target = refcount_count(&arc_mru->arcs_size) + target = zfs_refcount_count(&arc_mru->arcs_size) +
refcount_count(&arc_mru_ghost->arcs_size) - arc_c; zfs_refcount_count(&arc_mru_ghost->arcs_size) - arc_c;
bytes = arc_adjust_impl(arc_mru_ghost, 0, target, ARC_BUFC_DATA); bytes = arc_adjust_impl(arc_mru_ghost, 0, target, ARC_BUFC_DATA);
total_evicted += bytes; total_evicted += bytes;
@ -3921,8 +3931,8 @@ arc_adjust(void)
* mru + mfu + mru ghost + mfu ghost <= 2 * arc_c * mru + mfu + mru ghost + mfu ghost <= 2 * arc_c
* mru ghost + mfu ghost <= arc_c * mru ghost + mfu ghost <= arc_c
*/ */
target = refcount_count(&arc_mru_ghost->arcs_size) + target = zfs_refcount_count(&arc_mru_ghost->arcs_size) +
refcount_count(&arc_mfu_ghost->arcs_size) - arc_c; zfs_refcount_count(&arc_mfu_ghost->arcs_size) - arc_c;
bytes = arc_adjust_impl(arc_mfu_ghost, 0, target, ARC_BUFC_DATA); bytes = arc_adjust_impl(arc_mfu_ghost, 0, target, ARC_BUFC_DATA);
total_evicted += bytes; total_evicted += bytes;
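
These two targets enforce the ARC's ghost-list invariants quoted in the comment above: each resident list plus its ghost may not exceed arc_c, so mru + mfu + both ghosts stays within 2 * arc_c. Illustratively:

    uint64_t arc_c = 4096;                      /* MB, made-up numbers */
    uint64_t mru = 3072, mru_ghost = 1536;      /* resident and ghost sizes */
    int64_t target = (int64_t)(mru + mru_ghost) - (int64_t)arc_c;  /* 512 MB */
    /* 512 MB of ghost headers are evicted to restore mru + mru_ghost <= arc_c */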
@ -3994,9 +4004,9 @@ arc_all_memory(void)
{ {
#ifdef _KERNEL #ifdef _KERNEL
#ifdef CONFIG_HIGHMEM #ifdef CONFIG_HIGHMEM
return (ptob(totalram_pages - totalhigh_pages)); return (ptob(zfs_totalram_pages - totalhigh_pages));
#else #else
return (ptob(totalram_pages)); return (ptob(zfs_totalram_pages));
#endif /* CONFIG_HIGHMEM */ #endif /* CONFIG_HIGHMEM */
#else #else
return (ptob(physmem) / 2); return (ptob(physmem) / 2);
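
zfs_totalram_pages is the compat wrapper for the 5.0 kernel change that turned totalram_pages into an accessor function (see the "Linux 5.0 compat: Use totalram_pages()" commit in this series). A plausible shape for the shim, assuming a configure test named HAVE_TOTALRAM_PAGES_FUNC (the macro name is an assumption, not the verbatim header):

    /* Hypothetical compat sketch; the real header is generated by configure. */
    #if defined(HAVE_TOTALRAM_PAGES_FUNC)
    #define zfs_totalram_pages      totalram_pages()        /* 5.0+: function */
    #else
    #define zfs_totalram_pages      totalram_pages          /* older: variable */
    #endif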
@ -4422,10 +4432,10 @@ static uint64_t
arc_evictable_memory(void) arc_evictable_memory(void)
{ {
uint64_t arc_clean = uint64_t arc_clean =
refcount_count(&arc_mru->arcs_esize[ARC_BUFC_DATA]) + zfs_refcount_count(&arc_mru->arcs_esize[ARC_BUFC_DATA]) +
refcount_count(&arc_mru->arcs_esize[ARC_BUFC_METADATA]) + zfs_refcount_count(&arc_mru->arcs_esize[ARC_BUFC_METADATA]) +
refcount_count(&arc_mfu->arcs_esize[ARC_BUFC_DATA]) + zfs_refcount_count(&arc_mfu->arcs_esize[ARC_BUFC_DATA]) +
refcount_count(&arc_mfu->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_count(&arc_mfu->arcs_esize[ARC_BUFC_METADATA]);
uint64_t arc_dirty = MAX((int64_t)arc_size - (int64_t)arc_clean, 0); uint64_t arc_dirty = MAX((int64_t)arc_size - (int64_t)arc_clean, 0);
/* /*
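
arc_evictable_memory() answers the shrinker's question "how much can you give back right now": the immediately evictable bytes are the four esize counters summed above, and whatever remains of arc_size is treated as dirty. With illustrative numbers:

    uint64_t arc_size = 2048, arc_clean = 512;          /* MB, illustrative */
    int64_t over = (int64_t)arc_size - (int64_t)arc_clean;
    uint64_t arc_dirty = over > 0 ? (uint64_t)over : 0; /* MAX(..., 0) = 1536 */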
@ -4532,8 +4542,8 @@ arc_adapt(int bytes, arc_state_t *state)
{ {
int mult; int mult;
uint64_t arc_p_min = (arc_c >> arc_p_min_shift); uint64_t arc_p_min = (arc_c >> arc_p_min_shift);
int64_t mrug_size = refcount_count(&arc_mru_ghost->arcs_size); int64_t mrug_size = zfs_refcount_count(&arc_mru_ghost->arcs_size);
int64_t mfug_size = refcount_count(&arc_mfu_ghost->arcs_size); int64_t mfug_size = zfs_refcount_count(&arc_mfu_ghost->arcs_size);
if (state == arc_l2c_only) if (state == arc_l2c_only)
return; return;
@ -4698,7 +4708,7 @@ arc_get_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag)
*/ */
if (!GHOST_STATE(state)) { if (!GHOST_STATE(state)) {
(void) refcount_add_many(&state->arcs_size, size, tag); (void) zfs_refcount_add_many(&state->arcs_size, size, tag);
/* /*
* If this is reached via arc_read, the link is * If this is reached via arc_read, the link is
@ -4710,8 +4720,8 @@ arc_get_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag)
* trying to [add|remove]_reference it. * trying to [add|remove]_reference it.
*/ */
if (multilist_link_active(&hdr->b_l1hdr.b_arc_node)) { if (multilist_link_active(&hdr->b_l1hdr.b_arc_node)) {
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
(void) refcount_add_many(&state->arcs_esize[type], (void) zfs_refcount_add_many(&state->arcs_esize[type],
size, tag); size, tag);
} }
@ -4720,8 +4730,8 @@ arc_get_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag)
* data, and we have outgrown arc_p, update arc_p * data, and we have outgrown arc_p, update arc_p
*/ */
if (arc_size < arc_c && hdr->b_l1hdr.b_state == arc_anon && if (arc_size < arc_c && hdr->b_l1hdr.b_state == arc_anon &&
(refcount_count(&arc_anon->arcs_size) + (zfs_refcount_count(&arc_anon->arcs_size) +
refcount_count(&arc_mru->arcs_size) > arc_p)) zfs_refcount_count(&arc_mru->arcs_size) > arc_p))
arc_p = MIN(arc_c, arc_p + size); arc_p = MIN(arc_c, arc_p + size);
} }
} }
@ -4758,13 +4768,13 @@ arc_free_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag)
/* protected by hash lock, if in the hash table */ /* protected by hash lock, if in the hash table */
if (multilist_link_active(&hdr->b_l1hdr.b_arc_node)) { if (multilist_link_active(&hdr->b_l1hdr.b_arc_node)) {
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
ASSERT(state != arc_anon && state != arc_l2c_only); ASSERT(state != arc_anon && state != arc_l2c_only);
(void) refcount_remove_many(&state->arcs_esize[type], (void) zfs_refcount_remove_many(&state->arcs_esize[type],
size, tag); size, tag);
} }
(void) refcount_remove_many(&state->arcs_size, size, tag); (void) zfs_refcount_remove_many(&state->arcs_size, size, tag);
VERIFY3U(hdr->b_type, ==, type); VERIFY3U(hdr->b_type, ==, type);
if (type == ARC_BUFC_METADATA) { if (type == ARC_BUFC_METADATA) {
@ -4811,7 +4821,7 @@ arc_access(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
* another prefetch (to make it less likely to be evicted). * another prefetch (to make it less likely to be evicted).
*/ */
if (HDR_PREFETCH(hdr)) { if (HDR_PREFETCH(hdr)) {
if (refcount_count(&hdr->b_l1hdr.b_refcnt) == 0) { if (zfs_refcount_count(&hdr->b_l1hdr.b_refcnt) == 0) {
/* link protected by hash lock */ /* link protected by hash lock */
ASSERT(multilist_link_active( ASSERT(multilist_link_active(
&hdr->b_l1hdr.b_arc_node)); &hdr->b_l1hdr.b_arc_node));
@ -4852,7 +4862,7 @@ arc_access(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
if (HDR_PREFETCH(hdr)) { if (HDR_PREFETCH(hdr)) {
new_state = arc_mru; new_state = arc_mru;
if (refcount_count(&hdr->b_l1hdr.b_refcnt) > 0) if (zfs_refcount_count(&hdr->b_l1hdr.b_refcnt) > 0)
arc_hdr_clear_flags(hdr, ARC_FLAG_PREFETCH); arc_hdr_clear_flags(hdr, ARC_FLAG_PREFETCH);
DTRACE_PROBE1(new_state__mru, arc_buf_hdr_t *, hdr); DTRACE_PROBE1(new_state__mru, arc_buf_hdr_t *, hdr);
} else { } else {
@ -4876,7 +4886,7 @@ arc_access(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
* the head of the list now. * the head of the list now.
*/ */
if ((HDR_PREFETCH(hdr)) != 0) { if ((HDR_PREFETCH(hdr)) != 0) {
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
/* link protected by hash_lock */ /* link protected by hash_lock */
ASSERT(multilist_link_active(&hdr->b_l1hdr.b_arc_node)); ASSERT(multilist_link_active(&hdr->b_l1hdr.b_arc_node));
} }
@ -4896,7 +4906,7 @@ arc_access(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
* This is a prefetch access... * This is a prefetch access...
* move this block back to the MRU state. * move this block back to the MRU state.
*/ */
ASSERT0(refcount_count(&hdr->b_l1hdr.b_refcnt)); ASSERT0(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt));
new_state = arc_mru; new_state = arc_mru;
} }
@ -5098,7 +5108,7 @@ arc_read_done(zio_t *zio)
ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
} }
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt) || ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt) ||
callback_list != NULL); callback_list != NULL);
if (no_zio_error) { if (no_zio_error) {
@ -5109,7 +5119,7 @@ arc_read_done(zio_t *zio)
arc_change_state(arc_anon, hdr, hash_lock); arc_change_state(arc_anon, hdr, hash_lock);
if (HDR_IN_HASH_TABLE(hdr)) if (HDR_IN_HASH_TABLE(hdr))
buf_hash_remove(hdr); buf_hash_remove(hdr);
freeable = refcount_is_zero(&hdr->b_l1hdr.b_refcnt); freeable = zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt);
} }
/* /*
@ -5129,7 +5139,7 @@ arc_read_done(zio_t *zio)
* in the cache). * in the cache).
*/ */
ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon); ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
freeable = refcount_is_zero(&hdr->b_l1hdr.b_refcnt); freeable = zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt);
} }
/* execute each callback and free its structure */ /* execute each callback and free its structure */
@ -5282,7 +5292,7 @@ top:
VERIFY0(arc_buf_alloc_impl(hdr, private, VERIFY0(arc_buf_alloc_impl(hdr, private,
compressed_read, B_TRUE, &buf)); compressed_read, B_TRUE, &buf));
} else if (*arc_flags & ARC_FLAG_PREFETCH && } else if (*arc_flags & ARC_FLAG_PREFETCH &&
refcount_count(&hdr->b_l1hdr.b_refcnt) == 0) { zfs_refcount_count(&hdr->b_l1hdr.b_refcnt) == 0) {
arc_hdr_set_flags(hdr, ARC_FLAG_PREFETCH); arc_hdr_set_flags(hdr, ARC_FLAG_PREFETCH);
} }
DTRACE_PROBE1(arc__hit, arc_buf_hdr_t *, hdr); DTRACE_PROBE1(arc__hit, arc_buf_hdr_t *, hdr);
@ -5348,7 +5358,7 @@ top:
ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
ASSERT(GHOST_STATE(hdr->b_l1hdr.b_state)); ASSERT(GHOST_STATE(hdr->b_l1hdr.b_state));
ASSERT(!HDR_IO_IN_PROGRESS(hdr)); ASSERT(!HDR_IO_IN_PROGRESS(hdr));
ASSERT(refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
ASSERT3P(hdr->b_l1hdr.b_freeze_cksum, ==, NULL); ASSERT3P(hdr->b_l1hdr.b_freeze_cksum, ==, NULL);
@ -5546,10 +5556,10 @@ arc_add_prune_callback(arc_prune_func_t *func, void *private)
p->p_pfunc = func; p->p_pfunc = func;
p->p_private = private; p->p_private = private;
list_link_init(&p->p_node); list_link_init(&p->p_node);
refcount_create(&p->p_refcnt); zfs_refcount_create(&p->p_refcnt);
mutex_enter(&arc_prune_mtx); mutex_enter(&arc_prune_mtx);
refcount_add(&p->p_refcnt, &arc_prune_list); zfs_refcount_add(&p->p_refcnt, &arc_prune_list);
list_insert_head(&arc_prune_list, p); list_insert_head(&arc_prune_list, p);
mutex_exit(&arc_prune_mtx); mutex_exit(&arc_prune_mtx);
@ -5562,15 +5572,15 @@ arc_remove_prune_callback(arc_prune_t *p)
boolean_t wait = B_FALSE; boolean_t wait = B_FALSE;
mutex_enter(&arc_prune_mtx); mutex_enter(&arc_prune_mtx);
list_remove(&arc_prune_list, p); list_remove(&arc_prune_list, p);
if (refcount_remove(&p->p_refcnt, &arc_prune_list) > 0) if (zfs_refcount_remove(&p->p_refcnt, &arc_prune_list) > 0)
wait = B_TRUE; wait = B_TRUE;
mutex_exit(&arc_prune_mtx); mutex_exit(&arc_prune_mtx);
/* wait for arc_prune_task to finish */ /* wait for arc_prune_task to finish */
if (wait) if (wait)
taskq_wait_outstanding(arc_prune_taskq, 0); taskq_wait_outstanding(arc_prune_taskq, 0);
ASSERT0(refcount_count(&p->p_refcnt)); ASSERT0(zfs_refcount_count(&p->p_refcnt));
refcount_destroy(&p->p_refcnt); zfs_refcount_destroy(&p->p_refcnt);
kmem_free(p, sizeof (*p)); kmem_free(p, sizeof (*p));
} }
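
Teardown is the mirror image of the dispatch guard: dropping the list's hold while the count stays above zero means arc_prune_task() still holds its own reference, so the caller drains the taskq before destroying the refcount. The same drop-then-wait shape, with drain_pending_tasks() standing in for taskq_wait_outstanding():

    extern void drain_pending_tasks(void);      /* hypothetical stand-in */

    static void
    prune_remove(zfs_refcount_t *p_refcnt, const void *list_tag)
    {
        /* Drop the list's hold; nonzero result: a task is still running. */
        if (zfs_refcount_remove(p_refcnt, list_tag) > 0)
            drain_pending_tasks();
        assert(zfs_refcount_is_zero(p_refcnt));
        zfs_refcount_destroy(p_refcnt);
    }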
@ -5613,7 +5623,7 @@ arc_freed(spa_t *spa, const blkptr_t *bp)
* this hdr, then we don't destroy the hdr. * this hdr, then we don't destroy the hdr.
*/ */
if (!HDR_HAS_L1HDR(hdr) || (!HDR_IO_IN_PROGRESS(hdr) && if (!HDR_HAS_L1HDR(hdr) || (!HDR_IO_IN_PROGRESS(hdr) &&
refcount_is_zero(&hdr->b_l1hdr.b_refcnt))) { zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt))) {
arc_change_state(arc_anon, hdr, hash_lock); arc_change_state(arc_anon, hdr, hash_lock);
arc_hdr_destroy(hdr); arc_hdr_destroy(hdr);
mutex_exit(hash_lock); mutex_exit(hash_lock);
@ -5659,7 +5669,7 @@ arc_release(arc_buf_t *buf, void *tag)
ASSERT(HDR_EMPTY(hdr)); ASSERT(HDR_EMPTY(hdr));
ASSERT3U(hdr->b_l1hdr.b_bufcnt, ==, 1); ASSERT3U(hdr->b_l1hdr.b_bufcnt, ==, 1);
ASSERT3S(refcount_count(&hdr->b_l1hdr.b_refcnt), ==, 1); ASSERT3S(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt), ==, 1);
ASSERT(!list_link_active(&hdr->b_l1hdr.b_arc_node)); ASSERT(!list_link_active(&hdr->b_l1hdr.b_arc_node));
hdr->b_l1hdr.b_arc_access = 0; hdr->b_l1hdr.b_arc_access = 0;
@ -5687,7 +5697,7 @@ arc_release(arc_buf_t *buf, void *tag)
ASSERT3P(state, !=, arc_anon); ASSERT3P(state, !=, arc_anon);
/* this buffer is not on any list */ /* this buffer is not on any list */
ASSERT3S(refcount_count(&hdr->b_l1hdr.b_refcnt), >, 0); ASSERT3S(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt), >, 0);
if (HDR_HAS_L2HDR(hdr)) { if (HDR_HAS_L2HDR(hdr)) {
mutex_enter(&hdr->b_l2hdr.b_dev->l2ad_mtx); mutex_enter(&hdr->b_l2hdr.b_dev->l2ad_mtx);
@ -5778,12 +5788,13 @@ arc_release(arc_buf_t *buf, void *tag)
ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL); ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
ASSERT3P(state, !=, arc_l2c_only); ASSERT3P(state, !=, arc_l2c_only);
(void) refcount_remove_many(&state->arcs_size, (void) zfs_refcount_remove_many(&state->arcs_size,
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
if (refcount_is_zero(&hdr->b_l1hdr.b_refcnt)) { if (zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt)) {
ASSERT3P(state, !=, arc_l2c_only); ASSERT3P(state, !=, arc_l2c_only);
(void) refcount_remove_many(&state->arcs_esize[type], (void) zfs_refcount_remove_many(&state->arcs_esize[type],
arc_buf_size(buf), buf); arc_buf_size(buf), buf);
} }
@ -5804,7 +5815,7 @@ arc_release(arc_buf_t *buf, void *tag)
nhdr = arc_hdr_alloc(spa, psize, lsize, compress, type); nhdr = arc_hdr_alloc(spa, psize, lsize, compress, type);
ASSERT3P(nhdr->b_l1hdr.b_buf, ==, NULL); ASSERT3P(nhdr->b_l1hdr.b_buf, ==, NULL);
ASSERT0(nhdr->b_l1hdr.b_bufcnt); ASSERT0(nhdr->b_l1hdr.b_bufcnt);
ASSERT0(refcount_count(&nhdr->b_l1hdr.b_refcnt)); ASSERT0(zfs_refcount_count(&nhdr->b_l1hdr.b_refcnt));
VERIFY3U(nhdr->b_type, ==, type); VERIFY3U(nhdr->b_type, ==, type);
ASSERT(!HDR_SHARED_DATA(nhdr)); ASSERT(!HDR_SHARED_DATA(nhdr));
@ -5815,15 +5826,15 @@ arc_release(arc_buf_t *buf, void *tag)
nhdr->b_l1hdr.b_mfu_hits = 0; nhdr->b_l1hdr.b_mfu_hits = 0;
nhdr->b_l1hdr.b_mfu_ghost_hits = 0; nhdr->b_l1hdr.b_mfu_ghost_hits = 0;
nhdr->b_l1hdr.b_l2_hits = 0; nhdr->b_l1hdr.b_l2_hits = 0;
(void) refcount_add(&nhdr->b_l1hdr.b_refcnt, tag); (void) zfs_refcount_add(&nhdr->b_l1hdr.b_refcnt, tag);
buf->b_hdr = nhdr; buf->b_hdr = nhdr;
mutex_exit(&buf->b_evict_lock); mutex_exit(&buf->b_evict_lock);
(void) refcount_add_many(&arc_anon->arcs_size, (void) zfs_refcount_add_many(&arc_anon->arcs_size,
HDR_GET_LSIZE(nhdr), buf); arc_buf_size(buf), buf);
} else { } else {
mutex_exit(&buf->b_evict_lock); mutex_exit(&buf->b_evict_lock);
ASSERT(refcount_count(&hdr->b_l1hdr.b_refcnt) == 1); ASSERT(zfs_refcount_count(&hdr->b_l1hdr.b_refcnt) == 1);
/* protected by hash lock, or hdr is on arc_anon */ /* protected by hash lock, or hdr is on arc_anon */
ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node)); ASSERT(!multilist_link_active(&hdr->b_l1hdr.b_arc_node));
ASSERT(!HDR_IO_IN_PROGRESS(hdr)); ASSERT(!HDR_IO_IN_PROGRESS(hdr));
@ -5860,7 +5871,7 @@ arc_referenced(arc_buf_t *buf)
int referenced; int referenced;
mutex_enter(&buf->b_evict_lock); mutex_enter(&buf->b_evict_lock);
referenced = (refcount_count(&buf->b_hdr->b_l1hdr.b_refcnt)); referenced = (zfs_refcount_count(&buf->b_hdr->b_l1hdr.b_refcnt));
mutex_exit(&buf->b_evict_lock); mutex_exit(&buf->b_evict_lock);
return (referenced); return (referenced);
} }
@ -5877,7 +5888,7 @@ arc_write_ready(zio_t *zio)
fstrans_cookie_t cookie = spl_fstrans_mark(); fstrans_cookie_t cookie = spl_fstrans_mark();
ASSERT(HDR_HAS_L1HDR(hdr)); ASSERT(HDR_HAS_L1HDR(hdr));
ASSERT(!refcount_is_zero(&buf->b_hdr->b_l1hdr.b_refcnt)); ASSERT(!zfs_refcount_is_zero(&buf->b_hdr->b_l1hdr.b_refcnt));
ASSERT(hdr->b_l1hdr.b_bufcnt > 0); ASSERT(hdr->b_l1hdr.b_bufcnt > 0);
/* /*
@ -6029,7 +6040,7 @@ arc_write_done(zio_t *zio)
if (!BP_EQUAL(&zio->io_bp_orig, zio->io_bp)) if (!BP_EQUAL(&zio->io_bp_orig, zio->io_bp))
panic("bad overwrite, hdr=%p exists=%p", panic("bad overwrite, hdr=%p exists=%p",
(void *)hdr, (void *)exists); (void *)hdr, (void *)exists);
ASSERT(refcount_is_zero( ASSERT(zfs_refcount_is_zero(
&exists->b_l1hdr.b_refcnt)); &exists->b_l1hdr.b_refcnt));
arc_change_state(arc_anon, exists, hash_lock); arc_change_state(arc_anon, exists, hash_lock);
mutex_exit(hash_lock); mutex_exit(hash_lock);
@ -6059,7 +6070,7 @@ arc_write_done(zio_t *zio)
arc_hdr_clear_flags(hdr, ARC_FLAG_IO_IN_PROGRESS); arc_hdr_clear_flags(hdr, ARC_FLAG_IO_IN_PROGRESS);
} }
ASSERT(!refcount_is_zero(&hdr->b_l1hdr.b_refcnt)); ASSERT(!zfs_refcount_is_zero(&hdr->b_l1hdr.b_refcnt));
callback->awcb_done(zio, buf, callback->awcb_private); callback->awcb_done(zio, buf, callback->awcb_private);
abd_put(zio->io_abd); abd_put(zio->io_abd);
@ -6222,7 +6233,7 @@ arc_tempreserve_space(uint64_t reserve, uint64_t txg)
/* assert that it has not wrapped around */ /* assert that it has not wrapped around */
ASSERT3S(atomic_add_64_nv(&arc_loaned_bytes, 0), >=, 0); ASSERT3S(atomic_add_64_nv(&arc_loaned_bytes, 0), >=, 0);
anon_size = MAX((int64_t)(refcount_count(&arc_anon->arcs_size) - anon_size = MAX((int64_t)(zfs_refcount_count(&arc_anon->arcs_size) -
arc_loaned_bytes), 0); arc_loaned_bytes), 0);
/* /*
@ -6245,9 +6256,10 @@ arc_tempreserve_space(uint64_t reserve, uint64_t txg)
if (reserve + arc_tempreserve + anon_size > arc_c / 2 && if (reserve + arc_tempreserve + anon_size > arc_c / 2 &&
anon_size > arc_c / 4) { anon_size > arc_c / 4) {
uint64_t meta_esize = uint64_t meta_esize =
refcount_count(&arc_anon->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_count(&arc_anon->arcs_esize[ARC_BUFC_METADATA]);
uint64_t data_esize = uint64_t data_esize =
refcount_count(&arc_anon->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_count(&arc_anon->arcs_esize[ARC_BUFC_DATA]);
dprintf("failing, arc_tempreserve=%lluK anon_meta=%lluK " dprintf("failing, arc_tempreserve=%lluK anon_meta=%lluK "
"anon_data=%lluK tempreserve=%lluK arc_c=%lluK\n", "anon_data=%lluK tempreserve=%lluK arc_c=%lluK\n",
arc_tempreserve >> 10, meta_esize >> 10, arc_tempreserve >> 10, meta_esize >> 10,
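
The throttle refuses a reservation once in-flight anonymous data would dominate the cache: the sum reserve + arc_tempreserve + anon_size must stay within half of arc_c unless anon_size is itself below a quarter of arc_c; otherwise the caller backs off and retries in a later txg. With illustrative numbers the failing case looks like:

    uint64_t arc_c = 4096;                              /* MB, illustrative */
    uint64_t anon_size = 1228, arc_tempreserve = 512, reserve = 410;
    int throttled = (reserve + arc_tempreserve + anon_size > arc_c / 2 &&
        anon_size > arc_c / 4);         /* 2150 > 2048 && 1228 > 1024 -> 1 */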
@ -6263,11 +6275,11 @@ static void
arc_kstat_update_state(arc_state_t *state, kstat_named_t *size, arc_kstat_update_state(arc_state_t *state, kstat_named_t *size,
kstat_named_t *evict_data, kstat_named_t *evict_metadata) kstat_named_t *evict_data, kstat_named_t *evict_metadata)
{ {
size->value.ui64 = refcount_count(&state->arcs_size); size->value.ui64 = zfs_refcount_count(&state->arcs_size);
evict_data->value.ui64 = evict_data->value.ui64 =
refcount_count(&state->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_count(&state->arcs_esize[ARC_BUFC_DATA]);
evict_metadata->value.ui64 = evict_metadata->value.ui64 =
refcount_count(&state->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_count(&state->arcs_esize[ARC_BUFC_METADATA]);
} }
static int static int
@ -6484,25 +6496,25 @@ arc_state_init(void)
offsetof(arc_buf_hdr_t, b_l1hdr.b_arc_node), offsetof(arc_buf_hdr_t, b_l1hdr.b_arc_node),
arc_state_multilist_index_func); arc_state_multilist_index_func);
refcount_create(&arc_anon->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_create(&arc_anon->arcs_esize[ARC_BUFC_METADATA]);
refcount_create(&arc_anon->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_create(&arc_anon->arcs_esize[ARC_BUFC_DATA]);
refcount_create(&arc_mru->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_create(&arc_mru->arcs_esize[ARC_BUFC_METADATA]);
refcount_create(&arc_mru->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_create(&arc_mru->arcs_esize[ARC_BUFC_DATA]);
refcount_create(&arc_mru_ghost->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_create(&arc_mru_ghost->arcs_esize[ARC_BUFC_METADATA]);
refcount_create(&arc_mru_ghost->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_create(&arc_mru_ghost->arcs_esize[ARC_BUFC_DATA]);
refcount_create(&arc_mfu->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_create(&arc_mfu->arcs_esize[ARC_BUFC_METADATA]);
refcount_create(&arc_mfu->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_create(&arc_mfu->arcs_esize[ARC_BUFC_DATA]);
refcount_create(&arc_mfu_ghost->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_create(&arc_mfu_ghost->arcs_esize[ARC_BUFC_METADATA]);
refcount_create(&arc_mfu_ghost->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_create(&arc_mfu_ghost->arcs_esize[ARC_BUFC_DATA]);
refcount_create(&arc_l2c_only->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_create(&arc_l2c_only->arcs_esize[ARC_BUFC_METADATA]);
refcount_create(&arc_l2c_only->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_create(&arc_l2c_only->arcs_esize[ARC_BUFC_DATA]);
refcount_create(&arc_anon->arcs_size); zfs_refcount_create(&arc_anon->arcs_size);
refcount_create(&arc_mru->arcs_size); zfs_refcount_create(&arc_mru->arcs_size);
refcount_create(&arc_mru_ghost->arcs_size); zfs_refcount_create(&arc_mru_ghost->arcs_size);
refcount_create(&arc_mfu->arcs_size); zfs_refcount_create(&arc_mfu->arcs_size);
refcount_create(&arc_mfu_ghost->arcs_size); zfs_refcount_create(&arc_mfu_ghost->arcs_size);
refcount_create(&arc_l2c_only->arcs_size); zfs_refcount_create(&arc_l2c_only->arcs_size);
arc_anon->arcs_state = ARC_STATE_ANON; arc_anon->arcs_state = ARC_STATE_ANON;
arc_mru->arcs_state = ARC_STATE_MRU; arc_mru->arcs_state = ARC_STATE_MRU;
@ -6515,25 +6527,25 @@ arc_state_init(void)
static void static void
arc_state_fini(void) arc_state_fini(void)
{ {
refcount_destroy(&arc_anon->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_destroy(&arc_anon->arcs_esize[ARC_BUFC_METADATA]);
refcount_destroy(&arc_anon->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_destroy(&arc_anon->arcs_esize[ARC_BUFC_DATA]);
refcount_destroy(&arc_mru->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_destroy(&arc_mru->arcs_esize[ARC_BUFC_METADATA]);
refcount_destroy(&arc_mru->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_destroy(&arc_mru->arcs_esize[ARC_BUFC_DATA]);
refcount_destroy(&arc_mru_ghost->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_destroy(&arc_mru_ghost->arcs_esize[ARC_BUFC_METADATA]);
refcount_destroy(&arc_mru_ghost->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_destroy(&arc_mru_ghost->arcs_esize[ARC_BUFC_DATA]);
refcount_destroy(&arc_mfu->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_destroy(&arc_mfu->arcs_esize[ARC_BUFC_METADATA]);
refcount_destroy(&arc_mfu->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_destroy(&arc_mfu->arcs_esize[ARC_BUFC_DATA]);
refcount_destroy(&arc_mfu_ghost->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_destroy(&arc_mfu_ghost->arcs_esize[ARC_BUFC_METADATA]);
refcount_destroy(&arc_mfu_ghost->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_destroy(&arc_mfu_ghost->arcs_esize[ARC_BUFC_DATA]);
refcount_destroy(&arc_l2c_only->arcs_esize[ARC_BUFC_METADATA]); zfs_refcount_destroy(&arc_l2c_only->arcs_esize[ARC_BUFC_METADATA]);
refcount_destroy(&arc_l2c_only->arcs_esize[ARC_BUFC_DATA]); zfs_refcount_destroy(&arc_l2c_only->arcs_esize[ARC_BUFC_DATA]);
refcount_destroy(&arc_anon->arcs_size); zfs_refcount_destroy(&arc_anon->arcs_size);
refcount_destroy(&arc_mru->arcs_size); zfs_refcount_destroy(&arc_mru->arcs_size);
refcount_destroy(&arc_mru_ghost->arcs_size); zfs_refcount_destroy(&arc_mru_ghost->arcs_size);
refcount_destroy(&arc_mfu->arcs_size); zfs_refcount_destroy(&arc_mfu->arcs_size);
refcount_destroy(&arc_mfu_ghost->arcs_size); zfs_refcount_destroy(&arc_mfu_ghost->arcs_size);
refcount_destroy(&arc_l2c_only->arcs_size); zfs_refcount_destroy(&arc_l2c_only->arcs_size);
multilist_destroy(arc_mru->arcs_list[ARC_BUFC_METADATA]); multilist_destroy(arc_mru->arcs_list[ARC_BUFC_METADATA]);
multilist_destroy(arc_mru_ghost->arcs_list[ARC_BUFC_METADATA]); multilist_destroy(arc_mru_ghost->arcs_list[ARC_BUFC_METADATA]);
@ -6704,8 +6716,8 @@ arc_fini(void)
mutex_enter(&arc_prune_mtx); mutex_enter(&arc_prune_mtx);
while ((p = list_head(&arc_prune_list)) != NULL) { while ((p = list_head(&arc_prune_list)) != NULL) {
list_remove(&arc_prune_list, p); list_remove(&arc_prune_list, p);
refcount_remove(&p->p_refcnt, &arc_prune_list); zfs_refcount_remove(&p->p_refcnt, &arc_prune_list);
refcount_destroy(&p->p_refcnt); zfs_refcount_destroy(&p->p_refcnt);
kmem_free(p, sizeof (*p)); kmem_free(p, sizeof (*p));
} }
mutex_exit(&arc_prune_mtx); mutex_exit(&arc_prune_mtx);
@ -7108,7 +7120,7 @@ top:
ARCSTAT_INCR(arcstat_l2_lsize, -HDR_GET_LSIZE(hdr)); ARCSTAT_INCR(arcstat_l2_lsize, -HDR_GET_LSIZE(hdr));
bytes_dropped += arc_hdr_size(hdr); bytes_dropped += arc_hdr_size(hdr);
(void) refcount_remove_many(&dev->l2ad_alloc, (void) zfs_refcount_remove_many(&dev->l2ad_alloc,
arc_hdr_size(hdr), hdr); arc_hdr_size(hdr), hdr);
} }
@ -7527,7 +7539,8 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
list_insert_head(&dev->l2ad_buflist, hdr); list_insert_head(&dev->l2ad_buflist, hdr);
mutex_exit(&dev->l2ad_mtx); mutex_exit(&dev->l2ad_mtx);
(void) refcount_add_many(&dev->l2ad_alloc, psize, hdr); (void) zfs_refcount_add_many(&dev->l2ad_alloc, psize, hdr);
/* /*
* Normally the L2ARC can use the hdr's data, but if * Normally the L2ARC can use the hdr's data, but if
@ -7762,7 +7775,7 @@ l2arc_add_vdev(spa_t *spa, vdev_t *vd)
offsetof(arc_buf_hdr_t, b_l2hdr.b_l2node)); offsetof(arc_buf_hdr_t, b_l2hdr.b_l2node));
vdev_space_update(vd, 0, 0, adddev->l2ad_end - adddev->l2ad_hand); vdev_space_update(vd, 0, 0, adddev->l2ad_end - adddev->l2ad_hand);
refcount_create(&adddev->l2ad_alloc); zfs_refcount_create(&adddev->l2ad_alloc);
/* /*
* Add device to global list * Add device to global list
@ -7808,7 +7821,7 @@ l2arc_remove_vdev(vdev_t *vd)
l2arc_evict(remdev, 0, B_TRUE); l2arc_evict(remdev, 0, B_TRUE);
list_destroy(&remdev->l2ad_buflist); list_destroy(&remdev->l2ad_buflist);
mutex_destroy(&remdev->l2ad_mtx); mutex_destroy(&remdev->l2ad_mtx);
refcount_destroy(&remdev->l2ad_alloc); zfs_refcount_destroy(&remdev->l2ad_alloc);
kmem_free(remdev, sizeof (l2arc_dev_t)); kmem_free(remdev, sizeof (l2arc_dev_t));
} }

module/zfs/dbuf.c

@ -72,8 +72,6 @@ static void __dbuf_hold_impl_init(struct dbuf_hold_impl_data *dh,
void *tag, dmu_buf_impl_t **dbp, int depth); void *tag, dmu_buf_impl_t **dbp, int depth);
static int __dbuf_hold_impl(struct dbuf_hold_impl_data *dh); static int __dbuf_hold_impl(struct dbuf_hold_impl_data *dh);
uint_t zfs_dbuf_evict_key;
static boolean_t dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx); static boolean_t dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
static void dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx); static void dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx);
@ -104,7 +102,7 @@ static boolean_t dbuf_evict_thread_exit;
* become eligible for arc eviction. * become eligible for arc eviction.
*/ */
static multilist_t *dbuf_cache; static multilist_t *dbuf_cache;
static refcount_t dbuf_cache_size; static zfs_refcount_t dbuf_cache_size;
unsigned long dbuf_cache_max_bytes = 100 * 1024 * 1024; unsigned long dbuf_cache_max_bytes = 100 * 1024 * 1024;
/* Cap the size of the dbuf cache to log2 fraction of arc size. */ /* Cap the size of the dbuf cache to log2 fraction of arc size. */
@ -165,7 +163,7 @@ dbuf_cons(void *vdb, void *unused, int kmflag)
mutex_init(&db->db_mtx, NULL, MUTEX_DEFAULT, NULL); mutex_init(&db->db_mtx, NULL, MUTEX_DEFAULT, NULL);
cv_init(&db->db_changed, NULL, CV_DEFAULT, NULL); cv_init(&db->db_changed, NULL, CV_DEFAULT, NULL);
multilist_link_init(&db->db_cache_link); multilist_link_init(&db->db_cache_link);
refcount_create(&db->db_holds); zfs_refcount_create(&db->db_holds);
multilist_link_init(&db->db_cache_link); multilist_link_init(&db->db_cache_link);
return (0); return (0);
@ -179,7 +177,7 @@ dbuf_dest(void *vdb, void *unused)
mutex_destroy(&db->db_mtx); mutex_destroy(&db->db_mtx);
cv_destroy(&db->db_changed); cv_destroy(&db->db_changed);
ASSERT(!multilist_link_active(&db->db_cache_link)); ASSERT(!multilist_link_active(&db->db_cache_link));
refcount_destroy(&db->db_holds); zfs_refcount_destroy(&db->db_holds);
} }
/* /*
@ -317,7 +315,7 @@ dbuf_hash_remove(dmu_buf_impl_t *db)
* We mustn't hold db_mtx to maintain lock ordering: * We mustn't hold db_mtx to maintain lock ordering:
* DBUF_HASH_MUTEX > db_mtx. * DBUF_HASH_MUTEX > db_mtx.
*/ */
ASSERT(refcount_is_zero(&db->db_holds)); ASSERT(zfs_refcount_is_zero(&db->db_holds));
ASSERT(db->db_state == DB_EVICTING); ASSERT(db->db_state == DB_EVICTING);
ASSERT(!MUTEX_HELD(&db->db_mtx)); ASSERT(!MUTEX_HELD(&db->db_mtx));
@ -354,7 +352,7 @@ dbuf_verify_user(dmu_buf_impl_t *db, dbvu_verify_type_t verify_type)
ASSERT(db->db.db_data != NULL); ASSERT(db->db.db_data != NULL);
ASSERT3U(db->db_state, ==, DB_CACHED); ASSERT3U(db->db_state, ==, DB_CACHED);
holds = refcount_count(&db->db_holds); holds = zfs_refcount_count(&db->db_holds);
if (verify_type == DBVU_EVICTING) { if (verify_type == DBVU_EVICTING) {
/* /*
* Immediate eviction occurs when holds == dirtycnt. * Immediate eviction occurs when holds == dirtycnt.
@ -478,7 +476,7 @@ dbuf_cache_above_hiwater(void)
uint64_t dbuf_cache_hiwater_bytes = uint64_t dbuf_cache_hiwater_bytes =
(dbuf_cache_target * dbuf_cache_hiwater_pct) / 100; (dbuf_cache_target * dbuf_cache_hiwater_pct) / 100;
return (refcount_count(&dbuf_cache_size) > return (zfs_refcount_count(&dbuf_cache_size) >
dbuf_cache_target + dbuf_cache_hiwater_bytes); dbuf_cache_target + dbuf_cache_hiwater_bytes);
} }
@ -490,7 +488,7 @@ dbuf_cache_above_lowater(void)
uint64_t dbuf_cache_lowater_bytes = uint64_t dbuf_cache_lowater_bytes =
(dbuf_cache_target * dbuf_cache_lowater_pct) / 100; (dbuf_cache_target * dbuf_cache_lowater_pct) / 100;
return (refcount_count(&dbuf_cache_size) > return (zfs_refcount_count(&dbuf_cache_size) >
dbuf_cache_target - dbuf_cache_lowater_bytes); dbuf_cache_target - dbuf_cache_lowater_bytes);
} }
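
The dbuf cache is kept between two watermarks around its target size: the dedicated eviction thread works while the cache is above the low watermark, and foreground threads only pitch in once it passes the high watermark. The percentage tunables are not shown in this hunk, so the values below are assumptions for illustration:

    uint64_t target = 100 * 1024 * 1024;        /* dbuf_cache_target_bytes() */
    uint64_t hiwater_pct = 10, lowater_pct = 10;        /* assumed tunables */
    uint64_t hiwater = target + (target * hiwater_pct) / 100;  /* 110 MiB */
    uint64_t lowater = target - (target * lowater_pct) / 100;  /*  90 MiB */
    /* evict aggressively above hiwater; stop evicting below lowater */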
@ -505,14 +503,6 @@ dbuf_evict_one(void)
dmu_buf_impl_t *db; dmu_buf_impl_t *db;
ASSERT(!MUTEX_HELD(&dbuf_evict_lock)); ASSERT(!MUTEX_HELD(&dbuf_evict_lock));
/*
* Set the thread's tsd to indicate that it's processing evictions.
* Once a thread stops evicting from the dbuf cache it will
* reset its tsd to NULL.
*/
ASSERT3P(tsd_get(zfs_dbuf_evict_key), ==, NULL);
(void) tsd_set(zfs_dbuf_evict_key, (void *)B_TRUE);
db = multilist_sublist_tail(mls); db = multilist_sublist_tail(mls);
while (db != NULL && mutex_tryenter(&db->db_mtx) == 0) { while (db != NULL && mutex_tryenter(&db->db_mtx) == 0) {
db = multilist_sublist_prev(mls, db); db = multilist_sublist_prev(mls, db);
@ -524,13 +514,12 @@ dbuf_evict_one(void)
if (db != NULL) { if (db != NULL) {
multilist_sublist_remove(mls, db); multilist_sublist_remove(mls, db);
multilist_sublist_unlock(mls); multilist_sublist_unlock(mls);
(void) refcount_remove_many(&dbuf_cache_size, (void) zfs_refcount_remove_many(&dbuf_cache_size,
db->db.db_size, db); db->db.db_size, db);
dbuf_destroy(db); dbuf_destroy(db);
} else { } else {
multilist_sublist_unlock(mls); multilist_sublist_unlock(mls);
} }
(void) tsd_set(zfs_dbuf_evict_key, NULL);
} }
/* /*
@ -583,35 +572,12 @@ dbuf_evict_thread(void)
static void static void
dbuf_evict_notify(void) dbuf_evict_notify(void)
{ {
/*
* We use thread specific data to track when a thread has
* started processing evictions. This allows us to avoid deeply
* nested stacks that would have a call flow similar to this:
*
* dbuf_rele()-->dbuf_rele_and_unlock()-->dbuf_evict_notify()
* ^ |
* | |
* +-----dbuf_destroy()<--dbuf_evict_one()<--------+
*
* The dbuf_eviction_thread will always have its tsd set until
* that thread exits. All other threads will only set their tsd
* if they are participating in the eviction process. This only
* happens if the eviction thread is unable to process evictions
* fast enough. To keep the dbuf cache size in check, other threads
* can evict from the dbuf cache directly. Those threads will set
* their tsd values so that we ensure that they only evict one dbuf
* from the dbuf cache.
*/
if (tsd_get(zfs_dbuf_evict_key) != NULL)
return;
/* /*
* We check if we should evict without holding the dbuf_evict_lock, * We check if we should evict without holding the dbuf_evict_lock,
* because it's OK to occasionally make the wrong decision here, * because it's OK to occasionally make the wrong decision here,
* and grabbing the lock results in massive lock contention. * and grabbing the lock results in massive lock contention.
*/ */
if (refcount_count(&dbuf_cache_size) > dbuf_cache_target_bytes()) { if (zfs_refcount_count(&dbuf_cache_size) > dbuf_cache_target_bytes()) {
if (dbuf_cache_above_hiwater()) if (dbuf_cache_above_hiwater())
dbuf_evict_one(); dbuf_evict_one();
cv_signal(&dbuf_evict_cv); cv_signal(&dbuf_evict_cv);
@ -679,9 +645,8 @@ retry:
dbuf_cache = multilist_create(sizeof (dmu_buf_impl_t), dbuf_cache = multilist_create(sizeof (dmu_buf_impl_t),
offsetof(dmu_buf_impl_t, db_cache_link), offsetof(dmu_buf_impl_t, db_cache_link),
dbuf_cache_multilist_index_func); dbuf_cache_multilist_index_func);
refcount_create(&dbuf_cache_size); zfs_refcount_create(&dbuf_cache_size);
tsd_create(&zfs_dbuf_evict_key, NULL);
dbuf_evict_thread_exit = B_FALSE; dbuf_evict_thread_exit = B_FALSE;
mutex_init(&dbuf_evict_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&dbuf_evict_lock, NULL, MUTEX_DEFAULT, NULL);
cv_init(&dbuf_evict_cv, NULL, CV_DEFAULT, NULL); cv_init(&dbuf_evict_cv, NULL, CV_DEFAULT, NULL);
@ -718,12 +683,11 @@ dbuf_fini(void)
cv_wait(&dbuf_evict_cv, &dbuf_evict_lock); cv_wait(&dbuf_evict_cv, &dbuf_evict_lock);
} }
mutex_exit(&dbuf_evict_lock); mutex_exit(&dbuf_evict_lock);
tsd_destroy(&zfs_dbuf_evict_key);
mutex_destroy(&dbuf_evict_lock); mutex_destroy(&dbuf_evict_lock);
cv_destroy(&dbuf_evict_cv); cv_destroy(&dbuf_evict_cv);
refcount_destroy(&dbuf_cache_size); zfs_refcount_destroy(&dbuf_cache_size);
multilist_destroy(dbuf_cache); multilist_destroy(dbuf_cache);
} }
@ -910,7 +874,7 @@ dbuf_loan_arcbuf(dmu_buf_impl_t *db)
ASSERT(db->db_blkid != DMU_BONUS_BLKID); ASSERT(db->db_blkid != DMU_BONUS_BLKID);
mutex_enter(&db->db_mtx); mutex_enter(&db->db_mtx);
if (arc_released(db->db_buf) || refcount_count(&db->db_holds) > 1) { if (arc_released(db->db_buf) || zfs_refcount_count(&db->db_holds) > 1) {
int blksz = db->db.db_size; int blksz = db->db.db_size;
spa_t *spa = db->db_objset->os_spa; spa_t *spa = db->db_objset->os_spa;
@ -983,7 +947,7 @@ dbuf_read_done(zio_t *zio, arc_buf_t *buf, void *vdb)
/* /*
* All reads are synchronous, so we must have a hold on the dbuf * All reads are synchronous, so we must have a hold on the dbuf
*/ */
ASSERT(refcount_count(&db->db_holds) > 0); ASSERT(zfs_refcount_count(&db->db_holds) > 0);
ASSERT(db->db_buf == NULL); ASSERT(db->db_buf == NULL);
ASSERT(db->db.db_data == NULL); ASSERT(db->db.db_data == NULL);
if (db->db_level == 0 && db->db_freed_in_flight) { if (db->db_level == 0 && db->db_freed_in_flight) {
@ -1004,7 +968,7 @@ dbuf_read_done(zio_t *zio, arc_buf_t *buf, void *vdb)
db->db_state = DB_UNCACHED; db->db_state = DB_UNCACHED;
} }
cv_broadcast(&db->db_changed); cv_broadcast(&db->db_changed);
dbuf_rele_and_unlock(db, NULL); dbuf_rele_and_unlock(db, NULL, B_FALSE);
} }
static int static int
@ -1017,7 +981,7 @@ dbuf_read_impl(dmu_buf_impl_t *db, zio_t *zio, uint32_t flags)
DB_DNODE_ENTER(db); DB_DNODE_ENTER(db);
dn = DB_DNODE(db); dn = DB_DNODE(db);
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
/* We need the struct_rwlock to prevent db_blkptr from changing. */ /* We need the struct_rwlock to prevent db_blkptr from changing. */
ASSERT(RW_LOCK_HELD(&dn->dn_struct_rwlock)); ASSERT(RW_LOCK_HELD(&dn->dn_struct_rwlock));
ASSERT(MUTEX_HELD(&db->db_mtx)); ASSERT(MUTEX_HELD(&db->db_mtx));
@ -1150,7 +1114,7 @@ dbuf_fix_old_data(dmu_buf_impl_t *db, uint64_t txg)
dr->dt.dl.dr_data = kmem_alloc(bonuslen, KM_SLEEP); dr->dt.dl.dr_data = kmem_alloc(bonuslen, KM_SLEEP);
arc_space_consume(bonuslen, ARC_SPACE_BONUS); arc_space_consume(bonuslen, ARC_SPACE_BONUS);
bcopy(db->db.db_data, dr->dt.dl.dr_data, bonuslen); bcopy(db->db.db_data, dr->dt.dl.dr_data, bonuslen);
} else if (refcount_count(&db->db_holds) > db->db_dirtycnt) { } else if (zfs_refcount_count(&db->db_holds) > db->db_dirtycnt) {
int size = arc_buf_size(db->db_buf); int size = arc_buf_size(db->db_buf);
arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db); arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);
spa_t *spa = db->db_objset->os_spa; spa_t *spa = db->db_objset->os_spa;
@ -1182,7 +1146,7 @@ dbuf_read(dmu_buf_impl_t *db, zio_t *zio, uint32_t flags)
* We don't have to hold the mutex to check db_state because it * We don't have to hold the mutex to check db_state because it
* can't be freed while we have a hold on the buffer. * can't be freed while we have a hold on the buffer.
*/ */
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
if (db->db_state == DB_NOFILL) if (db->db_state == DB_NOFILL)
return (SET_ERROR(EIO)); return (SET_ERROR(EIO));
@ -1277,7 +1241,7 @@ dbuf_read(dmu_buf_impl_t *db, zio_t *zio, uint32_t flags)
static void static void
dbuf_noread(dmu_buf_impl_t *db) dbuf_noread(dmu_buf_impl_t *db)
{ {
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
ASSERT(db->db_blkid != DMU_BONUS_BLKID); ASSERT(db->db_blkid != DMU_BONUS_BLKID);
mutex_enter(&db->db_mtx); mutex_enter(&db->db_mtx);
while (db->db_state == DB_READ || db->db_state == DB_FILL) while (db->db_state == DB_READ || db->db_state == DB_FILL)
@ -1397,7 +1361,7 @@ dbuf_free_range(dnode_t *dn, uint64_t start_blkid, uint64_t end_blkid,
mutex_exit(&db->db_mtx); mutex_exit(&db->db_mtx);
continue; continue;
} }
if (refcount_count(&db->db_holds) == 0) { if (zfs_refcount_count(&db->db_holds) == 0) {
ASSERT(db->db_buf); ASSERT(db->db_buf);
dbuf_destroy(db); dbuf_destroy(db);
continue; continue;
@ -1544,7 +1508,7 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
int txgoff = tx->tx_txg & TXG_MASK; int txgoff = tx->tx_txg & TXG_MASK;
ASSERT(tx->tx_txg != 0); ASSERT(tx->tx_txg != 0);
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
DMU_TX_DIRTY_BUF(tx, db); DMU_TX_DIRTY_BUF(tx, db);
DB_DNODE_ENTER(db); DB_DNODE_ENTER(db);
@ -1606,6 +1570,9 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
FTAG); FTAG);
} }
} }
if (tx->tx_txg > dn->dn_dirty_txg)
dn->dn_dirty_txg = tx->tx_txg;
mutex_exit(&dn->dn_mtx); mutex_exit(&dn->dn_mtx);
if (db->db_blkid == DMU_SPILL_BLKID) if (db->db_blkid == DMU_SPILL_BLKID)
@ -1909,7 +1876,7 @@ dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
ASSERT(db->db_dirtycnt > 0); ASSERT(db->db_dirtycnt > 0);
db->db_dirtycnt -= 1; db->db_dirtycnt -= 1;
if (refcount_remove(&db->db_holds, (void *)(uintptr_t)txg) == 0) { if (zfs_refcount_remove(&db->db_holds, (void *)(uintptr_t)txg) == 0) {
ASSERT(db->db_state == DB_NOFILL || arc_released(db->db_buf)); ASSERT(db->db_state == DB_NOFILL || arc_released(db->db_buf));
dbuf_destroy(db); dbuf_destroy(db);
return (B_TRUE); return (B_TRUE);
@ -1926,7 +1893,7 @@ dmu_buf_will_dirty(dmu_buf_t *db_fake, dmu_tx_t *tx)
dbuf_dirty_record_t *dr; dbuf_dirty_record_t *dr;
ASSERT(tx->tx_txg != 0); ASSERT(tx->tx_txg != 0);
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
/* /*
* Quick check for dirtiness. For already dirty blocks, this * Quick check for dirtiness. For already dirty blocks, this
@ -1978,7 +1945,7 @@ dmu_buf_will_fill(dmu_buf_t *db_fake, dmu_tx_t *tx)
ASSERT(db->db_blkid != DMU_BONUS_BLKID); ASSERT(db->db_blkid != DMU_BONUS_BLKID);
ASSERT(tx->tx_txg != 0); ASSERT(tx->tx_txg != 0);
ASSERT(db->db_level == 0); ASSERT(db->db_level == 0);
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
ASSERT(db->db.db_object != DMU_META_DNODE_OBJECT || ASSERT(db->db.db_object != DMU_META_DNODE_OBJECT ||
dmu_tx_private_ok(tx)); dmu_tx_private_ok(tx));
@ -2053,7 +2020,7 @@ dmu_buf_write_embedded(dmu_buf_t *dbuf, void *data,
void void
dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx) dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx)
{ {
ASSERT(!refcount_is_zero(&db->db_holds)); ASSERT(!zfs_refcount_is_zero(&db->db_holds));
ASSERT(db->db_blkid != DMU_BONUS_BLKID); ASSERT(db->db_blkid != DMU_BONUS_BLKID);
ASSERT(db->db_level == 0); ASSERT(db->db_level == 0);
ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf)); ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf));
@ -2072,7 +2039,7 @@ dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx)
ASSERT(db->db_state == DB_CACHED || db->db_state == DB_UNCACHED); ASSERT(db->db_state == DB_CACHED || db->db_state == DB_UNCACHED);
if (db->db_state == DB_CACHED && if (db->db_state == DB_CACHED &&
refcount_count(&db->db_holds) - 1 > db->db_dirtycnt) { zfs_refcount_count(&db->db_holds) - 1 > db->db_dirtycnt) {
mutex_exit(&db->db_mtx); mutex_exit(&db->db_mtx);
(void) dbuf_dirty(db, tx); (void) dbuf_dirty(db, tx);
bcopy(buf->b_data, db->db.db_data, db->db.db_size); bcopy(buf->b_data, db->db.db_data, db->db.db_size);
@ -2117,7 +2084,7 @@ dbuf_destroy(dmu_buf_impl_t *db)
dmu_buf_impl_t *dndb; dmu_buf_impl_t *dndb;
ASSERT(MUTEX_HELD(&db->db_mtx)); ASSERT(MUTEX_HELD(&db->db_mtx));
-    ASSERT(refcount_is_zero(&db->db_holds));
+    ASSERT(zfs_refcount_is_zero(&db->db_holds));
     if (db->db_buf != NULL) {
         arc_buf_destroy(db->db_buf, db);
@@ -2137,7 +2104,7 @@ dbuf_destroy(dmu_buf_impl_t *db)
     if (multilist_link_active(&db->db_cache_link)) {
         multilist_remove(dbuf_cache, db);
-        (void) refcount_remove_many(&dbuf_cache_size,
+        (void) zfs_refcount_remove_many(&dbuf_cache_size,
             db->db.db_size, db);
     }
@@ -2175,7 +2142,8 @@ dbuf_destroy(dmu_buf_impl_t *db)
      * value in dnode_move(), since DB_DNODE_EXIT doesn't actually
      * release any lock.
      */
-    dnode_rele(dn, db);
+    mutex_enter(&dn->dn_mtx);
+    dnode_rele_and_unlock(dn, db, B_TRUE);
     db->db_dnode_handle = NULL;
     dbuf_hash_remove(db);
@@ -2183,7 +2151,7 @@ dbuf_destroy(dmu_buf_impl_t *db)
         DB_DNODE_EXIT(db);
     }
-    ASSERT(refcount_is_zero(&db->db_holds));
+    ASSERT(zfs_refcount_is_zero(&db->db_holds));
     db->db_parent = NULL;
@@ -2201,8 +2169,10 @@ dbuf_destroy(dmu_buf_impl_t *db)
      * If this dbuf is referenced from an indirect dbuf,
      * decrement the ref count on the indirect dbuf.
      */
-    if (parent && parent != dndb)
-        dbuf_rele(parent, db);
+    if (parent && parent != dndb) {
+        mutex_enter(&parent->db_mtx);
+        dbuf_rele_and_unlock(parent, db, B_TRUE);
+    }
 }

 /*
@@ -2380,8 +2350,8 @@ dbuf_create(dnode_t *dn, uint8_t level, uint64_t blkid,
         dbuf_add_ref(parent, db);
     ASSERT(dn->dn_object == DMU_META_DNODE_OBJECT ||
-        refcount_count(&dn->dn_holds) > 0);
-    (void) refcount_add(&dn->dn_holds, db);
+        zfs_refcount_count(&dn->dn_holds) > 0);
+    (void) zfs_refcount_add(&dn->dn_holds, db);
     atomic_inc_32(&dn->dn_dbufs_count);

     dprintf_dbuf(db, "db=%p\n", db);
@@ -2741,12 +2711,12 @@ __dbuf_hold_impl(struct dbuf_hold_impl_data *dh)
     }

     if (multilist_link_active(&dh->dh_db->db_cache_link)) {
-        ASSERT(refcount_is_zero(&dh->dh_db->db_holds));
+        ASSERT(zfs_refcount_is_zero(&dh->dh_db->db_holds));
         multilist_remove(dbuf_cache, dh->dh_db);
-        (void) refcount_remove_many(&dbuf_cache_size,
+        (void) zfs_refcount_remove_many(&dbuf_cache_size,
             dh->dh_db->db.db_size, dh->dh_db);
     }
-    (void) refcount_add(&dh->dh_db->db_holds, dh->dh_tag);
+    (void) zfs_refcount_add(&dh->dh_db->db_holds, dh->dh_tag);
     DBUF_VERIFY(dh->dh_db);
     mutex_exit(&dh->dh_db->db_mtx);
@@ -2870,7 +2840,7 @@ dbuf_rm_spill(dnode_t *dn, dmu_tx_t *tx)
 void
 dbuf_add_ref(dmu_buf_impl_t *db, void *tag)
 {
-    int64_t holds = refcount_add(&db->db_holds, tag);
+    int64_t holds = zfs_refcount_add(&db->db_holds, tag);
     VERIFY3S(holds, >, 1);
 }
@@ -2890,7 +2860,7 @@ dbuf_try_add_ref(dmu_buf_t *db_fake, objset_t *os, uint64_t obj, uint64_t blkid,
     if (found_db != NULL) {
         if (db == found_db && dbuf_refcount(db) > db->db_dirtycnt) {
-            (void) refcount_add(&db->db_holds, tag);
+            (void) zfs_refcount_add(&db->db_holds, tag);
             result = B_TRUE;
         }
         mutex_exit(&found_db->db_mtx);
@@ -2909,7 +2879,7 @@ void
 dbuf_rele(dmu_buf_impl_t *db, void *tag)
 {
     mutex_enter(&db->db_mtx);
-    dbuf_rele_and_unlock(db, tag);
+    dbuf_rele_and_unlock(db, tag, B_FALSE);
 }

 void
@@ -2920,10 +2890,19 @@ dmu_buf_rele(dmu_buf_t *db, void *tag)
 /*
  * dbuf_rele() for an already-locked dbuf.  This is necessary to allow
- * db_dirtycnt and db_holds to be updated atomically.
+ * db_dirtycnt and db_holds to be updated atomically.  The 'evicting'
+ * argument should be set if we are already in the dbuf-evicting code
+ * path, in which case we don't want to recursively evict.  This allows us to
+ * avoid deeply nested stacks that would have a call flow similar to this:
+ *
+ * dbuf_rele()-->dbuf_rele_and_unlock()-->dbuf_evict_notify()
+ *    ^                                          |
+ *    |                                          |
+ *    +-----dbuf_destroy()<--dbuf_evict_one()<---+
+ *
  */
 void
-dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
+dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag, boolean_t evicting)
 {
     int64_t holds;
@@ -2935,7 +2914,7 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
      * dnode so we can guarantee in dnode_move() that a referenced bonus
      * buffer has a corresponding dnode hold.
      */
-    holds = refcount_remove(&db->db_holds, tag);
+    holds = zfs_refcount_remove(&db->db_holds, tag);
     ASSERT(holds >= 0);

     /*
@@ -3014,11 +2993,12 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
             dbuf_destroy(db);
         } else if (!multilist_link_active(&db->db_cache_link)) {
             multilist_insert(dbuf_cache, db);
-            (void) refcount_add_many(&dbuf_cache_size,
+            (void) zfs_refcount_add_many(&dbuf_cache_size,
                 db->db.db_size, db);
             mutex_exit(&db->db_mtx);

-            dbuf_evict_notify();
+            if (!evicting)
+                dbuf_evict_notify();
         }

         if (do_arc_evict)
@@ -3034,7 +3014,7 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
 uint64_t
 dbuf_refcount(dmu_buf_impl_t *db)
 {
-    return (refcount_count(&db->db_holds));
+    return (zfs_refcount_count(&db->db_holds));
 }

 void *
@@ -3311,7 +3291,7 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx)
         kmem_free(dr, sizeof (dbuf_dirty_record_t));
         ASSERT(db->db_dirtycnt > 0);
         db->db_dirtycnt -= 1;
-        dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg);
+        dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg, B_FALSE);
         return;
     }
@@ -3337,7 +3317,7 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx)
     if (db->db_state != DB_NOFILL &&
         dn->dn_object != DMU_META_DNODE_OBJECT &&
-        refcount_count(&db->db_holds) > 1 &&
+        zfs_refcount_count(&db->db_holds) > 1 &&
         dr->dt.dl.dr_override_state != DR_OVERRIDDEN &&
         *datap == db->db_buf) {
         /*
@@ -3667,7 +3647,7 @@ dbuf_write_done(zio_t *zio, arc_buf_t *buf, void *vdb)
     ASSERT(db->db_dirtycnt > 0);
     db->db_dirtycnt -= 1;
     db->db_data_pending = NULL;
-    dbuf_rele_and_unlock(db, (void *)(uintptr_t)tx->tx_txg);
+    dbuf_rele_and_unlock(db, (void *)(uintptr_t)tx->tx_txg, B_FALSE);
 }

 static void
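The 'evicting' argument threaded through the dbuf hunks above exists purely to break the release/evict cycle drawn in the new comment. A minimal userspace illustration of the pattern — all names here (buf_rele, evict_notify) are hypothetical stand-ins, not the ZFS implementation:

#include <stdbool.h>
#include <stdio.h>

struct buf { int holds; };

static void buf_rele(struct buf *b, bool evicting);

/* Stand-in for dbuf_evict_one()/dbuf_destroy(): eviction itself drops holds. */
static void
evict_notify(void)
{
    static struct buf parent = { 1 };

    printf("evicting: releasing parent with evicting=true\n");
    buf_rele(&parent, true);    /* must not re-enter evict_notify() */
}

/* Passing evicting=true from the eviction path keeps the stack flat. */
static void
buf_rele(struct buf *b, bool evicting)
{
    if (--b->holds == 0 && !evicting)
        evict_notify();
}

int
main(void)
{
    struct buf db = { 1 };

    buf_rele(&db, false);       /* normal release: may kick the evictor */
    return (0);
}

Without the flag, the release performed inside eviction would call evict_notify() again, and the stack could grow with the depth of the dbuf tree.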

View File

@@ -89,7 +89,7 @@ __dbuf_stats_hash_table_data(char *buf, size_t size, dmu_buf_impl_t *db)
         (u_longlong_t)db->db.db_size,
         !!dbuf_is_metadata(db),
         db->db_state,
-        (ulong_t)refcount_count(&db->db_holds),
+        (ulong_t)zfs_refcount_count(&db->db_holds),
         /* arc_buf_info_t */
         abi.abi_state_type,
         abi.abi_state_contents,
@@ -113,7 +113,7 @@ __dbuf_stats_hash_table_data(char *buf, size_t size, dmu_buf_impl_t *db)
         (ulong_t)doi.doi_metadata_block_size,
         (u_longlong_t)doi.doi_bonus_size,
         (ulong_t)doi.doi_indirection,
-        (ulong_t)refcount_count(&dn->dn_holds),
+        (ulong_t)zfs_refcount_count(&dn->dn_holds),
         (u_longlong_t)doi.doi_fill_count,
         (u_longlong_t)doi.doi_max_offset);

View File

@@ -342,7 +342,7 @@ dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **dbp)
     db = dn->dn_bonus;

     /* as long as the bonus buf is held, the dnode will be held */
-    if (refcount_add(&db->db_holds, tag) == 1) {
+    if (zfs_refcount_add(&db->db_holds, tag) == 1) {
         VERIFY(dnode_add_ref(dn, db));
         atomic_inc_32(&dn->dn_dbufs_count);
     }
@@ -2044,7 +2044,7 @@ dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole, uint64_t *off)
      * Check if dnode is dirty
      */
     for (i = 0; i < TXG_SIZE; i++) {
-        if (list_link_active(&dn->dn_dirty_link[i])) {
+        if (multilist_link_active(&dn->dn_dirty_link[i])) {
             clean = B_FALSE;
             break;
         }

View File

@@ -1213,10 +1213,23 @@ dmu_objset_sync_dnodes(multilist_sublist_t *list, dmu_tx_t *tx)
         ASSERT3U(dn->dn_nlevels, <=, DN_MAX_LEVELS);
         multilist_sublist_remove(list, dn);

+        /*
+         * If we are not doing useraccounting (os_synced_dnodes == NULL)
+         * we are done with this dnode for this txg. Unset dn_dirty_txg
+         * if later txgs aren't dirtying it so that future holders do
+         * not get a stale value. Otherwise, we will do this in
+         * userquota_updates_task() when processing has completely
+         * finished for this txg.
+         */
         multilist_t *newlist = dn->dn_objset->os_synced_dnodes;
         if (newlist != NULL) {
             (void) dnode_add_ref(dn, newlist);
             multilist_insert(newlist, dn);
+        } else {
+            mutex_enter(&dn->dn_mtx);
+            if (dn->dn_dirty_txg == tx->tx_txg)
+                dn->dn_dirty_txg = 0;
+            mutex_exit(&dn->dn_mtx);
         }

         dnode_sync(dn, tx);
@@ -1621,6 +1634,8 @@ userquota_updates_task(void *arg)
             dn->dn_id_flags |= DN_ID_CHKED_BONUS;
         }
         dn->dn_id_flags &= ~(DN_ID_NEW_EXIST);
+        if (dn->dn_dirty_txg == spa_syncing_txg(os->os_spa))
+            dn->dn_dirty_txg = 0;
         mutex_exit(&dn->dn_mtx);

         multilist_sublist_remove(list, dn);
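Both hunks above enforce the same invariant for the new dn_dirty_txg field: it stays nonzero until the last txg that dirtied the dnode has finished all of its processing (sync, or user-quota accounting when that is enabled). A compact sketch of that bookkeeping, with hypothetical names and the locking elided:

#include <stdint.h>
#include <assert.h>

struct obj {
    uint64_t dirty_txg;    /* last txg to dirty the object; 0 = clean */
};

/* Called when the object is dirtied in the currently open txg. */
static void
obj_setdirty(struct obj *o, uint64_t open_txg)
{
    o->dirty_txg = open_txg;
}

/*
 * Called when txg 'synced' completes.  Clear the marker only if no
 * later txg re-dirtied the object; clearing it unconditionally is
 * exactly the stale-value race the hunks above avoid.
 */
static void
obj_txg_done(struct obj *o, uint64_t synced)
{
    if (o->dirty_txg == synced)
        o->dirty_txg = 0;
}

int
main(void)
{
    struct obj o = { 0 };

    obj_setdirty(&o, 7);    /* dirtied in txg 7 */
    obj_setdirty(&o, 8);    /* re-dirtied in txg 8 before 7 finished */
    obj_txg_done(&o, 7);    /* must NOT mark clean... */
    assert(o.dirty_txg == 8);
    obj_txg_done(&o, 8);    /* ...until txg 8 is done too */
    assert(o.dirty_txg == 0);
    return (0);
}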

View File

@@ -114,7 +114,7 @@ dmu_tx_hold_dnode_impl(dmu_tx_t *tx, dnode_t *dn, enum dmu_tx_hold_type type,
     dmu_tx_hold_t *txh;

     if (dn != NULL) {
-        (void) refcount_add(&dn->dn_holds, tx);
+        (void) zfs_refcount_add(&dn->dn_holds, tx);
         if (tx->tx_txg != 0) {
             mutex_enter(&dn->dn_mtx);
             /*
@@ -124,7 +124,7 @@ dmu_tx_hold_dnode_impl(dmu_tx_t *tx, dnode_t *dn, enum dmu_tx_hold_type type,
              */
             ASSERT(dn->dn_assigned_txg == 0);
             dn->dn_assigned_txg = tx->tx_txg;
-            (void) refcount_add(&dn->dn_tx_holds, tx);
+            (void) zfs_refcount_add(&dn->dn_tx_holds, tx);
             mutex_exit(&dn->dn_mtx);
         }
     }
@@ -132,8 +132,8 @@ dmu_tx_hold_dnode_impl(dmu_tx_t *tx, dnode_t *dn, enum dmu_tx_hold_type type,
     txh = kmem_zalloc(sizeof (dmu_tx_hold_t), KM_SLEEP);
     txh->txh_tx = tx;
     txh->txh_dnode = dn;
-    refcount_create(&txh->txh_space_towrite);
-    refcount_create(&txh->txh_memory_tohold);
+    zfs_refcount_create(&txh->txh_space_towrite);
+    zfs_refcount_create(&txh->txh_memory_tohold);
     txh->txh_type = type;
     txh->txh_arg1 = arg1;
     txh->txh_arg2 = arg2;
@@ -228,9 +228,9 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
     if (len == 0)
         return;

-    (void) refcount_add_many(&txh->txh_space_towrite, len, FTAG);
+    (void) zfs_refcount_add_many(&txh->txh_space_towrite, len, FTAG);

-    if (refcount_count(&txh->txh_space_towrite) > 2 * DMU_MAX_ACCESS)
+    if (zfs_refcount_count(&txh->txh_space_towrite) > 2 * DMU_MAX_ACCESS)
         err = SET_ERROR(EFBIG);

     if (dn == NULL)
@@ -295,7 +295,8 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
 static void
 dmu_tx_count_dnode(dmu_tx_hold_t *txh)
 {
-    (void) refcount_add_many(&txh->txh_space_towrite, DNODE_MIN_SIZE, FTAG);
+    (void) zfs_refcount_add_many(&txh->txh_space_towrite, DNODE_MIN_SIZE,
+        FTAG);
 }

 void
@@ -418,7 +419,7 @@ dmu_tx_hold_free_impl(dmu_tx_hold_t *txh, uint64_t off, uint64_t len)
             return;
         }

-        (void) refcount_add_many(&txh->txh_memory_tohold,
+        (void) zfs_refcount_add_many(&txh->txh_memory_tohold,
             1 << dn->dn_indblkshift, FTAG);

         err = dmu_tx_check_ioerr(zio, dn, 1, i);
@@ -477,7 +478,7 @@ dmu_tx_hold_zap_impl(dmu_tx_hold_t *txh, const char *name)
      * - 2 blocks for possibly split leaves,
      * - 2 grown ptrtbl blocks
      */
-    (void) refcount_add_many(&txh->txh_space_towrite,
+    (void) zfs_refcount_add_many(&txh->txh_space_towrite,
         MZAP_MAX_BLKSZ, FTAG);

     if (dn == NULL)
@@ -568,7 +569,8 @@ dmu_tx_hold_space(dmu_tx_t *tx, uint64_t space)
     txh = dmu_tx_hold_object_impl(tx, tx->tx_objset,
         DMU_NEW_OBJECT, THT_SPACE, space, 0);
     if (txh)
-        (void) refcount_add_many(&txh->txh_space_towrite, space, FTAG);
+        (void) zfs_refcount_add_many(&txh->txh_space_towrite, space,
+            FTAG);
 }

 #ifdef ZFS_DEBUG
@@ -916,11 +918,11 @@ dmu_tx_try_assign(dmu_tx_t *tx, uint64_t txg_how)
             if (dn->dn_assigned_txg == 0)
                 dn->dn_assigned_txg = tx->tx_txg;
             ASSERT3U(dn->dn_assigned_txg, ==, tx->tx_txg);
-            (void) refcount_add(&dn->dn_tx_holds, tx);
+            (void) zfs_refcount_add(&dn->dn_tx_holds, tx);
             mutex_exit(&dn->dn_mtx);
         }
-        towrite += refcount_count(&txh->txh_space_towrite);
-        tohold += refcount_count(&txh->txh_memory_tohold);
+        towrite += zfs_refcount_count(&txh->txh_space_towrite);
+        tohold += zfs_refcount_count(&txh->txh_memory_tohold);
     }

     /* needed allocation: worst-case estimate of write space */
@@ -962,7 +964,7 @@ dmu_tx_unassign(dmu_tx_t *tx)
         mutex_enter(&dn->dn_mtx);
         ASSERT3U(dn->dn_assigned_txg, ==, tx->tx_txg);

-        if (refcount_remove(&dn->dn_tx_holds, tx) == 0) {
+        if (zfs_refcount_remove(&dn->dn_tx_holds, tx) == 0) {
             dn->dn_assigned_txg = 0;
             cv_broadcast(&dn->dn_notxholds);
         }
@@ -1100,10 +1102,10 @@ dmu_tx_destroy(dmu_tx_t *tx)
         dnode_t *dn = txh->txh_dnode;

         list_remove(&tx->tx_holds, txh);
-        refcount_destroy_many(&txh->txh_space_towrite,
-            refcount_count(&txh->txh_space_towrite));
-        refcount_destroy_many(&txh->txh_memory_tohold,
-            refcount_count(&txh->txh_memory_tohold));
+        zfs_refcount_destroy_many(&txh->txh_space_towrite,
+            zfs_refcount_count(&txh->txh_space_towrite));
+        zfs_refcount_destroy_many(&txh->txh_memory_tohold,
+            zfs_refcount_count(&txh->txh_memory_tohold));
         kmem_free(txh, sizeof (dmu_tx_hold_t));
         if (dn != NULL)
             dnode_rele(dn, tx);
@@ -1135,7 +1137,7 @@ dmu_tx_commit(dmu_tx_t *tx)
         mutex_enter(&dn->dn_mtx);
         ASSERT3U(dn->dn_assigned_txg, ==, tx->tx_txg);

-        if (refcount_remove(&dn->dn_tx_holds, tx) == 0) {
+        if (zfs_refcount_remove(&dn->dn_tx_holds, tx) == 0) {
             dn->dn_assigned_txg = 0;
             cv_broadcast(&dn->dn_notxholds);
         }
@@ -1250,7 +1252,7 @@ dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object)
     txh = dmu_tx_hold_object_impl(tx, tx->tx_objset, object,
         THT_SPILL, 0, 0);
     if (txh != NULL)
-        (void) refcount_add_many(&txh->txh_space_towrite,
+        (void) zfs_refcount_add_many(&txh->txh_space_towrite,
             SPA_OLD_MAXBLOCKSIZE, FTAG);
 }

View File

@@ -124,8 +124,8 @@ dnode_cons(void *arg, void *unused, int kmflag)
      * Every dbuf has a reference, and dropping a tracked reference is
      * O(number of references), so don't track dn_holds.
      */
-    refcount_create_untracked(&dn->dn_holds);
-    refcount_create(&dn->dn_tx_holds);
+    zfs_refcount_create_untracked(&dn->dn_holds);
+    zfs_refcount_create(&dn->dn_tx_holds);
     list_link_init(&dn->dn_link);

     bzero(&dn->dn_next_nblkptr[0], sizeof (dn->dn_next_nblkptr));
@@ -137,7 +137,7 @@ dnode_cons(void *arg, void *unused, int kmflag)
     bzero(&dn->dn_next_blksz[0], sizeof (dn->dn_next_blksz));

     for (i = 0; i < TXG_SIZE; i++) {
-        list_link_init(&dn->dn_dirty_link[i]);
+        multilist_link_init(&dn->dn_dirty_link[i]);
         dn->dn_free_ranges[i] = NULL;
         list_create(&dn->dn_dirty_records[i],
             sizeof (dbuf_dirty_record_t),
@@ -147,6 +147,7 @@ dnode_cons(void *arg, void *unused, int kmflag)
     dn->dn_allocated_txg = 0;
     dn->dn_free_txg = 0;
     dn->dn_assigned_txg = 0;
+    dn->dn_dirty_txg = 0;
     dn->dn_dirtyctx = 0;
     dn->dn_dirtyctx_firstset = NULL;
     dn->dn_bonus = NULL;
@@ -179,12 +180,12 @@ dnode_dest(void *arg, void *unused)
     mutex_destroy(&dn->dn_mtx);
     mutex_destroy(&dn->dn_dbufs_mtx);
     cv_destroy(&dn->dn_notxholds);
-    refcount_destroy(&dn->dn_holds);
-    refcount_destroy(&dn->dn_tx_holds);
+    zfs_refcount_destroy(&dn->dn_holds);
+    zfs_refcount_destroy(&dn->dn_tx_holds);
     ASSERT(!list_link_active(&dn->dn_link));

     for (i = 0; i < TXG_SIZE; i++) {
-        ASSERT(!list_link_active(&dn->dn_dirty_link[i]));
+        ASSERT(!multilist_link_active(&dn->dn_dirty_link[i]));
         ASSERT3P(dn->dn_free_ranges[i], ==, NULL);
         list_destroy(&dn->dn_dirty_records[i]);
         ASSERT0(dn->dn_next_nblkptr[i]);
@@ -199,6 +200,7 @@ dnode_dest(void *arg, void *unused)
     ASSERT0(dn->dn_allocated_txg);
     ASSERT0(dn->dn_free_txg);
     ASSERT0(dn->dn_assigned_txg);
+    ASSERT0(dn->dn_dirty_txg);
     ASSERT0(dn->dn_dirtyctx);
     ASSERT3P(dn->dn_dirtyctx_firstset, ==, NULL);
     ASSERT3P(dn->dn_bonus, ==, NULL);
@@ -375,7 +377,7 @@ dnode_buf_byteswap(void *vbuf, size_t size)
 void
 dnode_setbonuslen(dnode_t *dn, int newsize, dmu_tx_t *tx)
 {
-    ASSERT3U(refcount_count(&dn->dn_holds), >=, 1);
+    ASSERT3U(zfs_refcount_count(&dn->dn_holds), >=, 1);

     dnode_setdirty(dn, tx);
     rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
@@ -392,7 +394,7 @@ dnode_setbonuslen(dnode_t *dn, int newsize, dmu_tx_t *tx)
 void
 dnode_setbonus_type(dnode_t *dn, dmu_object_type_t newtype, dmu_tx_t *tx)
 {
-    ASSERT3U(refcount_count(&dn->dn_holds), >=, 1);
+    ASSERT3U(zfs_refcount_count(&dn->dn_holds), >=, 1);
     dnode_setdirty(dn, tx);
     rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
     dn->dn_bonustype = newtype;
@@ -403,7 +405,7 @@ dnode_setbonus_type(dnode_t *dn, dmu_object_type_t newtype, dmu_tx_t *tx)
 void
 dnode_rm_spill(dnode_t *dn, dmu_tx_t *tx)
 {
-    ASSERT3U(refcount_count(&dn->dn_holds), >=, 1);
+    ASSERT3U(zfs_refcount_count(&dn->dn_holds), >=, 1);
     ASSERT(RW_WRITE_HELD(&dn->dn_struct_rwlock));
     dnode_setdirty(dn, tx);
     dn->dn_rm_spillblk[tx->tx_txg&TXG_MASK] = DN_KILL_SPILLBLK;
@@ -523,6 +525,7 @@ dnode_destroy(dnode_t *dn)
     dn->dn_allocated_txg = 0;
     dn->dn_free_txg = 0;
     dn->dn_assigned_txg = 0;
+    dn->dn_dirty_txg = 0;
     dn->dn_dirtyctx = 0;

     if (dn->dn_dirtyctx_firstset != NULL) {
@@ -592,8 +595,9 @@ dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs,
     ASSERT0(dn->dn_maxblkid);
     ASSERT0(dn->dn_allocated_txg);
     ASSERT0(dn->dn_assigned_txg);
-    ASSERT(refcount_is_zero(&dn->dn_tx_holds));
-    ASSERT3U(refcount_count(&dn->dn_holds), <=, 1);
+    ASSERT0(dn->dn_dirty_txg);
+    ASSERT(zfs_refcount_is_zero(&dn->dn_tx_holds));
+    ASSERT3U(zfs_refcount_count(&dn->dn_holds), <=, 1);
     ASSERT(avl_is_empty(&dn->dn_dbufs));

     for (i = 0; i < TXG_SIZE; i++) {
@@ -604,7 +608,7 @@ dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs,
         ASSERT0(dn->dn_next_bonustype[i]);
         ASSERT0(dn->dn_rm_spillblk[i]);
         ASSERT0(dn->dn_next_blksz[i]);
-        ASSERT(!list_link_active(&dn->dn_dirty_link[i]));
+        ASSERT(!multilist_link_active(&dn->dn_dirty_link[i]));
         ASSERT3P(list_head(&dn->dn_dirty_records[i]), ==, NULL);
         ASSERT3P(dn->dn_free_ranges[i], ==, NULL);
     }
@@ -779,10 +783,11 @@ dnode_move_impl(dnode_t *odn, dnode_t *ndn)
     ndn->dn_allocated_txg = odn->dn_allocated_txg;
     ndn->dn_free_txg = odn->dn_free_txg;
     ndn->dn_assigned_txg = odn->dn_assigned_txg;
+    ndn->dn_dirty_txg = odn->dn_dirty_txg;
     ndn->dn_dirtyctx = odn->dn_dirtyctx;
     ndn->dn_dirtyctx_firstset = odn->dn_dirtyctx_firstset;
-    ASSERT(refcount_count(&odn->dn_tx_holds) == 0);
-    refcount_transfer(&ndn->dn_holds, &odn->dn_holds);
+    ASSERT(zfs_refcount_count(&odn->dn_tx_holds) == 0);
+    zfs_refcount_transfer(&ndn->dn_holds, &odn->dn_holds);
     ASSERT(avl_is_empty(&ndn->dn_dbufs));
     avl_swap(&ndn->dn_dbufs, &odn->dn_dbufs);
     ndn->dn_dbufs_count = odn->dn_dbufs_count;
@@ -845,6 +850,7 @@ dnode_move_impl(dnode_t *odn, dnode_t *ndn)
     odn->dn_allocated_txg = 0;
     odn->dn_free_txg = 0;
     odn->dn_assigned_txg = 0;
+    odn->dn_dirty_txg = 0;
     odn->dn_dirtyctx = 0;
     odn->dn_dirtyctx_firstset = NULL;
     odn->dn_have_spill = B_FALSE;
@@ -969,7 +975,7 @@ dnode_move(void *buf, void *newbuf, size_t size, void *arg)
      * hold before the dbuf is removed, the hold is discounted, and the
      * removal is blocked until the move completes.
      */
-    refcount = refcount_count(&odn->dn_holds);
+    refcount = zfs_refcount_count(&odn->dn_holds);
     ASSERT(refcount >= 0);
     dbufs = odn->dn_dbufs_count;
@@ -997,7 +1003,7 @@ dnode_move(void *buf, void *newbuf, size_t size, void *arg)
     list_link_replace(&odn->dn_link, &ndn->dn_link);

     /* If the dnode was safe to move, the refcount cannot have changed. */
-    ASSERT(refcount == refcount_count(&ndn->dn_holds));
+    ASSERT(refcount == zfs_refcount_count(&ndn->dn_holds));
     ASSERT(dbufs == ndn->dn_dbufs_count);
     zrl_exit(&ndn->dn_handle->dnh_zrlock); /* handle has moved */
     mutex_exit(&os->os_lock);
@@ -1069,6 +1075,10 @@ dnode_check_slots_free(dnode_children_t *children, int idx, int slots)
 {
     ASSERT3S(idx + slots, <=, DNODES_PER_BLOCK);

+    /*
+     * If all dnode slots are either already free or
+     * evictable return B_TRUE.
+     */
     for (int i = idx; i < idx + slots; i++) {
         dnode_handle_t *dnh = &children->dnc_children[i];
         dnode_t *dn = dnh->dnh_dnode;
@@ -1077,18 +1087,17 @@ dnode_check_slots_free(dnode_children_t *children, int idx, int slots)
             continue;
         } else if (DN_SLOT_IS_PTR(dn)) {
             mutex_enter(&dn->dn_mtx);
-            dmu_object_type_t type = dn->dn_type;
+            boolean_t can_free = (dn->dn_type == DMU_OT_NONE &&
+                !DNODE_IS_DIRTY(dn));
             mutex_exit(&dn->dn_mtx);

-            if (type != DMU_OT_NONE)
+            if (!can_free)
                 return (B_FALSE);
-            else
-                continue;
+
+            continue;
         } else {
             return (B_FALSE);
         }
-
-        return (B_FALSE);
     }

     return (B_TRUE);
@@ -1143,7 +1152,7 @@ dnode_special_close(dnode_handle_t *dnh)
      * has a hold on this dnode while we are trying to evict this
      * dnode.
      */
-    while (refcount_count(&dn->dn_holds) > 0)
+    while (zfs_refcount_count(&dn->dn_holds) > 0)
         delay(1);
     ASSERT(dn->dn_dbuf == NULL ||
         dmu_buf_get_user(&dn->dn_dbuf->db) == NULL);
@@ -1198,8 +1207,8 @@ dnode_buf_evict_async(void *dbu)
      * it wouldn't be eligible for eviction and this function
      * would not have been called.
      */
-    ASSERT(refcount_is_zero(&dn->dn_holds));
-    ASSERT(refcount_is_zero(&dn->dn_tx_holds));
+    ASSERT(zfs_refcount_is_zero(&dn->dn_holds));
+    ASSERT(zfs_refcount_is_zero(&dn->dn_tx_holds));

     dnode_destroy(dn); /* implicit zrl_remove() for first slot */
     zrl_destroy(&dnh->dnh_zrlock);
@@ -1258,7 +1267,7 @@ dnode_hold_impl(objset_t *os, uint64_t object, int flag, int slots,
         if ((flag & DNODE_MUST_BE_FREE) && type != DMU_OT_NONE)
             return (SET_ERROR(EEXIST));
         DNODE_VERIFY(dn);
-        (void) refcount_add(&dn->dn_holds, tag);
+        (void) zfs_refcount_add(&dn->dn_holds, tag);
         *dnp = dn;
         return (0);
     }
@@ -1451,7 +1460,7 @@ dnode_hold_impl(objset_t *os, uint64_t object, int flag, int slots,
         }

         mutex_enter(&dn->dn_mtx);
-        if (!refcount_is_zero(&dn->dn_holds)) {
+        if (!zfs_refcount_is_zero(&dn->dn_holds)) {
             DNODE_STAT_BUMP(dnode_hold_free_refcount);
             mutex_exit(&dn->dn_mtx);
             dnode_slots_rele(dnc, idx, slots);
@@ -1475,7 +1484,7 @@ dnode_hold_impl(objset_t *os, uint64_t object, int flag, int slots,
         return (type == DMU_OT_NONE ? ENOENT : EEXIST);
     }

-    if (refcount_add(&dn->dn_holds, tag) == 1)
+    if (zfs_refcount_add(&dn->dn_holds, tag) == 1)
         dbuf_add_ref(db, dnh);
     mutex_exit(&dn->dn_mtx);
@@ -1511,11 +1520,11 @@ boolean_t
 dnode_add_ref(dnode_t *dn, void *tag)
 {
     mutex_enter(&dn->dn_mtx);
-    if (refcount_is_zero(&dn->dn_holds)) {
+    if (zfs_refcount_is_zero(&dn->dn_holds)) {
         mutex_exit(&dn->dn_mtx);
         return (FALSE);
     }
-    VERIFY(1 < refcount_add(&dn->dn_holds, tag));
+    VERIFY(1 < zfs_refcount_add(&dn->dn_holds, tag));
     mutex_exit(&dn->dn_mtx);
     return (TRUE);
 }
@@ -1524,18 +1533,18 @@ void
 dnode_rele(dnode_t *dn, void *tag)
 {
     mutex_enter(&dn->dn_mtx);
-    dnode_rele_and_unlock(dn, tag);
+    dnode_rele_and_unlock(dn, tag, B_FALSE);
 }

 void
-dnode_rele_and_unlock(dnode_t *dn, void *tag)
+dnode_rele_and_unlock(dnode_t *dn, void *tag, boolean_t evicting)
 {
     uint64_t refs;
     /* Get while the hold prevents the dnode from moving. */
     dmu_buf_impl_t *db = dn->dn_dbuf;
     dnode_handle_t *dnh = dn->dn_handle;

-    refs = refcount_remove(&dn->dn_holds, tag);
+    refs = zfs_refcount_remove(&dn->dn_holds, tag);
     mutex_exit(&dn->dn_mtx);

     /*
@@ -1559,7 +1568,8 @@ dnode_rele_and_unlock(dnode_t *dn, void *tag)
         * that the handle has zero references, but that will be
         * asserted anyway when the handle gets destroyed.
         */
-        dbuf_rele(db, dnh);
+        mutex_enter(&db->db_mtx);
+        dbuf_rele_and_unlock(db, dnh, evicting);
     }
 }
@@ -1594,12 +1604,12 @@ dnode_setdirty(dnode_t *dn, dmu_tx_t *tx)
     /*
      * If we are already marked dirty, we're done.
      */
-    if (list_link_active(&dn->dn_dirty_link[txg & TXG_MASK])) {
+    if (multilist_link_active(&dn->dn_dirty_link[txg & TXG_MASK])) {
         multilist_sublist_unlock(mls);
         return;
     }

-    ASSERT(!refcount_is_zero(&dn->dn_holds) ||
+    ASSERT(!zfs_refcount_is_zero(&dn->dn_holds) ||
         !avl_is_empty(&dn->dn_dbufs));
     ASSERT(dn->dn_datablksz != 0);
     ASSERT0(dn->dn_next_bonuslen[txg&TXG_MASK]);
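dnode_check_slots_free() above now refuses to reuse a slot whose dnode is still dirty, via a DNODE_IS_DIRTY() macro whose definition is not part of this diff. Given the new dn_dirty_txg field, a plausible reconstruction of the predicate is simply "dirty until the dirtying txg has fully finished" — the types and macro below are hypothetical, trimmed-down stand-ins:

#include <stdint.h>
#include <stdbool.h>

#define DMU_OT_NONE 0

/* Trimmed stand-in; the real dnode_t carries far more state. */
typedef struct dnode {
    uint64_t dn_type;
    uint64_t dn_dirty_txg;    /* as introduced by the hunks above */
} dnode_t;

/* Assumed reading of DNODE_IS_DIRTY(): nonzero dn_dirty_txg == dirty. */
#define DNODE_IS_DIRTY(dn)    ((dn)->dn_dirty_txg != 0)

/* A slot is reusable only if it is both free and no longer dirty. */
static bool
slot_reusable(const dnode_t *dn)
{
    return (dn->dn_type == DMU_OT_NONE && !DNODE_IS_DIRTY(dn));
}

The combined test is the point of the fix: an object can already be freed (dn_type == DMU_OT_NONE) while its final dirty records are still being processed, and treating the slot as free in that window invites reuse of in-flight state.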

View File

@@ -21,7 +21,7 @@
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
+ * Copyright (c) 2012, 2018 by Delphix. All rights reserved.
  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  */
@@ -422,13 +422,26 @@ dnode_evict_dbufs(dnode_t *dn)
         mutex_enter(&db->db_mtx);
         if (db->db_state != DB_EVICTING &&
-            refcount_is_zero(&db->db_holds)) {
+            zfs_refcount_is_zero(&db->db_holds)) {
             db_marker->db_level = db->db_level;
             db_marker->db_blkid = db->db_blkid;
             db_marker->db_state = DB_SEARCH;
             avl_insert_here(&dn->dn_dbufs, db_marker, db,
                 AVL_BEFORE);

+            /*
+             * We need to use the "marker" dbuf rather than
+             * simply getting the next dbuf, because
+             * dbuf_destroy() may actually remove multiple dbufs.
+             * It can call itself recursively on the parent dbuf,
+             * which may also be removed from dn_dbufs.  The code
+             * flow would look like:
+             *
+             * dbuf_destroy():
+             *   dnode_rele_and_unlock(parent_dbuf, evicting=TRUE):
+             *      if (!cacheable || pending_evict)
+             *          dbuf_destroy()
+             */
             dbuf_destroy(db);

             db_next = AVL_NEXT(&dn->dn_dbufs, db_marker);
@@ -451,7 +464,7 @@ dnode_evict_bonus(dnode_t *dn)
 {
     rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
     if (dn->dn_bonus != NULL) {
-        if (refcount_is_zero(&dn->dn_bonus->db_holds)) {
+        if (zfs_refcount_is_zero(&dn->dn_bonus->db_holds)) {
             mutex_enter(&dn->dn_bonus->db_mtx);
             dbuf_destroy(dn->dn_bonus);
             dn->dn_bonus = NULL;
@@ -489,7 +502,7 @@ dnode_undirty_dbufs(list_t *list)
             list_destroy(&dr->dt.di.dr_children);
         }
         kmem_free(dr, sizeof (dbuf_dirty_record_t));
-        dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg);
+        dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg, B_FALSE);
     }
 }
@@ -517,7 +530,7 @@ dnode_sync_free(dnode_t *dn, dmu_tx_t *tx)
      * zfs_obj_to_path() also depends on this being
      * commented out.
      *
-     * ASSERT3U(refcount_count(&dn->dn_holds), ==, 1);
+     * ASSERT3U(zfs_refcount_count(&dn->dn_holds), ==, 1);
      */

     /* Undirty next bits */
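The new comment in dnode_evict_dbufs() is the key to the marker dance: the destroy callback can remove more nodes than the one being visited, so "save next, then destroy" is unsafe. The same technique in miniature, over a sentinel-based doubly linked list (a generic sketch, not the AVL code above):

#include <stddef.h>

struct node {
    struct node *prev, *next;
    int is_marker;    /* cf. DB_SEARCH: lets other code recognize markers */
};

static void
insert_before(struct node *pos, struct node *m)
{
    m->prev = pos->prev;
    m->next = pos;
    pos->prev->next = m;
    pos->prev = m;
}

static void
unlink_node(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/*
 * 'destroy' must unlink the node it is given and may unlink others
 * (a parent, for instance) -- but never the marker.  The marker is
 * therefore the only safe place to resume the walk from.
 */
static void
evict_all(struct node *head, void (*destroy)(struct node *))
{
    struct node marker = { NULL, NULL, 1 };

    for (struct node *n = head->next; n != head; ) {
        insert_before(n, &marker);
        destroy(n);          /* may remove several nodes */
        n = marker.next;     /* still valid afterwards */
        unlink_node(&marker);
    }
}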

View File

@@ -287,7 +287,7 @@ dsl_dataset_evict_async(void *dbu)
     mutex_destroy(&ds->ds_lock);
     mutex_destroy(&ds->ds_opening_lock);
     mutex_destroy(&ds->ds_sendstream_lock);
-    refcount_destroy(&ds->ds_longholds);
+    zfs_refcount_destroy(&ds->ds_longholds);
     rrw_destroy(&ds->ds_bp_rwlock);

     kmem_free(ds, sizeof (dsl_dataset_t));
@@ -422,7 +422,7 @@ dsl_dataset_hold_obj(dsl_pool_t *dp, uint64_t dsobj, void *tag,
         mutex_init(&ds->ds_opening_lock, NULL, MUTEX_DEFAULT, NULL);
         mutex_init(&ds->ds_sendstream_lock, NULL, MUTEX_DEFAULT, NULL);
         rrw_init(&ds->ds_bp_rwlock, B_FALSE);
-        refcount_create(&ds->ds_longholds);
+        zfs_refcount_create(&ds->ds_longholds);

         bplist_create(&ds->ds_pending_deadlist);
         dsl_deadlist_open(&ds->ds_deadlist,
@@ -458,7 +458,7 @@ dsl_dataset_hold_obj(dsl_pool_t *dp, uint64_t dsobj, void *tag,
             mutex_destroy(&ds->ds_lock);
             mutex_destroy(&ds->ds_opening_lock);
             mutex_destroy(&ds->ds_sendstream_lock);
-            refcount_destroy(&ds->ds_longholds);
+            zfs_refcount_destroy(&ds->ds_longholds);
             bplist_destroy(&ds->ds_pending_deadlist);
             dsl_deadlist_close(&ds->ds_deadlist);
             kmem_free(ds, sizeof (dsl_dataset_t));
@@ -520,7 +520,7 @@ dsl_dataset_hold_obj(dsl_pool_t *dp, uint64_t dsobj, void *tag,
             mutex_destroy(&ds->ds_lock);
             mutex_destroy(&ds->ds_opening_lock);
             mutex_destroy(&ds->ds_sendstream_lock);
-            refcount_destroy(&ds->ds_longholds);
+            zfs_refcount_destroy(&ds->ds_longholds);
             kmem_free(ds, sizeof (dsl_dataset_t));
             if (err != 0) {
                 dmu_buf_rele(dbuf, tag);
@@ -645,20 +645,20 @@ void
 dsl_dataset_long_hold(dsl_dataset_t *ds, void *tag)
 {
     ASSERT(dsl_pool_config_held(ds->ds_dir->dd_pool));
-    (void) refcount_add(&ds->ds_longholds, tag);
+    (void) zfs_refcount_add(&ds->ds_longholds, tag);
 }

 void
 dsl_dataset_long_rele(dsl_dataset_t *ds, void *tag)
 {
-    (void) refcount_remove(&ds->ds_longholds, tag);
+    (void) zfs_refcount_remove(&ds->ds_longholds, tag);
 }

 /* Return B_TRUE if there are any long holds on this dataset. */
 boolean_t
 dsl_dataset_long_held(dsl_dataset_t *ds)
 {
-    return (!refcount_is_zero(&ds->ds_longholds));
+    return (!zfs_refcount_is_zero(&ds->ds_longholds));
 }

 void

View File

@@ -258,7 +258,7 @@ dsl_destroy_snapshot_sync_impl(dsl_dataset_t *ds, boolean_t defer, dmu_tx_t *tx)
     rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG);
     ASSERT3U(dsl_dataset_phys(ds)->ds_bp.blk_birth, <=, tx->tx_txg);
     rrw_exit(&ds->ds_bp_rwlock, FTAG);
-    ASSERT(refcount_is_zero(&ds->ds_longholds));
+    ASSERT(zfs_refcount_is_zero(&ds->ds_longholds));

     if (defer &&
         (ds->ds_userrefs > 0 ||
@@ -619,7 +619,7 @@ dsl_destroy_head_check_impl(dsl_dataset_t *ds, int expected_holds)
     if (ds->ds_is_snapshot)
         return (SET_ERROR(EINVAL));

-    if (refcount_count(&ds->ds_longholds) != expected_holds)
+    if (zfs_refcount_count(&ds->ds_longholds) != expected_holds)
         return (SET_ERROR(EBUSY));

     mos = ds->ds_dir->dd_pool->dp_meta_objset;
@@ -647,7 +647,7 @@ dsl_destroy_head_check_impl(dsl_dataset_t *ds, int expected_holds)
         dsl_dataset_phys(ds->ds_prev)->ds_num_children == 2 &&
         ds->ds_prev->ds_userrefs == 0) {
         /* We need to remove the origin snapshot as well. */
-        if (!refcount_is_zero(&ds->ds_prev->ds_longholds))
+        if (!zfs_refcount_is_zero(&ds->ds_prev->ds_longholds))
             return (SET_ERROR(EBUSY));
     }
     return (0);

View File

@@ -223,7 +223,7 @@ metaslab_class_create(spa_t *spa, metaslab_ops_t *ops)
     mc->mc_rotor = NULL;
     mc->mc_ops = ops;
     mutex_init(&mc->mc_lock, NULL, MUTEX_DEFAULT, NULL);
-    refcount_create_tracked(&mc->mc_alloc_slots);
+    zfs_refcount_create_tracked(&mc->mc_alloc_slots);

     return (mc);
 }
@@ -237,7 +237,7 @@ metaslab_class_destroy(metaslab_class_t *mc)
     ASSERT(mc->mc_space == 0);
     ASSERT(mc->mc_dspace == 0);

-    refcount_destroy(&mc->mc_alloc_slots);
+    zfs_refcount_destroy(&mc->mc_alloc_slots);
     mutex_destroy(&mc->mc_lock);
     kmem_free(mc, sizeof (metaslab_class_t));
 }
@@ -585,7 +585,7 @@ metaslab_group_create(metaslab_class_t *mc, vdev_t *vd)
     mg->mg_activation_count = 0;
     mg->mg_initialized = B_FALSE;
     mg->mg_no_free_space = B_TRUE;
-    refcount_create_tracked(&mg->mg_alloc_queue_depth);
+    zfs_refcount_create_tracked(&mg->mg_alloc_queue_depth);

     mg->mg_taskq = taskq_create("metaslab_group_taskq", metaslab_load_pct,
         maxclsyspri, 10, INT_MAX, TASKQ_THREADS_CPU_PCT | TASKQ_DYNAMIC);
@@ -608,7 +608,7 @@ metaslab_group_destroy(metaslab_group_t *mg)
     taskq_destroy(mg->mg_taskq);
     avl_destroy(&mg->mg_metaslab_tree);
     mutex_destroy(&mg->mg_lock);
-    refcount_destroy(&mg->mg_alloc_queue_depth);
+    zfs_refcount_destroy(&mg->mg_alloc_queue_depth);
     kmem_free(mg, sizeof (metaslab_group_t));
 }
@@ -907,7 +907,7 @@ metaslab_group_allocatable(metaslab_group_t *mg, metaslab_group_t *rotor,
     if (mg->mg_no_free_space)
         return (B_FALSE);

-    qdepth = refcount_count(&mg->mg_alloc_queue_depth);
+    qdepth = zfs_refcount_count(&mg->mg_alloc_queue_depth);

     /*
      * If this metaslab group is below its qmax or it's
@@ -928,7 +928,7 @@ metaslab_group_allocatable(metaslab_group_t *mg, metaslab_group_t *rotor,
     for (mgp = mg->mg_next; mgp != rotor; mgp = mgp->mg_next) {
         qmax = mgp->mg_max_alloc_queue_depth;

-        qdepth = refcount_count(&mgp->mg_alloc_queue_depth);
+        qdepth = zfs_refcount_count(&mgp->mg_alloc_queue_depth);

         /*
          * If there is another metaslab group that
@@ -2663,7 +2663,7 @@ metaslab_group_alloc_increment(spa_t *spa, uint64_t vdev, void *tag, int flags)
     if (!mg->mg_class->mc_alloc_throttle_enabled)
         return;

-    (void) refcount_add(&mg->mg_alloc_queue_depth, tag);
+    (void) zfs_refcount_add(&mg->mg_alloc_queue_depth, tag);
 }

 void
@@ -2679,7 +2679,7 @@ metaslab_group_alloc_decrement(spa_t *spa, uint64_t vdev, void *tag, int flags)
     if (!mg->mg_class->mc_alloc_throttle_enabled)
         return;

-    (void) refcount_remove(&mg->mg_alloc_queue_depth, tag);
+    (void) zfs_refcount_remove(&mg->mg_alloc_queue_depth, tag);
 }

 void
@@ -2693,7 +2693,7 @@ metaslab_group_alloc_verify(spa_t *spa, const blkptr_t *bp, void *tag)
     for (d = 0; d < ndvas; d++) {
         uint64_t vdev = DVA_GET_VDEV(&dva[d]);
         metaslab_group_t *mg = vdev_lookup_top(spa, vdev)->vdev_mg;
-        VERIFY(refcount_not_held(&mg->mg_alloc_queue_depth, tag));
+        VERIFY(zfs_refcount_not_held(&mg->mg_alloc_queue_depth, tag));
     }
 #endif
 }
@@ -3348,7 +3348,7 @@ metaslab_class_throttle_reserve(metaslab_class_t *mc, int slots, zio_t *zio,
     ASSERT(mc->mc_alloc_throttle_enabled);
     mutex_enter(&mc->mc_lock);

-    reserved_slots = refcount_count(&mc->mc_alloc_slots);
+    reserved_slots = zfs_refcount_count(&mc->mc_alloc_slots);
     if (reserved_slots < mc->mc_alloc_max_slots)
         available_slots = mc->mc_alloc_max_slots - reserved_slots;
@@ -3360,7 +3360,8 @@ metaslab_class_throttle_reserve(metaslab_class_t *mc, int slots, zio_t *zio,
          * them individually when an I/O completes.
          */
         for (d = 0; d < slots; d++) {
-            reserved_slots = refcount_add(&mc->mc_alloc_slots, zio);
+            reserved_slots = zfs_refcount_add(&mc->mc_alloc_slots,
                zio);
         }
         zio->io_flags |= ZIO_FLAG_IO_ALLOCATING;
         slot_reserved = B_TRUE;
@@ -3378,7 +3379,7 @@ metaslab_class_throttle_unreserve(metaslab_class_t *mc, int slots, zio_t *zio)
     ASSERT(mc->mc_alloc_throttle_enabled);
     mutex_enter(&mc->mc_lock);

     for (d = 0; d < slots; d++) {
-        (void) refcount_remove(&mc->mc_alloc_slots, zio);
+        (void) zfs_refcount_remove(&mc->mc_alloc_slots, zio);
     }
     mutex_exit(&mc->mc_lock);
 }
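metaslab_class_throttle_reserve()/..._unreserve() above cap the number of in-flight allocation "slots" per class, and the per-zio holder tag is what lets the tracked refcount attribute a leaked slot to its I/O. Stripped of the ZFS types, the reserve/unreserve shape is roughly the following (hypothetical names, pthread mutex standing in for the kernel mutex):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct alloc_class {
    pthread_mutex_t lock;
    int64_t         reserved;     /* cf. mc_alloc_slots */
    int64_t         max_slots;    /* cf. mc_alloc_max_slots */
};

/* All-or-nothing reservation, as in metaslab_class_throttle_reserve(). */
static bool
throttle_reserve(struct alloc_class *c, int64_t slots)
{
    bool ok = false;

    pthread_mutex_lock(&c->lock);
    if (c->reserved + slots <= c->max_slots) {
        c->reserved += slots;    /* real code takes one tagged hold per slot */
        ok = true;
    }
    pthread_mutex_unlock(&c->lock);
    return (ok);
}

static void
throttle_unreserve(struct alloc_class *c, int64_t slots)
{
    pthread_mutex_lock(&c->lock);
    c->reserved -= slots;
    pthread_mutex_unlock(&c->lock);
}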

View File

@@ -38,7 +38,7 @@ static kmem_cache_t *reference_cache;
 static kmem_cache_t *reference_history_cache;

 void
-refcount_init(void)
+zfs_refcount_init(void)
 {
     reference_cache = kmem_cache_create("reference_cache",
         sizeof (reference_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
@@ -48,14 +48,14 @@ refcount_init(void)
 }

 void
-refcount_fini(void)
+zfs_refcount_fini(void)
 {
     kmem_cache_destroy(reference_cache);
     kmem_cache_destroy(reference_history_cache);
 }

 void
-refcount_create(refcount_t *rc)
+zfs_refcount_create(zfs_refcount_t *rc)
 {
     mutex_init(&rc->rc_mtx, NULL, MUTEX_DEFAULT, NULL);
     list_create(&rc->rc_list, sizeof (reference_t),
@@ -68,21 +68,21 @@ refcount_create(refcount_t *rc)
 }

 void
-refcount_create_tracked(refcount_t *rc)
+zfs_refcount_create_tracked(zfs_refcount_t *rc)
 {
-    refcount_create(rc);
+    zfs_refcount_create(rc);
     rc->rc_tracked = B_TRUE;
 }

 void
-refcount_create_untracked(refcount_t *rc)
+zfs_refcount_create_untracked(zfs_refcount_t *rc)
 {
-    refcount_create(rc);
+    zfs_refcount_create(rc);
     rc->rc_tracked = B_FALSE;
 }

 void
-refcount_destroy_many(refcount_t *rc, uint64_t number)
+zfs_refcount_destroy_many(zfs_refcount_t *rc, uint64_t number)
 {
     reference_t *ref;
@@ -103,25 +103,25 @@ refcount_destroy_many(refcount_t *rc, uint64_t number)
 }

 void
-refcount_destroy(refcount_t *rc)
+zfs_refcount_destroy(zfs_refcount_t *rc)
 {
-    refcount_destroy_many(rc, 0);
+    zfs_refcount_destroy_many(rc, 0);
 }

 int
-refcount_is_zero(refcount_t *rc)
+zfs_refcount_is_zero(zfs_refcount_t *rc)
 {
     return (rc->rc_count == 0);
 }

 int64_t
-refcount_count(refcount_t *rc)
+zfs_refcount_count(zfs_refcount_t *rc)
 {
     return (rc->rc_count);
 }

 int64_t
-refcount_add_many(refcount_t *rc, uint64_t number, void *holder)
+zfs_refcount_add_many(zfs_refcount_t *rc, uint64_t number, void *holder)
 {
     reference_t *ref = NULL;
     int64_t count;
@@ -143,13 +143,13 @@ refcount_add_many(refcount_t *rc, uint64_t number, void *holder)
 }

 int64_t
-zfs_refcount_add(refcount_t *rc, void *holder)
+zfs_refcount_add(zfs_refcount_t *rc, void *holder)
 {
-    return (refcount_add_many(rc, 1, holder));
+    return (zfs_refcount_add_many(rc, 1, holder));
 }

 int64_t
-refcount_remove_many(refcount_t *rc, uint64_t number, void *holder)
+zfs_refcount_remove_many(zfs_refcount_t *rc, uint64_t number, void *holder)
 {
     reference_t *ref;
     int64_t count;
@@ -197,13 +197,13 @@ refcount_remove_many(refcount_t *rc, uint64_t number, void *holder)
 }

 int64_t
-refcount_remove(refcount_t *rc, void *holder)
+zfs_refcount_remove(zfs_refcount_t *rc, void *holder)
 {
-    return (refcount_remove_many(rc, 1, holder));
+    return (zfs_refcount_remove_many(rc, 1, holder));
 }

 void
-refcount_transfer(refcount_t *dst, refcount_t *src)
+zfs_refcount_transfer(zfs_refcount_t *dst, zfs_refcount_t *src)
 {
     int64_t count, removed_count;
     list_t list, removed;
@@ -234,7 +234,7 @@ refcount_transfer(refcount_t *dst, refcount_t *src)
 }

 void
-refcount_transfer_ownership(refcount_t *rc, void *current_holder,
+zfs_refcount_transfer_ownership(zfs_refcount_t *rc, void *current_holder,
     void *new_holder)
 {
     reference_t *ref;
@@ -264,7 +264,7 @@ refcount_transfer_ownership(refcount_t *rc, void *current_holder,
  * might be held.
  */
 boolean_t
-refcount_held(refcount_t *rc, void *holder)
+zfs_refcount_held(zfs_refcount_t *rc, void *holder)
 {
     reference_t *ref;
@@ -292,7 +292,7 @@ refcount_held(refcount_t *rc, void *holder)
  * since the reference might not be held.
  */
 boolean_t
-refcount_not_held(refcount_t *rc, void *holder)
+zfs_refcount_not_held(zfs_refcount_t *rc, void *holder)
 {
     reference_t *ref;
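Taken together, the renames above define the public shape of the new API. A kernel-context usage sketch, using only signatures visible in this diff (FTAG is the usual ZFS holder tag; the function itself is hypothetical and assumes the ZFS headers):

static void
example_hold_cycle(void)
{
    zfs_refcount_t rc;

    zfs_refcount_create(&rc);               /* or ..._create_tracked() */
    (void) zfs_refcount_add(&rc, FTAG);     /* take one tagged hold */
    ASSERT3S(zfs_refcount_count(&rc), ==, 1);

    if (zfs_refcount_remove(&rc, FTAG) == 0) {
        /* last hold dropped: safe to tear down the object */
    }
    zfs_refcount_destroy(&rc);              /* expects the count to be zero */
}

The holder tag is what distinguishes this from a bare counter: in tracked builds every outstanding hold is recorded together with its tag, so a leaked reference can be attributed to the code path that took it.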

View File

@@ -85,7 +85,7 @@ rrn_find(rrwlock_t *rrl)
 {
     rrw_node_t *rn;

-    if (refcount_count(&rrl->rr_linked_rcount) == 0)
+    if (zfs_refcount_count(&rrl->rr_linked_rcount) == 0)
         return (NULL);

     for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
@@ -120,7 +120,7 @@ rrn_find_and_remove(rrwlock_t *rrl, void *tag)
     rrw_node_t *rn;
     rrw_node_t *prev = NULL;

-    if (refcount_count(&rrl->rr_linked_rcount) == 0)
+    if (zfs_refcount_count(&rrl->rr_linked_rcount) == 0)
         return (B_FALSE);

     for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
@@ -143,8 +143,8 @@ rrw_init(rrwlock_t *rrl, boolean_t track_all)
     mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL);
     cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL);
     rrl->rr_writer = NULL;
-    refcount_create(&rrl->rr_anon_rcount);
-    refcount_create(&rrl->rr_linked_rcount);
+    zfs_refcount_create(&rrl->rr_anon_rcount);
+    zfs_refcount_create(&rrl->rr_linked_rcount);
     rrl->rr_writer_wanted = B_FALSE;
     rrl->rr_track_all = track_all;
 }
@@ -155,8 +155,8 @@ rrw_destroy(rrwlock_t *rrl)
     mutex_destroy(&rrl->rr_lock);
     cv_destroy(&rrl->rr_cv);
     ASSERT(rrl->rr_writer == NULL);
-    refcount_destroy(&rrl->rr_anon_rcount);
-    refcount_destroy(&rrl->rr_linked_rcount);
+    zfs_refcount_destroy(&rrl->rr_anon_rcount);
+    zfs_refcount_destroy(&rrl->rr_linked_rcount);
 }

 static void
@@ -173,19 +173,19 @@ rrw_enter_read_impl(rrwlock_t *rrl, boolean_t prio, void *tag)
     DTRACE_PROBE(zfs__rrwfastpath__rdmiss);
 #endif
     ASSERT(rrl->rr_writer != curthread);
-    ASSERT(refcount_count(&rrl->rr_anon_rcount) >= 0);
+    ASSERT(zfs_refcount_count(&rrl->rr_anon_rcount) >= 0);

     while (rrl->rr_writer != NULL || (rrl->rr_writer_wanted &&
-        refcount_is_zero(&rrl->rr_anon_rcount) && !prio &&
+        zfs_refcount_is_zero(&rrl->rr_anon_rcount) && !prio &&
         rrn_find(rrl) == NULL))
         cv_wait(&rrl->rr_cv, &rrl->rr_lock);

     if (rrl->rr_writer_wanted || rrl->rr_track_all) {
         /* may or may not be a re-entrant enter */
         rrn_add(rrl, tag);
-        (void) refcount_add(&rrl->rr_linked_rcount, tag);
+        (void) zfs_refcount_add(&rrl->rr_linked_rcount, tag);
     } else {
-        (void) refcount_add(&rrl->rr_anon_rcount, tag);
+        (void) zfs_refcount_add(&rrl->rr_anon_rcount, tag);
     }
     ASSERT(rrl->rr_writer == NULL);
     mutex_exit(&rrl->rr_lock);
@@ -216,8 +216,8 @@ rrw_enter_write(rrwlock_t *rrl)
     mutex_enter(&rrl->rr_lock);
     ASSERT(rrl->rr_writer != curthread);

-    while (refcount_count(&rrl->rr_anon_rcount) > 0 ||
-        refcount_count(&rrl->rr_linked_rcount) > 0 ||
+    while (zfs_refcount_count(&rrl->rr_anon_rcount) > 0 ||
+        zfs_refcount_count(&rrl->rr_linked_rcount) > 0 ||
         rrl->rr_writer != NULL) {
         rrl->rr_writer_wanted = B_TRUE;
         cv_wait(&rrl->rr_cv, &rrl->rr_lock);
@@ -250,24 +250,25 @@ rrw_exit(rrwlock_t *rrl, void *tag)
     }
     DTRACE_PROBE(zfs__rrwfastpath__exitmiss);
 #endif
-    ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) ||
-        !refcount_is_zero(&rrl->rr_linked_rcount) ||
+    ASSERT(!zfs_refcount_is_zero(&rrl->rr_anon_rcount) ||
+        !zfs_refcount_is_zero(&rrl->rr_linked_rcount) ||
         rrl->rr_writer != NULL);

     if (rrl->rr_writer == NULL) {
         int64_t count;
         if (rrn_find_and_remove(rrl, tag)) {
-            count = refcount_remove(&rrl->rr_linked_rcount, tag);
+            count = zfs_refcount_remove(
+                &rrl->rr_linked_rcount, tag);
         } else {
             ASSERT(!rrl->rr_track_all);
-            count = refcount_remove(&rrl->rr_anon_rcount, tag);
+            count = zfs_refcount_remove(&rrl->rr_anon_rcount, tag);
         }
         if (count == 0)
             cv_broadcast(&rrl->rr_cv);
     } else {
         ASSERT(rrl->rr_writer == curthread);
-        ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) &&
-            refcount_is_zero(&rrl->rr_linked_rcount));
+        ASSERT(zfs_refcount_is_zero(&rrl->rr_anon_rcount) &&
+            zfs_refcount_is_zero(&rrl->rr_linked_rcount));
         rrl->rr_writer = NULL;
         cv_broadcast(&rrl->rr_cv);
     }
@@ -288,7 +289,7 @@ rrw_held(rrwlock_t *rrl, krw_t rw)
     if (rw == RW_WRITER) {
         held = (rrl->rr_writer == curthread);
     } else {
-        held = (!refcount_is_zero(&rrl->rr_anon_rcount) ||
+        held = (!zfs_refcount_is_zero(&rrl->rr_anon_rcount) ||
             rrn_find(rrl) != NULL);
     }
     mutex_exit(&rrl->rr_lock);

View File

@@ -1132,7 +1132,7 @@ sa_tear_down(objset_t *os)
         avl_destroy_nodes(&sa->sa_layout_hash_tree, &cookie))) {
         sa_idx_tab_t *tab;
         while ((tab = list_head(&layout->lot_idx_tab))) {
-            ASSERT(refcount_count(&tab->sa_refcount));
+            ASSERT(zfs_refcount_count(&tab->sa_refcount));
             sa_idx_tab_rele(os, tab);
         }
     }
@@ -1317,13 +1317,13 @@ sa_idx_tab_rele(objset_t *os, void *arg)
         return;

     mutex_enter(&sa->sa_lock);
-    if (refcount_remove(&idx_tab->sa_refcount, NULL) == 0) {
+    if (zfs_refcount_remove(&idx_tab->sa_refcount, NULL) == 0) {
         list_remove(&idx_tab->sa_layout->lot_idx_tab, idx_tab);
         if (idx_tab->sa_variable_lengths)
             kmem_free(idx_tab->sa_variable_lengths,
                 sizeof (uint16_t) *
                 idx_tab->sa_layout->lot_var_sizes);
-        refcount_destroy(&idx_tab->sa_refcount);
+        zfs_refcount_destroy(&idx_tab->sa_refcount);
         kmem_free(idx_tab->sa_idx_tab,
             sizeof (uint32_t) * sa->sa_num_attrs);
         kmem_free(idx_tab, sizeof (sa_idx_tab_t));
@@ -1337,7 +1337,7 @@ sa_idx_tab_hold(objset_t *os, sa_idx_tab_t *idx_tab)
     ASSERTV(sa_os_t *sa = os->os_sa);

     ASSERT(MUTEX_HELD(&sa->sa_lock));
-    (void) refcount_add(&idx_tab->sa_refcount, NULL);
+    (void) zfs_refcount_add(&idx_tab->sa_refcount, NULL);
 }

 void
@@ -1560,7 +1560,7 @@ sa_find_idx_tab(objset_t *os, dmu_object_type_t bonustype, sa_hdr_phys_t *hdr)
     idx_tab->sa_idx_tab =
         kmem_zalloc(sizeof (uint32_t) * sa->sa_num_attrs, KM_SLEEP);
     idx_tab->sa_layout = tb;
-    refcount_create(&idx_tab->sa_refcount);
+    zfs_refcount_create(&idx_tab->sa_refcount);
     if (tb->lot_var_sizes)
         idx_tab->sa_variable_lengths = kmem_alloc(sizeof (uint16_t) *
             tb->lot_var_sizes, KM_SLEEP);

View File

@@ -2302,7 +2302,7 @@ spa_load(spa_t *spa, spa_load_state_t state, spa_import_type_t type,
      * and are making their way through the eviction process.
      */
     spa_evicting_os_wait(spa);
-    spa->spa_minref = refcount_count(&spa->spa_refcount);
+    spa->spa_minref = zfs_refcount_count(&spa->spa_refcount);
     if (error) {
         if (error != EEXIST) {
             spa->spa_loaded_ts.tv_sec = 0;
@@ -4260,7 +4260,7 @@ spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props,
      * and are making their way through the eviction process.
      */
     spa_evicting_os_wait(spa);
-    spa->spa_minref = refcount_count(&spa->spa_refcount);
+    spa->spa_minref = zfs_refcount_count(&spa->spa_refcount);
     spa->spa_load_state = SPA_LOAD_NONE;

     mutex_exit(&spa_namespace_lock);
@@ -6852,12 +6852,12 @@ spa_sync(spa_t *spa, uint64_t txg)
          * allocations look at mg_max_alloc_queue_depth, and async
          * allocations all happen from spa_sync().
          */
-        ASSERT0(refcount_count(&mg->mg_alloc_queue_depth));
+        ASSERT0(zfs_refcount_count(&mg->mg_alloc_queue_depth));
         mg->mg_max_alloc_queue_depth = max_queue_depth;
         queue_depth_total += mg->mg_max_alloc_queue_depth;
     }
     mc = spa_normal_class(spa);
-    ASSERT0(refcount_count(&mc->mc_alloc_slots));
+    ASSERT0(zfs_refcount_count(&mc->mc_alloc_slots));
     mc->mc_alloc_max_slots = queue_depth_total;
     mc->mc_alloc_throttle_enabled = zio_dva_throttle_enabled;

View File

@@ -80,7 +80,7 @@
  * definition they must have an existing reference, and will never need
  * to lookup a spa_t by name.
  *
- * spa_refcount (per-spa refcount_t protected by mutex)
+ * spa_refcount (per-spa zfs_refcount_t protected by mutex)
  *
  * This reference count keep track of any active users of the spa_t. The
  * spa_t cannot be destroyed or freed while this is non-zero. Internally,
@@ -366,7 +366,7 @@ spa_config_lock_init(spa_t *spa)
         spa_config_lock_t *scl = &spa->spa_config_lock[i];
         mutex_init(&scl->scl_lock, NULL, MUTEX_DEFAULT, NULL);
         cv_init(&scl->scl_cv, NULL, CV_DEFAULT, NULL);
-        refcount_create_untracked(&scl->scl_count);
+        zfs_refcount_create_untracked(&scl->scl_count);
         scl->scl_writer = NULL;
         scl->scl_write_wanted = 0;
     }
@@ -381,7 +381,7 @@ spa_config_lock_destroy(spa_t *spa)
         spa_config_lock_t *scl = &spa->spa_config_lock[i];
         mutex_destroy(&scl->scl_lock);
         cv_destroy(&scl->scl_cv);
-        refcount_destroy(&scl->scl_count);
+        zfs_refcount_destroy(&scl->scl_count);
         ASSERT(scl->scl_writer == NULL);
         ASSERT(scl->scl_write_wanted == 0);
     }
@@ -406,7 +406,7 @@ spa_config_tryenter(spa_t *spa, int locks, void *tag, krw_t rw)
             }
         } else {
             ASSERT(scl->scl_writer != curthread);
-            if (!refcount_is_zero(&scl->scl_count)) {
+            if (!zfs_refcount_is_zero(&scl->scl_count)) {
                 mutex_exit(&scl->scl_lock);
                 spa_config_exit(spa, locks & ((1 << i) - 1),
                     tag);
@@ -414,7 +414,7 @@ spa_config_tryenter(spa_t *spa, int locks, void *tag, krw_t rw)
             }
             scl->scl_writer = curthread;
         }
-        (void) refcount_add(&scl->scl_count, tag);
+        (void) zfs_refcount_add(&scl->scl_count, tag);
         mutex_exit(&scl->scl_lock);
     }
     return (1);
@@ -441,14 +441,14 @@ spa_config_enter(spa_t *spa, int locks, void *tag, krw_t rw)
             }
         } else {
             ASSERT(scl->scl_writer != curthread);
-            while (!refcount_is_zero(&scl->scl_count)) {
+            while (!zfs_refcount_is_zero(&scl->scl_count)) {
                 scl->scl_write_wanted++;
                 cv_wait(&scl->scl_cv, &scl->scl_lock);
                 scl->scl_write_wanted--;
             }
             scl->scl_writer = curthread;
         }
-        (void) refcount_add(&scl->scl_count, tag);
+        (void) zfs_refcount_add(&scl->scl_count, tag);
         mutex_exit(&scl->scl_lock);
     }
     ASSERT(wlocks_held <= locks);
@@ -464,8 +464,8 @@ spa_config_exit(spa_t *spa, int locks, void *tag)
         if (!(locks & (1 << i)))
             continue;
         mutex_enter(&scl->scl_lock);
-        ASSERT(!refcount_is_zero(&scl->scl_count));
-        if (refcount_remove(&scl->scl_count, tag) == 0) {
+        ASSERT(!zfs_refcount_is_zero(&scl->scl_count));
+        if (zfs_refcount_remove(&scl->scl_count, tag) == 0) {
             ASSERT(scl->scl_writer == NULL ||
                 scl->scl_writer == curthread);
             scl->scl_writer = NULL;    /* OK in either case */
@@ -484,7 +484,8 @@ spa_config_held(spa_t *spa, int locks, krw_t rw)
         spa_config_lock_t *scl = &spa->spa_config_lock[i];
         if (!(locks & (1 << i)))
             continue;
-        if ((rw == RW_READER && !refcount_is_zero(&scl->scl_count)) ||
+        if ((rw == RW_READER &&
+            !zfs_refcount_is_zero(&scl->scl_count)) ||
             (rw == RW_WRITER && scl->scl_writer == curthread))
             locks_held |= 1 << i;
     }
@@ -602,7 +603,7 @@ spa_add(const char *name, nvlist_t *config, const char *altroot)
     spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);

-    refcount_create(&spa->spa_refcount);
+    zfs_refcount_create(&spa->spa_refcount);
     spa_config_lock_init(spa);
     spa_stats_init(spa);
@@ -680,7 +681,7 @@ spa_remove(spa_t *spa)
     ASSERT(MUTEX_HELD(&spa_namespace_lock));
     ASSERT(spa->spa_state == POOL_STATE_UNINITIALIZED);
-    ASSERT3U(refcount_count(&spa->spa_refcount), ==, 0);
+    ASSERT3U(zfs_refcount_count(&spa->spa_refcount), ==, 0);

     nvlist_free(spa->spa_config_splitting);
@@ -705,7 +706,7 @@ spa_remove(spa_t *spa)
     nvlist_free(spa->spa_feat_stats);

     spa_config_set(spa, NULL);

-    refcount_destroy(&spa->spa_refcount);
+    zfs_refcount_destroy(&spa->spa_refcount);

     spa_stats_destroy(spa);
     spa_config_lock_destroy(spa);
@@ -766,9 +767,9 @@ spa_next(spa_t *prev)
 void
 spa_open_ref(spa_t *spa, void *tag)
 {
-    ASSERT(refcount_count(&spa->spa_refcount) >= spa->spa_minref ||
+    ASSERT(zfs_refcount_count(&spa->spa_refcount) >= spa->spa_minref ||
         MUTEX_HELD(&spa_namespace_lock));
-    (void) refcount_add(&spa->spa_refcount, tag);
+    (void) zfs_refcount_add(&spa->spa_refcount, tag);
 }

 /*
@@ -778,9 +779,9 @@ spa_open_ref(spa_t *spa, void *tag)
 void
 spa_close(spa_t *spa, void *tag)
{ {
ASSERT(refcount_count(&spa->spa_refcount) > spa->spa_minref || ASSERT(zfs_refcount_count(&spa->spa_refcount) > spa->spa_minref ||
MUTEX_HELD(&spa_namespace_lock)); MUTEX_HELD(&spa_namespace_lock));
(void) refcount_remove(&spa->spa_refcount, tag); (void) zfs_refcount_remove(&spa->spa_refcount, tag);
} }
/* /*
@ -794,7 +795,7 @@ spa_close(spa_t *spa, void *tag)
void void
spa_async_close(spa_t *spa, void *tag) spa_async_close(spa_t *spa, void *tag)
{ {
(void) refcount_remove(&spa->spa_refcount, tag); (void) zfs_refcount_remove(&spa->spa_refcount, tag);
} }
/* /*
@ -807,7 +808,7 @@ spa_refcount_zero(spa_t *spa)
{ {
ASSERT(MUTEX_HELD(&spa_namespace_lock)); ASSERT(MUTEX_HELD(&spa_namespace_lock));
return (refcount_count(&spa->spa_refcount) == spa->spa_minref); return (zfs_refcount_count(&spa->spa_refcount) == spa->spa_minref);
} }
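Taken together, spa_open_ref(), spa_close(), and spa_refcount_zero() above are the whole lifetime protocol for a spa_t. A minimal sketch of a caller, assuming only the renamed zfs_refcount_* API used in these hunks; the wrapper function below is hypothetical, not part of the change:

	/* Hypothetical caller: hold the spa_t for the duration of use. */
	static void
	example_spa_user(spa_t *spa)
	{
		spa_open_ref(spa, FTAG);	/* zfs_refcount_add() inside */
		/* ... the spa_t cannot be freed while the hold is live ... */
		spa_close(spa, FTAG);		/* zfs_refcount_remove() inside */
	}

The rename itself is mechanical; the discipline (every hold tagged, every tag removed exactly once) is what the tracked refcount enforces in debug builds.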
/* /*
@ -1878,7 +1879,7 @@ spa_init(int mode)
#endif #endif
fm_init(); fm_init();
refcount_init(); zfs_refcount_init();
unique_init(); unique_init();
range_tree_init(); range_tree_init();
metaslab_alloc_trace_init(); metaslab_alloc_trace_init();
@ -1914,7 +1915,7 @@ spa_fini(void)
metaslab_alloc_trace_fini(); metaslab_alloc_trace_fini();
range_tree_fini(); range_tree_fini();
unique_fini(); unique_fini();
refcount_fini(); zfs_refcount_fini();
fm_fini(); fm_fini();
qat_fini(); qat_fini();


@ -21,7 +21,7 @@
/* /*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2018 by Delphix. All rights reserved. * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
* Copyright 2017 Nexenta Systems, Inc. * Copyright 2017 Nexenta Systems, Inc.
* Copyright (c) 2014 Integros [integros.com] * Copyright (c) 2014 Integros [integros.com]
* Copyright 2016 Toomas Soome <tsoome@me.com> * Copyright 2016 Toomas Soome <tsoome@me.com>
@ -3039,6 +3039,7 @@ vdev_get_stats_ex(vdev_t *vd, vdev_stat_t *vs, vdev_stat_ex_t *vsx)
vd->vdev_max_asize - vd->vdev_asize, vd->vdev_max_asize - vd->vdev_asize,
1ULL << tvd->vdev_ms_shift); 1ULL << tvd->vdev_ms_shift);
} }
vs->vs_esize = vd->vdev_max_asize - vd->vdev_asize;
if (vd->vdev_aux == NULL && vd == vd->vdev_top && if (vd->vdev_aux == NULL && vd == vd->vdev_top &&
!vd->vdev_ishole) { !vd->vdev_ishole) {
vs->vs_fragmentation = vd->vdev_mg->mg_fragmentation; vs->vs_fragmentation = vd->vdev_mg->mg_fragmentation;


@ -23,7 +23,7 @@
* Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). * Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
* Rewritten for Linux by Brian Behlendorf <behlendorf1@llnl.gov>. * Rewritten for Linux by Brian Behlendorf <behlendorf1@llnl.gov>.
* LLNL-CODE-403049. * LLNL-CODE-403049.
* Copyright (c) 2012, 2018 by Delphix. All rights reserved. * Copyright (c) 2012, 2015 by Delphix. All rights reserved.
*/ */
#include <sys/zfs_context.h> #include <sys/zfs_context.h>
@ -35,14 +35,11 @@
#include <sys/zio.h> #include <sys/zio.h>
#include <sys/sunldi.h> #include <sys/sunldi.h>
#include <linux/mod_compat.h> #include <linux/mod_compat.h>
#include <linux/msdos_fs.h> #include <linux/vfs_compat.h>
char *zfs_vdev_scheduler = VDEV_SCHEDULER; char *zfs_vdev_scheduler = VDEV_SCHEDULER;
static void *zfs_vdev_holder = VDEV_HOLDER; static void *zfs_vdev_holder = VDEV_HOLDER;
/* size of the "reserved" partition, in blocks */
#define EFI_MIN_RESV_SIZE (16 * 1024)
/* /*
* Virtual device vector for disks. * Virtual device vector for disks.
*/ */
@ -80,45 +77,23 @@ vdev_bdev_mode(int smode)
ASSERT3S(smode & (FREAD | FWRITE), !=, 0); ASSERT3S(smode & (FREAD | FWRITE), !=, 0);
if ((smode & FREAD) && !(smode & FWRITE)) if ((smode & FREAD) && !(smode & FWRITE))
mode = MS_RDONLY; mode = SB_RDONLY;
return (mode); return (mode);
} }
#endif /* HAVE_OPEN_BDEV_EXCLUSIVE */ #endif /* HAVE_OPEN_BDEV_EXCLUSIVE */
/* The capacity (in bytes) of a bdev that is available to be used by a vdev */
static uint64_t static uint64_t
bdev_capacity(struct block_device *bdev, boolean_t wholedisk) bdev_capacity(struct block_device *bdev)
{ {
struct hd_struct *part = bdev->bd_part; struct hd_struct *part = bdev->bd_part;
uint64_t sectors = get_capacity(bdev->bd_disk);
/* If there are no partitions, return the entire device capacity */
if (part == NULL)
return (sectors << SECTOR_BITS);
/* /* The partition capacity referenced by the block device */
* If there are partitions, decide if we are using a `wholedisk` if (part)
* layout (composed of part1 and part9) or just a single partition. return (part->nr_sects << 9);
*/
if (wholedisk) {
/* Verify the expected device layout */
ASSERT3P(bdev, !=, bdev->bd_contains);
/*
* Sectors used by the EFI partition (part9) as well as
* partition alignment.
*/
uint64_t used = EFI_MIN_RESV_SIZE + NEW_START_BLOCK +
PARTITION_END_ALIGNMENT;
/* Space available to the vdev, i.e. the size of part1 */ /* Otherwise assume the full device capacity */
if (sectors <= used) return (get_capacity(bdev->bd_disk) << 9);
return (0);
uint64_t available = sectors - used;
return (available << SECTOR_BITS);
} else {
/* The partition capacity referenced by the block device */
return (part->nr_sects << SECTOR_BITS);
}
} }
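The rewritten bdev_capacity() reduces to two cases: report the partition size when the block device references a partition, otherwise the whole disk. A standalone restatement of the arithmetic, assuming 512-byte sectors (the << 9, which the removed code spelled SECTOR_BITS); part_sectors and disk_sectors are stand-ins for part->nr_sects and get_capacity():

	static uint64_t
	capacity_bytes(uint64_t part_sectors, uint64_t disk_sectors)
	{
		/* the bdev references a partition: its size in bytes */
		if (part_sectors != 0)
			return (part_sectors << 9);
		/* otherwise assume the full device capacity */
		return (disk_sectors << 9);
	}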
static void static void
@ -352,7 +327,9 @@ skip_open:
v->vdev_nonrot = blk_queue_nonrot(bdev_get_queue(vd->vd_bdev)); v->vdev_nonrot = blk_queue_nonrot(bdev_get_queue(vd->vd_bdev));
/* Physical volume size in bytes */ /* Physical volume size in bytes */
*psize = bdev_capacity(vd->vd_bdev, v->vdev_wholedisk); *psize = bdev_capacity(vd->vd_bdev);
/* TODO: report possible expansion size */
*max_psize = *psize; *max_psize = *psize;
/* Based on the minimum sector size set the block size */ /* Based on the minimum sector size set the block size */
@ -525,13 +502,38 @@ vdev_submit_bio_impl(struct bio *bio)
#endif #endif
} }
#ifndef HAVE_BIO_SET_DEV #ifdef HAVE_BIO_SET_DEV
#if defined(CONFIG_BLK_CGROUP) && defined(HAVE_BIO_SET_DEV_GPL_ONLY)
/*
* The Linux 5.0 kernel updated the bio_set_dev() macro so it calls the
* GPL-only bio_associate_blkg() symbol thus inadvertently converting
* the entire macro. Provide a minimal version which always assigns the
* request queue's root_blkg to the bio.
*/
static inline void
vdev_bio_associate_blkg(struct bio *bio)
{
struct request_queue *q = bio->bi_disk->queue;
ASSERT3P(q, !=, NULL);
ASSERT3P(q->root_blkg, !=, NULL);
ASSERT3P(bio->bi_blkg, ==, NULL);
if (blkg_tryget(q->root_blkg))
bio->bi_blkg = q->root_blkg;
}
#define bio_associate_blkg vdev_bio_associate_blkg
#endif
#else
/*
* Provide a bio_set_dev() helper macro for pre-Linux 4.14 kernels.
*/
static inline void static inline void
bio_set_dev(struct bio *bio, struct block_device *bdev) bio_set_dev(struct bio *bio, struct block_device *bdev)
{ {
bio->bi_bdev = bdev; bio->bi_bdev = bdev;
} }
#endif /* !HAVE_BIO_SET_DEV */ #endif /* HAVE_BIO_SET_DEV */
static inline void static inline void
vdev_submit_bio(struct bio *bio) vdev_submit_bio(struct bio *bio)
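Whichever branch of the compat block above is compiled, callers see a single bio_set_dev() with the upstream signature. A minimal sketch of the intended call pattern; the function is hypothetical and error handling is elided:

	/* Hypothetical call site: identical source on pre-4.14 kernels
	 * (fallback inline), 4.14+ (native macro), and 5.0+ (wrapped to
	 * avoid the GPL-only bio_associate_blkg() path). */
	static void
	example_submit(struct bio *bio, struct block_device *bdev)
	{
		bio_set_dev(bio, bdev);
		vdev_submit_bio(bio);
	}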


@ -120,7 +120,7 @@ typedef struct {
taskqid_t se_taskqid; /* scheduled unmount taskqid */ taskqid_t se_taskqid; /* scheduled unmount taskqid */
avl_node_t se_node_name; /* zfs_snapshots_by_name link */ avl_node_t se_node_name; /* zfs_snapshots_by_name link */
avl_node_t se_node_objsetid; /* zfs_snapshots_by_objsetid link */ avl_node_t se_node_objsetid; /* zfs_snapshots_by_objsetid link */
refcount_t se_refcount; /* reference count */ zfs_refcount_t se_refcount; /* reference count */
} zfs_snapentry_t; } zfs_snapentry_t;
static void zfsctl_snapshot_unmount_delay_impl(zfs_snapentry_t *se, int delay); static void zfsctl_snapshot_unmount_delay_impl(zfs_snapentry_t *se, int delay);
@ -144,7 +144,7 @@ zfsctl_snapshot_alloc(char *full_name, char *full_path, spa_t *spa,
se->se_root_dentry = root_dentry; se->se_root_dentry = root_dentry;
se->se_taskqid = TASKQID_INVALID; se->se_taskqid = TASKQID_INVALID;
refcount_create(&se->se_refcount); zfs_refcount_create(&se->se_refcount);
return (se); return (se);
} }
@ -156,7 +156,7 @@ zfsctl_snapshot_alloc(char *full_name, char *full_path, spa_t *spa,
static void static void
zfsctl_snapshot_free(zfs_snapentry_t *se) zfsctl_snapshot_free(zfs_snapentry_t *se)
{ {
refcount_destroy(&se->se_refcount); zfs_refcount_destroy(&se->se_refcount);
strfree(se->se_name); strfree(se->se_name);
strfree(se->se_path); strfree(se->se_path);
@ -169,7 +169,7 @@ zfsctl_snapshot_free(zfs_snapentry_t *se)
static void static void
zfsctl_snapshot_hold(zfs_snapentry_t *se) zfsctl_snapshot_hold(zfs_snapentry_t *se)
{ {
refcount_add(&se->se_refcount, NULL); zfs_refcount_add(&se->se_refcount, NULL);
} }
/* /*
@ -179,7 +179,7 @@ zfsctl_snapshot_hold(zfs_snapentry_t *se)
static void static void
zfsctl_snapshot_rele(zfs_snapentry_t *se) zfsctl_snapshot_rele(zfs_snapentry_t *se)
{ {
if (refcount_remove(&se->se_refcount, NULL) == 0) if (zfs_refcount_remove(&se->se_refcount, NULL) == 0)
zfsctl_snapshot_free(se); zfsctl_snapshot_free(se);
} }
@ -192,7 +192,7 @@ static void
zfsctl_snapshot_add(zfs_snapentry_t *se) zfsctl_snapshot_add(zfs_snapentry_t *se)
{ {
ASSERT(RW_WRITE_HELD(&zfs_snapshot_lock)); ASSERT(RW_WRITE_HELD(&zfs_snapshot_lock));
refcount_add(&se->se_refcount, NULL); zfs_refcount_add(&se->se_refcount, NULL);
avl_add(&zfs_snapshots_by_name, se); avl_add(&zfs_snapshots_by_name, se);
avl_add(&zfs_snapshots_by_objsetid, se); avl_add(&zfs_snapshots_by_objsetid, se);
} }
@ -269,7 +269,7 @@ zfsctl_snapshot_find_by_name(char *snapname)
search.se_name = snapname; search.se_name = snapname;
se = avl_find(&zfs_snapshots_by_name, &search, NULL); se = avl_find(&zfs_snapshots_by_name, &search, NULL);
if (se) if (se)
refcount_add(&se->se_refcount, NULL); zfs_refcount_add(&se->se_refcount, NULL);
return (se); return (se);
} }
@ -290,7 +290,7 @@ zfsctl_snapshot_find_by_objsetid(spa_t *spa, uint64_t objsetid)
search.se_objsetid = objsetid; search.se_objsetid = objsetid;
se = avl_find(&zfs_snapshots_by_objsetid, &search, NULL); se = avl_find(&zfs_snapshots_by_objsetid, &search, NULL);
if (se) if (se)
refcount_add(&se->se_refcount, NULL); zfs_refcount_add(&se->se_refcount, NULL);
return (se); return (se);
} }


@ -6634,11 +6634,14 @@ static const struct file_operations zfsdev_fops = {
}; };
static struct miscdevice zfs_misc = { static struct miscdevice zfs_misc = {
.minor = MISC_DYNAMIC_MINOR, .minor = ZFS_DEVICE_MINOR,
.name = ZFS_DRIVER, .name = ZFS_DRIVER,
.fops = &zfsdev_fops, .fops = &zfsdev_fops,
}; };
MODULE_ALIAS_MISCDEV(ZFS_DEVICE_MINOR);
MODULE_ALIAS("devname:zfs");
static int static int
zfs_attach(void) zfs_attach(void)
{ {
@ -6649,12 +6652,24 @@ zfs_attach(void)
zfsdev_state_list->zs_minor = -1; zfsdev_state_list->zs_minor = -1;
error = misc_register(&zfs_misc); error = misc_register(&zfs_misc);
if (error != 0) { if (error == -EBUSY) {
printk(KERN_INFO "ZFS: misc_register() failed %d\n", error); /*
return (error); * Fallback to dynamic minor allocation in the event of a
* collision with a reserved minor in linux/miscdevice.h.
* In this case the kernel modules must be manually loaded.
*/
printk(KERN_INFO "ZFS: misc_register() with static minor %d "
"failed %d, retrying with MISC_DYNAMIC_MINOR\n",
ZFS_DEVICE_MINOR, error);
zfs_misc.minor = MISC_DYNAMIC_MINOR;
error = misc_register(&zfs_misc);
} }
return (0); if (error)
printk(KERN_INFO "ZFS: misc_register() failed %d\n", error);
return (error);
} }
static void static void


@ -66,6 +66,7 @@
#include <sys/dmu_objset.h> #include <sys/dmu_objset.h>
#include <sys/spa_boot.h> #include <sys/spa_boot.h>
#include <sys/zpl.h> #include <sys/zpl.h>
#include <linux/vfs_compat.h>
#include "zfs_comutil.h" #include "zfs_comutil.h"
enum { enum {
@ -259,7 +260,7 @@ zfsvfs_parse_options(char *mntopts, vfs_t **vfsp)
boolean_t boolean_t
zfs_is_readonly(zfsvfs_t *zfsvfs) zfs_is_readonly(zfsvfs_t *zfsvfs)
{ {
return (!!(zfsvfs->z_sb->s_flags & MS_RDONLY)); return (!!(zfsvfs->z_sb->s_flags & SB_RDONLY));
} }
/*ARGSUSED*/ /*ARGSUSED*/
@ -353,15 +354,15 @@ acltype_changed_cb(void *arg, uint64_t newval)
switch (newval) { switch (newval) {
case ZFS_ACLTYPE_OFF: case ZFS_ACLTYPE_OFF:
zfsvfs->z_acl_type = ZFS_ACLTYPE_OFF; zfsvfs->z_acl_type = ZFS_ACLTYPE_OFF;
zfsvfs->z_sb->s_flags &= ~MS_POSIXACL; zfsvfs->z_sb->s_flags &= ~SB_POSIXACL;
break; break;
case ZFS_ACLTYPE_POSIXACL: case ZFS_ACLTYPE_POSIXACL:
#ifdef CONFIG_FS_POSIX_ACL #ifdef CONFIG_FS_POSIX_ACL
zfsvfs->z_acl_type = ZFS_ACLTYPE_POSIXACL; zfsvfs->z_acl_type = ZFS_ACLTYPE_POSIXACL;
zfsvfs->z_sb->s_flags |= MS_POSIXACL; zfsvfs->z_sb->s_flags |= SB_POSIXACL;
#else #else
zfsvfs->z_acl_type = ZFS_ACLTYPE_OFF; zfsvfs->z_acl_type = ZFS_ACLTYPE_OFF;
zfsvfs->z_sb->s_flags &= ~MS_POSIXACL; zfsvfs->z_sb->s_flags &= ~SB_POSIXACL;
#endif /* CONFIG_FS_POSIX_ACL */ #endif /* CONFIG_FS_POSIX_ACL */
break; break;
default: default:
@ -390,9 +391,9 @@ readonly_changed_cb(void *arg, uint64_t newval)
return; return;
if (newval) if (newval)
sb->s_flags |= MS_RDONLY; sb->s_flags |= SB_RDONLY;
else else
sb->s_flags &= ~MS_RDONLY; sb->s_flags &= ~SB_RDONLY;
} }
static void static void
@ -420,9 +421,9 @@ nbmand_changed_cb(void *arg, uint64_t newval)
return; return;
if (newval == TRUE) if (newval == TRUE)
sb->s_flags |= MS_MANDLOCK; sb->s_flags |= SB_MANDLOCK;
else else
sb->s_flags &= ~MS_MANDLOCK; sb->s_flags &= ~SB_MANDLOCK;
} }
static void static void
@ -1245,15 +1246,13 @@ zfs_statvfs(struct dentry *dentry, struct kstatfs *statp)
{ {
zfsvfs_t *zfsvfs = dentry->d_sb->s_fs_info; zfsvfs_t *zfsvfs = dentry->d_sb->s_fs_info;
uint64_t refdbytes, availbytes, usedobjs, availobjs; uint64_t refdbytes, availbytes, usedobjs, availobjs;
uint64_t fsid;
uint32_t bshift;
ZFS_ENTER(zfsvfs); ZFS_ENTER(zfsvfs);
dmu_objset_space(zfsvfs->z_os, dmu_objset_space(zfsvfs->z_os,
&refdbytes, &availbytes, &usedobjs, &availobjs); &refdbytes, &availbytes, &usedobjs, &availobjs);
fsid = dmu_objset_fsid_guid(zfsvfs->z_os); uint64_t fsid = dmu_objset_fsid_guid(zfsvfs->z_os);
/* /*
* The underlying storage pool actually uses multiple block * The underlying storage pool actually uses multiple block
* sizes. Under Solaris frsize (fragment size) is reported as * sizes. Under Solaris frsize (fragment size) is reported as
@ -1265,7 +1264,7 @@ zfs_statvfs(struct dentry *dentry, struct kstatfs *statp)
*/ */
statp->f_frsize = zfsvfs->z_max_blksz; statp->f_frsize = zfsvfs->z_max_blksz;
statp->f_bsize = zfsvfs->z_max_blksz; statp->f_bsize = zfsvfs->z_max_blksz;
bshift = fls(statp->f_bsize) - 1; uint32_t bshift = fls(statp->f_bsize) - 1;
/* /*
* The following report "total" blocks of various kinds in * The following report "total" blocks of various kinds in
@ -1282,7 +1281,7 @@ zfs_statvfs(struct dentry *dentry, struct kstatfs *statp)
* static metadata. ZFS doesn't preallocate files, so the best * static metadata. ZFS doesn't preallocate files, so the best
* we can do is report the max that could possibly fit in f_files, * we can do is report the max that could possibly fit in f_files,
* and that minus the number actually used in f_ffree. * and that minus the number actually used in f_ffree.
* For f_ffree, report the smaller of the number of object available * For f_ffree, report the smaller of the number of objects available
* and the number of blocks (each object will take at least a block). * and the number of blocks (each object will take at least a block).
*/ */
statp->f_ffree = MIN(availobjs, availbytes >> DNODE_SHIFT); statp->f_ffree = MIN(availobjs, availbytes >> DNODE_SHIFT);
@ -1765,8 +1764,8 @@ zfs_remount(struct super_block *sb, int *flags, zfs_mnt_t *zm)
int error; int error;
if ((issnap || !spa_writeable(dmu_objset_spa(zfsvfs->z_os))) && if ((issnap || !spa_writeable(dmu_objset_spa(zfsvfs->z_os))) &&
!(*flags & MS_RDONLY)) { !(*flags & SB_RDONLY)) {
*flags |= MS_RDONLY; *flags |= SB_RDONLY;
return (EROFS); return (EROFS);
} }


@ -675,7 +675,10 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
xuio = (xuio_t *)uio; xuio = (xuio_t *)uio;
else else
#endif #endif
uio_prefaultpages(MIN(n, max_blksz), uio); if (uio_prefaultpages(MIN(n, max_blksz), uio)) {
ZFS_EXIT(zfsvfs);
return (SET_ERROR(EFAULT));
}
/* /*
* If in append mode, set the io offset pointer to eof. * If in append mode, set the io offset pointer to eof.
@ -820,8 +823,19 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
if (abuf == NULL) { if (abuf == NULL) {
tx_bytes = uio->uio_resid; tx_bytes = uio->uio_resid;
uio->uio_fault_disable = B_TRUE;
error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl), error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl),
uio, nbytes, tx); uio, nbytes, tx);
if (error == EFAULT) {
dmu_tx_commit(tx);
if (uio_prefaultpages(MIN(n, max_blksz), uio)) {
break;
}
continue;
} else if (error != 0) {
dmu_tx_commit(tx);
break;
}
tx_bytes -= uio->uio_resid; tx_bytes -= uio->uio_resid;
} else { } else {
tx_bytes = nbytes; tx_bytes = nbytes;
@ -921,8 +935,12 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
ASSERT(tx_bytes == nbytes); ASSERT(tx_bytes == nbytes);
n -= nbytes; n -= nbytes;
if (!xuio && n > 0) if (!xuio && n > 0) {
uio_prefaultpages(MIN(n, max_blksz), uio); if (uio_prefaultpages(MIN(n, max_blksz), uio)) {
error = EFAULT;
break;
}
}
} }
zfs_inode_update(zp); zfs_inode_update(zp);
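Taken together, the zfs_write() hunks above turn uio_prefaultpages() into a fallible call and add an EFAULT retry around dmu_write_uio_dbuf(). Condensed to its control flow, the pattern looks roughly like the sketch below; the transaction setup is elided and the loop body is illustrative, not the actual function:

	while (n > 0) {
		/* ... assign and hold a dmu transaction in tx ... */
		uio->uio_fault_disable = B_TRUE;
		error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl),
		    uio, nbytes, tx);
		if (error == EFAULT) {
			dmu_tx_commit(tx);	/* drop the tx first */
			if (uio_prefaultpages(MIN(n, max_blksz), uio))
				break;		/* user buffer is truly bad */
			continue;		/* retry with pages resident */
		} else if (error != 0) {
			dmu_tx_commit(tx);
			break;
		}
		/* ... log, update sizes, decrement n, prefault next chunk ... */
	}

Setting uio_fault_disable makes a would-be page fault surface as EFAULT rather than blocking while transaction locks are held; the caller then commits the transaction, faults the pages back in, and retries the chunk.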


@ -149,7 +149,7 @@ zfs_znode_hold_cache_constructor(void *buf, void *arg, int kmflags)
znode_hold_t *zh = buf; znode_hold_t *zh = buf;
mutex_init(&zh->zh_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&zh->zh_lock, NULL, MUTEX_DEFAULT, NULL);
refcount_create(&zh->zh_refcount); zfs_refcount_create(&zh->zh_refcount);
zh->zh_obj = ZFS_NO_OBJECT; zh->zh_obj = ZFS_NO_OBJECT;
return (0); return (0);
@ -161,7 +161,7 @@ zfs_znode_hold_cache_destructor(void *buf, void *arg)
znode_hold_t *zh = buf; znode_hold_t *zh = buf;
mutex_destroy(&zh->zh_lock); mutex_destroy(&zh->zh_lock);
refcount_destroy(&zh->zh_refcount); zfs_refcount_destroy(&zh->zh_refcount);
} }
void void
@ -272,14 +272,14 @@ zfs_znode_hold_enter(zfsvfs_t *zfsvfs, uint64_t obj)
ASSERT3U(zh->zh_obj, ==, obj); ASSERT3U(zh->zh_obj, ==, obj);
found = B_TRUE; found = B_TRUE;
} }
refcount_add(&zh->zh_refcount, NULL); zfs_refcount_add(&zh->zh_refcount, NULL);
mutex_exit(&zfsvfs->z_hold_locks[i]); mutex_exit(&zfsvfs->z_hold_locks[i]);
if (found == B_TRUE) if (found == B_TRUE)
kmem_cache_free(znode_hold_cache, zh_new); kmem_cache_free(znode_hold_cache, zh_new);
ASSERT(MUTEX_NOT_HELD(&zh->zh_lock)); ASSERT(MUTEX_NOT_HELD(&zh->zh_lock));
ASSERT3S(refcount_count(&zh->zh_refcount), >, 0); ASSERT3S(zfs_refcount_count(&zh->zh_refcount), >, 0);
mutex_enter(&zh->zh_lock); mutex_enter(&zh->zh_lock);
return (zh); return (zh);
@ -292,11 +292,11 @@ zfs_znode_hold_exit(zfsvfs_t *zfsvfs, znode_hold_t *zh)
boolean_t remove = B_FALSE; boolean_t remove = B_FALSE;
ASSERT(zfs_znode_held(zfsvfs, zh->zh_obj)); ASSERT(zfs_znode_held(zfsvfs, zh->zh_obj));
ASSERT3S(refcount_count(&zh->zh_refcount), >, 0); ASSERT3S(zfs_refcount_count(&zh->zh_refcount), >, 0);
mutex_exit(&zh->zh_lock); mutex_exit(&zh->zh_lock);
mutex_enter(&zfsvfs->z_hold_locks[i]); mutex_enter(&zfsvfs->z_hold_locks[i]);
if (refcount_remove(&zh->zh_refcount, NULL) == 0) { if (zfs_refcount_remove(&zh->zh_refcount, NULL) == 0) {
avl_remove(&zfsvfs->z_hold_trees[i], zh); avl_remove(&zfsvfs->z_hold_trees[i], zh);
remove = B_TRUE; remove = B_TRUE;
} }


@ -75,9 +75,6 @@ uint64_t zio_buf_cache_frees[SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT];
int zio_delay_max = ZIO_DELAY_MAX; int zio_delay_max = ZIO_DELAY_MAX;
#define ZIO_PIPELINE_CONTINUE 0x100
#define ZIO_PIPELINE_STOP 0x101
#define BP_SPANB(indblkshift, level) \ #define BP_SPANB(indblkshift, level) \
(((uint64_t)1) << ((level) * ((indblkshift) - SPA_BLKPTRSHIFT))) (((uint64_t)1) << ((level) * ((indblkshift) - SPA_BLKPTRSHIFT)))
#define COMPARE_META_LEVEL 0x80000000ul #define COMPARE_META_LEVEL 0x80000000ul
@ -516,7 +513,8 @@ zio_wait_for_children(zio_t *zio, uint8_t childbits, enum zio_wait_type wait)
__attribute__((always_inline)) __attribute__((always_inline))
static inline void static inline void
zio_notify_parent(zio_t *pio, zio_t *zio, enum zio_wait_type wait) zio_notify_parent(zio_t *pio, zio_t *zio, enum zio_wait_type wait,
zio_t **next_to_executep)
{ {
uint64_t *countp = &pio->io_children[zio->io_child_type][wait]; uint64_t *countp = &pio->io_children[zio->io_child_type][wait];
int *errorp = &pio->io_child_error[zio->io_child_type]; int *errorp = &pio->io_child_error[zio->io_child_type];
@ -535,13 +533,33 @@ zio_notify_parent(zio_t *pio, zio_t *zio, enum zio_wait_type wait)
ZIO_TASKQ_INTERRUPT; ZIO_TASKQ_INTERRUPT;
pio->io_stall = NULL; pio->io_stall = NULL;
mutex_exit(&pio->io_lock); mutex_exit(&pio->io_lock);
/* /*
* Dispatch the parent zio in its own taskq so that * If we can tell the caller to execute this parent next, do
* the child can continue to make progress. This also * so. Otherwise dispatch the parent zio as its own task.
* prevents overflowing the stack when we have deeply nested *
* parent-child relationships. * Having the caller execute the parent when possible reduces
* locking on the zio taskq's, reduces context switch
* overhead, and has no recursion penalty. Note that one
* read from disk typically causes at least 3 zio's: a
* zio_null(), the logical zio_read(), and then a physical
* zio. When the physical ZIO completes, we are able to call
* zio_done() on all 3 of these zio's from one invocation of
* zio_execute() by returning the parent back to
* zio_execute(). Since the parent isn't executed until this
* thread returns back to zio_execute(), the caller should do
* so promptly.
*
* In other cases, dispatching the parent prevents
* overflowing the stack when we have deeply nested
* parent-child relationships, as we do with the "mega zio"
* of writes for spa_sync(), and the chain of ZIL blocks.
*/ */
zio_taskq_dispatch(pio, type, B_FALSE); if (next_to_executep != NULL && *next_to_executep == NULL) {
*next_to_executep = pio;
} else {
zio_taskq_dispatch(pio, type, B_FALSE);
}
} else { } else {
mutex_exit(&pio->io_lock); mutex_exit(&pio->io_lock);
} }
@ -1187,7 +1205,7 @@ zio_shrink(zio_t *zio, uint64_t size)
* ========================================================================== * ==========================================================================
*/ */
static int static zio_t *
zio_read_bp_init(zio_t *zio) zio_read_bp_init(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -1221,15 +1239,15 @@ zio_read_bp_init(zio_t *zio)
if (BP_GET_DEDUP(bp) && zio->io_child_type == ZIO_CHILD_LOGICAL) if (BP_GET_DEDUP(bp) && zio->io_child_type == ZIO_CHILD_LOGICAL)
zio->io_pipeline = ZIO_DDT_READ_PIPELINE; zio->io_pipeline = ZIO_DDT_READ_PIPELINE;
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_write_bp_init(zio_t *zio) zio_write_bp_init(zio_t *zio)
{ {
if (!IO_IS_ALLOCATING(zio)) if (!IO_IS_ALLOCATING(zio))
return (ZIO_PIPELINE_CONTINUE); return (zio);
ASSERT(zio->io_child_type != ZIO_CHILD_DDT); ASSERT(zio->io_child_type != ZIO_CHILD_DDT);
@ -1244,7 +1262,7 @@ zio_write_bp_init(zio_t *zio)
zio->io_pipeline = ZIO_INTERLOCK_PIPELINE; zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;
if (BP_IS_EMBEDDED(bp)) if (BP_IS_EMBEDDED(bp))
return (ZIO_PIPELINE_CONTINUE); return (zio);
/* /*
* If we've been overridden and nopwrite is set then * If we've been overridden and nopwrite is set then
@ -1255,13 +1273,13 @@ zio_write_bp_init(zio_t *zio)
ASSERT(!zp->zp_dedup); ASSERT(!zp->zp_dedup);
ASSERT3U(BP_GET_CHECKSUM(bp), ==, zp->zp_checksum); ASSERT3U(BP_GET_CHECKSUM(bp), ==, zp->zp_checksum);
zio->io_flags |= ZIO_FLAG_NOPWRITE; zio->io_flags |= ZIO_FLAG_NOPWRITE;
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
ASSERT(!zp->zp_nopwrite); ASSERT(!zp->zp_nopwrite);
if (BP_IS_HOLE(bp) || !zp->zp_dedup) if (BP_IS_HOLE(bp) || !zp->zp_dedup)
return (ZIO_PIPELINE_CONTINUE); return (zio);
ASSERT((zio_checksum_table[zp->zp_checksum].ci_flags & ASSERT((zio_checksum_table[zp->zp_checksum].ci_flags &
ZCHECKSUM_FLAG_DEDUP) || zp->zp_dedup_verify); ZCHECKSUM_FLAG_DEDUP) || zp->zp_dedup_verify);
@ -1269,7 +1287,7 @@ zio_write_bp_init(zio_t *zio)
if (BP_GET_CHECKSUM(bp) == zp->zp_checksum) { if (BP_GET_CHECKSUM(bp) == zp->zp_checksum) {
BP_SET_DEDUP(bp, 1); BP_SET_DEDUP(bp, 1);
zio->io_pipeline |= ZIO_STAGE_DDT_WRITE; zio->io_pipeline |= ZIO_STAGE_DDT_WRITE;
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -1281,10 +1299,10 @@ zio_write_bp_init(zio_t *zio)
zio->io_pipeline = zio->io_orig_pipeline; zio->io_pipeline = zio->io_orig_pipeline;
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_write_compress(zio_t *zio) zio_write_compress(zio_t *zio)
{ {
spa_t *spa = zio->io_spa; spa_t *spa = zio->io_spa;
@ -1303,11 +1321,11 @@ zio_write_compress(zio_t *zio)
*/ */
if (zio_wait_for_children(zio, ZIO_CHILD_LOGICAL_BIT | if (zio_wait_for_children(zio, ZIO_CHILD_LOGICAL_BIT |
ZIO_CHILD_GANG_BIT, ZIO_WAIT_READY)) { ZIO_CHILD_GANG_BIT, ZIO_WAIT_READY)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
if (!IO_IS_ALLOCATING(zio)) if (!IO_IS_ALLOCATING(zio))
return (ZIO_PIPELINE_CONTINUE); return (zio);
if (zio->io_children_ready != NULL) { if (zio->io_children_ready != NULL) {
/* /*
@ -1366,7 +1384,7 @@ zio_write_compress(zio_t *zio)
zio->io_pipeline = ZIO_INTERLOCK_PIPELINE; zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;
ASSERT(spa_feature_is_active(spa, ASSERT(spa_feature_is_active(spa,
SPA_FEATURE_EMBEDDED_DATA)); SPA_FEATURE_EMBEDDED_DATA));
return (ZIO_PIPELINE_CONTINUE); return (zio);
} else { } else {
/* /*
* Round up compressed size up to the ashift * Round up compressed size up to the ashift
@ -1459,10 +1477,10 @@ zio_write_compress(zio_t *zio)
zio->io_pipeline |= ZIO_STAGE_NOP_WRITE; zio->io_pipeline |= ZIO_STAGE_NOP_WRITE;
} }
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_free_bp_init(zio_t *zio) zio_free_bp_init(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -1472,7 +1490,9 @@ zio_free_bp_init(zio_t *zio)
zio->io_pipeline = ZIO_DDT_FREE_PIPELINE; zio->io_pipeline = ZIO_DDT_FREE_PIPELINE;
} }
return (ZIO_PIPELINE_CONTINUE); ASSERT3P(zio->io_bp, ==, &zio->io_bp_copy);
return (zio);
} }
/* /*
@ -1541,12 +1561,12 @@ zio_taskq_member(zio_t *zio, zio_taskq_type_t q)
return (B_FALSE); return (B_FALSE);
} }
static int static zio_t *
zio_issue_async(zio_t *zio) zio_issue_async(zio_t *zio)
{ {
zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_FALSE); zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_FALSE);
return (ZIO_PIPELINE_STOP); return (NULL);
} }
void void
@ -1687,14 +1707,13 @@ __attribute__((always_inline))
static inline void static inline void
__zio_execute(zio_t *zio) __zio_execute(zio_t *zio)
{ {
zio->io_executor = curthread;
ASSERT3U(zio->io_queued_timestamp, >, 0); ASSERT3U(zio->io_queued_timestamp, >, 0);
while (zio->io_stage < ZIO_STAGE_DONE) { while (zio->io_stage < ZIO_STAGE_DONE) {
enum zio_stage pipeline = zio->io_pipeline; enum zio_stage pipeline = zio->io_pipeline;
enum zio_stage stage = zio->io_stage; enum zio_stage stage = zio->io_stage;
int rv;
zio->io_executor = curthread;
ASSERT(!MUTEX_HELD(&zio->io_lock)); ASSERT(!MUTEX_HELD(&zio->io_lock));
ASSERT(ISP2(stage)); ASSERT(ISP2(stage));
@ -1736,12 +1755,16 @@ __zio_execute(zio_t *zio)
zio->io_stage = stage; zio->io_stage = stage;
zio->io_pipeline_trace |= zio->io_stage; zio->io_pipeline_trace |= zio->io_stage;
rv = zio_pipeline[highbit64(stage) - 1](zio);
if (rv == ZIO_PIPELINE_STOP) /*
* The zio pipeline stage returns the next zio to execute
* (typically the same as this one), or NULL if we should
* stop.
*/
zio = zio_pipeline[highbit64(stage) - 1](zio);
if (zio == NULL)
return; return;
ASSERT(rv == ZIO_PIPELINE_CONTINUE);
} }
} }
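With this change a pipeline stage no longer returns the ZIO_PIPELINE_CONTINUE/STOP sentinels; it returns the next zio to execute, or NULL to stop, and __zio_execute() simply loops. Reduced to a self-contained sketch of the control-flow idea (the task type and stage table are hypothetical stand-ins for the zio pipeline):

	#include <stddef.h>

	typedef struct task task_t;
	struct task {
		int stage;	/* each stage advances this before returning */
		task_t *parent;
	};

	/*
	 * A stage returns the task to run next: usually the same task,
	 * or a completed child's parent (so the caller keeps executing
	 * with no taskq dispatch and no recursion), or NULL to stop.
	 */
	typedef task_t *(*stage_fn_t)(task_t *);

	static void
	execute(task_t *t, stage_fn_t *stages, int nstages)
	{
		while (t != NULL && t->stage < nstages)
			t = stages[t->stage](t);	/* trampoline */
	}

The trampoline keeps stack depth constant, which is why the deeply nested parent-child cases mentioned in zio_notify_parent() can still dispatch to a taskq instead of returning a parent.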
@ -2215,7 +2238,7 @@ zio_gang_tree_issue(zio_t *pio, zio_gang_node_t *gn, blkptr_t *bp, abd_t *data,
zio_nowait(zio); zio_nowait(zio);
} }
static int static zio_t *
zio_gang_assemble(zio_t *zio) zio_gang_assemble(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -2227,16 +2250,16 @@ zio_gang_assemble(zio_t *zio)
zio_gang_tree_assemble(zio, bp, &zio->io_gang_tree); zio_gang_tree_assemble(zio, bp, &zio->io_gang_tree);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_gang_issue(zio_t *zio) zio_gang_issue(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
if (zio_wait_for_children(zio, ZIO_CHILD_GANG_BIT, ZIO_WAIT_DONE)) { if (zio_wait_for_children(zio, ZIO_CHILD_GANG_BIT, ZIO_WAIT_DONE)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
ASSERT(BP_IS_GANG(bp) && zio->io_gang_leader == zio); ASSERT(BP_IS_GANG(bp) && zio->io_gang_leader == zio);
@ -2250,7 +2273,7 @@ zio_gang_issue(zio_t *zio)
zio->io_pipeline = ZIO_INTERLOCK_PIPELINE; zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static void static void
@ -2290,7 +2313,7 @@ zio_write_gang_done(zio_t *zio)
abd_put(zio->io_abd); abd_put(zio->io_abd);
} }
static int static zio_t *
zio_write_gang_block(zio_t *pio) zio_write_gang_block(zio_t *pio)
{ {
spa_t *spa = pio->io_spa; spa_t *spa = pio->io_spa;
@ -2315,7 +2338,7 @@ zio_write_gang_block(zio_t *pio)
ASSERT(!(pio->io_flags & ZIO_FLAG_NODATA)); ASSERT(!(pio->io_flags & ZIO_FLAG_NODATA));
flags |= METASLAB_ASYNC_ALLOC; flags |= METASLAB_ASYNC_ALLOC;
VERIFY(refcount_held(&mc->mc_alloc_slots, pio)); VERIFY(zfs_refcount_held(&mc->mc_alloc_slots, pio));
/* /*
* The logical zio has already placed a reservation for * The logical zio has already placed a reservation for
@ -2349,7 +2372,7 @@ zio_write_gang_block(zio_t *pio)
} }
pio->io_error = error; pio->io_error = error;
return (ZIO_PIPELINE_CONTINUE); return (pio);
} }
if (pio == gio) { if (pio == gio) {
@ -2423,7 +2446,7 @@ zio_write_gang_block(zio_t *pio)
zio_nowait(zio); zio_nowait(zio);
return (ZIO_PIPELINE_CONTINUE); return (pio);
} }
/* /*
@ -2444,7 +2467,7 @@ zio_write_gang_block(zio_t *pio)
* used for nopwrite, assuming that the salt and the checksums * used for nopwrite, assuming that the salt and the checksums
* themselves remain secret. * themselves remain secret.
*/ */
static int static zio_t *
zio_nop_write(zio_t *zio) zio_nop_write(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -2471,7 +2494,7 @@ zio_nop_write(zio_t *zio)
BP_GET_COMPRESS(bp) != BP_GET_COMPRESS(bp_orig) || BP_GET_COMPRESS(bp) != BP_GET_COMPRESS(bp_orig) ||
BP_GET_DEDUP(bp) != BP_GET_DEDUP(bp_orig) || BP_GET_DEDUP(bp) != BP_GET_DEDUP(bp_orig) ||
zp->zp_copies != BP_GET_NDVAS(bp_orig)) zp->zp_copies != BP_GET_NDVAS(bp_orig))
return (ZIO_PIPELINE_CONTINUE); return (zio);
/* /*
* If the checksums match then reset the pipeline so that we * If the checksums match then reset the pipeline so that we
@ -2491,7 +2514,7 @@ zio_nop_write(zio_t *zio)
zio->io_flags |= ZIO_FLAG_NOPWRITE; zio->io_flags |= ZIO_FLAG_NOPWRITE;
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -2519,7 +2542,7 @@ zio_ddt_child_read_done(zio_t *zio)
mutex_exit(&pio->io_lock); mutex_exit(&pio->io_lock);
} }
static int static zio_t *
zio_ddt_read_start(zio_t *zio) zio_ddt_read_start(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -2540,7 +2563,7 @@ zio_ddt_read_start(zio_t *zio)
zio->io_vsd = dde; zio->io_vsd = dde;
if (ddp_self == NULL) if (ddp_self == NULL)
return (ZIO_PIPELINE_CONTINUE); return (zio);
for (p = 0; p < DDT_PHYS_TYPES; p++, ddp++) { for (p = 0; p < DDT_PHYS_TYPES; p++, ddp++) {
if (ddp->ddp_phys_birth == 0 || ddp == ddp_self) if (ddp->ddp_phys_birth == 0 || ddp == ddp_self)
@ -2553,23 +2576,23 @@ zio_ddt_read_start(zio_t *zio)
zio->io_priority, ZIO_DDT_CHILD_FLAGS(zio) | zio->io_priority, ZIO_DDT_CHILD_FLAGS(zio) |
ZIO_FLAG_DONT_PROPAGATE, &zio->io_bookmark)); ZIO_FLAG_DONT_PROPAGATE, &zio->io_bookmark));
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
zio_nowait(zio_read(zio, zio->io_spa, bp, zio_nowait(zio_read(zio, zio->io_spa, bp,
zio->io_abd, zio->io_size, NULL, NULL, zio->io_priority, zio->io_abd, zio->io_size, NULL, NULL, zio->io_priority,
ZIO_DDT_CHILD_FLAGS(zio), &zio->io_bookmark)); ZIO_DDT_CHILD_FLAGS(zio), &zio->io_bookmark));
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_ddt_read_done(zio_t *zio) zio_ddt_read_done(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
if (zio_wait_for_children(zio, ZIO_CHILD_DDT_BIT, ZIO_WAIT_DONE)) { if (zio_wait_for_children(zio, ZIO_CHILD_DDT_BIT, ZIO_WAIT_DONE)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
ASSERT(BP_GET_DEDUP(bp)); ASSERT(BP_GET_DEDUP(bp));
@ -2581,12 +2604,12 @@ zio_ddt_read_done(zio_t *zio)
ddt_entry_t *dde = zio->io_vsd; ddt_entry_t *dde = zio->io_vsd;
if (ddt == NULL) { if (ddt == NULL) {
ASSERT(spa_load_state(zio->io_spa) != SPA_LOAD_NONE); ASSERT(spa_load_state(zio->io_spa) != SPA_LOAD_NONE);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
if (dde == NULL) { if (dde == NULL) {
zio->io_stage = ZIO_STAGE_DDT_READ_START >> 1; zio->io_stage = ZIO_STAGE_DDT_READ_START >> 1;
zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_FALSE); zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_FALSE);
return (ZIO_PIPELINE_STOP); return (NULL);
} }
if (dde->dde_repair_abd != NULL) { if (dde->dde_repair_abd != NULL) {
abd_copy(zio->io_abd, dde->dde_repair_abd, abd_copy(zio->io_abd, dde->dde_repair_abd,
@ -2599,7 +2622,7 @@ zio_ddt_read_done(zio_t *zio)
ASSERT(zio->io_vsd == NULL); ASSERT(zio->io_vsd == NULL);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static boolean_t static boolean_t
@ -2780,7 +2803,7 @@ zio_ddt_ditto_write_done(zio_t *zio)
ddt_exit(ddt); ddt_exit(ddt);
} }
static int static zio_t *
zio_ddt_write(zio_t *zio) zio_ddt_write(zio_t *zio)
{ {
spa_t *spa = zio->io_spa; spa_t *spa = zio->io_spa;
@ -2822,7 +2845,7 @@ zio_ddt_write(zio_t *zio)
} }
zio->io_pipeline = ZIO_WRITE_PIPELINE; zio->io_pipeline = ZIO_WRITE_PIPELINE;
ddt_exit(ddt); ddt_exit(ddt);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
ditto_copies = ddt_ditto_copies_needed(ddt, dde, ddp); ditto_copies = ddt_ditto_copies_needed(ddt, dde, ddp);
@ -2848,7 +2871,7 @@ zio_ddt_write(zio_t *zio)
zio->io_bp_override = NULL; zio->io_bp_override = NULL;
BP_ZERO(bp); BP_ZERO(bp);
ddt_exit(ddt); ddt_exit(ddt);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
dio = zio_write(zio, spa, txg, bp, zio->io_orig_abd, dio = zio_write(zio, spa, txg, bp, zio->io_orig_abd,
@ -2890,12 +2913,12 @@ zio_ddt_write(zio_t *zio)
if (dio) if (dio)
zio_nowait(dio); zio_nowait(dio);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
ddt_entry_t *freedde; /* for debugging */ ddt_entry_t *freedde; /* for debugging */
static int static zio_t *
zio_ddt_free(zio_t *zio) zio_ddt_free(zio_t *zio)
{ {
spa_t *spa = zio->io_spa; spa_t *spa = zio->io_spa;
@ -2916,7 +2939,7 @@ zio_ddt_free(zio_t *zio)
} }
ddt_exit(ddt); ddt_exit(ddt);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -2953,7 +2976,7 @@ zio_io_to_allocate(spa_t *spa)
return (zio); return (zio);
} }
static int static zio_t *
zio_dva_throttle(zio_t *zio) zio_dva_throttle(zio_t *zio)
{ {
spa_t *spa = zio->io_spa; spa_t *spa = zio->io_spa;
@ -2963,7 +2986,7 @@ zio_dva_throttle(zio_t *zio)
!spa_normal_class(zio->io_spa)->mc_alloc_throttle_enabled || !spa_normal_class(zio->io_spa)->mc_alloc_throttle_enabled ||
zio->io_child_type == ZIO_CHILD_GANG || zio->io_child_type == ZIO_CHILD_GANG ||
zio->io_flags & ZIO_FLAG_NODATA) { zio->io_flags & ZIO_FLAG_NODATA) {
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
ASSERT(zio->io_child_type > ZIO_CHILD_GANG); ASSERT(zio->io_child_type > ZIO_CHILD_GANG);
@ -2979,22 +3002,7 @@ zio_dva_throttle(zio_t *zio)
nio = zio_io_to_allocate(zio->io_spa); nio = zio_io_to_allocate(zio->io_spa);
mutex_exit(&spa->spa_alloc_lock); mutex_exit(&spa->spa_alloc_lock);
if (nio == zio) return (nio);
return (ZIO_PIPELINE_CONTINUE);
if (nio != NULL) {
ASSERT(nio->io_stage == ZIO_STAGE_DVA_THROTTLE);
/*
* We are passing control to a new zio so make sure that
* it is processed by a different thread. We do this to
* avoid stack overflows that can occur when parents are
* throttled and children are making progress. We allow
* it to go to the head of the taskq since it's already
* been waiting.
*/
zio_taskq_dispatch(nio, ZIO_TASKQ_ISSUE, B_TRUE);
}
return (ZIO_PIPELINE_STOP);
} }
void void
@ -3013,7 +3021,7 @@ zio_allocate_dispatch(spa_t *spa)
zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_TRUE); zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, B_TRUE);
} }
static int static zio_t *
zio_dva_allocate(zio_t *zio) zio_dva_allocate(zio_t *zio)
{ {
spa_t *spa = zio->io_spa; spa_t *spa = zio->io_spa;
@ -3054,18 +3062,18 @@ zio_dva_allocate(zio_t *zio)
zio->io_error = error; zio->io_error = error;
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_dva_free(zio_t *zio) zio_dva_free(zio_t *zio)
{ {
metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE); metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_dva_claim(zio_t *zio) zio_dva_claim(zio_t *zio)
{ {
int error; int error;
@ -3074,7 +3082,7 @@ zio_dva_claim(zio_t *zio)
if (error) if (error)
zio->io_error = error; zio->io_error = error;
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -3172,7 +3180,7 @@ zio_free_zil(spa_t *spa, uint64_t txg, blkptr_t *bp)
* force the underlying vdev layers to call either zio_execute() or * force the underlying vdev layers to call either zio_execute() or
* zio_interrupt() to ensure that the pipeline continues with the correct I/O. * zio_interrupt() to ensure that the pipeline continues with the correct I/O.
*/ */
static int static zio_t *
zio_vdev_io_start(zio_t *zio) zio_vdev_io_start(zio_t *zio)
{ {
vdev_t *vd = zio->io_vd; vdev_t *vd = zio->io_vd;
@ -3192,7 +3200,7 @@ zio_vdev_io_start(zio_t *zio)
* The mirror_ops handle multiple DVAs in a single BP. * The mirror_ops handle multiple DVAs in a single BP.
*/ */
vdev_mirror_ops.vdev_op_io_start(zio); vdev_mirror_ops.vdev_op_io_start(zio);
return (ZIO_PIPELINE_STOP); return (NULL);
} }
ASSERT3P(zio->io_logical, !=, zio); ASSERT3P(zio->io_logical, !=, zio);
@ -3269,31 +3277,31 @@ zio_vdev_io_start(zio_t *zio)
!vdev_dtl_contains(vd, DTL_PARTIAL, zio->io_txg, 1)) { !vdev_dtl_contains(vd, DTL_PARTIAL, zio->io_txg, 1)) {
ASSERT(zio->io_type == ZIO_TYPE_WRITE); ASSERT(zio->io_type == ZIO_TYPE_WRITE);
zio_vdev_io_bypass(zio); zio_vdev_io_bypass(zio);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
if (vd->vdev_ops->vdev_op_leaf && if (vd->vdev_ops->vdev_op_leaf &&
(zio->io_type == ZIO_TYPE_READ || zio->io_type == ZIO_TYPE_WRITE)) { (zio->io_type == ZIO_TYPE_READ || zio->io_type == ZIO_TYPE_WRITE)) {
if (zio->io_type == ZIO_TYPE_READ && vdev_cache_read(zio)) if (zio->io_type == ZIO_TYPE_READ && vdev_cache_read(zio))
return (ZIO_PIPELINE_CONTINUE); return (zio);
if ((zio = vdev_queue_io(zio)) == NULL) if ((zio = vdev_queue_io(zio)) == NULL)
return (ZIO_PIPELINE_STOP); return (NULL);
if (!vdev_accessible(vd, zio)) { if (!vdev_accessible(vd, zio)) {
zio->io_error = SET_ERROR(ENXIO); zio->io_error = SET_ERROR(ENXIO);
zio_interrupt(zio); zio_interrupt(zio);
return (ZIO_PIPELINE_STOP); return (NULL);
} }
zio->io_delay = gethrtime(); zio->io_delay = gethrtime();
} }
vd->vdev_ops->vdev_op_io_start(zio); vd->vdev_ops->vdev_op_io_start(zio);
return (ZIO_PIPELINE_STOP); return (NULL);
} }
static int static zio_t *
zio_vdev_io_done(zio_t *zio) zio_vdev_io_done(zio_t *zio)
{ {
vdev_t *vd = zio->io_vd; vdev_t *vd = zio->io_vd;
@ -3301,7 +3309,7 @@ zio_vdev_io_done(zio_t *zio)
boolean_t unexpected_error = B_FALSE; boolean_t unexpected_error = B_FALSE;
if (zio_wait_for_children(zio, ZIO_CHILD_VDEV_BIT, ZIO_WAIT_DONE)) { if (zio_wait_for_children(zio, ZIO_CHILD_VDEV_BIT, ZIO_WAIT_DONE)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
ASSERT(zio->io_type == ZIO_TYPE_READ || zio->io_type == ZIO_TYPE_WRITE); ASSERT(zio->io_type == ZIO_TYPE_READ || zio->io_type == ZIO_TYPE_WRITE);
@ -3337,7 +3345,7 @@ zio_vdev_io_done(zio_t *zio)
if (unexpected_error) if (unexpected_error)
VERIFY(vdev_probe(vd, zio) == NULL); VERIFY(vdev_probe(vd, zio) == NULL);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -3366,13 +3374,13 @@ zio_vsd_default_cksum_report(zio_t *zio, zio_cksum_report_t *zcr, void *ignored)
zcr->zcr_free = zio_abd_free; zcr->zcr_free = zio_abd_free;
} }
static int static zio_t *
zio_vdev_io_assess(zio_t *zio) zio_vdev_io_assess(zio_t *zio)
{ {
vdev_t *vd = zio->io_vd; vdev_t *vd = zio->io_vd;
if (zio_wait_for_children(zio, ZIO_CHILD_VDEV_BIT, ZIO_WAIT_DONE)) { if (zio_wait_for_children(zio, ZIO_CHILD_VDEV_BIT, ZIO_WAIT_DONE)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
if (vd == NULL && !(zio->io_flags & ZIO_FLAG_CONFIG_WRITER)) if (vd == NULL && !(zio->io_flags & ZIO_FLAG_CONFIG_WRITER))
@ -3402,7 +3410,7 @@ zio_vdev_io_assess(zio_t *zio)
zio->io_stage = ZIO_STAGE_VDEV_IO_START >> 1; zio->io_stage = ZIO_STAGE_VDEV_IO_START >> 1;
zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE, zio_taskq_dispatch(zio, ZIO_TASKQ_ISSUE,
zio_requeue_io_start_cut_in_line); zio_requeue_io_start_cut_in_line);
return (ZIO_PIPELINE_STOP); return (NULL);
} }
/* /*
@ -3442,7 +3450,7 @@ zio_vdev_io_assess(zio_t *zio)
zio->io_physdone(zio->io_logical); zio->io_physdone(zio->io_logical);
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
void void
@ -3477,7 +3485,7 @@ zio_vdev_io_bypass(zio_t *zio)
* Generate and verify checksums * Generate and verify checksums
* ========================================================================== * ==========================================================================
*/ */
static int static zio_t *
zio_checksum_generate(zio_t *zio) zio_checksum_generate(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -3491,7 +3499,7 @@ zio_checksum_generate(zio_t *zio)
checksum = zio->io_prop.zp_checksum; checksum = zio->io_prop.zp_checksum;
if (checksum == ZIO_CHECKSUM_OFF) if (checksum == ZIO_CHECKSUM_OFF)
return (ZIO_PIPELINE_CONTINUE); return (zio);
ASSERT(checksum == ZIO_CHECKSUM_LABEL); ASSERT(checksum == ZIO_CHECKSUM_LABEL);
} else { } else {
@ -3505,10 +3513,10 @@ zio_checksum_generate(zio_t *zio)
zio_checksum_compute(zio, checksum, zio->io_abd, zio->io_size); zio_checksum_compute(zio, checksum, zio->io_abd, zio->io_size);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
static int static zio_t *
zio_checksum_verify(zio_t *zio) zio_checksum_verify(zio_t *zio)
{ {
zio_bad_cksum_t info; zio_bad_cksum_t info;
@ -3523,7 +3531,7 @@ zio_checksum_verify(zio_t *zio)
* We're either verifying a label checksum, or nothing at all. * We're either verifying a label checksum, or nothing at all.
*/ */
if (zio->io_prop.zp_checksum == ZIO_CHECKSUM_OFF) if (zio->io_prop.zp_checksum == ZIO_CHECKSUM_OFF)
return (ZIO_PIPELINE_CONTINUE); return (zio);
ASSERT(zio->io_prop.zp_checksum == ZIO_CHECKSUM_LABEL); ASSERT(zio->io_prop.zp_checksum == ZIO_CHECKSUM_LABEL);
} }
@ -3538,7 +3546,7 @@ zio_checksum_verify(zio_t *zio)
} }
} }
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -3581,7 +3589,7 @@ zio_worst_error(int e1, int e2)
* I/O completion * I/O completion
* ========================================================================== * ==========================================================================
*/ */
static int static zio_t *
zio_ready(zio_t *zio) zio_ready(zio_t *zio)
{ {
blkptr_t *bp = zio->io_bp; blkptr_t *bp = zio->io_bp;
@ -3590,7 +3598,7 @@ zio_ready(zio_t *zio)
if (zio_wait_for_children(zio, ZIO_CHILD_GANG_BIT | ZIO_CHILD_DDT_BIT, if (zio_wait_for_children(zio, ZIO_CHILD_GANG_BIT | ZIO_CHILD_DDT_BIT,
ZIO_WAIT_READY)) { ZIO_WAIT_READY)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
if (zio->io_ready) { if (zio->io_ready) {
@ -3636,7 +3644,7 @@ zio_ready(zio_t *zio)
*/ */
for (; pio != NULL; pio = pio_next) { for (; pio != NULL; pio = pio_next) {
pio_next = zio_walk_parents(zio, &zl); pio_next = zio_walk_parents(zio, &zl);
zio_notify_parent(pio, zio, ZIO_WAIT_READY); zio_notify_parent(pio, zio, ZIO_WAIT_READY, NULL);
} }
if (zio->io_flags & ZIO_FLAG_NODATA) { if (zio->io_flags & ZIO_FLAG_NODATA) {
@ -3652,7 +3660,7 @@ zio_ready(zio_t *zio)
zio->io_spa->spa_syncing_txg == zio->io_txg) zio->io_spa->spa_syncing_txg == zio->io_txg)
zio_handle_ignored_writes(zio); zio_handle_ignored_writes(zio);
return (ZIO_PIPELINE_CONTINUE); return (zio);
} }
/* /*
@ -3716,7 +3724,7 @@ zio_dva_throttle_done(zio_t *zio)
zio_allocate_dispatch(zio->io_spa); zio_allocate_dispatch(zio->io_spa);
} }
static int static zio_t *
zio_done(zio_t *zio) zio_done(zio_t *zio)
{ {
/* /*
@ -3733,7 +3741,7 @@ zio_done(zio_t *zio)
* wait for them and then repeat this pipeline stage. * wait for them and then repeat this pipeline stage.
*/ */
if (zio_wait_for_children(zio, ZIO_CHILD_ALL_BITS, ZIO_WAIT_DONE)) { if (zio_wait_for_children(zio, ZIO_CHILD_ALL_BITS, ZIO_WAIT_DONE)) {
return (ZIO_PIPELINE_STOP); return (NULL);
} }
/* /*
@ -3758,7 +3766,7 @@ zio_done(zio_t *zio)
ASSERT(zio->io_priority == ZIO_PRIORITY_ASYNC_WRITE); ASSERT(zio->io_priority == ZIO_PRIORITY_ASYNC_WRITE);
ASSERT(zio->io_bp != NULL); ASSERT(zio->io_bp != NULL);
metaslab_group_alloc_verify(zio->io_spa, zio->io_bp, zio); metaslab_group_alloc_verify(zio->io_spa, zio->io_bp, zio);
VERIFY(refcount_not_held( VERIFY(zfs_refcount_not_held(
&(spa_normal_class(zio->io_spa)->mc_alloc_slots), zio)); &(spa_normal_class(zio->io_spa)->mc_alloc_slots), zio));
} }
@ -3957,7 +3965,12 @@ zio_done(zio_t *zio)
if ((pio->io_flags & ZIO_FLAG_GODFATHER) && if ((pio->io_flags & ZIO_FLAG_GODFATHER) &&
(zio->io_reexecute & ZIO_REEXECUTE_SUSPEND)) { (zio->io_reexecute & ZIO_REEXECUTE_SUSPEND)) {
zio_remove_child(pio, zio, remove_zl); zio_remove_child(pio, zio, remove_zl);
zio_notify_parent(pio, zio, ZIO_WAIT_DONE); /*
* This is a rare code path, so we don't
* bother with "next_to_execute".
*/
zio_notify_parent(pio, zio, ZIO_WAIT_DONE,
NULL);
} }
} }
@ -3969,7 +3982,11 @@ zio_done(zio_t *zio)
*/ */
ASSERT(!(zio->io_flags & ZIO_FLAG_GODFATHER)); ASSERT(!(zio->io_flags & ZIO_FLAG_GODFATHER));
zio->io_flags |= ZIO_FLAG_DONT_PROPAGATE; zio->io_flags |= ZIO_FLAG_DONT_PROPAGATE;
zio_notify_parent(pio, zio, ZIO_WAIT_DONE); /*
* This is a rare code path, so we don't bother with
* "next_to_execute".
*/
zio_notify_parent(pio, zio, ZIO_WAIT_DONE, NULL);
} else if (zio->io_reexecute & ZIO_REEXECUTE_SUSPEND) { } else if (zio->io_reexecute & ZIO_REEXECUTE_SUSPEND) {
/* /*
* We'd fail again if we reexecuted now, so suspend * We'd fail again if we reexecuted now, so suspend
@ -3987,7 +4004,7 @@ zio_done(zio_t *zio)
(task_func_t *)zio_reexecute, zio, 0, (task_func_t *)zio_reexecute, zio, 0,
&zio->io_tqent); &zio->io_tqent);
} }
return (ZIO_PIPELINE_STOP); return (NULL);
} }
ASSERT(zio->io_child_count == 0); ASSERT(zio->io_child_count == 0);
@ -4023,12 +4040,17 @@ zio_done(zio_t *zio)
zio->io_state[ZIO_WAIT_DONE] = 1; zio->io_state[ZIO_WAIT_DONE] = 1;
mutex_exit(&zio->io_lock); mutex_exit(&zio->io_lock);
/*
* We are done executing this zio. We may want to execute a parent
* next. See the comment in zio_notify_parent().
*/
zio_t *next_to_execute = NULL;
zl = NULL; zl = NULL;
for (pio = zio_walk_parents(zio, &zl); pio != NULL; pio = pio_next) { for (pio = zio_walk_parents(zio, &zl); pio != NULL; pio = pio_next) {
zio_link_t *remove_zl = zl; zio_link_t *remove_zl = zl;
pio_next = zio_walk_parents(zio, &zl); pio_next = zio_walk_parents(zio, &zl);
zio_remove_child(pio, zio, remove_zl); zio_remove_child(pio, zio, remove_zl);
zio_notify_parent(pio, zio, ZIO_WAIT_DONE); zio_notify_parent(pio, zio, ZIO_WAIT_DONE, &next_to_execute);
} }
if (zio->io_waiter != NULL) { if (zio->io_waiter != NULL) {
@ -4040,7 +4062,7 @@ zio_done(zio_t *zio)
zio_destroy(zio); zio_destroy(zio);
} }
return (ZIO_PIPELINE_STOP); return (next_to_execute);
} }
/* /*


@ -181,6 +181,28 @@ zpl_statfs(struct dentry *dentry, struct kstatfs *statp)
spl_fstrans_unmark(cookie); spl_fstrans_unmark(cookie);
ASSERT3S(error, <=, 0); ASSERT3S(error, <=, 0);
/*
* If required by a 32-bit system call, dynamically scale the
* block size up to 16MiB and decrease the block counts. This
* allows for a maximum size of 64EiB to be reported. The file
* counts must be artificially capped at 2^32-1.
*/
if (unlikely(zpl_is_32bit_api())) {
while (statp->f_blocks > UINT32_MAX &&
statp->f_bsize < SPA_MAXBLOCKSIZE) {
statp->f_frsize <<= 1;
statp->f_bsize <<= 1;
statp->f_blocks >>= 1;
statp->f_bfree >>= 1;
statp->f_bavail >>= 1;
}
uint64_t usedobjs = statp->f_files - statp->f_ffree;
statp->f_ffree = MIN(statp->f_ffree, UINT32_MAX - usedobjs);
statp->f_files = statp->f_ffree + usedobjs;
}
return (error); return (error);
} }
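Each pass of the loop above doubles the block size and halves the block counts, so the byte totals are preserved while the counts shrink into 32-bit range. A standalone userspace illustration of the same arithmetic, with hypothetical values:

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		/* hypothetical pool: 128 KiB blocks, 2^36 of them (8 PiB) */
		uint64_t bsize = 131072;
		uint64_t blocks = 1ULL << 36;
		const uint64_t max_bsize = 16777216;	/* 16 MiB cap */

		while (blocks > UINT32_MAX && bsize < max_bsize) {
			bsize <<= 1;	/* bigger units ...  */
			blocks >>= 1;	/* ... fewer of them */
		}
		/* capacity unchanged: bsize * blocks == 2^53 bytes throughout */
		printf("%llu-byte blocks, %llu of them\n",
		    (unsigned long long)bsize, (unsigned long long)blocks);
		return (0);
	}

Here five doublings suffice: the block count drops to 2^31, which fits a 32-bit statfs field, while the block size stays well under the 16 MiB cap.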


@ -52,6 +52,10 @@ URL: http://zfsonlinux.org/
Source0: %{module}-%{version}.tar.gz Source0: %{module}-%{version}.tar.gz
Source10: kmodtool Source10: kmodtool
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id} -u -n) BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id} -u -n)
%if 0%{?rhel}%{?fedora}
BuildRequires: gcc, make
BuildRequires: elfutils-libelf-devel
%endif
# The developments headers will conflict with the dkms packages. # The developments headers will conflict with the dkms packages.
Conflicts: %{module}-dkms Conflicts: %{module}-dkms
@ -191,6 +195,15 @@ chmod u+x ${RPM_BUILD_ROOT}%{kmodinstdir_prefix}/*/extra/*/*/*
rm -rf $RPM_BUILD_ROOT rm -rf $RPM_BUILD_ROOT
%changelog %changelog
* Fri Feb 22 2019 Tony Hutter <hutter2@llnl.gov> - 0.7.13-1
- Released 0.7.13-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.13
* Thu Nov 08 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.12-1
- Released 0.7.12-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.12
* Thu Sep 13 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.11-1
- Released 0.7.11-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.11
* Wed Sep 05 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.10-1 * Wed Sep 05 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.10-1
- Released 0.7.10-1, detailed release notes are available at: - Released 0.7.10-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.10 - https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.10


@ -91,6 +91,7 @@ Provides: %{name}-kmod-common = %{version}
Conflicts: zfs-fuse Conflicts: zfs-fuse
%if 0%{?rhel}%{?fedora}%{?suse_version} %if 0%{?rhel}%{?fedora}%{?suse_version}
BuildRequires: gcc, make
BuildRequires: zlib-devel BuildRequires: zlib-devel
BuildRequires: libuuid-devel BuildRequires: libuuid-devel
BuildRequires: libblkid-devel BuildRequires: libblkid-devel
@ -282,6 +283,15 @@ fi
%endif %endif
exit 0 exit 0
# On RHEL/CentOS 7 the static nodes aren't refreshed by default after
# installing a package. This is the default behavior for Fedora.
%posttrans
%if 0%{?rhel} == 7 || 0%{?centos} == 7
systemctl restart kmod-static-nodes
systemctl restart systemd-tmpfiles-setup-dev
udevadm trigger
%endif
%preun %preun
%if 0%{?_systemd} %if 0%{?_systemd}
%if 0%{?systemd_preun:1} %if 0%{?systemd_preun:1}
@ -371,6 +381,15 @@ systemctl --system daemon-reload >/dev/null || true
%endif %endif
%changelog %changelog
* Fri Feb 22 2019 Tony Hutter <hutter2@llnl.gov> - 0.7.13-1
- Released 0.7.13-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.13
* Thu Nov 08 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.12-1
- Released 0.7.12-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.12
* Thu Sep 13 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.11-1
- Released 0.7.11-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.11
* Wed Sep 05 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.10-1 * Wed Sep 05 2018 Tony Hutter <hutter2@llnl.gov> - 0.7.10-1
- Released 0.7.10-1, detailed release notes are available at: - Released 0.7.10-1, detailed release notes are available at:
- https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.10 - https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.10


@ -50,10 +50,10 @@ function new_change_commit()
{ {
error=0 error=0
# subject is not longer than 50 characters # subject is not longer than 72 characters
long_subject=$(git log -n 1 --pretty=%s "$REF" | grep -E -m 1 '.{51}') long_subject=$(git log -n 1 --pretty=%s "$REF" | grep -E -m 1 '.{73}')
if [ -n "$long_subject" ]; then if [ -n "$long_subject" ]; then
echo "error: commit subject over 50 characters" echo "error: commit subject over 72 characters"
error=1 error=1
fi fi


@ -65,6 +65,10 @@ PRE_BUILD="configure
then then
echo --enable-debug-dmu-tx echo --enable-debug-dmu-tx
fi fi
if [[ \${ZFS_DKMS_ENABLE_DEBUGINFO,,} == @(y|yes) ]]
then
echo --enable-debuginfo
fi
} }
) )
" "


@@ -499,7 +499,8 @@ tags = ['functional', 'mmap']
 [tests/functional/mmp]
 tests = ['mmp_on_thread', 'mmp_on_uberblocks', 'mmp_on_off', 'mmp_interval',
     'mmp_active_import', 'mmp_inactive_import', 'mmp_exported_import',
-    'mmp_write_uberblocks', 'mmp_reset_interval', 'multihost_history']
+    'mmp_write_uberblocks', 'mmp_reset_interval', 'multihost_history',
+    'mmp_on_zdb']
 tags = ['functional', 'mmp']

 [tests/functional/mount]

View File

@@ -12,13 +12,19 @@
 #
 #
-# Copyright (c) 2012, 2015 by Delphix. All rights reserved.
+# Copyright (c) 2012, 2018 by Delphix. All rights reserved.
 # Copyright (c) 2017 Datto Inc.
 #

-import ConfigParser
+# some python 2.7 system don't have a configparser shim
+try:
+    import configparser
+except ImportError:
+    import ConfigParser as configparser
+
 import os
-import logging
+import sys
 from datetime import datetime
 from optparse import OptionParser
 from pwd import getpwnam
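
For illustration (not part of the change above): the try/except import is the standard bridge for the Python 2 to 3 module rename, after which all code can target the Python 3 spelling. A minimal sketch, with a hypothetical runfile name:

    try:
        import configparser                      # Python 3 module name
    except ImportError:
        import ConfigParser as configparser      # Python 2 fallback

    config = configparser.RawConfigParser()
    config.read('example.run')    # hypothetical runfile; returns [] if absent
    print(config.sections())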
@@ -26,8 +32,6 @@ from pwd import getpwuid
 from select import select
 from subprocess import PIPE
 from subprocess import Popen
-from sys import argv
-from sys import maxint
 from threading import Timer
 from time import time

@@ -36,6 +40,10 @@ TESTDIR = '/usr/share/zfs/'
 KILL = 'kill'
 TRUE = 'true'
 SUDO = 'sudo'
+LOG_FILE = 'LOG_FILE'
+LOG_OUT = 'LOG_OUT'
+LOG_ERR = 'LOG_ERR'
+LOG_FILE_OBJ = None

 class Result(object):

@@ -79,7 +87,7 @@ class Output(object):
     """
     def __init__(self, stream):
         self.stream = stream
-        self._buf = ''
+        self._buf = b''
         self.lines = []

     def fileno(self):

@@ -104,15 +112,15 @@ class Output(object):
         buf = os.read(fd, 4096)
         if not buf:
             return None
-        if '\n' not in buf:
+        if b'\n' not in buf:
             self._buf += buf
             return []

         buf = self._buf + buf
-        tmp, rest = buf.rsplit('\n', 1)
+        tmp, rest = buf.rsplit(b'\n', 1)
         self._buf = rest
         now = datetime.now()
-        rows = tmp.split('\n')
+        rows = tmp.split(b'\n')
         self.lines += [(now, r) for r in rows]
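
The str-to-bytes switch above is needed because os.read() returns bytes on Python 3. A self-contained sketch of the same buffer-and-split logic, for illustration:

    # Accumulate raw bytes; emit only completed lines, keep the partial tail.
    pending = b''

    def feed(chunk):
        global pending
        if b'\n' not in chunk:
            pending += chunk
            return []
        data = pending + chunk
        complete, pending = data.rsplit(b'\n', 1)
        return complete.split(b'\n')

    print(feed(b'partial'))       # [] - no newline seen yet
    print(feed(b' line\nnext'))   # [b'partial line']; b'next' stays buffered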
@@ -204,23 +212,23 @@ class Cmd(object):
         if needed. Run the command, and update the result object.
         """
         if options.dryrun is True:
-            print self
+            print(self)
             return

         privcmd = self.update_cmd_privs(self.pathname, self.user)
         try:
             old = os.umask(0)
             if not os.path.isdir(self.outputdir):
-                os.makedirs(self.outputdir, mode=0777)
+                os.makedirs(self.outputdir, mode=0o777)
             os.umask(old)
-        except OSError, e:
+        except OSError as e:
             fail('%s' % e)

         self.result.starttime = time()
         proc = Popen(privcmd, stdout=PIPE, stderr=PIPE)
         # Allow a special timeout value of 0 to mean infinity
         if int(self.timeout) == 0:
-            self.timeout = maxint
+            self.timeout = sys.maxsize
         t = Timer(int(self.timeout), self.kill_cmd, [proc])

         try:
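
sys.maxsize takes over the role of the removed maxint, so a timeout of 0 still means "practically never". A rough standalone sketch of the Timer-driven kill, using a short timeout so the kill path actually fires (assumes a POSIX sleep binary):

    import subprocess
    from threading import Timer

    proc = subprocess.Popen(['sleep', '60'])
    t = Timer(1, proc.kill)      # kill the child if it outlives 1 second
    try:
        t.start()
        proc.wait()              # returns once the timer kills the child
    finally:
        t.cancel()
    print('exited with', proc.returncode)   # negative means killed by signal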
@@ -247,50 +255,52 @@ class Cmd(object):
             self.result.runtime = '%02d:%02d' % (m, s)
             self.result.result = 'SKIP'

-    def log(self, logger, options):
+    def log(self, options):
         """
         This function is responsible for writing all output. This includes
         the console output, the logfile of all results (with timestamped
         merged stdout and stderr), and for each test, the unmodified
         stdout/stderr/merged in it's own file.
         """
-        if logger is None:
-            return

         logname = getpwuid(os.getuid()).pw_name
         user = ' (run as %s)' % (self.user if len(self.user) else logname)
         msga = 'Test: %s%s ' % (self.pathname, user)
-        msgb = '[%s] [%s]' % (self.result.runtime, self.result.result)
+        msgb = '[%s] [%s]\n' % (self.result.runtime, self.result.result)
         pad = ' ' * (80 - (len(msga) + len(msgb)))
+        result_line = msga + pad + msgb

-        # If -q is specified, only print a line for tests that didn't pass.
-        # This means passing tests need to be logged as DEBUG, or the one
-        # line summary will only be printed in the logfile for failures.
+        # The result line is always written to the log file. If -q was
+        # specified only failures are written to the console, otherwise
+        # the result line is written to the console.
+        write_log(bytearray(result_line, encoding='utf-8'), LOG_FILE)
         if not options.quiet:
-            logger.info('%s%s%s' % (msga, pad, msgb))
-        elif self.result.result is not 'PASS':
-            logger.info('%s%s%s' % (msga, pad, msgb))
-        else:
-            logger.debug('%s%s%s' % (msga, pad, msgb))
+            write_log(result_line, LOG_OUT)
+        elif options.quiet and self.result.result is not 'PASS':
+            write_log(result_line, LOG_OUT)

         lines = sorted(self.result.stdout + self.result.stderr,
-                       cmp=lambda x, y: cmp(x[0], y[0]))
+                       key=lambda x: x[0])

+        # Write timestamped output (stdout and stderr) to the logfile
         for dt, line in lines:
-            logger.debug('%s %s' % (dt.strftime("%H:%M:%S.%f ")[:11], line))
+            timestamp = bytearray(dt.strftime("%H:%M:%S.%f ")[:11],
+                                  encoding='utf-8')
+            write_log(b'%s %s\n' % (timestamp, line), LOG_FILE)

+        # Write the separate stdout/stderr/merged files, if the data exists
         if len(self.result.stdout):
-            with open(os.path.join(self.outputdir, 'stdout'), 'w') as out:
+            with open(os.path.join(self.outputdir, 'stdout'), 'wb') as out:
                 for _, line in self.result.stdout:
-                    os.write(out.fileno(), '%s\n' % line)
+                    os.write(out.fileno(), b'%s\n' % line)
         if len(self.result.stderr):
-            with open(os.path.join(self.outputdir, 'stderr'), 'w') as err:
+            with open(os.path.join(self.outputdir, 'stderr'), 'wb') as err:
                 for _, line in self.result.stderr:
-                    os.write(err.fileno(), '%s\n' % line)
+                    os.write(err.fileno(), b'%s\n' % line)
         if len(self.result.stdout) and len(self.result.stderr):
-            with open(os.path.join(self.outputdir, 'merged'), 'w') as merged:
+            with open(os.path.join(self.outputdir, 'merged'), 'wb') as merged:
                 for _, line in lines:
-                    os.write(merged.fileno(), '%s\n' % line)
+                    os.write(merged.fileno(), b'%s\n' % line)

 class Test(Cmd):
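
The cmp= keyword was removed in Python 3, hence the key= form above; the two are equivalent here because the tuples sort by their timestamp. For illustration:

    from datetime import datetime

    lines = [(datetime(2019, 2, 22, 9, 47, 2), b'second'),
             (datetime(2019, 2, 22, 9, 47, 1), b'first')]
    for ts, line in sorted(lines, key=lambda x: x[0]):
        print(ts.strftime('%H:%M:%S'), line)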
@@ -318,7 +328,7 @@ class Test(Cmd):
             (self.pathname, self.outputdir, self.timeout, self.pre,
              pre_user, self.post, post_user, self.user, self.tags)

-    def verify(self, logger):
+    def verify(self):
         """
         Check the pre/post scripts, user and Test. Omit the Test from this
         run if there are any problems.

@@ -328,19 +338,19 @@ class Test(Cmd):
         for f in [f for f in files if len(f)]:
             if not verify_file(f):
-                logger.info("Warning: Test '%s' not added to this run because"
-                            " it failed verification." % f)
+                write_log("Warning: Test '%s' not added to this run because"
+                          " it failed verification.\n" % f, LOG_ERR)
                 return False

         for user in [user for user in users if len(user)]:
-            if not verify_user(user, logger):
-                logger.info("Not adding Test '%s' to this run." %
-                            self.pathname)
+            if not verify_user(user):
+                write_log("Not adding Test '%s' to this run.\n" %
+                          self.pathname, LOG_ERR)
                 return False

         return True

-    def run(self, logger, options):
+    def run(self, options):
         """
         Create Cmd instances for the pre/post scripts. If the pre script
         doesn't pass, skip this Test. Run the post script regardless.

@@ -358,18 +368,18 @@ class Test(Cmd):
         if len(pretest.pathname):
             pretest.run(options)
             cont = pretest.result.result is 'PASS'
-            pretest.log(logger, options)
+            pretest.log(options)

         if cont:
             test.run(options)
         else:
             test.skip()

-        test.log(logger, options)
+        test.log(options)

         if len(posttest.pathname):
             posttest.run(options)
-            posttest.log(logger, options)
+            posttest.log(options)

 class TestGroup(Test):

@@ -393,7 +403,7 @@ class TestGroup(Test):
             (self.pathname, self.outputdir, self.tests, self.timeout,
              self.pre, pre_user, self.post, post_user, self.user, self.tags)

-    def verify(self, logger):
+    def verify(self):
         """
         Check the pre/post scripts, user and tests in this TestGroup. Omit
         the TestGroup entirely, or simply delete the relevant tests in the

@@ -411,34 +421,34 @@ class TestGroup(Test):
         for f in [f for f in auxfiles if len(f)]:
             if self.pathname != os.path.dirname(f):
-                logger.info("Warning: TestGroup '%s' not added to this run. "
-                            "Auxiliary script '%s' exists in a different "
-                            "directory." % (self.pathname, f))
+                write_log("Warning: TestGroup '%s' not added to this run. "
+                          "Auxiliary script '%s' exists in a different "
+                          "directory.\n" % (self.pathname, f), LOG_ERR)
                 return False

             if not verify_file(f):
-                logger.info("Warning: TestGroup '%s' not added to this run. "
-                            "Auxiliary script '%s' failed verification." %
-                            (self.pathname, f))
+                write_log("Warning: TestGroup '%s' not added to this run. "
+                          "Auxiliary script '%s' failed verification.\n" %
+                          (self.pathname, f), LOG_ERR)
                 return False

         for user in [user for user in users if len(user)]:
-            if not verify_user(user, logger):
-                logger.info("Not adding TestGroup '%s' to this run." %
-                            self.pathname)
+            if not verify_user(user):
+                write_log("Not adding TestGroup '%s' to this run.\n" %
+                          self.pathname, LOG_ERR)
                 return False

         # If one of the tests is invalid, delete it, log it, and drive on.
         for test in self.tests:
             if not verify_file(os.path.join(self.pathname, test)):
                 del self.tests[self.tests.index(test)]
-                logger.info("Warning: Test '%s' removed from TestGroup '%s' "
-                            "because it failed verification." %
-                            (test, self.pathname))
+                write_log("Warning: Test '%s' removed from TestGroup '%s' "
+                          "because it failed verification.\n" %
+                          (test, self.pathname), LOG_ERR)

         return len(self.tests) is not 0

-    def run(self, logger, options):
+    def run(self, options):
         """
         Create Cmd instances for the pre/post scripts. If the pre script
         doesn't pass, skip all the tests in this TestGroup. Run the post

@@ -459,7 +469,7 @@ class TestGroup(Test):
         if len(pretest.pathname):
             pretest.run(options)
             cont = pretest.result.result is 'PASS'
-            pretest.log(logger, options)
+            pretest.log(options)

         for fname in self.tests:
             test = Cmd(os.path.join(self.pathname, fname),

@@ -470,11 +480,11 @@ class TestGroup(Test):
             else:
                 test.skip()

-            test.log(logger, options)
+            test.log(options)

         if len(posttest.pathname):
             posttest.run(options)
-            posttest.log(logger, options)
+            posttest.log(options)

 class TestRun(object):

@@ -486,7 +496,7 @@ class TestRun(object):
         self.starttime = time()
         self.timestamp = datetime.now().strftime('%Y%m%dT%H%M%S')
         self.outputdir = os.path.join(options.outputdir, self.timestamp)
-        self.logger = self.setup_logging(options)
+        self.setup_logging(options)
         self.defaults = [
             ('outputdir', BASEDIR),
             ('quiet', False),

@@ -519,7 +529,7 @@ class TestRun(object):
             for prop in Test.props:
                 setattr(test, prop, getattr(options, prop))

-            if test.verify(self.logger):
+            if test.verify():
                 self.tests[pathname] = test

     def addtestgroup(self, dirname, filenames, options):

@@ -541,9 +551,9 @@ class TestRun(object):
             self.testgroups[dirname] = testgroup
             self.testgroups[dirname].tests = sorted(filenames)

-        testgroup.verify(self.logger)
+        testgroup.verify()

-    def read(self, logger, options):
+    def read(self, options):
         """
         Read in the specified runfile, and apply the TestRun properties
         listed in the 'DEFAULT' section to our TestRun. Then read each

@@ -552,7 +562,7 @@ class TestRun(object):
         in the 'DEFAULT' section. If the Test or TestGroup passes
         verification, add it to the TestRun.
         """
-        config = ConfigParser.RawConfigParser()
+        config = configparser.RawConfigParser()
         if not len(config.read(options.runfile)):
             fail("Coulnd't read config file %s" % options.runfile)

@@ -584,7 +594,7 @@ class TestRun(object):
                 # Repopulate tests using eval to convert the string to a list
                 testgroup.tests = eval(config.get(section, 'tests'))

-                if testgroup.verify(logger):
+                if testgroup.verify():
                     self.testgroups[section] = testgroup
             else:
                 test = Test(section)
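
The eval() retained above converts the runfile's stored list string back into a Python list. As a hedged aside (not what the script does), ast.literal_eval is the stricter alternative: it accepts only Python literals, so a malformed runfile cannot execute code:

    import ast

    tests_value = "['mmp_on_thread', 'mmp_on_off', 'mmp_on_zdb']"
    print(ast.literal_eval(tests_value))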
@@ -593,7 +603,7 @@ class TestRun(object):
                 if config.has_option(sect, prop):
                     setattr(test, prop, config.get(sect, prop))

-            if test.verify(logger):
+            if test.verify():
                 self.tests[section] = test

     def write(self, options):

@@ -608,7 +618,7 @@ class TestRun(object):
         defaults = dict([(prop, getattr(options, prop)) for prop, _ in
                          self.defaults])
-        config = ConfigParser.RawConfigParser(defaults)
+        config = configparser.RawConfigParser(defaults)

         for test in sorted(self.tests.keys()):
             config.add_section(test)

@@ -637,14 +647,15 @@ class TestRun(object):
         """
         done = False
         components = 0
-        tmp_dict = dict(self.tests.items() + self.testgroups.items())
+        tmp_dict = dict(list(self.tests.items()) +
+                        list(self.testgroups.items()))
         total = len(tmp_dict)
         base = self.outputdir

         while not done:
             paths = []
             components -= 1
-            for testfile in tmp_dict.keys():
+            for testfile in list(tmp_dict.keys()):
                 uniq = '/'.join(testfile.split('/')[components:]).lstrip('/')
                 if uniq not in paths:
                     paths.append(uniq)
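
dict.items() returns a view on Python 3, and views cannot be concatenated with +, hence the list() wrappers; likewise list(tmp_dict.keys()) snapshots the keys so entries can be deleted during iteration. A tiny illustration:

    tests = {'/a/test-one': 1}
    testgroups = {'/a/group': 2}
    tmp_dict = dict(list(tests.items()) + list(testgroups.items()))
    for key in list(tmp_dict.keys()):   # safe even while mutating the dict
        tmp_dict.pop(key)
    print(tmp_dict)                     # {}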
@@ -655,42 +666,23 @@ class TestRun(object):
     def setup_logging(self, options):
         """
-        Two loggers are set up here. The first is for the logfile which
-        will contain one line summarizing the test, including the test
-        name, result, and running time. This logger will also capture the
-        timestamped combined stdout and stderr of each run. The second
-        logger is optional console output, which will contain only the one
-        line summary. The loggers are initialized at two different levels
-        to facilitate segregating the output.
+        This funtion creates the output directory and gets a file object
+        for the logfile. This function must be called before write_log()
+        can be used.
         """
         if options.dryrun is True:
             return

-        testlogger = logging.getLogger(__name__)
-        testlogger.setLevel(logging.DEBUG)
-
+        global LOG_FILE_OBJ
         if options.cmd is not 'wrconfig':
             try:
                 old = os.umask(0)
-                os.makedirs(self.outputdir, mode=0777)
+                os.makedirs(self.outputdir, mode=0o777)
                 os.umask(old)
-            except OSError, e:
+                filename = os.path.join(self.outputdir, 'log')
+                LOG_FILE_OBJ = open(filename, buffering=0, mode='wb')
+            except OSError as e:
                 fail('%s' % e)
-            filename = os.path.join(self.outputdir, 'log')
-
-            logfile = logging.FileHandler(filename)
-            logfile.setLevel(logging.DEBUG)
-            logfilefmt = logging.Formatter('%(message)s')
-            logfile.setFormatter(logfilefmt)
-            testlogger.addHandler(logfile)
-
-        cons = logging.StreamHandler()
-        cons.setLevel(logging.INFO)
-        consfmt = logging.Formatter('%(message)s')
-        cons.setFormatter(consfmt)
-        testlogger.addHandler(cons)
-
-        return testlogger

     def run(self, options):
         """
@@ -707,31 +699,31 @@ class TestRun(object):
         if not os.path.exists(logsymlink):
             os.symlink(self.outputdir, logsymlink)
         else:
-            print 'Could not make a symlink to directory %s' % (
-                self.outputdir)
+            write_log('Could not make a symlink to directory %s\n' %
+                      self.outputdir, LOG_ERR)
         iteration = 0
         while iteration < options.iterations:
             for test in sorted(self.tests.keys()):
-                self.tests[test].run(self.logger, options)
+                self.tests[test].run(options)
             for testgroup in sorted(self.testgroups.keys()):
-                self.testgroups[testgroup].run(self.logger, options)
+                self.testgroups[testgroup].run(options)
             iteration += 1

     def summary(self):
         if Result.total is 0:
             return 2

-        print '\nResults Summary'
-        for key in Result.runresults.keys():
+        print('\nResults Summary')
+        for key in list(Result.runresults.keys()):
             if Result.runresults[key] is not 0:
-                print '%s\t% 4d' % (key, Result.runresults[key])
+                print('%s\t% 4d' % (key, Result.runresults[key]))

         m, s = divmod(time() - self.starttime, 60)
         h, m = divmod(m, 60)
-        print '\nRunning Time:\t%02d:%02d:%02d' % (h, m, s)
-        print 'Percent passed:\t%.1f%%' % ((float(Result.runresults['PASS']) /
-                                            float(Result.total)) * 100)
-        print 'Log directory:\t%s' % self.outputdir
+        print('\nRunning Time:\t%02d:%02d:%02d' % (h, m, s))
+        print('Percent passed:\t%.1f%%' % ((float(Result.runresults['PASS']) /
+                                            float(Result.total)) * 100))
+        print('Log directory:\t%s' % self.outputdir)

         if Result.runresults['FAIL'] > 0:
             return 1

@@ -742,6 +734,23 @@ class TestRun(object):
         return 0

+
+def write_log(msg, target):
+    """
+    Write the provided message to standard out, standard error or
+    the logfile. If specifying LOG_FILE, then `msg` must be a bytes
+    like object. This way we can still handle output from tests that
+    may be in unexpected encodings.
+    """
+    if target == LOG_OUT:
+        os.write(sys.stdout.fileno(), bytearray(msg, encoding='utf-8'))
+    elif target == LOG_ERR:
+        os.write(sys.stderr.fileno(), bytearray(msg, encoding='utf-8'))
+    elif target == LOG_FILE:
+        os.write(LOG_FILE_OBJ.fileno(), msg)
+    else:
+        fail('log_msg called with unknown target "%s"' % target)
+
+
 def verify_file(pathname):
     """
     Verify that the supplied pathname is an executable regular file.
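
For reference, a condensed, self-contained variant of write_log() covering only the console targets; the full function above additionally routes bytes payloads to the unbuffered logfile object:

    import os
    import sys

    LOG_OUT, LOG_ERR = 'LOG_OUT', 'LOG_ERR'

    def write_log(msg, target):
        # str payloads are encoded to bytes; writing at the fileno() level
        # bypasses Python-side buffering, matching the unbuffered logfile.
        stream = sys.stdout if target == LOG_OUT else sys.stderr
        os.write(stream.fileno(), bytearray(msg, encoding='utf-8'))

    write_log('Test: demo  [00:01] [PASS]\n', LOG_OUT)
    write_log("Warning: user 'nobody' cannot use passwordless sudo.\n", LOG_ERR)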
@@ -757,7 +766,7 @@ def verify_file(pathname):
     return False

-def verify_user(user, logger):
+def verify_user(user):
     """
     Verify that the specified user exists on this system, and can execute
     sudo without being prompted for a password.

@@ -770,13 +779,15 @@ def verify_user(user, logger):
     try:
         getpwnam(user)
     except KeyError:
-        logger.info("Warning: user '%s' does not exist.", user)
+        write_log("Warning: user '%s' does not exist.\n" % user,
+                  LOG_ERR)
         return False

     p = Popen(testcmd)
     p.wait()
     if p.returncode is not 0:
-        logger.info("Warning: user '%s' cannot use passwordless sudo.", user)
+        write_log("Warning: user '%s' cannot use passwordless sudo.\n" % user,
+                  LOG_ERR)
         return False
     else:
         Cmd.verified_users.append(user)

@@ -804,7 +815,7 @@ def find_tests(testrun, options):

 def fail(retstr, ret=1):
-    print '%s: %s' % (argv[0], retstr)
+    print('%s: %s' % (sys.argv[0], retstr))
     exit(ret)

@@ -894,7 +905,7 @@ def main():
     if options.cmd is 'runtests':
         find_tests(testrun, options)
     elif options.cmd is 'rdconfig':
-        testrun.read(testrun.logger, options)
+        testrun.read(options)
     elif options.cmd is 'wrconfig':
         find_tests(testrun, options)
         testrun.write(options)

View File

@@ -31,74 +31,132 @@
 #include <string.h>
 #include <sys/mman.h>
 #include <pthread.h>
+#include <errno.h>
+#include <err.h>

 /*
  * --------------------------------------------------------------------
- * Bug Id: 5032643
+ * Bug Issue Id: #7512
+ * The bug time sequence:
+ * 1. context #1, zfs_write assign a txg "n".
+ * 2. In the same process, context #2, mmap page fault (which means the mm_sem
+ *    is hold) occurred, zfs_dirty_inode open a txg failed, and wait previous
+ *    txg "n" completed.
+ * 3. context #1 call uiomove to write, however page fault is occurred in
+ *    uiomove, which means it need mm_sem, but mm_sem is hold by
+ *    context #2, so it stuck and can't complete, then txg "n" will not
+ *    complete.
  *
- * Simply writing to a file and mmaping that file at the same time can
- * result in deadlock. Nothing perverse like writing from the file's
- * own mapping is required.
+ * So context #1 and context #2 trap into the "dead lock".
  * --------------------------------------------------------------------
  */

+#define NORMAL_WRITE_TH_NUM 2
+
 static void *
-mapper(void *fdp)
+normal_writer(void *filename)
 {
-    void *addr;
-    int fd = *(int *)fdp;
+    char *file_path = filename;
+    int fd = -1;
+    ssize_t write_num = 0;
+    int page_size = getpagesize();

-    if ((addr =
-        mmap(0, 8192, PROT_READ, MAP_SHARED, fd, 0)) == MAP_FAILED) {
-        perror("mmap");
-        exit(1);
+    fd = open(file_path, O_RDWR | O_CREAT, 0777);
+    if (fd == -1) {
+        err(1, "failed to open %s", file_path);
     }
-    for (;;) {
-        if (mmap(addr, 8192, PROT_READ,
-            MAP_SHARED|MAP_FIXED, fd, 0) == MAP_FAILED) {
-            perror("mmap");
-            exit(1);
+
+    char *buf = malloc(1);
+    while (1) {
+        write_num = write(fd, buf, 1);
+        if (write_num == 0) {
+            err(1, "write failed!");
+            break;
+        }
+        lseek(fd, page_size, SEEK_CUR);
+    }
+
+    if (buf) {
+        free(buf);
+    }
+}
+
+static void *
+map_writer(void *filename)
+{
+    int fd = -1;
+    int ret = 0;
+    char *buf = NULL;
+    int page_size = getpagesize();
+    int op_errno = 0;
+    char *file_path = filename;
+
+    while (1) {
+        ret = access(file_path, F_OK);
+        if (ret) {
+            op_errno = errno;
+            if (op_errno == ENOENT) {
+                fd = open(file_path, O_RDWR | O_CREAT, 0777);
+                if (fd == -1) {
+                    err(1, "open file failed");
+                }
+
+                ret = ftruncate(fd, page_size);
+                if (ret == -1) {
+                    err(1, "truncate file failed");
+                }
+            } else {
+                err(1, "access file failed!");
+            }
+        } else {
+            fd = open(file_path, O_RDWR, 0777);
+            if (fd == -1) {
+                err(1, "open file failed");
+            }
+        }
+
+        if ((buf = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+            MAP_SHARED, fd, 0)) == MAP_FAILED) {
+            err(1, "map file failed");
+        }
+
+        if (fd != -1)
+            close(fd);
+
+        char s[10] = {0, };
+        memcpy(buf, s, 10);
+        ret = munmap(buf, page_size);
+        if (ret != 0) {
+            err(1, "unmap file failed");
         }
     }
-    /* NOTREACHED */
-    return ((void *)1);
 }

 int
 main(int argc, char **argv)
 {
-    int fd;
-    char buf[1024];
-    pthread_t tid;
-
-    memset(buf, 'a', sizeof (buf));
+    pthread_t map_write_tid;
+    pthread_t normal_write_tid[NORMAL_WRITE_TH_NUM];
+    int i = 0;

-    if (argc != 2) {
-        (void) printf("usage: %s <file name>\n", argv[0]);
+    if (argc != 3) {
+        (void) printf("usage: %s <normal write file name>"
+            "<map write file name>\n", argv[0]);
         exit(1);
     }

-    if ((fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666)) == -1) {
-        perror("open");
-        exit(1);
-    }
-
-    (void) pthread_setconcurrency(2);
-    if (pthread_create(&tid, NULL, mapper, &fd) != 0) {
-        perror("pthread_create");
-        close(fd);
-        exit(1);
-    }
-    for (;;) {
-        if (write(fd, buf, sizeof (buf)) == -1) {
-            perror("write");
-            close(fd);
-            exit(1);
+    for (i = 0; i < NORMAL_WRITE_TH_NUM; i++) {
+        if (pthread_create(&normal_write_tid[i], NULL, normal_writer,
+            argv[1])) {
+            err(1, "pthread_create normal_writer failed.");
         }
     }

-    close(fd);
+    if (pthread_create(&map_write_tid, NULL, map_writer, argv[2])) {
+        err(1, "pthread_create map_writer failed.");
+    }

     /* NOTREACHED */
+    pthread_join(map_write_tid, NULL);
     return (0);
 }
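
For readers who want to see the access pattern without a C toolchain, a very rough Python analogue of the reproducer (illustrative only: the paths are hypothetical, the loops are bounded, and Python's GIL and I/O buffering mean it does not faithfully reproduce the kernel-level deadlock):

    import mmap
    import os
    import threading

    NORMAL_FILE = '/tmp/normal_write_file'    # hypothetical paths
    MAP_FILE = '/tmp/map_write_file'
    PAGE = mmap.PAGESIZE

    def normal_writer():
        # Sparse sequential writes, like the write(2) loop in the C test.
        with open(NORMAL_FILE, 'wb') as f:
            for _ in range(1000):
                f.write(b'x')
                f.seek(PAGE, os.SEEK_CUR)

    def map_writer():
        # Repeatedly map the second file and dirty its first few bytes.
        for _ in range(100):
            with open(MAP_FILE, 'wb+') as f:
                f.truncate(PAGE)
                with mmap.mmap(f.fileno(), PAGE) as m:
                    m[:10] = b'\0' * 10

    threads = [threading.Thread(target=normal_writer),
               threading.Thread(target=map_writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()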

View File

@@ -26,7 +26,7 @@
 #
 #
-# Copyright (c) 2012, 2018 by Delphix. All rights reserved.
+# Copyright (c) 2012, 2016 by Delphix. All rights reserved.
 # Copyright (c) 2017 Lawrence Livermore National Security, LLC.
 #

@@ -43,9 +43,8 @@
 # 1) Create 3 files
 # 2) Create a pool backed by the files
 # 3) Expand the files' size with truncate
-# 4) Use zpool reopen to check the expandsize
-# 5) Use zpool online -e to online the vdevs
-# 6) Check that the pool size was expanded
+# 4) Use zpool online -e to online the vdevs
+# 5) Check that the pool size was expanded
 #

 verify_runnable "global"

@@ -65,8 +64,8 @@ log_onexit cleanup
 log_assert "zpool can expand after zpool online -e zvol vdevs on LUN expansion"

 for type in " " mirror raidz raidz2; do
-	# Initialize the file devices and the pool
 	for i in 1 2 3; do
 		log_must truncate -s $org_size ${TEMPFILE}.$i
 	done

@@ -81,35 +80,13 @@ for type in " " mirror raidz raidz2; do
 		    "$autoexp"
 	fi
 	typeset prev_size=$(get_pool_prop size $TESTPOOL1)
-	typeset zfs_prev_size=$(get_prop avail $TESTPOOL1)
+	typeset zfs_prev_size=$(zfs get -p avail $TESTPOOL1 | tail -1 | \
+	    awk '{print $3}')

-	# Increase the size of the file devices
 	for i in 1 2 3; do
 		log_must truncate -s $exp_size ${TEMPFILE}.$i
 	done

-	# Reopen the pool and check that the `expandsize` property is set
-	log_must zpool reopen $TESTPOOL1
-	typeset zpool_expandsize=$(get_pool_prop expandsize $TESTPOOL1)
-	if [[ $type == "mirror" ]]; then
-		typeset expected_zpool_expandsize=$(($exp_size-$org_size))
-	else
-		typeset expected_zpool_expandsize=$((3*($exp_size-$org_size)))
-	fi
-
-	if [[ "$zpool_expandsize" = "-" ]]; then
-		log_fail "pool $TESTPOOL1 did not detect any " \
-		    "expandsize after reopen"
-	fi
-
-	if [[ $zpool_expandsize -ne $expected_zpool_expandsize ]]; then
-		log_fail "pool $TESTPOOL1 did not detect correct " \
-		    "expandsize after reopen: found $zpool_expandsize," \
-		    "expected $expected_zpool_expandsize"
-	fi
-
-	# Online the devices to add the new space to the pool
 	for i in 1 2 3; do
 		log_must zpool online -e $TESTPOOL1 ${TEMPFILE}.$i
 	done

@@ -119,7 +96,8 @@ for type in " " mirror raidz raidz2; do
 	sync

 	typeset expand_size=$(get_pool_prop size $TESTPOOL1)
-	typeset zfs_expand_size=$(get_prop avail $TESTPOOL1)
+	typeset zfs_expand_size=$(zfs get -p avail $TESTPOOL1 | tail -1 | \
+	    awk '{print $3}')
 	log_note "$TESTPOOL1 $type has previous size: $prev_size and " \
 	    "expanded size: $expand_size"

@@ -134,8 +112,8 @@ for type in " " mirror raidz raidz2; do
 		    grep "(+${expansion_size}" | wc -l)

 		if [[ $size_addition -ne $i ]]; then
-			log_fail "pool $TESTPOOL1 did not expand " \
-			    "after LUN expansion and zpool online -e"
+			log_fail "pool $TESTPOOL1 is not autoexpand " \
+			    "after LUN expansion"
 		fi
 	elif [[ $type == "mirror" ]]; then
 		typeset expansion_size=$(($exp_size-$org_size))

@@ -145,8 +123,8 @@ for type in " " mirror raidz raidz2; do
 		    grep "(+${expansion_size})" >/dev/null 2>&1

 		if [[ $? -ne 0 ]]; then
-			log_fail "pool $TESTPOOL1 did not expand " \
-			    "after LUN expansion and zpool online -e"
+			log_fail "pool $TESTPOOL1 is not autoexpand " \
+			    "after LUN expansion"
 		fi
 	else
 		typeset expansion_size=$((3*($exp_size-$org_size)))

@@ -156,13 +134,13 @@ for type in " " mirror raidz raidz2; do
 		    grep "(+${expansion_size})" >/dev/null 2>&1

 		if [[ $? -ne 0 ]] ; then
-			log_fail "pool $TESTPOOL1 did not expand " \
-			    "after LUN expansion and zpool online -e"
+			log_fail "pool $TESTPOOL1 is not autoexpand " \
+			    "after LUN expansion"
 		fi
 		fi
 	else
-		log_fail "pool $TESTPOOL1 did not expand after LUN expansion " \
-		    "and zpool online -e"
+		log_fail "pool $TESTPOOL1 is not autoexpanded after LUN " \
+		    "expansion"
 	fi
 	log_must zpool destroy $TESTPOOL1
 done

View File

@ -53,12 +53,14 @@ if ! is_mp; then
fi fi
log_must chmod 777 $TESTDIR log_must chmod 777 $TESTDIR
mmapwrite $TESTDIR/test-write-file & mmapwrite $TESTDIR/normal_write_file $TESTDIR/map_write_file &
PID_MMAPWRITE=$! PID_MMAPWRITE=$!
log_note "mmapwrite $TESTDIR/test-write-file pid: $PID_MMAPWRITE" log_note "mmapwrite $TESTDIR/normal_write_file $TESTDIR/map_write_file"\
"pid: $PID_MMAPWRITE"
log_must sleep 30 log_must sleep 30
log_must kill -9 $PID_MMAPWRITE log_must kill -9 $PID_MMAPWRITE
log_must ls -l $TESTDIR/test-write-file log_must ls -l $TESTDIR/normal_write_file
log_must ls -l $TESTDIR/map_write_file
log_pass "write(2) a mmap(2)'ing file succeeded." log_pass "write(2) a mmap(2)'ing file succeeded."

View File

@@ -10,6 +10,7 @@ dist_pkgdata_SCRIPTS = \
 	mmp_exported_import.ksh \
 	mmp_write_uberblocks.ksh \
 	mmp_reset_interval.ksh \
+	mmp_on_zdb.ksh \
 	setup.ksh \
 	cleanup.ksh

View File

@@ -0,0 +1,74 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
#
# Copyright (c) 2018 Lawrence Livermore National Security, LLC.
# Copyright (c) 2018 by Nutanix. All rights reserved.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/mmp/mmp.cfg
. $STF_SUITE/tests/functional/mmp/mmp.kshlib
#
# Description:
# zdb will work while multihost is enabled.
#
# Strategy:
# 1. Create a pool
# 2. Enable multihost
# 3. Run zdb -d with pool and dataset arguments.
# 4. Create a checkpoint
# 5. Run zdb -kd with pool and dataset arguments.
# 6. Discard the checkpoint
# 7. Export the pool
# 8. Run zdb -ed with pool and dataset arguments.
#
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
for DISK in $DISKS; do
zpool labelclear -f $DEV_RDSKDIR/$DISK
done
log_must mmp_clear_hostid
}
log_assert "Verify zdb -d works while multihost is enabled"
log_onexit cleanup
verify_runnable "global"
verify_disk_count "$DISKS" 2
default_mirror_setup_noexit $DISKS
log_must mmp_set_hostid $HOSTID1
log_must zpool set multihost=on $TESTPOOL
log_must zfs snap $TESTPOOL/$TESTFS@snap
log_must zdb -d $TESTPOOL
log_must zdb -d $TESTPOOL/
log_must zdb -d $TESTPOOL/$TESTFS
log_must zdb -d $TESTPOOL/$TESTFS@snap
log_must zpool export $TESTPOOL
log_must zdb -ed $TESTPOOL
log_must zdb -ed $TESTPOOL/
log_must zdb -ed $TESTPOOL/$TESTFS
log_must zdb -ed $TESTPOOL/$TESTFS@snap
log_must zpool import $TESTPOOL
cleanup
log_pass "zdb -d works while multihost is enabled"

View File

@@ -31,9 +31,12 @@
 . $STF_SUITE/include/libtest.shlib

-if ! $STF_SUITE/tests/functional/tmpfile/tmpfile_test /tmp; then
-	log_unsupported "The kernel doesn't support O_TMPFILE."
+DISK=${DISKS%% *}
+default_setup_noexit $DISK
+
+if ! $STF_SUITE/tests/functional/tmpfile/tmpfile_test $TESTDIR; then
+	default_cleanup_noexit
+	log_unsupported "The kernel/filesystem doesn't support O_TMPFILE"
 fi

-DISK=${DISKS%% *}
-default_setup $DISK
+log_pass

View File

@@ -36,13 +36,14 @@ main(int argc, char *argv[])
 	fd = open(argv[1], O_TMPFILE | O_WRONLY, 0666);
 	if (fd < 0) {
+		/*
+		 * Only fail on EISDIR. If we get EOPNOTSUPP, that means
+		 * kernel support O_TMPFILE, but the path at argv[1] doesn't.
+		 */
 		if (errno == EISDIR) {
-			fprintf(stderr, "kernel doesn't support O_TMPFILE\n");
+			fprintf(stderr,
+			    "The kernel doesn't support O_TMPFILE\n");
 			return (1);
+		} else if (errno == EOPNOTSUPP) {
+			fprintf(stderr,
+			    "The filesystem doesn't support O_TMPFILE\n");
+			return (2);
 		}
 		perror("open");
 	} else {
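
The same probe can be sketched in Python via os.O_TMPFILE (Linux-only, Python 3.4+); as in the C version, EISDIR indicates a kernel without O_TMPFILE while EOPNOTSUPP indicates a filesystem without it:

    import errno
    import os

    def tmpfile_support(directory):
        try:
            fd = os.open(directory, os.O_TMPFILE | os.O_WRONLY, 0o666)
        except OSError as e:
            if e.errno == errno.EISDIR:
                return "The kernel doesn't support O_TMPFILE"
            if e.errno == errno.EOPNOTSUPP:
                return "The filesystem doesn't support O_TMPFILE"
            raise
        os.close(fd)
        return 'O_TMPFILE supported'

    print(tmpfile_support('/tmp'))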

Some files were not shown because too many files have changed in this diff.