The bug time sequence:
1. thread #1, `zfs_write` assign a txg "n".
2. In a same process, thread #2, mmap page fault (which means the
`mm_sem` is hold) occurred, `zfs_dirty_inode` open a txg failed,
and wait previous txg "n" completed.
3. thread #1 call `uiomove` to write, however page fault is occurred
in `uiomove`, which means it need `mm_sem`, but `mm_sem` is hold by
thread #2, so it stuck and can't complete, then txg "n" will
not complete.
So thread #1 and thread #2 are deadlocked.
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Grady Wong <grady.w@xtaotech.com>
Closes#7939
On some Linux distributions, the kernel module build will not
default to building with debuginfo symbols, which can make it
difficult for debugging and testing.
For this case, we provide a flag to override the build to force
debuginfo to be produced for the kernel module build.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Neal Gompa <ngompa@datto.com>
Co-authored-by: Simon Watson <swatson@datto.com>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Simon Watson <swatson@datto.com>
Closes#8304
There's not really a reason to keep the subject length so short,
since the reason to make it this short was for making nice renders
of a summary list of the git log. With 72 characters, this still
works out fine, so let's just raise it to that so that it's easier
to give slightly more descriptive change summaries.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes#8250
Add an 'enclosure_symlinks' option to vdev_id.conf. This creates
consistently named symlinks to the enclosure devices (/dev/sg*) based
off the configuration in vdev_id.conf. The enclosure symlinks show
up in /dev/by-enclosure/<prefix>-<channel><num>. The links make it
make it easy to run sg_ses on a particular enclosure device. The
enclosure links are created in addition to the normal
/dev/disk/by-vdev links.
'enclosure_symlinks' is only valid in sas_direct configurations.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Simon Guest <simon.guest@tesujimath.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#8194
This extends vdev_id to support a new slot type, ses, for SCSI Enclosure
Services. With slot type ses, the disk slot numbers are determined by
using the device slot number reported by sg_ses for the device with
matching SAS address, found by querying all available enclosures.
This is primarily of use on systems with a deficient driver omitting
support for bay_identifier in /sys/devices. In my testing, I found that
the existing slot types of port and id were not stable across disk
replacement, so an alternative was required.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Simon Guest <simon.guest@tesujimath.org>
Closes#6956
On systems with SCSI rather than SAS disk topology, this change enables
the vdev_id script to match against the block device path, and therefore
create a vdev alias in /dev/disk/by-vdev.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Simon Guest <simon.guest@tesujimath.org>
Closes#6592
Macro ZFS_MINOR, introduced in commit a6cc9756 to record the chosen
static minor number for /dev/zfs, conflicts with an existing macro
in Lustre. The lustre macro (along with _MAJOR, _PATCH, _FIX) is
used to record the zfsonlinux version Lustre is being built against.
Since the Lustre macro came first, and is used in past versions of
lustre at least going back to 2.10, it makes sense to rename the
macro in ZFS instead of doing so in Lustre which would require
backporting the patch.
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#8195
Historically a dynamic misc minor number was registered for the
/dev/zfs device in order to prevent minor number collisions. This
was fine but it prevented us from being able to use the kernel
module auto-loaded which requires a known reserved value.
Resolve this issue by adding a configure test to find an available
misc minor number which can then be used in MODULE_ALIAS_MISCDEV at
build time. By adding this alias the zfs kmod is added to the list
of known static-nodes and the systemd-tmpfiles-setup-dev service
will create a /dev/zfs character device at boot time.
This in turn allows us to update the 90-zfs.rules file to make it
aware this is a static node. The upshot of this is that whenever
a process (zpool, zfs, zed) opens the /dev/zfs the kmods will be
automatic loaded. This even works for unprivileged users so there
is no longer a need to manually load the modules at boot time.
As an additional bonus the zed now no longer needs to start after
the zfs-import.service since it will trigger the module load.
In the unlikely event the minor number we selected conflicts with
another out of tree unregistered minor number the code falls back
to dynamically allocating it. In this case the modules again
must be manually loaded.
Note that due to the change in the method of registering the minor
number the zimport.sh test case may incorrectly fail when the
static node for the installed packages is created instead of the
dynamic one. This issue will only transiently impact zimport.sh
for this single commit when we transition and are mixing and
matching methods.
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes#7287
This fixes the build when cross-compiling, where the preprocessor might
be prefixed.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ben Wolsieffer <benwolsieffer@gmail.com>
Closes#8180
The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary -
we can instead pass a flag down in a few places to prevent recursive
dbuf eviction. Making this change has 3 benefits:
1. The code semantics are easier to understand.
2. On Linux, performance is improved, because creating/removing
TSD values (by setting to NULL vs non-NULL) is expensive, and
we do it very often.
3. According to Nexenta, the current semantics can cause a
deadlock when concurrently calling dmu_objset_evict_dbufs()
(which is rare today, but they are working on a "parallel
unmount" change that triggers this more easily):
Porting Notes:
* Minor conflict with OpenZFS 9337 which has not yet been ported.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9577
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/645
External-issue: DLPX-58547
Closes#7602
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#6962
Use automake to build initramfs scripts and hooks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#6761
This adds a BuildRequires for gcc, make, and elfutils-libelf-devel
into our spec files. gcc has been a packaging requirement for
awhile now:
https://fedoraproject.org/wiki/Packaging:C_and_C%2B%2B
These additional BuildRequires allow us to mock build in
Fedora 29.
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#8095Closes#8102
From, https://lintlyci.github.io/Flake8Rules/rules/W605.html
As of Python 3.6, a backslash-character pair that is not a valid
escape sequence now generates a DeprecationWarning. Although this
will eventually become a SyntaxError, that will not be for several
Python releases.
Note 'float_pobj' was simply removed from arcstat.py since it
was entirely unused.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#8056
In CentOS 7.5 the kernel provided a compatibility wrapper to support
O_TMPFILE. This results in the test setup script correctly detecting
kernel support. But the ZFS module was built without O_TMPFILE
support due to the non-standard CentOS kernel interface.
Handle this case by updating the setup check to fail either when
the kernel or the ZFS module fail to provide support. The reason
will be clearly logged in the test results.
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7528
It's helpful if there are pools with same names,
but you need to use only one of them.
Main case is twin servers, meanwhile some software
requires the same name of pools (e.g. Proxmox).
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Igor ‘guardian’ Lidin of Moscow, Russia
Closes#8052
Update arc_release to use arc_buf_size(). This hunk was accidentally
dropped when porting compressed send/recv, 2aa34383b.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#8000
Recent changes in the Linux kernel made it necessary to prefix
the refcount_add() function with zfs_ due to a name collision.
To bring the other functions in line with that and to avoid future
collisions, prefix the other refcount functions as well.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes#7963
torvalds/linux@59b57717f ("blkcg: delay blkg destruction until
after writeback has finished") added a refcount_t to the blkcg
structure. Due to the refcount_t compatibility code, zfs_refcount_t
was used by mistake.
Resolve this by removing the compatibility code and replacing the
occurrences of refcount_t with zfs_refcount_t.
Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes#7885Closes#7932
Bandwidth and iops are average per second while *_wait are averages
per request for latency or, for queue depths, an instantaneous
measurement at the end of an interval (according to man zpool).
When calculating the first two it makes sense to do
x/interval_duration (x being the increase in total bytes or number of
requests over the duration of the interval, interval_duration in
seconds) to 'scale' from amount/interval_duration to amount/second.
But applying the same math for the latter (*_wait latencies/queue) is
wrong as there is no interval_duration component in the values (these
are time/requests to get to average_time/request or already an
absulute number).
This bug leads to the only correct continuous *_wait figures for both
latencies and queue depths from 'zpool iostat -l/q' being with
duration=1 as then the wrong math cancels itself (x/1 is a nop).
This removes temporal scaling from latency and queue depth figures.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes#7945Closes#7694
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.
This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.
Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.
Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7927Closes#7122Closes#7937
Since zdb opens the pools read-only, it cannot damage the pool in the
event the pool is already imported either on the same host or on
another one.
If the pool vdev structure is changing while zdb is importing the
pool, it may cause zdb to crash. However this is unlikely, and in any
case it's a user space process and can simply be run again.
For this reason, zdb should disable the multihost activity test on
import that is normally run.
This commit fixes a few zdb code paths where that had been overlooked.
It also adds tests to ensure that several common use cases handle this
properly in the future.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gu Zheng <guzheng2331314@163.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#7797Closes#7801
When doing a read from disk, ZFS creates 3 ZIO's: a zio_null(), the
logical zio_read(), and then a physical zio. Currently, each of these
results in a separate taskq_dispatch(zio_execute).
On high-read-iops workloads, this causes a significant performance
impact. By processing all 3 ZIO's in a single taskq entry, we reduce the
overhead on taskq locking and context switching. We accomplish this by
allowing zio_done() to return a "next zio to execute" to zio_execute().
This results in a ~12% performance increase for random reads, from
96,000 iops to 108,000 iops (with recordsize=8k, on SSD's).
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59292
Closes#7736
Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the useraccounting code
is run when the type is set DMU_OT_NONE, but the dnode is not yet
evictable, leading to crashes. This patch adds the ability for
dnodes to track which txg they were last dirtied in and adds a
check for this before performing the reclaim.
This patch also corrects several instances when dn_dirty_link was
treated as a list_node_t when it is technically a multilist_node_t.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7147Closes#7388
This change was missed in 0735ecb334.
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: András Korn <korn-github.com@elan.rulez.org>
Closes#7096
This reverts commit 2a16d4cfaf.
The commit was causing a "attempt to access beyond the end
of device" error:
list.zfsonlinux.org/pipermail/zfs-discuss/2018-September/032217.html
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes#7764Closes#7864
fs/zfs/zfs/metaslab.c:1055:2: error: positional initialization of field
in ‘struct’ declared with ‘designated_init’ attribute
[-Werror=designated-init]
metaslab_rt_remove,
Signed-off-by: Georgy Yakovlev <ya@sysdump.net>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes: #7069
Currently, there is a bug where older send streams without the
DMU_BACKUP_FEATURE_LARGE_DNODE flag are not handled correctly.
The code in receive_object() fails to handle cases where
drro->drr_dn_slots is set to 0, which is always the case when the
sending code does not support this feature flag. This patch fixes
the issue by ensuring that that a value of 0 is treated as
DNODE_MIN_SLOTS.
Tested-by: DHE <git@dehacked.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7617Closes#7662
Currently, when the receive_object() code wants to reclaim an
object, it always assumes that the dnode is the legacy 512 bytes,
even when the incoming bonus buffer exceeds this length. This
causes a buffer overflow if --enable-debug is not provided and
triggers an ASSERT if it is. This patch resolves this issue and
adds an ASSERT to ensure this can't happen again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7097Closes#7433
This is a port of 047116ac - Raw sends must be able to decrease nlevels,
to the zfs-0.7-stable branch. It includes the various fixes to the
problem of receiving incremental streams which include reallocated dnodes
in which the number of dnode slots has changed but excludes the parts
which are related to raw streams.
From 047116ac:
Currently, when a raw zfs send file includes a
DRR_OBJECT record that would decrease the number of
levels of an existing object, the object is reallocated
with dmu_object_reclaim() which creates the new dnode
using the old object's nlevels. For non-raw sends this
doesn't really matter, but raw sends require that
nlevels on the receive side match that of the send
side so that the checksum-of-MAC tree can be properly
maintained. This patch corrects the issue by freeing
the object completely before allocating it again in
this case.
This patch also corrects several issues with
dnode_hold_impl() and related functions that prevented
dnodes (particularly multi-slot dnodes) from being
reallocated properly due to the fact that existing
dnodes were not being fully cleaned up when they
were freed.
This patch adds a test to make sure that zfs recv
functions properly with incremental streams containing
dnodes of different sizes.
This also includes a one-liner fix from loli10K to fix a test failure:
https://github.com/zfsonlinux/zfs/pull/7792#discussion_r212769264
Authored-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
Closes#6821Closes#6864
NOTE: This is the first of the port of 3 related patches patches to the
zfs-0.7-release branch of ZoL. The other two patches should immediately
follow this one.
Fix a bunch of truncation compiler warnings that show up
on Fedora 28 (GCC 8.0.1).
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7368Closes#7826Closes#7830
BuildRequires tags for "-devel" packages in the RPM spec file do not
work when building on Debian-based distributions.
Fix this issue by making this requirement conditional to RPM-based
distributions.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#7829Closes#7831
The zfs-test package needs a build requirement on the libaio-devel
package. Without it ./configure will correctly determine that
mmap_libaio cannot be built and it will be skipped.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7821Closes#7824
The zfs-dracut package requires the hostid, basename, head, awk,
and grep utilities be installed. The first three are provided by
coreutils but additional dependencies are required for awk and grep.
Reviewed-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7729Closes#7747
argv[] gets modified during string parsing for input arguments. This
is reflected in the live process listing. Don't do that.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes#7760
This change allows the arcstat.py script to handle unsupported options
gracefully and print both error and usage messages when one such option
is provided.
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#7799
This change modifies how 'checksum' and 'dedup' properties are verified
in zfs_check_settable() handling the case where they are explicitly
inherited in the dataset hierarchy when receiving a recursive send
stream.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#7755Closes#7576Closes#7757
When receiving an incremental send stream with intermediary snapshots
zfs_receive_one() does not correctly identify the top-level dataset:
consequently we restore said snapshots as if they were children
datasets in the hierarchy, forcing inheritance of any property received
with 'zfs send -o' and effectively removing any locally set value.
The test case did not correctly verify this situation because it uses
adjacent snapshots, basically testing 'zfs send -i' instead of
'zfs send -I': this commit adds an additional intermediary snapshot to
the test script.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#7478
Porting notes:
* As of grub-2.02 these checksums are not supported. However, as
pointed out in #6501 there are alternatives such as EFISTUB which
work and have no such restriction. A warning was added to the
checksum property section of the zfs.8 man page.
Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/8906
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52fCloses#6501Closes#7714
Commit 93b43af10 inadvertently introduced the following scenario which
can result in a deadlock. This issue was most easily reproduced by
LXD containers using a ZFS storage backend but should be reproducible
under any workload which is frequently mounting and unmounting.
-- THREAD A --
spa_sync()
spa_sync_upgrades()
rrw_enter(&dp->dp_config_rwlock, RW_WRITER, FTAG); <- Waiting on B
-- THREAD B --
mount_fs()
zpl_mount()
zpl_mount_impl()
dmu_objset_hold()
dmu_objset_hold_flags()
dsl_pool_hold()
dsl_pool_config_enter()
rrw_enter(&dp->dp_config_rwlock, RW_READER, tag);
sget()
sget_userns()
grab_super()
down_write(&s->s_umount); <- Waiting on C
-- THREAD C --
cleanup_mnt()
deactivate_super()
down_write(&s->s_umount);
deactivate_locked_super()
zpl_kill_sb()
kill_anon_super()
generic_shutdown_super()
sync_filesystem()
zpl_sync_fs()
zfs_sync()
zil_commit()
txg_wait_synced() <- Waiting on A
Reviewed by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7598Closes#7659Closes#7691Closes#7693
Update the SA_COPY_DATA macro to check if architecture supports
efficient unaligned memory accesses at compile time. Otherwise
fallback to using the sa_copy_data() function.
The kernel provided CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
used to determine availability in kernel space. In user space
the x86_64, x86, powerpc, and sometimes arm architectures will
define the HAVE_EFFICIENT_UNALIGNED_ACCESS macro.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7642Closes#7684
Add a default 4 KiB ashift for Amazon EC2 NVMe devices on instances with
NVMe ephemeral devices, such as the types c5d, f1, i3 and m5d.
As per the official documentation [1] a 4096 byte blocksize should be
used to match the underlying hardware.
The string was identified via:
$ sudo sginfo -M /dev/nvme0n1
INQUIRY response (cmd: 0x12)
----------------------------
Device Type 0
Vendor: NVMe
Product: Amazon EC2 NVMe
Revision level:
$ lsblk -io KNAME,TYPE,SIZE,MODEL
KNAME TYPE SIZE MODEL
nvme0n1 disk 442.4G Amazon EC2 NVMe Instance Storage
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/
storage-optimized-instances.html
Retrived 2018-07-03
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Troels Nørgaard <tnn@tradeshift.com>
Closes#7676
The blk_queue_stackable() function was replaced in the 4.14 kernel
by queue_is_rq_based(), commit torvalds/linux@5fdee212. This change
resulted in the default elevator being used which can negatively
impact performance.
Rather than adding additional compatibility code to detect the
new interface unconditionally attempt to set the elevator. Since
we expect this to fail for block devices without an elevator the
error message has been moved in to zfs_dbgmsg().
Finally, it was observed that the elevator_change() was removed
from the 4.12 kernel, commit torvalds/linux@c033269. Update the
comment to clearly specify which are expected to export the
elevator_change() symbol.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7645
1. Add a proc entry to display the pool's state:
$ cat /proc/spl/kstat/zfs/tank/state
ONLINE
This is done without using the spa config locks, so it will
never hang.
2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#7331Closes#7563
Update bdev_capacity to have wholedisk vdevs query the
size of the underlying block device (correcting for the size
of the efi parition and partition alignment) and therefore detect
expanded space.
Correct vdev_get_stats_ex so that the expandsize is aligned
to metaslab size and new space is only reported if it is large
enough for a new metaslab.
Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
External-issue: LX-165
Closes#7546
Issue #7582