The existing rules miss nvme disk devices because of the trailing
digits in the KERNEL device name, e.g. nvme0n1. Partitions of nvme
disk devices are already properly handled by the existing rule for
ENV{DEVTYPE}=="partition".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Thomas Geppert <geppi@digitx.de>
Closes#9730
On systems that utilize TTY for password entry, if the kernel
option "quiet" is set, the system would appear to freeze on a
blank screen, when in fact it is waiting for password entry
from the user.
Since TTY is the fallback method, this has no effect on systemd
or plymouth password prompting.
By temporarily setting "printk" to "7", running the command,
then resuming with the original "printk" state, the user can
see the password prompt.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes#9731
From Steve Langasek <steve.langasek@canonical.com>:
> The poorly-named 'FRAMEBUFFER' option in initramfs-tools controls
> whether the console_setup and plymouth scripts are included and used
> in the initramfs. These are required for any initramfs which will be
> prompting for user input: console_setup because without it the user's
> configured keymap will not be set up, and plymouth because you are
> not guaranteed to have working video output in the initramfs without
> it (e.g. some nvidia+UEFI configurations with the default GRUB
> behavior).
> The zfs initramfs script may need to prompt the user for passphrases
> for encrypted zfs datasets, and we don't know definitively whether
> this is the case or not at the time the initramfs is constructed (and
> it's difficult to dynamically populate initramfs config variables
> anyway), therefore the zfs-initramfs package should just set
> FRAMEBUFFER=yes in a conf snippet the same way that the
> cryptsetup-initramfs package does
> (/usr/share/initramfs-tools/conf-hooks.d/cryptsetup).
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856408
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Steve Langasek <steve.langasek@canonical.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes#9723
Apply umask to `mode` which will eventually be applied to inode.
This is needed since VFS doesn't apply umask for O_TMPFILE files.
(Note that zpl_init_acl() applies `ip->i_mode &= ~current_umask();`
only when POSIX ACL is used.)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes#8997Closes#8998
Currently, 'zfs list' and 'zfs get' commands can be slow when
working with snapshots that have a ds_props_obj. This is
because the code that discovers all of the properties for these
snapshots needs to read this object for each snapshot, which
almost always ends up causing an extra random synchronous read
for each snapshot. This performance penalty exists even if the
properties on that snapshot have been unset because the object
is normally only freed when the snapshot is freed, even though
it is only created when it is needed.
This patch allows the user to regain 'zfs list' performance on
these snapshots by destroying the ds_props_obj when it no longer
has any entries left. In practice on a production machine, this
optimization seems to make 'zfs list' about 55% faster.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#9704
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed). Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().
Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.
Found by AddressSanitizer:
ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
#0 0x7fe88b0c166d (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
#1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
#2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
#3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
#4 0x55ffbc769fba in ztest_execute ztest.c:6714
#5 0x55ffbc779a90 in ztest_thread ztest.c:6761
#6 0x7fe889cbc6da in start_thread
#7 0x7fe8899e588e in __clone
0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
#0 0x7fe88b14e7b8 in __interceptor_free
#1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
#2 0x7fe88ae543ba in nvpair_free nvpair.c:844
#3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
#4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
#5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
#6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
#7 0x55ffbc769fba in ztest_execute ztest.c:6714
#8 0x55ffbc779a90 in ztest_thread ztest.c:6761
#9 0x7fe889cbc6da in start_thread
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#9706
This interferes with zdb_read_block trying all the decompression
algorithms when the 'd' flag is specified, as some are
expected to fail. Also control the output when guessing
algorithms, try the more common compression types first, allow
specifying lsize/psize, and fix an uninitialized variable.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#9612Closes#9630
This change allows us to align the code dump logic across platforms.
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#9691
Update the vdev_disk_open() retry logic to use a specified number
of milliseconds to be more robust. Additionally, on failure log
both the time waited and requested timeout to the internal log.
The default maximum allowed open retry time has been increased
from 500ms to 1000ms.
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9680
Conflicts:
This sets send_realloc_files.ksh to use properties.shlib
(like the other compression related tests)
It was missing from #9645
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Issue #9645Closes#9679
arc_summary3 reports L2ARC hits and misses as Bytes, whereas they
should be reported as events. arc_summary2 reports these correctly.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#9669
The checksum display code of zdb_read_block uses a zio
to read in the block and then calls zio_checksum_compute.
Use a new zio in the call to zio_checksum_compute not the zio
from the read which has been destroyed by zio_wait.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#9644Closes#9657
In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO
to read data from the original storage device. Unfortunately pointer
to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if
some other read try to bump the ZIO priority, it will crash.
The problem is reproducible by corrupting L2ARC content and reading
some data with prefetch if l2arc_noprefetch tunable is changed to 0.
With the default setting the issue is probably not reproducible now.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#9648
There may be circumstances where it's desirable that all blocks
in a specified dataset be stored on the special device. Relax
the artificial 128K limit and allow the special_small_blocks
property to be set up to 1M. When blocks >1MB have been enabled
via the zfs_max_recordsize module option, this limit is increased
accordingly.
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9131Closes#9355
Remove the specific gitignore rules for module left-overs and add a
generic one in modules/.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes#9656
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#9611
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#9611
The command line switch -A (ignore ASSERTs) has always been available
in zdb but was never connected up to the correct global variable.
There are times when you need zdb to ignore asserts and keep dumping
out whatever information it can get despite the ASSERT(s) failing.
It was always intended to be part of zdb but was incomplete.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#9610
As described in commit f81d5ef6 the zfs_vdev_elevator module
option is being removed. Users who require this functionality
should update their systems to set the disk scheduler using a
udev rule.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664Closes#9417Closes#9609
The function zdb_read_block (zdb -R) was always intended to have a :c
flag which would read the DVA and length supplied by the user, and
display the checksum. Since we don't know which checksum goes with
the data, we should calculate and display them all.
For each checksum in the table, read in the data at the supplied
DVA:length, calculate the checksum, and display it. Update the man
page and create a zfs test for the new feature.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#9607
The changes in commit 41e1aa2a / PR #9583 introduced a regression on
tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started
to fail with errno ENODATA:
openat(AT_FDCWD, "/test", O_RDWR|O_TMPFILE, 0666) = 3
<...>
fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA
The originally proposed change on PR #9583 is not susceptible to it,
so just move the code/if-checks around back in that way, to fix it.
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Heitor Alves de Siqueira <halves@canonical.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes#9602
The tst.terminate_by_signal test case may occasionally fail when
running in a less consistent virtual environment. For all observed
failures the process was terminated correctly but it took longer than
expected resulting in too many snapshot being created.
To minimize the likelyhood of this occuring increase the threshold
from 50 to 90 snapshots. The larger limit will still verifiy that
the channel program was correctly terminated early.
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9601
Use `printf` to properly interpret unicode characters.
Illumos uses a utility called `zlook` to allow additional flags to be
provided to readdir and lookup for testing. This functionality could
be ported to Linux, but even without it several of the tests can be
enabled by instead using the standard `test` command.
Additional, work is required to enable the remaining test cases.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Issue #7633Closes#8812
df58307 removed the need to specify -d 1 when zfs list and zfs get are
called with -t snapshot on a datset. This commit extends the same
behaviour to -t bookmark.
This commit also introduces the 'snap' shorthand for snapshots from
zfs list to zfs get.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#9589
If zp->z_unlinked is set, we're working with a znode that has been
marked for deletion. If that's the case, we can skip the "goto again"
loop and return ENOENT, as the znode should not be discovered.
Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com>
Closes#9583
Removes an incorrect error message from libzfs that suggests applying
'-r' when a zfs subcommand is called with a filesystem path while
expecting either a snapshot or bookmark path.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#9574
zed.service does not exist
replaced with correct service name in man.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes#9581
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted. To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#9546Closes#9577
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes#9034
When `zpool create -o <property>` is run without root permissions
and the pool property requested is not specifically enumerated in
zpool_valid_proplist(). Then an incorrect error message referring
to an invalid property is printed rather than the expected permission
denied error.
Specifying a pool property at create time should be handled the same
way as filesystem properties in zfs_valid_proplist(). There should
not be default zfs_error_aux() set for properties which are not
listed.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9550Closes#9568
Before my ZIL space optimization few years ago 128KB writes were logged
as two 64KB+ records in two 128KB log blocks. After that change it
became ~127KB+/1KB+ in two 128KB log blocks to free space in the second
block for another record. Unfortunately in case of 128KB only writes,
when space in the second block remained unused, that change increased
write latency by unbalancing checksum computation and write times
between parallel threads. It also didn't help with SLOG space
efficiency in that case.
This change introduces new 68KB log block size, used for both writes
below 67KB and 128KB-sharp writes. Writes of 68-127KB are still using
one 128KB block to not increase processing overhead. Writes above
131KB are still using full 128KB blocks, since possible saving there
is small. Mixed loads will likely also fall back to previous 128KB,
since code uses maximum of the last 16 requested block sizes.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#9409
Don't ask for the password / try to load the key if the key for the
encryptionroot is already loaded. The user might have loaded the key
manually or by other means before the scripts get called.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com>
Closes#9495Closes#9529
Some systemd users may want to change configurations in
/etc/defaults/zfs, but these settings won't affect systemd
services.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mo Zhou <cdluminate@gmail.com>
Closes#9544
Address two prototype related warnings emitted by clang.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#9535
Removes the 'ZFS=' prefix from $BOOTFS instead of $root. This makes sure
that the 'zfs:' prefix remains stripped so that users with
'root=zfs:dataset' cmdline can have key loaded on boot again.
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes#9520
Remove the stray leading + from the Makefile. This was
preventing the autosnap.lua channel program from being
properly included by `make dist`.
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9527
Currently, when you call 'zfs change-key' on an encrypted dataset
that has an unencrypted child, the code will trigger a VERIFY.
This VERIFY is leftover from before we allowed unencrypted
datasets to exist underneath encrypted ones. This patch fixes the
issue by simply replacing the VERIFY with an early return when
recursing through datasets.
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#9524
In original implementation, zpool history will read the whole history
before printing anything, causing memory usage goes unbounded. We fix
this by breaking it into read-print iterations.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes#9516
Currently, incremental recursive encrypted receives fail to work
for any snapshot after the first. The reason for this is because
the check in zfs_setup_cmdline_props() did not properly realize
that when the user attempts to use '-x encryption' in this
situation, they are not really overriding the existing encryption
property and instead are attempting to prevent it from changing.
This resulted in an error message stating: "encryption property
'encryption' cannot be set or excluded for raw or incremental
streams".
This problem is fixed by updating the logic to expect this use
case.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#9494
* Use .ksh extension for ksh scripts, not .sh
* Remove .ksh extension from tests in common.run
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#9502
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#9486
With 4k disks, this test will fail in the last section because the
expected human readable value of 20.0M is reported as 20.1M. Rather than
use the human readable property, switch to the parsable property and
verify that the values are reasonably close.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes#9477
Giving a name to this enum makes it discoverable from
debugging tools like DRGN and SDB. For example, with
the name proposed on this patch we can iterate over
these values in DRGN:
```
>>> prog.type('enum kmc_bit').enumerators
(('KMC_BIT_NOTOUCH', 0), ('KMC_BIT_NODEBUG', 1),
('KMC_BIT_NOMAGAZINE', 2), ('KMC_BIT_NOHASH', 3),
('KMC_BIT_QCACHE', 4), ('KMC_BIT_KMEM', 5),
('KMC_BIT_VMEM', 6), ('KMC_BIT_SLAB', 7),
...
```
This enables SDB to easily pretty-print the flags of
the spl_kmem_caches in the system like this:
```
> spl_kmem_caches -o "name,flags,total_memory"
name flags total_memory
------------------------ ----------------------- ------------
abd_t KMC_NOMAGAZINE|KMC_SLAB 4.5MB
arc_buf_hdr_t_full KMC_NOMAGAZINE|KMC_SLAB 12.3MB
... <cropped> ...
ddt_cache KMC_VMEM 583.7KB
ddt_entry_cache KMC_NOMAGAZINE|KMC_SLAB 0.0B
... <cropped> ...
zio_buf_1048576 KMC_NODEBUG|KMC_VMEM 0.0B
... <cropped> ...
```
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes#9478
Currently, for certain sizes and classes of allocations we use
SPL caches that are backed by caches in the Linux Slab allocator
to reduce fragmentation and increase utilization of memory. The
way things are implemented for these caches as of now though is
that we don't keep any statistics of the allocations that we
make from these caches.
This patch enables the tracking of allocated objects in those
SPL caches by making the trade-off of grabbing the cache lock
at every object allocation and free to update the respective
counter.
Additionally, this patch makes those caches visible in the
/proc/spl/kmem/slab special file.
As a side note, enabling the specific counter for those caches
enables SDB to create a more user-friendly interface than
/proc/spl/kmem/slab that can also cross-reference data from
slabinfo. Here is for example the output of one of those
caches in SDB that outputs the name of the underlying Linux
cache, the memory of SPL objects allocated in that cache,
and the percentage of those objects compared to all the
objects in it:
```
> spl_kmem_caches | filter obj.skc_name == "zio_buf_512" | pp
name ... source total_memory util
----------- ... ----------------- ------------ ----
zio_buf_512 ... kmalloc-512[SLUB] 16.9MB 8
```
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes#9474
While it may sometimes be convenient to export an NFS filesystem with
no_root_squash it should not be the default behavior. Align the
default behavior with the Linux NFS server defaults. To restore
the previous behavior use 'zfs set sharenfs="no_root_squash,..."'.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9397Closes#9425
After commit 5e74ac51 which split and reordered the run files the
`zpool_status_-s` test began failing. The new ordering placed
the test after a previous test which used `zpool replace` to replace
a disk but did not clear its label. This resulted in the next test,
`zpool_status_-s`, failing because of the potentially active
pool being detected on the replaced vdev.
/dev/loop0 is part of potentially active pool 'testpool'
Use the default_mirror_setup_noexit() and default_cleanup_noexit()
functions to create the pool in `zpool_status_-s`. They use the -f
flag by default.
In the `scrub_after_resilver` test wipe the label during cleanup
to prevent future failures if the tests are again reordered.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#9451
Since 0.7.0, zpool import would unconditionally block on udev for 30
seconds. This introduced a regression in initramfs environments that
lack udev (particularly mdev based environments), yet use a zfs userland
tools intended for the system that had been built against udev. Gentoo's
genkernel is the main example, although custom user initramfs
environments would be similarly impacted unless special builds of the
ZFS userland utilities were done for them. Such environments already
have their own mechanisms for blocking until device nodes are ready
(such as genkernel's scandelay parameter), so it is unnecessary for
zpool import to block on a non-existent udev until a timeout is reached
inside of them.
Rather than trying to intelligently determine whether udev is available
on the system to avoid unnecessarily blocking in such environments, it
seems best to just allow the environment to override the timeout. I
propose that we add an environment variable called
ZPOOL_IMPORT_UDEV_TIMEOUT_MS. Setting it to 0 would restore the 0.6.x
behavior that was more desirable in mdev based initramfs environments.
This allows the system user land utilities to be reused when building
mdev-based initramfs archives.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#9436
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#9445
Mostly whitespace changes, no functional changes intended.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#9447