Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
vaclavskala	8929355b4c	Propagate extent_bytes change to autotrim thread The autotrim thread reads the zfs_trim_extent_bytes_min and zfs_trim_extent_bytes_max variables only on thread start. We should check for parameter changes during thread execution to allow parameter changes to take effect without needing to disable then restart the autotrim. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Václav Skála <skala@vshosting.cz> Closes #14077	2022-11-01 12:48:23 -07:00
Coleman Kane	212ba9bd97	Linux 6.1 compat: change order of sys/mutex.h includes After Linux 6.1-rc1 came out, the build started failing on a couple of the files in the Linux SPL code due to the mutex_init redefinition. Moving the sys/mutex.h include to a lower position within these two files appears to fix the problem. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #14040	2022-11-01 12:44:56 -07:00
Brian Behlendorf	7ce097c874	Linux 6.0 compat: META Update the META file to reflect compatibility with the 6.0 kernel. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14091	2022-11-01 12:43:49 -07:00
Alexander	3e767e34bd	Linux compat: fix DECLARE_EVENT_CLASS() test when ZFS is built-in ZFS_LINUX_TRY_COMPILE_HEADER macro doesn't take CONFIG_ZFS=y into account. As a result, on several latest Linux versions, configure script marks DECLARE_EVENT_CLASS() available for non-GPL when ZFS is being built as a module, but marks it unavailable when ZFS is built-in. Follow the logic of the neighbor macros and adjust ZFS_LINUX_TRY_COMPILE_HEADER accordingly, so that it doesn't try to look for a .ko when ZFS is built-in. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #14006	2022-11-01 12:43:29 -07:00
Christian Schwarz	df000276b8	zfs_domount: fix double-disown of dataset / double-free of zfsvfs_t Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we leave zfsvfs != NULL. This will lead to execution of the error handling `if` statement at the `out` label, and hence to a call to dmu_objset_disown and zfsvfs_free. However, zfs_umount, which we call upon failure of zfs_root and d_make_root already does dmu_objset_disown and zfsvfs_free. I suppose this patch rather adds to the brittleness of this part of the code base, but I don't want to invest more time in this right now. To add a regression test, we'd need some kind of fault injection facility for zfs_root or d_make_root, which doesn't exist right now. And even then, I think that regression test would be too closely tied to the implementation. To repro the double-disown / double-free, do the following: 1. patch zfs_root to always return an error 2. mount a ZFS filesystem Here's the stack trace you would see then: VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000) PANIC at dsl_dataset.c:1003:dsl_dataset_disown() Showing stack for process 28332 CPU: 2 PID: 28332 Comm: zpool Tainted: G O 5.10.103-1.nutanix.el7.x86_64 #1 Call Trace: dump_stack+0x74/0x92 spl_dumpstack+0x29/0x2b [spl] spl_panic+0xd4/0xfc [spl] dsl_dataset_disown+0xe9/0x150 [zfs] dmu_objset_disown+0xd6/0x150 [zfs] zfs_domount+0x17b/0x4b0 [zfs] zpl_mount+0x174/0x220 [zfs] legacy_get_tree+0x2b/0x50 vfs_get_tree+0x2a/0xc0 path_mount+0x2fa/0xa70 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x38/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com> Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com> Closes #14025	2022-11-01 12:42:32 -07:00
Richard Yao	7a1b6c51d0	Linux: Remove ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST autotools check On older kernels, the definition for `module_param_call()` typecasts function pointers to `(void *)`, which triggers -Werror, causing the check to return false when it should return true. Fixing this breaks the build process on some older kernels because they define a `__check_old_set_param()` function in their headers that checks for a non-constified `->set()`. We workaround that through the c preprocessor by defining `__check_old_set_param(set)` to `(set)`, which prevents the build failures. However, it is now apparent that all kernels that we support have adopted the GRSecurity change, so there is no need to have an explicit autotools check for it anymore. We therefore remove the autotools check, while adding the workaround to our headers for the build time non-constified `->set()` check done by older kernel headers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13984 Closes #14004	2022-11-01 12:42:01 -07:00
George Melikov	4dd9c3b08e	CI: bump actions/upload-artifact to v3 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #14018	2022-11-01 12:38:22 -07:00
George Melikov	1bbc09e1f7	CI: bump actions/checkout to v3 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #14018	2022-11-01 12:38:09 -07:00
Serapheim Dimitropoulos	37d5a3e04b	Stop ganging due to past vdev write errors = Problem While examining a customer's system we noticed unreasonable space usage from a few snapshots due to gang blocks. Under some further analysis we discovered that the pool would create gang blocks because all its disks had non-zero write error counts and they'd be skipped for normal metaslab allocations due to the following if-clause in `metaslab_alloc_dva()`: ``` /* * Avoid writing single-copy data to a failing, * non-redundant vdev, unless we've already tried all * other vdevs. */ if ((vd->vdev_stat.vs_write_errors > 0 || vd->vdev_state < VDEV_STATE_HEALTHY) && d == 0 && !try_hard && vd->vdev_children == 0) { metaslab_trace_add(zal, mg, NULL, psize, d, TRACE_VDEV_ERROR, allocator); goto next; } ``` = Proposed Solution Get rid of the predicate in the if-clause that checks the past write errors of the selected vdev. We still try to allocate from HEALTHY vdevs anyway by checking vdev_state so the past write errors don't seem to help us (quite the opposite - it can cause issues in long-lived pools like the one from our customer). = Testing I first created a pool with 3 vdevs: ``` $ zpool list -v volpool NAME SIZE ALLOC FREE volpool 22.5G 117M 22.4G xvdb 7.99G 40.2M 7.46G xvdc 7.99G 39.1M 7.46G xvdd 7.99G 37.8M 7.46G ``` And used `zinject` like so with each one of them: ``` $ sudo zinject -d xvdb -e io -T write -f 0.1 volpool ``` And got the vdevs to the following state: ``` $ zpool status volpool pool: volpool state: ONLINE status: One or more devices has experienced an unrecoverable error. ...<cropped>.. action: Determine if the device needs to be replaced, and clear the ...<cropped>.. config: NAME STATE READ WRITE CKSUM volpool ONLINE 0 0 0 xvdb ONLINE 0 1 0 xvdc ONLINE 0 1 0 xvdd ONLINE 0 4 0 ``` I also double-checked their write error counters with sdb: ``` sdb> spa volpool | vdev | member vdev_stat.vs_write_errors (uint64_t)0 # <---- this is the root vdev (uint64_t)2 (uint64_t)1 (uint64_t)1 ``` Then I checked that the problem was reproduced in my VM, as the gang count was growing in zdb as I was writing more data: ``` $ sudo zdb volpool | grep gang ganged count: 1384 $ sudo zdb volpool | grep gang ganged count: 1393 $ sudo zdb volpool | grep gang ganged count: 1402 $ sudo zdb volpool | grep gang ganged count: 1414 ``` Then I updated my bits with this patch and the gang count stayed the same. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #14003	2022-11-01 12:36:25 -07:00
Serapheim Dimitropoulos	25096e1180	zvol_wait logic may terminate prematurely Setups that have a lot of zvols may see zvol_wait terminate prematurely even though the script is still making progress. For example, we have a customer that called zvol_wait for ~7100 zvols and by the last iteration of that script it was still waiting on ~2900. Similarly another one called zvol_wait for 2200 and by the time the script terminated there were only 50 left. This patch adjusts the logic to stay within the outer loop of the script if we are making any progress whatsoever. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #13998	2022-11-01 12:35:36 -07:00
shodanshok	820edcbf91	Remove ambiguity on demand vs prefetch stats reported by arc_summary arc_summary currently lists prefetch stats as "demand prefetch". However, a hit/miss can be due to demand or prefetch, not both. To remove any confusion, this patch removes the "Demand" word from the affected lines. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #13985	2022-11-01 12:35:05 -07:00
Serapheim Dimitropoulos	37763ea2a6	Fix panic in dsl_process_sub_livelist for EINTR = Issue Recently we hit an assertion panic in `dsl_process_sub_livelist` while exporting the spa and interrupting `bpobj_iterate_nofree`. In that case `bpobj_iterate_nofree` stops mid-way returning an EINTR without clearing the intermediate AVL tree that keeps track of the livelist entries it has encountered so far. At that point the code has a VERIFY for the number of elements of the AVL expecting it to be zero (which is not the case for EINTR). = Fix Clean up any intermediate state before destroying the AVL when encountering EINTR. Also added a comment documenting the scenario where the EINTR comes up. There is no need to do anything else for the callers of `dsl_process_sub_livelist` as they already handle the EINTR case. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #13939	2022-11-01 12:34:08 -07:00
Mateusz Guzik	c8d6a91a99	Bring per_txg_dirty_frees_percent back to 30 The current value causes significant artificial slowdown during mass parallel file removal, which can be observed both on FreeBSD and Linux when running real workloads. Sample results from Linux doing make -j 96 clean after an allyesconfig modules build: before: 4.14s user 6.79s system 48% cpu 22.631 total after: 4.17s user 6.44s system 153% cpu 6.927 total FreeBSD results in the ticket. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13932 Closes #13938	2022-11-01 12:32:40 -07:00
Akash B	7ac732b8d6	Add options to zfs redundant_metadata property Currently, additional/extra copies are created for metadata in addition to the redundancy provided by the pool (mirror/raidz/draid). As a result, twice as much space is used per inode, which decreases the total number of inodes that can be created in the filesystem. By setting redundant_metadata to none, no additional copies of metadata are created, which can reduce the space consumed by the additional metadata copies and increase the total number of inodes that can be created in the filesystem. Additionally, this can improve file create performance due to the reduced amount of metadata which needs to be written. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com> Signed-off-by: Akash B <akash-b@hpe.com> Closes #13680	2022-11-01 12:25:58 -07:00
Andriy Gapon	04f1983aab	FreeBSD: vn_flush_cached_data: observe vnode locking contract vm_object_page_clean() expects that the associated vnode is locked as VOP_PUTPAGES() may get called on the vnode. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Closes #14079 (cherry picked from commit `41133c9794`)	2022-10-27 16:14:57 -07:00
Mark Johnston	4e3fecbdfd	FreeBSD: Fix a pair of bugs in zfs_fhtovp() - Add a zfs_exit() call in an error path, otherwise a lock is leaked. - Remove the fid_gen > 1 check. That appears to be Linux-specific: zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the snapshot directory is mounted. On FreeBSD it fails, making snapshot dirs inaccessible via NFS. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Fixes: `43dbf88178` ("FreeBSD: vfsops: use setgen for error case") Closes #14001 Closes #13974 (cherry picked from commit `ed566bf1cd`)	2022-10-26 14:59:25 -07:00
samwyc	fc1c0053f9	Fix sequential resilver drive failure race condition This patch handles the race condition on simultaneous failure of 2 drives, which misses the vdev_rebuild_reset_wanted signal in vdev_rebuild_thread. We retry to catch this inside the vdev_rebuild_complete_sync function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com> Closes #14041 Closes #14050	2022-10-21 14:05:06 -07:00
Brian Behlendorf	7795975681	contrib: dracut: zfs-snapshot-bootfs: exit status fix Correct misplaced `-` in the original backport of #13769. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #13769	2022-10-20 11:37:21 -07:00
gregory-lee-bartholomew	3b935cc3ed	contrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix Due to a missing semicolon on the ExecStart line, it wasn't possible to specify the snapshot name on the bootfs.{rollback,snapshot} kernel parameters if the boot dataset name was obtained from the root=zfs:... kernel parameter. Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com> Closes #13585	2022-10-20 11:34:59 -07:00
Richard Yao	b0bc882395	kcfpool_alloc() should have its argument list marked void This error occurred when building on Gentoo with debugging enabled: zfs-kmod-2.1.6/work/zfs-2.1.6/module/icp/core/kcf_sched.c:1277:14: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] kcfpool_alloc() ^ void 1 error generated. This function is not present in master. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14023	2022-10-12 15:47:39 -07:00
наб	8cf59e97c4	etc: mask zfs-load-key.service Otherwise, systemd-sysv-generator will generate a service equivalent that breaks the boot: under systemd this is covered by zfs-mount-generator We already do this for zfs-import.service, and other init scripts are suppressed automatically by the "actual" .service files Fixes: commit `f04b976200` ("Add init script to load keys") Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #14010 Closes #14019	2022-10-12 15:29:21 -07:00
Damian Szuberski	4d22befde6	initramfs: use `mount.zfs` instead of `mount` A followup to `d7a67402a8` For `mount -t zfs -o opts ds mp` command line some implementations of `mount(8)`, e. g. Busybox in Debian work as follows: ``` newfstatat(AT_FDCWD, "ds", 0x7fff826f4ab0, 0) = -1 mount("ds", "mp", "zfs", MS_SILENT, NULL) = 0 ``` The logic above skips completely `mount.zfs` and prevents us from reading filesystem properties and applying mount options. For comparison, the coreutils `mount(8)` implementation does: ``` openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3 // figure out that zfs is a `nodev` filesystem and look for a helper newfstatat(AT_FDCWD, "/sbin/mount.zfs" ...) = 0 execve("/sbin/mount.zfs" ...) = 0 ``` Using `mount.zfs` in initramfs would help circumvent deficiencies of some of `mount(8)` implementations. `mount -t zfs` translates to `mount.zfs` invocation, except for cases when explicitly disabled by `-i`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #13305 (cherry picked from commit `35d81a75a8`)	2022-10-05 17:01:39 -07:00
Tony Hutter	6a6bd49398	Tag zfs-2.1.6 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2022-09-28 17:25:10 -07:00
Richard Yao	566e908fa0	Fix bad free in skein code Clang's static analyzer found a bad free caused by skein_mac_atomic(). It will allocate a context on the stack and then pass it to skein_final(), which attempts to free it. Upon inspection, skein_digest_atomic() also has the same problem. These functions were created to match the OpenSolaris ICP API, so I was curious how we avoided this in other providers and looked at the SHA2 code. It appears that SHA2 has a SHA2Final() helper function that is called by the exported sha2_mac_final()/sha2_digest_final() as well as the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work is done in SHA2Final() while some checks and the free are done in sha2_mac_final()/sha2_digest_final(). We fix the use after free in the skein code by taking inspiration from the SHA2 code. We introduce a skein_final_nofree() that does most of the work, and make skein_final() into a function that calls it and then frees the memory. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13954	2022-09-28 17:25:10 -07:00
Tony Hutter	a2705b1dd5	zpool: Don't print "repairing" on force faulted drives If you force fault a drive that's resilvering, its scan stats can get frozen in time, giving the false impression that it's being resilvered. This commit checks the vdev state to see if the vdev is healthy before reporting "resilvering" or "repairing" in zpool status. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #13927 Closes #13930	2022-09-28 12:41:23 -07:00
Mateusz Guzik	63d4838b4a	FreeBSD: handle V_PCATCH See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13910	2022-09-28 10:35:13 -07:00
Mateusz Guzik	eec942cc54	FreeBSD: catch up to 1400068 Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13909	2022-09-28 10:35:13 -07:00
Mateusz Guzik	2c8e3e4b28	FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK There is an ongoing effort to eliminate this feature. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13908	2022-09-28 10:35:13 -07:00
Richard Yao	55816c64da	FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}() When reviewing #13875, I noticed that our FreeBSD code has an issue where it converts from `int64_t` to `int` when calling `vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 << 36`, the int will be 0, since the low bits are 0. Even when some low bits are set, a value such as `((1 << 36) + 1)` would truncate to 1, which is wrong. There is protection against this on 32-bit platforms, but on 64-bit platforms, there is no check to protect us, so we add a check. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13882	2022-09-28 10:35:13 -07:00
Ryan Moeller	8dcd6af623	FreeBSD: Ignore symlink to i386 includes A symlink to i386 includes is created in the build dir on amd64 since freebsd/freebsd-src@d07600c563 Tell git to ignore it like the other include links. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #13719	2022-09-28 10:35:13 -07:00
Richard Yao	c973929b29	LUA: Fix CVE-2014-5461 Apply the fix from upstream. http://www.lua.org/bugs.html#5.2.2-1 https://www.opencve.io/cve/CVE-2014-5461 It should be noted that exploiting this requires the `SYS_CONFIG` privilege, and anyone with that privilege likely has other opportunities to do exploits, so it is unlikely that bad actors could exploit this unless system administrators are executing untrusted ZFS Channel Programs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13949	2022-09-27 16:49:02 -07:00
Richard Yao	835e03682c	Linux: Fix uninitialized variable usage in zio_do_crypt_data() Coverity complained about this. An error from `hkdf_sha512()` before uio initialization will cause pointers to uninitialized memory to be passed to `zio_crypt_destroy_uio()`. This is a regression that was introduced by `cf63739191`. Interestingly, this never affected FreeBSD, since the FreeBSD version never had that patch ported. Since moving uio initialization to the top of this function would slow down the qat_crypt() path, we only move the `memset()` calls to the top of the function. This is sufficient to fix this problem. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13944	2022-09-27 15:43:26 -07:00
Alexander Motin	33223cbc3c	Refactor Log Size Limit Original Log Size Limit implementation blocked all writes in case of limit reached until the TXG is committed and the log is freed. It caused huge delays and following speed spikes in application writes. This implementation instead smoothly throttles writes, using exactly the same mechanism as used for dirty data. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: jxdking <lostking2008@hotmail.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Issue #12284 Closes #13476	2022-09-26 14:55:27 -07:00
Brian Behlendorf	91e02156dd	Revert "Reduce dbuf_find() lock contention" This reverts commit `34dbc618f5`. While this change resolved the lock contention observed for certain workloads, it inadvertently reduced the maximum hash inserts/removes per second. This appears to be due to the slightly higher acquisition cost of a rwlock vs a mutex. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2022-09-21 13:15:51 -07:00
Richard Yao	b66f8d3c2b	Add zfs_btree_verify_intensity kernel module parameter I see a few issues in the issue tracker that might be aided by being able to turn this on. We have no module parameter for it, so I would like to add one. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13874	2022-09-21 13:15:51 -07:00
Richard Yao	5096ed31c8	Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c We pass sizeof (struct redact_record *) rather than sizeof (struct redact_record). Passing the pointer size is wrong. Coverity caught this in two places. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13885	2022-09-21 13:15:51 -07:00
Ameer Hamza	035e52f591	Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive For encrypted raw receive, objset creation is delayed until a call to dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be populated when calling zpl_earlier_version(). To correctly handle the ZFS_PROP_SHARESMB property for encrypted raw receive, this change delays setting the property. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13878	2022-09-21 13:15:26 -07:00
Ameer Hamza	d5105f068f	zfs recv hangs if max recordsize is less than received recordsize - Some optimizations for bqueue enqueue/dequeue. - Added a fix to prevent deadlock when both bqueue_enqueue_impl() and bqueue_dequeue() waits for signal to be triggered. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13855	2022-09-21 13:15:26 -07:00
наб	faa1e4082d	include: move SPA_MINBLOCKSHIFT and zio_encrypt to sys/fs/zfs.h These are used by userspace, so should live in a public header Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12116	2022-09-21 13:15:26 -07:00
Alexander Motin	44cec45f72	Improve too large physical ashift handling When iterating through children physical ashifts for vdev, prefer ones above the maximum logical ashift, that we can actually use, but within the administrator defined maximum. When selecting top-level vdev ashift, do not set it to the defined maximum in case physical ashift is even higher, but just ignore one. Using the maximum does not prevent misaligned writes, but reduces space efficiency. Since ZFS tries to write data sequentially and aggregates the writes, in many cases large misaligned writes may be not as bad as the space penalty otherwise. Allow internal physical ashifts for vdevs higher than SHIFT_MAX. Maybe one day the allocator or aggregation could benefit from that. Reduce zfs_vdev_max_auto_ashift default from 16 (64KB) to 14 (16KB), so that ZFS may still use bigger ashifts up to SHIFT_MAX (64KB), but only if it really has to or explicitly told to, but not as an "optimization". There are some read-intensive NVMe SSDs that report Preferred Write Alignment of 64KB, and attempting to build RAIDZ2 of those leads to a space inefficiency that can't be justified. Instead these changes make ZFS fall back to logical ashift of 12 (4KB) by default and only warn user that it may be suboptimal for performance. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #13798	2022-09-21 13:15:15 -07:00
Rich Ercolani	ebbbe01e31	Ask libtool to stop hiding some errors For #13083, curiously, it did not print the actual error, just that the compile failed with "Error 1". In theory, this flag should cause it to report errors twice sometimes. In practice, I'm pretty okay with reporting some twice if it avoids reporting some never. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #13086	2022-09-21 16:12:14 -07:00
Kevin Jin	d05f3039f7	Add Module Parameter Regarding Log Size Limit zfs_wrlog_data_max The upper limit of TX_WRITE log data. Once it is reached, write operation is blocked, until log data is cleared out after txg sync. It only counts TX_WRITE log with WR_COPIED or WR_NEED_COPY. Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: jxdking <lostking2008@hotmail.com> Closes #12284	2022-09-21 16:12:14 -07:00
Kevin Jin	999830a021	Optimize txg_kick() process (#12274 ) Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total in original code. Extra parameter "txg" is added for txg_kick(), thus it knows which txg to kick. Also txg_kick() call is moved from dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we can know the txg number assigned for txg_kick(). Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is also cleaned up. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: jxdking <lostking2008@hotmail.com> Closes #12274	2022-09-21 16:12:14 -07:00
Ameer Hamza	a5b0d42540	zfs recv hangs if max recordsize is less than received recordsize - Some optimizations for bqueue enqueue/dequeue. - Added a fix to prevent deadlock when both bqueue_enqueue_impl() and bqueue_dequeue() waits for signal to be triggered. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13855	2022-09-19 09:39:07 -07:00
Christian Schwarz	cde04badd1	make DMU_OT_IS_METADATA and DMU_OT_IS_ENCRYPTED return B_TRUE or B_FALSE Without this patch, the ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf)); at the beginning of dbuf_assign_arcbuf can panic if the object type is a DMU_OT_NEWTYPE that has DMU_OT_METADATA set. While we're at it, fix DMU_OT_IS_ENCRYPTED as well. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com> Closes #13842	2022-09-15 16:58:35 -07:00
Richard Yao	3f7c174b50	vdev_draid_lookup_map() should not iterate outside draid_maps Coverity reported this as an out-of-bounds read. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13865	2022-09-15 16:58:35 -07:00
Akash B	03fa3ef264	Add physical device size to SIZE column in 'zpool list -v' Add physical device size/capacity only for physical devices in 'zpool list -v' instead of displaying "-" in the SIZE column. This would make it easier to see the individual device capacity and to determine which spares are large enough to replace which devices. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com> Signed-off-by: Akash B <akash-b@hpe.com> Closes #12561 Closes #13106	2022-09-15 10:23:01 -07:00
George Amanakis	8bd3dca9bf	Introduce a tunable to exclude special class buffers from L2ARC Special allocation class or dedup vdevs may have roughly the same performance as L2ARC vdevs. Introduce a new tunable to exclude those buffers from being cacheable on L2ARC. Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #11761 Closes #12285	2022-09-14 11:27:00 -07:00
наб	c8f795ba53	config: check for parallel(1), use it for cstyle Before: $ time make cstyle real 0m23.118s user 0m23.002s sys 0m0.114s After: $ time make cstyle real 0m4.577s user 0m31.487s sys 0m0.699s Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Issue #12899	2022-09-14 11:23:25 -07:00
Tony Hutter	7bbfac9d04	zed: Fix config_sync autoexpand flood Users were seeing floods of `config_sync` events when autoexpand was enabled. This happened because all "disk status change" udev events invoke the autoexpand codepath, which calls zpool_relabel_disk(), which in turn cause another "disk status change" event to happen, in a feedback loop. Note that "disk status change" happens every time a user calls close() on a block device. This commit breaks the feedback loop by only allowing an autoexpand to happen if the disk actually changed size. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes: #7132 Closes: #7366 Closes #13729	2022-09-14 09:57:44 -07:00
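
The integer conversion issue described in commit 55816c64da can be illustrated with a small sketch. The helper names `truncate_to_int` and `clamp_to_int` are hypothetical and do not come from the ZFS source. The clamp is only one plausible form of the added check, assuming a 32-bit `int`, and is not necessarily what the actual patch does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: models an unchecked int64_t -> int conversion by
 * keeping only the low 32 bits of the value.
 */
static int
truncate_to_int(int64_t n)
{
	uint32_t low = (uint32_t)(uint64_t)n;

	/* This sketch only handles inputs whose low 32 bits fit in an int. */
	assert(low <= (uint32_t)INT_MAX);
	return ((int)low);
}

/*
 * Hypothetical helper: clamps the value into the range of int before
 * converting, so large counts saturate instead of wrapping.
 */
static int
clamp_to_int(int64_t n)
{
	if (n > INT_MAX)
		return (INT_MAX);
	if (n < INT_MIN)
		return (INT_MIN);
	return ((int)n);
}
```

With these helpers, `truncate_to_int((int64_t)1 << 36)` yields 0 and `truncate_to_int(((int64_t)1 << 36) + 1)` yields 1, matching the values quoted in the commit message, while `clamp_to_int` returns `INT_MAX` for both inputs.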
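
Commit cde04badd1 makes two flag-testing macros return B_TRUE or B_FALSE. The sketch below shows why that can matter when two predicates are compared with `==`. All names here (`example_boolean_t`, `EXAMPLE_FLAG_METADATA`, the `EXAMPLE_*` macros, `example_predicates_agree`) and the flag value 0x40 are assumptions made for illustration. They are not the ZFS definitions.

```c
#include <assert.h>

/* Illustrative boolean type; the enumerator names mirror the commit text. */
typedef enum { B_FALSE = 0, B_TRUE = 1 } example_boolean_t;

/* Hypothetical flag bit, chosen only for this example. */
#define	EXAMPLE_FLAG_METADATA	0x40

/* Returns the raw masked bits: 0 or 0x40, never B_TRUE. */
#define	EXAMPLE_IS_METADATA_RAW(t)	((t) & EXAMPLE_FLAG_METADATA)

/* Normalized form: always evaluates to B_FALSE or B_TRUE. */
#define	EXAMPLE_IS_METADATA(t) \
	((((t) & EXAMPLE_FLAG_METADATA) != 0) ? B_TRUE : B_FALSE)

/*
 * Compares the normalized predicate with another boolean value.
 * Returns 1 when they agree and 0 otherwise.
 */
static int
example_predicates_agree(int type, example_boolean_t other)
{
	example_boolean_t mine = EXAMPLE_IS_METADATA(type);

	/* The normalized macro can only produce one of the two values. */
	assert(mine == B_TRUE || mine == B_FALSE);
	return (mine == other);
}
```

For a type value of 0xC0, the raw macro evaluates to 0x40, which compares unequal to B_TRUE even though the flag is set. The normalized macro evaluates to B_TRUE, so an equality check against another boolean predicate behaves as expected.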
