Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Herb Wartens	33075e465f	Allow MMP to bypass waiting for other threads At our site we have seen cases when multi-modifier protection is enabled (multihost=on) on our pool and the pool gets suspended due to a single disk that is failing and responding very slowly. Our pools have 90 disks in them and we expect disks to fail. The current version of MMP requires that we wait for other writers before moving on. When a disk is responding very slowly, we observed that waiting here was bad enough to cause the pool to suspend. This change allows the MMP thread to bypass waiting for other threads and reduces the chances the pool gets suspended. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Herb Wartens <hawartens@gmail.com> Closes #14659	2023-04-24 12:55:07 -07:00
Brian Behlendorf	cdbe1d65c4	Increase default zfs_rebuild_vdev_limit to 64MB When testing distributed rebuild performance with more capable hardware it was observed than increasing the zfs_rebuild_vdev_limit to 64M reduced the rebuild time by 17%. Beyond 64MB there was some improvement (~2%) but it was not significant when weighed against the increased memory usage. Memory usage is capped at 1/4 of arc_c_max. Additionally, vr_bytes_inflight_max has been moved so it's updated per-metaslab to allow the size to be adjust while a rebuild is running. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14428	2023-04-24 12:55:07 -07:00
Brian Behlendorf	fa28e26e42	Increase default zfs_scan_vdev_limit to 16MB For HDD based pools the default zfs_scan_vdev_limit of 4M per-vdev can significantly limit the maximum scrub performance. Increasing the default to 16M can double the scrub speed from 80 MB/s per disk to 160 MB/s per disk. This does increase the memory footprint during scrub/resilver but given the performance win this is a reasonable trade off. Memory usage is capped at 1/4 of arc_c_max. Note that number of outstanding I/Os has not changed and is still limited by zfs_vdev_scrub_max_active. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14428	2023-04-24 12:55:07 -07:00
Brian Behlendorf	9fe3da9364	Improve resilver ETAs When resilvering the estimated time remaining is calculated using the average issue rate over the current pass. Where the current pass starts when a scan was started, or restarted, if the pool was exported/imported. For dRAID pools in particular this can result in wildly optimistic estimates since the issue rate will be very high while scanning when non-degraded regions of the pool are scanned. Once repair I/O starts being issued performance drops to a realistic number but the estimated performance is still significantly skewed. To address this we redefine a pass such that it starts after a scanning phase completes so the issue rate is more reflective of recent performance. Additionally, the zfs_scan_report_txgs module option can be set to reset the pass statistics more often. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14410	2023-04-24 12:55:07 -07:00
Ameer Hamza	a68dfdb88c	Fix "Detach spare vdev in case if resilvering does not happen" Spare vdev should detach from the pool when a disk is reinserted. However, spare detachment depends on the completion of resilvering, and if resilver does not schedule, the spare vdev keeps attached to the pool until the next resilvering. When a zfs pool contains several disks (25+ mirror), resilvering does not always happen when a disk is reinserted. In this patch, spare vdev is manually detached from the pool when resilvering does not occur and it has been tested on both Linux and FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14722	2023-04-24 09:56:13 -07:00
Richard Yao	4a5950a129	Linux: zfs_fillpage() should handle partial pages from end of file After `89cd2197b9` was merged, Clang's static analyzer began complaining about a dead assignment in `zfs_fillpage()`. Upon inspection, I noticed that the dead assignment was because we are not using the calculated io_len that we should use to avoid asking the DMU to read past the end of a file. This should result in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`. This issue predates `89cd2197b9`, but its simplification of zfs_fillpage() eliminated the only use of the assignment to io_len, which made Clang's static analyzer complain about the issue. Also, as a precaution, we add an assertion that io_offset < i_size. If this ever fails, bad things will happen. Otherwise, we are blindly trusting the kernel not to give us invalid offsets. We continue to blindly trust it on non-debug kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14534	2023-04-21 13:12:35 -07:00
Brian Behlendorf	c7db374ac6	Fix buffered/direct/mmap I/O race When a page is faulted in for memory mapped I/O the page lock may be dropped before it has been read and marked up to date. If a buffered read encounters such a page in mappedread() it must wait until the page has been updated. Failure to do so will result in a panic on debug builds and incorrect data on production builds. The critical part of this change is in mappedread() where pages which are not up to date are now handled. Additionally, it includes the following simplifications. - zfs_getpage() and zfs_fillpage() could be passed an array of pages. This could be more efficient if it was used but in practice only a single page was ever provided. These interfaces were simplified to acknowledge that. - update_pages() was modified to correctly set the PG_error bit on a page when it cannot be read by dmu_read(). - Setting PG_error and PG_uptodate was moved to zfs_fillpage() from zpl_readpage_common(). This is consistent with the handling in update_pages() and mappedread(). - Minor additional refactoring to comments and variable declarations to improve readability. - Add a test case to exercise concurrent buffered, direct, and mmap IO to the same file. - Reduce the mmap_sync test case default run time. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13608 Closes #14498	2023-04-21 13:12:35 -07:00
Tony Hutter	e25f9131d6	Tag zfs-2.1.11 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2023-04-18 11:44:34 -07:00
Low-power	4e49d8ed27	Values printed by zpool-iostat(8) should be right-aligned This inappropriate left-alignment was introduced in `7bb7b1f`. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: WHR <msl0000023508@gmail.com> Closes #14751	2023-04-18 11:44:07 -07:00
Tony Hutter	a969b1b22c	Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()" This reverts commit `4b3133e671`. Users identified this commit as a possible source of data corruption: https://github.com/openzfs/zfs/issues/14753 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Issue #14753 Closes #14761	2023-04-18 11:44:07 -07:00
наб	18edf7a3ba	contrib: dracut: fix race with root=zfs:dset when necessities required This had always worked in my testing, but a user on hardware reported this to happen 100%, and I reproduced it once with cold VM host caches. dracut-zfs-generator runs as a systemd generator, i.e. at Some Relatively Early Time; if root= is a fixed dataset, it tries to "solve [necessities] statically at generation time". If by that point zfs-import.target hasn't popped (because the import is taking a non-negligible amount of time for whatever reason), it'll see no children for the root datase, and as such generate no mounts. This has never had any right to work. No-one caught this earlier because it's just that much more convenient to have root=zfs:AUTO, which orders itself properly. To fix this, always run zfs-nonroot-necessities.service; this additionally simplifies the implementation by: * making BOOTFS from zfs-env-bootfs.service be the real, canonical, root dataset name, not just "whatever the first bootfs is", and only set it if we're ZFS-booting * zfs-{rollback,snapshot}-bootfs.service can use this instead of re-implementing it * having zfs-env-bootfs.service also set BOOTFSFLAGS * this means the sysroot.mount drop-in can be fixed text * zfs-nonroot-necessities.service can also be constant and always enabled, because it's conditioned on BOOTFS being set There is no longer any code generated at run-time (the sysroot.mount drop-in is an unavoidable gratuitous cp). The flow of BOOTFS{,FLAGS} from zfs-env-bootfs.service to sysroot.mount is not noted explicitly in dracut.zfs(7), because (a) at some point it's just visual noise and (b) it's already ordered via d-p-m.s from z-i.t. Backport-of: `3399a30ee0` Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>	2023-04-18 10:10:45 -07:00
Tony Hutter	33003ecc93	Tag zfs-2.1.10 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2023-04-13 15:59:45 -07:00
Damian Szuberski	07cc8ae46a	Removed Python 2 and Python 3.5- support Deprecation of Python versions below 3.6 gives opportunity to unify the build and install requirements for OpenZFS packages. The minimal supported Python version is 3.6 as this is the most recent Python package CentOS/RHEL 7 users can get. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #12925	2023-04-13 15:59:45 -07:00
youzhongyang	e202d3b867	linux 6.3 compat: needs REQ_PREFLUSH \| REQ_OP_WRITE Modify bio_set_flush() so if kernel version is >= 4.10, flags REQ_PREFLUSH and REQ_OP_WRITE are set together. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #14695	2023-04-13 09:39:59 -07:00
Tino Reichardt	3d9f37026d	Fix "Add colored output to zfs list" Running `zfs list -o avail rpool` resulted in a core dump. This commit will fix this. Run the needed overhead only, when `use_color()` is true. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14712	2023-04-13 09:39:53 -07:00
Tony Hutter	fc2c0256c5	ZTS: Log test name to /dev/kmsg on Linux Add a -K option to the test suite to log each test name to /dev/kmsg (on Linux), so if there's a kernel warning we'll be able to match it up to a particular test. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #13227	2023-04-03 13:50:02 -07:00
Damian Szuberski	a69765ea5b	Add Linux kmemleak support to ZTS - Kmemleak `clear` is invoked right before every test case run. - Kmemleak `scan` is requested right after each test case is finished. - Kmemleak instrumentation is not used for setup/cleanup/pretest/posttest/failsafe stages to shorten the test case execution time. - Kmemleak periodic scan is disabled (`scan=0`) before the test suite run to avoid interfering with the on-demand scan results. - There are unavoidable potential false positives coming from kernel areas other than OpenZFS module. - The ZTS with kmemleak enabled duration is increased by ~50%. Example run ``` Running Time: 07:12:13 Percent passed: 98.3% unreferenced object 0xffff9da82aea5410 (size 80): comm "kworker/u32:10", pid 942206, jiffies 4296749716 (age 2615.516s) hex dump (first 32 bytes): 00 30 30 00 00 00 00 00 ff 8f 30 00 00 00 00 00 .00.......0..... 51 e6 77 05 a8 9d ff ff 00 00 00 00 00 00 00 00 Q.w............. backtrace: [<000000005cf1fea2>] alloc_extent_state+0x1d/0xb0 [btrfs] [<0000000083f78ae5>] set_extent_bit+0x2ff/0x670 [btrfs] [<00000000de29249e>] lock_extent_bits+0x6b/0xa0 [btrfs] [<00000000b241f424>] lock_and_cleanup_extent_if_need+0xaf/0x1c0 [btrfs] [<0000000093ca72b5>] btrfs_buffered_write+0x297/0x7d0 [btrfs] [<000000002c2938c8>] btrfs_file_write_iter+0x127/0x390 [btrfs] [<00000000b888f720>] do_iter_readv_writev+0x152/0x1b0 [<00000000320f0bcc>] do_iter_write+0x7c/0x1c0 [<000000000b5a8fe0>] lo_write_bvec+0x62/0x150 [loop] [<000000009aa03c73>] loop_process_work+0x250/0xbd0 [loop] [<00000000c7487d8a>] process_one_work+0x1f1/0x390 [<000000000b236831>] worker_thread+0x53/0x3e0 [<0000000023cb3e57>] kthread+0x127/0x150 [<000000002d48676a>] ret_from_fork+0x22/0x30 ``` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #13084	2023-04-03 13:50:02 -07:00
Tony Hutter	a164143dfd	Linux 6.2 compat: META Update the META file to reflect compatibility with the 6.2 kernel. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #14689	2023-03-29 15:49:06 -07:00
Ameer Hamza	c63b47e4da	Fix console progress reporting for recursive send After commit `19d3961`, progress reporting (-v) with replication flag enabled does not report the progress on the console. This commit fixes the issue by updating the logic to check for pa->progress instead of pa_verbosity in send_progress_thread(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14448	2023-03-29 14:45:34 -07:00
rob-wing	0e30635f9c	zfs_main.c: fix unused variable error with GCC zfs_setproctitle_init() is stubbed out on FreeBSD. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob Wing <rob.fx907@gmail.com> Closes #14441	2023-03-29 14:45:34 -07:00
Ameer Hamza	777c98ee52	Use setproctitle to report progress of zfs send This allows parsing of zfs send progress by checking the process title. Doing so requires some changes to the send code in libzfs_sendrecv.c; primarily these changes move some of the accounting around, to allow for the code to be verbose as normal, or set the process title. Unlike BSD, setproctitle() isn't standard in Linux; thus, borrowed it from libbsd with slight modifications. Authored-by: Sean Eric Fagan <sef@FreeBSD.org> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14376	2023-03-29 14:45:34 -07:00
Brian Behlendorf	164d184ed9	Additional limits on hole reporting Holding the zp->z_rangelock as a RL_READER over the range 0-UINT64_MAX is sufficient to prevent the dnode from being re-dirtied by concurrent writers. To avoid potentially looping multiple times for external caller which do not take the rangelock holes are not reported after the first sync. While not optimal this is always functionally correct. This change adds the missing rangelock calls on FreeBSD to zvol_cdev_ioctl(). Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14512 Closes #14641	2023-03-29 10:40:49 -07:00
Tino Reichardt	3da577280a	Add colored output to zfs list Use a bold header row and colorize the AVAIL column based on the used space percentage of volume. We define these colors: - when > 80%, use yellow - when > 90%, use red Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621 Closes #14350	2023-03-28 14:13:33 -07:00
Tino Reichardt	433b9a89c4	Colorize zpool iostat output Use a bold header and colorize the space suffixes in iostat by order of magnitude like this: - K is green - M is yellow - G is red - T is lightblue - P is magenta - E is cyan - 0 space is colored gray Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621 Closes #14459	2023-03-28 14:13:24 -07:00
Tino Reichardt	374f86b5c9	Add more ANSI colors to libzfs Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621	2023-03-28 14:03:36 -07:00
Rich Ercolani	59f1875639	linux 6.3 compat: add another bdev_io_acct case Linux 6.3+, and backports from it (6.2.8+), changed the signatures on bdev_io_{start,end}_acct. Add a case for it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14658 Closes #14668	2023-03-28 08:16:46 -07:00
Ameer Hamza	aa7258ced0	Update vdev state for spare vdev zfsd fetches new pool configuration through ZFS_IOC_POOL_STATS but it does not get updated nvlist configuration for spare vdev since the configuration is read by spa_spares->sav_config. In this commit, updating the vdev state for spare vdev that is consumed by zfsd on spare disk hotplug. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14653	2023-03-27 11:46:24 -07:00
Ameer Hamza	dedd8243fc	zed: add hotplug support for spare vdevs This commit supports for spare vdev hotplug. The spare vdev associated with all the pools will be marked as "Removed" when the drive is physically detached and will become "Available" when the drive is reattached. Currently, the spare vdev status does not change on the drive removal and the same is the case with reattachment. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14295	2023-03-27 11:32:09 -07:00
Ameer Hamza	43d63ab2d4	zed: post a udev change event from spa_vdev_attach() In order for zed to process the removal event correctly, udev change event needs to be posted to sync the blkid information. spa_create() and spa_config_update() posts the event already through spa_write_cachefile(). Doing the same for spa_vdev_attach() that handles the case for vdev attachment and replacement. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14172	2023-03-27 11:32:09 -07:00
Ameer Hamza	bd9a9a4e1a	zed: mark disks as REMOVED when they are removed ZED does not take any action for disk removal events if there is no spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED on removal event. This means that if you are running zed and remove a disk, it will be propertly marked as REMOVED. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-03-27 11:32:09 -07:00
Alexander Motin	5219a2691e	FreeBSD: Remove extra arc_reduce_target_size() call Remove arc_reduce_target_size() call from arc_prune_task(). The idea of arc_prune_task() is to remove external references on ARC metadata, such as vnodes. Since arc_prune_async() is called only from ARC itself, it makes no sense to create a parasitic loop between ARC eviction and the pruning, treatening to drop ARC to its minimum. I can't guess why it was added as part of FreeBSD to OpenZFS integration. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14639	2023-03-24 12:27:52 -05:00
Alexander Motin	48f376b0c5	Improve arc_read() error reporting Debugging reported NULL de-reference panic in dnode_hold_impl() I found that for certain types of errors arc_read() may only return error code, but not properly report it via done and pio arguments. Lack of done calls may result in reference and/or memory leaks in higher level code. Lack of error reporting via pio may result in unnoticed errors there. For example, dbuf_read(), where dbuf_read_impl() ignores arc_read() return, relies completely on the pio mechanism and missed the errors. This patch makes arc_read() to always call done callback and always propagate errors to parent zio, if either is provided. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc.	2023-03-24 11:55:36 -05:00
naivekun	345f8beb58	QAT: Fix uninitialized seed in QAT compression CpaDcRqResults have to be initialized with checksum=1 for adler32. Otherwise when error CPA_DC_OVERFLOW occurred, the next compress operation will continue on previously part-compressed data, and write invalid checksum data. When zfs decompress the compressed data, a invalid checksum will occurred and lead to #14463 Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Weigang Li <weigang.li@intel.com> Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com> Signed-off-by: naivekun <naivekun0817@gmail.com> Closes #14632 Closes #14463	2023-03-17 11:09:07 -07:00
ofthesun9	a5c469c5f3	Fix for mountpoint=legacy We need to clear mountpoint only after checking it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: ofthesun9 <olivier@ofthesun.net> Closes #14599 Closes #14604	2023-03-15 10:45:07 -07:00
Matthew Ahrens	4b3133e671	ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced() `lseek(SEEK_DATA \| SEEK_HOLE)` are only accurate when the on-disk blocks reflect all writes, i.e. when there are no dirty data blocks. To ensure this, if the target dnode is dirty, they wait for the open txg to be synced, so we can call them "stabilizing operations". If they cause txg_wait_synced often, it can be detrimental to performance. Typically, a group of files are all modified, and then SEEK_DATA/HOLE are performed on them. In this case, the first SEEK does a txg_wait_synced(), and subsequent SEEKs don't need to wait, so performance is good. However, if a workload involves an interleaved metadata modification, the subsequent SEEK may do a txg_wait_synced() unnecessarily. For example, if we do a `read()` syscall to each file before we do its SEEK. This applies even with `relatime=on`, when the `read()` is the first read after the last write. The txg_wait_synced() is unnecessary because the SEEK operations only care that the structure of the tree of indirect and data blocks is up to date on disk. They don't care about metadata like the contents of the bonus or spill blocks. (They also don't care if an existing data block is modified, but this would be more involved to filter out.) This commit changes the behavior of SEEK_DATA/HOLE operations such that they do not call txg_wait_synced() if there is only a pending change to the bonus or spill block. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #13368 Issue #14594 Issue #14512 Issue #14009	2023-03-15 10:44:54 -07:00
Brian Behlendorf	4f6e0ca117	Update workflows Update the GitHub actions workflows using a subset of the changes from the master branch, commit `620a977f22`. Cherry-picking each relevant commit would have resulted in a large number of conflicts so this change only applies a minimal set of useful updates. - Added build-dependencies.txt and checkstyle-dependencies.txt - Added reclaim_disk_space.sh script - Minor changes to build steps - Reduced ztest run time - checkbashisms, mandoc, and cppcheck were not included to avoid additional backports - Add exceptions for shellcheck Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2023-03-09 18:46:36 -08:00
Brian Behlendorf	1d2aea7711	Workaround GitHub Action failure Ubuntu 20.04 and 22.04 workflows are failing due to an error which is hit when running `apt-get update`. Until the problematic package is fixed apply the suggested workaround described here: https://github.com/orgs/community/discussions/47863 Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14530	2023-03-09 18:46:36 -08:00
Brian Behlendorf	0a4e41d0fa	Ubuntu 22.04 integration: GitHub workflows - GitHub workflows are run on Ubuntu 22.04 - Extract the `checkstyle` workflow dependencies to a separate file. - Refresh the `build-dependencies.txt` list. Notes: Partial check pick of `9e7fc5da38`. This change does not include the build-dependencies.txt or checkstyle-dependencies.txt changes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #14148	2023-03-09 18:46:36 -08:00
q66	050be02c02	initramfs: fix zpool get argument order When using the zfs initramfs scripts on my system, I get various errors at initramfs stage, such as: cannot open '-o': name must begin with a letter My zfs binaries are compiled with musl libc, which may be why this happens. In any case, fix the argument order to make the zpool binary happy, and to match its --help output. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Closes #14572	2023-03-07 14:07:56 -08:00
Mateusz Piotrowski	576d34cb11	Turn default_bs and default_ibs into ZFS_MODULE_PARAMs The default_bs and default_ibs tunables control the default block size and indirect block size. So far, default_bs and default_ibs were tunable only on FreeBSD, e.g., sysctl vfs.zfs.default_ibs Remove the FreeBSD-specific sysctl code and expose default_bs and default_ibs as tunables on both Linux and FreeBSD using ZFS_MODULE_PARAM. One of the use cases for changing the values of those tunables is to lower the indirect block size, which may improve performance of large directories (as discussed during the OpenZFS Leadership Meeting on 2022-08-16). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Closes #14293	2023-03-07 13:58:36 -08:00
Richard Yao	6281b5c488	Add missing increment to dsl_deadlist_move_bpobj() `dc5c8006f6` was recently merged to prefetch up to 128 deadlists. Unfortunately, a loop was missing an increment, such that it will prefetch all deadlists. The performance properties of that patch probably should be re-evaluated. This was caught by CodeQL's cpp/constant-comparison check in an experimental branch where I am testing the security-and-extended queries. It complained about the `i < 128` part of the loop condition always evaluating to the same thing. The standard CodeQL configuration we use missed this because it does not include that check. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14573	2023-03-07 09:08:18 -08:00
George Amanakis	231a37c4c0	Optimize the is_l2cacheable functions by placing the most common use case (no special vdevs) first and avoid allocating new variables. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14494 Closes #14563	2023-03-07 09:07:58 -08:00
Alexander Motin	9d2e5c14b2	System-wide speculative prefetch limit. With some pathological access patterns it is possible to make ZFS accumulate almost unlimited amount of speculative prefetch ZIOs. Combined with linear ABD allocations in RAIDZ code, it appears to be possible to exhaust system KVA, triggering kernel panic. Address this by introducing a system-wide counter of active prefetch requests and blocking prefetch distance doubling per stream hits if the number of active requests is higher that ~6% of ARC size. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14516	2023-03-02 14:37:07 -08:00
Alexander Motin	b644a45bd4	Prefetch on deadlists merge During snapshot deletion ZFS may issue several reads for each deadlist to merge them into next snapshot's or pool's bpobj. Number of the dead lists increases with number of snapshots. On HDD pools it may take significant time during which sync thread is blocked. This patch introduces prescient prefetch of required blocks for up to 128 deadlists ahead. Tests show reduction of time required to delete dataset with 720 snapshots with randomly overwritten file on wide HDD pool from 75-85 to 22-28 seconds. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue #14276 Closes #14402	2023-03-02 14:37:07 -08:00
Alexander Motin	fd0893cf1f	Introduce minimal ZIL block commit delay Despite all optimizations, tests on actual hardware show that FreeBSD kernel can't sleep for less then ~2us. Similar tests on Linux show ~50us delay at least from nanosleep() (haven't tested inside kernel). It means that on very fast log device ZIL may not be able to satisfy zfs_commit_timeout_pct block commit timeout, increasing log latency more than desired. Handle that by introduction of zil_min_commit_timeout parameter, specifying minimal timeout value where additional delays to aggregate writes may be skipped. Also skip delays if the LWB is more than 7/8 full, that often happens if I/O sizes are constant and match one of LWB sizes. Both things are applied only if there were no already outstanding log blocks, that may indicate single-threaded workload, that by definition can not benefit from the commit delays. While there, add short time moving average to zl_last_lwb_latency to make it more stable. Tests of single-threaded 4KB writes to NVDIMM SLOG on FreeBSD show IOPS increase by 9% instead of expected 5%. For zfs_commit_timeout_pct of 1 there IOPS increase by 5.5% instead of expected 1%. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14418	2023-03-02 14:37:07 -08:00
Alexander Motin	c8d36e2192	Pack zrlock_t by 8 bytes On FreeBSD this reduces this structure size from 64 to 56 bytes. dnode_handle_t respectively reduces from 72 to 64 bytes. It sounds like a waste to need 72 bytes to be able to relocate 808 bytes of dnode_t, which relocation on FreeBSD is not even supported. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14317	2023-03-02 14:37:07 -08:00
Alexander Motin	edaa250bb3	Remove few pointer dereferences in dbuf_read() Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #14199	2023-03-02 14:37:07 -08:00
Alexander Motin	04a6ae0585	Switch dnode stats to wmsums I've noticed that some of those counters are used in hot paths like dnode_hold_impl(), and results of this change is visible in profiler. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #14198	2023-03-02 14:37:07 -08:00
Alexander Motin	eb68e3cd56	Micro-optimize zrl_remove() atomic_dec_32() should be a bit lighter than atomic_dec_32_nv(). Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #14200	2023-03-02 14:37:07 -08:00
Alexander Motin	2098a00318	Remove atomics from zh_refcount It is protected by z_hold_locks, so we do not need more serialization, simple integer math should be fine. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #14196	2023-03-02 14:37:07 -08:00

1 2 3 4 5 ...

7543 Commits All Branches Search

7543 Commits

All Branches