Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Tony Hutter	9d14963de6	vdev_id: Fix partition regular expression Given a DM device name, the old vdev_id script would extract any text after a 'p' as the partition number. It then appends "-part" + the partition number to the name, giving a by-vdev name like "L0-part5". This works fine if the DM name is like 'dm-2p5', but doesn't work if the DM name is a multipath name like "mpatha". In those cases it incorrectly matches the 'p' in "mpatha", giving by-vdev names like "L0-partatha". This patch fixes the issue by making the partition regex match stricter. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #11637	2021-11-12 16:31:55 -08:00
Tony Hutter	88a3751e67	Better zfs_get_enclosure_sysfs_path() enclosure support A multipathed disk will have several 'underlying' paths to the disk. For example, multipath disk 'dm-0' may be made up of paths: /dev/{sda,sdb,sdc,sdd}. On many enclosures those underlying sysfs paths will have a symlink back to their enclosure device entry (like 'enclosure_device0/slot1'). This is used by the statechange-led.sh script to set/clear the fault LED for a disk, and by 'zpool status -c'. However, on some enclosures, those underlying paths may not all have symlinks back to the enclosure device. Perhaps only two out of four of them do. This patch updates zfs_get_enclosure_sysfs_path() to favor returning paths that have symlinks back to their enclosure devices, rather than just returning the first path. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #11617	2021-11-12 16:31:55 -08:00
Arshad Hussain	bdd43a2396	vdev_id: Support daisy-chained JBODs in multipath mode Within function sas_handler() userspace commands like '/usr/sbin/multipath' have been replaced with sourcing device details from within sysfs, which removed a significant amount of overhead and processing time. Multiple JBOD enclosures and their order are sourced from the bsg driver (/sys/class/enclosure) to isolate chassis top-level expanders, which are then dynamically indexed based on the host channel of the multipath subordinate disk member device being processed. Additionally, added a "mixed" mode for slot identification for environments where a ZFS server system may contain SAS disk slots with no expander (direct connect to HBA) while an attached external JBOD with an expander has a different slot identifier method. How Has This Been Tested? Testing was performed on an AMD EPYC based dual-server high-availability multipath environment with multiple HBAs per ZFS server and four SAS JBODs. The two primary JBODs were multipath/cross-connected between the two ZFS-HA servers. The secondary JBODs were daisy-chained off of the primary JBODs using aligned SAS expander channels (JBOD-0 expanderA--->JBOD-1 expanderA, JBOD-0 expanderB--->JBOD-1 expanderB, etc). Pools were created, exported and re-imported, imported globally with 'zpool import -a -d /dev/disk/by-vdev'. Low level udev debug outputs were traced to isolate and resolve errors. Result: Initial testing of a previous version of this change showed how the cost of relying on userspace utilities like '/usr/sbin/multipath' and '/usr/bin/lsscsi' was exacerbated by increasing numbers of disks and JBODs. With four 60-disk SAS JBODs and 240 disks the time to process a udevadm trigger was 3 minutes 30 seconds, during which nearly all CPU cores were above 80% utilization. By switching from userspace utilities to sysfs in this version, the udevadm trigger processing time was reduced to 12.2 seconds with negligible CPU load. This patch also fixes a few shellcheck complaints. Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com> Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com> Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Closes #11526	2021-11-12 16:31:55 -08:00
Rich Ercolani	47ada9e5ed	Added error for writing to /dev/ on Linux Starting in Linux 5.10, trying to write to /dev/{null,zero} errors out. Prefer to inform people when this happens rather than hoping they guess what's wrong. Reviewed-by: Antonio Russo <aerusso@aerusso.net> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes: #11991	2021-11-12 16:31:55 -08:00
Brian Behlendorf	c4a5e56abc	ZTS: Add known exceptions The receive-o-x_props_override test case reliably fails on the FreeBSD main builders (but not on Linux). Until the root cause is understood, add this test to the FreeBSD exception list. On Linux the alloc_class_012_pos test case may occasionally fail. This is a known false positive which has also been added to the Linux exception list until the test can be made entirely reliable. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12272	2021-11-12 16:31:55 -08:00
Brian Behlendorf	ec1b033413	ZTS: Standardize use of destroy_dataset in cleanup When cleaning up a test case standardize on using the convention: datasetexists $ds && destroy_dataset $ds <flags> By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure that the destroy is retried in the event that a ZFS volume is busy. This helps ensure tests are fully cleaned up and prevents false positive test failures on Linux. Note that all of the tests which used 'zfs destroy' in cleanup have been updated even if they don't use volumes. This was done to clearly establish the expected convention. Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12663	2021-11-12 16:31:55 -08:00
Damian Szuberski	8464de1315	Update `checkstyle` workflow env to ubuntu-20.04 - `checkstyle` workflow uses ubuntu-20.04 environment - improved `mancheck.sh` readability Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #12713	2021-11-12 16:31:55 -08:00
Rich Ercolani	cc40a67cf8	Workaround cloud-init hotplug issue cloud-init added a hook which triggers on every device add/rm event, which results in holding open devices for a while after they're created/destroyed. So let's shove an exclusion rule for that into the GH workflows until it gets fixed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12644 Closes #12669	2021-11-12 16:31:54 -08:00
George Melikov	c2a69a21ef	CI: don't install abigail-tools We use docker image instead. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #12529	2021-11-12 16:31:54 -08:00
George Melikov	866ac70904	CI: use fresh libabigail via docker image Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #12529	2021-11-12 16:31:48 -08:00
Jonathon	f3c85e3ebd	Update libera webchat client URL Libera have made a webchat client available. This change builds on #12127. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev> Closes #12251	2021-11-12 15:40:55 -08:00
Paul Dagnelie	14770e1030	Don't direct to freenode in issue template While Libera doesn't yet have a webchat client, we should at least direct them to the right network. Once a webchat client is available, we can direct them to it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #12127	2021-11-12 15:40:35 -08:00
Attila Fülöp	fb9eee4cc2	gcc 11 cleanup Compiling with gcc 11.1.0 produces three new warnings. Change the code slightly to avoid them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #12130 Closes #12188 Closes #12237	2021-11-12 15:24:36 -08:00
Brian Behlendorf	820c95750b	Use fallthrough macro As of the Linux 5.9 kernel a fallthrough macro has been added which should be used to annotate all intentional fallthrough paths. Once all of the kernel code paths have been updated to use fallthrough, the -Wimplicit-fallthrough option will become the default. To avoid warnings in the OpenZFS code base when this happens, apply the fallthrough macro. Additional reading: https://lwn.net/Articles/794944/ Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12441	2021-11-12 15:24:36 -08:00
Rich Ercolani	66c9e15686	Correct a flaw in the Python 3 version checking It turns out the ax_python_devel.m4 version check assumes that ("3.X+1.0" >= "3.X.0") is True in Python, which is not the case when X+1 is 10 or above and X is not. (Also presumably X+1=100 and ...) So let's remake the check to behave consistently, using the "packaging" or (if absent) the "distlib" modules. (Also, update the GitHub workflows to use the new packages.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes: #12073	2021-11-12 15:24:36 -08:00
Rich Ercolani	c64c17328b	Let zfs diff be more permissive In the current world, `zfs diff` will die on certain kinds of errors that come up on ordinary, not-mangled filesystems - like EINVAL, which can come from a file with multiple hardlinks when the link whose name is referenced has been deleted. Since it should always be safe to continue, let's relax about all error codes - still print something for most, but don't immediately abort when we encounter them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12072	2021-11-12 15:24:36 -08:00
Rich Ercolani	439b4b134d	Added test for being able to read various variants of zstd As detailed in #12022 and #12008, it turns out the current zstd implementation is quite nonportable, and results in various on-disk header layouts, each readable only by the kind of platform that wrote it. So I've added a test which contains a dataset with a file written by Linux/x86_64 and one written by FBSD/ppc64. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12030	2021-11-12 15:24:36 -08:00
наб	5957574694	zed: only go up to current limit in close_from() fallback Consider the following strace log: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = -1 EBADF (Bad file descriptor) dup2(0, 30000) = -1 EBADF (Bad file descriptor) dup2(0, 300000) = -1 EBADF (Bad file descriptor) prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0 dup2(0, 30) = 30 dup2(0, 300) = 300 dup2(0, 3000) = 3000 dup2(0, 30000) = 30000 dup2(0, 300000) = 300000 Even a privileged process needs to bump its rlimit before being able to use fds higher than rlim_cur. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11834	2021-11-12 15:24:36 -08:00
наб	3e04897edc	zed: implement close_from() in terms of /proc/self/fd, if available (/dev/fd on Darwin). Consider the following strace output: prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0 Yes, that is well over a million file descriptors! This reduces the ZED start-up time from "at least a second" to "instantaneous", and, under strace, from "don't even try" to "usable" by simple virtue of doing five syscalls instead of over a million; in most cases the main loop does nothing. Recent Linuxes (5.8+) have close_range(2) for this, but that's an overoptimisation (and libcs don't have wrappers for it yet). This is also run by the ZEDLET pre-exec. Compare: Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0 to Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0 Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0 lol Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11834	2021-11-12 15:24:36 -08:00
Rich Ercolani	fb823061b0	Fix cross-endian interoperability of zstd It turns out that layouts of union bitfields are a pain, and the current code results in an inconsistent layout between BE and LE systems, leading to zstd-active datasets on one erroring out on the other. Switch everyone over to the LE layout, and add compatibility code to read both. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12008 Closes #12022	2021-11-12 15:24:36 -08:00
George Melikov	4e8a639d5f	CI: generate ABI files if changed So commit author can just download them as artifacts and commit. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #12379	2021-11-12 15:22:08 -08:00
Brian Behlendorf	8009f02748	Update bug report template - Remove the "SPL Version" line, the repositories have been merged since the 0.8 release and we no longer need to ask about this. - Simply ask for the kernel version / patch level and add a hint about how to get this information on Linux and FreeBSD. - Remove "Status: Triage Needed" from the template, in practice we really haven't been using this label so let's stop setting it. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #12340	2021-11-12 15:22:00 -08:00
Jonathon	4a5316eef4	Update libera webchat client URL Libera have made a webchat client available. This change builds on #12127. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev> Closes #12251	2021-11-12 15:21:48 -08:00
Paul Dagnelie	e00f3da136	Don't direct to freenode in issue template While Libera doesn't yet have a webchat client, we should at least direct them to the right network. Once a webchat client is available, we can direct them to it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #12127	2021-11-12 15:21:31 -08:00
Tony Hutter	ef686e96ec	Tag zfs-2.0.6 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2021-09-22 15:19:08 -07:00
Brian Behlendorf	7a41ef240a	Linux 5.15 compat: get_acl() Kernel commits 332f606b32b6 ovl: enable RCU'd ->get_acl() 0cad6246621b vfs: add rcu argument to ->get_acl() callback Added compatibility code to detect the new ->get_acl() interface and correctly handle the case where the new rcu argument is set. Reviewed-by: Coleman Kane <ckane@colemankane.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12548	2021-09-22 15:19:08 -07:00
Alexander	72d16a9b49	Linux 5.15 compat: standalone <linux/stdarg.h> Kernel commits 39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers") c0891ac15f04 ("isystem: ship and use stdarg.h") 564f963eabd1 ("isystem: delete global -isystem compile option") (for now found in the linux-next.git tree; they will land in Linus' tree during the ongoing 5.15 cycle with one of the akpm merges) removed the -isystem flag and disallowed the inclusion of any compiler header files. They also introduced a minimal <linux/stdarg.h> as a replacement for <stdarg.h>. include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes <stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h> and include it instead of the compiler's one to prevent module build breakage. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #12531	2021-09-22 15:19:08 -07:00
Brian Behlendorf	54c358c3f2	Linux 5.15 compat: block device readahead The 5.15 kernel moved the backing_dev_info structure out of the request queue structure which causes a build failure. Rather than look in the new location for the BDI we instead detect this upstream refactoring by the existence of either the blk_queue_update_readahead() or disk_update_readahead() functions. In either case, there's no longer any reason to manually set the ra_pages value since it will be overridden with a reasonable default (2x the block size) when blk_queue_io_opt() is called. Therefore, we update the compatibility wrapper to do nothing for 5.9 and newer kernels. While it's tempting to do the same for older kernels we want to keep the compatibility code to preserve the existing behavior. Removing it would effectively increase the default readahead to 128k. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12532	2021-09-22 15:19:08 -07:00
Brian Behlendorf	560e9fc817	Linux 5.14 compat: META Increase the Linux-Maximum version in the META file to 5.14. All of the required compatibility patches have been merged and the 5.14 kernel has been officially released. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12565	2021-09-22 15:19:08 -07:00
Brian Behlendorf	d642efe83c	Linux 5.13 compat: META Increase the Linux-Maximum version in the META file to 5.13. All of the required compatibility patches have been merged and the 5.13 kernel has been officially released. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2021-09-22 15:19:08 -07:00
Alexander Motin	51e313f610	FreeBSD: Ignore make_dev_s() errors Since errors returned by zvol_create_minor_impl() are ignored by the common code, it is more convenient to ignore make_dev_s() errors there. It allows, for example, the device for the zvol to be created after a later rename instead of leaving it stuck in a half-created state. zvol_rename_minor() already ignores those errors. While there, switch from MAXPHYS to maxphys in FreeBSD 13+. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12375	2021-09-22 15:19:08 -07:00
Alexander Motin	327f12c291	FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+ Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12378	2021-09-22 15:19:08 -07:00
Alexander Motin	5d4f7e7566	FreeBSD: Retry OCF ENOMEM errors. ZFS does not expect transient errors from crypto. For reads they are counted as checksum errors, while for writes they end up in a panic. To avoid panicking on random low-memory conditions, retry ENOMEM errors in the OCF wrapper function. While there, remove the unneeded timeout and priority from msleep(). External-issue: https://reviews.freebsd.org/D30339 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12077	2021-09-22 15:19:08 -07:00
Serapheim Dimitropoulos	abd0b59e48	Livelist logic should handle dedup blkptrs Update the logic to handle the dedup-case of consecutive FREEs in the livelist code. The logic still ensures that all the FREE entries are matched up with a respective ALLOC by keeping a refcount for each FREE blkptr that we encounter and ensuring that this refcount gets to zero by the time we are done processing the livelist. zdb -y no longer panics when encountering double frees Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #11480 Closes #12177	2021-09-22 15:19:08 -07:00
Coleman Kane	6b9d0eda75	Linux 5.14 compat: explicitly assign set_page_dirty Kernel 5.14 introduced a change where set_page_dirty of struct address_space_operations is no longer implicitly set to __set_page_dirty_buffers(), which ended up resulting in a NULL pointer dereference when the kernel attempted to call it. This change sets .set_page_dirty in the structure to __set_page_dirty_nobuffers(), which was introduced with the related patch set. The breaking change was introduced in commit 0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12427	2021-09-22 15:19:08 -07:00
Paul Dagnelie	744cdfd93b	Add SIGSTOP and SIGTSTP handling to issig This change adds SIGSTOP and SIGTSTP handling to the issig function; this mirrors its behavior on Solaris. This way, long running kernel tasks can be stopped with the appropriate signals. Note that doing so with ctrl-z on the command line doesn't return control of the tty to the shell, because tty handling is done separately from stopping the process. That can be future work, if people feel that it is a necessary addition. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Issue #810 Issue #10843 Closes #11801	2021-09-22 15:19:08 -07:00
Brian Behlendorf	36d50b60d8	Linux 5.14 compat: blk_alloc_disk() In Linux 5.14, blk_alloc_queue is no longer exported, and its usage has been superseded by blk_alloc_disk, which returns a gendisk struct from which we can still retrieve the struct request_queue* that is needed in the one place where it is used. This also replaces the call to alloc_disk(minors), and minors is now set via struct member assignment. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12362 Closes #12409	2021-09-22 15:19:08 -07:00
Mark Johnston	0c4f86be74	Initialize dn_next_type[] in the dnode constructor It seems nothing ensures that this array is zeroed when a dnode is freshly allocated, so in principle it retains the values from the previous allocation. In practice it seems to be the case that the fields should end up zeroed, but we can zero the field anyway for consistency. This was found using KMSAN. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12383	2021-09-22 15:19:08 -07:00
Mark Johnston	88be308b2f	Zero pad bytes following TX_WRITE log data When logging a TX_WRITE record in the case where file data has to be copied from the DMU, we pad the log record size to a multiple of 8 bytes. In this case, any padding bytes should be zeroed, otherwise the contents of uninitialized memory are written to the ZIL. This was found using KMSAN. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12383	2021-09-22 15:19:08 -07:00
Mark Johnston	d6dc79eabc	Zero pad bytes when allocating a ZIL record When allocating a record, we round up the allocation size to a multiple of 8. In this case, any padding bytes should be zeroed, otherwise the contents of uninitialized memory are written to the ZIL. This was found using KMSAN. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12383	2021-09-22 15:19:08 -07:00
Mark Johnston	900a444107	Initialize all fields in zfs_log_xvattr() When logging TX_SETATTR, we could otherwise fail to initialize part of the corresponding ZIL record depending on which fields are present in the xvattr. Initialize the creation time and the AV scan timestamp to zero so that uninitialized bytes are not written to the ZIL. This was found using KMSAN. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12383	2021-09-22 15:19:08 -07:00
George Wilson	1885e5ebab	file reference counts can get corrupted Callers of zfs_file_get and zfs_file_put can corrupt the reference counts for the file structure resulting in a panic or a soft lockup. When zfs send/recv runs, it will add a reference count to the open file, and begin to send or recv the stream. If the file descriptor is closed, then when dmu_recv_stream() or dmu_send() return we will call zfs_file_put to remove the reference we placed on the file structure. Unfortunately, because zfs_file_put() uses the file descriptor to look up the file structure, it may end up finding that the file descriptor table no longer contains the file struct, thus leaking the file structure. Or it might end up finding a file descriptor for a different file and blindly updating its reference counts. Other failure modes probably exist. This change reworks the zfs_file_[get|put] interface to not rely on the file descriptor but instead pass the zfs_file_t pointer around. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-issue: DLPX-76119 Closes #12299	2021-09-22 15:19:08 -07:00
Antonio Russo	1ad8fcc054	Revert Consolidate arc_buf allocation checks This reverts commit `13fac09868`. Per the discussion in #11531, the reverted commit---which intended only to be a cleanup commit---introduced a subtle, unintended change in behavior. Care was taken to partially revert and then reapply `10b3c7f5e4` which would otherwise have caused a conflict. These changes were squashed in to this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Suggested-by: @chrisrd Suggested-by: robn@despairlabs.com Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11531 Closes #12227	2021-09-22 15:19:08 -07:00
Rich Ercolani	05c96b438a	Fix unfortunate NULL in spa_update_dspace After `1325434b`, we can in certain circumstances end up calling spa_update_dspace with vd->vdev_mg NULL, which ends poorly during vdev removal. So let's not do that further space adjustment when we can't. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12380 Closes #12428	2021-09-22 15:19:08 -07:00
Rich Ercolani	ccf6d0a59b	Tinker with slop space accounting with dedup * Tinker with slop space accounting with dedup Do not include the deduplicated space usage in the slop space reservation, it leads to surprising outcomes. * Update spa_dedup_dspace sometimes Sometimes, we get into spa_get_slop_space() with spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set. So call the code to update it before we use it if we hit that case. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12271	2021-09-22 15:19:08 -07:00
Prakash Surya	61ae6c99f7	Add upper bound for slop space calculation This change modifies the behavior of how we determine how much slop space to use in the pool, such that now it has an upper limit. The default upper limit is 128G, but is configurable via a tunable. (Backporting note: Snipped out the embedded_log portion of the changes.) Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Closes #11023	2021-09-22 15:19:08 -07:00
Tony Hutter	e9353bc2ef	Tag zfs-2.0.5 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2021-06-23 13:28:36 -07:00
George Amanakis	87d93731e7	Avoid deadlock when removing L2ARC devices under I/O In case we have I/O and try to remove an L2ARC device a deadlock might occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV to be dropped while holding the hash_lock. However, spa_l2cache_load() holds SCL_ALL and waits for the hash_lock in l2arc_evict(). Fix this by moving zfs_blkptr_verify() to the top of arc_read() before the hash_lock is taken. Verify the block pointer and return a checksum error if damaged rather than halting the system, by using BLK_VERIFY_LOG instead of BLK_VERIFY_HALT. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #12054	2021-06-23 13:22:15 -07:00
Paul Zuchowski	bd197378e7	Do not hash unlinked inodes In zfs_znode_alloc we always hash inodes. If the znode is unlinked, we do not need to hash it. This fixes the problem where zfs_suspend_fs is doing zrele (iput) in an async fashion, and zfs_resume_fs unlinked drain processing will try to hash an inode that could still be hashed, resulting in a panic. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alan Somers <asomers@gmail.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9741 Closes #11223 Closes #11648 Closes #12210	2021-06-23 13:22:15 -07:00
jharmening	cd2bb9ca44	FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate whether it has changed the busy state of the mount. The filesystem may still assume that its .vfs_quotactl entrypoint is always called with the mount busied, but only needs to unbusy the mount (and clear *mp_busy) if it does something that actually requires the mount to be unbusied. It no longer needs to blindly copy-paste the UFS protocol for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jason Harmening <jason.harmening@gmail.com> Closes #12052	2021-06-23 13:22:15 -07:00
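The vdev_id partition fix in 9d14963de6 comes down to anchoring the match. The Python sketch below contrasts the behaviour described in that commit message with a stricter pattern. It is an illustration only, not the actual vdev_id shell code; the pattern `p([0-9]+)$` and both helper names are assumptions.

```python
import re


def old_partition_suffix(dm_name: str) -> str:
    """Behaviour described in 9d14963de6: take everything after the first 'p'."""
    if "p" not in dm_name:
        return ""
    return "-part" + dm_name.split("p", 1)[1]


# Assumed stricter pattern: a 'p' followed only by digits, at the end of the name.
PARTITION_RE = re.compile(r"p([0-9]+)$")


def partition_suffix(dm_name: str) -> str:
    """Return '-partN' only when the DM name ends in p<digits>."""
    match = PARTITION_RE.search(dm_name)
    return "-part" + match.group(1) if match else ""


for name in ("dm-2p5", "mpatha"):
    print(name, "L0" + old_partition_suffix(name), "L0" + partition_suffix(name))
```

Both versions turn 'dm-2p5' into 'L0-part5', but only the anchored pattern leaves the multipath name 'mpatha' as plain 'L0' instead of 'L0-partatha'.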
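Commit 66c9e15686 explains why comparing version strings breaks once a component reaches two digits. A minimal Python sketch of the pitfall and of a numeric comparison follows. The commit itself switches to the "packaging" or "distlib" modules; this sketch swaps in a plain integer-tuple parse instead, which only handles purely numeric dotted versions.

```python
def version_tuple(version: str) -> tuple:
    """Parse a purely numeric dotted version such as '3.10.0' into (3, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def version_at_least(actual: str, required: str) -> bool:
    """Compare component by component as integers, not character by character."""
    return version_tuple(actual) >= version_tuple(required)


# String comparison stops at the first differing character: '1' < '9',
# so "3.10.0" sorts before "3.9.0" even though it is the newer release.
print("3.10.0" >= "3.9.0")                  # False
print(version_at_least("3.10.0", "3.9.0"))  # True
```

Pre-release tags such as "3.11.0rc1" would make `int()` fail here, which is one reason a real version library is the better choice.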
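Commits 3e04897edc and 5957574694 together describe the ZED close_from() strategy: enumerate /proc/self/fd (or /dev/fd on Darwin) when it is available, and otherwise brute-force close only up to the current soft RLIMIT_NOFILE. The Python sketch below models that strategy; it is not ZED's C implementation, and the 65536 cap used when the soft limit is unlimited is a hypothetical choice.

```python
import os
import resource


def close_from(lowfd: int) -> None:
    """Close every open file descriptor >= lowfd."""
    # Preferred path: list only the descriptors that are actually open,
    # a handful of syscalls instead of one close() per possible fd number.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # directory not available on this platform
        for name in names:
            fd = int(name)
            if fd >= lowfd:
                try:
                    os.close(fd)
                except OSError:
                    pass  # e.g. the fd listdir() used internally, already closed
        return
    # Fallback: brute force, but only up to the current soft limit, following
    # the commit's point that fds above rlim_cur are unusable without first
    # raising the limit.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = 1 << 16  # hypothetical cap for an unlimited soft limit
    os.closerange(lowfd, soft)
```

With a soft limit of 1024 and a hard limit of 1024*1024, as in the strace logs above, the fallback issues at most about a thousand close() calls rather than a million.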
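Commits 88be308b2f and d6dc79eabc both round a ZIL record up to a multiple of 8 bytes and then zero the padding. The Python sketch below shows the idea on a reused buffer that still holds stale bytes. `write_log_record()` is a hypothetical helper, and `p2roundup()` follows the usual power-of-two round-up idiom in the spirit of the P2ROUNDUP macro rather than reproducing any specific ZFS code.

```python
def p2roundup(n: int, align: int) -> int:
    """Round n up to a multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


def write_log_record(buf: bytearray, payload: bytes) -> int:
    """Copy payload into buf, zero the alignment padding, return the record length."""
    reclen = p2roundup(len(payload), 8)
    buf[:len(payload)] = payload
    # Without this step, whatever the buffer previously held would be
    # written out as part of the padded record.
    buf[len(payload):reclen] = bytes(reclen - len(payload))
    return reclen


stale = bytearray(b"\xaa" * 16)  # simulates uninitialized / reused memory
length = write_log_record(stale, b"hello")
print(length, bytes(stale[:length]))
```

Here a 5-byte payload becomes an 8-byte record whose last three bytes are zero rather than the stale 0xAA filler.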
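Commit 61ae6c99f7 caps the slop space reservation at a default of 128G. A rough Python model of such a calculation follows. Only the 128G cap comes from the commit message; the 1/32 fraction (a shift of 5), the 128 MiB floor, the "never more than half the pool" rule and the constant names are assumptions, and the dedup adjustments from ccf6d0a59b are not modeled.

```python
MiB = 1 << 20
GiB = 1 << 30

SLOP_SHIFT = 5         # assumed: reserve 1/32 of the pool by default
MIN_SLOP = 128 * MiB   # assumed floor
MAX_SLOP = 128 * GiB   # default upper bound stated in the commit message


def slop_space(pool_bytes: int) -> int:
    """Approximate slop reservation for a pool of pool_bytes usable space."""
    # Proportional reservation, now clamped by the new upper bound.
    slop = min(pool_bytes >> SLOP_SHIFT, MAX_SLOP)
    # Keep at least MIN_SLOP, but never more than half the pool.
    return max(slop, min(pool_bytes >> 1, MIN_SLOP))
```

Under these assumptions a 1 TiB pool reserves 32 GiB, while a 10 TiB pool, which would otherwise reserve 320 GiB, is held to the 128 GiB cap.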
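Commit abd0b59e48 replaces one-to-one FREE/ALLOC matching in the livelist code with a per-blkptr refcount, so a deduplicated block that is freed twice no longer looks like a double free. The Python sketch below is a simplified model, not the livelist code: block ids are plain strings, and the newest-first walk (so that each FREE is seen before the ALLOC it cancels) is an assumption about processing order.

```python
from collections import Counter


def live_blocks(entries):
    """Return the blocks still allocated after replaying a livelist.

    entries: list of ("ALLOC" | "FREE", block_id) tuples in logging order.
    """
    pending_frees = Counter()
    live = []
    # Walk newest-first so each FREE is seen before the ALLOC it cancels.
    for kind, bp in reversed(entries):
        if kind == "FREE":
            pending_frees[bp] += 1  # dedup can free the same bp more than once
        elif kind == "ALLOC":
            if pending_frees[bp] > 0:
                pending_frees[bp] -= 1  # this ALLOC was freed later
            else:
                live.append(bp)
        else:
            raise ValueError(f"unknown entry kind: {kind}")
    # Every FREE must have been matched by an ALLOC by the time we are done.
    unmatched = {bp: n for bp, n in pending_frees.items() if n}
    if unmatched:
        raise ValueError(f"FREE without matching ALLOC: {unmatched}")
    live.reverse()  # back to logging order
    return live


log = [("ALLOC", "A"), ("ALLOC", "A"), ("ALLOC", "B"), ("FREE", "A"), ("FREE", "A")]
print(live_blocks(log))
```

A set of pending FREEs would reject the second ("FREE", "A") as a duplicate; counting them lets both dedup references cancel their ALLOCs, leaving only "B" live.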

6645 Commits