Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Rich Ercolani	35a6247c5f	Add a delay to tearing down threads. It's been observed that in certain workloads (zvol-related being a big one), ZFS will end up spending a large amount of time spinning up taskqs only to tear them down again almost immediately, then spin them up again... I noticed this when I looked at what my mostly-idle system was doing and wondered how on earth taskq creation/destroy was a bunch of time... So I added a configurable delay to avoid it tearing down tasks the first time it notices them idle, and the total number of threads at steady state went up, but the amount of time being burned just tearing down/turning up new ones almost vanished. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14938	2023-06-26 13:57:12 -07:00
Ameer Hamza	a52c8d49c6	Merge pull request #139 from truenas/truenas/zfs-2.2-testing Forward port truenas/zfs patches to upstream openzfs master	2023-06-22 16:30:04 +05:00
Ameer Hamza	5bf0d5db13	Bump changelog for 2.1.99 Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ameer Hamza	fc2b4d3458	Skip id-mapped tests for now due to nfsv4 acls incompatibility	2023-06-21 21:29:23 +05:00
Ameer Hamza	e3b5817448	Port latest zfsd changes from upstream FreeBSD Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ameer Hamza	06029e211c	Port TrueNAS contrib changes and adjust github workflows Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 21:29:23 +05:00
Andrew Walker	c16f99d389	Improve zpl_permission performance This function can be frequently called with MAY_EXEC\|MAY_NOT_BLOCK during RCU path walk. Where possible we should try not to break out of it. In this case we check whether flag ZFS_NO_EXECS_DENIED is set and check mode (similar to fastexecute check in zfs_acl.c). Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ameer Hamza	f34365ed28	zfsd: add support for hotplugging spares If you remove an unused spare and then reinsert it, zfsd will now online it in all pools. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 21:29:23 +05:00
Umer Saleem	0b58a60509	Fix OpenZFS build issue for Debian Bookworm dkms package layout is changed in bookworm and splits into dh-dkms package. Debhelper in Bookworm is updated to use dh-sequence-dkms instead of dkms. GitHub Actions are updated to use Ubuntu 22.04 instead of Ubuntu 20.04, since dh-sequence-dkms is not available on Ubuntu 20.04. Signed-off-by: Umer Saleem <usaleem@ixsystems.com>	2023-06-21 21:29:23 +05:00
Andrew Walker	33ec2c3e96	Simplify get/set NFS4 ACL (#113) This removes an extra memory allocation / free from the NFS4 ACL xattr handler. Initially this was written rather quickly in the alpha cycle of SCALE and implemented in a way to ensure that xattr was exactly matching format used internally in samba's vfs_acl_xattr module. Since this time a more efficient conversion between the Samba format and various other ones was added for the purpose of inclusion in the Kernel NFS server. This change simplifies conversion between internal NFS ACL and external xattr representation, but has no impact on userspace and kernel consumers of this xattr (format does not change). Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 21:29:23 +05:00
Andrew Walker	09a0c8a0ee	Fix ZFS_READONLY implementation on Linux (#121) MS-FSCC 2.6 is the governing document for DOS attribute behavior. It specifies the following: For a file, applications can read the file but cannot write to it or delete it. For a directory, applications cannot delete it, but applications can create and delete files from the directory. Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 21:29:23 +05:00
Umer Saleem	02af6c4175	Update CI workflow for native packages CI workflow now builds RPM converted Debian packages along with native debian packages. Signed-off-by: Umer Saleem <usaleem@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ryan Moeller	6115cf6a76	SCALE: ignore wholedisk We never want to partition vdevs automatically from ZFS in SCALE. Ignore the wholedisk flag in SCALE and skip the tests that expect auto partitioning to work. Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 21:29:23 +05:00
Umer Saleem	f4efe4ea92	Build packages with debug symbols With --enable-debuginfo configured, ZFS packages are built with debug symbols embedded into the binaries. Signed-off-by: Umer Saleem <usaleem@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ameer Hamza	f41d5dc6f1	Add kfpu entry to kbuild and suppress Cppcheck checks Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ryan Moeller	26b74065b9	Provide kfpu_begin/end from spl Jira: NAS-115648	2023-06-21 21:29:23 +05:00
Ryan Moeller	631adac5f6	initramfs: Skip lvm scan before boot pool import TrueNAS SCALE doesn't boot from pools on top of LVM, and the scan can take a significant amount of time on systems with a large number of disks. Skip the lvm commands in our local-top/zfs script. Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 21:29:23 +05:00
Andrew	ac2420afb0	NAS-116836 / Force BSD semantics for group ownership if NFSV4ACL (#78) When a new file is created on FreeBSD it is given the group of the directory which contains it. On Linux it is given either the effective GID of the process (System V semantics) or the GID of the parent directory (BSD semantics). Since there is no hard-and-fast rule about creation semantics for NFSv4 ACLs on Linux, we should opt for what is least likely to break users' permissions on change from FreeBSD to Linux. Avoid actually setting the SGID bit on dirs unless it was explicitly set. Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ameer Hamza	c0d493822b	Fix ACL build errors on sync with openzfs/master Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 21:29:23 +05:00
Andrew	6bf8daf376	Add ability for xattr handler to "strip" NFSv4 ACL (#54) On Linux POSIX ACLs can be removed via rmxattr() for the relevant system xattrs. On FreeBSD a non-trivial ACL can be converted to one that is described by the mode with no loss of info via combination of acl_get_file(), acl_strip_np(), and acl_set_file(). Since there's no libc equivalent of these ops in Linux for NFSv4 ACLs, this commit makes this less error prone by handling entirely in ZFS. When user performs rmxattr() vfs_setxattr() is called with value of NULL and length of 0. Add special handling for this situation in the xattr handler for the NFSv4 ACL so that we generate a new ACL and zfs_acl_chmod() with the existing mode of file, then set the ACL. Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 21:29:23 +05:00
Andrew	6dc46c7d54	NAS-115465 / 22.12 / expose ZFS_ACL_TRIVIAL to users (#52) Add ACL_IS_TRIVIAL and ACL_IS_DIR flags as ACL-wide flags in the system.nfs4_acl_xdr generated on getxattr requests. These are non-RFC flags that are useful for userspace applications (especially the ACL_IS_TRIVIAL flag as it can be used to avoid relatively expensive ACL-related operations). Also add system.nfs4_acl_xdr to xattr results if ACL is not trivial. This duplicates POSIX ACL behavior where whether an ACL is set on a path can be determined via listxattr(). Since the ACL is not actually removed, we check whether the ZFS_ACL_TRIVIAL is set. If the flag is not set, then we omit the xattr name from the list. This allows users to determine whether ACL is trivial from listxattr(). Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 21:29:23 +05:00
Ryan Moeller	e5f1583a08	Make zpl_permission work with 5.12+ kernels The "permission" inode operation takes a new `struct user_namespace *` parameter starting in Linux 5.12. Add a configure check and adapt accordingly. Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 02:51:24 +05:00
Ryan Moeller	e7904b8280	Switch to production builds for SCALE Jira: NAS-113186 Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 02:51:24 +05:00
Andrew Walker	8503a85e06	Fix access check when cred allows override of ACL Properly evaluate edge cases where user credential may grant capability to override DAC in various situations. Switch to using ns-aware checks rather than capable(). Expand optimization allow bypass of zfs_zaccess() in case of trivial ACL if MAY_OPEN is included in requested mask. This will be evaluated in generic_permission() check, which is RCU walk safe. This means that in most cases evaluating permissions on boot volume with NFSv4 ACLs will follow the fast path on checking inode permissions. Additionally, CAP_SYS_ADMIN is granted to nfsd process, and so override for this capability in access2 policy check is removed in favor of a simple check for fsid == 0. Checks for CAP_DAC_OVERRIDE and other override capabilities are kept as-is. Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 02:51:24 +05:00
Alexander Motin	4d8b67b164	Write /sys/kernel/wait_for_device_probe before import. The new sysfs attribute makes the kernel wait for all device probes to complete before returning. Without it, the wait_for_udev call does not give any guarantees. Ticket: NAS-108200 Signed-off-by: Alexander Motin <mav@FreeBSD.org>	2023-06-21 02:51:24 +05:00
Ryan Moeller	c078b8660e	Make acltype=nfsv4 the default on Linux, too Now that we support NFSv4 ACLs on Linux, this can now be made the default across all platforms. Update the documentation and tests accordingly. Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 02:51:24 +05:00
Ameer Hamza	3c72bef6bd	Adjust zfsd Makefiles for openzfs compatibility Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-06-21 02:51:15 +05:00
Ryan Moeller	35ca19b591	Add zfsd for FreeBSD Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 00:33:40 +05:00
Andrew	c6ba4a01f0	Implement NFSv41 ACLs through xattr This implements NFSv41 (RFC 5661) ACLs in a manner compatible with vfs_nfs4acl_xattr in Samba and nfs4xdr-acl-tools. There are three key areas of change in this commit: 1) NFSv4 ACL management through system.nfs4_acl_xdr xattr. Install an xattr handler for "system.nfs4_acl_xdr" that presents an xattr containing full NFSv41 ACL structures generated through rpcgen using specification from the Samba project. This xattr is used by userspace programs to read and set permissions. 2) add an i_op->permissions endpoint: zpl_permissions(). This is used by the VFS in Linux to determine whether to allow / deny an operation. Wherever possible, we try to avoid having to call zfs_access(). If kernel has NFSv4 patch for VFS, then perform more complete check of available access mask. 3) add capability-based overrides to secpolicy_vnode_access2() there are various situations in which ACL may need to be overridden based on capabilities. This logic is almost directly copied from Linux VFS. For instance, root needs to be able to always read / write ACLs (otherwise admin can get locked out from files). This commit was initially inspired by work from Paul B. Henson to implement NFSv4.0 (RFC3530) ACLs in ZFS on Linux. Key areas of divergence are as follows: - ACL specification, xattr format, xattr name - Addition of handling for NFSv4 masks from Linux VFS - Addition of ACL overrides based on capabilities Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 00:33:32 +05:00
Andrew Walker	5e1eba8718	Advertise support for large xattrs on TrueNAS SB_LARGEXATTR is used in TrueNAS SCALE to indicate to the kernel that the filesystem supports large-size xattrs (greater than 64KiB). This flag is used to evaluate whether to allow large xattr read or write requests (up to 2 MiB). Signed-off-by: Andrew Walker <awalker@ixsystems.com>	2023-06-21 00:33:25 +05:00
Waqar Ahmed	cfd08bedb2	Add action to build and push docker image on master update Signed-off-by: Waqar Ahmed <waqarahmedjoyia@live.com>	2023-06-21 00:33:20 +05:00
Andrew Walker	17d7f9de97	Add check for custom TrueNAS kernel Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 00:33:13 +05:00
Waqar Ahmed	fd31804abc	Add CI for building zfs package Signed-off-by: Ryan Moeller <ryan@iXsystems.com>	2023-06-21 00:33:06 +05:00
Matt Macy	ae78a23f75	Fix ZFS_DEBUG_MODIFY assert in arc_buf_try_copy_decompressed_data The assert does not account for the case where there is a single buffer in the chain that is decompressed and has a valid checksum. Signed-off-by: Matt Macy <mmacy@FreeBSD.org>	2023-06-21 00:32:59 +05:00
Ryan Moeller	23f878a89d	Add packaging bits for TrueNAS SCALE	2023-06-21 00:32:51 +05:00
Alexander Motin	8e8acabdca	Fix memory leak in zil_parse(). `482da24e2` missed arc_buf_destroy() calls on log parse errors, possibly leaking up to 128KB of memory per dataset during ZIL replay. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14987	2023-06-17 19:51:37 -07:00
George Amanakis	10e36e1761	Shorten arcstat_quiescence sleep time With the latest L2ARC fixes, 2 seconds is too long to wait for quiescence of arcstats like l2_size. Shorten this interval to avoid having the persistent L2ARC tests in ZTS prematurely terminated. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14981	2023-06-15 12:45:36 -07:00
Alexander Motin	ccec7fbe1c	Remove ARC/ZIO physdone callbacks. Those callbacks were introduced many years ago as part of a bigger patch to smoothen the write throttling within a txg. They allow to account completion of individual physical writes within a logical one, improving cases when some of physical writes complete much sooner than others, gradually opening the write throttle. Few years after that ZFS got allocation throttling, working on a level of logical writes and limiting number of writes queued to vdevs at any point, and so limiting latency distribution between the physical writes and especially writes of multiple copies. The addition of scheduling deadline I proposed in #14925 should further reduce the latency distribution. Grown memory sizes over the past 10 years should also reduce importance of the smoothing. While the use of physdone callback may still in theory provide some smoother throttling, there are cases where we simply can not afford it. Since dirty data accounting is protected by pool-wide lock, in case of 6-wide RAIDZ, for example, it requires us to take it 8 times per logical block write, creating huge lock contention. My tests of this patch show radical reduction of the lock spinning time on workloads when smaller blocks are written to RAIDZ pools, when each of the disks receives 8-16KB chunks, but the total rate reaching 100K+ blocks per second. Same time attempts to measure any write time fluctuations didn't show anything noticeable. While there, remove also io_child_count/io_parent_count counters. They are used only for couple assertions that can be avoided. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14948	2023-06-15 10:49:03 -07:00
Brian Behlendorf	e32e326c5b	ZTS: Skip send_raw_ashift on FreeBSD On FreeBSD 14 this test runs slowly in the CI environment and is killed by the 10 minute timeout. Skip the test on FreeBSD until the slow down is resolved. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #14961	2023-06-14 08:04:05 -07:00
Alexander Motin	d057807ede	Switch refcount tracking from lists to AVL-trees. With large number of tracked references list searches under the lock become too expensive, creating enormous lock contention. On my tests with ZFS_DEBUG enabled this increases write throughput with 32KB blocks from ~1.2GB/s to ~7.5GB/s. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14970	2023-06-14 08:02:27 -07:00
George Amanakis	8af1104f83	Store the L2ARC device ashift in the vdev label If this is not done, and the pool has an ashift other than the default (at the moment 9) then the following happens: 1) vdev_alloc() assigns the ashift of the pool to L2ARC device, but upon export it is not stored anywhere 2) at the first import, vdev_open() sees an vdev_ashift() of 0 and assigns the logical_ashift, which is 9 3) reading the contents of L2ARC, including the header fails 4) L2ARC buffers are not restored in ARC. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14313 Closes #14963	2023-06-14 08:01:17 -07:00
George Amanakis	feff9dfed3	Fix the L2ARC write size calculating logic (2) While commit `bcd5321` adjusts the write size based on the size of the log block, this happens after comparing the unadjusted write size to the evicted (target) size. In this case l2ad_hand will exceed l2ad_evict and violate an assertion at the end of l2arc_write_buffers(). Fix this by adding the max log block size to the allocated size of the buffer to be committed before comparing the result to the target size. Also reset the l2arc_trim_ahead ZFS module variable when the adjusted write size exceeds the size of the L2ARC device. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14936 Closes #14954	2023-06-09 17:05:47 -07:00
Alexander Motin	70ea484e3e	Finally drop long disabled vdev cache. It was a vdev level read cache, designed to aggregate many small reads by speculatively issuing bigger reads instead and caching the result. But since it has almost no idea about what is going on with exception of ZIO_FLAG_DONT_CACHE flag set by higher layers, it was found to make more harm than good, for which reason it was disabled for the past 12 years. These days we have much better instruments to enlarge the I/Os, such as speculative and prescient prefetches, I/O scheduler, I/O aggregation etc. Besides just the dead code removal this removes one extra mutex lock/unlock per write inside vdev_cache_write(), not otherwise disabled and trying to do some work. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14953	2023-06-09 12:40:55 -07:00
Brian Behlendorf	6db4ed51d6	ZTS: Skip checkpoint_discard_busy Until the ASSERT which is occasionally hit while running checkpoint_discard_busy is resolved skip this test case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #12053 Closes #14952	2023-06-09 11:10:01 -07:00
Alexander Motin	90ccfd426d	Improve l2arc reporting in arc_summary. - Do not report L2ARC as FAULTED in presence of in-flight writes. - Report read and write I/Os, bytes and errors. - Remove few numbers not important to average user. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #12304 Closes #14946	2023-06-09 10:14:05 -07:00
Alexander Motin	b3ad3f48d9	Use list_remove_head() where possible. ... instead of list_head() + list_remove(). On FreeBSD the list functions are not inlined, so in addition to more compact code this also saves another function call. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14955	2023-06-09 10:12:52 -07:00
Alexander Motin	55b1842f92	ZIL: Fix race introduced by `f63811f072`. We are not allowed to access lwb after setting LWB_STATE_FLUSH_DONE state and dropping zl_lock, since it may be freed by zil_sync(). To free itxs and waiters after dropping the lock we need to move lwb_itxs and lwb_waiters lists elements to local storage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14957 Closes #14959	2023-06-09 10:08:05 -07:00
Rich Ercolani	6c96269024	Revert "systemd: Use non-absolute paths in Exec* lines" This reverts commit `79b20949b2` since it doesn't work with the systemd version shipped with RHEL7-based systems. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14943 Closes #14945	2023-06-07 11:14:05 -07:00
Brian Behlendorf	93f8abeff0	Linux: Never sleep in kmem_cache_alloc(..., KM_NOSLEEP) (#14926) When a kmem cache is exhausted and needs to be expanded a new slab is allocated. KM_SLEEP callers can block and wait for the allocation, but KM_NOSLEEP callers were incorrectly allowed to block as well. Resolve this by attempting an emergency allocation as a best effort. This may fail but that's fine since any KM_NOSLEEP consumer is required to handle an allocation failure. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2023-06-07 10:43:43 -07:00
George Amanakis	bcd5321039	Fix the L2ARC write size calculating logic l2arc_write_size() should return the write size after adjusting for trim and overhead of the L2ARC log blocks. Also take into account the allocated size of log blocks when deciding when to stop writing buffers to L2ARC. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14939	2023-06-06 12:32:37 -07:00
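
Illustration for commit 35a6247c5f (delayed thread teardown). The message describes threads being created and destroyed in a tight loop, and a configurable delay that stops an idle thread from being torn down the first time it is seen idle. The following is only a toy simulation of that idea in Python, not the taskq code: the class, its methods, and the `teardown_delay` parameter are hypothetical names chosen for this sketch, and time is a plain number supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    # Time at which the worker became idle; None while it is busy.
    idle_since: Optional[float] = None


class DelayedTeardownPool:
    """Toy worker pool (hypothetical) that reaps idle workers only after a delay."""

    def __init__(self, teardown_delay: float) -> None:
        self.teardown_delay = teardown_delay
        self.workers: List[Worker] = []
        self.created = 0
        self.destroyed = 0

    def submit(self) -> None:
        """Give a task to an idle worker, creating a worker only if none is idle."""
        for worker in self.workers:
            if worker.idle_since is not None:
                worker.idle_since = None
                return
        self.workers.append(Worker())
        self.created += 1

    def finish_all(self, now: float) -> None:
        """Mark every busy worker as idle starting at time `now`."""
        for worker in self.workers:
            if worker.idle_since is None:
                worker.idle_since = now

    def reap(self, now: float) -> None:
        """Destroy workers that have been idle for at least `teardown_delay`."""
        survivors = []
        for worker in self.workers:
            idle = worker.idle_since is not None
            if idle and now - worker.idle_since >= self.teardown_delay:
                self.destroyed += 1
            else:
                survivors.append(worker)
        self.workers = survivors


def simulate(teardown_delay: float, bursts: int = 10, period: float = 1.0):
    """Run one short task every `period` time units; return (created, destroyed)."""
    pool = DelayedTeardownPool(teardown_delay)
    for i in range(bursts):
        now = i * period
        pool.reap(now)
        pool.submit()
        pool.finish_all(now)
        pool.reap(now)
    return pool.created, pool.destroyed
```

With `teardown_delay=0.0` the toy pool creates and destroys one worker per burst; with a delay longer than the gap between bursts, a single worker is created once and reused. This mirrors the trade-off in the message: more threads alive at steady state, far less create/destroy churn.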
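
Illustration for commit ac2420afb0 (group ownership of new files). The message contrasts System V semantics (new file gets the creating process's effective GID) with BSD semantics (new file gets the parent directory's GID). On Linux, BSD semantics normally apply only when the parent directory has the set-group-ID bit. The sketch below models that decision in Python; `new_file_gid` and its `force_bsd` parameter are hypothetical and stand in for "the dataset uses NFSv4 ACLs", they are not ZFS functions.

```python
import stat


def new_file_gid(parent_gid: int, parent_mode: int,
                 process_egid: int, force_bsd: bool) -> int:
    """Pick the group for a newly created file (illustrative model only).

    force_bsd is a hypothetical flag: when True, the parent directory's
    group is always inherited, without requiring the SGID bit on the parent.
    """
    if force_bsd or (parent_mode & stat.S_ISGID):
        # BSD semantics: inherit the group of the containing directory.
        return parent_gid
    # System V semantics: use the effective GID of the creating process.
    return process_egid
```

Because the choice is made from the flag rather than from the directory mode, the model never needs to set `S_ISGID` on a directory to get inheritance, which matches the message's note about not setting the SGID bit unless it was explicitly requested.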


8935 Commits