Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Andrea Righi	cb28c0b770	Linux 6.5 compat: safe cleanup in spl_proc_fini() If we fail to create a proc entry in spl_proc_init() we may end up calling unregister_sysctl_table() twice: one in the failure path of spl_proc_init() and another time during spl_proc_fini(). Avoid the double call to unregister_sysctl_table() and while at it refactor the code a bit to reduce code duplication. This was accidentally introduced when the spl code was updated for Linux 6.5 compatibility. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Closes #15234 Closes #15235	2023-09-11 16:34:17 -07:00
Coleman Kane	c74a17a498	Linux 6.5 compat: Use copy_splice_read instead of filemap_splice_read Using the filemap_splice_read function for the splice_read handler was leading to occasional data corruption under certain circumstances. Favor using copy_splice_read instead, which does not demonstrate the same erroneous behavior under the tested failure cases. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15164	2023-09-11 16:34:12 -07:00
Coleman Kane	9b7f7f02e9	Linux 6.5 compat: replace generic_file_splice_read with filemap_splice_read The generic_file_splice_read function was removed in Linux 6.5 in favor of filemap_splice_read. Add an autoconf test for filemap_splice_read and use it if it is found as the handler for .splice_read in the file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This change removes the ITER_* macros that OpenZFS doesn't use from being tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was causing the test to fail, which also affected the code responsible for setting the .splice_read handler, above. That behavior caused run-time panics on Linux 6.5. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15155	2023-09-11 16:34:01 -07:00
Coleman Kane	cb115edfc6	Linux 6.5 compat: register_sysctl_table removed Additionally, the .child element of ctl_table has been removed in 6.5. This change adds a new test for the pre-6.5 register_sysctl_table() function, and uses the old code in that case. If it isn't found, then the parentage entries in the tables are removed, and the register_sysctl call is provided the paths of "kernel/spl", "kernel/spl/kmem", and "kernel/spl/kstat" directly, to populate each subdirectory over three calls, as is the new API. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15138	2023-09-11 16:33:55 -07:00
Brian Atkinson	0ee7a08627	Revert "Linux 6.5 compat: register_sysctl_table removed" This reverts commit `b35374fd64` as there are error messages when loading the SPL module. Errors seemed to be tied to duplicate a duplicate entry. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #15134	2023-09-11 16:33:43 -07:00
Coleman Kane	5ee79af41f	Linux 4.20 compat: wrapper function for iov_iter type access An iov_iter_type() function to access the "type" member of the struct iov_iter was added at one point. Move the conditional logic to decide which method to use for accessing it into a macro and simplify the zpl_uio_init code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15100	2023-09-11 16:20:50 -07:00
Coleman Kane	feb0fa6b38	Linux 6.4 compat: iter_iov() function now used to get old iov member The iov_iter->iov member is now iov_iter->__iov and must be accessed via the accessor function iter_iov(). Create a wrapper that is conditionally compiled to use the access method appropriate for the target kernel version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15100	2023-09-11 16:20:42 -07:00
Coleman Kane	7c0618bdb7	Linux 6.5 compat: blkdev changes Multiple changes to the blkdev API were introduced in Linux 6.5. This includes passing (void* holder) to blkdev_put, adding a new blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type that replaces uses of fmode_t, and removing an argument from the release handler on block_device_operations that we weren't using. The open function definition has also changed to take gendisk* and blk_mode_t, so update it accordingly, too. Implement local wrappers for blkdev_get_by_path() and vdev_blkdev_put() so that the in-line calls are cleaner, and place the conditionally-compiled implementation details inside of both of these local wrappers. Both calls are exclusively used within vdev_disk.c, at this time. Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE The wrapper function is now used for testing using the appropriate method for the kernel, whether the open mode is writable or not. Emphasize fmode_t arg in zvol_release is not used Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15099	2023-09-11 16:20:26 -07:00
Coleman Kane	211868b5d0	Linux 6.5 compat: register_sysctl_table removed Additionally, the .child element of ctl_table has been removed in 6.5. This change adds a new test for the pre-6.5 register_sysctl_table() function, and uses the old code in that case. If it isn't found, then the parentage entries in the tables are removed, and the register_sysctl call is provided the paths of "kernel/spl", "kernel/spl/kmem", and "kernel/spl/kstat" directly, to populate each subdirectory over three calls, as is the new API. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15098	2023-09-11 15:18:05 -07:00
Brian Behlendorf	837e426c1f	Linux: Never sleep in kmem_cache_alloc(..., KM_NOSLEEP) (#14926 ) When a kmem cache is exhausted and needs to be expanded a new slab is allocated. KM_SLEEP callers can block and wait for the allocation, but KM_NOSLEEP callers were incorrectly allowed to block as well. Resolve this by attempting an emergency allocation as a best effort. This may fail but that's fine since any KM_NOSLEEP consumer is required to handle an allocation failure. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2023-09-11 15:17:41 -07:00
Rich Ercolani	426d07d64c	quick fix for lingering snapdir unmount problems Unfortunately, even after `e79b6807`, I still, much more rarely, tripped asserts when playing with many ctldir mounts at once. Since this appears to happen if we dispatched twice too fast, just ignore it. We don't actually need to do anything if someone already started doing it for us. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14462	2023-08-24 17:16:05 -07:00
Rich Ercolani	692f78045e	Workaround issue cleaning up automounted snapshots on Linux On Linux, sometimes, when ZFS goes to unmount an automounted snap, it fails a VERIFY check on debug builds, because taskq_cancel_id returned ENOENT after not finding the taskq it was trying to cancel. This presumably happens when it already died for some reason; in this case, we don't really mind it already being dead, since we're just going to dispatch a new task to unmount it right after. So we just ignore it if we get back ENOENT trying to cancel here, retry a couple times if we get back the only other possible condition (EBUSY), and log to dbgmsg if we got anything but ENOENT or success. (We also add some locking around taskqid, to avoid one or two cases of two instances of trying to cancel something at once.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #11632 Closes #12670	2023-08-24 17:16:05 -07:00
szubersk	dbbc2f9688	Fix Clang 15 compilation errors - Clang 15 doesn't support `-fno-ipa-sra` anymore. Do a separate check for `-fno-ipa-sra` support by $KERNEL_CC. - Don't enable `-mgeneral-regs-only` for certain module files. Fix #13260 - Scope `GCC diagnostic ignored` statements to GCC only. Clang doesn't need them to compile the code. Porting notes: - Moved the stanzas removing -mgeneral-regs-only to Makefile.in since they wouldn't readily work in Kbuild.in and that did. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #13260 Closes #14150 Closes #14624 Ported-by: Rich Ercolani <rincebrain@gmail.com Signed-off-by: Rich Ercolani <rincebrain@gmail.com>	2023-06-05 18:25:57 -07:00
youzhongyang	5f125e9012	Linux 6.4 compat: reclaimed_slab renamed to reclaimed Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #14891	2023-06-05 10:59:02 -07:00
youzhongyang	f0aca5f7bb	Linux 6.3 compat: idmapped mount API changes Linux kernel 6.3 changed a bunch of APIs to use the dedicated idmap type for mounts (struct mnt_idmap), we need to detect these changes and make zfs work with the new APIs. NOTE: This backport only includes the configure checks to detect the 6.3 idmap API changes. It does not include support for idmap. When provided the idmap variable is ignored in most case in the same way the user_ns argument was ignored. This change is solely to provide compatibility with the new interfaces. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #14682	2023-06-05 10:59:02 -07:00
youzhongyang	04305bbd18	Linux 6.3 compat: writepage_t first arg struct folio* The type def of writepage_t in kernel 6.3 is changed to take struct folio* as the first argument. We need to detect this change and pass correct function to write_cache_pages(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #14699	2023-06-05 10:59:02 -07:00
Brian Behlendorf	3ad6c1692f	Linux: use filemap_range_has_page() As of the 4.13 kernel filemap_range_has_page() can be used to check if there is a page mapped in a given file range. When available this interface should be used which eliminates the need for the zp->z_is_mapped boolean. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14493	2023-06-05 10:59:02 -07:00
Shaan Nobee	9e5a297de6	Speed up WB_SYNC_NONE when a WB_SYNC_ALL occurs simultaneously Page writebacks with WB_SYNC_NONE can take several seconds to complete since they wait for the transaction group to close before being committed. This is usually not a problem since the caller does not need to wait. However, if we're simultaneously doing a writeback with WB_SYNC_ALL (e.g via msync), the latter can block for several seconds (up to zfs_txg_timeout) due to the active WB_SYNC_NONE writeback since it needs to wait for the transaction to complete and the PG_writeback bit to be cleared. This commit deals with 2 cases: - No page writeback is active. A WB_SYNC_ALL page writeback starts and even completes. But when it's about to check if the PG_writeback bit has been cleared, another writeback with WB_SYNC_NONE starts. The sync page writeback ends up waiting for the non-sync page writeback to complete. - A page writeback with WB_SYNC_NONE is already active when a WB_SYNC_ALL writeback starts. The WB_SYNC_ALL writeback ends up waiting for the WB_SYNC_NONE writeback. The fix works by carefully keeping track of active sync/non-sync writebacks and committing when beneficial. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shaan Nobee <sniper111@gmail.com> Closes #12662 Closes #12790	2023-06-05 10:59:02 -07:00
Mateusz Guzik	07a2ba541d	FreeBSD: add missing vop_fplookup assignments It became illegal to not have them as of 5f6df177758b9dff88e4b6069aeb2359e8b0c493 ("vfs: validate that vop vectors provide all or none fplookup vops") upstream. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #14788	2023-05-30 08:53:21 -07:00
rob-wing	f786232b2a	FreeBSD: don't verify recycled vnode for zfs control directory Under certain loads, the following panic is hit: panic: page fault KDB: stack backtrace: #0 0xffffffff805db025 at kdb_backtrace+0x65 #1 0xffffffff8058e86f at vpanic+0x17f #2 0xffffffff8058e6e3 at panic+0x43 #3 0xffffffff808adc15 at trap_fatal+0x385 #4 0xffffffff808adc6f at trap_pfault+0x4f #5 0xffffffff80886da8 at calltrap+0x8 #6 0xffffffff80669186 at vgonel+0x186 #7 0xffffffff80669841 at vgone+0x31 #8 0xffffffff8065806d at vfs_hash_insert+0x26d #9 0xffffffff81a39069 at sfs_vgetx+0x149 #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #11 0xffffffff8065a28c at lookup+0x45c #12 0xffffffff806594b9 at namei+0x259 #13 0xffffffff80676a33 at kern_statat+0xf3 #14 0xffffffff8067712f at sys_fstatat+0x2f #15 0xffffffff808ae50c at amd64_syscall+0x10c #16 0xffffffff808876bb at fast_syscall_common+0xf8 The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency. After adding the necessary vop, the bug progresses to the following panic: panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1) cpuid = 17 KDB: stack backtrace: #0 0xffffffff805e29c5 at kdb_backtrace+0x65 #1 0xffffffff8059620f at vpanic+0x17f #2 0xffffffff81a27f4a at spl_panic+0x3a #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40 #4 0xffffffff8066fdee at vinactivef+0xde #5 0xffffffff80670b8a at vgonel+0x1ea #6 0xffffffff806711e1 at vgone+0x31 #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d #8 0xffffffff81a39069 at sfs_vgetx+0x149 #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #10 0xffffffff80661c2c at lookup+0x45c #11 0xffffffff80660e59 at namei+0x259 #12 0xffffffff8067e3d3 at kern_statat+0xf3 #13 0xffffffff8067eacf at sys_fstatat+0x2f #14 0xffffffff808b5ecc at amd64_syscall+0x10c #15 0xffffffff8088f07b at fast_syscall_common+0xf8 This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion. FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700 Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Submitted-by: Klara, Inc. Sponsored-by: rsync.net Closes #14501	2023-05-30 08:53:21 -07:00
Brian Behlendorf	45c4b3e680	Fix checkstyle warning Resolve a missed checkstyle warning. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14799	2023-05-30 08:53:21 -07:00
Mateusz Guzik	092021ba39	FreeBSD: add missing vn state transition for .zfs Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #14774	2023-05-30 08:53:21 -07:00
Mateusz Guzik	aef1324d59	FreeBSD: fix up EINVAL from getdirentries on .zfs Without the change: /.zfs /.zfs/snapshot find: /.zfs: Invalid argument Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #14774	2023-05-30 08:53:21 -07:00
Dimitry Andric	d1e05c6856	FreeBSD: make zfs_vfs_held() definition consistent with declaration Noticed while attempting to change FreeBSD's boolean_t into an actual bool: in include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to return a boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is defined to return an int. Make the definition match the declaration. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Dimitry Andric <dimitry@andric.com> Closes #14776	2023-05-30 08:53:21 -07:00
Brian Behlendorf	6ec3abcb59	Use vmem_zalloc to silence allocation warning The kmem allocation in zfs_prune_aliases() will trigger a large allocation warning on systems with 64K pages. Resolve this by switching to vmem_alloc() which internally uses kvmalloc() so the right allocator will be used based on the allocation size. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8491 Closes #14694	2023-05-26 10:10:09 -07:00
Richard Yao	4a5950a129	Linux: zfs_fillpage() should handle partial pages from end of file After `89cd2197b9` was merged, Clang's static analyzer began complaining about a dead assignment in `zfs_fillpage()`. Upon inspection, I noticed that the dead assignment was because we are not using the calculated io_len that we should use to avoid asking the DMU to read past the end of a file. This should result in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`. This issue predates `89cd2197b9`, but its simplification of zfs_fillpage() eliminated the only use of the assignment to io_len, which made Clang's static analyzer complain about the issue. Also, as a precaution, we add an assertion that io_offset < i_size. If this ever fails, bad things will happen. Otherwise, we are blindly trusting the kernel not to give us invalid offsets. We continue to blindly trust it on non-debug kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14534	2023-04-21 13:12:35 -07:00
Brian Behlendorf	c7db374ac6	Fix buffered/direct/mmap I/O race When a page is faulted in for memory mapped I/O the page lock may be dropped before it has been read and marked up to date. If a buffered read encounters such a page in mappedread() it must wait until the page has been updated. Failure to do so will result in a panic on debug builds and incorrect data on production builds. The critical part of this change is in mappedread() where pages which are not up to date are now handled. Additionally, it includes the following simplifications. - zfs_getpage() and zfs_fillpage() could be passed an array of pages. This could be more efficient if it was used but in practice only a single page was ever provided. These interfaces were simplified to acknowledge that. - update_pages() was modified to correctly set the PG_error bit on a page when it cannot be read by dmu_read(). - Setting PG_error and PG_uptodate was moved to zfs_fillpage() from zpl_readpage_common(). This is consistent with the handling in update_pages() and mappedread(). - Minor additional refactoring to comments and variable declarations to improve readability. - Add a test case to exercise concurrent buffered, direct, and mmap IO to the same file. - Reduce the mmap_sync test case default run time. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13608 Closes #14498	2023-04-21 13:12:35 -07:00
Brian Behlendorf	164d184ed9	Additional limits on hole reporting Holding the zp->z_rangelock as a RL_READER over the range 0-UINT64_MAX is sufficient to prevent the dnode from being re-dirtied by concurrent writers. To avoid potentially looping multiple times for external caller which do not take the rangelock holes are not reported after the first sync. While not optimal this is always functionally correct. This change adds the missing rangelock calls on FreeBSD to zvol_cdev_ioctl(). Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14512 Closes #14641	2023-03-29 10:40:49 -07:00
Ameer Hamza	bd9a9a4e1a	zed: mark disks as REMOVED when they are removed ZED does not take any action for disk removal events if there is no spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED on removal event. This means that if you are running zed and remove a disk, it will be propertly marked as REMOVED. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-03-27 11:32:09 -07:00
Alexander Motin	5219a2691e	FreeBSD: Remove extra arc_reduce_target_size() call Remove arc_reduce_target_size() call from arc_prune_task(). The idea of arc_prune_task() is to remove external references on ARC metadata, such as vnodes. Since arc_prune_async() is called only from ARC itself, it makes no sense to create a parasitic loop between ARC eviction and the pruning, treatening to drop ARC to its minimum. I can't guess why it was added as part of FreeBSD to OpenZFS integration. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14639	2023-03-24 12:27:52 -05:00
naivekun	345f8beb58	QAT: Fix uninitialized seed in QAT compression CpaDcRqResults have to be initialized with checksum=1 for adler32. Otherwise when error CPA_DC_OVERFLOW occurred, the next compress operation will continue on previously part-compressed data, and write invalid checksum data. When zfs decompress the compressed data, a invalid checksum will occurred and lead to #14463 Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Weigang Li <weigang.li@intel.com> Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com> Signed-off-by: naivekun <naivekun0817@gmail.com> Closes #14632 Closes #14463	2023-03-17 11:09:07 -07:00
Mateusz Piotrowski	576d34cb11	Turn default_bs and default_ibs into ZFS_MODULE_PARAMs The default_bs and default_ibs tunables control the default block size and indirect block size. So far, default_bs and default_ibs were tunable only on FreeBSD, e.g., sysctl vfs.zfs.default_ibs Remove the FreeBSD-specific sysctl code and expose default_bs and default_ibs as tunables on both Linux and FreeBSD using ZFS_MODULE_PARAM. One of the use cases for changing the values of those tunables is to lower the indirect block size, which may improve performance of large directories (as discussed during the OpenZFS Leadership Meeting on 2022-08-16). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Closes #14293	2023-03-07 13:58:36 -08:00
Alexander Motin	2098a00318	Remove atomics from zh_refcount It is protected by z_hold_locks, so we do not need more serialization, simple integer math should be fine. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #14196	2023-03-02 14:37:07 -08:00
Allan Jude	e45a981f6d	Avoid a null pointer dereference in zfs_mount() on FreeBSD When mounting the root filesystem, vfs_t->mnt_vnodecovered is null This will cause zfsctl_is_node() to dereference a null pointer when mounting, or updating the mount flags, on the root filesystem, both of which happen during the boot process. Reported-by: Martin Matuska <mm@FreeBSD.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #14218	2023-02-06 10:40:16 -08:00
Allan Jude	5161e5d8a4	Allow mounting snapshots in .zfs/snapshot as a regular user Rather than doing a terrible credential swapping hack, we just check that the thing being mounted is a snapshot, and the mountpoint is the zfsctl directory, then we allow it. If the mount attempt is from inside a jail, on an unjailed dataset (mounted from the host, not by the jail), the ability to mount the snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Modirum MDPay Sponsored-by: Klara Inc. Closes #13758	2023-02-06 10:40:16 -08:00
Coleman Kane	232fc23c6e	linux 6.2 compat: zpl_set_acl arg2 is now struct dentry Linux 6.2 changes the second argument of the set_acl operation to be a "struct dentry " rather than a "struct inode ". The inode* parameter is still available as dentry->d_inode, so adjust the call to the _impl function call to dereference and pass that pointer to it. Also document that the get_acl -> get_inode_acl member name change from commit `884a693` was an API change also introduced in Linux 6.2. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #14415	2023-01-24 15:36:08 -08:00
Mateusz Guzik	be697f4339	FreeBSD: catch up to 1400077 Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #14328	2023-01-19 12:50:42 -08:00
Doug Rabson	70b1b5bb98	FreeBSD: Remove stray debug printf Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Doug Rabson <dfr@rabson.org> Closes #14286 Closes #14287	2023-01-19 12:50:36 -08:00
Allan Jude	6219190d7f	Restrict visibility of per-dataset kstats inside FreeBSD jails When inside a jail, visibility on datasets not "jailed" to the jail is restricted. However, it was possible to enumerate all datasets in the pool by looking at the kstats sysctl MIB. Only the kstats corresponding to datasets that the user has visibility on are accessible now. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #14254	2023-01-19 12:50:36 -08:00
Richard Yao	572114d846	FreeBSD: zfs_register_callbacks() must implement error check correctly I read the following article and noticed a couple of ZFS bugs mentioned: https://pvs-studio.com/en/blog/posts/cpp/0377/ I decided to search for them in the modern OpenZFS codebase and then found one that matched the description of the first one: V593 Consider reviewing the expression of the 'A = B != C' kind. The expression is calculated as following: 'A = (B != C)'. zfs_vfsops.c 498 The consequence of this is that the error value is replaced with `1` when there is an error. When there is no error, 0 is correctly passed. This is a very minor issue that is unlikely to cause any real problems. The incorrect error code would either be returned to the mount command on a failure or any of `zfs receive`, `zfs recv`, `zfs rollback` or `zfs upgrade`. The second one has already been fixed. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14261	2023-01-19 12:50:36 -08:00
Coleman Kane	b586ea5d93	linux 6.2 compat: get_acl() got moved to get_inode_acl() in 6.2 Linux 6.2 renamed the get_acl() operation to get_inode_acl() in the inode_operations struct. This should fix Issue #14323. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #14323 Closes #14331	2023-01-10 08:43:49 -08:00
Antonio Russo	138d2b29dd	Linux 6.1 compat: open inside tmpfile() commit `d27c81847b` upstream Linux 863f144 modified the .tmpfile interface to pass a struct file, rather than a struct dentry, and expect the tmpfile implementation to open inside of tmpfile(). This patch implements a configuration test that checks for this new API and appropriately sets a HAVE_TMPFILE_DENTRY flag that tracks this old API. Contingent on this flag, the appropriate API is implemented. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #14301 Closes #14343	2023-01-09 17:15:22 -08:00
Ameer Hamza	75fbe7eb99	skip permission checks for extended attributes zfs_zaccess_trivial() calls the generic_permission() to read xattr attributes. This causes deadlock if called from zpl_xattr_set_dir() context as xattr and the dent locks are already held in this scenario. This commit skips the permissions checks for extended attributes since the Linux VFS stack already checks it before passing us the control. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2023-01-05 11:10:28 -08:00
Ryan Moeller	fbbc375d43	FreeBSD: Fix potential boot panic with bad label vdev_geom_read_pool_label() can leave NULL in configs. Check for it and skip consistently when generating rootconf. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #14291 (cherry picked from commit `dc8c2f6158`)	2023-01-05 11:00:09 -08:00
Damian Szuberski	3d1e808096	Fix clang 13 compilation errors ``` os/linux/zfs/zvol_os.c:1111:3: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result] add_disk(zv->zv_zso->zvo_disk); ^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ zpl_xattr.c:1579:1: warning: no previous prototype for function 'zpl_posix_acl_release_impl' [-Wmissing-prototypes] ``` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #13551 (cherry picked from commit `9884319666`)	2022-12-01 12:39:44 -08:00
Richard Yao	957c3776f2	FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy() There is an off by 1 error in the check. Fortunately, this function does not appear to be used in kernel space, despite being compiled as part of the kernel module. However, it is used in userspace. Callers of lzc_ioctl_fd() likely will crash if they attempt to use the unimplemented request number. This was reported by FreeBSD's coverity scan. Reported-by: Coverity (CID 1432059) Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14135	2022-12-01 12:39:43 -08:00
Serapheim Dimitropoulos	85537f77a3	Expose zfs_vdev_open_timeout_ms as a tunable Some of our customers have been occasionally hitting zfs import failures in Linux because udevd doesn't create the by-id symbolic links in time for zpool import to use them. The main issue is that the systemd-udev-settle.service that zfs-import-cache.service and other services depend on is racy. There is also an openzfs issue filed (see https://github.com/openzfs/zfs/issues/10891) outlining the problem and potential solutions. With the proper solutions being significant in terms of complexity and the priority of the issue being low for the time being, this patch exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are experiencing this issue often can increase it as a workaround. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #14133	2022-12-01 12:39:43 -08:00
Pavel Snajdr	52e658edd7	Remove zpl_revalidate: fix snapshot rollback Open files, which aren't present in the snapshot, which is being roll-backed to, need to disappear from the visible VFS image of the dataset. Kernel provides d_drop function to drop invalid entry from the dcache, but inode can be referenced by dentry multiple dentries. The introduced zpl_d_drop_aliases function walks and invalidates all aliases of an inode. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #9600 Closes #14070	2022-12-01 12:39:42 -08:00
Richard Yao	c6d93d0a80	FreeBSD: Fix uninitialized pointer read in spa_import_rootpool() The FreeBSD project's coverity scans found this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13923	2022-12-01 12:39:40 -08:00
Richard Yao	9f1691a964	Linux: Fix use-after-free in zfsvfs_create() Coverity reported that we pass a pointer to zfsvfs to `dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after a failure in zfsvfs_init(). We have nearly identical duplicate versions of this code for FreeBSD and Linux, but interestingly, the FreeBSD version of this code differs in such a way that it does not suffer from this bug. We remove the difference from the FreeBSD version to fix this bug. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13883	2022-12-01 12:39:40 -08:00

1 2 3 4 5 ...

513 Commits