Archive-Team/zfs

Commit Graph

Author	SHA1	Message	Date
Rob Norris	1feddf1ed6	vdev_disk: reorganise vdev_disk_io_start Light reshuffle to make it a bit more linear to read and get rid of a bunch of args that aren't needed in all cases. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. (cherry picked from commit ad847ff6acb77fbba0f3ab2e864784225fd41007)	2024-04-03 10:10:04 +11:00
Rob Norris	d00ab549d4	vdev_disk: rename existing functions to vdev_classic_* This is just renaming the existing functions we're about to replace and grouping them together to make the next commits easier to follow. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. (cherry picked from commit 9bf6a7c8c3bdcc4e5975fa5baf6e9ff6f279a553)	2024-04-03 10:09:57 +11:00
Rob Norris	edf4cf0ce7	abd: add page iterator The regular ABD iterators yield data buffers, so they have to map and unmap pages into kernel memory. If the caller only wants to count chunks, or can use page pointers directly, then the map/unmap is just unnecessary overhead. This adds abd_iterate_page_func, which yields unmapped struct page instead. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. (cherry picked from commit 930b785c61e9724f0a3a0e09571032ed397f368c)	2024-04-03 10:09:53 +11:00
Richard Yao	609550bc5f	Convert enum zio_flag to uint64_t We ran out of space in enum zio_flag for additional flags. Rather than introduce enum zio_flag2 and then modify a bunch of functions to take a second flags variable, we expand the type to 64 bits via `typedef uint64_t zio_flag_t`. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Co-authored-by: Richard Yao <richard.yao@klarasystems.com> Closes #14086	2023-09-07 18:54:12 +00:00
Rob Norris	d6d343bb87	Revert debug code from bio-seg-limit This patch reverts parts of bf08bc108dce6f1ecd0820c8b5a67b6fb7962c7e.	2023-08-10 19:10:59 +00:00
Rob Norris	bf18873541	bio-seg-limit debug Various bits of output for catching broken bios. (cherry picked from commit b1a5bc49acce3cbec56f3bf0638539f836aa2208) Signed-off-by: Allan Jude <allan@klarasystems.com>	2023-07-31 15:05:56 +00:00
Rob Norris	74e8091130	abd_bio_map_off: avoid splitting scatter pages across bios This is the same change as the previous commit, but for scatter abds. It's less clear if this change is needed. Since scatter abds are only ever added a page at a time, both sides of the split should always be added in consecutive segments. Intuitively though, it may be possible for a partially-filled bio to be used, or a bio with an odd number of iovecs, and that then could lead to a misaligned bio. While I've not been able to reproduce this at all, it seems to make sense to protect against it. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit cbdf21fd1a32a5e696a22cad497d9211221fa309)	2023-07-31 15:05:56 +00:00
Rob Norris	1bd475d6c6	bio_map: avoid splitting abd pages across bios If we encounter a split page, we add two iovecs to the bio, one for the fragment of the buffer on each side of the split. In order to do this safely, we must be sure that we always have room for both fragments. It's possible for a linear abd to have multiple pages, in which case we want to add the "left" fragment, then a run of proper 4K pages, then the "right" fragment. In this way we can keep whole pages together as much as possible. This change handles both cases by noticing a split page. If we don't have at least two iovecs remaining in the bio, then we abort outright (allowing the caller to allocate a new bio and try again). We add the "left" fragment, and note how big we expect the right fragment to be. Then we load in as many full pages as are available. When we reach the last iovec, we close out the bio by taking as much as is necessary to restore alignment. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit 173cafcc3d8b6c94c61844c705d7a410f412a18e)	2023-07-31 15:05:56 +00:00
Rob Norris	98bcc390e8	vdev_disk: rework bio max segment calculation A single "page" in an ABD does not necessarily correspond to one segment in a bio, because of how ZFS does ABD allocations and how it breaks them up when adding them to a bio. Because of this, simply dividing the ABD size by the page size can only ever give a minimum number of segments required, rather than the correct number. Until we can fix that, we'll just make each bio as large as it can be, for as many segments as the device queue will permit without needing to split the bio. This is a little wasteful if we don't intend to put that many segments in the bio, but it's not a lot of memory and it's only lost until the bio is completed. This also adds a tuneable, vdev_disk_max_segs, to allow this value to be set by the operator. This is very useful for debugging. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit a3a438d1bedb0626417cd73ba10b1479a06bef7f)	2023-07-31 15:05:56 +00:00
Allan Jude	066532da51	Add module parameter to block 0 byte writes Some hardware has issues when issued a write of 0 bytes. Add a new module parameter, zio_suppress_zero_writes, that when enabled (default) will just complete these I/Os without sending them to the hardware. Signed-off-by: Allan Jude <allan@klarasystems.com>	2023-07-05 13:27:31 +00:00
Allan Jude	2284c4d200	Add module parameter to block 0 byte writes Some hardware has issues when issued a write of 0 bytes. Add a new module parameter, zio_suppress_zero_writes, that when enabled (default) will just complete these I/Os without sending them to the hardware. Signed-off-by: Allan Jude <allan@klarasystems.com>	2023-07-05 13:27:31 +00:00
Rob Norris	2724bcb3d6	zil: allow the ZIL to fail and restart independently of the pool zil_commit() has always returned void, and thus, cannot fail. Everything inside it assumed that if anything ever went wrong, it could fall back on txg_wait_synced() until the txg covering the operations being flushed from the ZIL has fully committed. This meant that if the pool failed and failmode=continue was set, syncing operations like fsync() would still block. Unblocking zil_commit() means largely the same approach. The difficulty is that the ZIL carries the record of uncommitted VFS operations (vs the changed data), and attached to those, callbacks and cvs that will release userspace callers once the data is on disk. So if we can't write the ZIL, we also can't release those records until the data is on disk. This wasn't a problem before, because the zil_commit() would block. If we change zil_commit() to return error, we still need to track those entries until the data they represent hits the disk. We also need to accept new records; just because the ZIL fails may not necessarily mean the pool itself is unavailable. This commit reorganises the ZIL to allow zil_commit() to return failure. If ZIL writes or flushes fail, the ZIL is moved into a "failed" state, and no further writes are done; all zil_commit() calls are serviced by the regular txg mechanism. Outstanding records (itx_ts) are held until the main pool writes their associated txg out. The records are then released. Once all records are cleared, the ZIL is reset and reopened. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit af821006f6602261e690fe6635689cabdeefcadf)	2023-07-05 13:27:31 +00:00
Rob Norris	7b7af8ba02	vnops: thread DMU_TX_ASSIGN_CONTINUE to a bunch of vnops These are ones that I'm reasonably sure connect to a real syscall and have a reasonable error response. I've left stuff like `dirty_inode`, `zfs_inactive`, etc, which are internal kernel housekeeping things, as well as anything that looks like it belongs to zvols, ioctls, admin commands, etc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit 39c2801c611e27b521d716fea8f771307820362e)	2023-07-05 13:27:30 +00:00
Rob Norris	48a48059c7	dmu: rename dmu_tx_assign flags Their names clash with those for txg_wait_synced_tx, and they aren't directly compatible, leading to confusion. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit 1f0fb1dae7c1e84de3b39e669e09b8b3d5b80b87)	2023-07-05 13:27:30 +00:00
Rob Norris	3aea149bf8	linux: reject syncing ops if the filesystem is unmounting The kernel can call these during unmount, so we have to handle them directly to prevent any further IO being issued. zfs_fsync reorganised slightly to not set up zfs_fsyncer_key until after the teardown lock is acquired, just in case we don't get it. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> (cherry picked from commit 900c26570ddcdd1d3ca135e6aee5df6456f6bfd6)	2023-07-05 13:27:30 +00:00
Mariusz Zaborski	40a9efd0e8	zfs: support force exporting pools This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions. Implement various control methods to make this feasible: - txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely. - txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this. - spa_config_enter_flags now takes a NOSUSPEND flag to the same effect. - DMU objset initiator which may be set on an objset being forcibly exported / unmounted. - SPA export initiator may be set on a pool being forcibly exported. - DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked. - ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users. - metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out. - Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test #3 below to pass on Linux. - Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc. Add tests which cover the basic use cases: - Force export while a send is in progress - Force export while a recv is in progress - Force export while POSIX I/O is in progress This change modifies the libzfs ABI: - New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value. - New field libzfs_force_export for libzfs_handle. Signed-off-by: Will Andrews <will@firepipe.net> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #3461 (cherry picked from commit 852e633772217d779a63e8c46fe3c5f81dd8960e)	2023-07-05 13:27:30 +00:00
Brian Behlendorf	fec407fb69	Linux 5.19 compat: aops->read_folio() As of the Linux 5.19 kernel the readpage() address space operation has been replaced by read_folio(). Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13515	2022-06-01 14:24:49 -07:00
Brian Behlendorf	7ae5ea8864	Linux 5.19 compat: blkdev_issue_secure_erase() Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure erase functionality from the blkdev_issue_discard() function. The blkdev_issue_secure_erase() must now be issued to issue a secure erase. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13515	2022-06-01 14:24:49 -07:00
Brian Behlendorf	048301b6dc	Linux 5.19 compat: bdev_max_secure_erase_sectors() Linux 5.19 commit torvalds/linux@44abff2c0 removed the blk_queue_secure_erase() helper function. The preferred interface is to now use the bdev_max_secure_erase_sectors() function to check for discard support. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13515	2022-06-01 14:24:49 -07:00
Brian Behlendorf	9ce5eb18ef	Linux 5.19 compat: bdev_max_discard_sectors() Linux 5.19 commit torvalds/linux@70200574cc removed the blk_queue_discard() helper function. The preferred interface is to now use the bdev_max_discard_sectors() function to check for discard support. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13515	2022-06-01 14:24:49 -07:00
Brian Behlendorf	5a639f0802	Linux 5.18 compat: bio_alloc() As of the Linux 5.18 kernel bio_alloc() expects a block_device struct as an argument. This removes the need for the bio_set_dev() compatibility code for 5.18 and newer kernels. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13515	2022-06-01 14:24:49 -07:00
hping	b28c0c4bf8	abd_os: remove redundant refcount creation for abd_children Refcount creation for abd_zero_scatter->abd_children is redundant in abd_alloc_zero_scatter, as it has been done in abd_init_struct. In addition, abd_children is undefined when ZFS_DEBUG is disabled, the reference of abd_children in abd_alloc_zero_scatter breaks build of libzpool when ZFS_DEBUG is disabled. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Ping Huang <huangping@smartx.com> Closes #13429	2022-05-20 10:33:24 -07:00
Aidan Harris	eee389ba2e	Fix functions without a prototype clang-15 emits the following error message for functions without a prototype: fs/zfs/os/linux/spl/spl-kmem-cache.c:1423:27: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Aidan Harris <me@aidanharr.is> Closes #13421	2022-05-20 10:33:24 -07:00
Mateusz Guzik	2c5c8bb0a6	FreeBSD: use zero_region instead of allocating a dedicated page Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13406	2022-05-20 10:33:24 -07:00
szubersk	756c3e085b	autoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol A followup to `849c14e048` Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #13389	2022-05-20 10:33:24 -07:00
Damian Szuberski	13b1f336d3	PPC get_user workaround Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic() inline helpers very indirectly include a reference to the GPL'd array mmu_feature_keys[] and fail to build. Work around this by using copy_from_user() and throwing EFAULT for any calls to __copy_from_user_inatomic(). This is a workaround until a fix for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458 "powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Authored-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #11958 Closes #12590 Closes #13367	2022-05-20 10:33:24 -07:00
Brian Atkinson	60fc173251	Adding ZERO_PAGE detection On some architectures ZERO_PAGE is unavailable because it references a GPL exported symbol of empty_zero_page. Originally `e08b993` removed the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However, a simple check can be done to avoid a kernel allocation and free for the abd_zero_page if ZERO_PAGE is available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #13199	2022-05-20 10:33:24 -07:00
Ka Ho Ng	1f31889046	FreeBSD: Implement hole-punching support This adds support for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes #12458	2022-05-17 11:15:29 -07:00
наб	ecec151c14	module: zfs: freebsd: fix unused, remove argsused Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12844	2022-05-02 15:42:58 -07:00
наб	a4f582f0b6	FreeBSD: remove unused variable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Issue #12899	2022-05-02 15:42:58 -07:00
наб	b8e1366ee6	zvol: remove unused variable Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12917	2022-05-02 15:42:58 -07:00
Brian Behlendorf	49c1346c10	Linux 5.18 compat: replace __set_page_dirty_nobuffers Replace __set_page_dirty_nobuffers with filemap_dirty_folio. Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache ") Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Authored-by: Satadru Pramanik <satadru@gmail.com> Signed-off-by: Satadru Pramanik <satadru@gmail.com> Closes #13325 Closes #13380	2022-04-28 15:17:38 -07:00
Brian Behlendorf	71cd3726c0	Fix O_APPEND for Linux 3.15 and older kernels When using a Linux kernel which predates the iov_iter interface the O_APPEND flag should be applied in zpl_aio_write() via the call to generic_write_checks(). The updated pos variable was incorrectly ignored resulting in the current offset being used. This issue should only realistically impact the RHEL/CentOS 7.x kernels which are based on Linux 3.10. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13370 Closes #13377	2022-04-28 15:15:28 -07:00
наб	642426095a	Linux 5.18 compat: kobj_type.default_attrs replaced with default_groups Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject: kobj_type: remove default_attrs") Upstream-commit: `0cdda2edb3` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #13357	2022-04-25 10:00:09 -07:00
Alexander Motin	972637dc06	FreeBSD: Fix translation from ABD to physical pages. In the hypothetical case of a non-linear ABD with a single segment whose size is a multiple of the page size but which is not aligned to it, vdev_geom_fill_unmap_cb() could fill one page less into the bio_ma array. I am not sure it is exploitable, but better to be safe than sorry. Reported-by: Mark Johnston <markj@FreeBSD.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> (cherry picked from commit `5352f85cdd`)	2022-04-21 16:59:09 -07:00
Rich Ercolani	c220771a47	Corrected oversight in ZERO_RANGE behavior It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do have differing semantics in some ways - in particular, one requires KEEP_SIZE, and the other does not. Also added a zero-range test to catch this, corrected a flaw that made the punch-hole test succeed vacuously, and a typo in file_write. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #13329 Closes #13338	2022-04-21 16:58:07 -07:00
Brian Behlendorf	aa1c3c1d1d	Linux 5.17 compat: GENHD_FL_EXT_DEVT / GENHD_FL_NO_PART_SCAN As of the 5.17 kernel the GENHD_FL_EXT_DEVT flag has been removed and the GENHD_FL_NO_PART_SCAN flag renamed GENHD_FL_NO_PART. Update zvol_alloc() to set GENHD_FL_NO_PART for the newer kernels which is sufficient. The behavior for prior kernels remains unchanged. 1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT") 46e7eac6 ("block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART") Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13294 Closes #13297	2022-04-20 13:44:19 -07:00
Mark Johnston	b7546f92ea	FreeBSD: Return Mach error codes from VOP_(GET|PUT)PAGES FreeBSD's memory management system uses its own error numbers and gets confused when these VOPs return EIO. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reported-by: Peter Holm <pho@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #13311	2022-04-19 10:42:54 -07:00
Mark Johnston	e9cd90f6e5	FreeBSD: Parameterize ZFS_ENTER/ZFS_VERIFY_ZP with an error code For legacy reasons, a couple of VOPs have to return error numbers that don't come from the usual errno namespace. To handle the cases where ZFS_ENTER or ZFS_VERIFY_ZP fail, we need to be able to override the default error return value of EIO. Extend the macros to permit this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #13311	2022-04-19 10:42:54 -07:00
Riccardo Schirone	35ddd8ee2e	Linux 5.18 compat: use address_space_operations->readahead ->readpages was removed and replaced by ->readahead. Define zpl_readahead for kernels that don't have ->readpages. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Riccardo Schirone <rschirone91@gmail.com> Closes #13278	2022-04-06 13:15:27 -07:00
Riccardo Schirone	10a9f5fc47	Linux 5.18 compat: blkg_tryget is moved to private headers Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Riccardo Schirone <rschirone91@gmail.com> Closes #13278	2022-04-06 13:15:27 -07:00
наб	9f7f704507	Linux 5.18 compat: replace genhd.h with blkdev.h includes blkdev.h includes genhd.h since dawn of upstream git, so this is globally safe Upstream-commit: 322cbb50de711814c42fb088f6d31901502c711a ("block: remove genhd.h") Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #13251	2022-04-06 13:15:27 -07:00
наб	215a8255a9	Linux 5.18 compat: 4-argument bio_alloc() bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs) became bio_alloc(struct block_device *bdev, unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask) passing NULL/0 continues previous behaviour Upstream-commit: 07888c665b405b1cd3577ddebfeb74f4717a84c4 ("block: pass a block_device and opf to bio_alloc") Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #13251	2022-04-06 13:15:27 -07:00
Ryan Moeller	a5a28723bd	FreeBSD: Use NDFREE_PNBUF if available NDF_ONLY_PNBUF has been removed from FreeBSD in favor of NDFREE_PNBUF. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #13277	2022-04-06 10:29:53 -07:00
Brian Behlendorf	847d03060f	Fix ACL checks for NFS kernel server This PR changes ZFS ACL checks to evaluate fsuid / fsgid rather than euid / egid to avoid accidentally granting elevated permissions to NFS clients. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Andrew Walker <awalker@ixsystems.com> Co-authored-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #13221	2022-03-20 21:21:18 -07:00
Kyle Evans	421750672b	module: freebsd: avoid taking a destroyed lock in zfs_zevent bits At shutdown time, we drain all of the zevents and set the ZEVENT_SHUTDOWN flag. On FreeBSD, we may end up calling zfs_zevent_destroy() after the zevent_lock has been destroyed while the sysevent thread is winding down; we observe ESHUTDOWN, then back out. Events have already been drained, so just inline the kmem_free call in sysevent_worker() to avoid the race, and document the assumption that zfs_zevent_destroy doesn't do anything else useful at that point. This fixes a panic that can occur at module unload time. Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes #13220	2022-03-18 17:11:43 -07:00
Mateusz Guzik	275c756730	FreeBSD: add missing replay check to an assert in zfs_xvattr_set Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13219	2022-03-18 17:11:43 -07:00
Mark Johnston	b3427b18b1	zfs: Fix a deadlock between page busy and the teardown lock When rolling back a dataset, ZFS has to purge file data resident in the system page cache. To do this, it loops over all vnodes for the mountpoint and calls vn_pages_remove() to purge pages associated with the vnode's VM object. Each page is thus exclusively busied while the dataset's teardown write lock is held. When handling a page fault on a mapped ZFS file, FreeBSD's page fault handler busies newly allocated pages and then uses VOP_GETPAGES to fill them. The ZFS getpages VOP acquires the teardown read lock with vnode pages already busied. This represents a lock order reversal which can lead to deadlock. To break the deadlock, observe that zfs_rezget() need only purge those pages marked valid, and that pages busied by the page fault handler are, by definition, invalid. Furthermore, ZFS pages always transition from invalid to valid with the teardown lock held, and ZFS never creates partially valid pages. Thus, zfs_rezget() can use the new vn_pages_remove_valid() to skip over pages busied by the fault handler. PR: 258208 Tested by: pho Reviewed by: avg, sef, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32931 Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #12828	2022-03-04 15:37:41 -08:00
Alexander Motin	0e2bb1a3ee	Really zero the zero page While switching abd_zero_buf allocation KPI I've missed the fact that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc() does not. Add explicit bzero() after it. I don't think it should have caused real problems, but leaking one memory page content all over the pool is not good. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12569	2022-03-04 15:37:33 -08:00
Paul Dagnelie	7bd292e59b	Fix cpu hotplug atomic sleep issue We move the spinlock unlock before the thread creation. This should be safe because the thread creation code doesn't actually manipulate any taskq data structures; that's done by the thread once it's created. We also remove the assertion that the maxthreads is the current threads plus one; that assertion could fail if multiple hotplug events come in quick succession, and the first new taskq thread hasn't had a chance to start processing yet. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #12714	2022-02-15 21:52:45 -08:00
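
The vdev_disk commit 98bcc390e8 in the list above notes that dividing the ABD size by the page size can only give a minimum number of bio segments. A minimal userspace sketch of why that is a lower bound follows. The helper names `min_segments` and `pages_spanned` are hypothetical and do not come from the ZFS source, and rounding the division up is an assumption about what the commit means by "dividing". The point is only that a buffer starting partway into a page touches one more page than its length alone suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: segment count estimated from length alone.
 * This is only correct when the buffer starts on a page boundary.
 */
size_t
min_segments(size_t len, size_t page_size)
{
	assert(page_size > 0);
	return ((len + page_size - 1) / page_size);
}

/*
 * Hypothetical helper: number of distinct pages touched by the byte
 * range [offset, offset + len). Each touched page needs its own
 * segment in this simplified model.
 */
size_t
pages_spanned(uint64_t offset, size_t len, size_t page_size)
{
	assert(page_size > 0);
	if (len == 0)
		return (0);

	uint64_t first = offset / page_size;
	uint64_t last = (offset + len - 1) / page_size;

	return ((size_t)(last - first + 1));
}
```

With 4096-byte pages, an 8192-byte buffer starting at offset 512 gives `min_segments(8192, 4096) == 2` but `pages_spanned(512, 8192, 4096) == 3`, which is the kind of gap the commit message describes.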

464 Commits